<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Consistent hashing</title>
	<atom:link href="http://michaelnielsen.org/blog/consistent-hashing/feed/" rel="self" type="application/rss+xml" />
	<link>http://michaelnielsen.org/blog/consistent-hashing/</link>
	<description></description>
	<lastBuildDate>Mon, 07 May 2012 21:35:26 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
	<item>
		<title>By: Michael Nielsen</title>
		<link>http://michaelnielsen.org/blog/consistent-hashing/comment-page-1/#comment-30818</link>
		<dc:creator>Michael Nielsen</dc:creator>
		<pubDate>Sun, 26 Jun 2011 00:28:02 +0000</pubDate>
		<guid isPermaLink="false">http://michaelnielsen.org/blog/?p=613#comment-30818</guid>
		<description>@ZD - thanks for pointing this out, it&#039;s now fixed.

(Funny, some people have trouble with left and right, which I find very instinctive.  But I always have to pause to think about clockwise versus counterclockwise, and I sometimes mess them up.)</description>
		<content:encoded><![CDATA[<p>@ZD &#8211; thanks for pointing this out, it&#8217;s now fixed.</p>
<p>(Funny, some people have trouble with left and right, which I find very instinctive.  But I always have to pause to think about clockwise versus counterclockwise, and I sometimes mess them up.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ZD</title>
		<link>http://michaelnielsen.org/blog/consistent-hashing/comment-page-1/#comment-30817</link>
		<dc:creator>ZD</dc:creator>
		<pubDate>Sat, 25 Jun 2011 16:45:41 +0000</pubDate>
		<guid isPermaLink="false">http://michaelnielsen.org/blog/?p=613#comment-30817</guid>
		<description>Found your post via &quot;crazy websurfer&quot; logic via &lt;a href=&quot;http://www.reddit.com/r/programming/comments/i8k4l/write_your_first_mapreduce_program_in_20_minutes/&quot; title=&quot;Write your first mapreduce program in 20 minutes&quot; rel=&quot;nofollow&quot;&gt;Reddit&lt;/a&gt;.

In your text you say &quot;counterclockwise&quot; for your first example, but then proceed to distribute the key to the next &lt;i&gt;clockwise&lt;/i&gt; machine, machine 1; is this a mistake or am I missing something?</description>
		<content:encoded><![CDATA[<p>Found your post via &#8220;crazy websurfer&#8221; logic via <a href="http://www.reddit.com/r/programming/comments/i8k4l/write_your_first_mapreduce_program_in_20_minutes/" title="Write your first mapreduce program in 20 minutes" rel="nofollow">Reddit</a>.</p>
<p>In your text you say &#8220;counterclockwise&#8221; for your first example, but then proceed to distribute the key to the next <i>clockwise</i> machine, machine 1; is this a mistake or am I missing something?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pregel &#124; DDI</title>
		<link>http://michaelnielsen.org/blog/consistent-hashing/comment-page-1/#comment-30784</link>
		<dc:creator>Pregel &#124; DDI</dc:creator>
		<pubDate>Fri, 17 Jun 2011 21:20:50 +0000</pubDate>
		<guid isPermaLink="false">http://michaelnielsen.org/blog/?p=613#comment-30784</guid>
		<description>[...] To implement Pregel on a cluster, we need a way of assigning vertices to different machines and threads in the cluster. This can be done using a hashing scheme, as was done in the code above to assign vertices to different worker threads. It can also be done using other approaches, if desired, such as consistent hashing. [...]</description>
		<content:encoded><![CDATA[<p>[...] To implement Pregel on a cluster, we need a way of assigning vertices to different machines and threads in the cluster. This can be done using a hashing scheme, as was done in the code above to assign vertices to different worker threads. It can also be done using other approaches, if desired, such as consistent hashing. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: SQL Server Scale Out Solutions &#124; Brent Ozar PLF</title>
		<link>http://michaelnielsen.org/blog/consistent-hashing/comment-page-1/#comment-30726</link>
		<dc:creator>SQL Server Scale Out Solutions &#124; Brent Ozar PLF</dc:creator>
		<pubDate>Wed, 08 Jun 2011 13:00:54 +0000</pubDate>
		<guid isPermaLink="false">http://michaelnielsen.org/blog/?p=613#comment-30726</guid>
		<description>[...] of evenly distributing data (roughly) across that fixed length output value. This is usually called consistent hashing. However the consistent hashing is generated, it&#8217;s easy to divide the total range of hashed [...]</description>
		<content:encoded><![CDATA[<p>[...] of evenly distributing data (roughly) across that fixed length output value. This is usually called consistent hashing. However the consistent hashing is generated, it&#8217;s easy to divide the total range of hashed [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Nielsen</title>
		<link>http://michaelnielsen.org/blog/consistent-hashing/comment-page-1/#comment-30320</link>
		<dc:creator>Michael Nielsen</dc:creator>
		<pubDate>Wed, 23 Feb 2011 23:08:02 +0000</pubDate>
		<guid isPermaLink="false">http://michaelnielsen.org/blog/?p=613#comment-30320</guid>
		<description>I don&#039;t see why more replicas will imply more rehashing.  It&#039;s true that more machines will be involved in the rehashing, but each machine will rehash fewer keys, and the total number of keys rehashed will be the same.</description>
		<content:encoded><![CDATA[<p>I don&#8217;t see why more replicas will imply more rehashing.  It&#8217;s true that more machines will be involved in the rehashing, but each machine will rehash fewer keys, and the total number of keys rehashed will be the same.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Santiago</title>
		<link>http://michaelnielsen.org/blog/consistent-hashing/comment-page-1/#comment-30317</link>
		<dc:creator>Santiago</dc:creator>
		<pubDate>Wed, 23 Feb 2011 15:34:37 +0000</pubDate>
		<guid isPermaLink="false">http://michaelnielsen.org/blog/?p=613#comment-30317</guid>
		<description>Hello Michael. I&#039;ve been thinking about the replicas &quot;dilemma&quot;.

It&#039;s obvious that more replicas will make the keys &quot;distribute&quot; better. Here you can find a pretty straightforward chart: http://www.lexemetech.com/2007/11/consistent-hashing.html

The thing is that, if you have to add or remove a node, more replicas will imply more rehashing between the nodes. A large number of replicas would undermine the consistent hashing &quot;concept&quot;.

What do you think?</description>
		<content:encoded><![CDATA[<p>Hello Michael. I&#8217;ve been thinking about the replicas &#8220;dilemma&#8221;.</p>
<p>It&#8217;s obvious that more replicas will make the keys &#8220;distribute&#8221; better. Here you can find a pretty straightforward chart: <a href="http://www.lexemetech.com/2007/11/consistent-hashing.html" rel="nofollow">http://www.lexemetech.com/2007/11/consistent-hashing.html</a></p>
<p>The thing is that, if you have to add or remove a node, more replicas will imply more rehashing between the nodes. A large number of replicas would undermine the consistent hashing &#8220;concept&#8221;.</p>
<p>What do you think?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Nielsen</title>
		<link>http://michaelnielsen.org/blog/consistent-hashing/comment-page-1/#comment-29761</link>
		<dc:creator>Michael Nielsen</dc:creator>
		<pubDate>Thu, 06 Jan 2011 13:11:53 +0000</pubDate>
		<guid isPermaLink="false">http://michaelnielsen.org/blog/?p=613#comment-29761</guid>
		<description>HA: In the formula with each node responsible for at most (1+e)K/N keys, with e = O(log N), where does the number of replicas appear?  The variation should decrease as the number of replicas increases.  Do you have a bound on a constant c such that e &lt; c log N?  Without knowing the constant, I don&#039;t see how to tell whether this rule applies or not.

Unfortunately, I haven&#039;t done a detailed analysis of the relationship between the number of replicas and the variation.</description>
		<content:encoded><![CDATA[<p>HA: In the formula with each node responsible for at most (1+e)K/N keys, with e = O(log N), where does the number of replicas appear?  The variation should decrease as the number of replicas increases.  Do you have a bound on a constant c such that e &lt; c log N?  Without knowing the constant, I don&#8217;t see how to tell whether this rule applies or not.</p>
<p>Unfortunately, I haven&#8217;t done a detailed analysis of the relationship between the number of replicas and the variation.</p>
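<p>Short of a proper analysis, the relationship can be explored numerically. The following is a rough sketch under stated assumptions, not the program from the post: the function names, the label formats, and the MD5-based placement on the unit circle are all made up for illustration. It reports the busiest machine&#8217;s load divided by the average load K/N, which plays the role of 1+e in the bound quoted above.</p>

```python
import bisect
import hashlib


def ring_position(label):
    """Map a string to a point in [0, 1) via MD5 (an arbitrary choice)."""
    digest = hashlib.md5(label.encode("utf-8")).hexdigest()
    return int(digest, 16) / 2 ** 128


def key_counts(machines, replicas, keys):
    """Count how many of the keys 0..keys-1 land on each machine."""
    points = sorted(
        (ring_position(f"machine-{m}-replica-{r}"), m)
        for m in range(machines)
        for r in range(replicas)
    )
    positions = [p for p, _ in points]
    counts = {m: 0 for m in range(machines)}
    for k in range(keys):
        # Owner is the first replica at or after the key, wrapping around.
        i = bisect.bisect_left(positions, ring_position(f"key-{k}"))
        counts[points[i % len(points)][1]] += 1
    return counts


def imbalance(counts):
    """Busiest machine's load divided by the average load (roughly 1 + e)."""
    average = sum(counts.values()) / len(counts)
    return max(counts.values()) / average


if __name__ == "__main__":
    for replicas in (1, 4, 20, 100, 500):
        counts = key_counts(machines=8, replicas=replicas, keys=256)
        print(replicas, sorted(counts.values()), round(imbalance(counts), 2))
```

<p>With only 256 keys, some scatter will remain even for very many replicas, because the keys themselves are hashed to random positions; raising <code>keys</code> separates that effect from the effect of the replica count.</p>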
]]></content:encoded>
	</item>
	<item>
		<title>By: HA</title>
		<link>http://michaelnielsen.org/blog/consistent-hashing/comment-page-1/#comment-29688</link>
		<dc:creator>HA</dc:creator>
		<pubDate>Thu, 06 Jan 2011 03:34:18 +0000</pubDate>
		<guid isPermaLink="false">http://michaelnielsen.org/blog/?p=613#comment-29688</guid>
		<description>With more replicas, the load balance of keys across machines becomes more uniform. This is the distribution with 20 replicas, 8 machines, 256 keys:
Machine 0 has 31 keys
Machine 1 has 33 keys
Machine 2 has 41 keys
Machine 3 has 26 keys
Machine 4 has 25 keys
Machine 5 has 35 keys
Machine 6 has 36 keys
Machine 7 has 29 keys

Is there a rule of thumb for choosing the number of replicas to get good load balancing of keys (for instance, less than a 2X difference across machines)?

The Chord paper (from SIGCOMM &#039;01) says the following about consistent hashing: for any set of N nodes and K keys, with high probability each node is responsible for at most (1+e)K/N keys, with a bound of e = O(log N). And e can be reduced to an arbitrarily small constant by having each node run O(log N) replicas.
But that doesn&#039;t seem to apply to the example I&#039;m using. Do you have any insight into why that is?

Thanks!</description>
		<content:encoded><![CDATA[<p>With more replicas, the load balance of keys across machines becomes more uniform. This is the distribution with 20 replicas, 8 machines, 256 keys:<br />
Machine 0 has 31 keys<br />
Machine 1 has 33 keys<br />
Machine 2 has 41 keys<br />
Machine 3 has 26 keys<br />
Machine 4 has 25 keys<br />
Machine 5 has 35 keys<br />
Machine 6 has 36 keys<br />
Machine 7 has 29 keys</p>
<p>Is there a rule of thumb for choosing the number of replicas to get good load balancing of keys (for instance, less than a 2X difference across machines)?</p>
<p>The Chord paper (from SIGCOMM &#8217;01) says the following about consistent hashing: for any set of N nodes and K keys, with high probability each node is responsible for at most (1+e)K/N keys, with a bound of e = O(log N). And e can be reduced to an arbitrarily small constant by having each node run O(log N) replicas.<br />
But that doesn&#8217;t seem to apply to the example I&#8217;m using. Do you have any insight into why that is?</p>
<p>Thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Nielsen</title>
		<link>http://michaelnielsen.org/blog/consistent-hashing/comment-page-1/#comment-29578</link>
		<dc:creator>Michael Nielsen</dc:creator>
		<pubDate>Wed, 05 Jan 2011 16:52:03 +0000</pubDate>
		<guid isPermaLink="false">http://michaelnielsen.org/blog/?p=613#comment-29578</guid>
		<description>HA: What happens when you increase the number of replicas, say to 20?</description>
		<content:encoded><![CDATA[<p>HA: What happens when you increase the number of replicas, say to 20?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: HA</title>
		<link>http://michaelnielsen.org/blog/consistent-hashing/comment-page-1/#comment-29489</link>
		<dc:creator>HA</dc:creator>
		<pubDate>Wed, 05 Jan 2011 00:08:16 +0000</pubDate>
		<guid isPermaLink="false">http://michaelnielsen.org/blog/?p=613#comment-29489</guid>
		<description>Thanks for sharing your program. I tried it out and found that the distribution of keys to machines isn&#039;t balanced for N keys that are numbered 0...N-1.

For example, for 8 machines and 4 replicas, if we map 256 keys numbered 0..255 using this, we get the following distribution of keys to machines:
Machine 0 has 18 keys
Machine 1 has 66 keys
Machine 2 has 39 keys
Machine 3 has 44 keys
Machine 4 has 11 keys
Machine 5 has 21 keys
Machine 6 has 27 keys
Machine 7 has 30 keys

There is as much as a 6X difference in the number of keys assigned to machines. Am I missing something? How can the load balancing of keys across machines be made more uniform?</description>
		<content:encoded><![CDATA[<p>Thanks for sharing your program. I tried it out and found that the distribution of keys to machines isn&#8217;t balanced for N keys that are numbered 0&#8230;N-1.</p>
<p>For example, for 8 machines and 4 replicas, if we map 256 keys numbered 0..255 using this, we get the following distribution of keys to machines:<br />
Machine 0 has 18 keys<br />
Machine 1 has 66 keys<br />
Machine 2 has 39 keys<br />
Machine 3 has 44 keys<br />
Machine 4 has 11 keys<br />
Machine 5 has 21 keys<br />
Machine 6 has 27 keys<br />
Machine 7 has 30 keys</p>
<p>There is as much as a 6X difference in the number of keys assigned to machines. Am I missing something? How can the load balancing of keys across machines be made more uniform?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

