<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: CouchDB 0.9.0 bulk document post performance</title>
	<atom:link href="http://www.atypical.net/archive/2009/05/12/couchdb-090-bulk-document-post-performance/feed" rel="self" type="application/rss+xml" />
	<link>http://www.atypical.net/archive/2009/05/12/couchdb-090-bulk-document-post-performance</link>
	<description>the life and times of Joan Touzet</description>
	<lastBuildDate>Thu, 14 Jan 2010 23:21:53 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: John P. Wood &#187; CouchDB: Databases and Documents</title>
		<link>http://www.atypical.net/archive/2009/05/12/couchdb-090-bulk-document-post-performance/comment-page-1#comment-114942</link>
		<dc:creator>John P. Wood &#187; CouchDB: Databases and Documents</dc:creator>
		<pubDate>Fri, 10 Jul 2009 14:34:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.atypical.net/?p=615#comment-114942</guid>
		<description>[...] virtually impossible for multiple CouchDB instances to generate the same id. However, I have read articles on the web indicating that this is a very slow operation, so you may want to avoid letting CouchDB [...]</description>
		<content:encoded><![CDATA[<p>[...] virtually impossible for multiple CouchDB instances to generate the same id. However, I have read articles on the web indicating that this is a very slow operation, so you may want to avoid letting CouchDB [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Wohali</title>
		<link>http://www.atypical.net/archive/2009/05/12/couchdb-090-bulk-document-post-performance/comment-page-1#comment-110551</link>
		<dc:creator>Wohali</dc:creator>
		<pubDate>Tue, 26 May 2009 23:59:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.atypical.net/?p=615#comment-110551</guid>
		<description>Hi Dwight, thanks for stopping by.

I&#039;ll take a look at MongoDB once this sprint is complete. Right now I&#039;m not &quot;shopping around&quot; for DBs, just trying to figure out how to get the most performance out of the one I&#039;ve chosen already.</description>
		<content:encoded><![CDATA[<p>Hi Dwight, thanks for stopping by.</p>
<p>I&#8217;ll take a look at MongoDB once this sprint is complete. Right now I&#8217;m not &#8220;shopping around&#8221; for DBs, just trying to figure out how to get the most performance out of the one I&#8217;ve chosen already.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dm_mongo</title>
		<link>http://www.atypical.net/archive/2009/05/12/couchdb-090-bulk-document-post-performance/comment-page-1#comment-110543</link>
		<dc:creator>dm_mongo</dc:creator>
		<pubDate>Tue, 26 May 2009 21:26:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.atypical.net/?p=615#comment-110543</guid>
		<description>Hmmm, it would be very interesting to run similar tests with MongoDB and compare the results.</description>
		<content:encoded><![CDATA[<p>Hmmm, it would be very interesting to run similar tests with MongoDB and compare the results.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Wohali</title>
		<link>http://www.atypical.net/archive/2009/05/12/couchdb-090-bulk-document-post-performance/comment-page-1#comment-110024</link>
		<dc:creator>Wohali</dc:creator>
		<pubDate>Fri, 22 May 2009 00:22:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.atypical.net/?p=615#comment-110024</guid>
		<description>Hi Daniel, thanks for stopping by.

There&#039;s definitely some conflicting advice out there for the newcomer to CouchDB. The point made in the book is an important one - namely, that you don&#039;t want a single, central autoincrementing ID in a system built on eventual consistency, lest you reintroduce a single point of failure (and a performance bottleneck).

HOWEVER.

It turns out that using the completely random UUIDs generated by CouchDB by default carries a serious performance penalty: it can push the underlying B+-trees used in the implementation toward worst-case performance. In the words of one of the developers (davisp):

&lt;blockquote&gt;I need to draw a picture of this at some point, but the reason is if you think of the idealized btree, when you use UUID&#039;s you might be hitting any number of root nodes in that tree, so with the append only nature you have to write each of those nodes and everything above it in the tree. but if you use monotonically increasing id&#039;s then you&#039;re invalidating the same path down the right hand side of the tree thus minimizing the number of nodes that need to be rewritten. would be just the same for monotonically decreasing as well. and it should technically work if your updates can be guaranteed to hit one or two nodes in the inside of the tree, though that&#039;s much harder to prove.&lt;/blockquote&gt;

So it turns out that using an incrementing (or decrementing) sequence is useful, even if that sequence is just the number of seconds since the UNIX epoch. And generating a timestamp at the time of document insertion is effectively free computationally - cheaper than generating 128 bits of pseudorandom data, to be sure! But how should you deal with the points mentioned in the &lt;em&gt;Why CouchDB&lt;/em&gt; chapter? One suggestion is to prefix or suffix the timestamp with some identifier unique to each instance of the DB or application server (depending on your architecture - whether you&#039;re using CouchDB-hosted applications or an external application server architecture like RoR or Django). Provided each instance never issues the same timestamp twice, you can be sure you&#039;ll never have duplicate IDs, that grabbing documents &quot;near each other&quot; (in an assumed time series) is straightforward, and that you don&#039;t have a singleton-style bottleneck in the form of a centrally determined sequence.

In the short time I&#039;ve been on #couchdb this has come up a few times. Perhaps I should start a new post on this specific topic, or post this on the CouchDB wiki...</description>
		<content:encoded><![CDATA[<p>Hi Daniel, thanks for stopping by.</p>
<p>There&#8217;s definitely some conflicting advice out there for the newcomer to CouchDB. The point made in the book is an important one &#8211; namely, that you don&#8217;t want a single, central autoincrementing ID in a system built on eventual consistency, lest you reintroduce a single point of failure (and a performance bottleneck).</p>
<p>HOWEVER.</p>
<p>It turns out that using the completely random UUIDs generated by CouchDB by default carries a serious performance penalty: it can push the underlying B+-trees used in the implementation toward worst-case performance. In the words of one of the developers (davisp):</p>
<blockquote><p>I need to draw a picture of this at some point, but the reason is if you think of the idealized btree, when you use UUID&#8217;s you might be hitting any number of root nodes in that tree, so with the append only nature you have to write each of those nodes and everything above it in the tree. but if you use monotonically increasing id&#8217;s then you&#8217;re invalidating the same path down the right hand side of the tree thus minimizing the number of nodes that need to be rewritten. would be just the same for monotonically decreasing as well. and it should technically work if your updates can be guaranteed to hit one or two nodes in the inside of the tree, though that&#8217;s much harder to prove.</p></blockquote>
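<p>To make that picture a little more concrete, here is a rough sketch rather than anything taken from CouchDB itself. It assumes a toy model in which the existing keys are simply split into fixed-size leaves, and counts how many distinct leaves a batch of new keys would land in. The function name <code>leaves_touched</code>, the leaf size, and the key ranges are all made up for illustration; a real append-only B+-tree also has to rewrite every node on the path above each touched leaf.</p>

```python
import bisect
import random


def leaves_touched(existing_keys, new_keys, leaf_size=100):
    """Count the distinct leaves that a batch of new keys would land in."""
    ordered = sorted(existing_keys)
    # First key of each fixed-size leaf: a stand-in for real node boundaries.
    first_keys = ordered[::leaf_size]
    touched = set()
    for key in new_keys:
        # Rightmost leaf whose first key is at or below the new key,
        # clamped to leaf 0 for keys smaller than everything stored.
        leaf = max(bisect.bisect_right(first_keys, key) - 1, 0)
        touched.add(leaf)
    return len(touched)


# Hypothetical data: 10,000 existing keys, i.e. 100 leaves of 100 keys each.
existing = list(range(0, 100_000, 10))

# A batch of 50 monotonically increasing keys, all beyond the current maximum.
sequential = list(range(100_000, 100_050))

# A batch of 50 keys scattered across the key space, like random UUIDs.
rng = random.Random(0)
scattered = [rng.randrange(100_000) for _ in range(50)]

print("sequential batch touches", leaves_touched(existing, sequential), "leaf")
print("scattered batch touches", leaves_touched(existing, scattered), "leaves")
```

<p>In this simplified model the increasing batch always falls into the rightmost leaf, while the scattered batch spreads over many leaves, each of which would need its own path rewritten.</p>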
<p>So it turns out that using an incrementing (or decrementing) sequence is useful, even if that sequence is just the number of seconds since the UNIX epoch. And generating a timestamp at the time of document insertion is effectively free computationally &#8211; cheaper than generating 128 bits of pseudorandom data, to be sure! But how should you deal with the points mentioned in the <em>Why CouchDB</em> chapter? One suggestion is to prefix or suffix the timestamp with some identifier unique to each instance of the DB or application server (depending on your architecture &#8211; whether you&#8217;re using CouchDB-hosted applications or an external application server architecture like RoR or Django). Provided each instance never issues the same timestamp twice, you can be sure you&#8217;ll never have duplicate IDs, that grabbing documents &#8220;near each other&#8221; (in an assumed time series) is straightforward, and that you don&#8217;t have a singleton-style bottleneck in the form of a centrally determined sequence.</p>
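<p>A minimal sketch of that idea, under some assumptions of my own: the timestamp goes first so that IDs sort by time, a per-process counter (not part of the suggestion above) keeps IDs distinct within the same second, and the instance identifier goes last. The names <code>make_id_generator</code>, <code>node-a</code> and <code>node-b</code> are hypothetical, and the clock is assumed never to run backwards.</p>

```python
import itertools
import time


def make_id_generator(instance_id, clock=time.time):
    """Return a function that yields time-ordered IDs unique to this instance."""
    counter = itertools.count()  # assumption: tie-breaker within one second

    def next_id():
        seconds = int(clock())  # whole seconds since the UNIX epoch
        sequence = next(counter)
        # Zero-padded fields keep lexicographic order equal to numeric order
        # (as long as the counter stays within six digits).
        return f"{seconds:010d}-{sequence:06d}-{instance_id}"

    return next_id


# Two hypothetical instances generating IDs independently, no coordination.
next_id_a = make_id_generator("node-a")
next_id_b = make_id_generator("node-b")

first = next_id_a()
second = next_id_a()
other = next_id_b()
print(first, second, other)
```

<p>The resulting strings can be supplied as <code>_id</code> values when documents are created. Because the instance identifier differs, two instances never produce the same ID even when their timestamps and counters coincide.</p>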
<p>In the short time I&#8217;ve been on #couchdb this has come up a few times. Perhaps I should start a new post on this specific topic, or post this on the CouchDB wiki&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: daniel</title>
		<link>http://www.atypical.net/archive/2009/05/12/couchdb-090-bulk-document-post-performance/comment-page-1#comment-110009</link>
		<dc:creator>daniel</dc:creator>
		<pubDate>Thu, 21 May 2009 20:06:36 +0000</pubDate>
		<guid isPermaLink="false">http://www.atypical.net/?p=615#comment-110009</guid>
		<description>I *just* started looking at CouchDB today. I notice you mention using a sequential number.

http://books.couchdb.org/relax/why-couchdb

There&#039;s a section on &quot;Auto Increment IDs&quot;. They seem to have a justification for doing what they&#039;ve done.</description>
		<content:encoded><![CDATA[<p>I *just* started looking at CouchDB today. I notice you mention using a sequential number.</p>
<p><a href="http://books.couchdb.org/relax/why-couchdb" rel="nofollow">http://books.couchdb.org/relax/why-couchdb</a></p>
<p>There&#8217;s a section on &#8220;Auto Increment IDs&#8221;. They seem to have a justification for doing what they&#8217;ve done.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
