Herding Code 181: CouchDb, Cloudant, MyCouch and SisoDb with Max Thayer and Daniel Wertheim

The guys talk to Max Thayer and Daniel Wertheim about document databases, especially focusing on CouchDb and Cloudant's cloud-hosted CouchDb offering.

Show Notes:

Intro
- (00:40) Jon says he heard about Daniel because of SisoDb, a document database running on top of SQL Server. Jon and Daniel talk about what SisoDb does and why it could be useful in a "SQL Server only" shop.
- (01:55) Daniel introduced Jon to Max, who works as a developer evangelist at Cloudant and hacks on Node.js and CouchDb.
CouchDb and Cloudant basics, multi-master replication possibilities
- (02:30) Jon asks Max what Cloudant does. Max says that Cloudant is a database as a service - a hosted, managed document database based on CouchDb.
- (02:58) Max talks about multi-master replication and some of the implications, including PouchDb (which treats your browser as a CouchDb instance), and even running a node on your phone that you can replicate against. Jon's mind is blown.
- (04:30) Jon asks about the latency involved in using an HTTP database as a service. Max says that local caching helps, as well as having your database service physically close to your users (or app/web servers). Queries are always done against precomputed indexes, so query time is always logarithmic. Daniel says you can use replication to bring data as close as possible, and emphasizes the importance of in-application caching.
- (06:38) Kevin says that there have been a lot of attempts at replication-based systems over the years and asks what CouchDb does differently. Max says that the big difference is in the way CouchDb handles multi-version concurrency control by keeping revision trees. This lets them run a lockless system and investigate changes later. Daniel says that consistent hashing helps with this and explains the terms. Max talks about the use of revision numbers and conflict handling.
- (09:08) Daniel says he likes that Cloudant adds clustering support, and he's excited that Cloudant is contributing this back to CouchDb. Max says that they've also added a new administrative interface called Photon, which they'll be contributing back.
- (10:19) Daniel asks if this means that Big Couch is being deprecated. Max says yes, and Kevin asks for more information on what Big Couch is.
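
The revision numbers and conflict handling discussed at 06:38 can be illustrated with a small in-memory sketch. It is not CouchDb's implementation: `ToyStore`, `ConflictError` and the way the digest is computed are assumptions made for illustration, and the sketch keeps only the latest revision rather than a full revision tree. What it does show is the lockless idea: a write is accepted only if the writer names the revision it last saw.

```python
import hashlib
import json


class ConflictError(Exception):
    """Raised when an update is based on a stale revision."""


class ToyStore:
    """In-memory toy model of revision-checked updates (not CouchDb's real code)."""

    def __init__(self):
        self._docs = {}  # doc id -> (rev, body)

    @staticmethod
    def _next_rev(prev_rev, body):
        # Revisions look like "N-hash": a generation counter plus a content digest.
        # The digest recipe here is an assumption, chosen only to keep the toy simple.
        generation = int(prev_rev.split("-")[0]) + 1 if prev_rev else 1
        digest = hashlib.md5(json.dumps(body, sort_keys=True).encode()).hexdigest()
        return f"{generation}-{digest}"

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        # No locks: the write succeeds only if the caller saw the latest revision.
        if rev != current_rev:
            raise ConflictError(f"{doc_id}: expected {current_rev}, got {rev}")
        new_rev = self._next_rev(current_rev, body)
        self._docs[doc_id] = (new_rev, body)
        return new_rev

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)
```

A second writer that still holds the first revision gets a `ConflictError` instead of silently overwriting the newer data, and can then fetch the latest revision and decide how to merge.
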
Migrating to CouchDb
- (10:57) Jon asks about the migration path for applications using traditional RDBMSs to Cloudant or CouchDb. Max explains two options: the do-it-yourself option (uploading data as CSVs or similar) or using the WEAVE@cloud service from CloudBees.
- (14:13) Daniel asks about the task of moving from traditional queries to map-reduce queries. Max talks about some of the migrations he's been a part of, and talks about the use of Lucene queries as a bridge. Daniel says that in moving to document databases you really need to think differently about how you'll consume the data. Max talks about design documents, which store indexes and list functions.
- (16:06) Jon says that when he started looking at document databases, he found that it was also helpful to store additional data in a way that's easy to query. Max gives an example using medical data in which you can normalize data as part of the map-reduce process, so you rarely have to worry about schema. Daniel says that it's the result of the map that's being indexed. Max says that unlike some other systems like Hadoop that do the mapping in a batch, CouchDb updates indexes incrementally.
- (18:42) Jon asks how CouchDb compares to Mongo. Max says he found Mongo to be a good transitional system from relational databases because the querying was similar, but it broke down at scale.
- (19:50) Jon puts K Scott on the spot and asks how this strikes him due to his recent work with medical data on Mongo. K Scott says it's good to know CouchDb is an option if they hit scaling issues.
- (20:43) Jon asks about the process of migrating from Mongo to CouchDb. Max says he's written a script that dumps data from JSON in Mongo to be imported into CouchDb. He says that on the surface, Mongo and CouchDb store data similarly so migrating data isn't that hard - the real differences are in querying and locking.
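
The points about indexing the result of the map, normalizing data in the map step and updating indexes incrementally can be sketched in a few lines. This is a toy model under stated assumptions, not CouchDb code: CouchDb views are normally written in JavaScript and stored in B-trees, whereas here a sorted Python list stands in for the index, and `map_by_diagnosis`, `ToyView` and the `diagnosis` / `dx` field names are hypothetical.

```python
import bisect


def map_by_diagnosis(doc):
    """Hypothetical map function: emit (key, value) pairs for one document."""
    # Normalize inconsistent field names and casing at index time.
    diagnosis = doc.get("diagnosis") or doc.get("dx")
    if diagnosis:
        yield diagnosis.strip().lower(), doc["_id"]


class ToyView:
    """Sorted index over map output, updated one document at a time."""

    def __init__(self, map_fn):
        self._map_fn = map_fn
        self._rows = []      # sorted list of (key, value) rows
        self._emitted = {}   # doc id -> rows that document contributed

    def update(self, doc):
        # Incremental: only rows emitted by the changed document are touched.
        for row in self._emitted.pop(doc["_id"], []):
            self._rows.pop(bisect.bisect_left(self._rows, row))
        rows = list(self._map_fn(doc))
        for row in rows:
            bisect.insort(self._rows, row)
        self._emitted[doc["_id"]] = rows

    def query(self, key):
        # Binary search locates the first row for the key in O(log n) comparisons.
        index = bisect.bisect_left(self._rows, (key,))
        values = []
        while index < len(self._rows) and self._rows[index][0] == key:
            values.append(self._rows[index][1])
            index += 1
        return values
```

Because the index is sorted by the emitted key, a lookup never scans the documents themselves, which is the reason queries against precomputed indexes stay logarithmic as the data grows.
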
Cloudant - features, pricing model, free account setup
- (22:21) Jon asks how Cloudant compares to other database as a service offerings. Max lists some, and Daniel mentions Iris Couch. Daniel talks about how easy it is to get started with CouchDb on Windows, then migrate to Cloudant.
- (24:12) Max talks about some of the features they've added recently, like Lucene queries and geographic querying. Max says that they contribute a lot to CouchDb.
- (25:30) Jon asks how Cloudant integrates with the hosting providers they've got listed. Max says that they work to host Cloudant servers in the same datacenters as their hosting partners.
- (27:05) Kevin asks how Cloudant charges. Max says that for dedicated clusters, it's per-node and dependent on the hosting provider since they all charge differently. For multi-tenancy, it's on a per-request and per-storage basis. Migrating between dedicated and multi-tenant is handled using the standard replication mechanism.
- (27:39) Jon asks about the process of getting started with the free level. Max explains how it works and says they'll only charge you if you exceed $5 per month, which is a good amount of use. Daniel says it takes less than 5 minutes to get started.
Client libraries and MyCouch
- (29:03) Kevin asks if there are client libraries for most languages. Max says there are, but most are just adapted from the CouchDb libraries.
- (29:43) Daniel built MyCouch as a purely async library that doesn't hide the domain knowledge of CouchDb. Jon asks about the overall flow of using the MyCouch NuGet package to get started. Daniel says he's used Portable Class Library support to cover the different platforms.
- (31:53) Jon asks about the Query.Configure interface to build a query.
- (32:44) Jon asks about the history of Daniel's interest in CouchDb and MyCouch.
Migrating from Cloudant to in-house CouchDb
- (33:20) Kevin asks if Cloudant is a hosted version of CouchDb or a fork. Max says that currently it's a fork, but they contribute a lot back.
- (33:43) Kevin then asks about what would be required in bringing a Cloudant-hosted application back in-house to run under vanilla CouchDb. Max says that in addition to losing the managed / hosted value, you'd lose Lucene querying and (soon) the geo-indexing features. Daniel also points out Cloudant's clustering support.
Questions from Twitter
- (34:55) Rob Sullivan asks Daniel how working with SisoDb and CouchDb has affected the way he views document databases and RDBMSs. Daniel says that he doesn't want to see an ORM anymore and he's noticed that a lot of people are creating hierarchical document structures in SQL Server when a document database would be a better fit. He says that there's a little less safety in distributed document databases, and you just have to get used to working with that. Kevin asks about some of the application strategies people use to deal with that. Max says that CouchDb provides ACIDity at the document level, so as long as you wrap your transactions into a single document you're fine. This leads to event sourcing, in which all your transactions are handled as separate documents.
- (39:23) Steve Strong asks about the offline story for synchronizing changes to a web client. Max talks about PouchDb and how it works in web clients with intermittent data access.
- (41:31) Jon asks if PouchDb and CouchDb could be used in peer-to-peer systems. Max says this is something he's profoundly interested in. He's done some conference talks about it and has a project called Quilter which is aimed at feature parity with Dropbox but with full user control, security and privacy by eliminating centralized network infrastructure. Daniel asks if it's NSA-safe, and Max talks about how you can protect things using HTTPS and friend / reputation systems.
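
The event sourcing idea from the 34:55 discussion (each transaction stored as its own document, with current state derived from them) might look roughly like this. It is a minimal sketch with assumptions throughout: a plain Python list stands in for the database, and `record_transfer`, `balances`, the document fields and the id scheme are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def record_transfer(events, source, target, amount):
    """Append one self-contained event document; nothing is updated in place."""
    event = {
        "_id": f"transfer-{len(events) + 1}",  # hypothetical id scheme
        "type": "transfer",
        "from": source,
        "to": target,
        "amount": amount,
    }
    events.append(event)  # a single-document write is the atomic unit
    return event


def balances(events):
    """Derive current balances by folding over every event, like a reduce."""
    totals = defaultdict(int)
    for event in events:
        totals[event["from"]] -= event["amount"]
        totals[event["to"]] += event["amount"]
    return dict(totals)
```

Since a transfer touches two accounts, writing it as one event document avoids needing a transaction that spans two account documents; the balances are computed from the events rather than stored and updated.
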
Erlang
- (46:13) Kevin asks what it's like working with CouchDb's code, since a lot of it's written in Erlang. Max says that Erlang is built around building effective distributed systems since it incorporates fault handling. Daniel talks about the low memory footprint and Max talks about the ability to pass native Erlang messages over arbitrary protocols including HTTP.
- (49:02) Jon asks where to learn more about Erlang. Max points out the book (available online) called Learn You Some Erlang For Great Good.
- (49:44) Jon asks about what it's like to integrate Erlang into parts of an application.
CouchDb vs. Mongo
- (50:51) Kevin asks why Mongo gets more press than CouchDb. Max says that Mongo has a similar interface to traditional RDBMSs, but a lot of it's just been a marketing victory. He talks about some unappreciated CouchDb advantages, like the fact that it's got a built-in REST interface. He also says that CouchDb scales better than Mongo due to technical differences such as multi-version concurrency control.

Show Links:

Max Thayer (blog, github, twitter)
Daniel Wertheim (blog, github, twitter)
Cloudant
PouchDb
MyCouch on GitHub / MyCouch on NuGet
WEAVE@cloud migration solution from CloudBees
Iris Couch
Quilter
SisoDb
Learn You Some Erlang For Great Good