Herding Code 159: Catching up with Oren Eini on RavenDB

This week on Herding Code, the guys talk with Oren Eini (a.k.a. Ayende Rahien) about what’s new with RavenDB.

Download / Listen:

Herding Code 159: Catching up with Oren Eini on RavenDB [audio://herdingcode.com/wp-content/uploads/HerdingCode-0159-RavenDB.mp3]

Show Notes:

  • (00:47) Introduction and review of document databases and RavenDB
    • Oren gives us a quick overview of document databases and RavenDB
    • Relational databases worked for the kind of applications we were building in the early ’90s. We can kind of make them work in our current applications, but it takes too much work.
    • RavenDB is a document database which stores JSON documents.
    • JSON documents can store arbitrarily complex data very easily.
  • (04:35) Comparing accuracy and data consistency between document databases and relational databases
    • Jon asks about Oren’s comments on a recent .NET Rocks podcast in which he said that document databases allow us to be more correct than relational databases.
    • Oren gives a real life example of how an update to a customer’s financial information caused a change to her historical record, which caused some real problems.
    • Jon talks about some of the hoops we jump through in an attempt to maintain historical data in a relational database, e.g. soft deletes.
  • (08:42) Disk space concerns
    • Scott K says he hears DBAs worry about disk space due to data repetition between documents and asks what other concerns people bring up.
    • Oren says there can be more computation and indexing, but on the other hand temporal data is orders of magnitude easier.
    • Data design principles were established back when space was expensive; that’s all changed now.
    • Oren says he hears people say that space isn’t cheap in the enterprise, but runs some numbers and concludes they’re either very inefficient or someone’s got their hand in the till. Scott K says that enterprise data storage is often expensive because they’re not tiering their data correctly to put low priority data on cheaper storage.
    • Oren says enterprises drive up storage costs through foolish backup strategies.
  • (14:42) Query and performance benefits
    • Scott K says that people often view document databases as a giant blob of text rather than structured data which can be searched, indexed, etc.
    • Oren says that you get full text search for free in RavenDB.
    • In relational databases, you’re always working with the very latest data, so you have locks, readers waiting for writers, etc.
    • RavenDB does a lot of precomputation in the background, so it can give you aggregate information immediately.
  • (17:27) RavenDB 2.0 release overview
    • Big improvements to performance on some key codepaths, in some cases over 1000%.
    • Support for JavaScript scripts on the server, which allows for scenarios like mass migrations and batching support on the server.
    • Management UI improvement, better management API coverage, performance counters, etc.
    • Dev improvements – sharding support, full support for async.
  • (19:40) 2.01 release overview
    • Fixes some rough spots in the 2.0 release – things that beta testers didn’t mind, but that could be a little smoother.
    • They added a new feature – improved support for replicating to a relational database.
  • (22:05) Sharding improvements and migrations
    • Sharding’s been around since the beginning, but required you to specify a lot of things – lots of options, too much complexity, too many important decisions early in the development process.
    • Sharding support has been revamped – provide the endpoints, defaults take care of the rest.
    • Oren gives an example with sharding customer data. By default, documents are sharded together based on transaction id. You can specify a shard when you save based on a user specified id.
    • Some people have problems with the default approach because the document id includes the shard id. That’s necessary to prevent having to query all shards.
    • Jon asks how this works over time if you need to add shards, migrate data, etc. Oren says you can rebalance by biasing new data towards a newly added shard.
    • If you need to move data to a new server – for instance, a customer becomes large enough that you want to put all of their documents on a new shard, you’ve got two options for handling the id’s. Oren says some users migrate data, rewriting id’s during the process, but he doesn’t recommend that. Instead, he recommends using a sharding function which allows remapping document id’s to a new shard without changing id’s.
    • Jon obviously doesn’t get it and asks the same question again, also asking how you handle data modifications over time. Oren explains that you can just write a JavaScript function to update your existing documents if needed.
    • Kevin asks how long a data migration takes. Oren types one up on the fly and explains the parsing and execution time.
  • (34:43) Time for some random questions!
    • Scott K notes that there’s a client that runs on Mono and asks if there are plans to get the server running on Mono. Oren talks about the general plan to handle that, but says it’s not high on the priority list.
    • (35:48) Scott K asks about compact scenarios, including clients that run on mobile and embedded instances that run locally. Oren notes that clients are easy, because anything that can make a REST call can be a client. They had an embedded version that had very little interest.
    • (38:03) In disconnect scenarios, it’s usually simpler to cache JSON documents locally.
    • (39:10) Jon asks about merge support for occasionally connected scenarios. Oren says that’s intentionally not included.
    • (41:25) Jeremy Miller (@jeremydmiller) asks when Oren is going to fix Lucene.net’s flow control via exception madness. Oren says it’s not planned, and that Jeremy should ignore those exceptions.
    • (42:25) Philip (@autosnak) asks why RavenDB doesn’t do more for startups and small biz pricing-wise. Oren explains the offers they make available – open source is free, RavenDB basic edition is $5 / month, they donate a lot of licenses for a lot of other cases, and even the full versions are incredibly cheap compared with any other database. Shoot him an e-mail.
    • (44:44) Chris Whellams (@chriswillems) asks how to sell NoSQL and RavenDB to IT management and bosses that are addicted to SQL Server. Oren outlines a strategy – start with a persistent viewmodel cache on a slow page to get a quick win, then use it for simple storage of ancillary application data (e.g. preferences), then use it in a spike on a new project. This is exactly what the MSNBC team did – they started with a non-operating RavenDB node in production, then slowly moved some things in without taking on any unnecessary risk.
    • (42:50) Jon asks for any closing thoughts. Oren says they’re starting on some weekly webinars for RavenDB users – or just if you’re curious about it. There’s a RavenDB course in the US in May.
    • FIN!
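
For readers who want a concrete picture of the sharding discussion at 22:05, here is a rough sketch of the idea of remapping documents to a new shard without rewriting their ids. It is plain Python, not RavenDB's API: the id layout (`<shard>/<customer>/<rest>`), the `OVERRIDES` table, and the `shard_for` function are all hypothetical names made up for illustration.

```python
# Hypothetical sketch: none of these names come from RavenDB's API.
# Assumed id layout: "<shard>/<customer>/<rest>".

# Customers that have been moved to a dedicated shard after their ids were issued.
OVERRIDES = {}


def shard_for(doc_id: str) -> str:
    """Return the shard that should hold the document with this id."""
    shard_prefix, customer, _rest = doc_id.split("/", 2)
    # An override wins over the shard embedded in the id, so the id never changes.
    return OVERRIDES.get(customer, shard_prefix)


doc_id = "shard-a/customers-42/orders/1"
before = shard_for(doc_id)            # resolved from the id prefix

OVERRIDES["customers-42"] = "shard-c"  # the customer outgrew its original shard
after = shard_for(doc_id)              # same id, now routed to the new shard
```

Because the shard is resolved by a function rather than read directly from the id, a lookup still touches a single shard, and moving a customer only requires updating the mapping and copying the documents.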

Show Links: