Herding Code 209: Robert Friberg on In-memory Databases and OrigoDB

While at NDC Oslo, Jon talked to Robert Friberg about in-memory databases in general. They discussed OrigoDB, Robert’s open source in-memory database, as well as Redis and SQL Server Hekaton.

Download / Listen: Herding Code 209: Robert Friberg on In-memory Databases and OrigoDB

Show Notes:

  • Hello
    • (00:18) Jon asks Robert what his presentation was about. Robert describes his talk covering in-memory databases, comparing OrigoDB, Redis and Hekaton.
  • OrigoDB and working with in-memory databases
    • (00:40) OrigoDB is Robert’s open source project. Jon asks why Robert decided to write his own in-memory database when there are others available. Robert says that OrigoDB is unique, and that they’ve been working on it for a long time and there was nothing available when they started on it.
    • (01:18) Jon asks how it compares to relational databases. Robert says that when you move to in-memory, you’re no longer constrained by the need to structure your data in a way that can be easily stored on disk. You can take advantage of the random access nature of memory, and think more graph-oriented than stream-oriented.
    • (02:07) Jon asks if the data is persisted and follows ACID principles. Robert says that the data is in memory in any form you like, and you log transactions when you make changes – writing the events to disk or a database.
    • (02:48) Jon asks how data is loaded when an application starts up. Robert says it loads the most recent snapshot and then replays all events that occurred after the snapshot. You can choose the serialization format – JSON, binary and Protocol Buffers (protobuf) are supported. Protocol Buffers are fast, compact and interoperable.
    • (03:45) Jon asks what kinds of applications work best with OrigoDB. Robert describes the problem it solves: data access and databases are too slow, so we need to use caching to compensate for that. Traditional relational databases were a good fit when memory was scarce, but now your entire application’s data can fit in memory. Also, historically, relational databases reflected the entity model and allowed business users to run queries; now we’re mapping back and forth between models which don’t match. If you keep all the data in memory, everything’s in one place and the data access is incredibly fast. That allows you to do everything single-threaded, really quickly – on Robert’s laptop he can do 50,000 ACID transactions per second.
    • (07:35) Jon asks what the difference is between using OrigoDB and just using his own in-memory structures. Robert explains that’s how OrigoDB works – you use your own in-memory structures and do LINQ queries against them. OrigoDB adds in snapshotting, persistence, etc.
  • OrigoDB support for clustering
    • (08:22) Robert says that in addition to the embedded engine, they also have an out-of-process server product that supports clustering with replication, load balancing, and off-site backup.
    • (09:12) Jon asks how the clusters communicate. Robert says that it’s via TCP, inspired by SQL Server clustering.
    • (09:50) Jon asks what happens when the master goes offline. Robert says that there’s no automatic failover, but you can do manual failover using the web-based UI.
  • Other OrigoDB features: Web UI, cross-platform, cached queries
    • (10:29) Jon asks about the web UI. Robert says there’s a web-based UI that runs in the engine using Nancy. It supports some nice features, including ad-hoc queries using Razor syntax.
    • (11:33) OrigoDB’s bindings for JavaScript and Protocol Buffers support cross-platform applications.
    • (12:35) You work with OrigoDB using commands and queries. You can also send in LINQ commands as text and they’ll be compiled, parameterized and cached.
  • How Much Memory?
    • (13:17) Jon asks how much memory it will take. Robert says over the past twenty years, the transactional workloads for the projects he’s worked on have all been under 200GB. You can offload your reporting data to a relational database if needed.
  • Business Model
    • (14:18) Jon asks if the server product is commercial software. Robert says they’ve tried the revenue model but haven’t had any sales. In the next release, they’re pivoting to everything free and open source and trying to build a support business.
  • Case Study
    • (15:15) Jon asks what kinds of projects he’s built with OrigoDB. Robert talks about a consulting job for a large healthcare company in Sweden. The customer was having really bad performance problems – each service would create business components, which would then create data components. Due to the business requirements, the data transactions were complex, and many were written using cursors. Robert said he traced some database use and found that a single transaction could make thousands of database round-trips. Robert did a proof of concept using simple C# collections in-memory and found they could do tens of thousands of transactions per second. Robert says that transactions in SQL Server require logging pages of data to disk, whereas logging an OrigoDB transaction is often just a few bytes since it’s just logging the command.
  • Snapshots
    • (18:18) Jon asks how many snapshots are maintained. Robert says he tries to avoid snapshots since they require a read lock. You can also use an immutable model (using multi-version concurrency control).
    • (19:21) You can truncate events when you snapshot, but then you’re losing information. Robert and Jon discuss how this relates to event sourcing.
  • Other In-Memory Databases: SQL Server Hekaton, Redis, VoltDB
    • (20:02) Jon asks what Robert showed off in his talk. Robert says he normally does workshops that are a few days long, so squeezing everything into an hour is difficult. He does demonstrations showing OrigoDB, Redis and Hekaton, but his main message is that your application’s data probably fits in memory and memory is cheap.
    • (21:03) Jon asks about Hekaton. Robert explains how Hekaton works, pointing out that it supports a hybrid model in which only certain tables are in-memory. The advantage is that you can use your existing SQL Server tools, ecosystem and code.
    • (23:20) Robert mentions VoltDB, a startup that offers an in-memory, distributed relational database engine.
    • Redis is a key-value store. Most people use it as a cache, but the values in themselves are structures, so a value can be a hashtable, list, queue, sorted set, etc. There are predefined commands that kind of look like assembler.
    • (24:45) Jon says he remembers running into some objects that were difficult to serialize. Robert says that the default formatter for OrigoDB is the binary formatter, and you have to mark your objects as serializable. If you use Protocol Buffers, you have to define a mapping.
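
The persistence approach described in the notes above (log each command to disk, then on startup load the latest snapshot and replay the commands that came after it) can be illustrated with a small sketch. This is not OrigoDB's API, and it is written in Python rather than C# purely for illustration. The `Engine` class, the `HANDLERS` table, the JSON-lines journal format and the file names are all assumptions made up for this example.

```python
import json
import os
import tempfile


def deposit(model, account, amount):
    """Hypothetical command: add money to an account."""
    model[account] = model.get(account, 0) + amount


def transfer(model, source, target, amount):
    """Hypothetical command: move money between two accounts."""
    model[source] = model.get(source, 0) - amount
    model[target] = model.get(target, 0) + amount


# Commands are journaled by name, so replay needs a name -> function table.
HANDLERS = {"deposit": deposit, "transfer": transfer}


class Engine:
    """Holds a plain in-memory model and journals every command to disk."""

    def __init__(self, directory, handlers, initial_model):
        self.journal_path = os.path.join(directory, "journal.jsonl")
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.handlers = handlers
        self.model = initial_model
        self.applied = 0  # number of journal entries reflected in self.model
        self._restore()

    def _restore(self):
        # Step 1: load the most recent snapshot, if one exists.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                snap = json.load(f)
            self.model, self.applied = snap["model"], snap["applied"]
        # Step 2: replay only the commands journaled after that snapshot.
        if os.path.exists(self.journal_path):
            with open(self.journal_path) as f:
                for index, line in enumerate(f):
                    if index >= self.applied:
                        name, args = json.loads(line)
                        self.handlers[name](self.model, *args)
                        self.applied = index + 1

    def execute(self, name, *args):
        # Journal the command (a few bytes) before touching memory.
        with open(self.journal_path, "a") as f:
            f.write(json.dumps([name, list(args)]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        result = self.handlers[name](self.model, *args)
        self.applied += 1
        return result

    def snapshot(self):
        # Write the whole model plus the journal position it corresponds to.
        with open(self.snapshot_path, "w") as f:
            json.dump({"model": self.model, "applied": self.applied}, f)


def demo():
    with tempfile.TemporaryDirectory() as directory:
        engine = Engine(directory, HANDLERS, {})
        engine.execute("deposit", "alice", 100)
        engine.snapshot()
        engine.execute("transfer", "alice", "bob", 30)
        # Simulate a restart: a fresh engine rebuilds state from disk.
        restarted = Engine(directory, HANDLERS, {})
        return restarted.model


if __name__ == "__main__":
    print(demo())
```

In this sketch the restarted engine ends up with the same balances as the original one, because the snapshot covers the deposit and the journal replay covers the transfer. A real engine would also need to handle failing commands, concurrent readers and journal truncation, none of which are attempted here.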

Show Links:

  • Kevin Coulombe


    OrigoDB seems like a pretty cool solution, but two worries jumped to mind as I listened to this.

    1. If we’re running C# code that is provided by the client on the database server, we’re opening a huge door. Depending on the project, I’m not sure I would be comfortable with that. This talk skipped over authentication, permissions and security altogether. I would be curious to hear more about that.

    2. Restoring a backup from the last snapshot using the operation logs will take about as long as the operations took. With a nightly snapshot and a server used at 10% on average, OrigoDB’s worst-case scenario is a 2.4-hour restore time (10% of 24 hours). It can be mitigated with replication, but can a new slave be booted and copy the data from the master database without taking a read lock on it in the middle of the day?

    It is interesting to think about an in-memory database that is not just a cache for a traditional database. I can think of a few problems this would have spared me over the years…