Herding Code 185: Glenn Block on Splunk

At NDC, Jon and K. Scott talk to Glenn Block about Splunk.

Show Notes:

(00:40) Jon asks Glenn what Splunk does. Splunk has a product that gathers operational intelligence. It’s got a data analytics platform which understands a lot of log formats. It can handle streaming logs and has a bunch of APIs. It can index in realtime, handles unstructured data, and has some advanced pattern matching features.
(02:12) Glenn talks about some common uses. GitHub and Target both use Splunk. It’s especially liked by IT Admins who can query across multiple servers by timeslice in realtime. There’s a customizable dashboard to surface the information.
(03:24) Glenn says that since Splunk has a powerful API, you can push data into it. You can push data in using HTTP or TCP.
(04:01) You can teach Splunk to fetch data from a source using their app platform. Glenn talks about an Azure app he built for Windows Azure Web Sites diagnostics.
(05:39) Splunk is available in the cloud, but it’s often run on premises. It’s cross-platform. It doesn’t store the data, it just indexes it.
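
To make the "push data in over TCP" point from 03:24 concrete, here is a minimal sketch in Python. It is not taken from the show or from Splunk's SDKs: it assumes a Splunk TCP input has already been configured, and the host, port 9999, and the JSON field names are all hypothetical placeholders.

```python
import json
import socket
from datetime import datetime, timezone


def format_event(message, **fields):
    """Build one newline-terminated JSON event line (field names are illustrative)."""
    event = {"time": datetime.now(timezone.utc).isoformat(), "message": message}
    event.update(fields)
    return json.dumps(event) + "\n"


def send_event(line, host="localhost", port=9999):
    """Send one event line over a plain TCP socket.

    Assumes something (for example a Splunk TCP input) is listening on
    host:port; both defaults are placeholders, not values from the show.
    """
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(line.encode("utf-8"))


if __name__ == "__main__":
    # Only format and print here; call send_event(...) once a listener exists.
    print(format_event("user login", user="alice"), end="")
```

One event per line keeps the stream easy to split on the receiving side, which fits the stream-oriented design Glenn describes later in the show.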

(06:44) Glenn says the pricing is based on data throughput. They have a free license that gives you 500MB/day, a developer license that gives you 10GB/day for a limited time, a free cloud product called Splunk Storm which gives you 20GB/application for 30 days, and a new enterprise product called Splunk Cloud running in AWS. The enterprise cloud product is especially useful for AWS hosted apps.
(08:20) Jon asks if there’s a planned cloud hosted offering for Windows Azure. Glenn says he’s pushing for it, but in the meantime it’s pretty easy to install it yourself.
(08:58) K. Scott asks about what he’d see if he used Glenn’s Azure app on a Windows Azure Web Site. Glenn lists some of the data and sources.

(10:03) K. Scott asks about the process of writing a Splunk app. Glenn talks about all the language-specific SDKs they support and describes the process.
(11:20) K. Scott asks how they support so many languages in Splunk. Glenn says it’s pretty Unixy in that it works with streams, so all the language-specific SDKs work with that.

(12:25) Jon asks about some real world examples of things people are monitoring. Glenn talks about a recent DSL-like feature called data models, which allows business analysts to search through the data and graphically pivot on it. One of the places people use that is for monitoring the entire dev lifecycle. Security auditing is a huge use case. 50% of the Fortune 100 uses Splunk. Glenn gives an example of how one of his co-workers wrote a Node app using Firebase’s bus feed to show a realtime map of bus locations.
(16:00) Jon says this seems to blur the lines between logs and event sourcing. Glenn says it’s not just a log platform, and works really well with evented data.

(16:44) Jon asks what technologies it runs on, and if it’s using Hadoop. Glenn says Hadoop’s great, but not for realtime. They do have a product called Hunk which can access Hadoop HDFS information, though. It’s mostly C++ and Python (Django). They’ve recently rolled out an app framework which makes it easy to customize Splunk using Django. There’s no database, since Splunk really just maintains indexes to data from other sources.

(19:25) Jon asks Glenn what he does in his free time. Glenn talks about the book he (and friends) are just finishing, called Designing Evolvable Web APIs with ASP.NET. It focuses on building a real hypermedia-driven system with ASP.NET Web API.
(20:35) Jon asks about versioning: are they using headers, URLs, etc.? Glenn says their argument is based on using additional media types and hypermedia. Hypermedia makes it easier to evolve your API because your clients are following links, not using hardcoded URLs.
(22:15) Jon says hypermedia sounds great, but developers often want to follow defined links. Glenn says he doesn’t think of it as a magical automaton, but both developers and code can look for new links as they’re added.
(23:40) Jon says it’s harder to evolve APIs if you’re thinking RPC style, but once you’re focused on resources it’s easier. Glenn says this pattern has worked great for the web: clients just ignore things they don’t understand. Jon and Glenn say this is also similar to the move from relational databases to document databases.
(24:30) Glenn says it’s exciting to finally see some hypermedia APIs coming out: PayPal, GitHub, Amazon’s streaming APIs, and NPR’s recent API updates based on hypermedia.
(25:30) Glenn says the book doesn’t try to convince you that this is the only way, just shows the benefits. K. Scott says this sounds really useful to move from the theoretical to some concrete examples.
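
The "follow links, don't hardcode URLs" idea from 20:35 and 23:40 can be illustrated with a small Python sketch. This is not code from the book or from ASP.NET Web API: the response shape, the "links" key, and the rel names are all assumptions made up for illustration.

```python
def find_link(resource, rel):
    """Return the href for a link relation, or None if the server did not offer it."""
    for link in resource.get("links", []):
        if link.get("rel") == rel:
            return link.get("href")
    return None


# Hypothetical response from an early version of an API.
order_v1 = {
    "id": 42,
    "status": "open",
    "links": [
        {"rel": "self", "href": "/orders/42"},
        {"rel": "cancel", "href": "/orders/42/cancellation"},
    ],
}

# A later version moves the URLs and adds a link the old client has never seen.
order_v2 = {
    "id": 42,
    "status": "open",
    "links": [
        {"rel": "self", "href": "/v2/orders/42"},
        {"rel": "cancel", "href": "/v2/orders/42/cancel"},
        {"rel": "refund", "href": "/v2/orders/42/refund"},
    ],
}

if __name__ == "__main__":
    # The same client code works against both versions: it looks up the
    # "cancel" relation and simply ignores links it does not understand.
    for order in (order_v1, order_v2):
        print(find_link(order, "cancel"))
```

Because the client depends on the relation name rather than the URL, the server can reorganize its URL space or add capabilities without breaking existing clients.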

Show Links: