Herding Code 217: Nick Craver on Stack Overflow Engineering

The guys talk to Nick Craver about all the magic behind the scenes at StackExchange global headquarters.

Download / Listen: Herding Code 217: Nick Craver on Stack Overflow Engineering

Show Notes:

  • Hello
  • Architecture / Network
    • (01:58) Jon notes that they’re using Windows and CentOS and asks why CentOS as opposed to other Linux flavors. Nick says they tried Ubuntu first, but it’s more tuned for clients. CentOS is a variant of Red Hat Enterprise Linux, so all the packages work.
    • (02:53) Nick says that they use whatever’s the best tool for each job – factoring in the costs of each new tool (training, migrating, supporting, vendor overhead, etc.). They run Elastic Search, Redis, HAProxy and Logstash.
    • (03:50) Jon asks about how they’re using protobuf to serialize the information they’re persisting Redis. Nick talks about some specifics, including the different levels of caching that they’re doing. They’re using pub / sub with Redis via websockets. They’re not clustering, partly because there’s one Redis cache per SE site. He and Jon discuss how multitenant scenarios often require custom implementations.
    • (07:20) Jon asks about how they’re using websockets. Nick says that they’re used in a lot of places for optional updates. Their problems come from running on very few servers – they end up with huge connection tables per server.
    • (08:23) Jon asks about one of their Redis instances that’s handling machine learning with Providence. Nick says they log some metadata and performance information about about every single request. Providence is a system their data team wrote that analyzes the data, figures out locations, suggested user tags, etc. As a user, you have control over the personalization and can download the data or disable recommendations if you want. There’s also a mobile feed, for mobile apps as well. There are 40k ops/sec all day long against Redis. Scott K asks if he can manipulate his feed.
    • (11:51) Scott K asks about the L1 and L2 cache that Nick’s talked about. Nick clarifies that he’s been referring to HTTP caching on the web server and Redis caching.
  • Tag Engine
    • (12:36) Nick talks about the tag engine and explains how it’s a lot more complex than people think since it handles some complex queries, needs to sort the results, has minimum question score filters, etc. Every two minutes, the tag engine refreshes all the deltas based on row versions in the SQL Server database.  Their previous implementations were saturating the CPU L3 cache. They wrote some affinity code to split things out so Providence and Stack Server run on different processors. Marc Gravell is now working on getting the tag engine working on GPU’s (demo code). Nick said that the difference between a $500 to a $5000 card is only about a 40% improvement. They’re really excited about upcoming graphics cards with 8GB memory, since that will allow them to fit the entire tag engine in memory on the graphics card; with two cards they’ll be able to run active-active and active-passive configurations (for live upgrades) for only about $600 per card. Jon says he saw a question on Hacker News asking if it could just be handled by a bitmap index, but Nick explains how more complicated queries and sorting make that not a good solution. Jon asks about the GPGPU split between C# and C++ code. Nick explains how it’s almost all handled via C# code, but it’s missing a few newer CUDA commands.
    • (19:48) Jon asks about the Elastic Search implementation. Elastic Search doesn’t really support types, it has field groupings, which makes the upgrade more difficult. Nick explains that things are pretty vanilla now, but they’d like to make some customizations to support nested search results when time permits.
  • Data and SQL Server
    • (21:03) Jon asks about their SQL Server implementation. Nick talks about the clustering setup.
    • (21:49) Jon asks what version of SQL Server they’re using. Nick says they’re currently running the latest version of SQL Server 2014 and will move to SQL Server 2016 as soon as it releases. [Note: SQL Server 2016 has since released and they’ve upgraded.]
    • (22:11) Nick talks about some of the top reasons they’re looking forward to SQL Server 2016: string_split and JSON parsing. These are both useful for queries that take a list as a parameter. Jon reminisces about a time long ago when he used XML to pass lists to SQL queries.
    • (23:32) Kevin asks if they’re able to do a piecemeal migration without downtime. Nick explains how they do upgrades using replicas. They can test on other replicas, then fail over to them, or roll back to the previous master. They hate Windows clustering, and Windows Server 2016 and SQL Server 2016 should soon support distributed affinity groups which would allow them to do simple affinity group based upgrades.
    • (26:15) Jon asks about SQL Server on Linux. Nick says he can’t really talk about it.
    • (26:35) Jon asks if they use SQL Server Hekaton / In-Memory OLTP. Nick says they don’t, they run enough memory in their database servers that it’s not needed. Nick says it’s more for high-frequency no-lock access.
    • (27:27) Jon is a little horrified that they don’t use stored procedures and do all database queries using inline SQL. Nick explains that it’s much simpler to modify queries and app code that consumes the query together rather than managing separate deployments. There’s no performance benefit to using stored procedures over inline queries.
    • (29:27) Jon asks about how they handle migrations. Nick explains how changing tables is handled via a migration file and a ten line bot.
  • Source Control, Localization, Build
    • (30:47) Jon asks about their use of GitLab. Nick says it works okay, but they’re testing GitHub Enterprise internally due to performance. GitHub is significantly faster, search works a lot better (due to using Postgres search rather than Elastic Search), and there are some nice new features in GitHub like squashing commits.
    • (32:23) Jon asks about the localization features and is educated about the ja, ru, pt and es versions of stackoverflow. Nick explains some of the different localization issues that you run into in localization. Most localization solution work by string replacement, which requires string allocations. That doesn’t work at scale. They’ve written a system called Moonspeak which uses Roslyn to precompile view. This allows a direct response.write of the localized string via switch statements, which is a lot more efficient. They haven’t had time to open source it yet, but would like to.
    • (36:36) Jon asks about their build process using MSBuild. They use it mostly because it’s what the tooling uses. They could customize it more with PowerShell, but that would tie them more to TeamCity and Nick’s not sure there’s a benefit to making that move. Nick’s waiting to see where csproj is going – he’s got some big doubts it’ll be as terse as project.json, but he’s interested to see. Nick says historically MSBuild has been optimized for three-way merge generation; Jon says that was technically Visual Studio’s fault since MSBuild actually has had glob support for a while.
  • Upcoming Technologies, Visual Studio
    • (40:35) Nick complains about how slow Visual Studio is to reload projects. Their developers have scripts that just kill and restart Visual Studio, because that’s faster than handling project reloads.
    • (41:16) Jon asks if Nick’s played with Visual Studio “15”. Nick wonders about the technology used in the installer. He says it’s generally good, but they’re running into some issues with solution files changing when moving between versions.
    • (42:33) Nick says that they generally don’t ever use File / New, they copy an existing project and rename things. There’s a discussion about whether it’s possible to customize project templates. Jon says you can export a project as a solution template; K Scott mentions that SideWaffle has some capabilities there, too, but there was some “wonkiness”. And what’s the deal with GUIDs in project and solution files?
    • (46:09) Scott K mentions a command line base project generator that he started on years ago called ProjectStarter. He wishes that it was possible to configure Visual Studio to define a custom build tool rather than assuming everything’s in csproj. He gets that Visual Studio features like IntelliSense depend on controlling the build, but doesn’t like that Visual Studio has to “know everything about everything”.
    • (48:55) Jon says he sees two ways that cross-platform can work: either make the frameworks able to work without the tools knowing and controlling everything, or updating Visual Studio Code so it’s able to know and control everything. He hopes it’s the first way.
    • (49:50) Nick complains about how sometimes in-memory builds don’t reflect changes, or csproj doesn’t save before a build. He’d like everything to save before builds. Scott K calls out the Save All The Time extension for Visual Studio that Paul Betts made.
    • (50:40) Jon asks Nick if they’ve looked at ASP.NET Core. Nick says that they’ll mostly be starting with their internal tools. They have several libs that they’ll need to port, and they’ve got some difficult problems with libraries like MiniProfiler that need to support both .NET 4.x and .NET Core because the underlying APIs have some significant differences. You can’t just multi-target code that targets things like HttpContext. Other libraries like dapper and stackexchange-redis haven’t been as bad, and they’ve been working on them because lots of other developers are depending on them.
    • (55:03) Jon calls out some of Nick’s recent C# 6 tweets. Nick says he likes null coalescing and ternaries – they see more terse code as a lot more readable, but it of course varies by team. Roslyn has been really big for them, things like Moonspeak rely on it.
  • Questions from Twitter
    • (56:23) Matt Warren asks “What performance issues have you had the most fun finding and fixing.” Nick mentions the tag engine and a fun debugging issue they ran into where the TimeSpan int constructor uses Ticks rather than seconds or milliseconds, so their cache code was only caching values for tiny fractions of a second rather than thirty seconds. They find out so many issues using MiniProfiler; he wishes more developers would use MiniProfiler (or another tool like Glimpse) in their applications. They run MiniProfiler for every single request on StackOverflow and the overhead is minimal – if they can do it, you can do it.
    • (58:17) Matt Warren asks “What the craziest thing they’ve done to increase performance.” Nick talks about the IL related work they’ve done – sometimes instead of conditional code, it’s faster to just swap out the method body. They’re pragmatic, they only do this for extreme cases like things that run for every request and have real performance implications. What’s the trick for StackOverflow? Keep it simple.

Herding Code 216: Bash on Windows, Angular 2, Surface Book, Kindle Oasis, Windows Phone

Kevin, K Scott and Jon talk about Bash on Windows, Angular 2, React, new tech devices, and whether Windows Phone is alive, dead, or undead.

Download / Listen: Herding Code 216: Bash on Windows, Angular 2, Surface Book, Kindle Oasis, Windows Phone

Show Notes:

    • Bash on Windows and Windows Insider stuff
      • (00:44) Jon mentions the Bash on Windows announcement at Build and asks if Kevin or K Scott have played with it. This devolves into a discussion of Windows Insider previews. Jon likes it and talks about the steps for enabling Windows Insider preview builds. K Scott has been scared to try it, but it sounds like he’s convinced. Kevin is put off by the Insider term – Windows Insider, Visual Studio Code Insider previews, etc. K Scott adds “Windows Insider” it to his e-mail signature.
      • (05:15) Jon talks about the steps for enabling Bash on Windows. or Windows Subsystem for Linux (Beta), to use the official terminology.
      • (07:08) Feel the excitement of listening to someone type commands into a console window as Kevin asks questions about what’s installed and Jon tries to apt-get it all. K Scott and Kevin wonder about how things like filesystem and processes work, and Jon tries to make up answers.
      • (09:50) Kevin says it feels like an admission of defeat to add *nix support to Windows. Jon says it feels practical to him – developers are building for multiple operating systems (especially including mobile), so it’s nice to have it supported.
      • (11:19) Kevin’s ready for Cygwin to die in a fire, and Jon’s excited about ssh working less horribly on Windows. Kevin says the race is on to get Wine working on Bash on Windows.
    • Angular 2 and React
      • (12:55) Jon worked on a hands on lab for Build that had master-details using Angular 2 and ASP.NET Core. He said Angular 2 seemed a lot simpler than Angular 1 now. K Scott said the component model is simpler, but he’s seeing some resistance to the ECMAScript / TypeScript updates, new binding syntax, etc. The Angular 1.5 release also includes a component model that’s a much easier programming model. It almost feels like some older Microsoft component-based programming frameworks going back to Visual Basic 6: you’re working with components that have simple properties and events. The guys speculate on how soon someone will build the big visual editor for Angular 2.
      • (16:21) Kevin says TypeScript seems like a barrier. K Scott says it’s not strictly required. He rejected TypeScript for a while, but when he was working with Angular 2 and tools and editors supported it he decided he liked it. Jon and K Scott talk about how a lot of things that throw people off about TypeScript are really just modern JavaScript syntax.
      • (18:50) The guys discuss how Angular 2 and React mindshare will play out. Jon likes React as long as he never views source. Jon thinks the unidirectional flow is really simple, and Kevin agrees – after years of lower level Backbone, the simpler flow in React saves some mental energy.
      • (20:29) Jon mentions that React Native recently came to Windows, too.
    • Devices: Surface Book, Kindle and big batteries
      • (20:56) K Scott got a new Surface Book (after waiting to make sure nothing new was coming out at Build). He says it’s the best piece of PC hardware he’s bought in years – the build quality is good, the keyboard is good, he gives the trackpad of 9 out of 10. He says that the detachable tablet is a bit large as a tablet, so he’s using an older Surface for reading.
      • (23:58) Jon jokes that K Scott’s not likely to buy a Kindle and says he gradually stopped using his Kindle when he moved to Audiobooks, and kind of associates reading with work now. Kevin says that’s sad. They talk about Kindles for kids’ books.
      • (27:07) Kevin says the two big things he picked up for the new Kindle are physical page buttons and a three month battery. He says the main thing he likes about Kindle for both him and his kids (as opposed to a tablet) is that it forces you to read instead of getting distracted. The guys decide that tripling the life of an already one month battery isn’t a huge win.
      • (29:20) Jon says he recently bought a portable battery that can recharge his laptop, which is handy for long flights. (note: he said it was iPad size, it’s a lot smaller but is 1.2 pounds)
      • (31:15) Jon asks Kevin about new Apple hardware. Kevin says the iPad Pro screen is apparently astounding – he’s expecting them to be amazing in a few generations. Same for Apple Watch – he doesn’t have one yet, he’s waiting for version two. Jon says the improvement from Microsoft Band 1 to Band 2 was pretty nice, especially in the industrial design.
    • Lightning Round: Will Windows Phone be dead in one year?
      • (33:40) K Scott asks if Windows Phone will be dead in a year. Jon hopes not, as he just bought a Lumia 950 XL. He’d had a budget Blu Windows Phone since September, and the Windows 10 Insider builds were nice, but the camera wasn’t very good. He really likes Windows 10 as a phone operating system and thinks it’s sad that so few people will actually see it.
      • (35:55) K Scott got a Lumia 950 XL in January (when he dropped something on his previous phone). He got the docking station, too, and said it worked surprisingly well. The guys discuss how useful docking a phone is; Jon postulates that it could be useful for someone who does everything on their phone and occasionally needs to write a long email or edit their resume – especially if it can be hooked up to a TV.
      • (38:30) Kevin says that Windows Phone isn’t dead. it’s undead. It will linger on in a zombie-like state indefinitely.
    • Scott Koon sends us out with a request for further information by e-mail.

Links:

Herding Code 215: Jon McCoy on .NET Security and Defensive Patterns

At NDC Oslo, Jon talked to Jon McCoy about .NET security and defensive patterns for building enterprise applications.

Download / Listen: Herding Code 215: Jon McCoy on .NET Security and Defensive Patterns

Show Notes:

    • Security Patterns
      • (0:15) Jon Galloway (henceforth in this post JG) mentions the last time Jon McCoy was on the show and asks him about his talks this time around. This time, he and Topher Timzen did a hands on attack class followed by talks on defensive patterns for enterprise applications (video: Hacking .NET(C#) Application: Building and Breaking Layered Defense).
      • (00:54) JG asks Jon if he showed off any scary hacks. Jon describes an attack in which they made an executable editable both on disk and in memory, then edited both the IL and assembly code to do things like inject direct database attacks (bypassing . He describes how that could be defended using enterprise defensible architecture by talking to databases through services which can implement security layers. The goal is to prevent an attack at one layer from moving through the rest of the enterprise.
      • (1:59) JG says that often hears that major hacks occur by web application attacks that are then escalated to database attacks, often through password reuse. Jon says that’s true, since web applications are often deployed to the same server as a database or authentication server. He recommends using a service that’s locked to a single port, with security unit tests.
      • (2:43) JG asks which patterns he’s describing are unique to .NET development. Jon says that he’s emphasizing patterns that are easy on .NET – for example,  REST services are easy to implement on .NET as compared to C++. He’s advocating architectural changes that are relatively easy to implement in .NET applications provided you start with them early on (rather than trying to retrofit security later).
    • A specific example: Protecting a medical record
      • (3:33) JG asks for some specific examples. Jon says they talked about security unit tests and user stories and gives and example from his talk about a medical record that’s being sent through an enterprise securely. To do that, you’d need to encrypt it on an edge node, so the web server and database don’t have decrypted data or decryption keys. Instead, you use a key server on a segmented network. Because of this, at no point could a sysadmin have gotten access to the record because it was encrypted at all steps.
    • "Above Admin"
      • (4:49) JG points out that Jon is talking about preventing access to sensitive data by sysadmins. Jon says that you should consider your attackers to have more power than a sysadmin – they refer to attacker privileges as "above admin" because they’ve taken over your AD infrastructure, passwords on routers, dropping new firmware, etc.
      • (5:50) JG refers to James Mickens’ keynote from the previous night (Herding Code interview here) and asks if there’s any hope. Jon says you need to plan, you need to minimize pivot points to prevent an attacker from moving between servers. He discusses potentially using static HTML that calls into secured REST services rather than web applications with direct database access, because building security around a REST service is much easier than securing a web application and web server.
    • Mobile and desktop application security
      • (9:03) JG asks about mobile application security, since he frequently hears about mobile apps that are supported by unsecured (or very poorly secured) backend services. Jon refers to Amazon and Twitter as examples of companies with published patterns for secured backend services.
      • (9:52) JG asks if for some tips on how different systems on the network should be secured, referring to desktop applications. Jon says that each system should defend itself from the other systems, so in this case the other systems should assume that the desktops could be compromised, the desktop applications should assumed the database can be compromised, there needs to be thought about defending the outgoing APIs, etc. There needs to be a plan for how to take things down and respond.
    • What do you wish you’d done?
      • (10:35) As a thought exercise, assume that your database or web server has been compromised: what do you wish you’d have done? Do that for different pieces of your application architecture. Jon says that you can make it a fun exercise, pit dev teams against each other, etc. Security should be fun and easy if you’re doing it right.
    • Wave 3D: A 3D operating system front end
      • (12:07) JG asks Jon about the 3D operating system he’d mentioned before the call. Jon describes Wave 3D, a cross platform front-end to multiple operating systems (Windows, Mac, Linux, Android) that gives the same exact experience for all backend systems. It can be connected to Amazon, Google and Dropbox document storage.
      • (12:56) Jon says it’s running on Mono and Unity and says it compares pretty well with 3D operating systems in the movies, and they’re looking at launching it via Kickstarter. It provides a simple, install-free environment with a document viewer, movie player and more on every platform instantly.

Note: This is our last podcast from NDC 2015.

Herding Code 214: Todd Gardner on Tracking JavaScript Errors with {Track:js}

At NDC Oslo, Jon talked to Todd Gardner about {Track:js} and some of the difficulties in diagnosing JavaScript errors.

Download / Listen: Herding Code 214: Todd Gardner on Tracking JavaScript Errors with {Track:js}

Show Notes:

    • What’s TrackJS?
      • (0:20) Jon asks Todd to explain TrackJS. Todd said it’s something he built to help with difficult JavaScript debugging issues a few years ago. JavaScript applications were difficult to debug because it was hard to define the different failure states of his applications. Over time, they worked to add in diagnostic state (e.g. shopping, checkout, etc.). Over time, they added in network state and user state. Over time they came up with telemetry, which defines states and state changes, which can be a lot more helpful than a stacktrace in diagnosing and fixing an error.
    • Why context is more important than stack traces
      • (2:34) Jon agrees, saying that it’s really frustrating to try to reproduce an error as a developer, and context is really important. Todd says that stack traces are limited because applications are so asynchronous in nature now – an error may be from a timeout which was a response to an Ajax event which was a response to a user click.
      • (3:44) Todd says they also track a lot of general information like browser, operating system, and things like date and time information. He once spent a really long time on an error one user was experiencing on a calendaring application because they were stuck in the ’70’s (literally).
    • How TrackJS gathers and exposes data
      • (4:34) Jon asks how developers integrate TrackJS into their applications. Todd says it’s a simple JavaScript include.
      • (4:56) Jon ask more about how he gets access to the reported information when he’s troubleshooting an error. Todd says there are three problems to solve.
        • The first problem to solve is alerting: you get daily rollups on top errors and trends.
        • Second, when you log in, you need to see which errors are which important (filtering out background noise like unsupported useragents, strange Chrome extensions that modify the DOM, etc.). You need to know which errors impact the most people, which cause users to leave, and which are top browser-specific errors.
        • Third, you need to need to be able to dig into and diagnose a specific error.
      • (6:45) Jon asks more about how he’d diagnose a specific error. Todd describes the contextual information TrackJS provides for an error (including things like browser, screen size, user information). For example, one issue they diagnosed recently only occurred at a specific screen size due to a responsive design which modified the page so certain elements were only displayed in a range of screen widths.
    • Common problem: Partial Load Failure
      • (8:14) Jon asks Todd what running the service has shown him are common problems that developers aren’t considering in building JavaScript apps. Todd says it’s partial loading failure – cases where most of the JavaScript assets on a page load, but just one fails. In one instance, they saw everything on a page but the Stripe library load, which caused payment data to be insecurely posted.
      • (10:35) Jon asks about the best way to test for and handle partial load failures.
    • Common problem: Same Origin Policy
      • (11:36) Todd explains same origin policy restrictions in loading loading libraries from other domains, and using cross-origin headers to handle that.
    • Business model and subscription overview
      • (12:30) Jon asks about the business model for TrackJS. Todd explains that they offer free service to charitable and open source projects. For paid accounts, they bill based on page views rather than error count because he feels billing based on errors creates counter-incentives for users.

Herding Code 213: Sean Trelford on Composing 3D Objects with F# and OpenGL

At NDC Oslo, Sean Trelford did a lightning talk on composing 3D objects using F# and OpenGL. Oh, and he’s 8 years old. Sean (and his dad, Phil) talk to Jon about learning coding with Small Basic and F#, and how it’s fun to learn coding by building video games.

Download / Listen: Herding Code 213: Sean Trelford on Composing 3D Objects with F# and OpenGL

Show Notes:

    • Hello
    • Sean’s Talk
      • (1:13) Jon asks Sean whether he showed slides or did live coding.
      • (1:39) Sean made an army from one 3D man model.
      • (2:09) Jon asks how difficult it is to call OpenGL from F#. Sean’s dad, Phil, explains how Sean used a domain specific language – about 50 lines of code. You can use it to create shapes like cubes and spheres, color them, transform them, etc.
      • (3:30) Phil explains how Sean used the F# REPL to create and modify 3D objects interactively, then used functional composition to put the parts (head, arms, legs) together to build the army man, the used functional composition and aggregation to create the army.
    • Learning Small Basic and F#
      • (4:37) Jon asks Sean if F# is his first computer language; Sean says he started with Small Basic. Jon asks if he found it difficult to move from Small Basic to F#, Sean and Phil explain how F# is actually a pretty natural step from Small Basic. Small Basic has no functions with parameters, so when they had a project that ran into that limitation they just moved the code from Small Basic to F# and used the Small Basic library to construct 2D shapes (leveraging WPF).
    • Fun Basic
      • (7:02) Phil said that last year he and Sean rewrote Small Basic as Fun Basic (available free in the Windows Store). It’s an IDE with IntelliSense, code completion, etc. It’s got backward compatibility with Small Basic, but includes function with arguments, pattern matching, etc.
    • Learning programming with video games
      • (8:02) Jon asks about what next for Sean. Recently they went to a game jam and made a platform game. Phil says the REPL environment is huge for game building.
      • (9:04) Jon mentions the IoT lab at NDC and asks Sean if he’s done anything with IoT; Sean says no. Jon says it seems like people try to get kids interested in coding with IoT, but maybe games are more fun to get started. Phil says that Sean’s got a Minecraft channel on YouTube. Jon asks Sean about making videos about making games.
      • (10:38) Phil says he got started on computers with games – he made his first game when he was 11, but his first commercial game was when he was 14 – Flint Eastwood.
      • (11:44) Jon asks Sean if this is his first time in Olso, and his first conference. It’s his first time in Oslo, and his first "speaking" oriented conference.
      • 11:28 Jon asks if they’ve got any last messages, maybe to encourage kids to get started with coding. Phil recommends the Hour of Code, as well as other resources online. Phil says his older son got into programming by adding levels to games, then writing scripts for AI’s, etc. Phil says that’s how he got started – looking at other peoples’ code and going from there. Don’t necessarily get hung up on fancy coding concepts, just script something and have fun!