While K Scott and Jon were at the Microsoft MVP Global Summit, we listened in on a late night debate on NHibernate performance between Oren Eini (a.k.a. Ayende Rahein), David Penton, and Ben Scheirman.
Show Links:
Download / Listen:
Episode 38: NHibernate performance with Ayende, David Penton, and Ben Scheirman
Audio clip: Adobe Flash Player (version 9 or above) is required to play this audio clip. Download the latest version here. You also need to have JavaScript enabled in your browser.
Mar 13
This entry was posted on Friday, March 13th, 2009 at 11:51 amand is filed under interview, podcast. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.
5 Comments Episode 38: NHibernate performance with Ayende, David Penton, and Ben Scheirman
Reflective Perspective - Chris Alcock » The Morning Brew #307
March 16th, 2009 at 1:16 am
[...] NHibernate performance with Ayende, David Penton, and Ben Scheirman – The Herding Code podcast listens in on a late night debate on NHibernate peformance, and publishes the discussion – I’ve not yet listened to this, but certainly am looking forward to it [...]
jswanson
March 16th, 2009 at 12:04 pm
Some very interesting points later in session. First part was painful to listen to. David jumps into details of very specific issues without exploring big picture first.
Nicholas Whitehead
March 18th, 2009 at 5:21 am
Very interesting podcast. I am primarily a Java developer and I have made heavy use of Hibernate/Oracle. ( I plan on becoming a .Net expert by listening to lots of .Net podcasts. Any one see an issue with that ?)
The friendly-and-professional face-off between ORM using developers and DBAs is very familiar to me, and while I operate almost entirely on the Dev side, I was an Oracle DBA in a former life. Here’s some observations which I believe will be also relevant to NHibernate and SQLServer (or almost any RDBMS):
Background: We had a CRM like database. The model was quite simple with less than 20 tables, but several of them were big (10+ million rows) and we developed a pseudo-Query-By-Example web interface as part of a larger application. All the key tables were modeled in Hibernate and we used JBoss Cache as the underlying secondary level cache.
1. Developers implemented the QBE application purely through Hibernate using an XML defined wrapper around the criteria query API. Consequently, the database access was quite abstracted with the first few rounds of access code focused on “What” and not “How”. What I found was that Hibernate generated this incredibly complicated but surprisingly effective and elegant SQL.
2. Had the developers been using straight SQL or stored procedures, I am convinced the access would have required multiple queries and some data “massaging” to get the same data format returned to the browser. I tend to adopt the axiom that one query returning “many” rows is always preferable to multiple queries returning “fewer” rows. I do not believe that most developers would write the SQL that Hibernate generated.
3. Based on the ease of use of the XML criteria API, we were anxious to keep it, even though in early days we did experience some scenarios where the SQL generated was dirt-slow, and many others where it was slightly below adequate. Consequently, I donned my DBA hat and started to investigate what I could do on the Oracle side to better support this query engine. I think that applications that require some level of user defined queries (and indirectly, user defined SQL), it is quite difficult to squeeze out performant queries without implementing an arsenal of database side optimizations that may be beyond the norm in a more static type application. We found the combination of the following pretty much eliminated all our issues:
4. Table partitioning. We were fortunate that our database had several very natural demarcations in the form of region codes (most users only worked in one region out of many) and effective date (most users only worked on the current month’s worth of records that spanned a whole year). We created a synthetic column for effective-month and populated it with a trigger off the effective time stamp column in the same row. We partitioned tables according to this effective month and the region code. No change on the client side, massive change (for the better) in performance.
5. Higher index coverage: On a table with 18 columns, many which may be used in a query, but not always the same ones, we needed multiple (non-unique) indexes that might ordinarily look redundant or excessive. This took some trial and error but we noticed little overhead on having additional indexes, and we dropped indexes that turned out not to get much usage. I created a tool that took the raw XML query, converted to a criteria API call, executed it and generated a database execution plan. I taught the developers how to interpret it and whenever they saw some consistent issue, we investigated to see if a new index was warranted.
6. Other synthetic columns were introduced in order to create superior indexes and coverage. In some cases, but not all, this was preferable to using functional indexes, which also worked well to resolve some corner cases.
7. One specific issue emerged which revolved around the Oracle cost based optimizer and partitioned tables. I think this issue can also occur in SQLServer, based on something I heard on the StackOverflow.com podcast. The issue is that when using prepared statements (using bind variables instead of literals), the execution plan would be locked in and might be bound to a specific partition local index. The SQL engine does not notice that the next query with the same SQL but different binds will not be performant with the same plan. To resolve this, we added an option to the XML-> Criteria syntax, the ability to define an argument as a literal so when the SQL is built, Hibernate would generate the SQL with this one column argument defined as a literal instead of a bind. (usually primary keys or partition keys) Awkward at first, and arguably inelegant, it completely spanked the problem.
8. On a note related to #5 above, I think it is critical for the developers to be able see (and copy and test) the SQL that is generated by Hibernate during an application session. Although generated SQL (and binds) can be logged in Hibernate, we provided the developers a few enhanced tools to do this.
The bottom line, as always, is that the most performant systems will result from cooperation between developers and DBAs. Both can bring creativity and expertise to the table that are complimentary not contra-indicated.
Cheers. Look forward to the next one.
//Nicholas
Daniel Auger
March 18th, 2009 at 6:52 pm
This was a fantastic podcast. I really hope this spurns a more open conversation between programmers and DBAs on the topic of ORM. I look forward to seeing what input David has in the future as he could do a lot of good for the community.
Oddly enough David’s first point regarding the effect of differentiating parameter lengths on Sql Server’s execution plan optimization was something I recently asked about on the NH user group. It turns out there is a setting that negates the problem.
http://groups.google.com/group/nhusers/browse_thread/thread/7a0a341184796836#
Yesterday's news | NHibernate parameter sizes controversy
October 28th, 2009 at 8:41 am
[...] 3/13, the question arises in Herding Code podcast, episode #38 with [...]
RSS feed for comments on this post · TrackBack URI
Leave a Reply