XQuery and an XML Database
Recently I wrote about using XQuery with the Federal Reserve Economic Data (FRED) API. I also complained a bit about sluggish performance (pdf), but noted by way of explanation that I was using a creaky old server and that XQuery was processing XML files and not a database.
Since then I installed Oracle’s BerkeleyDB XML database and duplicated one of the performance tests I’d done with the FRED data. Response time in the test dropped from 195 seconds to 2 seconds. Not bad; much more consistent with what I’d expect from SQL queries against a similarly sized relational database.
These are not precise performance tests by any means. For example, the XQuery processor I used with the original FRED data was Zorba, but XQilla is the XQuery processor that comes bundled with BerkeleyDB XML. So I changed at least two important factors at once (the storage layer and the query processor), and unless you change one factor at a time it is pretty much impossible to attribute causation.
However, I rather doubt that the choice of XQuery processor had much to do with the performance improvement. Simply moving the XML into the database and querying it with XQilla was not enough; performance only improved once I created database indexes for the XML.
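To give a feel for why an index could make that kind of difference, here is a toy sketch in Python. It is not how BerkeleyDB XML implements its indexes, and it does not use the FRED data: the `observation` element, its `date` and `value` attributes, and the sample values are all made up for illustration. It simply contrasts checking every XML fragment in turn with looking the answer up in a structure built once ahead of time.

```python
import xml.etree.ElementTree as ET


def make_docs(n):
    """Build n tiny XML fragments (hypothetical element and attribute names)."""
    docs = []
    for i in range(n):
        xml = f'<observation date="d{i}" value="{i * 0.5}"/>'
        docs.append(ET.fromstring(xml))
    return docs


def find_by_scan(docs, date):
    """No index: inspect every fragment, so cost grows with the collection."""
    return [d for d in docs if d.get("date") == date]


def build_index(docs):
    """Build a date -> fragments mapping once, up front."""
    index = {}
    for d in docs:
        index.setdefault(d.get("date"), []).append(d)
    return index


def find_by_index(index, date):
    """With an index: one dictionary lookup instead of a full pass."""
    return index.get(date, [])


docs = make_docs(1000)
index = build_index(docs)

scan_hits = find_by_scan(docs, "d742")
index_hits = find_by_index(index, "d742")

# Both approaches find the same fragment; only the work done differs.
print(scan_hits[0].get("value"), index_hits[0].get("value"))
```

The scan touches all 1,000 fragments for every query, while the indexed lookup pays the cost of building the mapping once and then answers each query directly. A real XML database index is far more sophisticated, but the trade-off is the same in spirit.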
At this point I am still very much playing; just trying to use XQuery in a variety of environments to better understand what it can and cannot accomplish. If you’re a researcher, for example, it doesn’t much matter if it takes 195 seconds for a data integration step in an entire research process that may take days, weeks or longer. However, 2 seconds is much preferred if you’re doing ad hoc data mashups in a browser.
From what little I’ve seen of Oracle’s BerkeleyDB XML database, it’s an interesting product, available under either a commercial or a free open source license. It’s an embedded database accessed through a programmatic API and linked libraries. It is not a relational database, a database management system, or a database server. It does, however, offer a lightweight solution when performance and data persistence are critical. Disclaimer: I have no association with Oracle, BerkeleyDB, or BerkeleyDB XML.
I realize this blog post will only interest a very few people, but from time to time I may write similar posts that document my playing with XQuery. I continue to like what I see. Truthfully, I’m not certain where it’s headed, other than to say that the driver is free learning for everyone everywhere.
