<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Educational Imaginations &#187; web queries</title>
	<atom:link href="http://garymlewis.com/instchg/tag/web-queries/feed/" rel="self" type="application/rss+xml" />
	<link>http://garymlewis.com/instchg</link>
	<description></description>
	<lastBuildDate>Fri, 04 May 2012 11:08:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Another XQuery Use Case: Is Higher Education Countercyclical?</title>
		<link>http://garymlewis.com/instchg/2009/08/10/another-xquery-use-case-is-higher-education-countercyclical/</link>
		<comments>http://garymlewis.com/instchg/2009/08/10/another-xquery-use-case-is-higher-education-countercyclical/#comments</comments>
		<pubDate>Mon, 10 Aug 2009 17:52:12 +0000</pubDate>
		<dc:creator>Gary Lewis</dc:creator>
				<category><![CDATA[Query Tools]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[web queries]]></category>
		<category><![CDATA[xquery]]></category>

		<guid isPermaLink="false">http://garymlewis.com/instchg/?p=1172</guid>
		<description><![CDATA[In this XQuery use case, I consider whether higher education in the U.S. is countercyclical to recessions.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve completed another XQuery demonstration project (for earlier ones, see <a href="http://garymlewis.com/instchg/category/query-tools">XQuery</a>). In this version I used XQuery to assemble data to consider anew the conventional wisdom that higher education is countercyclical (ie, enrollments grow during recessions).</p>
<p>Here&#8217;s a sample graph from the report. It shows the annual percentage change in enrollments from all U.S. degree-granting institutions of higher education from 1968 to 2007. The grayscale background shows unemployment rates and periods of recessions. The report is available in this <a href="http://garymlewis.com/instchg/public/xquery/dt08_189/dt08_189.pdf">pdf</a> that includes better resolution graphics.</p>
<p><img class="alignleft size-full wp-image-1177" title="dt08_main" src="http://garymlewis.com/instchg/wp-content/uploads/2009/08/dt08_main.png" alt="dt08_main" width="500" height="357" /></p>
<p>Here is an abbreviated version of the summary section from the report.</p>
<p>1. The new XQuery use cases included screen scraping enrollment data embedded in html, and the pipelining of enrollment and economic data through several staged transformations.<br />
2. The evidence that higher education in the United States is countercyclical appears weak based on the exploratory analysis done here.<br />
3. The evidence varies somewhat by institutional control and type, but mostly the general observation of a weak association between enrollment change and recession cycles holds true.<br />
4. The tools used  in this project, XQuery and R, are wonderful research tools but are not suitable for more general use without wrappers that mute their complexity.<br />
5. The screen-scraping technique used in this project provides another access route to a vast amount of U.S. Department of Education data on the web.<br />
6. None of the data sources used in this project included semantic or linked data markup. Whether XQuery can be used successfully with RDFa or microformats seems worthy of investigation.</p>
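<p>The annual percentage change shown in the graph, and the merge of enrollment data with recession periods, can be sketched in a few lines. This is only a minimal illustration in Python, not the XQuery or R code used in the project. The function names, the enrollment counts, and the recession year below are made-up placeholders, not values from the report.</p>

```python
# Hypothetical sketch: year-over-year percent change in enrollment,
# merged with a set of recession years. All numbers are placeholders.

def annual_pct_change(enrollment_by_year):
    """Return {year: percent change from the previous year}."""
    years = sorted(enrollment_by_year)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        if curr - prev != 1:
            continue  # skip gaps so each change covers exactly one year
        before = enrollment_by_year[prev]
        after = enrollment_by_year[curr]
        changes[curr] = (after - before) / before * 100.0
    return changes


def merge_with_recessions(changes, recession_years):
    """Pair each year's change with a flag for whether it was a recession year."""
    return [
        {"year": year, "pct_change": pct, "recession": year in recession_years}
        for year, pct in sorted(changes.items())
    ]


if __name__ == "__main__":
    enrollment = {2000: 1000, 2001: 1050, 2002: 1071}  # placeholder counts
    recessions = {2001}                                 # placeholder flag
    for row in merge_with_recessions(annual_pct_change(enrollment), recessions):
        print(row)
```

<p>Keeping the change calculation separate from the merge mirrors the staged approach described in point 1: each stage takes the previous stage&#8217;s output and adds one thing.</p>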
<p>Here are links to various pieces of this project:<br />
a. <a href="http://garymlewis.com/instchg/public/xquery/dt08_189/dt08_189.pdf">Final report</a><br />
b. <a href="http://garymlewis.com/instchg/public/xquery/dt08_189/dt08_189_doc.xq">Documentation for XQuery source programs</a><br />
c. <a href="http://garymlewis.com/instchg/public/xquery/dt08_189/dt08_189_enrl.xq">XQuery to generate enrollment data</a><br />
d. <a href="http://garymlewis.com/instchg/public/xquery/dt08_189/dt08_189_econ.xq">XQuery to generate unemployment and recession data</a><br />
e. <a href="http://garymlewis.com/instchg/public/xquery/dt08_189/dt08_189_final.xq">XQuery to merge enrollment, unemployment, and recession data</a><br />
f. <a href="http://garymlewis.com/instchg/public/xquery/dt08_189/dt08_189_final.Rhistory">R analysis history</a></p>
<p>I remain quite satisfied with XQuery. </p>
]]></content:encoded>
			<wfw:commentRss>http://garymlewis.com/instchg/2009/08/10/another-xquery-use-case-is-higher-education-countercyclical/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>XQuery as a Web Query Tool</title>
		<link>http://garymlewis.com/instchg/2009/06/19/xquery-as-a-web-query-tool/</link>
		<comments>http://garymlewis.com/instchg/2009/06/19/xquery-as-a-web-query-tool/#comments</comments>
		<pubDate>Fri, 19 Jun 2009 11:52:03 +0000</pubDate>
		<dc:creator>Gary Lewis</dc:creator>
				<category><![CDATA[Query Tools]]></category>
		<category><![CDATA[web queries]]></category>

		<guid isPermaLink="false">http://garymlewis.com/instchg/?p=980</guid>
		<description><![CDATA[Recently I asked folks on an XQuery listserv if XQuery might play a role in a p2p web anticipated by such recent announcements as Google's Wave and Opera's Unite. Below is an edited version of that conversation. <a href="http://garymlewis.com/instchg/2009/06/19/xquery-as-a-web-query-tool/">Read more</a>.]]></description>
			<content:encoded><![CDATA[<p>Those of you who have followed this blog know that I&#8217;ve long searched for something that I call a &#8220;web query tool&#8221; (click on the <a href="http://garymlewis.com/instchg/category/query-tools/">Query Tools</a> category to see this history).</p>
<p>As a way of identifying a &#8220;web query tool,&#8221; I use a very simple measurement device &#8230; it is something that can answer the following question: &#8220;Please give me a list of &lt;a&gt; about &lt;b&gt;&#8221; (eg, a list of web tutorials about python programming). This sounds like a search, but it is more than search. It is an ad hoc query run against a web of data instead of a database.</p>
<p>I don&#8217;t think the time is at hand when I&#8217;ll find a real web query tool.</p>
<p>But recently I&#8217;ve been playing quite a bit with XQuery and have been pleased (eg, see <a href="http://garymlewis.com/instchg/2009/06/08/making-a-dent-in-a-steep-learning-curve/">Making a Dent in a Steep Learning Curve</a>). I believe XQuery will do nicely as a tool for mashups of web data and as a front-end into a technology stack that runs from data integration to transformation to analysis to visualization and then to presentation.</p>
<p>However, I wanted a reality check from people far more experienced with XQuery than I am. So I posed a question to a list that serves people focused on XQuery. The question contained a twist about the future as well.</p>
<p>The respondents were uniformly gracious, and I found the conversation very helpful. Below is an edited version of that conversation organized as threads rather than in time sequence. Enjoy.</p>
<p>Notes:<br />
1. I&#8217;m reluctant to quote the words of other people. So originally I thought I would summarize the conversation. But this made it lifeless. So I left the respondents anonymous but included their (slightly edited) direct quotes. I also wrote to the list where the original conversation happened, included a link to a preview of this blog post, and asked if anyone objected to my including the transcript in the post.<br />
2. In the transcript, indents denote threads. So, for example, something indented once means it is a response to my original question. Something indented twice is a response to the immediately preceding response indented once. And so on.<br />
3. I omitted side conversations not directly relevant to my original question. As I write this, the discussion continues on the list but it has long since left my original topic. This still evolving conversation is not included in the transcript.</p>
<p style="text-align: center;"><strong>Transcript</strong></p>
<p><span style="text-decoration: underline;">Gary Lewis, 10:13am</span><br />
Can anyone comment on the possibility that xquery could be used as a data integration/mashup tool running in a browser that included a web server? There&#8217;s been quite a bit of excitement the past couple days about the Opera Unite announcement of its &#8220;web server in a web browser&#8221; concept. Is it possible that xquery, as a query/integration/transformation tool, might someday be a plugin to a browser (eg, Chrome; Chrome/Wave)?</p>
<p style="padding-left: 30px;"><span style="text-decoration: underline;">Respondent #1, 10:25am</span><br />
sure why not &#8230; XSLT support in the browser was/is a plugin and sets a precedent for this kind of thing</p>
<p style="padding-left: 30px;">XQuery in the browser would be great, anything that reduces the hegemony that is javascript &#8230; which reminds me, I know it sounds strange but perhaps browsers will turn into *the* personal &#8216;database&#8217; for the individual someday and it will be natural to have something like xquery to help process the data it contains.</p>
<p style="padding-left: 30px;"><span style="text-decoration: underline;">Respondent #2, 10:32am</span><br />
This may be of interest -<br />
<a href="http://www.zorba-xquery.com/index.php/xquery-in-the-browser-xqib/">http://www.zorba-xquery.com/index.php/xquery-in-the-browser-xqib/</a></p>
<p style="padding-left: 30px;"><span style="text-decoration: underline;">Respondent #3, 10:42am</span><br />
There are a lot of people keen on the idea of xquery in the web browser.</p>
<p style="padding-left: 30px;">Unfortunately, though, I think there&#8217;s a bit of a conflict. A lot of the benefits of the web architecture come from the fact that the browser is small, ubiquitous, and predictable. The more goodies you put in it, the less that remains true. And the less it is true, the harder it is to write applications that will run anywhere and talk to anything. This is why, after ten years, writing applications that rely on XSLT-in-the-browser can still be problematic, as recently discussed on the xml-dev list.</p>
<p style="padding-left: 60px;"><span style="text-decoration: underline;">Respondent #5, 1:25pm</span><br />
Well, one way to achieve the same result is to build on that small and predictable foundation. This rules out plugins, but there&#8217;s another possibility &#8211; that of &#8220;compiling&#8221; XQuery to cross-browser JavaScript. We already have GWT that does a similar thing, so the general concept is proven, and I don&#8217;t see why doing the same to XQuery would be any harder. The main problem that I see there is that browser XML manipulation APIs aren&#8217;t standardized yet, but this is going to change with HTML5 (and meanwhile there are already &#8220;good enough&#8221; ways to work around this by detecting and using the mishmash of the existing browser-specific APIs).</p>
<p style="padding-left: 30px;"><span style="text-decoration: underline;">Respondent #4, 11:21am</span><br />
<a href="http://xqib.org/">http://xqib.org</a></p>
<p style="padding-left: 60px;"><span style="text-decoration: underline;">Gary Lewis, 11:30am</span><br />
A couple people have mentioned zorba. I&#8217;ve used and quite like zorba xquery; even wrote a blog post about it and included some analysis of Federal Reserve Economic Data time series accessed via an API with zorba.</p>
<p style="padding-left: 60px;">I&#8217;m also familiar with the zorba XQIB project but do not use a Windows OS so have not played with it yet. But my impression (probably wrong) is that XQIB is cast mostly as an alternative to JavaScript, using scripting extensions to XQuery. If true (please let me know otherwise), I&#8217;m not so much interested in this aspect. As I said earlier, I&#8217;m mostly interested in a data integration &amp; transformation tool that sits in a browser and gets used as a precursor to analysis &amp; graphics tools.</p>
<p style="padding-left: 90px;"><span style="text-decoration: underline;">Respondent #7, 5:00pm</span><br />
the xqib people can answer you in more details, but in general, what they do have is a full embedding of a Zorba xquery processor in the browser &#8212; this includes all the functionalities of Zorba, full XQuery with updates and scripting, XQuery 1.1 with groupby, windowing, try-catch, etc, plus the REST, function libraries.</p>
<p style="padding-left: 90px;">What users use this XQuery for is their own decision: it can very well be data integration, data transformation, RSS integration, etc. Everything is good, as long as it can be expressed in XQuery.</p>
<p style="padding-left: 30px;"><span style="text-decoration: underline;">Respondent #6, 1:26pm</span><br />
I surely see your point about XQuery in the browser &#8211; as stated before by others: check out our project at <a href="http://www.xqib.org/" target="_blank">http://www.xqib.org/</a></p>
<p style="padding-left: 30px;">At the moment, the release works only with Internet Explorer &#8211; but the next release (which is nearly finished &#8211; we hope to release it this month) will also run with Firefox and will be much faster. [...]</p>
<p style="padding-left: 30px;">But I wanted to ask you where exactly you see the use cases for having a web server in the browser. To be honest: the last thing I want running in my browser is a server (as [Respondent #3] stated because of complexity, but also because of security). Also, in most companies (and also on most routers), all ports are normally closed, which makes a webserver useless. The only benefit I can see is that this way it would be easy to implement some kind of RPC from the server to the client. But in my opinion, this would be too small an advantage to justify a webserver.</p>
<p style="padding-left: 60px;"><span style="text-decoration: underline;">Gary Lewis, 4:29pm</span><br />
Let&#8217;s start with this question. In a p2p web &#8230; as Google Chrome/Wave and Opera Unite suggest might happen &#8230; is there a new role for xquery? I guess I&#8217;d answer that question with a tentative &#8220;Yes.&#8221; Here&#8217;s a simple example. There&#8217;s a lot of buzz around transparency now &#8230; government transparency, data transparency, etc. In the U.S., <a href="http://data.gov/" target="_blank">data.gov</a> is a recent case. The idea is simple enough. Make government data sets available to the public and let them mash it up and then share it. [In some ways this scenario gives me the creeps, because I know from years of experience how very difficult it is to ensure data integrity during analysis.] I&#8217;d love to use xquery as a data integration and transformation tool as preparation for analysis, visualization, &amp; presentation. And then to share the data with others in a kind of social network of data reuse. I was guessing when I suggested that xquery in a browser with a web server might make this future possible. I&#8217;ve no way of knowing. But maybe one of you might.</p>
<p style="padding-left: 30px;"><span style="text-decoration: underline;">Respondent #8, 6:53pm</span><br />
Why? To me, the best thing about XQuery is that it can run natively (and more efficiently than the more encompassing XSL) on an XML DB providing all the speed and memory efficiency afforded by the different and incompatible XML DBs. Caveat emptor: I fall into the camp that doesn&#8217;t believe XQuery is a good styling language or a good PHP-like webapp templating language. (if there even is a camp, rather than me leaning over a bic lighter).</p>
<p style="padding-left: 30px;">Running XQuery in the browser seems like a huge waste of bandwidth where you would have to download everything you want to query against, perhaps to find only a tiny fragment. I would create a URL and just GET it from a server where XQuery creates the XML or JSON to be styled on the client with XSL or JavaScript.</p>
<p style="padding-left: 60px;"><span style="text-decoration: underline;">Respondent #4, 1:36am</span><br />
I think the browser can act as a proxy server to a localhost-based eXist database (or AMP, if somebody still loves it).</p>
]]></content:encoded>
			<wfw:commentRss>http://garymlewis.com/instchg/2009/06/19/xquery-as-a-web-query-tool/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Making a Dent in a Steep Learning Curve</title>
		<link>http://garymlewis.com/instchg/2009/06/08/making-a-dent-in-a-steep-learning-curve/</link>
		<comments>http://garymlewis.com/instchg/2009/06/08/making-a-dent-in-a-steep-learning-curve/#comments</comments>
		<pubDate>Mon, 08 Jun 2009 20:14:56 +0000</pubDate>
		<dc:creator>Gary Lewis</dc:creator>
				<category><![CDATA[Query Tools]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[web queries]]></category>

		<guid isPermaLink="false">http://garymlewis.com/instchg/?p=795</guid>
		<description><![CDATA[In a continued search for a worthy web query tool, I spent a fair amount of time playing with Zorba's XQuery and the 20,000 time series from the Federal Reserve Economic Data (FRED) database. I continue to be impressed. <a href="http://garymlewis.com/instchg/2009/06/08/making-a-dent-in-a-steep-learning-curve">Read more</a>.]]></description>
			<content:encoded><![CDATA[<p>In a previous post called <a href="http://garymlewis.com/instchg/2009/04/24/web-query-tools-part-2/">Web Query Tools &#8211; Part 2</a>, I said:</p>
<blockquote><p>I&#8217;m now ready to start work on another XQuery data integration, this time using the new <a href="http://api.stlouisfed.org/docs/fred/">FRED API</a> from the Federal Reserve Bank of St. Louis. Not that I&#8217;m particularly interested in banking-related data, but FRED uses a REST web service architecture and will allow me to play more  thoroughly with Zorba&#8217;s REST capability. And the volume of the data will allow me to stress test the performance of Zorba&#8217;s XQuery.</p></blockquote>
<p>The siren&#8217;s song, of course, is that the web may one day become a giant database of interrelated data.</p>
<p>I&#8217;ve now finished the XQuery/FRED project. You can see the details <a href="http://garymlewis.com/instchg/public/xquery/fred/fred_xquery.pdf">here</a> (pdf), but I&#8217;ll touch on a few of the points.</p>
<p>The FRED database includes more than 20,000 economic time series. The folks at the Economic Research Division at the Federal Reserve Bank of St. Louis have done a nice job making this data available via an API. With so much data to choose from, however, it helps if you know at the outset what kind of data you&#8217;d like to retrieve. I didn&#8217;t.</p>
<p>So I set myself two tasks: 1) Build the XML documents to provide flexible overviews of the 20,000 time series; and 2) Choose two of these time series and conduct a small analysis.</p>
<p>I did the first job using Zorba XQuery and the second job using XQuery and the R statistics package. Again, the details are <a href="http://garymlewis.com/instchg/public/xquery/fred/fred_xquery.pdf">here</a> (pdf).</p>
<p>Basically I came away feeling like I&#8217;d made some progress. The Zorba REST calls to the FRED API worked flawlessly. Some of the XQuery programs with joins across multiple XML files just chugged away slowly, but then I was using a creaky old server to run Zorba and serve up the XML files. Otherwise it was a wonderful opportunity to learn more about XQuery as a web query tool. I continue to like what I see.</p>
<p>Since that&#8217;s a pretty positive statement for me, perhaps this is the place to say that I have no relationship with Zorba or any of the other XQuery implementations. I&#8217;m simply looking for a tool that will utilize some of my strengths in pursuit of the main reason I write this blog &#8230; which, if you don&#8217;t yet know, is to help nudge us closer to the day when learning will be free for everyone everywhere.</p>
<p>Next I plan to look at Zorba in a variety of other settings. Options include the XQuery in the Browser project (<a href="http://www.xqib.org/">XQIB</a>), XQuery scripting and <a href="http://sausalito.28msec.com/dokuwiki/doku.php?id=start#overview">Sausalito</a> to construct web sites, and some serious performance tests of XQuery on EC2 and S3 (Amazon Web Services).</p>
<p>Below is a screen shot of the R analysis. It shows a not unexpected relationship between the unemployment rate in the United States over the last 55 years and the rate of change in outstanding household credit market debt. The current recession seems somewhat atypical, however. Details and a legible graphic are also in the <a href="http://garymlewis.com/instchg/public/xquery/fred/fred_xquery.pdf">PDF</a>, as is source code for the XQuery programs I wrote.</p>
<p><img class="aligncenter size-full wp-image-838" title="cmdebt_unrate1" src="http://garymlewis.com/instchg/wp-content/uploads/2009/06/cmdebt_unrate1.jpg" alt="cmdebt_unrate1" width="500" height="455" /></p>
]]></content:encoded>
			<wfw:commentRss>http://garymlewis.com/instchg/2009/06/08/making-a-dent-in-a-steep-learning-curve/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Web Query Tools &#8211; Part 2</title>
		<link>http://garymlewis.com/instchg/2009/04/24/web-query-tools-part-2/</link>
		<comments>http://garymlewis.com/instchg/2009/04/24/web-query-tools-part-2/#comments</comments>
		<pubDate>Fri, 24 Apr 2009 13:58:06 +0000</pubDate>
		<dc:creator>Gary Lewis</dc:creator>
				<category><![CDATA[Query Tools]]></category>
		<category><![CDATA[web architecture]]></category>
		<category><![CDATA[web data system]]></category>
		<category><![CDATA[web queries]]></category>

		<guid isPermaLink="false">http://garymlewis.com/instchg/?p=698</guid>
		<description><![CDATA[There's huge interest in harnessing the web as a giant database, and the motivations for this are easily as diverse as the interest is large. I'm just looking for a precursor. <a href="http://garymlewis.com/instchg/2009/04/24/web-query-tools-part-2/">Read more</a>.]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s huge interest in harnessing the web as a giant database, and the motivations for this are easily as diverse as the interest is large. I&#8217;m just looking for a precursor.</p>
<p>Two quick examples. Google researchers describe &#8220;a unified Web knowledge base&#8221; as the &#8220;holy grail of web information extraction&#8221; (<a href="http://www.sigmod.org/sigmod/record/issues/0812/p055.special.cafarella.pdf">PDF</a>), and the company energetically applies their research to harvest additional structured data from natural language text, HTML-embedded tables, and the so-called deep-web of databases sheltered behind form-only access.</p>
<p>Tim Berners-Lee, whose legacy is the web, provides another example. In a video from TED, he calls for a &#8220;<a href="http://www.ted.com/index.php/talks/tim_berners_lee_on_the_next_web.html">new reframing</a>&#8221; that will release the &#8220;unlocked  potential&#8221; of the web through linked data. The idea is that people and computers could traverse data related to other data, from A to B to &#8230; wherever you want to go as long as the relationships, the tuple linkings, are available. Berners-Lee&#8217;s new reframing translates into a web-of-data that could be used by scientists, citizens, social reformers, businesses, entrepreneurs, and governments in innovative new ways.</p>
<p>I applaud these efforts even though I realize that the potential for good and for ill seems equally strong. But my focus is nothing so grand as a web-of-data. I&#8217;d be happy if I could just answer a profoundly simple question: &#8220;Please give me a list of all web tutorials on python programming.&#8221; More symbolically I want to answer questions of the form: &#8220;Give me a list of &lt;a&gt; about &lt;b&gt;.&#8221; From there, of course, the possible questions become far more interesting.</p>
<p>For certain my query would be trivial if a web-of-data existed. But it does not, even though progress is being made. So, in the interim I&#8217;m looking for a substitute.</p>
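<p>To illustrate why the query would be trivial over linked data, here is a minimal sketch in Python. It assumes the web&#8217;s content had already been described as (subject, predicate, object) tuples. The predicate names, the example.org URLs, and the <code>list_of</code> helper are invented for illustration and do not refer to any real vocabulary or data set.</p>

```python
# Hypothetical sketch: answering "give me a list of A about B"
# over a tiny in-memory set of (subject, predicate, object) tuples.

TRIPLES = [
    ("http://example.org/t1", "type", "tutorial"),
    ("http://example.org/t1", "about", "python programming"),
    ("http://example.org/t2", "type", "tutorial"),
    ("http://example.org/t2", "about", "javascript"),
    ("http://example.org/v1", "type", "video"),
    ("http://example.org/v1", "about", "python programming"),
]


def subjects_with(triples, predicate, obj):
    """Return the set of subjects that have the given predicate/object pair."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def list_of(triples, kind, topic):
    """List every subject whose type is `kind` and whose topic is `topic`."""
    of_kind = subjects_with(triples, "type", kind)
    on_topic = subjects_with(triples, "about", topic)
    return sorted(of_kind.intersection(on_topic))


if __name__ == "__main__":
    print(list_of(TRIPLES, "tutorial", "python programming"))
```

<p>The query itself is two lookups and an intersection. Everything difficult lives in producing the tuples.</p>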
<p>My latest excursions have taken me into the world of XML, and specifically into XQuery and, to a lesser extent, into the XQuery relatives XSLT and XPath. All of these are <a href="http://www.w3.org/TR/2009/PER-xquery-20090421/">W3C recommendations</a> that have been implemented in both open source and proprietary products.</p>
<p>Why XQuery? Well, because XML is part of the fabric of the web, and because data in XML format can be queried with XQuery, and because I stumbled upon <a href="http://www.zorba-xquery.com/">Zorba</a> via an O&#8217;Reilly xml.com article called &#8220;<a href="http://www.oreillynet.com/xml/blog/2008/05/zorba_xquery_processor.html">Something Tells Me You Need to Pay Attention to This</a>.&#8221; How could I resist a title like that?</p>
<p>I started playing with Zorba&#8217;s XQuery about two months ago. Maybe I&#8217;ve written a couple hundred queries now. Like the learning curve for SQL, it&#8217;s clear that a couple hundred is at least an order of magnitude too small to become skilled. But it&#8217;s great fun and I&#8217;m encouraged by what I see.</p>
<p>One example. Tony Hirst at the Open University has done some <a href="http://ouseful.wordpress.com/2009/03/20/my-guardian-openplatform-apindata-hacks-roundup/">cool things</a> recently with The Guardian&#8217;s new data API. In one of these <a href="http://ouseful.wordpress.com/2009/03/13/joining-data-from-the-guardian-data-store-student-satisfaction-data/">projects</a>, Tony used data from The Guardian&#8217;s university guide to do a mashup on student satisfaction in architecture and planning programs at various UK universities. It featured a very nice use of DabbleDB database.</p>
<p>Data integration is one of the strengths of XQuery, so I set about following Tony&#8217;s lead to see if I could duplicate his mashup but by using Zorba. It was very fun and I learned tons. You can see the results in this <a href="http://garymlewis.com/instchg/public/pdf/xq.pdf">PDF</a>.</p>
<p>I&#8217;m now ready to start work on another XQuery data integration, this time using the new <a href="http://api.stlouisfed.org/docs/fred/">FRED API</a> from the Federal Reserve Bank of St. Louis. Not that I&#8217;m particularly interested in banking-related data, but FRED uses a REST web service architecture and will allow me to play more  thoroughly with Zorba&#8217;s REST capability. And the volume of the data will allow me to stress test the performance of Zorba&#8217;s XQuery.</p>
<p>If you are interested at all in Zorba, I&#8217;d recommend you read some of the technical documents where you can catch glimpses of longer term development objectives and experience some of the chutzpah that must exist in the development team. A recent example is <a href="http://data.semanticweb.org/conference/www/2009/paper/102/html">XQuery in the Browser</a>, which was presented this week at the <a href="http://www2009.org/">18th International World Wide Web Conference</a> in Madrid. The article basically takes aim at JavaScript. As another example, check out the plans of a 3-year-old startup called <a href="http://www.28msec.com/ourvision.html">28msec</a> and some of their technical papers. I particularly enjoyed the architecture discussion in Donald Kossmann&#8217;s slide presentation on <a href="http://www.28msec.com/download/edbt08.pdf">Building Web Applications without a DBMS</a> (PDF).</p>
<p>Ok, I better stop. It&#8217;s already beginning to sound like an infomercial. Hopefully some of my sincere enthusiasm comes through, however. It&#8217;s a hopeful time. And now it&#8217;s back to making a dent in that order of magnitude learning curve.</p>
]]></content:encoded>
			<wfw:commentRss>http://garymlewis.com/instchg/2009/04/24/web-query-tools-part-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Re: What Not to Build</title>
		<link>http://garymlewis.com/instchg/2009/01/05/re-what-not-to-build/</link>
		<comments>http://garymlewis.com/instchg/2009/01/05/re-what-not-to-build/#comments</comments>
		<pubDate>Tue, 06 Jan 2009 00:58:38 +0000</pubDate>
		<dc:creator>Gary Lewis</dc:creator>
				<category><![CDATA[Commentary]]></category>
		<category><![CDATA[Query Tools]]></category>
		<category><![CDATA[web queries]]></category>

		<guid isPermaLink="false">http://garymlewis.com/instchg/?p=308</guid>
		<description><![CDATA[Stephen Downes recently advised technology developers what not to build. My response: the major entrepreneurial and socially constructive opportunities await new data systems that will architecturally simplify the web and make it more accessible. <a href="http://garymlewis.com/instchg/2009/01/05/re-what-not-to-build/">Read more</a>.]]></description>
			<content:encoded><![CDATA[<p>Stephen Downes has written another winner. The putative audience is technology developers, but anyone interested in educational change would do well to read <a href="http://halfanhour.blogspot.com/2009/01/what-not-to-build.html">What Not to Build</a>. Downes begins:</p>
<blockquote><p>So, here is my advice on what not to build. Actually, it&#8217;s a bit more than that: it&#8217;s a list of what not to build, a list of some things that people are working on now, some fads to avoid, and some indication of what&#8217;s out there for the taking, if you can get your act together in a hurry. And what lies beyond that? The domain of real innovation and progress.</p></blockquote>
<p>I left this comment on Stephen&#8217;s post:</p>
<blockquote><p>Here are just 2 quick points.<br />
1. &#8220;How&#8221; is just as important as &#8220;what&#8221;. There&#8217;s a parallel to What Not to Build that might be called How Not to Build.<br />
2. There&#8217;s one feature of the web that nobody has come close to solving. That feature is its monstrous size and its pulsating rate of growth. For example, consider search. Google Search is woefully inadequate in the face of the web&#8217;s size and growth rate. Alternatives like the federated searches (eg, on science.gov) are better but still provide inadequate coverage and features. There must be a hundred or more companies in the search business, and I&#8217;m certainly not suggesting that anyone create another one (this might be worse even than building another OS). But the web desperately needs some kind of middle layer of data systems that can be used by ordinary people. I&#8217;d say that the major entrepreneurial and socially constructive opportunities await builders of the web infrastructure.</p></blockquote>
<p>In some ways the web today parallels the situation with enterprise databases several decades ago. It was easy to put data into these relational databases, but it was dreadfully difficult to get reports, analysis, and information out in any useful form. Development work proceeded along two fronts: i) creating reporting and analysis tools that simplified the user interface and seemingly made SQL unnecessary; and ii) creating a middle layer of data abstracted from the full complexity of the relational databases but with a simplified architecture so that the new tools would actually work.</p>
<p>Google search and other current web search tools are overmatched by the complexity of the web, just as reporting and analysis tools were once overmatched by enterprise databases. I suspect that it will take a similar architectural change to make the web truly useful. But this is not going to come easily. It was devilishly difficult with enterprise databases, but at least developers had SQL as a query tool that could uncover even the most inaccessible of the data. There is no equivalent query language for the web, so simplifying the web architecture for ease of use will be orders of magnitude more difficult than it was for enterprise databases. It&#8217;s possible that the semantic web may eventually play this role, but I&#8217;m not very sanguine that it will happen any time soon. We need something that will work today.</p>
]]></content:encoded>
			<wfw:commentRss>http://garymlewis.com/instchg/2009/01/05/re-what-not-to-build/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Web Query Tools &#8211; Part 1</title>
		<link>http://garymlewis.com/instchg/2008/11/21/web-query-tools-part-1/</link>
		<comments>http://garymlewis.com/instchg/2008/11/21/web-query-tools-part-1/#comments</comments>
		<pubDate>Fri, 21 Nov 2008 20:17:04 +0000</pubDate>
		<dc:creator>Gary Lewis</dc:creator>
				<category><![CDATA[Commentary]]></category>
		<category><![CDATA[Query Tools]]></category>
		<category><![CDATA[web queries]]></category>
		<category><![CDATA[YQL]]></category>

		<guid isPermaLink="false">http://garymlewis.com/instchg/?p=270</guid>
		<description><![CDATA[In search of any tool that can actually do web queries and analysis, I examine Yahoo's new Yahoo! Query Language (YQL). <a href="http://garymlewis.com/instchg/2008/11/21/web-query-tools-part-1/">Read more</a>.]]></description>
			<content:encoded><![CDATA[<p>While I was busy <a href="http://garymlewis.com/instchg/2008/11/11/imagining-tomorrows-university/">Imagining Tomorrow&#8217;s University</a>, Yahoo <a href="http://developer.yahoo.net/blog/archives/2008/10/yos_10_launch.html">introduced</a> Yahoo! Open Strategy. It includes a  query tool called Yahoo! Query Language (YQL).</p>
<p>Here&#8217;s how Yahoo describes YQL:</p>
<blockquote><p>YQL is a new web service API that lets you access other web services using a SQL-like language rather than typical programmatic access. You can think of it as a command line version of Pipes. Its goal is to make data from Yahoo! as well as from across the internet universally accessible through a single common interface.</p></blockquote>
<p>I&#8217;ve <a href="http://garymlewis.com/instchg/2008/07/30/google-search-and-who-knew/">lamented</a> previously that search tools provide only meager help for web queries. For example, the following query cannot be accomplished in a web search: &#8220;Give me a list of all JavaScript tutorials available online.&#8221; This is a trivial query as queries go.</p>
<p>So when I heard about the YQL announcement, I promised myself I&#8217;d have a look at it as soon as I could. This post provides a preliminary reaction to YQL based on several readings of the <a href="http://developer.yahoo.com/yql/docs/">documentation</a> and perhaps 6 hours of play at the YQL <a href="http://developer.yahoo.com/yql/console/">console</a> that Yahoo provides (very nice, thank you!).</p>
<p>YQL is definitely a step beyond search. For example, you can get tantalizingly close to an answer for the query on JavaScript tutorials. The YQL looks like:</p>
<p>select * from search.web(0) where query = "javascript tutorial" | unique(field="url")</p>
<p>This is definitely SQL-like. There are select, from, and where clauses. The search.web notation refers to one of about 50 XML or JSON data collections presented by Yahoo as a &#8220;table.&#8221; Each table has the equivalent of SQL data columns that can be used for filtering or for display in the select clause.</p>
<p>The (0) after the table name signifies that I requested an unbounded query. A setting of (1000) exposes only a subset of 1000 entries in the source web service.</p>
<p>The | pipe is a post-query operation that discards duplicate URLs. You can also perform sorts in a post-query operation.</p>
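<p>To make the post-query step concrete, here is a minimal Python sketch of what a unique(field="url") filter amounts to. It is not YQL and not Yahoo's implementation: the function name unique_by, the sample rows, and the choice to keep the first row seen for each URL are all assumptions made for illustration.</p>

```python
def unique_by(rows, field):
    """Keep the first row seen for each distinct value of `field`."""
    seen = set()
    kept = []
    for row in rows:
        key = row[field]
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical rows standing in for search results; not real YQL output.
results = [
    {"url": "http://example.com/js-intro", "title": "JavaScript intro"},
    {"url": "http://example.org/js-guide", "title": "JavaScript guide"},
    {"url": "http://example.com/js-intro", "title": "JavaScript intro (mirror)"},
]

deduped = unique_by(results, "url")
print([row["url"] for row in deduped])
```

<p>The filter runs only after the rows have been fetched, which is why it is a post-query operation: it trims the result set but does not reduce the work done by the underlying search.</p>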
<p>There are several other nice features of YQL that I won&#8217;t go into here. One example is sub-query capability. But take a look at the YQL documentation for examples. Or better yet, play with the YQL console and experiment for yourself.</p>
<p>Compared to any standard SQL, YQL is a very simple query tool. This is not necessarily a bad thing. For example, when someone designs large data warehouses from source data stored in relational tables, the design almost always removes the complexity from the data so that users and analysts can focus on the business questions and not on the database architecture. Simplifying the data structure means that the stress on the query and analysis tools is reduced. Simple tools work fine when the data architecture is also simple.</p>
<p>The YQL documentation that I saw does not contain architectural schemas, so I cannot comment on whether the &#8220;tables&#8221; exposed by Yahoo are adequately served by a simple query tool like YQL. Given the early stage in the lifecycle of this project, I suspect the answer is no. But I&#8217;d also expect additional work on both the architecture and the language will occur as people use it.</p>
<p>The JavaScript query that I wrote in YQL failed to complete. At present, Yahoo has a governor on YQL queries. I believe the max is 30 seconds or 50,000 items. Runaway queries are every DBA&#8217;s nightmare, so it is not surprising that Yahoo would place limits on YQL queries.</p>
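<p>The idea of a governor can be sketched in a few lines of Python. This is only an illustration of capping a query by item count and elapsed time, not a description of how Yahoo enforces its limits. The name governed, its parameters, and the injectable clock are hypothetical, and the sketch simply stops early, whereas a real service might return an error instead.</p>

```python
import time


def governed(items, max_items, max_seconds, clock=time.monotonic):
    """Yield items until either the item cap or the time budget is used up."""
    started = clock()
    for count, item in enumerate(items):
        if count == max_items:
            return  # item cap reached
        if clock() - started >= max_seconds:
            return  # time budget exhausted
        yield item


# Hypothetical limits, far smaller than the figures quoted above.
first_five = list(governed(range(1000), max_items=5, max_seconds=30))
print(first_five)
```

<p>Either limit alone is enough to cut the query short, so an unbounded request against a very large source will hit one of them no matter how the query is written.</p>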
<p>For web queries to be truly useful, however, we&#8217;ll need a way to relax the governor restrictions and actually do complete queries. But for now, I&#8217;m very grateful to see the precursors of tomorrow&#8217;s web query tools emerging.</p>
<p>Part 2 of this series will look at another approach to analysis of very large datasets. It&#8217;s called Pig (no joke) and is described by <a href="http://research.yahoo.com/node/90">Yahoo</a> and <a href="http://hadoop.apache.org/pig/">Apache Hadoop</a> as an &#8220;infrastructure&#8221; or &#8220;platform&#8221; to conduct ad-hoc data analysis. Apparently &#8220;the highest abstraction layer in Pig is a query language interface, whereby users express data analysis tasks as queries, in the style of SQL or Relational Algebra.&#8221;</p>
<p>Bingo. If only it&#8217;s true. I&#8217;ll let you know what I find out.</p>
]]></content:encoded>
			<wfw:commentRss>http://garymlewis.com/instchg/2008/11/21/web-query-tools-part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Search and Who Knew?</title>
		<link>http://garymlewis.com/instchg/2008/07/30/google-search-and-who-knew/</link>
		<comments>http://garymlewis.com/instchg/2008/07/30/google-search-and-who-knew/#comments</comments>
		<pubDate>Wed, 30 Jul 2008 15:16:08 +0000</pubDate>
		<dc:creator>Gary Lewis</dc:creator>
				<category><![CDATA[Commentary]]></category>
		<category><![CDATA[Query Tools]]></category>
		<category><![CDATA[web queries]]></category>

		<guid isPermaLink="false">http://garymlewis.com/instchg/?p=45</guid>
		<description><![CDATA[I got sidetracked today into nuances in Google search, discovering some useful ways to refine searches but also making me wish yet again for more robust query tools. <a href="http://garymlewis.com/instchg/google-search-and-who-knew">Read more</a>.]]></description>
			<content:encoded><![CDATA[<p>This is off-topic a bit, but I found it useful and others may also.</p>
<p>After all these years using Google, I finally read through the stuff on the Google Web Search Help Center. Duh, should have done this long ago. For example, I had no idea you could do synonym searches.</p>
<p>But what surprised me the most was <a href="http://www.google.com/support/bin/static.py?page=searchguides.html&amp;ctx=basics&amp;hl=en">Google&#8217;s</a> statement &#8220;keep in mind that the order in which the terms are typed will affect the search results.&#8221; How many thousands of searches have I made and never noticed this? But it&#8217;s true. The search [vacation paris] produces 560,000 results, while the search [paris vacation] produces 913,000 results. Note: I&#8217;ve used square brackets [] for readability only. In an actual search, of course, the brackets would not be used.</p>
<p>I figured an AND would be commutative like an INTERSECT set operator. You get the vacation result set (call it V) and the paris result set (call it P). V intersect P would give the same results as P intersect V. But V AND P != P AND V in a search.</p>
<p>I didn&#8217;t see any explanation of this in Google Web Search Help, but found a pretty helpful <a href="http://www.webmasterworld.com/google/3225547.htm">discussion</a> that makes sense. Search word order matters because bakers dozen means something different from dozen bakers, and factoring word order into the relevance algorithm improves search results.</p>
<p>Well, ok, I get the bakers dozen idea. But vacation paris and paris vacation? Why do these two searches produce such different results?</p>
<p>Which brings me to a couple of peeves about search.</p>
<p>It should be possible to string search result sets together. Here&#8217;s an illustration. Suppose I wanted to find out how the search V AND P differs from P AND V. I could do the V AND P search to get a result set (call it R1). Then I could do the P AND V search to get another result set (call it R2). Having these 2 result sets, I could produce R2 MINUS R1 to find out what results were in R2 but not in R1. Also interesting would be R1 MINUS R2 and R1 INTERSECT R2. In other words, I&#8217;d like to produce search result sets that could be refined with other search result sets. This would certainly help with complex queries.</p>
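<p>If the two result sets could be saved, the comparisons described above would be ordinary set operations. Here is a minimal Python sketch, assuming each result set is just a set of URLs. The URLs are made up for illustration and are not real search output.</p>

```python
# Hypothetical result sets, each a set of URLs; not real search output.
r1 = {"a.example/paris", "b.example/trips", "c.example/hotels"}   # [vacation paris]
r2 = {"b.example/trips", "c.example/hotels", "d.example/louvre"}  # [paris vacation]

only_in_r2 = r2.difference(r1)    # R2 MINUS R1
only_in_r1 = r1.difference(r2)    # R1 MINUS R2
in_both = r1.intersection(r2)     # R1 INTERSECT R2

print(sorted(only_in_r2))
print(sorted(only_in_r1))
print(sorted(in_both))

# Unlike word order in a search, set intersection is commutative.
print(in_both == r2.intersection(r1))
```

<p>The hard part is not the set arithmetic but getting complete result sets out of a search engine in the first place.</p>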
<p>Which brings me to a second but related peeve. Right now we do searches. But to use the web as a research tool, we need to do queries. For example, here&#8217;s a very simple query that you cannot now translate into a Google search: &#8220;give me a list of all the web design courses&#8221;. The query ["web design" ~course] produces roughly 22 million results. It&#8217;s great if you&#8217;re looking for a few examples. But it&#8217;s lousy if you want a reasonably complete universe of unique web design courses.</p>
<p>In his book <em>Ambient Findability</em>, Peter Morville has an interesting discussion about different types of searches (pp 49-50). He distinguishes between sample, existence, and exhaustive searches. In a sample search, you&#8217;re looking for a few examples. Google does this great. An existence search is just a binary yes/no search (does document x exist?). Google also does this great. An exhaustive search should return all of the relevant items. As Morville discusses, the effectiveness of exhaustive searches falls rapidly with collection size. So it&#8217;s no wonder that this type of query/search is not available.</p>
<p>I suppose you could get something like an exhaustive search using Amazon Web Services and the Alexa search engine. I think you can get up to 10 million results returned that can be saved in a file for further manipulation. And maybe for a specific high return-value query, I would do this. But not for day-to-day work.</p>
<p>ok, that&#8217;s it for this search novice.</p>
]]></content:encoded>
			<wfw:commentRss>http://garymlewis.com/instchg/2008/07/30/google-search-and-who-knew/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

