<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Educational Imaginations &#187; web data system</title>
	<atom:link href="http://garymlewis.com/instchg/tag/web-data-system/feed/" rel="self" type="application/rss+xml" />
	<link>http://garymlewis.com/instchg</link>
	<description></description>
	<lastBuildDate>Fri, 04 May 2012 11:08:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Web Query Tools &#8211; Part 2</title>
		<link>http://garymlewis.com/instchg/2009/04/24/web-query-tools-part-2/</link>
		<comments>http://garymlewis.com/instchg/2009/04/24/web-query-tools-part-2/#comments</comments>
		<pubDate>Fri, 24 Apr 2009 13:58:06 +0000</pubDate>
		<dc:creator>Gary Lewis</dc:creator>
				<category><![CDATA[Query Tools]]></category>
		<category><![CDATA[web architecture]]></category>
		<category><![CDATA[web data system]]></category>
		<category><![CDATA[web queries]]></category>

		<guid isPermaLink="false">http://garymlewis.com/instchg/?p=698</guid>
		<description><![CDATA[There's huge interest in harnessing the web as a giant database, and the motivations for this are easily as diverse as the interest is large. I'm just looking for a precursor. <a href="http://garymlewis.com/instchg/2009/04/24/web-query-tools-part-2/">Read more</a>.]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s huge interest in harnessing the web as a giant database, and the motivations for this are easily as diverse as the interest is large. I&#8217;m just looking for a precursor.</p>
<p>Two quick examples. Google researchers describe &#8220;a unified Web knowledge base&#8221; as the &#8220;holy grail of web information extraction&#8221; (<a href="http://www.sigmod.org/sigmod/record/issues/0812/p055.special.cafarella.pdf">PDF</a>), and the company energetically applies their research to harvest additional structured data from natural language text, HTML-embedded tables, and the so-called deep-web of databases sheltered behind form-only access.</p>
<p>Tim Berners-Lee, whose legacy is the web, provides another example. In a video from TED, he calls for a &#8220;<a href="http://www.ted.com/index.php/talks/tim_berners_lee_on_the_next_web.html">new reframing</a>&#8221; that will release the &#8220;unlocked potential&#8221; of the web through linked data. The idea is that people and computers could traverse data related to other data, from A to B to &#8230; wherever you want to go as long as the relationships, the tuple linkings, are available. Berners-Lee&#8217;s new reframing translates into a web-of-data that could be used by scientists, citizens, social reformers, businesses, entrepreneurs, and governments in innovative new ways.</p>
<p>I applaud these efforts even though I realize that the potential for good and for ill seems equally strong. But my focus is nothing so grand as a web-of-data. I&#8217;d be happy if I could just answer a profoundly simple question: &#8220;Please give me a list of all web tutorials on python programming.&#8221; More symbolically I want to answer questions of the form: &#8220;Give me a list of &lt;a&gt; about &lt;b&gt;.&#8221; From there, of course, the possible questions become far more interesting.</p>
<p>For certain my query would be trivial if a web-of-data existed. But it does not, even though progress is being made. So, in the interim I&#8217;m looking for a substitute.</p>
<p>My latest excursions have taken me into the world of XML, and specifically into XQuery and, to a lesser extent, into the XQuery relatives XSLT and XPath. All of these are <a href="http://www.w3.org/TR/2009/PER-xquery-20090421/">W3C recommendations</a> that have been implemented in both open source and proprietary products.</p>
<p>Why XQuery? Well, because XML is part of the fabric of the web, and because data in XML format can be queried with XQuery, and because I stumbled upon <a href="http://www.zorba-xquery.com/">Zorba</a> via an O&#8217;Reilly xml.com article called &#8220;<a href="http://www.oreillynet.com/xml/blog/2008/05/zorba_xquery_processor.html">Something Tells Me You Need to Pay Attention to This</a>.&#8221; How could I resist a title like that?</p>
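<p>To make the &#8220;give me a list of a about b&#8221; question concrete, here is a minimal sketch in Python. It is not XQuery and it is not Zorba: it uses the limited XPath subset in the standard library module xml.etree.ElementTree. The catalog, its element names (resource, topic, title), and the helper list_about are all hypothetical, invented only for illustration.</p>
<pre>
```python
import xml.etree.ElementTree as ET

# Hypothetical catalog of web resources, built in memory for illustration.
catalog = ET.Element("catalog")
for kind, topic, title in [
    ("tutorial", "python", "Intro to Python"),
    ("tutorial", "xquery", "XQuery Basics"),
    ("video", "python", "Python in an Hour"),
    ("tutorial", "python", "Python for Data Work"),
]:
    resource = ET.SubElement(catalog, "resource", kind=kind)
    ET.SubElement(resource, "topic").text = topic
    ET.SubElement(resource, "title").text = title


def list_about(root, kind, topic):
    """Answer 'give me a list of KIND about TOPIC' with an XPath-style filter."""
    # Two chained predicates: one on the kind attribute, one on the topic child.
    path = f".//resource[@kind='{kind}'][topic='{topic}']"
    return [r.findtext("title") for r in root.findall(path)]


python_tutorials = list_about(catalog, "tutorial", "python")
print(python_tutorials)
```
</pre>
<p>The hard part on the real web is not the query itself but getting resources described in a consistent structure in the first place, which is exactly what a web-of-data would provide.</p>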
<p>I started playing with Zorba&#8217;s XQuery about two months ago. Maybe I&#8217;ve written a couple hundred queries now. As with SQL, it&#8217;s clear that a couple hundred queries is at least an order of magnitude too few to become skilled. But it&#8217;s great fun and I&#8217;m encouraged by what I see.</p>
<p>One example. Tony Hirst at the Open University has done some <a href="http://ouseful.wordpress.com/2009/03/20/my-guardian-openplatform-apindata-hacks-roundup/">cool things</a> recently with The Guardian&#8217;s new data API. In one of these <a href="http://ouseful.wordpress.com/2009/03/13/joining-data-from-the-guardian-data-store-student-satisfaction-data/">projects</a>, Tony used data from The Guardian&#8217;s university guide to do a mashup on student satisfaction in architecture and planning programs at various UK universities. It featured a very nice use of the DabbleDB database.</p>
<p>Data integration is one of the strengths of XQuery, so I set about following Tony&#8217;s lead to see if I could duplicate his mashup but by using Zorba. It was very fun and I learned tons. You can see the results in this <a href="http://garymlewis.com/instchg/public/pdf/xq.pdf">PDF</a>.</p>
<p>I&#8217;m now ready to start work on another XQuery data integration, this time using the new <a href="http://api.stlouisfed.org/docs/fred/">FRED API</a> from the Federal Reserve Bank of St. Louis. Not that I&#8217;m particularly interested in banking-related data, but FRED uses a REST web service architecture and will allow me to play more thoroughly with Zorba&#8217;s REST capability. And the volume of the data will allow me to stress test the performance of Zorba&#8217;s XQuery.</p>
<p>If you are at all interested in Zorba, I&#8217;d recommend you read some of the technical documents where you can catch glimpses of longer-term development objectives and experience some of the chutzpah that must exist in the development team. A recent example is <a href="http://data.semanticweb.org/conference/www/2009/paper/102/html">XQuery in the Browser</a>, which was presented this week at the <a href="http://www2009.org/">18th International World Wide Web Conference</a> in Madrid. The article basically takes aim at JavaScript. As another example, check out the plans of a 3-year-old startup called <a href="http://www.28msec.com/ourvision.html">28msec</a> and some of their technical papers. I particularly enjoyed the architecture discussion in Donald Kossmann&#8217;s slide presentation on <a href="http://www.28msec.com/download/edbt08.pdf">Building Web Applications without a DBMS</a> (PDF).</p>
<p>Ok, I better stop. It&#8217;s already beginning to sound like an infomercial. Hopefully some of my sincere enthusiasm comes through, however. It&#8217;s a hopeful time. And now it&#8217;s back to making a dent in that order of magnitude learning curve.</p>
]]></content:encoded>
			<wfw:commentRss>http://garymlewis.com/instchg/2009/04/24/web-query-tools-part-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Hello World</title>
		<link>http://garymlewis.com/instchg/2009/02/03/hello-world/</link>
		<comments>http://garymlewis.com/instchg/2009/02/03/hello-world/#comments</comments>
		<pubDate>Tue, 03 Feb 2009 16:51:33 +0000</pubDate>
		<dc:creator>Gary Lewis</dc:creator>
				<category><![CDATA[background]]></category>
		<category><![CDATA[rwebdb]]></category>
		<category><![CDATA[free learning]]></category>
		<category><![CDATA[web data system]]></category>

		<guid isPermaLink="false">http://garymlewis.com/instchg/?p=435</guid>
		<description><![CDATA[Yesterday I registered a domain name called rwebdb.com, or rWebDB as in Our Web DB where DB := database (of course). <a href="http://garymlewis.com/instchg/2009/02/03/hello-world/">Read more</a>.]]></description>
			<content:encoded><![CDATA[<p>Yesterday I registered a domain name called rwebdb.com, or rWebDB as in Our Web DB where DB := database (of course).</p>
<p>There&#8217;s nothing at the site yet. And there won&#8217;t be for a [very?] long time. So hold your horses.</p>
<p>Why the announcement? A couple reasons. One, I want to make it more difficult for myself to ignore the site. And, two, it&#8217;s a small step toward the ideas I wrote about <a href="http://garymlewis.com/instchg/2008/12/23/its-past-time/">here</a>, <a href="http://garymlewis.com/instchg/2009/01/05/re-what-not-to-build/">here</a>, <a href="http://garymlewis.com/instchg/2009/01/07/etl-and-mashups/">here</a>, and <a href="http://garymlewis.com/instchg/2008/11/11/imagining-tomorrows-university/">here</a>.</p>
<p>In all honesty, I have no idea where things will go with this. I have some vague hopes of designing and developing something [a web db?] that can then be open-sourced, improved upon, and enlivened with the new ideas of others.</p>
<p>My belief is that everyone everywhere has the right to learn. My hope is that someday learning will be free throughout each person&#8217;s lifetime.</p>
<p>It is such an implausible hope that I still get embarrassed saying it, although it&#8217;s getting easier. And when I remember the inimitable question of Richard Feynman, &#8220;what do you care what other people think?&#8221; it brings a smile.</p>
]]></content:encoded>
			<wfw:commentRss>http://garymlewis.com/instchg/2009/02/03/hello-world/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ETL and Mashups</title>
		<link>http://garymlewis.com/instchg/2009/01/07/etl-and-mashups/</link>
		<comments>http://garymlewis.com/instchg/2009/01/07/etl-and-mashups/#comments</comments>
		<pubDate>Wed, 07 Jan 2009 16:45:45 +0000</pubDate>
		<dc:creator>Gary Lewis</dc:creator>
				<category><![CDATA[Query Tools]]></category>
		<category><![CDATA[web architecture]]></category>
		<category><![CDATA[web data system]]></category>

		<guid isPermaLink="false">http://garymlewis.com/instchg/?p=327</guid>
		<description><![CDATA[This post extends my recent observation that new web data systems and associated services might act as incubators for socially constructive innovations in education. <a href="http://garymlewis.com/instchg/2009/01/07/etl-and-mashups/">Read more</a>.]]></description>
			<content:encoded><![CDATA[<p>This post expands a bit on my recent observation in <a href="http://garymlewis.com/instchg/2009/01/05/re-what-not-to-build/">Re: What Not to Build</a> that new web data systems and associated services might act as incubators for socially constructive innovations in education.</p>
<p>Today in ProgrammableWeb, John Musser <a href="http://blog.programmableweb.com/2009/01/07/enterprise-mashups-new-book-highlights-the-patterns/">previewed</a> a book by Michael Ogrinz called <em>Mashup Patterns</em>. In the book, Ogrinz discusses 34 types of mashups arranged in 5 categories. It was the language that Ogrinz used to describe the 5 categories that got my attention. He calls these categories harvesting, enhancing, assembling, managing, and testing.</p>
<p>Ogrinz is talking about the web, of course. But in the world of enterprise data warehouses, there is an interesting parallel in something called ETL. ETL is shorthand for extract, transform, and load. Data is first extracted from one or more data source systems. It is cleaned, aggregated, or otherwise manipulated in the transformation step. And then it is loaded into the warehouse where it is available to users, business analysts, statisticians, and policy researchers.</p>
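<p>The three ETL steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source systems, the student and credits fields, and the credits table are all made up, and an in-memory SQLite database stands in for a real warehouse.</p>
<pre>
```python
import sqlite3

# Hypothetical rows from two source systems; names and values are made up.
SOURCE_A = [{"student": " Ann ", "credits": "12"}, {"student": "Bob", "credits": "9"}]
SOURCE_B = [{"student": "ann", "credits": "3"}]


def extract():
    """Extract: pull raw rows from every source system."""
    return SOURCE_A + SOURCE_B


def transform(rows):
    """Transform: clean names, convert types, and aggregate per student."""
    totals = {}
    for row in rows:
        name = row["student"].strip().title()
        totals[name] = totals.get(name, 0) + int(row["credits"])
    return sorted(totals.items())


def load(records, conn):
    """Load: write the cleaned records into a warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS credits (student TEXT, total INTEGER)")
    conn.executemany("INSERT INTO credits VALUES (?, ?)", records)
    conn.commit()


# An in-memory SQLite database stands in for the warehouse.
warehouse = sqlite3.connect(":memory:")
load(transform(extract()), warehouse)
result = warehouse.execute(
    "SELECT student, total FROM credits ORDER BY student"
).fetchall()
print(result)
```
</pre>
<p>Note how the transformation step is where the two spellings of the same student get reconciled. That small act of cleaning is what lets people downstream compare apples to apples.</p>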
<p>Data warehouses act as a middle layer between the data source systems and people who use the data in their jobs. This architectural change solves many problems, allowing data from disparate systems to be combined into a single place, inviting folks to ask new business questions, and also creating an enterprise vocabulary that helps people compare apples to apples. Most importantly, warehouses provide relatively simple and easy access points to reliable data.</p>
<p>I&#8217;m not suggesting that data warehouses should serve as web data systems or that mashups should comprise part of the ETL. I&#8217;m really not suggesting anything at this point, but merely using the serendipity of Ogrinz&#8217; words to think about the basic design and development issues intrinsic to any data system build-out.</p>
]]></content:encoded>
			<wfw:commentRss>http://garymlewis.com/instchg/2009/01/07/etl-and-mashups/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

