<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Educational Imaginations &#187; YQL</title>
	<atom:link href="http://garymlewis.com/instchg/tag/yql/feed/" rel="self" type="application/rss+xml" />
	<link>http://garymlewis.com/instchg</link>
	<description></description>
	<lastBuildDate>Tue, 31 Jan 2012 11:45:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Web Query Tools &#8211; Part 1</title>
		<link>http://garymlewis.com/instchg/2008/11/21/web-query-tools-part-1/</link>
		<comments>http://garymlewis.com/instchg/2008/11/21/web-query-tools-part-1/#comments</comments>
		<pubDate>Fri, 21 Nov 2008 20:17:04 +0000</pubDate>
		<dc:creator>Gary Lewis</dc:creator>
				<category><![CDATA[Commentary]]></category>
		<category><![CDATA[Query Tools]]></category>
		<category><![CDATA[web queries]]></category>
		<category><![CDATA[YQL]]></category>

		<guid isPermaLink="false">http://garymlewis.com/instchg/?p=270</guid>
		<description><![CDATA[In search of any tool that can actually do web queries and analysis, I examine Yahoo's new Yahoo! Query Language (YQL). <a href="http://garymlewis.com/instchg/2008/11/21/web-query-tools-part-1/">Read more</a>.]]></description>
			<content:encoded><![CDATA[<p>While I was busy <a href="http://garymlewis.com/instchg/2008/11/11/imagining-tomorrows-university/">Imagining Tomorrow&#8217;s University</a>, Yahoo <a href="http://developer.yahoo.net/blog/archives/2008/10/yos_10_launch.html">introduced</a> Yahoo! Open Strategy. It includes a  query tool called Yahoo! Query Language (YQL).</p>
<p>Here&#8217;s how Yahoo describes YQL:</p>
<blockquote><p>YQL is a new web service API that lets you access other web services using a SQL-like language rather than typical programmatic access. You can think of it as a command line version of Pipes. Its goal is to make data from Yahoo! as well as from across the internet universally accessible through a single common interface.</p></blockquote>
<p>I&#8217;ve <a href="http://garymlewis.com/instchg/2008/07/30/google-search-and-who-knew/">lamented</a> previously that search tools provide only meager help for web queries. For example, the following query cannot be accomplished in a web search: &#8220;Give me a list of all JavaScript tutorials available online.&#8221; This is a trivial query as queries go.</p>
<p>So when I heard about the YQL announcement, I promised myself I&#8217;d have a look at it as soon as I could. This post provides a preliminary reaction to YQL based on several readings of the <a href="http://developer.yahoo.com/yql/docs/">documentation</a> and perhaps 6 hours of play at the YQL <a href="http://developer.yahoo.com/yql/console/">console</a> that Yahoo provides (very nice, thank you!).</p>
<p>YQL is definitely a step beyond search. For example, you can get tantalizingly close to an answer for the query on JavaScript tutorials. The YQL looks like:</p>
<p><code>select * from search.web(0) where query = "javascript tutorial" | unique(field="url")</code></p>
<p>This is definitely SQL-like: it has select, from, and where clauses. The search.web notation refers to one of about 50 XML or JSON data collections that Yahoo presents as a &#8220;table.&#8221; Each table has the equivalent of SQL data columns that can be used for filtering or for display in the select clause.</p>
<p>The (0) after the table name signifies that I requested an unbounded query. A setting of (1000) exposes only a subset of 1000 entries in the source web service.</p>
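<p>To make the pieces of the statement concrete, here is a minimal Python sketch that assembles a query string in the same shape. The helper name build_yql and its parameters are my own invention for illustration, not part of any Yahoo library, and the sketch does no escaping of quotes inside the search text.</p>

```python
def build_yql(table, query_text, limit=0, unique_field=None):
    """Assemble a YQL statement in the shape used in this post.

    limit=0 mirrors the (0) "unbounded" setting described above.
    The function name and parameters are hypothetical.
    """
    statement = 'select * from {0}({1}) where query = "{2}"'.format(
        table, limit, query_text
    )
    if unique_field is not None:
        # Optional post-query step, appended after the pipe.
        statement += ' | unique(field="{0}")'.format(unique_field)
    return statement


print(build_yql("search.web", "javascript tutorial", unique_field="url"))
# prints: select * from search.web(0) where query = "javascript tutorial" | unique(field="url")
```

<p>Passing limit=1000 instead would produce search.web(1000), the bounded form of the same query.</p>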
<p>The | pipe introduces a post-query operation; here, unique discards results with duplicate URLs. You can also perform sorts in a post-query operation.</p>
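<p>The effect of the unique step can be imitated locally. The sketch below is plain Python applied to made-up sample rows, not YQL itself, and it assumes that the first row seen for each URL is the one that is kept.</p>

```python
def unique_by(rows, field):
    """Keep the first row seen for each distinct value of `field`."""
    seen = set()
    kept = []
    for row in rows:
        key = row.get(field)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Made-up sample data standing in for search results.
results = [
    {"title": "JS Basics", "url": "http://example.com/a"},
    {"title": "JS Basics (mirror)", "url": "http://example.com/a"},
    {"title": "Closures Explained", "url": "http://example.com/b"},
]

deduplicated = unique_by(results, "url")
# A sort applied after the filter, analogous to a post-query sort.
for row in sorted(deduplicated, key=lambda r: r["title"]):
    print(row["title"], row["url"])
```

<p>Like the pipe in YQL, both steps run on the rows after the query has returned them; neither changes which rows the query matches.</p>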
<p>There are several other nice features of YQL that I won&#8217;t go into here. One example is sub-query capability. But take a look at the YQL documentation for examples. Or better yet, play with the YQL console and experiment for yourself.</p>
<p>Compared to any standard SQL, YQL is a very simple query tool. This is not necessarily a bad thing. For example, when someone designs large data warehouses from source data stored in relational tables, the design almost always removes the complexity from the data so that users and analysts can focus on the business questions and not on the database architecture. Simplifying the data structure means that the stress on the query and analysis tools is reduced. Simple tools work fine when the data architecture is also simple.</p>
<p>The YQL documentation that I saw does not contain architectural schemas, so I cannot comment on whether the &#8220;tables&#8221; exposed by Yahoo are adequately served by a simple query tool like YQL. Given the early stage in the lifecycle of this project, I suspect the answer is no. But I&#8217;d also expect additional work on both the architecture and the language will occur as people use it.</p>
<p>The JavaScript query that I wrote in YQL failed to complete. At present, Yahoo has a governor on YQL queries; I believe the limit is 30 seconds or 50,000 items. Runaway queries are every DBA&#8217;s nightmare, so it is not surprising that Yahoo would place limits on YQL queries.</p>
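<p>For readers who have not met a query governor, here is a rough Python sketch of the idea. It is not how Yahoo implements its limits. The defaults simply reuse the 30 second and 50,000 item figures I guessed at above, and stopping at whichever limit is reached first is my assumption.</p>

```python
import time


def governed(items, max_seconds=30.0, max_items=50000, clock=time.monotonic):
    """Yield items until the time budget or the item budget is spent.

    Hypothetical helper: the limits and the "whichever comes first"
    rule are assumptions made for illustration.
    """
    started = clock()
    count = 0
    for item in items:
        if count >= max_items or clock() - started >= max_seconds:
            return  # budget exhausted: the result set is cut short
        yield item
        count += 1


# A small demonstration with a tiny item budget.
print(list(governed(range(10), max_items=3)))
# prints: [0, 1, 2]
```

<p>The caller simply receives fewer rows than the source holds, with no sign of how many were left behind. That is the practical problem with a governor when the goal is a complete answer.</p>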
<p>For web queries to be truly useful, however, we&#8217;ll need a way to relax the governor restrictions and actually do complete queries. But for now, I&#8217;m very grateful to see the precursors of tomorrow&#8217;s web query tools emerging.</p>
<p>Part 2 of this series will look at another approach to analysis of very large datasets. It&#8217;s called Pig (no joke) and is described by <a href="http://research.yahoo.com/node/90">Yahoo</a> and <a href="http://hadoop.apache.org/pig/">Apache Hadoop</a> as an &#8220;infrastructure&#8221; or &#8220;platform&#8221; to conduct ad-hoc data analysis. Apparently &#8220;the highest abstraction layer in Pig is a query language interface, whereby users express data analysis tasks as queries, in the style of SQL or Relational Algebra.&#8221;</p>
<p>Bingo. If only it&#8217;s true. I&#8217;ll let you know what I find out.</p>
]]></content:encoded>
			<wfw:commentRss>http://garymlewis.com/instchg/2008/11/21/web-query-tools-part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

