LearnThe Web as a Database

writes on March 4, 2010

One of the things that’s great about the Internet is that it’s open to everyone. There are a million sites, all different, and everyone is free to invent their own way of doing things. Of course, that has left us with a legacy of systems which don’t always work well together.

When you are using a web browser things mostly work because everyone adheres more or less to web standards, or at least the bits major browsers support. When you want to get data from the web to use in your app things aren’t quite as simple. Enter YQL (Yahoo Query Language).

Almost every developer knows about using SQL for getting data from databases. We wanted to do that for the Internet. YQL allows you to access all kinds of information from the internet in a very similar way to the way you would get data using SQL.

A YQL Query

A basic query in YQL is really easy:

select * from search.web where query = "javascript";

The thing we want to do is select all data (*) from Yahoo! search for web pages where the query is the term “javascript”. If you already know SQL it’s obvious, if you don’t it’s still pretty easy to read. In this case what’s happening under the hood is YQL is mapping the request to the search.web table to a pattern to call the Yahoo! BOSS (search) web service. So we are actually making a request, on your behalf, to the url:


A Sample Result

The data is returned in your choice of XML, JSON or JSONP. Like SQL most YQL data tends to be organised as rows of results.

In the case of the above query it looks like this (NB: this is truncated to show one result, view a full version in your browser):

<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="10" yahoo:created="2010-03-02T09:22:58Z" yahoo:lang="en-US" yahoo:updated="2010-03-02T09:22:58Z" yahoo:uri="http://query.yahooapis.com/v1/yql?q=select+*+from+search.web+where+query+%3D+%22javascript%22%3B">
<url execution-time="157"><![CDATA[http://boss.yahooapis.com/ysearch/web/v1/javascript?format=xml&start=0&count=10]]></url>
<result xmlns="http://www.inktomi.com/">
<abstract><![CDATA[<b>JavaScript</b>.com is your source for all things <b>JavaScript</b>, including tutorials, free java scripts, downloads, tools, <b>javascript</b> source code and other scripting resources.]]></abstract>

Read AND Write with YQL

Looking at the example above you might agree that it’s pretty easy to use the BOSS web service already since it’s RESTful and the URL are easy to construct. Well, the same can’t really be said for the WordPress XML-RPC API. But, with YQL we can still easily insert a new post into a WordPress blog:

insert into wordpress.post (title, description, blogurl, username, password) values ("Test Title", "This is a test body", "http://yqltest.wordpress.com", "yqltest", "password");

A few things are interesting here. Firstly we are still using SQL syntax, but this time we are inserting data. YQL doesn’t just read data, you can write too. More than that even though BOSS uses REST and WordPress is XML-RPC with YQL you don’t need to care. You just ask for what you want and the YQL table deals with it. Once you start using YQL to get one piece of data there isn’t any more cost to using more services.

Getting Started

So far we’ve looked at some of the things YQL can do but not how you would go about getting started, so let’s do that. The easiest way to become familiar with YQL is the YQL console. It’s a lot like the MySQL console in that you can use it to easily query available tables or to test and construct queries you want to make in your application. The console looks like this:

The first and most obvious thing is the query box. This opens with the query “show tables” which as in SQL returns a list of all the available tables. You can see that result underneath the query box. You can use the radio buttons to get the results as XML or JSON, your choice.

On the right hand side we have some navigation to get you started quicker. The list of the current tables is shown and you can open the tree to folders to show specific tables.

By default we show all the Yahoo! tables, but you can also click on a button to show all the tables created by the community. These cover a wide variety of services, almost anything you can think of.

Contribute Tables of your Own

You can also contribute new tables. They can be either really straight forward in the case of a good REST web service, or as complex as it needs to be to get the job done. A simple table might look like this one:

<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
<author>Yahoo! Inc.</author>
<sampleQuery>select * from {table} where location="sfbay" and type="sss" and query="schwinn mountain bike"</sampleQuery>
<select itemPath="" produces="XML">
<key id="location" type="xs:string" paramType="path" required="true" />
<key id="type" type="xs:string" paramType="path" required="true" />
<key id="query" type="xs:string" paramType="query" required="true" />

You can see that it’s just XML, and only a few lines at that. The important bits are the “select” element. This represents a select statement we can make in YQL. If you wanted to insert you’d need an “insert” element too.

The select statement has a “url” which maps each “key” to the web service. In this case we have two path parameters which are inserted into the URL with { } and a query parameter which is automatically added to the end. This is the entire data required to let you select from Craigslist in YQL.

YQL is easy and extensible, and best of all there are already hundreds of tables contributed by the community to all kinds of great services like Twitter, Facebook, Etsy, bit.ly?

What are you waiting for, try the console today!

3 Responses to “The Web as a Database”

  1. he select statement has a “url” which maps each “key” to the web service. In this case we have two path parameters which are inserted into the URL with { } and a query parameter which is automatically added to the end. This is the entire data required to let you select from Craigslist in YQL.

  2. It seems that you can use YQL only with public sites, would it be possible to use it, say, on an intranet site?

  3. Thanks for this info, I love the idea of being to access information from the web, like a database. And being able to return the data in any format you want, nice.

Leave a Reply

You must be logged in to post a comment.

Learn to code with Treehouse

Start your 7 day free trial today and get access to hundreds of video courses in web development, design and business!

Learn more