Think back to when you first began using the web. It might have been a couple of years ago, or more probably five, ten or even more. How did you use the web back then? More likely than not, you searched using Google, or Lycos, or AltaVista (yeah, I’m old, I remember when AltaVista came along and made web search actually work), or if you are really old like me, you’ll have used a directory – back in the day when human editors could look at all the sites in the world, and manually classify them into directories like Yahoo!, LookSmart, or the Open Directory Project. That approach didn’t seem to scale past a few million sites.
But our usage patterns then and now are largely the same. Search engines indexed pages based on page content, and other measures of relevance; users searched using keywords, and looked perhaps at the first 5, 10, or at a pinch 20 results. And this is largely how we use the web today. We seek out atomic pieces of information – like pages, and we read them.
But the pages on the web aren’t simply discrete islands of information. Often they contain very similar information – from mundane things like contact details for individuals and organizations, to reviews of films, restaurants and books, to opinions about what’s worth reading online. Yet aggregating all those contact details to create a distributed Yellow Pages-like service, or those opinions about restaurants, books, films and gadgets to create decentralized versions of IMDB reviews, Amazon book reviews, or Zagat, is very hard, because there are no standardized formats on the web for contact details, reviews, and many other kinds of commonly published information. The wisdom of the web, the collective opinions and attitudes expressed there, is largely inaccessible, because software isn’t smart enough to recognize it.
Say you want to sell your car. How do you go about doing that? As with IMDB for film reviews, or Amazon for book reviews, you find a centralized place to post a classified listing – or maybe several. Each requires your effort and time to fill in details, and each possibly requires some form of payment. Each a walled garden.
But what if we could somehow post this listing to our own blog, and then easily let services which care about classified listings know that there is a new or updated classified at our site? The missing piece that would enable this is a standard format (after all, HTML doesn’t have an element for classified listings).
I’m sure you can easily imagine all kinds of situations like these, where centralized solutions are required, simply because there is no standardized, semantic way of marking up the data required for a distributed or devolved system to work.
So data stays locked up in proprietary databases, despite effectively being public data. Yellow Pages-style directories are jealously guarded by their compilers even though it’s our details which are being kept. Amazon owns your reviews, and reserves the right to edit or delete them. And all this cool data, which could be mashed up in amazing ways, is largely inaccessible.
The open source effect
The last decade or so has seen a growing awareness of the importance of open source in the software world, and of open document formats (standards like HTML, CSS and PNG), but we are only just beginning to understand the importance and power of open data. Plenty of open data, published under Creative Commons and other liberal licenses, is available. But we can publish all the data we like: without a way of making it accessible to software via meaningful formats, it’s of little practical value.
But where will these formats, for reviews, for classifieds listings, for citations, and as yet undreamt of uses come from?
- Do we wait for the W3C to give us XHTML2, complete with all the semantics we might ever need? It seems a far-fetched hope. A foolish one, in reality.
- Do we all invent new XML languages for our specific needs – like reviewML, classifiedML? That seems overkill for discrete chunks of information that lots of people are already publishing as it is.
- Do we need all of today’s web developers, who let’s face it often don’t even do HTML all that well, to become XML developers? That’s hardly realistic.
But if we think for a moment about how the web has developed so successfully – it’s been incremental, evolutionary, not revolutionary – we might find a way forward that doesn’t break our tools and browsers, or force developers to learn a whole new set of skills and concepts. This is the approach that microformats take: simple, open web data formats, built on existing data formats and existing developer practices, that enable and encourage decentralized development, content and services.
So let’s just take a quick look at what microformats are and aren’t, then we’ll look at some very significant organizations already using them. Then we’ll take a look at just how easy it is to use them.
What microformats are
Microformats are
- simple
- HTML based
- data formats
- based on existing standards
- based on current developer practice
to bring richer semantics to today’s web, and enable decentralized services
- without breaking browsers
- without breaking tools
- without requiring massive changes to developer knowledge
What microformats aren’t
Microformats aren’t a whole new language. They aren’t a whole new way of developing for the web. They don’t try to solve all the web’s problems. And they certainly aren’t hard to use. If you use valid HTML or XHTML, and have used the class and id attributes, then you have all the know-how you need to use microformats right now. And if you don’t, a few minutes looking at some of these articles about the proper use of these attributes will get you up to speed:
- Competent Classing, by Eric Meyer
- A touch of class, by Tantek Çelik
- Use class with semantics in mind, W3C
- More about the class attribute, by Tantek Çelik
OK, but who is actually using microformats?
Digital Web magazine recently published an article of mine which has a roundup of what big and small publishers, like Yahoo! and Cork’d, are doing with microformats, what support for microformats is available in tools like Dreamweaver, WordPress, Movable Type, and Drupal, and what aggregators like Technorati and Edgeio are doing with microformatted content on the web. So rather than rehash that here, take a look at the state of the art in mid 2006. You’ll most likely be surprised at just how widely used they are.
So how do we use them? It’s hard, right?
If you can use HTML, microformats are simple. I mentioned that there are tools and plugins for some of the web’s most popular content development and design tools, but let’s take a look at how to hand-code a really commonly used, and quite sophisticated, microformat: hCard.
One of the most common pieces of information on the web is contact information for people and companies.
Here is how we might typically mark up a company’s contact details, taken from the original markup for our conference company, Web Directions:
Web Directions Conference Pty Ltd
8/54 Mitchell St
Bondi NSW 2026
Australia
Phone/Fax: 61 2 9365 5007
Email: info@…
But, we can add a little microformatty goodness, and make this much more machine readable.
We use the hCard microformat – which, for the geeks in the audience, is just vCard in HTML. vCard is the format we all use, probably without knowing it, for address books.
Here is how easy it is. First, we want to say that this whole construct is an hCard, so we give our root element a class value of vcard:
<div class="vcard">
<p>Web Directions Conference Pty Ltd</p>
<p>8/54 Mitchell St</p>
<p>Bondi NSW 2026</p>
<p>Australia</p>
<p>Phone/Fax: 61 2 9365 5007</p>
<p>Email: info@…</p>
</div>
Now, we’ll designate the name of the organization for this hCard:
<div class="vcard">
<p class="fn org">Web Directions Conference Pty Ltd</p>
<p>8/54 Mitchell St</p>
<p>Bondi NSW 2026</p>
<p>Australia</p>
<p>Phone/Fax: 61 2 9365 5007</p>
<p>Email: info@…</p>
</div>
Here fn is vCard’s way of saying the display (‘formatted’) name, and org is the organization. vCard comes from the prehistoric times of the early 1990s, and back then we were really concerned about wasting bytes, so almost everything gets abbreviated.
Next we have the address bit, so we mark that up as follows:
<div class="vcard">
<p class="fn org">Web Directions Conference Pty Ltd</p>
<div class="adr">
<p class="street-address">8/54 Mitchell St</p>
<p><span class="locality">Bondi</span> <span class="region">NSW</span> 2026</p>
<p class="country-name">Australia</p>
</div>
<p>Phone/Fax: 61 2 9365 5007</p>
<p>Email: info@…</p>
</div>
Here we’ve had to add a little bit of HTML, the span elements, to create more semantic containers for our chunks of information – the various components of the address, such as region and postal code. All these class names, by the way, are taken directly from the field names in the original vCard specification. So the semantics defined by the fields of vCard become the semantics of hCard. We are ‘reusing existing formats and schemas’ rather than reinventing the wheel, a key principle of the microformats way.
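To make that correspondence between hCard class names and vCard fields a little more concrete, here is a minimal Python sketch that writes out vCard 3.0 text from a set of hCard properties. The function names, the dictionary shape and the sample values are my own assumptions for illustration, not part of any microformats tool, and the empty N field assumes the card describes an organization rather than a person.

```python
# Order of the ADR components in a vCard, named with their hCard class names.
ADR_COMPONENTS = [
    "post-office-box", "extended-address", "street-address",
    "locality", "region", "postal-code", "country-name",
]


def escape(value):
    """Backslash-escape the characters that act as separators in vCard text."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,"))


def hcard_to_vcard(card):
    """Serialize a dict of hCard properties (hypothetical shape) as vCard 3.0."""
    lines = ["BEGIN:VCARD", "VERSION:3.0"]
    # vCard 3.0 requires an N field; it is left empty here on the assumption
    # that the card describes an organization, not a person.
    lines.append("N:;;;;")
    lines.append("FN:" + escape(card["fn"]))
    if "org" in card:
        lines.append("ORG:" + escape(card["org"]))
    # ADR is a single field whose components are separated by semicolons;
    # components missing from the hCard are left empty.
    if any(name in card for name in ADR_COMPONENTS):
        parts = [escape(card.get(name, "")) for name in ADR_COMPONENTS]
        lines.append("ADR:" + ";".join(parts))
    for prop in ("tel", "email"):
        if prop in card:
            lines.append(prop.upper() + ":" + escape(card[prop]))
    lines.append("END:VCARD")
    return "\r\n".join(lines) + "\r\n"


# The properties marked up so far in the Web Directions example.
card = {
    "fn": "Web Directions Conference Pty Ltd",
    "org": "Web Directions Conference Pty Ltd",
    "street-address": "8/54 Mitchell St",
    "locality": "Bondi",
    "region": "NSW",
    "country-name": "Australia",
}
vcard_text = hcard_to_vcard(card)
print(vcard_text)
```

Notice that the keys of the dictionary are exactly the class names from the markup. That is the whole point: no mapping table is needed beyond the order of the address components.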
Next we have the telephone and email parts:
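<div class="vcard">
<p class="fn org">Web Directions Conference Pty Ltd</p>
<div class="adr">
<p class="street-address">8/54 Mitchell St</p>
<p><span class="locality">Bondi</span> <span class="region">NSW</span> 2026</p>
<p class="country-name">Australia</p>
</div>
<p>Phone/Fax: <span class="tel">61 2 9365 5007</span></p>
<p>Email: <span class="email">info@…</span></p>
</div>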
And we are more or less done. We could now go and remove our own earlier ‘pseudo semantic’ markup, like this:
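<div class="vcard">
<p class="fn org">Web Directions Conference Pty Ltd</p>
<div class="adr">
<p class="street-address">8/54 Mitchell St</p>
<p><span class="locality">Bondi</span> <span class="region">NSW</span> 2026</p>
<p class="country-name">Australia</p>
</div>
<p>Phone/Fax: <span class="tel">61 2 9365 5007</span></p>
<p>Email: <span class="email">…</span></p>
</div>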
And now we have a completely standardized hCard for the contact information for Web Directions.
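To see why this matters to software, here is a rough Python sketch of how an aggregator might pull the contact details back out of a page. It is an illustration only, not a complete hCard parser: the class and function names are hypothetical, the element names in the sample (div, p, span) are an assumption since only the class values matter here, the email line is left out, and it expects well-formed markup where every non-void element is closed.

```python
from html.parser import HTMLParser

# The hCard property class names used in this article's example.
HCARD_PROPERTIES = {
    "fn", "org", "street-address", "locality",
    "region", "country-name", "tel", "email",
}

# Elements with no closing tag; they are ignored so the stack stays balanced.
VOID_ELEMENTS = {"br", "img", "hr", "input", "meta", "link"}


class HCardParser(HTMLParser):
    """Collect the text of hCard properties inside class="vcard" elements."""

    def __init__(self):
        super().__init__()
        self.cards = []    # one dict of properties per hCard found
        self._card = None  # the hCard currently being read, if any
        self._open = []    # (is_root, property_names, text_chunks) per open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        classes = (dict(attrs).get("class") or "").split()
        is_root = self._card is None and "vcard" in classes
        if is_root:
            self._card = {}
        props = []
        if self._card is not None:
            props = [name for name in classes if name in HCARD_PROPERTIES]
        self._open.append((is_root, props, []))

    def handle_data(self, data):
        # Text belongs to every open element that carries a property.
        for _, props, chunks in self._open:
            if props:
                chunks.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self._open:
            return
        is_root, props, chunks = self._open.pop()
        text = " ".join("".join(chunks).split())  # collapse whitespace
        for name in props:
            self._card[name] = text
        if is_root:
            self.cards.append(self._card)
            self._card = None


def parse_hcards(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


SAMPLE = """
<div class="vcard">
  <p class="fn org">Web Directions Conference Pty Ltd</p>
  <div class="adr">
    <p class="street-address">8/54 Mitchell St</p>
    <p><span class="locality">Bondi</span> <span class="region">NSW</span> 2026</p>
    <p class="country-name">Australia</p>
  </div>
  <p>Phone/Fax: <span class="tel">61 2 9365 5007</span></p>
</div>
"""

cards = parse_hcards(SAMPLE)
for name, value in cards[0].items():
    print(name, "=", value)
```

The parser never looks at which elements we used, only at the class values, which is why we were free to keep our existing markup and simply add classes to it.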
And, as we mentioned, if you are even lazier there are tools for Dreamweaver, for various blogging systems and CMSs like WordPress and Textpattern, and the online hCard creator, to do all the heavy lifting for you.
Some other common and widely used microformats, which are no harder to use than this and might well fit your existing development needs, are:
- hCalendar – for all kinds of events – used by upcoming.org and Eventful as their publishing formats
- hReview – for reviews of all kinds of stuff – used by Yahoo! Local and Yahoo! Tech, and the really cool Cork’d among others
- rel-tag – for tagging your pages – widely used by bloggers, and aggregated by Technorati, with over 100 million tagged pages indexed
- VoteLinks – for voting on stuff – see the cool experimental site http://folkse.de
- XFN – the original microformat, for describing your relationships with people – friends, colleagues, whom you’ve met, and so on
- hResume – for marking up your resume
- hListing – for classified listings – used by Edgeio, a real innovator in decentralized services
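Of the formats in this list, rel-tag is probably the simplest to consume: a link carrying rel="tag" declares a tag, and the tag itself is the last segment of the link’s URL path. Here is a small Python sketch of how an aggregator might collect tags from a page. The class and function names and the example.com URLs are made up for illustration, and ignoring a trailing slash is my assumption about how to treat such URLs.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelTagParser(HTMLParser):
    """Collect tag names from links marked up with rel="tag"."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, e.g. rel="tag nofollow".
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "tag" not in rel_values or not href:
            return
        # The tag is the last path segment of the URL, not the link text.
        # Assumption: a trailing slash is ignored.
        path = urlparse(href).path.rstrip("/")
        last_segment = path.rsplit("/", 1)[-1]
        if last_segment:
            self.tags.append(unquote(last_segment))


def extract_tags(html):
    parser = RelTagParser()
    parser.feed(html)
    parser.close()
    return parser.tags


SAMPLE = (
    '<p>Filed under '
    '<a href="http://example.com/tags/microformats" rel="tag">microformats</a> and '
    '<a href="http://example.com/tags/web%20standards/" rel="tag">web standards</a>. '
    'See also <a href="http://example.com/about">about this site</a>.</p>'
)

tags = extract_tags(SAMPLE)
print(tags)
```

Because the tag comes from the URL rather than the link text, the visible text of the link can say anything you like without changing what the page is tagged with.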
So what are you waiting for? Get out there and start marking up your content with microformats. You’ll be adding to the semantic richness and usefulness of the web, you’ll be bringing a new sophistication and consistency to your markup, and you’ll be helping the next great evolutionary step in the web – a genuinely semantic web of data, and not just pages.
Some further reading and resources
- Microformats.org, the home of microformats
- Microformatique the unofficial microformats blog
- Introduction to Microformats by Brian Suda, O’Reilly “ShortCuts” series, PDF
- Microformats cheat sheet by Brian Suda