Matt Peperell | S is for "schema.org"

A few months ago I encountered an open set of guidelines for publishing structured data on websites. The standard is known as “schema.org”, or sometimes just “schema”, and is published at https://schema.org.

Its use is rather niche and likely of interest mainly to site creators and to authors of software that indexes web content, although I’m being somewhat reductive here.

When information is published on the web, especially as prose, a search engine cannot reliably extract the relevant details unless it is given appropriate hints.

For example, imagine a heading:

<h1>Arrival</h1>

Without any such clues, it’s impossible to know whether this refers to:

arrival instructions for some event
arrival times for a train or flight
The 2016 movie Arrival starring (amongst others) Amy Adams
Possibly something else

But the markup can be changed to include this information without affecting the presentation. In fact, I’ve marked up the movie example in that third bullet point. The markup I used is:

<span itemscope itemtype="https://schema.org/Movie">
  The <span itemprop="copyrightYear">2016</span> movie
  <span itemprop="name">Arrival</span> starring (amongst others)
  <span itemprop="actor" itemscope
    itemtype="https://schema.org/Person">
    <span itemprop="name">Amy Adams</span>
  </span>
</span>

It looks very noisy as raw markup, I’ll admit, but look back up at that third bullet point. Extra information, such as the fact that the number is a year (obvious to a human), can be parsed by compliant tools but doesn’t clutter the text the end user sees.

Another case where these hints are useful is contact details for an organisation. Ever searched for a company’s phone number in a search engine, rather than directly on their site? With appropriate hints, such as those provided by schema.org, the search engine can show the number right on the results page, because it knows to extract certain information and ignore the rest. You’ll be unsurprised to learn that there is a suitable schema definition: Organization, which has a telephone property.
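As a minimal sketch, contact details could be marked up with microdata in the same style as the movie example. The company name, phone number and email address below are placeholders, and this is only one of several valid ways to structure the markup; Organization also defines an email property alongside telephone.

```html
<!-- Placeholder organisation: name, number and address are illustrative only -->
<div itemscope itemtype="https://schema.org/Organization">
  <h2 itemprop="name">Example Ltd</h2>
  <p>
    Call us on
    <!-- The telephone property marks this text as the phone number -->
    <span itemprop="telephone">+44 20 7946 0000</span>
    or email
    <!-- itemprop sits on the inner span so its value is the plain address text -->
    <a href="mailto:hello@example.com"><span itemprop="email">hello@example.com</span></a>.
  </p>
</div>
```

To a reader this renders as an ordinary heading and sentence; the itemscope, itemtype and itemprop attributes are only there for tools that understand them.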

In some of the other posts in this series I mention various books, and I’ve made an effort to mark up those references with the appropriate schema type, Book.
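A book reference follows the same pattern. The sketch below is illustrative only: the title and author are hypothetical placeholders, not the markup from those posts. Book also defines properties such as isbn, which would let a tool identify a specific edition.

```html
<!-- Hypothetical book reference; title and author are placeholders -->
<span itemscope itemtype="https://schema.org/Book">
  <cite itemprop="name">An Example Book</cite> by
  <!-- The author is itself an item, typed as a Person -->
  <span itemprop="author" itemscope itemtype="https://schema.org/Person">
    <span itemprop="name">Jane Doe</span>
  </span>
</span>
```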

I’ve been unable to find recent adoption figures for the initiative, but the Wikipedia article currently shows the 2016 adoption figure at 17% across a subset of marketing agencies and industry-adjacent organisations. Not excellent, but perhaps the figure is higher now.

I don’t know of any browser plugins that surface this information, but I can imagine a smartphone feature that extracts a company’s phone number when you view their contact page, rather than you having to copy and paste the number into the dialler app.

The book example I gave above? What if your browser (whether on a tablet, phone, desktop, laptop or other device) were able to add the book to your wishlist at your favourite bookstore? Or reserve it at your local library?

You can likely think of more examples when browsing the available schemas.

Wishful thinking, perhaps, but wouldn’t it be nice?