The problem I have with microformats: there is no URI

I started to take a deeper look at microformats in the last few days, and it left me somewhat unsatisfied. I implemented the hCard microformat on my main page, I pinged Pingerati, and I took a look at my face in the Technorati Kitchen’s microformats search engine. The operation was easy: it took me a couple of minutes and I was already indexed in their system.
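For reference, a minimal hCard looks like this (the markup below is a sketch with my own name and URL, not the exact code from my page):

<div class="vcard">
  <a class="url fn" href="http://fgiasson.com">Frederick Giasson</a>
</div>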

Then I started to dig into the available microformats: this one is cool, this other one too, etc.

I took a deeper look at the hReview microformat: cool, a way to review anything: movies, books, people, etc.

But…

 

Where are the URIs?

I can describe my personal home page using an hCard, and someone can review my profile using an hReview. Wow, it’s cool!

But wait a minute, how can I make the link between my hCard and the hReview? Is there a way to describe a resource (in this case the resource is myself) with a URI (in this case the URI that “points” to “myself” is http://fgiasson.com)?

I performed a search on the microformats wiki with the term “URI”: 0 results.

I wonder: is it possible to assign a URI to a microformat? It seems that it is not possible.

I don’t understand; it is so important from my point of view. I want to be able to say: this information (hCard) is related to that resource (myself, my profile). I want to say:

  • This hCard belongs to that URI: http://fgiasson.com
  • I want Bob (who wrote an hReview about me) to be able to say: that hReview belongs to that URI: http://fgiasson.com

That way, the “Technorati Microformat Search Engine” could merge the information from my hCard with the review Bob wrote about me with the hReview. That way, someone who searches for “Frederick Giasson” in the search engine would come up not just with my hCard but also with the reviews people have written about me.
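To illustrate the idea, here is a hypothetical sketch using the url property that hCard and hReview items already have (this is my own convention for the example; nothing in the specifications says that this url identifies the resource, which is exactly my problem):

<!-- my hCard, with a url property -->
<div class="vcard">
  <a class="url fn" href="http://fgiasson.com">Frederick Giasson</a>
</div>

<!-- Bob's hReview: the reviewed item carries the same url -->
<div class="hreview">
  <span class="item">
    <a class="url fn" href="http://fgiasson.com">Frederick Giasson</a>
  </span>
  <span class="description">Fred is a great guy to work with.</span>
</div>

A search engine could treat the shared http://fgiasson.com value as a merge key, but that would be a heuristic, not a formally defined identity.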

Unfortunately, I don’t think it is possible to assign a URI to an hCard for the moment. So, what could we do?

 

The unAPI microformat seems to help resolve a part of the problem

So I started to dig around the Web trying to figure out if it was possible, and I found the unAPI microformat. From their website, unAPI is:

unAPI is a tiny HTTP API any web application may use to co-publish discretely identified objects in both HTML pages and disparate bare object formats. It consists of three parts: an identifier microformat, an HTML autodiscovery link, and three HTTP interface functions, two of which have a standardized response format.

Check the Revision 3 API for more information.
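For illustration, here is roughly what the unAPI pieces look like in an HTML page (a sketch based on my reading of the spec; the exact class and rel names may differ between revisions):

<!-- the identifier microformat: marks an object on the page with its identifier -->
<abbr class="unapi-id" title="http://fgiasson.com"></abbr>

<!-- the autodiscovery link: points to the site's unAPI server -->
<link rel="unapi-server" type="application/xml" title="unAPI" href="http://fgiasson.com/unapi" />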

I wasn’t satisfied by this approach.

 

From microformats to RDF using GRDDL

I also found a solution written by Danny about a year ago. The idea is to transform a microformatted document into an RDF one using GRDDL (so, XSLT).

I will not explain the whole procedure here, but I strongly suggest you read the clear explanation on Danny’s blog:

Microformats on the GRDDL
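To give a flavour of the mechanism: a GRDDL-enabled page declares the GRDDL profile and points to the XSLT transformation that extracts RDF from its markup. A minimal sketch, using the W3C’s glean-hcal.xsl transformation for hCalendar as the example (check the GRDDL documentation for the transformations available for other microformats):

<head profile="http://www.w3.org/2003/g/data-view">
  <link rel="transformation" href="http://www.w3.org/2002/12/cal/glean-hcal.xsl" />
</head>

A GRDDL-aware agent fetches the stylesheet and applies it to the page to get an RDF document.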

 

Forget microformats and adopt Embedded RDF

This is the first “solution” I had in mind when I started to think about that “problem”: why are people using microformats instead of Embedded RDF?

I have to confess: it is certainly a little bit more “complex” to implement, but with good tools it would not be.

However, in my humble opinion, the eRDF solution is much more powerful.

What is Embedded RDF (eRDF)? Embedded RDF is a way to embed RDF triples into an XHTML file. But “all HTML Embeddable RDF is valid RDF, not all RDF is Embeddable RDF”.

So, if it is possible to embed RDF documents into XHTML documents, it tells me that I can use any existing and widespread ontologies such as DC, FOAF, GEO, SIOC, etc. to describe any content available in my XHTML files, exactly as I can do with microformats but with the power of RDF.
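Here is a minimal sketch of what eRDF markup looks like (my own name and URL; the syntax follows the eRDF conventions as I understand them, where schemas are declared with link elements in the head and triples are expressed with class and rel attributes):

<head>
  <link rel="schema.foaf" href="http://xmlns.com/foaf/0.1/" />
</head>
<body>
  <p id="me" class="-foaf-Person">
    My name is <span class="foaf-name">Frederick Giasson</span> and my home page is
    <a rel="foaf-homepage" href="http://fgiasson.com">fgiasson.com</a>.
  </p>
</body>

Notice that the id attribute gives the described resource a URI (this page’s URI plus #me), which is precisely what I am missing with microformats.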

Swoogle, the semantic web search engine, is able to parse eRDF content from web pages (in fact, it already indexes 350,000 eRDF documents). So why would the microformats search engine developed by Technorati not do the same?

Personally I prefer that method to microformats because it lets me define my content in a much more powerful way. However, it is true: it is not as simple as microformats to implement.

 

Tools for eRDF

Some tools exist to handle eRDF:

 

 


Using the SIOC ontology to connect Talk Digger with other online communities

Talk Digger: “Semantic Web Ready”

As you probably know, I have been working on a new version of Talk Digger for a couple of months. One of the features I wanted for this new version was to make Talk Digger “Semantic Web Ready”: I wanted to be able to broadcast its content in such a way that people could create new services on top of it.

 

SIOC ontology to describe Talk Digger’s online community

Since then, I have mostly finished developing the infrastructure, the features and the user interface of the next version of Talk Digger, so I started to think about this problem. At first I was thinking of developing an ontology to describe what a “conversation evolving on the Web” is, and I started to refresh my memory on the best practices for developing an ontology. One of the first steps was to search for ontologies I could re-use in the one I was expecting to develop. I did my research using Swoogle, and I finally found the SIOC (Semantically-Interlinked Online Communities) ontology. I started to read their publications, the specification and the forums, and I found exactly what I needed.

At first glance, when you read the class and property names, you cannot make any correlation between that ontology and Talk Digger. However, when you start reading their descriptions and mapping the SIOC terms to the Talk Digger functionalities, you start to see how Talk Digger is becoming an online community.

Then I realized that the SIOC ontology is exactly the ontology I wanted to develop for Talk Digger (even if the names do not obviously match Talk Digger’s). Also, re-using existing ontologies is always better than creating new ones. For these reasons I chose to use SIOC to share Talk Digger’s content with the world and make it “Semantic Web Ready”.

First of all, I suggest you check these screenshots of the new version of Talk Digger before continuing to read this post:

Some screenshots of the next generation of Talk Digger

 

Mapping SIOC classes and properties to Talk Digger functionalities

The first step is to map the SIOC ontology terms to the Talk Digger web site’s functionalities. Take a look at the schema below to see that mapping. On the left you have the SIOC ontology classes and properties (I only put in the properties that create relations between classes; properties like sioc:topic, sioc:description, etc. are left off the schema for clarity’s sake). On the right you have the Talk Digger system. In the middle you have the relations between the SIOC ontology and Talk Digger.

 

Update [15 June 2006]:

This schema has changed since its first publication. I added the sioc:topic property to the sioc:Forum and sioc:Post classes. I use the tags defined by Talk Digger users to find the topics of the Forum and the Posts; the three most-used tags become the conversation topics.

Additional note:
These changes are not reflected in the RDF and SVG documents (RDF graphs) below.


[Schema: mapping between the SIOC ontology classes/properties and the Talk Digger system]

 

Description of the schema

  • The sioc:Site instance is Talk Digger’s web site (talkdigger.com)
  • A sioc:Forum is a Talk Digger conversation page. I consider that a conversation page is a forum. Its topic is related to the web document that started the conversation. So each time a new URL is tracked by Talk Digger, a new “forum” is also created. Forums are interlinked, so if URLs A and B are tracked by the system and the web document at URL B links to URL A, we will have: sioc:Forum(A) – sioc:parent_of –> sioc:Forum(B) AND sioc:Forum(B) – sioc:has_parent –> sioc:Forum(A) (see the sketch after this list)
  • A sioc:Post is a comment written by a Talk Digger user on a conversation page. So each time a user writes a comment, a sioc:Post is created in the sioc:Forum.
  • A sioc:User is a Talk Digger user. A Talk Digger user is defined by his internal and unique username. The personal description of the sioc:User is related (via the rdfs:seeAlso property) to its FOAF profile (archived in the Talk Digger system).
  • Each time a conversation page is created in the system, a related sioc:Usergroup is also created. Each time a user starts to track that conversation using Talk Digger, he also subscribes to the sioc:Usergroup. So: sioc:User(A) – sioc:member_of –> sioc:Usergroup(conversation)
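To make the forum relations concrete, here is a minimal RDF/XML sketch of two interlinked forums (the conversation URLs are invented for the example; the exact markup Talk Digger will generate may differ):

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:sioc="http://rdfs.org/sioc/ns#">

  <!-- Forum A: the conversation tracked for URL A -->
  <sioc:Forum rdf:about="http://www.talkdigger.com/conversation?url=A">
    <sioc:parent_of rdf:resource="http://www.talkdigger.com/conversation?url=B" />
  </sioc:Forum>

  <!-- Forum B: the web document at URL B links to URL A -->
  <sioc:Forum rdf:about="http://www.talkdigger.com/conversation?url=B">
    <sioc:has_parent rdf:resource="http://www.talkdigger.com/conversation?url=A" />
  </sioc:Forum>

</rdf:RDF>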

 

Special case with sioc:Forum

As I said above, two sioc:Forum instances can be linked together if URLs A and B are tracked by Talk Digger and the web document at URL B links to URL A.

But what happens if the web document at URL A links to URL B too?
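In RDF terms, we would then assert both directions at once; a sketch, using the same invented URLs and namespaces as above:

<!-- A links to B and B links to A: each forum is both parent and child of the other -->
<sioc:Forum rdf:about="http://www.talkdigger.com/conversation?url=A">
  <sioc:parent_of rdf:resource="http://www.talkdigger.com/conversation?url=B" />
  <sioc:has_parent rdf:resource="http://www.talkdigger.com/conversation?url=B" />
</sioc:Forum>

<sioc:Forum rdf:about="http://www.talkdigger.com/conversation?url=B">
  <sioc:parent_of rdf:resource="http://www.talkdigger.com/conversation?url=A" />
  <sioc:has_parent rdf:resource="http://www.talkdigger.com/conversation?url=A" />
</sioc:Forum>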

 

 

There is a circular loop in the model: each sioc:Forum is both child and parent of the other.

In the context of Talk Digger, it tells me that A is part of the conversation started by B and B is also part of the conversation started by A. It makes sense from that point of view.

However, I am not sure that it makes sense semantically in the context of the SIOC ontology.

 

sioc:reply_of and sioc:has_reply to recreate the course of events

The sioc:reply_of and sioc:has_reply properties of the sioc:Post class are really great in the context of Talk Digger (and blog comments) because systems will be able to re-create the course of events, without needing dates, just by following the graph created by these relations.
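A minimal sketch of such a reply chain (the comment URLs are invented; the point is that the ordering comes from the graph, not from dates):

<!-- comment 2 is a reply to comment 1: a reader can walk this chain to order the events -->
<sioc:Post rdf:about="http://www.talkdigger.com/comment?id=1">
  <sioc:has_reply rdf:resource="http://www.talkdigger.com/comment?id=2" />
</sioc:Post>

<sioc:Post rdf:about="http://www.talkdigger.com/comment?id=2">
  <sioc:reply_of rdf:resource="http://www.talkdigger.com/comment?id=1" />
</sioc:Post>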

 

Conclusion

In the next days I will implement the SIOC ontology in Talk Digger and then I will post a snippet of the generated code for peer review.

I am pretty sure that using the SIOC ontology in Talk Digger will give other people the possibility to use its content in some pretty novel ways.

I could even make Talk Digger a SIOC data warehouse that crawls sites publishing SIOC content and adds it to Talk Digger conversations.

 


Why use RDF instead of XML?

Danny Ayers dropped a line about that post I wrote yesterday:

Fred Giasson has a nice approach to encouraging feedreaders to do more interesting stuff (one nit – it isn’t clear from this why it’s better to use RDF than Plain Old XML with extensions).

He is right: it was really not clear. The fact is that I am not sure that I will be able to make it clearer.

The thing is that we could certainly do it using Plain Old XML and it would work quite fine. However, for me it is partly a question of vision: the Semantic Web. Why not start to implement the base of the system even if the technologies that would play with that content are not ready for prime time right now? It is surely a tradeoff: adding complexity to make the meaning of data explicit. But I think that it could pay off in the long run.

Semantic Web or not, I think there are some reasons why we should use RDF instead of XML to perform that task. I will use the words of some masterminds of the Semantic Web and its technologies to try to sketch an answer to the question asked by Danny.

I will start by using Danny’s own words:

You can’t judge the benefits of RDF by judging individual applications in isolation, any more than you could judge the web by looking at a single host. Each application probably could be simpler if it used vanilla XML instead of RDF. But by using RDF they have a level of interoperability that goes far beyond what would be available otherwise. FOAF and MusicBrainz data can be inserted directly into RSS 1.0 feeds and their meaning remains globally well-defined – the creator of a resource has the same meaning whether that resource is a document, blog item or piece of music. It means that the same tools, with little or no modification, can be used across all domains. The wheel doesn’t have to be reinvented for every purpose.

The main concepts here are interoperability and re-usability. For example, I could use the module I am developing for Talk Digger that handles FOAF profiles, with little or no modification, to handle FOAF information incorporated into an RSS 1.0 feed. I think this is a question of good system design (in the present case, the system we are talking about is the Web).
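Here is a minimal sketch of what such mixing looks like: FOAF embedded directly into an RSS 1.0 item (the item URL is invented; the same FOAF-handling code could process this snippet or a standalone FOAF profile):

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns="http://purl.org/rss/1.0/">

  <item rdf:about="http://www.talkdigger.com/conversation?url=A">
    <title>A Talk Digger conversation</title>
    <link>http://www.talkdigger.com/conversation?url=A</link>
    <!-- FOAF mixed into the feed item: the maker of the item is a person -->
    <foaf:maker>
      <foaf:Person>
        <foaf:name>Frederick Giasson</foaf:name>
        <foaf:homepage rdf:resource="http://fgiasson.com" />
      </foaf:Person>
    </foaf:maker>
  </item>

</rdf:RDF>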

In the same blog post Dan Brickley, another brain of the Semantic Web, added in the comment section:

One way to think about this: the Resource Description Framework (RDF) is a family of XML applications who agree to make a certain tradeoff for the sake of cross-compatibility. In exchange for accepting a number of constraints on the way they use XML to write down claims about the world, they gain the ability to have their data freely mixed with that of other RDF applications.

Since many descriptive problems are inter-related, this is an attractive offer, even if the XML syntax is a little daunting. MusicBrainz can focus on describing music, RSS 1.0 on describing news channels, FOAF on describing people, Dublin Core on describing documents, RdfGeo on places and maps, RdfCal on describing events and meetings, Wordnet on classifying things using nouns, ChefMoz on restaurants, and so on.

Yet because they all bought into the RDF framework, any RDF document can draw on any of these ‘vocabularies’. So an RSS feed could contain markup describing the people and places and music associated with a concert; a calendar entry could contain information about its location and expected attendees, a restaurant review could use FOAF to describe the reviewer, or a FOAF file could use Dublin Core to describe the documents written by its author, as well as homepages and other information about those authors.

So, for any particular application, you could do it in standalone XML. RDF is designed for areas where there is a likely pay-off from overlaps and data merging, ie. the messy world we live in where things aren’t so easily parceled up into discrete jobs.

But it is a tradeoff. Adopting RDF means that you just can’t make up your XML tagging structure at random, but you have to live by the ‘encoding’ rules expected of all RDF applications.

This is so that software written this year can have some hope of doing useful things with vocabularies invented next year: an unpredictable ‘tag soup’ of arbitrary mixed XML is hard to process. RDF imposes constraints so that all RDF-flavoured XML is in broadly the same style (for example, ordering of tags is usually insignificant to what those tags tell the world). Those constraints take time to learn and understand and explain, and so adopting RDF isn’t without its costs.

And so the more of us who use RDF, the happier the cost/benefit tradeoff gets, since using RDF brings us into a larger and larger family of inter-mixable data.

Tim Berners-Lee also wrote a note at the W3C in 1998 to explain Why RDF model is different from the XML model.

Finally, Lee W. Lacy, in his book OWL: Representing Information Using the Web Ontology Language, says:

While XML provides features for representing and interchanging information, it is insufficient for supporting Semantic Web requirements. The primary reasons that XML cannot support the Semantic Web directly are that XML defines syntax, not semantics, and XML descriptions are ambiguous to a computer.

XML is a data formatting language that only provides a grammar (syntax). XML tag names provide few hints to support importing software. XML tags are no better than natural language for providing meaning. XML tag names are simply strings that may be meaningful to humans but are meaningless to software. While XML tag names may suggest meaning to a human reader, they have no meaning by themselves to parsing software. A tag name (e.g., “<orange>”) could refer to a color or a fruit. The meaning of the tags must be agreed to a priori to support interoperability. XML languages often require voluminous companion documentation to explain the meaning of tags [an important point for the discussion that currently interests us].

XML does not represent the semantics, the meaning of a concept. Software must understand how to interpret XML tags to perform more meaningful applications. However, there is no straightforward way to describe semantics in traditional XML. XML formats can be specified in DTDs and XML schemas, but these specifications do not describe the relationship between resources.

XML adds structure to data on the current web. However, it is too focused on syntax and exhibits ambiguity that makes it insufficient to support the Semantic Web by itself. A common problem with using XML is that there are too many ways to describe the same thing. Semantic Web developers leverage XML’s features in different ways. They can define their own tag names, nesting structures, identifiers, and representation styles.

While XML provides a number of useful features, there are serious problems with using XML by itself to support the Semantic Web. RDF overcomes many of these challenges […]

In the light of these writings, for the case that interests us, it is a question of cross-compatibility and inter-mixability of data. In the long run, it is about the emergence of the Semantic Web and having documents that are meaningful to machines.


Next step with Web Feed Readers: from Passive readers to Active users!

You can download 129 different web feed readers at download.com. Primarily, they all do the same thing: aggregate RSS and Atom feed content. Beyond that, they differ in the way they manage and present the information. That’s it.

In that case, what is the next step with Web Feed Readers… if any?

If I look at the big picture, I can see one recurrent user state: people sitting in front of their web feed reader, passively reading an uninterrupted incoming flow of feed content.

From passive readers to active users!

I have been playing with this idea since I answered a comment from Hussein on one of my old blog posts about the security of the Gmail Atom feeds. Here is what I wrote:

Google is supposed to have tested an RSS feed service for Gmail in their GoogleLabs in 2004. I cannot confirm whether the service is still available because I do not have a Gmail account and I cannot sign up for one. This service put the new incoming messages of a Gmail account into an RSS feed. Then, if you subscribed to that feed, you would see your new Gmail messages directly in your web feed reader. What an excellent idea! However, I was surprised to find that they used SSL to create a secure channel between the feed and the feed reader.

Then I thought about all the things that we can aggregate these days: blog content, incoming emails, UPS package delivery status, calendar events, etc. Then I realized: people have all that content in front of them, but what can they do with it? Some web feed readers and other services now implement a “blog this item” feature that lets the user instantly blog about a specific item. Great: users can act on aggregated content via their feed reader. Why not extend this behavior to everything else?

 

The email example

In a hypothetical world, I receive my incoming email in my web feed reader via an RSS feed service provided by my mail provider.

What is cool is that I will receive my news, my emails, my UPS delivery status, my calendar events, etc., in the same place.

So, I just received an email from Sophie. Instead of opening my email client to answer her (which would be really, really unproductive), my web feed reader detects that the incoming feed item is an email and lets me answer directly from its interface.

Wow, one single application to do all these things.

 

How would it work?

The technologies needed to do that are already available. I will not re-open the RSS 1.0 vs. RSS 2.0 debate here, but this example is just another one in favor of using RSS 1.0 instead of RSS 2.0.


The discussion about RSS 1.0 and RSS 2.0 continues here:

Why use RDF instead of XML? [25 May 2006]

Fred, are you telling us that RSS 1.0 is much more powerful than RSS 2.0? Yes: all the power of RSS 1.0 resides in the fact that it supports modules. This capability comes from RDF and its ability to import external RDF schemas to extend its vocabulary. What is a module? A module gives the content publisher the possibility to extend his file format’s vocabulary by importing external RDF schemas.

RSS 1.0, RSS 2.0: make it simple not simpler

What web feed readers need is to know what a particular feed item is (a sort of type). We need something that tells the feed reader: this feed item is in fact an email, not a normal feed item, and here are its characteristics.

This is what RSS 1.0 modules are all about. This is a way to extend the information about an item in a web feed.

That way, I could tell feed readers that this particular feed item is not a normal one, it is an email, and here are its characteristics (sender, receiver, subject, body, attached files, etc.).

What is wonderful is that if a web feed reader cannot understand the content provided by the module, it just doesn’t have to care about it and can display the item as if it were a normal feed item. That is what is great about modules: you can act on them or just ignore them; it doesn’t break anything.

 

The email example with RSS 1.0 modules

Now, how would it work? Simple: we could create an RSS 1.0 module that describes what an email is (a module is an ontology that describes classes (sender, receiver, etc.) and their properties (subject, from, to, etc.)).

I will use the mailont ontology used in MailSMORE for my example.

Considering this module, an RSS 1.0 feed of Gmail emails would look something like this:

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:email="http://www.wam.umd.edu/~roark/mailont.rdf#"
  xmlns="http://purl.org/rss/1.0/"
>

<channel rdf:about="http://gmail.com/?_fl=rss1.0">
  <title>GMail feed</title>
  <link>http://gmail.com</link>
  <description>GMail secure feed</description>
  <items>
    <rdf:Seq>
      <rdf:li resource="http://gmail.com/getemail?r123" />
    </rdf:Seq>
  </items>
</channel>

<item rdf:about="http://gmail.com/getemail?r123">
  <title>Hello Fred! Tonight's dinner!</title>
  <link>http://gmail.com/getemail?r123</link>

  <!-- only used to describe the content provider, in this case GMail -->
  <dc:publisher>Google</dc:publisher>
  <dc:creator>GMail system</dc:creator>
  <dc:rights>Copyright © 2006 Google</dc:rights>

  <email:Message rdf:ID="CurrentMessage">
    <email:DateTime>10:05:03 25/10/2006</email:DateTime>
    <email:Subject>Hello Fred! Tonight's dinner!</email:Subject>
    <email:To>[email protected]</email:To>
    <email:From>[email protected]</email:From>
    <email:Cc></email:Cc>
    <email:MessageId></email:MessageId>
    <email:InReplyTo></email:InReplyTo>
    <email:ArchiveUrl></email:ArchiveUrl>
    <email:References></email:References>
    <email:Body>Hello Fred! It was to let you know that it's working for me for tonight's dinner. Take care! Sincerely yours, Sophie xxx</email:Body>
  </email:Message>

</item>

</rdf:RDF>

Now a web feed reader could act upon this meta-information if it is able to understand it.

Given this information, I could create a web feed reader that understands the “email” RSS 1.0 module (ontology) and acts on its content. So my web feed reader would not only display static information to its users, it would also let them act (reply to the email) according to that information!



This simple schema shows how an RSS reader would act according to the modules it understands, and ignore those it does not.


EventSites: Web Services architecture pushed to the extreme

I just finished reading an article on Ajaxian that pointed me to Eventsites.

What is EventSites?



Eventsites is a mashup. It’s a web application that uses data and functionality provided by external web services.

  • Eventsites uses no database of its own
  • The only server logic is a single redirect script enabling cross-domain XHR requests
  • All the functionality of the site comes from other websites



So how does it work? Simple: it uses existing web services like EVDB, Google Maps and Flickr to create a service of its own: EventSites. EventSites is only an interface, a sort of hub that links these three web services together. EVDB archives the events’ information, Google Maps shows where the events are in the world, and Flickr hosts the events’ photos.

Some problems with EventSites

One of the small problems I see with EventSites (as with too many other Ajax web sites) is that it only works in Firefox. The problem with Ajax interfaces is that you have to use tricks to make them work the same way in all browsers. This is really time consuming, and it is why some developers only make them work in Firefox and/or Internet Explorer. Personally, if I am not able to make a feature work in Opera, Safari, Firefox and Internet Explorer, I do not implement it in the web interface.

Another thing is that most of EventSites’ source code is publicly visible because it is written in JavaScript. It uses some server-side code, but not much. This is not a problem in itself, but we will see many more copycat web sites wandering the Web than we currently see with bookmarking web sites, for example.

Sharing the functionalities

The idea of a Web built from web services is somewhat old. However, it is not until recently that web developers (most without any budget) have been able to create useful mashups using the existing web services. A mix of powerful open source technologies and low-cost hardware and bandwidth has created a great environment that lets us see such services emerging on the Web, and EventSites is one great example.

Sharing the content

More and more functionality is freely shared on the Web via web services. But functionality is one thing, and content is another. In the best of worlds we will have both: services that share functionality, and services that share content.

The first step in that direction has been taken: a major content publisher created an architecture to freely share its content, meaningful to machines (using ontologies and ontology languages to format the content files), over the Internet. Yup, I am talking about the BBC.

Naturally, that is the direction in which I am pushing Talk Digger (as a content-sharing web service). Using an ontology describing what a [Web] conversation is, other people will be able to retrieve, analyze and display its information in their mashups.

I can’t wait to see which services will emerge in such a network environment.

Other resources about EventSites:

Eventsites: serverless web-development
Eventsites Architecture
Componentising the web
