Semantic Web

Converting your WordPress and Mediawiki data into RDF on-the-fly

Semantic Web (RDF) data won’t come from initiatives such as LiveJournal.com and Tribe.net exporting their user profiles into RDF using the FOAF ontology; at least not at first. These initiatives are marginal considering the current state of the Web: billions of web pages, most of them stored in relational databases and generated on the fly as HTML.

Semantic Web (RDF) data will come from the conversion of the relational databases of widely used web software such as WordPress, Mediawiki and phpBB into RDF using some ontologies. Some methods can be used:

This blog post will show you how we can do the same with your WordPress blog and your Mediawiki wiki using Virtuoso RDF Views.

This is quite powerful: by using these views any WordPress or Mediawiki instance could be queried using SPARQL. Other views could easily be created for phpBB (currently on the way), and virtually any relational database accessible from the Web.

Since developing these views is quick and simple, they are certainly one of the best tools for converting current relational data sources into RDF.

WordPress and Mediawiki RDF Views


Mitko Iliev developed these two RDF Views, which take the WordPress and Mediawiki database schemas and convert them into RDF. I added some comments in the code, but as you will notice, the views are quite simple and intuitive to understand (if you have some knowledge of SPARQL).

Installing these RDF Views

You have three options for installing these RDF Views.

  1. If you have the commercial version of Virtuoso, you only have to connect the remote MySQL database to Virtuoso via Conductor. That way you will see the MySQL databases as if they were local to Virtuoso.
  2. If you have the open-source version of Virtuoso, you have two choices:
    1. Make a SQL dump of the MySQL database and import it into Virtuoso.
    2. Install the upgraded version of WordPress or Mediawiki developed by OpenLink Software. These upgraded versions use Virtuoso as the DBMS instead of MySQL. They should be made available to the public by OpenLink soon.

The idea here is to give Virtuoso access to the relational data by using one of these three methods. After that, it is just a matter of sending SPARQL queries against the RDF View.
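To make "sending SPARQL queries" a bit more concrete, here is a minimal Python sketch of how a query can be packaged as a SPARQL Protocol GET request. The endpoint host and the graph IRI are placeholders I made up for illustration (Virtuoso conventionally serves SPARQL at /sparql on port 8890, but your installation may differ), and build_sparql_url is a hypothetical helper, not part of Virtuoso.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: a local Virtuoso instance on its conventional port and path.
ENDPOINT = "http://localhost:8890/sparql"


def build_sparql_url(endpoint: str, query: str,
                     default_graph: Optional[str] = None) -> str:
    """Return a SPARQL Protocol GET URL carrying `query`.

    `query` and `default-graph-uri` are the parameter names defined by the
    SPARQL Protocol; everything else here is illustrative.
    """
    params = {"query": query}
    if default_graph is not None:
        params["default-graph-uri"] = default_graph
    return endpoint + "?" + urlencode(params)


# Ask for a handful of triples from a placeholder graph IRI.
query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
url = build_sparql_url(ENDPOINT, query, "http://example.org/wordpress")
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client; the graph IRI to use is whatever the RDF View was declared with.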

Querying a MediaWiki instance using SPARQL


I will use that MediaWiki instance to show you a couple of examples. This is a modified version of MediaWiki 1.7 that uses Virtuoso instead of MySQL as its DBMS. We then installed the RDF View I talked about above. From that point, we can query this Mediawiki instance using SPARQL. Remember that it is still running on a relational database, but thanks to the RDF View, we can view its data as RDF too!

  • Listing all triples from the RDF view: See results
  • Listing the names of the Wikis hosted on this server: See results
  • Listing the wiki pages of the “DemoWiki” wiki instance: See results
  • Listing the wiki pages created by the “demo” user: See results

Etc.
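As a rough illustration of the first query in the list (all triples from the RDF View), the Python sketch below builds the query text and flattens a result document in the standard SPARQL JSON results format. The graph IRI, the vocabulary IRI and the sample response are invented placeholders, not output from the demo server, and both function names are hypothetical.

```python
import json


def all_triples_query(graph_iri, limit=100):
    """Build a 'list every triple' query against one named graph."""
    return (f"SELECT ?s ?p ?o FROM <{graph_iri}> "
            f"WHERE {{ ?s ?p ?o }} LIMIT {limit}")


def rows_from_results(results_json):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable may be unbound in a given row, so check before reading.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


# Invented sample response, only to show the shape of the format.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["s", "p", "o"]},
  "results": {"bindings": [
    {"s": {"type": "uri", "value": "http://example.org/wiki/Main_Page"},
     "p": {"type": "uri", "value": "http://example.org/vocab#title"},
     "o": {"type": "literal", "value": "Main Page"}}
  ]}
}
"""

print(all_triples_query("http://example.org/wiki", limit=5))
for row in rows_from_results(SAMPLE_RESPONSE):
    print(row["s"], row["p"], row["o"])
```

The other queries in the list would follow the same pattern, with triple patterns that depend on the ontology the RDF View maps to.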

We could continue like that endlessly. What I suggest is to click on the results you get in these web pages, and then on the “explore” link. That way, you will jump from node to node and find interesting stuff.

Conclusion

I believe that this is the best way to push people to adopt the semantic web, and all its concepts, as the way to describe things on the Web. Once we get all that useful data from existing sources (musicbrainz, US census data, geonames, you name it) and people start to release services that use all this data in a useful way, then people will start to generate their content for the semantic web. This is why we should continue in that direction. Many people are already working to convert existing sources of data (relational databases, web APIs, etc.) into RDF: the linked-open-data community, Zitgist, OpenLink, and probably many others. I would guess (in fact I am sure) that in one year we will have several billion triples ready to be searched and browsed by Web users.

6 thoughts on “Converting your WordPress and Mediawiki data into RDF on-the-fly”

  1. Let others know about your data!

    Once you have your RDF view or dump retrievable by resolving a URL, make sure other people know that it exists so that it can be indexed and the information can be meshed with other applications.

    To do so, ping PingTheSemanticWeb.com (they will then put your page in an RSS list of “updated RDF”) and/or ping Sindice.com.

    Sindice will index your RDF file so that your RDF location (URL) will be returned to those who request information sources about the concepts (URIs) you talk about. Using Sindice’s simple RESTful API, clients can then integrate your information for whatever purpose.

    Giovanni

  2. Not convinced about some of the arguments you mention. Having developed the SIOC plugin for WordPress, I can say that, yes, it takes some time to develop, but so do RDF Views. Once developed, though, such a plugin just works.

    What is easier: installing one single plugin for WordPress, or installing a custom upgraded version of WordPress?

    In summary: Virtuoso is a larger architecture that enables SPARQL queries, giving more power at the price of installing custom software. The SIOC plugin is a pragmatic solution that aims to put blogs on the Web of Data and lets users of the data do all the fancy stuff.

    Both are useful and possibly complement each other 🙂

  3. Hi Uldis!

    Yeah, sure, you are right. My examples are possibly not that good; however, the idea was to show that RDF can be made from any relational database source through the Views (so, looking at the semantic web through a lens), and that this holds even if no plugin API is available for a given system 🙂

    But all this is arguable. Personally, for example, it would take me much less time to map any system using an RDF View over a remote RDB than to learn the whole API needed to develop plugins. But yes, you need the Virtuoso commercial edition to do that in such a straightforward way. Anyway, there are so many ways to create RDF data; people have a choice, and that is what counts in the end 🙂

    Take care,

    Fred

  4. Hi Giovanni!

    Thanks for pointing out that people should notify such services so that web services can find their location on the Web.

    Take care,

    Salutations,

    Fred

  5. Giovanni has it to a T.
    At such an early stage in the era of RDF, propagation and reproduction of semantic data inside this wild wild web is what matters most, not just creating infanticide RDF.

    Secondly, RDF, just like any other semantic format/language, needs to be generated effortlessly, in a stealth fashion, and, as you rightly point out, using popular tools.

    Obviously your WordPress plugin could prove a godsend in that direction.
    However, I’ve still got to get my head around this wonderful OpenLink Virtuoso, and it ain’t the easiest of tasks.

    Bon Courage

    LSJ
