Archive for April, 2007

Gone for the next 2 weeks

I am leaving tomorrow morning for California and will be away until May 11. Until then I will be reachable via email, but with some latency, so please pardon me if I don’t answer the same day you send the email.

Do not hesitate to send me an email if you have questions, comments or suggestions about my work. It will be a pleasure for me to answer them; it will just take a little longer than usual.

Zitgist in one image

[Image: zitgisthome3.gif]

Converting your WordPress and Mediawiki data into RDF on-the-fly

Semantic Web (RDF) data won’t come from initiatives such as LiveJournal.com and Tribe.net, which export their user profiles as RDF using the FOAF ontology; at least not at first. These initiatives are marginal considering the current state of the Web: billions of web pages, most of which are stored in relational databases and generated on the fly as HTML.

Semantic Web (RDF) data will come from the conversion of the relational databases behind widely used web software such as WordPress, Mediawiki and phpBB into RDF, using suitable ontologies. Several methods can be used:

This blog post will show you how we can do the same with your WordPress blog and your Mediawiki wiki using Virtuoso RDF Views.

This is quite powerful: by using these views any WordPress or Mediawiki instance could be queried using SPARQL. Other views could easily be created for phpBB (currently on the way), and virtually any relational database accessible from the Web.

Since developing these views is quick and simple, they are certainly among the best tools for converting existing relational data sources into RDF.

WordPress and Mediawiki RDF Views


Mitko Iliev developed these two RDF Views, which take the WordPress and Mediawiki database schemas and expose them as RDF. I added some comments in the code, but as you will notice, the views are quite simple and intuitive to understand (if you have some knowledge of SPARQL).

Installing these RDF Views

There are three ways to install these RDF Views.

  1. If you have the commercial version of Virtuoso, you only have to connect the remote MySQL database to Virtuoso via Conductor. That way you will see the MySQL databases as if they were local to Virtuoso.
  2. If you have the open-source version of Virtuoso, you have two choices:
    1. You make a SQL dump of the MySQL database and import it into Virtuoso.
    2. You install the upgraded version of WordPress or Mediawiki developed by OpenLink Software. These upgraded versions of WordPress and Mediawiki use Virtuoso as their DBMS instead of MySQL. These two versions should be made available to the public by OpenLink soon.

The idea here is to give Virtuoso access to the relational data using one of these three methods. After that, it is just a matter of sending SPARQL queries against the RDF View.

Querying a MediaWiki instance using SPARQL


I will use that MediaWiki instance to show you a couple of examples. This is a modified version of MediaWiki 1.7 that uses Virtuoso instead of MySQL as its DBMS. We then installed the RDF View I talked about above. From that point, we can query this Mediawiki wiki instance using SPARQL. Remember that it is still running on a relational database, but thanks to the RDF View, we can view its data as RDF too!

  • Listing all triples from the RDF view: See results
  • Listing the names of the Wikis hosted on this server: See results
  • Listing the wiki pages of the “DemoWiki” wiki instance: See results
  • Listing the wiki pages created by the “demo” user: See results
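
As a rough illustration of what sits behind links like these, the sketch below builds a SPARQL Protocol GET URL for the first query in the list (all triples, capped with a LIMIT). The endpoint address http://localhost:8890/sparql and the helper name sparql_get_url are assumptions made for illustration; they are not taken from the demo server. The other three queries depend on the vocabulary used by the Mediawiki view, which is not reproduced here.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: a local Virtuoso instance exposing its SPARQL endpoint here.
ENDPOINT = "http://localhost:8890/sparql"

# First query from the list above, capped so the result page stays small.
ALL_TRIPLES = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"


def sparql_get_url(endpoint: str, query: str, graph: Optional[str] = None) -> str:
    """Build a SPARQL Protocol GET URL. Nothing is sent over the network."""
    params = {"query": query}
    if graph is not None:
        # Standard SPARQL Protocol parameter naming the default graph.
        params["default-graph-uri"] = graph
    return endpoint + "?" + urlencode(params)


url = sparql_get_url(ENDPOINT, ALL_TRIPLES)
print(url)
```

Pasting the printed URL into a browser would run the query against whatever endpoint is really listening at that address.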

Etc.

We could continue like that endlessly. What I suggest is that you click on the results you get in these web pages, and then on the “explore” link. That way, you will jump from node to node and find interesting stuff.

Conclusion

I believe that this is the best way to push people to adopt the Semantic Web, and all its concepts, as the way to describe things on the Web. Once we get all that useful data from existing sources (musicbrainz, US census data, geonames, you name it) and people start to release services using all this data in a useful way, then people will start to generate their content for the Semantic Web. This is why we should continue in that direction. Many people are already working to convert existing sources of data (relational databases, web APIs, etc.) into RDF: the linked-open-data community, Zitgist, OpenLink, and probably many others. I would guess (in fact I am sure) that in one year we will have several billion triples ready to be searched and browsed by Web users.

The XBRL Ontology: Financial and Economic Ontology based on XBRL Taxonomies

A new ontology development group has been formed: the XBRL Ontology Specification Group. This new ontology will describe financial and economic data in RDF.

Introduction to the XBRL Ontology

As introduced by Kingsley:

The parallel evolution of the XBRL and the Semantic Web is one of the more puzzling of technology misnomers. The Semantic Web expresses a vision about a Web of Data connected by formal meaning (Context). Congruently, XBRL espouses a vision whereby by formally defined Financial Data is accessible via the *Web (and other networks). The Semantic Web uses Schemas and Ontologies for defining Data Domains while XBRL uses Taxonomies that are XML Schema Based. The Semantic Web uses XML as one of its Data Interchange formats (i.e RDF/XML) while XBRL is based on XML at all levels (model and instance data).

It is the goal of the XBRL Ontology project that we mesh the XBRL and Semantic Web realms by producing OWL based Ontologies of XBRL Schemas that facilitate the generation of RDF Instance Data for XBRL Data Sources (e.g. XBRL Documents). This effort is not intended to supercede the use of XML Schemas in XBRL in any way. It simply provides a mechanism for exposing XBRL based Financial Data to the Semantic Web.

What are the anticipated deliverables:

  • OWL Ontologies for XBRL Taxonomies such as the XBRL GL (and others)
  • RDF instance data for said Ontologies
  • SPARQL (Semantic Web Query Language) based Access Points for XBRL Instance Data

Benefits:

  • Transparent integration of disparate financial systems
    • Mapping of application data (e.g. SQL) to relevant XBRL Ontologies which are then exposed to WAN (Web) or LAN (Intranet) via SPARQL access points
  • Easy mechanism for plugging into burgeoning Semantic Data Web

People currently participating in the project

Some people have already started discussing the development of the XBRL Ontology and are interested in joining (or have already joined) this new ontology development group. These people are:

Development communication infrastructure

Some systems are already up and running to help the development team to communicate their ideas, suggestions and questions vis-à-vis the XBRL Ontology.

Conclusion

This new ontology development project aims to describe financial and economic data for exchange and analysis. Some people have already started to work on the project, as you can see in the list above. The development of this ontology will be based on the XBRL initiative and existing XBRL taxonomies. But it won’t restrict its expressiveness to XBRL-related work only.

Zitgist.com website now online

I am pleased to announce that we have finally put the Zitgist.com website online. Some more information about the future service is available there. People can start to link to zitgist.com when referring to this upcoming semantic web search engine. Eventually people will be able to fill in a subscription form to get a private account to test the alpha version of the service. Finally, a developer wiki will be available to explain how people should describe their content in RDF for better indexing by the system, how they could take advantage of Zitgist, how they can interact with it, etc.

Also check out the new logo for the Music Ontology. This is one more step to brand the Music Ontology and help its adoption among people, companies and the community.

Musicbrainz Relational Database mapped to RDF using the Music Ontology

I am pleased to publish some information about the mapping of the Musicbrainz relational database into RDF using the Music Ontology, as I promised some time ago. I know that I am late on this one, but I was waiting for some things to be released before publishing this blog post.

This is the first step we have to take before getting a “physical” RDF dump of the musicbrainz data. This first step is to use a Virtuoso RDF View to view the musicbrainz relational database as an RDF triple store.

Introduction to Virtuoso RDF Views

Carl Blakeley of OpenLink Software Inc. just published a first Virtuoso RDF View tutorial called “Mapping Relational Data to RDF with Virtuoso’s RDF Views”. This article explains how to define RDF Views inside Virtuoso and how they work.

The first step would be to read that document to make sure you understand how the mapping of the Musicbrainz data into RDF has been performed using Virtuoso.

RDF/XML presentation of the mapping

I have written an RDF/XML file explaining which parts of the Musicbrainz database schema the data in the RDF View comes from. This is a good starting point to “feel” how the Music Ontology can be used to express musical things such as Artists, Bands, Records, Tracks, etc.; and to see how the Musical Created Workflow supporting the Music Ontology is used in that case.

The Musicbrainz RDF View

This is the RDF View enabling the Musicbrainz relational database to be viewed as an RDF source that can be queried using SPARQL. This view virtualizes the descriptions of mo:MusicArtist, mo:MusicGroup, mo:Records and mo:Tracks, as well as mo:Performance, mo:Signal, mo:Composition, etc.

Using the RDF View

Installing the Musicbrainz Database instance (the quick guide)

The first step is to download the Musicbrainz DB and to install it on a PostgreSQL server instance. Follow these steps.

Note: I will try to make that guide as short as possible, so if there are steps that you don’t understand or that don’t work for you, please leave a comment on this blog post or send me an email.

Installing Virtuoso

To use the RDF View, you will first have to install Virtuoso 5.0 on your computer. OpenLink Virtuoso comes in two flavours: Open Source and Commercial. The difference, besides the obvious, is that the commercial versions include Virtual Database functionality, which makes the following step easier, as the relational data may remain in the PostgreSQL database.

Linking PostgreSQL tables to Virtuoso via ODBC

For the Open Source Edition:

With the Virtuoso Open Source Edition 5.0 you will have to export the data from the PostgreSQL server and import it into Virtuoso’s native DBMS.

For the Commercial Edition:

Once the Virtuoso instance is running, open a browser window to access Conductor by going to http://localhost:8890/conductor/. This is a web-based DBMS manager, like phpMyAdmin but for Virtuoso. You may then use it to attach the tables through ODBC. Note: you should have a PostgreSQL ODBC driver installed to perform the following steps.

You should see the PostgreSQL instance connection in the list. You only have to click on “connect” and enter the credentials, and the Virtuoso server should get connected to the running PostgreSQL instance.

After that, click on “External Linked Objects” to connect the remote PostgreSQL tables with Virtuoso. Take a special look at the schemas created by these links. The remote tables should be available via the schema “DB.[ODBC driver name].[remote table name]”.

These Musicbrainz tables should be linked into Virtuoso:

track, albumjoin, album, albummeta, artist, artist_relation, artistalias, album_amazon_asin, country, l_album_url, l_artist_artist, l_artist_track, l_artist_url, l_track_track, l_album_album, l_album_artist, l_track_url, language, release, url, puid, puidjoin.
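
Since the list is long, a small checklist script can help spot a table that was forgotten. This is only a sketch: the qualifier "pgmbz" is a made-up value standing in for whatever appears in the middle of the schema name on your installation, and the list of already linked tables would have to be copied by hand from Conductor.

```python
from typing import Iterable, List

# Tables the post says must be linked into Virtuoso.
REQUIRED_TABLES = [
    "track", "albumjoin", "album", "albummeta", "artist", "artist_relation",
    "artistalias", "album_amazon_asin", "country", "l_album_url",
    "l_artist_artist", "l_artist_track", "l_artist_url", "l_track_track",
    "l_album_album", "l_album_artist", "l_track_url", "language", "release",
    "url", "puid", "puidjoin",
]


def qualified_name(qualifier: str, table: str) -> str:
    """Name under which a linked table is expected: DB.<qualifier>.<table>."""
    return f"DB.{qualifier}.{table}"


def missing_tables(linked: Iterable[str], qualifier: str) -> List[str]:
    """Return the required tables that do not appear among the linked names."""
    linked_set = set(linked)
    return [t for t in REQUIRED_TABLES
            if qualified_name(qualifier, t) not in linked_set]


# Hypothetical state: everything linked except the last two tables.
QUALIFIER = "pgmbz"
linked_so_far = [qualified_name(QUALIFIER, t) for t in REQUIRED_TABLES[:-2]]
print(missing_tables(linked_so_far, QUALIFIER))
```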

Installing the RDF View in Virtuoso

Before continuing, you will have to make a small modification to the RDF View document. You should replace all occurrences of the string “DB.MO.” with “DB.[name of the DSN entry].”. This tells the RDF View where to get the relational data (in that case, from a remote PostgreSQL server instance).
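
If you prefer not to do the search-and-replace by hand, it can be scripted. The sketch below is one possible way to do it; the helper name retarget_view, the DSN name "pgmbz" and the sample fragment are all hypothetical (the fragment is not copied from the actual RDF View document).

```python
def retarget_view(view_code: str, dsn_name: str, old_prefix: str = "DB.MO.") -> str:
    """Point every table reference at DB.<dsn_name>. instead of old_prefix."""
    if not dsn_name:
        raise ValueError("dsn_name must not be empty")
    return view_code.replace(old_prefix, f"DB.{dsn_name}.")


# Illustrative fragment only; not taken from the real RDF View code.
sample = "from DB.MO.artist as artists, DB.MO.track as tracks"
fixed = retarget_view(sample, "pgmbz")
print(fixed)
```

To apply it to the whole document, read the file into a string first, pass it through the function, and paste the result into iSQL.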

Now click on the first item in the left sidebar menu “Interactive SQL (iSQL)”.

The next step is to copy the fixed RDF View code into this iSQL window and then click RUN.

After 1 or 2 minutes the view should be defined in Virtuoso.

Testing the view

Now all you have to do is test this new RDF View. Run this simple SPARQL query inside iSQL to make sure that you get triples from the view:

sparql
define input:storage virtrdf:MBZROOT
select *
from <http://musicbrainz.org/>
where
{
?s ?p ?o.
};

From here, you can query this RDF View as you would query any triple store using SPARQL. Check out the Music Ontology Wiki for some examples of how this RDF graph can be queried.
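
To give an idea of what a more targeted query could look like, the sketch below assembles an iSQL statement that lists artists and their names, reusing the storage and graph from the test query. It assumes that the view describes artist names with foaf:name, which is not confirmed here; the helper name isql_sparql and the default limit are also made up for illustration.

```python
# Namespace IRIs of the Music Ontology and FOAF vocabularies.
PREFIXES = {
    "mo": "http://purl.org/ontology/mo/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def isql_sparql(pattern: str,
                storage: str = "virtrdf:MBZROOT",
                graph: str = "http://musicbrainz.org/",
                limit: int = 10) -> str:
    """Wrap a graph pattern into a statement that can be pasted into iSQL."""
    lines = ["sparql", f"define input:storage {storage}"]
    lines += [f"prefix {name}: <{iri}>" for name, iri in PREFIXES.items()]
    lines += ["select *", f"from <{graph}>", "where", "{", pattern, "}",
              f"limit {limit};"]
    return "\n".join(lines)


# Assumption: artists in the view carry a foaf:name property.
statement = isql_sparql("?artist a mo:MusicArtist ; foaf:name ?name .")
print(statement)
```

The limit keeps the result small while you check that the pattern matches anything at all.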

Conclusions

The RDF View that converts the Musicbrainz relational database into RDF is quite interesting in many respects. First of all, we have a good representation of the Musicbrainz data in RDF using the Music Ontology. But this example also shows precisely how relational data can be converted into RDF fairly easily.

Why another Bibliographic Ontology?

This very good question was asked by Peter Mika on the Bibliographic Ontology Specification Group yesterday.

So, why? Peter said:

I’ve read Frederick Giasson’s call for this group on PlanetRDF.com. But before getting started on the actual topic of developing an ontology for bibliographies, my question is: why develop a new ontology? What is lacking in SWRC/BuRST or PRISM that this new ontology would add? I’m asking this, because I’m concerned by (even) more fragmentation in this space.

I am not a domain expert in citations and bibliographic references. In fact, my knowledge of the domain is somewhat limited. However, my recent blog posts about the integration of Zotero into the semantic web raised a lot of questions related to citations and bibliographic ontologies. Bruce D’Arcus appeared from the Zotero web forum, unsatisfied with current ontologies. Bruce knows a lot about all that stuff: he is a domain expert. So I asked Bruce if he would be willing to start the development of a new Bibliographic Ontology project that would answer his needs. In fact, as I noted on my blog and on the wiki, his needs come from real problems: OpenOffice and Zotero.

From there, I put in place the current communication infrastructure to start talking about these problems. In less than 1 day, 17 people subscribed to the mailing list, 11 comments have been posted on my latest blog post, etc.

This tells me that there is a real interest in the question. Why? Possibly because current ontologies don’t work well for everybody.

In fact, they weren’t working well for me either. When I looked at the landscape of bibliographic ontologies while working on that problem for Zitgist, I found that it was a jungle. There were so many possible ways to describe bibliographic data, to describe what a document was, etc. There were no best practice guides, no examples, etc.; people were doing anything they wanted. This was rendering the data useless for Zitgist. It is for that exact reason that I am putting time into that initiative right now.

An example to illustrate the problem

I will illustrate the current problem with bibliographic ontologies with the following example:

I went to the BuRST home page and clicked on one of its examples. I then checked the code and saw some SWRC things… then I tried to dereference the URI of this ontology to get the schema explaining what these properties were. Then I tried to find the properties/classes: they were not there.

I think this simple example explains many of the problems out there. There is no consistency, no good documentation (I can’t find a good SWRC specification document at the moment), no examples, etc.
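
For readers not familiar with the term, dereferencing an ontology URI simply means requesting it over HTTP and asking for an RDF description. The sketch below prepares such a request without sending it; the URI http://example.org/ontology is a placeholder, not the SWRC one, and the helper name schema_request is made up.

```python
from urllib.request import Request


def schema_request(ontology_uri: str) -> Request:
    """Prepare (but do not send) a GET request asking for RDF/XML."""
    return Request(ontology_uri, headers={"Accept": "application/rdf+xml"})


# Placeholder URI, used only to show the shape of the request.
req = schema_request("http://example.org/ontology")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would actually send it; a well-published ontology is expected to answer with a document describing its classes and properties.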

Next wave of users

The next wave of users for these ontologies is not made up of computer science students working on academic projects; it is made up of Web developers who have only a basic knowledge of all that stuff. What these people need is good documentation, consistent concepts and methods, good examples and a community backing the development of these projects.

This is not what I find right now.

Community driven ontology development

In answer to Peter’s mail, Bruce said:

The first corresponds to a narrow range of academic users (last I looked it wouldn’t work for the humanities or law), and the second is just a series of properties, mostly already covered by DC and maintained by a fairly closed industry group not very interested in RDF.

Later Chris Bizer wrote on my blog:

yes, it would really be nice to have a community-backed ontology for describing publications which is a bit more Semantic-Webby than Dublin Core. So developing a best practice for mixing DC, FOAF, SIOC and the event ontology would really useful.

Once you guys have developed this best practice, we are happy to change the D2R mapping of our DBLP server (http://www4.wiwiss.fu-berlin.de/dblp/) and the RDF book mashup(http://sites.wiwiss.fu-berlin.de/suhl/bizer/bookmashup/index.html) , so that they export RDF according to your best practice.

I think that these two examples describe what is happening. People are now asking for open communities (could we talk about open-source communities?) to develop these ontologies.

So why this ontology?

The idea here is to develop yet-another-bibliographic-ontology. But the goal isn’t to re-invent the wheel one more time. The goal is to fill in the blanks, to develop a sort of ontology framework designed in such a way that we can easily plug in future extension modules, and to make it interact easily with already existing ontologies. Yes, in RDF you can “theoretically” plug everything into everything, but in reality this is not that simple or effective. This new ontology initiative should also act as a “best practices” guide for describing citations and bibliographic references on the Semantic Web, aimed at developers who have little knowledge of the semantic web.

This is a question of adoption of the semantic web by Web developers. These people just don’t have the time to check all these little “fragmented” ontologies written in OWL, RDFS or whatever, without explicit comments, documentation, examples, etc. This is why microformats are doing so well: because there is clear documentation, good examples, etc. Like microformats or not, they got the attention of developers because there is support, docs, examples and a strong community developing them.

Conclusion

So all these projects (the Music Ontology, the Bibliographic Ontology, the Linked-Open-Data community, etc.) make me wonder: now, as I write this, are the challenges that the Semantic Web has to face more social than technical?

I think now is the time to show the world that these things work, and work quite well. Unfortunately for some people, we will have to ask these questions and create communities supervising such ontology developments. Entrepreneurs will tell you that the clients are always right. And the clients of ontologies are developers, and they won’t spend their precious time on bric-a-brac projects.

Finally, what I am proposing here is to create an open community to supervise the development of an ontology describing citations and bibliographic references. This community will be composed of domain experts; companies and organizations that want to use the ontology; and developers and hobbyists who have an interest in it. And as I said above: “The goal is to fill in the blanks, to develop a sort of ontology framework in such a way that we can easily plug in future extension modules, and to make it interact easily with already existing ontologies. Yes, in RDF you can “theoretically” plug everything into everything, but in reality this is not that simple or effective. This new ontology initiative should also act as a “best practices” guide for describing citations and bibliographic references on the Semantic Web, aimed at developers who have little knowledge of the semantic web.”




This blog is a regularly updated collection of my thoughts, tips, tricks and ideas about my Semantic Web research and related software development.

