The Next Bibliographic Ontology: OWL

The Bibliographic Ontology’s aim is to be expressive and flexible enough that any existing bibliographic legacy schema (such as BibTeX and its extensions, MARC, Elsevier’s SDOS & CITADEL citation schemas, etc.) and any RDFS/OWL ontology can be converted to it.

This new BIBO version 1.2 is the result of more than one year of thinking and discussion among 101 community members, across 1254 mail messages. The project’s first aim of expressiveness and flexibility is nearly reached. BIBO’s ongoing development is now pointing to a series of methods and best practices for mature ontology development.

Some BIBO mappings between legacy schemas have been developed, and this trend will now accelerate. More people are getting interested in BIBO’s ability to describe bibliographic resources. Some people are interested in it to describe bibliographic citations; others are interested in it to integrate data from different bibliographic data sources, using different schemas, into a single, normalized data source. This single data source (in RDF) can then be easily queried, managed and published. Finally, other people are interested in it as a standard agreed to by an open community, one that helps them describe bibliographic data meant to be published and consumed by different kinds of data consumers (such as standalone software like Zotero, or citation aggregation Web services like Scirus or Connotea).

With this BIBO 1.2 release, much has changed and been improved. Now, it is time for the community to start implementing BIBO in different systems; to create more mappings; and to complete more converters.

Design Redux

As you may recall from its early definition, BIBO has been designed for both: (1) a core system with extensions relevant to specific domains and uses, and (2) a collaborative development environment governed by the community process.

These design imperatives have guided much of what we have done in this new version 1.2 release.

BIBO in OWL 2

The new version of BIBO is now described using OWL 2. In the next sections you will see why we chose OWL 2 as the way to describe BIBO in the future. However, saying that it is OWL 2 doesn’t mean that it becomes incompatible with everything else that exists. In fact, it validates as OWL 1.1 and its DL expressivity is SHOIN(D); this means that fundamentally nothing has changed, but that we are now leveraging a couple of new tools and concepts introduced by OWL 2.

As you will see below, this decision results in much more than a simple update of the ontology. We are introducing an updated, and more efficient, architecture to develop open source ontologies such as The Bibliographic Ontology.

New Versioning System

OWL 2 is introducing a new versioning and importation system for OWL ontologies. This feature alone strongly argued for the adoption of OWL 2 as the way to develop BIBO in the future.

This new versioning system consists of two things: an ontologyURI and a versionURI. The heuristics to define, check, and cache an ontology that has an ontologyURI and possibly a versionURI are described here.

BIBO has an ontologyURI and multiple versionURIs such as http://purl.org/ontology/bibo/1.0/, http://purl.org/ontology/bibo/1.1/, and http://purl.org/ontology/bibo/1.2/.

Right now, the current version of the ontology is 1.2. This means that the current version of BIBO will be located at two places: http://purl.org/ontology/bibo/ and http://purl.org/ontology/bibo/1.2/.

The location logic of ontologies is described here. What we have to take care of here is that if someone dereferences any class or property of BIBO, they will always get the description of that class or property from the latest version of the ontology. This is why the caching logic is quite important. The user agent has to make sure that it caches the version of the ontology that it knows.

What is really important to understand is that the URI of the ontology won’t change over time when we introduce new versions of the same ontology. Only the location of these versions will change.
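The relationship between the ontologyURI and the versionURIs can be sketched in a few lines of Python. This is only an illustration of the naming scheme described above, not part of BIBO or of any OWL 2 tooling; the class name `OntologyVersions` and its methods are hypothetical, and the version list is the one given in this post.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OntologyVersions:
    """Hypothetical helper modelling one ontologyURI and its versionURIs."""

    ontology_uri: str            # stable over time, never changes
    versions: Tuple[str, ...]    # ordered from oldest to newest

    def version_uri(self, version: str) -> str:
        """Build the versionURI for a known version, e.g. '1.1'."""
        if version not in self.versions:
            raise ValueError(f"unknown version: {version}")
        return f"{self.ontology_uri}{version}/"

    @property
    def current(self) -> str:
        """The latest version is the last one in the list."""
        return self.versions[-1]

    def locations_of_current(self) -> List[str]:
        """The current version is reachable at the ontologyURI and at its versionURI."""
        return [self.ontology_uri, self.version_uri(self.current)]


bibo = OntologyVersions("http://purl.org/ontology/bibo/", ("1.0", "1.1", "1.2"))
print(bibo.locations_of_current())
```

When a later version is appended to the list, only the second location changes; the ontologyURI stays the same, which is the point made above.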

Finally, the OWL 2 mapping to RDF document tells us that we have to use the owl:versionInfo OWL property to define the versionURI of an ontology. This is the reason why the use of this OWL 2 versioning system doesn’t affect the validity of BIBO as an OWL 1.1 ontology: owl:versionInfo is also an OWL 1.1 property.

Now, let’s take a look at the tools that we will use to continue the development of BIBO.

Protégé 4 for Developing BIBO

We have chosen to rely on Protégé 4 to develop BIBO from now on. We wanted to start using a tool that would help the community to develop the ontology. Protégé 4 Beta was released in August, it supports OWL 2 through the OWLAPI library, and many plugins are already supported; this makes it the best free and open-source option available.

What I have done is to add some SKOS annotation properties to annotate the BIBO classes and properties to help us to edit and comment on the ontology. Here is the list of new annotation properties we introduced:

  • skos:note is used to write general notes
  • skos:historyNote is used to write historical comments
  • skos:scopeNote is really important. It is the new way to target the classes and properties, imported from external ontologies, that we recommend using to describe one aspect of BIBO. The scopeNote tells users the expected usage for these external resources.
  • skos:example is used to give examples that show how to use a given class or property. Think of RDF/XML or RDF/N3 code examples.

Finally, all these annotations are included in BIBO’s namespace.

OWLDoc for Generating Documentation

OWLDoc is a plugin for Protégé that generates documentation for OWL ontologies. In a single click, we can now get the complete documentation of an ontology. This makes the generation of the documentation for an ontology much, much more efficient. Users can easily see which ontologies are imported, and then they can easily browse the structure of the ontology. Many facets of the ontology can be explored: all the imported ontologies, the classes, the object/data properties, the individuals, etc.

You can have a look at the new documentation page for BIBO here. On the top-left corner you have a list of all imported ontologies. Then you can click on facet links to display related classes, properties or individuals. Then you may read the description of each of these resources, their usage, and their annotations (scope notes, etc.).

Please note there are still some issues and improvements to do with the template used to generate the pages, such as multiple resource descriptions not yet adequately distinguished. We are in the process of cleaning up these minor issues. But, all-in-all, this is a major update to the workflow since any user can easily re-create the documentation pages.

Collaborative Protégé for Community Development

Now that it is available for Protégé 4, we will shortly set up a Protégé server and make it available to the community to support BIBO’s community development. We will shortly announce the availability of this Collaborative Protégé.

In the meantime, I suggest using the file “bibo.xml” from the “trunk” branch of the SVN repository (see Google Code below). The Bibliographic Ontology can easily be opened that way, using the “Open…” option to open the local file of the SVN folder, or the “Open URI…” option to open the bibo.xml file from the Google Code servers. That way, each modification to the ontology can easily be committed to the SVN instance.

Google Code to Track Development

As noted above, the BIBO Google Code SVN is used to keep track of the evolution of the ontology. All modifications are tracked and can easily be recovered. This is probably one of the most important features for such a collaborative ontology development effort.

But this is not the only use of this SVN repository. In fact, it has an even more central role: it is the SVN repository that sends the description of the ontology for any location query, by any user, for any version. Below we will see the workflow of a user query that leads the SVN repository to send back a description of the ontology.

Google Groups to Discuss Changes

The best tool to discuss ontology development is certainly a mailing list. A Google Group is an easy way to create and manage an ontology development mailing list. It is also a good way to archive and search discussions that have an impact on the development (and the history) of the ontology.

Purl.org to Access the Ontology

Another important piece of the puzzle is to have a permanent URI for an ontology that is hosted by an independent organization. That way, even if anything happens with the ontology development group, hopefully, the URI will remain the same over time.

This is what Purl.org is about. It adds one more step to the querying workflow (as you will notice in the querying schema below), but this additional step is worth it.

General Query Workflow

There is one remaining thing that I have to talk about: the general querying workflow. I have been talking about the new OWL 2 versioning system, purl.org redirection and using the SVN repository to deliver ontology descriptions. So, here is what the workflow looks like:

[Schema: general query workflow]

At the first step, the user requests the rdf+xml serialization of http://purl.org/ontology/bibo/. As we discussed above, this permanent URI is hosted by Purl.org; what this service does is redirect the user to the location of the content negotiation script.

At the second step, the user requests the rdf+xml serialization of the description of the ontology at the URI of the location sent by the Purl.org server: http://conneg.com/script/. One of the challenges we have with this architecture is that neither Purl.org nor Google Code handles content negotiation with a user.

Thus, it is also necessary to create a “middle-man” content negotiation script that performs the content negotiation with the user and redirects them to the proper file hosted on the SVN repository. (If Purl.org or the SVN repository could handle the content negotiation part of the workflow, we could remove step #2 from the schema above and improve the general architecture. However, for the present, this step is necessary.)

Note 1: Take a special look at the redirection location sent back by the content negotiation script: http://…/tags/1.2/bibo.xml. This is a direct consequence of the new versioning system, in which this version has the versionURI http://purl.org/ontology/bibo/1.2/. Considering the versioning system, the content negotiation script redirects the user to the description of the latest version of the ontologyURI (which is currently version 1.2).

Note 2: Purl.org currently doesn’t strictly conform to the TAG resolution on httpRange-14. However, this should be resolved in an upgrade of the Purl.org system that is underway (the current system dates from the 1990s).

At the third step, the SVN repository returns the document requested by the user with the proper Content-Type.
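To make step two more concrete, here is a minimal Python sketch of what such a “middle-man” script might decide. It is not the actual script used for BIBO. The base URL `SVN_BASE`, the documentation URL `DOCS_URL`, the set of accepted media types, and the choice to fall back to the HTML documentation are all assumptions made for illustration; only the `/tags/<version>/bibo.xml` layout comes from Note 1 above.

```python
from typing import List

# Hypothetical locations: the real repository and documentation URLs are not given here.
SVN_BASE = "http://svn.example.org/bibo"
DOCS_URL = "http://docs.example.org/bibo/"

RDF_TYPES = {"application/rdf+xml"}
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def parse_accept(header: str) -> List[str]:
    """Return the media types of an Accept header, most preferred first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip().lower() for field in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        quality = 1.0  # default weight when no q parameter is given
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    quality = float(field[2:])
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by order of appearance.
            entries.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(entries)]


def negotiate(accept_header: str, latest_version: str = "1.2") -> str:
    """Pick the redirect location for a request on the ontologyURI."""
    for media_type in parse_accept(accept_header):
        if media_type in RDF_TYPES:
            return f"{SVN_BASE}/tags/{latest_version}/bibo.xml"
        if media_type in HTML_TYPES:
            return DOCS_URL
    return DOCS_URL  # assumption: browsers and unknown clients get the documentation


print(negotiate("application/rdf+xml"))
```

Because the latest version is a parameter, publishing a new version only means changing that one value; the ontologyURI that users request stays untouched.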

Conclusion

Developing open source ontologies is not an easy task. Development is made difficult by the complexity of some ontologies, by the different ways to describe the same thing, and by the level of community involvement needed. Thus, open source ontology development needs the proper development architecture to succeed.

I have had the good fortune to work on this kind of ontology development with Yves Raimond on the Music Ontology, with Bruce D’Arcus on the Bibliographic Ontology, and with Mike Bergman on UMBEL. Each of these projects has led to an improvement of this architecture. After two years, these are the latest tools and methods I can now personally recommend to collectively create, develop and maintain ontologies.

UMBEL as a Coherent Framework to Support Ontology Development

There are multiple ways to represent the World we live in. One person will think about something in one way, while the person next to them will think about the same thing in another. They will think about it in different ways: different characteristics, different ways to interact with it, different ways to use it, different ways to think about its composition, its relations with other things, and so on.

What is nice is that probably all of these different ways to think about this thing are good: after all, there are many ways to think about the same thing. It is this characteristic of thinking about things in different ways that leads to innovation.

But innovation is also not a game where anything goes. Things that work in the real world and in real ways need to adhere to certain rules, concepts, principles and theories. Continued innovation requires working within these coherent frameworks of natural relationships and order.

So, while a beautiful thing is that we can create new frameworks to think about things differently, not all of those frameworks work as well as others or make sense.

While one could conceivably propose any new framework for thinking about things differently, frameworks that are actually useful should, among other things:

  1. Make sure the development of innovations within the framework is coherent
  2. Make sure the development of innovations within the framework is in context
  3. Help coordinate the development of projects and the cooperation of agents that work on these projects in order to achieve (1) and (2).

What seems clear to me is that the lack of any of (1), (2) or (3) makes innovations difficult and/or less powerful and less useful.

Why Would the Development of Ontologies Be Different?

The Semantic Web is often seen as a place where people describe things in multiple ways and where these things are more or less magically related together. For example, if you can’t properly describe something, you only have to create a new ontology, or to extend an existing one, and to publish it, et voilà!

The more I work in this field, the less I believe in this.

Remember my first point? People tend to think about things in different ways. The same logic applies to the development of ontologies (particularly in the development of ontologies!). Two ontologies, intended to describe the same things, can describe them in totally different ways. So, while some of the magic is that both ontologies can perfectly describe these things but only in different ways, there are other aspects that are not magical at all.

The problem here is to have at least one framework that helps people to develop ontologies such that the:

  1. Developed ontologies remain coherent
  2. Developed ontologies are in context
  3. Coordination of the development of ontologies and the cooperation of the agents working on these ontology projects is effective to achieve goals (1) and (2).

This construct looks familiar, doesn’t it?

What I am proposing here is to use UMBEL as a coherent framework for ontology development. I am not saying that other frameworks cannot play a guiding role in ontology development. But I am saying two things. First, some form of reference framework is necessary. And, second, truly useful frameworks must also be consistent and coherent.

What I am stressing here is the importance of conceptual frameworks to develop ontologies that can be used by people, companies and systems to properly and efficiently exchange data; and at some level, to reason over this data, too.

I think that the only way to do this in an efficient way is by grounding ontologies in such conceptual frameworks.

The ultimate goal is to make data exchange and data reasoning effective for the people, organizations and systems that consume this sea of data. And I believe that this is not possible to achieve without grounding these efforts in a coherent, conceptual framework.

An Example at Work

Nothing is better than an example to show the potential of UMBEL as a coherent framework to develop, and cross-link, ontologies.

Let’s take the Bibliographic Ontology as an example, which we just cross-linked to UMBEL in yesterday’s version 071 release. (Among a dozen other key ontologies; the list is getting pretty cool!)

The goal is to link BIBO classes to UMBEL subject concepts. The linkage is done using three properties: owl:equivalentClass, rdfs:subClassOf and umbel:isAligned.

But firstly, what is the goal here? We try to do two things when linking such ontologies to the UMBEL framework:

  1. To make sure the ontology (BIBO) is coherent and consistent with other existing ontologies that are linked to the framework (other such ontologies could be FOAF, SIOC, etc.)
  2. To make sure that the design choices of the developed ontology are consistent with the design choices of the framework, and the other ontologies that are linked to that framework.

Both points try to help achieve a grander vision: trying to make the semantic Web a little bit more coherent and easy to use and understand.

The BIBO Linkage

This figure shows how BIBO classes have been linked to UMBEL subject concepts in a set-like schema:

This schema shows what set belongs to what other set. That way, we can quickly notice that bibo:Patent is equivalent to umbel:Patent. We can also see that both classes belong to (are sub-classes of) bibo:Document, umbel:PropositionalConceptualWork and umbel:ConceptualWork, etc.

We have to keep one thing in mind that we made clear in the UMBEL technical documentation: UMBEL has its own view of the World. UMBEL’s subject concept structure is its view of the World. So these linkages are consistent within the UMBEL framework. Now, let’s continue.

The Context

Remember the three points above? What we have done here is to put BIBO in context. The context is created by the UMBEL conceptual framework. Once this is done, we can check for the coherence between BIBO, UMBEL and all the other ontologies that are linked to the framework.

The figure below shows the context created by UMBEL for BIBO, FOAF and SIOC:

Considering the current description of these three ontologies, we know that bibo:Document is equivalent to foaf:Document. But there exists no relationship between these two classes and sioc:Item and sioc:Post.

Intuitively we know that there are some relationships between all these classes (at least based on their labels). We also have to keep in mind that the fact that a description is not defined (in RDF) does not mean that this description doesn’t exist (this is the open world assumption).

That being said, the figure above shows how UMBEL can help us to find such “non-described” relationships between classes of different ontologies. By contextualizing these three ontologies we now find that all these classes are sub-classes of umbel:ConceptualWork. We also know that some sioc:Post belongs to umbel:PropositionalConceptualWork (things written), just like some bibo:Document and foaf:Document stuff.

This means that this linkage — this contextualization — of external ontologies now gives us a common ground to play with: umbel:ConceptualWork. By querying this subject concept we can come up with a full range of related things: BIBO, SIOC and FOAF stuff.
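The idea of a shared context can be sketched as a walk over sub-class links. The Python below is only an illustration: the `SUBCLASS_OF` table is a simplified, hand-written assumption based on the relations named in this post, not an extract of the real UMBEL mappings, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set

# Simplified, assumed sub-class links (child -> direct parents).
SUBCLASS_OF: Dict[str, Set[str]] = {
    "bibo:Document": {"umbel:ConceptualWork"},
    "foaf:Document": {"umbel:ConceptualWork"},
    "sioc:Item": {"umbel:ConceptualWork"},
    "sioc:Post": {"umbel:ConceptualWork"},
    "bibo:Patent": {"bibo:Document", "umbel:PropositionalConceptualWork"},
    "umbel:PropositionalConceptualWork": {"umbel:ConceptualWork"},
}


def ancestors(cls: str, edges: Dict[str, Set[str]] = SUBCLASS_OF) -> Set[str]:
    """Collect every class reachable by following sub-class links upward."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        for parent in edges.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def common_ancestors(classes: Iterable[str],
                     edges: Dict[str, Set[str]] = SUBCLASS_OF) -> Set[str]:
    """Classes that every given class is (transitively) a sub-class of."""
    sets = [ancestors(cls, edges) for cls in classes]
    return set.intersection(*sets) if sets else set()


print(common_ancestors(["bibo:Document", "foaf:Document", "sioc:Item", "sioc:Post"]))
```

With these assumed links, the four classes have no direct relation to each other, yet the walk still finds umbel:ConceptualWork as their common ground.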

For example, take a look at the section “Narrower External Classes” of the umbel:ConceptualWork detailed report and extend the list of external classes (click on the All Classes . . . link). All these things are conceptual works. This fact is made explicit by UMBEL even if no relations, or only a small number, are described in these ontologies with respect to the other ontologies. Also take a look at the list for umbel:PropositionalConceptualWork.

This also shows the coherence of the design of each ontology.

The Coherence

So, once we have the context in place, we are on our way to achieving coherence. UMBEL is 100% based on OpenCyc and Cyc, which are internally consistent and coherent. We thus use these coherent frameworks to make the mappings to external ontologies coherent, too.

The equation is simple:

“a coherent framework” + “ontologies contextualized by this framework” = “more coherent ontologies”

This context and this coherence help us to develop ontologies in two ways:

  1. It helps us to make sure the design of an ontology is good
  2. It helps us to make sure the designed ontology is coherent with other existing external ontologies

For example, when I linked BIBO classes to UMBEL subject concept classes, I found that a bibo:Series was a sub-class of umbel:ConceptualWorkSeries. Then I found that bibo:Periodical was the same thing as a umbel:PeriodicalSeries. However, I had an issue: a bibo:Series was a sub-class of bibo:Collection and bibo:Periodical was also a sub-class of bibo:Collection. Then I found that umbel:PeriodicalSeries was a sub-class of umbel:ConceptualWorkSeries. Then the question arose: why is bibo:Periodical not a sub-class of bibo:Series instead of bibo:Collection? This is what I will propose for the next iteration of BIBO.
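The reasoning in this example can be written down as a small heuristic. The sketch below is an assumption about how one might automate the check, not a description of how the mapping was actually done: it takes the relations stated above as plain dictionaries and flags pairs of BIBO classes whose UMBEL counterparts are in a sub-class relation that BIBO itself does not state. A flagged pair is only a candidate for discussion, not a logical consequence.

```python
from typing import Dict, List, Tuple

# Relations as stated in the paragraph above (child -> parent).
BIBO_SUBCLASS_OF = {
    "bibo:Series": "bibo:Collection",
    "bibo:Periodical": "bibo:Collection",
}
UMBEL_SUBCLASS_OF = {
    "umbel:PeriodicalSeries": "umbel:ConceptualWorkSeries",
}
# BIBO class -> UMBEL subject concept it was linked to.
BIBO_TO_UMBEL = {
    "bibo:Series": "umbel:ConceptualWorkSeries",
    "bibo:Periodical": "umbel:PeriodicalSeries",
}


def is_subclass(child: str, parent: str, edges: Dict[str, str]) -> bool:
    """True if parent is reached by following child -> parent links."""
    seen = set()
    current = child
    while current in edges and current not in seen:
        seen.add(current)
        current = edges[current]
        if current == parent:
            return True
    return False


def candidate_subclass_links(mapping: Dict[str, str],
                             local_edges: Dict[str, str],
                             framework_edges: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pairs (a, b) whose framework concepts are sub-class related while a and b are not."""
    candidates = []
    for a, concept_a in mapping.items():
        for b, concept_b in mapping.items():
            if a == b:
                continue
            if is_subclass(concept_a, concept_b, framework_edges) and not is_subclass(a, b, local_edges):
                candidates.append((a, b))
    return candidates


print(candidate_subclass_links(BIBO_TO_UMBEL, BIBO_SUBCLASS_OF, UMBEL_SUBCLASS_OF))
```

On the data above, the only pair flagged is (bibo:Periodical, bibo:Series), which is exactly the question raised in the text.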

Now, what about this helping to increase the coherence between external ontologies?

One good example I have is related to SIOC and FOAF. When I linked SIOC to UMBEL, Kingsley asked me why I didn’t link sioc:Item. My answer was simple: I can’t do this, since making this linkage would disturb the coherence of UMBEL. The problem was that sioc:Item was a sub-class of foaf:Document. But considering sioc:Item’s definition, and foaf:Document’s definition and linkage to UMBEL, linking sioc:Item to UMBEL would create some incoherence in the framework because of its relationship with foaf:Document.

From this discussion with Kingsley, this thread appeared on the SIOC mailing list, and the link from sioc:Item to foaf:Document has been removed.

These are the two general cases where UMBEL, as a coherent framework, can help the development of ontologies.

So, by achieving points (1) and (2), we are on the way to achieving point (3): effective coordination of the development of ontologies and cooperation of the agents working on these ontology projects.

The Final Mapped Relations

So, after application of this process and thinking, here are the UMBEL-BIBO mappings:

You can look at Appendix A to the UMBEL technical document (PDF or online); additionally you will see similar mappings for the existing dozen or so ontologies presently mapped to UMBEL. In combination, these give us the ability to Explode the Domain!

Descriptive Subject Concepts: Icing on the Cake

All of the description above relates to the mapping between the BIBO and UMBEL ontologies (and therefore other external ones). But, of course, we also now have the full scope of UMBEL subject concepts that we can also now apply to describe what the actual BIBO citations are about.

So, while we have structural ontology relationships that can be leveraged, we also now have a common vocabulary to describe the subject matter of what these citations are about. Use of these UMBEL subject concepts now allows us to cluster and retrieve citations by subject matter.

In this manner, UMBEL becomes a consistent tagging vocabulary for describing what citations and references are about. Want everything about weaving or galaxies or opera or anything, for example? Simply characterize your citations by appropriate UMBEL subjects and then use them as part of your retrieval filters.
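As a rough illustration of such a retrieval filter, here is a short Python sketch. Everything in it is made up for the example: the citation identifiers are placeholders, and the subject identifiers (`umbel:Weaving`, `umbel:Galaxy`, `umbel:Opera`) are hypothetical names chosen to match the topics mentioned above, not verified UMBEL subject concepts.

```python
from typing import Dict, List, Set

# Placeholder citations tagged with hypothetical subject concept identifiers.
CITATION_SUBJECTS: Dict[str, Set[str]] = {
    "citation-1": {"umbel:Weaving"},
    "citation-2": {"umbel:Galaxy", "umbel:Opera"},
    "citation-3": {"umbel:Opera"},
}


def citations_about(subject: str,
                    tagged: Dict[str, Set[str]] = CITATION_SUBJECTS) -> List[str]:
    """Return the citations characterized by the given subject, in sorted order."""
    return sorted(cid for cid, subjects in tagged.items() if subject in subjects)


def cluster_by_subject(tagged: Dict[str, Set[str]] = CITATION_SUBJECTS) -> Dict[str, List[str]]:
    """Group citation identifiers under each subject they are tagged with."""
    clusters: Dict[str, List[str]] = {}
    for cid in sorted(tagged):
        for subject in sorted(tagged[cid]):
            clusters.setdefault(subject, []).append(cid)
    return clusters


print(citations_about("umbel:Opera"))
```

The same tags serve both uses mentioned above: `citations_about` is the retrieval filter, and `cluster_by_subject` is the clustering.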

This makes clear that UMBEL is some kind of Hydra: it can be used as a conceptual framework to help make ontologies (vocabularies) coherent and consistent, and at the same time, it can act as a conceptual description framework that describes the “matter” of things. This means that a subject concept can describe the “nature” of a thing and the “matter” of another thing at the same time.

Conclusion

UMBEL is becoming a wonderful tool that can be used in many ways. It is a vocabulary that is instantiated in a subject concept structure. It can be used not only to categorize things and to help find things, but also to define things, and to develop ontologies that define other things. We are on our way to achieving these three goals:

  1. Develop ontologies that are in context
  2. Develop ontologies that remain coherent
  3. Coordinate the development of ontologies and the cooperation of the agents working on these ontology projects sufficiently to achieve goals (1) and (2).

As usual, I’d like to thank my UMBEL co-editor and colleague, Mike Bergman, for his discussions and assistance on this material.

The Bibliographic Ontology 1.0

After months of development and nearly 1000 messages on the mailing list exchanged between 83 participants, the first version of The Bibliographic Ontology has just been published.

This is an important milestone for this project. It has been postponed week after week to make sure that it was expressive enough to handle all kinds of scenarios for all kinds of bibliographic projects. We finally reached a consensus and published the first version of this ontology.

I am quite pleased to release it after nearly one year of development. We have a solid basis that can easily be extended to cope with more specialized bibliographic needs. We already know of some projects (such as Zotero; thanks Connie) that are planning to use BIBO to describe things related to documents and collections of documents in RDF.

Ontology Resources

Many resources exist to help people to use this ontology to describe bibliographic things.

  • Ontology documentation – is the human-readable documentation of the ontology.
  • Ontology description – is the RDF+N3 description of the ontology. (note: all URIs are dereferenceable)
  • Mailing list – is the place where people ask questions about how to use the ontology; where people suggest extensions to the ontology; and where people report potential issues.
  • Wiki – is the place to archive references, write examples and write other material related to the ontology.
    • Examples – is the place to write BIBO examples.
  • Google Code Repository – is the place to download the latest version of the working draft of the ontology. Additionally, people can download tools related to the ontology.

Conclusion

I would like to thank everybody who participated in the mailing list and the wiki. Many people put much time and thinking into this ontology, and this release would not have been possible without their professional work, time and thinking. This is a really complex domain and countless hours have been spent on this project. It is not an end; it is just the beginning.

Please send any questions, comments, suggestions and issue reports to the mailing list.

I would like to personally thank Bruce, Yves, Patrick, Connie, Elena, Mark, (I am missing others, please forgive me), and all others for making this happen.

Blogs, WordPress, Zitgist and the Semantic Web

Every link has a relation on the Semantic Web. Each time a person creates a link from one web page to another, they do much more than simply link… In fact, the Web and the Semantic Web are starting to mesh together.

The meshing is occurring at the level of the URI, or more specifically at the level of the URL if we are talking about the Web. This is what I will show you in this post using a WordPress plug-in I developed using Zitgist technologies.

Motivations driving the development of the plug-in

The first objective of this project is to find out how people could integrate semantic web concepts and principles into the systems they use daily. How can we integrate the semantic web into blogs, for example? Are semantic web technologies only good for publishing content in RDF? This is certainly one use, but I doubt it is the only one. That is why I put some time into developing this prototype.

The second motivation is to create a good prototype of a system using Zitgist’s architecture to show people how they can take advantage of Zitgist to develop their projects; to make their vision a reality.

Some background thinking about the plug-in

On the Web, people mainly manipulate web page resources. They locate them on the Web using a unique locator, called a URL. On the semantic web, on the other hand, people do not only manipulate documents; they manipulate many kinds of Things, many kinds of resources. They refer to them using URIs. The difference between a URI and a URL is that a URL is resolvable on the Web, while a URI is not necessarily so (in fact, URI is the super-class of URL). However, best practices suggest making URIs resolvable (dereferenceable) on the Web; in such a case a URI is a URL.

Anyway, all this to say that a URI on the semantic web can be a URL on the Web. Many use cases emerge from that special digital environment. As an example, many people will use a Wikipedia page URL as a URI for a topic, an interest, or for many other relations to these concepts. In such a case, the URL of a web page is used to refer to a concept. I don’t want to discuss the basis of this, but it is a fact, and we have to handle it.

Introduction to the Zitgist WordPress Plug-in

This plug-in is quite simple in appearance, but has some really interesting results for users.

The only thing this plug-in does is show blog readers existing related data for a given URL and, in some cases, enable them to perform actions based on this data.

For example, if I make a link to Tim Berners-Lee’s web page, a user could be interested in having more information about Tim, directly from the article they are reading. There is a lot of data related to Tim on the semantic web.

timbl.png

That is it. The plug-in displays information related to the links in a blog post. In this case, it is the people Tim knows and Tim’s profile. The information is shown to users in a contextual menu. The data is requested from Zitgist’s systems and displayed to the user. It is that simple, yet quite powerful.

The usefulness of the Zitgist WordPress Plug-in

The plug-in is quite useful in many ways. In fact, it instantly displays related information about a link to readers of the blog. From any blog post, a reader can easily jump to resources related to each link.

Some use cases

Above I said that a URL, a web link, could be much more than it usually appears. So below, I show a couple of use cases that illustrate the potential behind the idea.

1. URL as a web page

What happens when a link from a blog post is a URL? Well, several things can happen, and here is an example:

Check it yourself: The Bibliographic Ontology

bibo.png

Here a user can check the web page directly, or jump to related resources. These related resources come from the semantic web. The first one is the description of the project. The following two are the authors of the ontology. The last resources are documents related to the ontology and the “version” of the ontology.

2. URL as a dereferenceable URI

For non-initiated readers, I suggest reading this best practice tutorial explaining how to publish semantic web data on the web.

Sometimes (okay, not that often at the moment, but I hope people will start), people link to resource URIs (that is, URLs that can be dereferenced to get RDF data about the resource, or its web page representation if available).

Check it yourself: URI referring to Frederick Giasson

fgiasson.png

The result is that readers have direct access to my profile, articles I wrote, etc.

3. Actionable URL

Sometimes it can be really interesting to be able to act on some URLs. One example is when a web page, or a resource (identified by a URI), refers to a thing that can be bought. For example, something that can be bought on Amazon.com:

Check it yourself: Visualizing the Semantic Web

amazon.png

From the blog post, the reader can automatically buy the related resource on Amazon. This is only one possible action, but many others are possible; the only limit is imagination.

Conclusion

The simple links you create from your blog posts to other web pages carry much more related information than you might think. This prototype Zitgist WordPress plug-in makes that information explicit for your readers.

You only have to read some of my other blog posts to try it yourself. Some results are quite impressive.

I will make this plug-in available for download sometime next week.

This idea has been promoted by Kingsley Idehen for some time now. He calls it enhanced anchors, or a++. The idea is simple: enhance anchors to make the links to a certain resource (URI or URL) explicit, and optionally to perform some action on them.

This prototype is a first try in that direction. Many upgrades should follow so that we really unveil the power of this new kind of linking: a new way to relate things together and to make these relations explicit. Please report any bugs, issues, cross-browser problems, comments, suggestions, etc. to me.

The Open Library in RDF using The Bibliographic Ontology

openlibrary.png

“What if there was a library which held every book? Not every book on sale, or every important book, or even every book in English, but simply every book-a key part of our planet’s cultural legacy.” — The OpenLibrary Project

This is something I wanted to take part in.

The Open Library is a project that wants to archive information about every book (and probably every writing) created by mankind. Such a strong vision is naturally closely related to the semantic web.

I contacted Aaron Swartz about this project. I wanted to know what their plans were for making all this data available on the semantic web, and how they planned to describe these books in RDF.

I wanted to participate in the project by describing their information in RDF using the Bibliographic Ontology.

So that is what I started to do. Aaron sent me some snapshots of data using their current database schema (this schema should be updated soon). Then I described one of them using BIBO. As you will see below, the ontology neatly describes the Open Library data and enables us to query, at the same time, the Open Library’s data, the data about the articles I wrote, eventually the Zotero citations if they choose to use BIBO, etc.

So, below is my proposal to Aaron and to the Open Library Project. Starting from this post, we will be able to discuss the implications, how this could be done, how the data could be made available for querying and browsing, etc.

How to Cook Revised Edition described using RDF and BIBO

The current use case is a book by Raymond Sokolov: “How to Cook Revised Edition”. Mapping this data to BIBO using the current proposal was straightforward.

The RDF/N3 example is available here: How to Cook Revised Edition in RDF/N3

Describing this data using BIBO led me to work out how to describe the topical subjects of documents. It is a discussion we (the BIBO development community) have already had, and I think I have found a solution here.

Describing topical subjects for a bibo:Document

The goal is to relate a document resource to the concepts describing its topics. There are many ways to describe the subjects of documents: with a literal, a class, an individual, etc.

What I am proposing here is to re-use the dcterms:subject property (as we already do) to relate a bibo:Document to a concept of a taxonomy that will act as the topical subject of the document.

The Open Library is using the BISAC subject standard to relate books with their topics. What I have done is to describe the BISAC standard as a taxonomy in RDF using SKOS. The resulting RDF is: BISAC taxonomy snapshot.

As you can see, the BISAC taxonomy structure is well described using SKOS concepts. The relations between these concepts are described as well. Also, the dcterms:identifier property is used to link a concept with its BISAC identifier.

From there, we only have to use the BISAC URIs to link a bibo:Document to its subjects, like this:

dcterms:subject <http://purl.org/ontology/bibo/bisac#Cooking_Regional_and_Ethnic_American_General> ;
dcterms:subject <http://purl.org/ontology/bibo/bisac#Cooking_General> ;

This is simple and effective. Also, we are not limited to the BISAC taxonomy; anyone can use whichever taxonomy they want to describe the subjects of their documents.

Some SPARQL queries

Nothing is better than SPARQL queries to “feel” the power of these RDF descriptions.

Queries related to contributions

The following query displays the documents’ titles and the contribution role of Raymond Sokolov. So, if Raymond contributed to some documents as an author and to others as an editor, all these documents will be returned in the result set:

Finding documents where Raymond Sokolov contributed

The following query is a variant of the one above. It returns the titles of all the documents where Raymond is an author.

Finding documents where Raymond Sokolov contributed as an author
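As a rough illustration of the graph pattern that these two contribution queries match, here is a plain-Python sketch over a handful of triples. It is not the SPARQL itself. The identifiers, the second document, and the property names bibo:contribution and bibo:contributor are assumptions made for the example, not taken from the Open Library data.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples: a document points to a contribution node, which in
# turn carries the contributor and the role (names assumed for this sketch).
TRIPLES: Set[Triple] = {
    ("doc:how_to_cook", "dcterms:title", "How to Cook Revised Edition"),
    ("doc:how_to_cook", "bibo:contribution", "contrib:1"),
    ("contrib:1", "bibo:contributor", "person:raymond_sokolov"),
    ("contrib:1", "bibo:role", "bibo:author"),
    ("doc:anthology", "dcterms:title", "A Hypothetical Anthology"),
    ("doc:anthology", "bibo:contribution", "contrib:2"),
    ("contrib:2", "bibo:contributor", "person:raymond_sokolov"),
    ("contrib:2", "bibo:role", "bibo:editor"),
}


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def contributions(person: str, role: Optional[str] = None) -> List[Tuple[str, str]]:
    """List (title, role) pairs for a person, optionally restricted to one role."""
    results = []
    for contrib, _, _ in match(p="bibo:contributor", o=person):
        for _, _, found_role in match(s=contrib, p="bibo:role"):
            if role is not None and found_role != role:
                continue
            for doc, _, _ in match(p="bibo:contribution", o=contrib):
                for _, _, title in match(s=doc, p="dcterms:title"):
                    results.append((title, found_role))
    return sorted(results)


# First query: every contribution. Second query: contributions as an author only.
print(contributions("person:raymond_sokolov"))
print(contributions("person:raymond_sokolov", role="bibo:author"))
```

The only difference between the two calls is the extra constraint on the role, which is the same difference as between the two queries above.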

Eventually we could also use bibo:position to find all the documents written by Raymond where his author position is less than 2 (so, where he is a primary or secondary author of a document).
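The position filter can be sketched in a few lines of Python, assuming each contribution has been flattened into one record. The record layout, the second document and the position values are hypothetical; only the idea of comparing bibo:position against a limit comes from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Contribution:
    document_title: str
    contributor: str
    role: str
    position: int  # stands in for the bibo:position value of this contributor


# Hypothetical records, for illustration only.
CONTRIBUTIONS = [
    Contribution("How to Cook Revised Edition", "Raymond Sokolov", "author", 1),
    Contribution("A Hypothetical Anthology", "Raymond Sokolov", "author", 3),
    Contribution("A Hypothetical Anthology", "Someone Else", "author", 1),
]


def titles_with_position_below(contributor: str, limit: int) -> List[str]:
    """Titles the contributor authored with a position strictly below limit."""
    return sorted(
        c.document_title
        for c in CONTRIBUTIONS
        if c.contributor == contributor and c.role == "author" and c.position < limit
    )


print(titles_with_position_below("Raymond Sokolov", 2))
```

In a SPARQL query the same restriction would normally be expressed as a filter on the position variable.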

Queries related to documents and their subjects

If a user only has the BISAC identification number of a concept and needs to find books about this topic, then he only has to run this query to get the titles with that topic:

Finding documents related to a BISAC identifier
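The lookup by identifier is a two-step join: from the identifier to the concept, then from the concept to the documents that have it as a subject. Here is a minimal plain-Python sketch of that join. The prefixed names are shorthand I made up, and the BISAC codes are illustrative values that should be checked against the actual BISAC list.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Illustrative triples; "bisac:" abbreviates the taxonomy namespace.
TRIPLES: Set[Triple] = {
    ("bisac:Cooking_General", "dcterms:identifier", "CKB000000"),
    ("bisac:Cooking_Regional_and_Ethnic_American_General",
     "dcterms:identifier", "CKB002000"),
    ("doc:how_to_cook", "dcterms:title", "How to Cook Revised Edition"),
    ("doc:how_to_cook", "dcterms:subject", "bisac:Cooking_General"),
    ("doc:how_to_cook", "dcterms:subject",
     "bisac:Cooking_Regional_and_Ethnic_American_General"),
}


def subjects_of(predicate: str, obj: str) -> List[str]:
    """All resources that have the given predicate/object pair."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def objects_of(subject: str, predicate: str) -> List[str]:
    """All values of a predicate for a given resource."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def titles_for_bisac_id(code: str) -> List[str]:
    """Identifier -> concept -> documents with that subject -> their titles."""
    titles = set()
    for concept in subjects_of("dcterms:identifier", code):
        for doc in subjects_of("dcterms:subject", concept):
            titles.update(objects_of(doc, "dcterms:title"))
    return sorted(titles)


print(titles_for_bisac_id("CKB002000"))
```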

However, this is not really handy. What if I only want books about “cooking”? There is a way:

Finding documents about “cooking”

That way, you will get all the “cooking” related concepts from the taxonomy, and you will find all the related books.
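One way to picture the keyword approach is as a case-insensitive match on concept labels followed by the same subject join as before. This plain-Python sketch assumes the concepts carry a skos:prefLabel. The labels, the gardening concept and the second document are hypothetical additions so that the filter has something to exclude.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Illustrative triples; labels and the gardening entries are made up.
TRIPLES: Set[Triple] = {
    ("bisac:Cooking_General", "skos:prefLabel", "Cooking / General"),
    ("bisac:Gardening_General", "skos:prefLabel", "Gardening / General"),
    ("doc:how_to_cook", "dcterms:title", "How to Cook Revised Edition"),
    ("doc:how_to_cook", "dcterms:subject", "bisac:Cooking_General"),
    ("doc:garden_book", "dcterms:title", "A Hypothetical Gardening Book"),
    ("doc:garden_book", "dcterms:subject", "bisac:Gardening_General"),
}


def titles_about(keyword: str) -> List[str]:
    """Titles of documents whose subject concept label contains the keyword."""
    needle = keyword.lower()
    # Step 1: every concept whose label mentions the keyword.
    concepts = {s for s, p, o in TRIPLES
                if p == "skos:prefLabel" and needle in o.lower()}
    # Step 2: every document that has one of those concepts as a subject.
    docs = {s for s, p, o in TRIPLES
            if p == "dcterms:subject" and o in concepts}
    # Step 3: the titles of those documents.
    return sorted(o for s, p, o in TRIPLES
                  if p == "dcterms:title" and s in docs)


print(titles_about("cooking"))
```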

Note that there are many other ways to go, such as browsing the graph of concepts using the skos:narrower and skos:broader properties from a given skos:Concept. However, the query above is simple and effective.
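Browsing the concept graph amounts to collecting a concept together with everything reachable from it through skos:narrower. Here is a small Python sketch of that traversal, assuming the narrower links have been loaded into a dictionary. The top-level and intermediate concept names are hypothetical and may not match the actual taxonomy snapshot.

```python
from collections import deque
from typing import Dict, List

# Hypothetical slice of the taxonomy: concept -> its skos:narrower concepts.
NARROWER: Dict[str, List[str]] = {
    "bisac:Cooking": [
        "bisac:Cooking_General",
        "bisac:Cooking_Regional_and_Ethnic",
    ],
    "bisac:Cooking_Regional_and_Ethnic": [
        "bisac:Cooking_Regional_and_Ethnic_American_General",
    ],
}


def narrower_closure(concept: str) -> List[str]:
    """The concept plus every concept reachable through skos:narrower."""
    seen = {concept}
    queue = deque([concept])
    while queue:  # breadth-first walk down the taxonomy
        current = queue.popleft()
        for child in NARROWER.get(current, []):
            if child not in seen:  # guards against cycles in the data
                seen.add(child)
                queue.append(child)
    return sorted(seen)


for name in narrower_closure("bisac:Cooking"):
    print(name)
```

The resulting set of concepts can then be used as the list of acceptable dcterms:subject values when looking for documents.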

Other queries

Beyond these, you can create a full set of other simple and effective queries: searching for all the published books, all the published books by a given author or editor, etc.

There is no limit when all that information is available in RDF and BIBO.

More descriptions of the Open Library using BIBO

If you take a closer look at the current database schemas of the Open Library Project, you will notice that they have data about “series”, “notes”, and other things. I don’t have such an example at hand at the moment, but we have to keep in mind that we can easily describe them using BIBO as well.

Conclusion

I described how RDF and The Bibliographic Ontology could be used to describe data from The Open Library Project. Doing this would enable them to easily and effectively publish their data so that other people and applications could take advantage of it.

We also found that SPARQL queries are a powerful method that we can easily use to search the complex graphs of relations created by such data described in RDF and BIBO.

Finally, having all this data available in BIBO will enable us to easily merge it with other document data sources such as Zotero or any other writings described using RDF and BIBO. As a final example, we could find all the documents that Raymond Sokolov contributed to, as an author, an editor, or in any other role. With a single query, one could find out that he wrote some published books and that he authored some posts on his blog. All that thanks to RDF, BIBO, SPARQL and all the data sources exporting their data using RDF and BIBO.