Describing Documents, Articles, Series, Volumes and Conferences using the Bibliographic Ontology

The Bibliographic Ontology lets you describe all these things, and much more, in RDF. Over the last few months, the community developing BIBO has been quite productive. Many questions have been asked, many have been answered, and things are slowly taking shape.

For that reason, I started to create some more examples using the ontology, trying to see how people will use it. I wanted to see if I could easily describe two articles I wrote in the past few years: (1) an accepted article in a proceeding and (2) a rejected article submitted to a conference. I was wondering if the current state of the ontology could cope with some weird cases. As you will notice below, it nicely handled the weird cases that I encountered while describing these articles.

First example: Describing a Series, with volumes and articles

I wanted to describe an article I wrote with Uldis Bojars, Alexandre Passant and John Breslin. This article is part of a proceeding that is published in a series, as a volume (248). The series has an ISSN; however, it is only published online (no paper version is available).

Here is how BIBO describes such a case:

A Complex series + proceeding + article use case in RDF/XML

The series is a bibo:Series. This series has a title, a short title and an ISSN. It is also related to its publisher and has a status (published). Finally, this series is related to its volume and to a web document (a web page) that is a manifestation of the series.

This is something to keep in mind for the remainder of this blog post: in BIBO, a web page is a document, like any other document. The only difference between a paper book and a web page is their identifier (locator): a published paper book will have an ISBN, and a web page will have a URL. That said, we can easily relate different formats of a document using dcterms:relation. That way, we make explicit the relation between two different documents, even if their only difference is their format (printed on paper, HTML, PDF, etc.).

Next, I described the proceeding that has been published. It is a bibo:Proceeding that has some properties, most notably a bibo:volume property that describes its location within the series. Finally, the editors of the proceeding are described and are related to the proceeding they edited via a bibo:Contribution.
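As a minimal sketch of that idea, here is how the series and its web page could be related if triples are modelled as plain Python tuples. The real description is in the RDF/XML file; every `ex:` name below, including `ex:issn` and `ex:url`, and the identifier values are hypothetical placeholders, not BIBO terms.

```python
# Triples are (subject, predicate, object) tuples; all ex: names are hypothetical.
triples = [
    ("ex:series", "rdf:type", "bibo:Series"),
    ("ex:series", "ex:issn", "0000-0000"),  # placeholder identifier
    ("ex:series-page", "rdf:type", "bibo:Document"),
    ("ex:series-page", "ex:url", "http://example.org/series"),
    # Two documents that differ only by their format are linked with dcterms:relation.
    ("ex:series", "dcterms:relation", "ex:series-page"),
]


def objects(subject, predicate):
    """Return every object of the triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


related = objects("ex:series", "dcterms:relation")
print(related)
```

The point of the sketch is only that the web page is a document in its own right, and that the link between the two formats is an explicit statement rather than something implied by the identifiers.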

Contributions are at the core of the ontology; they are defined as:

“The contribution a person, group or organization makes to the creation or realization of a work.”

So, an editor and an author are contributors to the creation or realization of a work (a document).

Finally, I described the article, which is a bibo:Article. I described its properties, its authors, and the relation between the authors and the article. I also described its status: it has been peer-reviewed and has been published.

The links between the series, the proceeding and the article have been made by re-using the properties dcterms:hasPart and dcterms:isPartOf.
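To make the two directions of these links concrete, here is a small sketch that stores the dcterms:isPartOf links in a dictionary and derives the dcterms:hasPart direction from them. The `ex:` identifiers are made up for the illustration, and treating the two properties as inverses of each other is my assumption here.

```python
# Each entry maps a resource to the resource it is dcterms:isPartOf.
is_part_of = {
    "ex:article": "ex:proceeding",
    "ex:proceeding": "ex:series",
}

# dcterms:hasPart is the same link read in the other direction (an assumption
# of this sketch).
has_part = {}
for part, whole in is_part_of.items():
    has_part.setdefault(whole, []).append(part)


def containers(resource):
    """Follow dcterms:isPartOf upwards and return every enclosing resource."""
    chain = []
    while resource in is_part_of:
        resource = is_part_of[resource]
        chain.append(resource)
    return chain


print(containers("ex:article"))
```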

Second example: a rejected article submitted to a conference

For the second example, I wanted to describe an article I wrote a couple of years ago, which I submitted to a conference and which was rejected. So, I had to describe the article, the conference, and the fact that it was rejected after peer review.

Here is how BIBO describes this use case:

Rejected article submitted to a conference in RDF/XML

This is basically the same thing as above: describing a document with its authors.

However, in this case, I had to describe a conference. The Bibliographic Ontology uses The Event Ontology to describe such things. The conference event has been described using the event:Event class, along with event:agent, which relates the event to the organization that created it, and event:place, which locates the event in the world.

However, the description of conference events will change in the next few weeks, since Yves Raimond and I will create an extension module to this ontology to specifically describe conference events (so, we will talk about event:Conference, event:organizer, event:sponsors, etc.).

Finally, I had something to say about this article I wrote. To say it, I created another type of document, a bibo:Note, to annotate the article with some comments. A bibo:Note is a document of its own, like a bibo:Article. I relate the two documents (the bibo:Note and the bibo:Article) using the bibo:annotates property. That way, I describe the fact that one document is an annotation of another document.
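Reduced to its bare bones, the conference description is three statements about one resource. The sketch below models them as tuples; the `ex:` identifiers for the conference, the organization and the place are hypothetical.

```python
# Triples are (subject, predicate, object) tuples; all ex: names are hypothetical.
triples = [
    ("ex:conference", "rdf:type", "event:Event"),
    ("ex:conference", "event:agent", "ex:organizing-body"),
    ("ex:conference", "event:place", "ex:host-city"),
]


def describe(subject):
    """Collect the predicate/object pairs stated about one subject."""
    return {p: o for s, p, o in triples if s == subject}


conference = describe("ex:conference")
print(conference)
```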
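The annotation pattern can be sketched the same way: the note and the article are two separate documents, and a single bibo:annotates statement points from the note to the article. The `ex:` identifiers are hypothetical.

```python
# Triples are (subject, predicate, object) tuples; all ex: names are hypothetical.
triples = [
    ("ex:article", "rdf:type", "bibo:Article"),
    ("ex:comment", "rdf:type", "bibo:Note"),
    ("ex:comment", "bibo:annotates", "ex:article"),
]


def annotations_of(document):
    """Return every document that annotates the given document."""
    return [s for s, p, o in triples if p == "bibo:annotates" and o == document]


notes = annotations_of("ex:article")
print(notes)
```

Because the note is a document of its own, it could in turn carry its own authors or be annotated by yet another note, without changing anything in the description of the article.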

Conclusion

These two examples show how The Bibliographic Ontology can be used to describe some complex bibliographic use cases. It is just a start, and many questions are yet to be answered by the Bibliographic Ontology. However, many things are moving forward, and if this demonstration interested you, I can only suggest that you join the community supporting BIBO’s development and help it evolve.

News at Zitgist: the Browser, PTSW, the Bibliographic Ontology and the Query Service

The issues we had with the Zitgist Browser’s server do not mean that things stopped at Zitgist. In fact, many projects evolved at the same time, and I outline some of these evolutions below.

New version of the Zitgist Browser

A new version of the browser is already on the way. In fact, the pre-release version of the browser was a use case; a prototype. Now that we know that it works, and that we have faced most of the issues that have to be taken into account to develop such a service, we have hired Christopher Stewart to work on the next version of the browser. He is already well into the problem, so you can expect a release of this new version sooner than you might think. At first, there won’t be many modifications at the user interface level; however, many things will be introduced in this new version that will help us push the service to another level in the future.

New version of Ping the Semantic Web

Version 3.0 of the PingtheSemanticWeb.com web service should be put online next week. It will be a totally new version of the service. It won’t use MySQL anymore; Virtuoso has replaced it. The service will now fully validate RDF files before including them in the index. More stats will be available too. It is much faster (as long as remote servers are fast too), and I estimate that this single server could handle between 5 and 10 million pings per day (enough for the next year’s expansion). The service will thus be pushed to another level and be ready for more serious traffic. After its release, a daily dump of all links will be produced as well.

The first draft of the Bibliographic Ontology

The Bibliographic Ontology Specification Group is on fire. We now have 55 members and generated 264 posts in July alone. Many things are going on there and the ontology is well underway. We expect to release a first draft of the ontology sometime in August. If you are interested in bibliographic things, I think it’s a good place to be.

The Zitgist Semantic Web Query Service

Finally, Zitgist’s Semantic Web Query Service should be available to users subscribed to the alpha sometime in September. You can register to get your account here. Also, take a look at what I wrote about this search module (many things have evolved since, but it’s a good introduction to the service).

Conclusion

So, many things are going on at Zitgist and many exciting things should happen this autumn. Stay tuned!

The Bibliographic Ontology: a first proposition

This document is about the creation of The Bibliographic Ontology. It is the first proposition from Bruce D’Arcus and me, and it should lead to the writing of the first draft of the ontology. Some things have been developed, many questions have been raised, and the discussion that will arise from this first proposition will set the basis for the first draft of the ontology.

The goal of this ontology is simple: to create a bibliographic ontology that sets the basis for describing a document, that is, a writing that provides information. If done well, it will enable other people or organizations to create extension modules that make it expressive enough to describe more specialized sub-domains such as law documents. It also re-uses existing ontologies that already define some properties of documents.

Related materials

1. The proposed OWL/N3 file describing The Bibliographic Ontology (note: read the comments; those marked FG are from me, and those marked BD are from Bruce)
2. An enhanced version of the Zotero RDF dump of the book “Spinning the Semantic Web”, which shows the expressive power of the ontology by extending its content using the bibo:Part class and the locator properties (RDF/XML)
3. Other examples that show other possible descriptions, such as the description of events, places, etc. (RDF/N3)

Main concept of the ontology: a Document

The main concept of the ontology is bibo:Document. This class is described as “Writing that provides information” (from Wordnet). So, basically, any writing is a Document. It is equivalent to a foaf:Document and a dcterms:BibliographicResource. These two links are quite important since they will enable us to re-use these two widely used ontologies: FOAF and DCTERMS.
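To show what these equivalences buy us in practice, here is a minimal sketch that expands an asserted type into the set of types a consumer could rely on. The lookup table is a hand-written stand-in for what a reasoner would derive from the ontology, not a real inference engine.

```python
# Class equivalences stated in the proposal, written out by hand.
EQUIVALENT = {
    "bibo:Document": {"foaf:Document", "dcterms:BibliographicResource"},
}


def inferred_types(asserted_type):
    """Return the asserted type plus every class declared equivalent to it."""
    return {asserted_type} | EQUIVALENT.get(asserted_type, set())


types = inferred_types("bibo:Document")
print(sorted(types))
```

A tool that only understands FOAF or DCTERMS can therefore still recognize something described with BIBO as a document.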

Second main concept: Contributions to these Documents

The second main concept of the ontology is bibo:Contribution. This class is described as “A part played by a person in bringing about a resulting Document”. The goal of this concept is to relate people, by their contributions, to documents they wrote, or helped to write. For now, contributions are defined by three properties:

  1. bibo:role, that defines the role of the contributor: author, translator, publisher, distributor, etc.
  2. bibo:contributor, that links a contribution to its contributor
  3. bibo:position, that losslessly associates a “contribution” level with each contributor. This property is mainly used to sort multiple authors who worked on the writing of a document. More about that in the examples document.

With these two concepts, you can describe any Document and any Contribution to any document. So you can relate any piece of writing to its contributors.
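Here is a small sketch of how the three properties work together, assuming each contribution is held in a simple Python record. The contributors are hypothetical `ex:` identifiers, and the way a contribution is attached to its document is left out on purpose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contribution:
    """One bibo:Contribution, reduced to the three properties listed above."""
    contributor: str  # bibo:contributor
    role: str         # bibo:role
    position: int     # bibo:position


# Hypothetical contributions to a single document, in no particular order.
contributions = [
    Contribution("ex:person-b", "author", 2),
    Contribution("ex:person-c", "translator", 1),
    Contribution("ex:person-a", "author", 1),
]


def ordered(role):
    """Return the contributors with the given role, sorted by bibo:position."""
    matching = [c for c in contributions if c.role == role]
    return [c.contributor for c in sorted(matching, key=lambda c: c.position)]


authors = ordered("author")
print(authors)
```

Since RDF statements have no inherent order, the explicit position is what lets a citation formatter print the authors in the right sequence.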

What is really interesting about this concept (in my opinion) is that it opens the door to much more. In fact, by using this concept, we can now extend the idea and describe many more things about how people contributed to the writing of a document.

From these two concepts, we extended the idea to be able to cope with a larger range of use-cases.

Extensions of bibo:Document

The document class has been specialized into a series of more specific types of documents, with restrictions of their own:

  • Article
  • LegalCase
  • Manuscript
  • Book
  • Manual
  • Legislation
  • Patent
  • Report
  • Thesis
  • Transcript
  • Note
  • Law

Classes or individuals?

This proposition has been developed with this quote from Lee W. Lacy’s OWL book in mind:

Individuals often mirror “real world” objects. When you stop having different property attributes (and just have different values) you have often identified an object (individual)

This means that if a subclass of a class had no specific restrictions of its own, and no property was restricted by using that class in its domain, then the class was dropped and replaced by an individual of the super-class.

One example of this is the type bibo_types:dissertation. Since a dissertation doesn’t differ from a thesis in anything other than its meaning, we created it as an individual of the class bibo:Thesis rather than as a class of its own. Check the examples document to see what it means concretely.
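The rule of thumb can be written down as a tiny decision function. This is only my reading of the rule as a sketch: the function name, its arguments and the `ex:someProperty` counter-example are hypothetical.

```python
def model_as(own_restrictions, properties_using_it_as_domain):
    """Keep a class only if something is specific to it; else use an individual."""
    if own_restrictions or properties_using_it_as_domain:
        return "class"
    return "individual"


# A dissertation differs from a thesis only by its meaning: no restriction and
# no property of its own, so it becomes an individual typed as bibo:Thesis.
decision = model_as(own_restrictions=[], properties_using_it_as_domain=[])
triples = []
if decision == "individual":
    triples.append(("bibo_types:dissertation", "rdf:type", "bibo:Thesis"))

# Hypothetical counter-example: a type with a property of its own stays a class.
other = model_as(own_restrictions=[], properties_using_it_as_domain=["ex:someProperty"])
print(decision, other)
```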

Collections of documents

Another main concept of the ontology is bibo:Collection. This concept is inherently an aggregation: its only purpose is to aggregate bibo:Document(s). Entities of this class will act as hubs in the RDF graph (network) created out of bibliographic relations (properties).

Other types of collections, with some restrictions of their own, have also been created. These other collections, such as bibo:CourtReporter, are intended to be anchor points that can be extended by Bibliographic Ontology Extension Modules for particular specialized sub-domains such as law documents.

Here is the current list of specialized collections:

  • InternetSite
  • Series
  • Periodical
    • Journal
    • Magazine
    • CourtReporter

Parts of Documents

Another important concept is bibo:Part. This concept, along with locators (more about them in the next section), enables us to specify the components of a Document. Sometimes documents are aggregated to create collections, such as journals, magazines or court reporters. Sometimes, however, documents are embedded within another document (embedded versus aggregated). This is the purpose of bibo:Part: a bibo:Part is a document, but it is also a part of another document. The special property of a bibo:Part is dcterms:hasPart: this property is used to relate a bibo:Part to the document it is part of. Check the examples document to see how bibo:Part can be used.

Locating Parts

To support the concept of Parts, a set of properties called “locators” has been created. These locator properties help to describe the relation between a Part and its related Document.

Three of these locators are bibo:volume, bibo:chapter and bibo:page. These properties locate Parts inside documents: for example, a chapter within a book, or a volume within a document that is a set of volumes.

Check the example about the document “The Art of Computer Programming” by Donald Knuth for a good example of how locators can be used.

That said, we could now think of describing a document by its parts, recursively from its volumes down to its pages.
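As a rough sketch of that recursive description, assume each part records the locator that places it inside its parent. The `Part` record and the layout below are hypothetical; only the three locator names come from the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    """A bibo:Part located inside its parent by one locator property."""
    locator: str  # "bibo:volume", "bibo:chapter" or "bibo:page"
    value: int
    parts: list = field(default_factory=list)


def paths(part, prefix=()):
    """Yield the chain of locators leading to every leaf part, recursively."""
    here = prefix + ((part.locator, part.value),)
    if not part.parts:
        yield here
    for child in part.parts:
        yield from paths(child, here)


# Hypothetical layout: volume 1 holds chapter 2, which holds pages 10 and 11.
volume = Part("bibo:volume", 1, [
    Part("bibo:chapter", 2, [Part("bibo:page", 10), Part("bibo:page", 11)]),
])

leaf_paths = list(paths(volume))
print(leaf_paths)
```

Each path reads like a citation locator: volume, then chapter, then page.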

Open questions

  1. Should we develop the ontology such that we can describe the entire workflow that leads to the creation and (possibly) publication of a document? This whole workflow would be supported by the FRBR principles. At the moment, the ontology only describes the manifestation of a work, and not the work itself or its expression. Take a look at The Music Ontology (its workflow) to see how it could be done for the Bibliographic Ontology.
  2. Is the creation of classes and individuals of classes the right way to describe types of documents?
  3. Is this the right way, or are there other ways, to describe the contributions of people to the elaboration of documents?

Re-used ontologies

  • DCTERMS: re-used to describe the main properties of documents.
  • FOAF: re-used to describe people and organizations.
  • EVENT: re-used to describe events (example: conferences)
  • TIME: re-used to describe temporal properties
  • wgs84_pos: re-used to describe geographical entities

Conclusion

Please send any feedback, suggestions or comments directly to the mailing list of the group that develops this ontology. This group is intended to create an ontology that would reach some kind of consensus between people and organizations working with bibliographical data.

Note: I disabled comments on this post only, to make sure that people comment on the mailing list.

Why another Bibliographic Ontology?

This very good question was asked by Peter Mika on the Bibliographic Ontology Specification Group yesterday.

So, why? Peter said:

I’ve read Frederick Giasson’s call for this group on PlanetRDF.com. But before getting started on the actual topic of developing an ontology for bibliographies, my question is: why develop a new ontology? What is lacking in SWRC/BuRST or PRISM that this new ontology would add? I’m asking this, because I’m concerned by (even) more fragmentation in this space.

I am not a domain expert in citations and bibliographic references. In fact, my knowledge of the domain is somewhat limited. However, my recent blog posts about the integration of Zotero into the semantic web raised a lot of questions related to citations and bibliographic ontologies. Bruce D’Arcus appeared from the Zotero web forum, unsatisfied with current ontologies. Bruce knows a lot about all that stuff: he is a domain expert. So I asked Bruce if he would be willing to start the development of a new Bibliographic Ontology project that would answer his needs. In fact, as I noted on my blog and on the wiki, his needs apply to real problems: OpenOffice and Zotero.

From there, I put in place the current communication infrastructure to start talking about these problems. In less than 1 day, 17 people subscribed to the mailing list, 11 comments have been posted on my latest blog post, etc.

This tells me that there is a real interest in the question. Why? Possibly because current ontologies don’t work well for everybody.

In fact, they weren’t working well for me either. When I looked at the bibliographic ontologies landscape while working on that problem for Zitgist, I found that it was a jungle. There were so many possible ways to describe them, to describe what a document was, etc. There were no best practice guides, no examples, etc.; people were doing anything they wanted. This rendered the data useless for Zitgist. It is for that exact reason that I am putting time into this initiative right now.

An example to illustrate the problem

I will illustrate the current problem with bibliographic ontologies with the following example:

I went to the BuRST home page and clicked on one of its examples. I then checked the code and saw some SWRC things. Then I tried to dereference the URI of this ontology to get the schema explaining what these properties were, and I tried to find the properties/classes: they were not there.

I think this simple example explains many of the problems out there. There is no consistency, no good documentation (I can’t find the right SWRC specification document at the moment), no examples, etc.

Next wave of users

The next wave of users for these ontologies isn’t computer science students working on academic projects. The next wave of users is Web developers who have only a basic knowledge of all that stuff. What these people need is good documentation, consistent concepts and methods, good examples and a community backing the development of these projects.

This is not what I find right now.

Community driven ontology development

In answer to Peter’s mail, Bruce said:

The first corresponds to a narrow range of academic users (last I looked it wouldn’t work for the humanities or law), and the second is just a series of properties, mostly already covered by DC and maintained by a fairly closed industry group not very interested in RDF.

Later Chris Bizer wrote on my blog:

yes, it would really be nice to have a community-backed ontology for describing publications which is a bit more Semantic-Webby than Dublin Core. So developing a best practice for mixing DC, FOAF, SIOC and the event ontology would really useful.

Once you guys have developed this best practice, we are happy to change the D2R mapping of our DBLP server (http://www4.wiwiss.fu-berlin.de/dblp/) and the RDF book mashup(http://sites.wiwiss.fu-berlin.de/suhl/bizer/bookmashup/index.html) , so that they export RDF according to your best practice.

I think that these two examples describe what is happening. People are now asking for open communities (could we talk about open-source communities?) to develop these ontologies.

So why this ontology?

The idea here is to develop yet-another-bibliographic-ontology. But the goal isn’t to re-invent the wheel once more. The goal is to fill in the blanks, to develop a sort of ontology framework designed in such a way that we can easily plug in future extension modules, and to make it interact easily with already existing ontologies. Yes, in RDF you can “theoretically” plug everything into everything, but in reality it is not that simple or effective. This new ontology initiative should also act as a “best practices” guide for describing citations and bibliographic references on the Semantic Web, for developers who have little knowledge of the semantic web.

This is a question of adoption of the semantic web by Web developers. These people just don’t have the time to check all these little “fragmented” ontologies written in OWL, RDFS or whatever, without explicit comments, documentation or examples. This is why microformats are doing so well: because there is clear documentation, good examples, etc. Like microformats or not, they got the attention of developers because there is support, docs, examples and a strong community developing them.

Conclusion

So all these projects (the Music Ontology, the Bibliographic Ontology, the Linked-Open-Data community, etc.) make me wonder: as I write this, are the challenges that the Semantic Web has to face more social than technical?

I think now is the time to show the world that these things work, and work quite well. Unfortunately for some people, we will have to ask these questions and create communities supervising such ontology developments. Entrepreneurs will tell you that the clients are always right. The clients of ontologies are developers, and they won’t spend their precious time on bric-a-brac projects.

Finally, what I am proposing here is to create an open community to supervise the development of an ontology describing citations and bibliographic references. This community will be composed of experts of the domain, companies and organizations that want to use it, and developers and hobbyists who have an interest in it. And as I said above: “The goal is to fill in the blanks, to develop a sort of ontology framework designed in such a way that we can easily plug in future extension modules, and to make it interact easily with already existing ontologies. Yes, in RDF you can “theoretically” plug everything into everything, but in reality it is not that simple or effective. This new ontology initiative should also act as a “best practices” guide for describing citations and bibliographic references on the Semantic Web, for developers who have little knowledge of the semantic web.”

The Bibliographic Ontology

Zitgist, Bruce D’Arcus, the Zotero team and Michael K. Bergman have started a new initiative to develop a new citation and bibliographic references ontology. The idea for the project came up a couple of days ago when we tried to find out how Zotero could be integrated into a semantic web environment. This brainstorming led us to start a new ontology development project: The Bibliographic Ontology.

References

Some things are already in place to start the collaborative development of the ontology:

Starting the development of this ontology

As a starting point for the development of this ontology, we will take the “Citation Oriented Bibliographic Vocabulary” developed by Bruce D’Arcus. It is a start, but as he pointed out in the brainstorming, there is much work to do on it to create a better citations and bibliographic ontology. Bruce also wrote an introductory mail about what he has in mind to make it a better ontology, what he thinks we should work on, etc. Keep in mind that Bruce has a strong background and much experience in the domain of citations and bibliographic references.

Goals

The development of this ontology should be driven by its goals. Bruce outlined some goals for this ontology, and more could be added depending on how people are expecting to use it.

  1. Should be a superset of legacy formats like BibTeX, RIS, and so forth
  2. Must support the most demanding needs in the social sciences, humanities, and law, and those who deal with non-Western languages
  3. The class system must be able to map to the type system in the citation style language I [Bruce] designed. In short, it is not enough to just encode the data: it needs to be able to be formatted according to the often archaic details of citation styles
  4. Should be developer-friendly; I consider examples like DOAP and SKOS to be models here
  5. Behind all of these goals are a more concrete goal: it should be perfect for using in OpenDocument/OpenOffice citation support and should handle Zotero’s needs.

In fact, for point 5, these systems will be the test cases for the development of this new ontology. They play the same role that MusicBrainz, Magnatune and other musical needs played as test cases for the development of the Music Ontology.

Users

Users can be many people or systems. Just to list a couple of them:

  • OpenDocument/OpenOffice citation system
  • Zotero
  • Zitgist
  • Students or professors in a social science or law department
  • Book selling systems such as Amazon.com, Alibris.com or Abebooks.com
  • Publishers of books, journals, etc.
  • Authors

As you can see, many things [people or systems] are potential users of this ontology: from people without a computer background to heavy and complex systems such as Amazon.com, Zotero and OpenOffice.

Constraints

Users and goals define the development constraints of the ontology. However, we will try to take the same path that Yves Raimond and I took for the development of the Music Ontology: creating several levels of expressiveness for the ontology. These levels will be used depending on the user. Does the user only need to describe a simple bibliographic reference? Then he will use level one. Does the user need to describe a collaborative work aggregating many media sources, such as writings, speeches and conferences, in many languages and in a specific timeframe? Then he will use level three. It has been quite a successful approach in the Music Ontology, so we should try it in the Bibliographic Ontology too.

Reuse of existing ontologies

This ontology will probably reuse many existing ontologies. Some of them could be:

  • FRBR: as the foundation of the ontology
  • FOAF: as the way to describe authors
  • SIOC: as a way to describe everything related to the social software World: wiki pages, blog posts, mailing list threads, etc.
  • MO: as a way to describe everything related to musical things
  • DC: do I have to say why?
  • Event: as a way to describe some events like workshops, conferences, etc.
  • Timeline: as a way to describe complex temporal frameworks

Conclusion

If you are interested in this new ontology development project, I suggest that you subscribe to the mailing list, create a user on the Wiki and start contributing your ideas and expertise to the development of the Bibliographic Ontology. What is great about this project is that it is already motivated by external projects, such as its integration into the OpenDocument/OpenOffice citation support and its use by Zotero for its integration with Ping the Semantic Web and Zitgist.