Exploding the Domain: UMBEL Web Services by Zitgist

I am pleased to announce the first phase of the public release of the UMBEL Web Services by Zitgist. This first release consists of a series of user interfaces in front of several UMBEL web services.

This blog post shows and explains what these web services are about and how people will be able to use them to leverage UMBEL to create new ontologies, to instantiate new data sets and to interlink external ontologies to explode their domains.

Background

For the last four to six months we have been in the process of creating the UMBEL ontology. We have been doing research to find the best basis datasets; we have been cleaning these datasets for UMBEL’s purposes; and we have been developing the ontology and its principles. Starting today, we begin the release process for UMBEL:

  1. UMBEL web services’ user interfaces
  2. UMBEL ontology (OWL-Full)
  3. UMBEL ontology technical documentation
  4. UMBEL subject concepts’ structure (SKOS + OWL-Full) & named entities instantiation
  5. UMBEL web services endpoints.

UMBEL Ontology & Subject Concept Structure

Before showing and explaining the UMBEL web services’ user interfaces, I have to give some background information about the UMBEL ontology’s principles and how the subject concept structure has been created. All of this will be discussed at length in the UMBEL ontology technical documentation that is about to be published, but some technical background is needed here to explain what these web services are about.

As described by Mike, UMBEL’s purposes are:

“[…] to provide a lightweight structure of subject concepts as a reference to what Web content or data “is about”, what is called a concept schema in SKOS […]

Think of the backbone as a set of roadsigns to help find related content. UMBEL is like a map of an interstate highway system, a way of getting from one big place to another. Once in the right vicinity, other maps (or ontologies), more akin to detailed street maps, are then necessary to get to specific locations or street addresses.

By definition, these more fine-grained maps are beyond UMBEL’s scope. But UMBEL can help provide the context for placing such detailed maps in relation to one another and in relation to the Big Picture of what related content is about.

These subject concepts also provide the mapping points for the many, many thousands (indeed, millions) of specific named entities that are the notable instances of these subject concepts. Examples might include the names of specific physicists, cities in a country, or a listing of financial stock exchanges. UMBEL mappings enable us to link a given named entity to the various subject classes of which it is a member.

And, because of relationships amongst subject concepts in the backbone, we can also relate that entity to other related entities and concepts. The UMBEL backbone traces the major pathways through the content graph of the Web. For some visualizations of this subject graph, see So, What Might The Web’s Subject Backbone Look Like?”

A four-article introduction to UMBEL can be read from Mike’s blog at:

UMBEL is a structure of 21 000 subject concepts that has been derived from the OpenCyc ontology. The structure is described in SKOS and OWL-Full. Each subject concept is an individual of the skos:Concept class and is itself an OWL class. This dual nature is the basis of UMBEL. Since the subject concepts are classes, we can relate them to external ontology classes using properties such as rdfs:subClassOf and owl:equivalentClass.
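To make the dual nature of a subject concept more concrete, here is a minimal sketch in plain Python. It is only an illustration: the triples are written as simple tuples, the helper names (types_of, is_subject_concept, external_links) are hypothetical, and the prefixed names are shorthand rather than the real UMBEL URIs. The two links shown (doap:Project as a sub-class of the "Project" subject concept, and umbel:Person as equivalent to cyc:Person) are the ones mentioned elsewhere in this post.

```python
# Toy triple store: each statement is a (subject, predicate, object) tuple.
# Prefixed names are shorthand for illustration, not the real UMBEL URIs.
TRIPLES = {
    # A subject concept is an individual of skos:Concept ...
    ("umbel:Project", "rdf:type", "skos:Concept"),
    # ... and is also an OWL class.
    ("umbel:Project", "rdf:type", "owl:Class"),
    ("umbel:Person", "rdf:type", "skos:Concept"),
    ("umbel:Person", "rdf:type", "owl:Class"),
    # Because subject concepts are classes, external classes can be linked to them.
    ("doap:Project", "rdfs:subClassOf", "umbel:Project"),
    ("umbel:Person", "owl:equivalentClass", "cyc:Person"),
}

LINKING_PREDICATES = {"rdfs:subClassOf", "owl:equivalentClass"}


def types_of(resource, triples=TRIPLES):
    """Return every class the resource is declared to be an instance of."""
    return {o for s, p, o in triples if s == resource and p == "rdf:type"}


def is_subject_concept(resource, triples=TRIPLES):
    """True when the resource is both a skos:Concept individual and an OWL class."""
    return {"skos:Concept", "owl:Class"} <= types_of(resource, triples)


def external_links(concept, triples=TRIPLES):
    """Return the class-level statements that mention the given concept."""
    return sorted(
        (s, p, o)
        for s, p, o in triples
        if p in LINKING_PREDICATES and concept in (s, o)
    )


if __name__ == "__main__":
    print(is_subject_concept("umbel:Project"))
    print(external_links("umbel:Person"))
```

A real implementation would of course use an RDF store and full URIs; the point here is only that the same resource appears both as an instance (of skos:Concept) and as a class (in rdfs:subClassOf and owl:equivalentClass statements).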

So what does all of this mean? It means that once the linkages between UMBEL subject concepts and external ontology classes are made, the following becomes possible: 1) the UMBEL subject concept structure can be used to describe (instantiate) things using the UMBEL data structure; 2) external ontology properties can be re-used to describe these new instances, since external ontology classes are linked to UMBEL subject concept classes; and 3) in some cases, the properties defined in these ontologies can be used in relation with UMBEL subject concept classes. The forthcoming technical documentation will provide a more detailed explanation. For the moment, just accept these assertions as being true.

The UMBEL web services (user interfaces) have been created to help people manage these relationships between UMBEL subject concept classes and external ontology classes. People will use the services to infer facts from the structure of the subject concepts and to check whether a class is a sub-class, a super-class or an equivalent class of another class. They will also use the services to see which properties, defined in external ontologies, can be re-used, and on which subject concepts.

Let the show begin!

UMBEL Web Services Index Page

The entry page lists all the available web services. For each web service, you have a link to the web service user interface, a link to an about page explaining the basis of the web service, and a link to the technical documentation of the web service endpoint: how to communicate with the endpoint web server and how to interpret the answer sent by the web service.

Take note that the web service endpoints are not yet publicly available, and that this endpoint page is provided now for information purposes.

Eleven UMBEL Web Services

  1. Find Subject Concepts
  2. Subject Concept Report
  3. Subject Concept Detailed Report
  4. List Sub-Concepts & Sub-Classes
  5. List Super-Concepts & Super-Classes
  6. List Equivalent External Classes
  7. Verify Sub-Class Relationship
  8. Verify Super-Class Relationship
  9. Verify Equivalent Class Relationship
  10. Subject Concepts Explorer
  11. Yago Ontology — a little help from our friends.

Searching the UMBEL Subject Concept Structure

The first thing people will want to do is to search within the UMBEL subject concept structure. The “Find Subject Concepts” web service helps people locate the subject concepts they are looking for.

If someone looks at the Find Subject Concepts page and performs a search for the keyword “project”, he will get this list of subject concepts:

umbel_find.png

Note: all subject concepts are ordered alphabetically, and the search has been performed on the subject concepts’ labels and semsets (and not on their definitions).

The “finding” web service along with all the inferencing web services use the same result page layout: you have a list of subject concepts with their human readable definition (note: 8000 definitions out of 21 000 have yet to be created). If a user clicks on a result, he will be redirected to the Report and the Detailed Report user interfaces. Additionally, a user can click on the small “earth” icon to start browsing the surrounding subject concepts nodes in the Explorer visualization tool.

Inferencing the UMBEL Subject Concept Structure

A series of web services has been created to infer facts in the UMBEL subject concept structure. There are two main categories of inferencing web services:

  1. The ones that list subject concepts that are more general, more specific or equivalent to a given subject concept
  2. The ones that answer the question: is this subject concept a sub-concept, a super-concept or an equivalent concept to this other subject concept?
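The two categories above can be sketched as a traversal of sub-class links. This is only a toy illustration in plain Python, not how the UMBEL services are actually implemented: the function names are hypothetical, and apart from the doap:Project link mentioned in this post, the class names (prefixed ex:) are made up for the example.

```python
# Direct "sub-class of" links as (sub_class, super_class) pairs.
SUB_CLASS_OF = [
    ("doap:Project", "umbel:Project"),       # link mentioned in this post
    ("umbel:Project", "ex:Activity"),        # hypothetical broader concept
    ("ex:SoftwareProject", "doap:Project"),  # hypothetical narrower class
]


def _closure(start, neighbours):
    """Collect everything reachable from start by repeatedly following neighbours()."""
    seen, stack = set(), [start]
    while stack:
        current = stack.pop()
        for nxt in neighbours(current):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def list_super_classes(cls, edges=SUB_CLASS_OF):
    """Category 1: list every class that is more general than cls."""
    return _closure(cls, lambda c: [sup for sub, sup in edges if sub == c])


def list_sub_classes(cls, edges=SUB_CLASS_OF):
    """Category 1: list every class that is more specific than cls."""
    return _closure(cls, lambda c: [sub for sub, sup in edges if sup == c])


def verify_sub_class(candidate, cls, edges=SUB_CLASS_OF):
    """Category 2: is candidate a (direct or indirect) sub-class of cls?"""
    return cls in list_super_classes(candidate, edges)


if __name__ == "__main__":
    print(sorted(list_sub_classes("umbel:Project")))
    print(verify_sub_class("doap:Project", "umbel:Project"))
```

Because the traversal works on any pair of class names, the same functions apply whether the arguments are UMBEL subject concepts or external ontology classes.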

These web services can be used not only to infer these facts on UMBEL subject concepts, but also on external ontology classes. Here are a couple of examples of what can be done with these inferencing web services:

Note: some people may notice that the doap:Project external ontology class is a sub-class of the “Project” subject concept. This is not intuitive for humans, but this situation will be explained at length in the UMBEL Ontology Technical Documentation. To make a long story short: considering the nature of the current definition of the doap:Project class, we couldn’t say that it is equivalent to the “Project” UMBEL subject concept.

Visualizing the UMBEL Subject Concept Structure

While inferencing and lookup are good, we still have some issues when we try to “feel” what the UMBEL subject concept structure is. The following two user interfaces will do their best to help people visualize the subject concepts’ descriptions and their relations with other subject concepts and external ontology classes.

Let’s start with a wonderful visualization tool, created by Moritz Stefaner, and used by UMBEL to let people visualize and browse the data structure.

Let’s begin by browsing the relationships of the “Project” subject concept:

umbel_explorer.png

You can navigate from one node to another by clicking any of the circles. Each circle is an UMBEL subject concept or an external ontology class.

When a node is selected, its concept description is displayed in the right sidebar of the interface.

Note that there are four different kinds of relationships between the concepts:

  • Blue (B). (concept A) — broader than –> (Concept B). concept A is more general than concept B
  • Red (N). (concept A) — narrower than –> (Concept B). concept A is more specific than concept B
  • Green (=). (concept A) — equivalent to –> (Concept B). concept A is equivalent to concept B
  • Mauve (I). (concept A) — is a –> (Concept B). concept A is an instance of the concept B

As each node is selected, the display refreshes and shows the new set of relationships for the current node (subject concept or external class). Note the dropdown list shown at the upper right of the display enables you to return to previous views or steps.

The Detailed Subject Concept Report

The detailed subject concept report is the tool to know everything about a specific subject concept. This is not really a web service, but a user interface that uses all existing UMBEL web services to display a detailed report of a subject concept, and all its relations with other UMBEL subject concepts and external ontology classes and properties.

Here is the detailed report of the “Project” subject concept:

umbel_detailed_repost.png

Here is the list of information available from that detailed report page:

  • UMBEL Subject Concept Name — the name of the subject concept
  • Semset — the preferred label and its alternative labels used to refer to this concept. The alternative labels are aliases, synonyms, collocations, etc.; related to the preferred label of the subject concept
  • Definition — the human readable definition of the subject concept
  • Equivalent External Classes — the classes from external ontologies that refer to this same subject concept. Note that the UMBEL Ontology Technical Documentation will explain how the equivalence relation between an external ontology class and an UMBEL subject concept is done
  • Named Entities — a list of named entities related to this UMBEL subject concept. Most of the time, the subject concept has the “type of” characteristic for these named entities. For example, for the subject concept “Person”, “Albert Einstein” is of type “Person”. The first named entities data set that has been used to create this list of named entities is Yago (more about this below).
  • More General External Classes — these are the classes from external ontologies that refer to a more general concept. Note that the UMBEL Ontology Technical Documentation will explain how the super-class relation between an external ontology class and an UMBEL subject concept is done
  • More Specific External Classes — these are the classes from external ontologies that refer to a more specific concept. Note that the UMBEL Ontology Technical Documentation will explain how the sub-class relation between an external ontology class and an UMBEL subject concept is done
  • In-domain-of — this is a list of properties defined in external ontologies where an individual of the UMBEL subject concept class can be used in the domain of the property. For example, for the subject concept “Person”, the in-domain-of property “foaf:interest (domain: foaf:Person)” means that an individual of the class umbel:Person can re-use the property foaf:interest, which is defined in the FOAF ontology, in its domain (<umbel:Person> <foaf:interest> <…>). Note that the UMBEL Ontology Technical Documentation will explain how the in-domain-of relation between an external ontology class and an UMBEL subject concept is done
  • In-range-of — this is a list of properties defined in external ontologies where an individual of the UMBEL subject concept class can be used in the range of the property. For example, for the subject concept “Person”, the in-range-of property “doap:developer (range: foaf:Person)” means that an individual of the class umbel:Person can re-use the property doap:developer, which is defined in the DOAP ontology, in its range (<…> <doap:developer> <umbel:Person>). Note that the UMBEL Ontology Technical Documentation will explain how the in-range-of relation between an external ontology class and an UMBEL subject concept is done
  • More General Subject Concepts — this is the list of more general internal UMBEL subject concepts related to the concept
  • More Specific Subject Concepts — this is the list of more specific internal UMBEL subject concepts related to the concept.
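One way to picture the in-domain-of and in-range-of sections of the report is as a lookup through the classes linked to a subject concept. The sketch below is only an assumption about how such a list could be derived; the actual method is left to the UMBEL Ontology Technical Documentation. The link between umbel:Person and foaf:Person, the dictionary layout and the function name properties_for are all hypothetical, and the domain and range values are the ones quoted in the examples above.

```python
# External properties with the domain/range quoted in the report examples.
PROPERTIES = {
    "foaf:interest": {"domain": "foaf:Person"},
    "doap:developer": {"range": "foaf:Person"},
}

# Assumed linkage between a subject concept and external classes
# (hypothetical; the real linkage rules are not described in this post).
LINKED_CLASSES = {
    "umbel:Person": {"foaf:Person"},
}


def properties_for(concept, slot):
    """List properties whose 'domain' or 'range' is a class linked to the concept."""
    linked = LINKED_CLASSES.get(concept, set())
    return sorted(
        name for name, spec in PROPERTIES.items() if spec.get(slot) in linked
    )


if __name__ == "__main__":
    print("in-domain-of:", properties_for("umbel:Person", "domain"))
    print("in-range-of:", properties_for("umbel:Person", "range"))
```

With this toy data, foaf:interest shows up as in-domain-of and doap:developer as in-range-of for umbel:Person, which mirrors the two examples given in the list.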

As you can see, all the relations between a UMBEL subject concept and other subject concepts or external ontology classes and properties are shown in this detailed report page.

This detailed report page was created not only to show people what UMBEL subject concepts are. I envision that people (more specifically ontology developers and ontology users) will also use it to check the current linkage between UMBEL and external ontologies, and to see how to use UMBEL to instantiate and describe resources in RDF, etc. The UMBEL ontology documentation will describe some linkage and re-use cases in further detail.

Linked External Ontologies and Named Entities

Let’s take a deeper look at the named entities section of the detailed report of the “Person” subject concept:

umbel_named_entities.png

These named entities are individuals belonging to the class umbel:Person. If you click on one of these person names, you will notice that they are described in the Yago data set. How is this possible?

To make another long story short: umbel:Person is an equivalent class to the cyc:Person class; cyc:Person is an equivalent class to the wordnet:Person class; yago:R._B._Bennett is an individual belonging to the same wordnet:Person class. So we can infer that yago:R._B._Bennett is an individual also belonging to the umbel:Person class. However, these technical details will be explained at length in the UMBEL ontology documentation.
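The chain of reasoning above can be sketched in a few lines of plain Python. This is a toy illustration rather than a real OWL reasoner: the function names are hypothetical, and the only facts used are the two equivalences and the one class membership stated in the paragraph above.

```python
# Equivalences and class membership as stated in the text.
EQUIVALENT_CLASSES = [
    ("umbel:Person", "cyc:Person"),
    ("cyc:Person", "wordnet:Person"),
]
ASSERTED_TYPES = {
    "yago:R._B._Bennett": {"wordnet:Person"},
}


def equivalence_set(cls, pairs=EQUIVALENT_CLASSES):
    """Return cls plus every class reachable through equivalence links.

    Equivalence is symmetric and transitive, so links are followed in both directions.
    """
    found, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for a, b in pairs:
            for left, right in ((a, b), (b, a)):
                if left == current and right not in found:
                    found.add(right)
                    stack.append(right)
    return found


def inferred_types(entity, asserted=ASSERTED_TYPES):
    """Return the asserted classes of an entity plus all classes equivalent to them."""
    result = set()
    for cls in asserted.get(entity, set()):
        result |= equivalence_set(cls)
    return result


if __name__ == "__main__":
    print(sorted(inferred_types("yago:R._B._Bennett")))
```

Starting from the single asserted class wordnet:Person, the entity ends up as a member of cyc:Person and umbel:Person as well, which is the inference described in the text.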

But this is not even the most wonderful part. The most wonderful part is understanding what the linkage between yago:R._B._Bennett and umbel:Person (or between UMBEL and any other linked data set) really means. This linkage is literally exploding the domain of each of these linked named entities. In fact, we now know this about yago:R._B._Bennett:

  • It is an umbel:Person
  • It is a cyc:Person
  • It is a foaf:Person & a foaf:Agent
  • It is a umbel:HomoSapiens
  • It is a umbel:SocialBeing
  • We can re-use external ontology properties such as foaf:birthday, foaf:name, doap:translator and dcterms:creator to describe this person.

We can infer all these things, and much more, about yago:R._B._Bennett only by linking it to UMBEL. We just contextualized it; and then we exploded its domain!

This is what UMBEL is about; this is the value it creates; and its contribution to the Semantic Web.

Conclusion

This is just the beginning of UMBEL. Currently ten external ontologies have been linked to UMBEL. The attentive eye will notice some strange results in the in-domain-of and in-range-of detailed report sections. More work has to be put into the linkage; however, as you will notice in the technical documentation of UMBEL, some of these weird results come from the way the ontologies themselves are defined. This means that these UMBEL tools won’t only help with linking external ontologies; they will also help to define new ontologies and to fix existing ones.

Stay tuned; more stuff will be released in the coming weeks and months.


Yago resources now retrievable on the Web

Fabian and I have managed to make Yago resources retrievable on the web.

What does that mean? It means that if someone has a Yago resource URI in hand, they will be able to retrieve from the Web one of the three available representations of the resource:

  • The RDF representation of the resource serialized in XML
  • The RDF representation of the resource serialized in N3
  • The HTML representation of the resource

That way, a person or a software agent doesn’t have to load and index the entire Yago data set in order to get the representations of the Yago resources. They only have to negotiate the content of the document at that URL with the web server to get one of the representations of the resource: RDF+XML, RDF+N3 or HTML.
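From the client side, negotiating the representation comes down to sending an HTTP Accept header with the request. Here is a minimal sketch using Python's standard library. The resource URI is a placeholder on example.org, the function name is hypothetical, and the media types are the commonly used ones for each format; which exact media types the Yago server recognizes is an assumption. The request is only built, not sent.

```python
import urllib.request

# Commonly used media types for the three representations (an assumption
# about what the server accepts, not taken from the Yago documentation).
ACCEPT_HEADERS = {
    "rdfxml": "application/rdf+xml",
    "n3": "text/n3",
    "html": "text/html",
}


def build_request(resource_uri, representation):
    """Build (but do not send) a GET request asking for one representation."""
    accept = ACCEPT_HEADERS[representation]
    return urllib.request.Request(resource_uri, headers={"Accept": accept})


if __name__ == "__main__":
    # Placeholder URI for illustration only.
    request = build_request("http://example.org/resource/Some_Entity", "rdfxml")
    print(request.get_method(), request.full_url)
    print("Accept:", request.get_header("Accept"))
    # To actually fetch the document, one would call:
    #   with urllib.request.urlopen(request) as response:
    #       body = response.read()
```

The same URI is used for all three formats; only the Accept header changes, which is what lets a software agent ask for RDF while a browser gets HTML.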

Here are a couple of examples of Yago URIs:


The emergence of UMBEL and Linked Data

Since Mike and I first released UMBEL in 2007, we have not stopped working on it: we have done much research, we defined its concepts and principles, and we designed and created the ontology and the instantiation of its subject concepts, abstract concepts, semsets and named ontologies. We intensified our efforts in the last six months, to the point that we worked nearly full time on this project.

We are now starting to release more documentation about the outcome of our work so far. Mike is starting to release a really good series of blog posts describing the grounding of this effort. The first blog post that has been published is called A re-Introduction of UMBEL – Part 1 of 4 on foundations of UMBEL. This blog post explains the foundation concepts of UMBEL.

Later this week he will publish three other blog posts that explain what UMBEL adds to Linked Data, how named entities are integrated in this framework and finally how UMBEL relates to its older brothers, Cyc and OpenCyc.

So stay tuned to Mike’s blog to read the series of four blog posts that lay the groundwork for future releases and discussions about UMBEL and Linked Data.

Next development of UMBEL

In the meantime, we continue our hard work to release the first draft of the UMBEL ontology and a first version of the instantiation of its subject concepts, its abstract concepts, its named entities and their related semsets. We will also release a first mapping between UMBEL’s subject concepts and related external ontology classes, along with the proper grounding documentation that explains everything involved with these instantiations, these linkages and the UMBEL ontology itself.


Zitgist Got its Orchestrator


I am pleased to finally be able to say that Mike Bergman is the new Chief Executive Officer of Zitgist LLC. After months of discussions, hard work, planning and development, Mike officially became the new CEO and Zitgist made a giant leap ahead.

The first contact

The first time I started to collaborate with Mike was related to the UMBEL project. Mike had an idea and I wanted to help him to make it real. At that time I didn’t know that my participation in UMBEL and my collaboration with Mike would impact Zitgist forever.

Months later I released a new prototype project called zLinks. This project has been the tipping point of my collaboration with Mike. However, even at that time, I didn’t know how these two projects would change Zitgist forever.

Those first months were a warm-up session for Mike and me. Everything started from there; we were ready to work together.

Working together

Since that time we have worked together to forge Zitgist, to shape it to Kingsley’s, Mike’s and my vision. The process hasn’t always been easy. Each day brings its challenges, opportunities and work. We spent months talking about Zitgist’s vision, voice, goals and direction.

Considering Zitgist’s business, people could think that everything was related to technologies, high-tech research and development. But today I would say that those things are nearly secondary. Certainly, activities, services and products are at the center of our discussions; however, we found that the center of everything was communication.

Communication

Mike lives in Iowa, Kingsley in Boston, me in Quebec City. The three of us have different cultures, different native languages, and live in different places.

On the other hand, Zitgist is a company that gives services and creates products to help people and businesses interlink their data: to make real the value of the global data assets. We try to make data easier to communicate, publish and share.

We belong to the semantic web community. We talk and collaborate with people from around the World: with different cultures and languages. We talk about a domain (the semantic web) that is not yet fully defined and that is still highly academic. We are still juggling with concepts and terminology that we try to share with the community and people from outside this community.

Given that, all challenges can be captured in one word: communication.

We have to communicate our ideas and vision; we have to sell our services and products; we have to make data richer and easier to use and understand; we have to create a vision, a voice and a language. So yes, this is all about communication. But even more: it is all about human communication; communicating to people and companies in different languages with different cultures.

We understand one aspect of the semantic web vision as machines talking to machines. But Zitgist’s challenge is to talk with people.

Mike is now the new orchestrator of Zitgist; it is time for us to communicate our voice to the World.

A new Zitgist

This process forged Zitgist. All the discussions we had, all the ideas we challenged and all the ways we experimented to speak with the outside World forged Zitgist’s vision and voice. The time we put into making Mike the new CEO completely changed Zitgist’s dynamic. We were not just talking about hiring someone; we were talking about growing up a business and achieving a shared vision and voice. Once more, it was about communicating ideas, concepts and vision.

It is all about communication.

Thanks for joining us, Mike.

More references about this news

The official press release
Mike’s personal perspective


Networks are everywhere


Never forget that networks are everywhere. In fact, I have the feeling that anything that has relations with other things can be seen as being part of a network: the so-called social networks, phone networks, DNA networks, protein networks, subject networks, web page networks, airport networks, street networks, etc.

In an article about the upcoming Twine, Marshall Kirkpatrick said one particular thing that made me raise my eyebrows:

“I would use Twine for recommendation alone, but the value of that feature is minimal until the service finds a large number of users. As it stands, that’s not likely to occur. When it comes to collective organization and discovery of content - nothing is as important as network effect.”

The problem I have with this sentence is that it makes me think that Marshall is saying that: network effect == people collaborating in the same closed system (à la Del.icio.us).

The key thing here is that network effects can take place in many kinds of networks, and in many places. So, does Twine, or any other so-called semantic web application, need millions of users to leverage (create value from) the network effects of different kinds of networks? I don’t think so.

Network effects will emerge from the interaction of different services, the linkage of different data sources, and the work of millions of people. Who will own all these things? The Web. Then businesses will leverage that Web, like they currently do, to create value for users.

So, is Twine, or any other so-called semantic web application, doomed because of its lack of a user base? I would guess not. It all depends on what network you’re talking about…


Trusting people on the Web


An interesting post appeared in my feed reader this morning. This post, published on Slashdot, is saying:

“[…] a Newsweek piece suggests that the era of user-generated content is going to change in favor of fact-checking and more rigorous standards. […] “User-generated sites like Wikipedia, for all the stuff they get right, still find themselves in frequent dust-ups over inaccuracies, while community-posting boards like Craigslist have never been able to keep out scammers and frauds. Beyond performance, a series of miniscandals has called the whole “bring your own content” ethic into question. Last summer researchers in Palo Alto, Calif., uncovered secret elitism at Wikipedia when they found that 1 percent of the reference site’s users make more than 50 percent of its edits. Perhaps more notoriously, four years ago a computer glitch revealed that Amazon.com’s customer-written book reviews are often written by the book’s author or a shill for the publisher. ‘The wisdom of the crowds has peaked,’ says Calacanis. ‘Web 3.0 is taking what we’ve built in Web 2.0–the wisdom of the crowds–and putting an editorial layer on it of truly talented, compensated people to make the product more trusted and refined.’”

What is probably the best way to sell something to someone? It is when someone they trust recommends buying it, for reasons X, Y and Z. This is possibly why blogs are so powerful for selling things. People write about their lives and their passions. From time to time they write about things they bought and really liked. They are not paid for it; they just share their experience with other people. What if someone you have learned to trust over time, by reading their blog, tells you that one of the things you wanted to buy, but were not sure about for some reason, is an awesome thing to have? You will more than likely be willing to buy it right away, online or in a local store. This is only possible because of the trust you have in this blogger, a trust built over time while reading their blog.

At least, it is what happens with me, and I hope I am not alone.

The problem outlined in this article is that the trust link has been broken between web readers and content creators. In systems such as Amazon.com and Ebay.com, your user identity lives on its own, only within these systems. So you, as a reader and consumer on these web sites, only have access to what these content creators said on these specific web sites. You don’t have access to the other things they have written elsewhere on the Web. This means that you have only partial and incomplete information with which to trust a person who said something about what you are reading, or about what you are about to buy. This is more a question of faith than a question of “trusting the crowd”.

Calacanis said ‘Web 3.0 is taking what we’ve built in Web 2.0–the wisdom of the crowds–and putting an editorial layer on it of truly talented, compensated people to make the product more trusted and refined’. First of all, please stop using the Web 3.0 term for anything; just stop using it at all… That aside, I don’t think the benefits would be enough to justify the costs of such a system powered by a crowd of “experts”. In that case, is the whole thing doomed?

The main force in action here is trust. The idea is to strengthen the trust level between people across all web sites. What if, from a comment published by a user on Amazon.com, I could end up knowing the URL of his blog, see the ratings he got from Ebay.com users, and read other comments he wrote on other web sites and blogs? What if I could know more about a person from any location on the Web, by referring to a comment he wrote?

Then I could start building a better trust relationship with that person, and put more weight in what he said.

Welcome to the Semantic Web.


Data Referencing, Data Mobility and the Semantic Web


I recently started to follow discussions revolving around the Data Portability project. It is an emerging community of people that tries to define the principles and push technologies to encourage the “portability” of data between people and systems. Other such initiatives exist, such as the Linking Open Data Community (which emerged from the semantic web community more than one year ago) and The Open Knowledge Definition, and there are probably many others too. However, DP is the one that recently got the biggest media coverage, given the “support” and coverage it received from some people and groups.

An interesting thread emerged from the mailing list that was trying to get a better definition of what “Data Portability” means.

Henry Story opened the door to “linked data” (instead of moving data) and Kingsley nailed the two important points of distinction:

  1. Data Referencing
  2. Data Mobility (moving data from distinct locations via Import and Export using agreed data formats)

What does the Semantic Web mean in this context?

What do these two critical points mean in terms of semantic web concepts and technologies?

Defining the context

This discussion will be articulated in one context: the Web. The current discussion will take into consideration that all data is available on the Web. This means the use of Web technologies, protocols, standards and concepts. This could be extended to other networks, with other protocols and technologies, but we will focus the discussion on the Web.

Data Referencing

How data referencing is handled on the semantic web? Well, much information is available about that question on the Linked Data Wikipedia page. Basically it is about referencing data (resources) using URIs (unique resources identifiers), and these URIs should ideally be “dereferencable” on the Web. What “dereferencable on the Web” means? It means that if I have a user account on a certain web service, and that I have one URI that define that account, and that this URI is in fact a URL, so that I can get data (normally a RDF document; in this example it would be a RDF document describing that user account) by looking at this URL on the Web (in this case we say that the URI is dereferencable on the Web).

This means one wonderful thing: if I get a reference (URI) to something, then in the best case I can also get data describing this thing by looking up its description on the Web. So, instead of only getting an HTML page describing that thing (this can be the case, but it is not limited to that), I can get the RDF description of that thing too (via web server content negotiation). This RDF description can be used by any web service, any software agent, or whatever, to help me perform specific tasks using this data (Importing/Exporting my personal data? Merging two agendas in the same calendar? Planning my next trips? And so on).
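
As a minimal sketch of the content negotiation mentioned above, the following Python snippet prepares an HTTP request that asks a server for the RDF description of a resource rather than its HTML page. The URI is a hypothetical placeholder, and whether a given server honors the Accept header is an assumption; the request is only built here, not sent.

```python
from urllib.request import Request

def build_rdf_request(uri: str) -> Request:
    """Prepare a GET request that prefers RDF/XML over HTML."""
    request = Request(uri)
    # Ask for RDF first; accept HTML as a fallback with a lower preference.
    request.add_header("Accept", "application/rdf+xml, text/html;q=0.5")
    return request

# Hypothetical URI identifying a user account (placeholder only).
request = build_rdf_request("http://example.org/users/fred")
print(request.get_header("Accept"))
```

Passing this request to urllib.request.urlopen would then perform the actual dereferencing.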

Now that I have a way to easily reference and access any data on the Web, how can that accessible data become “mobile”?

RDF and Ontologies to make data “mobile”

RDF is a way to describe things called “resources”. These resources can be anything: people, books, places, events, etc. There exists a mechanism that lets anybody describe things according to their properties (predicates). The result of this mechanism is a graph of relationships describing a thing (a resource). This mechanism does not only describe the properties of a thing; it also describes the relationships between different things. For example, a person (a resource) can be described by their physical properties, but also by their relations with other people (other resources). Think about a social graph.

What is this mechanism? RDF.

Ontologies as vocabulary standards

However, RDF can’t be used alone. To make it effective, one needs to use “vocabularies”, called ontologies, to describe a resource and its properties. An ontology can be seen as a controlled vocabulary defined by a community of experts to describe some domain of things (books, music, people, networks, calendars, etc). It is much more than a controlled vocabulary, but it is easier to understand that way.

FOAF is one of these vocabularies. You can use this ontology to describe a person, and their relations with other people, in RDF. So, you will say: this resource is named Fred; Fred lives near Quebec City; and Fred knows Kingsley. And so on.
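
To make the Fred example concrete, here is a minimal sketch in Python that writes those three statements as triples and serializes them in N-Triples. The FOAF properties (name, based_near, knows) are part of the FOAF vocabulary; the URIs for Fred, Kingsley and Quebec City are hypothetical placeholders.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical URIs for the people and the place (placeholders only).
FRED = "http://example.org/people/fred"
KINGSLEY = "http://example.org/people/kingsley"
QUEBEC_CITY = "http://example.org/places/quebec-city"

# Each triple is (subject, predicate, object). An object that is already
# wrapped in double quotes is a literal; everything else is a URI.
triples = [
    (FRED, FOAF + "name", '"Fred"'),
    (FRED, FOAF + "based_near", QUEBEC_CITY),
    (FRED, FOAF + "knows", KINGSLEY),
]

def to_ntriples(statements):
    """Serialize triples as N-Triples, one statement per line."""
    lines = []
    for subject, predicate, obj in statements:
        # Literals are kept as-is; URIs get angle brackets.
        obj_term = obj if obj.startswith('"') else f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {obj_term} .")
    return "\n".join(lines)

print(to_ntriples(triples))
```

Any system that knows FOAF can read these three lines and rebuild the same small social graph.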

By using RDF + Ontologies, the data is easily made mobile. By using standards that communities, people and enterprises agree to use, systems will be able to read, understand and manage data coming from multiple different data sources.

Ontologies are standards ensuring that all the people and systems that understand these ontologies can understand the data that is described and made accessible. This is where data becomes movable (mobility is not only about accessibility for download; it is also about understanding the transmitted data).

Data description robustness

But do you know what the beauty of RDF is? If a system doesn’t know an ontology, or does not understand all the classes and properties of an ontology used to describe a resource, it will simply ignore that data and concentrate its effort on understanding the thing being described with the ontologies it knows. It is as if I spoke to you, in the same conversation, in French, English, Italian and Chinese. You would only understand what I say in the languages you know, and you would act on the parts of the conversation you understood. You would simply discard the things you don’t understand.
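
This “ignore what you do not understand” behaviour can be sketched in a few lines of Python. The sketch assumes that a consumer recognizes a vocabulary by the namespace of each predicate; the second vocabulary and its shoeSize property are hypothetical, invented only for the illustration.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
# Hypothetical vocabulary that the consuming system has never seen.
UNKNOWN = "http://example.org/unknown-vocab#"

description = [
    ("http://example.org/people/fred", FOAF + "name", "Fred"),
    ("http://example.org/people/fred", UNKNOWN + "shoeSize", "10"),
    ("http://example.org/people/fred", FOAF + "knows",
     "http://example.org/people/kingsley"),
]

def understood(triples, known_namespaces):
    """Keep only the triples whose predicate belongs to a known vocabulary."""
    return [
        triple for triple in triples
        if any(triple[1].startswith(ns) for ns in known_namespaces)
    ]

# A system that only knows FOAF silently drops the shoeSize statement.
usable = understood(description, known_namespaces=[FOAF])
print(len(usable))
```

The unknown statement is dropped without any error, and the rest of the description stays fully usable.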

Conclusion

Well, it is hard to put all these things in one single blog post, but I would encourage people who are not familiar with these concepts, terminologies and technologies, and who are interested in the question, to start reading what the semantic web community has written about these things, which standards are supported and developed by the W3C, etc. There are so many things that can change the way people use the Web today. It is just a question of time, in fact!

Second version of Yago: more facts and entities

In the past month or two I got more and more interested in the Yago project. First, this gave me the opportunity to meet a really interesting person: the main author of Yago, Fabian Suchanek. I have been impressed by the simplicity of Yago (and creating simple things out of such complex stuff is certainly the hardest task out there) and by its coverage. It is well built and based on solid foundations. It is after downloading it, converting it into RDF, indexing it into a triple store and fixing serialization glitches and semantic relation issues that I really started to appreciate all the work that has been put into that project.

I am now pleased to write about the next version of Yago, which has recently been released by Fabian & Co. The paper describing this new version (written by Fabian, Gjergji Kasneci and Gerhard Weikum) was published about a week ago, and the new data set was released a couple of days ago. After fixing one last RDF issue with the conversion of the Yago data set into RDF, I am now ready to write something about it.

First of all, what is Yago? Yago is a kind of ontology. It is a dataset composed of entities and facts about these entities. It describes things such as: Abraham Lincoln (entity) is the successor (fact) of James Buchanan (entity). All these entities and facts come from two data sources: Wikipedia and WordNet. Please read Fabian’s paper to know exactly what comes from where.

Yago has its own representation and logic framework. However, converters exist to convert the Yago dataset into RDF serialized in XML, or into other formats. To show how self-contained Yago is, a query language has been created explicitly to query it. However, one can also convert the Yago dataset into RDF, index it in a triple store, and query the same information using SPARQL (this is what I have done myself). To read about these frameworks, and about how Yago works internally, you have to read the presentation paper written by Fabian.

So, what is new with this second version of Yago?

There are about 500 000 additional entities (for a total of about 1 500 000 entities in the Yago dataset).

Also, many new predicates have been added in this new version. Here is the list of the 99 predicates available to build queries:

actedIn, bornIn, bornOnDate, created, createdOnDate, dealsWith, describes, diedIn, diedOnDate, directed, discovered, discoveredOnDate, domain, during, during, establishedOnDate, exports, familyNameOf, foundIn, givenNameOf, graduatedFrom, happenedIn, hasAcademicAdvisor, hasArea, hasBudget, hasCallingCode, hasCapital, hasChild, hasCurrency, hasDuration, hasEconomicGrowth, hasExpenses, hasExport, hasGDPPPP, hasGini, hasHDI, hasHeight, hasImdb, hasImport, hasInflation, hasISBN, hasLabor, hasMotto, hasNominalGDP, hasOfficialLanguage, hasPages, hasPopulation, hasPopulationDensity, hasPoverty, hasPredecessor, hasProduct, hasProductionLanguage, hasRevenue, hasSuccessor, hasTLD, hasUnemployment, hasUTCOffset, hasValue, hasWaterPart, hasWebsite, hasWeight, hasWonPrize, imports, influences, inLanguage, interestedIn, inTimeZone, inUnit, isAffiliatedTo, isCalled, isCitizenOf, isLeaderOf, isMarriedTo, isMemberOf, isNativeNameOf, isNumber, isOfGenre, isPartOf, isSubstanceOf, livesIn, locatedIn, madeCoverFor, means, musicalRole, originatesFrom, participatedIn, politicianOf, produced, publishedOnDate, range, since, subClassOf, subPropertyOf, type, until, using, worksAt, writtenInYear, wrote

Also the converted RDF dump is much, much bigger than the previous one. In fact, the RDF dump that is generated is about 15 gigabytes.

Trying to slim the RDF dump using N3 serialization

After noticing the size of the RDF dump serialized in XML, I checked whether we could slim the dump a bit by serializing all the RDF in N3/Turtle instead of XML.

However, the result was not conclusive. Except for the friendliness of the N3 code compared to the XML one, there is no real gain in terms of space. The reason is that Yago extensively uses reification to assert a statement about a triple (a fact). Since there is no shorthand syntax for reification in N3 (or Turtle), we have to spell out the reification statements at length. Compare the two serializations:

An RDF/XML Yago fact:

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [<!ENTITY d "http://www.w3.org/2001/XMLSchema#">
<!ENTITY y "http://www.mpii.de/yago#">]>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:base="http://www.mpii.de/yago"
xmlns:y="http://www.mpii.de/yago/">
<rdf:Description rdf:about="&y;Abraham_Lincoln"><y:hasSuccessor rdf:ID="f200876173" rdf:resource="&y;Thomas_L._Harris"/></rdf:Description>
<rdf:Description rdf:about="#f200876173"><y:confidence rdf:datatype="&d;double">0.9486150988008782</y:confidence></rdf:Description>
</rdf:RDF>

And its RDF/N3 counterpart:

@base <http://www.mpii.de/yago> .
@prefix y: <http://www.mpii.de/yago/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<#Abraham_Lincoln> y:politicianOf <#United_States> .
<#f201920397> rdf:type rdf:Statement ;
rdf:subject <#Abraham_Lincoln> ;
rdf:predicate y:politicianOf ;
rdf:object <#United_States> ;
y:confidence "0.967356428105286"^^xsd:decimal .

Since Yago is a special case that uses reification extensively for all of its facts, you can’t gain significant hard drive space by serializing in N3: the gain is marginal at best.
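
The reason the serialization hardly matters can be sketched in Python: whatever the syntax, attaching a confidence value to a fact through standard RDF reification turns one triple into six. The identifiers below are the ones from the N3 example; the helper function itself is only an illustration, not part of the Yago converters.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
Y = "http://www.mpii.de/yago/"

def reify(statement_id, subject, predicate, obj, confidence):
    """Return the fact plus the triples that reify it and score it."""
    return [
        (subject, predicate, obj),                       # the fact itself
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
        (statement_id, Y + "confidence", confidence),    # statement about the fact
    ]

triples = reify(
    "http://www.mpii.de/yago#f201920397",
    "http://www.mpii.de/yago#Abraham_Lincoln",
    Y + "politicianOf",
    "http://www.mpii.de/yago#United_States",
    "0.967356428105286",
)
print(len(triples))
```

With every fact carrying this overhead, a more compact syntax for the individual triples cannot shrink the dump by much.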

Some queries

What would be the usefulness of Yago without being able to query it? There would be none; so let’s test it with some SPARQL queries.

Question 1: What is the name of the place where Andre Agassi lives?

SPARQL query:

sparql
select *
from <http://www.mpii.de/yago/>
where
{
<http://www.mpii.de/yago#Andre_Agassi> <http://www.mpii.de/yago/livesIn> ?place.
?place <http://www.mpii.de/yago/isCalled> ?place_name.
}

Result: “Las Vegas”

Question 2: What are the other films produced by the person who produced the movie Blade Runner?

SPARQL query:

sparql
select *
from <http://www.mpii.de/yago/>
where
{
?producer <http://www.mpii.de/yago/produced> <http://www.mpii.de/yago#Blade_Runner>.
?producer <http://www.mpii.de/yago/produced> ?other_movies.
}

Result: “The Italian Job”, “Murphy’s War”, “Robbery”

And so on. It is that simple. If you do not know the URI of an entity, you only have to refer to its label using the property isCalled.
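
As a minimal sketch of that last point, the Python function below builds a SPARQL query that looks up entities by label through isCalled, following the same query shape as above. It assumes that labels are stored as plain literals in the triple store (a literal carrying a language tag or a datatype would not match), and the function name is hypothetical.

```python
YAGO_GRAPH = "http://www.mpii.de/yago/"
IS_CALLED = "http://www.mpii.de/yago/isCalled"

def query_entities_by_label(label: str) -> str:
    """Build a SPARQL query that finds entities carrying a given label."""
    # Escape backslashes and double quotes so the label stays a valid literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "select ?entity\n"
        f"from <{YAGO_GRAPH}>\n"
        "where\n"
        "{\n"
        f'?entity <{IS_CALLED}> "{escaped}".\n'
        "}"
    )

print(query_entities_by_label("Las Vegas"))
```

The ?entity values returned by such a query can then be used as URIs in queries like the two above.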

Considering the fact that we know the properties used within Yago, and considering that all these properties are used consistently, it becomes quite easy to get interesting results by querying the dataset.

Conclusion

This new version is a clear leap ahead. It remains as simple as the first version. It is enhanced with more entities and more predicates, yet it stays consistent, with a really good accuracy level.

I would like to see one more thing with Yago: being able to dereference these URIs on the Web. I will check with Fabian to see whether we can make all these URIs dereferenceable on the Web. So expect another blog post announcing this in the following days or weeks.

Why read DataViewer pages instead of conventional web pages?

Yesterday Georgi Kobilarov asked this question after reading this blog post about the DataViewer:

“could you elaborate on an example in which the Zitgist Browser / DataViewer enables me (the end-user) to do something I couldn’t do by reading web documents or even do something faster than by reading web documents?”

This is a good and legitimate question. However, the first question would be: what are Semantic Web documents (RDF documents) used for? The idea was probably to give the Web back to machines, so that they can access and process the data more easily.

In such a vision, the use of a tool like the DataViewer can raise some questions, such as yours.

So, what is the usefulness of such a tool to end-users?

If you tell me that your home page contains the same information as the RDF document describing the entities (and all the information about them) available from your home page, then yes, a human will read your home page more easily (since HTML documents are built for humans).

But one characteristic of RDF, and we can see it with the emergence of the Linking Open Data Community, is that data (entities) can be linked together, as web pages are.

Entities, and so the data describing these entities, are meshed together. This means that my personal profile can be linked to the semantic web document describing the city where I live; it can also be linked to the articles I wrote; be linked to my friends and co-workers; be linked to the company I currently work for; to the projects I worked on and the ones I am currently working on; and so on.

All this information is accessible from one place: my personal profile semantic web document. All the other information is meshed and displayed in the DataViewer. So, instead of browsing all these web pages, you will have all the information displayed in a DataViewer page.
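
The idea of starting from one document and pulling in everything it links to can be sketched as a small graph traversal in Python. This is only an illustration of the principle, not the DataViewer's actual implementation; the resource identifiers and the links between them are hypothetical.

```python
from collections import deque

# Hypothetical linked data: each resource maps to the resources it links to.
links = {
    "profile:me": ["city:home", "article:1", "person:friend", "project:a"],
    "person:friend": ["city:home", "company:x"],
    "article:1": ["project:a"],
}

def mesh(start, graph, max_depth=1):
    """Collect the resources reachable from start within max_depth links."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        resource, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the requested depth
        for target in graph.get(resource, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen - {start}

print(sorted(mesh("profile:me", links)))
```

Raising max_depth pulls in the resources linked from my friends, my articles, and so on, which is the kind of meshed view a single page can then display.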

Given this vision of things, I envision that in the future people will craft data: describing things and linking things together, instead of crafting web pages. The data itself will drive the generation (and optimization) of user interfaces that will display that data.

The idea of the current DataViewer is simple: shapeshifting the user interface to maximize the display of meshed data. The same principles can apply to other systems such as email clients, RSS readers, web widgets, etc. The DataViewer is a kind of semantic web data management system. What it produces, for the moment, is an HTML document. But I can assure you that you will see it incarnated in other services and in other environments.

The DataViewer is not the answer to all problems of the World; however it tries to do one thing: managing the presentation of semantic web data to end-users. At t