Zitgist.com website now online

I am pleased to announce that we have finally put the Zitgist.com website online. More information about the upcoming service is available there, and people can start linking to zitgist.com when referring to this coming semantic web search engine. Eventually, people will be able to fill in a subscription form to get a private account to test the alpha version of the service. Finally, a developer wiki will be available that explains how people should describe their content in RDF for better indexing in the system, how they can take advantage of Zitgist, how they can interact with it, etc.

Also check out the new logo for the Music Ontology. This is one more step to brand the Music Ontology and help its adoption among people, companies and the community.

The Bibliographic Ontology

Zitgist, Bruce D’Arcus, the Zotero team and Michael K. Bergman have started a new initiative to develop a new citation and bibliographic references ontology. The idea for the project emerged a couple of days ago when we tried to figure out how Zotero could be integrated into a semantic web environment. This brainstorming led us to start a new ontology development project: The Bibliographic Ontology.

References

Some things are already in place to start the collaborative development of the ontology, notably a mailing list and a wiki (both mentioned in the conclusion below).

Starting the development of this ontology

As a starting point for the development of this ontology, we will take the “Citation Oriented Bibliographic Vocabulary” developed by Bruce D’Arcus. It is a start, but as he pointed out in the brainstorming, there is much work to do on it to create a better citation and bibliographic ontology. Bruce also wrote an introductory mail about what he has in mind to make it a better ontology, what he thinks we should work on, etc. Keep in mind that Bruce has a strong background and much experience in the domain of citations and bibliographic references.

Goals

The development of this ontology should be driven by its goals. Bruce outlined some goals for this ontology, and more could be added depending on how people expect to use it.

  1. Should be a superset of legacy formats like BibTeX, RIS, and so forth
  2. Must support the most demanding needs in the social sciences, humanities, and law, and those who deal with non-Western languages
  3. The class system must be able to map to the type system in the citation style language I [Bruce] designed. In short, it is not enough to just encode the data: it needs to be able to be formatted according to the often archaic details of citation styles
  4. Should be developer-friendly; I consider examples like DOAP and SKOS to be models here
  5. Behind all of these goals is a more concrete one: it should be perfect for use in OpenDocument/OpenOffice citation support and should handle Zotero’s needs.

In fact, as point 5 suggests, these systems will be the test cases for the development of this new ontology, just as Musicbrainz, Magnatune and other musical needs were the test cases for the development of the Music Ontology.

Users

Users can be many people or systems. Just to list a few of them:

  • OpenDocument/OpenOffice citation system
  • Zotero
  • Zitgist
  • Students or professors in a social science or law department
  • Book selling systems such as Amazon.com, Alibris.com or Abebooks.com
  • Publishers of books, journals, etc.
  • Authors

As you can see, many things [people or systems] are potential users of this ontology: from people without a computer background to large and complex systems such as Amazon.com, Zotero and OpenOffice.

Constraints

Users and goals define the development constraints of the ontology. We will try to take the same path that Yves Raimond and I took for the development of the Music Ontology: creating many levels of expressiveness for the ontology, with the level used depending on the user. Does the user only need to describe a simple bibliographic reference? Then he will use level one. Does the user need to describe a collaborative work aggregating many source mediums, such as writings, speeches and conferences, in many languages and within a specific timeframe? Then he will use level three. This has been quite a successful approach in the Music Ontology, so we should try it in the Bibliographic Ontology too.

Reuse of existing ontologies

This ontology will probably reuse many existing ontologies. Some of them could be (a small illustrative sketch follows the list):

  • FRBR: as the foundation of the ontology
  • FOAF: as the way to describe authors
  • SIOC: as a way to describe everything related to the social software world: wiki pages, blog posts, mailing list threads, etc.
  • MO: as a way to describe everything related to musical things
  • DC: do I have to say why?
  • Event: as a way to describe some events like workshops, conferences, etc.
  • Timeline: as a way to describe complex temporal frameworks
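
To make this more concrete, here is a minimal sketch, written in Python with rdflib, of how a single bibliographic record could lean on two of the vocabularies listed above (DC for basic metadata, FOAF for the author). The Bibliographic Ontology itself does not exist yet, so the ex: namespace and the ex:Book class below are purely hypothetical:

# Illustrative only: the Bibliographic Ontology is not yet defined, so
# the ex: namespace below is hypothetical. DC and FOAF are real vocabularies.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DC, FOAF, RDF

EX = Namespace("http://example.org/bibo-draft#")  # hypothetical draft namespace

g = Graph()
book = URIRef("http://example.org/books/1")
author = URIRef("http://example.org/people/1")

g.add((book, RDF.type, EX.Book))                 # hypothetical class
g.add((book, DC.title, Literal("An Example Book")))
g.add((book, DC.creator, author))                # Dublin Core for basic metadata
g.add((author, RDF.type, FOAF.Person))           # FOAF to describe the author
g.add((author, FOAF.name, Literal("Alice Example")))

print(g.serialize(format="turtle"))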

Conclusion

If you are interested in this new ontology development project, I suggest you subscribe to the mailing list, create a user account on the wiki, and start contributing your ideas and expertise to the development of the Bibliographic Ontology. What is great about this project is that it is already motivated by external projects, such as its integration into the OpenDocument/OpenOffice citation support and its use by Zotero for integration with Ping the Semantic Web and Zitgist.

Integration of Zotero in a Semantic Web environment to find, search and browse the Web’s citations

Zotero is a great Firefox add-on that lets its users find, search, edit and create citations they come across on the Web while browsing. All the power of Zotero resides in its “translation modules”. These modules detect citations in various types of web pages; when one is detected, Zotero notifies its users and gives them the opportunity to save it.

What interests me is that Zotero already uses some ontologies to export users’ citation libraries as RDF. When I noticed that, I started to wonder: what more could we do with Zotero?

The Zotero vision

Zotero is the best-integrated citation tool for the Web that I know of. A phenomenal number of citations can be discovered on the Web via the Zotero user community.

Remember what we did with the Semantic Radar a couple of months ago? This Firefox add-on detects SIOC RDF documents in web pages. I contacted Uldis Bojar and asked him to ping PingtheSemanticWeb.com each time a user detected an RDF file while browsing the Web. Now a good share of the RDF data pinged to PTSW comes from Semantic Radar users. This is a sort of “social semantic web discovery” technique.

What I would like to do is the same thing but for Zotero.

[Figure: zotero-ptsw-zitgist.jpg]

  1. Zotero users browse the Web, discover citations and save them into their personal libraries.
  2. Each time a Zotero instance discovers a citation, it would send the URL where the citation can be found to PingtheSemanticWeb.com.
    1. Note: the user should be made aware of this functionality via an option in Zotero that explains what the feature is all about and gives him the possibility to disable it.
    2. Note: Zotero would ping PTSW each time it detects a citation (i.e. when the icon appears in Firefox’s URL bar), and not each time a user saves one.
  3. Via the Virtuoso Sponger, PingtheSemanticWeb.com will check the incoming URLs from Zotero users for citations as well. If a citation is found, it will be added to the list of known citations and its content archived.
  4. PingtheSemanticWeb.com will then send the new citations to Zitgist so that it can include them in its database.
    1. Note: here Zitgist could be replaced by any web service wanting the data. Remember that PTSW acts as a data multiplexer.
  5. Via Zitgist (which is a semantic web search engine), users from around the world will be able to search among these citations (discovered by Zotero users) and browse them.

Zitgist as a Zotero citation provider

What is fantastic here is that Zitgist becomes a source of citations. If a Zitgist user has Zotero installed, he will be able to batch-save the list of results returned by Zitgist; and if he is browsing Zitgist’s citations, he will be able to include them in his Zotero instance just as if Zitgist were Amazon.com or any other citation website.

That way, the data found by Zotero users would be accessible to other Zotero users via Zitgist, which would then become a citation provider (mainly fed by the Zotero community).

You see the interaction?

What has to be developed?

A few things have to be developed to make this vision work. No major development is required, only a couple of features on each system.

Integration of Ping the Semantic Web into Zotero

The integration of Ping the Semantic Web into Zotero is quite straightforward.

Pinging PingtheSemanticWeb.com via a web service

The first step is to make Zotero notify PTSW each time it comes across a citation, by sending the URL of the citation(s) via XML-RPC or REST.

That is it. Each time Zotero detects a citation, it sends a simple ping to PTSW via an XML-RPC or REST request. A minimal sketch follows.
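
Here is a minimal sketch, in Python, of what such a ping could look like over REST. The endpoint path and parameter name are assumptions here; the exact interface is the one documented by PingtheSemanticWeb.com:

# Minimal sketch of a REST ping to PTSW. The /rest/ path and the "url"
# parameter are assumptions; check PTSW's documentation for the exact API.
import urllib.parse
import urllib.request

def ping_ptsw(citation_url):
    """Notify PTSW that a page containing a citation was detected."""
    endpoint = "http://pingthesemanticweb.com/rest/"  # assumed endpoint
    query = urllib.parse.urlencode({"url": citation_url})
    with urllib.request.urlopen(endpoint + "?" + query, timeout=10) as response:
        return response.status == 200

# Example: a Zotero translator just detected a citation on this page.
ping_ptsw("http://www.amazon.com/gp/product/0596000480")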

Adding a pinging option to Zotero

Another thing the Zotero team would have to add to their add-on is an option giving users the possibility to disable this feature, in case they don’t want to send a notification to PTSW each time they discover a citation on a web page while browsing the Web.

Converting Zotero translators into Sponger Metadata Cartridges

The biggest development effort would be converting the Zotero translators into Virtuoso Sponger Metadata Cartridges.

Right now, Metadata Cartridges exist for Google Base, Flickr, microformats (hReview, hCalendar, etc.), and more. These cartridges are the same thing as Zotero translators, but for the Virtuoso Sponger. By developing these cartridges, everybody running Virtuoso will be able to see these citations (from Amazon, etc.) as RDF data (mapped using some ontologies).

Documentation about how to develop these cartridges will be available in the coming days. From there, we will be able to set up an effort to convert the Zotero translators into Sponger Metadata Cartridges.

Conclusion

This is my vision of the integration of Zotero into the Semantic Web environment that exists today. Any ideas, suggestions or collaboration proposals are warmly welcome.

Note: a discussion about this subject has started on Zotero’s web forum.

Zitgist Search Query Interface: A new search engine paradigm


People are starting to talk about Zitgist: What is it? How does it work? When will it be released? This is the first article in a series that will explain the first part of the service: the search query builder user interface. As you will see below, there are many considerations to take into account when developing a semantic web search query builder.

The difference between a traditional search engine like Google and a semantic web search engine like Zitgist lies in the data that is aggregated, indexed and queried. Google mostly uses text documents such as HTML, PDF and DOC files, while Zitgist uses RDF files from native or converted data sources.

This difference has a big impact on how users build queries to answer their questions. Google users use keywords to try to define what they are searching for; the search engine then looks for these keywords in the texts it has aggregated and indexed.

The new paradigm introduced by semantic web search engines such as Zitgist is different: users describe the characteristics of the subjects of their search instead of using keywords.

As you will see, the difference between using keywords and describing characteristics of subjects will have a great impact on the user interface used to build these search queries.


A first query

[Figure: 1.jpg]

The first step is to choose which type of subject the user is searching for. In the first version of Zitgist, we let users choose among several types of subjects: musical things such as artists, bands, albums, tracks and performances, or people, groups, projects, geographical locations, documents and discussion forums.

Once the user has chosen what he is searching for, he has to describe the characteristics of that subject.

[Figure: 2.jpg]

In this first example, the user tries to find a person. As you can see, there are several characteristics describing a person that can be defined by the user. Depending on the user interface (basic or advanced), more or fewer characteristics will be available for the description of that subject.

So the user chooses to search for a person that has the name “Chris” and that is interested in the “Semantic Web”.
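
Behind the scenes, such a description maps naturally onto a graph query rather than onto keywords. Here is a small, self-contained sketch of that mapping using FOAF; that Zitgist uses exactly these FOAF properties internally is an assumption for illustration:

# Self-contained sketch: "a person named Chris interested in the Semantic
# Web" as a graph query over FOAF data (the FOAF mapping is assumed).
from rdflib import Graph

data = """
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://example.org/chris> a foaf:Person ;
    foaf:name "Chris" ;
    foaf:interest <http://example.org/semweb> .

<http://example.org/semweb> rdfs:label "Semantic Web" .
"""

g = Graph()
g.parse(data=data, format="turtle")

query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person WHERE {
  ?person a foaf:Person ;
          foaf:name "Chris" ;
          foaf:interest ?i .
  ?i rdfs:label "Semantic Web" .
}
"""

for row in g.query(query):
    print(row.person)  # -> http://example.org/chris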

[Figure: 3.jpg]

The search engine will then return results: subjects known to Zitgist that have these two characteristics.

Using Google, the user would have used the query string “chris semantic web”, which has three distinct keywords: (1) “chris”, (2) “semantic” and (3) “web”. The problem is that there is no relation between these keywords. Is he searching for someone named Chris who works in the semantic web domain, or who is interested in the semantic web? Is the user searching for something else entirely? There is no way to know. The best Google can do is put its algorithmic magic into action to try to find what the user is searching for, and hope it is really what he wants.

But with Zitgist, if the person [the subject of the search] defined himself as having the name Chris and having an interest in the Semantic Web (describing himself using RDF), then we will know that the results are definitely what the user is searching for.

Note: one of the next articles will be dedicated to what happens once a user gets results from Zitgist.


Describing relationships between more than one subject

The first example was quite simple. However, Zitgist’s query builder interface comes into its own once we try to push it a little further.

How could a user easily describe a subject, with its own set of characteristics, that knows another subject, also with its own set of characteristics?

[Figure: 4.jpg]

In this example, we have a user searching for a person known as “Alice”. But he isn’t searching for just any person named “Alice”: this user wants to find a person known as “Alice” who knows another person named “Bob”.

As you can see in the image above, this is quite easy to do using Zitgist. The user described the subject he wants to search for: “Alice”. This subject is a person with the name “Alice” who “knows” a person called “Bob”.
As you can also see, the user interface changed its color when we introduced a new subject into the query [“Bob”]. That way, users can easily see which subject they are describing.

After that, the user can always add new characteristics to Bob. He could say that Bob is interested in writing and that he lives near London, for example.
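
In FOAF terms (again, an assumed mapping), the query the interface builds could look like this sketch:

# Sketch of the "Alice knows Bob" search as a SPARQL query over FOAF.
alice_knows_bob = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?alice WHERE {
  ?alice a foaf:Person ;
         foaf:name "Alice" ;
         foaf:knows ?bob .
  ?bob a foaf:Person ;
       foaf:name "Bob" .
}
"""
print(alice_knows_bob)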

In fact, the possibilities are endless.

One more step

What is interesting about the semantic web is that anybody can describe anything. One interesting example is when we start to think about documents. What exactly is a document? What describes a document?

A document can be described by an author, a creation date, a publication date, an editor, a publisher, its medium, etc. But its content can also be described, for instance through its topics.

[Figure: 5.jpg]

If someone describes one of the documents he created and makes the topic(s) of that document explicit, Zitgist can easily find it that way:

[Figure: 6.jpg]

As you can see, a user can search for a “Document” that has, as its “Topic”, a “Person” named “Alice” who “knows” another person named “Bob”.

So, if someone were to start describing novels that way, we could easily search for books whose protagonist is called Alice and lives in London. Wouldn’t that be a terrific way to find books you might like to read? The only thing we need at the moment is people starting to describe books that way: hobbyists, authors or publishers.
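
As a sketch, such a search could compile down to something like the query below, assuming documents expose their topics with a property such as foaf:topic (an assumption for illustration):

# Sketch: a document whose topic is a person named Alice who knows a
# person named Bob. foaf:topic as the topic property is an assumption.
document_about_alice = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?doc WHERE {
  ?doc a foaf:Document ;
       foaf:topic ?alice .
  ?alice a foaf:Person ;
         foaf:name "Alice" ;
         foaf:knows ?bob .
  ?bob foaf:name "Bob" .
}
"""
print(document_about_alice)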


We always know the data we are manipulating

What is fantastic about the semantic web is that we always know what data we are manipulating. As you will see in the next articles, this characteristic of the system is the main one when the time comes to talk about user interfaces. However, I will introduce it in this article using the query builder interface.

[Figure: 7.jpg]

In the example above, a user tries to find a geographic location near a certain place. The question for the user here is: how should I describe that location? By a name? Which name? By a latitude and longitude? How?

Because we know the type of the data the user is looking for, we can try to assist him with some widgets.

In the example above, we know the user is searching for a geographical location. Ultimately, a geographical location on Earth is defined by a latitude and a longitude, so we show a map widget to the user. The only thing he has to do is click on the map to choose the location. That is it. The widget is intuitive, and the user doesn’t have to bother with how to describe the location.
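
As a sketch of what the map widget could generate behind the scenes, assuming locations are described with the W3C Basic Geo vocabulary (geo:lat and geo:long), a click on the map becomes a bounding box in the query:

# Sketch: the click on the map becomes a latitude/longitude bounding box.
# The Basic Geo vocabulary (geo:lat, geo:long) is assumed; the box below
# roughly surrounds London.
nearby_places = """
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?place ?lat ?long WHERE {
  ?place geo:lat ?lat ;
         geo:long ?long .
  FILTER (xsd:float(?lat)  > 51.3 && xsd:float(?lat)  < 51.7 &&
          xsd:float(?long) > -0.3 && xsd:float(?long) < 0.2)
}
"""
print(nearby_places)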

Another example:

[Figure: 8.jpg]

Now the user tries to find a “Music Artist” that “composed” “Albums” between “1980” and “1990”.

In such a case, how is the user supposed to describe that fact? Would he write dates like “1980-01-03”? “1980-03-01”? “3 January 1980”?

Since Zitgist knows what the user is trying to describe, it pops up a small widget that assists the user in the creation of his search query.
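
Here is a sketch of the query the date widget could produce, using the Music Ontology; the precise modelling of albums and release dates (mo:Record with a dc:date here) is an assumption for illustration:

# Sketch: artists who made albums dated between 1980 and 1990. The
# mo:Record / dc:date modelling is an assumption; MO offers richer
# release and event structures.
artists_of_the_eighties = """
PREFIX mo:   <http://purl.org/ontology/mo/>
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?artist WHERE {
  ?artist a mo:MusicArtist ;
          foaf:made ?album .
  ?album a mo:Record ;
         dc:date ?date .
  FILTER (?date >= "1980-01-01"^^xsd:date && ?date <= "1990-12-31"^^xsd:date)
}
"""
print(artists_of_the_eighties)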

This is by far the greatest strength of the semantic web when the time comes to talk about user interfaces. Since the interface knows the type of the data being manipulated, it can do all sorts of things to help users do what they really want to do. And what a user really wants to do is certainly not puzzle over questions like: how should I describe this thing?

And as you will see in the next articles, this is just the beginning.


More information about Zitgist

There is a list of blog posts I wrote about Zitgist, explaining what the project is, its goals, its vision, its release, etc.


Conclusion

Zitgist’s goal is not to replace traditional search engines such as Google. In the short and medium term, its goal is to complement traditional search engines; to be another tool in web users’ toolkits.

As you can see from this description of the semantic web search engine’s query interface, the semantic web and semantic web search engines like Zitgist will be quite useful for bringing order to, classifying and searching all the data that has been created so far, and that is yet to be created, on the Web.

In the next articles I will continue to roll out what Zitgist is, where we are with the project and how it integrates into the semantic web that is now emerging on the Web.

The only thing you have to do is sit down and watch the show.

No, I am wrong: the only thing you have to do is board the train and ride along with us by asking questions, making comments and suggestions, describing your data using RDF, letting Zitgist integrate it into its database, etc.

Welcome aboard.

Dynamic Data Web Page

What is a dynamic data web page? It is a shape-shifting data source. That is it. It is a source of data that changes its shape depending on the request made to it.


Shapes of the data source

The data source shapes the format of its output depending on what you need. If you are a human, you want something you can read and understand, like an HTML web page. If you are a web service, you probably want the data in a shape you can easily process, such as RDF, XML or JSON.

It is as simple as that: a Dynamic Data Web Page is a web page that outputs data in different formats depending on what the requesting user wants (a small sketch of this negotiation follows the list below).

There are many formats:

  1. HTML – Human readable
  2. RDF/XML
  3. RDF/N3
  4. XML
  5. JSON
  6. Others could be easily implemented if needed.
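
Here is a sketch of how a client picks one of these shapes via HTTP content negotiation. The URL is the demo DDWP used later in this post; which media types the server honours beyond text/html and application/rdf+xml is an assumption:

# Sketch: asking one DDWP URL for different shapes via the Accept header.
import urllib.request

url = ("http://demo.openlinksw.com/DAV/home/demo/Public/"
       "Queries/DataWeb/google_base_jobs_dataspace.isparql")

for accept in ("text/html", "application/rdf+xml", "application/json"):
    req = urllib.request.Request(url, headers={"Accept": accept})
    with urllib.request.urlopen(req, timeout=15) as resp:
        print(accept, "->", resp.headers.get("Content-Type"))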


In a Dynamic Data Web Page there are: a Web Page and Data

A DDWP is two things:

  1. A Web Page: as we saw above, it is a way to present/publish the data of the source formatted in some way.
  2. Data: as we will see below, it is the source of the data.

That said, a DDWP is nothing other than a source of data published in various ways.


Dissection of a Dynamic Data Web Page


0. Creation of the data source. The preliminary step is for the data source (a triple store) to continually index RDF data sources. If we are talking about a generic service, it should aggregate RDF data from everywhere: the Web, specialized databases such as Musicbrainz, Wikipedia, etc. If it is a specialized system, such as the product catalogue of a company, it should constantly sync its triple store with the catalogue. This ongoing operation creates a valuable data source.

1. Creation of a SPARQL query. An end user wants information. This end user can be anything: a person, a developer, a web service, etc. The user builds a SPARQL query that returns results from the data source.

2. Saving the SPARQL query. The SPARQL query is then saved on the web server of the service.

3. Assigning a URL to the SPARQL query. As soon as the query is saved, the web server assigns a URL to it. From there, anybody can access the results of the query by dereferencing that URL.

4. Accessing the URL

4.a. Sending the HTTP request. In our example, a web service tries to get the results returned by the SPARQL query from the DDWP. To get them, it sends an HTTP request to the web server for that URL.

4.b. Doing content negotiation with the remote server. The web service wants an XML representation of the results, since that is the only format it understands, and requests it via content negotiation with the web server. This is where the shapes of the DDWP matter: the results of the SPARQL query will be formatted into one of the possible shapes, depending on what the user asked for during negotiation, and sent to him.

5. Generating the DDWP according to the content negotiated. The Dynamic Data Web Page will be generated by the web server depending on the content negotiation the two parties agreed on.

6. Sending results to the web service. Finally the results, formatted to meet the user’s needs, will be returned to the user.


What does this mean?

It means that only the data matters. In fact, the only thing one needs now is to build a good data source; everything else can be generated from it. And remember, the data source can be anything here, from a search engine database to the product catalogue of a company, or even the personal web page of a 14-year-old geek.

From that data source, everything can be generated for each web page (URL). If the content requested is an HTML page, the data source can generate XML, run an XSLT skin template over it, and send back an HTML page: just like any other web page. From the same data source, a semantic web crawler could instead request the RDF/N3 data for the same URL; the DDWP would then send the RDF/N3 representation of that URL.

So from one data source, you can get its data the way you want.

From that point, a URL (or a web page, call it what you want) becomes a presentation web page, a web service, etc. All in one!


Some examples

Everything is made simpler with examples, so here we are. The whole concept of the Dynamic Data Web Page is made possible by Virtuoso, and all the examples below use this database management system.

Okay, to illustrate the case, we will use this Google Base Jobs page as an example:

Step #0

The triple store gets that Google Base Jobs page, converts it into RDF and indexes the triples. This is the data we will try to access.

Step #1

A user creates a SPARQL query that requests all that data. The query looks like this:

SPARQL
SELECT ?s ?p ?o
FROM <http://www.google.com/base/feeds/snippets/-/jobs?start-index=30&max-results=30&key=ABQIAAAA7VerLsOcLuBYXR7vZI2NjhTRERdeAiwZ9EeJWta3L_JZVS0bOBRIFbhTrQjhHE52fqjZvfabYYyn6A>
WHERE
{
  ?s ?p ?o .
}

Step #2

The user saves the SPARQL query on the web server in the directory “/DAV/home/demo/Public/Queries/DataWeb/” with the file name “google_base_jobs_dataspace.isparql”. A sketch of this step follows.
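
Saving the query is an ordinary WebDAV upload. Here is a sketch, assuming the query file exists locally and that the demo server’s usual demo/demo account is used (both are assumptions):

# Sketch of step #2 as a WebDAV PUT. Credentials and the local file are
# assumptions; the DAV path is the one given above.
import urllib.request

url = ("http://demo.openlinksw.com/DAV/home/demo/Public/"
       "Queries/DataWeb/google_base_jobs_dataspace.isparql")
query_text = open("google_base_jobs_dataspace.isparql", "rb").read()

mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
mgr.add_password(None, "http://demo.openlinksw.com", "demo", "demo")
opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(mgr))

request = urllib.request.Request(url, data=query_text, method="PUT")
opener.open(request, timeout=15)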

Step #3

The web server will assign a URL to that file:
http://demo.openlinksw.com/DAV/home/demo/Public/Queries/DataWeb/google_base_jobs_dataspace.isparql

Now, if the user wants to see the results of the query he just built, he only has to put this URL into his web browser. An HTML web page will be generated and displayed so that he can easily consult it.

This is a generic HTML page. But what about generating XML instead of HTML and then applying an XSLT skin template to produce the HTML for the user? There you go: another way to create traditional dynamic web pages.

Step #4.a / #4.b / #5 / #6

Now we want to show what happens when a web service requests results not in HTML but in something else, like RDF/XML.

To show how it happens, we will use the OAT RDF Browser. This is a web service that gets RDF data from somewhere on the Web and displays it to users via a web interface.

This web service does exactly steps 4, 5 and 6: it sends an HTTP request for a URL, does content negotiation with the remote web server to get RDF data, downloads the RDF data sent by the web server, consumes it and displays it to the user via its interface. A client-side sketch of these steps follows.
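
From the client side, steps 4 to 6 boil down to a few lines. Here is a sketch using rdflib as the consumer (the OAT RDF Browser itself runs in the browser; this just mirrors what it does):

# Sketch of steps 4-6: negotiate RDF/XML for the DDWP URL, download it,
# and consume the triples (rdflib stands in for the OAT RDF Browser).
import urllib.request
from rdflib import Graph

url = ("http://demo.openlinksw.com/DAV/home/demo/Public/"
       "Queries/DataWeb/google_base_jobs_dataspace.isparql")

req = urllib.request.Request(url, headers={"Accept": "application/rdf+xml"})
with urllib.request.urlopen(req, timeout=15) as resp:
    rdf_xml = resp.read()

g = Graph()
g.parse(data=rdf_xml, format="xml")
print(len(g), "triples consumed")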

The result for our example is there. As you can see, from the same URL, the DDWP sends RDF/XML data instead of HTML. The web service then consumes it and displays the same information in a different way. How different? Well, click on the Yahoo! Maps tab and you will see: the same information displayed on a map that shows where the jobs are in the United States.


Conclusion

The Dynamic Data Web Page is not a theory. It is a reality; it is something that already exists in Virtuoso and can be used by anyone who cares about simplifying the exchange of data between his systems and other systems. It is all about communication on the Web: instead of talking about languages (real world), we are talking about formats (web world).