Integration of Zotero in a Semantic Web environment to find, search and browse the Web’s citations

Zotero is a great Firefox add-on that lets its users find, search, edit and create citations while browsing the Web. All the power of Zotero resides in its “translation modules”. These modules detect citations in various types of web pages. When Zotero detects one of these citations, it notifies the user and gives him the opportunity to save it.

What interests me is that Zotero already uses some ontologies to export users’ citation libraries as RDF. When I noticed that, I started to wonder: what could we do with Zotero now?

The Zotero vision

Zotero is the best-integrated citation tool for the Web that I know of. A phenomenal number of citations can be discovered on the Web via the Zotero user community.

Remember what we did with the Semantic Radar a couple of months ago? This Firefox add-on detects SIOC RDF documents in Web pages. I contacted Uldis Bojar to ask him to ping PingtheSemanticWeb.com each time a user detected an RDF file while browsing the Web. Now a good share of the RDF data pinged to PTSW comes from Semantic Radar users. This is a sort of “social semantic web discovery” technique.

What I would like to do is the same thing but for Zotero.

[Figure: zotero-ptsw-zitgist.jpg]

  1. Zotero users browse the Web, discover citations and save them into their personal libraries.
  2. Each time a Zotero instance discovers a citation, it would send the URL where that citation can be found to PingtheSemanticWeb.com.
    1. Note: the user should be made aware of this functionality via an option in Zotero that explains what the feature is all about and gives him the possibility to disable it.
    2. Note: Zotero would ping PTSW each time it detects a citation (i.e., when the icon appears in Firefox’s URL bar), and not each time a user saves one.
  3. Via the Virtuoso Sponger, PingtheSemanticWeb.com will check the incoming URLs from Zotero users and look for citations as well. If a citation is found, it will be added to its list of known citations and its content will be archived.
  4. PingtheSemanticWeb.com will then send the new citations to Zitgist so that it can include them in its database.
    1. Note: Zitgist could be replaced here by any web service that wants this data. Remember that PTSW acts as a data multiplexer.
  5. Via Zitgist (which is a semantic web search engine), users from around the world will be able to search among these citations (discovered by Zotero users) and browse them.

Zitgist as a Zotero citation provider

What is fantastic here is that Zitgist becomes a source of citations itself. If a Zitgist user has Zotero installed, he will be able to batch-save the list of results returned by Zitgist; and while browsing Zitgist’s citations, he will be able to include them in his Zotero library just as if Zitgist were Amazon.com or any other citation web site.

That way, the data found by Zotero users would be accessible to other Zotero users via Zitgist, which would then become a citation provider (mainly fed by the Zotero community).

You see the interaction?

What has to be developed?

A few things have to be developed to make this vision work. Nothing major, only a couple of features on each system.

Integration of Ping the Semantic Web into Zotero

The integration of Ping the Semantic Web into Zotero is quite straightforward.

Pinging PingtheSemanticWeb.com via a web service

The first step is to make Zotero notify PTSW each time it comes across a citation. It would have to send the URL of that citation (or those citations) via XML-RPC or REST.

That is it. Each time Zotero detects a citation, it sends a simple ping to PTSW via an XML-RPC or REST request.
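To give an idea of how small this feature would be, here is a minimal sketch of such a ping in TypeScript. The REST endpoint and its “url” parameter are assumptions for illustration; the real calling convention would have to be taken from the PTSW documentation.

```typescript
// Minimal sketch of a PTSW ping (assumed REST endpoint and parameter name).
// The real endpoint should come from PingtheSemanticWeb.com's documentation.
const PTSW_ENDPOINT = "http://pingthesemanticweb.com/rest/"; // assumption

async function pingPTSW(citationUrl: string): Promise<void> {
  // A real implementation would first check the opt-out preference
  // described in the next section and do nothing if pinging is disabled.
  const pingUrl = `${PTSW_ENDPOINT}?url=${encodeURIComponent(citationUrl)}`;
  try {
    const response = await fetch(pingUrl); // a simple GET is enough
    if (!response.ok) {
      console.warn(`PTSW ping failed with HTTP status ${response.status}`);
    }
  } catch (err) {
    // Never interrupt the user's browsing because a ping failed.
    console.warn("PTSW ping error:", err);
  }
}

// Called whenever a Zotero translator detects a citation on the current page.
pingPTSW("http://www.example.org/a-page-with-a-citation");
```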

Adding a pinging option to Zotero

The other thing the Zotero team would have to add to their add-on is an option giving users the possibility to disable this feature, in case they don’t want to notify PTSW each time they come across a citation while browsing the Web.

Development of Zotero translators into Sponger Metadata Cartridges

The biggest development effort would be to convert the Zotero translators into Virtuoso Sponger Metadata Cartridges.

Right now, Metadata Cartridges exist for Google Base, Flickr, microformats (hReview, hCalendar, etc.), and so on. These cartridges are the same thing as Zotero translators, but for the Virtuoso Sponger. By developing these cartridges, everybody running Virtuoso will be able to see these citations (from Amazon, etc.) as RDF data (mapped using the relevant ontologies).

Documentation about how to develop these cartridges will be available in the coming days. From there, we would be able to set up an effort to convert the Zotero translators into Sponger Metadata Cartridges.

Conclusion

This is my vision of the integration of Zotero into the current Semantic Web environment. Any ideas, suggestions or collaboration proposals would be warmly welcome.

Note: a discussion about this subject has started on Zotero’s web forum.

Music Ontology’s new domain name and Wiki

I just finished setting up a web server to host the Music Ontology’s new domain name: musicontology.com. The specification document is now available at that new URL. This step has been taken to start branding the ontology and to help its adoption.

New development Wiki

I also installed MediaWiki to support the development of the ontology. This new Wiki is available at wiki.musicontology.com.

I think this is an essential element of the ontology’s development infrastructure. The Wiki will be used to make the link between the specification document and the mailing list.

The main incentive for creating this Wiki is to move some sections of the specification document into it. That way, people will be able to easily enhance these sections with new examples, etc., and it will make the specification document much smaller and simpler.

Wiki main sections

If you take a look at the left sidebar, you will notice links to the main Wiki sections. I suggest you take a look at these sections to see what they are used for. I have started to put information in these pages and will continue over the next days; but I would appreciate it if you could add a few lines if you have ideas, examples, use cases, etc. in mind.

How members should use this Wiki

First of all, I expect all mailing list members to create a user account on this Wiki. So, people interested in the Music Ontology development should follow these steps:

  1. They should register a new user account
  2. They should put their name in the “Community Member list”
  3. They should put some information about themselves on their user page
  4. They should contribute to this Wiki by adding content, examples, use cases, tools, etc.

Music Ontology Logo

Just for your information, a logo for the Music Ontology should be available soon too.

Conclusion

I think the Wiki has been a requested feature since the beginning of the development of the Music Ontology. If people start to write things in it, it will not only become the best way to develop the ontology and track its evolution; it will also become the best way to publicize it and to help its adoption by the Web community.

Virtuoso Open-Source Edition version 5 released

OpenLink Software Inc. just released the Virtuoso Open-Source Edition 5.0.0. From the press release:

This version includes:

  • Significant rewrite of database engine resulting in 50%-100% improvement on single CPU and in some cases up to 300% on multiprocessor CPUs by decreasing resource-contention between threads and other optimizations.
  • Radical expansion of RDF support including
    • In-built middleware (called the Sponger) for transforming non RDF into RDF “on the fly” (e.g. producing Triples from Microformats, REST style Web Services, and (X)HTML etc.)
    • Full Text Indexing of Literal Objects in Triple Patterns (via Filter or magic bif:contains predicate applied Literal Objects)
    • Basic Inferencing (Subclass and Subproperty Support)
    • SPARQL Aggregate Functions
    • SPARQL Update Language Support (Updates, Inserts, Deletions in SPARQL)
    • Improved Support of XML Schema Type System (including the use of XML Schema Complex Types as Objects of bif:xcontains predicate)
    • Enhancements to the in-built SPARQL to SQL Compiler’s Cost Optimizer
    • Performance Optimizations to RDF VIEWs (SQL to RDF Mapping)
  • Bug fixes

NOTE: Databases created with earlier versions of Virtuoso will be automatically upgraded to Virtuoso 5.0 but after upgrade will not be readable with older Virtuoso versions.

For more information please see:

Virtuoso Open Source Edition Links:

Most interesting new/updated features

The Sponger (an RDF crawler) is one of the great improvements of version 5 of the open source edition of Virtuoso. I talked about it in this blog post: “Making the bridge between the Web and the Semantic Web”. It is a sort of Swiss army knife for on-the-fly conversion of Web data into RDF.

I requested a full-text index over triple store literals a few months ago, and this feature is now included in the open source edition of Virtuoso. A simple SPARQL language extension enables the use of this more-than-essential feature. It can be used with triples from the triple store, or through an RDF View.
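As an illustration, here is a minimal sketch of how an application could use this extension against a Virtuoso SPARQL endpoint. The endpoint URL and the queried data are assumptions (Virtuoso usually listens on /sparql); the bif:contains pattern is the SPARQL extension mentioned in the release notes above.

```typescript
// Minimal sketch: full-text search over literals with Virtuoso's bif:contains.
// Endpoint URL and data are assumptions; adjust to your own Virtuoso instance.
const SPARQL_ENDPOINT = "http://localhost:8890/sparql";

const query = `
  SELECT ?s ?o
  WHERE {
    ?s ?p ?o .
    ?o bif:contains "semantic" .
  }
  LIMIT 25
`;

async function search(): Promise<void> {
  const url =
    `${SPARQL_ENDPOINT}?query=${encodeURIComponent(query)}` +
    `&format=${encodeURIComponent("application/sparql-results+json")}`;
  const response = await fetch(url);
  const results = await response.json();
  // Print each subject and the literal that matched the full-text search.
  for (const binding of results.results.bindings) {
    console.log(binding.s.value, "->", binding.o.value);
  }
}

search();
```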

Finally, RDF Views are an essential tool to convert existing data to RDF on the fly. A good use case I can think of is the following:

An enterprise has many data sources (employee and product data) spread over multiple database systems (MySQL and Oracle). This enterprise wants to expose all these data sources as RDF using the related ontologies. Its goal is to make the data uniform across the enterprise’s service points so that they can easily exchange information.

The only thing they have to do is to connect Virtuoso, via ODBC, to the MySQL and Oracle databases. Then they write some RDF Views that use these relational databases as data sources and convert them to RDF. Finally, all the enterprise’s management applications will be able to query these databases uniformly, across all service points, with a single SPARQL query against the Virtuoso server instance running the RDF Views.

The information itself stays in place; the only thing to add is an RDF Views layer that converts the data of the relational databases into RDF.
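To make the use case more concrete, here is a hypothetical query that any of the enterprise’s applications could run once the employee tables have been mapped, say, to foaf:Person through an RDF View. The graph IRI, the ontology terms and the endpoint are illustrative assumptions, not part of an actual mapping.

```typescript
// Hypothetical query against relational employee data exposed as an RDF View.
// Graph IRI, classes and properties below are illustrative assumptions.
const SPARQL_ENDPOINT = "http://localhost:8890/sparql";

const query = `
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  SELECT ?employee ?name ?mbox
  FROM <http://enterprise.example.org/views/employees>
  WHERE {
    ?employee a foaf:Person ;
              foaf:name ?name ;
              foaf:mbox ?mbox .
  }
`;

async function listEmployees(): Promise<void> {
  const url =
    `${SPARQL_ENDPOINT}?query=${encodeURIComponent(query)}` +
    `&format=${encodeURIComponent("application/sparql-results+json")}`;
  const response = await fetch(url);
  const results = await response.json();
  for (const b of results.results.bindings) {
    // Each row comes straight from MySQL or Oracle, but the application
    // only ever sees RDF terms.
    console.log(b.name.value, b.mbox.value);
  }
}

listEmployees();
```

The point is that the same query works no matter which relational database actually holds the rows behind the view.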

MusicBrainz RDF Views

I was waiting for that release before publishing the MusicBrainz RDF View I created to convert MusicBrainz relational data into RDF using the Music Ontology. In the next few days I should write a blog post explaining how this RDF View works and how you can use it on your own Virtuoso Open Source server instance.

Conclusion

I am sure that this new major release will help the development of many semantic web projects that could eventually change the World as we know it.

Has Robert Scoble got some incentives to ‘finally’ get what the semantic web is?

Everybody makes mistakes, and I have just made one.

Thanks for proving me wrong.

I probably should have written a blog post about it instead of writing a comment; that way I would have been sure that you saw it (like this blog post, which created an instant reaction).

Robert, it seems you didn’t receive my email 4 days ago, so I am sorry about that.

Anyway, it doesn’t change the essence of this blog post or of my comment. It is not a good start, but it is a good way to try to tie together the “Web 2.0” crowd (sorry, but I don’t like that term 😉 ) and the [academic] Semantic Web community. There are many things going on that could benefit everyone.

The only thing I would like people to remember is that the Semantic Web is not the result of one or a couple of companies, but the result of a whole; the result of the interaction between all of us.

Robert, I hope you will continue to dig deeper to find all the things people are working on related to the Semantic Web. Sorry about that, and I wish you a beautiful day!

I am asking the question and I hope I am wrong.

Some days ago, Robert Scoble wrote an enthusiastic post about what Radar Networks is currently developing. This “thing” (I call it a “thing” because nobody knows what it really is, other than some type of semantic web system) finally helped Robert understand what the semantic web is.

At that moment I was happy to see a “Web 2.0” guru understand how Semantic Web technologies could help him; how they could be used to make the World a better place to live in.

Then I told myself: “Fred, help him see what other people are doing in that direction too. Show him what you are working on, what other people are developing, what they are writing on the subject, etc.”

Then I wrote the following comment on his blog post:

Hi Robert,

Could I suggest a couple of readings in that direction that could potentially interest you?

 

  1. Zitgist Search Query Interface: A new search engine paradigm
  2. The Linked-Open-Data mailing list
  3. Planet RDF


From there, you will be able to dig deeper into the semantic web community, the ideas it plays with, what the Web is becoming, etc.

Hope it helps some people to eventually understand what is going on with the semweb.

Take care,

Fred

This comment never appeared on his blog post. It seems he rejected it in moderation. I sent him an email 3 days ago and he never replied.

Why did Robert reject this innocent comment? I have my own idea, which led to the title of this blog post: “Has Robert Scoble got some incentives to ‘finally’ get what the semantic web is?”

Did Robert reject it because I was referring to Zitgist, which is a possible competitor to what Radar Networks is working on right now?

I have no idea, but I am always frustrated when bloggers don’t tell their readers that they got some incentives to write articles about particular things.

Otherwise, why did my comment get rejected? I have no idea, but I would like to know.

In the end, these people will probably have to learn that the Semantic Web is more about cooperation between people, enterprises and other entities, and about honesty, than about the more traditional way of doing things and doing business.

I think that the Semantic Web will change things in a major way: people, societies and the way we live.

Making the bridge between the Web and the Semantic Web

 

Many people think that the semantic web will never happen, at least not in the next few years, because there is not enough useful data published in RDF. This is fortunately a misconception. In fact, many things are already accessible in RDF, even if it doesn’t appear so at first sight.

 

Triplr

Danny Ayers recently pointed out a new web service created by Dave Beckett called Triplr: “Stuff in, triples out”.

Triplr is a bridge between well-formed XHTML web pages containing GRDDL or RSS and their RDF/XML or Turtle serializations.

Here is an example

 

Virtuoso’s Sponger

Another bridging service, called the Sponger, also exists. Its goal is the same as Triplr’s: take different sources of data as input and produce RDF as output.

The Virtuoso Sponger does everything possible to find RDF triples for a given URL (via content negotiation and by checking for “link” elements in HTML files). If no RDF document is available from a URL, it will try to convert the data source available at that URL into RDF triples. Supported data sources are: microformats, RDFa, eRDF, HTML meta data tags, HTTP headers, as well as APIs like Google Base, Flickr, Del.icio.us, etc.

 

How does it work?

  1. The first thing the Sponger does is try to dereference the given URL to get RDF data from it (see the sketch after this list). If it finds some, it returns it; otherwise, it continues.
  2. If the URL refers to an HTML file, the Sponger will try to find “link” elements referring to RDF documents. If it finds one or more of them, it will add their triples to a temporary RDF graph and continue its process.
  3. If the Sponger finds microformat data in the HTML file, it will map it using the related ontologies (depending on the microformat) and create RDF triples from that mapping. It will add these triples to the temporary RDF graph and continue.
  4. If the Sponger finds eRDF or RDFa data in the HTML file, it will extract them from the HTML file, add them to the RDF graph and continue.
  5. If the Sponger finds that it is talking to a web service such as Google Base, it will map the API of the web service to an ontology, create triples from that mapping, include the triples in the temporary RDF graph and continue.
  6. If nothing has been found so far but there is some HTML meta-data, it will map it to the relevant ontologies, create triples and add them to the temporary RDF graph.
  7. Finally, if nothing is found, it returns an empty graph.
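The first step of that process is plain HTTP content negotiation. As a rough sketch, here is what dereferencing a URL and asking for RDF looks like from a client’s point of view; the URL is a placeholder, and the fallback steps listed above are not implemented here.

```typescript
// Sketch of step 1: dereference a URL and ask for RDF via content negotiation.
// The URL is a placeholder; the fallback steps (link elements, microformats,
// eRDF/RDFa, APIs, meta tags) listed above are not implemented in this sketch.
async function fetchAsRdf(url: string): Promise<string | null> {
  const response = await fetch(url, {
    headers: { Accept: "application/rdf+xml" },
  });
  const contentType = response.headers.get("content-type") ?? "";
  if (response.ok && contentType.includes("application/rdf+xml")) {
    return response.text(); // the server had RDF for us, step 1 succeeded
  }
  // Otherwise a Sponger-like service would fall back to the other steps.
  return null;
}

fetchAsRdf("http://www.example.org/resource").then((rdf) => {
  console.log(rdf ? "Got RDF/XML back" : "No RDF: fallback steps needed");
});
```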

The result is simple: from almost any URL, you are more than likely to get some RDF data related to that URL. The bridge between the Web and the Semantic Web is now made.

 

Some examples

Here are some examples of data sources converted by the Sponger:

 

Conclusion

What is fantastic for a developer is that he only has to build his system around RDF to make his application communicate with any of these data sources. The Virtuoso Sponger does all the work of interpreting the information for him.

This is where we really meet the Semantic Web.

With such tools, it is like looking at the Semantic Web through a lens.