Archive for July, 2007

News at Zitgist: the Browser, PTSW, the Bibliographic Ontology and the Query Service

It is not because we had some issues with the Zitgist Browser‘s server that things stopped at Zitgist. In fact, many projects evolved at the same time and I outline some of these evolutions bellow.

New version of the Zitgist Browser

A new version of the browser is already on the way. In fact, the pre-release version of the browser was a use case; a prototype. Now that we know that it works and that we faced most of the issues that have to be taken into account to develop such a service, we hired Christopher Stewart to work on the next version of the browser. He is already well into the problem now, so you could expect a release of this new version sooner than you could be expecting. At first, there won’t be many modifications at the user interface level, however, many things will be introduced in this new version that will help us to push the service at another level in the future.

New version of Ping the Semantic Web

The version 3.0 of the PingtheSemanticWeb.com web service should be put online next week. It will be a totally new version of the service. It won’t use MySQL anymore; Virtuoso has replaced it. The service will now fully validate RDF files before including them in the index. More stats will be available too. It is much faster (as long as remote servers are fast too) and I estimate that this only server could handle between 5 to 10 million pings per day (enough for the next year’s expansion). This said, the service will be pushed at another level and be ready for more serious traffic. After its release, a daily dump of all links will be produced as well.

The first draft of the Bibliographic Ontology

The Bibliographic Ontology Specification Group is on fire. We are now 55 members and generated 264 posts in July only. Many things are going on here and the ontology is well underway. We should expect to release a first draft of the ontology sometime in August. If you are interested in bibliographic things, I think it’s a good place to be.

The Zitgist Semantic Web Query Service

Finally, Zitgist’s Semantic Web Query Service should be available for alpha subscribed users sometime in September. You can register to get your account here. Also, take a look at what I wrote about vis-à-vis this search module (many things evolved since, but it’s a good introduction to the service).

Conclusion

So, many things are going on at Zitgist and many exiting things should happen this autumn, so stay tuned!

Zitgist Browser's server stabilized

Five weeks ago I introduced the Zitgist Browser on this blog. At that time, I talked about a pre-release of the service. These two little words probably helped to explain what followed in the following weeks.

In fact, some of you probably noticed that the Zitgist Browser was down half of the time for a couple of weeks. In fact, we found many issues at many levels that rendered the browser's server unstable. In the last weeks, we performed a battery of tests to fix all issues that appeared. Now, about three weeks later, the server is back stable. At least, it has been online for the last couple of days without any issues.

Thanks to the OpenLink Software Inc. development team, we have been able to stabilize the service; and it wouldn't have been possible without their help and expertise.

Finally, stay tuned for the next release of this service and continue to use it and report issues that you could encounter while browsing the semantic web (more information about the next version in the next blog post); and sorry about the possible frustrations you possibly had when you used the unstable version of the service.

UMBEL: Upper-level Mapping and Binding Exchange Layer

umbel_medium.png

Mike Bergman released the first draft of its UMBEL ontology. Me and some other people helped him to come up with that new ontology.What is UMBEL? UMBEL is a lightweight subject reference structure. People can see it as a pool of subjects. Subjects are related together at a synonymy level; so, subjects of related meaning will be binded together.

The objectives

The objectives of this new ontology are:

  • A reference umbrella subject binding ontology, with its own pool of high-level binding subjects
  • Lightweight mechanisms for binding subject-specific ontologies to this structure
  • A standard listing of subjects that can be refererenced by resources described by other ontologies (e.g., dc:subject)
  • Provision of a registration and look-up service for finding appropriate subject ontologies
  • Identification of existing data sets for high-level subject extraction
  • Codification of high-level subject structure extraction techniques
  • Identification and collation of tools to work with this subject structure, and
  • A public Web site for related information, collaboration and project coordination.

Main applications

Given these objectives, I see a couple of main applications where such ontology could be used:

  • Helping systems to find data sources for a given ontology. UMBEL is much more than a subject structure. In fact, UMBEL will bind subjects, with related ontologies and data sources for these related ontologies. So, for a given subject, people will be able to find related ontologies, and then related data sources.
  • Acting as a subject reference backbone. So, it could be use by people to links resources, using dc:subject, to its subject resource (the UMBEL subject proxy resource), etc.
  • Could be used by user interface to help them with handling subjects (keywords) references to find related ontologies (that have the power to describe these subjects).
  • Eventually it should be used by PingtheSemanticWeb to bind pinged data to the subject reference structure.
  • And probably many others.

Creation of the Ontology

A procedure will be created to automatically generate the ontology. The gross idea is to reuse existing knowledge bases to create the set of subjects, and their relationship, that will create the ontology. So, the idea is to come up with a representative, not too general, not too specialized, set of subjects. For that, we will play with knowledge bases such as WordNet, Wikipedia, Dmoz, etc. We will try to find out how we could prune unnecessary subjects out of them, how we could create such a subject reference framework by taking a look at the intersection of each data set, etc. The procedure is not yet developed, but the first experiments will look like that.

As explained in the draft:

The acceptance of the actual subjects and their structure is one key to the acceptance - and thus use and usefulness - of the UMBEL ontology. (The other key is simplicity and ease-of-use or tools.) A suitable subject structure must be adaptable and self-defining. It should reflect expressions of actual social usage and practice, which of course changes over time as knowledge increases and technologies evolve.

A premise of the UMBEL project is that suitable subject content and structures already exist within widely embraced knowledge bases. A further premise is that the ongoing use of these popular knowledge bases will enable them to grow and evolve as societal needs
and practices grow and evolve.

The major starting point for the core subject pool is WordNet. It is universally accepted, has complete noun and class coverage, has an excellent set of synonyms, and has frequency statistics. It also has data regarding hierarchies and relationships useful to the UMBEL look-up reference structure, the ‘unofficial’ complement to the core ontology.

A second obvious foundation to building a subject structure is Wikipedia. Wikipedia’s topic coverage has been built entirely from the bottom up by 75,000 active contributors writing articles on nearly 1.8 million subjects in English alone, with versions in other
degrees of completeness for about 100 different languages. There is also a wealth of internal structure within Wikipedia’s templates.

These efforts suggest a starting formula for the UMBEL project of W + W + S + ? (for WordNet + Wikipedia + SKOS + other?). Other potential data sets with rich subject coverage include existing library classification systems, upper-level ontologies such as SUMO, Proton or DOLCE, the contributor-built Open Directory Project, subject ‘primitives’ in other languages such as Chinese, or the other sources listed in Appendix 2 – Candidate Subject Data Sets.

Though the choice of the contributing data sets from which the UMBEL subject structure is to be built will never be unanimous, using sources that have already been largely selected by large portions of the Web-using public will go a long ways to establishing authoritativeness. Moreover, since the subject structure is only intended as a lightweight reference — and not a complete closed-world definition — the UMBEL project is also setting realistic thresholds for acceptance.

Conclusion

If you are interested in such an ontology project, please join us on the mailing list of the ontology's development group, ask questions, writes comments and suggestions.

Next step is to start creating a first version of the subject proxies.




This blog is a regularly updated collection of my thoughts, tips, tricks and ideas about my semantic Web researches and related software development.


RSS Twitter LinkedIN


Follow

Get every new post on this blog delivered to your Inbox.

Join 66 other followers:

Or subscribe to the RSS feed by clicking on the counter:




RSS Twitter LinkedIN