Wikipedia as a collaborative ontology-editing tool

Introduction: ontologies

Tim Berners-Lee has a dream: the creation of a semantic web. One of the issues with that dream is that it relies on varied and well-defined ontologies. All the power of the semantic web resides in these ontologies. The current question is: how could we create such ontologies, ontologies that would semantically describe the world we live in?

Creating ontologies

Two choices are offered to us when the time comes to create ontologies:

  1. Manual creation by humans
  2. Automatic creation by document-analysis software

The problem with the second method is that we do not have the document-analysis methods and technologies needed to automatically create complete and well-defined ontologies. The best we can currently do with such software is what we could call lightweight ontologies or associative semantic networks[1]. This is a good base to start with, but clearly not evolved enough to make the semantic web a reality.

Manual ontology creation

At this time, the best way to create complete and well-defined ontologies is to write them manually. Two types of manual ontology creation exist:

  1. Non-Collaborative
  2. Collaborative

A non-collaborative ontology is created by an expert or a small group of experts; only they will be able to change and update the ontology in the future. Some issues exist with this method of ontology creation:

  • The availability of the expert(s)
  • The cost of the expert(s)
  • The possibility that the expert builds part of the ontology on beliefs about a subject that are not accepted by the community

A collaborative ontology is created and updated by a larger group of people. In fact, such an ontology will normally be created on the Internet, edited and maintained with collaborative ontology-editing tools, and shared and open to anyone who would like to change or extend it. Some issues also exist with this method of ontology creation:

  • The availability of easy-to-use collaborative ontology-editing tools
  • How should we handle open issues on which people do not agree?
  • Should these ontologies be centralized or distributed?

Wikipedia as a collaborative ontology-editing tool

A new type of web site has emerged on the Internet in the past few years: wikis. A wiki is a collaborative web-page editing tool. It enables people to create collective works like Wikipedia.

If we look at the evolution of Wikipedia (a collective encyclopedia) over the past few years, we can clearly see that people are willing to take the time to write about various subjects and share their writing with the Internet community.

People are writing up-to-date, detailed, and complete documentation on every subject that exists in our world. The question I then ask is: why would they not be willing to create ontologies related to these subjects?

So,

We have a dream: the emergence of a semantic web
We have a problem: the creation of complete and well-defined ontologies
We have a solution: to create these ontologies with collaborative ontology-editing tools
We have a problem: we do not have such tools available
We have a partially tested solution: Wikipedia

How could Wikipedia be upgraded into such a tool?

The idea is the following: upgrade the current Wikipedia wiki software by adding an ontology-editing module.

That way, when people write about a subject, they could also create an ontology related to that subject. Afterwards, other people would be able to edit, change, and upgrade these ontologies.

The infrastructure is already in place and very reliable. We know that people are willing to create such things. What we have to do now is create such a module and integrate it into the current infrastructure.
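
To make the idea a bit more concrete, here is a minimal sketch in Python of what such a module might store. It is only an illustration under my own assumptions, not a description of the actual Wikipedia software: the WikiPage class, its method names, and the "is_a" predicate are all hypothetical. Each page keeps its article text plus a set of subject/predicate/object statements, and every change is logged with its editor, the way a wiki logs revisions.

```python
from dataclasses import dataclass, field

# A statement is a (subject, predicate, object) triple.
Triple = tuple[str, str, str]


@dataclass
class WikiPage:
    """Hypothetical wiki page: article text plus an editable ontology."""
    title: str
    text: str
    triples: set[Triple] = field(default_factory=set)
    history: list[tuple[str, str, Triple]] = field(default_factory=list)

    def add_statement(self, editor: str, triple: Triple) -> None:
        # Store the statement and record who added it.
        self.triples.add(triple)
        self.history.append((editor, "add", triple))

    def remove_statement(self, editor: str, triple: Triple) -> None:
        # Drop the statement (if present) and record who removed it.
        self.triples.discard(triple)
        self.history.append((editor, "remove", triple))

    def objects(self, subject: str, predicate: str) -> set[str]:
        # Every object currently linked to `subject` through `predicate`.
        return {o for s, p, o in self.triples if s == subject and p == predicate}


page = WikiPage("Wiki", "A wiki is a collaborative web-page editing tool.")
page.add_statement("alice", ("Wiki", "is_a", "Web site"))
page.add_statement("alice", ("Wikipedia", "is_a", "Wiki"))
page.add_statement("bob", ("Wikipedia", "is_a", "Encyclopedia"))
# A later editor disagrees with an earlier statement and removes it.
page.remove_statement("carol", ("Wiki", "is_a", "Web site"))

print(sorted(page.objects("Wikipedia", "is_a")))
```

The history list is what makes this collaborative rather than just a data store: anyone can change a statement, and anyone can see who changed it. How disagreements between editors should be resolved is left open here, as it is in the list of issues earlier in this post.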

What would be the use of creating such ontologies?

The benefit is that new services and applications would be able to request and use these complete, well-defined, and reliable ontologies. It would open unimaginable possibilities in the domain of information processing. It would greatly help us handle one of the current problems we have with the Internet: information overload.

I will be at the Startup School conference this 15 October at Harvard

This conference is an initiative of Y Combinator. It will be a good opportunity to make new contacts and to hear how the speakers created, managed, and extended their past ventures. It will be a really enjoyable one-day conference at Harvard, in Cambridge: two places I have never been before.

Who will speak at that conference? There is an impressive list of people:

David Cavanaugh
Partner, Wilmer Cutler Pickering Hale and Dorr

Hutch Fishman
CFO, cMarket; CFO, Veveo

Paul Graham
Partner, Y Combinator

Marc Hedlund
Entrepreneur in Residence, O’Reilly Media

Qi Lu
VP of Engineering, Yahoo!

Mark Macenka
Partner, Goodwin Procter

Michael Mandel
Chief Economist, BusinessWeek

Stan Reiss
General Partner, Matrix

Olin Shivers
Associate Professor, Georgia Tech; Co-Founder, Smartleaf

Langley Steinert
Co-Founder, TripAdvisor

Stephen Wolfram
Founder, Wolfram Research

Steve Wozniak
Co-Founder, Apple Computer

If you can’t be there, keep an eye on the Startup School web site; they are supposed to podcast the whole event.

Hope to see you there.

If this is the Web 2.0, then what is the Semantic Web?

I re-read the article written by Tim O’Reilly: What is the Web 2.0? If this is the Web 2.0, then what is the Semantic Web? The article talks about:

1. The Web As Platform
2. Harnessing Collective Intelligence
3. Data is the Next Intel Inside
4. End of the Software Release Cycle
5. Lightweight Programming Models
6. Rich User Experiences

Tim talks about the importance of data in the Web 2.0. The question he asks is: who owns the data? It is a legitimate question, but is it really a Web 2.0 question? Possibly, but it was already a question for the Web 1.0.

Personally, I would ask the question: how should that information be presented? In the recent articles and blog posts I have read about the Web 2.0, people talk about the openness of data available through hundreds of different APIs. It is certainly a good way to gather the right information: much better than scraping the HTML content of web sites. However, I do not think it is the best way; a good way for sure, but not the best.

Why do people not talk about the semantic web: a way to present information such that it is partially, or even fully, processable by computers? Is this not even more powerful than hundreds of different APIs? Why is it much more powerful? Because these same applications could talk to each other without caring about each API’s protocol. I think that we should talk about semantic web concepts much more than about APIs when we talk about the place of data in the Web 2.0.
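
Here is a small Python sketch of what I mean, under heavy simplifying assumptions. The two sites, their data, and the "author" and "topic" vocabulary are all hypothetical; the only point is that when both sides publish statements in the same subject/predicate/object form with a shared vocabulary, combining them needs no site-specific client code.

```python
# Two hypothetical sites publish statements as (subject, predicate, object)
# triples, using the same assumed vocabulary ("author", "topic").
site_a = {
    ("post:1", "author", "Alice"),
    ("post:1", "topic", "Semantic Web"),
}
site_b = {
    ("post:2", "author", "Bob"),
    ("post:2", "topic", "Semantic Web"),
    ("post:3", "topic", "Web 2.0"),
}

# Merging is a plain set union: no per-site API client or adapter is needed.
merged = site_a | site_b


def subjects_with(triples, predicate, obj):
    """Return, sorted, every subject that has the given predicate/object pair."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


print(subjects_with(merged, "topic", "Semantic Web"))
```

With hundreds of different APIs, the union step would instead be hundreds of small adapters, one per protocol. The sketch hides the hard part, of course: getting everyone to agree on the vocabulary, which is exactly the ontology problem.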

If I am wrong, then what is the Semantic Web?

Who said that conferences are more and more useless? Web 2.0 conference 2005

I read something some months ago saying that conferences, symposiums, etc. are more and more worthless and useless, considering their business evolution and so on. I do not know if the author was right, but the one thing I do know is that this is not the case for the Web 2.0 conference 2005. In fact, people are more than enthusiastic about following it. Many discussions are emerging everywhere in the blogosphere about it, and about the subject it covers: the Web 2.0. More than ever, people are trying to define what the Web 2.0 is, a hyped term that people use in any context.

I already tried to roughly define what the Web 2.0 is, and Tim O’Reilly wrote a beautiful essay on the subject. It is a must-read for anyone wanting to follow the discussions that will emerge from that conference.

So, are these conferences worthless and useless? Follow the current and future discussions emerging from that conference and ask yourself the question again.