Wikipedia concepts to help manage Semantic Web described subjects

 

This article gathers thoughts I had while working on one of my recent semantic web projects. I have no idea what these ideas are worth, but I hope to open a discussion about the problem described below.

 

The problem

What is beautiful about the semantic web is that anyone can define anything about anything. I can define myself, a project, a car, etc.

What makes it so beautiful and powerful is also a huge problem, and probably the biggest weakness of the semantic web.

For example, I can describe myself with a FOAF profile, which states relations about me. But nothing stops anybody else from defining other relations about me. Someone could describe good things about me, but they could describe bad things as well.

 

How does it happen?

Readers: you will need some understanding of semantic web concepts to read the rest of this article.

    

Basically, I can use a URI (Uniform Resource Identifier) that identifies me to describe relations about me. But anybody else can define new relations using that same URI.

The problem arises if I fetch the RDF files published by these two people (me and the other person) and index them in the same triple store. I then get many relations, defined by two different people, for the same subject: me.
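To make the situation concrete, here is a minimal Python sketch. It assumes triples are kept as plain (subject, predicate, object) tuples instead of in a real triple store, and the URIs for me and for Bob's vocabulary are hypothetical. Once the two documents are merged, a query about my URI returns relations from both authors, and the store no longer knows who said what.

```python
# Minimal sketch: triples are plain (subject, predicate, object) tuples.
# The "example.org" URIs are hypothetical; FOAF is the real FOAF namespace.
ME = "http://fred.example.org/foaf.rdf#me"
FOAF = "http://xmlns.com/foaf/0.1/"

# The document I publish about myself.
my_document = {
    (ME, FOAF + "name", "Fred"),
    (ME, FOAF + "interest", "http://fred.example.org/topics#semweb"),
}

# A document someone else (Bob) publishes about the same URI.
bobs_document = {
    (ME, FOAF + "name", "Frederick"),
    (ME, "http://bob.example.org/vocab#reputation", "bad"),
}

# Index both documents into the same store: a plain set union.
# After the merge, nothing records which document each triple came from.
triple_store = my_document | bobs_document


def describe(store, subject):
    """Return every (predicate, object) pair stored for a subject, sorted."""
    return sorted((p, o) for s, p, o in store if s == subject)


for predicate, obj in describe(triple_store, ME):
    print(predicate, obj)
```

Keeping provenance (for example with named graphs, as suggested in the comments below) is what this naive merge gives up.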

Do you see what the situation is starting to look like?

 

Same as Wikipedia

Yes, the problem looks like Wikipedia's. In fact it is the same thing, except that Wikipedia is centralized on a single infrastructure, while the semantic web is decentralized across the Web.

Since we can't prevent people from using a URI to describe the resource it identifies, we have to find another solution.

 

Some possible solutions

Possible solutions exist. For example, people indexing these documents could index only URIs that are dereferenceable (that is, URIs that can be looked up over HTTP to get an RDF file about the identified resource).

That way, they would prevent Bob from defining anything about my URI from a web server other than my own.
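One way an indexer could apply this rule is to keep a statement only when the document it came from is served from the same host as the statement's subject URI. The Python sketch below assumes the indexer knows the URL each document was fetched from; `same_host` and `filter_statements` are hypothetical names, and comparing exact host names is a simplification of "same domain" (a subdomain would count as a different host).

```python
from urllib.parse import urlparse


def same_host(subject_uri: str, document_url: str) -> bool:
    """True when the document is served from the same host as the subject URI."""
    return urlparse(subject_uri).hostname == urlparse(document_url).hostname


def filter_statements(triples, document_url):
    """Keep only the triples whose subject lives on the document's own host.

    Hypothetical indexing policy: statements about a URI are accepted only
    from documents hosted on that URI's host.
    """
    return [t for t in triples if same_host(t[0], document_url)]


# Bob's document, fetched from bob.example.org (hypothetical URLs).
bob_url = "http://bob.example.org/about-fred.rdf"
bob_triples = [
    # A statement about my URI: discarded, since it is not hosted on my server.
    ("http://fred.example.org/foaf.rdf#me", "http://xmlns.com/foaf/0.1/name", "Frederick"),
    # A statement about Bob's own URI: kept.
    ("http://bob.example.org/foaf.rdf#bob", "http://xmlns.com/foaf/0.1/name", "Bob"),
]

print(filter_statements(bob_triples, bob_url))
```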

A variant of that solution would be to fetch documents only from a list of trusted sources (as most current memetrackers do).
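A trusted-source list can be sketched as a simple whitelist check applied before indexing. This assumes trust is decided per host; `TRUSTED_HOSTS` and `is_trusted` are hypothetical, and the URLs are placeholders.

```python
from urllib.parse import urlparse

# Hypothetical whitelist, maintained by whoever runs the indexer.
TRUSTED_HOSTS = {"fred.example.org", "trusted.example.org"}


def is_trusted(document_url: str, trusted_hosts=TRUSTED_HOSTS) -> bool:
    """Accept a document only if it is served from a whitelisted host."""
    return urlparse(document_url).hostname in trusted_hosts


fetched = [
    "http://fred.example.org/foaf.rdf",
    "http://bob.example.org/about-fred.rdf",
    "http://trusted.example.org/data/dog.rdf",
]

# Only documents from trusted hosts reach the triple store.
to_index = [url for url in fetched if is_trusted(url)]
print(to_index)
```

The trade-off is that anything published outside the list is ignored, however accurate it may be.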

However, all these solutions weaken the main strength of RDF: defining anything about anything. That strength means that, if I want, I can define everything about my dog “Bud” even if his URI is not resolvable over HTTP.

 

Another solution: a Wikipedia-like supervising authority

One solution could be to create an authority that would supervise how the descriptions of URIs evolve on the semantic web.

Picture an interface like Wikipedia's, except that things would be identified by URIs instead of WikiWords.

People could register with that “authority” and define things about these URIs. URIs with conflicting descriptions could be tagged as such, discussions about a URI's description could be opened, etc.
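The registry described above boils down to one record per URI, holding who said what, the tags people attach, and the discussion. The Python sketch below is one possible in-memory model under that assumption; `UriRecord`, `UriRegistry`, and the user names are hypothetical, and tagging is left to people rather than detected automatically.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class UriRecord:
    """Everything the hypothetical registry keeps about one URI."""
    statements: list = field(default_factory=list)  # (predicate, object, author)
    tags: set = field(default_factory=set)          # e.g. {"conflicted"}
    discussion: list = field(default_factory=list)  # (author, message)


class UriRegistry:
    """A Wikipedia-like registry keyed by URIs instead of WikiWords (a sketch)."""

    def __init__(self):
        # A record is created the first time a URI is touched.
        self.records = defaultdict(UriRecord)

    def describe(self, uri, predicate, obj, author):
        self.records[uri].statements.append((predicate, obj, author))

    def tag(self, uri, label):
        self.records[uri].tags.add(label)

    def comment(self, uri, author, message):
        self.records[uri].discussion.append((author, message))


registry = UriRegistry()
me = "http://fred.example.org/foaf.rdf#me"
registry.describe(me, "http://xmlns.com/foaf/0.1/name", "Fred", "fred")
registry.describe(me, "http://xmlns.com/foaf/0.1/name", "Frederick", "bob")
registry.tag(me, "conflicted")
registry.comment(me, "alice", "Which name is correct?")
print(registry.records[me].tags)
```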

 

How could such a system be used to solve the problem?

The authority could create an ontology to describe this “meta-information” about URIs. It would then take all the information from the service and publish it in RDF using that ontology.
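As an illustration, the published meta-information could be as simple as one triple per tag. The Python sketch below serializes a URI-to-tags mapping as N-Triples; the namespace `http://registry.example.org/ontology#` and its `status` property are hypothetical stand-ins for whatever ontology the authority would actually define. The literal escaping only covers backslashes and double quotes, which is enough for simple tag names.

```python
# Hypothetical namespace for the registry's ontology; not a published vocabulary.
META = "http://registry.example.org/ontology#"


def to_ntriples(meta_info):
    """Serialize {uri: set_of_tags} as N-Triples, one META status triple per tag."""
    lines = []
    for uri in sorted(meta_info):
        for tag in sorted(meta_info[uri]):
            # Escape backslashes and double quotes so the literal stays valid.
            literal = tag.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'<{uri}> <{META}status> "{literal}" .')
    return "\n".join(lines)


meta_info = {
    "http://fred.example.org/foaf.rdf#me": {"conflicted", "under-discussion"},
}
print(to_ntriples(meta_info))
```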

From then on, anyone who displays information about URIs could add this “meta-information” to their results. Their users would then see whether the results include false information, whether the descriptions of the URI (the subject) are in conflict, etc.

 

Conclusion

For semantic web developers, using this solution is as simple as indexing RDF data into a triple store: (1) download, (2) index, (3) query, and (4) display.
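The four steps can be sketched in a few lines of Python. This assumes the meta-information is published as simple N-Triples lines using a hypothetical `status` property; the download step is stubbed with a string, and the regular expression handles only those simple lines, not the full N-Triples grammar.

```python
import re
from collections import defaultdict

# Hypothetical property from the registry's ontology.
STATUS = "http://registry.example.org/ontology#status"

# (1) Download: stubbed with a string here; a real client would fetch the
# registry's published RDF over HTTP.
DOWNLOADED = """\
<http://fred.example.org/foaf.rdf#me> <http://registry.example.org/ontology#status> "conflicted" .
<http://geo.example.org/place/42> <http://registry.example.org/ontology#status> "under-discussion" .
"""

# Matches only lines of the form <uri> <uri> "literal" .
LINE = re.compile(r'^<([^>]*)> <([^>]*)> "([^"]*)" \.$')


def index(ntriples):
    """(2) Index: map each subject URI to the set of status values seen."""
    store = defaultdict(set)
    for line in ntriples.splitlines():
        match = LINE.match(line.strip())
        if match:
            subject, predicate, value = match.groups()
            if predicate == STATUS:
                store[subject].add(value)
    return store


def query(store, uri):
    """(3) Query: statuses recorded for a URI (empty set if none)."""
    return store.get(uri, set())


def display(uri, statements, statuses):
    """(4) Display: show the statements, flagged with any meta-information."""
    warning = f"  [warning: {', '.join(sorted(statuses))}]" if statuses else ""
    print(f"{uri}{warning}")
    for predicate, obj in statements:
        print(f"  {predicate} -> {obj}")


store = index(DOWNLOADED)
me = "http://fred.example.org/foaf.rdf#me"
display(me, [("foaf:name", "Fred"), ("foaf:name", "Frederick")], query(store, me))
```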

This solution is not perfect, but it could help developers display meaningful information to their users while keeping the descriptive power of RDF.

 


4 thoughts on “Wikipedia concepts to help manage Semantic Web described subjects”

  1. Hi Fred,

    You should have a look at contexts / Named Graphs (http://www2005.org/cdrom/docs/p613.pdf) and the notion of trust (e.g. http://trust.mindswap.org/).
    Using a triple store that supports contexts, you’ll keep the provenance of each statement, so you can define trust levels for statement authors, or simply include or exclude some graphs in your queries.

    BTW, I don’t think relying on such an authority to say whether a statement is true (if I understood correctly) is a good idea.
    There are many ways statements can appear to conflict if you take them independently, but not once you consider their context (date, opinion …), and that’s also why context is important.
    Moreover, if I want to say something in RDF, I just say it. Whether to trust it should then be decided by people themselves, I think, not by an authority (even if you can then imagine generic trust services, such as blacklists for spammers).

  2. Hi Alex!

    Well, I skimmed your article, and section 8.2 is probably the formalization of what I am talking about here (good article, by the way). The other solutions in that paper are just other ways to achieve the same goal.

    I hope people will use named graphs to keep track of them (the individual graphs). You can certainly assign a trust level to each source of information, but you need some metrics to set that trust level, and my proposal was one possible metric. One could combine it with other signals, such as whether a document is dereferenceable, etc. Then one could write a nice little algorithm to set the trust level of each source.

    I said authority, but the term is probably not appropriate. In fact, is Wikipedia an authority?

    Also, the goal is not to say whether a statement is true, but to add trust information (call it what you want) about data sources and statements. Regarding opinions, remember what I said above about “conflicting” documents: it would be the same thing as when opinions conflict in some Wikipedia articles.

    So, replace “authority” with “people”. The authority I was talking about was a Wikipedia-like system, and Wikipedia is driven by people 🙂

    Take care,

    Fred

  3. Hi Fred,

    an alternative to having this authority would be to simply use the existing Wikipedia as an ontology, which gives you plenty of URIs for describing common concepts.

    We have done some work into this direction at http://sites.wiwiss.fu-berlin.de/suhl/bizer/dbpedia/

    > That way, they would restrict Bob to define anything about my URI from another web server than my own.

    That’s not true. Imagine a Google-like crawler that puts the complete Semantic Web into a single repository. When you query this repository, you find everything anybody says about you.

    I think that Ping the Semantic Web is a big help in building such repositories (I expect there to be many of them).

    Cheers

    Chris

  4. Hi Chris!

    The idea wasn’t to get reliable information about topics and concepts; it was to get information about things that are already defined (like my FOAF profile, a geoname, etc.).

    DBpedia seems great! I saw that I can download it, so if you don’t mind, I would like to put it into PTSW. Also, do you update that database from time to time?

    Well, my assertion probably needs some more explanation:

    If a description of the URI “http://a.com/rdf” is retrieved from the URL “http://b.com/rdf”, then a crawler could discard that RDF document, since “b.com” is not the same domain as the URI. That is what I meant.

    Yes, it certainly is. If you are interested in that, check what comes out in the next weeks/months.

    Take care,

    Fred
