Benchmark of PHP’s main String Search Functions

I am currently upgrading the structWSF ontology-related web service endpoints, along with the structOntology conStruct module, to make them perform better so that we can load ontologies that have thousands of classes and properties (at least up to 30 000 of them).

While testing these upgrades with the UMBEL ontology, I noticed that much of the time was spent in a small number of stripos() calls located in the loadXML() function of ProcessorXML.php, the internal structXML parser. They were used to extract the prefixes in the header of the structXML files, and then to resolve them within the XML file. I was using stripos() instead of strpos() to make the parsing of these structXML files case-insensitive, even though XML itself is case-sensitive. However, because of their processing cost, I changed this behavior by using the strpos() function instead. Here are the main reasons for this change:

  • XML is itself case-sensitive, so don’t try to be too clever
  • These structXML files that are exchanged are mostly internal to structXML
  • Their parsing performance is critical

The Tests

This is a non-scientific post about some experiments I ran with the various PHP 5.3 string search functions. These tests were performed on a small Amazon EC2 instance using DBG and PHPeD.

[cc lang=’php’ line_numbers=’true’]
[raw]

[/raw]
[/cc]

The first test uses a text of 138 words. That text gets exploded into an array where each value is a word of the text. Then, before each iteration, we randomly select a word and search for it within the text using each of the four search functions: strpos(), stripos(), strstr() and stristr().
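For readers who want to reproduce this kind of test outside of PHP, here is a minimal Python sketch of the procedure described above. It is an analogue built on assumptions, not the original PHP script: `str.find()` stands in for strpos(), searching lowercased copies of both strings stands in for stripos() (as far as I know, PHP 5's stripos() works in a similar way, which would account for part of its extra cost), and slicing from the match stands in for strstr() and stristr(). The function names and the short sample text are hypothetical.

```python
import random
import timeit

# Hypothetical sample text; the original tests used texts of 138 and 497 words.
TEXT = (
    "An ontology is the definition of a vocabulary and the rules for "
    "combining its terms used to describe things that need to be communicated"
)

def php_strpos(haystack, needle):
    """Case-sensitive position of needle, or False (mimics PHP's strpos())."""
    pos = haystack.find(needle)
    return False if pos == -1 else pos

def php_stripos(haystack, needle):
    """Case-insensitive variant: lowercase copies of both strings, then search."""
    return php_strpos(haystack.lower(), needle.lower())

def php_strstr(haystack, needle):
    """Rest of haystack from the first match, or False (mimics PHP's strstr())."""
    pos = haystack.find(needle)
    return False if pos == -1 else haystack[pos:]

def php_stristr(haystack, needle):
    """Case-insensitive strstr(): match on lowercased copies, slice the original.
    Assumes lowercasing does not change the string length (true for ASCII text)."""
    pos = haystack.lower().find(needle.lower())
    return False if pos == -1 else haystack[pos:]

FUNCTIONS = [php_strpos, php_stripos, php_strstr, php_stristr]

def run_benchmark(text, iterations, seed=0):
    """Pick one random word per iteration and time each search function on it."""
    words = text.split()  # rough equivalent of PHP's explode(" ", $text)
    rng = random.Random(seed)
    needles = [rng.choice(words) for _ in range(iterations)]
    timings = {}
    for func in FUNCTIONS:
        start = timeit.default_timer()
        for needle in needles:
            func(text, needle)
        timings[func.__name__] = timeit.default_timer() - start
    return timings

if __name__ == "__main__":
    for n in (10_000, 100_000):
        for name, seconds in run_benchmark(TEXT, n).items():
            print(f"{n:>7} iterations  {name:<12} {seconds:.4f}s")
```

Using a fixed seed means every function searches for the same sequence of random words, which keeps the comparison fair; the absolute timings will of course differ from PHP's.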

Note that in the result images below, the numbers in the left-most column refer to the lines of the PHP code above.

The first test runs 10 000 iterations. Here are the results of the first run:


The second test uses the same 138 words, but the test is performed 100 000 times:

As we can see, strpos() and strstr() are clearly faster than their case-insensitive counterparts.

Now, let’s see what impact the size of the searched text has. We will now run the same two tests, with 10 000 and 100 000 iterations, but with a text of 497 words.

[cc lang=’php’ line_numbers=’true’]
[raw]

[/raw]
[/cc]

The third test runs 10 000 iterations. Here are the results of the third run:

The fourth test uses the same 497 words, but the test is performed 100 000 times:

As we can see, even with a longer text, the same relative performance is observed.

Conclusion

After many runs (only a few are shown here), I think I can affirm that strpos() and strstr() are much faster than their case-insensitive counterparts. strpos() also seems a little faster than strstr(), but this appears to depend on the context and on which random words are searched for. In any case, PHP’s documentation recommends using strpos() instead of strstr() when you only need to know whether a needle occurs in a haystack, since strpos() is faster and uses less memory.

There may also be memory considerations I am not aware of that affect the code I used to test these functions. Nevertheless, I can affirm that in a real context, where queries are sent to the Ontology: Read web service endpoint that hosts the UMBEL ontology, strpos() is much faster than stripos().

What is an Ontology?

An ontology is the definition of a vocabulary, and the rules for combining its terms, used to describe things that need to be communicated.

This is yet another tentative definition of an ontology as applied to the Semantic Web. Before explaining that definition, I would like to state what I think is the main purpose of an ontology:

The main purpose of an ontology is to communicate coherent and consistent information.

Different Kinds of Ontologies

Over the years, I have tended to use the word “vocabulary,” along with the word “ontology,” in different blog posts and technical documents. However, the usage of each word may not always have been clear. Is a vocabulary an ontology? Is an ontology a vocabulary? Are these concepts synonymous? There is an important distinction to make: an ontology can be a vocabulary, but an ontology is much more than a simple vocabulary.

Ontologies can describe all kinds of well-known knowledge representation structures, some simple and others much more complex. Here is a short list of some of them:

  • lexicons
  • taxonomies, or
  • higher order knowledge description frameworks

In its most basic usage, an ontology defines a vocabulary. It simply defines the terms (words) that belong to that vocabulary, without saying anything about how these words are used.

Then, an ontology could evolve into a taxonomy by defining hierarchical relationships between the terms that compose the vocabulary.

Finally, it can evolve further to become a higher-order knowledge description framework that defines more complex usage rules such as usage restrictions, all kinds of relationships between described entities, etc. New knowledge could also be inferred. This is why I say that an ontology is not simply a vocabulary, but a powerful knowledge description framework.

Knowledge Base

As we saw above, the main purpose of an ontology is to create a coherent and consistent knowledge base of information that can be communicated. So an ontology is a kind of language that lets you create knowledge bases that are consistent, coherent and where new knowledge can be inferred. This is done by following the usage rules defined in the ontology.

However, there is another important aspect to take into account: an ontology describes knowledge that is coherent and consistent, but according to that ontology’s own view of the World. This means that two ontologies describing the same domain of knowledge could each describe information consistently and coherently, according to their own view of the World.

Let’s take an example. Let’s say that two book stores developed their own ontologies to describe the books they sell. There is a good chance that they will use the same vocabulary to describe their books. However, the usage rules between these terms may differ between the two book stores. One of the book stores could say that a proceeding is a specialized kind of book. But the other book store could say that no, a proceeding is not a specialized kind of book, but a document just like a book. So, both would describe a proceeding as a document, but they would have different interpretation rules about what a book really is. As you can see, both book stores use the same vocabulary to describe their library of books, but they interpret its meaning differently. If the two stores had to exchange information about books in the future, they wouldn’t have much difficulty, because they probably share the same vocabulary; but the interpretation of that information may differ. These differences in interpretation could affect where a book is shelved in the store, or how customers search for a specific book using different filtering criteria, etc.
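To make the example concrete, here is a toy Python sketch: both stores share the same vocabulary (Document, Book, Proceeding), but each declares its own subclass relations, and a naive transitive-closure inference, in the spirit of rdfs:subClassOf, then derives different types for the same proceeding. The data structures and the function name are hypothetical and for illustration only; a real ontology would be expressed in RDFS or OWL and processed by a proper reasoner.

```python
# Shared vocabulary: both stores use exactly the same terms.
VOCABULARY = {"Document", "Book", "Proceeding"}

# Each store's own view of the World, as (subclass, superclass) pairs.
STORE_A = {("Book", "Document"), ("Proceeding", "Book")}
STORE_B = {("Book", "Document"), ("Proceeding", "Document")}

# Both stores only use terms from the shared vocabulary.
assert all(term in VOCABULARY for pair in STORE_A | STORE_B for term in pair)

def inferred_types(asserted_type, subclass_of):
    """Return the asserted type plus every superclass reachable through subclass_of."""
    types = {asserted_type}
    frontier = [asserted_type]
    while frontier:
        current = frontier.pop()
        for sub, sup in subclass_of:
            if sub == current and sup not in types:
                types.add(sup)
                frontier.append(sup)
    return types

# The same record, "a proceeding", interpreted by each store.
print(sorted(inferred_types("Proceeding", STORE_A)))  # ['Book', 'Document', 'Proceeding']
print(sorted(inferred_types("Proceeding", STORE_B)))  # ['Document', 'Proceeding']
```

Both stores agree that the proceeding is a document, but only store A would list it when a customer filters on books.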

This is no different from what happens in our daily lives: is there a day when you don’t hear people arguing over different points of view? The same thing happens here. We may all live through and see the exact same events, images, sounds, etc., but we may each interpret these things differently.

Ontologies in the Open Semantic Framework?

Ontologies are so flexible that we chose to make them the “brain” of the Open Semantic Framework.

We wanted to use the most flexible knowledge description framework possible, one that would let us integrate any information source described using any existing kind of knowledge representation structure, simple or really complex, such as lexicons, taxonomies, relational schemas, etc. By using ontologies as its central piece, OSF is a flexible data integration framework that can consolidate information from various heterogeneous sources.

If we go back to the definition we started with, ontologies are not just about describing terms and their relationships in a coherent and consistent way. The ultimate purpose is to communicate that information. This is what the structWSF part of the Open Semantic Framework does: it lets any kind of system that has access to the Internet send, receive and manipulate information in multiple formats through a series of web service endpoints.

More Reading

Finally, I suggest you read Mike’s Intrepid Guide to Ontologies for a better understanding of where ontologies come from, how they work, what formats exist, what the different approaches to ontologies are, and what tools currently exist to work with them.

Querying the MyPeg datasets using the structWSF SPARQL endpoint

The last blog post I wrote demonstrated how one could query the MyPeg.ca portal using the full set of structWSF web service endpoints to get data out of the portal. However, I didn’t cover the usage of the SPARQL endpoint since I wanted to cover it in its own blog post to explain all its characteristics.

In this blog post, I will demonstrate how one can get data out of the MyPeg.ca community indicators web portal for Winnipeg’s citizens using the SPARQL endpoint. I will also cover the specifics of this SPARQL endpoint: its characteristics and its access and permission features.

Two Modes

The first characteristic of the structWSF SPARQL endpoint is that it can be used in two modes (use cases):

  1. Getting SPARQL resultsets that match some SPARQL query patterns
  2. Getting complete record descriptions in any format supported by the endpoint

The first mode is no different from any other SPARQL endpoint. Users just send SPARQL queries and retrieve the related SPARQL resultsets. These resultsets can be returned using different MIME types.

For a SELECT query, these formats can be used:

  1. application/sparql-results+xml
  2. application/sparql-results+json

For a DESCRIBE or a CONSTRUCT query, these formats can be used:

  1. text/rdf+n3 (RDF+N3)
  2. application/rdf+xml (RDF+XML)
  3. application/rdf+json (RDF+JSON)
  4. text/plain (NTRIPLES)

The second mode is quite different, and it is the main distinguishing characteristic of the structWSF SPARQL endpoint: it can export resultsets in formats that other endpoints do not usually support. However, these formats are usually used to serialize complete record descriptions, not just some triples matching SPARQL patterns.

For that reason, the SPARQL query sent using this second mode needs to have the three variables (1) ?s, (2) ?p and (3) ?o bound, otherwise an empty resultset will be returned. For example, the following SPARQL query would return the complete record descriptions of all the records that are of type peg:Theme and that are themes of the peg:WellBeing cross-cutting issue:

[cc lang=’text’ line_numbers=’false’ escaped=’true’]SELECT ?s ?p ?o
WHERE
{
?s a <http://purl.org/ontology/peg#Theme> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> ;
?p ?o .
}[/cc]
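Why all three variables? One way to picture it: with ?s, ?p and ?o bound, every solution row is a complete triple, so the rows can be regrouped by subject into record descriptions that a structXML, RDF/XML or N3 serializer can then write out. The Python sketch below shows only that regrouping step; it is an assumption about the general idea, not structWSF’s actual code, and the function name is hypothetical. Rows lacking one of the bindings are dropped, so a query that doesn’t bind all three yields nothing, much like the empty resultset described above.

```python
from collections import defaultdict

def group_into_records(rows):
    """Group SPARQL solution rows (dicts with 's', 'p', 'o') into
    subject -> predicate -> [objects] record descriptions."""
    records = defaultdict(lambda: defaultdict(list))
    for row in rows:
        # A row missing any of the three bindings cannot form a triple; skip it.
        if not all(key in row for key in ("s", "p", "o")):
            continue
        records[row["s"]][row["p"]].append(row["o"])
    # Convert the nested defaultdicts to plain dicts for printing and comparison.
    return {s: dict(preds) for s, preds in records.items()}

PEG = "http://purl.org/ontology/peg#"
FW = "http://purl.org/ontology/peg/framework#"
rows = [
    {"s": FW + "Economy", "p": PEG + "isThemeOf", "o": FW + "WellBeing"},
    {"s": FW + "Economy", "p": PEG + "isThemeOf", "o": FW + "Poverty"},
    {"s": FW + "Economy", "p": "http://www.w3.org/2000/01/rdf-schema#label", "o": "economy"},
]
records = group_into_records(rows)
print(records[FW + "Economy"][PEG + "isThemeOf"])
# ['http://purl.org/ontology/peg/framework#WellBeing', 'http://purl.org/ontology/peg/framework#Poverty']
```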

This mode is used to return a set of record descriptions that match a SPARQL pattern. Supported MIME formats for this second mode are:

  • text/xml (structXML)
  • application/json (structXML in JSON)
  • application/rdf+xml (RDF+XML)
  • application/rdf+n3 (RDF+N3)
  • application/sparql-results+xml (SPARQL resultset in XML)
  • application/sparql-results+json (SPARQL resultset in JSON)

Getting Records in Different Formats

Now, let’s take a look at what is returned for the SPARQL query above, for each of these supported MIME types, from the MyPeg.ca SPARQL endpoint.

Note that the queries below use the Curl application (available for multiple operating systems) to send the HTTP queries to the structWSF SPARQL web service endpoint.
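If you would rather script these calls than use Curl, here is a minimal Python sketch, using only the standard library, that builds the same HTTP POST: a form-encoded body carrying the `dataset` and `query` parameters, and an `Accept` header that selects the output format. The helper name is hypothetical, and the `urlopen()` call is left commented out so the sketch runs without network access.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

ENDPOINT = "http://www.mypeg.ca/ws/sparql/"

QUERY = """SELECT ?s ?p ?o
WHERE
{
  ?s a <http://purl.org/ontology/peg#Theme> ;
     <http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> ;
     ?p ?o .
}"""

def build_sparql_request(query, dataset, mime="text/xml"):
    """Build a POST request equivalent to the Curl commands used in this post."""
    body = urlencode({"dataset": dataset, "query": query}).encode("utf-8")
    return Request(
        ENDPOINT,
        data=body,  # urlopen() sends this as application/x-www-form-urlencoded
        headers={"Accept": mime},
        method="POST",
    )

req = build_sparql_request(QUERY, "http://www.mypeg.ca/wsf/datasets/249/", "application/rdf+xml")
print(req.get_method(), req.full_url, req.get_header("Accept"))

# Decode the body again to check exactly what would be sent.
sent = parse_qs(req.data.decode("utf-8"))
print(sent["dataset"])

# To actually send it:
# from urllib.request import urlopen
# print(urlopen(req).read().decode("utf-8"))
```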

StructXML: text/xml

[cc lang=’text’ line_numbers=’false’ escaped=’true’]
curl -H "Accept: text/xml" "http://www.mypeg.ca/ws/sparql/" -d "dataset=http://www.mypeg%2Eca%2Fwsf%2Fdatasets%2F249%2F&query=SELECT+%3Fs+%3Fp+%3Fo%0D%0AWHERE%0D%0A{%0D%0A++%3Fs+a+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%23Theme>+%3B%0D%0A+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%23isThemeOf>+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%2Fframework%23WellBeing>+%3B%0D%0A+++++%3Fp+%3Fo+.%0D%0A}%0D%0A"
[/cc]

[cc lang=’xml’ line_numbers=’false’ escaped=’true’]

<?xml version="1.0" encoding="utf-8"?>
<resultset>
<prefix entity="owl" uri="http://www.w3.org/2002/07/owl#"/>
<prefix entity="rdf" uri="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>
<prefix entity="rdfs" uri="http://www.w3.org/2000/01/rdf-schema#"/>
<prefix entity="wsf" uri="http://purl.org/ontology/wsf#"/>
<subject type="http://purl.org/ontology/peg#Theme" uri="http://purl.org/ontology/peg/framework#Economy">
<predicate type="http://purl.org/ontology/peg#isThemeOf">
<object uri="http://purl.org/ontology/peg/framework#WellBeing"/>
</predicate>
<predicate type="http://purl.org/ontology/peg#isThemeOf">
<object uri="http://purl.org/ontology/peg/framework#Poverty"/>
</predicate>
<predicate type="http://purl.org/ontology/sco#displayComponent">
<object uri="http://purl.org/ontology/sco#sRelationBrowser"/>
</predicate>
<predicate type="http://www.w3.org/2000/01/rdf-schema#label">
<object type="rdfs:Literal">economy</object>
</predicate>
<predicate type="http://purl.org/ontology/iron#prefLabel">
<object type="rdfs:Literal">Economy</object>
</predicate>
<predicate type="http://purl.org/dc/elements/1.1/description">
<object type="rdfs:Literal">Economy includes all that people do in our community to produce, exchange, distribute, and consume goods and services.</object>
</predicate>
</subject>
</resultset>

[/cc]
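Because structXML is plain XML, any XML library can read it. Here is a minimal Python sketch using the standard `xml.etree.ElementTree` module that turns a structXML resultset into (subject, predicate, object) triples. It relies only on the element and attribute names visible in the output above (`subject/@uri`, `subject/@type`, `predicate/@type`, and either `object/@uri` or the object’s text); the function name is hypothetical, and the sample is a trimmed copy of that output.

```python
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def structxml_to_triples(xml_text):
    """Convert a structXML resultset into a list of (subject, predicate, object) tuples."""
    root = ET.fromstring(xml_text)
    triples = []
    for subject in root.findall("subject"):
        s = subject.get("uri")
        # The subject's type attribute is treated as an rdf:type triple.
        if subject.get("type"):
            triples.append((s, RDF_TYPE, subject.get("type")))
        for predicate in subject.findall("predicate"):
            p = predicate.get("type")
            for obj in predicate.findall("object"):
                # Resource objects carry a uri attribute; literals carry text.
                triples.append((s, p, obj.get("uri") or obj.text))
    return triples

SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<resultset>
  <prefix entity="rdfs" uri="http://www.w3.org/2000/01/rdf-schema#"/>
  <subject type="http://purl.org/ontology/peg#Theme" uri="http://purl.org/ontology/peg/framework#Economy">
    <predicate type="http://purl.org/ontology/peg#isThemeOf">
      <object uri="http://purl.org/ontology/peg/framework#WellBeing"/>
    </predicate>
    <predicate type="http://www.w3.org/2000/01/rdf-schema#label">
      <object type="rdfs:Literal">economy</object>
    </predicate>
  </subject>
</resultset>"""

for triple in structxml_to_triples(SAMPLE):
    print(triple)
```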

StructXML in JSON: application/json

[cc lang=’text’ line_numbers=’false’ escaped=’true’]

curl -H "Accept: application/json" "http://www.mypeg.ca/ws/sparql/" -d "dataset=http://www.mypeg%2Eca%2Fwsf%2Fdatasets%2F249%2F&query=SELECT+%3Fs+%3Fp+%3Fo%0D%0AWHERE%0D%0A{%0D%0A++%3Fs+a+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%23Theme>+%3B%0D%0A+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%23isThemeOf>+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%2Fframework%23WellBeing>+%3B%0D%0A+++++%3Fp+%3Fo+.%0D%0A}%0D%0A"

[/cc]

[cc lang=’javascript’ line_numbers=’false’ escaped=’true’]

{
"prefixes": [
{
"owl": "http://www.w3.org/2002/07/owl#",
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"wsf": "http://purl.org/ontology/wsf#",
"ns0": "http://purl.org/ontology/peg#",
"ns1": "http://purl.org/ontology/sco#",
"ns2": "http://purl.org/ontology/iron#",
"ns3": "http://purl.org/dc/elements/1.1/"
}
],
"resultset": {
"subject": [
{
"uri": "http://purl.org/ontology/peg/framework#Economy",
"type": "ns0:Theme",
"predicates": [
{
"ns0:isThemeOf": {
"uri": "http://purl.org/ontology/peg/framework#WellBeing"
}
},
{
"ns0:isThemeOf": {
"uri": "http://purl.org/ontology/peg/framework#Poverty"
}
},
{
"ns1:displayComponent": {
"uri": "http://purl.org/ontology/sco#sRelationBrowser"
}
},
{
"rdfs:label": "economy"
},
{
"ns2:prefLabel": "Economy"
},
{
"ns3:description": "Economy includes all that people do in our community to produce, exchange, distribute, and consume goods and services."
}
]
}
]
}
}

[/cc]
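In this JSON serialization, predicates and types are written as prefixed names (such as `ns0:isThemeOf`) whose namespaces are listed under `prefixes`, so a client will usually want to expand them back into full URIs. Here is a minimal Python sketch using the standard `json` module. It assumes the structure shown above (a `prefixes` list of objects, and `predicates` as a list of single-key objects whose values are either `{"uri": ...}` or a literal string); the function names are hypothetical.

```python
import json

def expand(name, prefixes):
    """Expand a prefixed name like 'ns0:Theme' to a full URI; leave anything else unchanged."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name

def json_to_triples(doc):
    """Flatten a structXML-in-JSON resultset into (subject, predicate, object) tuples."""
    prefixes = {}
    for block in doc.get("prefixes", []):
        prefixes.update(block)
    triples = []
    for subject in doc["resultset"]["subject"]:
        s = subject["uri"]
        triples.append((s, expand("rdf:type", prefixes), expand(subject["type"], prefixes)))
        for entry in subject.get("predicates", []):
            for pred, value in entry.items():
                # Resources are objects with a "uri" key; literals are plain strings.
                obj = value["uri"] if isinstance(value, dict) else value
                triples.append((s, expand(pred, prefixes), obj))
    return triples

SAMPLE = json.loads("""{
  "prefixes": [{"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
                "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
                "ns0": "http://purl.org/ontology/peg#"}],
  "resultset": {"subject": [{
    "uri": "http://purl.org/ontology/peg/framework#Economy",
    "type": "ns0:Theme",
    "predicates": [
      {"ns0:isThemeOf": {"uri": "http://purl.org/ontology/peg/framework#WellBeing"}},
      {"rdfs:label": "economy"}
    ]}]}
}""")

for triple in json_to_triples(SAMPLE):
    print(triple)
```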

RDF in XML: application/rdf+xml

[cc lang=’text’ line_numbers=’false’ escaped=’true’]

curl -H "Accept: application/rdf+xml" "http://www.mypeg.ca/ws/sparql/" -d "dataset=http://www.mypeg%2Eca%2Fwsf%2Fdatasets%2F249%2F&query=SELECT+%3Fs+%3Fp+%3Fo%0D%0AWHERE%0D%0A{%0D%0A++%3Fs+a+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%23Theme>+%3B%0D%0A+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%23isThemeOf>+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%2Fframework%23WellBeing>+%3B%0D%0A+++++%3Fp+%3Fo+.%0D%0A}%0D%0A"

[/cc]

[cc lang=’xml’ line_numbers=’false’ escaped=’true’]

<?xml version="1.0"?>
<rdf:RDF  xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:wsf="http://purl.org/ontology/wsf#" xmlns:ns0="http://purl.org/ontology/peg#" xmlns:ns1="http://purl.org/ontology/sco#" xmlns:ns2="http://purl.org/ontology/iron#" xmlns:ns3="http://purl.org/dc/elements/1.1/">

<ns0:Theme rdf:about="http://purl.org/ontology/peg/framework#Economy">
<ns0:isThemeOf rdf:resource="http://purl.org/ontology/peg/framework#WellBeing" />
<ns0:isThemeOf rdf:resource="http://purl.org/ontology/peg/framework#Poverty" />
<ns1:displayComponent rdf:resource="http://purl.org/ontology/sco#sRelationBrowser" />
<rdfs:label>economy</rdfs:label>
<ns2:prefLabel>Economy</ns2:prefLabel>
<ns3:description>Economy includes all that people do in our community to produce, exchange, distribute, and consume goods and services.</ns3:description>
</ns0:Theme>

</rdf:RDF>

[/cc]

RDF in N3: application/rdf+n3

[cc lang=’text’ line_numbers=’false’ escaped=’true’]

curl -H "Accept: application/rdf+n3" "http://www.mypeg.ca/ws/sparql/" -d "dataset=http://www.mypeg%2Eca%2Fwsf%2Fdatasets%2F249%2F&query=SELECT+%3Fs+%3Fp+%3Fo%0D%0AWHERE%0D%0A{%0D%0A++%3Fs+a+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%23Theme>+%3B%0D%0A+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%23isThemeOf>+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%2Fframework%23WellBeing>+%3B%0D%0A+++++%3Fp+%3Fo+.%0D%0A}%0D%0A"

[/cc]

[cc lang=’text’ line_numbers=’false’ escaped=’true’]

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix wsf: <http://purl.org/ontology/wsf#> .

<http://purl.org/ontology/peg/framework#Economy> a <http://purl.org/ontology/peg#Theme> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#Poverty> ;
<http://purl.org/ontology/sco#displayComponent> <http://purl.org/ontology/sco#sRelationBrowser> ;
<http://www.w3.org/2000/01/rdf-schema#label> """economy""" ;
<http://purl.org/ontology/iron#prefLabel> """Economy""" ;
<http://purl.org/dc/elements/1.1/description> """Economy includes all that people do in our community to produce, exchange, distribute, and consume goods and services.""" .

[/cc]

Getting Records Using CONSTRUCT

You can also use a CONSTRUCT query to return data. However, unlike with the second mode supported by the endpoint, you won’t have access to all of the formats (such as structXML in XML and in JSON). Here is such a CONSTRUCT query:

[cc lang=’text’ line_numbers=’false’ escaped=’true’]
PREFIX peg: <http://purl.org/ontology/peg#>
CONSTRUCT
{
?s peg:isThemeOf <http://purl.org/ontology/peg/framework#WellBeing> .
}
FROM <http://www.mypeg.ca/wsf/datasets/249/>
WHERE
{
?s peg:isThemeOf <http://purl.org/ontology/peg/framework#WellBeing> .
}
[/cc]

[cc lang=’text’ line_numbers=’false’ escaped=’true’]
curl -H "Accept: application/rdf+xml" "http://www.mypeg.ca/ws/sparql/" -d "query=PREFIX%20peg%3A%20%3Chttp%3A%2F%2Fpurl.org%2Fontology%2Fpeg%23%3E%0ACONSTRUCT%0A%7B%0A%20%20%3Fs%20peg%3AisThemeOf%20%3Chttp%3A%2F%2Fpurl.org%2Fontology%2Fpeg%2Fframework%23WellBeing%3E%20.%0A%7D%0AFROM%20%3Chttp%3A%2F%2Fwww.mypeg.ca%2Fwsf%2Fdatasets%2F249%2F%3E%0AWHERE%20%0A%7B%20%0A%20%20%3Fs%20peg%3AisThemeOf%20%3Chttp%3A%2F%2Fpurl.org%2Fontology%2Fpeg%2Fframework%23WellBeing%3E%20.%0A%7D"
[/cc]

[cc lang=’xml’ line_numbers=’false’ escaped=’true’]
<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdf:Description rdf:about="http://purl.org/ontology/peg/framework#EducationAndLearning"><n0pred:isThemeOf xmlns:n0pred="http://purl.org/ontology/peg#" rdf:resource="http://purl.org/ontology/peg/framework#WellBeing"/></rdf:Description>
<rdf:Description rdf:about="http://purl.org/ontology/peg/framework#BasicNeeds"><n0pred:isThemeOf xmlns:n0pred="http://purl.org/ontology/peg#" rdf:resource="http://purl.org/ontology/peg/framework#WellBeing"/></rdf:Description>
<rdf:Description rdf:about="http://purl.org/ontology/peg/framework#Health"><n0pred:isThemeOf xmlns:n0pred="http://purl.org/ontology/peg#" rdf:resource="http://purl.org/ontology/peg/framework#WellBeing"/></rdf:Description>
<rdf:Description rdf:about="http://purl.org/ontology/peg/framework#SocialVitality"><n0pred:isThemeOf xmlns:n0pred="http://purl.org/ontology/peg#" rdf:resource="http://purl.org/ontology/peg/framework#WellBeing"/></rdf:Description>
<rdf:Description rdf:about="http://purl.org/ontology/peg/framework#BuiltEnvironment"><n0pred:isThemeOf xmlns:n0pred="http://purl.org/ontology/peg#" rdf:resource="http://purl.org/ontology/peg/framework#WellBeing"/></rdf:Description>
<rdf:Description rdf:about="http://purl.org/ontology/peg/framework#NaturalEnvironment"><n0pred:isThemeOf xmlns:n0pred="http://purl.org/ontology/peg#" rdf:resource="http://purl.org/ontology/peg/framework#WellBeing"/></rdf:Description>
<rdf:Description rdf:about="http://purl.org/ontology/peg/framework#Economy"><n0pred:isThemeOf xmlns:n0pred="http://purl.org/ontology/peg#" rdf:resource="http://purl.org/ontology/peg/framework#WellBeing"/></rdf:Description>
<rdf:Description rdf:about="http://purl.org/ontology/peg/framework#Governance"><n0pred:isThemeOf xmlns:n0pred="http://purl.org/ontology/peg#" rdf:resource="http://purl.org/ontology/peg/framework#WellBeing"/></rdf:Description>
</rdf:RDF>
[/cc]

SPARQL Queries Restrictions

The structWSF SPARQL endpoint has some restrictions that have been introduced to make sure that the requesting users can only query the data to which they have access.

In structWSF, all permissions are attached to a dataset (a graph). Different users have different Create, Read, Update and Delete permissions on different datasets hosted on the same structWSF endpoint. Because of this core mechanism in structWSF, we had to make sure that the same restrictions were applied to the SPARQL endpoint. This means that some SPARQL clauses and usages are restricted.

This section covers these specific restrictions for a structWSF SPARQL endpoint.

Accessing Dataset Without Permissions

Let’s try to see what happens when someone tries to access a dataset to which he doesn’t have access. Consider this SPARQL query:

[cc lang=’text’ line_numbers=’false’ escaped=’true’]
PREFIX mypeg: <http://www.mypeg.ca/wsf/>
SELECT ?s ?p ?o FROM mypeg:
WHERE
{
?s ?p ?o .
}
[/cc]

No user has direct access to that dataset on the MyPeg instance:

[cc lang=’text’ line_numbers=’false’ escaped=’true’]
curl -H "Accept: application/rdf+n3" "http://www.mypeg.ca/ws/sparql/" -d "query=PREFIX%20mypeg%3A%20%3Chttp%3A%2F%2Fwww.mypeg.ca%2Fwsf%2F%3E%0ASELECT%20%3Fs%20%3Fp%20%3Fo%20FROM%20mypeg%3A%0AWHERE%0A%7B%0A%20%20%3Fs%20%3Fp%20%3Fo%20.%0A%7D%0A"
[/cc]

[cc lang=’xml’ line_numbers=’false’ escaped=’true’]

<error>
<id>WS-AUTH-VALIDATOR-303</id>
<webservice>/ws/auth/validator/</webservice>
<name>No access defined</name>
<description>No access defined for this requester IP , dataset and web service</description>
<debugInformation>No access defined for this requester IP (174.129.43.163), dataset (http://www.mypeg.ca/wsf/) and web service (http://www.mypeg.ca/wsf/ws/sparql/)</debugInformation>
<level>Warning</level>
</error>

[/cc]

So, even if a dataset exists in a triple store that exposes a SPARQL endpoint, not all users have access to all of these datasets. The access and permissions layer will restrict the access to them if need be.

If a FROM clause or multiple FROM NAMED clauses are specified in the SPARQL query, the access layer makes sure that the user has access to all of these datasets. If he doesn’t have access to one of them, an error is returned.
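The rule can be sketched in a few lines of Python: collect every dataset IRI named in FROM or FROM NAMED clauses (expanding prefixed names such as `mypeg:` through the query’s PREFIX declarations), and refuse the query if any of them is not among the datasets the requester may read. This is a simplified, hypothetical illustration: regular expressions stand in for a real SPARQL parser, the `dataset` parameter is not modelled, and it is not structWSF’s actual validation code.

```python
import re

# Simplified patterns: a real implementation would use a SPARQL parser.
PREFIX_RE = re.compile(r"\bPREFIX\s+([A-Za-z][\w-]*)?:\s*<([^>]*)>", re.IGNORECASE)
FROM_RE = re.compile(
    r"\bFROM\s+(?:NAMED\s+)?(?:<([^>]*)>|([A-Za-z][\w-]*)?:([\w-]*))",
    re.IGNORECASE,
)

def requested_datasets(query):
    """Return the graph IRIs named in FROM and FROM NAMED clauses, expanding prefixes."""
    prefixes = {(m.group(1) or ""): m.group(2) for m in PREFIX_RE.finditer(query)}
    datasets = set()
    for iri, prefix, local in FROM_RE.findall(query):
        if iri:
            datasets.add(iri)
        else:
            # An undeclared prefix is kept as written, so it cannot match a readable dataset.
            datasets.add(prefixes.get(prefix, prefix + ":") + local)
    return datasets

def check_access(query, readable_datasets):
    """Raise PermissionError if the query names any dataset the requester cannot read."""
    denied = requested_datasets(query) - set(readable_datasets)
    if denied:
        raise PermissionError("No access defined for dataset(s): " + ", ".join(sorted(denied)))

QUERY = """PREFIX mypeg: <http://www.mypeg.ca/wsf/>
SELECT ?s ?p ?o FROM mypeg:
WHERE
{
  ?s ?p ?o .
}"""

try:
    check_access(QUERY, {"http://www.mypeg.ca/wsf/datasets/249/"})
except PermissionError as error:
    print(error)  # No access defined for dataset(s): http://www.mypeg.ca/wsf/
```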

CONSTRUCT

The CONSTRUCT clause can be used against this SPARQL endpoint, but only if it doesn’t use any GRAPH clauses. However, we encourage users to use the method described in the section “Getting Records in Different Formats” since more formats can be requested, and more formats can easily be added in the future.

Here is an example of a CONSTRUCT query that uses a GRAPH clause:

[cc lang=’text’ line_numbers=’false’ escaped=’true’]
CONSTRUCT
{
?s ?p ?o
}
WHERE
{
graph <http://www.mypeg.ca/wsf/datasets/249/>
{
?s ?p ?o
}
}
[/cc]

[cc lang=’text’ line_numbers=’false’ escaped=’true’]
curl -H "Accept: application/rdf+xml" "http://www.mypeg.ca/ws/sparql/" -d "query=CONSTRUCT%0A%7B%0A%20%20%3Fs%20%3Fp%20%3Fo%0A%7D%0AWHERE%20%0A%7B%20%0A%20%20graph%20%3Chttp%3A%2F%2Fwww.mypeg.ca%2Fwsf%2Fdatasets%2F249%2F%3E%0A%20%20%7B%0A%20%20%20%20%3Fs%20%3Fp%20%3Fo%0A%20%20%7D%0A%7D%0A"
[/cc]

[cc lang=’xml’ line_numbers=’false’ escaped=’true’]
<?xml version="1.0" encoding="utf-8"?>
<error>
<id>WS-SPARQL-205</id>
<webservice>/ws/sparql/</webservice>
<name>GRAPH not permitted.</name>
<description>The SPARQL GRAPH clause is not permitted for this sparql endpoint. Please change your SPARQL query to specify the datasets you want to query with the FROM and FROM NAMED sparql clauses, or with the dataset parameter.</description>
<debugInformation></debugInformation>
<level>Warning</level>
</error>
[/cc]

As you can see, the endpoint returns a WS-SPARQL-205 error if a GRAPH clause is used within a CONSTRUCT statement.

GRAPH

As we saw above, no GRAPH clauses can be used in a SPARQL query. The reason is that we don’t want people to send SPARQL queries with GRAPH clauses that use variables: if we permitted that, we currently couldn’t determine which triple comes from which dataset, and so we couldn’t enforce the access permissions on that data.

However, in the future two improvements could be created to enable the usage of GRAPH clauses in SPARQL queries processed by structWSF:

  1. We could enable people to use GRAPH clauses with direct IRI_REF references. That way, structWSF could easily check the permissions for these graphs (just as it does for the FROM, FROM NAMED and DESCRIBE clauses).
  2. We could enable the full usage of the GRAPH clause. However, the endpoint would then have to modify the queries to get the graph provenance of all the triples, analyze the provenance of each triple, and return only the ones the user has access to. This would inevitably slow down the processing of SPARQL requests.
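The first improvement could look something like the hypothetical Python sketch below: refuse any GRAPH clause whose target is not a direct IRI reference (variables, and, to stay conservative, prefixed names too), and return the IRIs of the others so that they can go through the same permission check as FROM and FROM NAMED. A regular expression stands in for a real SPARQL parser, and this is not how structWSF currently behaves (today every GRAPH clause is refused).

```python
import re

# GRAPH followed either by an IRI reference <...> or by any other token (?g, $g, prefix:name).
GRAPH_RE = re.compile(r"\bGRAPH\s+(<[^>]*>|\S+)", re.IGNORECASE)

def graph_iris(query):
    """Return the IRIs used by GRAPH clauses; refuse GRAPH clauses that use anything else."""
    iris = set()
    for target in GRAPH_RE.findall(query):
        if target.startswith("<"):
            iris.add(target[1:-1])
        else:
            # Variables cannot be checked against permissions; prefixed names are
            # refused too, to keep this sketch conservative.
            raise ValueError("GRAPH clause not permitted with: " + target)
    return iris

allowed = "CONSTRUCT { ?s ?p ?o } WHERE { graph <http://www.mypeg.ca/wsf/datasets/249/> { ?s ?p ?o } }"
print(graph_iris(allowed))  # {'http://www.mypeg.ca/wsf/datasets/249/'}

try:
    graph_iris("SELECT ?s WHERE { GRAPH ?g { ?s ?p ?o } }")
except ValueError as error:
    print(error)  # GRAPH clause not permitted with: ?g
```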

In the meantime, no GRAPH clauses can be used in any SPARQL query, and people should use the FROM and FROM NAMED clauses to access the datasets they want from a particular endpoint.

SPARUL

No SPARQL/Update (SPARUL) queries can be sent via the structWSF SPARQL endpoint. All data modifications (records and/or dataset creation, updating and deleting) have to be performed by the Dataset and Record CRUD web service endpoints.

Conclusion

The structWSF SPARQL endpoint is a wrapper above a triple store’s SPARQL endpoint. It adds a permissions and access layer that is compatible with the one used by the other structWSF web services. This permission layer ensures that requesters only access the information they are allowed to see within the triple store. Also, all of these access permissions are managed by the other structWSF web service endpoints, and can also be managed via the conStruct user interface.

The structWSF SPARQL endpoint also supports more resultset formats than mainstream triple stores generally do. Adding new formats is also easier, since it reuses structWSF’s existing mechanism for converting data between formats.

Getting Data Out of MyPeg.ca using structWSF Endpoints

A few weeks ago I presented the new MyPeg.ca community indicators web portal for Winnipeg’s citizens. I explained how MyPeg.ca leverages Structured Dynamics’ semantic technologies stack (aka The Semantic Muffin). Today’s blog post explains one facet of the project: how external agents (people, services, software, etc.) can interact with the system’s indicator datasets using the structWSF web service endpoints. Since this post focuses only on data export, I suggest you read the structWSF Web Services Tutorial for a complete overview of how the endpoints architecture works.


Two Main structWSF Characteristics: Accessibility & Management

structWSF is a set of 22 web service endpoints that lets you integrate data from different sources, manage that integrated data, and publish it via different communication channels such as web pages, software applications, etc.

Obviously, the main characteristic of this framework is that everything is a web service. This means that all of the system’s functionality can be accessed from anywhere on the Internet. However, this doesn’t mean that everything is open like a snack-bar. In fact, there are two levels of accessibility: (1) access to the web service endpoint’s URL, and (2) access to the content of the datasets hosted on structWSF. Depending on the use case, some people could restrict direct access to the web service endpoint(s) by properly configuring their web server, while others could let anyone access the endpoints but restrict access to the dataset(s) hosted by structWSF. In the case of MyPeg.ca, the sponsor chose to open access to both its web service endpoints and its datasets.

Just by surfing the MyPeg.ca portal, you are already leveraging these endpoints in several ways. First, each time you generate a browse or search web page, you are telling the web server to send multiple queries to different endpoints; the web page’s content is then populated with that information and presented to you. But each time you click on an explorer node, your web browser is also sending queries to exactly the same web service endpoints. So, in one case a PHP script queries the endpoints; in the other, a Flash Semantic Component does. In other words, structWSF data can be accessed from quite different environments.

The other main characteristic of structWSF is that any kind of data can be imported into, and exported out of, the system. structWSF leverages RDF (Resource Description Framework) as the canonical data format that can be used to express any other format. It is because of this use of RDF that structWSF can act as an effective ETL (Extract, Transform, Load) system. Depending on the web service endpoint, the output formats currently supported by most of the endpoints are:

But the architecture of the web service endpoints can easily accommodate other formats if needed for a specific use case.

Getting Data Out Of MyPeg.ca

Now, how can you get data out of MyPeg.ca? There are really two methods. This blog post discusses the CRUD: Read, Browse and Search web service endpoints. In my next blog post, I will focus on using the SPARQL web service endpoint to do the same.

All of the query examples in this blog post use a tool called Curl to send the queries and get back the resultsets. I encourage you to download that tool and use it to test these endpoints and get a feel for how they work. Also note that only the first record of each resultset is shown below (of course, the actual results include all records).

Browse

The Browse web service endpoint is used to return lists of records. These records can also be filtered according to their provenance (dataset), type and the attributes that describe them. Now, let’s see how you can use this web service to get data out of MyPeg.ca.

First, there are three datasets available to the public:

  1. Well-being Indicators (http://www.mypeg.ca/wsf/datasets/258/)
  2. Stories (http://www.mypeg.ca/wsf/datasets/272/)
  3. PEG Framework (http://www.mypeg.ca/wsf/datasets/249/)

The resultsets can be serialized using one of these four different formats:

  • text/xml (structXML)
  • application/json (structXML in JSON)
  • application/rdf+xml (RDF/XML)
  • application/rdf+n3 (RDF/N3)

Note: if one of your desired formats is not directly available at the endpoint level, you can always use one of the converter web service endpoints such as: commON, irJSON or TSV/CSV.

Get the first 10 results of the Stories dataset in structXML

Query:

[cc lang=’text’ line_numbers=’false’]curl -H "Accept: text/xml" "http://www.mypeg.ca/ws/browse/" -d "attributes=all&types=all&datasets=http%3A%2F%2Fwww.mypeg.ca%2Fwsf%2Fdatasets%2F272%2F&items=10&page=0&inference=on&include_aggregates=true"[/cc]

StructXML resultset:

[cc lang=’xml’ line_numbers=’false’ escaped=’true’]

<?xml version="1.0" encoding="utf-8"?>
<resultset>
<prefix entity="owl" uri="http://www.w3.org/2002/07/owl#"/>
<prefix entity="rdf" uri="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>
<prefix entity="rdfs" uri="http://www.w3.org/2000/01/rdf-schema#"/>
<prefix entity="wsf" uri="http://purl.org/ontology/wsf#"/>
<subject type="http://purl.org/ontology/muni#Story" uri="http://www.mypeg.ca/wsf/datasets/272/resource/AgeOpportunity">
<predicate type="http://purl.org/dc/terms/isPartOf">
<object type="http://rdfs.org/ns/void#Dataset" uri="http://www.mypeg.ca/wsf/datasets/272/"/>
</predicate>
<predicate type="http://purl.org/ontology/iron#prefLabel">
<object type="rdfs:Literal">Age &amp; Opportunity</object>
</predicate>
<predicate type="http://purl.org/dc/terms/created">
<object type="rdfs:Literal">2010-10-28T19:38:58+00:00</object>
</predicate>
<predicate type="http://purl.org/ontology/bibo/abstract">
<object type="rdfs:Literal">Amanda Macrae, Deborah Lorteau and Stacey Miller work for Age and Opportunity.
The majority of clients are older adults living at lower socio economic status. When addressing the housing issue they say, “In a nutshell, it’s dire.” There is simply not enou…</object>
</predicate>
<predicate type="http://purl.org/ontology/peg#interviewee">
<object type="rdfs:Literal">Amanda Macrae, Deborah Lorteau, Stacey Miller</object>
</predicate>
<predicate type="http://purl.org/ontology/peg#interviewer">
<object type="rdfs:Literal">Molly Johnson</object>
</predicate>
<predicate type="http://purl.org/ontology/peg#storyRelatedAgencyProgram">
<object type="rdfs:Literal">Age &amp; Opportunity</object>
</predicate>
<predicate type="http://purl.org/ontology/sco#storyAnnotatedTextUri">
<object>http://www.mypeg.ca/scones/AgeOpportunity.xml</object>
</predicate>
<predicate type="http://purl.org/ontology/sco#storyTextUri">
<object type="rdfs:Literal">http://www.mypeg.ca/scones/AgeOpportunity.txt</object>
</predicate>
</subject>
</resultset>

[/cc]
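A client consuming such a structXML resultset mostly needs to walk the `subject`, `predicate` and `object` elements. The sketch below does that with Python's `xml.etree.ElementTree` on a trimmed copy of the resultset above; the function name `parse_structxml` and the returned dictionary layout are illustrative choices, not an official structWSF client API. As in the sample, resource objects carry their value in a `uri` attribute, while other objects carry it as element text.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the structXML resultset above (XML declaration omitted).
SAMPLE = """<resultset>
<prefix entity="rdfs" uri="http://www.w3.org/2000/01/rdf-schema#"/>
<subject type="http://purl.org/ontology/muni#Story"
         uri="http://www.mypeg.ca/wsf/datasets/272/resource/AgeOpportunity">
<predicate type="http://purl.org/dc/terms/isPartOf">
<object type="http://rdfs.org/ns/void#Dataset" uri="http://www.mypeg.ca/wsf/datasets/272/"/>
</predicate>
<predicate type="http://purl.org/ontology/iron#prefLabel">
<object type="rdfs:Literal">Age &amp; Opportunity</object>
</predicate>
<predicate type="http://purl.org/ontology/sco#storyAnnotatedTextUri">
<object>http://www.mypeg.ca/scones/AgeOpportunity.xml</object>
</predicate>
</subject>
</resultset>"""


def parse_structxml(text):
    """Return {subject URI: {"type": ..., "predicates": {type: [values]}}}."""
    root = ET.fromstring(text)
    records = {}
    for subject in root.findall("subject"):
        predicates = {}
        for predicate in subject.findall("predicate"):
            values = predicates.setdefault(predicate.get("type"), [])
            for obj in predicate.findall("object"):
                # Resource objects carry a uri attribute; otherwise use the text.
                values.append(obj.get("uri") or obj.text)
        records[subject.get("uri")] = {"type": subject.get("type"),
                                       "predicates": predicates}
    return records


records = parse_structxml(SAMPLE)
story = records["http://www.mypeg.ca/wsf/datasets/272/resource/AgeOpportunity"]
print(story["predicates"]["http://purl.org/ontology/iron#prefLabel"])
# ['Age & Opportunity']
```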

Get the first 10 records of type Neighborhood, from all datasets, in RDF/XML

Query:

[cc lang=’text’ line_numbers=’false’]curl -H "Accept: application/rdf+xml" "http://www.mypeg.ca/ws/browse/" -d "attributes=all&types=http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%23Neighborhood&datasets=all&items=10&page=0&inference=on&include_aggregates=true"[/cc]

RDF/XML resultset:

[cc lang=’xml’ line_numbers=’false’ escaped=’true’]

<?xml version="1.0"?>
<rdf:RDF xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:wsf="http://purl.org/ontology/wsf#" xmlns:ns0="http://purl.org/ontology/peg#" xmlns:ns1="http://purl.org/dc/terms/" xmlns:ns2="http://purl.org/ontology/iron#" xmlns:ns3="" xmlns:ns4="http://purl.org/dc/elements/1.1/" xmlns:ns5="http://purl.org/ontology/aggregate#">

<ns0:Component rdf:about="http://purl.org/ontology/peg/framework#Safety">
<ns1:isPartOf rdf:resource="http://www.mypeg.ca/wsf/datasets/249/" />
<ns2:prefLabel>Safety</ns2:prefLabel>
<ns2:altLabel>safety</ns2:altLabel>
<ns3:>safety</ns3:>
<ns4:description>Safety is the state of being “safe”, the condition of being protected against physical, social, spiritual, financial, political, emotional, occupational, psychological, educational or other types or consequences of failure, damage, error, accidents, harm or any other event which could be considered non-desirable.</ns4:description>
<rdfs:comment>Includes the idea of safety prevention</rdfs:comment>
<rdfs:seeAlso>http://en.wikipedia.org/wiki/Safety</rdfs:seeAlso>
<ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#HouseholdIncome" />
<ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#LowIncomeCutOffAfterTax" />
<ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#MarketBasketMeasure" />
<ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#ParticipationInSportsAndRecreation" />
<ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#MaternalSocialIsolation" />
<ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#PersonalSafety" />
<ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#EarlyDevelopmentInstrument" />
<ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#HighSchoolGraduationRate" />
<ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#LongTermUnemployment" />
<ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#TeenageBirths" />
<ns0:isComponentOf rdf:resource="http://purl.org/ontology/peg/framework#BasicNeeds" />
<ns0:isComponentOf rdf:resource="http://purl.org/ontology/peg/framework#Poverty" />
</ns0:Component>
</rdf:RDF>

[/cc]
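The `ns0`, `ns1`, ... prefixes in this RDF/XML output are generated by the serializer, so a client should match on full namespace URIs rather than on prefixes. The following Python sketch uses `xml.etree.ElementTree` on a trimmed excerpt of the resultset above to list the indicators attached to each `peg:Component`. The excerpt leaves out the `ns3` declaration and element, which namespace-aware XML parsers are likely to reject; the helper name `indicators` is illustrative, and only the typed-node form used in this output is handled.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PEG = "http://purl.org/ontology/peg#"

# Trimmed, well-formed excerpt of the RDF/XML resultset shown above.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ns0="http://purl.org/ontology/peg#"
         xmlns:ns2="http://purl.org/ontology/iron#">
  <ns0:Component rdf:about="http://purl.org/ontology/peg/framework#Safety">
    <ns2:prefLabel>Safety</ns2:prefLabel>
    <ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#HouseholdIncome"/>
    <ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#PersonalSafety"/>
    <ns0:isComponentOf rdf:resource="http://purl.org/ontology/peg/framework#Poverty"/>
  </ns0:Component>
</rdf:RDF>"""


def indicators(text):
    """Map each peg:Component URI to its peg:hasIndicator resources."""
    root = ET.fromstring(text)
    result = {}
    # ElementTree expands prefixes into "{namespace-uri}localname" tags,
    # so the serializer-generated "ns0" prefix does not matter here.
    for component in root.findall(f"{{{PEG}}}Component"):
        about = component.get(f"{{{RDF}}}about")
        result[about] = [
            el.get(f"{{{RDF}}}resource")
            for el in component.findall(f"{{{PEG}}}hasIndicator")
        ]
    return result


print(indicators(SAMPLE))
```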

Search

The Search web service endpoint is also used to return lists of records. These records should match a search string and can also be filtered according to their provenance (dataset), type and the attributes that describe them.

The same MIME types and datasets available for the Browse web service endpoint are also available for the Search endpoint.

Search for records with the keyword “poverty” and get the resultset in RDF/N3

Query:

[cc lang=’text’ line_numbers=’false’]curl -H "Accept: application/rdf+n3" "http://www.mypeg.ca/ws/search/" -d "query=poverty&datasets=all&items=10&page=0&inference=on&include_aggregates=true"[/cc]

RDF/N3 resultset:

[cc lang=’text’ line_numbers=’false’ escaped=’true’]

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix wsf: <http://purl.org/ontology/wsf#> .

<http://purl.org/ontology/peg/framework#Poverty> a <http://purl.org/ontology/peg#CrossCuttingIssue> ;
<http://purl.org/dc/terms/isPartOf> <http://www.mypeg.ca/wsf/datasets/249/> ;
<http://purl.org/ontology/iron#prefLabel> """Poverty""" ;
<http://purl.org/dc/elements/1.1/description> """Poverty is not having the sufficient resources, capabilities, choices, security and power necessary to enjoy an adequate standard of living.  Poverty includes much more than a lack of money.  It includes being excluded from ordinary living patterns, customs and activities.  Consequently, people living in poverty are often unable to participate fully in their communities or to reach their full potential.""" ;
<http://www.w3.org/2000/01/rdf-schema#seeAlso> """http://en.wikipedia.org/wiki/Poverty""" .

[/cc]

CRUD: Read

The Browse and Search web service endpoints are really used to find lists of records matching some provided criteria. These endpoints do not return the complete description of these records, only the information needed to build the proper list to display to users in a user interface. So, to get the complete description of a record (or of several records), you have to use the CRUD: Read web service endpoint. Also, whenever you get a reference to a record hosted on a structWSF node, CRUD: Read is the way to get its full description.
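Unlike the Browse and Search examples, the CRUD: Read queries that follow pass their parameters in the URL query string of a GET request. A minimal Python sketch of how such a URL can be assembled, assuming only the parameter names (`uri`, `dataset`, `include_reification`, `include_linksback`) that appear in the curl queries of this post; the helper name `crud_read_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

CRUD_READ_ENDPOINT = "http://www.mypeg.ca/ws/crud/read/"


def crud_read_url(record_uri, dataset_uri, reification=True, linksback=False):
    """Hypothetical helper: build a CRUD: Read GET URL for one record."""
    params = {
        "uri": record_uri,
        "dataset": dataset_uri,
        # The queries in this post use the strings "true" and "false".
        "include_reification": "true" if reification else "false",
        "include_linksback": "true" if linksback else "false",
    }
    return CRUD_READ_ENDPOINT + "?" + urlencode(params)


url = crud_read_url("http://www.mypeg.ca/wsf/datasets/272/resource/Ida",
                    "http://www.mypeg.ca/wsf/datasets/272/")
print(url)
# The record URI survives the round trip through percent-encoding.
print(parse_qs(urlsplit(url).query)["uri"])
```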

Get the full description of the Ida story in irJSON

Query:

[cc lang=’text’ line_numbers=’false’]curl -H "Accept: application/iron+json" "http://www.mypeg.ca/ws/crud/read/?uri=http%3A%2F%2Fwww.mypeg.ca%2Fwsf%2Fdatasets%2F272%2Fresource%2FIda&dataset=http%3A%2F%2Fwww.mypeg.ca%2Fwsf%2Fdatasets%2F272%2F&include_reification=true&include_linksback=false"[/cc]

irJSON resultset:

[cc lang=’javascript’ line_numbers=’false’ escaped=’true’]

{
"dataset": {
"linkage": [
{
"linkedType": "application/rdf+xml",
"attributeList": {
"created": {
"mapTo": "http://purl.org/dc/terms/created"
},
"isAbout": {
"mapTo": "http://umbel.org/umbel#isAbout"
},
"prefLabel": {
"mapTo": "http://purl.org/ontology/iron#prefLabel"
},
"interviewee": {
"mapTo": "http://purl.org/ontology/peg#interviewee"
},
"interviewer": {
"mapTo": "http://purl.org/ontology/peg#interviewer"
},
"abstract": {
"mapTo": "http://purl.org/ontology/bibo/abstract"
},
"storyVideoAudio": {
"mapTo": "http://purl.org/ontology/peg#storyVideoAudio"
},
"storyAnnotatedTextUri": {
"mapTo": "http://purl.org/ontology/sco#storyAnnotatedTextUri"
},
"storyTextUri": {
"mapTo": "http://purl.org/ontology/sco#storyTextUri"
}
},
"typeList": {
"Story": {
"mapTo": "http://purl.org/ontology/muni#Story"
}
}
}
]},
"recordList": [
{
"id": "http://www.mypeg.ca/wsf/datasets/272/resource/Ida",
"type": "Story",
"created": "2010-10-28T18:11:27+00:00",
"isAbout": [
{
"ref": "@@http://purl.org/ontology/peg/framework#EducationAndLearning"
},
{
"ref": "@@http://purl.org/ontology/peg/framework#Health"
},
{
"ref": "@@http://purl.org/ontology/peg/framework#Program"
},
{
"ref": "@@http://purl.org/ontology/peg/framework#Income"
},
{
"ref": "@@http://purl.org/ontology/peg/framework#Poverty"
}
],
"prefLabel": "Ida",
"interviewee": "Ida",
"interviewer": "Christa Rust",
"abstract": "‘Poverty is earning just enough to get by; never having money for extras.’\n\nIda is the mother of two grown children, eight years apart.  She lives in a small bachelor suite, which costs her $428 per month, or 62% of her income.  She volunteers twice a we…",
"storyVideoAudio": "http://www.youtube.com/v/0zIqtYhiHfM",
"storyAnnotatedTextUri": "http://www.mypeg.ca/scones/Ida.xml",
"storyTextUri": "http://www.mypeg.ca/scones/Ida.txt"
}
]
}

[/cc]
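In this irJSON resultset, record attributes use short names, and the `linkage` section of the `dataset` object maps each name to its full property URI through `mapTo` (types are mapped the same way through `typeList`). The Python sketch below expands a record from a trimmed copy of the resultset above into full URIs. The function names are hypothetical, and stripping the `@@` prefix from `ref` values to obtain the referenced URI is our reading of this sample, not a complete irJSON implementation.

```python
import json

# Trimmed copy of the irJSON resultset shown above.
SAMPLE = """{
  "dataset": {
    "linkage": [{
      "linkedType": "application/rdf+xml",
      "attributeList": {
        "prefLabel": {"mapTo": "http://purl.org/ontology/iron#prefLabel"},
        "isAbout": {"mapTo": "http://umbel.org/umbel#isAbout"}
      },
      "typeList": {"Story": {"mapTo": "http://purl.org/ontology/muni#Story"}}
    }]
  },
  "recordList": [{
    "id": "http://www.mypeg.ca/wsf/datasets/272/resource/Ida",
    "type": "Story",
    "prefLabel": "Ida",
    "isAbout": [
      {"ref": "@@http://purl.org/ontology/peg/framework#Health"},
      {"ref": "@@http://purl.org/ontology/peg/framework#Poverty"}
    ]
  }]
}"""


def resolve_ref(value):
    """In this sample, references look like {"ref": "@@<uri>"}; strip "@@"."""
    if isinstance(value, dict) and "ref" in value:
        ref = value["ref"]
        return ref[2:] if ref.startswith("@@") else ref
    return value


def expand(doc):
    """Rewrite each record with full property and type URIs from the linkage."""
    linkage = doc["dataset"]["linkage"][0]
    attrs = {k: v["mapTo"] for k, v in linkage["attributeList"].items()}
    types = {k: v["mapTo"] for k, v in linkage["typeList"].items()}
    expanded = []
    for record in doc["recordList"]:
        out = {"id": record["id"], "type": types[record["type"]]}
        for name, value in record.items():
            if name in ("id", "type"):
                continue
            if isinstance(value, list):
                value = [resolve_ref(v) for v in value]
            # Unmapped attribute names are kept as-is.
            out[attrs.get(name, name)] = value
        expanded.append(out)
    return expanded


records = expand(json.loads(SAMPLE))
print(records[0]["http://umbel.org/umbel#isAbout"])
```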

Get the Well-being record description with linkbacks in RDF/N3

What distinguishes this query is that the “include_linksback” parameter is enabled. This returns a reference to every record, in the datasets hosted on the structWSF node, that refers to the target record.

Query:

[cc lang=’text’ line_numbers=’false’]curl -H "Accept: application/rdf+n3" "http://www.mypeg.ca/ws/crud/read/?uri=http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%2Fframework%23WellBeing&dataset=http%3A%2F%2Fwww.mypeg.ca%2Fwsf%2Fdatasets%2F249%2F&registered_ip=self%3A%3A0&include_reification=true&include_linksback=true"[/cc]

RDF/N3 resultset:

[cc lang=’text’ line_numbers=’false’ escaped=’true’]

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://purl.org/ontology/peg/framework#WellBeing> a <http://purl.org/ontology/peg/framework#WellBeing> ;
<http://purl.org/ontology/iron#prefLabel> """Well-being""" ;
<http://purl.org/dc/elements/1.1/description> """Well-being refers to the general quality of life experienced by individuals and communities. The elements of wellbeing include: the ability to meet basic needs, the economy, health, the built environment, governance, education and learning, the natural environment, and social vitality.""" ;
<http://purl.org/ontology/sco#displayComponent> <http://purl.org/ontology/sco#sRelationBrowser> .

<http://purl.org/ontology/peg/framework#WellBeing> a <http://www.w3.org/2002/07/owl#Thing> ;
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/peg/framework#WellBeing> .

<http://purl.org/ontology/peg/framework#Economy> a <http://www.w3.org/2002/07/owl#Thing> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> .

<http://purl.org/ontology/peg/framework#Governance> a <http://www.w3.org/2002/07/owl#Thing> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> .

<http://purl.org/ontology/peg/framework#BuiltEnvironment> a <http://www.w3.org/2002/07/owl#Thing> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> .

<http://purl.org/ontology/peg/framework#NaturalEnvironment> a <http://www.w3.org/2002/07/owl#Thing> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> .

<http://purl.org/ontology/peg/framework#SocialVitality> a <http://www.w3.org/2002/07/owl#Thing> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> .

<http://purl.org/ontology/peg/framework#BasicNeeds> a <http://www.w3.org/2002/07/owl#Thing> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> .

<http://purl.org/ontology/peg/framework#EducationAndLearning> a <http://www.w3.org/2002/07/owl#Thing> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> .

<http://purl.org/ontology/peg/framework#Health> a <http://www.w3.org/2002/07/owl#Thing> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> .

[/cc]
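Conceptually, the linkback records in this resultset are the ones holding a triple whose object is the target record, such as the `peg:isThemeOf` statements above. The Python sketch below illustrates that selection over an in-memory list of (subject, predicate, object) triples taken from the resultset; the `linkbacks` function is hypothetical and says nothing about how structWSF actually computes linkbacks internally. The target's own `rdf:type` self-reference is excluded.

```python
PEG = "http://purl.org/ontology/peg#"
FW = "http://purl.org/ontology/peg/framework#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# A few triples from the RDF/N3 resultset above.
TRIPLES = [
    (FW + "WellBeing", "http://purl.org/ontology/iron#prefLabel", "Well-being"),
    (FW + "WellBeing", RDF_TYPE, FW + "WellBeing"),
    (FW + "Economy", PEG + "isThemeOf", FW + "WellBeing"),
    (FW + "Governance", PEG + "isThemeOf", FW + "WellBeing"),
    (FW + "Health", PEG + "isThemeOf", FW + "WellBeing"),
]


def linkbacks(triples, target):
    """Return (subject, predicate) pairs of other records pointing to target."""
    return [(s, p) for s, p, o in triples if o == target and s != target]


for subject, predicate in linkbacks(TRIPLES, FW + "WellBeing"):
    print(subject, predicate)
```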

General Endpoint Parameters

The general parameters available for each of these web services are documented in their respective TechWiki pages. For that detailed information, see the Browse, Search and CRUD: Read articles.

Conclusion

As you can see, agents can get different kinds of data from the MyPeg.ca portal by querying a set of web service endpoints. These data can then be indexed in other systems, used directly, or fed to dynamic applications such as the explorer that browses the nodes.

This is only one of the ways to get data out of the system. A user can also export the same information using the Export features of the Browse, Search and Record View pages, and other methods will be explained in upcoming posts in this MyPeg.ca series.

All in all, this shows how effective structWSF can be at integrating, managing and publishing a wide range of data in different formats. It also shows how completely different parts of your software architecture can leverage your information, the way you want, from anywhere on the Internet.

UMBEL Blooms with New Colors

We are happy to announce the new intermediary UMBEL version 0.80. This is a major upgrade of the UMBEL ontology: both its vocabulary and its reference structure have been greatly enhanced, an upper structure called the SuperTypes has been added, and everything has been updated to OWL 2. You can read more about the overall changes in Mike’s blog post.

In this blog post I will focus on two topics: using some existing tools and frameworks to view and manage the reference concepts structure, and how one can use and leverage the coherency of the reference structure.

Navigating and Updating the Reference Structure

One thing lacking in the previous version of UMBEL was a user interface tool that would let you navigate and update the reference structure as you wish. Because of the way the conceptual structure was created, it was hard for tools such as Protégé to load it, due to all the individuals that had been created (such as the SemSet individuals).

As stated in Mike’s blog post, we made significant changes to the UMBEL vocabulary and to how we instantiate the reference structure. Along with the OWL 2 upgrade, we made sure that Protégé 4.1 and the latest version of the OWL API could easily load both the UMBEL vocabulary and the reference structure.

Reasoning

One of the major additions to UMBEL v0.80 is the SuperTypes upper structure, an organizational layer above the UMBEL reference structure. We created these SuperTypes because we found that we could effectively cluster most UMBEL reference concepts into a small set of mostly distinct upper concepts (33 in fact, 29 of which are designed to be disjoint).

This new SuperTypes structure helps us mine external sources of information by leveraging related concepts in the reference structure. SuperTypes also make reasoning over the entire structure of about 21 000 reference concepts simpler and faster.

Thus, SuperTypes provide a new tool to help determine whether the UMBEL reference structure is consistent and coherent within itself. This is important, of course, to ensure that linkages between UMBEL and external ontologies are consistent and coherent as well.

So far, the entire reference concept structure has been tested for coherency against the restrictions we defined at the level of the SuperTypes upper structure. Using different reasoners such as Pellet, FaCT++ and HermiT (available by default with Protégé 4.1), we made sure that all the statements made between the RefConcept classes and individuals, and all the statements made between these and the SuperTypes upper structure, are consistent with one another. This method enabled us to find and fix some early assignment issues.

This new upper structure, along with its now consistent reference structure, helps provide confidence that statements based on UMBEL reference concepts are also consistent. And, all of this is made more testable by virtue of being able to use the OWL API and Protégé with its embedded reasoners.

How is Coherency Tested?

This is the core question. In fact, the more informative answer to this question will be part of a forthcoming blog post. But let’s start here.

The current way to check whether the structure is coherent is to make sure that no individual belongs to two different SuperTypes that are stated to be disjoint. What we did with the SuperTypes upper structure is really simple: we categorized each and every RefConcept (using rdfs:subClassOf) under a SuperType. Most of the SuperTypes are disjoint: this means that if an individual is typed (rdf:type), directly or through its RefConcepts, with two SuperTypes that are stated to be disjoint, you end up with an incoherent structure, because you are making a statement that the reference structure does not permit.

So, the way to check whether your statements are coherent with this structure is to make your statements (right now, in terms of individual instantiation) and then check them using a reasoner such as Pellet. There is now a general testing structure to see if any ontology is coherent with respect to the UMBEL reference structure.
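The disjointness rule described above can be sketched in a few lines of Python. This is not how Pellet or the other reasoners work (they perform full OWL 2 reasoning); it only illustrates the specific rule: walk each individual's types up the rdfs:subClassOf hierarchy to the SuperTypes, and flag the individual if two of them are declared disjoint. The class and SuperType names below are hypothetical stand-ins, not actual UMBEL identifiers.

```python
from itertools import combinations

# Hypothetical toy hierarchy: class -> list of rdfs:subClassOf parents.
SUBCLASS_OF = {
    "Dog": ["Mammal"],
    "Mammal": ["Animals"],  # "Animals" plays the role of a SuperType
    "City": ["Places"],     # "Places" plays the role of a SuperType
}
SUPERTYPES = {"Animals", "Places"}
DISJOINT = {frozenset({"Animals", "Places"})}


def supertypes_of(cls):
    """Follow rdfs:subClassOf links upward and collect the SuperTypes reached."""
    found, stack, seen = set(), [cls], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in SUPERTYPES:
            found.add(current)
        stack.extend(SUBCLASS_OF.get(current, []))
    return found


def incoherent(individuals):
    """Return individuals whose types fall under two disjoint SuperTypes."""
    bad = {}
    for name, types in individuals.items():
        supers = set().union(*(supertypes_of(t) for t in types))
        clashes = [pair for pair in combinations(sorted(supers), 2)
                   if frozenset(pair) in DISJOINT]
        if clashes:
            bad[name] = clashes
    return bad


print(incoherent({"rex": ["Dog"], "oddity": ["Dog", "City"]}))
# {'oddity': [('Animals', 'Places')]}
```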

In the next blog post in this series, I will show how to use exactly the same method for coherency testing, but this time to test whether linkages between external ontologies and the UMBEL reference structure are consistent. In that case, you make the class-to-class assertions you want, instantiate individuals of these classes, and run the reasoner. The reasoner will then tell you whether your ontology is still consistent according to the structure and the new statements you created.
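As a preview only, the linkage test follows the same pattern: assert class-to-class mappings, instantiate the external class, and look for disjointness clashes. The self-contained Python sketch below works under toy assumptions: `ext:Poodle` is a hypothetical external class, `Dog` and `City` are hypothetical RefConcepts, and `Animals` and `Places` are hypothetical disjoint SuperTypes. A real test would load the ontologies with the OWL API or Protégé and run a reasoner; this only mimics the disjointness rule.

```python
from itertools import combinations

# Hypothetical reference structure: class -> list of rdfs:subClassOf parents.
REFERENCE = {"Dog": ["Animals"], "City": ["Places"]}
SUPERTYPES = {"Animals", "Places"}
DISJOINT = {frozenset({"Animals", "Places"})}


def supertypes(cls, subclass_of):
    """Collect the SuperTypes reachable from cls through subClassOf links."""
    found, stack, seen = set(), [cls], set()
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            if current in SUPERTYPES:
                found.add(current)
            stack.extend(subclass_of.get(current, []))
    return found


def is_consistent(mappings, individual_types):
    """Add external-to-reference class mappings, then check one individual."""
    graph = dict(REFERENCE)  # copy so REFERENCE itself is never modified
    for external, parents in mappings.items():
        graph[external] = graph.get(external, []) + parents
    supers = set()
    for t in individual_types:
        supers |= supertypes(t, graph)
    return not any(frozenset(p) in DISJOINT for p in combinations(supers, 2))


# Mapping ext:Poodle under Dog alone is fine...
print(is_consistent({"ext:Poodle": ["Dog"]}, ["ext:Poodle"]))          # True
# ...but also mapping it under City makes any Poodle individual incoherent.
print(is_consistent({"ext:Poodle": ["Dog", "City"]}, ["ext:Poodle"]))  # False
```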

Next Step

In parallel with these tutorials, we are also working hard on the next version of UMBEL. As outlined in the Next Changes section of the new UMBEL website, the next step is to release UMBEL v1.0, with a set of new features, before Christmas.