Tag Archive for 'sparql'

Volkswagen’s RDF Data Management Workflow

Tribal DDB UK’s team just published a new case study with the W3C: Case Study: Contextual Search for Volkswagen and the Automotive Industry. They discuss the benefits of the semantic web technologies, techniques and concepts that help them manage their data, describe their approach, and outline their design. The case study covers the technical aspects of their new Semantic Web Platform that I wrote about a few weeks ago.

In this blog post, I want to further explain their data management workflow, and how their data get exposed to different kinds of users.

Two Classes of Users

Let’s take a look at their data ingest/management/publishing workflow:

As you can see, all their data get collected, transformed and imported into structWSF. As I explained in my previous blog post, they are using structWSF to manage all their RDF data and access all the functionalities from the different web service endpoints.

However, how the data get exposed to the users is not that clear. In fact, it depends on the class of user. A user can be many different things: a person, a piece of software, an organization, etc. However, there are two general classes of users:

  1. Public users, and
  2. Private users

Public users are users that have no direct relation with Volkswagen and that have no access to their internal network. Private users are generally internal departments or some internal software applications that have direct access to the structWSF instance.

Private Users

Private users generally have access to all structWSF web service endpoints. This means that all structWSF functionalities are accessible to them by querying the endpoints.

Two different kinds of private users are specified in the use case’s schema:

  1. Volkswagen Site Search
  2. Other / External Applications

The Volkswagen site search is a software application that uses the structWSF Search endpoint to search, filter and expose their data to their users (the people who perform searches on the Volkswagen UK website).

The other/external applications are software applications that have access to the structWSF instance. These are generally internal applications that run in the same network. One of these applications is an internal tool that exports all the RDF data from the structWSF SPARQL endpoint and imports it into Kasabi.

These are two examples of software applications that Volkswagen created around the structWSF web services to re-purpose, re-contextualize and re-publish their RDF data.

Public Users

There are currently two kinds of public users of this new Volkswagen Semantic Platform:

  • People, and
  • Software applications

Two interfaces have been made publicly available, one for each of these kinds of users:

  • A website search engine page for people, and
  • A SPARQL endpoint for software applications

When a person reaches the website’s search page, the search query gets sent to the structWSF Search web service endpoint. The result is then returned to the search engine, templated, and displayed to that person.

A SPARQL endpoint is accessible to the software applications. This endpoint is hosted by the Kasabi information marketplace. Volkswagen chose to export everything from their structWSF instance into Kasabi to outsource the maintenance of their public SPARQL endpoint.

Unlock the Power

As we saw in this blog post and in the W3C use case, all Volkswagen UK data is internally managed by structWSF; however, they are not locked into that system. They can easily communicate with external services to add new functionalities to their stack or to make business decisions such as outsourcing the management of some publicly accessible data access endpoints.

This is an important characteristic of their design:

By choosing semantic web technologies (such as structWSF), techniques and concepts (such as their Vehicles OWL Ontology and RDF), they are not locking themselves into a specific framework. They can easily communicate with external systems and applications. This means that they can quickly adapt their system to their constantly changing needs.

Conclusion

I wrote this blog post to further explain Volkswagen’s data management workflow. I wanted to make sure that people understand the role that structWSF has in this use case, and the ecosystem it operates in.

Querying the MyPeg datasets using the structWSF SPARQL endpoint

The last blog post I wrote demonstrated how one could query the MyPeg.ca portal using the full set of structWSF web service endpoints to get data out of the portal. However, I didn't cover the usage of the SPARQL endpoint since I wanted to cover it in its own blog post to explain all its characteristics.

In this blog post, I will demonstrate how one can get data out of the MyPeg.ca community indicators web portal for Winnipeg's citizens using the SPARQL endpoint. I will also cover all the specificities of this SPARQL endpoint: all its characteristics and access/permission features.

Two Modes

The first characteristic of the structWSF SPARQL endpoint is that there are two modes (use cases) it can be used for:

  1. Getting SPARQL resultsets that match some SPARQL query patterns
  2. Getting complete record descriptions in any format supported by the endpoint

The first mode is no different from any other SPARQL endpoint. Users just send SPARQL queries and retrieve the related SPARQL resultsets. These resultsets can be returned using different MIME types.

For a SELECT query, these formats can be used:

  1. application/sparql-results+xml
  2. application/sparql-results+json

For a DESCRIBE or a CONSTRUCT query, these formats can be used:

  1. text/rdf+n3 (RDF+N3)
  2. application/rdf+xml (RDF+XML)
  3. application/rdf+json (RDF+JSON)
  4. text/plain (NTRIPLES)

The second mode is quite different. The main characteristic of the structWSF SPARQL endpoint is that it can export resultsets in additional formats that are not usually supported by other endpoints. However, these formats are designed to serialize complete record descriptions, not just a few triples matching some SPARQL patterns.

For that reason, a SPARQL query sent using this second mode needs to have the three variables (1) ?s, (2) ?p and (3) ?o bound in the query; otherwise an empty resultset will be returned. For example, the following SPARQL query returns the complete record descriptions of all the records that are of type peg:Theme and that are themes of the peg:WellBeing cross-cutting issue:

SELECT ?s ?p ?o
WHERE
{
?s a <http://purl.org/ontology/peg#Theme> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> ;
?p ?o .
}

This mode is used to return a set of record descriptions that match a SPARQL pattern. Supported MIME formats for that second mode are:

  • text/xml (structXML)
  • application/json (structXML in JSON)
  • application/rdf+xml (RDF+XML)
  • application/rdf+n3 (RDF+N3)
  • application/sparql-results+xml (SPARQL resultset in XML)
  • application/sparql-results+json (SPARQL resultset in JSON)
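As a rough sketch of how a client could assemble such a request in Python, the snippet below URL-encodes the dataset and query parameters and selects one of the MIME types listed above through the Accept header. The helper name build_sparql_request is hypothetical and not part of structWSF; the endpoint and dataset URIs are the ones used in the examples of this post. The code only builds the request object and sends nothing.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint and dataset URIs as they appear in this post's examples.
ENDPOINT = "http://www.mypeg.ca/ws/sparql/"
DATASET = "http://www.mypeg.ca/wsf/datasets/249/"

QUERY = """SELECT ?s ?p ?o
WHERE
{
  ?s a <http://purl.org/ontology/peg#Theme> ;
     <http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> ;
     ?p ?o .
}"""


def build_sparql_request(query, dataset, accept):
    """Build (but do not send) a POST request for the SPARQL endpoint.

    Hypothetical helper: the parameter names "dataset" and "query" mirror
    the curl examples in this post.
    """
    body = urlencode({"dataset": dataset, "query": query}).encode("ascii")
    # Supplying a body makes urllib issue a POST, like curl's -d option.
    return Request(ENDPOINT, data=body, headers={"Accept": accept})


request = build_sparql_request(QUERY, DATASET, "application/json")
```

Passing the request to urllib.request.urlopen would actually send it; swapping the accept argument for another supported MIME type changes the serialization without touching the query.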

Getting Records in Different Formats

Now, let's take a look at what is returned for the SPARQL query above, for each of these supported MIME types, from the MyPeg.ca SPARQL endpoint.

Note that the queries below are using the Curl application (available for multiple operating systems) to send the HTTP queries to the structWSF SPARQL web service endpoint.

StructXML: text/xml

curl -H "Accept: text/xml" "http://www.mypeg.ca/ws/sparql/" -d "dataset=http://www.mypeg%2Eca%2Fwsf%2Fdatasets%2F249%2F&query= SELECT+%3Fs+%3Fp+%3Fo%0D%0AWHERE%0D%0A{%0D%0A++%3Fs+a+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%23Theme>+%3B%0D%0A+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%23isThemeOf>+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%2Fframework%23WellBeing>+%3B%0D%0A+++++%3Fp+%3Fo+.%0D%0A}%0D%0A"
<?xml version="1.0" encoding="utf-8"?>
<resultset>
<prefix entity="owl" uri="http://www.w3.org/2002/07/owl#"/>
<prefix entity="rdf" uri="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>
<prefix entity="rdfs" uri="http://www.w3.org/2000/01/rdf-schema#"/>
<prefix entity="wsf" uri="http://purl.org/ontology/wsf#"/>
<subject type="http://purl.org/ontology/peg#Theme" uri="http://purl.org/ontology/peg/framework#Economy">
<predicate type="http://purl.org/ontology/peg#isThemeOf">
<object uri="http://purl.org/ontology/peg/framework#WellBeing"/>
</predicate>
<predicate type="http://purl.org/ontology/peg#isThemeOf">
<object uri="http://purl.org/ontology/peg/framework#Poverty"/>
</predicate>
<predicate type="http://purl.org/ontology/sco#displayComponent">
<object uri="http://purl.org/ontology/sco#sRelationBrowser"/>
</predicate>
<predicate type="http://www.w3.org/2000/01/rdf-schema#label">
<object type="rdfs:Literal">economy</object>
</predicate>
<predicate type="http://purl.org/ontology/iron#prefLabel">
<object type="rdfs:Literal">Economy</object>
</predicate>
<predicate type="http://purl.org/dc/elements/1.1/description">
<object type="rdfs:Literal">Economy includes all that people do in our community to produce, exchange, distribute, and consume goods and services.</object>
</predicate>
</subject>
</resultset>
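
A client that receives this structXML document might flatten it into a dictionary keyed by record URI. The following is a minimal sketch that assumes the resultset/subject/predicate/object layout shown above; parse_structxml is a hypothetical helper name, and the sample is a shortened copy of the response without its XML declaration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the structXML response shown above.
SAMPLE = """<resultset>
<subject type="http://purl.org/ontology/peg#Theme" uri="http://purl.org/ontology/peg/framework#Economy">
<predicate type="http://purl.org/ontology/peg#isThemeOf">
<object uri="http://purl.org/ontology/peg/framework#WellBeing"/>
</predicate>
<predicate type="http://purl.org/ontology/peg#isThemeOf">
<object uri="http://purl.org/ontology/peg/framework#Poverty"/>
</predicate>
<predicate type="http://www.w3.org/2000/01/rdf-schema#label">
<object type="rdfs:Literal">economy</object>
</predicate>
</subject>
</resultset>"""


def parse_structxml(xml_text):
    """Map each subject URI to its type and a {predicate: [values]} dict."""
    records = {}
    root = ET.fromstring(xml_text)
    for subject in root.findall("subject"):
        properties = {}
        for predicate in subject.findall("predicate"):
            for obj in predicate.findall("object"):
                # Resource objects carry a uri attribute; literals carry text.
                value = obj.get("uri") or (obj.text or "")
                properties.setdefault(predicate.get("type"), []).append(value)
        records[subject.get("uri")] = {
            "type": subject.get("type"),
            "properties": properties,
        }
    return records


records = parse_structxml(SAMPLE)
```

Values are kept in lists because a predicate such as peg:isThemeOf can appear several times for the same record.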

StructXML in JSON: application/json

curl -H "Accept: application/json" "http://www.mypeg.ca/ws/sparql/" -d "dataset=http://www.mypeg%2Eca%2Fwsf%2Fdatasets%2F249%2F&query= SELECT+%3Fs+%3Fp+%3Fo%0D%0AWHERE%0D%0A{%0D%0A++%3Fs+a+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%23Theme>+%3B%0D%0A+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%23isThemeOf>+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%2Fframework%23WellBeing>+%3B%0D%0A+++++%3Fp+%3Fo+.%0D%0A}%0D%0A"
{
"prefixes": [
{
"owl": "http://www.w3.org/2002/07/owl#",
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"wsf": "http://purl.org/ontology/wsf#",
"ns0": "http://purl.org/ontology/peg#",
"ns1": "http://purl.org/ontology/sco#",
"ns2": "http://purl.org/ontology/iron#",
"ns3": "http://purl.org/dc/elements/1.1/"
}
],
"resultset": {
"subject": [
{
"uri": "http://purl.org/ontology/peg/framework#Economy",
"type": "ns0:Theme",
"predicates": [
{
"ns0:isThemeOf": {
"uri": "http://purl.org/ontology/peg/framework#WellBeing"
}
},
{
"ns0:isThemeOf": {
"uri": "http://purl.org/ontology/peg/framework#Poverty"
}
},
{
"ns1:displayComponent": {
"uri": "http://purl.org/ontology/sco#sRelationBrowser"
}
},
{
"rdfs:label": "economy"
},
{
"ns2:prefLabel": "Economy"
},
{
"ns3:description": "Economy includes all that people do in our community to produce, exchange, distribute, and consume goods and services."
}
]
}
]
}
}

RDF in XML: application/rdf+xml

curl -H "Accept: application/rdf+xml" "http://www.mypeg.ca/ws/sparql/" -d "dataset=http://www.mypeg%2Eca%2Fwsf%2Fdatasets%2F249%2F&query= SELECT+%3Fs+%3Fp+%3Fo%0D%0AWHERE%0D%0A{%0D%0A++%3Fs+a+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%23Theme>+%3B%0D%0A+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%23isThemeOf>+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%2Fframework%23WellBeing>+%3B%0D%0A+++++%3Fp+%3Fo+.%0D%0A}%0D%0A"
<?xml version="1.0"?>
<rdf:RDF  xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:wsf="http://purl.org/ontology/wsf#" xmlns:ns0="http://purl.org/ontology/peg#" xmlns:ns1="http://purl.org/ontology/sco#" xmlns:ns2="http://purl.org/ontology/iron#" xmlns:ns3="http://purl.org/dc/elements/1.1/">

<ns0:Theme rdf:about="http://purl.org/ontology/peg/framework#Economy">
<ns0:isThemeOf rdf:resource="http://purl.org/ontology/peg/framework#WellBeing" />
<ns0:isThemeOf rdf:resource="http://purl.org/ontology/peg/framework#Poverty" />
<ns1:displayComponent rdf:resource="http://purl.org/ontology/sco#sRelationBrowser" />
<rdfs:label>economy</rdfs:label>
<ns2:prefLabel>Economy</ns2:prefLabel>
<ns3:description>Economy includes all that people do in our community to produce, exchange, distribute, and consume goods and services.</ns3:description>
</ns0:Theme>

</rdf:RDF>

RDF in N3: application/rdf+n3

curl -H "Accept: application/rdf+n3" "http://www.mypeg.ca/ws/sparql/" -d "dataset=http://www.mypeg%2Eca%2Fwsf%2Fdatasets%2F249%2F&query= SELECT+%3Fs+%3Fp+%3Fo%0D%0AWHERE%0D%0A{%0D%0A++%3Fs+a+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%23Theme>+%3B%0D%0A+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%23isThemeOf>+<http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%2Fframework%23WellBeing>+%3B%0D%0A+++++%3Fp+%3Fo+.%0D%0A}%0D%0A"
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix wsf: <http://purl.org/ontology/wsf#> .

<http://purl.org/ontology/peg/framework#Economy> a <http://purl.org/ontology/peg#Theme> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#Poverty> ;
<http://purl.org/ontology/sco#displayComponent> <http://purl.org/ontology/sco#sRelationBrowser> ;
<http://www.w3.org/2000/01/rdf-schema#label> """economy""" ;
<http://purl.org/ontology/iron#prefLabel> """Economy""" ;
<http://purl.org/dc/elements/1.1/description> """Economy includes all that people do in our community to produce, exchange, distribute, and consume goods and services.""" .

Getting Records Using CONSTRUCT

You always have the possibility to use a CONSTRUCT query to return data in different formats. Unlike with the second mode supported by the endpoint, you won't have access to the additional formats (such as structXML, both in XML and JSON). Here is such a CONSTRUCT query:

PREFIX peg: <http://purl.org/ontology/peg#>
CONSTRUCT
{
?s peg:isThemeOf <http://purl.org/ontology/peg/framework#WellBeing> .
}
FROM <http://www.mypeg.ca/wsf/datasets/249/>
WHERE
{
?s peg:isThemeOf <http://purl.org/ontology/peg/framework#WellBeing> .
}
curl -H "Accept: application/rdf+xml" "http://www.mypeg.ca/ws/sparql/" -d "query=PREFIX%20peg%3A%20%3Chttp%3A%2F%2Fpurl.org%2Fontology%2Fpeg%23%3E%0ACONSTRUCT%0A%7B%0A%20%20%3Fs%20peg%3AisThemeOf%20%3Chttp%3A%2F%2Fpurl.org%2Fontology%2Fpeg%2Fframework%23WellBeing%3E%20.%0A%7D%0AFROM%20%3Chttp%3A%2F%2Fwww.mypeg.ca%2Fwsf%2Fdatasets%2F249%2F%3E%0AWHERE%20%0A%7B%20%0A%20%20%3Fs%20peg%3AisThemeOf%20%3Chttp%3A%2F%2Fpurl.org%2Fontology%2Fpeg%2Fframework%23WellBeing%3E%20.%0A%7D"
<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdf:Description rdf:about="http://purl.org/ontology/peg/framework#EducationAndLearning"><n0pred:isThemeOf xmlns:n0pred="http://purl.org/ontology/peg#" rdf:resource="http://purl.org/ontology/peg/framework#WellBeing"/></rdf:Description>
<rdf:Description rdf:about="http://purl.org/ontology/peg/framework#BasicNeeds"><n0pred:isThemeOf xmlns:n0pred="http://purl.org/ontology/peg#" rdf:resource="http://purl.org/ontology/peg/framework#WellBeing"/></rdf:Description>
<rdf:Description rdf:about="http://purl.org/ontology/peg/framework#Health"><n0pred:isThemeOf xmlns:n0pred="http://purl.org/ontology/peg#" rdf:resource="http://purl.org/ontology/peg/framework#WellBeing"/></rdf:Description>
<rdf:Description rdf:about="http://purl.org/ontology/peg/framework#SocialVitality"><n0pred:isThemeOf xmlns:n0pred="http://purl.org/ontology/peg#" rdf:resource="http://purl.org/ontology/peg/framework#WellBeing"/></rdf:Description>
<rdf:Description rdf:about="http://purl.org/ontology/peg/framework#BuiltEnvironment"><n0pred:isThemeOf xmlns:n0pred="http://purl.org/ontology/peg#" rdf:resource="http://purl.org/ontology/peg/framework#WellBeing"/></rdf:Description>
<rdf:Description rdf:about="http://purl.org/ontology/peg/framework#NaturalEnvironment"><n0pred:isThemeOf xmlns:n0pred="http://purl.org/ontology/peg#" rdf:resource="http://purl.org/ontology/peg/framework#WellBeing"/></rdf:Description>
<rdf:Description rdf:about="http://purl.org/ontology/peg/framework#Economy"><n0pred:isThemeOf xmlns:n0pred="http://purl.org/ontology/peg#" rdf:resource="http://purl.org/ontology/peg/framework#WellBeing"/></rdf:Description>
<rdf:Description rdf:about="http://purl.org/ontology/peg/framework#Governance"><n0pred:isThemeOf xmlns:n0pred="http://purl.org/ontology/peg#" rdf:resource="http://purl.org/ontology/peg/framework#WellBeing"/></rdf:Description>
</rdf:RDF>

SPARQL Query Restrictions

The structWSF SPARQL endpoint has some restrictions that have been introduced to make sure that the requesting users can only query the data to which they have access.

In structWSF, all permissions are attached to a dataset (a graph). Different users have different Create, Read, Update and Delete permissions on different datasets hosted on the same structWSF endpoint. Because of this core mechanism in structWSF, we had to make sure that these same restrictions were applied for the SPARQL endpoint. This means that different SPARQL clauses and usages are restricted.

This section covers these specific restrictions for a structWSF SPARQL endpoint.

Accessing Dataset Without Permissions

Let's see what happens when someone tries to query a dataset to which they don't have access. Consider this SPARQL query:

PREFIX mypeg: <http://www.mypeg.ca/wsf/>
SELECT ?s ?p ?o FROM mypeg:
WHERE
{
?s ?p ?o .
}

Obviously, no user has direct access to that dataset on the MyPeg instance:

curl -H "Accept: application/rdf+n3" "http://www.mypeg.ca/ws/sparql/" -d "query= PREFIX%20mypeg%3A%20%3Chttp%3A%2F%2Fwww.mypeg.ca%2Fwsf%2F%3E%0ASELECT%20%3Fs%20%3Fp%20%3Fo%20FROM%20mypeg%3A%0AWHERE%0A%7B%0A%20%20%3Fs%20%3Fp%20%3Fo%20.%0A%7D%0A"
<error>
<id>WS-AUTH-VALIDATOR-303</id>
<webservice>/ws/auth/validator/</webservice>
<name>No access defined</name>
<description>No access defined for this requester IP , dataset and web service</description>
<debugInformation>No access defined for this requester IP (174.129.43.163), dataset (http://www.mypeg.ca/wsf/) and web service (http://www.mypeg.ca/wsf/ws/sparql/)</debugInformation>
<level>Warning</level>
</error>

So, even if a dataset exists in a triple store that exposes a SPARQL endpoint, not all users have access to all of these datasets. The access and permissions layer will restrict the access to them if need be.
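
Client code may want to detect this kind of error envelope before trying to read a resultset. Here is a small sketch that assumes errors always use the error element layout shown above; parse_error is a hypothetical helper, not part of structWSF, and the sample is a shortened copy of the response.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the error document returned above.
ERROR_XML = """<error>
<id>WS-AUTH-VALIDATOR-303</id>
<webservice>/ws/auth/validator/</webservice>
<name>No access defined</name>
<description>No access defined for this requester IP , dataset and web service</description>
<level>Warning</level>
</error>"""


def parse_error(xml_text):
    """Return the error fields as a dict, or None if this is not an error."""
    root = ET.fromstring(xml_text)
    if root.tag != "error":
        return None
    # Each child element (id, webservice, name, ...) becomes one dict entry.
    return {child.tag: (child.text or "") for child in root}


error = parse_error(ERROR_XML)
```

The id field (here WS-AUTH-VALIDATOR-303) is probably the most convenient value to branch on, since it identifies both the web service and the error.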

If a FROM clause, or multiple FROM NAMED clauses, are specified in the SPARQL query, the access layer will make sure that the user has access to all these datasets. If the user doesn't have access to one of them, an error will be returned.
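
To illustrate the idea only (this is not how structWSF implements it), the sketch below pulls the dataset IRIs out of the FROM and FROM NAMED clauses with a regular expression and reports the first one missing from a set of readable datasets. All names are hypothetical. The regular expression only understands full <...> IRIs, so prefixed names such as mypeg: are not resolved, and it does not skip comments or string literals.

```python
import re

# Matches FROM <iri> and FROM NAMED <iri>; prefixed names are not handled.
FROM_PATTERN = re.compile(r"\bFROM\s+(?:NAMED\s+)?<([^>]*)>", re.IGNORECASE)


def datasets_in_query(query):
    """Return the dataset IRIs named in FROM / FROM NAMED clauses, in order."""
    return FROM_PATTERN.findall(query)


def first_denied_dataset(query, readable_datasets):
    """Return the first dataset the requester cannot read, or None."""
    for dataset in datasets_in_query(query):
        if dataset not in readable_datasets:
            return dataset
    return None


example_query = (
    "SELECT ?s ?p ?o "
    "FROM <http://www.mypeg.ca/wsf/datasets/249/> "
    "FROM NAMED <http://www.mypeg.ca/wsf/> "
    "WHERE { ?s ?p ?o . }"
)
denied = first_denied_dataset(
    example_query, {"http://www.mypeg.ca/wsf/datasets/249/"}
)
```

With the readable set above, denied is http://www.mypeg.ca/wsf/, which mirrors the rule that a single inaccessible dataset is enough to reject the whole query.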

CONSTRUCT

The CONSTRUCT clause can be used against this SPARQL endpoint, but only if it doesn't use any GRAPH clauses. However, we encourage users to use the method described in the section "Getting Records in Different Formats" since more formats can be requested, and more formats can easily be added in the future.

Here is an example of a CONSTRUCT query that uses a GRAPH clause:

CONSTRUCT
{
?s ?p ?o
}
WHERE
{
graph <http://www.mypeg.ca/wsf/datasets/249/>
{
?s ?p ?o
}
}
curl -H "Accept: application/rdf+xml" "http://www.mypeg.ca/ws/sparql/" -d "query=CONSTRUCT%0A%7B%0A%20%20%3Fs%20%3Fp%20%3Fo%0A%7D%0AWHERE%20%0A%7B%20%0A%20%20graph%20%3Chttp%3A%2F%2Fwww.mypeg.ca%2Fwsf%2Fdatasets%2F249%2F%3E%0A%20%20%7B%0A%20%20%20%20%3Fs%20%3Fp%20%3Fo%0A%20%20%7D%0A%7D%0A"
<?xml version="1.0" encoding="utf-8"?>
<error>
<id>WS-SPARQL-205</id>
<webservice>/ws/sparql/</webservice>
<name>GRAPH not permitted.</name>
<description>The SPARQL GRAPH clause is not permitted for this sparql endpoint. Please change your SPARQL query to specify the datasets you want to query with the FROM and FROM NAMED sparql clauses, or with the dataset parameter.</description>
<debugInformation></debugInformation>
<level>Warning</level>
</error>

As you can see, the endpoint will return a WS-SPARQL-205 error if a GRAPH clause is used within a CONSTRUCT statement.

GRAPH

As we saw above, no GRAPH clauses can be used in a SPARQL query. The reason is that we don't want people to send SPARQL queries with GRAPH clauses that use variables. If we permitted GRAPH clauses with variables, we currently couldn't determine which triple comes from which dataset, and so we couldn't enforce the access permissions on that data.

However, in the future two improvements could be created to enable the usage of GRAPH clauses in SPARQL queries processed by structWSF:

  1. We could enable people to use GRAPH clauses that use direct IRI_REF references. That way, structWSF could easily check the permissions for these graphs (just like it handles the FROM, FROM NAMED and DESCRIBE clauses).
  2. We could enable the full usage of the GRAPH clause. However, we would have to modify the queries at the level of the endpoint to get the graph provenance of all the triples. Then the endpoint would have to analyze the provenance of each triple and only return the ones that the user has access to. This would inevitably slow down the query time to process the SPARQL request.

In the meantime, no GRAPH clauses can be used in any SPARQL query, and people should use the FROM and FROM NAMED clauses to get access to all the datasets they want from a particular endpoint.
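
A client could check for the GRAPH keyword before sending a query and so avoid a round trip that ends in a WS-SPARQL-205 error. The sketch below is a heuristic and not the endpoint's own check: it blanks out IRIs and quoted strings, then looks for a standalone GRAPH keyword, so that variables like ?graph or IRIs containing the word are not flagged. The function name is hypothetical, and SPARQL comments are not handled.

```python
import re

# IRIs and quoted strings are blanked out so their contents are ignored.
IRI_OR_STRING = re.compile(r"<[^>\s]*>|\"[^\"\n]*\"|'[^'\n]*'")
# GRAPH as a standalone keyword: not part of ?graph, ex:graph or graph:x.
GRAPH_KEYWORD = re.compile(r"(?<![?$:\w-])GRAPH(?![\w:-])", re.IGNORECASE)


def uses_graph_clause(query):
    """Heuristically report whether a query contains a GRAPH clause."""
    stripped = IRI_OR_STRING.sub(" ", query)
    return GRAPH_KEYWORD.search(stripped) is not None


rejected = uses_graph_clause(
    "CONSTRUCT { ?s ?p ?o } WHERE "
    "{ graph <http://www.mypeg.ca/wsf/datasets/249/> { ?s ?p ?o } }"
)
accepted = uses_graph_clause(
    "SELECT ?s ?p ?o FROM <http://www.mypeg.ca/wsf/datasets/249/> "
    "WHERE { ?s ?p ?o }"
)
```

A query flagged this way can usually be rewritten by moving the graph IRI into a FROM clause or into the dataset parameter, as the error description suggests.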

SPARUL

No SPARQL/Update (SPARUL) queries can be sent via the structWSF SPARQL endpoint. All data modifications (records and/or dataset creation, updating and deleting) have to be performed by the Dataset and Record CRUD web service endpoints.

Conclusion

The structWSF SPARQL endpoint is a wrapper around a triple store's SPARQL endpoint. It adds a permissions and access layer that is compatible with the one used by the other structWSF web services. This permission layer ensures that requesters only access the information they are permitted to see within the triple store. All of these access permissions are managed by the other structWSF web service endpoints, and can also be managed via the conStruct user interface.

The structWSF SPARQL endpoint also supports more resultset formats than are generally supported by mainstream triple stores. Also, the addition of new formats is made easier by using structWSF's way to convert data in different formats.




This blog is a regularly updated collection of my thoughts, tips, tricks and ideas about data mining, data integration, data publishing, the semantic Web, my researches and other related software development.

