
Getting Data Out of MyPeg.ca using structWSF Endpoints

A few weeks ago I presented the new MyPeg.ca community indicators web portal for Winnipeg’s citizens and explained how MyPeg.ca leverages Structured Dynamics’ semantic technologies stack (aka The Semantic Muffin). Today’s blog post covers one facet of the project: how external agents (people, services, software, etc.) can interact with the system’s indicator datasets using the structWSF web service endpoints. Since this post focuses only on data export, I suggest you read the structWSF Web Services Tutorial for a complete overview of how the endpoints architecture works.


Two Main structWSF Characteristics: Accessibility & Management

structWSF is a set of 22 web service endpoints that lets you integrate data from different sources, manage that integrated data, and publish it via different communication channels such as web pages, software applications, etc.

The main characteristic of this framework is that everything is a web service, so all of the system’s functionality can be accessed from anywhere on the Internet. However, this doesn’t mean that everything is open like a snack-bar. There are two levels of accessibility: (1) access to the web service endpoint’s URL, and (2) access to the content of the datasets hosted on structWSF. Depending on the use case, some people restrict direct access to the web service endpoint(s) by configuring their web server accordingly; others let anyone access the endpoints but restrict access to the dataset(s) hosted by structWSF. In the case of MyPeg.ca, the sponsor chose to open access to both their web service endpoints and their datasets.

Just by surfing the MyPeg.ca portal, you are already leveraging these endpoints in several ways. Each time you generate a browse or search web page, you are telling the web server to send multiple queries to different endpoints; the page’s content is then populated with that information and presented to you. And each time you click on an explorer node, your web browser sends queries to exactly the same web service endpoints. So in one case a PHP script queries the endpoints, and in the other a Flash Semantic Component does. In other words, all structWSF data can be accessed from quite different environments.

The other main characteristic of structWSF is that any kind of data can be imported into, and exported out of, the system. structWSF uses RDF (Resource Description Framework) as its canonical data format, one in which any other format can be expressed. It is because of RDF that structWSF can act as an effective ETL (extract, transform, load) system. The output formats currently supported depend on the web service endpoint, and the architecture of the endpoints can easily accommodate other formats if a specific use case needs them.

Getting Data Out Of MyPeg.ca

Now, how can you get data out of MyPeg.ca? There are really two methods. This blog post discusses the first: the CRUD: Read, Browse and Search web service endpoints. In my next blog post, I will focus on using the SPARQL web service endpoint to do the same.

All of the query examples in this blog post use a tool called cURL to send the queries and get back the resultsets. I encourage you to download it and test these endpoints yourself to get a feel for how they work. Note that only the first record of each resultset is shown below (the actual results include all records).

Browse

The Browse web service endpoint is used to return lists of records. These records can also be filtered according to their provenance (dataset), type and the attributes that describe them. Now, let’s see how you can use this web service to get data out of MyPeg.ca.

First, there are three datasets available to the public:

  1. Well-being Indicators (http://www.mypeg.ca/wsf/datasets/258/)
  2. Stories (http://www.mypeg.ca/wsf/datasets/272/)
  3. PEG Framework (http://www.mypeg.ca/wsf/datasets/249/)

The resultsets can be serialized using one of these four different formats:

  • text/xml (structXML)
  • application/json (structXML in JSON)
  • application/rdf+xml (RDF/XML)
  • application/rdf+n3 (RDF/N3)

Note: if one of your desired formats is not directly available at the endpoint level, you can always use one of the converter web service endpoints such as: commON, irJSON or TSV/CSV.
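If you would rather script these queries than type them by hand, the request can be assembled in a few lines of Python. This is only a minimal sketch: the function name `build_browse_request` is hypothetical, the parameter names and values are simply the ones used in the Browse queries of this post, and I assume the endpoint accepts a form-encoded POST body, which is what curl sends with `-d`. The request is built but deliberately not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BROWSE_ENDPOINT = "http://www.mypeg.ca/ws/browse/"
STORIES_DATASET = "http://www.mypeg.ca/wsf/datasets/272/"


def build_browse_request(dataset, mime="text/xml", items=10, page=0):
    """Prepare (but do not send) a POST request for the Browse endpoint.

    Hypothetical helper: it only mirrors the parameters used in this post.
    """
    params = {
        "attributes": "all",
        "types": "all",
        "datasets": dataset,
        "items": items,
        "page": page,
        "inference": "on",
        "include_aggregates": "true",
    }
    # urlencode percent-encodes ':' and '/' in the dataset URI.
    body = urlencode(params).encode("ascii")
    # The Accept header selects the serialization of the resultset.
    return Request(BROWSE_ENDPOINT, data=body, headers={"Accept": mime})


request = build_browse_request(STORIES_DATASET)
print(request.get_method())   # POST, because a body is attached
print(request.data.decode())  # the form-encoded parameters
# To actually run the query: urllib.request.urlopen(request).read()
```

Changing the `mime` argument to one of the other listed MIME types is all it should take to ask for a different serialization.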

Get the first 10 results of the Stories dataset in structXML

Query:

curl -H "Accept: text/xml" "http://www.mypeg.ca/ws/browse/" -d "attributes=all&types=all&datasets=http%3A%2F%2Fwww.mypeg.ca%2Fwsf%2Fdatasets%2F272%2F&items=10&page=0&inference=on&include_aggregates=true"

StructXML resultset:

<?xml version="1.0" encoding="utf-8"?>
<resultset>
<prefix entity="owl" uri="http://www.w3.org/2002/07/owl#"/>
<prefix entity="rdf" uri="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>
<prefix entity="rdfs" uri="http://www.w3.org/2000/01/rdf-schema#"/>
<prefix entity="wsf" uri="http://purl.org/ontology/wsf#"/>
<subject type="http://purl.org/ontology/muni#Story" uri="http://www.mypeg.ca/wsf/datasets/272/resource/AgeOpportunity">
<predicate type="http://purl.org/dc/terms/isPartOf">
<object type="http://rdfs.org/ns/void#Dataset" uri="http://www.mypeg.ca/wsf/datasets/272/"/>
</predicate>
<predicate type="http://purl.org/ontology/iron#prefLabel">
<object type="rdfs:Literal">Age &amp; Opportunity</object>
</predicate>
<predicate type="http://purl.org/dc/terms/created">
<object type="rdfs:Literal">2010-10-28T19:38:58+00:00</object>
</predicate>
<predicate type="http://purl.org/ontology/bibo/abstract">
<object type="rdfs:Literal">Amanda Macrae, Deborah Lorteau and Stacey Miller work for Age and Opportunity.
The majority of clients are older adults living at lower socio economic status. When addressing the housing issue they say, "In a nutshell, it's dire." There is simply not enou...</object>
</predicate>
<predicate type="http://purl.org/ontology/peg#interviewee">
<object type="rdfs:Literal">Amanda Macrae, Deborah Lorteau, Stacey Miller</object>
</predicate>
<predicate type="http://purl.org/ontology/peg#interviewer">
<object type="rdfs:Literal">Molly Johnson</object>
</predicate>
<predicate type="http://purl.org/ontology/peg#storyRelatedAgencyProgram">
<object type="rdfs:Literal">Age &amp; Opportunity</object>
</predicate>
<predicate type="http://purl.org/ontology/sco#storyAnnotatedTextUri">
<object>http://www.mypeg.ca/scones/AgeOpportunity.xml</object>
</predicate>
<predicate type="http://purl.org/ontology/sco#storyTextUri">
<object type="rdfs:Literal">http://www.mypeg.ca/scones/AgeOpportunity.txt</object>
</predicate>
</subject>
</resultset>
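Once you have a structXML resultset like the one above, turning it into a simple list of records is straightforward. The sketch below uses Python's standard `xml.etree.ElementTree` module; the `list_records` helper and the trimmed `SAMPLE` document are my own illustration, and the only structure assumed is the subject / predicate / object nesting visible in the resultset above.

```python
import xml.etree.ElementTree as ET

PREF_LABEL = "http://purl.org/ontology/iron#prefLabel"

# Trimmed copy of the resultset shown above, kept small for the example.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<resultset>
<subject type="http://purl.org/ontology/muni#Story" uri="http://www.mypeg.ca/wsf/datasets/272/resource/AgeOpportunity">
<predicate type="http://purl.org/ontology/iron#prefLabel">
<object type="rdfs:Literal">Age &amp; Opportunity</object>
</predicate>
<predicate type="http://purl.org/dc/terms/created">
<object type="rdfs:Literal">2010-10-28T19:38:58+00:00</object>
</predicate>
</subject>
</resultset>"""


def list_records(structxml):
    """Return (uri, type, preferred label) for each <subject> element."""
    if isinstance(structxml, str):
        structxml = structxml.encode("utf-8")
    root = ET.fromstring(structxml)
    records = []
    for subject in root.iter("subject"):
        label = None
        for predicate in subject.findall("predicate"):
            if predicate.get("type") == PREF_LABEL:
                obj = predicate.find("object")
                label = obj.text if obj is not None else None
                break
        records.append((subject.get("uri"), subject.get("type"), label))
    return records


for uri, rdf_type, label in list_records(SAMPLE):
    print(label, "|", rdf_type, "|", uri)
```

This is the kind of minimal information a user interface needs to display a list of results; the full description of each record comes from another endpoint.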

Get the first 10 records of type Neighborhood from all datasets in RDF/XML

Query:

curl -H "Accept: application/rdf+xml" "http://www.mypeg.ca/ws/browse/" -d "attributes=all&types=http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%23Neighborhood&datasets=all&items=10&page=0&inference=on&include_aggregates=true"

RDF/XML resultset:

<?xml version="1.0"?>
<rdf:RDF  xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:wsf="http://purl.org/ontology/wsf#" xmlns:ns0="http://purl.org/ontology/peg#" xmlns:ns1="http://purl.org/dc/terms/" xmlns:ns2="http://purl.org/ontology/iron#" xmlns:ns3="" xmlns:ns4="http://purl.org/dc/elements/1.1/" xmlns:ns5="http://purl.org/ontology/aggregate#">

<ns0:Component rdf:about="http://purl.org/ontology/peg/framework#Safety">
<ns1:isPartOf rdf:resource="http://www.mypeg.ca/wsf/datasets/249/" />
<ns2:prefLabel>Safety</ns2:prefLabel>
<ns2:altLabel>safety</ns2:altLabel>
<ns3:>safety</ns3:>
<ns4:description>Safety is the state of being "safe", the condition of being protected against physical, social, spiritual, financial, political, emotional, occupational, psychological, educational or other types or consequences of failure, damage, error, accidents, harm or any other event which could be considered non-desirable.</ns4:description>
<rdfs:comment>Includes the idea of safety prevention</rdfs:comment>
<rdfs:seeAlso>http://en.wikipedia.org/wiki/Safety</rdfs:seeAlso>
<ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#HouseholdIncome" />
<ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#LowIncomeCutOffAfterTax" />
<ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#MarketBasketMeasure" />
<ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#ParticipationInSportsAndRecreation" />
<ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#MaternalSocialIsolation" />
<ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#PersonalSafety" />
<ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#EarlyDevelopmentInstrument" />
<ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#HighSchoolGraduationRate" />
<ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#LongTermUnemployment" />
<ns0:hasIndicator rdf:resource="http://purl.org/ontology/peg/framework#TeenageBirths" />
<ns0:isComponentOf rdf:resource="http://purl.org/ontology/peg/framework#BasicNeeds" />
<ns0:isComponentOf rdf:resource="http://purl.org/ontology/peg/framework#Poverty" />
</ns0:Component>
</rdf:RDF>

Search

The Search web service endpoint is also used to return lists of records. These records should match a search string and can also be filtered according to their provenance (dataset), type and the attributes that describe them.

The same mime types and datasets as the ones for the Browse web service are available for the Search endpoint.

Search for records with the keyword “poverty” and get the resultset in RDF/N3

Query:

curl -H "Accept: application/rdf+n3" "http://www.mypeg.ca/ws/search/" -d "query=poverty&datasets=all&items=10&page=0&inference=on&include_aggregates=true"

RDF/N3 resultset:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix wsf: <http://purl.org/ontology/wsf#> .

<http://purl.org/ontology/peg/framework#Poverty> a <http://purl.org/ontology/peg#CrossCuttingIssue> ;
<http://purl.org/dc/terms/isPartOf> <http://www.mypeg.ca/wsf/datasets/249/> ;
<http://purl.org/ontology/iron#prefLabel> """Poverty""" ;
<http://purl.org/dc/elements/1.1/description> """Poverty is not having the sufficient resources, capabilities, choices, security and power necessary to enjoy an adequate standard of living.  Poverty includes much more than a lack of money.  It includes being excluded from ordinary living patterns, customs and activities.  Consequently, people living in poverty are often unable to participate fully in their communities or to reach their full potential.""" ;
<http://www.w3.org/2000/01/rdf-schema#seeAlso> """http://en.wikipedia.org/wiki/Poverty""" .

CRUD: Read

The Browse and Search web service endpoints are used to find lists of records matching some criteria. They do not return the complete description of those records, only the information needed to build a proper list for display in a user interface. To get the complete description of a record (or of several records), you have to use the CRUD: Read web service endpoint. Likewise, when all you have is a reference to a record hosted on a structWSF node, CRUD: Read is the way to get its full description.
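Unlike the Browse and Search examples, the CRUD: Read queries in this post pass their parameters in the URL itself, so the record and dataset URIs have to be percent-encoded. A minimal Python sketch of that step follows; `build_read_url` is a hypothetical helper, and the parameter names are the ones that appear in the queries of this post.

```python
from urllib.parse import urlencode

READ_ENDPOINT = "http://www.mypeg.ca/ws/crud/read/"


def build_read_url(uri, dataset, include_linksback=False):
    """Build the GET URL for a CRUD: Read query (hypothetical helper)."""
    params = {
        "uri": uri,
        "dataset": dataset,
        "include_reification": "true",
        "include_linksback": "true" if include_linksback else "false",
    }
    # urlencode escapes ':', '/' and '#' so the URIs survive as parameter values.
    return READ_ENDPOINT + "?" + urlencode(params)


url = build_read_url(
    "http://www.mypeg.ca/wsf/datasets/272/resource/Ida",
    "http://www.mypeg.ca/wsf/datasets/272/",
)
print(url)
```

The escaping matters most for URIs that contain a `#`, such as the PEG framework concepts: left unencoded, everything after the `#` would be treated as a URL fragment and never reach the server.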

Get the full description of the Ida story in irJSON

Query:

curl -H "Accept: application/iron+json" "http://www.mypeg.ca/ws/crud/read/?uri=http%3A%2F%2Fwww.mypeg.ca%2Fwsf%2Fdatasets%2F272%2Fresource%2FIda&dataset=http%3A%2F%2Fwww.mypeg.ca%2Fwsf%2Fdatasets%2F272%2F&include_reification=true&include_linksback=false"

irJSON resultset:

{
"dataset": {
"linkage": [
{
"linkedType": "application/rdf+xml",
"attributeList": {
"created": {
"mapTo": "http://purl.org/dc/terms/created"
},
"isAbout": {
"mapTo": "http://umbel.org/umbel#isAbout"
},
"prefLabel": {
"mapTo": "http://purl.org/ontology/iron#prefLabel"
},
"interviewee": {
"mapTo": "http://purl.org/ontology/peg#interviewee"
},
"interviewer": {
"mapTo": "http://purl.org/ontology/peg#interviewer"
},
"abstract": {
"mapTo": "http://purl.org/ontology/bibo/abstract"
},
"storyVideoAudio": {
"mapTo": "http://purl.org/ontology/peg#storyVideoAudio"
},
"storyAnnotatedTextUri": {
"mapTo": "http://purl.org/ontology/sco#storyAnnotatedTextUri"
},
"storyTextUri": {
"mapTo": "http://purl.org/ontology/sco#storyTextUri"
}
},
"typeList": {
"Story": {
"mapTo": "http://purl.org/ontology/muni#Story"
}
}
}
]},
"recordList": [
{
"id": "http://www.mypeg.ca/wsf/datasets/272/resource/Ida",
"type": "Story",
"created": "2010-10-28T18:11:27+00:00",
"isAbout": [
{
"ref": "@@http://purl.org/ontology/peg/framework#EducationAndLearning"
},
{
"ref": "@@http://purl.org/ontology/peg/framework#Health"
},
{
"ref": "@@http://purl.org/ontology/peg/framework#Program"
},
{
"ref": "@@http://purl.org/ontology/peg/framework#Income"
},
{
"ref": "@@http://purl.org/ontology/peg/framework#Poverty"
}     ],
"prefLabel": "Ida",
"interviewee": "Ida",
"interviewer": "Christa Rust",
"abstract": "'Poverty is earning just enough to get by; never having money for extras.'\n\nIda is the mother of two grown children, eight years apart.  She lives in a small bachelor suite, which costs her $428 per month, or 62% of her income.  She volunteers twice a we...",
"storyVideoAudio": "http://www.youtube.com/v/0zIqtYhiHfM",
"storyAnnotatedTextUri": "http://www.mypeg.ca/scones/Ida.xml",
"storyTextUri": "http://www.mypeg.ca/scones/Ida.txt"
}
]
}
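In the irJSON resultset above, records use short attribute names ("prefLabel", "interviewer", ...) while the "linkage" section maps each short name to a full URI. Here is a small Python sketch of how a consumer might use that mapping to expand a record. The `expand_records` helper and the trimmed `SAMPLE` are my own illustration; I assume only the structure visible above, and reference values such as the "@@"-prefixed ones are left untouched.

```python
import json

# Trimmed copy of the irJSON resultset shown above.
SAMPLE = """
{
  "dataset": {
    "linkage": [
      {
        "linkedType": "application/rdf+xml",
        "attributeList": {
          "prefLabel": {"mapTo": "http://purl.org/ontology/iron#prefLabel"},
          "interviewer": {"mapTo": "http://purl.org/ontology/peg#interviewer"},
          "isAbout": {"mapTo": "http://umbel.org/umbel#isAbout"}
        },
        "typeList": {
          "Story": {"mapTo": "http://purl.org/ontology/muni#Story"}
        }
      }
    ]
  },
  "recordList": [
    {
      "id": "http://www.mypeg.ca/wsf/datasets/272/resource/Ida",
      "type": "Story",
      "prefLabel": "Ida",
      "interviewer": "Christa Rust",
      "isAbout": [
        {"ref": "@@http://purl.org/ontology/peg/framework#Poverty"}
      ]
    }
  ]
}
"""


def expand_records(document):
    """Replace short attribute and type names with their mapTo URIs."""
    linkage = document["dataset"]["linkage"][0]
    attributes = linkage.get("attributeList", {})
    types = linkage.get("typeList", {})
    expanded = []
    for record in document["recordList"]:
        full = {}
        for key, value in record.items():
            if key == "id":
                full["id"] = value
            elif key == "type":
                # Fall back to the short name when no mapping is declared.
                full["type"] = types.get(value, {}).get("mapTo", value)
            else:
                full[attributes.get(key, {}).get("mapTo", key)] = value
        expanded.append(full)
    return expanded


records = expand_records(json.loads(SAMPLE))
print(json.dumps(records, indent=2))
```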

Get Well-Being record description with linkbacks in RDF+N3

What distinguishes this query is that the “include_linksback” parameter is enabled. The endpoint then also returns a reference to every record, in the datasets hosted on the structWSF node, that refers to the target record.

Query:

curl -H "Accept: application/rdf+n3" "http://www.mypeg.ca/ws/crud/read/?uri=http%3A%2F%2Fpurl.org%2Fontology%2Fpeg%2Fframework%23WellBeing&dataset=http%3A%2F%2Fwww.mypeg.ca%2Fwsf%2Fdatasets%2F249%2F&registered_ip=self%3A%3A0&include_reification=true&include_linksback=true"

RDF+N3 resultset:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://purl.org/ontology/peg/framework#WellBeing> a <http://purl.org/ontology/peg/framework#WellBeing> ;
<http://purl.org/ontology/iron#prefLabel> """Well-being""" ;
<http://purl.org/dc/elements/1.1/description> """Well-being refers to the general quality of life experienced by individuals and communities. The elements of wellbeing include: the ability to meet basic needs, the economy, health, the built environment, governance, education and learning, the natural environment, and social vitality.""" ;
<http://purl.org/ontology/sco#displayComponent> <http://purl.org/ontology/sco#sRelationBrowser> .

<http://purl.org/ontology/peg/framework#WellBeing> a <http://www.w3.org/2002/07/owl#Thing> ;
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/peg/framework#WellBeing> .

<http://purl.org/ontology/peg/framework#Economy> a <http://www.w3.org/2002/07/owl#Thing> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> .

<http://purl.org/ontology/peg/framework#Governance> a <http://www.w3.org/2002/07/owl#Thing> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> .

<http://purl.org/ontology/peg/framework#BuiltEnvironment> a <http://www.w3.org/2002/07/owl#Thing> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> .

<http://purl.org/ontology/peg/framework#NaturalEnvironment> a <http://www.w3.org/2002/07/owl#Thing> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> .

<http://purl.org/ontology/peg/framework#SocialVitality> a <http://www.w3.org/2002/07/owl#Thing> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> .

<http://purl.org/ontology/peg/framework#BasicNeeds> a <http://www.w3.org/2002/07/owl#Thing> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> .

<http://purl.org/ontology/peg/framework#EducationAndLearning> a <http://www.w3.org/2002/07/owl#Thing> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> .

<http://purl.org/ontology/peg/framework#Health> a <http://www.w3.org/2002/07/owl#Thing> ;
<http://purl.org/ontology/peg#isThemeOf> <http://purl.org/ontology/peg/framework#WellBeing> .

General Endpoint Parameters

The general parameters available for each of these web services are documented in their respective TechWiki articles. For the details, see the Browse, Search, or CRUD: Read articles.

Conclusion

As you can see, agents can get different kinds of data from the MyPeg.ca portal by querying a set of web service endpoints. These data can then be indexed in other systems, used directly, or fed to dynamic applications such as the explorer, which queries the endpoints each time a node is browsed.

This is only one of the ways to get data out of the system. A user can also export the same information with the Export features on the Browse, Search and Record View pages. Other methods will be explained in the next blog posts of this MyPeg.ca series.

All in all, this shows how effective structWSF can be to integrate, manage and publish a wide range of data in different data formats. It also shows how completely different parts of your software architecture can leverage your information, the way you want, from anywhere on the Internet.
