MyPeg.ca – A Community Indicators Web Portal Using Semantic Web Technologies

Now that the MyPeg.ca project has been unveiled at the Winnipeg Poverty Reduction Partnership Forum, I can start to write about each feature of this innovative website.

MyPeg.ca is a public indicators Web portal for the Canadian city of Winnipeg. It is supported by an open-source semantic web framework called OSF. This initial beta version of the Web portal emphasizes the integration, management, exploration and display of data for a few hundred well-being indicators for the city.

This community indicators portal is currently the best example of a Citizen Dan instance (by Structured Dynamics). MyPeg.ca has been developed using the complete OSF (Open Semantic Framework) technology stack. That is why we are really proud to start writing about this new, innovative project. Mike also published an article that talks about other characteristics of the Peg project.

However, this project would not have been possible without the vision and the dedication of the IISD and the United Way of Winnipeg teams, along with their partners. It would also not have been as well designed without Tactica’s high-quality graphics and design work.

MyPeg.ca’s Technology Stack

The project fully integrates and leverages the OSF (Open Semantic Framework) technology stack and is based on the Citizen Dan community indicators principles. In the coming weeks, I will write about every aspect of the portal. For now, let’s take a first general overview of what is in the box.

The OSF stack is represented by this beautiful semantic muffin:

OSF layers

Everything starts with MyPeg’s existing assets:

  1. Their Peg Framework which is the conceptual framework they created to analyze different facets of their community by leveraging a series of hundreds of indicators.
  2. The indicators data that they aggregated and collected from different federal, provincial, municipal and local sources
  3. The interviews they are performing with tens, and eventually hundreds, of Winnipeg citizens

Then all this data has been imported into the structWSF semantic data management framework by using two other pieces of technology:

  1. The indicators data is described using the commON irON profile, and is maintained by the IISD team using a set of Excel spreadsheets. The datasets were then imported using the structImport conStruct module.
  2. The interviews have been analyzed, tagged and imported in the system by using the Scones service and its structScones conStruct user interface.

Once all the data gets imported into the structWSF instance, it becomes available to all the conStruct modules, all the Semantic Components and all other tools that communicate with the structWSF web service endpoints.

Then ontologies have been used to describe the Peg Framework and to describe all the attributes of all the records (Neighborhoods, Cities, Community Areas and Stories). Already existing ontologies such as SCO have also been used for different criteria (such as driving the usage of the Semantic Component tools).

Then the sRelationBrowser, sDashboard, sMap, sStory, sBarChart and the sLinearChart Semantic Components along with the PortableControlApplication and Workbench applications have been used by Peg to create, manage, explore and publish information from their datasets.

Finally, the entire portal is published using Drupal and the set of conStruct modules. conStruct is the user interface to the structWSF web service endpoints. The mix of Drupal and conStruct templating technologies makes it a perfect match for exposing all the data in different ways, by embedding different tools (such as the Semantic Components) depending on different criteria (user permissions, how the information is described in the system, etc.).

This is not a simple technology stack. However, the MyPeg.ca project is a good example of how an organization that had never worked with semantic web technologies was able to form a long-term vision of its objectives and to understand how semantic technologies could help it reach them. It also demonstrates how everything has been integrated into an innovative Web portal.

Next Steps…

As I said above, in the coming weeks I will write about each of these technologies. I will show how each of them has been leveraged in the MyPeg.ca portal: how such generic tools have been used for highly specific tasks within the Peg project. Here is an overview of what is coming, where each main topic will result in a new blog post:

  • How to integrate MyPeg indicators data into any Web application by using the structWSF web service endpoint
  • Querying the MyPeg datasets, the geeky way, using the SPARQL endpoint
  • Six ways to get data out of the system
    • By using the CrudRead/Search/Browse web service endpoints
    • By querying the SPARQL endpoint
    • By dereferencing record URIs
    • By using the export features on any record view pages
    • By using the export features of the search/browse modules pages
    • By using the structExport conStruct module
  • How to use the explorer (sRelationBrowser) to browse the conceptual structure and to display all kinds of related information at each step
  • Use of Scones to analyze, tag, index and display unstructured data
  • Use of ontologies to drive the system
    • How ontologies are used to describe conceptual frameworks that drive these portals
    • How ontologies are used to drive the semantic components (SCO)
  • Use of the commON irON profile and conStruct to serialize indicators data and to import it into the system
    • The benefits of commON as a common ground between the semantic web practitioner and the client.
    • commON as a wonderful format to manage indicator related datasets by indicators practitioners.
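
Two of the ways listed above to get data out of the system, dereferencing a record URI and querying the SPARQL endpoint, come down to ordinary HTTP requests. As a minimal sketch in Python, the snippet below only builds such requests and does not send them. The record URI and the endpoint address are hypothetical placeholders, not the actual MyPeg.ca addresses. The MIME types are assumptions, and the query parameter follows the standard SPARQL protocol, which may differ from what the structWSF SPARQL web service expects.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical addresses, used for illustration only.
RECORD_URI = "http://example.org/datasets/indicators/record-1"
SPARQL_ENDPOINT = "http://example.org/ws/sparql/"


def dereference_request(record_uri, mime="application/rdf+xml"):
    """Build a GET request that asks for a record in a given serialization."""
    return Request(record_uri, headers={"Accept": mime})


def sparql_request(endpoint, query, mime="application/sparql-results+json"):
    """Build a GET request in the style of the standard SPARQL protocol."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": mime})


record_req = dereference_request(RECORD_URI)
sparql_req = sparql_request(SPARQL_ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")
# Passing either object to urllib.request.urlopen() would send the request.
```

Changing the `mime` argument is all it takes to ask for another serialization, provided the server supports it.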

So stay tuned, because plenty of innovative stuff is coming!

Citizen DAN demo: The first live OSF instance

Structured Dynamics just released the Citizen DAN demo. This is the sum of nearly two years of efforts in developing different pieces of technologies such as structWSF, conStruct, irON and Semantic Components. Citizen DAN is the first OSF (Open Semantic Framework) instance.

This demo shows how we took a subset of the US Census data related to the Iowa metropolitan area, how we created a small ontology to describe its instance records, and how these records are managed, displayed, browsed and searched using the complete tool stack we created for other purposes. All the pieces have been integrated around this Citizen DAN demo that Mike gave at SemTech 2010. We are now releasing a publicly accessible instance of this demo.

I am really proud of what we have accomplished so far with the very limited resources we have been working with for the past two years. Even though we got nothing from our Knight News Challenge application, we were convinced that Citizen DAN was an important project to build and release for local communities. This is an important open source project geared to help local governments and communities create value out of the data they own and publish it in meaningful ways on the Web. That is why we used our small resources to create Citizen DAN. We managed to bootstrap ourselves even more, and we managed to get some early clients interested in investing resources in this project.

It is not just about Citizen DAN

Citizen DAN is one kind of OSF instance. However, OSF can have multiple incarnations. The framework is designed so that any kind of data can be indexed, managed and published by it. We can think of use cases in the financial, consumer and business sectors, just to name a few.

Next steps

In the near future, we will release new and updated tools and services; we will add value to the framework. We will create new online services, in other sectors, that also leverage OSF.

What about documentation?

More and more documentation will be written on the TechWiki. We are committed to one thing going forward: documenting as we go, to make sure that our clients don’t need us to maintain their instances.

Is there a supporting community?

We will also work hard to develop the community around all pieces of OSF. We already have some active members in the community. Some of them will start committing new code and tools, and writing new documentation on the TechWiki. We expect to see significant growth in the community over the next year.

Everything committed by any member of the community benefits all other members. So far, all our clients have committed the results of their work to the project, because they know that this small investment will be worth much more as the community grows: they get freebies from our other clients, and from other members committing resources to the development of any OSF piece.

The places to start with the community are the OpenStructs Community web site and the OSF Mailing List.

Conclusion

This is just the beginning.

I would encourage you to read Mike’s blog post about this new release for more background information on OSF.

Global structWSF Statistics Report

Today we released a simple structWSF node statistics report. It aggregates different statistics from all known (and accessible) structWSF nodes on the Web. It is still in its early stages, but the statistics aggregated so far are quite interesting.

This global statistics report has two aims:

  1. Monitoring the evolution of the usage of structWSF, and
  2. Monitoring the overall performance of structWSF web services in different setups for different usages

The report is accessible here at all times and is updated hourly.

Overall Statistics

The main statistics of the report are:

  • The number of structWSF nodes participating in the report
  • The total number of HTTP queries processed by the structWSF nodes
  • The total number of datasets created on the nodes
  • The total number of records indexed, and
  • The total number of triples indexed

These statistics give a general overview of the size of the “global structWSF network of nodes”.
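
To make the aggregation concrete, here is a minimal sketch in Python of how per-node counters could be summed into the overall figures listed above. It is not the code of the statistics broker script: the `NodeStats` class, its field names and the sample numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Counters reported by one node (field names are illustrative)."""
    url: str
    queries: int
    datasets: int
    records: int
    triples: int


def aggregate(nodes):
    """Sum the per-node counters into the overall report figures."""
    return {
        "nodes": len(nodes),
        "queries": sum(n.queries for n in nodes),
        "datasets": sum(n.datasets for n in nodes),
        "records": sum(n.records for n in nodes),
        "triples": sum(n.triples for n in nodes),
    }


# Made-up numbers, for illustration only.
sample = [
    NodeStats("http://node-a.example.org/", 1200, 4, 5000, 80000),
    NodeStats("http://node-b.example.org/", 300, 1, 700, 9000),
]
report = aggregate(sample)
```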

Web Service Statistics

Each Web service endpoint has its own statistics, which are:

  • The number of queries processed by the web service
  • The average time it took to process a query (without the network latency between the requester and the web service endpoint server)
  • All the requested MIME types, and the number of times each MIME type has been requested, and
  • All the HTTP response codes returned by the endpoint

These Web service specific statistics are helpful to have a general understanding of each web service endpoint.

The average time per query is helpful to know what kind of performance a developer should expect when using this web service endpoint.

The list of requested MIME types shows how the web service endpoint is used overall: are users mostly requesting XML data, JSON data, RDF+XML data, etc.? Such usage statistics are helpful for prioritizing future development tasks.

The list of all HTTP response codes is helpful for noticing possible issues with a web service endpoint. If error codes are returned often, this could point to a bug in the web service endpoint, to a usage issue that calls for a fix in the documentation, etc.
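
As a rough illustration of these per-endpoint figures, the Python sketch below derives them from a small query log. The log format, the endpoint names and the values are made up for the example. How the statistics broker script actually records queries is not described here.

```python
from collections import Counter
from statistics import mean

# Each entry: (endpoint name, processing time in seconds, requested MIME type, HTTP status).
# The log format and the values are made up for illustration.
query_log = [
    ("search", 0.12, "text/xml", 200),
    ("search", 0.20, "application/json", 200),
    ("search", 0.10, "text/xml", 400),
    ("crud/read", 0.05, "application/rdf+xml", 200),
]


def endpoint_stats(log, endpoint):
    """Compute the four statistics described above for one endpoint."""
    rows = [row for row in log if row[0] == endpoint]
    return {
        "queries": len(rows),
        "average_time": mean(r[1] for r in rows) if rows else 0.0,
        "mime_types": Counter(r[2] for r in rows),
        "status_codes": Counter(r[3] for r in rows),
    }


search_stats = endpoint_stats(query_log, "search")
```

Here the single 400 response among three search queries is the kind of signal that would prompt a look at either the endpoint or its documentation.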

Participating in the Global structWSF Statistics Report

If you are operating a structWSF instance and want to participate in the Global structWSF Statistics Report, you first have to download the new statisticsBroker.php script and install it on your structWSF node.

The statistics broker script is what calculates the statistics of a structWSF node, and what is used to aggregate statistics from all nodes, to generate the consolidated report.

The first thing to do is to edit the file and change the value of the $enableStatisticsBroadcast variable from FALSE to TRUE at line 46. This will enable the script.

Normally you should install the script in the root folder of your structWSF node, but you can install it anywhere on your server where it is accessible on the Web.

The final step is to register your node with the reporting system. It is just a matter of registering the URL where the statisticsBroker.php script is accessible. The node should be added to the global report within 24 hours, once I have validated it.

Other Usage of the Statistics Broker

It is nice to participate in such a global statistics report, but much more can be done with such a statistics broker.

A structWSF developer or a structWSF node maintainer could use it to get statistics for the local node. As described above, such statistics can be used to pinpoint possible performance issues, bottlenecks and bugs in web service endpoints. It could also be used to plan future extensions of the network, in order to scale some heavily used web service endpoints.

Additionally, the statistics broker could be used in a broader server maintenance architecture. For example, it could be used in conjunction with another script to be part of a Ganglia monitoring system. Ganglia could then monitor performance, the rate of requests per hour, or a rise in the number of specific HTTP response codes returned by some web services. Each of these statistics could also be bound to alert notifications that would warn the structWSF system maintainers and developers of possible issues with the network.
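
The alerting idea can be sketched in a few lines of Python. This is not a Ganglia plugin, and the thresholds, the shape of the statistics dictionary and the sample values are assumptions chosen for illustration. It only shows how statistics like the ones above could be turned into alert messages.

```python
from collections import Counter


def error_rate(status_codes):
    """Share of responses with a 4xx or 5xx status code."""
    total = sum(status_codes.values())
    if total == 0:
        return 0.0
    errors = sum(n for code, n in status_codes.items() if code >= 400)
    return errors / total


def check_alerts(stats_by_endpoint, max_error_rate=0.05, max_avg_time=1.0):
    """Return one alert message per threshold that is exceeded."""
    alerts = []
    for name, stats in sorted(stats_by_endpoint.items()):
        rate = error_rate(stats["status_codes"])
        if rate > max_error_rate:
            alerts.append(f"{name}: error rate {rate:.0%} is above {max_error_rate:.0%}")
        avg = stats["average_time"]
        if avg > max_avg_time:
            alerts.append(f"{name}: average time {avg:.2f}s is above {max_avg_time:.2f}s")
    return alerts


# Made-up statistics for two endpoints.
sample_stats = {
    "search": {"average_time": 0.14, "status_codes": Counter({200: 90, 500: 10})},
    "crud/read": {"average_time": 1.80, "status_codes": Counter({200: 50})},
}
alerts = check_alerts(sample_stats)
```

In a real setup the returned messages would be handed to whatever notification channel the maintainers already use.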

Next Step

The next step with the statistics broker will be to create a structWSF web service out of it. That way, structWSF node maintainers will easily be able to define access and usage permissions for such statistics.

structWSF Web Services Tutorial

One thing that was hard to do with structWSF was explaining what structWSF is, and how users can interact with it. For most people, structWSF was abstracted behind conStruct, and they didn’t know that every single conStruct feature was bound to one or more queries to one or more structWSF instances.

That is why we took the time to write a complete structWSF interaction tutorial. This tutorial explains the general structWSF architecture and describes a series of general interaction use cases. We hope that this tutorial will help developers and system implementers understand the capabilities of structWSF and how they can use it.

You can read the complete structWSF Web Services Tutorial here.

Additionally, we released new versions of structWSF, conStruct and the irJSON Parser, which are products of this tutorial.

New versions of structWSF and conStruct


We just released a new (major) version of both structWSF and conStruct. Though some months have passed since we last released this software, we finally got the time and opportunity to make these important upgrades. Many things have changed in both packages. I don’t want to enumerate all the changes in this blog post, so I suggest that you read the change log files here:

These new versions have been greatly shaped by the needs of our clients. We also started to introduce some new concepts that we wrote about over the last few months.

A really good addition to this release is a brand new Installation Manual. Hopefully people will be able to “easily” and properly install and set up a Web server to host these two packages.

All documentation files have been updated:

You can download both software packages from here:

An Amazon EC2/EBS Architecture

Some of the changes in these new versions have been made to help create, set up and maintain Web servers that host structWSF and conStruct instances.

At Structured Dynamics, we have developed and use a server architecture that leverages Amazon cloud computing services such as EC2, EBS and Elastic IP. Such an architecture gives us the flexibility to easily maintain and upgrade server instances, to instantly create new structWSF instances in one click (without performing all the installation steps every time), etc.

You can contact us for more information about these EC2 AMIs and EBS Volumes that we developed for this purpose. Here is an overview of the architecture that is now in place:

structwsf_amazon

There is a clear separation of concerns between three major things:

  • Software & libraries
  • Configuration files
  • Data files.

We chose to put all software and libraries needed to create a stand-alone structWSF instance in an EC2 AMI. This means that all needed software to run a structWSF instance is present on the Virtuoso server running Ubuntu server.

Then we chose to put all configuration and data files on an EBS volume that we attach to, and mount on, the EC2 instance. You can think of an EBS volume as a physical hard drive: it can be mounted on a server instance, but it can’t be shared between multiple instances.

By splitting the software and libraries, the configuration files and the data files, we make sure that we can easily upgrade a structWSF server in production to the latest version of structWSF (its code base and all related software such as Virtuoso, Solr, etc.). Since the configuration and data files are not on the EC2 instance, we can easily create a new EC2 instance from the latest structWSF AMI we produced, and then mount the configuration and data files EBS volume on the new (and upgraded) structWSF instance. That way, in a few clicks, we can fully upgrade a server in production without fear of disturbing the configuration or data files.
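
The upgrade procedure can be modeled with a small, self-contained Python sketch. It makes no calls to Amazon’s services. The `Volume` and `Instance` classes and the AMI names are hypothetical stand-ins that only illustrate the two properties described above: a volume is attached to at most one instance at a time, and an upgrade amounts to moving the volumes to an instance started from a newer AMI.

```python
class Volume:
    """Stands in for an EBS volume holding configuration and data files."""

    def __init__(self, name):
        self.name = name
        self.attached_to = None


class Instance:
    """Stands in for an EC2 instance started from a given AMI version."""

    def __init__(self, ami_version):
        self.ami_version = ami_version
        self.volumes = []

    def attach(self, volume):
        # A volume cannot be shared between instances.
        if volume.attached_to is not None:
            raise RuntimeError(f"{volume.name} is already attached to an instance")
        volume.attached_to = self
        self.volumes.append(volume)

    def detach(self, volume):
        self.volumes.remove(volume)
        volume.attached_to = None


def upgrade(old, new_ami_version):
    """Start an instance from the new AMI and move every volume over to it."""
    new = Instance(new_ami_version)
    for volume in list(old.volumes):
        old.detach(volume)
        new.attach(volume)
    return new


data = Volume("config-and-data")
server = Instance("structwsf-ami-1")  # hypothetical AMI name
server.attach(data)
upgraded = upgrade(server, "structwsf-ami-2")
```

The configuration and data files are never copied or modified in this model. They simply follow the volume to the upgraded instance.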

Additionally, we can easily create backups of configuration and data files at different intervals by using Amazon’s Snapshot technology.

Finally, we chose to put all related software and configuration files needed to run a conStruct instance in another, separate, EBS volume. That way, we have a clean structWSF AMI instance that can be upgraded at any time, and we can plug (mount) a conStruct instance (EBS instance) into a structWSF server at any time. This means that we can easily have structWSF instances with or without a conStruct instance. The same strategy can easily be used to create plugin packages that can be mounted and unmounted to any structWSF instance at any time, depending on the needs.

All this makes the maintenance of structWSF server instances easier, simpler and faster.