Citizen DAN demo: The first live OSF instance

Structured Dynamics has just released the Citizen DAN demo. It is the culmination of nearly two years of effort developing technologies such as structWSF, conStruct, irON and the Semantic Components. Citizen DAN is the first OSF (Open Semantic Framework) instance.

This demo shows how we took a subset of US Census data for the Iowa City metropolitan area, how we created a small ontology to describe its instance records, and how those records are managed, displayed, browsed and searched using the complete tools stack we had created for other purposes. All of these pieces were integrated into the Citizen DAN demo that Mike presented at SemTech 2010; we are now releasing a publicly accessible instance of it.

I am really proud of what we have accomplished so far with the very limited resources we have been working with over the past two years. Even though we got nothing from our Knight News Challenge application, we were convinced that Citizen DAN was an important project to build and release for local communities. It is an important open source project geared to helping local governments and communities create value out of the data they own and publish it in meaningful ways on the Web. That is why we used our small resources to create Citizen DAN. We managed to bootstrap ourselves even further, and we attracted some early clients interested in investing resources in the project.

It is not just about Citizen DAN

Citizen DAN is one kind of OSF instance, but OSF can have multiple incarnations. The framework is designed so that any kind of data can be indexed, managed and published with it. We can think of use cases in the financial, consumer and business sectors, to name just a few.

Next steps

In the near future we will release new and updated tools and services, adding value to the framework, and we will create new online services in other sectors that also leverage OSF.

What about documentation?

More and more documentation is being written on the TechWiki. We are committed to one thing going forward: documenting as we go, to make sure that our clients do not need us to maintain their instances.

Is there a supporting community?

We will also work hard to develop the community around all the pieces of OSF. We already have some active members in the community; some of them will soon start committing new code and tools, and writing new documentation on the TechWiki. We expect significant growth in the community over the next year.

Everything committed by any member of the community benefits all the other members. So far, all of our clients have committed the results of their work back to the project, because they know that this small investment will be worth much more as the community grows: they get improvements for free from our other clients, and from other members committing resources to the development of any OSF piece.

The places to start with the community are the OpenStructs Community web site and the OSF Mailing List.

Conclusion

This is just the beginning.

I would encourage you to read Mike’s blog post about this new release for more background information on OSF.

Semantic Components

For a few months now at Structured Dynamics, we have been developing what we call “Semantic Components”. A semantic component is an ontology-driven component, or widget, based on Flex. Such a component takes record descriptions, ontologies and target attributes/types as inputs and outputs some (possibly interactive) visualizations of the records. Depending on the logic described in the input schemas and record descriptions, a semantic component may behave differently, optimizing its own layout and behavior for the user.

The purpose of these semantic components is to provide a framework of adaptive user interfaces that can be plugged directly into structWSF Web service endpoints. The goal is to feed some data, schemas and target attributes into these components, and then let them adapt their behavior and appearance to that input.

The picture is simple. We tell the components: here is a set of records serialized in structXML, here is a set of schemas serialized in irXML, and here are the target attributes and types I want you to display. Then, different components get selected and behave differently depending on how the schemas have been defined and how the records have been described.

Ultimately, development time is saved because developers don’t have to hard-code the appearance and behavior of the user interface for each combination of data and schema it might receive at a given point in time: the logic is built into the components.

Overall Workflow

The various semantic components are embedded in a layout canvas. As users interact with the components, new queries are generated (most often as SPARQL queries) against the various structWSF Web service endpoints. These requests return a structured result set, which includes various types and attributes.

An internal ontology that embodies the desired behavior and display options (SCO, the Semantic Component Ontology) is matched against these types and attributes to generate the formal instructions for the semantic components. These instructions are presented via the sControl component, which determines which widgets (individual components) need to be invoked and displayed on the layout canvas.

(Figure: the Semantic Components Framework)
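To make this selection step more concrete, here is a deliberately simplified sketch of that kind of matching logic. It is written in PHP purely for illustration; the actual components are Flex widgets, and the widget names and data structures below are hypothetical, not the framework’s real API.

```php
<?php

// Simplified sketch: given the attributes present in a result set, and
// per-widget display requirements derived from the SCO ontology, decide
// which widgets to invoke on the layout canvas. All names are hypothetical.

function selectWidgets(array $resultSetAttributes, array $widgetRequirements): array
{
    $selected = [];

    foreach ($widgetRequirements as $widget => $required) {
        // Invoke a widget only when every attribute it needs in order to
        // display something is present in the result set.
        if (count(array_diff($required, $resultSetAttributes)) === 0) {
            $selected[] = $widget;
        }
    }

    return $selected;
}

// A result set carrying a label, geographic coordinates and a numeric series:
$attributes = ['prefLabel', 'lat', 'long', 'observationValue'];

$requirements = [
    'sMap'         => ['lat', 'long'],      // a map widget needs coordinates
    'sLinearChart' => ['observationValue'], // a chart widget needs numbers
    'sStory'       => ['articleBody'],      // a story widget needs narrative text
];

print_r(selectWidgets($attributes, $requirements));
// Array ( [0] => sMap [1] => sLinearChart ); sStory is not invoked
```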

New interactions with the resulting displays and components cause the iteration path to be generated anew, starting a new cycle of queries and result sets.

As these pathways and their associated display components are created, they can be named and made persistent for later re-use or for dashboard invocations.

A Shift in Design Perspective

There is a bit of a user interface design shift here. User interfaces have always been developed to present information (data) to users and to let them interact with it. When someone develops such an interface, they have to make thousands of decisions so that the interface can cope with different data description situations. Our semantic components framework tries to lift some of this burden from the designer’s shoulders by taking these decisions itself. Such decisions include:

  • The text control X displays the value of an attribute Y. If the attribute Y doesn’t exist in the description of a record A, then we have to remove the control from the user interface.
    • Note: if the text control X is removed from the interface, there is a good chance that other controls will have to change as well so that the user interface remains usable.
  • If the text control X is removed, there is no reason for its associated icon image to remain in the user interface, so let’s provide accommodations to remove it as well.
  • Some attributes describing the records have values that are comparable with related attributes, so let’s compare these values in a linear chart.
  • Some records may be useful baselines for comparison with other records, so let’s allow that to be externally specified, too.
  • All of these decisions hold for record A, but not for record B, where we do have a value to display in the text control X; so let’s behave differently by displaying the text control X and its associated icon image.
  • Etc.

All of these kinds of decisions are now made by the semantic components within our new framework, depending on how the input records are described and which ontologies (schemas) drive the system.
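As a minimal sketch of the first decision in the list above (again in PHP for illustration only, with hypothetical names; the real components are Flex widgets):

```php
<?php

// Sketch of the first decision above: a text control and its associated
// icon are rendered only when the attribute they display exists in the
// record's description. All names are illustrative.

function renderTextControl(array $record, string $attribute): ?string
{
    // Attribute Y absent from record A: remove the control and its icon.
    if (!isset($record[$attribute])) {
        return null;
    }

    // Attribute present (record B): display the icon and the value.
    return '[icon] ' . $record[$attribute];
}

$recordA = ['prefLabel' => 'Record A'];                          // no 'population'
$recordB = ['prefLabel' => 'Record B', 'population' => '67862'];

var_dump(renderTextControl($recordA, 'population')); // NULL: control removed
var_dump(renderTextControl($recordB, 'population')); // "[icon] 67862": shown
```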

Thus, the designer can now put more time and effort into questions of general layout and behavior, and into themes and styles for her applications, without worrying much about how to display information for specific record descriptions.

Perhaps most significant is that the behavior and presentation of information can now be described within the records and schemas themselves, something users and knowledge workers can do directly, bypassing the need for IT and development. A new balance is established: developers focus on creating generic tools (widgets or components), while consumers of data (users and knowledge workers) determine how they want to display and compare their information.

Unbelievably Fast Implementation

While this shift may appear on its face to require some big new framework, the fact is that we have been able to accomplish it with simple approaches leading to simple outcomes. Structured Dynamics has been able to put in place a complete Web portal of integrated data that publishes all of its data in several serialization formats, with many utilities by which users can interact with the data, slice and dice it, visualize it, and filter and manipulate it … and all of this within two weeks of effort by one developer!

A good example of this is the Citizen DAN demo, composed of Census data and stories related to the Iowa City metropolitan area, which Mike presented at SemTech 2010 (and some screenshots).

Oh, and did I mention? This system handles text, images, tags, maps, dashboards, numeric data and any kind of structure you can throw at it. And all with the same set of generic components (to which we and others are adding).

More Information

Here is some more information about the semantic components framework and its related pieces:

This is an alpha version of the library. We also welcome any contributors to the project! We hope you like what you see and that you will be able to leverage it the way we did, so that you and your team can save as much time as we did!

Global structWSF Statistics Report

Today we released a simple structWSF node statistics report. It aggregates different statistics from all known (and accessible) structWSF nodes on the Web. It is still at an early stage, but the aggregated statistics are already quite interesting.

This global statistics report has two aims:

  1. Monitoring the evolution of structWSF usage, and
  2. Monitoring the overall performance of structWSF web services across different setups and usages

The report is accessible here at any time, and is updated hourly.

Overall Statistics

The main statistics of the report are:

  • The number of structWSF nodes participating in the report
  • The total number of HTTP queries processed by the structWSF nodes
  • The total number of datasets created on the nodes
  • The total number of records indexed, and
  • The total number of triples indexed

These statistics give a general overview of the size of the “global structWSF network of nodes”.
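As a sketch of how such an hourly consolidation could work, here is an illustrative PHP aggregation loop. The broker’s actual output format is not documented here, so the JSON field names below are assumptions:

```php
<?php

// Sketch of an hourly consolidation pass: poll every registered
// statisticsBroker.php endpoint and sum the per-node counters into the
// global totals. The JSON field names are assumptions about the broker's
// output, not its documented interface.

$nodes = [
    'http://node1.example.org/statisticsBroker.php',
    'http://node2.example.org/statisticsBroker.php',
];

$totals = ['queries' => 0, 'datasets' => 0, 'records' => 0, 'triples' => 0];
$participating = 0;

foreach ($nodes as $url) {
    $response = @file_get_contents($url);
    $stats = ($response !== false) ? json_decode($response, true) : null;

    if (!is_array($stats)) {
        continue; // skip unreachable nodes or invalid responses
    }

    $participating++;

    foreach (array_keys($totals) as $key) {
        $totals[$key] += isset($stats[$key]) ? (int) $stats[$key] : 0;
    }
}

echo "Participating nodes: $participating\n";
print_r($totals);
```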

Web Service Statistics

Each Web service endpoint has its own statistics, which are:

  • The number of queries processed by the web service
  • The average time taken to process a query (excluding the network latency between the requester and the web service endpoint server)
  • All the requested MIME types, and the number of times each has been requested, and
  • All the HTTP response codes returned by the endpoint

These per-service statistics are helpful for gaining a general understanding of each web service endpoint.

The average time per query tells a developer what kind of performance to expect when using a given web service endpoint.

The list of requested MIME types gives an overall picture of how the web service endpoint is used: are users mostly requesting XML data, JSON data, RDF+XML data, etc.? Such usage statistics are helpful for prioritizing future development tasks.

The list of HTTP response codes is helpful for spotting possible issues with a web service endpoint. If error codes are returned often, this could point to a bug in the endpoint, to a usage problem that could lead to a fix in the documentation, and so on.

Participating in the Global structWSF Statistics Report

If you are operating a structWSF instance and want to participate in the Global structWSF Statistics Report, you first have to download the new statisticsBroker.php script and install it on your structWSF node.

The statistics broker script is what calculates the statistics of a structWSF node, and it is what the reporting system uses to aggregate statistics from all nodes into the consolidated report.

The first thing to do is to edit the file and change the value of the $enableStatisticsBroadcast variable from FALSE to TRUE at line 46. This enables the script.
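The change itself is a one-liner:

```php
// statisticsBroker.php, line 46: enable broadcasting of this node's statistics
$enableStatisticsBroadcast = TRUE; // default is FALSE
```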

Normally you should install the script in the root folder of your structWSF node, but you can install it anywhere on your server where it is accessible on the Web.

The final step is to register your node with the reporting system. It is just a matter of registering the URL where the statisticsBroker.php script is accessible. Once I have validated it, your node should be added to the global report within 24 hours.

Other Usage of the Statistics Broker

Participating in a global statistics report like this is nice, but much more can be done with such a statistics broker.

A structWSF developer or node maintainer could use it to get statistics for the local node. As described above, such statistics can be used to pinpoint performance issues, bottlenecks and possible bugs in web service endpoints. They could also be used to plan future extensions of the network, scaling up its most heavily used web service endpoints.

Additionally, the statistics broker could be used within a broader server maintenance architecture, for example in conjunction with another script as part of a Ganglia monitoring system. Ganglia could then monitor performance, the rate of requests per hour, or a rise in the number of different HTTP responses returned by some web services. Each of these statistics could also be bound to alert notifications that warn structWSF maintainers and developers of possible issues with the network.
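As a sketch of that idea, and assuming the broker can return its statistics as JSON (the actual output format may differ), a small cron-driven PHP script could push a few counters into Ganglia through the standard gmetric tool:

```php
<?php

// Hypothetical glue between the statistics broker and Ganglia: fetch a
// node's statistics (assumed here to be available as JSON) and publish a
// few counters to the local Ganglia daemon with the standard gmetric tool.
// Intended to be run from an hourly cron job.

$response = file_get_contents('http://example.org/statisticsBroker.php');
$stats = ($response !== false) ? json_decode($response, true) : null;

if (is_array($stats)) {
    $metrics = [
        'structwsf_queries' => isset($stats['queries']) ? (int) $stats['queries'] : 0,
        'structwsf_triples' => isset($stats['triples']) ? (int) $stats['triples'] : 0,
    ];

    foreach ($metrics as $name => $value) {
        // gmetric publishes one metric value to the Ganglia monitoring system.
        exec(sprintf(
            'gmetric --name %s --value %d --type uint32',
            escapeshellarg($name),
            $value
        ));
    }
}
```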

Next Step

The next step for the statistics broker will be to turn it into a structWSF web service itself. That way, structWSF node maintainers will easily be able to define access and usage permissions for these statistics.

structWSF Web Services Tutorial

One thing that has always been hard with structWSF is explaining what it is and how users can interact with it. For most people, structWSF was abstracted behind conStruct, and they didn’t know that every single conStruct feature is bound to one or more queries against one or more structWSF instances.

That is why we took the time to write a complete structWSF interaction tutorial. It explains the general structWSF architecture and describes a series of general interaction use cases. We hope this tutorial will help developers and system implementers understand the capabilities of structWSF and how to use them.

You can read the complete structWSF Web Services Tutorial here.
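To illustrate the idea that every conStruct feature boils down to HTTP queries against structWSF endpoints, here is a sketch of such a query in PHP. The endpoint path and the query parameter shown are placeholders, not the documented interface; the tutorial covers the actual Web services and their parameters:

```php
<?php

// Purely illustrative: what a direct interaction with a structWSF Web
// service endpoint looks like at the HTTP level. The endpoint path and the
// query parameter are placeholders; the tutorial documents the actual Web
// services and their parameters.

$context = stream_context_create([
    'http' => [
        'method' => 'GET',
        // structWSF endpoints select the result serialization through
        // standard HTTP content negotiation.
        'header' => "Accept: application/json\r\n",
    ],
]);

$resultset = file_get_contents(
    'http://example.org/ws/search/?query=census', // hypothetical node and query
    false,
    $context
);

echo $resultset;
```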

Additionally, we released new versions of structWSF, conStruct and the irJSON Parser, which are products of this tutorial.

Behind Oz’s Curtain

Benjamin Nowack, the creator of ARC and Trice, wrote an interesting blog post about the place of Microformats and RDFa in the HTML 5 specification. I am not deep into the specification itself, so I may lack some historical context. However, the most interesting point in his article is not related to Microformats, RDFa or the new HTML 5 specification.

The point is that, apparently, some people believe it is RDF or nothing. This is not new, but is it true?

People (and particularly enterprises) want the benefits of structured data, not necessarily RDF. In fact, many people don’t know about RDF, don’t understand RDF, or just don’t care about RDF. But does not knowing, understanding or caring about RDF mean you cannot benefit from it? No, certainly not. And I think that is what Benjamin is talking about when he mentions things such as: “[…] to get RDF to the broader developer community“, “[…] there could have been a solution that would have served everybody sufficiently well, both HTMLers and RDFers“, and “[…] they would most probably have been able to define RDFa 1.1 as a proper superset of Microdata”. RDF can be incarnated in multiple bodies, but it is still RDF. I think that is what Benjamin was suggesting, and it is the path we took at Structured Dynamics.

We chose to use RDF behind Oz’s curtain. This means that at the core of all of our methodologies, systems and specifications, we use RDF. Why? Because it is the most flexible description framework available for handling any other source of data. However, does that mean we should push RDF in everybody’s face? Certainly not.

Our work with enterprises in all kinds of domains has taught us that we have to look beyond RDF while still using it (as paradoxical as that may appear). For example, we developed structWSF and conStruct so that people can upload (and manage) their data in different formats while being able to export it in any of the other formats. At the core, these systems use RDF to manipulate all of these different formats; from the outside, users simply use the format they care about, already use, or have available in their workflow. These users benefit from RDF without knowing it, understanding it or caring about it. We don’t think RDF is for everyone, but everyone can benefit from RDF.

Another example of RDF behind Oz’s curtain is the irON description framework we developed, with its three serialization profiles: irJSON, irXML and commON. As stated in the Purpose section of its specification, the goal is quite clear:

irON (instance record and Object Notation) is an abstract notation and associated vocabulary for specifying RDF triples and schema in non-RDF forms. Its purpose is to allow users and tools in non-RDF formats to stage interoperable datasets using RDF. The notation supports writing RDF and schema in JSON (irJSON), XML (irXML) and comma-delimited (CSV) formats (commON). The notation specification includes guidance for creating instance records (including in bulk), linkages to existing ontologies and schema, and schema definitions. Profiles and examples are also provided for each of the irXML, irJSON and commON serializations.

irON is premised on these considerations and observations:

  • RDF (Resource Description Framework) is a powerful canonical data model for data interoperability
  • However, most existing data is not written in RDF and many authors and publishers prefer other formats for various reasons
  • Many formats that are easier to author and read than RDF are variants of the attribute-value pair construct [2], which can readily be expressed as RDF, and
  • A common abstract notation for converting to RDF would also enable non-RDF formats to become somewhat interchangeable, thus allowing the strengths of each to be combined.

The irON notation and vocabulary are designed to allow the conceptual structure (“schema”) of datasets to be described, to facilitate easy description of the instance records that populate those datasets, and to link the structures of different schemas to one another. In this manner, more-or-less complete RDF data structures and instances can be described in alternate formats and made interoperable. irON provides a simple and naive information exchange notation expressive enough to describe almost any data entity.
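To give a flavor of the notation, here is a minimal irJSON fragment describing a dataset with a single instance record. Treat it as an illustrative sketch only, with made-up values; the irON specification is the normative reference for the exact structure and vocabulary:

```json
{
  "dataset": {
    "id": "http://example.org/datasets/demo/",
    "prefLabel": "Demo dataset"
  },
  "recordList": [
    {
      "id": "record-1",
      "type": "City",
      "prefLabel": "Iowa City",
      "population": "67862"
    }
  ]
}
```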

I think this is what Benjamin was talking about in his article, and the kind of mindset he was suggesting the RDF community adopt. At least, it is the mindset we adopted at Structured Dynamics, and apparently the one Benjamin adopted for his own business. I am sure there are many other people and organizations out there adopting the same point of view on RDF and its role in the current data ecosystem.