<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Frederick Giasson's Weblog</title>
	<atom:link href="http://fgiasson.com/blog/index.php/feed/" rel="self" type="application/rss+xml" />
	<link>http://fgiasson.com/blog</link>
	<description></description>
	<lastBuildDate>Tue, 06 Jul 2010 01:14:18 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Semantic Components</title>
		<link>http://fgiasson.com/blog/index.php/2010/07/05/semantic-components/</link>
		<comments>http://fgiasson.com/blog/index.php/2010/07/05/semantic-components/#comments</comments>
		<pubDate>Mon, 05 Jul 2010 21:32:32 +0000</pubDate>
		<dc:creator>Fred</dc:creator>
				<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Dynamics]]></category>

		<guid isPermaLink="false">http://fgiasson.com/blog/?p=1055</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Semantic Components&amp;rft.aulast=Giasson&amp;rft.aufirst=Frédérick&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.source=Frederick Giasson&#8217;s Weblog&amp;rft.date=2010-07-05&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://fgiasson.com/blog/index.php/2010/07/05/semantic-components/&amp;rft.language=English"></span>
For a few months now at Structured Dynamics we have been developing what we call &#8220;Semantic Components&#8221;. A semantic component is an ontology-driven component, or widget, based on Flex. Such a component takes record descriptions, ontologies and target attributes/types as inputs and outputs some (possibly interactive) visualizations of the records. Depending on the logic described in [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Semantic Components&amp;rft.aulast=Giasson&amp;rft.aufirst=Frédérick&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.source=Frederick Giasson&#8217;s Weblog&amp;rft.date=2010-07-05&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://fgiasson.com/blog/index.php/2010/07/05/semantic-components/&amp;rft.language=English"></span>
<p>For a few months now at <a href="http://structureddynamics.com/">Structured Dynamics</a> we have been developing what we call &#8220;<a href="http://openstructs.org/semantic-components/">Semantic Components</a>&#8221;. A semantic component is an ontology-driven component, or widget, based on Flex. Such a component takes record descriptions, ontologies and target attributes/types as inputs and outputs (possibly interactive) visualizations of the records. Depending on the logic described in the input schema and the input record descriptions, the semantic component may behave differently to optimize its layout and behavior for users.</p>
<p>The purpose of these semantic components is to provide a framework of adaptive user interfaces that can be plugged directly into <a href="http://openstructs.org/structwsf">structWSF</a> Web service endpoint instances. The goal is to feed data, schema and target attributes into these components, and then to let them change their behavior and appearance depending on the input data and schema.</p>
<p>The picture is simple. We tell the components: here is a set of records serialized in <a href="http://openstructs.org/structxml">structXML</a>, here is a set of <a href="http://openstructs.org/iron/iron-specification#mozTocId217695">schemas</a> serialized in <a href="http://openstructs.org/iron/iron-specification#mozTocId408837">irXML</a>, and here are the target attributes and types I want the components to display. Then, different components get selected and behave differently depending on how the schemas have been defined and how the records have been described.</p>
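<p>To make the idea more concrete, here is a minimal Python sketch of how component selection could work. It is not the semantic components&#8217; actual code (which is written in Flex): the mapping from attribute types to widget names and the widget names themselves are hypothetical, and the schema is reduced to a plain dictionary instead of irXML.</p>

```python
from dataclasses import dataclass, field

# Hypothetical mapping from the value type a schema declares for an attribute
# to a generic widget; this is NOT the actual SCO vocabulary.
WIDGET_FOR_TYPE = {
    "text": "text_widget",
    "image": "image_widget",
    "geo_point": "map_widget",
    "number": "chart_widget",
}


@dataclass
class Record:
    uri: str
    attributes: dict = field(default_factory=dict)  # attribute name -> value


def select_widgets(record, schema, targets):
    """Return (attribute, widget) pairs for the target attributes that the
    record actually describes and that the schema knows how to display."""
    selected = []
    for attribute in targets:
        if attribute not in record.attributes:
            continue  # nothing to display for this record
        widget = WIDGET_FOR_TYPE.get(schema.get(attribute))
        if widget is not None:
            selected.append((attribute, widget))
    return selected


schema = {"name": "text", "location": "geo_point", "population": "number"}
targets = ["name", "location", "population"]
city = Record("http://example.org/city/1", {"name": "Example City", "population": 1000})
# The record has no location, so no map widget is selected for it.
print(select_widgets(city, schema, targets))
# -> [('name', 'text_widget'), ('population', 'chart_widget')]
```

<p>The point of the design is that nothing in <code>select_widgets</code> is specific to one dataset: changing the schema or the record changes the selected widgets without touching the code.</p>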
<p>Ultimately, development time is saved because developers don&#8217;t have to hard-code the appearance and behavior of the user interfaces for whatever data and schema the user interface happens to receive at a given point in time: the logic is built into the components.</p>
<h3>Overall Workflow</h3>
<p>These various semantic components get embedded in a layout canvas. Interacting with the components generates new queries (most often SPARQL queries) to the various structWSF Web service endpoints. These requests return a structured result set, which includes various types and attributes.</p>
<p>An internal ontology that embodies the desired behavior and display options (SCO, the Semantic Component Ontology) is matched against these types and attributes to generate the formal instructions for the semantic components. These instructions are presented via the sControl component, which determines which widgets (individual components) need to be invoked and displayed on the layout canvas.</p>
<p><img class="aligncenter size-full wp-image-1056" title="Semantic Components Framework" src="http://fgiasson.com/blog/wp-content/uploads/2010/07/Untitled1.png" alt="Semantic Components Framework" width="433" height="430" /></p>
<p>New interactions with the resulting displays and components cause the iteration path to be generated anew, starting another cycle of queries and result sets.</p>
<p>As these pathways and associated display components get created, they can be named and made persistent for later re-use or within dashboard invocations.</p>
<h3>A Shift in Design Perspective</h3>
<p>There is a bit of a shift in user interface design here. User interfaces have always been developed to present information (data) to users and to let them interact with it. When someone develops such an interface, he has to make thousands of decisions so that the user interface can cope with different data description situations. Our semantic component framework tries to lift some of this burden from the designer&#8217;s shoulders by making these decisions itself. Such decisions include:</p>
<ul>
<li>The text control X displays the value of an attribute Y. If the attribute Y doesn&#8217;t exist in the description of a record A, then we have to remove the control from the user interface.
<ul>
<li>Note: if the text control X gets removed from the interface, there is a good chance that we may have to change other controls as well so that the user interface remains usable.</li>
</ul>
</li>
<li>If the text control X gets removed, then there is no reason for its associated icon image to remain in the user interface, so let&#8217;s provide accommodations to remove it as well.</li>
<li>Some attributes describing the records have values that are comparable with related attributes, so let&#8217;s compare these values in a linear chart.</li>
<li>Some records may be useful baselines for comparison with other records, so let&#8217;s allow that to be externally specified, too.</li>
<li>All these decisions hold for record A, but not for record B, which has a value to display in the text control X, so let&#8217;s behave differently by displaying the text control X and its associated icon image.</li>
<li>Etc.</li>
</ul>
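<p>The first and last decisions in this list (hide a control and its icon when the attribute is missing, show both when it is present) can be sketched in a few lines of Python. This is only an illustration under assumed names: the control, icon and attribute names are hypothetical, and records are plain dictionaries rather than structXML.</p>

```python
# Hypothetical layout description: each text control is bound to one attribute,
# and each icon depends on one control (if the control goes, the icon goes too).
CONTROL_ATTRIBUTE = {"homepage_text": "homepage"}
ICON_CONTROL = {"homepage_icon": "homepage_text"}


def visible_elements(record):
    """Decide, for one record, which controls and icons should be displayed."""
    visible = set()
    # A control is shown only if the record has a non-empty value for its attribute.
    for control, attribute in CONTROL_ATTRIBUTE.items():
        if record.get(attribute) not in (None, ""):
            visible.add(control)
    # An icon is shown only if the control it accompanies is shown.
    for icon, control in ICON_CONTROL.items():
        if control in visible:
            visible.add(icon)
    return visible


record_a = {"name": "A"}  # no homepage value
record_b = {"name": "B", "homepage": "http://example.org/b"}
print(sorted(visible_elements(record_a)))  # []
print(sorted(visible_elements(record_b)))  # ['homepage_icon', 'homepage_text']
```

<p>The same function yields a different layout for record A and record B, which is the per-record behavior described above, without any record-specific code.</p>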
<p>All of these kinds of decisions are now made by the semantic components within our new framework depending on how the input records are described and what ontologies (schema) drive the system.</p>
<p>Thus, the designer can now put more time and effort into questions of general layout and behavior, themes and styles for her applications, without worrying much about how to display information for specific record descriptions.</p>
<p>Perhaps most significant is that the behavior and presentation of information can now be described within these records and schemas, an activity that users and knowledge workers can do directly, thus bypassing the need for IT and development. A new balance gets established: developers focus on creating generic tools (widgets or components); consumers of data (users and knowledge workers) determine how they want to display and compare their information.</p>
<h3>Unbelievably Fast Implementation</h3>
<p>While this shift may appear on its face to require some big new framework, the fact is we have been able to accomplish it with simple approaches leading to simple outcomes. Structured Dynamics has been able to put in place a complete Web portal of integrated data that publishes all its data in several serialization formats, with many utilities by which users can interact with the data, slice and dice it, visualize it, and filter and manipulate it &#8230; and all of this within two weeks of effort by one developer!</p>
<p>One good example of this is the Citizen Dan demo, composed of Census data and stories related to the Iowa City Metropolitan Area, which <a href="http://www.mkbergman.com/881/two-presentations-at-semtech-2010/">Mike presented at SemTech 2010</a> (<a href="http://citizen-dan.org/details.html">and some screenshots</a>).</p>
<p>Oh, and did I mention? This system handles text, images, tags, maps, dashboards, numeric data and any kind of structure you can throw at it. And all with the same set of generic components (to which we and others are adding).</p>
<h3>More Information</h3>
<p>Here is some more information about the semantic component framework and its related pieces:</p>
<ul>
<li><a href="http://openstructs.org/semantic-components/">Homepage</a></li>
<li><a href="http://openstructs.org/semantic-components/manual">Manual</a></li>
<li><a href="http://code.google.com/p/semanticcomponents/">Source code repository</a></li>
<li><a href="http://openstructs.org/doc/code/semanticcomponents/index.html">Code documentation</a></li>
<li><a href="http://groups.google.com/group/open-semantic-framework">Discussion forum</a></li>
<li><a href="http://openstructs.org/semantic-components/demos">Demos</a></li>
</ul>
<p>This is an alpha version of the library, and we welcome contributors to the project! We hope you like what you see and that you will be able to leverage it the way we did so that you, and your team, can save as much time as we did!</p>
]]></content:encoded>
			<wfw:commentRss>http://fgiasson.com/blog/index.php/2010/07/05/semantic-components/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Global structWSF Statistics Report</title>
		<link>http://fgiasson.com/blog/index.php/2010/04/09/global-structwsf-statistics-report/</link>
		<comments>http://fgiasson.com/blog/index.php/2010/04/09/global-structwsf-statistics-report/#comments</comments>
		<pubDate>Fri, 09 Apr 2010 15:53:18 +0000</pubDate>
		<dc:creator>Fred</dc:creator>
				<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[conStruct]]></category>
		<category><![CDATA[structWSF]]></category>

		<guid isPermaLink="false">http://fgiasson.com/blog/?p=1048</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Global structWSF Statistics Report&amp;rft.aulast=Giasson&amp;rft.aufirst=Frédérick&amp;rft.subject=Structured Dynamics&amp;rft.subject=conStruct&amp;rft.subject=structWSF&amp;rft.source=Frederick Giasson&#8217;s Weblog&amp;rft.date=2010-04-09&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://fgiasson.com/blog/index.php/2010/04/09/global-structwsf-statistics-report/&amp;rft.language=English"></span>
Today we released a simple structWSF nodes statistics report. It aggregates different statistics from all known (and accessible) structWSF nodes on the Web. It is still in its early stages, but the aggregated statistics so far are quite interesting.
This global statistics report has two aims:

Monitoring the evolution of the usage of [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Global structWSF Statistics Report&amp;rft.aulast=Giasson&amp;rft.aufirst=Frédérick&amp;rft.subject=Structured Dynamics&amp;rft.subject=conStruct&amp;rft.subject=structWSF&amp;rft.source=Frederick Giasson&#8217;s Weblog&amp;rft.date=2010-04-09&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://fgiasson.com/blog/index.php/2010/04/09/global-structwsf-statistics-report/&amp;rft.language=English"></span>
<p><img class="alignright size-full wp-image-941" title="triple_120" src="http://fgiasson.com/blog/wp-content/uploads/2009/06/triple_120.png" alt="triple_120" width="120" height="120" />Today we released a simple structWSF nodes statistics report. It aggregates different statistics from all known (and accessible) structWSF nodes on the Web. It is still in its early stages, but the aggregated statistics so far are quite interesting.</p>
<p>This global statistics report has two aims:</p>
<ol>
<li>Monitoring the evolution of the usage of structWSF, and</li>
<li>Monitoring the overall performance of structWSF Web services in different setups for different usages</li>
</ol>
<p><a href="http://openstructs.org/structwsf/stats/">The report is accessible here at all times</a>. The report is updated hourly.</p>
<h3>Overall Statistics</h3>
<p>The main statistics of the report are:</p>
<ul>
<li>The number of structWSF nodes participating in the report</li>
<li>The total number of HTTP queries processed by the structWSF nodes</li>
<li>The total number of datasets created on the nodes</li>
<li>The total number of records indexed, and</li>
<li>The total number of triples indexed</li>
</ul>
<p>These statistics give a general overview of the size of the &#8220;global structWSF network of nodes&#8221;.</p>
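<p>Conceptually, the overall figures are sums over the nodes that participate in the report. A minimal Python sketch follows; the field names and the numbers are made up for illustration and do not reflect the statistics broker&#8217;s actual output format.</p>

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    # Illustrative per-node figures, as one node's statistics broker might report them.
    queries: int
    datasets: int
    records: int
    triples: int


def aggregate(nodes):
    """Sum per-node statistics into the overall figures of a global report."""
    return {
        "nodes": len(nodes),
        "queries": sum(n.queries for n in nodes),
        "datasets": sum(n.datasets for n in nodes),
        "records": sum(n.records for n in nodes),
        "triples": sum(n.triples for n in nodes),
    }


report = aggregate([NodeStats(1200, 3, 5000, 80000), NodeStats(300, 1, 700, 9000)])
print(report)
```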
<h3>Web Service Statistics</h3>
<p>Each Web service endpoint has its own statistics, which are:</p>
<ul>
<li>The number of queries processed by the Web service</li>
<li>The average time it took to process a query (excluding the network latency between the requester and the Web service endpoint server)</li>
<li>All the requested MIME types, and the number of times each MIME type has been requested, and</li>
<li>All the HTTP response codes returned by the endpoint</li>
</ul>
<p>These Web service-specific statistics give a general understanding of each Web service endpoint.</p>
<p>The average time per query indicates what kind of performance a developer should expect when using this Web service endpoint.</p>
<p>The list of requested MIME types gives an overview of how the Web service endpoint is used: are users mostly requesting XML data, JSON data, RDF+XML data, etc.? Such usage statistics are helpful to prioritize future development tasks.</p>
<p>The list of all HTTP response codes helps to notice possible issues with a Web service endpoint. If error codes are returned often, this could pinpoint a possible bug in the Web service endpoint, or an issue with its usage that could lead to a fix in the documentation, etc.</p>
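<p>The per-endpoint figures above can all be derived from a log of processed requests. The following Python sketch assumes a hypothetical log format of (endpoint, processing time in seconds, requested MIME type, HTTP response code) tuples, and adds an error rate (the share of 4xx/5xx responses) as one way to spot endpoints that may have a bug or a documentation problem. The endpoint names and the 10% threshold are arbitrary choices for the example.</p>

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical request log: (endpoint, processing seconds, MIME type, HTTP code).
LOG = [
    ("search", 0.12, "application/json", 200),
    ("search", 0.20, "text/xml", 200),
    ("search", 0.10, "application/json", 500),
    ("read", 0.05, "application/rdf+xml", 200),
]


def endpoint_stats(log):
    """Compute per-endpoint query count, average time, MIME types and codes."""
    rows_by_endpoint = defaultdict(list)
    for endpoint, seconds, mime, code in log:
        rows_by_endpoint[endpoint].append((seconds, mime, code))

    stats = {}
    for endpoint, rows in rows_by_endpoint.items():
        codes = Counter(code for _, _, code in rows)
        errors = sum(count for code, count in codes.items() if code >= 400)
        stats[endpoint] = {
            "queries": len(rows),
            "avg_time": mean(seconds for seconds, _, _ in rows),
            "mime_types": Counter(mime for _, mime, _ in rows),
            "response_codes": codes,
            "error_rate": errors / len(rows),
        }
    return stats


stats = endpoint_stats(LOG)
suspicious = [name for name, s in stats.items() if s["error_rate"] > 0.10]
print(suspicious)  # ['search']
```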
<h3>Participating in the Global structWSF Statistics Report</h3>
<p>If you are operating a structWSF instance and want to participate in the Global structWSF Statistics Report, you first have to download the new <a href="http://code.google.com/p/structwsf/source/browse/branches/dev/statisticsBroker.php">statisticsBroker.php script</a> and install it on your structWSF node.</p>
<p>The statistics broker script calculates the statistics of a structWSF node; it is also what is used to aggregate the statistics of all nodes into the consolidated report.</p>
<p>The first thing to do is to edit the file and change the value of the $enableStatisticsBroadcast variable from FALSE to TRUE on line 46. This enables the script.</p>
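<p>If you prefer to script this change (for example when deploying several nodes), a small Python helper can flip the flag. This is a sketch under an assumption: it expects the assignment to look like <code>$enableStatisticsBroadcast = FALSE;</code>, and it refuses to write anything unless it finds exactly one such line. Check the actual file first, since its exact contents may differ.</p>

```python
import re
from pathlib import Path

# Assumes the flag is assigned as `$enableStatisticsBroadcast = FALSE;`
# (any spacing, any letter case for FALSE).
FLAG = re.compile(r"(\$enableStatisticsBroadcast\s*=\s*)FALSE(\s*;)", re.IGNORECASE)


def enable_broadcast(php_source):
    """Return the PHP source with the broadcast flag set to TRUE."""
    new_source, count = FLAG.subn(r"\1TRUE\2", php_source)
    if count != 1:
        raise ValueError("expected exactly one '$enableStatisticsBroadcast = FALSE;'")
    return new_source


def enable_broadcast_in_file(path):
    """Rewrite statisticsBroker.php in place (keep a backup copy first).

    latin-1 is used because it round-trips any byte sequence unchanged,
    whatever the file's real encoding is.
    """
    file = Path(path)
    source = file.read_text(encoding="latin-1")
    file.write_text(enable_broadcast(source), encoding="latin-1")
```

<p>You would call <code>enable_broadcast_in_file()</code> with the path where you installed statisticsBroker.php on your node.</p>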
<p>Normally you should install the script in the root folder of your structWSF node, but you can install it anywhere on your server as long as it is accessible from the Web.</p>
<p><a href="http://openstructs.org/structwsf/stats/subscribe/">The final step is to register your node with the reporting system</a>. It is just a matter of registering the URL address where the statisticsBroker.php script is accessible. It should be added to the global report within 24 hours, once I have validated it.</p>
<h3>Other Usage of the Statistics Broker</h3>
<p>Participating in such a global statistics report is nice, but much more can be done with a statistics broker.</p>
<p>A structWSF developer or a structWSF node maintainer could use it to get statistics about the local node. As described above, such statistics can be used to pinpoint possible performance issues, bottlenecks and possible bugs in Web service endpoints. They could also be used to plan future extensions of the network to scale heavily used Web service endpoints.</p>
<p>Additionally, the statistics broker could be used in a broader server maintenance architecture. For example, it could be used in conjunction with another script as part of a <a href="http://ganglia.sourceforge.net/">Ganglia</a> monitoring system. Ganglia could monitor performance, the rate of requests per hour, or a rise in the number of different HTTP responses returned by some Web services. Each of these statistics could also be bound to different alert notification messages that would warn the structWSF system maintainers and developers of possible issues with the network.</p>
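<p>As a rough illustration of such alerting, the Python sketch below compares two readings of a node&#8217;s cumulative counters and reports when the request rate or the share of error responses crosses a threshold. The snapshot fields and thresholds are assumptions made for the example; feeding the numbers into Ganglia (for instance with its <code>gmetric</code> tool) and sending the notifications are left out.</p>

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    # Cumulative counters read at one point in time (illustrative fields).
    hours: float  # time of the reading, in hours
    queries: int  # total queries processed so far
    errors: int   # total responses with an HTTP status code of 400 or more


def check(previous, current, max_rate=5000, max_error_ratio=0.05):
    """Return alert messages for the interval between two snapshots."""
    alerts = []
    elapsed = current.hours - previous.hours
    new_queries = current.queries - previous.queries
    new_errors = current.errors - previous.errors
    if elapsed > 0 and new_queries / elapsed > max_rate:
        alerts.append(f"request rate {new_queries / elapsed:.0f}/h exceeds {max_rate}/h")
    if new_queries > 0 and new_errors / new_queries > max_error_ratio:
        alerts.append(f"error ratio {new_errors / new_queries:.1%} exceeds {max_error_ratio:.0%}")
    return alerts


print(check(Snapshot(0, 10000, 50), Snapshot(1, 16000, 450)))
```

<p>Working on differences between snapshots, rather than on the cumulative totals, keeps the alerts about what happened recently instead of over the node&#8217;s whole lifetime.</p>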
<h3>Next Step</h3>
<p>The next step with the statistics broker will be to create a structWSF Web service out of it. That way, structWSF node maintainers will easily be able to define access and usage permissions for such statistics.</p>
]]></content:encoded>
			<wfw:commentRss>http://fgiasson.com/blog/index.php/2010/04/09/global-structwsf-statistics-report/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>structWSF Web Services Tutorial</title>
		<link>http://fgiasson.com/blog/index.php/2010/02/18/structwsf-web-services-tutorial/</link>
		<comments>http://fgiasson.com/blog/index.php/2010/02/18/structwsf-web-services-tutorial/#comments</comments>
		<pubDate>Thu, 18 Feb 2010 21:45:40 +0000</pubDate>
		<dc:creator>Fred</dc:creator>
				<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[conStruct]]></category>
		<category><![CDATA[irON]]></category>
		<category><![CDATA[structWSF]]></category>

		<guid isPermaLink="false">http://fgiasson.com/blog/?p=1044</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=structWSF Web Services Tutorial&amp;rft.aulast=Giasson&amp;rft.aufirst=Frédérick&amp;rft.subject=Structured Dynamics&amp;rft.subject=conStruct&amp;rft.subject=irON&amp;rft.subject=structWSF&amp;rft.source=Frederick Giasson&#8217;s Weblog&amp;rft.date=2010-02-18&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://fgiasson.com/blog/index.php/2010/02/18/structwsf-web-services-tutorial/&amp;rft.language=English"></span>
One thing that was hard to do with structWSF was explaining what structWSF is, and how users can interact with it. For most people, structWSF was abstracted behind conStruct, and they didn&#8217;t know that each single functionality of conStruct was bound to one or more queries to one or more structWSF instances.
This is the reason [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=structWSF Web Services Tutorial&amp;rft.aulast=Giasson&amp;rft.aufirst=Frédérick&amp;rft.subject=Structured Dynamics&amp;rft.subject=conStruct&amp;rft.subject=irON&amp;rft.subject=structWSF&amp;rft.source=Frederick Giasson&#8217;s Weblog&amp;rft.date=2010-02-18&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://fgiasson.com/blog/index.php/2010/02/18/structwsf-web-services-tutorial/&amp;rft.language=English"></span>
<p>One thing that was hard to do with <a href="http://openstructs.org/structwsf/">structWSF</a> was explaining what structWSF is, and how users can interact with it. For most people, structWSF was abstracted behind <a href="http://constructscs.com/">conStruct</a>, and they didn&#8217;t know that each single functionality of conStruct was bound to one or more queries to one or more structWSF instances.</p>
<p>This is why we took the time to write a complete structWSF interaction tutorial. This tutorial explains what the general structWSF architecture is, and it describes a series of general interaction use cases. We hope that this tutorial will help developers and system implementers understand the capabilities of structWSF and how they can use it.</p>
<p><a href="http://openstructs.org/structwsf/web-services-tutorial">You can read the complete structWSF Web Services Tutorial here.</a></p>
<p>Additionally, we released new versions of <a href="http://openstructs.org/blog/2010/2/fgiasson/structwsf-10a5-released">structWSF</a>, <a href="http://constructscs.com/blog/fgiasson/2010/2/construct-6x-1x-dev-5-released">conStruct</a> and the <a href="http://openstructs.org/blog/2010/2/fgiasson/irjson-parser-10a2-released">irJSON Parser</a>, which are by-products of this tutorial.</p>
]]></content:encoded>
			<wfw:commentRss>http://fgiasson.com/blog/index.php/2010/02/18/structwsf-web-services-tutorial/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Behind Oz&#8217;s Curtain</title>
		<link>http://fgiasson.com/blog/index.php/2010/01/27/behind-ozs-curtain/</link>
		<comments>http://fgiasson.com/blog/index.php/2010/01/27/behind-ozs-curtain/#comments</comments>
		<pubDate>Wed, 27 Jan 2010 20:50:41 +0000</pubDate>
		<dc:creator>Fred</dc:creator>
				<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Dynamics]]></category>

		<guid isPermaLink="false">http://fgiasson.com/blog/?p=1030</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Behind Oz&#8217;s Curtain&amp;rft.aulast=Giasson&amp;rft.aufirst=Frédérick&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.source=Frederick Giasson&#8217;s Weblog&amp;rft.date=2010-01-27&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://fgiasson.com/blog/index.php/2010/01/27/behind-ozs-curtain/&amp;rft.language=English"></span>
Benjamin Nowack, creator of ARC and Trice, wrote an interesting blog post about the place of Microformats and RDFa in the HTML 5 specification. I am not deep into the specification itself, and so may lack some historical context. However, the most interesting point in this article is not related to Microformats, RDFa or the [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Behind Oz&#8217;s Curtain&amp;rft.aulast=Giasson&amp;rft.aufirst=Frédérick&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.source=Frederick Giasson&#8217;s Weblog&amp;rft.date=2010-01-27&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://fgiasson.com/blog/index.php/2010/01/27/behind-ozs-curtain/&amp;rft.language=English"></span>
<p>Benjamin Nowack, creator of <a href="http://arc.semsol.org/">ARC</a> and <a href="http://trice.semsol.org/">Trice</a>, <a href="http://bnode.org/blog/2010/01/26/microdata-semantic-markup-for-both-rdfers-and-non-rdfers">wrote an interesting blog post about the place of Microformats and RDFa in the HTML 5 specification</a>. I am not deep into the specification itself, and so may lack some historical context. However, the most interesting point in this article is not related to Microformats, RDFa or the new HTML 5 specification.</p>
<p>The point is that apparently, some people believe that it is RDF or nothing. This is not new, but is that true?</p>
<p>People (and particularly enterprises) want the benefits of structured data, not necessarily RDF. In fact, many people don&#8217;t know about RDF, or don&#8217;t understand RDF, or just don&#8217;t care about RDF. But does not knowing, understanding or caring about RDF mean that you cannot benefit from it? No, certainly not. And I think that is what Benjamin is talking about when he mentions things such as: &#8220;[...] <em>to get RDF to the broader developer community</em>&#8221;, &#8220;[...] <em>here could have been a solution that would have served everybody sufficiently well, both HTMLers and RDFers</em>&#8221; and &#8220;[...] <em>they would most probably have been able to define RDFa 1.1 as a proper superset of Microdata</em>&#8221;. RDF can be incarnated in multiple bodies, but it is still RDF. I think this is what Benjamin was suggesting, and it is the path we took at <a href="http://structureddynamics.com">Structured Dynamics</a>.</p>
<p>We chose to use RDF behind Oz&#8217;s curtain. This means that at the core of all our methodologies, systems and specifications, we use RDF. Why? Because it is the most flexible description framework available, and it helps us handle any other source of data. However, does that mean that we should push RDF in everybody&#8217;s face? Certainly not.</p>
<p>Our work with enterprises from all kinds of domains has taught us that we have to look beyond RDF while still using it (paradoxical as that may appear). For example, we developed <a href="http://openstructs.org/structwsf/">structWSF</a> and <a href="http://constructscs.com">conStruct</a> so that people can upload (and manage) their data in different formats while being able to export it in all the other formats. At the core, these systems use RDF to manipulate all these different kinds of formats, but from the outside, users simply use the format they care about, the one they use, or the one they have available in their workflow. These users benefit from RDF without knowing it, understanding it or caring about it. We don&#8217;t think RDF is for everyone, but everyone can benefit from RDF.</p>
<p>Another example of RDF behind Oz&#8217;s curtain is the <a href="http://openstructs.org/iron/iron-specification">irON</a> description framework and its three serialization profiles: <a href="http://openstructs.org/iron/iron-specification#mozTocId462570">irJSON</a>, <a href="http://openstructs.org/iron/iron-specification#mozTocId408837">irXML</a> and <a href="http://openstructs.org/iron/iron-specification#mozTocId603499">commON</a> that we developed. As stated in the <a href="http://openstructs.org/iron/iron-specification#mozTocId212042">Purpose section</a> of this document, the goal was quite clear:</p>
<p style="padding-left: 30px;"><em>irON (instance record and Object Notation) is a abstract notation and associated vocabulary for specifying RDF triples and schema in non-RDF forms. Its purpose is to allow users and tools in non-RDF formats to stage interoperable datasets using RDF. The notation supports writing RDF and schema in JSON (irJSON), XML (irXML) and comma-delimited (CSV) formats (commON). The notation specification includes guidance for creating instance records (including in bulk), linkages to existing ontologies and schema, and schema definitions. Profiles and examples are also provided for each of the irXML, irJSON and commON serializations.</em></p>
<p style="padding-left: 30px;"><em>irON is premised on these considerations and observations:</em></p>
<ul style="padding-left: 30px;">
<li><em>RDF (Resource Description Framework) is a powerful canonical data model for data interoperability</em></li>
<li><em>However, most existing data is not written in RDF and many authors and publishers prefer other formats for various reasons</em></li>
<li><em>Many formats that are easier to author and read than RDF are variants of the attribute-value pair construct [2], which can readily be expressed as RDF, and</em></li>
<li><em>A common abstract notation for converting to RDF would also enable non-RDF formats to become somewhat interchangeable, thus allowing the strengths of each to be combined.</em></li>
</ul>
<p style="padding-left: 30px;"><em>The irON notation and vocabulary is designed to allow the conceptual structure (&#8220;schema&#8221;) of datasets to be described, to facilitate easy description of the instance records that populate those datasets, and to link different structures for different schema to one another. In these manners, more-or-less complete RDF data structures and instances can be described in alternate formats and be made interoperable. irON provides a simple and naive information exchange notation expressive enough to describe most any data entity.</em></p>
<p>I think this is what Benjamin was talking about in his article, and the kind of mindset he was suggesting the RDF community adopt. At least this is the mindset we adopted at Structured Dynamics, and apparently it is the mindset Benjamin adopted for his own business. I am sure there are many other people and organizations out there that share this point of view regarding RDF and its role in the current data ecosystem.</p>
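<p>The irON premise quoted above, that attribute-value records can readily be expressed as RDF and that a common notation makes different formats interchangeable, can be illustrated with a small Python sketch. The same record, written once as JSON and once as CSV, yields the same set of triples. The record layouts and the <code>http://example.org/</code> namespace are simplifications made for this example; they are not the actual irJSON or commON syntax.</p>

```python
import csv
import io
import json

BASE = "http://example.org/"  # hypothetical namespace for records and attributes


def to_triples(record):
    """Turn one attribute-value record into a set of (subject, predicate, object)
    triples; the "id" attribute names the subject, every other one is a predicate."""
    subject = f"{BASE}record/{record['id']}"
    return {
        (subject, f"{BASE}attribute/{name}", str(value))
        for name, value in record.items()
        if name != "id"
    }


def literal(value):
    """Serialize a value as an N-Triples plain literal, escaping backslashes,
    double quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples):
    return "\n".join(f"<{s}> <{p}> {literal(o)} ." for s, p, o in sorted(triples))


from_json = to_triples(json.loads('{"id": "1", "name": "Example City", "state": "Iowa"}'))
from_csv = to_triples(next(csv.DictReader(io.StringIO("id,name,state\n1,Example City,Iowa\n"))))
assert from_json == from_csv  # two formats, one RDF description
print(to_ntriples(from_json))
```

<p>Users only ever see the JSON or the CSV; the triples stay behind the curtain, which is the point being made above.</p>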
]]></content:encoded>
			<wfw:commentRss>http://fgiasson.com/blog/index.php/2010/01/27/behind-ozs-curtain/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New versions of structWSF and conStruct</title>
		<link>http://fgiasson.com/blog/index.php/2010/01/20/new-versions-of-structwsf-and-construct/</link>
		<comments>http://fgiasson.com/blog/index.php/2010/01/20/new-versions-of-structwsf-and-construct/#comments</comments>
		<pubDate>Wed, 20 Jan 2010 22:42:34 +0000</pubDate>
		<dc:creator>Fred</dc:creator>
				<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[conStruct]]></category>
		<category><![CDATA[structWSF]]></category>

		<guid isPermaLink="false">http://fgiasson.com/blog/?p=1024</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=New versions of structWSF and conStruct&amp;rft.aulast=Giasson&amp;rft.aufirst=Frédérick&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=conStruct&amp;rft.subject=structWSF&amp;rft.source=Frederick Giasson&#8217;s Weblog&amp;rft.date=2010-01-20&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://fgiasson.com/blog/index.php/2010/01/20/new-versions-of-structwsf-and-construct/&amp;rft.language=English"></span>

We just released a new (major) version of both structWSF and conStruct. Though some months had passed since we last released this software, we finally got the time and opportunity to make these important upgrades. Many things have changed in both packages. I don&#8217;t want to list all the changes in this blog post, so I [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=New versions of structWSF and conStruct&amp;rft.aulast=Giasson&amp;rft.aufirst=Frédérick&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=conStruct&amp;rft.subject=structWSF&amp;rft.source=Frederick Giasson&#8217;s Weblog&amp;rft.date=2010-01-20&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://fgiasson.com/blog/index.php/2010/01/20/new-versions-of-structwsf-and-construct/&amp;rft.language=English"></span>
<p><img class="size-full wp-image-941 alignright" title="triple_120" src="http://fgiasson.com/blog/wp-content/uploads/2009/06/triple_120.png" alt="triple_120" width="120" height="120" /><img class="alignright size-full wp-image-942" title="construct_logo_120" src="http://fgiasson.com/blog/wp-content/uploads/2009/06/construct_logo_120.png" alt="construct_logo_120" width="120" height="120" /></p>
<p>We just released a new (major) version of both structWSF and conStruct. Though some months had passed since we last released this software, we finally got the time and opportunity to make these important upgrades. Many things have changed in both packages. I don’t want to list all the changes in this blog post, so I suggest reading the change logs:</p>
<ul>
<li><a href="http://community.openstructs.org/content/structwsf-10a4">structWSF change log</a></li>
<li><a href="http://community.openstructs.org/content/construct-6x-1x-dev-4">conStruct change log</a></li>
</ul>
<p>These new versions have been greatly shaped by the needs of our clients. We also started to introduce some of the new concepts we have written about over the last few months.</p>
<p>A really good addition to this release is a <a href="http://openstructs.org/structwsf/installation-guide">brand new Installation Manual</a>. Hopefully it will help people “easily” and properly install and set up a Web server to host these two packages.</p>
<p>All documentation files have been updated:</p>
<ul>
<li><a href="http://openstructs.org/structwsf/individual-ws-documentation">structWSF Web Service Endpoints documentation</a></li>
<li><a href="http://openstructs.org/doc/code/structwsf/index.html">structWSF code documentation</a></li>
<li><a href="http://constructscs.com/doc/code/construct/index.html">conStruct code documentation</a></li>
</ul>
<p>You can download both software packages from here:</p>
<ul>
<li><a href="http://structwsf.googlecode.com/files/structwsf-1.0a4.zip">structWSF version 1.0a4</a></li>
<li><a href="http://drupal.org/project/construct">conStruct version 6.x-1.x-dev-4</a> (Drupal should create the new package within 1 day)</li>
</ul>
<h2>An Amazon EC2/EBS Architecture</h2>
<p>Some of the changes in these new versions have been made to help create, set up and maintain Web servers that host structWSF and conStruct instances.</p>
<p>At Structured Dynamics, we have developed and use a server architecture that leverages Amazon cloud computing services such as EC2, EBS and Elastic IP. This architecture gives us the flexibility to easily maintain and upgrade server instances, to instantly create new <strong>structWSF</strong> instances in one click (without performing all the installation steps every time), etc.</p>
<p>You can contact us for more information about these EC2 AMIs and EBS Volumes that we developed for this purpose. Here is an overview of the architecture that is now in place:</p>
<p><img class="aligncenter size-full wp-image-1025" title="structwsf_amazon" src="http://fgiasson.com/blog/wp-content/uploads/2010/01/structwsf_amazon.png" alt="structwsf_amazon" width="501" height="446" /></p>
<p>There is a clear separation of concerns between three major things:</p>
<ul>
<li>Software &amp; libraries</li>
<li>Configuration files</li>
<li>Data files</li>
</ul>
<p>We chose to put all the software and libraries needed to create a stand-alone <strong>structWSF</strong> instance in an EC2 AMI. This means that all the software needed to run a <strong>structWSF</strong> instance, including Virtuoso, is present on the image itself, which runs Ubuntu Server.</p>
<p>Then we chose to put all configuration and data files on an EBS volume that we attach to, and mount on, the EC2 instance. You can think of an EBS volume as a physical hard drive: it can be mounted on a server instance, but it can&#8217;t be shared between multiple instances.</p>
<p>By splitting software &amp; libraries, configuration files and data files, we make sure that we can easily upgrade a <strong>structWSF</strong> server in production with the latest version of <strong>structWSF</strong> (its code base and all related software such as Virtuoso, Solr, etc.). Since the configuration and data files are not on the EC2 instance, we can easily create a new EC2 instance from the latest <strong>structWSF</strong> AMI we produced, and then mount the EBS volume holding the configuration and data files on the new (and upgraded) <strong>structWSF</strong> instance. That way, in a few clicks, we can fully upgrade a server in production without fear of disturbing the configuration or data files.</p>
<p>Additionally, we can easily create backups of configuration and data files at different intervals by using Amazon&#8217;s Snapshot technology.</p>
<p>Finally, we chose to put all the software and configuration files needed to run a <strong>conStruct</strong> instance on another, separate EBS volume. That way, we have a clean <strong>structWSF</strong> AMI instance that can be upgraded at any time, and we can <em>plug</em> (mount) a <strong>conStruct</strong> instance (EBS volume) into a <strong>structWSF</strong> server at any time. This means that we can easily have <strong>structWSF</strong> instances with or without a <strong>conStruct</strong> instance. The same strategy can be used to create <em>plugin packages</em> that can be mounted on, and unmounted from, any <strong>structWSF</strong> instance at any time, as needed.</p>
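<p>The upgrade path described above can be modeled in a few lines. The following is a minimal, in-memory Python sketch, not Structured Dynamics&#8217; actual tooling: it does not call any AWS API, and the names <em>Instance</em>, <em>Volume</em>, <em>attach</em>, <em>upgrade</em> and <em>snapshot</em>, as well as the AMI and volume names, are hypothetical. It shows why keeping configuration and data on separate volumes lets a server be replaced by a newer AMI without touching those files.</p>

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Volume:
    """EBS-like volume (hypothetical model): attachable to one instance at a time."""
    name: str
    files: dict
    attached_to: Optional["Instance"] = None


@dataclass(eq=False)
class Instance:
    """EC2-like instance launched from a given AMI version (hypothetical model)."""
    ami_version: str
    volumes: list = field(default_factory=list)


def attach(volume: Volume, instance: Instance) -> None:
    # Like a physical hard drive, a volume cannot be shared between instances.
    if volume.attached_to is not None:
        raise RuntimeError(f"{volume.name} is already attached")
    volume.attached_to = instance
    instance.volumes.append(volume)


def detach(volume: Volume) -> None:
    if volume.attached_to is not None:
        volume.attached_to.volumes.remove(volume)
        volume.attached_to = None


def snapshot(volume: Volume) -> dict:
    # Stand-in for a snapshot: an independent copy of the volume's contents.
    return copy.deepcopy(volume.files)


def upgrade(old: Instance, new_ami_version: str) -> Instance:
    """Launch a fresh instance from a newer AMI and move every volume onto it."""
    new = Instance(ami_version=new_ami_version)
    for volume in list(old.volumes):  # iterate a copy: detach() mutates old.volumes
        detach(volume)
        attach(volume, new)
    return new


# Usage: one volume for configuration and data, one for a conStruct "plugin".
server = Instance("structwsf-ami-v1")
config_data = Volume("config-and-data", {"settings": "site configuration"})
construct = Volume("construct", {"module": "conStruct files"})
attach(config_data, server)
attach(construct, server)

backup = snapshot(config_data)
server = upgrade(server, "structwsf-ami-v2")
print(server.ami_version, [v.name for v in server.volumes])
# prints: structwsf-ami-v2 ['config-and-data', 'construct']
```

Because the volumes carry all state, the "upgrade" is only a matter of launching a new instance and re-attaching; the old instance can then be discarded.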
<p>All this makes maintaining <strong>structWSF</strong> server instances easier, simpler and faster.</p>
]]></content:encoded>
			<wfw:commentRss>http://fgiasson.com/blog/index.php/2010/01/20/new-versions-of-structwsf-and-construct/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>When Linked Data Rules Fail</title>
		<link>http://fgiasson.com/blog/index.php/2009/11/16/when-linked-data-rules-fail/</link>
		<comments>http://fgiasson.com/blog/index.php/2009/11/16/when-linked-data-rules-fail/#comments</comments>
		<pubDate>Mon, 16 Nov 2009 17:03:11 +0000</pubDate>
		<dc:creator>Fred</dc:creator>
				<category><![CDATA[Semantic Web]]></category>

		<guid isPermaLink="false">http://fgiasson.com/blog/?p=1003</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=When Linked Data Rules Fail&amp;rft.aulast=Giasson&amp;rft.aufirst=Frédérick&amp;rft.subject=Semantic Web&amp;rft.source=Frederick Giasson&#8217;s Weblog&amp;rft.date=2009-11-16&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://fgiasson.com/blog/index.php/2009/11/16/when-linked-data-rules-fail/&amp;rft.language=English"></span>

High Visibility Problems with NYT, data.gov Show Need for Better Practices
When I say, &#8220;shot&#8221;, what do you think of? A flu shot? A shot of whisky? A moon shot? A gun shot? What if I add the term &#8220;bank&#8221;? Do you now think of someone being shot in an armed robbery of a local bank or [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=When Linked Data Rules Fail&amp;rft.aulast=Giasson&amp;rft.aufirst=Frédérick&amp;rft.subject=Semantic Web&amp;rft.source=Frederick Giasson&#8217;s Weblog&amp;rft.date=2009-11-16&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://fgiasson.com/blog/index.php/2009/11/16/when-linked-data-rules-fail/&amp;rft.language=English"></span>
<p><a href="http://www.adhd-mindbydesign.com"><img style="border: 0px solid; width: 220px; height: 223px; float: left; margin-right: 10px;" title="Image Source: www.adhd-mindbydesign.com" src="http://fgiasson.com/blog/wp-content/uploads/2009/11/091115_disconnected.jpg" alt="Image Source: www.adhd-mindbydesign.com" hspace="5" vspace="5" align="left" /></a></p>
<h2>High Visibility Problems with NYT, data.gov Show Need for Better Practices</h2>
<p>When I say, &#8220;shot&#8221;, what do you think of? A flu shot? A shot of whisky? A moon shot? A gun shot? What if I add the term &#8220;bank&#8221;? Do you now think of someone being shot in an armed robbery of a local bank or similar?</p>
<p>And, now, what if I add a reference to, say, <a style="font-style: italic;" href="http://en.wikipedia.org/wiki/The_Hustler_%28film%29">The Hustler</a>, or Minnesota Fats, or &#8220;Fast Eddie&#8221; Felson? Do you now see the connection to a pressure-packed banked pool shot in some smoky bar room?</p>
<p>As humans we need context to make connections and remove ambiguity. For machines, with their limited reasoning and inference engines, context and accurate connections are even more important.</p>
<p>Over the past few weeks we have seen announcements of two large and high-visibility <a href="http://en.wikipedia.org/wiki/Linked_data">linked data</a> projects: one, a first release of references to articles about some 5,000 people from the New York Times at <a class="http" href="http://data.nytimes.com/">data.nytimes.com</a>; and two, a massive exposure of 5 billion triples from <a href="http://tw.rpi.edu/">data.gov</a> datasets provided by the <a href="http://tw.rpi.edu/">Tetherless World Constellation</a> (TWC) at <a href="http://rpi.edu/">Rensselaer Polytechnic Institute</a> (RPI).</p>
<p>On various grounds, from <a href="http://go-to-hellman.blogspot.com/2009/10/new-york-times-blunders-into-linked.html">licensing</a> to <a href="http://dowhatimean.net/2009/10/linked-data-at-the-new-york-times-exciting-but-buggy">data characterization</a> to creating linked data for its <a href="http://www.betaversion.org/%7Estefano/linotype/news/351/">own sake</a>, some prominent commentators have weighed in on what is good and what is not so good with these datasets. One of us, Mike, <a href="http://www.mkbergman.com/843/must-read-data-smoke-and-mirrors/">commented</a> about a week ago that &#8220;we have now moved beyond &#8216;proof of concept&#8217; to the need for actual useful data of trustworthy provenance and proper mapping and characterization. Recent efforts are a disappointment that no enterprise would or could rely upon.&#8221;</p>
<p>Reactions to <a href="http://www.mkbergman.com/843/must-read-data-smoke-and-mirrors/">that posting</a> and continued discussion on various <a href="http://lists.w3.org/Archives/Public/public-esw-thes/2009Nov/0000.html">mailing lists</a> warrant a more precise dissection of what is wrong with these datasets and what still needs to be done <a href="#ld1">[1]</a>.</p>
<h3>Berners-Lee&#8217;s Four Linked Data &#8220;Rules&#8221;</h3>
<p>It is useful, then, to return to first principles, namely the original four &#8220;rules&#8221; posed by Tim Berners-Lee in his design note on linked data <a href="#ld2">[2]</a>:</p>
<ol>
<li>Use URIs as names for things</li>
<li>Use HTTP URIs so that people can look up those names</li>
<li>When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)</li>
<li>Include links to other URIs so that they can discover more things.</li>
</ol>
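<p>To make the rules concrete, here is a rough Python sketch of a checker for a single record. It is a heuristic written for this discussion, not a standard tool: a record is modeled as a subject URI plus a list of (predicate, object) pairs, and the example values come from the NYT &#8220;Newman, Paul&#8221; record discussed below. Note what it cannot do: for Rule #3 it can only confirm that <em>some</em> description exists, not that the description is useful, which is exactly where the breakdowns discussed in this post occur.</p>

```python
from urllib.parse import urlparse


def is_uri(value: str) -> bool:
    # Rule 1 (heuristic): a name is a URI if it has a scheme (http:, urn:, ...).
    # A bare literal such as "2001-03-18" has no scheme.
    return bool(urlparse(value).scheme)


def is_http_uri(value: str) -> bool:
    # Rule 2: the URI must be something people can look up over HTTP.
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def check_rules(subject: str, description: list) -> dict:
    """Check one record against the four linked data rules (heuristically)."""
    host = urlparse(subject).netloc
    external_links = [
        obj for _, obj in description
        if is_http_uri(obj) and urlparse(obj).netloc != host
    ]
    return {
        "rule1_uri_name": is_uri(subject),
        "rule2_http_uri": is_http_uri(subject),
        # Presence of statements only; usefulness cannot be checked this way.
        "rule3_has_description": len(description) > 0,
        "rule4_links_elsewhere": len(external_links) > 0,
    }


record = [
    ("http://data.nytimes.com/elements/first_use", "2001-03-18"),
    ("http://www.w3.org/2002/07/owl#sameAs", "http://dbpedia.org/resource/Paul_Newman"),
]
print(check_rules("http://data.nytimes.com/newman_paul_per", record))
```

The record passes all four checks, which is the point: mechanical compliance with the rules says nothing about whether anyone can understand the data.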
<p>The first two rules are definitional to the idea of linked data. They cement the basis of linked data in the Web, and are not at issue with either of the two linked data projects that are the subject of this posting.</p>
<p>However, it is the lack of specifics and guidance in the last two rules where the breakdowns occur. Both the NYT and the RPI datasets fail to &#8220;provide useful information&#8221; (Rule #3). And the <span class="double_u">nature</span> of the links in Rule #4 is a real problem for the NYT dataset.</p>
<h3>What Constitutes &#8220;Useful Information&#8221;?</h3>
<p>The Wikipedia entry on <a href="http://en.wikipedia.org/wiki/Linked_data">linked data</a> expands on &#8220;useful information&#8221; by augmenting the original rule with the parenthetical clause &#8220;(<span style="font-style: italic;">i.e.</span>, a structured description — metadata).&#8221; But even that expansion is insufficient.</p>
<p>Fundamentally, what are we talking about with linked data? Well, we are talking about instances that are characterized by one or more attributes. Those instances exist within contexts of various natures. And, those contexts may relate to other existing contexts.</p>
<p>We can break this problem description down into three parts:</p>
<ul>
<li>A <span style="font-weight: bold; font-style: italic;">vocabulary</span> that defines the nature of the instances and their descriptive attributes</li>
<li>A <span style="font-weight: bold; font-style: italic;">schema</span> of some nature that describes the structural relationships amongst instances and their characteristics, and, optimally,</li>
<li>A <span style="font-weight: bold; font-style: italic;">mapping</span> to existing external schema or constructs that help place the data into context.</li>
</ul>
<p>At minimum, <span class="double_u">ANY</span> dataset exposed as linked data needs to be described by a <span style="font-weight: bold; font-style: italic;">vocabulary</span>. Both the NYT and RPI datasets fail on this score, as we elaborate below. Better practice is to also provide a <span style="font-weight: bold; font-style: italic;">schema</span> of relationships in which to embed each instance record. And, best practice is to also <span style="font-weight: bold; font-style: italic;">map</span> those structures to external schema.</p>
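<p>These three levels can be read as a simple checklist. The Python sketch below is illustrative only: the function names are hypothetical, a vocabulary is modeled as a mapping from property to definition, a schema as a list of structural relations, and a mapping as links from local terms to external ones. With the four NYT properties and no published vocabulary, the dataset does not even reach the minimum.</p>

```python
def undefined_properties(used: set, vocabulary: dict) -> list:
    """Properties used in instance data that have no definition in the vocabulary."""
    return sorted(p for p in used if not vocabulary.get(p))


def practice_level(used: set, vocabulary: dict, schema: list, mappings: dict) -> str:
    """Grade a dataset: minimum = vocabulary, better = + schema, best = + mappings."""
    if undefined_properties(used, vocabulary):
        return "below minimum"
    if not schema:
        return "minimum"
    if not mappings:
        return "better"
    return "best"


# The NYT properties as published: labels only, no vocabulary.
nyt_used = {"nyt:associated_article_count", "nyt:first_use", "nyt:last_use", "nyt:topicPage"}
print(practice_level(nyt_used, {}, [], {}))  # below minimum
print(undefined_properties(nyt_used, {}))

# A hypothetical, fully described dataset; all "ex:" and "ext:" terms are made up.
ex_used = {"ex:itemCode"}
ex_vocabulary = {"ex:itemCode": "Code of the consumption category of an expenditure."}
ex_schema = [("ex:BakeryProducts", "subClassOf", "ex:Food")]
ex_mappings = {"ex:Food": "ext:Food"}
print(practice_level(ex_used, ex_vocabulary, ex_schema, ex_mappings))  # best
```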
<p>Lacking this &#8220;useful information&#8221;, especially a defining vocabulary, we cannot begin to understand whether our instances deal with drinks, bank robberies or pool shots. This lack, in essence, makes the information worthless, even though it is available via URL.</p>
<h4>The data.gov (RPI) Case</h4>
<p>With the support of NSF and various grant funding, RPI has set up the <a href="http://data-gov.tw.rpi.edu/wiki/The_Data-gov_Wiki">Data-Gov Wiki</a> <a href="#ld3">[3]</a>, which is in the process of converting the datasets on <a href="http://www.data.gov">data.gov</a> to RDF, placing them into a semantic wiki to enable comment and annotation, and providing that data as RSS feeds. Other demos are also being placed on the site.</p>
<p>As of the date of this posting, the site had a <a href="http://data-gov.tw.rpi.edu/wiki/Data.gov_Catalog">catalog</a> of 116 datasets from the 800 or so available on data.gov, leading to these statistics:</p>
<ul>
<li>459,412,419 table entries</li>
<li>5,074,932,510 triples, and</li>
<li>7,564 properties (or attributes).</li>
</ul>
<p>We&#8217;ll take one of these datasets, <a href="http://www.data.gov/details/319">#319</a>, and look a bit closer at it:</p>
<table border="1" cellspacing="0" cellpadding="4">
<tbody>
<tr>
<th style="background-color: #cccccc;">Wiki</th>
<th style="background-color: #cccccc;">Title</th>
<th style="background-color: #cccccc;">Agency</th>
<th style="background-color: #cccccc;">Name</th>
<th style="background-color: #cccccc;">data.gov Link</th>
<th style="background-color: #cccccc;">No. of Properties</th>
<th style="background-color: #cccccc;">No. of Triples</th>
<th style="background-color: #cccccc;">RDF File</th>
</tr>
<tr>
<td><a title="Dataset 319" href="http://data-gov.tw.rpi.edu/wiki/Dataset_319">Dataset 319</a></td>
<td>Consumer Expenditure Survey</td>
<td><a title="Department of Labor" href="http://data-gov.tw.rpi.edu/wiki/Department_of_Labor">Department of Labor</a></td>
<td><a title="LABOR-STAT (page does not exist)" href="http://data-gov.tw.rpi.edu/w/index.php?title=LABOR-STAT&amp;action=edit&amp;redlink=1">LABOR-STAT</a></td>
<td><a title="http://www.data.gov/details/319" rel="nofollow" href="http://www.data.gov/details/319">http://www.data.gov/details/319</a></td>
<td style="text-align: right;">22</td>
<td style="text-align: right;">1,583,236</td>
<td><a title="http://data-gov.tw.rpi.edu/raw/319/index.rdf" rel="nofollow" href="http://data-gov.tw.rpi.edu/raw/319/index.rdf">http://data-gov.tw.rpi.edu/raw/319/index.rdf</a></td>
</tr>
</tbody>
</table>
<p>This report was picked solely because it had a small number of attributes (properties), and is thus easier to screen capture. The summary report on the wiki is shown by this <a href="http://data-gov.tw.rpi.edu/wiki/Dataset_319">page</a>:</p>
<div style="margin: 10px;">
<p><a href="http://fgiasson.com/blog/wp-content/uploads/2009/11/091115_wiki_dataset_319.png"><img class="center" style="border: 0px solid; width: 600px; height: 611px;" title="Click to expand" src="http://fgiasson.com/blog/wp-content/uploads/2009/11/091115_wiki_dataset_319.png" alt="Data-gov-Wiki Dataset #319" /></a></p>
</div>
<p>So, we see that this specific dataset contains 22 of the nearly 8,000 attributes across all datasets.</p>
<p>When we click on one of these attribute names, we are then taken to a specific wiki page that only reiterates its label. There is no definition or explanation.</p>
<p>When we inspect this page further, we see that, other than the broad characterization of the dataset itself (the bulk of the page), there are 22 undefined attributes at the bottom with labels such as <span style="font-style: italic;">item code</span>, <span style="font-style: italic;">periodicity code</span>, <span style="font-style: italic;">seasonal</span>, and the like. These attributes are the real structural basis for the data in this dataset.</p>
<p>But, what does all of this mean???</p>
<p>To gain a clue, now let&#8217;s go to the source data.gov site for this <a href="http://www.data.gov/details/319">dataset (#319)</a>. Here is how that report looks:</p>
<div style="margin: 10px;">
<p><a href="http://fgiasson.com/blog/wp-content/uploads/2009/11/091115_data_gov_319.png"><img class="center" style="border: 0px solid; width: 600px; height: 1146px;" title="Click to expand" src="http://fgiasson.com/blog/wp-content/uploads/2009/11/091115_data_gov_319.png" alt="Data.gov Dataset #319" /></a></p>
</div>
<p> Contained within this report we see a listing for additional <a href="ftp://ftp.bls.gov/pub/time.series/cx/cx.txt">metadata</a>. This link tells us about the various data fields contained in this dataset; we see many of these attributes are &#8220;codes&#8221; to various data categories.</p>
<p>Probing further into the dataset&#8217;s <a href="http://www.bls.gov/cex/">technical documentation</a>, we see that there is indeed a rich structure underneath this report, again provided via various code lookups. There are codes for geography, seasonality (adjusted or not), consumer demographic profiles and a variety of consumption categories. (See, for example, the link to this <a href="http://www.bls.gov/cex/csxgloss.htm">glossary page</a>.) These are the keys to understanding the actual values within this dataset.</p>
<p>For example, one major dimension of the data is captured by the attribute <span style="font-style: italic;">item_code</span>. The survey breaks down consumption expenditures within the broad categories of Food, Housing, Apparel and Services, Transportation, Health Care, Entertainment, and Other. Within a category, there is also a rich structural breakdown. For example, expenditures for Bakery Products within Food are given a <a href="ftp://ftp.bls.gov/pub/time.series/cx/cx.item">code</a> of FHC2.</p>
<p>But, nowhere are these codes defined or unlocked in the RDF datasets. This absence is true for virtually all of the datasets exposed on this wiki.</p>
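<p>Unlocking a code is conceptually simple once its definition is published. The Python sketch below decodes <em>item_code</em> values in raw triples using a lookup table. Only the FHC2 entry (Bakery Products within Food) comes from the BLS code file cited above; the row identifiers, the <em>_label</em>/<em>_category</em> property names and the unknown code ZZZ9 are hypothetical placeholders. Codes missing from the table are reported rather than guessed.</p>

```python
# Lookup table: code -> (broad category, item label).
# Only FHC2 is taken from the BLS cx.item file; a real table would hold every code.
ITEM_CODES = {"FHC2": ("Food", "Bakery Products")}


def decode(triples, property_name, table):
    """Add human-readable label and category triples for coded values."""
    decoded, unknown = [], []
    for subject, predicate, obj in triples:
        decoded.append((subject, predicate, obj))
        if predicate != property_name:
            continue
        if obj in table:
            category, label = table[obj]
            decoded.append((subject, predicate + "_label", label))
            decoded.append((subject, predicate + "_category", category))
        else:
            unknown.append(obj)  # undefined code: left for a human to document


    return decoded, unknown


raw = [
    ("ex:row1", "item_code", "FHC2"),
    ("ex:row2", "item_code", "ZZZ9"),  # hypothetical code with no definition
]
decoded, unknown = decode(raw, "item_code", ITEM_CODES)
for triple in decoded:
    print(triple)
print("undefined codes:", unknown)
```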
<p>So, for literally billions of triples, and 8,000 attributes, we have <span style="font-weight: bold;">ABSOLUTELY NO INFORMATION ABOUT WHAT THE DATA CONTAINS OTHER THAN A PROPERTY LABEL</span>. There is much, much rich value here in data.gov, but all of it remains locked up and hidden.</p>
<p>The sad truth about this data release is that it provides absolutely no value in its current form. We lack the keys to unlock the value.</p>
<p>To be sure, early essential spade work has been done here to begin putting in place the conversion infrastructure for moving text files, spreadsheets and the like to an RDF form. This is yeoman work, important to ultimate access. But until a <span style="font-weight: bold; font-style: italic;">vocabulary</span> is published that defines the attributes and their codes, this value will remain hidden. And until a <span style="font-weight: bold; font-style: italic;">schema</span> of some nature is also published to connect attributes and relations across datasets, the real value from connecting the dots will remain hidden as well.<img style="width: 160px; height: 218px; float: right; margin-left: 10px;" title="The Hustler" src="http://fgiasson.com/blog/wp-content/uploads/2009/11/091115_the_hustler.jpg" alt="The Hustler" align="right" /></p>
<p>These datasets may meet the partial conditions of providing clickable URLs, but the crucial &#8220;useful information&#8221; as to what any of this data means is absent.</p>
<p>Every single dataset on data.gov has supporting references to text files, PDFs, Web pages or the like that describe the nature of the data within each dataset. Until that information is exposed and made usable, we have no linked data. </p>
<p>Until ontologies are created from these technical documents, the value of these data instances remains locked up, and no value can be created from having these datasets expressed in RDF.</p>
<p>The devil lies in the details. The essential hard work has not yet begun.</p>
<h4>The NYT Case</h4>
<p>Though at a much smaller scale with many fewer attributes, the <a href="http://data.nytimes.com">NYT dataset</a> suffers from the same failing: it too lacks a <span style="font-weight: bold; font-style: italic;">vocabulary</span>.</p>
<p>So, let&#8217;s take the case of one of the lead actors in <a style="font-style: italic;" href="http://en.wikipedia.org/wiki/The_Hustler_%28film%29">The Hustler</a>, Paul Newman, who played the role of &#8220;Fast Eddie&#8221; Felson. Here is the <a href="http://data.nytimes.com/N31738445835662083893.html">NYT record</a> for the &#8220;person&#8221; <span style="font-style: italic;">Paul Newman</span> (which they also refer to as <a href="http://data.nytimes.com/newman_paul_per">http://data.nytimes.com/newman_paul_per</a>). Note the header title of <span style="font-weight: bold;">Newman, Paul</span>:</p>
<div style="margin: 10px;">
<p><a href="http://fgiasson.com/blog/wp-content/uploads/2009/11/091115_nyt_paul_newman.png"><img class="center" style="border: 0px solid; width: 600px; height: 593px;" title="Click to expand" src="http://fgiasson.com/blog/wp-content/uploads/2009/11/091115_nyt_paul_newman.png" alt="NYT 'Paul Newman Articles' Record" /></a></p>
</div>
<p>Click on any of the internal labels used by the NYT for its own attributes (such as <a href="http://data.nytimes.com/elements/first_use">nyt:first_use</a>), and you will be given this message:</p>
<div style="margin-left: 40px;">
<p><span style="font-style: italic;">&#8220;An RDFS description and English language documentation for the NYT namespace will be provided soon. Thanks for your patience.&#8221;</span></div>
<p>We again have no idea what is meant by all of this data except for the labels used for its attributes. In this case, for <a href="http://data.nytimes.com/elements/first_use">nyt:first_use</a> we have a value of &#8220;2001-03-18&#8221;.</p>
<p>Hello? What? What is a &#8220;first use&#8221; for a &#8220;Paul Newman&#8221; of &#8220;2001-03-18&#8221;???</p>
<p>The NYT put the cart before the horse: even if minimal, their ontology should have been released before their data instances, or at least at the same time. (See further <a href="http://www.mkbergman.com/825/fresh-perspectives-on-the-semantic-enterprise/">this discussion</a> about how an ontology creation workflow can be incremental, starting simple and then upgrading as needed.)</p>
<h3>Links to Other Things</h3>
<p>Since there really are no links to other things on the Data-Gov Wiki, our focus in this section continues with the NYT dataset using our same example.</p>
<p>We now are in the territory of the fourth &#8220;rule&#8221; of linked data: <span style="font-style: italic;">4. Include links to other URIs so that they can discover more things</span>.</p>
<p>This will seem a bit basic at first, but before we can talk about linking to other things, we first need to understand and define the starting &#8220;thing&#8221; to which we are linking.</p>
<h4>What is a &#8220;Newman, Paul&#8221; Thing?</h4>
<p>Of course, without its own vocabulary, we are left to deduce what the thing &#8220;<span style="font-weight: bold;">Newman, Paul</span>&#8221; shown in the previous screen shot actually <span class="double_u">is</span>. Our first clue comes from the statement that it is of <span style="font-style: italic;">rdf:type</span> <a href="http://www.w3.org/TR/skos-reference/">SKOS</a> <span style="font-style: italic;">concept</span>. By looking to the SKOS vocabulary, we see that <a href="http://www.w3.org/TR/skos-reference/#concepts"><span style="font-style: italic;">concept</span></a> is a class and is defined as:</p>
<p style="margin-left: 40px; font-style: italic;">A SKOS concept can be viewed as an idea or notion; a unit of thought. However, what constitutes a unit of thought is subjective, and this definition is meant to be suggestive, rather than restrictive. The notion of a SKOS concept is useful when describing the conceptual or intellectual structure of a knowledge organization system, and when referring to specific ideas or meanings established within a KOS.</p>
<p>We also see that this instance is given a <a href="http://xmlns.com/foaf/0.1/primaryTopic">foaf:primaryTopic</a> of <span style="font-style: italic;">Paul Newman</span>.</p>
<p>So far, we can deduce that this instance is about the concept or idea of <span style="font-style: italic;">Paul Newman</span>. Now, looking to the attributes of this instance (that is, the defining properties provided by the NYT), we see the properties <a href="http://data.nytimes.com/elements/associated_article_count">nyt:associated_article_count</a>, <a href="http://data.nytimes.com/elements/first_use">nyt:first_use</a>, <a href="http://data.nytimes.com/elements/last_use">nyt:last_use</a> and <a href="http://data.nytimes.com/elements/topicPage">nyt:topicPage</a>. Completing our deductions, and in the absence of its own vocabulary, we can now define this concept instance roughly as follows:</p>
<p style="margin-left: 40px;"><span style="font-style: italic;">New York Times articles in the period 2001 to 2009 having as their primary topic the actor Paul Newman</span></p>
<p>(BTW, across all records in this dataset, we could see what the earliest first use was to better deduce the time period over which these articles have been assembled, but that has not been done.)</p>
<p>We also would re-title this instance more akin to &#8220;2001-2009 NYT Articles with a Primary Topic of Paul Newman&#8221; or some such and use URIs more akin to this usage. </p>
<h4>sameAs Woes</h4>
<p>Thus, in order to make links or connections with other data, it is essential to understand what the nature is of the subject &#8220;thing&#8221; at hand. There is much confusion about actual &#8220;things&#8221; and the references to &#8220;things&#8221; and what is the nature of a &#8220;thing&#8221; within the literature and on mailing lists.</p>
<p>Our belief and usage in matters of the semantic Web is that all &#8220;things&#8221; we deal with are a reference to whatever the &#8220;true&#8221;, actual thing is. The question then becomes:  What is the nature (or scope) of this referent?</p>
<p>There are actually quite easy ways to determine this nature. First, look to one or more instance examples of the &#8220;thing&#8221; being referred to. In our case above, we have the &#8220;<span style="font-weight: bold;">Newman, Paul</span>&#8221; instance record. Then, look to the properties (or attributes) the publisher of that record has used to describe that thing. Again, in the case above, we have <a href="http://data.nytimes.com/elements/associated_article_count">nyt:associated_article_count</a>, <a href="http://data.nytimes.com/elements/first_use">nyt:first_use</a>, <a href="http://data.nytimes.com/elements/last_use">nyt:last_use</a> and <a href="http://data.nytimes.com/elements/topicPage">nyt:topicPage</a>.</p>
<p>Clearly, this instance record, by its nature, deals with articles or groups of articles. The relation to <span style="font-style: italic;">Paul Newman</span> occurs as the <span class="double_u">primary topic</span> of these articles, not as a <span class="double_u">person</span> basis for describing the instance. If the nature of the instance were indeed the person <span style="font-style: italic;">Paul Newman</span>, then the attributes of the record would more properly be &#8220;person&#8221; properties such as age, sex, birthdate, death date, marital status, etc.</p>
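<p>The reasoning above can be expressed as a crude heuristic: compare the properties actually used on a record with the properties typical of each candidate kind of thing. In the Python sketch below, the two profiles and the function name are hypothetical, seeded only with the property names mentioned in this post; a real check would draw its profiles from published vocabularies.</p>

```python
# Hypothetical property profiles built from the examples in the text.
PROFILES = {
    "person": {"age", "sex", "birthdate", "death date", "marital status"},
    "group of articles": {
        "nyt:associated_article_count", "nyt:first_use", "nyt:last_use", "nyt:topicPage",
    },
}


def guess_nature(properties: set, profiles: dict = PROFILES):
    """Return the profile sharing the most properties with the record, or None."""
    overlap = {kind: len(properties.intersection(expected))
               for kind, expected in profiles.items()}
    best = max(overlap, key=overlap.get)
    return best if overlap[best] else None


newman = {"nyt:associated_article_count", "nyt:first_use", "nyt:last_use", "nyt:topicPage"}
print(guess_nature(newman))  # group of articles
```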
<p>This confusion by NYT as to the nature of the &#8220;things&#8221; they are describing then leads to some very serious errors. By confusing the topic (<span style="font-style: italic;">Paul Newman</span>) of a record with the nature of that record (articles about topics), NYT next misuses one of the most powerful semantic Web predicates available, <span style="font-weight: bold;">owl:sameAs</span>.</p>
<p>By asserting in the &#8220;<span style="font-weight: bold;">Newman, Paul</span>&#8221; record that the instance has a <span style="font-weight: bold;">sameAs</span> relationship with external records in <a href="http://rdf.freebase.com/ns/en.paul_newman">Freebase</a> and <a href="http://dbpedia.org/resource/Paul_Newman">DBpedia</a>, the NYT&#8217;s assertion both <a href="http://en.wikipedia.org/wiki/Entailment">entail</a>s that properties from any of the associated records are shared and lets a reasoner <a href="http://en.wikipedia.org/wiki/Inference">infer</a> a chain of other types to describe the record. More precisely, the NYT is asserting that the &#8220;things&#8221; referred to by these instances are <strong class="moz-txt-star">identical</strong> resources.</p>
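<p>To see how far a single <span style="font-weight: bold;">sameAs</span> statement reaches, consider a small Python sketch of the inference. Because owl:sameAs is symmetric and transitive, the resources it connects form equivalence classes, and every type asserted on one member applies to all of them. The sketch groups resources with a union-find structure; the type sets are abbreviated to two of the DBpedia classes listed below plus the NYT record&#8217;s own SKOS type. It illustrates the entailment only and is not a full OWL reasoner.</p>

```python
from collections import defaultdict


def same_as_closure(pairs):
    """Group resources into equivalence classes; owl:sameAs is symmetric and transitive."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return groups


def inferred_types(resource, same_as_pairs, asserted_types):
    """Every type a reasoner would attribute to resource once sameAs is applied."""
    for members in same_as_closure(same_as_pairs).values():
        if resource in members:
            return set().union(*(asserted_types.get(m, set()) for m in members))
    return set(asserted_types.get(resource, set()))


NYT = "http://data.nytimes.com/newman_paul_per"
DBPEDIA = "http://dbpedia.org/resource/Paul_Newman"
FREEBASE = "http://rdf.freebase.com/ns/en.paul_newman"

pairs = [(NYT, DBPEDIA), (NYT, FREEBASE)]
types = {
    NYT: {"skos:Concept"},
    DBPEDIA: {"foaf:Person", "dbpedia-owl:Actor"},  # two of the many DBpedia types
}
print(sorted(inferred_types(NYT, pairs, types)))
# ['dbpedia-owl:Actor', 'foaf:Person', 'skos:Concept']
```

The output makes the problem visible: the NYT record, which describes a group of articles, ends up typed as both a SKOS concept and a person.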
<p>Thus, by the <span style="font-weight: bold;">sameAs</span> statements in the <span style="font-weight: bold;">“Newman, Paul”</span> record, the NYT is also asserting that that record is an instance of all these classes:</p>
<table border="0">
<tbody>
<tr>
<td></td>
<td>
<ul>
<li> <a class="uri" rel="rdf:type" href="http://dbpedia.org/about/html/http://www.w3.org/2002/07/owl%23Thing">owl:Thing</a></li>
<li> <a href="http://xmlns.com/foaf/spec/#term_Agent">foaf:Agent</a></li>
<li> <a href="http://xmlns.com/foaf/spec/#term_Person">foaf:Person</a></li>
<li> <a class="uri" rel="rdf:type" href="http://dbpedia.org/ontology/Actor">dbpedia-owl:Actor</a></li>
<li> <a class="uri" rel="rdf:type" href="http://dbpedia.org/class/yago/JewishActors">http://dbpedia.org/class/yago/JewishActors</a></li>
<li> <a class="uri" rel="rdf:type" href="http://dbpedia.org/class/yago/PeopleFromCleveland,Ohio">http://dbpedia.org/class/yago/PeopleFromCleveland,Ohio</a></li>
<li><a class="uri" rel="rdf:type" href="http://dbpedia.org/ontology/Artist">dbpedia-owl:Artist</a></li>
<li> <a class="uri" rel="rdf:type" href="http://dbpedia.org/ontology/Person">dbpedia-owl:Person</a></li>
<li> <a class="uri" rel="rdf:type" href="http://dbpedia.org/class/yago/Person100007846">http://dbpedia.org/class/yago/Person100007846</a></li>
<li> <a class="uri" rel="rdf:type" href="http://dbpedia.org/class/yago/AmericanFilmDirectors">http://dbpedia.org/class/yago/AmericanFilmDirectors</a></li>
<li> <a class="uri" rel="rdf:type" href="http://dbpedia.org/class/yago/YaleUniversityAlumni">http://dbpedia.org/class/yago/YaleUniversityAlumni</a></li>
<li><a class="uri" rel="rdf:type" href="http://dbpedia.org/class/yago/OhioUniversityAlumni">http://dbpedia.org/class/yago/OhioUniversityAlumni</a></li>
<li> <a class="uri" rel="rdf:type" href="http://sw.opencyc.org/2008/06/10/concept/Mx4rvVjWoZwpEbGdrcN5Y29ycA">opencyc:en/MaleHuman</a></li>
<li> <a class="uri" rel="rdf:type" href="http://dbpedia.org/class/yago/AmericanFilmActors">http://dbpedia.org/class/yago/AmericanFilmActors</a></li>
<li> <a class="uri" rel="rdf:type" href="http://dbpedia.org/class/yago/Liberals">http://dbpedia.org/class/yago/Liberals</a></li>
<li> <a class="uri" rel="rdf:type" href="http://dbpedia.org/class/yago/OhioActors">http://dbpedia.org/class/yago/OhioActors</a></li>
<li><a class="uri" rel="rdf:type" href="http://dbpedia.org/class/yago/UnitedStatesNavySailors">http://dbpedia.org/class/yago/UnitedStatesNavySailors</a></li>
<li> <a class="uri" rel="rdf:type" href="http://dbpedia.org/class/yago/PeopleFromWestport,Connecticut">http://dbpedia.org/class/yago/PeopleFromWestport,Connecticut</a></li>
<li> <a class="uri" rel="rdf:type" href="http://sw.opencyc.org/2008/06/10/concept/Mx4rwQB4UJwpEbGdrcN5Y29ycA">opencyc:en/JewishPerson</a></li>
<li> <a class="uri" rel="rdf:type" href="http://sw.opencyc.org/2008/06/10/concept/Mx4rwMRyTJwpEbGdrcN5Y29ycA">opencyc:en/ActorInMovies</a></li>
<li> <a class="uri" rel="rdf:type" href="http://dbpedia.org/class/yago/LivingPeople">http://dbpedia.org/class/yago/LivingPeople</a></li>
<li> <a class="uri" rel="rdf:type" href="http://dbpedia.org/class/yago/Actor109765278">http://dbpedia.org/class/yago/Actor109765278</a></li>
<li> <a class="uri" rel="rdf:type" href="http://dbpedia.org/class/yago/AmericanVegetarians">http://dbpedia.org/class/yago/AmericanVegetarians</a></li>
<li><a class="uri" rel="rdf:type" href="http://dbpedia.org/class/yago/AmericanPhilanthropists">http://dbpedia.org/class/yago/AmericanPhilanthropists</a></li>
<li> <a class="uri" rel="rdf:type" href="http://dbpedia.org/class/yago/KenyonCollegeAlumni">http://dbpedia.org/class/yago/KenyonCollegeAlumni</a></li>
<li> <a class="uri" rel="rdf:type" href="http://dbpedia.org/class/yago/WesternFilmActors">http://dbpedia.org/class/yago/WesternFilmActors</a></li>
<li> <a class="uri" rel="rdf:type" href="http://dbpedia.org/class/yago/ActorsStudioAlumni">http://dbpedia.org/class/yago/ActorsStudioAlumni</a></li>
<li>and a hundred other DBpedia YAGO superclasses.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>Furthermore, because of its strong, reciprocal entailments, the <span style="font-weight: bold;">owl:sameAs</span> assertion also entails that the person <span style="font-style: italic;">Paul Newman</span> has the <a href="http://data.nytimes.com/elements/first_use">nyt:first_use</a> and <a href="http://data.nytimes.com/elements/latest_use">nyt:latest_use</a> attributes, which is clearly illogical for a &#8220;person&#8221; thing.</p>
<p>This connection is clearly wrong in both directions. <span style="font-style: italic;">Articles</span> are not <span style="font-style: italic;">persons</span> and don&#8217;t have <span style="font-style: italic;">marital status</span>; and <span style="font-style: italic;">persons</span> do not have <span style="font-style: italic;">first_uses</span>. By misapplying this <span style="font-weight: bold;">sameAs</span> linkage relationship, we have screwed things up in every which way. And the error began with misunderstanding what kinds of &#8220;things&#8221; our data is about.</p>
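<p>To make this leak concrete, here is a minimal Python sketch. It is not how any particular reasoner is implemented; it simply treats triples as tuples of prefixed names and applies the <span style="font-weight: bold;">owl:sameAs</span> rule that every statement about one member of a sameAs group holds for all members. The triples are an abridged, illustrative version of the example above, <code>DATE</code> is a placeholder rather than real NYT data, and only subject-side propagation is shown.</p>

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"
RDF_TYPE = "rdf:type"


def same_as_groups(triples):
    """Map each resource to the set of resources it is owl:sameAs with.

    owl:sameAs is symmetric and transitive, so the groups are the
    connected components of the undirected sameAs graph.
    """
    neighbours = defaultdict(set)
    for s, p, o in triples:
        if p == SAME_AS:
            neighbours[s].add(o)
            neighbours[o].add(s)
    group_of = {}
    for start in neighbours:
        if start in group_of:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbours[node] - group)
        for node in group:
            group_of[node] = frozenset(group)
    return group_of


def apply_same_as(triples):
    """Return the triples plus those entailed by copying each non-sameAs
    statement to every member of its subject's sameAs group
    (subject-side propagation only, for brevity)."""
    group_of = same_as_groups(triples)
    entailed = set(triples)
    for s, p, o in triples:
        if p == SAME_AS:
            continue
        for member in group_of.get(s, {s}):
            entailed.add((member, p, o))
    return entailed


# Abridged, illustrative triples; "DATE" is a placeholder value.
triples = {
    ("nyt:newman_paul_per", SAME_AS, "dbpedia:Paul_Newman"),
    ("nyt:newman_paul_per", SAME_AS, "freebase:en.paul_newman"),
    ("nyt:newman_paul_per", "nyt:first_use", "DATE"),
    ("dbpedia:Paul_Newman", RDF_TYPE, "foaf:Person"),
    ("dbpedia:Paul_Newman", RDF_TYPE, "dbpedia-owl:Actor"),
}

closure = apply_same_as(triples)
print(("nyt:newman_paul_per", RDF_TYPE, "foaf:Person") in closure)  # True
print(("dbpedia:Paul_Newman", "nyt:first_use", "DATE") in closure)  # True
```

<p>Both checks print <code>True</code>: the NYT record picks up the DBpedia person types, and the DBpedia person picks up an article-usage attribute.</p>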
<h4>Some Options</h4>
<p>However, there are solutions. First, the <span style="font-weight: bold;">sameAs</span> assertions, at least involving these external resources, should be dropped.</p>
<p>Second, if linkages are still desired, a vocabulary such as <a href="http://umbel.org">UMBEL</a> <a href="#ld4">[4]</a> could be used to make an assertion between such a concept and these other related resources. So, even though these resources are not the same, they are <strong>closely</strong> related. The UMBEL ontology helps us to define this kind of relation between related, but non-identical, resources.</p>
<p>Instead of the <span style="font-weight: bold;">owl:sameAs</span> property, we would suggest using <span style="font-weight: bold;">umbel:linksEntity</span>, which links a <span style="font-weight: bold;">skos:Concept</span> to related named-entity resources. Additionally, Freebase, which also currently asserts a <span style="font-weight: bold;">sameAs</span> relationship to the NYT resource, could use the <span style="font-weight: bold;">umbel:isAbout</span> relationship to assert that its resource &#8220;is about&#8221; a certain concept, namely the one defined by the NYT.</p>
<p>Alternatively, still other external vocabularies that more precisely capture the intent of the NYT publishers could be found, or the NYT editors could define their own properties specifically addressing their unique linkage interests. </p>
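<p>As a rough sketch of the suggested fix, the following Python function rewrites <span style="font-weight: bold;">owl:sameAs</span> links that involve an NYT concept into the weaker UMBEL relations described above. Triples are plain tuples of prefixed names; recognizing NYT concepts by a <code>nyt:</code> prefix, and typing them as <span style="font-weight: bold;">skos:Concept</span>, are assumptions made for illustration.</p>

```python
SAME_AS = "owl:sameAs"
LINKS_ENTITY = "umbel:linksEntity"  # skos:Concept -> related named entity
IS_ABOUT = "umbel:isAbout"          # resource -> concept it is about


def relink(triples, concept_prefix="nyt:"):
    """Replace owl:sameAs links that touch a concept with UMBEL relations.

    Concepts are recognized by a URI prefix (an assumption of this sketch).
    """
    result = set()
    for s, p, o in triples:
        if p != SAME_AS:
            result.add((s, p, o))
        elif s.startswith(concept_prefix):
            # The NYT record is a concept that links to a named entity.
            result.add((s, "rdf:type", "skos:Concept"))
            result.add((s, LINKS_ENTITY, o))
        elif o.startswith(concept_prefix):
            # An external entity (e.g. Freebase) is about the NYT concept.
            result.add((s, IS_ABOUT, o))
        else:
            # Identity links that do not involve a concept are left alone.
            result.add((s, p, o))
    return result


triples = {
    ("nyt:newman_paul_per", SAME_AS, "dbpedia:Paul_Newman"),
    ("freebase:en.paul_newman", SAME_AS, "nyt:newman_paul_per"),
}
for triple in sorted(relink(triples)):
    print(triple)
```

<p>Because neither UMBEL relation is an identity relation, a reasoner no longer merges the article topic with the person.</p>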
<h4>Other Minor Issues</h4>
<p>We also have a couple of additional, minor suggestions for the NYT dataset:</p>
<ul>
<li>Create a <span style="font-weight: bold;">foaf:Organization</span> description of the NYT organization, then use it with <span style="font-weight: bold;">dc:creator</span> and <span style="font-weight: bold;">dcterms:rightsHolder</span> rather than using a literal, and</li>
<li>The dual URIs such as &#8220;<a href="http://data.nytimes.com/N31738445835662083893">http://data.nytimes.com/N31738445835662083893</a>&#8221; and &#8220;<a href="http://data.nytimes.com/newman_paul_per">http://data.nytimes.com/newman_paul_per</a>&#8221; are not wrong in themselves, but their purpose is hard to understand. Why does a single organization need to create multiple resources for the <strong class="moz-txt-star">identical resource</strong>, when it comes from the same system and has the same purpose?</li>
</ul>
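<p>A minimal Python sketch of the first suggestion follows. The organization URI and the literal publisher name are hypothetical, and because triples are modelled as plain string tuples, the sketch does not distinguish literals from URIs the way a real RDF library would.</p>

```python
ORG_PREDICATES = {"dc:creator", "dcterms:rightsHolder"}


def use_organization_resource(triples, literal, org_uri):
    """Describe the publisher once as a foaf:Organization and point
    dc:creator / dcterms:rightsHolder at that resource instead of a literal."""
    result = {
        (org_uri, "rdf:type", "foaf:Organization"),
        (org_uri, "foaf:name", literal),
    }
    for s, p, o in triples:
        if p in ORG_PREDICATES and o == literal:
            result.add((s, p, org_uri))
        else:
            result.add((s, p, o))
    return result


# Hypothetical literal and organization URI, for illustration only.
NAME = "The New York Times Company"
ORG = "http://example.org/organization/nyt"
record = "http://data.nytimes.com/newman_paul_per"
triples = {(record, "dc:creator", NAME), (record, "dcterms:rightsHolder", NAME)}

for triple in sorted(use_organization_resource(triples, NAME, ORG)):
    print(triple)
```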
<h4>Re-visiting the Linkage &#8220;Rule&#8221;</h4>
<p>There are very valuable benefits from entailment, inference and logic to be gained from linking resources. However, if the nature of the &#8220;things&#8221; being linked — or the properties that define these linkages — are incorrect, then very wrong logical implications result. Great care and understanding should be applied to linkage assertions.</p>
<h3>In the End, the Challenge is Not Linked Data, but <span style="font-style: italic; text-decoration: underline;">Connected</span> Data</h3>
<p>Our critical comments are not meant to be disrespectful, and we are not being picky. The NYT and TWC are prominent institutions from which we should expect leadership on these issues. Our criticisms (and, we believe, those of others) are also not an expression of a &#8220;<a href="http://en.wikipedia.org/wiki/Hype_cycle">trough of disillusionment</a>&#8221; as <a href="http://twitter.com/gregboutin/status/5558525462">some</a> have been pointing out.</p>
<p>This posting is about poor practices, pure and simple. The time to correct them is now. If asked, we would be pleased to help either institution establish exemplar practices. This is not automatic, and it is not always easy. The data.gov datasets, in particular, will require much time and effort to get right. There is much documentation that needs to be transitioned and expressed in semantic Web formats.</p>
<p>In a broader sense, we also seem to lack a definition of best practices related to <span style="font-weight: bold;">vocabularies</span>, <span style="font-weight: bold;">schema</span> and <span style="font-weight: bold;">mappings</span>. The Berners-Lee rules are imprecise and insufficient as is. Prior guidance documents tend to focus more on how to publish data and make URIs linkable than on how to properly characterize, describe and connect the data.</p>
<p>Perhaps, in part, this is a bit of a semantics issue. The challenge is not the mechanics of <span style="font-style: italic;">linking data</span>, but the meaning and basis for <span class="double_u">connecting</span> that data. Connections require logic and rationality sufficient to reliably inform inference and rule-based engines. They also need to pass the sniff test as we &#8220;follow our nose&#8221; by clicking the links exposed by the data.</p>
<p>It is exciting to see high-quality content such as from national governments and major publishers like the New York Times begin to be exposed as linked data. When this content finally gets embedded into usable contexts, we should see manifest uses and benefits emerge. We hope both institutions take our criticisms in that spirit.</p>
<div style="background-color: #ffffcc;border: 1px dotted yellow;margin: 15px 60px;padding: 8px;vertical-align: middle;margin: 0pt 0pt 0pt 10px;  width: 300px; text-align: center;">This posting has been jointly authored by <a href="http://mkbergman.com"> Mike Bergman</a> and <a href="http://fgiasson.com/blog">Fred Giasson</a> and simultaneously published on both of their blogs, hoping to draw more attention to the need for better practices in publishing linked data.</div>
<hr style="margin: 15px 0px;" size="1" />
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ld1" name="ld1"></a> [1] The NYT dataset has since been updated, and multiple issues from the first release have been fixed. The problems listed herein, however, still pertain after these improvements.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ld2" name="ld2"></a> [2] Tim Berners-Lee, 2006. Linked Data (Design Issues), first posted on 2006-07-27; last updated on 2009-06-18. See <a href="http://www.w3.org/DesignIssues/LinkedData.html">http://www.w3.org/DesignIssues/LinkedData.html</a>. Berners-Lee refers to the steps above as &#8220;rules,&#8221; but he elaborates they are expectations of behavior. Most later citations refer to these as &#8220;principles.&#8221;</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ld3" name="ld3"></a> [3] Li Ding, Dominic DiFranzo, Sarah Magidson, Deborah L. McGuinness and Jim Hendler, 2009. Data-GovWiki: Towards Linked Government Data. See <a href="http://www.cs.vu.nl/%7Epmika/swc/documents/Data-gov%20Wiki-data-gov-wiki-v1.pdf">http://www.cs.vu.nl/~pmika/swc/documents/Data-gov%20Wiki-data-gov-wiki-v1.pdf</a>.</div>
<div style="margin: 10px 0pt; font-size: 90%;"><a id="ld4" name="ld4"></a> [4] UMBEL <em>(Upper Mapping and Binding Exchange Layer)</em> is a lightweight ontology structure in development for relating Web content and data to a standard set of subject concepts. Its purpose has led to the creation of an associated vocabulary geared to both class-instance and reciprocal relationships, as well as partial or likelihood relationships. See <a href="http://umbel.org/technical_documentation.html#vocabulary">http://umbel.org/technical_documentation.html#vocabulary</a>.</div>
]]></content:encoded>
			<wfw:commentRss>http://fgiasson.com/blog/index.php/2009/11/16/when-linked-data-rules-fail/feed/</wfw:commentRss>
		<slash:comments>32</slash:comments>
		</item>
		<item>
		<title>commON and irJSON PHP parsers released</title>
		<link>http://fgiasson.com/blog/index.php/2009/10/20/common-and-irjson-php-parsers-released/</link>
		<comments>http://fgiasson.com/blog/index.php/2009/10/20/common-and-irjson-php-parsers-released/#comments</comments>
		<pubDate>Tue, 20 Oct 2009 21:15:45 +0000</pubDate>
		<dc:creator>Fred</dc:creator>
				<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[irON]]></category>

		<guid isPermaLink="false">http://fgiasson.com/blog/?p=986</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=commON and irJSON PHP parsers released&amp;rft.aulast=Giasson&amp;rft.aufirst=Frédérick&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=irON&amp;rft.source=Frederick Giasson&#8217;s Weblog&amp;rft.date=2009-10-20&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://fgiasson.com/blog/index.php/2009/10/20/common-and-irjson-php-parsers-released/&amp;rft.language=English"></span>
Two days ago we released irON: Instance Record and Object Notation (irON) Specification. irON is a new notation that has been created to describe instance records. irON records can be serialized in 3 different formats: irXML (XML), irJSON (JSON) and commON (CSV: mainly for spreadsheet manipulations).
The release of irON has already been covered at length [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=commON and irJSON PHP parsers released&amp;rft.aulast=Giasson&amp;rft.aufirst=Frédérick&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=irON&amp;rft.source=Frederick Giasson&#8217;s Weblog&amp;rft.date=2009-10-20&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://fgiasson.com/blog/index.php/2009/10/20/common-and-irjson-php-parsers-released/&amp;rft.language=English"></span>
<p><img class="size-full wp-image-988 alignleft" title="iron_logo_235" src="http://fgiasson.com/blog/wp-content/uploads/2009/10/iron_logo_235.png" alt="iron_logo_235" width="99" height="53" />Two days ago <a href="http://structureddynamics.com">we</a> released irON: <a href="http://openstructs.org/iron/iron-specification">Instance Record and Object Notation (irON) Specification</a>. irON is a new notation that has been created to describe instance records. irON records can be serialized in 3 different formats: <a href="http://openstructs.org/iron/iron-specification#mozTocId408837">irXML</a> (XML), <a href="http://openstructs.org/iron/iron-specification#mozTocId462570">irJSON</a> (JSON) and <a href="http://openstructs.org/iron/iron-specification#mozTocId603499">commON</a> (CSV: mainly for spreadsheet manipulations).</p>
<p>The release of irON has already been covered at length on <a href="http://www.mkbergman.com/838/iron-semantic-web-for-mere-mortals/">Mike&#8217;s blog</a> and in <a href="http://structureddynamics.com/pr20091018.html">Structured Dynamics&#8217; press room</a>, so I won&#8217;t talk more about it here.</p>
<h3>irON Parsers</h3>
<p>Today I am happy to release the first two parsers that can be used to parse and validate irON datasets of instance records: one for irJSON and one for commON. Both parsers have been developed in PHP and are available under the <a href="http://www.apache.org/licenses/LICENSE-2.0.html">Apache 2 license</a>. Now, let&#8217;s take a look at each of them.</p>
<h3>irJSON Parser</h3>
<p style="text-align: left;">The irJSON parser package can be <a href="http://code.google.com/p/iron-notation/downloads/list">downloaded here</a>. Additionally, the source code can be <a href="http://code.google.com/p/iron-notation/source/browse/#svn/trunk/irJSON">browsed here</a>.</p>
<p>First of all, to understand the code, you have to understand the <a href="http://openstructs.org/iron/iron-specification#mozTocId462570">specification of the irJSON serialization</a>.</p>
<p>The irJSON parser package contains everything you need to test and use the parser. The package is composed of the following files:</p>
<ul>
<li>test.php &#8211; If you want to quick-start with this package, just run this test.php script and you will have an idea of what it can do for you. This script runs the parser over an irJSON test file and shows you some validation errors along with the internal parsed structure of the file. From there, you can use the irJSONParser class and the structure it returns to do whatever you need: adding the information to your database, converting the data to another format, etc.</li>
<li>irJSONParser.php &#8211; This is the irJSON parser class. It parses the irJSON file and populates its internal structure, which is composed of instances of the classes below.</li>
<li>Dataset.php &#8211; This class defines a Dataset record with all its attributes. It is one of the objects, produced by the parser, that the developer manipulates.</li>
<li>InstanceRecord.php &#8211; This class defines an Instance Record with all its attributes. It is one of the objects, produced by the parser, that the developer manipulates.</li>
<li>StructureSchema.php &#8211; This class defines a Structure Schema record with all its attributes. It is one of the objects, produced by the parser, that the developer manipulates.</li>
<li>LinkageSchema.php &#8211; This class defines a Linkage Schema record with all its attributes. It is one of the objects, produced by the parser, that the developer manipulates.</li>
</ul>
<p>The irJSON parser also validates the incoming irJSON files according to these three levels of validation:</p>
<ol>
<li>JSON well-formedness validation &#8211; The first validation test occurs on the JSON serialization itself. A JSON file has to be well-formed in order to be processed. A problem at this level raises an error to the user.</li>
<li>irJSON well-formedness validation &#8211; Once the JSON is parsed and known to be well-formed, the parser makes sure that the file is irJSON well-formed. If it is not well-formed according to the irJSON spec, an error is raised to the user.</li>
<li>Structure Schema validation &#8211; The last validation checks instance records against their related Structure Schema (if available). If a validation problem happens at this level, a notice is raised to the user.</li>
</ol>
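<p>The three validation levels can be sketched in Python as follows. This is not a port of the PHP parser: the irJSON rules are deliberately simplified and assumed for illustration (a top-level <code>dataset</code> object and an optional <code>recordList</code> array whose records each carry an <code>id</code>), and the structure schema is modelled as a hypothetical mapping from attribute names to expected Python types. Levels 1 and 2 raise errors; level 3 only collects notices.</p>

```python
import json


class IrJsonError(Exception):
    """Fatal problem: the file cannot be processed (levels 1 and 2)."""


def validate(text, schema=None):
    """Return (document, notices) or raise IrJsonError.

    `schema` is a hypothetical stand-in for a Structure Schema: a dict
    mapping attribute names to the Python type their values should have.
    """
    # Level 1: JSON well-formedness.
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        raise IrJsonError(f"JSON is not well-formed: {exc}") from exc

    # Level 2: irJSON well-formedness (simplified, assumed rules).
    if not isinstance(doc, dict) or not isinstance(doc.get("dataset"), dict):
        raise IrJsonError("a 'dataset' object is required")
    records = doc.get("recordList", [])
    if not isinstance(records, list):
        raise IrJsonError("'recordList' must be an array")
    for index, record in enumerate(records):
        if not isinstance(record, dict) or "id" not in record:
            raise IrJsonError(f"record #{index} has no 'id'")

    # Level 3: Structure Schema validation only produces notices.
    notices = []
    for record in records:
        for attribute, expected in (schema or {}).items():
            value = record.get(attribute)
            if value is not None and not isinstance(value, expected):
                notices.append(
                    f"{record['id']}: '{attribute}' is not {expected.__name__}"
                )
    return doc, notices


text = '{"dataset": {"id": "demo"}, "recordList": [{"id": "r1", "name": 42}]}'
doc, notices = validate(text, schema={"name": str})
print(notices)  # ["r1: 'name' is not str"]
```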
<p>You can experiment with some of these validation errors and notices by running the test.php script in the package.</p>
<p>With this package, developers can already start to parse irJSON files and to integrate them with some of their prototype projects.</p>
<h3>commON Parser</h3>
<p>The commON parser package can be <a href="http://code.google.com/p/iron-notation/downloads/list">downloaded here</a>. Additionally, the source code can be <a href="http://code.google.com/p/iron-notation/source/browse/#svn/trunk/commON">browsed here</a>.</p>
<p>To understand the code, you have to understand the <a href="http://openstructs.org/iron/iron-specification#mozTocId603499">specification of the commON serialization</a>.</p>
<p>The commON parser package contains everything you need to test the parser. The package is composed of the following files:</p>
<ul>
<li>test.php &#8211; If you want to quick-start with this package, just run this test.php script and you will have an idea of what it can do for you. This script runs the parser over a file and shows you some validation errors along with the internal parsed structure of the file. From there, you can use the CommonParser class and the structure it returns to do whatever you need: adding the information to your database, converting the data to another format, etc.</li>
<li>CommonParser.php &#8211; This is the commON parser class. It parses the commON file and populates its internal structure, which is described in the code.</li>
</ul>
<p>The commON parser also validates the incoming commON files according to these two levels:</p>
<ol>
<li>CSV well-formedness validation &#8211; The first validation test occurs on the <a href="http://www.rfc-editor.org/rfc/rfc4180.txt">CSV</a> serialization itself. A CSV file has to be well-formed in order to be processed. A problem at this level raises an error to the user.</li>
<li>commON well-formedness validation &#8211; Once the CSV is parsed and known to be well-formed, the parser makes sure that the file is commON well-formed. If it is not well-formed according to the commON spec, an error is raised to the user.</li>
</ol>
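<p>The two commON validation levels can be sketched in Python as follows. This is not a port of CommonParser.php: it relies on the standard <code>csv</code> module in strict mode for level 1, and for level 2 it assumes a simplified reading of the commON spec, namely a <code>&amp;&amp;recordList</code> marker row followed by a header row of <code>&amp;</code>-prefixed attribute names that includes <code>&amp;id</code>.</p>

```python
import csv
import io


class CommonError(Exception):
    """Raised when a file fails either validation level."""


def parse_common(text):
    """Validate a commON document and return its records as dicts."""
    # Level 1: CSV well-formedness. strict=True makes the csv module raise
    # csv.Error on malformed quoting instead of silently guessing.
    try:
        rows = [row for row in csv.reader(io.StringIO(text), strict=True) if row]
    except csv.Error as exc:
        raise CommonError(f"CSV is not well-formed: {exc}") from exc

    # Level 2: commON well-formedness (simplified, assumed rules).
    if not rows or rows[0][0] != "&&recordList":
        raise CommonError("expected a '&&recordList' marker row")
    if len(rows) == 1:
        raise CommonError("missing the attribute header row")
    header = rows[1]
    if "&id" not in header or not all(cell.startswith("&") for cell in header):
        raise CommonError("header cells must start with '&' and include '&id'")
    records = []
    for number, row in enumerate(rows[2:], start=1):
        if len(row) != len(header):
            raise CommonError(
                f"record {number} has {len(row)} cells, expected {len(header)}"
            )
        # Strip the leading '&' to get plain attribute names.
        records.append({name[1:]: value for name, value in zip(header, row)})
    return records


text = '&&recordList\n&id,&type,&name\nr1,Person,"Newman, Paul"\n'
print(parse_common(text))
# [{'id': 'r1', 'type': 'Person', 'name': 'Newman, Paul'}]
```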
<p>You can experiment with some of these validation errors and notices by running the test.php script in the package.</p>
<p>With this package, developers can already start parsing commON files and integrating them into some of their prototype projects.</p>
<p>The commON parser is less advanced than the irJSON one. For example, the &#8220;dataset&#8221; and &#8220;schema&#8221; processor keywords are not yet implemented, and other keywords haven&#8217;t been integrated yet either. Take a look at the source code to see what is currently missing.</p>
<p>In any case, a lot of things can already be done with this parser. In the coming weeks we will publish specific commON use cases that show how we are using commON internally and how we expect our customers to use it to create and maintain different smaller datasets.</p>
<h3><strong>Conclusion</strong></h3>
<p>These are the first versions of the irJSON and commON parsers. We will continue developing them so that they fully reflect the current and future irON specification. We also have yet to write the irXML parser.</p>
<p>I would encourage you to report any issues with these parsers, or any enhancement suggestions, <a href="http://code.google.com/p/iron-notation/issues/list">on this issue tracker</a>.</p>
<p>All discussions regarding these parsers and the irON specification document should happen on the <a href="http://groups.google.com/group/iron-notation?pli=1">irON group mailing list here</a>.</p>
<p>Finally, another step for us will be to embed these parsers in converter web services for <a href="http://openstructs.org/structwsf/">structWSF</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://fgiasson.com/blog/index.php/2009/10/20/common-and-irjson-php-parsers-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A New Home for UMBEL Web Services</title>
		<link>http://fgiasson.com/blog/index.php/2009/09/18/a-new-home-for-umbel-web-services/</link>
		<comments>http://fgiasson.com/blog/index.php/2009/09/18/a-new-home-for-umbel-web-services/#comments</comments>
		<pubDate>Fri, 18 Sep 2009 21:27:46 +0000</pubDate>
		<dc:creator>Fred</dc:creator>
				<category><![CDATA[Ping the Semantic Web]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[UMBEL]]></category>

		<guid isPermaLink="false">http://fgiasson.com/blog/?p=974</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=A New Home for UMBEL Web Services&amp;rft.aulast=Giasson&amp;rft.aufirst=Frédérick&amp;rft.subject=Ping the Semantic Web&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=UMBEL&amp;rft.source=Frederick Giasson&#8217;s Weblog&amp;rft.date=2009-09-18&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://fgiasson.com/blog/index.php/2009/09/18/a-new-home-for-umbel-web-services/&amp;rft.language=English"></span>
Eight months ago we announced the dissolution of Zitgist LLC. This event led to the creation of a &#8220;sandbox&#8221; to keep alive all the online assets of the company. Since this sandbox server was not owned by Structured Dynamics, it was becoming hard for us to update UMBEL and its online services. It is why [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=A New Home for UMBEL Web Services&amp;rft.aulast=Giasson&amp;rft.aufirst=Frédérick&amp;rft.subject=Ping the Semantic Web&amp;rft.subject=Semantic Web&amp;rft.subject=Structured Dynamics&amp;rft.subject=UMBEL&amp;rft.source=Frederick Giasson&#8217;s Weblog&amp;rft.date=2009-09-18&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://fgiasson.com/blog/index.php/2009/09/18/a-new-home-for-umbel-web-services/&amp;rft.language=English"></span>
<p><img class="alignleft size-full wp-image-916" title="umbel_ws" src="http://fgiasson.com/blog/wp-content/uploads/2008/10/umbel_ws.png" alt="umbel_ws" width="170" height="74" />Eight months ago we announced the dissolution of Zitgist LLC. This event led to the creation of a &#8220;sandbox&#8221; to keep all the online assets of the company alive. Since this sandbox server was not owned by <a href="http://structureddynamics.com/">Structured Dynamics</a>, it was becoming hard for us to update UMBEL and its online services. This is why we took the time to move the services to our new servers.</p>
<h3>A New Home</h3>
<p><img class="alignright size-full wp-image-920" title="sd_logo_260" src="http://fgiasson.com/blog/wp-content/uploads/2009/01/sd_logo_260.png" alt="sd_logo_260" width="260" height="60" />Structured Dynamics LLC now hosts a new version of the UMBEL Web services. From the main menu at the <a href="http://structureddynamics.com/">SD Web site</a> you can access these services under the &#8220;<a href="http://structureddynamics.com/umbel_ws/index.php">umbel ws</a>&#8221; menu option (you can also bookmark the Web services site at <a href="http://umbel.structureddynamics.com/">umbel.structureddynamics.com</a> or <a href="http://ws.umbel.org/">ws.umbel.org</a>).</p>
<p>Moving UMBEL&#8217;s Web services to a new home will make future upgrades of UMBEL easier, and will make the Web service endpoints easier to maintain as well. With this move, I am pleased to announce the release of five initial Web services and one visualization tool:</p>
<p><strong>Lookup Web Services:</strong></p>
<ul>
<li><a href="http://ws.umbel.org/finder_subject_concept.php">Finder: Subject Concept</a></li>
<li><a href="http://ws.umbel.org/reporter_subject_concept.php">Reporter: Subject Concept</a></li>
</ul>
<p><strong>Inference Engine Web Services:</strong></p>
<ul>
<li><a href="http://ws.umbel.org/inference_lister.php">Inference: Lister &#8211; list sub-classes, super-classes and equivalent-classes</a></li>
<li><a href="http://ws.umbel.org/inference_validator.php">Inference: Validator &#8211; verify sub-class, super-class and equivalent-class relationships</a></li>
</ul>
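<p>The Lister and Validator services answer sub-class, super-class and equivalent-class questions. A minimal Python sketch of that kind of inference (not the actual UMBEL service code) is a transitive closure in each direction over a hypothetical set of direct <code>subClassOf</code> pairs; here equivalence is approximated as two classes being sub-classes of each other, whereas a real engine would also use explicit <span style="font-weight: bold;">owl:equivalentClass</span> assertions.</p>

```python
from collections import defaultdict


def reachable(start, edges):
    """Every class reachable from `start` by following `edges`, excluding `start`."""
    seen, stack = set(), list(edges.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, ()))
    seen.discard(start)
    return seen


class InferenceIndex:
    """Answers lister/validator-style questions over direct subClassOf pairs."""

    def __init__(self, subclass_pairs):
        self.up = defaultdict(set)    # class -> direct super-classes
        self.down = defaultdict(set)  # class -> direct sub-classes
        for sub, sup in subclass_pairs:
            self.up[sub].add(sup)
            self.down[sup].add(sub)

    def super_classes(self, cls):
        return reachable(cls, self.up)

    def sub_classes(self, cls):
        return reachable(cls, self.down)

    def equivalent_classes(self, cls):
        # Approximation: classes that are both above and below `cls`.
        return self.super_classes(cls) & self.sub_classes(cls)

    def is_sub_class_of(self, sub, sup):
        return sub == sup or sup in self.super_classes(sub)


# Hypothetical hierarchy, not actual UMBEL subject concepts.
index = InferenceIndex([
    ("Actor", "Artist"), ("Artist", "Person"), ("Person", "Agent"),
    ("Person", "Human"), ("Human", "Person"),
])
print(sorted(index.super_classes("Actor")))     # ['Agent', 'Artist', 'Human', 'Person']
print(index.equivalent_classes("Person"))       # {'Human'}
print(index.is_sub_class_of("Agent", "Actor"))  # False
```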
<p><strong>SPARQL endpoint Web Service:</strong></p>
<ul>
<li><a href="http://ws.umbel.org/sparql.php">SPARQL Endpoint</a></li>
</ul>
<p><strong>Visual Tool:</strong></p>
<ul>
<li><a href="http://ws.umbel.org/explorer.php">Subject Concept Explorer</a></li>
</ul>
<p><em>Note that the visual tool is using <a href="http://moritz.stefaner.eu/projects/relation-browser/">Moritz Stefaner&#8217;s Relation Browser</a>.</em></p>
<h3>Ping the Semantic Web</h3>
<p><img class="alignright size-full wp-image-832" title="ptswlogo160.gif" src="http://fgiasson.com/blog/wp-content/uploads/2007/08/ptswlogo160.gif" alt="ptswlogo160.gif" width="160" height="90" />Additionally, the <a href="http://pingthesemanticweb.com">Ping the Semantic Web</a> RDF pinging service is now the property of <a href="http://openlinksw.com">OpenLink Software Inc.</a> OpenLink is now hosting, maintaining and developing the service.</p>
]]></content:encoded>
			<wfw:commentRss>http://fgiasson.com/blog/index.php/2009/09/18/a-new-home-for-umbel-web-services/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>New release of UMBEL: v072</title>
		<link>http://fgiasson.com/blog/index.php/2009/08/21/new-release-of-umbel-v072/</link>
		<comments>http://fgiasson.com/blog/index.php/2009/08/21/new-release-of-umbel-v072/#comments</comments>
		<pubDate>Fri, 21 Aug 2009 18:49:49 +0000</pubDate>
		<dc:creator>Fred</dc:creator>
				<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[UMBEL]]></category>

		<guid isPermaLink="false">http://fgiasson.com/blog/?p=967</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=New release of UMBEL: v072&amp;rft.aulast=Giasson&amp;rft.aufirst=Frédérick&amp;rft.subject=Semantic Web&amp;rft.subject=UMBEL&amp;rft.source=Frederick Giasson&#8217;s Weblog&amp;rft.date=2009-08-21&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://fgiasson.com/blog/index.php/2009/08/21/new-release-of-umbel-v072/&amp;rft.language=English"></span>
I am pleased to announce that we have resumed our work on UMBEL. We just released version v0.72, which is based on OpenCyc version 2009-01-31. This new version is intermediary and has been created mostly to check the evolution of OpenCyc vis-à-vis UMBEL. Within the next month or so, we will release a new [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=New release of UMBEL: v072&amp;rft.aulast=Giasson&amp;rft.aufirst=Frédérick&amp;rft.subject=Semantic Web&amp;rft.subject=UMBEL&amp;rft.source=Frederick Giasson&#8217;s Weblog&amp;rft.date=2009-08-21&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://fgiasson.com/blog/index.php/2009/08/21/new-release-of-umbel-v072/&amp;rft.language=English"></span>
<p style="text-align: left; "><img class="alignright size-full wp-image-825" title="umbel_medium.png" src="http://fgiasson.com/blog/wp-content/uploads/2007/07/umbel_medium.png" alt="umbel_medium.png" width="206" height="100" />I am pleased to announce that we have resumed our work on <a href="http://umbel.org">UMBEL</a>. We just released version <a href="http://umbel.org/documentation.html">v0.72</a>, which is based on <a href="http://opencyc.org">OpenCyc</a> version 2009-01-31. This new version is intermediary and has been created mostly to check the evolution of OpenCyc vis-à-vis UMBEL. Within the next month or so, we will release a new version (v0.80), which will introduce a major new concept that should help systems and users manipulate the entire UMBEL Subject Concepts structure.</p>
<p>For those who want to know what changed between versions v071 and v072, <a href="http://umbel.org/ontology/umbel_v071_v072_difference.csv">here is a CSV file that lists all the changes between the two versions</a>. There are four columns: (1) source node, (2) attribute, (3) target node and (4) version number. The file lists all triples that are present in one version but not in the other, so it contains all changes (nodes &amp; arcs) between the two versions. Almost all of the changes come from internal changes to OpenCyc. We did fix a couple of things, such as removing cycles in the graph, but 99% of the changes come from changes within OpenCyc.</p>
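<p>A difference file of this shape can be produced with a set difference over the triples of each version. In this Python sketch, the concept names are hypothetical and the absence of a header row is an assumption; each output row has the four columns listed above.</p>

```python
import csv
import io


def version_diff(old, new, old_label, new_label):
    """Rows of (source node, attribute, target node, version) for every
    triple that is present in one version but not in the other."""
    rows = [(*triple, old_label) for triple in sorted(old - new)]
    rows += [(*triple, new_label) for triple in sorted(new - old)]
    return rows


def write_diff_csv(rows, stream):
    """Write the four-column rows as CSV (no header row assumed)."""
    csv.writer(stream).writerows(rows)


# Hypothetical triples standing in for two UMBEL versions.
v071 = {
    ("umbel:Actor", "rdfs:subClassOf", "umbel:Artist"),
    ("umbel:Artist", "rdfs:subClassOf", "umbel:Person"),
}
v072 = {
    ("umbel:Actor", "rdfs:subClassOf", "umbel:Performer"),
    ("umbel:Artist", "rdfs:subClassOf", "umbel:Person"),
}

rows = version_diff(v071, v072, "v071", "v072")
out = io.StringIO()
write_diff_csv(rows, out)
print(out.getvalue())
```

<p>When writing to a real file instead of a <code>StringIO</code>, open it with <code>newline=""</code> so the <code>csv</code> module controls the line endings.</p>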
<p>Finally, note that the web service endpoints will be updated with this new version of the UMBEL subject concepts in the coming week, along with the dereferencing of their URIs. Stay tuned!</p>
]]></content:encoded>
			<wfw:commentRss>http://fgiasson.com/blog/index.php/2009/08/21/new-release-of-umbel-v072/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>structWSF Early Querying Metrics</title>
		<link>http://fgiasson.com/blog/index.php/2009/08/18/structwsf-early-querying-metrics/</link>
		<comments>http://fgiasson.com/blog/index.php/2009/08/18/structwsf-early-querying-metrics/#comments</comments>
		<pubDate>Tue, 18 Aug 2009 21:04:12 +0000</pubDate>
		<dc:creator>Fred</dc:creator>
				<category><![CDATA[Structured Dynamics]]></category>
		<category><![CDATA[structWSF]]></category>

		<guid isPermaLink="false">http://fgiasson.com/blog/?p=949</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=structWSF Early Querying Metrics&amp;rft.aulast=Giasson&amp;rft.aufirst=Frédérick&amp;rft.subject=Structured Dynamics&amp;rft.subject=structWSF&amp;rft.source=Frederick Giasson&#8217;s Weblog&amp;rft.date=2009-08-18&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://fgiasson.com/blog/index.php/2009/08/18/structwsf-early-querying-metrics/&amp;rft.language=English"></span>
We have been running several structWSF instances for about two months now. Each instance hosts different datasets that are queried for different purposes. I think it is worth taking some time to start analyzing the query statistics of two of these instances, which run the early alpha version of structWSF.
The goal is to create some [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=structWSF Early Querying Metrics&amp;rft.aulast=Giasson&amp;rft.aufirst=Frédérick&amp;rft.subject=Structured Dynamics&amp;rft.subject=structWSF&amp;rft.source=Frederick Giasson&#8217;s Weblog&amp;rft.date=2009-08-18&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://fgiasson.com/blog/index.php/2009/08/18/structwsf-early-querying-metrics/&amp;rft.language=English"></span>
<p>We have been running several structWSF instances for about two months now. Each instance hosts different datasets that are queried for different purposes. I think it is worth taking some time to start analyzing the query statistics of two of these instances, which run the early alpha version of structWSF.</p>
<p>The goal is to create checkpoints that we can use in the future to see whether the system has improved or deteriorated. We also want to see which metrics we can derive from the current logging system, and whether we can find any bottlenecks or issues with any of the endpoints.</p>
<p>The data used to analyze instance A spans from 2009-06-08 at 7:16:38 to 2009-08-18 at 12:28:37.</p>
<p>The data used to analyze instance B spans from 2009-05-20 at 1:46:31 to 2009-08-18 at 12:40:28.</p>
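<p>The two logging windows differ in length, so a rough way to compare them is to compute each window&#8217;s duration and its mean number of queries per day. This is a minimal Python sketch: the timestamps are the ones quoted above, the query totals are taken from the summary table of each instance, and <code>window_stats</code> is a hypothetical helper.</p>

```python
from datetime import datetime


def window_stats(start, end, n_queries):
    """Return the length of a logging window and its mean number of queries per day."""
    fmt = "%Y-%m-%d %H:%M:%S"
    span = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    days = span.total_seconds() / 86400  # seconds in a day
    return span, n_queries / days


# Time spans quoted above; query totals come from each instance's summary table.
span_a, rate_a = window_stats("2009-06-08 07:16:38", "2009-08-18 12:28:37", 27956)
span_b, rate_b = window_stats("2009-05-20 01:46:31", "2009-08-18 12:40:28", 37575)
print(span_a, round(rate_a))  # 71 days, 5:11:59 393
print(span_b, round(rate_b))  # 90 days, 10:53:57 415
```

<p>By this measure, the two instances received a comparable daily volume of queries (roughly 393 versus 415 per day).</p>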
<h3>structWSF Instance A</h3>
<p>Instance A has only one dataset, with about 1,000 instance records. As the table below shows, the average time per query to this instance, across all web service endpoints, is about 218 milliseconds.</p>
<table border="0">
<tbody>
<tr style="border: 1px solid">
<td style="border: 1px solid"><strong><span class="rescolname">Number of queries</span></strong></td>
<td style="border: 1px solid"><strong><span class="rescolname">Average time for each query in seconds</span></strong></td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">27956</td>
<td style="border: 1px solid">0.218252857656909</td>
</tr>
</tbody>
</table>
<p>The table below gives the total number of queries sent to each web service endpoint, along with the average time per query for each endpoint.</p>
<table class="listing" border="0">
<tbody>
<tr>
<td class="restitle" colspan="5"></td>
</tr>
<tr>
<td style="border: 1px solid"><strong><span class="rescolname">Web Service</span></strong></td>
<td style="border: 1px solid"><strong>Number of queries</strong></td>
<td style="border: 1px solid"><strong>Average time for each query in seconds</strong></td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">dataset_create</td>
<td style="border: 1px solid">265</td>
<td style="border: 1px solid">0.126993534699919</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">converter/tsv</td>
<td style="border: 1px solid">48</td>
<td style="border: 1px solid">0.128808428843714</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">dataset_update</td>
<td style="border: 1px solid">17</td>
<td style="border: 1px solid">0.140141641392576</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">dataset_read</td>
<td style="border: 1px solid">11780</td>
<td style="border: 1px solid">0.144073766884864</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">auth_registrar_access</td>
<td style="border: 1px solid">883</td>
<td style="border: 1px solid">0.145781793788779</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">converter/bibtex</td>
<td style="border: 1px solid">49</td>
<td style="border: 1px solid">0.149710825511323</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">auth_lister</td>
<td style="border: 1px solid">1970</td>
<td style="border: 1px solid">0.159979685066925</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">search</td>
<td style="border: 1px solid">1397</td>
<td style="border: 1px solid">0.180938945980523</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">browse</td>
<td style="border: 1px solid">8949</td>
<td style="border: 1px solid">0.199636802392004</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">crud_read</td>
<td style="border: 1px solid">638</td>
<td style="border: 1px solid">0.241032384406063</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">dataset_delete</td>
<td style="border: 1px solid">263</td>
<td style="border: 1px solid">0.420157149717388</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">crud_delete</td>
<td style="border: 1px solid">3</td>
<td style="border: 1px solid">0.637878338496</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">converter/irv</td>
<td style="border: 1px solid">792</td>
<td style="border: 1px solid">0.661979901670313</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">sparql</td>
<td style="border: 1px solid">715</td>
<td style="border: 1px solid">1.123084135322358</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">crud_create</td>
<td style="border: 1px solid">187</td>
<td style="border: 1px solid">1.486844727060763</td>
</tr>
<tr>
<td class="resfooter" colspan="5"></td>
</tr>
</tbody>
</table>
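<p>The overall average in the first table is the per-endpoint averages weighted by their number of queries. The minimal Python sketch below recomputes it from the per-endpoint table as a consistency check between the two tables; the numbers are copied from the table above and <code>overall_average</code> is a hypothetical helper.</p>

```python
# (endpoint, number of queries, average seconds per query) for instance A,
# copied from the per-endpoint table above.
ENDPOINTS_A = [
    ("dataset_create", 265, 0.126993534699919),
    ("converter/tsv", 48, 0.128808428843714),
    ("dataset_update", 17, 0.140141641392576),
    ("dataset_read", 11780, 0.144073766884864),
    ("auth_registrar_access", 883, 0.145781793788779),
    ("converter/bibtex", 49, 0.149710825511323),
    ("auth_lister", 1970, 0.159979685066925),
    ("search", 1397, 0.180938945980523),
    ("browse", 8949, 0.199636802392004),
    ("crud_read", 638, 0.241032384406063),
    ("dataset_delete", 263, 0.420157149717388),
    ("crud_delete", 3, 0.637878338496),
    ("converter/irv", 792, 0.661979901670313),
    ("sparql", 715, 1.123084135322358),
    ("crud_create", 187, 1.486844727060763),
]


def overall_average(rows):
    """Weight each endpoint's average by its query count to get the global average."""
    total_queries = sum(n for _, n, _ in rows)
    total_seconds = sum(n * avg for _, n, avg in rows)
    return total_queries, total_seconds / total_queries


nb_queries, average = overall_average(ENDPOINTS_A)
print(nb_queries, round(average, 4))  # 27956 0.2183
```

<p>The query count and the weighted mean (about 0.2183 seconds) match the summary table, so the two tables are consistent with each other.</p>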
<p>This table gives the number of queries for each HTTP response status code returned by each endpoint. This kind of metric is useful for debugging potential issues.</p>
<table class="listing" border="0">
<tbody>
<tr>
<td class="restitle" colspan="5"></td>
</tr>
<tr>
<td style="border: 1px solid"><strong><span>Web Service</span></strong></td>
<td style="border: 1px solid"><strong>Number of queries</strong></td>
<td style="border: 1px solid"><strong><span>HTTP Response Status</span></strong></td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">auth_lister</td>
<td style="border: 1px solid">1968</td>
<td style="border: 1px solid">200</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">auth_lister</td>
<td style="border: 1px solid">2</td>
<td style="border: 1px solid">400</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">auth_registrar_access</td>
<td style="border: 1px solid">883</td>
<td style="border: 1px solid">200</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">browse</td>
<td style="border: 1px solid">8949</td>
<td style="border: 1px solid">200</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">converter/bibtex</td>
<td style="border: 1px solid">45</td>
<td style="border: 1px solid">200</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">converter/bibtex</td>
<td style="border: 1px solid">2</td>
<td style="border: 1px solid">400</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">converter/bibtex</td>
<td style="border: 1px solid">2</td>
<td style="border: 1px solid">406</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">converter/irv</td>
<td style="border: 1px solid">740</td>
<td style="border: 1px solid">200</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">converter/irv</td>
<td style="border: 1px solid">51</td>
<td style="border: 1px solid">400</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">converter/irv</td>
<td style="border: 1px solid">1</td>
<td style="border: 1px solid">406</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">converter/tsv</td>
<td style="border: 1px solid">43</td>
<td style="border: 1px solid">200</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">converter/tsv</td>
<td style="border: 1px solid">2</td>
<td style="border: 1px solid">400</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">converter/tsv</td>
<td style="border: 1px solid">3</td>
<td style="border: 1px solid">406</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">crud_create</td>
<td style="border: 1px solid">66</td>
<td style="border: 1px solid">200</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">crud_create</td>
<td style="border: 1px solid">116</td>
<td style="border: 1px solid">400</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">crud_create</td>
<td style="border: 1px solid">5</td>
<td style="border: 1px solid">500</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">crud_delete</td>
<td style="border: 1px solid">3</td>
<td style="border: 1px solid">200</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">crud_read</td>
<td style="border: 1px solid">480</td>
<td style="border: 1px solid">200</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">crud_read</td>
<td style="border: 1px solid">158</td>
<td style="border: 1px solid">400</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">dataset_create</td>
<td style="border: 1px solid">265</td>
<td style="border: 1px solid">200</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">dataset_delete</td>
<td style="border: 1px solid">261</td>
<td style="border: 1px solid">200</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">dataset_delete</td>
<td style="border: 1px solid">2</td>
<td style="border: 1px solid">500</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">dataset_read</td>
<td style="border: 1px solid">11767</td>
<td style="border: 1px solid">200</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">dataset_read</td>
<td style="border: 1px solid">9</td>
<td style="border: 1px solid">400</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">dataset_read</td>
<td style="border: 1px solid">4</td>
<td style="border: 1px solid">500</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">dataset_update</td>
<td style="border: 1px solid">17</td>
<td style="border: 1px solid">200</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">search</td>
<td style="border: 1px solid">1393</td>
<td style="border: 1px solid">200</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">search</td>
<td style="border: 1px solid">4</td>
<td style="border: 1px solid">400</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">sparql</td>
<td style="border: 1px solid">693</td>
<td style="border: 1px solid">200</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">sparql</td>
<td style="border: 1px solid">19</td>
<td style="border: 1px solid">400</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">sparql</td>
<td style="border: 1px solid">3</td>
<td style="border: 1px solid">406</td>
</tr>
<tr>
<td class="resfooter" colspan="5"></td>
</tr>
</tbody>
</table>
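<p>One way to use this table for debugging is to turn the raw counts into an error rate per endpoint, that is, the share of queries that did not return HTTP 200. The minimal Python sketch below does this for instance A, treating every non-200 status (400, 406 and 500 here) as an error; that definition and the <code>error_rates</code> helper are assumptions made for the example.</p>

```python
from collections import defaultdict

# (endpoint, number of queries, HTTP status) for instance A, copied from the table above.
STATUS_A = [
    ("auth_lister", 1968, 200), ("auth_lister", 2, 400),
    ("auth_registrar_access", 883, 200),
    ("browse", 8949, 200),
    ("converter/bibtex", 45, 200), ("converter/bibtex", 2, 400), ("converter/bibtex", 2, 406),
    ("converter/irv", 740, 200), ("converter/irv", 51, 400), ("converter/irv", 1, 406),
    ("converter/tsv", 43, 200), ("converter/tsv", 2, 400), ("converter/tsv", 3, 406),
    ("crud_create", 66, 200), ("crud_create", 116, 400), ("crud_create", 5, 500),
    ("crud_delete", 3, 200),
    ("crud_read", 480, 200), ("crud_read", 158, 400),
    ("dataset_create", 265, 200),
    ("dataset_delete", 261, 200), ("dataset_delete", 2, 500),
    ("dataset_read", 11767, 200), ("dataset_read", 9, 400), ("dataset_read", 4, 500),
    ("dataset_update", 17, 200),
    ("search", 1393, 200), ("search", 4, 400),
    ("sparql", 693, 200), ("sparql", 19, 400), ("sparql", 3, 406),
]


def error_rates(rows):
    """Return {endpoint: share of its queries whose status was not 200}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for endpoint, count, status in rows:
        totals[endpoint] += count
        if status != 200:
            errors[endpoint] += count
    return {endpoint: errors[endpoint] / totals[endpoint] for endpoint in totals}


rates = error_rates(STATUS_A)
worst = sorted(rates.items(), key=lambda item: item[1], reverse=True)[:3]
for endpoint, rate in worst:
    print(f"{endpoint}: {rate:.1%}")
# crud_create: 64.7%
# crud_read: 24.8%
# converter/tsv: 10.4%
```

<p>On instance A, this ranks crud_create first: 121 of its 187 queries returned 400 or 500. Separating 4xx from 5xx codes in the same loop would distinguish malformed requests from server-side failures.</p>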
<h3>structWSF Instance B</h3>
<p>Instance B has 25 datasets with about 2,312,000 instance records. As the table below shows, the average time per query to this instance, across all web service endpoints, is about 556 milliseconds.</p>
<p>Why does the average time per query more than double with the size of this instance? That is what we will examine next.</p>
<table class="listing" border="0">
<tbody>
<tr>
<td class="restitle" colspan="5"></td>
</tr>
<tr>
<td style="border: 1px solid"><strong>Number of queries</strong></td>
<td style="border: 1px solid"><strong>Average time for each query in seconds</strong></td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">37575</td>
<td style="border: 1px solid">0.556303637714566</td>
</tr>
<tr>
<td class="resfooter" colspan="5"></td>
</tr>
</tbody>
</table>
<p>The table below gives the total number of queries sent to each web service endpoint, along with the average time per query for each endpoint. What we notice is that the time it takes to create, update and delete records in the database management systems depends on the size of the dataset. So what happened, and is there anything we can do about it?</p>
<p>Most of the queries used for this analysis were sent to structWSF v1.0a1 and v1.0a2. However, something with a major impact on these results changed in v1.0a3, which was released last week. The big problem behind these numbers is Solr&#8217;s commit time. In v1.0a1 and v1.0a2, a Solr commit was issued each time something was updated in the index, and given the size of the index, a commit could sometimes take minutes. Since v1.0a3, the choice is left to the system administrator: either issue a commit each time something changes in the index, or configure Solr&#8217;s autoCommit setting appropriately. This means that we increased the performance of these CUD (create, update, delete) endpoints by about 95%.</p>
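<p>The difference between the two commit strategies can be illustrated with a toy model. The minimal Python sketch below is <em>not</em> Solr code: <code>BatchedCommitter</code> and <code>FakeClock</code> are hypothetical classes that mimic the idea behind Solr&#8217;s autoCommit setting (commit once a number of documents is pending, or once some time has passed) and simply count how many commits a stream of updates would trigger. Unlike Solr, whose time-based trigger does not wait for the next update, this model only checks its thresholds when an update arrives.</p>

```python
class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


class BatchedCommitter:
    """Toy model of an autoCommit-style policy (hypothetical, not Solr code).

    Instead of committing after every update, commit once `max_docs` updates
    are pending or `max_time` seconds have passed since the first pending one.
    Thresholds are only checked when an update arrives.
    """

    def __init__(self, max_docs, max_time, clock):
        self.max_docs = max_docs
        self.max_time = max_time
        self.clock = clock
        self.pending = 0
        self.first_pending_at = None
        self.commits = 0

    def add(self):
        """Register one index update and commit if a threshold is reached."""
        now = self.clock()
        if self.pending == 0:
            self.first_pending_at = now
        self.pending += 1
        if self.pending >= self.max_docs or now - self.first_pending_at >= self.max_time:
            self.commit()

    def commit(self):
        """Flush all pending updates in a single (expensive) commit."""
        if self.pending:
            self.commits += 1
            self.pending = 0
            self.first_pending_at = None


# 1,000 updates arriving 10 ms apart, with commits batched every 100 documents.
clock = FakeClock()
committer = BatchedCommitter(max_docs=100, max_time=60.0, clock=clock)
for i in range(1000):
    clock.now = i * 0.01
    committer.add()
print(committer.commits)  # 10 (a commit after every update would give 1000)
```

<p>In this model, batching turns 1,000 commits into 10. When a single commit on a large index can take minutes, reducing the number of commits is where the savings come from.</p>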
<p>As for the SPARQL endpoint, it is used almost exclusively to export data from a structWSF instance. Each query therefore returns a large dump of RDF triples, which explains the average time per query of 2.1 seconds.</p>
<table class="listing" border="0">
<tbody>
<tr>
<td class="restitle" colspan="5"></td>
</tr>
<tr>
<td style="border: 1px solid"><strong><span>Web Service</span></strong></td>
<td style="border: 1px solid"><strong>Number of queries</strong></td>
<td style="border: 1px solid"><strong>Average time for each query in seconds</strong></td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">dataset_create</td>
<td style="border: 1px solid">173</td>
<td style="border: 1px solid">0.09835156953404</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">auth_registrar_access</td>
<td style="border: 1px solid">1135</td>
<td style="border: 1px solid">0.114255581658327</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">dataset_update</td>
<td style="border: 1px solid">121</td>
<td style="border: 1px solid">0.119028852005636</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">dataset_read</td>
<td style="border: 1px solid">12683</td>
<td style="border: 1px solid">0.159165935205064</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">crud_read</td>
<td style="border: 1px solid">8546</td>
<td style="border: 1px solid">0.23457546435556</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">converter/bibtex</td>
<td style="border: 1px solid">109</td>
<td style="border: 1px solid">0.405608450600873</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">auth_lister</td>
<td style="border: 1px solid">2315</td>
<td style="border: 1px solid">0.471687612780759</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">search</td>
<td style="border: 1px solid">2313</td>
<td style="border: 1px solid">0.533951056245796</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">browse</td>
<td style="border: 1px solid">9103</td>
<td style="border: 1px solid">0.758227908033767</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">converter/tsv</td>
<td style="border: 1px solid">8</td>
<td style="border: 1px solid">0.863690733909698</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">sparql</td>
<td style="border: 1px solid">650</td>
<td style="border: 1px solid">2.115058046487879</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">converter/irv</td>
<td style="border: 1px solid">166</td>
<td style="border: 1px solid">2.681712512510398</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">crud_update</td>
<td style="border: 1px solid">13</td>
<td style="border: 1px solid">4.649851157114154</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">crud_create</td>
<td style="border: 1px solid">75</td>
<td style="border: 1px solid">11.306954870223277</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">dataset_delete</td>
<td style="border: 1px solid">140</td>
<td style="border: 1px solid">27.511527856750207</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">crud_delete</td>
<td style="border: 1px solid">25</td>
<td style="border: 1px solid">34.33350466727492</td>
</tr>
<tr>
<td class="resfooter" colspan="5"></td>
</tr>
</tbody>
</table>
<p>This table gives the number of queries for each HTTP response status code returned by each endpoint.</p>
<table class="listing" border="0">
<tbody>
<tr>
<td class="restitle" colspan="5"></td>
</tr>
<tr>
<td style="border: 1px solid"><strong><span>Web Service</span></strong></td>
<td style="border: 1px solid"><strong>Number of queries</strong></td>
<td style="border: 1px solid"><strong><span class="rescolname">HTTP Response Status</span></strong></td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">auth_lister</td>
<td style="border: 1px solid">2275</td>
<td style="border: 1px solid">200</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">auth_lister</td>
<td style="border: 1px solid">11</td>
<td style="border: 1px solid">400</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">auth_lister</td>
<td style="border: 1px solid">2</td>
<td style="border: 1px solid">406</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">auth_lister</td>
<td style="border: 1px solid">27</td>
<td style="border: 1px solid">500</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">auth_registrar_access</td>
<td style="border: 1px solid">1110</td>
<td style="border: 1px solid">200</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">auth_registrar_access</td>
<td style="border: 1px solid">25</td>
<td style="border: 1px solid">400</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">browse</td>
<td style="border: 1px solid">9084</td>
<td style="border: 1px solid">200</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">browse</td>
<td style="border: 1px solid">18</td>
<td style="border: 1px solid">400</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">browse</td>
<td style="border: 1px solid">1</td>
<td style="border: 1px solid">406</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">converter/bibtex</td>
<td style="border: 1px solid">108</td>
<td style="border: 1px solid">200</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">converter/bibtex</td>
<td style="border: 1px solid">1</td>
<td style="border: 1px solid">400</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">converter/irv</td>
<td style="border: 1px solid">154</td>
<td style="border: 1px solid">200</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">converter/irv</td>
<td style="border: 1px solid">12</td>
<td style="border: 1px solid">400</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">converter/tsv</td>
<td style="border: 1px solid">8</td>
<td style="border: 1px solid">200</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">crud_create</td>
<td style="border: 1px solid">41</td>
<td style="border: 1px solid">200</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">crud_create</td>
<td style="border: 1px solid">33</td>
<td style="border: 1px solid">400</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">crud_create</td>
<td style="border: 1px solid">1</td>
<td style="border: 1px solid">500</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">crud_delete</td>
<td style="border: 1px solid">24</td>
<td style="border: 1px solid">200</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">crud_delete</td>
<td style="border: 1px solid">1</td>
<td style="border: 1px solid">400</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">crud_read</td>
<td style="border: 1px solid">8268</td>
<td style="border: 1px solid">200</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">crud_read</td>
<td style="border: 1px solid">273</td>
<td style="border: 1px solid">400</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">crud_read</td>
<td style="border: 1px solid">5</td>
<td style="border: 1px solid">406</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">crud_update</td>
<td style="border: 1px solid">4</td>
<td style="border: 1px solid">200</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">crud_update</td>
<td style="border: 1px solid">9</td>
<td style="border: 1px solid">400</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">dataset_create</td>
<td style="border: 1px solid">171</td>
<td style="border: 1px solid">200</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">dataset_create</td>
<td style="border: 1px solid">2</td>
<td style="border: 1px solid">400</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">dataset_delete</td>
<td style="border: 1px solid">79</td>
<td style="border: 1px solid">200</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">dataset_delete</td>
<td style="border: 1px solid">61</td>
<td style="border: 1px solid">500</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">dataset_read</td>
<td style="border: 1px solid">12647</td>
<td style="border: 1px solid">200</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">dataset_read</td>
<td style="border: 1px solid">11</td>
<td style="border: 1px solid">400</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">dataset_read</td>
<td style="border: 1px solid">25</td>
<td style="border: 1px solid">500</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">dataset_update</td>
<td style="border: 1px solid">113</td>
<td style="border: 1px solid">200</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">dataset_update</td>
<td style="border: 1px solid">8</td>
<td style="border: 1px solid">500</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">search</td>
<td style="border: 1px solid">2286</td>
<td style="border: 1px solid">200</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">search</td>
<td style="border: 1px solid">24</td>
<td style="border: 1px solid">400</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">search</td>
<td style="border: 1px solid">3</td>
<td style="border: 1px solid">406</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">sparql</td>
<td style="border: 1px solid">618</td>
<td style="border: 1px solid">200</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">sparql</td>
<td style="border: 1px solid">22</td>
<td style="border: 1px solid">400</td>
</tr>
<tr style="border: 1px solid">
<td style="border: 1px solid">sparql</td>
<td style="border: 1px solid">6</td>
<td style="border: 1px solid">406</td>
</tr>
<tr class="resrowodd">
<td style="border: 1px solid">sparql</td>
<td style="border: 1px solid">4</td>
<td style="border: 1px solid">500</td>
</tr>
<tr>
<td class="resfooter" colspan="5"></td>
</tr>
</tbody>
</table>
<h3>Generating the Stats</h3>
<p>Here is the list of SQL queries used to create these statistics tables. You can run them locally on your own structWSF instance to generate the same kind of statistics.</p>
<p>Timespan of the queries</p>
<blockquote><p>select min(request_datetime) as startdate, max(request_datetime) as enddate from SD.WSF.ws_queries_log;</p></blockquote>
<p>Get the number of queries sent to the system and the average processing time per query (reported in seconds in the tables above).</p>
<blockquote><p>select count(request_processing_time) as nb_queries, avg(request_processing_time) as average_query_time from SD.WSF.ws_queries_log;</p></blockquote>
<p>Get the average query time for each web service of a structWSF instance.</p>
<blockquote><p>select requested_web_service, count(request_processing_time) as nb_queries, avg(request_processing_time) as average_query_time from SD.WSF.ws_queries_log GROUP BY requested_web_service ORDER BY average_query_time ASC;</p></blockquote>
<p>HTTP response status counts per web service endpoint</p>
<blockquote><p>select requested_web_service, count(request_http_response_status) as nb_queries, request_http_response_status from SD.WSF.ws_queries_log GROUP BY requested_web_service, request_http_response_status ORDER BY requested_web_service, request_http_response_status;</p></blockquote>
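<p>To try these queries without a structWSF instance, the minimal Python sketch below runs them against an in-memory SQLite table. It assumes the log table only needs the columns referenced above (plus <code>ID</code>), drops the <code>SD.WSF.</code> qualifier because SQLite has no equivalent, and uses made-up rows; the column types are guesses, and the real table lives in the structWSF database.</p>

```python
import sqlite3

# In-memory stand-in for SD.WSF.ws_queries_log (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ws_queries_log ("
    " ID INTEGER PRIMARY KEY,"
    " requested_web_service TEXT,"
    " request_datetime TEXT,"
    " request_processing_time REAL,"
    " request_http_response_status INTEGER)"
)
conn.executemany(
    "INSERT INTO ws_queries_log (requested_web_service, request_datetime,"
    " request_processing_time, request_http_response_status) VALUES (?, ?, ?, ?)",
    [
        ("search", "2009-08-01 10:00:00", 0.25, 200),
        ("search", "2009-08-01 10:05:00", 0.75, 400),
        ("sparql", "2009-08-02 09:00:00", 1.50, 200),
    ],
)

# Timespan of the queries (ISO-formatted text sorts chronologically).
timespan = conn.execute(
    "SELECT min(request_datetime) AS startdate, max(request_datetime) AS enddate"
    " FROM ws_queries_log"
).fetchone()

# Number of queries and average processing time over the whole log.
overall = conn.execute(
    "SELECT count(request_processing_time) AS nb_queries,"
    " avg(request_processing_time) AS average_query_time FROM ws_queries_log"
).fetchone()

# Average query time for each web service.
per_service = conn.execute(
    "SELECT requested_web_service, count(request_processing_time) AS nb_queries,"
    " avg(request_processing_time) AS average_query_time FROM ws_queries_log"
    " GROUP BY requested_web_service ORDER BY average_query_time ASC"
).fetchall()

# HTTP response status counts per web service endpoint.
statuses = conn.execute(
    "SELECT requested_web_service, count(request_http_response_status) AS nb_queries,"
    " request_http_response_status FROM ws_queries_log"
    " GROUP BY requested_web_service, request_http_response_status"
    " ORDER BY requested_web_service, request_http_response_status"
).fetchall()

print(timespan)     # ('2009-08-01 10:00:00', '2009-08-02 09:00:00')
print(per_service)  # [('search', 2, 0.5), ('sparql', 1, 1.5)]
print(statuses)     # [('search', 1, 200), ('search', 1, 400), ('sparql', 1, 200)]
```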
]]></content:encoded>
			<wfw:commentRss>http://fgiasson.com/blog/index.php/2009/08/18/structwsf-early-querying-metrics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
