For more than a year we have been developing a completely new version of conStruct for Drupal 7 for one of our clients.
conStruct for Drupal 6 is largely decoupled from Drupal and from the other contributed modules; in short, it did not play nicely with Drupal. The goal of this new version has been to change that situation. The focus of this completely new conStruct module has been to create a series of connector modules that bridge most of Drupal’s core functionalities with remote structWSF instances.
We wanted to make sure that Drupal developers could manipulate content, within Drupal, that is hosted in one or more structWSF instances. The best way to work toward that goal was to make sure that all of the core Drupal APIs commonly used by Drupal developers could be used to manipulate structWSF data as if it were native to Drupal. This is what these connectors are about.
The development of conStruct for Drupal 7 is not finished, but it is available in the Git repository. Refactoring and improvements are still required, mainly to make it easier to use and understand, but all of the code is working properly and is already used on production sites.
conStruct As a Large Scale Drupal Implementation
Those who follow the evolution of conStruct know that conStruct’s main goal is to use Drupal as a user interface for structWSF for administrative purposes, or for creating complete portals like the NOW portal. However, in our initial versions, Structured Dynamics’ intent was not to integrate tightly with Drupal. Over time, though, we have seen broad acceptance of the Drupal front end, and Drupal itself is evolving in ways compatible with semantic technologies.
What is changing with conStruct for Drupal 7, with all these connectors, is that we are now using conStruct to bridge Drupal with structWSF server instances. We supercharge Drupal 7’s capabilities with structWSF. Our evolution to a tighter Drupal coupling means the ability to manage, query, search and data mine millions of entities; to have vocabularies of tens of thousands of concepts; and to enable the querying of all of these entities and their content from any kind of device or system via a family of web service endpoints.
This is the initial version of what is (or should be) Drupal LSD for Structured Dynamics: A semantic web service framework backend system for Drupal.
conStruct’s Drupal Connectors
Here is the initial list of the connectors that exist:
structFieldStorage: this module creates a new structfieldstorage field storage system that can be used by Drupal fields to save the fields’ data into a remote structWSF instance. This is used to enable the Content Type entities to be saved into a structWSF instance. It is an extension of the Drupal field storage system.
structEntities: this module creates a new Entity Type called the Resource Type that is used to see all the structWSF indexed records as native Entities in Drupal. This means that the Entity API can be used to manipulate any content in structWSF.
structViews: this module creates a new data source for Views 3. This means that the Views 3 user interface is used to generate structWSF Search endpoint queries instead of SQL queries.
structSearchAPI: this module exposes new search indexes to the Search API. This means that the Search API can be used to query a structWSF instance.
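To make the structViews idea more concrete, here is a minimal Python sketch of how a Views-like definition could be translated into search query parameters instead of SQL. It is not the connector's code (which is a Drupal module), and the parameter names (query, types, datasets, items, page) as well as the ';' separator are assumptions made for illustration, not the documented structWSF Search endpoint API.

```python
from urllib.parse import urlencode


def view_to_search_params(view):
    """Translate a Views-like definition into search parameters.

    All parameter names below are illustrative assumptions, not the
    documented structWSF Search endpoint API.
    """
    per_page = view.get("items_per_page", 10)
    return {
        "query": view.get("keywords", ""),
        # Joining multiple values with ';' is an assumption of this sketch.
        "types": ";".join(view.get("types", [])) or "all",
        "datasets": ";".join(view.get("datasets", [])) or "all",
        "items": per_page,
        # Offset of the first result for the requested page (0-based pages).
        "page": view.get("page", 0) * per_page,
    }


# Hypothetical view definition, as a user interface might produce it.
view = {
    "keywords": "park",
    "types": ["http://example.org/ontology#Park"],
    "datasets": [],
    "items_per_page": 20,
    "page": 2,
}

params = view_to_search_params(view)
query_string = urlencode(params)
print(query_string)
```

The point of the sketch is only that the view's filters and paging end up as web service parameters rather than as WHERE and LIMIT clauses.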
I will write about all these connectors individually in upcoming blog posts. I will cover their design, architecture and usage.
I am proud to announce the new NOW (Neighbourhoods Of Winnipeg) semantic web portal! This new and innovative semantic web portal was publicly announced by the Mayor of the City of Winnipeg last week.
The NOW (Neighbourhoods of Winnipeg) portal is “a new Web portal (the “Portal”) produced by the City of Winnipeg to provide broad, dynamic and interactive access to local and neighbourhood information. Designed for easy access and use by all citizens, businesses, community organizations and Governments, the information on the site includes municipal data, census and demographic information, economic development information, historical data, much spatial and mapping information, and facilities for including and sharing data by external groups and constituencies.”
I suggest reading Mike Bergman’s blog post about this new semantic web portal for the proper background on this initiative by the City of Winnipeg and on how it uses the OSF (Open Semantic Framework) as its foundational technology stack.
This project has been the springboard that led to the Open Semantic Framework version 1.1. Multiple pieces of the framework have been developed in relation to this project, in particular the sWebMap semantic component and several improvements to the structWSF web service endpoints and the conStruct modules for Drupal 6.
Development of the Portal
The development plan of this portal is composed of four major areas:
1. Development of the data structure of the municipal domain by creating a series of ontologies
2. Conversion of existing data assets using this new data structure
3. Creation of the web portal by creating its design and by developing all the display templates
4. Creation of new tools to let users interact with the data available on the portal
Structured Dynamics has been involved in #1, #2 and #4 by providing design and development resources, by delivering technology transfer sessions and material, and by supporting internal teams in creating, maintaining and deploying their 57 publicly available datasets.
The Data Structure
This technology stack does not have any meaning without the proper data and data structures (ontologies) in place. This gold mine of information is what drives the functionality of the portal.
The portal is driven by 12 ontologies: 2 internal and 10 external. The content of the 57 publicly available datasets is described using the classes and properties defined in these ontologies.
The two internal ontologies have been created jointly by Structured Dynamics and the City of Winnipeg, but they are extended and maintained by the city only.
These ontologies are maintained using two different kinds of tools:
Protege is used for large development tasks, such as creating a large number of classes and properties or reorganizing the class structure.
structOntology is used for quick ontological changes that have an immediate impact on the behavior of the portal, such as label changes or SCO ontology property assignments that change the behavior of some of the tools that exist in the portal.
structOntology can also be used by portal users to understand the underlying data structure used to define the data available on the portal. All users have access to the reading mode of the tool, which lets them browse, search and export the ontologies loaded on the portal.
The Data
With rare exceptions such as the historical photos, no new data has been created by the City of Winnipeg to populate this NOW portal. Most of its content comes from existing internal sources of data such as:
Conventional relational databases
GIS (Geographic Information System) on top of relational databases
Spreadsheets
All of the conventional relational databases and legacy data from the GIS systems have been converted into RDF using the FME Workbench ETL system. All of the FME Workbench templates map the relational data into RDF using the ontologies loaded into the portal. All of the geolocated records that exist in the portal come from this ETL process and have been converted using FME.
Some smaller datasets come from internal spreadsheets that were modified to comply with the commON spreadsheet format, which is used to convert spreadsheet (CSV/TSV) data files into RDF.
All of the dataset creation and maintenance is managed internally by the City of Winnipeg using one of these two data conversion and importation processes.
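As a rough illustration of what "mapping tabular data into RDF using an ontology" means, here is a minimal Python sketch that turns spreadsheet rows into N-Triples statements. It is neither an FME Workbench template nor the commON format: the namespaces, column names and sample row are all made up for the example, and every column is simply assumed to map to a property of the same name.

```python
import csv
import io

# Hypothetical namespaces, used for illustration only.
BASE = "http://example.org/data/"
ONTOLOGY = "http://example.org/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def row_to_triples(row, id_column, type_name):
    """Convert one tabular row into a list of N-Triples lines."""
    subject = f"<{BASE}{row[id_column]}>"
    # The record's class comes from the ontology.
    triples = [f"{subject} <{RDF_TYPE}> <{ONTOLOGY}{type_name}> ."]
    for column, value in row.items():
        if column == id_column or value == "":
            continue  # skip the identifier column and empty cells
        # Escape backslashes and double quotes for the literal.
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'{subject} <{ONTOLOGY}{column}> "{literal}" .')
    return triples


# Made-up sample data standing in for a small spreadsheet.
SAMPLE = "id,name,ward\npark-1,Example Park,Example Ward\n"

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
triples = row_to_triples(rows[0], "id", "Park")
for line in triples:
    print(line)
```

A real mapping would also handle datatypes, object properties pointing to other records, and geographic attributes, all driven by the ontologies rather than by column names.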
Here are some internal statistics of the content that is currently accessible on the NOW portal.
General Portal
These are statistics related to different functionalities of the portal.
Number of neighbourhoods: 236
Number of community areas: 14
Number of wards: 15
Number of neighbourhood clusters: 23
Number of major site sections: 7
Total number of site pages: 428,019
Static pages: 2,245
Record-oriented pages: 425,874
Dynamic (search-based) pages: infinite
Number of documents: 1,017
Number of images: 2,683
Number of search facets: 1,392
Number of display templates: 54
Number of links: 1,067
External links: 784
Internal links: 283
Site Data
These statistics show what is available via the portal: the types of things it describes, their properties, and the quantity of data that can be searched, manipulated and exported from the portal.
Number of datasets: 57
Number of records: 425,874
Number of geolocational records: 418,869
Point of interest (POI) records: 193,272
Polygon records: 218,602
Path (route) records: 6,995
Number of classes (types): 84
Number of properties: 1,308
Number of triple assertions: 8,683,103
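The site data figures above can be cross-checked with a little arithmetic. The short Python sketch below uses only the numbers listed in this section; the derived ratios (share of geolocated records, average number of triples per record) are computed here and are not figures published by the portal.

```python
# Figures copied from the Site Data statistics.
records = 425_874
geo_records = 418_869
geo_breakdown = {
    "point_of_interest": 193_272,
    "polygon": 218_602,
    "path": 6_995,
}
triples = 8_683_103

# The three kinds of geolocated records should add up to the total.
geo_total = sum(geo_breakdown.values())

# Derived ratios (computed for illustration, not published figures).
geo_share = geo_records / records
triples_per_record = triples / records

print(f"geolocated records: {geo_total:,}")
print(f"share of geolocated records: {geo_share:.1%}")
print(f"average triples per record: {triples_per_record:.1f}")
```

In other words, almost every record on the portal is geolocated, and each record carries on the order of twenty assertions.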
Sharing Content
An important aspect of this portal is that all of the content is contextually available, in different formats, to all of the users of the portal. Whether you are browsing content within datasets, searching for specific pieces of content, or looking at a specific record page, you always have the possibility to get your hands on the content that is being displayed to you, the user, with a choice of five different data formats:
All content pages can be exported in one of the formats outlined above. In the bottom right corner of these pages you will see an Export button that you can click to get the content of that page in one of these formats.
Export Search Content
Every time you do a search on the portal, you can export the results of that search in one of the formats outlined above. You can do that by selecting the Export tab, and by selecting one of the formats you want to use for exporting the data.
Export Datasets
You can export any publicly available dataset from the portal. Datasets that are too big to be exported in one piece have to be exported in slices. The datasets can be exported in one of the formats mentioned above.
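Exporting in slices amounts to walking through a dataset in fixed-size ranges of records. Here is a minimal Python sketch of that idea; the dataset size and the slice size are hypothetical values chosen for the example, not the limits used by the portal.

```python
def slice_ranges(total_records, slice_size):
    """Yield (start, end) record index pairs; end is exclusive."""
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    for start in range(0, total_records, slice_size):
        yield start, min(start + slice_size, total_records)


# Hypothetical dataset of 120,000 records exported 50,000 at a time.
slices = list(slice_ranges(120_000, 50_000))
for number, (start, end) in enumerate(slices, start=1):
    print(f"slice {number}: records {start} to {end - 1}")
```

Each slice can then be requested and saved separately, and the slices concatenated on the user's side.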
Export Census
Users can also export census data, from the census section of the portal, as spreadsheets. They only have to select the Tables tab, and then click the Export Spreadsheet button.
Export Ontologies
The export functionality would not be complete without the ability to consult and export the ontologies that are used to describe the content exposed by the portal. These ontologies can be read from the ontology reader user interface, or can be exported from the portal to be read by external ontology management tools such as Protege.
Portal Design
The portal uses Drupal 6 as its CMS (Content Management System). The Drupal 6 instance communicates with structWSF using the conStruct module, which acts as a bridge between a Drupal portal and a structWSF web service network.
Here are the main design phases that have been required to create the portal:
Creation of the portal’s design, and the Drupal 6 theme that implements it
Creation of the Search and Browse results templates
Creation of the individual records’ page design and templates based on their type
Creation of the sWebMap search results templates
The portal’s design has been created internally by the City of Winnipeg and by Tactica based on the Citizen DAN demo. Tactica also worked on another Citizen DAN-like portal called MyPeg.ca.
Semantic Components
The NOW Web portal uses a series of tools called the Semantic Components. These are a set of Flash and JavaScript tools that can be embedded within any web page and that can easily communicate with one or more structWSF instances. They can display information in all kinds of charts, display document reading widgets, create dashboards of structured data, etc. The initial set of Semantic Components was developed for the MyPeg.ca project back in November 2010. This was before Steve Jobs announced that Apple would not support Adobe Flash, and well before Google announced that it would drop support for it as well.
Since the City of Winnipeg wanted to re-use as much as possible to lower the development cost of the NOW portal, it chose to use the complete OSF stack, which includes these Semantic Components.
When we participated in developing this new NOW portal, we extended the set of Semantic Components by creating the most complex one: the sWebMap. Because of the two announcements mentioned above, we chose to move forward and to create the sWebMap Semantic Component using JavaScript instead of Flash. The other Semantic Component tools that have been developed in Flash have not yet been ported to JavaScript.
Conclusion
The new NOW semantic web portal’s main asset is its data: how it can be searched (with traditional search engines or using a semantic component to search, browse, filter and localize results), displayed and exported. This portal has been developed using a completely free and open source semantic platform that has been developed from previous projects that open sourced their code.
I consider this portal a pioneer in the way municipal organizations will provide new online services to their citizens and to commercial enterprises, based on the quality of the data that will be exposed via such Web portals.
I am pleased to announce the release of the new sWebMap Semantic Component in JavaScript. This new mapping component is a standalone JavaScript application that can be integrated into any new or existing website and that interacts with an Open Semantic Framework (OSF) instance to search, browse, filter and display geographically located information on an interactive map.
Features
The sWebMap is a rich mapping tool that can easily be integrated on any webpage, and that can be extensively customized. The sWebMap supports the following features:
Full text search for searching and displaying results on a map
Extensive filtering capabilities
Filtering by dataset source
Filtering by type
Filtering by attribute/value
Filtering of records that belong to a specific geographic region
Display of records on the map using:
Different markers depending on the type of record to display (determined by the ontologies)
Polygon shapes for records that refer to a geographic region
Polyline shapes for records that refer to a geographically located path
Templating of records in a resultset depending on their type
Templating of records’ preview, displayed in an overlay window, depending on their type
Persist records on the map across searches and filtering operations
Supports map sessions
Save map sessions
Load saved map sessions
Delete saved map sessions
Share saved map sessions
Supports a multiple-maps mode
Three focus maps are available under the main map
Each map focuses on a particular region of the main map
Users can switch between focus maps to see different records in different regions
Each sWebMap component communicates with an OSF (Open Semantic Framework) instance. More specifically, an sWebMap component sends Search/Filtering queries to a geo-enabled structWSF Search web service endpoint.
Depending on the options you specified when you created the sWebMap control, a query is sent to the Search endpoint each time you move the map (option), zoom (option) or change the filtering criteria. The sWebMap control requests a JSON-formatted resultset and then displays the results to the user.
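The behavior described above, where moving and zooming only trigger a query if the corresponding option is enabled while filter changes always do, can be modeled with a few lines of Python. This is only a sketch of the control flow: the real component is written in JavaScript, and the class, option and field names below are hypothetical, as are the coordinates.

```python
from dataclasses import dataclass, field


@dataclass
class MapControlSketch:
    """Toy model of a map control that queries a search endpoint."""

    search_on_move: bool = True   # hypothetical option name
    search_on_zoom: bool = True   # hypothetical option name
    filters: dict = field(default_factory=dict)
    sent_queries: list = field(default_factory=list)

    def _send(self, bounds):
        # A real control would issue an HTTP request to the Search
        # endpoint here and ask for a JSON resultset; we only record it.
        self.sent_queries.append(
            {"bounds": bounds, "filters": dict(self.filters), "format": "json"}
        )

    def on_move(self, bounds):
        if self.search_on_move:
            self._send(bounds)

    def on_zoom(self, bounds):
        if self.search_on_zoom:
            self._send(bounds)

    def set_filter(self, name, value, bounds):
        # Changing the filtering criteria always triggers a new query.
        self.filters[name] = value
        self._send(bounds)


# Made-up bounding box: (south, west, north, east).
bounds = (49.7, -97.4, 50.0, -96.9)

control = MapControlSketch(search_on_move=False)
control.on_move(bounds)                      # ignored: option disabled
control.on_zoom(bounds)                      # sends a query
control.set_filter("type", "Park", bounds)   # sends a query
print(len(control.sent_queries))
```

Disabling the move option, as in this usage example, is one way to reduce the number of queries sent to the endpoint while a user pans around the map.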
This means that to implement the sWebMap component on your website, you will need to have:
You can immediately download the entire source code from this GitHub repository:
Installation
Installing the sWebMap component is really easy. In fact, you only have to load a few JavaScript and CSS files, define a <div></div> container for the map, and create an sWebMap component object, which is a single line of code.
Additionally, you can initialize the sWebMap component with any of the multiple options available.
After releasing the new Open Semantic Framework Installer, we started to test it on machines with all kinds of different specifications: different CPU limits, different amounts of memory, etc. One of the setups that caught our attention was Amazon’s EC2 Micro Instance.
The Micro Instance is a virtual server type that was introduced by Amazon a little more than a year ago. As described by Amazon, Micro Instances are:
Instances of this family provide a small amount of consistent CPU resources and allow you to burst CPU capacity when additional cycles are available. They are well suited for lower throughput applications and web sites that consume significant compute cycles periodically.
We were intrigued by this particular type of instance because we wanted to know how the complete Open Semantic Framework stack could operate on such a small server instance.
Micro Instance Specifications
The Micro Instance’s specifications are as follows:
613 MB memory
Up to 2 EC2 Compute Units (for short periodic bursts)
32-bit or 64-bit platform
I/O Performance: Low
Note that an EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.
Installing The Stack
Installing the stack on the Amazon Micro Instance, using the OSF Installer, is not the fastest experience in the world. In fact, installing the complete stack takes up to 10 hours (5 minutes of your time, but compiling and installing everything takes about 10 hours of CPU time).
The problem is that installing OSF is a CPU-intensive task, while the Micro Instance is not suited to sustained CPU use. The Micro Instance can sustain small CPU bursts, but it can’t sustain the creation and compilation of the entire stack. That means that the CPU cycles won’t be available to the instance, and that the CPU consumption of that instance will be throttled by Amazon, which significantly slows down the installation process.
However, as you will see below, once OSF is installed on the Micro instance, the complete stack responds perfectly to all queries sent to it.
Creating an AMI
The only time you have to spend 10 hours installing the OSF stack on an Amazon Micro Instance is the first time. After that, you only have to create an AMI (Amazon Machine Image) from that vanilla OSF instance for future use. If you proceed that way, you will lower the installation time from 10 hours to a few minutes.
Reading and Searching Data
The testing we did for reading and searching data from structWSF shows that performance is as good as what you would get from a small instance with a normal workload. The Crud: Read and the Search structWSF endpoints are fully responsive and operational.
Creating, Updating and Deleting Data
The testing we did shows that creating, updating and deleting entire datasets takes more time than on a small instance, even if the instance is dedicated to that task alone, without any other queries being processed by the instance at the same time. This decrease in performance is due to the CPU throttling done by Amazon for this kind of more CPU-intensive task. However, since creating, updating or deleting individual records only creates short “CPU peaks”, such isolated create/update/delete queries do not greatly affect the overall performance of the instance.
What Is This Type Of Instance Good For?
We found that such small instances were perfect for data collection activities performed by a single person, or a small group of collaborators. We also found that they could be used by low-traffic websites such as personal web portals, personal blogs, etc. The complete OSF stack is fully responsive and our analysis shows that the resources (CPU and memory) are stable and responsive with a normal workload.
Conclusion
Such a small server instance can easily be used to create a personal data collection endpoint, or a personal, or small, data presentation portal such as Mike’s semantic web Sweet Tools. It is well suited for data portals that require reading and searching of data with occasional data changes (addition, removal and modification of instance records).
The first is that it gives you a single place to access data. Streit explains: “Applications often need to retrieve data from multiple sources which adds complexity and development time. By using this technology we can get everything we need from a single place which drastically lowers development time and running costs.” Furthermore the exposure of data improves search and means that it can be repurposed in new and imaginative ways.