Tag Archive for 'data'

Winnipeg City’s NOW [Data] Portal

Winnipeg City’s NOW (Neighbourhoods Of Winnipeg) Portal is an initiative to create a complete neighbourhood web portal for the city’s citizens. At the core of the project is a set of about 47 fully linked, integrated and structured datasets of things of interest to Winnipeggers. The focal point of the portal is Winnipeg’s 236 neighbourhoods, which define its main structure. The portal has six main sections: topics of interest, maps, history, census, images and economic development. Citizens can use the portal to find things of interest in their neighbourhood, learn its history, see images of those things of interest, find tools to help economic development, and more.

The NOW portal is not new; Structured Dynamics was also its main technical contractor for its first release in 2013. However, we just finished helping the Winnipeg City NOW team migrate their older NOW portal from OSF 1.x to OSF 3.x and from Drupal 6 to Drupal 7; we also trained them on the new system. Major improvements accompany this upgrade, but the user interface design is essentially the same.

First I will introduce each major section of the portal and explain its main features. Then I will discuss the improvements that came with the upgrade.

Datasets

A NOW portal user won’t notice any of this, but the main feature of the portal is the data it uses. The portal manages 47 (and growing) fully structured, integrated and linked datasets of things of interest to Winnipeggers. What the portal really manages is entities. Each kind of entity (swimming pools, parks, places, images, addresses, streets, etc.) is defined with multiple properties and values. Many entities reference entities in other datasets (for example, an assessment parcel from the Assessment Parcels dataset references neighbourhood entities and property address entities from their respective datasets).

Because these datasets are fully structured and integrated, we can leverage these characteristics to create a powerful search experience: filtering the information on any of the properties, biasing searches depending on where a keyword match occurs, and so on.

Here is the list of all 47 datasets that currently exist in the portal:

  1. Aboriginal Service Providers
  2. Arenas
  3. Neighbourhoods of Winnipeg City
  4. Streets
  5. Economic Development Images
  6. Recreation & Leisure Images
  7. Neighbourhoods Images
  8. Volunteer Images
  9. Library Images
  10. Parks Images
  11. Census 2006
  12. Census 2001
  13. Winnipeg Internal Websites
  14. Winnipeg External Websites
  15. Heritage Buildings and Resources
  16. NOW Local Content Dataset
  17. Outdoor Swimming Pools
  18. Zoning Parcels
  19. School Divisions
  20. Property Addresses
  21. Wading Pools
  22. Electoral wards of Winnipeg City
  23. Assessment Parcels
  24. Libraries
  25. Community Centres
  26. Police Service Centers
  27. Community Gardens
  28. Leisure Centres
  29. Parks and Open Spaces
  30. Community Committee
  31. Commercial real estates
  32. Sports and Recreation Facilities
  33. Community Characterization Areas
  34. Indoor Swimming Pools
  35. Neighbourhood Clusters
  36. Fire and Paramedic Stations
  37. Bus Stops
  38. Fire and Paramedic Service Images
  39. Animal Services Images
  40. Skateboard Parks
  41. Daycare Nurseries
  42. Indoor Soccer Fields
  43. Schools
  44. Truck Routes
  45. Fire Stations
  46. Paramedic Stations
  47. Spray Parks Pads

Structured Search

The most useful feature of the portal, to me, is its full-text search engine. It is simple, clean and quite effective. The search engine is configured to return the most relevant results a NOW portal user may be searching for. For example, it positively biases results that come from specific datasets, or matches that occur in specific property values. The goal of this biasing is to improve the quality of the returned results. This is relatively easy to do since the context of the portal is well known and everything is fully structured, so the scoring of search results can be boosted accordingly.
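To make the idea concrete, here is a minimal, hypothetical sketch of such query-time biasing using Solr’s eDisMax parser (OSF’s full-text search is backed by a Solr index). The core name, field names, dataset values and boost factors below are assumptions for illustration, not the portal’s actual configuration:

# Hypothetical sketch: boost matches coming from specific fields and datasets.
# Core name, field names and boost values are illustrative assumptions.
curl -G "http://localhost:8983/solr/osf/select" \
  --data-urlencode "q=main street" \
  --data-urlencode "defType=edismax" \
  --data-urlencode "qf=prefLabel^5 description^2 altLabel^1" \
  --data-urlencode "bq=dataset:streets^3 dataset:property-addresses^2" \
  --data-urlencode "wt=json"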

Another major gain is that all the search results are fully templated. The search engine does not simply return a title and a short description for each match; it templates all the information the system has about the matched entity and displays the most relevant pieces directly in the search results.

For example, if I search for an indoor swimming pool, in most cases it is probably to call the front desk to get some information about the pool. This is why key information is displayed directly in the search results. That way, most users won’t even have to click on a result to get the information they were looking for.

Here is an example of a search for the keywords main street. As you can see, you get different kinds of results. Each result is templated to show the core information about the entity. You can focus on particular kinds of entities, or filter them by their location in specific neighbourhoods.

[Image: full-text search results for the keywords main street]

Templated Search Results

Now let’s look at some of the kinds of entities that can be searched on the portal and how they are presented to users.

Here is an example of an assessment parcel located in the St. John’s neighbourhood. The address, value, type and location of the parcel on a map are displayed directly in the search results.

[Image: templated search result for an assessment parcel]

Another kind of entity that can be searched is the property address. Each is located on a map, and the value of the parcel and building and the zoning of the address are displayed. The property is also linked to its assessment parcel entity, which can be clicked to get additional information about the parcel.

[Image: templated search result for a property address]

Streets are another interesting type of entity that can be searched. In this case you get the complete outline of the street directly on a map, so you know where it starts, where it ends and where it is located in the city.

[Image: templated search result for a street]

There are more than a thousand geo-localized images of different things in the city that can be searched. A thumbnail of the image and the location of the thing it depicts appear in the search results.

[Image: templated search result for a heritage building image]

If you are searching for a nursery for your newborn child, you can quickly see the name, location on a map and phone number of the nursery directly in the search result.

[Image: templated search result for a daycare nursery]

These are just a few examples of the roughly fifty different kinds of entities that can appear like this in the search results.

Mapping

The mapping tool is another powerful feature of the portal. You can search as if you were using the full-text search engine (the top search box on the portal), but you will only get results that can be geo-localized on a map. You can also simply browse the entities of a dataset, or filter entities by their properties and values. You can persist entities you find on the map and save the map for future reference.

In the example below, someone searched for a street (main street) and persisted it on the map. They then searched for other things, like nurseries, and selected the ones near the persisted street. That way they can visualize the entities known to the portal on a map to better understand where things are located in the city, what exists near a certain location or within a neighbourhood, and so on.

[Image: the mapping tool, with a persisted street and nearby entities]

Census Analysis

Census information is vital to the healthy development of a city. It is necessary for understanding the trends of a sector, who populates it, and so on, so that the city and other organizations can properly plan their projects to have as much impact as possible.

These are some of the reasons why one of the main sections of the site is dedicated to census data. Key census indicators have been configured in the portal. Users can select different kinds of regions (neighbourhood clusters, community areas and electoral wards) to get the numbers for each of these indicators, and can select multiple regions to compare them against each other. A chart view and a table view are available for presenting the census data.

[Image: the census analysis section]

History, Images & Points of Interest

The City took the time to write the history of each of its neighbourhoods. In addition, they hired professional photographers to photograph the points of interest of the city, geo-localize them and write a description for each photo. Because of this dedication, users of the portal can learn a lot about the city in general and about the neighbourhood they live in. This is what the History and Images sections of the website are about.

[Image: the neighbourhood history section]

Historic buildings are displayed on a map and they can be browsed from there.

[Image: heritage buildings displayed on a map]

Images of points of interest in the neighbourhood are also located on a map.

[Image: images of points of interest located on a map]

Find Your Neighbourhood

Ever wondered which neighbourhood you live in? No problem: go to the home page, enter your address in the Find your Neighbourhood section and you will know right away. From there you can learn more about your neighbourhood, such as its history and points of interest.

[Image: the Find your Neighbourhood search box]

Your address will be located on a map, and your neighbourhood will be outlined around it. Not only will you know which neighbourhood you live in, you will also know where you live within it. From there you can click the name of the neighbourhood to get to the neighbourhood’s page and start learning more about it: its history, photos of the points of interest that exist in it, and so on.

[Image: a Find your Neighbourhood result, with the neighbourhood outlined on a map]

Browsing Content by Topic

Because all the content of the portal is fully structured, it is easy to browse using a well-defined topic structure. The City developed its own ontology to help users browse the content of the portal by topic of interest. In the example below, I clicked the Economic Development node and then the Land use topic. Finally, I clicked the Map button to display things related to land use: in this case, zoning and assessment parcels.

This is another way to find meaningful and interesting content from the portal.

[Image: browsing content by topic]

Depending on the topic you choose and the kind of information related to it, you may end up with different options, such as a map or a list of links to related documents.
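Under the hood, this kind of topic-driven browsing amounts to querying the structured data by ontology class. As a purely illustrative sketch (the endpoint URL, ontology prefix and class URI are hypothetical, not the portal’s actual ones), retrieving all entities typed as zoning parcels over a SPARQL HTTP endpoint could look like this:

# Illustrative only: list entities of a given ontology class.
# The endpoint, prefix and class URI are hypothetical placeholders.
curl -G "http://localhost:8890/sparql" \
  -H "Accept: application/sparql-results+json" \
  --data-urlencode "query=
    PREFIX now:  <http://example.org/ontology/now#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?entity ?label
    WHERE { ?entity a now:ZoningParcel ; rdfs:label ?label . }
    LIMIT 25"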

Export Content

Now that I have given an overview of the main features of the portal, let’s go back to the geeky things. The first thing I said about this portal is that, at its core, all the information it manages is fully structured, integrated and linked data. When you are on the page of an entity, you can see the underlying data that exists about it in the system: simply click the Export tab at the top of the entity’s page and you will have access to the description of that entity in multiple formats.

[Image: the Export tab of an entity page]
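Because each entity is described as RDF, the same description could also be retrieved programmatically. Here is a hedged sketch of such a request using HTTP content negotiation; the entity URI is invented for illustration, and the Export tab remains the officially supported path:

# Hypothetical sketch: fetch an entity's RDF description in Turtle via
# content negotiation. The entity URI below is a made-up placeholder.
curl -L -H "Accept: text/turtle" \
  "http://example.org/now/datasets/assessment-parcels/parcel-12345"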

In the future, I hope the City will make the whole set of datasets fully downloadable; right now that information is only accessible through the per-entity export feature. I say “hope” because the NOW portal is completely disconnected from another initiative by the City, data.winnipeg.ca, which uses Socrata. The problem is that barely any of the NOW datasets are available on data.winnipeg.ca, and the ones that do appear are the raw versions (semi-structured, undocumented, unintegrated and unlinked). None of the normalization, integration and linkage work done by the NOW team has been leveraged to improve the data.winnipeg.ca dataset catalog.

New With the Upgrade

Those who are familiar with the NOW portal will notice a few changes. The user interface did not change that much, but many little things were improved in the process. I will cover the most notable of these changes.

The major changes happened in the backend of the portal. Data management in OSF for Drupal 7 is incompatible with what was available in Drupal 6, and there is no comparison between the two: managing entities became easier, configuring OSF networks became a breeze, a revisioning system was added, the user interface is more intuitive, and so on. However, portal users won’t notice any of this, since these are all site administrator functions.

The first thing users will notice is the completely revamped full-text search. The underlying search engine is almost the same, but the presentation is far better. Every entity type got its own template, which is displayed in a special way in the search results. Results should usually be much more relevant, and filtering is easier and cleaner. The search experience is much better in my view.

The overall site performance is much better since different caching strategies have been put in place in OSF 3.x and OSF for Drupal. This means that most of the features of the portal should react more swiftly.

Now every type of entity managed by the portal is templated: its webpage is templated in specific ways to optimize the information it conveys to users, along with its search result “mini page” when it is returned as the result of a search query.

Multi-linguality is now fully supported by the portal; however, not everything is currently translated. Expect a fully translated French version of the NOW portal in the future.

Creating a Network of Portals

One of the most interesting features that comes with this upgrade is that the NOW portal is now in a position to participate in a network of OSF instances. What does that mean? It means that the NOW portal could create partnerships with other local (or regional, national or international) organizations to share datasets (and their maintenance costs).

Are there other organizations that use this kind of system? There is at least one other right in Winnipeg: MyPeg.ca, also developed by Structured Dynamics. MyPeg uses RDF to model its information and OSF to manage it. MyPeg is a non-profit organization that uses census (and other indicator) data to study the well-being of Winnipeggers. The team behind MyPeg.ca are research experts in indicator data, and their indicator datasets (which include census data) are top notch.

Let’s hypothesize that there is interest between the two groups in collaborating. Say the NOW portal would like to use MyPeg’s census datasets instead of its own, since they are more complete and accurate and include a larger number of important indicators. What the NOW team basically wants is to outsource the creation and maintenance of the census/indicator data to a local, dedicated and highly professional organization. The only things they would need to do are:

  1. Formalize the relationship by signing a usage agreement
  2. Configure the MyPeg.ca OSF network in the NOW portal’s OSF for Drupal instance
  3. Register the datasets the NOW portal wants to use from MyPeg.ca

Once these three steps are done, which takes no more than a couple of minutes, the system administrators of the NOW portal could start using the MyPeg.ca indicator datasets as if they existed on their own network. (The reverse could also be true for MyPeg.) Everything would be transparent to them. From then on, all the fixes and updates performed by MyPeg.ca on their indicator datasets would immediately appear on the NOW portal and be accessible to its users.

This is one way to collaborate. Another possibility would be to share the serialized datasets on a routine basis (every month, every six months, every year) so that the NOW portal can re-import the datasets from the files shared by MyPeg.ca. This is possible because both organizations use the same ontology to describe the indicator data, which means no modification is required by the City to take the new information into account: they only have to import the files and update their local datasets. This is the beauty of ontologies.
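As a minimal sketch of what such a routine re-import could look like, assuming MyPeg.ca shares the dataset as a Turtle file and the receiving triple store exposes the SPARQL 1.1 Graph Store HTTP Protocol (the URLs, endpoint path and graph name below are placeholders; in practice OSF’s Datasets Management Tool would normally handle this step):

# Placeholder sketch: refresh a local copy of a shared dataset using the
# SPARQL 1.1 Graph Store HTTP Protocol. URLs and graph name are invented.
wget https://example.org/mypeg/exports/census-indicators.ttl

curl -X PUT -H "Content-Type: text/turtle" \
  --data-binary @census-indicators.ttl \
  "http://localhost:8890/rdf-graph-store?graph=urn:now:census-indicators"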

Conclusion

The new NOW portal is a great service for the citizens of Winnipeg. It is also a very good example of a web portal that leverages fully structured, integrated and linked data. To me, the NOW portal demonstrates the kind of features that should accompany a municipal data portal.

Open Semantic Framework 3.3 Released

Structured Dynamics is happy to announce the immediate availability of the Open Semantic Framework version 3.3. This new release of OSF lets system administrators choose between two different communication channels for sending SPARQL queries to the triple store:
  1. HTTP
  2. ODBC

In OSF 3.1, the only communication channel available was an ODBC channel using the iODBC drivers. In OSF 3.2, the only channel available was HTTP. With OSF 3.3, the system administrator can choose between the two.

Quick Introduction to the Open Semantic Framework

What is the Open Semantic Framework?

The Open Semantic Framework (OSF) is an integrated software stack using semantic technologies for knowledge management. It has a layered architecture that combines existing open source software with additional open source components. OSF is designed as an integrated content platform accessible via the Web, which provides needed knowledge management capabilities to enterprises. OSF is made available under the Apache 2 license.

OSF can integrate and manage all types of content – unstructured documents, semi-structured files, spreadsheets, and structured databases – using a variety of best-of-breed data indexing and management engines. All external content is converted to the canonical RDF data model, enabling common tools and methods for tagging and managing all content. Ontologies provide the schema and common vocabularies for integrating across diverse datasets. These capabilities can be layered over existing information assets for unprecedented levels of integration and connectivity. All information within OSF may be powerfully searched and faceted, with results datasets available for export in a variety of formats and as linked data.

Why Multiple Channels in OSF?

Historically, OSF only used an ODBC channel to communicate with Virtuoso, via the iODBC drivers. As explained in a previous blog post, using the iODBC drivers on Ubuntu added a lot of complexity to the system, since we had to recompile most of the PHP packages against that ODBC driver.

With OSF 3.2, we refactored the code so that we could query any SPARQL HTTP endpoint. The goal of that improvement was to be able to use OSF with any triple store that exposes a compatible SPARQL HTTP endpoint, not just Virtuoso.

With OSF 3.3, we chose to make both options possible. We also made sure that the latest version of Virtuoso works properly with the unixODBC drivers, which ship by default with Ubuntu.

This means that people can now use the ODBC channel with the unixODBC drivers instead. The end result is that maintaining an Ubuntu/OSF instance is much easier, since no packages are on hold and the PHP5 packages can be updated at any time without needing to be recompiled against the iODBC drivers.
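To picture the difference between the two channels, here is a hedged sketch of how a system administrator might smoke-test each one on an OSF server; the endpoint URL, DSN name and credentials are assumptions that depend on the local setup:

# HTTP channel: send a trivial SPARQL query to the triple store's endpoint
# (Virtuoso's default SPARQL endpoint is usually http://localhost:8890/sparql).
curl -G "http://localhost:8890/sparql" \
  -H "Accept: application/sparql-results+json" \
  --data-urlencode "query=SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"

# ODBC channel: run the same query through unixODBC's isql tool, using a
# DSN defined in /etc/odbc.ini (the DSN name "VOS" and the dba credentials
# are examples only).
echo "SPARQL SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o };" | isql VOS dba dba -b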

Deploying a New OSF 3.3 Server

Using the OSF Installer

OSF 3.3 can easily be deployed on an Ubuntu 14.04 LTS server using the osf-installer application. The deployment is done by executing the following commands in your terminal:

mkdir -p /usr/share/osf-installer/

cd /usr/share/osf-installer/

wget https://raw.github.com/structureddynamics/Open-Semantic-Framework-Installer/3.3/install.sh

chmod 755 install.sh

./install.sh

./osf-installer --install-osf -v

Using an Amazon AMI

If you are an Amazon AWS user, you also have access to a free AMI that you can use to create your own OSF instance. The full documentation for using the OSF AMI is available here.

Upgrading Existing Installations

It is not possible to automatically upgrade previous versions of OSF to OSF 3.3; an older OSF instance can only be upgraded manually. If you have this requirement, just let me know and I will write about the steps required to upgrade such instances to OSF version 3.3.

Conclusion

This new version of the Open Semantic Framework should be even simpler to install, deploy and maintain. Several additional small updates in this release also make other aspects of the installation simpler and faster.

Open Semantic Framework 3.2 Released

Structured Dynamics is happy to announce the immediate availability of the Open Semantic Framework version 3.2. This is the second important OSF release in a month and a half.

This new major release of OSF changes the way the web services communicate with the triple store. Originally, the OSF web services used an ODBC channel to communicate with the triple store (Virtuoso). This new release uses the triple store’s SPARQL HTTP endpoints to send queries to it. This is the only change in this new version but, as you will see below, it is a major one.

Why Switch to HTTP?

The problem with using ODBC as the primary communication channel between the OSF web services and the triple store is that it added a lot of complexity to OSF. Because the unixODBC drivers shipped with Ubuntu had issues with Virtuoso, we had to use the iODBC drivers to make sure everything worked properly. This forced us to recompile PHP5 so that it used iODBC instead of unixODBC as its ODBC driver.

This greatly complicated the deployment of OSF, since we couldn’t use the default PHP5 packages that ship with Ubuntu and had to maintain our own packages built against iODBC.

The side effect was that system administrators couldn’t upgrade their Ubuntu instances normally, since PHP5 had to be upgraded using the particular packages created for that purpose.

Now that OSF doesn’t use ODBC to communicate with the triple store, all this complexity goes away, since no special handling is required. All of the default Ubuntu packages can be used as system administrators normally would.

With this new version, the installation and deployment of an OSF instance is greatly simplified.

Supports New Triple Stores

Another problem with using ODBC was that it limited the number of triple stores that could be used to operate OSF. In practice, people could only use Virtuoso with their OSF instance.

This new release opens new opportunities. OSF still ships with Virtuoso Open Source as its default triple store; however, any triple store with the following characteristics could replace Virtuoso in OSF (a quick compatibility check is sketched after the list):

  1. It has a SPARQL HTTP endpoint
  2. It supports SPARQL 1.1 and SPARQL Update 1.1
  3. It supports SPARQL Update queries that can be sent to the SPARQL HTTP endpoint
  4. It supports the SPARQL 1.1 Query Results JSON Format
  5. It supports the SPARQL 1.1 Graph Store HTTP Protocol via an HTTP endpoint (optional; only required by the Datasets Management Tool)
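Here is a rough, hypothetical way to check most of these characteristics against a candidate triple store from the command line; the endpoint URLs and paths are placeholders and vary by product:

# Items 1, 2 and 4: a SPARQL 1.1 query over HTTP, asking for JSON results.
curl -G "http://triplestore.example.org/sparql" \
  -H "Accept: application/sparql-results+json" \
  --data-urlencode "query=SELECT * WHERE { ?s ?p ?o } LIMIT 1"

# Item 3: a SPARQL 1.1 Update sent to the (update-enabled) endpoint.
curl "http://triplestore.example.org/sparql" \
  --data-urlencode "update=INSERT DATA { <urn:test:s> <urn:test:p> \"ok\" }"

# Item 5: the SPARQL 1.1 Graph Store HTTP Protocol (optional).
curl -H "Accept: text/turtle" \
  "http://triplestore.example.org/rdf-graph-store?graph=urn:test:g"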

Deploying a New OSF 3.2 Server

Using the OSF Installer

OSF 3.2 can easily be deployed on an Ubuntu 14.04 LTS server using the osf-installer application, by executing the following commands in your terminal:

mkdir -p /usr/share/osf-installer/

cd /usr/share/osf-installer/

wget https://raw.github.com/structureddynamics/Open-Semantic-Framework-Installer/3.2/install.sh

chmod 755 install.sh

./install.sh

./osf-installer --install-osf -v

Using an Amazon AMI

If you are an Amazon AWS user, you also have access to a free AMI that you can use to create your own OSF instance. The full documentation for using the OSF AMI is available here.

Upgrading Existing Installations

It is not possible to automatically upgrade previous versions of OSF to OSF 3.2; an older OSF instance can only be upgraded manually. If you have this requirement, just let me know and I will write about the steps required to upgrade such instances to OSF version 3.2.

Security

Now that the triple store’s SPARQL HTTP endpoint needs to be enabled with SPARQL Update rights, it is more important than ever to make sure that this endpoint is only accessible to the OSF web services.

This can be done by properly configuring your firewall or proxy such that only local traffic, or traffic coming from the OSF web service processes, can reach the endpoint.
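For example, on an Ubuntu server a minimal iptables-based restriction could look like the sketch below; it assumes the triple store listens on Virtuoso’s default port 8890, so adapt the port and allowed sources to your own setup:

# Allow local traffic to the triple store's HTTP port, drop everything else.
# 8890 is Virtuoso's default HTTP/SPARQL port; adjust for your triple store.
iptables -A INPUT -p tcp --dport 8890 -s 127.0.0.1 -j ACCEPT
iptables -A INPUT -p tcp --dport 8890 -j DROP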

The SPARQL endpoint that should be exposed to the outside world is OSF’s own SPARQL endpoint, which adds an authentication layer on top of the triple store’s endpoint and restricts potentially harmful SPARQL queries.

Conclusion

This new version of the Open Semantic Framework greatly simplifies its deployment and maintenance. It also makes it possible to use other triple stores on the market with OSF instead of Virtuoso Open Source.



