Web Page Analysis With Cognonto

Extract Structured Content, Tag Concepts & Entities

 

Cognonto is brand new. At its core, it uses a structure of nearly 40,000 concepts. It has about 138,000 links to external classes and concepts that define huge public datasets such as Wikipedia, DBpedia and USPTO. Cognonto is not a children’s toy. It is huge and complex… but it is very usable. Before digging into the structure itself, and before writing about all the use cases that Cognonto can support, I will first cover the tools that currently exist to help you understand Cognonto and its conceptual structure and linkages (called KBpedia).

The embodiment of Cognonto that people can see is the set of tools we created and made available on the cognonto.com web site. Their goal is to show the structure at work: what ties where, how the conceptual structure and its links to external schemas and datasets help discover new facts, how it can drive other services, etc.

This initial blog post will discuss the demo section of the web site. What we call the Cognonto demo is a web page crawler that analyzes web pages to tag concepts, tag named entities, extract structured data, detect language, identify topics, and so forth. The demo uses the KBpedia structure and its linkages to Wikipedia, Wikidata, Freebase and USPTO to tag content that appears in the analyzed web pages. But there is one thing to keep in mind: the purpose of Cognonto is to link public or private datasets to the structure to expand its knowledge and make these tools (like the demo) even more powerful. This means that a private organization could use Cognonto, add its own datasets and link its own schemas, to improve its own version of Cognonto or to tailor it for its own purpose.

Let’s see what the demo looks like, what information it extracts and analyzes from any web page, and how it ties into the KBpedia structure.


Cognonto

I am proud to announce the start of a new venture called Cognonto. I am particularly proud of it because even if it is just starting, it is in fact more than eight years old. It is the embodiment of eight years of research, of experimentation, of a great deal of frustration and of great joy with my long-time partner Mike.

Eight years ago, we set a 5-to-10-year vision for our work as partners. We defined an initial series of technological goals for which we outlined a series of yearly milestones. The goals were aimed at helping to solve decades-old problems with data integration and interoperability using what was, at the time, a completely new research field: the Semantic Web.

And here we are eight years later, after working endless hours on all kinds of projects and services to pay for the research and the pieces of technology we developed for these purposes. Cognonto is the embodiment of that effort, but the effort also produced a series of other purposeful projects such as Structured Dynamics, UMBEL, the Open Semantic Framework and a series of other open source collaterals.

We spent eight years creating, sanitizing, making coherent and consistent, and generating and regenerating a conceptual structure that now has 38,930 reference concepts with 138,868 mapping links to 27 external schemas, vocabularies and datasets. This led to the creation of KBpedia, which is the knowledge graph that drives Cognonto. The full statistics are available here.

I can’t thank Mike enough for this long and wonderful journey that led to the creation of Cognonto. I sent him an endless number of concept lists that he diligently screened, assessed and mapped. We spent hundreds of hours discussing the nuts and bolts of the structure, arguing about its core concepts and how they should be defined and used. It was not without pain, but I believe that the result is truly astonishing.

I won’t copy/paste the Cognonto press release here; a link will suffice. It is just not possible for me to write a better introduction than the two-pager that Mike wrote for the press release. I would also suggest that you read his Cognonto introduction blog post: Cognonto is on the Hunt for Big AI Game.

In the coming weeks, I will write a lot about Cognonto: what it is, how it can be used, what its use cases are, how the information presented in the demo and knowledge graph sections should be interpreted, and what these pages tell you.

Winnipeg City’s NOW [Data] Portal

The Winnipeg City’s NOW (Neighbourhoods Of Winnipeg) Portal is an initiative to create a complete neighbourhood web portal for its citizens. At the core of the project is a set of about 47 fully linked, integrated and structured datasets of things of interest to Winnipeggers. The focal point of the portal is Winnipeg’s 236 neighbourhoods, which define the main structure of the portal. The portal has six main sections: topics of interest, maps, history, census, images and economic development. The portal is meant to be used by citizens to find things of interest in their neighbourhood, to learn its history, to see images of those things of interest, to find tools to help economic development, etc.

The NOW portal is not new; Structured Dynamics was also its main technical contractor for its first release in 2013. However, we just finished helping Winnipeg City’s NOW team migrate their older NOW portal from OSF 1.x to OSF 3.x and from Drupal 6 to Drupal 7; we also trained them on the new system. Major improvements accompany this upgrade, but the user interface design is essentially the same.

I will first introduce each major section of the portal and explain its main features. Then I will discuss the portal’s new improvements.



New UMBEL 1.50 Ships With 20 Linked Ontologies

I am proud to announce the immediate release of UMBEL version 1.50. This is a major effort that took a year to release.

What is UMBEL?

Let’s start by explaining what UMBEL is for those who have never encountered this project before. UMBEL stands for “Upper Mapping and Binding Exchange Layer”. It is a conceptual structure that is designed to help content interoperate between systems.

UMBEL is a coherent general structure of 34,000 reference concepts that provides a scaffolding to link and interoperate other datasets and domain vocabularies. The conceptual structure is organized into 31 mostly disjoint SuperTypes.

UMBEL is written in OWL 2 and SKOS.


Open Semantic Framework 3.3 Released

Structured Dynamics is happy to announce the immediate availability of the Open Semantic Framework version 3.3. This new release of OSF lets system administrators choose between two different communication channels to send SPARQL queries to the triple store:
  1. HTTP
  2. ODBC

In OSF 3.1, the only communication channel available was an ODBC channel using the iODBC drivers. In OSF 3.2, the only communication channel available was an HTTP channel. With OSF 3.3, we let the system administrator choose between the two.

Quick Introduction to the Open Semantic Framework

What is the Open Semantic Framework?

The Open Semantic Framework (OSF) is an integrated software stack using semantic technologies for knowledge management. It has a layered architecture that combines existing open source software with additional open source components. OSF is designed as an integrated content platform accessible via the Web, which provides needed knowledge management capabilities to enterprises. OSF is made available under the Apache 2 license.

OSF can integrate and manage all types of content – unstructured documents, semi-structured files, spreadsheets, and structured databases – using a variety of best-of-breed data indexing and management engines. All external content is converted to the canonical RDF data model, enabling common tools and methods for tagging and managing all content. Ontologies provide the schema and common vocabularies for integrating across diverse datasets. These capabilities can be layered over existing information assets for unprecedented levels of integration and connectivity. All information within OSF may be powerfully searched and faceted, with results datasets available for export in a variety of formats and as linked data.

Why Multiple Channels in OSF?

Historically, OSF only used the ODBC channel to communicate with Virtuoso, and it was using the iODBC drivers. As explained in a previous blog post, using the iODBC drivers on Ubuntu added a lot of complexity to the system, since we had to recompile most of the PHP packages to use that other ODBC driver.

With OSF 3.2, we refactored the code such that we could query any SPARQL HTTP endpoint. The goal of that improvement was to be able to use any triple store that has a compatible SPARQL HTTP endpoint with OSF, and not just Virtuoso.
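To make the HTTP channel a bit more concrete, here is a minimal sketch of what querying a SPARQL HTTP endpoint looks like from the shell. This is not OSF's own code. The endpoint URL (Virtuoso's usual default, http://localhost:8890/sparql) and the helper name sparql_select are assumptions made for illustration. The sketch only relies on the standard SPARQL 1.1 Protocol, where the query is sent with an HTTP POST as a form-encoded "query" parameter.

```bash
#!/usr/bin/env bash
# Assumed endpoint (Virtuoso's usual default); this is NOT read from OSF's
# configuration. Override it to point at any SPARQL 1.1 HTTP endpoint.
SPARQL_ENDPOINT="${SPARQL_ENDPOINT:-http://localhost:8890/sparql}"

# sparql_select QUERY   (hypothetical helper, not part of OSF)
# POSTs QUERY to the endpoint as a form-encoded "query" parameter and
# prints the response, requested as SPARQL JSON results, on stdout.
sparql_select() {
  local query="$1"
  curl --silent --show-error --fail \
       --header 'Accept: application/sparql-results+json' \
       --data-urlencode "query=${query}" \
       "${SPARQL_ENDPOINT}"
}

# Usage (needs a running endpoint):
#   sparql_select 'SELECT * WHERE { ?s ?p ?o } LIMIT 10'
```

Because nothing in this request is specific to Virtuoso, the same call should work against any triple store that exposes a compatible endpoint, which is the point of the HTTP channel.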

With OSF 3.3, we chose to make both options available. In addition, we made sure that the latest version of Virtuoso now works properly with the unixODBC drivers, which are shipped by default with Ubuntu.

This means that people can now use the ODBC channel, but with the unixODBC drivers instead. The end result of this enhancement is that maintaining an Ubuntu/OSF instance is much easier: no packages are on hold, and the PHP5 packages can be updated at any time without needing to be recompiled against the iODBC drivers.
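If you want to confirm that an instance is in this state, the following sketch may help. It assumes a stock Ubuntu host with apt and unixODBC installed. The helper name filter_php_holds is hypothetical, while apt-mark showhold and odbcinst are the standard apt and unixODBC commands.

```bash
#!/usr/bin/env bash
# filter_php_holds   (hypothetical helper, not part of OSF)
# Reads package names on stdin, one per line, and prints only those that
# start with "php". Prints nothing, and still returns 0, when none match.
filter_php_holds() {
  grep '^php' || true
}

# Usage on an Ubuntu host:
#   apt-mark showhold | filter_php_holds   # empty output: no PHP package on hold
#   odbcinst -j                            # unixODBC version and config file paths
#   odbcinst -q -d                         # ODBC drivers registered with unixODBC
```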

Deploying a New OSF 3.3 Server

Using the OSF Installer

OSF 3.3 can easily be deployed on an Ubuntu 14.04 LTS server using the osf-installer application. The deployment is done by executing the following commands in your terminal:

[cc lang="bash"]
[raw]
mkdir -p /usr/share/osf-installer/

cd /usr/share/osf-installer/

wget https://raw.github.com/structureddynamics/Open-Semantic-Framework-Installer/3.3/install.sh

chmod 755 install.sh

./install.sh

./osf-installer --install-osf -v
[/raw]
[/cc]
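Since the installer targets Ubuntu 14.04 LTS, it can be worth checking the release before running the commands above. The following is a minimal sketch of such a pre-flight check, not something the installer itself is known to do. The function name is_ubuntu_1404 is hypothetical, and the check assumes the host provides the standard /etc/os-release file.

```bash
#!/usr/bin/env bash
# is_ubuntu_1404 [FILE]   (hypothetical helper, not part of osf-installer)
# Returns 0 when FILE (default: /etc/os-release) describes Ubuntu 14.04,
# and a non-zero status otherwise, including when FILE cannot be read.
is_ubuntu_1404() {
  local file="${1:-/etc/os-release}"
  [ -r "$file" ] || return 1
  (
    # Source the file in a subshell so its variables do not leak
    # into the calling shell.
    . "$file"
    [ "${ID:-}" = "ubuntu" ] && [ "${VERSION_ID:-}" = "14.04" ]
  )
}

# Usage:
#   if is_ubuntu_1404; then
#     echo "Ubuntu 14.04 detected"
#   else
#     echo "Warning: this host is not Ubuntu 14.04" >&2
#   fi
```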

Using an Amazon AMI

If you are an Amazon AWS user, you also have access to a free AMI that you can use to create your own OSF instance. The full documentation for using the OSF AMI is available here.

Upgrading Existing Installations

Previous versions of OSF cannot be upgraded to OSF 3.3 automatically; an older instance can only be upgraded manually. If you have this requirement, just let me know and I will write about the steps required to upgrade these instances to OSF version 3.3.

Conclusion

This new version of the Open Semantic Framework should be even simpler to install, deploy and maintain. Several additional small updates have also been provided in this new version to make other aspects of installation simpler and faster.