Open Sources Projects As A Pool Of Resources

In a previous blog post, I wrote about how Open Source may be unnatural, and even counter intuitive, to many people. However, that really begs some questions evident with my current company’s strategy.

Why have Mike Bergman and I chosen to develop no less than three major open source projects (structWSF, conStruct and the Semantic Components), encompassing more than 100 000 lines of new code and leveraging between 30 to 50 other open source software and libraries? Why have we open sourced all our software? Why has open source formed the core business strategy of Structured Dynamics in the last three years? How have we been able to profitably sustain the company, even in the midst of the global economic crisis that began in 2008?

I will try to answer these questions in this blog post, perhaps even providing some guidance for newer startups that may follow behind us.

Why Open Sourcing?

Why did Structured Dynamics chose to open source all of its software? There are multiple reasons why people and businesses choose to go open source. For some, it is because they think that it is where the market place is moving. For others it is because they think that a community will emerge around their effort, and then get free resources that improve the piece of software. Some think that their software will promptly be reviewed by professional programmer. Others may think that their system will become more secure. Etc.

For Structured Dynamics the reason why we choose to go open source is somewhat different:

We perceived that by open sourcing our complete software stack we could bootstrap the company without any external investment.

Making a Living out of Open Source Projects

There are multiple ways to do a living from an open source project:

  • Doing consultancy work related to the project
  • Implementing the software(s) into clients’ computer environment(s)
  • Selling training classes
  • Selling support contracts
  • Selling maintenance contracts
  • Selling hosted instances of the software (the SaaS model for one)
  • Selling development time to improve some part(s) of the software
  • Creating conferences around their open source projects
  • Selling proprietary extensions
  • I am probably missing a few, so please add them in a comment section below, and I will make sure to add them to this list.

Depending on the software you are developing, and depending on the business plan of your company, you may be doing one — or multiple — of these things to generate some money from your open source projects.

At Structured Dynamics we are doing some of them: we do get consultancy contracts related to the Open Semantic Framework and we do implement OSF in our clients’ computer environments.

But, more importantly, we are also doing development contracts related to the framework. In fact, each project we are working on is quite different. Our major projects involve companies that reside in totally different domains, have different needs and need to accommodate different kinds of users. However, most of the projects share the same core needs, and all of them advance the core technology in ways meaningful to our vision. We choose our customers — and , of course, vice versa — based on a true sense of partnership wherein both parties have their objectives furthered.

Let’s see how we use these relationships to drive the development of the Open Semantic Framework.

Open Source Project as a Pool of Resources

In the last three years, Structured Dynamics has attracted multiple companies and organizations that share our vision, and which are willing to invest in the Open Semantic Framework open source project. (See Mike’s recent post on business development for a bit more on that aspect of things.) Each of these clients did want to use the OSF framework for their own needs. However, each of them did want to do something special that was not currently implemented in the framework.

What we created in these three years is a pool of resources that we used to develop the framework such that it accommodates the needs of each of our clients. Each of our clients then becomes a participant to the shared pool of innovation. Our clients have been willing to invest in the open source framework because they need their own features and because they know that they will benefit from what other participants of the pool will invest themselves down the road.

In that scenario, we are the managers of a pool of resources. We have the vision of where we want the framework to go, we know the roadmap of the project and we know the needs of each participant (our clients). What we do is to try to optimize the resources we get from each of our clients by developing the framework such that it can accommodate as broad of a spectrum of participants as possible. Then, we seek to find new participants that have some needs that will help us continue to develop the next steps of the roadmap. In this manner, we Jacob’s Ladder our existing work to increase the capabilities for later clients, but earlier clients still benefit because they can upgrade to the later improvements. This is a self-sustaining model to continue to move the development of the framework forward.

By finding new clients, what we do is to give a return on investment to the other pool participants. Most of the new features that we develop for these new clients will benefit the other participants to the pool and will create new possibilities for them without any additional investment. All of our first clients have implemented what other participants later invest into the pool, thus crystallizing and augmenting their return on investment by using these new features.

Open Source is Not Just About Software

Open Source is not just about pieces of code, and this is quite important to understand. What we have open sourced with the Open Semantic Framework is much more than a series of code sources. We open sourced the entire framework:

  1. The source codes
  2. The documentation
  3. The processes
  4. The methodologies

We term this comprehensive approach our total open solution.

This distinction with other open source projects is an essential differentiator with our approach. We choose to open source all of the pieces related to the framework. What drove this decision is a simple sentence that shows our philosophy behind it:

“We’re Successful When We’re Not Needed”

If the APIs, processes and methodologies are not properly documented, it means that we would certainly be needed by our clients, which would mean that we failed to open source our solution. But since we are working to open source our code, our processes and our methodologies, we are on the way to successfully open source the Open Semantic Framework since we won’t be needed by our clients.

This business approach is not as crazy as it sounds. We are free to work on new and important innovations, and are not basing our company culture on dependency and a constant drain by our customers. I know, it does not sound like Larry Ellison, but sounds good to us and our clients. It is certainly not a maximum revenue objective built on the backs of individual clients.

Our life is more fun and our clients trust us with new stuff. Further, each step of the way, we are able to leverage our own framework for unbelievable productivity in what we deliver for the money. But that is a topic for another day.

We think Structured Dynamics’ business approach is a contemporary winning strategy. Our customers get good and advanced capabilities at low cost and risk, while we get to work on innovative extensions that are raising the semantic baseline for the marketplace. Who knows if we will always continue this path, but for now it is leading to sustained development and market growth for open semantic frameworks, including our own OSF.

 

 

Volkswagen’s RDF Data Management Workflow

TribalDDB UK’s team just published a new case study to the W3C: Case Study: Contextual Search for Volkswagen and the Automotive Industry. They discuss the benefits of some of the semantic web technologies, techniques and concepts that they use to help them managing their data. They describe their approach and outline their design. It covers the technical aspects of their new Semantic Web Platform that I wrote about a few weeks ago.

In this blog post, I want to further explain their data management workflow, and how their data get exposed to different kind of users.

Two Classes of Users

Let’s take a look at their data ingest/management/publishing workflow:

As you can see, all their data get collected, transformed and imported into structWSF. As I explained in my previous blog post, they are using structWSF to manage all their RDF data and access all the functionalities from the different web service endpoints.

However, how the data get exposed to the users is not that clear. In fact, it depends on the classes of users. A user can be multiple different things: it may be a person, it may be a computer software, it may be an organization, etc. However, there are two general classes of users:

  1. Public users, and
  2. Private users

Public users are users that have no direct relation with Volkswagen and that have no access to their internal network. Private users are generally internal departments or some internal software applications that have direct access to the structWSF instance.

Private Users

Private users generally have access to all structWSF web service endpoints. This means that all structWSF functionalities are accessible to them by querying the endpoints.

Two different kind of private users are specified in the use case’s schema:

  1. Volkswagen Site Search
  2. Other / External Applications

The Volkswagen site search is a software application that uses the structWSF Search endpoint to search, filter and expose their data to their users (the people who perform searches on the Volkswagen UK website).

The other/external applications are software applications that have access to the structWSF instance. These are generally internal applications that run in the same network. One of these applications is an internal software that exports all the RDF data from the structWSF SPARQL endpoint, and import it into Kasabi.

These are two examples of software applications that Volkswagen created around the structWSF web services to re-purpose, re-contextualize and re-publish their RDF data.

Public Users

There is currently two kinds of public users of this new Volkswagen Semantic Platform:

  • People, and
  • Software applications

Two interfaces have been made publicly available for each of these kinds of users:

  • A website search engine page for people, and
  • A SPARQL endpoint for software applications

When a person user reaches the website’s search page, the search query get sent to the structWSF Search web service endpoint. The result is then returned to the engine, get templated and displayed to the person user.

A SPARQL endpoint is accessible to the software applications. This endpoint is hosted by the Kasabi information marketplace. Volkswagen chooses to export everything from their structWSF into Kasabi to outsource the maintenance of their public SPARQL endpoint.

Unlock the Power

As we saw in this blog post and in the W3C use case, all Volkswagen UK data is internally managed by structWSF; however they are not locked into that system. They can easily communicate with external services to add new functionalities to their stack or to take business decision such as outsourcing the management of some publicly accessible data access endpoints.

This is an important characteristic of their design:

By choosing semantic web technologies (such as structWSF), techniques and concepts (such as their Vehicles OWL Ontology and RDF), they are not locking themselves into a specific framework. They can easily communicate with external systems and applications. This means that they can quickly adapt their system to their constantly changing needs.

Conclusion

I wrote this blog post to further explain Volkswagen’s data management workflow. I wanted to make sure that people were understanding the role that structWSF has in this use case, and the ecosystem it operates in.

One of Semantic Web’s Core Added Value

If I ask the question: “What added value(s) does the Semantic Web brings on the table?”. So, what are the benefits that companies and organizations would get from using the Semantic Web? I am pretty sure that after asking this question, I would get answers such as:
  • You will instantly be able to traverse graphs of relationships
  • You will be able to infer facts (so create/persist new knowledge) from other existing facts
  • You will be able to check to make sure that your knowledge base is consistent and satisfiable
  • You will be able to modify your ontologies/vocabularies/schemas without impacting the description of your instance records or the usability of any software that use it (unlike relation databases)
  • And so on…

All these answers would be accurate. However, what if these answers would only be a part of the real added value that the Semantic Web brings on the table?

Note: when I refer to the “Semantic Web” on this blog post (and across all my writings), I refer to a set of technologies, techniques and concepts referred as the Semantic Web. So it is not a single thing, but a complete set of things that creates new ways of working with, and manipulating, information.

Strong of about 7 years of research and development of Semantic Web technologies that includes about 3 years of developing the Open Semantic Framework, that the biggest added value that I found from utilizing Semantic Web technologies is only partially related to these answers. In fact the biggest added value for me, as a developer can be resumed in one word:

PRODUCTIVITY

As simple as this. The biggest added value I gained from using and applying Semantic Web related technologies, techniques and concepts is an important increase in development, and data integration productivity.

Such productivity gain as to do with one of Semantic Web’s core attribute:

FLEXIBILITY

This is what I was suggesting in my latest blog post about Volkswagen’s use of the Open Semantic Framework: how Volkswagen uses the Open Semantic Framework to get flexibility that will lead to a gain in productivity to integrate, publish, and re-contextualize their data assets. The few gains that I listed above are part of the reason why the Semantic Web gives you flexibility that leads to an increase in productivity.

This same point as been re-affirmed today by Lee Feigenbaum in its latest blog post Saving Months, Not Milliseconds: Do More Faster with the Semantic Web:

Why is this? Ultimately, it’s because of the inherent flexibility of the Semantic Web data model (RDF). This flexibility has been described in many different ways. RDF relies on an adaptive, resilient schema (from Mike Bergman); it enables cooperation without coordination (from David Wood via Kendall Clark); it can be incrementally evolved; changes to one part of a system don’t require re-designs to the rest of the system. These are all dimensions of the same core flexibility of Semantic Web technologies, and it is this flexibility that lets you do things fast with the Semantic Web.

Warning: Productivity is not synonymous with simplicity

However, I would warn people that think that productivity gains are possible because semantic web technologies are simpler to use, manage and implement than other existing technologies.

It is certainly not the case, and I don’t think it will ever be. Semantic Web technologies, techniques and concepts are not easy to understand, and have a big learning curve. This is partly true because these techniques, technologies and concepts are relatively new in the field of the computer sciences, and because they are not fully understood, defined, implemented and used.

Volkswagen’s Use of structWSF in their Semantic Web Platform

TribalDDB London, Volkswagen UK‘s partner, mentioned earlier this week that Volkswagen are using some parts of the Open Semantic Framework to develop the next generation of their online platform.

This story has been published by Jennifer Zaino’s in her article: Volkswagen: Das Auto Company is Das Semantic Web Company!

I can now talk about this project that uses some pieces of the framework that we have been developing for more than 3 years now.

The Objective

Volkswagen’s main objective behind the development of the next version of their Web platform started by improving their online search engine, but as William Greenly mentioned, it quickly became a strategic decision:

“So the objectives were about site search and improving it, but in the long-run it was always the idea to contextualize content, to facet content, to promote it in different contexts.”

The objective is to create a platform that gives them the flexibility to leverage all the data assets they own. This flexibility will help them to leverage the data assests they have to improve not only their search engine, but also to contextualize it in different parts of their websites, partner’s websites or to promote, and publish that same information on different communication channels or devices.

The Flexibility

What is a flexible platform in that context? A flexible platform is one that can integrate any kind of information sources. Such information sources in the context of Volkswagen can be a series of relational dataset schemas spread around the World, Excel spreadsheets, CSV files, old plain text technical documents about past model of cars, semi-structured documents such as webpages, etc.

A flexible platform is also one that minimally impact (if at all) the data consumers if the data structure changes in the system. This is really important since the World we live in constantly changes. This means that things constantly change and we have to reflect these changes in the data we own and maintain. This is why this point is so important, because we want to minimize the impact of the data structure changes that will happen all the time.

Having the flexibility to constantly adapt your data, while minimally impacting the data consumers of the system, enables you to make quick decision to adapt your strategy in a highly competitive World. This flexibility gives you a clear business advantage.

A flexible platform is also one that let you publish your data the way you want, in the format that is needed. Such a flexible platform has to give you access to an interface that give you access to all the functionalities of the platform without having to care about what happens under the hood.

A flexible system is one that can communicate your information on any kind of communication channels, and to any devices that have access to the Web.

Under the Hood

That next generation platform that Volkswagen is currently developing is partly based on a few of the main pieces of the Open Semantic Framework. These pieces help them to reach their goal by helping them giving the flexibility their platform needs.

The first step they gone thru was to create their Volkswagen Vehicles Ontology that is used to describe all the entities they want to index into their platform. The Web Ontology Language (OWL), along with the Resource Description Framework (RDF) is what gives them the complete flexibility on how they can integrate all the pieces of information they want, in a canonical format.

Then they choose to use structWSF (the structured data web services framework). This piece gives them the flexibility to get a series of web interfaces (web service endpoints) to create, update, manage and query their data. This web service layer enables them to do anything they want with their data, from anywhere on the Web. This is possible because all the functionalities of the framework are exposed as web service endpoints. StructWSF also gives them the possibility to communicate their data in multiple different formats. This makes it the perfect flexible system to feed their information in different contexts, in different communication channels or on different devices.

At Volkswagen, structWSF is used to populate, and keep in sync, their Solr and Triple Store instances. It gives them the time to care about the more important aspects of their platform, and to care about how the data should be synced between the various specialized data management systems.

By using structWSF to manage their data, they are able to reach some objectives to make their platform as flexible as possible:

  • To be able to minimize the impact of data changes to the data consumers
    • Because structWSF uses OWL & RDF to describe all the data it index
  • To be able to manipulate their data from anywhere
    • Because all the functionalities of structWSF are exposed as web service endpoints
  • To be able to communicate the information in different contexts, communication channels and devices
    • Because structWSF has, in its core, is designed to transform all the data it indexes in any other kind of format

The Next Step

One of their longer term goal and objective is to analyze their unstructured and semi-structured textual documents to extract some structure out of them, and to index them into their semantic platform. To do this, they are looking at using Scones, which is the structWSF semantic tagger web service endpoint. Scones will use some subject reference structures such as UMBEL to semantically tag the textual document. Once the document as been processed by Scones, and indexed in structWSF, it can now be re-published in different contexts based on the reference concepts that have been tagged to it. This gives them the flexibility to leverage non-structured sources of data and to re-purpose it in different ways by publishing it in different context and in different systems.

This second system will enable them to leverage the investment they made in the past, by writing all these textual documents, and to re-purpose, and re-contextualizing, them in all kind of different contexts.

Conclusion

I think that TribalDDB and Volkswagen make the good decision for their future. Taking the business decision to develop and maintain a completely new kind of information system is not an easy decision to take. I am not saying that they made the good choice to use our pieces of the stack. The decision goes far beyond this. Such a Semantic Platform challenges everything in an organization: the people that takes the decisions, the people that create and manage the data, the people that develop the system, the people that maintain that system, the consumers of the system, the customers, the partners, etc. This is a big decision; whatever the technology stack you plan to use. I congratulate them for the decision they took.

I strongly believe that this was the right decision to take considering the future opportunities they are creating to themselves.

 

 

What is an Ontology?

An ontology is the definition of a vocabulary, and the rules for combining its terms, used to describe things that needs to be communicated.

This is yet another tentative definition of what is an ontology applied for the semantic web. Before explaining that definition, I would like to continue by stating what I think is the main purpose of an ontology:

An ontology as for main purpose to communicate coherent and consistent information.

Different Kinds of Ontologies

Over the years, I tended to use the word “vocabulary,” along with the word “ontology,” in different blog posts and technical documents. However, the usage of each word may not always have been clear. Is an vocabulary an ontology? Is an ontology a vocabulary? Are these concepts synonymous? There is an important distinction to make: an ontology can be a vocabulary, but an ontology is much more than a simple vocabulary.

Ontologies can describe all kind of well-known knowledge representation structures, some simple, and others much more complex. Here is a small list of some of them:

  • lexicons
  • taxonomies, or
  • higher order knowledge description frameworks

In its most basic usage, an ontology will define a vocabulary. It will simply define the terms (words) that belongs to that vocabulary without saying anything regarding the usage of these words.

Then, an ontology could evolve into a taxonomy by defined hierarchical relationships between the terms that compose the vocabulary.

Finally, it can evolve further to become a higher order knowledge description framework that defines more complex usage rules such as: usage restrictions, all kind of relationships between described entities, etc. New knowledge could also be inferred. It is why I say that an ontology is not strictly a simple vocabulary, but that it powerful knowledge description framework.

Knowledge Base

As we saw above, the main purpose of an ontology is to be able to create a coherent and consistent knowledge base of information that can get communicated. So an ontology is a kind of language that let you create knowledge bases that are consistent, coherent and where new knowledge can be inferred. That is done by following the usage rules defined in the ontology.

However, there is another important aspect to take into account: an ontology will describe knowledge that is coherent and consistent, but according to the own World view of that ontology. This means that two ontologies, describing the same domain of knowledge, could consistently and coherently describe information according to their view of the World.

Let’s take an example. Let’s say that two book stores developed their own ontologies to describe the books they sell. Both companies sell books. There are good chances that they will use the same vocabulary to describe their books. However, the usage rules between these terms may differ between the two book stores. One of the book stores could say that a proceeding is a specialized kind of book. But the other book store could say that no, a proceeding is not a specialized kind of book, but that it is a document just like a book. So, both would describe a proceeding as a document, but one would have different interpretation rules about what a book really is. As you see, both book stores use the same vocabulary to define their library of books, but they interpret their meaning differently. If the two stores would have to exchange information about books in the future, they won’t have many difficulties because they are probably sharing the same vocabulary, but the interpretation of that information may differ. The result of these potential differences in their interpretations may be where a book will be classified into the store; or how their customers could search for a specific book, using different filtering criterias; etc.

This is not different than what happens in our daily lives: is there a day in your life when you don’t hear people arguing about different point of views? It is exactly the same thing that happens here. We potentially all live and see and the exact same events, images, sound, etc.; but we may all have a different interpretation of these things.

Ontologies in the Open Semantic Framework?

Ontologies are so flexible that we choose to make ontologies the “brain” of the Open Semantic Framework.

We wanted to use the most flexible knowledge description framework that would enable us to integrate any possible information sources that have been describe using any existing kind of simple, or really complex, knowledge representation structures such as simple: lexicons, taxonomies, relational schemas, etc. By using ontologies as its central piece, OSF is a flexibly data integration framework that can consolidate information from various, heterogeneous, sources of information.

If we remember the definition we started with, ontologies are not just about describing terms and their relationships in a coherent and consistent way. The ultimate purpose is to communicate that information. It is what the structWSF part of the Open Semantic Framework does: it let any kind of system that have access to the Internet to send, receive and manipulate information in multiple formats from a series of web service endpoints.

More Reading

Finally, I would suggest you to read Mike’s Intrepid Guide to Ontologies to have a better understanding of where ontologies come from, how they works, what other formats exists, what are the different approaches to ontologies and what tools currently exists to work with ontologies.