structFieldStorage: A New Field Storage System for Drupal 7

Structured Dynamics has been working with Drupal for quite some time. This week marks our third anniversary of posting code to the contributed conStruct modules in Drupal. But, what I’m able to share today is our most exciting interaction with Drupal to date. In essence, we now can run Drupal directly from an RDF triplestore and take full advantage of our broader Open Semantic Framework (OSF) stack. Massively cool!

On a vanilla Drupal 7 instance, everything ends up being saved into Drupal’s default storage system. This blog post introduces a new way to save (local) Content Type entities: the structfieldstorage field storage system. This new field storage system gives the possibility to Drupal administrators to choose to save specific (or all) fields and their values into a remote structWSF instance. This option replaces Drupal’s default storage system (often MySQL) for the content types and their fields chosen.

By using this new field storage system, all of the local Drupal 7 content can be queried via any of structWSF’s web service endpoints (which includes a SPARQL endpoint). This means that all Drupal 7 content (using this new storage system) gets converted and indexed as RDF data. This means that all of the Drupal local content gets indexed in a semantic web service framework.

Fields and Bundles

There are multiple core concepts in Drupal, two of which are Bundles and Fields. A Field is basically an attribute/value tuple that describes an entity. A Bundle is a set (an aggregation) of fields. The main topic of this blog post is a special feature of the field: their storage system.

In Drupal, each field instance does have its own field storage system associated to it. A field storage system is a system that manages the field/value tuples of each entity that has been defined as a Drupal instance. The default storage system of any field is the field_sql_storage, which is normally a MySQL server or database.

The field storage system allows a bundle to have multiple field instances, each of which may have a different field storage target. This means that the data that describes an entity can be saved in multiple different data stores. Though it may appear odd at first as to why such flexibility has merit, but we will see that this design is quite clever, and probably essential.

There are currently a few other field storage systems that have been developed for Drupal 7 so far. The most cited one is probably the MongoDB module, and there is also Riak. What I am discussing in this blog post is a new field storage system for Drupal 7 which uses structWSF as the data store. This new module is called the structFieldStorage module and it is part of conStruct.

Flexibility of the Field Storage API design

The design of having one field storage system per field is really flexible and probably essential. By default, all of the field widgets and all the modules have been created using the field_sql_storage system. This means that a few things here and there have been coded with the specificities of that field storage system. The result is that even if the Field Storage API has been designed and developed such that we can create new field storage systems, the reality is that once you do it, multiple existing field widgets and modules can break from the new field storage systems.

What the field storage system developer has to do is to test all the existing (core) field widgets and modules and make sure to handle all the specifics of these widgets and modules within the field storage system. If it cannot handle a specific widget or module, it should restrict their usage and warn the user.

However, there are situations where someone may require the use of a specific field widget that won’t work with that new field storage system. Because of the flexibility of the design, we can always substitute the field_sql_storage system for the given field dependent on that special widget. Under this circumstance, the values of that field widget would be saved in the field_sql_storage system (MySQL) while the other fields would save their value in a structWSF instance. Other circumstances may also warrant this flexibility.

structFieldStorage Architecture

Here is the general architecture for the structFieldStorage module. The following schema shows how the Drupal Field Storage API Works, and shows the flexibility that resides into the fields, and how multiple fields, all part of the same bundle, can use different storage systems:

bundles_fields_field_storage_api_outline

By default, on a vanilla Drupal instance, all the fields use the field_sql_storage field storage system:

default_field_storage_system_interaction

Here is what that same bundle looks like when all fields use the structfieldstorage field storage system:

ccr_field_storage_system_interaction

Finally here is another schema that shows the interaction between Drupal’s core API, structFieldStorage and the structWSF web service endpoints:

structFieldStorage

Synchronization

Similar to the default MySQL field_sql_storage system, we have to take into account a few synchronization use cases when dealing with the structfieldstorage storage system for the Drupal content types.

Synchronization with structFieldStorage occurs when fields and field instances that use the structfieldstorage storage system get deleted from a bundle or when an RDF mapping changes. These situations don’t appear often once a portal is developed and properly configured. However, since things evolve all the time, the synchronization mechanism is always available to handle deleted content or changed schema.

The synchronization workflow answers the following questions:

  • What happens when a field get deleted in a content type?
  • What happens when a field’s RDF mapping changes for a new property?
  • What happens when a bundle’s type RDF mapping changes for a new one?

Additionally, if new field instances are being created in a bundle, no synchronization of any kind is required. Since this is a new field, there is necessarily no data for this field in the OSF, so we just wait until people start using this new field to commit new data in the OSF.

The current synchronization heuristics follow the following steps:

  1. Read the structfieldstorage_pending_opts_fields table and get all the un-executed synchronization change operations
    1. For each un-executed change:
      1. Get 20 records within the local content dataset from the Search endpoint. Filter the results to get all the entities that would be affected by the current change
        1. Do until the Search query returns 0 results
          1. For each record within that list
            1. Apply the current change to the entities
            2. Save that modified entities into the OSF using the CRUD: Update web service endpoint
      2. When the Search query returns 0 results, it means that this change got fully applied to the OSF. The state of this change record then get marked as executed.
  2. Read the structfieldstorage_pending_opts_bundles table and get all the un-executed synchronization change operations
    1. For each un-executed change:
      1. Get 20 records within the local content dataset from the Search endpoint. Filter the results to get only the ones that would be affected by the current change
        1. Do until the Search query returns 0 results
          1. For each record within that list
            1. Apply the current change to the entities
            2. Save that changed record into the OSF using the CRUD: Update web service endpoint
      2. When the Search query returns 0 results, it means that this change got fully applied to the OSF. The state of this change record then get marked as executed.

The synchronization process is triggered by a Drupal cron job. Eventually this may be changed to have a setting option that would let you use cron synchronization or to trigger it by hand using some kind of button.

Compatibility

The structFieldStorage module is already compatible with multiple field widgets and external contributed Drupal 7 modules. However, because of Drupal’s nature, other field widgets and contributed modules that are not listed in this section may be working with this new field storage system, but tests will be required by the Drupal system administrator.

Field Widgets

Here is a list of all the core Field Widgets that are normally used by Drupal users. This list tells you which field widget is fully operational or disabled with the structfieldstorage field storage system.

Note that if a field is marked as disabled, it only means that it is not currently implemented for working with this new field storage system. It may be re-enabled in the future if it become required.

Field Type Field Widget Operational?
Text Text Field Fully operational
Autocomplete for predefined suggestions Fully operational
Struct Lookup Fully operational
Struct Lookup with suggestion Fully operational
Autocomplete for existing field data Disabled
Autocomplete for existing field data and some node titles Disabled
Term Reference Autocomplete term widget (tagging) Disabled
Select list Disabled
Check boxes/radio buttons Disabled
Long text and summary Text area with a summary Fully operational
Long text Text area (multiple rows) Fully operational
List (text) Select list Fully operational
Check boxes/radio buttons Fully operational
Autocomplete for allowed values list Disabled
List (integer) Select list Fully operational
Check boxes/radio buttons Fully operational
Autocomplete for allowed values list Disabled
List (float) Select list Fully operational
Check boxes/radio buttons Fully operational
Autocomplete for allowed values list Disabled
Link Link Fully operational
Integer Text field Fully operational
Float Text field Fully operational
Image Image Fully operational
File File Fully operational
Entity Reference Select list Fully operational
Check boxes/radio buttons Fully operational
Autocomplete Fully operational
Autocomplete (Tags style) Fully operational
Decimal Text field Fully operational
Date (Unix timestamp) Text field Fully operational
Select list Fully operational
Pop-up calendar Fully operational
Date (ISO format) Text field Fully operational
Select list Fully operational
Pop-up calendar Fully operational
Date Text field Fully operational
Select list Fully operational
Pop-up calendar Fully operational
Boolean Check boxes/radio buttons Fully operational
Single on/off checkbox Fully operational

Core & Popular Modules

Revisioning

The Revisioning module is fully operational with the structfieldstorage field storage system. All the operations exposed in the UI have been handled and implemented in the hook_revisionapi() hook.

Diff

The Diff module is fully operational. Since it compares entity class instances, there is no additional Diff API implementation to do. Each time revisions get compared, then structfieldstorage_field_storage_load() gets called to load the specific entity instances. Then the comparison is done on these entity descriptions.

Taxonomy

The Taxonomy module is not currently supported by the structfieldstorage field storage system. The reason is that the Taxonomy module is relying on the design of the field_sql_storage field storage system, which means that it has been tailored to use that specific field storage system. In some places it can be used, such as with the entity reference field widget, but its core functionality, the term reference field widget, is currently disabled.

Views

structViews is a Views query plugin for querying an OSF backend. It interfaces the Views 3 UI and generates OSF Search queries for searching and filtering all the content it contains. However, Views 3 is intimately tied with the field_sql_storage field storage system, which means that Views 3 itself cannot use the structfieldstorage storage system off the shelf. However, Views 3 design has been created such that a new Views querying engine could be implemented, and used, with the Views 3 user interface. This is no different than how the Field Storage API works for example. This is exactly what structViews is, and this is exactly how we can use Views on all the fields that uses the structfieldstorage field storage system.

This is not different than what is required for the mongodb Drupal module. The mongodb Field Storage API implementation is not working with the default Views 3 functionality either, as shown by this old, and very minimal, mongodb Views 3 integration module.

structViews is already working because all of the information defined in fields that use the structfieldstorage storage system is indexed into the OSF. What structViews does is just to expose this OSF information via the Views 3 user interface. All the fields that define the local content can be added to a structViews view, all the fields can participate into filter criteria, etc.

What our design means is that the structFieldStorage module doesn’t break the Views 3 module. It does not because structViews takes care to expose that entity storage system to Views 3, via the re-implmented API.

efq_views

efq_views is another contributed module that exposes the EntityFieldQuery API to Views 3. What that means is that all of the Field Storage Systems that implement the EntityFieldQuery API should be able to interface with Views 3 via this efq_views Views 3 querying engine.

Right now, the structFieldStorage module does not implement the EntityFieldQueryAPI. However, it could implement it by implementing the hook_field_storage_query() hook. (This was not required by our current client.)

A Better Revisioning System

There is a problem with the core functionality of Drupal’s current revisioning system. The problem is that if a field or a field instance gets deleted from a bundle, then all of the values of those fields, within all of the revisions of the entities that use this bundle, get deleted at the same time.

This means that there is no way to delete a field without deleting the values of that field in existing entities revisions. This is a big issue since there is no way to keep that information, at least for archiving purposes. This is probably working that way because core Drupal developers didn’t want break the feature that enables people to revert an entity to one of its past revisions. This would have meant that data for fields that no longer existed would have to be re-created (creating its own set of issues).

However, for all the fields that uses the structfieldstorage field storage system, this issue is non-existing. Even if fields or fields instances are being deleted, all the past information about these fields remains in the revisions of the entities.

Conclusion

This blog post exposes the internal mechanism of this new structfieldstorage backend to Drupal. The next blog post will focus on the user interface of this new module. It will explain how it can be configured and used. And it will explain the different Drupal backend user interface changes that are needed to expose the new functionality related to this new module.

conStruct for Drupal 7

construct_logo_120For more than a year we have been developing a completely new version of conStruct for Drupal 7 for one of our clients.

conStruct for Drupal 6 is really decoupled from Drupal and all the other contributed modules; in a word, it was not playing nice with Drupal. The goal of this new version has been to change that situation. The focus of this completely new conStruct module has been to create a series of connector modules that bridge most of Drupal’s core functionalities with remote structWSF instances.

We wanted to make sure that Drupal developers could manipulate content, within Drupal, that is hosted in structWSF instance(s). The best way to start aiming for that goal was to make sure that all of the core Drupal APIs commonly used by Drupal developers could be used to manipulate structWSF data like if it was native in Drupal. This is what these connectors are about.

The development of conStruct for Drupal 7 is not finished, but it is available in the Git repository. There is still refactoring and improvements required, mainly to make it easier to use and understand, but all of the code is working properly and is already used on production sites.

conStruct As a Large Scale Drupal Implementation

Those who follow the evolution of conStruct know that conStruct’s main goal is to use Drupal as a user interface for structWSF for administrative purposes, or for creating complete portals like the NOW portal. However, in our initial versions, Structured Dynamics’ purpose was to not tightly integrate with Drupal. Over time, though, we have seen broad acceptance for the Drupal front end and Drupal itself is evolving in ways compatible with semantic technologies.

What is changing with conStruct for Drupal 7, with all these connectors, is that we are now using conStruct to bridge Drupal with structWSF server instances. We supercharge Drupal 7’s capabilities with structWSF. Our evolution to a tighter Drupal coupling means the ability to manage, query, search, data mine, million of entities; to have vocabularies of tens of thousands of concepts; and to enable the querying of all of these entities and their content from any kind of devices or systems via a family of web services endpoints.

This is the initial version of what is (or should be) Drupal LSD for Structured Dynamics: A semantic web service framework backend system for Drupal.

conStruct’s Drupal Connectors

Here is the initial list of the connectors that exists:

  • structFieldStorage: this module creates a new structfieldstorage field storage system that can be used by Drupal fields to save the fields’ data into a remote structWSF instance. This is used to enable the Content Type entities to be saved into a structWSF instance. It is an extension of the Drupal field storage system
  • structEntities: this module creates a new Entity Type called the Resource Type that is used to see all the structWSF indexed records as native Entities in Drupal. This means that the Entity API can be used to manipulate any content in structWSF
  • structViews: this module creates a new data source for Views 3. This means that the Views 3 user interface is used to generate structWSF Search endpoint queries instead of SQL queries
  • structSearchAPI: this module exposes new search indexes to the Search API. This means that the Search API can be used to query a structWSF instance.

I will write about all these connectors individually in upcoming blog posts. I will cover their design, architecture and usage.

 

Open Semantic Framework Running on Micro Instances

After releasing the new Open Semantic Framework Installer, we started to test it on machines with all kind of different specifications: different CPU limits, different amount of memory, etc. One of the setup that caught our attention was Amazon’s EC2 Micro Instance.

The Micro Instance is a virtual server type that has been introduced by Amazon a little bit more than a year ago. As described by Amazon, Micro Instances are:

Instances of this family provide a small amount of consistent CPU resources and allow you to burst CPU capacity when additional cycles are available. They are well suited for lower throughput applications and web sites that consume significant compute cycles periodically.

We were intrigued by this particular type of instance because we wanted to know how the complete Open Semantic Framework stack could operate on such a small server instance.

Micro Instance Specifications

The Micro Instance’s specifications are as follow:

  • 613 MB memory
  • Up to 2 EC2 Compute Units (for short periodic bursts)
  • 32-bit or 64-bit platform
  • I/O Performance: Low

Note that a EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.

Installing The Stack

Installing the stack on the Amazon Micro Instance, using the OSF Installer, is not the fastest experience in the World. In fact, installing the complete stack takes up to 10 hours (5 minutes of your time, but compiling and installing everything takes about 10 hours of CPU time).

The problem is that installing OSF is a CPU intensive task, while the Micro instance is not. The micro instance can sustain small CPU bursts, but it can’t sustain the creation and compilation of the entire stack. That means that the CPU cycles won’t be available to the instance, and that the CPU consumption of that instance will be throttled by Amazon, which will significantly slow down the installation process.

However, as you will see below, once OSF is installed on the Micro instance, the complete stack responds perfectly to all queries sent to it.

Creating an AMI

The only time you have to spend 10 hours to install the OSF stack on an Amazon Micro Instance is the first time. After that, you would only have to create an Amazon AMI from that vanilla OSF instance for future use. If you proceed that way, you will lower the installation time from 10 hours to a few minutes.

Reading and Searching Data

The testing we did for reading and searching data from structWSF shows that performances are as good as the ones you would get from a small instance with a normal workload. The Crud: Read and the Search structWSF endpoints are fully responsive and operational.

Creating, Updating and Deleting Data

The testing we did for creating, updating and deleting entire datasets takes more time than with a small instance even if the instance is dedicated to that only task, without any other queries processed by the instance at the same time. The reason for this decrease in performances is due to the CPU throttling done by Amazon for this kind of more CPU intensive task. However, since individual records creation, updating and deletion creates “CPU Peaks”, such isolated create/update/delete queries doesn’t greatly affect the overall performances of the instance.

What This Type Of Instance Is Good For?

We found that such small instances were perfect for data collection activities performed by a single person, or a small group of collaborators. We also found that it could be used by low-traffic websites such as personal web portal, personal blogs, etc. The complete OSF stack is fully responsive and our analysis shows that the resources (CPU and Memory) are stable and responsive with a normal workload.

Conclusion

Such a small server instance can easily be used to create a personal data collection endpoint, or a personal, or small, data presentation portal such as Mike’s semantic web Sweet Tools. It is well suited for data portals that require reading and searching of data with occasional data changes (addition, removal and modification of instance records).

Volkswagen UK’s Search Engine Powered by structWSF

It is now official, Volkswagen UK‘s search engine is now powered by structWSF. Their new contextual search engine has been released last Friday. I covered the underlying architecture in one of my recent blog post: Volkswagen’s RDF Data Management Workflow.

 

 

John Streit, head of technology at Tribal DDB, described the two key advantages of using the structWSF (part of the Open Semantic Framework (OSF)) for their website in an interview with Wired UK:

The first is that it gives you a single place to access data. Streit explains: “Applications often need to retrieve data from multiple sources which adds complexity and development time. By using this technology we can get everything we need from a single place which drastically lowers development time and running costs.” Furthermore the exposure of data improves search and means that it can be repurposed in new and imaginative ways.

The Open Semantic Framework Installer

We are excited to introduce the first Open Semantic Framework installation script. This new installer application will install and configure the entire Open Semantic Framework stack for you. It will take about 10 minutes of your time, and will process in the background for a few hours while everything necessary to build the OSF stack is downloaded and compiled. Open Semantic Framework Installer

The only thing you have to do to run the OSF Installer is to issue the few commands outlined below, and then to answer a few questions in the process (which, since most of them use the standard default values, is pretty easy).

The OSF Installer is a major addition to the Open Semantic Framework since it now enables a greater number of people (mere mortals) to install and use the stack, and it enables much faster deployment of the system.

The full installation manual, where each of the steps performed by the installer is explained in detail, is available as a reference here.

Requirements

The current version of the Open Semantic Framework Installer is fully operational on:

  1. Ubuntu 10.04 (Lucid)
  2. 32 Bits Operating System
  3. Access to internet from the server
  4. 5GIG of disk space on the partition where you are installing OSF

Eventually this installer will be upgraded for 64-bits operating systems, and for other Linux distributions. Also, the current installer should work on newer versions of Ubuntu, but it has only been tested to date on the latest LTS version.

Installing the Open Semantic Framework

The only manual steps need to do to install the Open Semantic Framework are to:

  1. Create a folder where to install OSF on your server
  2. Download the osf-install.zip installation package
  3. Make the osf-install.sh installation script executable
  4. Run the osf-install.sh installation script
  5. Answer the questions asked by the installer

Here are the commands you have to run:

[cc lang=’bash’ line_numbers=’true’ ]

cd /mnt/
sudo wget https://github.com/downloads/structureddynamics/Open-Semantic-Framework-Installer/osf-installer-v1.0a4.zip
sudo unzip osf-installer-v1.0a4.zip
cd `ls -d structureddynamics*/`
sudo chmod 755 osf-install.sh
./osf-install.sh

[/cc]

conStruct and structWSF Upgrades

In the process, both conStruct and structWSF have been enhanced to enable automatic upgrading in the future. Starting with structWSF version 1.0a92 and conStruct version 6.x-1.0-beta9, future upgrades should be done automatically using automatic upgrading procedures.

However, to enable this, existing users will have to upgrade their current versions manually to establish the new automatic upgrades baseline.

Next Steps

Once you have installed the OSF stack, you next query the structWSF Web service endpoints, and import datasets using conStruct. Here are a few things you can do to start exploring the Open Semantic Framework:

  1. Start exploring structWSF
  2. Start exploring conStruct
  3. Start exploring Ontologies usage in OSF
  4. Start importing and manipulating datasets
  5. Start exploring the Open Semantic Framework architecture
  6. Start playing with the structWSF web service endpoints

Since everything is installed on your server, so you only have to play with the stack now. If you break something, just ping us on the mailing list or re-install it without worrying about each installation steps!

Help

It may be possible that you experience some issues with this new OSF Installer. If that is the case, I would suggest your to make an outreach to the Open Semantic Web Mailing List so that we fix it on the Git repository.

Just write an email that includes the specifications of the server where you are trying to install OSF on. Then tell us where the issue happens in the installation process. Also add any logs that could be helpful in debugging the issue.

Conclusion

This is the first version of the OSF installer, but this is a real balm for installing OSF. As noted, this installer will eventually be upgraded to support 64-bit servers and other Linux distributions. Also, any help improving this installer from Bash wizards would naturally be greatly welcomed.