My Optimal GNU Emacs Settings for Developing Clojure (so far)

Note: this blog post has since been revised in another blog post.

In the coming months, I will start to publish a series of blog posts that explain how RDF data can be serialized in Clojure code and, more importantly, what the benefits of doing so are. At Structured Dynamics, we have started to invest resources into this research project, and we believe it will become a game changer in how people consume, use and produce RDF data.

But I want to take a humble first step into this journey by simply explaining how I ended up configuring Emacs for working with Clojure. I want to take the time to do this since it is a trial-and-error process that can be somewhat time-consuming for newcomers.

Light Table

Before discussing how I configured Emacs, I want to introduce you to a new IDE: Light Table. This new IDE is meant to be the next generation of code editor. If you are new to Clojure, and particularly if you have never used Emacs before, I would strongly suggest you start with this code editor. Not only is it simple to use, but all the packages you will require to work with Clojure are already built in.

As you may know, GNU Emacs has been developed in Emacs Lisp (a Lisp dialect). This means that it can be extended by installing and enabling packages, and that all configuration options and behaviors can be changed, even while it is running! Light Table is no different. It has been developed in ClojureScript, and it can be extended the same way. To me, the two real innovations in Light Table are:

  • The instarepl
  • The watches

The instarepl is a way to evaluate anything, while you are coding, directly inline in your code. This is really powerful and handy when prototyping and testing code. Every time you type some code, it gets evaluated in the REPL and displayed inline in the code editor.

The watches are like permanent instarepls that you place within the code. Every time a value changes, you see the result in the watch section. This is really handy when you have to see the value of some computation while the application, or part of it, is running. You get a live output of what is being computed, directly in your code.

The only drawback I have with Light Table is that there is no traditional REPL available (yet?). This means that if you want to evaluate something unrelated to your code, you have to write it directly in the editor and then evaluate it with the instarepl. Another issue, for some use cases, is that the evaluation of the code can become confusing, for example when you define a Jetty server in your code. Since everything gets evaluated automatically (if the live mode is enabled), it can start the server without you knowing it. Then, to stop it, you have to write a line of code into your file and evaluate it to stop the server.

Because of the nature of my work, I am a heavy user of multiple monitors (I work with six monitors daily), so properly handling multiple monitors is essential to my productivity. That is another issue I have with Light Table: you can create new windows that you can move to other monitors, but these windows are unconnected; they are different instances of Light Table.

Simple is beautiful, and that is why I really do like Light Table and why I think it is what beginners should use to start working with Clojure. However, it is not yet perfect for what I have to do. That is why I chose to use GNU Emacs for my daily work.

GNU Emacs

I don’t think that GNU Emacs needs any kind of introduction. It is heavy, it is unnatural, it takes time to get used to, the learning curve is steep, but… hell it is powerful for working with Lisp dialects like Clojure!

The problem with Emacs is not just learning the endless list of key bindings (even if you can go a long way with the core ones), but also configuring it to your taste. Since everything can be configured, and since there exist hundreds of packages of all kinds, it takes time to configure all the options you want and all the modules you require. This is the main reason I wrote this blog post: to share my (current) best set of configuration options and packages for using Emacs to develop with Clojure.

I am personally developing on Windows 8, but these steps should be platform-agnostic. You only have to download and install the latest GNU Emacs 24 version.

The first thing you have to do is to locate your .emacs file. All the configurations defined in this blog post go into that file.
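If you are not sure where that file lives on your system, one quick way to find out is to ask Emacs itself for the value of the user-init-file variable:

[cc lang='lisp' line_numbers='false']
[raw];; Show the path of the init file Emacs is currently using
;; (evaluate this expression with M-: or from the *scratch* buffer with C-x C-e)
(message "Init file: %s" user-init-file)[/raw]
[/cc]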

Packages

Once Emacs is installed, the next thing to do is to install all the packages that are required to develop in Clojure, or that will make your life easier when handling the code. The packages that you have to install are:

  • cider
    • Clojure Integrated Development Environment and REPL – this is like SLIME for Common Lisp; it is what turns Emacs into a Clojure IDE
    • Important note: make sure that the Cider version you are installing comes from the MELPA-Stable repository, and not the MELPA one. At the time of publication of this blog post, the latest stable release is 0.6.
  • clojure-mode
    • Major mode for Clojure code
  • clojure-test-mode
    • Minor mode for Clojure tests
  • auto-complete
    • Auto-completion for GNU Emacs – this is what provides auto-completion capabilities in your code and in the mini-buffer
  • ac-nrepl
    • Auto-complete sources for Clojure using nREPL completions – this is what adds auto-completion capabilities to the nREPL
  • paredit
    • Minor mode for editing parentheses – this is what does all the Lisp-like code formatting (helping you manage all those parentheses)
  • popup
    • Visual popup user interface – this is what enables popup contextual menus when using auto-completion in your code and in the nREPL
  • rainbow-delimiters
    • Highlights nested parens, brackets and braces in a different color at each depth – this is really handy to visually see where you are with your parentheses; an essential to have
  • rainbow-mode
    • Colorize color names in buffers

Before installing them, we have to tell Emacs to use the package repositories where all these packages are hosted and ready to be installed into your Emacs instance. At the top of your .emacs file, put:

[cc lang='lisp' line_numbers='false']
[raw](require 'package)

(add-to-list 'package-archives
             '("melpa-stable" . "http://melpa-stable.milkbox.net/packages/"))

(add-to-list 'package-archives
             '("melpa" . "http://melpa.milkbox.net/packages/"))

(add-to-list 'package-archives
             '("marmalade" . "http://marmalade-repo.org/packages/"))

;; Initialize all the ELPA packages (what is installed using the packages commands)
(package-initialize)[/raw]
[/cc]

Important note: only use the MELPA repository if you want to install non-stable packages such as the Noctilux theme. If you are not expecting to use it, then I would strongly suggest that you remove it and keep only the MELPA-Stable repository in that list.
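Another option, if your Emacs supports package pinning (Emacs 24.4 and later, if I am not mistaken), is to keep both archives but pin Cider to MELPA Stable so that the unstable MELPA archive is never used for it. A minimal sketch, to be placed before (package-initialize):

[cc lang='lisp' line_numbers='false']
[raw];; Pin Cider to the MELPA Stable archive (requires package pinning support, Emacs 24.4+)
(setq package-pinned-packages
      '((cider . "melpa-stable")))[/raw]
[/cc]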

If you are editing your .emacs file directly in Emacs, you can re-evaluate each setting by moving the cursor to the end of a top-level expression (after the closing parenthesis) and pressing C-x C-e. However, it may be faster to simply close and restart Emacs so that the new settings are taken into account. You can use either method for the following sets of settings changes.

Before changing any more settings, we will first install all the required packages using the following sequence of commands:

  • M-x package-install [RET] cider [RET]
  • M-x package-install [RET] clojure-mode [RET]
  • M-x package-install [RET] clojure-test-mode [RET]
  • M-x package-install [RET] auto-complete [RET]
  • M-x package-install [RET] ac-nrepl [RET]
  • M-x package-install [RET] paredit [RET]
  • M-x package-install [RET] popup [RET]
  • M-x package-install [RET] rainbow-delimiters [RET]
  • M-x package-install [RET] rainbow-mode [RET]

Alternatively, you could use M-x package-list-packages, move the cursor in the buffer to each package's line and press i (for install); once all the packages are marked, press x (execute) to install them all at once.

In the list of commands above, M-x is the “meta-key” normally bound to the left Alt key on your keyboard. So, M-x usually means Alt-x.
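If you would rather automate this step, a small snippet placed right after (package-initialize) in your .emacs can install any missing packages at startup. This is only a minimal sketch: the my-clojure-packages variable name is mine, and the list simply mirrors the packages discussed above.

[cc lang='lisp' line_numbers='false']
[raw];; Automatically install any of the listed packages that are missing
;; (my-clojure-packages is just an illustrative variable name)
(defvar my-clojure-packages
  '(cider clojure-mode clojure-test-mode auto-complete ac-nrepl
    paredit popup rainbow-delimiters rainbow-mode))

(dolist (p my-clojure-packages)
  (unless (package-installed-p p)
    ;; refresh the archive contents once, only if needed
    (unless package-archive-contents
      (package-refresh-contents))
    (package-install p)))[/raw]
[/cc]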

Now that all the packages are installed, let’s take a look at how we should configure them.

Configuring Keyboard

If you are using an English/US keyboard, you can skip this section. Since I use a French Canadian layout (on an English/US Das Keyboard!), I had multiple issues getting my keys to work, since all the bindings change in Emacs. To solve this problem, I simply had to set that language configuration option. Then I had to start using the right Alt key of my keyboard to type my brackets, curly brackets, etc.:

[cc lang='lisp' line_numbers='false']
[raw];; Enable my Canadian French keyboard layout
(require 'iso-transl)[/raw]
[/cc]

Configuring Fonts

Since I am growing older (and since I have plenty of screen real estate with six monitors), I need bigger fonts. I like coding in Courier New, so I simply configured it with font size 13 instead of the default 10:
[cc lang='lisp' line_numbers='false']
[raw];; Set bigger fonts
(set-default-font "Courier New-13")[/raw]
[/cc]

Cider and nREPL

The next step is to configure Cider and the nREPL, which are the two pieces that turn Emacs into a wonderful Clojure IDE:

[cc lang='lisp' line_numbers='false']
[raw](add-hook 'clojure-mode-hook 'turn-on-eldoc-mode)
(setq nrepl-popup-stacktraces nil)
(add-to-list 'same-window-buffer-names "nrepl")[/raw]
[/cc]

Auto-completion

The next step is to configure the auto-completion feature everywhere in Emacs: in any buffer, in the nREPL and in the mini-buffer. We also want the auto-completion to appear in a contextual menu where the docstrings (documentation) of the functions are displayed:

[cc lang='lisp' line_numbers='false']
[raw];; General Auto-Complete
(require 'auto-complete-config)
(setq ac-delay 0.0)
(setq ac-quick-help-delay 0.5)
(ac-config-default)

;; ac-nrepl (Auto-complete for the nREPL)
(require 'ac-nrepl)
(add-hook 'cider-mode-hook 'ac-nrepl-setup)
(add-hook 'cider-repl-mode-hook 'ac-nrepl-setup)
(add-to-list 'ac-modes 'cider-mode)
(add-to-list 'ac-modes 'cider-repl-mode)[/raw]
[/cc]

Popping Contextual Documentation At Any Time

What is really helpful is to be able to pop up the documentation for any symbol at any time just by pressing a series of keys. What needs to be done is to configure Cider and ac-nrepl to bind this behavior to the C-c C-d key sequence:

[cc lang='lisp' line_numbers='false']
[raw];; Popping-up contextual documentation
(eval-after-load "cider"
  '(define-key cider-mode-map (kbd "C-c C-d") 'ac-nrepl-popup-doc))[/raw]
[/cc]

Paredit

Paredit is the package that helps you format your Clojure code automatically. It keeps your parentheses balanced, automatically indents your S-expressions, etc.

[cc lang='lisp' line_numbers='false']
[raw](add-hook 'clojure-mode-hook 'paredit-mode)[/raw]
[/cc]

Show Parenthesis Mode

Another handy feature is to enable, by default, the show-paren-mode configuration option. That way, every time the cursor is on a parenthesis, the matching parenthesis is highlighted in the user interface. This is an essential must-have along with Paredit:

[cc lang='lisp' line_numbers='false']
[raw];; Show parenthesis mode
(show-paren-mode 1)[/raw]
[/cc]

Rainbow Delimiters

Another essential package to help you keep track of all these parentheses. Rainbow delimiters change the color of the parentheses depending on how "deep" they are in the structure. Another essential visual cue:

[cc lang='lisp' line_numbers='false']
[raw];; rainbow delimiters
(global-rainbow-delimiters-mode)[/raw]
[/cc]

Noctilux Theme

Did I say that I like Light Table? In fact, I really do like its dark theme. It is the best I have seen so far. I had never used a dark theme in my life since I never liked any of them, but that one is really neat, particularly for visualizing Clojure code. That is why I really wanted to get a Light Table theme for Emacs. It exists, it is called Noctilux, and it works exactly the same way, with the same colors.

If you want to install it, you can get it directly from the package archives. Type M-x package-list-packages, then search for and install noctilux-theme.

Then enable it by adding this setting:

[cc lang='lisp' line_numbers='false']
[raw];; Noctilux Theme
(load-theme 'noctilux t)[/raw]
[/cc]

Binding Some Keys

Then I wanted to bind some behaviors to the F-keys. What I wanted is to be able to run Cider, to start and stop Paredit, and to switch frames (windows within monitors) with a single keystroke. I also added a shortcut for starting Speedbar for the current buffer, which is essential for managing project files. Here is how I bound these behaviors to those keys:

[cc lang='lisp' line_numbers='false']
[raw](global-set-key [f8] 'other-frame)
(global-set-key [f7] 'paredit-mode)
(global-set-key [f9] 'cider-jack-in)
(global-set-key [f11] 'speedbar)[/raw]
[/cc]

Fixing the Scroll

One thing that I really didn't like was the default scrolling behavior of Emacs on Windows. After some searching, I found the following settings, which give a smoother scrolling behavior on Windows:

[cc lang='lisp' line_numbers='false']
[raw];; scroll one line at a time (less "jumpy" than defaults)
(setq mouse-wheel-scroll-amount '(1 ((shift) . 1))) ;; one line at a time
(setq mouse-wheel-progressive-speed nil) ;; don't accelerate scrolling
(setq mouse-wheel-follow-mouse 't) ;; scroll window under mouse
(setq scroll-step 1) ;; keyboard scroll one line at a time[/raw]
[/cc]

Complete Configuration File

Here is the full configuration file that I am using:

[cc lang='lisp' line_numbers='false']
[raw](require 'package)

(add-to-list 'package-archives
             '("melpa-stable" . "http://melpa-stable.milkbox.net/packages/"))

(add-to-list 'package-archives
             '("melpa" . "http://melpa.milkbox.net/packages/"))

(add-to-list 'package-archives
             '("marmalade" . "http://marmalade-repo.org/packages/"))

;; Initialize all the ELPA packages (what is installed using the packages commands)
(package-initialize)

;; Enable my Canadian French keyboard layout
(require 'iso-transl)

;; Set bigger fonts
(set-default-font "Courier New-13")

;; Cider & nREPL
(add-hook 'clojure-mode-hook 'turn-on-eldoc-mode)
(setq nrepl-popup-stacktraces nil)
(add-to-list 'same-window-buffer-names "nrepl")

;; General Auto-Complete
(require 'auto-complete-config)
(setq ac-delay 0.0)
(setq ac-quick-help-delay 0.5)
(ac-config-default)

;; ac-nrepl (Auto-complete for the nREPL)
(require 'ac-nrepl)
(add-hook 'cider-mode-hook 'ac-nrepl-setup)
(add-hook 'cider-repl-mode-hook 'ac-nrepl-setup)
(add-to-list 'ac-modes 'cider-mode)
(add-to-list 'ac-modes 'cider-repl-mode)

;; Popping-up contextual documentation
(eval-after-load "cider"
  '(define-key cider-mode-map (kbd "C-c C-d") 'ac-nrepl-popup-doc))

;; paredit
(add-hook 'clojure-mode-hook 'paredit-mode)

;; Show parenthesis mode
(show-paren-mode 1)

;; rainbow delimiters
(global-rainbow-delimiters-mode)

;; Noctilux Theme
(load-theme 'noctilux t)

;; Switch frame using F8
(global-set-key [f8] 'other-frame)
(global-set-key [f7] 'paredit-mode)
(global-set-key [f9] 'cider-jack-in)
(global-set-key [f11] 'speedbar)

;; scroll one line at a time (less "jumpy" than defaults)
(setq mouse-wheel-scroll-amount '(1 ((shift) . 1))) ;; one line at a time
(setq mouse-wheel-progressive-speed nil) ;; don't accelerate scrolling
(setq mouse-wheel-follow-mouse 't) ;; scroll window under mouse
(setq scroll-step 1) ;; keyboard scroll one line at a time[/raw]
[/cc]

Conclusion

Now that we have the proper development environment in place, the next blog posts will really get into the heart of the matter: the different ways to serialize RDF data in Clojure code, how the generated code can be used, what the benefits are, and how it changes the way that data (RDF in this case, but really any data) can be produced and consumed.

We think there are profound implications for how we, as Semantic Web specialists, will work with data instances and ontologies in the future. The initial project that will embed and benefit from these new principles and techniques will be the next version of the UMBEL ontology.

Final note: there is an endless list of features and packages for Emacs. Obviously, I don't know all of them, so if you are aware of any settings or packages that I missed here and that could improve this setup, please share them in the comments.

Schema.org: Forcing the Emergence of a New Web Paradigm

Sometime this week I was reading a blog post that gave some statistics related to Schema.org's usage on the Web. It states:

36.6 percent of Google’s search results include “at least one snippet with information derived from Schema.org.”

only about 0.3 percent of domains are using the markup code on their websites.

Someone may be surprised to see how such a small number of domains produces that many snippets in Google searches. But this is not what interests me in this blog post. What interests me is this: considering that 36.6% of Google search results appear to return structured information that uses Schema.org microdata, why are only 0.3 percent of domains using the markup?

Introduction of a new paradigm on the Web

I think that what is happening at the moment is the emergence of a new paradigm on the Web: the publication of structured data. Some may say that this has been happening for a long time1, and I agree with them. However, this structured data is now starting to reach end users, and that is not something that happened until recently (the last year or so).

What the major search engines that participate in Schema.org are doing is pushing (forcing?) this new paradigm to emerge. The thing is that, in my experience, managing structured data to be published on the Web requires a different set of concepts, mindset, terminology, specifications and, more importantly, tools.

It is true that current tools and techniques can be used to publish Schema.org markup in HTML Web pages, but to me they are sub-optimal for the task at hand. This is probably one of the reasons why the author of that blog post stated:

Not surprisingly, the study also found that “larger sites” are more likely to use Schema markup. There’s no definition given in the study on what makes a site big or small, but this has long been one of the concerns about Schema.org – whether small businesses/websites would have the technical chops to take advantage of the rich snippet opportunity, or if that would be left to bigger companies with more skilled webmasters and more organized online marketing efforts.

I tend to agree with that. However, this shouldn't be the case. I think the reason is that people don't tend to use the proper frameworks (CMS, programming APIs, etc.) and data management systems that are optimized for that task. Another reason is that there is no widespread understanding and adoption of the new underlying concepts, technologies and techniques that are emerging with this new paradigm.

Coping with the evolution of Schema.org

One of the core concepts introduced by this new paradigm is the Open World Assumption. This assumption basically means that we don't know whether something exists or not, or whether something is true or not, until it is explicitly stated. In other words, just because we (or our systems) don't have some information, it doesn't mean that this information doesn't exist.

This is really important to understand, and this assumption has a dramatic impact on how we develop the systems that will publish this structured information on the Web. On the Web, no single system has complete control over the information that may exist. Major search engines such as Google have this Open World Assumption at the core of their systems. It is why they are pushing initiatives such as Schema.org, or their Knowledge Graph: this is how they try to cope with the constantly evolving Web.

How does this relate to Schema.org?

Right now, there are 585 types and 807 properties in Schema.org2, and there are even ways to extend the vocabulary. What that means is that this vocabulary is constantly evolving, changing, improving and growing. If the vocabularies (ontologies) change that often, the data may have to change as well. However, the way most data management systems currently used to publish content on the Web (mostly relational databases) are designed, they can hardly cope with this kind of change in the data and its structure.

This is the reason why I am stating that new concepts, techniques, technologies, methods and tools need to be used in order to cope with these constant changes.

With traditional (relational) systems, every time someone wants to add new microdata to their webpage, they have to analyze their relational data, map it to the different Schema.org types and properties, create all the code to perform this linkage, and generate the enhanced HTML code that includes the Schema.org microdata.

Then, once this is done, what happens if the vocabulary changes? If the data to publish changes? Well, all this analysis and work needs to be done again to reflect the changes in the vocabulary and the data.

However, what if a different set of concepts, techniques and tools is used to publish structured content on the Web?

What I am proposing here is a system, a framework, that manipulates entities at its core: things that are described with attributes and values. These entity descriptions are then carried around within your code. The logic required to handle the use case I outlined above is embedded in the ontologies, the system, the framework, the API… The only thing a developer should need to do is care about their code and the functionality of the system.

In such an information system, all the entities are described using internal and external ontologies. All these ontology concepts (types and properties) need to be linked to the ones of Schema.org (or any other sources of information). Every time something changes, the change should be reflected, and accounted for, in these ontologies, not in the code, the templates, or anywhere else. It needs to be transparent to the developers.

In the next section, I will show you how this can be done using the Open Semantic Framework (OSF). However, keep in mind that what I am discussing in this blog post is much more general than that, and can be implemented using different tools. I used the Schema.org example, but the same mindset can be applied to a lot of different use cases.

Care about the code, not the data

To make my point, I will demonstrate how publishing Schema.org microdata in a Web portal can be done using a new set of techniques, concepts and tools.

The initial goal is to split the concerns: ontologists should care about the ontologies and their linkage, and developers should care about the code and the functionality of the system. The best way to make sure a developer only has to care about the code is to abstract the complexity of the Open World Assumption behind a programming API. In this example, we will demonstrate that using the OSF PHP API.

Such an API should use the resources provided by the framework to determine if the properties and types used to describe a given entity can be expressed/serialized in Schema.org microdata. All of this mechanism should be hidden from the developer and driven by the ontologies.

This is the crux of the matter. We want to manage this complexity where it is much easier to manage: at the level of the ontologies3. These Ontologies Driven Applications (in this case, Ontologies Driven Frameworks or Systems) will abstract this complexity away from the developers.

Let's take this PHP [nearly pseudo] code as an example. It uses the OSF PHP API to retrieve information about an entity from an OSF Web Services instance by querying the CRUD: Read web service endpoint. Then it uses the Subject class to determine if the properties and types of the entity can be serialized in Schema.org microdata. In this example, the Subject class uses non-existing function calls; the goal is to show how such a basic programming API can abstract all the complexity of an evolving Schema.org vocabulary.

Let’s take that pseudo PHP code:

[cc lang='php' line_numbers='false']
[raw]<?php

// $crudRead is an instance of the OSF PHP API's CRUD: Read query class,
// created earlier and pointing to the OSF Web Services instance
// (its creation is not shown here).

// Query the CRUD: Read web service endpoint for the entity's description
$crudRead->uri($entityIdentifier)
         ->send();

$resultset = $crudRead->getResultset();

// Get the entity (instance of the class Subject) from the resultset
$entity = $resultset->getSubject($entityIdentifier);

// Get the first type of the entity
$type = current($entity->getTypes());

// Get the name of the entity
$name = $entity->getPrefLabel();

// Get the genre of the entity
$genre = $entity->getDataAttribute('http://purl.org/ontology/movies#genre');

// Get the director of the entity
$director = $entity->getDataAttribute('http://purl.org/ontology/movies#director');

// Then run this template to generate the HTML which will embed, or not,
// some Schema.org microdata. Remember that serializeMicroformat() is one of
// the illustrative, non-existing function calls mentioned above, and that
// the div/h1/span structure below is illustrative as well.
?>

<div <?php echo $type->serializeMicroformat(); ?>>
  <h1 <?php echo $name->serializeMicroformat(); ?>><?php echo $name->getValue() ?></h1>
  Director: <span <?php echo $director->serializeMicroformat(); ?>><?php echo $director->getValue() ?></span> (born August 16, 1954)
  <span <?php echo $genre->serializeMicroformat(); ?>><?php echo $genre->getValue() ?></span>
</div>[/raw]
[/cc]

What this template does is generate the HTML code, enhanced with Schema.org microdata. The serializeMicroformat() function does the following:

  1. Gets the URI reference of the type/property
  2. Queries the ontology to check if the type/property is linked to a Schema.org concept
    1. If it is not, an empty string is returned
    2. If it is, the microdata markup to add to the HTML is serialized and returned

It is as simple as that. All the “complexity”, all the work, is done at the level of the reference structure (the ontology). The result would be something like:

[cc lang='html' line_numbers='false']
[raw]<!-- illustrative markup, following Schema.org's canonical Movie example -->
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  Director: <span itemprop="director">James Cameron</span> (born August 16, 1954)
  <span itemprop="genre">Science fiction</span>
</div>[/raw]
[/cc]

Here is another example that does exactly the same thing, but produces RDFa Lite markup:

[cc lang='php' line_numbers='false']
[raw]<?php
  // Same query code as in the previous example; only the template changes.
  // serializeRDFaLite() is, like serializeMicroformat(), an illustrative
  // (non-existing) function call, and the div/h1/span structure is assumed.
?>

<div <?php echo $type->serializeRDFaLite(); ?>>
  <h1 <?php echo $name->serializeRDFaLite(); ?>><?php echo $name->getValue() ?></h1>
  Director: <span <?php echo $director->serializeRDFaLite(); ?>><?php echo $director->getValue() ?></span> (born August 16, 1954)
  <span <?php echo $genre->serializeRDFaLite(); ?>><?php echo $genre->getValue() ?></span>
</div>[/raw]
[/cc]

This produces the following HTML code with RDFa Lite embedded:

[cc lang='html' line_numbers='false']
[raw]<!-- illustrative markup: the RDFa Lite equivalent of the microdata example above -->
<div vocab="http://schema.org/" typeof="Movie">
  <h1 property="name">Avatar</h1>
  Director: <span property="director">James Cameron</span> (born August 16, 1954)
  <span property="genre">Science fiction</span>
</div>[/raw]
[/cc]

What happens there is that the API uses the ontology (which is linked to Schema.org concepts) to determine if the entity can be rendered in Schema.org microdata. It checks if the type used to describe the entity we retrieved from OSF is linked to a Schema.org concept. If it is, the API gets that reference to Schema.org and properly serializes the Schema.org microdata snippet. The only thing the developer needs to do is use the API functions properly. Nothing else needs to be determined by them; the system takes care of the rest.

The beauty of this is that you don't have to worry about any kind of mapping between the vocabulary (ontology) you use for describing your entities and the Schema.org types and properties. The only thing you have to do is re-use such a mapping with your ontologies. The PHP API will take care of producing the proper Schema.org microdata, and only if the linkage exists between the content you are publishing and the Schema.org vocabulary. The only thing you have to worry about is using the API when you write the code that publishes your content on the Web.

Is this vision possible? The platform that manipulates entities in that way already exists: it is the Open Semantic Framework. Everything you manipulate is such an entity description. Then you have the PHP API available to query the web services and get the descriptions of your entities. The only missing piece is the glue that maps your entities' types and properties to the Schema.org vocabulary.

The good news is that this glue already exists, but it will greatly improve in the coming months. We are currently working on a completely new version of UMBEL (Upper Mapping and Binding Exchange Layer) that will include, amongst other things, a fully updated mapping between UMBEL and Schema.org. Note that UMBEL and its linkages are meant to be the reference structure used in the code I outlined above.

Conclusion

A new Web paradigm is being pushed, is being forced, by the major search engines. However, the issue that is emerging is that the current systems used by 98%4 of people are not geared toward that kind of data management or these new development concepts and techniques. If this paradigm shift continues, it will force developers to adopt completely new platforms that rely on new technologies, concepts and specifications, such as the Open Semantic Framework. The way people in this field work will change quite significantly.

This blog post focused on the Web, and on companies that publish content on the Web. However, in my experience, many different kinds of organizations (municipalities, governments, Fortune 500 companies, etc.) are now experiencing the influence of the Open World Assumption in what they thought to be a Closed World. The data they use to deliver different kinds of services changes and evolves. New acquisitions and new projects challenge their Closed World Assumption. These changes have dramatic impacts on their infrastructure, their data, and their ability to evolve and adapt to a constantly (and fast) changing world.

The leap is big since the mindset is quite different; investments will be required in software, data migration and staff training. But in my view it is essential: things are changing, and organizations will need to adapt.

To conclude: how many times a day do I read blog posts, tweets and forum threads where naysayers state that the Semantic Web never worked, never existed and is doomed to be used only by academics? Many… but who cares?

This is the Semantic Web. This is Linked Data. And it is changing the way people work.

  1. I have been working professionally in the Semantic Web field for more than seven years now
  2. according to their RDFa schema
  3. Ontologies Driven Applications is a concept Structured Dynamics introduced a few years ago. Keep in mind that some of the systems referenced in that blog post no longer exist and have been superseded by the Open Semantic Framework (OSF)
  4. this is a completely random number coming out of my intuition

Exporting Entities using OSF for Drupal (Screencast)

This screencast will introduce you to the OSF for Drupal features that let you export Drupal Entities in one of the following supported serializations:

  • RDF+XML (RDF in XML)
  • RDF+N3 (RDF in N3)
  • structJSON (Internal OSF RDF serialization in JSON)
  • structXML (Internal OSF RDF serialization in XML)
  • ironJSON (irON serialization in JSON)
  • commON (CSV serialization to be used in spreadsheet applications)

I will show you how you can use OSF for Drupal to export entire datasets of Entities, or how to export Entities individually. You will also see how you can configure Drupal so that different user roles get access to these functionalities.

I will also briefly discuss how you can create new converters to support more data formats.

Finally, I will show you how Drupal can be used as a linked data platform with a feature that makes every Drupal Entity dereferenceable on the Web1. You will see how you can use cURL to export an Entity's description using its URI in one of the 6 supported serialization formats.



  1. OSF for Drupal follows the Cool URIs for the Semantic Web W3C interest group note

The Open Semantic Framework Academy

The Open Semantic Framework Academy YouTube channel has just been released this morning.

The Open Semantic Framework Academy is a dedicated channel for instructional screencasts on OSF. Via its growing library of videos, the OSF Academy is your one-stop resource for how to deploy, manage and use the Open Semantic Framework. OSF is a complete, turnkey stack of semantic technologies and methods for enterprises of all sizes.

All the aspects and features of the Open Semantic Framework will be covered in this series of screencasts. Dozens of such screencasts will be published over the following month or two. They are a supplement to the OSF Wiki documentation, but they are not meant to be a replacement.

Intro to the Open Semantic Framework (OSF)

This kick-off video to the OSF Academy overviews the Open Semantic Framework platform and describes it in terms of the 5 Ws (who, why, what, when, where) and the 1 H (how).


Installing Core OSF (Open Semantic Framework)

This screencast introduces you to the Open Semantic Framework. Then it shows you how to install OSF using the OSF Installer script on an Ubuntu server. Finally, it introduces the system integration tests using the OSF Test Suites.


Open Semantic Framework Web Resources

This screencast will show you all the websites that exist to help you learn about the Open Semantic Framework. These websites include the OSF main site, the OSF Wiki, Mike Bergman's and Fred Giasson's blogs, and demo portals such as Citizen DAN, NOW, MyPeg, HealthDirect and Pregnancy Birth and Babies.


3.5 Million DBpedia Entities in Drupal 7

In the previous article, Loading DBpedia into the Open Semantic Framework, we explained how to load the 3.5 million DBpedia entities into an Open Semantic Framework instance. In this article, we will show how these millions of entities can be used in Drupal for searching, browsing, mapping and templating.

Installing and Configuring OSF for Drupal

This article doesn't cover in detail how OSF for Drupal is installed and configured. To install it, use the OSF Installer by running this command:

[cc lang='bash' line_numbers='false']
[raw]
./osf-installer --install-osf-drupal
[/raw]
[/cc]

Then you should configure it using the first section of the OSF for Drupal user manual.

Once this is done, you will have to register the OSF instance that hosts the DBpedia dataset, and then register the DBpedia dataset in the Drupal instance. Finally, make sure that Drupal's administrator role has access to the DBpedia dataset. This can be done with the PMT (Permissions Management Tool) by running the following command:

[cc lang='bash' line_numbers='false']
[raw]
pmt --create-access --access-dataset="http://dbpedia.org" --access-group="http://YOUR-DRUPAL-DOMAIN/role/3/administrator" --access-perm-create="true" --access-perm-read="true" --access-perm-delete="true" --access-perm-update="true" --access-all-ws
[/raw]
[/cc]

Searching Entities using the Search API

All the DBpedia entities are searchable via the SearchAPI. This is possible because of the OSF SearchAPI connector module, which interfaces the SearchAPI with OSF.

Here is an example of such a SearchAPI search query. Each of these results comes from the OSF Search web service endpoint, and each result is templated using the generic search result template or another entity-type search template.

What is interesting is that, depending on the type of the entity to display in the results, its display can be different. So instead of having an endless list of results with titles and descriptions, we can have different displays depending on the type of the record and the information we have about it.

[Screenshot: SearchAPI results page listing DBpedia entities]

In this example, only the generic search template was used to display these results. Here is the generic search results template code:

[gist id="8560305"]

Manipulating Entities using the Entity API

The Entity API is a powerful Drupal API that lets developers and designers load and manipulate entities indexed in the data store (in this case, OSF). The full Entity API is operational on the DBpedia entities because of the OSF Entities connector module.

As you can see in the template above (and in the other templates to follow), we can easily use the Entity API to load DBpedia entities. In these template examples, we use this API to load the entities referenced by an entity, in this case to get their labels. Once the entity is loaded, we end up with an Entity object that we can use like any other Drupal entity:

[gist id="8585440"]

Mapping Entities using the sWebMap OSF Widget

Because a large number of DBpedia entities have geolocation data, we wanted to test the sWebMap OSF widget to search, browse and locate all the geolocated entities. What we did is create a new content type, and then a new template for that content type that implements the sWebMap widget. The simple template we created for this purpose is available here:

[gist id="8561181"]

Then, once we load a page of that content type, we can see the sWebMap widget populated with the geolocated DBpedia entities. In the example below, we see the top 20 records in that region (USA):

[Screenshot: sWebMap widget showing the top 20 geolocated DBpedia records in the USA]

Then we filter these entities by type and attribute/value. In the following example, we filter by RadioStation, and then select a filter to define the type of radio station we are looking for:

[Screenshot: sWebMap widget filtered by the RadioStation type]

Finally, we add even more filtering options to drill down into the geolocated information we are looking for.

[Screenshot: sWebMap widget with additional filtering options applied]

We end up with all the classical radio stations that broadcast in the Pittsburgh region.

[Screenshot: classical radio stations broadcasting in the Pittsburgh region]

Templating Entities using Drupal’s Templating Engine

Another thing we get out of the box with Drupal and OSF for Drupal is the ability to template the entity view pages and the search resultsets. In both cases, the template is selected based on the type of the entity to display.

With OSF for Drupal, we created a template selection mechanism that uses the ontologies' structure to select the proper templates. For example, if we have a Broadcaster template, it could be used to template information about a RadioStation or a TelevisionStation, even if templates for these types do not exist.

Here is an example of a search resultset that displays information about different types of entities:

[Screenshot: search resultset displaying different types of entities]

The first entity is an organization that has an image; it uses the generic template. The second one is a person, which also uses the generic template, but has no image. Both use the generic template because neither the Organization nor the Person template has been created. However, the third result, a RadioStation, uses a different template: the Broadcaster template, since the RadioStation class is a sub-class of Broadcaster and the Broadcaster template exists in the Drupal instance.

Here is the code of the Broadcaster search result template:

[gist id="8581527"]

Now let’s take a look at the template that displays information about a specific Entity type:

[Screenshot: entity view page for a radio station]

This minimal record view displays some information about this radio station. The code of this template is:

[gist id="8585895"]

Building Complex Search Queries using the OSF Query Builder

A system administrator can also use the OSF Query Builder to create more complex search queries. In the following query, we search for the keyword “radio”, filter by the type RadioStation, and boost the scoring value of all the results that have the word “life” in their slogan.

[Screenshot: OSF Query Builder query for the keyword “radio”]

The top result is a radio station in Moscow that has “Life in Motion!” as its slogan. We can also see the impact of the scoring booster on the score of that result.

Conclusion

As we can see from these two articles, it is relatively easy and fast to import the DBpedia dataset into an OSF instance. By doing so, we end up with a series of tools to access, manage and publish this information. We can then leverage the OSF platform to create all kinds of web portals or other web services. All the tools are there, out of the box.

That being said, this is not where the challenge lies. There are more than 500 classes and 2,000 properties that describe all the content present in the DBpedia Ontology. This means that more than 2,000 filters may exist for the Search API, the sWebMap widget, etc. It also means that more than 500 Drupal bundles can be created, with hundreds of fields.

All of this needs to be properly configured and managed by the Drupal site developer. However, mechanisms have been developed to help manage this amount of information, such as the entity template selection mechanism that uses the ontologies' structure to select the display templates to use. For example, you could focus on the Broadcaster entity and create a single template for it. Automatically, this template could then be used by sub-classes such as BroadcastNetwork, RadioStation, TelevisionStation and many others.

As you may have noticed in this series of two articles, the Open Semantic Framework is really flexible and powerful. However, the challenge, and most of the work, lies in creating and configuring the portal that will use this information: creating the search and entity templates, properly defining and managing the bundles and fields, and so on.