New UMBEL Concept Tagger Web Service

We just released a new UMBEL web service endpoint and online tool: the Concept Tagger Plain.

This plain tagger uses UMBEL reference concepts to tag an input text. The OBIE (Ontology-Based Information Extraction) method is used, driven by the UMBEL reference concept ontology. By plain we mean that the words (tokens) of the input text are matched to either the preferred labels or alternative labels of the reference concepts. The simple tagger is merely making string matches to the possible UMBEL reference concepts.

This tagger matches the plain labels of the reference concepts against the input text. No manipulations (such as stemming) are performed on either the reference concept labels or the input text. Also, the tagger performs NO disambiguation when multiple concepts are tagged for a given keyword.

Intended Users

This tool is intended for those who want to focus on UMBEL and do not care about more complicated matches. The output of the tagger can be used as-is, but it is intended to be the initial input to more sophisticated reference concept matching and disambiguation methods. Expect additional tagging methods to follow (see conclusion).

The Web Service Endpoint

The web service endpoint is freely available. It can return its resultset in JSON, Clojure code or EDN (Extensible Data Notation).

This endpoint will return a list of matches on the preferred and alternative labels of the UMBEL reference concepts that match the tokens of an input text. It will also return the number of matches and the position of the tokens that match the concepts.

The Online Tool

We also provide an online tagging tool that people can use to experience interacting with the web service.

The results are presented in two sections depending on whether the preferred or alternative label(s) were matched. Multiple matches, either by concept or label type, are coded by color. Source words with matches and multiple source occurrences are ranked first; thereafter, all source words are presented alphabetically.

The tagged concepts can be clicked to have access to their full description.

EDN and ClojureScript

An interesting aspect of this user interface is that it is implemented in ClojureScript, and the data exchanged between the user interface and the tagger web service endpoint is serialized in EDN. When the UI receives the resultset from the endpoint, it only has to read the EDN with the ClojureScript reader (cljs.reader/read-string) to treat the output of the web service endpoint as data native to the application.

No parsing of a non-native data format is necessary, which makes the code of the UI simpler and makes the data manipulation much more natural to the developer since no external API is necessary.

What is Next?

This is the first of a series of tagging web service endpoints that will be released. Our intent is to release UMBEL tagging services with different levels of sophistication. Depending on how they want to use UMBEL, users will have access to different tagging services that they can use and supplement with their own techniques to get their desired results.

The next taggers (not in order) that are planned to be released are:

  • Plain tagger – no weighting or classification except by occurrence count
    • Entity plain tagger (using the Wikidata dictionary)
    • Scones plain tagger – concept + entity
  • Noun tagger – with POS, only tags the nouns; generally, the preferred, simplest baseline tagger
    • Concept noun tagger
    • Entity noun tagger
    • Scones noun tagger
  • N-gram tagger – a phrase-based tagger
    • Concept n-gram tagger
    • Entity n-gram tagger
    • Scones n-gram tagger
  • Complete tagger – combinations of above with different machine learning techniques
    • Concept complete tagger
    • Entity complete tagger
    • Scones complete tagger.

So, we welcome you to try out the system online and we welcome your comments and suggestions.

My Optimal GNU Emacs Settings for Developing Clojure (so far)

Note: this blog post has been revised with this other blog post.

In the coming months, I will start to publish a series of blog posts that will explain how RDF data can be serialized in Clojure code and, more importantly, what the benefits of doing this are. At Structured Dynamics, we started to invest resources into this research project and we believe that it will become a game changer regarding how people will consume, use and produce RDF data.

But I want to take a humble first step on this journey by explaining how I ended up configuring Emacs for working with Clojure. I want to take the time to do this because it is a trial-and-error process that can be somewhat time-consuming for newcomers.

Light Table

Before discussing how I configured Emacs, I want to introduce you to a new IDE: Light Table. This new IDE is meant to be the next generation of code editor. If you are new to Clojure, and particularly if you have never used Emacs before, I would strongly suggest that you start with this code editor. It is not only simple to use, but all the packages you will need to work with Clojure are already built in.

As you may know, GNU Emacs has been developed using Emacs Lisp (a Lisp dialect). This means that it can be extended by installing and enabling packages, and that all configuration options and behaviors can be changed, even while it is running! Light Table is no different. It has been developed in ClojureScript, and it can be extended the same way. To me, the two real innovations with Light Table are:

  • The instarepl
  • The watches

The instarepl is a way to evaluate the value of anything, while you are coding, directly inline in your code. This is really powerful and handy when prototyping and testing code. Every time you type some code, it gets evaluated in the REPL, and the result is displayed inline in the code editor.

The watches are like permanent instarepls that you place within the code. Then every time the value changes, you see the result in the watch section. This is really handy when you have to see the value of some computation while the application, or part of the application, is running. You get a live output of what is being computed, directly in your code.

The only drawback I have with Light Table is that there is no legacy REPL available (yet?). This means that if you want to evaluate something unrelated to your code, you have to write the code directly into the editor and then evaluate it with the instarepl. Another issue in some use cases is that the evaluation of the code can become confusing, for example when you define a Jetty server in your code. Since everything gets evaluated automatically (if the live mode is enabled), it can start the server without you knowing it. Then, to stop it, you have to write a line of code into your code and evaluate it.

Because of the nature of my work, I am a heavy user of multiple monitors (daily working with six monitors). This means that properly handling multiple monitors is essential to my productivity. That is another issue I have with Light Table: you can create new windows that you can move to other monitors, but these windows are unconnected: they are different instances of Light Table.

Simple is beautiful, and it is why I really do like Light Table and why I think it is what beginners should use to start working with Clojure. However, it is not yet perfect for what I have to do. That is why I chose to use GNU Emacs for my daily work.

GNU Emacs

I don’t think that GNU Emacs needs any kind of introduction. It is heavy, it is unnatural, it takes time to get used to, the learning curve is steep, but… hell it is powerful for working with Lisp dialects like Clojure!

The problem with Emacs is not just learning the endless list of key bindings (even if you can go a long way with the core ones), but also configuring it to your taste. Since everything can be configured, and since there are hundreds of packages of all kinds, it takes time to configure all the options you want and all the modules you require. This is the main reason I wrote this blog post: to share my (currently) best set of configuration options and packages for using Emacs for developing with Clojure.

I am personally developing on Windows 8, but these steps should be platform agnostic. You only have to download and install the latest GNU Emacs 24 version.

The first thing you have to do is to locate your .emacs file. All the configurations I am defining in this blog post go into that file.

Packages

Once Emacs is installed, the first thing you have to do is to install all the packages that are required to develop in Clojure or that will make your life easier for handling the code. The packages that you have to install are:

  • cider
    • Clojure Integrated Development Environment and REPL – This is like Slime for Common Lisp. This is what turns Emacs into a Clojure IDE
    • Important note: make sure that the Cider version you are installing is coming from the MELPA-Stable repository, and not the MELPA one. At the time of the publication of the blogpost, the latest stable release is 0.6.
  • clojure-mode
    • Major mode for Clojure code
  • clojure-test-mode
    • Minor mode for Clojure tests
  • auto-complete
    • Auto Completion for GNU Emacs – This is what is used to have auto-completion capabilities into your code and in the mini-buffer
  • ac-nrepl
    • Auto-complete sources for Clojure using nrepl completions – This is what is used to add auto-completion capabilities to the NREPL
  • paredit
    • Minor mode for editing parentheses – This is what will do all the Lisp-like code formatting (helping you manage all these parentheses)
  • popup
    • Visual Popup User Interface – This is what will enable popup contextual menus when using auto-completion in your code and in the NREPL
  • rainbow-delimiters
    • Highlight nested parens, brackets and braces in a different color at each depth – This is really handy to see visually where you are within your parentheses. An essential to have
  • rainbow-mode
    • Colorize color names in buffers

Before installing them, we have to tell Emacs to use the package repositories (MELPA-Stable, MELPA and Marmalade) where these packages are hosted and ready to be installed into your Emacs instance. At the top of your .emacs file, put:

[cc lang='lisp' line_numbers='false']
[raw](require 'package)

(add-to-list 'package-archives
             '("melpa-stable" . "http://melpa-stable.milkbox.net/packages/"))

(add-to-list 'package-archives
             '("melpa" . "http://melpa.milkbox.net/packages/"))

(add-to-list 'package-archives
             '("marmalade" . "http://marmalade-repo.org/packages/"))

;; Initialize all the ELPA packages (what is installed using the packages commands)
(package-initialize)[/raw]
[/cc]

Important note: only use the MELPA repository if you want to install non-stable modules such as the Noctilux theme. If you do not expect to use it, then I would strongly suggest that you remove it and keep only the MELPA-Stable repository in that list.

If you are editing your .emacs file directly in Emacs, you can re-evaluate the settings without restarting: move the cursor to the end of each top-level expression (after the closing parenthesis) and press C-x C-e. However, it may be faster just to close and restart Emacs to take the new settings into account. You can use any of these methods for the following set of settings changes.
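A third option, which is not part of my original setup, is to reload the whole file with a small command. The sketch below assumes the hypothetical name my-reload-init-file and relies on the built-in user-init-file variable, which is only set when Emacs was started with an init file (it is not set with emacs -q):

[cc lang='lisp' line_numbers='false']
[raw];; Hypothetical helper (the name is an assumption, not an Emacs built-in):
;; reload the init file without restarting Emacs.
(defun my-reload-init-file ()
  "Reload the user's init file."
  (interactive)
  (load-file user-init-file))[/raw]
[/cc]

Once this is defined, M-x my-reload-init-file re-runs every setting in the file. The built-in M-x eval-buffer does something similar for the buffer you are currently editing.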

Before changing any more settings, we will first install all the required packages using the following sequence of commands:

  • M-x package-install [RET] cider [RET]
  • M-x package-install [RET] clojure-mode [RET]
  • M-x package-install [RET] clojure-test-mode [RET]
  • M-x package-install [RET] auto-complete [RET]
  • M-x package-install [RET] ac-nrepl [RET]
  • M-x package-install [RET] paredit [RET]
  • M-x package-install [RET] popup [RET]
  • M-x package-install [RET] rainbow-delimiters [RET]
  • M-x package-install [RET] rainbow-mode [RET]

Alternatively, you could have used M-x package-list-packages, then moved your cursor in the buffer to each package’s line and pressed i (for install). Once all the packages are selected, you could have pressed x (execute) to install them all at once.

In the list of commands above, M- is the “meta” key, normally bound to the left Alt key on your keyboard. So, M-x usually means Alt-x.
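The same installation can also be scripted, which is handy when setting up a new machine. This is only a sketch under a few assumptions: the variable name my-clojure-packages is made up for illustration, the snippet is placed after the (package-initialize) call in your .emacs file, and the repositories configured earlier are reachable:

[cc lang='lisp' line_numbers='false']
[raw];; Hypothetical variable (name is an assumption): the packages listed above.
(defvar my-clojure-packages
  '(cider clojure-mode clojure-test-mode auto-complete ac-nrepl
    paredit popup rainbow-delimiters rainbow-mode)
  "Packages that should be present for Clojure development.")

;; Download the archive listings if they have not been fetched yet.
(unless package-archive-contents
  (package-refresh-contents))

;; Install only the packages that are still missing.
(dolist (pkg my-clojure-packages)
  (unless (package-installed-p pkg)
    (package-install pkg)))[/raw]
[/cc]

Because of the package-installed-p check, running this at every startup does nothing once all the packages are in place.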

Now that all the packages are installed, let’s take a look at how we should configure them.

Configuring Keyboard

If you are using an English/US keyboard, you can skip this section. Since I use a French Canadian layout (on an English/US Das Keyboard!), I had multiple issues getting my keys to work since all the bindings changed in Emacs. To solve this problem, I simply had to enable the following language configuration option. I also had to start using the right Alt key of my keyboard to write my brackets, curly brackets, etc.:

[cc lang='lisp' line_numbers='false']
[raw];; Enable my Canadian French keyboard layout
(require 'iso-transl)[/raw]
[/cc]

Configuring Fonts

Since I am growing older (and since I have plenty of screen real estate with six monitors), I need bigger fonts. I like coding using Courier New, so I just configured it to use the font size 13 instead of the default 10:
[cc lang='lisp' line_numbers='false']
[raw];; Set bigger fonts
(set-default-font "Courier New-13")[/raw]
[/cc]
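Note that set-default-font is an obsolete alias of set-frame-font and, as far as I understand, it only changes the frame that is selected when it runs. Since I open several frames across my monitors, a possible variant (an assumption on my part, not something I have in my configuration file) is to register the font in default-frame-alist so that frames created afterwards use it too:

[cc lang='lisp' line_numbers='false']
[raw];; Assumption: the "Courier New" font is installed on the system.
;; Frames created after this point start with the bigger font.
(add-to-list 'default-frame-alist '(font . "Courier New-13"))[/raw]
[/cc]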

Cider and nREPL

The next step is to configure Cider and nREPL, the two pieces that turn Emacs into a wonderful Clojure IDE:

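[cc lang='lisp' line_numbers='false']
[raw];; Show function signatures in the echo area while editing Clojure
(add-hook 'clojure-mode-hook 'turn-on-eldoc-mode)
;; Do not pop up a stack trace buffer on errors
(setq nrepl-popup-stacktraces nil)
;; Display this buffer in the current window instead of a new one
(add-to-list 'same-window-buffer-names "nrepl")[/raw]
[/cc]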

Auto-completion

The next step is to configure the auto-completion feature everywhere in Emacs: in any buffer, in the nREPL and in the mini-buffer. We also want the auto-completion to appear in a contextual menu where the docstrings (documentation) of the functions are displayed:

[cc lang='lisp' line_numbers='false']
[raw];; General Auto-Complete
(require 'auto-complete-config)
(setq ac-delay 0.0)
(setq ac-quick-help-delay 0.5)
(ac-config-default)

;; ac-nrepl (Auto-complete for the nREPL)
(require 'ac-nrepl)
(add-hook 'cider-mode-hook 'ac-nrepl-setup)
(add-hook 'cider-repl-mode-hook 'ac-nrepl-setup)
(add-to-list 'ac-modes 'cider-mode)
(add-to-list 'ac-modes 'cider-repl-mode)[/raw]
[/cc]

Popping Contextual Documentation At Any Time

What is really helpful is to be able to pop up the documentation for any symbol at any time just by pressing a series of keys. What needs to be done is to configure Cider and ac-nrepl to bind this behavior to the C-c C-d key sequence:

[cc lang='lisp' line_numbers='false']
[raw];; Popping up contextual documentation
(eval-after-load "cider"
  '(define-key cider-mode-map (kbd "C-c C-d") 'ac-nrepl-popup-doc))[/raw]
[/cc]

Paredit

Paredit is the package that will help you format your Clojure code automatically. It balances the parentheses, automatically indents your S-expressions, etc.

[cc lang='lisp' line_numbers='false']
[raw](add-hook 'clojure-mode-hook 'paredit-mode)[/raw]
[/cc]
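The hook above only covers Clojure source buffers. If you also want the same structural editing in the Cider REPL and when editing your .emacs file, a possible extension (an assumption, it is not in my configuration file) is to add the mode to the corresponding hooks:

[cc lang='lisp' line_numbers='false']
[raw];; Optional: enable paredit in the Cider REPL buffer as well
(add-hook 'cider-repl-mode-hook 'paredit-mode)
;; Optional: enable paredit when editing Emacs Lisp (e.g. the .emacs file)
(add-hook 'emacs-lisp-mode-hook 'paredit-mode)[/raw]
[/cc]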

Show Parenthesis Mode

Another handy feature is to enable, by default, show-paren-mode. That way, every time the cursor is on a parenthesis, the matching parenthesis is highlighted in the user interface. This is an essential must-have with Paredit:

[cc lang='lisp' line_numbers='false']
[raw];; Show parenthesis mode
(show-paren-mode 1)[/raw]
[/cc]
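The highlighting can be tuned with two built-in options. This is only a sketch of settings you may want to try, not part of my configuration: show-paren-delay controls how long Emacs waits before highlighting, and show-paren-style selects what gets highlighted (the values are parenthesis, expression or mixed):

[cc lang='lisp' line_numbers='false']
[raw];; Set the delay before enabling the mode so that it is taken into account
(setq show-paren-delay 0)
;; Highlight only the two matching parentheses, not the whole expression
(setq show-paren-style 'parenthesis)
(show-paren-mode 1)[/raw]
[/cc]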

Rainbow Delimiters

Another essential package to help you manage these parentheses. Rainbow delimiters changes the color of the parentheses depending on how “deep” they are in the structure. Another essential visual cue:

[cc lang='lisp' line_numbers='false']
[raw];; rainbow delimiters
(global-rainbow-delimiters-mode)[/raw]
[/cc]
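As far as I know, more recent releases of rainbow-delimiters no longer provide the global mode. If the call above fails with a “void function” error in your version, a possible alternative is to enable the mode per major mode through hooks. This is a sketch to adapt, not something I use in this setup:

[cc lang='lisp' line_numbers='false']
[raw];; Alternative to the global mode: enable rainbow delimiters per major mode
(add-hook 'clojure-mode-hook 'rainbow-delimiters-mode)
(add-hook 'emacs-lisp-mode-hook 'rainbow-delimiters-mode)[/raw]
[/cc]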

Noctilux Theme

Did I say that I like Light Table? In fact, I really do like its dark theme. It is the best I have seen so far. I had never used a dark theme before since I never liked any of them. But that one is really neat, particularly for visualizing Clojure code. That is why I really wanted to get a Light Table theme for Emacs. It exists: it is called Noctilux, and it works exactly the same way with the same colors.

If you want to install it, you can get it directly from the packages archives. Type M-x package-list-packages then search and install noctilux-theme.

Then enable it by adding this setting:

[cc lang='lisp' line_numbers='false']
[raw];; Noctilux Theme
(load-theme 'noctilux t)[/raw]
[/cc]
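If you share the same .emacs file between machines, load-theme signals an error at startup when the theme is not installed. A defensive variant, sketched here as an assumption rather than what I actually use, checks the list returned by the built-in custom-available-themes first. It assumes that (package-initialize) has already run, so that the installed theme can be found:

[cc lang='lisp' line_numbers='false']
[raw];; Load Noctilux only when Emacs can find it; otherwise keep the default theme
(if (memq 'noctilux (custom-available-themes))
    (load-theme 'noctilux t)
  (message "Noctilux theme not found; keeping the default theme"))[/raw]
[/cc]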

Binding Some Keys

Then I wanted to bind some behaviors to the F-keys. What I wanted is to be able to run Cider, to start and stop Paredit, and to switch frames (windows within monitors) with a single key press. I also added a shortcut for starting speedbar for the current buffer; it is essential for managing project files. What I did is to bind these behaviors to these keys:

[cc lang='lisp' line_numbers='false']
[raw](global-set-key [f8] 'other-frame)
(global-set-key [f7] 'paredit-mode)
(global-set-key [f9] 'cider-jack-in)
(global-set-key [f11] 'speedbar)[/raw]
[/cc]

Fixing the Scroll

There is one thing that I really didn’t like: the default scrolling behavior of Emacs on Windows. After some searching, I found the following settings, which give a smoother scrolling behavior on Windows:

[cc lang='lisp' line_numbers='false']
[raw];; scroll one line at a time (less "jumpy" than defaults)
(setq mouse-wheel-scroll-amount '(1 ((shift) . 1))) ;; one line at a time
(setq mouse-wheel-progressive-speed nil) ;; don't accelerate scrolling
(setq mouse-wheel-follow-mouse 't) ;; scroll window under mouse
(setq scroll-step 1) ;; keyboard scroll one line at a time[/raw]
[/cc]

Complete Configuration File

Here is the full configuration file that I am using:

[cc lang='lisp' line_numbers='false']
[raw](require 'package)

(add-to-list 'package-archives
             '("melpa-stable" . "http://melpa-stable.milkbox.net/packages/"))

(add-to-list 'package-archives
             '("melpa" . "http://melpa.milkbox.net/packages/"))

(add-to-list 'package-archives
             '("marmalade" . "http://marmalade-repo.org/packages/"))

;; Initialize all the ELPA packages (what is installed using the packages commands)
(package-initialize)

;; Enable my Canadian French keyboard layout
(require 'iso-transl)

;; Set bigger fonts
(set-default-font "Courier New-13")

;; Cider & nREPL
(add-hook 'clojure-mode-hook 'turn-on-eldoc-mode)
(setq nrepl-popup-stacktraces nil)
(add-to-list 'same-window-buffer-names "nrepl")

;; General Auto-Complete
(require 'auto-complete-config)
(setq ac-delay 0.0)
(setq ac-quick-help-delay 0.5)
(ac-config-default)

;; ac-nrepl (Auto-complete for the nREPL)
(require 'ac-nrepl)
(add-hook 'cider-mode-hook 'ac-nrepl-setup)
(add-hook 'cider-repl-mode-hook 'ac-nrepl-setup)
(add-to-list 'ac-modes 'cider-mode)
(add-to-list 'ac-modes 'cider-repl-mode)

;; Popping up contextual documentation
(eval-after-load "cider"
  '(define-key cider-mode-map (kbd "C-c C-d") 'ac-nrepl-popup-doc))

;; paredit
(add-hook 'clojure-mode-hook 'paredit-mode)

;; Show parenthesis mode
(show-paren-mode 1)

;; rainbow delimiters
(global-rainbow-delimiters-mode)

;; Noctilux Theme
(load-theme 'noctilux t)

;; Key bindings: F7 toggles paredit, F8 switches frame, F9 starts Cider, F11 opens speedbar
(global-set-key [f8] 'other-frame)
(global-set-key [f7] 'paredit-mode)
(global-set-key [f9] 'cider-jack-in)
(global-set-key [f11] 'speedbar)

;; scroll one line at a time (less "jumpy" than defaults)
(setq mouse-wheel-scroll-amount '(1 ((shift) . 1))) ;; one line at a time
(setq mouse-wheel-progressive-speed nil) ;; don't accelerate scrolling
(setq mouse-wheel-follow-mouse 't) ;; scroll window under mouse
(setq scroll-step 1) ;; keyboard scroll one line at a time[/raw]
[/cc]

Conclusion

Now that we have the proper development environment in place, the next blog posts will really get into the heart of the matter: the different ways to serialize RDF data in Clojure code, how the generated code can be used, what the benefits are, and how it changes the way that data (RDF in this case, but really any data) can be produced and consumed.

We think that there are profound implications for how we, as Semantic Web specialists, will work with data instances and ontologies in the future. The initial project that will embed and benefit from these new principles and techniques will be the next version of the UMBEL ontology.

Final note: there is an endless list of features and packages for Emacs. Obviously, I don’t know all of them, so if you are aware of any settings or packages that I missed here and that could improve this setup, please share them in the comments.