Archive for the 'Programming' Category

RDF Code: Serializing RDF Data as Clojure Code

RDF Code is a specification to serialize RDF data as Clojure code. This blog post introduces the first version of this new RDF serialization format. I will outline the rules that specify how such RDF data should be serialized using the Clojure programming language.

This specification may change over time. However, this is the specification that will be used for the future blog posts that I will write about this subject, and for the code that will be released.

I am also expecting feedback and proposals to make this serialization easier to use, simpler to define and cleaner.

What we do with this serialization is write RDF resources as Clojure maps. This is not about defining a DSL to manipulate this data, but about defining the core Clojure structure that will be manipulated by Clojure functions and applications. Eventually, one or more DSLs could be created to help users and developers use this RDF data in their Clojure applications, but this is not the current focus.

A Complete RDF Resource

Before outlining all the rules to create well-formed RDF data as Clojure code, let's take a look at a resource that uses all the serialization features. Note that this is as complex as it can get. Below we will see how we can normalize the usage of the serialization rules to end up with clean and easy to read RDF data as Clojure code.

(def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label "Fred"
           iron/alt-label {value "Frederick"
                           lang "en"}
           foaf/skypeID {value "frederick.giasson"
                         datatype xsd/*string}
           foaf/knows [{uri "http://foo.com/datasets/people/bob"}
                       mike
                       "http://foo.com/datasets/people/teo"]})

This code shows how to:

  • Serialize a single resource as a Clojure map
  • Define a URI for that resource
  • Define one or multiple rdf:type for a resource
  • Define one or multiple values for an owl:DatatypeProperty
  • Define one or multiple values for an owl:ObjectProperty
  • Define the language of a Literal
  • Define the datatype of a Literal

As you can see, such an RDF serialization format is expressive enough to express any RDF triple. It also has syntactic rules that help with reading and writing RDF data in that format.

Note that since this RDF data is also Clojure code, the serialization format has been highly influenced by Clojure's own syntax and coding style. RDF data serialized using this format also needs to be valid Clojure code. Now, let's outline the rules that govern the creation of such RDF data, then let's explain all these rules using simple RDF code examples.

RDF Code Rules

Here is the list of all the rules that govern the creation of RDF data serialized as Clojure code:

  1. An RDF resource is defined as a Clojure map where:
    1. Every key is a symbol that references a function
    2. Every value is a:
      1. string
        1. A string is considered a literal if the key is an owl:DatatypeProperty
        2. A string is considered a URI if the key is an owl:ObjectProperty
      2. map
        1. A map represents a literal if the value key is present
        2. A map represents a reference to another resource if the uri key is present
        3. A map is invalid if it has neither a uri nor a value key
      3. vector
        1. A vector refers to multiple values. Values of a vector can be strings, maps or symbols
      4. symbol
        1. A symbol can be created to simplify the serialization. However, these symbols have to reference a string or a map object

In addition to these rules, there are some more specific rules such as:

  1. The value of a uri key is always a string
  2. If the rdf/type key is not defined for a resource, then the resource is considered to be of type owl:Thing (since everything is at least an instance of the owl:Thing class in OWL)

Finally, there are two additional naming conventions for classes and datatypes:

  1. The name of the classes starts with a + sign, like: owl/+thing
  2. The name of the datatypes starts with a * sign, like: xsd/*string

As you can see, the rules that govern the serialization of RDF data as Clojure code are minimal and should be simple to understand for someone who is used to Clojure code and who has tried to write a few resource examples using this format. Now, let's apply these rules with a series of examples.

Note 1: in the examples of this blog post, I am referring to symbols like uri, value, lang, datatype, etc. To make the rules simpler to read and understand, consider that these symbols are defined in the user's namespace. However, they are actually defined in the rdf.core namespace, which will be made publicly available later.

Note 2: I am also referring to namespaced symbols like rdf/type, iron/pref-label, etc. These symbols are defined in their respective namespaces. They are made available with a clause such as (:require [ontologies.rdf :as rdf] [ontologies.iron :as iron]) in the ns form of the Clojure source code file that defines this RDF resource. We will discuss these namespaces in a subsequent blog post.

Serializing RDF Code in N-Triples

Before starting to list all the examples, let's define a Clojure function that we will use to convert the RDF code to N-Triples. N-Triples is just a list of <subject> <predicate> <object> triples that describe the RDF resources. N-Triples is the simplest and most verbose RDF serialization that currently exists. The serialize-ntriples function takes a Clojure map that represents an RDF resource and returns a string that contains the serialized N-Triples.

What is important with that code is that it shows how the rules we outlined above are implemented to serialize RDF code as N-Triples. Similar serializer functions could be created for the Turtle and RDF/XML serializations as well. Note that you won't be able to use the serialize-ntriples function because you are missing the ontologies files. I will make them available in a subsequent blog post that will explain how properties, classes and datatypes are created and used in this context.

(declare serialize-ntriples-map-value serialize-ntriples-string-value is-datatype-property?)

(defn serialize-ntriples
  [resource]
  (let [n3 (atom "")
        iri (get resource rdf.core/uri)]
    (doseq [[property vals] resource]
      (let [property-uri (get (meta property) rdf.core/uri)]
        ; Don't do anything with the "uri" key
        (if (not= property rdf.core/uri)
          (if (vector? vals)
            ; Here the value is a vector of maps or values
            (doseq [val vals]
              (if (map? val)
                ; The value of the vector is a map
                (reset! n3 (str @n3 (serialize-ntriples-map-value val iri property-uri)))
                (if (string? val)
                  ; The value of the vector is a string
                  (reset! n3 (str @n3 (serialize-ntriples-string-value val iri property-uri property))))))
            (if (map? vals)
              ; The value of the property is a map
              (reset! n3 (str @n3 (serialize-ntriples-map-value vals iri property-uri)))
              (if (string? vals)
                ; The value of the property is some kind of literal
                (reset! n3 (str @n3 (serialize-ntriples-string-value vals iri property-uri property)))))))))
    @n3))

(defn- serialize-ntriples-map-value
  [m iri property-uri]
  (if (not (nil? (get m rdf.core/uri)))
    ; The value is a reference to another resource
    (format "<%s> <%s> <%s> .\n" iri property-uri (get m rdf.core/uri))
    (if (not (nil? (get m rdf.core/value)))
      ; The value is some kind of literal
      (let [value (get m rdf.core/value)
            lang (if (get m rdf.core/lang) (str "@" (get m rdf.core/lang)) "")
            datatype (if (get m rdf.core/datatype) (str "^^<" (get (get m rdf.core/datatype) rdf.core/uri) ">") "")]
        (format "<%s> <%s> \"\"\"%s\"\"\"%s%s .\n" iri property-uri value lang datatype))
      (if (string? m)
        ; The value of the vector is some kind of literal
        (format "<%s> <%s> \"\"\"%s\"\"\" .\n" iri property-uri m)))))

(defn- serialize-ntriples-string-value
  [s iri property-uri property]
  ; The value of the vector is a string
  (if (true? (is-datatype-property? property))
    ; The property referring to this value is a owl:DatatypeProperty
    (format "<%s> <%s> \"\"\"%s\"\"\" .\n" iri property-uri s)
    ; The property referring to this value is a owl:ObjectProperty
    (format "<%s> <%s> <%s> .\n" iri property-uri s)))

(defn is-datatype-property?
  [property]
  ; True when the rdf/type of the property is owl:DatatypeProperty
  (= (get (get (meta property) rdf/type) rdf.core/uri)
     (get ontologies.owl/+datatype-property rdf.core/uri)))

The Simplest Resource

Here is the simplest resource that can be written:

(def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type "http://xmlns.com/foaf/0.1/Person"
           iron/pref-label "Fred"
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       "http://foo.com/datasets/people/mike"
                       "http://foo.com/datasets/people/teo"]})

It produces these triples:

<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://foo.com/datasets/people/fred> <http://purl.org/ontology/iron#prefLabel> """Fred""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/skypeID> """frederick.giasson""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/bob> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/mike> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/teo> .

In this example, all the references to the classes and to other resources are made using strings that represent URIs. All the literal values are normal strings as well. The vector is composed of a list of URIs without mixing different types of values.

Using Class Symbols

In this example, we will use the symbol that references a class resource we have defined in an ontology:

(def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type foaf/+person
           iron/pref-label "Fred"
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       "http://foo.com/datasets/people/mike"
                       "http://foo.com/datasets/people/teo"]})

The same set of triples as the previous example will be generated by serialize-ntriples.

What is interesting with this new example is that it is more appealing to human readers. Instead of having a full URI string, we are seeing a symbol which is more pleasant to the eyes.

But the real benefit is not that it is more pleasant to the eyes, but that it really refers to something. It refers to a class resource. This means that we have a docstring for that class, that we can check the code that describes the class to see all its characteristics, that you will be able to auto-complete it in your IDE, etc.

Multiple Types

It is possible to define multiple types for a resource. The only thing you have to do is to use a vector as the value of the rdf/type key:

(def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label "Fred"
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       "http://foo.com/datasets/people/mike"
                       "http://foo.com/datasets/people/teo"]})

This code will generate the following triples:

<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
<http://foo.com/datasets/people/fred> <http://purl.org/ontology/iron#prefLabel> """Fred""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/skypeID> """frederick.giasson""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/bob> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/mike> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/teo> .

Specifying a Datatype for a Literal

It is also possible to define a datatype for a literal. What you have to do is to use a map with a value and a datatype key:

(def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label {value "Fred"
                            datatype xsd/*string}
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       "http://foo.com/datasets/people/mike"
                       "http://foo.com/datasets/people/teo"]})

The triples that will be generated are:

<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
<http://foo.com/datasets/people/fred> <http://purl.org/ontology/iron#prefLabel> """Fred"""^^<http://www.w3.org/2001/XMLSchema#string> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/skypeID> """frederick.giasson""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/bob> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/mike> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/teo> .

Specifying a Language Tag for a Literal

It is also possible to define a language tag for a literal. What you have to do is to use a map with a value and a lang key:

(def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label {value "Fred"
                            lang "en"}
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       "http://foo.com/datasets/people/mike"
                       "http://foo.com/datasets/people/teo"]})

This code will produce the following triples:

<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
<http://foo.com/datasets/people/fred> <http://purl.org/ontology/iron#prefLabel> """Fred"""@en .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/skypeID> """frederick.giasson""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/bob> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/mike> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/teo> .

Referring to a URI Using a Map

It is possible to make explicit the fact that a value is a URI reference. This is done by using a map and specifying the uri key. Some people may prefer that approach because it makes the fact that a value is a URI explicit without having to know the nature of the key (i.e. if the property is a datatype or an object property):

(def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label {value "Fred"
                            lang "en"}
           foaf/skypeID "frederick.giasson"
           foaf/knows [{uri "http://foo.com/datasets/people/bob"}
                       {uri "http://foo.com/datasets/people/mike"}
                       {uri "http://foo.com/datasets/people/teo"}]})

The same triples will be generated as the example above.

Mixing Values in a Vector

It is possible to mix the values in a vector. In the following example, we are using strings as URIs, and maps that refer to URIs as well. The rules permit this kind of value mixing within a vector, and the software that manipulates this kind of RDF data should be agnostic to it:

(def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label {value "Fred"
                            lang "en"}
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       {uri "http://foo.com/datasets/people/mike"}
                       "http://foo.com/datasets/people/teo"]})

The same triples will be generated as the example above.

Using Symbols to Get Cleaner Code

Because of the way Clojure works, you can define new symbols that will refer to the values to add to this structure. Let's go wild and define a symbol for each value of the resource we are describing:

(def fred-uri "http://foo.com/datasets/people/fred")

(def foaf_person foaf/+person)

(def owl_thing owl/+thing)

(def fred-label {value "Fred"
                 lang "en"})

(def fred-skype-id "frederick.giasson")

(def bob-uri "http://foo.com/datasets/people/bob")

(def mike-uri {uri "http://foo.com/datasets/people/mike"})

(def teo-uri "http://foo.com/datasets/people/teo")

(def fred {uri fred-uri
           rdf/type [foaf_person owl_thing]
           iron/pref-label fred-label
           foaf/skypeID fred-skype-id
           foaf/knows [bob-uri
                       mike-uri
                       teo-uri]})

The triples generated will be the same as the example above.

What happens here is that Clojure resolves the symbols to the actual things they refer to at evaluation time.

This is why this works and it is why it is still valid RDF code according to the rules outlined above. The serialize-ntriples function is not even aware that this data structure has been defined that way. It is the case because the map that it will receive as input will already be evaluated. This means that what it gets as input are the objects referenced by these symbols, and not the symbols themselves.

This mechanism can be leveraged to make the RDF code even more readable if you have patterns that are repeated constantly in a dataset file you are creating.

Conclusion

This is all good, but why yet another serialization for RDF? As I started to outline in my two previous blogposts on this topic, this is not just about serializing RDF in another format. It is about having RDF data that can be evaluated as code (in this case, Clojure code).

Considering RDF properties as functions opens up a world of possibilities. First, it means that by evaluating and compiling this kind of RDF code, you have a data structure that is able to validate itself according to the way the properties are defined in the OWL ontologies used to define their behaviors and semantics.

This blog post focused on how to serialize instance data. In the next blog posts, I will cover:

  • How to reify triples using this serialization
  • How to create the classes, properties and datatypes used by the serialization
  • How to validate the data structure
  • How to manage ontologies as Clojure packages

Feel free to comment on this blog post and propose changes to the serialization format or the serialize-ntriples function.

Data as Code. Code as Data: Tighter Semantic Web Development Using Clojure

I have been working professionally in the field of the Semantic Web for more than 7 years now. I have been developing all kinds of ontologies. I have been integrating all kinds of datasets from various sources. I have been working with all kinds of tools and technologies using all kinds of technology stacks. I have been developing services and user interfaces of all kinds. I have been developing a set of 27 web services packaged as the Open Semantic Framework and re-implemented the core Drupal modules to work with RDF data as I wanted them to. I wrote hundreds of thousands of lines of code with one goal in mind: leveraging the ideas and concepts of the Semantic Web to make me, other developers, ontologists and data scientists work more accurately and efficiently with any kind of data.

However, even after doing all that, I was still feeling a void: a disconnection between how I was thinking about data and how I was manipulating it using the programming languages I was using, the libraries I was leveraging and the web services that I was developing. Everything is working, and is working really well; I did gain a lot of productivity in all these years. However, I was still feeling that void, that disconnection between the data and the programming language.

Every time I want to work with data, I have to get that data serialized using some format, then I have to parse it using a parser available in the language I am working with. Then the data needs to be converted into an internal structure by the parser. Then I have to use all kinds of specialized APIs to work with the data represented by that structure. Then if I want to validate the data that I am working with, I will probably have to use another library that will perform the validation for me, which may force me to migrate that data to another system that will make it available to these reasoners and validators. Etc, etc, etc…

All this is working: I have been doing this for years. However, the level of interaction between all these systems is high and the integration takes time and resources. Is there a way to do things differently?

The Pink Book

Once I realized that, I started a quest to try to change that situation. I had no idea where I was heading, and what I would find, but I had to change my mind, to change my viewpoint, to start getting influenced by new ideas and concepts.

What I realized is how disconnected mainstream programming languages can be from the data I was working with. That made programming languages a natural first step for my investigation. I turned my chair and started to stare at my bookshelves. Then, like the One Ring, there was this little pink (really pink) book staring at me: Lambda-calcul types et modèles. I bought that book probably 10 years ago, then forgot about it. I always found its cover weird and its color awkward. But, because of these uncommon features, I was drawn to it.

Re-reading about lambda calculus opened my eyes. It led me to take a particular interest in homoiconic programming languages such as Lisp and some of its dialects.

Code as Data. Data as Code.

Is this not what I was looking for? Could this not fill the void I was feeling? Is this not where my intuition was heading?

What if the “data” I manipulate is the same as the code I am writing? What if the data that I publish could be the code of a module of an application? What if writing code is no different than creating data? What if data could be self-aware of its own semantic? What if by evaluating data structures, I would validate that data at the same time? What if “parsing” my data is in fact evaluating the code of my application? What if I could reuse the tools and IDEs I use for programming, but for creating, editing and validating data? Won’t all these things make things simpler and make me even more productive to work with data?

My intuition tells me: yes!

We have a saying at Structured Dynamics: the right tool for the right job.

That seems to be the kind of tool I need to fill that void I was feeling. I had the feeling that the distinction between the code and the data should be as minimal as possible, and homoiconic languages seem to be the right tool for that job.

Code as Data. Data as Code.

That is all good, but what does that really mean? What are the advantages and benefits?

That is the start of a journey, and this is what we will discover in the coming weeks and months. Structured Dynamics is starting to invest resources into that new project. We chose to do our work using Clojure instead of other Lisp dialects such as Common Lisp. We chose Clojure for many reasons. It compiles to JVM bytecode, which means that you can reuse any of this code in any other Java application and that you can use any Java library natively from Clojure. We also chose it for its native way of handling concurrency and parallelism, for its unique way of managing metadata within data structures, for its meta-programming capabilities through a macro system that enables us to create DSLs, etc.

The goal was to create a new serialization format for RDF and to serialize RDF data as Clojure code. The intuition was that RDF data would then become an integral part of Clojure applications because the data would be the code as well.

The data would be self-aware of its own semantics, which means that by evaluating the Clojure “RDF” code it would also validate itself using its embedded semantics. The RDF data would be in itself a [Clojure] application that is aware of its own semantics and knows how to validate itself.

That is the crux of my thinking. Then, how could this be implemented?

That is what I will cover in the coming weeks and months. We chose to use Clojure because it seems to be a perfect fit for that job. We will discover the reasons over time. However, the goal of these blog posts is to show how RDF can be serialized into [Clojure] code and the benefits of doing so. It is not about showing all the neat features of, and the wonderful thinking behind, Clojure. For that, I would strongly suggest that you get started with Clojure by reading the material covered in Tips for Clojure Beginners, and particularly that you take a few hours to watch Rich Hickey’s great videos.

 

 

jQuery Cookie Plugin Extended With HTML5 localStorage And Chunked Cookies

Is there a web developer who has never used cookies to save some information in a user’s browser? There may be, but they cannot be legion. As you probably know, the problem with cookies is that their implementation varies from browser to browser: some will limit the size of a cookie to 4096 bytes, others will limit the number of cookies from a specific domain to 50, others will have no perceivable limits, etc.


In any case, if one of these limits is reached, the cookie is simply not created by the browser. This is fine, because web developers expect cookies to fail from time to time, and the systems they develop have to cope with this unreliability. However, this situation can sometimes become frustrating, and it is why I wanted to extend the default behavior of the jQuery Cookie plugin with a few more capabilities.

This extension to the jQuery Cookie plugin adds the capability to save content that is bigger than 4096 bytes using two different mechanisms: HTML5’s localStorage, or a series of cookies where the content is chunked and saved. This extension is backward compatible with the jQuery Cookie plugin and its usage should be transparent to the users. Even if existing cookies have been created with the normal Cookie plugin, they will still be usable by this new extension. The usage syntax is the same, but 3 new options have been created.

Now, let’s see how this plugin works, how developers should use it, what are its limitations, etc.

You can immediately download the jQuery Extended Cookie plugin from here:

Limitations Of Cookies

First, let’s see what the RFC 2109 says about the limitations of cookies in web browsers. Browsers should normally have these implementation limits (see section 6.3):

   Practical user agent implementations have limits on the number and
   size of cookies that they can store.  In general, user agents' cookie
   support should have no fixed limits.  They should strive to store as
   many frequently-used cookies as possible.  Furthermore, general-use
   user agents should provide each of the following minimum capabilities
   individually, although not necessarily simultaneously:

      * at least 300 cookies
      * at least 4096 bytes per cookie (as measured by the size of the
        characters that comprise the cookie non-terminal in the syntax
        description of the Set-Cookie header)
      * at least 20 cookies per unique host or domain name

   User agents created for specific purposes or for limited-capacity
   devices should provide at least 20 cookies of 4096 bytes, to ensure
   that the user can interact with a session-based origin server.

   The information in a Set-Cookie response header must be retained in
   its entirety.  If for some reason there is inadequate space to store
   the cookie, it must be discarded, not truncated.

   Applications should use as few and as small cookies as possible, and
   they should cope gracefully with the loss of a cookie.

New Options

Before I explain how this extension works, let me introduce three new options that have been added to the Cookie plugin. These new options will be put into context, and properly defined later in this blog post.

  • maxChunkSize - This defines the maximum number of bytes that can be saved in a single cookie. (default: 3000)
  • maxNumberOfCookies - This is the maximum number of cookies that can be created for a single domain name. (default: 20)
  • useLocalStorage - This tells the extended Cookie plugin to use the HTML5 localStorage capabilities of the browser instead of a cookie to save that value. (default: true)

How Does This Extension Work?

As I said in the introduction of this blog post, this extension to the jQuery Cookie plugin does two things:

  1. It uses the HTML5 localStorage capabilities of the browser, if this feature is available, instead of relying on cookies. However, if cookies are needed by the developer, this feature can be turned off with the useLocalStorage: false option
  2. If the useLocalStorage option is disabled, or if localStorage is simply not available in the browser, and if the content is bigger than the size limit of a cookie, then this extension will chunk the input content and save it in multiple cookies

If useLocalStorage is true, then the plugin will check whether the HTML5 localStorage mechanism is available in the browser. If it is, then it will use that local storage to save and retrieve content. If it is not, then the plugin will act as if useLocalStorage were false and the process will continue by using cookies to save and read that content.
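The availability check described above could be sketched as follows. This is not the plugin's actual code: the hasLocalStorage name and the probe key are assumptions made for illustration, and the global object is passed as a parameter only so that the function can be exercised outside a browser.

```javascript
// Hypothetical helper (not part of the plugin's documented API).
// Returns true only if the given global object exposes a working localStorage.
function hasLocalStorage(globalObject) {
  try {
    var storage = globalObject.localStorage;
    if (!storage) {
      return false;
    }
    // Some browsers expose localStorage but throw on write (for example
    // when storage is disabled), so a write/remove probe is attempted.
    var probe = '__storage_probe__'; // assumed probe key
    storage.setItem(probe, probe);
    storage.removeItem(probe);
    return true;
  } catch (e) {
    return false;
  }
}
```

In a browser, this would be called as hasLocalStorage(window); when it returns false, the cookie path is used as if useLocalStorage were false.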

If useLocalStorage is false, or if the HTML5 localStorage mechanism is not available in the browser, then the plugin will check if the content is bigger than the maxChunkSize option. If it is, the content is split into chunks that are saved in different cookies until the limit imposed by the maxNumberOfCookies option is reached.

If cookies are used, then two use-cases can happen:

  1. The content is smaller than or equal to maxChunkSize
  2. The content is bigger than maxChunkSize

If the content is smaller than or equal to maxChunkSize, then only one cookie will be created by the browser. The name of the cookie will be the value provided to the key parameter.

If the content is bigger than maxChunkSize, then multiple cookies will be created, one per chunk. The convention is that the name of the first cookie is the value provided to the key parameter. The name of each of the other chunks is the value provided to the key parameter with the chunk indicator ---ChunkNum appended to it. For example, if we have a cookie with a content of 10000 bytes and maxChunkSize set to 4000 bytes, then these three cookies would be created:

  • cookie-name
  • cookie-name---1
  • cookie-name---2
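
The naming convention above can be sketched as follows. This is an illustration under stated assumptions, not the plugin's actual code: planCookieChunks is a hypothetical helper, the content is assumed to use one byte per character (plain ASCII), and chunks beyond the maxNumberOfCookies limit are assumed to be simply dropped.

```javascript
// Hypothetical helper, not the plugin's actual code: splits `content`
// into chunks and names them with the convention described above.
// Assumes one byte per character (plain ASCII content).
function planCookieChunks(key, content, maxChunkSize, maxNumberOfCookies) {
  var cookies = [];
  for (var start = 0; start < content.length; start += maxChunkSize) {
    // Assumption: chunks beyond the maxNumberOfCookies limit are not saved.
    if (cookies.length >= maxNumberOfCookies) {
      break;
    }
    var index = cookies.length;
    cookies.push({
      // The first cookie keeps the bare key; the others get ---ChunkNum.
      name: index === 0 ? key : key + "---" + index,
      value: content.slice(start, start + maxChunkSize)
    });
  }
  return cookies;
}

// The example from the text: 10000 bytes with a maxChunkSize of 4000.
var exampleContent = new Array(10001).join("x"); // a string of 10000 characters
var planned = planCookieChunks("cookie-name", exampleContent, 4000, 20);
console.log(planned.map(function (c) { return c.name; }));
```

With these inputs the sketch plans three cookies, named cookie-name, cookie-name---1 and cookie-name---2, holding 4000, 4000 and 2000 characters.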

Usage

Now, let’s see how this extended jQuery Cookie plugin should be used in your code. The usage of the extension is no different from the usage of the normal jQuery Cookie plugin. However, I am showing how to use the new options along with how to use the plugin in general.

Create a Cookie

Let’s create a cookie that expires in 365 days and where the path is the root:

$.cookie('my-cookie', "the-content-of-my-cookie", { expires: 365, path: "/" });

By default, this value will be persisted in the localStorage if the browser supports it, and not in a cookie. So, let’s see how to force the plugin to save the content in a cookie by using the useLocalStorage option:

$.cookie('my-cookie', "the-content-of-my-cookie", {useLocalStorage: false, expires: 365, path: "/" });

Delete a Cookie

Let’s see how a cookie can be deleted. The method is simply to put null as the value of the cookie. This will instruct the plugin to remove the cookie.

$.cookie('my-cookie', null);

With that call, the plugin will try to remove my-cookie from both the localStorage and the cookies.

Read a Cookie

Let’s see how we can read the content of a cookie:

var value = $.cookie('my-cookie');

With this call, value will get the content that has been saved in the localStorage or in the cookies, depending on whether the localStorage is available in the browser.

Now, let’s see how to force reading the cookies by bypassing the localStorage mechanism:

var value = $.cookie('my-cookie', {useLocalStorage: false});

Note that if no cookie exists for a key, then the $.cookie() function will return null.
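
For content that was saved in multiple cookies, reading presumably means putting the chunks back together in order. The plugin's reading logic is not shown in this post, so the following is only a sketch under assumptions: readChunkedCookie is a hypothetical helper, and jar is a plain object mapping cookie names to values that stands in for the browser's cookies.

```javascript
// Hypothetical helper, not the plugin's actual code: rebuilds a value
// from cookies named key, key---1, key---2, ...
// `jar` is a plain object mapping cookie names to values; it stands in
// for the browser's cookies so that the sketch can run anywhere.
function readChunkedCookie(key, jar) {
  var has = function (name) {
    return Object.prototype.hasOwnProperty.call(jar, name);
  };

  if (!has(key)) {
    return null; // no cookie exists for this key
  }

  var value = jar[key];
  // Append the following chunks until the next chunk name is missing.
  for (var i = 1; has(key + "---" + i); i++) {
    value += jar[key + "---" + i];
  }
  return value;
}
```

The null return value for a missing key follows the behavior of $.cookie() described above.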

Using Limitations

Let’s see how to use the maxNumberOfCookies and maxChunkSize options to limit the size and the number of cookies to be created.

With this example, the content will be saved in multiple cookies of 1000 bytes each up to 30 cookies:

var value = $.cookie('my-cookie', "the-content-of-my-cookie-is-10000-bytes-long...", {useLocalStorage: false, maxChunkSize: 1000, maxNumberOfCookies: 30, expires: 365, path: "/" });

Limitations

Users have to be aware of the limitations of this enhanced plugin. The appropriate values for the maxChunkSize and maxNumberOfCookies options differ from one browser to another. In the worst case, some cookies (or cookie chunks) may simply not be created by the browser. As stated in RFC 2109, web applications have to take that fact into account and be able to cope with it gracefully.
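
One way for an application to cope with this is to read the content back right after saving it, and to fall back to its normal behavior when the two values differ. Here is a minimal sketch, assuming that write and read are functions supplied by the caller (for instance, thin wrappers around $.cookie). saveAndVerify is a hypothetical helper and is not part of the plugin.

```javascript
// Hypothetical helper: saves a value, then checks that the browser kept it.
// `write(key, value)` and `read(key)` are supplied by the caller.
function saveAndVerify(key, value, write, read) {
  write(key, value);
  // If some chunks were dropped by the browser, the value read back
  // will not match the value that was written.
  return read(key) === value;
}

// Usage with an in-memory store that silently truncates long values,
// to imitate a browser that drops some cookie chunks.
var store = {};
function limitedWrite(key, value) {
  store[key] = value.slice(0, 10);
}
function storeRead(key) {
  return Object.prototype.hasOwnProperty.call(store, key) ? store[key] : null;
}

console.log(saveAndVerify("small", "short", limitedWrite, storeRead));
console.log(saveAndVerify("big", "a-much-longer-value", limitedWrite, storeRead));
```

The first call succeeds because the value fits in the store. The second one fails, which tells the application that it cannot rely on the saved content.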

Future Enhancements

In the future, this extension should detect the browser in which it runs, and set up the maxChunkSize and maxNumberOfCookies parameters automatically depending on the cookie limitations of each browser.

Conclusion

I had to create this extension to the jQuery Cookie plugin to be able to store the resultsets returned by some web service endpoints. It is only used to limit the number of queries sent to these endpoints. Since the values returned by the endpoints are nearly static, are loaded at each page view and are a few kilobytes in size, I had to find a way to save that information in the browser, and to overcome the size limitation of cookies where possible. I also needed to cope with older versions of browsers that only support cookies. In the worst-case scenario, where nothing works (neither the cookies nor the localStorage), the browser will simply send the request to the endpoints at each page load. But at least my application will benefit from this enhancement for the 95% of users where one of these solutions works.



