Friday, May 28, 2010

Understanding JavaScript libraries

I'd like to share some thoughts on supporting JavaScript libraries in an IDE. I've found the problem quite interesting so far, and I hope you'll enjoy the read :-)

Let's assume we have to add content assist for popular JavaScript libraries such as jQuery or Prototype; there are dozens of them.

First of all, if those libraries are written entirely in JavaScript, why do we have to add anything special? Can't a JS parser (e.g. JSDT) just take care of understanding those libraries, as is done for Java and other languages?

Unfortunately not. The most popular convention in the JavaScript world is to write libraries that extend themselves at runtime. For example:


/** Adds all fields and functions to the prototype of the given object
*/
jQuery.extend = function(object, fields) {
    for (var field in fields) {
        object.prototype[field] = fields[field];
    }
};

jQuery.extend(Array, {

    /** Removes an object from array
    */
    remove : function(obj) {
    },

    /** Prints JSON string of an object
    */
    toJSON : function() {
    }
});

So by statically analyzing the above code, a compiler could only guess that an extend(object, fields) function is defined and added to the jQuery object. For the other functions, remove(obj) and toJSON(), we can only assume, based on the comments (and code), that they will be added to the Array object once extend() is invoked. This extending pattern is the foundation of jQuery plugins, Prototype and ExtJS.
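To make the gap concrete, here is a minimal sketch that compares what a deliberately naive textual scan sees with what exists after the code has run. The names `Lib`, `Target` and `naiveStaticScan` are hypothetical stand-ins (for jQuery, Array and a static analyzer), and the regular expression is only an assumption for illustration, not how JSDT works.

```javascript
// Library source kept as text, following the extend pattern shown above.
// `Lib` and `Target` are hypothetical stand-ins for jQuery and Array.
const librarySource = `
var Lib = {};
function Target() {}
Lib.extend = function(object, fields) {
  for (var field in fields) {
    object.prototype[field] = fields[field];
  }
};
Lib.extend(Target, {
  remove : function(obj) {},
  toJSON : function() {}
});
return { Lib: Lib, Target: Target };
`;

// A deliberately naive "static" scan: it only finds direct assignments
// of the form `a.b = function`. Illustration only, not JSDT's algorithm.
function naiveStaticScan(source) {
  const found = [];
  const pattern = /([A-Za-z_$][\w$.]*)\s*=\s*function/g;
  let match;
  while ((match = pattern.exec(source)) !== null) {
    found.push(match[1]);
  }
  return found;
}

// Running the same source shows what actually exists afterwards.
const runtime = new Function(librarySource)();

const staticNames = naiveStaticScan(librarySource);
const runtimeNames = Object.keys(runtime.Target.prototype);

console.log("seen statically:", staticNames);
console.log("present at runtime:", runtimeNames);
```

The scan reports only the directly assigned function, while the members added through the extend call show up only on the prototype at runtime.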

Before going into the details of possible solutions, let's see what a typical static analyzer (e.g. JSDT) does understand. It understands basic JavaScript structures, like:


/** jQuery.extend jsdoc
*/
jQuery.extend = function(object, fields) {}

/** Array.remove jsdoc
*/
Array.prototype.remove = function(obj) {}
Array.prototype.toJSON = function() {}

Based on the above code, JSDT can provide content assist for functions and fields and properly match the jsdoc to them. So maybe all that is needed is some conversion from the original library source code to code the analyzer can understand?

OK, so what are the possible ways to build proper JSDT data?
There are several options:

  1. improve static analysis to understand popular JavaScript patterns, such as the extend pattern
  2. forget about the source code and use the API documentation provided by the libraries to derive JSDT documentation
  3. run the source code in a real environment and capture its behavior


AD1. Improve static analysis with knowledge of popular JavaScript patterns, such as the extend pattern.

It's possible to extend JSDT with extra logic that understands what the extend() function does. Then, whenever the static analyzer finds a call to extend(), it can easily apply the pattern. The drawback is that there are more patterns, for example extendIf(boolean, object, fields) (in ExtJS), or the code is more complicated: the features to be added to an object may be computed in a non-trivial way. For example:

// an object with various features that should be added to various objects
features = {
    objectAdditions : { object : Object, extend : function() {}, id : function() {} },
    domAdditions : { object : Element, toHTML : function() {} },
    docuAdditions : { object : Document, connect : function() {} }
};

for (var featureSet in features) {
    Lib.extend(features[featureSet].object, features[featureSet]);
}

Library authors are very inventive: they either optimize their libraries for JS file size, or try to make them as human-readable as possible by adding more JS structures. In both cases we find either more functions similar to extend(), or more JavaScript code used to build the new API.
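The pattern-based approach can be sketched on a hand-written, heavily simplified syntax tree. The node shapes (`Call`, `Identifier`, `ObjectLiteral`, `MemberExpression`), the list `extendLikeFunctions` and the helper `flattenExtendCall` are all assumptions made for this illustration; they are not the data structures of JSDT or of any real parser.

```javascript
// Simplified tree for:
//   jQuery.extend(Array, { remove: function(obj) {}, toJSON: function() {} })
// The node shapes are hypothetical.
const literalCall = {
  type: "Call",
  callee: "jQuery.extend",
  args: [
    { type: "Identifier", name: "Array" },
    {
      type: "ObjectLiteral",
      properties: [
        { key: "remove", value: { type: "Function", params: ["obj"] } },
        { key: "toJSON", value: { type: "Function", params: [] } }
      ]
    }
  ]
};

// A call whose arguments are computed, like Lib.extend(features[x].object, features[x]).
const computedCall = {
  type: "Call",
  callee: "jQuery.extend",
  args: [{ type: "MemberExpression" }, { type: "MemberExpression" }]
};

// Functions assumed to behave like extend(target, fields).
const extendLikeFunctions = ["jQuery.extend"];

// Returns flattened stub lines, or an empty array when the call
// does not match the simple "identifier + object literal" shape.
function flattenExtendCall(node) {
  if (node.type !== "Call" || extendLikeFunctions.indexOf(node.callee) === -1) {
    return [];
  }
  const target = node.args[0];
  const fields = node.args[1];
  if (!target || target.type !== "Identifier") return [];
  if (!fields || fields.type !== "ObjectLiteral") return [];
  return fields.properties
    .filter(p => p.value.type === "Function")
    .map(p => target.name + ".prototype." + p.key +
      " = function(" + p.value.params.join(", ") + ") {};");
}

const flattened = flattenExtendCall(literalCall);
const unresolved = flattenExtendCall(computedCall);

console.log(flattened.join("\n"));
console.log("stubs for computed call:", unresolved.length);
```

The literal call flattens into stubs a static analyzer can read, while the computed call yields nothing, which is exactly where this approach runs out of steam.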

AD2. Forget about the source code and use the API documentation provided by the libraries to derive JSDT documentation

The most successful libraries have excellent documentation. jQuery has its own in XML, which is then converted to HTML. ExtJS and Prototype have thoroughly commented source code. It's worth mentioning that JS library source code can be found in several formats. The first is the development source code: a single library split into many separate files of logically related code. The second is the debugging source code: the library in its final single-file shape, but with formatting and comments included. Finally, there's the minified source code: typically a single line, with all whitespace and comments stripped.

The main advantage is that everything officially documented and supported in the docs will be correctly reflected in JSDT. The main drawback is that a separate converter is needed for each library. Additionally, the XML and HTML structure is subject to change, so there's a risk that each new version of a library will also need a new converter. The documentation may also present the API in ways that cannot be encoded in JavaScript stubs. For example, it may use overloading to describe different variants of the same function, whereas in JavaScript this is just a single function with a variable number of arguments:

/** jsdoc1 */
function jQuery(selector) {}
/** jsdoc2 */
function jQuery(element, expr) {}

Finally, while parsing XML is relatively painless, HTML often triggers many XML parser errors, so regular expressions may be a better tool than an XML parser.
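One possible way to deal with documented overloads is to merge them into a single stub whose comment lists every variant. The sketch below assumes the documentation has already been parsed into plain objects; the `docEntries` shape and the `buildStubs` helper are hypothetical, and using the longest parameter list for the stub signature is just one design choice among several.

```javascript
// Hypothetical, already-parsed documentation entries (two "overloads").
const docEntries = [
  { name: "jQuery", params: ["selector"], doc: "jsdoc1" },
  { name: "jQuery", params: ["element", "expr"], doc: "jsdoc2" }
];

// Groups entries by function name and emits one stub per name.
function buildStubs(entries) {
  const byName = new Map();
  for (const entry of entries) {
    if (!byName.has(entry.name)) byName.set(entry.name, []);
    byName.get(entry.name).push(entry);
  }

  const output = [];
  for (const [name, variants] of byName) {
    // Use the variant with the most parameters for the stub signature.
    const longest = variants.reduce(
      (a, b) => (b.params.length > a.params.length ? b : a)
    );
    const lines = ["/**"];
    for (const v of variants) {
      lines.push(" * " + name + "(" + v.params.join(", ") + "): " + v.doc);
    }
    lines.push(" */");
    lines.push("function " + name + "(" + longest.params.join(", ") + ") {}");
    output.push(lines.join("\n"));
  }
  return output.join("\n\n");
}

const stubSource = buildStubs(docEntries);
console.log(stubSource);
```

The cost of this choice is that the parameter names in the stub fit only one variant, so the per-variant descriptions in the comment have to carry the rest.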

AD3. Run the source code in a real environment and capture its behavior

This sounds like a win, because after all, every JavaScript library is written to be run, so it's only a matter of capturing the result.

There are two ways to capture the JavaScript structures. One way is to use Mozilla Rhino, a JavaScript engine written entirely in Java. Rhino also has an excellent debugging facility that lets you easily inspect the memory state. Being Java-based has another advantage: it's relatively easy to integrate.

Another solution is to use one of the standard JavaScript engines, e.g. IE or Mozilla, which are supported by all libraries. Natively in JavaScript, it's fairly easy to capture the result of the above examples, because all objects in JavaScript are hashmaps, so reading an object's API is as easy as iterating over its members with a for loop. The drawback is that it's impossible to read the API of objects that have not been created yet. For example:

AnEvent = function(name, time, source) {
    this.eventName = name;
    this.time = time;
    this.source = source;
    this.send = function(object) { /* ... */ };
};

So until someone calls new AnEvent(name, time, source), it's impossible to figure out the structure of the AnEvent type. The other problem with this solution is that only trusted code can be run, and any infinite loop in the library would hang the capture.
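Both the capture step and its limitation can be sketched in a few lines. The helper `describeMembers` and the generated `arg0`-style parameter names are assumptions made for this example; the sketch relies only on the for...in loop and on `Function.length`, which reports the number of declared parameters.

```javascript
// Hypothetical helper: lists the enumerable members of an object as stub lines.
function describeMembers(ownerName, object) {
  const lines = [];
  for (const member in object) {
    const value = object[member];
    if (typeof value === "function") {
      // Only the parameter count is recovered here, not the original names.
      const params = [];
      for (let i = 0; i < value.length; i++) params.push("arg" + i);
      lines.push(ownerName + "." + member +
        " = function(" + params.join(", ") + ") {};");
    } else {
      lines.push(ownerName + "." + member + " = undefined; // " + typeof value);
    }
  }
  return lines;
}

// Same shape as the AnEvent example above.
function AnEvent(name, time, source) {
  this.eventName = name;
  this.time = time;
  this.source = source;
  this.send = function(object) {};
}

// Before any instance exists, nothing about the type can be read.
const beforeInstance = describeMembers("AnEvent.prototype", AnEvent.prototype);

// After constructing one (with made-up arguments), its members become visible.
const afterInstance = describeMembers("anEvent", new AnEvent("click", 0, null));

console.log("before:", beforeInstance.length, "members");
console.log(afterInstance.join("\n"));
```

Note that the capture tool has to invent constructor arguments itself, which is another reason this approach only works with code that can safely be executed.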

1 comment:

  1. Jacek,

    Nice description of approaches to support IDE functionality for dynamic languages. It's a difficult problem.

    I worked on "IDL Workbench" which is Eclipse-based and we had to tackle some of these non-trivial issues. We parsed the code in the projects to get as much info as possible. Then as the code ran, we would add additional runtime information to our model.

    I stumbled on your blog, as I've been investigating the JSDT and how feasible it is for doing real js work. Unfortunately, so far the answer to that question is not so good. It seems you've been on the path ahead of me!
