Mon, 26 Mar 2007

SpyderMonkey // twiddle your JavaScript from Python

I should preface this post with the proviso that there is no code online yet. Yes, I know that sucks, but given a choice between (finally) posting about this now or waiting until I have the code online, I figured the former was better.

Here is the abstract about the project I submitted to linux.conf.au last year:

SpyderMonkey : twiddle your JavaScript (or someone else's!) from Python

SpyderMonkey (http://code.rancidbacon.com/spydermonkey/) aims to let you twiddle with your JavaScript (or someone else's!) from Python.

While at least one unmaintained Python wrapper for JavaScript exists (http://wwwsearch.sourceforge.net/python-spidermonkey/), SpyderMonkey differs in its implementation by using ctypes (http://docs.python.org/dev/lib/module-ctypes.html) to wrap the underlying Mozilla SpiderMonkey JavaScript implementation (http://www.mozilla.org/js/spidermonkey/). SpyderMonkey is also the only implementation I am aware of that wraps the parser and not just the interpreter; this is key to its use in static JavaScript code analysis.

The primary motivation for wrapping the JavaScript parser was to enable further development of JavaScript reverse-engineering and code analysis tools.

During my previous efforts to reverse engineer JavaScript "Rich Internet Applications" like GMail, Google Maps and similar products, I developed a number of scripts that applied regular expressions to "pretty-printed" versions of the obfuscated/compressed source code. While simplistic, these scripts were able to generate Class and Function references listing arguments and the locations where they were used (http://libgmail.sourceforge.net/googlemaps/maps.js.html).

Eventually regex-based parsing runs into a wall, and this drives the use of an actual parser. JavaScript has quite a complex grammar, and in my research I was unable to find a functioning pure-Python JavaScript parser.
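To make the regex approach and its limits concrete, here is a minimal sketch. It is not the original scripts: the pattern, the `list_functions` helper and the sample source are all made up for illustration. It lists function names with their arguments and line numbers, and the last result shows the kind of wall regexes hit, since text inside a string literal is reported as if it were a real function.

```python
import re

# Hypothetical pattern: matches "function name(args)" declarations and
# "target = function(args)" assignments on a single line.
FUNC_RE = re.compile(
    r"function\s+(?P<decl>[A-Za-z_$][\w$]*)\s*\((?P<args1>[^)]*)\)"
    r"|(?P<assigned>[A-Za-z_$][\w$.]*)\s*=\s*function\s*\((?P<args2>[^)]*)\)"
)


def list_functions(source):
    """Return (name, [args], line_number) for each function-like match."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for m in FUNC_RE.finditer(line):
            if m.group("decl"):
                name, raw = m.group("decl"), m.group("args1")
            else:
                name, raw = m.group("assigned"), m.group("args2")
            args = [a.strip() for a in raw.split(",") if a.strip()]
            found.append((name, args, lineno))
    return found


# Made-up sample in the style of pretty-printed, obfuscated code.
SAMPLE = """\
function a(b, c) {
  return b + c;
}
Gd.prototype.x = function(e) {
  return "function fake(q) {}";
};
"""

for entry in list_functions(SAMPLE):
    print(entry)
```

The third entry printed is `fake`, which only exists inside a string. Telling code from strings, comments and regex literals requires tracking context, which is exactly what a real parser does.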

This presentation will look at the development of the GPL SpyderMonkey wrapper and some of the issues involved in its construction. In addition we will look at some actual and potential applications that an easier Python-friendly interface allows us to construct in order to assist efforts in areas of debugging, reverse engineering, inter-operability, maintenance and source recovery.

With most RIA JS applications using compression or obfuscation, and environments like Google's Web Toolkit AJAX framework for Java (http://code.google.com/webtoolkit/) (or Python's PyJamas http://pyjamas.pyworks.org/) producing JavaScript without direct human involvement, there is a growing need for tools to analyse this generated code. Attend this session and learn what approaches can work for you now and how SpyderMonkey may help you create new tools for the future. (In preparation for the perhaps inevitable "Ummmm, I thought *you* had the original source code..." realisation.)

So, yeah, anyway, the proposal didn't get accepted and so the code's still sitting on my hard drive... The SpyderMonkey page does have a cool logo on it though. That's got to count for something, surely? :-)

As mentioned in the abstract, I was attempting to wrap the SpiderMonkey parser API, except it didn't (and doesn't) really have an official one, so I had to make it up as I went along. I wrapped enough of the SpiderMonkey library with ctypes that I was able to detect items such as function and variable declarations. This was implemented by processing the tree produced by calling js_ParseTokenStream.
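Since the code isn't online, here is a rough sketch of the general shape of that tree processing, under heavy assumptions. The `Node` structure below is a simplified stand-in and NOT SpiderMonkey's real parse node layout; the kind constants, field names and helper functions are all hypothetical. It does not load the library or call js_ParseTokenStream (whose signature isn't shown in this post); the tree is built by hand in Python so the example runs on its own.

```python
import ctypes

# Hypothetical node kinds; SpiderMonkey's real token types differ.
KIND_BLOCK, KIND_FUNCTION, KIND_VAR = 0, 1, 2


class Node(ctypes.Structure):
    """Simplified stand-in for a C parse node (not the real layout)."""


# Fields are assigned after the class exists so the struct can point to itself.
Node._fields_ = [
    ("kind", ctypes.c_int),
    ("name", ctypes.c_char_p),
    ("first_child", ctypes.POINTER(Node)),
    ("next_sibling", ctypes.POINTER(Node)),
]


def walk(node_ptr, depth=0):
    """Yield (depth, kind, name) for each node, depth-first."""
    while node_ptr:  # a NULL ctypes pointer is falsy
        node = node_ptr.contents
        yield depth, node.kind, node.name
        yield from walk(node.first_child, depth + 1)
        node_ptr = node.next_sibling


def find_declarations(root_ptr):
    """Return (label, name, depth) for function and variable nodes."""
    labels = {KIND_FUNCTION: "function", KIND_VAR: "var"}
    return [
        (labels[kind], name.decode("ascii"), depth)
        for depth, kind, name in walk(root_ptr)
        if kind in labels
    ]


# Hand-built tree standing in for what a wrapped parser might return,
# roughly: function main() { var x; }  var y;
root = Node(KIND_BLOCK, None)
func = Node(KIND_FUNCTION, b"main")
inner = Node(KIND_VAR, b"x")
outer = Node(KIND_VAR, b"y")
root.first_child = ctypes.pointer(func)
func.first_child = ctypes.pointer(inner)
func.next_sibling = ctypes.pointer(outer)

for label, name, depth in find_declarations(ctypes.pointer(root)):
    print("  " * depth + label, name)
```

With a real wrapper, the root pointer would come from the C library rather than from Python, but the walk itself would look much the same: follow child and sibling pointers until they are NULL and pick out the node kinds of interest.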

I would like to Tidy-Up-The-Code-Enough-To-Release (TM)—don't you hate it when people write that—but a release ain't going to happen tonight. Sorry!

Oh, but I will leave you with a link to the post with instructions on how to compile the spidermonkey JavaScript library for Mac OS X.

Posted at: 04:55