Tue, 08 Dec 2009

TweetZombie — eating your brain. one tweet at a time.

TweetZombie is a site that does some very basic vocabulary analysis of an individual's Twitter messages. It will tell you the size of the vocabulary that the person uses and provide a vocabulary rating (v-rating). The exact rating calculation method is of course a closely guarded trade secret. :) (And yes, you can try to game it with antidisestablishmentarianism if you really want to do so. You wouldn't be the first.)

A handy pie chart shows you at a glance how often the person replies or retweets. Last I looked the highest rating was 51,801 and the biggest vocabulary was 1,240 words.
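For a rough idea of what "very basic vocabulary analysis" can look like, here is a minimal sketch. It is not TweetZombie's actual code: the function names, the tokenising rules and the sample tweets are all made up for illustration, and the v-rating calculation is deliberately left out.

```python
import re
from collections import Counter

# Hypothetical tokeniser: runs of letters and apostrophes count as words.
WORD_RE = re.compile(r"[a-z']+")
# Strip a leading "RT" marker, @mentions, #hashtags and URLs before counting.
NOISE_RE = re.compile(r"(^rt\b|@\w+|#\w+|https?://\S+)")


def vocabulary(tweets):
    """Return the set of distinct words used across all tweets."""
    words = set()
    for text in tweets:
        cleaned = NOISE_RE.sub(" ", text.lower())
        words.update(WORD_RE.findall(cleaned))
    return words


def message_kinds(tweets):
    """Count retweets, replies and everything else (the pie chart data)."""
    counts = Counter()
    for text in tweets:
        if text.startswith("RT "):
            counts["retweet"] += 1
        elif text.startswith("@"):
            counts["reply"] += 1
        else:
            counts["original"] += 1
    return counts


sample = [
    "@alice braaains are tasty",
    "RT @bob: brains brains brains",
    "Eating brains, one tweet at a time http://example.com/x",
]
vocab = vocabulary(sample)
kinds = message_kinds(sample)
print("Vocabulary size: %d" % len(vocab))
print("Breakdown: %s" % dict(kinds))
```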

Applying new technologies

Development of TweetZombie was an exercise in integrating and learning more about a number of technologies. It was originally developed using Django, jQuery, the Twitter API (via tweepy) and sqlite but then ported to run on Google App Engine with Google App Engine Helper for Django and a side order of Google AdSense. (What do you mean assimilated? :) )

The porting exercise was interesting because the App Engine Datastore's non-SQL approach to queries forces a change in how one thinks about data retrieval. With no joins or SQL-style aggregate queries to lean on, the main change was pre-calculating more values up front, when the data is written.
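The "pre-calculate up front" idea can be sketched in plain Python. The `ProfileStore` class below is a hypothetical stand-in for a key-value store, not the App Engine API and not the site's real data model; it only shows the shape of the trade: do the counting when a tweet is stored so that a page view is a single lookup.

```python
class ProfileStore:
    """Toy key-value store (hypothetical; not the App Engine Datastore API)."""

    def __init__(self):
        self._rows = {}

    def add_tweet(self, user, text):
        # Update the stored aggregates at write time instead of
        # computing them with a query every time a page is viewed.
        row = self._rows.setdefault(
            user, {"tweets": 0, "words": set(), "vocab_size": 0}
        )
        row["tweets"] += 1
        row["words"].update(text.lower().split())
        row["vocab_size"] = len(row["words"])

    def vocab_size(self, user):
        # Reading is a single lookup of a value that already exists.
        return self._rows[user]["vocab_size"]

    def tweet_count(self, user):
        return self._rows[user]["tweets"]


store = ProfileStore()
store.add_tweet("zombie", "brains brains BRAINS")
store.add_tweet("zombie", "more brains please")
print("Vocabulary size: %d" % store.vocab_size("zombie"))
```

Writes get a little slower and the stored record carries redundant data, but reads no longer depend on how many tweets a person has.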

I also took a brief look at making use of the Python Natural Language Toolkit for more sophisticated vocabulary analysis (e.g. n-grams) but have not integrated it yet.
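For anyone wondering what n-grams are, here is a small plain-Python sketch. It does not use the Natural Language Toolkit; the `ngrams` helper is written out by hand purely for illustration, and the sample text is invented.

```python
from collections import Counter


def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


words = "brains brains more brains brains".split()
bigram_counts = Counter(ngrams(words, 2))

# Show the most frequent word pairs first.
for pair, count in bigram_counts.most_common():
    print("%s: %d" % (" ".join(pair), count))
```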

Related Wiki Updates

During the development process I added a few related pages to my Wiki/Notebook.

Try it yourself

Head to TweetZombie and try it on your own account or on the accounts of your friends and then brag about how superior your intelligence must be. Or something.

Posted at: 20:35 | category: /

Mon, 14 Sep 2009

DigitalNZ HackFest Christchurch 2009

On Saturday I spent a couple of hours at the DigitalNZ HackFest - Christchurch. "DigitalNZ is a publicly funded, not-for-profit initiative that aims to make NZ digital content easier to find, share and use."

The purpose of the HackFest was to introduce developers to the DigitalNZ API and encourage them to experiment with it.

I made some small changes to make the existing DigitalNZ API bindings for Python compatible with Python 2.5 and wrote a command line script to show how to do a basic search. (The patch has already been incorporated into the repository thanks to a quick response from the original author.) As I try to do these days, I made some notes on learning about the DigitalNZ API.

It was good to see a number of people made the effort to come along and contribute code and feedback. I'm pleased DigitalNZ made the decision to send Jo on the road to bring the event to Christchurch and I hope to see more HackFests in the future.

Posted at: 19:01 | category: /

Fri, 27 Jul 2007

pwyky wiki

Like many hackerly-inclined individuals I tend to develop or research a lot more stuff than I release or comment on. The reasons for non-release vary but generally relate to the non-trivial amount of work it takes to put even a half-baked idea on a webpage or blog entry. (A workload increased by the accompanying handy case of perfection-itis.)

The amount of work required to document something depends significantly on the tools at hand, so I've installed a version of the pwyky Python wiki on code.rancidbacon.com. The version I'm running is actually a cosmetically modified version of a modified pwyky version hacked by a guy I ran into during my Google Maps hacking escapades. There are some notes on the modified pwyky version, which also include some Apache configuration suggestions.

I'm treating the site as a "one-way wiki"--it's intended to make it easy for me to update, not to foster community additions. Various obvious reasons apply.

The updated site will include project documentation and general link-storage--I guess a local del.icio.us replacement--in an attempt to reduce my browser-tabbage when I'm exploring half a dozen paths at once. While much of the wiki is intended purely for my use I made it public on the off-chance it ends up being useful for anyone else.

With some use of mod_rewrite I think I even managed to preserve the existing URLs on the site.

"Oh, I'll chuck it in the wiki" will hopefully help with my information processing activities... :-)

Posted at: 02:05 | category: /

Thu, 29 Mar 2007

Flash, ming, Python and ctypes...

You didn't think I was going to leave you without a dope beat to step to... I mean, having to use ming from C, did you? Of course not, I wouldn't be so cruel...

So, ming has a Python binding available as a separate download or from CVS but it's based on SWIG and not exactly actively maintained (for example, it has no SWFVideoStream support). In fact, I did try to bring the SWIG binding up to date but it wasn't even clear what parameters were used to produce it in the first place...

Since any potential speed benefit from having a compiled C extension is probably not a major factor for ming, and because I've had experience with ctypes before, I decided to try using ctypes to create a Python binding for ming.

The cool thing about ctypes is that in the simplest cases once you've compiled the target library in the usual way you're ready to go. In addition, it's relatively straightforward to convert a chunk of C to the equivalent Python—it won't be pretty or Pythonic but will probably be functional.

A basic example

By way of a simple example based on test/Movie/new/test01.c: (Note: Change library name as appropriate.)

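#!/usr/bin/python
#
# Requires `libming.0.4.0.dylib` in the current directory
# or in the search path.
#
from ctypes import cdll, c_char_p, c_int, c_void_p

libming = cdll.LoadLibrary("libming.0.4.0.dylib")

# ctypes assumes every function returns a C int unless told otherwise;
# declare the pointer-typed values so they are not truncated on 64-bit builds.
libming.newSWFMovie.restype = c_void_p
libming.SWFMovie_save.argtypes = [c_void_p, c_char_p]
libming.SWFMovie_save.restype = c_int

if __name__ == "__main__":

    if libming.Ming_init() != 0:
        raise Exception("Ming_init failed.")

    movie = libming.newSWFMovie()

    # A bytes literal keeps the filename a C string on both Python 2.6+ and 3.
    bytesout = libming.SWFMovie_save(movie, b"test01.swf")

    print("Bytes written: %d" % bytesout)

The result will be a valid, but boring .swf file.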

More complicated stuff

Once we get much deeper into ming usage we start needing to do things like casting values and it gets somewhat unpleasant. Fortunately, there's a way of making that easier too—my intention is to return to that topic in a later post.
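To give a flavour of the casting involved, here is a small sketch using only ctypes itself, so it runs without libming installed. It is not necessarily the easier approach hinted at above, and the buffer of integers is invented for illustration: it stands in for data a C library might hand back behind an untyped pointer.

```python
import ctypes

# A buffer of three C ints, standing in for data owned by a C library.
values = (ctypes.c_int * 3)(10, 20, 30)

# C APIs often pass such data around as an untyped `void *`.
opaque = ctypes.cast(values, ctypes.c_void_p)

# Before reading it from Python, cast it back to a typed pointer.
typed = ctypes.cast(opaque, ctypes.POINTER(ctypes.c_int))
recovered = [typed[i] for i in range(3)]

print("Recovered values: %s" % recovered)
```

Note that `values` has to stay alive for as long as `opaque` or `typed` is in use, because the cast pointers do not own the memory they point at.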

Posted at: 06:10 | category: /

Mon, 26 Mar 2007

SpyderMonkey // twiddle your Javascript from Python

I should preface this post with the proviso that there is no code online yet. Yes, I know that sucks, but given a choice between (finally) posting about this now or waiting until I have the code online, I figured the former was better.

Here is the abstract about the project I submitted to linux.conf.au last year:

SpyderMonkey : twiddle your Javascript (or someone else's!) from Python

SpyderMonkey (http://code.rancidbacon.com/spydermonkey/) aims to let you twiddle with your Javascript (or someone else's!) from Python.

While at least one unmaintained Python wrapper for JavaScript exists (http://wwwsearch.sourceforge.net/python-spidermonkey/) SpyderMonkey differs in its implementation by using ctypes (http://docs.python.org/dev/lib/module-ctypes.html) to wrap the underlying Mozilla spidermonkey JavaScript implementation (http://www.mozilla.org/js/spidermonkey/). SpyderMonkey is also the only implementation I am aware of that also wraps the parser and not just the interpreter--this is key to its use in static JavaScript code analysis.

The primary motivation for wrapping the JavaScript parser was to enable further development of JavaScript reverse-engineering and code analysis tools.

During my previous efforts of reverse engineering of JavaScript "Rich Internet Applications" like GMail, Google Maps and similar products I developed a number of scripts based on regular expression parsing "pretty-printed" versions of the obfuscated/compressed source code. While simplistic, these scripts were able to generate Class and Function references listing arguments and the locations where they were used (http://libgmail.sourceforge.net/googlemaps/maps.js.html).

Eventually regex based parsing runs into a wall and this drives the use of an actual parser. JavaScript has a quite complex grammar and in my research I was unable to find a functioning pure-Python JavaScript parser.

This presentation will look at the development of the GPL SpyderMonkey wrapper and some of the issues involved in its construction. In addition we will look at some actual and potential applications that an easier Python-friendly interface allows us to construct in order to assist efforts in areas of debugging, reverse engineering, inter-operability, maintenance and source recovery.

With most RIA JS applications using compression or obfuscation and environments like Google's Web Toolkit AJAX framework for Java (http://code.google.com/webtoolkit/) (or Python's PyJamas http://pyjamas.pyworks.org/) producing JavaScript without direct human involvement there is a growing need for tools to analyse this generated-code. Attend this session and learn what approaches can work for you now and how SpyderMonkey may help you create new tools for the future. (In preparation for the perhaps inevitable "Ummmm, I thought *you* had the original source code..." realisation.)

So, yeah, anyway, the proposal didn't get accepted and so the code's still sitting on my hard drive... The SpyderMonkey page does have a cool logo on it though, and that's got to count for something, surely? :-)

As mentioned in the abstract, I was attempting to wrap the SpiderMonkey parser API, except it didn't (and doesn't) really have an official one, so I had to make it up as I went along. I wrapped enough of the spidermonkey library with ctypes to detect items such as function and variable declarations, by processing the tree produced by calling js_ParseTokenStream.
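Since the real code isn't online, here is a purely illustrative sketch of the tree-processing idea. The node layout (dictionaries with "type", "name", "names" and "children" keys) is invented for this example and is not SpiderMonkey's parse node structure, and nothing here calls into the spidermonkey library.

```python
def find_declarations(node, found=None):
    """Walk a parse tree depth-first, collecting declared names."""
    if found is None:
        found = {"functions": [], "variables": []}

    kind = node.get("type")
    if kind == "function":
        found["functions"].append(node["name"])
    elif kind == "var":
        found["variables"].extend(node["names"])

    # Recurse so declarations nested inside function bodies are found too.
    for child in node.get("children", []):
        find_declarations(child, found)
    return found


# Hypothetical tree for:
#   var map, zoom;
#   function init() { var marker; function onClick() {} }
tree = {
    "type": "program",
    "children": [
        {"type": "var", "names": ["map", "zoom"]},
        {"type": "function", "name": "init", "children": [
            {"type": "var", "names": ["marker"]},
            {"type": "function", "name": "onClick", "children": []},
        ]},
    ],
}

result = find_declarations(tree)
print("Functions: %s" % result["functions"])
print("Variables: %s" % result["variables"])
```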

I would like to Tidy-Up-The-Code-Enough-To-Release (TM)—don't you hate it when people write that—but a release ain't going to happen tonight. Sorry!

Oh, but I will leave you with a link to the post with instructions on how to compile the spidermonkey JavaScript library for Mac OS X.

Posted at: 04:55 | category: /

Sun, 18 Mar 2007

Link dump

Broked Apples

Time to dig out the AppleCare again—the lower RAM-slot issue's back again. (And the power cord... And the accelerometer—well, that's new...)

Oh, and browsers that don't preserve open windows/tabs thru power failures and crashes suck. Happy, happy.

Posted at: 22:55 | category: /