Python IDE Setup

I occasionally get questions about how I do my Python development, and thought I would share my setup.

 

For my main IDE on my Windows box, I use Eclipse with the PyDev plugin installed:

http://marketplace.eclipse.org/content/pydev-python-ide-eclipse

 

Since most of my development is on Linux, I have the Remote System Explorer plugin installed, which lets me create and edit files on our Linux server over SFTP as if they were local to my workstation:

http://help.eclipse.org/galileo/index.jsp?topic=/org.eclipse.rse.doc.user/gettingstarted/g1installing.html

 

Last but not least, I installed Eclipse Color Theme, which is an easy way to install themes for Eclipse:

http://marketplace.eclipse.org/content/eclipse-color-theme

I even went and made a theme (kind of Borland blue, not shown above) that you can download here:

http://www.eclipsecolorthemes.org/?view=theme&id=4514

Speeding Python Up With PyPy

I’ve been working on some code that will use some supplied regular expressions to search through log files (I know, regex isn’t that efficient, yadda, yadda, yadda, but these were the requirements). The issue I was running into was that there was a lot of data. For example, I had 10 regexes that would search 36 gzipped files averaging 1.2 million lines each. The real issue was that these logs came in hourly, so if it couldn’t finish searching them all within an hour it was going to get backed up.
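As a rough illustration of the kind of search described here, a minimal sketch follows. It is not the actual alerter.py: the function names `compile_patterns` and `search_log` are made up for the example, it is written for Python 3, and it assumes the logs are UTF-8 text.

```python
import gzip
import re


def compile_patterns(patterns):
    """Compile each supplied regex once, up front, rather than once per line."""
    return [re.compile(p) for p in patterns]


def search_log(path, compiled):
    """Return (lines_scanned, matching_lines) for one gzipped log file.

    Hypothetical helper: a line counts as a match if any pattern hits it.
    """
    scanned = 0
    matches = []
    # "rt" decompresses and decodes on the fly, so the file is never fully
    # loaded into memory; errors="replace" keeps odd bytes from aborting the scan.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            scanned += 1
            if any(rx.search(line) for rx in compiled):
                matches.append(line.rstrip("\n"))
    return scanned, matches
```

Calling `search_log("known_bad.log.gz", compile_patterns([r"ERROR"]))` would return the line count and the matching lines for that one file. With 10 patterns and 36 files of over a million lines each, the inner `rx.search(line)` call runs hundreds of millions of times, which is why the per-line cost matters so much.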

Being a good Pythonista, I followed the cardinal rules of:
1. Get it right.
2. Test it’s right.
3. Profile if slow.
4. Optimize.
5. Repeat from 2.
http://wiki.python.org/moin/PythonSpeed/PerformanceTips
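For the "profile if slow" step, the standard library's cProfile and pstats modules are one option. The sketch below is only an illustration: `count_matches` is a hypothetical stand-in for whatever function is slow, and `profile_call` is a made-up convenience wrapper, not something from the original code.

```python
import cProfile
import io
import pstats
import re


def count_matches(lines, pattern):
    """Hypothetical stand-in for the slow code path being investigated."""
    rx = re.compile(pattern)
    return sum(1 for line in lines if rx.search(line))


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report).

    The report lists the 10 entries with the highest cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()
```

Printing the report from `profile_call(count_matches, lines, r"ERROR")` shows where the time goes. For a whole script, running `python -m cProfile -s cumulative alerter.py` gives a similar table without any code changes.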

The problem was, after a while I sort of hit a wall. Nothing I did could make this code appreciably faster. Of course, this was with my limited knowledge; I’m sure that more experienced Python programmers could optimize this code a lot better than I can, rewrite the regex bottleneck in C, and so on. But I was at the end of my rope.

One thing led to another and I remembered reading about PyPy. PyPy is an implementation of Python that uses a JIT (just-in-time) compiler and other things that have lost their meaning for me since I did systems programming in college. “What the heck,” I thought, “I’ll give it a try.” PyPy is supposed to be highly compatible with CPython (the regular Python implementation) and my code didn’t use any exotic libraries.

So I dumped the tarball on my Linux machine, unpacked it, ran my unmodified code against it, and DAMN was it fast.

CPython run:

sean@linux1:~/code/python/hourly_alerts$ python alerter.py
Loaded regexes
Processing known_bad.log.gz
Searched 817051 lines in 119.398156166 seconds using filter

PyPy run:

sean@linux1:~/code/python/hourly_alerts$ /home/sean/bin/pypy-1.7/bin/pypy alerter.py
Loaded regexes
Processing known_bad.log.gz
Searched 817051 lines in 51.1275110245 seconds using filter

More than twice as fast! Now I know that it was a totally unscientific test and all, but it’s great to see such an improvement right away.
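To put numbers on that, here is a back-of-the-envelope calculation using the two timings shown above. The batch projection rests on assumptions that the runs above do not confirm: that one hourly batch is the 36 files of roughly 1.2 million lines each, that the files are searched one after another, and that the rate measured on this single test file holds for the rest.

```python
# Figures copied from the two runs shown above.
LINES_SEARCHED = 817051
CPYTHON_SECONDS = 119.398156166
PYPY_SECONDS = 51.1275110245

# Assumption: one hourly batch is 36 files of about 1.2 million lines each,
# searched sequentially at the same rate as the single test file.
BATCH_LINES = 36 * 1200000
BUDGET_SECONDS = 60 * 60


def lines_per_second(lines, seconds):
    """Throughput for one run."""
    return lines / float(seconds)


def batch_seconds(batch_lines, rate):
    """Projected time to search a whole batch at a constant rate."""
    return batch_lines / rate


speedup = CPYTHON_SECONDS / PYPY_SECONDS
cpython_batch = batch_seconds(
    BATCH_LINES, lines_per_second(LINES_SEARCHED, CPYTHON_SECONDS))
pypy_batch = batch_seconds(
    BATCH_LINES, lines_per_second(LINES_SEARCHED, PYPY_SECONDS))

print("Speedup: %.2fx" % speedup)
print("CPython batch estimate: %.0f minutes" % (cpython_batch / 60))
print("PyPy batch estimate: %.0f minutes" % (pypy_batch / 60))
print("Fits in the hour? CPython: %s, PyPy: %s"
      % (cpython_batch <= BUDGET_SECONDS, pypy_batch <= BUDGET_SECONDS))
```

Under those assumptions, the CPython projection runs well past the hour while the PyPy projection fits inside it. Treat this as an estimate only, since a single unscientific run on one file says little about the full workload.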

I may also try Cython, but that looks like it doesn’t have quite the drop-in functionality of PyPy.

Here is the link to PyPy: http://pypy.org/

Links

The yield keyword in Python - A great discussion on iterators and generators

Understanding Python Decorators - I’d never heard of decorators until I saw this link, and now I can think of a bunch of uses for them

In Defense of the Internet Craftsman - “In Gutenberg’s era, the printmaker, not the machine, determined the subject matter of his work. No printing press could impose terms of service that dictated the language or content that could be printed. Instead, the craftsman was in full control of his speech. Yet these restrictions are being hardwired into modern technologies.”

Why Every Programmer Should Have A Tiddlywiki - I’ve used TiddlyWikis before, but Eric shows a great way to start using a TiddlyWiki to keep track of your projects and to-dos.

An Introduction to Asynchronous Programming and Twisted - This is not short; in fact, it is probably book length and I have not read more than the beginning, but Twisted is something that I’ve always been really keen to learn and this looks like it might be a good introduction to it.