Python IDE Setup

I occasionally get questions about how I do my Python development, so I thought I would share my setup.

 

For my main IDE on my Windows box I use Eclipse with the PyDev plugin installed:

http://marketplace.eclipse.org/content/pydev-python-ide-eclipse

 

Since most of my development is on Linux, I have the Remote System Explorer plugin installed, which lets me create and edit files on our Linux server over SFTP as if they were local to my workstation:

http://help.eclipse.org/galileo/index.jsp?topic=/org.eclipse.rse.doc.user/gettingstarted/g1installing.html

 

Last but not least, I installed Eclipse Color Theme, which is an easy way to install themes for Eclipse:

http://marketplace.eclipse.org/content/eclipse-color-theme

I even went and made a theme (kind of Borland blue, not shown above) that you can download here:

http://www.eclipsecolorthemes.org/?view=theme&id=4514

Speeding Python Up With Pypy

I’ve been working on some code that will use some supplied regular expressions to search through log files (I know, regex isn’t that efficient, yadda, yadda, yadda, but these were the requirements). The issue I was running into was that there was a lot of data. For example, I had 10 regexes that would search 36 gzipped files averaging 1.2 million lines each. The real issue was that these logs came in hourly, so if it couldn’t finish searching them all within an hour it was going to get backed up.
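To put rough numbers on that deadline, here is a back-of-the-envelope sketch. It assumes all 36 files arrive within the same hour and that every regex is tried against every line; the figures above don't spell out either point, and the variable names are just illustrative.

```python
# Figures quoted in the post.
FILES_PER_HOUR = 36             # assumption: all 36 files land in one hour
AVG_LINES_PER_FILE = 1_200_000
REGEX_COUNT = 10
SECONDS_PER_HOUR = 3600

# Total lines that have to be read each hour.
lines_per_hour = FILES_PER_HOUR * AVG_LINES_PER_FILE

# Assumption: every regex is run against every line (no early exit).
searches_per_hour = lines_per_hour * REGEX_COUNT

# Sustained rate needed just to keep up with the incoming logs.
required_lines_per_second = lines_per_hour / SECONDS_PER_HOUR

print("Lines per hour:            %d" % lines_per_hour)
print("Regex searches per hour:   %d" % searches_per_hour)
print("Required lines per second: %.0f" % required_lines_per_second)
```

Under those assumptions the script has to sustain about 12,000 lines per second, each checked against all 10 regexes, or it falls behind.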

Being a good Pythonista, I followed the cardinal rules of:
1. Get it right.
2. Test it’s right.
3. Profile if slow.
4. Optimize.
5. Repeat from 2.
http://wiki.python.org/moin/PythonSpeed/PerformanceTips
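
For the "profile if slow" step, a minimal sketch with the standard library's cProfile looks something like this. The search_lines function and the sample data are hypothetical stand-ins, not the actual alerter code, and the block uses Python 3 syntax.

```python
import cProfile
import io
import pstats
import re


def search_lines(lines, pattern):
    """Hypothetical stand-in for the real search loop."""
    search = re.compile(pattern, re.I).search
    return [line for line in lines if search(line)]


def profile_search(lines, pattern, top=10):
    """Run search_lines under cProfile; return (hits, report text)."""
    profiler = cProfile.Profile()
    hits = profiler.runcall(search_lines, lines, pattern)

    # Collect the report in a string instead of printing straight to stdout.
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    # Sort by cumulative time and show only the `top` most expensive entries.
    stats.sort_stats("cumulative").print_stats(top)
    return hits, report.getvalue()


if __name__ == "__main__":
    # Made-up sample data, just to have something to profile.
    sample = ["error: disk full", "all good", "ERROR: timeout"] * 10000
    hits, report = profile_search(sample, r"error")
    print("%d matching lines" % len(hits))
    print(report)
```

The report shows where the time actually goes, which is how you find out that the regex search itself (rather than, say, file reading) is the bottleneck.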

The problem was, after a while I sort of hit a wall. Nothing I did could make this code appreciably faster. Of course, this was with my limited knowledge; I’m sure that more experienced Python programmers could optimize this code a lot better than I can, rewrite the regex bottleneck in C, etc. But I was at the end of my rope.

One thing led to another and I remembered reading about Pypy. Pypy is an implementation of Python that uses a JIT (just-in-time) compiler and other things that have lost their meaning for me since I did systems programming in college. “What the heck,” I thought, “I’ll give it a try.” Pypy is supposed to be highly compatible with CPython (the regular Python implementation) and my code didn’t use any exotic libraries.

So I dumped the tarball on my Linux machine, unpacked it, ran my unmodified code against it, and DAMN was it fast.

CPython run:

sean@linux1:~/code/python/hourly_alerts$ python alerter.py
Loaded regexes
Processing known_bad.log.gz
Searched 817051 lines in 119.398156166 seconds using filter

Pypy run:

sean@linux1:~/code/python/hourly_alerts$ /home/sean/bin/pypy-1.7/bin/pypy alerter.py
Loaded regexes
Processing known_bad.log.gz
Searched 817051 lines in 51.1275110245 seconds using filter

More than twice as fast! Now I know that it was a totally unscientific test and all, but it’s great to see such an improvement right away.
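
Working the two timings above through as a quick sketch (the numbers are copied from the runs, the variable names are just illustrative, and a single run per interpreter is still a sample of one):

```python
# Numbers copied from the two runs shown above.
LINES_SEARCHED = 817051
CPYTHON_SECONDS = 119.398156166
PYPY_SECONDS = 51.1275110245

# How many times faster the Pypy run finished.
speedup = CPYTHON_SECONDS / PYPY_SECONDS

# Throughput of each run in lines per second.
cpython_lines_per_second = LINES_SEARCHED / CPYTHON_SECONDS
pypy_lines_per_second = LINES_SEARCHED / PYPY_SECONDS

print("Speedup:  %.2fx" % speedup)
print("CPython:  %.0f lines/sec" % cpython_lines_per_second)
print("Pypy:     %.0f lines/sec" % pypy_lines_per_second)
```

That works out to a speedup of roughly 2.3x on this one file.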

I may also try Cython, but it looks like it doesn’t have quite the drop-in functionality of Pypy.

Here is the link to Pypy: http://pypy.org/

Python : using filter and regular expressions to search a file

I’ve been working on a script that searches files based on regular expressions and came up with a pretty neat way to do this: using filter! (This may be old hat to more experienced Python programmers, but I am just starting to get comfortable with map/filter/etc.) In this example we compile our regex, read the gzip file into memory, filter its lines, and display each matching line along with the name of the regex it matched.

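# Python 2 syntax (print statement; filter returns a list)
import re
import gzip

searchName = "Example Regex"
# re.I makes the match case-insensitive
compiledRegex = re.compile(r'example', re.I)

# Example input; substitute the path to your own gzipped log
fileName = "known_bad.log.gz"
dataFile = gzip.open(fileName, "rb")
# Read the whole file into memory as a list of lines
fullData = dataFile.readlines()
dataFile.close()
regexName = searchName
regexSearch = compiledRegex.search
# Filter for just the lines matching the regex
hitList = filter(regexSearch, fullData)
# hitList now is just a list of our matches
for item in hitList:
    print "MATCH:%s" % regexName
    print "ENTRY:%s" % item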

Obviously this code needs more to be at all usable, but this is the general idea and a pretty cool use of the filter function.
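
One way this could be fleshed out for Python 3 is sketched below. It swaps filter for a generator so the file is streamed line by line instead of being read into memory first, and it loops over a dict of named patterns. The REGEXES dict, the search_gzip function, and the sample file contents are all made up for illustration, and there is no claim here that it is faster than the filter version.

```python
import gzip
import os
import re
import tempfile

# Hypothetical name-to-pattern mapping, for illustration only.
REGEXES = {
    "Example Regex": re.compile(r"example", re.I),
}


def search_gzip(file_name, regexes):
    """Yield (regex name, line) for every line that matches a regex."""
    # "rt" decodes the file to text so the str patterns above apply;
    # errors="replace" keeps stray bytes in a log from aborting the scan.
    with gzip.open(file_name, "rt", encoding="utf-8", errors="replace") as data_file:
        for line in data_file:
            for name, compiled in regexes.items():
                if compiled.search(line):
                    yield name, line.rstrip("\n")


if __name__ == "__main__":
    # Write a tiny throwaway gzip file so the demo runs on its own.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "sample.log.gz")
        with gzip.open(path, "wt", encoding="utf-8") as sample:
            sample.write("an Example entry\nnothing to see here\n")
        for name, line in search_gzip(path, REGEXES):
            print("MATCH:%s" % name)
            print("ENTRY:%s" % line)
```

A line that matches several regexes is reported once per regex, which mirrors running the filter once per pattern.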

If you know of a better way, let me know in the comments.

References:
Python Regular Expressions (re) : http://docs.python.org/library/re.html
Filter Function : http://docs.python.org/library/functions.html#filter