Speeding Python Up With PyPy

I’ve been working on some code that will use some supplied regular expressions to search through log files (I know, regex isn’t that efficient, yadda, yadda, yadda, but these were the requirements). The issue I was running into was that there was a lot of data. For example, I had 10 regexes that would search 36 gzipped files averaging 1.2 million lines each. The real issue was that these logs came in hourly, so if it couldn’t finish searching them all within an hour it was going to get backed up.
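A minimal sketch of this kind of search, for illustration only. It is not the actual alerter.py: the function names `load_patterns` and `search_log` are hypothetical, and it assumes Python 3 and UTF-8 text logs.

```python
import gzip
import re


def load_patterns(raw_patterns):
    """Compile each supplied pattern once, up front (hypothetical helper)."""
    return [re.compile(pattern) for pattern in raw_patterns]


def search_log(path, patterns):
    """Return (lines_searched, matching_lines) for one gzipped log file."""
    searched = 0
    matches = []
    # "rt" reads the gzip stream as text, so each line is a str, not bytes.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            searched += 1
            # Keep the line if any one of the supplied regexes matches it.
            if any(pattern.search(line) for pattern in patterns):
                matches.append(line.rstrip("\n"))
    return searched, matches
```

Reading the file line by line keeps memory use flat even when a log runs to over a million lines, and compiling the patterns once avoids repeating that work for every line.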

Being a good Pythonista, I followed the cardinal rules:

  1. Get it right.
  2. Test it’s right.
  3. Profile if slow.
  4. Optimize.
  5. Repeat from 2.

(Source: http://wiki.python.org/moin/PythonSpeed/PerformanceTips)
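The "profile if slow" step can be sketched with the standard library's cProfile and pstats modules. The `count_matches` workload and the `profile_call` wrapper below are hypothetical stand-ins, not code from the project described here.

```python
import cProfile
import io
import pstats
import re


def count_matches(lines, patterns):
    """Hypothetical workload: count lines matched by at least one regex."""
    return sum(1 for line in lines if any(p.search(line) for p in patterns))


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    sample = ["error: disk full", "all good", "error: timeout"] * 1000
    hits, report = profile_call(count_matches, sample, [re.compile("error")])
    print(hits)
    print(report)
```

For a whole script, running `python -m cProfile -s cumulative script.py` gives a similar report without changing any code.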

The problem was, after a while I sort of hit a wall. Nothing I did could make this code appreciably faster. (Of course, this was with my limited knowledge. I’m sure that more experienced Python programmers could optimize this code a lot better than I can, rewrite the regex bottleneck in C, etc.) But I was at the end of my rope.

One thing led to another and I remembered reading about PyPy. PyPy is an implementation of Python that uses a JIT (just-in-time) compiler and other things that have lost their meaning for me since I did systems programming in college. “What the heck,” I thought, “I’ll give it a try.” PyPy is supposed to be highly compatible with CPython (the regular Python implementation), and my code didn’t use any exotic libraries.

So I dumped the tarball on my Linux machine, unpacked it, and ran my unmodified code against it, and DAMN was it fast.

CPython run:

sean@linux1:~/code/python/hourly_alerts$ python alerter.py
Loaded regexes
Processing known_bad.log.gz
Searched 817051 lines in 119.398156166 seconds using filter

PyPy run:

sean@linux1:~/code/python/hourly_alerts$ /home/sean/bin/pypy-1.7/bin/pypy alerter.py
Loaded regexes
Processing known_bad.log.gz
Searched 817051 lines in 51.1275110245 seconds using filter

More than twice as fast! Now, I know that it was a totally unscientific test and all, but it’s great to see such an improvement right away.
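The two timings can be turned into a rough speedup and throughput figure. The hourly estimate at the end is only a back-of-the-envelope sketch: it assumes the throughput measured on this one file would carry over to a full batch of 36 files averaging 1.2 million lines each, which this single run does not establish.

```python
# Figures copied from the two runs above.
lines_searched = 817051
cpython_seconds = 119.398156166
pypy_seconds = 51.1275110245

# How many times faster the PyPy run was.
speedup = cpython_seconds / pypy_seconds


def lines_per_second(seconds):
    """Throughput for a run that took `seconds` to search the file."""
    return lines_searched / seconds


# Assumption: one hourly batch is 36 files of about 1.2 million lines each.
hourly_lines = 36 * 1_200_000


def minutes_for_hourly_batch(seconds):
    """Estimated minutes per hourly batch at the measured throughput."""
    return hourly_lines / lines_per_second(seconds) / 60


if __name__ == "__main__":
    print(f"speedup: {speedup:.2f}x")
    print(f"CPython: {minutes_for_hourly_batch(cpython_seconds):.0f} min per batch")
    print(f"PyPy:    {minutes_for_hourly_batch(pypy_seconds):.0f} min per batch")
```

Under that assumption, the CPython rate would take longer than an hour per batch, while the PyPy rate would fit inside the hour.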

I may also try Cython, but it looks like it doesn’t have quite the drop-in functionality of PyPy.

Here is the link to PyPy: http://pypy.org/
