I’ve been working on some code that will use some supplied regular expressions to search through log files (I know, regex isn’t that efficient, yadda, yadda, yadda, but these were the requirements). The issue I was running into was that there was a lot of data. For example, I had 10 regexes that would search 36 gzipped files averaging 1.2 million lines each. The real issue was that these logs came in hourly, so if it couldn’t finish searching them all within an hour it was going to get backed up.
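For the curious, the basic shape of the job looks something like the sketch below. This is not the actual alerter.py, just a minimal illustration: the function names `load_regexes` and `search_log` are made up for this example, and it assumes a modern Python 3 where `gzip.open` can read in text mode.

```python
import gzip
import re


def load_regexes(patterns):
    """Compile each pattern once so the per-line loop only does matching."""
    return [re.compile(pattern) for pattern in patterns]


def search_log(path, regexes):
    """Return (lines_searched, matching_lines) for one gzipped log file."""
    searched = 0
    hits = []
    # "rt" decodes to text; errors="replace" keeps stray bytes from aborting the run.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            searched += 1
            # A line counts as a hit if any one of the supplied regexes matches it.
            if any(regex.search(line) for regex in regexes):
                hits.append(line.rstrip("\n"))
    return searched, hits
```

Streaming the file line by line keeps memory flat even when a log runs past a million lines, which matters more here than anything clever in the matching itself.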
Being a good Pythonista, I followed the cardinal rules of:
1. Get it right.
2. Test it’s right.
3. Profile if slow.
4. Optimize.
5. Repeat from 2.
(Source: http://wiki.python.org/moin/PythonSpeed/PerformanceTips)
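The "profile if slow" step can be done with the standard library's cProfile. Here is a rough sketch of how that might look; `count_matches` is a hypothetical stand-in for the real search loop and `profile_call` is just a helper name I picked, so treat it as an illustration rather than what alerter.py actually does.

```python
import cProfile
import io
import pstats
import re


def count_matches(lines, regexes):
    """Hypothetical hot loop: count lines matched by at least one regex."""
    return sum(1 for line in lines if any(rx.search(line) for rx in regexes))


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample data, only here to give the profiler something to measure.
    sample_lines = ["GET /index.html 200", "POST /login 500", "GET /admin 403"] * 1000
    sample_regexes = [re.compile(r" 500$"), re.compile(r"/admin")]
    matched, report = profile_call(count_matches, sample_lines, sample_regexes)
    print(matched)
    print(report)
```

In a job like mine you would expect the regex `search` calls to dominate the report, which is the point where pure-Python tweaks start running out.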
The problem was, after a while I sort of hit a wall. Nothing I did could make this code appreciably faster. (Of course, this was with my limited knowledge. I’m sure that more experienced Python programmers could optimize this code a lot better than I can, rewrite the regex bottleneck in C, etc.) But I was at the end of my rope.
One thing led to another and I remembered reading about PyPy. PyPy is an implementation of Python that uses a JIT (just-in-time) compiler and other things that have lost their meaning for me since I did systems programming in college. “What the heck,” I thought, “I’ll give it a try.” PyPy is supposed to be highly compatible with CPython (the regular Python implementation), and my code didn’t use any exotic libraries.
So I dumped the tarball on my Linux machine, unpacked it, and ran my unmodified code against it, and DAMN was it fast.
CPython run:
sean@linux1:~/code/python/hourly_alerts$ python alerter.py
Loaded regexes
Processing known_bad.log.gz
Searched 817051 lines in 119.398156166 seconds using filter
PyPy run:
sean@linux1:~/code/python/hourly_alerts$ /home/sean/bin/pypy-1.7/bin/pypy alerter.py
Loaded regexes
Processing known_bad.log.gz
Searched 817051 lines in 51.1275110245 seconds using filter
More than twice as fast! Now, I know that it was a totally unscientific test and all, but it’s great to see such an improvement right away.
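To put numbers on that, here is a quick back-of-the-envelope calculation using the two timings above. The extrapolation to a full hourly batch is an assumption on my part: it takes the "36 files averaging 1.2 million lines" example from the top of the post and pretends every file searches at the same rate as this one test file, which it may well not.

```python
# Figures copied from the two runs above.
LINES_SEARCHED = 817051
CPYTHON_SECONDS = 119.398156166
PYPY_SECONDS = 51.1275110245

# Assumed workload for the extrapolation (the example from the top of the post).
FILES_PER_HOUR = 36
AVG_LINES_PER_FILE = 1_200_000
HOUR_SECONDS = 3600


def lines_per_second(lines, seconds):
    """Throughput of a single run."""
    return lines / seconds


def projected_seconds(rate):
    """Seconds to search one hourly batch at the given lines-per-second rate."""
    return FILES_PER_HOUR * AVG_LINES_PER_FILE / rate


speedup = CPYTHON_SECONDS / PYPY_SECONDS
cpython_rate = lines_per_second(LINES_SEARCHED, CPYTHON_SECONDS)
pypy_rate = lines_per_second(LINES_SEARCHED, PYPY_SECONDS)

if __name__ == "__main__":
    print("speedup: %.2fx" % speedup)
    print("CPython: %.0f lines/s, batch in %.0f s" % (cpython_rate, projected_seconds(cpython_rate)))
    print("PyPy:    %.0f lines/s, batch in %.0f s" % (pypy_rate, projected_seconds(pypy_rate)))
```

The speedup works out to roughly 2.3x. Under those admittedly shaky assumptions, the CPython rate would overrun the hour on a full batch while the PyPy rate would fit inside it, which is exactly the difference I was after.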
I may also try Cython, but it looks like it doesn’t have quite the drop-in functionality of PyPy.
Here is the link to PyPy: http://pypy.org/