Python IDE Setup

I occasionally get questions about how I do my Python development, so I thought I would share my setup.

For my main IDE on my Windows box I use Eclipse with the PyDev plugin installed:

http://marketplace.eclipse.org/content/pydev-python-ide-eclipse

 

Since most of my development is on Linux, I have the Remote System Explorer plugin installed, which lets me create and edit files on our Linux server over SFTP as if they were local to my workstation:

http://help.eclipse.org/galileo/index.jsp?topic=/org.eclipse.rse.doc.user/gettingstarted/g1installing.html

 

Last but not least, I installed Eclipse Color Theme, which is an easy way to install themes for Eclipse:

http://marketplace.eclipse.org/content/eclipse-color-theme

I even went and made a theme (kind of Borland blue, not shown above) that you can download here:

http://www.eclipsecolorthemes.org/?view=theme&id=4514

Speeding Python Up With PyPy

I’ve been working on some code that will use some supplied regular expressions to search through log files (I know, regex isn’t that efficient, yadda, yadda, yadda, but these were the requirements). The issue I was running into was that there was a lot of data. For example, I had 10 regexes that would search 36 gzipped files averaging 1.2 million lines each. The real issue was that these logs came in hourly, so if it couldn’t finish searching them all within an hour it was going to get backed up.

Being a good Pythonista, I followed the cardinal rules of:
1. Get it right.
2. Test it’s right.
3. Profile if slow.
4. Optimize.
5. Repeat from 2.
http://wiki.python.org/moin/PythonSpeed/PerformanceTips
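
For the "profile if slow" step, the standard library's cProfile module is one way to see where the time goes. This is only a minimal sketch: count_hits and profile_call are made-up names for illustration and are not taken from my actual script.

```python
import cProfile
import io
import pstats
import re


def count_hits(lines, pattern):
    """Count the lines that match the given regular expression."""
    search = re.compile(pattern).search
    return sum(1 for line in lines if search(line))


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    # Show the five entries with the highest cumulative time
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()
```

Printing the report text for a call such as profile_call(count_hits, lines, r"error") shows which functions dominate the run time, which is what tells you whether the regex search really is the bottleneck.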

The problem was, after a while I sort of hit a wall. Nothing I did could make this code appreciably faster. Of course, this was with my limited knowledge; I’m sure that more experienced Python programmers could optimize this code a lot better than I can, rewrite the regex bottleneck in C, etc., but I was at the end of my rope.

One thing led to another and I remembered reading about PyPy. PyPy is an implementation of Python that uses a JIT (just-in-time) compiler and other things that have lost their meaning for me since I did systems programming in college. “What the heck,” I thought, “I’ll give it a try.” PyPy is supposed to be highly compatible with CPython (the regular Python implementation) and my code didn’t use any exotic libraries.

So I dumped the tarball on my Linux machine, unpacked it, and ran my unmodified code against it, and DAMN was it fast.

CPython run:

sean@linux1:~/code/python/hourly_alerts$ python alerter.py
Loaded regexes
Processing known_bad.log.gz
Searched 817051 lines in 119.398156166 seconds using filter

PyPy run:

sean@linux1:~/code/python/hourly_alerts$ /home/sean/bin/pypy-1.7/bin/pypy alerter.py
Loaded regexes
Processing known_bad.log.gz
Searched 817051 lines in 51.1275110245 seconds using filter

More than twice as fast (119.4 seconds down to 51.1, roughly 2.3 times)! Now I know that it was a totally unscientific test and all, but it’s great to see such an improvement right away.
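
For context, a timing line like the one in the output above can be produced with something along these lines. This is only a sketch: timed_search and its arguments are hypothetical, the real alerter.py is not shown in this post, and the code assumes a current Python 3 interpreter.

```python
import gzip
import re
import time


def timed_search(path, pattern):
    """Return (lines_searched, hits, elapsed_seconds) for one gzipped log."""
    search = re.compile(pattern, re.I).search
    start = time.perf_counter()
    # Text mode gives str lines, which is what a str pattern expects
    with gzip.open(path, "rt") as log:
        lines = log.readlines()
    hits = list(filter(search, lines))
    elapsed = time.perf_counter() - start
    return len(lines), hits, elapsed
```

Because the same function runs unchanged under different interpreters, printing the line count and the elapsed time for the same file gives a rough side-by-side comparison, with the usual caveat that a single run is not a benchmark.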

I may also try Cython, but it looks like it doesn’t have quite the drop-in functionality of PyPy.

Here is the link to PyPy: http://pypy.org/

Python : using filter and regular expressions to search a file

I’ve been working on a script that searches files based on regular expressions and came up with a pretty neat way to do this: using filter! (This may be old hat to more experienced Python programmers, but I am just starting to get comfortable with map, filter, etc.) In this example we compile our regex, filter the gzip file that we have read into memory, and display the lines that match along with the name of the regex that they matched.

import re
import gzip

searchName = "Example Regex"
compiledRegex = re.compile(r'example', re.I)
fileName = "example.log.gz"  #placeholder path to the gzipped file to search

#open in text mode so each line is a str that the regex can search
dataFile = gzip.open(fileName, "rt")
#read in the file to memory
fullData = dataFile.readlines()
dataFile.close()
regexName = searchName
regexSearch = compiledRegex.search
#Filter for just the lines matching the regex
hitList = list(filter(regexSearch, fullData))
#hitList now is just a list of our matches
for item in hitList:
    print("MATCH:%s" % regexName)
    print("ENTRY:%s" % item)

Obviously this code needs more to be at all usable, but this is the general idea and a pretty cool use of the filter function.
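
One possible variation, sketched here under the assumption that you have several named regexes and would rather not hold the whole file in memory, is to stream the file line by line with a generator. The function name find_matches and the pattern dictionary are invented for this example.

```python
import gzip
import re


def find_matches(path, named_patterns):
    """Yield (regex_name, line) for every line matching a named pattern."""
    compiled = [
        (name, re.compile(pattern, re.I))
        for name, pattern in named_patterns.items()
    ]
    # Iterating over the file object reads one line at a time
    with gzip.open(path, "rt") as log:
        for line in log:
            for name, regex in compiled:
                if regex.search(line):
                    yield name, line.rstrip("\n")
```

The trade-off is that filter over an in-memory list is very compact, while the generator keeps memory use flat on large logs and reports which regex produced each hit.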

If you know of a better way, let me know in the comments.

References:
Python Regular Expressions (re) : http://docs.python.org/library/re.html
Filter Function : http://docs.python.org/library/functions.html#filter

You don’t need regular expressions for most common matching operations in Python

This is something that I find myself explaining a lot to team members who are just picking up Python. Many times, when you see that you need to match part of a string, you hop right into regular expressions. I know that I did when I first started coding in Python. The thing is, though, that the standard methods on strings handle most of the simple text matching that you do on a day-to-day basis.
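
A minimal sketch of the kind of string methods in question (the sample log line and variable names are made up for illustration):

```python
line = "2012-01-15 ERROR disk /dev/sda1 is 95% full"

# Substring test: no regex needed for a literal match
has_error = "ERROR" in line

# Prefix and suffix checks
is_2012 = line.startswith("2012-")
ends_full = line.endswith("full")

# Position of a literal substring (-1 when it is missing)
disk_pos = line.find("disk")

# Case-insensitive literal match
has_error_ci = "error" in line.lower()

# Split on whitespace to pick out the first two fields
date, level = line.split()[:2]
```

Regular expressions start to earn their keep once the thing being matched is a pattern rather than a fixed piece of text.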


Links

The yield keyword in Python - A great discussion on iterators and generators

Understanding Python Decorators - I’d never heard of decorators until I saw this link and now can think of a bunch of uses for them

In Defense of the Internet Craftsman - “In Gutenberg’s era, the printmaker, not the machine, determined the subject matter of his work. No printing press could impose terms of service that dictated the language or content that could be printed. Instead, the craftsman was in full control of his speech. Yet these restrictions are being hardwired into modern technologies.”

Why Every Programmer Should Have A Tiddlywiki - I’ve used TiddlyWikis before, but Eric shows a great way to start using a TiddlyWiki to keep track of your projects and to-dos.

An Introduction to Asynchronous Programming and Twisted - This is not short; in fact it is probably book length and I have not read more than the beginning, but Twisted is something that I’ve always been really keen to learn and this looks like it might be a good introduction to it.

Command Line Processing in Python – Several Options

Processing command line parameters is one of the most common tasks that any programmer will do.  Luckily for us, Python has several options for quickly and easily taking arguments from the command line.  This is by no means an exhaustive or authoritative look at how to do this, just a comparison of several ways that I’ve run across in the past.
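
As one illustration, the standard library's argparse module is among those options. This is a hedged sketch only: the script purpose, the argument names, and the parse_args helper are all hypothetical and chosen just to show the shape of the API.

```python
import argparse


def parse_args(argv=None):
    """Parse a positional log file plus two optional flags."""
    parser = argparse.ArgumentParser(description="Search a log file.")
    parser.add_argument("logfile", help="path to the log file to search")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print every match")
    parser.add_argument("-n", "--max-hits", type=int, default=10,
                        help="stop after this many hits")
    # argv=None makes argparse read sys.argv[1:]
    return parser.parse_args(argv)
```

Passing an explicit list such as ["app.log", "-v", "-n", "3"] is handy for trying the parser out without touching the real command line.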


Why I dropped LibraryThing for Goodreads

I’m kind of a list-o-holic. I love making lists of things. A couple of years ago I started looking for a way to track what I was reading on the web, and the two big choices were LibraryThing.com and Goodreads.com. I signed up for both, but I found that I liked the no-nonsense interface of LibraryThing better (even though it is pretty ugly).

So I dutifully cataloged my books for a couple of years as I read them and was pretty content. That is, until earlier this week, when I tried to add the last book I read (Day by Day Armageddon: Beyond Exile, which, while I love zombie apocalypse stories, was really rather meh) and ran across this fun message:

Free memberships are limited to 200 books. Upgrade to a paid membership and catalog as many books as you like. A yearly membership costs only $10.

Now, I’m sure that this was listed somewhere in the Terms of Service for LibraryThing, but I was more than a little taken by surprise.

I have no problem paying for web services that I use and $10 is really not a lot of money, but for basically keeping a spreadsheet of what I’ve read it seemed a bit excessive. I really didn’t use any of the “social” functions of LibraryThing and didn’t especially like the records for individual books. Also, the 200-book limit seemed pretty low, since I had only added three years’ worth of books that I was actually reading and hadn’t even loaded up books that I meant to read, or owned but hadn’t gotten around to reading, or whatever.

So I started going back over my choices. I knew about Goodreads, but I really didn’t want to lose the 200 books that I had cataloged and tagged in LibraryThing. Luckily there is an export function in LibraryThing and an import-from-file function in Goodreads. I found that exporting and importing as a tab-delimited file (xls) seemed to work better than exporting and importing as a comma-separated values file (csv). I have noticed that some of the tags appear to be messed up, but fixing that should be no big deal.

I also like the fact that Goodreads provides an Android App (Android Market Link)

So long story short, I’ve ported all my books over to Goodreads and that is where you can find me now.

My Goodreads profile http://www.goodreads.com/user/show/1401430-sean-lavelle

Using Python’s subprocess module to run piped commands

Python’s subprocess module is really powerful.  It gives you a way to cleanly integrate shell commands into your scripts while managing input/output in a standard way.  The one place that I’ve found that it can get tricky though is when you need to pipe one command into another.  For example, say you want to send an email using the unix mail command:

$ echo "This is the subject of my email" | mail -s "My Email Title" sean@example.com

(Of course you could do this through Python’s smtplib library, but what we are interested in is the behavior of piping the echo command into the mail command.)

Using the subprocess module, we would write it like this:

import subprocess

emailAddress = 'sean@example.com'
title = 'My Email Title'
subject = 'This is the subject of my email'

p1 = subprocess.Popen(['echo', subject], stdout=subprocess.PIPE) #Set up the echo command and direct the output to a pipe
p2 = subprocess.Popen(['mail', '-s', title, emailAddress], stdin=p1.stdout) #send p1's output to p2
p1.stdout.close() #close our copy of the pipe so echo gets a SIGPIPE if mail exits early
output = p2.communicate()[0] #wait for mail to finish; output is None because p2's stdout is not piped

Although it might seem a bit difficult at first, now that I’ve used this method for a while it makes sense to me. I’ve used it to rewrite shell scripts with six commands piping into each other and the method is the same as above. And once you have these shell commands being run within Python, you can do error handling or more advanced processing, which is difficult (at least for me) within shell.
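
For longer chains, the same pattern can be wrapped in a loop. The following is a rough sketch rather than code from any of my scripts: run_pipeline is a hypothetical helper, and the commands in the usage note are just stand-ins.

```python
import subprocess


def run_pipeline(commands):
    """Run commands as a shell-style pipeline.

    Returns (stdout_of_last_command, list_of_return_codes).
    """
    procs = []
    prev_stdout = None
    for cmd in commands:
        proc = subprocess.Popen(cmd, stdin=prev_stdout,
                                stdout=subprocess.PIPE)
        if prev_stdout is not None:
            # The child holds its own copy; closing ours lets the
            # upstream command get SIGPIPE if this one exits early
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)
    # Read the final output, then collect every exit status
    output = procs[-1].communicate()[0]
    return_codes = [proc.wait() for proc in procs]
    return output, return_codes
```

Calling run_pipeline([["echo", "hello world"], ["tr", "a-z", "A-Z"]]) returns the upper-cased bytes plus one return code per command, so a non-zero code anywhere in the chain can be turned into an exception or a log entry.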

As always the Python Documentation Library is your best friend for a more detailed explanation and other examples: http://docs.python.org/library/subprocess.html

PyODBC and FreeTDS : Unicode ntext problem [Solved]

While working on a script to find blocking processes on a SQL Server 2008 database, I ran across this error when I tried to execute a query:

$ python find_blocking_processes.py
Traceback (most recent call last):
  File "find_blocking_processes.py", line 77, in <module>
    find_blocking_processes(brokerConn)
  File "find_blocking_processes.py", line 67, in find_blocking_processes
    ) x""").fetchall()
pyodbc.ProgrammingError: ('42000', '[42000] [FreeTDS][SQL Server]Unicode data in a Unicode-only collation or ntext data cannot be sent to clients using DB-Library (such as ISQL) or ODBC version 3.7 or earlier. (4004) (SQLExecDirectW)')

After a bit of digging, it appears that you need to tell the ODBC driver which version of the TDS protocol to use when talking to the server. Rectifying this was pretty straightforward:

In the file /etc/freetds.conf I added a line to the server config telling it to use the version 8.0 protocol:

# A typical Microsoft SQL Server 2008 configuration
[DEVDATABASE]
host = 10.10.10.100
port = 1433
tds version = 8.0

Then in the file /etc/odbc.ini I added a line for the version as well:

[DEVDATABASE]
Driver          = /usr/lib64/libtdsodbc.so.0
Server          = 10.10.10.100
Port            = 1433
Trace           = Yes
TraceFile       = /tmp/freetdssql-foobar.log
tds_version     = 8.0

* Note the ‘_’ in the tds_version setting in odbc.ini; freetds.conf spells the same setting with a space (tds version).

After this, the database call ran like a dream.
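
With the DSN defined, connecting from Python looks roughly like this. This is a sketch, not the code from find_blocking_processes.py: the helper names and the credentials are placeholders, and it assumes the DEVDATABASE DSN from the odbc.ini entry above.

```python
def build_connection_string(dsn, user, password):
    """Build an ODBC connection string that refers to a DSN in odbc.ini."""
    return "DSN=%s;UID=%s;PWD=%s" % (dsn, user, password)


def connect(dsn, user, password):
    """Open a connection through the DSN (needs the pyodbc package)."""
    # Imported lazily so the string helper works without pyodbc installed
    import pyodbc

    return pyodbc.connect(build_connection_string(dsn, user, password))
```

Because the connection goes through the DSN, the protocol version comes from the configuration files and nothing in the Python code has to change.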

These were the links that pointed me in the right direction to solve this issue and contain a little more information on the causes of this: