Python : using filter and regular expressions to search a file

I’ve been working on a script that searches files based on regular expressions and came up with a pretty neat way to do this: using filter! (This may be old hat to more experienced Python programmers, but I am just starting to get comfortable with map/filter/etc.) In this example we compile our regex, filter the gzip file that we have read into memory, and display each matching line along with the name of the regex it matched.

import re
import gzip

fileName = "example.log.gz"  # hypothetical path; point this at your own gzip file
searchName = "Example Regex"
compiledRegex = re.compile(r'example', re.I)

# Open in text mode ("rt") so each line is a str the regex can search,
# and read the whole file into memory as a list of lines
with gzip.open(fileName, "rt") as dataFile:
    fullData = dataFile.readlines()
regexName = searchName
regexSearch = compiledRegex.search
# Filter for just the lines matching the regex
hitList = filter(regexSearch, fullData)
# hitList now yields just our matching lines
for item in hitList:
    print("MATCH:%s" % regexName)
    print("ENTRY:%s" % item.rstrip("\n"))
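In Python 3, filter() returns a lazy iterator rather than a list, so a hitList like the one above can only be looped over once. Here is a small self-contained sketch (the sample lines are made up) showing the same filter call, wrapped in list(), next to the equivalent list comprehension:

```python
import re

# Made-up sample lines standing in for the contents of the log file
lines = [
    "INFO an Example entry\n",
    "DEBUG nothing to see here\n",
    "WARN another example line\n",
]

compiledRegex = re.compile(r'example', re.I)

# filter() calls compiledRegex.search on each line and keeps the lines
# where it returns a match object (truthy) rather than None;
# list() forces the lazy iterator so the result can be reused
hits = list(filter(compiledRegex.search, lines))

# The equivalent list comprehension, which some find more readable
hits_comprehension = [line for line in lines if compiledRegex.search(line)]

for item in hits:
    print("ENTRY:%s" % item.rstrip("\n"))
```

Both forms produce the same two lines; which one to use is mostly a matter of taste.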

Obviously this code needs more to be at all usable, but this is the general idea and a pretty cool use of the filter function.
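One way to flesh it out, given that the goal is to report which regex a line matched, is to keep a dictionary of named patterns and stream the gzip file line by line instead of calling readlines(). This is only a sketch under assumptions: the helper name search_gzip and the pattern names are hypothetical, the file is assumed to be UTF-8 text, and it swaps filter() for a plain loop because each line has to be tested against several patterns.

```python
import gzip
import os
import re
import tempfile

def search_gzip(path, patterns):
    """Yield (regex_name, line) for every line in a gzip file that matches
    one of the compiled regexes in `patterns` (a dict of name -> regex).

    Hypothetical helper: the file is read lazily, one line at a time,
    so it never has to fit in memory.
    """
    with gzip.open(path, "rt", encoding="utf-8") as dataFile:
        for line in dataFile:
            for regexName, compiledRegex in patterns.items():
                if compiledRegex.search(line):
                    yield regexName, line.rstrip("\n")

# Hypothetical regex names; add as many as you need
patterns = {
    "Example Regex": re.compile(r"example", re.I),
    "Error Regex": re.compile(r"\berror\b", re.I),
}

if __name__ == "__main__":
    # Write a small throwaway gzip file so the sketch runs on its own;
    # in real use you would pass the path of your own log file instead
    with tempfile.NamedTemporaryFile(suffix=".gz", delete=False) as tmp:
        tmp.write(gzip.compress(b"an Example line\nnothing here\nERROR: disk full\n"))
    try:
        for regexName, item in search_gzip(tmp.name, patterns):
            print("MATCH:%s" % regexName)
            print("ENTRY:%s" % item)
    finally:
        os.remove(tmp.name)
```

Because search_gzip is a generator, matches are printed as they are found, which matters for large log files.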

If you know of a better way, let me know in the comments.

References:
Python Regular Expressions (re) : http://docs.python.org/library/re.html
Filter Function : http://docs.python.org/library/functions.html#filter

You don’t need regular expressions for most common matching operations in Python

This is something that I find myself explaining a lot to team members who are just picking up Python. Many times, when people see that they need to match part of a string, they hop right into regular expressions. I know that I did when I first started coding in Python. The thing is, though, the standard string methods handle most of the simple text matching that you do on a day-to-day basis.
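As a rough illustration (the sample string is made up, and this is only a sample of what str offers), here are a few everyday checks done with plain string methods, each commented with the regex you might otherwise reach for:

```python
line = "ERROR 2011-03-02 disk /dev/sda1 is full"

# Substring test: the `in` operator instead of re.search(r"disk", line)
has_disk = "disk" in line

# Prefix/suffix tests instead of re.match(r"ERROR", line) / re.search(r"full$", line)
is_error = line.startswith("ERROR")
ends_full = line.endswith("full")

# startswith/endswith also accept a tuple of alternatives,
# which covers many simple (ERROR|WARN) style alternations
is_problem = line.startswith(("ERROR", "WARN"))

# Case-insensitive match without re.I: normalize the case first
mentions_sda = "SDA1" in line.upper()

# Position of a substring (-1 if absent), like re.search(...).start()
pos = line.find("/dev/")

print(has_disk, is_error, ends_full, is_problem, mentions_sda, pos)
# → True True True True True 22
```

Regular expressions earn their keep once you need character classes, repetition, or capture groups; for fixed text, these methods are simpler and usually faster.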


Command Line Processing in Python – Several Options

Processing command line parameters is one of the most common tasks that any programmer will do.  Luckily for us, Python has several options for quickly and easily taking arguments from the command line.  This is by no means an exhaustive or authoritative look at how to do this, just a comparison of several ways that I’ve run across in the past.
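For a taste of the range of options, here is a minimal sketch (not taken from the full post; the script's arguments are hypothetical) of the same interface done first with raw sys.argv and then with the standard library's argparse module, which generates --help output and error messages for you:

```python
import argparse
import sys

def parse_with_sys_argv(argv):
    """Bare-bones parsing: positional values only, no validation or help."""
    if len(argv) < 2:
        sys.exit("usage: %s FILE [PATTERN]" % argv[0])
    fileName = argv[1]
    pattern = argv[2] if len(argv) > 2 else "example"
    return fileName, pattern

def parse_with_argparse(argv):
    """argparse builds --help text, defaults and error messages."""
    parser = argparse.ArgumentParser(description="Search a gzip file with a regex.")
    parser.add_argument("fileName", help="gzip file to search")
    parser.add_argument("pattern", nargs="?", default="example",
                        help="regular expression to look for")
    parser.add_argument("-i", "--ignore-case", action="store_true",
                        help="match case-insensitively")
    return parser.parse_args(argv)
```

For example, parse_with_argparse(["app.log.gz", "-i"]) returns a Namespace with fileName='app.log.gz', pattern='example' and ignore_case=True. In a real script you would pass sys.argv to the first function and sys.argv[1:] to the second.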


Using Python’s subprocess module to run piped commands

Python’s subprocess module is really powerful. It gives you a way to cleanly integrate shell commands into your scripts while managing input/output in a standard way. The one place where I’ve found it can get tricky, though, is when you need to pipe one command into another. For example, say you want to send an email using the Unix mail command:

$ echo "This is the body of my email" | mail -s "My Email Title" sean@example.com

(Of course you could do this through Python’s smtplib module, but what we are interested in is the behavior of piping the echo command into the mail command.)

Using the subprocess module, we would write it like this:

import subprocess

emailAddress = 'sean@example.com'
title = 'My Email Title'  # passed to mail -s, so it becomes the Subject: line
body = 'This is the body of my email'  # echoed into mail's stdin as the message text

# Set up the echo command and direct its output to a pipe
p1 = subprocess.Popen(['echo', body], stdout=subprocess.PIPE)
# Send p1's output to p2's stdin, and capture p2's own output
p2 = subprocess.Popen(['mail', '-s', title, emailAddress], stdin=p1.stdout, stdout=subprocess.PIPE)
# Close our copy of p1's stdout so p1 receives SIGPIPE if p2 exits early
p1.stdout.close()
# Wait for p2 to finish and collect what it wrote to stdout
output = p2.communicate()[0]

Although it might seem a bit difficult at first, now that I’ve used this method for a while it makes sense to me. I’ve used it to rewrite shell scripts with six commands piping into each other, and the method is the same as above. And once you have these shell commands running within Python, you can do error handling or more advanced processing, which is difficult (at least for me) in shell.
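Since the method is the same however many commands are chained, it can be wrapped in a small helper. This is a sketch, not a drop-in replacement for any particular script: the name run_pipeline is hypothetical, and it only checks the exit status of the last command.

```python
import subprocess

def run_pipeline(commands):
    """Run commands (a list of argument lists) as cmd1 | cmd2 | ... | cmdN
    and return the last command's stdout as bytes.

    Hypothetical helper generalizing the two-command echo | mail example.
    """
    processes = []
    previous_stdout = None
    for args in commands:
        proc = subprocess.Popen(args, stdin=previous_stdout, stdout=subprocess.PIPE)
        if previous_stdout is not None:
            # Close the parent's copy so the upstream command gets SIGPIPE
            # if this one exits early
            previous_stdout.close()
        previous_stdout = proc.stdout
        processes.append(proc)

    # Read the final output and wait for the last command to exit
    output = processes[-1].communicate()[0]
    # Reap the earlier commands so they don't linger as zombies
    for proc in processes[:-1]:
        proc.wait()
    if processes[-1].returncode != 0:
        raise subprocess.CalledProcessError(processes[-1].returncode, commands[-1])
    return output
```

For instance, run_pipeline([["echo", "hello world"], ["tr", "a-z", "A-Z"]]) returns b"HELLO WORLD\n". If the exit codes of the earlier commands matter, check each proc.returncode after the wait() calls as well.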

As always, the Python documentation is your best friend for a more detailed explanation and other examples: http://docs.python.org/library/subprocess.html

PyODBC and FreeTDS : Unicode ntext problem [Solved]

While working on a script to find blocking processes on a SQL Server 2008 database, I ran across this error when I tried to execute a query:

$ python find_blocking_processes.py
Traceback (most recent call last):
  File "find_blocking_processes.py", line 77, in <module>
    find_blocking_processes(brokerConn)
  File "find_blocking_processes.py", line 67, in find_blocking_processes
    ) x""").fetchall()
pyodbc.ProgrammingError: ('42000', '[42000] [FreeTDS][SQL Server]Unicode data in a Unicode-only collation or ntext data cannot be sent to clients using DB-Library (such as ISQL) or ODBC version 3.7 or earlier. (4004) (SQLExecDirectW)')

After a bit of digging, it appears that you need to tell the FreeTDS ODBC driver which version of the TDS protocol to use when talking to the server. Rectifying this was pretty straightforward:

In /etc/freetds.conf, I added a line to the server’s config section telling it to use version 8.0 of the protocol:

# A typical Microsoft SQL Server 2008 configuration
[DEVDATABASE]
host = 10.10.10.100
port = 1433
tds version = 8.0

Then in the file /etc/odbc.ini I added a line for the version as well:

[DEVDATABASE]
Driver          = /usr/lib64/libtdsodbc.so.0
Server          = 10.10.10.100
Port            = 1433
Trace           = Yes
TraceFile       = /tmp/freetdssql-foobar.log
tds_version     = 8.0

* Note the ‘_’ in the tds_version key in odbc.ini; freetds.conf spells it with a space (tds version).
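Because the fix depends on both files naming the protocol version, with two different spellings of the key, a quick way to double-check them is to read them with Python’s configparser. A sketch, assuming the section name DEVDATABASE from the examples above and that both files parse as plain INI files (FreeTDS and unixODBC files are INI-style, but configparser’s rules are not identical to theirs, so treat a None result as “go look” rather than proof):

```python
import configparser

def tds_versions(freetds_text, odbc_text, section="DEVDATABASE"):
    """Return (freetds_version, odbc_version) for a section, or None where
    the setting or section is missing. Note the different key spellings:
    'tds version' in freetds.conf versus 'tds_version' in odbc.ini."""
    # interpolation=None: don't treat '%' in values specially;
    # strict=False: tolerate duplicate sections/keys in hand-edited files
    freetds = configparser.ConfigParser(interpolation=None, strict=False)
    freetds.read_string(freetds_text)
    odbc = configparser.ConfigParser(interpolation=None, strict=False)
    odbc.read_string(odbc_text)
    return (
        freetds.get(section, "tds version", fallback=None),
        odbc.get(section, "tds_version", fallback=None),
    )
```

Against the real files you would call tds_versions(open("/etc/freetds.conf").read(), open("/etc/odbc.ini").read()) and expect ('8.0', '8.0') after the change above.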

After this, the database call ran like a dream.

These were the links that pointed me in the right direction to solve this issue and contain a little more information on the causes of this: