You don’t need regular expressions for most common matching operations in Python

This is something that I find myself explaining a lot to team members who are just picking up Python. Many times, when you see that you need to match part of a string, you hop right into regular expressions. I know that I did when I first started coding in Python. The thing is, though, the standard methods on strings handle most of the simple text matching that you do on a day-to-day basis.

[ED: Before we get into it, though: regular expressions do have their place and are very powerful. None of this should be taken to mean "Don't use regular expressions EVER!" Use them where they make sense, but as always, try to write idiomatic Python (http://jaynes.colorado.edu/PythonIdioms.html) when possible.]

Let’s look at a couple of examples.

Here we have two strings:

myString = "This is a test string.  We'd like to match it."
myString2 = "We don't want to match this string."
stringList = [myString, myString2]  # Just so we can iterate over them

If we wanted to find the string that starts with “This” using a regular expression, we could do this:

import re

for line in stringList:
    # "^" anchors the pattern to the start of the line
    if re.search(r"^This", line):
        print("SUCCESS with line :" + line)
    else:
        print("FAILURE with line :" + line)

This gives:

SUCCESS with line :This is a test string.  We'd like to match it.
FAILURE with line :We don't want to match this string.

We could do the same thing with the standard string methods in what is, at least to me, a cleaner and more Pythonic way:

for line in stringList:
    # startswith() checks the prefix directly, no pattern needed
    if line.startswith("This"):
        print("SUCCESS with line :" + line)
    else:
        print("FAILURE with line :" + line)

This gives:

SUCCESS with line :This is a test string.  We'd like to match it.
FAILURE with line :We don't want to match this string.
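
The same idea stretches a little further than a single prefix. As a rough sketch (the helper names starts_with_any and ends_with_any are my own, not anything from the standard library), startswith() and endswith() both accept a tuple of candidates, so checking several prefixes or suffixes at once still doesn't need a regex:

```python
# The two sample strings from above, renamed only for this sketch.
my_string = "This is a test string.  We'd like to match it."
my_string2 = "We don't want to match this string."
string_list = [my_string, my_string2]


def starts_with_any(line, prefixes):
    """Return True if line begins with any of the given prefixes."""
    # str.startswith accepts a tuple of candidate prefixes.
    return line.startswith(tuple(prefixes))


def ends_with_any(line, suffixes):
    """Return True if line ends with any of the given suffixes."""
    # str.endswith accepts a tuple in the same way.
    return line.endswith(tuple(suffixes))


# Keep only the lines that start with "This" or "That".
matches = [line for line in string_list if starts_with_any(line, ("This", "That"))]
print(matches)
```

Passing a tuple keeps the check in one call instead of an alternation like "^(This|That)".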

Using regex to see if a certain substring is in the line:

import re

for line in stringList:
    # No anchors, so this matches "test" anywhere in the line
    if re.search("test", line):
        print("SUCCESS with line :" + line)
    else:
        print("FAILURE with line :" + line)

This gives:

SUCCESS with line :This is a test string.  We'd like to match it.
FAILURE with line :We don't want to match this string.

Now using the built-in operations:

for line in stringList:
    # The "in" operator does a plain substring check
    if "test" in line:
        print("SUCCESS with line :" + line)
    else:
        print("FAILURE with line :" + line)

This gives:

SUCCESS with line :This is a test string.  We'd like to match it.
FAILURE with line :We don't want to match this string.
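
Two follow-up questions usually come up at this point: "what if I don't care about case?" and "where in the line did it match?" Here is a minimal sketch of both using only string methods. The contains() helper and its ignore_case parameter are names I made up for illustration.

```python
line = "This is a test string.  We'd like to match it."


def contains(line, needle, ignore_case=False):
    """Return True if needle occurs anywhere in line."""
    if ignore_case:
        # Lowercase both sides for a simple case-insensitive check.
        return needle.lower() in line.lower()
    return needle in line


# str.find returns the index of the first match, or -1 if there is none.
position = line.find("test")
missing = line.find("regex")

print(contains(line, "TEST"))                    # case-sensitive check
print(contains(line, "TEST", ignore_case=True))  # case-insensitive check
print(position, missing)
```

Lowercasing both sides is fine for plain English text like this; for text in other languages, str.casefold() is probably the safer choice.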

Keeping it simple always seems to me to be the best policy. Of course, this doesn’t mean that you should write hugely complex code to avoid using regexes. It just means that sometimes they might not be the right choice. Use the right tool for the job.
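
For contrast, here is the kind of match where I would reach for a regex without hesitation: the text you are looking for is a pattern, not a fixed string. This is only a sketch, and the log lines and the error_code() helper are hypothetical examples of mine.

```python
import re

# Hypothetical log lines, made up for this example.
lines = ["error 404 on /index", "error on /index", "warning 12 on /home"]

# "error" at the start of the line, whitespace, then one or more digits.
ERROR_CODE = re.compile(r"^error\s+(\d+)")


def error_code(line):
    """Return the numeric code following "error", or None if absent."""
    match = ERROR_CODE.search(line)
    return int(match.group(1)) if match else None


codes = [error_code(line) for line in lines]
print(codes)
```

Doing this with split() and isdigit() is possible, but it gets fiddly fast, which is exactly when the regex earns its keep.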
