Python - Get Physical HD Space in Windows

Recently, I needed to find out the free space on the physical disks of some of my Windows servers programmatically. There doesn't seem to be an easy way to do this without installing third-party libraries, and I didn't want to do that since I was going to be deploying this script to a good number of servers.

So I started poking around and found that we can discover the local hard disks via the Windows wmic command and our friend the subprocess library:

import subprocess
import ctypes

#Get the fixed drives
#wmic logicaldisk get name,description
drivelist = subprocess.check_output(['wmic', 'logicaldisk', 'get', 'name,description'])
driveLines = drivelist.split('\n')
for line in driveLines:
    if line.startswith("Local Fixed Disk"):
        elements = line.split()
        driveLetter = elements[-1]
        free_bytes = ctypes.c_ulonglong(0)
        total_bytes = ctypes.c_ulonglong(0)
        ctypes.windll.kernel32.GetDiskFreeSpaceExW(ctypes.c_wchar_p(driveLetter), None, ctypes.pointer(total_bytes), ctypes.pointer(free_bytes))
        print "Drive %s" % driveLetter
        print free_bytes.value
        print total_bytes.value
        print str(int(float(free_bytes.value) / float(total_bytes.value) * 100.00)) + "% Free"

What we do is call the command wmic logicaldisk get name,description with subprocess and then parse the local disks out of the output. Afterwards, we pull the total and free bytes for each drive using ctypes and the Windows API function GetDiskFreeSpaceExW.
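For what it's worth, newer Pythons (3.3+) ship shutil.disk_usage in the standard library, which covers the same ground without wmic or ctypes. A minimal sketch (the "/" path is just an example; on Windows you'd pass a drive root like "C:\\"):

```python
import shutil

# disk_usage returns a named tuple of total, used, and free bytes
usage = shutil.disk_usage("/")
print("Total bytes: %d" % usage.total)
print("Free bytes: %d" % usage.free)
print("%d%% Free" % int(float(usage.free) / usage.total * 100))
```

This wasn't available back on Python 2.7, but if you can count on a modern interpreter it's the simplest route.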

More on wmic http://www.computerhope.com/wmic.htm
The Stack Overflow question on using GetDiskFreeSpaceExW to get free disk space: http://stackoverflow.com/questions/51658/cross-platform-space-remaining-on-volume-using-python

Make os.stat times readable in Python

I find myself needing to get file creation or modification times a lot. This is pretty trivial using Python's built-in os.stat function. The problem is that it returns the file modification time in seconds since the epoch, which is not exactly the friendliest format to read. Here is an example:

>>> import os
>>> file = r"H:\SSDC\common_functions\common_functions.py"
>>> modTime = os.stat(file).st_mtime
>>> print modTime
1334325015.57

Now more power to you if you can look at that and get the date out of there, but I prefer a little more standard notation.

Luckily, this is doable. We need to first convert the epoch seconds that os.stat returned into a struct_time, and then we can print that struct_time in a more readable manner.

Here is an example using the same file above:

>>> import time
>>> modTime2 = time.gmtime(os.stat(file).st_mtime)
>>> print modTime2
time.struct_time(tm_year=2012, tm_mon=4, tm_mday=13, tm_hour=13, tm_min=50, tm_sec=15, tm_wday=4, tm_yday=104, tm_isdst=0)
>>> modTime2_hr = time.strftime("%m/%d/%Y %H:%M:%S", modTime2)
>>> print modTime2_hr
04/13/2012 13:50:15
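The datetime module gets you there in one hop as well: fromtimestamp converts epoch seconds to local time (utcfromtimestamp to UTC), and strftime works the same way on the result. Using the same timestamp as above:

```python
import datetime

# Convert epoch seconds straight to a datetime object and format it
modTime = datetime.datetime.utcfromtimestamp(1334325015.57)
print(modTime.strftime("%m/%d/%Y %H:%M:%S"))  # 04/13/2012 13:50:15
```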

References:
Python Standard Library – os.stat
Python Standard Library – time module

Reading a file N lines at a time with Python (using Generators!)

I recently found myself needing to read in a file 4 lines at a time. The solution to this problem turned out to nicely illustrate a concept that I'd read about but never really understood: generators. Let's take a look at the problem:

Say we have some data in a file like this:

entry1:name
entry1:description
entry1:reference_number
---
entry2:name
entry2:description
entry2:reference_number
---
entry3:name
entry3:description
entry3:reference_number
---

So the easiest way to deal with this would be to read the file in 4 lines at a time and treat each entry as a unique entity with 3 attributes.

Now we could just read until we hit a record separator ('---'), but using generators we can have a much more general-purpose solution.

First we’ll need to define our generator:

def chunks(l, n):
    #Generator to yield n sized chunks from l
    for i in xrange(0, len(l), n):
        yield l[i: i + n]

What this little bit of code does is return a generator object that iterates over our list 'l' (in this case the lines of a file) and yields 'n'-sized chunks from it.

We’d use it to read our example like so:

items = []
fileContents = open(file, 'r').readlines()
for entry in chunks(fileContents, 4):
    items.append((entry[0], entry[1], entry[2]))

What we did here is first read our file into memory as a list of lines. We then used our generator to return the file in 4-line blocks and pushed the relevant information (i.e. everything besides the record separator) into a list called items. Now we have a list where each item is a tuple of name, description and reference_number.
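If the file is too big to slurp into memory with readlines(), a variant of the same idea (just a sketch, using itertools) pulls n lines at a time straight off any iterable, including an open file object:

```python
from itertools import islice

def chunks_iter(iterable, n):
    # Yield successive n-sized chunks from any iterable,
    # without needing the whole thing in memory at once
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            break
        yield chunk

# Works on lists, ranges, and file objects alike
print(list(chunks_iter(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```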

Hopefully this helps clear up generators and provides a concrete usage for them.

Link to the Stack Overflow answer that gave me the generator code: http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python/312464#312464

Checking For a Valid IP in Python

I’ve been working to clean up some utility functions that I use a lot, and one thing I often do is check a string to see if it is a valid IPv4 address. I have a simple function for this which is by no means exhaustive, but it covers my needs most of the time:

def isIP(address):
    #Takes a string and returns True if it matches
    #an IPv4 address, False otherwise
    ip = False
    try:
        if address[0].isdigit():
            octets = address.split('.')
            if len(octets) == 4 and "".join(octets).isdigit():
                #correct format, now check each octet's range
                if all(0 <= int(octet) <= 255 for octet in octets):
                    ip = True
    except IndexError:
        pass
    return ip

As you can see, it just takes a string, checks whether it meets the form X.X.X.X, then checks that the values fall in the legal (0-255) range and returns True or False. As I said, it is by no means a complete check against the IPv4 spec, but it covered my needs.

Just out of curiosity, I also wrote a function to do the same check using the socket library in Python:

import socket

def isIP_v2(address):
    try:
        socket.inet_aton(address)
        ip = True
    except socket.error:
        ip = False

    return ip
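One caveat worth knowing about inet_aton: it also accepts the old shorthand address notations, so some strings that don't look like dotted quads still validate. A quick demonstration (same function as above, condensed):

```python
import socket

def isIP_v2(address):
    try:
        socket.inet_aton(address)
        return True
    except socket.error:
        return False

# inet_aton treats "127.1" as shorthand for 127.0.0.1, so it passes;
# an out-of-range octet still fails
print(isIP_v2("127.1"))      # True
print(isIP_v2("256.1.1.1"))  # False
```

Whether that behavior is a feature or a bug depends on your use case.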

I then wrote up a quick script to check how fast each one was. I assumed mine would be faster since it does a simpler check, but I wanted to see by how much. I ran both against a list of IP addresses and non-IP addresses (41,715 real IPs, 10,905 bad addresses) and was pretty surprised by the results:

PS H:\SSDC\common_functions> C:\Python27\python.exe .\ip_checker.py
Custom Function
Found 41715 Good IPs
Found 10905 Bad IPs
Time: 0.318000078201
Socket Function
Found 41715 Good IPs
Found 10905 Bad IPs
Time: 0.069000005722
PS H:\SSDC\common_functions> C:\Python27\python.exe .\ip_checker.py
Custom Function
Found 41715 Good IPs
Found 10905 Bad IPs
Time: 0.31299996376
Socket Function
Found 41715 Good IPs
Found 10905 Bad IPs
Time: 0.069000005722
PS H:\SSDC\common_functions> C:\Python27\python.exe .\ip_checker.py
Custom Function
Found 41715 Good IPs
Found 10905 Bad IPs
Time: 0.311000108719
Socket Function
Found 41715 Good IPs
Found 10905 Bad IPs
Time: 0.0700001716614

You can see from this (admittedly non-scientific) test that the built-in socket version was more than 4 times faster in each run!

So not only is the built-in socket function much easier to read, it is significantly faster! I guess it goes back to the old adage: use built-in functions whenever possible.
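For timings like these, the standard library's timeit module is a handy way to get repeatable numbers. A sketch (the address list here is a small stand-in for my real test data):

```python
import timeit

# Setup runs once; the statement is timed over many iterations
setup = """
import socket
addresses = ['10.0.0.%d' % i for i in range(100)] + ['not.an.ip'] * 20

def isIP_v2(address):
    try:
        socket.inet_aton(address)
        return True
    except socket.error:
        return False
"""

elapsed = timeit.timeit("for a in addresses: isIP_v2(a)",
                        setup=setup, number=100)
print("Time: %f" % elapsed)
```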

Scripting Splunk Alerts with Python

Occasionally, for one reason or another, you may need to call a script when an alert triggers within Splunk. This is pretty easy with Python as long as you keep a couple of things in mind. The first is that once the script is called by the Splunk alert, all the important data is passed to it via environment variables.

Reading these is fairly straightforward using the os module:

import os

#Read the environment variables that Splunk has passed to us
scriptName = os.environ['SPLUNK_ARG_0']
numberEventsReturned = os.environ['SPLUNK_ARG_1']
searchTerms = os.environ['SPLUNK_ARG_2']
queryString = os.environ['SPLUNK_ARG_3']
searchName = os.environ['SPLUNK_ARG_4']
triggerReason = os.environ['SPLUNK_ARG_5']
browserUrl = os.environ['SPLUNK_ARG_6']
rawEventsFile = os.environ['SPLUNK_ARG_8']

You can find more information on what the environment variables contain here: http://docs.splunk.com/Documentation/Splunk/latest/Admin/Configurescriptedalerts

The other thing to keep in mind is that Splunk does not actually pass the data that generated the alert. That data is stored in a file (a gzipped CSV, at least on Unix) and the location of the file is passed to us instead. So if we need to act on the actual alert data within our script, we will need to read the events out of that file. Luckily, this too is pretty straightforward with the csv and gzip modules in the standard library:

import gzip
import csv

logFile = open('/tmp/splunk_alert_events', 'a')

#We got the file name from the environment vars
eventFile = csv.reader(gzip.open(rawEventsFile, 'rb'))
for line in eventFile:
    logFile.write(','.join(line) + '\n')

logFile.close()

Putting it all together, we get this:

import os
import csv
import gzip

if __name__ == "__main__":

    #Read the environment variables that Splunk has passed to us
    scriptName = os.environ['SPLUNK_ARG_0']
    numberEventsReturned = os.environ['SPLUNK_ARG_1']
    searchTerms = os.environ['SPLUNK_ARG_2']
    queryString = os.environ['SPLUNK_ARG_3']
    searchName = os.environ['SPLUNK_ARG_4']
    triggerReason = os.environ['SPLUNK_ARG_5']
    browserUrl = os.environ['SPLUNK_ARG_6']
    rawEventsFile = os.environ['SPLUNK_ARG_8']

    logFile = open('/tmp/splunk_alert_events', 'a')

    #We got the file name from the environment vars
    eventFile = csv.reader(gzip.open(rawEventsFile, 'rb'))
    for line in eventFile:
        logFile.write(','.join(line) + '\n')

    logFile.close()

Save this script in $SPLUNK_HOME/bin/scripts, set the alert to call it (without the path; all scripts are assumed to run from $SPLUNK_HOME/bin/scripts), and you now have a script that will write the events that caused the alert to a file in /tmp.

Find a String in Windows Files

I’ve recently found myself having to find certain strings in files on Windows. Other than making me long for grep, this has really pointed out how poor the Explorer search implementation on Windows is. The easiest and most useful way I've found to accomplish this is to use findstr from the command line. Findstr is sort of like grep in that it lets you search a file (or, more importantly, recursively search files and directories) for a fixed string or a regular expression.

Here is an example:

c:\Temp>findstr /s /p /C:"<TITLE>" *.htm
readme.htm:<TITLE>Readme for Visual Studio 2010</TITLE>...
Setup\readme.htm:<TITLE>Readme for Visual Studio 2010</TITLE>...
c:\Temp>

The flags are a little confusing, but basically what we are saying is: search all files and subdirectories recursively (/s) for the literal string <TITLE> (/C:"<TITLE>") in any file named [something].htm, and skip files with non-printable characters (/p).

Sort of ugly, but it gets the job done.

You can use regular expressions in your search, but the supported syntax is a subset of what grep can use.
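When findstr's limited regex dialect gets in the way, a rough Python stand-in using os.walk and re gives you grep-like recursive search. A sketch (function name and output format are my own, and error handling is kept minimal):

```python
import os
import re

def find_in_files(rootDir, pattern, extension=".htm"):
    # Recursively search files under rootDir for a regex,
    # collecting matches in a findstr-like file:line format
    regex = re.compile(pattern)
    hits = []
    for dirpath, dirnames, filenames in os.walk(rootDir):
        for name in filenames:
            if not name.endswith(extension):
                continue
            path = os.path.join(dirpath, name)
            try:
                for line in open(path):
                    if regex.search(line):
                        hits.append("%s:%s" % (path, line.rstrip()))
            except (IOError, UnicodeDecodeError):
                continue  # skip unreadable or binary-ish files
    return hits

for hit in find_in_files(".", "<TITLE>"):
    print(hit)
```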

Details and all flags can be found on the MSDN page here: http://msdn.microsoft.com/en-us/library/bb490907.aspx

Query MSSQL Linked Server Credential Information

We recently had a team member leave for a different job, and one of my tasks was to verify that none of our linked servers in Microsoft SQL Server were using his credentials. After a little while of opening each linked server and checking the credentials in its Security tab, I figured there had to be a better way. I googled around a bit, poked through the system tables, and came up with this query:

SELECT
    serv.NAME,
    serv.product,
    serv.provider,
    serv.data_source,
    serv.catalog,
    prin.name,
    ls_logins.uses_self_credential,
    ls_logins.remote_name
FROM
    sys.servers AS serv
    LEFT JOIN sys.linked_logins AS ls_logins
    ON serv.server_id = ls_logins.server_id
    LEFT JOIN sys.server_principals AS prin
    ON ls_logins.local_principal_id = prin.principal_id

Run it on [master] and you get a list of the linked servers along with the login information each one uses.

Getting Started With Python List Comprehensions

I’ve been hearing people talk about how useful list comprehensions in Python are for a while now, but it never really seemed to click until recently. When it did, I found myself using them all the time. Basically, a list comprehension is a way to create a list from another list (or any iterable). It's fast and makes a lot of sense once you see it in action.

One thing I find myself doing a lot is checking whether any member of a list of strings appears within another string. We could easily write this as a loop:

>>> myList = ['a.html', 'b.html', 'c.html']
>>> myString = "www.someserver.com/dir1/b.html"
>>> for item in myList:
    if item in myString:
        print "Found %s in %s" % (item, myString)

Found b.html in www.someserver.com/dir1/b.html
>>>

Not too bad, but we can write it as a list comprehension even more succinctly:

>>> myList = ['a.html', 'b.html', 'c.html']
>>> myString = "www.someserver.com/dir1/b.html"
>>> myHits = [x for x in myList if x in myString]
>>> myHits
['b.html']
>>>

We can even use the whole list comprehension as a test:

>>> myString2 = 'www.someserver.com/dir2/d.html'

>>> if [x for x in myList if x in myString]:
    print "Got a hit on %s" % myString

Got a hit on www.someserver.com/dir1/b.html

>>> if [x for x in myList if x in myString2]:
    print "Got a hit on %s" % myString2
else:
    print "No hits on %s" % myString2

No hits on www.someserver.com/dir2/d.html
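A closely related trick: when you only need the yes/no answer, the built-in any() with a generator expression does the same membership test and stops at the first hit instead of building a list:

```python
myList = ['a.html', 'b.html', 'c.html']
myString = "www.someserver.com/dir1/b.html"

# any() short-circuits: it stops scanning as soon as one item matches
print(any(x in myString for x in myList))                            # True
print(any(x in 'www.someserver.com/dir2/d.html' for x in myList))    # False
```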

I know this seems like a somewhat trivial example, but once you start replacing loops with list comprehensions you see uses for them all over the place. Here are some more good examples of using list comprehensions:

http://docs.python.org/tutorial/datastructures.html#list-comprehensions
http://docs.python.org/howto/functional.html#generator-expressions-and-list-comprehensions
http://www.siafoo.net/article/52 (Section 2 deals with list comprehensions)

Python IDE Setup

I occasionally get some questions about how I do my Python development and thought I would share it.


For my main IDE on my windows box I use Eclipse with the PyDev module installed:

http://marketplace.eclipse.org/content/pydev-python-ide-eclipse


Since most of my development is on Linux, I have the Remote System Explorer plugin installed, which lets me create and edit files on our Linux servers over SFTP as if they were local to my workstation:

http://help.eclipse.org/galileo/index.jsp?topic=/org.eclipse.rse.doc.user/gettingstarted/g1installing.html


Last but not least, I installed Eclipse Color Theme, which is an easy way to install themes for Eclipse:

http://marketplace.eclipse.org/content/eclipse-color-theme

I even went and made a theme (kind of Borland blue) that you can download here:

http://www.eclipsecolorthemes.org/?view=theme&id=4514

Speeding Python Up With PyPy

I’ve been working on some code that uses supplied regular expressions to search through log files (I know, regex isn't that efficient, yadda, yadda, yadda, but those were the requirements). The issue I was running into was that there was a lot of data. For example, I had 10 regexes that would search 36 gzipped files averaging 1.2 million lines each. The real issue was that these logs came in hourly, so if the script couldn't finish searching them all within an hour it was going to get backed up.
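For a concrete picture of the shape of this work, here's a minimal sketch (not the actual alerter code; the function and sample lines are made up) of the compile-once, scan-every-line pattern:

```python
import re

def scan_lines(lines, patterns):
    # Compile the regexes once up front, then test every line
    # against each one; for a gzipped log you would pass
    # gzip.open(path) as `lines`
    compiled = [re.compile(p) for p in patterns]
    hits = []
    for line in lines:
        for regex in compiled:
            if regex.search(line):
                hits.append(line)
                break  # one match per line is enough
    return hits

sample = ["GET /index.html 200", "GET /admin.php 404", "POST /login 200"]
print(scan_lines(sample, [r"admin", r"40\d"]))  # ['GET /admin.php 404']
```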

Being a good Pythonista, I followed the cardinal rules of:
Get it right.
Test it’s right.
Profile if slow.
Optimize.
Repeat from 2.
http://wiki.python.org/moin/PythonSpeed/PerformanceTips

The problem was, after a while I sort of hit a wall. Nothing I did could make this code appreciably faster (with my limited knowledge, of course; I'm sure more experienced Python programmers could optimize this code a lot better than I can, rewrite the regex bottleneck in C, etc.), but I was at the end of my rope.

One thing led to another and I remembered reading about PyPy. PyPy is an implementation of Python using a JIT (Just-In-Time compiler) and other techniques that have lost their meaning for me since I did systems programming in college. "What the heck", I thought, "I'll give it a try". PyPy is supposed to be highly compatible with CPython (the regular Python implementation) and my code didn't use any exotic libraries.

So I dumped the tarball on my Linux machine, unzipped it, ran my unmodified code against it, and DAMN was it fast.

CPython run:

sean@linux1:~/code/python/hourly_alerts$ python alerter.py
Loaded regexes
Processing known_bad.log.gz
Searched 817051 lines in 119.398156166 seconds using filter

PyPy run:

sean@linux1:~/code/python/hourly_alerts$ /home/sean/bin/pypy-1.7/bin/pypy alerter.py
Loaded regexes
Processing known_bad.log.gz
Searched 817051 lines in 51.1275110245 seconds using filter

More than twice as fast! Now I know it was a totally unscientific test and all, but it's great to see such an improvement right away.

I may also try Cython, but it looks like it doesn't have quite the drop-in functionality of PyPy.

Here is the link to PyPy: http://pypy.org/