Python - Get Physical HD Space in Windows

Recently, I needed to programmatically find the free space on the physical disks of some of my Windows servers. There doesn’t seem to be an easy way to do this without installing other libraries, and I didn’t want to do that since I was going to deploy this script to a good number of servers.

So I started poking around and found that we can figure out the local hard disks via the Windows wmic command and our friend the subprocess library:

import subprocess
import ctypes

# Get the fixed drives by running:
# wmic logicaldisk get name,description
drivelist = subprocess.check_output(['wmic', 'logicaldisk', 'get', 'name,description'])
driveLines = drivelist.split('\n')
for line in driveLines:
    if line.startswith("Local Fixed Disk"):
        # The drive letter (e.g. "C:") is the last field on the line
        elements = line.split()
        driveLetter = elements[-1]
        free_bytes = ctypes.c_ulonglong(0)
        total_bytes = ctypes.c_ulonglong(0)
        # Arguments: path, free bytes available to caller (unused), total bytes, total free bytes
        ctypes.windll.kernel32.GetDiskFreeSpaceExW(ctypes.c_wchar_p(driveLetter), None, ctypes.pointer(total_bytes), ctypes.pointer(free_bytes))
        print "Drive %s" % driveLetter
        print free_bytes.value
        print total_bytes.value
        print str(int(float(free_bytes.value) / float(total_bytes.value) * 100.00)) + "% Free"

What we do is call the command wmic logicaldisk get name,description with subprocess and then parse the local fixed disks out of its output. Afterwards, we pull the total and free bytes for each drive using ctypes to call the Windows API function GetDiskFreeSpaceExW.
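
The parsing step can be tried out without a Windows machine. Here is a minimal sketch in Python 3 syntax; the function names parse_fixed_drives and percent_free and the sample text are made up for illustration, and the real wmic output may be laid out differently on a given system:

```python
def parse_fixed_drives(wmic_output):
    """Return the drive letters of lines that describe a local fixed disk."""
    drives = []
    for line in wmic_output.splitlines():
        if line.startswith("Local Fixed Disk"):
            # The drive letter (e.g. "C:") is the last whitespace-separated field
            drives.append(line.split()[-1])
    return drives


def percent_free(free_bytes, total_bytes):
    """Whole-number percentage of the disk that is free (rounded down)."""
    if total_bytes == 0:
        return 0
    return free_bytes * 100 // total_bytes


# Hypothetical sample of what the wmic command might print
SAMPLE_OUTPUT = (
    "Description       Name\r\n"
    "Local Fixed Disk  C:\r\n"
    "CD-ROM Disc       D:\r\n"
    "Local Fixed Disk  E:\r\n"
)

print(parse_fixed_drives(SAMPLE_OUTPUT))
print(percent_free(25, 100))
```

Keeping the parsing in its own function means the ctypes call only has to be made for the drives it returns.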

More on wmic: http://www.computerhope.com/wmic.htm
The Stack Overflow question on using GetDiskFreeSpaceExW to get free disk space: http://stackoverflow.com/questions/51658/cross-platform-space-remaining-on-volume-using-python

Scripting Splunk Alerts with Python

Occasionally, for one reason or another, you may need to call a script when an alert triggers within Splunk. This is pretty easy with Python as long as you keep a couple of things in mind. The first is that once the script is called via the Splunk alert, all the important data is passed to it via environment variables.

Reading these is fairly straightforward using the os module:

import os

#Read the environment variables that Splunk has passed to us
scriptName = os.environ['SPLUNK_ARG_0']
numberEventsReturned = os.environ['SPLUNK_ARG_1']
searchTerms = os.environ['SPLUNK_ARG_2']
queryString = os.environ['SPLUNK_ARG_3']
searchName = os.environ['SPLUNK_ARG_4']
triggerReason = os.environ['SPLUNK_ARG_5']
browserUrl = os.environ['SPLUNK_ARG_6']
rawEventsFile = os.environ['SPLUNK_ARG_8']
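
When the script is run by hand for testing, these variables are not set and os.environ['...'] raises a KeyError. A minimal sketch of a more forgiving reader, in Python 3 syntax; read_splunk_args and ARG_NAMES are hypothetical names, and the empty-string default is just an assumption about what is convenient:

```python
import os

# Keys mirror the variable names used above; SPLUNK_ARG_7 is not read there either
ARG_NAMES = {
    0: "scriptName",
    1: "numberEventsReturned",
    2: "searchTerms",
    3: "queryString",
    4: "searchName",
    5: "triggerReason",
    6: "browserUrl",
    8: "rawEventsFile",
}


def read_splunk_args(environ=None):
    """Collect the SPLUNK_ARG_* values into a dict, defaulting to ''."""
    if environ is None:
        environ = os.environ
    return {
        name: environ.get("SPLUNK_ARG_%d" % index, "")
        for index, name in ARG_NAMES.items()
    }


# Made-up environment for a dry run outside Splunk
fake_env = {"SPLUNK_ARG_0": "alert.py", "SPLUNK_ARG_8": "/tmp/results.csv.gz"}
args = read_splunk_args(fake_env)
print(args["scriptName"])
print(repr(args["searchName"]))
```

Passing a plain dict in place of os.environ makes it possible to exercise the script without triggering a real alert.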

You can find more information on what the environment variables contain here: http://docs.splunk.com/Documentation/Splunk/latest/Admin/Configurescriptedalerts

The other thing to keep in mind is that Splunk does not pass the actual data that generated the alert. That data is stored in a file (a gzipped CSV, at least on Unix), and the location of that file is passed to us. So if we need to act on the alert data within our script, we have to read the events out of the file. Luckily, this is again pretty straightforward with the csv and gzip modules in the standard library:

import gzip
import csv

logFile = open('/tmp/splunk_alert_events', 'a')

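# We got the file name from the environment variables
eventFile = csv.reader(gzip.open(rawEventsFile, 'rb'))
for line in eventFile:
    # csv.reader yields each row as a list of fields, so join it back into one line
    logFile.write(','.join(line) + '\n')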

logFile.close()
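
For anyone on Python 3, where csv.reader needs text rather than bytes, the same read could be sketched as below. The read_events helper, the sample file, and its column names are all made up for illustration; the columns Splunk actually writes may differ:

```python
import csv
import gzip
import os
import tempfile


def read_events(path):
    """Return every row of a gzipped CSV file as a list of field lists."""
    # 'rt' opens the gzip stream in text mode, which csv.reader needs on Python 3
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.reader(handle))


# Build a small, made-up results file so there is something to read back
sample_path = os.path.join(tempfile.mkdtemp(), "sample_events.csv.gz")
with gzip.open(sample_path, "wt", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["_time", "host", "_raw"])
    writer.writerow(["2012-01-01 00:00:00", "web1", "GET /index.html, 200"])

rows = read_events(sample_path)
print(rows[0])
print(rows[1][2])
```

Note that the last field contains a comma and still comes back as one value, which is the reason to let the csv module do the splitting.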

Putting it all together, we get this:

import os
import csv
import gzip

if __name__ == "__main__":

    # Read the environment variables that Splunk has passed to us
    scriptName = os.environ['SPLUNK_ARG_0']
    numberEventsReturned = os.environ['SPLUNK_ARG_1']
    searchTerms = os.environ['SPLUNK_ARG_2']
    queryString = os.environ['SPLUNK_ARG_3']
    searchName = os.environ['SPLUNK_ARG_4']
    triggerReason = os.environ['SPLUNK_ARG_5']
    browserUrl = os.environ['SPLUNK_ARG_6']
    rawEventsFile = os.environ['SPLUNK_ARG_8']

    logFile = open('/tmp/splunk_alert_events', 'a')

    # We got the file name from the environment variables
    eventFile = csv.reader(gzip.open(rawEventsFile, 'rb'))
    for line in eventFile:
        # csv.reader yields each row as a list of fields, so join it back into one line
        logFile.write(','.join(line) + '\n')

    logFile.close()

Save this script in $SPLUNK_HOME/bin/scripts and set the alert to call it (without the path; all scripts are assumed to run from $SPLUNK_HOME/bin/scripts), and you now have a script that will write the events that caused the alert to a file in /tmp.

Getting Started With Python List Comprehensions

I’ve been hearing people talk about how useful list comprehensions in Python are for a while now, but it never really clicked until recently. Once it did, I found myself using them all the time. Basically, a list comprehension is a way to create a list from another list. It’s fast and makes a lot of sense once you see it in action.

One thing I find myself doing a lot is checking if any member of a list of strings is within another string. We could easily write this as a loop:

>>> myList = ['a.html', 'b.html', 'c.html']
>>> myString = "www.someserver.com/dir1/b.html"
>>> for item in myList:
    if item in myString:
        print "Found %s in %s" % (item, myString)

Found b.html in www.someserver.com/dir1/b.html
>>>

Not too bad, but we can write it as a list comprehension even more succinctly:

>>> myList = ['a.html', 'b.html', 'c.html']
>>> myString = "www.someserver.com/dir1/b.html"
>>> myHits = [x for x in myList if x in myString]
>>> myHits
['b.html']
>>>

We can even use the whole list comprehension as a test:

>>> myString2 = 'www.someserver.com/dir2/d.html'

>>> if [x for x in myList if x in myString]:
    print "Got a hit on %s" % myString

Got a hit on www.someserver.com/dir1/b.html

>>> if [x for x in myList if x in myString2]:
    print "Got a hit on %s" % myString2
else:
    print "No hits on %s" % myString2

No hits on www.someserver.com/dir2/d.html
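
When only a yes/no answer is needed, the built-in any() with a generator expression is a close cousin of the comprehension test above and stops at the first match instead of building the whole list. A small sketch in Python 3 syntax (has_hit is just an illustrative name):

```python
my_list = ['a.html', 'b.html', 'c.html']
my_string = "www.someserver.com/dir1/b.html"
my_string2 = "www.someserver.com/dir2/d.html"


def has_hit(candidates, text):
    """True if at least one candidate appears somewhere in text."""
    return any(item in text for item in candidates)


print(has_hit(my_list, my_string))
print(has_hit(my_list, my_string2))
```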

I know this seems like a somewhat trivial example, but once you start replacing loops with list comprehensions you can see the uses all over the place. Here are some more good examples of using list comprehensions:

http://docs.python.org/tutorial/datastructures.html#list-comprehensions
http://docs.python.org/howto/functional.html#generator-expressions-and-list-comprehensions
http://www.siafoo.net/article/52 (Section 2 deals with list comprehensions)

Python IDE Setup

I occasionally get questions about how I do my Python development and thought I would share my setup.

For my main IDE on my Windows box, I use Eclipse with the PyDev module installed:

http://marketplace.eclipse.org/content/pydev-python-ide-eclipse

Since most of my development is on Linux, I have the Remote System Explorer plugin installed, which lets me create and edit files on our Linux server over SFTP as if they were local to my workstation:

http://help.eclipse.org/galileo/index.jsp?topic=/org.eclipse.rse.doc.user/gettingstarted/g1installing.html

Last but not least, I installed Eclipse Color Theme, which is an easy way to install themes for Eclipse:

http://marketplace.eclipse.org/content/eclipse-color-theme

I even went and made a theme (kind of Borland blue, not shown above) that you can download here:

http://www.eclipsecolorthemes.org/?view=theme&id=4514

Speeding Python Up With PyPy

I’ve been working on some code that will use some supplied regular expressions to search through log files (I know, regex isn’t that efficient, yadda, yadda, yadda, but these were the requirements). The issue I was running into was that there was a lot of data. For example, I had 10 regexes that would search 36 gzipped files averaging 1.2 million lines each. The real issue was that these logs came in hourly, so if it couldn’t finish searching them all within an hour it was going to get backed up.

Being a good Pythonista, I followed the cardinal rules of:
1. Get it right.
2. Test it’s right.
3. Profile if slow.
4. Optimize.
5. Repeat from 2.
http://wiki.python.org/moin/PythonSpeed/PerformanceTips
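
The "profile if slow" step could look something like the sketch below, which uses the standard library's cProfile and pstats modules (Python 3 syntax). The search_lines function, the patterns, and the log lines are stand-ins invented for illustration, not the code described in this post:

```python
import cProfile
import io
import pstats
import re


def search_lines(patterns, lines):
    """Return the lines that match at least one of the regex patterns."""
    compiled = [re.compile(p) for p in patterns]
    return [line for line in lines if any(rx.search(line) for rx in compiled)]


# Made-up log lines standing in for a real log file
lines = ["error 42 on disk", "all good", "failed login for root"] * 1000

profiler = cProfile.Profile()
profiler.enable()
hits = search_lines([r"error \d+", r"failed login"], lines)
profiler.disable()

# Collect the ten most expensive entries, sorted by cumulative time
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

The report shows where the time goes call by call, which is what tells you whether the regex matching itself is the bottleneck.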

The problem was, after a while I sort of hit a wall. Nothing I did could make this code appreciably faster. Of course, this was with my limited knowledge; I’m sure that more experienced Python programmers could optimize this code a lot better than I can, rewrite the regex bottleneck in C, etc. But I was at the end of my rope.

One thing led to another and I remembered reading about PyPy. PyPy is an implementation of Python that uses a JIT (just-in-time) compiler and other things that have lost their meaning for me since I did systems programming in college. “What the heck,” I thought, “I’ll give it a try.” PyPy is supposed to be highly compatible with CPython (the regular Python implementation), and my code didn’t use any exotic libraries.

So I dumped the tarball on my Linux machine, unpacked it, and ran my unmodified code against it, and DAMN was it fast.

CPython run:

sean@linux1:~/code/python/hourly_alerts$ python alerter.py
Loaded regexes
Processing known_bad.log.gz
Searched 817051 lines in 119.398156166 seconds using filter

PyPy run:

sean@linux1:~/code/python/hourly_alerts$ /home/sean/bin/pypy-1.7/bin/pypy alerter.py
Loaded regexes
Processing known_bad.log.gz
Searched 817051 lines in 51.1275110245 seconds using filter

More than twice as fast! Now I know that it was a totally unscientific test and all, but it’s great to see such an improvement right away.
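
A timing like the ones above can be produced with nothing more than time.time() around the search. The sketch below is a guess at the general shape, in Python 3 syntax, with made-up patterns and log lines; it is not the alerter.py from this post. Since it sticks to the standard library, the same file should run unchanged under either interpreter:

```python
import re
import time


def count_matching_lines(patterns, lines):
    """Count the lines that match at least one pattern, using filter()."""
    compiled = [re.compile(p) for p in patterns]
    matched = filter(lambda line: any(rx.search(line) for rx in compiled), lines)
    return len(list(matched))


# Hypothetical patterns and log lines
patterns = [r"error \d+", r"failed login"]
lines = ["error 42 on disk", "all good", "failed login for root"] * 1000

start = time.time()
hits = count_matching_lines(patterns, lines)
elapsed = time.time() - start
print("Searched %d lines in %s seconds using filter" % (len(lines), elapsed))
```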

I may also try Cython, but that looks like it doesn’t have quite the drop-in functionality of PyPy.

Here is the link to PyPy: http://pypy.org/

Links

The yield keyword in Python - A great discussion on iterators and generators

Understanding Python Decorators - I’d never heard of decorators until I saw this link, and now I can think of a bunch of uses for them

In Defense of the Internet Craftsman - “In Gutenberg’s era, the printmaker, not the machine, determined the subject matter of his work. No printing press could impose terms of service that dictated the language or content that could be printed. Instead, the craftsman was in full control of his speech. Yet these restrictions are being hardwired into modern technologies.”

Why Every Programmer Should Have A Tiddlywiki - I’ve used TiddlyWikis before, but Eric shows a great way to start using a TiddlyWiki to keep track of your projects and to-dos.

An Introduction to Asynchronous Programming and Twisted - This is not short; in fact, it is probably book length and I have not read more than the beginning, but Twisted is something that I’ve always been really keen to learn, and this looks like it might be a good introduction to it.