Our second iteration – setupapi_parser.v2.py
With a functioning prototype, we now have some cleanup work to do. The first iteration was a proof of concept to illustrate how a setupapi.dev.log file can be parsed for forensic artifacts. With our second revision, we will clean up the code and reorganize it so that it is easier to use in the future. In addition, we will integrate a more robust command-line interface, validate any user-supplied input, improve processing efficiency, and display results more clearly.
On lines 1 through 3, we import the libraries that we will need for these improvements. The argparse library, which we discussed at length in Chapter 2, Python Fundamentals, is used to implement and structure arguments supplied by the user. Next, we import os, which we will use in this script to check that the input file exists before continuing. This prevents us from trying to process a file that does not exist. The os module provides access to common operating system functionality in an operating system agnostic manner. That is to say, operations that are implemented differently on each operating system are exposed through the same functions in a single module. For example, we can use the os module to recursively walk through a directory, create new directories, and change the permissions of a file.
Finally, we import sys, which we will use to exit the script when an error occurs, preventing faulty or improper output. After our imports, we have kept our documentation variables from before, only updating the __version__ variable with a new version number.
```python
001 import argparse
002 import os
003 import sys
004
005 __author__ = 'Preston Miller & Chapin Bryce'
006 __date__ = 20160401
007 __version__ = 0.02
008 __description__ = 'This script reads a Windows 7 Setup API log and prints USB Devices to the user'
```
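The os capabilities mentioned above correspond to os.walk(), os.makedirs(), and os.chmod(). The following minimal Python 3 sketch exercises all three inside a throwaway temporary directory. The directory and file names, and the list_files() helper, are hypothetical and are not part of the parser.

```python
import os
import stat
import tempfile


def list_files(root):
    """Hypothetical helper: recursively collect every file path below root using os.walk()."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    # os.makedirs() creates any missing intermediate directories.
    nested = os.path.join(tmp, 'case_001', 'logs')
    os.makedirs(nested)

    # A placeholder file for os.walk() to discover.
    sample = os.path.join(nested, 'example.log')
    with open(sample, 'w') as handle:
        handle.write('placeholder\n')

    # Owner read/write only; on Windows, os.chmod() honors just the read-only flag.
    os.chmod(sample, stat.S_IRUSR | stat.S_IWUSR)

    relative_paths = [os.path.relpath(path, tmp) for path in list_files(tmp)]

print(relative_paths)
```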
The functions defined in our previous script are still present here. However, they now contain new code that improves error handling and changes the logical flow of the script. Modularized code like this allows for these kinds of modifications without requiring a major overhaul. This segmentation also makes debugging easier when reviewing an error raised within a function:
```python
010 def main(in_file):
...
029 def parseSetupapi(setup_log):
...
052 def printOutput(usb_name, usb_date):
```
The if __name__ == '__main__' statement on line 63 serves the same purpose as in the prior iteration. The additional code beneath it allows the user to provide input. On line 65, we create an ArgumentParser object with a description, script version, and epilog containing author and date information. Together with the argument options, this lets us display information about the script that might be helpful to the user when running the script with the -h switch. See the following code:
```python
063 if __name__ == '__main__':
064
065     parser = argparse.ArgumentParser(description='SetupAPI Parser', version=__version__,
066                                      epilog='Developed by {} on {}'.format(__author__, __date__))
```
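To preview what the -h switch would display, this Python 3 sketch builds a comparable parser and renders its help text with format_help(), which returns the text instead of printing it and exiting. Current Python 3 releases of argparse do not accept a version keyword on ArgumentParser. The conventional replacement, shown here, is an argument with action='version'. The prog value and the --version flag name are our assumptions, not part of the original script.

```python
import argparse

__author__ = 'Preston Miller & Chapin Bryce'
__date__ = 20160401
__version__ = 0.02

# prog is fixed so the help text does not depend on how the script was launched (assumption).
parser = argparse.ArgumentParser(
    prog='setupapi_parser.v2.py',
    description='SetupAPI Parser',
    epilog='Developed by {} on {}'.format(__author__, __date__),
)
# Python 3 replacement for the version= keyword; the flag name is hypothetical.
parser.add_argument('--version', action='version',
                    version='%(prog)s {}'.format(__version__))
parser.add_argument('IN_FILE', help='Windows 7 SetupAPI file')

help_text = parser.format_help()
print(help_text)
```

Formatting __date__ with format() also sidesteps the TypeError that string concatenation with an integer would raise.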
After defining the ArgumentParser object as parser, we add the IN_FILE parameter on line 67 to allow the user to specify which file to use as input. This alone increases the usability of our script by allowing flexibility in the input path, rather than hard coding the path as in the previous iteration. On line 68, we parse the provided arguments and store them in the args variable. Finally, we call the main() function on line 71, passing it a string representing the setupapi.dev.log file location, as follows:
```python
067     parser.add_argument('IN_FILE', help='Windows 7 SetupAPI file')
068     args = parser.parse_args()
069
070     # Run main program
071     main(args.IN_FILE)
```
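Passing an explicit list to parse_args() is a convenient way to check how the positional IN_FILE argument is stored without typing a command. This Python 3 sketch mirrors lines 67 and 68. The sample path is hypothetical. The sketch also shows that omitting a required positional argument makes argparse print a usage error and exit.

```python
import argparse

parser = argparse.ArgumentParser(description='SetupAPI Parser')
parser.add_argument('IN_FILE', help='Windows 7 SetupAPI file')

# Equivalent to running the script with a single path argument (hypothetical path).
args = parser.parse_args([r'C:\evidence\setupapi.dev.log'])
print(args.IN_FILE)

# With no arguments, argparse reports the missing IN_FILE and exits with status 2.
try:
    parser.parse_args([])
except SystemExit as exc:
    missing_status = exc.code
```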
Note the difference in our flow chart: our script is no longer linear. The main() function calls the parseSetupapi() function and receives the data it returns (indicated by the dashed arrow). main() then calls the printOutput() function to print the parsed data to the console.
Improving the main() function
On line 10, we define the main() function, which now accepts a new argument that we call in_file. This argument is a string path to the setupapi.dev.log file to be analyzed.
```python
010 def main(in_file):
011     """
012     Main function to handle operation
013     :param in_file: string path to Windows 7 setupapi.dev.log
014     :return: None
015     """
```
On line 17, we validate the input using the os.path.isfile() function. It returns True only if the supplied path exists and refers to a regular file; it does not confirm that the script has permission to read that file. As an aside, the os.path.isdir() function performs the equivalent check for directories. Both functions accept absolute and relative paths:
```python
017     if os.path.isfile(in_file):
```
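The difference between the two checks is easy to confirm. This Python 3 sketch creates a temporary file and directory (names hypothetical) and tests each one, along with a path that does not exist.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, 'setupapi.dev.log')
    with open(log_path, 'w') as handle:
        handle.write('')
    missing_path = os.path.join(tmp, 'does_not_exist.log')

    # (isfile, isdir) for an existing file, an existing directory, and a missing path.
    results = {
        'file': (os.path.isfile(log_path), os.path.isdir(log_path)),
        'directory': (os.path.isfile(tmp), os.path.isdir(tmp)),
        'missing': (os.path.isfile(missing_path), os.path.isdir(missing_path)),
    }

for label, (is_file, is_dir) in results.items():
    print('{}: isfile={} isdir={}'.format(label, is_file, is_dir))
```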
If the file path is valid, we print the version of the script. This time, we use the format() method to build our desired strings. Let's look at the format specifications used on lines 18 and 20. Each begins with a colon, which introduces the format specification. The equals sign is the fill character, the caret (^) centers the value, and 20 is the minimum field width. Because we supply an empty string as the value, the result is simply a line of 20 equal signs that visually separates the header from the output. Had we supplied a short string such as "USB", it would be centered within the 20 characters and padded with equal signs on both sides. A string longer than the field width, such as "Try not. Do, or do not. There is no try.", would be printed unchanged, because the width is a minimum rather than a maximum. On line 19, the format() method is used to print the script name and version, as follows:
```python
018         print '{:=^20}'.format('')
019         print '{} {}'.format('SetupAPI Parser,', __version__)
020         print '{:=^20} \n'.format('')
```
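The fill, alignment, and width behavior described above can be checked directly. The sample strings are only illustrations.

```python
# '=' is the fill character, '^' centers the value, and 20 is the minimum width.
separator = '{:=^20}'.format('')
short = '{:=^20}'.format('USB')
quote = 'Try not. Do, or do not. There is no try.'
long_value = '{:=^20}'.format(quote)

print(separator)   # twenty equal signs
print(short)       # 'USB' centered, padded with '=' on both sides
print(long_value)  # longer than 20 characters, so printed unchanged
```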
On line 21, we call the parseSetupapi() function and pass it the validated path to the setupapi.dev.log file. This function returns a list of USB entries, with one entry per discovered device. Each entry in device_information consists of two elements: the device name and the associated date value. On line 22, we iterate through this list using a for loop and pass each entry to the printOutput() function on line 23:
```python
021         device_information = parseSetupapi(in_file)
022         for device in device_information:
023             printOutput(device[0], device[1])
```
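Because each entry is a two-element tuple, the loop on lines 22 and 23 could also unpack the tuple directly in the for statement, avoiding the device[0] and device[1] indexing. This is an alternative style, not a change the script makes. The sample data and the format_output() stand-in, which returns text instead of printing it, are hypothetical.

```python
def format_output(usb_name, usb_date):
    """Hypothetical stand-in for printOutput() that returns the text instead of printing it."""
    return 'Device: {}\nFirst Install: {}\n'.format(usb_name, usb_date)


# Hypothetical parser output: (device name, date) tuples.
device_information = [
    ('USB\\VID_0781&PID_5567\\0001', '2016/03/01 10:00:00.000'),
    ('USBSTOR\\Disk&Ven_Example&Prod_Drive\\0002', '2016/03/02 11:30:00.000'),
]

lines = []
for usb_name, usb_date in device_information:  # unpack each tuple in place
    lines.append(format_output(usb_name, usb_date))

print(''.join(lines))
```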
On line 24, an else clause handles the case where the provided path is not a valid file. On line 25, we tell the user that the input is not a file. On line 26, we call sys.exit() to quit the program with an exit status of 1. Any nonzero status signals an error, so you may use another number here. Because we reserve 1 for this case, the exit status tells us where the error was raised:
```python
024     else:
025         print 'Input is not a file.'
026         sys.exit(1)
```
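sys.exit() works by raising SystemExit, so the exit status can be observed without ending the interpreter. In this Python 3 sketch, validate_input() is a hypothetical helper that mirrors lines 17 and 24 through 26. It writes its message to standard error, a common convention that the original script does not follow. The missing file name is assumed not to exist in the working directory.

```python
import os
import sys


def validate_input(path):
    """Hypothetical helper: exit with status 1 if path is not an existing file."""
    if not os.path.isfile(path):
        sys.stderr.write('Input is not a file.\n')
        sys.exit(1)
    return path


try:
    validate_input('no_such_setupapi.dev.log')
except SystemExit as exc:
    # SystemExit carries the status passed to sys.exit().
    exit_status = exc.code

print('Exit status:', exit_status)
```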
Tuning the parseSetupapi() function
The parseSetupapi() function accepts the path of the setupapi.dev.log file as its only input. Before opening the file, we initialize the device_list variable on line 35 as an empty list to store the extracted device records.
```python
029 def parseSetupapi(setup_log):
030     """
031     Read data from provided file for Device Install Events for USB Devices
032     :param setup_log: str - Path to valid setup api log
033     :return: list of tuples - Tuples contain device name and date in that order
034     """
035     device_list = list()
```
Starting on line 36, we open the input file in a new way. The with statement opens the file as in_file and lets us work with its data without having to close the file afterwards. Inside this with block is a for loop that iterates over the file one line at a time, so only the current line needs to be held in memory. In the previous iteration, we used the readlines() method to read the entire file into a list of lines. Though this is not very noticeable on smaller files, calling readlines() on a larger file could cause performance issues on systems with limited resources:
```python
036     with open(setup_log) as in_file:
037         for line in in_file:
```
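The following Python 3 sketch shows the two properties this approach relies on. A file object yields one line per loop iteration, and the with statement closes the file when the block exits. The temporary file and its contents are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, 'sample.log')
    with open(log_path, 'w') as handle:
        handle.write('first line\nsecond line\nthird line\n')

    line_lengths = []
    with open(log_path) as in_file:
        # Each iteration reads a single line; the whole file is never held in a list.
        for line in in_file:
            line_lengths.append(len(line.rstrip('\n')))

    # Once the with block exits, the file has been closed for us.
    closed_after_block = in_file.closed

print(line_lengths, closed_after_block)
```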
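Within the for loop, we use similar logic to determine whether the line contains our device installation indicators. If it does, we extract the device information in the same manner as discussed previously. By storing the lowercased line in the lower_line variable on line 38, we avoid calling the lower() method repeatedly in the conditions that follow. Line 39 is a commented-out, case-sensitive form of the check. Line 40 performs the case-insensitive check and also requires a vendor (ven) or vendor ID (vid) marker on the line: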
```python
038             lower_line = line.lower()
039             # if 'Device Install (Hardware initiated)' in line:
040             if 'device install (hardware initiated)' in lower_line and ('ven' in lower_line or 'vid' in lower_line):
041                 device_name = line.split('-')[1].strip()
```
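To see the condition from line 40 in isolation, here it is wrapped in a hypothetical is_install_header() helper and applied to a few sample lines. The sample lines are modeled on the device install headers described in this chapter and are not taken from a real log.

```python
def is_install_header(line):
    """Hypothetical helper mirroring lines 38 and 40: lowercase once, then test."""
    lower_line = line.lower()
    return ('device install (hardware initiated)' in lower_line
            and ('ven' in lower_line or 'vid' in lower_line))


samples = [
    '>>>  [Device Install (Hardware initiated) - USB\\VID_0781&PID_5567\\0001]',
    '>>>  [Device Install (Hardware initiated) - PCI\\VEN_8086&DEV_1C3A]',
    '>>>  [Device Install (Hardware initiated) - ROOT\\LEGACY_EXAMPLE\\0000]',
    '>>>  Section start 2016/03/01 10:00:00.000',
]

matches = [is_install_header(line) for line in samples]
print(matches)
```

Note that the PCI header passes this test. Only the device-type check on line 43 removes it.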
As noted in the first iteration, a fair number of false positives were displayed in our output. That's because this log contains information relating to many types of hardware devices, including those interfacing with PCI, not just USB devices. To remove the noise, we check what type of device each entry describes.
We can split on the backslash character, seen escaped on line 43, and check whether the first element of the split device_name contains the string usb. As mentioned in Chapter 1, Now For Something Completely Different, we need to escape a single backslash with another backslash so that Python treats it as a literal backslash character. This check matches devices labeled as USB and USBSTOR in the file. Some false positives will still exist, as mice, keyboards, and hubs will likely appear as USB devices; however, we do not want to over-filter and miss relevant artifacts. If the entry does not contain the string "usb", we execute the continue statement, which tells Python to skip the rest of the loop body and move to the next iteration of the for loop:
```python
043                 if 'usb' not in device_name.split('\\')[0].lower():
044                     continue  # Remove most non-USB devices
```
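This sketch applies the logic of lines 41 and 43 to sample headers like those above. It uses a hypothetical extract_usb_device() helper that returns None for entries the script would skip with continue.

```python
def extract_usb_device(line):
    """Hypothetical helper mirroring lines 41 and 43; returns None for non-USB entries."""
    device_name = line.split('-')[1].strip()
    # '\\' is a single literal backslash; element 0 is the enumerator, e.g. USB or PCI.
    if 'usb' not in device_name.split('\\')[0].lower():
        return None
    return device_name


headers = [
    '>>>  [Device Install (Hardware initiated) - USB\\VID_0781&PID_5567\\0001]',
    '>>>  [Device Install (Hardware initiated) - USBSTOR\\Disk&Ven_Example&Prod_Drive\\0002]',
    '>>>  [Device Install (Hardware initiated) - PCI\\VEN_8086&DEV_1C3A]',
]

kept = [extract_usb_device(line) for line in headers]
print(kept)
```

The extracted names keep the closing bracket from the header, exactly as the split on line 41 leaves them.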
To retrieve the date, we need a different way to reach the next line, because we are no longer using the enumerate() function to index into a list of lines. To solve this, we use the built-in next() function on line 46 to advance the file iterator and read the following line, which holds the date. We then process this line in the same fashion as previously discussed.
```python
046                 date = next(in_file).split('start')[1].strip()
```
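The interplay between the for loop and next() can be demonstrated with io.StringIO, which behaves like an open text file. The sample lines are hypothetical. The sketch also shows next()'s optional default argument, which returns a fallback value instead of raising StopIteration when the input is exhausted. That would matter if a device header were the last line of a log, a case the original script does not guard against.

```python
import io

sample_log = io.StringIO(
    '>>>  [Device Install (Hardware initiated) - USB\\VID_0781&PID_5567\\0001]\n'
    '>>>  Section start 2016/03/01 10:00:00.000\n'
    '<<<  Section end 2016/03/01 10:00:05.000\n'
)

dates = []
for line in sample_log:
    if 'device install' in line.lower():
        # next() advances the same iterator the for loop uses, so the loop
        # will not visit the "Section start" line a second time.
        date = next(sample_log).split('start')[1].strip()
        dates.append(date)

# With a default, next() returns '' at end of input instead of raising StopIteration.
trailing = next(sample_log, '')
print(dates, repr(trailing))
```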
With the device name and date processed, we append them to device_list as a tuple in which the device's name is the first value and the date is the second. We need the double parentheses here to ensure that our data is appended properly. The outer set belongs to the call to append(). The inner set builds a tuple, which is appended as a single value. Without the inner parentheses, we would pass two separate arguments to append(), which accepts only one, and Python would raise a TypeError. Once all lines have been processed by the for loop, the with block ends and the file is closed. On line 49, device_list is returned and the function exits.
```python
047                 device_list.append((device_name, date))
048
049     return device_list
```
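The difference the inner parentheses make can be confirmed directly. The values are placeholders.

```python
device_list = []
# Inner parentheses build one tuple, so append() receives a single argument.
device_list.append(('USB\\VID_0781&PID_5567\\0001', '2016/03/01 10:00:00.000'))

# Without the inner parentheses, append() receives two arguments and fails.
try:
    device_list.append('USB\\VID_0781&PID_5567\\0001', '2016/03/01 10:00:00.000')
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

print(device_list, error_name)
```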
Modifying the printOutput() function
This function is identical to the previous iteration, except for the addition of the newline character \n on line 60, which helps separate entries in the console output. As we iterate on our code, we will find that not all functions need updating to improve the user experience, accuracy, or efficiency of the code. Only modify an existing function if some benefit will be achieved:
```python
052 def printOutput(usb_name, usb_date):
053     """
054     Print formatted information about USB Device
055     :param usb_name: str - name of the USB device
056     :param usb_date: str - date of the device's first install
057     :return: None
058     """
059     print 'Device: {}'.format(usb_name)
060     print 'First Install: {}\n'.format(usb_date)
```
Running the script
In this iteration, we address several issues from the proof of concept. These changes include the following:
- Improved resource management by iterating through the file rather than reading the entire file into a variable
- Allowing the user to specify an input file at the command line
- Validating the input file supplied by the user
- Filtering responsive hits to reduce noise in the output
With the additional formatting changes, the entries are now spaced apart for easier review, and fewer non-USB devices appear. This iteration also allows users to run the program against multiple setupapi logs in different locations. The following screenshot shows the output of our script:
Last but not least, we achieved considerable performance improvements over our previous design. The following two screenshots display the impact on the machine's memory utilization, with the first iteration on the left and the second on the right. The red lines highlight the start and finish times of our script. As we can see, iterating over the lines of the file with a for loop, rather than calling the readlines() method, reduced our resource utilization. This is a small-scale example of resource management; with a larger input file, the difference would be more pronounced.
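Because the input path is now an argument, the same parsing logic can be pointed at several logs in turn. The following is a Python 3 sketch of parseSetupapi(), renamed parse_setupapi() here following PEP 8 naming. It runs against two temporary sample logs whose contents are made up for the example. It is not the book's script; in practice you would pass the paths of logs recovered from evidence.

```python
import os
import tempfile


def parse_setupapi(setup_log):
    """Python 3 sketch of parseSetupapi(): return (device name, date) tuples."""
    device_list = []
    with open(setup_log) as in_file:
        for line in in_file:
            lower_line = line.lower()
            if ('device install (hardware initiated)' in lower_line
                    and ('ven' in lower_line or 'vid' in lower_line)):
                device_name = line.split('-')[1].strip()
                if 'usb' not in device_name.split('\\')[0].lower():
                    continue  # Remove most non-USB devices
                date = next(in_file).split('start')[1].strip()
                device_list.append((device_name, date))
    return device_list


# Hypothetical log contents for two machines.
SAMPLE_LOGS = {
    'host_a.log': (
        '>>>  [Device Install (Hardware initiated) - USB\\VID_0781&PID_5567\\0001]\n'
        '>>>  Section start 2016/03/01 10:00:00.000\n'
        '>>>  [Device Install (Hardware initiated) - PCI\\VEN_8086&DEV_1C3A]\n'
        '>>>  Section start 2016/03/01 10:05:00.000\n'
    ),
    'host_b.log': (
        '>>>  [Device Install (Hardware initiated) - USBSTOR\\Disk&Ven_Example&Prod_Drive\\0002]\n'
        '>>>  Section start 2016/03/02 11:30:00.000\n'
    ),
}

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, contents in SAMPLE_LOGS.items():
        path = os.path.join(tmp, name)
        with open(path, 'w') as handle:
            handle.write(contents)
        results[name] = parse_setupapi(path)

for name in sorted(results):
    for device_name, date in results[name]:
        print('{}: {} first installed {}'.format(name, device_name, date))
```

The PCI entry in host_a.log is dropped by the device-type check, so each sample log yields one USB device.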