Learning Python for Forensics

Our second iteration – setupapi_parser.v2.py

With a functioning prototype, we now have some cleanup work to do. The first iteration was a proof of concept to illustrate how a setupapi.dev.log file can be parsed for forensic artifacts. In this second revision, we will clean up and rearrange the code so that it is easier to use and extend in the future. In addition, we will integrate a more robust command-line interface, validate user-supplied input, improve processing efficiency, and display the results more clearly.

On lines 1 through 3, we import the libraries that we will need for these improvements. The argparse library, which we discussed at length in Chapter 2, Python Fundamentals, is used to define and parse arguments supplied by the user. Next, we import os, which we will use in this script to check that the input file exists before continuing. This prevents us from trying to process a file that does not exist. The os module provides access to common operating system functionality in an operating system-agnostic manner. That is to say, operations that are handled differently on different operating systems are exposed through the same functions in a single module. For example, we can use the os module to recursively walk through a directory, create new directories, and change the permissions of an object.
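As a brief illustration of the os functionality just mentioned, the following sketch creates a directory, walks it recursively, and changes a file's permissions. It is not part of setupapi_parser.v2.py. The directory and file names (evidence, logs, notes.txt) are hypothetical, and the sketch works inside a temporary directory so that nothing else on the system is touched.

```python
import os
import stat
import tempfile

# Work inside a throwaway directory; all names below are made up for illustration.
base = tempfile.mkdtemp()

# Create a nested directory (os.makedirs also creates intermediate directories).
nested = os.path.join(base, 'evidence', 'logs')
os.makedirs(nested)

# Create an empty file so the walk has something to find.
sample = os.path.join(nested, 'notes.txt')
open(sample, 'w').close()

# Recursively walk the tree and collect every file path encountered.
found = []
for root, dirs, files in os.walk(base):
    for name in files:
        found.append(os.path.join(root, name))

# Change permissions: make the file read-only for its owner.
os.chmod(sample, stat.S_IREAD)
is_read_only = not (os.stat(sample).st_mode & stat.S_IWRITE)

print(found)
print(is_read_only)
```

The same calls behave consistently on Windows, Linux, and macOS, which is the portability benefit described above.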

Finally, we import sys, which we will use to exit the script in case an error occurs to prevent faulty or improper output. After our imports, we have kept our documentation variables from before, only modifying the __version__ variable with an updated version number.

001 import argparse
002 import os
003 import sys
004 
005 __author__ = 'Preston Miller & Chapin Bryce'
006 __date__ = 20160401
007 __version__ = 0.02
008 __description__ = 'This script reads a Windows 7 Setup API log and prints USB Devices to the user'

The functions defined in our previous script are still present here. However, they now contain new code that improves input handling and follows a different logical flow. Modularized code like this allows for these kinds of modifications without requiring a major overhaul. This segmentation also makes debugging easier, because an error raised within a function can be reviewed in isolation:

010 def main()
...
029 def parseSetupapi()
...
052 def printOutput()

The if statement serves the same purpose as in the prior iteration. The additional code shown below allows the user to provide input. On line 65, we create an ArgumentParser object with a description, the script version, and an epilog containing author and date information. This, in conjunction with the argument options, allows us to display information about the script that might be helpful to the user when running the script with the -h switch. See the following code:

063 if __name__ == '__main__':
064     
065     parser = argparse.ArgumentParser(description='SetupAPI Parser', version=__version__,
066                                      epilog='Developed by ' + __author__ + ' on ' + str(__date__))

After defining the ArgumentParser object as parser, we add the IN_FILE parameter on line 67 to allow the user to specify which file to use for input. Already, this increases the usability of our script by adding flexibility in the input path, rather than hard coding the path, as in the previous iteration. On line 68, we parse any provided arguments and store them in the args variable. Finally, we call the main() function on line 71, passing a string representing the setupapi.dev.log file location to the function, as follows:

067     parser.add_argument('IN_FILE', help='Windows 7 SetupAPI file')
068     args = parser.parse_args()
069 
070     # Run main program
071     main(args.IN_FILE)
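The argument handling described above can be tried without a real log file by passing a list of strings to parse_args(). The sketch below is a variant under stated assumptions rather than the script's own code: it registers the version through a --version argument with action='version', because the version keyword of ArgumentParser used on line 65 is deprecated in Python 2.7 and is not accepted by Python 3. The path example.log is hypothetical.

```python
import argparse

__author__ = 'Preston Miller & Chapin Bryce'
__date__ = 20160401
__version__ = 0.02

parser = argparse.ArgumentParser(
    description='SetupAPI Parser',
    # __date__ is an integer, so convert it before concatenating strings.
    epilog='Developed by ' + __author__ + ' on ' + str(__date__))

# Positional argument: the user must supply a path to the log file.
parser.add_argument('IN_FILE', help='Windows 7 SetupAPI file')

# Assumed alternative to ArgumentParser(version=...): an explicit version switch.
parser.add_argument('--version', action='version',
                    version='%(prog)s ' + str(__version__))

# Passing a list stands in for sys.argv[1:]; 'example.log' is a made-up path.
args = parser.parse_args(['example.log'])
print(args.IN_FILE)
```

Calling parser.format_help() returns the same text that the -h switch prints, which is a convenient way to check the description and epilog while developing.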

Note the difference in our flow chart. Our script is no longer linear. The main() function calls the parseSetupapi() function and accepts the data it returns (indicated by the dashed arrow). The printOutput() function is then called to print the parsed data to the console.

[Figure: flow chart for setupapi_parser.v2.py]

Improving the main() function

On line 10, we define the main() function that now accepts a new argument we will call in_file. This argument, as defined by the docstring, is a string path to the setupapi.dev.log file to be analyzed.

010 def main(in_file):
011     """
012     Main function to handle operation
013     :param in_file: string path to Windows 7 setupapi.dev.log
014     :return: None
015     """

On line 17, we perform a validation check on the input to ensure that the path exists and refers to a file. We do this with the os.path.isfile() function, which returns True if the path points to an existing file. As an aside, the os.path.isdir() function can be used to perform the same validation check for directories. Both functions accept absolute and relative paths supplied as strings:

017     if os.path.isfile(in_file):
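To see how these two checks behave, consider the following minimal sketch. It builds a temporary directory containing one empty file, so the paths are placeholders and not locations from a real examination.

```python
import os
import tempfile

# Create a temporary directory and a file inside it to test against.
tmp_dir = tempfile.mkdtemp()
tmp_file = os.path.join(tmp_dir, 'setupapi.dev.log')
open(tmp_file, 'w').close()

# A path that was never created.
missing = os.path.join(tmp_dir, 'does_not_exist.log')

checks = {
    'file is a file': os.path.isfile(tmp_file),
    'directory is a file': os.path.isfile(tmp_dir),
    'directory is a directory': os.path.isdir(tmp_dir),
    'missing path is a file': os.path.isfile(missing),
}
for label in sorted(checks):
    print('{}: {}'.format(label, checks[label]))
```

Note that os.path.isfile() returns False, rather than raising an exception, both for a directory and for a path that does not exist, which is why a simple if/else is enough in main().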

If the file path is valid, we print the version of the script. This time, we use the format() method to create our desired string. Let's look at the format specifiers we've used on lines 18 and 20. Each starts with a colon, which introduces the format specification. The equal sign is the fill character, the caret (^) centers the supplied object, and 20 is the minimum width of the resulting field. In this case, we supply an empty string as the object because we only want 20 equal signs to create visual separation from the output. If we supplied a real string shorter than 20 characters, it would be centered and padded with equal signs on both sides up to a total width of 20 characters. A string of 20 characters or more, such as "Try not. Do, or do not. There is no try.", would be printed unchanged with no padding. On line 19, the format() method is used to print the script name and version strings, as follows:

018         print '{:=^20}'.format('')
019         print '{} {}'.format('SetupAPI Parser, ', __version__)
020         print '{:=^20} \n'.format('')
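The behavior of this format specifier is easy to confirm in an interpreter. The short sketch below applies the same '{:=^20}' specifier to an empty string, a short string, and a string longer than the field width. The print() calls are written as functions so that the example also runs on Python 3, which is an assumption on our part since the script itself targets Python 2.

```python
# An empty string centered in a field of width 20, filled with '='.
separator = '{:=^20}'.format('')

# A short string is centered, with '=' filling the rest of the 20 characters.
short = '{:=^20}'.format('SetupAPI')

# A string longer than the width is returned unchanged; nothing is truncated.
quote = 'Try not. Do, or do not. There is no try.'
long_result = '{:=^20}'.format(quote)

print(separator)    # ====================
print(short)        # ======SetupAPI======
print(long_result)  # Try not. Do, or do not. There is no try.
```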

On line 21, we call the parseSetupapi() function and pass it the path of the validated setupapi.dev.log file. This function returns a list of USB entries, with one entry per discovered device. Each entry in device_information consists of two elements: the device name and the associated date value. On line 22, we iterate through this list using a for loop and pass the two elements of each entry to the printOutput() function on line 23:

021         device_information = parseSetupapi(in_file)
022         for device in device_information:
023             printOutput(device[0], device[1])

On line 24, we handle the case where the provided path is not a valid file. This is a common way to handle errors caused by invalid paths. Within this branch, we tell the user on line 25 that the input is not a valid file. On line 26, we call sys.exit() to quit the program with an exit status of 1. Any non-zero value signals an error to the calling shell. You may use any number here; however, because we use 1 for this condition, seeing that status at exit tells us where the error was raised:

024     else:
025         print 'Input is not a file.'
026         sys.exit(1)
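Because sys.exit() works by raising a SystemExit exception, the exit status can be observed from within Python. The following sketch shows one way to do this. The require_file() helper and its exists parameter are hypothetical stand-ins for the check in main() and are not part of the script.

```python
import sys


def require_file(path, exists):
    """Hypothetical helper: exit with status 1 when the existence check fails."""
    if not exists:
        print('Input is not a file.')
        sys.exit(1)
    return path


# sys.exit() raises SystemExit, so the status can be inspected by catching it.
try:
    require_file('missing.log', exists=False)
    status = 0
except SystemExit as error:
    status = error.code

print('exit status: {}'.format(status))
```

When the script is run from a shell, the same value is available afterwards as the process exit status, for example through $? in bash or %ERRORLEVEL% in the Windows command prompt.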

Tuning the parseSetupapi() function

The parseSetupapi() function accepts the path of the setupapi.dev.log file as its only input. Before opening the file, we initialize the device_list variable on line 35 to store extracted device records in a list.

029 def parseSetupapi(setup_log):
030     """
031     Read data from provided file for Device Install Events for USB Devices
032     :param setup_log: str - Path to valid setup api log
033     :return: list of tuples - Tuples contain device name and date in that order
034     """
035     device_list = list()

Starting on line 36, we open the input file in a new way: the with statement opens the file as in_file and closes it automatically when the block ends, so we do not have to worry about closing it ourselves. Inside this with block is a for loop that iterates over the file one line at a time, which keeps memory usage low. In the previous iteration, we used the .readlines() method to read the entire file into a list of lines; although the difference is hardly noticeable on small files, calling .readlines() on a large file could cause performance issues on systems with limited resources:

036     with open(setup_log) as in_file:
037         for line in in_file:
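The two reading strategies can be compared side by side. The sketch below writes a small throwaway file with made-up contents, reads it once with readlines() and once by iterating, and confirms that the with statement has closed the file. It is only an illustration; with 1,000 short lines, the memory difference is negligible.

```python
import os
import tempfile

# Build a small throwaway file to read; its contents are made up.
handle, path = tempfile.mkstemp(suffix='.log')
with os.fdopen(handle, 'w') as out_file:
    for number in range(1000):
        out_file.write('line {}\n'.format(number))

# Approach from the first iteration: the whole file becomes one list in memory.
with open(path) as in_file:
    all_lines = in_file.readlines()

# Approach from this iteration: only the current line is kept around.
line_count = 0
with open(path) as in_file:
    for line in in_file:
        line_count += 1

# The with statement has already closed the file at this point.
print('{} {} {}'.format(len(all_lines), line_count, in_file.closed))
os.remove(path)
```

Both approaches see the same lines; the difference is that the list built by readlines() grows with the size of the file, whereas the loop holds a single line at a time.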

Within the for loop, we use similar logic to determine whether the line contains our device installation indicators. If it does, we extract the device information in the same manner as discussed previously. By defining the lower_line variable on line 38, we shorten the remaining code and avoid repeated calls to the lower() method:

038             lower_line = line.lower()
039             # if 'Device Install (Hardware initiated)' in line:
040             if 'device install (hardware initiated)' in lower_line and ('ven' in lower_line or 'vid' in lower_line):
041                 device_name = line.split('-')[1].strip()

As noted in the first iteration, a fair number of false positives were displayed in our output. That's because this log contains information relating to many types of hardware devices, including those interfacing with PCI, and not just USB devices. In order to remove the noise, we will check to see what type of device it is.

We can split on the backslash character, seen escaped on line 43, to access the first split element of the device_name variable and see if it contains the string usb. As mentioned in Chapter 1, Now For Something Completely Different, we need to escape a single backslash with another backslash, so Python knows to treat it as a literal backslash character. This check matches devices labeled as USB and USBSTOR in the file. Some false positives will still exist, as mice, keyboards, and hubs will likely display as USB devices; however, we do not want to overfilter and miss relevant artifacts. If we discover that the entry does not contain the string "usb", we execute the continue statement, telling Python to skip to the next iteration of the for loop:

043                 if 'usb' not in device_name.split('\\')[0].lower():
044                     continue  # Remove most non-USB devices
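The filter on lines 43 and 44 can be exercised on its own. In the sketch below, the three device names are made-up examples written in the style of setupapi.dev.log entries and are not taken from a real log.

```python
# Made-up device names in the style of setupapi.dev.log entries.
names = [
    'USB\\VID_0781&PID_5530\\20052242520EDA226E0A',
    'USBSTOR\\Disk&Ven_SanDisk&Prod_Cruzer\\20052242520EDA226E0A&0',
    'PCI\\VEN_8086&DEV_1E31\\3&11583659&0&A0',
]

# '\\' in source code is a single literal backslash character.
backslash_length = len('\\')

usb_names = []
for device_name in names:
    # Look only at the text before the first backslash.
    if 'usb' not in device_name.split('\\')[0].lower():
        continue  # Skip non-USB devices such as the PCI entry
    usb_names.append(device_name)

print(backslash_length)
print(usb_names)
```

Checking only the first split element matters: a substring test against the whole line could match "usb" somewhere later in the entry and let unrelated devices through.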

To retrieve the date, we need to use a different procedure to get the next line since we have not invoked the enumerate function. To solve this challenge, we use the next() function on line 46 to step into the next line in the file. We then process this line in the same fashion as previously discussed.

046                 date = next(in_file).split('start')[1].strip()
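The following sketch shows how next() pulls the following line while the for loop is running. The three sample lines are fabricated to imitate a device install entry, and io.StringIO stands in for the open file. Two details are our own additions rather than the script's behavior: the default value passed to next(), which avoids a StopIteration error if the match is on the last line, and the check for 'start' before splitting, which avoids an IndexError on an unexpected line.

```python
import io

# Made-up lines imitating a device install entry and its timestamp line.
sample = io.StringIO(
    u'>>>  [Device Install (Hardware initiated) - USB\\VID_0781&PID_5530\\1234]\n'
    u'>>>  Section start 2016/04/01 10:02:16.468\n'
    u'     dvi: some other log line\n'
)

results = []
for line in sample:
    if 'device install (hardware initiated)' in line.lower():
        device_name = line.split('-')[1].strip()
        # next() consumes the following line, so the for loop will not see it again.
        following = next(sample, '')
        # Guard against a missing or unexpected line instead of raising IndexError.
        if 'start' in following:
            date = following.split('start')[1].strip()
            results.append((device_name, date))

print(results)
```

Note that the extracted device name keeps the closing square bracket from the log line, because splitting on the hyphen and stripping whitespace does not remove it.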

With the device name and date processed, we append them to device_list as a tuple where the device's name is the first value and the date is the second. We need the double parentheses in this case to ensure that our data is appended properly. The outer set belongs to the call to the append() method. The inner parentheses allow us to build a tuple and append it as one value. If we did not have the inner parentheses, we would be passing the two elements as separate arguments instead of a single tuple, and append() would raise a TypeError because it accepts exactly one argument. Once all lines have been processed in the for loop, the with block ends and the file is closed. On line 49, the device_list is returned and the function exits.

047                 device_list.append((device_name, date))
048 
049     return device_list
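The effect of the inner parentheses can be demonstrated directly. In this sketch, the device name and date are placeholder values.

```python
device_list = list()

# Inner parentheses build one tuple; append() receives a single argument.
device_list.append(('USB\\VID_0781&PID_5530\\1234', '2016/04/01 10:02:16.468'))

# Without the inner parentheses, append() is given two arguments and fails.
try:
    device_list.append('USB\\VID_0781&PID_5530\\1234', '2016/04/01 10:02:16.468')
    error_name = None
except TypeError as error:
    error_name = type(error).__name__

# Each stored tuple unpacks into name and date, matching device[0] and device[1].
for name, date in device_list:
    print('{} installed {}'.format(name, date))

print(error_name)
```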

Modifying the printOutput() function

This function is identical to the previous iteration, with the exception of the addition of the newline character \n on line 60. This helps separate entries in the console output. When iterating through code, we will find that not all functions need updating to improve the user experience, accuracy, or efficiency of the code.

Only modify an existing function if some benefit will be achieved:

052 def printOutput(usb_name, usb_date):
053     """
054     Print formatted information about USB Device
055     :param usb_name: str - Name of the USB device
056     :param usb_date: str - Date the device was first installed
057     :return: None
058     """
059     print 'Device: {}'.format(usb_name)
060     print 'First Install: {}\n'.format(usb_date)

Running the script

In this iteration, we address several issues from the proof of concept. These changes include the following:

  • The improvement of resource management by iterating through a file rather than reading the entire file into a variable
  • The addition of user specification of a file to use at the command line
  • The validation of the input file from the user
  • The filtering of responsive hits to reduce noise in the output

With the additional formatting changes, the entries are now spaced apart for easier review and contain fewer non-USB device entries. This iteration also allows users to re-run the program against multiple setupapi logs in different locations. The following screenshot shows the output of our script:

[Screenshot: console output of setupapi_parser.v2.py]

Last but not least, we achieved considerable performance improvements over our previous design. The two screenshots that follow display the impact on the machine's memory utilization. The first iteration is displayed on the left and the second on the right. The red lines highlight the start and finish time of our script. As we can see, we have reduced our resource utilization by iterating across the lines of the file with the for loop instead of calling the readlines() method. This is a small-scale example of resource management, but a larger input file would have a more dramatic impact on the system.

[Screenshots: memory utilization of the first iteration (left) and the second iteration (right)]