Forensic scripting best practices
Forensic best practices play a big part in what we do and, traditionally, refer to handling or acquiring evidence. However, we've designated some forensic best practices of our own when it comes to programming, as follows:
- Do not modify the original data you're working with
- Work on copies of the original data
- Comment code
- Validate your program's results (and other application results)
- Maintain extensive logging
- Return output in an easy-to-analyze format (your users will thank you)
The golden rule of forensics is: strongly avoid modification of the original data. Work on a verified forensic copy whenever possible. However, this may not be an option for other disciplines, such as for incident responders where the parameters and scope varies. As always, this varies on the case and circumstances, but please keep in mind the ramifications of working on live systems or with original data.
In these cases, it is important to consider what the code does and how it might interact with the system at runtime. What kind of footprint does the code leave behind? Could it inadvertently destroy artifacts or references to them? Has the program been validated in similar conditions to ensure that it operates properly? These are the kinds of considerations that are necessary when it comes to running a program on a live system.
We've touched on commenting code before, but it cannot hurt to overstate its value. Soon, we will create our first forensic script, usb_lookup.py, which is a little over 90 lines of code. Imagine being handed the code without any explanation or comments. It might take a few minutes to read and understand what it does exactly, even for an experienced developer. Now, imagine a large project's source code that has thousands of lines of code—it should be apparent how valuable comments are, not just for the developer but also those who examine the code afterwards.
Validation essentially comes down to knowing the code's behavior. Obviously, bugs are going to be discovered and addressed. However, bugs have a way of frequently turning up and are ultimately unavoidable as it is impossible to test against all possible situations during development. Instead, what can be established is an understanding of the behavior of the code in a variety of environments and situations. Mastering the behavior of your code is important, not only to be able to determine if the code is up for the task at hand but also when asked to explain its function and inner workings in a courtroom.
Logging can help keep track of any potential errors during runtime and act as an audit-chain of sorts for what the program did and when. Python supplies a robust logging module in the standard library, unsurprisingly named logging. We will use this module and its various options throughout this book.
The purpose of our scripts is to automate some of the tedious repetitive tasks in forensics and supply analysts with actionable knowledge. Oftentimes, the latter refers to storing data in a format that is easily manipulated. In most cases, a CSV file is the simplest way to achieve this so that it can be opened with a variety of different text or workbook editors. We will use the csv module in many of our programs.