Learning Python for Forensics

Advanced data types and functions

This section highlights two common features of Python, iterators and datetime objects, that we will frequently encounter in forensic scripts. We will therefore introduce these objects and their functionality in detail.

Iterators

You have previously learned about several iterable objects, such as lists, sets, and tuples. In Python, a data type is considered iterable if it defines an __iter__ method or if its elements can be accessed using indices. These three data types (that is, lists, sets, and tuples) allow us to iterate through their contents in a simple and efficient manner. For this reason, we often use these data types when iterating through the lines in a file, iterating through file entries within a directory listing, or trying to identify a file based on a series of file signatures.

An iterator, created with the iter() function, allows us to step through data in a manner that consumes elements as they are read rather than preserving the initial object. This may seem undesirable; however, it is very useful when working with large data sets or on machines with limited resources, because only the element currently in use is held in memory. For example, when stepping through every line of a 3 GB file, an iterator feeds us one line at a time, which prevents massive memory consumption while still handling each line in order.
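As a minimal sketch of the line-by-line behavior described above, the following example (written for Python 3) loops over a file object, which acts as its own iterator. The temporary file and its contents are hypothetical stand-ins for a large evidence file, and the variable names are chosen for illustration only:

```python
import os
import tempfile

# Build a small stand-in for a large evidence file (hypothetical content).
with tempfile.NamedTemporaryFile('w', suffix='.log', delete=False) as tmp:
    tmp.write('first entry\nsecond entry\nthird entry\n')
    log_path = tmp.name

line_count = 0
longest_line = ''
with open(log_path) as log_file:
    # A file object is its own iterator: each pass of the loop reads
    # one line, so the whole file is never held in memory at once.
    for line in log_file:
        line = line.rstrip('\n')
        line_count += 1
        if len(line) > len(longest_line):
            longest_line = line

os.remove(log_path)
print(line_count, longest_line)
```

Because only the current line and two small summary values are kept, memory use should stay roughly constant regardless of the size of the file.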

The following code block steps through the basic usage of iterators. We call next() on an iterator to retrieve the next element. Once an element has been retrieved with next(), it is no longer available from the iterator, as the cursor has moved past it. When we reach the end of the iterator, any additional next() call raises a StopIteration exception. This exception allows us to exit loops over an iterator gracefully and alerts us when there is no more content to read:

>>> y = iter([1, 2, 3])
>>> next(y)
1
>>> next(y)
2
>>> next(y)
3
>>> next(y)
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
StopIteration
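To illustrate how StopIteration lets a loop exit gracefully, here is a small sketch written for Python 3. The file names are made-up sample data. It also shows the optional default argument of the built-in next(), which is returned instead of raising the exception:

```python
# Hypothetical directory entries used only for illustration.
entries = iter(['a.txt', 'b.doc', 'c.jpg'])

seen = []
while True:
    try:
        name = next(entries)
    except StopIteration:
        # The iterator is exhausted, so leave the loop cleanly.
        break
    seen.append(name)

# next() also accepts a default that is returned instead of raising.
leftover = next(entries, None)
print(seen, leftover)
```

A for loop performs the same try/except handling behind the scenes, which is why it stops quietly at the end of an iterator.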

The reversed() built-in function can be used to create a reversed iterator. In the following example, we reverse a list and retrieve each element from the iterator with next():

>>> j = reversed([7, 8, 9])
>>> next(j)
9
>>> next(j)
8
>>> next(j)
7
>>> next(j)
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
StopIteration

By implementing generators, we can take further advantage of iterators. Generators are a special type of function that produces iterator objects. They are written like the functions discussed in Chapter 1, Now For Something Completely Different, but instead of returning a single object, they yield values one at a time; calling a generator function returns an iterator over those values. Generators are best used with large data sets that would otherwise consume vast quantities of memory, similar to the use case for iterators.

The following code block shows the implementation of a generator. In the function fileSigs(), we create a list of tuples stored in the variable sigs. We then loop through each element in sigs and yield a tuple. This creates a generator, allowing us to call next() to retrieve each tuple individually and limit the generator's memory impact. See the following code:

>>> def fileSigs():
...     sigs = [('jpeg', 'FF D8 FF E0'), ('png', '89 50 4E 47 0D 0A 1A 0A'), ('gif', '47 49 46 38 37 61')]
...     for s in sigs:
...         yield s
...

>>> fs = fileSigs()
>>> next(fs)
('jpeg', 'FF D8 FF E0')
>>> next(fs)
('png', '89 50 4E 47 0D 0A 1A 0A')
>>> next(fs)
('gif', '47 49 46 38 37 61')

Tip

You can refer to the file signatures at http://www.garykessler.net/library/file_sigs.html.
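One possible use of such a generator is identifying a file from its leading bytes. The following is a sketch under stated assumptions rather than a definitive implementation: it is written for Python 3, the names file_sigs() and identify() are hypothetical, and only the three signatures listed earlier are checked:

```python
def file_sigs():
    """Yield (label, signature_bytes) pairs one at a time."""
    sigs = [('jpeg', 'FF D8 FF E0'),
            ('png', '89 50 4E 47 0D 0A 1A 0A'),
            ('gif', '47 49 46 38 37 61')]
    for label, hex_string in sigs:
        # bytes.fromhex() accepts hex pairs separated by spaces.
        yield label, bytes.fromhex(hex_string)


def identify(header):
    """Return the first label whose signature begins header, else None."""
    for label, signature in file_sigs():
        if header.startswith(signature):
            return label
    return None


# Hypothetical header bytes: a PNG signature followed by extra data.
png_header = bytes.fromhex('89 50 4E 47 0D 0A 1A 0A 00 00 00 0D')
print(identify(png_header))
print(identify(b'not a match'))
```

Because identify() returns as soon as a signature matches, the generator is only advanced as far as necessary.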

Datetime objects

Investigators are often asked to determine when a file was deleted, when a text message was read, or the correct order for a sequence of events. Consequently, a great deal of analysis revolves around timestamps and other temporal artifacts. Understanding time can help us piece together the puzzle and further understand the context surrounding an artifact. For this, and many other reasons, let's practice handling timestamps using the datetime module.

Python's datetime module supports the interpretation and formatting of timestamps. This module has many features, most notably getting the current time; determining the change, or delta, between two timestamps; and converting common timestamp formats into a human-readable date. The datetime.datetime() constructor creates a datetime object and accepts the year, month, and day, and optionally the hour, minute, second, microsecond, and time zone arguments. A timedelta object represents the difference between two datetime objects, stored as days, seconds, and microseconds.

First, we need to import the datetime library, which allows us to use the classes and functions in the module. We can get the current date and time with the datetime.datetime.now() method. This creates a datetime object, which we can then manipulate. For instance, let's create a timedelta by subtracting two datetime objects captured a short time apart. In this case, our timedelta is negative because we subtract the later time from the earlier one. We can add the timedelta to, or subtract it from, our right_now variable to generate another datetime object:

>>> import datetime
>>> right_now = datetime.datetime.now()
>>> right_now
datetime.datetime(2015, 8, 18, 18, 20, 55, 284786)

>>> # Subtract time
>>> delta = right_now - datetime.datetime.now()
>>> delta
datetime.timedelta(-1, 85785, 763931)

>>> # Add the negative timedelta to right_now to produce an earlier time
>>> right_now + delta
datetime.datetime(2015, 8, 18, 18, 10, 41, 48717)

Note

Your results will differ, as these commands depend on the time at which they are run.
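For a reproducible variant of the same arithmetic, the sketch below (written for Python 3) uses two fixed datetime objects instead of the current time. The created and deleted timestamps are invented values for illustration:

```python
import datetime

# Hypothetical event times, ten minutes and fourteen seconds apart.
created = datetime.datetime(2015, 8, 18, 18, 10, 41)
deleted = datetime.datetime(2015, 8, 18, 18, 20, 55)

elapsed = deleted - created   # later minus earlier gives a positive delta
reverse = created - deleted   # earlier minus later gives a negative delta

print(elapsed.total_seconds())
# A negative timedelta is stored as negative days plus positive seconds.
print(reverse.days, reverse.seconds)
# Adding the delta back to the earlier time recovers the later time.
print(created + elapsed == deleted)
```

Note how the negative delta is normalized to -1 day plus a positive number of seconds, which is the same form as the timedelta shown in the interactive session.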

Another commonly used feature of the datetime module is the strftime() method, which converts datetime objects into custom-formatted strings. This method takes a format string as its input. The format string is made up of formatters, each of which begins with a percent sign. The following table illustrates examples of the formatters we can use with the strftime() method:

In addition, the function strptime(), which we do not showcase here, can be used for the reverse process. The strptime() function takes a string containing a date and time and converts it into a datetime object using a format string. The datetime module can also interpret raw timestamps. In the following example, we convert a UNIX timestamp, represented as an integer, into a UTC datetime object:

>>> epoch_timestamp = 874281600
>>> datetime_timestamp = datetime.datetime.utcfromtimestamp(epoch_timestamp)

We can print this new object, and it will automatically be converted into a string representing the datetime object. However, let's pretend that we would rather not separate our date with hyphens. Instead, we can use the strftime() method to display the date with forward slashes or with any other combination of the defined formatters:

>>> print(datetime_timestamp)
1997-09-15 00:00:00
>>> print(datetime_timestamp.strftime('%m/%d/%Y %H:%M:%S'))
09/15/1997 00:00:00
>>> print(datetime_timestamp.strftime('%A %B %d, %Y at %I:%M:%S %p'))
Monday September 15, 1997 at 12:00:00 AM
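As a supplementary sketch of the reverse process, the following Python 3 example parses the slash-formatted string back into a datetime object with strptime(), using the same format string. It then recovers the UNIX timestamp by subtracting the epoch, which assumes the parsed value represents UTC; the variable names are illustrative:

```python
import datetime

date_string = '09/15/1997 00:00:00'
fmt = '%m/%d/%Y %H:%M:%S'

# strptime() is a class method: string plus format in, datetime out.
parsed = datetime.datetime.strptime(date_string, fmt)
print(parsed)

# Formatting with the same format string reproduces the input.
print(parsed.strftime(fmt) == date_string)

# Assuming the value is UTC, subtract the epoch to get a UNIX timestamp.
epoch = datetime.datetime(1970, 1, 1)
recovered = int((parsed - epoch).total_seconds())
print(recovered)
```

If the format string does not match the input, strptime() raises a ValueError, so wrapping the call in a try/except block is a reasonable choice when parsing untrusted evidence.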

The datetime library alleviates a great deal of stress involved in handling date and time values in Python. This module is also well-suited for processing time formats often encountered during investigations.