Link and Correlation-based Forensic Analysis

2.1 Digital Forensics

2.1.8 Link and Correlation-based Forensic Analysis

Information can be linked together and divided into groups based on common content and properties. Social relations and networks are good examples of domains that reflect what the various players have in common, whether friendships, family relationships, interests or employment. The characteristics that unite the different players can be genetic or social, depending on what one wants to investigate further. In such cases, the actors, whether they are objects, assets, people, or program code in a computer, can be linked together in one way or another. This improves the understanding of the information set as a whole. There are different methods to identify and represent such networks and connections. In this chapter we discuss the use of link analysis and its benefits for forensic purposes.

Linking of Criminal Behaviour

The concept of linking entities and objects involved in the same crime can uncover previously unknown information that gives a forensic investigator a new picture of the event and the parties involved. In [35], Jesus Mena presents data mining techniques for investigating and detecting security breaches and crime, where link analysis is one of the important analysis techniques. Personal credentials, locations, information about organizations, telephone numbers and email addresses are some of the attributes that can be used to link entities, objects and events involved in, e.g., financial fraud, laundering or terrorist networks. Visual linking gives investigators the opportunity to handle evidence in an efficient manner and sometimes also to work proactively to prevent organized crime from evolving and criminal events from happening in the future. Mena is mainly concerned with links between financial transactions, drug trafficking and ways of drawing organized crime and its networks into graphs and maps. However, investigation through the use of correlations and links can also have a positive effect on defeating computer crime.
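
The attribute-based linking described above can be sketched in a few lines of Python. This is a minimal illustration only, not Mena's method or the behaviour of any particular tool: the entity names, attribute values and the `build_links` helper are all hypothetical. It indexes entities by the attribute values they carry and links any two entities that share a value.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: each entity with attributes collected during a case.
records = {
    "person_A": {"phone": "555-0101", "email": "a@example.com"},
    "person_B": {"phone": "555-0101", "email": "b@example.com"},
    "company_C": {"phone": "555-0199", "email": "b@example.com"},
    "person_D": {"phone": "555-0142", "email": "d@example.com"},
}


def build_links(records):
    """Map each linked pair of entities to the (attribute, value) pairs they share."""
    # Index entities by every (attribute, value) pair they carry.
    index = defaultdict(set)
    for entity, attrs in records.items():
        for key, value in attrs.items():
            index[(key, value)].add(entity)

    # Any two entities found under the same index key are linked by that key.
    links = defaultdict(set)
    for shared_attr, entities in index.items():
        for a, b in combinations(sorted(entities), 2):
            links[frozenset((a, b))].add(shared_attr)
    return dict(links)


links = build_links(records)
for pair, shared in links.items():
    print(sorted(pair), "linked by", sorted(shared))
```

The resulting dictionary is effectively an edge list, so it could be handed to a graph library or a charting tool for the kind of visual linking discussed above.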

To give an example of successful use of linking, Mena presents a case study of a drug trafficking case in which the responsible police department had huge amounts of information available, but no good way to analyze the links associated with the crime. The solution to this limitation was a web-based application for querying and searching for links among the large amounts of data stored in the database. This gave the investigators a better view of the crime.

Figure 5: Illustration of i2 Analyst’s Notebook (from [4])

One of the most popular link analysis tools available, used by security, law enforcement and investigative analysts, is i2 Analyst’s Notebook [36]. Figure 5 illustrates how i2 Analyst’s Notebook can be used for social network analysis. The tool offers link and case analysis and supports, e.g., network, flow, transaction and timeline features, as well as the ability to combine multiple charts into larger ones covering correlated subjects, information objects, events and additional relevant data (e.g., multimedia). Knowledge about linking tools (such as i2 Analyst’s Notebook) and how they can improve the efficiency of digital forensics is useful for designing a correlation system to identify malware and other correlated incidents.

Correlation and Linking Techniques used in Digital Forensics

Existing digital forensics tools have limitations regarding correlations between different computer components that store digital evidence. To address this issue, Case et al. present FACE [12], a framework for discovering correlations between various evidence targets (e.g., hard drives). Manual analysis of correlated events from multiple evidence targets is time consuming and can involve subjectivity from the investigator’s point of view. With FACE, the goal is to provide automated correlation of events from a computer’s memory, disk and network capture. By correlating these sources, it is possible to give an investigator structured views of scenarios that have happened on the target computer.

The different views that can be presented to the investigator are user, groups, processes, file system and network. When examining, e.g., the user view, information about the user’s activity and the processes, files and network traffic linked to that activity is presented. This way of looking at evidence and its links to other evidence increases the investigator’s efficiency in detecting malicious behavior.
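
The idea of a per-user view over several evidence sources can be illustrated as follows. This is a hedged sketch, not FACE's implementation: the `Event` record, the sample events and the `user_view` function are assumptions made for illustration. It groups already-parsed events from memory, disk and network sources under the user they belong to.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical, already-parsed evidence item."""
    source: str  # "memory", "disk" or "network"
    user: str
    detail: str


# Illustrative sample data, not taken from any real case.
events = [
    Event("memory", "alice", "process firefox.exe (pid 812)"),
    Event("disk", "alice", "file C:/Users/alice/report.doc modified"),
    Event("network", "alice", "TCP connection to 192.0.2.10:443"),
    Event("disk", "bob", "file C:/Users/bob/notes.txt created"),
]


def user_view(events):
    """Group event details by user, then by evidence source."""
    view = defaultdict(lambda: defaultdict(list))
    for event in events:
        view[event.user][event.source].append(event.detail)
    # Convert to plain dicts so the result is easy to print or serialize.
    return {user: dict(sources) for user, sources in view.items()}


view = user_view(events)
for user, sources in view.items():
    print(user, sources)
```

Grouping by a different key (process, file or connection) would give the other views in the same way; the hard part in practice is attributing raw artifacts to a user in the first place, which this sketch takes as given.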

While correlation among multiple computers is not considered in FACE, the view feature and the correlation techniques presented are certainly relevant for correlating malware evidence from multiple machines as well. Here the ability to map dependencies and highly relevant activities among them can reveal new and previously undetected evidence.

Another important element of correlating evidence in digital investigations is the use of timestamps. Because timestamps created by unsynchronized system clocks vary, issues arise when analyzing multiple computer systems. In [37], Schatz et al. investigate the details of how operating systems (in particular Windows) and computers synchronize clocks, and based on the knowledge obtained they present a correlation method for increasing the integrity of timestamps. Browser records, a typical source of information on personal computers nowadays, were used as an additional reference point for time information. This is an interesting approach to decreasing the uncertainty of timestamps when analyzing multiple machines that use unsynchronized system clocks or are located in different time zones.
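
To make the clock-skew problem concrete, the following sketch estimates a per-machine clock offset from pairs of local and reference times and applies it to an event timestamp. It is a simplified illustration under stated assumptions, not the method of Schatz et al.: the use of the median, the sample values and the function names `estimate_offset` and `correct` are all hypothetical. Times are normalized to UTC so that machines in different time zones become comparable.

```python
from datetime import datetime, timedelta, timezone
from statistics import median


def estimate_offset(pairs):
    """Estimate clock offset from (local_time, reference_time) pairs.

    Both values must be timezone-aware. The median is used so that a
    single bad reference point does not dominate the estimate.
    """
    diffs = [(ref - local).total_seconds() for local, ref in pairs]
    return timedelta(seconds=median(diffs))


def correct(timestamp, offset):
    """Apply the estimated offset and express the result in UTC."""
    return (timestamp + offset).astimezone(timezone.utc)


# Hypothetical machine in UTC+2 whose clock runs roughly 90 seconds slow.
local_tz = timezone(timedelta(hours=2))
utc = timezone.utc
pairs = [
    (datetime(2011, 5, 1, 12, 0, 0, tzinfo=local_tz), datetime(2011, 5, 1, 10, 1, 28, tzinfo=utc)),
    (datetime(2011, 5, 1, 12, 10, 0, tzinfo=local_tz), datetime(2011, 5, 1, 10, 11, 30, tzinfo=utc)),
    (datetime(2011, 5, 1, 12, 20, 0, tzinfo=local_tz), datetime(2011, 5, 1, 10, 21, 33, tzinfo=utc)),
]

offset = estimate_offset(pairs)
corrected = correct(datetime(2011, 5, 1, 12, 30, 0, tzinfo=local_tz), offset)
print(offset, corrected.isoformat())
```

A constant offset is the simplest possible model; real clocks also drift and may be reset, so an investigator would normally estimate the offset per time window rather than once per machine.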

While analyzing numerous hard drives from the secondary market to detect valuable credentials, e.g., credit card numbers and email addresses, Simson L. Garfinkel discovered the value of automating correlation analysis. He introduced Forensic Feature Extraction (FFE) and Cross-Drive Analysis (CDA) in [13]. Based on feature extraction and pseudo-unique identifiers, the proposed prototype is able to correlate information distributed over multiple digital storage devices. Even though this approach fulfills many of the aspects of correlating evidence from multiple sources, the feature extractors are limited to scanning for predefined identifiers such as email addresses, social security numbers (SSN), cookies and credit card number strings. Identifiers and feature extraction for correlating potential malware or other incidents across multiple sources are not included. Neither is identification of other correlated files and data that deviate from a standard set of system information.
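
The combination of feature extraction and cross-drive correlation can be sketched as below. This is an illustrative approximation rather than Garfinkel's prototype: the simplified email regular expression, the in-memory "drives" and the helper names are assumptions. Each drive's content is scanned for email addresses, and any address seen on more than one drive is reported as a correlation.

```python
import re
from collections import defaultdict

# Simplified email pattern for illustration; real extractors are stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical drive contents, standing in for raw data read from images.
drives = {
    "drive_01": "contact alice@example.com or bob@example.org for the invoice",
    "drive_02": "From: bob@example.org\nTo: carol@example.net",
    "drive_03": "no identifiers here",
}


def extract_features(text):
    """Return the set of email addresses found in the text (lowercased)."""
    return {match.lower() for match in EMAIL_RE.findall(text)}


def cross_drive(drives):
    """Map each feature seen on two or more drives to the drives holding it."""
    seen_on = defaultdict(set)
    for name, text in drives.items():
        for feature in extract_features(text):
            seen_on[feature].add(name)
    return {f: names for f, names in seen_on.items() if len(names) > 1}


shared = cross_drive(drives)
print(shared)
```

Extending this to other identifier types only requires further extractor functions, which also shows the limitation noted above: anything without a predefined pattern is never extracted and so never correlated.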

In a study conducted by Hoelz et al. [38], a method for applying artificial intelligence to computer forensics was presented. The goal was to further automate the process of examining correlated information from multiple sources and to present the evidence of highest value to the investigators. The work is mainly based on the MultiAgent Digital Investigation toolKit (MADIK) [39], which uses Intelligent Software Agents (ISA) to extract relevant evidence for a specific forensic case. MADIK agents can decide to ignore standard system files (using MD5 hashing and comparison against a knowledge base), and can examine time-specific information and system data and run keyword searches to identify evidence relevant to an investigator.
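
The hash-based filtering step mentioned above can be illustrated with Python's standard `hashlib` module. This is a minimal sketch and not MADIK code: the knowledge base, the file paths and contents, and the `filter_unknown` function are hypothetical. Files whose MD5 digest appears in the knowledge base of known system files are dropped, leaving only the remainder for review.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


# Hypothetical knowledge base: digests of known, standard system files.
known_system_content = [b"standard system library", b"default configuration"]
knowledge_base = {md5_hex(content) for content in known_system_content}

# Hypothetical evidence set: path -> file content (kept in memory for brevity).
evidence = {
    "system32/lib.dll": b"standard system library",
    "etc/default.conf": b"default configuration",
    "users/alice/dropper.exe": b"unexpected payload",
}


def filter_unknown(evidence, knowledge_base):
    """Return the paths whose content hash is not in the knowledge base."""
    return sorted(
        path for path, data in evidence.items()
        if md5_hex(data) not in knowledge_base
    )


to_review = filter_unknown(evidence, knowledge_base)
print(to_review)
```

MD5 is used here only because the text names it; it serves to recognize known files, not to resist deliberate collisions, so a stronger hash would be a reasonable substitution in new tooling.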

The related work above shows that correlating data from multiple sources increases knowledge. In situations where multiple devices, such as computers, are involved in an incident, correlations between the identification systems can be represented as links, depending on the context.

In document Cross-Computer Malware Detection in Digital Forensics (pages 38-41)