
End User Privacy Evaluation of Computer Software

—–

Asbjørn Klevjer

Thesis submitted for the degree of

Master in Network and System Administration, 60 credits

Department of Informatics

Faculty of Mathematics and Natural Sciences

UNIVERSITY OF OSLO


© 2020 Asbjørn Klevjer

End User Privacy Evaluation of Computer Software http://www.duo.uio.no/

Printed: Reprosentralen, University of Oslo


Abstract

Privacy has become an important subject in recent years, and the EU's General Data Protection Regulation (GDPR) has made it a central concern for both small and large businesses that process personal information. Even though privacy is well known and frequently in the news, it is not yet a well-developed field of study. And even though the GDPR has made end users (or “data subjects”, as they are called in the GDPR) a priority, they may not fully understand privacy, especially when it comes to software applications that process their information.

When presented with a piece of computer software installed on their computer, what can end users do to find out whether it can be trusted with their personal information? What are the risks and possible consequences of using it, and do they outweigh the benefits? In this thesis, we will look at the definition of privacy in general and how it is described in the GDPR.

A Privacy Impact Assessment (PIA) is required in some instances in Europe by the GDPR, namely when “the processing of personal information is likely to result in a high risk to the rights and freedoms of natural persons” [1]. A PIA report should be created for each project or set of projects that processes personal information. It is recommended that part of this report is made publicly available, benefiting, among others, the data subjects whose personal information is processed in the system.

However, this is not widespread public knowledge, and the public part of these assessments often does not exist. A privacy-concerned end user therefore sees no visible benefit from this and will have a hard time evaluating the potential privacy risks of using a new application. In this thesis, we will look at the obstacles to evaluating computer software from the viewpoint of a concerned data subject. As an example, we will attempt a PIA of Cybereason, which is an EDR and NGAV application. We will also look at alternative approaches for end user privacy evaluations.


Contents

1 Introduction 3

1.1 Motivation . . . 3

1.2 Contribution . . . 4

2 Background 7

2.1 Definition of Privacy . . . 7

2.2 Three waves of privacy definitions . . . 8

2.3 Privacy taxonomy . . . 9

2.4 GDPR . . . 11

2.4.1 GDPR Privacy Principles . . . 12

2.4.2 Data controllers . . . 12

2.4.3 Data processors . . . 12

2.4.4 Data Subject rights . . . 13

2.5 Privacy by Design . . . 15

2.6 Analysis context . . . 15

2.6.1 NGAV and EDR . . . 15

2.6.2 Cybereason . . . 17

3 Methodology 19

3.1 Privacy Impact Assessment . . . 19

3.1.1 Benefits of the PIA . . . 20

3.1.2 What entails a high risk? . . . 21

3.1.3 Choosing PIA framework . . . 23

3.2 Privacy Impact Assessment . . . 23

3.2.1 Is a PIA required? . . . 23

3.2.2 Privacy Impact Assessment of Cybereason . . . 24

3.2.3 Step 1: Characterisation of the application . . . 25

3.2.4 Step 2: Definition of privacy targets . . . 26

3.2.5 Step 3: Evaluation of degree of protection demand for each privacy target . . . 27

3.2.6 Step 4: Identification of threats for each privacy target . . . 27

3.2.7 Step 5: Identification and recommendation of controls suited to protect against threats . . . 29

3.2.8 Step 6: Assessment and documentation of residual risks . . . 29

3.2.9 Step 7: Documentation of PIA process . . . 29

3.3 Threat Modeling . . . 30

3.4 LINDDUN . . . 31


3.4.1 LINDDUN walkthrough . . . 34

4 Privacy Impact Assessment of Cybereason 41

4.1 Applicability of the PIA . . . 41

4.2 Step 1: Characterisation of the application . . . 42

4.2.1 System view . . . 42

4.2.2 Functional view . . . 43

4.2.3 Data view . . . 46

4.2.4 Physical environment view . . . 47

4.2.5 Data flow diagram . . . 47

4.3 Step 2: Definition of privacy targets . . . 49

4.4 Step 3: Evaluation of degree of protection demand for each privacy target . . . 52

4.5 Step 4: Identification of threats for each privacy target . . . . 55

4.6 Step 5: Identification and recommendation of controls suited to protect against threats . . . 68

4.7 Further steps . . . 70

5 Alternative approaches 71

5.1 Alternative privacy evaluations . . . 71

5.2 Privacy Maturity Model . . . 71

5.2.1 Proposed Maturity Model . . . 72

5.2.2 Privacy category tables . . . 73

5.2.3 Disclosure of information . . . 75

5.2.4 Secondary use . . . 77

5.3 Filling the tables . . . 77

5.4 Discussion . . . 79

6 Conclusion 81

Appendices 83

A Cybereason data collection 85

A.1 System-level data Collection . . . 85

A.1.1 Automatic executions . . . 85

A.1.2 Connections . . . 85

A.1.3 DNS query (all types) . . . 86

A.1.4 Drivers . . . 86

A.1.5 Files . . . 86

A.1.6 File events . . . 87

A.1.7 Hosts file . . . 87

A.1.8 IP Address . . . 88

A.1.9 IP Range Scans . . . 88

A.1.10 Listening Connection . . . 88

A.1.11 Local Network . . . 88

A.1.12 Machine . . . 89

A.1.13 Modules . . . 89

A.1.14 Mount Points . . . 89


A.1.15 Network Interface . . . 90

A.1.16 Network Machine . . . 90

A.1.17 Processes . . . 90

A.1.18 Proxy . . . 91

A.1.19 Quarantine File . . . 91

A.1.20 Registry Entry . . . 91

A.1.21 Registry Events . . . 92

A.1.22 Remote Session . . . 92

A.1.23 Scheduled Task . . . 92

A.1.24 Scheduled Task Actions . . . 93

A.1.25 Service . . . 93

A.1.26 Users . . . 93

A.1.27 WMI Activity . . . 94

A.1.28 WMI Persistent Objects . . . 94

A.2 Active Directory Data Collection . . . 94

A.2.1 User attributes . . . 94

A.2.2 Machine attributes . . . 95


List of Figures

3.1 The privacy impact assessment process steps (Source: [34]) . . . 26

3.2 Privacy principles and privacy targets (Source: [34]) . . . 28

3.3 The six steps of LINDDUN (Source: [45]) . . . 31

3.4 DFD elements (Redrawn after: [45]) . . . 34

3.5 Data Flow Diagram example (Redrawn after: [45]) . . . 34

3.6 An example privacy attack tree from the LINDDUN webpage (Source: [39]) . . . 37

3.7 LINDDUN Mitigation taxonomy (Redrawn from: [45]) . . . 39

4.1 The Cybereason platform architecture (Based on: [53]) . . . 43

4.2 Data flow between endpoint sensors and detection server (Source: [57]) . . . 46

4.3 Data flows between the webApp server, micro services, detection server and private intel threat server (Based on [57]) . . . 48

4.4 Data Flow Diagram for the Cybereason architecture . . . 49

4.5 LINDDUN threat tree for “Linkability of data store” (Source: [59]) . . . 58

4.6 LINDDUN threat tree for “Identifiability of data store” (Source: [59]) . . . 59

4.7 LINDDUN threat tree for “Identifiability of data flow” (Source: [59]) . . . 60

4.8 LINDDUN threat tree for “Unawareness of entity” (Source: [60]) . . . 63


List of Tables

3.1 Threat modeling (Source: [42]) . . . 30

3.2 Mapping components to element types (Source: [45]). . . 35

3.3 Mapping table example (Source: [45]). . . 36

4.1 User types in Cybereason [55] . . . 45

4.2 List of privacy targets. . . 52

4.3 Envisioned protection demand for each privacy target . . . 56

4.4 Mapping table for Cybereason. . . 57

4.5 Mapping identified threats to privacy targets . . . 68

5.1 Privacy maturity example for linkability . . . 74

5.2 Privacy Maturity example for disclosure of information . . . . 76

5.3 Privacy Maturity example for secondary use . . . 78

5.4 Example of completed table for disclosure of information . . . 79


Acknowledgements

I would like to thank my supervisor at UiO Nils Gruschka for continual feedback and help in defining the problem statement. I also want to thank my colleague and co-supervisor Per Juvhaugen at Kommando for his suggestions and feedback.

Finally, a big thank you to my wife for financing most of my studies and for forsaking five years’ worth of fancy restaurant visits.


Chapter 1

Introduction

1.1 Motivation

In the last few years, privacy has been a recurrent topic in the news and in public discourse. In March 2018 it was revealed that Cambridge Analytica had harvested 87 million Facebook profiles. Facebook had created a third-party API which gave developers access to a lot of personal information about users who accepted it, including information about their friends. Cambridge Analytica had created an app with approximately 300,000 users which took advantage of this shared information [2]. This data was later used by the US presidential candidate Ted Cruz and reportedly also by Donald Trump's election campaign. There have also been allegations that the sharing of data contributed to Brexit. Of course, this is not the first time people's personal information has been used in elections or in public influence campaigns [3, 4].

Facebook has been under criminal investigation in Ireland and the US because of its data-sharing practices [5, 6]. The EU's GDPR has increased awareness of privacy rights in general and made the big tech companies liable for damages. In January 2019 Google was hit with a 57 million dollar fine, and in 2018 it paid more in GDPR fines than taxes [7].

In November 2018, the Norwegian government proposed changes to the law on data processing and retention by the Norwegian Intelligence Service, to make sure that the law covers its mission. The proposed law would let the intelligence service store some data and all internet metadata crossing the border to and from Norway. The intelligence service wants this in order to obtain reliable information about foreign actors who pose risks to Norway and its interests.

Since searching through only the relevant data in transit is said to be impossible [8], the intelligence service must necessarily process all the information to see what is relevant and what is not. Most of the collected data will be surplus data, but it will also include data about Norwegians in Norway, not only foreigners. It is technically impossible to look only at the data about foreigners, even though the law and the intelligence service's mandate prohibit it from directly collecting data about Norwegians in Norway.

Social media and privacy online have also been given a lot of attention in the news. People are being informed about privacy policies on these platforms, and even if they do not read them, a lot of attention is given to user privacy. Internet privacy is important because it deals with personal information.

Companies wanting to protect their assets, business and computer systems employ different security controls in their networks, their computer systems and on end user workstations. EDR and NGAV are somewhat new examples of these and are used to defend against some cyber security threats. These tools work by collecting a lot of timestamped information about the machine, including which processes are running, network connections, IP addresses, file information, username, name and email address, among others. One may argue that whatever an employee does at work is and should be work-related. However, there is an emerging trend that work and leisure are co-mingled: it is becoming more normal to check work email in the evenings and to chat with friends and family during working hours. Some employees may also want to use the office computer for personal matters during a break or after work.

While this kind of software does not deal primarily with personal information, it may be possible to extrapolate or infer information about users by looking at the metadata collected by such security applications.

This is only one example of a computer application meant to provide some kind of service, where a lot of information is needed constantly for it to be effective and valuable. How can the users of such systems make an evaluation of what risks they expose themselves to (or are exposed to by others)? What tools are at their disposal, and what benefits do these tools provide?

1.2 Contribution

There is no standardized and easily comparable way of doing an after-the-fact third party privacy evaluation. The closest we get to a privacy evaluation is the Privacy Impact Assessment (PIA) or Data Protection Impact Assessment (DPIA), required under certain circumstances by the GDPR. These assessment frameworks require detailed information about stakeholders, system requirements, system design, and internal data structures, which may not be available to any outside entity. The PIA is not meant to be an after-the-fact evaluation, but a process for identifying and mitigating privacy threats as early as possible in the development of the system. However, it may be used to guide the discussion when evaluating an application.

Additionally, we may be able to learn something about other possible approaches by making an attempt to do the assessment even without the required documentation, policies, and stakeholders.

In this thesis, I will use the PIA framework by Oetzel and Spiekermann to guide the discussion when trying to evaluate the privacy impact of endpoint detection and response (EDR) applications. The framework consists of seven steps, which I will go through later. Steps 4 and 5 are the identification of threats and the corresponding controls. How these steps should be carried out is not specified in detail, and I will try to make use of the LINDDUN privacy threat modeling methodology to support the analysis in these steps.

The scope of the thesis is the evaluation of user privacy risks, not a risk management assessment from a data controller's perspective, which would depend on many factors, including its risk tolerance and resources. In rating risks and probabilities for an end user, I will need to make educated guesses.

My problem statement will be: “How can end users evaluate the privacy risks and potential impacts of a computer application?” What tools and methods exist today for end user evaluations of privacy risks in computer applications? Are there obstacles today for such end user evaluations, and if so what can be done to reduce such obstacles? What can be done to create a user-friendly solution where users can make an informed decision about whether to trust a specific piece of software?


Chapter 2

Background

2.1 Definition of Privacy

Privacy is a concept that has been given different meanings by different people and scholars.

The Harvard Law Review identified in 1890 that attention was needed to protect “the right to be let alone”, considering “instantaneous photographs” and the “newspaper enterprise” [9].

Alan Westin defined privacy in 1967 as “the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others” [10].

Nissenbaum defined privacy as “contextual integrity”, a construct meant to handle new challenges posed by new information technologies, citing the need for information gathering and distribution only in its defined context [11].

The Pennsylvania Law Review has even claimed that privacy is “a concept in disarray” and that nobody can articulate what it means [12].

It has been defined as a human right by the European Convention on Human Rights (ECHR). There it is described as the right to respect for one's “private and family life, his home and his correspondence” [13].

Sociologist Barrington Moore said that “the need for privacy is a socially created need. Without society there would be no need for privacy” [12].

Solove lists even more concepts associated with privacy [14]:

• Freedom of thought

• Control over one’s body

• Solitude in one’s home

• Control over personal information

• Freedom from surveillance

• Protection of one’s reputation

• Protection from searches and interrogation


2.2 Three waves of privacy definitions

Tomossy [15] shows that the concept of privacy has evolved over time and identifies three waves of evolution of its definition, which followed the development of technology and human interaction over time: The invention of photographs and printed media, information technology, and the growth of telecommunications and the internet.

The first wave was influenced by the work of Warren and Brandeis called “The Right to Privacy”. The emphasis was then on the protection of the private person, against the aforementioned inventions of “instantaneous photographs and newspaper enterprise”. A tort of invasion of privacy was established, but it did not apply to “matters of public or general interest” [9].

Solove explains how privacy relates to the individual and society. Society is made up of people and of unwritten social rules and norms that hold it together and limit deviations. However, individuals do not conform perfectly to society, and each has activities he does not want to share with the outside world, or which are seen as impure or unappealing.

Privacy lets people do activities they would otherwise not be able to do. These activities may not be a problem in themselves but are personal and do not relate to the rest of society [12]. Let us take the perfectly natural activity of human defecation as an example. This activity is an essential part of human digestion and necessary for human life. It is however very unappealing, has a possibly bad smell and involves natural reflexes that we are not in total control of. One would not be proud if one's toilet visits were broadcast to the rest of society, even though rationally it would not make any difference, since all others engage in the same activities, some of which may be even more unappealing.

An individual is either alone or interacting with society, and privacy in itself is only an interesting problem when the individual needs to interact with society. With a more interconnected society, there is a greater need for personal space to be defended, as identified by Warren and Brandeis.

The second wave came with the establishment of human rights and legal reforms after the Second World War. Privacy was seen not only as a private right but as a public good. According to Prosser [16], privacy was now seen not only as the right to be let alone but as four torts:

• Intrusion into a person's seclusion, solitude or private affairs;

• Public disclosure of embarrassing facts

• Publicly placing a person in a false light

• Appropriation of a person's name or likeness for the advantage of another

The “. . . claim of individuals to determine for themselves . . . ” [10] by Westin was added later, noting that government is controlled by publicity, while privacy serves as protection for individuals [16].

The third and current wave of privacy started with the exponential growth of the telecommunications and computer industries and the invention of social media. The definition has again shifted, and the discussion has involved equality, liberty, and freedom.

2.3 Privacy taxonomy

Seeing that there is a wide variety of definitions of the same subject, it seems that privacy is a concept that is not well understood, or perhaps has a different meaning in different contexts or for different users. Finding the correct definition, or taxonomy, is important if one is to use it as the basis for evaluation. A more structured and detailed explanation of the definition of privacy and its components is needed. There have been a few attempts at creating a general taxonomy:

Clarke introduced four types of privacy [17]:

• Privacy of the person

• Privacy of personal behaviour

• Privacy of personal communications

• Privacy of personal data

Eckhoff and Wagner argue that privacy should be categorized into the following five types [18]:

• Location

• State of Body & Mind

• Behaviour & Action

• Social life

• Media

The most extensive and encompassing taxonomy is given by Solove [12]. It is a general taxonomy of privacy, with subcategories. The taxonomy touches upon a lot of the previous definitions but gives a more detailed and exhaustive explanation. The taxonomy is focused on law and goes into detail in explaining the different subcategories with examples. It identifies four general “harmful activities” which are directly related to privacy violations:

• Information collection

• Information processing

• Information dissemination

• Invasion

The general ideas in the following are taken from “A Taxonomy of Privacy” by Daniel J. Solove [12]. I have omitted some minor areas unrelated to the task at hand.


Information collection

The basis for all privacy violations is information about an individual. Information collection is the act of collecting information using either interrogation or surveillance.

Surveillance can in itself be a tool of social control, because it “enhances the power of social norms”, which are more effectively enforced when people are being observed [19]. It limits personal expression and can lead to changed behavior, self-censorship, and inhibition. In other words, the act of surveillance has consequences in addition to the collected information.

Interrogation is defined as “the pressuring of individuals to divulge information”, either by questioning, social pressure or risk of alienation [12]. Interrogation is thus not limited to classic police interrogation techniques, but may also involve divulging personal information to get a job, or to avoid being left out of a social group.

Information processing

Information processing encompasses aggregation, identification, insecurity, secondary use, and exclusion.

Data aggregation is the act of gathering information from different areas, which may reveal information not intended to be revealed when the information was being gathered. Different data points may reveal parts of the whole and may show only parts of the truth. The aggregation of data may create connections that are not real, or reveal more than the person has agreed to share.

Identification is attaching a database of pre-gathered information to a physical identity. A lot of information can be collected on a person, but is relatively useless in itself. When the data is attached to a personal identifier, this can be hard to get rid of, may lead to biases, and limit the subjects' ability to express themselves freely.

Insecurity means the harm of being put at a disadvantage because of the information processing, or being vulnerable to future harm.

Secondary use is the use of collected and aggregated data for purposes not intended when the data was collected. This may also lead to the data being misunderstood when taken out of context, or the context being distorted. Together with data aggregation, this can lead to major privacy violations and a loss of personal control over the information.

Exclusion means not notifying the individuals originating the data about the data collection, or limiting their access to the data.

Information dissemination

Information dissemination is the act of releasing the processed information. It can come about in different manners and can take the form of disclosure, exposure or breach of confidentiality.

Disclosure is when factual information about a person is revealed to others. The harm of this is related to the potential damage to the person's reputation, and the dissemination of information beyond what the person had expected or consented to in the first place.

Exposure is when physical or emotional details about a person are revealed. These details may be normal and natural, but are part of the things we tend to keep secret. This may be sexual activity, naked images or anything a person may deem “animal-like” or disgusting. Showing pictures of a person's naked body is not unnatural or illegal if done by the person in the pictures, or in agreement with him. Normally, however, this is not something people want to share, and even though he may not suffer monetary losses, people may hold him in a different view afterward.

Breach of confidentiality is when trust in a specific relationship is violated, e.g. a doctor-patient relationship. The harm is not only the disclosure of the information itself, but also the betrayal of the victim.

Information dissemination may also come as a result of blackmail, where someone threatens to make true information about a person public; distortion, where false or distorted information may be revealed; or appropriation, where money or fame is acquired off a person's good name and character.

Invasion

The fourth harmful activity is invasion, either as intrusion or decisional interference.

Intrusion is invasion into a person's life that disturbs his daily activities and solitude. Intrusion means that a person is not able to rest in private from the pressures of social life. Intrusion in this context may, for example, be the constant surveillance of a person's habits during work.

Decisional interference means undue influence over a person's life and decisions, especially the decisions that are considered private, such as those concerning home, family and body.

2.4 GDPR

The General Data Protection Regulation (GDPR) was approved by the EU Parliament in 2016 and implemented in May 2018. It replaces the old Data Protection Directive 95/46/EC. It is designed to protect citizens' data privacy and to harmonize data privacy law across the European Union. It includes a set of rules for processing the personal information of natural persons in the European Union, a set of data subject rights, and corresponding duties for data controllers and data processors when handling personal information.

It is intended to change organizations’ attitudes towards data privacy rights and introduces significant fines for non-compliance and breach of these rights. It also covers transfers of such data to third countries, and a framework for cooperation, data sharing, and regulatory authorities in each member country.

The GDPR applies not only to organizations inside the EU, but also to all organizations outside the EU providing services relying on the personal information of citizens located inside the EU [20].

Personal information is usually described as personal data, which is essentially the same thing. I will for the most part refer to it as personal information in this thesis.

2.4.1 GDPR Privacy Principles

GDPR builds on the following privacy principles [1]:

• Lawfulness, fairness and transparency: Personal information shall be processed lawfully, fairly and transparently in relation to the data subject

• Accountability: The data controller is responsible for the data processing and must be able to demonstrate compliance

• Purpose limitation: Data processing should only be done based on

“specified, explicit and legitimate purposes” [1].

• Integrity and confidentiality: Personal information should be processed in a manner that ensures appropriate security, including protection against unauthorised or unlawful processing and against accidental loss, destruction or damage

• Accuracy: Personal information should be kept up to date and reasonable steps should be taken to ensure the accuracy of the data

• Storage limitation: Personal information should not be kept in a form that allows identification of data subjects for longer than necessary

• Data minimization: The collection and processing of personal information should be adequate, relevant and limited to what is necessary for the stated purpose

2.4.2 Data controllers

Under GDPR, a data controller is the entity that determines the purposes and means for which personal information is processed [21]. In the case of antivirus and EDR, the organization using the application to secure its own systems is the data controller.

2.4.3 Data processors

Data processors process data on behalf of the data controller and are usually a third party, external to the data controller. The duties and responsibilities of the data controller and the data processor must be specified in a contract “or another legal act” [21]. In the case of antivirus and EDR, the organization doing the actual processing of information in order to secure a customer's computer systems is the data processor.

2.4.4 Data Subject rights

The GDPR defines a number of Data Subject rights [1]:

1. The right to be informed

Articles 13 and 14 refer to information that must be provided to the data subject, either when data is collected directly from the data subject (Article 13), or when the data is collected indirectly or from a third party (Article 14).

The data controller is required to give the data subject information on who the data controller is, the legal basis for processing the information, if the information is shared with third parties, and if so safeguards to the data.

The data controller must also inform about the storage duration, legal basis and the purpose for processing. There are other points as well, including information about the use of automated decision-making or profiling, including the logic behind this decision-making.

Finally, the data controller should inform the data subject about the other rights, including the right to request access, rectification, erasure, objection and data portability, and, if applicable, the right to withdraw consent and to lodge a complaint with a supervisory authority.

2. The right to access

Article 15 of the GDPR states that the data controller should give the data subject access to the personal information being processed about him. This empowers the data subject to further exercise his (other) rights. Article 12 is also relevant, as it concerns transparent information and communication with the data subject.

The data subject should have access to his personal information if the data controller processes any. If so, the data subject can in addition ask about the reasons for the processing and how the information is processed, the categories of information, and the retention times. He also has the right to know who can see the information and, if it is not collected directly from the data subject, how it is collected. A data subject may also ask how the information is used for profiling and automated decision-making. The request should be answered within a month, and in most circumstances without paying a fee.

3. The right to rectification

The data subject shall have the right to obtain from the controller without undue delay the rectification of inaccurate personal information concerning him or her. Taking into account the purposes of the processing, the data subject shall have the right to have incomplete personal information completed, including by means of providing a supplementary statement.

Article 16 describes the data subject's right to rectification, meaning that the data controller shall change or modify the information they process at the request of the data subject if he believes the information is outdated, inaccurate or incomplete. This rectification should be made without undue delay.

4. The right to be forgotten

Article 17 treats the right to erasure, or “the right to be forgotten”. The right to be forgotten applies in the following instances: i) the personal information is no longer necessary in relation to the purposes for which it was processed; ii) the processing was based on consent, which the data subject has withdrawn; iii) the data subject objects to the processing without any “overriding legitimate grounds”; iv) the information was unlawfully processed; or v) the information must be erased to comply with a legal obligation or law.

If and when the information is deleted, the data controller must also take reasonable measures to inform other controllers or processors it cooperates with, so that they are able to erase links to, and copies of, the personal information.

5. The right to restriction of processing

Article 18 defines the right for the data subject to request “restriction of processing”. This means that the data controller must temporarily stop processing if one of the following is true: i) the data subject contests the accuracy of the data; ii) the data subject objects to unlawful processing and wants the data controller to restrict processing instead of deleting the information; or iii) the data controller has no other need for the data than to prepare for an eventual legal claim or defense.

When data processing is stopped temporarily, the data controller must inform the data subject before resuming the data processing.

6. The right to data portability

GDPR introduces the right to receive a copy of the personal information stored by the data controller in a commonly used “machine-readable” format, supporting easy transfer of personal information to another service provider. This applies only where the processing is based on consent or a contract and is carried out by automated means. This right only applies to information the data subject has provided directly to the data controller.

7. The right to object

The data subject has the right to object to the processing of his personal information, as described in Article 21. The controller should no longer process the personal information unless it demonstrates “legitimate grounds” overriding the objection. If the data is processed for marketing purposes, the data subject should be able to object to this marketing, including any profiling, and the data controller may no longer contact the data subject for marketing.

These rights should be presented to the data subject at the latest at the time of the first communication with the data controller.

8. The right to not be subject to a decision solely based on automatic decision-making

There is also the right to avoid decisions solely based on automatic processes, which “produces legal effects concerning him or her or similarly significantly affects him or her”. See Article 22 of the GDPR [1].

This right does however not apply if the decision is necessary for a contract between the data subject and a data controller, if it is authorized by law which also safeguards the data subject's rights, or if it is based on the data subject's explicit consent.

This does not mean that the data subject may avoid any automatic decision-making, but that he is able to contest the decision and demand a second look by a human.

2.5 Privacy by Design

Privacy by Design is a concept created by Ann Cavoukian when she was Information and Privacy Commissioner in Canada. It is included in the GDPR as data protection by design and by default (Article 25). It contains seven principles [22]:

1. Proactive not Reactive; Preventative not Remedial

2. Privacy as the Default Setting

3. Privacy Embedded into Design

4. Full Functionality: Positive-Sum, not Zero-Sum

5. End-to-End Security: Full Lifecycle Protection

6. Visibility and Transparency: Keep it Open

7. Respect for User Privacy: Keep it User-Centric

2.6 Analysis context

This section will explain the context of the analysis, including Cybereason and the technologies it uses.

2.6.1 NGAV and EDR

Next Generation AntiVirus (NGAV) and Endpoint Detection and Response (EDR) are marketing terms for computer software that provides anti-malware services (NGAV) and firewall and intrusion detection services for end user machines (EDR).


Traditional antivirus

Malware was traditionally categorized by the activity of an executable that was either a malicious file itself or contained vulnerabilities that could be exploited. A piece of malware could be called a computer virus, worm, trojan horse, logic bomb, backdoor, botnet, ransomware, rootkit, keylogger or some other name.

Traditionally, executables which were inherently malicious or contained malicious code were stopped by using antivirus software which used a combination of the following techniques for identifying and handling malware [23]:

• On-Access Scanning

• Background scanning

• Heuristic checking

• Content scanning

• Full system scans

Antivirus software contains or queries a large database of file signatures and possibly a set of heuristic characteristics for identifying malware. This is used when opening a file, either directly or as a background process. Antivirus software is able to review the contents of a file before opening it and identify malicious behavior already seen in registered malware. This database needs to be kept continuously updated to be effective. Even though heuristic algorithms are used, the software usually relies on simple techniques such as hash values, which can easily be evaded. Antivirus software generally does not identify system behavior over time, and the scans are done on an independent and isolated basis, making it unable to identify complex and changing behavior. In short, antivirus software protects an endpoint from isolated and known malware attacks.
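To make the signature-matching idea concrete, here is a minimal Python sketch of the hash-based technique described above; the signature set, file name and quarantine step are illustrative placeholders, not taken from any real product.

```python
import hashlib
from pathlib import Path

# Placeholder signature database: SHA-256 digests of known-malicious files.
# A real antivirus product queries a far larger, continuously updated store.
KNOWN_MALWARE_HASHES = {
    "0" * 64,   # dummy entry standing in for a real signature
}

def is_known_malware(path: Path) -> bool:
    """Return True if the file's SHA-256 digest matches a known signature."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return digest in KNOWN_MALWARE_HASHES

# Hypothetical on-access check of a downloaded file:
# if is_known_malware(Path("download.exe")):
#     quarantine_file(Path("download.exe"))   # quarantine_file() is assumed, not shown
```

Because the lookup is an exact hash match, changing a single byte of the executable produces a different digest and bypasses the check, which is why such signature-only scanning is easy to evade and why EDR tools instead look at behavior over time.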

NGAV is the name given to antivirus software that uses newer technology such as artificial intelligence and machine learning to provide antivirus services. The name is more of a marketing slogan than a new technology, and there is no formal definition of the term [24].

EDR tools work by installing a sensor on each endpoint in a computer network that continuously monitors user and system activity over time and stores this information in a central database. The database collects information from a number of endpoints in the system. The EDR solution uses this information for analysis and detection of malware. The database is connected to an external signature database which contains updated signatures and patterns for known malware.

The main difference between EDR and traditional antivirus software is that events are not seen in isolation, neither on one machine nor at one point of time. Data from different sources are correlated and analyzed, making the tool capable of early detection of an attempted attack, even for new attacks that have no former fingerprint.
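As an illustration of what correlating data from different sources over time can mean in practice, the following sketch defines a hypothetical timestamped endpoint event record and a trivial cross-endpoint rule; the schema and the rule are my own simplified assumptions, not Cybereason's actual data model.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class EndpointEvent:
    """One timestamped observation reported by an endpoint sensor (hypothetical schema)."""
    timestamp: float   # epoch seconds
    hostname: str
    username: str
    event_type: str    # e.g. "process_start", "dns_query", "connection"
    detail: str        # e.g. process name, queried domain, remote IP

def domains_seen_on_many_hosts(events, min_hosts=3):
    """Flag domains queried from several different hosts: a simple cross-endpoint
    correlation that an isolated, per-file antivirus scan cannot perform."""
    hosts_per_domain = defaultdict(set)
    for event in events:
        if event.event_type == "dns_query":
            hosts_per_domain[event.detail].add(event.hostname)
    return {domain for domain, hosts in hosts_per_domain.items() if len(hosts) >= min_hosts}
```

Note that the same fields that make correlation possible (timestamp, hostname, username, detail) are exactly the metadata from which a user's personal activity patterns could later be inferred, which is the privacy concern examined in this thesis.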


Anton Chuvakin first coined the name EDTR (Endpoint Threat Detection & Response), which was later shortened to EDR. The name emphasizes that it deals with threats to the endpoint, rather than threats to the network [25].

EDR tools also feature a dashboard where alerts are shown and suspected malware is given a criticality rating, to make prioritizing potential malware easier. There is also user-friendly access to investigations, security alerts, and logging.

Artificial intelligence and machine learning are also leveraged in EDR to be able to detect previously unseen malware through behavior analysis.

Sensors can be installed on domain controllers, database servers, and workstations and can provide real-time information about the state of the system and make it easier and quicker to identify and handle security incidents. [26] To investigate security incidents, all relevant data must be gathered and analyzed. By using continuous monitoring and data collection as done in EDR tools, most of the needed data in an investigation may already be collected when the investigation begins.

2.6.2 Cybereason

Cybereason is a cyber security company established by security experts formerly of the Israel Defence Forces Unit 8200 [27], and it now has its headquarters in Boston, US. Cybereason specializes in EDR, NGAV and Managed Detection and Response (MDR) software.

Its software uses machine learning and behavior-based detection. It collects information about computers on the network and uses a proprietary “hunting engine” to analyze the collected data. It detects incidents or “malops” across the network and offers automatic remediation tools in its user interface. A closer description of the software is given in the privacy analysis.


Chapter 3

Methodology

3.1 Privacy Impact Assessment

“Where a type of processing in particular using new technologies, and taking into account the nature, scope, context and purposes of the processing, is likely to result in a high risk to the rights and freedoms of natural persons, the controller shall, prior to the processing, carry out an assessment of the impact of the envisaged processing operations on the protection of personal information. A single assessment may address a set of similar processing operations that present similar high risks” [1].

“A Privacy Impact Assessment (PIA) is a systematic process for identifying and addressing privacy issues in an information system that considers the future consequences for privacy of a current or proposed action”[28].

Article 35 of the GDPR requires that the data controller carry out a Data Protection Impact Assessment (DPIA) where the data processing is likely to result in a high risk to the rights and freedoms of natural persons.

This should be done and documented before starting the processing. The distinction between a PIA and a DPIA is not great, but a PIA emphasizes an assessment of privacy in general, whereas the DPIA in the GDPR focuses on data protection, which may be a narrower concept. I will use the acronym PIA in the following, since I am not doing the assessment to be compliant with the GDPR.

After explaining the concept of the PIA and the steps in the PIA process, I will go through some evaluation criteria for assessing whether a high risk is envisioned for the data processing, before doing the PIA itself.

In the GDPR, the burden of managing user risks is placed on the data controller, which has to take into account risks of varying likelihood and severity for the rights and freedoms of natural persons [1].

The obligation to conduct a PIA should be seen in the context of the data controller's general obligation to manage risks to the rights and freedoms of the natural persons whose personal information is being processed.

A PIA is not a privacy strategy implemented by an organization, but is rather done on a project basis. It should be done in advance of the project rather than after the fact, and is thus separate from a privacy audit. The PIA should be done as early as possible, both to avoid incurring unnecessary cost and to follow the Privacy by Design principle that privacy should be embedded into the design.

The PIA should be based on an understanding of the privacy risks to the data subjects and society in general. It should also have a broad scope and take into consideration the interests of the affected persons, not only the organization heading the PIA and its partners. It is important to assess whether the processing is necessary and proportionate to its stated objective.

The group doing the analysis should try to identify all privacy risks and negative impacts. The goal should be to mitigate all identified risks, and residual risks should be documented and justified.

3.1.1 Benefits of the PIA

The PIA/DPIA is not only a document that helps reduce risks to the data subjects, but also brings benefits to the organization conducting it in different ways. Tancock et al. have made a list of potential benefits of the DPIA [28]:

Meeting and Exceeding Legal Requirements: The DPIA may help the organization to meet and even exceed legal requirements. It is in the organization's interest to assure compliance with legal requirements to avoid unnecessary legal fees. The GDPR introduces substantial fines for non-compliance.

Needing to be Prospective: The DPIA forces the organization to be prospective, meaning that they are able to “identify, avoid, mitigate, stop, or suggest alternative solutions to privacy risks”.

Avoiding Unnecessary Costs: An early implementation of the DPIA may also avoid unnecessary costs and expensive changes later in the project, when changes are relatively more expensive.

Eliminating Inadequate Solutions: Inadequate solutions may also be avoided, due to privacy issues being identified at the start of the project.

Avoiding Loss of Trust and Reputation: The DPIA may also help the company keep its trust and reputation. If the system is designed well and privacy flaws are not implemented, there is no reason for concern for customers or the affected persons. The reputation is maintained or even strengthened in an organic way, not because critical attention is kept away from the organization, but because privacy issues are handled in the design of the system. People who may be concerned about potential privacy risks can be reassured, and this may help the adoption of the product and support from stakeholders at an early stage of the project.

Informing Decision Makers and Stakeholders: Lastly, decision-makers and stakeholders are better informed, giving them a clearer understanding of the project and making it easier to have their own perspectives heard.

3.1.2 What entails a high risk?

As we have seen, a PIA is needed and required where the data processing is likely to result in a high risk to the rights and freedoms of natural persons. In evaluating whether or not a high risk may be envisioned, the GDPR Article 35(3) outlines a few examples for when the processing operation is likely to result in high risks [1]:

• (a) a systematic and extensive evaluation of personal aspects relating to natural persons which is based on automated processing, including profiling, and on which decisions are based that produce legal effects concerning the natural person or similarly significantly affect the natural person;

• (b) processing on a large scale of special categories of data referred to in Article 9(1), or of personal data relating to criminal convictions and offenses referred to in Article 10; or

• (c) a systematic monitoring of a publicly accessible area on a large scale.

There are, however, other circumstances in which a PIA is required. The Article 29 Data Protection Working Party (WP29) has made a set of guidelines for the PIA to ensure that the GDPR is applied consistently. They have developed the following nine criteria for assessing whether the risk is “high” [1, 29]:

1. Evaluation or scoring. The data subject has the right to an evaluation not solely based on automatic processing. This covers any automatic decision-making, including profiling, which may result in an automatic refusal such as an automatically denied credit application or job application. Recital 71 states that this is “in particular to analyze or predict aspects concerning the data subject's performance at work [. . . ]”, among other examples. The data controller is required to use “appropriate mathematical or statistical procedures” for such profiling, and to implement measures to minimize the risk of errors. Recital 91 describes that an assessment is required for large-scale processing which could affect a large number of data subjects.

2. Automated decision-making. A DPIA is also required if processing with automated decision-making produces legal effects for data subjects, or “significantly affects the natural person” (see Article 35(3)). This could be any exclusion or discrimination resulting from the data processing.

3. Systematic monitoring. This applies if processing is done to monitor or control data subjects, where data may be collected through networks or by monitoring a publicly accessible space. In these circumstances, personal information may be collected without data subjects being aware of what is collected and how it is used. It may also be impossible for data subjects to avoid the data collection.

4. Sensitive data or data of a highly personal nature. This applies if the personal data is sensitive (containing, for instance, health data or criminal offences), is linked to household or private activities (for instance the contents of text messages), or impacts the exercise of a fundamental right (for instance location data impacting the right of free movement).

5. Data processed on a large scale. The WP29 recommends the following factors to assess whether data is processed on a large scale [29]:

a. the number of data subjects concerned, either as a specific number or as a proportion of the relevant population;

b. the volume of data and/or the range of different data items being processed;

c. the duration, or permanence, of the data processing activity;

d. the geographical extent of the processing activity.

6. Matching or combining datasets. Aggregating datasets creates risks for the data subject, as explained by Solove, both by linking information which together reveals more than intended, and by possibly creating false connections. This holds especially when the aggregation exceeds what the data subject would reasonably expect.

7. Data concerning vulnerable data subjects. One of the principles in the GDPR is that an imbalance of power between the data controller and the data subject may hinder the data subject's ability to consent, oppose, or in other ways exercise their rights. Vulnerable data subjects may be children, employees, and the elderly.

8. Innovative use or applying new technological or organizational solutions. The use of new technology for which a similar assessment has not previously been done can mean that a DPIA is needed. The new technology may result in new collection and usage with high risk to the data subjects' rights and freedoms.

9. The processing itself prevents data subjects from exercising a right or using a service or a contract. This covers any processing that allows, modifies, or refuses data subjects' access to a service or a contract.


3.1.3 Choosing PIA framework

There exist a few frameworks for carrying out a PIA:

• Conducting Privacy Impact Assessments Code of practice, Information Commissioner’s Office (ICO) [30]

• Privacy and Data Protection Impact Assessment Framework for RFID Applications [31]

• Privacy Impact Assessment (PIA), Commission nationale de l’informatique et des libertés (CNIL) [32]

• ISO/IEC 29100:2011 - Information technology - Security techniques - Privacy framework [33]

• A systematic methodology for privacy impact assessments: a design science approach [34]

I have chosen to use the PIA framework by Oetzel and Spiekermann. Its predecessor (the RFID PIA) was endorsed by the Article 29 working party [29].

It lists many of the other PIA frameworks and criticizes them for lacking a wider understanding of privacy. It also states that privacy is more than data protection as outlined in law and regulation. It builds on Solove's privacy taxonomy and extends it with a comparison with data protection.

Additionally, it seems to be one of the more mature frameworks for privacy impact assessment.

Other frameworks could have been used as well, but did not seem fitting. The ISO standard [33] is very comprehensive and may be unfit for a single end user. The current draft of the ICO's Privacy Impact Assessment Code of Practice is rather general and is not standardized or streamlined for use in all situations [35]. It does not have any input or output factors to help the analysis, and individual risks are not matched with corresponding privacy controls [34]. The RFID PIA framework is very similar to the chosen one, but older and less developed. The PIA framework from the French CNIL could also have been used; it comes with a methodology, templates and a knowledge base [36]. The chosen framework is more scientific, and thus seems more fitting to the task at hand.

3.2 Privacy Impact Assessment

3.2.1 Is a PIA required?

I will use the PIA as a tool for evaluating privacy impacts on the Cybereason platform. However, I will first make an assessment of whether a PIA would be required for such an application, both in terms of the GDPR and of the WP29 criteria that we saw in section 3.1.2.

As mentioned earlier, a PIA is required when the data processing is likely to result in a high risk to natural persons. Having GDPR article 35(3) in mind [1], let us evaluate if a PIA would be required for the data controller.


A high risk would be envisioned when there is a “systematic and extensive evaluation of personal aspects” (...) “which is based on automated processing” (...) (Article 35(3)a). We do not know to what degree automatic processing or decision-making is based upon personal information, but we know that some personal information is being collected. The answer to the first part is therefore probably yes, but the processing may not produce legal effects upon the natural person [1].

No special categories of data are being collected in Cybereason as listed in GDPR article 9(1), so article 35(3)b would not lead to the requirement of the PIA. No systematic monitoring of publicly accessible areas is going on, so 35(3)c also does not apply.

Let us also look at the WP29 criteria for high privacy risk. The first one, evaluation and scoring, is less relevant, as scoring is done not in terms of the affected user but in terms of security events. Automated decision-making (2) is definitely present, but it is not the personal information of the affected user that is interesting in these decisions.

Cybereason obviously checks off the third criterion, “systematic monitoring”. There is comprehensive real-time monitoring, and personal information is being collected from the user, perhaps (and probably) without his awareness. Checking off these criteria should already mean that a PIA (or a privacy evaluation) is required, but we will also look at the rest of the list.

No sensitive data, such as health information or household activities, is collected in Cybereason (4). Data is being processed on a large scale, as criterion 5 outlines, both in terms of the number of data subjects, the volume of data and the duration. The aggregation of datasets (6) is a concern, if not necessarily for the functioning of the application itself, then at least if the information is exported outside the application and used for nefarious purposes.

Data is being collected from employees, who are considered vulnerable data subjects (7). There may be an imbalance of power between the employer and employee in this regard.

Cybereason and its EDR and NGAV technologies may be seen as “innovative use or applying new technological (...) solutions” [29] (8), but machine learning is essentially statistical analysis, and behavioral analysis may not be anything revolutionary. The last criterion is not relevant to this discussion.

I think I have made the case that a PIA would be necessary both in terms of the GDPR and the WP29 high-risk criteria. If the resulting risk to the data subject is likely to be high by these standards, a privacy evaluation by the end user would also be necessary and possibly very helpful in understanding how the software processes his personal information.
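The assessment in this section can be summarized as a simple checklist. The sketch below merely restates my readings of the nine WP29 criteria for Cybereason (the True/False values are judgments from the discussion above, not authoritative findings), together with the WP29 rule of thumb that processing meeting two or more criteria will in most cases require a DPIA.

```python
# My reading of the nine WP29 high-risk criteria for Cybereason, as argued above.
wp29_criteria = {
    "evaluation or scoring": False,            # scoring targets security events, not persons
    "automated decision-making": True,         # present, though not aimed at the affected user
    "systematic monitoring": True,             # comprehensive real-time monitoring of users
    "sensitive or highly personal data": False,
    "large-scale processing": True,
    "matching or combining datasets": True,
    "vulnerable data subjects": True,          # employees
    "innovative technology": False,            # debatable
    "prevents exercising rights or services": False,
}

criteria_met = sum(wp29_criteria.values())
likely_required = criteria_met >= 2   # WP29 rule of thumb
print(f"{criteria_met} of 9 criteria met; DPIA likely required: {likely_required}")
```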

3.2.2 Privacy Impact Assessment of Cybereason

The chosen PIA framework consists of seven steps, is based on the German BSI risk method [37], and is very similar to the RFID PIA framework [31].

The seven steps can be seen in figure 3.1.


There should be a thorough description of the application, so that there is a good foundation upon which to base the rest of the assessment. The next step is describing the privacy principles which may be vulnerable, or rather the privacy targets the application should uphold. These privacy targets should be evaluated and their relative importance assessed. For each privacy target, threats to the target should be identified. This should be based on the organization's risk assessment, because each threat should be countered and mitigated by one or more privacy controls [34]. Step 5 is the identification and recommendation of controls for each threat. If some threats are not mitigated by any planned controls, this should be justified and documented. Finally, the whole process should be documented in a PIA report.

I will now go through the steps in the process in more detail.

3.2.3 Step 1: Characterisation of the application

The foundation for the privacy risk assessment is a thorough description of the application and its internal and external environments. The design and architecture of the system, its communication with other systems, and its data flows should be made clear. The goal is to describe the system in as much detail as possible to detect all potential privacy threats.

The documentation of the system should be in the form of four views [34]:

1. System view: application and system components, hardware, software, internal and external interfaces, network topology;

2. Functional view: generic business processes, detailed use cases, roles and users, technical controls;

3. Data view: categories of processed data, data flow diagrams of internal and external data flows, including actors and data types;

4. Physical environment view: physical security and operational controls such as backup and contingency.

To be complete, all four views should be included, and all system components and interfaces that process personal information should be described. All data flows should include all actors transmitting the information. Already existing system descriptions or design documents may need to be modified to focus on the data flows into and inside the system.

One important aspect of this step is finding out where the relevant system boundaries should be placed, and what should be included in the analysis. The PIA framework’s authors argue that the system boundaries are reached when “data flows end or none of the internally or externally adjacent systems are relevant for privacy” [34]. This means that as long as personal information is used or stored in the system, its components and data flows should be included in the analysis.
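One way to read this boundary rule is as a reachability question over the data-flow graph: a component belongs inside the analysis boundary if personal information can reach it. The sketch below illustrates this idea with an invented set of components; the framework does not prescribe any such algorithm, and the component names are hypothetical.

```python
from collections import deque

# Hypothetical data-flow graph: component -> components it sends personal data to.
# All component names are invented for illustration.
personal_data_flows = {
    "endpoint sensor": ["detection server"],
    "detection server": ["analyst console", "cloud analytics"],
    "cloud analytics": ["threat intel feed"],
    "analyst console": [],
    "threat intel feed": [],
}

def analysis_boundary(flows: dict[str, list[str]], sources: list[str]) -> set[str]:
    """Return every component reachable by personal data from the given sources.

    These components (and the flows between them) fall inside the PIA's system
    boundary; components that personal data never reaches can be left out.
    """
    inside = set(sources)
    queue = deque(sources)
    while queue:
        component = queue.popleft()
        for receiver in flows.get(component, []):
            if receiver not in inside:
                inside.add(receiver)
                queue.append(receiver)
    return inside

if __name__ == "__main__":
    print(sorted(analysis_boundary(personal_data_flows, ["endpoint sensor"])))
```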


Figure 3.1: The privacy impact assessment process steps (Source: [34]).

3.2.4 Step 2: Definition of privacy targets

The second step is defining the privacy targets of the application in question. There is a substantial difference between privacy principles, legal privacy principles, and the description of a computer application and its components and functions. Legal privacy principles are not easily translated into concrete targets, and privacy principles are more generic than descriptions of system functions [34]. The PIA framework focuses on privacy targets, which are concrete privacy goals that are in line with the new GDPR regulation. A table of mappings between privacy principles and privacy targets is included, which takes laws and regulations into account (figure 3.2). These mappings should be used as a guide when describing the privacy targets of the system.

In addition to focusing on the legal aspects, the PIA framework recommends stakeholder involvement. Stakeholders are important because society changes, and existing laws may not take all ethical and societal effects into consideration [34].

The output of this step is a list of privacy targets, where each target must be described in the context of the application being analyzed. More targets may be added and explained in the same way.

3.2.5 Step 3: Evaluation of degree of protection demand for each privacy target

Each privacy target from step 2 should be ranked, and the priorities of the privacy requirements for the system should be identified. The level of "protection demand" is determined by the organization creating damage scenarios, asking "What could be impacted if privacy target was not met...?" [34].

The impacts can affect both the owner of the system (the data controller) and the data subjects. Authors on risk analysis such as Fritsch and Abie [frabe] also talk about the duality of the impact of privacy risks: the data controller may suffer reputational damage leading to loss of branding or customer trust, fines or expenses due to legal actions, or lost opportunities due to exclusion and lost market share. The data subject may suffer loss of reputation, dangers to their health and freedom, confusion about what others know about them, and loss of trust.

Since the personal impacts of privacy breaches are "softer" in nature and difficult to calculate, there is no quantitative calculation of monetary impact as in other security assessment frameworks and standards [34]. The impacts are rated on a scale from 1 (limited) to 3 (devastating), which corresponds to a required degree of protection from low (1) to high (3), meaning that privacy targets with limited impacts do not require the same strength of privacy controls as privacy targets with devastating impacts.
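The mapping from impact rating to protection demand can be sketched directly. In the snippet below, the intermediate labels ("significant", "medium"), the choice of taking the worst of the controller-side and subject-side impacts, and the example damage scenario are all assumptions made for illustration; the framework itself only defines the 1 to 3 scale.

```python
# Minimal sketch of step 3: deriving the degree of protection demand from
# rated damage scenarios. The intermediate labels and the worst-case rule are
# assumptions of this sketch; the framework only defines the 1-3 scale.

IMPACT_LABELS = {1: "limited", 2: "significant", 3: "devastating"}  # label for 2 is assumed
PROTECTION_DEMAND = {1: "low", 2: "medium", 3: "high"}              # label for 2 is assumed

def protection_demand(controller_impact: int, subject_impact: int) -> tuple[str, str]:
    """Return (worst impact label, protection demand) from the 1-3 ratings."""
    worst = max(controller_impact, subject_impact)
    return IMPACT_LABELS[worst], PROTECTION_DEMAND[worst]

# Hypothetical damage scenario for an invented privacy target.
scenario = "What could be impacted if the privacy target 'transparency' was not met?"
impact, demand = protection_demand(controller_impact=2, subject_impact=3)
print(scenario)
print(f"Worst impact: {impact} -> protection demand: {demand}")
```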

3.2.6 Step 4: Identification of threats for each privacy target

The output from step 3 is a list of ranked privacy targets and their impact degrees. In step 4, the potential threats to the privacy targets are identified, along with the probability of these threats materializing and preventing the realization of the privacy targets. To be included, a threat must correspond directly to one or more of the privacy targets.

The framework does not specify how to identify these threats, only that they should be identified and mitigated by corresponding controls. There are a number of different threat lists, such as the study by ENISA [38], the PIA framework by the French Data Protection Office CNIL [32] and the threat modeling methodology LINDDUN [39], as well as lists of privacy breaches, including Wikipedia [40], and research about the costs of privacy breaches, such as the study by Ponemon [41]. LINDDUN is one of the most well-known methodologies for privacy threat modeling, and even though it uses another list of privacy principles as its goal, I will try to make use of it later when doing the threat identification and selection of controls.

Figure 3.2: Privacy principles and privacy targets (Source: [34]).

3.2.7 Step 5: Identification and recommendation of controls suited to protect against threats

The most important step of the PIA is finding the right controls to mitigate the threats identified and minimize the risks to the privacy of the data subjects. The chosen controls are either technical or non-technical. Technical controls are implemented in the architecture or functionality of the application, whereas non-technical controls may be administrative controls or internal policy. The framework provides a table with generic examples of controls for the list of privacy targets.

The other dimension of controls is preventative versus detective. Preventative controls prevent the privacy breach altogether, whereas detective controls aim to detect and notify when violations or attempted violations take place [34].

One control can mitigate more than one threat.

Since the most intensive controls require more work and resources, it is not possible to enforce the same amount of rigor for every control. The framework recommends three levels of rigor for controls: (1) satisfactory, (2) strong and (3) very strong [34]. The level of rigor required depends on the protection demand as determined in step 3, meaning that privacy threats with a high demand for protection combined with a high likelihood of occurring should be mitigated with very strong controls [34].
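This relationship between protection demand, likelihood and control rigor can be sketched as a small lookup table. Only the corner case (high protection demand combined with high likelihood requires very strong controls) is stated by the framework; the rest of the matrix below is an assumption made for illustration.

```python
# Illustrative lookup of control rigor from protection demand and threat
# likelihood (both on a 1-3 scale). Only the high/high corner case is stated
# by the framework; the remaining cells are assumptions of this sketch.

RIGOR_LABELS = {1: "satisfactory", 2: "strong", 3: "very strong"}

RIGOR_MATRIX = {
    # (protection demand, likelihood) -> rigor level
    (1, 1): 1, (1, 2): 1, (1, 3): 2,
    (2, 1): 1, (2, 2): 2, (2, 3): 2,
    (3, 1): 2, (3, 2): 3, (3, 3): 3,
}

def control_rigor(protection_demand: int, likelihood: int) -> str:
    """Map protection demand and likelihood to the recommended level of rigor."""
    return RIGOR_LABELS[RIGOR_MATRIX[(protection_demand, likelihood)]]

print(control_rigor(3, 3))  # "very strong", as the framework requires
```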

3.2.8 Step 6: Assessment and documentation of residual risks

After all privacy controls are found, it should be analyzed whether they will work and how effective they will be. Stakeholders should be invited to discuss this and to compare the chosen controls with alternatives. When all of the controls are evaluated, a plan should be made for how to implement them in the system requirements and design. [34]

Some threats may remain without mitigation, and some threats may be only partly covered by a control. These threats are called residual threats. They should be documented together with a justification for not addressing them completely. Whether a threat should remain unhandled or require a control depends on the overall risk management system of the organisation. [34]

3.2.9 Step 7: Documentation of PIA process

The last step is to document the PIA process. This framework emphasizes meeting the expectations of the different groups reading the PIA report.

Corporate risk management, IT and upper management will need to be invested in privacy risks, both because of financial liabilities and because they may be held responsible if privacy breaches occur. [34]

Data protection authorities will in most countries have the legal right to review the PIA process, what risks the organization faces, and what the residual risks may be. Customers, media and other stakeholders will also have an interest in how personal information is processed, what the risks are, which controls have been selected, and how the decisions were made. Additionally, when privacy breaches do occur, a comprehensive PIA report can help identify what went wrong, who is responsible, and how to avoid a new breach. [34]

3.3 Threat Modeling

Threat modeling is the creation of models to find vulnerabilities in an application that may leave it open for exploitation by threats. When threat modeling, we try to find bugs and weaknesses early in the development, before they turn into vulnerabilities. Creating models in this way helps us understand the application better, which not only results in a more secure application but also a more trustworthy one.

According to Adam Shostack, threat modeling consists of four essential questions that must be answered [42]:

1. What are you building? - Describe the system, create a DFD
2. What can go wrong with it once it's built? - Find threats using STRIDE
3. What should you do about those things that can go wrong? - Manage and address threats
4. Did you do a decent job of analysis? - Validate results

Table 3.1: Threat modeling (Source: [42])

The first question is about modeling the application, using diagrams such as Data Flow Diagrams (DFD). The diagrams should have clearly separated trust boundaries. These are places in the application where the level of trust and reliability changes, and can also be seen as zones of differing security requirements and management. A DFD will be made when analyzing the Cybereason architecture in the PIA.
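To make the first question more concrete, a DFD can be represented as elements (external entities, processes and data stores), the data flows between them, and the trust boundary each element belongs to. The sketch below is a heavily simplified, hypothetical model with invented element names; it is not the actual Cybereason DFD, which is developed later in the PIA.

```python
from dataclasses import dataclass

# Hypothetical, heavily simplified DFD model. Element and boundary names are
# invented; the actual Cybereason DFD is developed later in the PIA.

@dataclass(frozen=True)
class Element:
    name: str
    kind: str            # "external entity", "process" or "data store"
    trust_boundary: str  # the zone of trust the element belongs to

@dataclass(frozen=True)
class DataFlow:
    source: Element
    destination: Element
    data: str

    def crosses_boundary(self) -> bool:
        """Flows that cross a trust boundary deserve extra scrutiny."""
        return self.source.trust_boundary != self.destination.trust_boundary

user = Element("end user", "external entity", "endpoint")
sensor = Element("sensor process", "process", "endpoint")
server = Element("detection server", "data store", "vendor cloud")

flows = [
    DataFlow(user, sensor, "process and file activity"),
    DataFlow(sensor, server, "telemetry that may contain personal information"),
]

for flow in flows:
    marker = "!" if flow.crosses_boundary() else " "
    print(f"{marker} {flow.source.name} -> {flow.destination.name}: {flow.data}")
```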

The second question is about different strategies for identifying threats to the application, from the perspective of assets, software or attackers. Methods include the STRIDE mnemonic, attack trees, attack libraries and attack tools [43].

The mitigations and controls to manage the identified threats depend on the threats themselves, but they should be found and validated.


3.4 LINDDUN

The threat modeling questions by Shostack are focused on general security threats and not privacy threats. LINDDUN builds on security threat modeling and is a method for modeling privacy threats. The methodology may not be used as a standalone tool for evaluating privacy impact, as it does not include methods for assessing legal compliance and risk [44], but it can help identify threats when doing the PIA analysis.

LINDDUN is a methodology using Data Flow Diagrams (DFD) as the basis for threat modeling. The letters L, I, N, D, D, U, and N are an acronym for the seven threat categories Linkability, Identifiability, Non-repudiation, Detectability, Disclosure of Information, Unawareness and Non-Compliance.

I will go through these concepts so that they are clear when identifying threats.

The methodology provides an overview of common attack paths based on these categories, presented as attack trees with potential causes of threats. Each tree in this presentation is related to one of the seven categories and applied to one of the element types in the DFD. [45]

The output of the methodology is the mitigation strategies and privacy- enhancing solutions needed for protecting the privacy of the personal information in the system which is being analyzed.

Figure 3.3: The six steps of LINDDUN (Source: [45]).

As seen in figure 3.3, LINDDUN consists of six steps. First, a DFD is created based on a description of the system. There can be more than one DFD, depending on the detail and complexity of the system. Then privacy threats are mapped to the elements in the DFD. A catalog of common threats with corresponding preconditions can be used to identify threat scenarios.

The threats are then prioritized based on a risk assessment (which is not included in the methodology). For each threat, the corresponding threat mitigation strategy and privacy-enhancing solution, technology or requirement is chosen or implemented. [45]
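The mapping step can be illustrated by encoding which threat categories apply to which DFD element types and enumerating the candidate threats for each element. The mapping below is abbreviated and partly assumed from the LINDDUN documentation, so the official mapping table [45] should be consulted when doing the actual analysis.

```python
# Abbreviated and partly assumed mapping of LINDDUN threat categories to DFD
# element types; consult the official LINDDUN mapping table before relying on it.

MAPPING = {
    "Linkability":               {"entity", "data flow", "data store", "process"},
    "Identifiability":           {"entity", "data flow", "data store", "process"},
    "Non-repudiation":           {"data flow", "data store", "process"},
    "Detectability":             {"data flow", "data store", "process"},
    "Disclosure of information": {"data flow", "data store", "process"},
    "Unawareness":               {"entity"},
    "Non-compliance":            {"data flow", "data store", "process"},
}

def candidate_threats(element_type: str) -> list[str]:
    """List the LINDDUN categories to consider for a single DFD element type."""
    return [category for category, types in MAPPING.items() if element_type in types]

if __name__ == "__main__":
    # Invented example elements; the real elements come from the DFD in step 1.
    for name, kind in [("end user", "entity"), ("telemetry upload", "data flow")]:
        print(f"{name} ({kind}): {', '.join(candidate_threats(kind))}")
```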

LINDDUN is based on a few privacy-related concepts, which are presented below. Most of the information is from the tutorial which can be found on their webpage. [45]
