
Privacy Analysis of Smart TV Communication



A case study of privacy threats in Smart TVs

Abdulaziz Abdugani

Thesis submitted for the degree of

Master in Informatics: Programming and System Architecture

60 credits

Department of Informatics

Faculty of mathematics and natural sciences

UNIVERSITY OF OSLO


© 2020 Abdulaziz Abdugani

Privacy Analysis of Smart TV Communication http://www.duo.uio.no/

Printed: Reprosentralen, University of Oslo


Abstract

The increasing popularity of Internet–connected TVs promises new conveniences but may also introduce new privacy concerns. Smart TV vendors have the power to gather many types of information from consumers who use a Smart TV. Unlike traditional TVs, many modern Smart TVs have cameras, microphones and other sensors that constantly monitor details of consumer usage. There is therefore a need to study how Smart TV vendors gather data about their consumers and how this information is transmitted over the Internet.

In this thesis, five Smart TVs were tested to see whether vendors follow their own policies. A single case study was conducted in which each Smart TV was monitored to observe how it communicates with its vendor and other third parties when the vendor's privacy policy is accepted or declined.

This was tested in two states: in one state the privacy policy was accepted, while in the other it was declined. Data was collected by intercepting and capturing the traffic from the TVs on a local network. The collected network traffic was then filtered, sorted and fed into an analysis process.

The analysis process consists of a PII (Personally Identifiable Information) evaluation of the network endpoints that may relate directly to the privacy of the user. This is done using available data sources such as VirusTotal, McAfee and OpenDNS, in addition to relevant research publications. The results for each TV are presented in tables listing the relevant network endpoints with a PII classification.

This study also gives insight into privacy and the GDPR by introducing privacy concepts and their relation to data protection rules. The privacy policy of each Smart TV vendor was examined, and each data type is presented with a PII classification.

The findings of this thesis show that Smart TVs communicate with PII related domains even when the privacy policy is declined. This is shown in the analysis chapter, where each network endpoint is evaluated.

Another finding, consistent with current research on the use of personal data for advertising, is that each Smart TV communicates with many advertisement related domains. The thesis ends with a discussion of the findings and a short section on practical countermeasures.


Acknowledgments

The following thesis marks the end of my master’s degree in Programming and System Architecture at the University of Oslo. First, I would like to thank my supervisor Nils Gruschka, who has provided great feedback and guidance throughout this thesis.

I would also like to thank my family and friends for supporting me, especially my close friend Hamza Muftic for helping me and keeping me motivated throughout the project.

Finding Smart TVs to test was challenging because of the Covid restrictions, so I would like to thank my friends and neighbours for letting me test their Smart TVs.


Contents

List of Figures

List of Tables

1 Introduction

1.1 Motivation

1.2 Problem statement & Objective

1.3 Structure

2 Background

2.1 Privacy

2.1.1 Definition of Privacy

2.2 GDPR

2.2.1 Processing of sensitive data

2.2.2 Privacy policy

2.2.3 Privacy shield

2.3 Personally Identifiable Information (PII)

2.4 Privacy classification for IoT

2.5 Network communication of Smart TVs

2.5.1 HTTP

2.5.2 DNS

2.5.3 TLS

2.6 Privacy in network traffic

2.7 Smart TV

2.7.1 Smart TV OS

2.7.2 Android TV

2.7.3 Tizen OS

2.7.4 WebOS

2.8 Smart TV security threats

2.9 Smart TVs privacy issues

2.9.1 Microphone and gesture sensor

2.9.2 Web browser and cookies

2.9.3 Automatic content recognition

3 Data collection

3.1 Research methodology

3.2 The data collection method

3.3 Sniffing TLS communication

3.4 Data gathering method

3.5 Building data gathering method

3.6 Data collection method setup

3.7 Executing the data collection

4 Analysis and results

4.1 Analysis method

4.2 Comparison of vendor’s privacy policies

4.2.1 Data types collected by vendors

4.2.2 User’s privacy policy

4.3 PII classification

4.4 Captured traffic and analysis

4.5 Sony TV Bravia 4K

4.5.1 Idle mode

4.5.2 Interacting with the TV

4.5.3 PA and PD domain relation

4.6 Samsung Q60

4.6.1 Idle mode

4.6.2 Interacting with the TV

4.6.3 PA and PD domain relation

4.7 Samsung Q65

4.7.1 Idle mode

4.7.2 Interacting with the TV

4.7.3 PA and PD domain relation

4.8 LG webOS TV SK7900PLA

4.8.1 Idle mode

4.8.2 Interacting with the TV

4.8.3 PA and PD domain relation

4.9 Philips 55PUT6101/12

4.9.1 Idle mode

4.9.2 Interacting with the TV

4.9.3 PA and PD domain relation

4.10 Vendor vs ATS traffic

4.11 Additional testing

4.11.1 Third–party ad–domains

5 Discussion

5.1 Analysis results

5.2 Limitations

5.3 Countermeasure

6 Conclusion

6.1 Summary

6.2 Future work

Bibliography

A All of the captured domains

A.1 Sony Smart TV

A.2 Samsung A Smart TV

A.3 Samsung B Smart TV

A.4 LG Smart TV

A.5 Philips Smart TV


List of Figures

2.1 TLS 1.2 handshake

2.2 Smart TV OS 2018 marketshare (Source: Statista [30])

3.1 Rooting attempt

3.2 Simple overview of mitmproxy in the network

3.3 ADB tool

3.4 Permission denied

3.5 Overview of the network setup

3.6 Network flow after ARP–spoof

3.7 Wireshark with filters

3.8 Flow of the data gathering

4.1 Analysis flow

4.2 LG Privacy policies

4.3 HTTP response from events.samsungads.com

4.4 Advertisement on the main menu

4.5 Cookies sent to cache.zeasn.tv under PD idle state

4.6 GET requests to cache.zeasn.tv under PD idle state

4.7 Total relation of packet size between vendor and ATS domains


List of Tables

2.1 Smart TV OS list

3.1 Smart TV model list

4.1 Data types provided by Smart TVs to vendors

4.2 Data types provided by a user to vendors

4.3 Privacy principles and concepts

4.4 Smart TV PII classification concept

4.5 Sony TV – Domains in idle mode PA state

4.6 Sony TV – Domains in idle mode PD state

4.7 Sony TV – Domains while using applications in PA state

4.8 Sony TV – Domains while using applications in PD state

4.9 Sony Smart TV PII related domains seen in both PA and PD states

4.10 Sony Smart TV PII related domains only seen in PA state for both modes

4.11 Samsung A – Domains in idle mode PA state

4.12 Samsung A – Domains in idle mode PD state

4.13 Samsung A – Domains while using applications in PA state

4.14 Samsung A – Domains while using applications in PD state

4.15 Samsung Smart TV Q60 PII related domains occur in both PA and PD states

4.16 Vendor PII related domains only seen in PD state

4.17 Samsung B – Domains in idle mode PA state

4.18 Samsung B – idle domains in PD state

4.19 Samsung B – Domains while using applications in PA state

4.20 Samsung B – Domains while using applications in PD state

4.21 Samsung Smart TV B PII related domains occur in both PA and PD states

4.22 Vendor PII related domains only seen in PD state

4.23 LG TV – Domains in idle mode PA state

4.24 LG TV – Domains in idle mode PD state

4.25 LG TV – Domains while using applications in PA state

4.26 LG Smart TV PII related domains occur in both PA and PD states

4.27 Philips A – idle domains in PA state

4.28 Philips A – idle domains in PD state

4.29 Philips A – Domains while using applications in PA state

4.30 Philips A – Domains while using applications in PD state

4.31 Philips Smart TV PII related domains occur in both PA and PD states

4.32 Philips Smart TV PII related domains occur only in PA states

4.33 Samsung Voice – domains

4.34 Google ad–domains

4.35 Rakuten ad–domains


Listings

3.1 Code snippet for finding unique DNS lookups

3.2 Code snippet for finding total amount of transferred bytes

3.3 Wireshark filters

4.1 JavaScript code snippet from Zeasn ad sdk file

4.2 JavaScript code snippet from Zeasn ad sdk file


Chapter 1

Introduction

1.1 Motivation

Modern Smart TVs offer many convenient features such as voice control, access to online services, an electronic program guide and social media integration.

However, many of these features threaten the privacy of the user, because the Smart TV transmits a lot of information to the manufacturer, tracking services or other providers, revealing the user’s behaviour, interests and desires. At the same time, it is becoming harder to find a TV on the market without any smart functions.

Although television was mainly made for displaying media and for national advertisement, modern TVs can deliver much more targeted advertisement through new smart features, features that can compromise users’ privacy by sending metadata to vendors.

There are many ways of implementing new “Smart” features, and users today typically do not use all of the features available. Unused apps may run in the background and continuously send data to the Internet, which can lead to security and privacy related consequences.

LG’s early Smart TVs collected information from consumers [24], mainly the device ID, viewing information and the names of files stored on the consumer’s external USB hard drive. A consumer’s privacy is directly affected when sensitive information such as filenames is collected unnecessarily.

A Smart TV that does not respect consumers’ privacy can lead to further privacy related consequences, and a Smart TV is also exposed to many different security threats. A smart home ecosystem is in itself an asset that needs to be carefully protected: unauthorized access to built–in cameras and microphones could be used to compromise privacy in a smart home ecosystem.

1.2 Problem statement & Objective

The objective of this thesis is to identify privacy related threats in modern Smart TVs. Privacy policies play a central role between the end–user and the vendor. Therefore, the Smart TV is assumed to behave differently depending on whether the privacy policy is accepted or declined. An average user is often not very concerned with what kind of data is sent to vendors and third–party companies. Various studies [11] and market observations have shown that, on the one hand, consumers attach great importance to keeping their personal data private. On the other hand, they mostly do not act in a data protection–conscious manner in everyday situations. This phenomenon, known as the privacy paradox, can largely be explained by the fact that consumers do not receive essential information about data protection in relevant decision–making situations. This thesis will therefore test different Smart TVs under accepted and declined privacy policies, and classify the outgoing data according to its relevance to the user’s privacy.
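Comparing the two policy states essentially reduces to set operations on the domains each TV contacts. The following Python sketch only illustrates that idea and is not the tooling used in this thesis; the domain names are hypothetical placeholders, and the labels PA (policy accepted) and PD (policy declined) are assumed to correspond to the state names used in the analysis chapter headings.

```python
# Hypothetical example: compare the domains contacted by one TV under an
# accepted (PA) and a declined (PD) privacy policy. Domain names are placeholders.
pa_domains = {"api.vendor.example", "ads.tracker.example", "time.vendor.example"}
pd_domains = {"api.vendor.example", "ads.tracker.example"}


def compare_states(pa: set[str], pd: set[str]) -> dict[str, set[str]]:
    """Split domains into those seen in both states and those seen in only one."""
    return {
        "both": pa & pd,     # contacted regardless of the policy decision
        "only_pa": pa - pd,  # contacted only when the policy was accepted
        "only_pd": pd - pa,  # contacted only when the policy was declined
    }


result = compare_states(pa_domains, pd_domains)
for key in ("both", "only_pa", "only_pd"):
    print(key, sorted(result[key]))
```

Domains in the "both" group are the most relevant for RQ 2, since they are contacted even when the privacy policy is declined.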

Smart TVs and “Over the top” (OTT) platforms are the latest IoT devices found “spying” on users and leaking sensitive data to companies such as Facebook, Amazon, Google and Netflix [75]. Therefore, it is important to examine what kind of information about the consumer TVs send to their vendors and to other third parties, which helps us understand the impact on the user’s privacy. It is also necessary to study how a Smart TV behaves in different modes, including idle mode, under each privacy policy state.

With the introduction of GDPR and with consumers being more privacy aware, it is also important to look at how TV manufacturers respect and follow their own privacy policies. Therefore, this study will try to address the following research questions:

RQ 1: What threats to a user’s privacy do Smart TVs pose?

RQ 2: Does declining the privacy policy have any impact on the user’s privacy?

1.3 Structure

The next chapter introduces and defines privacy concepts and presents current state-of-the-art research, explaining some of the software and hardware of Smart TVs. Common specifications of modern Smart TVs are given, and an insight into the security and privacy aspects of Smart TV OSes is presented. The leading Smart TV manufacturers and the privacy policies of each vendor are discussed. Further, some important terminology is introduced and explained, and the OS of each tested TV is presented with a short introduction.

The Data collection chapter then introduces the research method and describes how data from each TV is collected. Next, the Analysis and Results chapter presents the gathered data with an evaluation of each network endpoint. The Analysis chapter consists of two parts: first, a PII (Personally Identifiable Information) classification suited to the data gathered from the TVs is defined; then the network data is evaluated using these classification concepts along with other analysis tools.

The thesis will end with a conclusion trying to answer the proposed research questions. An appendix is given at the very end with relevant data that was used in the analysis process.


Chapter 2

Background

This chapter presents an introduction to privacy and GDPR, followed by the current IoT privacy related research. An introduction to Smart TV OS along with common integrated technologies in a modern Smart TV is given, followed by a presentation of current Smart TV security threats and privacy issues.

2.1 Privacy

The goal of this section is to give a clear understanding of how PII (Personally Identifiable Information) relates to the data used in the analysis chapter, where a simple PII classification concept is created based on the theories presented in this chapter. Therefore, an introduction is given to privacy, the relation between privacy and IoT (Internet of Things), and how privacy is further used in this thesis.

2.1.1 Definition of Privacy

Privacy is a complicated concept, but its definition is well described by Levente Buttyán [13]: privacy is the ability to control when, where, how and by whom information about oneself is used. The concept of privacy is often addressed with a combination of technical and legal means. Privacy is not about hiding an individual’s personal information from everyone, since authorized parties under well defined circumstances need access to personal information [13].

For instance, medical doctors need to look at a patient’s personal information and medical record. However, it is clear that not everyone should have access to one’s personal information and medical record. The problem of privacy arises once personal information falls into the wrong hands.

Hiding personal information from unauthorized parties is therefore an important act to make sure that privacy is controlled and maintained correctly.

The current understanding of privacy is often linked to freedom, democracy, and an individual’s right to define what is sensitive or unique to them. An unauthorized intrusion therefore violates privacy and creates a great need to protect these concepts. It is, however, far from trivial to ensure proper privacy protection for users because the concept of privacy is so loosely defined. People are often not able to give a precise definition of the term, but many countries have now adopted laws related to the right to privacy. Article 12 of the 1948 Universal Declaration of Human Rights declares the following:

“No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation. Everyone has the right to the protection of the law against such interference or attacks.” [84]

One of the most common definitions of privacy comes from Alan Westin’s 1967 book [27], which states that “Privacy is the claim of individuals, groups and institutions to determine for themselves, when, how and to what extent information about them is communicated to others”. In most countries, privacy is now considered a basic human right that guarantees personal autonomy and human dignity. In Germany (German Constitutional Court, 1983), privacy has been defined as a right to Informational Self Determination [33], a right of individuals to make their own decisions regarding the disclosure and use of their personal data. This means that if individuals are monitored and profiled without limitations, their privacy rights are violated.

Since 2000, the Charter of Fundamental Rights of the European Union, through Article 7 (Respect for private and family life) and Article 8 (Protection of personal data), has brought together the fundamental rights to privacy for Europe [7]. The concept of privacy has different dimensions [17]:

• Informational privacy, where the individual controls how the data can be gathered, stored and processed.

• Privacy of communications, which covers the security and privacy of mail, telephones and other forms of communication.

• Spatial privacy, which protects the user’s personal space against hacking or other types of intrusion.

• Territorial privacy, which concerns the protection of the environment around the individual, or the close physical area surrounding the person, such as a workplace or public place.

• Bodily privacy, which relates to the protection of data about an individual’s physical body and health status.

Informational privacy and privacy of communications are the dimensions most closely related to data protection rules.

Data protection rules ensure the security of individuals’ personal data and regulate the collection, usage, transfer, and disclosure of the personal data.

Vendors that are located in the EU, or that collect, store or transmit personal data of people located in the EU, must comply with the GDPR.


2.2 GDPR

The General Data Protection Regulation (GDPR) is a European regulation governing the handling of consumer data [20]. It is enforced by the European Union and its Member States, each of which is required to establish an independent public Data Protection Authority (DPA). The GDPR restricts private businesses and state administrations from processing, collecting or sharing personal data without a legal basis such as consent, and it builds on the concept of the right to privacy. The GDPR also requires that data be stored and processed securely, for example by means of encryption. If a company does not comply with the GDPR, the DPA can issue fines. In such cases, the DPA typically launches an investigation to collect relevant evidence, which takes a significant amount of time; if a violation is confirmed, a fine or penalty ranging from a few thousand to several million euros may be imposed, depending on the severity of the case [16]. Fines can reach 20 million euros or 4 percent of the company’s total global turnover of the preceding year, whichever is higher [25].
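The fine ceiling described above is simply the larger of two quantities. A minimal Python sketch, assuming the upper fine tier of Art. 83(5) GDPR and a turnover figure given in euros; the function name is hypothetical and the result is only the legal maximum, not a prediction of an actual fine:

```python
def max_gdpr_fine_eur(global_turnover_eur: float) -> float:
    """Upper limit of an Art. 83(5) GDPR fine: the higher of EUR 20 million
    or 4 % of total worldwide annual turnover of the preceding year."""
    return max(20_000_000.0, 0.04 * global_turnover_eur)


print(max_gdpr_fine_eur(100_000_000))     # smaller company: the fixed cap dominates
print(max_gdpr_fine_eur(10_000_000_000))  # large company: 4 % of turnover dominates
```

For large vendors such as the TV manufacturers studied here, the percentage-based limit is the one that applies.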

By proposing the GDPR [20], the Commission aimed to increase the trust of EU users in the use of their information while also protecting their fundamental rights. In other words, the GDPR tries to ensure trust between the consumer and the company, and the regulation also provides some economic advantages to companies. The GDPR became applicable on 25 May 2018, replacing Directive 95/46/EC, which contained related data protection rules; compared with the directive, the GDPR is more focused on protection and is directly applicable in each Member State.

There are some cases to which the GDPR does not apply: public security and criminal law enforcement, which are covered by the EU Police Data Protection Directive 2016/680, and data processing carried out by individuals privately or for household activities.

Personal data in the GDPR is any information relating to an identified or identifiable natural person, such as names and addresses, IP addresses, web cookies and location data. A "Controller" in the GDPR is a natural or legal person, public authority or other body that determines the purposes and means of the processing of personal data. In many cases, a cloud service provider acts as a "Data Processor", which collects and processes personal data on behalf of the Controller.

Overall, the GDPR focuses on the lawfulness, fairness and transparency of processing personal data. Data should always be adequate, relevant, accurate and limited to what is necessary for the purpose. Data minimisation and storage limitation must be applied, and data must be processed in a manner that ensures appropriate security to protect its integrity and confidentiality. The Data Controller is responsible for demonstrating compliance.

There are also some important requirements for lawfulness and consent in the GDPR. Processing personal data always requires a legal basis, either in the form of consent or another ground provided by law.


2.2.1 Processing of sensitive data

The sensitivity of personal data is mainly influenced by how the data will be used and for what purpose. According to the principle of proportionality [91, 52], data collection and sharing should be limited to data that is adequate and relevant for the purpose. This also means that data should be deleted once it is no longer needed, and that data storage should be minimized. Another important privacy principle is the principle of purpose specification [52], which means that data should only be collected and later used for specified purposes. According to the OECD [52], no data can be considered inherently non–sensitive: depending on the purpose and context of use, any type of data may be sensitive. Therefore, even public information such as an address or a name can become highly sensitive. This also means that the purpose of collecting data needs to be clearly stated, and using the data for any other purpose is unlawful.
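Purpose specification and storage limitation can be pictured as two checks applied to every stored record. The Python sketch below is a hypothetical illustration only: the purposes, field names and retention period are invented for the example and are not taken from any vendor's policy or from the GDPR text.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical mapping from a declared purpose to the fields it may use.
ALLOWED_FIELDS = {
    "software_update": {"device_model", "firmware_version"},
    "personalised_ads": {"device_model", "viewing_history", "ad_id"},
}


def minimise(record: dict, purpose: str) -> dict:
    """Keep only the fields declared for the purpose; unknown purposes get nothing."""
    allowed = ALLOWED_FIELDS.get(purpose, set())
    return {k: v for k, v in record.items() if k in allowed}


def expired(collected_at: datetime, retention: timedelta, now: datetime) -> bool:
    """True if the record has outlived its retention period and should be deleted."""
    return now - collected_at > retention


record = {
    "device_model": "TV-123",
    "firmware_version": "1.0.2",
    "viewing_history": ["news", "sports"],
    "ad_id": "abc-123",
}
print(minimise(record, "software_update"))

collected = datetime(2020, 1, 1, tzinfo=timezone.utc)
now = datetime(2020, 6, 1, tzinfo=timezone.utc)
print(expired(collected, timedelta(days=90), now))
```

Defaulting to an empty field set for an undeclared purpose mirrors the principle that data may only be used for purposes specified in advance.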

2.2.2 Privacy policy

It is common practice for companies to use a uniform privacy policy for all their products and services [12]. For example, Samsung writes in its data protection policy [63] that the Privacy Policy applies to all Samsung devices and services, from cell phones and tablets to TVs, home appliances, online services and more. This approach is understandable for online services that can be accessed from several types of devices. However, if the consumer only uses one device and is not interested in the other services, the policy becomes more confusing. Anyone who uses Android TV, for instance, does not necessarily use services such as Google Maps, Gmail or Google Photos. With cross–device and cross–service data protection policies, it is difficult to describe exactly what data is actually processed in a given context of use. This can also be seen in Google’s privacy policy:

“The data we collect includes unique identifiers, the type and settings of the browser, the type and settings of the device, the operating system, information about the cellular network such as the name of the cellular provider and telephone number, and the version number of the app.

We also collect data about the interaction of your apps, browsers and devices with our services. These include the IP-address, crash reports, system activities and the date, time and referral URL of your request...”

From the consumer’s point of view, this formulation is questionable in several respects. On the one hand, the list of examples contains data that is not collected for devices other than smartphones. On the other hand, the list is particularly important for more sensitive data categories [12]. In addition, the list is obviously incomplete, so the information is presented to the consumer as a non–transparent "black box". Article 12 of the GDPR, however, requires that such information be provided to the consumer in "clear and plain language".


2.2.3 Privacy shield

In 2015–2016, the U.S.–EU Safe Harbor Framework was replaced by a new framework called the Privacy Shield [64]. The Privacy Shield provided a framework for companies to transfer personal data from the EU to the United States in a manner consistent with EU law.

However, on July 16 2020, the Court of Justice of the European Union (CJEU) invalidated the EU–U.S. Privacy Shield [83]. The Court ruled that the Privacy Shield did not include sufficient limitations to protect EU personal data from access and use by U.S. public authorities. The invalidation took immediate effect, so the Privacy Shield can no longer be used for EU–U.S. data transfers. The decision reflects broader challenges to U.S. privacy practices: in the EU, the protection of personal data is a fundamental right, similar to a constitutional right in the U.S. The General Data Protection Regulation (GDPR) enshrined these fundamental rights and established uniform data protection standards across the EU, designed to protect the personal data of EU–based individuals [65].

2.3 Personally Identifiable Information (PII)

In this thesis, the term PII is used instead of personal information/data; it is used in the analysis chapter, where PII related network endpoints are identified. The term PII is mainly used within the U.S., while personal data is considered to be its European equivalent [56]. NIST defines Personally Identifiable Information (PII) as follows [44]:

... any information about an individual maintained by an agency, including (1) any information that can be used to distinguish or trace an individual’s identity, such as name, social security number, date and place of birth, mother’s maiden name, or biometric records; and (2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information.

Article 4 of GDPR has the following definition of Personal Data [5]:

... ‘Personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person...

As the two definitions show, the terms PII and personal data are not identical. Personal data in the GDPR covers a wider range of information than PII as defined by NIST [56, 82]. For instance, location and GPS data are considered personal data in the GDPR, while this type of information is not directly mentioned in the NIST definition of PII. The GDPR also states that cookies can be considered personal information [68].

NIST lists some examples of data that may contain PII [44]:

• Name, such as full name, maiden name, mother’s maiden name, or alias

• Address information, such as street address or email address

• Asset information, such as Internet Protocol (IP) or Media Access Control (MAC) address or other host–specific persistent static identifier that consistently links to a particular person or small, well defined group of people

• Telephone numbers, including mobile, business, and personal numbers

• Personal characteristics, including photographic image (especially of face or other distinguishing characteristic), x–rays, fingerprints, or other biometric image or template data (e.g., retina scan, voice signature, facial geometry)

• Information identifying personally owned property, such as vehicle registration number or title number and related information

• Information about an individual that is linked or linkable to one of the above (e.g., date of birth, place of birth, race, religion, weight, activities, geographical indicators, employment information, medical information, education information, financial information).
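To make the NIST asset-information examples concrete, the sketch below scans a decoded HTTP request body for a few identifier formats (IPv4 address, MAC address, e-mail address). It is a simplified, hypothetical illustration and not the classification method used in this thesis, which evaluates network endpoints; the regular expressions are deliberately loose and will both miss and over-match some real-world values.

```python
import re

# Simplified patterns for some identifier types from the NIST list.
PII_PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "mac": re.compile(r"\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_identifiers(payload: str) -> dict[str, list[str]]:
    """Return every match of each pattern found in a decoded request payload."""
    return {name: pattern.findall(payload) for name, pattern in PII_PATTERNS.items()}


# Hypothetical request body; all values are made up.
body = "mac=a4:5e:60:12:34:56&ip=192.168.1.20&user=viewer@example.com"
for kind, hits in find_identifiers(body).items():
    print(kind, hits)
```

A MAC address or other persistent device identifier found this way would fall under the "asset information" category above, since it consistently links traffic to one household.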

Both PII descriptions by NIST and GDPR will be used as a merged definition of PII for this thesis. This is further described in the analysis chapter where PII classification is created.

PII can be further divided into categories such as linked information, linkable information [56] and sensitive PII [82]. Linkable information is information that on its own may not identify a person, but could identify, trace, or locate a person when combined with another piece of information [82]. Linked information, in contrast, is directly connected to an individual. Sensitive PII is data whose loss, compromise, or unauthorized disclosure could result in harm, embarrassment, inconvenience, or unfairness to an individual [44, 82]. This type of data is often linked to medical, educational, financial, and employment information. Similarly, the GDPR has its own classification of sensitive data in Art. 9, Processing of special categories of personal data [6, 82]. The following is an example list of special data categories defined by GDPR Art. 9:

• Personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade–union membership

• Genetic data, biometric data processed solely to identify a human being


• Health–related data

• Data concerning a person’s sex life or sexual orientation
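The difference between linked and linkable information described above can be illustrated by measuring how many records become unique as attributes are combined. In this hypothetical Python sketch, no single attribute identifies anyone, but the combination of all three singles out every record; the records and attribute names are invented for the example.

```python
from collections import Counter

# Hypothetical records: each attribute alone is shared by several people.
records = [
    {"postcode": "0150", "birth_year": 1990, "tv_model": "A"},
    {"postcode": "0150", "birth_year": 1990, "tv_model": "B"},
    {"postcode": "0150", "birth_year": 1985, "tv_model": "A"},
    {"postcode": "0151", "birth_year": 1990, "tv_model": "A"},
]


def unique_fraction(rows: list[dict], keys: tuple[str, ...]) -> float:
    """Fraction of rows whose combination of `keys` occurs exactly once."""
    combos = [tuple(row[k] for k in keys) for row in rows]
    counts = Counter(combos)
    return sum(1 for combo in combos if counts[combo] == 1) / len(rows)


print(unique_fraction(records, ("postcode",)))
print(unique_fraction(records, ("postcode", "birth_year")))
print(unique_fraction(records, ("postcode", "birth_year", "tv_model")))
```

The fraction grows from 0.25 to 1.0 as attributes are combined, which is why seemingly harmless device metadata can still be linkable PII.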

Google is one of many companies that collect data [32]. Google specifies what data it collects [31], and many of these data types can be considered PII. When Google services are used, Google collects information such as what a consumer watches and searches for, which ads they click on, and their location data. Users also provide information themselves, such as the following:

• Name, birthday and gender

• Password and phone number

• Emails sent and received

• Photos and videos saved

• Comments posted on YouTube

• Contacts addressed

• Calendar activities

Companies also need to obtain a consumer’s consent to collect PII and personal data; for advertising, this would mean obtaining consent before a consumer viewed a page containing ads, which is impractical. Companies that collect PII or personal data from consumers need to determine whether they act as a data controller or a data processor, especially if they operate in countries bound by European data protection law (GDPR).

2.4 Privacy classification for IoT

Since a Smart TV is connected to the internet, it is commonly regarded as part of the IoT. Gartner [29] estimated that there would be around 20 billion IoT units by 2020, mostly in regions such as Western Europe, North America and China. IoT devices are used for a variety of purposes, most of which are meant to make day-to-day life easier. Researchers from Princeton University [96] found that many medical IoT devices leak personal information in clear text. The analysis was done using an open-source tool called Princeton IoT Inspector [59].

The IoT Inspector can scan for and detect IoT devices on the local network.

One way to classify the general privacy of IoT into levels is proposed in a research paper by researchers from the Hong Kong University of Science and Technology and the Beijing University of Posts and Telecommunications [40]. The classification is based on big data from queries to the "Baidu Knows" platform, and each level describes the severity of the consequences if the information is exposed. Level 1 privacy includes public information such as an address or a phone number. Level 2 consists of personal data such as age, height, weight and current location, as well as legal and social identity information such as social security numbers, passport or driving license information. Level 2 also includes financial information such as bank accounts, credit card numbers and data about personal property. Finally, level 3 privacy is mainly composed of information directly corresponding to a person's biometric data, such as fingerprints and face ID, as well as identification cards or numbers and the internet protocol (IP) address. If level 3 privacy information is compromised or leaked, the consequences might include serious identity theft.
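As an illustration only, the three levels from [40] could be encoded as a lookup table when tagging the data types observed in captured traffic. The field names below are hypothetical labels chosen for this sketch, and the rule of reporting the highest level found is an assumption, not part of [40].

```python
# Hypothetical encoding of the three-level scheme from [40]; the field
# names are illustrative labels, not identifiers used by any real tool.
PRIVACY_LEVEL = {
    "address": 1,
    "phone_number": 1,
    "age": 2,
    "location": 2,
    "passport_number": 2,
    "bank_account": 2,
    "credit_card": 2,
    "fingerprint": 3,
    "face_id": 3,
    "id_card_number": 3,
    "ip_address": 3,
}


def highest_level(fields):
    """Return the most severe level among the observed fields (0 if none are known)."""
    return max((PRIVACY_LEVEL.get(f, 0) for f in fields), default=0)


print(highest_level(["phone_number", "location"]))  # 2
print(highest_level(["ip_address"]))  # 3
```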

The sensitivity of privacy is closely related to the understanding of the consequences of leaked private information. The degree of privacy sensitivity [40] differs from person to person. Consumers usually trust companies to keep their data safe, even though not everyone is able to check whether an IoT product leaks sensitive data.

Privacy vulnerabilities of encrypted IoT traffic were examined in a research paper from Princeton University [4]. Network traffic was collected from the following smart home IoT devices: a Sense sleep monitor, a Nest Cam Indoor security camera, a Belkin WeMo switch, and an Amazon Echo. The analysis was based on how the devices operate on the network with and without user interaction. The researchers found that the network traffic rates of all the IoT devices revealed user activities, showing that encryption alone does not provide adequate privacy protection for smart homes. The study did not use TLS inspection; instead, the analysis relied on metadata such as IP packet headers, TCP packet headers, and send/receive rates. This type of metadata is also collected by ISPs (Internet Service Providers) for traffic analysis.
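The rate-based analysis in [4] works on metadata alone. A minimal sketch of that idea, assuming the packets have already been extracted from a capture into hypothetical (timestamp, size, direction) tuples, is to sum the bytes per time window; bursts in the resulting series can then be compared with known user interactions.

```python
from collections import defaultdict


def byte_rates(packets, window=1.0):
    """Sum packet sizes per time window.

    packets: iterable of (timestamp_seconds, size_bytes, direction) tuples,
    where direction is "out" (device to internet) or "in" (assumed format).
    Returns {window_index: {"out": bytes, "in": bytes}}.
    """
    rates = defaultdict(lambda: {"out": 0, "in": 0})
    for ts, size, direction in packets:
        rates[int(ts // window)][direction] += size
    return dict(rates)


packets = [(0.1, 100, "out"), (0.5, 1500, "in"), (1.2, 60, "out"), (5.0, 9000, "out")]
print(byte_rates(packets))
```

A sudden outgoing burst (here, window 5) is the kind of signal that can reveal activity even when every payload is encrypted.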

2.5 Network communication of Smart TVs

The analysis in this thesis is performed on the outgoing network communication of Smart TVs. Smart TVs use standard network protocols: DNS and HTTP in the application layer, and TLS in the presentation layer. This section therefore gives a short description of HTTP and DNS and of how TLS 1.2 works.

2.5.1 HTTP

The Hypertext Transfer Protocol (HTTP) is an application–level protocol for distributed, collaborative, hypermedia information systems [37]. HTTP is the foundation of data communication for the World Wide Web, where hypertext documents include hyperlinks to other resources that the user can easily access. It is designed to permit intermediate network elements to improve or enable communications between clients and servers.
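Because plain HTTP is not encrypted, anyone who can observe the traffic can read the request line and headers. The following sketch parses a captured HTTP/1.1 request; the host name, path and deviceId parameter are made up for illustration and do not come from a real Smart TV.

```python
def parse_http_request(raw: bytes):
    """Split a captured plaintext HTTP/1.1 request into its parts."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Request line: method, request target (path and query), protocol version
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "target": target, "version": version,
            "headers": headers, "body": body}


# Hypothetical request as it would appear on the wire without TLS
captured = (b"GET /api/v1/config?deviceId=ABC123 HTTP/1.1\r\n"
            b"Host: tv-config.example.com\r\n"
            b"User-Agent: ExampleTV/1.0\r\n\r\n")
req = parse_http_request(captured)
print(req["headers"]["host"], req["target"])
# -> tv-config.example.com /api/v1/config?deviceId=ABC123
```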


2.5.2 DNS

The Internet relies on DNS (Domain Name System), which is used to map human-readable names to IP addresses [98]. A more detailed description of DNS can be found in RFC 2929, and DNS privacy issues are examined in RFC 7626. Currently, most DNS queries are not encrypted and are sent in clear text over UDP.
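To illustrate why unencrypted DNS leaks information, the sketch below builds a minimal DNS query in the RFC 1035 wire format using only the Python standard library. The queried name acr.example.com is a hypothetical placeholder; the point is that the name travels as readable, length-prefixed labels.

```python
import struct


def build_dns_query(name: str, query_id: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a minimal DNS query message (RFC 1035 wire format); QTYPE 1 = A record."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, ANCOUNT/NSCOUNT/ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.rstrip(".").split(".")) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN (Internet)
    return header + question


packet = build_dns_query("acr.example.com")
print(packet.hex())
```

Sending these bytes to a resolver with a UDP socket on port 53 would put exactly this content on the network, so any on-path observer can read the queried name.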

2.5.3 TLS

The usage of TLS (Transport Layer Security) has increased in Smart TVs.

TLS provides communication security over a network and supports many strong cryptographic algorithms designed for this purpose [22]. Its goal is to provide both privacy and data integrity between two or more communicating parties.

The TLS protocol provides three main properties: encryption, authentication, and integrity [93].

• Encryption: hides the data being transferred from third parties

• Authentication: ensures that the parties exchanging information are who they claim to be

• Integrity: verifies that the data has not been forged or tampered with

Encryption is done using one of many supported cryptographic algorithms. Authentication is achieved using certificates. A TLS certificate is issued by a certificate authority (CA) to the person or business that owns a domain. The certificate contains information about who owns the domain, along with the server's public key, both of which are needed to validate the server's identity. Because of this authentication phase with signed certificates, TLS makes man-in-the-middle attacks difficult. In addition, the transferred data is protected with a message authentication code (MAC), which the recipient verifies to ensure the integrity of the data [93].
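The integrity check can be illustrated with Python's hmac module. This is a simplified model under stated assumptions: the key is a hypothetical placeholder for a key derived during the handshake, and real TLS 1.2 records also include a sequence number and the record header in the MAC input, while modern cipher suites use authenticated encryption (AEAD) instead of a separate HMAC.

```python
import hashlib
import hmac


def protect(key: bytes, record: bytes) -> bytes:
    """Append an HMAC-SHA256 tag to a record (toy model of a MAC-protected record)."""
    return record + hmac.new(key, record, hashlib.sha256).digest()


def verify(key: bytes, protected: bytes) -> bool:
    """Recompute the tag over the record and compare it in constant time."""
    record, tag = protected[:-32], protected[-32:]
    expected = hmac.new(key, record, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


key = b"session-mac-key-from-handshake"  # hypothetical placeholder key
msg = protect(key, b"GET /channels HTTP/1.1")
print(verify(key, msg))        # True
tampered = b"X" + msg[1:]      # attacker modifies the first byte
print(verify(key, tampered))   # False
```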

TLS also solves another problem with an extension called Server Name Indication (SNI). If a server hosts multiple websites, each with its own TLS certificate, the server does not know which certificate to present to a client that is trying to connect. SNI solves this by letting the client specify the hostname (domain name) during the TLS handshake [92]. Figure 2.1 shows an overview of the TLS 1.2 handshake.


Figure 2.1: TLS 1.2 handshake
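Because SNI is sent in the ClientHello before any keys are agreed, the host name is visible on the network. The sketch below shows how the server_name extension is laid out according to RFC 6066; the host name tv.example.com is hypothetical.

```python
import struct


def build_sni_extension(hostname: str) -> bytes:
    """Encode a server_name extension as laid out in RFC 6066, section 3."""
    name = hostname.encode("ascii")
    # ServerName entry: name_type (0 = host_name), 2-byte length, host name
    entry = struct.pack("!BH", 0, len(name)) + name
    # ServerNameList: 2-byte list length followed by the entries
    server_name_list = struct.pack("!H", len(entry)) + entry
    # Extension: type 0x0000 (server_name), 2-byte data length, data
    return struct.pack("!HH", 0x0000, len(server_name_list)) + server_name_list


ext = build_sni_extension("tv.example.com")
print(ext.hex())
```

The host name ends up as plain ASCII bytes in the handshake, which is why a passive observer (or a router firewall) can read it without decrypting anything.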

TLS 1.3 was released in August 2018 [34]. It brings many important improvements, and the most significant one is stronger security through perfect forward secrecy. TLS 1.3 uses the ephemeral Diffie-Hellman key exchange by default, which generates a one-time key that is used only for the current session. The key is discarded at the end of the session.

The handshake is also shorter, which makes this version slightly faster than TLS 1.2: TLS 1.2 requires two round trips to complete a full handshake, while TLS 1.3 needs only one. Round-trip time (RTT), usually measured in milliseconds, is the time from a client's request until the server's response is received. Another difference is that the server certificate is sent in plain text in TLS 1.2, whereas it is encrypted in TLS 1.3.
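The idea behind ephemeral Diffie-Hellman can be shown with a toy finite-field example. This is only a sketch: the modulus 2**127 - 1 and generator 3 are arbitrary choices made for illustration and are not secure parameters; TLS 1.3 actually uses standardized groups such as X25519 or the FFDHE groups from RFC 7919.

```python
import hashlib
import secrets

# Toy group for illustration only: P is the Mersenne prime 2**127 - 1.
# These parameters are NOT secure and are not what TLS 1.3 uses.
P = 2**127 - 1
G = 3


def ephemeral_keypair():
    """Generate a fresh private exponent and public value for one session."""
    private = secrets.randbelow(P - 2) + 2
    return private, pow(G, private, P)


def session_key(own_private, peer_public):
    """Derive a 32-byte symmetric key from the shared Diffie-Hellman secret."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


# One handshake: client and server end up with the same key.
c_priv, c_pub = ephemeral_keypair()
s_priv, s_pub = ephemeral_keypair()
assert session_key(c_priv, s_pub) == session_key(s_priv, c_pub)
# After the session, c_priv and s_priv would be deleted; the next
# handshake draws new key pairs and therefore yields a different key.
```

Because the private values exist only for one session, a later compromise of the server's long-term certificate key does not reveal past session keys, which is what forward secrecy means.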

It is important to note that TLS traffic is difficult to decrypt without installing certificates on the client hosts. This is why the researchers in [50] used best-effort TLS interception: they rooted an Amazon Fire TV, manually installed their root certificate and decrypted as much of the encrypted traffic as possible. This showed that many of the apps send personally identifiable information (PII) to third parties and platform domains. On Roku TV, only HTTP traffic was analysed, since there was no method to gain access to the system [50, 85].

2.6 Privacy in network traffic

Privacy in communication networks is about controlling personally identifiable information (PII) in the network. Network traffic can be categorized into encrypted and non-encrypted communication. Encryption is used to prevent unauthorised parties from accessing PII.

Encryption ensures that only authorized parties are able to decrypt the data. However, even if the data is encrypted, it is still possible to observe who the sender and the receiver are by inspecting metadata. Header information is used in network communication to route a message to its destination address, and this type of metadata is not encrypted by default. Personal data and interests can be revealed by looking at unencrypted metadata such as the name of the destination server, the size of each packet, the rate of occurrence and the duration of communication sessions [13].
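The metadata listed above can be collected without decrypting anything. A minimal sketch, assuming packets are available as hypothetical (timestamp, destination, size) tuples taken from packet headers (the destination being an IP address or a name learned from DNS or SNI), aggregates them into per-destination summaries of packet count, byte volume and session duration:

```python
from collections import defaultdict


def summarize_flows(packets):
    """Aggregate header-only packet metadata into per-destination summaries.

    packets: iterable of (timestamp, destination, size_bytes) tuples.
    No payload is needed, so this works equally well for TLS traffic.
    """
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0, "first": None, "last": None})
    for ts, dst, size in packets:
        f = flows[dst]
        f["packets"] += 1
        f["bytes"] += size
        f["first"] = ts if f["first"] is None else min(f["first"], ts)
        f["last"] = ts if f["last"] is None else max(f["last"], ts)
    return {dst: {"packets": f["packets"], "bytes": f["bytes"],
                  "duration": f["last"] - f["first"]}
            for dst, f in flows.items()}


# Hypothetical destinations
summary = summarize_flows([(0.0, "ads.example.net", 400),
                           (2.5, "ads.example.net", 600),
                           (1.0, "cdn.example.org", 1500)])
print(summary)
```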

Design guidelines for preserving privacy in IoT were proposed in [58]. The guidelines can be used to govern privacy concerns in smart homes, healthcare, public safety and supply management. The research provides insight into privacy requirements that need to be integrated in the development of privacy frameworks, which can partially be used in this thesis' privacy measurement context. An analysis of privacy disclosure in DNS queries has been done by researchers at Kyushu University [98].

The results show that DNS queries can potentially leak sensitive information about the user. This type of information shows what a user or client wants to connect to and where, which reveals the consumer's interests. An unpleasant side effect of DNS queries, as stated in [98], is that third parties collect this type of public information.

As stated in RFC 7626 [9], DNS requests received by a server can be triggered for different reasons:

• Primary requests: the main domain name in the URL

• Secondary requests: additional requests performed by the user agent without any direct involvement or knowledge of the user

• Tertiary requests: additional requests performed by the DNS server itself

A DNS request consists of several data fields, and two of them are important when it comes to privacy: the IP address and the DNS query name. The query name is the full name sent by the user and reveals information about what the user does [9]. This type of information can be considered user log data, which is collected by Smart TV vendors and third parties. The IP address in itself is not directly tied to PII, but combined with other types of data it can reveal a user's behaviour; this is further discussed in the analysis chapter.
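To show how query names combined with IP addresses can form a profile, the sketch below counts the domains queried by each client address in a DNS log. The log format, addresses and domain names are hypothetical, and keeping the last two labels is a crude approximation of the registered domain (a public suffix list would be needed for names such as example.co.uk).

```python
from collections import Counter, defaultdict


def interest_profile(dns_log, suffix_labels=2):
    """Count queried domain suffixes per client IP address.

    dns_log: iterable of (client_ip, qname) pairs, e.g. from a resolver log.
    suffix_labels: how many rightmost labels to keep (rough registered domain).
    """
    profiles = defaultdict(Counter)
    for client_ip, qname in dns_log:
        labels = qname.rstrip(".").lower().split(".")
        profiles[client_ip][".".join(labels[-suffix_labels:])] += 1
    return profiles


log = [("192.168.1.20", "acr.tv-vendor.example"),
       ("192.168.1.20", "ads.tv-vendor.example"),
       ("192.168.1.20", "api.streaming.example"),
       ("192.168.1.31", "www.news.example")]
profiles = interest_profile(log)
print(dict(profiles["192.168.1.20"]))
```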

The lack of privacy in DNS also has an impact on security, and standards have therefore been developed to provide privacy for DNS: DNS over TLS (DoT) and DNS over HTTPS (DoH) [23]. A visible difference between the two standards is the port they use: DoT uses the dedicated port 853, while DoH uses port 443, which is the standard port for HTTPS traffic. DoH lookups therefore blend in with ordinary web traffic, whereas DoT traffic is easy to identify by its port.
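The port difference can be expressed as a simple heuristic. This sketch is a deliberate simplification based on assumptions: it only looks at the transport protocol and destination port, so it cannot confirm DoH (which looks like any other HTTPS connection), and it ignores DNS variants that run over UDP, such as on port 853 or HTTP/3.

```python
def classify_dns_transport(protocol: str, dst_port: int) -> str:
    """Guess how a DNS lookup is transported from the flow's protocol and port."""
    if dst_port == 53:
        return "plaintext DNS"          # query names readable on the wire
    if protocol == "tcp" and dst_port == 853:
        return "DNS over TLS (DoT)"     # dedicated port, easy to spot or block
    if protocol == "tcp" and dst_port == 443:
        return "HTTPS (possibly DoH)"   # indistinguishable from web traffic by port alone
    return "other"


for proto, port in [("udp", 53), ("tcp", 853), ("tcp", 443)]:
    print(proto, port, classify_dns_transport(proto, port))
```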

As seen in the previous section, TLS encrypts data, and it is difficult to extract this information without decrypting it. However, a study [86] shows that it is still possible to analyse and classify encrypted traffic. The study introduces different encryption protocols and presents how information can be extracted from them. Two types of extractable information are common to most protocols. The first type covers the connection itself and the properties exchanged in the initial handshake. The second type covers the identifiers of the communicating peers, which are exchanged in the authentication phase [86]. Information extracted from encrypted traffic can be valuable; for example, the Server Name Indication can be extracted and used by a home router's firewall to filter traffic.

This method will be utilized in the analysis chapter.
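Extracting the SNI from encrypted traffic can be sketched in a few dozen lines, since the ClientHello is sent unencrypted. The parser below assumes that the whole ClientHello arrives in a single TLS record and returns None for anything it cannot parse; it is a simplified illustration following the TLS record layout, not a replacement for a full dissector such as the one in Wireshark.

```python
import struct


def extract_sni(record: bytes):
    """Return the SNI host name from a TLS ClientHello record, or None."""
    try:
        if record[0] != 0x16:                      # 0x16 = handshake record
            return None
        pos = 5                                    # skip the 5-byte record header
        if record[pos] != 0x01:                    # 0x01 = ClientHello
            return None
        pos += 4                                   # handshake type + 3-byte length
        pos += 2 + 32                              # client_version + random
        pos += 1 + record[pos]                     # session_id
        (cs_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + cs_len                          # cipher_suites
        pos += 1 + record[pos]                     # compression_methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0x0000:                 # server_name extension
                # Skip the 2-byte list length, then read name_type and length
                name_type, name_len = struct.unpack_from("!BH", record, pos + 2)
                if name_type == 0:                 # 0 = host_name
                    return record[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None                                # truncated or malformed record
    return None
```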

2.7 Smart TV

An IoT device is considered "smart" if it has computing hardware that is able to run an operating system (OS), handle data from various sensors and access the internet. Smart TVs are often closed systems; some of them run on Linux with an application called "exeDSP" on top of the kernel. The exeDSP application is responsible for handling and controlling all the smart functions that a Smart TV needs.

A Smart TV or an over-the-top (OTT) television device is defined as an internet-connected TV that offers regular traditional TV with additional features such as installation of applications, access to the internet through a web browser, and many other integrated functions. A Smart TV also allows other OTT devices, Blu-ray players, game consoles and other network-connected interactive devices that use television-type display outputs to be connected to it. Modern Smart TVs come with Wi-Fi and Ethernet ports to easily connect the TV to the internet, but some TVs also use traditional systems such as cable and satellite to receive television content. The main difference between an external Smart TV box or stick (OTT television) and a regular Smart TV is that the OTT device is more portable and can be connected to any type of monitor.

Smart TVs also support the installation of apps, either from a USB drive, from app stores or even from a user-provided web server [47].

Research publications consider a Smart TV to be part of the Internet of Things (IoT) [69, 71]. This definition is used throughout this thesis when presenting the privacy and security aspects of a Smart TV as an IoT device. The number of IoT devices has grown rapidly over the years; many of the electronic devices we use daily are directly connected to the internet, without any filtering of possibly sensitive information sent over the line. The use of microprocessors in internet-connected devices has become ubiquitous.

Modern IoT devices may collect and send information [4] to vendors and other third parties, as the devices are heavily equipped with smart sensors such as motion sensors, built-in microphones, cameras and other types of sensors. The information from these sensors has major privacy implications, especially when it is sent to third parties that collect and track information about consumers. Since there are no specific regulations on what a Smart TV (or another IoT device) can collect and send over the internet, it is assumed that consumers generally value functionality over privacy. IoT devices are made to be simple and easy to use, so that tasks are completed without any concerns about security. Privacy is an important aspect of information security, and ensuring it is becoming more difficult with the fast growth of the IoT market.

2.7.1 Smart TV OS

This section gives a short introduction to current Smart TV operating systems (OS). It is important to understand which Smart TV operating systems exist and whether there are any differences between the popular ones [2]. Figure 2.2 shows the most popular Smart TV operating systems worldwide in 2018 [30], with Android being the most widely used.

Figure 2.2: Smart TV OS market share in 2018 (Source: Statista [30])

Since the internet is the main source of content for Smart TVs, these devices depend on internet access.

The most popular companies that make Smart TVs are Samsung, LG, Sony, Philips, and Panasonic. The hardware and software used in these Smart TVs often differ. One thing common to all types of Smart TVs is the possibility to install and use different applications, where the installation file often comes from an integrated vendor app store. Android TV, for instance, which is the OS used by Sony and Philips TVs, lets users install apps through the Google Play Store. It is the operating system that controls installation from an app store. Smart TV vendors use different operating systems; Samsung's Tizen OS and LG's webOS both use Linux as the base kernel.


Table 2.1 shows the operating system used by each Smart TV vendor.

Vendor    | OS
Samsung   | Tizen OS
LG        | webOS TV
Sony      | Android TV
Philips   | (modified) Android TV, WhaleOS
Amazon    | Fire TV
Panasonic | Firefox OS

Table 2.1: Smart TV OS list

2.7.2 Android TV

Different types of Android OS, with detailed background and technical information, are described in [14]. Android TV was launched by Google in 2014 as a newly configured version of its successful Android mobile OS.

Many Smart TV vendors choose Android TV as the main OS for their TVs. Android TV appears as a restricted version of Android for mobile phones, with an interface specially designed for use with a remote control and settings that are very similar to the Android mobile settings.

2.7.3 Tizen OS

Early Samsung Smart TVs ran a Linux application called exeDSP [46]. Tizen OS [42] is an open-source operating system developed by Samsung Electronics for different platforms such as Samsung smart watches, cameras and TVs. Tizen OS for Smart TVs was released in 2015 and has been updated over the years.

2.7.4 WebOS

WebOS differs from other Smart TV systems mainly because it uses web-based applications, meaning that most of its applications are developed in HTML, JavaScript and CSS. As the name suggests, webOS therefore needs an internet connection to be fully functional.

2.8 Smart TV security threats

This section gives a brief introduction to current security threats, along with the common hardware of Smart TVs. Security vulnerabilities lead to privacy exposures and vice versa. As seen in the previous privacy section, the GDPR and other privacy laws require companies to keep data safe through security measures. If a company collects, processes and stores personal data, privacy relies heavily on security-related measures. To keep the user's privacy safe, Smart TVs need to be protected from unauthorized parties.


Early Smart TVs had slow hardware and apps with poor usability [46], which led to low usage of these integrated apps. Current Smart TV models are catching up to the complexity of a modern smartphone, and gesture sensors are now often included in addition to cameras and microphones. A typical mid-range Smart TV has an ARM processor, around 500 MB of RAM and 1 to 2 GB of flash memory. Smart TV vendors focus more on usability than on the security of their devices.

Currently, there have not been many cases where a Smart TV has been compromised and an attacker has gained remote network access. The security of a Samsung Smart TV was tested and attacked for research purposes to show how feasible such an attack could be [48]. Legacy Smart TVs are known to have various software vulnerabilities [48]; an attacker can gain full control over the Smart TV by injecting a malicious media file that exploits a vulnerability in the FFmpeg library. FFmpeg is an open-source software project with libraries and tools for handling and processing audio and video data.

Vendors like Samsung and LG make use of FFmpeg libraries, and there are over 300 CVEs for FFmpeg [19]. Any Smart TV that uses FFmpeg libraries is therefore exposed to a variety of attacks, depending on the FFmpeg version [48]. Even though these vulnerabilities exist, it is not common for attackers to target Smart TVs. However, early Smart TVs are very likely to contain vulnerabilities that are still being discovered today [73].

Privacy concerns are therefore greater for older Smart TVs.

Important differences between legacy TV systems and modern Smart TVs are presented in [2]. The research shows how a Smart TV is more capable of capturing private data, analysing Smart TVs from both hardware and software perspectives. The paper also presents a study of issues and challenges faced by Smart TV viewers, including issues related to interactivity, content overload, privacy, and security.

Smart TV vendors limit the user's experience by restricting access to the system, meaning that the user can only interact with what is given and has no full system access. Rooting a Smart TV OS is possible, just like rooting Android or jailbreaking Apple iOS, but there is a risk of "bricking" the Smart TV once the user has root privileges. By rooting a device, the user gains full access to the system and its recovery mode, which in most cases voids the warranty. Several community forums offer rooting tools specially designed for Smart TVs; one of the most popular is the SamyGO forum, which offers a variety of tools including a modified version of exeDSP [73].

Researchers from Noroff University [80] analysed the security of two LG Smart TVs, models 42LS570T-ZB and 55LA740V. The analysis process proposed for the LG Smart TV [79] consists of 10 steps for acquiring potential information and outlining the functionality, highlighting some of the problems, including some security issues.

An article from Consumer Reports [70] analysed different Smart TV models and concluded that disagreeing with the vendors' privacy policies causes the TV to lose basic functionality. The security analysis of Samsung TVs and brands that use the Roku TV platform (2018 Smart TV models) showed that these were vulnerable to web-based attacks.

According to the analysis, devices that use Roku TV are poorly secured because of remote control APIs that are enabled by default. Consumer Reports also conducted a survey [70] of subscribers who owned a Smart TV, 38000 Smart TV owners completed the survey. Around 51% were worried about the privacy implications and 62% were concerned about the overall security of a Smart TV.

A Smart TV might have serious security vulnerabilities that an attacker could exploit. One of the most dangerous attacks would be one in which an unsecured TV is hacked and used as a spying tool, since the TV is equipped with the right tools, such as a microphone and a camera. The FBI [54] has issued a warning about the possibility of cyberattacks on Smart TVs and outlined some steps on how such attacks can be prevented.

Smart TVs are often placed in sensitive areas such as bedrooms or living rooms. According to WikiLeaks [21], CIA agents used malware called Weeping Angel that was able to run in the background as a "legitimate" Smart TV app. The malware was designed for Samsung F8000 series Smart TVs. An article in Forbes magazine [10] clarifies that the malware infects the TV physically via USB, but that the CIA may also have remote infection techniques. Weeping Angel is able to record audio from the built-in microphone both when the TV is powered on and when it appears to be powered off. The malware never actually turns off the TV; instead, it fakes the off state by dimming the screen and making the LED indicator mimic a powered-off state. The recorded audio is stored as files, which the TV only sends to a nearby "CIA Wi-Fi" hotspot. It is also important to note that if the TV uses Wi-Fi as its internet source, Weeping Angel will also extract the Wi-Fi credentials.

2.9 Smart TVs privacy issues

In an article published by The Washington Post, four of the most popular Smart TVs were tested to see how they record everything a user watches [15]. The test covered Smart TVs from Samsung, TCL Roku TV, Vizio and LG. Each TV was given data policy permission, and the IoT Inspector tool was used to capture the data transmitted to the vendors and other third parties. The conclusion was that some TVs send data every second, while others send it once per specified period of time. The data is often a fingerprint or a screenshot of what the user is watching on the TV.

The main motivation for vendors to collect user data is to deliver more targeted advertisements. This type of consumer data is valuable, as it is shared with and sold to other third-party companies. VIZIO Inc., one of the largest manufacturers and sellers of Smart TVs, collected data from 11 million TVs without consumers' consent from 2014 to 2016 [87]. In 2017, because VIZIO had failed to inform consumers about the "smart" setting that collects viewing data, the company had to pay 2.2 million dollars to settle charges with the Federal Trade Commission (FTC).


A recent study [85] shows how advertisement services and Smart TV applications operate. The analysis of 57 different Smart TVs showed that apps communicate with many different advertising and tracking services.

Some of these advertising organizations only appear on certain platforms.

The research also addresses PII exposures in Smart TVs. The authors identify PII values (such as the advertising ID and serial number) for Amazon Fire TV and Roku TV. Both platforms showed PII exposures, where the advertising ID was sent alongside the serial number and device ID to third parties and to a platform-specific party (Amazon). The research then evaluated DNS-based blocklists such as Pi-hole, Firebog, MoaAB and StopAd Smart TV (SATV) to see how effective they were. The results showed that some advertisement services were missed (false negatives), while the more aggressive blocklists suffered from false positives, where app functions simply failed.
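The trade-off between false negatives and false positives can be illustrated with a DNS blocklist lookup. In this sketch the blocklist entries and domains are hypothetical, and the choice between exact matching and also matching parent domains stands in for a less and a more aggressive list; it does not reproduce how any specific blocklist mentioned above works.

```python
def is_blocked(domain: str, blocklist: set, match_parents: bool = True) -> bool:
    """Return True if the domain (or, optionally, any parent domain) is listed."""
    labels = domain.rstrip(".").lower().split(".")
    if match_parents:
        # e.g. eu.ads.example.net -> eu.ads.example.net, ads.example.net, example.net, net
        candidates = [".".join(labels[i:]) for i in range(len(labels))]
    else:
        candidates = [".".join(labels)]
    return any(name in blocklist for name in candidates)


blocklist = {"ads.example.net", "example.com"}          # hypothetical entries
print(is_blocked("eu.ads.example.net", blocklist, match_parents=False))  # False
print(is_blocked("eu.ads.example.net", blocklist))                       # True
print(is_blocked("api.example.com", blocklist))                          # True
```

With exact matching, the tracker subdomain slips through (a false negative); with parent matching, a broad entry such as example.com also blocks api.example.com, which breaks the app if that name serves content rather than ads (a false positive).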

Study [85] also confirmed the leakage of PII values from Amazon Fire TV reported in a conference paper [50]. Both research publications [50, 85] used automated scripts that simulated a user interacting with the applications, which in some cases sped up the research process. The domains were analysed using VirusTotal, McAfee and OpenDNS [85].

Another study [69] examined IoT devices, including Smart TVs, and addressed research questions such as "What is the destination of network traffic?", "What data is sent in plain text and what content is sent using encryption?" and "Does a device expose information unexpectedly?". The findings show that 72 of 82 IoT devices, including the Smart TVs, have at least one destination that is not a first party; that 56% of the US devices and 83.8% of the UK devices contact destinations outside their region; that all devices expose information to eavesdroppers via at least one plaintext flow; and that a passive eavesdropper can reliably infer user and device behaviour from the traffic (encrypted or otherwise) of 30 of 81 devices [69].

Smart TVs mainly appear in two forms: a regular Smart TV and an external (box or stick) Smart TV, also called an over-the-top (OTT) streaming device, which offers alternative television through subscriptions. Roku TV and Amazon Fire TV are well-known OTT streaming devices that offer special streaming content compared to a usual Smart TV. In a research paper [50], the network traffic from these devices was examined using TLS interception. The results showed that both Roku TV and Amazon Fire TV devices were tracking and collecting user-identifying information as well as the users' viewing behaviour. All IoT devices that extensively use the internet can introduce privacy-related risks. In addition to collecting information directly related to user behaviour, Smart TVs have been found to collect other identifying information such as device IDs, serial numbers, Wi-Fi SSIDs and MAC addresses.
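Exposures of identifiers like these can be detected by searching captured payloads for values that are already known, such as the TV's MAC address or serial number. The sketch below assumes plaintext (or decrypted) payloads and hypothetical identifier values; it checks the raw value, the value without separators, and MD5, SHA-1 and SHA-256 hashes of the value as written, which is only a small subset of the encodings an app might use.

```python
import hashlib


def pii_variants(value: str) -> dict:
    """Return a few lowercase encodings of an identifier that might appear in traffic."""
    variants = {"plain": value.lower()}
    compact = value.lower().replace(":", "").replace("-", "")
    if compact != variants["plain"]:
        variants["no_separators"] = compact       # e.g. MAC address without colons
    for algo in ("md5", "sha1", "sha256"):
        variants[algo] = hashlib.new(algo, value.encode("utf-8")).hexdigest()
    return variants


def find_pii(payload: str, identifiers: dict) -> list:
    """List (identifier_name, encoding) pairs whose variant occurs in the payload."""
    haystack = payload.lower()
    return [(name, enc)
            for name, value in identifiers.items()
            for enc, variant in pii_variants(value).items()
            if variant in haystack]


ids = {"mac": "AA:BB:CC:11:22:33", "serial": "SN0042X"}   # hypothetical values
payload = "device=aabbcc112233&sid=" + hashlib.md5(b"SN0042X").hexdigest()
print(find_pii(payload, ids))
```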

2.9.1 Microphone and gesture sensor

One of the "smart" features of modern TVs is the ability to control the TV with voice commands. Some TVs implement this feature with a microphone integrated into the TV or, as in most cases, a microphone on the remote control. A Smart TV with an integrated microphone and camera poses greater privacy threats. It is therefore important that vendors specify in their privacy policies how and why the microphone and other sensors are used, especially in countries that follow the GDPR.

Modern Samsung Smart TVs include voice control through the voice assistant Bixby. Bixby is a voice assistant created by Samsung and is also integrated in Samsung smartphones; some devices even have a dedicated Bixby button. A voice assistant makes it easier for a user to search for and find content. Android TVs and LG's webOS, for instance, use Google Assistant. Voice control consists of predefined commands that are matched against the user's input, but the TV also transcribes what the user says in order to provide better search results for content, both locally and on the internet.

To indicate that voice control is used, an icon of microphone is displayed on the Samsung Smart TV.

The 2012 Samsung models (mainly the E series) [46] were the earliest versions with a built-in microphone. These TVs were configured to continuously record surrounding sounds, even when the user had disabled the voice control feature. An article from CNET [43] discusses how Samsung's Smart TVs not only record what a user says for voice control, but also share this data with third parties. A consumer should therefore be careful when using voice control and avoid including any personal information, as it might end up in the databases of third parties.

Samsung flagship Smart TV models have built-in cameras. This allows users to use video communication apps such as Skype directly on the TV [46]. These Smart TVs are high end models that also have gesture sensors, meaning a user can interact with the Smart TV right in front of it without the controller.

2.9.2 Web browser and cookies

Most IoT devices with a display and some form of input have a simple integrated web browser. Although entering text is still a struggle on Smart TVs, most TVs allow users to connect a USB keyboard, or a Bluetooth keyboard on TVs that support Bluetooth.

The web browser is not very powerful and responds much more slowly than a web browser on a computer, but it does allow users to search for content and browse websites. There are two main reasons why these web browsers are slow. First, websites that rely heavily on JavaScript and have a lot of content to load require more data processing.

Second, websites are often designed for web browsers on computers or mobile phones. Not all TVs have a built-in web browser; on Smart TVs that use Android as the OS, a web browser has to be downloaded and installed through the Google Play Store.

A cookie is a small piece of data that a website stores on a visitor's device. It allows websites to recognize visitors by storing specific information. Cookies are essential for many websites and web applications. However, cookies can also be used to track the pages users visit from site to site, which allows advertisers to track users' interests and behaviour. Such cookies are called targeting and advertising cookies [89]. They are designed to gather information from the user's device in order to display advertisements based on relevant topics of interest. Advertisers place these cookies on websites with the permission of the website's operator. With the GDPR in place, websites now have to ask for permission and inform users about which cookies are used.
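How an advertising cookie carries an identifier can be shown with Python's http.cookies module. The Set-Cookie header, the uid value and the .ads.example domain are hypothetical; the sketch only parses the header and builds the Cookie header that a browser would send back on later requests.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header returned by an ad server embedded in a web page
set_cookie = "uid=7f3a9c; Domain=.ads.example; Path=/; Max-Age=31536000"

jar = SimpleCookie()
jar.load(set_cookie)
morsel = jar["uid"]
print(morsel.value, morsel["domain"], morsel["max-age"])
# -> 7f3a9c .ads.example 31536000

# On later requests to any *.ads.example host, the browser sends the same
# identifier back, which lets the ad server link visits across sites.
cookie_header = f"Cookie: uid={morsel.value}"
```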

2.9.3 Automatic content recognition

Since 2012, Smart TVs have had the ability to capture the screen content a user is viewing on the TV, a technology called Automatic Content Recognition (ACR). ACR works by identifying video and audio fingerprints, which enables vendors to distinguish one video from another [38].

With ACR, it is possible to determine which shows or movies are trending and popular. ACR works because Smart TVs (with permission) capture a few pixels (fingerprints) of the content the viewer is currently watching and share that data with the TV manufacturer's ACR tracking software [95]. The software matches these pixels against a database that keeps track of local broadcasts in the viewer's region. The ACR data also includes the length of commercial breaks and which commercials are being watched. With this technology, ACR providers know the following:

• Whether the viewer is watching linear, OTT, DVR or VOD content

• What shows and commercials they are watching, on a second-by-second basis

• The viewer's IP address, which in turn allows them to determine the viewer's physical address and which websites and apps they visit

This data is anonymized, i.e., no actual PII is attached [95]. This is discussed further in the analysis chapter, where the PII values used in this thesis are introduced.
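The matching step described above can be sketched with a simple perceptual hash. This is an illustrative stand-in, not the fingerprinting algorithm used by any ACR vendor (those are proprietary): each captured frame is reduced to a few grayscale pixels, turned into a bit string (an "average hash"), and compared against a hypothetical reference database using the Hamming distance. The function names, the pixel values and the `max_distance` threshold are all assumptions.

```python
def average_hash(frame):
    """Return a bit-string fingerprint: '1' where a pixel is brighter than the frame mean.

    `frame` is a small grayscale image given as a list of rows of 0-255 values,
    standing in for the few pixels an ACR client samples from the screen.
    """
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return "".join("1" if p > mean else "0" for p in pixels)


def hamming(a, b):
    """Number of positions where two equal-length fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


def identify(frame, reference_db, max_distance=2):
    """Return the best-matching content label, or None if nothing is close enough."""
    fp = average_hash(frame)
    best_label, best_dist = None, max_distance + 1
    for label, ref_fp in reference_db.items():
        d = hamming(fp, ref_fp)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Hypothetical reference database of known broadcast frames (3x3 pixels each).
news_frame = [[10, 200, 30], [220, 15, 240], [20, 210, 25]]
ad_frame = [[250, 240, 10], [5, 230, 245], [240, 12, 8]]
reference_db = {
    "Evening news, channel 1": average_hash(news_frame),
    "Car commercial": average_hash(ad_frame),
}

# A slightly noisy capture of the news frame still produces the same fingerprint.
captured = [[12, 198, 33], [218, 17, 238], [22, 205, 27]]
print(identify(captured, reference_db))  # Evening news, channel 1
```

Comparing compact fingerprints rather than raw frames is what keeps the uploaded data small, and the distance threshold lets small differences in brightness or encoding still match the same broadcast.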

ACR typically collects both audio and video, making it easier to identify which show or series is being watched, including the episode number and the viewing time. Because audio is collected, some ACR systems can also detect the language of the content [88]. This happens in the background, so consumers will not notice the process.

All of the information is stored in the ACR cloud application and is then handled by an "Event System" [76], which generates the data that is delivered directly to the TV client. With ACR, advertisers can craft targeted ads and determine users' interests or desires.

ACR is built to gather screen information from consumers. Samsung and LG Smart TVs allow the user to disable ACR [24], but it is enabled by default. This means that if the user turns off ACR, a factory reset will turn it back on. ACR companies such as SambaTV also use web beacons to identify each user [62]; this information is then combined with video and audio fingerprints, a device ID and the user's IP address. Smart TV vendors partner with advertising firms that make use of this combined consumer data.


Another notable point, mentioned in a Forbes article by Alan Wolk and in a Washington Post article by Geoffrey A. Fowler [95, 15], is that SambaTV targets mobile phones based on ACR data from Smart TVs. This is done by analysing IP addresses and finding smartphones that are on the same network as the Smart TV.
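A minimal sketch of the IP-based matching idea follows, assuming that every device in a household reaches the Internet through the same router and therefore shares one public IP address. The observation records, device IDs and function name are hypothetical, and the IP addresses come from ranges reserved for documentation. This is a simplification: a shared public IP address does not always mean a shared household (for example behind carrier-grade NAT).

```python
from collections import defaultdict


def link_devices(observations):
    """Group device IDs by the public IP address they were seen from.

    `observations` is a list of (device_id, device_type, public_ip) tuples.
    Returns {tv_id: sorted list of phone IDs seen behind the same public IP}.
    """
    by_ip = defaultdict(lambda: {"tv": set(), "phone": set()})
    for device_id, device_type, ip in observations:
        if device_type in ("tv", "phone"):
            by_ip[ip][device_type].add(device_id)

    links = {}
    for group in by_ip.values():
        for tv in group["tv"]:
            links[tv] = sorted(group["phone"])
    return links


# Hypothetical observations collected by an ad platform.
observations = [
    ("tv-A", "tv", "203.0.113.7"),           # Smart TV reporting ACR data
    ("phone-1", "phone", "203.0.113.7"),     # phone app on the same home network
    ("phone-2", "phone", "203.0.113.7"),
    ("phone-3", "phone", "198.51.100.20"),   # device in a different household
]
print(link_devices(observations))  # {'tv-A': ['phone-1', 'phone-2']}
```

Once a TV is linked to phones this way, an ad seen on the TV can be followed up with ads on those phones, which matches the behaviour described in the articles cited above.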

According to Forbes [95], the ACR data market is estimated to grow into a $5 billion industry by 2021. Advertisers use ACR data to help determine who is watching what across a wide variety of formats and options. ACR is also persistent: it collects viewing content regardless of which application is in use.

SambaTV has its own privacy policy, which is shown on Philips Smart TVs alongside Philips' own policy. SambaTV specifies what kind of data is collected and why [62]. The collected data is mainly Content Viewing Information, Log Information and Device Information. The following is an excerpt from the privacy policy:

Where the law permits, we may also obtain information from other sources and combine that with information we collect through our Services. For example, we might obtain information from data providers and advertising exchange services, including assumed demographics and interests, and data about your engagement with certain ads.

The policy also states that SambaTV participates in the EU-US Privacy Shield and Swiss-US Privacy Shield programs; however, as mentioned in the background chapter, the EU-US Privacy Shield is no longer valid. SambaTV also states that the gathered ACR data is not sold on to any third parties [41]. Instead, advertisers pay SambaTV to direct ads to other devices in a home after their TV commercials have played.

Early examples of ACR data collection date back to 2013, when a blog post [24] showed that LG's Smart TV collected viewing information as well as the names of connected devices and even file names. The system settings contained an option called "Collection of watching info" that was set to ON by default, and network interception analysis showed that viewing information appeared to be collected regardless of whether this option was set to ON or OFF. The data was transmitted to LG over unencrypted HTTP and included the unique device ID, viewing data and the names of files stored on the consumer's external USB hard drive.


Chapter 3

Data collection

This chapter describes the data gathering methods and how the data collection is carried out. The goal is to gather as much network traffic as possible in different modes for each policy state. During data collection, each TV is tested in three modes: idle, turned off, and an interactive mode in which the TV is controlled and interacted with. The voice and microphone function is also tested on the TVs that include a microphone. Each test is executed under both the accepted and the declined privacy policy states.

3.1 Research methodology

For this thesis, a single case study was conducted to research PII exposure in Smart TVs under different privacy policy states. A case study is a research method that focuses on understanding the dynamics of single settings [51], meaning the research provides a deeper understanding of specific instances of a phenomenon. Each Smart TV is tested with its privacy policies both accepted and declined.

A controlled experiment was conducted on five Smart TVs; Table 3.1 lists the TVs from which data was collected. Since the collected data may contain an end user's PII, the project was reported to the Norwegian Centre for Research Data (NSD).

Samsung QE65Q7FNA
LG webOS TV SK7900PLA
Sony TV BRAVIA 4k
Samsung Q60
Philips 55PUT6101/12

Table 3.1: Smart TV model list

3.2 The data collection method

The data collection method needs to be systematic and portable in order to achieve an easy setup and to provide consistent and precise results that can
