Face Me! Head-tracker interface evaluation on mobile devices


DOCTORAL THESIS

2018

FACE ME!

HEAD-TRACKER INTERFACE EVALUATION ON MOBILE DEVICES

Maria Francesca Roig Maimó


DOCTORAL THESIS

2018

Doctoral Programme in Information and Communications Technology

FACE ME!

HEAD-TRACKER INTERFACE EVALUATION ON MOBILE DEVICES

Maria Francesca Roig Maimó

Thesis Supervisor: Javier Varona Gómez

Thesis Supervisor: Cristina Suemay Manresa Yee

Thesis tutor: Javier Varona Gómez

Doctor by the Universitat de les Illes Balears


Maria Francesca Roig Maimó

Face Me! Head-tracker interface evaluation on mobile devices

Documentation, June 12, 2018

Supervisors: Javier Varona Gómez and Cristina Suemay Manresa Yee

Universitat de les Illes Balears

Departament de Matemàtiques i Informàtica

Computer Graphics and Vision and AI Group (UGIVIA)

Cra. de Valldemossa, km 7.5, 07122 Palma, Illes Balears


Dr Javier Varona Gómez of Universitat de les Illes Balears

I DECLARE:

That the thesis titled Face Me! Head-tracker interface evaluation on mobile devices, presented by Maria Francesca Roig Maimó to obtain a doctoral degree, has been completed under my supervision and meets the requirements to opt for an International Doctorate.

For all intents and purposes, I hereby sign this document.

Signature

Palma de Mallorca, June 12, 2018.


Dr Cristina Suemay Manresa Yee of Universitat de les Illes Balears

I DECLARE:

That the thesis titled Face Me! Head-tracker interface evaluation on mobile devices, presented by Maria Francesca Roig Maimó to obtain a doctoral degree, has been completed under my supervision and meets the requirements to opt for an International Doctorate.

For all intents and purposes, I hereby sign this document.

Signature

Palma de Mallorca, June 12, 2018.


Publications and contributions

Journals

1. Roig-Maimó, M. F., Manresa-Yee, C., & Varona, J. (2016). A robust camera-based interface for mobile entertainment. Sensors, 16(2), 254-272. Impact Factor: 2.677, Q1 in the category INSTRUMENTS & INSTRUMENTATION.

2. Manresa-Yee, C., Roig-Maimó, M. F., & Varona, J. (2017). Mobile accessibility: Natural user interface for motion-impaired users. Universal Access in the Information Society, 1-13. Impact Factor: 1.219 (JCR 2016), Q3 in the category COMPUTER SCIENCE, CYBERNETICS.

3. Roig-Maimó, M. F., MacKenzie, I. S., Manresa-Yee, C., & Varona, J. (2018). Head-tracking interfaces on mobile devices: Evaluation using Fitts’ law and a new multi-directional corner task for small displays. International Journal of Human-Computer Studies, 112, 1-15. Impact Factor: 2.863 (JCR 2016), Q1 in the category COMPUTER SCIENCE, CYBERNETICS.

Fitts’ law: On calculating throughput and non-ISO tasks. Revista Colombiana de Computación, 19(1), 7-28.

Proceedings

1. Roig-Maimó, M. F., Varona Gómez, J., & Manresa-Yee, C. (2015). Face Me! Head-tracker interface evaluation on mobile devices. In Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems - CHI 2015, 1573–1578. ACM. Congress CORE A* (Flagship Conference) (CORE2017).

2. Manresa-Yee, C., Roig-Maimó, M. F., & Varona, J. (2016). Accesibilidad móvil: head-tracker para personas con discapacidad motora. In Actas del XVII Congreso Internacional de Interacción Persona-Ordenador - INTERACCION 2016, 153-160. Ediciones Universidad Salamanca.

3. Roig-Maimó, M. F., Manresa-Yee, C., Varona, J., & MacKenzie, I. S. (2016). Evaluation of a mobile head-tracker interface for accessibility. In Proceedings of the 15th International Conference on Computers Helping People With Special Needs - ICCHP 2016, 449-456. Springer. Congress CORE C (CORE2017).

Evaluating Fitts’ law performance with a non-ISO task. In Proceedings of the XVIII International Conference on Human Computer Interaction - INTERACCION 2017, Article 5. ACM.

5. Roig-Maimó, M. F., Varona, J., & Manresa-Yee, C. (2018). Reflections on ESM in the wild: the case of a mobile head-gesture game. In Proceedings of the XIX International Conference on Human Computer Interaction - INTERACCION 2018. (In press).

Book chapters

1. Manresa-Yee, C., Morrison, A., Muntaner, J. J., & Roig-Maimó, M. F. (2017). Multi-sensory environmental stimulation for users with multiple disabilities. In Recent Advances in Technologies of Inclusive and Well-Being, 165-182. Springer.

Internships

1. Stuttgart University, Stuttgart, Germany. From 17th of June until 26th of June, 2015.

2. York University, Toronto, Canada. From 1st of October until 10th of December, 2015 (mobility grant EEBB-I-15-10293 from the Spanish government).

3. Stuttgart University, Stuttgart, Germany. From 1st of October until 30th of November, 2016 (mobility grant EEBB-I-16-11743 from the Spanish government).

Projects

1. Experiencias de diseño y desarrollo de interfaces naturales en industria, educación y rehabilitación (TIN2012-35427). Principal investigator: Javier Varona Gómez. Funding entity: Ministerio de Economía, Industria y Competitividad (MINECO). 2013-2016.

2. NEOTEC, ’Plataforma de desarrollo de Aplicaciones Basadas en Interacción Natural’ (IDI-20140183). Principal investigator: Javier Varona Gómez. Funding entity: Inisle Interactive Technologies S.L. 2014-2015.

3. Impuls al disseny dels sistemes interactius dirigits a la teràpia de nins i nines amb necessitats especials (OCDS-CUD2015/07). Principal investigator: Cristina Manresa Yee. Funding entity: Oficina de Cooperació al Desenvolupament i Solidaritat (OCDS) de la UIB. 2016.

4. Evaluación implícita de sistemas interactivos en contextos de salud y bienestar (TIN2016-81143-R). Principal investigator: Javier Varona Gómez. Funding entity: Ministerio de Economía, Industria y Competitividad (MINECO). 2016-2019.

5. Diseño de experiencias interactivas dirigidas al bienestar de personas con necesidades especiales (OCDS-CUD2016/13). Principal investigator: Cristina Manresa Yee. Funding entity: Oficina de Cooperació al Desenvolupament i Solidaritat (OCDS) de la UIB. 2017-2019.


Acknowledgements

When you set out on the journey to Ithaca, you must pray that the road be long, full of adventures, full of knowledge.

You must pray that the road be long, that there be many mornings

when you enter a harbour your eyes had never seen,

and that you go to cities to learn from those who know.

Always keep the idea of Ithaca in your heart.

You must arrive there, it is your destiny, but do not force the crossing at all.

It is better that it last many years, that you be old when you anchor at the island,

rich with all you have gained along the way, without expecting it to give you more riches.

Ithaca has given you the beautiful journey; without her you would not have set out.

And if you find her poor, it is not that Ithaca has deceived you. Wise as you have become, you will know what the Ithacas mean.

—Ítaca (fragment) (Lluís Llach)

A journey. For me, the thesis has been a long journey to Ithaca.

And, like most journeys, this one began with my eyes set on my destination.

Later on, once I had started walking, I realized that, more than the destination, what mattered was the road I was travelling. And, travelling on and on, I adapted to a new language, to a new culture, to foreign customs, to crossing paths with other travellers like me and, above all, to learning to get lost and to find myself again.

This journey would not have been possible without all the people who have accompanied me and helped me every time I got lost (which has not been seldom). Many thanks to all of you! Especially to my supervisors, Xavi and Cristina, for all their patience and help; to my lab mates, for being there every day; to Cristina and Ramon, for being my guides through faraway lands; and, above all, to my parents, who have been my unconditional support throughout my whole life (and all its journeys).

Thank you for travelling with me!


Measure what can be measured, and make measurable what cannot be measured.

—Galileo Galilei


Abstract

The integration of front cameras on mobile devices and the increase in processing capacity have opened the door to vision-based interfaces on mobile devices. However, research mostly focuses on the development of new interfaces and their integration into prototypes without analyzing human performance.

In this work, we present FaceMe, a head-tracker interface for mobile devices, and its evaluation from the point of view of Human-Computer Interaction. Our results indicate that head-tracking input on mobile devices conforms to Fitts’ law. Although we obtain a mean throughput that is low compared to a desktop mouse, it lies within the range of desktop head-tracking interfaces and it is close to mobile tilt-based interaction.

Since the tests described in the ISO standard were conceived for desktop environments, interaction at the corners of the display is not tested (due to the circular arrangement of targets). For this reason, we present an alternative task, the multi-directional corner (MDC) task, for Fitts’ law testing on small displays. As the MDC task better represents the tasks for which devices with small displays are used, it is recommended for calculating throughput when the evaluation is done on a device with a small display. Therefore, in this work we propose a methodology to evaluate input methods for devices with small displays, i.e., for mobile devices.

In addition, our studies indicate that FaceMe is not only a valid interface for accessibility purposes but also that users are willing to use it in their daily life, even though head-gesture interfaces have traditionally been considered socially unacceptable in mobile settings. Hence, FaceMe, just like any other valid head-gesture interface, is ready to be released to the general public.


Abstract (Catalan)

The integration of front cameras into mobile devices, together with the increase in their processing capacity, has opened the door to the use of vision-based interfaces. Nevertheless, most research focuses on the development of new interfaces and their integration into prototypes rather than on the analysis of human performance.

This work presents FaceMe, an interface for mobile devices based on tracking head movement, and its evaluation from the point of view of Human-Computer Interaction. The results obtained indicate that interaction through head movement conforms to Fitts’ law. Although the mean throughput obtained is low compared with that of a desktop mouse, it lies within the range of head-movement-based interfaces for desktop computers and is similar to the throughput of interaction based on tilting the mobile device.

Since the tests described in the ISO standard were conceived for desktop computer environments, they do not cover the testing of interaction at the corners of the screen (because of the circular arrangement of the targets). For this reason, we present an alternative task, the multi-directional corner (MDC) task, for Fitts’ law testing on devices with small displays. Because the MDC task better represents the tasks performed on this type of device, it is recommended when the evaluation is carried out for a device with a small display. Therefore, this work recommends a methodology for evaluating input methods for devices with small displays, that is, for mobile devices.

In addition, the studies carried out indicate that FaceMe is not only a valid interface for accessibility, but that users are also willing to use it in their daily life, even though interfaces based on head movement have traditionally been considered socially unacceptable.

Consequently, FaceMe, like any other valid interface based on head movement, is ready to be released to the public.


Abstract (Spanish)

The integration of front cameras into mobile devices, together with the increase in their processing capacity, has opened the door to the use of vision-based interfaces. Despite this, most research work focuses on the development of new interfaces and their integration into prototypes rather than on the analysis of human performance.

This work presents FaceMe, an interface for mobile devices based on tracking head movement, and its evaluation from the point of view of Human-Computer Interaction. The results obtained indicate that interaction through head movement conforms to Fitts’ law. Although the mean throughput obtained is low compared with that of a desktop mouse, it lies within the throughput range of head-movement-based interfaces for desktop computers and is similar to the throughput of interaction based on tilting the mobile device.

Because the tests described in the ISO standard were conceived for desktop computer environments, they do not cover the testing of interaction at the corners of the screen (because of the circular arrangement of the targets). For this reason, we present an alternative task, the multi-directional corner (MDC) task, for Fitts’ law testing on devices with small displays. Since the MDC task is more representative of the tasks performed on this type of device, it is recommended when the evaluation is carried out on devices with small displays. Therefore, this work proposes a methodology for evaluating input methods for devices with small displays, that is, for mobile devices.

In addition, the results obtained indicate that FaceMe is not only a valid interface for accessibility, but that users are also willing to use it in their daily life, even though interfaces based on head movement have traditionally been considered socially unacceptable. Consequently, FaceMe, like any other valid interface based on head movement, is ready to be released to the public.


1 Introduction: The storyboard


To understand a science, it is necessary to know its history.

—Auguste Comte

This chapter presents the motivation of this work using a storyboard (see Figure 1.1).

Each scene presents a problem that has to be faced.

Figure 1.1. The complete storyboard (panels, in order: Scene 0, Scene 1, Scene 2, Scene 3, Scene 4, Scene 5, Scene 6, and the Final scene).


Within the wide range of existing mobile devices, smartphones receive great attention in the research field of Human-Computer Interaction (HCI) due to the challenges they present, mainly because of their limited input and output capabilities [31, 37].

These challenges are not the only reason for their importance in the field. Another is the new possibilities to enhance the user experience offered by the additional components included in mobile devices, such as cameras (front or rear) and inertial sensors [50].

The integration of cameras combined with increased processing capacity on mobile devices has opened an area of research on vision-based interfaces (VBI). New interaction methods have been developed using the images captured by the camera as input primitives, often to detect the movement of the device [29, 27]. In addition, the integration of front cameras opens the door to head-tracking interfaces (interaction with devices through head movements) as used in commercial applications [28, 38].

An example is the Smart Screen feature on the Samsung Galaxy S4 that uses the front camera to detect the position of the user’s face. Face movements are used to perform functions like scrolling within documents, screen rotation, or pausing video playback. Previous work on head-tracking interfaces tends to focus on the development of new tracking methods instead of analyzing user performance. In general, results regarding the user’s performance and experience are not sufficiently represented [43].

This work originated within a project named “Design and development of experiences in industry, education and rehabilitation using natural interfaces”, in particular within the objective focused on the development and evaluation of vision-based interfaces for mobile devices. Therefore, the initial objective of this work is the development and evaluation of vision-based interfaces for mobile devices.

Hence, the first step should be the development of a vision-based interface for mobile devices. But, what kind of vision-based interface?

In broad strokes, a vision-based interface uses the information of a camera to interact with a device. So, what does a mobile device’s camera see? If we talk about the front camera, the direct answer is the face. Therefore, a head-tracker interface might be the solution.


Figure 1.2. Scene 0: We have a head-tracker for mobile devices!

Body gestures are promising for creating more natural interfaces. Current mobile devices include a variety of sensors (such as accelerometers, gyroscopes, or cameras) that allow gesture recognition. These sensors offer possibilities to enhance the user experience by using gestures as input control. In particular, the integration of front cameras makes it possible to detect and track the user’s face (or head). Therefore, it becomes possible to extend the input vocabulary of mobile devices by using head gestures.

Head-tracking provides a way to interact with devices through the movements of the head. So, the goal of a head-tracker is to detect head movements and translate them into interaction actions on the device.

Chapter 2 presents the head-tracker interface developed.

Figure 1.3. Scene 1: Why would you want it?

At first glance, from a researcher’s point of view, a head-tracker is a new possibility to enhance the user experience. So, it is not surprising that previous work on HCI usually focuses on the development of new access methods.


From a more practical point of view, head-tracking on mobile devices has a direct application in assistive tools for motor-impaired users, as it allows hands-free interaction and can also be used for rehabilitation purposes. But, thinking in more general terms, head-tracking could also be used in entertainment as an alternative to touchscreen-based controls, which perform worse than physical controls [11, 32, 69]. So, why not think about head-tracking for able-bodied users?

The development of “new” access methods opens the door to new ways to interact with devices. So, imagination is the only limit to their potential applications.

Chapter 2 details the potential applications of the head-tracker interface developed.

Figure 1.4. Scene 2: Is it good enough?

It is not enough to develop a new access method; it has to be good. But what determines whether an access method is good enough?

To discover whether an access method is good, it has to be tested and evaluated. Since the access method is for humans, it has to be tested with human participants. Besides, it has to be tested on the task for which it is intended to be used.

The framework for human performance evaluation for non-keyboard devices is the ISO 9241-411 [34]¹. ISO 9241-411 describes performance tests based on Fitts’ law for evaluating the efficiency and effectiveness of existing or new non-keyboard input devices.

In the field of mobile devices, due to the particularities of small displays or the research question to answer, it may not be suitable to use the tests described in the ISO standard. For that reason, researchers sometimes design custom tasks suitable to their particular research question.

¹ ISO 9241-411 [34] is an updated version of ISO 9241-9 [35]. With respect to performance evaluation, both versions of the standard are the same.

So, as the head-tracker is for mobile devices, and one of the key aspects of mobile devices is the small screen size, we want to test whether all regions of the device screen are accessible to users through the head-tracker. Therefore, we should conduct user studies with human participants to ensure that the entire device screen can be reached with the head-tracker.

Chapter 3 and Chapter 6 explain how to answer this question.

Figure 1.5. Scene 3: Oh-oh, we cannot compare it with other input devices

Due to the particular motivation of our research, i.e., evaluating whether all regions of the device screen are accessible, our evaluation methodologies are ad hoc. Therefore, our experimental procedures, while internally valid, are not standard, and this hinders comparisons between studies.

So, although our experimental procedures allow us to answer our particular research question, we cannot compare our results with other studies.

That is to say, we might know that our head-tracker is valid to reach the entire screen of a mobile device (i.e., it is good enough) but we cannot compare it with other mobile input devices: e.g., is head-tracking for mobile devices as good as the tilt input?

Chapter 4 studies how to solve this issue.


Figure 1.6. Scene 4: Let’s Fitts it!

As an effort to bring consistency and allow between-study comparisons, the ISO 9241-9 standard was published in 2002 and updated in 2012 as ISO 9241-411. As already mentioned, ISO 9241-411 describes performance tests for evaluating human performance of non-keyboard input devices. The primary tests involve point-select tasks using Fitts’ law throughput as a dependent variable.

Therefore, user studies that follow the standard and use Fitts’ throughput as a measure of human performance will allow between-study comparison.

So, we should conduct a user study to evaluate the head-tracker interface following the recommendations described in the ISO standard and to obtain a benchmark value of throughput. This will allow comparisons with other input devices and methods.
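As an illustration of the dependent variable mentioned above, the following is a minimal sketch of a throughput calculation for one sequence of point-select trials. It assumes the effective-measures formulation commonly used with ISO 9241-9 style tasks: TP = IDe / MT, with IDe = log2(Ae / We + 1) and We = 4.133 × SDx. The function name and its arguments are hypothetical and are not taken from this work.

```python
import math
import statistics


def effective_throughput(amplitudes, selection_offsets, movement_times):
    """Return throughput in bits per second for one sequence of trials.

    amplitudes        -- actual movement distance of each trial (e.g., pixels)
    selection_offsets -- signed distance of each selection from the target
                         centre, measured along the task axis (same unit)
    movement_times    -- movement time of each trial, in seconds
    """
    # Effective amplitude: mean of the distances actually moved.
    a_e = statistics.mean(amplitudes)
    # Effective width: 4.133 times the standard deviation of the selections.
    w_e = 4.133 * statistics.stdev(selection_offsets)
    # Effective index of difficulty (Shannon formulation), in bits.
    id_e = math.log2(a_e / w_e + 1)
    # Throughput: bits divided by the mean movement time.
    return id_e / statistics.mean(movement_times)
```

For example, three trials with offsets of -1, 0 and 1 pixels give an effective width of 4.133 pixels. With a mean amplitude of 12.399 pixels and a mean movement time of 1 s, the effective index of difficulty is log2(3 + 1) = 2 bits, so the throughput is about 2 bits/s.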

Chapter 5 details how to answer this question.

Figure 1.7. Scene 5: OK, but what about the corners?

Since the tests described in the ISO standard were conceived for desktop environments, there are limitations for small displays, one of the key features of mobile devices. For example, due to the circular arrangement of targets, interaction at the corners of the display is not tested.

One of the key aspects of mobile devices is the small screen size. Therefore, a main requirement for an access method on a mobile device is that all regions of the device screen are accessible through it. That is, all regions of the display are important. Consequently, the tasks should include targets positioned at the corners of the display. To overcome this limitation, and to make between-study comparison based on throughput (the human performance measure recommended by ISO) possible, an alternative task for Fitts’ law testing, the multi-directional corner (MDC) task, which focuses on the peculiarities of small displays, is designed and evaluated.

Chapter 5 details the new MDC task.

Figure 1.8. Scene 6: No one will use it!

At some point we might face the sentence “No one will use a head-tracker (if he/she does not have to)”. That sentence might lead to another one: “Why do you bother about head-trackers for able-bodied users?” So, how much truth is there in these sentences?

Gesture-based mobile interfaces may require users to adopt new behaviors that might be embarrassing in certain mobile settings. Therefore, there is a reluctance to release them to the general public on social grounds.

So, it may be necessary to conduct a social acceptance study to understand whether users would be willing to use a head-gesture interface in their daily life. But what are the implications of social acceptance studies for technology usage? Should we let concern about social acceptance determine the future of a new technology?

Should we decide whether a technology is worthwhile based only on the results of preference questionnaires? We should not forget that when it comes to user preferences, there can be many different influences, and these preferences may change over time.

What would have happened with the Apple iPhone otherwise?

Chapter 7 studies how to answer this issue.

Figure 1.9. Final scene: In the end, what do you know?

This work starts with the development of a new vision-based interface for mobile devices (a head tracker), but it is focused neither on head trackers nor on the development of new vision-based interfaces for mobile devices. This work is about the evaluation of a vision-based interface for mobile devices. So, the research is inspired by the question “Which is the best way to evaluate a vision-based interface in the case of mobile devices?” (or even “Which is the best way to evaluate any kind of interface in the case of mobile devices?”) and the head tracker is only the tool that helps to answer this question.

Mainly, the contribution of this work is to provide a methodology of evaluation focused on the particularities of mobile devices (e.g., small displays). Hence, this methodology could be used to evaluate any vision-based interface (even any access method) for mobile devices, not only head-trackers.


Objectives (a.k.a. Spoilers)

To sum up, the main objective of this work is the formal evaluation of a new vision-based interface for mobile devices. To achieve this, we will need to:

• Develop a new vision-based interface for mobile devices: a head-tracker interface (FaceMe).

• Evaluate the usability of FaceMe, i.e., its effectiveness, its efficiency and its satisfaction.

To evaluate FaceMe in terms of effectiveness and efficiency, and to allow between-study comparison, we should follow the recommendations described in the ISO standard. But, as the tests described in ISO 9241-411 present limitations for small displays, it is also necessary to:

• Design and develop a new task focused on the particularities of small displays: the multi-directional corner (MDC) task.

• Validate the new MDC task.

Finally, in order to evaluate FaceMe in terms of satisfaction, we will need to assess its comfort and its usage in real settings.


2 FaceMe: The head-tracker interface

A robust camera based interface for mobile devices


Often a silent face has voice and words.

—Ovid

In this chapter, FaceMe, the head-tracker interface developed for mobile devices, is presented. As pointed out earlier, in this work the head-tracker interface is used as a case study to exemplify the evaluation of a vision-based interface on mobile devices. So, even though the head-tracker is indeed part of the contribution, we suggest that the reader see the head-tracker more as a tool than as a goal.

2.1 Introduction

Head-tracking provides a way to interact with devices through the movements of the head. Consequently, the goal of a head-tracker is to detect head movements to translate them into interaction actions on the device.

Head-trackers have a direct application in assistive tools for motor-impaired users as they allow hands-free interaction. In the assistive technology domain, such interfaces are widely used for desktop computers [47, 48] and in multiple commercial mobile applications [24, 25]. Previous work even showed that interaction through head-trackers can be used for rehabilitation purposes [46]. On mobile devices, head-trackers have also been used in entertainment as an alternative to touchscreen-based controls, which perform worse than physical controls [11, 32, 69]. Tilt-controlled games, which rely on a gesture-based interface that could work similarly to head-tracker interfaces under some circumstances, are increasingly common and popular on mobile devices [59, 1].

Research on head-tracker interfaces based on image sensors for desktop computers is mature and has been conducted for a long time for HCI purposes [60, 9, 12, 63]. This kind of research is now focusing on mobile devices, as front cameras are integrated and devices have sufficient processing capacity to develop vision-based interfaces.

The representation of the head and the selection of features for tracking (e.g., skin color or face geometry) are key to the tracking step. Tracking can be done by detecting the face in each frame or by matching the face across frames, which updates the face location based on information from previous frames.

Head tracking is performed by tracking either facial features [28, 26, 18] or the entire face (in 2D or 3D) [36, 22, 39, 14]. Frequently, the facial features selected for tracking are the eyes [26] or the nose [63], and the entire face is usually tracked based on skin color [22, 63] or face detectors [36].

Commercial apps that track the head for diverse purposes can also be found, such as the Smart Screen feature of the Samsung Galaxy S4 [54], which uses the front camera to detect the position of the face and eyes to perform functions like scrolling within documents, rotating the screen or pausing video playback.

In this chapter, FaceMe is presented. FaceMe is a new head-tracker interface for mobile devices that uses only the information from the front camera (no additional components are needed) to detect and track the position of the user’s nose and translate its movements into interaction actions on the device (see Figure 2.1), allowing, for example, the use of the nose as a pointer.

Figure 2.1. FaceMe uses the information from the front camera to detect and track the position of the user’s nose and translate its movements into interaction actions on the device.


2.2 Interface design and development

A version of the SINA system [63], a camera-based head-tracker interface for desktop environments, has been adapted and optimized for mobile devices (the software was developed for iOS).

To design the interface, the design recommendations for camera-based head-controlled interfaces listed by Manresa-Yee et al. [45] were followed. Manresa-Yee et al. [45] summarize design decisions for any interface based on a head-tracker and the different approaches researchers have used in desktop computers.

The developed camera-based interface is based on facial feature tracking instead of tracking the overall head or face. The selected facial feature region is the nose, because this region has specific characteristics (i.e., distinction, uniqueness, invariance, and stability) to allow tracking, it is not occluded by facial hair or glasses, and it is always visible while the user is interacting with the mobile device (even with the head rotated).

An overview of the interface design is depicted in Figure 2.2. The process is divided into two stages: the User detection stage and the Tracking stage.

Figure 2.2. Camera-based interface design.

The User detection stage is responsible for processing the initial frames from the camera to detect the user’s facial features to track. Once these facial features are detected, the Tracking stage performs their tracking and filtering. Finally, the average of all the features, i.e., the nose point, is sent to a transfer function, which uses the tracking information and the device’s characteristics (e.g., screen size) to fulfill the requirements of each mobile application (i.e., app).
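The last step of this pipeline can be illustrated with a minimal sketch. The text does not specify the form of the transfer function, so the linear gain mapping below is only an assumption chosen for illustration, and the names `average_point`, `nose_to_pointer`, `reference` and `gain` are hypothetical. The sketch averages the tracked features into a nose point, scales its displacement from a reference position, and clamps the result to the screen.

```python
def average_point(features):
    """Average a list of (x, y) feature positions into a single nose point."""
    xs = [p[0] for p in features]
    ys = [p[1] for p in features]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def nose_to_pointer(nose, reference, screen_size, gain=(8.0, 8.0)):
    """Map a nose point (image pixels) to a pointer position (screen units).

    Assumed mapping: the displacement of the nose from a reference position
    is multiplied by a constant gain and applied from the screen centre.
    """
    width, height = screen_size
    x = width / 2 + gain[0] * (nose[0] - reference[0])
    y = height / 2 + gain[1] * (nose[1] - reference[1])
    # Clamp so that the pointer never leaves the display.
    return (min(max(x, 0.0), width), min(max(y, 0.0), height))
```

With a 320 × 480 screen and a gain of 8, a nose displacement of one image pixel to the right moves the pointer 8 screen units to the right of the centre. A different app could plug in another mapping (for example, one that triggers discrete actions) without changing the tracking stage.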

2.3 User detection (and facial feature selection)

As pointed out earlier, theUser detectionstage is responsible for processing the initial frames to detect the user’s facial features to track. Therefore, the first step is the detection of the main user face.

A fundamental requirement for the interface is that the user (i.e., main user) has to be automatically detected to take control of the interaction with no need for a calibration stage. So, assuming that the user keeps his/her head steady for a predefined number of frames, the system automatically detects the user’s face in the image: the position and width of the image region corresponding to the user’s face (see “Main face region” stage in Figure 2.3).

Figure 2.3. Illustrated theoretical stages for the detection of the main user face.

The user detection process relies on the face detection API integrated into the mobile platform (i.e., iOS), which performs face detection in an optimized way. Even if several people are present in the image (see “Face detection” stage in Figure 2.3), the system considers the main face (the biggest one) to be the user of the system (see “Main face region” stage in Figure 2.3). Finally, we introduce a temporal consistency scheme to avoid false positives and to ensure a steady user action for a proper algorithm initialization (see “User detected” stage in Figure 2.3). The details of the algorithm that detects the main user’s face region in the image are shown in Algorithm 1.


Algorithm 1. Facial detection algorithm carried out in the User detection stage

while frame do
    I_t ← image frame                                        ▷ I_t: camera image frame at time t
    ▷ Select the main face:
    F_{i,t} ← {position_{i,t}, width_{i,t}}                   ▷ F_{i,t}: i-th detected face in I_t
    F_t ← {F_{i,t} | ∀ j ≠ i: width_{i,t} > width_{j,t}}      ▷ F_t: main face selected
    ▷ Temporal consistency:
    ▷ position_t: main face position components in x and y dimensions
    ▷ FD: allowed maximum face displacement (in image pixels)
    ▷ NF: predefined number of frames
    if (|position_t − position_{t−1}| < FD) in NF images then
        F_t ← facial region                                   ▷ F_t: user’s face detected
    end if
end while
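As an illustration of Algorithm 1, the following is a minimal Python sketch, not the iOS implementation used in FaceMe. The `Face` and `UserDetector` names, the per-axis displacement test, and the default values for `max_displacement` (FD) and `required_frames` (NF) are assumptions made for the example; the face list is assumed to come from whatever face detector the platform provides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Face:
    """One detected face: position and width, in image pixels."""
    x: float
    y: float
    width: float


class UserDetector:
    """Selects the widest face and accepts it once it has stayed steady."""

    def __init__(self, max_displacement: float = 10.0, required_frames: int = 5):
        # max_displacement plays the role of FD and required_frames the role
        # of NF in Algorithm 1. Both default values are illustrative assumptions.
        self.max_displacement = max_displacement
        self.required_frames = required_frames
        self._previous: Optional[Face] = None
        self._steady_count = 0

    def update(self, faces: List[Face]) -> Optional[Face]:
        """Process the faces of one frame; return the user's face once detected."""
        if not faces:
            self._previous, self._steady_count = None, 0
            return None
        main = max(faces, key=lambda f: f.width)  # main face: the biggest one
        if self._previous is not None:
            dx = abs(main.x - self._previous.x)
            dy = abs(main.y - self._previous.y)
            if dx < self.max_displacement and dy < self.max_displacement:
                self._steady_count += 1
            else:
                self._steady_count = 0  # the face moved too much: start over
        self._previous = main
        if self._steady_count >= self.required_frames:
            return main
        return None
```

In this sketch the user is reported only after `required_frames` consecutive frame-to-frame comparisons stay below the displacement limit, which is one possible reading of the temporal consistency condition.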

According to the anthropometrical measurements of the human face, the nose region lies approximately in the second third of the facial region. We apply this rule to select the nose image region within the detected facial image region (see “Nose region” step in Figure 2.4) and search it for good facial features to track. With this strategy, the nostrils and the corners of the nose are found as good facial features to track (see “Facial features” step in Figure 2.4).

Figure 2.4. Simulated steps of the User detection stage.

Because the lighting environment can cause shadows, other unstable features can be detected too. As the center of the nose is the ideal point to send to the transfer function, good features should preferably be placed on both sides of the nose and satisfy certain symmetry conditions. Therefore, the initially found features are re-selected to achieve a more robust tracking process: pairs of features that are symmetrical with respect to the vertical axis are selected, as described in Algorithm 2 and shown in Figure 2.5c.

Finally, the nose point is the average of all facial features being tracked, which will be centered on the nose, between the nostrils (see Figure 2.5d).


Algorithm 2. Feature re-selection algorithm for User detection

x_c ← center_x(nose image region)     ▷ x_c: horizontal coordinate of the center of the nose image region
▷ Create two sets of the selected facial features, one for the left facial features, x_L, and another for the right facial features, x_R:
x_L ← {x | x < x_c}                    ▷ Left facial features
x_R ← {x | x > x_c}                    ▷ Right facial features
▷ Select pairs of facial features of each set that are approximately at the same line and compute for each selected pair:
if horizontal distances of each feature to x_c are equal then
    re-select this feature
end if
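The re-selection step and the averaging into the nose point can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function names, the greedy pairing strategy, and the pixel tolerance `tol` (used for both "approximately at the same line" and "equal horizontal distances") are hypothetical choices, since the text does not specify them.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def reselect_symmetric(features: List[Point], x_c: float,
                       tol: float = 2.0) -> List[Point]:
    """Keep left/right feature pairs that mirror each other about x = x_c."""
    left = [p for p in features if p[0] < x_c]
    right = [p for p in features if p[0] > x_c]
    selected: List[Point] = []
    used = set()  # indices of right features that already belong to a pair
    for lx, ly in left:
        for i, (rx, ry) in enumerate(right):
            if i in used:
                continue
            same_line = abs(ly - ry) <= tol                       # similar height
            same_distance = abs((x_c - lx) - (rx - x_c)) <= tol   # mirrored in x
            if same_line and same_distance:
                selected.extend([(lx, ly), (rx, ry)])
                used.add(i)
                break
    return selected


def nose_point(features: List[Point]) -> Point:
    """Average of the selected features (the tracked nose point)."""
    if not features:
        raise ValueError("at least one feature is required")
    n = len(features)
    return (sum(x for x, _ in features) / n, sum(y for _, y in features) / n)
```

Because only mirrored pairs survive, the average of the selected features tends to fall close to the vertical axis of the nose region.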

Figure 2.5. Stages of the real-time nose tracking algorithm: (a) automatic face detection, (b) initial set of features, (c) best feature selection using symmetrical constraints, (d) mean of the selected features: tracked nose point.

As depicted in Figure 2.6 with images from the front camera of the mobile device, the face and facial features detection is stable and robust for different users, light conditions and backgrounds.

Figure 2.6. The system’s functioning with different users, lightings and backgrounds.


2.4 Tracking

As stated earlier, the Tracking stage is responsible for tracking and filtering the user’s facial features. It is important to highlight that, unlike in the User detection stage, the face does not need to be fully visible in the Tracking stage.

To track the facial features, the spatial intensity gradient information of the images is used to find the best image registration [7]. We use a pyramidal implementation of the classical Lucas-Kanade algorithm [10]. Therefore, the tracking algorithm is robust to handle head rotation, scaling or shearing, so the user can move in a flexible way.

However, fast head movements can cause the loss or displacement of the tracked features. As we focus only on the nose region, a feature is discarded when its distance from the average point exceeds a predefined value (see “Filtering of displaced features” step in Figure 2.7). When there are not enough features to track, the User detection stage restarts.
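The filtering of displaced features can be sketched in a few lines of Python. This is a minimal sketch, not the FaceMe code: the function names and the `max_distance` and `min_features` parameters are assumptions, as the text only speaks of "a predefined value" and "not enough features".

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def filter_displaced(features: List[Point], max_distance: float) -> List[Point]:
    """Drop features farther than max_distance from the average point."""
    if not features:
        return []
    cx = sum(x for x, _ in features) / len(features)
    cy = sum(y for _, y in features) / len(features)
    return [(x, y) for x, y in features
            if math.hypot(x - cx, y - cy) <= max_distance]


def needs_redetection(features: List[Point], min_features: int) -> bool:
    """True when too few features remain and User detection should restart."""
    return len(features) < min_features
```

A caller would apply `filter_displaced` after each tracking step and fall back to the User detection stage when `needs_redetection` returns `True`.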

Figure 2.7. Simulated steps of the Tracking stage.

To make the Tracking stage more robust, we recover and update the nose features used for tracking. We follow a typical Bayesian approach to sensor fusion, combining measurements in the representation of a posterior probability. In this case, for each new frame, we combine the tracked nose features with newly detected features (see “Fusion” step in Figure 2.7). To this end, when the user’s face is looking up towards the camera, we search for new features to track in the nose region (see “New detected features” in Figure 2.7). To update the set of tracked features, a probability of re-initialization is used to include the newly detected features in the current nose feature set. In our case, this probability value has been set to 0.05 as a result of the initial prototype testing [47]. The user does not participate actively in this phase; he or she will just feel a subtle readjustment of the point onto the nose.


Then, we apply a constant-velocity Kalman filter to smooth the positions [8] (see blue point in Figure 2.6).
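To make the smoothing step concrete, the sketch below shows a textbook constant-velocity Kalman filter for a single coordinate; two instances (one for x, one for y) would smooth the nose point. It is an illustration only: the class name, the noise parameters `process_noise` and `measurement_noise`, the initial covariance, and the one-frame time step are assumptions, as the thesis does not report the filter parameters.

```python
class ConstantVelocityKalman1D:
    """Kalman filter with state (position, velocity) and position measurements."""

    def __init__(self, position: float, process_noise: float = 1e-2,
                 measurement_noise: float = 1.0, dt: float = 1.0):
        self.x = position  # estimated position
        self.v = 0.0       # estimated velocity
        # Symmetric 2x2 covariance [[p00, p01], [p01, p11]] (assumed initial value)
        self.p00, self.p01, self.p11 = 1.0, 0.0, 1.0
        self.q = process_noise
        self.r = measurement_noise
        self.dt = dt

    def update(self, z: float) -> float:
        """Predict one step ahead, correct with measurement z, return position."""
        dt = self.dt
        # Predict: x' = F x and P' = F P F^T + q I, with F = [[1, dt], [0, 1]]
        x_pred = self.x + dt * self.v
        v_pred = self.v
        p00 = self.p00 + dt * (2.0 * self.p01 + dt * self.p11) + self.q
        p01 = self.p01 + dt * self.p11
        p11 = self.p11 + self.q
        # Correct: measurement matrix H = [1, 0]
        innovation = z - x_pred
        s = p00 + self.r           # innovation variance
        k0, k1 = p00 / s, p01 / s  # Kalman gain
        self.x = x_pred + k0 * innovation
        self.v = v_pred + k1 * innovation
        # Covariance update: P = (I - K H) P'
        self.p00 = (1.0 - k0) * p00
        self.p01 = (1.0 - k0) * p01
        self.p11 = p11 - k1 * p01
        return self.x
```

A larger `measurement_noise` relative to `process_noise` gives a smoother but more sluggish cursor; the right balance would have to be tuned on the device.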

The proposed Tracking stage is able to run in real time on current mobile devices with different CPU platforms. Table 2.1 shows the processing times in milliseconds (ms) of the head-tracker for different mobile devices when operating with the current image resolution (144×192). In addition, a performance test with different image resolutions on an Apple iPhone 6 Plus platform (A8: 64 bits, 1.4 GHz ARMv8-A dual core) is included for comparison in Table 2.2.

Table 2.1. Average processing time of the head-tracker for different mobile devices.

Device            CPU                                            Processing time (ms)
Apple iPad 2      A5: 32 bits, 1 GHz ARM Cortex-A9 dual core     251
Apple iPhone 5s   A7: 64 bits, 1.3 GHz ARMv8-A dual core         37
Apple iPad Air    A7: 64 bits, 1.4 GHz ARMv8-A dual core         36
Apple iPhone 6    A8: 64 bits, 1.4 GHz ARMv8-A dual core         31

Table 2.2. Average processing time of the head-tracker with different resolutions on an Apple iPhone 6 Plus mobile device with a CPU A8: 64 bits, 1.4 GHz ARMv8-A dual core.

Height (px)   Width (px)   Processing time (ms)
192           144          31
352           288          54
480           360          68
640           480          89

2.5 Transfer function

As stated earlier, the mobile application using the head-tracker decides how to proceed with the nose point given by the head-tracker. The average of all the features, i.e., the nose point, is sent to a transfer function in the app, which is responsible for translating the head-tracker information into a form that is useful to the app.

The information provided by a head-tracker could be used in different ways. In the next two sections, we will describe the transfer functions for the two most common usages: using the interface as a pointing device and using the interface for head gesture recognition. The transfer functions described are used later in the following chapters of this work.


2.5.1 Using the interface as a pointing device

The clearest usage for a head-tracker is pointing, that is, to use it as a pointing device. A pointing device is an input device that allows the user to input spatial data to a device screen, for example, by controlling the movement of a cursor on the screen.

In this case, the nose point is mapped to the device screen to use the nose as a pointer. That is, a virtual object (i.e., a circle cursor) is positioned on the mobile screen according to the movements of the user’s head (see Figure 2.8).

Figure 2.8. The transfer function processes the user’s head motion to translate it to a position in the mobile’s screen.

To translate the tracking data from the head-tracker interface to the device screen, a relative positioning approach is used. In this approach, the interface reports the change in coordinates: each location is calculated relative to the previous location rather than to a fixed location. That is, the transfer function translates the change in coordinates in the camera image into a change in coordinates on the device screen.


Let us define $u_t = (u_{x,t}, u_{y,t})$ as the user’s nose movement at every time stamp $t$, i.e., the change in coordinates of the nose point in the camera image:

$$u_t = n_t - n_{t-1}, \tag{2.1}$$

where $n_t = (n_{x,t}, n_{y,t})$ corresponds to the nose point in the camera image at every time stamp $t$.

The transfer function maps the user’s nose movement, $u_t$, to a device screen position, $p_t = (p_{x,t}, p_{y,t})$, at every time stamp $t$.

To ensure that users can reach all screen positions in a comfortable way with their range of head movement, a test to measure the users’ movement range in the camera image plane should be conducted, and a scale factor, $s = (s_x, s_y)$, should be considered. For example, for a source image of 192×144 pixels (low image resolution of an Apple iPhone 5), an informal test with five volunteers obtained a value of 55 pixels in both horizontal and vertical movement. Therefore, for that case, the scale factor is calculated as follows:

$$s_x = \frac{w}{55}, \quad s_y = \frac{h}{55}, \tag{2.2}$$

where $w$ and $h$ correspond to the device display resolution (width, height).

Therefore, the actual position of the cursor, $c_t = (c_{x,t}, c_{y,t})$, at every time stamp $t$ is calculated as follows:

$$c_t = c_{t-1} + u_t \cdot s \cdot \mathit{gainFactor} \tag{2.3}$$

Further, a gain factor is included, that is, the amount of movement on the device in response to a unit amount of movement of the user in the camera image. In that sense, the gain factor can be interpreted as the velocity of the cursor, in pixels per frame.
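Equations 2.1 to 2.3 can be put together in a short Python sketch. This is a sketch under assumptions rather than the FaceMe implementation: the function names are hypothetical, the product with the scale factor is taken component-wise, and the clamping of the cursor to the screen bounds is an added assumption that the text does not mention. Any mirroring between camera image and screen is not handled here.

```python
from typing import Tuple

Vec = Tuple[float, float]


def scale_factor(screen_w: float, screen_h: float,
                 head_range: float = 55.0) -> Vec:
    """Equation 2.2: screen size divided by the measured head range (pixels)."""
    return (screen_w / head_range, screen_h / head_range)


def next_cursor(cursor: Vec, nose: Vec, prev_nose: Vec, scale: Vec,
                gain: float, screen: Vec) -> Vec:
    """Relative positioning: move the cursor by the scaled nose displacement."""
    # Equation 2.1: nose movement between two consecutive frames
    u = (nose[0] - prev_nose[0], nose[1] - prev_nose[1])
    # Equation 2.3, applied per component
    x = cursor[0] + u[0] * scale[0] * gain
    y = cursor[1] + u[1] * scale[1] * gain
    # Assumption: keep the cursor inside the screen
    x = min(max(x, 0.0), screen[0])
    y = min(max(y, 0.0), screen[1])
    return (x, y)
```

With a 640×1136 px display and the 55-pixel range from the informal test, `scale_factor(640, 1136)` yields the per-axis factors of Equation 2.2, which are then passed to `next_cursor` on every frame.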

Usages of the interface as a pointing device are explored in Chapter 3, Chapter 4, Chapter 5, and Chapter 6.


2.5.2 Using the interface for gesture recognition

Another common usage for a head-tracker is head-gesture recognition. In this case, the sequence of nose points returned by the head-tracker is analyzed to recognize a gesture.

For example, a simple transfer function could detect the horizontal movement of the user’s head in both directions: left and right. The recognized gesture could then trigger some interaction, such as controlling the horizontal direction of a moving virtual object. In this case, the moving right head-gesture is recognized when a displacement to the left is detected in the points returned by the head-tracker (see Figure 2.9), and a moving left head-gesture is recognized if the displacement is in the opposite direction.

Figure 2.9. The transfer function processes the user’s nose horizontal displacement to detect a moving right head-gesture.

Then, when a horizontal head-gesture is detected, the transfer function could move a virtual object to its right or to its left by applying, for example, a horizontal impulse to a physics body (the moving virtual object) to change its linear velocity without changing its angular velocity, following Newton’s second law. As we want to impart a momentum in the $x$-dimension proportional to the horizontal displacement of the head performed by the user, a gain factor is included and our momentum is given by Equation 2.4:

$$v_x \cdot \mathit{gainFactor}, \tag{2.4}$$

where the velocity ($v_x$, in pixels per frame) corresponds to the difference in the $x$-coordinates of the nose points returned by the system in two consecutive frames.
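A horizontal gesture detector of this kind can be sketched as follows. The function names and the `threshold` parameter (a minimum displacement, in pixels per frame, below which no gesture is reported) are assumptions added for illustration; the text does not state how small displacements are handled.

```python
from typing import Optional


def horizontal_gesture(prev_x: float, x: float,
                       threshold: float = 2.0) -> Optional[str]:
    """Classify the nose displacement between two consecutive frames."""
    v_x = x - prev_x
    if v_x <= -threshold:
        # A displacement to the left in the image is a moving right head-gesture
        return "right"
    if v_x >= threshold:
        return "left"
    return None  # displacement too small to count as a gesture


def impulse_x(prev_x: float, x: float, gain: float) -> float:
    """Equation 2.4: horizontal velocity times the gain factor."""
    return (x - prev_x) * gain
```

How the sign of the impulse maps to an on-screen direction is left to the app, since it depends on the coordinate conventions of the physics engine in use.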

Usage of the interface for gesture recognition is explored in Chapter 7.


2.6 Conclusion

In this chapter, we have presented FaceMe, a new head-tracker interface for mobile devices. FaceMe only uses the information of the front camera to detect and track the user’s nose position and translate its movements into interacting actions to the device.

Basically, the system uses the front-camera data to detect features in the nose region (User detection stage), to track a set of points (Tracking stage), and to return the average point as the nose position (i.e., the head position of the user). Then, the app using the head-tracker will decide how to proceed with the given nose position (Transfer function stage).

FaceMe has been shown to be stable and robust for different users, light conditions, and backgrounds. Therefore, it is a valid case study to exemplify the evaluation of new vision-based interfaces on mobile devices.

FaceMe was developed in iOS. Hence, this version is only compatible with iOS devices. Even though this limits the range of devices where FaceMe can be tested (iOS devices), the algorithm presented could be directly ported to other mobile platforms, for example, Android OS. In fact, as future work, porting FaceMe to other mobile operating systems would extend the external validity of the results obtained in this work.


3 Evaluating the human performance of FaceMe


The best things come in small packages.

—Proverb

In Chapter 2, we have presented FaceMe, a new head-tracker interface for mobile devices. Even though research mostly focuses on the development of new interfaces and their integration into prototypes, this is not enough. Whenever a new interface is developed, the next step should be its human performance analysis in order to assure that it is a valid interface.

In this chapter, we present the FaceMe experiment, the first evaluation of FaceMe from the point of view of Human-Computer Interaction.

3.1 Introduction

Previous works on head-tracker interfaces focused on the development of new access methods instead of analyzing the user’s performance. Therefore, results regarding the user’s point of view are not sufficiently tested [43].

In the field of mobile devices, due to the particularities of small displays, a main aspect that an access method should accomplish is that all regions of the device screen should be accessible through the access method.

In this chapter, we present the first human performance evaluation of FaceMe. Due to the particularities of mobile devices (e.g., small displays), the main motivation of the study was testing target selection over the entire display surface to determine if all regions of the device screen were accessible for users.

This experiment, to the best of our knowledge, is the first attempt to evaluate human performance of a head-tracker interface on mobile devices. There is research about head-trackers’ evaluation on computers [56, 13] but this research has not been extended to mobile devices.


3.2 The FaceMe experiment

The FaceMe experiment investigates critical design factors of the developed mobile head-tracking interface, such as target size and target location, with mainly two goals. The primary goal is to determine if all regions of the device screen are accessible for users through the interface. The secondary goal is to provide design recommendations for those designers and developers using the interface.

The FaceMe experiment uses a mobile head-tracking interface to investigate the effect of device orientation (portrait, landscape), gain (1.0, 1.5), and target width (88 px, 176 px, 212 px). The evaluation is limited to selection accuracy, cursor velocity, and selection errors.

3.2.1 Participants

Nineteen unpaid participants (four females), aged 23 to 69, were recruited from the local town and university campus. The average age was 38.2 years (SD = 14.1). None of the participants had previous experience with head-tracking interfaces.

3.2.2 Apparatus and experiment task

The experiment was conducted on an Apple iPhone 5 with a resolution of 640×1136 px and a pixel density of 326 ppi. This corresponds to a resolution of 320×568 Apple points.¹

The experiment involved a point-select task that required positioning a circle cursor inside a square target (see Figure 3.1).

Figure 3.1. Example of a target condition (W = 88 px, orientation = landscape) with annotations and the procedure instructions (task belonging to the practice sequence).

¹ Apple’s point (pt) is an abstract unit that covers two pixels on retina devices. On the iPhone 5, one point equals 1/163 inch (Note: 1 mm ≈ 6 pt).


User input combined mobile head-tracking for pointing and touch for selection. Selection occurred by tapping anywhere on the display surface with a thumb. The target was highlighted in green when the center of the cursor was inside the target; a selection performed in these conditions was considered successful.

The experiment sought to determine if all regions of the device screen were accessible for users. For this purpose, the display surface was divided into 15 regions of approximately 213×227 px (see Figure 3.2). The targets were centered inside the regions.

Figure 3.2. Regions of the display surface in portrait and landscape orientation. (Numbers added to identify regions.)

The task was implemented in both portrait and landscape (left) orientations.

A sequence of trials consisted of 15 target selections, one for each of the 15 regions of the display, presented randomly and without replacement. Upon selection, a new target appeared centered inside one of the remaining regions. Selections proceeded until all 15 regions of the display were used as target centers (see Figure 3.3).

Figure 3.3. Random example of the first four trials of a possible sequence of trials with targets placed inside their regions (selection order: 7, 9, 4, 11) of a target condition (W = 88 px, orientation = landscape). Note that only one target is visible for each trial (numbered regions and already selected targets added for clarification purposes).


According to the iOS Human Interface Guidelines [3], the optimal size of a tappable UI element on the iPhone is 44×44 pt, which is equivalent to the minimum level chosen for target width: 88×88 px. According to research focused on the optimal size of targets on touch screens, targets should have a minimum size of 12 mm [64], giving approximately our medium level chosen for target width: 176×176 px.

The maximum level for target width was limited by the size of the screen regions (approximately 213×227 px).
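The relation between points, pixels, and physical size on this display can be checked with a small conversion helper. This Python sketch only uses the values stated in the text (326 ppi, two pixels per point); the function and constant names are hypothetical.

```python
PPI = 326            # iPhone 5 pixel density, as stated in the text
MM_PER_INCH = 25.4
PX_PER_PT = 2        # one Apple point covers two pixels on this device


def pt_to_px(pt: float) -> float:
    """Convert Apple points to pixels."""
    return pt * PX_PER_PT


def px_to_mm(px: float) -> float:
    """Convert pixels to millimetres on a 326 ppi display."""
    return px / PPI * MM_PER_INCH


def mm_to_px(mm: float) -> float:
    """Convert millimetres to pixels on a 326 ppi display."""
    return mm / MM_PER_INCH * PPI
```

For instance, `pt_to_px(44)` recovers the smallest target width, and `px_to_mm(176)` gives the physical width of the medium target on this display.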

For each user session, all twelve conditions were used and presented in random order until all trials were completed.

To adapt to each new condition, participants were required to correctly select a practice series of three targets to start the sequence of trials (see Figure 3.4). The practice series was not registered as experiment data.

Figure 3.4. Example of the practice sequence for the target condition.

3.2.3 Procedure

After signing a consent form, participants were briefed on the goals of the experiment and were instructed to sit and hold the device (in the orientation indicated by the software) in a comfortable position (see Figure 3.5). The only requirement was that their entire face was visible by the front camera of the device.

The experiment task was demonstrated to participants, after which they did a few practice sequences. They were instructed to move the cursor by holding the device still and moving their head. Participants were asked to select targets as quickly and as close to the center as possible. They were allowed to rest as needed between sequences. Testing lasted about 15 minutes per participant.


Figure 3.5. Participants performing the experiment: holding the device in (a) portrait orientation and in (b) landscape orientation. Moving the cursor by moving the head and selection by tapping anywhere on the display surface with a thumb.

3.2.4 Design

The experiment was fully within-subjects with the following independent variables and levels:

• Orientation: portrait, landscape

• Gain: 1.0, 1.5

• Width: 88, 176, 212 px

The dependent variables were selection error, selection accuracy, and velocity.

The total number of trials was 19 Participants × 2 Orientations × 2 Gains × 3 Widths × 15 Trials = 3420.
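The structure of one user session (twelve conditions in random order, each with 15 target regions drawn without replacement) and the resulting trial count can be reproduced with a short Python sketch. The names below are hypothetical, and the sketch omits the practice series, which was not registered as experiment data.

```python
import itertools
import random

ORIENTATIONS = ("portrait", "landscape")
GAINS = (1.0, 1.5)
WIDTHS_PX = (88, 176, 212)
REGIONS = list(range(1, 16))  # the 15 screen regions
PARTICIPANTS = 19


def session_plan(rng: random.Random):
    """Return the shuffled conditions, each with a random order of regions."""
    conditions = list(itertools.product(ORIENTATIONS, GAINS, WIDTHS_PX))
    rng.shuffle(conditions)  # conditions presented in random order
    # Regions are presented randomly and without replacement
    return [(cond, rng.sample(REGIONS, len(REGIONS))) for cond in conditions]


plan = session_plan(random.Random(0))
trials_per_participant = sum(len(regions) for _, regions in plan)
total_trials = PARTICIPANTS * trials_per_participant
```

Here `total_trials` matches the product given above, since each of the 12 conditions contributes 15 trials per participant.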

3.3 Results

In this section, results are given for accuracy, velocity, and error rate.


3.3.1 Accuracy

The accuracy measure collected in the experiment was the Euclidean distance between the effective user selection and the center of the target.
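As a minimal illustration of this measure, the Python sketch below computes the Euclidean distance for one selection and checks whether the selection counts as successful, i.e., whether the cursor center lies inside the square target. The function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def selection_accuracy(selection: Point, target_center: Point) -> float:
    """Euclidean distance (px) between the selection and the target center."""
    return math.hypot(selection[0] - target_center[0],
                      selection[1] - target_center[1])


def is_successful(selection: Point, target_center: Point, width: float) -> bool:
    """True when the selection point lies inside the square target."""
    half = width / 2.0
    return (abs(selection[0] - target_center[0]) <= half
            and abs(selection[1] - target_center[1]) <= half)
```

Note that a lower value of `selection_accuracy` means a more accurate selection, which is how the accuracy results below should be read.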

The mean accuracy per task over the entire experiment was 64 pixels (i.e., 32 points).

As expected, the effect of target width on accuracy was statistically significant ($F_{2,36} = 55.5$, $p < .001$) and, unsurprisingly, target width = 88 px produced the best accuracy (50.24 pixels), with target width = 212 px the worst (72.74 pixels) (see Figure 3.6). Pairwise comparisons with Bonferroni correction show that target widths 176 px and 212 px are not significantly different ($p = .45$, above the corrected threshold of $.05/3 \approx .0166$).

Figure 3.6. Accuracy by target width and gain. Error bars show 95% CI.

The effect of gain on accuracy was also statistically significant ($F_{1,18} = 18.8$, $p < .001$). The mean for gain = 1 was 14.38% lower than the mean of 68.96 px for gain = 1.5.

The effect of interface orientation on accuracy was not statistically significant ($F_{1,18} = 1.32$, $p > .05$).

3.3.2 Velocity

Movement time was not analyzed because of the different task distances due to the randomly presented target positions. We analyzed velocity instead, calculated as effective distance of the task divided by the movement time.
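The velocity measure can be illustrated with a short Python sketch. The text does not define "effective distance" precisely, so the sketch assumes it is the straight-line distance from the cursor position at the start of the trial to the selection point; the function name is hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def task_velocity(start: Point, selection: Point, movement_time_s: float) -> float:
    """Velocity (px/s): effective distance divided by movement time."""
    if movement_time_s <= 0:
        raise ValueError("movement time must be positive")
    distance = math.hypot(selection[0] - start[0], selection[1] - start[1])
    return distance / movement_time_s
```

Normalizing by distance in this way makes trials with different target positions comparable, which is the reason given above for not analyzing movement time directly.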

The mean velocity per task over the entire experiment was 217.22 px/s.


As expected, the effect of target width on velocity was statistically significant ($F_{2,36} = 78.5$, $p < .001$). Target width = 88 px produced the lowest velocity (163.80 px/s), with target width = 212 px the highest (249.92 px/s). Pairwise comparisons with Bonferroni correction show that target widths 176 px and 212 px are not significantly different ($p = .064 > .0166$). The effect of gain on velocity was also statistically significant ($F_{1,18} = 10.1$, $p < .01$). Gain = 1.5 produced the highest velocity (226.02 px/s), with gain = 1 the lowest (208.44 px/s).

Note that for target width = 88 px the mean velocity achieved with gain = 1 is higher than with gain = 1.5 (see Figure 3.7). This effect could be explained by the greater need for target re-entries with gain = 1.5, since the final selection is more difficult for smaller targets and higher gains.

Figure 3.7. Velocity by target width and gain. Error bars show 95% CI.

The effect of interface orientation on velocity was not statistically significant ($F_{1,18} = 0.18$, ns).

3.3.3 Error rate and target position

The error rate, calculated as erroneous selections per total number of selections, is used to test whether all the regions of the screen are accessible in both landscape and portrait orientation. The mean per task over the entire experiment was 0.22 (i.e., 22% of erroneous selections).

With gain = 1, all the regions of the screen were accessible with an acceptable error rate, even with the smallest target width. An average correct selection rate of at least 0.83 is achieved for target width = 176 px (0.83) and target width = 212 px (0.89). An average correct selection rate of 0.7 is achieved for target width = 88 px.


From the results shown in Figure 3.8, there seems to be no relation between correct selection rate and target location. In addition, it is confirmed that interface orientation has no effect with respect to target location.

Figure 3.8. Correct selection rate by target width and interface orientation with gain condition = 1. Interface orientation: (a) portrait orientation, (b) landscape orientation.

3.4 Discussion

We observed that all the participants (without previous experience with head-tracker interfaces) were able to operate with the interface successfully and without specific indications, so we can affirm that it is an intuitive and natural interface.

A particular feature of mobile devices is that they can be held in different orientations. In practice, this feature implies a different location of the front camera for each orientation and, therefore, a different perspective of the face and a change in how its movements are translated. However, despite these practical facts, our user study confirms that the device orientation has no effect on the use of the head-tracker interface.

As expected, the effect of gain is significant on accuracy, error rate, and velocity. Although a higher mean velocity is reached with gain = 1.5 in most cases (8.43% faster), gain = 1 produces a higher accuracy (14.38% more accurate) and a lower error rate (21.68% lower). Therefore, we recommend the use of gain = 1.


As we observed with the accuracy results (the distance from the effective selection to the center increases with the target size), the participants were more concerned about selecting correctly than accurately. According to the mean accuracy per task obtained (64 px), the minimum target width to obtain a good error rate would be greater than 128 pixels.

As expected, the effect of target width is important. However, a post-hoc analysis shows no significant differences in accuracy and velocity between widths 176 px and 212 px. Therefore, and also given the limited size of mobile device screens, we recommend a target size larger than 176 px.

3.5 Conclusion

FaceMe, the developed head-tracker interface, has been evaluated from the point of view of mobile HCI, contributing with the first attempt² of human performance evaluation of a head-tracker interface on mobile devices.

As this study is the first evaluation of the interface, it has been focused on ensuring that all regions of the device screen are accessible through the interface in order to conclude that the interface is valid.

Based on the obtained results, we can conclude that FaceMe is a valid and an intuitive interface to interact with mobile devices. Besides, two design recommendations are made for those designers and developers using the developed interface.

However, due to the particular motivation of our research, i.e., evaluating whether all regions of the device screen were accessible through the head-tracker interface, our evaluation methodologies are ad hoc. Therefore, our experimental procedures, while internally valid, are not standard, and this hinders comparisons between studies. So, although our experimental procedures allow us to answer our particular research question, we cannot compare our results with other studies present in the literature.

That is to say, we know that FaceMe is a valid interface for mobile devices; but, how valid is it? Is FaceMe as valid as other mobile input devices: e.g., tilt input? Hence, once we have ensured that FaceMe is a valid interface for mobile devices, further formal studies have to be conducted in order to evaluate it and allow its comparison with other input methods.

² Since the publication derived from this work [53], other human performance analyses of mobile head-trackers have been published [16, 1].

