Analysis of mobile eye-tracking studies
Eye-Tracking Quickstart
Why do we move our eyes?
The eyes only perceive a small part of the world in high acuity: 1.3° in the foveola to 5° in the fovea.
The eyes are permanently on the move to scan our environment.
Where? When?
Fixation, Point-of-Regard, Scanpath (Alfred L. Yarbus, 1967)
What do we process during fixations?
Fixation
• Express Fixations < 100 ms
• identification of known stimuli (e.g. brands, signs)
• Image Processing 100 – 300 ms
• processing of emotions and image-based information
• Reading > 300 ms
• analysis and understanding of texts and complex structures
How long? What?
Additional relevant eye movements
• Smooth Pursuit
• following moving targets
• Spatial Perception
• Accommodation
• adapting the lens to different depths of focus
• Vergence Movements
• bringing the point of regard onto corresponding retinal areas in both eyes
How do we measure eye movements?
Buswell, 1935
Taken from Joos et al. 2005
Purkinje eye tracker, source unknown
Detecting eye orientation
• most common method today based on video camera in the infrared domain
• eye is illuminated using infrared LED
• detecting the pupil using methods of computer vision
• for stabilization, the reflection of the LED on the lens is also detected
Measurements provide the orientation of the eye and the size of the pupil (see the pupil-detection sketch below).
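To make the pupil-detection step concrete, here is a minimal sketch in Python with OpenCV (my illustration; the slides do not name a library). It assumes a grayscale infrared eye image in which the pupil is the darkest blob; production trackers additionally track the corneal reflection (glint) for stabilization, as noted above.

import cv2
import numpy as np

def detect_pupil(ir_frame: np.ndarray):
    """Estimate pupil center and size in a grayscale infrared eye image.

    Minimal sketch: under IR illumination the pupil appears as the
    darkest blob. Real systems add glint tracking and model fitting.
    """
    blurred = cv2.GaussianBlur(ir_frame, (7, 7), 0)
    # Dark-pupil thresholding: pupil pixels are near-black under IR.
    _, mask = cv2.threshold(blurred, 40, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    pupil = max(contours, key=cv2.contourArea)   # largest dark blob
    if len(pupil) < 5:                           # fitEllipse needs >= 5 points
        return None
    (cx, cy), (w, h), angle = cv2.fitEllipse(pupil)
    return (cx, cy), (w + h) / 4.0               # center and mean radius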
Mapping eye orientation to the computer screen
• Goal: get fixation information in terms of screen coordinates ⇒ plane of analysis (2D)
• Requirement: head/eye position relative to eye tracker + eye tracker position relative to screen
• two solutions
• fix the head using a chinrest or bitebar
• use computer vision again to track the eye position
• mapping from eye position and orientation to screen (analysis plane) often done using an explicit calibration (see the calibration sketch below)
Modern remote eye-tracking system attached to a laptop.
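The explicit calibration can be illustrated with a least-squares fit. This is a sketch under the assumption of a common second-order polynomial mapping from measured eye features to screen coordinates; commercial trackers use their own, typically more elaborate, models.

import numpy as np

def _design(eye_xy):
    # Constant, linear, and quadratic terms of the measured eye features.
    x, y = eye_xy[:, 0], eye_xy[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])

def fit_calibration(eye_xy, screen_xy):
    """Fit a 2nd-order polynomial mapping eye features -> screen coords.

    eye_xy: (N, 2) measured eye features (e.g. pupil-glint vectors)
    screen_xy: (N, 2) known screen positions of the calibration targets
    Returns coefficients C such that screen ~= design(eye) @ C.
    """
    C, *_ = np.linalg.lstsq(_design(eye_xy), screen_xy, rcond=None)
    return C

def map_to_screen(eye_xy, C):
    """Map new eye measurements to screen coordinates."""
    return _design(eye_xy) @ C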
How do we analyse eye movements?
Scanpath Analysis
How to create a scanpath:
• map orientation and position of the eye to coordinates of the target stimuli to get the fixated point (short: “fixation”)
• e.g. desktop pixels
• map fixation duration to radius, draw a circle around the fixated point
• connect subsequent fixations by straight lines
• in addition, fixations are sometimes numbered (see the sketch below)
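A minimal sketch of this recipe in Python with matplotlib, using hypothetical fixation data (my illustration, not the tooling of the original studies):

import matplotlib.pyplot as plt

# Hypothetical fixation data: (x_px, y_px, duration_ms)
fixations = [(120, 80, 250), (340, 90, 480), (300, 260, 150), (110, 240, 620)]

xs = [f[0] for f in fixations]
ys = [f[1] for f in fixations]
sizes = [f[2] for f in fixations]                  # duration -> marker area

plt.plot(xs, ys, "-", color="gray", zorder=1)      # saccades as straight lines
plt.scatter(xs, ys, s=sizes, alpha=0.5, zorder=2)  # fixations as circles
for i, (x, y, _) in enumerate(fixations, start=1):
    plt.annotate(str(i), (x, y))                   # number the fixations
plt.gca().invert_yaxis()   # screen coordinates: origin at the top left
plt.title("Scanpath")
plt.show()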
Scanpath Analysis
What do we learn?
• closer investigation of a single individual
• visualization of the viewing process
• Important indices
• number of fixations, duration of fixations, saccade amplitudes
• re-fixations: going back to previously fixated areas
• sub-path patterns
• Example research topics
• text understanding
• predicting next fixation target, e.g. syllables in reading
Region Analysis
How to create a region analysis:
• define regions on the target stimuli and label them
• rectangles in the simplest case, but polygons are also possible
• aggregate fixations within each region and create per-region statistics
• e.g. min/max duration, median duration, number of fixations, total duration
• connect regions with directed arrows according to the frequency of their transitions (see the sketch below)
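A sketch of the per-region aggregation and transition counting, with hypothetical rectangles and fixation data; the region names and coordinates are illustrative assumptions:

from collections import Counter, defaultdict

# Hypothetical input: fixations as (x, y, duration_ms),
# regions as name -> (x_min, y_min, x_max, y_max) rectangles.
fixations = [(120, 80, 250), (340, 90, 480), (300, 260, 150)]
regions = {"headline": (0, 0, 400, 150), "image": (0, 150, 400, 400)}

def region_of(x, y):
    for name, (x0, y0, x1, y1) in regions.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return None   # fixation outside all labelled regions

durations = defaultdict(list)
transitions = Counter()
previous = None
for x, y, dur in fixations:
    r = region_of(x, y)
    if r is not None:
        durations[r].append(dur)
        if previous is not None and previous != r:
            transitions[(previous, r)] += 1   # directed region transition
        previous = r

stats = {r: {"n": len(d), "total_ms": sum(d), "max_ms": max(d)}
         for r, d in durations.items()}
print(stats, dict(transitions))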
Region Analysis
What do we learn?
• investigation of groups
• coarse visualization of the viewing process
• Important indices
• number of fixations, duration of fixations
• transitions, transition probability
• Example research topics
• interaction between text and images
Analysis of Attention Maps / Heatmaps
How to create an attention map:
• map orientation and position of the eye to coordinates of target stimuli to get fixated point
• map fixation duration to attention level
• spread around the fixated point according to the area of high acuity
• typically modelled as a Gaussian distribution
• map the accumulated attention level to a color
• e.g. heat color ramp ⇒ Heatmaps (see the sketch below)
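A sketch of the accumulation step; the Gaussian width standing in for the area of high acuity is an assumed value that in practice depends on viewing distance and screen resolution:

import numpy as np

def attention_map(fixations, width, height, sigma_px=40.0):
    """Accumulate duration-weighted 2D Gaussians into an attention map.

    fixations: iterable of (x_px, y_px, duration_ms)
    sigma_px: spread approximating the area of high acuity (assumed value)
    """
    ys, xs = np.mgrid[0:height, 0:width]
    amap = np.zeros((height, width))
    for x, y, dur in fixations:
        d2 = (xs - x) ** 2 + (ys - y) ** 2
        amap += dur * np.exp(-d2 / (2.0 * sigma_px**2))
    peak = amap.max()
    # Normalize and hand off to a heat color ramp for display.
    return amap / peak if peak > 0 else amap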
Analysis of Attention Maps / Heatmaps
What do we learn?
• investigation of groups
• taking into account area of high acuity
• Important indices
• duration of fixations
• areas of low/high attention level
• Example research topics
• saliency mapping (e.g. comparing with computer vision)
• quantitative analysis of designs
What do these approaches have in common?
• In all approaches, the human establishes the link between “pixels of attention” and the attended content
• Implicitly for scanpaths and attention maps, based on spatial co-occurrence
• Explicitly for region-based analysis
• Process
• Gaze Position & Orientation (2.5D)
⇒ Screen Coordinates (2D)
⇒ Content (2D)
What do we hope to get from following eye movements?
Speech processing – Visual World Paradigm
• Idea:
• Based on the eye movements of a listener during a verbal instruction, one can draw conclusions about the processing of different language structures in the brain (e.g. preferences, sequences, etc.)
• Method of empirical research in psycholinguistics: the Visual World Paradigm
(Tanenhaus et al. (1995). Integration of Visual and Linguistic Information in Spoken Language Comprehension. Science, 268, 1632–1634)
Weiß, P., Pfeiffer, T., Eikmeyer, H.-J., & Rickheit, G. (2006). Processing Instructions. In G. Rickheit & I. Wachsmuth (Eds.), Situated Communication (pp. 31–76). Berlin: Mouton de Gruyter.
Detecting perceptual biases
Assessing cognitive processes - Detecting Search Strategies
Gaze Analysis
Pfeiffer, J., Pfeiffer, T., & Meißner, M. (in press). Towards Attentive In-Store Recommender Systems: Detecting Exploratory vs. Goal-oriented Decisions. Proceedings of the SIGDSS 2013 Pre-ICIS Workshop – Reshaping Society through Analytics, Collaboration, and Decision Support: Role of BI and Social Media.
Assessing level of expertise
One of the observed persons is the expert, the other a trainee.
Which video shows the recordings of the expert?
Cooperation with Prof. Dr. Jörg Thomaschewski, HS Emden/Leer
HCI: Interaction between Speech, Gestures, Gaze and Environment
• Using motion capturing and eye tracking, we measured gaze and gestures during the communication of references
• Result: the best approximation of the pointing direction was achieved by taking the gaze direction of the dominant eye into account (Pfeiffer, 2011)
Implications for the addressee
Pfeiffer, T. (2011). Understanding Multimodal Deixis with Gaze and Gesture in Conversational Interfaces (Berichte aus der Informatik). Aachen, Germany: Shaker Verlag.
Grounding with the Eyes: Joint Attention
• If interaction partners deliberately direct their attention towards a target, this is called Joint Attention.
• In establishing Joint Attention, it is important in which sequence the gaze alternates between the target and the interlocutor ⇒ “Communication Protocol”
Pfeiffer-Leßmann, N., Pfeiffer, T., & Wachsmuth, I. (2013). A model of joint attention for humans and machines. ECEM 2013, JEMR Vol. 6, pp. 152–152.
Desktop-based 2D Systems
Advantages
• Strong assumption of comparable perspectives between participants
• Strong assumption about temporal synchronization of perceived stimulus onsets (because they are always within the field of view)
• Effortless identification of gaze targets
• Convenient tools for analyzing gaze data (Scanpath, Heatmap, AOI/ROI)
Desktop-based 2D Systems
Disadvantages
• Restricted application domain
• restricted field of view
• restricted presentations of 3D stimuli
• restricted interaction with other modalities (walking, sports, etc.)
• only simple interactive situations
• almost no social interactions
• no real-life situations
• In almost all cases, the target domain needs to be modelled in the computer to be subject to analysis
Current Trend: From Stationary Eye Tracking to Mobile Systems
Leaving the Laboratory, Embracing the Real World
Studying Real Interactions
Studies on human-human interactions in close interaction spaces.
Measuring Mobile Eye Tracking Data
• Basic idea similar to desktop
• Mapping eye orientation to video plane for analysis
• General eye position and arrangement of camera and plane known by design
• Hard part is detecting the pupil in different lighting conditions and environments
Why is mobile eye tracking then so difficult?
• Main problems:
• Content on the analysis plane is not known
• dynamic environments
• moving head ⇒ moving camera ⇒ moving content
• Location of content on the analysis plane depends on time, position and orientation of the wearer’s head ⇒ highly individual data
• Fixation data cannot be aggregated simply by location
• What is a fixation in a mobile setting anyway?
• Standard methods of analysis are not directly applicable
• They rely on the assumption of static content on the plane of analysis that does not change over time and/or between participants
• Regions of interest go in the right direction, but they are also normally presented visually in static locations
Standard Solution: Manual Annotation
• Manual Annotation of Gaze Videos
• going through the recordings
• frame-by-frame or
• fixation-by-fixation
• labelling each fixation according to underlying content
• Some approaches to speed up this process exist
• e.g. SemantiCode
• Result: comparable to region analysis, good for statistics, but no precise location on stimuli
Problems with Manual Annotation
• Direct problems
• very time consuming, often 15x original recording time
• cost/benefit ratio renders many studies infeasible
• differences in interpretation between annotators
• mitigated by measuring inter-rater agreement: several annotators annotate (selected) sequences
• Indirect problems
• because of the effort, re-analysis is unlikely, and thus post-hoc changes to the annotation manual rarely happen
• reduces scientific quality
• errors in the recordings are often only detected during analysis
• collecting more data is often problematic when the distance in time is too large; additional quality control right after recording again increases the workload
Huge Problem: Increased Numbers of Participants
Options to get out of the misery
• Do not be interested in content
• activity detection by raw gaze data analysis
• drowsiness detection
• detection of cognitive load
Options to get out of the misery
• Identify the content in the plane of analysis (scene camera video) automatically
Harmening, K., & Pfeiffer, T. (2013). Location-based online identification of objects in the centre of visual attention using eye tracking. Proceedings of the First International Workshop on Solutions for Automatic Gaze-Data Analysis (SAGA 2013), Center of Excellence Cognitive Interaction Technology, 38–40.
Options to get out of the misery
• Do away completely with the weak 2D world!
• Standard Intermediate Approach
• Gaze Position & Orientation (2.5D)
⇒ Screen Coordinates (2D)
⇒ Content (2D)
• Direct Approach
• Gaze Position & Orientation (3D)
⇒ Content (3D)
3D Gaze Analysis in Virtual Reality
Parts of the problem already solved
Content
• is already represented in 3D
Gaze
• has to be mapped to the 3D world
Typical Virtual Reality Installation: the CAVE
• 3D Stereo projections surrounding the user
• Head position and orientation is tracked anyway to compute the required perspective for the rendering process
VR Example: Joint Attention – Cognitive Modelling in the Agent Max
Pfeiffer-Leßmann, N., Pfeiffer, T., & Wachsmuth, I. (2012). An operational model of joint attention – timing of gaze patterns in interactions between humans and a virtual human. Proceedings of the 34th Annual Conference of the Cognitive Science Society (pp. 851–856).
Combining Motion Capturing and Eye Tracking
• Scene Camera
• Video-based Eye Tracking
• Binocular
• Infrared LED
• Cable-bound
Construction of the 3D User Model
Diagram: the user model chains transformations. The tracked head position & orientation, combined with the eye distance, gives the positions of the left and right eye; the measured eye orientation gives each eye’s gaze direction (see the sketch below).
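A sketch of this chain of transformations using homogeneous 4x4 matrices. The conventions (eyes offset along the head’s x-axis, gaze directions given in head coordinates) are assumptions for illustration, not the original implementation:

import numpy as np

def gaze_rays(head_pose, eye_distance, gaze_dir_left, gaze_dir_right):
    """Construct world-space gaze rays for both eyes.

    head_pose: 4x4 homogeneous matrix (head position & orientation)
    eye_distance: interpupillary distance in meters (user-specific)
    gaze_dir_*: unit gaze directions in head coordinates from the tracker
    Returns ((origin_l, dir_l), (origin_r, dir_r)) in world coordinates.
    """
    R = head_pose[:3, :3]                      # head orientation
    rays = []
    for offset, gaze_dir in ((-eye_distance / 2, gaze_dir_left),
                             (+eye_distance / 2, gaze_dir_right)):
        eye_local = np.array([offset, 0.0, 0.0, 1.0])   # eye in head frame
        origin = (head_pose @ eye_local)[:3]            # eye in world frame
        direction = R @ np.asarray(gaze_dir)            # rotate gaze to world
        rays.append((origin, direction / np.linalg.norm(direction)))
    return tuple(rays)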
Accuracy and Precision
(Pfeiffer, 2008)
Model-based Determination of 3D Point-of-Regard
3D Point-of-Regard
• Basic approach
• requires only monocular eye tracking
• position is determined by intersecting the gaze ray with object models (see the sketch below)
3D Point-of-Regard? The intersection presumes that the gaze actually rests on a modelled surface.
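A sketch of the gaze-ray intersection against a simple object model, here an axis-aligned box tested with the slab method; real scenes would use meshes and an acceleration structure:

import numpy as np

def intersect_ray_aabb(origin, direction, box_min, box_max):
    """Return the 3D point where a gaze ray first hits an axis-aligned box,
    or None. Slab method; assumes no exactly-zero direction component.
    """
    origin = np.asarray(origin, float)
    direction = np.asarray(direction, float)
    t1 = (np.asarray(box_min, float) - origin) / direction
    t2 = (np.asarray(box_max, float) - origin) / direction
    t_near = np.maximum(np.minimum(t1, t2), 0.0).max()   # entry distance
    t_far = np.maximum(t1, t2).min()                     # exit distance
    if t_near > t_far:
        return None                       # ray misses (or box is behind)
    return origin + t_near * direction    # 3D point-of-regard candidate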
Taking Vergence Movements into Account
3D Point-of-Regard! Gaze depth is determined by analyzing vergence movements: the point-of-regard is estimated where the lines of sight of both eyes converge (see the sketch below).
3D Point-of-Regard? For distant targets the vergence angle becomes very small, so the depth estimate grows increasingly uncertain.
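A sketch of vergence-based depth estimation under the usual assumption that the two gaze rays rarely intersect exactly, so the point-of-regard is taken as the midpoint of the shortest segment between them:

import numpy as np

def vergence_por(o_l, d_l, o_r, d_r):
    """Midpoint of the shortest segment between the two gaze rays.

    o_*: eye positions (ray origins), d_*: unit gaze directions.
    Solves for parameters s, t minimizing |(o_l + s*d_l) - (o_r + t*d_r)|.
    """
    o_l, d_l = np.asarray(o_l, float), np.asarray(d_l, float)
    o_r, d_r = np.asarray(o_r, float), np.asarray(d_r, float)
    w = o_l - o_r
    a, b, c = d_l @ d_l, d_l @ d_r, d_r @ d_r
    d, e = d_l @ w, d_r @ w
    denom = a * c - b * b              # ~0 when the rays are near-parallel
    if abs(denom) < 1e-9:
        return None                    # parallel gaze: depth undefined
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p_l = o_l + s * d_l                # closest point on the left gaze ray
    p_r = o_r + t * d_r                # closest point on the right gaze ray
    return (p_l + p_r) / 2.0           # estimated 3D point-of-regard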
Machine Learning Approach
• Comparison: intersection of the line of sight vs. a Parameterized Self-Organizing Map (PSOM, machine learning)
(Pfeiffer, Latoschik and Wachsmuth, 2010)
Figure: 3D point-of-regard by intersection vs. 3D point-of-regard based on machine learning (PSOM)
Visualization: 3D Scan Path (Single Person)
• Fixations as spheres
• Size represents duration
• Saccades represented as links
Visualization: 3D Scan Path (Multiple Persons)
Data
• 3D points-of-regard determined using PSOM
• 10 persons
• Visualization not suitable for many parallel 3D scan paths
Model-of-Interest based Visualization
(Stellmach, Nacke and Dachselt, 2010)
Data
• 3D point-of-regard determined by model intersection
• Recorded on desktop, monocular eye tracking
Visualization
• Color-coding duration of attention or number of fixations per object
• Analogous to 2D Heatmaps
• red: most-often fixated areas
• blue: rarely fixated areas
• uncolored: not fixated areas
Surface-based Visualization
(Stellmach, Nacke and Dachselt, 2010)
Data
• 3D point-of-regard determined by model intersection
• Recorded on desktop, monocular eye tracking
Visualization
• Color-coding duration of attention or number of fixations per object
3D Attention Volumes
Data
• 3D point-of-regard based on PSOM
Visualization
• Volume rendering of attention
• Models of the objects of interest are not necessarily required
Pfeiffer, T. (2011). Understanding Multimodal Deixis with Gaze and Gesture in Conversational Interfaces (Berichte aus der Informatik). Aachen, Germany: Shaker Verlag.
3D Attention Volumes on Real Objects
Transition to 3D Gaze Analysis in Real Life
Parts of the problem already solved
Content
• Idea:
• only model relevant aspects of the world, so-called proxy objects
Gaze
• has to be mapped to the 3D world
Getting Head Position & Orientation
Egocentric Camera Pose Estimation using the Scene Camera (see the sketch after this list)
• inexpensive
• requires computational power
• might be intrusive to design (markers)
Camera Pose Estimation using Outside-in Tracking
• high precision
• expensive (20,000 and up)
• restricted area
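For the egocentric, marker-based variant, a sketch using the classic cv2.aruco API from opencv-contrib-python (versions before 4.7); the intrinsics, marker size and dictionary below are assumed values, and this is not the actual EyeSee3D implementation:

import cv2
import numpy as np

# Assumed camera intrinsics from a one-time scene-camera calibration.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
dist = np.zeros(5)          # assumed: negligible lens distortion
MARKER_LENGTH = 0.10        # printed marker side length in meters (assumed)

aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)

def camera_pose(frame):
    """Estimate the scene-camera pose relative to the first detected marker."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict)
    if ids is None:
        return None
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, MARKER_LENGTH, K, dist)
    R, _ = cv2.Rodrigues(rvecs[0])      # rotation: marker -> camera
    # Invert to express the camera pose in the marker's coordinate frame.
    return R.T, (-R.T @ tvecs[0].reshape(3))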
Camera Pose Estimation
Figure: pose estimation yields the 3D position & orientation of the scene camera.
3D AOI: Annotated Proxy Geometry
Figure: annotated proxy geometries (Window, Door, Chimney), placed relative to the tracked 3D position & orientation.
Determining Fixation Target
Figure: the gaze ray, anchored at the tracked 3D position & orientation, is intersected with the 3D areas of interest (Window, Door, Chimney) to determine the fixation target.
First: Set Up the Coordinate Frame
Second: Place the Target Objects
Third: Enter 3D Proxy Geometries into the Model
Annotate proxy objects (used to identify the fixation target; a parsing sketch follows below):
<MillimeterField DEF='field1' id='1'>
  <ObservableObject DEF='MyObject' name='AOI' position='0 0 0' size='1 1 1'/>
</MillimeterField>
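For illustration, a minimal loader for such an annotation (my sketch, not EyeSee3D’s actual parser), turning each ObservableObject into an axis-aligned box that can feed a gaze-ray intersection test like the one sketched earlier:

import xml.etree.ElementTree as ET
import numpy as np

def load_proxy_objects(xml_text):
    """Parse ObservableObject annotations into named axis-aligned boxes."""
    root = ET.fromstring(xml_text)
    boxes = {}
    for obj in root.iter("ObservableObject"):
        center = np.array(obj.get("position").split(), dtype=float)
        size = np.array(obj.get("size").split(), dtype=float)
        boxes[obj.get("name")] = (center - size / 2, center + size / 2)
    return boxes

xml_text = """<MillimeterField DEF='field1' id='1'>
  <ObservableObject DEF='MyObject' name='AOI' position='0 0 0' size='1 1 1'/>
</MillimeterField>"""
print(load_proxy_objects(xml_text))   # {'AOI': (min corner, max corner)}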
Third – Advanced Users: Annotate Complex Objects
Alternatively, complex objects can be annotated:
• 3D scans using Microsoft Kinect or Intel RealSense
• 3D modelling, e.g. in Blender
Figure: complex proxy geometries labelled Window, Door and Chimney.
Fourth: Run the Experiment
Alternative 1
• Use the standard procedure of the eye-tracking system to record the video and a sample file with gaze data
• Use EyeSee3D to analyse the video and sample data offline
Alternative 2
• Use EyeSee3D in online mode to get results in real time
Fifth: Collect the Results
• CSV file output:
• Time: absolute time of day
• Framenumber: number of frame from scene camera
• Fixated Object ID: as specified in the model annotation
• Fixated Position: in 3D coordinates
• Observer Position: in 3D coordinates
• Observer Matrix: 4x4 Matrix with Position/Orientation of observer
• Merge these with the event/sample files from the eye tracker (see the sketch below)
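A sketch of that merge step with pandas; the file names are hypothetical, the column names follow the list above, and real sample-file formats vary by vendor:

import pandas as pd

# EyeSee3D output and eye-tracker samples, both keyed by frame number
# (hypothetical file names; adjust to your recording session).
eyesee = pd.read_csv("eyesee3d_output.csv")    # Framenumber, FixatedObjectID, ...
samples = pd.read_csv("tracker_samples.csv")   # Framenumber, PupilSize, ...

# Align per video frame; 'left' keeps every EyeSee3D row even when the
# tracker has no sample for that frame.
merged = eyesee.merge(samples, on="Framenumber", how="left")
merged.to_csv("merged_gaze_data.csv", index=False)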
Efficient Analysis of Mobile Eye-Tracking Studies
Pfeiffer, T., & Renner, P. (2014). EyeSee3D: A Low-Cost Approach for Analysing Mobile 3D Eye Tracking Data Using Augmented Reality Technology. Proceedings of the Symposium on Eye Tracking Research and Applications, 195–202.
Eye-Hand Coordination
Towards Visualizations for 3D Eye Tracking
(Stellmach, Nacke and Dachselt, 2010)
Maurus et al. (2014). Realistic Heatmap Visualization for Interactive Analysis of 3D Gaze Data. ETRA 2014.
Recent work: Towards more realistic 3D attention mapping
• Problems with existing approaches
• based on intersections, not real 3D gaze position [maurus, stellmach]
• centered on objects (no cross object scattering) [stellmach]
• no check for occlusions [stellmach]
• visualization based on vertex coloring [stellmach]
• no support for moving objects [maurus, stellmach]
• Application side
• require dedicated viewer [maurus]
• post-processing process [maurus, stellmach]
• sub-optimal rendering quality [maurus, stellmach]
Recent work
Pfeiffer, T., & Memili, C. (2015). GPU-accelerated Attention Map Generation for Dynamic 3D Scenes. IEEE VR 2015.
3D Attention Mapping on 3D Objects
Our approach
• Realistic 3D Point-of-Regard Modelling
• Shadow mapping for every 3D fixation
• Binocular eye tracking for depth estimation
• 3D Gaussian to represent spread of attention around 3D POR
• Per-object representation of attention in Attention Texture
• Provides adjustable level of detail (texture size, texture UV mapping)
• Allows for moving/transforming objects
• Global maximum collection in Max-Attention-Texture
• Speed-up normalization by reducing read/write cycles
• Splitting attention aggregation from heatmap generation
• Attention is aggregated on per-object level
• Heatmap textures are generated on-the-fly using a shader
• Heatmap textures can be exported for high-quality renderings (a simplified sketch of the accumulation step follows below)
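A much-simplified CPU sketch of the accumulation idea in Python/numpy (the slides describe a GPU shader implementation): attention is splatted as a 3D Gaussian around the point-of-regard into a per-object texture whose texels carry precomputed world-space surface positions. All names and the sigma value are assumptions:

import numpy as np

def splat_attention(att_tex, texel_positions, por, duration, sigma=0.05):
    """Accumulate one 3D fixation into an object's attention texture.

    att_tex: (H, W) per-object attention texture, accumulated in place
    texel_positions: (H, W, 3) world-space surface point of each texel
    por: 3D point-of-regard; duration: fixation duration (weight)
    sigma: spread of the 3D Gaussian around the POR (assumed, in meters)
    """
    d2 = np.sum((texel_positions - np.asarray(por)) ** 2, axis=-1)
    att_tex += duration * np.exp(-d2 / (2.0 * sigma**2))

# Normalization against the global maximum (the Max-Attention-Texture idea):
# divide each object's texture by the maximum over all objects before
# mapping the values to a heat color ramp in the renderer.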
Performance, measured on a Quadro K5000 (173 GB/s, 256-bit memory interface).
Mapping Attention in Complex 3D Scenarios