BUILDING TELEPRESENCE SYSTEMS:
Translating Science Fiction Ideas into Reality
Henry Fuchs
University of North Carolina at Chapel Hill (USA) and
NSF Science and Technology Center for Computer Graphics and Scientific Visualization
Major support is gratefully acknowledged from the U.S. National Science Foundation and
from the U.S. Defense Advanced Research Projects Agency
Introduction
- The dominant grand challenge of graphics in the past 30 years has been realism, especially photorealism
- Briefly, in the past decade, being in a virtual world captured the public imagination
- Next: to be immersed (at least partly) in a far-away place, with far-away people
- Driving examples: telemedicine, telecollaboration (MCAD), and laparoscopic surgery
Initial Concepts: Visual Telepresence (1993)
Local MD & Patient / Remote Consultant
Medical Consultants “Together” with Local MD & Patient
Problem: Too Difficult (for now)
- Real-time 3D scene capture at each of the sites
- Presentation of local plus remote scenes on head-mounted displays to each of the participants
- Other tasks: image generation, head and hand tracking, etc. are easy by comparison
Solution: Work on an Easier Problem First
- Eliminate need for head-mounted display
- Reduce need of scene capture
  - to smaller regions of the rooms
  - to reconstruction (and viewing) from fewer places
- New, easier problem: advancing teleconference-based TELECOLLABORATIONS toward TELEPRESENCE
Our Vision of Telecollaboration: A Normal Office
Our Vision of Telecollaboration: Overlapped Projected Displays
Our Vision of Telecollaboration: Seeing and Manipulating Objects
- Stereo via shutter / prescription glasses
- Displays can light up and cover the entire room
Our Vision of Telecollaboration: “Being There” Together
What Will It Take: Major Areas
- Displays: fixed, not head-mounted
- 3D scene capture
- Image generation system
- Tracking system
Displays: New Opportunities with Micromirror-based Displays
- Physical micro-mirrors on custom IC
  - 800 x 600 pixel resolution typical
  - Commercial product from Texas Instruments
- One bit of memory behind each mirror
- Consider it as part of the memory-space of the graphics system, not as a separate projector
- Use for both display and for (lighting to aid) scene capture
Displays: Fixed, Large Visual Area
- Adapt to user’s own environment
- Large area
- High resolution
- Bright / high contrast
- (Increased demand on image generation)
  - Lots more pixels
  - Adapt to custom screen geometries
Video of “3D Talking Heads”
- Rapid depth extraction via ‘structured light’
  - reduces problem of finding corresponding points in multiple camera images
- Light patterns made nearly imperceptible by projecting complementary patterns very rapidly
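The structured-light idea above can be sketched in code. This is an illustrative reconstruction, not the system's actual implementation: it assumes a Gray-code binary pattern sequence and a rectified camera/projector pair, and all function names are mine.

```python
# Illustrative sketch of structured-light depth extraction.
# Assumes Gray-code bit patterns and a rectified camera/projector pair;
# the talk does not specify the actual coding scheme.

def gray_encode(n):
    """Binary-reflected Gray code of integer n."""
    return n ^ (n >> 1)

def gray_decode(g):
    """Invert the Gray code."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def observed_bit(frame_pattern, frame_inverse):
    """Complementary patterns: comparing a pattern frame against its
    inverse gives a robust per-pixel bit without a global threshold
    (and, alternated rapidly, the pair looks like plain light)."""
    return frame_pattern > frame_inverse

def decode_column(bits):
    """Recover the projector column from the per-pixel bit
    observations, most significant bit first."""
    g = 0
    for b in bits:
        g = (g << 1) | (1 if b else 0)
    return gray_decode(g)

def depth_from_correspondence(cam_col, proj_col, baseline, focal):
    """Triangulate: for a rectified pair, depth = baseline * focal / disparity."""
    disparity = cam_col - proj_col
    return baseline * focal / disparity
```

Gray coding is used here because adjacent columns differ in only one bit, so a single misread bit causes at most a one-column correspondence error.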
From Images to Geometry: A New Paradigm for 3D Computer Graphics
[Diagram: a spectrum from pure images to pure geometry: QuickTime VR (not 3D); Lumigraph / Light-Field Rendering; depth maps (“3D talking heads”); texture mapping; geometric models (3D fax); Talisman]
Image-Based Rendering, Inverse Rendering (from J. Arvo)
Inverse Rendering
[Diagram: new images can be obtained by interpolating among captured images]
Post-Rendering Warp
[Diagram: images rendered at past and current viewpoints are warped to a predicted future viewpoint]
Video of Post-Rendering Warp
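A minimal sketch of the warp itself, for one scanline and a purely lateral viewpoint shift. The real technique (McMillan-style 3D warping) handles full 3D camera motion and hole filling from multiple reference images; names and simplifications here are mine.

```python
# Illustrative forward warp of a rendered scanline with per-pixel depth
# to a laterally shifted viewpoint. Assumes a rectified setup where a
# pixel's horizontal shift is baseline * focal / depth.

def warp_row(colors, depths, baseline, focal, width):
    """Forward-warp one scanline to the new viewpoint. Nearer pixels
    shift more (larger disparity); a z-test resolves collisions so
    near surfaces win. None entries are holes left for hole filling."""
    out = [None] * width
    zbuf = [float('inf')] * width
    for x, (c, z) in enumerate(zip(colors, depths)):
        shift = int(round(baseline * focal / z))
        nx = x + shift
        if 0 <= nx < width and z < zbuf[nx]:
            out[nx] = c
            zbuf[nx] = z
    return out
```

Because the warp reuses an already-rendered frame, it can raise the apparent frame rate (and cut latency) without re-rendering the scene geometry.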
Image Generation: High-Performance Graphics Computers
- Now: screen subdivision (~SGI RE, UNC Pixel-Planes 5)
- Next: z-compositing final images (UNC PixelFlow)
[Diagram: geometry processors (G) feed rasterization processors (R), whose outputs merge in z-priority compositors (C)]
Object Parallel by Image Composition
[Diagram: multiple independent primitive streams feed identical renderers, each a geometry processor, rasterizer, and frame buffer; renderers send visibility & color data into the image composition network, whose compositors (C) perform the visibility test between incoming pixel streams]
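The compositors' per-pixel visibility test is simple to sketch. This is only a toy software model, assuming each pixel is a (color, z) pair; in PixelFlow the merge happens in dedicated hardware at frame rate, not sequentially.

```python
# Toy model of z-compositing partial renderings, as in an image
# composition network. Pixels are (color, z) tuples.
from functools import reduce

def z_composite(img_a, img_b):
    """One compositor node: merge two partial renderings pixel by
    pixel, keeping the sample with the smaller z (nearer surface)."""
    return [a if a[1] <= b[1] else b for a, b in zip(img_a, img_b)]

def composite_all(partial_images):
    """Fold every renderer's partial image into one final frame, the
    software analogue of chaining compositors in a pipeline or tree."""
    return reduce(z_composite, partial_images)
```

The appeal of this scheme is scalability: each renderer processes its own primitive stream independently, and total capacity grows by adding renderer/compositor pairs rather than by subdividing the screen.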
ImageFlow
- Departure from polygon-based rendering
- History of rendering: lines, polygons, texture, depth (our belief)
- Warp images based on depth value at each image sample
- Input from cameras
- Preliminary design begun
Tracking the User’s Head
- Difficulty with commercial trackers
- Image-based tracking / hybrid tracking
  - new difficulty: keeping tracking targets in view
- Predictive tracking (Ron Azuma: possible for 50-60 ms)
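Predictive tracking extrapolates the head pose tens of milliseconds ahead so rendered images match where the head will be when they reach the screen. Azuma's predictors were more sophisticated (Kalman filtering with inertial sensors); this constant-velocity sketch, with names of my choosing, only illustrates the idea.

```python
# Illustrative constant-velocity head-pose predictor. Real predictive
# trackers use richer motion models and inertial measurements.

def predict(pos_now, pos_prev, dt_sample, lookahead):
    """Extrapolate the tracked position 'lookahead' seconds into the
    future from the last two samples. Each argument position is a list
    of coordinates; dt_sample is the time between the two samples."""
    velocity = [(a - b) / dt_sample for a, b in zip(pos_now, pos_prev)]
    return [p + v * lookahead for p, v in zip(pos_now, velocity)]
```

Prediction trades latency for noise: the further ahead one extrapolates, the more sensor noise and unmodeled acceleration corrupt the estimate, which is why ~50-60 ms is cited as a practical limit.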
The HiBall Tracker
- Second generation ceiling tracker
- Ceiling tiles completed and installed
  - simple drop-in “acoustical” tiles
  - enabled by Brown, Caltech, UNC collaboration
- Design and fabrication at UNC and Utah
- System functioning
  - 2 kHz estimates (0.5 ms latency)
  - 0.1 mm RMS position noise
  - 0.02 degree RMS orientation noise
HiBall: Photo
SCAAT Autocalibration: ceiling photo & LED calibrations
Video of UNC HiBall Tracker
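SCAAT (single-constraint-at-a-time) tracking folds each individual sighting into a Kalman filter immediately, instead of waiting to collect a complete set of measurements, which is what enables the HiBall's 2 kHz estimate rate. The real system uses an extended Kalman filter over the full 6-DOF pose; this 1-D scalar stand-in, with illustrative names, shows only the incremental-update pattern.

```python
# 1-D scalar stand-in for a SCAAT-style Kalman update: one measurement
# per step, folded in immediately. The real tracker filters full pose.

def scaat_update(x, p, z, r, q):
    """One step: fold in a single scalar measurement z (variance r).
    x, p: current state estimate and its variance; q: process noise
    added per step (random-walk time update)."""
    p = p + q                # time update: uncertainty grows
    k = p / (p + r)          # Kalman gain
    x = x + k * (z - x)      # measurement update
    p = (1 - k) * p          # uncertainty shrinks after the update
    return x, p
```

Each sighting nudges the estimate a little, so the pose stays fresh between any two individual measurements; the same machinery also supports autocalibration of the LED positions.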
“Being There”: in 5-10 Years
- Key: Acquire and Display EVERY mm EVERY sec
( end )
Video of “Walking around Leonard’s Yard”
- Illustrated photographic “feel” of rendering from image input
- Each pixel in the 360-degree panorama gets a depth/disparity value
New Custom Head-mounted Display
(D. Colucci, K. Keller, R. Fish @ U. Utah)
- Video see-through to correctly merge real & synthetic parts of the scene (esp. occlusion)
- Video cameras optically at user’s eye positions
- Unobstructed view except for display
- Flip-up / flip-down
David Casalino, MD
Displays (2 of 2): Head-mounted
- Comfortable
- See-through (usually)
  - Optical: cheap, easy; can’t combine real and virtual
  - Video: bulky, esp. for wide field of view
- Field of view
- Brightness / Resolution
Laparoscopic Visualization
(with Anthony Meyer, MD, PhD)
- Goal: view from the surgeon’s normal point of view, as with open surgery
- Key challenge: extract 3D range image from laparoscopic camera
- Initial experiment
  - Pre-experiment: mechanically scanned 3D surface of medical model
  - During experiment: mapped live “laparoscopic” camera video onto the 3D surface
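Mapping live video onto a pre-scanned surface amounts to projective texturing: project each 3D surface point through the calibrated camera model and fetch that video pixel as its color. A pinhole-camera sketch, with names and parameters of my choosing:

```python
# Illustrative projective texture mapping: paste live camera video
# onto a known 3D surface via a pinhole camera model.

def project_to_camera(point, focal, principal):
    """Pinhole projection of a surface point (in camera coordinates)
    to image coordinates (u, v): the texture lookup that pastes the
    video onto the pre-scanned surface."""
    x, y, z = point
    u = focal * x / z + principal[0]
    v = focal * y / z + principal[1]
    return u, v

def texture_lookup(image, u, v):
    """Nearest-neighbor fetch of the video pixel for (u, v); image is
    indexed as image[row][column]."""
    return image[int(round(v))][int(round(u))]
```

With the surface already scanned, only this per-point projection must run live, so the textured 3D model can then be rendered from the surgeon's natural viewpoint rather than the camera's.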
Our Vision of Telecollaboration: “Being There” Together
[Diagram: Shared CAD & Visualization at the center, surrounded by Modeling, Image Generation, Scene Acquisition, Interaction, Networking, Tracking, Rendering, and Displays]
Needing to Know Depth within Video Images
- Needed to extract the 3D remote environment
- Post-rendering warp: widely applicable to speed up image-generation frame rate