BUILDING TELEPRESENCE SYSTEMS:
Translating Science Fiction Ideas into Reality
Henry Fuchs
University of North Carolina at Chapel Hill (USA) and
NSF Science and Technology Center for Computer Graphics and Scientific Visualization
Major support is gratefully acknowledged from the U.S. National Science Foundation and
from the U.S. Defense Advanced Research Projects Agency
Introduction
- The dominant grand challenge of graphics in the past 30 years has been realism, especially photorealism
- Briefly, in the past decade, being in a virtual world captured the public imagination
- Next: to be immersed (at least partly) in a far-away place, with far-away people
- Driving examples: telemedicine, telecollaboration (MCAD), and laparoscopic surgery
Initial Concepts: Visual Telepresence (1993)
Local MD & Patient / Remote Consultant
Medical Consultants “Together” with Local MD & Patient
Problem: Too Difficult (for now)
- Real-time 3D scene capture at each of the sites
- Presentation of local plus remote scenes on head-mounted displays to each of the participants
- Other tasks: image generation, head and hand tracking, etc. are easy by comparison
Solution: Work on an Easier Problem First
- Eliminate need for head-mounted display
- Reduce need of scene capture
  - to smaller regions of the rooms
  - to reconstruction (and viewing) from fewer places
- New, easier problem: advancing teleconference-based TELECOLLABORATIONS toward TELEPRESENCE
Our Vision of Telecollaboration: A Normal Office
Our Vision of Telecollaboration: Overlapped Projected Displays
Our Vision of Telecollaboration: Seeing and Manipulating Objects
- Stereo via shutter / prescription glasses
- Displays can light up and cover the entire room
Our Vision of Telecollaboration: “Being There” Together
What Will It Take: Major Areas
- Displays: fixed, not head-mounted
- 3D scene capture
- Image generation system
- Tracking system
Displays: New Opportunities with Micromirror-based Displays
- Physical micro-mirrors on custom IC
  - 800 x 600 pixel resolution typical
  - Commercial product from Texas Instruments
- One bit of memory behind each mirror
- Consider it as part of the memory-space of the graphics system, not as a separate projector
- Use for both display and for (lighting to aid) scene capture
Displays: Fixed, Large Visual Area
- Adapt to user’s own environment
- Large area
- High resolution
- Bright / high contrast
- (Increased demand on image generation)
  - Lots more pixels
  - Adapt to custom screen geometries
Video of “3D Talking Heads”
- Rapid depth extraction via ‘structured light’
  - reduces problem of finding corresponding points in multiple camera images
- Light patterns made nearly imperceptible by projecting complementary patterns very rapidly
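The structured-light idea above can be sketched in code. This is an illustrative reconstruction, not the system's actual implementation: it assumes a Gray-code binary pattern sequence and a rectified camera/projector pair, and all function names are mine.

```python
# Illustrative sketch of structured-light depth extraction.
# Assumes Gray-code bit patterns and a rectified camera/projector pair;
# the talk does not specify the actual coding scheme.

def gray_encode(n):
    """Binary-reflected Gray code of integer n."""
    return n ^ (n >> 1)

def gray_decode(g):
    """Invert the Gray code."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def observed_bit(frame_pattern, frame_inverse):
    """Complementary patterns: comparing a pattern frame against its
    inverse gives a robust per-pixel bit without a global threshold
    (and, alternated rapidly, the pair looks like plain light)."""
    return frame_pattern > frame_inverse

def decode_column(bits):
    """Recover the projector column from the per-pixel bit
    observations, most significant bit first."""
    g = 0
    for b in bits:
        g = (g << 1) | (1 if b else 0)
    return gray_decode(g)

def depth_from_correspondence(cam_col, proj_col, baseline, focal):
    """Triangulate: for a rectified pair, depth = baseline * focal / disparity."""
    disparity = cam_col - proj_col
    return baseline * focal / disparity
```

Gray coding is used here because adjacent columns differ in only one bit, so a single misread bit causes at most a one-column correspondence error.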
From Images to Geometry: A New Paradigm for 3D Computer Graphics
[Diagram: a spectrum from pure images to pure geometry: QuickTime VR (not 3D); Lumigraph / Light-Field Rendering; depth maps (“3D talking heads”); texture mapping; geometric models (3D fax); Talisman]
Image-Based Rendering, Inverse Rendering (from J. Arvo)
Inverse Rendering
[Diagram: new images can be obtained by interpolating among captured images]
Post-Rendering Warp
[Diagram: images rendered at past and current viewpoints are warped to a predicted future viewpoint]
Video of Post-Rendering Warp
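A minimal sketch of the warp itself, for one scanline and a purely lateral viewpoint shift. The real technique (McMillan-style 3D warping) handles full 3D camera motion and hole filling from multiple reference images; names and simplifications here are mine.

```python
# Illustrative forward warp of a rendered scanline with per-pixel depth
# to a laterally shifted viewpoint. Assumes a rectified setup where a
# pixel's horizontal shift is baseline * focal / depth.

def warp_row(colors, depths, baseline, focal, width):
    """Forward-warp one scanline to the new viewpoint. Nearer pixels
    shift more (larger disparity); a z-test resolves collisions so
    near surfaces win. None entries are holes left for hole filling."""
    out = [None] * width
    zbuf = [float('inf')] * width
    for x, (c, z) in enumerate(zip(colors, depths)):
        shift = int(round(baseline * focal / z))
        nx = x + shift
        if 0 <= nx < width and z < zbuf[nx]:
            out[nx] = c
            zbuf[nx] = z
    return out
```

Because the warp reuses an already-rendered frame, it can raise the apparent frame rate (and cut latency) without re-rendering the scene geometry.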
Image Generation: High-Performance Graphics Computers
- Now: screen subdivision (~SGI RE, UNC Pixel-Planes 5)
- Next: z-compositing final images (UNC PixelFlow)
[Diagram: geometry processors (G) feed rasterization processors (R), whose outputs merge in z-priority compositors (C)]
Object Parallel by Image Composition
[Diagram: multiple independent primitive streams feed identical renderers, each a geometry processor, rasterizer, and frame buffer; renderers send visibility & color data into the image composition network, whose compositors (C) perform the visibility test between incoming pixel streams]
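The compositors' per-pixel visibility test is simple to sketch. This is only a toy software model, assuming each pixel is a (color, z) pair; in PixelFlow the merge happens in dedicated hardware at frame rate, not sequentially.

```python
# Toy model of z-compositing partial renderings, as in an image
# composition network. Pixels are (color, z) tuples.
from functools import reduce

def z_composite(img_a, img_b):
    """One compositor node: merge two partial renderings pixel by
    pixel, keeping the sample with the smaller z (nearer surface)."""
    return [a if a[1] <= b[1] else b for a, b in zip(img_a, img_b)]

def composite_all(partial_images):
    """Fold every renderer's partial image into one final frame, the
    software analogue of chaining compositors in a pipeline or tree."""
    return reduce(z_composite, partial_images)
```

The appeal of this scheme is scalability: each renderer processes its own primitive stream independently, and total capacity grows by adding renderer/compositor pairs rather than by subdividing the screen.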
ImageFlow
- Departure from polygon-based rendering
- History of rendering: lines, polygons, texture, depth (our belief)
- Warp images based on depth value at each image sample
- Input from cameras
- Preliminary design begun
Tracking the User’s Head
- Difficulty with commercial trackers
- Image-based tracking / hybrid tracking
  - new difficulty: keeping tracking targets in view
- Predictive tracking (Ron Azuma: possible for 50-60 ms)
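Predictive tracking extrapolates the head pose tens of milliseconds ahead so rendered images match where the head will be when they reach the screen. Azuma's predictors were more sophisticated (Kalman filtering with inertial sensors); this constant-velocity sketch, with names of my choosing, only illustrates the idea.

```python
# Illustrative constant-velocity head-pose predictor. Real predictive
# trackers use richer motion models and inertial measurements.

def predict(pos_now, pos_prev, dt_sample, lookahead):
    """Extrapolate the tracked position 'lookahead' seconds into the
    future from the last two samples. Each argument position is a list
    of coordinates; dt_sample is the time between the two samples."""
    velocity = [(a - b) / dt_sample for a, b in zip(pos_now, pos_prev)]
    return [p + v * lookahead for p, v in zip(pos_now, velocity)]
```

Prediction trades latency for noise: the further ahead one extrapolates, the more sensor noise and unmodeled acceleration corrupt the estimate, which is why ~50-60 ms is cited as a practical limit.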
The HiBall Tracker
- Second generation ceiling tracker
- Ceiling tiles completed and installed
  - simple drop-in “acoustical” tiles
  - enabled by Brown, Caltech, UNC collaboration
- Design and fabrication at UNC and Utah
- System functioning
  - 2 kHz estimates (0.5 ms latency)
  - 0.1 mm RMS position noise
  - 0.02 degree RMS orientation noise
HiBall: Photo
SCAAT Autocalibration: ceiling photo & LED calibrations
Video of UNC HiBall Tracker
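SCAAT (single-constraint-at-a-time) tracking folds each individual sighting into a Kalman filter immediately, instead of waiting to collect a complete set of measurements, which is what enables the HiBall's 2 kHz estimate rate. The real system uses an extended Kalman filter over the full 6-DOF pose; this 1-D scalar stand-in, with illustrative names, shows only the incremental-update pattern.

```python
# 1-D scalar stand-in for a SCAAT-style Kalman update: one measurement
# per step, folded in immediately. The real tracker filters full pose.

def scaat_update(x, p, z, r, q):
    """One step: fold in a single scalar measurement z (variance r).
    x, p: current state estimate and its variance; q: process noise
    added per step (random-walk time update)."""
    p = p + q                # time update: uncertainty grows
    k = p / (p + r)          # Kalman gain
    x = x + k * (z - x)      # measurement update
    p = (1 - k) * p          # uncertainty shrinks after the update
    return x, p
```

Each sighting nudges the estimate a little, so the pose stays fresh between any two individual measurements; the same machinery also supports autocalibration of the LED positions.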
“Being There”: in 5-10 Years
- Key: Acquire and Display EVERY mm EVERY sec
( end )
Video of “Walking around Leonard’s Yard”
- Illustrated photographic “feel” of rendering from image input
- Each pixel in the 360-degree panorama gets a depth/disparity value
New Custom Head-mounted Display
(D. Colucci, K. Keller, R. Fish @ U. Utah)
- Video see-through to correctly merge real & synthetic parts of the scene (esp. occlusion)
- Video cameras optically at user’s eye positions
- Unobstructed view except for display
- Flip-up / flip-down
David Casalino, MD
Displays (2 of 2): Head-mounted
- Comfortable
- See-through (usually)
  - Optical: cheap, easy; can’t combine real and virtual
  - Video: bulky, esp. for wide field of view
- Field of view
- Brightness / Resolution
Laparoscopic Visualization
(with Anthony Meyer, MD, PhD)
- Goal: view from the surgeon’s normal point of view, as with open surgery
- Key challenge: extract 3D range image from laparoscopic camera
- Initial experiment
  - Pre-experiment: mechanically scanned 3D surface of medical model
  - During experiment: mapped live “laparoscopic” camera video onto the 3D surface
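Mapping live video onto a pre-scanned surface amounts to projective texturing: project each 3D surface point through the calibrated camera model and fetch that video pixel as its color. A pinhole-camera sketch, with names and parameters of my choosing:

```python
# Illustrative projective texture mapping: paste live camera video
# onto a known 3D surface via a pinhole camera model.

def project_to_camera(point, focal, principal):
    """Pinhole projection of a surface point (in camera coordinates)
    to image coordinates (u, v): the texture lookup that pastes the
    video onto the pre-scanned surface."""
    x, y, z = point
    u = focal * x / z + principal[0]
    v = focal * y / z + principal[1]
    return u, v

def texture_lookup(image, u, v):
    """Nearest-neighbor fetch of the video pixel for (u, v); image is
    indexed as image[row][column]."""
    return image[int(round(v))][int(round(u))]
```

With the surface already scanned, only this per-point projection must run live, so the textured 3D model can then be rendered from the surgeon's natural viewpoint rather than the camera's.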
Our Vision of Telecollaboration: “Being There” Together
[Diagram: Shared CAD & Visualization at the center, surrounded by Modeling, Image Generation, Scene Acquisition, Interaction, Networking, Tracking, Rendering, and Displays]
Needing to Know Depth within Video Images
- Needed to extract the 3D remote environment
- Post-rendering warp: widely applicable to speed up image-generation frame rate