Application #3: Semantic Scene Understanding

(1)

• Modeling by example, revisited

Application #1: 3D Modeling

1

[Sung et al. 2017]

A deep neural network predicts the next best part to add and its position, enabling non-expert users to create novel shapes.

(2)

• Joint multi-modal understanding

Application #2: Image Understanding

2

[Zhang et al. 2017]

Understanding 3D shapes can benefit image understanding

(3)

• Semantic 3D reconstruction

Application #3: Semantic Scene Understanding

[Song et al. 2017] 3

(4)

4

Motivating Applications: Semantic Scene Understanding

[Kelly et al. 2017, Kelly and Guerrero et al. 2018]

Application #4: 3D Asset Creation

(5)

• Number of voxels grows as $O(n^3)$, versus $O(n^2)$ for the occupied surface

What’s Different in 3D?

5
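The gap between cubic volume growth and quadratic surface growth can be checked numerically. The sketch below is only an illustration under assumed choices: a sphere as the test shape, a shell of one voxel thickness as the "occupied surface", and the hypothetical helper name `sphere_voxel_counts`.

```python
def sphere_voxel_counts(n):
    """Return (total voxels, surface voxels) for a sphere rasterized on an n^3 grid."""
    center = (n - 1) / 2.0      # grid center in voxel coordinates
    radius = n / 2.0 - 1.0      # sphere radius in voxel units
    total = n ** 3
    surface = 0
    for x in range(n):
        for y in range(n):
            for z in range(n):
                d = ((x - center) ** 2 + (y - center) ** 2 + (z - center) ** 2) ** 0.5
                # Count a voxel as "surface" if the sphere passes within
                # half a voxel of its center (a shell one voxel thick).
                if abs(d - radius) <= 0.5:
                    surface += 1
    return total, surface


for n in (16, 32, 64):
    total, surface = sphere_voxel_counts(n)
    print(n, total, surface, round(surface / total, 3))
```

Doubling the resolution multiplies the total voxel count by 8 but the surface count by roughly 4, so the occupied fraction shrinks as the grid gets finer.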

(6)

Data Representation .. Many Possibilities!

6

points voxels cells patches

(7)

1. Representation 

2. Neighborhood information

• who are the neighbouring elements

• how are the elements ordered 

3. Extrinsic versus intrinsic representation 

4. Simplicity versus memory/runtime tradeoff

Challenges

7

(8)

• Image-based 

• Volumetric 

• Surface-based 

• Point-based

Representation for 3D

8

(9)

• Image-based 

• Volumetric 

• Point-based

Representation for 3D

9

(10)

• Image-based

• Volumetric

• Point-based

• Surface-based

Representation for 3D: Multi-view CNN

10

regular image analysis networks

[Kalogerakis et al. 2015]

(11)

Multi-view CNN

3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation

11

(12)

Multi-view CNN

Integrating View Information

12

(13)

• Image-based

Representation for 3D: Local Multi-view CNN

13

Segmentation

Correspondence Feature matching

Predicting semantic functions

[Huang et al. 2018]

localized renderings for point-wise features

(14)

Tangent Convolutions

14

[Tatarchenko et al. 2018]

Multi-view rendering loses information due to occlusion; tangent convolutions instead project to local patches

(contrast with PCPNet construction)

(15)

Signal Interpolation

▸ Use nearest-neighbor or Gaussian-mixture-based methods for interpolation.

▸ The interpolated signal is now denser

Dealing with Sparse Points

15
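A minimal sketch of the nearest-neighbor variant, assuming the sparse points have already been projected to 2D coordinates in [0, 1) on a local patch. The function name `nearest_neighbor_fill` and the sample format are hypothetical, chosen only for this illustration.

```python
def nearest_neighbor_fill(samples, size):
    """Densify sparse samples into a size x size image.

    samples: list of ((u, v), value) pairs with u, v in [0, 1).
    Each pixel takes the value of the sample closest to its center.
    """
    image = []
    for i in range(size):
        row = []
        for j in range(size):
            pu, pv = (i + 0.5) / size, (j + 0.5) / size   # pixel center
            nearest = min(
                samples,
                key=lambda s: (s[0][0] - pu) ** 2 + (s[0][1] - pv) ** 2,
            )
            row.append(nearest[1])
        image.append(row)
    return image


sparse = [((0.1, 0.1), 1.0), ((0.9, 0.9), 2.0)]
dense = nearest_neighbor_fill(sparse, 4)
for row in dense:
    print(row)
```

Every pixel receives a value, so a regular image convolution can then be applied to the patch.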

(17)

Tangent Convolutions

Improved Performance

17

(18)

• Image-based

• PROS: directly use image networks, good performance

• CONS: rendering is slow and memory-heavy, not very geometric

• Volumetric 

• Point-based 

• Surface-based

Representation for 3D

18

(19)

• Image-based 

• Volumetric 

• Point-based

Representation for 3D

19

(20)

• Volumetric

3D CNNs : Direct Approach

20

[Xiao et al. 2014]

(21)

*) VOXNET: A 3D CONVOLUTIONAL NEURAL NETWORK FOR REAL-TIME OBJECT RECOGNITION [MATURANA ET AL. 2015]

VoxNet [Maturana et al. 2015]

21

▸ Binary occupancy, density grid, etc.

rotational invariance
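As an illustration of the binary occupancy input, the sketch below voxelizes a point set into a dense grid. The normalization (uniform scaling of the bounding box to the grid) and the name `occupancy_grid` are assumptions made for this example, not details taken from the paper.

```python
def occupancy_grid(points, resolution):
    """Voxelize (x, y, z) points into a resolution^3 grid of 0/1 occupancy."""
    mins = [min(p[a] for p in points) for a in range(3)]
    maxs = [max(p[a] for p in points) for a in range(3)]
    # Scale uniformly by the largest side so the shape is not stretched.
    extent = max(maxs[a] - mins[a] for a in range(3)) or 1.0
    grid = [[[0] * resolution for _ in range(resolution)] for _ in range(resolution)]
    for p in points:
        i, j, k = (
            min(int((p[a] - mins[a]) / extent * resolution), resolution - 1)
            for a in range(3)
        )
        grid[i][j][k] = 1   # mark the cell containing this point
    return grid


grid = occupancy_grid([(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (1.0, 1.0, 1.0)], 4)
print(sum(v for plane in grid for row in plane for v in row))
```

A density grid would store a count or a smoothed value per cell instead of a single bit.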

(22)

Visualization of First Level Filters

22

(23)

• Volumetric

Representation for 3D: Volumetric Deformation

[Yumer and Mitra 2016] 23

(24)

Efficient Volumetric Datastructures

24

[Wang et al. 2017]

(25)

O-CNN: STRUCTURE AND CNN OPERATIONS

Data Structure and CNN Operations

25

shuffled keys

(encode position in space)

labels

(parent label → child indices)

downsampling example

(“where there is an octant, there is CNN computation”)

faster neighbor access
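One way to read "shuffled keys" is as bit-interleaved (Morton-style) integer codes of the octant coordinates. The sketch below assumes that reading and an x, y, z bit order per level; the actual bit layout used by O-CNN may differ, and the names `shuffled_key` and `parent_key` are hypothetical.

```python
def shuffled_key(x, y, z, depth):
    """Interleave the bits of integer octant coordinates, coarsest level first."""
    key = 0
    for bit in range(depth - 1, -1, -1):
        octant = (((x >> bit) & 1) << 2) | (((y >> bit) & 1) << 1) | ((z >> bit) & 1)
        key = (key << 3) | octant   # append 3 bits for this octree level
    return key


def parent_key(key):
    """Dropping the last 3 bits gives the key of the parent octant."""
    return key >> 3


# All eight children of the octant (1, 1, 1) at depth 1 share one parent key.
children = [shuffled_key(x, y, z, 2) for x in (2, 3) for y in (2, 3) for z in (2, 3)]
print(sorted(children), {parent_key(k) for k in children})
```

With such keys, sorting places sibling octants next to each other in memory, which is consistent with the faster neighbor access noted on the slide.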

(26)

Efficient Volumetric Datastructures

[Hane et al. 2018] 26

only generate non-empty voxels

[Wang et al. 2017]

Encoder Decoder/generator

(27)

Efficient Volumetric Datastructures

[Hane et al. 2018] 27

(28)

O-CNN: EVALUATION

Lower Memory Footprint

28

(29)

*) ADAPTIVE O-CNN [WANG ET AL. 2018]

Adaptive O-CNN

29

image to planar patch-based shapes

[Wang et al. 2018]

(30)

First-order Patches

30

OCNN Adaptive OCNN

(31)

*) FPNN: FIELD PROBING NEURAL NETWORKS FOR 3D DATA [LI ET AL. 2016]

Field Probing Neural Networks for 3D Data

31

[Li et al. 2016]

(32)

Spatial Probes

32

(33)

Method Details

33

(34)

• Image-based 

• Volumetric

• PROS: adaptations of image networks

• CONS: special layers for hierarchical data structures, still too coarse

• Point-based

Representation for 3D

34

(35)

• Image-based 

• Volumetric 

• Surface-based 

• Point-based

Representation for 3D

35

(36)

• Many different ways to parameterize a surface:

Local/Global Parameterizations

36

Geometry Image [Sinha et al. 2016]

Metric Alignment (GWCNN) [Ezuz et al. 2017]

(37)

*) DEEP LEARNING 3D SHAPE SURFACES USING GEOMETRY IMAGES [SINHA ET AL. 2016]

Shape Surfaces using Geometry Images

37

(38)

*) GEODESIC CONVOLUTIONAL NEURAL NETWORKS ON RIEMANNIAN MANIFOLDS [MASCI ET AL. 2018 (UPDATED VERSION)]

Using Geodesic Patches: GCNN

38

$$(f \star a)(x) := \sum_{\theta, r} a(\theta + \Delta\theta, r)\,(D(x) f)(r, \theta)$$

[Masci et al. 2015]
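A discrete sketch of this sum, assuming the patch $(D(x)f)(r, \theta)$ and the filter $a$ are both sampled on the same polar grid and stored as `[r][theta]` lists, with $\Delta\theta$ expressed as an integer number of angular bins. Taking the maximum over all shifts is shown as one way to remove the dependence on the arbitrary angular origin; the function names are hypothetical.

```python
def geodesic_conv(patch, filt, shift):
    """Sum over (theta, r) of a(theta + shift, r) * (D(x)f)(r, theta)."""
    n_r, n_theta = len(patch), len(patch[0])
    return sum(
        filt[r][(t + shift) % n_theta] * patch[r][t]
        for r in range(n_r)
        for t in range(n_theta)
    )


def angular_max(patch, filt):
    """Maximum response over all angular shifts of the filter."""
    return max(geodesic_conv(patch, filt, s) for s in range(len(patch[0])))


patch = [[1, 2, 3, 4]]            # one radial bin, four angular bins
filt = [[1, 0, 0, 0]]
rotated = [[2, 3, 4, 1]]          # same patch with a different angular origin
print(angular_max(patch, filt), angular_max(rotated, filt))
```

Because a rotation of the patch only permutes the set of responses over shifts, the maximum is the same for both patches.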

(39)

*) GEODESIC CONVOLUTIONAL NEURAL NETWORKS ON RIEMANNIAN MANIFOLDS [MASCI ET AL. 2018 (UPDATED VERSION)]

GCNN Architecture

39

(40)

• Parameterize in spectral domain

Handling Rotational Ambiguity

40

(41)

map 3D surface to 2D domain

Parameterization for Surface Analysis

41

[Maron et al. 2017]

(43)

• Map 3D surface to 2D domain 

• One such mapping: flat torus (seamless => translation-invariant) 

• Many mappings exist: sample a few and average the results 

• Which functions to map?  

XYZ, normals, curvature, …

Parameterization for Surface Analysis

43

[Maron et al. 2017]
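Why a seamless flat torus helps can be sketched with a periodic (wrap-around) convolution: on such a domain the filter response commutes with cyclic translations of the image. The code below is a toy illustration with hypothetical helper names, not the network from the paper.

```python
def circular_conv2d(image, kernel):
    """2D filtering with wrap-around borders (no kernel flip, as is usual in CNNs)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(
                kernel[a][b] * image[(i + a) % h][(j + b) % w]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


def roll(image, di, dj):
    """Cyclically translate an image by (di, dj) pixels."""
    h, w = len(image), len(image[0])
    return [[image[(i - di) % h][(j - dj) % w] for j in range(w)] for i in range(h)]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, -1], [2, 0]]
print(circular_conv2d(roll(image, 1, 2), kernel) == roll(circular_conv2d(image, kernel), 1, 2))
```

With zero padding instead of wrap-around, the two sides would differ near the image border, which is where seams in a cut-open parameterization would otherwise show up.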

(44)

• Tested on mesh segmentation

Parameterization for Surface Analysis

44

[Maron et al. 2017]

(45)

• Condition decoded points on 2D patches

Texture Transfer (Parameterization + Alignment)

45

[Wang et al. 2016]

(46)

AtlasNet for Surface Generation

[Groueix et al. 2018] 46

condition decoded points on 2D patches

(47)

AtlasNet for Surface Generation

47

Latent representation can be inferred from images or point clouds

[Groueix et al. 2018]

(48)

AtlasNet for Surface Generation

48

A quad mesh is generated by mapping a regular grid in the 2D domain to 3D points

[Groueix et al. 2018]
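The grid-to-mesh step can be sketched as follows. The learned decoder is replaced here by `toy_decoder`, a hypothetical stand-in that bends the unit square onto a half cylinder; in the real method the mapping is a trained network conditioned on the latent shape code.

```python
import math


def grid_quad_mesh(n, decode):
    """Map an n x n grid on the unit square to 3D and connect it with quads."""
    uvs = [(i / (n - 1), j / (n - 1)) for i in range(n) for j in range(n)]
    vertices = [decode(u, v) for u, v in uvs]
    faces = []
    for i in range(n - 1):
        for j in range(n - 1):
            a = i * n + j                                # corner of the grid cell
            faces.append((a, a + 1, a + n + 1, a + n))   # quad connectivity from the grid
    return vertices, faces, uvs


def toy_decoder(u, v):
    """Hypothetical stand-in for the learned 2D-to-3D mapping."""
    angle = math.pi * u
    return (math.cos(angle), math.sin(angle), v)


vertices, faces, uvs = grid_quad_mesh(4, toy_decoder)
print(len(vertices), len(faces), faces[0])
```

The connectivity comes entirely from the 2D grid, and the `uvs` list can be reused directly as per-vertex texture coordinates.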

(49)

AtlasNet for Surface Generation

49

texture coordinates come for free!!

(50)

• Image-based 

• Volumetric 

• Surface-based

• PROS: parameterize + image networks (intrinsic representation)

• CONS: suffers from parameterization artifacts (local versus global distortion), requires a good-quality mesh 

• Point-based

Representation for 3D

50

(51)

• Image-based 

• Volumetric 

• Point-based

Representation for 3D

51

(52)

• Common representation: native representation 

• Easy to obtain from meshes, depth scans, laser scans

Representation for 3D: Point-based

52

(53)

• Common representation 

• Easy to obtain from meshes, depth scans, laser scans 

• Unstructured (e.g., any permutation of points gives same shape!)

In Original Representation

[Qi et al. 2017] 53

(54)

• Permutation-invariant functions

PointNet for Point Cloud Analysis

54

[Qi et al. 2017]

(55)

• Permutation-invariant functions

• Use MLPs (h) and max-pooling (g) as simple symmetric functions

PointNet for Point Cloud Analysis

55

[Qi et al. 2017]
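A toy sketch of the symmetric construction g(h(x1), ..., h(xn)): `h` is a single linear layer with ReLU standing in for the shared MLP, and `g` is a channel-wise max. The weights are arbitrary fixed numbers chosen for illustration, not trained values.

```python
import random


def h(point, weights, bias):
    """Per-point feature: one linear layer + ReLU (stand-in for the shared MLP)."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, point)) + b)
        for row, b in zip(weights, bias)
    ]


def g(features):
    """Symmetric aggregation: channel-wise max over all points."""
    return [max(channel) for channel in zip(*features)]


def global_feature(points, weights, bias):
    return g([h(p, weights, bias) for p in points])


weights = [[1.0, 0.0, -1.0], [0.5, 2.0, 0.0]]   # 3 inputs -> 2 feature channels
bias = [0.1, -0.2]
points = [(0.0, 1.0, 2.0), (1.0, -1.0, 0.5), (2.0, 0.0, 0.0), (-1.0, 3.0, 1.0)]

shuffled = points[:]
random.shuffle(shuffled)
print(global_feature(points, weights, bias) == global_feature(shuffled, weights, bias))
```

Since `h` is applied to each point independently and the max ignores order, any permutation of the input points yields the same global feature.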

(56)

PointNet Architecture

[Qi et al. 2017] 56

(57)

PointNet for Point Cloud Analysis

57

(58)

PointNet++

[Qi et al. 2018] 58

(59)

• Multi-scale version

PCPNet for Local Point Cloud Analysis

[Guerrero et al. 2018] 59

(60)

PCPNet Architecture

60

(61)

• Often the generated output needs to be compared to some ground-truth shape

PointNet for Point Cloud Synthesis

[Su et al. 2017] 61

Earth Mover's Distance as loss function
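For two point sets of equal size with uniform weights, the Earth Mover's Distance reduces to the cheapest one-to-one matching. The sketch below finds it by brute force over all permutations, which is only feasible for a handful of points; practical losses rely on approximate or dedicated assignment solvers. Averaging over the number of points is an assumed normalization, and the function name is hypothetical.

```python
import math
from itertools import permutations


def earth_movers_distance(a, b):
    """Exact EMD between equal-size point sets: min over bijections of the mean matched distance.

    Brute force, O(n!) time: intended for tiny illustrative inputs only.
    """
    if len(a) != len(b):
        raise ValueError("point sets must have the same size")
    best = min(
        sum(math.dist(p, b[j]) for p, j in zip(a, perm))
        for perm in permutations(range(len(b)))
    )
    return best / len(a)


generated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(1.0, 0.0, 1.0), (0.0, 0.0, 1.0)]    # same shape, reordered and shifted by 1 in z
print(earth_movers_distance(generated, target))
```

Because the minimum runs over all matchings, the value does not depend on the order in which either point set is listed, which suits unordered point cloud outputs.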