• Modeling by example, revisited
Application #1: 3D Modeling
1
[Sung et al. 2017]
A deep neural network predicts the next best part to add, and where to place it, enabling non-expert users to create novel shapes.
• Joint multi-modal understanding
Application #2: Image Understanding
2
[Zhang et al. 2017]
Understanding 3D shapes can benefit image understanding.
• Semantic 3D reconstruction
Application #3: Semantic Scene Understanding
[Song et al. 2017] 3
4
Application #4: 3D Asset Creation
[Kelly et al. 2017, Kelly and Guerrero et al. 2018]
• Number of voxels grows as O(n³) with resolution n, versus O(n²) for the occupied surface (illustrated in the sketch below)
What’s Different in 3D?
5
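A minimal numpy sketch of this scaling gap (my own toy example, not from the slides): count the voxels inside a solid sphere versus those within one voxel of its surface, at increasing resolutions n. The solid count grows roughly as n³, the shell count roughly as n².

```python
import numpy as np

for n in (16, 32, 64, 128):
    coords = (np.indices((n, n, n)) + 0.5) / n - 0.5   # voxel centers in [-0.5, 0.5]^3
    r = np.sqrt((coords ** 2).sum(axis=0))             # distance of each center to the origin
    solid = r <= 0.4                                    # inside a sphere of radius 0.4
    shell = np.abs(r - 0.4) <= 1.0 / n                  # within one voxel of the surface
    print(f"n={n:4d}  solid voxels={int(solid.sum()):9d}  shell voxels={int(shell.sum()):8d}")
```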
Data Representation: Many Possibilities!
6
points, voxels, cells, patches
1. Representation
2. Neighborhood information (see the k-NN sketch below)
• which elements are the neighbours?
• how are the elements ordered?
3. Extrinsic versus intrinsic representation
4. Simplicity versus memory/runtime tradeoff
Challenges
7
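The neighbourhood challenge can be made concrete with a small, hedged sketch (toy data and scipy are my assumptions, not the slides' pipeline): an unstructured point set carries no canonical neighbours or ordering, so neighbourhoods are usually built explicitly, e.g. with a k-nearest-neighbour query.

```python
import numpy as np
from scipy.spatial import cKDTree

# There is no grid to define neighbours for raw points, so build neighbourhoods
# explicitly with a k-NN query; note the result is ordered by distance only.
points = np.random.rand(1000, 3)        # toy point cloud
tree = cKDTree(points)
dists, idx = tree.query(points, k=9)    # each point plus its 8 nearest neighbours
neighbors = idx[:, 1:]                  # drop column 0 (the point itself)
print(neighbors.shape)                  # (1000, 8)
```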
• Image-based
• Volumetric
• Surface-based
• Point-based
Representation for 3D
8
• Image-based
• Volumetric
• Point-based
• Surface-based
Representation for 3D: Multi-view CNN
10
regular image analysis networks
[Kalogerakis et al. 2015]
Multi-view CNN
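A hedged PyTorch sketch of the multi-view idea (class name and layer sizes are illustrative, not the published architecture): render several views per shape, run a shared 2D backbone on each, and max-pool the per-view features before classification.

```python
import torch
import torch.nn as nn

class MultiViewCNN(nn.Module):
    """Shared image backbone per view + max-pooling ("view pooling") across views."""
    def __init__(self, num_classes=40):
        super().__init__()
        self.backbone = nn.Sequential(                 # stand-in for any image network
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, views):                          # views: (B, V, 3, H, W)
        b, v, c, h, w = views.shape
        feats = self.backbone(views.reshape(b * v, c, h, w)).reshape(b, v, -1)
        pooled, _ = feats.max(dim=1)                   # aggregate across views
        return self.classifier(pooled)

logits = MultiViewCNN()(torch.randn(2, 12, 3, 128, 128))   # e.g. 12 rendered views per shape
```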
3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation
11
3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation
Multi-view CNN
Integrating View Information
12
• Image-based
Representation for 3D: Local Multi-view CNN
13
Segmentation
Correspondence
Feature matching
Predicting semantic functions
[Huang et al. 2018]
localized renderings for point-wise features
Tangent Convolutions
14
[Tatarchenko et al. 2018]
loses information due to occlusion
project to local patches
(contrast with PCPNet construction)
Signal Interpolation
▸ Use nearest-neighbour or Gaussian-mixture-based methods for interpolation.
▸ The interpolated signal is now denser.
Dealing with Sparse Points
15
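A hedged numpy sketch of the interpolation step (toy 2D data; the grid size and sigma are my assumptions): scattered samples on a local patch are splatted onto a regular grid with normalized Gaussian weights, so that standard convolutions can be applied afterwards.

```python
import numpy as np

def gaussian_interpolate(sample_xy, sample_val, grid_res=32, sigma=0.05):
    """Densify a sparse 2D signal by normalized Gaussian-weighted averaging."""
    xs = np.linspace(0.0, 1.0, grid_res)
    gx, gy = np.meshgrid(xs, xs)
    grid = np.stack([gx.ravel(), gy.ravel()], axis=1)           # (grid_res^2, 2) query points
    d2 = ((grid[:, None, :] - sample_xy[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))                        # Gaussian weight to every sample
    w /= w.sum(axis=1, keepdims=True) + 1e-12
    return (w @ sample_val).reshape(grid_res, grid_res)

xy = np.random.rand(200, 2)              # sparse sample positions (e.g. projected points)
val = np.sin(6.0 * xy[:, 0])             # toy signal at the samples
dense = gaussian_interpolate(xy, val)    # dense 32x32 patch image
```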
Tangent Convolutions
Improved Performance
17
• Image-based
• PROS: directly use image networks, good performance
• CONS: rendering is slow and memory-heavy, not very geometric
• Volumetric
• Point-based
• Surface-based
Representation for 3D
18
• Image-based
• Volumetric
• Surface-based
• Point-based
Representation for 3D
19
• Volumetric
3D CNNs: Direct Approach
20
[Xiao et al. 2014]
*) VOXNET: A 3D CONVOLUTIONAL NEURAL NETWORK FOR REAL-TIME OBJECT RECOGNITION [MATURANA ET AL. 2015]
VoxNet [Maturana et al. 2015]
21
▸ Binary occupancy, density grid, etc.
rotational invariance
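A minimal 3D CNN over a binary occupancy grid, in the spirit of VoxNet (layer sizes and the 32³ resolution are illustrative, not the published configuration):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv3d(1, 32, kernel_size=5, stride=2), nn.ReLU(),   # 32^3 -> 14^3
    nn.Conv3d(32, 32, kernel_size=3), nn.ReLU(),            # 14^3 -> 12^3
    nn.MaxPool3d(2),                                        # 12^3 -> 6^3
    nn.Flatten(),
    nn.Linear(32 * 6 * 6 * 6, 128), nn.ReLU(),
    nn.Linear(128, 40),                                     # e.g. 40 object classes
)
voxels = (torch.rand(8, 1, 32, 32, 32) > 0.9).float()       # batch of binary occupancy grids
logits = model(voxels)                                      # (8, 40)
```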
Visualization of First Level Filters
22
• Volumetric
Representation for 3D: Volumetric Deformation
[Yumer and Mitra 2016] 23
Efficient Volumetric Datastructures
24
[Wang et al. 2017]
O-CNN: Data Structure and CNN Operations
25
• shuffled keys encode position in space (see the sketch below)
• labels: parent label → child indices
• downsampling example ("where there is an octant, there is CNN computation")
• faster neighbor access
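A hedged sketch of the key encoding, using the standard Morton-code construction (my own toy code, not the O-CNN implementation): interleaving the bits of an octant's (x, y, z) coordinates yields one integer key, and sorting by key places sibling octants next to each other, which is what makes neighbour access and pooling cheap.

```python
def shuffled_key(x, y, z, depth):
    """Interleave the bits of (x, y, z) into a single Morton-style key."""
    key = 0
    for i in range(depth):
        key |= ((x >> i) & 1) << (3 * i + 2)   # x bit
        key |= ((y >> i) & 1) << (3 * i + 1)   # y bit
        key |= ((z >> i) & 1) << (3 * i)       # z bit
    return key

# Octants at depth 2 have coordinates in {0, ..., 3}; after sorting, every run of
# 8 consecutive keys is one set of siblings under a common parent.
keys = sorted(shuffled_key(x, y, z, depth=2)
              for x in range(4) for y in range(4) for z in range(4))
print(keys[:8])   # the 8 children of the first parent octant
```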
Efficient Volumetric Datastructures
[Hane et al. 2018] 26
only generate non-empty voxels
[Wang et al. 2017]
Encoder and decoder/generator
Efficient Volumetric Datastructures
[Hane et al. 2018] 27
O-CNN Evaluation: Lower Memory Footprint
28
*) ADAPTIVE O-CNN [WANG ET AL. 2018]
Adaptive O-CNN
29
image to planar patch-based shapes
[Wang et al. 2018]
First-order Patches
30
O-CNN vs. Adaptive O-CNN
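A hedged sketch of what a first-order (planar) patch is (my own least-squares fit, not the Adaptive O-CNN code): each non-empty leaf octant stores a plane fitted to the points it contains.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through a point set: normal . x + offset = 0."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]                      # direction of least variance
    offset = -normal @ centroid
    return normal, offset

pts = np.random.rand(100, 3) * np.array([1.0, 1.0, 0.05])   # roughly planar toy points in an octant
normal, offset = fit_plane(pts)
print(normal, offset)
```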
*) FPNN: FIELD PROBING NEURAL NETWORKS FOR 3D DATA [LI ET AL. 2016]
Field Probing Neural Networks for 3D Data
31
[Li et al. 2016]
Spatial Probes
32
Method Details
33
• Image-based
• Volumetric
• PROS: adaptations of image networks
• CONS: special layers for hierarchical datastructures, still too coarse
• Surface-based
• Point-based
Representation for 3D
34
• Image-based
• Volumetric
• Surface-based
• Point-based
Representation for 3D
35
• Many different ways to parameterize a surface:
Local/Global Parameterizations
36
Geometry Images [Sinha et al. 2016]
Metric Alignment (GWCNN) [Ezuz et al. 2017]
*) DEEP LEARNING 3D SHAPE SURFACES USING GEOMETRY IMAGES [SINHA ET AL. 2016]
Shape Surfaces using Geometry Images
37
*) GEODESIC CONVOLUTIONAL NEURAL NETWORKS ON RIEMANNIAN MANIFOLDS [MASCI ET AL. 2018 (UPDATED VERSION)]
Using Geodesic Patches: GCNN
38
(f ⋆ a)(x) := Σ_{θ,r} a(θ + Δθ, r) (D(x)f)(r, θ)
[Masci et al. 2015]
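A hedged numpy rendering of the discrete sum above (bin counts are my assumptions): D(x)f is the signal extracted on a local geodesic polar patch with n_rho radial and n_theta angular bins, the filter a is correlated with it under every cyclic angular shift Δθ, and taking the maximum response is one way to resolve the rotational ambiguity discussed next.

```python
import numpy as np

def geodesic_conv(patch, kernel):
    """Correlate a filter with a polar patch over all cyclic angular shifts."""
    n_theta = patch.shape[1]
    responses = [(np.roll(kernel, dtheta, axis=1) * patch).sum()   # sum over (theta, r)
                 for dtheta in range(n_theta)]
    return max(responses)               # angular max-pooling

patch = np.random.rand(5, 16)           # D(x)f sampled on 5 radial x 16 angular bins
kernel = np.random.rand(5, 16)
print(geodesic_conv(patch, kernel))
```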
*) GEODESIC CONVOLUTIONAL NEURAL NETWORKS ON RIEMANNIAN MANIFOLDS [MASCI ET AL. 2018 (UPDATED VERSION)]
GCNN Architecture
39
• Parameterize in spectral domain
Handling Rotational Ambiguity
40
map 3D surface to 2D domain
Parameterization for Surface Analysis
41
[Maron et al. 2017]
• Map 3D surface to 2D domain
• One such mapping: flat torus (seamless ⇒ translation-invariant)
• Many mappings exist: sample a few and average the results
• Which functions to map?
XYZ, normals, curvature, …
Parameterization for Surface Analysis
43
[Maron et al. 2017]
• Tested on mesh segmentation
Parameterization for Surface Analysis
44
[Maron et al. 2017]
• Condition decoded points on 2D patches
Texture Transfer (Parameterization + Alignment)
45
[Wang et al. 2016]
• Condition decoded points on 2D patches
AtlasNet for Surface Generation
[Groueix et al. 2018] 46
• Condition decoded points on 2D patches
AtlasNet for Surface Generation
47
Latent representation can be inferred from images or point clouds
[Groueix et al. 2018]
• Condition decoded points on 2D patches
AtlasNet for Surface Generation
48
Latent representation can be inferred from images or point clouds
A quad mesh is generated by mapping a regular grid in the 2D domain to 3D points
[Groueix et al. 2018]
• Condition decoded points on 2D patches
AtlasNet for Surface Generation
49
Latent representation can be inferred from images or point clouds
Texture coordinates come for free!
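A hedged PyTorch sketch of the patch decoder described above (class name and layer widths are illustrative): an MLP maps a sampled 2D parameter point, concatenated with the shape's latent code, to a 3D surface point; feeding in a regular 2D grid instead of random samples yields the quad mesh, and the input (u, v) doubles as a texture coordinate.

```python
import torch
import torch.nn as nn

class PatchDecoder(nn.Module):
    """Map (u, v) in the unit square, conditioned on a latent code, to a 3D point."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 + latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 3),
        )

    def forward(self, uv, latent):                        # uv: (B, N, 2), latent: (B, latent_dim)
        latent = latent[:, None, :].expand(-1, uv.shape[1], -1)
        return self.mlp(torch.cat([uv, latent], dim=-1))  # (B, N, 3)

decoder = PatchDecoder()
uv = torch.rand(4, 1024, 2)            # random samples on one 2D patch
latent = torch.randn(4, 128)           # e.g. inferred from an image or point cloud
points = decoder(uv, latent)           # decoded 3D points
```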
• Image-based
• Volumetric
• Surface-based
• PROS: parameterize + image networks (intrinsic representation)
• CONS: suffers from parameterization artefacts (local versus global distortion), requires a good-quality mesh
• Point-based
Representation for 3D
50
• Image-based
• Volumetric
• Surface-based
• Point-based
Representation for 3D
51
• Common, native representation
• Easy to obtain from meshes, depth scans, laser scans
Representation for 3D: Point-based
52
• Common representation
• Easy to obtain from meshes, depth scans, laser scans
• Unstructured (e.g., any permutation of the points gives the same shape!)
In Original Representation
[Qi et al. 2017] 53
• Permutation-invariant functions
PointNet for Point Cloud Analysis
54
[Qi et al. 2017]
• Permutation-invariant functions
• Use MLPs (h) and max-pooling (g) as simple symmetric functions
PointNet for Point Cloud Analysis
55
[Qi et al. 2017]
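A hedged PyTorch sketch of the h/g construction above (layer widths are illustrative, and the input transforms of the full PointNet are omitted): a shared per-point MLP followed by a max-pool over points gives a permutation-invariant shape descriptor.

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    def __init__(self, num_classes=40):
        super().__init__()
        self.h = nn.Sequential(                   # h: shared MLP applied to every point
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
        )
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, pts):                       # pts: (B, N, 3)
        feats = self.h(pts)                       # (B, N, 256)
        global_feat = feats.max(dim=1).values     # g: symmetric max-pool over the N points
        return self.classifier(global_feat)

net = TinyPointNet()
pts = torch.rand(2, 1024, 3)
perm = torch.randperm(1024)
assert torch.allclose(net(pts), net(pts[:, perm]))   # output is permutation-invariant
```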
PointNet Architecture
[Qi et al. 2017] 56
PointNet for Point Cloud Analysis
57
PointNet++
[Qi et al. 2018] 58
• Multi-scale version
PCPNet for Local Point Cloud Analysis
[Guerrero et al. 2018] 59
PCPNet Architecture
60
• Often the generated output needs to be compared to some ground-truth shape
PointNet for Point Cloud Synthesis
[Su et al. 2017] 61
Earth Mover's Distance as a loss function
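A hedged sketch of the loss (using scipy's Hungarian solver; real pipelines use faster approximations): for two equal-sized point sets with uniform weights, the Earth Mover's Distance reduces to the cost of an optimal one-to-one assignment.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def emd(pred, target):
    """Exact EMD between equal-sized, uniformly weighted point sets."""
    cost = cdist(pred, target)                   # pairwise Euclidean distances
    row, col = linear_sum_assignment(cost)       # optimal 1-to-1 matching
    return cost[row, col].mean()

pred = np.random.rand(256, 3)                    # generated point cloud
target = np.random.rand(256, 3)                  # ground-truth point cloud
print(emd(pred, target))
```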