High-Performance Graphics 2021, N. Binder and T. Ritschel (Guest Editors)
A Halfedge Refinement Rule for Parallel Catmull-Clark Subdivision
Supplemental Material: CPU Performance Measurements
J. Dupuy and K. Vanhoey, Unity Technologies
This document provides exhaustive performance measurements of our CPU implementation. The measurements were performed on a
24-core AMD Ryzen Threadripper 3960X CPU. We compiled our program with the options -march=native and -Os. Our program relies
on OpenMP to spawn threads, and we report performance measurements for thread counts of 1, 2, 4, 8, 16, and 32. Each number is
the median timing over a set of 50 runs. The key takeaway of this document is that the performance of our method scales
almost linearly with the number of threads from 1 to 16 threads. For 32 threads we observe smaller speed-ups because the CPU has
only 24 cores and therefore distributes the work less efficiently once the thread count exceeds the core count.
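The scaling behaviour can be checked directly from the reported medians. The following is a minimal sketch, not part of the authors' code: `median_ms` is a hypothetical helper that mirrors the median-over-50-runs protocol, and the sample timings are the Rook Vertex-Point Refinement values at depth 5 as read from the Section 1 figure, assuming its six series are listed in order of increasing thread count.

```python
import statistics
import time


def median_ms(fn, runs=50):
    """Median wall-clock time of fn() in milliseconds over `runs` calls (hypothetical helper)."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


def speedups(timings_ms):
    """Speed-up of each thread count relative to the single-thread timing."""
    base = timings_ms[1]
    return {threads: base / t for threads, t in timings_ms.items()}


def efficiencies(timings_ms):
    """Parallel efficiency: speed-up divided by the thread count (1.0 is ideal)."""
    return {threads: s / threads for threads, s in speedups(timings_ms).items()}


# Rook, Vertex-Point Refinement, depth 5 (ms); the series order is an assumption.
ROOK_VERTEX_DEPTH5 = {1: 190.99, 2: 97.37, 4: 50.51, 8: 26.93, 16: 15.05, 32: 12.52}

if __name__ == "__main__":
    eff = efficiencies(ROOK_VERTEX_DEPTH5)
    for threads, s in speedups(ROOK_VERTEX_DEPTH5).items():
        print(f"{threads:2d} threads: speed-up {s:5.2f}x, efficiency {eff[threads]:4.2f}")
```

With these values the speed-up grows from about 1.96x at 2 threads to about 12.7x at 16 threads, but only to about 15.3x at 32 threads, which matches the saturation beyond the physical core count.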
1. Rook
Vertex-Point Refinement, timings (ms)
depth:    1           2           3           4           5
CPU_1     0.26        2.58        11.92       47.99       190.99
CPU_2     0.3         1.36        6.13        24.38       97.37
CPU_4     0.18        0.79        3.26        12.74       50.51
CPU_8     0.12        0.48        1.86        6.97        26.93
CPU_16    8.56·10^-2  0.31        1.09        3.86        15.05
CPU_32    7.87·10^-2  0.32        0.93        3.23        12.52
Halfedge Refinement, timings (ms)
depth:    1           2           3           4           5
CPU_1     1.02·10^-2  0.3         1.81        8           31.64
CPU_2     1.16·10^-2  0.29        0.91        4           16.31
CPU_4     7.24·10^-3  0.16        0.47        2.02        8.25
CPU_8     6.66·10^-3  9.13·10^-2  0.24        1.05        4.19
CPU_16    6.6·10^-3   5·10^-2     0.28        0.71        2.18
CPU_32    9.11·10^-3  4.37·10^-2  0.12        0.49        1.86
Crease Refinement, timings (ms)
depth:    1           2           3           4           5
CPU_1     3.76·10^-3  8.95·10^-2  0.26        0.59        1.23
CPU_2     5.26·10^-3  9.04·10^-2  0.26        0.57        0.64
CPU_4     4.23·10^-3  5.69·10^-2  0.16        0.34        0.39
CPU_8     4.26·10^-3  3.31·10^-2  9.34·10^-2  0.18        0.36
CPU_16    5.65·10^-3  2.82·10^-2  5.64·10^-2  0.11        0.23
CPU_32    8.43·10^-3  2.58·10^-2  4.96·10^-2  8.94·10^-2  0.14
Rook, S0 and S4: 44 non-quads / 24 boundaries, 280 creases
H0 = 3,064; F0 = 777; E0 = 1,544; V0 = 768
H4 = 784,384; F4 = 196,096; E4 = 392,384; V4 = 196,289
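The element counts at depth 4 can be cross-checked against those of the input mesh. The sketch below assumes the standard Catmull-Clark counting rules: each face with n sides splits into n quads, each edge splits in two, and one new edge joins every face point to each edge point of that face. The function names are illustrative and are not taken from the authors' implementation.

```python
def subdivide_counts(h, f, e, v):
    """Element counts after one Catmull-Clark step.

    h, f, e, v are the halfedge, face, edge, and vertex counts before the step.
    """
    new_h = 4 * h          # every new face is a quad, and there are h new faces
    new_f = h              # one new quad per halfedge (per face corner)
    new_e = 2 * e + h      # split edges plus one face-point-to-edge-point edge per halfedge
    new_v = v + e + f      # old vertices plus one edge point and one face point each
    return new_h, new_f, new_e, new_v


def counts_at_depth(h0, f0, e0, v0, depth):
    """Apply subdivide_counts `depth` times, starting from the input mesh counts."""
    counts = (h0, f0, e0, v0)
    for _ in range(depth):
        counts = subdivide_counts(*counts)
    return counts


if __name__ == "__main__":
    # Rook input mesh: H0 = 3,064, F0 = 777, E0 = 1,544, V0 = 768
    print(counts_at_depth(3064, 777, 1544, 768, depth=4))
    # prints (784384, 196096, 392384, 196289)
```

The result reproduces the H4, F4, E4, and V4 values listed for the Rook, and the same recurrence also reproduces the ArmorGuy counts, whose input mesh has many boundary edges.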
2. Bishop
Vertex-Point Refinement, timings (ms)
depth:    1           2           3           4           5
CPU_1     0.33        3.24        14.52       56.59       238.57
CPU_2     0.36        1.67        7.44        29.61       119.7
CPU_4     0.19        0.91        3.92        15.38       61.33
CPU_8     0.12        0.53        2.11        8.3         32.49
CPU_16    6.95·10^-2  0.41        1.24        4.63        18.08
CPU_32    7.51·10^-2  0.29        1.05        3.74        14.76
Halfedge Refinement, timings (ms)
depth:    1           2           3           4           5
CPU_1     1.23·10^-2  0.38        2.21        9.42        39.65
CPU_2     1.39·10^-2  0.19        1.11        4.87        19.87
CPU_4     8.2·10^-3   0.21        0.57        2.48        10.03
CPU_8     6.84·10^-3  0.11        0.3         1.26        5.12
CPU_16    6.81·10^-3  7.99·10^-2  0.33        1.05        4.39
CPU_32    9.62·10^-3  5.13·10^-2  0.2         0.56        2.26
Crease Refinement, timings (ms)
depth:    1           2           3           4           5
CPU_1     4.5·10^-3   0.16        0.31        0.7         1.51
CPU_2     6.14·10^-3  0.11        0.32        0.37        0.76
CPU_4     4.35·10^-3  6.02·10^-2  0.19        0.38        0.41
CPU_8     4.72·10^-3  3.89·10^-2  9.96·10^-2  0.24        0.47
CPU_16    5.65·10^-3  2.89·10^-2  6.45·10^-2  0.16        0.26
CPU_32    9.15·10^-3  2.54·10^-2  5.41·10^-2  0.11        0.18
Bishop, S0 and S4: 132 non-quads / 24 boundaries, 224 creases
H0 = 3,740; F0 = 968; E0 = 1,882; V0 = 917
H4 = 957,440; F4 = 239,360; E4 = 478,912; V4 = 239,555
3. Car
Vertex-Point Refinement, timings (ms)
depth:    1           2           3           4           5
CPU_1     0.54        5.39        24.11       96.97       413.57
CPU_2     0.29        2.79        12.45       49.81       207.58
CPU_4     0.31        1.48        6.55        26.08       104.97
CPU_8     0.22        0.91        3.62        13.98       56.1
CPU_16    0.14        0.59        2.1         7.91        30.79
CPU_32    0.12        0.44        1.68        6.33        25.2
Halfedge Refinement, timings (ms)
depth:    1           2           3           4           5
CPU_1     3.17·10^-2  0.61        3.68        16.18       69.46
CPU_2     2.56·10^-2  0.32        1.85        8.21        33.44
CPU_4     1.25·10^-2  0.31        0.94        4.13        16.86
CPU_8     8.62·10^-3  0.18        0.49        2.12        8.59
CPU_16    8.39·10^-3  9.82·10^-2  0.26        1.09        4.39
CPU_32    9.69·10^-3  7.31·10^-2  0.31        0.92        3.78
Crease Refinement, timings (ms)
depth:    1           2           3           4           5
CPU_1     1.18·10^-2  0.18        0.52        1.2         2.54
CPU_2     9.96·10^-3  0.21        0.28        0.63        1.29
CPU_4     6.29·10^-3  0.11        0.28        0.35        0.68
CPU_8     5·10^-3     6.09·10^-2  0.19        0.39        0.39
CPU_16    6.23·10^-3  3.93·10^-2  0.1         0.22        0.41
CPU_32    8.4·10^-3   3.32·10^-2  7.87·10^-2  0.14        0.29
Car, S0 and S4: all-quads / 60 boundaries, 314 creases
H0 = 6,300; F0 = 1,575; E0 = 3,180; V0 = 1,642
H4 = 1,612,800; F4 = 403,200; E4 = 806,880; V4 = 403,717
4. ArmorGuy
Vertex-Point Refinement, timings (ms)
depth:    1           2           3           4           5
CPU_1     2.95        28.5        134.78      561.93      2,294.11
CPU_2     1.57        14.79       67.44       283.99      1,162.7
CPU_4     0.8         7.62        35.01       143.33      595.87
CPU_8     0.45        4.08        18.55       76.1        314.03
CPU_16    0.36        2.3         10.36       42.21       174.42
CPU_32    0.28        1.91        8.28        34.35       142.76
Halfedge Refinement, timings (ms)
depth:    1           2           3           4           5
CPU_1     0.2         3.37        20.61       89.52       386.84
CPU_2     6.38·10^-2  1.72        10.39       45.62       193.1
CPU_4     3.33·10^-2  0.85        5.19        22.56       97.39
CPU_8     1.94·10^-2  0.43        2.65        11.54       49.48
CPU_16    3.14·10^-2  0.33        1.76        5.95        26.3
CPU_32    2.32·10^-2  0.19        1.13        5.12        27.45
Crease Refinement, timings (ms)
depth:    1           2           3           4           5
CPU_1     0.11        1.13        3.14        7.04        14.15
CPU_2     3.37·10^-2  0.55        1.54        3.45        7.44
CPU_4     1.52·10^-2  0.29        0.81        1.79        3.72
CPU_8     9.73·10^-3  0.32        0.44        1.19        1.95
CPU_16    1.45·10^-2  0.17        0.5         0.62        1.05
CPU_32    1.29·10^-2  0.1         0.26        0.4         0.74
ArmorGuy, S0 and S4: 300 non-quads / 2,034 boundaries, 7,101 creases
H0 = 34,388; F0 = 8,639; E0 = 18,211; V0 = 10,022
H4 = 8,803,328; F4 = 2,200,832; E4 = 4,417,936; V4 = 2,217,554
5. Bigguy
Vertex-Point Refinement, timings (ms)
depth:    1           2           3           4           5
CPU_1     0.49        5.03        22.52       90.46       377.15
CPU_2     0.51        2.59        11.61       46.45       188.72
CPU_4     0.34        1.56        6.38        24.54       98.88
CPU_8     0.23        0.91        3.51        13.3        52.52
CPU_16    0.17        0.69        2.63        7.71        29.49
CPU_32    0.12        0.47        1.6         5.96        23.31
Halfedge Refinement, timings (ms)
depth:    1           2           3           4           5
CPU_1     3.94·10^-2  0.57        3.42        15.06       61.44
CPU_2     1.97·10^-2  0.29        1.7         7.53        30.73
CPU_4     1.27·10^-2  0.28        0.87        3.85        15.53
CPU_8     8.85·10^-3  0.17        0.45        1.95        7.99
CPU_16    9.58·10^-3  8.46·10^-2  0.37        1.62        6.74
CPU_32    9.74·10^-3  7.87·10^-2  0.24        0.85        3.52
Bigguy, S0 and S4 (all-quads / no boundaries)
H0 = 5,800; F0 = 1,450; E0 = 2,900; V0 = 1,452
H4 = 1,484,800; F4 = 371,200; E4 = 742,400; V4 = 371,202
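The "all-quads / no boundaries" label can be sanity-checked from the counts alone. The sketch below assumes the usual halfedge convention that every non-boundary edge carries exactly two halfedges, so a mesh without boundaries has H = 2E, while an all-quad mesh has H = 4F. The helper names are hypothetical.

```python
def is_closed_all_quad(h, f, e):
    """True when the counts are consistent with a boundary-free, all-quad mesh."""
    return h == 2 * e and h == 4 * f


def euler_characteristic(v, e, f):
    """Euler characteristic V - E + F of the mesh."""
    return v - e + f


# Bigguy counts as listed above for the input mesh (S0) and depth 4 (S4).
BIGGUY_S0 = {"H": 5800, "F": 1450, "E": 2900, "V": 1452}
BIGGUY_S4 = {"H": 1484800, "F": 371200, "E": 742400, "V": 371202}

if __name__ == "__main__":
    for name, m in (("S0", BIGGUY_S0), ("S4", BIGGUY_S4)):
        closed_quads = is_closed_all_quad(m["H"], m["F"], m["E"])
        chi = euler_characteristic(m["V"], m["E"], m["F"])
        print(f"{name}: closed all-quad = {closed_quads}, Euler characteristic = {chi}")
```

Both levels pass the check and share the same Euler characteristic, as expected since subdivision does not change the topology of the surface.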
6. Monsterfrog
Vertex-Point Refinement, timings (ms)
depth:    1           2           3           4           5
CPU_1     0.43        4.45        20.09       80.89       340.93
CPU_2     0.47        2.29        10.37       41.47       167.83
CPU_4     0.27        1.25        5.4         21.41       87.15
CPU_8     0.16        0.69        3           11.57       45.36
CPU_16    0.13        0.46        1.71        6.48        25.2
CPU_32    9.76·10^-2  0.41        1.46        5.11        20.5
Monsterfrog, S0 and S4 (all-quads / no boundaries)
H0 = 5,168; F0 = 1,292; E0 = 2,584; V0 = 1,308
H4 = 1,323,008; F4 = 330,752; E4 = 661,504; V4 = 330,768
7. Imrod
Vertex-Point Refinement, timings (ms)
depth:    1           2           3           4           5
CPU_1     1.94        18.68       84.41       349.58      1,358.14
CPU_2     1           9.55        43.66       174.02      713.34
CPU_4     0.55        4.97        21.95       89.08       364.67
CPU_8     0.52        2.63        11.75       47.14       198.43
CPU_16    0.35        1.48        6.44        26.03       111.18
CPU_32    0.25        1.23        5.08        21.12       90.89
Halfedge Refinement, timings (ms)
depth:    1           2           3           4           5
CPU_1     0.13        2.06        12.98       55.73       231.75
CPU_2     7.75·10^-2  1.05        6.49        28.04       119.96
CPU_4     3.76·10^-2  0.53        3.25        14.15       59.82
CPU_8     2.15·10^-2  0.29        1.66        7.15        30.48
CPU_16    2.02·10^-2  0.31        0.86        3.67        16.11
CPU_32    1.69·10^-2  0.22        0.71        3.14        16.55
Imrod, S0 and S4 (3,479 non-quads / 223 boundaries)
H0 = 21,399; F0 = 6,202; E0 = 10,811; V0 = 4,630
H4 = 5,478,144; F4 = 1,369,536; E4 = 2,740,856; V4 = 1,371,341
8. T-Rex
Vertex-Point Refinement, timings (ms)
depth:    1           2           3           4           5
CPU_1     3.71        36.62       176.38      714.53      2,942.72
CPU_2     1.95        19.14       87.42       363.6       1,488.1
CPU_4     1.05        9.95        45.24       186.58      753.35
CPU_8     0.61        5.34        23.92       102.96      392.98
CPU_16    0.39        2.94        13.24       60.15       218.72
CPU_32    0.38        2.33        10.6        49.07       175.83
Halfedge Refinement, timings (ms)
depth:    1           2           3           4           5
CPU_1     0.24        4.35        26.67       116.82      497.51
CPU_2     0.16        2.12        13.43       60.01       248.55
CPU_4     7.98·10^-2  1.07        6.68        30.21       125.06
CPU_8     4.59·10^-2  0.55        3.4         15.38       63.53
CPU_16    2.33·10^-2  0.3         2.87        7.98        33.43
CPU_32    1.7·10^-2   0.23        1.48        6.95        34.46
T-Rex, S0 and S4 (468 non-quads / 594 boundaries)
H0 = 45,224; F0 = 11,422; E0 = 22,909; V0 = 11,539
H4