
High-Performance Graphics 2021 N. Binder and T. Ritschel (Guest Editors)


A Halfedge Refinement Rule for Parallel Catmull-Clark Subdivision Supplemental Material: CPU Performance Measurements

J. Dupuy and K. Vanhoey Unity Technologies

This document provides exhaustive performance measurements of our CPU implementation. The measurements were performed on an AMD Ryzen Threadripper 3960X CPU with 24 cores. We compiled our program with the options -march=native and -Os. Our program relies on OpenMP to spawn threads, and we report performance measurements for thread counts of 1, 2, 4, 8, 16, and 32. Each number corresponds to the median timing over a set of 50 runs. The goal of this document is to convey the key observation that the performance of our method scales linearly with the number of threads from 1 to 16 threads. For 32 threads we observe smaller speed-ups, because the CPU has only 24 physical cores and therefore cannot run all 32 threads on separate cores.
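The measurement protocol described above (median timing over a set of 50 runs) can be sketched as follows. This is only a minimal Python illustration of the protocol, not the authors' harness, which is a compiled program using OpenMP; the names `median_timing_ms` and `dummy_refinement` and the `clock` parameter are hypothetical.

```python
import statistics
import time


def median_timing_ms(workload, runs=50, clock=time.perf_counter):
    """Run `workload` `runs` times and return the median duration in ms."""
    samples = []
    for _ in range(runs):
        start = clock()
        workload()
        samples.append((clock() - start) * 1e3)  # seconds -> milliseconds
    return statistics.median(samples)


def dummy_refinement():
    # Hypothetical stand-in for one refinement pass of the real program.
    sum(i * i for i in range(10_000))


print(f"median over 50 runs: {median_timing_ms(dummy_refinement):.3f} ms")
```

Reporting the median rather than the mean keeps a few unusually slow runs (for example, caused by background activity) from skewing the reported number.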


1. Rook

Vertex-Point Refinement, timings (ms)

depth:    1         2         3         4         5
CPU_1     0.26      2.58      11.92     47.99     190.99
CPU_2     0.3       1.36      6.13      24.38     97.37
CPU_4     0.18      0.79      3.26      12.74     50.51
CPU_8     0.12      0.48      1.86      6.97      26.93
CPU_16    0.0856    0.31      1.09      3.86      15.05
CPU_32    0.0787    0.32      0.93      3.23      12.52
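As a worked example of the scaling behavior, the sketch below computes speed-up and parallel efficiency from the depth-5 Vertex-Point Refinement timings reported for Rook above. The function names are hypothetical, and the reading of CPU_n as "n threads" is an assumption based on the thread counts listed in the introduction.

```python
# Depth-5 Vertex-Point Refinement timings for Rook (ms), keyed by thread count.
ROOK_VERTEX_POINT_DEPTH5_MS = {
    1: 190.99, 2: 97.37, 4: 50.51, 8: 26.93, 16: 15.05, 32: 12.52,
}


def speedups(timings_ms):
    """Speed-up of each thread count relative to the single-thread timing."""
    base = timings_ms[1]
    return {n: base / t for n, t in timings_ms.items()}


def efficiencies(timings_ms):
    """Parallel efficiency: speed-up divided by the number of threads."""
    return {n: s / n for n, s in speedups(timings_ms).items()}


eff = efficiencies(ROOK_VERTEX_POINT_DEPTH5_MS)
for n, s in speedups(ROOK_VERTEX_POINT_DEPTH5_MS).items():
    print(f"{n:2d} threads: speed-up {s:5.2f}, efficiency {eff[n]:4.2f}")
```

For this data set the speed-up is about 12.7 with 16 threads and about 15.3 with 32 threads, so doubling the thread count beyond 16 brings only a small additional gain.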

Halfedge Refinement, timings (ms)

depth:    1         2         3         4         5
CPU_1     0.0102    0.3       1.81      8         31.64
CPU_2     0.0116    0.29      0.91      4         16.31
CPU_4     0.00724   0.16      0.47      2.02      8.25
CPU_8     0.00666   0.0913    0.24      1.05      4.19
CPU_16    0.0066    0.05      0.28      0.71      2.18
CPU_32    0.00911   0.0437    0.12      0.49      1.86

Crease Refinement, timings (ms)

depth:    1         2         3         4         5
CPU_1     0.00376   0.0895    0.26      0.59      1.23
CPU_2     0.00526   0.0904    0.26      0.57      0.64
CPU_4     0.00423   0.0569    0.16      0.34      0.39
CPU_8     0.00426   0.0331    0.0934    0.18      0.36
CPU_16    0.00565   0.0282    0.0564    0.11      0.23
CPU_32    0.00843   0.0258    0.0496    0.0894    0.14

Rook (44 non-quads / 24 boundaries, 280 creases)

S0: H0 = 3,064; F0 = 777; E0 = 1,544; V0 = 768

S4: H4 = 784,384; F4 = 196,096; E4 = 392,384; V4 = 196,289
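The S0 and S4 counts above are consistent with the standard counting rules of Catmull-Clark subdivision: each step turns every halfedge into four halfedges and one quad face, splits every edge in two while adding one new edge per halfedge, and adds one vertex per face and per edge. The sketch below applies these rules four times to the Rook counts; the function name `refine_counts` is hypothetical.

```python
def refine_counts(h, f, e, v, depth):
    """Apply the Catmull-Clark counting rules `depth` times.

    h, f, e, v are the halfedge, face, edge, and vertex counts.
    """
    for _ in range(depth):
        # Right-hand sides use the counts of the previous level.
        h, f, e, v = 4 * h, h, 2 * e + h, v + f + e
    return h, f, e, v


rook_s0 = (3064, 777, 1544, 768)  # H0, F0, E0, V0 as listed for Rook
print(refine_counts(*rook_s0, depth=4))
# prints (784384, 196096, 392384, 196289), matching H4, F4, E4, V4
```

The halfedge count grows by a factor of four per subdivision level, which gives a sense of how quickly the workload behind the timings grows with depth.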


2. Bishop

Vertex-Point Refinement, timings (ms)

depth:    1         2         3         4         5
CPU_1     0.33      3.24      14.52     56.59     238.57
CPU_2     0.36      1.67      7.44      29.61     119.7
CPU_4     0.19      0.91      3.92      15.38     61.33
CPU_8     0.12      0.53      2.11      8.3       32.49
CPU_16    0.0695    0.41      1.24      4.63      18.08
CPU_32    0.0751    0.29      1.05      3.74      14.76

Halfedge Refinement, timings (ms)

depth:    1         2         3         4         5
CPU_1     0.0123    0.38      2.21      9.42      39.65
CPU_2     0.0139    0.19      1.11      4.87      19.87
CPU_4     0.0082    0.21      0.57      2.48      10.03
CPU_8     0.00684   0.11      0.3       1.26      5.12
CPU_16    0.00681   0.0799    0.33      1.05      4.39
CPU_32    0.00962   0.0513    0.2       0.56      2.26

Crease Refinement, timings (ms)

depth:    1         2         3         4         5
CPU_1     0.0045    0.16      0.31      0.7       1.51
CPU_2     0.00614   0.11      0.32      0.37      0.76
CPU_4     0.00435   0.0602    0.19      0.38      0.41
CPU_8     0.00472   0.0389    0.0996    0.24      0.47
CPU_16    0.00565   0.0289    0.0645    0.16      0.26
CPU_32    0.00915   0.0254    0.0541    0.11      0.18

Bishop (132 non-quads / 24 boundaries, 224 creases)

S0: H0 = 3,740; F0 = 968; E0 = 1,882; V0 = 917

S4: H4 = 957,440; F4 = 239,360; E4 = 478,912; V4 = 239,555


3. Car

Vertex-Point Refinement, timings (ms)

depth:    1         2         3         4         5
CPU_1     0.54      5.39      24.11     96.97     413.57
CPU_2     0.29      2.79      12.45     49.81     207.58
CPU_4     0.31      1.48      6.55      26.08     104.97
CPU_8     0.22      0.91      3.62      13.98     56.1
CPU_16    0.14      0.59      2.1       7.91      30.79
CPU_32    0.12      0.44      1.68      6.33      25.2

Halfedge Refinement, timings (ms)

depth:    1         2         3         4         5
CPU_1     0.0317    0.61      3.68      16.18     69.46
CPU_2     0.0256    0.32      1.85      8.21      33.44
CPU_4     0.0125    0.31      0.94      4.13      16.86
CPU_8     0.00862   0.18      0.49      2.12      8.59
CPU_16    0.00839   0.0982    0.26      1.09      4.39
CPU_32    0.00969   0.0731    0.31      0.92      3.78

Crease Refinement, timings (ms)

depth:    1         2         3         4         5
CPU_1     0.0118    0.18      0.52      1.2       2.54
CPU_2     0.00996   0.21      0.28      0.63      1.29
CPU_4     0.00629   0.11      0.28      0.35      0.68
CPU_8     0.005     0.0609    0.19      0.39      0.39
CPU_16    0.00623   0.0393    0.1       0.22      0.41
CPU_32    0.0084    0.0332    0.0787    0.14      0.29

Car (all-quads / 60 boundaries, 314 creases)

S0: H0 = 6,300; F0 = 1,575; E0 = 3,180; V0 = 1,642

S4: H4 = 1,612,800; F4 = 403,200; E4 = 806,880; V4 = 403,717


4. ArmorGuy

Vertex-Point Refinement, timings (ms)

depth:    1         2         3         4         5
CPU_1     2.95      28.5      134.78    561.93    2,294.11
CPU_2     1.57      14.79     67.44     283.99    1,162.7
CPU_4     0.8       7.62      35.01     143.33    595.87
CPU_8     0.45      4.08      18.55     76.1      314.03
CPU_16    0.36      2.3       10.36     42.21     174.42
CPU_32    0.28      1.91      8.28      34.35     142.76

Halfedge Refinement, timings (ms)

depth:    1         2         3         4         5
CPU_1     0.2       3.37      20.61     89.52     386.84
CPU_2     0.0638    1.72      10.39     45.62     193.1
CPU_4     0.0333    0.85      5.19      22.56     97.39
CPU_8     0.0194    0.43      2.65      11.54     49.48
CPU_16    0.0314    0.33      1.76      5.95      26.3
CPU_32    0.0232    0.19      1.13      5.12      27.45

Crease Refinement, timings (ms)

depth:    1         2         3         4         5
CPU_1     0.11      1.13      3.14      7.04      14.15
CPU_2     0.0337    0.55      1.54      3.45      7.44
CPU_4     0.0152    0.29      0.81      1.79      3.72
CPU_8     0.00973   0.32      0.44      1.19      1.95
CPU_16    0.0145    0.17      0.5       0.62      1.05
CPU_32    0.0129    0.1       0.26      0.4       0.74

ArmorGuy (300 non-quads / 2,034 boundaries, 7,101 creases)

S0: H0 = 34,388; F0 = 8,639; E0 = 18,211; V0 = 10,022

S4: H4 = 8,803,328; F4 = 2,200,832; E4 = 4,417,936; V4 = 2,217,554


5. Bigguy

Vertex-Point Refinement, timings (ms)

depth:    1         2         3         4         5
CPU_1     0.49      5.03      22.52     90.46     377.15
CPU_2     0.51      2.59      11.61     46.45     188.72
CPU_4     0.34      1.56      6.38      24.54     98.88
CPU_8     0.23      0.91      3.51      13.3      52.52
CPU_16    0.17      0.69      2.63      7.71      29.49
CPU_32    0.12      0.47      1.6       5.96      23.31

Halfedge Refinement, timings (ms)

depth:    1         2         3         4         5
CPU_1     0.0394    0.57      3.42      15.06     61.44
CPU_2     0.0197    0.29      1.7       7.53      30.73
CPU_4     0.0127    0.28      0.87      3.85      15.53
CPU_8     0.00885   0.17      0.45      1.95      7.99
CPU_16    0.00958   0.0846    0.37      1.62      6.74
CPU_32    0.00974   0.0787    0.24      0.85      3.52

Bigguy (all-quads / no boundaries)

S0: H0 = 5,800; F0 = 1,450; E0 = 2,900; V0 = 1,452

S4: H4 = 1,484,800; F4 = 371,200; E4 = 742,400; V4 = 371,202


6. Monsterfrog

Vertex-Point Refinement, timings (ms)

depth:    1         2         3         4         5
CPU_1     0.43      4.45      20.09     80.89     340.93
CPU_2     0.47      2.29      10.37     41.47     167.83
CPU_4     0.27      1.25      5.4       21.41     87.15
CPU_8     0.16      0.69      3         11.57     45.36
CPU_16    0.13      0.46      1.71      6.48      25.2
CPU_32    0.0976    0.41      1.46      5.11      20.5

Vertex-Point Refinement (second panel), timings (ms)

depth:    1         2         3         4         5
CPU_1     0.49      5.03      22.52     90.46     377.15
CPU_2     0.51      2.59      11.61     46.45     188.72
CPU_4     0.34      1.56      6.38      24.54     98.88
CPU_8     0.23      0.91      3.51      13.3      52.52
CPU_16    0.17      0.69      2.63      7.71      29.49
CPU_32    0.12      0.47      1.6       5.96      23.31

Monsterfrog (all-quads / no boundaries)

S0: H0 = 5,168; F0 = 1,292; E0 = 2,584; V0 = 1,308

S4: H4 = 1,323,008; F4 = 330,752; E4 = 661,504; V4 = 330,768


7. Imrod

Vertex-Point Refinement, timings (ms)

depth:    1         2         3         4         5
CPU_1     1.94      18.68     84.41     349.58    1,358.14
CPU_2     1         9.55      43.66     174.02    713.34
CPU_4     0.55      4.97      21.95     89.08     364.67
CPU_8     0.52      2.63      11.75     47.14     198.43
CPU_16    0.35      1.48      6.44      26.03     111.18
CPU_32    0.25      1.23      5.08      21.12     90.89

Halfedge Refinement, timings (ms)

depth:    1         2         3         4         5
CPU_1     0.13      2.06      12.98     55.73     231.75
CPU_2     0.0775    1.05      6.49      28.04     119.96
CPU_4     0.0376    0.53      3.25      14.15     59.82
CPU_8     0.0215    0.29      1.66      7.15      30.48
CPU_16    0.0202    0.31      0.86      3.67      16.11
CPU_32    0.0169    0.22      0.71      3.14      16.55

Imrod (3,479 non-quads / 223 boundaries)

S0: H0 = 21,399; F0 = 6,202; E0 = 10,811; V0 = 4,630

S4: H4 = 5,478,144; F4 = 1,369,536; E4 = 2,740,856; V4 = 1,371,341


8. T-Rex

Vertex-Point Refinement, timings (ms)

depth:    1         2         3         4         5
CPU_1     3.71      36.62     176.38    714.53    2,942.72
CPU_2     1.95      19.14     87.42     363.6     1,488.1
CPU_4     1.05      9.95      45.24     186.58    753.35
CPU_8     0.61      5.34      23.92     102.96    392.98
CPU_16    0.39      2.94      13.24     60.15     218.72
CPU_32    0.38      2.33      10.6      49.07     175.83

Halfedge Refinement, timings (ms)

depth:    1         2         3         4         5
CPU_1     0.24      4.35      26.67     116.82    497.51
CPU_2     0.16      2.12      13.43     60.01     248.55
CPU_4     0.0798    1.07      6.68      30.21     125.06
CPU_8     0.0459    0.55      3.4       15.38     63.53
CPU_16    0.0233    0.3       2.87      7.98      33.43
CPU_32    0.017     0.23      1.48      6.95      34.46

T-Rex (468 non-quads / 594 boundaries)

S0: H0 = 45,224; F0 = 11,422; E0 = 22,909; V0 = 11,539

S4: H4 = 11,577,344; F4 = 2,894,336; E4 = 5,793,424; V4 = 2,899,140
