
A.5 Car Evaluation

This section contains the results of the comparison between traditional and flexible structures, each optimized with Bayesian Optimization, on the Car Evaluation dataset. The results are divided into four hyperparameter scenarios: Scenario 1: learning_rate; Scenario 2: max_depth; Scenario 3: learning_rate, max_depth and subsample; and Scenario 4: learning_rate, max_depth, subsample and colsample_bytree.

Of all the scenarios, only the 5-tree flexible structure of Scenario 1 improved prediction performance compared to the 5-tree traditional structure. Its improvement over the 5-tree traditional structure, relative to the improvement gained with the 6-tree traditional structure, was 46.60%. For Scenarios 2, 3 and 4, the relative percentages of improvement were -38.47%, -146.68% and -7.10%, respectively; these scenarios were thus detrimental to prediction performance compared to the traditional approach to ensemble structure optimization. Scenario 1, the only scenario in which the flexible structure improved prediction, was also the scenario with the best flexible-structure Error, 0.015339. The best Error among the traditional structures of 5 trees was, by comparison, 0.014181, obtained in Scenario 3. Thus, the flexible ensemble structures could not surpass the best prediction performance obtained with the traditional structures for this dataset.
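As the tables below show, the relative percentage gain reported throughout this section is the Error improvement of the 5-tree flexible structure over the 5-tree traditional structure, divided by the Error improvement obtained by adding a sixth tree to the traditional structure. Worked out for Scenario 1:

Relative gain = (Error_trad,5 − Error_flex,5) / (Error_trad,5 − Error_trad,6) × 100%
              = (0.017362 − 0.015339) / (0.017362 − 0.013021) × 100%
              = 0.002023 / 0.004341 × 100% ≈ 46.60%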

Regarding the characteristics of the flexible structure, there were some similarities between the scenarios’ optimal hyperparameter values. For instance, learning_rate values were generally quite high in all scenarios, and max_depth was most commonly 6, the highest possible value. Beyond this, however, the differences between the scenarios were considerable, and it was apparent that the hyperparameters influenced each other’s optimal values when optimized together.

A.5.1 Scenario 1

For Scenario 1 of the Car Evaluation dataset, the traditionally optimized structures of 5 and 6 trees are tabulated in Table A.49, while the flexible structure obtained with Holistic optimization is tabulated in Table A.50. The prediction performance comparison between the traditional structures and the flexible structure is tabulated in Table A.51. From the 5-tree traditional structure, the 5-tree flexible structure improved prediction performance equivalent to 46.60% of the improvement achieved with the 6-tree traditional structure.

This demonstrates that the flexible structure was significantly beneficial for prediction performance with the optimization scenario of learning_rate for this dataset.

Regarding the structures’ characteristics, the traditional structures had very similar learning_rate values, both being roughly 0.93. The flexible structure also had quite similar and high learning_rate values, ranging between 0.91 and 1.0.

              5 Trees   6 Trees
Error         0.017362  0.013021
learning_rate 0.9315    0.9270

Table A.49: The Error score and hyperparameter configuration of the traditionally structured ensembles of 5 and 6 trees, based on hyperparameter Scenario 1. The learning_rate was optimized through 1000 iterations of Bayesian Optimization, and evaluated with cross validation on the Car Evaluation dataset.
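As a point of reference, below is a minimal sketch of how such a traditional optimization could be set up, assuming scikit-optimize and the xgboost scikit-learn interface, and assuming the Error to be one minus the mean cross-validated accuracy; the search bounds, iteration count and data handling are placeholders rather than the thesis's actual settings.

# Hypothetical sketch: Bayesian Optimization of one shared learning_rate for a
# traditionally structured ensemble of 5 trees (cf. Table A.49).
from skopt import gp_minimize
from skopt.space import Real
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

def make_objective(X, y, n_trees=5):
    def objective(params):
        (learning_rate,) = params
        model = XGBClassifier(n_estimators=n_trees, learning_rate=learning_rate)
        # Error assumed to be 1 minus the mean cross-validated accuracy.
        return 1.0 - cross_val_score(model, X, y, cv=5).mean()
    return objective

# Bounds and iteration count below are assumptions, not the thesis's settings:
# result = gp_minimize(make_objective(X, y), [Real(0.01, 1.0)], n_calls=1000)
# best_learning_rate, best_error = result.x[0], result.fun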

Error       0.015339
l_r Tree 1  0.9187
l_r Tree 2  0.9777
l_r Tree 3  0.9979
l_r Tree 4  0.9816
l_r Tree 5  0.9489

Table A.50: The Error score and hyperparameter configuration of a flexible ensemble structure of 5 trees, based on hyperparameter Scenario 1. The learning_rate values for each tree were optimized through 2000 iterations of Bayesian Optimization, and evaluated with cross validation on the Car Evaluation dataset.
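To make the notion of a per-tree learning_rate concrete, the sketch below evaluates one candidate flexible configuration by adding one boosting round at a time with xgboost's native API and changing eta between rounds; a "tree" is taken here to mean one boosting round. This is only one plausible way to realize the flexible structure, and the objective, data split and remaining parameters are assumptions rather than the thesis's implementation.

# Hypothetical sketch: misclassification error of a 5-tree ensemble in which
# each tree has its own learning_rate (cf. Table A.50), built round by round.
import numpy as np
import xgboost as xgb

def flexible_error(learning_rates, X_train, y_train, X_valid, y_valid, num_class):
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dvalid = xgb.DMatrix(X_valid, label=y_valid)
    booster = None
    for eta in learning_rates:
        # One boosting round per ensemble position; continuing from the
        # previous booster lets eta differ between rounds.
        params = {"objective": "multi:softmax", "num_class": num_class, "eta": eta}
        booster = xgb.train(params, dtrain, num_boost_round=1, xgb_model=booster)
    preds = booster.predict(dvalid)
    return float(np.mean(preds != np.asarray(y_valid)))

# For the configuration in Table A.50 (Car Evaluation has four classes):
# error = flexible_error([0.9187, 0.9777, 0.9979, 0.9816, 0.9489],
#                        X_train, y_train, X_valid, y_valid, num_class=4)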

Error improvement with added tree          0.004341
Error improvement with flexible structure  0.002023
Relative percentage gain                   46.60%

Table A.51: The Error improvement, compared to the traditional structure of 5 trees, obtained by adding another tree to the traditional structure and by using the 5-tree flexible ensemble structure, based on hyperparameter Scenario 1 on the Car Evaluation dataset, together with the percentage of improvement obtained with the flexible structure relative to that obtained with the added tree.

A.5.2 Scenario 2

For Scenario 2 on the Car Evaluation dataset, the traditionally optimized structures of 5 and 6 trees are tabulated in Table A.52, while the flexible structure obtained with Holistic optimization is tabulated in Table A.53. The prediction performance comparison between the traditional structures and the flexible structure is tabulated in Table A.54. From the 5-tree traditional structure, the 5-tree flexible structure improved prediction performance equivalent to -38.47% of the improvement achieved with the 6-tree traditional structure.

This demonstrates that the flexible structure was not beneficial for prediction performance with the optimization scenario of max_depth for this dataset.

Regarding the structures’ characteristics, the traditional structures were both configured with a max_depth of 6. For the flexible structure, the max_depth value was 4 for Tree 1, 5 for Tree 5, and 6 for the other trees.

Considering that the value range of max_depth is only 1 to 6, the search complexity should be relatively low. It is therefore surprising that Bayesian Optimization was not able to obtain at least equal prediction performance to the traditional structure by finding the equivalent configuration.
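One consideration, though, is that even this small per-tree range yields 6^5 = 7776 possible flexible configurations of 5 trees, of which the 2000 iterations of Bayesian Optimization can visit only a fraction, and the configuration equivalent to the traditional structure (max_depth 6 for every tree) is a single point in that space.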

          5 Trees   6 Trees
Error     0.052659  0.048898
max_depth 6         6

Table A.52: The Error score and hyperparameter configuration of the traditionally structured ensembles of 5 and 6 trees, based on hyperparameter Scenario 2. The max_depth was optimized through 1000 iterations of Bayesian Optimization, and evaluated with cross validation on the Car Evaluation dataset.

Error       0.054106
m_d Tree 1  4
m_d Tree 2  6
m_d Tree 3  6
m_d Tree 4  6
m_d Tree 5  5

Table A.53: The Error score and hyperparameter configuration of a flexible ensemble structure of 5 trees, based on hyperparameter Scenario 2. The max_depth values for each tree were optimized through 2000 iterations of Bayesian Optimization, and evaluated with cross validation on the Car Evaluation dataset.

Error improvement with added tree          0.003761
Error improvement with flexible structure  -0.001447
Relative percentage gain                   -38.47%

Table A.54: The Error improvement, compared to the traditional structure of 5 trees, obtained by adding another tree to the traditional structure and by using the 5-tree flexible ensemble structure, based on hyperparameter Scenario 2 on the Car Evaluation dataset, together with the percentage of improvement obtained with the flexible structure relative to that obtained with the added tree.

A.5.3 Scenario 3

For Scenario 3 on the Car Evaluation dataset, the traditionally optimized structures of 5 and 6 trees are tabulated in Table A.55, while the flexible structure obtained with Holistic optimization is tabulated in Table A.56. The prediction performance comparison between the traditional structures and the flexible structure is tabulated in Table A.57. From the 5-tree traditional structure, the 5-tree flexible structure improved prediction performance equivalent to -146.68% of the improvement achieved with the 6-tree traditional structure.

This demonstrates that the flexible structure was detrimental to prediction performance with the optimization scenario of learning_rate, max_depth and subsample for this dataset.

Regarding the structures’ characteristics, the traditional structures were nearly identical in all hyperparameter values, with learning_rate being roughly 1.0, max_depth being 6, and subsample being roughly 0.99 for both structures. The flexible structure had learning_rate values ranging between 0.75 and 0.95, max_depth values of 6 for all trees, and subsample values ranging between 0.68 and 0.98; however, Tree 5 was the only tree with a subsample value below 0.89.
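One possible contributing factor to the poor result is the size of the search space: with three hyperparameters per tree, the Holistic optimization of the 5-tree flexible structure must cover 15 dimensions rather than the 3 of the traditional structure. A sketch of such a space is given below, assuming scikit-optimize; the bounds are assumptions chosen only to contain the values reported in Tables A.55 and A.56, not the thesis's actual bounds.

# Hypothetical sketch of the 15-dimensional search space for the flexible
# structure of Scenario 3 (learning_rate, max_depth and subsample per tree).
from skopt.space import Real, Integer

space = []
for tree in range(1, 6):
    space += [
        Real(0.01, 1.0, name=f"learning_rate_tree{tree}"),  # bounds assumed
        Integer(1, 6, name=f"max_depth_tree{tree}"),         # range 1 to 6, as in Scenario 2
        Real(0.5, 1.0, name=f"subsample_tree{tree}"),        # bounds assumed
    ]
# The traditional structure of Scenario 3 needs only the first three of these
# dimensions, shared by all five trees.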

              5 Trees   6 Trees
Error         0.014181  0.009841
learning_rate 0.9992    0.9998
max_depth     6         6
subsample     0.9896    0.9872

Table A.55: The Error score and hyperparameter configuration of the traditionally structured ensembles of 5 and 6 trees, based on hyperparameter Scenario 3. Learning_rate, max_depth and subsample were optimized through 1000 iterations of Bayesian Optimization, and evaluated with cross validation on the Car Evaluation dataset.

        learning_rate  max_depth  subsample
Tree 1  0.7554         6          0.9588
Tree 2  0.8824         6          0.8956
Tree 3  0.8270         6          0.9800
Tree 4  0.7974         6          0.9322
Tree 5  0.9475         6          0.6818
Error   0.020547

Table A.56: The Error score and hyperparameter configuration of a flexible ensemble structure of 5 trees, based on hyperparameter Scenario 3. The learning_rate, max_depth and subsample values for each tree were optimized through 2000 iterations of Bayesian Optimization, and evaluated with cross validation on the Car Evaluation dataset.

Error improvement with added tree          0.004340
Error improvement with flexible structure  -0.006366
Relative percentage gain                   -146.68%

Table A.57: The Error improvement, compared to the traditional structure of 5 trees, obtained by adding another tree to the traditional structure and by using the 5-tree flexible ensemble structure, based on hyperparameter Scenario 3 on the Car Evaluation dataset, together with the percentage of improvement obtained with the flexible structure relative to that obtained with the added tree.

A.5.4 Scenario 4

For Scenario 4 on the Car Evaluation dataset, the traditionally optimized structures of 5 and 6 trees are tabulated in Table A.58, while the flexible structure obtained with Holistic optimization is tabulated in Table A.59. The prediction performance comparison between the traditional structures and the flexible structure is tabulated in Table A.60. From the 5-tree traditional structure, the 5-tree flexible structure improved prediction performance equivalent to -7.10% of the improvement achieved with the 6-tree traditional structure. This demonstrates that the flexible structure was slightly detrimental to prediction performance with the optimization scenario of learning_rate, max_depth, subsample and colsample_bytree for this dataset.

Regarding the structures’ characteristics, the traditional structures were quite similar in most hyperparameter values. The learning_rate values were roughly 1.0 and 0.96, the
