
6.1 Modifying the codebook size of the VQ-APC model

6.1.2 Results

The word segmentation result tables report the number of segments produced, the number of valid words recognized, and three derived metrics. The recognition rate is the percentage of recognized valid words relative to the actual number of words in the input. Over-segmentation is the rate by which the system produces more segments than the actual number of words in the input; negative values indicate under-segmentation. Finally, the tables give the ratio of recognized valid words to the number of segments.
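These three metrics reduce to simple ratios over the raw counts. The sketch below is a hypothetical helper, not the thesis code; the 500 actual words used in the example are not stated here but are inferred from the reported percentages (77 / 0.154 = 500).

```python
def segmentation_metrics(num_segments, recognized_valid, actual_words):
    """Compute the three derived metrics reported in the tables."""
    return {
        # recognized valid words relative to actual words in the input
        "recognition_rate": recognized_valid / actual_words,
        # positive: more segments than actual words (over-segmentation);
        # negative: fewer segments than actual words (under-segmentation)
        "over_segmentation": (num_segments - actual_words) / actual_words,
        # recognized valid words relative to segments produced
        "valid_per_segment": recognized_valid / num_segments,
    }

# Run 1 of Table 7 (actual word count inferred from the percentages):
m = segmentation_metrics(251, 77, 500)
# m["recognition_rate"]  -> 0.154    (15.40%)
# m["over_segmentation"] -> -0.498   (-49.80%)
# m["valid_per_segment"] -> 77/251 = 0.3068... (30.68%)
```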

The reinforcement learning result figures show the number of actions executed in each episode of the learning loop. The value for each episode is the average over the 100 random seeds. It should be noted that for results relating to WordSeg AG, the figures show the average over all five runs.
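As a small illustration of this averaging (with made-up action counts and only four seeds for brevity; the experiments use 100 seeds):

```python
# actions[seed][episode]: number of actions the agent took in each
# episode for one run (illustrative values, four seeds instead of 100)
actions = [
    [120, 90, 80],
    [110, 95, 85],
    [130, 85, 75],
    [100, 90, 80],
]
num_seeds = len(actions)
num_episodes = len(actions[0])

# value plotted for each episode: the mean over the seeds
per_episode = [
    sum(actions[s][e] for s in range(num_seeds)) / num_seeds
    for e in range(num_episodes)
]
# per_episode -> [115.0, 90.0, 80.0]
```

For the WordSeg AG figures, these per-episode curves are additionally averaged over the five runs.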

The results are presented by category of word segmentation algorithm used. First, the results using WordSeg AG are examined. Table 7, Table 8 and Table 9 present the word segmentation results for codebook sizes 128, 256 and 512, respectively.

Table 10 summarizes the results for the three codebook sizes by presenting the average over all five runs. For a detailed breakdown of the quantity of recognized words, refer to Appendix B.

Results                   Run 1     Run 2     Run 3     Run 4     Run 5
Number of segments          251       243       249       249       239
Recognized valid words       77        65        76        72        66
Recognition rate         15.40%    13.00%    15.20%    14.40%    13.20%
Over-segmentation       -49.80%   -51.40%   -50.20%   -50.20%   -52.20%
Valid words / segments   30.68%    26.75%    30.52%    28.92%    27.62%

Table 7: Segmentation results using WordSeg AG and codebook size 128.

Results                   Run 1     Run 2     Run 3     Run 4     Run 5
Number of segments          478       478       486       480       483
Recognized valid words      157       159       165       160       163
Recognition rate         31.40%    31.80%    33.00%    32.00%    32.60%
Over-segmentation        -4.40%    -4.40%    -2.80%    -4.00%    -3.40%
Valid words / segments   32.85%    33.26%    33.95%    33.33%    33.75%

Table 8: Segmentation results using WordSeg AG and codebook size 256.

(a) Result of 50 episodes.

(b) Result of first 15 episodes.

Figure 9: Reinforcement learning results using WordSeg AG for codebook sizes 128, 256, and 512.

Results                   Run 1     Run 2     Run 3     Run 4     Run 5
Number of segments          453       455       453       457       454
Recognized valid words      152       150       152       145       153
Recognition rate         30.40%    30.00%    30.40%    29.00%    30.60%
Over-segmentation        -9.40%    -9.00%    -9.40%    -8.60%    -9.20%
Valid words / segments   33.55%    32.97%    33.55%    31.73%    33.70%

Table 9: Segmentation results using WordSeg AG and codebook size 512.

Results                 Codebook 128   Codebook 256   Codebook 512
Number of segments               247            481            455
Recognized valid words            72            161            151
Recognition rate              14.40%         32.20%         30.20%
Over-segmentation            -50.60%         -3.80%         -9.00%
Valid words / segments        29.15%         33.47%         33.19%

Table 10: Average of segmentation results using WordSeg AG.

As seen from Table 10, using codebook size 256 generates the most segments and achieves the highest recognition rate among the three codebook sizes. This observation is surprising, since it was assumed that size 512 would generate the best segmentation results; however, the results for sizes 256 and 512 are very close. In addition, the system under-segments the combined sound file, and the results are consistent across all five runs for all codebook sizes.

The plots in Figure 9 illustrate how the agent performed during reinforcement learning. Compared with codebook size 128, the agent took 37% fewer actions in the first episode when codebook size 512 was used, and 44% fewer actions when codebook size 256 was used.

Closer inspection of Figure 9b shows that during the initial episodes, the results of sizes 256 and 512 appear to coincide and maintain a small margin from the results of size 128.

Results                 Codebook 128   Codebook 256   Codebook 512
Number of segments               231            424            411
Recognized valid words            75            111            118
Recognition rate              15.00%         22.20%         23.60%
Over-segmentation            -53.80%        -15.20%        -17.80%
Valid words / segments        32.47%         26.18%         28.71%

Table 11: Summary of segmentation results using WordSeg TP.

Next, Table 11 summarizes the word segmentation results when WordSeg TP is used. For a detailed breakdown of the quantity of recognized words, refer to Table 23 in Appendix B.

It is apparent from Table 11 that codebook size 512 achieves the highest recognition rate, while the largest number of segments is obtained with codebook size 256. Again, only a small difference is observed between the results of codebook sizes 256 and 512. With all three sizes, the system under-segments the combined sound file.

Comparing the data from Table 11 and Table 10 shows that recognition rates are lower with WordSeg TP than with WordSeg AG. However, for codebook size 128, there is little difference in recognition rate between the two word segmentation algorithms. Furthermore, all codebook sizes produced fewer segments with WordSeg TP.

The plots in Figure 10 represent the agent's performance when using the word segmentation results under WordSeg TP. In general, the agent took the fewest actions when codebook size 128 was used, and the most when codebook size 256 was used. Interestingly, Figure 10b shows that in the first episode the agent took around 12% more actions with codebook size 128 than with size 512.

Finally, Figure 11 shows all the cases in one plot. Prominent peaks occur at episodes 12, 22, 32, and 42. It can be noted that the agent is initialized the same way each time an episode starts. The peaks may be explained by the interval at which the target network synchronizes with the policy network: the DQN hyperparameters in Table 6 show the target network being updated every 10 episodes, and the peaks seem to occur shortly after these updates.
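This update schedule can be sketched as follows. The snippet is a plain-Python stand-in for the DQN training loop, not the thesis implementation; the variable names and the weight values are illustrative, and only the episode interval (10, per Table 6) is taken from the text.

```python
TARGET_UPDATE_EVERY = 10  # episodes, per the DQN hyperparameters in Table 6

policy_weights = {"w": 0.0}            # stand-in for the policy network
target_weights = dict(policy_weights)  # frozen copy used to compute Q-targets
update_episodes = []

for episode in range(1, 51):
    # ... run the episode, optimizing policy_weights against
    # Q-targets computed with the (stale) target_weights ...
    policy_weights["w"] += 0.1  # stand-in for gradient updates
    if episode % TARGET_UPDATE_EVERY == 0:
        # hard update: synchronize the target network with the policy network
        target_weights = dict(policy_weights)
        update_episodes.append(episode)

# update_episodes -> [10, 20, 30, 40, 50]; the peaks observed in Figure 11
# (episodes 12, 22, 32, 42) follow shortly after each synchronization.
```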

Figure 11 also shows that codebook size 256 with WordSeg AG starts the learning loop with the lowest number of actions in the first episode, while in later episodes the lowest number of actions is taken with codebook size 128 and WordSeg TP.

In summary, hypothesis H1 is found to hold only to some degree. The results show that increasing the codebook size has a positive effect on the recognition rate of the word segmentation results. This is clearly observed when increasing the codebook size from 128 to 256. However, when going from codebook size 256 to 512, the difference in recognition rates is not significant. When using WordSeg TP, size

(a) Result of 50 episodes.

(b) Result of first 15 episodes.

Figure 10: Reinforcement learning results using WordSeg TP for codebook sizes 128, 256, and 512.

512 achieves a slightly better recognition rate, while the opposite is observed when using WordSeg AG.

Nonetheless, hypothesis H1 cannot be justified for the reinforcement learning results. There are instances in the experiments where the lower codebook size achieved better reinforcement learning results. There does seem to be a correlation between the ratio of recognized valid words to the number of segments and performance in the reinforcement learning part: interestingly, cases with higher ratios generally develop lower numbers of actions. It should be remarked that this relationship is only anecdotal and should not be considered generally applicable to other DQN implementations.

(a) Result of 50 episodes.

(b) Result of first 15 episodes.

Figure 11: Comparison of all reinforcement learning results for codebook sizes 128, 256, and 512.