
2. Speech corpus and manipulation methods

2.2 Manipulation methods

2.2.1 Duration manipulation

This section describes the method for manipulating the durations of the N2 utterances. For each sentence, the durations of the phonemes in the N2 utterances were manipulated so that they matched the durations of the phonemes in the N1 utterance. For this purpose, it was first necessary to segment and measure the duration of each phoneme in the N2 utterance and each phoneme in the corresponding N1 utterance.

2.2.1.1 Segmentation

Segmentation was guided by visual impressions from waveforms and spectrograms, coupled with the author’s auditory impression. Segmentation was easier when the consonants were articulated with full closure (plosives, nasals, laterals and to a certain extent taps) or friction (fricatives) than when the articulation was approximantic, especially when the formant structure showed smooth transitions rather than abrupt changes. In these cases it was necessary to rely more heavily on auditory impressions. In order to determine the boundaries between vowels and plosives it was necessary to decide how to treat portions of aspiration.

Post-aspiration (following the plosive and preceding the vowel) was treated as a separate segment, whereas pre-aspiration (following the vowel and preceding the plosive) was treated as part of the vowel. This approach was chosen because post-aspiration is a feature that occurs regularly across dialects, whereas pre-aspiration occurs only in particular dialects and is often facultative³ (Helgason, Stölten & Engstrand, 2003). Vowels at the very end of utterances were left unadjusted because it was impossible to decide where the vowel ended and the exhalation started.

2.2.1.2 Manipulation

As previously described, the segmentation of each phoneme in the N1 and N2 utterances of the same sentence provided each phoneme’s duration. The following explains how the N2 utterance’s phoneme durations were adjusted to match the corresponding N1 utterance’s phoneme durations. The duration of each N1 phoneme was divided by the duration of the corresponding N2 phoneme. This yielded a scaling factor by which the duration of the N2 phoneme was multiplied. The result was that each N2 phoneme ultimately matched the duration of its corresponding N1 phoneme. The following illustration shows an excised word from an N2 utterance in its original form and in the duration-manipulated version.

³ Traditionally, pre-aspiration has been believed to occur in only a few Norwegian dialects, but research shows that it may be much more common than previously assumed. The view of pre-aspiration in Norwegian is also changing as a result of recent investigations into its linguistic function (van Dommelen, 1998; van Dommelen, 2000).

Figure 2.1: The word “kjørte” (= drove) as spoken by a Russian N2 speaker. Original N2 utterance above and duration-manipulated N2 utterance below. In the Southeast pronunciation, the sequence “kj” is realized as a palatal [ ç ] and the sequence “rt” as a retroflex [ ʈ ].

The example shows that there are durational differences between the original N2 utterance and the duration-manipulated N2 utterance. The most prominent difference is that the ratio between the vowel [ ø ] and the following plosive [ ʈ ] has been altered. In the N2 original, the VC ratio is positive (i.e. C longer than V) whereas it is negative (i.e. V longer than C) in the manipulated version. The VC ratio is important in Norwegian because the language has a phonological opposition between long and short vowels. This opposition is realized as a durational trade-off between the vowel and the subsequent consonant. There are many Norwegian word pairs that differ only in the VC ratio. For instance, the (main) difference between the words “sette” (= to put) and “sete” (= seat) is that the former is pronounced with a VC: syllable (short vowel and long consonant) and the latter with a V:C syllable (long vowel and short consonant). The word “kjørte” does not turn into a different word if its VC: syllable is instead pronounced as a V:C syllable, but its pronunciation nevertheless becomes foreign-accented.

[Figure 2.1 annotations: segment labels ç ø ʈ ʰ ə; time scale bar 0.1 sec]
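The arithmetic of the scaling step in Section 2.2.1.2 can be sketched as follows. This is only an illustration under stated assumptions: the function names and the duration values are hypothetical and are not measurements from this study, and the sketch covers only the calculation of the factors, not the modification of the speech signal itself. The factor is taken here as the N1 duration divided by the N2 duration, since that is the ratio that reproduces the N1 duration when the N2 duration is multiplied by it.

```python
def scaling_factors(n2_durations, n1_durations):
    """Return one factor per phoneme: N1 duration divided by N2 duration.

    Both lists are assumed to be aligned one-to-one (same phonemes,
    same order) and to hold durations in milliseconds.
    """
    if len(n2_durations) != len(n1_durations):
        raise ValueError("segment sequences must be aligned one-to-one")
    return [n1 / n2 for n1, n2 in zip(n1_durations, n2_durations)]


def apply_factors(n2_durations, factors):
    """Multiply each N2 duration by its factor to get the target duration."""
    return [d * f for d, f in zip(n2_durations, factors)]


# Hypothetical durations in ms (illustrative only, not data from the study).
n2 = [80.0, 60.0, 140.0]
n1 = [70.0, 120.0, 90.0]

factors = scaling_factors(n2, n1)
manipulated = apply_factors(n2, factors)
```

A factor below 1 shortens the N2 segment and a factor above 1 lengthens it; after scaling, each value in `manipulated` equals the corresponding N1 duration up to floating-point rounding.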

2.2.1.3 Problems

The previous section explained that the duration manipulation was performed by changing each N2 phoneme’s duration so that it matched the corresponding N1 phoneme’s duration. This procedure posed difficulties for several reasons.

In some cases the segment inventories were not identical across the N2 and N1 utterances. One reason for this was epenthesis (the insertion of sounds) in the N2 utterances. Epenthesis is typically a strategy that non-native speakers use when coping with a phonotactic pattern different from that found in their L1 (Husby & Kløve, 1998). Epentheses were left unaltered as a rule, but if the insertion made the duration-manipulated utterance sound unnatural (which could happen if the surrounding segments were considerably shortened), the insertion was shortened just enough to restore the naturalness of the utterance.

A second discrepancy between the N2 and N1 utterances was that phonemes found in the N1 utterance were sometimes not realized in the N2 utterance. For instance, the word “bilen” (= the car) was sometimes pronounced without the final nasal. Such deletions did not affect the manipulation procedure. In the example with the word “bilen”, the / e / would then simply be manipulated to match the duration of the corresponding N1 / e /.

A third discrepancy was that the N2 utterances sometimes contained pauses. Pauses were left unaltered except in a few cases where the duration manipulation made the pause sound unnatural in the modified surroundings. In these rare cases the pauses were shortened just enough to remove the unnatural effect.

The reason why epentheses, deletions and pauses were left unadjusted (as a rule) was that the focus of this investigation was on the durational pattern of the segments found in the utterances. Therefore, while the experimenter recognizes the potentially interesting contributions of epentheses, deletions and pauses to the perception of non-native speech, disfluencies of this kind lie outside the scope of this investigation.
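The default treatment of epentheses, deletions and pauses can be sketched as a rule over aligned segments. This is a minimal sketch under assumptions: the tuple layout, the function name, the `<pause>` label and all durations are hypothetical. It models only the default rule (unmatched N2 material is left unaltered, i.e. given a factor of 1.0); the occasional shortening of an insertion or pause to restore naturalness was a perceptual judgment and is not modelled.

```python
from typing import List, Optional, Tuple

# (label, N2 duration in ms or None, N1 duration in ms or None)
Segment = Tuple[str, Optional[float], Optional[float]]


def factors_with_gaps(aligned: List[Segment]) -> List[Tuple[str, float]]:
    """Return (label, factor) for every segment present in the N2 utterance."""
    result = []
    for label, n2_ms, n1_ms in aligned:
        if n2_ms is None:
            # Deletion: the segment exists only in N1, so there is
            # nothing in the N2 signal to manipulate.
            continue
        if n1_ms is None:
            # Epenthesis or pause: exists only in N2, left unaltered.
            result.append((label, 1.0))
        else:
            # Matched segment: scale N2 to the N1 duration.
            result.append((label, n1_ms / n2_ms))
    return result


# Hypothetical alignment for "bilen" with a deleted final nasal,
# followed by a pause found only in the N2 utterance.
aligned = [
    ("b", 60.0, 50.0),
    ("i", 90.0, 135.0),
    ("l", 70.0, 56.0),
    ("e", 80.0, 60.0),
    ("n", None, 45.0),
    ("<pause>", 200.0, None),
]

result = factors_with_gaps(aligned)
```

In this illustration the deleted nasal simply drops out of the result, the / e / is scaled against its N1 counterpart, and the pause keeps its original duration.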

In addition to problems arising from discrepancies between the N2 and N1 realizations of the same sentence, there were also some inherently problematic issues regarding the type of duration manipulation itself. Firstly, the manipulation affected not only the internal durational organization of the utterance, but also the duration of the entire utterance. This is because the duration of an utterance equals the sum of the durations of its phonemes. For instance, if most of the phonemes in an utterance were shortened, the whole utterance was made shorter. This affects the impression of speaking rate. (The effects of altered speaking rate will be investigated.) Secondly, the manipulation of duration inadvertently also affected the intonation. This is because when the duration of a certain portion of the signal was altered, the steepness of the intonation slope was also changed. The three illustrations below show how the slope of the intonation contour changes when a segment is lengthened and shortened.

Fig. 2.2: Original segment duration.

Fig. 2.3: Lengthened segment duration.

Fig. 2.4: Shortened segment duration.

Figure 2.2 shows a segment of 10 ms duration and an intonation contour that rises from 210 to 230 Hz. When the slope is calculated as the difference in Hertz divided by the difference in milliseconds, the slope is 2 Hz/ms (20 Hz over 10 ms). In Figure 2.3 the segment has been lengthened to 15 ms. The intonation still rises from 210 to 230 Hz, but the slope is now clearly less steep, only 1.33 Hz/ms. In Figure 2.4 the segment has instead been shortened. The intonation contour, which still rises from 210 to 230 Hz, now has a steeper slope of 4 Hz/ms, which corresponds to a segment duration of 5 ms.

In other words, if a portion of the signal was shortened, the intonation slope of this portion automatically became steeper; conversely, when a portion was lengthened, the intonation slope became less steep. However, the duration manipulation affected the intonation slopes only to a very moderate degree and was regarded as having a negligible effect, because no effect could be detected when the author, a trained phonetician, listened carefully to the stimuli.
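The two side effects described in this section can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the phoneme durations in the first part are invented, and the 5 ms duration of the shortened segment is an assumption inferred from the stated slope of 4 Hz/ms. With the slope defined as the rise in Hz divided by the duration in ms, a rise of 20 Hz over 10 ms gives 2 Hz/ms.

```python
def utterance_duration(durations_ms):
    """Utterance duration as the sum of its phoneme durations (ms)."""
    return sum(durations_ms)


def slope_hz_per_ms(f0_start_hz, f0_end_hz, duration_ms):
    """Intonation slope: F0 change in Hz divided by duration in ms."""
    return (f0_end_hz - f0_start_hz) / duration_ms


# Side effect 1: shortening most phonemes shortens the whole utterance.
# Hypothetical phoneme durations in ms (not data from the study).
before = [80.0, 60.0, 140.0]
after = [70.0, 50.0, 120.0]
total_before = utterance_duration(before)  # 280.0 ms
total_after = utterance_duration(after)    # 240.0 ms

# Side effect 2: the same 210 -> 230 Hz rise over different durations.
original = slope_hz_per_ms(210.0, 230.0, 10.0)    # 2.0 Hz/ms
lengthened = slope_hz_per_ms(210.0, 230.0, 15.0)  # about 1.33 Hz/ms
shortened = slope_hz_per_ms(210.0, 230.0, 5.0)    # 4.0 Hz/ms (5 ms assumed)
```

Because the F0 rise is held constant, the slope is inversely proportional to the segment duration: halving the duration doubles the slope.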