Simplified spellings on Twitter:
A study of non-standard spellings in the UK and the US
Mie Søhagen Berggaard
ENG4790 – Master's Thesis in English, Secondary Teacher Training
30 credits
Department of Literature, Area Studies and European Languages
Faculty of Humanities
Spring 2020
Abstract
This study explores the use of simplified spellings on Twitter, with the purpose of analyzing and comparing the use of non-standard spelling between English speakers in the UK and the US. The study further discusses how the results of this comparison fit with the historical evolution of spelling within each country, as well as the general attitudes toward non-standard spellings in the UK and the US. The study is conducted through a quantitative analysis, where a number of words in their standard and non-standard forms are extracted from two datasets: one containing UK-located tweets and the other containing US-located tweets. The study was motivated by the lack of existing research that compares the use of non-standard spelling in computer-mediated communication (CMC) between English speakers in different countries. The analysis examines the non-standard spellings by comparing them with their standard forms, as well as comparing the findings from the two countries with each other.
The analyzed words are divided into four main features, which are referred to as ‘abbreviations’, ‘acronyms’, ‘non-standard contractions’ and ‘reduplicated letters’. The results of the analysis indicate that the use of non-standard spelling on Twitter is fairly similar in both countries, albeit with a number of outliers. Overall, it appears that non-standard spellings are slightly more frequent in the US than the UK, which, according to the theory presented in the study, fits reasonably well with the historical evolution of English spelling and the perceived attitudes among English speakers in the two countries. The results also indicate that the most frequently used non-standard spellings in both countries are acronyms, followed by non-standard contractions, abbreviations and reduplicated letters. As the results only describe the frequency of non-standard spellings recorded in each dataset, there are many opportunities for future research with a more in-depth analysis, providing a more detailed investigation of the results.
Acknowledgements
First and foremost, I wish to express my deepest gratitude to my supervisor, Gjertrud Flermoen Stenbrenden, whose help and support have been invaluable in the process of writing this thesis.
I would also like to thank everyone who has supported me throughout my academic studies, including my lecturers at UV and ILOS at the University of Oslo.
Finally, and most importantly, I wish to thank my family and friends, who have always been by my side with love and encouragement. This thesis would not have been possible without their constant support.
Contents
1 Introduction and research questions
1.1 Introduction
1.2 Research questions
2 Theoretical background
2.1 History of the British and American writing systems
2.2 History of non-standard spelling in CMC
2.3 Attitudes toward non-standard spelling in the US and the UK
3 Methodology
3.1 Twitter
3.2 Quantitative research
3.3 Datasets
4 Results
4.1 Abbreviations
4.2 Acronyms
4.3 Non-standard contractions
4.4 Reduplicated letters
5 Discussion
5.1 Common simplified spellings on Twitter compared
5.2 The evolution of spelling in the UK and the US
6 Conclusion
References
Tables and figures
Table 1. you, are, your
Figure 1. u, r, ur
Table 2. (al)though, through, enough
Figure 2. (al)tho, thru, enuf(f)
Table 3. tonight, tomorrow, before
Table 4. because, with, without
Table 5. what, thanks, sorry, really, seriously, people, message, text
Table 6. acronyms
Table 7. non-standard contractions
Figure 3. gonna, gunna, wanna, gotta, kinda, sorta
Figure 4. dont, dnt, cant, cnt, im, hes, shes
Table 8. reduplicated letters
1 Introduction and research questions
1.1 Introduction
The English writing system is often called ‘irregular’ and ‘chaotic’ on account of its complex spelling. The writer Jerome K. Jerome described English spelling as a “disguise for pronunciation” (Upward & Davidson 2011: 1), while the Danish linguist Otto Jespersen called it “a pseudo-historical and anti-educational abomination” (Upward & Davidson 2011: 1). Because of this perceived chaos, many people have advocated a simplification of English spelling. One example is the English professor and Member of the British Parliament Mont Follick, who twice introduced a simplification bill to Parliament, calling the current English writing system “a chaotic concoction of oddities without order or cohesion” (Upward & Davidson 2011: 1).
The irregularity of English spelling can, for example, be seen in the many different pronunciations of -ough. This letter sequence is common in English, and its pronunciation varies from word to word. There are at least eight possible pronunciations of the sequence in American English, and nine in British English (Ough (tetragraph) 2020). Examples are the words through, thorough, though, plough, cough and rough, where the -ough sequence has its own distinct pronunciation in each case (Upward & Davidson 2011: ix). These variations in pronunciation are prime examples of the lack of correspondence between English spelling and pronunciation.
Throughout history, there have been several attempts to simplify English orthography by making the spelling reflect the pronunciation to a much larger extent than the current English writing system does. While a small number of simplifications have gained enough traction to affect English spelling, the majority of these attempts were unsuccessful (Cook & Ryan 2016: 267).
English spelling has long been known for its absence of change. Ever since the standardization of the English writing system in the early Modern English period (c. 1500-1700), there have been few spelling changes, and this is one of the major reasons why English spelling is considered to be so chaotic – it simply does not reflect the changes in pronunciation that have taken place since the early Modern English period (Cook & Ryan 2016: 1). It can thus be said that English has one of the least transparent writing systems in the world. Cook and Ryan (2016: 7) describe the transparency of a writing system as the number of “one-to-one correspondences between letters and sounds”. The more consistently one letter corresponds to one particular sound, and the more consistently one sound corresponds to one particular letter, the more transparent the writing system. Cook and Ryan (2016: 145) further attribute the lack of transparency in English spelling partly to the lack of change in spelling over the last four centuries. While the writing system has been fairly constant, the English language has gone through many sound changes, which have not been reflected in the standard spellings. The consequence is that English spelling has gradually become less transparent, and the writing system continues to move away from transparency. This raises the question: will this development ever change?
The ever-rising popularity of the Internet and computer-mediated communication (CMC) is said to be having a big impact on English spelling. With this new mode of communication, much of today’s writing is no longer monitored by printers and publishers, which leaves much more room for each individual to freely use any creative and vernacular spelling of their choosing (Cook & Ryan 2016: 1). This has made it possible for people to set aside the standard spelling they learned in school and instead write in ways they consider ‘simpler’, more ‘effortless’ and closer to pronunciation. For the first time since the standardization of English spelling, alternative spellings have become widespread and acceptable. So far, however, this acceptance is limited to informal writing through CMC, such as text messages, chat rooms and social media. Whether these spelling variants will eventually find their way into the written standards remains to be seen (Cook & Ryan 2016: 139).
1.2 Research questions
In this thesis, I will research the use of non-standard English spelling on the CMC platform Twitter. There is no doubt that the Internet and CMC are significant parts of our daily lives. Around the world, there are an estimated 5.1 billion mobile phone users (67% of the world’s population), 4.3 billion Internet users (57%) and 3.4 billion active social media users (45%). The largest percentage of Internet users by total population is found in North America and Western/Northern Europe, with approx. 95% of the population regularly using the Internet (Bullock 2019). This thesis will focus on two of the English-speaking countries where the Internet and CMC are most frequently used: the United Kingdom and the United States of America.
RQ1: What are some simplified spellings widely used on Twitter in the UK and the US?
This research question aims to find out which types of non-standard spellings are most frequently found on Twitter in the two respective countries. Types of non-standard spellings include:
Abbreviations: Reduced number of letters, such as you → u, through → thru, because → cuz, thanks → thx, people → ppl etc.
Acronyms: Initialized words, such as to be honest → tbh, by the way → btw, I love you → ily etc.
Non-standard contractions: Alternative spellings, such as going to → gonna, kind of → kinda, can’t → cant, he’s → hes etc.
Reduplicated letters: Repeated letters, such as so → sooooo, haha → haaahaa, yes → yeeeeeees etc.
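The four feature types listed above can be illustrated with a minimal Python sketch. It is not the extraction procedure used in this thesis: the lookup tables contain only the examples given in the list, the function names classify and count_features are hypothetical, and the threshold of three identical letters in a row is an assumption made here so that standard words such as too are not flagged.

```python
import re
from collections import Counter

# Illustrative lookup tables built only from the examples listed above.
# They are NOT the word lists used in the analysis.
ABBREVIATIONS = {"u", "thru", "cuz", "thx", "ppl"}
ACRONYMS = {"tbh", "btw", "ily"}
CONTRACTIONS = {"gonna", "kinda", "cant", "hes"}

# Assumption: three or more identical letters in a row count as reduplication,
# so that standard words such as "too" or "see" are not flagged.
REDUPLICATION = re.compile(r"([a-z])\1{2,}")


def classify(token):
    """Return the feature type of a token, or None if no feature is detected."""
    word = token.lower()
    if word in ABBREVIATIONS:
        return "abbreviation"
    if word in ACRONYMS:
        return "acronym"
    if word in CONTRACTIONS:
        return "non-standard contraction"
    if REDUPLICATION.search(word):
        return "reduplicated letters"
    return None


def count_features(tokens):
    """Count how often each feature type occurs in a list of tokens."""
    return Counter(label for label in map(classify, tokens) if label is not None)
```

Calling count_features on a tokenized tweet returns one count per feature type found in it. The sketch ignores context, so a form such as cant is always counted as a contraction even in the rare cases where it is a standard word; a fuller treatment would need to handle such ambiguity.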
RQ2: How do the trends in the two countries compare?
The features of non-standard spelling in CMC have been thoroughly researched for several decades, but I have yet to come across any studies that compare spelling in CMC in different English-speaking countries. I therefore aim to find out to what extent the use of these features differs between English speakers in the UK and the US. Since the comparison will be made by using two geolocated datasets, the study will compare the Englishes found in the UK and the US rather than native speakers of British and American English.
RQ3: What does the result say about the evolution of spelling in each country?
Since Johnson’s standardization of English in the eighteenth century, there has been little orthographic change in the British standard written language. The American standard, on the other hand, was introduced as a simplified and more transparent version of the original British written standard, and became widely used in the nineteenth century. Many linguists, such as Egersten (1972) or Kövecses (2000), have described American spelling as a simplified version of British spelling. It can therefore be assumed that British people are more conservative in their use of English spelling, while Americans are more prone to use simplifications. Does the result of this analysis corroborate this notion? How does the result fit with the general attitudes and the historical evolution of spelling within each country?
2 Theoretical background
2.1 History of the British and American writing systems
The English writing system has a rich and variable history. During the Middle English period, English spelling went through considerable changes, as the rapidly changing English language was left without a standard spelling. This was a consequence of the Norman Conquest in 1066, which resulted in French being made the official language of England, causing the Old English written standard to gradually collapse and lose its status (Upward & Davidson 2011: 68). Without a written standard, English writing was able to evolve with no regulation, which resulted in large variation in spelling depending on each writer’s personal preference, training and regional background. It became the norm to write as one would speak, meaning that the various dialectal differences of the spoken language were also reflected in the written language (Cook & Ryan 2016: 116). In addition to this, there were no universally accepted orthophonetic rules, making it possible to spell certain words in several different ways despite the pronunciation being the same.
A rather extreme example of this is the 510 different spellings recorded of the word through (Cook & Ryan 2016: 133). That being said, some variants were considerably more frequent than others, and when excluding all forms found in fewer than ten texts, the following 34 variants remain (Cook & Ryan 2016: 134-5):
thorgh, thoro, thorow, thorowe, thoru, thorugh, thorw, thourgh, throgh, throw, thrugh, thurgh, thurghe, thurght, yurgh, þorgh, þorou, þorough, þorouȝ, þorow, þorowe, þoru, þorugh, þoruȝ, þorw, þorwe, þorwȝ, þorȝ, þourȝ, þrouȝ, þrow, þroȝ, þurgh, þurȝ
(Cook & Ryan 2016: 135)
If we then leave out the forms found in fewer than 50 texts, only thorow, thurgh, þoruȝ and þorw remain as the most common spellings of the word through (Cook & Ryan 2016: 135).
This large amount of variation in spelling was considerably reduced during the fifteenth century, when a standard written language, based on the London dialect, started to emerge (Cook & Ryan 2016: 116). Over the years, there was a clear tendency toward more uniform spelling, which consequently suppressed the use of regional variation in writing. However, since this tendency evolved naturally and without regulation, the written language was still far from fixed (Cook & Ryan 2016: 138). In the sixteenth and seventeenth centuries, English orthography as we recognize it today started to take hold. This was largely the work of orthoepists and orthographers, who shaped the new ‘correct’ spelling of English in publications such as dictionaries and other printed texts. These practices continued in the eighteenth century, with scholars working to standardize the written language even further. One of these scholars was the lexicographer Samuel Johnson, who in 1755 published A Dictionary of the English Language. More than any previous Modern English publication, Johnson’s dictionary was widespread and authoritative, so much so that it has since become the definition of ‘British spelling’. After this publication, the development of English spelling stagnated to such a degree that Johnson’s dictionary from 1755 is still considered to be the essence of British spelling today (Cook & Ryan 2016: 278).
What characterized Johnson’s spelling ideology was partly its strong focus on etymology: in developing his spellings, he prioritized keeping words true to their origin. His ideology did not focus on transparency between spelling and sound, which many would-be spelling reformers have long striven for (Cook & Ryan 2016: 278). As far back as the sixteenth century, there have been activists aiming for a simplified and more transparent orthography in which each sound corresponds to a particular spelling (Cook & Ryan 2016: 125). However, only one spelling reform can be said to have been successful since the standardization of the English writing system in the sixteenth to eighteenth centuries, and that is the standard American spelling, advocated by the lexicographer Noah Webster. Webster was an American nationalist, and his motivations for a spelling reform distinguished him from earlier reformers. His aim was not only to simplify English orthography, but also to advance the intellectual independence of the United States and to make its political nationalism more apparent to everyone (Cook & Ryan 2016: 267). Webster (1789: 20) believed that having a unique language system was crucial in order to be an honorable independent nation: “An independent nation, our honor requires us to have a system of our own, language as well as government”. But despite being mostly motivated by nationalism, Webster was also in favor of a more regular and consistent writing system:
It has been my aim in this work … to ascertain the true principles of the language, in its orthography and structure ; to purify it from some palpable errors, and reduce the number of its anomalies, thus giving it more regularity and consistency in its forms … and in this manner to furnish a standard of our vernacular tongue.
(Cook & Ryan 2016: 279)
In 1789, Webster published his book Dissertations on the English Language, where he presented his arguments for an American spelling reform. Among them was a list of four advantages that his proposed changes would bring:
1. They would make learning to spell and read easier.
2. They would make American pronunciation more uniform.
3. They would make books shorter and thus less expensive.
4. They would distinguish American spelling from British, which would encourage the publication of books in America and strengthen American copyright laws.
(Webster 1789: 396-7)
In the following years, Webster published a series of dictionaries, where his alternative spellings were presented in detail. These spellings involved weeding out silent letters, making the spelling of given sounds more regular and making spelling reflect pronunciation to a larger degree than the pre-existing written standard. The changes included replacing -our with -or (humour → humor), -re with -er (centre → center), -ise with -ize (criticise → criticize) and -logue with -log (analogue → analog), to name a few. These changes were seen as simplifications, making American English spelling more regular and transparent than its British counterpart (Cook & Ryan 2016: 280-6).
Although Webster’s spelling reform eventually took a strong hold in the United States, and became the basis for the current American written standard, it took a long time for printers and publishers in the US to replace Johnson’s British standard with the new American spellings. In the mid-nineteenth century, Americans began to adopt the more modest spelling reforms from Webster’s dictionaries. Some of his more radical suggestions, on the other hand, were never fully implemented in the United States (Cook & Ryan 2016: 279). Compared to those of the most radical advocates of spelling reform, Webster’s suggestions were generally moderate, which might be the reason why this is the only spelling reform after Johnson’s standardization of English spelling that can be deemed a success. Webster based his reform on the already existing standard and did not aim to build a more transparent written standard from scratch. The combination of a patriotic ideology in a newly independent nation and spellings that remained intelligible to those already familiar with the British standard is said to be the key to this spelling reform’s eventual success (Cook & Ryan 2016: 267).
Several linguists have commented on the different suggestions for an American spelling reform. One of these is Egersten (1972: 26), who has commented on simplifications such as the omission of gh in words where this consonant cluster has become silent in Present-Day English pronunciation. This includes words like through, though, night, light, high etc. The gh in these words could not simply be omitted without the pronunciations and meanings of the words changing, which is why each individual spelling would need to be changed according to the pronunciation of the word. The main goal was to make the spelling more ‘phonetic’, and thus simpler. Through was commonly simplified as thru, though became tho, night became nite, light became lite and high became hi. Egersten also notes that these spellings never managed to become the norm in American English formal spelling, but they have managed to stay ‘alive’ for centuries in informal spelling.
As for the -ough sequence in particular, Webster’s spelling reforms mostly left it unchanged, but that does not mean that the -ough spelling was not a subject of controversy. In 1876, the American Philological Association’s committee on spelling reform recommended that words such as through and though should change into thru and tho. In 1887, these spellings were included in a ‘List of Amended Spellings’, partly developed by the American Philological Association (Cook & Ryan 2016: 373). While some words, such as hiccup (hiccough), plow (plough), slew (slough) and donut (doughnut), eventually became common, acceptable spellings in the American written standard, other words, such as thru (through) and tho (though), had a harder time gaining popularity over the traditional spellings and were eventually abandoned in formal writing altogether (Ough (tetragraph) 2020). According to the Merriam-Webster dictionary, the alternative spellings thru and tho have never been very common, despite having a long history of occasional use. Their greatest popularity occurred in the late nineteenth and early twentieth centuries, when spelling reformers repeatedly attempted to introduce these alternative spellings into formal writing. As these attempts were ultimately unsuccessful, thru and tho are now found mainly in informal writing, as well as in some technical journals (Thru (n.d.)). On September 30, 1975, the New York Times published an article about abandoning simplified spellings, such as thru and tho:
CHICAGO, Sept. 29 (UPI)—The Chicago Tribune said today that it had lost much of its campaign to simplify spelling of the American language and that “thru” was through.
In a front page announcement and an editorial, the newspaper said it had amended its stylebook and abandoned such spellings as “thru,” “tho,” and “thoro.”
“Regretfully we concede they have not made the grade in spelling class,” the editorial said. It said that the “og” words, like epilog, dialog and synagog, seem to be gaining in acceptance and would be retained in their simplified form.
“From now on,” the editorial said, “Webster's Third will be our guide, first variants preferred. Sanity some day may come to spelling, but we do not want to make any more trouble between Johnny and his teacher.”
The editorial said that in 1934 The Tribune set out to correct an “unspeakable offense to common sense” and simplify spelling. Eighty words were shortened and simplified for use in the paper.
('Thru' Is Through As Chicago Tribune Ends Spelling Fight 1975)
Although these spellings are no longer found in formal writing, they still play a role in the linguistic landscape of English, particularly in the United States. The spelling thru instead of through is commonly used by American sign makers. Examples are road traffic signs such as <DRIVE-THRU>, <NO THRU STREET> and <THRUWAY>. The reason for this shorter, simplified spelling on road traffic signs is said to be that text which is short and phonetically transparent is easier for drivers to see and process while driving, which makes the sign easier to read (Cook & Ryan 2016: 373).
Based on this history, and the differences between British and American spelling, is it accurate to call American English spelling a simplified version of British English spelling? One of Webster’s aims for the new American writing system was to make the spelling ‘simpler’. This notion is also widely accepted among linguists today, who see most of Webster’s changes as ‘simplifications’.
According to Kövecses (2000: 13), American spelling can be described as ‘regular’, ‘informal’, ‘economical’, ‘direct’, ‘democratic’, ‘tolerant’, ‘prudish’, ‘inflated’, ‘imaginative’ and ‘inventive’ – this is of course in comparison to the British standard spelling. He also points out: “in those cases where British English had alternative spellings, Webster always recommended the simpler form for American usage” (Kövecses 2000: 167). Because of these simplifications, Kövecses (2000: 168) considers standard American English spelling to be obviously simpler than standard British English spelling. Clark (1965: 185) has also commented on the simplification of American spelling, saying: “most British spellings differ from American […] in being longer”.
2.2 History of non-standard spelling in CMC
Although the American written standard can be considered a minor spelling reform, remarkably little has changed in English spelling in general since the early Modern English period. After all these years, however, this might be about to change. After centuries of the same types of media, the media through which we consume writing have changed drastically in only one generation. It can be said that we are currently in the middle of a media revolution, where we are moving from the traditional writing styles of handwriting and print to computer-mediated communication (CMC). This makes writing much more accessible to everyone, which has resulted in English writing being more widespread and popular than ever before. Texts are no longer required to go through publishers and copy editors before they are made public, and the consequence of this is a much freer and more creative use of spelling. Writing through CMC, such as e-mails, text messages, chats, tweets and blog posts, has developed into something very different from traditional writing (Cook & Ryan 2016: 270-1). Many people today will spell words differently depending on the context, sticking to the standard spelling in formal situations while adopting more creative and simplified spellings in informal situations. This is particularly prevalent among the younger generation, which has grown up with CMC through, for example, chat rooms, text messages and social media (Cougnon et al. 2017: 309).
The way English is written in CMC has been of great interest to linguists since the use of this technology became widespread among the public. Already in the 1980s, the idea that computer use had an impact on language was beginning to take hold. Baron (1984: 119) considered CMC to be a mode of interaction that was gradually replacing other, more traditional, modes. Traditional writing, as well as language in other contexts, was gradually being replaced by the way language was used in CMC. In 1988, Murray studied the use of electronic messaging (E-messaging) and noted a trend of shortening and omitting words in order to ‘economize’ the time and space used on writing. She observed a number of simplifying strategies, which would allow users of E-messaging to focus on the ability to communicate rather than the necessity of ‘correct’ spelling. She concluded that some of the most frequently used strategies were the use of abbreviations (Murray 1988: 360), as well as the use of multiple vowels and multiple punctuation marks (Murray 1988: 365).
In 1996, Herring published a collection of 14 articles discussing the use of written English in CMC. Collot and Belmore (1996) described the way English was used in bulletin boards as “a new variety of English”, which appeared to be a mix of what they considered to be written and spoken linguistic features. The same was concluded by Yates (1996), who compared CMC to both written and spoken corpora, looking at different linguistic features, like modal verbs and pronouns. He concluded that writing in CMC was influenced by both written and spoken language in various ways. However, like the majority of CMC studies at this time, neither of the articles conducted an in-depth analysis of orthographic and typographic features in CMC. Instead, they mainly compared the lexical and syntactic features of CMC with the traditional written language. One of the few exceptions to this trend was the article by Werry (1996), which discussed the features of English writing on Internet Relay Chats (IRC). Werry observed a number of innovative orthographic features used on IRC, such as abbreviations, acronyms, reduplicated letters, non-standard use of punctuation, as well as other non-standard spellings (Cook & Ryan 2016: 473-4). He also suggested some reasons for these innovative features:
Some of the most characteristic and interesting features of the language used on IRC are the result of a complex set of orthographic strategies designed to compensate for the lack of intonation and paralinguistic cues that interactive written discourse imposes on its users.
(Werry 1996: 56)
A few years later, Danet (2001: 16) studied the correspondence between ‘writing-like’ and ‘speech-like’ communication in CMC. As a result of this study, she proposed a list of “common features of digital writing” (Danet 2001: 17), including orthographic features such as abbreviations (e.g. ur instead of your), acronyms (e.g. ttyl instead of talk to you later), multiple punctuation marks (e.g. Hey!!!! What’s up???), action descriptions (e.g. *sigh*, *grins*), laughter representations (e.g. haha, lol, hehe) and what she refers to as ‘eccentric spelling’ (e.g. catz and dogz). According to Danet (2001: 16), these features are commonly used to imitate oral conversation, and one of the key elements of spoken conversation is speed. “Speed is all-important – we cannot type as fast as we speak, but we do our best to type fast – and editing virtually impossible” (Danet 2001: 16-17). In addition to the speed of the correspondence, Danet (2001: 17) also notes that the way the words are written is part of a strategy to make the conversation appear as natural as it would be in a spoken setting. The spelling and structure of the text are thus changed to emulate this experience:
We rarely, if ever, encounter them in formal genres of paper-based writing such as business letters or reports, because people have been taught to avoid them. In the past, expressivity had been suppressed by the teaching of literacy in the schools. Children were taught that a written composition must differ in a host of ways from a spontaneous oral sequence of utterances.
(Danet 2001: 17)
What Danet refers to as ‘eccentric spelling’ is often rooted in the spoken varieties of English. This means that the eccentric spellings are generally based on the word’s pronunciation, and the written form thus adopts a more direct sound-to-letter correspondence than in the standard written language. An example of this is replacing the inflectional -s with the letter <z> when the ending is typically pronounced [z] (e.g. letterz instead of letters or boyz instead of boys). Another example is the letter sequence au, which can be replaced by a single letter depending on the pronunciation (e.g. spelling the sound [ʌ] with a <u>, so that because becomes becuz). A further example is replacing words or syllables with identically-sounding numbers (e.g. <2> instead of to or <4> instead of for) (Cook & Ryan 2016: 478). There are, however, exceptions to this sound-to-letter correspondence. This can be seen in instances where the letter <z> replaces the inflectional -s even when the common pronunciation is [s] rather than [z], such as creepz, hitz and workz. This trend is arguably an extension of the original sound-to-letter trend (Cook & Ryan 2016: 481).
By 2010, the study of language in CMC had developed further, but the same topics were still being discussed. Squires (2010: 482-3) detected the same orthographic features, such as abbreviations (e.g. r instead of are and ppl instead of people) and acronyms (e.g. lol instead of laugh out loud and omg instead of oh my God), as well as an irregular use of apostrophes in contractions and possessive nouns (e.g. hes instead of he’s and Laurens instead of Lauren’s). While some speakers were observed to use standard apostrophes 100% of the time, others were observed not to use apostrophes at all. Overall, the use of apostrophes was only documented in about half of all cases. Squires also noted that the use of punctuation, such as apostrophes, in CMC was considerably less discussed among linguists than the use of other features, such as abbreviations and acronyms. Despite this, Squires found that these punctuation features were generally more frequently used in CMC than the other, more discussed features. The reason why they might be more frequent in CMC could come down to what Sebba (2007: 32) describes as the ‘zone of social meaning’, which refers to written symbols that are changed in order to express social meaning without altering their linguistic interpretation. Omitting apostrophes rarely alters the linguistic interpretation, so they can be left out without causing confusion.
Schnoebelen (2012: 122) touched upon the same features as Murray, Werry and Danet, but this time in the specific context of the social networking platform Twitter. Because of the 140-
character limit per tweet, it would be expected that words were deliberately shortened in order to save space. While there were some cases of this, Schnoebelen also detected a trend of ‘expressive
lengthening’, which is the same phenomenon that Werry referred to as ‘reduplicated letters’.
Murray also referred to parts of this phenomenon with the definition of ‘multiple vowels’.
Examples of this phenomenon are: sooo, heeeey, hahahaaa and yummm, which typically represent the sounds in spoken conversation, indicating that these reduplicated letters are
representing a longer, more drawn-out pronunciation than the standard spelling. Just like Squires, Schnoebelen (2012: 123) also detected a trend where the shortening of words was not only carried out through abbreviations and acronyms, but also through the use of contractions, and specifically the dropping of apostrophes in already established contractions (e.g. wasnt instead of wasn’t).
In 2016, Tagliamonte commented on the use of ‘Internet language’ among teens in her book Teen Talk: The Language of Adolescents. Here, she presents an analysis conducted in 2009-2010 about Canadian adolescents’ language use, including their spelling on the Internet. In order to find out which non-standard spellings are most frequent among the adolescents, she uses data from private CMC conversations between 45 people aged 17-21 from the Toronto area in Canada (Tagliamonte 2016: 17, 226). As a result of her analysis, Tagliamonte (2016: 233) affirms that the most frequent spelling features in CMC include abbreviations, initialisms and other short forms. Of the most frequently used CMC forms, the acronym lol (laugh out loud) is in a clear lead, being used in a total of 40% of all non-standard spellings. Next comes the other laughter imitation haha (including alternate spellings with expressive lengthening) with a total of 24% of all non-standard spellings. Other frequently used CMC forms include lm(f)ao (laugh my (fucking) ass off), om(f)g (oh my (fucking) god), cuz/becuz/bcuz (because), tmr (tomorrow), ppl (people), btw (by the way), ttyl (talk to you later), tho (though), ic/i c (I see), thx (thanks), wtf (what the fuck), sry (sorry), msg (message), np (no problem) and brb (be right back) (Tagliamonte 2016:
234). All in all, Tagliamonte (2016: 243) discovered many non-standard variants typical of spelling in CMC, but she notes that the CMC forms only account for 1.7% of the total number of words in the sample.
In another part of the analysis, Tagliamonte (2016: 240) compares the use of certain non-standard spellings with the standard variants. This includes orthographic variations of the second-person pronoun (you/u), as well as the second-person verb conjugation of to be (are/r). In both cases, a three-lettered word is abbreviated into a one-lettered word because of its shared pronunciation with the alphabetic letter. The result shows that the standard variant you is used in 79.2% of all instances of you/u, while the CMC form u is used in the remaining 20.8%. The standard variant are is even more frequent than you, with a distribution of 86.6%, while the CMC form r is used
13.4% of the time. It is thus clear that the standard variant is by far the preferred choice for both words, but the non-standard variant u is more frequently used than r. Tagliamonte (2016: 249) also examines the future temporal reference system of English, including the future verb forms will, shall and going to, and their abbreviated forms (‘)ll, ima, gonna and gunna. In the adolescents’ instant messaging, will is the most frequent form with approx. 44% of all future references. (‘)ll comes next with approx. 25%, followed by going to and gonna, which are each used in approx. 8% of all future references. Next comes ima with approx. 3%, then gunna with approx. 1%, and lastly shall without any instances among the participants. Interestingly, the standard form going to and the non-standard form gonna are equally frequent, and while the standard form will dominates, another standard form, shall, is not used at all. Based on these results, Tagliamonte (2016: 249) concludes that while future reference uses both standard and non-standard forms with varying frequency, the results show evidence of conservatism among the adolescents.
2.3 Attitudes toward non-standard spelling in the US and the UK
With the rapid emergence of CMC in the last few decades, several studies have tried to find out how the use of CMC has affected people’s spelling and how they perceive these non-standard varieties. When researching Americans’ attitudes toward incorrect grammar and spelling, Baron (2008: 169) comments on what she observes to be an increasingly common attitude among the younger generation, which she dubs ‘linguistic whateverism’. This attitude describes a growing indifference to the need for linguistic rules and consistency and has mostly manifested itself among the younger generation growing up with CMC. Baron labels this younger generation of English speakers as ‘the “whatever” generation’ and notes that there are several clear indicators as to why the attitudes of this generation differ from the older generations.
According to Baron (2008: 169), this growing attitude is less about opposing the established linguistic norms and more about a natural shift in social agendas, educational politics and the way we live our lives. There is no escaping the fact that schools have always played a big role in shaping language – both spoken and written – and as Baron notes, the way the education system has changed over the years has also changed the way people approach language. The school system used to be dominated by normative instruction, where people were instilled with an awareness that linguistic rules exist for a reason, and that people’s ability to conform to those rules determines the way they are perceived in society. But since World War II, the education
system in the US has gradually developed in a more informal, non-normative and student-centered direction. This has resulted in a more casual approach to linguistic consistency, which Baron (2008: 170) considers one of the main reasons for the increasing indifference toward the traditionally established rules of the English language. Instead of only focusing on correct grammar and pronunciation, the students have increasingly been encouraged to express
themselves in their own words. There is now more focus on what the students express, and less on how they do it.
In addition to the education system, Baron (2008: 170) also mentions other reasons for this linguistic indifference, such as a growing acceptance of different dialects and sociolects – which has been increasingly encouraged by society throughout the last few decades – as well as a growing trend in academic discourse:
Many of us in the academy have noticed that the present generation of university students is more reticent than their predecessors to engage in debate or to criticize the words or actions of others. When challenged to make judgments or take sides, a common refrain is “Whatever.” Anything’s OK. Let’s not fight over it. Whatever you do or say—including how you say or write it—is fine.
(Baron 2008: 170)
There has been some concern among linguists that the attitudes of the “whatever” generation are diminishing the role of linguistic consistency, and with this in mind, Baron (2008: 171) presents a number of potential consequences that this increasing attitude may have on the English language in the future, including potential scenarios such as: “language (both spoken and written) will play a reduced role as a social status marker”, “writing will increasingly become an instrument for recording informal speech rather than the distinct form of linguistic representation that emerged by the end of the seventeenth century in England” and “As a literate society, we will continue to write, but will revert to an attitude toward spelling and punctuation conventions redolent of the quasi-anarchy of medieval and even Renaissance England” (Baron 2008: 171).
As mentioned above, Baron (2008: 169) states that the way we live our lives generally has a big impact on the way we use language, and today’s society has thus played a significant part in developing the “whatever” generation. According to Baron (2008: 170), our daily lives have generally become more and more hectic as time goes by, with many aspects of our lives being sped up for convenience. One of these aspects is the use of language, and particularly writing.
Baron (2008: 170) notes that the opportunity to write with convenient tools, such as computers, has made writing all the easier and less time-consuming, as it can now be done without the need to pause and think about how and what we write. It is always possible to go back and edit, if that happens to be necessary. As Baron (2008: 170) describes it: “writing went from being a contemplative activity to a rushed job”. But despite this, Baron (2008: 171) concludes that the growing “whatever” attitude among the younger generation is not caused by these computer-based tools. Instead, she states that this relatively new technology rather acts as a signal booster, which shows what the current tendencies are and how they have manifested themselves in our daily lives. She also argues that the generally accepted writing styles of a particular time have always been customized to fit the prevailing circumstances, and in current circumstances, informal and ‘dressed-down language’ prevails (Baron 2008: 172).
However, these observations by Baron (2008) may be perceived as an overgeneralization of the younger generation. While she detected an overall trend among youth in the United States, there were no detailed statistical analyses backing her observations. In 2016, Bogetić conducted a qualitative study about young people’s own views of non-standard spellings in CMC. When searching through different US-based websites, such as dating blogs for teens, Bogetić (2016:
255) found a number of comments by teens who evaluated the language and spelling of their peers on these online platforms – expressing their disapproval of the use of non-standard spelling.
Bogetić identified these attitudes by searching the websites for terms such as ‘proper English’, ‘language’ and ‘grammar’. Of 200 randomly selected blog posts by US-based teens on the social networking site mylol.net, 81 included mentions of language; 28 of these were considered ‘short’ (<1 sentence), while the remaining 53 were considered ‘long’ (>1 sentence). Based on these blog posts, Bogetić (2016: 256) concludes that
‘language complaints’ appear to be widespread and prominent on this social networking platform, despite not being the main feature of the blogs themselves. This study shows youth who care about linguistic norms and disapprove of ‘linguistic whateverism’, which indicates that Baron’s (2008) observations do not fit a significant number of the younger generation.
In 2009, Drouin and Davis conducted a more specific study, where 80 American college students participated in a survey. This survey aimed to find out how the students used non-standard spellings – referred to as ‘text speak’ – in different contexts, as well as their opinions and attitudes toward the use of text speak. This includes how they feel about the appropriateness of text speak in both formal and informal communication, and what they think of text speak’s influence on their use of standard English. Out of the 80 participants, only 34 (43%) answered
that they use text speak, while 46 (57%) answered that they do not use text speak. Thus, the majority of the participants did not use non-standard spelling in CMC (Drouin & Davis 2009:
55). Those who answered that they do use non-standard spelling mentioned using text speak on CMC platforms such as SMS, informal e-mails, as well as the social networking platform
MySpace. Despite the majority of the participants not using text speak themselves, 75% answered that they think it is appropriate to use text speak in informal situations, such as written
conversations between friends. When it came to formal situations, such as written conversations with instructors, only 6% found it appropriate to use text speak. It was therefore clear that most of these college students were able to assess different contexts and use the written language in different ways (Drouin & Davis 2009: 57).
In a similar study from 2010, Jones set out to find out what the younger generation in the UK thought of non-standard spelling in CMC. According to Jones (2010: 7), the study was conducted in order to find out if society is “starting to move away from standard spelling systems through the opportunities presented by CMC”. In order to find out why non-standard spellings occur in CMC and if ‘correct’ spelling is still important in today’s CMC, Jones (2010: 7) conducted a qualitative questionnaire for native speakers of English between 18 and 24 years of age. The results showed that 89.7% of the participants used the Internet every day, 9.3%
of the participants used the Internet 3-4 times per week and 1% used the Internet less than 3-4 times per week. The vast majority of the participants (80.8%) thought ‘correct’ spelling was important, while 19.2% did not believe that this was of particular importance. When asked if they had any problems using standard English spelling, 78.5% said they never had any problems with this, while 21.5% said that they did. The participants were also asked about their attitudes toward a hypothetical spelling reform, where 64% answered that they were against such a reform, 22%
answered that they were in favor of it and 14% had no opinion on the matter (Jones 2010: 23-5).
The participants were later asked to agree, be indifferent to or disagree with a series of
statements, some of which had a clear majority of agree/disagree: “dictionaries should include unconventional and variant spellings” (66.4% agree), “bad spelling on the Internet irritates me”
(67.5% agree), “the English language should have a standard spelling system” (63.2% agree),
“we should spell the way we speak” (59% disagree), “bad spellings should be ignored in job or university applications” (83.8% disagree) and “we should be able to spell the way we want”
(73% disagree). Some statements that had more even results were: “alternative spellings are completely unacceptable” (31% agree, 37% disagree and 32% indifferent) and “we should judge level of intelligence on a person’s ability to spell” (14% agree, 48.3% disagree and 37.7%
indifferent) (Jones 2010: 27-8). When asked about possible reasons for non-standard spellings in CMC, 92% of the participants agreed that “it’s become the norm”. 52% of the participants agreed that “it’s fashionable” and 29% agreed that “it’s fun”. These answers were closely followed by “it’s faster” and “people are unsure of the correct spellings” (Jones 2010: 33-4). Judging by these analyses, ‘linguistic whateverism’ seems to have a stronger hold among the younger generation in the US, while the younger generation in the UK holds more conservative opinions and attitudes when it comes to spelling – particularly in informal situations.
3 Methodology
When studying linguistics, many different methods of research may be employed depending on the topic of study. In order for the three research questions of this thesis to be answered in an accurate and precise manner, a large amount of data must be collected and compared. A statistical analysis of quantitative data is thus the most appropriate method for this study.
This chapter explores the methods for carrying out a quantitative study of non-standard spelling on social media in the UK and the US. The first section includes the chosen platform for
analyzing alternative spellings in CMC, the second section deals with the process of quantitative research, and the third section explores the use of datasets to conduct an analysis.
3.1 Twitter
In order to analyze English spelling on social media in the UK and the US, I have decided to narrow the scope to one specific social networking platform that is widely used in both countries.
Twitter is one of the most popular social networking platforms worldwide, including the UK and the US. The microblogging platform was created in 2006 and has since developed to become one of the most visited websites in the world. As of January 2020, there are more than 330 million active users of Twitter worldwide. The US is the leading country based on number of Twitter users with 59.3 million users, while the UK is in third place with a total of 16.7 million Twitter users. This means that approx. 18% of the total US population are users of Twitter, and 24.6% of the total UK population are registered Twitter users (Aslam 2020). In 2009, Steven Johnson wrote an essay for Time magazine, explaining how Twitter works as a social media platform:
As a social network, Twitter revolves around the principle of followers. When you choose to follow another Twitter user, that user's tweets appear in reverse chronological order on your main Twitter page. If you follow 20 people, you'll see a mix of tweets scrolling down the page: breakfast-cereal updates, interesting new links, music recommendations, even musings on the future of education.
(Johnson 2009)
Like many other social networking platforms, Twitter is designed to post writing, photos, videos and links. These posts are referred to as ‘tweets’ and can either be made public for everyone to see, or private only for approved followers. In this sense, Twitter is similar to most other social networking platforms, such as Facebook and Instagram, but there are several features that distinguish Twitter from other popular platforms. One of these features is the limit of 280 characters per tweet (140 characters until November 2017). This restriction has generally made tweets different from other posts on social media, since the text must be short and compact. It is also possible to forward other people’s tweets and add them to one’s own account, which is commonly called ‘retweeting’. Although one can also post links, photos and short video
sequences, the most common tweets are short written texts up to 140 characters (Twitter (n.d.)).
As Twitter has been growing in popularity, it has also become increasingly subjected to analysis.
One such analysis was conducted in August 2009 by Pear Analytics. This American marketing company conducted a study where 2000 US-located tweets were analyzed in order to categorize the different types of tweets. This resulted in six different categories:
‘pointless babble’, ‘conversational’, ‘pass-along value’, ‘self-promotion’, ‘spam’ and
‘news’. The analysis concluded that 40.55% of the tweets could be categorized as so-called ‘pointless babble’, closely followed by ‘conversational’, which made up 37.55% of all the tweets. A distant third was ‘pass-along value’ with 8.7% and the remaining 13.2% were divided between the categories of ‘self-promotion’, ‘spam’ and ‘news’
(Twitter Study: Usage- 40% Is Pointless Babble 2009). Soon after the publication of this study, Pear Analytics garnered criticism from the researcher Danah Boyd, who expressed a strong disapproval of Pear Analytics’ chosen labels, particularly ‘pointless babble’.
Instead, she argued that a better description for such tweets, where one’s only aim is being social and expressing oneself in different social contexts, is to call it ‘social grooming’ or
‘peripheral social awareness’, rather than dismissing a valid form of communication as
‘pointless babble’ (Boyd 2009). However, no matter what one prefers to call this category
of tweets, it still appears that informal posts with no particular agenda are the most frequent kind of tweet.
3.2 Quantitative research
Since the process of collecting linguistic data into machine-readable corpora began in the 1950s, the use of quantitative methods in linguistic research has increased significantly (McEnery &
Hardie 2012: 37). According to Rasinger (2013: 10), quantitative methods are carried out with a collection of ‘quantifiable’ data that can be used in statistics, such as numbers, figures and charts.
This data is generally processed through statistical analyses that aim to answer questions like
‘how much?’ or ‘how many?’. There is no set limit to the amount of data that can be used in quantitative research, but in order for the study to be as accurate as possible, a decent amount of data is required. If the amount of data is too small, the results will likely suffer from the
possibility of inaccuracy and inconclusiveness. Rasinger (2013: 11) further describes the process of a quantitative study to include developing a theory or hypothesis, generating a method,
analyzing a significant amount of data and, lastly, deducing and discussing the results. Johnson (2008: 3) has compiled a list of “four main goals of quantitative analysis”, which describes what one is trying to accomplish with a quantitative approach to a study. This list includes the four processes ‘data reduction’, ‘inference’, ‘discovery of relationships’ and
‘exploration of processes that may have a basis in probability’:
1. data reduction: summarize trends, capture the common aspects of a set of observations such as the average, standard deviation, and correlations among variables;
2. inference: generalize from a representative set of observations to a larger universe of possible observations using hypothesis tests such as the t-test or analysis of variance;
3. discovery of relationships: find descriptive or causal patterns in data which may be described in multiple regression models or in factor analysis;
4. exploration of processes that may have a basis in probability: theoretical modeling, say in information theory, or in practical contexts such as probabilistic sentence parsing.
(Johnson 2008: 3)
In addition to the previously mentioned descriptions of a quantitative study, Rasinger (2013: 35) also mentions another important part of quantitative research; namely ‘research design’. He concludes that quantitative research can be designed in several different ways; the main
categories being longitudinal research designs and cross-sectional research designs. Longitudinal designs study data that has been collected over a longer period of time, which makes it possible to observe possible changes that have taken place throughout this time period (Rasinger 2013:
38). Cross-sectional designs, on the other hand, study data that is collected over a much shorter period of time and makes it possible to observe how a situation is at one given moment (Rasinger 2013: 36).
In this thesis, the quantitative analysis follows a cross-sectional research design, where the quantitative data is collected at one specific point in time in order to provide insight into the distribution of a variable at that particular moment. The goal of this analysis is thus to summarize trends and compare the distribution of particular features across one variable, namely the country in which the tweets are located: the UK or the US.
3.3 Datasets
To answer these research questions, I will use two Twitter datasets that have been collected from the website followthehashtag.com. This website offers several features to facilitate research studies, such as free Twitter datasets that can be used as corpora (Free Twitter Datasets (n.d.)). The two datasets contain geolocated tweets from the UK and the US, respectively. Each dataset consists of Twitter streams with several categories of information, including tweet ID, time of publication, specific location, user name, nickname, Twitter bio, number of followers, number of followings, tweet URL and tweet content. In this study, I will analyze the non-standard spelling features that are found in the published tweets, which is why I will only use the information found in the tweet content. Retweets have been excluded from the datasets, leaving only content from original tweets. The dataset of tweets that is geolocated to the UK consists of 169,033 individual tweets that have been published within a timespan of 167 hours in April 2016 (170,000 UK geolocated tweets. Free Twitter Dataset. (n.d.)). The dataset of tweets that is geolocated to the US is a little larger with 204,820 individual tweets that have been published within a timespan of 48 hours in April 2016 (200,000 USA geolocated tweets. Free Twitter Dataset. (n.d.)).
When analyzing the tweet content, I will use the frequencies of a chosen set of words and phrases to compare the datasets. However, since these datasets consist of different numbers of tweets, it would be inaccurate to compare the raw numbers between the two datasets. The exact count of a result can be described as its ‘absolute frequency’. Comparing absolute frequencies is misleading when the sample sizes are not identical. Thus, in order to get an accurate comparison in such cases, one must obtain the ‘relative frequency’.
This refers to the frequency of an item relative to the size of the sample it is drawn from, and the best way to calculate this is to obtain the percentage of the findings in both cases and compare these to each other (Rasinger 2013: 95-6). In this thesis, I will thus calculate the percentage of each finding from the two datasets and compare the relative frequencies.
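The distinction between absolute and relative frequency can be illustrated with a minimal Python sketch. It is only an illustration and not the procedure used in this thesis: the function names, the whole-word matching rule and the toy sample of tweets are all assumptions made for the example.

```python
import re

def count_forms(tweets, forms):
    """Absolute frequency: whole-word, case-insensitive count of each form."""
    counts = {}
    for form in forms:
        # The lookarounds keep "u" from matching inside "but", "ur" or "you".
        pattern = re.compile(r"(?<!\w)" + re.escape(form) + r"(?!\w)", re.IGNORECASE)
        counts[form] = sum(len(pattern.findall(text)) for text in tweets)
    return counts

def relative_frequencies(counts):
    """Relative frequency: each form's share of the combined total, in percent."""
    total = sum(counts.values())
    if total == 0:
        return {form: 0.0 for form in counts}
    return {form: 100 * n / total for form, n in counts.items()}

# Hypothetical toy sample, not taken from the datasets.
sample_tweets = [
    "Are you coming tonight?",
    "u know it",
    "Thank you, see you there",
    "but ur late",
]

counts = count_forms(sample_tweets, ["you", "u"])
shares = relative_frequencies(counts)
print(counts, shares)
```

Because the shares are percentages of each sample's own total, they remain comparable even when one dataset contains more tweets than the other. Note that this simple matching rule would also count the you in you're, so a real extraction would need further decisions about tokenization.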
4 Results
This chapter presents the analysis of some simplified spellings found on Twitter in the UK and the US. Since the datasets consist of geolocated tweets with no further information about the users’ language background, it is not known for certain if the writers of each tweet are, in fact, native speakers of British or American English even though they are located in either the UK or the US at the time of tweeting. For this reason, the analysis will not use the terms British English and American English when comparing the results of the datasets, but the comparison will instead be made between the Englishes found in the UK and the US. The results will then be compared by calculating the relative frequencies, expressed as percentages, in each dataset. Because of the different sample sizes of the two datasets, the raw numbers will only be used to calculate the relative frequency without any further analysis of the distribution of absolute frequency. When deciding which words to analyze and compare, I have chosen a number of words found in the results of previous related studies, such as Murray (1988), Danet (2001), Squires (2010), Schnoebelen (2012) and Tagliamonte (2016).
The analysis is divided into four sections, each focusing on different orthographic features:
abbreviations, acronyms, non-standard contractions and reduplicated letters. The term abbreviation generally refers to a spelling that is shortened from its standard form, whether this is through a phrase being shortened with the use of acronyms or grammatical contractions, or a word being shortened with certain letters being removed and/or exchanged with other letters. In the first section, I will only focus on the latter definition, where a word is shortened through the
removal and/or change of letters. The second section will focus on the acronyms, meaning words that have been formed by using the initials of phrases consisting of more than one word. In the third section, I look at non-standard contractions, referring to contractions that are not acceptable in standard writing, such as omitting apostrophes of standardized contractions and contracting words with no standard contractions. The last section focuses on reduplicated letters, which refers to letters that are repeated within a word in order to emphasize a drawn-out pronunciation.
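The last of these features can be illustrated with a short Python sketch. It rests on an assumption that is not stated in this thesis, namely that a letter repeated three or more times in a row counts as reduplication, since standard English spelling very rarely triples a letter. The names REDUPLICATED, has_reduplicated_letters and collapse_reduplication are hypothetical.

```python
import re

# Assumption: a run of three or more identical letters marks reduplication.
REDUPLICATED = re.compile(r"([a-z])\1{2,}", re.IGNORECASE)

def has_reduplicated_letters(word):
    """Return True if the word contains a run of 3+ identical letters."""
    return REDUPLICATED.search(word) is not None

def collapse_reduplication(word):
    """Reduce every run of 3+ identical letters to a single letter."""
    return REDUPLICATED.sub(r"\1", word)

for word in ["sooo", "heeeey", "hahahaaa", "yummm", "soon", "see"]:
    print(word, has_reduplicated_letters(word), collapse_reduplication(word))
```

Collapsing a run to a single letter recovers so, hey and yum, but it is a simplification: a form such as coool would be reduced to col rather than cool, so words whose standard spelling contains a double letter would need separate handling.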
4.1 Abbreviations
In order to get a broader view of different trends among Twitter users in the UK and the US, I have decided to work with a sample of 20 different words and their different abbreviated forms.
As mentioned above, I have found a number of abbreviated forms from several previous studies that are widely considered to be frequent among CMC users. These words are further divided into five categories: words that are shortened into phonetically alike letters (i.e. you, are, your),
shortened forms of -ough (i.e. (al)though, through, enough), words that can be shortened with numbers (i.e. tonight, tomorrow, before), words that can be shortened with the use of a slash (i.e.
because, with, without) and other words that are shortened with a reduced number of letters (i.e.
what, thanks, sorry, really, seriously, people, message, text).
You, are, your:
In the first category, you is shortened into the identically sounding letter u, are is shortened into the identically sounding letter r and your is shortened into these two letters combined: ur. The following table shows the distribution of these words and their abbreviated forms in the UK and the US.
Table 1. Distribution of you, are and your on Twitter in the UK and the US

          Absolute frequency      Relative frequency
          UK         US           UK         US
you       18360      34582        95.3%      95.9%
u         914        1489         4.7%       4.1%
are       10479      11031        95.1%      93.6%
r         536        756          4.9%       6.4%
your      9294       14276        97.4%      98%
ur        247        294          2.6%       2%
As seen in table 1, the abbreviated forms u, r and ur are distributed fairly similarly in the UK and the US. When looking at the relative frequency of the standard forms compared to the non-standard forms, both datasets show a similar trend for all three words. The standard variant is used in the vast majority of the cases in both countries, with more than 93% of each word being spelled in its standard form. This indicates that there are no great outliers in the distribution of these abbreviations, as the usage varies between 2% and 6.4%. There are, however, small differences between the distribution of non-standard spelling in the UK and the US. With a small margin, the non-standard variants u and ur are slightly more frequent in the UK than the US, while r is more frequent in the US than the UK. Since the distribution of u and ur differs by only 0.6 percentage points in favor of the UK, this marginal difference may be deemed insignificant.
The distribution of r differs slightly more between the two datasets, with 1.5 percentage points in favor of the US, but this can also be considered a marginal difference. In figure 1 below, these relatively marginal differences can be seen in more detail.
Figure 1. Comparison of u, r and ur on Twitter in the UK and the US.
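The percentages and percentage-point differences discussed above can be recomputed from the absolute frequencies in table 1. The following Python sketch is offered as an arithmetic check only; the names TABLE_1, nonstandard_share and compare are hypothetical, and the counts are copied from the table.

```python
# Absolute frequencies from table 1: (standard, non-standard) per country.
TABLE_1 = {
    "you/u": {"UK": (18360, 914), "US": (34582, 1489)},
    "are/r": {"UK": (10479, 536), "US": (11031, 756)},
    "your/ur": {"UK": (9294, 247), "US": (14276, 294)},
}

def nonstandard_share(standard, nonstandard):
    """Percentage of all occurrences of the pair that use the non-standard form."""
    return 100 * nonstandard / (standard + nonstandard)

def compare(table):
    """Return (UK %, US %, UK minus US in percentage points) per word pair."""
    rows = {}
    for pair, by_country in table.items():
        uk = round(nonstandard_share(*by_country["UK"]), 1)
        us = round(nonstandard_share(*by_country["US"]), 1)
        rows[pair] = (uk, us, round(uk - us, 1))
    return rows

for pair, (uk, us, diff) in compare(TABLE_1).items():
    print(f"{pair}: UK {uk}%  US {us}%  UK minus US {diff:+} pp")
```

A positive difference means the non-standard form is relatively more frequent in the UK dataset, and a negative one that it is more frequent in the US dataset.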
(Al)though, through, enough:
Despite being spelled with the letter sequence -ough in standard English, the words (al)though, through and enough have three different sound realizations, and are thus inclined to be spelled in three different ways in informal writing in order to be orthographically closer to the different pronunciations. The -ough in (al)though is pronounced like one would pronounce the letter o, which is why (al)tho has traditionally been a common alternative spelling. The -ough in through
is phonemically pronounced /u:/ and the word has thus been traditionally shortened to thru. The -ough in enough is phonemically pronounced /ʌf/ and can therefore be spelled enuf(f) to make the spelling more phonetically transparent. The following table displays the results of these spelling variations in the UK and the US.
Table 2. Distribution of (al)though, through and enough on Twitter in the UK and the US

              Absolute frequency     Relative frequency
              UK        US           UK        US
(al)though    304       282          82.4%     58.4%
(al)tho       65        201          17.6%     41.6%
through       1317      1457         96.9%     85.3%
thru          42        251          3.1%      14.7%
enough        763       533          99.6%     96%
enuf(f)       3         22           0.4%      4%
Table 2 shows that the standard variants (al)though, through and enough are used more frequently than their non-standard variants (al)tho, thru and enuf(f) in both datasets. When the datasets are compared with each other, the US appears more inclined to use the non-standard variants than the UK. The US dataset uses (al)tho 41.6% of the time, while the UK dataset uses this non-standard variant only 17.6% of the time. This marks a difference of 24 percentage points, which indicates that the non-standard variant has a much stronger hold in the US than in the UK. The same can be seen with thru, which is also more frequent in the US than in the UK. In the US dataset, 14.7% of all cases of the word use the non-standard variant, compared with only 3.1% in the UK dataset. This marks a difference of 11.6 percentage points, which indicates a clear difference between the two countries, although the difference is smaller than with (al)tho. Enuf(f) follows the same trend, albeit with an even smaller difference than (al)tho and thru. In the UK, the word enough is almost exclusively used in its standard form, with 99.6% of cases using enough and 0.4% using enuf(f). The US has a slightly larger percentage of non-standard use, with enuf(f) being used 4% of the time, which marks a difference between the two countries of 3.6 percentage points in favor of the US. Although the use of non-standard variants clearly differs between the two datasets, there appears to be a similar trend in both countries: in addition to all three non-standard variants being more frequent in the US than the UK, they follow the same order of frequency in both countries. In both datasets, (al)tho is the most frequently used non-standard variant, followed by thru and lastly enuf(f). A detailed comparison of the non-standard variants between the UK and the US can be seen in figure 2 below.
Figure 2. Comparison of (al)tho, thru and enuf(f) on Twitter in the UK and the US.
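The relative frequencies and percentage-point differences reported for table 2 can be reproduced from the absolute frequencies. The Python sketch below is an illustration only and is not part of the original analysis; the names COUNTS, nonstandard_share and summarize are hypothetical. It assumes that each share is rounded to one decimal place before the UK figure is subtracted from the US figure, which is consistent with the numbers given in the text.

```python
# Absolute frequencies copied from table 2: (standard count, non-standard count).
COUNTS = {
    "(al)tho": {"UK": (304, 65), "US": (282, 201)},
    "thru": {"UK": (1317, 42), "US": (1457, 251)},
    "enuf(f)": {"UK": (763, 3), "US": (533, 22)},
}


def nonstandard_share(standard: int, nonstandard: int) -> float:
    """Percentage of all occurrences of a word that use the non-standard spelling."""
    return 100 * nonstandard / (standard + nonstandard)


def summarize(counts):
    """Return {variant: (UK share, US share, US minus UK in percentage points)}."""
    rows = {}
    for variant, by_country in counts.items():
        # Assumption: shares are rounded to one decimal before subtracting.
        uk = round(nonstandard_share(*by_country["UK"]), 1)
        us = round(nonstandard_share(*by_country["US"]), 1)
        rows[variant] = (uk, us, round(us - uk, 1))
    return rows


if __name__ == "__main__":
    for variant, (uk, us, diff) in summarize(COUNTS).items():
        print(f"{variant:8} UK {uk:4.1f}%  US {us:4.1f}%  US-UK {diff:+.1f} pp")
```

A positive difference means that the non-standard variant has a larger share in the US dataset than in the UK dataset.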
Tonight, tomorrow, before:
The words tonight, tomorrow and before have several recorded non-standard variants in CMC, including the use of numbers to replace identically sounding syllables, such as 2nite, 2morrow/2moro and b4. In addition to this, previous studies, such as the above-mentioned study by Tagliamonte (2016), also include other abbreviations like tonite and tmr. The table below shows the distribution of these words and their various abbreviated forms in the UK and the US.
Table 3. Distribution of tonight, tomorrow and before on Twitter in the UK and the US

              Absolute frequency     Relative frequency
              UK        US           UK        US
tonight       4025      7214         98.7%     96.8%
tonite        50        186          1.2%      2.5%
2nite         4         49           0.1%      0.7%
tomorrow      889       1813         99.2%     99.1%
tmr           0         0
2morrow       4         13           0.5%      0.7%
2moro         3         3            0.3%      0.2%
before        982       1554         97%       97.7%
b4            30        36           3%        2.3%
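For words with more than one non-standard variant, each spelling's relative frequency is its share of all occurrences of that word. The Python sketch below illustrates one way such counts could be collected from tweet text and turned into shares. It is not the extraction procedure used in this study: the names VARIANTS, PATTERN, count_variants and relative_frequencies are hypothetical, the whole-token regular expression is an assumption, and the sample tweets are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical mapping from each spelling to the standard word it represents.
VARIANTS = {
    "tonight": "tonight", "tonite": "tonight", "2nite": "tonight",
    "tomorrow": "tomorrow", "tmr": "tomorrow",
    "2morrow": "tomorrow", "2moro": "tomorrow",
    "before": "before", "b4": "before",
}

# Assumption: match whole tokens only, case-insensitively, so that a
# spelling embedded in a longer alphanumeric string is not counted.
PATTERN = re.compile(
    r"(?<!\w)(" + "|".join(map(re.escape, VARIANTS)) + r")(?!\w)",
    re.IGNORECASE,
)


def count_variants(tweets):
    """Count how often each listed spelling occurs across the tweets."""
    counts = Counter()
    for text in tweets:
        for match in PATTERN.findall(text):
            counts[match.lower()] += 1
    return counts


def relative_frequencies(counts, word):
    """Share (in %) of each spelling among all occurrences of `word`."""
    spellings = [s for s, w in VARIANTS.items() if w == word]
    total = sum(counts[s] for s in spellings)
    if total == 0:
        return {}
    return {s: round(100 * counts[s] / total, 1) for s in spellings}


if __name__ == "__main__":
    # Invented sample tweets, for illustration only.
    sample = [
        "See you 2nite!",
        "Gig tonight, then again tomorrow",
        "Tonite was fun, b4 the rain",
    ]
    print(relative_frequencies(count_variants(sample), "tonight"))

    # The UK counts for tonight from table 3 can be passed in directly.
    uk_tonight = Counter({"tonight": 4025, "tonite": 50, "2nite": 4})
    print(relative_frequencies(uk_tonight, "tonight"))
```

A pattern of this kind would also match the listed spellings inside hashtags and user mentions, so a real extraction would need an explicit decision on whether those count as instances of the word.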