Towards Continuity-as-Code


From Local Solutions to a High-Level Approach for Automated Canary Deployments

Lea Çeliku

Thesis submitted for the degree of

Master in Network and System Administration 60 credits

Department of Informatics

Faculty of mathematics and natural sciences

UNIVERSITY OF OSLO


Towards Continuity-as-Code http://www.duo.uio.no/

Printed: Reprosentralen, University of Oslo


Abstract

Canary deployment is a well-known deployment strategy that allows companies to test their newest version of an application, also known as the canary, with a subset of the real network traffic. This enables engineers to detect various types of errors and performance degradations in advance.

As a result, any negative impacts on the user experience as well as any potential risks are minimized. However, this deployment strategy is as good as it is sophisticated to implement, due to the number of steps needed to complete it and because of the advancements that have occurred in the world of software deployment, with regard to automation.

This thesis explores canary deployment setups in containerized cloud environments, with special focus on automation. The result is three technical implementations which build on top of one another and show the progression of canary deployments from a manual setup to a more automated one. What was observed is that when increasing the level of automation, the complexity of the mechanism hosting the canary implementation increases as well, due to the decrease of adaptability in various use-cases. The three created technical prototypes are best-practice solutions which resemble what is offered today by the industry and we classify them as local solutions.

The results indicate that a high-level approach for automated canary deployments is missing and this thesis proposes a novel, language-based approach to overcome the adaptability issues observed in the local solutions. We name this approach Continuity-as-Code to show the importance of continuous delivery-as-code. Future exploration of this approach can pave the way towards high-level, automated and more expressive ways to implement canary deployments and other deployment strategies.


List of Figures

2.1 This figure shows the architectural evolution of deployments. It is inspired by the official Kubernetes documentation [9].

2.2 Canary in software engineering from its emergence in 2010 until present.

2.3 Popularity of Canary in Research

2.4 This figure presents the distribution of the papers, based on the authors’ affiliations.

2.5 This figure presents the distribution of the papers, based on their respective takeaways.

2.6 This figure presents the distribution of the papers, based on the venue types they have been published in.

2.7 This figure presents the distribution of the papers, based on the authors’ affiliations and venue types.

2.8 This figure presents the distribution of the canary in the industry, based on the number of talks from 2010 until 2019.

2.9 This figure presents the distribution of the canary from both the industry-based and academic-oriented perspectives.

4.1 This figure presents a very abstract canary deployment diagram, without any technical intervention.

4.2 This figure presents the first model, which we have named as the "threshold-based model".

4.3 This figure presents the second model, which we have named as the "production-based model".

4.4 This figure presents the third model, which we have named as the "baseline-based model".

5.1 The first part of this figure shows Grafana’s dashboard for the stable version "stable-version", meanwhile the second part shows Grafana’s dashboard for the canary version "canary-version".

5.2 This figure presents six different executions of the "metrics.py" script, while its main purpose remains on showing the different arguments each script execution should take.

5.3 This figure presents the first prototype, which we have been referring to as the manual canary setup. It shows the whole picture of how every tool is connected to the other in order to perform manual canary deployment.

5.4 This figure presents the "Rate 200" graph with the right configurations as an example of how the rest of the graphs will be created in Grafana, for the rest of the metrics.

5.5 This figure presents the "Rate 200" graph, the "Rate 200 diff" graph, the "Rate 404" graph, the "Rate 404 diff" graph, the "Time 200" graph and the "Time 200 diff" graph, configured in the right way.

5.6 This figure presents the alerting options under the "Payload Diff" graph as an example.

5.7 This figure presents the alerting options under the "Time 200 Diff" graph as an example.

5.8 This figure presents the "Payload Diff" and "Time 200 Diff" graphs, after creating the alert rules.

5.9 The first part of this figure presents the webhook configured to connect Grafana and our Microsoft Teams channel. The second part presents the alert rules configured for the "Payload Diff" and "Time 200 Diff" graphs.

5.10 This figure presents the notification received via the webhook configured to connect Grafana and our Microsoft Teams channel.

5.11 This figure presents the second prototype, which we have been referring to as the manual canary setup with improved monitoring and analysis.

5.12 This figure presents the third prototype, which we have been referring to as the semi-automated canary setup.

6.1 This figure presents the scenarios, in terms of complexity and features.


List of Tables

2.1 The table of taxonomy used to extract information from papers mentioning canary or simply relevant to the topic

3.1 This table summarizes the objectives for each of the phases of the project.

4.1 The table of comparisons between the designed models

5.1 This table introduces the different application versions which will be used throughout this project, their roles and the environment they are expected to run on.

5.2 This table lists the metrics which will be taken into consideration, their descriptions as well as where in the script of Listing 7 they are referred to.


Preface

Acknowledgments

My profound gratitude goes to my supervisor Kyrre Begnum who has been a great point of reference and support throughout the entire master program. I would like to thank him for guiding me through this thesis tirelessly. Every discussion has been valuable, positive, educational and motivational, especially during the pandemic times which have been very difficult for everyone. I would like to also thank my supervisor for suggesting this topic for me, in cooperation with Eficode.

In addition, I would like to thank Mike Long, Sami Alajrami, Muhammad Kamran Azeem and Arinze Akubue, who are industry experts from Eficode, for their cooperation and valuable advice. The talks with them and the fellow colleagues at Eficode have been inspirational, insightful and very positive.

Furthermore, I would like to thank all of my friends for always being there for me and listening to me in both good and difficult times. I owe a special thanks for that to my best friends Ira Kalludhi, Floralba Sulce and Sol Jeanette Nilsen. Other special thanks go to my colleagues and close friends in Albania, specifically Erjon Cenolli and Artur Karameta, for doing their best to support me during these challenging years.

Last but not least, I thank my family, my loving mother and father, for everything they have done for me, because I would not be here without them. Besides being my family, they are also my best friends and my point of support in life, and I love them very much. These past two years have not been easy for me and my mother, so I would like to dedicate this thesis to her for being such a strong woman and to my special dad, my now angel in heaven.


Chapter 1

Introduction

Over the years, the software development life-cycle (SDLC) has experienced several changes. The reasons for that are related to the ever-increasing requirements coming from the end-users, mostly related to time, quality and costs. To outshine their competition in the market, software-domain companies must be able to provide frequent, secure and robust services within the shortest possible time-frame. To address that, a new approach came to life which brought a new culture among people and organizations. Its main focus was combining the knowledge and skills of two teams which had always been separated from one another, the developers and operations teams, even though their daily tasks often require them to communicate. By enabling this collaboration, these teams make it possible for companies to reach the needed automation for their infrastructure and services. This way, processes that had been historically slow and manual finally became fast and automated. This new paradigm is named “DevOps” and aims at providing agility, continuity and automation for the entire SDLC. Some of its phases include continuous development, continuous testing, continuous delivery and integration, continuous deployment as well as continuous monitoring.

The focus of this document will be the continuous deployment phase, and in particular, one specific deployment strategy, named canary deployment. This technique aims at releasing a new software version gradually into production, by first introducing the change to a small subset of users, before rolling it out to the entire infrastructure and making it accessible to everyone. It presents a way of testing the new service directly in production by observing its behavior, while the software receives a small percentage of the real network traffic. This form of deployment takes its name from the old mining industry, where miners used canary birds to test the level of poisonous gases in mines. If the canaries showed abnormal behavior, the miners would evacuate, since their safety would be at risk. Similarly, in our case, if the new software version does not behave as expected, the deployment is rolled back to the old, stable version, which then receives the whole network traffic.
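The gradual release described above can be illustrated with a minimal sketch in Python. It assumes a hypothetical router that sends each request to the canary with a fixed probability. The names `route_request`, `simulate` and `rollback`, as well as the 5% weight, are illustrative assumptions and are not taken from the prototypes in this thesis.

```python
import random


def route_request(canary_weight: float, rng: random.Random) -> str:
    """Pick the version that serves a single request.

    canary_weight is the (assumed) fraction of traffic sent to the canary.
    """
    return "canary" if rng.random() < canary_weight else "stable"


def simulate(canary_weight: float, requests: int, seed: int = 0) -> dict:
    """Route a number of requests and count how many each version received."""
    rng = random.Random(seed)
    counts = {"stable": 0, "canary": 0}
    for _ in range(requests):
        counts[route_request(canary_weight, rng)] += 1
    return counts


def rollback() -> float:
    """Rolling back means the stable version receives all traffic again."""
    return 0.0


if __name__ == "__main__":
    # Roughly 5% of the requests reach the canary.
    print(simulate(0.05, 10_000))
    # After a rollback, no request reaches the canary.
    print(simulate(rollback(), 10_000))
```

In a real setup the split would be enforced by a load balancer or service mesh rather than by application code; the sketch only shows the idea of a weighted split and of a rollback as a weight of zero.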


A canary deployment can be performed through a manual process and that is how most organizations have adopted it until now [1]. The responsible engineer has to look at the generated logs and graphs that monitor CPU usage, error rates and other necessary metrics. This method is very prone to human error and causes poor decision making that might lead organizations to deploy faulty code [1]. Manual canary deployment is slower, because the whole analysis process is slower. Consequently, businesses that want to evolve quickly and optimize their continuous delivery processes will run into several bottlenecks, if the canary analysis is done manually [1].

In order for this strategy to fit well with the DevOps principles and containerized cloud environments, every step needs to be continuous. The canary deployment workflow needs to be automated, fast, monitored, simple to use and able to be implemented in various services and applications, independent of the underlying infrastructure. The problem lies in the fact that the steps of canary deployment processes are hard to automate. These steps include deploying the applications, routing the traffic between them properly, monitoring the traffic and analyzing the metrics in an advanced way, before reaching a final decision about the canary. Therefore, most of these steps are either done manually, or they are partially automated.
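The last of these steps, analyzing the metrics and reaching a decision about the canary, can be sketched as follows. This is a simplified illustration under stated assumptions: the `Metrics` fields, the `analyze` function and both threshold values are hypothetical, and a real analysis would typically consider more metrics and more robust statistics.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    error_rate: float  # fraction of failed requests, between 0.0 and 1.0
    latency_ms: float  # mean response time in milliseconds


def analyze(stable: Metrics, canary: Metrics,
            max_error_increase: float = 0.01,
            max_latency_ratio: float = 1.2) -> str:
    """Compare the canary against the stable version and return a decision.

    Both thresholds are assumed values chosen only for illustration.
    """
    # Roll back if the canary fails noticeably more often than the stable version.
    if canary.error_rate - stable.error_rate > max_error_increase:
        return "rollback"
    # Roll back if the canary is noticeably slower than the stable version.
    if canary.latency_ms > stable.latency_ms * max_latency_ratio:
        return "rollback"
    return "promote"


if __name__ == "__main__":
    stable = Metrics(error_rate=0.010, latency_ms=100.0)
    print(analyze(stable, Metrics(error_rate=0.011, latency_ms=105.0)))
    print(analyze(stable, Metrics(error_rate=0.050, latency_ms=100.0)))
```

When such a comparison is done by hand from dashboards, it becomes the slow and error-prone step described earlier, which is why it is a natural candidate for automation.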

Several big companies have developed tools that include canary testing as a feature, but some of them can be costly or not easily adaptable for other business purposes. Some of them can also be very dependent on the underlying infrastructure, or might present complexity in being set up. What is missing is an open-source, inter-operable tool to automate the complete workflow of canary deployment in a simple-to-use manner. This document will investigate the automation of canary testing capabilities, with respect to containerized cloud environments.

Problem Statement:

• P1: Design a model to perform canary deployments, by automating all the steps required in canary deployment scenarios.

• P2: Investigate both in theoretical and practical manners, the automation and feasibility of canary deployment scenarios in containerized cloud environments, while keeping true to P1.

The model needs to be open-source, inter-operable across different continuous delivery and continuous integration platforms, easy to use and declarative, with a special focus on automation. To design such a model and later on develop a prototype based on it, Kubernetes and the advanced technology behind it will be used, since it has become a popular container orchestration framework with a tight partnership with cloud technologies in the current industry. There are many more stories, concepts and tools to be discovered for a comprehensive understanding, and that is the purpose of the next chapter.


1.1 Outline

This thesis includes eight chapters: introduction, background, approach, design, implementation, analysis, discussion and conclusion. Each of them has a specific mission that reflects the line of logic followed in this work.

This section will serve the purpose of outlining the key goals of the chapters in this thesis.

Chapter 2: The background chapter provides the necessary knowledge for understanding software deployment and the way it has been impacted by agile methodologies, DevOps and infrastructure advancements. Different deployment strategies are explained and more specifically canary deployments. Furthermore, this chapter explores the research on canary deployments both from academia and the industry.

Chapter 3: The approach chapter plans the journey, by choosing a method of research which defines what the reader could expect as an outcome from this thesis. The approach method employed to investigate the problem statement is exploratory research, but alternative methods are also discussed.

Chapter 4: The design chapter presents a unified technical terminology around canary deployments, which will be used throughout this thesis. In addition, models are designed which feature different ways of performing canary deployments and a short comparison of these models is also presented. Finally, this chapter discusses some additional thoughts about canary analysis as one of the most important steps in canary deployments.

Chapter 5: The implementation chapter demonstrates the evolution of canary deployments from a manual setup to a fully automated one, based on one of the models created in the design chapter. The prototypes are implemented on a containerized cloud environment, which features Kubernetes. This chapter shifts from presenting technical solutions to a thorough architectural discussion, which then leads to the proposal of a novel idea, namely Continuity-as-Code.

Chapter 6: The analysis chapter investigates the outcomes of this journey. It evaluates the terminology and models presented in the design chapter and lists observations about the prototypes and ideas presented in the implementation chapter.

Chapter 7: The discussion chapter discusses the results elaborated upon in the analysis chapter. It also serves as a reflection on the challenges faced during each step of the journey, the approach method chosen, and the relation between this thesis and the literature selected in the background chapter. Finally, this chapter discusses future directions this thesis could take.

Chapter 8: The conclusion chapter serves as a summary of the main results of this thesis. Automated canary deployments open new research paths and further exploration of Continuity-as-Code is necessary to improve what was proposed in this thesis.


Chapter 2

Background

This chapter is divided into three main sections. The first one describes the main movements in software engineering that have revolutionized the entire digital industry, shaping its form as we know it nowadays. This historic landscape will provide a complete overview of the journey of software deployment from its early stages to the latest strategies of releasing software, amongst them canary deployment. In the second section, canary capabilities will be discussed in depth through related research papers, with the intention of providing a comprehensive understanding of the concept and the process behind it. The third section will focus on explaining a number of tools and technologies that will be used in the upcoming chapters.

2.1 The Art of Deploying Software

Before going into details about canary capabilities, it is valuable to provide a comprehensive timeline of the main events in the world of software engineering, with respect to software deployment. The progress in this field over the last five decades has been astonishing. Let’s travel back to the ’80s, when the SDLC worked very differently. The internet was not yet widely known or used. So, how did software-domain companies operate in the market back then? How were applications delivered to the end users? Most importantly, what were the typical work process and culture within the companies like? To answer these questions, we turn to Thomas Limoncelli, a famous American system administrator, network engineer, author and speaker, who worked at Google for a long time and now works as a Site Reliability Engineer (SRE) at StackExchange. He gave a talk at the LISA conference in 2011, which he started by asking his audience: "Do you love the 80s?" [2]. The audience obviously loved the ’80s and approved happily. Then he continued: "There’s so much to love about the 80s, ... I loved the computers from the 80s, the software, the software development methodology of the 80s" [2]. That is when the audience became silent and Thomas laughed. Why was that? Simply because the software development methodology used in the ’80s was the famous "waterfall" model, which does not have many admirers today.

The waterfall model, introduced in the ’70s, provided software-domain companies with one approach to make things work, at a time when changes were not so frequent and every step had to be completed before going to the next one. It was a downhill approach, where going back to a previous step was not very common. Briefly, it was made of five important steps. The analysts were the ones who talked with the customers and wrote a big book of requirements, which was then handed to the requirement specifications department. The latter prepared another book of requirements, based on the previous book, while adding their own specifications. This book was handed to the designers, who would prepare another book, while adding their own requirements. Afterwards, this book would be handed to the implementers, who were the actual developers writing the code. When the code was finished, it had to be tested by the testing and integration department. If the tests were completed successfully, the software would be handed to the operations and maintenance department. They would ship the code on floppy disks or similar media, go to the end user and continue with the needed actions, such as provisioning, installing, upgrading, maintaining, backing up, restoring or even scaling the software. Therefore, producing the code and installing the code were two completely separate processes, happening in two different places: one on the company’s premises and the other on the user’s premises. This meant that the developers and operations teams were completely disconnected. The customers were getting their code much later than when they initially introduced their requests. It was a "great" time to be a developer, because even if a customer was complaining, the developers would only say "We did what was written in the book". As Thomas also said during his speech, "It was great to be a developer in the 80s, because software at the time did not have bugs, because bug tracking systems had not been invented yet" [2].

If waterfall was so good, what changed it? The ’90s came, when the internet started spreading globally and web servers became a common communication tool. Through these technological enhancements, software also moved to the web and started getting deployed using practices of the client-server architecture [3]. The client-server architecture is a computing model where the main server hosts, manages and controls the majority of the resources and services used by the client [3]. The waterfall model was still being used as a software development approach, but this would not last long, since the way software was being deployed had changed [4]. Applications were being sent to the web servers, and clients were using their browsers to communicate with the web servers. The servers were owned by the same company that was producing the software. This meant that developers and operations were both working on the company’s premises, and they could not be as disconnected as before. The first issues between these teams started arising in the late ’90s, which suggested the need for other working processes, and the 2000s were about to change everything.

The dynamics of the industry changed, because competition became very intense. To be able to excel in the market, companies had to perform very frequent changes and add new features, while keeping their software stable and reliable. As a result, using waterfall was getting outdated. Novel technologies required modern methodologies to support the SDLC.

2.1.1 The Rise of Agile Methodologies

As we just learned, velocity and reliability became two key factors in determining a company’s success in the 2000s. Why were these variables so difficult to maintain? The first reason was that software was taking too long to be developed, since several bottlenecks were being discovered in the existing waterfall methodology. That is why novel principles came to the rescue. Their main focus was to change the working culture among developers, while aiming towards rapid software development. This approach paid more attention to the people, rather than the tools and technologies themselves. Its goal was to bring end customers, business teams and developers closer, in order for collaboration to become more efficient and productive. Later on, this movement was properly named agile development.

By deviating from the usual design requirements, system specifications, and extensive user requirements documentation, project leaders started focusing on finding ways to perform small, rapid, and frequent releases of the software. The specifications and requirements teams were coupled with the design and implementation teams, since all their documentation and products were prone to frequent changes. It was meaningless to isolate these teams from one another, as they had to be able to communicate freely and act fast in case requirements changed. All the applications started getting developed in a series of small increments, so that everyone involved would be able to provide quick feedback.

In case end-users required new changes, developers would be able to include them in the upcoming application versions. On the other hand, agile development would not have been able to deal with these changes without extensive tool support. Version control systems would allow developers to continuously collaborate with each other, automated testing environments would consume less time and be more reliable than manual testing, configuration management tools would be able to support automation in various system infrastructures and more.

Even though software-domain companies had different core business purposes and working cultures, the agile methods being used had quite similar characteristics with one another. This led to the creation of the Agile Manifesto in 2001 [5], which reflected the basis for this new philosophy.

Only four years later, agile development methods were included and referenced in one of the most classical books about software engineering in academia, written by Ian Sommerville, titled "Software Engineering" [6]. In his book, the writer also cites some of the most interesting statements of the Agile Manifesto [5], as follows:


• Individuals and interactions over processes and tools.

• Working software over comprehensive documentation.

• Customer collaboration over contract negotiation.

• Responding to change over following a plan.

In these statements, there is value in the items on the right of each sentence, but the items on the left are valued more. The authors of the Agile Manifesto claim to be uncovering better ways of developing software by doing it and helping others do it. Therefore, it is clear that this approach focuses on bringing people together. The result would be long-term value for the company, in terms of motivation, effectiveness, and costs. If everything was working well, why was there a need for another approach?

2.1.2 The Age of DevOps

As mentioned, agility in development allowed developers to focus on the software and collaboration, rather than on formal documentation. The result was obvious, since code was being developed at the highest pace and quality ever experienced by the companies. What about the release of these applications and versions? How were the operations teams affected by this agility in development? The challenge of wanting velocity in releasing new features was mostly rooted in the disconnection between developers and operations. Even nowadays, developers get rated based on change and shipping new features [2]. Operations, on the other hand, get rated on maintaining the stability of the system [2]. As can be imagined, change is not a good friend of stability, especially when frequent, since it might bring up new problems in the system. Because teams were afraid of shipping new features rapidly, a disproportion in time arose between producing software and releasing it, which was quite disturbing in terms of increased risk and decreased quality. There was a need to improve not only the development side of the software, but its delivery and deployment processes as well.

To deal with this, agile infrastructure was born around the same time as agile development (2001). At the time, it did not have an established name. It only existed as a competing idea, together with waterfall and agile development. Shifts from waterfall to agile methodologies were already happening, but going from agile development to agile infrastructure needed its own time. This necessary transition was properly given a name in 2009, during the Velocity Conference, by two well-known software engineers working at Flickr, John Allspaw and Paul Hammond, who gave a talk about how Flickr was deploying at least ten times per day [7]. They stressed the importance of change, and of building the right infrastructure to allow it to happen as often as it needs to. "Devs thinking like Ops and Ops thinking like Devs" [7], they claimed. And there it was, the long-needed term "DevOps".


DevOps defined a new engineering methodology, a novel organizational structure and an innovative management philosophy. Instead of the requirements being thrown down the waterfall to the operations, developers and operations began working together. They joined forces for the sake of extreme reliability, great stability and high velocity of change in a software product. A new mindset was created within organizations which featured a high level of cooperation between its departments. To support these changes, a great set of automated tools was built as well. Therefore, DevOps was not only about tools and technologies, processes and culture, but above everything it was about people. Its roots are easily found in the agile methodologies.

Adopting DevOps practices was not an easy task and several teams are still struggling to achieve it within their companies. These practices enable businesses to improve their software delivery performance, based on four key indicators uncovered in the research done by the authors of "Accelerate" [8], which explains the science behind the “state of DevOps”. These measurements include: deployment frequency, lead time from testing to production environments, mean time to recover from downtime, and the change failure percentage [8]. Organizations can improve these indicators simply by implementing automation strategies and tools in the right way. In this regard, DevOps is not a process with an initial and final destination, but instead a continuous practice of improving every part of the SDLC, including the development, testing, integration, deployment and monitoring phases. To execute actions in each phase, there are several tools that come to help, and some of them existed before DevOps even came into the picture. This indicates that this movement had initiated before it got its name, but when DevOps surfaced, existing and novel tools and practices started getting properly recognized and used in the industry.
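To make the four indicators more concrete, the following Python sketch computes them from a small list of deployment records. The `Deployment` record, its field names and the function `delivery_indicators` are hypothetical and only serve as an illustration; they are not taken from "Accelerate" [8], and the sketch assumes a non-empty list of deployments.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Deployment:
    lead_time_hours: float                  # time from testing to production
    failed: bool                            # did the change cause a failure?
    recovery_hours: Optional[float] = None  # only set for failed deployments


def delivery_indicators(deployments: List[Deployment], period_days: float) -> dict:
    """Compute the four indicators for a non-empty list of deployments."""
    failures = [d for d in deployments if d.failed]
    recoveries = [d.recovery_hours for d in failures if d.recovery_hours is not None]
    return {
        "deployment_frequency_per_day": len(deployments) / period_days,
        "mean_lead_time_hours": mean(d.lead_time_hours for d in deployments),
        # Reported as 0.0 when no failed deployment has a recovery time.
        "mean_time_to_recover_hours": mean(recoveries) if recoveries else 0.0,
        "change_failure_percentage": 100 * len(failures) / len(deployments),
    }


if __name__ == "__main__":
    history = [
        Deployment(lead_time_hours=4.0, failed=False),
        Deployment(lead_time_hours=8.0, failed=True, recovery_hours=2.0),
        Deployment(lead_time_hours=6.0, failed=False),
        Deployment(lead_time_hours=6.0, failed=True, recovery_hours=4.0),
    ]
    print(delivery_indicators(history, period_days=2))
```

Tracking such values over time is one possible way for a team to check whether a newly introduced automation step actually improves its delivery performance.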

Some of the most widely known practices of DevOps include automated infrastructure, continuous integration and delivery practices, continuous deployment strategies, shared metrics among developers and operations, and more. By implementing the right practices in the correct ways, organizations do not need to consider maintenance and manual work anymore. Instead, they have the opportunity to focus on new ways to improve their systems and drive their business operations, in terms of market share and profitability.

An important part of the DevOps movement is its people, and therefore a great cultural transformation was necessary for the entire system to function. These changes featured mutual trust and respect, continuous collaboration and learning, as well as avoidance of finger-pointing every time failures occurred. Later on, this concept expanded across organizational boundaries, which made DevOps the new ideal mindset towards culture change in people, leadership transformation in business processes and technical shifts when it came to existing practices.


2.1.3 The Impact of Infrastructure on Software Deployment

As previously mentioned, DevOps emerged as a means to enhance deployment, and it did this via continuous delivery and continuous deployment practices. A business can keep excelling in the market if it rapidly produces reliable and secure software. That is how its end customers remain delighted by the service, and that is also a very good reason for deployment being a key piece in this puzzle. Until now, the main software development methodologies were discussed and described in detail. Before we move further, it is time to get an understanding of the impact that infrastructure has had on software deployment over time, because deployment and infrastructure have a tight relationship.

For this reason, three essential eras need to be discussed and all of them have a common goal: "Organizations being able to deliver effective and multiple deployments of applications for end customers". They are shown in figure 2.1, where we can see that they used different technological strategies to reach this objective.

Figure 2.1: This figure shows the architectural evolution of deployments. It is inspired by the official Kubernetes documentation [9].

The first era is known as the “Traditional Deployment” period, because it meant running several applications on physical servers. The issue lay in resource allocation, as it was not possible to define resource boundaries between applications running on the same physical server [10]. For instance, if several applications were running on the same physical server and one of them needed more resources to function, the other applications would lack these resources and consequently they would under-perform [10]. The solution proposed at the time was having one physical server per application, which meant that many resources remained unused, and this in turn limited scalability and increased hardware costs.

The second era is named “Virtualized Deployment”, since it brought virtualization into the picture [10]. Virtual machines (VMs) were fully operating machines that utilized software instead of physical servers to run applications. As can be seen in figure 2.1, this made it possible to run several VMs on top of a single physical server’s central processing unit (CPU) [10]. By running applications in separate VMs and multiple VMs on the same physical hardware, resource isolation was achieved, scalability was improved, costs were reduced and information security was enhanced [10]. When cloud computing emerged, virtualization became even more popular, and tools like Puppet [11] and Chef [12] were created to help with configuration management. In addition, tools like MLN [13] and Terraform [14] were built to help with infrastructure provisioning, and this approach was named Infrastructure-as-Code. It did not take long before problems arose again, because VMs were built on images that contained a specific operating system (OS), and once an image was created, its configuration could no longer be modified. Developers frequently used a different OS in their development environment than the one running in the production environment. Therefore, moving applications between environments and adapting their dependencies and libraries became very challenging, and this created the "shipping problem".

While the operations teams were improving their infrastructure tools and technologies, the developers needed to catch up as well. They turned to containers, which solved the "shipping problem". For this reason the third era is named "Containerized Deployment". Containers are similar to VMs, but they are used more widely. The main difference between them is architectural: a container packages only the application with its own libraries and dependencies and shares the OS kernel of its host, while every VM also includes a full guest OS. This decouples containers from the underlying infrastructure and makes them very light-weight and portable across various cloud platforms and OS distributions [10].

They provide consistency across the development, testing and production environments. All that containers needed was a framework, such as Docker [15], that allows developers to ship their code together with all the components it needs, such as libraries and other dependencies, making it independent of the base system. This technology plays an essential role in the DevOps field, since it helps avoid infrastructure-based conflicts between developers and operations teams.

Even though container run-time Application Programming Interfaces (APIs), such as Docker, are well-equipped to manage individual containers, they are unable to manage multiple containers at the same time [16]. To address this issue, a container orchestration tool like Kubernetes is needed [16]. It is an open-source system for automating deployment, scaling, and management of containerized applications [9]. Kubernetes builds on more than 15 years of experience from Google’s production systems, combined with the best ideas and practices of its supporting IT community [9]. Understanding Kubernetes can be complex, but it has become essential for software-domain companies to establish solid knowledge of its capabilities.


2.1.4 Deployment Strategies

With a clear overview of the way software has been developed and deployed over the years, we can understand what is meant today by the words "better", "faster" and "of a higher quality". Software was once released all over the world on floppy disks, then it started being released faster via web servers, and now most applications are shipped via virtual machines and containers in the cloud. Another characteristic was that software used to be treated as a monolith, which meant that for every change the whole software had to be updated and released again. Nowadays, applications are made of several smaller services, often called microservices, which can be developed and deployed separately. Developers only update the parts that need a change, and these parts get deployed separately.

With so much depending on deployments, the deployment strategy becomes more important than ever. Many competing strategies for deploying these applications or microservices effectively have existed in different periods of time. Now, some of them are combined, so that the result is as effective as possible. A brief explanation of various deployment strategies follows, so that readers can understand the main concepts behind them and, as a result, differentiate between them.

Big Bang Deployment

"Big Bang" deployments update the whole or large parts of an application at a time, which implies a monolithic application. The "waterfall" methodology made use of this strategy, since it dates back to the days when software was released on physical media and installed by the end customer [17]. Using this strategy, the various SDLC phases must be completed in sequence before the application gets to production. Most modern applications take the form of microservices. They are updated independently and frequently, and teams need to be able to roll them back to previous versions and deliver new releases with minimum downtime [18]. Hence, the "Big Bang" approach is considered slow and risky for modern teams.

Rolling Deployment

This deployment strategy is also known as phased, gradual, incremental, or step-deployment [17]. This technique gradually replaces an older application version with a new one and assumes that the application is deployed with multiple replicas, where each replica delivers the same service. The deployment takes place over a longer time frame, but its advantage lies in its ability to reduce downtime and make deployments transparent to the users [17]. This is achieved by keeping some older replicas running to serve user requests while the new version is being rolled out.
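To make the replica-by-replica replacement concrete, the following is a minimal sketch in Python. It only models the bookkeeping: the function name `rolling_update`, the `batch_size` parameter and the version labels are assumptions made for illustration, and no real orchestrator is contacted.

```python
def rolling_update(replicas, new_version, batch_size=1):
    """Yield the list of replica versions after each batch has been replaced.

    `replicas` is a list of version labels, one per running replica.
    Replicas outside the current batch keep serving the old version,
    which is what keeps the service available during the roll-out.
    """
    current = list(replicas)
    for start in range(0, len(current), batch_size):
        end = min(start + batch_size, len(current))
        for i in range(start, end):
            current[i] = new_version  # replace one replica in this batch
        yield list(current)


# Hypothetical example: four replicas of "v1", replaced two at a time.
steps = list(rolling_update(["v1", "v1", "v1", "v1"], "v2", batch_size=2))
for step in steps:
    print(step)
```

A smaller `batch_size` lengthens the roll-out but keeps more old replicas available at every step.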

Blue/Green Deployment

The blue/green deployment strategy differs from the other deployment strategies, since the new version of an application is deployed alongside the existing version. If the testing phase shows that the new version meets all the expected requirements, the traffic is switched from one version to the other at the load-balancer level [18]. Both rolling out a new version and rolling back to the older version can be done instantly. The downside of this approach is that it requires double the amount of resources.

Canary Deployment

Our main topic of interest is canary deployment, which consists of gradually shifting the network traffic from the old application version to the new application version. This means that newer code is deployed to a small part of the production infrastructure, which makes it available only to a small subset of users. This minimizes any negative impact on the user experience as well as the risk of future problems. It is widely considered one of the best ways of testing software, because the software's performance and behaviour (error rate) can be monitored while it receives real network traffic in the production environment [18]. Usually the traffic is split based on certain weights. For instance, 90 percent of the requests go to the old version and 10 percent go to the new version. Then, 70 percent of the requests go to the old version and 30 percent go to the new version. Between iterations, the new version’s performance is monitored to determine how fit it is for production, until all the network traffic is directed to the new application version. The rest of the chapter provides more information about the history and capabilities of this deployment strategy.
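The weight-based split can be illustrated with a small simulation. This is a sketch under stated assumptions rather than a description of any particular load balancer: `pick_version` and `simulate` are hypothetical helper names, and each request is routed independently at random according to the canary weight.

```python
import random


def pick_version(canary_weight, rng):
    """Route one request: "canary" with probability canary_weight percent."""
    return "canary" if rng.random() * 100 < canary_weight else "stable"


def simulate(canary_weight, requests=10_000, seed=0):
    """Return the observed share of requests that reached the canary."""
    rng = random.Random(seed)  # seeded so that the run is repeatable
    hits = sum(
        pick_version(canary_weight, rng) == "canary" for _ in range(requests)
    )
    return hits / requests


# The weights from the text: 10 percent, then 30 percent, finally 100 percent.
for weight in (10, 30, 100):
    print(f"weight {weight}%: observed canary share {simulate(weight):.3f}")
```

Because routing is random per request, the observed share only approximates the configured weight, and the approximation improves with the number of requests.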

A/B Testing

Let us take a moment to remember that a lot of basic and comprehensive testing has already been completed before these deployment strategies take place. It is also important to understand how every technique works, since there are slight differences from one strategy to another. A good example of that is "A/B Testing", which is usually implemented through canary capabilities, since it routes a subset of users to a new feature or functionality under certain conditions [18]. It is widely used for making business decisions based on statistics, so it is considered a comparative test between two versions rather than a deployment strategy. It is one of the techniques that needs to be combined with another, and canary is a natural candidate. A combination of these two would improve not only the deployment workflow in general, but the business decisions as well, since indirect feedback from users is likely to arrive very quickly.

Shadow Deployment

Shadow deployment consists of releasing a new version alongside an existing software version. It is used to test the performance of the new version by sending the same network traffic to both versions. Even though the resources and traffic requests are duplicated, this strategy has no real impact on the end user, since the responses are only taken from the old (stable) version [18]. This is very useful when testing production load on new features. Only when the new feature meets the expected performance is the application rolled out. Its downside is the complexity of its setup and its maintenance requirements.
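The mirroring idea can be sketched as follows. The handler functions and the `shadow_log` list are hypothetical stand-ins for two running application versions and a metrics store, and the sketch calls the shadow version synchronously for simplicity, whereas a production setup would normally mirror traffic asynchronously.

```python
def stable_handler(request):
    """Stand-in for the old (stable) version."""
    return f"stable:{request}"


def shadow_handler(request):
    """Stand-in for the new version under test."""
    return f"shadow:{request}"


def handle_with_shadow(request, stable, shadow, shadow_log):
    """Send the request to both versions and return only the stable response."""
    response = stable(request)
    try:
        # The shadow response is recorded for later comparison, never returned.
        shadow_log.append((request, shadow(request)))
    except Exception as exc:
        # A failing shadow version must not affect the end user.
        shadow_log.append((request, exc))
    return response


log = []
reply = handle_with_shadow("GET /cart", stable_handler, shadow_handler, log)
print(reply)
print(log)
```

The duplicated call is where the extra resource cost mentioned above comes from.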

Feature Flags/Toggles

Even though feature flags are not known as a code deployment strategy, they need to be mentioned in this section, since they control what we call the "feature life-cycle management" [19]. Features are wrapped in conditionals. Toggles are then used to turn the features on and off, while flags can return a wider variety of values. The code interprets these values according to the given instructions. Feature flags are used to support long-term control, percentage roll-outs, and multivariate states [19]. Therefore, they are not an application deployment strategy, but rather a part of deployment strategies.
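A minimal sketch of these three uses (an on/off toggle, a percentage roll-out and a multivariate flag) is given below. The `FLAGS` dictionary, the flag names and the hash-based bucketing are assumptions made for this illustration and do not reflect the API of any specific feature-flag product.

```python
import hashlib

# Hypothetical flag configuration; in practice this would live in a flag service.
FLAGS = {
    "new_checkout": {"enabled": True, "rollout_percent": 25},
    "button_color": {"enabled": True, "variants": ["blue", "green", "red"]},
}


def bucket(flag, user_id):
    """Map a (flag, user) pair to a stable number between 0 and 99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag, user_id):
    """Toggle plus percentage roll-out: unknown or disabled flags are off."""
    cfg = FLAGS.get(flag, {})
    if not cfg.get("enabled", False):
        return False
    return bucket(flag, user_id) < cfg.get("rollout_percent", 100)


def variant(flag, user_id):
    """Multivariate flag: return one of several configured values."""
    variants = FLAGS[flag]["variants"]
    return variants[bucket(flag, user_id) % len(variants)]


def checkout(user_id):
    # The feature is wrapped in a conditional, as described above.
    if is_enabled("new_checkout", user_id):
        return "new checkout flow"
    return "old checkout flow"


print(checkout("user-1"), variant("button_color", "user-1"))
```

Hashing the user identifier keeps each user in the same bucket across requests, so a user does not flip between the old and the new behaviour.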

2.1.5 A Canary in a Coal Mine

As we now understand, canary deployment is part of a bigger paradigm. But why was this deployment strategy named after a bird? Travelling back to the year 1913, John Scott Haldane, a famous Scottish physiologist, recommended the use of canaries in coal mining on the basis of his research on carbon monoxide [20]. Canaries were suggested as a means to detect carbon monoxide and other toxic gases before they could hurt humans, in this case miners. This practice was initiated by British miners, who took canaries down into the mine tunnels with them. If toxic gases were present in the mine, the canaries would be the first to react, since they would get severely ill and die. This was a clear sign for miners to evacuate the tunnels immediately. This technique was used for about 75 years, until 1986, when it was replaced by less brutal techniques that did not involve canaries [20]. Even though it was phased out, this technique is widely known as "Canary in a Coal Mine".

But, how is this analogous to a canary in software deployments? In the same way that the birds were used to mitigate the risk that all the miners were facing in coal mines, a new software version is only made available to a small percentage of the real network traffic and stays alongside the old, stable version, which gets the rest of the traffic. If the new software version behaves as expected, without losses in performance, its user base gets gradually increased. This loop goes on, until the new version completely replaces the old application version. If problems with the canary version start arising though, then the network traffic gets redirected to the old, stable version. As a result, only a small percentage of the users will be affected by the version change. Based on their infrastructures and systems, companies have approached this workflow quite differently from one another. This will be discussed in the next subsection.

The history of canaries’ emergence is very interesting, but how this innocent bird became a game-changer in release engineering is a bit of a puzzle, since the thing we now call "canary deployment" does not have an official "release" date. Therefore, thorough research was needed in order to build a timeline and give it its own origin story.

2.2 Research in Canary Deployment

The first questions one asks when hearing about a new topic are usually about the person or company who invented it, or about its first uses in the real world. Sometimes, as in our case, a timeline of events and thorough research are needed to understand the evolution of a certain topic or technology. We define this as a historic review of the events. Moreover, we need to observe the involvement of academia and industry in the topic. We do that using a taxonomy to summarize the best available evidence on a particular research topic, and we define this as a systematic review. Finally, we need to provide insights and context for the evidence that has been gathered, and we define this as a context review. Now, let’s begin the historic review with an "Oscar of Innovation" award in 2010.

2.2.1 The Rise and Evolution of Canary Deployment

On July 8th, 2010, a software package named "CANARY" [21] won one of R&D Magazine’s "Oscar of Innovation" awards [22]. "R&D" refers to "Research and Development". The magazine was founded in 1959 and has since been a very close partner to research scientists, engineers, and technical staff members at laboratories around the world [23]. The R&D 100 Awards are given annually to the 100 most technologically significant products introduced in the past year [23]. Sandia National Laboratories (SNL) and the United States Environmental Protection Agency (EPA) were the creators of this canary software package, whose goal was to detect anomalies in water during quality assessments. As a rapid and accurate detector of contamination incidents for drinking water utilities all over the world, CANARY aims at providing the highest possible quality of water to their customers. This software analyzes signals from networked sensor data, and it works like a canary analysis system. Once it detects anomalies, it alerts the responsible people in a specified way. To our knowledge, this was the first time a tool got the name "Canary" in the technology world, with a unique description such as "This canary sings out water warnings" [22]. Possibly, though not certainly, this project paved the way for the term "canary" to enter the software engineering world.

Google Chrome Canary

Two weeks later, on July 22nd, 2010, Anthony Laforge, an experienced Program Manager at Google, announced that Google would be rolling out a new release process with the intention of accelerating the pace at which Google Chrome stable releases became available to users [24]. Until then, Google Chrome had three main release channels: "Stable", "Beta" and "Dev". Each of them was associated with a different function. The "Stable" channel was and still is used to reflect the traditional production environment; it is the safest application version for users to engage with. It used to be updated quarterly. The "Beta" channel showcased the upcoming features of Chrome without much risk for the users, but it was not as stable as the previously mentioned channel. The "Dev" channel had a similar purpose to "Beta", but it was less polished and less stable.

The new release process had three main goals, which Laforge explained in his post [24] as follows:

• Shorten the release cycle and still get great features in front of users, when they are ready.

• Make the schedule more predictable and easier to scope.

• Reduce the pressure on engineering to “make” a release.

Each of these goals improved a different component of the DevOps chain. The first one aimed at accelerating the release of new features without any loss of quality. The second focused on implementing a good project management practice, and the third aimed to improve the working process for software engineers by reducing the pressure and stress of fitting new features into a single release cycle. Along with this change, a fourth channel was introduced, named "Canary". Chrome Canary represented the most "bleeding edge" official developments at Google. Therefore, it was the most experimental of all the channels, since its releases were rolled out as soon as their build was complete. There was also no initial testing or usage performed by Google engineers.

This new process shortened the release cycle for major updates in the "Stable" channel from quarterly to six weeks. The "Beta" channel was updated approximately a month before the "Stable" channel. The "Dev" channel was updated once or twice every week, so that some basic release-critical testing could be performed in time. The new "Canary" channel, though, had the fastest release cycle, as it was released daily, according to the time schedules specified by Google. Google has used Chrome Canary to test new features directly in production. If a canary build "gets killed", software engineers either fix it, so that it can go to the "Dev" channel next, or they discard that version.

The Book of Continuous Delivery

These are the first two uses of the "canary" concept in the software world. While the coincidence is likely to be nothing more than that, it is possible that Sandia National Laboratories’ software inspired the name Google gave to its newest release channel.

Later on, authors and professionals in this field developed the concept of canary and adapted it to the software world. Canary was first defined as a releasing methodology in the way we know it today in August 2010, in an influential book: "Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation" [25].

This book was written by Jez Humble and David Farley and presented an in-depth guide to the principles and practices behind continuous delivery and the DevOps movement [26]. Humble and Farley are both well-known authors of several important books about software, as well as professional software engineers who have extensive experience with code, infrastructure and product development in companies around the world. Humble now works for Google Cloud as a technology advocate and teaches at the University of California, Berkeley. Farley is the founder and managing director of a consultancy company named Continuous Delivery Ltd., which advises companies on continuous delivery practices and more. This book became the new "Bible" of continuous delivery, since it provided many ideas for companies to adopt during their organizational transformations. One of them happened to be canary deployment. This marked the real birth of canary as a deployment strategy in release engineering. It also gave it an initial definition.

What is Considered as a Canary?

As a short recap, canary was first used as the name of a software package developed by SNL and EPA, which in theory applied the idea behind the famous methodology among miners, the "canary in a coal mine". Then, "canary" became the name of a new release channel for Google Chrome, which would present the latest features of the newest official versions of Chrome. These versions had not been tested previously, so experienced users or developers could try out the new features and report bugs in time. After this, canary entered the world of software engineering in the form of a deployment strategy, which was very well defined in the book "Continuous Delivery" [25] by Jez Humble and David Farley. At this point, it is crucial to examine how canary deployment has been shaped over time. Despite the great number of approaches companies use, we define the basic requirements for a canary deployment workflow to be:

1. Environment Separation. Even though the rest of the canary workflow is conducted in the production environment, the canary deployment still needs to be separated within the set of servers the company owns. It must be determined where to deploy the canary service/s.

2. Traffic Routing. For a new version to be deployed to a subset of users first, the network traffic needs to be re-routed in such a way that there is no downtime and that users can produce feedback quickly. A control loop must be responsible for managing the network traffic, based on the results of the tests performed in the production environment. This means increasing the percentage of network traffic sent to the canary if it behaves well, and decreasing it otherwise.

3. Traffic Monitoring. One of the most important parts of a canary consists in being able to monitor its behaviour, by setting a number of requirements or metrics that need to be collected and monitored.

4. Traffic Analysis. The next requirement is being able to analyze the collected metrics and come to a conclusion, which will define the fate of the canary. This phase of the canary deployment can be achieved in several ways, such as automated analysis, manual analysis, or just by leaving the server live and waiting to see if any problems are found by its end users.

5. The Final Decision. Depending on the results of the analysis, one might decide to roll out the canary or roll it back, so that it either replaces the old stable version or gets removed from the production infrastructure [27].
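The routing, monitoring, analysis and decision requirements above can be tied together in a small control loop. The following Python sketch is illustrative only: `run_canary`, the `get_error_rate` callback and the `max_error_rate` threshold are assumptions made here, and a real implementation would reconfigure a load balancer and query a monitoring system instead of calling a local function.

```python
def run_canary(steps, get_error_rate, max_error_rate=0.01):
    """Walk through the traffic weights and decide the fate of the canary.

    `steps` is the sequence of canary traffic percentages to try.
    `get_error_rate(weight)` is a hypothetical callback that returns the
    canary's observed error rate while it receives `weight` percent of traffic.
    """
    history = []
    for weight in steps:
        # Traffic routing: the canary would now receive `weight` percent.
        error_rate = get_error_rate(weight)  # traffic monitoring
        history.append((weight, error_rate))
        if error_rate > max_error_rate:  # traffic analysis
            return "rollback", history  # final decision: remove the canary
    return "promote", history  # final decision: replace the stable version


def healthy(weight):
    """Hypothetical canary that behaves well at every traffic level."""
    return 0.001


def faulty(weight):
    """Hypothetical canary that starts failing once it gets 30% of traffic."""
    return 0.05 if weight >= 30 else 0.001


print(run_canary([10, 30, 100], healthy))
print(run_canary([10, 30, 100], faulty))
```

In this sketch a single bad measurement triggers a rollback. Lowering the weight by one step instead, as described under Traffic Routing, would be an equally valid policy.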

Canary Deployment as a Trend

Since 2010, "canary" as a deployment strategy has been referred to differently by several authors. The most commonly found terms for it have been "canary deployment" and "canary release". Using these terms, it is essential to look at how canary evolved from then until now. This will be initially presented by looking at its popularity on the Internet.

In 2006, Google released an online tool named Google Trends, which analyzes the search interest in specified words over time. We used this tool to compare the search terms "canary deployment" and "canary release". It presents the search interest relative to the highest point on the chart, which is given the value of 100. To achieve this, a timeline had to be established. The one chosen in this case runs from 2010 until the present day, since "canary" was introduced then.
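The scaling "relative to the highest point on the chart" can be illustrated with a short calculation. This is a simplified sketch: the function `relative_interest` and the input numbers are hypothetical, and it covers only the final scaling to a peak of 100, not any further normalization the real tool applies.

```python
def relative_interest(raw_counts):
    """Scale a series so that its highest value becomes 100."""
    peak = max(raw_counts)
    if peak == 0:
        return [0 for _ in raw_counts]  # no interest at all in the period
    return [round(100 * count / peak) for count in raw_counts]


# Hypothetical raw search counts for four consecutive periods.
scaled = relative_interest([4, 20, 40, 10])
print(scaled)
```

A value of 50 therefore means that a term was half as popular as at its peak, not that it was searched for 50 times.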

After performing the search, Google Trends creates a graph which shows when a certain topic first started getting attention, its peak point and when the interest in it started slowing down. Figure 2.2 illustrates when "canary deployment" and "canary release" first started getting attention, which is around the beginning of 2010. Even though these terms are used interchangeably today, "canary deployment" represents our topic of interest better than "canary release", since releasing a canary can be used in other contexts as well. Since 2010, the trend of canary deployments has had its ups and downs, since not much exploration was done on it in certain periods. The reasons for this will be discussed in the next section. Still, we can observe an increasing trend in recent years, with a peak that starts in late 2019 and lasts until the present day. Its evolution will be discussed thoroughly in the next section of this chapter to provide a complete understanding of it, with respect to research in academia and industry.

Figure 2.2: Canary in software engineering from its emergence in 2010 until present.

As a short summary, this part discussed the first attempts that paved the way for canary to enter the digital world through the ideas of big research centers, valuable companies as well as popular authors. Another important matter covered in this subsection was related to the popularity of "canary" in terms of software engineering, on the Internet.

2.2.2 A Systematic Review

Some questions that come to mind next are: "What happened to canary since 2010? How did the academic world reply to this new development? Consequently, how did it evolve since then?"

The first step to finding answers was to perform online searches using widely known digital libraries in the world of engineering and technology. The aim of the searches was to find papers containing the words "canary deployment", "canary releasing", "canary testing", "canarying" or "progressive delivery" to build a comprehensive understanding of the previous literature on the topic. These terms all refer to the same concept and were used by various authors in different publications, which made the research work around "canary" even more complicated.

The main digital libraries used to perform the advanced online searches were the Institute of Electrical and Electronics Engineers (IEEE) Xplore [28], the Association for Computing Machinery (ACM) Digital Library [29] and Google Scholar [30]. These databases are considered the most popular sources of papers on software engineering and, consequently, on canary deployment.

IEEE Xplore

IEEE Xplore is a research database used to explore, discover and access conference papers and proceedings, journal articles and related materials on computer science, electrical engineering and electronics, and similar fields [28]. Two journals of the highest interest for software engineers are IEEE Software and IEEE Access.

ACM Digital Library

Similarly, the ACM Digital Library is considered an essential platform for exploring, discovering as well as professional networking [29]. ACM is the world’s largest computing society and serves as a bridge between educators, professionals and researchers with the aim of promoting career development and technical excellence in the field of computing. The library offers a collection of full-text publications from certain important publishers, including several journals, conference papers, magazines, newsletters and books. ACM also offers a guide to computing literature, which provides users with an extensive bibliographic database [29]. The journals of the highest interest for software engineers are the Communications of the ACM and ACM Transactions. One important magazine that publishes high-quality articles is ACM Queue.

Google Scholar

Google Scholar is a very popular search engine that lists scholarly literature from a great number of disciplines, including most peer-reviewed online academic journals and books, conference papers, theses and dissertations, technical reports, and more [30].

After conducting a thorough search through the above-mentioned digital resources, we came to the first conclusion: from 2010 to 2014, few to no papers were found on our topic of interest. The first paper that fully addressed canary deployments appeared in 2015 [31]. This can be explained by several reasons. First, because the canary concept was a cutting-edge technology, time was required to establish a definition for it and a standard way of referring to it. Many papers may have described its use without using a widely known term. Nowadays this is largely solved since, as presented in figure 2.2, the most common term for canary in software engineering has become "canary deployment". The second reason might be the time needed to explore a certain topic of interest, especially when it includes novel technologies. Researchers need resources, time and testing to write a comprehensive scientific paper. That paper then requires time to reach the public, since the peer-review process is long, too.

Another finding was the small number of scientific papers addressing this concept. While the logical reasons explained above still stand, one more interesting reason presented here is the number of novel technologies and paradigms that have evolved around the same time as canary deployment. Due to the fact that canary deployment is considered a DevOps practice, several other research studies needed to be conducted about the movement itself, before reaching the actual practices behind it.


For instance, part of the research went into understanding how companies were adopting automated continuous deployment pipelines, which was studied in the paper by Marko Leppänen et al. [32]. This paper, released in 2015, uncovered that deployment automation was still not fully adopted by the software-domain companies taken into consideration. Even though the focus of the paper was not on canaries, it is still needed to understand the state of the industry at the time and how it was approaching continuous deployment. Then, in 2016, research was conducted on the requirements that DevOps engineers faced in the job market. The paper by Noureddine Kerzazi and Bram Adams [33] found that automation was at the center of them. This shows that researchers were still trying to grasp the concepts of DevOps, so their attention was not yet focused on finding new solutions for canary deployments.

Due to the DevOps movement, new supportive tools and technologies have evolved around the same time as canary deployment. Here, we can mention container frameworks, such as Docker [15], container orchestration frameworks, such as Kubernetes [9], continuous delivery and integration pipelines, such as Jenkins [34], and more. All of them play an essential role in the way canary might be implemented in systems nowadays. Therefore, the literature itself has been more focused on gaining a comprehensive understanding of the new technologies coming from the industry, and canary has been put on hold academically. Now that both research and industry have obtained an overview of the main tools, the time has come to put effort into canary deployment. This makes the present a very suitable time for this work, with regard to the evolution of canaries in software.

We now start to see an explanation for the number of research papers being small and for the ups and downs being so frequent. If observed carefully, figure 2.2 already reflects some of these fluctuations from the year 2015 onward. Figure 2.3, on the other hand, presents the popularity of canary deployment in research, and it serves as a summary of what has been covered so far in this subsection. It is important to mention that this graph includes only papers that explicitly mention the concept of canaries, and they add up to only 15 in total [31, 33, 35–47].

Figure 2.3: Popularity of Canary in Research

The Taxonomy Classification for Papers

The next step is the design of a general taxonomy, which would help classify the papers. This taxonomy was built using a spreadsheet, in order to extract the main information from every paper that was found relevant to the topic of interest. It was not necessary for the word "canary" to be mentioned explicitly, since at the time various terms were used to address the same concept. Table 2.1 presents how the papers were analyzed, in accordance with the categories specified in each row. The second column explains in detail what is included in each category. This helped in looking for patterns in the research literature.

Let us go through the most important categories defined in the taxonomy to observe the need for them in this research. The first one defines the year of the publication of the paper, which means for us to build a certain timeline of events. The second category defines the title of the paper. The third one defines the venue name, which means the name of the publication in which the paper was introduced and the fourth row lets us know about the type of venue. Here, only four values can be used as an answer, since the paper can be published in a conference proceedings book, in a scientific journal, a scientific magazine or an industry-based documentation, which we refer to as a white-paper. Based on where it is published, a certain rating of the paper can be implied. The fifth category defines the domain where the problem statement belongs to, which can be general software engineering, release engineering, cloud computing, DevOps, or other unspecified domain. The sixth class gives us information about the names of the author(s), meanwhile the seventh category provides us with information about their affiliations. If the affiliation is academic, then we can easily assume the incentive of writing the paper being research.

If the affiliation is industry, we can assume that the paper was written with some real-life motivation. If it is mixed, it indicates the strength of the relationship between academia and industry on this topic.

The eighth category describes the nature of the paper's approach. The ninth gives us the main result of the paper, such as an optimization, a new idea, new knowledge about a certain topic or a particular claim. The next row records the takeaways of the paper, such as the source code of a certain tool or product, comprehensive diagrams or encouragement for a movement.

After that, we extract information about how realistic the approach has been. For instance, did it include real-life testing of its problem or features,


THE TAXONOMY

Category            Information Extracted
Year                The publication year of the paper
Title               The title of the paper
Venue name          The venue name of the paper
Venue type          The venue type of the paper, such as: Conference, Journal, Magazine, White-paper, Other
Problem domain      The problem domain of the paper, such as: Software Engineering, Release Engineering, Cloud Computing, DevOps, Other
Author/s            The name/s of the author/s
Author affiliation  The profession/s of the author/s, such as: Industry, Academia, Mixed
Approach            The type of the approach of the paper, such as: Comparative, Model Only, Simulation, Prototype, Case Study, Other
Main result         The main message of the paper, such as: Optimization, New Idea, New Knowledge, Claim
Takeaways           The main takeaways of the paper, such as: Source Code, Product, Diagrams, Movement
Realistic           How realistic the paper is: Academic, Real-life Testing, Features
Canary as a term    How often is canary mentioned: None, Rare, Common, Central
Sentiment           What is the feeling of the author about the paper: Sceptical, Curious, Enthusiastic
Citations           How often is the paper cited: Few times, Common, Popular
Keywords            The keywords of the paper

Table 2.1: The table of taxonomy used to extract information from papers mentioning canary or simply relevant to the topic
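
To make the structure of the taxonomy more concrete, a single spreadsheet row can be thought of as a typed record. The following is only a minimal sketch in Python, not the tooling used in this thesis, which relied on a spreadsheet. The names PaperRecord, VenueType, Affiliation, CanaryUsage and papers_per_year are hypothetical, only a subset of the categories from Table 2.1 is modeled, and the sample rows are placeholders rather than papers from the survey.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class VenueType(Enum):
    # Allowed values for the "Venue type" category of Table 2.1.
    CONFERENCE = "Conference"
    JOURNAL = "Journal"
    MAGAZINE = "Magazine"
    WHITE_PAPER = "White-paper"
    OTHER = "Other"


class Affiliation(Enum):
    # Allowed values for the "Author affiliation" category.
    INDUSTRY = "Industry"
    ACADEMIA = "Academia"
    MIXED = "Mixed"


class CanaryUsage(Enum):
    # Allowed values for the "Canary as a term" category.
    NONE = "None"
    RARE = "Rare"
    COMMON = "Common"
    CENTRAL = "Central"


@dataclass(frozen=True)
class PaperRecord:
    """One spreadsheet row (hypothetical; subset of Table 2.1)."""
    year: int
    title: str
    venue_type: VenueType
    affiliation: Affiliation
    canary_usage: CanaryUsage
    keywords: tuple = ()


def papers_per_year(records, explicit_only=False):
    """Count papers per publication year, sorted by year.

    With explicit_only=True, papers that never mention the term
    "canary" are skipped.
    """
    counts = Counter()
    for record in records:
        if explicit_only and record.canary_usage is CanaryUsage.NONE:
            continue
        counts[record.year] += 1
    return dict(sorted(counts.items()))


# Placeholder rows for illustration only, not papers from the survey.
sample = [
    PaperRecord(2015, "Placeholder A", VenueType.CONFERENCE,
                Affiliation.ACADEMIA, CanaryUsage.CENTRAL),
    PaperRecord(2015, "Placeholder B", VenueType.WHITE_PAPER,
                Affiliation.INDUSTRY, CanaryUsage.NONE),
    PaperRecord(2017, "Placeholder C", VenueType.JOURNAL,
                Affiliation.MIXED, CanaryUsage.RARE),
]

print(papers_per_year(sample))
print(papers_per_year(sample, explicit_only=True))
```

Restricting categories such as venue type and affiliation to enumerated values mirrors the closed value lists in the second column of the table, while free-text categories such as title and keywords stay unconstrained. Counting per year with the explicit_only filter corresponds to the kind of summary shown in figure 2.3, which includes only papers that mention canaries explicitly.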

