Resource Allocation in Geographically Distributed Multi-Cloud Environments

(1)

Resource Allocation in Geographically Distributed

Multi-Cloud Environments

Hans Henrik Sande

Thesis submitted for the degree of

Master in Programming and System Architecture 60 credits

Department of Informatics

Faculty of Mathematics and Natural Sciences

UNIVERSITY OF OSLO

(2)

(3)

Resource Allocation in Geographically Distributed

Multi-Cloud Environments

Hans Henrik Sande

(4)

c 2020 Hans Henrik Sande

Resource Allocation in Geographically Distributed Multi-Cloud Environments

http://www.duo.uio.no/

Printed: Reprosentralen, University of Oslo

(5)

Abstract

With the advent of multi-clouds, cloud consumers are able to lever- age different geographically distributed cloud providers to satisfy the requirements of an application. However, the possibility of per- formance degradation of inter-node latency should an application require resources from different cloud providers still remains a chal- lenge.

This study aims to provide a scalable resource management solution for multi-cloud environments. The proposed solution is a heuristic algorithm, implemented as an allocation module in Apache Mesos.

Apache Mesos is a resource management solution, providing a scal- able way of allocating resources through its two-level scheduling mechanism. Our proposed algorithm bundles resources as resource offers, where the main focus is to minimize the average latency and average bandwidth of each bundle. Affinity propagation clustering was leveraged to cluster the geographically distributed datacenters based on the average latency and bandwidth.

The allocator was tested on a real-world platform with two remote cloud providers. For large-scale evaluations, simulations are used.

Different simulation scenarios were ran, each with different parame-

ters to see how our algorithm performed. The results are compared

with two other algorithms.

(6)

7.2.1 Simulation Scenario 1 - Varying number of slaves . . . 54 7.2.2 Simulation Scenario 2 - Varying total resources of slaves . 56 7.2.3 Simulation Scenario 3 - Varying number of datacenters . . 58 7.2.4 Simulation Scenario 4 - Varying bundle sizes . . . 60 7.2.5 Simulation Scenario 5 - Varying bundling strategies . . . 62 7.2.6 Simulation Scenario 6 - Varying application types . . . 64

8 Application Scenarios 66

8.1 Geographically distributed big data applications . . . 66 8.2 Multi-cloud cost efficiency . . . 66 8.3 Geo-Distributed Machine Learning . . . 67

9 Future work 68

10 Conclusion 69

(8)

List of Figures

1 Illustration of a grid resource management system . . . 11 2 Mesos architecture . . . 23 3 Resource offer example in Mesos . . . 27 4 Mesos architecture after it has been extended to multi-cloud . . . 31 5 Simulation Scenario 1: Average latency, bandwidth and execu-

tion of increasing frameworks . . . 48 6 Simulation Scenario 2: Average latency, bandwidth and execu-

tion of increasing number of tasks . . . 50 7 Simulation Scenario 3: Average latency, bandwidth and execu-

tion of increasing task resources . . . 52 8 Simulation Scenario 1: Average latency, bandwidth and execu-

tion of increasing slaves . . . 54 9 Simulation Scenario 2: Average latency, bandwidth and execu-

tion of increasing slave resources . . . 56 10 Simulation Scenario 3: Average latency, bandwidth and execu-

tion of increasing datacenters . . . 58 11 Simulation Scenario 4: Average latency, bandwidth and execu-

tion of increasing bundlesize . . . 60 12 Simulation Scenario 5: Average latency, bandwidth and execu-

tion using the three different bundling strategies . . . 62 13 Simulation Scenario 6: Overall performance of applications with

varying latency and bandwidth requirements . . . 64

(9)

List of Tables

1 Example data set . . . 36

2 Similarity Matrix . . . 36

3 Responsibility Matrix . . . 37

4 Availability Matrix . . . 37

5 Parameters . . . 44

6 Parameters of Simulation Scenario 1 . . . 49

(10)

Acknowledgements

The writing of this thesis has come together with the help of many people. I would like to thank you all for the kind help you have given me.

I would like to offer special thanks to my supervisor Feroz Zahid, for being a superlative sparring partner and offering valuable feedback throughout this work.

Finally, I will thank my family, friends and girlfriend for being so

patient during the writing of this thesis. Thank you Stephen for

proofreading of the thesis.

(11)

1 Introduction

In multi-clouds, the virtual machines can be geographically distributed. The virtual machines can belong to different cloud vendors. This adds the chance of higher inter-node latencies as opposed to a typical cluster with no nodes geographically distributed. If an application requires coordination among nodes in different geographically located clouds, it can result in reduced performance because of the higher inter-node latency. This creates the demand of having the inter-node latency as minimal possible. Cloud providers only guarantee the availability of a virtual machine, not the latency and response time of accessing the application deployed on the virtual machines. To ensure that low latency applications are deployed on geographically distributed clouds guaranteeing low latency to users, cloud providers has to deploy the application on the “nearest”

version. Deploying parts of either applications or data on different clouds is called multi-cloud deployment. Despite the best effort of cloud providers they are subject to failures, for instance, the outage of GoDaddy resulting in mil- lions of web sites being down. Some advantages associated with multi-cloud deployment is higher availability and redundancy, disaster recovery and geo- presence.

In this thesis, we address the challenge of allocating resources in a geographically distributed multi-clouded environment. The problem is formulated as a linear optimization problem which is NP complete, hence we present a heuristic solution. The solution we present is a heuristic algorithm which provides three different resource bundle categories, all in different sizes. The algorithm focuses on optimizing the datacenters used to fill the bundles with resources. Based on the results from the large-scale simulations, our proposed algorithm performs better than the other two algorithms used for comparison. Our algorithm has a lower execution time, better average latency and average bandwidth on all simulation scenarios.

1.1 Motivation

Resource management is an important issue for almost all computing environments. In cloud computing, it is considered one of the most important aspects in order to achieve efficient use of the underlying hardware [1]. One of the greatest benefits of Mesos¹ is fine-grained resource sharing, which is due to its two-layer scheduling algorithm. With the two-level scheduling algorithm, rather than having Mesos centrally allocate all the resources in the cluster to task, it provides resources in the form of resource offers to frameworks. Frameworks are then responsible to allocate resources to the tasks from within their resource pool. The main advantage of the two-layer scheduling algorithm presented in Mesos is scalability.

1http://mesos.apache.org

(12)

Cloud computing offers easy access to a wide variety of resources dispersed around the world. These resources, which can be infrastructure, software or platforms, can be delivered either by a single provider or multiple ones, known as multi-clouds.

The main goal of this thesis is to provide a scalable resource management solution for multi-cloud environments. By extending the already provided functionality of Apache Mesos, our provided solution was made possible. With this extension, problems not concerned with the original deployment of Mesos arises.

The main problem of this extension is how the allocator should select resources from slaves in potentially geographically distributed datacenters. The allocator has to optimize selection of resources in geographically distributed datacenters with respect to average latency and bandwidth.

By extending Mesos to cater to a multi-clouded environment, it effectively enables the possibility to run greater amounts of tasks than what is possible using the standard version of Mesos, as the option to pool resources from multiple clouds is now present. It also provides a solution to counter a situation where the datacenters of a cloud provider would be down, be it for scheduled mainte- nance, a natural disaster or power outage.

1.2 Outline

The thesis is divided into ten chapters. In Chapter 1 (this chapter), we introduce and motivate our work. Chapter 2 covers the technical background relevant to this thesis. Chapter 3 gives an overview of the Mesos resource management software. Chapter 4 explains the implementation of our solution. Chapter 5 covers the related work and their relevance to our thesis. Chapter 6 gives an overview of the simulation program that was written for our solution. Chapter 7 covers the different simulations ran using our algorithm and the results. Chapter 8 covers different application scenarios our algorithm might be useful. Chapter 9 highlights elements not covered in this thesis, but could be interesting for future work. Chapter 10 concludes the thesis.

(13)

2 Technical Background

In this chapter we cover technologies related to our thesis. The technologies covered in this chapter are Grid computing, Cluster computing and Cloud computing, where a brief introduction to each is provided along with how each deal with resource management. We also cover the K-means clustering algorithm.

2.1 Grid Computing

Grid can be viewed as systems that comprise sharing of resources in a coor- dinated manner, and in dynamic heterogenous environments, solve problems catering to the needs of running applications requiring substantial amounts of bandwidth and computational resources [2]. Classification of the different kinds of grids can be done in the following way: computational grid, access grid, data grid and datacentric grid [2].

• Computational grids deliver high performance computing.

• Access grids enables access of a minor number of specific resources.

• Data grids store and move huge data sets.

• Datacentric grids allows for distribution of repositories of data unable to be stored in a single repository.

Typically, grids are composed of heterogeneous resources. The availability of the given resources varies during execution of a grid application by reason of lacking ownership of the applications resources. Grid applications, generally, require substantial amounts of resources and have a vast spectrum regarding service requirements. Allocation of resources and managing fluctuations of the availability of these resources is a central ability to the provisioning of services in a grid network.

2.1.1 Resource Management

One could define a grid resource as a unit in need of carrying out an operation by an application [3]. The grid resource allocation process is composed of four main functions [4]:

• Scheduling

• Code transfer

• Data transmission

• Monitoring

(14)

Scheduling

Resource scheduling is an application-to-resource mapping process that embodies three main phases [5]; Resource discovery, resource selection and job execution.

Resource discovery: Resource discovery is the part where a search for available resources is performed and a list of the found available resources are gen- erated in a list.

Resource selection: Resource selection is the part where the most optimal resources matching the quality of service criteria are selected from the list gen- erated in the resource discovery phase.

Job execution: Job execution is the phase that embodies submission of jobs to the resources selected from the resource selection phase, and monitoring their execution.

Code transfer

Code transfer involves the transfer of code from the individual tasks to the allocated resources for execution.

Data transmission

Data transmission transfers data required for the execution of a task. After all transfers are completed the executions process commences.

Monitoring

Monitoring is responsible for continuous checking of resource availability, capa- bility, usage and their future reservations.

Grid resource broker

When executing an application, the user interacts with a grid resource broker.

The broker is responsible for resource discovery, scheduling, and processing application jobs on the distributed resources in the grid [6]. When submitting jobs to the broker, the broker has to acquire information about the appropriate resources based on the requirements of the job. This information is acquired by accessing the grid information system. The grid information system is responsible for keeping status information of all the resources [7].

Jobs are split and distributed to the selected resources by the broker. Should degradation of resource performance be detected or the broker discovering a superior resource that can execute the job efficiently, the user is able to restart the job on a different resource. After the completion of a job, the result is retuned to the broker which notifies the user [7].

(15)

Grid application (1)

Results (5)

Processed jobs (4) Unprocessed jobs

(3)

Grid resources details

R₁

R₂

R_n (2)

Resource broker Resource pool

User

Grid information system

Figure 1: Illustration of a grid resource management system

An example grid resource management framework

Ranganathan et. al. introduces in [8] one of the most known frameworks for scheduling in the grid. As the architecture is explained, submission of requests for executing a task from any one of a number of sites is made possible for the user. In each site, not including the local computing system, the system model embodies three main components:

1. The external scheduler (ES), which is responsible for determining a location for the execution of the submitted task.

2. The local scheduler (LS), which is responsible for ordering execution of tasks at the particular location.

3. The dataset scheduler (DS), which is responsible for deciding if and when data should be replicated, and possibly the deletion of files, if necessary.

In general, resource sites contain heterogeneous computing resources that are interconnected by networks independent of vendors. When a task request is received, the external scheduler communicates with the local schedulers. The goal is to find a local scheduler that has the available resources needed to execute

(16)

the task and able to execute it within the specified due date associated with the task. If the external scheduler is able to locate such a local scheduler, then the site where the local scheduler resides is chosen for the execution of the task. If the external scheduler is unable to locate such a local scheduler, it uses a search mechanism to try to locate a local scheduler residing in another site that can meet the requirements regarding processing the task. The local schedulers in the other sites are controlled by other external schedulers. After a preset number of steps has been exceeded, the location of a local scheduler is deemed impossible and the task request is either passed to another local scheduler able to minimize the due date of failure depending on the request parameter of the task. After a suitable site has been located, the external scheduler passes the task request to this site and is then managed by the local scheduler of the site.

2.2 Cluster computing

A cluster of computers is a computer system consisting of a multitude of stand- alone inter-connected computers that cooperates as a single integrated computing resource. The cluster can either be parallel or distributed [9]. Having all the nodes connected through a fast network and the applications communicating with a middleware responsible of presenting the cluster as a single resource is a typical architecture of a computer cluster. The key components of a computer cluster include:

• Multiple stand-alone computers

• Operating systems

• High performance interconnects

• Middleware

• Applications

To support high-bandwidth and low-latency inter-processor communication between cluster nodes, the cluster needs to incorporate fast interconnection technologies.

The single system image [10] (SSI) is implemented by the resource management system to help represent the distributed system as a single computing resources.

By abstracting complexity of distributed system and the heterogeneity of the clusters from the user, better usability is achieved.

A cluster resource management system (RMS) acts as a cluster middleware that implements the SSI for a cluster of machines. The RMS enables users to execute jobs on the cluster, oblivious to the complexities of the underlying cluster

(17)

architecture. Each resource has some status information associated with it, and the status information is maintained by the resource management system.

The resource management system is used to select available nodes that have sufficient amount of available resources, thus being eligible for assignment of jobs. If there are no available nodes with sufficient available resources, the resource management system will leverage job queues. The resource management system place the job on the job queue and keeps it there until there are satisfactory amounts of available resources. Upon satisfactory amounts of available resources, the resource management system uses a job scheduler to determine the jobs currently in the queue that ought to be executed. During the execution part of the job, it is managed by the resource management system which, after completion of the job, returns the results to the users [9]. There are different resource management techniques employed in the clusters, such as LoadLeveler, Load Share Facility (LSF) and Condor.

LoadLeveler

LoadLeveler was developed by IBM to act as a resource management system for their clusters. It enables building, submission and processing of jobs [11]. Upon job submission, each job is inspected to figure out the resources required for the job, this is located in the command file associated with the job along with the due date of the job. After determining what resources the job needs to be executed, LoadLeveler locates the machines that can provide the required resources and provide the best time for job dispatching. Jobs that are unable to be executed based on resource availability of the available machines are put in job queues. By using job classes, a classification mechanism where the administrator can define job classes to restrict users to use a certain job class and decide what jobs a machine is capable of running, it schedules jobs to run on machines.

Load Share Facility

Load Share Facility [12] is a loosely coupled cluster solution for heterogenous systems. LSF uses a layered architecture. By using a layered architecture, LSF allows for extension modules to be installed to provide advanced services. The base layer provides low-level cluster services such as dynamic load balancing and access to the resources available on all nodes in the cluster. The top layer consists of the module providing different utilities.

Condor

Condor [13] manages a cluster of dedicated machines and is also able to execute jobs on non-dedicated machines that are otherwise left idle. Condor automati- cally detects these idle machines and uses them via checkpointing and migration of job processes.

(18)

2.3 Cloud computing

The National Institute of Standards and Technology (NIST) definition remains the standard when referring to the cloud. NIST defines a cloud model being made up of five key cloud characteristics, four cloud deployment models and three cloud service models [14].

The five key cloud characteristics are on-demand self-service, broad network access, pooling of resources, rapid elasticity and measurement of services. The presence of these five characteristics [15] are required for cloud services to align with the definition [14, 16]. The four cloud deployment models are public cloud, private cloud, hybrid cloud and community cloud. The three cloud service models are Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS). To easier visualize the service models, one can envision a stack, where IaaS would be the lowest level of the stack and SaaS being the highest level of the stack.

2.3.1 Key characteristics On-demand self-service

On-demand self-service defines the ability to have request and fulfilment process fully automated, enabling a cloud consumer to request and receive access to a service offering. This greatly aids freeing administrative burdens on the cloud provider side, as fulfilling of the request from the cloud consumer hap- pens without the intervention of an administrator having to fulfil the request manually.

Broad network access

The services of a cloud should be accessible to most devices. The alternative, developing and maintaining an application for every client would prove both time consuming and costly. The cloud consumer should only be required to be in possession of a basic network connection in order to connect and use the services or applications provided by the cloud. Given the fact that cloud consumers might have limited bandwidth, the cloud should require no client or a thin client, seeing as downloading a fat client on a low-bandwidth connection would result in significant amount of time spent waiting.

Resource pooling

As cloud consumers will not be in a constant need of the resources available, instead of having the resources sit idle, they can be used by another cloud consumer. This is the main idea behind resource pooling. This enables the cloud provider to serve significantly more cloud consumers instead of giving a designated amount of dedicated resources. By using virtualization, which allows for a more dense system being able to host multiple virtual sessions on a single system, resource pooling is achieved. When operating in a virtualized

(19)

environment, the resources of the physical systems are added to the resource pool that can be leveraged by other virtual systems, this is also known as multi- tenancy.

Rapid Elasticity

Upon high usage of resources from cloud consumers, the cloud must be able to grow to satisfy the demands, and be able to shrink once the usage sub- sides to mitigate wasting of resources. This is called rapid elasticity and is, in most cases, achieved using automation and orchestration. When the usage of resource exceeds a given point, the cloud begins the process of expanding its capacity. This feature is what enables cloud providers to handle burst capacity, increased capacity required only for a short time period, needed by cloud consumers.

Measured Service

The “pay as you go” feature of cloud computing is made possible through the service measurement. First a quantifiable, appropriate, metric has to be iden- tified. The metric could be time usage, bandwidth consumption or data consumption. Based on the chosen metric, a determined rate is set, and the cloud consumer is charged based on the consumption on the chosen metric relative to the rate.

2.3.2 Service Models

The NIST definition of cloud computing outlines three basic service models [14]:

Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software- as-a-Service (SaaS).

IaaS

The Infrastructure-as-a-Service model primarily revolves around providing computing resources to the cloud consumers. The cloud provider provides, manages and controls the underlying cloud infrastructure. By using virtualization, cloud providers can split, assign and dynamically resize resources to build ad-hoc systems for the cloud consumers. The cloud consumers control the instance of the operating system, storage and their deployed applications [17, 18].

PaaS

As IaaS provides the virtualized underlying infrastructure, the Platform-as-a- Service provides a software platform enabling cloud consumers to run their systems on. Cloud consumers control the applications they deploy and also configure the hosting environment to their preferences. In essence, the various development tools, operating systems and infrastructure provided by the IaaS is rented by the cloud consumer which results in a greatly simplified development experience [17, 18].

(20)

Platform-as-a-Service uses a multi-tenant architecture. By using a multi-tenant architecture, trust relationships regarding security for users, distribution of code and application usages, is maintained [19]. PaaS offers to create composition of multiple web services. These services access databases and re-use services maintained inside private networks [20].

SaaS

In the Software-as-a-Service cloud model the cloud services are offered, which cloud consumers can access by using either a client or a programming interface.

Managing specific user configurations is the only element the cloud consumer has to do, as opposed to the elements listed in IaaS and PaaS [17, 18]. SaaS can either leverage a multi-tenant architecture or, as an alternative, virtualization.

When using virtualization as opposed to multi-tenancy, virtualization-to-cost is used to manage a large number of cloud consumers [19].

2.3.3 Deployment models

How an organization decides to leverage the cloud varies, as every organization has their own unique requirements regarding the amount of control desired for the environment of the chosen services. To best accommodate the spectrum of requirements, the cloud environment can be implemented using the different available service models, where each service model has its own set of requirements and benefits. NIST defines four cloud deployment models [14]: public clouds, private clouds, community clouds, and hybrid clouds. Although not listed in the NIST definition, multi-clouds will also be covered as it is highly relevant to the thesis. A cloud deployment model is defined according to where the infrastructure for the deployment resides and who has control over that infrastructure.

Private cloud

Private cloud infrastructure is dedicated to a single organization or group. Pri- vate cloud could either be owned by the organization or leased. It may be managed by the organization or a third party and can exist on-premises or off- premises. Private cloud is more expensive and secure when compared to public cloud [21]. Private clouds are hosted within the firewall of the organization.

Private clouds are flexible and service-based. Processes, services and data are managed within the organization. In private cloud there are no additional security regulations or legal requirements that can be present in a public cloud environment. By using a private cloud, the cloud service providers and the cloud consumers have optimized control of the infrastructure and improved security, since user’s access and the networks used are restricted [22].

Public cloud

A public cloud is one in which a third-party provider offers computing services over the public Internet [23]. It is offered on a pay-per-usage model. The

(21)

cloud service provider is responsible for setting up the hardware, software and networking resources. Public clouds does not imply that the data of the user is public. In many cases, access control mechanisms are required before the cloud consumer can make use of cloud resources.

Community cloud

When a number of individuals or organizations govern and manage a cloud computing solution, it is known as a community cloud. It could also be managed by a third party. They are, in most cases, built and operated for a specific group where the participants have similar cloud requirements and their incentive is working together to accomplish shared goals. Community clouds is a hybrid form of private clouds. For organizations working on projects together or doing research together and possibly bound by similar law regulations, the community cloud is a superb option [24].

Hybrid cloud

A hybrid cloud is a mix of public and private clouds leveraging the benefits of both. Under certain times when the resources available at the private cloud are insufficient, the company can leverage resources from a public cloud. They can also tap into the benefits of the public cloud, while maintaining the other desirable benefits of the private cloud by having security sensitive applications or sensitive data reside in the private cloud [22]. By being able to leverage resources from the public cloud, when resources are insufficient on the private cloud, they are able to perform cloud bursting, having overflow traffic directed from the private cloud to the public cloud to mitigate interruption of services [25].

Multi-Clouds or Interclouds

Leveraging multiple clouds from multiple vendors is known as a multi-cloud.

They could be private or public. The motivation behind multi-clouds is to better distribute computing resources and decrease the chances of potential downtime or data loss by having the limits of a single cloud service exceeded. As the simultaneous use of multiple cloud providers is made possible, the multi-cloud strategy helps avoiding vendor lock-in. By having a combination of multiple clouds, computational power and available storage is increased as opposed to having a single cloud service [26].

Resource management in cloud computing strives to offer a high availability of resources, enabling multi-tenancy, providing a fulfilling time variant service model, offer efficiency, and lastly, provide reliable resource usage.

As stated in [27], resource management in clouds can be divided up into two phases: Ab-initio Resource Assignment and Periodic Resource Optimization.

(22)

Ab-initio Resource Assignment

Ab-inito resource assignment is the initial resource assignment process, meaning that resources are requested by the cloud consumer for the first time. The resource manager has to locate and identify the differing resources provided by the cloud provider. After successfully locating and identifying available resources, they are then gathered and a negotiation process with the cloud consumer is initiated where the goal is to make sure the resources are available as per requirement. Upon completion of the negotiation, the resources are grouped logically as per the requirements of the cloud consumers where the resources most satisfactory regarding the requirements of the cloud consumers are chosen. The final step is having the virtual resources selected in the previous step mapped to physical resources and then allocated to the cloud consumer.

Periodic Resource Optimization

Periodic resource optimization is presented as a process for two different categories of resources which are non-virtualized resources and virtualized resources.

The non-virtualized resources are also called as physical resources. For both categories of resources, periodic resource optimization contains similar steps.

The only difference is that virtualized resources can be assembled together as per the resource requirement and can be disassembled also. For both non- virtualized and virtualized resources, they first are monitored to analyze how they are utilized. Then a prediction of which resources are going to be required by the application is made. The predicted resources are then negotiated with the cloud consumer to ensure their availability. The resources can be be scaled up or scaled down. The resources are then reallocated to the cloud consumer and pricing based on the resource usage is then made.

2.4 Clustering Algorithms

In this thesis, we are using the affinity propagation clustering algorithm to cluster datacenters based on their average latency and average bandwidth to each other. Since we use K-means to compare with our selected clustering algorithm, it is introduced here.

2.4.1 K-means clustering

K-means is a centroid-based algorithm, or a distance-based algorithm, where we calculate the distances to assign a point to a cluster. In K-Means, each cluster is associated with a centroid.

The K-means clustering algorithm divides M points in N dimension into K clusters so that the within-cluster sum is minimized. It does not yield the most optimal result, but rather a local optima. In a local optima of K-means there

(23)

is no movement of a point from one cluster to another that will reduce the within-cluster sum of squares.

The algorithm requires as input a matrix ofM points in N dimensions and a matrix ofK initial cluster centers, known as centroids, inN dimensions. The number of points in clusterLis the Euclidean distances between pointIandL.

The general procedure is to search for a K-partition with locally optimal within- cluster sum of squares by moving points from one cluster to another.

(24)

3 Mesos

Mesos is an open source platform enabling numerous various cluster computing applications to share commodity clusters. It was originally developed at the University of California, Berkeley [28]. Mesos resides in the middle of the operating system and the application layer, efficiently enabling the deployment and management of applications in a large-scale clustered environment in a more simple way. What makes Mesos so unique and scalable is the two-level scheduling mechanism it offers. By using resource offers, Mesos enables the sharing of resources in a coarse-grained and scalable way. Mesos only decides how many resources each framework should be offered, but it is up to the scheduler of the framework to decide if the offer is sufficient to run its computational tasks on. Frameworks also have the ability to reject offers, this enables Mesos to better satisfy the constraints of a framework seeing as Mesos is oblivious to the constraints a framework might have. Through the usage of delayed scheduling, where a framework waits 1-5 seconds to acquire nodes storing the input data, Mesos can yield nearly optimal data locality.

3.1 How does Mesos work?

Mesos enables applications to request computational resources without the need of a user knowing how the resources are translated to virtual machines or bare- metal by providing an abstraction layer. Drawing parallels to how the operating system of a personal computer access the resources available to it, Mesos ensures the accessibility of needed resources by the applications [28]. As opposed to configuring multiple clusters of servers to be able to handle different parts of an application, Mesos takes a different approach. It contains a pool of servers that is shared by the frameworks running in the Mesos cluster. All the servers residing in the pool are able to execute the various parts of the application with them interfering with one another. They are also able to, if necessary, allocate resources across the cluster dynamically. Mesos leverages features provided by the modern kernel, cgroups in Linux, to provide isolation of the resources.

The key components of Mesos are:

• Schedulers

• Executors

• Master

• Slaves

• Frameworks

(25)

The Mesos Master

The primary job of the master node is to allocate the resources available in the cluster using the specified sharing policy. It does so in the form of resource offers. A resource offer can be viewed as a unit of allocation composed of available resources on a slave, that is sent to the frameworks. The master is also responsible for managing the task life cycle of the frameworks.

Frameworks

Mesos defines frameworks as a software system that manages and executes one or more tasks on a cluster. Tasks are the consumers of resources. Each task can be unique. A framework in Mesos consists of two components: a framework scheduler and framework executor. Framework schedulers are responsible for coordinating the execution. A framework executor provides the ability to control the task execution.

The Scheduler

While the master node determines how many resources to offer to each framework, it is the scheduler of the framework that selects which of the offered resources to use. When a framework accepts offered resources, it passes Mesos a description of the tasks it wants to launch on them. Frameworks are able to implement whatever scheduler they prefer.

The Executor

An executor is a process that is launched on slave nodes to run the tasks of the framework. The slave is responsible for starting the executor process. Once started, it uses the SUBSCRIBE call from the Mesos HTTP API [29] to subscribe to a slave. After subscribing, the executor can launch its tasks there.

A single executor can launch multiple tasks. By sending information regarding the status of the task, the executor is able to notify the slave about changes regarding the task. The slave notifies the framework regarding changes of the state of the task. The framework performs health checks and all operations needed to run a task by the framework.

The Mesos Slave

The main responsibility of the slave is to execute the tasks of a framework by using the resources of the slave. To create a more clear picture of the resources on the slaves, they can be divided into two categories, slave resources and slave attributes. Attributes primary role is to associate slave with certain information, whereas slave resources are consumable elements used by a task. It is the slave resources that are allocated to the frameworks and attributes used to identify, for instance, what operating system the slave is running or what software version on it.

(26)

Communication

Mesos components leverages the Mesos API [29] to communicate using the following:

• Scheduler API: This is used to communicate with the framework scheduler and master.

• Executor API: This is used to communicate with an executor and the Mesos slave.

• Internal API: This is used to communicate with the Mesos master and slave.

• Operator API: This is the API exposed by Mesos for operators and is used by web UI, among other things. Contrary to the rest of the Mesos API, this API is synchronous.

Figure 2 shows the main components of Mesos. Mesos consists of a master that manages slaves running on each cluster node, and Mesos frameworks that run tasks on these slaves.

3.1.1 Architecture

Efficient share of clusters for the frameworks is enabled through the scalable and resilient core provided by Mesos. As the spectrum of frameworks is vast and ever-changing, it is important to place most of the control regarding task scheduling and execution at the frameworks, and expose a lightweight interface to the framework to enable sharing of resources in an efficient manner. The primary benefits of trusting the frameworks with the control is to keep frameworks as loosely coupled from Mesos as possible. By doing so, frameworks are able to implement their preferred way of dealing with cluster related problems (fault handling, data locality) and also able to evolve independently of Mesos. It also frees Mesos from additional work, keeping in simple and reducing how often the system is required to change, and thus being able to maintain it scalable and robust in a simple manner.

By enabling frameworks the option to reject offers, Mesos offers an alternative to frameworks having to specify any constraints they have and the type of resources they require. If a framework deems a resource offer unsatisfactory it can reject it and wait for an offer that satisfies the constraints the framework might have. With the option to reject offers, frameworks can handle complex resource constraints instead of Mesos, which adds to the point of it being kept scalable and simple. A downside of only using rejection, is that frameworks might have to wait a substantial amount of time for an offer it will accept, rejecting many offers til an acceptable offer is offered. To combat this, Mesos provides filters for frameworks, which enables frameworks to specify resources it will always reject.

(27)

(7) Send resource offer

to framework (7)

Send resource offer

to framework

(9) Description of tasks

to be run sent to agent

to be run sent to agent Mesos master

(3) Register with master (8)

Accept offer along with description of tasks to be run

Spark scheduler

(3) Register with master

(8) Accept offer along

with description of tasks to be run Hadoop scheduler

(5) Send available resources

to master

(5) Send available resource

to master

Task(s) Task(s)

Spark executor

Hadoop executor

Mesos Agent Mesos Agent

(4) Executor of framework

added to agent

(4) Executor of

framework added to

agent (6)

Generate offers based on allocation policy

module

(1) Master initialized

(10) Executor

runs tasks

(10) Executor

runs tasks

Figure 2: Mesos architecture

Mesos provides performance isolation between framework executors running on the same slave by leveraging existing operating system isolation mechanisms.

Since these mechanisms are platform-dependent, Mesos support multiple isolation mechanisms through pluggable isolation modules.

Mesos have three mechanisms to ensure task scheduling is done in an efficient manner whilst being robust to failure. The first mechanism is one mentioned earlier, the option to reject specific resources. As of now, two types of filters are available, filtering based on amount of available resources on the slave or a whitelist of slaves. The second mechanism, is counting resources offered to a framework towards its allocation, this incentives frameworks to quickly respond to offers and filter unusable resources. Lastly, if a framework has not responded to an offer for a sufficiently long time, Mesos rescinds the offer and re-offers the resources to other frameworks.

(28)

3.1.2 Resource Management in Mesos

Given the fact that Mesos is able to handle such a diverse workload, there is no silver bullet for a scheduler that can satisfy the requirements of all the various frameworks. A scheduler for a long running task will schedule in a different way compared to how a scheduler for a batch processing framework would. The solution Mesos has opted for regarding its design is paramount for the mentioned problem: separating resource allocation and task scheduling. The master determines what framework gets which resources, whereas task scheduling is revolved around how the scheduler of the framework is responsible for deciding how to use the allocated resources in the manner deemed most fit by the scheduler.

Following is a typical flow of events for a framework in Mesos:

1. The master node is initialized.

2. Once the master is initialized, the slaves registers with the Mesos master.

3. The framework scheduler registers itself with the Mesos master.

4. The executor of the frameworks are added to the slaves.

5. The slaves send their available resources to the Mesos master.

6. The Mesos master runs the its allocation module and determines the frameworks to be the recipients of the resource offers.

7. The Mesos master sends the resource offers to the frameworks.

8. The frameworks receives the resource offers, the framework scheduler veri- fies that the offer is satisfactory according to its needs. Should that be the case, the framework scheduler accepts the offer and sends a description of the tasks it wants to run to the Mesos master. If the offers were not able to satisfy the needs, the offers are rejected and the framework will have to wait for a better offer.

9. The Mesos master sends the description of tasks to the slaves.

10. The slave allocates the requested resources and launches the task executors. The executor is launched on slave nodes and runs the framework’s tasks. Upon completion, the framework will unregister itself from the master and thus not have offers sent to it anymore.

The opted design mentioned above is in essence what enables it do be so scalable, stable and simple, seeing as allocation of resources is done in a way oblivious to how they are scheduled. However, the scheduler has no global view of how the resources are utilized and therefore allocation decisions has the chance of being suboptimal. Another concern is the type of resources a framework might need for execution, as the frameworks do not pass this information to the master, it is instead solved by giving frameworks the ability to rejects offer that they deem unfit for using for execution.

(29)

Dominant Resource Fairness algorithm

Mesos resource allocation is based on Dominant Resource Fairness [30] (DRF).

The motivation behind DRF was due to the lack of allocation policies for environments with resources of different types in a heterogeneous environment. For each user DRF computes the share of each resource allocated to the user, where the largest share of a given resource amongst the allocation is the dominant share of the user, this may differ between users.

Briefly explained, the goal of DRF is to maximize the minimum dominant share across all users. An example being, Alice wishes to run CPU intensive tasks, and Bob wishes to run memory intensive tasks, DRF tries to equalize the share of CPUs Alice gets with the share of memory that Bob gets. Refer to [30] for more details.

DRF has many provable properties:

1. Strategy proofness: Users cannot lie to improve the allocation to them.

2. Sharing incentive: DRF has a minimum allocation guarantee for each user, and no user will prefer exclusive partitioned cluster of size 1/N over DRF allocation.

3. Single resource fairness: In case of only one resource, DRF reduces to the min-max algorithm.

4. Envy free: Every user prefers his allocation over any other allocation of other users. This also means that the users with the same requests get equivalent allocations.

5. Bottleneck fairness: When one resource becomes a bottleneck, and each user has a dominant demand for it, DRF is equivalent to min-max.

6. Monotonicity: Adding resources or removing users can only increase the allocation of the remaining users.

7. Pareto efficiency: The allocation achieved by DRF will be pareto efficient, meaning it would be impossible to improve the allocation for any user without worsening the allocation of another user

(30)

3.1.3 Example of offer generation in Mesos

Figure 3 shows how a framework gets scheduled to run a task.

There are four events in the figure:

1. Reporting to the master of free resources 2. Sending offers

3. Framework replying 4. Launching of tasks

As the slave has connected to the Mesos cluster, it notifies the master node of all available resources residing on the slave. The master then invokes the implemented allocation policy module to decide how these resources should be shared amongst the frameworks in the cluster. In this case, the allocation policy module decides that framework 1 should be offered all available slaves on the newly connected slave. The master sends a resource offer to the scheduler of the framework, describing exactly what is available on the slave to the framework.

The framework decides whether it wants to launch tasks using the offer, in this case, it accepts the offer and the framework’s scheduler replies to the master with information about two tasks to run on the agent, using (2 CPUs, 1GB RAM) for the first task and (1 CPUs, 2GB RAM) for the second task. Lastly, the master sends the tasks to the agent, which allocates appropriate resources to the framework’s executor, which in turn launches the two tasks. Seeing as 1 CPU and 1 GB RAM are still unallocated, the allocation module could now offer the remaining resources to another framework. The resource offer process repeats when tasks finish and new resources becomes free.

(31)

Allocation module

FW Scheduler Job 1 Job 2

Task Task Framework 1

Agent 1 Executor

Mesos master

<s1, 4cpu, 4gb, ... >

(2)

<s1, 4cpu, 4gb, ...>

(1)

<fw1, task1, 2cpu, 1gb, ..>

<fw1, task2, 1cpu, 2gb, ...>

(4)

<task1, s1, 2cpu, 1gb, ... >

<task2, s1, 1cpu, 2gb, ...>

(3)

Figure 3: Resource offer example in Mesos

(32)

4 Implementation

Our proposed algorithm is implemented as a Mesos resource allocator. Apache Mesos offers a pluggable mechanism to implement a scheduling policy through the allocator. The allocator encapsulates the logic that the master node uses to determine which frameworks to make resource offers to. The default sharing policy used by Mesos is the weighted Dominant Resource Fairness algorithm.

Implementing our allocator module with a sharing policy that allows for resource offers using multiple datacenters needed the following items:

• Associate slaves to a datacenter

• Ability to combine resources from multiple slaves from different datacenters into an offer.

4.1 Associating slaves to datacenters

Apache Mesos is implemented to handle resource allocation within a single datacenter. With our task of extending Mesos to be able to cater to a multi-cloud environment, the need to differentiate between slaves, with respect to what datacenter they belong to, arose.

Mesos utilizes protocol buffers extensively for messaging and serialization. Pro- tocol buffers offers a language-neutral, platform-neutral method of serializing structured data. It is both highly flexible and efficient [31]. A developer can specify how the serialized information ought to be structured by defining protocol buffer message types in a .proto file. Each protocol buffer message is a small logical record of information, containing a series of name-value pairs. Follow- ing is the message containing information about a slave from the mesos.proto file.

message SlaveInfo {

required string hostname = 1;

optional int32 port = 8;

repeated Resource resources = 3;

repeated Attribute attributes = 5;

optional SlaveID id = 6;

...

}

Numerous name-value pairs in the SlaveInfo message stems from flags set by the command-line arguments provided when launching a slave to run inside Mesos.

Adhering to this coupling, an additional flag for the id of the datacenter the slave belongs to were added. Changes were also made to the Slave code, specif-

(33)

ically adding the id of the datacenter to the Flags object within the slave. To communicate the newly added flag, the structure of the SlaveInfo message were altered. The chosen solution resulted in creating a new message to represent the datacenter id, and adding this as a name-value pair in the SlaveInfo message.

One could argue that the necessity for an independent message to represent the datacenter is redundant, and rather used a primitive value to represent it.

By changing the structure of the SlaveInfo message and include a new name- value pair to describe the datacenter where the slave will run, providing an additional flag when launching a slave, the necessity to associate slaves to datacenters is accomplished.

message SlaveInfo {

required string hostname = 1;

optional int32 port = 8;

repeated Resource resources = 3;

repeated Attribute attributes = 5;

optional SlaveID id = 6;

required DatacenterID datacenter_id = 7;

...

}

message DatacenterID {

required string datacenter_id = 1;

}

4.2 Combining resources across datacenters into offers

As stated in Section 4.1, Mesos manages resources in a single datacenter. Ex- tending it to a multi-cloud environment proposes some issues regarding resource offers.

• Selection of a satisfactory combination with respect to latency, bandwidth and number of slaves

• Strategy for combining resources

By enabling the opportunity to offer resources from slaves in different datacenters, and effectively different geographical locations, having to consider the latency and bandwidth between the datacenters becomes paramount. A combination can contain resources from different datacenter. Implementing affinity propagation clustering as our clustering algorithm greatly aids the selection of geographically close datacenters. We cluster datacenters based on their respec-

(34)

tive latency and bandwidth to each other. Affinity propagation clustering will be explained in greater detail later in Section 4.3.2, with an example for illustration. In short, it uses message passing between the different data points, datacenters in our case, to determine a set of data points that best exemplify the data and associates each data point with one exemplar, where the exemplar being the “representative” of the cluster.

Our strategy for selecting and combining resources is a bundling strategy. The idea is to take resources from different slaves in potentially different geographically placed datacenters and offer them to frameworks as a single combined offer. As per the current implementation, each framework will be offered three different offers, all in increasingly bigger sizes. However, an arbitrary number of categories can be supported by the algorithm. The three categories as per the current implementation are Small, Medium and Large, all having a set amount of resource size associated with them. For example, Small (2 CPU, 2 RAM), Medium (4 CPU, 4 GB RAM), Large (6 CPU, 6 GB RAM). However, it is possible that a framework wanting to run a task that exceeds our Large category, thus not ever being able to run. There is also the chance of over-allocation of resources and resource fragmentation, as a framework will accept one of the bundles, and there is no guarantees that the resource requirements of the task(s) is equivalent to the accepted bundle, thus resulting in some resources sitting idle and potentially poor resource utilization. To counter this a fine-grained approach would be necessary.

We have demonstrated a fine-grained bundling strategy using simulations. This strategy will create (at most) three bundles based on the requirement of at most three next tasks the framework. Every bundle will be tailored for the task, this will completely remove any over-allocation of resources, thus eliminating fragmentation. This requires the allocator to have knowledge of the resource requirements of all the tasks of each framework, in order to successfully create a bundle based on the requirements of a task. This was not implemented in the real-world experiments as Mesos does not keep track of resource requirements of frameworks, the motivation behind the two-level scheduler mechanism used in Mesos was to present a more scalable alternative to this approach.

(35)

(2) Slave register

with master (3) Framework register

with master (5) Send available resource to master

(2) Slave register

with master (3) Framework register

with master (5) Send available resources to master (8)

Accept offers along with description

of tasks to be run (7)

Send resource offer

to framework

to be run sent to agent

Mesos master Scheduler

Datacenter B Datacenter C

Datacenter A Executor

Mesos Agent

(1) Master initialized

Executor Scheduler (4)

Executor added to agent

(4) Executor added to agent

(6) Generate offers

using our allocation module

(8) Accept offers along with description

of tasks to be run

to be run sent to agent (10)

Executor runs tasks

(10) Executor

runs tasks (7)

Send resource offer

to framework

Figure 4: Mesos architecture after it has been extended to multi-cloud

4.3 Deep dive into the allocator

In this section, a thorough explanation of the code that was written² will be given. Modifications regarding the allocator will also be addressed. The repository for Apache Mesos was forked from GitHub³. The default allocator used, as previously mentioned, is weighted DRF. By using this as the foundation for

2https://github.com/HHSande/mesos

3https://github.com/apache/mesos

(36)

our algorithm, it not only saved us time, but a lot of work not necessarily concerned with realizing the multi-cloud aspect of the allocator, but rather Mesos functionality, was already implemented for us.

4.3.1 Implementation

The allocator is written in C++11. We will focus on the two main aspects most important to our implementation, namely the steps before generation of offers starts and the generation of offers.

Preparing for offer generation

Once the master node is initialized it will execute our clustering algorithm. It reads latency and bandwidth values that are normalized between 0 and 1, such that the higher values implies better results for both bandwidth and latency.

For our thesis, the collection of latency and bandwidth are made by randomly generating numbers to represent the latency and bandwidth between datacenters. To actually collect real-world values for bandwidth and latency, one could use the tool iperf3⁴ for bandwidth collection. For latency collection one could ping between datacenters for a set amount of time and calculate the average value for each pair of datacenters.

After the master node has been initialized and the clustering algorithm has been executed, two things has been achieved.

1. The allocator has stored the exemplar and datacenters having the given exemplar as a representative in a map to use as a look up later in the process.

2. The slaves can connect to the master. When a slave connects with the master, the id of the slave and an object representing the slave will be stored in the allocator. When a slave disconnects the id and object will be removed. This ensures that the allocator has the correct slaves to calculate offers to generate from.

The required steps are now complete and the allocator is now ready to generate offers to frameworks.

Generation of offers

Before initiating the generation of offers, the registered frameworks are shuffled, to ensure that every framework has the equal chance of being selected first when the amount of resources during the current offer generation is the highest.

For instance, framework A and B are long running tasks, and combined they require all available resources, this would lead to framework C having to wait an extensible period of time before it could run its short running task.

4https://iperf.fr/

(37)

When traversing the frameworks to be made offers to, a random cluster is selected to start our search in. From this randomly picked cluster, a random datacenter within is selected. As the possibility of selecting a datacenter that is not satisfactory close with regards to geographical location as the starting point for the search is present, one could implement a similar strategy as was done with adding datacenter id to the slaves. The framework would then have a datacenter associated with it, and thus enable the selection of the most optimal starting point in terms of geographical location.

The goal of each offer generation to frameworks is to be able to fill all three bundles with a satisfactory combination of slaves, once one of the bundle categories are completed it is stored in a hashmap, and the filling of the next bundle continues.

For each framework, there is an associated hashmap with the different bundle categories that were made, for each category there is an associated list of pairs containing the slave id and the amount of resources taken from the given slave.

While iterating through frameworks awaiting offers, as previously mentioned, a random cluster and a random datacenter within the cluster is selected. The allocator keeps track of which datacenters and clusters it has visited, when all the slaves in a datacenter has been checked, the datacenter is marked as visited, when all datacenter inside the cluster is visited, the cluster is marked visited.

Before searching the slaves in a datacenter for resources to fill the current bundle with, the slaves are shuffled. The motivation behind this is to mitigate allocating from only the slaves in front of the list, as this would lead increasingly heavier load on the front slaves, propagating slowly backwards, rather than having it more evenly distributed.

In order for a slave to be an eligible candidate for contribution to the bundle, it has to either have resources equivalent or greater than the remaining resources needed to complete the the bundle, or resources equivalent or greater than the set threshold.

Two parameters that can be modified are the maximum size of the bundles and a threshold regarding resources for each slave to be selected for the current bundle. Maximum bundle size dictates how many slaves can make up a bundle, and the threshold specifies a given percentage of resources the slave must have to be selected.

For each slave in the datacenter, the total offerable resources are retrieved and verified whether the slave can provide the remaining resources needed to complete the bundle or the slave has resources equal or greater to a set threshold.

(38)

Two outcomes can occur at this point:

• The slave has enough resources to satisfy either the remaining resources needed for the bundle or it satisfies the threshold. The resources from the slave and the id of the slave gets paired and associated with the current bundle. Lastly, the bundle is added to the storage of bundles.

• The slave does not have enough resources to satisfy both the remaining resources needed or the threshold. The allocator confirms whether smaller bundle categories have been made, if they have, it will compare any bundle made using two or more slaves with the current slave. Should the slave satisfy the resource requirement of one of the smaller categories, the allocator will release the resources used by the slaves that previously made up the bundle and replace it with the current slave. If the slave has insufficient resources to replace one of the smaller categories the allocator continues to attempt to fill the current category using the next slave.

If the allocator is unable to create all bundle categories from the datacenter, it will continue to search in another datacenter. By using the stored cluster, the allocator can retrieve the most optimal datacenter relevant to the datacenter that was currently visited. The same process applies when the allocator has to start searching in another cluster. The allocator calculates the latency and bandwidth cost of the neighbouring datacenters relative to the current datacenter and returns the datacenter with the greatest score.

When all possible bundles have been made given the resources available, the allocator starts sending out offers to the frameworks, using the bundles given to the respective frameworks. Given that Mesos frameworks are not written to handle a combined unit of offers, the decision was to adhere to the architectural style and send each bundle as a single independent offer. The allocator will first try to send the smallest, as this have the highest chances of having the lowest latency and bandwidth as it can more likely be satisfied with the fewest amount of slaves, and thus fewest amount of different datacenters, if the framework is rejects the offer, the allocator sends the next category, and repeats this process until the framework has either accepted or rejected all and will simply have to wait. If a bundle is accepted, the larger categories, if any, will have their resources released.

To make the frameworks operate according our proposed new way of offering resources, the allocator would send a single offer, containing all the bundles, and the framework would decide which of the bundles would suffice. Enabling this would lead to a less “chatty” system, and the frameworks having to process less offers as the amount of offers sent to each framework would decrease from minimum 1 and maximum 3 to constant 1.

(39)

4.3.2 Using Affinity propagation

What makes the affinity propagation a superior choice for our task compared to k-means, as opposed to having to choose the amount of clusters and initial set of points, by using affinity propagation, the gradual emerging of exemplars and associated clusters is enabled through the exchange of real-valued messages between the data points [32].

Matrices are used to represent both data sets. Using matrix as representation is well-suited for datasets that are dense. The message exchange between the points is the equivalent to matrix manipulation. The algorithm scales well with size, as it uses four matrices for its calculation:

• Similarity matrix

• Responsibility matrix

• Availability matrix

• Criterion matrix

As input, the algorithm requires two sets of data:

1. The representation of how fit a given point is to be the exemplar of another point. This is the similarity between the data points. No similarity between the two points means they cannot belong to the same cluster.

2. The second set describes how suited data points are to be an exemplar, called preferences.

The iterations are performed until either the cluster boundaries remain un- changed over a number of iterations or after some predetermined number of iterations steps regarding message passing is executed during each iteration:

1. Responsibility calculation: Responsibility r(i, k) shows the accumulated evidence for how fit point k is to be point i’s exemplar, also accounting for other potential exemplars for pointi. Responsibility is sent from data pointito candidate exemplar pointk.

2. Availability calculation: Availability a(i, k) shows the accumulated evidence for how appropriate it would be for pointito choose pointkas its exemplar, taking into account the support from other points that pointk should be an exemplar. Availability is sent from candidate exemplar point kto point i.

Initially, both matrices are initialized to all zeroes, and after each iteration affinity propagation uses the similarity and availability calculated in the previous iteration in order to calculate the responsibility of the data points. By summing the responsibility and availability matrices, clustering information regarding exemplars is retrieved.

Resource Allocation in Geographically Distributed Multi-Cloud Environments

Resource Allocation in Geographically Distributed

Multi-Cloud Environments

Hans Henrik Sande

Thesis submitted for the degree of

Master in Programming and System Architecture 60 credits

Department of Informatics

Faculty of Mathematics and Natural Sciences

UNIVERSITY OF OSLO

Resource Allocation in Geographically Distributed

Multi-Cloud Environments

Hans Henrik Sande

Abstract

This study aims to provide a scalable resource management solution for multi-cloud environments. The proposed solution is a heuristic algorithm, implemented as an allocation module in Apache Mesos.

The allocator was tested on a real-world platform with two remote cloud providers. For large-scale evaluations, simulations are used.

Different simulation scenarios were ran, each with different parame-

ters to see how our algorithm performed. The results are compared

with two other algorithms.

Contents

List of Figures

List of Tables

Acknowledgements

The writing of this thesis has come together with the help of many people. I would like to thank you all for the kind help you have given me.

I would like to offer special thanks to my supervisor Feroz Zahid, for being a superlative sparring partner and offering valuable feedback throughout this work.

Finally, I will thank my family, friends and girlfriend for being so

patient during the writing of this thesis. Thank you Stephen for

proofreading of the thesis.

1 Introduction

1.1 Motivation

1.2 Outline

2 Technical Background

2.1 Grid Computing

2.2 Cluster computing

2.3 Cloud computing

2.4 Clustering Algorithms

3 Mesos

3.1 How does Mesos work?

4 Implementation

4.1 Associating slaves to datacenters

4.2 Combining resources across datacenters into offers

4.3 Deep dive into the allocator