High Throughput Virtualization

Susinthiran Sithamparanathan

Thesis submitted for the degree of Master in Programming and Networks

60 credits

Department of Informatics

Faculty of mathematics and natural sciences

UNIVERSITY OF OSLO

Printed: Reprosentralen, University of Oslo

Abstract

Virtualization is one of the key technologies used in the era of Big Data and Cloud.

In this thesis, we look at achieving high throughput network performance with 40Gb/s Ethernet (40GbE) adapters that support Remote Direct Memory Access (RDMA), Single Root I/O Virtualization and Sharing Specification (SR-IOV) and RDMA over Converged Ethernet (RoCE).

The study looks at the challenges, issues and achieved benefits of such SR-IOV-enabled high throughput network adapters in a virtualized environment intended for high throughput networking. It shows that SR-IOV and RoCE are able to deliver close to bare metal and line rate network throughput. The results show that the combination of SR-IOV and TCP/IP delivers 91.3% higher bandwidth than paravirtualization and that RoCE delivers 80.2% higher bandwidth than TCP/IP. The results also show that SR-IOV with TCP/IP is able to deliver an increase of 18.4% over bare metal in terms of achieved network throughput. However, the increased performance of SR-IOV comes at the cost of increased system load and higher memory usage, as the study details further.

Acknowledgements

I’d like to express my appreciation to the following people and institutions, and recognise their support:

• Simula Research Laboratory for hosting this very interesting project and providing a suitable research environment throughout the repeatedly extended project period.

• The University of Oslo (UiO) for offering the master’s degree program and providing a high quality study environment and facilities.

• Professor Tor Skeie for being my secondary supervisor at Ifi/UiO.

• Adjunct research scientist Ernst Gunnar Gran for giving me the opportunity to work on such an interesting topic with the resources available at the Department of Advanced Computing and System Performance (CASPER) at Simula, and for his guidance even after partially leaving Simula!

• Dr. Vangelis Tasoulas for being my primary supervisor, even after finishing his Ph.D. at Simula. Thank you very much for the time you spent giving me guidance and advice throughout the extended project time! Your help has been invaluable to me!

• The administration at my former employer, the Institute of Theoretical Astrophysics at UiO, for motivating me to complete a master’s degree, and for giving me the opportunity and support to study and take the mandatory exams for the master’s program at UiO.

• My wife Kalpana, my mother Ambi and my cousin Asha for your invaluable support and help while I was away from home and couldn’t take care of our kids. No words can express how thankful I am that you spent countless hours with our kids while I had to study, take exams and work on the master’s thesis! And our much beloved daughters Nitara and Mayraa, for letting me work on the master’s degree and spend less time with you. Now you can look forward to not having to ask anymore: "Pappa, skal du til skolen idag og?" ("Daddy, are you going to school today too?") when I drop you off at home after nursery!

List of Figures

1.1 Cloud adoption 2017

1.2 Three types of cloud computing

2.1 Type 1 VM architecture (native)

2.2 Type 2 VM architecture (hosted)

2.3 How SR-IOV works

2.4 RDMA architecture

2.5 Architecture of Infiniband, RoCE and TCP/IP

2.6 KVM Guest Execution Loop

2.7 OpenStack Architecture

4.1 iPerf vs Perftest B/W

4.2 TCP/IP Paravirtualization vs Bare Metal

4.3 Adding Virtual Hardware from Virtual Machine Manager

4.4 VM Average Memory Usage: Paravirtualization (PV) vs SR-IOV (TCP/IP and RoCE)

5.1 TCP/IP Average Bandwidth for different MTU sizes

5.2 TCP/IP Bandwidth and System Load for MTU 1500 and 9000

5.3 TCP/IP Bandwidth for different MTU sizes

5.4 RoCE and TCP/IP Mean Bandwidth for all MTUs

5.5 RoCE and TCP/IP Mean System Load for all MTUs

5.6 TCP/IP IRQ Generation Server and Client

5.7 TCP/IP IRQ Affinity Core 1-7 and CPU Affinity Core 0

5.8 VirtIO TCP/IP B/W Different MTU

5.9 Paravirtualization TCP/IP IRQs generation Server and Hypervisor

5.10 B/W & System Load MTU 1500 vs 9000

5.11 CPU Context Switches Client (VM) and Hypervisor MTU 1500 vs 9000

5.12 SR-IOV TCP/IP Bandwidth for all MTU sizes

5.13 SR-IOV TCP/IP Bandwidth and System Load for MTU 1500 and 9000

5.14 SR-IOV: RoCE vs TCP/IP Average Bandwidth for MTUs

5.15 SR-IOV: RoCE vs TCP/IP Average System Load for all MTUs

5.16 Paravirtualization and SR-IOV: IRQ Generation on Hypervisor and VM

5.17 Left: % CPU load hypervisor. Right: % Fraction of CPU time used for servicing guest

C.1 Bare metal memory usage RoCE and TCP/IP

C.2 Hypervisor memory usage Paravirtualization and SR-IOV (RoCE and TCP/IP)

List of Tables

3.1 Physical Servers

3.2 Experiment Phases

4.1 Developed Scripts

Part I

Introduction

Chapter 1

Motivation

Cloud computing is a relatively recent paradigm that is changing the landscape of IT processes in many enterprises. Traditionally, IT was mainly limited to an in-house portfolio for many businesses. It was predicted that cloud computing would grow [1], that it would become the preferred choice for many businesses, and that businesses should develop a strategy for which workloads could be moved out to the cloud and which should be kept in-house [2][3]. In recent years, this has indeed become the trend, as businesses have moved out workloads that are suitable to run in the cloud. Today, cloud computing is among the preferred technologies for many academic environments, enterprises and service providers.

Service providers such as Amazon, Google, Salesforce and Microsoft have already established new data centers in various locations around the world for the purpose of delivering cloud computing services to the public with redundancy and reliability.

Cloud computing mainly delivers computing resources on the following levels [4]:

• software: such as Dropbox, offering a software service to store and sync files.

• platform: such as Microsoft Azure or Google App Engine, enabling tenants to run their applications on it.

• infrastructure: such as Amazon Web Services (AWS), delivering infrastructure services (storage, compute, network etc.).

As cloud computing is becoming well established and mature, enterprises are not waiting to benefit from the services it offers, whether in the form of public, private or hybrid cloud. A 2016 survey by IDG [5] showed that enterprise companies (with more than 1000 employees) had already moved 45 percent of their applications and computing infrastructure to the cloud, and that these companies anticipated having 60 percent of their total IT environment in a mix of public, private and hybrid clouds by 2018. Another survey, by RightScale [6] in 2017, showed that enterprises run 32 percent and 43 percent of their workloads in public and private clouds, respectively. The same survey showed that hybrid cloud is the trend and the preferred strategy among enterprise decision makers. As can be seen in figure 1.1, taken from the survey, 95 percent of the respondents were taking advantage of cloud computing in 2017. Yet another report, from Technology Business Research [7], estimates that total business spending on private cloud will grow to $69 billion by 2018, a compound annual growth rate (CAGR) of 14 percent from 2014.

This is not surprising, as cloud service providers let tenants scale as demand increases, allowing them to start with smaller workloads. With such an adoption rate, more research on I/O and high speed networking is needed. The network of a cloud is the tenants’ highway into the cloud resources, and it is what moves the tenants’ data in and out of the cloud. Resources such as CPU, memory and storage can be negotiated at SLA level with QoS, which gives the tenants’ applications a form of guarantee for the available resources.

Therefore, it is critical that high speed network adapters used in cloud infrastructure do not take up an unfair amount of resources, such as CPU and memory, in order to perform as close as possible to their design specifications.

Figure 1.1: Cloud adoption 2017

There are many types of cloud computing today. The most common cloud types are public, private and hybrid cloud [4]. These three types, as well as some other types of cloud, are further explained in chapter 2 under subsection 2.1.3.

Figure 1.2 illustrates the three different cloud types. A typical characteristic of public cloud is that it is provided by a third-party commercial provider over the Internet to many tenants. The tenants share the resources (CPU, network and storage) within the cloud, and the providers have an infrastructure of data centers and servers to provide such a cloud service. Examples of public cloud providers are Amazon, Google, Salesforce and Microsoft. Private clouds are typically dedicated cloud computing resources within a business’s private network and are managed either by the company’s IT department or by an external cloud provider. A hybrid cloud is a mix of the former two. Typically, it can be a strategic business choice to have some applications running in a public cloud while keeping the others inside a private cloud on the company’s own infrastructure. Multiple factors need to be taken into consideration when deciding which applications to run in which cloud type. These factors include, but are not limited to, laws, regulations and the business model.

Private clouds are deployed by research environments and IT businesses around the world. This type of cloud provides researchers with a testbed for their research and businesses with an infrastructure on which to develop their applications. At the time of writing, several orchestration tools (also known as cloud frameworks) are available as Open Source Software (OSS) [8], with OpenStack being a mature one.

OpenStack has gained significant popularity and support from many leading technology companies. It has become the de facto standard for open source IaaS cloud deployments. OpenStack consists of well known open source components. It started as a NASA and Rackspace project in 2010 and was adopted by Ubuntu Linux developers in 2011. Today, the OpenStack Foundation lists over 200 companies and organizations as members, sponsors and supporters on its website [9]. Among the most notable ones are the founders Rackspace and NASA. Other supporters include AT&T, HP, Dell EMC, IBM, Canonical, Red Hat, Cisco and SUSE.

Figure 1.2: Three types of cloud computing [4]

Cloud provider services are layered according to the following SPI (SaaS, PaaS, IaaS) model:

• Software as a Service (SaaS)

• Platform as a Service (PaaS)

• Infrastructure as a Service (IaaS)

These are layered on top of each other, with IaaS at the bottom. PaaS and SaaS reside on top of IaaS, with SaaS being the topmost layer. The enabling technologies at the core of cloud computing are virtualization, network and storage. As the SPI layers combine the hardware and software building blocks of clouds, issues like availability, performance and efficiency have to be taken into consideration. Specifically, as both PaaS and SaaS reside on top of IaaS, their performance and Quality of Service (QoS) are tightly coupled with the performance and QoS of IaaS. As applications running in the cloud have become more data-intensive [1], the underlying infrastructure now has to move data between VMs at the bandwidth required by the SLA between the tenants and the cloud provider. Therefore, we will deploy such an IaaS cloud using OpenStack and 40GbE network adapters in order to study how it might perform in such a scenario.

In order to achieve a higher degree of efficiency and server (and hardware) consolidation, virtualization technology is heavily used in cloud deployments. Physical resources like processing units (CPU/GPU), memory, storage and networks are virtualized and shared between virtual machines in order to achieve a high degree of resource utilization and efficiency.

It is not uncommon to host a significant number of VMs on a physical host in an attempt to increase data center resource utilization. In such a scenario, the VMs share processing units and input/output (I/O) interfaces, which in turn might affect the overall performance of the cloud in terms of computational power and communication (network). An undesired effect of this could be loss of availability of the services.

The processing capabilities of today’s processing units are very high, and with powerful dynamic resource allocation techniques such as dynamic memory allocation [10], the main performance consideration for providing high QoS is I/O, and specifically networking [11]. This strongly emphasizes the significance of the network in today’s cloud infrastructure. Physical hosts and VMs within a data center depend on the QoS provided by the network layer to perform to a satisfactory level. Networks within a cloud can be both physical and virtualized.

Cloud computing is highly parallel and distributed in nature, with resources shared through physical and virtual networks. As VMs reside on different physical hosts, they are managed through the networks. Communication between physical hosts and VMs also occurs over the network, be it physical or virtual.

This reliance upon the network means that the overall QoS of such cloud environments is directly related to the performance and QoS of the networks. A VM uses a virtualized version of the physical Network Interface Card (NIC), hereafter called network adapter, present in the host server. There are multiple techniques to expose a network adapter from a physical host to a VM, which will be discussed later in this thesis, but the network I/O in and out of a VM will pass through this virtualized network adapter inside the VM. At the time of writing, it is quite common to see 10GbE as well as 1GbE network adapters used between physical hosts in cloud deployments. As today’s high end processors have a significant number of cores and can address a vast amount of memory, the density of hosted VMs within a physical host is increasing. This also means that the physical host’s network adapter needs to process a higher volume of network traffic. Without close attention to the networking capacity of a virtualization host, the network can quickly become a bottleneck of the cloud deployment.

Emulation is one of the techniques used to expose the network adapter from the physical host to the VMs. Using emulation for an interface means emulating the complete interface fully in software, which adds significant processing overhead as well as higher resource utilization to the Virtual Machine Monitor (VMM) [12][13][14][15]. Paravirtualization and full virtualization are two further approaches to exposing interfaces from the hypervisor to a VM. With paravirtualization, the hardware is not emulated, but the guest OS has to be modified. Full virtualization, on the other hand, requires no such modification of the guest OS and can take advantage of built-in hardware support in processors from Intel (VT-x) and AMD (AMD-V). With the above mentioned ways of exposing interfaces from a hypervisor to a VM, a significant amount of hypervisor intervention is involved [11] in the data processing of the I/O devices, which also affects the running VM. Such overhead can saturate the processor in a high speed network, such as 40 Gigabit Ethernet (40GbE), impairing the overall system performance.
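The hardware support mentioned above is advertised by the processor as a CPU feature flag. On Linux, /proc/cpuinfo lists the flag vmx for Intel VT-x and svm for AMD-V. The following is a minimal sketch, assuming a Linux host; the function name hw_virt_support and the sample text are illustrative and are not part of the scripts developed in this study.

```python
def hw_virt_support(cpuinfo_text):
    """Return 'VT-x', 'AMD-V' or None based on the CPU flags in cpuinfo_text."""
    for line in cpuinfo_text.splitlines():
        # x86 kernels list CPU features on lines starting with "flags".
        if line.startswith("flags") and ":" in line:
            flags = line.split(":", 1)[1].split()
            if "vmx" in flags:
                return "VT-x"
            if "svm" in flags:
                return "AMD-V"
    return None


# Hypothetical, shortened excerpt of /proc/cpuinfo used for illustration.
sample = "processor\t: 0\nflags\t\t: fpu vme de pse tsc msr vmx sse sse2\n"
print(hw_virt_support(sample))
```

On a real host, the contents of /proc/cpuinfo would be passed in instead of the sample string. A missing flag can also mean that the feature is disabled in the firmware, so a None result is a hint rather than proof that the processor lacks the extension.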

In order to minimize the hypervisor intervention and keep the overhead at a minimal level, several techniques have been proposed, such as PCI passthrough and SR-IOV. These techniques allow the VMs to directly access the physical I/O resource without the need for emulation by the hypervisor [11]. PCI passthrough refers to a PCI device being directly and exclusively exposed to one dedicated VM and its guest OS. PCI passthrough yields a significant increase in performance, close to what we can get with a native device, while minimizing the processing overhead introduced by device virtualization. The major downside of exposing a device to a VM using PCI passthrough is that only one VM can utilize the device at a time and the physical I/O resource cannot be shared between VMs, hence it is not a scalable solution. Servers have a finite number of PCI Express (PCIe) lanes and slots, so the number of VMs utilizing PCI passthrough for devices is limited. Additionally, direct assignment of a physical I/O resource to a VM is an issue for live migration [16], which is a key characteristic of the cloud.

SR-IOV is an extension to the PCIe standard that allows VMs to directly access shared I/O devices without intervention from the hypervisor. It is similar to PCI passthrough in that an I/O device is made available to the VM directly. While with PCI passthrough only one VM can be given exclusive direct access to the I/O device, the SR-IOV extension adds the (huge) benefit that the device can be used by multiple VMs simultaneously. This way of passing through an I/O device yields performance benefits [17], but requires SR-IOV-enabled hardware. A single SR-IOV-capable PCIe device exposes multiple endpoints in the form of light-weight PCI functions called virtual functions (VFs). Each VF can then be passed to a VM in a virtualized environment.

SR-IOV not only improves performance by bypassing hypervisor intervention, but also addresses the scalability issue we mentioned for the PCI passthrough technique.
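On Linux, the number of VFs a device can expose is visible in sysfs: the PCI device directory of an SR-IOV-capable physical function contains the attributes sriov_totalvfs (the maximum supported by the device) and sriov_numvfs (the number currently enabled). The sketch below reads both values. It assumes a Linux host; the function name read_vf_counts and the interface name ens1f0 are hypothetical and only serve as an illustration.

```python
from pathlib import Path


def read_vf_counts(iface, sysfs_root="/sys/class/net"):
    """Return (enabled_vfs, max_vfs) for a network interface, or None.

    None means the sysfs attributes are missing, i.e. the device (or its
    driver) does not expose SR-IOV.
    """
    device_dir = Path(sysfs_root) / iface / "device"
    try:
        enabled = int((device_dir / "sriov_numvfs").read_text().strip())
        maximum = int((device_dir / "sriov_totalvfs").read_text().strip())
    except (FileNotFoundError, NotADirectoryError):
        return None
    return enabled, maximum


if __name__ == "__main__":
    # "ens1f0" is a hypothetical interface name; substitute the physical
    # function of the adapter under test.
    print(read_vf_counts("ens1f0"))
```

VFs are typically enabled by writing the desired count to sriov_numvfs as root, provided that SR-IOV is also enabled in the system firmware and supported by the adapter driver.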

One of the interesting questions is whether an SR-IOV-enabled I/O device delivers the performance out of the box without any further system tuning in a virtualized environment.

1.1 Problem Statement

In a multi-core virtualized cloud environment, network traffic of applications that use TCP or UDP is processed by the CPU, and it has to wait in line with other applications and system processes for its turn to use CPU cycles. In addition to the negative impact this has on network performance, the CPU cycles would be better spent on tenants’ workloads.

A high throughput network adapter such as a 40GbE adapter is able to transfer 40 billion bits per second, which corresponds to millions of packets per second. One major problem in such a scenario is that if the CPU cores have to be involved in the processing of every packet, this will not only lead to higher latency and reduced network throughput, but also adversely affect the performance of the VMs the hypervisor is hosting. When there is no data channel steadily available to service the network I/O, a system becomes less predictable with regard to when the CPU will be available to the tenant’s workload and when it will have to dedicate processing power to servicing the network I/O.
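To give a sense of the per-packet work this implies, the following sketch estimates the upper bound on frames per second at 40Gb/s line rate for two common MTU sizes. It assumes untagged Ethernet II framing, where each frame occupies 38 bytes on the wire in addition to the MTU-sized payload (14-byte header, 4-byte frame check sequence, 8-byte preamble and start delimiter, 12-byte inter-frame gap). The function name frames_per_second is illustrative.

```python
LINE_RATE_BPS = 40e9  # 40GbE: 40 billion bits per second

# Per-frame bytes on the wire in addition to the MTU-sized payload
# (assumption: untagged Ethernet II framing).
ETH_HEADER = 14
FCS = 4
PREAMBLE_SFD = 8
INTERFRAME_GAP = 12
OVERHEAD = ETH_HEADER + FCS + PREAMBLE_SFD + INTERFRAME_GAP  # 38 bytes


def frames_per_second(mtu, line_rate_bps=LINE_RATE_BPS):
    """Upper bound on frames per second when every frame carries a full MTU."""
    bits_per_frame = (mtu + OVERHEAD) * 8
    return line_rate_bps / bits_per_frame


for mtu in (1500, 9000):
    print(f"MTU {mtu}: about {frames_per_second(mtu) / 1e6:.2f} million frames/s")
```

Under these assumptions, an MTU of 1500 yields roughly 3.25 million frames per second at line rate, while an MTU of 9000 yields roughly 0.55 million. A larger MTU therefore reduces the number of packets the CPU has to handle for the same amount of data by about a factor of six.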

Based on the discussion in the current chapter, the research questions (RQs) for this study are:

• RQ1: What are the challenges of using an SR-IOV-enabled network adapter in a virtualized environment?

• RQ2: What are the challenges and issues of deploying VMs for high performance and high throughput networking?

• RQ3: How can we achieve close to 40Gb/s without creating significant load on the CPU cores?

• RQ4: What are the considerations to be made when deploying VMs for high throughput networking in a cloud environment?

1.2 Thesis structure

The layout of this thesis is as follows. The background chapter (2) comes after this introduction and collects related work and literature. It aims to give a brief overview of the different tools used throughout the project, as well as relevant technologies. The methodology chapter (3) then explains the objectives and methods of the study and describes the project plan. It also includes some important parameters and calculations related to the method. It is followed by the results and analysis chapters (4 and 5), where the actual results are presented and analyzed in detail. Then, in the discussion chapter (6), an overall evaluation of what has been done, the problems encountered and future work are discussed. The conclusion comes at the end, where the questions asked in the problem statement (section 1.1) are answered based on the findings of this study and the knowledge acquired. Finally, appendices with URL links to all of the important scripts created, some configuration of the NIC and some additional graphs can be found at the very end of this document.

Chapter 2

Background

2.1 Cloud computing

In the midst of major trends like Big Data and IoT, cloud computing is a critical technology needed to process, move and store data. Recall from chapter 1 that surveys showed that cloud computing is considered mature enough for businesses to adopt the technology as part of their IT portfolio. Cloud computing is known as the sixth paradigm of computing technology, as shown in figure ?? (retrieved from Voas and Zhang).

The infrastructure behind a cloud consists of elastic pools made up of a large number of servers connected through a network. Resources in a cloud are elastic in the sense that they can be dynamically allocated to a tenant on demand and released after usage, becoming available to other tenants of the same cloud. The elastic resources in a cloud environment are of the following types:

• Processing units

• Storage

• Networking

• Applications

In addition to the physical layer of hardware resources, the cloud infrastructure also contains an abstraction layer of software that is deployed across the physical layer. By offering such elastic resource allocation to the tenants, the cloud eliminates the initial cluster setup cost and time [19].

Today, there are many definitions of cloud computing, but according to Mell et al. cloud computing is:

"a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction." [20]

From the definition by Mell et al. and various others [21][22][23], it can be understood that a cloud deployment aims to achieve highly scalable and available on-demand computing services delivered through the Internet.

2.1.1 Cloud Computing Characteristics

The cloud computing paradigm has brought the following characteristics and features [4][24][20][1]:

• On-demand self service: A tenant’s resource needs are automatically taken care of without manual intervention by the service provider.

• Broad network access: Services are accessible over the Internet from client platforms such as mobile phones, tablets, laptops or desktops. Additionally, cloud service providers achieve high network performance and localization by leveraging the geo-diversity of data centers around the world.

• Resource pooling: The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter). Examples of resources include storage, processing, memory, and network bandwidth.

• Rapid elasticity: Resources are automatically scaled and released based on demand. Resource capabilities appear unlimited to the tenant at any time.

• Measured service: A pay-per-use pricing model is employed. Resource usage is monitored and tenants are billed only for the resources they use.

Amazon, a leading public cloud provider, offers elastic resources within its Amazon EC2 tier. A tenant utilizing a virtual machine within Amazon EC2 can, whenever needed, add elastic resources as mentioned above and then release the additional resources when they are no longer needed. In other words, scaling of resources in the cloud is reversible, in contrast to adding hardware resources to a bare metal server, which is typically done once and never undone. This is covered by resource pooling, on-demand self service and rapid elasticity. Access to the tenant’s virtual machine and resources is provided over the Internet, covering the broad network access part of the cloud characteristics. Since the service is measured, the "usual" utilization of the initial resources as well as the extra resource additions will be added to the tenant’s invoice.
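The measured service characteristic can be illustrated with a small pay-per-use calculation. The sketch below is a toy model only: the hourly prices, the resource names and the function name invoice are hypothetical and do not reflect the pricing of Amazon EC2 or any other provider.

```python
# Hypothetical hourly prices; real providers publish their own price lists.
HOURLY_RATE = {"vcpu": 0.04, "memory_gb": 0.005}


def invoice(usage_records, rates=HOURLY_RATE):
    """Sum the cost of (resource, quantity, hours) usage records."""
    total = 0.0
    for resource, quantity, hours in usage_records:
        total += rates[resource] * quantity * hours
    return round(total, 2)


# A VM runs 2 vCPUs and 4 GB of memory for a 720-hour month, and is scaled
# up by 2 extra vCPUs for 10 hours before the extra capacity is released.
usage = [
    ("vcpu", 2, 720),
    ("memory_gb", 4, 720),
    ("vcpu", 2, 10),
]
print(invoice(usage))
```

The temporary scale-up appears as a separate usage record, so the tenant pays for the extra capacity only for the hours it was actually allocated.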

2.1.2 Cloud Layered Architecture

In section 2.1, we briefly mentioned what a cloud infrastructure looks like.

The architecture of a cloud computing environment is divided into four layers [24][4]:

• the hardware/datacenter layer: consists of physical resources of the cloud within a data center such as physical servers, routers/switches, cabling, power and cooling systems.

• the infrastructure layer: pool of virtualized resources created on top of physical resources.

• the platform layer: operating systems and application frameworks built on top of the virtualized infrastructure layer.

• the application layer: Actual cloud applications residing on top of the platform layer and used by the tenants.

For the delivery of cloud services, a service-driven business model is employed [4][24][20]. As mentioned in chapter 1, these services can be grouped into three categories:

• Infrastructure as a Service (IaaS): refers to scalable computing resources such as VMs, servers, storage, load balancers, networks etc. provided to the tenants so that they can deploy and run arbitrary software such as operating systems and applications.

• Platform as a Service (PaaS): refers to computing platform services such as operating system, programming language execution environment, database and web server that are readily available to the tenants.

• Software as a Service (SaaS): refers to the delivery over the Internet of applications that run in the cloud. The tenant has no burden of managing or controlling the underlying cloud infrastructure, which is taken care of by the service provider. Today, smartphones are widely used and typically run "apps", which are a great example of this type of cloud software delivery. While the app’s front end resides locally on the phone, the back end resides in the cloud.

Each layer above IaaS abstracts the details of the underlying layer. IaaS being at the bottom means that it is the layer the upper layers, PaaS and SaaS, depend on. This thesis intends to study the network performance of IaaS, as it is a critical layer of the architecture.

In addition to the above mentioned cloud "aaS" architecture, there are also other models such as Database as a Service (DBaaS) [25] and even the concept of Everything as a Service (XaaS) [26]. XaaS was mentioned by Armbrust et al. as early as 2009 [27].

2.1.3 Cloud Deployment Models

According to [20] and other research papers, there are four types of cloud deployment models [4][24][26][25], but there are also other emerging deployment types [28].

Public cloud is a type of cloud where computing resources are dynamically provisioned off-site by a third-party provider. The computing resources are located in the cloud service provider’s data center and are shared among the tenants in a multi-tenant architecture. The services offered by a public cloud to the general public over the Internet vary from infrastructure and storage to applications. Some well-known examples of public clouds include Amazon Elastic Compute Cloud (EC2), Google AppEngine and Windows Azure Services Platform.

Public clouds enable enterprises to cut the initial investment in hardware and software, thus reducing the economic risk. As availability is one of the main characteristics of the cloud, an enterprise can carefully start with a small amount of resources and grow its resource utilization based on demand. However, not every type of workload is well suited to be put into a public cloud. As the public cloud is operated off-site by a third party, security and privacy, along with laws and regulations, are of high concern. In addition, connectivity depends solely on the availability of the Internet on both sides, the provider’s and the tenant’s.

Private cloud is a type of cloud intended for internal use by an enterprise. In contrast to public clouds, private clouds are not publicly available to everyone. The cloud infrastructure can either be owned by the enterprise and located on premises, or be located off premises at a cloud provider and exclusively reserved for the enterprise. In both cases, the management can be done by the IT department of the enterprise, by a third party or by a combination of both. An on-premises private cloud deployment has the huge benefit of giving the enterprise security and privacy, enabling it to comply with the required laws and regulations. As of today, there are several popular private cloud platforms available: OpenStack (discussed in detail in section 2.7), CloudStack, Eucalyptus, OpenNebula, VMware vCloud Suite and Microsoft Azure Stack.

Hybrid cloud is a type of cloud that combines the two above mentioned types of cloud. As such, a hybrid cloud can be a good trade-off between the limitations of a private cloud and the security and privacy issues of a public cloud. As we could see from some of the recent surveys, this type of cloud is on the rise and is the current trend.

Community cloud is a type of cloud that differs slightly from public cloud, although it is also used by several constituencies. The definition of community cloud is:

"The cloud infrastructure is provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be owned, managed, and operated by one or more of the organizations in the community, a third party, or some combination of them, and it may exist on or off premises."[20]

As such, a community cloud involves cooperation and integration of IT infrastructure and resources from multiple organizations. It requires interoperability and compliance between the participating organizations and their resources, including identity and access management (IAM). One example is the community cloud shared between scientists from different organizations at the CERN Large Hadron Collider (LHC).

Inter cloud, also known as federated or multi cloud, is a type of cloud that provides a basis for provisioning heterogeneous multi-provider resources for various workloads on demand with respect to QoS [28]. It aims to provide seamless integration of the public clouds of different providers. Figure (cite figure) shows the architecture of an inter cloud.

2.2 Virtualization

Virtualization is one of the key technologies leveraged by data centers delivering cloud computing services. It has virtually transformed today’s data centers. Within the concept of virtualization lies the ability to emulate another computer system, in software, using the same hardware. However, the concept of virtualization is not new; it has existed since the 1960s [12][29][30][31]. At that time, powerful hardware called mainframes was used with a hypervisor to partition and isolate the VMs running simultaneously within one mainframe. Fast forwarding to 2005, as hardware technology steadily improved, hypervisors gained traction in academia and industry.

Today, the term "virtualization" has become ambiguous [32]. For instance, mobile device emulators are a type of virtualization due to the fact that the OS is running on emulated hardware, hence removing the binding between the OS and the hardware [31].

In this study, we look at virtualization in the context of cloud computing and data centers. There are multiple definitions of the term "virtualization" [33][34][35]. One definition, by Sahoo et al., is:

"Virtualization is a technology that introduces a software abstraction layer between the hardware and the operating system and applications running on top of it".

The objectives of virtualization technology [36][34][37][38][39] are to:

• Add an abstraction layer between the application and the hardware

• Enable consolidation and reduce cost as well as complexity.

• Provide isolation of computer resources for improved reliability and security

• Improve service level as well as the QoS

• Better align IT processes with business goals

• Eliminate redundancy in, and maximize utilization of, IT infrastructure

There are numerous approaches to virtualization [31][33], such as mobile, data, memory, Virtual Desktop Infrastructure (VDI), storage, network and application virtualization, but this study will focus on server and I/O virtualization in particular.

They can be classified into three categories:

• Infrastructure virtualization: network and storage

• System Virtualization: server and desktop (VDI)

• Software Virtualization: application and high level language

2.2.1 Server Virtualization

In order to maximize resource utilization and efficiency, a physical server can be used to run multiple operating systems, each isolated from and independent of the others. This is the most common form of virtualization today, and when people use the term "virtualization", they generally refer to server virtualization. It hides the physical characteristics of computing resources such as CPU, memory and storage from the software running on them and from the entity using them.

There are also multiple definitions of server virtualization [31].

Common types of server virtualization are as follows [36][33][31]:

• Hardware virtualization, aka HVM

• Paravirtualization, aka PVM

• Operating system virtualization, aka containers

Hardware Virtualization Also known as Hardware Virtual Machines (HVMs), this is a virtualization technique that relies on special hardware to achieve computer virtualization. An HVM works by intercepting privileged calls from a VM and handing these calls to the hypervisor. The hypervisor decides how to handle the call, ensuring security, fairness, and isolation between running VMs. The use of hardware to trap privileged calls from the VMs allows multiple unmodified OSs to run. This provides tremendous flexibility, as system administrators can now run both proprietary and legacy OSs in the VM. In 2005, the first HVM compatible CPU became available, and as of 2012 nearly all server-class and most desktop-class CPUs support HVM extensions. Both Intel and AMD implement HVM extensions, referred to as Intel VT-x and AMD-V, respectively. Since HVMs must intercept each privileged call, considerably higher overhead can be experienced than with PVMs. This overhead can be especially high when dealing with input/output (I/O) devices such as the network card, a problem that has led to the creation of paravirtualization drivers.

Paravirtualization drivers such as the VirtIO package for KVM allow a VM to reap the benefits of HVMs such as an unmodified OS while mitigating much of the overhead.

Examples of HVM-based virtualization systems include KVM and VMware servers.

Paravirtualization Also known as Paravirtualization Machines (PVMs), this is a virtualization technique that was the first form of full computer virtualization and is still widely deployed today. The roots of paravirtualization run very deep indeed, with the first production system, known as VM/370, created by IBM and available in 1972, many years before paravirtualization became a mainstream product. VM/370 was a multi-user system which provided the illusion that each user had their own operating system. Paravirtualization requires no special hardware and is implemented by modifications to the VM’s operating system. The modifications instruct the operating system (OS) to access hardware and make privileged system calls through the hypervisor; any attempt to circumvent the hypervisor will result in the request being denied. Modifying the OS does not usually create a barrier for open source OSs such as Linux; however, proprietary OSs such as Microsoft Windows can pose a considerable challenge. The flagship example of a PVM system is Xen, which first became an open source project in 2002. The Xen Hypervisor is also a keystone technology in Amazon’s successful cloud service EC2.

Operating System Virtualization There is also a third class of virtualization, known as container virtualization or OS virtualization, which allows each user to have a secure container and run their own programs in it without interference. It has been shown to have the lowest overhead when compared to PVMs and HVMs. This low overhead is achieved through the use of a single kernel shared between all containers. Such a shared kernel does have a significant drawback, however, in that all users must use the same OS. For many architectures such as public utility computing, container virtualization may not be applicable, as each individual user wants to use their own operating system in their VM; it is therefore not the focus of this study [40].

The management of VMs in such a virtualized platform is done by a hypervisor or VMM. There are two types of hypervisor architecture, as illustrated in figures 2.1 and 2.2:

• Type 1 (also known as native): The VMM or hypervisor runs directly on top of the hardware, with the VMs or guest OSes, be it Windows or Linux, running above the hypervisor. The applications run inside each VM, in the layer above the guest OS, as seen in figure 2.1. The same figure shows that, since the hypervisor runs directly on top of the hardware, there is no additional OS layer between them, which yields better performance. However, hardware support can be an issue. Examples of this Type 1 virtualization architecture are Kernel-based Virtual Machine (KVM), VMware ESXi, Xen and Microsoft Hyper-V.

• Type 2 (also known as hosted): In this virtualization architecture, the hypervisor sits on top of an OS that controls the hardware resources. The VMs run on top of the hypervisor, with the applications in the top layer, as seen in figure 2.2. Because the hypervisor runs on top of another OS, a performance penalty is inevitable with this type of virtualization architecture: seen from a VM, access to hardware resources goes through two OSes. On the other hand, the hardware support is as good as that of the OS running the hypervisor. Examples of this Type 2 virtualization architecture are KVM, VMware Workstation and VirtualBox.

Figure 2.1: Type 1 VM architecture (native)

Figure 2.2: Type 2 VM architecture (hosted)

Figures 2.1 and 2.2 illustrate the architectural differences between the hypervisor types mentioned above.

Note that KVM is listed as both a Type 1 and a Type 2 hypervisor. KVM is a kernel module in Linux that supports hardware virtualization, hence it depends fully on the Linux kernel. However, given a bare minimal installation of a Linux distribution of any kind, one might argue that adding the KVM module makes it a Type 1 hypervisor.

2.3 I/O Virtualization

The rate at which data can flow from one device or server to another, commonly referred to as I/O for Input and Output, is becoming a bottleneck in the ever growing virtualized infrastructure. In a virtualized cloud environment, the I/O devices, such as NICs or storage devices (e.g. HDD/SSD), must be shared between the VMM and the VMs. It is the responsibility of the VMM to expose the needed I/O devices to the hosted VMs and at the same time provide isolation and security for device access between the VMs and the physical I/O devices. When such an I/O device is virtualized and exposed to a VM, the VMM must be able to intercept all I/O operations issued by the guest OS running inside the VM, in order to execute those I/O operations on the physical I/O device. As such, these I/O operations are trapped and handled by the privileged VMM on behalf of the guest OS. Such interceptions by the VMM inevitably create overhead.

The issue with overhead is even more significant when a high speed network adapter is virtualized and shared between VMs, because of the high rate of packet arrivals and departures. By virtualizing I/O devices we achieve multiplexing and demultiplexing, isolation, portability and interposition [16].

With the increasing processor core counts and larger addressable memory of today’s server hardware, VM density is increasing and more I/O traffic is being consolidated onto the servers used as virtualization hosts. If I/O capacity is not sufficient, it could limit all the gains brought about by virtualization.

Further, I/O virtualization can be divided into full virtualization, paravirtualization and direct device assignment [14][41][42]. The following subsections explain these three types of I/O virtualization.

2.3.1 Full Virtualization

Full virtualization provides a complete simulation of the underlying hardware, so that software that runs on the physical hardware can also run inside the VM. It is also referred to as emulation or software emulation, as there is an emulation layer sitting between the VM and the underlying hardware. This type of virtualization has the widest range of support for guest OSes.

Pros:

• Offers higher level of flexibility where guest OS need not be modified

• Provides complete isolation between VMs and between VMs and the VMM

• Provides near-native performance

Cons:

• The on-the-fly translation of instructions from the guest OS to host OS (hypervisor) causes significant performance degradation

• Complex on x86 architecture as not all privileged instructions can be trapped.

2.3.2 Paravirtualization

Paravirtualization provides a partial simulation of the underlying hardware. One of its key features is address space virtualization, which offers each VM its own unique address space. Most, although not all, of the hardware features are simulated.

Pros:

• Easier to implement compared to full virtualization

• Provides complete isolation between VMs and between VMs and the VMM

• Provides high performance for network and disk I/O when no HW assistance is available

Cons:

• Guest operating systems inside the VMs need modification

• Lack of backward compatibility and low portability

2.3.3 Direct Device Assignment

Direct device assignment is also called device passthrough, direct assignment, PCI (device) passthrough and Direct (Access) I/O. All these terms refer to the technique of assigning a physical PCI device to a specific VM so that the VM can directly access the physical resource without intervention of the VMM [43]. Using its own device driver, the VM is able to communicate with the device without requiring a device driver in the VMM. The I/O Memory Management Unit (IOMMU) is a hardware unit that maps the DMA addresses used by a device inside the VM to physical memory addresses [43]. Additionally, the IOMMU provides security in the form of isolation for VMs with direct device access.
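The address mapping performed by an IOMMU can be illustrated with a small, purely conceptual model. The sketch below assumes 4 KiB pages and one flat table per VM from guest page numbers to host page numbers. The class name ToyIommu and its methods are hypothetical and chosen only for illustration; real IOMMUs implement the mapping in hardware.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


class ToyIommu:
    """Conceptual model: one DMA mapping table per VM (hypothetical names)."""

    def __init__(self):
        # vm_id -> {guest page number: host page number}
        self._tables = {}

    def map_page(self, vm_id, guest_page, host_page):
        """Allow DMA from `vm_id` to `guest_page`, backed by `host_page`."""
        self._tables.setdefault(vm_id, {})[guest_page] = host_page

    def translate(self, vm_id, guest_addr):
        """Translate a DMA address used inside the VM to a host physical address."""
        guest_page, offset = divmod(guest_addr, PAGE_SIZE)
        table = self._tables.get(vm_id, {})
        if guest_page not in table:
            # Isolation: a device assigned to this VM cannot reach unmapped memory.
            raise PermissionError(
                f"DMA fault: VM {vm_id!r} has no mapping for page {guest_page}"
            )
        return table[guest_page] * PAGE_SIZE + offset


iommu = ToyIommu()
iommu.map_page("vm1", guest_page=2, host_page=100)
host_addr = iommu.translate("vm1", 2 * PAGE_SIZE + 16)
print(host_addr)
```

A translation request from a VM without a mapping for the page raises an error in this model, which mirrors the isolation property described above.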

As mentioned above, both full virtualization and paravirtualization incur significant overhead because they use software to emulate devices for the VMs. Since direct device assignment bypasses the VMM for its operation, the overhead is significantly reduced [14] compared to full virtualization or paravirtualization. Today, CPU offerings from both Intel (Intel VT-d) and AMD (AMD IOMMU) come with hardware support for direct device assignment [44][45].

The following are the advantages and disadvantages of direct device assignment.

Pros:

• Reduced intervention by the VMM hence less overhead compared to full virtualization and paravirtualization

• Better security by providing isolation of mapped memory regions for devices and VMs.

• Device driver not required at VMM level.

Cons:

• One device can only be used by one VM

• The number of assignable devices is limited by the number of available PCIe slots

2.4 SR-IOV

Compared to native hardware performance, the I/O performance of virtualized environments has been significantly worse and can quickly become the bottleneck of such a system. The I/O performance of virtual machines has suffered because high performance I/O relies on the I/O device’s ability to perform direct memory access (DMA), whereby the I/O device can write directly to the VMM’s memory without interrupting the VMM’s CPU. From within a VM, however, DMA is more complex because the memory address space inside the VM is not the same as the real memory space of the VMM. Every DMA triggered by a VM requires the VMM’s intervention for VM-to-host memory address translation. For this, the VM interrupts the VMM’s CPU so that the VMM can perform the address translation. Further, when multiple VMs are hosted by the same VMM, the VMM also has to act as a virtual network switch when the I/O is bound for a network adapter, ultimately leading to higher latency for such I/O operations.

As we have seen earlier, there exist multiple techniques to virtualize and share an I/O device between a VMM and its VMs. We have mentioned the full virtualization and paravirtualization techniques as well as device passthrough, each with its pros and cons. Meanwhile, hardware vendors such as Intel and AMD have been providing increased support for virtualization within their hardware, with the IOMMU being one example.

As part of continuous research and development to further minimize the overhead involved and improve the sharing capabilities of I/O devices in a virtualized environment, an extension to the PCIe standard was introduced by PCI-SIG [46]. The new technique is called SR-IOV and allows VMs to directly access shared I/O devices without intervention from the VMM, hence contributing to a reduction in overhead. With the standardization and adoption of SR-IOV, virtualized cloud environments have gained a much desired increase in I/O virtualization performance. Figure 2.3 illustrates how SR-IOV works.

Figure 2.3: How SR-IOV works

With the SR-IOV specification [46], two new function types were introduced, namely Physical Functions (PFs) and Virtual Functions (VFs):

• PFs: These are full PCIe functions that include the SR-IOV Extended Capability. The capability is used to configure and manage the SR-IOV functionality.

• VFs: These are lightweight PCIe functions that have just enough resources for supporting data movement.
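On Linux, the PF driver of an SR-IOV capable network adapter exposes the VF configuration through sysfs attributes named sriov_numvfs and sriov_totalvfs. The following minimal sketch reads both values for a given interface. The function name read_vf_counts and the interface name "eth0" are placeholders for illustration and are not taken from the setup used in this study.

```python
from pathlib import Path


def read_vf_counts(ifname, sysfs_root="/sys"):
    """Return (enabled VFs, maximum VFs) for the PF behind `ifname`, or None."""
    dev = Path(sysfs_root) / "class" / "net" / ifname / "device"
    try:
        num = int((dev / "sriov_numvfs").read_text().strip())
        total = int((dev / "sriov_totalvfs").read_text().strip())
    except (OSError, ValueError):
        # Interface missing, or the device has no SR-IOV capability.
        return None
    return num, total


if __name__ == "__main__":
    # "eth0" is a placeholder interface name.
    print(read_vf_counts("eth0"))
```

VFs are typically enabled by writing the desired count to sriov_numvfs with root privileges, after which each VF appears as a separate PCIe function that can be assigned to a VM.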

2.5 RDMA and RoCE

RDMA [47][48][49][50] as a technology has been used for quite some time in InfiniBand intra data center networks, where low latency and high throughput are key requirements. For instance, in the research field, when scientists run code that requires a high degree of parallelism using the Message Passing Interface (MPI), low latency is critical because MPI passes small messages back and forth between the nodes of a large-scale cluster. This is in contrast to TCP/IP-based network communications, which require copy operations that cause higher latency, increased CPU utilization and higher memory usage. As of the time of writing, RDMA is supported by the following protocols:

• InfiniBand (IB): a network protocol that has supported RDMA natively from the beginning. It is often used in HPC environments where low latency and high throughput are requirements.

• RDMA over Converged Ethernet (RoCE): a network protocol that allows performing RDMA over Ethernet networks and existing Ethernet infrastructure.

• Internet Wide Area RDMA Protocol (iWARP): a network protocol that allows performing RDMA over Transmission Control Protocol (TCP). iWARP can be seen as the competing protocol to RoCE, although it initially had the ability to work over a Wide Area Network (WAN).

Figure 2.4: RDMA architecture

Figure 2.5: Architecture of Infiniband, RoCE and TCP/IP

Figure 2.4 illustrates the RDMA architecture, while figure 2.5 illustrates the different stacks of InfiniBand, RoCE and TCP/IP. The first two protocols are both based on RDMA, while the last is not. While Direct Memory Access (DMA) is the ability of a device to access host memory directly without the intervention of the CPU, RDMA is the ability to access memory on a remote system without interrupting the processing of the CPU(s) on that system, effectively bypassing the remote system’s operating system kernel and CPU.

In a brief summary, RDMA offers the following advantages:

• Zero-copy: applications can perform data transfers without the involvement of the network software stack. Data is sent and received directly to the buffers without being copied between the network layers.

• Kernel bypass: applications can perform data transfers directly from user-space without kernel involvement.

• No CPU involvement: applications can access remote memory without consuming any CPU time on the remote server. The remote server’s memory is read without any intervention from the remote process (or processor). Moreover, the caches of the remote CPU are not filled with the accessed memory content.

In recent years, RoCE [47][48][51][52][53][54] has emerged as an interesting RDMA technology that promises to keep latency low while moving data over the well-known switched Ethernet fabric instead of InfiniBand Host Channel Adapters (HCAs) and switches. RDMA allows supported systems to communicate with low overhead, low latency and significantly reduced CPU utilization. It does so by offloading the transport to a hardware RDMA engine and by bypassing the operating system kernel, so that applications communicate directly. RoCE is a standard protocol defined in the InfiniBand Trade Association (IBTA) standard [52]. One of the main ideas of the RoCE protocol is to allow organizations to keep using their existing Ethernet infrastructure while leveraging the benefits of RDMA. RoCE requires RoCE-capable network interface cards, such as the Mellanox adapters used in this study.

Since RoCE is a sibling technology of InfiniBand, it also requires a lossless fabric to deliver what it promises. A lossless fabric, such as an InfiniBand network, is a fabric where packets on the wire are not regularly dropped. Standard Ethernet is designed as best-effort: packet loss can occur, and mechanisms in the TCP transport layer retransmit the lost packets, which in turn adds to the overhead, latency, memory consumption and CPU utilization. InfiniBand, on the contrary, uses a technique known as link level flow control to ensure that packets are not dropped in the fabric under normal circumstances.

In order to achieve a lossless fabric with RoCE, a set of enhancements to the Ethernet protocol exists under the term Data Center Bridging (DCB). DCB comprises five new specifications from the IEEE which, taken together, provide almost the same lossless characteristic as InfiniBand’s link level flow control. One notable enhancement that has been adopted is Priority Flow Control (PFC). PFC is a link level flow control mechanism that can be controlled independently for each frame priority to ensure lossless transmission when a DCB network is congested. This requires the RoCE infrastructure to support a recent version of Ethernet, meaning that the switches, NICs and HCAs must implement the important parts of these new IEEE specifications. Since this study’s infrastructure is based on back-to-back connected HCAs, DCB is not within its scope.

2.6 KVM

Kernel-based Virtual Machine (KVM) [55][29] brings an easy-to-use, open source and fully featured integrated virtualization solution for Linux. It relies fully on the Linux kernel to be usable. In contrast to Type 1 hypervisors that are installed directly on top of the hardware, KVM requires a running Linux kernel. Its origin goes back to an Israeli company, Qumranet Inc, which developed and maintained KVM.

KVM’s debut goes back to 2007 [56][57], when it was merged into the Linux kernel and released with the Linux mainline kernel version 2.6.20 on February 5, 2007. Qumranet Inc was acquired by Red Hat Inc in September 2008, and further development has been organized by the open source community under the supervision of Red Hat Inc.

KVM is provided as a kernel module for the Linux kernel [58][55], turning the operating system into a hardware accelerated hypervisor. It started with support for the x86 architecture (Intel and AMD) and was later extended to additional architectures. As of the time of writing, KVM virtualization is supported on the following hardware architectures [59]:

• Intel: with the extension VT-x and the vmx CPU flag.

• AMD: with the extension AMD-V and the svm CPU flag.

• ARM: 32-bit System on Chip (SoC) ARMv7-A (Cortex-A7, Cortex-A15, Cortex-A17) as well as 64-bit ARMv8-A SoCs.

• PowerPC: supported on a number of selected embedded cores.

• S390: supported for the 64-bit versions such as z9.
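For the two x86 entries in the list above, the presence of the CPU flag can be checked from the "flags" lines of /proc/cpuinfo. The sketch below parses cpuinfo-style text and reports which of the two flags it finds. The function name virtualization_flags and the sample text are illustrative assumptions, and the check does not apply to the non-x86 architectures.

```python
def virtualization_flags(cpuinfo_text):
    """Return the subset of {'vmx', 'svm'} found in /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found |= {"vmx", "svm"} & set(value.split())
    return found


# Shortened, made-up sample of what an Intel host might report.
sample = "processor\t: 0\nflags\t\t: fpu vme de pse vmx sse2\n"
print(virtualization_flags(sample))
```

On a real x86 Linux host the function could be fed the contents of /proc/cpuinfo; an empty result would suggest that the extension is missing or not exposed to the operating system.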

Virtual machines created using KVM appear as normal Linux processes and integrate seamlessly with the rest of the system. The tight integration of KVM into Linux makes it possible, at the developer level, to reuse existing kernel functionality such as the scheduler and NUMA support. It also makes it possible to reuse the existing process management infrastructure in Linux, for instance top to monitor CPU usage, taskset to pin virtual machines to specific CPUs and kill(1) to pause or terminate virtual machines.

2.6.1 How does KVM work?

We can turn any Linux distribution into a hypervisor by loading the kernel module kvm.ko, which provides the necessary hardware acceleration for the virtualized resources. For each processor architecture, there is an accompanying kernel module: for Intel processors the module is called kvm-intel.ko, and for AMD processors it is called kvm-amd.ko. It is important to emphasize that KVM alone is not enough for a usable hypervisor. A KVM-based Linux virtualization solution consists of KVM itself, QEMU and libvirt. KVM needs the assistance of QEMU for the creation of VMs, and libvirtd [60][61] is commonly used for the management of VMs and virtual resources.

QEMU is a generic and open source machine emulator and virtualizer [62][63].

Together with KVM, QEMU is the user-space component that emulates machine devices: it provides an emulated BIOS, PCI bus, USB bus and a standard set of devices such as IDE and SCSI disk controllers, network cards, etc. Since QEMU, with KVM, executes the guest code directly on the host CPU, its performance is close to bare metal. Without the hardware acceleration provided by KVM, QEMU has been reported to be around four to ten times slower at executing code [63].

Libvirt is a C toolkit to interact with the virtualization capabilities of recent versions of Linux (and other OSes). Libvirtd, QEMU and KVM are a combination commonly found in Linux distributions to enable virtualization and perform various operations on VMs.

As already mentioned, virtual machines created by KVM are regular Linux processes that are scheduled by the operating system scheduler (the Linux scheduler). Regular Linux processes execute in either user mode or kernel mode, with the former being the default execution mode for an application running as a Linux process. An application changes into kernel mode only when it requires a service from the Linux kernel, such as an I/O service.

KVM adds another execution mode called guest mode, which has both a user mode and a kernel mode within its own context. In other words, a process executing in guest mode is a process that runs inside a virtual machine. Figure shows a conceptual view of the KVM virtualization architecture.

The guest execution loop runs as follows:

• At the outermost level, userspace calls the kernel to execute guest code until it encounters an I/O instruction, or until an external event such as arrival of a network packet or a timeout occurs. External events are represented by signals.

• At the kernel level, the kernel causes the hardware to enter guest mode. If the processor exits guest mode due to an event such as an external interrupt or a shadow page table fault, the kernel performs the necessary handling and resumes guest execution. If the exit reason is due to an I/O instruction or a signal queued to the process, then the kernel exits to userspace.

• At the hardware level, the processor executes guest code until it encounters an instruction that needs assistance, a fault, or an external interrupt.
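The three nested levels described above can be modelled as two nested loops that consume a sequence of guest exit reasons. This is a conceptual simulation only and not KVM code: the exit reason names, the function run_guest and the trace strings are hypothetical, and the hardware level is reduced to taking the next exit reason from a queue.

```python
from collections import deque

# Exit reasons the kernel handles itself before resuming the guest.
KERNEL_HANDLED = {"external_interrupt", "shadow_page_fault"}
# Exit reasons that make the kernel return to userspace.
USERSPACE_HANDLED = {"io_instruction", "signal"}


def run_guest(exit_events):
    """Replay a list of exit reasons through the nested levels; return a trace."""
    events = deque(exit_events)
    trace = []
    while events:  # userspace level: call the kernel to execute guest code
        trace.append("userspace: enter kernel")
        while events:  # kernel level: put the hardware into guest mode
            reason = events.popleft()  # hardware level: guest runs until an exit
            if reason in KERNEL_HANDLED:
                trace.append(f"kernel: handle {reason}, resume guest")
                continue
            if reason in USERSPACE_HANDLED:
                trace.append(f"kernel: exit to userspace ({reason})")
                trace.append(f"userspace: handle {reason}")
                break
            raise ValueError(f"unknown exit reason: {reason}")
    return trace


for step in run_guest(["external_interrupt", "io_instruction", "shadow_page_fault"]):
    print(step)
```

In this model the interrupt and the page fault never leave the kernel, while the I/O instruction causes one round trip to userspace, which is why I/O intensive guests are more expensive to run.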

The guest execution loop is illustrated in figure 2.6 (retrieved from Lublin et al.). As mentioned, KVM requires CPU hardware support. It exposes a character special device, /dev/kvm, which userspace uses to create and run virtual machines through a set of ioctl()s. The following operations are provided through the /dev/kvm device:

• Creation of a new VM

• Memory allocation to a VM

• Reading and writing virtual CPU registers

• Interrupt injection into a virtual CPU

• Ability to run virtual CPU
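As a minimal sketch of how userspace talks to this device, the code below opens /dev/kvm and queries the API version with the KVM_GET_API_VERSION ioctl. The request numbers are derived from the Linux _IO() macro with the KVM type byte 0xAE; this simple encoding is assumed to hold on x86 and ARM and may differ on other architectures. The helper names are illustrative, and the function returns None when the device is not available.

```python
import fcntl
import os

KVMIO = 0xAE  # ioctl type byte used by KVM in <linux/kvm.h>


def _io(nr):
    """_IO(KVMIO, nr) for argument-less ioctls (encoding assumed for x86/ARM)."""
    return (KVMIO << 8) | nr


KVM_GET_API_VERSION = _io(0x00)
KVM_CREATE_VM = _io(0x01)  # shown for reference; not issued in this sketch


def kvm_api_version(path="/dev/kvm"):
    """Return the KVM API version, or None if the device is not usable here."""
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError:
        return None
    try:
        return fcntl.ioctl(fd, KVM_GET_API_VERSION, 0)
    except OSError:
        return None
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(kvm_api_version())
```

The remaining operations in the list follow the same pattern: creating a VM with KVM_CREATE_VM returns a new file descriptor, and further ioctls on that descriptor and on the virtual CPU descriptors set up memory, access registers, inject interrupts and run the virtual CPU.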

2.7 OpenStack

In the field of open source cloud computing platforms, OpenStack is a mature and well-known software. OpenStack provides an IaaS solution composed of a set of loosely coupled but rapidly evolving FOSS projects that support a wide set of technologies and configuration options. The integration between the components is facilitated by the Application Programming Interface (API) offered by each component [64]. OpenStack supports all types of cloud environments. At the time of writing of this thesis, OpenStack is the leading FOSS platform for building public and private IaaS clouds and has gained very good traction among academia and businesses around the world. The project is backed by many big names in the industry, as mentioned in the motivation part, and today there are over 200 businesses behind the project, collaboratively driving it forward with an active and vibrant community. A significant number of businesses provide both public and private cloud services based on OpenStack.

Figure 2.6: KVM Guest Execution Loop

The OpenStack IaaS framework is composed of the following three core FOSS projects:

• OpenStack Compute, known as Nova, handles VM instantiation and termination, among others, based on VM images from Glance.

• OpenStack Object Storage, known as Swift, provides distributed and redundant object storage similar to Amazon S3.

• OpenStack Image Service, known as Glance, provides an API for VM images to Nova, for instance to create VM instances.

In addition to the three core projects mentioned above, there are other projects such as OpenStack Identity service (Keystone), OpenStack Block Storage (Cinder), OpenStack Networking (Neutron) and OpenStack Dashboard (Horizon), the last of which provides a web user interface (UI) for management purposes in addition to the command line interface (CLI).

The code base of OpenStack is developed and released on a 6-month release cycle. After the initial release, additional stable point releases are made in each release series [65]. During the development cycle, the release is identified by a codename, and codenames are ordered alphabetically for consecutive releases. The releases are also referred to by a numerical version number that consists of the release year followed by a 1 or a 2, depending on whether it is the first or second release of the year in question. For instance, at the time of writing of this thesis, the current stable release of OpenStack is 2017.2, codenamed Pike. The other release from 2017 is 2017.1, codenamed Ocata. The codenames are cities near where the corresponding OpenStack design summit took place [66].
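The numbering scheme described above can be written down as a small helper. This is only an illustration of the convention using the two 2017 releases named in the text; the function name release_version is hypothetical and not part of any OpenStack tooling.

```python
def release_version(year, index):
    """Numeric release identifier: the year, then 1 or 2 for the first or second release."""
    if index not in (1, 2):
        raise ValueError("index must be 1 (first release) or 2 (second release)")
    return f"{year}.{index}"


# The two releases from 2017 mentioned in the text.
releases = {"Ocata": release_version(2017, 1), "Pike": release_version(2017, 2)}

# Consecutive codenames are in alphabetical order, so sorting by codename
# gives the same order as sorting by version number.
ordered = sorted(releases)
print(ordered, [releases[name] for name in ordered])
```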

An illustration of OpenStack’s modular architecture with its various components is given in figure 2.7.

Figure 2.7: OpenStack Architecture

2.7.1 Components

Some of the major components of the OpenStack framework are explained in detail below [67][68][69][70][71].

Nova is a cloud computing fabric controller. As the Compute service, it is the main part of the IaaS framework. It is aimed at the management and automation of pools of resources. It interacts with other components such as Keystone for authentication and Horizon for the user interface. The KVM, VMware, Hyper-V and Xen hypervisors are supported, as well as Linux container (LXC) technology. Nova offers four main services, as follows [69]:

• The API Service: it receives user requests and translates them into Cloud actions through Web Services

• The Compute Service: it mainly handles communications with the local hypervisor, so as to enable VM instantiation and termination, as well as queries to VM load indicators and performance metrics

• The Network Service: it handles all the aspects related to network configuration and communications. In particular, for each server, an instance of this service is in charge of creating the virtual networks that let VMs communicate among themselves and with the outside of the Cloud. However, due to some limitations of this service, a more recent networking component (Neutron, explained below) has emerged with advanced networking capabilities. This network service is now considered legacy.

• The Scheduler service: decides the node on which a new VM has to be instantiated and launched based on policies
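The idea of policy-based placement can be sketched with a toy scheduler that first discards nodes that cannot fit the request and then picks the node with the most free memory. This is an assumed example policy for illustration only, not Nova's actual scheduler; the Node fields, the schedule function and the node names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_ram_mb: int
    free_vcpus: int


def schedule(nodes, ram_mb, vcpus):
    """Pick the node with the most free RAM among those that fit the request."""
    candidates = [
        n for n in nodes if n.free_ram_mb >= ram_mb and n.free_vcpus >= vcpus
    ]
    if not candidates:
        return None  # no node can host the requested VM
    return max(candidates, key=lambda n: n.free_ram_mb)


nodes = [
    Node("compute1", free_ram_mb=4096, free_vcpus=2),
    Node("compute2", free_ram_mb=16384, free_vcpus=8),
    Node("compute3", free_ram_mb=32768, free_vcpus=1),
]
chosen = schedule(nodes, ram_mb=8192, vcpus=4)
print(chosen.name if chosen else "no valid host")
```

Here compute3 has the most memory but too few free virtual CPUs, so the request lands on compute2. Other policies, such as packing VMs onto as few nodes as possible, would only change the ranking step.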

Glance is the Image service, a lookup and retrieval system for VM images. It is an essential part of the OpenStack IaaS, enabling tenants to discover, register and retrieve VM images. Through its Image REST API, tenants can query VM image metadata and retrieve an actual image. The VM images can, for instance, be stored in the Object Storage provided by Swift. Glance provides support for multiple disk and container formats. Amazon machine image (ami) is one of the supported disk image formats.

Neutron is the advanced networking part of OpenStack. It provides management of networks and IP addresses for the IaaS, ensuring that the network is neither a bottleneck nor a limiting factor in a cloud deployment scenario. Additionally, it gives tenants the ability to configure networks on a self-service basis.

Cinder is the Block Storage service provider to the OpenStack environment. It provides and manages persistent block storage and can attach a logical volume to a VM like a local disk. Cinder also has the ability to back up VMs by interacting with Swift. In previous releases of OpenStack, this service was integrated into Nova and was called nova-volume.

Keystone is the Identity and Access Management (IAM) service of OpenStack. It provides authentication and authorization services to entities. It can integrate with existing backend directory services such as Lightweight Directory Access Protocol (LDAP). It supports multiple forms of authentication, including standard username and password credentials, token-based systems and AWS-style (i.e. Amazon Web Services) logins. Elements of OpenStack including Swift, Glance, and Nova are authenticated and authorized by Keystone.

Swift is the Object Storage service of OpenStack. It aims to provide a highly scalable and redundant object store that is conceptually similar to Amazon’s S3 service. Multiple replicas (copies) of each object are distributed to multiple storage nodes to achieve scalability and redundancy. Swift is one of the oldest and most mature components of OpenStack.

Trove is Database as a Service for OpenStack. It is designed to run entirely on OpenStack, with the goal of allowing users to quickly and easily utilize the features of a relational or non-relational database without the burden of handling complex administrative tasks. Cloud users and database administrators can provision and manage multiple database instances as needed. Initially, the service will focus on providing resource isolation at high performance while automating complex administrative tasks including deployment, configuration, patching, backups, restores, and monitoring.

Horizon is the web UI that provides a dashboard for cloud management, complementing the CLI provided by OpenStack. It interacts with the other components through their respective APIs.

Heat is the orchestration component of OpenStack. It creates or updates virtual resource instances using Nova, Cinder or other building blocks, based on a text template.
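To make the template-driven approach of Heat more concrete, the following Python sketch compares a text template with the resources that are already deployed and lists what would have to be created or updated. The template layout is deliberately simplified and is not a valid Heat template; the function `plan_actions`, the resource names and the property values are assumptions for illustration only.

```python
import json

# A simplified, hypothetical template describing the desired resources.
TEMPLATE_TEXT = """
{
  "resources": {
    "web_server": {"type": "OS::Nova::Server",
                   "properties": {"flavor": "m1.small"}},
    "data_volume": {"type": "OS::Cinder::Volume",
                    "properties": {"size": 10}}
  }
}
"""


def plan_actions(template_text, deployed):
    """List (action, name, type) tuples needed to reach the desired state."""
    desired = json.loads(template_text)["resources"]
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in deployed:
            actions.append(("create", name, spec["type"]))
        elif deployed[name] != spec:
            actions.append(("update", name, spec["type"]))
    return actions


# Hypothetical current state: the server exists, but with another flavor.
deployed = {
    "web_server": {"type": "OS::Nova::Server",
                   "properties": {"flavor": "m1.tiny"}},
}
for action in plan_actions(TEMPLATE_TEXT, deployed):
    print(action)
```

The point of the sketch is that the user only describes the desired end state in text, while the orchestration layer works out which calls to the underlying services are required.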

Ceilometer is a newer component of OpenStack. It meters virtual resource usage, such as virtual CPU and network I/O, and thereby provides a foundation for billing systems.
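How metering data can serve as a foundation for billing may be sketched as follows. The sample format, the meter names (`vcpu_hours`, `network_gb`) and the rates are hypothetical and do not reflect Ceilometer's actual data model; the sketch only shows usage samples being summed per tenant and meter and then priced.

```python
from collections import defaultdict


def summarize_usage(samples):
    """Sum metered values per (tenant, meter) pair."""
    totals = defaultdict(float)
    for tenant, meter, value in samples:
        totals[(tenant, meter)] += value
    return dict(totals)


def bill(totals, rates):
    """Turn usage totals into a cost per tenant using per-meter rates."""
    invoice = defaultdict(float)
    for (tenant, meter), amount in totals.items():
        invoice[tenant] += amount * rates.get(meter, 0.0)
    return dict(invoice)


# Hypothetical samples: (tenant, meter, measured value).
samples = [
    ("tenant-a", "vcpu_hours", 2.0),
    ("tenant-a", "vcpu_hours", 3.0),
    ("tenant-a", "network_gb", 4.0),
    ("tenant-b", "vcpu_hours", 1.0),
]
rates = {"vcpu_hours": 0.5, "network_gb": 0.25}
print(bill(summarize_usage(samples), rates))
```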

2.8 Related works

To the best of our knowledge, there have been no studies evaluating the performance of 40GbE RDMA- and RoCE-based NICs in a virtualized environment. Multiple studies and papers have examined SR-IOV, RDMA and RoCE. However, they were limited either to previous-generation 10GbE NICs without support for RDMA, or to 40GbE RoCE-capable NICs where the focus was on high throughput and low latency. Most of them concentrated on the improved performance of hardware-enabled SR-IOV compared with various software-based I/O virtualization approaches.

Nevertheless, these studies are interesting because they examine important aspects of SR-IOV-capable NICs as well as key system-wide aspects such as IRQ affinity, CPU utilization and latency. Such studies can serve as a foundation for further research as the technology advances.

2.8.1 Studying Performance of 1GbE SR-IOV-enabled NIC In Virtualized Environment

Network interface virtualization: challenges and solutions[40].

In this study Ryan Shea et al. presented a performance evaluation of a 1GbE SR-IOV capable NIC in a virtualized environment with regard to bandwidth, CPU cycles, context switches, last level cache (LLC) usage and interrupts. This study compared
