Investigating the Rule Re-ordering Algorithm in order to optimize Software Testing Pipelines


To provide faster feedback to developers

Syeda Taiyeba Haroon

Master’s Thesis Autumn 2017


2nd January 2018


Acknowledgement

I would like to express my sincere gratitude to my supervisor Kyrre Begnum for his immense support and motivation throughout my thesis. It has been a great pleasure to work with him, thanks to his enthusiasm and optimism in implementing and growing our ideas throughout the project. I have gained a lot more wisdom and confidence beyond my thesis through this experience, and I am indebted to him for his influence.

My courtesy to Anis Yazidi for his time in giving me an introduction to the RR algorithm. It was a valuable starting point.

I am grateful to Hårek Hagerud for always being supportive of my decisions and helping me deal with my issues.

My appreciation to all my teachers and staff at UiO and HiOA for making this journey possible, and to all my friends and colleagues here in Norway and abroad for their kindness, helpful comments and encouragement. One more special thank you to the caregivers at my daughter’s kindergarten for their thoughtfulness that goes beyond their job specifications.

My genuine thanks to my husband for his constant support and encouragement in all my accomplishments. Without his soundness and motivation I would not be here today. Especially to our one and only four-year-old daughter for her remarkable patience with me the past few months. I owe her the most!

My heartfelt thanks to my parents living in Bangladesh for always having faith in me and for supporting and fuelling my dreams without question. To both my brothers for their valuable advice and for always finding ways to help me. To my second parents, also back home, for their moral support, and to my brothers- and sisters-in-law for their supportive words in every situation.


Abstract

All software development today is driven by some form of delivery pipeline, regardless of its nature. Within this paradigm of release engineering, which may or may not be continuously integrated, lies the testing phase, which plays a crucial role in delivering efficient software. This thesis aims to explore the boundaries of software test automation in a continuous delivery pipeline by implementing a re-ordering algorithm to optimally sort the test cases such that tests with a higher likelihood of failing are executed first, thus enabling faster feedback for failed tests to developers.

The outcome of this project was a test optimization tool based on a re-ordering algorithm. The thesis also provides suggestions for future work using the implemented tool.

Keywords: software testing, software test optimization, test automation techniques, faster feedback, continuous delivery pipeline, scientific tool

Chapter 1 Introduction

Software Engineering: two words put together that have evolved in meaning and state at quite a fast pace over the past couple of years. Software engineering is defined as the engineering discipline applied to software development to produce competent and prominent software. With the growth of digitization and online services, a huge amount of time and resources is spent on creating "internet aids" for users, leading to a continuous bombardment of new software onto the World Wide Web every second.

In a fast-moving world of technologies, delivering software applications faster while also satisfying end users is a great challenge in terms of delivery time, product quality, development cost, revenue, etc. Over the years the software engineering domain has been experimenting with various tools, techniques and methodologies that have continuously aided in managing the software development pipeline better. The goal of today’s software development is agility, meaning faster and smoother delivery of software. Under such circumstances, testing of the product has to happen rapidly as well.

Viewed from the top, the domain of Software Engineering shows that many software development projects fail to meet their due dates because of issues in the development, integration or delivery pipeline. So how are these products validated? How much time is reasonably dedicated to evaluating the performance of a product?

This research will focus on the discipline of Software Engineering as a whole from an operations perspective with specific focus on software testing in the pipeline.

Software testing has been present since the start of software development, and it has evolved over the years along with it.

Various techniques have been introduced to enhance the development and delivery of software, but not as much focus has been placed on automating the software test pipeline. Solutions include the iterative/incremental development model, which also brought about the implementation of the Agile Manifesto framework to promote a continuous development and delivery process throughout the life cycle of the software application or project.

The goal of the Agile Manifesto framework is to deliver software in shorter sprints by delivering working software frequently. A fully functioning software delivery at an early stage is the highest priority in satisfying customers. It also provides the primary measure of integral progress by accommodating integration testing at every sprint of the development life cycle.

However, with agile development, resource requirements increased, leading to higher development costs. This led to the introduction of containers, such as Docker, to reduce the resources required by virtualization, thus providing portability and enabling smoother development across platforms. But how has testing evolved in all this?

Some would say that "software testing" is in a way dying. As dysfunctional as it sounds, we are assured by the very same source that the performance quality of software is still the explicit focus of the development industry. The highlight is that testing of a developing product now happens at every sprint, in that significant time slot between incremental development and production. Therefore, software testing is not "dead"; rather, it has morphed and branched out into different stages of the agile development life cycle, commonly referred to as Continuous Integration and Delivery.

Continuous Integration (CI) came into play to provide better integrated quality control. CI traditionally performs verification through automatic builds that are used to detect errors which have, or might have, gone unnoticed when modules were independently unit tested.

Looking at the pipeline from a bird’s-eye view, a typical pipeline has four basic stages:

• Commit Code

• Build Automation and Integration

• Test Automation

• Deployment Automation

Figure 1.1: Basic Automation Pipeline

As Figure 1.1 shows, when a developer pushes code to a version control repository, an automation server picks up the integrated changes and builds the software step by step. This build process may consist of various other tasks, among them software testing, validating the software and finally deploying it after clearance.

In large organizations where roll-outs happen every two days or less, the pressure in the testing domain is at its peak. What happens when one of these long sets of integrated system tests fails? Is there enough time to fix it? Is the feedback time for failure too long?

Looking closely at the discipline of software testing, we understand that large companies with built-in pipelines for their software delivery run "automated tests" as part of the continuous delivery pipeline. The term "automated tests", as we observed, has an ambiguous meaning. Tests may be automated in various ways. There has been quite a lot of research on automated tests, ranging from simply automating a list of tests to auto-generating test data using machine learning.

Depending on how they classify and mitigate the challenges of test automation, different organizations employ different continuous delivery strategies to resolve their obstacles. This research considers large organizations that follow a disciplined continuous delivery pipeline.

Taking into account all the visible challenges in the testing pipeline, proposals for automating the test design process have been a great mitigating factor for organizations. Nevertheless, automation does not necessarily make the testing time slot shorter or feedback delivery faster; it establishes possibilities for a smoother testing process for the developers and the operations team. The main obstacle lies in the complexity of these software tests in large dynamic test environments. This study focuses on two aspects:

1. Tests and their Dependencies

2. Feedback time for Failed tests

The objective of this thesis, among the many studies on test automation, is to propose a newer way to optimize the software test pipeline while providing early feedback for failed tests. This is further demonstrated below by justifying the two main focuses of this thesis.

1. Tests and their Dependencies

This thesis looks at testing from the integration testing stage, which is usually done after unit testing (individual component testing). Integration testing is by definition the combination of many unit-tested modules, tightly coupled with each other, in order to test the interfaces between the different modules.

Fundamentally there are two approaches to integration testing:

1. Bottom-up approach

2. Top-down approach

This thesis focuses on the bottom-up approach to integration testing. Figure 1.2 illustrates bottom-up integration. As can be seen from the illustration, the lowest module has a number of tests which need to be carried out first, before those of the top module.

Figure 1.2: Bottom Up Approach to Integration Testing

Selecting the most suitable test to carry out first can be based on a number of factors. There have been various studies on optimizing the re-ordering of tests. Tests that have dependencies cannot be re-arranged with a simple re-ordering algorithm without breaking the integration testing. In complex integrated systems, an individual test can have numerous dependencies which need to be carried out before the test itself. Figure 1.3 shows how the branches need to be traversed when honoring dependencies.

Figure 1.3: Re-ordering by honoring dependencies

Small changes in modules may also require new tests to be added to the pipeline. These challenges often lead to surges in the pipeline which need to be fixed manually. Thus one goal of this thesis is to implement a re-ordering algorithm that will aid in optimally sorting the tests while honoring the dependencies.
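To make the idea of honoring dependencies concrete, the following is a minimal sketch rather than the thesis implementation. The test names and the dependency map are hypothetical, and the standard-library `graphlib.TopologicalSorter` is used only as a convenient way to obtain one valid bottom-up order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tests: each test maps to the set of tests that must run before it.
dependencies = {
    "test_db": set(),
    "test_cache": set(),
    "test_api": {"test_db", "test_cache"},
    "test_ui": {"test_api"},
}


def respects_dependencies(order, dependencies):
    """Return True if every test appears after all of its prerequisites."""
    position = {name: index for index, name in enumerate(order)}
    return all(
        position[dep] < position[test]
        for test, deps in dependencies.items()
        for dep in deps
    )


# One valid bottom-up order: prerequisites always come first.
bottom_up_order = list(TopologicalSorter(dependencies).static_order())

# An order that runs the top-level test first violates the dependencies.
broken_order = ["test_ui", "test_api", "test_db", "test_cache"]
```

Any re-ordering step can be guarded with a check such as `respects_dependencies` so that an optimization never produces an order like `broken_order`.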

2. Feedback time for Failed tests

Another challenge found in software testing pipelines is the feedback time for failed tests. It turns out there is a high likelihood of tests failing when it comes to integration testing. Often a number of tests fail, and the developer does not hear about the failed tests until the complete set of tests in the pipeline has finished.

Moreover, what if the tests that have no likelihood of failing are placed later in the pipeline? These tests might have had a higher chance of being executed without obstacles had they been earlier in the test pipeline.

Logically it may be possible to arrange a large number of tests by hand based on priorities, but that would not be an automated test system. That is where this thesis is relevant. One of our goals, as mentioned, is to calculate the likelihood of a test failing based on some derived measures that can be fed to a re-ordering algorithm.

The rule re-ordering algorithm chosen for this thesis has previously been implemented as a firewall rule re-ordering algorithm. The algorithm uses the probability of failure of each test and the tests’ dependencies to determine whether two tests are eligible for swapping. Furthermore, the algorithm calculates the mean time before failure to decide whether an eligible swap would lead to a gain in optimization. The algorithm and its qualities are further explained in the next chapter.

Overall, the perception is that such an algorithm would be of assistance in placing the tests with a higher likelihood of failure first without breaking any integrity levels. Furthermore, having an instant feedback system for failed tests for every user would reduce the time wasted on waiting for the blame game.

Given the definition of "automation", it may be stated that the discipline of software testing has not reached full automation yet; however, this paper aims to provide a step forward by proposing an algorithmically optimized test pipeline.

Through this approach the goal is to answer the real-world question: "How much optimization can be achieved through automation while at the same time preserving the integrity of the pipeline?"
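The swap decision described above can be illustrated with a small sketch. It is not the RR-algorithm itself: the cost function is an assumed stand-in (a position-weighted sum of failure probabilities), and the function names, test names and probabilities are all hypothetical.

```python
def expected_failure_position(order, pof):
    """Assumed stand-in cost: failure probabilities weighted by 1-based position."""
    return sum(pof[test] * position for position, test in enumerate(order, start=1))


def can_swap(earlier, later, dependencies):
    """Two adjacent tests may swap only if the later one does not depend on the earlier one."""
    return earlier not in dependencies.get(later, set())


def reorder(order, pof, dependencies):
    """Repeat adjacent swaps as long as an eligible swap lowers the assumed cost."""
    order = list(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            earlier, later = order[i], order[i + 1]
            if not can_swap(earlier, later, dependencies):
                continue  # swapping would break a dependency
            swapped = order[:i] + [later, earlier] + order[i + 2:]
            if expected_failure_position(swapped, pof) < expected_failure_position(order, pof):
                order = swapped  # the swap is a gain, keep it
                improved = True
    return order


# Hypothetical data: t4 is the most failure-prone test but depends on t3.
initial_order = ["t1", "t2", "t3", "t4"]
pof = {"t1": 0.1, "t2": 0.5, "t3": 0.2, "t4": 0.9}
dependencies = {"t4": {"t3"}}
new_order = reorder(initial_order, pof, dependencies)
```

With these hypothetical values the sketch moves `t4` ahead of `t1` but keeps it behind `t3`, on which it depends, so the result is `t2, t3, t4, t1` rather than a plain descending sort by probability.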

1.1 Problem Statement

Our problem statement aims to enhance the experience of developers with early feedback while also improving the automation of the test pipeline.

The problem statement for this thesis is:

To implement a re-ordering algorithm in the test automation pipeline that could aid in optimizing the order of probable failed tests while respecting test dependencies and catering faster feedback to developers in order to facilitate optimal test automation.

On these grounds, as mentioned before, this research has two main focuses:

• To optimize the tests in the pipeline while honoring test dependencies

• To provide early feedback for failed tests

The re-ordering algorithm, as mentioned before, is adapted from the RR-algorithm [13], which is used to sort a firewall’s rule order based on each rule’s matching probability, dependency relationships, and firewall position, thereby ensuring that the average packet matching time is reduced. The intent here is to apply the same methodology to the software tests in the pipeline to ensure an optimal test order for all tests.

The test automation pipeline in this context refers to the test automation stage of the Continuous Delivery (CD) pipeline. In this stage an improved version of a software product is thoroughly tested to ensure it meets all the system qualities desired of the software. Important aspects validated at this stage include performance, compliance, functionality and security.

The term optimizing here refers to ordering the system tests based on their dependency relationships, individual departmental user commits and test position in the pipeline to ensure a reduction in the average time to execute failed tests.

Probable failed tests refers to the tests that are likely to fail depending on the optimization rules. The goal is to execute the tests with high probability of failure first.

Faster Feedback means instant feedback to developers regarding failed tests that were caused by their code. This ensures that developers receive feedback as soon as a test of their code fails, allowing them to fix their bugs faster and saving an impactful amount of company time otherwise wasted while the system is being "tested".

Optimal Test Automation is the refined revenue that this research aims to provide large businesses looking to automate their test pipelines.

The implementation of the algorithm would show the possible benefits of using a sorting algorithm so that the complexity of managing software tests is mitigated.

1.2 Summary of Results

Having decomposed the problem statement, here are the sub-problems that the thesis concentrates on solving:

A How suitable is the RR-algorithm for Software Testing

B Re-order tests based on their probability of failure while also honoring dependencies

C How is the impact different when using a different parameter other than probability of failure?

D What is the average impact on discovering mean-time before test failure when applying the RR-algorithm on the Software testing pipeline?

E Can the Algorithm be modified to make it more beneficial for the Software testing pipeline?

A. How suitable is the RR-algorithm for Software Testing

Applying the algorithm to the software testing pipeline seemed like a sustainable experiment for the continuous integration pipeline. The proposed solutions led to successful conclusions.

B. Re-order tests based on their probability of failure (POF) while also honoring dependencies

Selecting the parameters most suitable for re-ordering tests is one of the most subtle challenges in test automation. This thesis uses the RR-algorithm to discover the mean time before failure (DMTBF), which relies on the POF and the current position of the test in question. Progress on this issue has been marginal; however, having a sufficient number of tests with a higher probability of failure executed early in the test pipeline while still honoring the dependencies has been a satisfying improvement.

C. How is the impact different when using a different parameter other than probability of failure?

The prospect of being able to discover failure early in the integration process allows earlier feedback to developers for failed tests. Other parameters such as cost of test or test execution time before failure could be future prospects for the algorithm as well.

D. The average impact of discovering mean-time before test failure when applying the RR-algorithm

The average impact computed grows as the number of tests and their dependencies increases. On average there is a 20% decrease in time when the adapted test re-ordering algorithm is used for re-ordering.

E. Can the Algorithm be modified to make it more beneficial for the Software testing pipeline?

The algorithm seems to have a positive impact on the optimization process, and substituting a different parameter, such as cost, for the probability of failure showed significant positive results as well. Other parameters, such as the execution time of each test, could be a favorable choice too.

1.3 Thesis Outline

This section will give an outline of the rest of the thesis layout.

Chapter 2 - The Background chapter discusses the background of this thesis, the cultural domain of the problem statement and past research related to software testing, test automation, algorithms used for test optimization and other related topics. Refer to the tables at the end of the background chapter for a compilation.

Chapter 3 - The Approach chapter dissects the problem statement and how this research intends to approach the rule re-ordering algorithm in order to optimize tests in a test automation pipeline. Furthermore, this chapter lays out the plan for a proposed prototype and how the prototype shall be measured and assessed. Additionally, it gives an insight into alternative methods discussed within the scope of the thesis.

Chapter 4 - Results: Designing the Algorithm. This chapter details the design and development of the algorithm and all its functions.

Chapter 5 - Results: Implementation and Experiments. This chapter showcases the implementation of the algorithm in a test automation pipeline. It contains code snippets of the functions used for the development as well as experimental data, results and resulting graphs.

Chapter 6 - Analysis: Experimental Analysis, as the name suggests, contains the analysis derived from the experiments.

Chapter 7 - Discussion: This chapter discusses the assumptions made, the obstacles faced, the conceptions formed and the overall viewpoint established by implementing the idea behind this thesis.

Chapter 8 - Conclusion: This final chapter summarizes the attainment and maturity of the thesis and its further prospects.

Chapter 2 Background

This chapter gives an overview of the cultural environment around software engineering and of past research related to software testing, test automation, algorithms used for test optimization and other related topics. The discussion of research papers is followed by an introduction to the candidate algorithm for this thesis.

2.1 Software Testing Foundations

This section gives an overview of the foundations of software testing and how they are relevant to this research. To understand this, we look at how Software Engineering (SE) is the domain that manages the development and delivery of software while ensuring it is "flawless" and suited to an organization’s needs.

This means that testing is an essential part of software delivery. Studies also show that testing consumes between 30 and 50 percent of software development time. To counter that, test automation is widely considered a way to reduce the time and cost of software delivery in general.

There are often the questions "What exactly does software testing consist of?", "Why is it so important?" and "How much testing is enough?". These questions are often asked from the perspective of organizations that have other concerns related to feature requirements, costs, delivery time, etc.

To understand software testing logically we turn to the ISTQB [5] definition of software testing, which can be broken into two parts. The definition starts with a description of testing as a process and then lists some objectives of the test process.

With that in mind, from a software systems context, we know that a software system involves several processes at every stage of the software development life cycle (SDLC). All these tasks require planning, preparation, evaluation and testing of other related products such as operations, training, etc.

The second part of the definition covers the objectives of testing and why it needs to be done. This means testing for specified requirements, testing for defects and lastly demonstrating the overall performance of the product.

Together, these objectives form the seven principles of software testing:

1. Testing shows presence of defects: Testing reduces the probability of undiscovered bugs but does not provide proof of accuracy.

2. Exhaustive testing is impossible: Testing every single combination and condition is impossible.

3. Early Testing: Early testing allows focus on ticking off objectives.

4. Defect Clustering: It is widely held that most of the bugs are usually concentrated in a small number of modules.

5. Pesticide paradox: Repetitive testing using the same set of tests cannot help identify new bugs in a continuous delivery pipeline.

6. Testing is Context Dependent: Different modules need different sets of tests.

7. Absence of errors fallacy: Sometimes fixing errors does not guarantee stability of the end product if the product does not meet the users’ expectations.

There have been numerous studies in the field of software testing, test automation and continuous integration and delivery pipelines. Past research in the field has discussed automation, optimization, search-based approaches, sorting and/or partitioning of test cases, stages of the SDLC, etc.

To get a clear understanding of the test design architecture and where it crosses paths with operations, a review of recent research on test design pointed to a study on design principles in test architecture [14]. That study showed that test architectural design comes in two forms: Test System Architecture and Test Suite Architecture.

Figure 2.1: The Test Architecture

In terms of their study, this research is a combination of both. As observed from the figure above, this research aims at achieving an optimized test ordering in the test suite architecture and simultaneously enhances the system architecture by providing a mechanism to deliver early feedback for failed tests to developers.

2.2 Software Testing and Continuous Delivery

This section discusses the relationship between software testing and continuous delivery as a service. The journey of code starts from the code editor to version control, continuous building, testing, and finally deploying of applications and services on the web.

Figure 2.2: Scope of CI and CD

The above image shows a typical workflow in the scope of Continuous Delivery.

Looking deeper at a typical workflow from code to deployment, it starts with a developer committing their work to a central version control system such as Git. The commits are detected via a webhook, which can be an integrated plugin with GitHub or a script written for automation. The webhook detects the changes, and they are pushed to a test environment where the new changes are to be tested. The test environment is then built and the code is tested immediately. If the test build is successful, the developer is informed of the success. The code is then committed to the deployment environment and built automatically. If the test is not successful, the developer is informed via mail or SMS, or an automated issue is created on GitHub.

This is a typical workflow in the continuous delivery pipeline as described. The workflow represented here shows the use of Git and Jenkins in test automation, both of which are widely used in the industry for automation.
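The test-and-notify part of this workflow can be sketched as follows. This is a simplified illustration under stated assumptions, not an integration with Git, GitHub or Jenkins: `run_pipeline`, the `notify` callback, the test names and the owner mapping are all hypothetical, and notification is reduced to appending a message to a list.

```python
from typing import Callable, Dict, List, Tuple


def run_pipeline(
    ordered_tests: List[Tuple[str, Callable[[], bool]]],
    owners: Dict[str, str],
    notify: Callable[[str, str], None],
) -> List[str]:
    """Run tests in the given order and report each failure as soon as it happens."""
    failed = []
    for name, test in ordered_tests:
        if not test():
            failed.append(name)
            # Feedback is sent right away instead of after the whole run.
            notify(owners.get(name, "unknown"), name)
    return failed


# Hypothetical tests; each callable returns True on success and False on failure.
tests = [
    ("test_login", lambda: False),
    ("test_search", lambda: True),
    ("test_checkout", lambda: False),
]
owners = {"test_login": "alice", "test_checkout": "bob"}

messages = []
failed = run_pipeline(
    tests,
    owners,
    lambda owner, name: messages.append(f"{owner}: {name} failed"),
)
```

In a real pipeline the `notify` callback would be replaced by whatever channel the organization uses, such as mail, SMS or an automatically created issue.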

Figure 2.3: Typical CI Workflow

Therefore, from a bird’s-eye perspective, the scope of the journey starts with continuous integration (CI), then adds continuous delivery (CD), and finally ends up with continuous deployment, as shown in Figure 2.2.

The scope of this thesis lies within continuous delivery and therefore a definition of CD is in order.

Continuous Delivery is the implementation of automated software development methods. The two most important pieces of continuous delivery are, first, defining and validating criteria for code in each environment from development to production and, second, ensuring that delivery is in fact continuous. The latter is just as crucial, as these criteria define what code goes into production.

It is in these criteria that the boundaries of development, testing and production meet. To achieve continuous delivery, code needs to be able to flow uniformly through the three environments. DevOps is what provides confluence between these environments. DevOps, as the name suggests, is the Development and Operations teams collaborating on their code and making it work, but between them is the Quality Assurance (QA) team, known for managing and assuring testing. That is where software testing fits in the DevOps paradigm: between development and production.

With growing research in the field of DevOps, software test pipelines are of tremendous importance in research on software delivery and deployment. Based on such growing interest, the next sections present discussions of algorithms and the re-ordering of test cases in the test pipeline, followed by a last section discussing strategies that industry leaders have used over the years to mitigate their challenges.

2.3 Research on Optimization Algorithms

This section discusses related research on software test cases and test pipelines using heuristic algorithms, and the different types of algorithms used to optimize software testing at different stages of the test cycle.

2.3.1 Test case Optimization

A recent research paper [20] proposes a strategy that achieves optimization in terms of the total number of test case executions needed to detect all faults. Their approach, which aims at test case prioritization, applies algorithms to different partitions of tests, where each partition contains test cases of different priorities based on failure detection, the cost of debugging a failure, etc.

There have been several studies on partitioning software tests dynamically and offline. This particular study experimented with 12 algorithms to prove its case.

Further reading on this partitioning approach points to the efficient quicksort algorithm, which uses a divide-and-conquer approach by dividing a large data set into two smaller data sets: low elements and high elements. It then recursively sorts the two data sets.
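The divide-and-conquer idea mentioned above can be shown with a short quicksort sketch. This is a simple textbook-style variant written for readability (it builds new lists rather than sorting in place), and the choice of the middle element as pivot is an assumption made for illustration.

```python
def quicksort(items):
    """Sort a list by splitting it into low and high elements around a pivot."""
    if len(items) <= 1:
        return list(items)  # nothing left to divide
    pivot = items[len(items) // 2]
    low = [x for x in items if x < pivot]     # elements smaller than the pivot
    equal = [x for x in items if x == pivot]  # the pivot and any duplicates
    high = [x for x in items if x > pivot]    # elements larger than the pivot
    # Conquer: sort each part recursively and join the results.
    return quicksort(low) + equal + quicksort(high)


sorted_values = quicksort([5, 3, 8, 1, 9, 2])
```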

This thesis takes a similar approach in using a sorting algorithm to achieve optimization. But rather than sorting test data sets to run different test sets each time, the proposed approach is to accept that all the tests need to be run. This does not fit the principle that exhaustive testing is impossible, but the highlight is that some tests can be prioritized over others based on developer activities, such as a developer’s probability of failing certain tests.

Thereupon, this paper converges on two other principles of software testing: Early Testing and Defect Clustering.

Early Testing: The whole reason behind testing is to detect defects. Therefore, having early testing in the Software Development Life Cycle ensures defects are detected early based on the goals to be reached.

Defect Clustering: Often during testing, it is observed that a large number of defects are found in a small percentage of the software modules. It means that the defects are not spread all across the application but are rather concentrated in a small section. From the Pareto principle, it is known that about 20% of the software modules contain 80% of the bugs.

With these two principles in mind, this thesis argues that early testing is only effective if developers can have earlier feedback for their failed code.

Moving on to defect clustering, the aim is to highlight that if software test cases are sorted individually for developers based on their probability of failing a particular test, the testing of probably defective code is prioritized.

Referring back to the paper [20] discussing partitioning, there was an interesting reference to a rather old paper [8] that features one of the seven principles of testing: "Exhaustive testing is impossible". Their suggested solution was to keep the tests in the test set as different as possible each time a product was re-tested, thereby attempting to maximize test effectiveness. The logic behind this, in simpler words, is that an experienced tester would select new tests to exercise newer functionality, and the objective of that paper was to present an approach that enables such a form of intelligent automated test generation.

In other words, anti-random testing attempts to reduce the number of test cases run before the first failure can be revealed. The aim to reduce the number of test cases run before a failure is revealed is therefore a goal for this research, along with providing instant feedback for those failed tests.

Even though test sprints are smaller today, systems have continued to increase in size and complexity. New tests are continuously added and outdated tests removed from the pipeline. In some respects, test automation allows automatic execution of newly developed test cases as soon as they are added. A known practical advantage of test automation is consistency and reproducibility. Furthermore, an automated test pipeline allows multiple runs of the tests under exactly the same conditions, thus accommodating proper debugging of “random” failures. However, it must be taken into account that an improper layout or ordering of tests can make automation impossible; an understanding of the particular system and its components is therefore essential. Any proposal for test automation may differ depending on the system.
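The idea of keeping tests "as different as possible", mentioned in the discussion of anti-random testing above, can be illustrated with a small greedy sketch. It is only one possible reading of that idea and not the procedure of the cited paper: tests are modelled as binary input vectors, difference is measured with Hamming distance, and both the function names and the all-zeros starting vector are assumptions.

```python
from itertools import product


def hamming(a, b):
    """Number of positions in which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def antirandom_sequence(n_bits, length):
    """Greedily pick input vectors that are far from all previously chosen ones."""
    candidates = list(product((0, 1), repeat=n_bits))
    chosen = [candidates[0]]  # arbitrary start: the all-zeros vector
    while len(chosen) < min(length, len(candidates)):
        remaining = [c for c in candidates if c not in chosen]
        # Next test: the candidate with the largest total distance to earlier tests.
        best = max(remaining, key=lambda c: sum(hamming(c, t) for t in chosen))
        chosen.append(best)
    return chosen


first_two = antirandom_sequence(3, 2)
```

For three-bit inputs, the second vector chosen is the complement of the first, since it differs from it in every position.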

2.3.2 Automation and Machine Learning

In the domain of automation today we talk about Machine Learning. The paradigm shift towards Automated Machine Learning is happening and there is a lot of buzz around choosing algorithms related to hyper parameters that can be tuned. Even though in this case automated machine learning is beyond the scope of a short thesis, it shall be pointed out that having a test system that can compile it’s needs and optimize itself would be ingenious.

There have been numerous studies and surveys on machine-learning-based automated test case development. In other words, the system writes test cases based on the changes made to the system and its past experience with testing the system.


There has been research on test generation models [12] which showed that reachability can only be solved in exponential time. This echoes the principle that it is nearly impossible to test every detail of a system, even through automation (exhaustive testing is impossible). Machine Learning is a large domain with a lot of scope when discussed from the perspective of automated machine learning in software testing, but since that is not the centre of this research, the focus now shifts towards algorithms used in software test automation, which are more relevant to this research.

2.3.3 Search-based Algorithms

There have been many search-based software testing approaches in which optimization algorithms were used to automate the solution of software testing problems.

A review of common search algorithms [10], and of some of the classic testing problems to which the search-based approach has been applied, showed how metaheuristic optimizing search techniques such as genetic algorithms are used to generate test data automatically or partially automatically. Overall, the review showed that a problem-specific fitness function guides the search to good solutions from a potentially infinite search space within a practical time limit.

Another, more recent paper [11] by the same author discusses, as the title says, automated search-based algorithms and how to shift from code coverage to fault coverage. The authors agree that exhaustive testing is impossible.

They also highlight that most search-based software testing architectures concentrate on code coverage, that is, how much of the source code is exercised when a particular test suite is run. They suggest that a fault coverage system would be much cheaper to execute than a code coverage system. They also note that fault databases are available for open source projects today. They believe present search-based software engineering techniques could be used in search-based software testing to learn what reveals faults in systems. Finally, they propose using genetic programming to mutate test requirements, based on faults found while testing, as a direction for research.

An interesting discussion [1] on search-based algorithms and fitness functions was found. Though the paper was not exactly relevant to the field of this thesis, the importance of normalizing the branch distance seemed relevant to dealing with test dependencies, which are a parameter of this research. The paper discusses how test data generation based on code coverage leads to optimization problems. Furthermore, it discusses fitness functions and normalizing functions as part of search-based testing, and proposes a new normalizing function for search-based algorithms that has a lower computational cost. From the perspective of this research, the way the branch distance is normalized and the computational cost is reduced gives an impression of how test dependencies and their depth play an important role in cost computations.

This discussion of search-based testing for optimization sheds light on how this thesis could use a similar notion of optimization, but with an advanced sorting algorithm: the Rule Re-ordering (RR) Algorithm [13], discussed in the next section.

2.3.4 The Rule-Reordering Algorithm

The RR algorithm uses a bubble-sort-like algorithm as its foundation to sort the test cases. But rather than being just any sort-based algorithm, the RR algorithm has the specialty of working with dynamic traffic data, as well as the ability to sort based on a swapping rank until the list is optimal. The algorithm was originally implemented for the dynamic ordering of firewall rules, where it takes firewall rules as its input. It takes into consideration a set of relationships that state every rule's preceding and succeeding rules. Dynamic traffic data also brings many potential conflicts, so the algorithm takes a heuristic approach that resolves the conflicts in the best practical manner to reach a suitable point of optimization. Greedy algorithms can be faster because they make every decision only once, while heuristic algorithms tend to revisit their choices. In this case a heuristic approach is the choice, as this double validation is what provides system stability and compliance. The algorithm is presented below, and it is described in more detail for software testing in the Approach (Chapter 3) and Results (Chapter 4) chapters.


Algorithm 1: A rule re-ordering algorithm

Data: A list of firewall rules
Result: A new and improved ordering of firewall rules

for rx in rules do
    ∆max = 0
    for ry in rules do
        ∆new = 0
        if rx != ry then
            if rx.pos < ry.pos then
                if rx.pos < succeeding_max(ry) AND ry.pos > preceding_min(rx) then
                    if rx.prob < ry.prob then
                        ∆new = (ry.prob − rx.prob) ∗ (ry.pos − rx.pos)
                        if ∆max < ∆new then
                            ∆max = ∆new
                        end
                    end
                end
            else
                if ry.pos < succeeding_max(rx) AND rx.pos > preceding_min(ry) then
                    if ry.prob < rx.prob then
                        ∆new = (rx.prob − ry.prob) ∗ (rx.pos − ry.pos)
                        if ∆max < ∆new then
                            ∆max = ∆new
                        end
                    end
                end
            end
            if ∆max > 0 then
                swap(rx, ry)
            end
        end
    end
end
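The core idea of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration and not the implementation from [13]: the function names, the `must_precede` mapping and the swap-validity check are assumptions, since `succeeding_max` and `preceding_min` are not defined in this section. A swap is accepted here only if it breaks no precedence constraint, and the sketch assumes that the starting order already respects all constraints.

```python
def swapping_rank(order, prob, i, j):
    """Gain of swapping positions i < j: (ry.prob - rx.prob) * (ry.pos - rx.pos)."""
    return (prob[order[j]] - prob[order[i]]) * (j - i)


def swap_is_valid(order, must_precede, i, j):
    """Assumed validity check: swapping positions i < j breaks no constraint.

    must_precede[r] is the set of rules that have to appear before rule r.
    """
    upper, lower = order[i], order[j]
    # Rules that `lower` jumps over when it moves up (including `upper`).
    jumped_by_lower = set(order[i:j])
    if must_precede.get(lower, set()) & jumped_by_lower:
        return False
    # Rules that end up in front of `upper` when it moves down (including `lower`).
    now_before_upper = order[i + 1:j + 1]
    return all(upper not in must_precede.get(r, set()) for r in now_before_upper)


def reorder(order, prob, must_precede):
    """Repeat best-gain valid swaps until no swap with positive rank remains."""
    order = list(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(order)):
            best_gain, best_j = 0.0, None
            # Looking only at j > i still covers every pair once over the outer loop.
            for j in range(i + 1, len(order)):
                gain = swapping_rank(order, prob, i, j)
                if gain > best_gain and swap_is_valid(order, must_precede, i, j):
                    best_gain, best_j = gain, j
            if best_j is not None:
                order[i], order[best_j] = order[best_j], order[i]
                improved = True
    return order


# Hypothetical rules: r2 and r3 must both come after r1.
prob = {"r1": 0.2, "r2": 0.4, "r3": 0.6, "r4": 0.001}
must_precede = {"r2": {"r1"}, "r3": {"r1"}}
print(reorder(["r1", "r2", "r3", "r4"], prob, must_precede))
# → ['r1', 'r3', 'r2', 'r4']
```

Every accepted swap has a positive rank, so the probability-weighted position sum strictly decreases with each swap and the loop terminates.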


Looking at an example of how the RR algorithm re-orders firewall rules helps in understanding how it works.

Example of Rule Reordering Algorithm with Firewall Rules

The implementation of the RR algorithm requires two input files.

The first file contains the rule labels, their initial packet matching probabilities and their dependency relationships. The precedence_relationships field lists the rules that must precede the given rule. The second file contains the actual rules, with the rule label as a comment.

File 1

name,probability,precedence_relationships
A,0.75,B,C
B,0.97,D
C,0.99
D,0.68

File 2

iptables -A FORWARD -i eth0 -p udp -s 190.0.0.0/8 --dport 90 -m state --state NEW -j ACCEPT -m comment --comment "A"

iptables -A FORWARD -i eth0 -p udp -s 190.1.1.0/24 --dport 90:94 -m state --state NEW -j DROP -m comment --comment "B"

iptables -A FORWARD -i eth0 -p udp -s 190.1.2.0/24 -m state --state NEW -j DROP -m comment --comment "C"

iptables -A FORWARD -i eth0 -p udp -s 190.1.1.2 --dport 99 -m state --state NEW -j ACCEPT -m comment --comment "D"


The --comment value is the label of each rule and corresponds to that rule's entry in the first file. When the rule re-ordering algorithm is given these two files, it uses the first file to choose which rules need to be swapped with which, and then makes the swaps in the second file. The algorithm would produce the following result from the original list. The table below shows the un-ordered set on the left and the ordered set on the right.

un-ordered rules | ordered rules
A,0.75,B,C | C,0.99
B,0.97,D | D,0.68
C,0.99 | B,0.97,D
D,0.68 | A,0.75,B,C

Table 2.1: Mapping of the un-ordered rule set to the ordered rule set

Since Rule C has the highest matching probability and no dependencies, it is moved to the top. Rule D is placed next because Rule B is dependent on Rule D. Rule B comes right after, as the remaining Rule A is dependent on Rule B and Rule C. This gives the sorted list presented in the table. The resulting optimally sorted rule list would be:

Sorted Rule List

iptables -A FORWARD -i eth0 -p udp -s 190.1.2.0/24 -m state --state NEW -j DROP -m comment --comment "C"

iptables -A FORWARD -i eth0 -p udp -s 190.1.1.2 --dport 99 -m state --state NEW -j ACCEPT -m comment --comment "D"

iptables -A FORWARD -i eth0 -p udp -s 190.1.1.0/24 --dport 90:94 -m state --state NEW -j DROP -m comment --comment "B"

iptables -A FORWARD -i eth0 -p udp -s 190.0.0.0/8 --dport 90 -m state --state NEW -j ACCEPT -m comment --comment "A"

This is how the Rule Re-ordering algorithm reorders rules: by taking into consideration the rule dependencies and their probabilities. Before making each swap, the algorithm calculates the swapping rank for the swap, and if it is higher than the last swapping rank, the swap is made. Because this example is short, the impact of the swapping rank is not noticeable; it becomes more noticeable when the list is large.
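The ordering in Table 2.1 can be reproduced with a much simpler procedure than Algorithm 1. The sketch below is not the RR algorithm: it is a greedy stand-in that repeatedly places the rule with the highest matching probability among those whose preceding rules have all been placed. The helper names and the inline copy of File 1 are for illustration only.

```python
FILE_1 = """name,probability,precedence_relationships
A,0.75,B,C
B,0.97,D
C,0.99
D,0.68
"""


def parse_rules(text):
    """Return {label: (probability, [labels that must precede it])}."""
    rules = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = [field.strip() for field in line.split(",")]
        rules[fields[0]] = (float(fields[1]), [f for f in fields[2:] if f])
    return rules


def greedy_order(rules):
    """Greedy stand-in: highest-probability rule whose predecessors are placed."""
    waiting = {name: set(preds) for name, (_, preds) in rules.items()}
    order = []
    while waiting:
        ready = [name for name, preds in waiting.items() if not preds]
        if not ready:
            raise ValueError("precedence relationships are cyclic or incomplete")
        best = max(ready, key=lambda name: rules[name][0])
        order.append(best)
        del waiting[best]
        for preds in waiting.values():
            preds.discard(best)
    return order


print(greedy_order(parse_rules(FILE_1)))
# → ['C', 'D', 'B', 'A']
```

On this small input the greedy pass gives the same order as the right-hand column of Table 2.1; on larger rule sets it may differ from what the swap-based RR algorithm produces.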


2.4 Cost-based Optimization

A further study [2], which proposes a simple cost model for comparing test generation tools, argues that there is no single best combinatorial test generator. The authors argue that, depending on the system, the cost of executing a single test could affect the choice of tool. Moreover, they propose that the combinatorial model should play a role in the choice of a test generator. They used a data-mining process to produce two decision trees for making that choice. Experimental results showed that the decision trees helped the test generation process, with a significant reduction of the total testing cost.

However, they observed that their predictor model took some time to compute, which raises the question of the trade-off between the time saved by choosing a good generator and the time consumed by the predictor model.

Another really interesting paper [18] also discusses test cases at the integration testing phase. In particular, it focuses on the cost-benefit analysis of using dependency knowledge. The research was aimed at the problem of selecting and ordering test cases in the integration testing phase. The authors argue that many large embedded systems are still integration-tested manually to avoid complex disasters. They used an online decision support system to select and order test cases based on the results of the last execution of the test cases. They then studied the economic efficiency of testing by embedding a customized return on investment (ROI) calculation in the system. Their experiments showed a positive impact of re-ordering tests using this method. The paper concluded that cost is a relevant parameter in the ordering of tests. It was an interesting read, as its approach is similar to that of this thesis, though it considered only one parameter: cost.

Lastly, one more paper was reviewed in the field of cost analysis for test automation. This paper [3] focused on finding the degree of test automation needed for a given project. The authors designed a simulation model that lets software testers evaluate the performance of test cases with different scales of test activities simulated in the system. The paper gives an overview of the system model, its usage and sample simulation experiments that demonstrate the model. It builds on one of their earlier case studies, in which they proposed a simulation model for a software company in Calgary, Canada, using the system dynamics modeling technique. Their end goal was also to compute the ROI of test automation. They compared the performance and ROI of a range of setups, from fully automated to fully manual tests, to identify the most suitable level of test automation for a given project or context.

Based on the above study, it is noticeable that even though the software testing domain has a lot of research on automation and optimization of test pipelines, there is no clear or direct solution to this problem. Test cases are very specific to the needs of the software in development, so it is not possible to design a one-size-fits-all solution to the problem statement defined earlier. From the background study it is clear that it might be impossible to reach maximum optimization even if the algorithm is capable of it, because certain priority-related choices have to be made based on the interests of the particular organization.

2.5 Industrial Optimization and Feedback Control

Looking specifically at research on DevOps and Continuous Delivery, there have been quite a number of industrial surveys on optimizing software test pipelines. However, they take a different approach to the one this research focuses on.

A case study of an online company [4] summarized more than six years of experience and showcased the technical and organizational challenges in establishing and maintaining an effective continuous delivery pipeline with automated testing as part of the pipeline. The same authors had another publication [17] the year before that discussed practical challenges in test environment management. One of the challenges stated in that paper concerns automation: it highlights the risk of test cases altering the state of test environments. In other words, when the execution order is changed, side effects of one test case may influence the results of others. To address this, they suggest including a reset to the initial state. They also suggest treating infrastructure as code, which according to them is supported by configuration management tools such as Puppet, Chef and Ansible. Research on these practices is still uncommon, but reports from operations teams presenting their work shed light on how they have improved their continuous delivery. Using Puppet to monitor the testing infrastructure code is a smart choice in complex systems, as Puppet can pick up exactly where code has changed, which makes change monitoring continuous. Followed by an optimized testing process that provides early feedback, this makes automation a success.
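The reset-to-initial-state suggestion matters for this thesis because re-ordering only works if tests give the same result in any order. A minimal sketch of the idea, using Python's standard unittest module, is shown below; the shared `ENVIRONMENT` dictionary, the test names and the `run_in_order` helper are hypothetical and only stand in for a real test environment.

```python
import io
import unittest

# Hypothetical shared state that tests could leak into each other.
ENVIRONMENT = {"users": []}


class EnvironmentTests(unittest.TestCase):
    def setUp(self):
        # Reset to a known initial state before every single test.
        ENVIRONMENT["users"] = []

    def test_add_user(self):
        ENVIRONMENT["users"].append("trisha")
        self.assertEqual(ENVIRONMENT["users"], ["trisha"])

    def test_starts_empty(self):
        # Would fail after test_add_user if the state were not reset.
        self.assertEqual(ENVIRONMENT["users"], [])


def run_in_order(test_names):
    """Run the named tests in exactly the given order; True if all pass."""
    suite = unittest.TestSuite(EnvironmentTests(name) for name in test_names)
    runner = unittest.TextTestRunner(stream=io.StringIO())
    return runner.run(suite).wasSuccessful()


print(run_in_order(["test_add_user", "test_starts_empty"]))
print(run_in_order(["test_starts_empty", "test_add_user"]))
```

With the reset in `setUp`, both execution orders pass, which is the property a re-ordering algorithm relies on.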

To argue why tests and early test feedback are a great learning source, we looked at an older, brief research report, "Tests and Test Feedback as Learning Sources" [9]. It is not a software testing research paper; rather, it is a discussion of the influence of test taking and feedback in promoting learning in students.

I found this paper relevant because getting early feedback right after committing code allows developers to fix their bugs faster: the code is probably still fresh in their minds, so detecting their own faults behind a particular bug might be easier. The paper argues that with immediate feedback, and I quote from the paper, "it forces the learner to process and attend to the feedback", which the authors believe can produce substantial gains on later tests. Even though other factors may have been involved, they showed that initially retrieved items were recalled better on the later test than items which were not retrieved earlier.


Moving on to how early feedback relates to test systems, one study [15] is of considerable relevance to this thesis, as its experimental results were positive in terms of coverage and error detection. It showed that feedback-directed random test generation can outperform systematic and undirected random test generation.

This does not directly indicate that the proposed experiment will produce positive results, but it does indicate that feedback-directed systems in general have a positive impact on error detection and debugging.

Specifically, in a large system scenario, a feedback-directed testing procedure in which applications are continuously tested and deployed, with early feedback to developers based on their coding behaviour, could quickly find errors in the system as a whole. Large dynamic environments that require heavy amounts of testing could benefit from such systematic techniques.

We came across a paper analyzing the effectiveness of detecting faults in combinatorial testing [19]. The main purpose of that research was to show that combinatorial testing works best on Boolean specification testing. That was not our main interest, however; we were more interested in the authors' approach to detecting faults and its effectiveness on their test system overall. The third section of the paper proposed a fault-detecting probability, which was relevant to us, so we looked at how they approached and calculated that probability. They introduced a number of failure-causing schemas and then, based on the lower bound of the fault-detecting probability of combinatorial test suites, introduced a theorem and supported their case through experiments.

Lastly, the test automation of NFC ICs [16] is a great motivation for a future practical implementation of this thesis. The authors used Jenkins and NUnit for validation and verification of pre-test runs in their NFC IC development project.

The system is usable in two ways. In the first, a developer may execute tests manually at their desk before committing code, and may also reproduce failed tests. The other is the automated approach, which ensures all tests are run against the same device and configuration. The system in practice is neither highly complex nor highly automated, but the approach and implementation go a long way.


2.6 Summary

Compiling the discussions above, many of the studies involve a certain level of abstraction that makes practical implementation slightly harder to envision. However, what was gained through this background research is the core principles and possible grounds for further research based on the idea of this thesis. By taking into account the software test architecture and the complexity of implementing the algorithm, the idea has evolved over time. At this point, the project ambition is to implement a close-to-real-world model and orchestrate tests on that model.

Table 2.2 shows a summary of all the papers reviewed in this chapter.


Category | Publication
2.2 Software Testing Foundations | Foundations of software testing: ISTQB certification [5]
2.2 Software Testing Foundations | Design principles in test suite architecture [14]
2.2 Software Testing and Continuous Delivery | Reliable Software Releases through Build, Test, and Deployment Automation [6]
2.3 Test case Optimization | A cost-effective software testing strategy employing online feedback information [20]
2.3 Test case Optimization | Anti-random testing: Getting the most out of black-box testing [8]
2.3 Automation and Machine Learning | Test generation from timed pushdown automata with inputs and outputs [12]
2.3 Search-based Algorithms | Search-based software testing: Past, present and future [10]
2.3 Search-based Algorithms | Automated search for good coverage criteria: moving from code coverage to fault coverage through search-based software engineering [11]
2.3 Search-based Algorithms | It really does matter how you normalize the branch distance in search-based software testing [1]
2.3 Sort-based Algorithm | Dynamic Ordering of Firewall Rules Using a Novel Swapping Window-based Paradigm [13]
2.4 Cost-based Optimization | Using decision trees to aid algorithm selection in combinatorial interaction tests generation [2]
2.4 Cost-based Optimization | Cost-Benefit Analysis of Using Dependency Knowledge at Integration Testing [7]
2.4 Cost-based Optimization | Cost-effective regression testing through Adaptive Test Prioritization strategies [18]
2.4 Cost-based Optimization | When to automate software testing? A decision-support approach based on process simulation [3]
2.5 Industrial Optimization and Feedback Control | Automated testing in the continuous delivery pipeline: A case study of an online company [4]
2.5 Industrial Optimization and Feedback Control | Practical Challenges in Test Environment Management [17]
2.5 Industrial Optimization and Feedback Control | Tests and Test Feedback as Learning Sources [9]
2.5 Industrial Optimization and Feedback Control | Test automation for NFC ICs using Jenkins and NUnit [16]
2.5 Industrial Optimization and Feedback Control | Feedback-directed random test generation [15]
2.5 Industrial Optimization and Feedback Control | Why combinatorial testing works: Analyzing minimal failure-causing schemas in logic expressions [19]

Table 2.2: Compilation of Research Papers by Topic


Chapter 3 Approach: Planning and Design of Algorithm

To put the problem statement into practice, an implementation of the Rule Re-ordering Algorithm [13] is needed.

Revisiting the problem statement, the terms marked below are those that have been operationalized and are the focus of this thesis.

To implement a re-ordering algorithm¹ in the test automation pipeline² that could aid in optimizing³ the order of probable failed tests⁴ while respecting test dependencies and catering faster feedback⁵ to developers in order to facilitate optimal test automation⁶.

By referring to the problem statement, this thesis proposes an approach to design and implement a prototype of the RR algorithm that is adapted to software test automation. The approach involves using the re-ordering algorithm¹ discussed earlier; the RR algorithm is able to dynamically re-order firewall rules based on each rule's probability, dependency relationships and position in the firewall. It thereby finds the most optimal swapping window, which in turn reduces the packet matching time.

For this research, the implementation of the algorithm will be approached from a test automation² perspective. The algorithm is already designed to re-order³ based on the probability of a parameter, which in our case is probable failed tests⁴, while honoring all dependent tests to preserve the integrity of the pipeline. Furthermore, the goal of this thesis is to show that optimal test automation⁶ can be achieved by providing faster feedback⁵ to developers for failed tests.

Therefore, adapting the algorithm to accommodate software tests, their dependent tests and their per-user failing probability seems promising. This further implies that other parameters, based on past research, can be considered for optimization, which may help the RR algorithm fit the software testing paradigm better.

3.1 Objectives

The objective here is to achieve the two main goals of this thesis, and some assumptions have therefore been made about the system environment for the implementation. The two main goals are:

• To optimize the tests in a test automation pipeline and

• To provide early feedback for failed tests.

To back up these two goals, two main assumptions have been made about the system for this research: it has to be a professional and automated test environment.

1. Automation: An automated test environment means that every step after a developer commits code is automated, meaning there is Continuous Integration (CI) in place. In order to automate the testing process, manual interference must be avoided.

2. Professional test environment: A professional test environment here refers to one that monitors and delivers reports, logs and charts so that changes can be visualized.

Moving forward in this chapter, the approach being used to implement the algorithm to suit this project is discussed.

3.2 Defining Workflow of Software Testing

This section discusses the workflow from development to testing and the tasks that need to be covered. Not all organizations follow the same pattern for their test automation pipeline; every organization has its own suitable workflow for testing. However, these pipelines are not entirely unique: they share a standard workflow which is rearranged to suit the organization's needs. This section therefore has two subsections, one discussing what the standard workflow in the industry looks like and the next discussing the proposed workflow for this thesis.


3.2.1 Typical Workflow for Automation Testing

The typical workflow in an automation pipeline was described in the background chapter; below is a short reminder before moving on to describe the proposed approach to applying the prototype.

1. It all starts with a developer committing their work to a central version control system.

2. The latest changes are detected by the git server. The changes are committed into a test environment for testing.

3. The test environment is built and tested immediately.

4. If the test build is successful, the developer is informed of the success.

5. Code is then committed to the deployment environment and built automatically.

The environment described above can differ considerably depending on the organization, its requirements and its priorities. This research will therefore involve developing an imitation of a real continuous delivery pipeline scenario that resembles an average continuous delivery pipeline, with the focus on exploring the benefits of implementing the algorithm in this scenario. The proposed workflow is described next.

3.2.2 The Proposed Scenario

Below is a description of the proposed test automation scenario which will be used to prepare a design proposal in the next section for development and implementation of a prototype.

Being able to visualize the scenario will help propose a design which can be approached in a way that will mimic the test automation pipeline. The scenario proposed below has two sections; the front end and the back end.

1. The front-end : Assigning test sets to users

2. The back-end test : Sorting tests based on User behaviour.


The Front-End: Assigning test sets to Users

1. An imitation of user data commits could be saved in a file and fed to a detector script.

2. The detector script would then parse the file to get the name of the user who committed and the revision number. That data could then be passed on to another script, called the testpointer script.

3. The testpointer script would check the user name to determine which test set should be started. The chosen test set will then be run for the particular user.

4. The latest changes are detected by the git server. The changes are committed into a test environment for testing.

5. The test environment is built and tested immediately.

6. If the test build is successful, the developer is informed of the success.

7. Code is then committed to deployment environment and built automatically.
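The first three front-end steps can be sketched as two small functions. This is only an illustration under stated assumptions: the commit file format (one `user,revision` pair per line), the user `alex`, the revision numbers and all function names are hypothetical, and the stored test sets are placeholders for the lists that the back end would produce.

```python
# Hypothetical commit file: one "user,revision" pair per line.
COMMITS = """simon,r102
alex,r103
"""

# Fallback order for users without a personalized test set (assumption).
DEFAULT_TEST_SET = [
    "network_tests",
    "database_integration_tests",
    "security_tests",
    "user_interface_tests",
]

# Placeholder for the sorted lists stored by the back end.
SORTED_TEST_SETS = {"simon": ["user_interface_tests", "network_tests"]}


def detect_commits(text):
    """Detector: yield (user, revision) pairs parsed from the commit file."""
    for line in text.splitlines():
        line = line.strip()
        if line:
            user, revision = line.split(",", 1)
            yield user.strip(), revision.strip()


def point_to_test_set(user, sorted_sets=SORTED_TEST_SETS, default=DEFAULT_TEST_SET):
    """Testpointer: pick the test set that should be started for this user."""
    return sorted_sets.get(user, default)


def dispatch(text):
    """Combine both scripts: which test set runs for which commit."""
    return [(user, rev, point_to_test_set(user)) for user, rev in detect_commits(text)]


for user, revision, tests in dispatch(COMMITS):
    print(user, revision, tests)
```

Keeping the detector and the testpointer separate mirrors the two-script split described above, so the lookup table can be replaced by the back-end output without touching the parsing.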

The Back-End : Sorting tests based on User behaviour

1. A dataset of user names and their probability of failing a particular test can be generated based on user behaviour data collected from user commits and their build results.

2. These user data can then be used to create separate user data test sets which can be fed to the algorithm for re-ordering.

3. The algorithm then re-orders each user data set separately and stores them. These sorted lists are used by the testpointer script at the front end to direct each user to their test set.
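Step 1 of the back end can be sketched as a simple frequency count. The record layout `(user, test_name, passed)` and the function name are assumptions made for illustration; the history below is made-up data, not results from this thesis.

```python
from collections import defaultdict


def failure_probabilities(build_results):
    """Estimate per-user failure rates from (user, test_name, passed) records."""
    runs = defaultdict(int)
    failures = defaultdict(int)
    for user, test_name, passed in build_results:
        runs[(user, test_name)] += 1
        if not passed:
            failures[(user, test_name)] += 1
    table = defaultdict(dict)
    for (user, test_name), count in runs.items():
        table[user][test_name] = failures[(user, test_name)] / count
    return dict(table)


# Made-up build history for one user.
history = (
    [("simon", "user_interface_tests", False)] * 4
    + [("simon", "user_interface_tests", True)]
    + [("simon", "network_tests", False)]
    + [("simon", "network_tests", True)] * 9
)

for test_name, rate in failure_probabilities(history)["simon"].items():
    print(test_name, round(rate, 2))
```

The resulting per-user tables are the kind of input that step 2 would turn into separate test sets for re-ordering.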

The figure below illustrates the front-end and back-end in one work flow to be able to visualize the flow of data. This illustration along with the description proposes the scenario on which the design will be based.


Figure 3.1: The Proposed Workflow


To make the proposed scenario easier to visualize, an actual scenario is described below showing how the tests of two users can be prioritized based on their probabilities. The example shows sample tests and how they would be re-ordered.

Proposed Scenario

Consider two users working on two different pieces of code.

User 1: Trisha is working on adding some new features to the security authentication system.

User 2: Simon is working on fixing some bugs in the user interface design.

The typical workflow for their integration testing with their dependencies in place is as follows:

test name dependency

1 network_tests -

2 database_integration_tests network_tests

3 security_tests network_tests

4 user_interface_tests -

Firstly the idea is to collect data on each user’s test compilation results and collect their individual rate of failing each test. Therefore in this case each user’s customized pipeline will look as follows:

User 1: Trisha - Security Engineer

test name percentage

1 network_tests 20%

2 database_integration_tests 40%

3 security_tests 60%

4 user_interface_tests 0.1%

User 2: Simon - Web Designer

1 network_tests 10%

4 user_interface_tests 80%


Anticipated Optimization Solution

With the algorithm in place, the goal is to optimize these lists such that the tests with the higher rates of failure are at the top. An optimized list should therefore look as follows:

User 1: Trisha - Security Engineer

1 network_tests 20%

4 user_interface_tests 0.1%

User 2: Simon - Web Designer

1 user_interface_tests 80%

2 network_tests 10%

The goal of this research is to see the effects of optimizing a test pipeline. The next step would be to create personalized ordered test sets per user, optimized so that the user gets instant feedback for failed tests while the integrity of the dependent tests is still honored. This is achieved by first using training data sets to train the algorithm to create these optimized pipelines of test sets for each user. These test sets are expected to be further optimized based on changes in user behaviour.
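One way to quantify "faster feedback" is the expected number of tests that run before the first failure is reported. The sketch below uses Simon's failure rates from the scenario above (10% for network_tests, 80% for user_interface_tests) and rests on two assumptions that are not part of the thesis design: tests fail independently of each other, and the pipeline stops at the first failure. The function name is hypothetical.

```python
def expected_tests_run(fail_probs):
    """Expected number of tests executed when the run stops at the first failure.

    fail_probs lists the failure probability of each test in execution order.
    """
    expected = 0.0
    reach = 1.0  # probability that execution reaches the current test
    for p in fail_probs:
        expected += reach
        reach *= 1.0 - p
    return expected


original = [0.10, 0.80]   # network_tests, user_interface_tests
optimized = [0.80, 0.10]  # user_interface_tests, network_tests

print(round(expected_tests_run(original), 2))   # → 1.9
print(round(expected_tests_run(optimized), 2))  # → 1.2
```

Under these assumptions, moving the test most likely to fail to the front lowers the expected number of executed tests from 1.9 to 1.2 for this user.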

3.3 Design Proposal

Firstly, the algorithm has to be re-written for software testing. The algorithm will take as input a list of tests with their position number, the probability of the test failing, and the precedent and subsequent tests. These inputs will be needed for initial algorithm testing and further computation.

Considering the two goals of this thesis as mentioned earlier, (1) to use the algorithm to produce an optimal test pipeline and (2) hence to provide faster feedback for failed tests, this thesis will concentrate on creating the back-end workflow described in section 3.2.2 above. The planned workflow for this thesis will therefore look as follows:

• A test set generator script to generate test sets as mentioned in back-end (3.2.2) description.

• A test re-ordering algorithm script which takes in the data sets and reorders the tests and stores them in a separate file.

• The sorted lists will be used for analyzing the impact of sorting tests, whether it is optimal or not.

• While some of the impacts and analysis will be done manually others may require running scripts. These will be further explained in the next section.

Overall, a number of scripts other than the algorithm need to be written. The next sections review the terminology used for designing the scripts, and the following section contains an analysis of which scripts need to be written.

3.3.1 Proposed Scripts to be written

This section discusses the main scripts that may be written to help organize the work flow moving forward.

Test Generator Script

A script will be written that will generate a set of tests which can be used in testing the prototype of the algorithm script. The script shall generate a set of tests like the following;

testname, initial_position, probability_of_failure, pre_tests

Listing 1: Data Format

firewall_rule_test,5,0.54, networktraffic_test, load_test

Listing 2: Sample Data
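A generator for rows in the format of Listing 1 could look like the sketch below. It is only an illustration: the function name, the `test_N` naming scheme, the `max_deps` parameter and the uniform random probabilities are assumptions, not the generator used in this thesis. Prerequisites are drawn only from tests at earlier positions, which keeps the dependency relationships free of cycles.

```python
import random


def generate_tests(n, max_deps=2, seed=None):
    """Generate n rows: testname, initial_position, probability_of_failure, pre_tests."""
    rng = random.Random(seed)
    names = [f"test_{i}" for i in range(1, n + 1)]
    rows = []
    for position, name in enumerate(names, start=1):
        probability = rng.random()
        # Only earlier tests may be prerequisites, so no cycles can occur.
        earlier = names[:position - 1]
        dep_count = rng.randint(0, min(max_deps, len(earlier)))
        pre_tests = rng.sample(earlier, dep_count)
        rows.append(", ".join([name, str(position), f"{probability:.2f}"] + pre_tests))
    return rows


for row in generate_tests(5, seed=1):
    print(row)
```

Passing a fixed `seed` makes a generated test set reproducible, which is useful when the same input has to be re-ordered several times for comparison.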

Test Re-ordering Algorithm Script

A separate script will be written for the test re-ordering algorithm, which will take as input data in the format shown in Listing 2 above. The re-writing
