A Hybrid of Bacterial Foraging and Differential Evolution -based Distance of Sequences

(1)

A Hybrid of Bacterial Foraging and Differential Evolution -based Distance of Sequences

Muhammad Marwan Muhammad Fuad

The University of Tromsø - The Arctic University of Norway, NO-9037 Tromsø, Norway

Abstract

In a previous work we presented a new distance that we called the sigma gram distance, which is used to compute the similarity between two sequences. This distance is based on parameters which we computed through an optimization process that used the artificial bee colony; a bio-inspired optimization algorithm. In this paper we show how a hybrid of two optimization algorithms; bacterial foraging and differential evolution, when used to compute the parameters of the sigma gram distance, can yield better results than those obtained by applying artificial bee colony. This superiority in performance is validated through experiments on the same data sets to which artificial bee colony, on the same optimization problem, was tested.

Selection and peer-review under responsibility of KES International.

Kewwords: Bacterial Foraging, Differential Evolution, Sigma Gram Distance.

1. Introduction

Similarity search is an important problem in computer science. This problem has several applications in many domains such as text, video and image retrieval, pattern recognition, bioinformatics, web search, fingerprint databases, and many others. The similarity search problem involves a collection of data objects D and a given query q. The task of this problem is to retrieve the data objects in D which are “close” to q according to some semantics of closeness. This closeness or similarity between two data objects is measured using a similarity measure or when satisfying certain axioms it is usually called a distance metric. The metric model of handling the similarity search problem has been widely used with different data types.

Time series data is one of the data types to which the metric model is frequently (but not always) applied to handle different tasks such as classification and clustering.

Time series are high-dimensional data, so there are different techniques to reduce their dimensionality, among these techniques symbolic methods have attracted particular attention.

The main distance used to compare two strings is the Edit Distance (ED) [1] which is defined as the minimum number of delete, insert, and change operations needed to transform string S into string T. However, this distance has its limitations because it considers local similarity only.

In [2] we presented a new distance metric, the Sigma Gram distance (SG) that is applied to sequences. SG uses parameters which we computed using an optimization algorithm called Artificial Bee Colony (ABC).

Bio-inspired, also called nature-inspired, optimization is a rapidly growing domain of research, and new algorithms emerge constantly. Yet these algorithms may have their own shortcomings in certain optimization problems. One of the new techniques that have been successfully used to overcome these drawbacks of optimization algorithms is to couple two optimizers to produce a hybrid one that has the advantages of each of the optimizers.

In this paper we use a hybrid of two bio-inspired optimization algorithms to compute the parameters of SG and we show how the hybrid algorithm outperforms the aforementioned ABC algorithm.

The rest of this paper is organized as follows: Section 2 presents related work. The hybrid algorithm is introduced in Section 3, and the comparison with ABC is conducted in Section 4. We conclude this paper in Section 5.

2. Related Work

Strings, also called sequences or words, are a way of representing data. This data type exists in many fields of computer science such as molecular biology where DNA sequences are represented using four nucleotides which correspond to the four bases: adenine (A), cytosine (C), guanine (G) and thymine (T). This can be expressed as a 4-symbol alphabet. Protein sequences can also be represented using a 20-symbol alphabet which corresponds to the 20 amino acids. The edit distance [1] is the main distance that is applied to compute the similarity between

(2)

two strings. It is defined as the minimum number of delete, insert, and substitute operations needed to transform one sequence S into another sequence T.

In a previous work [2] we presented an extension of the edit distance, which is based on the sum of n-grams.

The proposed distance SG is defined as follows:

LetΣbe a finite alphabet, and let Σ^*be the set of strings on Σ. Given n, Let _a⁽^S⁾

f n be the frequency of the n- grama_ninS, and _a⁽^T⁾

f n be the frequency of the n-grama_nin T , where S,T are two strings in

Σ

^*. Let Nbe the set of integers, and N⁺ the set of positive integers.

Let g:N⁺×Σ^*→N

g

( )

n,S =n if 1≤n≤ S g

( )

n,S = S +1 if S <n Then SG is defined as:

( ) _∑

⁽ ⁾

( ) ( ) _∑ (

^{( )} ^{( )}

)

= ∈ 









 + − − + − ⋅

=

T , S max

1 n

T a S a A a

n _n _n

n n

f , f min 2

2 T , n g S , n g T S . T

, S

SG λ (1)

whereS ,T are the lengths of the two strings S,T respectively, and where λ_n∈R⁺∪

{ }

0 .

Determining the value of the parametersλ_n is not a trivial task. In [2] the value of λ_nwas calculated as the output of an optimization problem. The optimization algorithm we used was artificial bee colony (ABC) [3], which is a bio-inspired optimization algorithm based on the foraging behavior of bees. In ABC each food source represents a potential solution of the optimization problem and the quality of the food represents the value of the objective function to be optimized. The control parameters of ABC are the population size (the number of food sources) pop_size , the number of cycles nr_cycles , and the number of trials of a certain food source max_nr.

3. A Synergy of Bacterial Foraging and Differential Evolution

Different optimization algorithms have different qualities. The principle of hybridization is to benefit from this fact by combining two optimization algorithms to obtain a new optimization algorithm that takes advantage of the strengths of the two methods. In this paper we are concerned with the hybridization of two optimization algorithms; Differential Evolution (DE) and Bacterial Foraging (BF).

3.1. Differential evolution

Differential Evolution (DE) is a bio-inspired optimization algorithm based on the principles of genetics and natural selection. DE is one the most powerful stochastic optimization algorithms for continuous parameters [4].

DE has the same elements as a standard evolutionary algorithm; i.e. a population of individuals, selection according to fitness, crossover, and random mutation. DE creates an environment in which a population of individuals, representing solutions to a particular problem, is allowed to evolve under certain rules towards a state that minimizes the value of a function which is usually called the fitness function.

As with other evolutionary algorithms, the first step of DE is defining the problem variables and the fitness function. The range of the variable values can be constrained or unconstrained. A particular configuration of variables produces a certain value of the fitness function and the objective of DE is to find the configuration that gives the optimal value of the fitness function.

DE has many variations, but in the following we present the classical DE. DE starts with a collection of randomly chosen individuals constituting a population whose size is pop_size. Each of these solutions is a vector of p dimensions and it represents a possible solution to the optimization problem. The fitness function of each individual is computed. In the next step for each individual of the population, which we call the target vector Ti

at this phase, three mutually distinct individuals: V_r₁ ,V_r₂

,V_r₃

, and different from T_i

, are chosen at random from the population. The donor vectorD

is formed as a weighted difference of two of Vr1

,Vr2

,Vr3

, added to the

(3)

third; i.e. D Vr1 F

(

Vr2 Vr3

)

− +

= . Fis called the mutation factor or the differentiation constant and it is one of the control parameters of DE. Fis usually chosen from the interval

[ [

⁰^,¹ ^.

The trial vector R

is formed from elements of the target vector T_i

and elements of the donor vectorD according to different schemes. In the following we present the crossover scheme presented in [5] which we adopt in this paper; an integer Rndis randomly chosen among the dimensions

[ ]

1,p . This guarantees that at least one of the dimensions will be changed. Then the trial vector R

is formed as follows:

( ) ( [ [ ) ( )





 + − < ∨ =

= t otherwise

i Rnd C

, rand if t

t F t t

j , i

r j

, i r

, i r , i r , i i

1

3 0

2

1 (2)

where i=1,...,p. C_ris the crossover constant, which is another control parameter. The control parameters of DE are determined by the algorithm designer.

The next step of DE is selection. This step decides which of the trial vector and the target vector will survive in the next generation and which will die out. The selection is based on which of the two vectors; trial and target, yields a better value of the fitness function.

Crossover and selection repeat for a certain number of generationsNrGen, which is the third control parameter of DE. Most algorithms add a stopping criterion, which terminates DE if met, even if NrGen has not been reached.

3.2. Bacterial foraging

Bacterial Foraging (BF) is an optimization algorithm which is inspired by the foraging behavior of the Escherichia coli (E. coli) bacteria. The principle of BF is that natural selection tends to eliminate animals with poor foraging strategies and either replaces them with others that have better foraging strategies or shapes them into ones which have these desirable strategies [6]. BF formulates this process as an optimization problem.

In the following we present a brief description of BF taken mainly from [7]; during foraging locomotion of E.

coli is achieved by a set of flagella which, when rotating in the clockwise direction, cause the bacterium to tumble, and when rotating in the counterclockwise direction enable the bacterium to swim. These two movements are known as chemotaxis. Figure 1 shows the tumbling and swimming chemotactic movements. The purpose of chemotaxis is to help the bacterium approach or avoid nutrient or noxious substance gradients.

Sudden environmental changes may destroy the chemotactic progress causing the elimination and dispersal of a group of bacteria.

Given a function f

( )

θ ;θ∈R^p (p is the number of parameters) to be minimized. BF finds the minimum of f by applying four mechanisms; chemotaxis, swarming, reproduction, and elimination-dispersal, which we will illustrate shortly, but let us first present a few definitions which are necessary to understand these mechanisms: a chemotactic step is a tumble followed by another tumble, or a tumble followed by a swim. Table 1 summarizes the symbols we are going to use to describe BF.

Fig. 1. Chemotactic movement: (a) swimming (b) tumbling

Flagella rotating counterclockwise: the bacterium swims Flagella rotating clockwise: the bacterium tumbles

(4)

Nb The number of bacteria in the population Nc The number chemotactic steps

Ns The swimming length

Nre The number of reproduction steps

Ned The number of elimination-dispersal events Ped The probability of elimination-dispersal

C(i) The size of the step taken in the random direction determined by the tumble

Table 1. The symbols used in the description of bacterial foraging

The position of each member of the population of Nb bacteria at the j^th chemotactic step, k^th reproduction step, and l^th elimination-dispersal event is denoted by

( ) { ( )

b

}

i j,k,l | i 1,2,...,N k

, j , i

P = θ = .

We now describe the four mechanisms we mentioned earlier in this section:

• Chemotaxis: Let ^θⁱ

(

^j^,^k^,^l

)

be the i^thbacterium at the j^th chemotactic step, k^th reproduction step, and l^th elimination-dispersal event, then the movement of the bacterium can be represented by :

( ) ( ) ( ) ( ) ( ) ( )

i i i i

C l , k , j l , k , 1

j T

i i

∆

∆ θ ∆

θ + = + (3)

where Δ is a vector in the random direction whose elements lie in the interval [-1, 1].

• Swarming: E. coli demonstrate a swarming behavior in that they travel in rings of bacteria which move up the nutrient medium when they are placed in the center of a semisolid matrix with a single nutrient chemo- effecter. When simulated by a high level of succinate the bacteria release an attractant aspartate which helps them aggregate into groups and thus move as a swarm. The cell-to-cell signal in the swam can be represented by the following function:

( )

( ) ( ( ) ) ( )

( )

∑ ∑

∑ ∑ ∑

= =

= = =























− −

−

+























− −

−

=

b

b b

N

1 i

p

1 m

i 2 m m repellant repellant

N

1 i

N

1 i

p

1 m

i 2 m m attractant attractant

i cc cc

exp . h

exp . d )

l , k , j , f l

, k , j P , f

θ θ ω

θ θ θ

(4)

where dattractant , ωattractant , hrepellant, ωrepellant are coefficients to be chosen by the algorithm designer.

The objective function f_cc

(

θ,P

(

j,k,l

) )

is to be added to the original objective function to present a time varying objective function in that if many cells come close together there will be a high amount of attractant and hence an increasing likelihood that other cells will move towards the group. This produces the swarming effect [6].

• Reproduction: Through this process the least healthy bacteria die out and the healthier ones will undergo cell division to produce two daughter bacteria. This guarantees that the swam size will remain stable.

• Elimination and dispersal: There might be a gradual or sudden change in the environment where the bacteria live. As a result, the bacteria in a certain region are killed or a group might be dispersed into another location. This has two effects on chemotaxis; the first is destroying the chemotactic progress. The second is that the new bacteria might be placed at locations with a better food source, thus assisting chemotaxis.

3.3. Hybridization of BF and DE

Compared with other bio-inspired optimization algorithms, BF possesses a poor convergence behavior over multi-modal and rough fitness landscapes. Its performance is also heavily affected with the growth of problem

(5)

dimensionality [8]. On the other hand, DE may suffer from stagnation; i.e. the inability of progressing towards global optima. DE may also suffer from premature convergence. To overcome these problems the authors of [9]

proposed an optimization algorithm, called Chemotactic Differential Evolution (CDE), which is based on hybridizing DE and BF by integrating some features from both of these optimizers. The experiments conducted have shown that CDE outperforms both DE and BF.

In CDE each trial vector first undergoes an adaptive computational chemotaxis. The trial vector is viewed as an E. coli bacterium. During chemotaxis, the bacterium which is close to a noxious substance takes a larger chemotactic step to move towards nutrient substances. Before each move, it is ensured that the bacterium moves in the direction of increasing nutrient substance concentration; i.e. a region with smaller objective function value.

After this, it is subjected to DE mutation. For the trial vector, three vectors, other than the previous one, are selected, one of which is added with a scaled difference of the remaining two. The produced vector probabilistically interchanges its components with the original vector. Offspring vector replaces the original one if the objective function value is smaller for it. The process is repeated several times over the entire population in order to obtain the optimal solution [9].

4. Performance Evaluation

The aim of our experiments is to compare the performance of CDE with that of ABC which we presented in [2] on the same optimization problem which is a classification task of symbolically represented time series.

A time series S is an ordered collection:

S=

{ (

t1,v1

) (

,t2,v2

) (

,...,tn,vn

) }

(5) where t₁<t₂ <...<t_n, and where v_iare the values of the observed phenomenon at time points t_i.

Time series data mining handles several tasks such as classification, clustering, similarity search, motif discovery, anomaly detection, and others. Time series are high-dimensional data so they are usually processed by using representation methods that are used to extract features from these data and project them on lower- dimensional spaces.

The Symbolic Aggregate approXimation method (SAX) [10] is one of the most important representation methods of time series. SAX is applied as follows:

1-The time series are normalized.

2-The dimensionality of the time series is reduced using PAA [11], [12].

3-The PAA representation of the time series is discretized by determining the number and location of the breakpoints. Their locations are determined using Gaussian lookup tables. The interval between two successive breakpoints is assigned to a symbol of the alphabet, and each segment of PAA that lies within that interval is discretized by that symbol.

The last step of SAX is using the following similarity measure:

( ) ∑ ⁽ ⁽ ⁾ ⁾

=

≡ ^N

i

i i, rˆ sˆ N dist

Rˆ n , Sˆ MINDIST

1

2 (6)

Wherenis the length of the original time series, Nis the length of the strings (the number of the segments),Sˆ andRˆ are the symbolic representations of the two time series SandR, respectively, and where the function

) (

dist is implemented by using the appropriate lookup table.

The objective function of the optimization problem of our experiments is the error of a time series classification task based on the first nearest-neighbor (1-NN) rule using leaving-one-out cross validation. This means that every time series is compared to the other time series in the dataset. If the 1-NN does not belong to the same class, the error counter is incremented by 1. The parameters of the optimization problem are λ_nin relation (1), in other words, we compute λ_nthat minimize the classification error using ABC as an optimizer (which is the optimizer used in [2] ) which we refer to as ABC-SG, and compare that with the optimal values of

λn when those λ_nvalues are obtained by using CDE as an optimizer in (1), which we refer to as CDE-SG.

(6)

In our experiments we used the same datasets on which ABC-SG was tested in [2]. These datasets are available at UCR [13].

As indicated earlier, the tested methods used symbolically represented time series This means that the time series were transformed to symbolic sequences using the first three steps of SAX presented earlier in this section, but instead of using MINDIST given in relation (6), we use ABC-SG (or CDE-SG). The parameters λ_n in relation (1) are computed using ABC (or CDE). This means, for each value of the alphabet size we formulate an ABC (or CDE) optimization problem where the fitness function is the classification error, and the parameters of the optimization problem areλ_n. Practically ncan take any value that does not exceed that of the shortest string of the two stringsS,T. However, in the experiments we conducted n∈

{

1,2,3

}

because these are the values of interest for time series.

The control parameters for CDE were the following; the number of bacteria in the population Nb is 20, the differentiation constant Fwas set to 0.9, and the crossover constant C_r was set to 0.5. The number of chemotactic steps Nc was set to 5, the swimming length Ns was set to 4. The number of reproduction steps Nre

was 4, the number of reproduction steps Ned was 2, and Ped , the elimination-dispersal probability, was set to 0.25. The dimension of the problem p is that of n. As for λ_n, their values are in fact unconstrained, but for simplicity we optimized them in the interval

[ ]

⁰^,² ^.

As for ABC, the control parameters were the same used in [2] ; the number of cycles nr_cycles was set to 20, and the number of trials of a certain food source max_nr was set to 10. As for the population size, it is the same as that for CDE.

For each dataset we first the apply the optimizer (ABC or CDE) on the training datasets to get the vector λ_n that minimizes the classification error on these training datasets, then we utilize this optimal λ_nvector on the corresponding testing datasets to get the final classification error for each dataset.

In Table 2 we present some of the results we obtained for alphabet size equal to 3, 10, and 20, respectively, which were the values on which ABC-SG was tested.

Beef

ABC-SG CDE-SG

n=1 n=2 n=3 n=1 n=2 n=3

α*=3 0.567 0.567 0.567 0.533 0.533 0.500

α =10 0.500 0.500 0.467 0.467 0.467 0.433

α =20 0.333 0.367 0.367 0.333 0.333 0.333

(*: α is the alphabet size)

Coffee

ABC-SG CDE-SG

n=1 n=2 n=3 n=1 n=2 n=3

α =3 0.571 0.393 0.357 0.393 0.357 0.357

α =10 0.214 0.286 0.214 0.214 0.179 0.179

α =20 0.143 0.071 0.143 0.143 0.071 0.071

ECG200

ABC-SG CDE-SG

n=1 n=2 n=3 n=1 n=2 n=3

α =3 0.190 0.210 0.240 0.180 0.210 0.220

α =10 0.200 0.220 0.220 0.200 0.210 0.210

α =20 0.230 0.230 0.260 0.220 0.220 0.250

(7)

Gun_Point

ABC-SG CDE-SG

n=1 n=2 n=3 n=1 n=2 n=3

α =3 0.193 0.193 0.180 0.180 0.180 0.146

α =10 0.146 0.127 0.133 0.133 0.127 0.120

α =20 0.087 0.073 0.073 0.053 0.053 0.067

FaceFour

ABC-SG CDE-SG

n=1 n=2 n=3 n=1 n=2 n=3

α =3 0.057 0.057 0.057 0.057 0.045 0.045

α =10 0.045 0.057 0.114 0.045 0.045 0.068

α =20 0.114 0.114 0.102 0.090 0.102 0.102

OSULeaf

ABC-SG CDE-SG

n=1 n=2 n=3 n=1 n=2 n=3

α =3 0.351 0.343 0.331 0.331 0.331 0.322

α =10 0.298 0.306 0.298 0.298 0.298 0.298

α =20 0.322 0.331 0.331 0.306 0.322 0.322

Table 2. Comparison between ABC-SG and CDE-SG

The results show that the classification errors of CDE-SG, and for all the datasets shown, are equal or smaller than those of ABC-SG and for all values of the alphabet size.

5. Conclusion

In this paper we applied a hybrid optimization algorithm; chemotactic differential evolution - CDE, to compute the parameters λ_nof SG distance, which minimize the error of a time series classification task. We compared this optimizer with another one; artificial bee colony- ABC, and we showed experimentally that CDE gives better results.

This paper shows the advantages of coupling optimization methods to produce a new hybrid method to improve the performance of stand-alone optimizers. However, the resulting hybrid method may have too many control parameters that more research should be conducted to reduce the number of these control parameters.

References

[1] Wagner, R.A., Fischer, M. J. : The string-to-string correction problem, Journal of the Association for Computing Machinery, Vol. 21, No.

I, January 1974, pp. 168-173.

[2] Muhammad Fuad, M.M. , 2012. “ABC-SG: a new artificial bee colony algorithm-based distance of sequential data using sigma grams”.

The Tenth Australasian Data Mining Conference - AusDM 2012, Sydney, Australia.

[3] Karaboga, D., 2005. “An idea based on honey bee swarm for numerical optimization.”. Technical Report TR06, Erciyes University, Engineering Faculty, Computer Engineering Department.

[4] Das, S., and Suganthan, P.N.: Differential evolution: a survey of the state-of-the-art. IEEE Trans. on Evolutionary Computation, Feb.

2011.

[5] Feoktistov, V.: Differential evolution: in search of solutions (Springer Optimization and Its Applications). Secaucus, NJ, USA: Springer- Verlag New York, Inc., 2006.

[6] Passino, K. M.: Biomimicry of bacterial foraging for distributed optimization and control, IEEE Control Syst. Mag., vol. 22, no. 3, Jun.

2002, pp. 52–67.

[7] Das, S., Biswas, A., Dasgupta, S., Abraham, A.: Bacterial foraging optimization algorithm: theoretical foundations, analysis, and applications, foundations of computational intelligence. Global optimization, studies in computational intelligence, vol 3, 2009, pp 23–

55.

(8)

[8] Biswas, A., Dasgupta, S., Das, S. and Abraham, A., 2007. “ Synergy of PSO and bacterial foraging optimization: a comparative study on numerical benchmarks”, Second International Symposium on Hybrid Artificial Intelligent Systems (HAIS 2007), Advances in Soft computing Series, Springer Verlag, Germany, Corchado, E. et al. (Eds.): Innovations in Hybrid Intelligent Systems, ASC 44, pp. 255- 263.

[9] Biswas, A., Dasgupta, S., Das, S., and Abraham, A.: A Synergy of differential evolution and bacterial foraging algorithm for global optimization. Neural New World, vol. 17, no. 6, 2007, pp. 607–626.

[10] Lin, J., Keogh, E., Lonardi, S., Chiu, B. Y., 2003. “A symbolic representation of time series, with implications for streaming algorithms”. DMKD 2003: 2-11.

[11] Keogh, E,. Chakrabarti, K,. Pazzani, M. & Mehrotra: Dimensionality reduction for fast similarity search in large time series databases.

J. of Know. and Inform. Sys. 2000.

[12] Yi, B,K., & Faloutsos, C., 2000: “Fast time sequence indexing for arbitrary Lp norms”. Proceedings of the 26st International Conference on Very Large Databases, Cairo, Egypt .

[13] Keogh, E., Zhu, Q., Hu, B., Hao. Y., Xi, X., Wei, L. & Ratanamahatana, The UCR Time Series Classification/Clustering Homepage:

www.cs.ucr.edu/~eamonn/time_series_data/ C. A. 2011.

A Hybrid of Bacterial Foraging and Differential Evolution -based Distance of Sequences