Experimental analysis in visual features generation for submarine images preprocessed with contrast enhancement techniques

Final Degree Project (Treball Final de Grau)

Bachelor's Degree in Industrial Electronics and Automation Engineering (Grau d’Enginyeria Electrònica Industrial i Automàtica)

Experimental analysis in visual features generation for submarine images preprocessed with contrast enhancement techniques.

CARLES COLL GOMILA

Tutor

Francisco Bonin Font

Escola Politècnica Superior

Universitat de les Illes Balears

CONTENTS

Contents
Abstract

1 Introduction
    1.1 Project context
    1.2 Navigation issues in P.O. meadows
    1.3 Previous work
    1.4 Goal of the project

2 Theoretical background
    2.1 Visual odometry
    2.2 Two View Geometry
    2.3 Color correction and image contrast enhancement
    2.4 Feature detection and tracking
    2.5 Feature Matching for Loop Closing Detection
    2.6 Epipolar Geometry
    2.7 Fundamental matrix
    2.8 Homography

3 Data generation for visual odometry estimation
    3.1 The code
    3.2 Input data
    3.3 Parameters
    3.4 Output data
    3.5 Comments

4 Data generation for loop close detection
    4.1 The code
    4.2 Input data
    4.3 Parameters
    4.4 Output data
    4.5 Comments

5 Display of results
    5.1 analyse_GOPRO_dataset.m
        5.1.1 Definition of parameters
        5.1.2 Load data
        5.1.3 Plot matches and inliers
        5.1.4 Plot statistics
    5.2 PlotInliersComp.m
    5.3 analyse_LC_GOPRO_dataset.m

6 Results for the odometry estimation
    6.1 Results obtained from the raw and enhanced datasets
    6.2 Matches and inliers along the trajectory
    6.3 Datasets comparison
    6.4 Conclusions

7 Results for the loop close detection
    7.1 Raw dataset
    7.2 LCC dataset
    7.3 LOG3 dataset
    7.4 MAI dataset
    7.5 MSR dataset
    7.6 Matches and inliers along the trajectory
    7.7 Conclusion

8 Conclusions

A Additional instructions
    A.1 Running the code for odometry evaluation
    A.2 Running the code for loop close detection

B Code
    B.1 Code for odometry estimation
        B.1.1 inliers_comp.m
        B.1.2 plotInliersComp.m
    B.2 Code for loop close detection
        B.2.1 analyse_LC_GOPRO_dataset.m
        B.2.2 loadFiles_LC
        B.2.3 generateFrameIdx
        B.2.4 plotLC
        B.2.5 plotPatches
        B.2.6 plotRateLC

Bibliography

ABSTRACT

Autonomous navigation in underwater environments performed by an AUV is a crucial activity in the context of the ARSEA project, as it seems to be the best approach for mapping P.O. colonies. The navigation is performed with a SLAM implementation that relies partially on Visual Odometry and Loop Close Detection, both of which are based entirely on visual information recorded with cameras. However, the inherent conditions of the P.O. environment hinder the extraction of visual features, so it is crucial to find the best feature extraction algorithm if precise navigation is desired.

This project focuses on the study of different options for feature detection and tracking, and of different image contrast enhancement algorithms that may improve the overall navigation performance. Given a dataset of raw images and four further datasets with the enhanced images, existing software is used to run the feature tracking process for a given detector and descriptor. The resulting data is then analyzed in order to find the best detector and descriptor algorithms as well as the best image contrast enhancement algorithm.

The loop close detection process shares the limitations of Visual Odometry, as it also relies on visual information. Thus, a similar methodology is used to find the best algorithms, with small differences because loop close detection uses feature matching, which has more general visual constraints than feature tracking. This difference can lead to different results, so a compromise between both may be needed at the end.

CHAPTER 1

INTRODUCTION

1.1 Project context

Posidonia Oceanica (P.O.) is an endemic seagrass that grows in the Mediterranean Sea and forms vast meadows. The conservation of these meadows is very important for the maintenance, development and stability of the Mediterranean subsea ecosystems they live in, for four reasons: 1) they favor the deposition of new sediments and stabilize the unconsolidated ones, 2) they attenuate currents and wave energy, 3) they boost biodiversity by being a source of food and refuge for numerous species, and 4) recent studies have demonstrated that these meadows absorb great amounts of carbon and release oxygen into the water, increasing its quality and transparency and mitigating climate change.

It is also known that during the last decades the extension of P.O. meadows has been declining, usually affected by human activities such as uncontrolled leisure anchoring, trawl fishing or the construction of industrial infrastructures. To prevent further decline, the meadows need to be monitored. This monitoring usually consists of measuring the extension and height of the meadows at different moments to estimate their evolution. It is typically done by divers, who install markers along the perimeter of the meadows and certain gauges inside them. However, this process is slow, tedious, imprecise and limited by the autonomy of the scuba tanks. Consequently, new strategies are needed to monitor and control these benthic habitats, since their preservation directly affects the environment and activities that are critical for the tourism and fishing industries.

Some researchers have proposed to control P.O. colonies by exploiting multispectral imaging provided by satellites [1]. Although satellite imagery has been shown to be a good method to detect the borders of meadows in shallow waters, it is not applicable in deeper waters, since the turbidity and density of the water prevent light from penetrating the water column.

Other classical P.O. mapping techniques consist of analyzing bathymetries built with a Side Scan Sonar (SSS) attached to a vessel hull or to an underwater vehicle [2]. However, expensive and complicated logistics are needed to complete a data collection mission.

Lately, lightweight AUVs (Autonomous Underwater Vehicles) equipped with diverse sensors (Side Scan Sonar [SSS], GPS, Doppler velocity logs [DVL], inertial measurement units [IMU] and especially cameras) have been proposed as an efficient, safe and fast tool to survey and collect data in marine habitats, and as particularly suitable to explore and image colonies of P.O. [3]. This technique for mapping P.O. meadows has become the most interesting one due to its autonomous and low-cost operation, and the latest research works focus on it, as it is still an open and challenging research line. The work presented in this report is framed in the context of the ARSEA project, and focuses on vehicle motion and pose calculation using visual data. An improved estimate of the vehicle motion is critical to complete the robot mission successfully and to compose visual 2D maps.

1.2 Navigation issues in P.O. meadows

The Submarine Robotics team from the UIB, within the context of the ARSEA project, proposes the exploration of coastal areas to detect, map and control P.O. meadows using their SPARUS II submarine [4], equipped with cameras, a DVL and an IMU. One of the critical points for an AUV when exploring unknown areas is its own localization. However, AUV navigation presents various difficulties which are inherent to these environments.

Traditionally, a great variety of sensors is used to obtain the information necessary for self-localization in underwater environments. IMUs (Inertial Measurement Units) are usually used to obtain a first estimate of the vehicle motion (especially in orientation) at high frequency and over very short time intervals. However, the information given by the gyros or the accelerometers cannot be integrated to obtain the global vehicle position, since the accumulated drift is quite significant. The velocity of the vehicle can be obtained with greater accuracy by means of a DVL (Doppler Velocity Log), and the position is obtained by integrating this velocity. Additionally, Visual Odometry is used to estimate the position. Then, a Kalman filter is usually used to merge all this information into a single, more stable measurement. The vehicle position resulting from the filter still presents part of the drift accumulated by the dead reckoning sensors (DVL or visual odometry). To correct this drift, two global methods can be used: a USBL (Ultra Short Baseline) or a visual SLAM (Simultaneous Localization and Mapping) [5].
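The drift of dead reckoning described above can be illustrated with a minimal sketch. It assumes a one-dimensional vehicle and a hypothetical constant bias in the measured velocity; the function name and all numeric values are illustrative assumptions and do not come from the ARSEA software.

```python
def integrate_velocity(velocities, dt):
    """Dead reckoning: accumulate position from velocity samples (rectangle rule)."""
    position = 0.0
    track = []
    for v in velocities:
        position += v * dt
        track.append(position)
    return track


# Hypothetical values: true speed 0.5 m/s, sensor bias 0.01 m/s, 10 Hz during 100 s.
dt = 0.1
n_samples = 1000
true_track = integrate_velocity([0.5] * n_samples, dt)
measured_track = integrate_velocity([0.5 + 0.01] * n_samples, dt)

# The position error grows linearly with time: about bias * elapsed time.
drift = measured_track[-1] - true_track[-1]
print(f"drift after {n_samples * dt:.0f} s: {drift:.3f} m")
```

Even a small bias produces an error that keeps growing with the length of the mission, which is why a global correction such as USBL or loop closing is needed.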

USBL allows the absolute localization of the vehicle thanks to a fixed transceiver that communicates with the vehicle through an acoustic link. But it has disadvantages: it is expensive, slow in communication and requires a support boat carrying the transceiver. Visual SLAM can only be used to correct drift when the trajectory closes loops. This means that the trajectory covers an area that has already been covered before, usually in a different orientation. When the vehicle passes over the same areas several times and can grab images of sufficient quality, with large numbers of stable visual features, the use of visual SLAM is preferable to the use of a USBL, since it is faster, cheaper and permits a higher frequency of pose samples. When mapping P.O. meadows, sweeping trajectories with many loop closures are usual, so visual SLAM is very appropriate. Given the adequacy of the conditions, one of the goals of the ARSEA project [6] is to combine USBL with visual SLAM. However, this task has proven to be a challenge underwater, especially in areas with P.O.

When computing the visual odometry in areas colonized with P.O. (P.O. meadows) there is a considerable loss of reliability and accuracy due to the texture of the sea bottom and the dynamics of the seagrass. P.O. has many long and narrow leaves that sway with the swell. This swaying makes the extraction of stable visual features a difficult task. Furthermore, it has a dark green color that gets darker the deeper the seabed is. The regions colonized with Posidonia present either highly textured areas where the P.O. is, with multiple points with similar visual characteristics, or areas with no texture, such as the sand banks. As will be explained in chapter 2, the accuracy of visual odometry and of the image registration for visual loop closing detection depends on the number of stable features extracted from the images.

1.3 Previous work

A first approach to the issues related to underwater navigation in environments with P.O. has already been presented in [7]. The goal of that research was to find the combination of visual feature extractor and descriptor that provided the highest number of inliers when tracking them in consecutive frames. The research work had one major part: different combinations of feature extractors and descriptors were applied to the images to find out which combination gave the best guarantees of obtaining a reliable visual odometry. Five different video datasets recorded at 10 fps at different points of the north-west coast of Mallorca were used in the research. Additionally, ten different visual feature detectors (FAST, STAR, SIFT, SURF, ORB, BRISK, MSER, GFTT, Harris and Dense) were combined with seven different descriptors (SIFT, SURF, BRIEF, BRISK, ORB, FREAK and LDB). Two different algorithms (LMEDS and RANSAC) were used to filter the outliers by imposing the epipolar constraint. All these combinations were used to track the visual key-points in consecutive images.

A higher number of stable features matched in consecutive frames favours the calculation of a visual odometry with a minimum reliability. Therefore, finding the combination of detector and descriptor that gives the highest number of inliers in consecutive frames is critical to increase the reliability of the vehicle displacement calculation using visual data.

Results showed that the detector that generated the highest number of features was Dense. LDB, BRIEF and ORB were the best descriptors, and RANSAC the best algorithm for outlier removal. The rest of the combinations gave worse results.

1.4 Goal of the project

The results from the previous research work were conclusive but, for several reasons, they were insufficient. The criterion to find the best combination of detector and descriptor was based only on the number of matchings in consecutive images. This incomplete criterion led to two limitations: 1) the results did not represent the accuracy of the loop close detection, since a loop close is detected from the number of matchings in non-consecutive images (images of the same scene, taken at different time instants and from different viewpoints), and 2) all feature detectors and descriptors were tested without taking into account whether they were invariant to rotation, scale and translation. Features in consecutive images are expected to undergo only slight rotation and translation, so selecting algorithms invariant to rotation was not critical for them, but it was for the images that closed a loop.

Consequently, the goals of this project focus on extending the previous work [7]: a) to apply a set of color enhancement algorithms to increase the image contrast and thus the number of inliers in consecutive and loop closing frames, and b) to find the best combination of feature detectors and descriptors computed after the application of the new color enhancement methods [8]. The best combination will be the one that maximizes the number of inliers in consecutive and in loop closing frames in areas with presence of P.O. As a consequence, the increase in inliers between consecutive images directly improves the frame-to-frame calculation of the vehicle displacement using visual data, and the visual identification of areas already visited by the robot. It is important to note that this work only focuses on the combinations that are invariant to rotation, scale and translation.

CHAPTER 2

THEORETICAL BACKGROUND

2.1 Visual odometry

Since the beginning of the 1980s, when the first real-time visual odometry implementation on a Mars rover was achieved, the visual odometry field has progressed at a high rate. This increased interest in the field is perfectly understandable, since visual odometry has several significant advantages over other localization methods: a) it can be used by any kind of robot (terrestrial, aerial or submarine), b) it only uses one or two cameras, which are cheaper than USBLs or DVLs, and c) in terrestrial robots, it is even more accurate than wheel odometry. In general, visual odometry is of the utmost importance in underwater environments, where GPS is not available, and in those aerial environments where the GPS signal is occluded by buildings. In particular, the ARSEA project requires the use of visual data for navigation.

Nowadays there is a great variety of algorithms used in visual odometry. They can be divided into stereo or monocular, depending on the number of cameras used. Stereo is more robust because the six degrees of freedom in 3D space (x, y, z, roll, pitch and yaw) can be obtained, while monocular vision can only determine the three degrees of freedom in a plane (x, y, yaw). Images are projections of the 3D world onto a plane, in which depth is lost. Thus, monocular vision, which works with a single camera, can only determine x, y and yaw. However, when several images of the same 3D scene taken from different points of view are available, depth can be recovered, and with it the three remaining degrees of freedom (z, roll and pitch). This is how stereo vision works.

Regardless of what method is used, its main structure is the same and the differences lie in the implementation details. A general outline:

1. Acquisition of input images.

2. Application of image processing techniques for correction and enhancement.

3. Feature detection and tracking.

4. Outlier removal.

5. Estimation of camera motion.

As stated before, the goal of this project is to find experimentally the best algorithms for image enhancement and feature extraction, so we focus on steps 2, 3 and 4.

2.2 Two View Geometry

The main task in visual odometry is to compute the relative transformation for each image pair in the recorded trajectory. These transformations describe the relative position of the cameras for each pair of images and can be described by means of Multiple View Geometry. For more information about this topic, the reader is strongly recommended to consult [9].

Following the sequence of steps in the general algorithm described above, the first transformation we find (image acquisition) is the perspective projection. This transformation describes the linear mapping from a point in the 3D world to a point in the image plane using a pinhole camera model, which is the simplest one. It is important to notice that the points belong to a projective space, which is an extension of Euclidean space where any two lines (including parallel ones) meet at a point. Points in projective space are expressed in homogeneous coordinates, regardless of the number of dimensions of that space. Homogeneous coordinates are like traditional Euclidean coordinates (x,y) but with an added component, (x,y,1) for a two-dimensional space. This last component allows the definition of points at infinity by making its value 0, (x,y,0). Furthermore, two homogeneous coordinate vectors are equivalent when they differ by a common non-zero multiple: (kx,ky,k) = (x,y,1). So, to get the coordinate pair from a homogeneous coordinate triple, it is necessary to divide the first two components by the last one (in the case of a 2D space).

Notice that the pair (x,y) of a point at infinity (x,y,0) cannot be obtained, since the division (x/0, y/0) is undefined.
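The rules for homogeneous coordinates given above can be sketched in a few lines. This is only an illustration; the function names are hypothetical and are not part of the code used in this project.

```python
def to_homogeneous(point):
    """Append a unit component: (x, y) -> (x, y, 1)."""
    return tuple(point) + (1.0,)


def from_homogeneous(h):
    """Divide by the last component: (kx, ky, k) -> (x, y)."""
    *coords, w = h
    if w == 0:
        # (x, y, 0) is a point at infinity and has no Euclidean counterpart.
        raise ValueError("point at infinity has no Euclidean coordinates")
    return tuple(c / w for c in coords)


def equivalent(a, b, tol=1e-9):
    """True if a and b differ only by a common non-zero factor.

    Both vectors are assumed to be non-zero and of the same length.
    The test checks that every 2x2 minor of the pair vanishes.
    """
    n = len(a)
    return all(
        abs(a[i] * b[j] - a[j] * b[i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


print(from_homogeneous((4.0, 6.0, 2.0)))                # (2.0, 3.0)
print(equivalent((2.0, 3.0, 1.0), (4.0, 6.0, 2.0)))     # True
print(equivalent((2.0, 3.0, 1.0), (2.0, 3.0, 0.0)))     # False
```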

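The projection matrix represents a map from 3D to 2D and has the following form, where the equality holds up to a scale factor, as is usual in homogeneous coordinates:

\[
\begin{bmatrix} x \\ y \\ f \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\tag{2.1}
\]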

In the above equation (x,y,f ) represents the image point while (X,Y,Z,1) represents the point in the 3D world. The general projective transformation from Euclidean 3-space to an image also takes into account the internal and external camera parameters. The internal camera parameters are defined by the camera calibration matrix which has four parameters:

• The scaling in the image in the x and y directions, αx and αy. These parameters are used to convert distances in 3D space to pixels in the image plane. There is one parameter for each direction since the pixels may not be square.

• The principal point (x0, y0), which is the point where the optic axis intersects the image plane. This translation is necessary because the image coordinate system usually has its origin at the top left corner instead of the center.

\[
K =
\begin{bmatrix}
\alpha_x & 0 & x_0 \\
0 & \alpha_y & y_0 \\
0 & 0 & 1
\end{bmatrix}
\tag{2.2}
\]

The external camera parameters refer to a Euclidean transformation between world and camera coordinates. This transformation is applied when the origins of the camera and world coordinate systems are not the same.





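\[
\begin{bmatrix} X_{cam} \\ Y_{cam} \\ Z_{cam} \\ 1 \end{bmatrix}
=
\begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\tag{2.3}
\]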

In the previous equation R is the 3x3 rotation matrix between the camera and world coordinates, and t is the translation vector between world and camera coordinates. After concatenating the three matrices we get the general projection matrix from Euclidean 3-space to an image as

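\[
\mathbf{x}
=
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
=
K
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
=
K \, [R \mid t] \, \mathbf{X}
\tag{2.4}
\]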

In a two view geometry we have two cameras with different focal points such that

\[
x = P X, \qquad x' = P' X
\tag{2.5}
\]

Where x is the projection in camera P of the point X in 3D space and x’ the projection of the same point in P’.
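As a minimal sketch of the projection x = K[R|t]X and of the two-view case, the following example projects one 3D point into two cameras. The calibration values (αx = αy = 800, principal point (320, 240)), the poses and the 3D point are hypothetical and chosen only to keep the arithmetic simple.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def project(K, R, t, X):
    """Project the 3D world point X = (X, Y, Z) to pixel coordinates, x = K [R|t] X."""
    # World -> camera coordinates: X_cam = R X + t
    X_cam = [r + ti for r, ti in zip(mat_vec(R, X), t)]
    # Camera -> homogeneous image coordinates
    x_h = mat_vec(K, X_cam)
    if x_h[2] == 0:
        raise ValueError("point lies on the principal plane of the camera")
    return (x_h[0] / x_h[2], x_h[1] / x_h[2])


# Hypothetical calibration matrix with the structure of K in (2.2).
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
IDENTITY = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

X = [0.5, -0.25, 2.0]                                    # 3D point, in metres
x = project(K, IDENTITY, [0.0, 0.0, 0.0], X)             # camera P
x_prime = project(K, IDENTITY, [-0.25, 0.0, 0.0], X)     # camera P', shifted along x

print(x)        # (520.0, 140.0)
print(x_prime)  # (420.0, 140.0)
```

In this particular configuration the second camera is only translated along the x axis, so the projected point moves horizontally and keeps its row.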

2.3 Color correction and image contrast enhancement

The second step in the general algorithm for visual odometry consists of applying an image preprocessing technique to correct the color or enhance the contrast of the images used in the following steps. An additional correction can be applied to remove distortions caused by the physical limitations of the device, such as the lens aperture.

Image contrast enhancement is used to increase the quality of the image so that a higher number of features can be detected and tracked. Four different image enhancement techniques have been tested in this project, since they have a direct impact on the quality of feature extraction and tracking. The image enhancement methods used are LCC [8], MSR-NK (from now on called LOG3) [10], MAI [11] and MSR [12]. Figure 2.1 shows two sample images extracted from several video sequences recorded in the Port of Valldemossa, with the corresponding enhanced counterparts.

Figure 2.1: Example of the appearance of the frames after applying an image contrast enhancement algorithm. From top to bottom: two different frames from the raw dataset; the same two frames after applying LCC; the two frames obtained with LOG3; the two frames obtained with MAI; and finally the two frames obtained with MSR.

2.4 Feature detection and tracking

After the images have been captured and enhanced, the next step is to find and match interest points in consecutive images. This process is divided into three tasks: feature detection, feature description and feature tracking.

Feature detection consists of identifying interest points in an image. An interest point is a region that is easy to localize in different images of the same area because it presents particular, specific and nearly unique visual characteristics (its visual descriptor). Textureless regions are clearly poor in interest points. In contrast, regions with large contrast changes (gradients) are rich in visual features. Straight line segments or straight edges are an exception, because regions along the direction of the edge look similar. Therefore, the best interest points are those that contain at least two gradient orientations, such as corners, and this is why edges are never used for tracking.

Although it is not strictly necessary for tracking, we have restricted our work to detectors and descriptors that are invariant to rotation, scale and translation, since these properties are necessary for loop closing detection.

Once the interest points are detected, they are identified using a feature descriptor. The description of an interest point is necessary to discriminate one feature from another, so that correspondences can be found in the matching process. The computation of the feature descriptor always involves the neighborhood of the interest point.

There exists a great variety of feature description algorithms and they can be divided in Local Binary Descriptors and Spectra Descriptors. The difference between them lies in the computational cost. Local Binary Descriptors represent features as binary vectors, which make them efficient to compute, store and match. Spectra Descriptors represent features by quantities that can be measured or computed such as light intensity, color, local area gradients and surface normal. All this involves more intense computation and floating point calculations which are more memory and time consuming. The difference in computation can be very significant, up to two orders of magnitude.
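The difference in matching cost between the two families can be sketched as follows. The example assumes that binary descriptors are compared with the Hamming distance, which the text above does not name explicitly, and that floating point descriptors are compared with a sum of squared differences. The descriptor values are hypothetical and much shorter than real ones.

```python
def hamming_distance(a, b):
    """Number of differing bits between two binary descriptors stored as integers."""
    return bin(a ^ b).count("1")


def ssd(a, b):
    """Sum of squared differences between two floating point descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical 8-bit binary descriptors: one XOR and a bit count.
print(hamming_distance(0b10110010, 0b10011010))     # 2

# Hypothetical 3-element floating point descriptors: one subtraction and
# one multiplication per component.
print(ssd([0.1, 0.5, 0.9], [0.2, 0.5, 0.7]))
```

The binary comparison needs only integer bit operations, while the floating point comparison needs arithmetic on every component, which is consistent with the difference in cost described above.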

The most popular feature detector and descriptor is SIFT (Scale Invariant Feature Transform) [13]. It is a robust algorithm, as it is invariant to translation, scaling and rotation. SIFT is tested in this project and, as will be shown in the results chapters, it gives good results for visual odometry and even better results for loop closing detection.

In the tracking process, feature correspondences are searched for in consecutive images. The general way to find correspondences is to define a distance function that compares two descriptors, in this case an SSD (sum of squared differences). Assuming brightness constancy (the projection of the same point looks the same in every frame), small motion (points do not move very far between consecutive frames) and spatial coherence (points move like their neighbors), the process can be described as follows:

1. Given a feature in the first image, the SSD is calculated for all the features inside a reduced patch on the second image. This patch is centered at the coordinates of the feature in the first image, assuming that the displacement between images will be low if the frame rate is sufficiently high.

2. The feature in the patch that minimizes the SSD is considered the best match.

A better approach takes into account the ratio of the SSD from the best match and the SSD of the second best match. In this way ambiguous matches can be rejected.
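The two steps above, together with the ratio test, can be sketched as follows. This is a simplified illustration and not the code used in this project: the feature representation (x, y, descriptor), the function names, the patch radius and the ratio threshold of 0.8 are all assumptions.

```python
def ssd(a, b):
    """Sum of squared differences between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def track_feature(feature, candidates, radius=20.0, max_ratio=0.8):
    """Return the index of the best match in `candidates`, or None.

    feature    -- (x, y, descriptor) in the first image
    candidates -- list of (x, y, descriptor) in the second image
    radius     -- half-size of the square search patch, in pixels (assumed value)
    max_ratio  -- largest accepted best/second-best SSD ratio (assumed value)
    """
    fx, fy, f_desc = feature
    scored = []
    for idx, (cx, cy, c_desc) in enumerate(candidates):
        # Step 1: only features inside the patch centred on (fx, fy) are compared.
        if abs(cx - fx) <= radius and abs(cy - fy) <= radius:
            scored.append((ssd(f_desc, c_desc), idx))
    if not scored:
        return None
    scored.sort()
    best_ssd, best_idx = scored[0]          # Step 2: minimum SSD
    if len(scored) > 1:
        second_ssd = scored[1][0]
        # Ratio test: reject the match if the runner-up is almost as good.
        if second_ssd == 0 or best_ssd / second_ssd > max_ratio:
            return None
    return best_idx


feature = (100.0, 100.0, [1.0, 2.0, 3.0])
clear = [(103.0, 98.0, [1.0, 2.1, 3.0]),       # close and similar
         (110.0, 105.0, [4.0, 0.0, 1.0]),      # close but different
         (400.0, 400.0, [1.0, 2.0, 3.0])]      # identical but outside the patch
ambiguous = [(101.0, 100.0, [1.0, 2.0, 3.1]),
             (99.0, 100.0, [1.0, 2.0, 2.9])]   # two equally good candidates

print(track_feature(feature, clear))       # 0
print(track_feature(feature, ambiguous))   # None
```

Note that the third candidate in `clear` is never considered even though its descriptor is identical, because it lies outside the search patch.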

Figure 2.2: Feature correspondences in two overlapping images that close a loop. Darker areas show P.O. while lighter areas show sand and rocks.

The detectors used in this project are FAST, STAR, SIFT, SURF, ORB, BRISK, and Harris. The descriptors that have been tested are SIFT and SURF as Spectra descriptors and BRISK, ORB and FREAK as binary descriptors.

2.5 Feature Matching for Loop Closing Detection

Feature matching is very similar to feature tracking in that both work with feature descriptors in two images. However, the main difference is that in feature matching a much greater camera displacement, rotation and change in scale can be present. This means that the constraints assumed in tracking are not applicable. Thus, feature matching can be thought of as a general approach to the problem of finding corresponding features in different images.

Feature matching is used in applications such as image alignment (e.g., panoramic mosaics), object recognition, 3D reconstruction (e.g., stereo) and loop closing detection in robot navigation. All these applications can involve a significant camera displacement, rotation and change in scale so feature detectors and descriptors that are invariant to translation, rotation and scaling are needed. In loop closing detection, which is studied in this project, all three transformations can happen at the same time.

First, when a loop is closed, the relative orientation between the images that close the loop is most likely non-zero, as the same area (partially or totally) is covered from different points in the trajectory. Secondly, the displacement of the cameras is also significant, as the images that close the loop are not consecutive. Finally, a difference in scale can occur, as the robot might not maintain an exactly constant altitude along the whole trajectory. Figure 2.2 shows two images that close a loop, taken at different points in the trajectory, with the feature matchings highlighted. As expected, the location of the features differs significantly from one image to the other, and a 90 degree rotation is also noticeable.

The feature attributes described above are all geometric. However, in feature matching, photometric invariance is also needed. Photometric invariance refers to the characteristics of the image, mainly brightness. Brightness invariance can be easily achieved by using gradients, which is very common among all the feature detection and description algorithms.

Another difference between feature tracking and matching is the way the feature correspondences are searched. In a matching process the same SSD is applied but, instead of limiting the search in a patch in the second image, the search is performed in the whole image. This is a consequence of assuming large displacement, rotation and scale change. Additionally, sometimes the search is performed a second time in cross correspondences (from image A to image B and confirm from B to A) in order to ensure the correspondences are correct.
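A minimal sketch of whole-image matching with the cross check described above is given below. It is an illustration under stated assumptions and not the implementation used in this project: descriptors are plain lists of numbers, the search is exhaustive, and the function names are hypothetical.

```python
def ssd(a, b):
    """Sum of squared differences between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(descriptor, descriptors):
    """Index of the descriptor in `descriptors` with minimum SSD (whole-image search)."""
    return min(range(len(descriptors)), key=lambda i: ssd(descriptor, descriptors[i]))


def cross_check_matches(desc_a, desc_b):
    """Keep (i, j) only if j is the best match of i in B and i is the best match of j in A."""
    matches = []
    if not desc_a or not desc_b:
        return matches
    for i, d in enumerate(desc_a):
        j = nearest(d, desc_b)                 # search from image A to image B
        if nearest(desc_b[j], desc_a) == i:    # confirm from image B to image A
            matches.append((i, j))
    return matches


# Hypothetical 2-element descriptors of two images that close a loop.
desc_a = [[0.0, 0.0], [10.0, 10.0], [5.0, 5.0]]
desc_b = [[10.0, 11.0], [0.0, 1.0]]

print(cross_check_matches(desc_a, desc_b))   # [(0, 1), (1, 0)]
```

The third feature of image A finds a nearest neighbour in B, but that neighbour prefers another feature of A, so the correspondence is discarded.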

As stated before, the SPARUS II performs SLAM [5] to correct the odometry deviations that remain after the Kalman filter has integrated the visual odometry and the readings from the DVL, IMU and pressure sensors. To compute SLAM, loop close detection, which relies on feature matching, is necessary. This is the reason why half of the project is centered on studying different algorithms to improve loop close detection in areas with P.O. by increasing the number of detected features in the feature matching process.

2.6 Epipolar Geometry

Epipolar geometry allows us to answer the following questions: a) given multiple views of the same scene and an image point in the first view, where is the corresponding point in the second view? b) What is the relative position of the cameras? c) What is the 3D geometry of the scene?

Figure 2.3 shows two cameras looking at the same point P in space from different positions. The projection of this point in the left camera is p and in the right camera is p’. To understand the relation between p and p’, let us suppose that we know only p. Its corresponding scene point P is constrained to lie on the ray through p and the camera center, and this ray is imaged in the right camera as a line l’, which is the epipolar line. Thus, p’ is constrained to lie on l’. If we do the same with a different point q, we find that q’ must lie on a different epipolar line m’. However, m’ and l’ always intersect at the same point e’, known as the epipole. The epipoles are the intersections of the baseline (the line joining the camera centers) with the image planes. The epipole e’ can be thought of as the image in the right camera of the left camera center, so the location of the epipole in the image plane depends only on the relative position of the cameras. More information can be found in [9].

Figure 2.3: Epipolar scheme

As explained above, all epipolar lines in the second camera intersect at the epipole e’. However, there is a particular case in which the epipole is at infinity and the epipolar lines are parallel. This happens when the baseline is parallel to the image planes, for example when the camera translates within its own image plane. The trajectory followed by the SPARUS II is planar and the image plane of the camera is parallel to the trajectory plane, so two consecutive images correspond to this particular configuration.
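This particular case can be checked numerically with a small sketch. It assumes two cameras with the same calibration matrix K and no relative rotation, P = K[I|0] and P’ = K[I|t], so that the epipole in the second image is the projection of the first camera center, e’ = Kt. The calibration values and translations are hypothetical.

```python
def epipole_from_translation(K, t):
    """Homogeneous epipole e' = K t for P = K[I|0] and P' = K[I|t] (pure translation)."""
    return [sum(k * c for k, c in zip(row, t)) for row in K]


def is_at_infinity(point_h, tol=1e-12):
    """A homogeneous image point is at infinity when its last component is zero."""
    return abs(point_h[2]) <= tol


# Hypothetical calibration matrix with the structure of K in (2.2).
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]

# Translation inside the image plane (no component along the optical axis).
e_planar = epipole_from_translation(K, [0.1, 0.0, 0.0])
# Translation with a component along the optical axis.
e_general = epipole_from_translation(K, [0.1, 0.0, 0.05])

print(is_at_infinity(e_planar))    # True: the epipolar lines are parallel
print(is_at_infinity(e_general))   # False: the epipole is a finite image point
```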

2.7 Fundamental matrix

After the brief explanation of epipolar geometry, the first significant relation between different images from the same scene can be introduced.

The fundamental matrix F is a 3x3 matrix which relates corresponding points in stereo images or in two images of the same 3D scene. If x and x’ are corresponding image points then

\[
x'^{T} F x = 0
\tag{2.6}
\]

Figure 2.4: Homography scheme

If l’ = Fx is the epipolar line corresponding to x then, as stated before, x’ must lie on it. In the homogeneous representation, a point x lies on a line l if and only if x^T l = 0. Therefore x’ lies on the epipolar line l’ exactly when x’^T l’ = x’^T F x = 0, which is equation (2.6).

The fundamental matrix has 7 degrees of freedom, which means that at least seven point correspondences are necessary to compute F. Once F is determined, all putative correspondences can be checked and classified as inliers or outliers depending on whether they fulfill x’^T F x = 0. The inliers are then used to compute the homography H, as explained in section 2.8.
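The classification into inliers and outliers can be sketched as follows. This is a simplified illustration: it thresholds the algebraic residual x’^T F x directly, whereas practical estimators such as RANSAC or LMEDS usually work with a geometric distance in pixels. The function names, the threshold and the example F, which corresponds to a pure translation along the x axis with K equal to the identity, are assumptions.

```python
def epipolar_residual(F, x, x_prime):
    """Algebraic residual x'^T F x for homogeneous image points x and x'."""
    # Epipolar line in the second image: l' = F x
    line = [sum(f * c for f, c in zip(row, x)) for row in F]
    # x' lies on l' exactly when x'^T l' = 0
    return sum(a * b for a, b in zip(x_prime, line))


def split_inliers(F, matches, threshold=1e-6):
    """Split putative matches (x, x') into inliers and outliers of F."""
    inliers, outliers = [], []
    for x, x_prime in matches:
        if abs(epipolar_residual(F, x, x_prime)) <= threshold:
            inliers.append((x, x_prime))
        else:
            outliers.append((x, x_prime))
    return inliers, outliers


# Hypothetical F for a sideways translation: the epipolar lines are the image
# rows, so corresponding points must share the same y coordinate.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]

matches = [((10.0, 20.0, 1.0), (15.0, 20.0, 1.0)),   # same row
           ((10.0, 20.0, 1.0), (15.0, 27.0, 1.0))]   # different row

inliers, outliers = split_inliers(F, matches)
print(len(inliers), len(outliers))   # 1 1
```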

2.8 Homography

Another interesting relation that can be established between to images is the homography represented by H [9]. It is also known as a projectivity, which is an invertible mapping between projective planes such that three points x₁, x₂and x₃lie on the same line if and only if h(x1), h(x2) and h(x3) do. This means that two images are related by a homography if they are projections of the same planar surface in space as in figure 2.4.

Under this condition it is true that x’ = Hx (up to scale, in homogeneous coordinates), where H is a 3x3 matrix with eight degrees of freedom. Once H is known, the relative pose of the cameras can be estimated.

The typical way to compute the homography is by finding the correspondence of at least four points on the plane in the 3D world. To get the point correspondences visual feature extraction algorithms are used. Additionally it may be necessary to remove outliers or points that are not coplanar in order to obtain a precise homography.
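As an illustration of the four-point computation, the following Python sketch solves for H from exactly four correspondences. It is not the algorithm used on the vehicle: it assumes the normalization h33 = 1 (which fails if the true h33 is zero), requires that no three points are collinear, and performs no outlier removal. All function names and the example matrix `H_true` are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def homography_from_4_points(src, dst):
    """Estimate H (with h33 fixed to 1) such that dst ~ H * src."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence, from u = (h1.x)/(h3.x), v = (h2.x)/(h3.x)
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, point):
    """Map a 2D point through H and normalize the homogeneous result."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return (u, v)


# Hypothetical ground-truth homography used to generate test correspondences.
H_true = [[1.0, 0.2, 3.0],
          [0.1, 1.0, -2.0],
          [0.001, 0.002, 1.0]]
src = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
dst = [apply_homography(H_true, p) for p in src]

H_est = homography_from_4_points(src, dst)
for row in H_est:
    print(["%.4f" % value for value in row])
```

With more than four correspondences, a least-squares solution combined with a robust outlier rejection step would normally replace the exact solve.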

The homography computation is very useful because many visual odometry implementations are based on it (the AUV we are focused on is no exception). They find the homography between consecutive images to obtain the movement of the camera. However, the geometric constraint must be met: the features extracted from the images must be coplanar.


To comply with the planarity constraint we assume that the vehicle navigates at a constant altitude and that this altitude is much greater than the maximum height of the bottom profile. The algorithm used to compute the homography searches for the four most coplanar points before computing the actual H. In this way the variations in feature depth are compensated.


CHAPTER 3

DATA GENERATION FOR VISUAL ODOMETRY ESTIMATION

This chapter explains the process that has been followed to generate the visual odometry data which is then analyzed to find the best combination of feature detector, descriptor and image contrast enhancement.

3.1 The code

Developing the code that generates the data necessary for the navigation was not the goal of this project. Instead, I was given existing code that could be run after minimal modifications. The code was designed to run on a Linux machine and uses several libraries from the Boost and OpenCV packages, which must be installed beforehand.

The code generates the output data automatically, but some manual work is necessary to store it tidily, since at the end of the process there will be 170 folders, one for each combination. The folder distribution and the commands necessary to run the code are explained in appendix A.1.

The code used to generate the results for the odometry estimation does not correspond to a complete, real-time visual odometry process. It works as follows. First, the code takes the first two consecutive frames and extracts the visual features in both images. Then the features are tracked and the fundamental matrix is computed to reject the outliers (wrong feature tracks). In the next step, the code performs the same computation on the pair formed by the second and third frames of the video sequence. The process is repeated for every image in the sequence, so at the end it has made as many iterations as the number of frames minus one. The process is incomplete because the code only extracts the visual features, computes the fundamental matrix and lists the number of matches and inliers; it does not compute the camera displacement. Furthermore, the code does not run in real time, since the video has already been recorded.
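The pairwise loop described above can be outlined as follows. This is a schematic sketch in Python, not the project's C++ code: `match_pair` is a hypothetical callable standing in for feature detection, description, tracking and outlier rejection with the fundamental matrix, and `dummy_match_pair` only returns placeholder numbers so that the sketch runs.

```python
def consecutive_pairs(num_frames):
    """Return the index pairs (i, i + 1) visited by the pairwise loop."""
    return [(i, i + 1) for i in range(num_frames - 1)]


def run_pairwise(frames, match_pair):
    """Apply match_pair to every consecutive pair and collect (matches, inliers)."""
    results = []
    for i, j in consecutive_pairs(len(frames)):
        matches, inliers = match_pair(frames[i], frames[j])
        results.append((matches, inliers))
    return results


def dummy_match_pair(frame_a, frame_b):
    """Placeholder matcher (assumption): returns fixed counts instead of real ones."""
    matches = 100
    inliers = 80
    return matches, inliers


frames = ["frame_%03d" % k for k in range(333)]  # 333 images, as in the raw dataset
results = run_pairwise(frames, dummy_match_pair)
print(len(results))  # number of frames minus one
```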


3.2 Input data

The input data consists of five sets of images from the same video sequence. Four of them are enhanced versions of the first set, which from now on will be called the raw dataset.

The video was recorded by a diver near Valldemossa, in a region densely populated with P.O. The diver moved slowly enough to obtain a large overlap between consecutive frames. The camera used was a GOPRO, oriented so that the image plane was parallel to the sea bottom. A marker, used to indicate the beginning and end of the trajectory, can be seen several times in the image sequence. A few transformations were then performed to obtain the images from the video. First, the video was downsampled to 10 fps to reduce the computational effort, since the minimum sampling frequency of the GOPRO is 30 fps. Second, the image resolution was reduced from 1920x1080 to 960x540 for the same reason. The resulting dataset consists of 333 images and corresponds to the raw dataset. The other four datasets were obtained by applying different contrast enhancement algorithms to the raw dataset, which are called LCC [8], MSR-NK [10], [11] and MSR [12]. In the results chapters the four enhanced datasets are labelled LCC, LOG3, MAI and MSR. In this way the results obtained from the enhanced datasets can be compared with each other and with the raw dataset, and the improvement can be measured.
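The effect of the two reductions can be checked with a short calculation. The sketch below assumes that downsampling from 30 fps to 10 fps keeps one frame out of every three; the text does not state how the frames were actually selected, and the helper `kept_frame_indices` is hypothetical.

```python
SOURCE_FPS = 30
TARGET_FPS = 10
SOURCE_SIZE = (1920, 1080)
TARGET_SIZE = (960, 540)
NUM_IMAGES = 333

# Assumption: one frame is kept out of every frame_step source frames.
frame_step = SOURCE_FPS // TARGET_FPS

# Each image has this many times fewer pixels after the resize.
pixel_ratio = (SOURCE_SIZE[0] * SOURCE_SIZE[1]) / (TARGET_SIZE[0] * TARGET_SIZE[1])

# Approximate duration covered by the dataset at the reduced frame rate.
duration_s = NUM_IMAGES / TARGET_FPS


def kept_frame_indices(num_source_frames, step):
    """Indices of the source frames that survive the temporal downsampling."""
    return list(range(0, num_source_frames, step))


print(frame_step, pixel_ratio, duration_s)
print(kept_frame_indices(10, frame_step))
```

Under these assumptions the amount of pixel data per second of video is reduced by a factor of 12 (3 in time, 4 in resolution), and the 333 images cover roughly 33 seconds of recording.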

3.3 Parameters

The code used in the data generation process can be adjusted through 4 parameters: a) detector, b) descriptor, c) threshold and d) outlier removal. These parameters have to be set in the batch_feature_matcher.cpp file before compiling.

• The parameter detector refers to the algorithm used to identify the interest points in the images. The detectors that have been used in this project are FAST, STAR, SIFT, SURF, ORB, BRISK, and Harris.

• The parameter descriptor defines the algorithm that is used to describe the detected features. The descriptors that have been applied are SIFT, SURF, BRISK, ORB and FREAK.

• The threshold is used to separate the good matches from the rest. Its value has always been kept at 0.8, since this is the optimal value independently of the detector and descriptor used.

• The outlier removal refers to the method used to remove incorrect feature matches. This parameter has always been set to LMEDS [14].

The detector and descriptor are the only parameters that have been changed, as the goal of this project is to find the best combination of feature detector and descriptor. Thus, the values of these parameters have been set to cover all possible combinations.

Having 7 different detectors and 5 different descriptors results in 35 different combinations. The combination of SIFT as detector and ORB as descriptor was not included in the final results report, since its number of stable correspondences was nearly zero, which leaves 34 combinations.

Additionally, five different input datasets have been tested, so another level of combination is added, which results in a total of 34 x 5 = 170 combinations.
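The count of combinations can be reproduced by enumerating them. In this sketch the detector and descriptor names are those listed above, while the dataset labels (RAW, LCC, LOG3, MAI, MSR) are taken from the results chapters and used here only as identifiers.

```python
from itertools import product

DETECTORS = ["FAST", "STAR", "SIFT", "SURF", "ORB", "BRISK", "HARRIS"]
DESCRIPTORS = ["SIFT", "SURF", "BRISK", "ORB", "FREAK"]
DATASETS = ["RAW", "LCC", "LOG3", "MAI", "MSR"]
EXCLUDED = {("SIFT", "ORB")}  # left out of the final results report

# Detector/descriptor pairs that are actually reported.
pairs = [(det, desc) for det, desc in product(DETECTORS, DESCRIPTORS)
         if (det, desc) not in EXCLUDED]

# One run per dataset and reported pair.
runs = [(dataset, det, desc) for dataset in DATASETS for det, desc in pairs]

print(len(pairs), len(runs))
```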


3.4 Output data

The code generates a lot of output files. The vast majority of them do not contain any useful information and are just files used in intermediate steps of the algorithm.

For the goal of this project, the interesting information is contained in the file results_crosschek.txt. This file contains the number of matches and inliers for each pair of consecutive images. This information will later be used to determine how good a combination of image enhancement, feature detector and descriptor is. Although they are less important for finding the best combination, the rest of the files can be used to visualize the detected features.

3.5 Comments

After entering the last command (see appendix A.1), the code starts to run and creates files in the directory indicated before. Depending on the combination being run, the process can be fast or very slow. For example, the detectors FAST, STAR, SIFT and SURF slow down the process considerably, as do the descriptors SURF and SIFT. The difference in running time can be massive: generating the data for the raw dataset with ORB as detector and descriptor can take under a minute, while the same dataset with SURF as detector and descriptor can take nearly 5 hours. And these are only two combinations from a total of 170.

These huge differences in running speed can be explained by two factors: the type of feature descriptor and the number of features that are detected and matched. As said in chapter 2, spectra descriptors are computationally heavier than binary ones, which agrees with the fact that the slowest descriptors are SIFT and SURF, both spectra descriptors (FAST and STAR, also among the slowest options, are detectors only and do not define a descriptor). On the other hand, binary descriptors such as BRISK, ORB and FREAK are simpler to compute, which leads to a much shorter running time.

The second reason why spectra descriptors are slower than binary ones is that, in general, they detect and match a significantly higher number of features. This adds to their naturally low speed and results in algorithms that perform more expensive computations on more features, which can make them up to 300 times slower (nearly 5 hours against about one minute).

Leaving aside the running time factor, the extra number of features the spectra descriptors detect and match makes them more robust and appropriate for underwater navigation in P.O.


CHAPTER 4

DATA GENERATION FOR LOOP CLOSE DETECTION

The data generation process for loop close detection is similar to the one for visual odometry computation. The code differs, however, in how the image pairs are formed.

Both visual odometry and loop close detection work by extracting visual features from a pair of images. Visual detectors and descriptors are used for this purpose, as well as outlier removal algorithms. The difference lies in the source of the pair of images. In visual odometry the pair is composed of consecutive frames from a video sequence, while in loop close detection one image is always the same and the other can be any of the frames in the video sequence, so they are not necessarily consecutive. Apart from this, the codes for odometry estimation and loop close detection are structurally the same.

4.1 The code

Although the code for feature tracking in consecutive images and feature matching in images that close loops was already implemented when this work started, some modifications were necessary to change the configuration and data folders, and parameters such as the feature detector, the descriptor, the thresholds and the recursive algorithms for outlier elimination.

It is important to note that this code is not a loop close detection algorithm as it would be implemented in a real SLAM system. It was developed just to test different feature detectors and descriptors in a matching process. It works as follows.

Given a dataset of images from a trajectory and a query image, a matching process is performed between the query image and each image from the dataset. The query image is selected beforehand from those frames of the video sequence that show areas over which the robot has passed several times. Then the outliers from the matching process are rejected by applying the epipolar constraint (x’^T F x = 0), so only the inliers are kept. As a result, there is a number of inliers for each pair formed by the query image and an image from the dataset. This information will be used to estimate which combination of detector, descriptor and image enhancement method is best for registering images for loop closing detection in environments colonized by P.O. It is expected that an image pair that closes a loop has a significantly higher number of inliers than a pair with no overlap.

Again, the instructions that have to be followed to run the code are explained in appendix A.2.

4.2 Input data

The input data is exactly the same as in the odometry computation since the loop close detection is performed along all the images in the datasets. The same five datasets were tested.

4.3 Parameters

The parameters in the loop closing code are the same as in the odometry code (feature detector, feature descriptor, outlier removal and matching threshold), plus an additional parameter that indicates the query image. The query image is the image from the dataset that is compared to the rest; it is set in a different way, as explained in appendix A.2.

4.4 Output data

The data produced by the loop close detection code has exactly the same form as in the odometry estimation. However, the number of files may vary: as can be expected, the image pairs that do not close a loop do not contain enough matches, so the process is interrupted and the corresponding files are not generated. Nevertheless, the important file, results_crosschek.txt, is kept intact. This time the file does not contain the number of inliers in consecutive images but the number of inliers between the query image and each image in the dataset.

4.5 Comments

In general, the time necessary to generate the data for the loop close detection was similar to the time necessary for the odometry computation. This similarity comes from the fact that the inner process is essentially the same: one searches for matches between a query image and the rest of the dataset, and the other searches for matches between an image in the dataset and the next one. The total number of image pairs that are searched for correspondences is the same, the only difference being the number of resulting inliers, which is much lower in loop close detection.


CHAPTER 5

DISPLAY OF RESULTS

A very important part of the project is the display of results. After running the code, the results from the odometry evaluation and the loop close detection are stored in the files results_crosschek.txt in the form of a numerical matrix. There are 170 of these files for the odometry evaluation and another 170 for the loop close detection. These numbers correspond to the total number of combinations obtained from 7 different detectors, 5 different descriptors and 5 different datasets, minus the combinations with SIFT-ORB.

In conclusion, the amount of data that has to be analyzed is huge, so an automatic method to display the results in a more visual and intuitive way is needed.

The tool that best suits our purpose of visualizing the data is Matlab. There are two main Matlab scripts for visualizing the data. The script analyse_GOPRO_dataset.m is used to display the results for odometry evaluation and the script analyse_LC_GOPRO_dataset.m is used to display the results for loop close detection.

5.1 analyse_GOPRO_dataset.m

This script has several parts that show different information. The modifications done for this work are explained in annex II.

5.1.1 Definition of parameters

The first part contains the definition of all the parameters and of the paths to the output data and to the image datasets. The user can manually select the detectors and descriptors that will be used in the script, so that the results shown only correspond to the selected combinations. This is useful because in the context of this project there are 34 combinations, and the results from all these combinations cannot be shown at once.

This script has the limitation that it cannot directly compare results from different datasets. There only exists one path to the dataset folder so the user can only define one dataset at a time.


Figure 5.1: Number of matches and inliers along the trajectory for images enhanced with LOG3 and the three best combinations of detectors and descriptors.

In conclusion, in this part the user can define which combinations will be shown given a single dataset.

5.1.2 Load data

The function loadFiles is used to load all the useful information from the files results_crosschek.txt. Only the data from the defined combinations is extracted, which makes loading more efficient.

5.1.3 Plot matches and inliers

Here the function plotMatchesInliers is called. It generates a figure containing two plots: one shows the number of matches and the other shows the number of inliers for each image pair. The X axis represents the image pair and the Y axis the number of matches or inliers. All the chosen combinations are shown at the same time, which can make the plot unintelligible if the number of combinations is significant. For this reason, the most useful way to use this function is to represent the evolution of matches and inliers along the trajectory for one or two combinations.

Figure 5.1 shows an example of the figure obtained from the function plotMatchesInliers. Three different combinations have been chosen and each is represented by a different color. With this kind of representation the number of matches and inliers is known for each image pair and, thus, the results of each combination can be analyzed in great detail. Nevertheless, this representation has some disadvantages. First, the number of combinations that can be clearly displayed is limited. For example, trying to isolate one curve from a total of 10 would be very hard. This is a huge limitation, as the goal of this project is to find the best combination among 34 candidates. Second, it does not show clearly how good a combination is. A single value, like the mean number of matches and inliers, would make the comparison much easier.

5.1.4 Plot statistics

This is the most important part of the code. It contains a function called plotMatchesInliersStatistics that shows the mean number of matches and inliers for each combination defined at the beginning of the script. Although the plots only show the mean values, this is the simplest way to compare different combinations. Consequently, this is the part of the code that has been used to determine the best combination.

An example of the resulting plots can be seen in figure 5.2. The detectors SIFT, SURF and ORB and the descriptors SIFT, SURF, ORB, BRISK and FREAK are all shown at the same time with no problems. The best combination in this particular case can be clearly identified as SURF-SIFT.
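The statistic behind this comparison is a per-combination mean. The Python sketch below illustrates the idea only; it is not the Matlab function. It assumes, hypothetically, that each row of a results file holds two whitespace-separated numbers (matches, then inliers), since the exact layout of results_crosschek.txt is not described here. The combination names and numbers in the example are made up.

```python
def parse_results(text):
    """Parse rows of 'matches inliers' (assumed layout) into (float, float) pairs."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            rows.append((float(fields[0]), float(fields[1])))
    return rows


def mean_matches_inliers(rows):
    """Return (mean matches, mean inliers) over all image pairs."""
    if not rows:
        return 0.0, 0.0
    count = len(rows)
    return (sum(r[0] for r in rows) / count, sum(r[1] for r in rows) / count)


def best_combination(stats):
    """Name of the combination with the highest mean number of inliers."""
    return max(stats, key=lambda name: stats[name][1])


# Made-up file contents for two placeholder combinations.
sample_files = {
    "DET1-DESC1": "10 8\n12 6\n",
    "DET2-DESC2": "20 15\n22 17\n",
}

stats = {name: mean_matches_inliers(parse_results(text))
         for name, text in sample_files.items()}
print(stats)
print(best_combination(stats))
```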

5.2 PlotInliersComp.m

This script is used exclusively to measure and display the increment in the number of inliers when an image contrast enhancement algorithm is used to improve the quality of the acquired images. Like the other scripts, it has a first part where the parameters are defined: detectors, descriptors, outlier removal, threshold and the datasets to be analyzed. In the second part all the necessary data, which depends on the defined parameters, is loaded. Finally, at the end of the script, the function plotInliersComp is found.

The function plotInliersComp shows the increment in the average number of inliers between the data obtained from an enhanced dataset and the data obtained from the raw dataset. The increment is measured as the average number of inliers in the enhanced dataset divided by the average number of inliers in the raw dataset, so an increment greater than 1 means that the image contrast enhancement algorithm increases the number of inliers. To make the process of comparing different combinations easier the ratios are displayed as a bar plot, as it can be seen in figure 6.9.

A last figure is generated by the function that shows in a scatter plot the increment ratio versus the average number of inliers, as shown in figure 6.10.
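The increment ratio can be illustrated with the FAST-ORB averages reported in chapter 6 (1122 inliers for the raw dataset; 2263, 2543, 1946 and 2053 for LCC, LOG3, MAI and MSR). The Python function below is a sketch of the ratio only; its name and structure are assumptions and do not reproduce the Matlab function plotInliersComp.

```python
def increment_ratio(mean_inliers_enhanced, mean_inliers_raw):
    """Ratio of average inliers in an enhanced dataset over the raw dataset."""
    if mean_inliers_raw <= 0:
        raise ValueError("the raw dataset must have a positive mean number of inliers")
    return mean_inliers_enhanced / mean_inliers_raw


# Average inliers for FAST-ORB, as reported in chapter 6.
RAW_MEAN = 1122
ENHANCED_MEANS = {"LCC": 2263, "LOG3": 2543, "MAI": 1946, "MSR": 2053}

ratios = {name: increment_ratio(value, RAW_MEAN)
          for name, value in ENHANCED_MEANS.items()}

# Print the ratios from the largest to the smallest.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print("%s: %.2f" % (name, ratio))
```

All four ratios are greater than 1 for this combination, so every enhancement method increases its number of inliers, with LOG3 giving the largest ratio.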

5.3 analyse_LC_GOPRO_dataset.m

This completely new script is used to show the results for the loop close detection, and the full code is presented in appendix N. The script has three parts. The first part contains the definitions of parameters and paths to directories. It is almost the same as in the script for odometry evaluation, with the query image as an additional parameter.

The second part contains the function used to load all the relevant data, which is found inside the results_crosschek.txt files. The function cannot be the same as in the other script because there is an additional level in the folder hierarchy, which is explained in A.1. Apart from this, the rest of the code is exactly the same.


Figure 5.2: Average number of matches and inliers for each combination


Figure 5.3: Matches and inliers for each image pair obtained with FAST-FREAK, SIFT-SIFT and SURF-FREAK.

The last part contains two different functions that show the results extracted from the generated data. These functions are called plotLC and plotRateLC. Before them, an array is defined that represents the image intervals where a loop closure is supposed to happen. The user has to define the intervals by visually inspecting the dataset to find the images that close a loop with the selected query image. The images that make up each interval must be identified by their names. The intervals are then used to plot the results.

The function plotLC plots the matches and inliers for each pair of images, formed by the query image and a given image from the dataset. The regions where the user indicated a loop closure are highlighted, so one can easily compare the number of matches and inliers between different regions.

An example of the resulting plots can be seen in figure 5.3. The combinations shown are FAST-FREAK, SIFT-SIFT and SURF-FREAK. The blue squares represent the regions that have been defined by the user as loop closures. The yellow line indicates the location of the query image, and the number of matches and inliers around it is set to 0. The user can adjust the size of the neighborhood (interval of frames) around the query image whose matches and inliers are set to 0 by adjusting an internal parameter of the function called num_neighbors. This neighborhood around the query image must be ignored because the query image and this region show the same scene but are not considered a loop closure, since they belong to the same section of the trajectory. For a loop closure to happen there has to be a match between two frames of the same scene recorded at different time instants and, most likely, from different perspectives and viewpoints. Again, this kind of plot is useful to compare a small number of combinations, so it is recommended not to show more than two or three combinations at the same time.


The function plotRateLC shows two mean values of inliers for each combination specified. More specifically, it computes, on the one hand, the mean number of inliers in the regions specified by the user as loop closing and, on the other hand, the mean number of inliers in the other regions, which are supposed not to close a loop. These two values are then displayed as a grouped plot. Furthermore, to avoid considering the inliers obtained from the query image with itself, the function provides an internal parameter that allows the user to define an interval around the query image where the inliers are not considered. The user can also define another parameter that designates the size of an interval adjacent to the limits of the regions that close a loop. In this way it is ensured that the possible image pairs that close a loop at the boundaries of the region are rejected and, thus, the results are more accurate. An example of the plot generated by the function can be seen in figure 7.1.
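The two means can be sketched as follows. This Python outline is an interpretation, not the Matlab code: it identifies frames by index rather than by image name, the parameter names `num_neighbors` and `margin` are used loosely, and it assumes that the margin excludes frames lying just outside each loop-closing interval. The inlier counts in the example are made up.

```python
def mean_or_zero(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def loop_closing_means(inliers, loop_intervals, query_index, num_neighbors=0, margin=0):
    """Mean inliers inside and outside the user-defined loop-closing intervals.

    inliers[k] is the inlier count between the query image and frame k.
    loop_intervals is a list of inclusive (start, end) frame index pairs.
    """
    inside, outside = [], []
    for k, count in enumerate(inliers):
        if abs(k - query_index) <= num_neighbors:
            continue  # same trajectory section as the query: not a loop closure
        if any(start <= k <= end for start, end in loop_intervals):
            inside.append(count)
        elif any(start - margin <= k <= end + margin for start, end in loop_intervals):
            continue  # ambiguous frame next to a loop-closing region
        else:
            outside.append(count)
    return mean_or_zero(inside), mean_or_zero(outside)


# Made-up inlier counts for 12 frames; frame 9 is the query image.
inliers = [0, 1, 2, 50, 60, 55, 3, 1, 0, 900, 40, 2]
mean_inside, mean_outside = loop_closing_means(
    inliers, loop_intervals=[(3, 5)], query_index=9, num_neighbors=1, margin=1)
print(mean_inside, mean_outside)
```

A large gap between the two means indicates that the combination separates loop-closing frames from the rest of the trajectory.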


CHAPTER 6

RESULTS FOR THE ODOMETRY ESTIMATION

In this chapter all the results from the odometry estimation are shown and discussed in order to determine the combination of detector, descriptor and image contrast enhancement that gives the highest number of inliers in consecutive images.

The information that will be used to achieve this goal is the mean value of matches and inliers in consecutive pairs of images computed for the whole dataset of images.

The higher the mean number of matches and inliers, the more robust the combination is for this kind of environment with P.O.

The analysis has two phases. In the first phase the best combination of detector and descriptor for a given dataset is determined. This is repeated five times, once for each dataset. The number of possible combinations is 34; in each name, the first part refers to the detector algorithm and the second part to the descriptor algorithm. For example, FAST-SIFT is detector FAST with descriptor SIFT. In the second phase the best image contrast enhancement method is determined by comparing the results from the first phase.

6.1 Results obtained from the raw and enhanced datasets

The raw dataset contains all the original frames with no contrast enhancement. The mean numbers of matches and inliers in consecutive pairs of images are shown in figure 6.1. The first conclusion is that the average number of inliers is always lower than the number of matches, and the relation between them appears to be a constant factor for all the combinations: the bar plot of the inliers looks like that of the matches, slightly scaled down. Also, combinations with the same detector have similar mean values. For example, all combinations with detector SURF have high numbers of inliers, while all combinations with detector STAR have a very small number of inliers. For this particular dataset the best combination is FAST-ORB with 1122 inliers.

The average number of matches and inliers for the LCC dataset is shown in figure 6.2. Again the best combination is FAST-ORB, but in this case the difference with the second best combination is bigger. The behavior described for the raw dataset can also be observed in these results. The average number of inliers for the best combination is 2263.

Figure 6.1: Average number of matches and inliers in the raw dataset.

Figure 6.2: Average number of matches and inliers in LCC dataset.

Figure 6.3: Average number of matches and inliers in LOG3 dataset.

Figure 6.4: Average number of matches and inliers in MAI dataset.

Figure 6.5: Average number of matches and inliers in MSR dataset.

The results from the dataset LOG3 can be seen in figure 6.3. The resulting plots are very similar to the ones obtained from the LCC dataset but slightly higher. The best combination is again FAST-ORB with 2543 inliers.

The results from the MAI dataset are shown in figure 6.4. At this point the same bar plot shape can be expected. However, the best combination, FAST-ORB, reaches just 1946 inliers.

The results from the last dataset can be seen in figure 6.5. The plots have the same shape as expected and the best combination is FAST-ORB with 2053 inliers.

In conclusion, the combination that gives the highest number of inliers is FAST-ORB, and this is true for all the enhanced datasets and also for the raw dataset. Detector FAST combined with descriptors SIFT, BRISK and FREAK also gives a very high number of matches and inliers. The best options with detectors other than FAST are SIFT-SIFT, SURF-SIFT, SURF-BRISK and SURF-FREAK; even though they are not the best, all of them give more than 800 inliers, a number of stable correspondences sufficient to calculate a reliable odometry. On the other hand, the worst combinations are those obtained with detectors STAR, ORB, BRISK and HARRIS, independently of the descriptor used.

Referring to the image contrast enhancement algorithms, it can be seen that all of them increase the number of matches and inliers. The one that increases them the most is LOG3, and the improvement can be clearly perceived in the combinations with a higher number of matches and inliers. The worst combinations experience a small increase, but it is not enough to make them any better. This behaviour may suggest that the increase is proportional to the initial number of matches and inliers, so it is studied in more detail in section 6.3.

Figure 6.6: Number of matches and inliers along the trajectory for images enhanced with LOG3 and the three best combinations of detectors and descriptors.

6.2 Matches and inliers along the trajectory

The average number of inliers is a good measure of the overall robustness of a given combination. However, it does not consider the inliers frame by frame, which can be inconsistent along the sequence. For this reason the function described in section 5.1.3 has been used to analyze the matches and inliers frame by frame, to make sure the distribution of inliers is relatively homogeneous.

Three different groups of combinations have been selected to show their matches and inliers along their trajectory since showing all the combinations would require many plots. These three groups differ in the number of inliers so that the first group contains the best combinations, the second group contains the combinations with middle results, and finally the third group contains the worst combinations. To further limit the number of figures all the results have been obtained with the same image contrast enhancement algorithm which is LOG3 since it has proven to be the best.

The first group, with the best combinations, contains FAST-SIFT, FAST-BRISK and FAST-ORB, and the resulting plot can be seen in figure 6.6. The first striking fact is that the evolution of matches and inliers along the trajectory is very abrupt: it has some peaks higher than 6000, while other frames have zero or nearly zero inliers. This shows that the detector FAST is greatly affected by the particularities of the acquired images. The reader may also notice that all the combinations shown have a very similar shape and differ only in the maximum value, which in the end defines their mean number of inliers. This is the result of using the same detector with different descriptors.

Figure 6.7: Number of matches and inliers along the trajectory for images enhanced with LOG3 and three medium performance detectors and descriptors.

Figure 6.8: Number of matches and inliers along the trajectory for images enhanced with LOG3 and three of the worst combinations of detectors and descriptors.

The second group consists of combinations with intermediate results, which are SIFT-SIFT, SURF-SIFT and BRISK-SIFT. The resulting plot can be seen in figure 6.7. This time the varying parameter is the detector, with a fixed descriptor. Again the resulting plot looks similar to the previous one in shape and abruptness. However, the peaks are significantly lower, at 1800. Another observation is that there are a few frames where the number of inliers is zero for the combinations SIFT-SIFT and BRISK-SIFT, while SURF-SIFT never reaches zero. In this sense SURF-SIFT is superior because it allows a continuous odometry to be established frame by frame. If there are not enough inliers to calculate the camera displacement between two consecutive frames, the vehicle navigation filter will compensate by giving more weight to the other sensors, which is undesirable.

Finally, the third group contains the combinations STAR-SURF, ORB-SURF and HARRIS-SURF, which give a very low number of inliers. If we look at the resulting plot (figure 6.8) we can see that the curves are not even close to a flat line, but the scatter is not as high as in the previous combinations. Most of the inlier counts lie between 10 and 50 for STAR-SURF and HARRIS-SURF, while ORB-SURF has a wider scatter of 10 to 90. Also, the number of frames with zero inliers is not as high as one would expect: these combinations are bad simply because the number of inliers is too low.

6.3 Datasets comparison

From the previous results the best combination can be established as FAST-ORB with LOG3 image contrast enhancement. This combination gives the maximum average number of inliers in consecutive frames at 2543, which is a number 12.4% higher than the average number of inliers obtained with the second best image enhancement method (LCC with 2263 inliers). However it may be interesting to measure quantitatively the improvement provided by applying an image contrast enhancement to the raw dataset for all the combinations of detector and descriptor.
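The quoted percentage follows directly from the two averages reported for FAST-ORB (2543 inliers with LOG3 and 2263 with LCC):

```latex
\frac{2543 - 2263}{2263} = \frac{280}{2263} \approx 0.1237 \approx 12.4\,\%
```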

A new Matlab script has been created for this purpose and can be seen in B.1. The script reads all the information that has been generated for the odometry estimation and generates four bar plots that are very similar to those used in the previous section.

The x axis is the same, but the y axis, instead of showing the average number of inliers, shows the ratio of the average number of inliers in an enhanced dataset to that in the raw dataset. Since there are four different image contrast enhancement methods, four plots are generated.
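The ratio plotted on the y axis can be sketched as follows. This is a Python illustration, not the Matlab script of B.1. The function name `inlier_ratio` and all the averages are hypothetical, and the method labels LCC, LOG3, MAI and MSR are assumed to be the four enhancement methods compared here.

```python
def inlier_ratio(avg_enhanced, avg_raw):
    """Ratio of the average inliers in an enhanced dataset to the raw dataset."""
    if avg_raw <= 0:
        raise ValueError("the raw average must be positive to form a ratio")
    return avg_enhanced / avg_raw


# Hypothetical averages for a single detector-descriptor combination.
raw_avg = 2000.0
enhanced_avg = {"LCC": 2200.0, "LOG3": 2500.0, "MAI": 1900.0, "MSR": 2100.0}

# One ratio per enhancement method, i.e. one bar in each of the four plots.
ratios = {method: inlier_ratio(avg, raw_avg) for method, avg in enhanced_avg.items()}

for method, ratio in ratios.items():
    # A ratio above 1 means the enhancement increased the number of inliers.
    print(method, round(ratio, 3), "improves" if ratio > 1 else "does not improve")
```

A ratio below 1 indicates that the enhancement reduced the number of inliers with respect to the raw images.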

After running the script, figure 6.9 is generated. The first noticeable fact is the similarity of the results despite the different image contrast enhancement methods used. There are two groups, defined by detectors STAR and BRISK, that differ significantly from the rest because they have higher ratios of increment in the number of inliers. Unexpectedly, they appear mostly unchanged in all four plots. The ratios of detector STAR range from 10 to 24, which are huge compared to the group formed by detector BRISK, with ratios up to 5, and the rest, all with ratios under 5. Another important thing to note is that the group formed by ORB has the lowest ratios, all smaller than 1, which means that applying any of the four image enhancement methods only reduces the number of inliers.

Figure 6.9: Increment in average number of inliers for each enhanced dataset with respect to the raw dataset

The ratio of increment of inliers in the group formed by detector STAR is surprisingly high, which motivates a deeper study. In the plot that represents the average number of inliers in the raw dataset (figure 6.1), it stands out that the combinations with detector STAR have the lowest number of inliers, ranging from 2.4 to 7.6. These values are so low that, even after applying the image contrast enhancement, which increases the number of inliers up to 24 times, they still stay lower than the rest.
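A rough upper bound makes this concrete. Assuming, purely for illustration, that the largest ratio (24) applied to the largest raw average (7.6), the enhanced average for detector STAR would be at most

```latex
\[
7.6 \times 24 = 182.4
\]
```

inliers, which is still far below the 2263 inliers reported for FAST-ORB with LCC.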

This fact may suggest that lower increments can be expected from combinations with a high number of inliers in the raw dataset. To see how strong this relation is, consider the combinations with the highest number of inliers in the raw dataset, which are those with FAST and SURF. Their increments are low and consistent across all the enhanced datasets, which agrees with the proposed correlation. However, if we apply the reasoning in the inverse direction, that is, the higher the number of matches and inliers in the raw dataset the lower the proportional increment, it leads to a contradiction.

For example, figure 6.9 shows that ORB has the lowest increment. Following the previous reasoning, combinations with detector ORB should have the highest number of matches and inliers in the raw dataset, since their improvement is the lowest. In reality, however, combinations with detector ORB have low numbers of matches and inliers in the raw dataset, and thus both the reasoning and the proposed correlation are incorrect.

To settle the question, a few lines of code have been added to the script in order to generate the scatter plot seen in figure 6.10. Each point in the plot corresponds to a particular combination, placed according to its average number of inliers in the raw dataset on the X axis and its average increment over the four enhanced datasets on the Y axis. This way, any kind of relation should be visible.
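The construction of the scatter points can be sketched in Python as follows. It is an illustration rather than the code added to the Matlab script: the combination names are taken from this chapter, but every numeric value is invented. The Pearson coefficient is an extra check that is not part of the original analysis, and the helper `pearson` is a hypothetical name.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


# Hypothetical data: raw average inliers and the four enhanced/raw ratios.
combinations = {
    "FAST-ORB": (2000.0, [1.1, 1.25, 1.0, 1.05]),
    "SURF-SIFT": (900.0, [1.3, 1.4, 1.2, 1.3]),
    "ORB-SURF": (40.0, [0.8, 0.9, 0.7, 0.8]),
    "STAR-SURF": (5.0, [12.0, 20.0, 10.0, 18.0]),
}

# One point per combination: X is the raw average and Y is the mean of the
# ratios obtained with the four enhanced datasets.
points = {
    name: (raw, sum(ratios) / len(ratios))
    for name, (raw, ratios) in combinations.items()
}

xs = [x for x, _ in points.values()]
ys = [y for _, y in points.values()]
print(points)
print(pearson(xs, ys))
```

A single group with extreme ratios, such as STAR in this invented data, can dominate the coefficient, so it is worth inspecting the points themselves as well as the summary value.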

However, the results show no apparent relation between the number of inliers in the raw dataset and the increment of inliers in the enhanced datasets. In fact, the scatter plot suggests that all combinations exhibit a similar increment of inliers and that the combinations with detector STAR are just outliers.

6.4 Conclusions

As discussed in section 6.1, the best combination of detector and descriptor is detector FAST with descriptor ORB, and this is true for all four image contrast enhancement methods. It is considered the best because it gives the highest average number of inliers in consecutive images, which ensures the most robust odometry estimation available. The best image contrast enhancement algorithm has turned out to be LOG3, which combined with FAST-ORB gives 12.4% more inliers than the second best image enhancement method (LCC with FAST-ORB).


Figure 6.10: Scatter plot showing all the combinations by average number of inliers in the raw dataset vs. average increment in the four enhanced datasets.
