
Methods and implementation

The goal of this thesis is to create an autonomous quadcopter drone system with real-time analytic functionality for use in inspection and surveillance. Several tasks must be completed adequately to achieve that goal. First and foremost, there is a need to create a system that enables the quadcopter to fly autonomously without human intervention. Secondly, there is a need to deploy an object-detection system on a unit with sufficient computational power. In this part of the thesis, we outline how we approached these challenges.

3.1 Platform for autonomous drone operations - infrastructure and overview

The quadcopter drone SDK used in this thesis runs on mobile platforms such as iOS and Android. Running large neural networks on mobile devices is, however, not a viable option for this project, because such devices lack sufficient computational power, as can be seen in table 3.1. Empirical tests have previously revealed an attained frame rate of 1.7 fps when running shallow versions of YOLO on the client platform. Therefore, the remaining option was to utilize external computational power, either through cloud-based infrastructure or separate servers. Consequently, the system ended up with two different components: a drone client used for controlling the quadcopter and extracting sensor data, and a server component used for object detection. The system's structure will be presented in the sections to follow. We start by presenting the setup and architecture of the client.


3.1.1 Client architecture

We deployed the client part of the system on an iPhone 6 with iOS. The responsibilities of the client are to initiate missions, control the drone, and extract relevant sensor data to forward to the external server. The client has several capabilities.

Among them is the ability to start point-based missions, where one designates specific geolocations for the drone to fly to, in addition to area coverage missions, where one marks off an entire area on a map that shall be covered by the quadcopter. The client is also responsible for configuring mission-related parameters, such as quadcopter altitude, speed, and heading during flight. One should furthermore be able to get feedback during mission execution, such as live video from the drone, in addition to detections from the server. The client was broken down into separate component classes with unique responsibilities and structured according to the MVC paradigm. See figure 3.1 for a class diagram displaying the client system in its entirety. The client was implemented in Objective-C++ to have access to the POSIX socket networking code also used on the server. The development process ended up with the steps given in the list below.

1. Create graphical interface client application for the smartphone.

2. Connect client application to drone.

3. Create mission operator to manage drone missions.

4. Connect client to external server.

Graphical interfaces are natural starting points for development. The main GUI component of the client application is a map view used to steer the drone to different geographical coordinates. The map is used to select waypoints and to mark areas to cover when initiating missions.

Enabling the drone to talk to the client was the second part of the process.

We performed this by calling functions in the DJISDKManager part of the API.

Implementing the mission operator enabled the client to push control commands to the drone. We realized it by implementing various protocols provided by DJI.

The first required protocol was the DJISDKManagerDelegate protocol. It notifies the delegate class of registration status updates and changes in aircraft status.

The second protocol of importance was the DJIFlightControllerDelegate. That protocol allows the client to manipulate the aircraft, for instance, by sending control commands, adjusting altitude, velocity, and so forth.

The third protocol that needed support was DJIBaseProductDelegate. That protocol enables the application to receive notifications from the drone when there are connectivity changes or other internal component changes, and it also provides diagnostic information from the drone at regular intervals.

Table 3.1: Specifications of the iPhone 6 used as platform for the drone application

iPhone 6
Memory: 1 GB RAM DDR3
Graphics: PowerVR GX6450 (quad-core graphics)
Processor: Dual-core 1.4 GHz Typhoon (ARM v8-based)

Figure 3.1: A class diagram depicting the architecture of the client side of the system.


Streaming video from DJI drone

An important requirement for the client is to extract the video stream from the drone and forward it to the external server for processing. The process of extracting it did, however, prove to be a lot more convoluted than first thought. In this section, we will briefly lay out the process.

DJI provides a convenient framework called VideoPreviewer for its SDK to view the live video feed from the aircraft in the client application. However, it does not give access to individual video frames. It only enables implementing a view to display video on the mobile device, with decoding happening behind the scenes inside the framework. The design of the framework caused a problem for this project, since the project depends on being able to send images for processing to the external server. We ran into a couple of dead ends regarding the extraction of video during this project and spent roughly one month figuring out how we could decode the video. The answer lay in the multimedia library FFmpeg that we referenced in section 2.9.2.

What DJI makes available in the framework are h.264-encoded binary blobs of data that contain the video. Our initial idea was to send this binary blob of data to the server, perform decoding of the h.264 video there, and later perform the analysis on the decoded image. A video usually consists of a container which wraps streams of audio and video. The individual data elements inside such a stream are called frames, and a codec encodes each of those frames. To decode the frames, one passes packets from the streams inside the video to a decoder and performs some action on the decoded image. The process is as follows, with a minimal code sketch given after the list:

1. Open video stream.

2. Read packet from video stream into frame.

3. If the frame is valid, perform an action on the frame; otherwise go back to step 2.

4. Go back to step 2.
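The listing below is a minimal sketch of this generic decode loop written against FFmpeg's current decoding API. It is not the project's actual code: it assumes a source that FFmpeg can open and probe on its own (for instance a file or network URL), and, as the next paragraph explains, the raw DJI blobs did not come with enough information for this.

extern "C" {
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
}

bool decode_stream(const char *url) {
    AVFormatContext *fmt = nullptr;
    if (avformat_open_input(&fmt, url, nullptr, nullptr) < 0)   // 1. open the video stream
        return false;
    avformat_find_stream_info(fmt, nullptr);

    int video_idx = av_find_best_stream(fmt, AVMEDIA_TYPE_VIDEO, -1, -1, nullptr, 0);
    if (video_idx < 0) { avformat_close_input(&fmt); return false; }
    const AVCodec *codec = avcodec_find_decoder(fmt->streams[video_idx]->codecpar->codec_id);
    AVCodecContext *ctx = avcodec_alloc_context3(codec);
    avcodec_parameters_to_context(ctx, fmt->streams[video_idx]->codecpar);
    avcodec_open2(ctx, codec, nullptr);

    AVPacket *pkt = av_packet_alloc();
    AVFrame *frame = av_frame_alloc();
    while (av_read_frame(fmt, pkt) >= 0) {                      // 2. read a packet from the stream
        if (pkt->stream_index == video_idx) {
            avcodec_send_packet(ctx, pkt);
            while (avcodec_receive_frame(ctx, frame) == 0) {
                // 3. the frame is valid: perform an action on it,
                //    e.g. convert it to an image and forward it.
            }
        }
        av_packet_unref(pkt);                                   // 4. back to step 2
    }

    av_frame_free(&frame);
    av_packet_free(&pkt);
    avcodec_free_context(&ctx);
    avformat_close_input(&fmt);
    return true;
}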

The problem was that DJI did not provide sufficient information to create a decoder, nor would they disclose how they did it through their developer forum. To create a decoder one needs an array of parameters, none of which were provided.

Much time was spent trying to create a functioning decoder, but to no avail. Our effort to create a decoder ended up as figure 3.2 shows. However, after working many hours with the FFmpeg library, we started to grasp the underlying concepts of decoders. What we ended up doing was modifying the VideoPreviewer framework from DJI and injecting custom functions. We essentially had to hack their framework to extract the desired image data. Deep inside the framework we discovered a place where the decoded AVFrames were being forwarded to their view classes, so we created a function there that forwarded each frame to our NetworkManager class. The NetworkManager used OpenCV to convert the AVFrame to a regular image, and that image was sent to the server for processing.

Figure 3.2: The attempt to create a decoder on the server did something correct, but was far from perfect.
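The exact conversion code is not reproduced in this text; the sketch below shows one common way of turning a decoded AVFrame into an OpenCV cv::Mat using libswscale. The function name avframe_to_mat is chosen purely for illustration.

extern "C" {
#include <libavutil/frame.h>
#include <libavutil/pixfmt.h>
#include <libswscale/swscale.h>
}
#include <opencv2/core.hpp>
#include <cstdint>

// Convert a decoded AVFrame (typically YUV420P for h.264) into a BGR cv::Mat.
cv::Mat avframe_to_mat(const AVFrame *frame) {
    cv::Mat image(frame->height, frame->width, CV_8UC3);
    uint8_t *dst_data[4] = { image.data, nullptr, nullptr, nullptr };
    int dst_linesize[4] = { static_cast<int>(image.step), 0, 0, 0 };

    // libswscale handles the pixel-format conversion to packed BGR, which OpenCV expects.
    SwsContext *sws = sws_getContext(frame->width, frame->height,
                                     static_cast<AVPixelFormat>(frame->format),
                                     frame->width, frame->height, AV_PIX_FMT_BGR24,
                                     SWS_BILINEAR, nullptr, nullptr, nullptr);
    sws_scale(sws, frame->data, frame->linesize, 0, frame->height,
              dst_data, dst_linesize);
    sws_freeContext(sws);
    return image;   // ready to be compressed and sent to the server
}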


Waypoint-missions

One of the types of autonomous capabilities implemented for the quadcopter drone in this thesis is the waypoint-mission. In the waypoint-mission, the user selects waypoints on a map as destinations for the drone to cover during flight.

One can alter parameters for the waypoint-mission such as drone heading, speed, altitude and actions at each waypoint. The waypoint-mission is depicted in figure 3.5.

Area cover missions

Another type of autonomous mission added to the system developed in this thesis is the area cover mission. In the area cover mission, the user marks an area on a map for the quadcopter unit to cover. In the area cover mission, it is also possible to modify parameters such as speed, altitude, heading, and waypoint actions.

The area cover missions are generated by taking the raw pixel coordinates of the marked area on the screen, transforming the pixel coordinates to the map view coordinate system, and then obtaining the geographical points indicating the coordinates of the corners. The corner coordinates of the marked area are handed to the InspectionArea class, which generates the missions. The algorithm for creating the waypoints for the area cover missions first calculates the necessary number of target points to cover the desired area, and after that, it calculates the coordinates of each individual waypoint in the mission. A simplified sketch of such a generator is given below.
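The listing below is a simplified sketch of this kind of coverage generation, not the InspectionArea implementation itself: it assumes an axis-aligned rectangular area and a fixed sweep spacing expressed in degrees, and the function and parameter names are illustrative.

#include <algorithm>
#include <vector>

struct GeoPoint { double latitude; double longitude; };

// Generate a back-and-forth (lawnmower) waypoint pattern covering the
// rectangle spanned by two opposite corners of the marked area.
std::vector<GeoPoint> generateCoverageWaypoints(GeoPoint cornerA, GeoPoint cornerB,
                                                double sweepSpacingDeg) {
    std::vector<GeoPoint> waypoints;
    double latMin = std::min(cornerA.latitude, cornerB.latitude);
    double latMax = std::max(cornerA.latitude, cornerB.latitude);
    double lonMin = std::min(cornerA.longitude, cornerB.longitude);
    double lonMax = std::max(cornerA.longitude, cornerB.longitude);

    // Number of sweep lines needed to span the latitude extent of the area.
    int lines = static_cast<int>((latMax - latMin) / sweepSpacingDeg) + 1;

    bool leftToRight = true;
    for (int i = 0; i < lines; ++i) {
        double lat = std::min(latMin + i * sweepSpacingDeg, latMax);
        GeoPoint start{lat, leftToRight ? lonMin : lonMax};
        GeoPoint end{lat, leftToRight ? lonMax : lonMin};
        waypoints.push_back(start);
        waypoints.push_back(end);
        leftToRight = !leftToRight;   // alternate direction on every pass
    }
    return waypoints;
}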

Manual missions

The last mode of operation we added as part of the GUI is the manual mission.

The manual mission lets the user fly the drone manually while having access to the capabilities created in the client application. The main usage of the manual mode in this project is testing the prediction abilities of the platform. The tests we performed were, for instance, to hover the drone over a car and test out the object detection.

Graphical user interface

The client interface is touchscreen based and implemented for iOS-smartphones.

Through the GUI the user can control the quadcopter in addition to performing object detection on images taken during missions. The main way to control the drone is through the map view, and as a consequence, the majority of the interface makes use of it. By using satellite images, we make it easy to execute missions.

The main screen, as illustrated in figure 3.3, presents a couple of options to the user through a set of buttons. It enables the user to change the rotation of the drone's main camera. It furthermore lets the user perform predictions on the current camera images. Moreover, it also lets the user annotate the current location of the smartphone as a pin on the visible map view. The location annotation is mostly for use in experiments. In the bottom right corner of the screen, a video view is added to display a stream of video from the drone's main camera, giving good feedback regarding what the drone is currently doing during operations.

The video view is created to be flexible, such that it can be moved around and also set to full screen. This is depicted in figures 3.10 and 3.11. The last button lets the user select and initiate execution of the different quadcopter missions.

After pressing the ’select mission’ button the user is presented with three choices.

A video showcasing the flow in the client GUI can be found at https://youtu.be/N0tuARrqEHQ.

1. Waypoint-mission.

2. Area cover mission.

3. Manual mission.

We show the process of initiating area cover missions in figures 3.6 and 3.7. After marking an area to cover and pressing 'start mission' the drone will execute coverage of the desired area.

Waypoint-missions let the user select an arbitrary number of geographical coordinates to cover during a mission. The method for creating a waypoint-mission is presented in figure 3.5.


Figure 3.3: The main screen of the user interface. The video view is shown in the bottom right corner. The yellow drone icon denotes the current position of the drone.

Figure 3.4: When Select Mission is pushed in the main screen, one arrives at the mission selection view shown here. From here one can choose which mode to operate the drone in.

Figure 3.5: The waypoint-mission view with two points selected for the drone to cover in a mission.

Figure 3.6: This is the area search mode view. From here the user can select an area for the drone to cover.


Figure 3.7: Marking an area for the drone to cover.

Figure 3.8: A mission created for the drone based on a designated area.

Figure 3.9: This is the manual flight mode view. This lets the user fly around manually.

Figure 3.10: An illustration of the flexibility of the video view. It can be moved anywhere on the screen.


Figure 3.11: The video view in full screen. One can get in and out of full screen video view by tapping the video view.

3.1.2 Server architecture

When the client was fully functional, the focus shifted over to developing the server to be used for neural network inference. One of the available platforms provided by the Department of Computer Science at NTNU was the workstation computers residing in the Visual Computing Laboratory at NTNU. The one used for this thesis was a custom-built workstation with the specifications seen in table 3.2. We deployed TensorFlow object detection on this instance with GPU acceleration enabled. The object detection inference part of the code was implemented using Python.

After we had deployed the object detection system, it was still isolated from the client system. To enable communication between the two entities we turned the workstation into a server by opening up some ports and making the IP-address of the workstation public on the web. We set the communication up using POSIX sockets as discussed in section 2.7, to get real-time communication capabilities.
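As an illustration of this setup, the sketch below shows a minimal POSIX socket server that accepts a single client and reads length-prefixed messages. It is a sketch only: the port number, the single-client assumption, and the helper names are ours, not taken from the thesis code.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <vector>

// Read exactly len bytes, looping because recv() may return partial data.
static bool recvAll(int sock, void *data, size_t len) {
    char *p = static_cast<char *>(data);
    while (len > 0) {
        ssize_t n = recv(sock, p, len, 0);
        if (n <= 0) return false;        // connection closed or error
        p += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}

int main() {
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    int yes = 1;
    setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;   // listen on the workstation's public address
    addr.sin_port = htons(9000);         // illustrative port, opened in the firewall
    bind(listener, reinterpret_cast<sockaddr *>(&addr), sizeof(addr));
    listen(listener, 1);

    int client = accept(listener, nullptr, nullptr);
    uint32_t length = 0;
    while (recvAll(client, &length, sizeof(length))) {
        std::vector<uint8_t> payload(ntohl(length));
        if (!recvAll(client, payload.data(), payload.size())) break;
        // payload now holds one encoded image from the drone client,
        // ready to be handed to the detection process.
    }
    close(client);
    close(listener);
    return 0;
}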

By creating a code base in C++ for communication, we had highly portable networking code supported both by the client platform and by the server. We embedded the C++ code into the iOS client by writing it in the hybrid language known as Objective-C++, which is a mix of Objective-C and C++. We furthermore created custom memory stream classes to wrap the data to be sent over the network, and by integrating the OpenCV library we got support for sending image data through these streams. We created one input memory stream for receiving data and one output memory stream for sending data.

Table 3.2: Specifications of the utilized workstation computer at the Visual Computing Laboratory

Workstation
Memory: 32 GB RAM DDR3
Graphics: Nvidia TITAN X
Processor: Intel® Core™ i7-6800K
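The memory stream classes themselves are not reproduced here; the sketch below shows one plausible form of the client-side image handoff they wrap: compressing a cv::Mat with OpenCV and sending it length-prefixed over the POSIX socket. The names sendAll and sendImage are illustrative, not the thesis's identifiers.

#include <arpa/inet.h>
#include <sys/socket.h>
#include <cstdint>
#include <vector>
#include <opencv2/imgcodecs.hpp>

// Write the whole buffer, looping because send() may write only part of it.
static bool sendAll(int sock, const void *data, size_t len) {
    const char *p = static_cast<const char *>(data);
    while (len > 0) {
        ssize_t n = send(sock, p, len, 0);
        if (n <= 0) return false;
        p += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}

// Compress one frame and send it with a 4-byte length prefix so the server
// knows where each image ends in the byte stream.
bool sendImage(int sock, const cv::Mat &image) {
    std::vector<uchar> buffer;
    cv::imencode(".jpg", image, buffer);
    uint32_t length = htonl(static_cast<uint32_t>(buffer.size()));
    return sendAll(sock, &length, sizeof(length)) &&
           sendAll(sock, buffer.data(), buffer.size());
}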

We implemented object detection in Python because we needed access to TensorFlow Object Detection. However, we implemented the part of the server communicating with the client in C++, driven by a desire to maximize network communication speed. For the two separate programs to be able to communicate, a fair amount of research went into discovering the optimal inter-process communication scheme. The solution became using shared memory. By using shared memory, we got the best of both worlds, quick communication and powerful object detection capabilities, with no compromise. We did this by creating a scheme for the two processes to follow, as can be seen in figure 3.12 and figure 3.13.
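The actual handoff follows the activity flows in figures 3.12 and 3.13; the sketch below only illustrates the general mechanism on the C++ side, using POSIX shared memory with a simple ready flag. The segment name, layout, and flag-based synchronization are assumptions made for illustration, not the thesis's scheme, and a production version would want a proper semaphore or atomic.

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

// Illustrative layout of the shared segment exchanged with the Python process.
struct SharedFrame {
    volatile uint32_t ready;         // 1 = new frame available, 0 = consumed
    uint32_t size;                   // number of valid bytes in data
    uint8_t data[4 * 1024 * 1024];   // encoded image bytes
};

SharedFrame *openSharedFrame(const char *name) {
    // Create or open the named segment and map it into this process.
    int fd = shm_open(name, O_CREAT | O_RDWR, 0666);
    if (fd < 0) return nullptr;
    if (ftruncate(fd, sizeof(SharedFrame)) != 0) { close(fd); return nullptr; }
    void *mem = mmap(nullptr, sizeof(SharedFrame),
                     PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return mem == MAP_FAILED ? nullptr : static_cast<SharedFrame *>(mem);
}

void publishFrame(SharedFrame *shared, const uint8_t *bytes, uint32_t size) {
    while (shared->ready) usleep(1000);   // wait until Python has consumed the last frame
    std::memcpy(shared->data, bytes, size);
    shared->size = size;
    shared->ready = 1;                    // hand the frame over to the detector
}

On the Python side, the same segment can be opened with the standard mmap module, after which the encoded bytes are decoded and fed to the detector.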


Figure 3.12: The activity flow for the C++ component of the server.

Figure 3.13: The activity flow for the Python component of the server.


Figure 3.14: Sample of cars as seen from the quadcopter at 10 meters of altitude.

3.2 Object detection

With the infrastructure in place, the time had come to train and implement the neural networks that were going to perform the analysis on data from the aircraft.

To showcase the capabilities of the system, we wanted a neural network that was capable of detecting cars from a top-down aerial view. Sample images from the drone are shown in figures 3.14, 3.15, and 3.16. These samples illustrate the kind of imagery we need to run object detection on.

When we evaluated which object detector to utilize, we considered both speed and accuracy, and after careful consideration, we found that SSD would be the best fit for this project. SSD's fast inference time became the determining factor since the project aimed at achieving real-time performance. We deployed the model by using TensorFlow's Object Detection API on the workstation referenced in table 3.2. We trained the model through several iterations on different datasets. We manually gathered data through various means; both aerial airplane photographs from online services and data gathered during drone flight were used in the process. The images were annotated using the LabelImg tool from section 2.5.2. The model we chose for this project was pre-trained on the COCO dataset. Ideally, we would have utilized a model pre-trained on the KITTI dataset, but we could not find a version of SSD in the TensorFlow object detection model zoo trained on that dataset. Training was monitored using TensorBoard. The following subsections will present how the object detector was trained during eight iterations. The results for each of the iterations will be presented in chapter 4.

Figure 3.15: Sample of cars as seen from the quadcopter at 20 meters of altitude.

Figure 3.16: Sample of cars as seen from the quadcopter at 30 meters of altitude.


Figure 3.17: Sample airplane photo at full resolution with a subsection to illustrate how the cars look up close. Notice the similarity to the drone footage.