
2.2 Modeling


Several techniques can be used to create a virtual model of a real-world object. One of these is generating the virtual representation using a LiDAR scanner, which is presented in subsection 2.2.1. Another method is to use machine learning on floor plans to generate 3D models, as discussed in subsection 2.2.2. Finally, 360° imaging can be used as a supplement to the above modeling methods, as presented in subsection 2.2.3.

2.2.1 LiDAR

LiDAR is short for Light Detection and Ranging, and is a remote detection and ranging method used to measure distances. A LiDAR sensor emits infrared light pulses and measures how much time it takes each pulse to return after hitting nearby objects. The time between the output laser pulse and the detection of the reflected pulse allows the LiDAR sensor to calculate the distance to each object, with an accuracy that depends on the sensor. This is possible because the speed of light is known, so the distance can be computed from the travel time, as seen in Equation 2.1.

d = (c · t) / 2    (2.1)

Where d is the distance, c is the speed of light, and t is the time it takes the light to bounce back to the sensor. The product is divided by 2 because the light travels the distance twice: out to the object and back to the sensor.
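For illustration, Equation 2.1 can be expressed as a short Python function; the travel time used in the example call is only a made-up value:

```python
# Minimal sketch of the time-of-flight calculation in Equation 2.1.
# In practice the pulse travel time comes from the LiDAR hardware.
SPEED_OF_LIGHT = 299_792_458  # metres per second

def distance_from_time_of_flight(travel_time_s: float) -> float:
    """Return the distance in metres for a pulse that took travel_time_s
    seconds to reach the object and bounce back to the sensor."""
    return SPEED_OF_LIGHT * travel_time_s / 2

# A pulse returning after roughly 33.4 nanoseconds corresponds to about 5 metres.
print(distance_from_time_of_flight(33.4e-9))
```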

Each second, a LiDAR sensor captures millions of such distance measurement points. A point cloud, which is a collection of detected points in 3D space, is generated from a LiDAR scan, and a 3D model of the scanned environment can be generated from the point cloud. LiDAR sensors can provide reliable results over both short and long ranges, with an accuracy down to millimeters depending on the sensor.

(a) Point cloud generated by a LiDAR scanner. The points are connected to form vertices. (b) A textured 3D model generated from the scan's point cloud.

Figure 2.2: LiDAR scan of a sofa with 20 mm accuracy.

A key benefit of using a LiDAR sensor is the ability to perform well under any lighting conditions. This is because a LiDAR sensor only measures the reflection of the infrared light pulses it emits, which behave the same regardless of the existing lighting conditions. The resulting point cloud of a LiDAR scan can be converted into a 3D map of the scanned environment. A LiDAR scan can also be supplemented with other sensory data, such as images, to get a better understanding of the scanned environment. The point cloud surfaces are then combined with images of the scanned environment, creating a textured 3D model. LiDAR sensors are currently used in many industrial applications, ranging from scanning entire buildings, as presented in subsection 2.3.1, to navigation in autonomous vehicles.
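As a sketch of how a point cloud can be turned into such a 3D model, the open-source Open3D library offers Poisson surface reconstruction. The file names and parameter values below are placeholder assumptions, not taken from the project:

```python
import open3d as o3d

# Load a point cloud exported from a LiDAR scan (file name is a placeholder).
pcd = o3d.io.read_point_cloud("scan.ply")

# Surface reconstruction needs point normals; estimate them from nearby points.
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))

# Poisson surface reconstruction turns the point cloud into a triangle mesh.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)

# The mesh can then be textured with photos of the environment and exported
# to a format a VR application can load.
o3d.io.write_triangle_mesh("scan_mesh.obj", mesh)
```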

There are two types of LiDAR sensors: mechanical scanning LiDARs and solid-state LiDARs. A mechanical LiDAR can sample a large area simultaneously by rotating the sensor up to 360 degrees or by using a rotating mirror to steer the light beam, and it provides a detailed mapping of the scanned environment. However, high cost, complexity, and reliability issues make mechanical LiDARs an unattractive option. A solid-state LiDAR is built without any moving mechanical parts and scans an environment incrementally. This makes it highly efficient at processing data, as well as cheaper, according to LeddarTech, 2020. A solid-state LiDAR scanner can be either stationary or handheld.

2.2.2 Machine Learning - 3D model from floor plan

3D models of objects and environments can be created by generating the necessary parts using specialized algorithms. If enough data about the real-life object to be visualized is known, algorithms can recreate the object in a virtual environment, as discussed by Bjorn Mes in section B.3. Mes was able to accurately replicate a vessel virtually using only data about the vessel and an algorithm.

Another method for constructing a 3D model of a vessel is to build it from the vessel's floor plan. By using a deep neural network to predict room-boundary elements, a digital representation can be obtained and used to reconstruct a 3D model, as described by Zeng et al., 2019. The paper presents a new approach to recognizing elements in floor plan layouts, such as walls, doors, windows, and different types of rooms. Since the elements are recognized and labeled, a 3D reconstruction of the floor plan can be made.

The architecture of the network created by Zeng et al., 2019 can be seen in Figure 2.3a. Here, a deeper version of a convolutional neural network called VGG is used to extract features from an input floor plan image. The network is then divided into two branches with different tasks. One predicts the room-boundary pixels, i.e. walls, doors, and windows; these are the floor-plan elements that separate room regions. The other predicts the room-type pixels, i.e. dining room, kitchen, etc. The network thus learns the shared features common to both tasks, and then uses two separate VGG decoders to make the different predictions.
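The two-branch idea can be sketched in PyTorch as below. This is only an illustrative skeleton, not the published architecture: the class name, decoder layout, and class counts are assumptions, and the spatial contextual module described below is not included.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class TwoTaskFloorPlanNet(nn.Module):
    """Sketch of a shared-encoder, two-decoder network for floor plan parsing.
    Illustrates the two-branch idea only; the network of Zeng et al., 2019
    is deeper and connects the branches with a spatial contextual module."""

    def __init__(self, n_boundary_classes: int = 3, n_room_classes: int = 9):
        super().__init__()
        # Shared VGG feature extractor (encoder), random weights for the sketch.
        self.encoder = vgg16(weights=None).features
        # Two small decoders that upsample back to the input resolution.
        self.boundary_decoder = self._make_decoder(n_boundary_classes)
        self.room_decoder = self._make_decoder(n_room_classes)

    def _make_decoder(self, n_classes: int) -> nn.Sequential:
        return nn.Sequential(
            nn.Conv2d(512, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
            nn.Conv2d(256, n_classes, kernel_size=1),
        )

    def forward(self, x):
        features = self.encoder(x)                  # shared features
        boundary = self.boundary_decoder(features)  # wall/door/window logits
        rooms = self.room_decoder(features)         # room-type logits
        return boundary, rooms

# Example forward pass on a dummy 512x512 floor plan image.
model = TwoTaskFloorPlanNet()
boundary_logits, room_logits = model(torch.randn(1, 3, 512, 512))
```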

(a) Overall network architecture of the deep neural network created by Zeng et al., 2019.

(b) The attention & contextual layer in Figure 2.3a that helps make room-type predictions, by Zeng et al., 2019.

(c) The floor plan recognition result (lower-left) and the reconstructed 3D model (right) by Zeng et al., 2019.

Figure 2.3: The floor plan recognition network, its spatial contextual module, and a reconstructed 3D model by Zeng et al., 2019.

To help with making room-type predictions, a spatial contextual module is created, as seen in Figure 2.3b. Here, the input to the top branch is the room-boundary features from the top VGG decoder (see the blue boxes in Figure 2.3a and Figure 2.3b), while the input to the bottom branch is the room-type features from the bottom VGG decoder (see the green boxes in Figure 2.3a and Figure 2.3b). The results from the different levels are the spatial contextual features, which help integrate the features for the room-type predictions (Zeng et al., 2019).
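The core idea, using room-boundary features to guide the room-type features, can be sketched as a simple attention block. This is a much-simplified illustration under assumed feature sizes, not the actual module of Zeng et al., 2019:

```python
import torch
import torch.nn as nn

class BoundaryGuidedAttention(nn.Module):
    """Simplified sketch: room-boundary features (top branch) are turned into
    an attention map that re-weights the room-type features (bottom branch).
    The real spatial contextual module of Zeng et al., 2019 is more elaborate."""

    def __init__(self, channels: int):
        super().__init__()
        self.to_attention = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),  # per-pixel attention weights in [0, 1]
        )

    def forward(self, boundary_features, room_features):
        attention = self.to_attention(boundary_features)
        # Emphasise room-type features near predicted walls, doors and windows.
        return room_features + room_features * attention

# Example with feature maps of 64 channels at 128x128 resolution (assumed sizes).
module = BoundaryGuidedAttention(channels=64)
contextual = module(torch.randn(1, 64, 128, 128), torch.randn(1, 64, 128, 128))
```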

After the VGG network has been trained, it can predict room boundaries and room types from floor plans. These predictions can then be used to create a 3D model, as seen in Figure 2.3c.
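As a rough illustration of this final step, a predicted wall mask could be extruded into simple 3D geometry. The sketch below is not the reconstruction method used by Zeng et al., 2019; the function, pixel size, and wall height are assumptions for illustration only:

```python
import numpy as np
import open3d as o3d

def extrude_wall_mask(wall_mask: np.ndarray,
                      wall_height: float = 2.4,
                      pixel_size: float = 0.05) -> o3d.geometry.TriangleMesh:
    """Naively extrude a binary wall mask (H x W) into 3D by stacking one
    small box per wall pixel. Real reconstructions vectorise the walls first;
    this only illustrates the idea of lifting 2D predictions into 3D."""
    mesh = o3d.geometry.TriangleMesh()
    ys, xs = np.nonzero(wall_mask)
    for y, x in zip(ys, xs):
        box = o3d.geometry.TriangleMesh.create_box(
            width=pixel_size, height=pixel_size, depth=wall_height)
        box.translate((x * pixel_size, y * pixel_size, 0.0))
        mesh += box
    return mesh
```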

2.2.3 360° imaging

360° imaging, also known as omnidirectional imaging, uses cameras to create a high-resolution 360° field-of-view which shows the entire scene at hand. It works by combining several photos into a clear 360° view using advanced algorithms. Several different industries have applied this technology; the car industry, for example, uses it to provide visual assistance to drivers. A clear benefit of using these cameras is the enhanced overview they give the user.
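Dedicated 360° cameras perform this stitching on the device, but the principle can be illustrated with the general-purpose image stitcher in OpenCV. The file names below are placeholders for overlapping photos of a scene:

```python
import cv2

# Load the individual photos to be combined (file names are placeholders
# for overlapping shots taken around the scene).
images = [cv2.imread(name) for name in ("view_0.jpg", "view_1.jpg", "view_2.jpg")]

# OpenCV's high-level Stitcher estimates how the photos overlap and blends
# them into a single panorama.
stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)
status, panorama = stitcher.stitch(images)

if status == cv2.Stitcher_OK:
    cv2.imwrite("panorama.jpg", panorama)
else:
    print(f"Stitching failed with status {status}")
```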

There are two types of videos which can be seen in a 360° field-of-view: monoscopic videos and stereoscopic videos. Monoscopic videos are flat renderings captured by the 360° cameras, meaning that there is no depth perception. This is the most common type of 360° media, and it is captured using a single-lens system. Monoscopic renderings are commonly used for mapping, for instance in Google Street View (90Seconds, 2020). Stereoscopic video, on the other hand, is captured using a twin-lens system, mimicking how humans use their eyes to perceive depth and distance (Viewport, 2021). Stereoscopic video thus adds another level of immersion by adding depth data between the foreground and the background. An example of the added depth perception of stereoscopic video can be seen in Figure 2.4.

Figure 2.4: A stereoscopic image showing a woman closer to the camera than the image in the background (Terence Eden, 2013).
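To illustrate how a twin-lens system encodes depth, the sketch below uses OpenCV block matching to compute the disparity between a left and a right image; larger disparity corresponds to objects closer to the camera. The file names are placeholders, and the images are assumed to be rectified:

```python
import cv2

# Load a rectified stereo pair, i.e. the left- and right-lens images
# (file names are placeholders).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block matching compares small patches between the two images; the
# horizontal shift (disparity) of a patch is larger for closer objects.
matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = matcher.compute(left, right)

# The disparity map is the depth cue that a stereoscopic 360° video
# provides to the viewer; save a normalised version for inspection.
cv2.imwrite("disparity.png", cv2.normalize(
    disparity, None, 0, 255, cv2.NORM_MINMAX).astype("uint8"))
```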
