Machine learning & air quality forecasting

Machine learning


sample number i, and h represents the predictive function of a machine learning model.

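Coefficient of determination (r²) The coefficient of determination measures the proportion of the variance in the observed values that is explained by the model, and thereby indicates how well future samples are likely to be predicted. The best possible score is 1, a model that always predicts the mean of the observed values scores 0, and the score can be negative when a model performs worse than that [40].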

Root mean square error (RMSE) The root mean square error emphasizes large errors by squaring the prediction error, which may be an undesired property if there are many outlying values that should be ignored [5, p. 37-39].

Mean absolute error The mean absolute error is a simpler metric and is the mean absolute distance between the predicted value and the observed value [5, p. 39].

Median absolute error As the name suggests, the median absolute error is the median of the absolute errors made by the model when its predictions are compared with the actual values. A potential advantage of using the median is that it is robust to outliers: a few very large errors have little or no effect on it.
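The four metrics can be written out in a few lines of plain Python, which also makes their different sensitivity to outliers visible. This is a minimal sketch, not the implementation used in this work or in [40]; the function names are chosen to resemble scikit-learn's, y_true holds the observed values and y_pred the predictions h(x_i).

```python
import math
import statistics

# y_true: observed values y_i; y_pred: model predictions h(x_i), in the same order.

def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot.
    Undefined (division by zero) if all observed values are equal."""
    mean = statistics.fmean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error: squaring gives large errors extra weight."""
    return math.sqrt(statistics.fmean((y - p) ** 2 for y, p in zip(y_true, y_pred)))

def mean_absolute_error(y_true, y_pred):
    """Mean of the absolute differences between prediction and observation."""
    return statistics.fmean(abs(y - p) for y, p in zip(y_true, y_pred))

def median_absolute_error(y_true, y_pred):
    """Median of the absolute differences; a few huge errors barely move it."""
    return statistics.median(abs(y - p) for y, p in zip(y_true, y_pred))

if __name__ == "__main__":
    observed = [10, 12, 14, 16, 18]
    predicted = [11, 12, 13, 16, 38]  # the last prediction is off by 20
    print(round(r2_score(observed, predicted), 2))             # -9.05
    print(round(rmse(observed, predicted), 2))                 # 8.97
    print(round(mean_absolute_error(observed, predicted), 2))  # 4.4
    print(median_absolute_error(observed, predicted))          # 1
```

With one prediction off by 20, RMSE rises well above MAE, the median absolute error stays at 1, and r² turns negative because the model does worse than always predicting the mean.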

5.3 Machine learning & air quality forecasting

Stochastic multiple linear regression and neural networks have previously been used successfully to predict the concentrations of air pollutants [18].

Forecasting models based on multiple linear regression with meteorological variables have previously been developed and tested with data from Helsinki and Athens. The aim of the study was to forecast both the maximum hourly concentration and the daily average concentration of PM₁₀ and NOₓ for the following day. The daily average proved the easier of the two to model, as anomalies were smoothed out by the more predictable observations. Seasonal variation was handled by building separate models for the cold and warm periods of the year [18].
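As a rough illustration of the seasonal split, the sketch below fits one ordinary least-squares model per season through the normal equations and picks the model from the month being forecast. The October to March cold period, the feature choice, and all function names are assumptions for illustration, not details taken from [18].

```python
# Hypothetical season boundary: October to March is treated as the cold period.
COLD_MONTHS = {10, 11, 12, 1, 2, 3}

def season(month):
    return "cold" if month in COLD_MONTHS else "warm"

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for i in reversed(range(n)):
        coef[i] = (m[i][n] - sum(m[i][j] * coef[j] for j in range(i + 1, n))) / m[i][i]
    return coef

def fit_linear_regression(X, y):
    """Multiple linear regression via the normal equations (X^T X) b = X^T y.
    An intercept column of ones is prepended; coef[0] is the intercept."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def fit_seasonal_models(samples):
    """samples: iterable of (month, features, target). Each season needs enough samples."""
    grouped = {"cold": ([], []), "warm": ([], [])}
    for month, features, target in samples:
        X, y = grouped[season(month)]
        X.append(features)
        y.append(target)
    return {name: fit_linear_regression(X, y) for name, (X, y) in grouped.items()}

def predict(models, month, features):
    coef = models[season(month)]
    return coef[0] + sum(c * f for c, f in zip(coef[1:], features))

if __name__ == "__main__":
    # Synthetic (temperature, wind speed) -> concentration data with a different relation per season.
    cold = [(1, (t, w), 40 - 2 * t - 3 * w) for t, w in [(-5, 1), (0, 4), (3, 2), (1, 6), (-2, 3)]]
    warm = [(7, (t, w), 10 + t - w) for t, w in [(15, 2), (20, 5), (18, 1), (25, 3), (12, 4)]]
    models = fit_seasonal_models(cold + warm)
    print(round(predict(models, 12, (2, 2)), 3), round(predict(models, 6, (22, 3)), 3))
```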

A two-day pollutant forecast with good predictive ability has previously been built for the city of Palermo (Italy) using an Elman model, a type of recurrent neural network, to forecast the concentrations of several pollutants, including PM₁₀ and NO₂. The input variables were wind speed, wind direction, pressure, and temperature [20].
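An Elman network feeds its previous hidden state back in through context units: h_t = tanh(W_xh x_t + W_hh h_(t-1) + b_h), with output y_t = W_hy h_t + b_y. The forward pass below is a minimal sketch of that recurrence with a tanh activation (one common choice). The weights are random and untrained, since training (for example backpropagation through time) is omitted, and the four inputs are hypothetical normalized values of wind speed, wind direction, pressure, and temperature.

```python
import math
import random

def init_weights(n_in, n_hidden, n_out, seed=0):
    """Hypothetical small random initialization; no training happens in this sketch."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return (matrix(n_hidden, n_in), matrix(n_hidden, n_hidden), [0.0] * n_hidden,
            matrix(n_out, n_hidden), [0.0] * n_out)

def elman_forward(inputs, W_xh, W_hh, b_h, W_hy, b_y):
    """Run an Elman network over a sequence of input vectors and return one output per step."""
    hidden = [0.0] * len(b_h)  # context units start at zero
    outputs = []
    for x in inputs:
        # The comprehension reads the previous hidden state before it is replaced.
        hidden = [
            math.tanh(sum(w * xi for w, xi in zip(W_xh[j], x))
                      + sum(w * hj for w, hj in zip(W_hh[j], hidden))
                      + b_h[j])
            for j in range(len(b_h))
        ]
        outputs.append([sum(w * hj for w, hj in zip(W_hy[k], hidden)) + b_y[k]
                        for k in range(len(b_y))])
    return outputs

if __name__ == "__main__":
    # Hypothetical normalized inputs per time step: wind speed, wind direction, pressure, temperature.
    sequence = [[0.2, 0.8, 0.5, 0.3], [0.4, 0.7, 0.5, 0.35], [0.1, 0.9, 0.6, 0.4]]
    weights = init_weights(n_in=4, n_hidden=5, n_out=2)
    for step, y in enumerate(elman_forward(sequence, *weights)):
        print(step, [round(v, 3) for v in y])  # two untrained outputs, e.g. PM10 and NO2
```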

Neural networks have previously been built to forecast next-day PM₁₀ concentrations in urban areas in Belgium. The study concluded that meteorological conditions were the main influence on the PM₁₀ concentration, with boundary layer height as the most important variable, while anthropogenic activities had a smaller effect. Contrary to much similar work, wind speed did not play a significant role in the accuracy of the model [19].

Neural networks and lazy learning have been tested, with promising results, for forecasting ozone and PM₁₀ in the city of Milan from current air quality and meteorological data. The best predictor of PM₁₀ was the previous observation, with less weight on the meteorological variables [41].
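Lazy learning defers all generalization until a prediction is requested: training only stores samples, and each query is answered from the stored samples closest to it. The sketch below uses a k-nearest-neighbour average, the simplest form of lazy learning; this is an assumption, and the cited study may use a different local model. Reflecting the finding that the previous observation was the best predictor, the hypothetical feature vectors put the previous day's PM₁₀ first, and a persistence forecast (the next value equals the latest one) is included as a baseline.

```python
import math

class LazyKNNRegressor:
    """Lazy learner: training only stores samples; all work happens at query time."""

    def __init__(self, k=3):
        self.k = k
        self.samples = []  # list of (feature tuple, target)

    def add(self, features, target):
        # New observations can be appended at any time without refitting anything.
        self.samples.append((tuple(features), target))

    def predict(self, features):
        # Average the targets of the k stored samples nearest in Euclidean distance.
        # In practice the features should be scaled to comparable ranges first.
        nearest = sorted(self.samples, key=lambda s: math.dist(s[0], features))[: self.k]
        return sum(target for _, target in nearest) / len(nearest)

def persistence_forecast(history):
    """Baseline: predict that the next value equals the latest observation."""
    return history[-1]

if __name__ == "__main__":
    model = LazyKNNRegressor(k=2)
    # Hypothetical samples: (previous-day PM10, temperature, wind speed) -> next-day PM10.
    model.add((40.0, 5.0, 2.0), 42.0)
    model.add((35.0, 6.0, 2.5), 37.0)
    model.add((12.0, 15.0, 6.0), 14.0)
    print(model.predict((38.0, 5.5, 2.2)))           # 39.5, mean of the two closest targets
    print(persistence_forecast([30.0, 35.0, 38.0]))  # 38.0
```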

A forecasting system that makes NO₂ forecasts 24-48 hours in advance at four different locations in Ireland has been developed using a model based on multiple linear regression, historical NO₂ observations, and meteorological data. Wind speed and direction were found to be significantly associated with the NO₂ levels at three of the sites [16].
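Wind direction is an angle, so 355° and 5° are numerically far apart but physically almost the same. One common way to give it to a linear regression is to encode it as its sine and cosine. The sketch below assumes hourly series and builds training pairs in which the NO₂ observation and wind at hour t are used to predict NO₂ at hour t + horizon (for example 24 or 48). The function names are hypothetical, and an operational system would use forecast rather than observed meteorology for the target period.

```python
import math

def encode_direction(degrees):
    """Map a wind direction in degrees to (sin, cos) so that e.g. 355 and 5 degrees end up close."""
    radians = math.radians(degrees)
    return math.sin(radians), math.cos(radians)

def build_lagged_dataset(no2, wind_speed, wind_direction, horizon):
    """Pair features observed at hour t with the NO2 value at hour t + horizon.
    Feature order: [NO2 at t, wind speed at t, sin(direction), cos(direction)]."""
    X, y = [], []
    for t in range(len(no2) - horizon):
        sin_d, cos_d = encode_direction(wind_direction[t])
        X.append([no2[t], wind_speed[t], sin_d, cos_d])
        y.append(no2[t + horizon])
    return X, y

if __name__ == "__main__":
    no2 = [31.0, 28.0, 35.0, 40.0, 38.0, 30.0]
    speed = [3.0, 4.5, 2.0, 1.5, 2.5, 5.0]
    direction = [350.0, 10.0, 200.0, 180.0, 220.0, 270.0]
    X, y = build_lagged_dataset(no2, speed, direction, horizon=2)
    print(len(X), y)  # 4 [35.0, 40.0, 38.0, 30.0]
```

The resulting rows can be passed to any multiple linear regression fit.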

Chapter 6

Architecture

The Braluft system is based on a microservice architecture consisting of four microservices spread across three virtual machines. Each microservice is a stand-alone application sub-unit, and the services communicate with each other over HTTPS using REST-inspired APIs. This approach gives flexibility, as each service can be developed and redeployed separately without affecting the other services [42]. More traditional "monolithic" applications, by contrast, require a full redeployment of the entire code base for even small changes. The microservice architecture is consequently more lightweight, which makes deploying updates easier, and it is well suited to situations where it is difficult to anticipate all functionality in advance [43].
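To make the REST-style communication concrete, the sketch below runs a tiny JSON endpoint with Python's standard library and calls it from a client, much as the main service might query the Source Service. The `/observations` path, the payload, and the use of plain HTTP on localhost are assumptions made for a self-contained example; Braluft itself communicates over HTTPS, and the frameworks it uses are not described here.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

# Hypothetical payload; the real Source Service data format is not described in the text.
OBSERVATIONS = {"station": "Danmarkplass", "no2": 23.5, "pm10": 14.0}

class SourceServiceHandler(BaseHTTPRequestHandler):
    """Serves a single read-only, REST-style resource as JSON."""

    def do_GET(self):
        if self.path != "/observations":
            self.send_error(404, "Unknown resource")
            return
        body = json.dumps(OBSERVATIONS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def start_service(host="127.0.0.1", port=0):
    """Start the service in a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer((host, port), SourceServiceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    bound_host, bound_port = server.server_address[:2]
    return server, f"http://{bound_host}:{bound_port}"

def fetch_json(base_url, path):
    """Client side, as another service would call it: GET a resource and decode the JSON."""
    with urlopen(base_url + path, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))

if __name__ == "__main__":
    server, base_url = start_service()
    try:
        print(fetch_json(base_url, "/observations"))
    finally:
        server.shutdown()
        server.server_close()
```

Because the client only needs a base URL, a service can be redeployed or moved to another machine by changing configuration, without touching the code of the services that call it.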

The virtual machines are hosted on UH-IaaS, a collaboration between the universities of Bergen and Oslo that offers cloud computing to members of various research organizations, including the University of Bergen [44]. They run with identical specifications and operating system (1 vCPU, 4 GB RAM, 20 GB hard drive, Ubuntu 18.04).

This section provides a more detailed description of the data sources and how they fit into the overall architecture of Braluft. The three virtual machines are given non-descriptive names, Wilhelm, Thorvald, and Ragnvald, that make them distinguishable without tying them to a particular role. The intent is to make it easy to move services between machines if needed. The following list gives an overview of the architecture, summarizing the services hosted by each machine.

• Wilhelm

Main service

Responsible for coordinating the communication between all the services in the daily routine operations, for data persistence, and for serving as an API for the front end. In other words, everything runs through the main service.

Source Service

Enables gathering and formatting of data from external APIs and hosting these data so the main service can access them.

PostgreSQL

A relational database for data persistence.

Front-end

Static file hosting of front-end resources.

• Thorvald

Model manager

Responsible for handling the machine learning models that are part of the application. This includes training models, making predictions using data provided by the main service, and hosting various utility functions.

• Ragnvald

Image service

The image service is responsible for downloading web camera images overlooking the intersection at Danmarkplass, detecting the number of vehicles in each image, and providing these data to the main service.

Figure 6.1: The architecture behind Braluft

As the figure illustrates, everything runs through the main service. The Source Service hosts weather forecasts and observations from the APIs of the Norwegian Meteorological Institute, as well as air quality observations from NILU. Observational traffic data are created by the Image Service from web camera images. The main service collects these data and saves them in the database. It also sends the observational data to the Model Manager to train the underlying machine learning models, and makes sure that forecasts are made based on weather forecasts from the Source Service and traffic forecasts from the Model Manager.
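The data flow described above can be summarized as a single routine in the main service. The sketch below is a minimal, hypothetical version: the service classes are in-process stubs returning made-up values rather than HTTP clients, an in-memory SQLite database stands in for PostgreSQL, and the Model Manager's models are placeholders (one stochastic gradient descent step per observation for NO₂, and a running mean of vehicle counts as the traffic forecast). None of the class or method names come from Braluft.

```python
import sqlite3

class SourceService:
    """Hypothetical stub for weather (MET Norway) and air quality (NILU) data."""
    def weather_observation(self, hour):
        return {"temperature": 4.0 + hour % 3, "wind_speed": 2.0 + hour % 2}
    def air_quality_observation(self, hour):
        return {"no2": 20.0 + 2.0 * (hour % 4)}
    def weather_forecast(self, hour):
        return {"temperature": 5.0, "wind_speed": 2.5}

class ImageService:
    """Hypothetical stub for vehicle counts derived from web camera images."""
    def vehicle_count(self, hour):
        return 10 + hour % 5

class ModelManager:
    """Placeholder incremental models: linear NO2 model updated by one SGD step per sample,
    plus a running mean of vehicle counts used as a naive traffic forecast."""
    def __init__(self, learning_rate=0.001):
        self.weights = [0.0, 0.0, 0.0, 0.0]  # bias, temperature, wind speed, vehicles
        self.learning_rate = learning_rate
        self.vehicle_total = 0
        self.samples = 0

    def _features(self, weather, vehicles):
        return [1.0, weather["temperature"], weather["wind_speed"], float(vehicles)]

    def train(self, weather, vehicles, no2):
        x = self._features(weather, vehicles)
        error = sum(w * xi for w, xi in zip(self.weights, x)) - no2
        self.weights = [w - self.learning_rate * error * xi for w, xi in zip(self.weights, x)]
        self.vehicle_total += vehicles
        self.samples += 1

    def traffic_forecast(self):
        return self.vehicle_total / self.samples if self.samples else 0.0

    def predict_no2(self, weather, vehicles):
        x = self._features(weather, vehicles)
        return sum(w * xi for w, xi in zip(self.weights, x))

def create_schema(db):
    db.execute("CREATE TABLE observations (hour INTEGER, temperature REAL, "
               "wind_speed REAL, vehicles INTEGER, no2 REAL)")
    db.execute("CREATE TABLE forecasts (hour INTEGER, no2 REAL)")

def daily_routine(db, source, images, models, hours):
    """Main-service routine: collect observations, persist them, train, then forecast."""
    for hour in hours:
        weather = source.weather_observation(hour)
        no2 = source.air_quality_observation(hour)["no2"]
        vehicles = images.vehicle_count(hour)
        db.execute("INSERT INTO observations VALUES (?, ?, ?, ?, ?)",
                   (hour, weather["temperature"], weather["wind_speed"], vehicles, no2))
        models.train(weather, vehicles, no2)
    next_hour = hours[-1] + 1
    forecast = models.predict_no2(source.weather_forecast(next_hour), models.traffic_forecast())
    db.execute("INSERT INTO forecasts VALUES (?, ?)", (next_hour, forecast))
    db.commit()
    return forecast

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_schema(db)
    print(daily_routine(db, SourceService(), ImageService(), ModelManager(), list(range(24))))
```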

In document Braluft: Forecasting air quality using incremental models and computer vision (pages 52-57)
