Scanning the horizon : forecasting and trading on forward freight agreements using long short-term memory neural networks and AIS-derived features


Scanning the Horizon

Forecasting and trading on Forward Freight Agreements using Long Short-Term Memory Neural Networks and AIS-derived features

Herman Farbrot and Sindre Kalvik

Supervisor: Roar Os Ådland

Master thesis

MSc in Economics and Business Administration, Business Analytics and Finance

NORWEGIAN SCHOOL OF ECONOMICS

This thesis was written as a part of the Master of Science in Economics and Business Administration at NHH. Please note that neither the institution nor the examiners are responsible − through the approval of this thesis − for the theories and methods used, or results and conclusions drawn in this work.

Bergen, Fall 2019


Acknowledgements

First, we would like to thank our supervisor Roar Os Ådland for providing us with guidance throughout the process and giving valuable input regarding feature engineering and forecasting issues. Secondly, we would like to thank Gabriel Fuentes for helping with the subsampling of Automatic Identification System (AIS) data for each International Maritime Organization (IMO) number, which helped in the process of vessel screening and feature engineering. Thirdly, we would like to thank the Center for Applied Research (SNF) at NHH for providing us with AIS data.


Abstract

The purpose of this study has been to predict Forward Freight Agreement (FFA) prices using machine learning techniques, investigate the additional forecasting power of Automatic Identification System (AIS) derived features, and to evaluate the profitability of applying forecasted directional movements to trading strategies.

A Long Short-Term Memory (LSTM) neural network is used to predict price movements for the two closest quarterly contracts and the closest calendar year contract for Capesize 5 Time Charter (5TC) FFAs.

We have derived features from AIS data to generate proxies for supply, demand and geographical distribution for a subset of Capesize vessels. Additionally, we have included commodity prices and macroeconomic variables. The forecasting horizon investigated has been one week, two weeks, and one month ahead. To benchmark the LSTM model, we have included Vector Autoregressive (VAR) models, Autoregressive Integrated Moving Average (ARIMA) models and a Random Walk.

The VAR models were found to be superior at forecasting FFA prices, and the results showed that both the LSTM neural network and the VAR models show potential for predicting directional movements of prices. The results further indicate that AIS data holds predictive capabilities regarding directional movements of prices. Lastly, the trading results indicate that utilizing the trading signals from the models increases profitability compared to buy-and-hold and trend-following benchmarks.


1 Introduction

Historically, the dry bulk shipping market has been extremely volatile. The large freight rate fluctuations create opportunities for generating substantial returns. On the other hand, these large fluctuations are a source of great risk for the operators. Foresight of the future developments in rates is therefore invaluable for speculators and operators, where derivatives such as Forward Freight Agreements (FFAs) may be used for both reducing and increasing freight rate exposure.

FFAs are financially settled forward contracts, where the underlying asset is one of the freight rate indices published by The Baltic Exchange. The settling price for an FFA contract is calculated using the average spot price over the maturity period for the underlying indices.

The contracts trade in a competitive over-the-counter market and go through clearinghouses (Baltic Exchange, 2019). The clearing of contracts eliminates the default risk, but price risk and cash flow risk are still inherent, due to movements in the underlying rates and margin requirements. The use of brokers as intermediaries affects the execution speed of trades, as well as liquidity and transaction costs. In combination with relatively sizeable minimum trading requirements, this makes it difficult for individual investors to access the market (Wilson, 2013).

FFAs were initially intended as a risk management tool for operators in the market, but there has been substantial speculative interest from investment banks and hedge funds (Zheng & Chen, 2018). Shipowners have a long exposure to physical freight, and benefit from increased rates, while charterers have a short exposure, as they are obliged to pay for freight services.

These parties may take opposite positions in FFA contracts to hedge against unfavorable price movements to stabilize earnings and cash flow. Precise forecasts of FFA prices may be particularly helpful regarding contractual decisions and development of hedging strategies.

For speculative third parties, information about future FFA prices may be used to form effective trading strategies, without participating in the underlying market.

Nomikos and Doctor (2013) applied technical trading rules for the FFA market, where the superior trading strategies generated substantial excess returns compared to a buy-and-hold benchmark. Their results provided evidence against weak-form market efficiency in the FFA market. The efficient market hypothesis states that information should be reflected in the prices to a degree where the additional gains to be made by acting on available information do not exceed marginal costs (Fama, 1991). However, we want to take a different approach than Nomikos and Doctor, as we aim to include relevant market features in addition to geospatial data to support our predictions of movements. The utilization of Automatic Identification System (AIS) data has in recent years emerged as a new way of supplementing research on maritime economics. The vast amount of available AIS data and the possibilities for extracting features offer a wide variety of new opportunities within shipping analytics.

AIS data, in combination with programming tools, enables calculation of several metrics for selected vessels across time and space, in addition to the tracking of vessels and fleets in real-time. The increased availability of computing power and applications for machine learning techniques has, simultaneously, created new possibilities for analyzing complex data sets.

The objective of this paper is to forecast FFA prices using machine learning techniques. Our contributions to the literature are threefold: Firstly, we will apply and evaluate the performance of an LSTM neural network to predict movements of Capesize 5 Time Charter (5TC) FFA prices. Secondly, we will create new AIS-derived features and evaluate their predictive powers on FFAs. Thirdly, we will evaluate the forecasting models’ ability to generate profitable trading signals, by utilizing them in simple trading strategies. The first quarterly contract (1Q), the second closest quarterly contract (2Q), and the closest calendar year contract (1CAL) will be used for forecasting and trading, where the forecasting horizon will be one week, two weeks, and one month ahead.

The remainder of this paper is structured as follows: firstly, we present a literature review that serves as a foundation for the work conducted in this paper, which culminates in our contributions to the current body of literature. Secondly, the data used will be presented.

Thirdly, we go through the process of creating features. Fourthly, the methodology for feature selection and machine learning will be presented. Lastly, we present the results, before rounding off with some concluding remarks and recommendations for further work.


2 Literature Review

The literature review for this paper is extensive, as the paper revolves around several topics.

First, relevant FFA market research is covered, followed by research on applications of AIS, before introducing research covering the predictive capabilities of machine learning techniques in relation to shipping, commodity and financial markets.

There is a large body of literature covering the relationships between spot and FFA prices, forecasting, and hedging performance. Kavussanos et al. (2004) investigated the impact of FFA trading on spot market volatility, where they found that FFAs have a reducing effect on the spot freight rate volatility. Kavussanos et al. (2004b) studied the unbiasedness of FFA prices, where they found FFA prices one and two months before maturity to be unbiased predictors of the spot prices for Panamax routes. Bessler et al. (2008) also found evidence of a cointegrated relationship between spot and forward rates for Panamax bulk carriers. Zhang et al. (2014) studied the relation between spot and Time Charter (TC) rates, as well as spot and FFA rates. Their results gave evidence of cointegration between spot and FFA rates, and between TC rates and FFA rates. Adland and Alizadeh (2018) studied TC rates and FFA prices, and also found evidence of cointegration, but that TC rates overall are priced higher than FFAs. A convenience yield, and the additional risk related to physical freight contracts, among other reasons, were pointed out as explanations for the price differences. Kavussanos and Visvikis (2004) found FFA contracts to discover market information faster than spot prices, which was pointed out to originate from lower transaction costs in the forward market, compared to the spot market. Additionally, they found a bi-directional relationship between the FFA and spot markets. Kasimati and Veraros (2017) found that FFAs have limited power for predicting future freight rates, but that FFA prices were useful for directional predictions.

Further, Yin et al. (2017) found mean-reverting tendencies for both FFA and spot prices.

Regarding the hedging performance of FFAs, Alizadeh et al. (2015) found that the hedging performance of FFAs for tankers is worse than that of futures in other commodity and financial markets. One reason for this was identified as the absence of a cost-of-carry relationship between spot and forward prices, due to freight being a non-storable commodity.

Alexandridis et al. (2017b) found that freight rate risk can be reduced by 48% by holding a diversified portfolio of freight rates, and that additional risk can be reduced by hedging with forward contracts.

Regarding the topic of forecasting, Batchelor et al. (2007) tested the performance of time series models for predicting spot and forward rates in the dry bulk shipping market, and found that Autoregressive Integrated Moving Average (ARIMA) models and Vector Autoregressive (VAR) models were superior to Vector Error Correction Models (VECM) for predicting forward rates. Further, the study gave evidence of forward rates providing additional information for spot rates in the future, but that spot rates were unhelpful for predicting forward prices. Lyridis et al. (2004) applied neural networks for forecasting FFA prices. The main findings were that neural networks performed well at forecasting future prices, but that connectionist models overall held superior predictive performance. Kavussanos et al. (2014a) investigated spillover effects between dry bulk FFAs and commodities, and found agricultural commodity futures to lead freight markets. Kavussanos et al. (2010) further found that spillover effects of return and volatility generally are one-directional from commodity futures to FFAs. Regarding the topic of FFA trading, Nomikos and Doctor (2013) conducted a comprehensive study of quantitative trading strategies for Capesize, Panamax, and Supramax FFAs across different maturities. They applied trend, momentum, and volatility-based strategies, and evaluated these against a buy-and-hold benchmark. The trend-following strategies were superior among the simple strategies based on mean returns and Sharpe ratio, while complex learning strategies provided the highest average outperformance in terms of Sharpe ratio, compared to the benchmark. Their best active trading strategies generated significant excess returns compared to the buy-and-hold benchmark, which implies inefficiency in the FFA market, as prices do not reflect all available information.

Several studies covering AIS utilization revolve around shipping network detection, demand estimations, and trade patterns. Kaluza et al. (2010) studied the trade patterns for the different ship classes. The study interpreted the global movements of cargo as a network with a high level of complexity. Spiliopoulos et al. (2017) present a methodology for converting AIS data to be used effectively for understanding the shipping patterns in relation to global trading patterns. Wu et al. (2017) used AIS data for mapping vessel density and traffic density, to reveal the distribution of ships and traffic. Vessel density was defined as the number of vessels per unit area, and traffic density was defined as the average number of vessels crossing a region per unit area per unit time. Vector and grid-based methods were applied for traffic density calculations, while vessel density calculations were based on geofencing. Geofencing is a method of extracting data, based on geographical boundaries.

Jia et al. (2015) investigated the reliability of reported draught in AIS data for estimating vessel utilization, in the dry bulk freight market. Due to AIS messages lacking info on cargo type and volume, they present different models for estimation of cargo size, mainly based on draught.

They found that AIS data alone is insufficient for precisely tracking seaborne trade. Adland et al. (2017) compared the accuracy of AIS-derived trade statistics for the crude oil market to official customs data. Their results revealed that AIS-derived data for seaborne crude exports align well with official export numbers in aggregate, but that there are several challenges related to the aggregation of micro-level data. Some key challenges pointed out were the usage of pipelines in parts of the supply chain, in addition to countries and regions operating as storage and transshipment hubs. They further state that any maritime research which covers market fundamentals could draw benefits from AIS-derived tonne-mile demand data, if the cargo is observable and homogeneous. This is to a large extent the case for the dry bulk and tanker markets.

Adland (2019) presents a framework for utilizing AIS data for dry bulk market analysis. He presents algorithms for generating data for tonne-mile demand, proxies for operational efficiency, and counting of unemployed ships. He argues that freight rates between regions tend to move in sync in the long run, but that there may be differences in the short run due to local supply and demand imbalances. Regarding idle ships, he shows an inverse relationship between Capesize earnings and idle ships waiting in open sea. There are drawbacks to the metrics presented due to limitations of the information from AIS. However, he states that enriching the AIS data with information from other sources, such as vessel characteristics, contractual information, and bills of lading, can lead to better results.

Regli and Nomikos (2019) studied the effect of tanker supply for the TD3 tanker route between Ras-Tanura and Japan. They created a proxy for short-term supply in the voyage charter market, where vessels were classified as available or unavailable based on geographical restrictions, self-reported destination, loading condition, and employment status. They found their AIS-derived supply measures to partially explain freight rate movements, where other more traditional supply measures, such as fleet size, were ineffective. Also, the study gave evidence of a lagged relationship between ballast sailing speeds and short-term freight rate movements.

Machine learning techniques as a prediction tool have been covered extensively for various stock, commodity and shipping markets. Herrera et al. (2019) examined forecasting of long-term prices for crude oil, coal and gas by applying neural networks, Random Forest and hybrid models, which were compared to a Random Walk benchmark. The results showed that Random Forest was superior. Huang and Wu (2018) applied Deep Multiple Kernel Learning for forecasting energy commodity prices. Their model included information from oil, gold, and currency markets, and was found systematically superior for forecasting crude oil prices, compared to traditional neural networks and regression models. Fischer and Krauss (2018) applied LSTM neural networks for predicting directional movements of the constituent stocks of the S&P500. LSTM neural networks outperformed memory-free classification methods, such as Random Forest, logistic regression, and memory-free neural networks. The model was able to generate excess returns compared to the market portfolio from 1992 to 2009, but from 2010, the model was not able to yield excess returns after transaction costs. Their findings give evidence of the market becoming increasingly mature.

Lyridis et al. (2004) applied neural networks for forecasting monthly VLCC spot freight rates from 1979 to 2012. The results gave evidence of neural networks providing valuable forecasts, especially in volatile periods. Further, they found that crude oil price spreads and Capesize rates improved the forecasting performance. Fan et al. (2013) utilized wavelet neural networks for predicting the Baltic Dirty Tanker Index. Among the features included in their model were the Dow Jones Industrial Average and the AMEX Oil Index. The results showed that their model was unable to predict rates more accurately than an ARIMA benchmark on short horizons, but showed signs of superiority on longer forecasting horizons.

There are two recent and relevant studies that cover machine learning methods with the utilization of AIS-extracted features for predicting freight rates. Næss (2018) investigated whether multivariate machine learning methods with the inclusion of AIS-derived features improved predictions of short-term rates in the LPG freight market. The thesis gave evidence favoring multivariate machine learning models over a VAR model, where a Multi-Layer Perceptron neural network and an LSTM neural network were tested. The LSTM model yielded the best prediction power, and both machine learning models predicted short-term freight rates more accurately when including AIS-derived features. Salen and Århus (2018) also applied LSTM neural networks with the inclusion of AIS-derived features, for predicting freight rate movements for the route between Ras Tanura, Saudi Arabia and Singapore. The forecasting horizons were one, five, and ten days ahead. The model performed best on the ten days ahead forecast horizon, compared to a multivariate linear regression benchmark. The additional variables derived from the AIS data did not improve the model significantly. However, they state in the paper that more recent AIS data, and improved optimization of hyperparameters, could have improved the results.

FFA market research, the applicability of AIS data, and the predictive powers of machine learning are covered to a great extent in previous literature. However, no comprehensive studies have been carried out regarding the use of AIS features in combination with machine learning techniques for forecasting movements in FFA prices. Næss (2018) and Salen and Århus (2018) applied machine learning techniques in combination with AIS data for forecasting spot prices for selected routes, whereas we aim to predict the FFA prices for a composition of routes for the Capesize segment. Thus, our approach is on a more global scale.

Further, we create additional AIS-derived features, among them, a more global tonne-mile demand estimation, as well as several new approximations for unemployment. In addition, our study is perhaps more applicable in practical terms, as FFAs allow for more dynamic adjustments to freight rate exposure, in addition to our forecasting horizon being longer.


3 Data Foundation

This Section presents the AIS and price data that is utilized in this paper. Due to the increased quality in AIS data from 2014, and the change from the Capesize 4TC to the Capesize 5TC index, our study period will be from May 2014 (Skauen, 2015).

3.1 AIS Data

AIS is an automated system used in the maritime space for tracking and exchange of navigational information for vessels. It was mainly developed to prevent collisions and assist port authorities in controlling marine traffic more efficiently. Signals from AIS transponders are transmitted using Very High Frequency radio waves. Messages include both dynamic information, such as speed, positioning, and course, as well as static information, such as International Maritime Organization (IMO) number (Marine Traffic, 2018).

We have been granted AIS data by the Center for Applied Research (SNF), which contains AIS messages for all bulk carriers from May 2014 to December 2018. We have separated the data into files based on the IMO numbers. The AIS messages do not contain information about vessel specifications, such as DWT. Hence, we have matched the IMO numbers from the AIS messages with fleet information from Clarksons World Fleet Register, and filtered the complete fleet list to only keep vessels above 150,000 DWT. This subset represents the most relevant vessels for the contracts that we are predicting. Figure 3.1 shows a sample message, after separation by IMO number. Table 3.1 explains the message components.

timestamp_position,lon,lat,course,speed,draught,destination

2018-03-31 19:29:14,-45.49054,-26.026875,277.5,14.8,7.6,PARANAGUA BRZL

Figure 3.1 AIS sample message.

Table 3.1 AIS message components.

Message component | Meaning
“timestamp_position” | date and time for the position
“lon” | longitude of the position
“lat” | latitude of the position
“course” | sailing course
“speed” | speed in knots
“draught” | draught in meters
“destination” | destination text as sent by the ship
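As an illustration of the message layout in Figure 3.1 and Table 3.1, the following minimal sketch reads such a comma-separated file into typed records. The function name `parse_ais` and the dictionary keys are our own illustrative choices, and the timestamp format is inferred from the single sample message rather than from a format specification.

```python
import csv
import io
from datetime import datetime

# Sample taken from Figure 3.1 (header line plus one message).
SAMPLE = """timestamp_position,lon,lat,course,speed,draught,destination
2018-03-31 19:29:14,-45.49054,-26.026875,277.5,14.8,7.6,PARANAGUA BRZL
"""


def parse_ais(text):
    """Parse comma-separated AIS messages into a list of typed dicts."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            # Timestamp format assumed from the sample message.
            "timestamp": datetime.strptime(rec["timestamp_position"],
                                           "%Y-%m-%d %H:%M:%S"),
            "lon": float(rec["lon"]),
            "lat": float(rec["lat"]),
            "course": float(rec["course"]),
            "speed": float(rec["speed"]),        # knots
            "draught": float(rec["draught"]),    # meters
            "destination": rec["destination"].strip(),
        })
    return rows


messages = parse_ais(SAMPLE)
```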


Figure 3.2 shows the deadweight tonnage (DWT) distribution for the fleet subset. The average DWT in our sample is 202,472, with a clear separation in the distribution between Capesize vessels (150,000 DWT to 320,000 DWT), and Valemaxes of around 400,000 DWT.

Figure 3.2 Histogram of DWT for the fleet subset.

3.2 FFA Price Data

We have obtained FFA prices from the Baltic Exchange. The data contains prices from May 2014 to December 2018. We will be looking into the contracts for the two nearest quarters, in addition to the nearest calendar year. A quarter consists of a basket of three monthly contracts settled on a rolling basis, while a calendar contract consists of 12 monthly contracts. The Capesize 5TC basket comprises routes C8, C9, C10, C14, and C16, where a weighted average of the underlying routes is used for calculating the 5TC price. A brief description of the routes and their weights is presented in Table 3.2.

Table 3.2 5TC description (Schmitz, 2016).

Route code | Route description | Delivery | Duration | Weight
C8 | Transatlantic round voyage | Gibraltar/Hamburg | 30-45 days | 25%
C9 | Fronthaul | Amsterdam/Rotterdam | About 65 days | 12.5%
C10 | Transpacific round voyage | China/Japan | 30-40 days | 25%
C14 | China-Brazil round voyage | Qingdao | 80-90 days | 25%
C16 | Revised backhaul | North China/South Japan | About 65 days | 12.5%


To create a continuous series of prices, we have sorted the contracts by maturity, and created continuous time series containing the contracts that are closest to maturity, but have not entered the settling period. Due to the structure of the series, the price may jump when rolling between contracts. According to Masteika et al. (2012), a proportional back-adjustment is suitable for backtesting purposes. The adjustment ratio is calculated by dividing the price on the first day of the new contract by the price on the last day of the old contract. The price series will later be normalized, in order to keep the trends. See Figure 3.3 for a chart showing the actual FFA price series, and Figure 3.4 for a chart showing the synthetic FFA prices. See Appendix A.2 for descriptive statistics for the FFA prices before and after proportional back-adjustment.
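The back-adjustment can be illustrated with a minimal sketch, assuming the prices are available as one list per contract, ordered from the oldest to the newest contract. The function name `back_adjust` and the input layout are illustrative assumptions, not the implementation used in this thesis. At every roll, all earlier prices are multiplied by the adjustment ratio, so the most recent contract keeps its actual prices and relative price changes within each contract are preserved.

```python
def back_adjust(segments):
    """Proportionally back-adjust per-contract price segments.

    segments: list of price lists, oldest contract first (assumed layout).
    At each roll, ratio = first price of the new contract divided by the
    last price of the old contract; every earlier price is scaled by the
    cumulative product of these ratios.
    """
    adjusted = list(segments[-1])   # newest contract is left unchanged
    factor = 1.0
    # Walk backwards through the rolls, accumulating the ratios.
    for i in range(len(segments) - 2, -1, -1):
        factor *= segments[i + 1][0] / segments[i][-1]
        adjusted = [p * factor for p in segments[i]] + adjusted
    return adjusted


# Three hypothetical contracts with a price jump at each roll.
series = back_adjust([[10, 20], [40, 50], [100, 110]])
```

In this hypothetical example, the first contract doubles from 10 to 20, and the adjusted series keeps that doubling (40 to 80) while removing the jumps at the rolls.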

Figure 3.3 Actual FFA Prices.

Figure 3.4 Proportionally back-adjusted FFA prices.


4 Feature Extraction

The process of identifying relevant features to extract has been based on data exploration and the studies presented in the literature review. To capture the geographical positioning of the fleet, we will divide the world map into different world regions. By generating a plot showing the density of AIS signals from our vessel subset, we can identify the main sailing patterns for the fleet. The density plot makes the general patterns easier to see than a plot in which all observed tracks are equally visible. This is pointed out in the study by Næss (2018). Figure 4.1 shows a density plot for our fleet subset.

Figure 4.1 Density plot of the Capesize fleet.

Based on visual inspection of the density plot, in combination with export and import data from Clarksons Shipping Intelligence Network, we have divided the world into a set of polygons. The purpose of dividing the map into polygons is to isolate regions with different characteristics concerning the trading pattern for Capesize vessels, and to capture movements between export and import regions. Figure 4.2 shows the world map divided into world regions, and Table 4.1 shows the world region names. To create daily time series concerning different regions, we have used the ray casting algorithm, which is a common method for determining which polygon a longitude/latitude pair is inside (Narkawicz & Hagen, 2016). The use of a ray casting algorithm is also suggested in the work of Næss (2018).
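A minimal sketch of the ray casting test is given below. It casts a horizontal ray from the position and counts how many polygon edges it crosses; an odd count means the position is inside. The sketch treats longitude and latitude as planar coordinates and ignores polygons that cross the antimeridian, which are simplifying assumptions on our part. The rectangular regions in the example are hypothetical and do not correspond to the actual polygons in Figure 4.2.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the latitude of the point can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses the horizontal ray.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def locate_region(lon, lat, regions):
    """Return the name of the first region containing the point, else None."""
    for name, polygon in regions.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical rectangular regions, for illustration only.
REGIONS = {
    "region_a": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "region_b": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
```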


Figure 4.2 World regions.

Table 4.1 World region names and corresponding number.

Number | Name
1 | pacific_ocean
2 | north_america
3 | south_america
4 | europe
5 | med_sea
6 | arabian_gulf
7 | south_africa
8 | indian_ocean
9 | asia
10 | aus

To calculate sailing distances between positions, we utilize an open-source distance calculator called “python ports distance calculator” (Witsung, 2019). The method makes use of a pixelated world map, where all land areas are marked as unavailable for travelling through. In turn, the pixelated world map is transformed to an array. The algorithm finds the route between two sea coordinates passing the minimal number of points in the map array. Finally, the distance in nautical miles is calculated between each point in the identified least-cost route, using Vincenty’s formula. This formula calculates the distance on the surface of the earth, assuming the shape of the earth is an oblate ellipsoid (Scheucher, 2016). Table 4.2 shows examples of the calculated distances between two ports, compared to the ones listed on sea-distances.org. As can be seen, there are some minor differences for the selected routes, but the calculator seems to give an acceptable approximation.

Table 4.2 Distance comparison for selected routes.

 | Port Hedland to Qingdao | Rotterdam to Qingdao
Sea-distances.org | 3583 NM | 10751 NM
Distance calculator | 3531.60 NM | 11218.8 NM
Difference in nautical miles | -51.4 | 467.8
Difference in % of sea-distances.org | -1.43% | 4.35%
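The final step of the distance calculation, summing the leg distances along an identified route, can be sketched as follows. Note that this sketch swaps in the simpler haversine formula on a spherical earth as a stand-in for Vincenty’s formula, so its results will differ slightly from the ellipsoidal distances used in this thesis. The waypoint list is assumed to come from the grid search on the pixelated map, which is not reproduced here, and the function names are illustrative.

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean earth radius in nautical miles


def haversine_nm(lon1, lat1, lon2, lat2):
    """Great-circle distance in nautical miles on a spherical earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def route_length_nm(waypoints):
    """Sum the leg distances along a list of (lon, lat) waypoints."""
    total = 0.0
    for (lon1, lat1), (lon2, lat2) in zip(waypoints, waypoints[1:]):
        total += haversine_nm(lon1, lat1, lon2, lat2)
    return total
```

One degree of longitude along the equator is roughly 60 nautical miles, which gives a quick sanity check for the leg distances.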

When processing the draught data, we assume that an average draught below 70% of a vessel’s maximum observed draught implies that the ship is sailing in ballast. Figure 4.3 shows the distribution of draught ratios on a given day for the Capesize fleet. The bimodal shape of the distribution implies that this threshold is reasonable.

Figure 4.3 Distribution of draught ratios.
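The ballast classification can be expressed as a short sketch. The function name and the input layout (a non-empty list of reported draughts for one vessel and day, plus the maximum draught observed for that vessel) are illustrative assumptions.

```python
def is_ballast(draughts, max_observed_draught, threshold=0.70):
    """Classify a vessel as ballast if its average draught ratio is below
    the threshold (70% of the maximum observed draught).

    draughts: non-empty list of reported draughts in meters (assumed).
    """
    average_draught = sum(draughts) / len(draughts)
    return average_draught / max_observed_draught < threshold


# Hypothetical vessel with a maximum observed draught of 18.0 meters.
ballast_leg = is_ballast([7.6, 7.8], 18.0)     # ratio of about 0.43
laden_leg = is_ballast([17.5, 17.9], 18.0)     # ratio of about 0.98
```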


4.1 Count of Vessels and Capacity

The total fleet capacity reflects the total supply (Stopford, 2009), hence we will count the total number of vessels and capacity globally. Further, we will create features for each of the world regions, that count the number of vessels, total freight capacity, and the relative capacity distribution. These features may capture regional imbalances that are relevant for the development of freight rates (Regli & Nomikos, 2019; Næss, 2018). The freight capacity for each vessel is assumed to be 95% of its DWT. The capacity in each world region is then calculated by aggregating the capacity for each vessel within a world region. Thus, this measurement does not consider whether a vessel is ballast or laden, or whether it is contracted or not. The count of vessels is simply measured as the total number of vessels within a world region, and the relative capacity is calculated by dividing the capacity in a world region, by the total capacity of the entire fleet, on a given day. The relative count of vessels is also calculated similarly. Figure 4.4 shows regional capacity and relative capacity for selected world regions.

Figure 4.4 Capacity and relative capacity for selected world regions.
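The regional count and capacity features for a single day can be sketched as below, assuming each vessel has already been assigned to a world region. The function name, the input layout and the output keys are illustrative assumptions; the 95% capacity factor is the one stated above.

```python
from collections import defaultdict


def regional_capacity(vessels, capacity_factor=0.95):
    """Count vessels and aggregate capacity per world region for one day.

    vessels: iterable of (region, dwt) pairs, one per vessel (assumed).
    Capacity per vessel is capacity_factor * DWT, regardless of whether
    the vessel is laden, ballast or contracted.
    """
    count = defaultdict(int)
    capacity = defaultdict(float)
    for region, dwt in vessels:
        count[region] += 1
        capacity[region] += capacity_factor * dwt
    total_count = sum(count.values())
    total_capacity = sum(capacity.values())
    return {
        region: {
            "count": count[region],
            "capacity": capacity[region],
            "relative_count": count[region] / total_count,
            "relative_capacity": capacity[region] / total_capacity,
        }
        for region in count
    }


# Hypothetical fleet snapshot for one day.
features = regional_capacity([("asia", 200000), ("asia", 200000),
                              ("aus", 400000)])
```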

4.2 Net Flow of Vessels in World Regions

In addition to counting vessels within world regions, we will calculate the net flow of vessels for each world region. This is done by recording when a vessel travels from one world region to another. The net flow for a world region is then calculated as the number of incoming vessels minus the number of outgoing vessels.

Figure 4.5 shows the net flow of vessels for Asia.


Figure 4.5 Net flow of vessels for Asia.
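A minimal sketch of the net flow calculation between two consecutive days is shown below. It assumes a mapping from vessel identifier to world region for each day, and it interprets net flow as arrivals minus departures; both the data layout and the function name are illustrative assumptions.

```python
from collections import defaultdict


def daily_net_flow(prev_regions, curr_regions):
    """Net flow per region between two consecutive days.

    prev_regions, curr_regions: dicts mapping vessel id -> region name.
    A vessel that changes region counts as -1 for the region it leaves
    and +1 for the region it enters.
    """
    flow = defaultdict(int)
    for vessel, prev in prev_regions.items():
        curr = curr_regions.get(vessel)
        if curr is None or curr == prev:
            continue  # no observation today, or no region change
        flow[prev] -= 1
        flow[curr] += 1
    return dict(flow)


# Hypothetical example with three vessels.
yesterday = {"A": "aus", "B": "asia", "C": "asia"}
today = {"A": "asia", "B": "asia", "C": "indian_ocean"}
flows = daily_net_flow(yesterday, today)
```

Summed over all regions, the net flows are zero by construction, since every departure from one region is an arrival in another.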

4.3 Fleet Sailing Speed and Standard Deviation

The speed of the fleet affects operational efficiency (Stopford, 2009), and we will generate several speed features. The average speed for a vessel is calculated by looking at the first and last observation for a ship for a single day, and calculating the distance between them. The distance traveled is then divided by the time difference between the first and last observations. We will calculate average speeds for the whole fleet, and average speeds within each world region. Further, we will distinguish between vessels classified as sailing laden or ballast. First, we will calculate all the speed features without including stationary vessels (vessels with a daily average speed below 2 knots), as this subset better represents the speed of the fleet actually sailing. Second, we will create the same features with the whole fleet, including stationary vessels, as this may capture some additional information. Additionally, we will include the standard deviation for each speed feature, as this may provide information regarding the variation in operational efficiency. The calculations for the speed features are shown in Equations 4.1 and 4.2. Speed plots are shown in Figure 4.6.

$$\text{Average speed} = \frac{\sum_{n \in N} \frac{\text{Distance sailed}_n}{\text{Sailing time}_n}}{N_{\text{bulkers}}} \tag{4.1}$$


Figure 4.6 Average speed for moving vessels, and total average speed.

As we can see from the two speed plots in Figure 4.6, the inclusion of stationary vessels makes a big difference for the average speed for the Mediterranean Sea and the global fleet. However, the differences are not notable for the Indian Ocean, which seems reasonable due to this world region covering open sea for vessels passing Africa, where all vessels are expected to be sailing at normal speeds. An additional note is that the average speed for the Indian Ocean is consistently higher than for the global fleet, which also seems reasonable.
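The speed features can be sketched as follows for a single day. The sketch assumes one (distance sailed, sailing time) pair per vessel and uses the population standard deviation, i.e. with the number of vessels as divisor. The function name and input layout are illustrative assumptions.

```python
import math


def speed_features(legs, include_stationary=False, min_speed=2.0):
    """Average speed and standard deviation across vessels for one day.

    legs: list of (distance_nm, sailing_hours) pairs, one per vessel.
    Vessels with a daily average speed below min_speed knots are treated
    as stationary and dropped unless include_stationary is True.
    Returns (mean, std), or (None, None) if no vessels remain.
    """
    speeds = [distance / hours for distance, hours in legs if hours > 0]
    if not include_stationary:
        speeds = [s for s in speeds if s >= min_speed]
    if not speeds:
        return None, None
    mean = sum(speeds) / len(speeds)
    variance = sum((s - mean) ** 2 for s in speeds) / len(speeds)
    return mean, math.sqrt(variance)


# Hypothetical day: two sailing vessels and one nearly stationary vessel.
legs = [(240, 24), (336, 24), (12, 24)]   # 10, 14 and 0.5 knots
moving_mean, moving_std = speed_features(legs)
total_mean, total_std = speed_features(legs, include_stationary=True)
```

In this hypothetical example, including the stationary vessel pulls the average speed down from 12 knots to about 8.2 knots, which mirrors the effect seen for regions with many waiting vessels.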

4.4 Tonne-Mile Demand

The real demand for freight is calculated on a tonne-mile basis, as it includes both the volume of cargo and the distance (Stopford, 2009). We will therefore create a proxy for tonne-mile demand for our fleet. Based on visual inspection of AIS data, investigation of export and import data from Clarksons Shipping Intelligence Network, and a list of Capesize ports, we have created the port area polygons shown in Figure 4.7. Each port area is labeled as either an import or export port area. The rationale for this is that we want a system for estimating tonne-mile demand without relying on draught. See Appendix A.1 for the labeling of port areas.

$$\text{Speed standard deviation} = \frac{1}{N}\sqrt{\sum_{n \in N}\left(\frac{\text{Distance sailed}_n}{\text{Sailing time}_n} - \text{Average speed}\right)^2} \tag{4.2}$$


Figure 4.7 Port areas.

An estimate of realized tonne-mile demand is created through the following pseudo-algorithm:

• All positional observations for each vessel are processed, and stationarity in either a defined import or export port area for longer than 12 hours is classified as a port call for loading or discharge of cargo.

• Each classified discharge port call is matched with the previous loading port call.

• If multiple loading port calls are registered in a row for a vessel, the last is used, and if multiple discharge port calls are registered in a row, the first is used. Hence, only two ports are matched for each distance calculation.

• For each pair of loading and discharge port calls, the distance is calculated and multiplied by 95% of the vessel’s DWT, thus assuming all cargo is transported in full shiploads.
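The matching rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the port calls are taken as already classified and sorted in time, and the function names, the `distance_nm` callback and the example distances are hypothetical.

```python
def match_port_calls(calls):
    """Pair loading and discharge port calls for a single vessel.

    calls: chronological list of (kind, port), kind being "load" or "discharge".
    Of several loads in a row the last is kept; of several discharges in a
    row only the first is kept.
    """
    pairs = []
    last_load = None
    for kind, port in calls:
        if kind == "load":
            last_load = port  # a later load replaces an earlier one
        elif kind == "discharge" and last_load is not None:
            pairs.append((last_load, port))
            last_load = None  # ignore further discharges until the next load
    return pairs


def tonne_miles(pairs, dwt, distance_nm, utilisation=0.95):
    """Sum of 95% of DWT times distance over all matched port pairs."""
    return sum(utilisation * dwt * distance_nm(a, b) for a, b in pairs)
```

For example, the sequence load A, load B, discharge C, discharge D, load E, discharge F yields the pairs (B, C) and (E, F), so only one leg is counted per cargo.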

A drawback of this estimate is that the tonne-mile demand will only include the international part of a multi-port voyage. An additional drawback is that the tonne-mile demand is only observable on the day of discharge. Thus, it is expected to be lagged four to eight weeks from when the demand was actually realized, i.e. when the ship was fixed. The tonne-mile demand estimate for a given day is calculated as shown in Equation 4.3. Figure 4.8 shows moving sums for the tonne-mile demand estimate.


Figure 4.8 Moving sums of the tonne-mile demand estimate.

When calculating the realized tonne-mile demand for 2017 based on our proxy, we end up at 9,148 billion tonne-miles for our fleet. According to Clarksons Research (2019), the total demand for iron ore and coal was 13,430 billion tonne-miles, while the demand for all major bulks (iron ore, coal and grain) was 16,852 billion tonne-miles in 2017. As we have excluded the small and mid-size vessels in the dry bulk fleet, a direct comparison is difficult, but our tonne-mile estimate appears able to capture some of the dynamics.

4.5 Load Factor and Loading Status

The load factor for the fleet is calculated by taking the daily average draught ratio for each vessel, both globally and in each world region. Subsequently, the vessels' average draught ratios are averaged for each world region. Because the ship crew manually enters this data, it is prone to errors according to Jia et al. (2015). In addition, the method of calculating a mean of means could also be affected by an uneven distribution of observations among the vessels in a world region. Equation 4.4 provides the calculation of the average load factor.

$$\text{Tonne-mile demand} = \sum_{n \in N} \text{DWT}_n \cdot 95\% \cdot \text{distance}_n \tag{4.3}$$

$$\text{Average load factor} = \frac{\sum_{n \in N} \dfrac{\text{Current draught}_n}{\text{Maximum draught}_n}}{N_{\text{bulkers}}} \tag{4.4}$$


The observed vessel count and capacity sailing with a draught less than 70% of its observed maximum, are also aggregated, providing a proxy for the number of vessels sailing ballast.

Similarly, we have made a proxy for vessels and capacity sailing laden, by aggregating the count of vessels with a draught ratio above 70%. As draught is manually set, we have also included a feature counting vessels leaving export port areas and a feature counting vessels leaving import port areas. Figure 4.9 shows the average load factor and the relative laden share of the fleet.

Figure 4.9 Average load factor and laden share of the fleet.
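The load factor in Equation 4.4 and the 70% draught threshold can be illustrated with a short Python sketch. The function names are hypothetical, and treating a ratio of exactly 70% as ballast is an assumption, since the text only describes values below and above the threshold.

```python
def average_load_factor(draughts):
    """Equation 4.4: mean of current draught over maximum draught.

    draughts: list of (current_draught, maximum_draught), one per vessel.
    """
    ratios = [current / maximum for current, maximum in draughts if maximum > 0]
    return sum(ratios) / len(ratios)


def loading_status(current, maximum, threshold=0.70):
    """Classify a vessel as laden when its draught ratio exceeds the threshold."""
    return "laden" if current / maximum > threshold else "ballast"


def laden_share(draughts, threshold=0.70):
    """Share of vessels classified as laden."""
    laden = sum(1 for c, m in draughts if loading_status(c, m, threshold) == "laden")
    return laden / len(draughts)
```

With three vessels at draught ratios of 1.0, 0.5 and 0.75, the average load factor is 0.75 and two of the three vessels count as laden.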

4.6 Operational Status

According to Adland (2019), the number of unemployed ships may provide information regarding the short-term balance of supply and demand. Unemployed vessels represent an oversupply, hence there is an inverse relationship between unemployed ships and freight rates.

We will therefore create proxies for unemployed ships, total idle non-laden ships, and stand-by capacity.

A proxy for unemployed ships is made by aggregating stationary vessels that are non-laden based on AIS-reported draught (<70%), and are outside of discharge ports, or in areas not defined as port areas. An extension of this feature is also created, which is called “total idle non-laden” ships, which aggregates non-laden, stationary ships, regardless of location. These measures are included, as stationary status, or non-laden status, by themselves are insufficient for determining the contractual status of a vessel.


Further, we have created a measure for stand-by capacity, which measures the number of stationary vessels in the major exporting world regions (North America, South America and Australia), as it may capture some information regarding vessels awaiting loading operations.

In addition, the proxy may capture some information regarding vessels waiting to get a contract, or waiting for a contract to commence.

Additionally, features cumulating the stationary time for vessels categorized as either “unemployed” or “total idle non-laden” are also created. These features represent unused supply, providing measures of shipdays.

Both the proxy for unemployed and total idle non-laden ships may underestimate the actual values, due to inactivity of AIS transmitters. Even though there are drawbacks with these measures, we believe they may provide useful information over time. Figure 4.10 shows the unemployed share of the fleet, and the total idle non-laden share of the fleet, based on our calculations.

Figure 4.10 Total idle non-laden and unemployed share of the fleet.

Table 4.3 shows all AIS features previously described. We aim to capture several aspects affecting freight rates, as we include measures for supply, demand, operational efficiency and operational status. The AIS data quality improved significantly from 2014 because of improved satellite coverage (Skauen, 2015). However, there are still signal gaps, giving an uneven distribution of AIS messages. Thus, there may be inconsistencies in the feature values.

To account for this, we have calculated moving averages of seven, thirty and sixty days for all features, except for tonne-mile demand and net flow of vessels, where a moving sum is used.
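A minimal sketch of the smoothing step is given below. It assumes trailing windows (so that no future observations enter a value) and leaves the first incomplete windows undefined; neither detail is stated in the text, and the function name is hypothetical.

```python
def rolling(values, window, how="mean"):
    """Trailing moving average or moving sum over the last `window` days.

    Positions with fewer than `window` observations are returned as None.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
            continue
        total = sum(values[i + 1 - window:i + 1])
        out.append(total / window if how == "mean" else total)
    return out


# Example: 7-, 30- and 60-day versions of one daily feature.
daily_feature = [float(day % 10) for day in range(90)]
smoothed = {w: rolling(daily_feature, w) for w in (7, 30, 60)}
```

A flow variable such as tonne-mile demand would be passed with `how="sum"` instead of the default mean.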


Table 4.3 All described AIS features and what they measure.

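| Feature | Measure |
|---|---|
| Global capacity | Total supply |
| Global vessel count | Total supply |
| Global count of ballast vessels | Operational status and utilized supply |
| Global count of laden vessels | Operational status and utilized supply |
| Global load factor | Operational status and utilized supply |
| Regional load factor | Operational status and utilized supply |
| Regional vessel count | Regional supply and distribution of supply |
| Regional capacity | Regional supply and distribution of supply |
| Regional relative capacity | Regional supply and distribution of supply |
| Regional relative vessel count | Regional supply and distribution of supply |
| Regional net flow of vessels | Regional changes |
| Global count of unemployed vessels | Excess supply and stand-by capacity |
| Global unemployed capacity | Excess supply and stand-by capacity |
| Global count of idle-non-laden vessels | Excess supply and stand-by capacity |
| Global cumulative sum of shipdays unemployed | Excess supply and stand-by capacity |
| Global cumulative sum of shipdays idle non-laden | Excess supply and stand-by capacity |
| Stationary vessels in export world regions | Excess supply and stand-by capacity |
| Global average speed | Operational efficiency |
| Global average speed moving vessels | Operational efficiency |
| Regional average speed | Operational efficiency |
| Regional average speed moving vessels | Operational efficiency |
| Global average speed laden vessels | Operational efficiency |
| Global average speed laden vessels moving | Operational efficiency |
| Global average speed ballast vessels | Operational efficiency |
| Global average speed ballast vessels moving | Operational efficiency |
| Global average speed standard deviation | Changes in operational efficiency |
| Regional average speed standard deviation | Changes in operational efficiency |
| Ballast speed standard deviation | Changes in operational efficiency |
| Laden speed standard deviation | Changes in operational efficiency |
| Global average speed for moving vessels standard deviation | Changes in operational efficiency |
| Tonne-mile demand | Total demand |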


4.7 Non-AIS features

In addition to the AIS-derived features, we will include price and market information shown in Table 4.4.

The Capesize 5TC spot rate, Baltic Capesize Index and Baltic Dry Index will be included, as Kavussanos and Visvikis (2004), among others, have found spot rates to have an effect on FFA prices. The exchange rates of EUR to USD and Yuan to USD will additionally be included, as fluctuations in exchange rates have an impact on revenue and costs for operators in the market (Kavussanos & Visvikis, 2006).

Bunker prices affect the cost of operating vessels. In classical literature, vessels adjust their speed in response to changes in bunker prices, causing changes in the operational efficiency of vessels (Stopford, 2009). We assume our subset of vessels use 380Cst marine heavy fuel oil, and we will include the price from Bunkerindex.com. The price is calculated in dollars per metric ton, based on the average of all 380Cst port prices. We will also include Brent Crude oil prices, as crude oil has several applications, among them transportation (Tsioumas, 2016).

Due to the findings of Tsioumas and Papadimitriou (2018) implying a bi-directional relationship between the Baltic Capesize Index (BCI) and the prices for iron ore and coal, we will include the spot index for iron ore 62% (ISIX62IU), and Rotterdam Coal futures (API21MON). In a setting with increased demand for iron ore, the price of iron ore will increase, also causing a rise in the demand for transportation. On the other hand, a positive shock in freight rates may cause operators in the market to consider other transportation options, or to store more commodities as inventory. This will effectively reduce the supply, which leads to increased commodity prices. A weakness of including these commodity prices is that the effect on freight rates depends on whether the change in a commodity price is driven by supply or demand factors. A sudden fall in demand for major bulks will usually lead to a fall in commodity prices and to reduced freight rates. On the other hand, a negative supply shock will usually lead to increased commodity prices, but decreased freight rates (Tsioumas & Papadimitriou, 2018).

In addition to the features above, we will include the S&P500, US 10 year government bond yields, and US 3 month Libor yields. They may hold information about the development in the economy, as well as future funding rates, and general expectations for the future, according to Da et al. (2015).


Table 4.4 Non-AIS features.

| Feature | Description |
|---|---|
| Baltic Capesize Index | Spot index for Capesize vessels |
| 5TC spot rate | Spot index for 5TC basket |
| Baltic Dry Index | Spot index for dry bulk |
| Euro to USD exchange rate | Exchange rate |
| Yuan to USD exchange rate | Exchange rate |
| Average 380Cst bunker prices | Bunker price average |
| Brent crude oil price | Brent Crude Spot Price |
| ISIX62IU | Iron Ore Spot Price Index |
| API21MON | Rotterdam Coal Futures Price |
| S&P500 | S&P500 Index |
| US 10 year government bonds | Bond yield |
| US 3 month LIBOR | Bond yield |

The total number of features available is 623 after including moving averages and moving sums. Descriptive statistics of the features used in the final models in this study are presented in appendix A.2.


5 Methodology

5.1 Data Preparation

The process of preparing the data consists of data transformation, data normalization, and splitting of the data into training and test sets. Data transformation is the process of differencing the data to get it in a stationary form. To check for stationarity, we will be performing an Augmented Dickey-Fuller test. See appendix A.3 for test results showing evidence of stationarity after first differencing.

After the data is transformed into stationary form, the next step is normalization. We utilize the min-max scaling method for normalization as shown in Equation 5.1. Normalization could prove necessary when the scale of features differs, and when the ranges of values are large. The former matters because features with larger scales will have a greater impact on the predicted output (Angelov & Gu, 2019). The latter matters because it could cause slow learning and convergence for the neural network (Brownlee, 2019). The min-max scaling gives each observation a value between 0 and 1, which is appropriate in the context of a neural network (Brownlee, 2019).

Although the min-max scaling method is commonly used in practice, it does not handle outliers well. If outliers are present, they will highly influence the results (Angelov & Gu, 2019).

$$x_{norm} = \frac{x - X_{min}}{X_{max} - X_{min}} \tag{5.1}$$
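The two preparation steps, first differencing and min-max scaling, can be sketched as below. The sketch assumes that the minimum and maximum are taken from the training data only and then reused for later observations, which the text does not state, and the function names are hypothetical.

```python
def first_difference(series):
    """Change between consecutive observations; the result is one element shorter."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def min_max_fit(train_values):
    """Return (X_min, X_max) estimated on the training data."""
    return min(train_values), max(train_values)


def min_max_transform(values, x_min, x_max):
    """Equation 5.1 applied with a previously fitted minimum and maximum."""
    if x_max == x_min:
        return [0.0 for _ in values]  # a constant feature carries no information
    return [(v - x_min) / (x_max - x_min) for v in values]


prices = [10.0, 12.0, 11.0, 15.0]
changes = first_difference(prices)           # [2.0, -1.0, 4.0]
x_min, x_max = min_max_fit(changes)
scaled = min_max_transform(changes, x_min, x_max)
```

Values outside the fitted range, such as a later outlier, would fall outside [0, 1] under this scheme, which illustrates the sensitivity to outliers noted above.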

5.2 Walk Forward Validation

When training and evaluating the prediction models, we perform walk forward validation, also called expanding window cross-validation. This procedure makes use of a series of test sets, each consisting of a single observation 𝐻 steps ahead, where 𝐻 denotes the forecasting horizon. Every corresponding training set consists of all observations that are at least 𝐻 steps prior to the test observation, in order to avoid look-ahead bias. Updating the predictive model at each time step improves the model's chances of making good predictions, because new information and patterns are continuously included in retraining (Brownlee, 2016). As forecasts based on a small training set will not be reliable, the series of test sets starts only in the last 15% of the available data, from May 2018. The accuracy of the forecasts is obtained by averaging the results over the test sets across the entire forecasting period. This procedure involves successive testing on the same data and could be a source of overfitting. Figure 5.1 shows the principle of the walk forward validation method for predictions 𝐻 = 5 steps ahead (Hyndman & Athanasopoulos, 2013). Here, the blue series represent the training sets, while the yellow fields represent the test sets.

Figure 5.1 Walk forward validation.
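The expanding-window scheme can be sketched as a generator of index splits. This is an illustration only: the function name is hypothetical, observations are assumed to be indexed from 0 in time order, and the test share is given as an integer percentage to keep the arithmetic exact.

```python
def walk_forward_splits(n_obs, horizon, test_percent=15):
    """Yield (train_indices, test_index) pairs for walk forward validation.

    Each test set is a single observation. The training set holds every
    observation at least `horizon` steps before it.
    """
    first_test = n_obs - (n_obs * test_percent) // 100
    for test_index in range(first_test, n_obs):
        last_train = test_index - horizon  # inclusive
        if last_train < 0:
            continue  # not enough history for this test observation
        yield list(range(last_train + 1)), test_index


splits = list(walk_forward_splits(n_obs=20, horizon=5))
```

With 20 observations and a horizon of 5, the three test observations are indices 17, 18 and 19, trained on indices 0 to 12, 0 to 13 and 0 to 14 respectively, so the training window grows by one observation per step.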

5.3 Machine Learning Methodology

Machine learning techniques represent a set of algorithms that enable a learning process from a data set, without being directly programmed. When working with supervised learning and regression tasks, the goal of the machine learning model is to receive input data, adjust parameters, and produce an output that is as close as possible to the actual value. The parameters in the model are adjusted by training on historic observations (James et al., 2017).

Neural networks belong to a class of machine learning models that are capable of adding increased complexity, and are able to capture non-linear relationships (Haykin, 2009).

Neural networks consist of layers of neurons and weighted connections. The first layer of a neural network is the input layer, which receives the independent variables. The network further consists of hidden layers and an output layer. Figure 5.2 shows a simplified structure of a neural network (Haykin, 2009).


Figure 5.2 Simplified structure of a neural network with an input layer, a hidden layer and an output layer, inspired by Haykin (2009).

The information in the network flows from the input nodes, through the nodes in the hidden layers, before finally calculating an output. Each connection between nodes has a weight 𝑤, which regulates the information flow between the nodes. The neuron values in the hidden layers and the output layer are calculated as the sum of the products of every incoming neuron and the connecting weights, adjusted by the bias 𝑏 (Haykin, 2009).

Equation 5.2 shows the calculation of the value of a neuron 𝑥_{𝑙,𝑓}, based on the weights and neuron values from the previous layer, 𝑤_{𝑙−1,𝑓} and 𝑥_{𝑙−1,𝑓}, as well as the bias, 𝑏_{𝑙,𝑓}.

$$x_{l,f} = \sum_{f=1} \left( w_{l-1,f} \cdot x_{l-1,f} \right) + b_{l,f} \tag{5.2}$$
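As a small numerical illustration of Equation 5.2, the sketch below computes neuron values as a weighted sum plus a bias. It follows the equation as written, so no activation function is applied, and the function names and example values are hypothetical.

```python
def neuron_value(weights, inputs, bias):
    """Weighted sum of the previous layer's neuron values plus the bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def layer_forward(weight_rows, inputs, biases):
    """Values of all neurons in one layer; one row of weights per neuron."""
    return [neuron_value(row, inputs, b) for row, b in zip(weight_rows, biases)]


previous_layer = [2.0, 3.0]
hidden = layer_forward([[0.5, -1.0], [1.0, 1.0]], previous_layer, [0.25, 0.0])
output = neuron_value([1.0, 0.5], hidden, 0.0)
```

Here the first hidden neuron evaluates to 0.5 · 2 − 1 · 3 + 0.25 = −1.75 and the second to 5.0, so the output neuron receives −1.75 + 0.5 · 5.0 = 0.75.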

The full process of calculating predicted values based on the independent variables is called a forward propagation. When a forward propagation is completed, the predicted value is compared to the actual value, and a loss function is computed. The loss function expresses the accuracy of the predictions, and the learning process for the network is based on minimizing the loss function by adjusting the parameters (Haykin, 2009). The contribution for each parameter to the loss function is calculated and adjusted between every forward propagation.


Equation 5.3 expresses a given loss function, where 𝑥 represents the input values, θ represents the parameters (weights and biases), and 𝑦 represents the actual output value.

$$L(\hat{y}, y) = L(f(x, \theta), y) \tag{5.3}$$

Recurrent neural networks are looped, which enables the passing of information between the steps in the network, thus enabling information to persist. Olah (2015) states that recurrent neural networks may be thought of as multiple copies of the same network, where each network passes on information. Figure 5.3 shows the principle of the passing of information between consecutive steps in a recurrent neural network.

Figure 5.3 Illustration of the chained structure of a recurrent neural network (Olah, 2015).

Recurrent neural networks generally perform well on short-term dependencies, but struggle when the duration of the dependencies increases. The poor performance on long-term dependencies is due to exploding or vanishing gradients, according to Bengio et al. (1994). Vanishing gradients shrink exponentially and make it difficult for a model to learn. Exploding gradients grow exponentially and impair learning, and can cause instability and crash the model. LSTM networks are a subgroup of recurrent neural networks that are capable of learning long-term dependencies, as well as overcoming the problems of exploding or vanishing gradients (Hochreiter et al., 2001). See appendix A.8 for a more in-depth introduction to LSTM.


5.4 Hyperparameters

The internal parameters are set by training the neural network, while hyperparameters on the other hand, are determined by the researcher. The hyperparameters are set before the training process of a network commences. There are numerous configurations for the hyperparameters, and the optimal values are dependent on the problem to be solved (Leoni, 2019). The hyperparameters considered for adjustment in our LSTM neural network are presented below.

• Number of hidden layers

• Hidden nodes

• Learning rate

• Batch size

• Epochs

• Window size

• Regularization

The number of hidden layers determines how many layers there are between the input and output layers. Added hidden layers essentially form new combinations of the previously learned representation of the problem to be solved (Brownlee, 2017). The number of hidden nodes determines the number of units in each hidden layer. When training the model, the learning rate determines how much the model changes the weights based on the estimated error (Brownlee, 2019b). The batch size, on the other hand, determines the number of samples of data to be processed before adjusting the model parameters, where a sample represents the input sequence for one timestep (Brownlee, 2018). The process of going through the entire training set and adjusting the weights is known as an epoch, and the number of epochs determines how many times this process is repeated (Brownlee, 2018). Each sample includes a number of previous observations, determined by the window size. Finally, regularization is the inclusion of constraints in a model, which helps to reduce overfitting and increase out of sample performance (Brownlee, 2017b).
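To make the role of the window size concrete, the following sketch builds input samples and targets from a single series. It is an assumed construction rather than the configuration used in this study: each sample holds the last `window_size` observations, and the target lies `horizon` steps after the last observation in the sample.

```python
def make_windows(series, window_size, horizon):
    """Build (samples, targets) for supervised sequence learning.

    samples[i] is a list of `window_size` consecutive observations and
    targets[i] is the observation `horizon` steps after the last of them.
    """
    samples, targets = [], []
    for end in range(window_size, len(series) - horizon + 1):
        samples.append(series[end - window_size:end])
        targets.append(series[end + horizon - 1])
    return samples, targets


samples, targets = make_windows([1, 2, 3, 4, 5, 6], window_size=3, horizon=2)
```

For the series 1 to 6 this gives the samples [1, 2, 3] and [2, 3, 4] with targets 5 and 6. A larger window size gives the network more history per sample but fewer samples in total.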

5.5 Benchmark Models

To create grounds for comparison for the LSTM models, we will include a Random Walk model, Autoregressive Integrated Moving Average (ARIMA) models, and Vector Autoregressive (VAR) models.


Random Walk

The Random Walk is a standard model for benchmarking in forecasting. This model is also known as the naive model and takes the last actual value as the forecast (Hyndman & Athanasopoulos, 2013). The Random Walk model can be formulated as shown in Equation 5.4.

𝑦̂_𝑡+𝐻 = 𝑦_𝑡 (5.4)

Where H is the forecasting horizon ARIMA

ARIMA is a univariate forecasting technique which uses the past lags and errors of the dependent variable Y. It is a common forecasting tool when working with time series, and is capable of capturing trends. The AR term refers to the use of previous observations of the dependent variable Y as features, and the number of lags included is determined by the parameter 𝑝. The integrated term is defined by a parameter 𝑑, which determines the order of differencing. The MA term refers to the use of past error terms 𝜀_𝑡, where the parameter 𝑞 determines the number of error terms to include. Equation 5.5 shows the generalized combination of the AR and MA terms depending on the values of 𝑝 and 𝑞 (Hyndman & Athanasopoulos, 2013).

$$y_t = a + \beta_1 y_{t-1} + \dots + \beta_p y_{t-p} + \phi_1 \varepsilon_{t-1} + \dots + \phi_q \varepsilon_{t-q} \tag{5.5}$$

VAR

VAR is a multivariate forecasting technique that facilitates the inclusion of previous values of features and predictions, and has proven to be a powerful forecasting tool when working with financial time series (Zivot & Wang, 2006). VAR is a system of equations where every variable is calculated as a linear combination of past values of all the variables. The difference from traditional models, such as linear regression where predictor variables only affect the dependent variable, is that the variables influence each other. An example system with two variables and one lag can be expressed as shown in Equations 5.6 and 5.7 (Hyndman & Athanasopoulos, 2013).

$$Y_{1,t} = c_1 + \phi_{11,1} Y_{1,t-1} + \phi_{12,1} Y_{2,t-1} + \varepsilon_{1,t} \tag{5.6}$$

$$Y_{2,t} = c_2 + \phi_{21,1} Y_{1,t-1} + \phi_{22,1} Y_{2,t-1} + \varepsilon_{2,t} \tag{5.7}$$

Where 𝜀_{1,𝑡} and 𝜀_{2,𝑡} represent white noise processes, 𝜙_{𝑖𝑖,ℓ} denotes the influence of the ℓth lag of variable 𝑌_𝑖 on itself, while 𝜙_{𝑖𝑗,ℓ} denotes the influence of the ℓth lag of variable 𝑌_𝑗 on 𝑌_𝑖.
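The two-variable system in Equations 5.6 and 5.7 can be illustrated with a short Python sketch. The coefficient values are made up for the example, the noise terms are set to their expected value of zero, and the iterated multi-step forecast is one common approach that is assumed here rather than taken from the text.

```python
def var1_step(y_prev, c, phi):
    """One-step VAR(1) forecast: y_i = c_i + sum_j phi[i][j] * y_prev[j]."""
    n = len(c)
    return [c[i] + sum(phi[i][j] * y_prev[j] for j in range(n)) for i in range(n)]


def var1_forecast(y_last, c, phi, horizon):
    """Iterate the one-step forecast `horizon` times."""
    y = list(y_last)
    for _ in range(horizon):
        y = var1_step(y, c, phi)
    return y


c = [1.0, 0.0]                    # intercepts c_1, c_2 (hypothetical)
phi = [[0.5, 0.0], [0.25, 0.5]]   # phi[i][j]: effect of lagged Y_j on Y_i
one_ahead = var1_forecast([2.0, 4.0], c, phi, horizon=1)
two_ahead = var1_forecast([2.0, 4.0], c, phi, horizon=2)
```

In this example the second variable depends on the lag of the first (0.25) as well as on its own lag (0.5), which is the cross-variable influence that distinguishes VAR from a set of separate univariate regressions.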

5.6 Feature Selection

Feature selection is the process of choosing the features to be utilized in the prediction model. The goal is to remove irrelevant and redundant features, and avoid getting an overfitted model. According to James et al. (2013), effective feature selection increases out of sample prediction accuracy, in addition to making the model easier to comprehend. Ideally, we would test the model with all combinations of the available features, but this would be too computationally expensive.

We will perform the filter methods presented in Table 5.1, where the resulting metric of feature importance is scaled to the range [0, 1]. Subsequently, we will calculate a mean importance score based on the scores from all the filter methods. Both univariate and multivariate filter methods are considered, as both could provide useful insight into a feature's importance. This scheme of creating a mean feature importance score is inspired by the work of Næss (2018). Brief descriptions of the filter methods considered follow.

Table 5.1 Filter methods for feature selection.

| Multivariate | Univariate |
|---|---|
| Linear Regression | Correlation coefficient |
| Lasso Regression | Maximal Information Coefficient (MIC) |
| Ridge Regression | |
| Random Forest | |
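The mean importance score can be sketched as follows. The sketch assumes that each method's raw scores are first converted to absolute values and then min-max scaled to [0, 1]; the text states the target range but not the exact scaling, and the function names and example scores are hypothetical.

```python
def scale_to_unit_range(scores):
    """Scale the absolute values of one method's scores to [0, 1]."""
    magnitudes = {name: abs(value) for name, value in scores.items()}
    low, high = min(magnitudes.values()), max(magnitudes.values())
    if high == low:
        return {name: 0.0 for name in magnitudes}
    return {name: (m - low) / (high - low) for name, m in magnitudes.items()}


def mean_importance(scores_by_method):
    """Average the scaled scores of every feature across all filter methods."""
    scaled = [scale_to_unit_range(s) for s in scores_by_method.values()]
    features = list(scaled[0])
    return {f: sum(s[f] for s in scaled) / len(scaled) for f in features}


raw = {
    "correlation": {"speed": 0.2, "load_factor": -0.8, "bunker": 0.5},
    "lasso": {"speed": 0.0, "load_factor": 2.0, "bunker": 1.0},
}
ranking = mean_importance(raw)
```

In this example both methods rank the features in the same order, giving mean scores of 0.0, 1.0 and 0.5. When methods disagree, the average dampens the influence of any single method.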

Pearson’s Correlation Coefficient

Pearson’s correlation coefficient expresses the linear relationship between two variables and can be calculated as shown in Equation 5.8. The output for a correlation coefficient is in the range of [-1, 1], and by taking the absolute value, the coefficient will be in the range of [0, 1]. The correlation coefficient is useful for indicating linear relationships but could be misleading if there exists a non-linear relationship.


$$\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} \tag{5.8}$$
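A direct implementation of Equation 5.8 is sketched below, with a small example of how the coefficient can miss a non-linear relationship. The function name and data are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


linear = pearson([1, 2, 3, 4], [2, 4, 6, 8])             # perfectly linear
quadratic = pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x squared
```

The linear pair gives a coefficient of 1, while the quadratic pair gives 0 even though y is fully determined by x, which is the motivation for also including a measure such as MIC.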

Maximal Information Coefficient (MIC)

MIC, first presented by Reshef et al. (2011), is a metric that is able to discover a wide variety of relationships between two features, both linear and non-linear. The metric possesses the property of equitability, meaning that it returns the same score for equally noisy relationships independent of the type of relationship (e.g. linear, polynomial etc.) (Reshef et al., 2011). The MIC takes a value in the range of [0, 1] and is calculated as shown in Equation 5.9 (Kinney & Atwal, 2014).

$$\mathrm{MIC}(X, Y) = \max\left\{\frac{I(X, Y)}{\log_2 \min\{n_X, n_Y\}}\right\} \tag{5.9}$$

Where 𝐼(𝑋, 𝑌) is the mutual information, and can be defined as shown in Equation 5.10 (Kinney & Atwal, 2014).

$$I(X, Y) = \sum_{x,y} p(x, y) \log_2 \frac{p(x, y)}{p(x)\,p(y)} \tag{5.10}$$

The MIC represents the mutual information between random variables X and Y, normalized based on the minimum joint entropy between the two given random variables (Wint, 2019).
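The mutual information in Equation 5.10 can be computed from empirical frequencies as sketched below. This is not a full MIC implementation: MIC additionally searches over many ways of binning the two variables and keeps the maximum, whereas the sketch assumes the data are already discretized and evaluates the normalized score for that single binning. The function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information in bits between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x, count_y = Counter(xs), Counter(ys)
    total = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        total += p_xy * math.log2(p_xy / ((count_x[x] / n) * (count_y[y] / n)))
    return total


def normalized_score(xs, ys):
    """I(X, Y) divided by log2 of the smaller number of distinct bins."""
    bins = min(len(set(xs)), len(set(ys)))
    if bins < 2:
        return 0.0  # a single bin carries no information
    return mutual_information(xs, ys) / math.log2(bins)


dependent = normalized_score([0, 0, 1, 1], [0, 0, 1, 1])
independent = normalized_score([0, 0, 1, 1], [0, 1, 0, 1])
```

Two identical binary sequences share one bit of information and score 1, while two independent ones score 0.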

Multiple Regression

Multiple regression can also give an indication of the importance of variables through the magnitude of the coefficients. The multiple regression linear model between the dependent variable Y and the independent variables X, can be expressed as in Equation 5.11, according to James et al. (2017).

$$y_t = \beta_0 + \beta_1 x_{1,t} + \dots + \beta_p x_{p,t} + \varepsilon_t \tag{5.11}$$

Where 𝑥_{1,𝑡}, … , 𝑥_{𝑝,𝑡} represent the values of the features at time 𝑡, 𝛽₁, … , 𝛽_𝑝 are the coefficients for the features with regard to Y, 𝛽₀ is the intercept, and 𝜀_𝑡 is the error term. The regression coefficients, 𝛽, are found by minimizing the residual sum of squares, shown in Equation 5.12.


$$RSS = \sum_{t=1}^{N} (y_t - \hat{y}_t)^2 \tag{5.12}$$

Fitting a linear regression model with all available features will probably lead to overfitting.

The upcoming methods of Lasso and Ridge regularization could be able to counter this drawback. When fitting an Ordinary Least Squares model with many predictors, the model will likely suffer from multicollinearity. However, the method is simple and can provide insights with regards to feature importance. The coefficients from the linear regression model are scaled to the range of [0, 1] for comparability with the other methods.

Lasso Regression

The Lasso regression is a regularization technique that applies shrinkage to a linear regression model. The Lasso adds a penalty proportional to the sum of the absolute values of the coefficients. This has the effect of shrinking the coefficients of less important features toward zero, where some coefficients may even shrink all the way to zero. Hence, the Lasso procedure results in sparse models that reduce problems with overfitting and multicollinearity. 𝜆 is the parameter that determines the impact of the regularization. The cost function for the Lasso can be expressed as shown in Equation 5.13 (James et al., 2017).

$$\sum_{t=1}^{n} \left( y_t - \beta_0 - \sum_{j} \beta_j x_{t,j} \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \tag{5.13}$$

Ridge Regression

The Ridge regression behaves similarly to the Lasso regression, but the added penalty is here proportional to the sum of the squared coefficients. This shrinks all coefficients toward zero but, unlike the Lasso regression, will not eliminate any coefficients from the model. The cost function to be minimized can be expressed as shown in Equation 5.14 (James et al., 2017).

$$\sum_{t=1}^{n} \left( y_t - \beta_0 - \sum_{j} \beta_j x_{t,j} \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \tag{5.14}$$
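The two cost functions in Equations 5.13 and 5.14 differ only in the penalty term, which the sketch below makes explicit. It evaluates the costs for given coefficients and does not perform the minimization; the function names and numbers are illustrative.

```python
def residual_sum_of_squares(y, X, beta0, beta):
    """Sum of squared residuals for a linear model with intercept beta0."""
    return sum(
        (y_t - beta0 - sum(b * x for b, x in zip(beta, row))) ** 2
        for y_t, row in zip(y, X)
    )


def lasso_cost(y, X, beta0, beta, lam):
    """Equation 5.13: RSS plus lambda times the sum of absolute coefficients."""
    return residual_sum_of_squares(y, X, beta0, beta) + lam * sum(abs(b) for b in beta)


def ridge_cost(y, X, beta0, beta, lam):
    """Equation 5.14: RSS plus lambda times the sum of squared coefficients."""
    return residual_sum_of_squares(y, X, beta0, beta) + lam * sum(b * b for b in beta)


y = [1.0, 2.0]
X = [[1.0, 0.0], [0.0, 1.0]]
beta = [1.0, -2.0]
lasso = lasso_cost(y, X, 0.0, beta, lam=0.5)   # 16 + 0.5 * 3
ridge = ridge_cost(y, X, 0.0, beta, lam=0.5)   # 16 + 0.5 * 5
```

The large coefficient of −2 contributes 2 to the Lasso penalty but 4 to the Ridge penalty, which shows why Ridge penalizes large coefficients more heavily while the Lasso penalty remains linear in the coefficient size.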
