Energy Efﬁciency in Wireless Sensor Networks, Through Data Compression

(1)

Energy Efficiency in Wireless Sensor Networks, Through Data Compression

Srod Karim

Master’s Thesis, Autumn 2017

(2)

(3)

Abstract

Wireless sensor networks are an ever-growing market in today’s society;

Smart homes with wirelessly connected and controllable devices, continuous monitoring in environmental, health care and industrial settings, and automated, self-adjusting systems in homes and factories are all growing technologies that are built using wireless sensor networks.

In many of these systems, energy consumption can be a major concern;

Most devices in the network rely on small batteries as their energy source. Because of this, any energy usage optimization can lead to greatly increased lifetime.

Often, radio communication between microcontrollers and other devices is the biggest energy drain in the system.

Compression of sensor data before transfer between nodes can be used to reduce the energy consumed by the radio transmissions. By first compressing the data, it is possible to reduce the size of the message transmitted over radio, which in turn will lead to less energy consumed.

However, energy is required to perform the compression and decompression. It can be important to have a good idea of when compression may be beneficial for the overall energy efficiency of a wireless sensor network, and when it might be better to send the data without compression.

In this thesis we will look at typical properties of wireless sensor networks, wireless transfer protocols for sensor networks, and sensor data. We will analyze the effects of compressing sensor data before sending it over the network, and perform experiments and measurements in a simple wireless sensor network, to assess the effects data compression can have on the energy consumption in wireless sensor networks.

(4)

Acknowledgments

My thanks go to my advisor Kjetil Østerås, who provided guidance, advice, proofreading, suggestions and general feedback, and for helping me all around throughout my thesis.

I also want to thank Silicon Labs, for giving me the opportunity, and for providing me with the microcontrollers and tools required to perform the experiments.

I also owe thanks to my advisor at the University of Oslo, Associate Professor Arne Maus, for providing guidance, and for giving feedback on the structure and content of this thesis.

I would also like to thank Jonathan Ringstad for providing in depth proofreading and valuable feedback on the structure and wording of the text.

(5)

Preface

Overview

In this thesis, we perform experiments and measurements to asses the effects of compression in wireless sensor networks. For this purpose, we have divided our thesis into 6 distinct chapters that each cover a aspect of our thesis.

In chapter 1, we provide motivation for our thesis, and look at previous work in the field of energy efficiency in wireless sensor networks.

In chapter 2, we look at Wireless sensor networks and wireless data transfer, including many factors that contribute to the overall energy usage of wireless transmissions.

In chapter 3, we outline various encoding and compression algorithms that can be useful for wireless sensor networks, and create a basis for the algorithms we use for our experiments.

In chapter 4, we explain the experiments we perform, and examine the algorithms and datasets we use for our experiments.

In chapter 5, we look at the results from our experiments, and asses the effects compression has on energy usage in wireless sensor networks.

Finally, in chapter 6 we discuss potential future work in the fields of energy efficient wireless data transfer.

(6)

Introduction

1.1 Motivation

Energy consumption is an emerging topic in the field of computer science.

Excessive production and consumption of electricity can have hazardous effects on the environment, due to the use of non renewable energy sources.

Energy consumption is also a big concern for many companies with focus on computation and IT. Power consumption has been a design constraint for many businesses, most importantly due to the high monetary costs of supplying power. The electricity costs of modern computers have become one of the biggest strains on the budget of many firms and data centers. In 2005, Google engineer Luiz André Barosso published an article in which he analyzed the performance costs of their computer severs. In this article he stated that electricity costs made up for roughly one third of the total cost of running the servers. He additionally theorized that energy costs could end up becoming the most expensive part of maintaining a computer server by the end of the decade, even more expensive than the cost of the computer hardware[1].

Energy consumption is also a major concern in mobile, battery-driven devices. For laptops and phones, long battery times are an important feature. Yet, it can be hard to maintain long battery times, given the power consumption of the various components of modern laptops and phones[2][3].

Phones are not the only mobile devices that depend on getting energy through battery usage, as more and more devices using microcontrollers are being produced. Microcontrollers are small processors used in a variety of fields. These include remote controls, power tools, toys, office machines and automatically controlled products and devices such as sensor networks, control systems and implantable medical devices, as well as other embedded systems. Given their small structure and autonomous nature, they tend to be heavily power constrained. Many microcontrollors reside in locations that are hard to access, such as

(9)

Chapter 1. Introduction

Section 1.2. Energy efficient hardware

medicinal devices implanted in the body, or distributed sensor networks operating in forests, seas, and other locations far from civilization. In some cases, the changing of batteries for these products can be a difficult or maybe even impossible task, further underlining the importance of energy conservation. Additionally, microcontrollers in IoT devices and wireless sensor networks in homes aim to be easily installable, without requiring rewiring by electricians or drawing long extension cords, and therefore have to rely on batteries. Finally, a lot of devices operating on microcontrollers are intended to reduce energy consumption in larger systems, such as smart light bulbs that automatically turn on and off when people enter and leave the room, and that adjusts brightness depending on the time of day. It is of course important that the microcontrollers are are energy efficient as well, to reduce the overall energy consumption in the system.

In this thesis, we want to add to the scientific knowledge of energy consumption in mobile devices, with focus on wireless sensor networks.

We want to measure the energy consumption of data transfer in wireless sensor networks, and asses how this property is affected by using compression algorithms to compress data before transfer.

1.2 Energy efficient hardware

Due to the importance of power consumption, and the benefits energy efficiency can have, there has emerged a market for the design and production of energy friendly solutions in large portions of the computer science landscape. There is a constant flow of newer and more effective hardware and software designed with energy consumption in mind; tools, peripherals, code and even specifications are being made to be more energy-aware.

Perhaps most important is the development of energy efficient Central Processing Units (CPUs), as these can be the most energy hungry part of any microcontroller. With the rise of the Internet Of Things (IoT) and smart devices, there has been a rise to the energy friendly processors market. The current leader within this market is ARM, with a popular line of energy friendly processors. In 2013 ARM released a statement that nearly 60% of all all mobile devices worldwide used at least one ARM-based chip[4].

1.2.1 Energy efficient processors

ARM offers three primary series of microprocessors for use with mobile devices, and devices that don’t require high processing power. Each processor series has its own characteristics and applications. Most noticeably trading off computation speed for energy efficiency.

(10)

Section 1.2. Energy efficient hardware networks, through data compression

In addition to the processor hardware, ARM provides other incentives for energy efficiency. For example, ARM designed the THUMB and THUMB2 instruction sets for their processors. These are compact processor instruction sets which allow for highly dense code, leading to an even smaller energy footprint.

Many of the processors also come with multiple sleep states or energy modes. These allow the processor to run at varying energy levels, with proportional energy usage and computational power. When a processor is put into a sleep state, the main CPU is turned off, so it will consume less power. When in sleep mode, the CPU is unable to execute instructions until it is woken up again with a wakeup signal. The wakeup signal can be sent from a timer, or another peripheral. Sleep states are a great measure for cutting energy usage in a program with intervals where computation is not needed, and may increase the battery lifetime of the device. This can happen when a program is waiting on incoming data, or when it is waiting until a scheduled time to perform its code.

1.2.2 Energy efficient microcontrollers

In addition to the energy efficient processors with multiple energy modes, many microcontrollers will offer supplemental features to further reduce power consumption. Some microcontrollers even specialize in maintaining energy efficiency. This is done by designing an energy aware microcontroller layout, providing energy management controls to the programmer, and incorporating energy-efficient on-chip peripherals.

These can be peripherals such as low energy clocks, timers, serial interfaces, wireless communication, and more.

Since the processors are often the most power-consuming part of an microcontroller, it can be beneficial to use the processor as little as possible.

In 1.2.1 we discussed the idea of putting the processor into a sleep state when not in use, but there are additional ways to reduce processor usage.

If a certain type of task is performed often in a program, it can be beneficial to outsource this task to a specialized peripheral. The peripheral will have less flexibility than a full CPU, but may perform the specialized tasks at a lower energy cost. While the peripheral is performing its task, the main CPU can perform other tasks, or even go into a sleep state.

One popular peripheral used for offloading tasks from the processor is the Direct Memory Access (DMA) unit. This is a unit that allows for memory transfers without CPU intervention. The processor can configure the DMA, and move on to perform other tasks, or even be put into a sleep state. The DMA will transfer memory as configured by the processor. The DMA can transfer data within the local memory of the microcontroller, as well as transferring data between different peripherals. It can also be set up to handle incoming data and requests from other peripherals, and to resume the processor from sleep mode once the transfer is complete. This

(11)

Section 1.3. Energy efficient software

is especially useful for communication between the microcontroller and peripherals. The serial ports used for communication between the processor and peripherals typically run at a slower clock speed than the processor’s. This means that the processor will typically waste energy while communicating with a peripheral, by having cycles where it waits for the communication to continue. Using the DMA, the processor can instead go into a sleep mode. The DMA will then copy the message from local memory to the serial port, or from the serial port to the local memory, and resume the CPU once the message transfer is complete, thereby saving the energy that would otherwise be spent on the processor simply waiting.

1.3 Energy efficient software

While it is important to have energy efficient CPUs and on-chip peripherals, it is also important to utilize these components to their fullest potential through energy aware programming. In addition to energy efficient hardware, software can be designed with energy consumption in mind. Everything from core functions such as operating systems and schedulers, down to simple algorithms such as sorting, can be optimized for energy efficiency.

This is also a field of academic interest, as studies and new algorithms are constantly developed. This thesis focuses primarily on the effects data compression has on energy consumption in wireless sensor networks, but there are many additional measures for minimizing energy usage in energy constrained devices and wireless sensor networks. We will discuss a few notable thesis’ and studies that focus on energy efficient software.

1.3.1 Clockspeeds and dynamic speed scaling

One way to lower the energy cost of a CPU is to lower the clock speed it is running at. CPUs typically use less energy the slower clock speed they run at. However, running at a lower clock speed also means that the processor performs less computations per second, often at a higher energy cost per computation. Often a good strategy is to perform and complete tasks as fast as possible, and put the processor into sleep mode until the next task needs to be started. An alternative technique is to run the processor at a clock speed that lets it complete a task just in time before another task is initiated. Despite using more energy per computation, this technique can save energy by not having to go into sleep mode in the first place, and by not having downtime where no computation is performed.

Francis Yao, Alan Demers and Scott Shenker wrote in their 1995 article "A scheduling model for reduced CPU energy" about a dynamic speed scaling algorithm which they called "YDS", named after its three

(12)

Section 1.3. Energy efficient software networks, through data compression

creators. The idea behind this algorithm is to adjust the clockspeed of the processor at runtime, in a way that is optimal for energy usage. This idea is called Dynamic speed scaling[5]. Dynamic speed scaling is when the clockspeed is adjusted during computation to complete tasks just in time before another task is started. The clock speed is adjusted as necessary, depending on the amount of computation needed to complete a task, and the amount of time left until the next task needs to be started.

1.3.2 Parallelization and vectorization

A modern problem in computation is the dark silicon effect. This effect describes the phenomenon in which computers are limited in how many of their transistors they can use for any given operation. Moore’s law states that the number of transistors in a dense circuit doubles every other year[6]. However, powering all of these transistors becomes increasingly harder, especially at high clockspeeds. This is because of the transistors’

heat dissipation, which cause the device to become too hot for the transistors to function properly. The higher the clockspeed, the greater the heat dissipation. A common solution to this problem is to parallelize the workload. By cutting up a task into smaller parts, which can then be performed simultaneously, we can perform more tasks without increasing the clockspeed or computation time. Parallelization can also increase the energy efficiency of a program, both by increasing the overall computation speed, and through reducing the heat dissipation.

In 2013, Juan M. Cebrián and Lasse Natvig published online an article titled "Performance and energy impact of parallelization and vectorization techniques in modern microprocessors". Herein, they analyze and evaluate the usage of parallelization and vectorization for saving energy. They show that, depending on the processor and algorithm, it is possible to achieve energy savings by parallelizing the algorithms and running them on multiple threads. Additionally, vectorization can be used in place of, or in addition to, parallelization to further save energy[7]. Vectorization means that the processor performs an operation on multiple values simultaneously. This can be especially relevant for handling multimedia files such as images and audio, and for compression, tasks which are easily parallelizable or vectorizable.

1.3.3 Wireless sensor networks and compression

One popular use case for microcontrollers is in sensor nodes in wireless sensor networks. In a wireless sensor network, microcontrollers are used collect data using sensors, perform calculations on the data if necessary, and send the data off to be processed once enough has been collected. For such a task, the most energy expensive part is often the process of transmitting and receiving data wirelessly. In an article from 2010,

(13)

Section 1.3. Energy efficient software

Gregory J. Pottie and William J. Kaiser state that around 3000 processor instructions can be executed for the same energy cost as transmitting a single bit 100 meters over radio[8]. This gives us an opportunity for saving power by using data compression before transmitting data. By Pottie and Kaiser’s estimates, we will save energy even if we use thousands of instructions to compress the data size by a single bit.

This idea of using compression to reduce power usage is written about in Christopher M. Sadler and Margaret Martonosi’s article "Data Compression Algorithms for Energy-Constrained Devices in Delay Tolerant Networks". Here, they analyze multiple compression algorithms, and even develop their own algorithm, the Sensor Lempel–Ziv–Welch (S-LZW). The S-LZW is a variation on the Lempel–Ziv–Welch algorithm, and is a compression algorithm designed specifically for sensor networks.

In the article, they show that it can outperform other compression algorithms in terms of energy efficiency, depending on the network setup.[9]

This article by Sadler and Martonosi is an important influence for our thesis. In this theses, we want to build upon their work by measuring results for further compression algorithms, datasets, and compression sizes.

(14)

Energy efficient wireless data transfer

The focus of this thesis is on reducing the energy usage of data transfer over a wireless network. In this chapter we will describe the components that make up data transfer in wireless sensor networks, and various factors that can have an effect on the energy usage.

2.1 Wireless sensor networks

Wireless Sensor Networks (WSN) are a popular use for microcontrollers.

In a Wireless sensor network, numerous microcontrollers and other devices are connected in a mesh, sending information to one another, and acting upon the information.

Usually, microcontrollers in a sensor network are source nodes that gather sensor-data such as temperature, humidity, pressure, and more.

This data is then sent to a sink node for processing. The sink node is typically either a microcontroller that acts upon the received data to regulate a bigger system, or a more powerful machine with less restrictions that collects and analyzes the data, and acts upon it. The wireless sensor network can have any topology. In some networks, the source nodes send data directly to the sink node. In others, the microcontrollers might have to route the data trough many nodes before it reaches the sink. There may even be dedicated gateway nodes for routing the data. Figure 2.1 shows a few typical network topologies.

Wireless sensor networks have a wide variety of applications, as there are many tasks that rely on gathering sensor data in a region. They can be used to measure and log sensor data such as climate conditions, machine health, and factory conditions. The sensor data can then be collected and analyzed, and acted upon to improve conditions or keep a system stable. Sensor networks are also used for safety-critical tasks, such as detecting natural disasters like fire detection in forests and coal-mines, flood detection, and more. Early detection of natural disasters

(15)

Chapter 2. Wireless data transfer Section 2.1. Wireless sensor networks

sink source

source

source source source

(a)Star network topology, where all source nodes are connected directly to the sink

sink

source source

source

(b) multi-hop network topology, where source nodes are connected to the sink either directly, or through other source nodes.

Figure 2.1: Two popular network topologies for wireless sensor networks.

CPU

PowerSource

Transceiver

External Memory

Sensor

Figure 2.2: Illustration of a microcontroller acting as a single source node in a wireless sensor network. The power source is usually a small battery, but can also be a connection to the main power supply in a house, or an energy generator.

(16)

Section 2.2. Energy usage in radio transmissions networks, through data compression

is crucial to preventing or mitigating the disaster, or to allow initiating evacuation and other life saving measures in a timely manner.

Sensor networks are also used in many automated networks, for the purpose of measuring and automatically adjusting conditions in an area or system, such as the humidity and ventilation in a building.

When designing a Wireless sensor network, it is important to keep in mind the characteristics that are unique to it. The most important characteristic in sensor networks is the power consumption constraints inherent in microcontrollers. The nodes must be able to stay powered by small batteries, or by harvesting energy from external sources such as solar power and wind energy. A wireless sensor network should ideally be able to handle changes to the node mesh, including adding new nodes to the mesh, removing nodes from it, and moving nodes around. Another consideration is the environmental conditions of the sensor network. The network must be able to cope with the environmental factors that surround it; the communication between the nodes needs to function regardless of the conditions of the environment it operates in.

2.2 Energy usage in radio transmissions

When broadcasting a message from a node, its transmitter or transceiver is activated to perform the data transfer, and starts draining energy from the power source. Likewise, the receiver or transceiver is activated on the receiving node when listening for a broadcasted message. In many wireless sensor networks, the transmitters and receivers can be the biggest power sinks of the system.

2.2.1 Energy calculations

When discussing the energy consumption in electrical circuits, we focus on the electrical potential energyprovided by electrical outlets, generators or batteries, that is converted to other forms of energy, such as motion, light and heat. Once the electrical potential energy has been converted, it is considered as being used up, as a battery has a limited amount of potential energy to convert.

Energy is typically denoted as E in mathematical formulas, and is measured in Joules (J). One Joule describes the energy transferred to an object when a force of one newton moves that object one meter.

Energy can be measured through a variety of methods. A practical way to measure electrical energy usage is through the following formula:

1J =1V∗1C =1V∗1A∗1s

where V describes the voltage of the circuit, and C is the coulomb. The coulomb, which is the electrical charge in the circuit, can also be described

(17)

Chapter 2. Wireless data transfer

Section 2.2. Energy usage in radio transmissions

as1A∗1s, describing the charge transported by a current of one ampere in one second.

Another useful formula for measuring energy is through time and power:

1J =1W∗1s

where W describes the electrical power of a circuit, which is the rate at electrical energy is being transferred in the circuit.

These formulas communicate to us that we can reduce energy consumption by reducing the voltage or ampere of a circuit, or by increasing the amount of time the circuit operates at reduced voltage and ampere.

2.2.2 Transmission energy usage

The energy consumption of a transmission is affected by various factors.

From 2.2.1, we can assert that the energy consumption of a transmission can be calculated from the power of the transmission, and the time it takes.

One important factor that affects the time a transmission takes is the length of the message that is being transferred. The larger a message is, the more time, and thus energy, is required to broadcast it. Therefore, we can reduce the energy consumption of a transmission by reducing the length of the message being transferred.

The transmit power of a transmitter describes the power used for transmission, and can also be adjusted. By increasing the transmit power of a transmission, the message transmitted will be able to reach further.

Typically, transmit power is measured in decibel-milliwatts (dBm). The dBm of a transmitter tells us the logarithmic ratio between its power, and one milliwatt. The transmission power in microcontroller transmitters and transceivers will typically range range from -10dBm to 20dBm. Figure 2.3 shows the relation between dBm and watts for this range.

The transmit power we select for an application affects the range of our transmissions. In general, we can double the distance that a transmission can reach by quadrupling the transmit power. It can be beneficial to choose a fitting transmit power for each given task. The transmit power should be as small as possible, to conserve energy, while being large enough for the transmissions to reach the receivers.

(18)

−10 −₅ ₀ ₅ ₁₀ ₁₅ ₂₀

0 25 50 75 100

Power ratio in dBm

Powerinmilliwatts

Figure 2.3: The relation between dBm and wattage for transmission power, from -10dBm to 20dBm.

Another factor that affects the range and energy usage of data transmission is the frequency of the radio waves it is sent over. Higher frequencies allow for faster message throughput at the cost of increased energy usage per byte sent. We discuss this factor further in section 2.3, in conjunction with wireless transfer protocols.

2.2.3 Network packets

Wireless networks rely on sending information in packets. Each packet is a finite length set of data, and can be a self contained message, or part of a bigger message.

Packets consist of meta-information regarding the packet, and payload data. The meta-information can contain info on the connection, checksums, package id, package size, and more. The payload consists of the actual data that is being transferred.

When sending a message wirelessly, it be a good idea to split it up into multiple packets, rather than having one big packet. Splitting a message into smaller packets can cause increased message transmission overhead, as each package requires its own meta data. However, having multiple smaller packets reduces the cost of failed transmissions, as each package costs less to resend. This makes the network communication more secure against package loss or data corruption. For example, if one byte is corrupted in a single 512-byte packet, the entire 512-byte packet needs to be resent. If one byte is corrupted across eight 64-byte packets, only the 64-byte packet containing that byte needs to be resent. When picking a packet size for transferring data, it can be wise to pick a packet size that balances the overhead cost of sending more messages against the potential cost of losing a packet.

It is also possible to send packets with variable lengths, instead of a fixed length. When using a variable length packet, additional information

(19)

Chapter 2. Wireless data transfer

Section 2.2. Energy usage in radio transmissions

needs to be sent in the header of the packet, to specify its length. For example, when sending a variety of messages ranging from 8 to 32 bytes in size, it may be wiser to send the packets as variable length packets ranging from 8 to 32 bytes, rather than sending all messages in 32 byte packets. This can be useful for compressed data, as the compressed data’s size usually isn’t known beforehand, and may not be consistent between different compressed messages.

2.2.4 Receiver timing

When wirelessly sending data from one device to another, the receiving device needs to continuously be at standby, listening for incoming messages. This can lead to a lot of energy being wasted on idly waiting for a message. In figure 2.4 we can see a receiver listening for ca. 20 milliseconds before receiving any actual data, thereby wasting some energy.

One way to minimize this energy spent on listening for messages is to time the transmitter and receiver; the listening device should wake up just before the sending device sends out a message. In figure 2.5 we see a receiver listening for only 3 milliseconds instead, as the transmitter and receiver have agreed beforehand when to initiate transfer. We could try to minimize this waiting time even further, but then we risk that the transmitter starts transmitting too early, before the receiver has started listening. If the receiver starts listening after the transceiver has started sending data, the receiver will fail to receive the first part of the data, which can make the whole data unusable. It is important for the power consumption of a receiver to pick an optimal listening margin, so that little time is spent listening for messages, without risking to lose packages by starting to listen too late.

(20)

0 5 10 15 20 25 30 35 40 45 50

0 2 4 6 8 10

Time in ms

CurrentinmA

Figure 2.4: Wireless transceiver waiting for and receiving data, without timing. The red portion shows the transceiver listening for a message, and the green portion shows the transceiver receiving the message.

0 5 10 15 20 25 30 35 40 45 50

0 2 4 6 8 10

Time in ms

CurrentinmA

Figure 2.5: Wireless transceiver waiting for and receiving data, with timing. The red portion shows the transceiver listening for a message, and the green portion shows the transceiver receiving the message.

(21)

Chapter 2. Wireless data transfer Section 2.3. Wireless transfer protocols

2.3 Wireless transfer protocols

When multiple devices are communicating, a transfer protocol is required.

A transfer protocol is a set of rules and guidelines for how to communicate with one another. It defines how connections should be established, maintained, and cut, and how data should be transferred between the devices. If two devices do not use the same protocol, or don’t follow the rules set by the protocol, they cannot safely and reliably transfer data.

When communicating wirelessly, it is very easy to end up with interference with other communications and broadcasts on the same or nearby frequencies. A wireless transfer protocol needs to be able to tolerate this kind of interference. The protocols also have to comply with laws and regulations concerning radio transmissions, which vary from country to country. Fortunately, there are many wireless transfer protocols that can handle these issues. They have been designed to comply with radio regulations, and to operate regardless of interference.

The full radio spectrum consists large range of frequencies. Most of it is reserved for specific uses, and a license is required to broadcast or transfer data in those frequencies. There are however some radio bands that can be used without a license, as long as one follows a defined set of regulations. These are referred to as unlicensed radio bands, and are popular for usage in Internet Of Things (IOT) devices, and wireless sensor networks. A frequency band is a range of frequencies in the spectrum with a defined least and greatest frequency. Radio bands are sometimes split into smaller frequency ranges called channels. Figure 2.6 displays a rough overview of the radio spectrum as designated by the Institute of Electrical and Electronics Engineers(IEEE), including some near-globally unlicensed bands.

The regulations regarding usage and licensing of different radio bands can vary greatly from region to region. For example, a frequency range that is available for unlicensed use in Japan might not be available in the United States.

Many rules and regulations regarding telecommunication, and standards within wireless network communication that we use today, are maintained by the IEEE. IEEE 802 is a family of IEEE standards dealing with local area networks and metropolitan area networks, and contains many standards for wireless network protocols.

Within the specifications set by IEEE, there are a few radio bands that can be used license free close to worldwide. These are known as the 2.4GHz and 5GHz radio bands. These bands are also allocated for licensed uses, but do allow for unlicensed communication[11].

In addition to these, there are sub-1GHz frequency bands, ranging from 750MHz to 950MHz, which are available for unlicensed use. The availability of these bands varies from region to region, where different parts of this range can be used depending on the region[12].

(22)

Section 2.3. Wireless transfer protocols networks, through data compression

10¹ 10² 10³ 10⁴ 10⁵

HF VHF UHF L S C X KuK Ka V W G

ca 750-950 MHz, varies depending

on region (IEEE 802.15.4)

ca 2.4 - 2.5 GHz (IEEE 802.11)

ca 5.25 - 5.85 GHz (IEEE 802.11) Radio wave frequency in MHz

Figure 2.6: Radar-frequency bands according to IEEE standard, with focus on unlicensed radio bands, popular with IOT devices[10]

Choosing the correct communication frequency for each application can have a great impact on it’s performance and energy efficiency. Higher frequencies allow for higher data rates. This means that it is possible to transfer data in a shorter time, or transfer more data in the same time.

However, higher frequencies have greater free-space path-loss, which means that the signal will reach a shorter distance, and devices thus have shorter communication range. In addition, higher frequencies struggle more with obstruction, and reach even less far when communicating through walls, trees, or other structures. Finally, higher frequencies typically require more expensive hardware, as well as higher energy usage to operate. In addition to the basic differences from being at different frequencies, the 2.4GHz region is very crowded, which means that there is typically more noise and interference. This can lead to slower transmission latency and more packet losses.

We will look at several distinct transfer protocols, most of which are commonly used in wireless sensor networks, and describe their characteristics.

(23)

Chapter 2. Wireless data transfer Section 2.3. Wireless transfer protocols

2.3.1 Wi-Fi

Wi-Fi is one of the most popular wireless local area networking protocols.

It is primarily used for connecting devices to the Internet, through a wireless access point acting as gateway.

Wi-Fi commonly operates on the 2.4GHz and 5GHz bands, and has high data throughput. Newer Wi-Fi standards allow for a theoretical bandwidth of over 1Gbit/s. However, this also means that Wi-Fi has relatively high energy consumption, and requires expensive hardware.

The high cost and high energy consumption make Wi-Fi not typically suitable for wireless sensor networks. However, it might be worth considering using Wi-Fi in networks where high throughput is critical, and where energy consumption is not a concern.

2.3.2 Bluetooth

Bluetooth is a popular short range transfer protocol. It is maintained and marketed by the Bluetooth Special Interest Group (SIG), which is a group composed of over 30000 companies within telecommunication, computing, networking, and consumer electronics. In addition to developing the specification and protecting the trademark, Bluetooth SIG also maintains standards that manufacturers must meet before they can market their products as Bluetooth devices.

As of the Bluetooth specification 4.0 release in 2010, the Bluetooth specification Special Interest Group added Bluetooth Low Energy to the Bluetooth specification. Bluetooth Low Energy is an alternative to Bluetooth, for energy constrained devices. It allows for communicating with other Bluetooth Low Energy devices, as well as devices running regular Bluetooth 4.0 or above.

Whereas regular Bluetooth connections are continuous, devices using Bluetooth Low Energy communicate in intervals, sleeping when not in use. This reduces the throughput of the device, but allows for low energy consumption in devices that don’t require a constant connection, such as nodes in wireless sensor networks. It also performs timed communication as described in 2.2.4.

Bluetooth benefits from being very generic and highly interoperable.

Completely separate devices can communicate with with other, regardless of product manufacturer or transceiver vendor. Additionally, Bluetooth Low Energy devices can describe themselves, and what features they expose, using Generic Attribute (GATT) Profiles.

These factors make Bluetooth great for small networks such as homes and smart cars, systems that benefit from a generic protocol with high interoperability and modest throughput.

However, Bluetooth low energy is not well suited for longer distance communications, and in networks where the devices are frequently

(24)

Section 2.3. Wireless transfer protocols networks, through data compression

relocated. This can make it problematic in networks that are outdoors, or networks that get frequently modified.

A final positive note for Bluetooth is that most modern computers, laptops and smartphones come with built in bluetooth adapters. This allows us to connect these devices directly to a wireless sensor network, without having to install additional proprietary hardware.

2.3.3 ZigBee

ZigBee is a transfer protocol specification developed by the ZigBee Alliance, a group of companies that maintain and publish the ZigBee standard.

The ZigBee specification intends it to be simpler and cheaper than other wireless area networks such as Wi-Fi and Bluetooth.

The ZigBee protocol operates at the 2.4GHz and sub-1GHz frequency bands in North America and Europe, and has a higher range than Bluetooth Low Energy, promising full home coverage.

The ZigBee protocol is an open protocol, and the ZigBee alliance allows using it non-commercially without having to pay a royalty. However, a membership in the ZigBee Alliance is required to get permission to create commercial products that use the ZigBee specification.

Outdoors, ZigBee devices can communicate with nodes up to 200m away in open areas without occlusion, according to the specification.

2.3.4 Proprietary Protocols

In addition to popular wireless transfer protocols, many companies make use of their own proprietary protocols. A proprietary protocol can be used, as long as it is following the rules set by IEEE, and any other potential radio regulation laws. Some benefits of using proprietary protocols include not having to pay licensing rights to the protocol, and having a greater control over the format of the transmissions, to better suit the specific wireless communication use case.

(25)

Encoding and compression for energy efficient transfer of sensor data

In distributed sensor networks, source nodes need to send their data to a sink node. In some networks, the data might pass through multiple nodes before reaching the sink. Depending on the method of communication between nodes, the transfer of data can cost a significant amount of energy. In many wireless sensor networks, data transmission can be the part of the system that drains the most energy, and can be considered a limiting factor for the battery lifetime of the sensor nodes. Minimizing data sent between nodes may therefore yield a huge return in energy savings. This can be accomplished by compressing the data in the source microcontroller, and decompressing it again at the sink node. The microcontroller will have to spend power to compress and decompress the data, but the reduced data size will cause savings in energy for every hop it makes through the sensor network.

Depending on the transmission power of the microcontrollers, radio frequency, path through the mesh, and other factors, these energy savings can outweigh the cost of compression and decompression.

Depending on the transfer speed of the nodes, and distance between source and sink nodes, compression can also reduce the total time it takes for the message to reach the sink node. In time-critical sensor networks, such as fire detection sensor networks, this could help signal emergencies as quickly as possible.

There are many different compression algorithms to choose between, each suited for a different task. In this chapter, we will look at the characteristics of sensor data, analyze what kind of compression algorithms may produce the best results, and look at some compression algorithms that we believe are well suited for compressing sensor data on a microcontroller.

(26)

Section 3.1. Sensor Data networks, through data compression

3.1 Sensor Data

Sensor data is a general term used to describe measurements taken from the physical environment, using sensor devices. These can be chemical or physical measurements such as temperature or acceleration, user inputs such as knobs or buttons, and more.

There are many different sensors with unique characteristics. Even when measuring the same element the sensor data can differ from device to device, as there are many different sensors that perform similar functions, each with their own characteristics. There are however some characteristics that are shared between many different sensor measurements.

Data taken with a sensor will often be smooth, meaning that nearby measurements have very close values. This is caused by samples being taken close to each other in time and space, and thus tend to be similar.

For example, when measuring temperature over time, typically the temperature won’t change more than a few degrees between every measurement; Big sudden increases or decreases in temperature are very rare.

Additionally, sensor measurements tend to stay within a certain range, often oscillating around a central value. This causes repetition in values and patterns. For example, temperature data in one location over a few days will often increase and decrease in a sinusoidal fashion, with nearby measurements having similar values, and with repeating patterns. However, this pattern can be interfered with by a drastic weather change one day, or may drift away after a few weeks, to a different sinusoidal pattern with a different center and amplitude. Not all data measurements are smooth or have repeating values, however. User input sensors for example can be very erratic and non-smooth. Another example is when measuring the Global Positioning System (GPS) coordinates of a ship during a cruise trip. In that case the values will be smooth, but there might not be repeating values.

Another thing of importance is that sensors have varying range and precision, and thus the values acquired from them have to be stored in varied bit sizes, including sizes that don’t align with bytes. The data can have precision less than 8 bits, somewhere in between 8 and 16 bits, and so on. This can make the sensor values harder to work with together with compression, as many compression algorithms only perform well on data that is byte aligned, or data that is of a specific bit length.

For example, the popular temperature sensor DS18B20 reports temperatures with 9 to 12 bits of precision[13]. If we want to compress this data well, we may have to give up some precision by removing the least significant bit, or use 16 bits to represent the values.

Often it’s also possible to change the data to make it more compressible.

This will typically involve a transformation function that changes the data

(27)

Chapter 3. Encoding and compression Section 3.2. Reducing range or precision

from its original format into one that is easier to compress, with or without losing information. A transformation that loses information is called a lossy transformation, and one that doesn’t lose any information is called a lossless transformation.

3.2 Reducing range or precision

Sometimes, the full range of a sensor is not needed, and the data can be transformed to a format that uses fewer bits and is byte-aligned, to improve compression. For example, if we have 10-bit precision sensor measurements, but we don’t need full precision, we can perform a lossy transformation by removing the two least significant bits from each value.

This will reduce the size of the values to fit in a single byte each, at the cost of losing a bit of precision. It will typically also increase occurrence of specific values, and can further improve compression. However, this type of data transform is lossy, and should only be used if lower precision data is acceptable. Another option can be by adding a set value to each measurement, or subtracting a set value from each of them. For example, if all our measurement values range from 200 to 350, we can subtract 200 from each value in order to fit each value in a single byte.

3.3 Run-length encoding

Run-length encoding is a data encoding method sometimes used together with compression algorithms.

Run-length encoding is performed by encoding the data into a list of (value,count) tuples.

For example, the number sequence

1122221121123332112223322 can be represented as

(1, 2)(2, 4)(1, 2)(2, 1)(1, 2)(2, 1)(3, 3)(2, 1)(1, 2)(2, 3)(3, 2)(2, 2)

where each tuple shows what value comes next, and how many instances of that value there are in a row.

Run-length encoding works best on data that has a low range of values, and long strings of the same value.

Typically, run length encoding will be performed as part of a bigger compression algorithm. For example, if a compression algorithm greatly reduces the number of unique values, run-length encoding can be used afterwards to further compress the data. This can be especially useful together with lossy compression and encoding methods, which may output data with long strings of repeating values.

(28)

Section 3.4. Delta encoding networks, through data compression

3.4 Delta encoding

Delta encoding is a popular data encoding method used to transform data into a more compressible format. Instead of being represented by its own value, each value is represented by the change since last measurement.

Figure 3.1 shows an example of a list of numbers being delta encoded.

Before delta encoding 66 78 75 100 125 120 After delta encoding

66 12 -3 25 25 -5

Figure 3.1: Example of delta encoding applied to a simple array of values.

The first value remains the same after delta encoding. The rest show the difference between a value and the previous value.

We can delta encode a list of values

(a0,a₁,a2, ...,a_n−1,an) into a delta encoded list of the values

(d0,d1,d2, ...,dn−1,dn) by performing the operations

d0 =a0

dn =an−an−1

To transform the values from their difference encoded from back to absolute values, we can use

an =

∑

n i=0

d_i

An alternate way to express this is as follows:

a₀=d₀

an =an−1+dn

In figure 3.2 we can see the effects of applying delta encoding to sea surface temperature measurements that were sampled once a minute.

As we can see, the overall spread of values is much smaller in the delta encoded data. The original measurements range from −1.646 to

−0.473, whereas the delta encoded data ranges from −0.275 to 0.262. The dataset got reduced from385unique values to only146unique values, less than half as many unique values. The values are also much more

(29)

Chapter 3. Encoding and compression Section 3.4. Delta encoding

−0.50

−_0.75

−1.00

−1.25

−1.50

Temperaturein◦ C Original sensor data

0 100 200 300 400 500 600 700 800 900 1000

0.20 0.10 0.00

−0.10

−0.20

Changeintemperature◦ C

Delta encoded sensor data

Figure 3.2: Example showing the effects of delta encoding a sensor data set, of sea surface temperatures.

concentrated on one region. In the original dataset, the most common value is −_1.619 with 13 occurrences. In the delta encoded data, the most common value is 0, with 82 occurrences. By reducing the range of the values, and having values more concentrated on a smaller region, the data has been transformed to a format that is likely more compressible.

Delta encoding can however have drawbacks. Applying delta encoding to erratic and non-smooth data can result in data that is even less suited for compression. If nearby samples have values that are far from each other, delta encoding can further increase this distance, and make the data appear even more erratic. Fortunately, many sensor data measurements are smooth, and allow for improved compression with delta encoding.

Another problem with delta encoding is how all previous delta encoded values are needed, in order to decode one value. This means that we can’t decode a value in the middle of a delta encoded dataset, without first decoding every previous value. If we lose any values during a wireless transfer, all following values will also be lost. It may be wise to cut the input data into smaller chunks that are delta encoded separately.

(30)

Section 3.5. Huffman Encoding networks, through data compression

3.5 Huffman Encoding

Huffman encoding is a popular character-by-character encoding method, introduced by David A. Huffman in his 1952 article "A Method for the Construction of Minimum-Redundancy Codes" [14]. It works by

’translating’ each character to its optimal bit-representation. The idea is to use a minimal number of bits for each character. For example, if we only have 3 unique characters, there is no need to spend upwards of 8 bits on each character in our data string, as each set of 2 bits can represent 4 different characters. Huffman encoding goes a step further, encoding one of the characters, whichever is most used, as only 1 bit. This is to ensure that the encoded code is as small as possible, as far as encoding each character at a time, while also being unique for the input string. This pattern also scales well for greater amounts of characters. The more often a character occurs, the shorter the Huffman encoded code for that character will be.

The following is an example of a Huffman encoding. Imagine we have a dataset with only three different characters, ’A’, ’B’, and ’C’, where one of the characters, ’B’, occurs more often than each of the other two characters. We can set the bit-strings ’0’ to represent ’B’, ’10’ to represent

’A’, and ’11’ to represent C. Thus, the string ’AABCBBCB’ would be encoded as the binary string ’101001100110’. This code uniquely decodes to the string ’AABCBBCB’, given that we know what character each bit sequence translates to, and is in fact an optimal code for encoding the characters in the dataset separately.

(31)

Chapter 3. Encoding and compression Section 3.5. Huffman Encoding

These are the steps we need to perform for a Huffman encoding:

Create frequency

table

Create Huffman representation

table

Read character Write its

Huffman representation

Move to next

character Done with

the data?

Done no

yes

1 2

4 3

5 6

1. Iterate through the input data string to count the unique characters, and how often each character is used.

2. Use the data from step 1 to create a Huffman translation table from character to its bit- string representation.

3. Read the character that is in the current input data position.

4. Find the bit-string W that it translates to, and write that to our output string.

5. Move the input forward by one character, and the output by however many bits W took.

6. Go to step 3, continue until all data is encoded.

When working with sensor data, each character is typically one sensor measurement value, or part of a sensor measurement value.

(32)

Section 3.6. Arithmetic coding networks, through data compression

3.6 Arithmetic coding

Arithmetic coding is a lossless entropy encoding invented by Jorma J.

Rissanen, published in his 1976 article "Generalized Kraft Inequality and Arithmetic Coding". [15]

Arithmetic coding works by representing a series of characters as a single arbitrary-precision fraction in the range [0.0, 1.0). At any point in the message, the data so far is represented as a range between to fractions.

To perform arithmetic coding we first need to create a probability model for our coding. The probability model is a model describing the likelihood of each character, in our non-compressed data. It is represented through a series of connected intervals in the range [0.0, 1.0), with each interval corresponding to a character, where the size of the interval describes the likelihood of that character. For example, if we have an alphabet with only two characters, "a" and "b", with both characters having an equal probability of appearing in the string, we create a probability model with the interval [0.0, 0.5) representing "a", and the interval [0.5, 1.0) representing "b". This probability model can then be used to perform the arithmetic coding, by representing the string as a number. A number in the range [0.0, 0.5)represents the string "a", while a number in the range [0.5, 1.0) represents the string "b". We can then use the probability model to encode the next character in the string as well, by again dividing our previously mentioned ranges into further ones. A number in the range [0.0, 0.25) represents the string "aa", [0.25, 0.5) "ab", and so on, recursively. The end result is that we can encode any string using this alphabet as a single arbitrary-precision fraction in the range [0.0, 1.0).

If the characters have different probabilities of occurring in the string, we can give the characters intervals with varying size in the probability model. For example, if "a" is three times as likely to appear as "b", we can give it the range [_{0.0, 0.75}), and "b" the range[_{0.75, 1.0}). This allows more common sequences to be represented using fewer bits, thus ensuring that we can encode the string optimally. Figure 3.3 shows the string "aaba"

encoded using this probability model.

The more accurately the probability model describes the string, the more optimally we can encode it, resulting in fewer bytes total once compressed. An accurate probability model is required in order to compress effectively. Additionally, the decoder needs to have the same probability model as the coder, to ensure that the data is correctly decoded. This can be done in multiple ways.

One method is to define the probability model beforehand, and have it shared by the encoder and decoder. This method works well when the probability distribution of the data is well known and not expected to change. The downside is that it only optimally compresses messages that fit said probability model. It will not work well with data that doesn’t fit

(33)

Chapter 3. Encoding and compression Section 3.6. Arithmetic coding

a

b

0.00

0.75

1.00

aa

ab

0.00

0.5625

0.75

aaa

aab

0.00

0.421875

0.5625

aaba

aabb

0.421875

0.52734375

0.5625

Figure 3.3: Arithmetic coding of the string "aaba" into a single arbitrary precision floating point number. Here, the letter "a" is three times as likely as "b", and therefore takes three times as much space on the probability model. Using this probability model, the string "aaba" can be coded to the floating point number in the interval [0.421875, 0.52734375) that takes the least amount of bits to write as an arbitrary precision floating number.

That number, together with an integer that defines the length of the input string, defines a unique arithmetic code that can be decoded into the input string.

the model, or with data with frequently changing probability distributions.

Another method is to decide the probability model separately for each string. This can guarantee an optimal compression for all strings.

However, it does mean that the encoder has to perform an extra pass over the string to first create the probability model. Additionally, the model has to be sent together with the compressed data in the message, as it is needed by the decoder to decompress the string.

A final method is to adaptively change the probability model during encoding and decoding, to better match the string at any point. The decoded data matches the original string as long as the probability model is the same when decoding as when encoding, between each character.

This method is great for compressing data without knowing the probability distribution beforehand. The probability model is built during encoding and decoding, meaning that we don’t need to do an extra pass over the data first, and that we don’t need to send the probability model alongside the compressed data. Additionally, this method can sometimes compress even better than with an optimal probability model, as the probability model might better fit the string at any given character, rather than being optimal for the entire string as a whole.

This final method is well suited for sensor data. Sensor data tends

(34)

Section 3.6. Arithmetic coding networks, through data compression

to have repeating values during short intervals, and repeating patterns of values are not uncommon either. The adaptive probability model will change over time to best fit these repeating values and patterns, achieving favorable compression.

3.6.1 Prediction by partial matching

Prediction by partial matching (PPM) is a statistical data compression algorithm based on arithmetic coding. It was developed by John Cleary and Ian Witten, and published in their 1984 article "Data Compression Using Adaptive Coding and Partial String Matching" [16].

PPM is an adaptive arithmetic coder. It dynamically creates and maintains a probability model while encoding character by character.

When decoding, it dynamically recreates and maintains the same probability model, character by character. By using the contents of the string so far at any step to describe the probability model, it doesn’t require additional data to represent the probability model.

PPM uses a Markov Chain to calculate probabilities. At any point during the encoding and decoding process, series of characters that have occurred earlier in the string are used to define the probability model for the next character.

For example, during decoding, if the character sequence "abc" has occurred frequently in the string so far, and the decoder has just decoded

"ab", there is a high probability that the next character in the string is "c".

Therefore, that character is given a big range in the probability model.

The number of immediately preceding characters used to predict the next character is called the order of the PPM model. A PPM model of order 3 is written as PPM(3), and uses the character sequence made by three previous characters to determine the next character, by looking at which characters commonly occurred after this sequence earlier in the string. If the 3 character sequence hasn’t occurred yet, and thus no prediction can be made using the three character sequence, the decoder tries to make a prediction using only the last two characters, and so on.

PPM is especially good at encoding text, which often has repeating words, phrases and sentences, but can also work well with sensor data, with repeating sequences of data.

(35)

Chapter 3. Encoding and compression Section 3.7. Lempel–Ziv 77

3.7 Lempel–Ziv 77

Lempel–Ziv 77 (LZ77) is a lossless compression algorithm developed by Abraham Lempel and Jacob Ziv in 1977, in an article titled "A universal algorithm for sequential data compression"[17]. Together with the similar LZ78, released by Lempel and Ziv one year later, LZ77 is a well known algorithm that serves as a basis for many modern compression algorithms.

In 2004, IEEE declared LZ77 and LZ78 key historical achievements in electronic engineering.

The LZ77 is a dictionary coder. Dictionary coders are compression algorithms where the compressed code is a list of indexes to strings in a dictionary. The LZ77 uses a sliding window as its dictionary. When compressing a substring of the input data, the algorithm looks at the previous characters inside the sliding window to see if there is a matching prefix. It then generates a code, (i,j,x), where i is the index of the prefix inside the window, j is the length of that prefix, and x is first character after the prefix. It writes this code to the output, and moves the window forward by the length of the prefix plus 1 for the extra character x. If no prefix is found within the previous characters in the sliding window, we write a code with prefix length 0, so the code only describes the next character, x. After the compression, the output is a list of LZ77 codes.

For working with LZ77 using the original rules proposed by Lempel and Ziv, we need define two values. First we need to define the window size, N, which tells us the size of our sliding window. Secondly we need to define a max substring size, Ls, which tells us the max length of a substring that can get compressed into one code. These two values need to be the same when encoding and decoding a message. It is best to either decide these values beforehand, or decide these values seperately for each compression, at the start of the compression. If the values are decided separately for each compression, they need to be sent to the decompresser as part of the compressed data’s message. By having a larger window sizeN, we increase the chance of finding a matching prefix in the window, but we need more bits for the i values in the output codes to describe the larger range of possible indexes in the window. By having larger max substring size L_S, we can store larger prefixes in a single output code, but we need more bits for the jvalues in the output codes to describe the larger range of possible substring lengths in the window.

The window acts as a First In First Out (FIFO) buffer, with a head and a tail end. Whenever a character is added to the window, it is added to the head of the window, and a character is removed from the tail of the window.

Figure 3.4 shows an example of Lempel–Ziv 77 applied to an input string "aabbbcbbbca", resulting in an list of LZ77 codes once compressed.

Energy Efﬁciency in Wireless Sensor Networks, Through Data Compression

Energy Efficiency in Wireless Sensor Networks, Through Data Compression

Srod Karim

Master’s Thesis, Autumn 2017

Abstract

Acknowledgments

Preface

Overview

Contents

Introduction

1.1 Motivation

1.2 Energy efficient hardware

1.2.1 Energy efficient processors

1.2.2 Energy efficient microcontrollers

1.3 Energy efficient software

1.3.1 Clockspeeds and dynamic speed scaling

1.3.2 Parallelization and vectorization

1.3.3 Wireless sensor networks and compression

Energy efficient wireless data transfer

2.1 Wireless sensor networks

2.2 Energy usage in radio transmissions

2.2.1 Energy calculations

2.2.2 Transmission energy usage

2.2.3 Network packets

2.2.4 Receiver timing

2.3 Wireless transfer protocols

2.3.1 Wi-Fi

2.3.2 Bluetooth

2.3.3 ZigBee

2.3.4 Proprietary Protocols

Encoding and compression for energy efficient transfer of sensor data

3.1 Sensor Data

3.2 Reducing range or precision

3.3 Run-length encoding

3.4 Delta encoding

∑

3.5 Huffman Encoding

3.6 Arithmetic coding

3.6.1 Prediction by partial matching

3.7 Lempel–Ziv 77