
3.6 Check if probability event should occur

In the initial implementation of the probability check, it was observed that the probability event occurred drastically less frequently than anticipated.

When executing the probability check upon packet acknowledgement, it was expected that the check would be executed once for each packet acknowledged.

Due to TCP's cumulative acknowledgements, the probability check was executed only once per acknowledgement, regardless of whether the acknowledgement indirectly acknowledged multiple packets.

The probability check therefore had to be modified to support cumulative acknowledgements. The challenge was handled by increasing the probability value in proportion to the number of packets being acknowledged. Code excerpt 3.7 shows how this was done in the case of ignoring ECN signals.

    u64 prevent_ecn_overflow = (u64) tp->ecn_no_backoff_probability * packets_in_ack;

    u32 prevent_ecn_probability;
    if (prevent_ecn_overflow > 0xFFFFFFFFULL) {
        prevent_ecn_probability = 0xFFFFFFFF;
    }
    else {
        prevent_ecn_probability = (u32) prevent_ecn_overflow;
    }

    u32 random_number = get_random_int();
    if (random_number < prevent_ecn_probability) {
        // ECN prevented
        tcp_ecn_queue_cwr(tp);

Code Excerpt 3.7: Modified probability check in order to compensate for cumulative acknowledgements
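The scaling step can be isolated and tried outside the kernel. The following is a minimal user-space sketch, assuming the probability is stored as a fraction of the largest 32-bit unsigned value; the function name scale_probability is our own and is not part of the DA-LBE patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the DA-LBE patch): scale a per-packet
 * probability, expressed as a fraction of UINT32_MAX, by the number of
 * packets covered by one cumulative acknowledgement. The product is computed
 * in 64 bits and saturates at UINT32_MAX, mirroring code excerpt 3.7.
 */
uint32_t scale_probability(uint32_t per_packet, uint32_t packets_in_ack)
{
    uint64_t scaled = (uint64_t)per_packet * packets_in_ack;

    return scaled > UINT32_MAX ? UINT32_MAX : (uint32_t)scaled;
}
```

Note that multiplying by the number of packets is a linear approximation of running the check once per packet: for a per-packet probability p and n packets, the exact probability of at least one event is 1 - (1 - p)^n, which is close to n * p only when p is small.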

3.5 Rate control mechanisms

The DA-LBE framework introduces control structures for adjusting the sending rate of a stream for both loss-based and delay-based protocols (including those based on explicit signalling).

    static inline bool
    da_lbe_mode_enabled(struct tcp_sock *tp)
    {
        return tp->da_lbe_mode;
    }

Code Excerpt 3.8: Checking if DA-LBE mode is enabled

3.5.1 Phantom ECN

The DA-LBE framework implements the concept of phantom ECN to reduce the transmission rate. By simulating the reception of an ECN signal, the transmission rate is reduced (through reduction of the cwnd) without the need to drop packets. The simulated ECN signal does not generate a CWR signal to the receiver, so as not to create confusion. The rate at which phantom ECNs occur is controlled by a structure variable representing the probability of ECN signal occurrence (in percent).

Phantom ECNs are a less effective congestion indication than the increased RTT measured by delay-based protocols, which tend to be more reactive to changes in network congestion. In order to adapt dynamically to network changes, a check on the average time between congestion indications is therefore also performed whenever a phantom ECN probability is configured. If the interval between congestion indications is below a certain limit (controlled from user-space), the phantom ECN occurs as anticipated. If it is above the limit, this is interpreted as the congestion in the network having been reduced. The need for generating phantom ECN events is then gone, and the transmission rate can be increased (as there is now more unused bandwidth than when the phantom ECN probability was configured from user-space). The limit on the time interval between indications is calculated using an Exponentially Weighted Moving Average (EWMA).

When calculating whether or not to fire a phantom ECN, the DA-LBE framework must make sure that the signal is not already a real ECN. A real ECN should be responded to with a Congestion Window Reduced (CWR), so that the sender of the ECN signal is informed about the reduction of the congestion window as a result of the ECN signal.

When implementing support for phantom ECN signals, it was found to be less effective than expected: the phantom ECN probability had to be configured with a high value in order to get the desired number of ECN signals. It was observed that the phantom ECN probability calculation, which decides whether or not a phantom ECN should be generated, ran only a fraction of the time compared to the number of acknowledged packets. The calculation ran once per acknowledgement (as expected), but due to TCP cumulative acknowledgement, multiple packets were being acknowledged at once and yet the calculation ran only once. This problem was solved by multiplying the probability by the number of packets being acknowledged, which gave the expected result.

    u32 phantom_ecn_probability = tp->phantom_ecn_probability * packets_in_ack;

Code Excerpt 3.9: DA-LBE-INFO in tcp.h

Phantom ECN signals are generated and handled inside the DA-LBE system. Once a probability for a phantom ECN event has been set, the event will eventually trigger and generate a phantom ECN signal. When generated, a phantom ECN signal causes the TCP algorithm to treat it as it would a regular ECN signal, but prevents the CWR from being sent to the other node.

Receiving a CWR indirectly indicates the presence of congestion. By generating phantom ECN signals, the transmission rate is reduced, which results in a lower relative share of the available capacity.

3.5.1.1 Congestion abatement

The average interval between network congestion indications is calculated using an Exponentially Weighted Moving Average (EWMA). By letting the user-space application define the weight of the most recently added intervals, it can control how fast DA-LBE reacts to network changes (increased unutilized bandwidth). Since the DA-LBE framework initially does not have any record of network congestion indications, the calculation of the congestion interval must be skipped on the first registered indication. All subsequent congestion indications trigger a calculation of the time elapsed since the previous indication, thus adjusting the average congestion indication interval. The implementation of congestion abatement using an average congestion interval is shown in code excerpts 3.10 and 3.11.

    static bool tcp_ecn_rcv_ecn_echo(struct tcp_sock *tp, const struct tcphdr *th, u32 packets_in_ack)
    {
        ...
        if (da_lbe_mode_enabled(tp) && !th->ece && !th->syn && (tp->ecn_flags & TCP_ECN_OK)) {
            u64 phantom_ecn_probability_overflow = (u64) tp->phantom_ecn_probability * packets_in_ack;

            u32 phantom_ecn_probability;
            if (phantom_ecn_probability_overflow > 0xFFFFFFFFULL) {
                phantom_ecn_probability = 0xFFFFFFFF;
            }
            else {
                phantom_ecn_probability = (u32) phantom_ecn_probability_overflow;
            }

            if (tp->last_congestion_indication_time.tv_sec != 0 &&
                tp->last_congestion_indication_time.tv_nsec != 0 &&
                tp->avg_congestion_interval &&
                tp->ecn_congestion_delay) {
                struct timespec now;
                struct timespec time_since_last_congestion_indication;

                now = current_kernel_time();
                time_since_last_congestion_indication = timespec_sub(now, tp->last_congestion_indication_time);

                s64 congestion_level_limit = tp->avg_congestion_interval * tp->ecn_congestion_delay;
                s64 millisec_since_last_congestion_indication = timespec_to_ms(&time_since_last_congestion_indication);

Code Excerpt 3.10: tcp_ecn_rcv_ecn_echo() located in net/ipv4/tcp_input.c
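The excerpt ends before the two computed values are compared. Based only on the textual description of the check, the decision could look like the sketch below. This is an assumption on our part, not the code from the patch: the function name, the parameter names and the handling of the case without history are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the congestion abatement decision. The limit is the
 * average congestion interval multiplied by a user-supplied factor, as in
 * code excerpt 3.10. Phantom ECNs are considered still needed as long as
 * the time since the last congestion indication does not exceed that limit.
 */
bool phantom_ecn_still_needed(int64_t ms_since_last_indication,
                              int64_t avg_interval_ms,
                              int64_t delay_factor)
{
    /* Assumption: without history or a configured factor, skip the check. */
    if (avg_interval_ms <= 0 || delay_factor <= 0)
        return true;

    return ms_since_last_indication <= avg_interval_ms * delay_factor;
}
```

With an average interval of 100 ms and a factor of 2, for instance, a gap of 150 ms would keep phantom ECNs active, whereas a gap of 250 ms would be read as reduced congestion.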

    static void tcp_fastretrans_alert(struct sock *sk, const int acked,
                                      bool is_dupack, int *ack_flag, int *rexmit)
    {
        ...
        if (da_lbe_mode_enabled(tp)) {
            tp->congestion_event_count += 1;

            // check if last_congestion_indication_time has been set
            if (tp->last_congestion_indication_time.tv_sec != 0 &&
                tp->last_congestion_indication_time.tv_nsec != 0) {
                struct timespec now;
                struct timespec interval;
                now = current_kernel_time();
                interval = timespec_sub(now, tp->last_congestion_indication_time);

                // use default ewma weight/level if not set (zero)
                if (!tp->ewma_weight)
                    tp->ewma_weight = DALBE_EWMA_LEVEL;

                if (tp->avg_congestion_interval) {
                    // add interval to the average using ewma
                    tp->avg_congestion_interval = dalbe_ewma(
                        tp->avg_congestion_interval,
                        timespec_to_ms(&interval),
                        tp->ewma_weight
                    );
                } else {
                    // first time adding interval, do not ewma
                    tp->avg_congestion_interval = timespec_to_ms(&interval);
                }
            }

            // store the time of this congestion event
            tp->last_congestion_indication_time = current_kernel_time();
        }
        ...
    }

Code Excerpt 3.11: tcp_fastretrans_alert() located in net/ipv4/tcp_input.c
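The body of dalbe_ewma() is not part of the excerpt. To illustrate the kind of update it performs, the sketch below shows a common integer EWMA formulation. It is a stand-in under our own assumptions: the name ewma_update is hypothetical, and we assume the weight is the percentage (1 to 100) given to the newest sample, which may differ from the encoding used by DALBE_EWMA_LEVEL.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for dalbe_ewma(): blend the previous average with a
 * new sample using integer arithmetic only (no floating point, as is usual
 * in kernel code). weight_percent is the share given to the new sample.
 */
int64_t ewma_update(int64_t avg, int64_t sample, uint32_t weight_percent)
{
    int64_t w = (int64_t)weight_percent;

    return (avg * (100 - w) + sample * w) / 100;
}
```

A larger weight makes the average follow recent intervals more closely, which corresponds to DA-LBE reacting faster to newly available bandwidth. A smaller weight smooths out short-lived changes.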

3.5.2 Ignoring ECN signals

Being able to ignore ECN signals is a way of deflating the congestion price, allowing for an increased transmission rate. Explicitly ignoring ECN signals (real, not phantom) prevents the congestion window from being reduced, and transmission continues as if nothing had happened. In order to properly ignore ECN signals, the DA-LBE framework sends a fake confirmation stating (falsely) that the congestion window was reduced. This is achieved by sending the Congestion Window Reduced (CWR) flag to the receiver. If the CWR is not sent, the receiver interprets this as the ECN signals not having been received, and it continues to generate new ECN signals until it receives confirmation (through the CWR) that the ECN signal was received. The chance of dismissing a real ECN signal should be user defined, as a percentage between 0 and 100 giving the probability that the ECN signal is dismissed. Setting the probability to 100 effectively disables ECN signals. The implementation of ignoring ECN signals is shown in code excerpt 3.12.

    static bool tcp_ecn_rcv_ecn_echo(struct tcp_sock *tp, const struct tcphdr *th, u32 packets_in_ack)
    {
        // DA-LBE
        if (th->ece && !th->syn && (tp->ecn_flags & TCP_ECN_OK)) {
            ...
            if (da_lbe_mode_enabled(tp)) {
                ...
                u32 random_number = get_random_int();
                if (random_number < prevent_ecn_probability) {
                    // ECN prevented
                    tcp_ecn_queue_cwr(tp);

Code Excerpt 3.12: tcp_ecn_rcv_ecn_echo() located in net/ipv4/tcp_input.c

3.5.3 Ignoring loss

Being able to avoid reducing the congestion window upon a congestion event lets a delay-based CC compete more fairly against best-effort CCs. When a congestion signal is being processed at the point where the congestion window would normally be reduced to adapt to the congested network, the reduction is skipped instead. The window can then continue to increase, which increases the competition between the CCs in use on the network. The probability of not backing off due to a congestion event is set by a percentage value between 0 and 100. Increasing the probability makes the flow more aggressive, disregarding congestion indications in the network. The implementation of ignoring loss is shown in code excerpt 3.13.

    void tcp_enter_recovery(struct sock *sk, bool ece_ack)
    {
        struct tcp_sock *tp = tcp_sk(sk);
        int mib_idx;

        if (tcp_is_reno(tp))
            mib_idx = LINUX_MIB_TCPRENORECOVERY;
        else
            mib_idx = LINUX_MIB_TCPSACKRECOVERY;

        NET_INC_STATS(sock_net(sk), mib_idx);

        tp->prior_ssthresh = 0;
        tcp_init_undo(tp);

        if (!tcp_in_cwnd_reduction(sk)) {
            if (!ece_ack)
                tp->prior_ssthresh = tcp_current_ssthresh(sk);

            // DA-LBE
            if (da_lbe_mode_enabled(tp)) {
                u32 random_number = get_random_int();
                if (random_number < tp->cwnd_no_backoff_probability) {
                    tp->undo_marker = 0; // avoided CWND change
                }
                else {
                    tcp_init_cwnd_reduction(sk); // perform CWND reduction
                }
            }
            else {
                tcp_init_cwnd_reduction(sk); // default behaviour
            }
        }
        tcp_set_ca_state(sk, TCP_CA_Recovery);
    }

Code Excerpt 3.13: tcp_enter_recovery() located in net/ipv4/tcp_input.c (see also code excerpt A.17)

3.5.4 Phantom delay

We can artificially adjust the perceived RTT for delay-based CC algorithms, such as Vegas. Changing the RTT affects the transmission rate of a delay-based CC. The queueing delay is adjusted according to a percentage specified by the user-space application. The phantom delay implementation is shown in code excerpt 3.14.

    static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
                                   u32 prior_snd_una, int *acked,
                                   struct tcp_sacktag_state *sack)
    {
        ...
        if (icsk->icsk_ca_ops->pkts_acked) {
            u32 adjusted_rtt = (u32) ca_rtt_us;

            // DA-LBE
            if (da_lbe_mode_enabled(tp) && tp->delay_based_mode) {
                u64 congestion_price_calculation;

                if (tp->base_rtt_based) {
                    union tcp_cc_info info;
                    size_t sz = 0;
                    int attr;
                    u32 base_rtt;
                    long rtt_difference;

                    if (icsk->icsk_ca_ops->get_info) {
                        sz = icsk->icsk_ca_ops->get_info(sk, ~0U, &attr, &info);

                        if (info.vegas.tcpv_rtt &&
                            info.vegas.tcpv_rtt < 0x7fffffff &&
                            info.vegas.tcpv_rttcnt > 2) {

                            base_rtt = info.vegas.tcpv_rtt;

                            // both values are u32: compare directly, since an
                            // unsigned difference can never be negative
                            if (adjusted_rtt > base_rtt) {
                                // scale only the queueing delay above the base RTT
                                congestion_price_calculation = adjusted_rtt - base_rtt;
                                congestion_price_calculation *= tp->congestion_price_adjustment;
                                congestion_price_calculation = (u64) congestion_price_calculation / USHRT_MAX;

                                adjusted_rtt = base_rtt + (u32) congestion_price_calculation;
                            }
                        }
                    }
                }
                else {
                    // no base RTT available: scale the whole measured RTT
                    congestion_price_calculation = adjusted_rtt;
                    congestion_price_calculation *= tp->congestion_price_adjustment;

                    if (congestion_price_calculation > 0)
                        congestion_price_calculation = (u64) congestion_price_calculation / USHRT_MAX;

                    adjusted_rtt = (u32) congestion_price_calculation;
                }
            }

            struct ack_sample sample = {
                .pkts_acked = pkts_acked,
                .rtt_us = adjusted_rtt,
                .in_flight = last_in_flight
            };

            icsk->icsk_ca_ops->pkts_acked(sk, &sample);
        }
        ...
    }

Code Excerpt 3.14: tcp_clean_rtx_queue() located in net/ipv4/tcp_input.c
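The arithmetic of the base-RTT branch can be checked in isolation. The following user-space sketch follows the scaling in code excerpt 3.14, where an adjustment equal to USHRT_MAX leaves the queueing delay unchanged. The function name scale_queueing_delay and the final clamping step are our own additions and not part of the patch.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Sketch of the phantom delay scaling: only the queueing delay, i.e. the part
 * of the measured RTT above the base RTT, is multiplied by
 * price_adjustment / USHRT_MAX. Samples at or below the base RTT are
 * returned unchanged.
 */
uint32_t scale_queueing_delay(uint32_t rtt_us, uint32_t base_rtt_us,
                              uint32_t price_adjustment)
{
    uint64_t scaled;
    uint64_t adjusted;

    if (rtt_us <= base_rtt_us)
        return rtt_us;

    scaled = (uint64_t)(rtt_us - base_rtt_us) * price_adjustment / USHRT_MAX;
    adjusted = (uint64_t)base_rtt_us + scaled;

    /* Assumption: clamp instead of letting the 32-bit result wrap around. */
    return adjusted > UINT32_MAX ? UINT32_MAX : (uint32_t)adjusted;
}
```

As a worked example, a measured RTT of 30000 us with a base RTT of 20000 us has a queueing delay of 10000 us. An adjustment of 2 * USHRT_MAX doubles it, giving a perceived RTT of 40000 us, so a delay-based CC such as Vegas would see more congestion and reduce its rate.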

3.6 Using the DA-LBE kernel framework from user-space

DA-LBE provides socket options for setting the control variables through setsockopt, and provides a statistical info structure (relevant to the DA-LBE framework) through a socket option for getsockopt. Figure 3.4 presents an overview of how to communicate with the DA-LBE Linux Kernel framework.

Figure 3.4: Communicating with the Linux kernel through socket options

3.6.1 Socket options

Socket option                  Purpose

DA_LBE_INFO                    Send data to Destination with DA-LBE enabled

DA_LBE_INFO_ECN                Send data to Destination with DA-LBE enabled

DA_LBE_MODE                    The argument specifies whether or not DA-LBE should be enabled for the current socket (stream). The argument is either 1 (enabled) or 0 (disabled).

DA_LBE_ECN_BACKOFF             The argument given specifies the probability of not backing off (ignoring) on a real ECN signal. The percentage must be multiplied with the maximum value of an unsigned int.

DA_LBE_CWND_BACKOFF            The argument given specifies the probability of not backing off on a congestion event. The percentage must be multiplied with the maximum value of an unsigned int.

DA_LBE_CONGESTION_PRICE        The argument given specifies the inflation/deflation of the congestion price. The percentage must be multiplied with the maximum value of an unsigned int.

DA_LBE_BASE_RTT_BASED          The argument specifies whether or not the CC in use is BASE_RTT based. The argument is either 1 (enabled) or 0 (disabled).

DA_LBE_DELAY_BASED_MODE        The argument specifies whether or not the CC in use is delay based. The argument is either 1 (enabled) or 0 (disabled).

DA_LBE_ECN_CONGESTION_DELAY    Receive data from DA-LBE and Background traffic machines

DA_LBE_EWMA_WEIGHT             Receive data from DA-LBE and Background traffic machines

Table 3.6: Socket options using setsockopt
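Several options in table 3.6 expect a percentage that has been multiplied with the maximum value of an unsigned int. A small conversion helper makes this less error-prone in user-space code. The sketch below is our own illustration, and the name percent_to_probability is hypothetical.

```c
#include <limits.h>

/*
 * Hypothetical helper: convert a percentage in the range 0..100 into the
 * fixed-point form described in table 3.6, i.e. a fraction of UINT_MAX.
 * Values outside the range (and NaN) are clamped rather than rejected.
 */
unsigned int percent_to_probability(double percent)
{
    if (!(percent > 0.0))
        return 0;
    if (percent >= 100.0)
        return UINT_MAX;

    return (unsigned int)(percent / 100.0 * (double)UINT_MAX);
}
```

The returned value can then be passed as optval to setsockopt() for options such as DA_LBE_ECN_BACKOFF or DA_LBE_CWND_BACKOFF.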

3.6.1.1 Invoking socket options

Code excerpts 3.16 and 3.17 show how the socket options can be invoked using C. Two function calls are used to communicate with the DA-LBE framework: getsockopt() [5] and setsockopt() [5]. Code excerpt 3.15 provides the function declarations for the two.

    int getsockopt(int sockfd, int level, int optname,
                   void *optval, socklen_t *optlen);

    int setsockopt(int sockfd, int level, int optname,
                   const void *optval, socklen_t optlen);

Code Excerpt 3.15: getsockopt() and setsockopt() function declarations

getsockopt() and setsockopt() are used to manipulate options for a socket referred to by a file descriptor. Manipulating a socket option requires that the level at which the option is declared is provided. For the DA-LBE framework, the socket options are declared at the IPPROTO_TCP level.

In setsockopt(), the arguments optval and optlen specify the socket option value. In getsockopt(), optval and optlen specify where the socket option data is to be returned. If there is no socket option value to be provided or returned, optval can be set to NULL.

    if (getsockopt(tcp_socket, IPPROTO_TCP, DA_LBE_INFO,
                   (void *) &da_lbe_info,
                   (socklen_t *) &da_lbe_info_length) != 0) {
        printf("Can not get statistics from getsockopt: DA_LBE_INFO\n");
    }

Code Excerpt 3.16: Using getsockopt() to fetch DA-LBE statistics

    if (setsockopt(tcp_socket, IPPROTO_TCP, DA_LBE_INFO_ECN,
                   (char *) &opt_ecn_probability,
                   sizeof(opt_ecn_probability)) < 0) {
        printf("Can't set data with setsockopt: DA_LBE_INFO_ECN\n");
    }

Code Excerpt 3.17: Using setsockopt() to configure DA-LBE mechanisms
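Because the DA-LBE options only exist on a patched kernel, the same calling pattern can be tried on an unmodified system with a standard TCP option. The sketch below uses TCP_NODELAY purely as a stand-in. The function name set_and_read_nodelay is our own, and nothing in it is specific to DA-LBE.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Set a TCP-level option and read it back, using the same level
 * (IPPROTO_TCP) as the DA-LBE options. Returns 0 on success and -1 on error.
 */
int set_and_read_nodelay(int sockfd, int enable, int *current)
{
    /* optlen is a value-result argument and should be a real socklen_t,
     * not a narrower type cast to socklen_t *. */
    socklen_t len = sizeof(*current);

    if (setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY,
                   &enable, sizeof(enable)) != 0)
        return -1;

    if (getsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, current, &len) != 0)
        return -1;

    return 0;
}
```

On a DA-LBE kernel, the option name and value type would be replaced by the ones listed in table 3.6, while the structure of the calls stays the same.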

3.6.2 User-space example

Code excerpt 3.18 presents a simple C user-space application using the DA-LBE framework (see also code excerpt C.1).

    #include <stdio.h>
    #include <stdlib.h>
    #include <strings.h>     // for bzero
    #include <unistd.h>      // for close
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>   // for inet_pton
    #include <linux/tcp.h>   // for DA-LBE socket options
    #include <limits.h>      // for UINT_MAX

    #define PORT 12345
    #define LENGTH 1448
    #define DESTINATION_IP "192.168.1.100"
    #define INPUT_FILE "foobar.data"

    void error(const char *msg) {
        perror(msg);
        exit(1);
    }

    int main(int argc, char *argv[]) {
        /* Variable Definition */
        int sockfd;
        struct sockaddr_in remote_addr;

        /* Get the Socket file descriptor */
        if ((sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_IP)) == -1) {
            error("ERROR: Failed to obtain Socket Descriptor!");
        }

        /* Fill the socket address struct */
        remote_addr.sin_family = AF_INET;
        remote_addr.sin_port = htons(PORT);
        inet_pton(AF_INET, DESTINATION_IP, &remote_addr.sin_addr);
        bzero(&(remote_addr.sin_zero), 8);

        /* Try to connect the remote */
        if (connect(sockfd, (struct sockaddr *)&remote_addr, sizeof(struct sockaddr)) == -1) {
            error("ERROR: Failed to connect to the host!");
        }
        else {
            printf("Connected to the host at port %d\n", PORT);
        }

        /* Send File to recipient */
        char* fs_name = INPUT_FILE;
        char sdbuf[LENGTH];

        printf("Sending %s to %s\n", fs_name, DESTINATION_IP);

        FILE *fs = fopen(fs_name, "r");
        if (fs == NULL) {
            printf("ERROR: File %s not found.\n", fs_name);
            exit(1);
        }

        /* DA-LBE setsockopt */
        int opt_da_lbe_mode = TCP_DA_LBE_ENABLED;
        if (setsockopt(sockfd, IPPROTO_TCP, DA_LBE_MODE,
                       (char *)&opt_da_lbe_mode,
                       sizeof(opt_da_lbe_mode)) != 0) {
            printf("Can't set data with setsockopt: DA_LBE_MODE\n");
            exit(1);
        }

        unsigned int opt_ecn_probability = 0.05 * UINT_MAX; // 5% Phantom ECN probability
        if (setsockopt(sockfd, IPPROTO_TCP, DA_LBE_INFO_ECN,
                       (char *)&opt_ecn_probability,
                       sizeof(opt_ecn_probability)) != 0) {
            printf("Can't set data with setsockopt: DA_LBE_INFO_ECN\n");
            exit(1);
        }

        size_t fs_block_sz; // number of bytes read in each iteration
        while ((fs_block_sz = fread(sdbuf, sizeof(char), LENGTH, fs)) > 0) {
            if (send(sockfd, sdbuf, fs_block_sz, 0) < 0) {
                printf("ERROR: Failed to send file %s.\n", fs_name);
                break;
            }

            bzero(sdbuf, LENGTH);
        }

        printf("Transfer of %s done!\n", fs_name);

        /* DA-LBE getsockopt */
        struct da_lbe_info da_lbe_info;
        socklen_t da_lbe_info_length = sizeof(da_lbe_info); // getsockopt expects a socklen_t
        if (getsockopt(sockfd, IPPROTO_TCP, DA_LBE_INFO,
                       (void *)&da_lbe_info,
                       &da_lbe_info_length) != 0) {
            printf("ERROR: Failed to fetch DA_LBE_INFO\n");
        } else {
            printf("Number of phantom ECNs generated: %u\n", da_lbe_info.dalbe_phantom_ecn_count);
        }

        fclose(fs); // fs is a FILE *, so it is closed with fclose()
        close(sockfd);
        printf("Connection closed.\n");
        return (0);
    }

Code Excerpt 3.18: User-space example

Experimental setup

This chapter describes the experimental environment that has been configured to perform our measurements, including the traffic shaper, the background traffic, the host running the DA-LBE features and the tools we used.

This section explores the technical configuration of our experimental environment, where we have performed our experiments and measurements. The steps necessary to configure the experimental setup are also presented.

4.1 Hardware

The experimental setup includes four Linux computers running Debian, a router to route the traffic and Ethernet cables to connect them together. Each computer has a special role in our experimental setup. Figure 4.1 gives an overview of the setup, and table 4.1 describes the purpose of the different computers.

Figure 4.1: Hardware setup

Machine               Purpose

DA-LBE                Send data to Destination with DA-LBE enabled

Background traffic    Send background traffic to Destination

Traffic shaper        Rate limiting and delay

Destination           Receive data from DA-LBE and Background traffic computers

Table 4.1: Machine overview

In figure 4.1 we observe that the traffic shaping computer has two network interface controllers (NICs) in order to be able to shape the traffic between the router and the destination host. We could have solved this using only one NIC, since most NICs support full-duplex connections, but chose to limit the number of influencing factors in our experimental setup. The NICs of each computer can carry traffic at the nominal rate of 100 Mbit/s (100BASE-TX).

4.1.1 Configuring two network interfaces

In order for the traffic shaping computer to limit the transmission rate with a minimal number of influencing factors, we have equipped it with an additional NIC. The data arriving at one NIC is passed along to the second NIC before reaching the destination. The data transmitted keeps its integrity while traveling through the traffic shaper, but experiences a bottleneck (rate limiting) as well as added network delay in order to be more realistic. This configuration between the two NICs is commonly referred to as a network bridge. We are using the Linux network utility bridge-utils [bridge-utils] to bridge the two NICs together.

Both the rate limiting and the added delay are achieved with NetEm. A challenge when using NetEm is achieving rate limiting and network delay simultaneously on a single NIC, as this is difficult to configure properly. In our case, the two are configured on two different NICs. Consequently, any traffic traversing the traffic shaper first experiences rate limiting from our NetEm configuration on the first NIC, and thereafter network delay from our NetEm configuration on the second NIC.

Code excerpts 4.1, 4.2 and 4.3 demonstrate the installation and configuration necessary to set up the network bridge.

$ apt-get install bridge-utils

Code Excerpt 4.1: Installing bridge-utils

$ brctl addbr br0

$ brctl addif br0 eth0 eth1
