Characterization and Modeling of Slice-based Video Traffic
Thesis for the degree of Philosophiae Doctor Trondheim, May 2009
Norwegian University of Science and Technology Faculty of Information Technology, Mathematics and Electrical Engineering
Department of Telematics
© Astrid Undheim
ISBN 978-82-471-1564-0 (printed ver.) ISBN 978-82-471-1565-7 (electronic ver.) ISSN 1503-8181
Doctoral theses at NTNU, 2009:92 Printed by NTNU-trykk
The interest in watching real-time video content transmitted over packet-based communication networks such as the Internet is growing. When resources are restricted, quality of service support and provisioning of service guarantees are needed in the network to ensure a satisfactory user experience. In addition, the video content, different choices made in the video encoding, the bitrate characteristics of the resulting video stream, and the network performance will have an influence on the perceived quality. Regarding the network performance, the packet loss ratio and packet loss distribution are identified as important performance parameters for real-time video, and are of particular interest. Estimating these parameters is a step towards assessing the perceived quality of service for a video transmission.
This thesis addresses issues related to video transmission over the Internet. In particular, new methods to characterize and analyze video traffic from a network perspective are proposed in order to estimate some key network performance parameters. This requires models of the traffic and the network elements. Video encoded using a newly developed slice-based H.264/AVC scheme is studied. This scheme is intended to give less bursty video traffic, and is hence favorable for encoding video to be transmitted over a resource-constrained network.
Video clips encoded using this slice-based scheme are characterized using two different approaches: first using the correlation and distribution functions, and second using a token bucket traffic model. The characterization gives statistical information about the video traffic and is a prerequisite for developing traffic models. Both of these issues are important since the slice-based video encoding produces a new type of video traffic.
The frame sequence of a slice-based encoded video clip is divided into sections that are classified using non-parametric methods. This classification is useful and necessary since a video stream in general is non-homogeneous and non-stationary.
The classes can then be analyzed separately, and different models can be used for the classes. The distribution and dependence structures in the classes are studied.
Next, a new approach for estimating loss is proposed using the classification of the video frames, giving the average loss as well as information about the clustering of the losses for the different classes. It is shown that losses over high thresholds are independent or weakly dependent, and the upper bounds of losses can be estimated using high quantiles. These quantiles give statistical guarantees for the amount of loss.
Next, a Gaussian model is developed for the video traffic. This model is advantageous since it incorporates the correlation functions of real video traces.
Also, because of the additive properties of Gaussian processes, properties for an aggregated traffic stream can be deduced from the single streams. The packet loss for a video stream is defined using the exceedances of the video frame sizes over a threshold. Characteristics of a loss period, in terms of length and loss volume, are then found. These further give the loss ratio and loss distribution in a bufferless model as well as for a small buffer. Such results are important since the perceived quality depends on both the total amount of loss and the distribution of the losses.
For real-time applications, service guarantees are needed to ensure a satisfactory quality level for the users. These service guarantees are specified by network calculus server models. An approach to parameter estimation for important server models is proposed using external measurements on a network router. The obtained results are compared to the theoretical values, and the cause of the discrepancies is identified.
This thesis is submitted in partial fulfillment of the requirements for the degree of philosophiae doctor (PhD) at the Norwegian University of Science and Technology (NTNU).
The PhD study was conducted in the period August 2004 to January 2009. During the study period, I have been hosted and funded by the Centre for Quantifiable Quality of Service in Communication Systems, Centre of Excellence (Q2S). Q2S is funded by the Norwegian Research Council, NTNU and Uninett.
The PhD study was formally conducted at the Department of Telematics, NTNU.
In addition to the research work, it included mandatory courses corresponding to one semester of full-time studies, and one year of teaching assistance, funded by the Department of Telematics. Professor Peder J. Emstad has been the supervisor of this work.
During my four years at Q2S, there are several people I would like to thank.
First and foremost, Professor Peder J. Emstad for accepting me as a PhD-student and for being my supervisor through these four years. I have appreciated all discussions and feedback.
Thanks also to the great researchers whom I have been lucky to work with:
Natalia Markovich, co-author of two of my papers, Yuming Jiang for introducing me to the topic of network calculus, Yuan Lin for doing the encoding work for my video clips, and finally Arne Øslebø at Uninett for setting up the measurement equipment.
I thank everyone working at the Q2S Centre; these four years would not have been the same without you. In particular, I thank Tor Kjetil Moseng, who has been my roommate throughout our thesis work, and Erik Hellerud for helping me out with LaTeX-related questions. I also want to thank the administrative staff, Anniken, Hans, and Mette, for being very helpful with everything and for creating a pleasant working environment.
Finally, thanks to my dear family and friends, my parents Helga and Bernt and my sisters Karin and Brit. Thank you for all your support and encouragement.
I Thesis Introduction
1 Introduction
1.1 Motivation
1.2 Thesis Outline
1.3 Contributions
1.4 Publications
2 Background
2.1 QoS Provisioning in the Internet
2.2 QoS for Video Transmission over the Internet
2.3 Network Calculus
3 Slice-based H.264/AVC
3.1 Introduction
3.2 H.264/MPEG-4 Advanced Video Coding (AVC)
3.3 Slice-based H.264/AVC Video Encoding
3.4 Description of the Sample Traces
II Characterization of Slice-based H.264/AVC Encoded Video Traffic
4 Traditional Characterization
4.1 Introduction
4.2 Scene Change Detection
4.3 Marginal Distributions
4.4 Sample Correlations
4.5 Network Simulations
4.6 Results from Simulations
4.7 Conclusion
5 Token Bucket Characterization
5.1 Introduction
5.2 Token Bucket Parameter Estimation from Simulation
5.3 Results from Simulations
5.4 Analytical Estimation of the Token Bucket Parameters
5.5 Conclusion
III Non-parametric Analysis of Slice-based H.264/AVC Encoded Video Traffic
6 Classification of Slice-based Encoded Video Traffic
6.1 Introduction
6.2 Scene Change Detection
6.3 Test of the Dependence of Scenes
6.4 Estimation of the Mean Excess and Tail Index
6.5 Estimation of the Extremal Index
6.6 Conclusion
7 Estimation of Loss from Threshold Exceedance
7.1 Introduction
7.2 Estimation of Loss in the Bufferless Model
7.3 High Quantile Estimation of Losses
7.4 Conclusion
IV Characterization of Loss for Aggregated Video Using a Gaussian Model
8 A Gaussian Model for Aggregated Video Traffic
8.1 Introduction
8.2 The Multivariate Normal Distribution
8.3 Model for Aggregated Multimedia Traffic
8.4 Limit Distributions for Characteristics of Excursions
8.5 Conclusions
9 Loss Periods for Aggregated Video Traffic
9.1 Introduction
9.2 Characteristics of Loss
9.3 Numerical Computations and Results from Simulations
9.4 Comparison of Loss Periods and Excursions
9.5 Expected Length of a Loss Period and an Excursion Using Little
9.6 The Approximate Loss with a Small Buffer
9.7 Conclusion
V Router Models for Quality of Service Assessment
10 Router Modeling with External Measurements
10.1 Introduction
10.2 Network Calculus Approach to Router Modeling
10.5 Conclusion
VI Concluding Remarks
11 Conclusions
11.1 Future Work
Bibliography
ACF Autocorrelation Function
AF Assured Forwarding
AR Autoregressive
ARMA Autoregressive Moving Average
AVC Advanced Video Coding
BA Behavior Aggregate
BB Bandwidth Broker
CBR Constant Bit Rate
cdf cumulative distribution function
CF Correlation Function
D-BMAP Discrete Batch Markovian Arrival Process
DAR Discrete Autoregressive
DCT Discrete Cosine Transform
df distribution function
DiffServ Differentiated Services
DPCM Differential Pulse-Code Modulation
DRR Deficit Round Robin
DSCP DiffServ Code Point
DSL Digital Subscriber Line
EF Expedited Forwarding
EVI Extreme Value Index
FARIMA Fractional Autoregressive Integrated Moving Average
FBM Fractional Brownian Motion
FEC Forward Error Correction
FGN Fractional Gaussian Noise
FIFO First In-First Out
FMO Flexible Macroblock Ordering
FR Full Reference
GAR Gamma Autoregressive
GBAR Gamma Beta Autoregressive
GOP Group of Pictures
GoS Grade of Service
GR Guaranteed Rate
IETF Internet Engineering Task Force
iid independent and identically distributed
IntServ Integrated Services
IP Internet Protocol
ISDN Integrated Services Digital Network
ITU International Telecommunication Union
LR Latency Rate
LR-WSG Latency Rate Worst-case Service Guarantee
LRD Long-Range Dependent
MAC Medium Access Control
MMFF Markov Modulated Fluid Flow
MOS Mean Opinion Score
MPEG Moving Picture Experts Group
MSE Mean Squared Error
MTU Maximum Transmission Unit
MVNI Multivariate Normal Integral
NAL Network Abstraction Layer
NALU NAL Unit
ned negative exponential distribution
NR No-Reference
NRK Norwegian Broadcasting Corporation
P-MMBBP Periodic Markov Modulated Batch Bernoulli Process
PFT Packet Scale Rate Guarantee Virtual Finish Time
PHB Per Hop Behavior
PLR Packet Loss Ratio
PQoS Perceived Quality of Service
PSNR Peak Signal to Noise Ratio
PSRG Packet Scale Rate Guarantee
QoE Quality of Experience
RED Random Early Detection
RR Reduced Reference
RSpec Reservation Specification
RSVP Resource Reservation Protocol
RTCP Real Time Control Protocol
RTP Real-Time Transport Protocol
RTSP Real Time Streaming Protocol
SBBP Switched Batch Bernoulli Process
SLA Service Level Agreement
SLS Service Level Specification
SRD Short-Range Dependent
SSIM Structural Similarity Index Measurement
SSQ Single Server Queue
TCP Transmission Control Protocol
TES Transform Expand Sample
TOS Type of Service
TSpec Traffic Specification
UDP User Datagram Protocol
UMTS Universal Mobile Telecommunications System
VBR Variable Bit Rate
VCEG Video Coding Experts Group
VCL Video Coding Layer
VFT Virtual Finish Time
VoD Video-on-Demand
VoIP Voice over IP
VQEG Video Quality Experts Group
WFQ Weighted Fair Queueing
Thesis Introduction
Introduction
This thesis studies encoded video from a network perspective, first through general characterization of the video, and next using different types of models for both the video traffic and the network in order to analyze the Quality of Service (QoS). The focus is on some key network performance parameters accessible at the network egress.
Knowledge of these parameters is a step towards assessing the QoS perceived by a user watching the transmitted video. Video encoded using the recently developed slice-based video encoding scheme is studied. This scheme was developed in order to produce less bursty video traffic, and hence results in lower loss and delay for an encoded video stream transmitted through a network.
This chapter serves as an introduction to the thesis. Section 1.1 gives the motivation for the research conducted. The outline of the thesis is given in Section 1.2 and the main results and contributions are described in Section 1.3. The papers written as part of the thesis are listed in Section 1.4.
1.1 Motivation
Originally, the Internet was designed to provide best-effort data delivery [1].
With the introduction of Voice over IP (VoIP) and streaming video over IP, stricter requirements are imposed on the network since these real-time services have stringent constraints regarding the key network performance parameters:
throughput, delay, delay jitter, and packet loss. The amount of video traffic transmitted over the Internet has increased tremendously lately [2], partly due to the growing popularity of web-based video streaming services such as YouTube [3].
Initially, YouTube provided only low quality video, but was recently updated to allow for high quality video and audio. Video clips are now shown in a widescreen high-definition format, using the latest standard for video coding, H.264/Advanced Video Coding (AVC) [4]. The Norwegian Broadcasting Corporation (NRK) has also had great success in offering real-time streaming of news etc., and recently also in providing streaming from the Olympic games in Beijing. In addition, mobile providers have put attention on video streaming for mobile devices, especially focusing on big sporting events such as football championships and the Olympics
to attract users. This video is mostly real-time, which puts demands on the network in order to satisfy the quality requirements of the users. QoS support is then needed when the network resources are restricted.
QoS is defined by ITU-T in the recommendation E.800 [5] as follows: “The collective effect of service performance which determine the degree of satisfaction of a user of the service”. As this definition shows, the user perspective is important when evaluating QoS and this has led to the introduction of the terms Perceived QoS (PQoS) and Quality of Experience (QoE) [6]. These terms link the user perception and expectations for QoS to the quantitative performance parameters accessible at the network boundaries, the most important being: throughput, packet delay, delay jitter, and packet loss.
To support end-to-end QoS for real-time traffic over the Internet, two different QoS architectures, namely the Integrated Services (IntServ) [7] and the Differentiated Services (DiffServ) [8], have been proposed by the Internet Engineering Task Force (IETF). Both of these architectures have the objective of minimizing delay, delay jitter, and loss for real-time applications, using a reservation-based approach and a class-based approach, respectively. To analyze the service guarantees specified by these models, a set of traffic and server models are defined under the name of Network Calculus, see e.g., [9] and [10].
Most of the video transmitted over the Internet is Variable Bit Rate (VBR) encoded video, which is often preferred over Constant Bit Rate (CBR) encoded video because of the constant end-user quality and higher compression efficiency for the former. H.264/AVC [4] is the latest video coding standard, and provides a considerable improvement in compression efficiency compared to earlier standards.
A high compression gain is achieved by removing temporal and spatial redundancy.
The present frame is encoded using previous or consecutive frame(s) as reference, taking advantage of the temporal redundancy in consecutive frames belonging to the same scene, while spatial redundancy is removed by transform coding. In addition, intra coded frames are inserted periodically to prevent error propagation.
The size of the encoded frames is decided by the rate-distortion target for VBR coding, giving variable frame sizes. The resulting video stream is therefore bursty, which can cause network delay and hence packet loss due to late arriving packets.
With this in mind, the explicit slice-based video encoding scheme was developed, originally described in [11], using the H.264/AVC standard. This new scheme has no Group of Pictures (GOP) structure and the large intra coded frames are avoided.
Video encoded using the slice-based scheme is hence smoother than regular frame-based H.264/AVC encoded video, while retaining the constant end-user quality and error resilience of the latter.
This thesis addresses issues related to video transmission over the Internet, focusing on slice-based encoded video. Traffic characterization, modeling, and analysis are performed, with the objective of estimating the network performance parameters (in particular, the packet loss) when a traffic stream is transmitted through a given network. To be able to predict the QoS delivered to the users, models of the traffic and the network are needed. In particular, models for aggregated traffic are important, to reflect the fact that video traffic in the Internet is seen as aggregates of single video streams.
Figure 1.1: Main areas and focus in this thesis, concerning video transmission over the Internet. (Diagram blocks: Video, Encoder, Internet, Decoder, User, Traffic Characterization, Traffic Analysis, Traffic Model, Network Model, PQoS/QoE.)
The main areas and focus of this thesis, shown in Figure 1.1, are traffic characterization, traffic analysis, traffic models, network models, and PQoS/QoE assessment. The latter is only discussed in terms of network performance, i.e., performance parameter estimation targeted at giving information about the PQoS is pursued. The next section gives the outline of the thesis, showing how these areas are covered.
1.2 Thesis Outline
This thesis is organized as follows. Part I continues with Chapter 2 which provides a background for the most important topics addressed in the thesis. Chapter 3 gives an overview of the H.264/AVC standard and in particular the explicit slice-based video encoding scheme. The two encoded video clips that are studied in this thesis are also described.
Following this, the main body of research work is divided into four parts. These parts are organized as follows:
Part II: Characterization of Slice-based H.264/AVC Encoded Video Traffic. This part of the thesis mainly focuses on characterization of slice-based encoded video. Studying the statistical properties of the slice-based video is interesting because this new encoding scheme produces video traffic with different characteristics from regular frame-based video. Characterization is also an important prerequisite for developing traffic models. Part II is divided into two chapters. In Chapter 4, one video clip encoded using the slice-based video encoding scheme is characterized with respect to the distribution functions and the correlation functions of the scene lengths and frame sizes. In addition, simulations are performed to compare the performance of a slice-based encoded stream to that of a regular frame-based encoded stream. In Chapter 5, token bucket characterization of the slice-based encoded video is pursued. The token bucket traffic model is important in the Internet since it is used for resource reservation in IntServ and for analyzing service guarantees using network calculus both in IntServ and DiffServ.
Part III: Non-parametric Analysis of Slice-based H.264/AVC Encoded Video Traffic. This part of the thesis describes a non-parametric approach for classification and analysis of slice-based encoded video. Applying non-parametric methods presents a new approach to traffic analysis, where no information about the distribution functions is needed. This part of the thesis is divided into two chapters. In Chapter 6, sections of the slice-based encoded video stream are classified by the average frame size. A new method for scene change detection for the individual classes is presented, and the resulting scenes are checked for dependence. Also, characteristics of the classes of video data are studied, such as the mean excess function and the tail index, showing the distribution structure in the classes. In Chapter 7, the results from Chapter 6 are exploited for estimation of loss using exceedances of frame sizes over a high threshold. The high quantiles for the amount of loss are also found.
Part IV: Characterization of Loss for Aggregated Video Using a Gaussian Model. This part of the thesis is devoted to video traffic modeling based on the results from Part II, and estimation of loss using the resulting model. In Chapter 8, a Gaussian model is proposed for the slice-based encoded video. This model is advantageous since it incorporates the correlation functions of real video traces, while still being a simple, parsimonious model.
Also, because of the additive properties of Gaussian processes, properties for an aggregated traffic stream can be deduced from single streams. It is found that the exceedances of frame sizes over a threshold for an aggregated stream are related to the exceedances of frame sizes over a threshold for a single video stream. In Chapter 9, these exceedances are analyzed, constituting loss periods in a bufferless model. The moments of the length and loss volume of a loss period are found numerically using the correlation functions of the slice-based encoded video traces. In addition, relations between the distributions of the length and loss volume of a loss period are exploited.
Part V: Router Models for Quality of Service Assessment. This part of the thesis focuses on using measurements for router model parameterization.
In Chapter 10, network calculus server models are parameterized using external measurements on the input and output links of a router. The approach is to estimate the required parameters using measurement results from burst and backlog periods and to use known results to derive the results for Guaranteed Rate (GR) and Packet Scale Rate Guarantee (PSRG) server models. These models are used to analyze service guarantees in IntServ and DiffServ, respectively.
Part VI: Concluding Remarks. This part of the thesis contains a summary of the main results and conclusions of the thesis. Topics of future work are also identified and described.
1.3 Contributions
The main contributions of Part II are:
• Video traffic encoded using the slice-based H.264/AVC video encoding scheme is characterized, using distribution functions and correlation functions of the scenes and frame sizes. The results show that there is only negligible autocorrelation for the scene lengths, and for the size of the frames in different scenes.
However, there is non-negligible correlation between the scene change frame at the beginning of a scene and the average frame size in the scene. Also the frames inside a scene exhibit non-negligible correlation.
• The packet loss and delay through a bottleneck node for the slice-based encoded video are compared to those for regular frame-based encoded video, using network simulations. The results show that the slice-based encoding is advantageous compared to frame-based encoding when the buffer size is small.
• Lossless and loss bounded token bucket and leaky bucket traffic models are parameterized for different slice-based encoded video streams. The results show that the slice-based encoded video can tolerate fewer reserved resources than the frame-based encoded video while still fulfilling the same loss and delay requirements. For all streams, the token bucket parameters are significantly reduced by introducing a small data buffer for input traffic queueing.
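The lossless token bucket parameterization described above can be illustrated with a minimal sketch. It assumes that one frame arrives instantaneously every frame interval and that the bucket starts full; the function name `min_bucket_depth` and its parameters are hypothetical and are not taken from the thesis. For a given token rate, the sketch returns the smallest bucket depth for which every frame of a trace is conformant.

```python
def min_bucket_depth(frame_sizes, token_rate, frame_interval):
    """Smallest bucket depth b such that the trace conforms to (token_rate, b).

    frame_sizes    -- frame sizes in bytes, one frame per interval (assumed)
    token_rate     -- token rate in bytes per second
    frame_interval -- time between frames in seconds (e.g. 0.04 at 25 frames/s)
    """
    refill = token_rate * frame_interval  # tokens earned between two frames
    backlog = 0.0  # content of a virtual queue drained at token_rate,
                   # measured just after the latest frame arrival
    depth = 0.0
    for size in frame_sizes:
        # Drain for one interval, then add the newly arrived frame.
        backlog = max(0.0, backlog - refill) + size
        depth = max(depth, backlog)
    return depth
```

A higher token rate gives a smaller required depth, down to the largest single frame; a token rate of zero requires a depth equal to the total size of the trace. Introducing a data buffer in front of the bucket, as in the last contribution above, is not covered by this sketch.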
The main contributions of Part III are:
• Sections of a highly variable slice-based encoded video stream are classified according to the average frame size. From the resulting classes, a non-parametric method is proposed for scene change detection. The scenes are checked for dependence, both using regular dependence measures such as the Autocorrelation Function (ACF) and the Ljung-Box test, and using long-range dependence measures. From the ACFs and the Ljung-Box statistics only negligible correlation is found for scenes inside the classes, but the Hurst parameter estimates show signs of long-range dependence for the scenes in the considered classes.
• The distributions of the frame sizes in the classes are estimated by the mean excess function. The results show that all classes contain mixtures of heavy- and light-tailed distributed frame sizes. The number of finite moments for the frame size distributions is estimated using Hill's estimator.
• The expected loss for the classified video stream when transmitted over a bottleneck link is estimated. Two non-parametric functions, the mean excess function and the extremal index are used. In addition, the high quantiles for
the losses are estimated, showing the upper bound for the amount of loss that can occur with a given probability.
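The two tail statistics named in the contributions above, the mean excess function and the Hill estimate of the tail index, have standard textbook forms that can be sketched briefly. The function names and the choice of `k` below are illustrative assumptions; the thesis may use different variants or threshold selection rules.

```python
import math


def mean_excess(samples, threshold):
    """Empirical mean excess e(u): average of (x - u) over samples x > u."""
    excesses = [x - threshold for x in samples if x > threshold]
    if not excesses:
        raise ValueError("no sample exceeds the threshold")
    return sum(excesses) / len(excesses)


def hill_tail_index(samples, k):
    """Hill estimate of the tail index alpha from the k largest samples.

    Assumes positive samples; for a Pareto-type tail, moments of order
    below alpha are finite.
    """
    ordered = sorted(samples, reverse=True)
    if not 0 < k < len(ordered):
        raise ValueError("k must satisfy 0 < k < number of samples")
    if ordered[k] <= 0:
        raise ValueError("the (k+1)-th largest sample must be positive")
    # Average log-spacing of the k largest values above the (k+1)-th largest.
    gamma = sum(math.log(ordered[i] / ordered[k]) for i in range(k)) / k
    if gamma == 0:
        raise ValueError("the k largest samples are all tied")
    return 1.0 / gamma
```

A mean excess that grows roughly linearly with the threshold points towards a heavy tail, while a flat or decreasing one points towards a light tail; this is the kind of distinction the classification in Part III relies on.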
The main contributions of Part IV are:
• A discrete, multivariate Gaussian model is proposed for the slice-based encoded video, taking the correlation between consecutive frames into account. Relations between single and aggregated streams are found for the exceedances of the frame sizes over a threshold.
• The relations between single and aggregated streams are exploited for calculating the moments of the length of a loss period for the aggregated stream based on the moments of the length for individual streams. Numerical results are given for correlation functions from different video traces, showing higher first and second moments for the length of a loss period when the correlation between consecutive frames is increased.
• The moments of the loss volume of a loss period are estimated using a numerical approach similar to that used for the length of a loss period. The results are compared to the loss volume of a loss period for a continuous Gaussian process and show satisfactory agreement for high thresholds. The loss volume is used for estimating packet loss in a bottleneck node with a small buffer, in addition to giving the loss directly in the bufferless case.
• A relation between the distribution of the length and loss volume of a loss period for the continuous process is recognized. This relation is shown to be valid for the length and loss volume of a loss period for the discrete process as well. In addition, the first moment of the length of a loss period is shown to agree with the first moment found using Little’s formula.
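The notions of loss period, length, and loss volume used above can be made concrete with a small sketch for the bufferless case. It assumes that everything above a fixed per-interval capacity is lost; the names `loss_periods`, `loss_ratio`, and `capacity` are hypothetical. For an aggregated stream, `frame_sizes` would hold the per-interval sum over the individual streams.

```python
def loss_periods(frame_sizes, capacity):
    """Return (length, volume) for each run of consecutive frames above capacity.

    length -- number of consecutive frame intervals with an exceedance
    volume -- total amount by which the frames in the run exceed capacity
    """
    periods = []
    length, volume = 0, 0.0
    for size in frame_sizes:
        if size > capacity:
            length += 1
            volume += size - capacity
        elif length:
            # The run of exceedances has ended; record it and reset.
            periods.append((length, volume))
            length, volume = 0, 0.0
    if length:
        periods.append((length, volume))
    return periods


def loss_ratio(frame_sizes, capacity):
    """Fraction of the offered volume lost in the bufferless model."""
    lost = sum(volume for _, volume in loss_periods(frame_sizes, capacity))
    return lost / sum(frame_sizes)
```

The empirical moments of the returned lengths and volumes are the quantities that the Gaussian model in Part IV aims to compute numerically from the correlation functions rather than from a trace.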
The main contributions of Part V are:
• The parameters of network calculus server models, in particular GR and PSRG, are estimated using external measurements on a network router. The parameters are estimated directly from the measurement results. In addition, a new approach using burst and backlog period statistics and their relations to the GR and PSRG server models is developed. With the latter method, the evaluation of the delay for every packet is avoided.
• The measurement results are used for estimation of the router processing time.
The minimum processing time is shown to be equal to the difference between the value of the theoretical server model rate parameters and the value of the measured rate parameter. Furthermore, the values of the error parameters from the measurements are higher than the values of the theoretical error parameters due to processing times being higher than the minimum processing time.
• The results from the server modeling of a router can be used to give delay bounds for token bucket constrained traffic flows. An example is given for the token bucket characteristics of the slice-based encoded streams from Chapter 5.
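As an illustration of how a server model yields a delay bound for a token bucket constrained flow, the standard network calculus result for a latency-rate server is given below. The symbols are assumptions introduced here: token rate r, bucket depth b, server rate R, and latency T. The bounds derived in the thesis for the GR and PSRG models may take a different form.

```latex
\alpha(t) = b + r\,t, \qquad
\beta(t) = R\,(t - T)^{+}, \qquad
r \le R
\;\Longrightarrow\;
D_{\max} \le T + \frac{b}{R}
```

Here \alpha is the arrival curve of the flow, \beta is the service curve of the server, and the bound is the maximum horizontal distance between the two curves.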
1.4 Publications
All papers are written under the supervision of, and in cooperation with, Professor Peder J. Emstad. Details of the contributions from the other co-authors are given under each publication.
Papers Included in the Thesis
[A] Astrid Undheim, Yuan Lin, and Peder J. Emstad. “Characterization of Slice-based H.264/AVC Encoded Video Traffic.” In Proceedings of the Fourth European Conference on Universal Multiservice Networks (ECUMN), Toulouse, France, February 2007.
[B] Astrid Undheim, Yuming Jiang, and Peder J. Emstad. “Network Calculus Approach to Router Modeling Using External Measurements.” In Proceedings of the International Conference on Communications and Networking in China (ChinaCom), Shanghai, China, August 2007.
[C] Natalia Markovich, Astrid Undheim, and Peder J. Emstad. “Slice-based VBR Video Traffic-Estimation of Link Loss by Exceedance.” In Proceedings of the 4th International Telecommunication Networking Workshop on QoS in Multiservice IP Networks (QoS-IP), Venice, Italy, February 2008.
[D] Astrid Undheim and Peder J. Emstad. “Distribution of Loss Periods for Aggregated Video Traffic.” In Proceedings of the ITC Specialist Seminar 18 (ITCSS’18), Karlskrona, Sweden, May 2008.
[E] Astrid Undheim and Peder J. Emstad. “Characterization of Slice-based H.264/AVC Encoded Video Traffic Using Token Buckets.” Telecommunication Systems, Springer, 39(2), October 2008.
[F] Natalia Markovich, Astrid Undheim, and Peder J. Emstad. “Classification of Slice-based VBR Video Traffic and Estimation of Link Loss by Exceedance.”
Computer Networks, Elsevier, 53(7), May 2009.
[G] Astrid Undheim and Peder J. Emstad. “Distribution of Loss Volume and Estimation of Loss for Aggregated Video Traffic.” Submitted for publication, 2009.
[A]: Yuan Lin encoded the video clips used in this paper and the rest of the thesis and contributed to writing the description of the slice-based video encoding scheme in this paper.
[B]: Yuming Jiang proposed to use the network calculus server models for the router modeling, and gave guidance during the work and the writing of the paper. The author performed the measurements, did the analysis and wrote the paper.
[C, F]: Natalia Markovich proposed new statistical non-parametric methods for the analysis of slice-based encoded video traffic. The author contributed to developing the ideas, applying the non-parametric methods to the video data, performing the computational analysis in Mathematica, and writing the paper.
Other Papers by the Author
[H] Astrid Undheim, Yuming Jiang, and Peder J. Emstad. “Network Calculus Approach to Router Modeling Using External Measurements.” In Proceedings of the 3rd EuroNGI Workshop on New Trends in Modelling, Quantitative Methods and Measurements, Turin, Italy, June 2006.
- This paper is an early version of Paper [B].
[I] Jan Erik Voldhaug, Erik Hellerud, Astrid Undheim, Erling Austreim, U.
Peter Svensson, and Peder J. Emstad. “Effects of Network Architecture on Perceived Audio Quality.” In Proceedings of the 2nd ISCA Tutorial Research Workshop on Perceptual Quality of Systems, Berlin, Germany, September 2006.
- This paper is an early version of Paper [K].
[J] Astrid Undheim and Peder J. Emstad. “Characterization of Slice-based H.264/AVC Encoded Video Using Token Buckets.” In Proceedings of the Euro-FGI Workshop on New Trends in Modelling, Quantitative Methods and Measurements, Gent, Belgium, May-June 2007.
- This paper is an early version of Paper [E].
[K] Jan Erik Voldhaug, Erik Hellerud, Astrid Undheim, Erling Austreim, U.
Peter Svensson, and Peder J. Emstad. “Influence of Sender Parameters and Network Architecture on Perceived Audio Quality.” Acta Acustica United with Acustica, 94(1), 2008.
[I, K]: The author contributed to ideas in this paper, performed the network simulations together with Erling Austreim and contributed to the writing of the paper.
Background
This chapter gives a brief background on topics of particular relevance to this thesis. Section 2.1 gives an overview of QoS provisioning in the Internet, with special focus on QoS architectures for the Internet. QoS for video transmission over the Internet is discussed in Section 2.2. Section 2.3 summarizes the most important aspects of network calculus relevant for this thesis, including server models and traffic models.
2.1 QoS Provisioning in the Internet
This section gives an introduction to QoS provisioning in today’s Internet, starting with an overview of the topic and continuing with the IntServ and DiffServ architectures.
2.1.1 Introduction
“The Holy Grail of computer networking is to design a network that has the flexibility and low cost of the Internet, yet offers the end-to-end quality-of-service guarantees of the telephone network” [12]. This quotation from the late 1990s illustrates the ultimate goal of QoS provisioning in the Internet. Achieving this goal is difficult because of the differences between these two networks. The telephone network is connection-oriented and provides reserved resources as soon as the connection is set up. QoS in the telephone network is therefore defined as the call blocking probability as seen by the users and the term Grade of Service (GoS) is defined as the general quality, including user and service provider aspects [13].
The ITU-T recommendation E.800 [5] defines QoS for the telephone network and ISDN as follows:
Quality of Service (ITU-T):
“The collective effect of service performance which determine the degree of satisfaction of a user of the service.”
According to E.800, QoS depends on the service performance, which is divided into support, operability, serveability, and security. The service performance then again relies on the network performance characteristics such as transmission performance and availability.
In the more recent ITU-T Recommendation G.1000 [14], new definitions for QoS terms are given in order to have a set of consistent definitions. G.1000 gives four different viewpoints for the QoS. These are:
1. QoS requirements of user/customer
2. QoS offered/planned by provider
3. QoS delivered/achieved by provider
4. QoS perceived by user/customer
In addition to putting more attention on the user, these new definitions are better suited to the Internet, where the diversity of applications and services calls for new approaches to QoS compared to the telephone network. In this sense, the fourth viewpoint also resembles the term PQoS, which is discussed in more detail in Section 2.2.3.
The Internet, in contrast to the telephone network, was designed to offer connection-less, best-effort data delivery and had no focus on QoS initially [1]. It worked this way until the introduction of delay sensitive applications on top of IP.
QoS was then also defined by the IETF in RFC 2216 [15] as follows:
Quality of Service (IETF) refers to:
“the nature of the packet delivery service provided, as described by parameters such as achieved bandwidth, packet delay, and packet loss rates.”
This definition focuses only on the network performance parameters, and does not take the user aspect into account at all.
With the introduction of real-time services over IP and the focus on QoS, the need for service differentiation mechanisms in the Internet became clear, leading to extensive research in the area. This research mainly led to two different proposals for support of QoS in the Internet, both developed by the IETF. These are the Integrated Services (IntServ) architecture [7] that offers absolute QoS and the Differentiated Services (DiffServ) architecture [8] that provides relative QoS.
IntServ and DiffServ are described next.
2.1.2 IntServ
Observing that real-time applications did not perform well across the Internet due to variable queueing delays and congestion losses, the IETF proposed IntServ as the initial QoS architecture. IntServ was originally designed to be able to control the end-to-end packet delay and provide bandwidth sharing [7]. It is a per-flow based service differentiation scheme, focusing on resource reservation and admission control on a per-flow basis for providing service guarantees. As such, state information for each IntServ flow is needed in every router on the path from source to destination, which makes the scheme unscalable in a large network.
Figure 2.1: Resource reservation in IntServ using RSVP. (1. Path message (TSpec, RSpec) from sender to receiver; 2. Resv/Error message in the opposite direction.)
The Resource ReSerVation Protocol (RSVP) [16] is used as signaling protocol for resource reservation in IntServ, requesting resources along a path as shown in Figure 2.1. The reservation is receiver initiated, but the sender will signal the receiver to initiate the reservation. The reservation procedure is then as follows: 1) During the signaling phase, a path message with a Traffic Specification (TSpec) [17] and a Reservation Specification (RSpec) is sent from the sender to the receiver. The TSpec specifies the traffic characteristics of the flow with parameters such as token rate, bucket depth, peak rate, and maximum packet size while the RSpec defines the level of service required, such as delay guarantees and bandwidth requirements. 2) An admission control scheme is needed in the routers to determine whether a router should accept the flow or not. If accepted, the router records the traffic characteristics contained in the path message before forwarding it to the next router on the path. 3) The receiver responds to the path message by sending a reservation (Resv) message in the opposite direction along the same route as the path message. If the request is rejected, an error message is sent back to the sender. 4) If every router on the path accepts the resource request based on the TSpec and RSpec contained in the path message, bandwidth and buffer space are allocated and flow-specific state information is stored in the routers. Every router on the path must participate in the resource reservation process meaning that partial deployment of IntServ is not feasible.
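The token rate and bucket depth carried in the TSpec can be read as a token bucket profile that a router checks arriving packets against. The following is a minimal sketch of such a conformance check, not the procedure of any particular router; the class name `TokenBucket`, the method `conforms`, and all numeric values are hypothetical, and the peak rate and maximum packet size checks are left out for brevity.

```python
class TokenBucket:
    """Token bucket with token rate r (bytes/s) and bucket depth b (bytes)."""

    def __init__(self, rate, depth):
        self.rate = rate        # token rate r, in bytes per second
        self.depth = depth      # bucket depth b, in bytes
        self.tokens = depth     # assumption: the bucket starts full
        self.last_time = 0.0    # arrival time of the previous packet, in seconds

    def conforms(self, arrival_time, size):
        """Return True and consume tokens if the packet fits the profile."""
        elapsed = arrival_time - self.last_time
        # Tokens accumulate at rate r but never exceed the bucket depth.
        self.tokens = min(self.depth, self.tokens + self.rate * elapsed)
        self.last_time = arrival_time
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


bucket = TokenBucket(rate=1000.0, depth=1500.0)
# (arrival time in seconds, packet size in bytes); hypothetical values
packets = [(0.0, 1000), (0.1, 1000), (1.0, 1000), (1.2, 1500)]
results = [bucket.conforms(t, s) for t, s in packets]
print(results)
```

In this sketch a non-conforming packet leaves the token count unchanged; whether such a packet is dropped, delayed, or forwarded as best-effort traffic is a policy decision outside the sketch.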
IntServ introduces two service classes in addition to the best-effort service class. A deterministic Guaranteed Service [18], which gives an upper bound on the end-to-end delay and a stochastic Controlled Load Service [19], which provides a QoS to a flow approximately equal to the QoS that the flow would receive from an unloaded network element.
IntServ has not been a great success, mainly due to the per-flow resource reservation and the succeeding per-flow processing and per-flow state in the routers. These, together with the violation of the end-to-end design principle of the Internet (see e.g., [20] for a discussion on this issue), are some of the main reasons why IntServ has never been deployed. Different approaches have been proposed to solve the scalability problem. In [21], it is proposed to use IntServ
over DiffServ, where a DiffServ domain serves as a network element in IntServ and participates in the end-to-end resource reservation as a single network element.
Furthermore, in [22] it is recommended to enhance the RSVP to perform resource reservation on classes of aggregates, where an aggregate consists of a number of flows with shared ingress and egress routers through an aggregate network. In this case, per-flow resource reservation is needed only at the edge of the network and per-aggregate resource reservation is performed in the aggregate network.
2.1.3 DiffServ
Two main problems with IntServ motivated another approach to QoS provisioning in IP networks. First and foremost, with each router maintaining per-flow state, IntServ could not scale well in a large network. In addition, only two service classes were specified and the flexibility of differentiation between flows of the same service class was lacking. With this in mind, the IETF proposed the DiffServ architecture [8]. DiffServ is therefore not a per-flow based scheme but a per-aggregate-class based service differentiation scheme, where the traffic is divided into a small number of Behavior Aggregates (BA). DiffServ also allows for different drop precedence levels for different flows, and even packets, belonging to the same class.
The classification into BAs is done at the ingress of the network and involves the DiffServ edge router assigning a DiffServ Code Point (DSCP) value to the Type of Service (TOS) byte in each IP-packet. This DSCP value specifies which BA the packet belongs to and decides the treatment that the packet receives from the core routers. Routers in the core of the network provide service guarantees to aggregates, using a variety of scheduling and queue management procedures.
The externally observable forwarding behavior in the routers for each BA is called Per Hop Behavior (PHB), where each PHB maps to a DSCP value.
DiffServ defines two PHBs in addition to the default best-effort PHB, which are the Expedited Forwarding (EF) PHB [23] and the Assured Forwarding (AF) PHB group [24]. The EF PHB is supposed to provide low delay, low jitter, and low packet loss by ensuring a configured service rate to the EF aggregate, as well as a bounded deviation from this configured rate. Each node that provides EF service should then comply with the Packet Scale Rate Guarantee (PSRG) server model [25], as described in Section 2.3.1. The AF PHB group consists of four different AF classes, each of which is allocated an amount of buffer space and bandwidth in the nodes. Within each AF class, there are three different levels of drop precedence, thereby providing differentiated treatment within each class.
The packets with the highest drop precedence value are dropped first in the case of congestion, using an active queue management scheme such as Random Early Detection (RED) [26].
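The relation between the PHBs and the DSCP values can be illustrated with a short sketch. It assumes the recommended code points, where AF class x with drop precedence y is encoded as 8x + 2y and EF uses the value 46; the function names `af_dscp` and `tos_byte` are hypothetical helpers introduced only for this illustration.

```python
EF_DSCP = 46  # recommended code point for the EF PHB (binary 101110)


def af_dscp(af_class, drop_precedence):
    """DSCP for AF class 1-4 and drop precedence 1-3, i.e. AFxy = 8x + 2y."""
    if af_class not in (1, 2, 3, 4) or drop_precedence not in (1, 2, 3):
        raise ValueError("AF class must be 1-4 and drop precedence 1-3")
    return 8 * af_class + 2 * drop_precedence


def tos_byte(dscp):
    """Place the 6-bit DSCP in the upper six bits of the former TOS byte."""
    return dscp << 2


# The twelve AF code points, keyed by their usual names AF11 ... AF43.
af_table = {
    f"AF{c}{d}": af_dscp(c, d) for c in range(1, 5) for d in range(1, 4)
}
print(af_table["AF11"], af_table["AF43"], tos_byte(EF_DSCP))
```

Within one AF class, a larger second digit means a higher drop precedence, so in this encoding a packet marked AF13 would be discarded before one marked AF11 under congestion.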
Service Level Agreements (SLAs) define a service contract for the provisioning of service guarantees, and are used between the customers and their source DiffServ domain, as well as between different DiffServ domains [8]. These SLAs contain a Service Level Specification (SLS), which specifies the traffic characteristics of an aggregate as well as the PHB. The traffic characteristics are often defined using token bucket models [9] as described in Section 2.3.2. A Bandwidth Broker (BB) function is then needed in each DiffServ domain to perform admission control, manage network resources, etc., based on the SLAs.
Figure 2.2: A simple DiffServ architecture (from [27]). (Elements shown: source, leaf router, core routers, egress router, ingress router, destination, BBs, and the SLA between domains, with classifier, marker, meter, and shaper/dropper functions.)
A simple DiffServ architecture is shown in Figure 2.2. For the given source, the first capable downlink router in the source domain (leaf router) will perform per-flow classification, metering, and marking [27]. In addition, ingress and egress routers need traffic conditioning capabilities. At an egress router, DiffServ aggregates are shaped to conform to their profile, before being sent to another DiffServ domain. At the ingress routers, classification and marking of aggregates are performed, possibly after metering. In addition, core routers may include traffic conditioning capabilities such as metering and shaping (or dropping) of packets that are out-of-profile.
The use of DiffServ for transmission of layered MPEG-2 encoded video is demonstrated in [28], where different video coding layers are put in different DiffServ classes. The study showed that the perceived quality of the transmitted video is highly dependent on how the layers are created, and that layered video can tolerate a higher network load than regular video while achieving the same quality target.
An experimental DiffServ network was developed for the Internet2 Qbone project [27], but even though the project successfully demonstrated DiffServ in a test network, DiffServ has not seen the deployment that was hoped for.
Some argue for overprovisioning instead of QoS support, claiming that DiffServ will not be widely deployed because its costs are too high relative to the benefits [29]. While overprovisioning may be an option in the backbone network, the wireless access links still have limited resources. Hence, in [30] it is argued that QoS mechanisms are indeed needed in today’s network and that QoS support is well taken care of by the DiffServ architecture. Furthermore, the main obstacles to a wide deployment of DiffServ are seen as mainly business related. Hence, with an increase in the amount of traffic transmitted over the Internet, service differentiation is likely to become a value-added feature in the near future.
Figure 2.3: Multimedia and network aspects influencing the PQoS for video transmissions over the Internet. (Elements shown: video, encoder, packetization, network protocols, Internet with its network performance, playout buffer, decoder, and user with PQoS/QoE.)
2.2 QoS for Video Transmission over the Internet
In this section, QoS issues related to video transmission over the Internet are discussed. Different types of video applications that have divergent QoS requirements are described. Finally, challenges related to assessing the end-user perceived QoS for a video transmission over the Internet are discussed.
2.2.1 Video Transmission over the Internet
Video transmitted over the Internet is often real-time video, which means that requirements for low delay and loss are imposed in order to provide satisfactory end-user quality [31]. Because of these requirements and the high and variable bitrate of video traffic, various challenges arise for video transmissions over the Internet. Traditionally, network performance metrics such as throughput, delay, delay jitter, and packet loss probability were used for estimating the QoS for a video transmission, in accordance with the IETF's QoS definition. However, with new advances in video coding and application level support for QoS, several other aspects influence the QoS perceived by the end-users. An overview of multimedia and network aspects that influence the perceived QoS is given next. These are also shown in Figure 2.3.
Video Content
The characteristics of the video content are the first aspect affecting the resulting quality. In particular, the type of video application is important. Video conference streams on one hand are often quite static and have a high degree of spatial and temporal dependencies, making them relatively easy to compress. The resulting bitrate is therefore low and has a low variability. Action movies on the other hand have frequent scene shifts and higher motion within scenes, as well as a large amount of detail, resulting in fewer spatial and temporal dependencies. This type of video will hence require more bits to encode, i.e., the rate-distortion function is larger for a high activity scene than for a low activity scene with the same distortion [32]. The resulting bitrate will probably also have higher variability because some low activity scenes still occur. The influence of the video content on the perceived quality is investigated in [33], where it is shown that the spatial and temporal complexity influence the perceived quality and should be taken into account in a model for predicting perceived quality.
The bitrate characteristics of slice-based encoded video are studied in Chapters 4 and 6. The bitrate reflects the video content after encoding and is important to study since the bitrate characteristics of the video will influence the network performance experienced.
Encoding
In order to transport a video stream over the Internet, compression is needed to reduce the bitrate of the stream. H.264/AVC [4] is the latest standard for video coding and is studied in this thesis. The encoding process and the encoding parameters for H.264/AVC are tailored to the application type, the video content, the underlying network, and possible feedback from the network. Application layer techniques including error control and congestion control are used for coping with variable network conditions. Error control comprises Forward Error Correction (FEC), retransmissions, and error resilience [34]. For H.264/AVC, there has been much focus on error resilience tools for transmission in lossy network environments. These tools include picture segmentation, data partitioning, reference picture selection, flexible macroblock ordering, etc. The effect of a lost packet on the distortion is highly dependent on which error resilience techniques are employed, and this is studied using simulations in [35].
Packetization
After encoding, the video frames are divided into packets in a process called packetization. For H.264/AVC, the Network Abstraction Layer (NAL) was included to provide coding transparency towards the transmission medium. Simple packetization then involves putting a NAL Unit (NALU), containing a slice of a frame, into the payload of a Real-Time Transport Protocol (RTP) packet [36]. In order to have fully decodable packets at the receiver, it is advantageous to keep the slice size smaller than the Maximum Transmission Unit (MTU) of the underlying network, so that every successfully transmitted packet can be decoded. Otherwise, the RTP packets are fragmented on the IP layer to fit the MTU of the underlying network [35]. Video frames are typically encoded at a rate around 30 frames per second and are usually much larger than typical MTU sizes. Each frame is therefore sent to the network as a burst of packets whose total size corresponds to the frame size, so video traffic is generally bursty.
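To make the burst size concrete, the number of packets generated per frame can be estimated from the frame size and the MTU. The sketch below is only an illustration under stated assumptions: one NALU per RTP packet, an MTU of 1500 bytes, IPv4 without options (20 bytes), UDP (8 bytes), and the fixed RTP header (12 bytes). The function name `packets_per_frame` and the frame size are hypothetical.

```python
import math

# Per-packet header overhead: IPv4 without options + UDP + fixed RTP header.
HEADER_BYTES = 20 + 8 + 12


def packets_per_frame(frame_bytes, mtu=1500):
    """Packets needed for one frame if every slice must fit in a single MTU."""
    max_slice = mtu - HEADER_BYTES  # largest slice that avoids IP fragmentation
    return math.ceil(frame_bytes / max_slice)


frame_bytes = 20000  # hypothetical size of one encoded frame, in bytes
burst = packets_per_frame(frame_bytes)
print(burst)
```

With these assumed numbers, a single frame leaves the sender as a burst of 14 back-to-back packets roughly every 33 ms at 30 frames per second.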
Network Protocols
The transmission medium and the network protocols above are taken into account in the encoding process. The physical transmission medium will influence the FEC and error resilience tools applied to the encoded video [37]. The packetization process described above is also tailored to the underlying network, and the MTU of the Medium Access Control (MAC) layer will decide the optimal NALU size.
For video traffic transmission over the Internet, the Internet Protocol (IP) [38] is the obvious choice at the network layer. For the transport layer, the Transmission Control Protocol (TCP) [39, 40] is currently employed for most non-real-time video content. TCP is connection-oriented and ensures the delivery of the video packets without loss. This is accomplished by retransmitting lost packets, and hence delay is traded for zero loss. Some streaming services also use TCP as transport protocol, buffering a large number of packets at the client side before starting the playback. The buffering accounts for variable network delay and even gives enough time to retransmit lost packets. For real-time video, the time constraints are stricter because of a maximum allowable delay. Packets arriving too late for decoding are of no use, while a low packet loss probability on the order of a few percent is usually considered acceptable. The User Datagram Protocol (UDP) [41] is therefore the preferred choice for transmission of real-time video such as conversational services and real-time streaming. In addition, RTP [42] is usually employed to add support for real-time audio and video services on top of UDP. RTP includes sequence numbers, facilitating detection of lost packets. Finally, the RTP Control Protocol (RTCP) is used together with RTP to monitor the QoS of the session [42] and the Real Time Streaming Protocol (RTSP) [43] is used for streaming video.
Network Performance
In addition to the different choices made for video transmission, the network performance plays a significant role for the final perceived quality. The most important network performance parameters that can be evaluated at the network egress are:
• Throughput; defined as the number of bits successfully transmitted and received by a source destination pair in a time interval, divided by the time interval. IP link capacity using IP-layer bits is defined similarly in [44].
• Delay; defined as one-way delay [45] and round-trip delay [46].
• Delay variation (jitter); defined as the differences in the one-way delay of packets belonging to one stream [47].
• Packet Loss Rate/Ratio (PLR); defined as the ratio of the number of lost packets to the number of transmitted packets between a source destination pair [48].
• Packet loss pattern; defined using the distance between consecutive losses and the length of a loss period [49].
Figure 2.4: A video playout buffer for streaming and conversational video. (Frame number plotted against time for sending time, arrival time, and playout time, indicating the size of the playout buffer and the lost packets.)
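The two loss-related parameters in the list above can be computed directly from a per-packet loss trace. The following sketch assumes the trace is available as a list of booleans in sending order; the function name `loss_metrics` and the example trace are hypothetical.

```python
def loss_metrics(lost):
    """Compute PLR, loss-period lengths, and distances between losses.

    lost: list of booleans in sending order, True where the packet was lost.
    """
    total = len(lost)
    plr = sum(lost) / total if total else 0.0

    burst_lengths = []  # lengths of periods of consecutive losses
    distances = []      # sequence-number distance between successive losses
    run = 0
    last_loss = None
    for i, is_lost in enumerate(lost):
        if is_lost:
            run += 1
            if last_loss is not None:
                distances.append(i - last_loss)
            last_loss = i
        elif run:
            burst_lengths.append(run)
            run = 0
    if run:  # trace ends inside a loss period
        burst_lengths.append(run)
    return plr, burst_lengths, distances


trace = [False, True, True, False, False, True, False, False, False, True]
plr, bursts, distances = loss_metrics(trace)
print(plr, bursts, distances)
```

For this trace, four of ten packets are lost in three loss periods; a distance of 1 between successive losses marks packets lost back to back.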
An overview of the requirements on the performance parameters for different video applications is given in Section 2.2.2. Ultimately, these parameters should be used for assessing the QoS perceived by the end-users. This is discussed in more detail in Section 2.2.3.
Playout Buffer
For streaming video and conversational video as defined in the next section, an application specific playout buffer (or jitter buffer) is needed at the receiver side for absorbing variable network delays and to allow for retransmission of lost packets.
The size of the playout buffer decides the maximum allowable variation in network delay and also the minimum time needed for buffering before the video playback is started. A buffer that is too large then means unnecessary delay, while a buffer that is too small causes excessive packet loss [35]. A packet arriving at the playout buffer after its scheduled playout time is considered lost. This is shown in Figure 2.4. For real-time streaming video and conversational video, the size of the playout buffer should be minimized to ensure low end-to-end delay. For non-real-time video streaming, the playout buffer can be large to account for large network delays as well as packet retransmissions and will hence provide a low packet loss probability.
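The trade-off between buffer size and loss can be sketched with a fixed playout delay. This is a simplified illustration and not a model used in this thesis: each packet is assumed to be scheduled for playout a fixed time after it was sent, and the function name `late_packets` and all delay values (in milliseconds) are hypothetical.

```python
def late_packets(send_times, network_delays, playout_delay):
    """Return indices of packets that miss their scheduled playout time.

    A network delay of None marks a packet already lost in the network.
    All times are in milliseconds.
    """
    missed = []
    for i, (sent, delay) in enumerate(zip(send_times, network_delays)):
        if delay is None:
            missed.append(i)  # never arrived
            continue
        arrival = sent + delay
        deadline = sent + playout_delay  # scheduled playout time
        if arrival > deadline:
            missed.append(i)  # arrived too late to be played
    return missed


send_times = [0, 40, 80, 120, 160]
network_delays = [50, 120, 60, None, 95]

small_buffer = late_packets(send_times, network_delays, playout_delay=100)
large_buffer = late_packets(send_times, network_delays, playout_delay=150)
print(small_buffer, large_buffer)
```

With the smaller playout delay, the packet delayed by 120 ms is counted as lost in addition to the packet dropped in the network; the larger delay recovers it at the cost of 50 ms of extra end-to-end delay.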
Decoding
The decoder reconstructs the original video streams. In case of missing frames or parts of frames, the decoder performs error concealment, e.g., using prediction from previous frames or neighboring macroblocks [37]. The degree to which this is successful depends on the video content, the error resilience tools applied at the encoder, and also which parts of a frame and how much is lost. As described in [50], basic MPEG-2 systems do not decode a frame with a lost packet, but discard the entire frame and insert the previous frame instead. With H.264/AVC on the other hand, the error concealment is usually more sophisticated and a lost packet should optimally only result in one lost slice. This assumes that the slices are small enough to avoid fragmentation on the IP layer, as described under packetization.
Hence, lost slices are recovered using error concealment tools corresponding to the error resilience added at the encoder.
Perceived QoS
After decoding, the video stream is played to the end-user. The final video quality perceived by the user is then denoted by the Perceived QoS (PQoS). As can be seen from this discussion, the perceived quality depends on the video clip, the encoding, the packetization, the network protocols, the network performance, the playout buffer, and finally the decoding. Adding the user expectation leads to the term Quality of Experience (QoE). The final quality experienced by the users is hard to predict and analyze without the use of subjective tests. PQoS and QoE for video transmissions over the Internet are discussed in more detail in Section 2.2.3, together with an overview of important results for estimating the PQoS from the network performance parameters and the multimedia imposed impairments.
2.2.2 Video Applications
Applications used over the Internet can be divided into elastic and non-elastic applications [51]. Traditional Internet applications such as email, file transfer, and web-surfing are elastic, meaning that they can tolerate delay and losses, in addition to being able to decrease and increase their transmission rate depending on the network conditions. These applications typically use TCP. Real-time video applications on the other hand are non-elastic, meaning that they are less tolerant to packet loss and variations in delay, and they require a minimum capacity equal to the bitrate of the stream and cannot benefit from a higher available capacity.
These applications typically use UDP.
A classification of video into download, streaming, and conversational video is common [31], where only download video is elastic. In addition, the Video-on-Demand (VoD) sub-class of streaming video is called semi-elastic in this thesis, since it can use TCP. These classes of video applications have divergent requirements for throughput, delay, delay jitter, and packet loss. This is also taken into account in the video encoder, where the encoder can be optimized for low-latency or coding efficiency, depending on the application [35].
Table 2.1: Network performance requirements for classes of video applications.
Class            Application         Bandwidth     Delay/Jitter  Loss
Download         Video download      Elastic       <15 seconds   Zero
Streaming Video  Video-on-Demand     Semi-elastic  <10 seconds   <1%
                 Live streaming      Non-elastic   <2 seconds    <1%
Conversational   Video conferencing  Non-elastic   <150 ms       <1%
                 Video telephony     Non-elastic   <150 ms       <1%
The least demanding video application in terms of delay is the downloading of a video for later replay. This application is elastic and can increase and decrease the bitrate according to the available bandwidth, as well as being tolerant to variations in the network delay. Downloading of video requires a lossless transmission; however, this is taken care of by TCP.
Streaming video includes all types of video transmissions where the playback starts before the transmission of the video is finished [34]. Streaming video can be VoD, where the video typically is pre-recorded and stored at a streaming server, or live (real-time) streaming, which is available only in real-time. An example of the former is YouTube, while live-streaming from football matches is an example of the latter. These two types of streaming applications differ in their requirements to the playout delay. Although both VoD and live-streaming can tolerate some buffering, the latter typically has a shorter playout buffer and therefore lower maximum delay, but also higher loss probability. Both of them have stringent requirements for the packet loss probability, even as low as 1% [31].
Finally, the term conversational video is used, covering video conferencing, video telephony, and other applications that are two-way/multi-way. These applications have more stringent requirements for the network performance in terms of maximum end-to-end delay and loss probability. In addition, the applications are non-elastic and the bandwidth requirements are therefore stringent. Loss and delay critical applications are the focus in this thesis, and real-time streaming and conversational video are the applications considered.
The requirements on the network performance parameters for different video applications, based on the segmentation from the ITU-T Recommendation G.1010 [31], are summarized in Table 2.1. Live-streaming is not explicitly addressed in [31], but its delay requirement is set to a typical delay of two seconds here to distinguish it from VoD.
2.2.3 Perceived QoS and Quality of Experience
While QoS proposals for the Internet focus on network performance measures such as throughput, delay, delay jitter, and packet loss, the terms PQoS and QoE have gained more importance for multimedia applications. In this thesis, PQoS is used for the QoS as perceived subjectively by the end-users, while QoE also includes user expectations and economic aspects. The former is therefore most important for the work in this thesis.
Focus on the QoE conforms with the G.1000 recommendation which clearly indicates the user perspective. A new definition for QoE is included as an appendix in the ITU-T Recommendation P.10 [52].
Quality of Experience:
“A measure of the overall acceptability of an application or service, as perceived subjectively by the end-user.”
This definition makes a clear separation between the QoS defined in E.800 [5] and in RFC 2216 [15] as concerned with the network performance and the QoE as the quality subjectively perceived by the users. However, this QoE definition better resembles the fourth viewpoint in G.1000 [14].
For evaluating the quality of a video stream as perceived by the users, actual users must be tested. A common measure of the perceived QoS is then the subjective Mean Opinion Score (MOS) [53], where the opinion score is a measure of the perceived quality as seen by test subjects. The evaluation is done by setting up a viewing test as described in [53] and letting a group of people evaluate different video clips, rating them from 1 (bad) to 5 (excellent). The MOS result is then given as the average of the individual opinion scores. This approach is expensive and time-consuming. Objective tests are therefore frequently used instead. The goal of the objective tests is then to give results that correlate well with the MOS results. Three different classes of tests are used: Full Reference (FR), Reduced Reference (RR), and No-Reference (NR) tests. These are distinguished by the availability of the complete video stream or some simple statistics of the original video stream at the receiver, for comparison with the altered stream.
For FR methods, the original video stream is required for comparison with the distorted stream, which may not always be feasible. The Peak Signal to Noise Ratio (PSNR), which uses the Mean Squared Error (MSE) of the two streams, is a very common FR method, although it has been criticized for low correlation with subjective results [54]. Another simple FR method is the Structural Similarity Index Measurement (SSIM) described in [55]. SSIM focuses on measuring the structural information change in the distorted stream compared to the original stream in order to assess the image distortion. For RR methods, some statistics of the original video stream must be available at the receiver for evaluation of the quality. In [56], a combined RR/FR method is proposed, using wavelets. Finally, for the NR method, no information about the original stream is available at the receiver. Hence, only the distorted video stream can be evaluated for estimating the quality. One approach is to measure the block-edge impairments as described in [57].
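As an illustration of the simplest FR measure, the PSNR of a distorted frame relative to the original can be computed from the MSE as 10·log10(peak²/MSE). The sketch below assumes 8-bit samples (peak value 255) given as flat lists of equal length; the function names and sample values are hypothetical.

```python
import math


def mse(original, distorted):
    """Mean squared error between two equally long sample sequences."""
    if len(original) != len(distorted) or not original:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - b) ** 2 for a, b in zip(original, distorted)]
    return sum(squared) / len(squared)


def psnr(original, distorted, peak=255):
    """PSNR in dB; infinite when the two sequences are identical."""
    error = mse(original, distorted)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)


original = [100, 100, 100, 100]
distorted = [110, 90, 110, 90]  # every sample is off by 10
value = psnr(original, distorted)
print(round(value, 2))
```

Because the measure depends only on the sample-wise error, two distortions with the same MSE get the same PSNR regardless of how visible they are, which is one reason for the criticism cited above.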
The Video Quality Experts Group (VQEG) has led the work on objective tests for assessing multimedia quality. The evaluation reported in [58] showed satisfactory performance for two FR algorithms and one RR algorithm, leading to two new ITU-T Recommendations in 2008. These are J.247: Objective perceptual multimedia video quality measurement in the presence of a full reference [59] and J.246: Perceptual audiovisual quality measurement techniques for multimedia services over digital cable television networks in the presence of a reduced bandwidth reference [60].
2.2.4 Evaluation of PQoS using Network Performance Parameters
Ultimately, a mapping between the network QoS parameters such as throughput, delay, delay jitter, and packet loss and the PQoS is the goal. This is also identified in a recent paper on video quality assessment [54], where packet loss based metrics are seen as a good solution to the PQoS assessment, due to the low computational complexity compared to evaluation of the fully decoded video stream. With the discussion from the previous sections in mind, this may look like a difficult task. However, the packet loss burstiness in particular has been identified as an important metric for the PQoS for speech, audio, and video alike. This is especially important for these applications since decoders in general have more difficulties with concealing the effect of consecutive packet losses than single losses, as discussed e.g., in [50].
Speech
For speech, perceived QoS can be assessed using the E-model [61], which is an additive impairment model. Hence, impairments due to SNR, coding, transmission delay etc., are added to give a rating factor that is converted to a MOS value. The E-model has been updated recently, first to account for random packet loss based on results published in [62] and next to account for arbitrary loss distributions based on results published in [63]. The loss distribution is accounted for using the packet loss burst ratio, given as the ratio of the mean (first moment) of the observed loss period length to the mean of the loss period length expected for random losses.
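The burst ratio and the conversion from rating factor to MOS can be sketched as follows. This is a minimal illustration under stated assumptions: the loss trace is a hypothetical 0/1 sequence, random loss is taken to mean independent losses (so that the loss period length is geometric with mean 1/(1 - p) for loss probability p), and the R-to-MOS polynomial is the one recalled from ITU-T Recommendation G.107 rather than quoted from this text. The function names are chosen for the example.

```python
def loss_runs(trace):
    """Lengths of the consecutive-loss runs in a 0/1 trace (1 = packet lost)."""
    runs, current = [], 0
    for lost in trace:
        if lost:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def burst_ratio(trace):
    """Observed mean loss run length divided by the mean expected for random loss."""
    runs = loss_runs(trace)
    loss_prob = sum(trace) / len(trace)
    if not runs or loss_prob == 1:
        raise ValueError("trace must contain both lost and received packets")
    mean_observed = sum(runs) / len(runs)
    mean_random = 1 / (1 - loss_prob)  # assumption: independent (Bernoulli) losses
    return mean_observed / mean_random


def r_to_mos(r):
    """Rating factor R to MOS, using the G.107 conversion as recalled (assumption)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


# Hypothetical trace: two loss bursts of length two among ten packets.
trace = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
print(burst_ratio(trace), r_to_mos(50))
```

A burst ratio above one indicates that losses are more clustered than independent losses with the same loss probability would be, and a ratio of one corresponds to random loss.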
Audio
For audio, network simulations and subjective tests are used in [64] for evaluating the effect of packet loss burstiness on the perceived quality. The distribution of packet loss for music streams transmitted over a network is modeled using results from simulations. Both best-effort and DiffServ nodes are simulated and the differences in the packet loss burstiness resulting from these setups are investigated. The best-effort case showed higher burstiness than the DiffServ case because of the RED active queue management used in the latter. For acceptable loss ratios, the same packet loss ratio then resulted in a higher MOS value for the DiffServ case than for the best-effort case.
Video
For video, several approaches have been proposed to assess the perceived QoS depending on the loss process. However, most of these approaches use the PSNR/MSE to estimate the distortion, instead of using subjective tests to estimate the PQoS. The effect of the burst loss on the distortion is modeled and compared
to simulations in [65]. The results show that the burst length of the loss process is important for estimating the distortion and that losses occurring in bursts affect the distortion more than the same number of single losses. In [66], three different methods are presented for evaluation of the quality of distorted video using the MSE. The video is encoded using MPEG-2 and is transmitted over a packet network. The NoParse method uses measures of the packet loss rate only, while the QuickParse and FullParse methods also incorporate the impact of losses. Next, an approach to real-time assessment of the video quality is described in [50]. Here, a loss-distortion model is developed, using both multimedia and network aspects. For estimation of the distortion, the video content, type of video codec, packetization, loss recovery mechanisms, and the amount of loss are taken into account, the latter through the average number of packets between losses. A mapping between the distortion and the PSNR is given. However, an additive impairment model is used for the losses, and the more severe effects on the distortion when the losses occur in bursts are not taken into account. Finally, a hybrid metric for evaluation of perceived QoS is described in [54]. This model takes the network impairments and information about the video stream into account. Loss of intra or predictive coded slices gives different impairments, and the video coding layer complexity, including the content characteristics, the amount of scene changes, and the quantization level, is included to give the final MOS value.
Subjective test results on the effect of consecutive packet losses are given e.g., in [67]. Here, a random neural network model trained with results from subjective tests is used for evaluating the effect of both coding parameters and network QoS parameters on the perceived quality, for H.263 encoded video. In particular, it is found that increasing the number of consecutive lost packets while keeping the loss ratio constant leads to better quality because of fewer deteriorated frames. This is explained by the high frame rate (30 frames per second is used) and the resulting difficulty of detecting a distorted frame.
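The argument that bursty losses deteriorate fewer frames at a constant loss ratio can be illustrated with a small counting example. The sketch below rests on simplifying assumptions that are not taken from [67]: every frame is carried in a fixed number of packets, a frame counts as deteriorated as soon as one of its packets is lost, and error propagation through predictive coding is ignored. The packet indices are hypothetical.

```python
def deteriorated_frames(lost_packets, packets_per_frame):
    """Number of distinct frames hit by at least one lost packet.

    Assumes packet i belongs to frame i // packets_per_frame.
    """
    return len({index // packets_per_frame for index in lost_packets})


packets_per_frame = 5          # assumption: fixed packetization
isolated = [0, 10, 20, 30]     # four single losses spread over the stream
bursty = [0, 1, 2, 3]          # the same four losses in one burst

print(deteriorated_frames(isolated, packets_per_frame))
print(deteriorated_frames(bursty, packets_per_frame))
```

With the same number of lost packets, the isolated losses touch four different frames while the burst is confined to a single frame, which is the mechanism behind the finding above.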
In [68], an approach similar to the E-model is pursued for video traffic. A parametric video quality model is proposed, with additive impairment factors calculated from the source quality, video coding, transmission impairments etc. The transmission impairments include the packet loss as well as the packet loss concealment. Only the packet loss ratio is investigated; however, non-uniformly distributed packet loss is expected to be included in a future model. A strong point is the use of subjective tests, and the ultimate goal is a comprehensive model for the evaluation of video quality comparable to the E-model for speech. The quality evaluation is more complicated for video than for speech because the perceived quality depends on the content, as shown in [33]. The effect of the spatial and temporal complexity of the video sequences on the perceived quality is investigated for inclusion in the model from [68].
The extent to which losses are concealed is highly dependent on the error resilience tools, as shown using simulations in [35]. This also means that results for the effect of bursty losses can be taken into account when applying these tools.
As this discussion shows, the effect of bursty losses is highly dependent on the decoding. With a decoding scheme that discards the whole frame in the case of a lost packet, bursty losses should improve the perceived quality compared to