Lossless Video Compression Using Bayesian
Networks and Entropy Coding
Rochan Avlur Venkat
Computer Science and Engineering, Mahindra Ecole Centrale
Hyderabad, India [email protected]
Dr. Chandrasekar Vaidyanathan
Chairman, Department of Mathematics, Dayanand Sagar University
Bangalore, India [email protected]
Abstract—Lossless video compression algorithms are used in applications ranging from archival of video records to the field of medicine. In this paper, we propose a simple yet effective encoding technique for lossless video compression. Our technique automatically learns Bayesian Networks to discover Conditional Dependencies in a video stream through Stochastic Hill Climbing. Utilizing this network, variable length codes are generated by an Entropy Coding procedure based on Huffman Coding. This structured data is used to encode the video stream. The algorithm has been tested and compared alongside H.264, FF Video Codec 1 (FFV1) and Gzip. The proposed compression model performs on average at least as well as, and at times better than, the aforementioned techniques.
Index Terms—Bayesian Network, Entropy Coding, Lossless Compression, Video Compression, Video Coding Algorithms, Encoder, Decoder
INTRODUCTION
High Fidelity video data is increasingly playing an important role in medicine, science, education, entertainment and areas of human collaboration. Studies show that by 2020, IP video traffic is expected to constitute 82% of all IP traffic [1]. This growing demand can be seen in other industries and applications where video data is being consumed at unprecedented rates. Around 66% of all Business Internet traffic is expected to be due to Business Internet video traffic. Video Surveillance traffic will grow 10-fold and Virtual Reality traffic will increase 61-fold by 2020 [2] [3].
This increased demand challenges currently deployed network infrastructure with limited transmission rates. It is often more practical to reduce this strain through software optimization techniques. Video Compression algorithms are specifically designed to tackle this problem. Similar to general compression algorithms, they function on the principle of representing video data in a more compact and robust way to effectively manage the storage and transmission resources in terms of size, bandwidth and power consumption. Extensive research over the last few decades has paved the way to efficient and effective compression techniques [9] [10] [11]. To enable wider adoption, specific attention is given to maintaining speed in high performance environments and reproducibility of pixel color without loss of visual fidelity after decompression. The actual information-theoretic content of a large video stream is often much smaller than its raw size, because each video frame contains spatially correlated information and consecutive frames are temporally correlated.
Popular compression methods used today are mostly lossy, which is acceptable for casual day-to-day viewing. However, many critical environments such as healthcare and research require source level quality, rendering lossy compressed video unacceptable. Moreover, compressing video data with lossy techniques for archival purposes prevents the user from obtaining the original video after decompression, which increases the need for efficient and flexible lossless compression techniques. Lossless video compression algorithms exploit pixel redundancy and encode the video stream without any loss of information. This enables perfect reconstruction of the original video from the compressed data, though it limits the achievable compression ratio.
This paper illustrates the potential and performance of a video compression algorithm that uses Bayesian Network learning to discover inter-relationships in the data and Entropy Coding to compress the data aggressively. Section I gives an overview of the background research and a brief overview of the lossless video codec landscape. Section II provides a description of the algorithm design, illustrates the interplay between Bayesian Networks and Entropy Coding techniques and describes the pseudo code. Section III covers the experiments and observations. The paper closes with a discussion of the impact, use cases, future improvements and further areas of research.
I. BACKGROUND
A. Traditional compression techniques
Video compression can generally be viewed as a problem of pattern recognition and pattern classification. Discovering structural patterns in the input data makes it possible to eliminate redundancy and represent the data more succinctly. The majority of lossless compression algorithms adopt the following two steps in sequence:
• learn a statistical model for the video stream
• use the model to map input data to bit sequences in such a way that probable (e.g. frequently encountered) data will produce shorter output than improbable data
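The second step can be illustrated with a minimal sketch. Assuming a toy symbol stream (the data and the helper names `ideal_code_lengths` and `entropy_bits` are illustrative and do not come from the proposed codec), a symbol with probability p ideally receives a code of -log2(p) bits, so frequent symbols get short codes:

```python
import math
from collections import Counter


def ideal_code_lengths(symbols):
    """Return {symbol: (probability, ideal code length in bits)}."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return {s: (c / total, -math.log2(c / total)) for s, c in counts.items()}


def entropy_bits(symbols):
    """Shannon entropy in bits per symbol, the lower bound for lossless coding."""
    return sum(p * bits for p, bits in ideal_code_lengths(symbols).values())


# Hypothetical stream: "a" is frequent, "c" and "d" are rare.
stream = list("aaaabbcd")
lengths = ideal_code_lengths(stream)
for sym in sorted(lengths):
    p, bits = lengths[sym]
    print(f"{sym}: p={p:.3f}, ideal length={bits:.1f} bits")
print(f"entropy = {entropy_bits(stream):.2f} bits/symbol")
```

A fixed-length code would spend 2 bits on every symbol of this four-symbol alphabet, whereas the statistical model allows fewer bits per symbol on average.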
Lossless codecs such as FF Video Codec 1 (FFV1), for example, utilize an intra-frame context model along with Arithmetic Coding.
B. Bayesian Networks
Bayesian Networks [12] or Belief Networks have been a subject of intense research in a wide variety of scientific fields. Predominantly, in the recent past, they have been used to model and explain a domain and to update beliefs about the states of certain variables when some other variables were observed. Bayesian Networks have been successful at computing conditional probability distributions, finding most probable configurations of variables, supporting decision making under uncertainty, finding good strategies for solving tasks in a domain with uncertainty and many more. Previous research exploring Bayesian Network based compression has shown potential in dataset compression [4] [5] [6]. However, video compression using Bayesian Networks and Entropy Coding has not been explored. The core of this paper aims to investigate and justify the performance of a video compression pipeline built using the above concepts.
A Bayesian Network consists of a Directed Acyclic Graph (DAG) in which each vertex corresponds to a variable, plus a probability distribution $P(x_i \mid \Pi_i)$ for each variable $x_i$, where $\Pi_i$ is the set of $x_i$'s parents in the DAG. Vertices of the Bayesian Network are variables while the Edges represent the probabilistic dependencies among the corresponding random variables. These conditional dependencies in the graph are often estimated by using statistical and computational methods. Given such a Bayesian Network, the Joint Probability (JP) distribution over all variables $x_1, \ldots, x_n$ is then calculated as:
$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \Pi_i) \qquad (1)$$
C. Structure Learning
Given a dataset of values, the goal is to discover a Bayesian Network structure that best represents the data [7] [8]. Common implementations of Structure Learning algorithms maximize a score function, typically by means of a greedy search.
Score based Structure Learning algorithms begin with some network structure, which is a DAG. The learning algorithm's neighborhood in DAG space consists of the networks that can be reached by applying one of the operators Add, Delete or Reverse to an edge. At each search step, the best neighbor is chosen and we move to it.
Greedy algorithms are susceptible to multiple Local Maxima. Two simple strategies can be effective to overcome this situation:
• TABU List - Do not revisit recently seen structures
• Random restarts - Apply some operators at random when at a local optimum
The Structure Learning algorithm is described as follows:
Algorithm 1 Structure Learning
function GreedyHillClimbing(initial structure N_init, dataset D, scoring function s, stopping criteria C)
  N ← N_init, N* ← N, tabu ← {N}
  while C is not satisfied do
    N' ← argmax_{M ∈ neighborhood(N), M ∉ tabu} s(M)
    if s(N) > s(N') then // Check local optimum
      N' ← random(N) // Random operators
    end if
    if s(N') > s(N*) then // Check new best
      N* ← N'
    end if
    tabu ← tabu ∪ {N'}
    N ← N' // Move to neighbor
  end while
  return N*
end function
Structure Learning of the Bayesian Network was implemented using the bnlearn and gRain packages in the R programming language.
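The search loop of Algorithm 1 can be sketched in a few lines. The block below is a simplified Python stand-in, not the bnlearn implementation used in this work: it assumes binary variables, uses a BIC score, takes a fixed step budget as the stopping criterion, and all function names (`bic_score`, `is_acyclic`, `neighbors`, `hill_climb`) are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import permutations


def bic_score(edges, data, n_vars):
    """BIC of a DAG over binary variables: log-likelihood minus a size penalty."""
    score = 0.0
    for v in range(n_vars):
        parents = sorted(p for p, c in edges if c == v)
        joint = Counter((tuple(r[p] for p in parents), r[v]) for r in data)
        marg = Counter(tuple(r[p] for p in parents) for r in data)
        score += sum(k * math.log(k / marg[cfg]) for (cfg, _), k in joint.items())
        score -= 0.5 * math.log(len(data)) * 2 ** len(parents)
    return score


def is_acyclic(edges, n_vars):
    """Repeatedly strip nodes without incoming edges; a cycle leaves none to strip."""
    edges, nodes = set(edges), set(range(n_vars))
    while nodes:
        roots = {v for v in nodes if not any(c == v for _, c in edges)}
        if not roots:
            return False
        nodes -= roots
        edges = {(p, c) for p, c in edges if p not in roots}
    return True


def neighbors(edges, n_vars):
    """All DAGs reachable by one Add, Delete or Reverse operation."""
    for p, c in permutations(range(n_vars), 2):
        if (p, c) in edges:
            yield edges - {(p, c)}                    # Delete
            cand = (edges - {(p, c)}) | {(c, p)}      # Reverse
        elif (c, p) not in edges:
            cand = edges | {(p, c)}                   # Add
        else:
            continue
        if is_acyclic(cand, n_vars):
            yield cand


def hill_climb(data, n_vars, max_steps=50, seed=0):
    """Greedy hill-climbing with a tabu set and random moves at local optima."""
    rng = random.Random(seed)
    s = lambda g: bic_score(g, data, n_vars)
    current = best = frozenset()              # N_init: the empty graph
    tabu = {current}
    for _ in range(max_steps):                # stopping criterion: step budget
        cands = [g for g in neighbors(current, n_vars) if g not in tabu]
        if not cands:
            break
        nxt = max(cands, key=s)
        if s(current) > s(nxt):               # local optimum: random operator
            nxt = rng.choice(cands)
        if s(nxt) > s(best):
            best = nxt
        tabu.add(nxt)
        current = nxt
    return best


# Hypothetical data: x1 copies x0, x2 is independent of both.
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 25
best = hill_climb(data, n_vars=3)
print(sorted(best))
```

On this toy data the best-scoring structure is a single edge between variables 0 and 1. Both directions score the same under BIC, so either may be returned.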
D. Entropy Coding and Statistical Models
Entropy Coding is a lossless compression technique that attempts to replace frequent source symbols with shorter codes. Entropy Coding algorithms also follow a two step sequence:
• Modeling assigns probabilities to the symbols
• Coding produces a bit sequence from these probabilities
Efficiency of the encoding scheme can be improved if an exact probability estimate is available. Since the model is responsible for the probability of each symbol, modeling is one of the most important tasks in data compression.
There are two primary ways of constructing Statistical Models. In a Static Model, data is analyzed and a model is constructed, then this model is stored with the compressed data. This approach is simple and modular, but has the disadvantage that the model itself can be expensive to store. It also forces using a single model for all data being compressed. Static Models perform poorly on files that contain heterogeneous data. Adaptive Models, as the name suggests, dynamically update the model as the data stream is compressed. Both the Encoder and Decoder begin with a trivial model, yielding poor compression of initial data. But as they learn more about the data, performance improves.
Entropy Coding can be achieved by different coding schemes. A common and relatively simple scheme, which uses a whole number of bits for each symbol, is Huffman Coding. The process behind this scheme includes sorting the symbols of a set in order of their frequency and assigning each symbol a code whose length decreases as its probability increases. Another approach is Arithmetic Coding, which outputs a bit sequence representing a point inside an interval. The interval is built recursively from the probabilities of the encoded symbols. Because it is not restricted to a whole number of bits per symbol, Arithmetic Coding can achieve a better compression ratio than Huffman Coding, but it requires more computational resources.
This paper describes a video compression algorithm that takes advantage of Bayesian Networks as the Statistical Model and leverages Huffman Encoding to generate variable length codes, while exploiting spatial and temporal redundancy of video data.
II. MODEL
A generic high-level outline of the proposed compression technique is depicted in Fig. 1. In this section, we limit our focus to operations performed specifically by the Encoder, since the Decoder simply performs similar operations in a slightly different order.
Algorithm 2 Model Encoding Scheme
function ENCODE(numOfFrames)
  frame_n ← PixelTrans(1)
  for i = 2 to numOfFrames do
    frame_{n+1} ← PixelTrans(i)
    BinCodes ← BinaryTrans(frame_n, frame_{n+1})
    BayesCPT ← BayesNetworkLearn(BinCodes)
    HuffCodes ← HuffmanTree(BayesCPT)
    Compress(HuffCodes)
    frame_n ← frame_{n+1}
  end for
end function
• Pixel Transformation - YCbCr defines a Color Space in terms of one Luminance component and two Chroma components. Y represents the Luma component while Cb and Cr represent the Blue and Red Chrominance components respectively. Conversion of the Color Space aids the proposed compression technique in performing Channel specific Conditional Relationship learning efficiently and accurately.
• Binary Code Construction - Responsible for assigning suitable source code lengths to obtain optimal compression. Binary representation of paired pixel values restricted among individual Color Channel provides the right balance between segmented and correlated pixel data. Conditional Relationship can be further strengthened by pairing constructions based on Block Motion Compensation.
• Learning Bayesian Networks - Discovering Conditional Dependencies among binary representation of pixel values corresponding to Channels from video stream is treated as a learning problem. Our algorithm discovers the best representing Bayesian Network, chosen using a search procedure that tries to maximize a Scoring function. The underlying Conditional Probabilities in the data are then passed on to the Entropy Encoder.
• Entropy Encoding - The Joint Probability distribution can be computed from the Conditional Probabilities as given by (1). Variable length codes are generated using Huffman Coding and stored in a Lookup Table.
Encoding of the video stream is achieved using this Lookup Table. However, storing the table in its entirety is resource heavy and can lead to sub-par compression ratios. Encoding the Conditional Probability Tables learned from the Bayesian Network is more efficient and scales well with longer video streams. The following sections provide a detailed description of the individual blocks of the proposed compression technique and methodology.
A. Pixel Transformation
The proposed compression technique begins with Pixel Transformation. It is well established that the human eye perceives different Color Channels differently. Over the years, researchers have exploited this knowledge by utilizing Color Channel specific video and image compression. The algorithm utilizes mathematical transformations to convert the Video Color Space to the YCbCr Color Space. Another reason for the conversion is that the YCbCr Color Space has been shown to achieve better compression ratios than other Color Spaces.
[Fig. 1. Schematic Block Diagram. Blocks: Raw Digital Video; Raw Video Framing; YCbCr Decomposition; Chroma & Luma Processing (Pixel Transformation, Binary Code Construction, Learning Bayesian Networks, Entropy Encoding); Compressed Video.]
Our algorithm exploits the Channel specific spatial and temporal correlation that can be exposed effectively using the specified Color Space. High correlation can be observed between consecutive frames from the same Color Axis, hence justifying independent encoding of each Color Axis. The below procedure is executed on each Color Channel independently.
Algorithm 3 Pixel Transformation
define Frame[Height][Width]
define C[Height][Width][3]
function PIXELTRANS(FrameNum)
  Frame ← Video(FrameNum) // Extract frame
  // Convert Color Space
  for each pixel position (i, j) do
    C[i][j][0] ← Luma(Frame[i][j])
    C[i][j][1] ← BChroma(Frame[i][j])
    C[i][j][2] ← RChroma(Frame[i][j])
  end for
  return C
end function
Our experiments showed inferior compression ratios when Color Channels were interleaved or considered together when learning a Bayesian Network. We propose that further research be conducted regarding the choice of Color Space transformation.
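The exact conversion matrix is not specified above. For a lossless pipeline the Color Space transformation must be exactly invertible in integer arithmetic, so the following sketch swaps in the reversible color transform (RCT) of JPEG 2000 as one possible choice. This is an assumption made for illustration only: the function names are hypothetical, and the Chroma values here are signed differences rather than offset Cb/Cr samples.

```python
def rct_forward(r, g, b):
    """Integer RGB -> (Y, Cb, Cr) using the JPEG 2000 reversible color transform."""
    y = (r + 2 * g + b) >> 2      # floor((R + 2G + B) / 4)
    cb = b - g                    # signed blue-difference chroma
    cr = r - g                    # signed red-difference chroma
    return y, cb, cr


def rct_inverse(y, cb, cr):
    """Exact inverse of rct_forward."""
    g = y - ((cb + cr) >> 2)      # >> floors for negative sums as well
    r = cr + g
    b = cb + g
    return r, g, b


# Round-trip check on a sample pixel (values are arbitrary).
pixel = (200, 30, 117)
assert rct_inverse(*rct_forward(*pixel)) == pixel
```

Because no rounding information is lost, each of the three resulting channels can be encoded independently and the original pixel values recovered exactly after decoding.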
B. Binary Code Construction
To enable the Bayesian Network to discover Conditional Dependencies, the algorithm converts pixel data from individual Color Channels into binary variables. Pixels can be represented as 8-bit binary streams or at times, even as 10-bit binary streams, depending on the color depth.
Our algorithm iterates over pixel values strengthening Conditional Relationships among the neighboring Pixels, while at the same time, retaining only relevant information using inter-frame compression.
Algorithm 4 Binary Code Construction
define Pair[2]
define 16bit binary
function BINARYTRANS(Pixel_n, Pixel_{n+1})
  for iterate over pixels do
    if |Pixel_{n+1}[i][j][c] − Pixel_n[i][j][c]| > 0 then
      pair[0] ← Pixel_{n+1}[i][j][c]
      pair[1] ← Pixel_{n+1}[i + a][j + b][c] // where a and b are modifiers
    end if
  end for
  for iterate over pair do
    binary ← pair[0]
    binary ← binary << 8
    binary ← binary OR pair[1]
  end for
  return binary
end function
The algorithm takes two passes over consecutive frames. In the first pass, the algorithm selects the Pixels that undergo a context change, i.e. whose value differs between the two frames. In the second pass, it constructs fixed length binary codes. We observed that pairing neighboring Pixels greatly increases the compression achieved. The generated codes are then passed on to the next module.
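The two passes can be sketched as follows for a single 8-bit Color Channel. This is a minimal illustration under stated assumptions: the function name `binary_codes`, the default offsets `a=0, b=1` (right-hand neighbor) and the clamping of neighbors at the frame border are all hypothetical choices, since the modifiers a and b are left open above.

```python
def binary_codes(prev, curr, a=0, b=1):
    """Return one 16-tuple of bits per pixel that changed between two frames.

    prev, curr: 2-D lists of 8-bit values for one color channel.
    a, b: row/column offsets of the paired neighbor (assumed defaults).
    """
    height, width = len(curr), len(curr[0])
    rows = []
    for i in range(height):
        for j in range(width):
            if curr[i][j] == prev[i][j]:
                continue                          # unchanged pixel, nothing to encode
            ni = min(max(i + a, 0), height - 1)   # assumption: clamp at the border
            nj = min(max(j + b, 0), width - 1)
            packed = (curr[i][j] << 8) | curr[ni][nj]   # 16-bit fixed length code
            # Most significant bit first: 16 binary variables for the network.
            rows.append(tuple((packed >> k) & 1 for k in range(15, -1, -1)))
    return rows


# Hypothetical 2x2 frames: two pixels change between them.
prev = [[10, 20], [30, 40]]
curr = [[11, 20], [30, 41]]
rows = binary_codes(prev, curr)
print(rows)
```

Each returned tuple is one observation of the 16 two-level variables over which the Bayesian Network is subsequently learned.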
C. Learning Bayesian Networks
Upon completion of Binary Code Construction for the individual Color Channels of the frame, our algorithm implementation utilizes a score-based algorithm for network learning. The Codes from the previous step directly map to the Vertices of the Bayesian Network. The algorithm first tries to learn the graphical structure of the Bayesian Network (Structure Learning algorithms) and then estimates the parameters of the local distribution. Structure Learning algorithms can be grouped into two categories, constraint-based algorithms and score-based algorithms.
Constraint-based algorithms learn the network structure by analyzing the Probabilistic Relations entailed by the Markov property of Bayesian Networks with Conditional Independence Tests and then constructing a graph which satisfies the corresponding D-separation statements. Score-based algorithms assign a score to each candidate Bayesian Network and try to maximize it with a Heuristic Search algorithm. Greedy Search algorithms (e.g. Hill-Climbing) are a common choice.
Algorithm 5 Learning Bayesian Network
function BayesNetworkLearn(BinCodes)
  Net ← HillClimbing(N, N_init, BinCodes, s, C)
  ConditionalProbTables ← CPT(Net)
  return ConditionalProbTables
end function
Partially Directed Acyclic Graphs contain arcs that may not have a specified direction, which makes the direction of Conditional Dependence ambiguous. Our implementation adopted a Hill-Climbing algorithm with 16 discrete variables, each with two levels. The learning algorithm was implemented in the R programming language and utilized the bnlearn package. The Hill-Climbing algorithm produces a Directed Graph, unlike other Structure Learning algorithms such as Grow-Shrink, Incremental Association and Max-Min Parents and Children, to name a few.
The network structure learned by constraint-based algorithms is equivalent to the one learned by score-based algorithms. Changing an arc without a specific direction to one with a direction results in networks with the same score because of the score equivalence property of the Bayesian Information Criterion. Therefore, if there is any prior information about the relationship between two nodes with an undirected arc, the appropriate direction can be whitelisted. Our algorithm performs efficiently and accurately and generates Conditional Probability Tables using score-based algorithms as described in Algorithm 1. However, an investigation of the best learning algorithm was not carried out, as it deviated from the main objective of this research.
D. Entropy Encoding
On discovering the optimal Bayesian Network for an individual Color Channel, the algorithm proceeds to encode the delta pixel values using Entropy Encoding. Existing lossless video compression techniques utilize Entropy Encoding predominantly for compression of residual pixel values that do not fit the Statistical Model. Our algorithm, on the other hand, relies primarily on Entropy Encoding techniques such as Huffman Encoding or Arithmetic Encoding.
Our experimentation consisted of a standard Huffman Encoding scheme for variable length code generation. Entropy Encoding techniques such as those discussed above rely on the Entropy of the data and therefore on the Probability Distribution of the data. Conventional solutions that incorporate Entropy Encoding store the table relating each fixed length Source Symbol to its generated variable length Code. Certain data distributions may benefit largely from such an approach. However, in the case of video compression, the table itself often occupies a considerable portion of the overall compressed file.
The above situation can be counteracted by developing a solution that effectively stores the Relation Table or the Data Distribution itself. The proposed algorithm stores the Joint Probability Distribution of the data in the form of Conditional Probability Tables of the learned Bayesian Network. The algorithm utilizes this exact information in re-computing the Joint Probability Distribution during the decompression (decode) phase.
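Re-computing the Joint Probability Distribution from stored Conditional Probability Tables follows directly from (1). The sketch below assumes binary variables and a hypothetical in-memory layout (`parents` and `cpts` dictionaries). The three-variable network is a made-up example, not one learned from video data.

```python
from itertools import product


def joint_from_cpts(parents, cpts):
    """Expand a Bayesian Network over binary variables into its joint distribution.

    parents: {var: tuple of parent vars}
    cpts:    {var: {parent_values: P(var = 1 | parent_values)}}
    Returns {assignment tuple: probability}, the product of (1) per assignment.
    """
    variables = sorted(parents)
    joint = {}
    for values in product((0, 1), repeat=len(variables)):
        assign = dict(zip(variables, values))
        p = 1.0
        for v in variables:
            p_one = cpts[v][tuple(assign[u] for u in parents[v])]
            p *= p_one if assign[v] == 1 else 1.0 - p_one
        joint[values] = p
    return joint


# Hypothetical 3-bit network: x0 -> x1, x2 has no parents.
parents = {0: (), 1: (0,), 2: ()}
cpts = {
    0: {(): 0.5},
    1: {(0,): 0.25, (1,): 0.75},
    2: {(): 0.5},
}
joint = joint_from_cpts(parents, cpts)
stored = sum(len(table) for table in cpts.values())
print(f"{stored} stored parameters expand to {len(joint)} joint entries")
```

Here 4 stored parameters reproduce all 8 joint entries. For 16 binary variables the full table has 65,536 entries, while the number of stored parameters depends only on how many parents each variable has.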
Algorithm 6 Huffman Encoding
// Construct the binary tree. Start with a forest of one-node trees.
// Merge until only one remains.
define C // 16-bit input set
define Q // priority queue
define f() // frequency associated with node
// If C is a leaf, f(C) is the number of times character C appears in input set.
// If C is an internal node, f(C) is the sum of the frequencies of its children
function HUFFMANTREE(C, Q)
  n ← |C|
  Q ← C
  for i = 1 to n − 1 do
    z ← allocate-node()
    x ← left(z) ← extract-min(Q)
    y ← right(z) ← extract-min(Q)
    f(z) ← f(x) + f(y)
    insert(Q, z) // Return the merged tree to the queue
  end for
  return extract-min(Q)
end function
The proposed algorithm uses a multi-state encoding approach to handle variable length codes generated for low probability fixed length Source Symbols. Symbols with low Joint Probability can lead to the generation of deep and wide Huffman Trees, producing Codes that are longer than the fixed length Source Symbols. Generated Codes that are longer than the Symbols are rejected, and the corresponding Symbols are encoded in their original fixed length form.
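Algorithm 6 and the rejection rule can be sketched with Python's standard `heapq` module. The function names, the toy probabilities and the small `fixed_bits` value used in the example are illustrative assumptions. How rejected Symbols are flagged in the bit stream is not specified here and is left out of the sketch.

```python
import heapq
from itertools import count


def huffman_codes(probs):
    """Build {symbol: bitstring} from {symbol: probability}.

    Symbols must not be lists, because lists mark internal tree nodes.
    """
    tiebreak = count()   # deterministic ordering for equal probabilities
    heap = [(p, next(tiebreak), sym) for sym, p in sorted(probs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), [left, right]))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, list):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


def split_rejected(codes, fixed_bits=16):
    """Separate usable codes from symbols whose code exceeds the fixed length."""
    kept = {s: c for s, c in codes.items() if len(c) <= fixed_bits}
    rejected = sorted(s for s, c in codes.items() if len(c) > fixed_bits)
    return kept, rejected


# Hypothetical distribution over four symbols.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
codes = huffman_codes(probs)
kept, rejected = split_rejected(codes, fixed_bits=2)   # tiny limit for the demo
print(codes, rejected)
```

The explicit tie-breaking counter matters for this scheme: the Encoder and Decoder must resolve equal probabilities identically so that both arrive at the same tree.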
E. Decoding
Our compression technique consists of a relatively simple decoding process. Unlike previous Entropy Encoding techniques, wherein the Code Word Table is compressed along with the original file, our algorithm reconstructs the table during the decompression process.
During decompression of the video file at the destination, the algorithm re-computes the Joint Probability Distribution of the data from the Conditional Probability Tables encoded in the form of Bayesian Networks. Subsequently, a Huffman Tree is generated that follows the original Huffman Tree, which was generated during encoding of the video file.
Because the Conditional Probability Tables of the Bayesian Network occupy less space than the Code Word Table, the overall storage requirement is lower than when the table itself is stored.
Algorithm 7 Decoding Video Streams
function DECODER(numOfFrames)
  for iterate over frames do
    BayesCPT ← Decode(frame)
    HuffCodes ← HuffmanTree(BayesCPT)
    Decompress(HuffCodes)
  end for
end function
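Once the Decoder has regenerated the code table, decompression reduces to walking the bit stream and matching code words. The sketch below assumes a prefix-free table is already available. The table shown is a hypothetical stand-in for the one rebuilt from the Conditional Probability Tables, and the function names are illustrative.

```python
def encode_symbols(symbols, codes):
    """Concatenate the code words of a symbol sequence into one bitstring."""
    return "".join(codes[s] for s in symbols)


def decode_bits(bits, codes):
    """Decode a bitstring with a prefix-free table {symbol: bitstring}."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:        # prefix-free: the first match is the symbol
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code word")
    return out


# Hypothetical table standing in for the regenerated Huffman codes.
codes = {"a": "0", "b": "10", "c": "110", "d": "111"}
message = list("abacad")
bits = encode_symbols(message, codes)
decoded = decode_bits(bits, codes)
print(bits, decoded == message)
```

The round trip only succeeds when both sides hold the same table, which is why the Decoder must derive it from exactly the probabilities the Encoder used.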
III. EXPERIMENTS
A. Settings
Experiments were conducted on a set of raw uncompressed video files selected from the SVT High Definition Multi-Format Test Set [14]. Video datasets consisted of details of varying subjects, foreground motion and background motion. Our motivation was to not only establish the potential of the proposed compression technique, but also to compare it with existing compression techniques.
For each of the test video datasets, we compared the proposed technique (labeled as BayesianCompress) with five of the most widely used techniques: Gzip - a general file compression and decompression software; FFV1 - a lossless intra-frame video codec; H.264 or MPEG-4 Part 10 - a block-oriented motion-compensation based video compression standard (Constant Rate Factor of 0); Dirac - a format that aims to provide high-quality video compression for Ultra HDTV and beyond and as such competes with existing formats such as H.264; JPEG2000 - an intra-frame compression format based on the Discrete Wavelet Transform.
For the five existing techniques, we utilized the FFmpeg Multimedia Framework to decode raw video streams and encode them in their respective codecs. In our implementation, all Pixel Transformation and Entropy Coding processes were written in the C programming language and compiled using the GNU Compiler Collection (GCC). Structure Learning of the Bayesian Network was implemented using the bnlearn and gRain packages in the R programming language.
TABLE I EXPERIMENT RESULTS
| Parameter        | blue sky | rush hour | station | tractor | Avg. Ratio (25 fps) | crowd run | into tree | old town | Avg. Ratio (50 fps) |
| Frame Rate (fps) | 25       | 25        | 25      | 25      | -                   | 50        | 50        | 50       | -                   |
| # of Frames      | 217      | 500       | 313     | 690     | -                   | 500       | 500       | 500      | -                   |
| Codec            | Compression Ratio |
| Gzip             | 1.85     | 2.03      | 1.90    | 1.67    | 1.86                | 1.25      | 1.57      | 1.54     | 1.45                |
| FFV1             | 2.74     | 3.19      | 2.69    | 2.74    | 2.84                | 1.88      | 2.04      | 1.97     | 1.96                |
| H.264            | 2.68     | 3.04      | 2.69    | 2.65    | 2.77                | 2.03      | 2.26      | 2.03     | 2.11                |
| Dirac            | 2.67     | 2.90      | 2.68    | 2.56    | 2.70                | 1.90      | 2.18      | 2.01     | 2.03                |
| JPEG2000         | 2.62     | 3.20      | 2.69    | 2.73    | 2.81                | 1.84      | 2.03      | 1.95     | 1.94                |
| BayesianCompress | 2.69     | 3.13      | 2.80    | 2.88    | 2.87                | 1.95      | 2.13      | 2.01     | 2.03                |
B. Observations
Table I illustrates the performance of existing and proposed compression techniques on a subset of selected video datasets. To determine the viability of using the proposed technique, we used H.264 as the baseline for comparison. Our experimental setting consisted of multiple test video datasets.
Observing the results of the experimental runs, we notice that the difference between BayesianCompress and H.264 compressed files is marginal for the test videos recorded at 25 frames per second (fps). The proposed technique outperforms the H.264 codec on test videos such as rush hour, station and tractor. From the average compression ratios at 25 fps, we see that our proposed algorithm performs the best in comparison to the other tested techniques.
At frame rates of 50 fps, however, the average compression ratios show that H.264 performs better than our algorithm. More rigorous testing on a larger test set is required to validate the apparent trend toward lower compression ratios for higher frame rate videos when utilizing our algorithm.
Another parameter that strongly dictates the usability of a compression technique is the computational resources required. While the requirement of running in real time severely constrains the proposed compression model, the future research discussed in the next section is intended to enable feasible deployment in real-life applications.
IV. CONCLUSIONS
Although there have been past efforts to utilize Bayesian Networks for compression of other forms of data [4] [5] [6], they mainly rely on correlations between data values in the rows of the dataset. This rigid approach greatly hinders the usage of such compression techniques, specifically for video data streams. In this paper, we aimed to exploit spatial and temporal redundancy of video streams. We introduced a novel compression technique that uses Correlation and Conditional Dependencies of Pixels in the independent Color Axes of a video stream. We experimentally demonstrated that our algorithm achieves comparable performance vis-a-vis other techniques on the SVT High Definition Test set. The proposed algorithm performed on average better than state-of-the-art techniques at a frame rate of 25 fps, while at a frame rate of 50 fps it performed on average slightly behind H.264, the current industry standard.
Furthermore, more effective Entropy Coding techniques such as Arithmetic Coding can be incrementally incorporated. Performance can be further optimized by investigating the use of hardware accelerators (FPGA, ASIC or Co-processor / Multi-Core) to enable parallel execution of the algorithm. Future research areas include the implementation of a custom Bayesian Network learning algorithm specifically for the proposed video compression technique to improve its overall accuracy and performance.
REFERENCES
[1] Cisco. The Zettabyte Era: Trends and Analysis. Cisco, June 02nd, 2016
[2] Cisco Visual Networking Index: Global Fixed and Mobile Internet Traffic Forecasts 2015-2020. Cisco, Jun 06th 2016
[3] Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update 2016-2021. Cisco, Feb 09th 2017
[4] Brendan J. Frey. Bayesian Networks for Pattern Classification, Data Compression and Channel Coding. Ph.D. Thesis, Univ. Toronto, Toronto, ON, Canada, 1997
[5] S. Davies and A. Moore. Bayesian networks for lossless dataset compression. In Conference on Knowledge Discovery in Databases (KDD), 1999
[6] S. Davies. Fast Factored Density Estimation and Compression With Bayesian Networks. Ph.D. Thesis, School of Computer Science, Carnegie Mellon University, 2002.
[7] D. Grossman, P. Domingos. Learning Bayesian Network Classifiers by Maximizing Conditional Likelihood. In International Conference on Machine Learning. ACM, 2004.
[8] R.E. Neapolitan. Learning Bayesian Networks. Pearson Prentice Hall, Upper Saddle River, NJ, 2004.
[9] Differential Pulse Code Modulation (DPCM), http://einstein.informatik.uni-oldenburg.de/rechnernetze/dpcm.htm
[10] Rongkai Zhao et al. Fast Near-Lossless or Lossless Compression of Large 3D Neuro-Anatomical Images
[11] Tony Robinson. SHORTEN: Simple lossless and near lossless waveform compression
[12] Artificial Intelligence: A Modern Approach. Pg. 461-565
[13] John Skilling. Nested Sampling for General Bayesian Computation
[14] SVT High Definition Multi Format Test Set, ftp://vqeg.its.bldrdoc.gov/HDTV/SVT MultiFormat/SVT MultiFormat v10.pdf