
Multimedia Encryption

Section 4.4: MULTIMEDIA ENCRYPTION SCHEMES

4.4.2 Selective Encryption

In contrast to full encryption, which encrypts all data except headers, selective encryption encrypts only part of the data. It exploits a characteristic of compression: some data in a bitstream are less important than, or dependent on, other data. The remaining data are left unencrypted. Selective encryption is usually lightweight, trading content leakage and security for reduced processing complexity. Many selective encryption schemes have been proposed in the literature. Although they all rely on the same principle, each scheme is designed for a specific type of bitstream so that it can exploit the inherent properties of the codec that generated the bitstream. The schemes also differ in which portions of data they select, how they encrypt them, and the domain in which they operate. Most selective encryption schemes focus on perceptual degradation rather than content secrecy. A partial list of selective encryption schemes, together with analyses of their performance and security, has been reviewed in the literature [17, 23–25]. We describe representative selective encryption schemes in this section.

¹ The overhead is usually slightly larger than the cipher's block size, since partition information may be needed to identify an IV in the header field.

For ease of description, we divide selective encryption schemes into four categories: three according to media type (selective encryption for images, for video, and for audio) and a fourth, perceptual encryption. Readers should be aware that, because image and video compression technologies are similar, an encryption scheme designed for one visual type may be applied to the other, when it is encoded with similar technologies, with little or no modification. For example, the similarity between JPEG and MPEG may allow a scheme originally designed for MPEG to be applied equally to JPEG images once the parts that exploit MPEG-specific features are dropped. Readers should also be warned that extending a scheme designed for one type to a different type without modification may cause unexpected content leakage. For example, when a secure image encryption scheme is applied to each frame of a video sequence, some content information may still leak because of inter-frame correlation. This was demonstrated in an experiment with MPEG-4 FGS reported by Zhu and co-workers [20, 26], in which the base layer was encrypted with a full encryption scheme and the enhancement layer was left unencrypted. Examined individually, each frame looked like a purely random image, and no information about the content could be extracted. Yet when the encrypted sequence was played as a video, with all pixel values of each base-layer frame set to a fixed value, the contours and trajectories of moving objects were readily visible, and viewers could easily identify the moving objects and what they were doing.

Selective Image Encryption

Selective image encryption encrypts part of the image data. Depending on the compression technology it works with, the selected data can be Discrete Cosine Transform (DCT) coefficients for DCT-based compression, wavelet coefficients and the quadtree structure of the wavelet decomposition for wavelet-based compression, or important pixel bits (bitplanes) in the spatial domain. These schemes are described next.

Selective Encryption on DCT Coefficients. DCT coefficients in JPEG compression can be selected for selective encryption. One method is to encrypt the bitstream of the leading DCT coefficients in each DCT block [27]. Cheng and Li [28] argue, however, that such a scheme may not provide enough perceptual degradation, since higher-order DCT coefficients carry edge information: according to them, object outlines remain visible even when the encrypted part exceeds 50% of the total size of a JPEG compressed image. The opposite scheme [29] encrypts all DCT coefficients except the DC coefficient, or except the DC coefficient plus the low-frequency AC coefficients, in each block of a JPEG compressed image. This is achieved by leaving untouched the portion encoded with a variable-length code (VLC) from the Huffman table and encrypting the subsequent bits that specify the sign and magnitude of each non-zero AC coefficient. The authors argued that the DC coefficients carry important visible information and are highly predictable. However, the scheme leaves substantial visible information in an encrypted JPEG image. It may be useful in applications where perceptual encryption is desired; perceptual encryption is described later in this section.
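The idea of keeping the Huffman-coded part and encrypting only the sign and magnitude bits can be sketched in Python. In JPEG, each non-zero AC coefficient is coded as a Huffman-coded run/size symbol followed by "additional bits" that hold its sign and magnitude. The sketch below XORs only those additional bits with a keystream, so every coefficient keeps its size category and the Huffman codes stay valid. It is a minimal illustration, not the code of [29]: it works on already-quantized coefficients in zigzag order rather than on the entropy-coded bitstream, the SHAKE-256 keystream is an assumption standing in for a real cipher, and all function names are hypothetical.

```python
import hashlib


def jpeg_size(v: int) -> int:
    """Size category (SSSS) of a coefficient: number of bits in |v|."""
    return abs(v).bit_length()


def to_amplitude_bits(v: int) -> int:
    """JPEG additional bits: v if v > 0, else the one's complement v + 2**s - 1."""
    s = jpeg_size(v)
    return v if v > 0 else v + (1 << s) - 1


def from_amplitude_bits(bits: int, s: int) -> int:
    """Inverse mapping: a leading 1 bit means positive, a leading 0 bit negative."""
    if bits >> (s - 1):
        return bits
    return bits - (1 << s) + 1


def keystream_bits(key: bytes, nbits: int):
    """Stand-in keystream (assumption): SHAKE-256 output read bit by bit."""
    data = hashlib.shake_256(key).digest(max(1, (nbits + 7) // 8))
    for i in range(nbits):
        yield (data[i // 8] >> (7 - i % 8)) & 1


def xor_ac_amplitudes(block: list[int], key: bytes) -> list[int]:
    """Encrypt (or decrypt) one 8x8 block given as 64 zigzag-ordered values.

    block[0] is the DC coefficient and stays in the clear; zero AC values
    (the runs) are untouched. Each size category is preserved, so the same
    call with the same key decrypts.
    """
    total = sum(jpeg_size(c) for c in block[1:] if c != 0)
    ks = keystream_bits(key, total)
    out = [block[0]]
    for c in block[1:]:
        if c == 0:
            out.append(0)
            continue
        s = jpeg_size(c)
        mask = 0
        for _ in range(s):
            mask = (mask << 1) | next(ks)
        out.append(from_amplitude_bits(to_amplitude_bits(c) ^ mask, s))
    return out
```

Because the size categories and zero runs are unchanged, the run/size symbols and hence the Huffman-coded part of the stream are identical before and after encryption, which is what lets the encrypted file remain decodable while the DC coefficient still exposes the coarse image.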

Selective Encryption for Wavelet Image Compression. Selective encryption algorithms have also been proposed for other image compression technologies. Wavelet compression algorithms based on zerotrees [30, 31] encode information hierarchically according to its importance. This is well suited to selective encryption, since the intrinsic dependencies in the compressed bitstream mean that encrypting a small amount of important data renders the remaining unencrypted data useless. Li and co-workers [32, 33] propose a partial encryption scheme for the Set Partitioning In Hierarchical Trees (SPIHT) compression algorithm [31] that encrypts only the bits related to the significance of pixels and sets in the two highest pyramid levels, together with the parameter n that determines the initial threshold. Without the significance information for the pixels and sets in the two highest pyramid levels, it is difficult for a cryptanalyst to determine the meaning of the unencrypted bits. This scheme does not affect SPIHT's compression performance, but it requires deep parsing of the compressed bitstream for both encryption and decryption. The same authors have also extended the scheme to SPIHT-based video compression [32, 33], encrypting motion vectors, intra-frames, and the residual error frames after motion compensation; intra-frames and residual error frames are encrypted with their SPIHT-based image encryption scheme.
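To make the encrypted items more concrete, the sketch below computes the SPIHT parameter n, which fixes the initial threshold 2^n, and the significance bit of each coefficient at that threshold. Bits of this kind, for the two highest pyramid levels, are what the scheme of [32, 33] encrypts. It is a minimal sketch assuming integer-valued transform coefficients; it shows only the first significance test for individual coefficients, not set significance or the full SPIHT sorting and refinement passes, and the function names are hypothetical.

```python
def spiht_n(coeffs: list[int]) -> int:
    """n = floor(log2(max |c|)); the initial SPIHT threshold is 2**n."""
    peak = max(abs(c) for c in coeffs)
    if peak == 0:
        raise ValueError("all coefficients are zero")
    return peak.bit_length() - 1  # floor(log2(peak)) for a positive integer


def significance_bits(coeffs: list[int], n: int) -> list[int]:
    """1 if |c| >= 2**n (significant at threshold 2**n), else 0."""
    threshold = 1 << n
    return [1 if abs(c) >= threshold else 0 for c in coeffs]
```

For coefficients [3, -17, 5, 16], n is 4 and the first-pass significance bits are [0, 1, 0, 1]. Without n and these bits, a decoder cannot tell which threshold the later bits refer to or which positions they belong to.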

Uehara, Safavi-Naini, and Ogunbona [34] propose a secret permutation of the wavelet coefficients in each subband of a wavelet compressed image, which can be strengthened by also encrypting the lowest subband with a cipher. An alternative scheme, proposed in Lian and Wang [35], permutes wavelet coefficients among the child nodes that share the same parent node in the quadtree structure of the image's wavelet decomposition. Zeng and Lei [36, 37] propose permuting blocks of wavelet coefficients, together with selective encryption of other data, for video encryption (see the Selective Video Encryption section for details). For wavelet packet image compression algorithms, the schemes proposed in Uhl and co-workers [38–40] encrypt the quadtree subband decomposition structure. The security of these permutation schemes is not very high because of the correlation of wavelet coefficients across subbands and the inhomogeneous energy distribution within a subband. For example, in a high-frequency subband, wavelet coefficients corresponding to a textured area have significantly larger magnitudes than those corresponding to a smooth area, and this information can be used to deduce the secret permutation applied to the wavelet coefficients. Pommer and Uhl [38] show that encrypting the quadtree subband decomposition structure of wavelet packet compression algorithms is not secure enough against ciphertext-only attacks if uniform scalar quantization is used. These schemes are also vulnerable to known-plaintext attacks whenever such attacks can be mounted. Selective encryption for JPEG 2000 that maintains syntax compliance is described in Section 4.4.4.
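Why in-subband permutation is weak against known-plaintext attacks can be seen in a few lines. The sketch permutes one subband's coefficients with a key-derived permutation, in the spirit of [34], and then recovers the whole permutation from a single plaintext/ciphertext pair. It is a minimal illustration under stated assumptions: Python's `random.Random` is not cryptographically secure and only stands in for a proper keyed pseudo-random permutation, the subband is a flat list, and the helper names are hypothetical.

```python
import random


def keyed_permutation(n: int, key: str) -> list[int]:
    """Secret permutation of n positions derived from a key.

    random.Random is NOT cryptographically secure; it stands in for a proper
    keyed pseudo-random permutation in this sketch.
    """
    perm = list(range(n))
    random.Random(key).shuffle(perm)
    return perm


def permute(coeffs: list[int], perm: list[int]) -> list[int]:
    """Output position i receives input coefficient perm[i]."""
    return [coeffs[p] for p in perm]


def unpermute(coeffs: list[int], perm: list[int]) -> list[int]:
    """Invert permute(): put each coefficient back at its original position."""
    out = [0] * len(coeffs)
    for i, p in enumerate(perm):
        out[p] = coeffs[i]
    return out


def recover_permutation(plain: list[int], cipher: list[int]) -> list[int]:
    """Known-plaintext attack: if the subband's values are distinct, a single
    plaintext/ciphertext pair reveals the whole permutation."""
    position = {v: i for i, v in enumerate(plain)}
    return [position[v] for v in cipher]
```

The permutation also leaves the multiset of coefficient values, and hence each subband's energy, unchanged; that preserved side information is what the magnitude-based attacks described above exploit.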

Selective Encryption in the Spatial Domain. Selective encryption can also be applied in the spatial domain. A simple approach is to encrypt bitplanes of an image [29, 41, 42], either before compression or with no compression at all. Since a more significant bitplane carries more visual information than a less significant one, it is natural to encrypt from the most significant bitplane toward the least significant. Podesser and co-workers [41, 42] reported that, for a gray-scale image with 8 bitplanes, encrypting the most significant bitplane still leaves some structural information visible, encrypting the two most significant bitplanes leaves no visible structure in the directly decompressed image, and encrypting the four most significant bitplanes provides high confidentiality. Van Droogenbroeck and Benedett [29] propose selective encryption in the reverse order, from the least significant bitplane to the most significant, to provide perceptual encryption.² These authors report that at least four or five of the eight bitplanes need to be encrypted to produce visible degradation. When combined with compression, these schemes incur a significant loss of compression efficiency, because encryption is applied before compression and changes the statistical properties of the image data that a compression algorithm exploits.
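Bitplane encryption in either order reduces to XORing a masked keystream into the pixel values. The sketch below handles 8-bit gray-scale pixels as a flat list and encrypts either the k most significant bitplanes (the order studied in [41, 42]) or the k least significant ones (the reverse order of [29]). It is a minimal sketch: the SHAKE-256 keystream is an assumption standing in for whatever cipher a real system would use, and the function name is hypothetical.

```python
import hashlib


def xor_bitplanes(pixels: list[int], k: int, key: bytes,
                  msb_first: bool = True) -> list[int]:
    """XOR k bitplanes of 8-bit pixels with a keystream.

    msb_first=True encrypts the k most significant bitplanes;
    msb_first=False encrypts the k least significant ones.
    The keystream (SHAKE-256 of the key) is an assumption.
    XOR makes the operation its own inverse.
    """
    if not 0 <= k <= 8:
        raise ValueError("k must be between 0 and 8")
    # Mask selecting which bitplanes of each pixel are encrypted.
    plane_mask = ((0xFF << (8 - k)) & 0xFF) if msb_first else (1 << k) - 1
    ks = hashlib.shake_256(key).digest(max(1, len(pixels)))  # one byte per pixel
    return [p ^ (b & plane_mask) for p, b in zip(pixels, ks)]
```

Running this before a compressor replaces the top (or bottom) planes with noise-like bits, which is exactly the change in data statistics that causes the compression-efficiency loss noted above.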

Quadtree image compression can be regarded as a spatial-domain compression approach: starting with the image itself as the initial block, it partitions blocks recursively to form a quadtree structure [43–46]. The parameters attached to each leaf node describe the corresponding block. Quadtree compression is computationally efficient. For quadtree image compression in which a single parameter, the average intensity of the corresponding block, is associated with each leaf node, Cheng and co-workers [32, 33] propose a selective encryption scheme that encrypts only the quadtree structure and leaves the leaf-node parameters unencrypted. The unencrypted leaf-node values have to be transmitted in some order. According to the authors, in-order traversal of the quadtree is insecure, since certain properties of that ordering make it susceptible to cryptanalysis. They recommend an ordering that encodes the leaf-node values one level at a time, from the highest level to the lowest.
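The scheme can be sketched as a quadtree coder that emits structure bits (1 = split, 0 = leaf) and one mean-intensity value per leaf, both in breadth-first order, i.e. one level at a time from the root down, followed by encryption of the structure bits only. This is a minimal sketch, not the coder of [32, 33]: the split rule (split while max minus min exceeds a tolerance), the square power-of-two image size, the SHAKE-256 keystream, and all names are assumptions.

```python
import hashlib
from collections import deque


def _children(x: int, y: int, s: int):
    h = s // 2
    return [(x, y, h), (x + h, y, h), (x, y + h, h), (x + h, y + h, h)]


def quadtree_encode(img: list[list[int]], tol: int = 0):
    """Quadtree-code a square 2**m x 2**m image.

    Returns (structure_bits, leaf_values); both are produced breadth-first,
    one level at a time from the root downward. Split rule (assumption):
    split a block while max - min > tol. Each leaf stores the integer mean.
    """
    bits, leaves = [], []
    queue = deque([(0, 0, len(img))])
    while queue:
        x, y, s = queue.popleft()
        block = [img[r][c] for r in range(y, y + s) for c in range(x, x + s)]
        if s > 1 and max(block) - min(block) > tol:
            bits.append(1)
            queue.extend(_children(x, y, s))
        else:
            bits.append(0)
            leaves.append(sum(block) // len(block))
    return bits, leaves


def quadtree_decode(bits, leaves, n: int) -> list[list[int]]:
    """Rebuild an n x n image. With wrong structure bits, leaf values are
    placed in the wrong blocks or decoding fails."""
    img = [[0] * n for _ in range(n)]
    bit_it, leaf_it = iter(bits), iter(leaves)
    queue = deque([(0, 0, n)])
    while queue:
        x, y, s = queue.popleft()
        if next(bit_it):
            queue.extend(_children(x, y, s))
        else:
            v = next(leaf_it)
            for r in range(y, y + s):
                for c in range(x, x + s):
                    img[r][c] = v
    return img


def xor_structure(bits: list[int], key: bytes) -> list[int]:
    """Encrypt only the structure bits, using the low bit of one SHAKE-256
    keystream byte per structure bit (stand-in cipher, assumption)."""
    ks = hashlib.shake_256(key).digest(max(1, len(bits)))
    return [b ^ (k & 1) for b, k in zip(bits, ks)]
```

For a 4×4 test image with one uniform 2×2 quadrant split no further and one non-uniform quadrant split into single pixels, the leaf values come out as the three large blocks first and the four single pixels last, which is the level-by-level ordering recommended above; the leaf list itself is sent in the clear.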

Selective Video Encryption

For compressed video such as MPEG video, the partial data selected for encryption can be headers, frames, macroblocks, DCT coefficients, motion vectors, and so on. Since MPEG-1 and MPEG-2 are two widely used video coding standards, most selective video encryption schemes, especially the early ones, were designed for MPEG-1 and MPEG-2.

Header Encryption. Headers are encrypted at the lowest of the four levels of encryption in SECMPEG, a modified MPEG bitstream format proposed by Meyer and Gadegast in 1995 [47] that incorporates selective encryption and additional header information. On the one hand, header encryption has low complexity, because headers are easy to parse and only a small fraction of the data is encrypted. On the other hand, it prevents basic information about the video from being extracted from the encrypted bitstream, which makes adaptation impossible without decryption. Moreover, reconstructing headers in MPEG-1 is relatively simple [17, 48], since MPEG-1 headers are simple and vary little within a codestream. The security of header encryption is therefore low.

Prediction-Based Selective Encryption. In MPEG video coding, predictive coded frames (P-frames) and bidirectionally predictive coded frames (B-frames) are predicted from intra coded frames (I-frames). Without knowledge of the corresponding I-frames, P- and B-frames cannot be decoded and are therefore useless. This idea underlies the selective encryption schemes proposed by Maples and Spanos [49] and Li et al. [50], in which only I-frames are encrypted. How much this approach reduces the data to be encrypted depends on the frequency of I-frames. Since an I-frame contains substantially more bits than a P- or B-frame, a large fraction of the data still needs to be encrypted unless I-frames appear very infrequently, which may have other undesirable consequences, such as a long delay when switching channels or prolonged perceptual distortion when packet loss occurs.
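A back-of-the-envelope calculation shows why encrypting only I-frames saves less than one might expect. The function below assumes a fixed group of pictures (GOP) containing one I-frame and gop_len - 1 predicted frames of equal average size; the sizes used in the usage note are hypothetical ratios chosen for illustration, not measurements from [49, 50], and the function name is hypothetical.

```python
def i_frame_fraction(gop_len: int, i_bits: float, pb_bits: float) -> float:
    """Fraction of bits that lie in I-frames (and are therefore encrypted)
    for one GOP with a single I-frame and gop_len - 1 P/B-frames whose
    average size is pb_bits."""
    if gop_len < 1:
        raise ValueError("gop_len must be at least 1")
    return i_bits / (i_bits + (gop_len - 1) * pb_bits)
```

With the illustrative values `i_frame_fraction(12, 5, 1)`, a 12-frame GOP whose I-frame is five times larger than an average P- or B-frame still puts 5/16 = 0.3125 of the bits under encryption; only a longer GOP lowers the fraction, with the side effects noted above.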

A method to reduce the amount of data to be encrypted in an I-frame is proposed in Qiao and Nahrstedt [51, 52]. It partitions the bytes of each I-frame in an MPEG bitstream into two halves: one consisting of the bytes at odd indices and the other consisting of the bytes at even indices. The XOR of the two halves replaces the first half, and the second half is encrypted with a standard encryption algorithm such as DES. In effect, the DES-encrypted half of the bitstream serves as a one-time pad for the other half. The low correlation between bytes in an MPEG bitstream makes this approach quite secure, since MPEG compression effectively removes most of the correlation in a video sequence. Some correlation nevertheless remains after compression, because a practical compression scheme cannot achieve the theoretical rate-distortion limit; in addition, the headers and synchronization markers inserted into the bitstream introduce further correlation. The approach achieves only a modest reduction in complexity, since half of the bytes are still encrypted with DES. Tosun and Feng [53] extend the method by applying the algorithm again to the second half, so that only a quarter of the bytes are encrypted with DES. Doing so lowers security, since the one-time pad is no longer used only once.
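The byte-splitting step of Qiao and Nahrstedt can be sketched as follows. Two assumptions are made loudly here: DES is replaced by a hypothetical stand-in (XOR with a SHAKE-256 keystream) so the example needs only the standard library, and the "odd" half is taken to be the 1st, 3rd, 5th, ... bytes (1-based counting), since which half counts as odd depends on the indexing convention. The sketch also requires an even number of bytes, and all names are hypothetical.

```python
import hashlib


def stand_in_cipher(key: bytes, data: bytes) -> bytes:
    """Placeholder for DES (assumption): XOR with a SHAKE-256 keystream.
    XOR is its own inverse, so the same call also decrypts."""
    ks = hashlib.shake_256(key).digest(max(1, len(data)))
    return bytes(a ^ b for a, b in zip(data, ks))


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def qn_encrypt(frame: bytes, key: bytes) -> tuple[bytes, bytes]:
    """Return (odd XOR even, E(even)) for the bytes of one I-frame."""
    if len(frame) % 2:
        raise ValueError("this sketch assumes an even number of bytes")
    odd, even = frame[0::2], frame[1::2]  # 1st, 3rd, ... and 2nd, 4th, ... bytes
    first = xor_bytes(odd, even)          # replaces the first half; no cipher call
    second = stand_in_cipher(key, even)   # only half the bytes go through the cipher
    return first, second


def qn_decrypt(first: bytes, second: bytes, key: bytes) -> bytes:
    """Decrypt the even half, then use it as the pad to recover the odd half."""
    even = stand_in_cipher(key, second)
    odd = xor_bytes(first, even)
    frame = bytearray(2 * len(odd))
    frame[0::2] = odd
    frame[1::2] = even
    return bytes(frame)
```

Anyone who learns the even half in plaintext immediately recovers the odd half, which is why the scheme depends on the low byte correlation discussed above. Splitting the even half once more before encrypting it would roughly correspond to the quarter-encryption extension of Tosun and Feng [53].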

Selective encryption of I-frames suffers from content leakage. Agi and Gong [54] have shown that some scene content is perceptible in the decoded P- and B-frames even in the absence of I-frames, especially for a video sequence with a high degree of motion. This leakage is partly due to inter-frame correlation but comes mainly from the unencrypted I-blocks in the P- and B-frames. Certain multimedia applications, such as pay-per-view, may tolerate such leakage, since the perceptual quality of the reconstructed sequence is still quite low. For applications requiring higher security, a simple remedy is to encrypt the I-blocks in P- and B-frames in addition to I-frames [47, 54]. Increasing the I-frame frequency may also mitigate the leakage, at the cost of lower compression efficiency. Another remedy [55] is to encrypt all I-macroblocks in all frames, as well as the headers of all predicted macroblocks, at a substantial increase in complexity: about 40–79% of the total data must be encrypted. The complexity can be reduced, while still achieving adequate security, by encrypting only every other I-macroblock and predicted-macroblock header, which brings the encrypted portion down to 18–40% of the total bits [56].

Selective Encryption on DCT Coefficients and Motion Vectors. DCT coefficients and motion vectors are also selected for encryption in selective encryption schemes. One approach is to encrypt selected or all sign bits of the DCT coefficients and motion vectors of MPEG video. The Video Encryption Algorithm (VEA) proposed in Shi and Bhargava [57] uses a secret key to randomly flip the sign bits of all DCT coefficients. This is achieved by XORing the sign bits with a keystream constructed by repeating a pseudo-randomly generated bitstream of length m. The security of this scheme is very low: it is vulnerable to known-plaintext attacks, and because the keystream repeats, even a simple ciphertext-only attack may break it. A variation randomly flips the sign bits of the DC coefficients of I-frames and the sign bits of the motion vectors of P- and B-frames [58]; this variation is even weaker than VEA [59]. Shi, Wang, and Bhargava [60] improve security by using a block cipher to encrypt the sign bits of DCT coefficients, up to 64 bits per macroblock, selected from low to high frequencies. These sign bits are extracted from each macroblock, encrypted with a block cipher, and placed back in their corresponding positions. All of these sign-bit schemes nevertheless have weak security: the search space is not large enough, and a brute-force attack is feasible, especially for video compressed at low bitrates. For example, when MPEG-4 base layer compression is applied, with AC coefficient prediction turned on, to Quarter Common Intermediate Format (QCIF) video sequences with a frame size of 176×144, “Miss America” has on average 1 non-zero AC coefficient per 8×8 block at 30 kbps, and “Coast Guard” has on average 4.3 non-zero AC coefficients per block at 100 kbps [20, 26]. Since each non-zero AC coefficient contributes one sign bit, the sign bits of the AC coefficients in a block can on average generate only 2^1 = 2 states in the first case and about 2^4.3 ≈ 20 states in the second. These schemes may also suffer from error-concealment-based attacks [11]. Nonetheless, they may be applied where degradation rather than secrecy is the main concern. Their main disadvantage is the need to parse fairly deeply into the compressed bitstream during both encryption and decryption, and parsing and extracting the sign bits may take a substantial fraction of the total computation time. Experiments reported in Zhu et al. [20, 26] show that, once the time spent extracting data and placing it back is included, full encryption of all the video data in a frame with a fast cipher is much faster than a selective scheme that extracts sign bits and other data, encrypts them, and places them back in their corresponding positions.
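The weakness of a periodic sign-flipping keystream is easy to see in code. The sketch applies VEA-style sign flipping to a list of quantized DCT coefficients, repeating a key of m bits as the text describes, and then recovers that key from a single known plaintext/ciphertext pair. It is a simplified sketch under stated assumptions: it works on coefficient values rather than on an MPEG bitstream, the key bits are passed in directly instead of being generated pseudo-randomly, and the function names are hypothetical.

```python
def vea_flip_signs(coeffs: list[int], key_bits: list[int]) -> list[int]:
    """Flip the sign of each non-zero coefficient where the periodic key bit is 1.

    The keystream is key_bits repeated with period m = len(key_bits).
    Zero coefficients carry no sign bit and are skipped. The operation is
    its own inverse.
    """
    m = len(key_bits)
    out, j = [], 0
    for c in coeffs:
        if c == 0:
            out.append(0)
            continue
        out.append(-c if key_bits[j % m] else c)
        j += 1
    return out


def recover_key(plain: list[int], cipher: list[int], m: int) -> list[int]:
    """Known-plaintext attack: any stretch with at least m non-zero
    coefficients reveals the whole periodic key."""
    key = [0] * m
    j = 0
    for p, c in zip(plain, cipher):
        if p == 0:
            continue
        key[j % m] = int((p < 0) != (c < 0))  # 1 where the sign was flipped
        j += 1
    return key
```

Because the keystream repeats every m sign bits, XORing the ciphertext sign bits at positions j and j + m cancels the key and yields the XOR of the corresponding plaintext sign bits, which is the repetition a ciphertext-only attack can exploit.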

Permutation of Selected Data. A frequently used approach is to permute randomly selected or all blocks, macroblocks, coefficients, and motion vectors. An early MPEG video encryption scheme [61] uses a random permutation in place of the zigzag order when scanning the two-dimensional (2-D) DCT coefficients of each block into a one-dimensional (1-D) vector before run-length coding. The DC coefficient is split into two halves, and the highest AC coefficient of the block is set to the higher half. This method can be combined with the aforementioned sign-bit encryption of DCT coefficients [62]. In Zeng and Lei [36, 37], encryption of the sign bits of DCT coefficients and motion vectors is combined with a key-controlled permutation of DCT coefficients at the same frequency location within each segment, where a segment consists of several 8×8 blocks or macroblocks. The same authors also describe a variation for wavelet-based video compression, in which each subband is partitioned into blocks of equal size and selective encryption of sign bits is combined with permutation of these blocks. A key-controlled rotation can also be applied to each block of subband coefficients. Permutation is also used in Wen et al. [11, 63] for syntax-compliant encryption, which is described in Section 4.4.4. The security of these schemes is weak: they cannot withstand known-plaintext and chosen-plaintext attacks if the permutation table does not change frequently [51, 64, 65]. Ciphertext-only attacks can also succeed against the random permutation of the scanning order of 2-D DCT coefficients by exploiting the fact that non-zero AC coefficients are likely to be gathered in the upper-left corner of an I-block [51]. An additional disadvantage of the scheme of random permutation of the scanning