Multimedia Authentication
Section 5.3: VIDEO AUTHENTICATION
• Temporal attack detection. Temporal attacks are attacks in the time axis,
e.g., adding, cutting, or reordering video frames in a video sequence.
• Low computational overhead. Authentication solutions should not be very
complex since real-time authentication is often required.
• Robustness to transcoding. Video transcoding, e.g., requantization, frame
resizing, and frame dropping, is often employed in video streaming to create a new video bitstream adapted to various channels and terminals. In requantization, a larger quantization step is employed to requantize the DCT coefficients and thereby reduce the bit-rate. Frame resizing means reducing the frame resolution to suit end-users’ monitors. In frame dropping, for instance, an original video with 25 frames per second can be transcoded to a new video with 5 frames per second by dropping 20 of every 25 frames.
• Robustness to object-based video manipulations. In object-based video
applications, the corresponding authentication solutions should be robust to object-based manipulations, e.g., object-based coding and editing.
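As a rough illustration of the transcoding operations named in the list above, the following Python sketch requantizes a few DCT levels with a larger step and drops frames to go from 25 to 5 frames per second. The function names, the sample values, and the simple rounding rule are assumptions made for illustration; a real transcoder follows the rules of the video codec in use.

```python
def requantize(levels, old_step, new_step):
    """Map quantized DCT levels from old_step to a coarser new_step."""
    # Reconstruct each coefficient (level * old_step), then quantize it again.
    return [round(level * old_step / new_step) for level in levels]


def drop_frames(frames, src_fps, dst_fps):
    """Keep one frame out of every src_fps // dst_fps frames."""
    keep_every = src_fps // dst_fps
    return frames[::keep_every]


if __name__ == "__main__":
    # Small levels collapse to zero under the coarser step, lowering the bit-rate.
    print(requantize([12, -3, 1, 0], old_step=4, new_step=16))
    # One second of 25 fps video becomes 5 frames: 20 of the 25 are dropped.
    print(drop_frames(list(range(25)), src_fps=25, dst_fps=5))
```

An authentication scheme that is robust to transcoding has to accept the outputs of both functions as authentic versions of their inputs.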
Based on applications, solutions for video authentication can be classified into frame-based and object-based solutions. The former is for MPEG-1/2-related applications, while the latter is for MPEG-4-related applications.
5.3.1 Frame-Based Video Authentication
A video can be considered a collection of video frames, so the integrity of a video comprises the integrity of each video frame and the integrity of the frame sequence. To authenticate each video frame, many image authentication solutions can be employed with little or no modification. To detect temporal attacks on the frame sequence, time information, such as the time stamp, picture index, or GOP (Group of Pictures) index in an MPEG video, can be employed.
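One way to picture how time information exposes temporal attacks is to bind the content of each frame to its picture index before hashing. The Python sketch below is only an illustration under that assumption: `frame_digest`, `digest_sequence`, and `sequence_intact` are hypothetical names, the frames are stand-in byte strings, and in a real system the digests would additionally be protected by a digital signature.

```python
import hashlib


def frame_digest(index, frame):
    """Hash the picture index together with the frame content."""
    return hashlib.sha256(index.to_bytes(4, "big") + frame).digest()


def digest_sequence(frames):
    """Compute one index-bound digest per frame, in display order."""
    return [frame_digest(i, frame) for i, frame in enumerate(frames)]


def sequence_intact(frames, digests):
    """Return True only if no frame was added, cut, reordered, or altered."""
    if len(frames) != len(digests):
        return False  # frames were added or cut
    return all(
        frame_digest(i, frame) == digest
        for i, (frame, digest) in enumerate(zip(frames, digests))
    )


if __name__ == "__main__":
    frames = [b"frame-0", b"frame-1", b"frame-2", b"frame-3"]
    reference = digest_sequence(frames)
    swapped = [frames[0], frames[2], frames[1], frames[3]]
    print(sequence_intact(frames, reference))      # untouched sequence
    print(sequence_intact(swapped, reference))     # two frames reordered
    print(sequence_intact(frames[:3], reference))  # last frame cut
```

Because the index is part of the hashed data, a frame moved to a different position no longer matches the digest stored for that position, even though its content is unchanged.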
Lin and Chang proposed two types of signatures for authenticating videos that have undergone two categories of processes [3]. The processes in the first category, which include three transcoding approaches and one editing process, leave the Motion Vectors (MVs), picture type, and GOP structure of the video unchanged. The processes in the second category, e.g., format transmission, may change the MVs, picture type, and GOP structure. In both types of signatures, relations between DCT coefficients at the same position in different DCT blocks are employed to detect content alteration. In the signature for videos that have undergone first-category processes, the hash digest of the GOP header, picture header, MVs, and other time and structure information is employed to detect temporal perturbation. In the second type of signature, only the picture time code is employed to detect temporal attacks, since the other time information will be changed.
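The coefficient relations can be illustrated with a short Python sketch. It assumes, for illustration only, that a relation is recorded as one bit per coefficient position (whether the coefficient in block p is at least as large as the one in block q) and that both blocks are later quantized with the same step. The function names and sample data are hypothetical and are not taken from [3]. Because quantization with a common step is monotonic, a recorded order can turn into an equality but cannot reverse.

```python
import random


def quantize(coeff, step):
    """Uniformly quantize one DCT coefficient with the given step."""
    return round(coeff / step) * step


def relation_bits(block_p, block_q):
    """One bit per position: 1 if the coefficient in p is >= the one in q."""
    return [1 if a >= b else 0 for a, b in zip(block_p, block_q)]


def relations_hold(bits, block_p, block_q):
    """Check that no recorded relation has been reversed."""
    for bit, a, b in zip(bits, block_p, block_q):
        if bit == 1 and a < b:
            return False
        if bit == 0 and a > b:
            return False
    return True


if __name__ == "__main__":
    rng = random.Random(0)
    block_p = [rng.uniform(-200, 200) for _ in range(64)]
    block_q = [rng.uniform(-200, 200) for _ in range(64)]
    bits = relation_bits(block_p, block_q)

    # Requantizing both blocks with the same step keeps every relation.
    coarse_p = [quantize(c, 16) for c in block_p]
    coarse_q = [quantize(c, 16) for c in block_q]
    print(relations_hold(bits, coarse_p, coarse_q))

    # Forcing one coefficient across its partner reverses a relation.
    tampered = list(coarse_p)
    tampered[0] = coarse_q[0] - 64 if bits[0] == 1 else coarse_q[0] + 64
    print(relations_hold(bits, tampered, coarse_q))
```

The first check passes for any positive quantization step, which is what makes such relations usable as a signature that survives requantization, while a content change large enough to reverse a relation is detected.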
Both types of signatures are generated for each GOP. Since signature signing is much more time-consuming than signature verification, the computation cost of Lin and Chang’s scheme [3] is high. To reduce the computation cost, a typical solution treats the video as a stream of video packets and signs only the last packet (Figure 5.10(a)): the hash digest of each packet is XORed with the hash digest of its previous packet, and only the hash digest of the last packet is signed with a private key to generate a digital signature for the video. During verification, similar procedures are employed to obtain the hash digest of the received video. The whole video cannot be verified until the recipient has received the signature and the last group of video packets. Moreover, the scheme is fragile: any packet loss during transmission, which is very common when the video is streamed over unreliable channels or protocols such as wireless links or the User Datagram Protocol (UDP), may cause authenticity verification to fail.
[Figure 5.10 panels: (a) Stream signing, with blocks labeled Raw Video, Encoding, Packeting, Crypto-hashing, Last Packet (Y/N), Signing, Private Key, and Signature of the video; (b) One packet contains several hash digests of other packets, e.g., packet N carries H(N), H(N+1), and H(N+2).]
FIGURE 5.10: Stream authentication solution robust to packet loss.
To tolerate packet loss, Park, Chong, and Siegel proposed an authentication scheme based on the concept of ECC [26]. Their basic idea is illustrated in Figure 5.10(b). In the basic form of stream authentication, a hash digest is appended to its corresponding packet, as shown in the upper part of Figure 5.10(b), and authentication can be executed after the last packet has been received. Such a scheme cannot deal with packet loss during transmission because the signature is generated from the hash digests of all packets. To overcome this problem, a straightforward solution is to append to the current packet not only its own hash digest but also the hash digests of several other packets. If one packet is lost during transmission, its hash digest can still be obtained from its neighboring packets, so the signature can still be verified. The basic idea behind this scheme is to tolerate transmission errors by adding some redundancy.
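The contrast between the two stream-signing approaches can be sketched in Python. This is a simplified illustration and not the construction of [26]: it assumes SHA-256 as the hash, reads the XOR chaining as a running XOR over packet digests, lets packet N carry the digests of packets N, N+1, and N+2 as in Figure 5.10(b), and omits the private-key signature on the final value. All function names are hypothetical.

```python
import hashlib


def h(data):
    """Hash digest of one packet payload (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


def chained_digest(digests):
    """XOR the packet digests together; a sender would sign the result."""
    acc = bytes(32)
    for digest in digests:
        acc = bytes(x ^ y for x, y in zip(acc, digest))
    return acc


def packetize(payloads, span=2):
    """Attach to packet i its own digest and those of the next `span` packets."""
    digests = [h(p) for p in payloads]
    packets = []
    for i, payload in enumerate(payloads):
        last = min(i + span, len(payloads) - 1)
        carried = {j: digests[j] for j in range(i, last + 1)}
        packets.append({"index": i, "payload": payload, "digests": carried})
    return packets


def recover_digests(received, total):
    """Rebuild all packet digests, or return None if that is impossible."""
    known = {}
    for pkt in received:
        if h(pkt["payload"]) != pkt["digests"][pkt["index"]]:
            return None  # payload does not match the digest it carries
        for j, digest in pkt["digests"].items():
            known.setdefault(j, digest)
    if len(known) != total:
        return None  # some lost packet's digest was carried by no survivor
    return [known[j] for j in range(total)]


if __name__ == "__main__":
    payloads = [f"packet-{i}".encode() for i in range(6)]
    reference = chained_digest([h(p) for p in payloads])
    received = [p for p in packetize(payloads) if p["index"] != 2]  # packet 2 lost

    # Without redundancy, the loss changes the digest and verification fails.
    print(chained_digest([h(p["payload"]) for p in received]) == reference)
    # With redundancy, the missing digest is taken from packets 0 and 1.
    print(chained_digest(recover_digests(received, len(payloads))) == reference)
```

In this sketch the digests are carried forward only, so the loss of the very first packet, or of more than `span` consecutive packets, still leaves a digest that cannot be recovered.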
In Park, Chong, and Siegel’s scheme [26], all packets are treated as equally important. However, in a video bitstream, the packets containing DC components are more important than those containing only AC components. Park, Chong, and Siegel’s scheme is also not robust to video transcoding approaches such as frame resizing, frame dropping, and requantization. To overcome these disadvantages, Sun, He, and Tian [27] proposed a transcoding-resilient video authentication system by extending Park, Chong, and Siegel’s scheme. In Park, Chong, and