
Multimedia Authentication

Section 5.3: VIDEO AUTHENTICATION

[Figure 5.11 block labels: Bit-stream, Entropy Decoding, DCT Coefficients, Feature Extraction, ECC & Crypto-hashing, Authentication Generation, Watermarking, Entropy Encoding, Signed Bit-stream, End of Video (Y), Signing, Signature of Video; inputs: Private Key, Authentication Strength, Transcoding Approaches]

FIGURE 5.11: Diagram of the signing part of the transcoding resilient video authentication system.

Seigel’s scheme, the packet is replaced by a video frame, and the hash digest of packet data is replaced by the hash digest of features of a video frame.

The brief block diagram of the signing part of Sun, He, and Tian’s solution is shown in Figure 5.11. To be compatible with most video transcoding schemes and independent of video coding and transcoding approaches, the proposed solution is performed in the DCT domain by partially decoding the MPEG bitstream. The three inputs of the signing part are the video sender’s private key, the authentication strength, and possible transcoding approaches. Here, the authentication strength means protecting the video content to a certain degree (i.e., the video will not be deemed as authentic if it is transcoded beyond this degree). First, frame-based invariant features are extracted from the DCT coefficients based on the given transcoding approaches and the authentication strength. Second, features are encoded using the feature ECC scheme to get a feature ECC codeword; this is to ensure that the same features can be obtained during verification in spite of the incidental distortion. Third, the feature ECC codeword is crypto-hashed to get a hash digest for every frame. Authentication information, consisting of the PCB data and the hash digests of the current video frame and other consecutive video frames, is sent to the recipient for later authentication through watermarking. In addition, the hash digest is recursively operated frame by frame until the end of the video; a signature of this video is generated by signing the final hash value using the sender’s private key. This signature is also sent to the recipient through watermarking.
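The per-frame digests, the digests carried for neighboring frames, and the recursive hash described above can be illustrated with a minimal sketch. It is not the implementation of Sun, He, and Tian [27]: SHA-256, the chaining rule H_i = SHA-256(H_{i-1} || d_i), the all-zero initial value, the payload layout, and all function names are assumptions made for illustration. Feature extraction, ECC encoding, watermark embedding, and the private-key signature of the final hash are left out.

```python
import hashlib

INITIAL_CHAIN = b"\x00" * 32  # hypothetical starting value for the recursive hash


def frame_digest(codeword: bytes) -> bytes:
    """Crypto-hash one frame's feature ECC codeword (SHA-256 is an assumption)."""
    return hashlib.sha256(codeword).digest()


def chain_hash(digests):
    """Fold the per-frame digests into one value, frame by frame."""
    chain = INITIAL_CHAIN
    for d in digests:
        chain = hashlib.sha256(chain + d).digest()
    return chain


def build_payloads(codewords, m):
    """Return the per-frame payloads and the final chained hash.

    Each payload carries the digest of its own frame plus the digests of up
    to m preceding frames, so digests of dropped frames remain recoverable.
    """
    digests = [frame_digest(c) for c in codewords]
    payloads = []
    for i, d in enumerate(digests):
        previous = digests[max(0, i - m):i]
        payloads.append({"index": i, "digest": d, "previous": previous})
    return payloads, chain_hash(digests)


def recover_digests(received_payloads, n_frames):
    """Rebuild the digest list from the payloads of the surviving frames.

    Entries stay None when neither the frame nor a later carrier survived.
    """
    digests = [None] * n_frames
    for p in received_payloads:
        i = p["index"]
        digests[i] = p["digest"]
        first = i - len(p["previous"])
        for k, d in enumerate(p["previous"]):
            digests[first + k] = d
    return digests


if __name__ == "__main__":
    codewords = [f"frame-{i}-codeword".encode() for i in range(6)]
    payloads, final_hash = build_payloads(codewords, m=2)
    # Simulate transcoding that drops frames 1 and 3.
    survivors = [p for p in payloads if p["index"] not in (1, 3)]
    recovered = recover_digests(survivors, len(codewords))
    print(chain_hash(recovered) == final_hash)
```

In this sketch the final chained hash is the value that would be signed with the sender’s private key. A run of more than m consecutive dropped frames leaves a gap in the recovered list, so the chain cannot be recomputed.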

The robustness of the solution in Sun, He, and Tian [27] depends on three design choices: the selection of invariant features, the method for obtaining the hash digests of dropped frames, and the design of a robust watermarking algorithm. A watermarking algorithm resilient to normal transcoding approaches has been proposed in Sun, He, and Tian [27].

To ensure a system’s robustness to requantization, several quantized DCT coefficients are chosen as the feature of the video frame. The quantization step is set to the maximum quantization step that may be used in future transcoding. This is based on the invariant property discovered by Lin and Chang [15]: if a DCT coefficient is modified to be an integral multiple of a quantization step that is larger than the steps used in later JPEG compressions, then this coefficient can be exactly reconstructed after these compressions. In MPEG compression, this invariance is almost always preserved, except that the “dead-zone” quantization area in the Inter-Macroblock may cause some variance, which is, however, small and can be easily eliminated [27].

To ensure a system’s robustness to frame resizing (e.g., the conversion of video from the Common Intermediate Format (CIF) to the Quarter Common Intermediate Format (QCIF)), features are extracted from a QCIF video frame instead of the original CIF video frame.

The issue of robustness to frame dropping is solved by embedding into the current frame (e.g., Frame N) not only its ECC check information and hash digest, but also the hash digests of other frames (e.g., Frames N-1, N-2, ..., N-m). Therefore, even if some frames between N-1 and N-m are dropped during transcoding, their corresponding hash digests can still be obtained from the current frame.
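The requantization invariance of Lin and Chang [15] described above can be checked numerically with a small sketch. It assumes integer coefficients, uniform round-to-nearest quantization, and a later step no larger than the signing step; the function names are hypothetical, and the "dead-zone" quantization of MPEG is not modeled.

```python
def quantize(value: int, step: int) -> int:
    """Round value to the nearest integer multiple of step (ties round up)."""
    return ((2 * value + step) // (2 * step)) * step


def survives_requantization(coeff: int, q_max: int, q_later: int) -> bool:
    """Check whether the feature is recovered exactly after a later compression."""
    feature = quantize(coeff, q_max)         # signing: snap to a multiple of q_max
    transcoded = quantize(feature, q_later)  # later compression with step q_later
    recovered = quantize(transcoded, q_max)  # verification: requantize with q_max
    return recovered == feature


if __name__ == "__main__":
    q_max = 16
    ok = all(
        survives_requantization(c, q_max, q)
        for c in range(-200, 201)
        for q in range(1, q_max + 1)
    )
    print(ok)
    # A later step larger than q_max can break the invariance.
    print(survives_requantization(16, q_max, 40))
```

The reason the check succeeds is that requantizing with a step q moves the value by at most q/2, which stays within half of the signing step whenever q does not exceed it, so rounding back to a multiple of the signing step returns the original feature.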

5.3.2 Object-Based Video Authentication

Object-based video authentication is designed for MPEG-4-related applications. In MPEG-4, a video frame is viewed as a composition of meaningful video objects with shape, motion, and texture, rather than as a collection of pixels with luminance and chrominance as in MPEG-1/2, and video coding or editing is carried out on the Video Object Planes (VOPs).

An object-based authentication system should also be a semi-fragile authentication system. That is, it should be robust to incidental distortions while being sensitive to intentional distortions. Compared with frame-based video authentication, however, the incidental and intentional distortions in object-based video authentication are different since the acceptable video processes and intentional attacks are different. Here, some common acceptable video processes and intentional attacks are listed.


Some common acceptable video processes are

• RST (rotation, scaling, and translation). In object-based applications, the video object of interest may be rotated, scaled, and/or translated to meet the special requirements of end-users. The rotation may be by any angle, the translation may be of any kind, and the scaling factor may lie within a reasonable range.

• Segmentation error. Segmentation error refers to the difference between the shapes of the original object at the sending site and the resegmented object at the receiving site.

• MPEG-4 coding. In MPEG-4 coding, processes that affect the robustness of a video authentication system can be classified into two categories. One category consists of traditional coding processes including quantization, motion estimation, and motion compensation, which are similar to those in MPEG-1/2 coding. The other category is composed of the processes that are unique to MPEG-4 coding, such as VOP formation and padding.

Common intentional attacks include traditional and object-based attacks. As shown in Figure 5.12, object-based attacks could be content modification on an object or background, object replacement, or background replacement. Note that the shape stays unchanged in the object replacement.

Not many object-based video authentication solutions have been proposed. Yin and Yu claimed that their solution for MPEG-1/2 video could be extended to MPEG-4 video [28] by expanding the locality precision to video object (VO) level, video object layer (VOL) level, video object plane (VOP) level, and group of VOP (GOV) level and/or block level and by containing shape information in the watermark besides the texture and motion information. A system-level authentication solution is proposed in He, Sun, and Tian [29]. The block diagram of this solution is shown in Figure 5.13. The procedure for signing is on the left side, while the procedure for verification is on the right side.

[Figure 5.12 panels: (a) Original video, (b) Object replacement, (c) Background replacement, (d) Object modification]

[Figure 5.13 block labels: Video Frame, Segmentation, Object, Background, Feature Extraction, Authentication Information Generation, Authentication Information Insertion, MPEG4 Encoder, MPEG4 Decoder, Authentication Information Extraction, Authenticity Verification (Y/N)]

FIGURE 5.13: Block diagram of the object-based video authentication.

In the signing procedure, the input could be in either raw video format (segmentation is needed in this case) or object/background MPEG-4 compliant format, while the outputs are signed MPEG-4 bitstreams. First, robust features of the object and its associated background are extracted. Second, authentication information is generated. The procedure for generating authentication information is similar to that in a frame-based video authentication system, in which an ECC scheme is employed to tackle feature distortions caused by acceptable video processes. The only difference is that besides the object features, features of the background are also employed to create the hash digest of the object. By including the features of the background into the authentication information, a secure link between the object and its associated background has been created. Therefore, the object is not allowed to be combined with other backgrounds. In other words, malicious attacks such as object or background replacement can be easily detected. Third, authentication information is sent using watermarking techniques. Finally, the signed object and background are compressed into MPEG-4 bitstreams. Note that a digital signature for the video could be generated by signing the hash digest using the sender’s private key.
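The link between an object and its background can be sketched as a single digest computed over both feature sets. This is only an illustration of the idea, not the construction used in He, Sun, and Tian [29]: SHA-256, the length-prefixed concatenation, and the function name are assumptions, and the inputs stand for feature ECC codewords that have already been produced.

```python
import hashlib


def linked_digest(object_codeword: bytes, background_codeword: bytes) -> bytes:
    """Hash the object features together with the background features.

    Each part is length-prefixed so that different (object, background)
    pairs cannot produce the same concatenated byte string.
    """
    h = hashlib.sha256()
    for part in (object_codeword, background_codeword):
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()


if __name__ == "__main__":
    signed = linked_digest(b"object-features", b"background-features")
    # Same object placed on a different background: the digest no longer matches.
    replaced = linked_digest(b"object-features", b"other-background")
    print(signed == replaced)
```

Because the background features enter the digest, replacing either the object or the background changes the value that the verifier recomputes, which is how replacement attacks become detectable in this sketch.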

To authenticate the received video, the MPEG-4 bitstreams have to be decompressed to retrieve the object and background. Following the same procedures as in the signing part, features of the object and the background can be obtained. Meanwhile, the authentication information is extracted from the watermark. The authenticity decision comprises two steps, similar to those introduced in the semi-fragile crypto-hash-based image authentication solution (Figure 5.8).

The system’s robustness to incidental distortions depends on the selection of robust features and the design of a robust watermarking algorithm. For robust object-based watermarking schemes, refer to Chapter 7 or Reference [29]. To select robust features, the definition of the Angular Radial Transform (ART), a visual shape descriptor in MPEG-7 [30, 31], is first extended from an object mask to object content, and then the ART coefficients are selected as the features of the object and the background. ART has the following specific properties: (1) it gives a compact and efficient way to describe the object, and (2) the ART

In document Multimedia Security Technologies for Digital Rights Management pdf (Page 153-157)