Multimedia Authentication
coefficients are robust to segmentation error and invariant to object rotation and shape distortions [31].
5.4 AUDIO AUTHENTICATION
Audio signals, especially speech signals, are often employed as evidence in today’s courts. From the viewpoint of practical application, audio authentication plays the most important role among all types of multimedia authentication. However, solutions for audio authentication are not as abundant as those for image/video authentication. In fact, some digital audio authentication solutions are derived from solutions for image authentication.
Like image/video authentication, a natural solution for audio authentication is to compare the features of the original audio signal with those of the signal to be authenticated. Wu and Kuo called this type of approach a content feature extraction approach [32]. They proposed a speech authentication solution in which semantic features, e.g., pitch information and the changing shape of the vocal tract and energy envelope, are employed to authenticate speech signals. Compared with syntactic features such as the energy function and the zero-crossing rate (ZCR), semantic features have a smaller size. For example, at a sampling rate of 8000 samples per second, the syntactic feature data rate is approximately 400 bytes per second. Storing and transmitting such a large feature is impractical in audio authentication. The computation cost of extracting semantic features is high; however, it can be reduced by integrating feature extraction with Code Excited Linear Prediction (CELP) coding, as in Wu and Kuo’s scheme.
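As a rough illustration of the syntactic features mentioned above, the sketch below computes a short-time energy and a zero-crossing rate for each frame of a signal. The frame length of 160 samples (20 ms at 8000 samples per second), the function name `frame_features`, and the test tone are assumptions made for this example; they are not parameters taken from Wu and Kuo's scheme.

```python
import math


def frame_features(samples, frame_len=160):
    """Return a list of (energy, zcr) pairs, one per non-overlapping frame.

    frame_len is an assumed value (20 ms at 8000 samples per second).
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Short-time energy: mean of the squared samples in the frame.
        energy = sum(x * x for x in frame) / frame_len
        # Zero-crossing rate: fraction of adjacent pairs whose signs differ.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        features.append((energy, crossings / (frame_len - 1)))
    return features


# One second of a 200 Hz test tone sampled at 8000 samples per second.
fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]
features = frame_features(tone)
```

Each frame yields two numbers, so the size of the feature stream grows with the frame rate and with the precision used to store each value.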
Transcoding or D/A-A/D conversion of the speech signal may cause mis-synchronization. Therefore, the original and received speech signals must be resynchronized before features are extracted from the received signal for authenticity verification. In Wu and Kuo’s scheme, the positions of salient points, where the energy of the speech signal climbs rapidly to a peak, are employed to solve the resynchronization issue. The locations of the first 15 salient points in the original speech are encrypted and transmitted together with the authentication information. At the receiving site, salient point locations are detected from the received speech signal and compared with the decrypted data to identify the amount of time-shifting during transmission.
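The resynchronization idea described above can be sketched as follows. This is a minimal illustration, not Wu and Kuo's detector: the rule used to find salient points (frame energy jumping by a factor `rise_ratio` over the previous frame), the frame length, and the use of a median over position differences are all assumptions, and the encryption of the transmitted positions is omitted.

```python
from statistics import median


def salient_points(samples, frame_len=80, rise_ratio=4.0, max_points=15):
    """Hypothetical detector: sample positions where frame energy rises sharply."""
    energies = [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    points = []
    for k in range(1, len(energies)):
        # A frame is salient if its energy far exceeds the previous frame's.
        if energies[k] > rise_ratio * (energies[k - 1] + 1e-9):
            points.append(k * frame_len)
            if len(points) == max_points:
                break
    return points


def estimate_shift(original_points, received_points):
    """Estimate the time shift (in samples) from paired salient positions."""
    diffs = [r - o for o, r in zip(original_points, received_points)]
    if not diffs:
        raise ValueError("no salient points to compare")
    return median(diffs)


# Synthetic speech-like signal: silence followed by a burst, three times.
burst = [0.0] * 400 + [1.0] * 400
original = burst * 3
received = [0.0] * 240 + original  # same signal delayed by 240 samples
shift = estimate_shift(salient_points(original), salient_points(received))
```

In this sketch the estimate is only as fine as the frame length, so a practical system would need a finer search around each detected position.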
In Wu and Kuo [32], features and positions of salient points are transmitted to the receiving site along with speech data. As a result, the payload of data transmission in this scheme is increased. To overcome this shortcoming, Steinebach and Dittmann proposed a watermarking-based digital audio authentication system, in which authentication information is inserted back into the original audio data [33]. The system is designed to be robust to compression, dynamics, A/D-D/A conversion, and many other operations that only change the signal but not the content.
The diagram of this system is quite similar to Figure 5.6. Robust features employed in this system include RMS (root mean square), ZCR, and the spectra of the audio data. RMS provides information about the energy of a number of samples of an audio file. Muted parts or changes in an audio sequence can be detected by comparing the RMS values of the original and processed audio sequences. ZCR provides information about the amount of high frequencies in a window of sound data; it describes the brightness of the sound. Besides the robust features, a synchronization pattern for resynchronization is also included in the authentication information. Since the authentication information is sent to the receiver via watermarking technology and the capacity of an audio watermarking algorithm is very low, the extracted features are quantized and then employed to generate a feature checksum. The checksum could be a hash digest, a cyclic redundancy check, or the result of an XOR function. Therefore, the actual authentication information in this solution includes the feature checksum and the synchronization pattern. Replacing features with feature checksums will not affect the robustness of the system. Since an ideal feature is robust to all acceptable manipulations, its checksum would be exactly the same after the manipulations.
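For reference, the two time-domain features can be written with their standard textbook definitions for a window of N samples. The notation (N, x_n) is assumed here for illustration and is not taken from [33].

```latex
\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{n=1}^{N} x_n^{2}},
\qquad
\mathrm{ZCR} = \frac{1}{N-1}\sum_{n=2}^{N}
  \mathbf{1}\!\left[\operatorname{sign}(x_n) \neq \operatorname{sign}(x_{n-1})\right]
```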
Authenticity verification is a process of comparing two feature checksums after the synchronization pattern is located. One is extracted from the embedded watermark, and the other is the newly generated feature checksum from the received audio signal following the same procedure as that in signature generation.
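The generation and comparison of feature checksums described above can be sketched as follows. This is an illustration under stated assumptions, not the system of [33]: only RMS and ZCR are used (spectral features are left out), the frame length and quantization steps are arbitrary, CRC-32 stands in for whichever checksum is chosen, and watermark embedding and extraction are not modeled.

```python
import math
import zlib


def rms(frame):
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zcr(frame):
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def quantized_features(samples, frame_len=1024, rms_step=0.1, zcr_step=0.1):
    """Quantize RMS and ZCR of each frame to one byte each (assumed step sizes)."""
    out = bytearray()
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        out.append(min(255, int(rms(frame) / rms_step + 0.5)))
        out.append(min(255, int(zcr(frame) / zcr_step + 0.5)))
    return bytes(out)


def feature_checksum(samples):
    """CRC-32 over the quantized features; a hash digest would also work."""
    return zlib.crc32(quantized_features(samples))


def verify(received_samples, embedded_checksum):
    """Recompute the checksum from the received audio and compare."""
    return feature_checksum(received_samples) == embedded_checksum


fs = 8000
original = [0.566 * math.sin(2 * math.pi * 500 * n / fs + 0.3)
            for n in range(4 * 1024)]
embedded = feature_checksum(original)      # would be carried by the watermark
quieter = [0.98 * x for x in original]     # slight gain change, content intact
muted = original[:1024] + [0.0] * 1024 + original[2048:]  # one frame silenced
```

Here `verify(quieter, embedded)` succeeds and `verify(muted, embedded)` fails. A real feature that sits close to a quantization boundary could change bins after a benign operation, which is why the text stresses that the checksum is stable only for an ideal feature.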
In some high-security applications, a one-bit change in an audio track should be detected, and the original audio data should also be recovered after the watermark is detected. This is similar to lossless image authentication. However, the solutions for lossless image authentication cannot be directly employed for lossless audio authentication due to the following problems:
(i) The data size is large during a long recording. If a watermark is to be embedded in a special device, very large memory reserves would be necessary.
(ii) Integrity may not be lost even if the original data is not completely present. For example, a recording of an interview may be edited later, and the message of the interview will not be corrupted if only the introduction of the reporter is removed.
To solve these problems, Steinebach and Dittmann proposed another audio authentication solution, called invertible audio authentication [33], in which no manipulation other than cutting is allowed. In this solution, a number of consecutive samples are considered as one audio frame, e.g., 44,100 samples for one second of CD-quality mono data. Since one sample is represented by 16 bits, one bit layer of this frame can be selected and compressed by a lossless compression algorithm. The difference between the memory requirements of the original and the compressed bit layer is used to carry the authentication information.
The authentication information includes the information necessary for proving the integrity of the audio frame, i.e., a hash digest of the audio data, a synchronization header, a sequence ID (ID_S), and an incremental frame ID (ID_T). ID_S verifies that the frame belongs to a certain recording, which protects against the substitution of frames from other recordings. ID_T protects the order of frames within an audio sequence from being changed.
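The capacity argument above can be sketched in a few lines. This is an illustration only: zlib stands in for the unspecified lossless compressor, SHA-256 for the unspecified hash, and the 2-byte synchronization header, the 32-bit ID fields, and all function names are hypothetical. Writing the compressed layer and the authentication information back into the frame, and restoring the original on verification, are not shown.

```python
import hashlib
import struct
import zlib

SYNC_HEADER = b"\xa5\x5a"  # hypothetical 2-byte synchronization header


def extract_bit_layer(samples, bit):
    """Pack bit number `bit` of each 16-bit sample into bytes, MSB first."""
    packed = bytearray()
    acc = 0
    for i, s in enumerate(samples):
        acc = (acc << 1) | ((s >> bit) & 1)
        if i % 8 == 7:
            packed.append(acc)
            acc = 0
    rem = len(samples) % 8
    if rem:
        packed.append(acc << (8 - rem))  # left-align the final partial byte
    return bytes(packed)


def free_capacity(layer):
    """Bytes freed by losslessly compressing the layer (zlib is an assumption)."""
    return max(0, len(layer) - len(zlib.compress(layer, 9)))


def build_auth_info(samples, sequence_id, frame_id):
    """Sync header + ID_S + ID_T + digest of the frame (assumed field layout)."""
    pcm = struct.pack("<%dH" % len(samples), *samples)  # unsigned 16-bit samples
    digest = hashlib.sha256(pcm).digest()
    return SYNC_HEADER + struct.pack(">II", sequence_id, frame_id) + digest


# One second of synthetic mono data whose lowest bit layer is all zeros.
frame = [((n * 37) % 65536) & ~1 for n in range(44100)]
layer = extract_bit_layer(frame, bit=0)
info = build_auth_info(frame, sequence_id=7, frame_id=42)
fits = len(info) <= free_capacity(layer)
```

The synthetic frame is chosen so that the layer compresses very well. On real recordings the low-order bit layers tend to look noise-like and compress poorly, so the choice of layer determines whether enough space is freed for the authentication information.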
5.5 SUMMARY
In this chapter, we introduced a list of authentication schemes for multimedia applications. Based on their robustness to distortions, these schemes can be classified into complete authentication and content authentication. Signatures and watermarking are two important technologies employed in designing these schemes. Signatures can be further classified into digital signatures and media signatures, for complete authentication and content authentication, respectively. We then focused on various media signature-based authentication techniques for multimedia applications, e.g., image, video, and audio. We argued that a good content authentication solution should be not only secure enough against malicious attacks, but also robust enough to acceptable manipulations. Such a system should also be application-dependent.
REFERENCES
[1] A. Menezes, P. Oorschot, and S. Vanstone. Handbook of Applied Cryptography, CRC Press, Boca Raton, FL, 1996.
[2] B. Schneier. Applied Cryptography, Wiley, New York, 1996.
[3] C.-Y. Lin and S.-F. Chang. Issues and solutions for authenticating MPEG video, SPIE Int. Conf. Security and Watermarking of Multimedia Contents, San Jose, CA, vol. 3657, no. 06, pp. 54–56, EI ’99, January 1999.
[4] C.-W. Wu. On the design of content-based multimedia authentication systems, IEEE Trans. Multimedia, 4:385–393, 2002.
[5] Q. B. Sun and S.-F. Chang. Signature-based media authentication, in Multimedia Security Handbook, CRC Press, Boca Raton, FL, 2004.
[6] J. Fridrich, M. Goljan, and R. Du. Invertible authentication watermark for JPEG images, in Proc. Int. Conf. Information Technology: Coding and Computing, pp. 223–227, Las Vegas, NV, 2001.
[7] M. Goljan, J. Fridrich, and R. Du. Distortion-free data embedding for images, in Proc. 4th Int. Workshop Information Hiding (IHW), Pittsburgh, PA, pp. 27–41, 2001.
[8] M. Awrangjeb and M. S. Kankanhalli. Lossless watermarking considering the human visual system, Digital Watermarking: Second International Workshop (IWDW), Seoul, Korea, pp. 581–592, October 2003.
[9] R. Du and J. Fridrich. Lossless authentication of MPEG-2 video, Proc. Int. Conf.
[10] D. K. Zou, C.-W. Wu, G. R. Xuan, and Y. Q. Shi. A content-based image authentication system with lossless data hiding, Proc. IEEE Int. Conf. Multimedia and Expo (ICME), 2:213–216, 2003.
[11] M. M. Yeung and F. Mintzer. An invisible watermarking technique for image verification, Proc. Int. Conf. Image Processing (ICIP), 2:680–683, 1997.
[12] P. W. Wong and N. Memon. Secret and public key image watermarking schemes for image authentication and ownership verification, IEEE Trans. Image Processing, 10:1593–1601, 2001.
[13] J. Fridrich and M. Goljan. Images with self-correcting capabilities, Proc. Int. Conf. Image Processing (ICIP), 3:792–796, 1999.
[14] J. J. Eggers and B. Girod. Blind watermarking applied to image authentication, Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), 3:1977–1980, 2001.
[15] C.-Y. Lin and S.-F. Chang. Semi-fragile watermarking for authenticating JPEG visual content, SPIE Security and Watermarking of Multimedia Conference, San Jose, CA, 3971:140–151, 2000.
[16] S.-J. Han, I.-S. Chang, and R.-H. Park. Semi-fragile watermarking for tamper proofing and authentication of still images, in 2nd International Workshop on Digital Watermarking (IWDW), Seoul, Korea, pp. 328–339, 2003.
[17] M. P. Queluz. Towards robust, content based techniques for image authentication, in IEEE Second Workshop Multimedia Signal Processing, pp. 297–302, Los Angeles, CA, 1998.
[18] S. M. Ye, Q. B. Sun, and E. C. Chang. Error resilient content-based image authentication over wireless channel, in IEEE Int. Symp. Circuits and Systems (ISCAS), pp. 2707–2710, Kobe, Japan, 2005.
[19] J. Fridrich. Robust bit extraction from images, IEEE Int. Conf. Multimedia Computing and Systems, 2:536–540, 1999.
[20] J. Fridrich and M. Goljan. Robust hash functions for digital watermarking, in Proc. IEEE Int. Conf. Information Technology: Coding and Computing ’00, Las Vegas, pp. 178–183, 2000.
[21] G. R. Arce, L. H. Xie, and R. F. Graveman. Approximate image authentication codes, in Proc. 4th Annual Fedlab Symp. Advanced Telecommunications Information Distribution, College Park, MD, vol. 1, 2000.
[22] L. H. Xie, G. R. Arce, and R. F. Graveman. Approximate image message authentication codes, IEEE Trans. Multimedia, 3(2):242–252, 2001.
[23] Q. B. Sun, S.-F. Chang, K. Maeno, and M. Suto. A new semi-fragile image authentication framework combining ECC and PKI infrastructure, Proc. IEEE Int. Symp. Circuits and Systems (ISCAS), 2:440–443, 2002.
[24] W. Wesley Peterson and E. J. Weldon, Jr. Error-Correcting Codes, MIT Press, Cambridge, MA, 1984.
[25] Q. B. Sun, S. M. Ye, C.-Y. Lin, and S.-F. Chang. A crypto signature scheme for image authentication over wireless channel, Int. Image and Graphics, 61, 1–14, 2005.
[26] J. M. Park, E. K. P. Chong, and H. J. Siegel. Efficient multicast packet authentication using signature amortization, in Proc. the IEEE Symp. Security and Privacy,
[27] Q. B. Sun, D. J. He, and Q. Tian. A secure and robust authentication scheme for video transcoding, IEEE Trans. Circuits and Systems for Video Technology (CSVT), submitted.
[28] P. Yin and H. Yu. Semi-fragile watermarking system for MPEG video authentication, in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), 4:3461–3464, 2002.
[29] D. J. He, Q. B. Sun, and Q. Tian. A secure and robust object-based video authentication system, EURASIP Journal on Applied Signal Processing (JASP), Special Issue on Multimedia Security and Right Management, vol. 2004, no. 14, pp. 2185–2200, October 2004.
[30] W.-Y. Kim and Y.-S. Kim. A New Region-Based Shape Descriptor, ISO/IEC MPEG99/M5472, Maui, Hawaii, December 1999.
[31] M. Bober. MPEG-7 visual shape descriptors, IEEE Trans. Circuits and Systems for Video Technology (CSVT), 11(6):716–719, 2001.
[32] C.-P. Wu and C.-C. J. Kuo. Comparison of two speech content authentication approaches, in Photonics West 2002: Electronic Imaging, Security and Watermarking of Multimedia Contents IV, vol. 4675 of SPIE Proceedings, pp. 158–169, San Jose, CA, 2002.
[33] M. Steinebach and J. Dittmann. Watermarking-based digital audio data authentication,