Source Codec for Multimedia Data Hiding


International Journal of Emerging Technology and Advanced Engineering

Website: www.ijetae.com (ISSN 2250-2459,ISO 9001:2008 Certified Journal, Volume 3, Issue 4, April 2013)


K. Somasundaram¹, M. Sukumar², K. P. Vigneshkumar³

¹,³ M.Tech-Information Technology, SNS College of Engineering, Coimbatore
² B.Tech-Information Technology, University College of Engineering, Tindivanam

Abstract- Covert communication can be achieved through steganography in stream media, for which LSB (Least Significant Bit) based methods are the most popular strategies. This paper presents a steganography algorithm for embedding data in the active and inactive frames of audio streams encoded by an XOR source codec, which is used extensively in the channel. LSB embedding then places the secret information in the flat regions of the speech. The method divides the frames into two types, representing '0' and '1' respectively. The active and inactive frames of audio are more suitable for data embedding, and steganography in the active and inactive audio frames attains a larger data embedding capacity with the same imperceptibility. An improved voice activity detection (VAD) algorithm is suggested for detecting active voice and inactive voice. For each frame, the VAD function returns a 0/1 indication of whether the speech is active voice or inactive voice. Experimental results show that the proposed steganography algorithm not only achieves perfect imperceptibility but also preserves the integrity of hidden messages in the case of packet loss.

Keywords- Adaptive Steganography, Block Steganography, Steganalysis, Covert Communication, VAD

I. INTRODUCTION

Information hiding is the technology of embedding a secret message into an ordinary cover object. The output of this operation is called the stegano object, which is transmitted through the channel. The receiver can extract the secret message from the stegano object. This technology hides not only the content of the message but also the existence of the transmission. Information hiding techniques have recently become important in a number of application areas. Digital audio, video, and pictures are increasingly furnished with distinguishing but imperceptible marks. Military communication systems make increasing use of traffic security techniques which, rather than merely concealing the content of a message using encryption, seek to conceal its sender, its receiver, or its very existence. Similar techniques are used in some mobile phone systems for digital elections.

Criminals try to use whatever traffic security properties are provided intentionally or otherwise in the available communications systems, and police forces try to restrict their use. However, many of the techniques proposed in this young and rapidly evolving field can trace their history back to antiquity, and many of them are surprisingly easy to circumvent.

In this article, we try to give an overview of the field, of what we know, what works, what does not, and what are the interesting topics for research.

Much research on audio-based information hiding has been reported, most of it on audio files in high-rate formats such as WAV and MP3. However, information hiding in low bit-rate audio signals, such as compressed speech in Voice over IP (VoIP), is still an emerging problem. A low bit-rate codec has specific signal processing models as well as a specific bit stream definition and codebooks. These restrictions may be the reason that research on information hiding in low bit-rate speech has been slow. Two further aspects should be considered. The first is the requirement for real-time communication: additional delay cannot be tolerated, especially if the cover speech is spoken live rather than previously recorded. The speech is usually segmented into frames of about 20 ms, so the message should also be embedded frame by frame. The second is robustness. The Least Significant Bit (LSB) method may be efficient, but it cannot survive low bit-rate compression: if the secret message is embedded in the original signal before low bit-rate compression, there is a probability of bit errors during extraction. Hence, directly embedding in the bit stream of low bit-rate speech is probably a better solution. Embedding in low bit-rate speech is more challenging because the redundancy in the original waveform is eliminated by the parametric model-based coding.


In each index one secret bit is embedded. The essence of the Quantization Index Modulation (QIM) method is to divide the whole codebook into two parts and assign a label of '0' or '1' to every codeword.

When a secret bit is embedded, only the corresponding part of the codebook is used. On the receiving side, the hidden bit is extracted by checking which part of the codebook the codeword belongs to. Provided that the channel is reliable, the receiver can extract the message directly from the compressed speech stream. Moreover, the embedding algorithm searches only half of the codebook rather than all of it, so it causes no additional delay. However, since fewer codewords are available for quantization, the distortion increases. To reduce this distortion, one of the most important tasks is to find an ideal codebook partition scheme.

Voice activity detection (VAD) refers to the ability to distinguish speech from noise and is an integral part of a variety of speech communication systems, such as speech coding, speech recognition, hands-free telephony, audio conferencing, and echo cancellation. In GSM-based wireless systems, for instance, a VAD module is used for discontinuous transmission to save battery power. Similarly, a VAD device is used in any variable bit rate codec to control the average bit rate and the overall coding quality of speech. In wireless systems based on code division multiple access, this scheme is important for enhancing the system capacity by minimizing interference. In early VAD algorithms, short-term energy, zero-crossing rate, and LPC coefficients were among the common features used for speech detection. Formant shape and a least-square periodicity measure are some of the more recent metrics used in VAD designs.
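The short-term energy and zero-crossing rate named above as early VAD features can be sketched as follows. This is a minimal illustration, not the implementation of any cited detector; the function names, the 200 Hz test tone, and the 80-sample frame length are assumptions introduced here.

```python
import math

def short_term_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

# Hypothetical 10 ms frames at 8 kHz (80 samples each).
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(80)]
silence = [0.0] * 80

print("tone:   ", short_term_energy(tone), zero_crossing_rate(tone))
print("silence:", short_term_energy(silence), zero_crossing_rate(silence))
```

A detector built on these features would compare them against thresholds per frame; how the thresholds are chosen is outside the scope of this sketch.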

A set of metrics including line spectral frequencies (LSF), low-band energy, zero-crossing rate, and full-band energy is used along with heuristically determined regions and boundaries to make a VAD decision for each 10 ms frame. Higher-order statistics (HOS) have shown promising results in a number of signal processing applications, and are of particular value when dealing with a mixture of Gaussian and non-Gaussian processes and system nonlinearity. The application of HOS to speech processing has been primarily motivated by their inherent Gaussian suppression and phase preservation properties. Work in this area has been based on the assumption that speech has certain HOS properties that are distinct from those of Gaussian noise.

While previous work in the area of speech analysis, such as detection, voicing classification, or pitch estimation, has attempted to exploit some of the observed features of the HOS of speech signals, little has been done to provide an analytical framework for using these cumulants. A voiced/unvoiced detector using the bispectrum has been developed, based on the observation that unvoiced phonemes are produced by a Gaussian-like excitation and thus result in a small bispectrum, whereas the same is not true for voiced phonemes. In another method, Gaussianity tests based on the bispectrum and the triple correlation are used to discriminate voiced and unvoiced segments. That method exploits the Gaussian blindness of HOS but not the peculiarities of the HOS of voiced speech to better classify the segments.

Elsewhere, the normalized skewness and kurtosis of short-term speech segments are used to detect transitional speech events (termed innovation), based on the observation that these two statistics take on nonzero values at the boundaries of speech segments, but no analytical ground is given to support the results. A pitch estimation method based on the periodicity of the diagonal slice of the third-order cumulant has also been described; it yields more reliable pitch estimates than the autocorrelation, but the claim that the third-order cumulant slice has a periodicity similar to that of the underlying speech is not clearly demonstrated. A robust VAD algorithm has been based on newly established HOS properties of speech. The first part examines the characteristics of the third- and fourth-order cumulants of the LPC residual of speech signals. The flat spectral envelope of this residual results in distinct characteristics for these cumulants in terms of phase, periodicity, and harmonic content, and yields closed-form expressions for the skewness and kurtosis. It is shown, in the case of voiced speech, that these cumulants have zero phase, a harmonic nature similar to that of the underlying speech, and harmonic amplitudes that are a function of speech energy.


The properties and experimental findings thus established show that the HOS of speech are in general nonzero and sufficiently distinct from those of Gaussian noise to be used as a basis for speech detection. The fact that these statistics are immune to Gaussian noise makes them a set of robust metrics that are particularly effective in low SNR conditions.
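As a rough illustration of why higher-order statistics can separate speech-like signals from Gaussian noise, the sketch below computes the normalized skewness and excess kurtosis of two synthetic signals. It uses plain (biased) sample moments rather than any estimator from the cited VAD work, and the sparse pulse train is only a crude, hypothetical stand-in for the pulse-like LPC residual of voiced speech.

```python
import random

def skewness_kurtosis(x):
    """Return (normalized skewness, excess kurtosis) from sample moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

rng = random.Random(1)
# White Gaussian noise: both statistics should be close to zero.
gaussian = [rng.gauss(0.0, 1.0) for _ in range(20000)]
# Sparse pulse train (one pulse every 40 samples): both are far from zero.
pulses = [1.0 if n % 40 == 0 else 0.0 for n in range(4000)]

print("gaussian:", skewness_kurtosis(gaussian))
print("pulses:  ", skewness_kurtosis(pulses))
```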

The second part builds on the HOS properties of speech thus established and presents a new VAD algorithm that combines HOS metrics with classical second-order measures to classify short frames as speech or noise. A necessary condition for voicing is derived based on the relation between the skewness and kurtosis of voiced speech. Practical issues related to HOS analysis, such as the bias and variance of the estimators, are addressed. Using the white Gaussian assumption about noise in the LPC residual, a new unbiased estimator for the kurtosis is proposed, and the variances of the HOS estimators are derived and expressed in terms of the underlying process variance (i.e., the noise energy). Knowledge of these variances allows the noise likelihood of a frame to be quantified from the values of these two estimates. The algorithm is tested using a variety of noise types and different SNR levels, and its performance is compared to that of the ITU-T G.729B VAD. To quantify performance, the probability of correctly classifying speech and noise frames and the probability of false classification are computed by reference to truth marker files in clean speech conditions. To compute these metrics and generate the noisy speech test cases, the material in the TIA database proposed for the evaluation of VAD algorithms was used.

Eighty test cases were used, each consisting of a different combination of speech normalization level, noise type, and SNR. Four SNR levels are used, among them 18 dB, 12 dB, and 6 dB, with the SNR value computed as the ratio of the total energy of the speech to that of the noise over the entire utterance, according to the procedure. The results show that the proposed algorithm performs better overall than G.729B, with noticeable improvement in Gaussian-like noises, such as street and parking garage noise, and at moderate to low SNR. Digital steganography in low bit rate audio streams is commonly regarded as a challenging topic in the field of data hiding. Several steganography methods for embedding data in audio streams have been reported, including a G.711-based adaptive speech information hiding approach and lossless steganography in G.711-encoded speech.
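The SNR definition used in the evaluation described above (total speech energy over total noise energy across the whole utterance) can be written out as a short sketch. The helper names and the synthetic speech and noise signals are assumptions made for illustration; the actual test material and mixing procedure are those of the referenced database and are not reproduced here.

```python
import math
import random

def energy(x):
    """Total energy of a signal over the entire utterance."""
    return sum(v * v for v in x)

def snr_db(speech, noise):
    """SNR in dB as the ratio of total speech energy to total noise energy."""
    return 10.0 * math.log10(energy(speech) / energy(noise))

def scale_noise(speech, noise, target_db):
    """Scale the noise so that mixing it with speech gives the target SNR."""
    gain = math.sqrt(
        energy(speech) / (energy(noise) * 10.0 ** (target_db / 10.0))
    )
    return [gain * v for v in noise]

rng = random.Random(0)
speech = [math.sin(2 * math.pi * 150 * n / 8000) for n in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

for target in (18, 12, 6):
    scaled = scale_noise(speech, noise, target)
    noisy = [s + v for s, v in zip(speech, scaled)]  # noisy test signal
    print(target, "dB requested ->", snr_db(speech, scaled), "dB measured")
```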

Another steganography method embeds data in G.721-encoded speech. All these methods adopt as cover objects high bit rate audio streams encoded by a waveform codec, in which plenty of least significant bits exist.

However, VoIP is usually transmitted over low bit rate audio streams encoded by a source codec, such as the ITU G.723.1 codec, to save network bandwidth. Low bit rate audio streams are less likely to be used as cover objects for steganography since they have fewer least significant bits than high bit rate audio streams. Little effort has been made to develop algorithms for embedding data in low bit rate audio streams. Information has been embedded in G.729 and MELP audio streams, and a steganography algorithm for embedding information in low bit rate audio streams has been proposed. However, these steganography algorithms have constraints on the data embedding capacity; that is, their data embedding rates are too low for practical applications. Thus the main focus of this study was to work out how to increase the data embedding capacity of steganography in low bit rate audio streams. Some related work discusses the possibility of embedding data in the inactive frames of low bit rate audio streams, and the imperceptibility of a steganography algorithm for embedding data in inactive audio frames has been analyzed.

II. LITERATURE REVIEW

ITU-T Recommendation G.723.1 [1], a dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s, specifies a coded representation that can be used for compressing the speech or other audio signal component of multimedia services at a very low bit rate. In the design of this coder, the principal application considered was very low bit rate visual telephony as part of the overall H.324 family of standards. G.723.1 has two bit rates associated with it: 5.3 and 6.3 kbit/s. The higher bit rate has greater quality. The lower bit rate gives good quality and provides system designers with additional flexibility. Both rates are a mandatory part of the encoder and decoder. It is possible to switch between the two rates at any 30 ms frame boundary. An option for variable rate operation using discontinuous transmission and noise fill during non-speech intervals is also available.


This paper newly investigates the possibility of a semi-lossless steganography technique for increasing the capacity of the lossless steganography technique.

C. Bao et al. [4] proposed a novel approach to detecting hidden information, based on an analysis of the redundancy of coded parameters in G.723.1. By using the statistical value of increaser entropy, this scheme can not only detect hidden messages embedded in compressed speech but also estimate the embedded message length accurately. The experimental results show that the proposed scheme is effective.

M. U. Celik et al. [5] presented a novel lossless (reversible) data-embedding technique, which enables the exact recovery of the original host signal upon extraction of the embedded information. A generalization of the well-known least significant bit (LSB) modification is proposed as the data-embedding method, which introduces additional operating points on the capacity-distortion curve. Lossless recovery of the original is achieved by compressing portions of the signal that are susceptible to embedding distortion and transmitting these compressed descriptions as part of the embedded payload. A prediction-based conditional entropy coder, which utilizes unaltered portions of the host signal as side information, improves the compression efficiency and thus the lossless data-embedding capacity.

L. Ma, Z. Wu, and W. Yang [8] presented an approach to speech information hiding based on the G.721 scheme. Dynamic secret speech information bits can be embedded into the original carrier speech data, with high steganographic efficiency and good output speech quality. The method is superior to available classical algorithms in hiding capacity and robustness. The approach is implemented on the G.721 speech coding scheme, and the experiments show that it meets the requirements of information hiding, satisfies the speech quality constraints for secure communication, and achieves a high hiding capacity of 1.6 kbps with excellent speech quality while complicating speaker recognition.

F. A. P. Petitcolas et al. [9] observe that information-hiding techniques have recently become important in a number of application areas. Digital audio, video, and pictures are increasingly furnished with distinguishing but imperceptible marks. Military communications systems make increasing use of traffic security techniques which, rather than merely concealing the content of a message using encryption, seek to conceal its sender, its receiver, or its very existence. Similar techniques are used in some mobile phone systems for digital elections.

Criminals try to use whatever traffic security properties are provided, intentionally or otherwise, in the available communications systems, and police forces try to restrict their use.

Z. Wu and W. Yang [13] suggested a G.711-based adaptive LSB (Least Significant Bit) algorithm that embeds dynamic secret speech information bits into public G.711-PCM (Pulse Code Modulation) speech, according to energy distribution, for the purpose of secure communication, with high steganographic efficiency and good output speech quality. It is superior to the available classical LSB algorithms. It embeds up to 20 kbps of secret speech data into G.711 speech at an average embedded error rate of $10^{-5}$. It meets the requirements of information hiding and satisfies the speech quality constraints for secure communication, with excellent speech quality while complicating speaker recognition.
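For readers unfamiliar with LSB substitution, the basic operation underlying the approach above can be sketched as follows. This is plain, non-adaptive LSB replacement on 8-bit codes, assumed here for illustration only; it does not reproduce the energy-adaptive selection of the cited algorithm, and the function names are hypothetical.

```python
def embed_lsb(codes, bits):
    """Replace the least significant bit of the first len(bits) codes."""
    if len(bits) > len(codes):
        raise ValueError("not enough cover samples for the payload")
    out = bytearray(codes)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | (bit & 1)
    return bytes(out)

def extract_lsb(codes, n_bits):
    """Read back the least significant bit of the first n_bits codes."""
    return [c & 1 for c in codes[:n_bits]]

cover = bytes(range(100, 116))      # 16 hypothetical 8-bit speech codes
payload = [1, 0, 1, 1, 0, 0, 1, 0]  # secret bits
stego = embed_lsb(cover, payload)

print(extract_lsb(stego, len(payload)) == payload)
```

Each modified code differs from its cover value by at most one quantization step, which is why the change is hard to perceive in high bit rate streams.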

B. Xiao et al. [14] proposed a codebook partition algorithm for Quantization Index Modulation (QIM), which is applied to information hiding in instant low bit-rate speech streams. The QIM method divides the codebook into two parts, representing '0' and '1' respectively. Instead of partitioning the codebook randomly, the relationship between codewords is considered. The proposed algorithm, Complementary Neighbor Vertices (CNV), guarantees that every codeword is in the opposite part to its nearest neighbor, so that the distortion is limited by a bound. The feasibility of CNV is proved with graph theory. Moreover, the secret message is embedded in the vector quantization index of the LPC coefficients, with the benefit that the distortion due to QIM is reduced adaptively by the rest of the encoding procedure. Experiments on iLBC and G.723.1 verify the effectiveness of the proposed method. Both objective and subjective assessments show that the method decreases the speech quality only slightly, to an indistinguishable degree. The hiding capacity is no less than 100 bps. According to the authors, this is the first work adopting graph theory to improve the codebook partition when using QIM in low bit-rate streaming media.
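The QIM principle described above (quantize with only the half of the codebook that carries the desired label, then read the label back from the received index) can be sketched as follows. The scalar codebook and the parity-based partition are assumptions chosen to keep the example short; this is not the CNV partition, and real codecs quantize vectors rather than scalars.

```python
def qim_embed(value, bit, codebook):
    """Quantize value using only codewords whose label equals the bit.

    The label of codeword i is assumed to be i % 2 (a simple parity
    partition, standing in for a designed partition)."""
    candidates = [i for i in range(len(codebook)) if i % 2 == bit]
    return min(candidates, key=lambda i: abs(codebook[i] - value))

def qim_extract(index):
    """Recover the hidden bit from the part of the codebook used."""
    return index % 2

codebook = [i / 8 for i in range(8)]   # hypothetical scalar codebook
for bit in (0, 1):
    index = qim_embed(0.30, bit, codebook)
    print(bit, index, codebook[index], qim_extract(index))
```

Because only half of the codewords are available for each bit, the quantization error is larger than with the full codebook, which is the distortion the partition design tries to bound.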

III. PROPOSED SYSTEM


Thus the volume of the speech does not change perceptibly even though the inactive audio frames contain hidden information. The theoretical analysis above suggests that steganography in the inactive frames of low bit rate audio streams would attain a larger data embedding capacity if an appropriate steganography algorithm were used. The two frame types are handled by their respective steganography algorithms to embed the secret information. The low bit rate stream with hidden information is called the stegano speech, which is transmitted using VoIP. The stegano speech is then decoded, and the extraction of the secret information from the stegano speech is the inverse of the embedding algorithm. Finally, the receiver obtains the secret information as well as the PCM-formatted audio stream.

The proposed steganography algorithm not only achieves perfect imperceptibility but also achieves a high data embedding capacity, using the XOR operation at 8 kb/s. The data embedding capacity of the proposed algorithm is much larger than those of previously suggested algorithms.

A. VAD Algorithm

The input speech and noise data are read from WAV files with a sampling rate of 8 kHz. The speech data are divided into frames of 80 samples (10 ms). For each frame, the VAD function (supplied) returns a 0/1 indication of whether the frame is active voice or inactive voice: if Enr < Thresh the frame is called inactive voice, and if Enr >= Thresh it is called active voice. For an active speech frame, the samples are converted to µ-law codes (8 bits per sample). For the first inactive speech frame, a silence descriptor (SID) frame is transmitted; it contains at most 11 parameters. Subsequent inactive speech frames send only a placeholder to indicate no information. The information for each frame is written to a file.

• Active frames will contain a flag (byte with value 1), followed by 80 bytes of speech data.

• SID frames will contain a flag (byte with value 2), followed by 11 floating point values.

• Silence frames will contain a flag (byte with value 0) and nothing else.
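The transmitter side of this scheme can be sketched as below. Several details are assumptions made only for illustration: Enr is taken to be the mean squared amplitude of the frame, the µ-law conversion uses the continuous companding formula rather than the segmented G.711 tables, the 11 SID values are stored as little-endian 32-bit floats, and the SID content (frame energy followed by ten zeros) is a placeholder, since the text does not specify the parameters.

```python
import io
import math
import struct

FRAME_LEN = 80                      # 10 ms at 8 kHz
FLAG_SILENCE, FLAG_ACTIVE, FLAG_SID = 0, 1, 2

def frame_energy(frame):
    """Enr, assumed here to be the mean squared amplitude."""
    return sum(s * s for s in frame) / len(frame)

def mulaw_encode(sample, mu=255):
    """Compand one 16-bit sample to an 8-bit code (continuous mu-law)."""
    x = max(-1.0, min(1.0, sample / 32768.0))
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(round((y + 1.0) / 2.0 * 255))

def write_frames(samples, thresh, out):
    """Classify each frame and write it in the flag-based file format."""
    prev_active = True              # so a leading inactive frame sends SID
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        frame = samples[start:start + FRAME_LEN]
        enr = frame_energy(frame)
        if enr >= thresh:           # active voice
            out.write(bytes([FLAG_ACTIVE]))
            out.write(bytes(mulaw_encode(s) for s in frame))
            prev_active = True
        elif prev_active:           # first inactive frame: send SID
            params = [enr] + [0.0] * 10   # placeholder SID parameters
            out.write(bytes([FLAG_SID]))
            out.write(struct.pack("<11f", *params))
            prev_active = False
        else:                       # later inactive frames: flag only
            out.write(bytes([FLAG_SILENCE]))

loud = [8000 if n % 2 == 0 else -8000 for n in range(FRAME_LEN)]
quiet = [0] * FRAME_LEN
buf = io.BytesIO()
write_frames(loud + quiet + quiet + loud, thresh=1000.0, out=buf)
print(len(buf.getvalue()), "bytes written")
```

For this four-frame input the raw 16-bit audio would occupy 640 bytes, while the framed output occupies 208 bytes (two active frames of 81 bytes, one SID frame of 45 bytes, and one silence flag).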

The transmitter must generate a file. The receiver has only the file available and must generate speech samples from the data in the file. Even in this simple setup, the intermediate file will be significantly smaller than the original file because of the µ-law coding of 16-bit samples to 8-bit codes and because of the "silence compression" afforded by DTX. The receiver reads the first byte of each frame from the data file and, based on the flag value, operates as follows.

For active frames, the speech data is decoded and converted to 80 speech samples. For SID frames, 80 comfort noise samples are generated, based on the information in the 11 parameters contained in the SID frame. For silence frames, 80 comfort noise samples are generated based on the information received in earlier SID frames.
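A matching receiver sketch is given below under stated assumptions: the 11 SID values are read as little-endian 32-bit floats, the first SID value is treated as the noise energy that drives a Gaussian comfort noise generator, and µ-law decoding inverts the continuous companding formula. None of these choices is specified in the text; they are hypothetical and serve only to make the flag-driven control flow concrete.

```python
import io
import math
import random
import struct

FRAME_LEN = 80

def mulaw_decode(code, mu=255):
    """Expand one 8-bit code to a 16-bit sample (continuous mu-law)."""
    y = code / 255 * 2 - 1
    x = math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)
    return int(round(x * 32767))

def comfort_noise(noise_energy, rng):
    """Generate 80 Gaussian samples with the given energy (assumed model)."""
    sigma = math.sqrt(max(noise_energy, 0.0))
    return [int(round(rng.gauss(0.0, sigma))) for _ in range(FRAME_LEN)]

def read_frames(stream, seed=0):
    """Rebuild speech samples from the flag-based frame file."""
    rng = random.Random(seed)
    samples, last_energy = [], 0.0
    while True:
        flag = stream.read(1)
        if not flag:
            break
        if flag[0] == 1:                          # active frame
            samples.extend(mulaw_decode(c) for c in stream.read(FRAME_LEN))
        elif flag[0] == 2:                        # SID frame
            params = struct.unpack("<11f", stream.read(44))
            last_energy = params[0]               # hypothetical meaning
            samples.extend(comfort_noise(last_energy, rng))
        elif flag[0] == 0:                        # silence frame
            samples.extend(comfort_noise(last_energy, rng))
        else:
            raise ValueError("unknown frame flag")
    return samples

# One active frame, one SID frame, one silence frame.
demo = (bytes([1]) + bytes([128] * FRAME_LEN)
        + bytes([2]) + struct.pack("<11f", 4.0, *[0.0] * 10)
        + bytes([0]))
print(len(read_frames(io.BytesIO(demo))), "samples reconstructed")
```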

B. Data Embedding

The audio signal and the secret message are first encoded uniformly by XOR into a low bit rate stream, and the output forms the encrypted message. The low bit rate stream contains inactive and active frames. The two frame types are handled by their respective steganography algorithms to embed the secret information. The low bit rate stream with hidden information is called the stegano speech, which is transmitted over the channel. The encrypted message is then hidden again in the original audio, generating a new audio stream. This provides the security.

C. Data Extracting

The stegano speech is then decoded, and the extraction of the secret information from the stegano speech is the inverse of the embedding algorithm. The new audio is transmitted to the receiver through the channel. The receiver extracts the audio signal and the encrypted message, which are then De-XORed. Finally, the receiver obtains the secret information as well as the PCM-formatted audio stream.
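The XOR and De-XOR steps rely on the fact that XOR with the same key is its own inverse. The sketch below illustrates only that property; the repeating-key masking, the key value, and the function name are assumptions, since the text does not specify how the XOR operand is derived, and a short repeating key is not a secure cipher.

```python
def xor_mask(data, key):
    """XOR every byte of data with a repeating key (assumed operand)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

secret = b"hidden message"
key = b"k3y"                             # hypothetical shared key

encrypted = xor_mask(secret, key)        # sender side: XOR
recovered = xor_mask(encrypted, key)     # receiver side: De-XOR

print(recovered == secret)
```

In the scheme described above, the masked bytes would then be embedded in the active and inactive frames before transmission, and the receiver would extract them before applying the De-XOR step.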

IV. CONCLUSION

This paper has presented a high-capacity steganography algorithm for embedding data in the active and inactive frames of low bit rate audio streams encoded by an XOR-operation source codec. VAD algorithms are used to separate the active frames from the inactive frames of the audio. The data are hidden in both active and inactive frames. The encrypted messages are again hidden in the same audio. The experimental results have shown that the proposed steganography algorithm can achieve a larger data embedding capacity with the same imperceptibility.


When this method is used over VoIP, the integrity of the hidden message is preserved in the case of packet loss. In future work, we plan to make this method more efficient by achieving zero packet loss and higher data security.

REFERENCES

[1] Annex, A. (2009) 'Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 Kbit/s', ITU-T Recommendation G.723.1 [Online]. Available: http://www.itu.int/net/itut/sigdb/speaudio/AudioForms.asp x?val=11 172.

[2] Aoki, N. (2008) 'A technique of lossless steganography for G.711 telephony speech', in Proc. 2008 4th Int. Conf. Intelligent Inf. Hiding Multimedia Signal Process. (IIH-MSP), Harbin, Aug. 2008, pp. 608–611.

[3] Bai, L.Y. and Xiao, B. (2008) 'Covert channels based on jitter field of the RTCP header', in Proc. IEEE Int. Conf. Intelligent Inf. Hiding Multimedia Signal Process., pp. 1388–1391.

[4] Bao, C. and Zhu, C. (2006) 'Steganalysis of compressed speech', in Proc. IMACS Multiconf. Computational Eng. Syst. Applicat. (CESA), pp. 5–10.

[5] Celik, M.U., Sharma, G. and Saber, E. (2005) 'Lossless generalized LSB data embedding', IEEE Trans. Image Process., vol. 14, no. 2, pp. 253–266.

[6] Chen, B. and Wornell, G.W. (2001) 'Quantization index modulation: a class of provably good methods for digital watermarking and information embedding', IEEE Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443.

[7] Kitawaki, N., Nagabuchi, H. and Itoh, K. (1988) 'Objective quality evaluation for low-bit-rate speech coding systems', IEEE J. Sel. Areas Commun., vol. 6, no. 2, pp. 242–248.

[8] Ma, L., Wu, Z. and Yang, W. (2007) 'Approach to hide secret speech information in G.721 scheme', Lecture Notes Comput. Sci., vol. 4681, pp. 1315–1324.

[9] Petitcolas, F.A.P. and Kuhn, M.G. (1999) 'Information hiding-a survey', Proceedings of the IEEE, vol. 87, no. 7, pp. 1062–1078.

[10] Sallee, P. (2004) 'Model-Based Steganography', IWDW 2003, LNCS 2939, pp. 154–167.

[11] Quatieri, T.F. (2002) 'Discrete-Time speech signal processing: Principles and practice', Prentice Hall PTR.

[12] Tian, H., Zhou, K., Feng, D. and Liu, J. (2008) 'A covert communication model based on least significant bits steganography in voice over IP', in Proc. 9th Int. Conf. For Young Comput. Scientists, pp. 647–652.

[13] Wang, C. and Wu, Q. (2007) 'Information Hiding in Real-Time VoIP Streams', Ninth IEEE International Symposium on Multimedia, Proceedings, pp. 255–262.

[14] Wu, Z. and Yang, W. (2006) 'G.711-based adaptive speech information hiding approach', Lecture Notes Comput. Sci., vol. 4113, pp. 1139–1144.