Cryptanalyzing an encryption scheme based on blind source separation

(1)

Cryptanalyzing an Encryption Scheme Based on

Blind Source Separation

Shujun Li, Chengqing Li, Kwok-Tung Lo, Member, IEEE and Guanrong Chen, Fellow, IEEE

Abstract— Recently there is a proposal of using the underde-termined BSS (blind source separation) principle to design image and speech encryption. In this paper, we report a cryptanalysis of this BSS-based encryption scheme and point out that it is not secure against known/plaintext attack and chosen-ciphertext attack. In addition, we discuss some other security defects of the schemes: 1) it has a low sensitivity to part of the key and to the plaintext; 2) it is weak against a ciphertext-only differential attack; 3) a divide-and-conquer (DAC) attack can be used to break part of the key. We ﬁnally analyze the role of BSS in this approach towards cryptographically secure ciphers.

Index Terms— blind source separation (BSS), speech encryp-tion, image encrypencryp-tion, cryptanalysis, known-plaintext attack, chosen-plaintext attack, chosen-ciphertext attack, differential attack, divide-and-conquer (DAC) attack.

I. INTRODUCTION

With the rapid development of multimedia and networking technologies, the security of multimedia data becomes more and more important in many real applications. To fulﬁll such an increasing demand, during the past two decades many encryption schemes have been proposed for protecting multimedia data, including speech signals, images and videos [1]–[9].

According to the nature of the data, multimedia encryption schemes can be classified into two basic types: analog and digital. Most early schemes were designed to encrypt analog data in various ways: element permuting, signal masking, frequency shuffling, etc., all of which may be exerted in the time domain or the transform domain, or both. However, due to the simplicity of their encryption procedures, almost all analog encryption schemes are not sufficiently secure against cryptographical attacks, especially those modern attacks such as known/chosen-plaintext and chosen-ciphertext attacks [2], [3], [10], [11]. As a comparison, in digital encryption schemes,

Copyright (c) 2007 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to [email protected]. This research was partially supported by The Hong Kong Polytechnic University’s Postdoctoral Fellowships Program under grant no. G-YX63, and by the Alexander von Humboldt Foundation of Germany. The work of K.-T. Lo was supported by the Research Grants Council of the Hong Kong SAR Government under Project Number 523206 (PolyU 5232/06E).

Shujun Li is with the FernUniversit¨at in Hagen, Lehrgebiet Information-stechnik, Universit¨atsstraße 27, 58084 Hagen, Germany.

Chengqing Li and Guanrong Chen are with the Department of Electronic Engineering, City University of Hong Kong, Kowloon Toon, Hong Kong SAR. Kwok-Tung Lo is with the Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong SAR.

The corresponding author is Shujun Li. Contact him via his personal web site: http://www.hooklee.com.

one can employ many cryptographically strong ciphers, such as DES [12] and AES [13], to achieve a higher level of security. Besides, to achieve a higher efﬁciency of encryption and to meet some special demands of multimedia encryption (such as format-compliance [14] and perceptual encryption [15]), many speciﬁc multimedia encryption schemes have also been developed [4]–[6]. Recent cryptanalysis work [16]– [30] has shown that some multimedia encryption schemes are insecure against cryptographical attacks.

Recently Lin et al. suggested employing blind source sepa-ration (BSS) for the purpose of image and speech encryption [31]–[37]. The basic idea is to mix multiple plaintexts (or mul-tiple segments of the same plaintext) with a number of secret key signals, in the hope that an attacker has to solve a hard mathematical problem – the underdetermined BSS problem. In Sec. VII of [37], Lin et al. claimed that this BSS-based cipher “is immune from the attacks such as the ciphertext-only attack, the known-plaintext, and the chosen-plaintext attack”, “as long as the intractability of the underdetermined BSS problem is guaranteed by the mixing matrix for encryption”.

This paper re-evaluates the security of the BSS-based en-cryption scheme and points out that it is actually insecure against known/chosen-plaintext attack and chosen-ciphertext attack. In addition, some other security defects are found under the scenarios of ciphertext-only attack, including its low sensitivity to the mixing matrix (part of the secret key) and to the plaintext, and a differential attack can be recommended, which works well when the matrix size is small. Based on the cryptanalytic ﬁndings, we further discuss the role of BSS in this approach towards cryptographically secure ciphers.

The rest of this paper is organized as follows. In the next section, a brief introduction is given to the BSS-based encryption scheme. Section III is the main body of this paper, which reports detailed cryptanalysis of the BSS-based encryption scheme. Then, the role of BSS in cryptography is discussed in Sec. IV. Finally, the last section concludes the paper.

II. BSS-BASEDENCRYPTION

Blind source separation is a technique that tries to recover a set of unobserved sources or signals from some observed mixtures [38]. Given N unobserved signals s1,· · · , sN and a mixing matrix A of size M×N, the BSS problem is to recover s1,· · · , sN from M observed signals x1,· · · , xM, where

[x1,· · · , xM]T = A[s1,· · · , sN]T. (1) When M ≥ N, blind source separation is possible if A satisﬁes some conditions. However, when M < N , this is

(2)

generally impossible (whatever A is), thus leading to the difﬁcult underdetermined BSS problem.

In [31]–[37], Lin et al. introduced a number of secret key signals to make the determination of the plaintext signals become an underdetermined BSS problem in the case that the key signals are unknown. Given P input plain-signals

s1(t),· · · , sP(t) and Q key signals k1(t),· · · , kQ(t), the encryption procedure is described as follows1_:

x(t) = [x1(t),· · · , xP(t)]T = Ask(t), (2) where x(t) denote P cipher-signals, sk(t) = [s1(t),· · · , sP(t), k1(t),· · · , kQ(t)]T, and A is a P×(P +Q) mixing matrix whose elements are within [−1, 1]. Write A = [As, Ak], where As is a P × P matrix and Ak is a P × Q matrix. Then, the encryption procedure can be represented in an equivalent form as

x(t) = Ass(t) + Akk(t), (3)

where s(t) = [s1(t),· · · , sP(t)]T and k(t) = [k1(t),· · · , kQ(t)]T. Thus, as long as As is an invertible matrix, one can decrypt s(t) as follows2:

s(t) = A−1_s (x(t)− Akk(t)) . (4) Different values of Q was used in Lin et al.’s papers: Q = 1 in [31] and Q = P in [32]–[37]. When Q = P , Lin et al. further set As= B and Ak = βB, where β ≥ 10 for image encryption and β ≥ 1 for speech encryption. In this case, the encryption procedure becomes

x(t) = B (s(t) + βk(t)) , (5)

and the decryption procedure becomes

s(t) = B−1x(t)− βk(t). (6)

Observing Eq. (3), one can see that the encryption procedure contains two steps:

• Step 1: x(1)(t) = Ass(t); • Step 2: x(t) = x(1)(t) + Akk(t).

The ﬁrst step corresponds to a simple matrix-based block cipher, and the second step corresponds to a simple addition-based stream cipher. From a different point of view, the two steps are exchanged as follows:

• Step 1: x(1)(t) = s(t) + A−1_s Akk(t); • Step 2: x(t) = Asx(1)(t).

In any case, the BSS-based encryption scheme is always a product cipher composed by a simple block cipher and a simple stream cipher. In next section, we will show that the two sub-ciphers can be separately broken by known/chosen-plaintext attack and chosen-ciphertext attack.

In Lin et al.’s BSS-based encryption scheme, the key signals

k1(t),· · · , kQ(t) are so long as the plain-signals and have to 1_{To provide a clearer description of the BSS-based encryption scheme, in}

this paper we use some notations different from those in Lin et al.’s original papers. For example, in [37], the i-th key signal is denoted by sni(t), while

in this paper we use ki(t) to emphasize the fact that it is a key signal.

2_{In Lin et al.’s papers, it is said that the decryption procedure was}

achieved via BSS. However, from the cryptographical point of view, it is more convenient to denote the decryption procedure by Eq. (4).

be generated by a pseudo-random number generator (PRNG) with a secret seed I0, which serves as the secret key. According

to the principle of BSS, the decryption process actually does not need any knowledge about the mixing matrix A to separate (i.e., decrypt) the encrypted plain-signals (i.e., the plaintexts). So, it seems that Lin et al. do not consider A as part of the secret key. However, we believe that A should be kept secret like part of the key, due to the following considerations:

• From a cryptographer’s point of view, except for the secret key, all details about a cryptosystem are known to attackers. So, if A is not part of the secret key, it should be considered known to attackers. Unfortunately, in this case the product cipher will be degraded to be a simple stream cipher. Considering x∗(t) = A−1s x(t) as the equivalent cipher-signal, the encryption procedure becomes

x∗(t) = s(t) + A−1_s Akk(t). (7) From the above equation, one can see that the encryption scheme is actually independent of the underdetermined BSS problem.

• As the theoretical basis of the BSS technique, it is assumed that the input signals are mutually independent of each other. Though Lin et al. have shown that the BSS techniques can work well for some images and speeches, it remains doubtful if the BSS technique can still work well to decrypt closely-related input signals (such as an image and its watermarked version, or consecutive frames in a movie). Thus, the knowledge about A is required to ensure that any plaintexts can be exactly decrypted in any case. This is the reason why we represent the decryption procedure as Eq. (4) without employing any BSS algorithm.

• As will be shown later in Sec. III-A.4, the key signals can be totally circumvented in a ciphertext-only differential attack, so the mixing matrix A must be kept secret as a second defence to maintain the security.

Thus, in this paper we assume that the secret key consists of both I0 and A. Note also that adding A into the secret key

will deﬁnitely lead to a stronger (at least equivalently strong) cryptosystem compared with the one with a single key I0. By

cryptanalyzing the stronger cryptosystem, the original one is also cryptanalyzed.

In [31]–[35], the BSS-based encryption scheme was mainly designed to encrypt P images simultaneously, where si(t) is the t-th pixel in the i-th image. In [36], [37], the encryption scheme was suggested to encrypt a single speech, each frame of which is divided into P segments and si(t) is the t-th sample in the i-th segment. This encryption scheme can also be applied to a single image, by dividing it into P blocks of the same size. To facilitate the following discussion, we assume that the encryption scheme is used to encrypt a single plaintext with P segments of equal sizes.

In Sec. VII of [37], it was claimed that the BSS-based encryption scheme is secure against most modern crypto-graphical attacks, including the ciphertext-only attack, known-plaintext attack, and chosen-known-plaintext attack. In the next sec-tion, we will show that this claim is questionable.

(3)

III. CRYPTANALYSIS

Before introducing the cryptanalytic results, let us see how large the key space is. In [31]–[37], each element of A is restricted within the interval [−1, 1]. Thus, assuming that each element in A has R possible values3, the number of all possible mixing matrix A is RP (P +Q). Furthermore, assuming that the bit size of I0 is L, the size of the whole key space is RP (P +Q)₂L_{. When Q = P and A = [B, βB], the size of the} whole key space is RP22L. Later, we will show that the real size of the key space is much smaller than this estimation, due to some essential security defects of the BSS-based encryption scheme. We will also explain why this encryption scheme is not secure against known/plaintext attack and chosen-ciphertext attack.

A. Ciphertext-Only Attack

1) Divide-and-Conquer (DAC) Attack: Rewriting Eq. (4) in the following form:

s(t) = ˆAxk(t), (8) where xk(t) = [x1(t),· · · , xP(t), k1(t),· · · , kQ(t)]T and ˆ A = A−1_s [I,−Ak] = £ A−1_s ,−A−1_s Ak ¤ .

From the above equation, to recover si(t), one only needs to know k(t) and the i-th row of ˆA. In other words, when the BSS-based encryption scheme is used to encrypt P indepen-dent plaintexts, the i-th plaintext can be exactly recovered with the knowledge of I0 and the i-th row of ˆA. A similar result

can be obtained when P segments of one single plaintext is encrypted with the encryption scheme. This fact means that

P rows of ˆA can be separately broken with a divide-and-conquer (DAC) attack. As a result, the size of the key space is reduced to be P R(P +Q)₂L_{. When Q = P and A = [B, βB],} it becomes P RP₂L_.

2) Low Sensitivity to A: From the cryptographical point of view, given two distinct keys, even if their difference is the minimal value under the current ﬁnite precision, the encryption and decryption results of a good cryptosystem should still be completely different. In other words, this cryptosystem should have a very high sensitivity to the secret key [12]. Unfortunately, the BSS-based encryption scheme does not satisfy this security principle, because the involved matrix computation is not sufﬁciently sensitive to matrix mismatch. Given two matrices A1 = [a1;i,j] and A2 = [a2;i,j] of

size M × N, if the maximal difference of all elements is

ε, then one can easily deduce that ∆xi, the i-th element of 3_{The value of R is determined by the ﬁnite precision under which the}

cryptosystem is realized. For example, if the cryptosystem is implemented with n-bit ﬁxed-point arithmetic, R = 2n_{; if it is implemented with IEEE}

ﬂoating-point arithmetic, R ≈ 231 (single-precision) or R≈ 263 (double-precision) [39], where the sign bit of the ﬂoating-point number is always negative.

∆x = A1s(t)− A2s(t), satisﬁes the following inequality:

|∆xi| = ¯¯ ¯¯ ¯¯ N X j=1 (a1;i,j− a2;i,j)sj ¯¯ ¯¯ ¯¯, ≤ N X j=1 |a1;i,j− a2;i,j| · |sj|, ≤ Nε max(|s(t)|),

where|s(t)| denotes the vector composed of absolutes values of all elements in s(t), i.e.,|s(t)| = [ |s1(t)| · · · |sN(t)| ]T (the same expression will also be used for other vec-tors/matrices afterwards without further explanation). As a result, the matrix A can be approximately guessed under a relatively large finite precision ε, still maintaining an accept-able quality of the recovered plaintexts. Since under the finite precision ε one only needs to guess d2/εe values of each element in A, the size of the key space is significantly reduced from P R(P +Q)2L to Pd2/εe(P +Q)2L, whered2/εe(P +Q)¿

R(P +Q).4 When Q = P and A = [B, βB], the size of key space is reduced from P RP2L to Pd2/εeP2L.

The above low sensitivity can be easily veriﬁed with exper-iments described as follows:

• Step 1: for a randomly-generated key (A, I0), calculate the ciphertext x(t) corresponding to a plaintext s(t); • Step 2: with another mismatched key (A + εR, I0),

decrypt x(t) to get ˜s(t) – an estimated version of s(t), where ε∈ (0, 1) and R is a P ×(P +Q) random (1, −1)-matrix.

For each value of ε, the second step was repeated for 100 times to get a mean value of the recovery error (measured in MAE – mean absolute error)5_{. Then, one can observe the}

relationship between the recovery error and the value of ε. Figure 1 shows the experimental results when the plaintexts are a digital image and a speech ﬁle, respectively.

The experimental results conﬁrm that a mismatched key can approximately recover the plaintext. Considering that humans have a good capability of correcting errors in viewing images and listening to speech, even relatively large errors may not be able to prevent a human attacker from recognizing the plain-image or plain-speech. Thus, the value of ε may be relatively large. When P = 4, A = [B, βB] and ε = 0.1, we give two examples of such recognizable plaintexts with relatively large errors in Figs. 2 and 3.

From the above experimental results, we can exhaustively search for an approximate version of A under the ﬁnite precision ε = 0.01∼ 0.1. Such an approximate version of A is then used to roughly reveal the plaintext. Since the searching complexity is O¡Pd2/εeP +Q¢, such an exhaustive search is feasible when P, Q is not very large6. When P = 2 and

4_{In this paper,} _{dxe denotes the ceiling function of x, i.e., the minimal}

integer that is not less than x.

5_{When the plaintext is a digital image with 256 gray scales, we ﬁrst calibrate}

each sub-image into the range{0, · · · , 255} and then calculate the recovery error of the whole image.

6_{In [31]–[37], small values are used in all examples: P = 2 or 4 and}

(4)

10−3 10−2 10−1 100 0 10 20 30 40 50 60 70 ε MAE Legend:∗ – P = Q = 4;◦ – P = 4 and A = [B, βB] (β = 10). a) 10−3 10−2 10−1 100 10−3 10−2 10−1 100 101 102 ε MAE Legend:∗ – P = Q = 4;◦ – P = 4 and A = [B, βB] (β = 2). b)

Fig. 1. The experimental relationship between the recovery error and the value of ε: a) the plaintext is a digital image “Lenna” (Fig. 3a); b) the plaintext is a speech ﬁle “one.wav” that corresponds to the pronunciation of the English word “one” (from Merriam-Webster Online Dictionary, http://www.m-w.com).

A = [B, βB], we carried out a large number of experiments in the following steps:

• Step 1: for a randomly-generated key (B, I0), calculate the ciphertext x(t) corresponding to a plaintext s(t); • Step 2: randomly generate a matrix R (each element

in the interval [−1, 1]), and then decrypt x(t) with the guessed key (R, I0) to get ˜s(t);

• Step 3: repeat Step 2 for r rounds, output the recovered

plaintext ˜s∗(t), every segment of which corresponds to the best recovery performance in all the r rounds; • Step 4: for the i-th segment of ˜s∗(t), ﬁnd the

correspond-ing matrix R, and extract its i-th row of its inverse R−1

0 0.1 0.2 0.3 0.4 0.5 −0.6 −0.4 −0.2 0 0.2 a) 0 0.1 0.2 0.3 0.4 0.5 −0.5 0 0.5 b)

Fig. 2. An example of human capability of recognition against large noises in speech: a) the original plain-speech “one.wav”; b) the recovered speech (MAE=0.164103). For reader’s veriﬁcation, the recovered speech is posted on-line at http://www.hooklee.com/Papers/Data/BSSE/one MAE=0.164103.wav.

a) b)

Fig. 3. An example of human capability of recognition against large noises in images: a) the original plain-image “Lenna”; b) the recovered image (MAE=47.6913).

to form the i-th row of ˜B−1, the inverse of an estimation of the original matrix B.

Assuming that the target ﬁnite precision is ε > 0, the interval [−1, 1] is divided into nε = d2/εe sub-intervals. Without loss of generality, assume that 2/ε is an integer. Then, each sub-interval is of equal size. Thus, if the element in the random matrix R has a uniform distribution over [−1, 1], the probability that |ri,j− ai,j| < ε occurs at least one time in r rounds of experiment is p(nε, r) = 1−(1−1/nε)r, where ri,j and ai,j are the (i, j)-th elements of R and A, respectively. One can easily deduce that p(nε, r) is an increasing function with respect to r and

p(nε, nε) > lim nε→∞ p(nε, nε) = 1− lim nε→∞ (1− 1/nε)nε = 1− e−1≈ 0.6321, which results in that p(nε, r) > 1− e−1 when r ≥ nε. In other words, with r≥ nεexperiments, it is a high-probability event that at least one ri,j is “equal” to ai,j under the ﬁnite precision ε. To get an approximate estimation of the i-th row of A, one can see that r = O¡nP

ε ¢

rounds of experiment are needed.

Apparently, the above steps actually simulate the process of a real ciphertext-only attack that tries to reveal the plaintext and to exhaustively guess B−1 (under the assumption that I0

(5)

is known). Note that MAE cannot be calculated to evaluate the recovery performance in a real attack, in which one does not know the plaintext. Fortunately, exploiting the large information redundancy existing in natural images and speech signals, one can resort to use some other measures to reﬂect the recovery performance of each segment of ˜s(t). In our experiments, we use a measure called MANE (mean absolute neighboring error), which is deﬁned as follows for the i-th segment of ˜s(t): 1 T− 2 T_X−1 t=2 |˜si(t)− ˜si(t− 1)| + |˜si(t)− ˜si(t + 1)| 2 , (9)

where T denotes the segment length. In Figs. 4 and 5, one recovered plain-speech signals and two recovered plain-images are shown for demonstration. One can see that r = O(10, 000) (or ε ≈ 0.01) is sufﬁcient to get a good estimation of the plaintext. 0 0.1 0.2 0.3 0.4 0.5 −0.6 −0.4 −0.2 0 0.2 a) 0 0.1 0.2 0.3 0.4 0.5 −0.6 −0.4 −0.2 0 0.2 b)

Fig. 4. A recovered speech in one 50,000-round experiment of exhaustively

guessing A when P = 2 and A = [B, βB]: a) the original

plain-speech “one.wav”; b) the recovered plain-speech (MANE of each segment: 0.0469, 0.0521). For reader’s veriﬁcation, the recovered speech is posted online at http://www.hooklee.com/Papers/Data/BSSE/one MANE=0.0469-0.0521.wav.

a) b)

Fig. 5. Two recovered plain-images in experiments of exhaustively guessing B when P = 2 and A = [B, βB]: a) r = 1, 000 (MANE of each segment: 39.7491, 14.9373); b) r = 10, 000 (MANE of each segment: 16.3888, 15.1722).

Note that for 2-D images the above 1-D MANE may be generalized to include more neighboring pixels, thus achieving a more accurate description of the recovery performance. In addition, multiple quality factors can be employed to further enhance the evaluation of the recovery performance.

3) Low Sensitivity to k(t): Due to the same reason as

the low sensitivity to A, one can deduce that the BSS-based encryption scheme is also insensitive to the key signal k(t). Given two key signals k1(t) and k2(t), if the

maxi-mal difference of all elements is ε, then each element of

|Akk1(t)−Akk2(t)| is not greater than Q max(|Ak|)ε = Qε. Since k(t) itself is not part of the secret key, but generated from I0, this problem does not have much negative inﬂuence

on the security of the whole cryptosystem against ciphertext-only attacks.

4) Differential Attack: Given two plaintexts s(1)_{(t) and} s(2)_{(t), if they are encrypted with the same key (A, I}

0), one

can get the following formula from Eq. (3):

∆x(t) = As∆s(t), (10) where ∆x(t) = x(1)(t)−x(2)(t) and ∆s(t) = s(1)(t)−s(2)(t). Note that Akk(t) disappears from the above equation. This means that from the differential viewpoint only As is the secret key, i.e., I0 is insigniﬁcant in the key. Considering the

low sensitivity of the encryption scheme to A, under ﬁnite precision ε the key space becomes O¡P nPε

¢

, and so one might exhaustively search As to recover the plaintext difference as follows:

∆s(t) = A−1s ∆x(t). (11) From the obtained plaintext difference, one can get a mixed view of the two interested plaintexts, in which both plaintexts may be completely recognizable by humans. Figures 6 and 7 show four plaintext differences of two speech ﬁles and two images. 0 0.1 0.2 0.3 0.4 0.5 −0.6 −0.4 −0.2 0 0.2 a) 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 −0.5 0 0.5 b) 0 0.1 0.2 0.3 0.4 0.5 −0.5 0 0.5 c) 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 −0.5 0 0.5 d)

Fig. 6. Differences of two plain-speech files: a) the first speech “one.wav”; b) the second speech “two.wav”; c) the difference speech signal “one”−“two”; d) the difference speech signal “two”−“one”. For readers’ verification, the two difference speech files are posted online at http://www.hooklee.com/Papers/Data/BSSE/one-two.wav and http://www.hooklee.com/Papers/Data/BSSE/two-one.wav.

(6)

a) b)

Fig. 7. Differences of two plain-images, “Lenna” and “cameraman”: a) Lenna-cameraman; b) cameraman-Lenna.

Denoting the guessed matrix by Ãs, one has ˜ ∆s(t) = Ã −1 s ∆x(t) = Ã −1 s As∆s(t). (12) Apparently, if Ãs 6= As, the obtained plaintext difference

˜

∆s(t) will have an inter-segment mixture, which may make the recognition of the two plaintexts more difficult. Fortunately, when P is relatively small, such an inter-segment mixture may not be too severe to prevent the recognition of the two plaintexts by humans. More importantly, our experiments showed that humans are able to recognize the two plaintexts even when the mismatch between Ãsand Asis not very small. When P = 2, As= · 0.7123 −0.4272 0.1958 0.1295 ¸ , Ãs= · 0.5914 0.9527 0.5726 0.1437 ¸ , (13)

a plain difference-image obtained in our experiments is shown in Fig. 8. One can see that both plain-images, “Lenna” and “cameraman”, can still be roughly recognized from such a heavily mixed difference-image. Another obtained plain difference-speech for “one.wav” and “two.wav”, is shown in Fig. 9, from which the two English words (“one” and “two”) are also distinguishable.

Fig. 8. One obtained plain difference-images with a badly-mismatched key when P = 2.

To further show the real performance of the differential attack with a badly-mismatched key, we have also carried out some experiments on the plain-images “Lenna” and

“camera-0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 −0.2

0 0.2

Fig. 9. One obtained plain difference-speech when As and ˜As have a

relatively large mismatch. For readers’ veriﬁcation, this difference-speech is posted online at http://www.hooklee.com/Papers/Data/BSSE/two-one-large-mismatch.wav. man” when P = 4, As=     0.6444 −0.2417 −0.2687 −0.7043 −0.1770 0.2778 0.5539 −0.7035 0.0539 −0.6525 −0.3157 0.2781 0.8404 0.5716 0.5484 −0.8133     and ˜ As=     0.9283 0.8109 0.9567 0.4825 0.1668 0.4057 0.8549 0.3246 0.3868 0.7950 0.9863 0.0574 0.4860 0.0194 0.2507 0.2910     . One result is shown in Fig. 10.

Fig. 10. One obtained plain difference-image with a badly-mismatched key when P = 4.

a) b)

Fig. 11. A visually-optimal result obtained in 100 plain difference-images: a) the optimal difference-image; b) the negative of the optimal difference-image.

In this differential attack, some quality evaluation factors (such as MANE used in Sec. III-A.2) are not suitable to auto-matically determine the best result in many plain difference-signals, because each segment of an obtained difference-signal is also a natural signal with abundant information redundancy.

(7)

Instead, one has to output all obtained difference-signals, and check them with naked eyes or ears to ﬁnd a perceptually-optimal result with the least inter-segment mixture. Figure 11 shows such a result in 100 plain difference-images when

P = 2 and A follows Eq. (13). By checking each segment

separately and combining the P optimal segments together, one can further get a better result with less inter-segment mixture.

While this differential attack works well for P = 2 as shown above, it will become infeasible when P is sufﬁciently large, due to the following reasons: 1) the inter-segment mixture is too severe; 2) the complexity of checking all O¡Pd2/εeP¢

difference-signals is beyond human’s capability.

B. Low Sensitivity to Plaintext

Another cryptographical property required by a good cryp-tosystem is that the encryption should be very sensitive to plaintext, i.e., the ciphertexts of two plaintexts with a slight dif-ference should be very different [12]. However, this property does not hold for the BSS-based encryption scheme. Given two key signals s1(t) and s2(t), if the maximal difference of

all elements is ε, then each element of |Ass1(t)− Ass2(t)|

is not greater than P max(|As|)ε = P ε. When the same secret key is used to encrypt two closely-correlated plaintexts, such as a plaintext and its watermarked version, this security defect means that the exposure of one plaintext leads to the revealment of both.

C. Known-Plaintext Attack

In this kind of attack, one can access to a number of plaintexts that are encrypted with the same key. From Eq. (10), with P plaintext differences, one immediately knows that the mixing matrix can be uniquely determined as follows:

As= ∆X(t)(∆S(t))−1, (14) where ∆S(t) and ∆X(t) are P× P matrices, constructed row by row from the P plaintext differences and the corresponding ciphertext differences, respectively. Then, Akk(t) can be further solved from any plaintext and its ciphertext:

Akk(t) = x(t)− Ass(t). (15) Now, (As, Akk(t)) can be used to recover other plaintexts encrypted by the same key (A, I0). Note that Akk(t) has a ﬁnite length determined by the maximal length of all known plaintexts, so (As, Akk(t)) can only recover plaintexts under this ﬁnite length.

When A = [B, βB], the key signals can also be determined: k(t) =s(t)− B

−1_x(t)

β . (16)

If the PRNG used is not cryptographically strong (such as LFSR [12]), it may be possible to further derive the secret seed I0, thus completely breaking the BSS-based encryption

scheme.

Note that n distinct plaintexts can generate ¡n₂¢ = n(n− 1)/2 plaintext differences. Solving the inequality n(n−1)/2 ≥

P , one can get the number of required plaintexts to yield at

least P plaintext differences:

n≥lpP− 1/4 + 1/2

m

≈√P . (17)

D. Chosen-Plaintext/Ciphertext Attack

In chosen-plaintext attack, one can freely choose a number of plaintexts and observe the corresponding ciphertexts, while in chosen-ciphertext attack, one can freely choose a number of ciphertexts and observe the corresponding plaintexts. So, in these attacks, one can choose P plaintext differences easily, which means that the above differential known-plaintext attack still works ﬁne in the same way.

IV. DISCUSSION

As we pointed out in the last section, the BSS-based encryp-tion scheme is always insecure against known/chosen-plaintext attack. So, the secret key cannot be repeatedly used in any case. This means that the encryption scheme has to work like a common stream cipher, by changing the secret key for each distinct plaintext. However, in this case, k(t) (equivalently, the secret seed I0) is enough to provide a high level of security,

since k(t) satisﬁes the cryptographical properties in a perfectly secure one-time-a-pad cipher (see Sec. V.B of [37]). This means that the BSS process becomes excessive.

Even when one wants to add a second function against potential attacks by applying the BSS mixing, the low sensi-tivity of encryption/decryption to the mixing matrix A (recall Sec. III-A.2) makes this goal less useful. As a result, in the current encryption design, the BSS model does not play a key role in the security of the scheme. The real core of the encryption scheme actually is the embedded PRNG that generates the key signals for masking the plaintexts.

If one wants to use the BSS-based encryption scheme with a repeatedly used key, some essential modifications have to be made to enhance the security against various attacks. Following the cryptanalysis given in the last section, we suggest adopting two coutermeasures simultaneously: 1) use a sufficiently large P ; 2) like the design of most modern block ciphers [12], iterate the BSS-based encryption for many rounds to improve its sensitivity to the secret key and to the plaintext. It is obvious that both countermeasures will significantly influence the encryption/decryption speed of the encryption scheme. It therefore seems doubtful if such an enhanced encryption scheme will have any advantages comparing with other multiple-round block ciphers, especially AES [13] that can be optimized to run at a very high rate on PCs [40].

Finally, it is worth mentioning that the original BSS-based encryption scheme can be used to realize lossy decryption, an interesting feature that may be useful in some real applica-tions7. This feature means that an encryption scheme can still (perhaps roughly) recover the plaintext even when there are some errors in the ciphertexts. An typical use of this feature is that the ciphertext can be compressed with some lossy algorithms to save the required storage in local computers or

7_{Another possible scheme is a matrix-based image scrambling system}

(8)

the channel bandwidth for transmission. For the BSS-based encryption scheme, the lossy decryption feature is ensured by low sensitivity of decryption to ciphertext, which is due to the same reason of the low sensitivity of encryption to plaintext (recall Sec. III-B). However, one should keep in mind that the lossy decryption feature is induced by the low sensitivity to plaintext/ciphertext, so there is a tradeoff between this feature and the security.

V. CONCLUSION

This paper has analyzed the security of an image/speech encryption scheme based on BSS mixing technique proposed in [31]–[37]. It has been shown that this BSS-based encryp-tion scheme suffers from several security defects, includ-ing its vulnerability to a ciphertext-only differential attack, known/chosen-plaintext attack and chosen-ciphertext attack. It remains to see how the BSS-based technique can be further improved for constructing cryptographically strong ciphers.

REFERENCES

[1] H. J. Beker and F. C. Piper, Secure Speech Communications. London: Academic, 1985.

[2] I. J. Kumar, “Cryptology of speech signal,” in Cryptology: System Identiﬁcation and Key-Clustering. Laguna Hills, California: Aegean Park Press, 1997, ch. 6.

[3] R. K. Nichols and P. C. Lekkas, “Speech cryptology,” in Wireless Security: Models, Threats, and Solutions. New York: McGraw-Hill, 2002, ch. 6, pp. 253–327.

[4] B. Furht, D. Socek, and A. M. Eskicioglu, “Fundamentals of multimedia encryption techniques,” in Multimedia Security Handbook, B. Furht and D. Kirovski, Eds. Boca Raton, Florida: CRC Press LLC, 2004, ch. 3, pp. 93–132.

[5] S. Li, G. Chen, and X. Zheng, “Chaos-based encryption for digital images and videos,” in Multimedia Security Handbook, B. Furht and D. Kirovski, Eds. Boca Raton, Florida: CRC Press LLC, 2004, ch. 4, pp. 133–167, preprint is available at http://www.hooklee.com/pub.html.

[6] A. Uhl and A. Pommer, Image and Video Encryption: From Digital Rights Management to Secured Personal Communication. Boston: Springer Science + Business Media Inc., 2005.

[7] B. Furht, E. Muharemagic, and D. Socek, Multimedia Encryption and Watermarking. Springer, 2005.

[8] W. Zeng, H. Yu, and C.-Y. Lin, Eds., Multimedia Security Technologies for Digital Rights Management. Academic Press, 2006.

[9] B. Javidi, Optical and Digital Techniques for Information Security. New York: Springer Science + Business Media Inc., 2005.

[10] M. G. Kuhn, “Analysis for the nagravision video scrambling method,” Online document, available at http://www.cl.cam.ac.uk/∼mgk25, 1998.

[11] S. Li, C. Li, G. Chen, N. G. Bourbakis, and K.-T. Lo, “A general quan-titative cryptanalysis of permutation-only multimedia ciphers against plaintext attacks,” accepted by Signal Processing: Image Communi-cation, in press, doi: 10.1016/j.image.2008.01.003, an early preprint available online at http://eprint.iacr.org/2004/374, 2008.

[12] B. Schneier, Applied Cryptography - Protocols, Algorithms, and Souce Code in C, 2nd ed. New York: John Wiley & Sons, Inc., 1996.

[13] National Institute of Standards and Technology (US), “Speciﬁcation for the advanced encryption standard (AES),” Federal Information Processing Standards Publication 197 (FIPS PUB 197), November 2001.

[14] J. Wen, M. Severa, W. Zeng, M. H. Luttrell, and W. Jin, “A format-compliant conﬁgurable encryption framework for access control of video,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 6, pp. 545–557, 2002.

[15] S. Li, G. Chen, A. Cheung, B. Bhargava, and K.-T. Lo, “On the design of perceptual MPEG-video encryption algorithms,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 2, pp. 214–223, 2007.

[16] M. Bertilsson, E. F. Brickell, and I. Ingemarson, “Cryptanalysis of video encryption based on spaceﬁlling curves,” in Advances in Cryptology -EUROCRYPT’89 Proceedings, ser. Lecture Notes in Computer Science, vol. 434. Berlin / Heidelberg: Berlin, 1989, pp. 403–411.

[17] J.-K. Jan and Y.-M. Tseng, “On the security of image encryption method,” Information Processing Letters, vol. 60, no. 5, pp. 261–265, 1996.

[18] L. Qiao, K. Nahrstedt, and M.-C. Tam, “Is MPEG encryption by using random list instead of ZigZag order secure?” in Proc. IEEE Int. Symposium on Consumer Electronics (ISCE’97), 1997, pp. 226–229.

[19] T. Uehara and R. Safavi-Naini, “Chosen DCT coefﬁcients attack on MPEG encryption schemes,” in Proc. IEEE Paciﬁc-Rim Conference on Multimedia (IEEE-PCM’2000), 2000, pp. 316–319.

[20] C.-C. Chang and T.-X. Yu, “Cryptanalysis of an encryption scheme for binary images,” Pattern Recognition Letters, vol. 23, no. 14, pp. 1847– 1852, 2002.

[21] A. M. Youssef and S. E. Tavares, “Comments on the security of fast encryption algorithm for multimedia (FEA-M),” IEEE Trans. Consumer Electron., vol. 49, no. 1, pp. 168–170, 2003.

[22] S. Li and K.-T. Lo, “Security problems with improper implementations of improved FEA-M,” J. Systems and Software, vol. 80, no. 5, pp. 791– 794, 2007.

[23] S. Li and X. Zheng, “Cryptanalysis of a chaotic image encryption method,” in Proc. IEEE Int. Symposium on Circuits and Systems, vol. II, 2002, pp. 708–711.

[24] ——, “On the security of an image encryption method,” in Proc. IEEE Int. Conference on Image Processing, vol. 2, 2002, pp. 925–928.

[25] C. Li, S. Li, D. Zhang, and G. Chen, “Cryptanalysis of a chaotic neural network based multimedia encryption scheme,” in Advances in Multimedia Information Processing - PCM 2004 Proceedings, Part III, ser. Lecture Notes in Computer Science, vol. 3333. Berlin / Heidelberg: Springer-Verlag, 2004, pp. 418–425.

[26] C. Li, S. Li, G. Chen, G. Chen, and L. Hu, “Cryptanalysis of a new signal security system for multimedia data transmission,” EURASIP J. Applied Signal Processing, vol. 2005, no. 8, pp. 1277–1288, 2005.

[27] C. Li, X. Li, S. Li, and G. Chen, “Cryptanalysis of a multistage encryption system,” in Proc. IEEE Int. Symposium on Circuits and Systems, 2005, pp. 880–883.

[28] C. Li, S. Li, D.-C. Lou, and D. Zhang, “On the security of the Yen-Guo’s domino signal encryption algorithm (DSEA),” J. Systems and Software, vol. 79, no. 2, pp. 253–258, 2006.

[29] S. Li, C. Li, G. Chen, and K.-T. Lo, “Cryptanalysis of RCES/RSES image encryption scheme,” accepted by J. Systems and Software, in press, doi: 10.1016/j.jss.2007.07.037, preprint available online at http: //eprint.iacr.org/2004/376, 2008.

[30] S. Li, C. Li, K.-T. Lo, and G. Chen, “Cryptanalysis of an image scrambling scheme without bandwidth expansion,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 3, pp. 338–349, 2008.

[31] Q.-H. Lin and F.-L. Yin, “Blind source separation applied to image cryptosystems with dual encryption,” Electronics Letters, vol. 38, no. 19, pp. 1092–1094, September 2002.

[32] Q. Lin and F. Yin, “Image cryptosystems based on blind source separation,” in Proceedings of the 2003 International Conference on Neural Networks and Signal Processing (ICNNSP’2003), vol. 2. IEEE, 2003, pp. 1366–1369.

[33] Q.-H. Lin, F.-L. Yin, and Y.-R. Zheng, “Secure image communication using blind source separation,” in Proceedings of the IEEE 6th Circuits and Systems Symposium on Emerging Technologies: Frontiers of Mobile and Wireless Communication (CASSET’2004), vol. 1. IEEE, 2004, pp. 261–264.

[34] Q. Lin, F. Yin, and H. Liang, “Blind source separation-based encryption of images and speeches,” in Advances in Neural Networks - ISNN 2005 Proceedings, Part II, ser. Lecture Notes in Computer Science, vol. 3497. Berlin / Heidelberg: Springer-Verlag, 2005, pp. 544–549.

[35] Q.-H. Lin, F.-L. Yin, and H.-L. Liang, “A fast decryption algorithm for BSS-based image encryption,” in Advances in Neural Networks - ISNN 2006 Proceedings, Part III, ser. Lecture Notes in Computer Science, vol. 3973. Berlin / Heidelberg: Springer-Verlag, 2006, pp. 318–325.

[36] Q.-H. Lin, F.-L. Yin, T.-M. Mei, and H. Liang, “A speech encryption algorithm based on blind source separation,” in Proceedings of the 2004 International Conference on Communications, Circuits and Systems (ICCCAS’2004), vol. 2. IEEE, 2004, pp. 1013–1017.

[37] ——, “A blind source separation based method for speech encryption,” IEEE Trans. Circuits Syst. I, vol. 53, no. 6, pp. 1320–1328, June 2006.

[38] J.-F. Cardoso, “Blind signal separation: Statistical principles,” Proc. IEEE, vol. 86, no. 10, pp. 2009–2025, 1998.

[39] IEEE Computer Society, “IEEE standard for binary ﬂoating-point arith-metic,” ANSI/IEEE Std. 754-1985, 1985.

[40] B. Gladman, “AES and combined encryption/authentication modes,” on-line document, available at http://fp.gladman.plus.com/AES/index.htm, 2006.

(9)

[41] D. V. D. Ville, W. Philips, R. V. de Walle, and I. Lemanhieu, “Image scrambling without bandwidth expansion,” IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 6, pp. 892–897, 2004.