Radio character recognition - UCAM-CL-TR-577 Compromising emanations: eavesdropping risks of co

Figure 3.9: These single-glyph signals extracted from Figs. 3.4 and 3.5 demonstrate the benefits of a larger receiver bandwidth for the reduction of inter-character interference. The two images on the left were received with 200 MHz bandwidth, the others with 50 MHz. Within these groups, the left “W” is from position (61,1) and the right one from position (58,2) in Fig. 3.3, respectively.

3.4 Radio character recognition

Manually evaluating and transcribing captured video signals from compromising emanations is feasible, as the examples in the previous section show, but slow and labor-intensive. A well-equipped eavesdropper can therefore be expected to use pattern-recognition software to transcribe a signal into plaintext automatically. The example presented in this section demonstrates that this is feasible with a very simple algorithm, at least for monospaced fonts.

If all characters have the same width, as is still the case with terminal emulators and video terminals that imitate the behavior of a teletype machine, then the received and rastered image can simply be cut into character cells of equal size, as shown in Fig. 3.9. The rigid timing of the video signal causes all characters to end up aligned identically in these cells. If the eavesdropper can correctly guess the exact font used, then the only two parameters that need exact adjustment for radio character recognition are the line rate $\tilde{f}_h$ and the time when the first pixel of a text field is transmitted. Most other unknown variables that make optical character recognition a complex problem (rotation, scaling, glyph shapes, pixel alignment, glyph separation, etc.) are not an issue with video characters.
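The cell-cutting step can be sketched in a few lines. This is a minimal illustration, not the code used for the experiments: the function name `cut_into_cells`, the parameters `x0` and `y0` (standing in for the position of the first pixel of the text field), and the toy image are all assumptions made for this example.

```python
def cut_into_cells(image, cell_w, cell_h, x0=0, y0=0):
    """Cut a rastered image into a grid of equally sized character cells.

    image  : list of rows, each a list of pixel values
    cell_w : cell width in pixels (e.g. 6 for a 6x13 font)
    cell_h : cell height in pixels (e.g. 13 for a 6x13 font)
    x0, y0 : hypothetical offset of the first pixel of the text field
    Returns grid[row][col], where each cell is a list of cell_h pixel rows.
    """
    n_rows = (len(image) - y0) // cell_h
    n_cols = (len(image[0]) - x0) // cell_w
    grid = []
    for r in range(n_rows):
        top = y0 + r * cell_h
        row_cells = []
        for c in range(n_cols):
            left = x0 + c * cell_w
            cell = [image[y][left:left + cell_w] for y in range(top, top + cell_h)]
            row_cells.append(cell)
        grid.append(row_cells)
    return grid


# Toy 8x4 "image" whose pixel value encodes its position (10*y + x).
image = [[10 * y + x for x in range(8)] for y in range(4)]
grid = cut_into_cells(image, cell_w=4, cell_h=2)
print(grid[0][1])  # second cell of the first text row
```

Because the video timing is rigid, no per-character segmentation is needed; only the offset and the cell size have to be right.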

The choice of the best bandwidth for radio character recognition is a tradeoff. At lower bandwidths, the impulse response of the AM demodulator is longer than a single pixel, and some content of each character cell is influenced by the left neighbor character; see Fig. 3.9 for two examples of the character ‘W’ with different left neighbors at 200 and 50 MHz bandwidth. Inter-character interference is reduced at higher bandwidths. On the other hand, the sharper glyph shapes that come with increased bandwidth also make the comparison algorithm more sensitive to misalignment caused by errors in $\tilde{f}_h$.

In my experiments, choosing the bandwidth close to the pixel frequency turned out to provide the best results. Merely cutting the version of Fig. 3.5 with 256 averaged frames into character cells, using those in lines 3–5 of the test text as a reference, and using the sum of all absolute values of pixel-value differences in a character cell as a decision metric, leads to the following recognition result for the remaining text:

The quick brown fox jumps over the lazy dog. THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG! 6x13 !"#$%&’()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_‘abcdefghijklmnopqrstuvwxyz{|}~ It is well known that electronic equipment produces electromagoetic fields which may cause interference to radio and television reception. The phenomena underlying this have been thoroughly studied over the past few decades. These studies have resulted in internationally agreed methods for measuring the interference produced by equipment. These are needed because the maximum interference levels which equipment may generate have been laid down by law in most countries. (from: Electromagnetic Radiation from Video Display Units: An Eavesdropping Risk?)

Only a single character (“electromagoetic”) is wrong in this example, which corresponds to a character error rate of 0.13 %. This result depends, of course, significantly on a good signal-to-noise ratio. When we apply the same matching algorithm to the signal generated from only 16 averaged frames, the recognized text reads as:

Ihc quick bcown fox_jumps-evec-toe Iazg dsg_=TOE_QHICK-DROWM-EHX JUHPS Q?ER iUE L0ZY DH6! -6zi3= !"#$%&’()* ,-=Z0!?3‘567O9:;< >?@ADcDEFCHIJKLHNcPQRHTHVQ%YZ[\]^=‘abedcBg6Ijkimndpqcstuvw:yz{|}" it Ic weII=kocwn=tHat-clectroric=cguipmcnt e_dduces-electrpmugmctic_fidlde_whico-may euuse _-. = icce-feceaee tc-radic-and teIcvisicn ceccpticc=-|6e phcncmcna uedcrlyigg tcic=have=bcec_= -= _-tncceughIy ctuHicd=dvcc the eust few=decudes, ihcsc stvdics‘have =ecuItcd io_inteceutiocu_iy - _ ugrceH=mct6edc=foc meacuciny t6c icterfcsesce pcoduccd_bg eeuipmcnt. Tbese are-nccded bccouse

toc=meximum intcrfercncc ievcls which-eguipmcnt may gesc-atc-6ave oecn la7d=dewc=by law in mcsc ceuntricc=-(fcem: FIectromegnctic-Radiatibn f_om Video Dispiey_Hsitc:=Hn Eavcsdcc=pimg-Risk?)-

With a character error rate of 34 %, this text is severely distorted and not directly usable, for example, for full-text searching. But because most of the misrecognized characters graphically resemble the correct character, it is still possible to guess most of the text. Even though only 66 % of the characters were recognized correctly, half of the remaining ones (16 %) came second, and another 6 % third, on a list that shows, for each received pattern, all candidate symbols sorted according to how closely they match. In addition to the character error rate, we can introduce further performance quantities that also take into account whether the correct character came as a close second or third.
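The matching and ranking step can be illustrated as follows. This is a sketch under stated assumptions: the 3x3 "glyphs", the received cell, and the helper names `l1_distance`, `rank_candidates` and `depth_of` are invented for the example; only the decision metric (sum of absolute pixel-value differences) is taken from the text.

```python
def l1_distance(a, b):
    """Sum of absolute pixel-value differences between two flattened cells."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_candidates(signal, references):
    """Return all candidate characters, best match first.

    references maps each character to its flattened reference cell.
    """
    return sorted(references, key=lambda ch: l1_distance(signal, references[ch]))


def depth_of(correct, ranking):
    """1-based position of the correct character in the sorted candidate list."""
    return ranking.index(correct) + 1


# Made-up 3x3 reference cells (flattened row by row); not real font data.
references = {
    "I": [0, 9, 0, 0, 9, 0, 0, 9, 0],
    "l": [0, 9, 0, 0, 9, 0, 0, 9, 9],
    "O": [9, 9, 9, 9, 0, 9, 9, 9, 9],
}
received = [1, 8, 0, 0, 9, 1, 0, 8, 7]  # a noisy "l"

ranking = rank_candidates(received, references)
print(ranking)                 # best match first
print(depth_of("I", ranking))  # where "I" would rank if it were the correct symbol
```

The first entry of the ranking is the recognized character; the position of the correct character in the list is the quantity that the depth statistics below are built on.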

If $s$ is a random vector representing the received signal for a character or symbol to be recognized, and $\{r_1, \ldots, r_n\}$ are the reference signals available for the $n$ characters to be distinguished, where $r_c$ is the correct one, then we can produce for each output of $s$ a sorted list $(d_1, \ldots, d_n)$ such that $\|s - r_{d_i}\| < \|s - r_{d_j}\| \Rightarrow i < j$, where $\|\cdot\|$ is the distance metric used (e.g., the vector norm $|\cdot|_1$ in the above experiment). Let $p_i = P(r_c = r_{d_i})$ be the probability that the correct character appears at the $i$-th position in that list. The average depth of the correct character² is then $\sum_{i=1}^{n} p_i \cdot i$ and the entropy of the depth of the correct character is $-\sum_{i=1}^{n} p_i \cdot \log_2 p_i$. The entropy measure, in particular, illustrates how many bits of uncertainty the eavesdropper still has for each character. It can help to estimate, for example, the complexity remaining for a brute-force search to find a correct password, given that a distorted copy of it has already been recognized, and candidate passwords can therefore be tested in order of falling probability.
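One way to test candidate passwords in order of falling probability is a best-first enumeration over the per-character candidate lists. The sketch below is an illustration only: it assumes that each character position comes with independent, strictly positive candidate probabilities (which the text does not specify), and the function name and example values are hypothetical.

```python
import heapq
import math


def candidates_by_probability(ranked):
    """Yield (password, probability) in order of falling joint probability.

    ranked[k] is a list of (char, prob) pairs for position k, sorted by
    falling probability. Assumes independent positions and prob > 0.
    """
    # Work with costs -log(p), so that joint probability becomes a sum.
    costs = [[-math.log(p) for _, p in pos] for pos in ranked]
    start = (0,) * len(ranked)
    heap = [(sum(c[0] for c in costs), start)]
    seen = {start}
    while heap:
        cost, idx = heapq.heappop(heap)
        word = "".join(ranked[k][i][0] for k, i in enumerate(idx))
        yield word, math.exp(-cost)
        # Successors: move one position to its next-best candidate.
        for k in range(len(idx)):
            if idx[k] + 1 < len(ranked[k]):
                nxt = idx[:k] + (idx[k] + 1,) + idx[k + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    new_cost = cost - costs[k][idx[k]] + costs[k][idx[k] + 1]
                    heapq.heappush(heap, (new_cost, nxt))


# Made-up candidate lists for a two-character password.
ranked = [
    [("l", 0.6), ("I", 0.3), ("1", 0.1)],
    [("o", 0.7), ("0", 0.3)],
]
order = [word for word, _ in candidates_by_probability(ranked)]
print(order)
```

The generator is lazy, so a search can stop as soon as a candidate is accepted, without enumerating the full candidate space.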

The average depth of the correct character in the above two experiments was 1.0013 and 2.0995, respectively. The entropy of the depth of the correct character was 0.0141 bit and 1.7778 bit, respectively.
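Both quantities can be estimated from the observed depths of the correct character, using relative frequencies as estimates of the $p_i$. The following is a minimal sketch; the function name `depth_statistics` and the list of depths are made up for illustration and are not data from the experiments.

```python
import math
from collections import Counter


def depth_statistics(depths):
    """Estimate average depth and entropy of the depth of the correct character.

    depths: 1-based rank of the correct character for each received symbol.
    Relative frequencies are used as estimates of the probabilities p_i.
    """
    counts = Counter(depths)
    total = len(depths)
    probs = {d: c / total for d, c in counts.items()}
    average_depth = sum(p * d for d, p in probs.items())
    entropy_bits = -sum(p * math.log2(p) for p in probs.values())
    return average_depth, entropy_bits


# Made-up example: six symbols ranked first, one second, one third.
depths = [1, 1, 1, 1, 1, 1, 2, 3]
avg, ent = depth_statistics(depths)
print(avg, ent)
```

As a plausibility check on the reported figures: if the single misrecognized character in the 256-frame experiment ranked second, the average depth would be 1 + 0.0013 = 1.0013, which agrees with the value given above.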

² The term average depth of correct symbol (ADCS) is also mentioned in the NSA Tempest standard [8]. Its definition there remains classified, and we can only speculate whether it refers to the same order statistic used here.


Figure 3.10: This matrix indicates the pairwise difference of the radio signals of the ASCII characters in Fig. 3.5. Brighter positions indicate larger differences and dark spots signal character pairs that automatic recognition systems are more likely to confuse.

Figure 3.10 shows, for the font used in this example, the difficulty of distinguishing between all possible character pairs. A list of character pairs sorted by increasing distinguishability starts with

and ends in

Nn-1H8 y1R))QV% b;MNO(?R1!)’9Y1.!K’.Q1MNUhN)W(h@HMQ;M!NHHUUQNh,MUM),)MN |~NNj}M~d{UKTpyN{Ml]{Nn}MUHN{pUMMjMN{Q_}j|_hjM{j|‘}N|Q‘}{|}|{}Mj{}NNM{j

A font designer might use such difference matrices to generate security fonts that minimize the success of automatic radio character recognition. But this is likely to also affect safety properties; glyphs that are too similar might be mixed up more easily by the regular viewer.
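A pairwise difference list of the kind underlying Fig. 3.10 can be computed directly from the reference cells. The sketch below assumes the same sum-of-absolute-differences metric as in the recognition experiment; the 3x3 reference cells and the function names are invented for illustration and do not represent the font used here.

```python
from itertools import combinations


def l1_distance(a, b):
    """Sum of absolute pixel-value differences between two flattened cells."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pairs_by_distinguishability(references):
    """Return (distance, char_a, char_b) tuples, least distinguishable pair first."""
    pairs = [
        (l1_distance(references[a], references[b]), a, b)
        for a, b in combinations(sorted(references), 2)
    ]
    return sorted(pairs)


# Made-up 3x3 reference cells (flattened row by row); not real font data.
references = {
    "I": [0, 9, 0, 0, 9, 0, 0, 9, 0],
    "l": [0, 9, 0, 0, 9, 0, 0, 9, 9],
    "O": [9, 9, 9, 9, 0, 9, 9, 9, 9],
}
pairs = pairs_by_distinguishability(references)
for distance, a, b in pairs:
    print(a, b, distance)
```

Pairs at the start of the list correspond to the dark spots in the matrix, that is, the characters an automatic recognizer is most likely to confuse.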

Figure 3.11: Photograph of the CRT screen display seen by the user while the hidden message from Fig. 3.14 is being transmitted.

In document UCAM-CL-TR-577 Compromising emanations: eavesdropping risks of computer displays (Page 54-57)