
STI College – Bacoor

ADAPTIVE DIGITAL STEGANOGRAPHY FOR TRUE-COLOR

BITMAPS

A Thesis Presented to

Systems Technology Institute

Bacoor

in Partial Fulfillment

of the Requirements for the Degree of

Bachelor of Science in Computer Science

by:

Gan, Mark David C.

Mr. Gerry A. Villanueva

Thesis Adviser


ADVISER’S RECOMMENDATION SHEET

This Thesis entitled

Adaptive Digital Steganography for True-Color Bitmaps

by:

Gan, Mark David C.

and submitted in partial fulfillment of the requirements of the

Bachelor of Science in Computer Science degree has been examined

and is recommended for acceptance and approval

_________________________________

Mr. Gerry A. Villanueva


THESIS COORDINATOR AND DEAN’S

ACCEPTANCE SHEET

This Thesis entitled

Adaptive Digital Steganography for True-Color Bitmaps

after having been recommended and approved is hereby accepted

by the Information Technology Department

of Systems Technology Institute - Bacoor

_________________________________

Mr. Angelo Magdangal A. Maderal

Thesis Coordinator

_________________________________

Ms. Minerva R. Almalvez


PANEL’S APPROVAL SHEET

This Thesis entitled

Adaptive Digital Steganography for True-Color Bitmaps

developed by:

Gan, Mark David C.

after having been presented is hereby approved

by the following members of the panel

_________________________________

Mr. Angelo Magdangal A. Maderal

Panelist

March 2003

_________________________________

Mr. Ruben B. Muñoz

Panelist

March 2003

_________________________________

Mr. Reynaldo P. Gozum

Lead Panelist


TABLE OF CONTENTS

Title Page

Adviser’s Recommendation Sheet

Thesis Coordinator and Dean’s Acceptance Sheet

Panel’s Approval Sheet

List of Appendices

List of Tables

List of Figures

Acknowledgement

Abstract

1.0 Introduction
1.1 Background of the Study
1.2 Statement of the Problem
1.3 Objectives of the Study
1.3.1 General Objective
1.3.2 Specific Objectives
1.4 Significance of the Study
1.5 Scope and Limitation
2.0 Methodology of the Study
3.0 Review of Related Literature and Studies
4.0 Theoretical Framework
4.1 Parameters of Information Hiding
4.2 Least Significant Bit Encoding
4.3 Transform Embedding
4.4 Masking and Filtering
4.5 Bit-plane Steganalysis
4.6 Statistical Attacks
4.7 Capacity Evaluation
4.8 Minimum-Error Replacement
4.9 Error Diffusion
4.10 Pseudorandom Number Generators
4.11 Hash Functions
5.0 Data Gathering Procedures and Output
6.0 Documentation of Current System
6.1 EyeMage IIE
6.2 Invisible Secrets 2002
6.3 Steganos File Manager
6.4 The Third Eye
7.0 Requirement Analysis Specification
7.1 Performance
7.2 Security
7.3 Usability
7.4 Functionality
8.0 System Design Specification
8.1 Stegosystem Model
8.1.1 Encoding
8.1.2 Transmission
8.1.3 Decoding
8.2 Stegosystem Pseudocode
8.2.1 Capacity Evaluation Function
8.2.2 Minimum-Error Replacement Function
8.2.3 Error Diffusion Function
8.2.4 Encoding Function
8.2.5 Metadata Decoding Function
8.2.6 Data File Decoding Function
8.3 Image Formats
9.0 Systems Implementation
9.1 Programming Considerations, Issues and Tools
9.2 System Requirement Specification
9.3 Testing Activities
9.4 Installation Process
10.0 Conclusion and Justification
11.0 Recommendation

Appendices

Bibliography

Curriculum Vitae


LIST OF APPENDICES

Appendix A. Information Hiding Terminology

Appendix B. Mathematical Notations

Appendix C. Project Schedule

Appendix D. Cover Images Used in Performance Testing

Appendix E. Stego Images With Severe Distortions

Appendix F. Chameleon Help File


LIST OF TABLES

Table 9-1. Cover-Images Used in Performance Testing
Table 9-2. Test Results for Cover Image “abandon.bmp”
Table 9-3. Test Results for Cover Image “antelope.bmp”
Table 9-4. Test Results for Cover Image “badwater.bmp”
Table 9-5. Test Results for Cover Image “death.bmp”
Table 9-6. Test Results for Cover Image “gardens.bmp”
Table 9-7. Test Results for Cover Image “kiss.bmp”
Table 9-8. Test Results for Cover Image “nap.bmp”
Table 9-9. Test Results for Cover Image “passing.bmp”
Table 9-10. Test Results for Cover Image “race.bmp”
Table 9-11. Test Results for Cover Image “rain.bmp”
Table 9-12. Test Results for Cover Image “ruins.bmp”
Table 9-13. Test Results for Cover Image “storm.bmp”
Table 9-14. Test Results for Cover Image “style.bmp”
Table 9-15. Test Results for Cover Image “subtle.bmp”
Table 9-16. Test Results for Cover Image “sunrise.bmp”


LIST OF FIGURES

Figure 2-1. Software Engineering Paradigm
Figure 4-1. Data-Hiding Problem Space
Figure 4-2. A Grayscale STI College Logo and its Bit-planes
Figure 4-3. Neighbors of a Pixel
Figure 6-1. A Noise Image Created by EyeMage IIE
Figure 6-2. Screenshot of EyeMage IIE
Figure 6-3. Screenshot of Invisible Secrets 2002
Figure 6-4. Screenshot of Steganos File Manager
Figure 6-5. Screenshot of The Third Eye
Figure 8-1. Framework of the Proposed Stegosystem
Figure 8-2. HIPO Chart of the Encoding Module
Figure 8-3. HIPO Chart of the Decoding Module
Figure D-1. abandon.bmp
Figure D-2. antelope.bmp
Figure D-3. badwater.bmp
Figure D-4. death.bmp
Figure D-5. gardens.bmp
Figure D-6. kiss.bmp
Figure D-7. nap.bmp
Figure D-8. passing.bmp
Figure D-9. race.bmp
Figure D-10. rain.bmp
Figure D-11. ruins.bmp
Figure D-12. storm.bmp
Figure D-13. style.bmp
Figure D-14. subtle.bmp
Figure D-15. sunrise.bmp
Figure E-1. e_abandon_data350k.bmp
Figure E-2. e_gardens_data310k.bmp
Figure E-3. e_rain_data170k.bmp
Figure E-4. e_storm_data350k.bmp
Figure E-5. e_style_data190k.bmp
Figure E-6. e_sunrise_data360k.bmp


ACKNOWLEDGEMENT

First of all, the author would like to thank his thesis adviser, Mr. Gerry Villanueva, and the thesis coordinator, Mr. Angelo Magdangal Maderal. Their continued efforts in supervising the development of this thesis are truly appreciated.

The author would also like to express his appreciation for the much-needed support and guidance extended by the faculty and staff of STI College Bacoor.

Sincere gratitude also goes to Mr. Conrado Vidal, who provided valuable assistance in the preparation of this paper.

The author also thanks his friends and classmates, who have helped in various ways for the benefit of this project.

Lastly, the author extends his heartfelt appreciation to his parents and relatives, who have contributed a great deal of moral and material support not only for the completion of this project but, more importantly, for the physical, mental, and spiritual well-being of the author. This project would not have been possible without them.


ABSTRACT

Steganography is the science of hiding the existence of information for the purpose of covert communications. Unlike cryptography, which conceals the content of a message, steganography conceals the very existence of a message.

In the electronic world, digital images are among the most appropriate “hosts” for steganography. Digital photographs are commonly shared, sent, and exchanged throughout the Internet in the form of email attachments or web postings. However, current steganographic software available on the market has poor support for high-capacity image steganography. Even worse, some steganographic software actually distorts or degrades the appearance of cover images and therefore exposes the steganographic transformation the image has undergone.

In this study, new image steganography software is presented to answer the need for software that makes optimum use of the hiding space in an image without creating any visible distortions. Along with a highly secure method for randomized encoding, techniques for adaptive encoding were incorporated into the design of the software. These techniques include capacity evaluation, minimum-error replacement, and error diffusion.

Tests comparing the performance of the presented software with popular, currently available steganography software showed that the presented software is a major improvement, especially in providing high capacities for hidden data while preserving the quality and appearance of the image.


1.0 INTRODUCTION

1.1 Background of the Study

Cryptography has long been the cornerstone of information

security. With a history that can be traced back to the ancient Egyptians

some 4000 years ago, it still plays a crucial role in present-day diplomatic

and military services [MENE1996]. However, encryption of messages is

obviously useful only when there is an expected enemy that is monitoring

the channel of communications. Unfortunately, detection of encrypted

communications is likely to provoke an enemy to exert great efforts in

destroying the exposed signal. In an even worse scenario, an enemy may

have enough computing power to break the encryption and decipher the

message. In such situations, the secrecy of the message transmission can

be as vital as the secrecy of the message itself.

Such a case was introduced by G. J. Simmons in 1983 as the

Prisoners’ Problem [ANDE1998]. The problem is that of two prisoners

named Alice and Bob who want to devise an escape plan.

Communications between them pass through a warden named Willie, who

frustrates their plans by putting them into solitary confinement every time

an encrypted message is detected. To avoid detection, they must first find

a way to conceal not only the content of the message but also the message

transmission itself. The solution is an ancient craft called steganography.


Steganography is the art and science of hiding the existence of information. The term originated from the Greek words στεγανός (steganos) and γράφειν (graphein), which literally mean “covered writing” [HETZ2002]. The Greek historian Herodotus recorded several accounts of steganography in ancient times [KAHN1996]. One of these stories was that of a man named Histiaeus, who wanted to inform his allies when to revolt against the Medes and the Persians. To avoid detection, the message was tattooed on the shaved head of a trusted slave, whose hair was allowed to grow back before he was dispatched. The message was successfully delivered and a revolt soon followed.

Steganography played a key role in classical warfare and it

continued to do so even in modern times. In World War I, for example,

spies made extensive use of invisible inks in order for their mail to pass

through censorship bureaus [DAVE1995]. During World War II, microdot

technology allowed the Germans to photographically shrink documents

into the size of a printed period and avoid detection [KAHN1996]. At the

turn of the century, even the international terrorist leader Osama bin Laden

is believed to be using steganography to “covertly distribute information to

his supporters and hide messages throughout the Internet” [SIEB2001].

Evidently, the practical use of steganography in covert communications

makes it one of the most significant subdisciplines in the field of

information hiding.


Although considered as a relative of cryptography, steganography

presents an entirely different perspective in protecting data. Cryptography

conceals the content of a message; steganography conceals the very

existence of a message [ANDE1998]. Encrypted text is quite prone to

detection, primarily due to its enigmatic or scrambled appearance. For

instance, practically any person who sees a piece of text written like

“HQFUBSWHG PHVVDJH” would suspect that it is an encrypted

message, since the text is obviously meaningless in its present form. On

the contrary, steganography avoids exposure by hiding information within

common and seemingly harmless media. Instead of merely obscuring the

appearance of information, steganography provides the observer with a convincing

illusion. As a result, the act of communication itself is kept secret.
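The scrambled sample quoted above illustrates the point. The thesis does not say which cipher produced it; the sketch below assumes, as an observation rather than a stated fact, that it is a simple alphabetic shift of three positions. The function name `caesar_shift` is hypothetical.

```python
import string


def caesar_shift(text: str, shift: int) -> str:
    """Shift each uppercase letter by `shift` positions; other characters are unchanged."""
    alphabet = string.ascii_uppercase
    offset = shift % 26
    shifted = alphabet[offset:] + alphabet[:offset]
    return text.translate(str.maketrans(alphabet, shifted))


ciphertext = "HQFUBSWHG PHVVDJH"
# Assumption: the sample was produced with a shift of +3, so -3 undoes it.
print(caesar_shift(ciphertext, -3))
```

Under that assumption the text decodes to a readable phrase, which shows how little protection a scrambled appearance offers once the message has been noticed.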

The introduction of computers to the modern world has provided

various forms of digital media that can be exploited by steganography to

secretly host vital information. Modern approaches include the use of text

documents, digital pictures and even music files as carriers of hidden data.

For digital images, the most basic method for steganography is called least

significant bit encoding (LSB encoding). The simple and logical concept

behind this method makes it effective in concealing relatively large

volumes of data within a single image.
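The idea behind LSB encoding can be sketched in a few lines. This is a minimal illustration only, not the software developed in this study: it treats the cover as a flat sequence of 8-bit color components and overwrites the lowest bit of each with one payload bit. The names `embed_lsb` and `extract_lsb` are hypothetical.

```python
from typing import List


def embed_lsb(samples: bytes, payload_bits: List[int]) -> bytes:
    """Overwrite the least significant bit of each sample with one payload bit."""
    if len(payload_bits) > len(samples):
        raise ValueError("payload does not fit in the cover")
    out = bytearray(samples)
    for i, bit in enumerate(payload_bits):
        out[i] = (out[i] & 0xFE) | (bit & 1)  # clear the LSB, then set it to the payload bit
    return bytes(out)


def extract_lsb(samples: bytes, count: int) -> List[int]:
    """Read back the first `count` least significant bits."""
    return [b & 1 for b in samples[:count]]


cover = bytes([200, 13, 77, 54, 255, 0])   # six color components (two RGB pixels)
bits = [1, 0, 1, 1, 0, 1]
stego = embed_lsb(cover, bits)
largest_change = max(abs(a - b) for a, b in zip(cover, stego))
print(list(stego), extract_lsb(stego, len(bits)), largest_change)
```

Each component changes by at most one intensity level, which is why the method is hard to see while still storing one bit per component.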

Unfortunately, the transformation-sensitive nature of LSB-embedded data makes LSB encoding inapplicable to some image formats, including popular standards like JPEG and GIF. As a result, most researchers and developers focus their work on improving alternative methods for image steganography, thereby leaving behind formats that are suited to LSB encoding, like the standard Windows Bitmap (.bmp).

With the advent of these emerging steganographic technologies, steganographic software applications are becoming considerably popular in the information security industry. Among the most popular and highly acclaimed are Steganos File Manager, Invisible Secrets 2002, EyeMage IIE and The Third Eye. All four of these programs use safe hiding algorithms for standard Windows bitmaps. However, most steganographic software available on the market is not always suitable or sufficient for real-world applications of steganography because of certain problems in its design.

This study works on the problems of current steganographic

programs and presents a fully-operational software designed for real-world

application of steganography in covert communications and covert data

transfer.

1.2 Statement of the Problem

Generally, current steganographic software are adequate tools for

concealing information. However, most of the steganographic techniques

incorporated by these tools have certain weaknesses that may render the

software inapplicable in certain real-world situations that require the use

of steganography.


Basically, these weaknesses include the following:

· Inefficient use of hiding space for bitmap images.

Current steganographic software uses standard LSB encoding for bitmap images, which means that the number of bits used for data hiding is the same (fixed) for every pixel. To minimize the possibility of creating visual distortions in the resulting image, only the first LSB of each color component of each pixel is used as hiding space. Considering the relatively large file size of uncompressed bitmaps, keeping the hiding space at a fixed minimum is inefficient.

· Tendency to produce distortions in the image.

To maximize hiding capacity, an alternative method is to use more bits of each pixel instead of just the least significant bits. As embedding is performed on more-significant bits, distortions or noticeable marks tend to appear in the image. Moreover, with original high-resolution digital photographs, distortions can appear even when embedding is performed only on the least significant bits.

· Insecure hiding schemes.

Some steganographic software uses insecure techniques such as comment insertion for the JPEG (Joint Photographic Experts Group) and PNG (Portable Network Graphics) image formats. As the name suggests, the algorithm works by merely inserting data as comments in the header of the image file. Although this method allows a virtually unlimited amount of data to be hidden, it is obviously insecure. Since the data is not encoded within the pixels of the image but stored as header information, the hidden data is readily visible and extractable in a file editor. Such a hiding scheme hardly qualifies as real steganography.
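The trade-off between the first two weaknesses can be made concrete with simple arithmetic. The sketch below is illustrative only: the 640 x 480 image size is an arbitrary assumption, and the helper names are hypothetical. It computes the hiding space of fixed-size LSB encoding in a true-color bitmap and the largest change that embedding can make to a single color component.

```python
def fixed_lsb_capacity_bytes(width: int, height: int,
                             bits_per_component: int = 1,
                             components: int = 3) -> int:
    """Hiding space, in bytes, when every color component gives up the same number of bits."""
    return (width * height * components * bits_per_component) // 8


def max_component_change(bits_per_component: int) -> int:
    """Largest possible change to one 8-bit component when its k lowest bits are replaced."""
    return (1 << bits_per_component) - 1


width, height = 640, 480                      # assumed example size
pixel_bytes = width * height * 3              # 24 bits per pixel, uncompressed
for k in (1, 2, 4):
    capacity = fixed_lsb_capacity_bytes(width, height, k)
    share = 100.0 * capacity / pixel_bytes
    print(k, capacity, f"{share:.1f}%", max_component_change(k))
```

With one bit per component, only one eighth (12.5%) of the pixel data is available as hiding space. Using four bits raises that to half of the pixel data, but a component may then change by up to 15 intensity levels, which is where visible distortion starts to become a risk.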

For a further discussion on the nature of these problems, the

concepts behind LSB encoding along with other topics on information

hiding are discussed in Chapter 4.

1.3 Objectives of the Study

1.3.1 General Objective

This study aims to develop a secure steganography

software that overcomes the weaknesses of current steganographic

software and that is suitable for real-world application in covert

communications before the end of the school year 2002-2003.

1.3.2 Specific Objectives

To accomplish this objective, this study aims to:

· Incorporate techniques that would eliminate the possibility of creating human-perceptible distortions in the image.

· Incorporate techniques that would maximize the use of hiding space in the image.

· Incorporate techniques that encode the hidden data within the pixels of the image, so that the data is integrated with the image itself and not merely appended as header information in the image file.

· Establish security that is reliable even when an enemy has knowledge of how the steganographic algorithm works, i.e., security that does not depend on the secrecy of the algorithm (Kerckhoffs's principle).

1.4 Significance of the Study

As a relative of cryptography in the spycraft family, steganography

is undeniably significant in the espionage industry. However, its use is

certainly not limited to illegal or malicious purposes. Steganography can

also be used to defeat espionage by concealing and securing the

transmission of sensitive information. Around the world, there are

growing concerns regarding diplomatic, industrial and even domestic

espionage. Nowadays, such attacks may come from a wide variety of

organizations that range from government intelligence agencies to

international terrorist networks.

On July 5, 2000, for instance, the European Parliament

decided to investigate claims that a global interception system, presumably

codenamed ECHELON, is “being used for purposes of industrial

espionage” under the control of the US National Security Agency (NSA)

and associate countries [SCHM2001]. Although no substantial evidence

proved that it has actually been used to gather competitive intelligence in

favor of American business firms, it was concluded that the satellite-based

interception system undoubtedly exists and is operational.

During the same year, the US Federal Bureau of Investigation

(FBI) also stirred up human rights issues with its Carnivore software


[MCCU2000]. Like its reported predecessors, Carnivore is capable of

intercepting Web and email traffic as part of investigations on suspected

felons. The online article “How Carnivore Works” explains that the

software uses a technology called packet sniffing, which is a common

technology used by network administrators to monitor network traffic

[TYSO2001]. The unfortunate availability of this technology leads to

serious security concerns regarding Internet traffic.

Government intelligence agencies also have rising concerns

regarding the communications capabilities of extremist factions. Even

before the heightened security concerns brought about by the September

11, 2001 terrorist attack on the World Trade Center, Muslim extremist

groups that are linked to the attacks were already reported to be using

“Internet bulletin boards carrying pornographic and sports information” as

hosts for innocuous-looking pictures encoded with terrorist guerilla plots

[PLEM2001]. Clearly, such exploitations of Internet media pose a threat to

society and are unacceptable.

This paper provides useful information regarding both established

and experimental steganographic techniques. These techniques provide

formidable security for data transmission over public channels such as the

Internet. Companies and various institutions can also integrate

steganographic technology as a primary layer to their existing security

protocols. For example, those which use cryptography as their only

security layer may incorporate steganography as an additional first line of

defense next to encryption. This way, transmitted messages need to be

detected, extracted and deciphered before their contents can be exposed

to unauthorized personnel.

In addition, understanding the concepts behind digital

steganography can help Web-based companies to develop precautionary

measures to protect their websites or message boards from being used as

hosts for covert criminal transactions such as terrorist plots. Similar

protocols may also be employed by business firms to prevent disloyal

employees from using steganography in secretly passing sensitive

information to rival companies through the company’s own website.

1.5 Scope and Limitation

This study centers its work on the application of digital image

steganography in covert communications. The techniques presented in this

study are not suitable for other areas of information hiding since different

applications focus on different features, as explained in Chapter 4.

Focusing on communications, the primary concerns of the proposed

steganographic software are imperceptibility and hiding capacity.

This study also limits its work to true-color bitmaps. True-color bitmaps represent each pixel as a combination of three 8-bit values corresponding to the red, green, and blue color components of the pixel. Despite its relatively large file size compared to grayscale images, the true-color format offers a wider range of distinct colors. More colors in an image’s palette means less difference in color tone between each pair of close colors. This allows higher levels of modification to be made in each pixel before the human eye can notice the change in color.

It is important to note that the presented algorithms are

inapplicable to image formats that use lossy compression. Lossy

compression refers to the type of compression wherein “some data is

deliberately discarded to achieve massive reductions in the size of the

compressed file” [PFAF1995]. An example is the compression used by the

popular JPEG image format, which is based on the Discrete Cosine Transform (DCT). This

compression smooths the random textures of an image, the same areas

of texture in which embedding is most appropriate. Such transformations

can therefore destroy the data embedded within an image.
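The effect can be sketched with a deliberately crude stand-in for lossy compression. The code below does not implement JPEG or the DCT; it only assumes that a lossy step rounds each color component down to a multiple of four, which is enough to show why data stored in the least significant bits does not survive. All function names are hypothetical.

```python
from typing import List


def embed_lsb(samples: List[int], bits: List[int]) -> List[int]:
    """Replace the lowest bit of the first len(bits) samples with the payload bits."""
    return [(s & 0xFE) | b for s, b in zip(samples, bits)] + samples[len(bits):]


def extract_lsb(samples: List[int], count: int) -> List[int]:
    """Read the lowest bit of the first `count` samples."""
    return [s & 1 for s in samples[:count]]


def coarse_quantize(samples: List[int], step: int = 4) -> List[int]:
    """Stand-in for a lossy step (NOT real JPEG): round each value down to a multiple of `step`."""
    return [(s // step) * step for s in samples]


cover = [200, 13, 77, 54, 255, 0, 129, 64]
bits = [1, 0, 1, 1, 0, 1, 0, 0]
stego = embed_lsb(cover, bits)

after_exact_copy = extract_lsb(list(stego), len(bits))             # lossless: payload intact
after_lossy_step = extract_lsb(coarse_quantize(stego), len(bits))  # lossy: payload erased
print(after_exact_copy, after_lossy_step)
```

An exact copy returns the payload unchanged, while the quantized copy returns only zeros, since every multiple of four has a zero in its lowest bit.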

Furthermore, although such transformations remove details that are

typically imperceptible to the human eye, repeated processing will

eventually result in a highly degraded image quality. On the other hand,

bitmaps that utilize lossless compression store color values in exact detail.

This means that unlike in JPEG images, bitmaps remain exactly the same

regardless of how many times they are saved or copied.


2.0 METHODOLOGY

As an outline for the software development section of this study, the

author followed the generic software engineering paradigm presented by

Pressman [PRES1992]. However, Pressman’s generic paradigm is an outline for

company-based software systems. In contrast, the application software developed

in this study is not designed for any specific organization or individual. Its main

purpose is to demonstrate the proposed stegosystem and to test its effectiveness.

With this in mind, the generic paradigm was modified to suit the needs of the

project. The five phases of this modified paradigm are presented in Figure 2-1.

The analysis phase is concerned with defining the scope and functions of the software being developed. In this phase, the specific attributes that need to be incorporated to the stegosystem are assessed. Then follows the design phase, where concepts and techniques are put together to form an abstract model of the proposed stegosystem. In the coding phase, the model conceptualized in the design phase is translated to computer code by construction of a software prototype. After construction, next is the testing phase where the effectiveness of the design is evaluated. Finally, in the enhancement phase, additional optimizations or corrections are made to the algorithm as necessary. Supplementary software features are also added in this phase, yielding a fully engineered version of the application software.

Figure 2-1. Software Engineering Paradigm (Analysis, Design, Coding, Testing, Enhancement)

It is important to note that since the software is not a company-based

software system, adaptive maintenance is inapplicable. The software is designed

for a specific use, not for a specific user. It is generic and does not need to adapt

to a specific organization or individual.

Furthermore, requirement analysis is very limited. No company-related

data gathering operations, like interviews and surveys, are needed in this study.

The requirements defined for the proposed stegosystem are based entirely on the

concepts of information hiding and of other fields in computer science.


3.0 REVIEW OF RELATED LITERATURE AND STUDIES

Over the past few years, a growing interest in the field of information

hiding started to arise. In May of 1996, the “First International Workshop on

Information Hiding” was held in Cambridge, UK. From then on, several areas of

the new field, like steganography and digital watermarking, started to get

attention from the research community. INSPEC reported that by 1998, the

number of publications on digital watermarking alone increased to 103 from

merely 2 back in 1992 [PETI1999].

Following the trend, different research institutions published reports and

journal articles that advertised the developing field. The Institute of Electrical and

Electronics Engineers (IEEE) published many of these articles, especially those in

the area of steganography. Neil Johnson and Sushil Jajodia’s article in the IEEE’s

magazine Computer provided a good primer for understanding steganography

[JOHN1998]. The article discussed basic techniques for image steganography and

evaluated a few steganographic programs. Ross Anderson and Fabien Petitcolas

discussed the limitations of steganography in a special issue of the IEEE Journal

on Selected Areas in Communications [ANDE1998]. Petitcolas and Anderson,

along with Markus Kuhn, also wrote a summary of different areas of information

hiding and discussed several practical applications in the July 1999 issue of

Proceedings of the IEEE [PETI1999].

In 1996, Walter Bender et al. wrote a comprehensive article entitled

“Techniques for Data Hiding” for the IBM Systems Journal [BEND1996]. This

renowned publication discussed watermarking techniques for digital images such


as Texture Block Coding and the robust Patchwork algorithm. For digital audio,

the article also discussed low-bit coding, phase coding, echo data hiding, and

Direct Sequence Spread Spectrum (DSSS) encoding.

Bender et al. also explained semantic and syntactic methods for data

hiding in text. Syntactic methods manipulate punctuations such as commas in

order to encode a series of bits. Semantic methods, on the other hand, encode data

by replacing certain words with synonyms, where each synonym corresponds to a

certain binary value. Perhaps the simplest text-based steganographic techniques

presented in the article are the open space methods, otherwise known as white

space steganography. Bender presented such methods that encode data bits by

modulating the number of spaces between each pair of words. A simple approach

to such methods is to encode 0’s as single spaces and 1’s as double spaces.
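The simple open space approach described above can be sketched as follows. This is an illustration of the idea only, not code from the cited article, and the function names are hypothetical. A single space between two words encodes a 0 and a double space encodes a 1.

```python
import re
from typing import List


def encode_open_space(words: List[str], bits: List[int]) -> str:
    """Join words so that each gap carries one bit: ' ' for 0, '  ' for 1."""
    if len(bits) > len(words) - 1:
        raise ValueError("not enough word gaps for the payload")
    parts = [words[0]]
    for i, word in enumerate(words[1:]):
        gap = "  " if i < len(bits) and bits[i] == 1 else " "
        parts.append(gap + word)
    return "".join(parts)


def decode_open_space(text: str, count: int) -> List[int]:
    """Recover the first `count` bits from the widths of the gaps between words."""
    gaps = re.findall(r" +", text)
    return [1 if len(gap) == 2 else 0 for gap in gaps[:count]]


cover_words = "the quick brown fox jumps".split()
payload = [1, 0, 1, 1]
stego_text = encode_open_space(cover_words, payload)
print(repr(stego_text), decode_open_space(stego_text, len(payload)))
```

The capacity is one bit per gap between words, so a long cover text is needed even for a short message.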

A more efficient white space method is Matthew Kwan’s SNOW

algorithm, which stands for “Steganographic Nature of Whitespace”

[KWAN2001]. Unlike Bender’s example, SNOW encodes a set of three bits as a

set of spaces whose length is equal to the decimal value of the set of bits. Each set

of spaces is terminated with a tab character, which is also a white space. This

ensures that for each column of white spaces, exactly three bits of data are stored.

For example, the binary value 111 (decimal number 7) is encoded as a

combination of seven spaces and a tab. The binary value 011 (decimal number 3)

on the other hand, is encoded as a combination of three spaces and a tab.

Furthermore, these sets of white spaces are placed only at the end of every line of

text so as not to affect the appearance of the text.
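The encoding described in this paragraph can be sketched as shown below. The code follows only the description given here (a run of zero to seven spaces closed by a tab, appended to the end of a line) and is not taken from the SNOW program itself, whose actual implementation may differ. The function names are hypothetical.

```python
from typing import List


def encode_groups(bits: List[int]) -> List[str]:
    """Turn each 3-bit group into a run of 0-7 spaces terminated by a tab."""
    if len(bits) % 3 != 0:
        raise ValueError("payload length must be a multiple of three bits")
    runs = []
    for i in range(0, len(bits), 3):
        value = bits[i] * 4 + bits[i + 1] * 2 + bits[i + 2]
        runs.append(" " * value + "\t")
    return runs


def decode_run(run: str) -> List[int]:
    """Recover three bits from the number of spaces in one run."""
    value = run.count(" ")
    return [(value >> 2) & 1, (value >> 1) & 1, value & 1]


lines = ["Meet me", "at noon"]
payload = [1, 1, 1, 0, 1, 1]                      # 111 -> 7 spaces, 011 -> 3 spaces
runs = encode_groups(payload)
stego_lines = [line + run for line, run in zip(lines, runs)]

recovered: List[int] = []
for line in stego_lines:
    trailing = line[len(line.rstrip(" \t")):]     # the whitespace appended after the visible text
    recovered.extend(decode_run(trailing))
print(recovered)
```

Because the added whitespace sits after the last visible character of each line, the text looks unchanged in most viewers.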


Bender’s article was later followed by the equally comprehensive

“Applications for Data Hiding” in 2000 [BEND2000]. This next article, also

written by Bender, discussed several applications of information hiding such as

anti-counterfeiting, tamper detection, and copyright marking.

Well noted for his works on digital watermarking, Bender also supervised

the Master’s degree thesis of Fernando Paiz entitled “Tartan Threads: A Method

for the Real-time Recognition of Secure Documents in Ink Jet Printers”

[PAIZ1999]. This impressive thesis for the Massachusetts Institute of Technology

(MIT) presented a robust watermarking technology that can be used to prevent

inkjet printers from printing copyrighted documents and even monetary bills.

Such technologies are truly valuable in fighting forgery and counterfeiting.

Recognizing its significance in covert communications, the US Military

also conducted studies on steganography. One of these was the project led by

Lisa Marvel on Spread Spectrum Image Steganography (SSIS), which was

sponsored by the US Army Research Laboratory [MARV1999]. SSIS adopted

concepts from spread spectrum communications to maximize embedding capacity

and imperceptibility. The Air Force Research Laboratory, on the other hand,

sponsored the work of Jiri Fridrich et al. on steganalysis techniques for detecting

LSB encoding in color images [FRID2000]. The Office of Naval Research also

supported steganographic studies. Under the Naval Research Laboratory,

Moskowitz et al. wrote the article “A New Paradigm Hidden in Steganography”.

The article explained how the concepts of information theory are inapplicable in

steganography, unlike in the case of cryptography [MOSK2000].


With the emergence of several studies on various steganographic

techniques, some researchers moved on to the study of steganalysis, i.e., the

process of detecting and defeating steganography. In 2002, Jessica Fridrich and

Miroslav Goljan wrote a paper entitled “Practical Steganalysis of Digital Images”

[FRID2002]. The paper is a compilation of current approaches to steganalysis and

included discussions on both visual attacks and statistical attacks. Visual attacks

rely on a human observer to “look for suspicious artifacts using simple visual

inspection”. The paper pointed out that despite the simplicity of visual attacks, “it

may be impossible to distinguish noisy images or highly textured images from

stego images using this technique”.

Statistical tests, on the other hand, look for discrepancies in certain expected

properties of an image. Niels Provos and Peter Honeyman explain that “some

tests are independent of the data format and just measure the entropy of the

redundant data” [PROV2001a]. This means that images with hidden data are

expected to have “higher entropy” than those without. Images without hidden

data, for example, tend to have an LSB plane that is correlated to 1 or 0. Jiri

Fridrich’s paper on steganalysis for the Air Force Research Laboratory discusses

a technique that is based on evaluating the relative frequencies of close color pairs

[FRID2000].

With respect to statistical attacks, Niels Provos wrote a paper entitled

“Defending Against Statistical Steganalysis” which discusses certain

countermeasures that could help prevent the detection of hidden information.

These approaches “included preserving correlations to one and entropy measured


by the Maurer test” [PROV2001]. These concepts were incorporated in Provos’

image steganography software called OutGuess which uses the JPEG format.

Another program designed to defeat statistical steganalysis is Andreas

Westfeld’s software named F5 [WEST2001]. F5 uses a technique called matrix

encoding in order to maximize hiding capacity in JPEG images. As a result, F5

uses an average of 13% of the image file size, which means it is as space-efficient

as the standard fixed-size LSB encoding in lossless true-color bitmaps.

In 2002, however, Jessica Fridrich et al. wrote a paper entitled “Attacking

the OutGuess” which explains how stego images created with OutGuess, F5, and

other JPEG-based steganography software can be reliably detected and identified

[FRID2002a]. The results presented in the paper were verified by Provos himself

by developing a program called Stegdetect based on the presented steganalysis

techniques. The reliability of the techniques presented was acknowledged by

Provos in the OutGuess website, where Stegdetect is available for download

[PROV2002].

Other relevant publications on steganography are cited in Petitcolas and

Anderson’s “Information Hiding: An Annotated Bibliography” [ANDE1999].

This bibliography written for the Computer Laboratory of the University of

Cambridge is an excellent resource guide on practically every area of information

hiding.


4.0 THEORETICAL FRAMEWORK

4.1 Parameters of Information Hiding

Depending on the application, different methods for information

hiding are inherently focused on different features or parameters. The

basic parameters of information hiding systems are imperceptibility,

hiding capacity, and robustness.

Imperceptibility, or perceptual transparency, is obviously an

inherent goal of every information hiding system. In order to conceal the

existence of hidden information, it is important that the embedding

process does not produce perceptible distortions to the cover-medium.

Bender et al. related this concept to the magician’s trick of misdirection,

which allows “something to be hidden while it remains in plain sight”

[BEND1996].

An extension to imperceptibility is undetectability. Instead of

exploiting the weaknesses of human perceptive capabilities, as in the case

of imperceptibility, undetectability focuses on defending steganography

against computer-based steganalysis by preserving certain statistical

characteristics of the cover-medium. Undetectability is therefore

concerned with the stego-medium’s consistency with the statistical

characteristics of the original cover-medium [FRID1998].

Hiding capacity, which is also referred to as embedding capacity,

bit-rate, and payload, was defined by Eugene Lin and Edward Delp as


“the size of information that can be hidden relative to the size of the

cover” [LIN1999]. They further explained that a higher capacity rate

reduces the need for using larger cover-files. This in turn enhances

portability and transmission speed, which are both high priorities in covert

communications. Digital watermarking systems, on the other hand, embed

only small volumes of data like copyright information and thus capacity is

of minimal importance in such applications [BEND1996].

Robustness is the resistance against attempts to destroy the

embedded data by means of modifying the stego-medium. Bender et al.

stated that sources of such modifications range “from intentional and

intelligent attempts of removal to anticipated manipulations”

[BEND1996]. In image processing, these “anticipated manipulations”

include the inevitable consequences of certain image transformations. A

good example of which is the downgrading of image quality in the JPEG

format, as explained in Chapter 1.

Beyond accidental or intentional destruction of embedded data, an

even greater threat is tampering. An example of such malicious attack is a

pirate’s attempt to alter embedded copyright information. As with music

files, copyrighted images are common targets of piracy. Robustness and

tamper resistance are therefore primary concerns of digital watermarking

systems [LIN1999].

Marvel et al. stated that capacity and robustness are impossible to

maximize at the same time while adhering to high imperceptibility rates


[MARV1999]. In an article for the IBM Systems Journal, Bender et al.

introduced this trade-off between capacity and robustness as the

data-hiding problem space [BEND1996]. The article explains that to

achieve robustness, redundant encoding of the embedded data on the

cover-medium must be performed, which in turn sacrifices capacity.

Figure 4-1 is based on Fridrich’s diagram for the data-hiding problem

space, which depicts the mutually competitive nature of these parameters

[FRID1998].

As the triangular shape of the diagram shows, the three opposing

parameters cannot be maximized all at the same time. In the case of covert

communications, for example, working on the midpoint between

imperceptibility and hiding capacity provides optimum balance between

the two parameters but at the expense of completely compromising

robustness. On the other hand, digital watermarking systems sacrifice

capacity in favor of robustness, as is required by the application.

Figure 4-1. Data-Hiding Problem Space (triangle with corners Imperceptibility, Hiding Capacity, and Robustness; Steganography and Digital Watermarking marked within it)


4.2 Least Significant Bit Encoding

The concept behind LSB encoding is to replace the least

significant bit (LSB) of each pixel in an image with the bits of the data to

be embedded [LIN1999]. In a grayscale image, for example, each pixel is

represented by an 8-bit value that corresponds to the pixel’s intensity. The

lower the value, the closer the pixel is to the color black. Since the least

significant bit has a place value of 1, modifying it would result in a

maximum difference of only 1. Because the human eye is not capable of

distinguishing minute changes in color, such modifications would

normally be imperceptible.

In true-color bitmaps, the three colors of light (red, green, and

blue) are combined in varying intensities to define the color of each pixel.

Each color component is represented by an 8-bit value that corresponds to

the component’s intensity, ranging from 0 to 255. Since there are three

LSB’s in each pixel, the hiding capacity in terms of bits is three times the

total number of pixels in the cover image. In a 300x300 bitmap, for

example, the hiding capacity is 270,000 bits (33,750 bytes).
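The capacity arithmetic and the bit replacement described above can be illustrated with a minimal Python sketch. The function names and the flat list of color components are assumptions made for this illustration only; they are not part of the proposed stegosystem.

```python
def lsb_capacity_bits(width, height, channels=3):
    """Number of bits that fit when one LSB per color component is used."""
    return width * height * channels


def embed_lsb(components, bits):
    """Replace the LSB of each leading color component with one message bit.

    components: flat list of 8-bit values (R, G, B, R, G, B, ...).
    bits: sequence of 0/1 values, at most len(components) long.
    """
    bits = list(bits)
    if len(bits) > len(components):
        raise ValueError("message does not fit in the cover")
    stego = list(components)
    for i, bit in enumerate(bits):
        # Clear the LSB, then set it to the message bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return stego


def extract_lsb(components, count):
    """Read back the first `count` embedded bits."""
    return [value & 1 for value in components[:count]]
```

For a 300x300 bitmap, lsb_capacity_bits(300, 300) gives the 270,000 bits mentioned above. Each modified component differs from its original value by at most 1.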

In cases that require large volumes of data to be hidden,

embedding can be performed even up to the second or third LSB of each

color component of a pixel. However, using the more significant portions

of a color component will certainly cause drastic changes in the image.

These changes may exist in the form of undesirable marks and artifacts or as

distortions in the contour of smooth areas in an image.


LSB encoding, therefore, deals with two basic problems:

· If capacity is maximized, modifications become evident and secrecy is compromised.

· If imperceptibility is maximized, lower volumes of data can be hidden.

Furthermore, embedding in images with areas of very smooth

textures or solid patterns, such as computer-drawn images and graphs, is

likely to create visible distortions even if only the first LSB’s are

modified.

4.3 Transform Embedding

Because of the loss of data caused by the transformations

employed by many image file formats, standard LSB encoding is

inapplicable to many images. In such cases, instead of manipulating the

pixels or the spatial domain of an image, the coefficients of the transform

domain of a processed image are used [LIN1999]. The JPEG format, for

example, uses a lossy compression scheme based on the Discrete Cosine

Transform (DCT). The LSBs of the coefficients of this transform can then

be used as hiding space for data bits.

Although transform embedding techniques are typically more

robust as compared with standard LSB encoding, they also typically offer

less hiding capacity [LIN1999]. Furthermore, since such techniques are

format-dependent and embed data after the transformations have been


made, the embedding process may alter certain statistical properties of the

image that are common or unique to that particular image format. This

makes transform embedding techniques susceptible to detection or

steganalysis. Considering this, such techniques are ideal only for

applications like copyright marking and authentication.

4.4 Masking and Filtering

Another alternative for data hiding in digital images is the use of

masking and filtering techniques. Such techniques hide data by “marking

an image in a manner similar to paper watermarks”, thus they are designed

for application in digital watermarking [JOHN1998].

Digital watermarking techniques are more robust than traditional

steganographic techniques since watermarks are more integrated into the

image. The Patchwork watermarking algorithm, for example, embeds data

by manipulating the brightness or luminance of certain pairs of points in

an image [BEND1996].

4.5 Bit-plane Steganalysis

A bit-plane is the plane formed by the bits of the same bit position

in each pixel. Figure 4-2 shows a grayscale image of the STI College logo

along with its eight bit-planes. Black areas in the bit-planes represent a bit

value of 0 while white areas represent 1. Figure 4-2b is the most

significant bit-plane of the original image depicted as Figure 4-2a. Figure


4-2i, on the other hand, is the least significant bit-plane of the original

image.

Yeuan-Kwen Lee and Ling-Hwei Chen made two important

observations regarding bit-planes that are relevant to information hiding

[LEE1999]. The first observation is that areas appearing as random texture

in a more significant bit-plane will also appear as random texture in less

significant bit-planes. The second observation is that randomness in the

texture of a specific area increases gradually from the most significant

bit-plane to the least significant bit-plane. These observations are evident

in Figure 4-2.

Random textures in a bit-plane are produced by transitions of

0-to-1 or 1-to-0 in the values of adjacent bits in a bit-plane. Considering

the observations on bit-planes, Lee and Chen concluded that transition

density is higher in less significant bit-planes compared to that of more

significant bit-planes [LEE1999].

Figure 4-2. Grayscale image (a) and its eight bit-planes, from the most significant (b) to the least significant (i)

Embedding in the more significant bits of an image is likely to

increase the transition densities of these bit-planes. The supposedly

gradual increase in transition density from more significant bit-planes to

less significant bit-planes would consequently be changed. Considering

this phenomenon, an attacker can therefore detect the presence of

embedded data by analyzing the transition densities of the bit-planes of an

image.
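To make the notion of transition density concrete, the following Python sketch extracts a bit-plane from a grayscale image and measures how often adjacent bits differ. The exact measure used here (the fraction of horizontally and vertically adjacent bit pairs that differ) is an assumption made for illustration; it is not a formula taken from Lee and Chen’s paper.

```python
def bit_plane(pixels, position):
    """Extract one bit-plane; position 0 is the LSB, 7 the MSB."""
    return [[(value >> position) & 1 for value in row] for row in pixels]


def transition_density(plane):
    """Fraction of horizontally and vertically adjacent bit pairs that differ."""
    transitions = 0
    pairs = 0
    height = len(plane)
    width = len(plane[0])
    for y in range(height):
        for x in range(width):
            if x + 1 < width:  # right-hand neighbor
                pairs += 1
                transitions += plane[y][x] != plane[y][x + 1]
            if y + 1 < height:  # neighbor below
                pairs += 1
                transitions += plane[y][x] != plane[y + 1][x]
    return transitions / pairs if pairs else 0.0
```

For a smooth horizontal ramp of gray values from 0 to 255, this measure is much higher in the least significant bit-plane than in the most significant one, which agrees with the observations described above.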

However, it is important to note that like most techniques for

steganalysis, bit-plane steganalysis is essentially based on statistical

observations and assumptions of distinct properties of the cover-medium.

The results of such methods for detection are not always reliable as they

may sometimes miss detecting hidden data or falsely identify a plain

image as a host for steganography. Moreover, Lee and Chen’s paper does

not provide details as to how bit-plane steganalysis may be automated

with the use of algorithms or software. Bit-plane steganalysis therefore

remains a visual attack and thus requires a human observer to evaluate the

bit-planes.

4.6 Statistical Attacks

Unlike visual attacks, statistical attacks are steganalytic methods

that can be automated through software. Statistical attacks operate by


detecting discrepancies in certain expected properties of an image or

image format. Lossy-compressed image formats, like JPEG for example,

tend to have distinct statistical properties because of the transformations

involved. Jessica Fridrich and Miroslav Goljan’s paper on practical

steganalysis explains that after a JPEG image is embedded with data, “the

cover image will become incompatible with the JPEG format in the sense

that it may be possible to prove that a particular 8x8 block of pixels could

not have been produced by JPEG decompression” [FRID2002]. The JPEG

compatibility test can therefore “potentially detect messages as short as

one bit”. Such distinct properties therefore make JPEG images very

vulnerable to attacks.

On the other hand, lossless-compressed bitmaps do not undergo

transformations and are therefore unlikely to have distinct properties, since

any combination of pixels in any area of the image is possible. However,

Jiri Fridrich et al. wrote a paper entitled “Steganalysis of LSB Encoding in

Color Images” that describes a steganalysis technique that is expected to

detect statistical discrepancies even in lossless-compressed true-color

bitmaps [FRID2000]. This technique called the RQP method (Raw Quick

Pair method) “is based on analyzing close pairs of colors created by LSB

embedding” [FRID2002].

Fridrich et al. state that data embedding tends to increase the

number of close colors in an image’s palette [FRID2000]. If an image

already has been embedded with hidden data, the ratio between the


number of all pairs of close colors and the number of all color pairs will

no longer increase if LSB encoding of random data is again performed on

the image.

A pair of colors is considered to be “close” if each of the red, green, and

blue values of one color differs from the corresponding value of the other

color by at most 1. This condition can be

expressed as:

$(|R_1 - R_2| \le 1) \text{ and } (|G_1 - G_2| \le 1) \text{ and } (|B_1 - B_2| \le 1)$

For integer color values, this expression is also equivalent to:

$(R_1 - R_2)^2 + (G_1 - G_2)^2 + (B_1 - B_2)^2 \le 3$

The RQP method is said to work “reasonably well as long as the

number of unique colors in the cover image is less than 30% of the

number of pixels” [FRID2002].
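The definition of close colors and the ratio used by the RQP method can be sketched in Python as follows. This is only an illustration of the ratio itself, not a full implementation of the detection procedure in [FRID2000]; counting pairs among the unique colors of the image is an assumption of this sketch, and the function names are hypothetical.

```python
from itertools import combinations


def is_close(c1, c2):
    """Two RGB triples are close when every component differs by at most 1."""
    return all(abs(a - b) <= 1 for a, b in zip(c1, c2))


def close_pair_ratio(pixels):
    """Ratio of close color pairs to all pairs among the unique colors.

    pixels: iterable of (R, G, B) tuples.
    """
    colors = sorted(set(pixels))
    total = len(colors) * (len(colors) - 1) // 2
    if total == 0:
        return 0.0
    close = sum(1 for c1, c2 in combinations(colors, 2) if is_close(c1, c2))
    return close / total
```

The pairwise comparison grows quadratically with the number of unique colors, so this form is practical only for small images or small samples of colors.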

4.7 Capacity Evaluation

Instead of embedding on a fixed number of bits in every pixel, the

size of the hiding space in each pixel must be dependent on the color

variation of the adjacent pixels. This prevents the embedding process from

making drastic modifications in the smooth areas of an image, therefore

minimizing human-perceptible distortions. Such adaptive selection of

hiding space also protects the stego image from bit-plane steganalysis and


other visual attacks since the significant bits of each pixel remain

unchanged.

Lee and Chen presented a simple method for evaluating the hiding

capacity of a pixel in a grayscale image [LEE1999]. For a particular pixel

P, the first step is to get the gray level variation in its top-left adjacent

pixels A, B, C, and D. The schematic layout of these pixels is depicted in

Figure 4-3. The gray level variation is defined as the difference between

the maximum and the minimum gray values among the four pixels. The

formula for the gray level variation V can be written as:

$V = \max\{A, B, C, D\} - \min\{A, B, C, D\}$

Figure 4-3. Neighbors of a Pixel: B (x-1,y-1), A (x-1,y), H (x-1,y+1), C (x,y-1), P (x,y), G (x,y+1), D (x+1,y-1), E (x+1,y), F (x+1,y+1)

According to Lee and Chen, the number of bits that can be

modified in pixel P is the minimum number of bits needed to store the

binary value of V, minus 1. Mathematically, this can be expressed simply

as:

$K = \lfloor \log_2 V \rfloor$

It is important to note that with this technique, embedding can only

be performed in a top-to-bottom left-to-right orientation. Only the four

top-left adjacent pixels are considered in the evaluation because the

embedding function has already passed through these pixels. The other

four pixels on the bottom-right are still to undergo the embedding process,

which means their values may still change.

Lee and Chen also noted a distinct characteristic of the human

visual system (HVS) that is crucial to capacity evaluation. They stated that

for the human eye, “the greater the gray-scale is, the more change of the

gray-scale could be tolerated” [LEE2000]. Simply put, this means that the

closer a pixel is to the color white, the more tolerant it is to modifications.

Considering this, an upper boundary for the capacity of a pixel can be set

based on its intensity. Lee and Chen give the following condition:

if P > 191 then U = 5, else U = 4

A threshold of 5 is set for U because the more significant bits of

the pixels must not be allowed to change so that the upper boundary may

still be calculated accurately in the decoding process. The succeeding

topic will explain why the constants 191, 5, and 4 were chosen for the

condition for U.
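The capacity evaluation described so far can be summarized in a short Python sketch. Two points are assumptions of this illustration rather than statements from Lee and Chen: a variation V below 1 is treated as zero capacity (the logarithm is undefined for V = 0), and the final capacity is taken as the smaller of K and U. The function names are hypothetical.

```python
def gray_level_variation(a, b, c, d):
    """V = max{A, B, C, D} - min{A, B, C, D}."""
    return max(a, b, c, d) - min(a, b, c, d)


def raw_capacity(v):
    """K = floor(log2 V); V < 1 is treated as zero capacity (an assumption)."""
    # For a positive integer v, v.bit_length() - 1 equals floor(log2 v).
    return v.bit_length() - 1 if v >= 1 else 0


def upper_bound(p):
    """U = 5 when P > 191, otherwise U = 4."""
    return 5 if p > 191 else 4


def capacity(p, a, b, c, d):
    """Bits usable in pixel P: K capped by U (taking the minimum is an assumption)."""
    return min(raw_capacity(gray_level_variation(a, b, c, d)), upper_bound(p))
```

For example, neighbors with gray values 10, 40, 25, and 12 give V = 30 and K = 4, while four identical neighbors give a capacity of 0, so smooth areas are left untouched.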

Although not mentioned in Lee and Chen’s paper, such a technique

is, in theory, also applicable for true-color images. One possible

interpretation of the original algorithm is to use the capacity evaluation

function independently for each of the three color channels of a pixel.

Instead of evaluating grayscale values, the intensity of each color

component of a pixel may be considered. This approach will be

tested in this study.

4.8 Minimum-Error Replacement

In order to minimize the changes made to a pixel as a result of

embedding, minimum-error replacement (MER) will also be incorporated

in the proposed stegosystem. The idea behind MER is to adjust the bit that

is immediately succeeding the modified LSB’s of a particular color value

in such a way that the change caused by the embedding operation is

minimal.

For example, if a binary number 1000 (decimal number 8) is

changed to 1111 (decimal number 15) because its three LSB’s were

replaced with embedded data, the difference from the original number is 7.

This difference in the original value of a color component is called the

embedding error. By adjusting the fourth bit from a value of 1 to a value

of 0, the binary number now becomes 0111 (decimal number 7) and the


embedding error is reduced to 1 while at the same time preserving the

value of the three embedded bits.

For every K number of LSB’s used for embedding in a particular

color value, the maximum embedding error for that color value is $2^K - 1$, or

the maximum value for a set of K bits. Since MER adjusts the bit next to

the modified LSB’s, the embedding error is restricted to a maximum value

of $2^{K-1}$ [LEE1999].
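A minimal Python sketch of the adjustment described above is given below. It follows the description in this section literally: after ordinary LSB replacement, only the single bit immediately above the embedded bits is toggled, and only when doing so brings the value closer to the original. The function name and the representation of the embedded bits as an integer are assumptions of this sketch.

```python
def embed_with_mer(value, data, k):
    """Write the k-bit integer `data` into the k LSBs of an 8-bit `value`,
    then toggle bit k (the bit just above them) if that reduces the error.

    Assumes 1 <= k <= 7 so that bit k exists within the byte.
    """
    mask = (1 << k) - 1
    plain = (value & ~mask) | (data & mask)  # ordinary LSB replacement
    flipped = plain ^ (1 << k)               # same k LSBs, next bit toggled
    if abs(flipped - value) < abs(plain - value):
        return flipped
    return plain
```

With the example from the text, embed_with_mer(8, 0b111, 3) returns 7 instead of 15. Because only one bit is toggled, this sketch guarantees a result no worse than plain replacement but cannot always reach the nearest value with the same LSBs; a value of 16 with data 111, for instance, stays at 23.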

Going back to the calculation of the upper boundary in capacity

evaluation, the constants 191, 5, and 4 were chosen with consideration to

how MER works. A threshold of 5 was selected since embedding five bits

into an 8-bit value (a byte) using MER would mean that the sixth LSB

may also change. Since the only remaining bits that are protected from

modification are the two most significant bits (MSB), the value of a byte

when only these two bits are set to 1 is 192, i.e., $2^7 + 2^6$. Whatever the

value of the six LSBs, the value of the byte will always be greater than

191 as long as both of the two MSBs are set to 1.

4.9 Error Diffusion

Since the capacity-evaluation technique presented by Lee and

Chen analyzes only the four top-left adjacent pixels, distortions in the

contour of certain areas or shapes in an image can still be made. For an

image to adapt to the changes in color caused by embedding, an error

diffusion technique must also be employed.


Lee and Chen presented an error diffusion technique called

Improved Gray-Scale Compensation (IGSC), which makes up for the

embedding error in the grayscale value of a pixel by spreading that error

evenly across the adjacent pixels [LEE2000]. This is done by adding ¼ of

the embedding error to the intensity or grayscale value of each of the four

bottom-right adjacent pixels. These pixels are depicted in Figure 4-3 as

pixels E, F, G, and H.
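The diffusion step can be sketched in Python as follows. Several details here are assumptions made for illustration and are not taken from [LEE2000]: the image is stored as rows indexed image[y][x], the embedding error is taken as the original value minus the modified value, each share is rounded to the nearest integer, and results are clamped to the 8-bit range. The neighbor offsets follow the coordinates given in Figure 4-3.

```python
def diffuse_error(image, x, y, original, modified):
    """Spread the embedding error of pixel (x, y) over neighbors E, F, G, H.

    image is a list of rows indexed as image[y][x]. The error is taken as
    original - modified (a sign convention assumed here), so the neighbors
    are darkened when the pixel became brighter, and vice versa.
    """
    share = (original - modified) / 4
    height, width = len(image), len(image[0])
    for dx, dy in ((1, 0), (1, 1), (0, 1), (-1, 1)):  # E, F, G, H
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            updated = round(image[ny][nx] + share)
            image[ny][nx] = max(0, min(255, updated))  # keep a valid 8-bit value
```

When the error is a multiple of 4 and no neighbor is clamped or falls outside the image, the sum of all intensities is unchanged, which reflects the balancing effect attributed to IGSC.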

In simpler terms, when a pixel’s intensity increases, the intensities

of the four adjacent pixels on its bottom-right sides are decreased. Lee and

Chen did not discuss exactly how such an operation helps in preventing

distortions in the image. Nevertheless, it is conceivable that when a pixel’s

intensity increases, the capacity evaluation function is likely to allocate

higher capacities for the succeeding pixels than what it would have

originally allocated if the intensity of the current pixel had not increased,

and vice versa. This is because an increase or decrease in the intensity of

the current pixel affects the capacity evaluation for the succeeding pixels

that have not been processed yet. IGSC therefore compensates for the

change by maintaining balance between the intensities of adjacent pixels.

Moreover, performing such correcting operations allows the image to

preserve its average color intensity as well as possible

correlations in the image’s bit-planes.

Although IGSC was presented as a technique for grayscale images,

Lee and Chen stated that it was based on an error-diffusion technique used


when converting true-color images to 8-bit color formats [LEE2000]. It

may therefore be assumed that the concept behind IGSC is applicable to

color images.

4.10 Pseudorandom Number Generators

According to Menezes et al., the “security of many cryptographic

systems depends upon the generation of unpredictable quantities”

[MENE1996]. In digital cryptography, this is particularly true in the sense

that every computer-based cryptosystem utilizes what is generally known

as a pseudorandom number generator (PRNG). A PRNG is a device or

algorithm that outputs a seemingly-random sequence of numbers based on

a given numerical value called the seed. Unlike real random number

generators (RNG), PRNGs are deterministic, which means a particular

seed will generate the exact same sequence of numbers every time it is

used.

Before the use of computer-based PRNGs in cryptography, data was

encrypted with a keystream as long as the unencrypted data in order to

eliminate any detectable patterns in the encryption process. With PRNGs,

instead of requiring the user to remember an extremely long keystream,

encryption algorithms can use as a keystream the long sequence of

numerical values generated by a PRNG that is seeded by the numeric

equivalent of a given password or encryption key. This way, encryption

keys can be relatively short while still maintaining high security.


The same concept for key security can be used in steganography.

PRNGs can be used in steganography in randomizing the pattern of

embedding. With PRNGs, data bits may be scattered pseudorandomly

across the cover-medium. This makes the embedded data extremely

difficult, if not impossible, to extract without knowledge of the seed used

by the PRNG. In such a setup, the seed of the PRNG can be the hash value

of a user-given stego key.
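The setup described above can be illustrated with a short Python sketch in which the hash value of a stego key seeds a PRNG that determines the order of embedding positions. The choice of SHA-256 and of Python’s random.Random generator are assumptions made for illustration only. random.Random is deterministic, which is what the sketch needs, but it is not designed for cryptographic use, so a real stegosystem would require a stronger generator.

```python
import hashlib
import random


def embedding_order(stego_key, slot_count):
    """Return a key-dependent permutation of the slot indices 0..slot_count-1."""
    digest = hashlib.sha256(stego_key.encode("utf-8")).digest()
    seed = int.from_bytes(digest, "big")  # hash value of the stego key as an integer
    rng = random.Random(seed)             # deterministic: same seed, same sequence
    order = list(range(slot_count))
    rng.shuffle(order)
    return order
```

The sender and the receiver obtain the same order from the same key, so the receiver can visit the embedding positions in the sequence used during embedding.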

4.11 Hash Functions

Like PRNGs, hash functions “play a fundamental role in modern

cryptography” [MENE1996]. Hash functions transform input values of

varying length or range into a fixed-length bit stream. Unlike

encryption functions, these functions are not reversible. Once a value

has been hashed, information about its original value has already been

lost.

This unique feature of hash functions makes them ideal for

password verification operations. For example, if a password-protected

operating system stores a list of valid passwords within the system for

login verification, a potential attacker or hacker may eventually find a way

to access the contents of such a list and break into the system.

A solution to this problem is to store the hash values of the

passwords instead of the original values. During login, entered passwords

are verified by calculating their hash values and comparing them with the


hash values stored in the system. This way, even if a hacker gains access

to the password list, the only way to know the actual passwords is to

calculate the hash value of every possible character combination within the

maximum length for passwords and find the ones that match the hash

values in the list. This would be an extremely time-consuming operation for the

attacker.

The same concept can be used in steganography. For password

verification, a hash value of the original password may be embedded in an

image along with the hidden data bits.
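Password verification by hash comparison can be sketched in Python as shown below. SHA-256 is used only as a stand-in hash function, and the function names are hypothetical; the sketch does not describe how the hash value is laid out inside the stego image.

```python
import hashlib
import hmac


def password_hash(password):
    """Fixed-length digest of a password (SHA-256 is a stand-in choice)."""
    return hashlib.sha256(password.encode("utf-8")).digest()


def verify_password(candidate, stored_hash):
    """Hash the candidate and compare it with the stored hash value."""
    # compare_digest performs the comparison in constant time.
    return hmac.compare_digest(password_hash(candidate), stored_hash)
```

During extraction, the hash value recovered from the image would take the place of stored_hash, and the hidden data would be released only when verify_password returns True.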
