Data Representation

(1)

(2)

Data Representation

How do computers represent data?

▪ Recognize only two discrete states: on or off

▪ Use a binary system to recognize two states

▪ Use a number system with two unique digits, 0 and 1, called bits (short for binary digits)

▪ A bit is the smallest unit of data a computer can process

(3)

(4)

Data Representation

What is a byte?

Eight bits grouped together as a unit

Provides enough different combinations of 0s and 1s to represent 256 individual characters

▪ Numbers

▪ Uppercase and lowercase letters

(5)

Converting Binary to Decimal

Decimal number system is base 10

Uses 10 digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

Example: 23,625

Power of 10 representation   10^4    10^3   10^2   10^1   10^0
Decimal representation       10000   1000   100    10     1
Base 10 digits               2       3      6      2      5

23,625 = 2 x 10000 + 3 x 1000 + 6 x 100 + 2 x 10 + 5 x 1
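The place-value breakdown of 23,625 can be reproduced with a short Python sketch. The function name `place_values` is made up for this illustration; it is not part of the original slides.

```python
def place_values(number, base=10):
    """Return (digit, place value) pairs, most significant digit first."""
    digits = []
    while number > 0:
        number, digit = divmod(number, base)  # peel off the lowest digit
        digits.append(digit)
    digits.reverse()
    if not digits:
        digits = [0]
    top = len(digits) - 1
    return [(d, base ** (top - i)) for i, d in enumerate(digits)]


pairs = place_values(23625)
print(pairs)                         # [(2, 10000), (3, 1000), (6, 100), (2, 10), (5, 1)]
print(sum(d * v for d, v in pairs))  # 23625
```

The same function works for other bases by passing `base=2`, which is the idea the binary slides build on.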

(6)

Converting Binary to Decimal

Binary number system is base 2

Uses 2 digits: 0, 1

Example: 10010001 = 145

Power of 2 representation   2^7   2^6   2^5   2^4   2^3   2^2   2^1   2^0
Decimal representation      128   64    32    16    8     4     2     1
Base 2 digits               1     0     0     1     0     0     0     1

10010001 = 128 + 16 + 1 = 145

(7)

Converting Decimal to Binary

Convert decimal 35 to binary

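1. Using 8 bits, find the largest power of 2 that will “fit” into 35 (here 32)

2. Place a 1 into that slot and subtract that power of 2 from the number (35 - 32 = 3)

3. Move to the next lower power of 2; if it doesn’t fit into what remains, place a 0 into that slot

4. Repeat until the 2^0 slot is filled

Power of 2 representation   2^7   2^6   2^5   2^4   2^3   2^2   2^1   2^0
Decimal representation      128   64    32    16    8     4     2     1
Base 2 representation       0     0     1     0     0     0     1     1

35 = 32 + 2 + 1 = 00100011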
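The slot-filling procedure for decimal 35 can be sketched in Python as follows. The function name `to_binary` and the `bits` parameter are assumptions made for this example.

```python
def to_binary(value, bits=8):
    """Convert a non-negative integer to a fixed-width binary string
    by testing each power of 2 from the largest slot down."""
    if not 0 <= value < 2 ** bits:
        raise ValueError(f"{value} does not fit in {bits} bits")
    result = ""
    remaining = value
    for exponent in range(bits - 1, -1, -1):
        power = 2 ** exponent
        if power <= remaining:
            result += "1"          # the power fits: place a 1 and subtract it
            remaining -= power
        else:
            result += "0"          # the power does not fit: place a 0
    return result


print(to_binary(35))  # 00100011
```

Python's built-in `format(35, "08b")` gives the same string; the loop is spelled out here only to mirror the manual steps.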

(8)

Convert Binary to Decimal

1. Choose an 8-bit binary number, for example 10101110

2. Write the binary digits under the correct columns

3. For each column with a 1, add that column's decimal value

4. Do not add the values of the columns where you entered 0

Power of 2 representation   2^7   2^6   2^5   2^4   2^3   2^2   2^1   2^0
Decimal representation      128   64    32    16    8     4     2     1
Base 2 representation       1     0     1     0     1     1     1     0

10101110 = 128 + 32 + 8 + 4 + 2 = 174
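The column-adding procedure can be written as a small Python sketch. The function name `to_decimal` is an assumption for illustration.

```python
def to_decimal(bit_string):
    """Add the place value of every column that holds a 1."""
    total = 0
    top = len(bit_string) - 1
    for position, bit in enumerate(bit_string):
        if bit not in ("0", "1"):
            raise ValueError(f"not a binary digit: {bit!r}")
        if bit == "1":
            total += 2 ** (top - position)  # columns holding 0 add nothing
    return total


print(to_decimal("10101110"))  # 174
```

The built-in `int("10101110", 2)` returns the same value and is what one would normally use in practice.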

(9)

Binary Representation (1)

Why binary representation (as opposed to decimal, octal, etc.)?

Because the devices that store and manage the digital data are far less expensive and complex for binary representation.

They are also far more reliable when they have to represent one out of two possible values.

(10)

Binary Representation (2)

One bit can be either 0 or 1. Therefore, one bit can represent only two things.

To represent more than two things, we need multiple bits. Two bits can represent four things because there are four combinations of 0 and 1 that can be made from two bits: 00, 01, 10, 11.

In general, n bits can represent 2^n things because there are 2^n combinations of 0 and 1 that can be made from n bits. Note that every time we increase the number of bits by 1, we double the number of things we can represent.
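The relationship between the number of bits and the number of combinations can be checked by enumerating every pattern. This is a minimal Python sketch; the helper name `bit_patterns` is made up for the example.

```python
from itertools import product


def bit_patterns(n):
    """List every combination of 0 and 1 that can be made from n bits."""
    return ["".join(bits) for bits in product("01", repeat=n)]


print(bit_patterns(2))  # ['00', '01', '10', '11']

# Each extra bit doubles the number of patterns.
for n in range(1, 9):
    print(n, len(bit_patterns(n)))
```

For n = 8 the count is 256, which matches the number of values one byte can hold.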

(11)

Computing Systems Data

Computing systems are usually complex devices that deal with a vast array of information categories.

The computing systems store, present, and help us modify:

Text

Audio

(12)

Data Formats - How to Interpret Data

Meaning of internal representation must be appropriate for the type of processing to take place:

e.g., images and sound have to be digitized

▪ Images – need detailed description of the data, how color is represented at each data point

▪ Sound – need sampling rate

Proprietary formats

Unique to a product or company

E.g., Microsoft Word, Corel WordPerfect, IBM Lotus Notes

Standards

Evolve two ways:

▪ Proprietary formats become de facto standards (e.g., Adobe PostScript, Apple QuickTime)

▪ Committee is struck to solve a problem (Moving Picture Experts Group,

(13)

Why Standards?

They exist because they are:

Convenient – time to market is often critical when finishing a product, so existing standards can be used instead of spending time designing custom protocols and interfaces

Efficient – most standards are put together by committees with wide experience in the specific area

Flexible – standards usually allow for manufacturer-specific or OEM-specific extensions

Appropriate – address a specific problem in a specific domain

Allow communication and sharing of information

Allow computing systems and software to interoperate (at both hardware and software levels)

(14)

Standards Organizations

ISO – International Organization for Standardization

IEEE – Institute of Electrical and Electronics Engineers

CSA – Canadian Standards Association

(15)

Examples of Standards

Type of Data      Standards
Alphanumeric      ASCII, Unicode
Image             JPEG, GIF, PCX, TIFF, BMP, etc.
Motion picture    MPEG-2, MPEG-4, etc.
Sound             WAV, AU, MP3, etc.

(16)

Data Representation

What are three popular coding systems to represent data?

ASCII – American Standard Code for Information Interchange

EBCDIC – Extended Binary Coded Decimal Interchange Code

Unicode – coding scheme capable of representing all the world’s languages

ASCII Symbol EBCDIC

00110000 0 11110000

00110001 1 11110001

00110010 2 11110010
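The ASCII and EBCDIC columns above can be reproduced with Python's built-in codecs. This sketch assumes the `cp037` codec (one common EBCDIC code page) as a stand-in for "EBCDIC"; the slides do not say which EBCDIC variant they mean. The helper name `code_bits` is made up for the example.

```python
def code_bits(symbol, encoding):
    """Return the 8-bit pattern a single character gets in the given encoding."""
    return format(symbol.encode(encoding)[0], "08b")


# Assumption: cp037 is used here as a representative EBCDIC code page.
for symbol in "012":
    print(code_bits(symbol, "ascii"), symbol, code_bits(symbol, "cp037"))
```

The same character therefore has different bit patterns depending on the coding system, which is why both sides of an exchange must agree on one.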

(17)

Data Representation

How is a letter converted to binary form and back?

Step 1.

The user presses the capital letter D (shift+D key) on the keyboard.

Step 2.

An electronic signal for the capital letter D is sent to the system unit.

Step 3.

The signal for the capital letter D is converted to its ASCII binary code (01000100) and is stored in memory for processing.

Step 4.
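The conversion of the capital letter D to its ASCII binary code, and back again, can be sketched in Python. The function names are assumptions for illustration; a real keyboard controller and system unit do this in hardware and firmware, not in Python.

```python
def char_to_bits(ch):
    """Look up a character's code and show it as 8 binary digits."""
    return format(ord(ch), "08b")


def bits_to_char(bits):
    """Turn an 8-digit binary string back into its character."""
    return chr(int(bits, 2))


code = char_to_bits("D")
print(code)                # 01000100
print(bits_to_char(code))  # D
```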

(18)

Codes and Characters

The problem:

Representing text strings, such as “Hello, world”, in a computer

Each character is coded as a byte (= 8 bits)

Most common coding system is ASCII

ASCII = American Standard Code for Information Interchange

(19)

ASCII Features

7-bit code

8th bit is unused (or used for a parity bit)

2^7 = 128 codes

Two general types of codes:

95 are “Graphic” codes (displayable on a console)
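The code counts and the optional parity bit can be illustrated with a small Python sketch. It takes codes 32 through 126 as the graphic ones, and it assumes even parity; the slide only says "a parity bit", so the choice of even parity and the function names are assumptions.

```python
def is_graphic(code):
    """Codes 32 (space) through 126 (~) are taken as the displayable ones."""
    return 32 <= code <= 126


def with_even_parity(code):
    """Set the 8th bit so the total number of 1 bits is even (assumed scheme)."""
    if not 0 <= code < 128:
        raise ValueError("ASCII codes use 7 bits (0 to 127)")
    parity = bin(code).count("1") % 2
    return (parity << 7) | code


print(2 ** 7)                                 # 128
print(sum(is_graphic(c) for c in range(128))) # 95
print(format(with_even_parity(ord("C")), "08b"))  # 11000011
print(format(with_even_parity(ord("D")), "08b"))  # 01000100
```

"C" has three 1 bits, so the parity bit is set; "D" has two, so it stays 0.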

(20)

Most significant bit

(21)

(22)

(23)

(24)

(25)

Character   Binary     Hexadecimal   Decimal
H           01001000   48            72
e           01100101   65            101
l           01101100   6C            108
l           01101100   6C            108
o           01101111   6F            111
,           00101100   2C            44
(space)     00100000   20            32
w           01110111   77            119
o           01101111   6F            111
r           01110010   72            114
l           01101100   6C            108
d           01100100   64            100

Note: 12 characters require 12 bytes

Each character requires 1 byte
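The byte values for "Hello, world" can be generated rather than looked up by hand. This Python sketch uses the built-in `ascii` codec; the helper name `byte_table` is made up for the example.

```python
def byte_table(text):
    """Return (character, binary, hexadecimal, decimal) for each ASCII character."""
    data = text.encode("ascii")  # one byte per character
    return [(ch, format(b, "08b"), format(b, "02X"), b) for ch, b in zip(text, data)]


rows = byte_table("Hello, world")
for ch, binary, hexa, dec in rows:
    print(repr(ch), binary, hexa, dec)

print(len(rows))  # 12
```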

(26)

Unicode (1)

The extended version of the ASCII character set is not enough for international use.

The Unicode character set was originally designed to use 16 bits per character. Therefore, it can represent 2^16 = 65,536 characters; later versions of the standard extend beyond this 16-bit range.

Unicode was designed to be a superset of ASCII. That is, the first 256 characters in the Unicode character set correspond exactly to the extended ASCII
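The superset relationship can be checked in Python, whose strings are sequences of Unicode code points. This sketch assumes that "extended ASCII" means ISO 8859-1 (Latin-1), the 8-bit set whose 256 codes match the first 256 Unicode code points; other 8-bit "extended ASCII" code pages do not line up this way.

```python
# Plain ASCII: the byte value and the Unicode code point agree.
print("A".encode("ascii")[0], ord("A"))  # 65 65

# Assumption: Latin-1 is the "extended ASCII" set being compared.
latin1_text = bytes(range(256)).decode("latin-1")
same_codes = all(ord(ch) == i for i, ch in enumerate(latin1_text))
print(same_codes)  # True

# Number of values a 16-bit code can take.
print(2 ** 16)  # 65536
```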

(27)

Unicode (2)

Version 2.1

1998

Improves on version 2.0

Includes the Euro sign (20AC₁₆ = €)

From the standard:

▪ …contains 38,887 distinct coded characters derived from the supported scripts. These characters cover the principal written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica.

Latest version of Unicode is 4.0

(28)

Memory

What is memory?

Electronic components that store instructions, data, and results

Consists of one or more chips on the motherboard or other circuit board

Each byte is stored in a unique location called an address, similar to a seat number on a passenger train (e.g., Seat #2B4)
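The idea of one byte per address can be modelled with a Python `bytearray`, where the index plays the role of the address. This is only a toy model; the 16-byte size and the function names `store` and `load` are assumptions made for the example.

```python
def store(memory, address, text):
    """Write the ASCII bytes of text starting at the given address."""
    data = text.encode("ascii")
    if address < 0 or address + len(data) > len(memory):
        raise IndexError("write would fall outside memory")
    memory[address:address + len(data)] = data


def load(memory, address, length):
    """Read length bytes starting at address and decode them as ASCII."""
    return bytes(memory[address:address + length]).decode("ascii")


memory = bytearray(16)   # 16 addressable bytes, all initially 0
store(memory, 4, "D")
print(memory[4])           # 68
print(load(memory, 4, 1))  # D
```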

(29)

Memory

● Stores three basic categories of items:

1. OS and system software

2. Application programs

3. Data and information

● Byte is basic storage unit in memory

● To access data or instructions in memory, the computer references the address that contains the bytes of data

● Manufacturers state the size of memory and

(30)

Memory

How is memory measured?

Term        Abbreviation   Approximate Size
Kilobyte    KB or K        1 thousand bytes
Megabyte    MB             1 million bytes
Gigabyte    GB             1 billion bytes
Terabyte    TB             1 trillion bytes

(31)

Name Abbr. Size

Kilo K 2^10 = 1,024

Mega M 2^20 = 1,048,576

Giga G 2^30 = 1,073,741,824

Tera T 2^40 = 1,099,511,627,776

Peta P 2^50 = 1,125,899,906,842,624

Exa E 2^60 = 1,152,921,504,606,846,976

Zetta Z 2^70 = 1,180,591,620,717,411,303,424
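The sizes in this table follow one pattern: each prefix is 2^10 times the previous one. A short Python sketch reproduces them; the names `PREFIXES` and `binary_prefix_sizes` are made up for the example.

```python
PREFIXES = ["Kilo", "Mega", "Giga", "Tera", "Peta", "Exa", "Zetta"]


def binary_prefix_sizes():
    """Map each prefix to its power-of-2 size: Kilo = 2^10, Mega = 2^20, ..."""
    return {name: 2 ** (10 * (i + 1)) for i, name in enumerate(PREFIXES)}


for name, size in binary_prefix_sizes().items():
    print(f"{name:<6}{size:,}")
```

Comparing 2^10 = 1,024 with 1,000 shows why the "approximate size" figures (thousand, million, billion) are only approximations of the exact binary values.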

(32)

(33)

Course Technology Publishing

Slides 3, 5-8, 12-15 added by Mickie Mueller with graphics from “Discovering Computers 2004: A Gateway to Information”

(34)

References

“The Architecture of Computer Hardware and Systems Software”, Irv Englander, ISBN: 0-471-36209-3