Data Representation
How do computers represent data?
▪ Recognize only two discrete states: on or off
▪ Use a binary system to recognize two states
▪ Use number system with two unique digits: 0 and 1, called bits (short for
binary digits)
▪ Smallest unit of data computer can process
Data Representation
What is a byte?
Eight bits grouped together as a unit
Provides enough different combinations of 0s and 1s to represent 256 individual characters
▪ Numbers
▪ Uppercase and lowercase letters
Converting Binary to Decimal
Decimal number system is base 10
Uses 10 digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Example: 23,625
Power of 10 representation   10^4    10^3   10^2   10^1   10^0
Decimal representation       10000   1000   100    10     1
Base 10 representation       2       3      6      2      5
Converting Binary to Decimal
Binary number system is base 2
Uses 2 digits: 0, 1
Example: 10010001 = 145
Power of 2 representation   2^7   2^6   2^5   2^4   2^3   2^2   2^1   2^0
Decimal representation      128   64    32    16    8     4     2     1
Base 2 representation       1     0     0     1     0     0     0     1
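The two place-value tables above follow the same rule: each digit is multiplied by a power of the base, and the products are added. A minimal Python sketch of that rule follows; the function names `expand` and `value` are illustrative and do not come from the slides.

```python
def expand(digits: str, base: int) -> list[tuple[int, int]]:
    """Return (digit, place value) pairs, most significant digit first."""
    n = len(digits)
    return [(int(d, base), base ** (n - 1 - i)) for i, d in enumerate(digits)]


def value(digits: str, base: int) -> int:
    """Add up digit * place value over all positions."""
    return sum(d * w for d, w in expand(digits, base))


print(expand("23625", 10))    # [(2, 10000), (3, 1000), (6, 100), (2, 10), (5, 1)]
print(value("23625", 10))     # 23625
print(value("10010001", 2))   # 145
```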
Converting Decimal to Binary
Convert decimal 35 to binary
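1. Using 8 bits, find the largest power of 2 that will “fit” into 35 (here 32)
2. Place a 1 into that slot and subtract that power of 2 from the number (35 - 32 = 3)
3. Repeat with each smaller power of 2; if a power doesn’t fit into what remains, place a 0 into that slot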
Power of 2 representation   2^7   2^6   2^5   2^4   2^3   2^2   2^1   2^0
Decimal representation      128   64    32    16    8     4     2     1
Base 2 representation       0     0     1     0     0     0     1     1
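The slot-filling procedure above can be sketched in Python as follows. This is one possible rendering under the assumption of a fixed 8-bit width; the function name `to_binary` is hypothetical.

```python
def to_binary(number: int, bits: int = 8) -> str:
    """Convert a non-negative integer to a fixed-width binary string
    by testing each power of 2 from largest to smallest."""
    if not 0 <= number < 2 ** bits:
        raise ValueError(f"{number} does not fit in {bits} bits")
    slots = []
    remaining = number
    for power in range(bits - 1, -1, -1):
        weight = 2 ** power
        if weight <= remaining:
            slots.append("1")      # the power fits: place a 1 and subtract it
            remaining -= weight
        else:
            slots.append("0")      # the power does not fit: place a 0
    return "".join(slots)


print(to_binary(35))  # 00100011
```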
Convert Binary to Decimal
1. Choose an 8-bit binary number, e.g. 10101110
2. Write the binary digits under the correct columns
3. For each column with a 1, add that column's decimal value
4. Do not add the values of the columns where you entered a 0
Power of 2 representation   2^7   2^6   2^5   2^4   2^3   2^2   2^1   2^0
Decimal representation      128   64    32    16    8     4     2     1
Base 2 representation       1     0     1     0     1     1     1     0
10101110 = 128 + 32 + 8 + 4 + 2 = 174
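The column method above amounts to adding the column values wherever the bit is 1. A short Python sketch, assuming 8-bit input; `WEIGHTS` and `to_decimal` are illustrative names.

```python
WEIGHTS = [128, 64, 32, 16, 8, 4, 2, 1]  # column values for 2^7 down to 2^0


def to_decimal(bits: str) -> int:
    """Add the column value for every position that holds a 1."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 8 binary digits")
    return sum(weight for weight, bit in zip(WEIGHTS, bits) if bit == "1")


print(to_decimal("10101110"))  # 174
```

Python's built-in `int("10101110", 2)` gives the same result and can serve as a cross-check.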
Binary Representation (1)
Why binary representation (as opposed to decimal, octal, etc.)?
Because the devices that store and manage the digital data are far less expensive and complex for binary representation.
They are also far more reliable when they have to represent one out of two possible values.
Binary Representation (2)
One bit can be either 0 or 1. Therefore, one bit can represent only two things.
To represent more than two things, we need multiple bits. Two bits can represent four things because there are four combinations of 0 and 1 that can be made from two bits: 00, 01, 10,11.
In general, n bits can represent 2^n things because there are 2^n combinations of 0 and 1 that can be made from n bits. Note that every time we increase the number of bits by 1, we double the number of things that can be represented.
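To make the counting argument concrete, the sketch below enumerates every pattern of n bits and counts them. It is an illustration only; the helper name `bit_patterns` is an assumption.

```python
from itertools import product


def bit_patterns(n: int) -> list[str]:
    """List every combination of 0 and 1 that can be made from n bits."""
    return ["".join(p) for p in product("01", repeat=n)]


print(bit_patterns(2))  # ['00', '01', '10', '11']

# Each extra bit doubles the number of patterns: 2, 4, 8, 16, ...
for n in range(1, 9):
    print(n, len(bit_patterns(n)))
```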
Computing Systems Data
Computing systems are usually complex devices that deal with a vast array of information categories.
Computing systems store, present, and help us modify:
▪ Text
▪ Audio
Data Formats - How to Interpret Data
Meaning of internal representation must be appropriate for the type of processing to take place
E.g., images and sound have to be digitized:
▪ Images – need detailed description of the data, such as how color is represented at each data point
▪ Sound – need sampling rate
Proprietary formats
Unique to a product or company
E.g., Microsoft Word, Corel WordPerfect, IBM Lotus Notes
Standards
Evolve two ways:
▪ Proprietary formats become de facto standards (e.g., Adobe PostScript, Apple QuickTime)
▪ Committee is struck to solve a problem (Moving Picture Experts Group,
Why Standards?
They exist because they are:
▪ Convenient – time to market is often critical when finishing a product, so existing standards can save the time of designing one's own protocols and interfaces
▪ Efficient – most standards are put together by committees with wide experience in the specific area
▪ Flexible – standards usually allow for manufacturer- or OEM-specific extensions
▪ Appropriate – they address a specific problem in a specific domain
Standards also:
▪ Allow communication and sharing of information
▪ Allow computing systems and software to interoperate (at both hardware and software levels)
Standards Organizations
ISO – International Organization for Standardization
IEEE – Institute of Electrical and Electronics Engineers
CSA – Canadian Standards Association
Examples of Standards
Type of Data      Standards
Alphanumeric      ASCII, Unicode
Image             JPEG, GIF, PCX, TIFF, BMP, etc.
Motion picture    MPEG-2, MPEG-4, etc.
Sound             WAV, AU, MP3, etc.
Data Representation
What are three popular coding systems to represent data?
ASCII – American Standard Code for Information Interchange
EBCDIC – Extended Binary Coded Decimal Interchange Code
Unicode – coding scheme capable of representing all the world’s languages
ASCII Symbol EBCDIC
00110000 0 11110000
00110001 1 11110001
00110010 2 11110010
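The table rows above can be reproduced with Python's standard codecs. This sketch assumes code page `cp037` (EBCDIC, US/Canada) as a representative EBCDIC variant; the slides do not say which EBCDIC code page they use, and the function name `codes` is hypothetical.

```python
def codes(symbol: str) -> tuple[str, str]:
    """Return the 8-bit ASCII and EBCDIC (cp037) codes for one character."""
    ascii_bits = format(symbol.encode("ascii")[0], "08b")
    ebcdic_bits = format(symbol.encode("cp037")[0], "08b")  # assumed code page
    return ascii_bits, ebcdic_bits


print("ASCII    Symbol EBCDIC")
for symbol in "012":
    ascii_bits, ebcdic_bits = codes(symbol)
    print(ascii_bits, symbol.center(6), ebcdic_bits)
```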
Data Representation
How is a letter converted to binary form and back?
Step 1.
The user presses the capital letter D (shift+D key) on the keyboard.
Step 2.
An electronic signal for the capital letter D is sent to the system unit.
Step 3.
The signal for the capital letter D is converted to its ASCII binary code (01000100) and is stored in memory for processing.
Step 4.
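The conversion in Step 3, and the reverse direction that the slide title asks about, can be sketched in Python. The helper names `char_to_bits` and `bits_to_char` are illustrative and are not part of the slides.

```python
def char_to_bits(ch: str) -> str:
    """Convert one character to its 8-bit ASCII binary code."""
    return format(ord(ch), "08b")


def bits_to_char(bits: str) -> str:
    """Convert an 8-bit binary code back to its character."""
    return chr(int(bits, 2))


code = char_to_bits("D")
print(code)                # 01000100
print(bits_to_char(code))  # D
```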
Codes and Characters
The problem:
Representing text strings, such as “Hello, world”, in a computer
Each character is coded as a byte (= 8 bits)
Most common coding system is ASCII
ASCII = American Standard Code for Information Interchange
ASCII Features
7-bit code
8th bit is unused (or used for a parity bit)
2^7 = 128 codes
Two general types of codes:
95 are “Graphic” codes (displayable on a console)
33 are control codes (not displayable)
Most significant bit: the leftmost bit of each binary code below (0 for every ASCII character)
Character   Binary     Hexadecimal   Decimal
H           01001000   48            72
e           01100101   65            101
l           01101100   6C            108
l           01101100   6C            108
o           01101111   6F            111
,           00101100   2C            44
(space)     00100000   20            32
w           01110111   77            119
o           01101111   6F            111
r           01110010   72            114
l           01101100   6C            108
d           01100100   64            100
Note: 12 characters require 12 bytes. Each character requires 1 byte.
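The coding of “Hello, world” can be checked with a few lines of Python. This is a sketch for verification; the function name `describe` is an assumption.

```python
def describe(text: str) -> list[tuple[str, str, str, int]]:
    """Return (character, binary, hexadecimal, decimal) for each ASCII character."""
    data = text.encode("ascii")  # one byte per character
    return [(ch, format(b, "08b"), format(b, "02X"), b) for ch, b in zip(text, data)]


text = "Hello, world"
for ch, binary, hexa, dec in describe(text):
    print(repr(ch), binary, hexa, dec)

print(len(text), "characters,", len(text.encode("ascii")), "bytes")  # 12 characters, 12 bytes
```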
Unicode (1)
The extended version of the ASCII character set is not enough for international use.
The Unicode character set was originally designed around 16 bits per character, which allows 2^16, or over 65 thousand, characters; later versions extend the code space beyond 16 bits.
Unicode was designed to be a superset of ASCII. That is, the first 256 characters in the Unicode character set correspond exactly to the extended ASCII character set (ISO 8859-1, Latin-1).
Unicode (2)
Version 2.1
1998
Improves on version 2.0
Includes the Euro sign (20AC in hexadecimal = €)
From the standard:
▪ …contains 38,887 distinct coded characters derived from the supported scripts. These characters cover the principal written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica.
Latest version of Unicode (as of 2003) is 4.0
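A few of the Unicode facts above can be checked from Python, whose strings are sequences of Unicode code points. The sketch treats "extended ASCII" as ISO 8859-1 (Latin-1), which is an assumption about what the slides mean by that term.

```python
euro = "\u20ac"                        # the Euro sign
print(euro, format(ord(euro), "04X"))  # € 20AC

# Number of values a 16-bit code can take
print(2 ** 16)  # 65536

# The first 256 code points decode identically from Latin-1 bytes
matches_latin1 = all(bytes([i]).decode("latin-1") == chr(i) for i in range(256))
print(matches_latin1)  # True

# The first 128 code points are plain ASCII
print(ord("D"), "D".encode("ascii")[0])  # 68 68
```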
Memory
What is memory?
Electronic components that store instructions, data, and results
Consists of one or more chips on motherboard or other circuit board
Each byte stored in unique location called an address, similar to seat numbers on a passenger train (e.g., Seat #2B4)
Memory
● Stores three basic categories of items:
1. OS and system software
2. Application programs
3. Data and information
● Byte is basic storage unit in memory
● To access data or instructions in memory, computer references the addresses that contain the bytes of data
● Manufacturers state the size of memory and
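As a rough illustration of byte-addressed memory, the sketch below models a tiny 16-byte memory as a Python `bytearray`, where the index plays the role of the address. The names `memory`, `store`, and `load`, and the 16-byte size, are assumptions made for the example; real memory is accessed by hardware, not by code like this.

```python
memory = bytearray(16)  # 16 bytes, addresses 0 through 15, all initially 0


def store(address: int, data: bytes) -> None:
    """Write bytes into memory starting at the given address."""
    if address < 0 or address + len(data) > len(memory):
        raise IndexError("address out of range")
    memory[address:address + len(data)] = data


def load(address: int, length: int = 1) -> bytes:
    """Read bytes from memory starting at the given address."""
    if address < 0 or address + length > len(memory):
        raise IndexError("address out of range")
    return bytes(memory[address:address + length])


store(4, b"D")                    # the ASCII code for D goes to address 4
print(load(4))                    # b'D'
print(format(load(4)[0], "08b"))  # 01000100
```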
Memory
How is memory measured?
Term       Abbreviation   Approximate Size
Kilobyte   KB or K        1 thousand bytes
Megabyte   MB             1 million bytes
Gigabyte   GB             1 billion bytes
Terabyte   TB             1 trillion bytes
Name Abbr. Size
Kilo K 2^10 = 1,024
Mega M 2^20 = 1,048,576
Giga G 2^30 = 1,073,741,824
Tera T 2^40 = 1,099,511,627,776
Peta P 2^50 = 1,125,899,906,842,624
Exa E 2^60 = 1,152,921,504,606,846,976
Zetta Z 2^70 = 1,180,591,620,717,411,303,424
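The exact sizes in the table are successive powers of 2^10, while the "approximate" sizes are powers of 10^3. The sketch below recomputes both; the names `PREFIXES` and `size` are illustrative.

```python
PREFIXES = ["Kilo", "Mega", "Giga", "Tera", "Peta", "Exa", "Zetta"]


def size(prefix: str) -> int:
    """Exact binary size for a prefix: 2^10 for Kilo, 2^20 for Mega, and so on."""
    return 2 ** (10 * (PREFIXES.index(prefix) + 1))


for i, prefix in enumerate(PREFIXES, start=1):
    exact = size(prefix)
    approx = 10 ** (3 * i)  # the rounded "thousand, million, billion" figure
    print(f"{prefix:<6} 2^{10 * i} = {exact:,}   (approx. {approx:,})")
```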
Slides 1, 2, 4, 9, 10, 11 from Chapter 4, The Components of the System Unit; “Discovering Computers 2004: A Gateway to Information” by Shelly, Cashman, Vermaat; © 2003; Course Technology Publishing
Slides 3, 5-8, 12-15 added by Mickie Mueller with graphics from “Discovering Computers 2004: A Gateway to Information”
References
“The Architecture of Computer Hardware and Systems Software”, Irv Englander, ISBN: 0-471-36209-3