
How Real-World Information Becomes Computable Data

—ARTHUR CONAN DOYLE

2.3 DATA CAPACITY

Data encoding requires us to know how many bits are needed to store a piece of information. When storing the symbols on a keyboard, for example, we must know how many bits are required to store any one of the symbols. Would it be possible to encode a keyboard symbol as a 3-bit string or a 4-bit string rather than an 8-bit string?

Consider the following scenario. You are given a list of all keyboard symbols and you are also given a bit string of length three. Your task is to generate a unique bit pattern for each of the symbols in your list. You begin with the symbol “A” and decide to encode this symbol as 000. As you move through the alphabet, you choose to encode “B” as 001, “C” as 010, “D” as 011, “E” as 100, “F” as 101, “G” as 110, and “H” as 111. At this point you realize that all eight patterns have been used, so there are none left to encode the remaining symbols of the keyboard. A bit string of length three is therefore not sufficient to encode one keyboard symbol.
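The scenario above can be reproduced in a few lines of Python. This is only an illustrative sketch: the symbol list is a short sample standing in for the full keyboard, and the names patterns, symbols, encoding, and unencoded are hypothetical.

```python
from itertools import product

# All bit strings of length three, in counting order: 000, 001, ..., 111.
patterns = ["".join(bits) for bits in product("01", repeat=3)]

# A small sample of keyboard symbols (assumption: stands in for the full keyboard).
symbols = list("ABCDEFGHIJ")

# Pair each symbol with a pattern; zip stops when the patterns run out.
encoding = dict(zip(symbols, patterns))

# Symbols left without a pattern once all eight patterns are used.
unencoded = symbols[len(patterns):]

print(encoding)
print("No pattern left for:", unencoded)
```

With ten sample symbols and only eight patterns, the last two symbols remain unencoded, which is the same dead end reached by hand.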

In general, the number of bits required to store a piece of information grows with the number of values that the information may take. A bit string of length n provides 2^n distinct patterns, so the string must be long enough that the number of patterns is at least the number of values. A single day of the week can be encoded as a bit string of length three since there are seven days in a week and eight patterns available. A single month of the year can be stored in a bit string of length four since there are 12 months in a year and 16 patterns available. Information that involves a large set of possible values will therefore require longer bit strings to encode, while information with few possible values can be encoded in shorter bit strings. Figure 2.5 illustrates how the number of bits required to encode information of a certain type is related to the number of values associated with that type.
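The relationship between the number of values and the number of bits can be written as a small function. The sketch below assumes the smallest sufficient length is wanted, that is, the smallest n for which 2^n is at least the number of values; the function name bits_needed is hypothetical.

```python
def bits_needed(num_values: int) -> int:
    """Smallest number of bits whose patterns cover num_values distinct values."""
    if num_values < 1:
        raise ValueError("num_values must be at least 1")
    # (n - 1).bit_length() is the ceiling of log2(n), computed with integers only.
    # Assumption: a single possible value is still stored in one bit.
    return max(1, (num_values - 1).bit_length())


for name, count in [("day of week", 7), ("month of year", 12), ("day of year", 365)]:
    print(name, count, bits_needed(count))
```

Integer arithmetic is used instead of a floating-point logarithm so that exact powers of two, such as 8 or 256, are never rounded the wrong way.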

The data capacity of a computing system is the amount of information that can be encoded by the system. Since the data capacity is directly related to the number of bits that are available on the system, it could be expressed simply as a count of bits. In practice, however, data capacity is usually measured not in bits but in a unit known as a byte. A byte is a bit string of length eight. A single byte, therefore, is able to store 2^8 or 256 unique patterns.
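The count of patterns in a byte can be checked directly. The following is a minimal sketch; the names BITS_PER_BYTE, pattern_count, and largest are hypothetical, and the use of Python's built-in bytes type is only one convenient way to see the limit.

```python
BITS_PER_BYTE = 8


def pattern_count(num_bits: int) -> int:
    """Number of distinct bit patterns of the given length."""
    return 2 ** num_bits


print(pattern_count(3))              # patterns in a bit string of length three
print(pattern_count(BITS_PER_BYTE))  # patterns in one byte

# Python's bytes type holds values 0 through 255, one per byte pattern.
largest = bytes([pattern_count(BITS_PER_BYTE) - 1])
print(largest)
```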

One other measure of data capacity is known as a word, which is a unit of data capacity that is based on the hardware of a computing system. A word is a fixed-length sequence of bits that are processed as a single item by the processor. The number of bits in a word varies by computing system but will typically be a multiple of eight. You may have heard of 32-bit processors or 64-bit systems. These phrases describe the word length of a particular computing system. Common word lengths include 8, 16, 32, and 64 bits.
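One rough way to see this on your own machine is to ask Python for the size of a native pointer. This is a sketch under an assumption: pointer size often matches the word length of the system the interpreter was built for, but it is not guaranteed to on every platform. The name pointer_bits is hypothetical.

```python
import struct
import sys

# Size of a native pointer in bits. Assumption: this is used here as a
# stand-in for the word length, which it often but not always matches.
pointer_bits = struct.calcsize("P") * 8

print("Pointer size in bits:", pointer_bits)

# sys.maxsize is the largest index Python supports on this build;
# its bit length gives a related view of the native integer size.
print("Bits in sys.maxsize:", sys.maxsize.bit_length())
```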

Prefixes are used as multipliers to measure very large data capacities and the symbol B is used to denote a single byte. The computing industry uses terms such as kilobyte, megabyte, and gigabyte, and corresponding symbols KB, MB, and GB as common measures of data capacity. Figure 2.6 shows the most common prefixes and their meaning as both a power of 2 and an approximate decimal value.

Prefix   Symbol   Power of 2   Decimal
Kilo     K        2^10         ~10^3
Mega     M        2^20         ~10^6
Giga     G        2^30         ~10^9
Tera     T        2^40         ~10^12
Peta     P        2^50         ~10^15

FIGURE 2.6 Data capacity prefixes.
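The gap between each power of 2 and its decimal approximation can be computed rather than taken on trust. The sketch below rebuilds the rows of Figure 2.6; the names PREFIXES and prefix_table are hypothetical.

```python
# Prefix symbols from Figure 2.6; each step multiplies the previous one by 2**10.
PREFIXES = ["K", "M", "G", "T", "P"]


def prefix_table():
    """Return (symbol, power-of-2 value, decimal approximation) for each prefix."""
    rows = []
    for i, symbol in enumerate(PREFIXES, start=1):
        rows.append((symbol, 2 ** (10 * i), 10 ** (3 * i)))
    return rows


for symbol, binary, decimal in prefix_table():
    # Percentage by which the power of 2 exceeds the power of 10.
    excess = (binary - decimal) / decimal * 100
    print(f"{symbol}B = {binary} bytes, about {excess:.1f}% more than {decimal}")
```

The approximation drifts further from the exact value as the prefixes grow, which is why the decimal column is marked as approximate.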

Type of Information   Number of Values   Number of Bits
coin toss             2                  1
day of week           7                  3
month of year         12                 4
day of month          31                 5
keyboard symbol       ~104               7
day of year           365                9

FIGURE 2.5 The approximate number of bits required to store various types of information.

Computing systems are used to store vastly different types of information. Digital music players can record, store, and play back vast libraries of audio recordings. Cell phones can store and display high-quality video streams, while other systems gather and analyze massive amounts of scientific data for weather prediction or for advancing scientific knowledge. The data capacity required varies from one type of information to another because the richness of the information content varies by type. Figure 2.7 shows how much data capacity is required to store certain types of information.
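The sizes listed in Figure 2.7 make simple capacity estimates possible. The sketch below uses the approximate figures from that table with the power-of-2 meaning of each prefix; the constant names and the function how_many_fit are hypothetical, and the results are rough estimates only.

```python
KB = 2 ** 10
MB = 2 ** 20
GB = 2 ** 30

# Approximate sizes taken from Figure 2.7.
MP3_FIVE_MINUTES = 5 * MB
CD_AUDIO_DISK = 800 * MB
DVD = 8.5 * GB


def how_many_fit(item_bytes, capacity_bytes):
    """Whole number of items of a given size that fit in a given capacity."""
    return int(capacity_bytes // item_bytes)


print(how_many_fit(MP3_FIVE_MINUTES, CD_AUDIO_DISK))
print(how_many_fit(MP3_FIVE_MINUTES, DVD))
```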

2.4 DATA TYPES AND DATA ENCODING

We have already asserted that all digital data is encoded as a sequence of bits and that information is associated with the various patterns that these bits may exhibit. This section gives further insight into how specific types of information are encoded and describes some of the inner workings of a computing system. We begin by discussing how numbers are encoded and then give a brief overview of how colors, pictures, and sound can be encoded.

2.4.1 Numbers

2.4.1.1 Numeral Systems

A numeral system is a way of representing numbers in written form. Consider, for example, the three numbers shown in Figure 2.8. If these markings are interpreted using the numeral system known as tally

Type of Information                Data Capacity (Bytes)
keyboard symbol (letter)           1 B
10 page paper                      40 KB
five minute MP3 audio recording    5 MB
high resolution digital picture    5 MB
CD audio disk                      800 MB
DVD                                8.5 GB