Counting letters - Data Types - How to Think Like a Computer Scientist: Learning with Python 3

Data Types

5.4 Dictionaries

5.4.4 Counting letters

In the exercises in Chapter 8 (Strings) we wrote a function that counted the number of occurrences of a letter in a string. A more general version of this problem is to form a frequency table of the letters in the string, that is, how many times each letter appears.

Such a frequency table might be useful for compressing a text file. Because different letters appear with different frequencies, we can compress a file by using shorter codes for common letters and longer codes for letters that appear less frequently.

Dictionaries provide an elegant way to generate a frequency table:

>>> letter_counts = {}
>>> for letter in "Mississippi":
...     letter_counts[letter] = letter_counts.get(letter, 0) + 1
...
>>> letter_counts
{'M': 1, 'i': 4, 's': 4, 'p': 2}

We start with an empty dictionary. For each letter in the string, we find the current count (possibly zero) and increment it. At the end, the dictionary contains pairs of letters and their frequencies.
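The same loop can be wrapped in a function so that it works for any string. The following is a minimal sketch; the name letter_frequencies is made up for this illustration and is not part of the book's code. As a cross-check, it compares the result with collections.Counter from the standard library, which performs the same kind of tally.

```python
from collections import Counter


def letter_frequencies(text):
    """Return a dictionary mapping each character in text to its count."""
    counts = {}
    for letter in text:
        # get() returns 0 for a letter we have not seen yet
        counts[letter] = counts.get(letter, 0) + 1
    return counts


table = letter_frequencies("Mississippi")
print(table)

# Counter is a dict subclass, so the two results compare equal
print(table == Counter("Mississippi"))
```

Writing the loop by hand shows how get supplies the "possibly zero" starting count; Counter is simply a ready-made shortcut for the same idea.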

It might be more appealing to display the frequency table in alphabetical order. We can do that with the items and sort methods (more precisely, sort orders lexicographically):

>>> letter_items = list(letter_counts.items())

>>> letter_items.sort()

>>> print(letter_items)

[('M', 1), ('i', 4), ('p', 2), ('s', 4)]

Notice in the first line we had to call the type conversion function list. That turns the promise we get from items into a list, a step that is needed before we can use the list’s sort method.
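As an alternative sketch, the built-in sorted function accepts any iterable, including the object returned by items, and returns a new sorted list, so the explicit list conversion can be skipped. The helper name sorted_letter_items is hypothetical and only used here for illustration.

```python
def sorted_letter_items(letter_counts):
    """Return the (letter, count) pairs of a dictionary in sorted order."""
    # sorted() accepts the object returned by items() directly
    return sorted(letter_counts.items())


letter_counts = {}
for letter in "Mississippi":
    letter_counts[letter] = letter_counts.get(letter, 0) + 1

# Print the frequency table, one letter per line
for letter, count in sorted_letter_items(letter_counts):
    print(letter, count)
```

Because the ordering is lexicographic, the uppercase 'M' sorts before all the lowercase letters.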

5.4.5 Glossary

call graph A graph consisting of nodes which represent function frames (or invocations), and directed edges (lines with arrows) showing which frames gave rise to other frames.

dictionary A collection of key:value pairs that maps from keys to values. The keys can be any immutable value, and the associated value can be of any type.

immutable data value A data value which cannot be modified. Assignments to elements or slices (sub-parts) of immutable values cause a runtime error.

key A data item that is mapped to a value in a dictionary. Keys are used to look up values in a dictionary. Each key must be unique across the dictionary.

key:value pair One of the pairs of items in a dictionary. Values are looked up in a dictionary by key.

mapping type A mapping type is a data type comprised of a collection of keys and associated values. Python’s only built-in mapping type is the dictionary. Dictionaries implement the associative array abstract data type.

memo Temporary storage of precomputed values to avoid duplicating the same computation.

mutable data value A data value which can be modified. The types of all mutable values are compound types. Lists and dictionaries are mutable; strings and tuples are not.
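The difference between mutable and immutable values, and why it matters for dictionary keys, can be illustrated with a short sketch. The variable names below are invented for this example. Strictly speaking, Python requires keys to be hashable, which holds for the built-in immutable types such as strings, numbers, and tuples of those.

```python
# Keys must be immutable; values may be of any type.
inventory = {}
inventory[("apples", "red")] = 15    # a tuple is immutable, so it works as a key
inventory["bananas"] = [35, 40]      # a (mutable) list is fine as a value

key_error = None
try:
    inventory[["grapes", "green"]] = 12   # a list is mutable, so this fails
except TypeError as error:
    key_error = error

# Lists can be changed in place...
inventory["bananas"][0] = 36

# ...but assigning to an element of a string raises a runtime error.
word = "grape"
string_error = None
try:
    word[0] = "d"
except TypeError as error:
    string_error = error

print(inventory)
print(key_error)
print(string_error)
```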

5.4.6 Exercises

1. Write a program that reads a string and returns a table of the letters of the alphabet in alphabetical order which occur in the string together with the number of times each letter occurs. Case should be ignored. A sample output of the program when the user enters the data “ThiS is String with Upper and lower case Letters” would look like this:

a 2
c 1
d 1
e 5
g 1
h 2
i 4
l 2
n 2
o 1
p 2
r 4
s 5
t 5
u 1
w 2

2. Give the Python interpreter’s response to each of the following from a continuous interpreter session:

a. >>> dictionary = {"apples": 15, "bananas": 35, "grapes": 12}
   >>> dictionary["bananas"]

b. >>> dictionary["oranges"] = 20
   >>> len(dictionary)

c. >>> "grapes" in dictionary

d. >>> dictionary["pears"]

f. >>> fruits = list(dictionary.keys())

>>> fruits.sort()

>>> print(fruits)

g. >>> del dictionary["apples"]

>>> "apples" in dictionary

Be sure you understand why you get each result. Then apply what you have learned to fill in the body of the function below:

def add_fruit(inventory, fruit, quantity=0):
    return None

# Make these tests work...
new_inventory = {}
add_fruit(new_inventory, "strawberries", 10)
print("strawberries" in new_inventory)
print(new_inventory["strawberries"] == 10)
add_fruit(new_inventory, "strawberries", 25)
print(new_inventory["strawberries"] == 35)

3. Write a program called alice_words.py that creates a text file named alice_words.txt containing an alphabetical listing of all the words, and the number of times each occurs, in the text version of Alice’s Adventures in Wonderland. (You can obtain a free plain text version of the book, along with many others, from http://www.gutenberg.org.) The first 10 lines of your output file should look something like this:

Word              Count
=======================
a                 631
a-piece           1
abide             1
able              1
about             94
above             3
absence           1
absurd            2

How many times does the word alice occur in the book?

Numpy

The standard Python data types are not well suited to mathematical operations. For example, suppose we have the list a = [2, 3, 8]. If we multiply this list by an integer, we get:

>>> a = [2, 3, 8]
>>> 2 * a
[2, 3, 8, 2, 3, 8]

And floats are not even allowed:

>>> a = [2, 3, 8]
>>> 2 * a
[2, 3, 8, 2, 3, 8]
>>> 2.1 * a
TypeError: can't multiply sequence by non-int of type 'float'

In order to solve this using Python lists, we would have to do something like:

values = [2, 3, 8]
result = []
for x in values:
    result.append(2.1 * x)

This is not very elegant, is it? This is because Python lists are not designed as mathematical objects. Rather, they are purely a collection of items. In order to get a type of list which behaves like a mathematical array or matrix, we use Numpy.

>>> import numpy as np

>>> a = np.array([2, 3, 8])

>>> 2.1 * a

array([ 4.2, 6.3, 16.8])

As we can see, this worked the way we expected it to. We note a couple of things:

- We abbreviated numpy to np; this is conventional.
- np.array takes a Python list as argument.
- The list [2, 3, 8] contains ints, yet the result contains floats. This means numpy changed the data type automatically for us.
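The automatic change of data type can be inspected through the dtype attribute of an array. The sketch below is only an illustration; the variable names scaled and as_floats are made up, and the exact integer type reported for a depends on the platform.

```python
import numpy as np

a = np.array([2, 3, 8])
print(a.dtype)         # an integer type; the exact width depends on the platform

scaled = 2.1 * a       # multiplying by a float produces a float array
print(scaled.dtype)

# The conversion can also be requested explicitly with astype
as_floats = a.astype(float)
print(as_floats.dtype)
```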

>>> import numpy as np
>>> a = np.array([2, 3, 8])
>>> a * a
array([ 4,  9, 64])
>>> a**2
array([ 4,  9, 64])

This has nicely squared the array element-wise.

Note: Those in the know might be a bit surprised by this. After all, if a is a vector, shouldn’t a**2 be the dot product of the two vectors, a · a? Well, numpy arrays are not vectors in the algebraic sense. Arithmetic operations between arrays are performed element-wise, not on the arrays as a whole.

To tell numpy we want the dot product we simply use the np.dot function:

>>> a = np.array([2, 3, 8])
>>> np.dot(a, a)
77

Furthermore, if you pass 2D arrays to np.dot it will behave like matrix multiplication. Several other similar NumPy algebraic functions are available (like np.cross, np.outer, etc.).

Bottom line: when you want to treat numpy array operations as vector or matrix operations, make use of the specialized functions to this end.
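To make the contrast concrete, here is a small sketch that puts the element-wise operators next to the specialized functions. The matrices m and swap are arbitrary values chosen for this illustration. The @ operator used at the end is Python's matrix multiplication operator, which gives the same result as np.dot for 2D arrays.

```python
import numpy as np

a = np.array([2, 3, 8])

elementwise = a * a          # squares each element
dot_product = np.dot(a, a)   # 2*2 + 3*3 + 8*8
outer = np.outer(a, a)       # 3x3 array of all pairwise products

m = np.array([[1, 2],
              [3, 4]])
swap = np.array([[0, 1],
                 [1, 0]])

elementwise_2d = m * swap          # multiplies matching entries only
matrix_product = np.dot(m, swap)   # true matrix multiplication
same_with_operator = m @ swap      # equivalent for 2D arrays

print(elementwise_2d)
print(matrix_product)
```

Here the matrix product swaps the two columns of m, while the element-wise product merely zeroes out its diagonal, so the two operations are clearly different.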

6.1 Shape

One of the most important properties of an array is its shape. We have already seen 1-dimensional (1D) arrays, but arrays can have any number of dimensions. Images, for example, consist of a 2D array of pixels. But in color images every pixel is an RGB tuple: the intensity in red, green and blue. Every pixel itself is therefore an array as well. This makes a color image 3D overall.

To get the shape of an array, we use shape:

>>> import numpy as np

>>> a = np.array([2, 3, 8])

>>> a.shape

(3,)

Something slightly more interesting:

>>> b = np.array([
...     [2, 3, 8],
...     [4, 5, 6],
... ])
>>> b.shape
(2, 3)
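The image example from the start of this section can be sketched the same way. The arrays below are stand-ins filled with zeros rather than real images, and the sizes (4 pixels high, 6 pixels wide) are arbitrary choices for this illustration.

```python
import numpy as np

# A grayscale image: one intensity per pixel, so two dimensions
gray_image = np.zeros((4, 6))

# A color image: height, width, and 3 values (red, green, blue) per pixel
color_image = np.zeros((4, 6, 3))

print(gray_image.shape, gray_image.ndim)
print(color_image.shape, color_image.ndim)

# size is the total number of elements, here 4 * 6 * 3
print(color_image.size)
```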

6.2 Slicing

Just like with lists, we might want to select certain values from an array. For 1D arrays it works just like for normal Python lists:

>>> a = np.array([2, 3, 8])
>>> a[2]
8
>>> a[1:]
array([3, 8])

However, when dealing with higher dimensional arrays something else happens:

>>> b = np.array([
...     [2, 3, 8],
...     [4, 5, 6],
... ])
>>> b[1]
array([4, 5, 6])
>>> b[1][2]
6

We see that using b[1] returns the 1th row along the first dimension, which is still an array. After that, we can select individual items from that. This can be abbreviated to:

>>> b[1, 2]
6

But what if I wanted the 1th column instead of the 1th row? Then we use : to select all items along the first dimension, and then a 1:

>>> b[:, 1]
array([3, 5])

By comparing with the definition of b, we see that this is the column we were looking for.
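The comma notation combines freely with ranges and negative indices. The following sketch reuses the array b from above; the variable names on the left-hand side are made up for this illustration.

```python
import numpy as np

b = np.array([[2, 3, 8],
              [4, 5, 6]])

row_0 = b[0, :]               # the 0th row, equivalent to b[0]
last_two_columns = b[:, 1:]   # all rows, columns 1 and onwards
corner = b[-1, -1]            # negative indices count from the end

print(row_0)
print(last_two_columns)
print(corner)
```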

Note: Instead of first, I write 1th on purpose to signify the existence of a 0th element. Remember that in Python, as in any self-respecting programming language, we start counting at zero.

Find out more about advanced slicing at the NumPy indexing documentation page.

6.3 Masking

This is perhaps the single most powerful feature of Numpy. Suppose we have an array, and we want to throw away all values above a certain cutoff:

>>> a = np.array([230, 10, 284, 39, 76])
>>> cutoff = 200
>>> a > cutoff
array([ True, False,  True, False, False])

Simply using the greater-than operator lets us know in which cases the test was positive. Now we set all the values above 200 to zero:

>>> a = np.array([230, 10, 284, 39, 76])
>>> cutoff = 200
>>> a[a > cutoff] = 0
>>> a
array([ 0, 10,  0, 39, 76])

The crucial line is a[a > cutoff] = 0. This selects all the points in the array where the test was positive and assigns 0 to that position. Without knowing this trick we would have had to loop over the array:

>>> a = np.array([230, 10, 284, 39, 76])
>>> cutoff = 200
>>> new_a = []
>>> for x in a:
...     if x > cutoff:
...         new_a.append(0)
...     else:
...         new_a.append(x)
...
>>> a = np.array(new_a)

Looks rather silly now, doesn’t it? When working with images this becomes even more obvious, because there we might have to loop over three dimensions before we can use the if/else. Can you imagine the mess?
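To see that the same one-liner carries over to more dimensions, here is a sketch with a tiny made-up "image" of 2 by 2 pixels and three color values per pixel. The numbers are invented for this example. It also shows np.where, an alternative that builds a new array instead of modifying one in place.

```python
import numpy as np

# A made-up 2x2 "image" with 3 color values per pixel
image = np.array([[[230, 10, 284], [39, 76, 150]],
                  [[201, 200, 5], [255, 0, 199]]])
cutoff = 200

mask = image > cutoff      # boolean array with the same shape as image

clipped = image.copy()     # work on a copy so the original stays intact
clipped[mask] = 0          # no loops needed, whatever the number of dimensions

# np.where picks 0 where the test is True and the original value elsewhere
also_clipped = np.where(image > cutoff, 0, image)

print(clipped)
```

Note that the value 200 itself survives, because the test is strictly greater than the cutoff.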
