Use pattern matching to delete files

Write a script “remove_files.py” that will look in the chapter 7 practices files folder named “little pics” as well all of its subfolders. The script should use os.remove() to delete any JPG file found in any of these folders if the file is less than 2 Kb (2,000 bytes) in size.

You can supply the os.path.getsize() function with a full file path to return the file's size in bytes. Check the contents of the folders before running your script to make sure that you delete the correct files; you should only end up removing the files named “to be deleted.jpg” and “definitely has to go.jpg” - although you should only use the file extensions and file sizes to determine this. If you mess up and delete the wrong files, there is a folder named “backup” that contains an exact copy of the “little pics” folder and all its contents so that you can copy these contents back and try again.

?

7.3) Read and write CSV data

he types of files we have to deal with in our everyday lives are usually more complicated than plain text files. If we want to be able to modify the contents of these files (rather than just copy, rename or delete them), we will need more complex systems for being able to read this information.

T

One common way of storing text data is in CSV files. “CSV” stands for Comma-Separated Value, because each entry in a row of data is usually separated from other entries by a comma. For example, the contents of the file named “wonka.csv” in the chapter 7 practice materials folder look like this:

First name,Last name,Reward

Charlie,Bucket,"golden ticket, chocolate factory" Veruca,Salt,squirrel revolution

Violet,Beauregarde,fruit chew

We have three columns of variables: First name, Last name, and Reward. Each line represents another row of data, including the first row, which is a “header” row that tells us what each entry represents. Our entries have to appear in the same order for each row, with each entry separated from others by commas. Notice how "golden

ticket, chocolate factory" is in quotes – this is because it contains a comma, but this comma isn't meant to separate one entry from another. There is no set standard for how to write out CSV files, but this particular file was created with Microsoft Excel, which added the quotation marks around the entry containing a comma.

If you open the “wonka.csv” practice file, it will most likely be opened

automatically by Excel, OpenOffice Calc, LibreOffice Calc, or a similar program; all of these programs have the ability to read and write CSV data, which is one reason why this format is so useful. As long as you don't need to track

characteristics of a data file such as formatting and colors, it's usually easiest to export data to a CSV file before working with the data in Python; the CSV can always be opened in the appropriate program and re-saved as the proper format again later. CSV files can also be useful for importing or exporting data from systems such as SQL databases that we will learn about in chapter 9.

!

Python has a built-in csv module that makes it nearly as easy to read and write CSV files as any other sort of text file. Let's start with a basic example and read in our

wonka.csv file, then display its contents:

import csv import os

myPath = "C:/Real Python/Course materials/Chapter 7/Practice files" with open(os.path.join(myPath, "wonka.csv"), "rb") as myFile:

myFileReader = csv.reader(myFile) for row in myFileReader:

print row

We opened a file just as we've done before, but this time we chose “rb” mode, which stands for read binary. The additional “binary” part is important here, because CSV files (despite their appearance) aren't saved in the same way as raw text files. (Specifically, on Windows we will end up saving extra newline characters after every line if we only specify “r” mode.)

We then created a CSV file reader using csv.reader() and passed it the file. Notice that we had to pass the actual opened file object to csv.reader(), not just the file name! From there, we can easily loop over the rows of data in this CSV reader object, which are each displayed as a list of strings:

>>>

['Charlie', 'Bucket', 'golden ticket, chocolate factory'] ['Veruca', 'Salt', 'squirrel revolution']

['Violet', 'Beauregarde', 'fruit chew'] >>>

Much like with readline() versus readlines(), there is also a next() method that gets only the next row of data from a CSV reader object. This method is usually used as a simple method of skipping over a row of “header” data; for instance, if we wanted to read in and store all the information except the first line of our CSV file, we could add the line myFileReader.next() after opening the CSV file to skip over the first line, then loop through the remaining rows as usual.

If we know what fields to expect from the CSV ahead of time, we can even unpack them from each row into new variables in a single step:

import csv import os

myPath = "C:/Real Python/Course materials/Chapter 7/Practice files" with open(os.path.join(myPath, "wonka.csv"), "rb") as myFile:

myFileReader = csv.reader(myFile) myFileReader.next()

for firstName, lastName, reward in myFileReader:

print "{} {} got: {}".format(firstName, lastName, reward)

After skipping the first header row, we assigned the three values in each row to the three separate strings firstName, lastName and reward, which we then used inside of the for loop, generating this output:

>>>

Charlie Bucket got: golden ticket, chocolate factory Veruca Salt got: squirrel revolution

Violet Beauregarde got: fruit chew >>>

The commas in CSV files are called delimiters because they are the character used to separate different pieces of the data. Sometimes a CSV file will use a different

character as a delimiter, especially if there are a lot of commas already contained in the data. For instance, let's read in the file “tabbed wonka.csv”, which uses tabs instead of commas to separate entries and looks like this:

First name Last name Reward

Veruca Salt squirrel revolution Violet Beauregarde fruit chew

We can read files like this using the csv module just as easily as before, but we need to specify what character has been used as the delimiter:

import csv import os

myPath = "C:/Real Python/Course materials/Chapter 7/Practice files" with open(os.path.join(myPath, "tabbed wonka.csv"), "rb") as myFile: myFileReader = csv.reader(myFile, delimiter="\t")

myFileReader.next() for row in myFileReader: print row

Here we used the special character “\t” to mean the “tab” character and assigned it to the argument delimiter when we created myFileReader.

Writing CSV files is accomplished using the csv.writer() method in much the same way. Just as rows of data read from CSV files appeared as lists of strings, we first need to structure the rows we want to write as lists of strings:

import csv import os

myPath = "C:/Real Python/Course materials/Chapter 7/Practice files/Output" with open(os.path.join(myPath, "movies.csv"), "wb") as myFile:

myFileWriter = csv.writer(myFile)

myFileWriter.writerow(["Movie", "Rating"])

myFileWriter.writerow(["Rebel Without a Cause", "3"])

myFileWriter.writerow(["Monty Python's Life of Brian", "5"]) myFileWriter.writerow(["Santa Claus Conquers the Martians", "0"])

We opened a new file in “wb” mode this time so that we could write binary data. We then wrote out individual rows to the CSV file writer object using its writerow() method. We also could have used the writerows() method, which takes a list of rows, to write all the rows in a single line:

import csv import os

myPath = "C:/Real Python/Course materials/Chapter 7/Practice files/Output" myRatings = [ ["Movie", "Rating"],

["Rebel Without a Cause", "3"],

["Monty Python's Life of Brian", "5"],

myFileWriter = csv.writer(myFile) myFileWriter.writerows(myRatings)

If we wanted to export data created by a Python script to (for instance) an Excel workbook file, although it's possible to do this directly, it's usually sufficient and much easier to create a CSV file that we can then open later in Excel and, if needed, convert to the desired format.

Review exercises:

– Write a script that reads in the data from the CSV file “pastimes.csv” located in the chapter 7 practice files folder, skipping over the header row – Display each row of data (except for the header row) as a list of strings

– Add code to your script to determine whether or not the second entry in each row (the “Favorite Pastime”) converted to lower-case includes the word “fighting” using the string methods find() and lower()

– Use the list append() method to add a third column of data to each row that takes the value “Combat” if the word “fighting” is found and takes the value “Other” if neither word appears

– Write out a new CSV file “categorized pastimes.csv” to the Output folder with the updated data that includes a new header row with the fields “Name”, “Favorite Pastime”, and “Type of Pastime”

In document Real Python (Page 98-102)