Write a script “remove_files.py” that will look in the chapter 7 practices files folder named “little pics” as well all of its subfolders. The script should use os.remove() to delete any JPG file found in any of these folders if the file is less than 2 Kb (2,000 bytes) in size.
You can supply the os.path.getsize() function with a full file path to return the file's size in bytes. Check the contents of the folders before running your script to make sure that you delete the correct files; you should only end up removing the files named “to be deleted.jpg” and “definitely has to go.jpg” - although you should only use the file extensions and file sizes to determine this.
If you mess up and delete the wrong files, there is a folder named “backup” that contains an exact copy of the “little pics” folder and all its contents so that you can copy these contents back and try again.
?
7.3) Read and write CSV data
he types of files we have to deal with in our everyday lives are usually more complicated than plain text files. If we want to be able to modify the contents of these files (rather than just copy, rename or delete them), we will need more complex systems for being able to read this information.
T
One common way of storing text data is in CSV files. “CSV” stands for Comma-Separated Value, because each entry in a row of data is usually separated from other entries by a comma. For example, the contents of the file named “wonka.csv” in the chapter 7 practice materials folder look like this:
First name,Last name,Reward
Charlie,Bucket,"golden ticket, chocolate factory"
Veruca,Salt,squirrel revolution Violet,Beauregarde,fruit chew
We have three columns of variables: First name, Last name, and Reward. Each line
represents another row of data, including the first row, which is a “header” row that tells us what each entry represents. Our entries have to appear in the same order for each row, with each entry separated from others by commas. Notice how "golden ticket, chocolate factory" is in quotes – this is because it contains a comma, but this comma isn't meant to separate one entry from another. There is no set standard for how to write out CSV files, but this particular file was created with Microsoft Excel, which added the quotation marks around the entry containing a comma.
If you open the “wonka.csv” practice file, it will most likely be opened
automatically by Excel, OpenOffice Calc, LibreOffice Calc, or a similar program;
all of these programs have the ability to read and write CSV data, which is one reason why this format is so useful. As long as you don't need to track
characteristics of a data file such as formatting and colors, it's usually easiest to export data to a CSV file before working with the data in Python; the CSV can always be opened in the appropriate program and re-saved as the proper format again later. CSV files can also be useful for importing or exporting data from systems such as SQL databases that we will learn about in chapter 9.
!
Python has a built-in csv module that makes it nearly as easy to read and write CSV files as any other sort of text file. Let's start with a basic example and read in our
wonka.csv file, then display its contents:
import csv import os
myPath = "C:/Real Python/Course materials/Chapter 7/Practice files"
with open(os.path.join(myPath, "wonka.csv"), "rb") as myFile:
myFileReader = csv.reader(myFile) for row in myFileReader:
print row
We opened a file just as we've done before, but this time we chose “rb” mode, which stands for read binary. The additional “binary” part is important here, because CSV files (despite their appearance) aren't saved in the same way as raw text files. (Specifically, on Windows we will end up saving extra newline characters after every line if we only specify “r” mode.) In Python 3, you should still use “r” mode instead of “rb” for reading CSV files.
We then created a CSV file reader using csv.reader() and passed it the file. Notice that we had to pass the actual opened file object to csv.reader(), not just the file
name! From there, we can easily loop over the rows of data in this CSV reader object, which are each displayed as a list of strings:
>>>
['First name', 'Last name', 'Reward']
['Charlie', 'Bucket', 'golden ticket, chocolate factory']
['Veruca', 'Salt', 'squirrel revolution']
['Violet', 'Beauregarde', 'fruit chew']
>>>
Much like with readline() versus readlines(), there is also a next() method that gets only the next row of data from a CSV reader object. This method is usually used as a simple method of skipping over a row of “header” data; for instance, if we wanted to read in and store all the information except the first line of our CSV file, we could add the line next(myFileReader) after opening the CSV file to skip over the first line, then loop through the remaining rows as usual.
If we know what fields to expect from the CSV ahead of time, we can even unpack them from each row into new variables in a single step:
import csv import os
myPath = "C:/Real Python/Course materials/Chapter 7/Practice files"
with open(os.path.join(myPath, "wonka.csv"), "rb") as myFile:
myFileReader = csv.reader(myFile) next(myFileReader)
for firstName, lastName, reward in myFileReader:
print "{} {} got: {}".format(firstName, lastName, reward)
After skipping the first header row with the next() function, we assigned the three values in each row to the three separate strings firstName, lastName and reward, which we then used inside of the for loop, generating this output:
>>>
['First name', 'Last name', 'Reward']
Charlie Bucket got: golden ticket, chocolate factory Veruca Salt got: squirrel revolution
Violet Beauregarde got: fruit chew
>>>
The first line of this output was generated by the call to the next() function rather than a print statement of ours.
The commas in CSV files are called delimiters because they are the character used to separate different pieces of the data. Sometimes a CSV file will use a different
character as a delimiter, especially if there are a lot of commas already contained in the data. For instance, let's read in the file “tabbed wonka.csv”, which uses tabs instead of commas to separate entries and looks like this:
First name Last name Reward
Charlie Bucket golden ticket, chocolate factory Veruca Salt squirrel revolution
Violet Beauregarde fruit chew
We can read files like this using the csv module just as easily as before, but we need to specify what character has been used as the delimiter:
import csv import os
myPath = "C:/Real Python/Course materials/Chapter 7/Practice files"
with open(os.path.join(myPath, "tabbed wonka.csv"), "rb") as myFile:
myFileReader = csv.reader(myFile, delimiter="\t") next(myFileReader)
for row in myFileReader:
print row
Here we used the special character “\t” to mean the “tab” character and assigned it to the argument delimiter when we created myFileReader.
Writing CSV files is accomplished using the csv.writer() method in much the same way. Just as rows of data read from CSV files appeared as lists of strings, we first need to structure the rows we want to write as lists of strings:
import csv import os
myPath = "C:/Real Python/Course materials/Chapter 7/Practice files/Output"
with open(os.path.join(myPath, "movies.csv"), "wb") as myFile:
myFileWriter = csv.writer(myFile)
myFileWriter.writerow(["Movie", "Rating"])
myFileWriter.writerow(["Rebel Without a Cause", "3"])
myFileWriter.writerow(["Monty Python's Life of Brian", "5"]) myFileWriter.writerow(["Santa Claus Conquers the Martians", "0"])
We opened a new file in “wb” mode this time so that we could write binary data.
(Again, with Python 3 you should stick to writing out CSVs in “w” mode.) We then wrote out individual rows to the CSV file writer object using its writerow() method.
We also could have used the writerows() method, which takes a list of rows, to write all the rows in a single line:
import csv import os
myPath = "C:/Real Python/Course materials/Chapter 7/Practice files/Output"
myRatings = [ ["Movie", "Rating"],
["Rebel Without a Cause", "3"],
["Monty Python's Life of Brian", "5"],
["Santa Claus Conquers the Martians", "0"] ] with open(os.path.join(myPath, "movies.csv"), "wb") as myFile:
myFileWriter = csv.writer(myFile) myFileWriter.writerows(myRatings)
If we wanted to export data created by a Python script to (for instance) an Excel workbook file, although it's possible to do this directly, it's usually sufficient and much easier to create a CSV file that we can then open later in Excel and, if needed, convert to the desired format. There are a number of special modules that have been designed for interacting with Microsoft Excel documents (although they all have their own
limitations), including xlrd and xlwt for reading and writing basic Excel files, openpyxl for manipulating Excel 2007 files, and XlsxWriter for creating .xlsx files from scratch.
Review exercises:
– Write a script that reads in the data from the CSV file “pastimes.csv”
located in the chapter 7 practice files folder, skipping over the header row – Display each row of data (except for the header row) as a list of strings
– Add code to your script to determine whether or not the second entry in each row (the “Favorite Pastime”) converted to lower-case includes the word
“fighting” using the string methods find() and lower()
– Use the list append() method to add a third column of data to each row that takes the value “Combat” if the word “fighting” is found and takes the value “Other” if neither word appears
– Write out a new CSV file “categorized pastimes.csv” to the Output folder with the updated data that includes a new header row with the fields
“Name”, “Favorite Pastime”, and “Type of Pastime”