PA2: Word Cloud (100 Points)

Academic year: 2021

Due: 11:59pm, Thursday, April 16th

Overview

You will create a program to read in a text file and output the most frequent and unique words by using an ArrayList.

Setup

In all of the following, the > is a generic command line prompt (you do not type that).

You will need to create a new directory named pa2 in your cs8b home directory on ieng6.ucsd.edu.

> cd
> mkdir pa2

The first command (cd) changes your current directory to your home directory. cd stands for change directory. By default, if you do not specify a directory to change to, the command will put you in your home directory.

The second command (mkdir pa2) makes a new directory named pa2. This new directory will be in your home directory since you did a cd beforehand.

Copy the provided files from the public directory by typing in:

> cp ~/../public/pa2/* ~/pa2/

Now type

> cd pa2

This will change your current working directory to the new pa2 directory you just created. All files associated with this programming assignment must be placed in this directory. In general, you should do all your work on this programming assignment in this pa2 directory.

Once you have created and navigated to your pa2 directory you can run the following command:

> ls

Your pa2 directory should now contain the following files:

// code files
WordPair.java           // do not change
WordCloudTester.java    // do not change

// text files, will not be submitted
commonWords.txt         // the common words in English to exclude
small.txt               // small text file, feel free to edit and/or change
usdeclaration.txt       // the US Declaration of Independence for testing
usconst.txt             // the US Constitution for testing
screenplayTIG.txt       // text from The Imitation Game

// sample output files, will not be submitted
usconst_10.out          // correct output for running the usconst with 10 words
usdeclaration_10.out    // correct output for running the usdeclaration with 20 words

To use the sample output to compare with the output of your program, you may use the following command (diff).

If the files are the same, then after you type in the diff command nothing will be displayed. If they’re different then you will see text on the screen starting from the line where the two files differ.

Example diff output if correct:

$ java WordCloudTester usconst.txt 10 > myOutput.out
$ diff myOutput.out usconst_10.out

Example diff output if incorrect:

$ diff myOutput.out usconst_10.out
4c4
< States(110) President(102) United(85) State(75) Congress(57) Office(37) Law(35) Amendment(35) Person(34) House(33)
---
> States(114) President(106) United(85) State(75) Congress(57) Office(37) Law(35) Amendment(35) Person(34) House(33)

For more information on diff, type: man diff

README ( 10 points )

You are required to provide a text file named README (NOT Readme.txt, README.pdf, README.docx, etc.) with your assignment in your pa2 directory. There should be no file extension after the file name “README”. Your README should include the following sections:

Program Description ( 3 points ) :

Describe what the program does as if it was intended for a 5 year old or your grandmother. Do not assume your reader is a computer science major.

Short Response ( 7 points ): Answer the following questions:

Vim related Questions:

1. How do you switch from insert mode to command mode in vim?

2. What are two ways to enter insert mode from command mode in vim?

3. a. How do you quit a file in vim?

b. How do you quit a file in vim, without having to save the file first?

4. a. How do you save a file in vim?

Unix/Linux related Questions:

5. a. How do you change directories from the command line?

b. How do you change to the home directory, no matter what directory you are currently in?

6. How do you make a directory from the command line?

7. How do you show the path to the directory you are currently in?

Style ( 20 points )

You will be graded for the style of programming on this assignment. A few suggestions/requirements for style are given below. These guidelines for style will have to be followed for all the remaining assignments. Read them carefully.

● Use reasonable in-line comments to make your code clear and readable.

● Use class headers and method header blocks to describe the purpose of your program and methods (see below). Also, use file headers.

● Every time you open a new block of code (use a '{'), indent farther. Go back to the previous level of indenting when you close the block (use a '}').

● Keep all lines less than 80 characters. Use 2-3 spaces for each level of indentation. Make sure each level of indentation lines up evenly.

● Use reasonable variable names. Example:

if (bunnies are in your house){
  if(you are not allergic to them){
    rejoice();
    playWithBunnies();
  }
  else{
    calmlyExitHouse();
    haveSomeoneMoveBunny();
  }
}

Other options for alignment of brackets:

if (you have glitter)
{
  throwAtYourFriends();
}
else
{
  getGlitter();
}

● Use static final variables to make your code as general as possible.

● Judicious use of blank spaces around logical chunks of code makes your code much easier to read and debug.

● Do not use magic numbers or hard-coded numbers. This means that if you want to use a number other than 0, -1, or 1, you must give it a variable name. This is so that your values are understandable and also can be changed later if need be.

Always recompile and run your program right before turning it in, just in case you commented out some code by mistake.

You will specifically be graded on commenting, file headers, class and method headers, meaningful variable names, sufficient use of blank lines, not using more than 80 characters on a line, perfect indentation, and no magic numbers/hard-coded numbers other than zero.
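As a small illustration of the static final and no-magic-numbers guidelines, the sketch below gives its numbers names. The class name LineLimitSketch, the constant names, and the helper methods are hypothetical and chosen only for this example; they are not part of the starter code.

```java
public class LineLimitSketch {
  // Named constants replace bare numbers scattered through the code.
  private static final int MAX_LINE_LENGTH = 80;
  private static final int INDENT_WIDTH = 2;

  // Returns true if the line is shorter than the style limit.
  public static boolean fitsStyleLimit(String line) {
    return line.length() < MAX_LINE_LENGTH;
  }

  // Builds the leading spaces for a given nesting level.
  public static String indentFor(int level) {
    StringBuilder spaces = new StringBuilder();
    for (int i = 0; i < level * INDENT_WIDTH; i++) {
      spaces.append(' ');
    }
    return spaces.toString();
  }

  public static void main(String[] args) {
    System.out.println(fitsStyleLimit("int count = 0;")); // prints true
    System.out.println("[" + indentFor(2) + "]");         // prints [    ]
  }
}
```

If the limit or indent width ever changes, only the constant declaration needs to be edited.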

A note on the starter files given: There are some comments to explain what the methods do but you should delete them and write your own comments instead. Do not mistake these comments for method headers as method headers require additional information and need to follow the format below.

Example file header comment:

/*
 * Name: Jane-Joe Student
 * Login: cs8bXX <<< --- Use your cs8b course-specific account name
 * Date: Month Day, Year
 * File: Name of this file, for example: WordCloud.java
 * Sources of Help: ... (for example: names of people, books, websites, etc.)
 *
 * Describe what this program does here.
 */

Example class and method header comment:

/*
 * Name: Class or method name
 * Purpose: Briefly describe the purpose of this class or method
 * Parameters: List all parameters and their types and what they represent.
 *             If no parameters, just state None.
 * Return: Specify the return type and what it represents.
 *         If no return value, just specify void.
 */

Correctness (70 points)

You are provided with the full code for WordPair.java and WordCloudTester.java. You do not need to modify these files. You will need to implement the following methods in WordCloud.java. The intended use for this class is:

1. read in the words from a file

2. strip out any common words (e.g., the, a, an) (the file with common words is given to you)

3. display the most frequently occurring words in the file

25 points

void getWordsFromFile( String filename )

This method constructs an ArrayList containing WordPairs for each word in the file. Your algorithm for this will be:

for every word in the file
    search for the word in the arrayList
    if the word is present
        increment count
    if word was not already in the arrayList
        add word to the arrayList

Also, see the Scanner section below. As an important note, adding a word to the ArrayList should be case insensitive, and the String stored in the cloud should keep the capitalization of the word's first occurrence.
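The search-then-increment-or-add step can be sketched as follows. The handout does not list WordPair's methods, so this sketch uses a hypothetical stand-in class, CountedWord, with plain fields; the real WordPair API may differ, and the class and method names here are assumptions made for illustration.

```java
import java.util.ArrayList;

public class CountSketch {
  // Hypothetical stand-in for WordPair; the real class may differ.
  static class CountedWord {
    String word;
    int count;

    CountedWord(String word) {
      this.word = word;
      this.count = 1;
    }
  }

  // Records one occurrence of word. Matching ignores case, and the
  // spelling of the first occurrence is the one that stays in the list.
  static void addWord(ArrayList<CountedWord> list, String word) {
    for (CountedWord entry : list) {
      if (entry.word.equalsIgnoreCase(word)) {
        entry.count++;
        return;
      }
    }
    list.add(new CountedWord(word));
  }

  public static void main(String[] args) {
    ArrayList<CountedWord> list = new ArrayList<CountedWord>();
    String[] words = {"States", "law", "states", "Law", "STATES"};
    for (String word : words) {
      addWord(list, word);
    }
    // Prints States(3) then law(2), each on its own line.
    for (CountedWord entry : list) {
      System.out.println(entry.word + "(" + entry.count + ")");
    }
  }
}
```

Because new entries are only appended, the list order matches the order in which each word first appeared in the input.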

15 points

void removeCommon(String filename)

This method will read in each word from the specified file and remove that word from the ArrayList. To remove the word from the list, beware that you cannot iterate over a list while modifying it. (In other words, your for loop which iterates over each value should start over whenever you remove a word.) In addition, your method need not be efficient (nested for loops are fine). Also, removing a word is case insensitive, as in the previous method.

Also, see Scanner section below.
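One way to follow the "start over whenever you remove a word" advice is sketched below. To stay self-contained it works on an ArrayList of plain Strings rather than WordPair objects, and the class and method names are hypothetical; treat it as an illustration of the restart pattern, not as the required implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;

public class RemoveSketch {
  // Removes every entry equal to target, ignoring case. After each
  // removal the scan restarts from index 0 on the modified list.
  static void removeIgnoreCase(ArrayList<String> list, String target) {
    boolean removed = true;
    while (removed) {
      removed = false;
      for (int i = 0; i < list.size(); i++) {
        if (list.get(i).equalsIgnoreCase(target)) {
          list.remove(i);
          removed = true;
          break; // leave the for loop and start the scan over
        }
      }
    }
  }

  public static void main(String[] args) {
    ArrayList<String> list = new ArrayList<String>(
        Arrays.asList("The", "people", "the", "laws"));
    removeIgnoreCase(list, "the");
    System.out.println(list); // prints [people, laws]
  }
}
```

Restarting is not the fastest approach, but it avoids skipping the element that shifts into the removed slot, and the handout states that efficiency is not required here.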

25 points

void printTopNWords(int n)

This method will print the top n words, as determined by their frequency, in the array list.

To implement this method, you can iterate through the list looking for the word with the highest count. Once you’ve printed this word out, you can negate its count. When you are done outputting the top n words, you will need to iterate through the array list again and change the negated counts back to their original values. For instance, if “cat” originally has the count 222, then once it has been printed out, its count should be -222. By the time the method finishes running, cat’s count should be 222 again.

You are required to be able to have printTopNWords execute more than once and produce the same results. In the case of a tie (two words are equally frequent), you should select the word which occurred first in the original text file. You should not sort the array list. You will lose all points for this method if you sort the array list.

Also, you will be graded for formatting of the outputted text. If your formatting is wrong, up to 10 points will be deducted. Make sure that spaces and parentheses are consistent with what is provided in the sample output.
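The negate-and-restore idea described above can be sketched like this. As before, CountedWord is a hypothetical stand-in for WordPair, and the method returns a String instead of printing so the result is easy to inspect. The word(count) pieces are joined by single spaces as in the sample output; whether the expected output ends with a trailing space is not stated in this handout, so check against the provided .out files.

```java
import java.util.ArrayList;

public class TopNSketch {
  // Hypothetical stand-in for WordPair; the real class may differ.
  static class CountedWord {
    String word;
    int count;

    CountedWord(String word, int count) {
      this.word = word;
      this.count = count;
    }
  }

  // Builds "word(count) word(count) ..." for the n most frequent
  // entries without sorting or reordering the list.
  static String topN(ArrayList<CountedWord> list, int n) {
    StringBuilder out = new StringBuilder();
    for (int done = 0; done < n && done < list.size(); done++) {
      int best = -1;
      for (int i = 0; i < list.size(); i++) {
        int count = list.get(i).count;
        // Negated counts are skipped; strict > keeps the earliest
        // entry when two counts tie.
        if (count > 0 && (best == -1 || count > list.get(best).count)) {
          best = i;
        }
      }
      if (best == -1) {
        break;
      }
      CountedWord top = list.get(best);
      if (out.length() > 0) {
        out.append(" ");
      }
      out.append(top.word).append("(").append(top.count).append(")");
      top.count = -top.count; // mark as already selected
    }
    // Restore every negated count so the method can run again.
    for (CountedWord entry : list) {
      if (entry.count < 0) {
        entry.count = -entry.count;
      }
    }
    return out.toString();
  }

  // Small made-up data set, listed in first-occurrence order.
  static ArrayList<CountedWord> sample() {
    ArrayList<CountedWord> list = new ArrayList<CountedWord>();
    list.add(new CountedWord("laws", 8));
    list.add(new CountedWord("States", 4));
    list.add(new CountedWord("people", 7));
    list.add(new CountedWord("powers", 4));
    return list;
  }

  public static void main(String[] args) {
    ArrayList<CountedWord> list = sample();
    System.out.println(topN(list, 3)); // laws(8) people(7) States(4)
    System.out.println(topN(list, 3)); // same line again
  }
}
```

In the sample data States and powers tie at 4, and States is chosen because it comes earlier in the list.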

5 points

int findWordCount(String word)

This method will take a string and search for it in the arrayList. If the word is found, its count is returned. If the word isn’t found, then 0 is returned. Also, searching for a word is case insensitive, as in the methods above.

Scanner

Use a Scanner object to read words from the file.

The Scanner Javadocs: http://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html

The basic framework to use a Scanner is:

// Requires: import java.io.File; import java.util.Scanner;
// Construct a Scanner that reads in words from a file
Scanner input = new Scanner( new File(fileName) );

while ( input.hasNext() ) // while there are more words to be read in
{
  String word = input.next(); // reads next string
  …
}
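For a version that compiles on its own, the sketch below wraps the same loop in a method and handles the checked exception that the File-based Scanner constructor can throw. The class name, the countWords helper, and the temporary file written in main are all assumptions made for this example; the try-with-resources syntax requires Java 7 or later.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;

public class ScannerSketch {
  // Counts the whitespace-separated words in the named file.
  static int countWords(String fileName) throws FileNotFoundException {
    int total = 0;
    // try-with-resources closes the Scanner when the block ends.
    try (Scanner input = new Scanner(new File(fileName))) {
      while (input.hasNext()) {
        input.next(); // reads (and here discards) the next word
        total++;
      }
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    // Write a small temporary file so the example has input to read.
    File temp = File.createTempFile("sketch", ".txt");
    temp.deleteOnExit();
    try (PrintWriter writer = new PrintWriter(temp)) {
      writer.println("laws people government");
      writer.println("States powers");
    }
    System.out.println(countWords(temp.getPath())); // prints 5
  }
}
```

Scanner's next() splits on any whitespace by default, so words separated by spaces or line breaks are read one at a time.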

WordPair

This class is just a pairing of a String (a word) with an integer (the number of occurrences) for use in your WordCloud class. Your ArrayList will store WordPair objects. Applicable methods are provided.

ArrayLists

ArrayLists are dynamically resizing arrays. In Java, standard arrays are initialized to a fixed size when first created. However, with ArrayLists, there is an initial size but when it becomes full, it automatically enlarges. When an element is removed, it automatically gets smaller.

Go to the Useful Links section on the course website for a link to Java API where you can find information about ArrayLists and the methods they have. Some suggested methods to use are: get, size, add, and remove.
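A brief sketch of the four suggested methods on an ArrayList of Strings follows. The class name and sample words are made up for illustration.

```java
import java.util.ArrayList;

public class ArrayListSketch {
  // Builds a three-element list using add.
  static ArrayList<String> sampleWords() {
    ArrayList<String> words = new ArrayList<String>();
    words.add("laws");
    words.add("people");
    words.add("government");
    return words;
  }

  public static void main(String[] args) {
    ArrayList<String> words = sampleWords();
    System.out.println(words.size()); // prints 3
    System.out.println(words.get(1)); // prints people

    // remove(int) deletes by index; later elements shift down by one.
    words.remove(0);
    System.out.println(words.size()); // prints 2
    System.out.println(words.get(0)); // prints people
  }
}
```

Note that after remove(0) the element that was at index 1 is now at index 0, which is why loops that remove while scanning need care.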

How To Test (Sample Output)

The following is an example of using the WordCloudTester on the provided file (usdeclaration.txt), requesting the top 20 words.

> java WordCloudTester usdeclaration.txt 20

Reading in File: usdeclaration.txt
Removing common words
Displaying the top 20 words
laws(8) people(7) government(5) States(4) powers(4) assent(4) large(4) time(4) independent(4) free(4) Declaration(3) United(3) mankind(3) hold(3) rights(3) long(3) abolishing(3) usurpations(3) absolute(3) repeated(3)

Turnin

To turn in your code, navigate to your home directory and run the following command:

> cse8bturnin pa2

You may turn in your programming assignment as many times as you like. The last submission you turn in before the deadline is the one that we will collect. Always recompile and run your program right before turning it in, just in case you commented out some code by mistake.


Verify

To verify a previously turned in assignment, run:

> cse8bverify pa2

If you are unsure your program has been turned in, use the verify command. We will not take any late files you forgot to turn in. Verify will help you check which files you have successfully submitted. It is your responsibility to make sure you properly turned in your assignment.

Files to be collected:

README
WordCloud.java
WordCloudTester.java
WordPair.java

The files that you turn in must have EXACTLY the same names as those above.

Extra Credit

Extra credit will be given for turning in your assignment early. You can earn up to a maximum of 3 points (3%) extra credit.

Final Turnin Date:            Extra Credit:
Tuesday, April 14 11:59pm     3pts

Note: Only your latest turnin submission will be considered for receiving extra credit. This is because each submission overrides the last one.

NO LATE ASSIGNMENTS ACCEPTED.

DO NOT EMAIL US YOUR ASSIGNMENT!
