Virtually all software performs analysing tasks; if this were not the case, the software would not produce any useful information. A variety of analysis features are present in numerous software applications; our aim in this section is to examine common examples of such features and discuss how they are, or can be, used to transform data into information.
In terms of software, the efficiency of analysing processes is largely determined by the organisation of the underlying data; therefore, it is important to consider the type of analysis to be performed when choosing a method of organisation.
In this section we examine the following software features used for analysis:
• Searching/selecting data
• Sorting
• Modelling/simulations
• What-if scenarios
• Charts and graphs
• File comparison
SEARCHING/SELECTING DATA
Most software applications search and select data based on some criteria. In many software applications, the user can directly initiate a search to find all occurrences of a particular data item. For example, the find dialogue from Microsoft Excel, shown in Fig 5.6, is used for this purpose. In many software applications searching also takes place as an integral part of some other larger process; in fact, many analysing processes include various simple and complex searches. For example, creating a pie chart requires that data be grouped according to various categories; a search is performed to allocate each data item to its correct category.
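As a minimal sketch of the pie chart example, the Python code below allocates each item in a small, made-up list of survey responses to its category and totals each category. The data and variable names are hypothetical; spreadsheet charting tools perform an equivalent grouping internally, though not necessarily in this way.

```python
# Hypothetical survey data: how each student travels to school.
responses = ["bus", "car", "walk", "car", "bus", "car", "bike"]

# For every data item, search the existing categories for a match;
# if none is found, create a new category (a new pie slice).
slice_sizes = {}
for response in responses:
    if response in slice_sizes:
        slice_sizes[response] += 1
    else:
        slice_sizes[response] = 1

# Each category's share of the whole determines the size of its slice.
for category, count in slice_sizes.items():
    percentage = 100 * count / len(responses)
    print(f"{category}: {count} ({percentage:.1f}% of the pie)")
```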
What do we mean by searching and selecting, and is there a difference? Both searching and selecting are processes that identify required data within a larger set of data. Commonly the term ‘searching’ is used to describe the process of actually retrieving the data; searching logically examines data items and compares them to some criteria. Any data that matches the search criteria forms part of the resulting information. Such results can be displayed one at a time as they are found, or all the results of the search can be retrieved in preparation for further processing or prior to display. On the other hand, the term ‘selecting’ is generally used to describe the process of specifying the source of the data to be searched. The technique for selecting the source data depends on the information system and the nature of the search. It may mean selecting a particular file or files, or it may mean selecting part of a file, such as a paragraph in a text document, a particular field within a database, or even a particular range of pixels within an image. Searching is then performed on the selected data using the specified criteria.
Fig 5.6
Find dialogue from Microsoft Excel.
GROUP TASK Activity
Examine the ‘Find’ dialogue in various software applications. List and describe the various criteria that can be set prior to the search being initiated. In each case, how is the data source selected?
Search
To look through a collection of data in order to locate a required piece of data.
If the source data to be searched is not sorted into an appropriate order, then searching requires each data item to be examined in turn. On the other hand, if the data is sorted appropriately, then the search process can execute more efficiently. Consider manually searching the white pages telephone directory for a specific name. As the white pages are sorted by name, the search is a simple one. If the names were in a random order, or if we were searching for a specific telephone number, then this would be a most tedious task.
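The difference between searching unsorted and sorted data can be sketched in Python. The first function examines each item in turn, as must be done when the data is not sorted. The second uses binary search, a well-known technique (not named in the text above) that repeatedly halves the range still to be searched, much as a person opens the white pages near the middle and then flips forwards or backwards; it only works correctly if the data is already sorted. The directory of names is made up, and both functions also report how many items they examined.

```python
def linear_search(names, target):
    """Examine each item in turn; works whether or not the data is sorted.
    Returns (index, items_examined), with index -1 if the target is not found."""
    items_examined = 0
    for index, name in enumerate(names):
        items_examined += 1
        if name == target:
            return index, items_examined
    return -1, items_examined


def binary_search(sorted_names, target):
    """Repeatedly halve the search range; only correct if sorted_names is sorted.
    Returns (index, items_examined), with index -1 if the target is not found."""
    low, high = 0, len(sorted_names) - 1
    items_examined = 0
    while low <= high:
        mid = (low + high) // 2
        items_examined += 1
        if sorted_names[mid] == target:
            return mid, items_examined
        if sorted_names[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1, items_examined


# Hypothetical directory of 1000 names, already in alphabetical order.
directory = [f"Name{n:04d}" for n in range(1000)]

print(linear_search(directory, "Name0999"))  # examines all 1000 names
print(binary_search(directory, "Name0999"))  # examines at most 10 names
```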
The required data is determined by applying criteria, where the criteria are commonly a rule or set of rules that must be true for each data item found. For example, in Fig 5.6 the criterion is the text ‘Fred’; therefore, the find process searches for text that equals ‘Fred’. The search process considers each data item and decides whether it fulfils the criteria; if it does, then it becomes part of the results. The mechanics of the actual searching and selecting processes are provided within most software applications. Users do not need to concern themselves with the detail of how the process is performed; rather, they merely initiate the search after specifying the source of the data and the search criteria. For example, to retrieve the names of all the year 7 girls within a school’s database, we first select the fields that contain the students’ names within the correct database table. We then search the database for year 7 students who are also girls.
Fig 5.7 shows how this is specified as a query using Microsoft Access. The screen at the top of Fig 5.7 shows a graphical representation of the structured query language (SQL) statement reproduced below the screen. In the HSC topic “Information Systems and Databases” we examine SQL in some detail.
Consider the following:
• Blurring the edge of a line within a bitmap image.
• Reducing noise within a sampled audio file.
• Producing CMYK colour separations using a desktop publishing application.
• Kerning all AW character pairs within a desktop publisher document.
• Determining the student with the highest mark in an exam using a spreadsheet.
Fig 5.7
Microsoft Access query to retrieve the names of all year 7 girls.
SELECT Students.Surname, Students.Name FROM Students
WHERE Students.Sex ="F" AND Students.YearLevel=7;
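The query in Fig 5.7 can also be tried outside Access. The sketch below assumes a tiny in-memory SQLite database, created with Python's built-in sqlite3 module, whose Students table and rows are invented for illustration. SQLite expects text literals in single quotes, so 'F' replaces the "F" used by Access; otherwise the statement is the same. The SELECT clause selects the source fields and the WHERE clause holds the search criteria.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Surname TEXT, Name TEXT, Sex TEXT, YearLevel INTEGER)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?)",
    [
        ("Jones", "Amy", "F", 7),
        ("Smith", "Ben", "M", 7),
        ("Lee", "Chloe", "F", 8),
        ("Brown", "Dana", "F", 7),
    ],
)

# Select the name fields, then search for rows meeting both criteria.
query = """
SELECT Students.Surname, Students.Name FROM Students
WHERE Students.Sex = 'F' AND Students.YearLevel = 7;
"""
year7_girls = conn.execute(query).fetchall()
print(year7_girls)
conn.close()
```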
GROUP TASK Activity
Use an Internet search engine to perform searches that include the logical operators NOT, AND and OR. Describe the effect of each of these logical operators.
GROUP TASK Discussion
Identify and describe how searching is used as an integral part of each of the above processes. How does the organisation of the data assist the searching process?
SORTING
Analysing information processes commonly involve sorting data, either into alphabetical or numerical order, or even into different categories. When data is sorted it becomes easier to understand; sorting transforms data into information. For example, an unsorted catalogue of all the different products stocked by a retailer is cumbersome and therefore of limited use. However, when this same data is sorted into categories, and the products within each category are then sorted alphabetically, the catalogue becomes useable information.
The sorted catalogue is also easier to search; indeed, improving the efficiency of searches is often the purpose of sorting data.
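The retailer's catalogue can be sketched in Python as a list of (category, product) pairs; the product data here is made up. Sorting on a two-part key orders the items by category first and, only where the categories are equal, by product name.

```python
# Hypothetical catalogue data: (category, product name) pairs in no particular order.
products = [
    ("Tools", "Spanner"),
    ("Garden", "Rake"),
    ("Tools", "Hammer"),
    ("Garden", "Hose"),
    ("Paint", "Brush"),
]

# Sort by category first; products in the same category are then sorted by name.
catalogue = sorted(products, key=lambda product: (product[0], product[1]))

for category, name in catalogue:
    print(f"{category:<8}{name}")
```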
All digital data, of every media type, is represented as binary numbers; therefore, sorting digital data is ultimately performed numerically. For alphabetical sorts, it is primarily the numerical binary codes of the characters, commonly an extension of the ASCII system, that are used to determine the sort order. Let us consider how both numerical and alphabetical sorts are accomplished within software applications.
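Python's ord() function returns a character's numeric code, and Python compares strings by comparing these codes from left to right. For the plain English letters used below, the codes are the same as ASCII, so the sketch shows why a raw code-based sort places every uppercase word before every lowercase one. Passing str.casefold as the sort key is one way to obtain the case-insensitive order most applications present; the word list is illustrative only.

```python
words = ["cat", "Calf", "apple", "Banana"]

# Show the numeric code behind each character of one word.
print([ord(character) for character in "Calf"])  # [67, 97, 108, 102]

# Default string comparison uses the codes directly: 'B' (66) and 'C' (67) sort before 'a' (97).
code_order = sorted(words)
# casefold() ignores case, giving the order most users expect.
case_insensitive = sorted(words, key=str.casefold)

print(code_order)        # ['Banana', 'Calf', 'apple', 'cat']
print(case_insensitive)  # ['apple', 'Banana', 'Calf', 'cat']
```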
Numerical sorts consider the total value of each data item; hence an ascending numerical sort, as one would expect, arranges the data from the smallest (most negative) value to the largest positive value. For example, -500, -5.6, -0.001, 2, 12 and 100 are in ascending numerical order; predictably, a descending numerical sort reverses this list.
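In Python, for example, the built-in sorted() function performs a numerical sort when given numbers, and reverse=True gives the descending order. The list below is the example above in a scrambled order.

```python
values = [12, -5.6, 100, -500, 2, -0.001]

# Numerical sorts compare the whole value of each item, so signs and decimals count.
ascending = sorted(values)
descending = sorted(values, reverse=True)

print(ascending)   # [-500, -5.6, -0.001, 2, 12, 100]
print(descending)  # [100, 12, 2, -0.001, -5.6, -500]
```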
Problems occur when data items contain characters that are not part of a valid number. In reality this is seldom an issue for numeric data, as the method of representation used for numbers does not permit invalid characters; the problem is encountered when attempting to perform a numerical sort on text data. To resolve the problem, either the invalid data items are all placed at the start and then ignored, or the invalid characters within each data item are ignored and the remaining valid numbers are sorted. Most software uses a combination of both approaches: if a data item commences with an invalid character then it is ignored entirely, whereas if it commences with a valid number followed by invalid characters then the valid number forms the basis for sorting.
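The combined approach described above can be sketched as a Python sort key. The pattern and the helper name numeric_sort_key are illustrative assumptions rather than how any specific application is implemented: items that do not begin with a valid number are grouped at the start (keeping their original order, because Python's sort is stable), and all other items are sorted on their leading number.

```python
import re

# Hypothetical pattern: an optional sign followed by digits (and optional decimals),
# matched only at the very start of the text.
LEADING_NUMBER = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)")


def numeric_sort_key(item):
    """Sort key for a numerical sort performed on text data."""
    match = LEADING_NUMBER.match(item.strip())
    if match is None:
        # Starts with an invalid character: group at the start and otherwise ignore.
        return (0, 0.0)
    # Starts with a valid number: sort on that number, ignoring any trailing characters.
    return (1, float(match.group()))


data = ["12kg", "abc", "-5.6", "100", "x9", "2 boxes"]
print(sorted(data, key=numeric_sort_key))
# ['abc', 'x9', '-5.6', '2 boxes', '12kg', '100']
```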
Alphabetical sorts compare corresponding characters from left to right; if two characters are the same then the next pair of corresponding characters is considered. For example, an ascending alphabetical sort places “Calf” before “Cat”, as “l” comes before “t” in the alphabet. Problems commonly occur when numerical data is represented as text and is then sorted alphabetically. For example, sorting -500, -5.6, -0.001, 2, 12 and 100 into ascending alphabetical order will, in most software applications, produce the result -0.001, 100, 12, 2, -5.6 and -500. So what is happening? Firstly, most software applications ignore all apostrophes (’) and hyphens when sorting alphabetically, hence the data actually sorted is 500, 5.6, 0.001, 2, 12 and 100. Sorting on the first character of each data item then groups the data as -0.001, then 12 and 100, then 2, and finally -500 and -5.6, as 0 comes before 1, which comes before 2, which comes before 5. Next, the second characters are compared wherever the first characters are the same: 0 comes before 2, so 100 appears before 12. What about -500 and -5.6? Most applications sort according to the following order: punctuation and other marks first, followed by the digits 0-9, and finally the letters A-Z. The second character of -5.6 is a full stop, a punctuation mark, whereas the second character of -500 is the digit 0; hence -5.6 comes before -500.
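The behaviour described above can be approximated in Python with a sort key that removes apostrophes and hyphens and ignores case. This is a simplified sketch, not the collation rules any particular application uses: for the characters in this example, ASCII order already places the full stop before the digits, but ASCII does not place every punctuation mark before the digits (the colon, for instance, comes after 9), and real applications use more elaborate, locale-aware rules. The helper name alphabetical_key is hypothetical.

```python
def alphabetical_key(text):
    """Hypothetical key: ignore apostrophes and hyphens, then compare case-insensitively."""
    return text.replace("'", "").replace("\u2019", "").replace("-", "").casefold()


data = ["-500", "-5.6", "-0.001", "2", "12", "100"]

alphabetical = sorted(data, key=alphabetical_key)  # compares "500", "5.6", "0.001", ...
numerical = sorted(data, key=float)                # compares the actual values

print(alphabetical)  # ['-0.001', '100', '12', '2', '-5.6', '-500']
print(numerical)     # ['-500', '-5.6', '-0.001', '2', '12', '100']
```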
GROUP TASK Activity
Sort the numbers 23, 13, 2, 12, 33, 300, 1, 45, 6 and 19 into ascending numerical order and then into ascending alphabetical order. Repeat the process using a spreadsheet and then using a word processor. Discuss any problems encountered.
Sort
To arrange a collection of items in some specified order.
Consider the following:
Text and numeric media types are commonly sorted as part of various analysis processes; however, sorting processes are seldom performed on image, audio and video data. Sorting of image, audio and video media is generally restricted to sorting the files used for storage by their various attributes.
Most operating systems allow files stored on secondary storage devices to be sorted by various attributes. Fig 5.8 shows this facility within Explorer in Windows XP.
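Sorting by file attributes, in the manner of Explorer in Fig 5.8, can be sketched with Python's standard os.scandir() function, which reports each file's name along with its size and modification time. The function name list_files_sorted and the attribute labels 'name', 'size' and 'modified' are assumptions made for this example; the demonstration creates three throwaway files in a temporary folder.

```python
import os
import tempfile


def list_files_sorted(folder, attribute, descending=False):
    """Return the names of the files in folder, sorted by 'name', 'size' or 'modified'."""
    sort_keys = {
        "name": lambda entry: entry.name.casefold(),      # case-insensitive file name
        "size": lambda entry: entry.stat().st_size,       # size in bytes
        "modified": lambda entry: entry.stat().st_mtime,  # last modification time
    }
    with os.scandir(folder) as entries:
        files = [entry for entry in entries if entry.is_file()]
        files.sort(key=sort_keys[attribute], reverse=descending)
    return [entry.name for entry in files]


# Demonstration with three hypothetical files of different sizes.
with tempfile.TemporaryDirectory() as folder:
    for name, size in [("b.txt", 30), ("a.txt", 10), ("C.txt", 20)]:
        with open(os.path.join(folder, name), "wb") as f:
            f.write(b"x" * size)
    by_name = list_files_sorted(folder, "name")
    by_size_desc = list_files_sorted(folder, "size", descending=True)

print(by_name)       # ['a.txt', 'b.txt', 'C.txt']
print(by_size_desc)  # ['b.txt', 'C.txt', 'a.txt']
```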
Fig 5.8
Screenshot from Explorer within Windows XP.
GROUP TASK Discussion
Why is sorting not commonly used for analysing image, audio and video data? Discuss with reference to the organisation of these media types.
GROUP TASK Discussion
In Fig 5.8 it is possible to perform ascending or descending sorts on Name, Size, Type or Date Modified. Classify each of these different sorts as either numerical or alphabetical sorts. Justify your answers.
GROUP TASK Discussion
The sort functions used in databases, word processors and spreadsheets are implemented in different ways. Describe the differences and explain why these differences exist.
Fig 5.9
Sort functions in Microsoft Access, Word and Excel.
SET 5A
1. During analysis data moves from:
(A) RAM into secondary storage prior to analysis within the CPU.
(B) secondary storage directly to the CPU; once processed, it is held in RAM.
(C) secondary storage into RAM and then to the CPU.
(D) the CPU into RAM and then onto secondary storage.
2. The analysing process:
(A) transforms data into information.
(B) does not alter the data.
(C) makes sense of data for humans.
(D) All of the above.
3. Fast chip-based storage present on most hard disks is called:
(A) ascending numerical order.
(B) descending numerical order.
(C) ascending alphabetical order.
(D) descending alphabetical order.
5. Secondary storage can be considered the ‘weakest link in the chain’ because:
(A) it is significantly slower than RAM or the CPU.
(B) it is permanent storage.
(C) hard disks are more prone to failure than RAM or CPU chips.
(D) computers use secondary storage continuously whilst RAM and the CPU are used only when required.
6. If all other parameters are equal then a 32-bit CPU will:
(A) be half as fast as a 16-bit CPU.
(B) be double the speed of a 64-bit CPU.
(C) be half as fast as a 64-bit CPU.
(D) be four times as fast as a 16-bit CPU.
7. In regard to analysing, the most important property of RAM is:
(A) its total memory capacity.
(B) the speed at which it can deliver data.
(C) the design of the RAM module.
(D) its compatibility with the CPU.
8. When searching, each data item must be examined in sequence if:
(A) the data is sorted into an appropriate order.
(B) the data is not sorted into an appropriate order.
(C) the field being searched has been indexed.
(D) the search includes more than one field.
9. The time taken to locate a particular file on a hard disk can be measured using:
(A) areal density.
(B) seek and latency times.
(C) spindle speed.
(D) data transfer speed.
10. All sorting performed by computers:
(A) is ultimately performing a numerical sort.
(B) uses the ASCII code of each character.
(C) ignores apostrophe and hyphen characters.
(D) examines each pair of corresponding characters commencing from the left.
11. Describe different measures used to compare the performance of hard disks.
12. Describe the relationship between RAM, secondary storage and the CPU during a typical analysing process.
13. Do you agree with the statement: “Each of the other information processes exists primarily to support the analysing process”? Justify your response.
14. According to the syllabus, hardware requirements for analysing include “large amounts of primary and secondary storage allowing for fast processing”. Do you agree with this statement? Discuss both yes and no arguments.
15. Internet search engines rank or sort results based on some criteria. Examine a number of popular search engines and determine the criteria being used to rank the search results.