
Two Numeric Variables

The relationship between two numeric random variables can be examined graphically by plotting their values on a set of axes.

The graphs that are useful to display the relationship between two numeric random variables are: a scatter plot, a trendline graph and a Lorenz curve. Each graph addresses a different type of management question.

Scatter Plot

A scatter plot displays the data points of two numeric variables on an x – y graph.

A visual inspection of a scatter plot will show the nature of a relationship between the two variables in terms of its strength (the closeness of the points), its shape (linear or curved), its direction (direct or inverse) and any outliers (extreme data values).

For example, a plot of advertising expenditure (on the x-axis) against sales (on the y-axis) could show what relationship, if any, exists between advertising expenditure and sales. Another example is to examine what influence training hours (on the x-axis) could have on worker output (on the y-axis).

Follow these steps to construct a scatter plot:

Label the horizontal axis (x-axis) with the name of the influencing variable (called the independent variable, x).

Label the vertical axis (y-axis) with the name of the variable being influenced (called the dependent variable, y).

Plot each pair of data values (x;y) from the two numeric variables as coordinates on an x – y graph.
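The construction steps above can be sketched in Python. This is a minimal illustration only: the data values are hypothetical (training hours and worker output, not taken from any table in the text) and the function names make_points and pearson_r are assumptions introduced here. As a numeric complement to visual inspection, the sketch also computes Pearson's correlation coefficient, a standard measure whose sign indicates the direction and whose magnitude indicates the strength of a linear relationship.

```python
from math import sqrt


def make_points(x_values, y_values):
    """Pair each x with its y to form the (x, y) coordinates to plot."""
    if len(x_values) != len(y_values):
        raise ValueError("x and y must have the same number of values")
    return list(zip(x_values, y_values))


def pearson_r(points):
    """Pearson correlation: sign gives direction, magnitude gives strength."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)


# Hypothetical data (not from the text):
# x = training hours (independent), y = worker output (dependent).
training_hours = [2, 4, 5, 7, 8, 10]
worker_output = [20, 26, 27, 35, 36, 44]

points = make_points(training_hours, worker_output)
r = pearson_r(points)
direction = "direct" if r > 0 else "inverse" if r < 0 else "none"
print(points)
print(round(r, 3), direction)
```

The coordinate list is what would be handed to any plotting tool. The coefficient does not replace the plot, since shape (linear or curved) and outliers are still best judged by eye.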

Example 2.6 Grocery Shoppers Survey – Amount Spent by Number of Store Visits

Refer to the dataset in Table 2.1.

Construct a scatter plot for the amount spent on groceries and the number of visits to the grocery store per shopper by the sample of 30 shoppers surveyed.

Chapter 2 – Summarising Data: Summary Tables and Graphs

Management Question

By inspection of the scatter plot, describe the nature of the relationship between the number of visits and amount spent.

Solution

To construct the scatter plot, we need to define the x and y variables. Since the number of visits is assumed to influence the amount spent on groceries in a month, let x = number of visits and y = amount spent.

On a set of axes, plot each pair of data values for each shopper. For example, for shopper 1, plot x = 3 visits against y = R946; for shopper 2, plot x = 5 visits against y = R1 842.

The results of the scatter plot are shown in Figure 2.8.

Figure 2.8 Scatter plot – monthly amount spent on groceries against number of visits (x-axis: number of visits to store; y-axis: amount spent)

Management Interpretation

There is a moderate, positive linear relationship between the number of visits to a grocery store in a month and the total amount spent on groceries last month per shopper. The more frequent the visits, the larger the grocery bill for the month. There is only one possible outlier – shopper 13, who spent R2 136 over four visits.

Trendline Graph

A trendline graph plots the values of a numeric random variable over time.

Such data is called time series data. The x-variable is time and the y-variable is a numeric measure of interest to a manager (such as turnover, unit cost of production, absenteeism or share prices).

Follow these steps to construct a trendline graph:

The horizontal axis (x-axis) represents the consecutive time periods.

The values of the numeric random variable are plotted on the vertical axis (y-axis) opposite their time period.

The consecutive points are joined to form a trendline.

Applied Business Statistics

Trendline graphs are commonly used to identify and track trends in time series data.
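The trendline steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the weekly values are hypothetical, and the helper names trendline_points, segments and period_changes are introduced here for illustration only.

```python
def trendline_points(values, first_period=1):
    """Pair consecutive time periods (x) with the observed values (y)."""
    return [(first_period + i, v) for i, v in enumerate(values)]


def segments(points):
    """Join consecutive points; each pair is one piece of the trendline."""
    return list(zip(points, points[1:]))


def period_changes(points):
    """Change in y from one period to the next (positive means rising)."""
    return [b[1] - a[1] for a, b in segments(points)]


# Hypothetical weekly values (not taken from any table in the text).
weekly_values = [54, 58, 94, 70, 61, 61, 78, 56]

pts = trendline_points(weekly_values)
print(pts[:3])              # first three (period, value) coordinates
print(period_changes(pts))  # period-to-period movement along the trendline
```

Listing the period-to-period changes is one simple way to make rises and falls along the line explicit. The trend itself is still read from the plotted graph.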

Example 2.7 Factory Absenteeism Levels Study

Refer to the time series data in Table 2.10 on weekly absenteeism levels at a car manufacturing plant.

(See Excel file C2.2 – factory absenteeism.)

Table 2.10 Data on employee-days absent for a car manufacturing plant (by week)

Produce a trendline plot of the weekly absenteeism levels (number of employee-days absent) for this car manufacturing plant over a period of 32 weeks.

Management Question

By an inspection of the trendline graph, describe the trend in weekly absenteeism levels within this car manufacturing plant over the past 32 weeks.

Solution

To plot the trendline, plot the weeks (x = 1, 2, 3, …, 32) on the x-axis. For each week, plot the corresponding employee-days absent on the y-axis. After plotting all 32 y-values, join the points to produce the trendline graph as shown in Figure 2.9.

Figure 2.9 Trendline graph for weekly absenteeism levels – car manufacturing plant

Management Interpretation

Over the past 32 weeks there has been a modest increase in absenteeism, with an upturn occurring in more recent weeks. A distinct ‘monthly’ pattern exists, with absenteeism in each month generally low in weeks one and two, peaking in week three and declining moderately in week four.


Lorenz Curve

A Lorenz curve plots the cumulative frequency distributions (ogives) of two numeric random variables against each other. Its purpose is to show the degree of inequality between the values of the two variables.

For example, the Lorenz curve can be used to show the relationship between:

the value of inventories against the volume of inventories held by an organisation

the spread of the total salary bill amongst the number of employees in a company

the concentration of total assets amongst the number of companies in an industry

the spread of the taxation burden amongst the total number of taxpayers.

A Lorenz curve shows what percentage of one numeric measure (such as inventory value, total salaries, total assets or total taxation) is accounted for by given percentages of the other numeric measure (such as volume of inventory, number of employees, number of companies or number of taxpayers). The degree of concentration or distortion can be clearly illustrated by a Lorenz curve. It is commonly used as a measure of social/economic inequality. It was originally developed by M Lorenz (1905) to represent the distribution of income amongst households.

Follow these steps to construct a Lorenz curve:

Identify intervals (similar to a histogram) for the y-variable, for which the distribution across a population is being examined (e.g. salaries across employees).

Calculate the total value of the y-variable per interval (total value of salaries paid to all employees earning less than R1 000 per month; total value of salaries paid to all employees earning between R1 001 and R2 000 per month; etc.).

Calculate the total number of objects (e.g. employees, households or taxpayers) that fall within each interval of the y-variable (number of employees earning less than R1 000 per month; number of employees earning between R1 001 and R2 000 per month; etc.).

Derive the cumulative frequency percentages for each of the two distributions above.

Scale each axis (x and y) from 0% to 100%.

For each interval of the y-variable, plot each pair of cumulative frequency percentages on the axes and join the coordinates (similar to a scatter plot).

If the distributions are similar or equal, the Lorenz curve will result in a 45° line from the origin of both axes (called the line of uniformity or the line of equal distribution). The more unequal the two distributions, the more bent (concave or convex) the curve becomes.

A Lorenz curve always starts at coordinate (0%; 0%) and ends at coordinate (100%; 100%).
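The construction steps above can be sketched in Python. This is a minimal sketch under stated assumptions: the four salary bands and their figures are hypothetical, the function names cumulative_percentages and lorenz_points are introduced here for illustration, and the number of objects is placed on the x-axis with the value measure on the y-axis.

```python
def cumulative_percentages(amounts):
    """Running total of amounts as a percentage of the grand total."""
    total = sum(amounts)
    running, result = 0, []
    for amount in amounts:
        running += amount
        result.append(100 * running / total)
    return result


def lorenz_points(counts, totals):
    """Lorenz coordinates (cum. % of objects, cum. % of value) from (0, 0)."""
    x = cumulative_percentages(counts)
    y = cumulative_percentages(totals)
    return [(0.0, 0.0)] + list(zip(x, y))


# Hypothetical intervals, ordered from the lowest to the highest salary band.
employees_per_band = [40, 30, 20, 10]     # number of employees per interval
salary_total_per_band = [20, 30, 40, 60]  # total salaries paid (R thousand)

curve = lorenz_points(employees_per_band, salary_total_per_band)
for x_pct, y_pct in curve:
    print(f"{x_pct:5.1f}% of employees receive {y_pct:5.1f}% of total salaries")
```

With the intervals ordered from lowest to highest, every plotted point lies on or below the 45° line. The further the points fall below that line, the more unequal the distribution.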

Example 2.8 Savings Balances versus Number of Savers Study

A bank wished to analyse the value of savings account balances against the number of savings accounts of a sample of 64 bank clients.

The two numeric frequency distributions and their respective percentage ogives (for the value of savings balances and number of savings accounts) are given in Table 2.11.


Table 2.11 Percentage cumulative frequency distributions of savings balances across savers

Calculate the Lorenz curve of savings account balances against the number of savings accounts (savers).

Management Question

Are there equal proportions of savers across all levels of savings account balances?

Comment by inspecting the pattern of the Lorenz curve.

Solution

Figure 2.10 Lorenz curve of distribution of savings balances across savers

Management Interpretation

The diagonal (45°) line shows an equal distribution of total savings across all savers (e.g. 60% of savings account clients hold 60% of the total value of savings).


In this example, an unequal distribution is evident. For example, almost half (47%) of all savers hold only 18% of total savings. At the top end of the Lorenz curve, it can be seen that the biggest 5% of savings accounts represent 22% of total savings. Overall, this bank has a large number of small savers, and a few large savers.