Introduction
Relationship between education spending and test scores
The correlation is negative (-0.2). The United States spends the second most on education of any country, yet has below-average test scores. Ethnically homogeneous Japan, South Korea and Finland spend at average rates and have the best test scores. Tiny, ethnically homogeneous and "hungry" Estonia spends less than half as much on education as the United States and Norway but has far better test scores. Source: Economy Industry USA View
The Organization for Economic Co-operation and Development (OECD) released the results of its 2009 global rankings on student performance in mathematics, reading, and science, on the Program for
Relationship between per pupil spending and mean math scores in PISA 2012, by country
The figure shows the simple correlation between mean mathematics scores and expenditure per pupil in secondary education for each country that participated in PISA 2012. It is easy to see that countries like Qatar and Singapore spend similar amounts per student yet achieve very different PISA math scores.
The Organization for Economic Co-operation and Development (OECD) released the results of its 2012 global rankings on student performance in mathematics, reading, and science, on the Program for International Student Assessment,
Ranking of top countries in math, reading, and science is out — and the US didn't crack the top 10
Source: OECD. China is represented by the
provinces of Beijing, Shanghai, Jiangsu, and Guangdong.
The PISA is a worldwide exam, administered every three years, that assesses 15-year-olds in 72 countries. About 540,000 students took the exam in 2015. Asian countries topped the rankings across all
PISA tests: Singapore top in global education rankings/2015
"If you think maths is a hard subject you won't succeed," 10-year-old Hai Yang tells me. A striking feature of Singapore's education:
*The whole class has just been working on a problem, taking it in turns to stand up and explain how they worked it out. And they do this in English, one of several languages spoken in Singapore. It turns out there is more than one way to reach the right solution.
*What is impressive is their commitment to understanding exactly how to do it.
"If we just blindly look at the teacher's answer, when we grow up we might not know how to do it any more."
*Building blocks
This is an approach known as maths mastery which some schools in the UK have begun using in an adapted form.
*"We believe in Singapore in the fundamentals, that in order for a child to be well educated you need to give them the fundamental language and grammar in various disciplines, a language where you can read, a language where you can understand numbers."
Singapore has also thought a lot about how to make teaching a rewarding profession.
*Teachers can follow a career path that takes them towards being a principal, a researcher into education or a master classroom teacher. They get time to deepen their knowledge and prepare lessons.
*In Montfort Secondary school they are encouraging the teenage boys to make prototype products, ranging from a smart garden watering system to an electronic keyboard.
*Using your science and maths skills to solve real world problems is exactly the kind of ability the PISA tests are intended to measure. An empty room at the school is being turned into what they call a "makers lab".
*Simple tools and materials will be available for the pupils to use in their spare time to make things to take home. If they want to work out how to light up their guitar with LED lights, this is where they can do it.
*Another striking feature of Singapore's education is that head teachers are rotated between schools every six to eight years. There is also an increasing emphasis on collaboration.
*"Today teachers work in teams, they grow together, they research together, they work together." High stakes
The Objective of Correlation and Regression
The objective of correlation is to establish the relationship between two or more quantitative variables, without being able to infer causal relationships; the objective of regression analysis is to establish a mathematical model to estimate the value of one variable based on the values of the other variables. This technique is appropriate when:
A mathematical function or equation linking two metric-scaled (interval or ratio) variables is to be constructed, under the assumption that values of one of the two variables is dependent on the values of the other.
Logistic regression analysis is used to examine relationships between variables when the dependent variable is nominal, whether the independent variables are nominal, ordinal, interval, or some mixture thereof.
Suppose that one wanted to determine which program interventions were associated with a JOBS Program client's ability to get a job within six months of exiting the program. The outcome variable would be "job" or "no job", clearly a nominal variable. One could then use several independent variables, such as job training, post-secondary education and the like, to predict the odds of getting a job.
Multiple Regression Analysis Technique this technique is appropriate
Methodology
To perform a correlation and regression analysis, it is advisable to follow these steps:
1. Collecting data from sources such as questionnaires, forms or databases, texts, brochures, magazines, internet, direct measurements, etc.
2. Draw the scatter diagram, a graph showing the intensity and direction of the relationship between two variables, which suggests the model that could be used. Suggested models are best seen in at most three dimensions. This question is important: does the relationship appear to be linear or curved?
3. Calculate the values of the correlation coefficient and the coefficient of determination (note: the correlation coefficient measures the strength and direction of the linear association between the variables, and the coefficient of determination measures the percentage of variability of the dependent variable explained by the independent variable).
4. Specify the model suggested by the scatter diagram or by the experience of the investigator.
5. Estimate the regression line using a processing program with statistical applications (Excel, SPSS, Statgraphics, Minitab, SAS, Statistics, etc.) or by formulas.
Techniques for Examining Associations
Spearman Correlation
The technique is appropriate when: the degree of association between two sets of ranks (pertaining to two variables) is to be examined.
Illustrative research question(s) this technique can answer: "Is there a significant relationship between motivation levels of teachers and the quality of their performance?"
Assume that the data on motivation and quality of performance are in the form of ranks, say, 1 through 50, for 50 teachers who were evaluated subjectively by their administrators on each variable.
Pearson Correlation
This technique is appropriate when: the degree of association between two metric-scaled (interval or ratio) variables is to be examined.
Illustrative research question(s) this technique can answer: "Is there a significant relationship between parents' age (measured in actual years) and their perceptions of the school's
Spearman Rank Coefficient (r_s)
• Used for non-linear relationships
• It is a non-parametric measure of correlation.
• This procedure makes use of the two sets of ranks that may be assigned to the sample values of X and Y.
• The Spearman rank correlation coefficient can be computed in the following cases:
   Both variables are quantitative.
   Both variables are qualitative ordinal.
   One variable is quantitative and the other is qualitative ordinal.
• The value of r_s denotes the magnitude and nature of
Spearman Correlation
Example: Quality of life
Fourteen cities have been rated on an index that measures the quality of life.
Also, the percentage of the population that has moved into each city over the
past year has been determined. Have cities with higher quality of life scores
attracted more new residents?
Association between quality of life and percentage of new residents
City  Quality of life  Percentage of New Residents
A 25 5
B 10 4
C 15 3
D 30 6
E 20 3
F 25 9
G 10 5
H 15 3
I 30 7
J 20 8
K 15 5
L 17 6
M 20 7
Steps in SPSS for Spearman correlation
OUTPUT DATA – Spearman correlation
Correlations (Spearman's rho)
                                                       Quality of Life   Percentage of New Residents
Quality of Life               Correlation Coefficient  1.000             .586*
                              Sig. (2-tailed)                            .028
                              N                        14                14
Percentage of New Residents   Correlation Coefficient  .586*             1.000
                              Sig. (2-tailed)          .028
                              N                        14                14
*. Correlation is significant at the 0.05 level (2-tailed).
These variables have a moderate direct (positive) association: cities with higher quality of life scores tend to have a higher percentage of new residents. The value of r^2 is 0.586^2 = 0.3434, which
Simple Correlation (r) Pearson
It is also called Pearson's correlation or the product-moment correlation coefficient. It measures the direction (the sign denotes the direction) and strength (the value of r denotes the strength of association) of the relationship between two quantitative variables.
Direct or positive, if the values of the two variables deviate in the same direction i.e. if an increase (or decrease) in the values of one variable results, on average, in a corresponding increase (or decrease) in the values of the other variable the correlation is said to be direct or positive. Examples:
•Student's performance and number of hours studied
•Satisfaction and loyalty at work.
Inverse or negative, if the variables deviate in opposite directions, i.e. if an increase in the values of one variable results, on average, in a corresponding decrease in the values of the other variable. Examples:
•TV viewing and class grades: students who spend more time watching TV tend to have lower grades (or, phrased differently, students with higher grades tend to spend less time watching TV)
Pearson Correlation
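[Figure: scale of r from -1 to +1. -1 = perfect inverse correlation; -1 to -0.75 strong inverse; -0.75 to -0.25 moderate inverse; -0.25 to 0 weak inverse; 0 = no relation; 0 to 0.25 weak direct; 0.25 to 0.75 moderate direct; 0.75 to 1 strong direct; +1 = perfect direct correlation]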
The value of "r" ranges between -1 and +1. The value of "r" denotes the strength of the association, as illustrated by the diagram. If r = 0 or is close to zero, there is no linear association or correlation between the two variables.
Steps for Hypothesis Testing for ρ
Step 1: Hypotheses. We specify the null and alternative hypotheses:
Null hypothesis Ho: ρ = 0 (there is no association between performance and the time that students usually wake up)
Alternative hypothesis Ha: ρ ≠ 0 (there is an association between them)
Step 2: Test Statistic
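$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \quad \text{or} \quad r = \frac{s_{xy}}{s_x\, s_y} $$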
Step 3: Sig. (P-value). We use the resulting test statistic to calculate the Sig. (P-value). The significance of the correlation can be determined by using the correlation coefficient table for the degrees of freedom df = n − 2, where n is the number of observations in the x and y variables.
Step 4: Make a decision. If the Sig. (P-value) is smaller than the significance level α, we reject the null hypothesis in favor of the alternative. We conclude "there is sufficient evidence at the α level to conclude that there is a linear relationship in the population between the predictor x and response y."
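As a supplementary note on Step 3 (a standard result, not taken from the slides): instead of a lookup table, the Sig. value is commonly obtained by converting r into a t statistic with n − 2 degrees of freedom. The symbols r and n are the ones defined in the steps above.

$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}, \qquad df = n - 2 $$

For illustration, r = −0.72 with n = 12 would give t ≈ −3.28 on 10 degrees of freedom.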
Example
A sample of 12 students was selected, and data about their academic performance and the time they usually wake up were recorded, as shown in the following table. It is required to find the correlation between performance and the time that students usually wake up.
Student    Wake-up Time  Academic Performance
Kalisa     5.30          13.0
Seraphine  10.00         9.0
Manasse    8.00          13.0
Odette     9.00          11.0
Laurence   6.00          16.0
Pascal     7.00          10.0
Gallican   7.30          13.0
Marcel     6.00          11.0
Sandrine   5.00          14.0
Acqueline  9.30          10.0
Judith     5.30          16.5
Innoncent  7.30          12.0
Hypothesis
Ho: ρ = 0 (there is no association between performance and the time that students usually wake up)
Ha: ρ ≠ 0 (there is an association between them)
Steps in SPSS
Again, to perform a correlation and regression analysis it is advisable to follow these steps:
Step 1: Scatter Diagram (after collecting the data, draw the scatter diagram)
The starting point is to draw a scatter of points on a graph, with one variable on the X-axis and the other on the Y-axis; it is customary to represent the dependent variable on the vertical axis and the independent variable on the horizontal axis. When studying the relationship between two variables, one can be considered the cause and the other the result or effect. The variable that causes is called the exogenous or independent variable; the effect is the endogenous (dependent) variable. The scatter plot gives an idea of the relationship (if any) between the variables, as suggested by the data. The closer the points are to a straight line, the stronger the linear relationship between the two variables.
Steps and Output of scatter dot
Step 2. Correlation
OUTPUT - Correlation
Correlations
                                           Wake up-Time   Academic performance
Wake up-Time          Pearson Correlation  1              -.720**
                      Sig. (2-tailed)                     .008
                      N                    12             12
Academic performance  Pearson Correlation  -.720**        1
                      Sig. (2-tailed)      .008
                      N                    12             12
**. Correlation is significant at the 0.01 level (2-tailed).
Test statistic: r = -.720, Sig. = .008. These variables have a strong inverse association.
Decision and interpretation: Since Sig. = .008 is smaller than 0.05, we reject the null hypothesis and conclude that "there is sufficient evidence at the 5% level to conclude that there is an inverse linear relationship in the population between the predictor 'wake-up time' and the response 'academic performance'", i.e., wake-up time is related to academic performance. The value r = -.720 indicates a strong inverse relationship between the time that students wake up and their performance (students who get up later tend to score lower).
The coefficient of determination is the percentage of variation in the dependent variable 'Y' explained by the independent variable 'X'.
How well does this line fit the data?
The value of r^2 = (-.720)^2 = 0.5184, i.e. 51.84% ≈ 52%.
The 'goodness of fit' indicates the percentage of the variation in performance that is accounted for by the variation in wake-up time; in other words, 52% of the variance in performance is explained by the time that students wake up.
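The wake-up example can also be checked outside SPSS. The sketch below is a minimal illustration in plain Python, not the course's procedure. It assumes the wake-up times are entered as plain decimals exactly as printed (5.30 becomes 5.3), and the names `pearson_r`, `wake_up` and `performance` are made up for this example.

```python
import math

# Values copied from the example table.
# Assumption: wake-up times are treated as plain decimals (5.30 -> 5.3),
# not as hours:minutes.
wake_up = [5.30, 10.00, 8.00, 9.00, 6.00, 7.00, 7.30, 6.00, 5.00, 9.30, 5.30, 7.30]
performance = [13.0, 9.0, 13.0, 11.0, 16.0, 10.0, 13.0, 11.0, 14.0, 10.0, 16.5, 12.0]


def pearson_r(x, y):
    """Pearson correlation computed from sums of deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(wake_up, performance)
r_squared = r ** 2          # coefficient of determination
n = len(wake_up)
# t statistic for H0: rho = 0, on n - 2 degrees of freedom
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}, t = {t_stat:.2f}, df = {n - 2}")
```

Under that decimal assumption the computed r should agree with the SPSS output to three decimals. The Sig. value itself needs a t distribution table or a statistics library, so it is left out of the sketch.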
Example
Country % Immunization Mortality_rate
Bolivia 77 118
Brazil 69 65
Cambodia 32 184
Canada 85 8
China 94 43
Czech_Republic 99 12
Egypt 89 55
Ethiopia 13 208
Finland 95 7
France 95 9
Greece 54 9
India 89 124
Italy 95 10
Japan 87 6
Mexico 91 33
Poland 98 16
Russian_federation 73 32
Senegal 47 145
Turkey 76 87
United_Kingdom 90 9
A study was conducted to find whether there is any relationship between the mortality rate and the percentage of immunization in some countries of the world. The following data set was found on the page "http://www.unicef.org/statistics/". Let us determine whether there is a relationship in this data set. The first column lists the countries, and the second and third columns give the % of immunization and the mortality rate of each country.
Steps in SPSS to draw the scatter diagram
Graphs > Chart Builder > OK > from the Variables box, drag the variable Immunization to the "x-axis" and Mortality_rate to the "y-axis", then click Groups/Point ID > drag the variable Country to Point ID > OK
Step 3. Regression Analysis
Scatter diagram of the mortality rate by % immunization with regression line inserted in some countries in the world
Steps in SPSS for Regression
Analyze > Regression > Linear
Interpretation from outcome of SPSS
•Checking the Model Fit
Model Summary
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate
1      .791a  .626      .605               40.13931
a. Predictors: (Constant), Immunization %
The model summary table reports the strength of the relationship between the model and the dependent variable. R = .791, the correlation coefficient, is the linear correlation between the observed and model-predicted values of the dependent variable. Its large value indicates a strong relationship.
R Square = .626, the coefficient of determination, is the square of the correlation coefficient. It shows that about 62.6% of the variation in mortality is explained by the model.
ANOVAa
Model         Sum of Squares  df  Mean Square  F       Sig.
1 Regression  48497.050       1   48497.05     30.101  .000b
  Residual    29000.950       18  1611.16
  Total       77498.000       19
a. Dependent Variable: Mortality_rate
b. Predictors: (Constant), Immunization %
The significance value of the F statistic (.000) is less than 0.05, which means that the variation explained by the model is unlikely to be due to chance.
Checking the coefficients of the regression line
(parameter estimates)
This table shows the coefficients of the regression line:
•The first row (Constant) is the intercept: the point where the regression line crosses the Y axis. In other words, it is the predicted value of mortality when the independent variable is 0.
•The second row gives the slope: the coefficient used in the regression equation to predict the dependent variable from the independent variable.
Coefficientsa
                  Unstandardized Coefficients  Standardized Coefficients
Model             B        Std. Error          Beta                       t       Sig.
1 (Constant)      224.316  31.440                                         7.135   .000
  Immunization %  -2.136   .389                -.791                      -5.486  .000
a. Dependent Variable: Mortality_rate
The regression equation can be presented in many different ways, for example:
Predicted mortality = 224.316 - 2.136 × (% of immunization)
224.316 = predicted mortality rate when the % of immunization is zero (the constant, or intercept).
-2.136 = change in the mortality rate for each additional percentage point of immunization (the slope of the line); its nonzero value reflects the nonzero correlation.
Prediction of Mortality Rate
What rate of mortality could be predicted for the group of countries with
80% immunization?
The best estimate of the mortality is obtained by substituting the value of
80% for that of the independent variable, x, and calculating the
corresponding value of the Mortality.
Estimated mortality:
$$ \hat{Y} = 224.316 - 2.136\,X $$
$$ \hat{Y} = 224.316 - 2.136 \times 80 = 53.436 \approx 53 $$
The expected mortality rate would be about 53.
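The regression output and the prediction above can be reproduced by hand from the 20 data rows. The following is a minimal sketch in plain Python using the usual least-squares formulas, not the SPSS procedure itself. The variable names and the helper `predict_mortality` are made up for this illustration.

```python
# (% immunization, mortality rate) pairs copied from the example table
data = [
    (77, 118), (69, 65), (32, 184), (85, 8), (94, 43), (99, 12), (89, 55),
    (13, 208), (95, 7), (95, 9), (54, 9), (89, 124), (95, 10), (87, 6),
    (91, 33), (98, 16), (73, 32), (47, 145), (76, 87), (90, 9),
]
x = [pair[0] for pair in data]
y = [pair[1] for pair in data]
n = len(data)

mean_x = sum(x) / n
mean_y = sum(y) / n
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in data)
s_xx = sum((a - mean_x) ** 2 for a in x)
ss_total = sum((b - mean_y) ** 2 for b in y)

# Least-squares slope and intercept of the regression line
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Decomposition of the variation, as in the ANOVA table
ss_regression = slope * s_xy
ss_residual = ss_total - ss_regression
r_squared = ss_regression / ss_total
f_stat = (ss_regression / 1) / (ss_residual / (n - 2))


def predict_mortality(immunization_pct):
    """Predicted mortality rate for a given % of immunization."""
    return intercept + slope * immunization_pct


print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"R^2 = {r_squared:.3f}, F = {f_stat:.3f}")
print(f"predicted mortality at 80% = {predict_mortality(80):.2f}")
```

The prediction printed here uses the unrounded slope, so it can differ in the second decimal from the hand calculation with -2.136.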
With these results we conclude:
1. The variables are linearly associated in the population from which the sample comes (the probability that the relationship found is explained by chance is very small, less than one per thousand).
2. The relationship is strong (r = -.791); in fact, the independent variable (% of immunization) explains 62.6% (r^2 = .626) of the variability of the dependent variable (mortality).
3. The relationship is inverse or negative: the mortality rate decreases on average by 2.136 for each percentage-point increase in immunization in the countries under study.
Assignment 5
1. Find and interpret the relationship between Anxiety and Test Scores (follow all steps)
a. Draw and interpret the scatter diagram
b. Make a hypothesis for the correlation coefficient
c. Calculate and interpret the coefficient of determination (goodness of fit)
d. Calculate and interpret the hypothesis test for the regression (ANOVA). (Does the independent variable reliably predict the dependent variable?)
e. Write the regression equation for the model
f. Prediction. What test score could be predicted with an anxiety level of 5.5?
g. Check the assumptions about autocorrelation and normally distributed residuals
OUTPUT FROM SPSS
Anxiety (x): 10 8 2 1 5 6 10 8 2 3 5 6
Test score (Y): 2 3 9 7 6 5 2 4 7 7 6 4
Correlations
                                 Test_score  Anxiety
Pearson Correlation  Test_score  1.000       -.946
                     Anxiety     -.946       1.000
Sig. (1-tailed)      Test_score              .000
                     Anxiety     .000
N                    Test_score  12          12
                     Anxiety     12          12

Model Summaryb
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  Durbin-Watson
1      .946a  .895      .884               .75214                      3.187
a. Predictors: (Constant), Anxiety
b. Dependent Variable: Test_score

Coefficientsa
              Unstandardized Coefficients  Standardized Coefficients
Model         B       Std. Error           Beta                       t       Sig.
1 (Constant)  8.886   .458                                            19.385  .000
  Anxiety     -.676   .073                 -.946                      -9.212  .000
Tests of Normality
                       Kolmogorov-Smirnova     Shapiro-Wilk
                       Statistic  df  Sig.     Statistic  df  Sig.
Standardized Residual  .149       12  .200*    .973       12  .937
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
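For part g, the Durbin-Watson value in the Model Summary can be recomputed from the residuals of the fitted line. The sketch below is a hedged illustration in plain Python, assuming the observations are taken in the order listed in the data rows. The variable names are made up for this example.

```python
# Data copied from the assignment, in the order listed
anxiety = [10, 8, 2, 1, 5, 6, 10, 8, 2, 3, 5, 6]
test_score = [2, 3, 9, 7, 6, 5, 2, 4, 7, 7, 6, 4]
n = len(anxiety)

mean_x = sum(anxiety) / n
mean_y = sum(test_score) / n
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(anxiety, test_score))
s_xx = sum((a - mean_x) ** 2 for a in anxiety)

# Least-squares line: test_score = intercept + slope * anxiety
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Residuals: observed minus predicted test score
residuals = [b - (intercept + slope * a) for a, b in zip(anxiety, test_score)]

# Durbin-Watson: squared successive differences of the residuals
# divided by the sum of squared residuals
sum_sq_diff = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n))
sum_sq_resid = sum(e ** 2 for e in residuals)
durbin_watson = sum_sq_diff / sum_sq_resid

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"Durbin-Watson = {durbin_watson:.3f}")
```

The statistic ranges from 0 to 4, with values near 2 suggesting no first-order autocorrelation in the residuals. How far from 2 is acceptable depends on the critical-value table used, which this sketch does not cover.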
2. In a study of the relationship between level of education and income, the following data was obtained. Find the relationship between them and comment.
Compute the Spearman rank correlation coefficient (because 'level of education' is an ordinal categorical variable) and test it for significance at the .05 level. What conclusion may be reached?
People Level Education (x) Income (y)
A Preparatory 25
B Primary 10
C Master’s degree 8
D Secondary 10
E Bachelor degree 15
F Illiterate 50
G Postgraduate diploma 60
a. Interpret descriptive statistics such as the mean, standard deviation and coefficient of variation.
b. Draw and interpret the scatter plot
c. Make a hypothesis for the
Descriptive Statistics
                 Mean     Std. Deviation  N
Level_Education  4.5000   1.87083         6
Income           33.5714  24.10295        7

Correlations (Spearman's rho)
                                          Level_Education  Income
Level_Education  Correlation Coefficient  1.000            .928**
                 Sig. (2-tailed)                           .008
                 N                        6                6
Income           Correlation Coefficient  .928**           1.000
                 Sig. (2-tailed)          .008
                 N                        6                7
**. Correlation is significant at the 0.01 level (2-tailed).
3. A psychologist believes that those who score high on a need-achievement test will likely have a high salary to match. To test this theory, the psychologist has given questionnaires to a random sample of 17 subjects and has ranked the data so that the highest value in each category has been assigned a 1.
Subject:                  A  B  C  D   E   F  G   H  I   J   K   L  M   N  O   P   Q
Rank - Need Achievement:  1  8  4  10  12  2  13  6  16  11  14  3  9   7  15  17  5
Salary Rank:              3  7  2  12  9   1  11  6  17  13  15  5  10  8  14  16  4
a. Compute and interpret the Spearman rank correlation coefficient and test it for significance at the .05 level. What conclusion may be reached?
b. Interpret the scatter plot
Correlations (Spearman's rho)
                                                   Rank - Need Achievement  Salary Rank
Rank - Need Achievement  Correlation Coefficient   1.000                    .949**
                         Sig. (2-tailed)                                    .000
                         N                         17                       17
Salary Rank              Correlation Coefficient   .949**                   1.000
                         Sig. (2-tailed)           .000
                         N                         17                       17
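Because the data in question 3 are already ranks with no ties, the coefficient in the output can be checked with the rank-difference formula r_s = 1 − 6Σd² / (n(n² − 1)). This formula is a standard result rather than something stated in the slides, and it only holds when there are no tied ranks. The sketch below is plain Python, and the function name is made up for this illustration.

```python
# Ranks copied from the question 3 table (subjects A to Q)
need_rank = [1, 8, 4, 10, 12, 2, 13, 6, 16, 11, 14, 3, 9, 7, 15, 17, 5]
salary_rank = [3, 7, 2, 12, 9, 1, 11, 6, 17, 13, 15, 5, 10, 8, 14, 16, 4]


def spearman_rho_no_ties(rank_x, rank_y):
    """Spearman's rho from rank differences; valid only without tied ranks."""
    n = len(rank_x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


rho = spearman_rho_no_ties(need_rank, salary_rank)
print(f"Spearman's rho = {rho:.3f}")
```

With tied ranks, the usual alternative is to compute Pearson's r on the ranks, which is what statistical packages generally do.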
5. Multiple choice
5.1 The slope (B1) represents:
a. Predicted value of Y when X = 0
b. Predicted value of Y
c. Change in Y per unit change in X
5.2 The Y intercept (B0) represents the:
a. Change in Y per unit change in X
b. Predicted value of Y when X = 0
c. Variation around the regression line
5.3 The coefficient of determination (r2) tells you:
a. The proportion of total variation that is explained
b. Whether the slope has any significance
c. Whether the regression sum of squares is greater than the total sum of squares
5.4 In performing a regression analysis involving two numerical variables, you assume:
a. The variances of X and Y are equal
b. That X and Y are independent
c. All of the above
5.5 The residuals represent:
a. The difference between the actual Y values and the mean of Y
b. The square root of the slope
5.6 If the coefficient of correlation (r) = -1.00, then:
a. All the data points must fall exactly on a straight line with an inverse or negative slope.
b. All the data points must fall exactly on a straight line with a positive slope
c. All the data points must fall exactly on a horizontal straight line with a zero slope.
5.7 Assuming a straight line (linear) relationship between X and Y, if the coefficient of correlation (r) = -0.30:
a. There is no correlation
b. Variable X is larger than variable Y
c. The slope is negative
5.8 In a simple linear regression model, the coefficient of correlation and the slope:
a. May have opposite signs