regression analysis


PROJECT REPORT ON REGRESSION ANALYSIS

[IMBA]

PRESENTED BY

NAME: D. SRIKANTH

ENROLL NO: 6NI14059


Trevor Bull - Managing Director

Mr. Trevor Bull joined Tata AIG Life as Managing Director in January 2006. Prior to this, Trevor was Senior Vice President and General Manager at American International Assurance in Korea.

Tata AIG Life Insurance Company Ltd. and Tata AIG General Insurance Company Ltd. (collectively "Tata AIG") are joint venture companies formed by the Tata Group and American International Group, Inc. (AIG). Tata AIG combines the power and integrity of the Tata Group with AIG's international expertise and financial strength. The Tata Group holds a 74 per cent stake in the two insurance ventures, with AIG holding the balance 26 per cent stake.

Tata AIG Life Insurance Company Ltd. provides insurance solutions to individuals and corporates. The company was licensed to operate in India on February 12, 2001 and started operations in April 2001. Tata AIG Life offers a broad array of life insurance coverage to both individuals and groups, providing various types of add-ons and options on basic life products to give consumers flexibility and choice.

Tata AIG Life Insurance Company offers products in Ahmedabad, Bangalore, Chandigarh, Chennai, Guwahati, Hyderabad, Jaipur, Jamshedpur, Jodhpur, Kochi, Kolkata, Mangalore, Mumbai, New Delhi, Pune, Rajkot, Trichi, Vijayawada and Lucknow.

Objective of the Study

The objective of this study is to examine the regression analysis method used by Tata AIG in the city of Hyderabad.

Questionnaire Development

For the purpose of this study, a structured questionnaire was developed. At this stage, an exploratory study was carried out using personal and focus group interviews.

Collection of Data

The above-mentioned questionnaire was used to collect the primary data. For secondary data, research papers, journals and magazines were consulted.


Regression analysis

In statistics, regression analysis is a collective name for techniques for the modeling and analysis of numerical data consisting of values of a dependent variable (also called response variable or measurement) and of one or more independent variables (also known as explanatory variables or predictors). The dependent variable in the regression equation is modeled as a function of the independent variables, corresponding parameters ("constants"), and an error term.

The error term is treated as a random variable. It represents unexplained variation in the dependent variable. The parameters are estimated so as to give a "best fit" of the data. Most commonly the best fit is evaluated by using the least squares method, but other criteria have also been used.

Regression can be used for prediction (including forecasting of time-series data), inference, hypothesis testing, and modeling of causal relationships. These uses of regression rely heavily on the underlying assumptions being satisfied. Regression analysis has been criticized as being misused for these purposes in many cases where the appropriate assumptions cannot be verified to hold. One factor contributing to the misuse of regression is that it can take considerably more skill to critique a model than to fit a model.


Underlying assumptions

Classical assumptions for regression analysis include:

• The sample must be representative of the population for inference and prediction.

• The error is assumed to be a random variable with a mean of zero conditional on the explanatory variables.

• The independent variables are error-free. If this is not so, modeling may be done using errors-in-variables model techniques.

• The predictors must be linearly independent, i.e. it must not be possible to express any predictor as a linear combination of the others. See Multicollinearity.

• The errors are uncorrelated, that is, the variance-covariance matrix of the errors is diagonal and each non-zero element is the variance of the error.

• The variance of the error is constant across observations (homoscedasticity). If not, weighted least squares or other methods might be used.

These are sufficient (but not all necessary) conditions for the least-squares estimator to possess desirable properties. In particular, these assumptions imply that the parameter estimates will be unbiased, consistent, and efficient in the class of linear unbiased estimators. Many of these assumptions may be relaxed in more advanced treatments.
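The linear-independence assumption listed above can be illustrated with a minimal sketch. It assumes two predictors and no intercept, and the predictor values are hypothetical. When one predictor is an exact multiple of the other, the determinant of the matrix X'X in the least-squares normal equations is zero, so the coefficients cannot be determined uniquely.

```python
def gram_determinant(x1, x2):
    """Determinant of the 2x2 matrix X'X for two predictors (no intercept)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    return s11 * s22 - s12 * s12


# Hypothetical predictor values, chosen only for illustration.
x1 = [1, 2, 3, 4]
x2_dependent = [2, 4, 6, 8]    # exactly 2 * x1, so linearly dependent on x1
x2_independent = [1, 0, 2, 1]  # not a multiple of x1

det_dependent = gram_determinant(x1, x2_dependent)
det_independent = gram_determinant(x1, x2_independent)

print("dependent predictors:  ", det_dependent)
print("independent predictors:", det_independent)
```

A zero determinant means the normal equations have no unique solution. In practice predictors are rarely exactly dependent, but a determinant close to zero signals the same problem in a milder form.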

Regression Analysis that involves two variables is termed bi-variate linear Regression Analysis. Regression Analysis that involves more than two variables is termed “Multiple Regression Analysis”.

The Bi-variate linear Regression Analysis involves analyzing the straight-line relationship between two continuous variables. The Bi-variate linear Regression can be expressed as:

$$Y = \alpha + \beta X$$

Where,

Y represents the dependent variable

X is the independent variable

α and β are two constants, known as regression coefficients

β is the slope coefficient

β can be symbolically represented as ∆Y/∆X, so that for any two points (Xi, Yi) and (Xj, Yj) on the line:

$$\beta = \frac{\Delta Y}{\Delta X} = \frac{Y_i - Y_j}{X_i - X_j}$$

$$\alpha = Y_i - \beta X_i$$
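The slope and intercept expressions above can be sketched in a few lines of Python. This is only an illustration: the function name `line_through` and the two sample points are hypothetical, and the sketch assumes both points lie exactly on the line.

```python
def line_through(p, q):
    """Return (alpha, beta) of the straight line through points p and q."""
    (xi, yi), (xj, yj) = p, q
    if xi == xj:
        raise ValueError("the two points need different X values")
    beta = (yi - yj) / (xi - xj)  # slope: change in Y per unit change in X
    alpha = yi - beta * xi        # intercept: value of Y when X = 0
    return alpha, beta


# Hypothetical points chosen only for illustration.
alpha, beta = line_through((1.0, 5.0), (3.0, 9.0))
print(f"Y = {alpha} + {beta} X")
```

With real data the points rarely fall on one line, which is why the coefficients are estimated by least squares instead of from a single pair of points.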

Least squares method

The method of least squares, or ordinary least squares (OLS), is used to solve overdetermined systems. Least squares is often applied in statistical contexts, particularly regression analysis. Least squares can be interpreted as a method of fitting data. The best fit in the least-squares sense is that instance of the model for which the sum of squared residuals has its least value, a residual being the difference between an observed value and the value given by the model. The method was first described by Carl Friedrich Gauss around 1794.[1] Least squares corresponds to the maximum likelihood criterion when the errors are normally distributed, and it can also be derived as a method of moments estimator. Regression analysis is available in most statistical software packages.
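To make the idea of an overdetermined system concrete, the sketch below uses three hypothetical points that no single straight line passes through. It searches a grid of candidate lines by brute force and keeps the one with the smallest sum of squared residuals. The grid search is used purely for illustration and is not how least squares is computed in practice; exact fractions are used so that the comparison is free of rounding.

```python
from fractions import Fraction

# Hypothetical overdetermined system: three points, two unknowns (a, b).
points = [(0, 1), (1, 2), (2, 2)]


def ssr(a, b):
    """Sum of squared residuals of the line y = a + b*x over the points."""
    return sum((y - (a + b * x)) ** 2 for x, y in points)


# Candidate intercepts and slopes from 0 to 2 in steps of 1/12.
grid = [Fraction(k, 12) for k in range(0, 25)]

# min() compares the SSR first, so it returns the best-fitting candidate.
best_ssr, best_a, best_b = min((ssr(a, b), a, b) for a in grid for b in grid)

print("best intercept:", best_a)
print("best slope:    ", best_b)
print("smallest SSR:  ", best_ssr)
```

For comparison, the line through the first two points (a = 1, b = 1) fits those two points exactly but has a sum of squared residuals of 1, which is larger than the minimum found by the search.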

The example considered here is the relationship between the amount spent on advertising per month and the number of customers who visited because of the advertisements placed by TATA AIG Life Insurance Co.

The equation of the regression line assumed by the least squares method is shown below:

$$Y = a + bX + e_i$$

Where,

Y is the dependent variable

X is the independent variable

a is the Y-intercept

b is the slope of the line

The table below shows the amount spent on advertising and the number of customers who visited because of the advertisements.

MONTH   AMOUNT SPENT ON ADVERTISING (IN CRORES) [X]   NO. OF CUSTOMERS VISITED (IN 000'S) [Y]
JAN      3.6                                            9.3
FEB      4.8                                           10.2
MAR      2.4                                            9.7
APR      7.2                                           11.5
MAY      6.9                                           12.0
JUN      8.4                                           14.2
JUL     10.7                                           18.6
AUG     11.2                                           28.4
SEP      6.1                                           13.2
OCT      7.9                                           10.8
NOV      9.5                                           22.7
DEC      5.4                                           12.3

The constant b can be calculated using the formula

$$b = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - \left(\sum X\right)^2}$$

Where,

X is the independent variable

Y is the dependent variable

n is the number of observations

a is calculated as shown below:

$$a = \bar{Y} - b\bar{X}$$

Where,

$\bar{Y}$ = the mean value of the dependent variable

$\bar{X}$ = the mean value of the independent variable

$e_i$ is the error. It is called the residual value.

The criterion for the least squares method is to minimize the sum of squared residuals:

$$\sum_{i=1}^{n} e_i^2$$

Where,

$e_i = Y_i - \hat{Y}_i$

$Y_i$ is the actual value of the dependent variable

$\hat{Y}_i$ is the value lying on the estimated regression line.
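The formulas for b and a and the least squares criterion can be sketched in Python as follows. The function names `fit_line` and `sum_squared_residuals` are illustrative choices, and the data are the twelve monthly figures from the table. Computed at full floating-point precision, the slope comes out at about 1.769 and the intercept at about 2.01, so a hand calculation that rounds intermediate values may differ in the third decimal place.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line Y = a + bX."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # b = (n*SUM(XY) - SUM(X)*SUM(Y)) / (n*SUM(X^2) - (SUM(X))^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # a = mean(Y) - b * mean(X)
    a = sum_y / n - b * sum_x / n
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """Least-squares criterion: sum of e_i^2 with e_i = Y_i - (a + b*X_i)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Monthly figures from the table: spend in crores (X), visitors in thousands (Y).
spend = [3.6, 4.8, 2.4, 7.2, 6.9, 8.4, 10.7, 11.2, 6.1, 7.9, 9.5, 5.4]
visits = [9.3, 10.2, 9.7, 11.5, 12.0, 14.2, 18.6, 28.4, 13.2, 10.8, 22.7, 12.3]

a, b = fit_line(spend, visits)
best = sum_squared_residuals(spend, visits, a, b)
# Any other line should score worse; a nudged slope serves as a quick check.
nudged = sum_squared_residuals(spend, visits, a, b + 0.1)

print(f"a = {a:.3f}, b = {b:.3f}")
print(f"criterion at the fit = {best:.2f}, with nudged slope = {nudged:.2f}")
```

The nudged slope is only a spot check that the fitted line scores better than one alternative; it does not prove that the fit is the minimum.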

Let us solve the example previously discussed using the least squares method.

We need to determine the constants a and b to develop the regression equation. The calculations required for determining the constants are shown in the table below, where X is the amount spent on advertising (in crores) and Y is the number of customers visited (in 000's).

X       Y       XY        X²
3.6     9.3     33.48     12.96
4.8     10.2    48.96     23.04
2.4     9.7     23.28     5.76
7.2     11.5    82.80     51.84
6.9     12.0    82.80     47.61
8.4     14.2    119.28    70.56
10.7    18.6    199.02    114.49
11.2    28.4    318.08    125.44
6.1     13.2    80.52     37.21
7.9     10.8    85.32     62.41
9.5     22.7    215.65    90.25
5.4     12.3    66.42     29.16

ΣX = 84.1   ΣY = 172.9   ΣXY = 1355.61   ΣX² = 670.73

$$b = \frac{12(1355.61) - (84.1)(172.9)}{12(670.73) - (84.1)^2} \approx 1.768$$

The next step is to calculate a.

To calculate the value of a, we first need to determine the mean values of the variables X and Y:

$$\bar{X} = \frac{84.1}{12} \approx 7.008, \qquad \bar{Y} = \frac{172.9}{12} \approx 14.40$$

Substituting these values in the equation for a:

$$a = 14.40 - (1.768)(7.008) = 14.40 - 12.39 = 2.01$$

We now develop the estimated regression equation by substituting the values of a and b in the equation:

Ŷ = 2.01+1.768X

Ŷ represents the estimated value of the dependent variable for a given value of X.
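As a brief illustration of how the estimated equation is used, the sketch below evaluates Ŷ = 2.01 + 1.768X for one advertising spend. The function name `predict_visits` and the spend of 5.0 crores are hypothetical; the value was chosen to lie inside the observed range of X (2.4 to 11.2 crores), since the fitted line says little about spends outside that range.

```python
A_HAT = 2.01   # estimated intercept from the report
B_HAT = 1.768  # estimated slope from the report


def predict_visits(spend_crores):
    """Estimated number of customers (in thousands) for a given monthly spend."""
    return A_HAT + B_HAT * spend_crores


# Hypothetical monthly spend of 5.0 crores.
estimate = predict_visits(5.0)
print(f"estimated customers: {estimate:.2f} thousand")
```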

The Strength of Association – R²

R² can be calculated using the following formula:

$$R^2 = \frac{\text{explained variance}}{\text{total variance}}$$

Total variance = explained variance + unexplained variance

Explained variance = total variance − unexplained variance

Therefore

$$R^2 = \frac{\text{total variance} - \text{unexplained variance}}{\text{total variance}} = 1 - \frac{\text{unexplained variance}}{\text{total variance}}$$

The unexplained variance is given by $\sum (Y_i - \hat{Y}_i)^2$

The total variance is given by $\sum (Y_i - \bar{Y})^2$

Substituting these expressions gives

$$R^2 = 1 - \frac{\sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2}$$

The calculations are shown in the table below, using Ŷ = 2.01 + 1.768X and Ȳ = 14.40.

X       Y       XY        X²        Ŷ          Y−Ŷ        (Y−Ŷ)²     (Ŷ−Ȳ)²     (Y−Ȳ)²
3.6     9.3     33.48     12.96     8.3748     0.9252     0.8560     36.3030    26.01
4.8     10.2    48.96     23.04     10.4964    −0.2964    0.0879     15.2381    17.64
2.4     9.7     23.28     5.76      6.2532     3.4468     11.8804    66.3704    22.09
7.2     11.5    82.80     51.84     14.7396    −3.2396    10.4950    0.1153     8.41
6.9     12.0    82.80     47.61     14.2092    −2.2092    4.8806     0.0364     5.76
8.4     14.2    119.28    70.56     16.8612    −2.6612    7.0820     6.0575     0.04
10.7    18.6    199.02    114.49    20.9276    −2.3276    5.4177     42.6096    17.64
11.2    28.4    318.08    125.44    21.8116    6.5884     43.4070    54.9318    196.00
6.1     13.2    80.52     37.21     12.7948    0.4052     0.1642     2.5767     1.44
7.9     10.8    85.32     62.41     15.9772    −5.1772    26.8034    2.4876     12.96
9.5     22.7    215.65    90.25     18.8060    3.8940     15.1632    19.4128    68.89
5.4     12.3    66.42     29.16     11.5572    0.7428     0.5518     8.0815     4.41

ΣX = 84.1   X̄ ≈ 7.0   ΣY = 172.9   Ȳ ≈ 14.40   ΣXY = 1355.61   ΣX² = 670.73

Σ(Y−Ŷ)² = 126.789   Σ(Ŷ−Ȳ)² = 254.221   Σ(Y−Ȳ)² = 381.29

Therefore

$$R^2 = 1 - \frac{126.789}{381.29} = 1 - 0.33 = 0.67 = 67\%$$
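The R² calculation can be cross-checked with a short Python sketch. It takes the twelve monthly observations and the rounded coefficients a = 2.01 and b = 1.768 from the estimated regression equation; the function name `r_squared` is an illustrative choice. The mean of Y is computed at full precision rather than rounded to 14.40, so the intermediate sums can differ slightly from a hand calculation.

```python
def r_squared(xs, ys, a, b):
    """R^2 = 1 - unexplained / total for the line Y-hat = a + b*X."""
    mean_y = sum(ys) / len(ys)
    # Unexplained variation: squared gaps between actual and fitted values.
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared gaps between actual values and their mean.
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - unexplained / total


# Spend in crores (X) and customers in thousands (Y), January to December.
spend = [3.6, 4.8, 2.4, 7.2, 6.9, 8.4, 10.7, 11.2, 6.1, 7.9, 9.5, 5.4]
visits = [9.3, 10.2, 9.7, 11.5, 12.0, 14.2, 18.6, 28.4, 13.2, 10.8, 22.7, 12.3]

r2 = r_squared(spend, visits, a=2.01, b=1.768)
print(f"R^2 = {r2:.2f}")
```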

Conclusion

This implies that nearly 67% of the total variation in Y is explained by the variation in X.

Hence there is a strong linear relationship between the two variables.