REGRESSION STUDY GUIDE

Simple regression:

1. The conditional distribution of Y (the dependent variable) for a given fixed value of X (the independent variable) is denoted by Y|X.

2. Assumptions about the conditional distributions:

§ They are all normal (for any value of X).

§ They have constant variance σ².

§ They have means equal to A + BX; more specifically, μ of Y|X = A + BX. The line Y = A + BX is also called the true model (see population Plot).
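The three assumptions can be illustrated with a small simulation. This is only a sketch, not part of the original guide: the values A = 2, B = 0.5 and σ = 1 and the helper name draw_y are made up for illustration.

```python
import random
import statistics

# Hypothetical population parameters, chosen only for illustration.
A, B, SIGMA = 2.0, 0.5, 1.0

def draw_y(x, n, rng):
    """Draw n values from the conditional distribution Y|X = x, assumed
    normal with mean A + B*x and the same standard deviation SIGMA for every x."""
    return [rng.gauss(A + B * x, SIGMA) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is repeatable
for x in (0, 5, 10):
    ys = draw_y(x, 10_000, rng)
    print(f"X={x:2d}  true mean={A + B * x:5.2f}  "
          f"sample mean={statistics.mean(ys):5.2f}  "
          f"sample sd={statistics.stdev(ys):4.2f}")
```

For each X the sample mean should land close to A + BX, while the sample standard deviation should stay close to σ whatever X is.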

3. None of the population parameters A, B or σ is known; they are estimated from a sample using the least squares method (see Sample1, sample2, sample3 and S1_analysis), giving a, b and s_e. Thus the estimated model (line) is Y = a + bX, and s_e is the estimated common standard deviation of the conditional distributions.
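The least squares estimates can be computed directly from a sample. The sketch below assumes the usual textbook formulas (b = Sxy/Sxx, a = mean of Y minus b times mean of X, and s_e based on n - 2 degrees of freedom). The function name least_squares and the five data points are made up for illustration and are not taken from Sample1, sample2 or sample3.

```python
import math

def least_squares(xs, ys):
    """Return (a, b, s_e) for the estimated line Y = a + bX."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                 # estimated slope
    a = y_bar - b * x_bar         # estimated intercept
    # Sum of squared residuals around the estimated line.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))  # estimate of the common sigma
    return a, b, s_e

# Made-up sample, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, s_e = least_squares(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}, s_e = {s_e:.3f}")
```

A different sample from the same population would give different values of a, b and s_e, which is why inferences about the unknown A, B and σ are needed.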

4. The “degree of fit” is measured by s_e, which by its definition is the degree of scatter of the sample points around the estimated line. A better measure is the sample coefficient of determination, r², which measures the proportion of the variation in Y explained (accounted for) by variations in X, the explanatory variable. The square root of r², taken with the same sign as the slope b, is called the sample correlation coefficient, r. It has the same interpretation you learned in Quant I: the degree of linear association between two variables.
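As a sketch of how r² and r come out of a fitted line, the code below computes r² as 1 - SSE/SST, which assumes the standard decomposition of the total variation in Y. The sign of r is taken from the slope b, so a downward-sloping line gives a negative r. The function name and the data are made up for illustration.

```python
import math

def r_squared_and_r(xs, ys):
    """Return (r2, r) for the least squares line through the sample."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sst = sum((y - y_bar) ** 2 for y in ys)  # total variation in Y
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained part
    r2 = max(0.0, 1.0 - sse / sst)  # guard against tiny negative rounding error
    r = math.copysign(math.sqrt(r2), b)  # r carries the sign of the slope
    return r2, r

# Made-up sample, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r2, r = r_squared_and_r(xs, ys)
print(f"r^2 = {r2:.3f}, r = {r:.3f}")
```

Unlike s_e, r² has no units and always lies between 0 and 1, which makes it easier to compare across data sets.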

5. Inferences using the estimated model Y = a + bX (see Inferences; see also reg_inf.doc):

§ Since B is not known but estimated using b, we try to make inferences about B. We may test a hypothesis such as B = 0 versus B ≠ 0, or B = B₀ (B₀ is some other number) versus B ≠ B₀. Or we may compute a confidence interval which, at the chosen confidence level, contains the true but unknown slope of the true model, B.

§ We may want to make inferences, in the form of a confidence interval, about the mean of all Y values (a cross-section of the population) for some fixed X value.

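§ We may want to make inferences about a single occurrence of Y (not the mean of all such Y’s) for some fixed X value, in the form of a prediction interval. This interval is wider than the confidence interval for the mean at the same X, because a single Y also varies around its own mean.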
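The three kinds of inference can be sketched in one function. It assumes the standard textbook formulas with n - 2 degrees of freedom. Python's standard library has no t distribution, so the critical value t_crit is passed in from a t table. The function name, the dictionary keys and the data are made up for illustration.

```python
import math

def regression_inference(xs, ys, x0, t_crit, b0=0.0):
    """Test B = b0 and build intervals at X = x0.

    t_crit is the two-sided critical t value for n - 2 degrees of freedom,
    looked up by the user (for example 3.182 for 95% with 3 d.f.).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))

    # Inference about the slope B.
    s_b = s_e / math.sqrt(sxx)        # standard error of b
    t_stat = (b - b0) / s_b           # compare |t_stat| with t_crit
    slope_ci = (b - t_crit * s_b, b + t_crit * s_b)

    # Intervals at X = x0: mean of all Y's versus a single Y.
    y_hat = a + b * x0
    spread = 1 / n + (x0 - x_bar) ** 2 / sxx
    half_mean = t_crit * s_e * math.sqrt(spread)
    half_single = t_crit * s_e * math.sqrt(1 + spread)
    return {
        "t_stat": t_stat,
        "slope_ci": slope_ci,
        "mean_ci": (y_hat - half_mean, y_hat + half_mean),
        "single_y_interval": (y_hat - half_single, y_hat + half_single),
    }

# Made-up sample; 3.182 is the 95% two-sided t value for 3 d.f.
result = regression_inference([1, 2, 3, 4, 5], [2, 4, 5, 4, 5],
                              x0=3, t_crit=3.182)
for name, value in result.items():
    print(name, value)
```

The interval for a single Y is always the wider of the two, because of the extra 1 under the square root.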

Multiple regression:

Y = A + B₁X₁ + B₂X₂ + . . . (we now have more than one independent variable). Everything we said about simple regression applies to multiple regression as well.
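Least squares carries over to several independent variables. The sketch below solves the normal equations with plain Gauss-Jordan elimination so that it needs no external libraries; in practice a statistics package would do this. The function name fit_multiple and the data are made up for illustration. The Y values were constructed as exactly 1 + 2·X₁ + 3·X₂ with no noise, so the fit should recover approximately those coefficients.

```python
def fit_multiple(rows, ys):
    """Least squares fit of Y = a + b1*X1 + b2*X2 + ...

    rows is a list of tuples (X1, X2, ...); returns [a, b1, b2, ...].
    Fails with a division by zero if the predictors are perfectly collinear.
    """
    # Design matrix with a leading 1 in each row for the intercept.
    X = [[1.0, *row] for row in rows]
    p = len(X[0])
    # Normal equations (X'X) coef = X'y, stored as an augmented matrix.
    M = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * y for r, y in zip(X, ys))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]

# Made-up data built from Y = 1 + 2*X1 + 3*X2 (no noise).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
ys = [9, 8, 19, 18, 29]
coefs = fit_multiple(rows, ys)
print([round(c, 6) for c in coefs])
```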

Two additional notes.

1. A special inference on the overall usefulness of the estimated regression equation:

Test the null hypothesis H₀: B₁ = B₂ = . . . = 0 (against the alternative that at least one of the B’s is not zero) using the F-statistic and its significance (the p-value of the F-statistic).
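As a sketch of where the F-statistic comes from, it can be written in terms of the coefficient of determination, the sample size n and the number of independent variables k. This assumes the standard relation F = (R²/k) / ((1 - R²)/(n - k - 1)). The function name overall_f is made up for illustration. The p-value itself needs the F distribution, which is not in Python's standard library, so it would come from regression output or an F table.

```python
def overall_f(r2, n, k):
    """Return (F, df_model, df_error) for H0: B1 = B2 = ... = Bk = 0.

    r2 is the coefficient of determination of the fitted model,
    n the number of observations, k the number of independent variables.
    """
    df_model = k
    df_error = n - k - 1
    # Explained variation per model d.f. over unexplained variation per error d.f.
    f_stat = (r2 / df_model) / ((1.0 - r2) / df_error)
    return f_stat, df_model, df_error

# Illustration: r2 = 0.6 from a made-up simple regression with n = 5, k = 1.
f_stat, df1, df2 = overall_f(0.6, n=5, k=1)
print(f"F = {f_stat:.3f} with ({df1}, {df2}) degrees of freedom")
```

A large F (small p-value) leads to rejecting H₀, meaning that at least one independent variable is useful in explaining Y. With a single independent variable, F equals the square of the t statistic for testing B = 0.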

2. The problem of multicollinearity, that is, strong correlation among the independent variables themselves (see multicolinearity).
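One simple first check for multicollinearity is to look at the correlation between each pair of independent variables. The sketch below does that; the cutoff of 0.9 is only a rule of thumb assumed here, not a value from this guide, and the function name and data are made up for illustration.

```python
import math

def correlation(u, v):
    """Sample correlation coefficient between two lists of equal length."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

# Made-up independent variables: x2 is nearly 2 * x1, x3 is unrelated.
predictors = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2.1, 3.9, 6.2, 7.8, 10.1],
    "x3": [5, 1, 4, 2, 3],
}
CUTOFF = 0.9  # assumed rule-of-thumb threshold, not from the guide

names = list(predictors)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        r = correlation(predictors[first], predictors[second])
        flag = "  <-- possible multicollinearity" if abs(r) > CUTOFF else ""
        print(f"r({first}, {second}) = {r:.3f}{flag}")
```

Pairwise correlations can miss cases where one variable is close to a combination of several others, so this is a screening step rather than a complete diagnosis.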