Using the 2012 Smoking and Drug Use Amongst English Pupils Dataset (2012smokedrugs.dta), perform diagnostics on the second cigarette consumption multiple linear regression model. To remind you, the outcome variable was cigs7, but recoded to remove all 0s, and the predictor variables were free, schyear, and sex.
Normality Diagnostics:
(a) Create a histogram of the residuals to test for non-normality.
(b) Create a Q-Q plot to test for non-normality.
(c) Perform a Shapiro-Wilk Normality test.
(d) If you find non-normality in the residuals, try to find a solution using the techniques discussed in the chapter.
a.
x11()
hist(model.1$residuals,xlab="Residuals",main="")

The residuals are definitely not normally distributed.
b.
x11()
qqnorm(model.1$residuals)
qqline(model.1$residuals,col="red")

The residuals are definitely not normally distributed.
c.
shapiro.test(model.1$residuals)
Shapiro-Wilk normality test
data: model.1$residuals
W = 0.8451, p-value < 2.2e-16
Since the p-value is below .05, we reject the null hypothesis that the residuals are normally distributed, so the normality assumption is violated.
d.
library(car)  # provides powerTransform()
drugs$cigs7b <- drugs$cigs7a + 1
summary(powerTransform(drugs$cigs7b))
bcPower Transformation to Normality
             Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
drugs$cigs7b    0.0063           0      -0.0827       0.0952

Likelihood ratio test that transformation parameter is equal to 0
 (log transformation)
                             LRT df    pval
LR test, lambda = (0) 0.01904637  1 0.89023

Likelihood ratio test that no transformation is needed
                           LRT df       pval
LR test, lambda = (1) 441.0524  1 < 2.22e-16
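The LR test that no transformation is needed (lambda = 1) is rejected (p < 2.22e-16), so we should transform the outcome variable. The suggested transformation is to raise it to the estimated power of .0063. Note that the rounded power is 0 and the LR test of lambda = 0 is not rejected (p = 0.89), so this power is practically indistinguishable from a log transformation.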
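To see why a power this close to 0 behaves like a log transformation, the following minimal sketch compares the Box-Cox form (y^lambda - 1) / lambda with log(y). The values in y are made up for illustration and are not taken from the dataset; only lambda comes from the output above.

```r
# Hypothetical positive values standing in for the outcome (NOT the real data)
y <- c(1, 2, 5, 10, 20, 40, 70)

# Estimated power reported by powerTransform() above
lambda <- 0.0063

# Box-Cox transformation for lambda != 0
bc <- (y^lambda - 1) / lambda

# Side-by-side comparison with the natural log
print(round(cbind(y = y, boxcox = bc, log = log(y)), 4))

# The raw power y^lambda is almost perfectly linear in log(y)
print(cor(y^lambda, log(y)))
```

Because the two columns nearly coincide, a model fitted to the outcome raised to .0063 should behave much like a model fitted to its log, apart from the scale of the coefficients.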
model.1a <- lm(I(cigs7a^.0063) ~ free + schyear + sex, data=drugs)
summary(model.1a)
Call:
lm(formula = I(cigs7a^0.0063) ~ free + schyear + sex, data = drugs)
Residuals:
       Min         1Q     Median         3Q        Max
-0.0179023 -0.0072318  0.0005963  0.0084505  0.0188354

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.0088510  0.0024222 416.494   <2e-16 ***
free        0.0016630  0.0010939   1.520   0.1292
schyear     0.0013119  0.0005281   2.484   0.0134 *
sex         0.0008289  0.0009281   0.893   0.3723
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.009328 on 404 degrees of freedom
  (7181 observations deleted due to missingness)
Multiple R-squared: 0.01931, Adjusted R-squared: 0.01203
F-statistic: 2.652 on 3 and 404 DF, p-value: 0.0484
As we discussed in the chapter, transforming the outcome variable in a non-intuitive way makes it difficult to
interpret the coefficients. Therefore, we may be better off leaving the outcome variable in its original form.
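Whichever form is kept, it can help to check whether a transformation actually improved the residuals by repeating the Shapiro-Wilk test on the refitted model. The sketch below does this on simulated data, since the real dataset is not reproduced here: the data frame sim, the outcome cigs, and every coefficient used to generate it are hypothetical, and the log transformation stands in for a power near 0.

```r
set.seed(1)
n <- 400

# Simulated predictors with the same names as in the exercise (hypothetical values)
sim <- data.frame(
  free    = rbinom(n, 1, 0.2),
  schyear = sample(7:11, n, replace = TRUE),
  sex     = rbinom(n, 1, 0.5)
)

# Simulated right-skewed, strictly positive outcome (assumed data-generating process)
sim$cigs <- exp(0.5 + 0.1 * sim$schyear + rnorm(n, sd = 1))

# Model on the original scale and model on the log scale
fit_raw <- lm(cigs ~ free + schyear + sex, data = sim)
fit_log <- lm(log(cigs) ~ free + schyear + sex, data = sim)

# Shapiro-Wilk W statistic for each set of residuals (closer to 1 is more normal)
w_raw <- unname(shapiro.test(residuals(fit_raw))$statistic)
w_log <- unname(shapiro.test(residuals(fit_log))$statistic)

print(c(raw = w_raw, log = w_log))
```

With the real data, the same comparison would use model.1 and model.1a in place of fit_raw and fit_log.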