Using the Singapore Census dataset and SPSS or Excel, answer the following:
a. Construct a regression on the number of people with no religion based on
the independent variables of the population above 65, the Chinese population, the
Malay population, the Indian population, the unemployment rate, the illiteracy
rate, the percent with a university degree, monthly incomes under 1k, and monthly
incomes greater than 8k. Investigate multicollinearity and outliers, and comment
on the results.
b. Attempt to improve upon the regression equation constructed in part (a), and justify
the variables you include and exclude.
A. The VIF values are extremely high, indicating severe multicollinearity, and only two
predictors are significant at the 0.05 level, namely the Chinese and Indian populations. The
overall adjusted R² is 0.992.
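For reference, the two diagnostics quoted in this answer have standard textbook definitions, sketched below. The symbols are assumptions introduced here, not notation from the exercise: R_j² is the R² obtained by regressing predictor j on all the other predictors, n is the number of observations, and p is the number of predictors in the model.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
R^2_{\mathrm{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```

A VIF of 1 means predictor j is uncorrelated with the others. Common rules of thumb treat values above 5 or 10 as a sign of problematic collinearity.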
B. Removing variables iteratively based on their VIF values, we exclude monthly income below
1k, then the percent with no school, the Chinese population, the Indian population, and
the population over 65, at which point the VIF values are all below 5. A regression on the
remaining variables (the Malay population, unemployment rate, illiteracy rate, percent
with a university degree, and monthly incomes over 8k) has an adjusted R² of 0.880. Excluding the
statistically insignificant variables from this regression (unemployment, illiteracy, and the percent
with a university degree), the model which explains non-religion based on the Malay population and
monthly incomes over 8k has an adjusted R² of 0.839. This two-variable model is far more parsimonious than the kitchen-sink version in part (a), at the cost of a lower adjusted R² (0.839 versus 0.992).
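The VIF-based elimination described in part B can be sketched in code. This is a minimal illustration in plain Python rather than the SPSS or Excel workflow the exercise asks for, and it runs on synthetic data: the column names, the random numbers, and the helper names (`correlation_matrix`, `invert`, `vif`, `drop_by_vif`) are all assumptions made for this example, not values from the Singapore Census dataset. It relies on the standard identity that each VIF equals the corresponding diagonal entry of the inverse of the predictors' correlation matrix.

```python
import random

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vif(data):
    """VIF per predictor: the diagonal of the inverse correlation matrix."""
    names = list(data)
    inverse = invert(correlation_matrix([data[name] for name in names]))
    return {name: inverse[i][i] for i, name in enumerate(names)}

def drop_by_vif(data, threshold=5.0):
    """Repeatedly drop the predictor with the largest VIF until all are below threshold."""
    kept = dict(data)
    removed = []
    while len(kept) > 1:
        scores = vif(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] < threshold:
            break
        removed.append(worst)
        del kept[worst]
    return kept, removed

# Synthetic, made-up predictors: three move together, two are independent.
random.seed(0)
n = 60
chinese = [random.gauss(50000, 15000) for _ in range(n)]
predictors = {
    "chinese": chinese,
    "over65": [0.15 * c + random.gauss(0, 300) for c in chinese],
    "income_under_1k": [0.30 * c + random.gauss(0, 600) for c in chinese],
    "unemployment": [random.gauss(4, 1) for _ in range(n)],
    "university": [random.gauss(20, 5) for _ in range(n)],
}

kept, removed = drop_by_vif(predictors, threshold=5.0)
final_vif = vif(kept)
print("removed, in order:", removed)
print("kept:", {name: round(score, 2) for name, score in final_vif.items()})
```

Dropping one variable at a time and recomputing matters because removing a single collinear predictor lowers the VIFs of the others. The threshold of 5 matches the cutoff used in the answer; which variable to retain from a correlated group is still a judgment call that the VIF alone does not settle.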