Mini-Assignment 10

Directions

Familiarize yourself with the codebook for the movies dataset below and then import/load the dataset.

Question 1: Suppose you want to determine whether movie budget is significantly associated with the movie rating. Construct the appropriate regression. Is budget significantly associated with movie rating? Interpret the term in the model that describes the relationship.

Question 2: Suppose you want to determine whether movie budget is significantly associated with movie rating after controlling for MPAA designation. Construct the appropriate regression. Is budget significantly associated with movie rating when controlling for MPAA designation?

Question 3: Using your model from the previous question, are NC-17 movies rated significantly differently than R-rated movies when controlling for budget?

Question 4: There is reason to believe that the relationship between budget and viewer rating may differ depending on whether the movie is a Comedy. Construct an appropriate graph that allows you to assess this theory. Does it visually appear that Comedy-status moderates the relationship between budget and viewer rating?

Question 5: Create a subset that includes only Comedies. With this subset, construct a model that determines whether there is a relationship between budget and viewer rating. Among comedies, is there a relationship between budget and viewer rating?

Question 6: Create a subset that includes only non-Comedies. With this subset, construct a model that determines whether there is a relationship between budget and viewer rating. Among non-comedies, is there a relationship between budget and viewer rating?

Question 7: Does the relationship between budget and viewer rating vary based on whether a movie is a comedy?

Question 8: Construct a model using the whole data set that assesses whether the relationship between budget and viewer rating varies significantly based on whether a movie is a Comedy. Does the relationship between budget and viewer rating significantly vary based on whether a movie is a comedy according to your one regression equation?
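
One common way to write a single-equation model of the kind Question 8 describes is a regression with an interaction term. The form below is a sketch, assuming Comedy is coded 1 for comedies and 0 otherwise (as the codebook's binary genre variables suggest); the coefficient symbols are notation introduced here, not part of the assignment.

```latex
\text{rating} = \beta_0 + \beta_1\,\text{budget} + \beta_2\,\text{Comedy}
              + \beta_3\,(\text{budget} \times \text{Comedy}) + \varepsilon

% Slope of budget for non-comedies (Comedy = 0): \beta_1
% Slope of budget for comedies     (Comedy = 1): \beta_1 + \beta_3
% The slopes differ only if \beta_3 \neq 0, so the test of interest is H_0 : \beta_3 = 0.
```

Under this setup, the p-value on the interaction coefficient is what speaks to whether the budget slope differs between comedies and non-comedies.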

Familiarize yourself with the codebook for the nhanes dataset below and then import/load the dataset.

Question 9: Construct a new variable “BMI category” based on the quantitative BMI variable in the data set. The categories of your new variable should be:

  • “Underweight”: for BMIs under 18.5
  • “Healthy weight”: for BMIs 18.5-24.9
  • “Overweight”: for BMIs 25-29.9
  • “Obese”: for BMIs 30-39.9
  • “Morbidly obese”: for BMIs of 40 or higher

Suppose you want to determine whether BMI category (the explanatory variable) is associated with the likelihood of having diabetes (the response variable). Construct the appropriate visualization to help you assess this relationship. Describe the visual relationship between BMI categorization and diabetes.
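
The category cut-offs listed in Question 9 can be sketched as a small function. Two choices here are assumptions rather than part of the assignment: the ranges are treated as half-open intervals (so a BMI such as 24.95 falls in “Healthy weight” rather than in a gap between categories), and any BMI of 40 or above is treated as “Morbidly obese”. The function name bmi_category is hypothetical.

```python
def bmi_category(bmi):
    """Map a numeric BMI to one of the five categories from Question 9.

    Assumption: intervals are half-open, e.g. 18.5 <= BMI < 25 is
    "Healthy weight", so no value falls between two categories.
    """
    # Leave missing values (None or NaN) uncategorized.
    if bmi is None or bmi != bmi:
        return None
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25:
        return "Healthy weight"
    if bmi < 30:
        return "Overweight"
    if bmi < 40:
        return "Obese"
    # Assumption: everything at 40 or above is "Morbidly obese".
    return "Morbidly obese"


# Example: categorize a few illustrative BMI values.
for value in (17.0, 22.3, 27.5, 33.1, 41.0):
    print(value, bmi_category(value))
```

The same logic can be applied to the BMI column of the nhanes data with whatever tool the course uses; the point is only that each BMI value lands in exactly one category.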

Question 10: Suppose you want to determine whether BMI category is significantly associated with diabetes. Construct the appropriate regression. The model estimates that the odds of having diabetes are ______ times higher for those who are morbidly obese compared to those of a healthy weight.
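
The blank in Question 10 comes from exponentiating a coefficient in a logistic regression. The sketch below assumes “Healthy weight” is the reference category and that each remaining category enters as a 0/1 indicator; the coefficient symbols and indicator names are notation introduced here for illustration.

```latex
% p = P(\text{Diabetes} = 1)
\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1\,\text{Underweight} + \beta_2\,\text{Overweight}
                                 + \beta_3\,\text{Obese} + \beta_4\,\text{MorbidlyObese}

% Odds ratio comparing morbidly obese to healthy weight:
\text{OR}_{\text{morbidly obese vs. healthy weight}} = e^{\beta_4}
```

Software may choose a different reference category by default (often the first level alphabetically), so the reference level may need to be set explicitly before reading off this coefficient.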

Question 11: Now suppose you want to determine whether the quantitative version of BMI is significantly and positively associated with the likelihood of diabetes. Construct the appropriate regression. Describe the relationship between BMI and diabetes.

Question 12: There is reason to believe that the relationship between BMI and diabetes varies based on gender. Construct the appropriate regression that allows you to address this question. Does the relationship between BMI and diabetes vary significantly based on gender?


CODEBOOK: Movies Data

The Internet Movie Database, http://imdb.com/, is a website devoted to collecting movie data supplied by studios and fans. It claims to be the biggest movie database on the web and is run by Amazon. More information about imdb.com can be found online, http://imdb.com/help/show_leaf?about, including information about the data collection process, http://imdb.com/help/show_leaf?infosource.

The description of the data is as follows:

  • title. Title of the movie.
  • year. Year of release.
  • budget_millions. Total budget (if known) in millions of US dollars.
  • length. Length in minutes.
  • rating. Average IMDB user rating.
  • votes. Number of IMDB users who rated this movie.
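  • r1-10. For each score from 1 to 10, multiplying the value by ten gives the percentage (to the nearest 10%) of users who gave this movie that score; for example, r1 covers users who rated the movie a 1.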
  • mpaa. MPAA designation.
  • Action, Animation, Comedy, Drama, Documentary, Romance, Short. Binary variables indicating whether the movie was classified as belonging to that genre.

CODEBOOK: NHANES

This is survey data collected by the US National Center for Health Statistics (NCHS) which has conducted a series of health and nutrition surveys over the years.

The variables in this data set include:

Variable Name     Description
ID                Unique identifier
Gender            male/female
Age               Age of participant (in years)
Race              Black, Hispanic, Mexican, White, Other
Education         Highest level of education
MaritalStatus     Divorced, Never Married, Married, Separated, Widowed
Testosterone      Testosterone total (ng/dL)
GeneralHealth     Self-reported overall health (Poor, Fair, Good, Vgood, Excellent)
BMI               Body mass index
PhysActiveDays    Number of days in a typical week that participant does vigorous-intensity activity
Diabetes          Indicates whether or not someone has diabetes: (1) = diabetes, (0) = no diabetes
BP_Sys_Reading1   Systolic blood pressure (mm Hg) at beginning of appointment
BP_Sys_Reading2   Systolic blood pressure (mm Hg) at end of appointment
Work              Indicates whether or not someone is working: (1) = working, (0) = not working
TotChol           Total HDL cholesterol (mmol/L)