+ - 0:00:00
Notes for current slide
Notes for next slide

FundRmentals 09

Dr Danielle Evans | University of Sussex

1 / 12

Setup & Q&A

  • Open RStudio

  • Open your fundRmentals R Project (click the blue cube in the top right corner of RStudio)

  • Open a new Rmd file for today

  • In a new code chunk, load tidyverse, palmerpenguins, broom, robust, & parameters, using the library() command (install them in the console if you need to)

  • In another code chunk, create peng_data by copying the below code:

peng_data <- palmerpenguins::penguins
  • Explore the data by using View() & names() on the peng_data object (in the console!!)


Any questions from the last tutorial?

2 / 12

Overview

  • Linear model

    • 1 predictor: 'simple regression'

    • 2+ predictors: 'multiple regression'

    • Robust models

  • Next steps

3 / 12

lm() Function

  • We can use the lm() function to fit our model, the general structure is as follows:
our_model <- lm(outcome ~ predictor(s), data = data, na.action = an action)
  • Where we sub in the name we want to give our_model, the outcome, the predictor(s), the data/tibble, & any actions we want to apply to missing data (i.e., na.action = na.exclude)


Task: Fit a linear model to our peng_data, using body_mass_g as the outcome, and flipper_length_mm as our predictor, give it the name peng_model

  • We have some missing data today - make sure to exclude it by using na.action = na.exclude


4 / 12

lm() Function

  • We can use the lm() function to fit our model, the general structure is as follows:
our_model <- lm(outcome ~ predictor(s), data = data, na.action = an action)
  • Where we sub in the name we want to give our_model, the outcome, the predictor(s), the data/tibble, & any actions we want to apply to missing data (i.e., na.action = na.exclude)


Task: Fit a linear model to our peng_data, using body_mass_g as the outcome, and flipper_length_mm as our predictor, give it the name peng_model

  • We have some missing data today - make sure to exclude it by using na.action = na.exclude


Solution:

peng_model <- lm(body_mass_g ~ flipper_length_mm, data = peng_data, na.action = na.exclude)
5 / 12

Our Model

To actually see the results, we can use summary(peng_model)

summary(peng_model)
##
## Call:
## lm(formula = body_mass_g ~ flipper_length_mm, data = peng_data,
## na.action = na.exclude)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1058.80 -259.27 -26.88 247.33 1288.69
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5780.831 305.815 -18.90 <2e-16 ***
## flipper_length_mm 49.686 1.518 32.72 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 394.3 on 340 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.759, Adjusted R-squared: 0.7583
## F-statistic: 1071 on 1 and 340 DF, p-value: < 2.2e-16

Alternatively, we can use broom::tidy() & broom::glance() to look at our estimates & model fit

broom::tidy(peng_model)
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -5781. 306. -18.9 5.59e- 55
## 2 flipper_length_mm 49.7 1.52 32.7 4.37e-107


broom::glance(peng_model)
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.759 0.758 394. 1071. 4.37e-107 1 -2528. 5063. 5074.
## # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
6 / 12

Multiple Predictors

  • We can use the lm() function again to fit our model with multiple predictors, the general structure is as follows:
our_model <- lm(outcome ~ predictor_1 + predictor_2, data = data, na.action = an action)
  • Where we sub in the name we want to give our_model, the outcome, the predictors with a +, the data/tibble, & any actions we want to apply to missing data (i.e., na.action = na.exclude)


Task 1: Fit a linear model to our peng_data, using body_mass_g as the outcome, flipper_length_mm & bill_length_mm as the predictors, give it the name peng_model_2

  • We have some missing data today - make sure to exclude it by using na.action = na.exclude


Task 2: view the results in the same way as before

7 / 12

Model solution:

peng_model_2 <- lm(body_mass_g ~ flipper_length_mm + bill_length_mm, data = peng_data, na.action = na.exclude)

Results:

summary(peng_model_2)
##
## Call:
## lm(formula = body_mass_g ~ flipper_length_mm + bill_length_mm,
## data = peng_data, na.action = na.exclude)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1090.5 -285.7 -32.1 244.2 1287.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5736.897 307.959 -18.629 <2e-16 ***
## flipper_length_mm 48.145 2.011 23.939 <2e-16 ***
## bill_length_mm 6.047 5.180 1.168 0.244
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 394.1 on 339 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.76, Adjusted R-squared: 0.7585
## F-statistic: 536.6 on 2 and 339 DF, p-value: < 2.2e-16


broom::tidy(peng_model_2)
## # A tibble: 3 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -5737. 308. -18.6 7.80e-54
## 2 flipper_length_mm 48.1 2.01 23.9 7.56e-75
## 3 bill_length_mm 6.05 5.18 1.17 2.44e- 1


broom::glance(peng_model_2)
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.760 0.759 394. 537. 9.09e-106 2 -2528. 5063. 5079.
## # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
8 / 12

Robust Models

For robust estimates we can use robust::lmRob() & summary() to see the output - like lm():

our_model_rob <- robust::lmRob(outcome ~ predictor_1 + predictor_2, data = data, na.action = an action)
summary(our_model_rob)
  • We can then compare these to the original betas to see if our model was biased


For robust p-values & CIs we can use the parameters::model_parameters() function to estimate the model with standard errors designed for heteroscedastic residuals, where we give the name of our original non-robust model, set the robust argument to be TRUE, and the vcov.type to be HC4

parameters::model_parameters(non_robust_model, robust = TRUE, vcov.type = "HC4", digits = 3)


Task: fit our peng_model_2 with robust estimates & robust CIs/p-values

9 / 12

Solutions

peng_model_2_rob <- robust::lmRob(body_mass_g ~ flipper_length_mm + bill_length_mm, data = peng_data, na.action = na.exclude)
summary(peng_model_2_rob)
##
## Call:
## robust::lmRob(formula = body_mass_g ~ flipper_length_mm + bill_length_mm,
## data = peng_data, na.action = na.exclude)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1079.80 -272.20 -18.12 256.65 1299.89
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5733.310 336.296 -17.048 <2e-16 ***
## flipper_length_mm 47.967 2.183 21.975 <2e-16 ***
## bill_length_mm 6.472 5.582 1.159 0.247
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 386 on 339 degrees of freedom
## Multiple R-Squared: 0.6255
##
## Test for Bias:
## statistic p-value
## M-estimate 5.902 0.1165
## LS-estimate 4.885 0.1804
## 2 observations deleted due to missingness
10 / 12

For robust parameters/estimates we can use the robust::lmRob() function & summary() to see the output (just like lm()):

For robust p values & CIs we can use the parameters::model_parameters() function to estimate the model with standard errors designed for heteroscedastic residuals, here we give the name of our original non-robust model

Solutions

parameters::model_parameters(peng_model_2, robust = TRUE, vcov.type = "HC4", digits = 3)
## Parameter | Coefficient | SE | 95% CI | t(339) | p
## -----------------------------------------------------------------------------------
## (Intercept) | -5736.897 | 293.486 | [-6314.18, -5159.61] | -19.547 | < .001
## flipper_length_mm | 48.145 | 1.883 | [ 44.44, 51.85] | 25.573 | < .001
## bill_length_mm | 6.047 | 4.947 | [ -3.68, 15.78] | 1.223 | 0.222
11 / 12

For robust parameters/estimates we can use the robust::lmRob() function & summary() to see the output (just like lm()):

For robust p values & CIs we can use the parameters::model_parameters() function to estimate the model with standard errors designed for heteroscedastic residuals, here we give the name of our original non-robust model

Next Steps

Next week is 'reading week', feel free to use this to catch up on any tutorials you've missed so far

The next session in week 11, is 'pick a test week'! You can complete any of the below discovr tutorials, posting any questions or topic suggestions on this padlet & I'll cover anything you want in week 11 - do make sure to post your questions/comments as early as you can so that I can prepare anything I need to with enough time...

discovr_10 - moderation & mediation
discovr_11 - ANOVA
discovr_12 - ANCOVA
discovr_13 - factorial designs
discovr_14 - repeated measures designs
discovr_15 - mixed designs
discovr_17 - exploratory factor analysis
discovr_20 - logistic regression

Paste the below code into the Console in RStudio, change it to the tutorial name you want to do, and press Enter

learnr::run_tutorial("tutorial_name", package = "discovr")
12 / 12

Setup & Q&A

  • Open RStudio

  • Open your fundRmentals R Project (click the blue cube in the top right corner of RStudio)

  • Open a new Rmd file for today

  • In a new code chunk, load tidyverse, palmerpenguins, broom, robust, & parameters, using the library() command (install them in the console if you need to)

  • In another code chunk, create peng_data by copying the below code:

peng_data <- palmerpenguins::penguins
  • Explore the data by using View() & names() on the peng_data object (in the console!!)


Any questions from the last tutorial?

2 / 12
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow