FundRmentals 09Dr Danielle Evans | University of Sussex1 / 12

Setup & Q&A

Open RStudio
Open your fundRmentals R Project (click the blue cube in the top right corner of RStudio)
Open a new Rmd file for today
In a new code chunk, load tidyverse, palmerpenguins, broom, robust, & parameters, using the library() command (install them in the console if you need to)
In another code chunk, create peng_data by copying the below code:

peng_data <- palmerpenguins::penguins

Explore the data by using View() & names() on the peng_data object (in the console!!)

Any questions from the last tutorial?

2 / 12

Overview

Linear model
- 1 predictor: 'simple regression'
- 2+ predictors: 'multiple regression'
- Robust models
Next steps

3 / 12

lm() Function

We can use the lm() function to fit our model, the general structure is as follows:

our_model <- lm(outcome ~ predictor(s), data = data, na.action = an action)

Where we sub in the name we want to give our_model, the outcome, the predictor(s), the data/tibble, & any actions we want to apply to missing data (i.e., na.action = na.exclude)

Task: Fit a linear model to our peng_data, using body_mass_g as the outcome, and flipper_length_mm as our predictor, give it the name peng_model

We have some missing data today - make sure to exclude it by using na.action = na.exclude

4 / 12

lm() Function

We can use the lm() function to fit our model, the general structure is as follows:

our_model <- lm(outcome ~ predictor(s), data = data, na.action = an action)

Where we sub in the name we want to give our_model, the outcome, the predictor(s), the data/tibble, & any actions we want to apply to missing data (i.e., na.action = na.exclude)

Task: Fit a linear model to our peng_data, using body_mass_g as the outcome, and flipper_length_mm as our predictor, give it the name peng_model

We have some missing data today - make sure to exclude it by using na.action = na.exclude

Solution:

peng_model <- lm(body_mass_g ~ flipper_length_mm, data = peng_data, na.action = na.exclude)

5 / 12

Our Model

To actually see the results, we can use summary(peng_model)

summary(peng_model)

## 
## Call:
## lm(formula = body_mass_g ~ flipper_length_mm, data = peng_data, 
##     na.action = na.exclude)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1058.80  -259.27   -26.88   247.33  1288.69 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -5780.831    305.815  -18.90   <2e-16 ***
## flipper_length_mm    49.686      1.518   32.72   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 394.3 on 340 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.759,    Adjusted R-squared:  0.7583 
## F-statistic:  1071 on 1 and 340 DF,  p-value: < 2.2e-16

Alternatively, we can use broom::tidy() & broom::glance() to look at our estimates & model fit

broom::tidy(peng_model)

## # A tibble: 2 x 5
##   term              estimate std.error statistic   p.value
##   <chr>                <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)        -5781.     306.       -18.9 5.59e- 55
## 2 flipper_length_mm     49.7      1.52      32.7 4.37e-107

broom::glance(peng_model)

## # A tibble: 1 x 12
##   r.squared adj.r.squared sigma statistic   p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>     <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.759         0.758  394.     1071. 4.37e-107     1 -2528. 5063. 5074.
## # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

6 / 12

Multiple Predictors

We can use the lm() function again to fit our model with multiple predictors, the general structure is as follows:

our_model <- lm(outcome ~ predictor_1 + predictor_2, data = data, na.action = an action)

Where we sub in the name we want to give our_model, the outcome, the predictors with a +, the data/tibble, & any actions we want to apply to missing data (i.e., na.action = na.exclude)

Task 1: Fit a linear model to our peng_data, using body_mass_g as the outcome, flipper_length_mm & bill_length_mm as the predictors, give it the name peng_model_2

We have some missing data today - make sure to exclude it by using na.action = na.exclude

Task 2: view the results in the same way as before

7 / 12

Model solution:

peng_model_2 <- lm(body_mass_g ~ flipper_length_mm + bill_length_mm, data = peng_data, na.action = na.exclude)

Results:

summary(peng_model_2)

## 
## Call:
## lm(formula = body_mass_g ~ flipper_length_mm + bill_length_mm, 
##     data = peng_data, na.action = na.exclude)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1090.5  -285.7   -32.1   244.2  1287.5 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -5736.897    307.959 -18.629   <2e-16 ***
## flipper_length_mm    48.145      2.011  23.939   <2e-16 ***
## bill_length_mm        6.047      5.180   1.168    0.244    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 394.1 on 339 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:   0.76,    Adjusted R-squared:  0.7585 
## F-statistic: 536.6 on 2 and 339 DF,  p-value: < 2.2e-16

broom::tidy(peng_model_2)

## # A tibble: 3 x 5
##   term              estimate std.error statistic  p.value
##   <chr>                <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)       -5737.      308.      -18.6  7.80e-54
## 2 flipper_length_mm    48.1       2.01     23.9  7.56e-75
## 3 bill_length_mm        6.05      5.18      1.17 2.44e- 1

broom::glance(peng_model_2)

## # A tibble: 1 x 12
##   r.squared adj.r.squared sigma statistic   p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>     <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.760         0.759  394.      537. 9.09e-106     2 -2528. 5063. 5079.
## # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

8 / 12

Robust Models

For robust estimates we can use robust::lmRob() & summary() to see the output - like lm():

our_model_rob <- robust::lmRob(outcome ~ predictor_1 + predictor_2, data = data, na.action = an action)
summary(our_model_rob)

We can then compare these to the original betas to see if our model was biased

For robust p-values & CIs we can use the parameters::model_parameters() function to estimate the model with standard errors designed for heteroscedastic residuals, where we give the name of our original non-robust model, set the robust argument to be TRUE, and the vcov.type to be HC4

parameters::model_parameters(non_robust_model, robust = TRUE, vcov.type = "HC4", digits = 3)

Task: fit our peng_model_2 with robust estimates & robust CIs/p-values

9 / 12

Solutions

peng_model_2_rob <- robust::lmRob(body_mass_g ~ flipper_length_mm + bill_length_mm, data = peng_data, na.action = na.exclude)
summary(peng_model_2_rob)

## 
## Call:
## robust::lmRob(formula = body_mass_g ~ flipper_length_mm + bill_length_mm, 
##     data = peng_data, na.action = na.exclude)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1079.80  -272.20   -18.12   256.65  1299.89 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -5733.310    336.296 -17.048   <2e-16 ***
## flipper_length_mm    47.967      2.183  21.975   <2e-16 ***
## bill_length_mm        6.472      5.582   1.159    0.247    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 386 on 339 degrees of freedom
## Multiple R-Squared: 0.6255 
## 
## Test for Bias:
##             statistic p-value
## M-estimate      5.902  0.1165
## LS-estimate     4.885  0.1804
## 2 observations deleted due to missingness

10 / 12

For robust parameters/estimates we can use the robust::lmRob() function & summary() to see the output (just like lm()):

For robust p values & CIs we can use the parameters::model_parameters() function to estimate the model with standard errors designed for heteroscedastic residuals, here we give the name of our original non-robust model

Solutions

parameters::model_parameters(peng_model_2, robust = TRUE, vcov.type = "HC4", digits = 3)

## Parameter         | Coefficient |      SE |               95% CI |  t(339) |      p
## -----------------------------------------------------------------------------------
## (Intercept)       |   -5736.897 | 293.486 | [-6314.18, -5159.61] | -19.547 | < .001
## flipper_length_mm |      48.145 |   1.883 | [   44.44,    51.85] |  25.573 | < .001
## bill_length_mm    |       6.047 |   4.947 | [   -3.68,    15.78] |   1.223 | 0.222

11 / 12

For robust parameters/estimates we can use the robust::lmRob() function & summary() to see the output (just like lm()):

Next Steps

Next week is 'reading week', feel free to use this to catch up on any tutorials you've missed so far

The next session in week 11, is 'pick a test week'! You can complete any of the below discovr tutorials, posting any questions or topic suggestions on this padlet & I'll cover anything you want in week 11 - do make sure to post your questions/comments as early as you can so that I can prepare anything I need to with enough time...

discovr_10 - moderation & mediation
discovr_11 - ANOVA
discovr_12 - ANCOVA
discovr_13 - factorial designs
discovr_14 - repeated measures designs
discovr_15 - mixed designs
discovr_17 - exploratory factor analysis
discovr_20 - logistic regression

Paste the below code into the Console in RStudio, change it to the tutorial name you want to do, and press Enter

learnr::run_tutorial("tutorial_name", package = "discovr")

12 / 12

Setup & Q&A

Open RStudio

Open your fundRmentals R Project (click the blue cube in the top right corner of RStudio)

Open a new Rmd file for today

In a new code chunk, load tidyverse, palmerpenguins, broom, robust, & parameters, using the library() command (install them in the console if you need to)

In another code chunk, create peng_data by copying the below code:

peng_data <- palmerpenguins::penguins

Explore the data by using View() & names() on the peng_data object (in the console!!)

Any questions from the last tutorial?

2 / 12

Help

Keyboard shortcuts

↑, ←, Pg Up, k

Go to previous slide

↓, →, Pg Dn, Space, j

Go to next slide

Home

Go to first slide

End

Go to last slide

Number + Return

Go to specific slide

b / m / f

Toggle blackout / mirrored / fullscreen mode

Clone slideshow

Toggle presenter mode

Restart the presentation timer

?, h

Toggle this help