class: center, middle, inverse, title-slide # FundRmentals 09 ### Dr Danielle Evans | University of Sussex --- # Setup & Q&A - Open RStudio - Open your fundRmentals R Project (click the blue cube in the top right corner of RStudio) - Open a new Rmd file for today - In a new code chunk, load **tidyverse**, **palmerpenguins**, **broom**, **robust**, & **parameters**, using the **library()** command (install them in the console if you need to) - In another code chunk, create **peng_data** by copying the below code: ```r peng_data <- palmerpenguins::penguins ``` - Explore the data by using **View()** & **names()** on the **peng_data** object (in the console!!) <br> .center[ ### Any questions from the last tutorial? ] --- # Overview - Linear model + 1 predictor: 'simple regression' + 2+ predictors: 'multiple regression' + Robust models - Next steps --- # lm() Function - We can use the **lm()** function to fit our model, the general structure is as follows: ```r our_model <- lm(outcome ~ predictor(s), data = data, na.action = an action) ``` - Where we sub in the **name** we want to give **our_model**, the **outcome**, the **predictor(s)**, the **data**/**tibble**, & any actions we want to apply to missing data (i.e., **na.action = na.exclude**) <br> **Task**: Fit a linear model to our **peng_data**, using **body_mass_g** as the outcome, and **flipper_length_mm** as our predictor, give it the name **peng_model** - We have some missing data today - make sure to exclude it by using **na.action = na.exclude** <br> -- **Solution**: ```r peng_model <- lm(body_mass_g ~ flipper_length_mm, data = peng_data, na.action = na.exclude) ``` --- # Our Model .pull-left[ To actually see the results, we can use **summary(peng_model)** ```r summary(peng_model) ``` .mini[ ``` ## ## Call: ## lm(formula = body_mass_g ~ flipper_length_mm, data = peng_data, ## na.action = na.exclude) ## ## Residuals: ## Min 1Q Median 3Q Max ## -1058.80 -259.27 -26.88 247.33 1288.69 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -5780.831 305.815 -18.90 <2e-16 *** ## flipper_length_mm 49.686 1.518 32.72 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 394.3 on 340 degrees of freedom ## (2 observations deleted due to missingness) ## Multiple R-squared: 0.759, Adjusted R-squared: 0.7583 ## F-statistic: 1071 on 1 and 340 DF, p-value: < 2.2e-16 ``` ]] .pull-right[ Alternatively, we can use **broom::tidy()** & **broom::glance()** to look at our estimates & model fit ```r broom::tidy(peng_model) ``` .mini[ ``` ## # A tibble: 2 x 5 ## term estimate std.error statistic p.value ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) -5781. 306. -18.9 5.59e- 55 ## 2 flipper_length_mm 49.7 1.52 32.7 4.37e-107 ``` ] <br> ```r broom::glance(peng_model) ``` .mini[ ``` ## # A tibble: 1 x 12 ## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC ## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 0.759 0.758 394. 1071. 4.37e-107 1 -2528. 5063. 5074. ## # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int> ``` ]] --- # Multiple Predictors - We can use the **lm()** function again to fit our model with multiple predictors, the general structure is as follows: ```r our_model <- lm(outcome ~ predictor_1 + predictor_2, data = data, na.action = an action) ``` - Where we sub in the **name** we want to give **our_model**, the **outcome**, the **predictors** with a **+**, the **data**/**tibble**, & any actions we want to apply to missing data (i.e., **na.action = na.exclude**) <br> **Task 1**: Fit a linear model to our **peng_data**, using **body_mass_g** as the outcome, **flipper_length_mm** & **bill_length_mm** as the predictors, give it the name **peng_model_2** - We have some missing data today - make sure to exclude it by using **na.action = na.exclude** <br> **Task 2**: view the results in the same way as before --- **Model solution**: .tiny[ ```r peng_model_2 <- lm(body_mass_g ~ flipper_length_mm + bill_length_mm, data = peng_data, na.action = na.exclude) ``` ] .pull-left[ **Results**: ```r summary(peng_model_2) ``` .mini[ ``` ## ## Call: ## lm(formula = body_mass_g ~ flipper_length_mm + bill_length_mm, ## data = peng_data, na.action = na.exclude) ## ## Residuals: ## Min 1Q Median 3Q Max ## -1090.5 -285.7 -32.1 244.2 1287.5 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -5736.897 307.959 -18.629 <2e-16 *** ## flipper_length_mm 48.145 2.011 23.939 <2e-16 *** ## bill_length_mm 6.047 5.180 1.168 0.244 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 394.1 on 339 degrees of freedom ## (2 observations deleted due to missingness) ## Multiple R-squared: 0.76, Adjusted R-squared: 0.7585 ## F-statistic: 536.6 on 2 and 339 DF, p-value: < 2.2e-16 ``` ]] .pull-right[ <br> ```r broom::tidy(peng_model_2) ``` .mini[ ``` ## # A tibble: 3 x 5 ## term estimate std.error statistic p.value ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) -5737. 308. -18.6 7.80e-54 ## 2 flipper_length_mm 48.1 2.01 23.9 7.56e-75 ## 3 bill_length_mm 6.05 5.18 1.17 2.44e- 1 ``` ] <br> ```r broom::glance(peng_model_2) ``` .mini[ ``` ## # A tibble: 1 x 12 ## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC ## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 0.760 0.759 394. 537. 9.09e-106 2 -2528. 5063. 5079. ## # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int> ``` ]] --- # Robust Models For robust estimates we can use **robust::lmRob()** & **summary()** to see the output - like **lm()**: .tiny[ ```r our_model_rob <- robust::lmRob(outcome ~ predictor_1 + predictor_2, data = data, na.action = an action) summary(our_model_rob) ``` ] - We can then compare these to the original betas to see if our model was biased <br> For robust *p*-values & CIs we can use the **parameters::model_parameters()** function to estimate the model with standard errors designed for heteroscedastic residuals, where we give the name of our **original non-robust model**, set the **robust** argument to be **TRUE**, and the **vcov.type** to be **HC4** ```r parameters::model_parameters(non_robust_model, robust = TRUE, vcov.type = "HC4", digits = 3) ``` <br> **Task**: fit our **peng_model_2** with robust estimates & robust CIs/*p*-values --- # Solutions .mini[ ```r peng_model_2_rob <- robust::lmRob(body_mass_g ~ flipper_length_mm + bill_length_mm, data = peng_data, na.action = na.exclude) summary(peng_model_2_rob) ``` ``` ## ## Call: ## robust::lmRob(formula = body_mass_g ~ flipper_length_mm + bill_length_mm, ## data = peng_data, na.action = na.exclude) ## ## Residuals: ## Min 1Q Median 3Q Max ## -1079.80 -272.20 -18.12 256.65 1299.89 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -5733.310 336.296 -17.048 <2e-16 *** ## flipper_length_mm 47.967 2.183 21.975 <2e-16 *** ## bill_length_mm 6.472 5.582 1.159 0.247 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 386 on 339 degrees of freedom ## Multiple R-Squared: 0.6255 ## ## Test for Bias: ## statistic p-value ## M-estimate 5.902 0.1165 ## LS-estimate 4.885 0.1804 ## 2 observations deleted due to missingness ``` ] ??? For robust parameters/estimates we can use the **robust::lmRob()** function & **summary()** to see the output (just like **lm()**): For robust p values & CIs we can use the **parameters::model_parameters()** function to estimate the model with standard errors designed for heteroscedastic residuals, here we give the name of our **original non-robust model** --- # Solutions ```r parameters::model_parameters(peng_model_2, robust = TRUE, vcov.type = "HC4", digits = 3) ``` ``` ## Parameter | Coefficient | SE | 95% CI | t(339) | p ## ----------------------------------------------------------------------------------- ## (Intercept) | -5736.897 | 293.486 | [-6314.18, -5159.61] | -19.547 | < .001 ## flipper_length_mm | 48.145 | 1.883 | [ 44.44, 51.85] | 25.573 | < .001 ## bill_length_mm | 6.047 | 4.947 | [ -3.68, 15.78] | 1.223 | 0.222 ``` ??? For robust parameters/estimates we can use the **robust::lmRob()** function & **summary()** to see the output (just like **lm()**): For robust p values & CIs we can use the **parameters::model_parameters()** function to estimate the model with standard errors designed for heteroscedastic residuals, here we give the name of our **original non-robust model** --- # Next Steps Next week is 'reading week', feel free to use this to catch up on any tutorials you've missed so far The next session in week 11, is 'pick a test week'! You can complete any of the below discovr tutorials, posting any questions or topic suggestions on [**this padlet**](https://uofsussex.padlet.org/de84/9gba8ktbw8egpqwm) & I'll cover anything you want in week 11 - do make sure to post your questions/comments as early as you can so that I can prepare anything I need to with enough time... **discovr_10** - moderation & mediation **discovr_11** - ANOVA **discovr_12** - ANCOVA **discovr_13** - factorial designs **discovr_14** - repeated measures designs **discovr_15** - mixed designs **discovr_17** - exploratory factor analysis **discovr_20** - logistic regression Paste the below code into the **Console** in RStudio, change it to the **tutorial name** you want to do, and press **Enter** ```r learnr::run_tutorial("tutorial_name", package = "discovr") ```