Presentation ready plots I: Telling a story

Lecture 9

Dr. Mine Çetinkaya-Rundel

Duke University
STA 113 - Fall 2023

Warm up

Peer review

Reviewer               Reviewee
Coding Clowns          Just Make Some Noise
Just Make Some Noise   Stats Slayers
Stats Slayers          Coding Clowns

Telling a story

Multiple ways of telling a story

  • Sequential plots: Motivation, then resolution

  • A single plot: Resolution, with the motivation hidden in it

Project note: you’re asked to create two plots for your question. One possible approach: Start with a plot showing the raw data, and show derived quantities (e.g. percent increases, averages, coefficients of fitted models) in the subsequent plot.
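
One possible sketch of this raw-then-derived approach, using the built-in mtcars data as a stand-in for project data. The variables and the choice of averages as the derived quantity are illustrative assumptions, not part of the project instructions.

```r
library(tidyverse)

# Plot 1: the raw data, one point per car
p_raw <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_jitter(width = 0.1, height = 0) +
  labs(x = "Number of cylinders", y = "Miles per gallon")

# Derived quantity: average mpg for each cylinder count
mpg_by_cyl <- mtcars |>
  group_by(cyl) |>
  summarize(mean_mpg = mean(mpg))

# Plot 2: the derived averages, shown after the raw data
p_derived <- ggplot(mpg_by_cyl, aes(x = factor(cyl), y = mean_mpg)) +
  geom_col() +
  labs(x = "Number of cylinders", y = "Average miles per gallon")
```

Showing p_raw first lets the audience see the spread before p_derived summarizes it.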

Simplicity vs. complexity

When you’re trying to show too much data at once, you may end up not showing anything.

  • Never assume your audience can rapidly process complex visual displays

  • Don’t add variables to your plot that are tangential to your story

  • Don’t jump straight to a highly complex figure; first show an easily digestible subset (e.g., show one facet first)

  • Aim for memorable, but clear

Project note: Make sure to leave time to iterate on your plots after you practice your presentation. If certain plots are getting too wordy to explain, take time to simplify them!

Consistency vs. repetitiveness

Be consistent but don’t be repetitive.

  • Use consistent features throughout plots (e.g., same color represents same level on all plots)

  • Aim to use a different type of visualization for each distinct analysis
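
A minimal sketch of the consistent-color advice, assuming the cost-category data used later in this lecture. The palette name category_colors and the hex codes are hypothetical choices. The idea is to define one named color vector and reuse it in every plot, so a category keeps its color even when a plot shows only a subset.

```r
library(tidyverse)

d <- tribble(
  ~category,                      ~value,
  "Cutting tools",                0.03,
  "Buildings and administration", 0.22,
  "Labor",                        0.31,
  "Machinery",                    0.27,
  "Workplace materials",          0.17
)

# Hypothetical palette: one named color per category, defined once
category_colors <- c(
  "Cutting tools"                = "#1b9e77",
  "Buildings and administration" = "#d95f02",
  "Labor"                        = "#7570b3",
  "Machinery"                    = "#e7298a",
  "Workplace materials"          = "#66a61e"
)

# Plot with all categories
p_all <- ggplot(d, aes(x = value, y = category, fill = category)) +
  geom_col(show.legend = FALSE) +
  scale_fill_manual(values = category_colors)

# Plot with a subset: colors are matched by name, so they stay the same
p_subset <- d |>
  filter(value > 0.2) |>
  ggplot(aes(x = value, y = category, fill = category)) +
  geom_col(show.legend = FALSE) +
  scale_fill_manual(values = category_colors)
```

Because the vector is named, scale_fill_manual() matches colors to categories by name rather than by position.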

Project note: If possible, ask a friend who is not in the class to listen to your presentation and then ask them what they remember. Then, ask yourself: is that what you wanted them to remember?

Designing effective visualizations

Packages and data

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggrepel)
library(ggthemes)
d <- tribble(
  ~category,                     ~value,
  "Cutting tools"                , 0.03,
  "Buildings and administration" , 0.22,
  "Labor"                        , 0.31,
  "Machinery"                    , 0.27,
  "Workplace materials"          , 0.17
)
d
# A tibble: 5 × 2
  category                     value
  <chr>                        <dbl>
1 Cutting tools                 0.03
2 Buildings and administration  0.22
3 Labor                         0.31
4 Machinery                     0.27
5 Workplace materials           0.17

Keep it simple

Judging relative area

Use color to draw attention
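
One way this could look in code, assuming the story is about the largest category. The highlight column, the choice of "Labor", and the two colors are illustrative assumptions.

```r
library(tidyverse)

d <- tribble(
  ~category,                      ~value,
  "Cutting tools",                0.03,
  "Buildings and administration", 0.22,
  "Labor",                        0.31,
  "Machinery",                    0.27,
  "Workplace materials",          0.17
)

# Flag the one category the story is about (assumed here to be Labor)
p_highlight <- d |>
  mutate(highlight = category == "Labor") |>
  ggplot(aes(x = value, y = category, fill = highlight)) +
  geom_col(show.legend = FALSE) +
  # Muted gray for context bars, a saturated color for the focal bar
  scale_fill_manual(values = c("TRUE" = "firebrick", "FALSE" = "gray70"))
```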



Play with themes for a non-standard look

Go beyond ggplot2 themes – ggthemes

Tell a story

Leave out non-story details

Order matters
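
A minimal sketch of ordering bars by value instead of alphabetically, using fct_reorder() from forcats. Sorting by value is an assumption about what suits this data. For categories with a natural order, that order may be the better choice.

```r
library(tidyverse)

d <- tribble(
  ~category,                      ~value,
  "Cutting tools",                0.03,
  "Buildings and administration", 0.22,
  "Labor",                        0.31,
  "Machinery",                    0.27,
  "Workplace materials",          0.17
)

# Reorder the factor levels of category by value (ascending)
d_ordered <- d |>
  mutate(category = fct_reorder(category, value))

# With category on the y-axis, the largest value ends up at the top
p_ordered <- ggplot(d_ordered, aes(x = value, y = category)) +
  geom_col()
```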

Clearly indicate missing data

Reduce cognitive load

Use descriptive titles

Annotate figures

Plot sizing and layout

Sample plots

p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)

p_text <- mtcars |>
  rownames_to_column() |>
  ggplot(aes(x = disp, y = mpg)) +
  geom_text_repel(aes(label = rowname)) +
  coord_cartesian(clip = "off")

Small fig-width

For a zoomed-in look

```{r}
#| fig-width: 3
#| fig-asp: 0.618

p_hist
```

Large fig-width

For a zoomed-out look

```{r}
#| fig-width: 10
#| fig-asp: 0.618

p_hist
```

fig-width affects text size

Multiple plots on a slide

First, ask yourself, must you include multiple plots on a slide? For example, is your narrative about comparing results from two plots?

  • If no, then don’t! Move the second plot to the next slide!

  • If yes, use columns and sequential reveal.

Project workflow overview

Demo

  • Rendering individual documents
  • Write-up:
    • Cross referencing
    • Citations
  • Website:
    • Rendering site
    • Making sure your website reflects your latest changes
    • Customizing the look of your website (optional)

Exam 1

Take home exam common issues

  • Do not use absolute paths to load data; use relative paths, e.g. "data/tv.csv" not "/cloud/project/data/tv.csv".
  • The tidyverse package already loads nine packages with it: ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats, lubridate. No need to load these with library() after the tidyverse is loaded.
  • Similarly, no need for readr::read_csv() after tidyverse is loaded; you can just do read_csv().
  • Pay attention to code style and indentation.
  • Do not load data and packages for each question; once per document is sufficient.
  • Don't Repeat Yourself (DRY): If using the same data in further analysis (subsequent questions), prep the data (filter) once, save it, and use that in subsequent analysis.
  • If using count() to create a frequency table, the resulting n column is already numeric; no need to convert it to numeric again.
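
A short sketch of several of these points together, using the built-in mtcars data as a stand-in for the exam data. The object names cars_manual and cyl_counts are hypothetical.

```r
# Load packages once, at the top of the document
library(tidyverse)

# Prep the data once and save it, instead of repeating the filter per question
cars_manual <- mtcars |>
  filter(am == 1)

# Reuse the saved data: frequency table of cylinder counts
cyl_counts <- cars_manual |>
  count(cyl)

# The n column from count() is already numeric (an integer vector)
is.numeric(cyl_counts$n)

# Reuse the same saved data in a later question
cars_manual |>
  summarize(mean_mpg = mean(mpg))
```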

Take home exam redo (optional)

  • Due: Friday, Oct 13 at 1 pm
  • To resubmit, you must ask me to reopen your exam repo by messaging me on Slack by end of class on Thursday.
  • Work in exam-1-redo.qmd. This is a copy of your exam submission, without any changes I might have implemented to get it to render. Do not overwrite exam-1.qmd.
  • Improve your answers working on your own. The same rules as the exam apply.
  • You will be eligible to receive up to 50% of the points you missed on the take home portion of the exam.
  • There is no in-class exam redo.