Visual inference

Lecture 16

Dr. Mine Çetinkaya-Rundel

Duke University
STA 113 - Fall 2023

Warm-up

Announcements

  • Project 2 proposal feedback is delayed; we’ll have it for you by the end of tonight!
  • Who is in class on Tuesday?

Setup

library(tidyverse)
library(tidymodels)
library(openintro)
library(nullabor)  # for visual inference
library(skimr)     # for skimming data 

Now you see me, now you don’t

One of the pictures is a photograph of a painting by Piet Mondrian while the other is a photograph of a drawing made by an IBM 7094 digital computer. Which of the two do you think was done by the computer?

A. M. Noll, “Human or machine: A subjective comparison of Piet Mondrian’s ‘Composition with Lines’ (1917) and a computer-generated picture,” The Psychological Record, vol. 16, pp. 1–10, 1966.

Aside: Who is Piet Mondrian?

Apophenia

What is apophenia?

The tendency to perceive a connection or meaningful pattern between unrelated or random things.

Visualization, statistical inference, visual inference

  • Visualization provides tools to uncover new relationships (tools of curiosity); much of visualization research focuses on making the chance of finding relationships as high as possible
  • Statistical inference provides tools to check whether a relationship really exists (tools of skepticism); most statistics research focuses on minimizing the chance of finding a relationship that does not exist
    • Testing: Is there a difference?
    • Estimation: How big is the difference?
  • Visual inference bridges these two conflicting drives to provide a tool for skepticism that can be applied in a curiosity-driven context
    • Testing: Is what we see really there? That is, is what we see in a plot of the sample an accurate reflection of the entire population?

Hypothesis testing as a court trial

  • Null hypothesis, \(H_0\): Defendant is innocent

  • Alternative hypothesis, \(H_A\): Defendant is guilty

  • Present the evidence: Collect data

  • Judge the evidence: “Could these data plausibly have happened by chance if the null hypothesis were true?”
    • Yes: Fail to reject \(H_0\)
    • No: Reject \(H_0\)

Hypothesis testing framework

  • Start with a null hypothesis, \(H_0\), that represents the status quo

  • Set an alternative hypothesis, \(H_A\), that represents the research question, i.e. what we’re testing for

  • Conduct a hypothesis test under the assumption that the null hypothesis is true and calculate a p-value (probability of observed or more extreme outcome given that the null hypothesis is true)

    • if the test results suggest that the data do not provide convincing evidence for the alternative hypothesis, stick with the null hypothesis
    • if they do, then reject the null hypothesis in favor of the alternative
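
To make the framework concrete, here is a minimal (non-visual) permutation test sketched in R; the example and variable choices are ours for illustration, not from the slides:

# H_0: mpg is unrelated to transmission type (am)
# Test statistic: difference in mean mpg between manual and automatic cars
obs <- with(mtcars, mean(mpg[am == 1]) - mean(mpg[am == 0]))

# Simulate the null by shuffling am and recomputing the statistic
set.seed(1)
nulls <- replicate(5000, {
  shuffled <- sample(mtcars$am)
  with(mtcars, mean(mpg[shuffled == 1]) - mean(mpg[shuffled == 0]))
})

# p-value: proportion of null statistics at least as extreme as observed
mean(abs(nulls) >= abs(obs))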

Potential errors

  • Type 1 error (false positive): Falsely convict an innocent defendant / reject \(H_0\) when you shouldn’t

  • Type 2 error (false negative): Acquit a guilty defendant / fail to reject \(H_0\) when you shouldn’t

  • Costs of these errors vary based on the severity of the consequences

Statistical vs. visual inference

Statistical

  • Test statistic
  • p-value compared to significance level

Visual

  • Plot of the data
  • Human observers picking the real data plot out of a lineup of “innocent” null plots

Visual inference with a lineup

The lineup protocol

  • Plot of the real data is randomly embedded amongst a set of null plots

  • Matrix of plots is known as a lineup

  • Null plots are generated by a method consistent with the null hypothesis

  • The lineup is shown to an observer. If the observer can pick out the real data as different from the others, this lends weight to the statistical significance of the structure in the plot.

  • The lineup() function from the nullabor package returns a set of generated null datasets and the real data embedded randomly among these null datasets

Using nullabor::lineup()

  • Generating null plots:
    • Option 1: Provide a method of generation and let the lineup() function generate them
    • Option 2: Generate the null datasets yourself and pass them to the lineup() function (see the sketch after this list)
  • Position of the real dataset is hidden by default
    • Use decrypt() to find out which plot shows the real dataset
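
A minimal sketch of Option 2 done by hand: we build the null datasets ourselves, tag each with a .sample panel number, and slip the real data into a random panel. The make_null() helper is hypothetical (ours, not nullabor’s), and the tidyverse is assumed loaded from the Setup chunk.

set.seed(1234)

# Hypothetical helper: one hand-made null dataset, tagged with its panel number
make_null <- function(df, i) {
  df |> mutate(mpg = sample(mpg), .sample = i)  # permute mpg ourselves
}

pos <- sample(20, 1)  # the panel that will hold the real data
panels <- map(1:20, \(i) {
  if (i == pos) mtcars |> mutate(.sample = i) else make_null(mtcars, i)
}) |> list_rbind()

ggplot(panels, aes(x = mpg, y = wt)) +
  geom_point() +
  facet_wrap(~ .sample)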

Will the real mtcars please stand up?

The making of the lineup I

Step 1. Permute the data

set.seed(20211029)
mtcars_permuted_lineup <- lineup(null_permute("mpg"), mtcars)

The making of the lineup II

Step 2. Peek at the permuted data

head(mtcars_permuted_lineup)
      mpg cyl disp  hp drat    wt  qsec vs am gear carb .sample
...1 10.4   6  160 110 3.90 2.620 16.46  0  1    4    4       1
...2 21.4   6  160 110 3.90 2.875 17.02  0  1    4    4       1
...3 14.7   4  108  93 3.85 2.320 18.61  1  1    4    1       1
...4 15.5   6  258 110 3.08 3.215 19.44  1  0    3    1       1
...5 30.4   8  360 175 3.15 3.440 17.02  0  0    3    2       1
...6 15.0   6  225 105 2.76 3.460 20.22  1  0    3    1       1

The making of the lineup II

Step 2. Peek at the permuted data

n = 20 by default

mtcars_permuted_lineup |>
  count(.sample)
   .sample  n
1        1 32
2        2 32
3        3 32
4        4 32
5        5 32
6        6 32
7        7 32
8        8 32
9        9 32
10      10 32
11      11 32
12      12 32
13      13 32
14      14 32
15      15 32
16      16 32
17      17 32
18      18 32
19      19 32
20      20 32

The making of the lineup III

Step 3. Plot the permutations

ggplot(data = mtcars_permuted_lineup, aes(x = mpg, y = wt)) +
  geom_point() +
  facet_wrap(~ .sample)

Decrypt the lineup

decrypt("K4Ul YG3G R8 oSJR3RS8 g6")
[1] "True data in position  16"

Visual inference with a Rorschach

The Rorschach protocol

  • The Rorschach protocol is used to calibrate the eyes for variation due to sampling
  • The plots generated correspond to null datasets, i.e., data consistent with the null hypothesis
  • The rorschach() function returns a set of null plots, which are shown to observers to calibrate their eyes to sampling variation

Using nullabor::rorschach()

  • Generating null plots: Provide a method of generation and let the rorschach() function generate them

  • Provide the true data set

  • Set n, total number of samples to generate (n = 20 by default)

  • Set p, probability of including the true data with the null data (p = 0 by default); see the sketch below
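
For example, a quick sketch with non-default settings (the values are illustrative only):

set.seed(1)
# Five null panels, with a 10% chance the real data is slipped in among them
mtcars_calibrate <- rorschach(null_permute("mpg"), mtcars, n = 5, p = 0.1)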

Train your eyes to spot the real mtcars

The making of the rorschach I

Step 1. Permute the data

set.seed(20211029)
mtcars_permuted_rorschach <- rorschach(
  null_permute("mpg"), mtcars
)

The making of the rorschach II

Step 2. Peek at the permuted data

head(mtcars_permuted_rorschach)
   mpg cyl disp  hp drat    wt  qsec vs am gear carb .sample
1 10.4   6  160 110 3.90 2.620 16.46  0  1    4    4       1
2 21.4   6  160 110 3.90 2.875 17.02  0  1    4    4       1
3 14.7   4  108  93 3.85 2.320 18.61  1  1    4    1       1
4 15.5   6  258 110 3.08 3.215 19.44  1  0    3    1       1
5 30.4   8  360 175 3.15 3.440 17.02  0  0    3    2       1
6 15.0   6  225 105 2.76 3.460 20.22  1  0    3    1       1

The making of the rorschach II

Step 2. Peek at the permuted data

n = 20 by default

mtcars_permuted_rorschach |>
  count(.sample)
   .sample  n
1        1 32
2        2 32
3        3 32
4        4 32
5        5 32
6        6 32
7        7 32
8        8 32
9        9 32
10      10 32
11      11 32
12      12 32
13      13 32
14      14 32
15      15 32
16      16 32
17      17 32
18      18 32
19      19 32
20      20 32

The making of the rorschach III

Step 3. Plot the permutations

ggplot(data = mtcars_permuted_rorschach, aes(x = mpg, y = wt)) +
  geom_point() +
  facet_wrap(~ .sample)

Decrypt the rorschach

  • In this particular case there’s nothing to decrypt since p (probability of including the true data with the null data) is set to 0

  • If p is greater than 0 and the true data happens to be included, you get a decryption key

set.seed(12345)
mtcars_permuted_rorschach <- rorschach(null_permute("mpg"), mtcars, p = 0.5)
decrypt("K4Ul YG3G R8 oSJR3RS8 gI")
[1] "True data in position  15"

Generating the null data

Generating the null data

  • By permuting a variable (what we’ve done so far with mtcars)
  • With a specific distribution
  • With null residuals from a model
  • With null data outside of nullabor (we won’t get into this, but see here for more)

Generate null data with a specific distribution

  • The null_dist() function takes as input a variable name from the data and a particular distribution
  • This variable in the data is replaced by random draws from the given distribution
  • The available distributions include beta, cauchy, chi-squared, exponential, f, gamma, geometric, log-normal, lognormal, logistic, negative binomial, normal, poisson, t, and weibull
  • Parameters of the distributions are estimated from the given data (default) or can be provided as a list (see the sketch below)
  • null_dist() returns a function that, given the data, generates a null dataset
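
A sketch of both options from the last two bullets; the explicit parameter values below are made up for illustration, and the params argument name is per nullabor’s documentation (check ?null_dist if your version differs):

# Parameters estimated from the data (default)
null_height <- null_dist("hgt", dist = "normal")

# Parameters supplied explicitly as a list (illustrative values)
null_height_fixed <- null_dist("hgt", dist = "normal",
                               params = list(mean = 171, sd = 9))

bdims_lineup_fixed <- lineup(null_height_fixed, true = bdims, n = 10)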

Case study: Heights of adults

The following histogram shows the distribution of heights of 507 physically active individuals (openintro::bdims$hgt). Do the heights of these individuals follow a normal distribution?

Spot the real data

Which of the following is the plot of the real data? (Note: A different binwidth than the previous plot is used.)

Code: Generate null data with a specific distribution

Generate the null distribution

set.seed(20211029)

bdims_permuted_lineup <- lineup(null_dist("hgt", dist = "normal"), n = 10, true = bdims)

Code: Decrypt

decrypt("K4Ul YG3G R8 oSJR3RS8 Ng")
[1] "True data in position  2"

Generating the null data with residuals from a model

  • null_lm() takes as input a model specification formula, as defined by lm(), and a method for generating null residuals from the model

  • Three built-in (and valid) methods for generating null data when fitting a linear model (sketched after this list):

    • method = "pboot"
    • method = "boot"
    • method = "rotate"
  • null_lm() returns a function that, given the data, generates a null dataset
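
For example, the same model specification paired with each method (a sketch; which method is appropriate depends on the model and the inference you are after):

# Null residuals from a parametric bootstrap
method_pboot  <- null_lm(log(Volume) ~ log(Girth) + log(Height), method = "pboot")

# Null residuals from resampling (bootstrapping) the residuals
method_boot   <- null_lm(log(Volume) ~ log(Girth) + log(Height), method = "boot")

# Null residuals from residual rotation
method_rotate <- null_lm(log(Volume) ~ log(Girth) + log(Height), method = "rotate")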

Case study: Black cherry trees

Data measures the diameter, height and volume of timber in 31 felled black cherry trees (datasets::trees)

Summary

skim(datasets::trees)
── Data Summary ────────────────────────
                           Values         
Name                       datasets::trees
Number of rows             31             
Number of columns          3              
_______________________                   
Column type frequency:                    
  numeric                  3              
________________________                  
Group variables            None           

── Variable type: numeric ──────────────────────────────
  skim_variable n_missing complete_rate mean    sd   p0  p25  p50  p75 p100 hist 
1 Girth                 0             1 13.2  3.14  8.3 11.0 12.9 15.2 20.6 ▃▇▃▅▁
2 Height                0             1 76    6.37 63   72   76   80   87   ▃▃▆▇▃
3 Volume                0             1 30.2 16.4  10.2 19.4 24.2 37.3 77   ▇▅▁▂▁

Case study: Black cherry trees

Plot

Fit the model

trees_fit <- lm(log(Volume) ~ log(Girth) + log(Height), 
                data = datasets::trees)

tidy(trees_fit)
# A tibble: 3 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    -6.63    0.800      -8.29 5.06e- 9
2 log(Girth)      1.98    0.0750     26.4  2.42e-21
3 log(Height)     1.12    0.204       5.46 7.81e- 6

Augment the model

trees_aug <- as_tibble(datasets::trees) |>
  mutate(
    .resid = residuals(trees_fit),
    .fitted = fitted(trees_fit)
  )

trees_aug
# A tibble: 31 × 5
   Girth Height Volume   .resid .fitted
   <dbl>  <dbl>  <dbl>    <dbl>   <dbl>
 1   8.3     70   10.3  0.0219     2.31
 2   8.6     65   10.3  0.0343     2.30
 3   8.8     63   10.2  0.0138     2.31
 4  10.5     72   16.4 -0.0106     2.81
 5  10.7     81   18.8 -0.0430     2.98
 6  10.8     83   19.7 -0.0420     3.02
 7  11       66   15.6 -0.0557     2.80
 8  11       75   18.2 -0.0443     2.95
 9  11.1     80   22.6  0.0822     3.04
10  11.2     75   19.9  0.00926    2.98
# ℹ 21 more rows

Test

  • Hypotheses:
    • \(H_0\): Errors are \(NID(0, \sigma^2)\)
    • \(H_A\): Errors are not \(NID(0, \sigma^2)\)
  • Visual statistic: Residuals plot (residuals vs. fitted)
  • Null distributions: Generate residuals from random draws from \(N(0, \hat{\sigma}^2)\) using the nullabor package
method <- null_lm(log(Volume) ~ log(Girth) + log(Height),
                  method = "pboot")
  • Compare the visual statistic from the data to the null distributions using a lineup
set.seed(2020)
trees_lineup <- lineup(method, true = trees_aug, n = 10)

Lineup

Which one is the real residuals plot?

Decrypt

decrypt("K4Ul YG3G R8 oSJR3RS8 Nq")
[1] "True data in position  3"

The visual inference test

The visual inference test

  • Notation:
    • \(n\): number of independent participants
    • \(X\): number of participants who detect the data plot
    • \(x\): observed value of \(X\)
    • \(m\): number of plots
    • \(p\): the probability of selecting the data plot
  • Hypotheses:
    • \(H_0: p = 1 / m\) - The probability of the data plot being selected is \(1 / m\) (same as all other plots)
    • \(H_A: p > 1 / m\) - The probability of the data plot being selected is greater than \(1 / m\)
  • Under \(H_0\), \(X \sim B(n, 1/m)\). Then the visual inference p-value is:

\[ P(X \ge x) = 1 - P(X \le x - 1) = \sum_{k = x}^n {n \choose k} \frac{(m - 1)^{n - k}}{m^n} \]

  • In R: \(P(X \le x)\) = pbinom(x, n, p), so the visual inference p-value is 1 - pbinom(x - 1, n, 1/m) (sketched below)
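
A quick sketch checking the p-value formula against pbinom(), with illustrative values (x = 5 detections among n = 12 observers and m = 20 panels):

x <- 5; n <- 12; m <- 20

# Upper tail via pbinom(): P(X >= x) = 1 - P(X <= x - 1)
1 - pbinom(x - 1, n, 1/m)

# The same sum written out term by term
k <- x:n
sum(choose(n, k) * (m - 1)^(n - k) / m^n)

Both lines print the same value.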

Calculating p-values

We did three lineup tests today, where

  • \(x\) is the number of people who spotted the real data and
  • \(n\) is the number of people voting

Let’s calculate the p-values

  • Spot the real mtcars (m = 20 plots)
1 - pbinom(4, 12, 1/20)
  • Spot the real bdims$hgt (m = 10 plots)
1 - pbinom(0, 12, 1/10)
  • Spot the real residuals of the trees model (m = 10 plots)
1 - pbinom(0, 12, 1/10)

Acknowledgements