Colors

Lecture 18

Dr. Mine Çetinkaya-Rundel

Duke University
STA 113 - Fall 2023

Warm-up

Announcements

  • Any project questions?
  • Any specific requests for tips to cover on Tuesday?

Setup

library(countdown)
library(tidyverse)
library(scales)
library(ggthemes)
library(cowplot)
library(colorspace)
library(ggrepel)

theme_set(theme_gray(14)) # 16 for full width, 18 for half width 

Color scales

Uses of color in data visualization

  1. Distinguish categories (qualitative)

Qualitative scale example

Palette name: Okabe-Ito

Qualitative scale example

Palette name: ColorBrewer Set1

Qualitative scale example

Palette name: ColorBrewer Set3

Uses of color in data visualization

  1. Distinguish categories (qualitative)
  2. Represent numeric values (sequential)

Sequential scale example

Palette name: Viridis

Sequential scale example

Palette name: Inferno

Sequential scale example

Palette name: Cividis

Uses of color in data visualization

  1. Distinguish categories (qualitative)
  2. Represent numeric values (sequential)
  3. Represent numeric values (diverging)

Diverging scale example

Palette name: ColorBrewer PiYG

Diverging scale example

Palette name: Carto Earth

Diverging scale example

Palette name: Blue-Red

Uses of color in data visualization

  1. Distinguish categories (qualitative)
  2. Represent numeric values (sequential)
  3. Represent numeric values (diverging)
  4. Highlight

Highlight example

Palette name: Grays with accents

Highlight example

Palette name: Okabe-Ito accent

Highlight example

Palette name: ColorBrewer accent
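
A minimal sketch of the highlight use case, assuming a small made-up data frame (`df`, its values, and the choice of group "B" are hypothetical, not the lecture's data). Every group is drawn in gray except the one to emphasize, which gets an Okabe-Ito accent color.

```r
library(ggplot2)

# Hypothetical toy data, not from the lecture
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 4, 3, 6, 5, 7),
  group = c("A", "A", "B", "B", "C", "C")
)

# Gray for the background groups, one accent color for the highlighted group
highlight_colors <- c(A = "gray70", B = "#D55E00", C = "gray70")

p_highlight <- ggplot(df, aes(x = x, y = y, color = group)) +
  geom_point(size = 3) +
  scale_color_manual(values = highlight_colors)

p_highlight
```

The named vector keeps the mapping explicit, so changing which group is highlighted only means moving the accent color to another name.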

Color scales in ggplot2

temps_months data

Getting the temps_months data:

temps_months <- read_csv("data/tempnormals.csv") |>
  group_by(location, month_name) |>
  summarize(mean = mean(temperature)) |>
  mutate(
    month = factor(
      month_name,
      levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                 "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
    ),
    location = factor(
      location, levels = c("Death Valley", "Houston", "San Diego", "Chicago")
    )
  ) |>
  select(-month_name)

temps_months data

The temps_months data:

temps_months
# A tibble: 48 × 3
# Groups:   location [4]
   location  mean month
   <fct>    <dbl> <fct>
 1 Chicago   50.4 Apr  
 2 Chicago   74.1 Aug  
 3 Chicago   29   Dec  
 4 Chicago   28.9 Feb  
 5 Chicago   24.8 Jan  
 6 Chicago   75.8 Jul  
 7 Chicago   71.0 Jun  
 8 Chicago   38.8 Mar  
 9 Chicago   60.9 May  
10 Chicago   41.6 Nov  
# ℹ 38 more rows

popgrowth data

Getting the popgrowth data:

US_census <- read_csv("data/US_census.csv")
US_regions <- read_csv("data/US_regions.csv")
popgrowth <- left_join(US_census, US_regions) |>
  group_by(region, division, state) |>
  summarize(
    pop2000 = sum(pop2000, na.rm = TRUE),
    pop2010 = sum(pop2010, na.rm = TRUE),
    popgrowth = (pop2010 - pop2000) / pop2000,
    .groups = "drop"
  ) |>
  mutate(
    region = factor(region, levels = c("West", "South", "Midwest", "Northeast"))
  )

popgrowth data

The popgrowth data:

popgrowth
# A tibble: 51 × 6
   region  division           state      pop2000  pop2010 popgrowth
   <fct>   <chr>              <chr>        <dbl>    <dbl>     <dbl>
 1 Midwest East North Central Illinois  12419293 12830632   0.0331 
 2 Midwest East North Central Indiana    6080485  6483802   0.0663 
 3 Midwest East North Central Michigan   9938444  9883640  -0.00551
 4 Midwest East North Central Ohio      11353140 11536504   0.0162 
 5 Midwest East North Central Wisconsin  5363675  5686986   0.0603 
 6 Midwest West North Central Iowa       2926324  3046355   0.0410 
 7 Midwest West North Central Kansas     2688418  2853118   0.0613 
 8 Midwest West North Central Minnesota  4919479  5303925   0.0781 
 9 Midwest West North Central Missouri   5595211  5988927   0.0704 
10 Midwest West North Central Nebraska   1711263  1826341   0.0672 
# ℹ 41 more rows

ggplot2 color scale functions

Scale function             Aesthetic   Data type    Palette type
scale_color_hue()          color       discrete     qualitative
scale_fill_hue()           fill        discrete     qualitative
scale_color_gradient()     color       continuous   sequential
scale_color_gradient2()    color       continuous   diverging
scale_fill_viridis_c()     fill        continuous   sequential
scale_fill_viridis_d()     fill        discrete     sequential
scale_color_brewer()       color       discrete     qualitative, diverging, sequential
scale_fill_brewer()        fill        discrete     qualitative, diverging, sequential
scale_color_distiller()    color       continuous   qualitative, diverging, sequential

… and there are many, many more
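
The deck's examples cover sequential scales only, so here is a rough sketch of a diverging one with `scale_fill_gradient2()`. The `growth` data frame and the chosen end colors are assumptions made up for illustration. The `midpoint` argument anchors the neutral color at zero.

```r
library(ggplot2)

# Hypothetical growth rates centered around zero (not the popgrowth data)
growth <- data.frame(
  state = c("A", "B", "C", "D", "E"),
  rate = c(-0.10, -0.05, 0, 0.05, 0.10)
)

p_div <- ggplot(growth, aes(x = state, y = 1, fill = rate)) +
  geom_tile() +
  # midpoint = 0 places the neutral "mid" color at zero growth
  scale_fill_gradient2(
    low = "#0072B2", mid = "white", high = "#D55E00", midpoint = 0
  )

p_div
```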

Examples

ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile(width = 0.95, height = 0.95) + 
  coord_fixed(expand = FALSE) +
  theme_classic()
  # no fill scale defined, default is scale_fill_gradient()

Examples

ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile(width = 0.95, height = 0.95) + 
  coord_fixed(expand = FALSE) +
  theme_classic() +
  scale_fill_gradient()

Examples

ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile(width = 0.95, height = 0.95) + 
  coord_fixed(expand = FALSE) +
  theme_classic() +
  scale_fill_viridis_c()

Examples

ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile(width = 0.95, height = 0.95) + 
  coord_fixed(expand = FALSE) +
  theme_classic() +
  scale_fill_viridis_c(option = "B", begin = 0.15)

Examples

ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile(width = 0.95, height = 0.95) + 
  coord_fixed(expand = FALSE) +
  theme_classic() +
  scale_fill_distiller(palette = "YlGnBu")

Color scales in the colorspace package

The colorspace package creates some order

Scale name: scale_<aesthetic>_<datatype>_<colorscale>()

  • <aesthetic>: name of the aesthetic (fill, color, colour)
  • <datatype>: type of variable plotted (discrete, continuous, binned)
  • <colorscale>: type of the color scale (qualitative, sequential, diverging, divergingx)

Scale function                         Aesthetic   Data type    Palette type
scale_color_discrete_qualitative()     color       discrete     qualitative
scale_fill_continuous_sequential()     fill        continuous   sequential
scale_colour_continuous_divergingx()   colour      continuous   diverging
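
To see the naming scheme in action, the sketch below builds two scale names from the same pattern. The `toy` data frame is hypothetical, and the palette names "Dark 3" and "Blue-Red" are picked only as examples of colorspace's built-in HCL palettes.

```r
library(ggplot2)
library(colorspace)

# Hypothetical toy data, not from the lecture
toy <- data.frame(
  x = 1:4,
  y = c(3, 1, 4, 2),
  group = c("a", "b", "c", "d"),
  value = c(-2, -1, 1, 2)
)

# color + discrete + qualitative
p_qual <- ggplot(toy, aes(x = x, y = y, color = group)) +
  geom_point(size = 3) +
  scale_color_discrete_qualitative(palette = "Dark 3")

# color + continuous + diverging
p_diverging <- ggplot(toy, aes(x = x, y = y, color = value)) +
  geom_point(size = 3) +
  scale_color_continuous_diverging(palette = "Blue-Red")

p_qual
p_diverging
```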

Examples

ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile(width = 0.95, height = 0.95) + 
  coord_fixed(expand = FALSE) +
  theme_classic() +
  scale_fill_continuous_sequential(palette = "YlGnBu", rev = FALSE)

Examples

ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile(width = 0.95, height = 0.95) + 
  coord_fixed(expand = FALSE) +
  theme_classic() +
  scale_fill_continuous_sequential(palette = "Viridis", rev = FALSE)

Examples

ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile(width = 0.95, height = 0.95) + 
  coord_fixed(expand = FALSE) +
  theme_classic() +
  scale_fill_continuous_sequential(palette = "Inferno", begin = 0.15, rev = FALSE)

HCL (Hue-Chroma-Luminance) palettes: Sequential

colorspace::hcl_palettes(type = "sequential", plot = TRUE)

HCL palettes: Diverging

colorspace::hcl_palettes(type = "diverging", plot = TRUE, n = 9)

HCL palettes: Divergingx

colorspace::divergingx_palettes(plot = TRUE, n = 9)
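
The palette overviews above only show the colors. As a small sketch, assuming the palette names "Viridis", "Blue-Red", and "Earth" that appear earlier in the deck, the hex codes behind a palette can be pulled out with the matching `*_hcl()` function.

```r
library(colorspace)

# Seven colors from one palette of each type
seq_cols  <- sequential_hcl(7, palette = "Viridis")
div_cols  <- diverging_hcl(7, palette = "Blue-Red")
divx_cols <- divergingx_hcl(7, palette = "Earth")

# Print the hex codes
seq_cols
div_cols
divx_cols

# Show the three palettes side by side
swatchplot(
  "Sequential: Viridis"  = seq_cols,
  "Diverging: Blue-Red"  = div_cols,
  "Divergingx: Earth"    = divx_cols
)
```

These vectors can be passed to `scale_color_manual()` or `scale_fill_manual()` when a fixed set of colors is needed.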

Setting colors manually

Discrete, qualitative scales are best set manually

ggplot(popgrowth, aes(x = pop2000, y = popgrowth, color = region)) +
  geom_point() +
  scale_x_log10()
  # no color scale defined, default is scale_color_hue()

Discrete, qualitative scales are best set manually

ggplot(popgrowth, aes(x = pop2000, y = popgrowth, color = region)) +
  geom_point() +
  scale_x_log10() +
  scale_color_hue()

Discrete, qualitative scales are best set manually

library(ggthemes)  # for scale_color_colorblind()

ggplot(popgrowth, aes(x = pop2000, y = popgrowth, color = region)) +
  geom_point() +
  scale_x_log10() +
  scale_color_colorblind()  # uses Okabe-Ito colors

Discrete, qualitative scales are best set manually

ggplot(popgrowth, aes(x = pop2000, y = popgrowth, color = region)) +
  geom_point() +
  scale_x_log10() +
  scale_color_manual(
    values = c(
      West = "#E69F00", South = "#56B4E9",
      Midwest = "#009E73", Northeast = "#F0E442"
    )
  )

Okabe-Ito RGB codes

Name             Hex code   R, G, B (0-255)
orange           #E69F00    230, 159, 0
sky blue         #56B4E9    86, 180, 233
bluish green     #009E73    0, 158, 115
yellow           #F0E442    240, 228, 66
blue             #0072B2    0, 114, 178
vermilion        #D55E00    213, 94, 0
reddish purple   #CC79A7    204, 121, 167
black            #000000    0, 0, 0
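
The R, G, B column can be reproduced from the hex codes with base R's `col2rgb()`. This is only a quick check; the vector name `okabe_ito` is a name chosen here, not an object defined in the lecture.

```r
# Okabe-Ito hex codes, named as in the table above
okabe_ito <- c(
  "orange"         = "#E69F00",
  "sky blue"       = "#56B4E9",
  "bluish green"   = "#009E73",
  "yellow"         = "#F0E442",
  "blue"           = "#0072B2",
  "vermilion"      = "#D55E00",
  "reddish purple" = "#CC79A7",
  "black"          = "#000000"
)

# col2rgb() returns one column per color; transpose to get one row per color
rgb_table <- t(col2rgb(okabe_ito))
rgb_table
```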

A few considerations when choosing colors

1. Avoid high chroma

High chroma: Toys

Low chroma: “Elegance”
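
One way to see the difference is to hold hue and luminance fixed and change only chroma. The sketch below uses base R's `hcl()`; the hue, chroma, and luminance values are arbitrary choices for illustration, not values from the lecture.

```r
library(colorspace)  # only for swatchplot()

# Same four hues and the same luminance, different chroma
hues <- c(0, 90, 180, 270)
high_chroma <- hcl(h = hues, c = 100, l = 65)
low_chroma  <- hcl(h = hues, c = 40, l = 65)

high_chroma
low_chroma

# Compare the two sets side by side
swatchplot("High chroma" = high_chroma, "Low chroma" = low_chroma)
```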

2. Be aware of color-vision deficiency

5%–8% of men are color blind!

Red-green color-vision deficiency is the most common

Blue-green color-vision deficiency is rare but does occur

Choose colors that can be distinguished with CVD

Consider using the Okabe-Ito scale

CVD is worse for thin lines and tiny dots

When in doubt, run CVD simulations
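
One possible way to run such a simulation is with the colorspace functions `deutan()`, `protan()`, and `tritan()`, which return the colors as they might appear under each type of CVD. The sketch assumes the four Okabe-Ito colors used for the regions earlier; the object names are made up here.

```r
library(colorspace)

# The four Okabe-Ito colors used for the regions
region_colors <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442")

# Simulate how the colors appear under each type of CVD
cvd_sim <- list(
  original = region_colors,
  deutan   = deutan(region_colors),
  protan   = protan(region_colors),
  tritan   = tritan(region_colors)
)

# One row of swatches per condition
swatchplot(cvd_sim)
```

If two swatches in the same row look alike, those colors are likely to be confused under that condition.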

Further reading