HW 1 - Anti-LGBTQ+ contributions
This homework is due Thursday, Sep 14 at 5:00 pm ET.
Getting Started
Go to the sta113-f23 organization on GitHub. Click on the repo with the prefix hw-1. It contains the starter documents you need to complete the homework assignment. Clone the repo and start a new project in Posit Cloud.
Packages
library(tidyverse)
Guidelines + tips
As we’ve discussed in lecture, your plots should include an informative title, axes should be labeled, and careful consideration should be given to aesthetic choices.
Remember that continuing to develop a sound workflow for reproducible data analysis is important as you complete this homework and other assignments in this course. There will be periodic reminders in this assignment to render, commit, and push your changes to GitHub. You should have at least 3 commits with meaningful commit messages by the end of the assignment.
Note: Do not let R output answer the question for you unless the question specifically asks for just a plot. For example, if the question asks for the number of columns in the data set, please type out the number of columns. You may lose points if you do not.
Workflow + formatting
Make sure to
- Update author name on your document.
- Label all code chunks informatively and concisely.
- Follow the Tidyverse code style guidelines.
- Make at least 3 commits.
- Resize figures where needed; avoid tiny or huge plots.
- Use informative titles and axis labels.
- Turn in an organized, well-formatted document.
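As a rough illustration of the labeling and figure-sizing items above, here is a sketch of a chunk body that sets its label and figure size with `#|` option comments. It assumes the document is rendered with Quarto or a recent version of knitr; the label `fig-demo-bar`, the size values, and the object name `p_demo` are all made up for illustration, and the plot uses the `mpg` data that ships with ggplot2 rather than the homework data.

```r
#| label: fig-demo-bar
#| fig-width: 6
#| fig-height: 4

library(ggplot2)

# A small bar plot with an informative title and axis labels
p_demo <- ggplot(mpg, aes(x = drv)) +
  geom_bar() +
  labs(
    title = "Number of cars by drive train",
    x = "Drive train",
    y = "Number of cars"
  )

p_demo
```

When this code runs outside a rendered document, the `#|` lines are ordinary comments and have no effect.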
Data
Use this dataset for Exercises 1 to 7.
The data for this assignment comes from Data for Progress, by way of the TidyTuesday project.
Each year, hundreds of corporations around the country participate in Pride, an annual celebration of the LGBTQ+ community’s history and progress. They present themselves as LGBTQ+ allies, but new research from Data for Progress finds that in between their yearly parade appearances, dozens of these corporations are giving to state politicians behind some of the most bigoted and harmful policies in over a decade.
Activists and allies wishing to hold these politicians accountable for bigotry can begin by holding their corporate backers accountable. In a new project series, Data for Progress has compiled a set of resources for activists, employees, community leaders, and lawmakers to push back on these policies and the prejudice powering them. We provide research tying the political giving of specific Fortune 500 companies to anti-LGBTQ+ politicians in six states, polling showing that such giving hurts the brands’ favorability, and upcoming policy memos to understand the issue and to take action.
The data we will analyze contains monetary contributions from January 2019 to March 2022 to politicians sponsoring anti-LGBTQ+ legislation and/or government policy in the 2021-2022 legislative session. The specific bills and executive orders considered are shown in Table 1.
State | Bill Number | Purpose |
---|---|---|
FL | CS/CS/HB 1557 | Silence discussion of queer identities in school |
AZ | SB.1138 | Restrict access to gender affirming healthcare |
ID | H.675 | Restrict access to gender affirming healthcare |
AL | SB184/HB266 | Restrict access to gender affirming healthcare |
TN | HB2670/SB2290 | Silence discussion of queer identities in school |
TX | Executive Order | Restrict access to gender affirming healthcare |
You can read the data as follows:
anti_lgbtq <- read_csv("data/anti-lgbtq.csv")
The companies included in this dataset are corporations that are either Fortune 500 companies or Pride sponsors.
Descriptions of the variables are given in Table 2.
Variable | Description |
---|---|
company | Name of company |
pride_sponsor | Whether the company sponsored a Pride event in SF, NYC, Seattle, LA, Chicago, Houston, Atlanta, or Miami |
hrc_pledge | Whether the company took the Human Rights Campaign Business Pledge |
contribution_amount | Amount of contribution, in USD |
n_politicians | Number of anti-LGBTQ politicians supported |
n_states | Number of states where these politicians serve |
fortune_500 | Whether the company is a Fortune 500 company |
Exercises
Exercise 1
Using inline code to generate the number of rows and columns and include them in a sentence, answer the following questions:
- How many rows are in the anti_lgbtq dataset? What does each row represent?
- How many columns are in the anti_lgbtq dataset? Indicate the type of each variable (categorical or numerical).
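As a minimal sketch of what inline code looks like, the example below uses a small made-up tibble named `toy` (hypothetical, not the homework data). In a Quarto or R Markdown document, an expression written in the narrative text as a backtick, the letter r, the expression, and a closing backtick is replaced by its value when the document is rendered.

```r
library(tibble)

# Hypothetical toy data, standing in for a real dataset
toy <- tibble(
  company = c("A", "B", "C"),
  amount = c(100, 250, 75)
)

n_rows <- nrow(toy) # number of rows
n_cols <- ncol(toy) # number of columns

# In the narrative text of the document, the sentence could be written as:
# The toy data has `r nrow(toy)` rows and `r ncol(toy)` columns.
```

Because the values are computed at render time, the sentence stays correct even if the data changes.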
Now is a good time to render, commit, and push. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding. You can use a commit message like “Finished Exercise 1”.
Exercise 2
Visualize the distribution of contribution amounts using a histogram. Based on the shape, calculate the appropriate summary statistics for the center and spread of the distribution. Using the visualization and the summary statistics you calculated, describe the distribution of contribution amounts to politicians sponsoring anti-LGBTQ+ legislation and/or government policy in the 2021-2022 legislative session.
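The idea of matching summary statistics to the shape can be sketched on made-up numbers. The tibble `toy` and its `amount` column below are hypothetical and are not the homework data; the values are deliberately right-skewed, so the sketch reports the median and IQR, which are less affected by extreme values than the mean and standard deviation. The `binwidth` value is an arbitrary choice for this toy example.

```r
library(ggplot2)
library(dplyr)

# Hypothetical right-skewed toy values (not the homework data)
toy <- tibble::tibble(amount = c(1, 2, 2, 3, 3, 4, 5, 8, 40))

# Histogram to inspect the shape of the distribution
p_hist <- ggplot(toy, aes(x = amount)) +
  geom_histogram(binwidth = 5) +
  labs(
    title = "Distribution of toy amounts",
    x = "Amount",
    y = "Count"
  )

# For a skewed distribution, the median and IQR resist extreme values
toy_summary <- toy |>
  summarize(
    median_amount = median(amount),
    iqr_amount = IQR(amount)
  )
```

For a roughly symmetric distribution, `mean()` and `sd()` would be the more usual pair.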
Now is again a good time to render, commit, and push, with a commit message like “Finished Exercise 2”. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.
Exercise 3
For this exercise, make two visualizations that display the distribution of Pride sponsors based on whether the company is a Fortune 500 company.
The first visualization should allow you to compare the numbers of Pride sponsors who are and are not Fortune 500 companies.
The second visualization should allow you to compare the proportions instead.
What information about these two variables can you
- get from both visualizations?
- get from the first visualization, but not the second?
- get from the second visualization, but not the first?
- get from neither visualization?
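The difference between a count-based and a proportion-based bar plot can be sketched with toy data. The tibble `toy` and its columns `group` and `flag` are hypothetical stand-ins for two categorical variables, not the homework data. The sketch uses `geom_bar()` with its default stacking for counts and `position = "fill"` for proportions.

```r
library(ggplot2)
library(dplyr)

# Hypothetical toy data with two categorical variables
toy <- tibble::tibble(
  group = c("Yes", "Yes", "Yes", "No", "No"),
  flag = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Counts: bar heights show how many observations fall in each category
p_counts <- ggplot(toy, aes(x = group, fill = flag)) +
  geom_bar() +
  labs(title = "Counts by group", x = "Group", y = "Count", fill = "Flag")

# Proportions: position = "fill" rescales every bar to a height of 1
p_props <- ggplot(toy, aes(x = group, fill = flag)) +
  geom_bar(position = "fill") +
  labs(title = "Proportions by group", x = "Group", y = "Proportion", fill = "Flag")

# The same proportions computed as a table
toy_props <- toy |>
  count(group, flag) |>
  group_by(group) |>
  mutate(prop = n / sum(n)) |>
  ungroup()
```

In the filled version every bar has the same height, so group sizes are no longer visible.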
Render, commit, and push, with an appropriate commit message. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.
Exercise 4
First, make a bar plot of hrc_pledge and take note of the number of companies that have taken the HRC pledge. Then, visualize the distribution of Pride sponsors based on whether the company has made an HRC pledge. Make sure that your visualization allows you to make statements about the proportion of Pride sponsors who have and have not made the HRC pledge. Describe the relationship, if any, between these two variables using features of your visualization as justification for your conclusions.
Render, commit, and push, with an appropriate commit message. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.
Exercise 5
Suppose you’re helping a friend who is writing a piece for The Chronicle about companies that both contribute to anti-LGBTQ+ politicians and sponsor Pride events. Your friend wonders: Does sponsoring a Pride event make a difference?
Luckily, you can help them answer this question with data visualization!
- Make side-by-side box plots of contribution amounts based on whether the company sponsored a Pride event.
- Include an informative title and axis labels.
- Finally, write a brief narrative comparing the distributions of anti-LGBTQ+ contribution amounts from companies that did and did not sponsor Pride events. Your narrative should touch on whether having sponsored a Pride event “makes a difference” in terms of the contribution amount.
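A side-by-side box plot maps the categorical variable to one axis and the numerical variable to the other. The sketch below shows the general pattern on a made-up data frame `toy` with hypothetical columns `group` and `amount`; it is not the homework data, and the group medians are computed only to pair a number with what the boxes show.

```r
library(ggplot2)

# Hypothetical toy data: one categorical and one numerical variable
toy <- data.frame(
  group = rep(c("No", "Yes"), each = 5),
  amount = c(1, 2, 3, 4, 20, 2, 3, 4, 5, 6)
)

# One box per category, placed side by side along the x-axis
p_box <- ggplot(toy, aes(x = group, y = amount)) +
  geom_boxplot() +
  labs(
    title = "Toy amounts by group",
    x = "Group",
    y = "Amount"
  )

# Group medians correspond to the thick line inside each box
group_medians <- tapply(toy$amount, toy$group, median)
```

A difference between the boxes describes an association in the data; on its own it does not establish that group membership causes the difference.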
Render, commit, and push, with an appropriate commit message. Before proceeding, check that you’ve committed and pushed all changed documents and your Git pane is completely empty.
Exercise 6
Next, your friend is curious about the relationship between contribution amounts and the number of politicians companies contribute to. They wonder: Does the contribution amount go up as the number of politicians contributed to increases?
Once again, data visualization to the rescue!
- Visualize the relationship between the number of politicians (on the x-axis) and the contribution amount (on the y-axis) and describe the relationship between these two variables.
- Identify any extreme outliers – you will need to dive into the data to figure out which companies these are. (Hint: There are two extreme outliers that are visibly very far away from the rest of the data.)
- Re-create the visualization without these two outliers and comment on whether the relationship is different without them.
Your narrative should touch on whether the contribution amount goes up as the number of politicians contributed to increases.
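The workflow of plotting, locating extreme points, and re-plotting without them can be sketched on toy data. Everything below is hypothetical: the tibble `toy`, its columns `name`, `n_items`, and `amount`, and the cutoff of 100, which was chosen only because it separates the one extreme value in this made-up example.

```r
library(ggplot2)
library(dplyr)

# Hypothetical toy data with one extreme value (not the homework data)
toy <- tibble::tibble(
  name = c("A", "B", "C", "D", "E"),
  n_items = c(1, 2, 3, 4, 5),
  amount = c(10, 20, 30, 40, 500)
)

# Scatterplot of all observations
p_all <- ggplot(toy, aes(x = n_items, y = amount)) +
  geom_point() +
  labs(title = "Amount vs. number of items", x = "Number of items", y = "Amount")

# Sort by the y variable to see which rows have the largest values
toy_sorted <- toy |>
  arrange(desc(amount))

# Keep rows below a cutoff chosen by inspecting the plot (assumed cutoff)
toy_no_outliers <- toy |>
  filter(amount < 100)

# Same plot on the filtered data
p_trimmed <- ggplot(toy_no_outliers, aes(x = n_items, y = amount)) +
  geom_point() +
  labs(
    title = "Amount vs. number of items, extreme values removed",
    x = "Number of items",
    y = "Amount"
  )
```

Comparing `p_all` and `p_trimmed` shows how much a few extreme points can compress the rest of the data on the plot.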
Render, commit, and push, with an appropriate commit message. Before proceeding, check that you’ve committed and pushed all changed documents and your Git pane is completely empty.
Exercise 7
Create a data visualization of interest to you based on these data. You can use the entire dataset, a subset based on a variable, or just a few companies of particular interest to you. Write a brief (2-3 sentence) narrative on why you chose this visualization and what the visualization displays/reveals.
Render, commit, and push, with an appropriate commit message. Before proceeding, check that you’ve committed and pushed all changed documents and your Git pane is completely empty.
The following exercises are conceptual and do not require a dataset or writing original code; however, you may need to refer to code documentation to answer them.
Exercise 8
- Describe the following terms in your own words:
  - Data-to-ink ratio
  - Snake case
  - Whisker (of a box plot)
- Read ?facet_wrap. What does nrow do? What does ncol do? What other options control the layout of the individual panels? Why doesn’t facet_grid() have nrow and ncol arguments?
You know the drill: render, commit, and push!
Exercise 9
- Fill in the blanks:
  - The gg in the name of the package ggplot2 stands for ___.
  - If you map the same continuous variable to both x and y aesthetics in a scatterplot, you get a straight ___ line. (Choose between “vertical”, “horizontal”, or “diagonal”.)
- Code style: Fix up the code style by adding spaces and line breaks where needed. Briefly describe your fixes. (Hint: You can refer to the Tidyverse style guide.)
ggplot(data=mpg,mapping=aes(x=drv,fill=class))+geom_bar() +scale_fill_viridis_d()
Render, commit, and push one last time.
Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.
Wrap up
Review
Before you call it “done”,
- make sure you’ve answered all questions (or, at least, have not skipped any accidentally) and
- review Section 1.2 and make any edits/corrections needed.
You can change answers to exercises you’ve “completed” and commit and push again. We will only grade the final version of your assignment. You can commit and push as many times as you like until the deadline.
Submission
You do not have to do anything special to “submit” your assignment. We will close the repos to further pushes at the deadline, and will take your work as of that time point as your submission.
If you need to submit late for any reason, contact the professor.
Grading
- Exercises 1-9: 10 points each
- Workflow + formatting: 10 points
- Total: 100 points