The year is 2012. President Barack Obama is running for re-election on the Democratic ticket and Republican Mitt Romney is running against him. News coverage of the election is constant, and news outlets keep trying flashy new web experiences to draw eyeballs to their pages and keep them there. The New York Times, like many newspapers, was slow to realize the importance of its web content, but by 2012 it was at the forefront of offering interactive experiences for its readers.
Enter Mike Bostock, a Ph.D. student working with Jeffrey Heer at Stanford. He was tired of the hassle of using Java, Flash, and other cumbersome web technologies to create interactive visualizations. It is hard to see what those visualizations looked like today, since modern web browsers no longer ship the software needed to run them, but you can see some of the Flash-based visualizations from the New York Times here: https://flowingdata.com/2024/01/10/nyt-flash-based-visualizations-work-again/. The old Java and Flash tech was slow and required plugins that were a constant source of security holes for web browsers. Bostock and collaborators ditched these old tools and used JavaScript (the language of web browsers), Cascading Style Sheets (CSS), and Scalable Vector Graphics (SVG) to create D3.js, a JavaScript library for Data-Driven Documents. JavaScript, CSS, and SVG were all built into modern web browsers by 2011 and were commonly used by web developers, so D3.js was fast and relatively easy for devs to pick up.
Bostock started working with the New York Times in 2012 and created some iconic graphics for the 2012 election. For example, the figure below lets one explore presidential election results by state and see how state results have shifted from election to election.
Election results by state through 2012
While D3.js was a huge improvement over previous tools, it still has a steep learning curve for the non-web programmer. Thus, other tools have been built on D3.js that allow data scientists familiar with R, Python, or Julia to build interactive visualizations. In essence, these tools use R, Python, or Julia to wrangle the data and create a graphic, while the underlying JavaScript displays the graphic within a web browser window, which allows for interactivity. All of that can happen with R, where your interactive plot will open in a web browser window, or within RStudio, where the plot will open in the viewer. Using these tools, you can create an interactive data dashboard where the user can explore data and create their own custom versions of graphics and plots.
Interactive plots with Plotly.js
Plotly.js is a high-level, declarative charting library. Plotly.js ships with over 40 chart types, including 3D charts, statistical graphs, and SVG maps.
Although Plotly.js can be used to create visualizations on its own, we will use it through its R interface, documented at https://plotly-r.com/. Plotly has its own framework and model for creating graphics that differs substantially from ggplot (see Chapters 2.1 and 2.2 at https://plotly-r.com/). Conveniently, Plotly can also convert ggplot graphics into Plotly graphics without our having to work with Plotly fundamentals directly. For example, let's load some COVID-19 hospitalization and mortality data:
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.0 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
Rows: 11913 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): state
dbl (10): avg_adm_all_covid_confirmed, pct_chg_avg_adm_all_covid_confirmed_...
date (1): week_ending_date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
us_deaths = read_csv("US_COVID19_Deaths_ByWeek_ByState_20240125.csv") |>
  mutate(`Week Ending Date` = mdy(`Week Ending Date`)) # death table dates need conversion
Rows: 14364 Columns: 17
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): Data as of, Start Date, End Date, Group, Year, Week Ending Date, St...
dbl (9): Month, MMWR Week, COVID-19 Deaths, Total Deaths, Percent of Expecte...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
us_hosps_deaths = us_deaths |>
  rename(week_ending_date = `Week Ending Date`, state = State) |> # rename columns so they match across the two tables
  select(-c(`Data as of`, `Start Date`, `End Date`, Group, Year, Month, `MMWR Week`, Footnote)) |> # get rid of excess columns in deaths table
  inner_join( # join the two tables together
    us_hosps |>
      rename(state_abbrv = state) |> # hosps has states as abbreviations so we'll need to add full state names
      left_join(
        tibble(state_abbrv = state.abb, state = state.name) |>
          add_row(state_abbrv = c("USA", "DC", "PR"), state = c("United States", "District of Columbia", "Puerto Rico"))
      )
  ) |>
  filter(state != "United States")
Joining with `by = join_by(state_abbrv)`
Joining with `by = join_by(week_ending_date, state)`
Suppose that we wanted to see the hospital admissions for COVID-19 per capita for each state:
p = us_hosps_deaths |>
  ggplot(aes(x = week_ending_date, y = total_adm_all_covid_confirmed_past_7days_per_100k)) +
  geom_line(aes(color = state), show.legend = FALSE)
p
Definitely some strong common trends, but wouldn’t it be nice to be able to zoom in interactively on the plot? We can do this with Plotly. We simply apply the ggplotly function to our graphic.
ggplotly(p, dynamicTicks = TRUE)
Now the graphic has a toolbar where we can select zoom, pan, and reset axes. Hovering the cursor over a line in the plot calls up a popup with the underlying data at that x,y location. Double clicking in the plot pane also resets the axes. Clicking on the state names in the legend hides or reveals each state. The dynamicTicks = TRUE option also lets the ticks change when we zoom in so that the x and y information is more granular. You can also change which buttons are visible in the widget with Plotly's config function.
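As a minimal sketch of customizing the toolbar, the example below passes options to Plotly's config function. It assumes ggplot2's built-in economics data as a stand-in for the COVID-19 table, and the names p_demo and w are hypothetical. The button names "lasso2d" and "select2d" are Plotly.js mode bar button identifiers.

```r
library(plotly) # also attaches ggplot2

# Assumption: the built-in economics data stands in for the COVID-19 table
p_demo <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()

# Convert to Plotly, then drop two selection buttons and the Plotly logo
w <- ggplotly(p_demo, dynamicTicks = TRUE) |>
  config(
    modeBarButtonsToRemove = c("lasso2d", "select2d"),
    displaylogo = FALSE
  )
```

Printing w displays the plot with the trimmed toolbar. Setting displayModeBar = FALSE in config hides the toolbar entirely.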
Plotly also has a number of animation features. The simplest case is where we want some variable to change value during the animation. The ggplotly function allows for this through frame = variable in aes where variable is the variable you want to change over the course of the animation. For example, instead of color = state in the plot above, we can set frame = state:
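p = us_hosps_deaths |>
  ggplot(aes(x = week_ending_date, y = total_adm_all_covid_confirmed_past_7days_per_100k)) +
  geom_line(aes(frame = state), show.legend = FALSE) # ggplot2 warns that frame is an unknown aesthetic; ggplotly uses it
ggplotly(p) # convert to Plotly so that frame drives the animation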
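The timing of an animation can be adjusted after conversion with animation_opts. The sketch below is only illustrative: it assumes ggplot2's built-in txhousing data in place of the COVID-19 table, and the names tx_small, p_anim, and anim are hypothetical.

```r
library(plotly) # also attaches ggplot2

# Assumption: a few cities from the built-in txhousing data stand in for states
data(txhousing, package = "ggplot2")
tx_small <- subset(txhousing, city %in% c("Austin", "Dallas", "Houston"))

# One animation frame per year; ggplot2 warns about the unknown frame aesthetic
p_anim <- ggplot(tx_small, aes(x = month, y = median, color = city)) +
  geom_point(aes(frame = year))

# Show each frame for 800 ms, with a 200 ms transition between frames
anim <- ggplotly(p_anim) |>
  animation_opts(frame = 800, transition = 200, redraw = FALSE)
```

Printing anim shows the plot with a play button and a slider for stepping through the years.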
Selecting individual plot elements is all well and good, but a complex interactive plot or dashboard will let you select multiple elements and attach multiple consequences to those selections. Plotly supports selecting and highlighting through the highlight_key and highlight functions. Essentially, highlight_key wraps the data table in a shared data object that knows which variable is going to be used to select by, and highlight controls how selections behave in the resulting plot. In our case, we can select each time series by state.
filter_states <- highlight_key(us_hosps_deaths, ~state, "Select a state")
p = filter_states |>
  ggplot(aes(x = week_ending_date, y = total_adm_all_covid_confirmed_past_7days_per_100k)) +
  geom_line(aes(group = state))
highlight(ggplotly(p, tooltip = "state"), selectize = TRUE)
Setting the `off` event (i.e., 'plotly_doubleclick') to match the `on` event (i.e., 'plotly_click'). You can change this default via the `highlight()` function.
The real magic is linking the selection to another plot element or statistic. We do this below for a histogram of the hospital cases that changes to whatever states are selected.
Setting the `off` event (i.e., 'plotly_doubleclick') to match the `on` event (i.e., 'plotly_click'). You can change this default via the `highlight()` function.
Something is off about the state selection window…
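A linked view of this kind can be sketched with built-in data. The example below is a minimal sketch, not the code behind the figure above: it assumes ggplot2's txhousing data in place of the COVID-19 table, uses the plot_ly interface rather than ggplotly, and the names tx, base, time_series, hist, and linked are hypothetical.

```r
library(plotly) # also attaches ggplot2

# Assumption: the built-in txhousing data stands in for the COVID-19 table
data(txhousing, package = "ggplot2")

# Key the data by city so that selections are tracked per city
tx <- highlight_key(txhousing, ~city, "Select a city")

# Both views start from the same keyed data, which is what links them
base <- plot_ly(tx, color = I("black")) |>
  group_by(city)

# One line per city, plus a histogram of the same variable
time_series <- base |> add_lines(x = ~date, y = ~median)
hist <- base |> add_histogram(x = ~median, histnorm = "probability density")

# Stack the two views; overlay bars so the selection sits on top of the full histogram
linked <- subplot(time_series, hist, nrows = 2) |>
  layout(barmode = "overlay", showlegend = FALSE) |>
  highlight(on = "plotly_click", selectize = TRUE)
```

Printing linked displays both panels. Choosing a city in the selection box or clicking a line highlights that city in the time series and overlays its distribution on the histogram.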
Creating a data dashboard involves constructing and linking multiple plot and selection elements that slice and display the data interactively. This gives the data scientist a huge amount of flexibility in allowing the user to explore the data, but it comes at the cost of steeply increased code complexity.
Dashboards
Producing dashboards is a sophisticated topic that we cannot cover in detail here. Instead, I will sketch out some places to go to find out more.
flexdashboard. This uses an R Markdown document to create the layout and specify the charts and graphs.
Shiny. Shiny is a framework for creating web applications using R code. It is powerful but has a steep learning curve. You will have to learn about web servers, the basics of a user interface, and reactive programming. Nevertheless, whatever you can dream up can likely be implemented in Shiny.
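To give a flavor of what a Shiny app looks like, here is a minimal sketch that pairs one input with one Plotly output. It assumes the shiny package is installed and again uses ggplot2's built-in txhousing data. The input and output ids ("city" and "ts") and the object names are hypothetical.

```r
library(shiny)
library(plotly) # also attaches ggplot2

data(txhousing, package = "ggplot2")

# User interface: a drop-down menu and a placeholder for the plot
ui <- fluidPage(
  selectInput("city", "City", choices = unique(txhousing$city)),
  plotlyOutput("ts")
)

# Server logic: the plot is redrawn whenever input$city changes
server <- function(input, output, session) {
  output$ts <- renderPlotly({
    d <- subset(txhousing, city == input$city)
    ggplotly(ggplot(d, aes(x = date, y = median)) + geom_line())
  })
}

# Build the app object; call runApp(app) in an interactive session to launch it
app <- shinyApp(ui, server)
```

The renderPlotly block is an instance of the reactive programming mentioned above: it reads input$city, so Shiny re-runs it each time the user picks a different city.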