Data visualizations - Animation and Interactivity

Showing GGAnimate and Plotly packages!
Author

Deepsha Menghani

Published

October 1, 2022

Quarto blog publish details

This blog was created using Quarto and published with Github Pages.

1 Data visualizations

Data visualizations are a key tool in this process of sharing insights with stakeholders.

But why do we care about animated visualizations?

  • To create involuntary shifts of attention
  • Drive your point across more impactfully
  • And sometimes, because it looks pretty
Github repository for code

You can find the code to reproduce this report at deepshamenghani/gganimate_and_plotly

2 US population example

Note that the US population dataset was downloaded from the census website and cleaned for purposes of this blog. Link to the raw data can be found at census.gov.

2.1 Dataset

Code
# Read the population dataset
population_dataset <- readxl::read_xlsx("population_and_density_change_data_table_US_by_state_1910to2020.xlsx") %>% 
    select(Name, `Geography Type`, Year, `Resident Population`, `Resident Population Density`)

# Clean data ----

colnames(population_dataset) <- gsub(" ", "_",colnames(population_dataset))

# Get the list of states and state count for creating ranking
state_list <- population_dataset %>% distinct(Name) %>% pull()
state_count <- length(state_list)

population_dataset_clean <- population_dataset %>% 
    # create a clean date column
    mutate(Date = anydate(str_c("01/01/",Year),"%m/%d/%y")) %>%
    # filter for state list
    filter(Geography_Type == "State") %>%
    # Create a rank column for population and density by year
    # Higher Rank implies higher density relative to other states
    group_by(Year) %>% 
    arrange(Resident_Population_Density) %>% 
    mutate(Rank_Population_Density = row_number()) %>% 
    ungroup() %>% 
    # Create label columns to use in plots    
    mutate(label_state = str_c(Name,"\n",Year)) %>%
    mutate(label_plotly = str_c(Name, "\n", Year, "\nResident Population Density: ", Resident_Population_Density, "\n", "Rank: ", Rank_Population_Density)) 

plot_density_vs_rank <- function(data, state_input) {
    # Filter data for the states input
    data_filtered <- data %>% 
        filter(Name %in% (state_input))
    
    plot <- data_filtered %>%
        ggplot(aes(y = Rank_Population_Density, x = Resident_Population_Density,  color = Name)) +
        geom_point(aes(text = label_plotly), alpha = 0.5,
                   show.legend = FALSE, size = 4) +
        geom_line(aes(y = Rank_Population_Density, x = Resident_Population_Density, color = Name), size = 1, alpha = 0.5)+
        theme_classic() +
        theme(
            text = element_text(family = "mukta"),
            plot.title = element_text(face = "bold",
                                      size = 14,
                                      hjust = .5),
            plot.subtitle = element_text(size = 12,
                                         hjust = .5),
            plot.caption = element_text(size = 10),
            axis.text.x = element_text(size = 20),
            axis.text.y = element_text(size = 20),
            axis.title.x = element_text(size = 15),
            axis.title.y = element_text(size = 15)
        ) +
        ggrepel::geom_text_repel(aes(y = Rank_Population_Density, x = Resident_Population_Density, label = label_state, color = Name), nudge_x = 0.5, size = 5)  +
        ylim(c(0, state_count)) +
        labs(title = '', 
             caption = "Data: US population by state",
             y = 'Population Density Rank', 
             x = 'Population Density') 
    
    return(plot)    
}

state_dim <- as.data.frame(states) # For getting latitude and longitude

population_dataset_lat_long <- population_dataset_clean %>% 
    # Get population data
    select(name = Name, population = Resident_Population, Year)  %>%
    # Create a label for plotting
    mutate(label_plotly = str_c(name, "\n", Year, "\nPopulation: ", scales::number(population, scale = 1e-3, accuracy = 1, suffix = "K", big.mark = ","))) %>% 
    # Join with dim table to get state longitude and latidude
    left_join(state_dim %>% select(name, state=abb, lat, long)) %>% 
    select(lon = long, lat, population, state, Year,label_plotly)

## Map plotting function
plot_usa_population_map <- function(data) {
    
    # Transform data for plotting points on map, note that the dataset to be transformed needs to contain "lat" and "lon" columns
    data_transformed <- usmap_transform(data)
    
    # Create the map plot
    plot_map <- plot_usmap(regions = "state", data=(data), values = "population",labels=TRUE)+
        geom_point(data = data_transformed, aes(x = x, y = y, text=label_plotly),
                   color = "white", alpha = 0.25, size = 0.001) +
        labs(title = "US population by state",
             # subtitle = 'Year: {closest_state}',
             size = 20) +
        scale_fill_continuous(low = "white", high ="darkblue", 
                              name = "Population",label = scales::comma
        ) + 
        theme(
            legend.position = "right",
            plot.title = element_text(size=20),
            plot.subtitle = element_text(size = 20)
        )
    
    return(plot_map)
}

3 GGAnimate

3.1 Geospatial population analysis

I wanted to showcase relative change in population among all states on the US map between the period 1910 to 2020.

3.1.1 Relative population 2020

The plot above shows relative population across states for 2020. To compare how the relative population has changed over time you would need to plot another map for 1910 and switch between the two. Instead, animating this map and creating a second gif allows us to see how the relative population has changed between the two time periods. Note that the animation below is just for two census data points, one in 1910 and one in 2020, not every census year in between.

Warning

The Code structure only shows the basic minimum structure you need to follow to replicate the plot creation. The Code extended tab shows the actual code in use to create the plots and it may use reproducible functions shown above.

3.1.2 Animate


usa_population_map_animated <- map_plot +
    transition_states(Year) 

animate(usa_population_map_animated, 
    renderer = gifski_renderer()
)

usa_population_map_multiyear <- plot_usa_population_map(data = population_dataset_lat_long %>% filter(Year %in% c(1910,2020)))

usa_population_map_animated <- usa_population_map_multiyear +
    labs(subtitle = 'Year: {closest_state}') +
    transition_states(Year) + 
    ease_aes('linear')

animate(usa_population_map_animated, duration =2, fps = 10, width = 900, height = 600, renderer = gifski_renderer())


Let’s slow that down

If the animation is really fast, you can change the duration parameter in the animate command along with other things like the figure dimensions.



usa_population_map_animated <- map_plot +
    transition_states(Year) 

animate(usa_population_map_animated, 
    duration = 10,
    renderer = gifski_renderer()
)


usa_population_map_multiyear <- plot_usa_population_map(data = population_dataset_lat_long %>% filter(Year %in% c(1910,2020)))

usa_population_map_animated <- usa_population_map_multiyear +
    labs(subtitle = 'Year: {closest_state}') +
    transition_states(Year) + 
    ease_aes('linear')

animate(usa_population_map_animated, duration =10, fps = 10, width = 900, height = 600, renderer = gifski_renderer())

The plot above makes it easier to showcase the insight of how California and Texas went from having relatively lower populations as compared to some states on the East Coast to being two of the more highly populated states by 2020.


3.2 Save and share the gif

Another advantage of this package is that you can easily save and share a gif in your word or email reports and updates. Below is the command you can use to save the gif.

anim_save(file = "usa_population_map_animated.gif")


Great, now let’s look at another plot example…

3.3 Population Density Rank vs population density

3.3.1 Dataset



# Note this is minimal code that excludes the plot aesthetics

ggplot_object <- data %>% 
                 filter(Name == "California")
                  ggplot(aes(x=Density, y=Rank)) +
                  geom_point() +
                  geom_line()

ggplotly(ggplot_object)

density_rank_ggplot <- plot_density_vs_rank(
  data = population_dataset_clean, 
  state_input = c('California')
)

ggplotly(density_rank_ggplot, tooltip = "text")

The plot above allows us to understand the density versus ranking for California over the 110-year period. Note that growth in population density doesn’t necessarily imply an increase in ranking because the ranking is relative to the density of other states in any year. Creating a plotly graph allows us to hover over each point to get more details such as the year to which each point corresponds. For example, in 1950, California’s population density was 68 with rank 31, putting it ahead of 30 other states. Because time isn’t along either the x- or y-axis, these labels allow us to trace the journey over the third dimension of time.

Next, let’s add a few more states to the plot above.



# Note this is minimal code that excludes the plot aesthetics

ggplot_object <- data %>% 
                 filter(Name %in% c('California', 'Washington', 'Alabama', 'Pennsylvania'))
                  ggplot(aes(x=Density, y=Rank)) +
                  geom_point() +
                  geom_line()

ggplotly(ggplot_object)

density_rank_plot_multistate <- plot_density_vs_rank(
  data = population_dataset_clean,
  state_input = c('California', 'Washington', 'Alabama', 'Pennsylvania')
)

ggplotly(density_rank_plot_multistate, tooltip = "text")


This is where animation can be very useful to derive insights that a plot with two dimensions doesn’t otherwise allow. I will now use the gganimate package to animate over the third dimension of time. transition_reveal allows us to not only see the points over time but leave the trace behind to show the journey so far.


# Note this is minimal code that excludes the plot aesthetics

animated_object <- ggplot_object +
    transition_reveal(Date)

animate(animated_object, 
        duration =10, 
        renderer = gifski_renderer())

density_rank_plot_multistate_animated <- density_rank_plot_multistate +
    transition_reveal(Date) + 
    ease_aes('linear')

animate(density_rank_plot_multistate_animated, duration =20, fps = 10, width = 900, height = 600, renderer = gifski_renderer())

3.4 Pros and Cons of GGAnimate package

3.4.1 Pros

  • Easily animate almost any ggplot object

  • Make it part of your report or save as a gif

3.4.2 Cons

  • No interactivity to pause and play

  • Very slow rendering of gif

  • Cannot focus on only part of the plot

4 Plotly Animation package

Plot package helps alleviate some of the concerns we saw with GGAnimate while allowing to add the component of interactivity. Now let’s try the same example and recreate the animation with the plotly package. Press the PLAY button to start the animation, or use the Scroll bar to navigate the plot over time.

4.1 Animation using plotly


# Create the frames
data_framed <- data %>%
  split(.$Year) %>%
  accumulate(~bind_rows(.x, .y)) %>%
  bind_rows(.id = "frame") 

# Plot the frames
data_framed %>%
  plot_ly(x = ~Density, y = ~Rank, color = ~Name)  %>%
  add_lines(frame = ~frame) %>%
  add_markers(frame = ~frame) %>%
  animation_opts(transition = 0) 


state_selected <- c('California', 'Washington', 'Alabama', 'Pennsylvania')
data <- population_dataset_clean %>% filter(Name %in% state_selected)

## Convert to frames 

data_med <- data %>%
  arrange(Year, Name) %>% 
  split(.$Year) %>%
  accumulate(~bind_rows(.x, .y)) %>%
  bind_rows(.id = "frame") %>%
  group_by(frame) %>% 
  arrange(Rank_Population_Density)

data_med %>%
  plot_ly(x = ~Resident_Population_Density, y = ~Rank_Population_Density, color = ~Name,
    hoverinfo = "text", text = ~paste0(Name,"\n",Year,"\n",Rank_Population_Density))  %>%
  add_text(x = 250, y = 18, text = ~Year, frame = ~Year,
           textfont = list(color = toRGB("gray80"), size = 40)) %>%
  add_lines(frame = ~frame) %>%
  add_markers(frame = ~frame) %>%
  animation_opts(
    frame = 1000,
    transition = 0,
    easing = "bounce"
  ) %>% 
  hide_legend()

Here are insights from the plot above that now become much easier to showcase:

  • Washington and California both started with similar low density and ranking in 1910, but California grew much faster and higher than Washington over time.
  • California rose in both density and ranking over time until 1960, after which it started to rise in density without much change in ranking, both of which came close to Pennsylvania by 2020.
  • Alabama, while continuing to rise in density, lowered in ranking over time and was topped by California in 1950 and by Washington in 2000.

4.2 Interactivity with Crosstalk package

Crosstalk package allows adding further interactivity to our plotly object. For instance, in the plot above, we want to enable the user to highlight a state, and we can do that with crosstalk + plotly. As previously, use the play button or scroll bar to navigate over time. Use the dropdown called Select a state to further highlight one or more states.


data_crosstalk <- SharedData$new(data_framed, key = ~Name, group = "Select a state")

# Plot the frames
data_crosstalk %>%
  plot_ly(x = ~Density, y = ~Rank, color = ~Name)  %>%
  add_lines(frame = ~frame) %>%
  add_markers(frame = ~frame) %>%
  animation_opts(transition = 0) 

## Plotting function ----
# Select states to plot

state_selected <- c('California', 'Washington', 'Alabama', 'Pennsylvania')
data <- population_dataset_clean %>% filter(Name %in% state_selected)

## Convert to frames 

data_med <- data %>%
  arrange(Year, Name) %>% 
  split(.$Year) %>%
  accumulate(~bind_rows(.x, .y)) %>%
  bind_rows(.id = "frame") %>%
  group_by(frame) %>% 
  arrange(Rank_Population_Density)

# Create a shared data object keyed by state
state_data <- SharedData$new(data_med, key = ~Name, group = "Select a state")

# Create a shared data object keyed by state
state_data %>%
  plot_ly(x = ~Resident_Population_Density, y = ~Rank_Population_Density, color = ~Name,
    hoverinfo = "text", text = ~paste0(Name,"\n",Year,"\n",Rank_Population_Density))  %>%
  add_text(x = 250, y = 18, text = ~Year, frame = ~Year,
           textfont = list(color = toRGB("gray80"), size = 40)) %>%
  add_lines(frame = ~frame) %>%
  add_markers(frame = ~frame) %>%
  animation_opts(
    frame = 1000, 
    transition = 0, 
    easing = "bounce"
  ) %>%
  highlight(selectize = TRUE, persistent = TRUE) %>% 
  hide_legend()

4.3 Pros and cons of plotly package

4.3.1 Pros

  • Addition of interactivity
  • Much faster rendering for live reports
  • Available for both R and Python

4.3.2 Cons

  • Can’t save as a gif and send it over
  • Requires frame creation
  • Relative to GGAnimate, Plotly requires a few more steps

5 Bringing these animations to other projects

Both packages are great and depending upon your scenario and medium of communication, you can choose one or the other. There are many scenarios where animation can be extended to help enable taking a step back and telling a broader story, such as:

  • Plotting customer count versus revenue for multiple products and animating over time to compare across product lines.
  • Mapping the relative change in support ticket resolution times across customer service centers.
  • Plotting the relative change in sales cycle timelines pre- and post-marketing campaigns.
  • And a personal one that I am planning to tackle next for myself: Relative change in expense distribution across various buckets of budgets month over month.

6 Takeaway

  • Try out animation, it is very easy
  • Don’t animate everything, just because you can!

Because Spiderman said so

Github repository for code

You can find the code to reproduce this report at deepshamenghani/gganimate_and_plotly