Note that the US population dataset was downloaded from the census website and cleaned for purposes of this blog. Link to the raw data can be found at census.gov.
2.1 Dataset
Code
# Read the population datasetpopulation_dataset <- readxl::read_xlsx("population_and_density_change_data_table_US_by_state_1910to2020.xlsx") %>%select(Name, `Geography Type`, Year, `Resident Population`, `Resident Population Density`)# Clean data ----colnames(population_dataset) <-gsub(" ", "_",colnames(population_dataset))# Get the list of states and state count for creating rankingstate_list <- population_dataset %>%distinct(Name) %>%pull()state_count <-length(state_list)population_dataset_clean <- population_dataset %>%# create a clean date columnmutate(Date =anydate(str_c("01/01/",Year),"%m/%d/%y")) %>%# filter for state listfilter(Geography_Type =="State") %>%# Create a rank column for population and density by year# Higher Rank implies higher density relative to other statesgroup_by(Year) %>%arrange(Resident_Population_Density) %>%mutate(Rank_Population_Density =row_number()) %>%ungroup() %>%# Create label columns to use in plots mutate(label_state =str_c(Name,"\n",Year)) %>%mutate(label_plotly =str_c(Name, "\n", Year, "\nResident Population Density: ", Resident_Population_Density, "\n", "Rank: ", Rank_Population_Density)) plot_density_vs_rank <-function(data, state_input) {# Filter data for the states input data_filtered <- data %>%filter(Name %in% (state_input)) plot <- data_filtered %>%ggplot(aes(y = Rank_Population_Density, x = Resident_Population_Density, color = Name)) +geom_point(aes(text = label_plotly), alpha =0.5,show.legend =FALSE, size =4) +geom_line(aes(y = Rank_Population_Density, x = Resident_Population_Density, color = Name), size =1, alpha =0.5)+theme_classic() +theme(text =element_text(family ="mukta"),plot.title =element_text(face ="bold",size =14,hjust = .5),plot.subtitle =element_text(size =12,hjust = .5),plot.caption =element_text(size =10),axis.text.x =element_text(size =20),axis.text.y =element_text(size =20),axis.title.x =element_text(size =15),axis.title.y =element_text(size =15) ) + ggrepel::geom_text_repel(aes(y = Rank_Population_Density, x = Resident_Population_Density, label = label_state, color = Name), nudge_x =0.5, size =5) +ylim(c(0, state_count)) +labs(title ='', caption ="Data: US population by state",y ='Population Density Rank', x ='Population Density') return(plot) }state_dim <-as.data.frame(states) # For getting latitude and longitudepopulation_dataset_lat_long <- population_dataset_clean %>%# Get population dataselect(name = Name, population = Resident_Population, Year) %>%# Create a label for plottingmutate(label_plotly =str_c(name, "\n", Year, "\nPopulation: ", scales::number(population, scale =1e-3, accuracy =1, suffix ="K", big.mark =","))) %>%# Join with dim table to get state longitude and latidudeleft_join(state_dim %>%select(name, state=abb, lat, long)) %>%select(lon = long, lat, population, state, Year,label_plotly)## Map plotting functionplot_usa_population_map <-function(data) {# Transform data for plotting points on map, note that the dataset to be transformed needs to contain "lat" and "lon" columns data_transformed <-usmap_transform(data)# Create the map plot plot_map <-plot_usmap(regions ="state", data=(data), values ="population",labels=TRUE)+geom_point(data = data_transformed, aes(x = x, y = y, text=label_plotly),color ="white", alpha =0.25, size =0.001) +labs(title ="US population by state",# subtitle = 'Year: {closest_state}',size =20) +scale_fill_continuous(low ="white", high ="darkblue", name ="Population",label = scales::comma ) +theme(legend.position ="right",plot.title =element_text(size=20),plot.subtitle =element_text(size =20) )return(plot_map)}
3 GGAnimate
3.1 Geospatial population analysis
I wanted to showcase relative change in population among all states on the US map between the period 1910 to 2020.
3.1.1 Relative population 2020
The plot above shows relative population across states for 2020. To compare how the relative population has changed over time you would need to plot another map for 1910 and switch between the two. Instead, animating this map and creating a second gif allows us to see how the relative population has changed between the two time periods. Note that the animation below is just for two census data points, one in 1910 and one in 2020, not every census year in between.
Warning
The Code structure only shows the basic minimum structure you need to follow to replicate the plot creation. The Code extended tab shows the actual code in use to create the plots and it may use reproducible functions shown above.
The plot above makes it easier to showcase the insight of how California and Texas went from having relatively lower populations as compared to some states on the East Coast to being two of the more highly populated states by 2020.
3.2 Save and share the gif
Another advantage of this package is that you can easily save and share a gif in your word or email reports and updates. Below is the command you can use to save the gif.
# Note this is minimal code that excludes the plot aestheticsggplot_object <- data %>%filter(Name =="California")ggplot(aes(x=Density, y=Rank)) +geom_point() +geom_line()ggplotly(ggplot_object)
The plot above allows us to understand the density versus ranking for California over the 110-year period. Note that growth in population density doesn’t necessarily imply an increase in ranking because the ranking is relative to the density of other states in any year. Creating a plotly graph allows us to hover over each point to get more details such as the year to which each point corresponds. For example, in 1950, California’s population density was 68 with rank 31, putting it ahead of 30 other states. Because time isn’t along either the x- or y-axis, these labels allow us to trace the journey over the third dimension of time.
Next, let’s add a few more states to the plot above.
# Note this is minimal code that excludes the plot aestheticsggplot_object <- data %>%filter(Name %in%c('California', 'Washington', 'Alabama', 'Pennsylvania'))ggplot(aes(x=Density, y=Rank)) +geom_point() +geom_line()ggplotly(ggplot_object)
This is where animation can be very useful to derive insights that a plot with two dimensions doesn’t otherwise allow. I will now use the gganimate package to animate over the third dimension of time. transition_reveal allows us to not only see the points over time but leave the trace behind to show the journey so far.
# Note this is minimal code that excludes the plot aestheticsanimated_object <- ggplot_object +transition_reveal(Date)animate(animated_object, duration =10, renderer =gifski_renderer())
Plot package helps alleviate some of the concerns we saw with GGAnimate while allowing to add the component of interactivity. Now let’s try the same example and recreate the animation with the plotly package. Press the PLAY button to start the animation, or use the Scroll bar to navigate the plot over time.
# Create the framesdata_framed <- data %>%split(.$Year) %>%accumulate(~bind_rows(.x, .y)) %>%bind_rows(.id ="frame") # Plot the framesdata_framed %>%plot_ly(x =~Density, y =~Rank, color =~Name) %>%add_lines(frame =~frame) %>%add_markers(frame =~frame) %>%animation_opts(transition =0)
state_selected <-c('California', 'Washington', 'Alabama', 'Pennsylvania')data <- population_dataset_clean %>%filter(Name %in% state_selected)## Convert to frames data_med <- data %>%arrange(Year, Name) %>%split(.$Year) %>%accumulate(~bind_rows(.x, .y)) %>%bind_rows(.id ="frame") %>%group_by(frame) %>%arrange(Rank_Population_Density)data_med %>%plot_ly(x =~Resident_Population_Density, y =~Rank_Population_Density, color =~Name,hoverinfo ="text", text =~paste0(Name,"\n",Year,"\n",Rank_Population_Density)) %>%add_text(x =250, y =18, text =~Year, frame =~Year,textfont =list(color =toRGB("gray80"), size =40)) %>%add_lines(frame =~frame) %>%add_markers(frame =~frame) %>%animation_opts(frame =1000,transition =0,easing ="bounce" ) %>%hide_legend()
Here are insights from the plot above that now become much easier to showcase:
Washington and California both started with similar low density and ranking in 1910, but California grew much faster and higher than Washington over time.
California rose in both density and ranking over time until 1960, after which it started to rise in density without much change in ranking, both of which came close to Pennsylvania by 2020.
Alabama, while continuing to rise in density, lowered in ranking over time and was topped by California in 1950 and by Washington in 2000.
4.2 Interactivity with Crosstalk package
Crosstalk package allows adding further interactivity to our plotly object. For instance, in the plot above, we want to enable the user to highlight a state, and we can do that with crosstalk + plotly. As previously, use the play button or scroll bar to navigate over time. Use the dropdown called Select a state to further highlight one or more states.
data_crosstalk <- SharedData$new(data_framed, key =~Name, group ="Select a state")# Plot the framesdata_crosstalk %>%plot_ly(x =~Density, y =~Rank, color =~Name) %>%add_lines(frame =~frame) %>%add_markers(frame =~frame) %>%animation_opts(transition =0)
## Plotting function ----# Select states to plotstate_selected <-c('California', 'Washington', 'Alabama', 'Pennsylvania')data <- population_dataset_clean %>%filter(Name %in% state_selected)## Convert to frames data_med <- data %>%arrange(Year, Name) %>%split(.$Year) %>%accumulate(~bind_rows(.x, .y)) %>%bind_rows(.id ="frame") %>%group_by(frame) %>%arrange(Rank_Population_Density)# Create a shared data object keyed by statestate_data <- SharedData$new(data_med, key =~Name, group ="Select a state")# Create a shared data object keyed by statestate_data %>%plot_ly(x =~Resident_Population_Density, y =~Rank_Population_Density, color =~Name,hoverinfo ="text", text =~paste0(Name,"\n",Year,"\n",Rank_Population_Density)) %>%add_text(x =250, y =18, text =~Year, frame =~Year,textfont =list(color =toRGB("gray80"), size =40)) %>%add_lines(frame =~frame) %>%add_markers(frame =~frame) %>%animation_opts(frame =1000, transition =0, easing ="bounce" ) %>%highlight(selectize =TRUE, persistent =TRUE) %>%hide_legend()
4.3 Pros and cons of plotly package
4.3.1 Pros
Addition of interactivity
Much faster rendering for live reports
Available for both R and Python
4.3.2 Cons
Can’t save as a gif and send it over
Requires frame creation
Relative to GGAnimate, Plotly requires a few more steps
5 Bringing these animations to other projects
Both packages are great and depending upon your scenario and medium of communication, you can choose one or the other. There are many scenarios where animation can be extended to help enable taking a step back and telling a broader story, such as:
Plotting customer count versus revenue for multiple products and animating over time to compare across product lines.
Mapping the relative change in support ticket resolution times across customer service centers.
Plotting the relative change in sales cycle timelines pre- and post-marketing campaigns.
And a personal one that I am planning to tackle next for myself: Relative change in expense distribution across various buckets of budgets month over month.