According to the latest report from CNN Health, authorities in 217 countries and territories have reported about 52.4 million Covid‑19 cases and 1.3 million deaths since China reported its first cases to the World Health Organization (WHO) in December 2019.
In this tutorial, I will show you how to use Plotly to create a choropleth map, an animated map that visualizes the spread of COVID-19 over a 10-month period.
This tutorial demonstrates how to manipulate data and perform geospatial visualization using Python. In particular, it uses the COVID-19 time series dataset collected by Johns Hopkins University (JHU).
The libraries needed to run this code are 1) Pandas, 2) PyCountry, and 3) Plotly.
!pip install plotly
!pip install pycountry
# Import libraries
import pandas as pd
import plotly.express as px
import pycountry
The dataset we will use here is the JHU CSSE COVID-19 dataset. You can download or fork the latest version of it from the CSSE GitHub repo (https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series).
All the data in this GitHub repository are cumulative counts of COVID-19 confirmed, death, and recovered cases, starting from the first data record on January 22, 2020.
# Import the COVID-19 dataframe
covid_confirm = pd.read_csv('time_series_covid19_confirmed_global.csv')
covid_confirm.head(5)
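To visualize the COVID-19 outbreak dynamically with the Plotly package, we first clean the dataset: drop the unused columns, aggregate province-level rows by country, generate three-letter country codes, and reshape the table from wide to long format.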
# Drop unnecessary columns from the dataset
covid_confirm = covid_confirm.drop(columns = ['Province/State', 'Lat', 'Long'])
# Sum the province-level rows so that each country has a single row
covid_confirm = covid_confirm.groupby('Country/Region').agg('sum')
date_list = list(covid_confirm.columns)
# Generate three-letter country codes
def get_country_code(country):
    '''
    Generate the three-letter (ISO 3166-1 alpha-3) country code for a country
    Input: Country name
    Output: Three-letter country code, or None if the name is not recognized
    '''
    try:
        return pycountry.countries.lookup(country).alpha_3
    # lookup raises LookupError if the country is not in the standard country dictionary
    except LookupError:
        return None
covid_confirm['country'] = covid_confirm.index
# Use apply function in pandas to country column to generate country code
covid_confirm['country_code'] = covid_confirm['country'].apply(get_country_code)
# Here we can see a wide format dataset
# We need to transform the dataset into long format
# We pass a dataset, assign country and country code as id variables and represent confirmed cases
# based on date as value variable
covid_confirm_long = pd.melt(covid_confirm, id_vars = ['country', 'country_code'], value_vars = date_list)
# It's good practice to give the date and confirmed-case columns descriptive names
# using DataFrame.rename
covid_confirm_long = covid_confirm_long.rename(columns = {'variable':'date', 'value':'confirmed_cases'})
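To make the wide-to-long reshaping concrete, here is a minimal sketch on a toy table. The numbers are made up for illustration and the variable names wide and long_df are hypothetical; only the column layout mirrors the cleaned dataset.

```python
import pandas as pd

# Toy wide-format table (made-up numbers): one row per country, one column per date
wide = pd.DataFrame({
    'country': ['Spain', 'Italy'],
    'country_code': ['ESP', 'ITA'],
    '1/22/20': [0, 0],
    '1/23/20': [1, 2],
})

# Melt the date columns into rows: one row per (country, date) pair.
# var_name and value_name set the new column names directly.
long_df = pd.melt(wide,
                  id_vars=['country', 'country_code'],
                  var_name='date',
                  value_name='confirmed_cases')
print(long_df)
```

Passing var_name and value_name to pd.melt is an alternative to renaming the default variable and value columns afterwards; both approaches give the same table.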
Suppose we want to look at Spain, the cleaned long format dataframe would look like the following:
covid_confirm_long[covid_confirm_long['country']=='Spain']
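Some names in the JHU table (for example "Korea, South" or "Taiwan*") may not resolve to a country code, in which case get_country_code returns None and the country is left off the map. The sketch below shows one possible way to list such rows and patch them by hand. The toy dataframe and the manual_codes dictionary are assumptions made for illustration, not part of the original workflow.

```python
import pandas as pd

# Toy stand-in for the cleaned table; None marks a name the lookup could not resolve
df = pd.DataFrame({
    'country': ['Spain', 'Korea, South', 'Taiwan*'],
    'country_code': ['ESP', None, None],
})

# List the countries that would be missing from the map
unmatched = df.loc[df['country_code'].isna(), 'country'].tolist()
print(unmatched)

# Hypothetical manual overrides for the names that did not resolve
manual_codes = {'Korea, South': 'KOR', 'Taiwan*': 'TWN'}

# Fill only the missing codes; codes that were already found stay untouched
df['country_code'] = df['country_code'].fillna(df['country'].map(manual_codes))
print(df)
```

Running the check before melting keeps the list short, since each country still occupies a single row at that stage.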
After cleaning the dataset, you can create an animation showing the growth in confirmed COVID-19 cases with the Express module in Plotly.
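Here is a demonstration of creating an animated choropleth map from the cleaned dataframe covid_confirm_long. The parameters used for the choropleth map animation include locations (the column of three-letter country codes), color (the column that determines each country's color), hover_name (the label shown on hover), animation_frame (the column that defines the animation frames), projection (the map projection), color_continuous_scale (the color scale), and range_color (the range of values covered by the color scale).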
# You can run this code to get a list of continuous colorscales
# px.colors.named_colorscales()
# Create a figure object
# Assign confirmed cases data, and parameters for choropleth map
fig = px.choropleth(covid_confirm_long,
                    locations = 'country_code',
                    color = 'confirmed_cases',
                    hover_name = 'country',
                    animation_frame = 'date',
                    title = 'Total COVID-19 Confirmed Cases by Country',
                    height = 800,
                    projection = 'natural earth',
                    color_continuous_scale = 'rainbow',
                    range_color = [0, 500000]
                    )
fig.update_layout(margin = dict(l =50, r=50, t=100, b=75))
fig.show()
# Save the confirmed-cases map as an HTML file
fig.write_html("covid_infected_map.html")
Different countries' epidemics have followed different trajectories. The total confirmed_cases is largely proportional to each country's population.
The disease has hit the United States especially hard, followed by India and Brazil. About 10.4 million cases and 242,073 deaths have been reported in the U.S.; the deaths are shown in the next map.
The choice of range_color is important. Because the data are right-skewed, it is tempting to set the maximum of the color range well below the largest values, so that changes in color are visible during the animation. Alternatively, if we log-transform the counts before plotting, choosing the range becomes easier.
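As a rough sketch of the log-transformation idea, the counts can be converted with log10(x + 1) before plotting. The toy numbers and the log_cases column name below are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy long-format rows with illustrative (not real) counts
cases = pd.DataFrame({
    'country_code': ['USA', 'ESP', 'ISL'],
    'confirmed_cases': [10_400_000, 1_450_000, 0],
})

# log10(x + 1) maps a count of zero to 0 instead of -inf
cases['log_cases'] = np.log10(cases['confirmed_cases'] + 1)

# Round the largest log value up to get a simple upper bound for the color range
upper = float(np.ceil(cases['log_cases'].max()))
print(cases)
print([0, upper])
```

With this column in place, one would pass color = 'log_cases' and range_color = [0, upper] to px.choropleth. Each unit on the color bar then corresponds to a tenfold increase in cases, so small and large outbreaks remain distinguishable on the same map.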
Now that we have mapped the total confirmed cases by country, we can apply similar data cleaning and visualization steps to visualize COVID-19 deaths from January to November 2020.
# Import the data
covid_death = pd.read_csv('time_series_covid19_deaths_global.csv')
# Drop unnecessary columns from the dataset
covid_death = covid_death.drop(columns = ['Province/State', 'Lat', 'Long'])
# Sum the province-level rows so that each country has a single row
covid_death = covid_death.groupby('Country/Region').agg('sum')
date_list = list(covid_death.columns)
# Take the country names from this dataframe's own index
covid_death['country'] = covid_death.index
# Use apply function in pandas to country column to generate country code
covid_death['country_code'] = covid_death['country'].apply(get_country_code)
# We need to transform the dataset into long format
covid_death_long = pd.melt(covid_death, id_vars = ['country', 'country_code'], value_vars = date_list)
# It's good practice to give the date and death-case columns descriptive names
covid_death_long = covid_death_long.rename(columns = {'variable':'date', 'value':'death_cases'})
covid_death_long[covid_death_long['country']=='Spain']
# Create a figure object
# Assign death cases data, and parameters for choropleth map
fig = px.choropleth(covid_death_long,
                    locations = 'country_code',
                    color = 'death_cases',
                    hover_name = 'country',
                    animation_frame = 'date',
                    title = 'Total COVID-19 Death Cases by Country',
                    height = 800,
                    projection = 'natural earth',
                    color_continuous_scale = 'rainbow',
                    range_color = [0, 100000]
                    )
fig.update_layout(margin = dict(l =50, r=50, t=100, b=75))
fig.show()
# Save the death map as an HTML file
fig.write_html("covid_death_map.html")
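Because the confirmed and death datasets go through the same cleaning steps, those steps could be wrapped in a single helper. The function below is a minimal sketch under that assumption; to_long_format and its code_lookup parameter are hypothetical names, and the toy input uses made-up numbers with a plain dictionary standing in for get_country_code.

```python
import pandas as pd

def to_long_format(wide_df, value_name, code_lookup):
    """Aggregate a JHU-style wide table by country and melt it to long format."""
    # Drop the columns that are not needed for a country-level map
    df = wide_df.drop(columns=['Province/State', 'Lat', 'Long'])
    # Sum province-level rows so each country has one row
    df = df.groupby('Country/Region').sum()
    date_list = list(df.columns)
    # Turn the country index into a regular column
    df = df.reset_index().rename(columns={'Country/Region': 'country'})
    df['country_code'] = df['country'].apply(code_lookup)
    return pd.melt(df,
                   id_vars=['country', 'country_code'],
                   value_vars=date_list,
                   var_name='date',
                   value_name=value_name)

# Toy input with made-up numbers
toy = pd.DataFrame({
    'Province/State': ['Hubei', 'Beijing', None],
    'Country/Region': ['China', 'China', 'Spain'],
    'Lat': [30.9, 40.2, 40.0],
    'Long': [112.3, 116.4, -4.0],
    '1/22/20': [444, 14, 0],
    '1/23/20': [444, 22, 0],
})

# A dictionary lookup stands in for get_country_code in this sketch
codes = {'China': 'CHN', 'Spain': 'ESP'}
deaths_long = to_long_format(toy, 'death_cases', codes.get)
print(deaths_long)
```

In the tutorial's setting, the call would look like to_long_format(covid_death, 'death_cases', get_country_code) applied to the dataframe as read from the CSV file, before any columns are dropped.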