According to the latest CNN Health report, authorities in 217 countries and territories have reported about 52.4 million COVID-19 cases and 1.3 million deaths since China reported its first cases to the World Health Organization (WHO) in December 2019.

In this tutorial, I will show you how to use Plotly to create an animated choropleth map (a map that shades each region according to a data value) that visualizes the spread of COVID-19 over a 10-month period.

This tutorial demonstrates how to manipulate data and perform geospatial visualization in Python. In particular, it uses the COVID-19 time series dataset collected by Johns Hopkins University.

Prepare Libraries

This code requires three libraries: 1) Pandas, 2) PyCountry, and 3) Plotly.

In [ ]:
!pip install plotly
In [ ]:
!pip install pycountry
In [2]:
# Import libraries
import pandas as pd
import plotly.express as px
import pycountry

Load and Clean Dataset

The dataset used here is the JHU CSSE COVID-19 dataset. You can download or fork its latest version from the CSSE GitHub repository (https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series).

All the time series in this folder contain cumulative counts of confirmed COVID-19 cases, deaths, and recoveries, starting from the first recorded date, 22 January 2020.

In [89]:
# Import the COVID-19 dataframe
covid_confirm = pd.read_csv('time_series_covid19_confirmed_global.csv')
covid_confirm.head(5)
Out[89]:
Province/State Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 ... 11/2/20 11/3/20 11/4/20 11/5/20 11/6/20 11/7/20 11/8/20 11/9/20 11/10/20 11/11/20
0 NaN Afghanistan 33.93911 67.709953 0 0 0 0 0 0 ... 41633 41728 41814 41935 41975 42033 42092 42297 42463 42609
1 NaN Albania 41.15330 20.168300 0 0 0 0 0 0 ... 21523 21904 22300 22721 23210 23705 24206 24731 25294 25801
2 NaN Algeria 28.03390 1.659600 0 0 0 0 0 0 ... 58574 58979 59527 60169 60800 61381 62051 62693 63446 64257
3 NaN Andorra 42.50630 1.521800 0 0 0 0 0 0 ... 4888 4910 5045 5135 5135 5319 5383 5437 5477 5567
4 NaN Angola -11.20270 17.873900 0 0 0 0 0 0 ... 11228 11577 11813 12102 12223 12335 12433 12680 12816 12953

5 rows × 299 columns

Data Cleaning Steps

To visualize the COVID-19 outbreak dynamically with the Plotly package, we first clean the dataset in the following steps:

  • Aggregate the data to the country level
  • Retrieve the three-letter (ISO 3166-1 alpha-3) country code for each country with pycountry (see the Countries section at https://pypi.org/project/pycountry/)
  • Transform the dataset into long format with the pd.melt() function, so that all dates are represented in a single column
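To see what the first and last steps do to the shape of the table, here is a minimal sketch on a tiny made-up table laid out like the JHU file (the numbers are invented for illustration, not real data): the two Australian rows are summed into one country row, and melt then turns each date column into rows.

```python
import pandas as pd

# Toy wide-format table in the same layout as the JHU file (made-up numbers)
wide = pd.DataFrame({
    "Province/State": ["New South Wales", "Victoria", None],
    "Country/Region": ["Australia", "Australia", "Spain"],
    "1/22/20": [0, 1, 0],
    "1/23/20": [2, 3, 1],
})

# Step 1: aggregate to country level by summing the province/state rows
by_country = wide.drop(columns=["Province/State"]).groupby("Country/Region").sum()

# Step 3: reshape to long format, one row per (country, date) pair
long_df = by_country.reset_index().melt(
    id_vars="Country/Region", var_name="date", value_name="confirmed_cases"
)
print(long_df)
```

In this sketch, melt's var_name and value_name arguments name the new columns directly, which is an alternative to renaming the default 'variable' and 'value' columns afterwards.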
In [90]:
# Drop unnecessary columns from the dataset 
covid_confirm = covid_confirm.drop(columns = ['Province/State', 'Lat', 'Long'])

# Sum the province/state rows so that each country has a single row
covid_confirm = covid_confirm.groupby('Country/Region').agg('sum')
date_list = list(covid_confirm.columns)
In [8]:
# Generate three-letter country codes

def get_country_code(country):
    '''
    Input: Country name
    Output: Three-letter country codes
    Generate three-letter country codes for each country
    '''
    
    try:
        return pycountry.countries.lookup(country).alpha_3
    except LookupError:
        # lookup() raises LookupError when the name matches no country
        return None
In [91]:
covid_confirm['country'] = covid_confirm.index

# Apply get_country_code to each country name to generate its country code
covid_confirm['country_code'] = covid_confirm['country'].apply(get_country_code)
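get_country_code returns None for any name pycountry cannot match, and those countries are then left blank on the map. A minimal sketch for listing them and patching the gaps by hand is shown below; the MANUAL_CODES entries are assumptions about how the JHU file labels a few countries, so check them against the unmatched list your own data produces before relying on them.

```python
import pandas as pd

# Hypothetical manual overrides: JHU country name -> ISO 3166-1 alpha-3 code.
# Verify these against unmatched_countries() on your own data.
MANUAL_CODES = {
    "Korea, South": "KOR",
    "Taiwan*": "TWN",
    "Burma": "MMR",
    "Congo (Kinshasa)": "COD",
    "Congo (Brazzaville)": "COG",
}

def fill_missing_codes(df, overrides=MANUAL_CODES):
    """Return a copy of df where missing country codes are filled from overrides."""
    out = df.copy()
    out["country_code"] = out["country_code"].fillna(out["country"].map(overrides))
    return out

def unmatched_countries(df):
    """Return the sorted country names that still have no country code."""
    return sorted(df.loc[df["country_code"].isna(), "country"])
```

For example, unmatched_countries(covid_confirm) shows which names need an entry, and covid_confirm = fill_missing_codes(covid_confirm) applies the overrides. Non-country entries (such as cruise ships) have no ISO code and will simply stay unmatched.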
In [92]:
# The dataset is in wide format, with one column per date,
# so we transform it into long format: country and country_code are the id variables,
# and each date column becomes rows whose value is the cumulative number of confirmed cases
covid_confirm_long = pd.melt(covid_confirm, id_vars = ['country', 'country_code'], value_vars = date_list)
In [96]:
# It's good practice to give descriptive names to the date and confirmed-case columns,
# so rename melt's default 'variable' and 'value' columns with DataFrame.rename
covid_confirm_long = covid_confirm_long.rename(columns = {'variable':'date', 'value':'confirmed_cases'})

For example, the rows for Spain in the cleaned long-format dataframe look like this:

In [97]:
covid_confirm_long[covid_confirm_long['country']=='Spain']
Out[97]:
country country_code date confirmed_cases
160 Spain ESP 1/22/20 0
351 Spain ESP 1/23/20 0
542 Spain ESP 1/24/20 0
733 Spain ESP 1/25/20 0
924 Spain ESP 1/26/20 0
... ... ... ... ...
55550 Spain ESP 11/7/20 1328832
55741 Spain ESP 11/8/20 1328832
55932 Spain ESP 11/9/20 1381218
56123 Spain ESP 11/10/20 1381218
56314 Spain ESP 11/11/20 1417709

295 rows × 4 columns
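The date column still holds JHU's 'm/d/yy' strings. These are fine for the animation as long as the rows stay in their original order, but they do not sort chronologically as text ('10/1/20' sorts before '2/1/20'). If you plan to sort or filter by date, a minimal sketch (an optional addition, not part of the original workflow) is to parse the strings and keep an ISO-formatted label for display:

```python
import pandas as pd

def add_date_columns(df, col="date"):
    """Parse JHU 'm/d/yy' date strings and add an ISO 'YYYY-MM-DD' label column.

    Rows are sorted by the parsed dates (then by country), so the label
    column appears in chronological order.
    """
    out = df.copy()
    out[col] = pd.to_datetime(out[col], format="%m/%d/%y")
    out[f"{col}_label"] = out[col].dt.strftime("%Y-%m-%d")
    return out.sort_values([col, "country"]).reset_index(drop=True)
```

Because ISO dates sort the same way as text and as dates, a column like date_label could then be passed as animation_frame, for example.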

Create Map Animation with Plotly

After cleaning the dataset, you can use Plotly Express (plotly.express) to create an animation showing the growing number of confirmed COVID-19 cases.

Here is a demonstration of creating an animated choropleth map from the cleaned dataframe covid_confirm_long. The parameters used for the choropleth map animation include:

  • input dataframe - the long-format covid_confirm_long
  • locations - to use Plotly's built-in country geometries, provide locations as three-letter ISO country codes
  • color - the color is set to the cumulative confirmed_cases
  • hover_name - show the country name as the hover information (in bold at the top of the tooltip)
  • animation_frame - create one frame per date, which adds play and stop buttons and a date slider under the map
  • title - the title of the map
  • height - the figure height in pixels
  • projection - the map projection; 'natural earth' is a pseudocylindrical projection that shows the whole globe
  • color_continuous_scale - the color scheme (here 'rainbow')
  • range_color - the range of values mapped onto the color scale; 0 to 500,000 is used here rather than 0 to the maximum value (see Interpretation)
In [61]:
# You can run this code to get a list of continuous colorscales
# px.colors.named_colorscales()
In [116]:
# Create a figure object 
# Assign confirmed cases data, and parameters for choropleth map
fig = px.choropleth(covid_confirm_long,
                    locations = 'country_code',
                    color = 'confirmed_cases',
                    hover_name = 'country', 
                    animation_frame = 'date',
                    title = 'Total COVID-19 Confirmed Cases by Country',
                    height = 800,
                    projection = 'natural earth',
                    color_continuous_scale = 'rainbow',
                    range_color = [0, 500000]
                   )
fig.update_layout(margin=dict(l=50, r=50, t=100, b=75))
fig.show()
# Save the interactive map as a standalone HTML file
fig.write_html("covid_infected_map.html")

Interpretation

Different countries' epidemics have followed different trajectories. Total confirmed cases tend to be higher in more populous countries.

The disease has hit the United States especially hard, followed by India and Brazil. About 10.4 million cases and 242,073 deaths have been reported in the U.S.; the deaths are shown in the next map.

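The choice of range_color is important. Because the data are right-skewed (a few countries have far more cases than the rest), using the maximum as the upper bound would leave most countries in the lowest colors for most of the animation. We are therefore tempted to choose a smaller maximum (500,000 here), which keeps color changes visible during the animation, at the cost of giving every country above that value the same top color. If we log-transform the counts before plotting, choosing the range becomes easier.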
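A minimal sketch of the log-transformation idea, assuming a long-format dataframe like covid_confirm_long: it adds a log10(1 + x) copy of the count column (the +1 keeps zero counts at 0 instead of minus infinity) and returns a color range that simply runs from 0 to the maximum of the transformed values. The helper name add_log_column is hypothetical.

```python
import numpy as np
import pandas as pd

def add_log_column(df, value_col, new_col=None):
    """Return (copy of df with a log10(1 + value) column, [min, max] color range).

    The log scale compresses right-skewed counts, so the full range of the
    transformed values can be used for the color scale.
    """
    new_col = new_col or f"log_{value_col}"
    out = df.copy()
    out[new_col] = np.log10(out[value_col] + 1)
    return out, [0, float(out[new_col].max())]
```

For example, after covid_log, log_range = add_log_column(covid_confirm_long, 'confirmed_cases'), you could pass color='log_confirmed_cases' and range_color=log_range to px.choropleth. Keep in mind that the colorbar then shows log10 values, so a tick at 5 corresponds to roughly 100,000 cases.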

Recovery and Death Cases

Now that we have mapped total confirmed cases by country, we can apply the same data cleaning and visualization steps to the death and recovered time series. Below, we map COVID-19 deaths from January to November 2020; recovered cases can be handled the same way.
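Since the same cleaning steps are repeated for each file, they could be wrapped in one function. The sketch below is a hypothetical helper, to_long_format, that assumes the column layout shown in Out[89] and takes the country-code function as an argument; it also keeps each file's country names tied to that file's own index.

```python
import pandas as pd

def to_long_format(wide, value_name, code_func):
    """Aggregate a JHU global time series by country and melt it to long format.

    wide       -- DataFrame read from one of the JHU global time series CSV files
    value_name -- name for the value column, e.g. 'confirmed_cases' or 'death_cases'
    code_func  -- function mapping a country name to a three-letter code
                  (get_country_code in this tutorial)
    """
    by_country = (
        wide.drop(columns=["Province/State", "Lat", "Long"])
            .groupby("Country/Region")
            .sum()
    )
    date_cols = list(by_country.columns)
    by_country["country"] = by_country.index  # names come from this frame's index
    by_country["country_code"] = by_country["country"].apply(code_func)
    return by_country.melt(
        id_vars=["country", "country_code"],
        value_vars=date_cols,
        var_name="date",
        value_name=value_name,
    )
```

Usage would then be, for example, covid_death_long = to_long_format(pd.read_csv('time_series_covid19_deaths_global.csv'), 'death_cases', get_country_code).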

In [108]:
# Import the data
covid_death = pd.read_csv('time_series_covid19_deaths_global.csv')
# Drop unnecessary columns from the dataset 
covid_death = covid_death.drop(columns = ['Province/State', 'Lat', 'Long'])

# Sum the province/state rows so that each country has a single row
covid_death = covid_death.groupby('Country/Region').agg('sum')
date_list = list(covid_death.columns)

# Take the country names from this dataframe's own index
covid_death['country'] = covid_death.index

# Apply get_country_code to each country name to generate its country code
covid_death['country_code'] = covid_death['country'].apply(get_country_code)
# We need to transform the dataset into long format
covid_death_long = pd.melt(covid_death, id_vars = ['country', 'country_code'], value_vars = date_list)

# It's good practice to give descriptive names to the date and death-case columns
covid_death_long = covid_death_long.rename(columns = {'variable':'date', 'value':'death_cases'})
covid_death_long[covid_death_long['country']=='Spain']
Out[108]:
country country_code date death_cases
160 Spain ESP 1/22/20 0
351 Spain ESP 1/23/20 0
542 Spain ESP 1/24/20 0
733 Spain ESP 1/25/20 0
924 Spain ESP 1/26/20 0
... ... ... ... ...
55550 Spain ESP 11/7/20 38833
55741 Spain ESP 11/8/20 38833
55932 Spain ESP 11/9/20 39345
56123 Spain ESP 11/10/20 39345
56314 Spain ESP 11/11/20 40105

295 rows × 4 columns

In [118]:
# Create a figure object 
# Assign death cases data, and parameters for choropleth map
fig = px.choropleth(covid_death_long,
                    locations = 'country_code',
                    color = 'death_cases',
                    hover_name = 'country', 
                    animation_frame = 'date',
                    title = 'Total COVID-19 Death Cases by Country',
                    height = 800,
                    projection = 'natural earth',
                    color_continuous_scale = 'rainbow',
                    range_color = [0, 100000]
                   )
fig.update_layout(margin=dict(l=50, r=50, t=100, b=75))
fig.show()
# Save the death map as a standalone HTML file
fig.write_html("covid_death_map.html")

Future Directions

  • COVID-19 death cases
  • COVID-19 recovered cases
  • COVID-19 medical expenditure choropleth map

Documentation