Storytelling Data Visualization on Exchange Rates¶

In this project, we will focus on explanatory data visualization and practice the following:

  • How to use information design principles (familiarity and maximizing the data-ink ratio) to create better graphs for an audience.
  • About the elements of a story and how to create storytelling data visualizations using Matplotlib.
  • How to cretate visual patterns using Gestalt principles.
  • How to guide the audience's attention with pre-attentive attributes.
  • How to use Matplotlib built-in sytles — with a case study on the FiveThirtyEight style.

Introducing the Dataset¶

The dataset we'll use describes Euro daily exhcange rates between 1999 and 2021. The euro (symbolized with €) is the official currency in most of the countries of the European Union.

If the exchange rate of the euro to the US dollar is 1.5, you get 1.5 US dollars if you pay 1.0 euro (one euro has more value than one US dollar at this exchange rate).

Daria Chemkaeva put together the data set and made it available on Kaggle — the data source is the European Central Bank. Note that the dataset gets regualr updates — the one in this project is downloaded on March 2023.

Let's start by reading in the dataset.

In [1]:
import pandas as pd

exchange_rates = pd.read_csv('euro-daily-hist_1999_2022.csv')
exchange_rates.head()
Out[1]:
Period\Unit: [Australian dollar ] [Bulgarian lev ] [Brazilian real ] [Canadian dollar ] [Swiss franc ] [Chinese yuan renminbi ] [Cypriot pound ] [Czech koruna ] [Danish krone ] ... [Romanian leu ] [Russian rouble ] [Swedish krona ] [Singapore dollar ] [Slovenian tolar ] [Slovak koruna ] [Thai baht ] [Turkish lira ] [US dollar ] [South African rand ]
0 2023-01-27 1.5289 1.9558 5.5104 1.4479 1.0017 7.3690 NaN 23.826 7.4378 ... 4.8965 NaN 11.2108 1.4277 NaN NaN 35.7020 20.4365 1.0865 18.7185
1 2023-01-26 1.5308 1.9558 5.5572 1.4568 1.0002 7.3893 NaN 23.818 7.4383 ... 4.8818 NaN 11.1763 1.4292 NaN NaN 35.6870 20.4961 1.0895 18.6127
2 2023-01-25 1.5360 1.9558 5.5690 1.4544 1.0020 7.3778 NaN 23.808 7.4381 ... 4.9035 NaN 11.1335 1.4307 NaN NaN 35.7180 20.4658 1.0878 18.6745
3 2023-01-24 1.5470 1.9558 5.6164 1.4517 1.0053 7.3642 NaN 23.874 7.4399 ... 4.9171 NaN 11.0995 1.4337 NaN NaN 35.6090 20.4234 1.0858 18.7687
4 2023-01-23 1.5529 1.9558 5.6372 1.4523 1.0013 7.3730 NaN 23.881 7.4393 ... 4.9202 NaN 11.1183 1.4328 NaN NaN 35.6300 20.4478 1.0871 18.6597

5 rows × 41 columns

In [2]:
exchange_rates.tail()
Out[2]:
Period\Unit: [Australian dollar ] [Bulgarian lev ] [Brazilian real ] [Canadian dollar ] [Swiss franc ] [Chinese yuan renminbi ] [Cypriot pound ] [Czech koruna ] [Danish krone ] ... [Romanian leu ] [Russian rouble ] [Swedish krona ] [Singapore dollar ] [Slovenian tolar ] [Slovak koruna ] [Thai baht ] [Turkish lira ] [US dollar ] [South African rand ]
6224 1999-01-08 1.8406 NaN NaN 1.7643 1.6138 NaN 0.58187 34.938 7.4433 ... 1.3143 27.2075 9.1650 1.9537 188.8400 42.560 42.5590 0.3718 1.1659 6.7855
6225 1999-01-07 1.8474 NaN NaN 1.7602 1.6165 NaN 0.58187 34.886 7.4431 ... 1.3092 26.9876 9.1800 1.9436 188.8000 42.765 42.1678 0.3701 1.1632 6.8283
6226 1999-01-06 1.8820 NaN NaN 1.7711 1.6116 NaN 0.58200 34.850 7.4452 ... 1.3168 27.4315 9.3050 1.9699 188.7000 42.778 42.6949 0.3722 1.1743 6.7307
6227 1999-01-05 1.8944 NaN NaN 1.7965 1.6123 NaN 0.58230 34.917 7.4495 ... 1.3168 26.5876 9.4025 1.9655 188.7750 42.848 42.5048 0.3728 1.1790 6.7975
6228 1999-01-04 1.9100 NaN NaN 1.8004 1.6168 NaN 0.58231 35.107 7.4501 ... 1.3111 25.2875 9.4696 1.9554 189.0450 42.991 42.6799 0.3723 1.1789 6.9358

5 rows × 41 columns

In [3]:
exchange_rates.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6229 entries, 0 to 6228
Data columns (total 41 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Period\Unit:              6229 non-null   object 
 1   [Australian dollar ]      6229 non-null   object 
 2   [Bulgarian lev ]          5827 non-null   object 
 3   [Brazilian real ]         5961 non-null   object 
 4   [Canadian dollar ]        6229 non-null   object 
 5   [Swiss franc ]            6229 non-null   object 
 6   [Chinese yuan renminbi ]  5961 non-null   object 
 7   [Cypriot pound ]          2346 non-null   object 
 8   [Czech koruna ]           6229 non-null   object 
 9   [Danish krone ]           6229 non-null   object 
 10  [Estonian kroon ]         3130 non-null   object 
 11  [UK pound sterling ]      6229 non-null   object 
 12  [Greek drachma ]          520 non-null    object 
 13  [Hong Kong dollar ]       6229 non-null   object 
 14  [Croatian kuna ]          5941 non-null   object 
 15  [Hungarian forint ]       6229 non-null   object 
 16  [Indonesian rupiah ]      6229 non-null   object 
 17  [Israeli shekel ]         5961 non-null   object 
 18  [Indian rupee ]           5961 non-null   object 
 19  [Iceland krona ]          3822 non-null   float64
 20  [Japanese yen ]           6229 non-null   object 
 21  [Korean won ]             6229 non-null   object 
 22  [Lithuanian litas ]       4159 non-null   object 
 23  [Latvian lats ]           3904 non-null   object 
 24  [Maltese lira ]           2346 non-null   object 
 25  [Mexican peso ]           6229 non-null   object 
 26  [Malaysian ringgit ]      6229 non-null   object 
 27  [Norwegian krone ]        6229 non-null   object 
 28  [New Zealand dollar ]     6229 non-null   object 
 29  [Philippine peso ]        6229 non-null   object 
 30  [Polish zloty ]           6229 non-null   object 
 31  [Romanian leu ]           6167 non-null   float64
 32  [Russian rouble ]         5994 non-null   object 
 33  [Swedish krona ]          6229 non-null   object 
 34  [Singapore dollar ]       6229 non-null   object 
 35  [Slovenian tolar ]        2085 non-null   object 
 36  [Slovak koruna ]          2608 non-null   object 
 37  [Thai baht ]              6229 non-null   object 
 38  [Turkish lira ]           6167 non-null   float64
 39  [US dollar ]              6229 non-null   object 
 40  [South African rand ]     6229 non-null   object 
dtypes: float64(3), object(38)
memory usage: 1.9+ MB

Data Cleaning¶

Before we start creating data visualizations, we'll need to do a bit of data cleaning. Our focus in this porject will be on the exchange rate between the euro and the American dollar. Below we do the following:

  • Rename the [US dollar] and Period\Unit: columns to something easier to type — US_dollar and Time.
  • Change the Time column to a datetime data type.
  • Sort the values by Time in ascending order.
  • Reset the index (and drop the initial index).
  • Change the 'US_dollar' column to float data type.
In [4]:
exchange_rates.rename(columns={'[US dollar ]': 'US_dollar',
                               'Period\\Unit:': 'Time'},
                               inplace=True)
exchange_rates['Time'] = pd.to_datetime(exchange_rates['Time'])
exchange_rates.sort_values('Time', inplace=True)
exchange_rates.reset_index(drop=True, inplace=True)
exchange_rates.head()
Out[4]:
Time [Australian dollar ] [Bulgarian lev ] [Brazilian real ] [Canadian dollar ] [Swiss franc ] [Chinese yuan renminbi ] [Cypriot pound ] [Czech koruna ] [Danish krone ] ... [Romanian leu ] [Russian rouble ] [Swedish krona ] [Singapore dollar ] [Slovenian tolar ] [Slovak koruna ] [Thai baht ] [Turkish lira ] US_dollar [South African rand ]
0 1999-01-04 1.9100 NaN NaN 1.8004 1.6168 NaN 0.58231 35.107 7.4501 ... 1.3111 25.2875 9.4696 1.9554 189.0450 42.991 42.6799 0.3723 1.1789 6.9358
1 1999-01-05 1.8944 NaN NaN 1.7965 1.6123 NaN 0.58230 34.917 7.4495 ... 1.3168 26.5876 9.4025 1.9655 188.7750 42.848 42.5048 0.3728 1.1790 6.7975
2 1999-01-06 1.8820 NaN NaN 1.7711 1.6116 NaN 0.58200 34.850 7.4452 ... 1.3168 27.4315 9.3050 1.9699 188.7000 42.778 42.6949 0.3722 1.1743 6.7307
3 1999-01-07 1.8474 NaN NaN 1.7602 1.6165 NaN 0.58187 34.886 7.4431 ... 1.3092 26.9876 9.1800 1.9436 188.8000 42.765 42.1678 0.3701 1.1632 6.8283
4 1999-01-08 1.8406 NaN NaN 1.7643 1.6138 NaN 0.58187 34.938 7.4433 ... 1.3143 27.2075 9.1650 1.9537 188.8400 42.560 42.5590 0.3718 1.1659 6.7855

5 rows × 41 columns

In this project, we'll focus on Euro to US dollar exchange rate.

In [5]:
euro_to_dollar = exchange_rates[['Time', 'US_dollar']].copy()
euro_to_dollar['US_dollar'].value_counts()
Out[5]:
-         62
1.2276     9
1.1215     8
1.1268     7
1.1305     7
          ..
1.3818     1
1.3591     1
1.3405     1
1.3580     1
1.0865     1
Name: US_dollar, Length: 3718, dtype: int64

There are 62 '-' characters in the US_dollar column, which need to be removed as invalid value.

In [6]:
euro_to_dollar = euro_to_dollar[euro_to_dollar['US_dollar'] != '-']
euro_to_dollar['US_dollar'] = euro_to_dollar['US_dollar'].astype(float)

Rolling Mean¶

Now that we're finished cleaning the data, we'll generate a line plot to visualize the evolution of the euro-dollar exhange rate.

In [7]:
import matplotlib.pyplot as plt
%matplotlib inline

plt.plot(euro_to_dollar['Time'], euro_to_dollar['US_dollar'])
plt.show()

If we look at the line's shape, we see many small wiggles — rather than seeing a smooth line. The wiggles, however, have meaning: : they are the visual representation of the daily variation in the exchange rate. The rate goes up and down, up and down again, day to day. The rate only shows clear upward or downward trends in the longer run (months or years).

Depending on our goals, we may not want to show that daily variation on our graph. If we want to hide it and show only the long-term trends, we can use the rolling mean (also known as the moving average). Here we apply different moving windows for the visualization.

In [8]:
plt.figure(figsize=(9, 6))

plt.subplot(3,2,1)
plt.plot(euro_to_dollar['Time'], euro_to_dollar['US_dollar'])
plt.title('Roling Window = 0 Days', weight='bold')

windows = [7, 30, 50, 100, 365]
for i in range(5):
    plt.subplot(3,2,i+2)
    plt.plot(euro_to_dollar['Time'],
             euro_to_dollar['US_dollar'].rolling(windows[i]).mean())
    plt.title('Rolling Window = ' + str(windows[i]) + ' Days', weight='bold')

plt.tight_layout() # Auto-adjusts the padding between subplots
plt.show()

Let's choose the rolling window of 30 days for the following analysis.

In [9]:
euro_to_dollar['rolling_mean'] = euro_to_dollar['US_dollar'].rolling(30).mean()
euro_to_dollar
Out[9]:
Time US_dollar rolling_mean
0 1999-01-04 1.1789 NaN
1 1999-01-05 1.1790 NaN
2 1999-01-06 1.1743 NaN
3 1999-01-07 1.1632 NaN
4 1999-01-08 1.1659 NaN
... ... ... ...
6224 2023-01-23 1.0871 1.067830
6225 2023-01-24 1.0858 1.068817
6226 2023-01-25 1.0878 1.069927
6227 2023-01-26 1.0895 1.070747
6228 2023-01-27 1.0865 1.071560

6167 rows × 3 columns

Coming Up With an Idea¶

To create a story, remember that we need to arrange our data into a series of events that show change. Here are two story ideas for our data:

  • Show how the euro-dollar rate changed during the 2007-2008's financial crisis. We can also show the data for 2016 and 2009 for comparison. We can use a line plot.
  • Show comparatively how the euro-dollar rate changed under the last three US presidents (George W. Bush (2001-2009), Barack Obama (2009-2017), and Donald Trump (2017-2021)). We can use a line plot.

Financial Crisis¶

In [10]:
financial_crisis = euro_to_dollar.copy(
                   )[(euro_to_dollar['Time'].dt.year >= 2006
                   ) & (euro_to_dollar['Time'].dt.year <= 2009)]
financial_crisis_7_8 = euro_to_dollar.copy(
                   )[(euro_to_dollar.Time.dt.year >= 2007
                   ) & (euro_to_dollar.Time.dt.year <= 2008)]
In [11]:
### Adding the FiveThirtyEight style
import matplotlib.style as style
import warnings
warnings.filterwarnings("ignore")
style.use('fivethirtyeight')

### Adding the plot
fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(financial_crisis['Time'],
        financial_crisis['rolling_mean'],
        linewidth=1, color='#A6D785')

### Highlighting the 2007-2008 period
ax.plot(financial_crisis_7_8['Time'],
        financial_crisis_7_8['rolling_mean'],
        linewidth=3, color='#e23d28')

### Highlighting the peak of the crisis
ax.axvspan(xmin=13960, xmax=14140, ymin=0.09,
           alpha=0.3, color='grey')

### Adding separate tick labels
ax.set_xticklabels([])
ax.set_yticklabels([])

x = 13110
for year in ['2006', '2007', '2008', '2009', '2010']:
    ax.text(x, 1.13, year, alpha=0.5, fontsize=11)
    x += 365

y = 1.193
for rate in ['1.2', '1.3', '1.4', '1.5']:
    ax.text(13010, y, rate, alpha=0.5, fontsize=11)
    y += 0.1

### Adding a title and a subtitle
ax.text(13010, 1.67, "Euro-USD rate peaked at 1.59 during 2007-2008's finacial crisis",
        weight='bold')
ax.text(13010, 1.63, 'Eoro-USD exchange rates between 2006 and 2010',
        size=12)

### Adding a signature
ax.text(13010, 1.07, 'Clark' + ' '*108 + 'Source: European Central Bank',
        color='#f0f0f0', backgroundcolor='#4d4d4d', size=10)

### Add some Transparency to the grid
ax.grid(alpha=0.5)

plt.show()

The Three US Presidencies¶

In [12]:
bush_obama_trump = euro_to_dollar.copy(
                   )[(euro_to_dollar['Time'].dt.year >= 2001) & (euro_to_dollar['Time'].dt.year < 2021)]
bush = bush_obama_trump.copy(
       )[bush_obama_trump['Time'].dt.year < 2009]
obama = bush_obama_trump.copy(
       )[(bush_obama_trump['Time'].dt.year >= 2009) & (bush_obama_trump['Time'].dt.year < 2017)]
trump = bush_obama_trump.copy(
       )[(bush_obama_trump['Time'].dt.year >= 2017) & (bush_obama_trump['Time'].dt.year < 2021)]

Below, you'll notice we used matplotlib's functional approach to build the graphs. We use this approach because it offers more flexibility in arranging the subplots:

  • First build three of the graphs on a 2-by-3 grid (this grid should have six subplots, but only build three; the bottom row remains empty).
  • Then build only the bottom graph of a 2-by-1 grid (this grid should have two subplots; the top row remains empty).
  • The two grids are merged, and end up with three graphs on the top row and one graph on the bottom row.
In [13]:
### Adding the FiveThirtyEight style
style.use('fivethirtyeight')

### Adding the subplots
plt.figure(figsize=(12, 6))
ax1 = plt.subplot(2,3,1)
ax2 = plt.subplot(2,3,2)
ax3 = plt.subplot(2,3,3)
ax4 = plt.subplot(2,1,2)
axes = [ax1, ax2, ax3, ax4]

### Changes to all the subplots
for ax in axes:
    ax.set_ylim(0.8, 1.7)
    ax.set_yticks([1.0, 1.2, 1.4, 1.6])
    ax.set_yticklabels(['1.0', '1.2','1.4', '1.6'],
                   alpha=0.3)
    ax.grid(alpha=0.5)    
    

### Ax1: Bush
ax1.plot(bush['Time'], bush['rolling_mean'],
        color='#BF5FFF')
ax1.set_xticklabels(['', '2001', '', '2003', '', '2005', '',
                     '2007', '', '2009'],
                   alpha=0.3)
ax1.text(12210.0, 1.92, 'BUSH', fontsize=18, weight='bold',
        color='#BF5FFF')
ax1.text(11910.0, 1.8, '(2001-2009)', weight='bold',
        alpha=0.3)

### Ax2: Obama
ax2.plot(obama['Time'], obama['rolling_mean'],
        color='#ffa500')
ax2.set_xticklabels(['', '2009', '', '2011', '', '2013', '',
                     '2015', '', '2017'],
                   alpha=0.3)
ax2.text(15050.0, 1.92, 'OBAMA', fontsize=18, weight='bold',
        color='#ffa500')
ax2.text(14920.0, 1.8, '(2009-2017)', weight='bold',
         alpha=0.3)


### Ax3: Trump
ax3.plot(trump['Time'], trump['rolling_mean'],
        color='#00B2EE')
ax3.set_xticklabels(['2017', '', '2018', '', '2019', '',
                     '2020', '', '2021'],
                   alpha=0.3)
ax3.text(17700.0, 1.92, 'TRUMP', fontsize=18, weight='bold',
        color='#00B2EE')
ax3.text(17620.0, 1.8, '(2017-2021)', weight='bold',
         alpha=0.3)

### Ax4: Bush-Obama-Trump
ax4.plot(bush['Time'], bush['rolling_mean'],
        color='#BF5FFF')
ax4.plot(obama['Time'], obama['rolling_mean'],
        color='#ffa500')
ax4.plot(trump['Time'], trump['rolling_mean'],
        color='#00B2EE')
ax4.grid(alpha=0.5)
ax4.set_xticks([])

### Adding a title and a subtitle
ax1.text(10800.0, 2.35, 'EURO-USD rate averaged 1.22 under the last three US presidents',
         fontsize=20, weight='bold')
ax1.text(10800.0, 2.14, '''EURO-USD exchange rates under George W. Bush (2001 - 2009), Barack Obama (2009-2017),
and Donald Trump (2017-2021)''',
        fontsize=16)

### Adding a signature
ax4.text(10800.0, 0.65, 'Clark' + ' '*103 + 'Source: European Central Bank',
        color = '#f0f0f0', backgroundcolor = '#4d4d4d',
        size=14)

plt.show()