In this project, we will focus on explanatory data visualization and practice the following:
Matplotlib
.The dataset we'll use describes Euro daily exhcange rates between 1999 and 2021. The euro (symbolized with €) is the official currency in most of the countries of the European Union.
If the exchange rate of the euro to the US dollar is 1.5, you get 1.5 US dollars if you pay 1.0 euro (one euro has more value than one US dollar at this exchange rate).
Daria Chemkaeva put together the data set and made it available on Kaggle — the data source is the European Central Bank. Note that the dataset gets regualr updates — the one in this project is downloaded on March 2023.
Let's start by reading in the dataset.
import pandas as pd
exchange_rates = pd.read_csv('euro-daily-hist_1999_2022.csv')
exchange_rates.head()
Period\Unit: | [Australian dollar ] | [Bulgarian lev ] | [Brazilian real ] | [Canadian dollar ] | [Swiss franc ] | [Chinese yuan renminbi ] | [Cypriot pound ] | [Czech koruna ] | [Danish krone ] | ... | [Romanian leu ] | [Russian rouble ] | [Swedish krona ] | [Singapore dollar ] | [Slovenian tolar ] | [Slovak koruna ] | [Thai baht ] | [Turkish lira ] | [US dollar ] | [South African rand ] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2023-01-27 | 1.5289 | 1.9558 | 5.5104 | 1.4479 | 1.0017 | 7.3690 | NaN | 23.826 | 7.4378 | ... | 4.8965 | NaN | 11.2108 | 1.4277 | NaN | NaN | 35.7020 | 20.4365 | 1.0865 | 18.7185 |
1 | 2023-01-26 | 1.5308 | 1.9558 | 5.5572 | 1.4568 | 1.0002 | 7.3893 | NaN | 23.818 | 7.4383 | ... | 4.8818 | NaN | 11.1763 | 1.4292 | NaN | NaN | 35.6870 | 20.4961 | 1.0895 | 18.6127 |
2 | 2023-01-25 | 1.5360 | 1.9558 | 5.5690 | 1.4544 | 1.0020 | 7.3778 | NaN | 23.808 | 7.4381 | ... | 4.9035 | NaN | 11.1335 | 1.4307 | NaN | NaN | 35.7180 | 20.4658 | 1.0878 | 18.6745 |
3 | 2023-01-24 | 1.5470 | 1.9558 | 5.6164 | 1.4517 | 1.0053 | 7.3642 | NaN | 23.874 | 7.4399 | ... | 4.9171 | NaN | 11.0995 | 1.4337 | NaN | NaN | 35.6090 | 20.4234 | 1.0858 | 18.7687 |
4 | 2023-01-23 | 1.5529 | 1.9558 | 5.6372 | 1.4523 | 1.0013 | 7.3730 | NaN | 23.881 | 7.4393 | ... | 4.9202 | NaN | 11.1183 | 1.4328 | NaN | NaN | 35.6300 | 20.4478 | 1.0871 | 18.6597 |
5 rows × 41 columns
exchange_rates.tail()
Period\Unit: | [Australian dollar ] | [Bulgarian lev ] | [Brazilian real ] | [Canadian dollar ] | [Swiss franc ] | [Chinese yuan renminbi ] | [Cypriot pound ] | [Czech koruna ] | [Danish krone ] | ... | [Romanian leu ] | [Russian rouble ] | [Swedish krona ] | [Singapore dollar ] | [Slovenian tolar ] | [Slovak koruna ] | [Thai baht ] | [Turkish lira ] | [US dollar ] | [South African rand ] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6224 | 1999-01-08 | 1.8406 | NaN | NaN | 1.7643 | 1.6138 | NaN | 0.58187 | 34.938 | 7.4433 | ... | 1.3143 | 27.2075 | 9.1650 | 1.9537 | 188.8400 | 42.560 | 42.5590 | 0.3718 | 1.1659 | 6.7855 |
6225 | 1999-01-07 | 1.8474 | NaN | NaN | 1.7602 | 1.6165 | NaN | 0.58187 | 34.886 | 7.4431 | ... | 1.3092 | 26.9876 | 9.1800 | 1.9436 | 188.8000 | 42.765 | 42.1678 | 0.3701 | 1.1632 | 6.8283 |
6226 | 1999-01-06 | 1.8820 | NaN | NaN | 1.7711 | 1.6116 | NaN | 0.58200 | 34.850 | 7.4452 | ... | 1.3168 | 27.4315 | 9.3050 | 1.9699 | 188.7000 | 42.778 | 42.6949 | 0.3722 | 1.1743 | 6.7307 |
6227 | 1999-01-05 | 1.8944 | NaN | NaN | 1.7965 | 1.6123 | NaN | 0.58230 | 34.917 | 7.4495 | ... | 1.3168 | 26.5876 | 9.4025 | 1.9655 | 188.7750 | 42.848 | 42.5048 | 0.3728 | 1.1790 | 6.7975 |
6228 | 1999-01-04 | 1.9100 | NaN | NaN | 1.8004 | 1.6168 | NaN | 0.58231 | 35.107 | 7.4501 | ... | 1.3111 | 25.2875 | 9.4696 | 1.9554 | 189.0450 | 42.991 | 42.6799 | 0.3723 | 1.1789 | 6.9358 |
5 rows × 41 columns
exchange_rates.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 6229 entries, 0 to 6228 Data columns (total 41 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Period\Unit: 6229 non-null object 1 [Australian dollar ] 6229 non-null object 2 [Bulgarian lev ] 5827 non-null object 3 [Brazilian real ] 5961 non-null object 4 [Canadian dollar ] 6229 non-null object 5 [Swiss franc ] 6229 non-null object 6 [Chinese yuan renminbi ] 5961 non-null object 7 [Cypriot pound ] 2346 non-null object 8 [Czech koruna ] 6229 non-null object 9 [Danish krone ] 6229 non-null object 10 [Estonian kroon ] 3130 non-null object 11 [UK pound sterling ] 6229 non-null object 12 [Greek drachma ] 520 non-null object 13 [Hong Kong dollar ] 6229 non-null object 14 [Croatian kuna ] 5941 non-null object 15 [Hungarian forint ] 6229 non-null object 16 [Indonesian rupiah ] 6229 non-null object 17 [Israeli shekel ] 5961 non-null object 18 [Indian rupee ] 5961 non-null object 19 [Iceland krona ] 3822 non-null float64 20 [Japanese yen ] 6229 non-null object 21 [Korean won ] 6229 non-null object 22 [Lithuanian litas ] 4159 non-null object 23 [Latvian lats ] 3904 non-null object 24 [Maltese lira ] 2346 non-null object 25 [Mexican peso ] 6229 non-null object 26 [Malaysian ringgit ] 6229 non-null object 27 [Norwegian krone ] 6229 non-null object 28 [New Zealand dollar ] 6229 non-null object 29 [Philippine peso ] 6229 non-null object 30 [Polish zloty ] 6229 non-null object 31 [Romanian leu ] 6167 non-null float64 32 [Russian rouble ] 5994 non-null object 33 [Swedish krona ] 6229 non-null object 34 [Singapore dollar ] 6229 non-null object 35 [Slovenian tolar ] 2085 non-null object 36 [Slovak koruna ] 2608 non-null object 37 [Thai baht ] 6229 non-null object 38 [Turkish lira ] 6167 non-null float64 39 [US dollar ] 6229 non-null object 40 [South African rand ] 6229 non-null object dtypes: float64(3), object(38) memory usage: 1.9+ MB
Before we start creating data visualizations, we'll need to do a bit of data cleaning. Our focus in this porject will be on the exchange rate between the euro and the American dollar. Below we do the following:
[US dollar]
and Period\Unit:
columns to something easier to type — US_dollar
and Time
.Time
column to a datetime
data type.Time
in ascending order.float
data type.exchange_rates.rename(columns={'[US dollar ]': 'US_dollar',
'Period\\Unit:': 'Time'},
inplace=True)
exchange_rates['Time'] = pd.to_datetime(exchange_rates['Time'])
exchange_rates.sort_values('Time', inplace=True)
exchange_rates.reset_index(drop=True, inplace=True)
exchange_rates.head()
Time | [Australian dollar ] | [Bulgarian lev ] | [Brazilian real ] | [Canadian dollar ] | [Swiss franc ] | [Chinese yuan renminbi ] | [Cypriot pound ] | [Czech koruna ] | [Danish krone ] | ... | [Romanian leu ] | [Russian rouble ] | [Swedish krona ] | [Singapore dollar ] | [Slovenian tolar ] | [Slovak koruna ] | [Thai baht ] | [Turkish lira ] | US_dollar | [South African rand ] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1999-01-04 | 1.9100 | NaN | NaN | 1.8004 | 1.6168 | NaN | 0.58231 | 35.107 | 7.4501 | ... | 1.3111 | 25.2875 | 9.4696 | 1.9554 | 189.0450 | 42.991 | 42.6799 | 0.3723 | 1.1789 | 6.9358 |
1 | 1999-01-05 | 1.8944 | NaN | NaN | 1.7965 | 1.6123 | NaN | 0.58230 | 34.917 | 7.4495 | ... | 1.3168 | 26.5876 | 9.4025 | 1.9655 | 188.7750 | 42.848 | 42.5048 | 0.3728 | 1.1790 | 6.7975 |
2 | 1999-01-06 | 1.8820 | NaN | NaN | 1.7711 | 1.6116 | NaN | 0.58200 | 34.850 | 7.4452 | ... | 1.3168 | 27.4315 | 9.3050 | 1.9699 | 188.7000 | 42.778 | 42.6949 | 0.3722 | 1.1743 | 6.7307 |
3 | 1999-01-07 | 1.8474 | NaN | NaN | 1.7602 | 1.6165 | NaN | 0.58187 | 34.886 | 7.4431 | ... | 1.3092 | 26.9876 | 9.1800 | 1.9436 | 188.8000 | 42.765 | 42.1678 | 0.3701 | 1.1632 | 6.8283 |
4 | 1999-01-08 | 1.8406 | NaN | NaN | 1.7643 | 1.6138 | NaN | 0.58187 | 34.938 | 7.4433 | ... | 1.3143 | 27.2075 | 9.1650 | 1.9537 | 188.8400 | 42.560 | 42.5590 | 0.3718 | 1.1659 | 6.7855 |
5 rows × 41 columns
In this project, we'll focus on Euro to US dollar exchange rate.
euro_to_dollar = exchange_rates[['Time', 'US_dollar']].copy()
euro_to_dollar['US_dollar'].value_counts()
- 62 1.2276 9 1.1215 8 1.1268 7 1.1305 7 .. 1.3818 1 1.3591 1 1.3405 1 1.3580 1 1.0865 1 Name: US_dollar, Length: 3718, dtype: int64
There are 62 '-' characters in the US_dollar
column, which need to be removed as invalid value.
euro_to_dollar = euro_to_dollar[euro_to_dollar['US_dollar'] != '-']
euro_to_dollar['US_dollar'] = euro_to_dollar['US_dollar'].astype(float)
Now that we're finished cleaning the data, we'll generate a line plot to visualize the evolution of the euro-dollar exhange rate.
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(euro_to_dollar['Time'], euro_to_dollar['US_dollar'])
plt.show()
If we look at the line's shape, we see many small wiggles — rather than seeing a smooth line. The wiggles, however, have meaning: : they are the visual representation of the daily variation in the exchange rate. The rate goes up and down, up and down again, day to day. The rate only shows clear upward or downward trends in the longer run (months or years).
Depending on our goals, we may not want to show that daily variation on our graph. If we want to hide it and show only the long-term trends, we can use the rolling mean (also known as the moving average). Here we apply different moving windows for the visualization.
plt.figure(figsize=(9, 6))
plt.subplot(3,2,1)
plt.plot(euro_to_dollar['Time'], euro_to_dollar['US_dollar'])
plt.title('Roling Window = 0 Days', weight='bold')
windows = [7, 30, 50, 100, 365]
for i in range(5):
plt.subplot(3,2,i+2)
plt.plot(euro_to_dollar['Time'],
euro_to_dollar['US_dollar'].rolling(windows[i]).mean())
plt.title('Rolling Window = ' + str(windows[i]) + ' Days', weight='bold')
plt.tight_layout() # Auto-adjusts the padding between subplots
plt.show()
Let's choose the rolling window of 30 days for the following analysis.
euro_to_dollar['rolling_mean'] = euro_to_dollar['US_dollar'].rolling(30).mean()
euro_to_dollar
Time | US_dollar | rolling_mean | |
---|---|---|---|
0 | 1999-01-04 | 1.1789 | NaN |
1 | 1999-01-05 | 1.1790 | NaN |
2 | 1999-01-06 | 1.1743 | NaN |
3 | 1999-01-07 | 1.1632 | NaN |
4 | 1999-01-08 | 1.1659 | NaN |
... | ... | ... | ... |
6224 | 2023-01-23 | 1.0871 | 1.067830 |
6225 | 2023-01-24 | 1.0858 | 1.068817 |
6226 | 2023-01-25 | 1.0878 | 1.069927 |
6227 | 2023-01-26 | 1.0895 | 1.070747 |
6228 | 2023-01-27 | 1.0865 | 1.071560 |
6167 rows × 3 columns
To create a story, remember that we need to arrange our data into a series of events that show change. Here are two story ideas for our data:
financial_crisis = euro_to_dollar.copy(
)[(euro_to_dollar['Time'].dt.year >= 2006
) & (euro_to_dollar['Time'].dt.year <= 2009)]
financial_crisis_7_8 = euro_to_dollar.copy(
)[(euro_to_dollar.Time.dt.year >= 2007
) & (euro_to_dollar.Time.dt.year <= 2008)]
### Adding the FiveThirtyEight style
import matplotlib.style as style
import warnings
warnings.filterwarnings("ignore")
style.use('fivethirtyeight')
### Adding the plot
fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(financial_crisis['Time'],
financial_crisis['rolling_mean'],
linewidth=1, color='#A6D785')
### Highlighting the 2007-2008 period
ax.plot(financial_crisis_7_8['Time'],
financial_crisis_7_8['rolling_mean'],
linewidth=3, color='#e23d28')
### Highlighting the peak of the crisis
ax.axvspan(xmin=13960, xmax=14140, ymin=0.09,
alpha=0.3, color='grey')
### Adding separate tick labels
ax.set_xticklabels([])
ax.set_yticklabels([])
x = 13110
for year in ['2006', '2007', '2008', '2009', '2010']:
ax.text(x, 1.13, year, alpha=0.5, fontsize=11)
x += 365
y = 1.193
for rate in ['1.2', '1.3', '1.4', '1.5']:
ax.text(13010, y, rate, alpha=0.5, fontsize=11)
y += 0.1
### Adding a title and a subtitle
ax.text(13010, 1.67, "Euro-USD rate peaked at 1.59 during 2007-2008's finacial crisis",
weight='bold')
ax.text(13010, 1.63, 'Eoro-USD exchange rates between 2006 and 2010',
size=12)
### Adding a signature
ax.text(13010, 1.07, 'Clark' + ' '*108 + 'Source: European Central Bank',
color='#f0f0f0', backgroundcolor='#4d4d4d', size=10)
### Add some Transparency to the grid
ax.grid(alpha=0.5)
plt.show()
bush_obama_trump = euro_to_dollar.copy(
)[(euro_to_dollar['Time'].dt.year >= 2001) & (euro_to_dollar['Time'].dt.year < 2021)]
bush = bush_obama_trump.copy(
)[bush_obama_trump['Time'].dt.year < 2009]
obama = bush_obama_trump.copy(
)[(bush_obama_trump['Time'].dt.year >= 2009) & (bush_obama_trump['Time'].dt.year < 2017)]
trump = bush_obama_trump.copy(
)[(bush_obama_trump['Time'].dt.year >= 2017) & (bush_obama_trump['Time'].dt.year < 2021)]
Below, you'll notice we used matplotlib's functional approach to build the graphs. We use this approach because it offers more flexibility in arranging the subplots:
### Adding the FiveThirtyEight style
style.use('fivethirtyeight')
### Adding the subplots
plt.figure(figsize=(12, 6))
ax1 = plt.subplot(2,3,1)
ax2 = plt.subplot(2,3,2)
ax3 = plt.subplot(2,3,3)
ax4 = plt.subplot(2,1,2)
axes = [ax1, ax2, ax3, ax4]
### Changes to all the subplots
for ax in axes:
ax.set_ylim(0.8, 1.7)
ax.set_yticks([1.0, 1.2, 1.4, 1.6])
ax.set_yticklabels(['1.0', '1.2','1.4', '1.6'],
alpha=0.3)
ax.grid(alpha=0.5)
### Ax1: Bush
ax1.plot(bush['Time'], bush['rolling_mean'],
color='#BF5FFF')
ax1.set_xticklabels(['', '2001', '', '2003', '', '2005', '',
'2007', '', '2009'],
alpha=0.3)
ax1.text(12210.0, 1.92, 'BUSH', fontsize=18, weight='bold',
color='#BF5FFF')
ax1.text(11910.0, 1.8, '(2001-2009)', weight='bold',
alpha=0.3)
### Ax2: Obama
ax2.plot(obama['Time'], obama['rolling_mean'],
color='#ffa500')
ax2.set_xticklabels(['', '2009', '', '2011', '', '2013', '',
'2015', '', '2017'],
alpha=0.3)
ax2.text(15050.0, 1.92, 'OBAMA', fontsize=18, weight='bold',
color='#ffa500')
ax2.text(14920.0, 1.8, '(2009-2017)', weight='bold',
alpha=0.3)
### Ax3: Trump
ax3.plot(trump['Time'], trump['rolling_mean'],
color='#00B2EE')
ax3.set_xticklabels(['2017', '', '2018', '', '2019', '',
'2020', '', '2021'],
alpha=0.3)
ax3.text(17700.0, 1.92, 'TRUMP', fontsize=18, weight='bold',
color='#00B2EE')
ax3.text(17620.0, 1.8, '(2017-2021)', weight='bold',
alpha=0.3)
### Ax4: Bush-Obama-Trump
ax4.plot(bush['Time'], bush['rolling_mean'],
color='#BF5FFF')
ax4.plot(obama['Time'], obama['rolling_mean'],
color='#ffa500')
ax4.plot(trump['Time'], trump['rolling_mean'],
color='#00B2EE')
ax4.grid(alpha=0.5)
ax4.set_xticks([])
### Adding a title and a subtitle
ax1.text(10800.0, 2.35, 'EURO-USD rate averaged 1.22 under the last three US presidents',
fontsize=20, weight='bold')
ax1.text(10800.0, 2.14, '''EURO-USD exchange rates under George W. Bush (2001 - 2009), Barack Obama (2009-2017),
and Donald Trump (2017-2021)''',
fontsize=16)
### Adding a signature
ax4.text(10800.0, 0.65, 'Clark' + ' '*103 + 'Source: European Central Bank',
color = '#f0f0f0', backgroundcolor = '#4d4d4d',
size=14)
plt.show()