In this project, we'll work with data from the CIA World Factbook, a compendium of statistics about all of the countries on Earth. The Factbook contains demographic information like the following:
population
— the global population.population_growth
— the annual population growth rate, as a percentage.area
— the total land and water area.If you are interested in this project, you can download the SQLite factbook.db database here.
%%capture
# capture the stdout/stderr output of the cell
# load the SQL module in the notebook
%load_ext sql
# connects to an SQLite database
%sql sqlite:///factbook.db
We can now begin by exploring the data.
%%sql
SELECT *
FROM sqlite_master
WHERE type='table';
* sqlite:///factbook.db Done.
type | name | tbl_name | rootpage | sql |
---|---|---|---|---|
table | sqlite_sequence | sqlite_sequence | 3 | CREATE TABLE sqlite_sequence(name,seq) |
table | facts | facts | 47 | CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "area" integer, "area_land" integer, "area_water" integer, "population" integer, "population_growth" float, "birth_rate" float, "death_rate" float, "migration_rate" float) |
%%sql
SELECT *
FROM facts
LIMIT 5;
* sqlite:///factbook.db Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
1 | af | Afghanistan | 652230 | 652230 | 0 | 32564342 | 2.32 | 38.57 | 13.89 | 1.51 |
2 | al | Albania | 28748 | 27398 | 1350 | 3029278 | 0.3 | 12.92 | 6.58 | 3.3 |
3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 |
4 | an | Andorra | 468 | 468 | 0 | 85580 | 0.12 | 8.13 | 6.96 | 0.0 |
5 | ao | Angola | 1246700 | 1246700 | 0 | 19625353 | 2.78 | 38.78 | 11.49 | 0.46 |
Here are the descriptions for some of the columns:
name
— the name of the country.area
— the total land and sea area of the country.population
— the country's population.population_growth—
the country's population growth as a percentage.birth_rate
— the country's birth rate, or the number of births a year per 1,000 people.death_rate
— the country's death rate, or the number of death a year per 1,000 people.area
— the country's total area (both land and water).area_land
— the country's land area in square kilometers.area_water
— the country's water area in square kilometers.Let's start by calculating some summary statistics and see what they tell us.
%%sql
SELECT MIN(population) AS min_pop,
MAX(population) AS max_pop,
MIN(population_growth) AS min_pop_growth,
MAX(population_growth) AS max_pop_growth
FROM facts;
* sqlite:///factbook.db Done.
min_pop | max_pop | min_pop_growth | max_pop_growth |
---|---|---|---|
0 | 7256490011 | 0.0 | 4.02 |
A few things are interesting in the summary statistics on the previous screen:
0
.7256490011
(or more than 7.2 billion people).Let's use subqueries to concentrate on these countries without using the specific values.
%%sql
SELECT *
FROM facts
WHERE population == (SELECT MIN(population) FROM facts);
* sqlite:///factbook.db Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
250 | ay | Antarctica | None | 280000 | None | 0 | None | None | None | None |
It seems like the table contains a row for Antarctica, which explains the population of 0. This seems to match the CIA Factbook page for Antarctica:
%%sql
SELECT *
FROM facts
WHERE population == (SELECT MAX(population) FROM facts);
* sqlite:///factbook.db Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
261 | xx | World | None | None | None | 7256490011 | 1.08 | 18.6 | 7.8 | None |
We can see that the table contains a row for the whole world, which explains the maximum population of over 7.2 billion we found earlier.
Now that we know this, we should recalculate the summary statistics we calculated earlier, while excluding the row for the whole world.
%%sql
SELECT MIN(population) AS min_pop,
MAX(population) AS max_pop,
MIN(population_growth) AS min_pop_growth,
MAX(population_growth) AS max_pop_growth
FROM facts
WHERE name != 'World';
* sqlite:///factbook.db Done.
min_pop | max_pop | min_pop_growth | max_pop_growth |
---|---|---|---|
0 | 1367485388 | 0.0 | 4.02 |
There's a country whose population closes in on 1.4 billion!
Let's explore density. Density depends on the population and the country's area. Let's look at the average values for these two columns.
We should discard the row for the whole planet.
%%sql
SELECT AVG(population) AS avg_population, AVG(area) AS ave_area
FROM facts
WHERE name != 'World'
* sqlite:///factbook.db Done.
avg_population | ave_area |
---|---|
32242666.56846473 | 555093.546184739 |
We see that the average population is around 32 million and the average area is 555 thousand square kilometers.
T to finish, we'll build on the query above to find countries that are densely populated. We'll identify countries that have the following:
%%sql
SELECT *
FROM facts
WHERE population > (SELECT AVG(population) FROM facts WHERE name!='World')
AND area < (SELECT AVG(area) FROM facts WHERE name!='World');
* sqlite:///factbook.db Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
14 | bg | Bangladesh | 148460 | 130170 | 18290 | 168957745 | 1.6 | 21.14 | 5.61 | 0.46 |
65 | gm | Germany | 357022 | 348672 | 8350 | 80854408 | 0.17 | 8.47 | 11.42 | 1.24 |
80 | iz | Iraq | 438317 | 437367 | 950 | 37056169 | 2.93 | 31.45 | 3.77 | 1.62 |
83 | it | Italy | 301340 | 294140 | 7200 | 61855120 | 0.27 | 8.74 | 10.19 | 4.1 |
85 | ja | Japan | 377915 | 364485 | 13430 | 126919659 | 0.16 | 7.93 | 9.51 | 0.0 |
91 | ks | Korea, South | 99720 | 96920 | 2800 | 49115196 | 0.14 | 8.19 | 6.75 | 0.0 |
120 | mo | Morocco | 446550 | 446300 | 250 | 33322699 | 1.0 | 18.2 | 4.81 | 3.36 |
138 | rp | Philippines | 300000 | 298170 | 1830 | 100998376 | 1.61 | 24.27 | 6.11 | 2.09 |
139 | pl | Poland | 312685 | 304255 | 8430 | 38562189 | 0.09 | 9.74 | 10.19 | 0.46 |
163 | sp | Spain | 505370 | 498980 | 6390 | 48146134 | 0.89 | 9.64 | 9.04 | 8.31 |
173 | th | Thailand | 513120 | 510890 | 2230 | 67976405 | 0.34 | 11.19 | 7.8 | 0.0 |
182 | ug | Uganda | 241038 | 197100 | 43938 | 37101745 | 3.24 | 43.79 | 10.69 | 0.74 |
185 | uk | United Kingdom | 243610 | 241930 | 1680 | 64088222 | 0.54 | 12.17 | 9.35 | 2.54 |
192 | vm | Vietnam | 331210 | 310070 | 21140 | 94348835 | 0.97 | 15.96 | 5.93 | 0.3 |
Some of these countries are generally known to be densely populated, so we have confidence in our results!