%%capture
%load_ext sql
%sql sqlite:///chinook.db
%%sql
SELECT name,type
FROM sqlite_master
WHERE type IN ('table','view');
* sqlite:///chinook.db Done.
name | type |
---|---|
album | table |
artist | table |
customer | table |
employee | table |
genre | table |
invoice | table |
invoice_line | table |
media_type | table |
playlist | table |
playlist_track | table |
track | table |
The Chinook record store has just signed a deal with a new record label, and we've been tasked with selecting the first three albums that will be added to the store, from a list of four. All four albums are by artists that don't have any tracks in the store right now - we have the artist names, and the genre of music they produce:
Artist Name | Genre |
---|---|
Regal | Hip-Hop |
Red Tone | Punk |
Meteor and the Girls | Pop |
Slim Jim Bites | Blues |
The record label specializes in artists from the USA, and they have given Chinook some money to advertise the new albums in the USA, so we're interested in finding out which genres sell the best in the USA.
We'll need to write a query to find out which genres sell the most tracks in the USA, write up a summary of our findings, and make a recommendation for the three artists whose albums we should purchase for the store.
%%sql
WITH sales AS
(SELECT il.quantity, g.name, c.country
FROM invoice_line il
LEFT JOIN track t ON il.track_id = t.track_id
LEFT JOIN genre g ON t.genre_id = g.genre_id
LEFT JOIN invoice i ON i.invoice_id = il.invoice_id
LEFT JOIN customer c ON c.customer_id = i.customer_id
WHERE country ='USA')
SELECT name,
SUM(quantity) quantity,
CAST(SUM(quantity) AS float)/(SELECT SUM(quantity) FROM sales) percentage
FROM sales
GROUP BY name
ORDER BY quantity DESC
LIMIT 5;
* sqlite:///chinook.db Done.
name | quantity | percentage |
---|---|---|
Rock | 561 | 0.5337773549000951 |
Alternative & Punk | 130 | 0.12369172216936251 |
Metal | 124 | 0.11798287345385347 |
R&B/Soul | 53 | 0.05042816365366318 |
Blues | 36 | 0.03425309229305423 |
Based on the sales of tracks across different genres in the USA, we should purchase the new albums by the following artists:
It's worth keeping in mind that combined, these three genres only make up only 17% of total sales, so we should be on the lookout for artists and albums from the rock genre, which accounts for 53% of sales.
Each customer for the Chinook store gets assigned to a sales support agent within the company when they first make a purchase. We have been asked to analyze the purchases of customers belonging to each employee to see if any sales support agent is performing either better or worse than the others.
We might consider whether any extra columns from the employee table explain any variance, or whether the variance might instead be indicative of employee performance.
%%sql
SELECT employee_id, first_name, last_name, hire_date, SUM(total) total
FROM (SELECT *
FROM invoice i
LEFT JOIN customer c ON i.customer_id=c.customer_id
LEFT JOIN employee e ON c.support_rep_id=e.employee_id
)
GROUP BY employee_id
ORDER BY total DESC;
* sqlite:///chinook.db Done.
employee_id | first_name | last_name | hire_date | total |
---|---|---|---|---|
3 | Michelle | Brooks | 2017-04-01 00:00:00 | 1731.5100000000039 |
4 | Dominique | Lefebvre | 2017-05-03 00:00:00 | 1584.0000000000034 |
5 | Martha | Silk | 2017-10-17 00:00:00 | 1393.920000000002 |
While there is a 20% difference in sales between Jane (the top employee) and Steve (the bottom employee), the difference roughly corresponds to the differences in their hiring dates.
Our next task is to analyze the sales data for customers from each different country. We have used the country value from the customers table, and ignored the country from the billing address in the invoice table.
In particular, we have been directed to calculate data, for each country, on the:
Because there are a number of countries with only one customer, you should group these customers as "Other" in your analysis.
%%sql
WITH country_other AS
(SELECT CASE
WHEN (
SELECT COUNT(*)
FROM customer
WHERE country=c.country
) = 1 THEN 'Other'
ELSE c.country
END AS country,
c.customer_id,
i.total,
i.invoice_id
FROM customer c
INNER JOIN invoice i ON c.customer_id=i.customer_id
)
SELECT
country,
COUNT(DISTINCT customer_id) total_customer,
SUM(total) total_value,
SUM(total)/COUNT(DISTINCT customer_id) avg_value_customer,
SUM(total)/COUNT(DISTINCT invoice_id) avg_value_invoice,
CASE
WHEN country = 'Other' THEN 1
ELSE 0
END AS sort
FROM country_other
GROUP BY country
ORDER BY sort ASC, total_value DESC
* sqlite:///chinook.db Done.
country | total_customer | total_value | avg_value_customer | avg_value_invoice | sort |
---|---|---|---|---|---|
USA | 13 | 1040.4899999999998 | 80.0376923076923 | 7.942671755725189 | 0 |
Canada | 8 | 535.5900000000001 | 66.94875000000002 | 7.047236842105265 | 0 |
Brazil | 5 | 427.68000000000006 | 85.53600000000002 | 7.0111475409836075 | 0 |
France | 5 | 389.0699999999999 | 77.81399999999998 | 7.781399999999998 | 0 |
Germany | 4 | 334.62 | 83.655 | 8.161463414634147 | 0 |
Czech Republic | 2 | 273.24000000000007 | 136.62000000000003 | 9.108000000000002 | 0 |
United Kingdom | 3 | 245.52 | 81.84 | 8.768571428571429 | 0 |
Portugal | 2 | 185.13000000000002 | 92.56500000000001 | 6.383793103448276 | 0 |
India | 2 | 183.14999999999998 | 91.57499999999999 | 8.72142857142857 | 0 |
Other | 15 | 1094.9400000000003 | 72.99600000000002 | 7.44857142857143 | 1 |
Based on the data, there may be opportunity in the following countries:
It's worth remembering because the amount of data from each of these countries is relatively low. As such, we should be cautious about spending too much money on new marketing campaigns because the sample size isn't large enough to give us high confidence. A better approach would be to run small campaigns in these countries, collecting and analyzing the new customers to make sure that these trends hold with new customers.
The Chinook store is setup in a way that allows customer to make purchases in one of the two ways:
The store does not let customers purchase a whole album, and then add individual tracks to that same purchase (unless they do that by choosing each track manually). When customers purchase albums they are charged the same price as if they had purchased each of those tracks separately.
Management are currently considering changing their purchasing strategy to save money. The strategy they are considering is to purchase only the most popular tracks from each album from record companies, instead of purchasing every track from an album.
We have been asked to find out what percentage of purchases are individual tracks vs whole albums, so that management can use this data to understand the effect this decision might have on overall revenue.
%%sql
WITH buytrack AS (
SELECT il.invoice_id invoice_id,
CASE
WHEN (
SELECT track_id
FROM invoice_line
WHERE invoice_id = il.invoice_id
EXCEPT
SELECT track_id
FROM track
WHERE album_id = t.album_id
) IS NULL
AND (
SELECT track_id
FROM track
WHERE album_id = t.album_id
EXCEPT
SELECT track_id
FROM invoice_line
WHERE invoice_id = il.invoice_id
) IS NULL
THEN "No"
ELSE "YES"
END AS if_track
FROM invoice_line il
LEFT JOIN track t ON il.track_id = t.track_id
)
SELECT if_track,
COUNT(DISTINCT invoice_id) number_of_invoice,
COUNT(DISTINCT invoice_id)
/CAST(
(SELECT COUNT(invoice_id) FROM invoice)
AS FLOAT
) percentage
FROM buytrack
GROUP BY if_track
* sqlite:///chinook.db Done.
if_track | number_of_invoice | percentage |
---|---|---|
No | 114 | 0.18566775244299674 |
YES | 500 | 0.8143322475570033 |
There are more than 80% of the purchase is by track from the sales report. It is good idea to change the strategy to purchase only the most popular tracks from each album from record companies, instead of purchasing every track from an album.