%%capture
%load_ext sql
%sql sqlite:///chinook.db
%%sql
SELECT name,type
FROM sqlite_master
WHERE type IN ('table','view');
* sqlite:///chinook.db Done.
| name | type | 
|---|---|
| album | table | 
| artist | table | 
| customer | table | 
| employee | table | 
| genre | table | 
| invoice | table | 
| invoice_line | table | 
| media_type | table | 
| playlist | table | 
| playlist_track | table | 
| track | table | 
The Chinook record store has just signed a deal with a new record label, and we've been tasked with selecting the first three albums that will be added to the store, from a list of four. All four albums are by artists that don't have any tracks in the store right now - we have the artist names, and the genre of music they produce:
| Artist Name | Genre | 
|---|---|
| Regal | Hip-Hop | 
| Red Tone | Punk | 
| Meteor and the Girls | Pop | 
| Slim Jim Bites | Blues | 
The record label specializes in artists from the USA, and they have given Chinook some money to advertise the new albums in the USA, so we're interested in finding out which genres sell the best in the USA.
We'll need to write a query to find out which genres sell the most tracks in the USA, write up a summary of our findings, and make a recommendation for the three artists whose albums we should purchase for the store.
%%sql
WITH sales AS
    (SELECT il.quantity, g.name, c.country
      FROM invoice_line il
      LEFT JOIN track t ON il.track_id = t.track_id
      LEFT JOIN genre g ON t.genre_id = g.genre_id 
      LEFT JOIN invoice i ON i.invoice_id = il.invoice_id
      LEFT JOIN customer c ON c.customer_id = i.customer_id
      WHERE country ='USA')
SELECT name, 
       SUM(quantity) quantity, 
       CAST(SUM(quantity) AS float)/(SELECT SUM(quantity) FROM sales) percentage
FROM sales
GROUP BY name
ORDER BY quantity DESC
LIMIT 5;
* sqlite:///chinook.db Done.
| name | quantity | percentage | 
|---|---|---|
| Rock | 561 | 0.5337773549000951 | 
| Alternative & Punk | 130 | 0.12369172216936251 | 
| Metal | 124 | 0.11798287345385347 | 
| R&B/Soul | 53 | 0.05042816365366318 | 
| Blues | 36 | 0.03425309229305423 | 
Based on the sales of tracks across different genres in the USA, we should purchase the new albums by the following artists:
It's worth keeping in mind that combined, these three genres only make up only 17% of total sales, so we should be on the lookout for artists and albums from the rock genre, which accounts for 53% of sales.
Each customer for the Chinook store gets assigned to a sales support agent within the company when they first make a purchase. We have been asked to analyze the purchases of customers belonging to each employee to see if any sales support agent is performing either better or worse than the others.
We might consider whether any extra columns from the employee table explain any variance, or whether the variance might instead be indicative of employee performance.
%%sql
SELECT employee_id, first_name, last_name, hire_date, SUM(total) total
FROM (SELECT *
      FROM invoice i
      LEFT JOIN customer c ON i.customer_id=c.customer_id
      LEFT JOIN employee e ON c.support_rep_id=e.employee_id
      )
GROUP BY employee_id
ORDER BY total DESC;
* sqlite:///chinook.db Done.
| employee_id | first_name | last_name | hire_date | total | 
|---|---|---|---|---|
| 3 | Michelle | Brooks | 2017-04-01 00:00:00 | 1731.5100000000039 | 
| 4 | Dominique | Lefebvre | 2017-05-03 00:00:00 | 1584.0000000000034 | 
| 5 | Martha | Silk | 2017-10-17 00:00:00 | 1393.920000000002 | 
While there is a 20% difference in sales between Jane (the top employee) and Steve (the bottom employee), the difference roughly corresponds to the differences in their hiring dates.
Our next task is to analyze the sales data for customers from each different country. We have used the country value from the customers table, and ignored the country from the billing address in the invoice table.
In particular, we have been directed to calculate data, for each country, on the:
Because there are a number of countries with only one customer, you should group these customers as "Other" in your analysis.
%%sql
WITH country_other AS
    (SELECT CASE
              WHEN (
                SELECT COUNT(*)
                FROM customer
                WHERE country=c.country
                ) = 1 THEN 'Other'
              ELSE c.country
            END AS country,
            c.customer_id,
            i.total,
            i.invoice_id
    FROM customer c
    INNER JOIN invoice i ON c.customer_id=i.customer_id
    )
SELECT
    country,
    COUNT(DISTINCT customer_id) total_customer,
    SUM(total) total_value,
    SUM(total)/COUNT(DISTINCT customer_id) avg_value_customer,
    SUM(total)/COUNT(DISTINCT invoice_id) avg_value_invoice,
    CASE
        WHEN country = 'Other' THEN 1
        ELSE 0
    END AS sort
FROM country_other
GROUP BY country
ORDER BY sort ASC, total_value DESC
* sqlite:///chinook.db Done.
| country | total_customer | total_value | avg_value_customer | avg_value_invoice | sort | 
|---|---|---|---|---|---|
| USA | 13 | 1040.4899999999998 | 80.0376923076923 | 7.942671755725189 | 0 | 
| Canada | 8 | 535.5900000000001 | 66.94875000000002 | 7.047236842105265 | 0 | 
| Brazil | 5 | 427.68000000000006 | 85.53600000000002 | 7.0111475409836075 | 0 | 
| France | 5 | 389.0699999999999 | 77.81399999999998 | 7.781399999999998 | 0 | 
| Germany | 4 | 334.62 | 83.655 | 8.161463414634147 | 0 | 
| Czech Republic | 2 | 273.24000000000007 | 136.62000000000003 | 9.108000000000002 | 0 | 
| United Kingdom | 3 | 245.52 | 81.84 | 8.768571428571429 | 0 | 
| Portugal | 2 | 185.13000000000002 | 92.56500000000001 | 6.383793103448276 | 0 | 
| India | 2 | 183.14999999999998 | 91.57499999999999 | 8.72142857142857 | 0 | 
| Other | 15 | 1094.9400000000003 | 72.99600000000002 | 7.44857142857143 | 1 | 
Based on the data, there may be opportunity in the following countries:
It's worth remembering because the amount of data from each of these countries is relatively low. As such, we should be cautious about spending too much money on new marketing campaigns because the sample size isn't large enough to give us high confidence. A better approach would be to run small campaigns in these countries, collecting and analyzing the new customers to make sure that these trends hold with new customers.
The Chinook store is setup in a way that allows customer to make purchases in one of the two ways:
The store does not let customers purchase a whole album, and then add individual tracks to that same purchase (unless they do that by choosing each track manually). When customers purchase albums they are charged the same price as if they had purchased each of those tracks separately.
Management are currently considering changing their purchasing strategy to save money. The strategy they are considering is to purchase only the most popular tracks from each album from record companies, instead of purchasing every track from an album.
We have been asked to find out what percentage of purchases are individual tracks vs whole albums, so that management can use this data to understand the effect this decision might have on overall revenue.
%%sql
WITH buytrack AS (
    SELECT il.invoice_id invoice_id,
          CASE
            WHEN (
              SELECT track_id 
              FROM invoice_line
              WHERE invoice_id = il.invoice_id
              EXCEPT
              SELECT track_id
              FROM  track 
              WHERE album_id = t.album_id
            ) IS NULL
            AND (
              SELECT track_id
              FROM  track 
              WHERE album_id = t.album_id
              EXCEPT
              SELECT track_id 
              FROM invoice_line
              WHERE invoice_id = il.invoice_id
            ) IS NULL
              THEN "No"
            ELSE "YES"
            END AS if_track
    FROM invoice_line il
    LEFT JOIN track t ON il.track_id = t.track_id
              
    )
  SELECT  if_track,
          COUNT(DISTINCT invoice_id) number_of_invoice,
          COUNT(DISTINCT invoice_id)
          /CAST(
            (SELECT COUNT(invoice_id) FROM invoice) 
            AS FLOAT
            ) percentage
  FROM buytrack
  GROUP BY if_track
* sqlite:///chinook.db Done.
| if_track | number_of_invoice | percentage | 
|---|---|---|
| No | 114 | 0.18566775244299674 | 
| YES | 500 | 0.8143322475570033 | 
There are more than 80% of the purchase is by track from the sales report. It is good idea to change the strategy to purchase only the most popular tracks from each album from record companies, instead of purchasing every track from an album.