Answering Business Questions Using SQL¶

Introduction¶

In this project, we're going to practice using our SQL skills to answer business questions.

diagram

In [1]:
%%capture
%load_ext sql
%sql sqlite:///chinook.db
In [2]:
%%sql
SELECT name,type
FROM sqlite_master
WHERE type IN ('table','view');
 * sqlite:///chinook.db
Done.
Out[2]:
name type
album table
artist table
customer table
employee table
genre table
invoice table
invoice_line table
media_type table
playlist table
playlist_track table
track table

Selecting Albums to Purchase¶

The Chinook record store has just signed a deal with a new record label, and we've been tasked with selecting the first three albums that will be added to the store, from a list of four. All four albums are by artists that don't have any tracks in the store right now - we have the artist names, and the genre of music they produce:

Artist Name Genre
Regal Hip-Hop
Red Tone Punk
Meteor and the Girls Pop
Slim Jim Bites Blues

The record label specializes in artists from the USA, and they have given Chinook some money to advertise the new albums in the USA, so we're interested in finding out which genres sell the best in the USA.

We'll need to write a query to find out which genres sell the most tracks in the USA, write up a summary of our findings, and make a recommendation for the three artists whose albums we should purchase for the store.

In [3]:
%%sql
WITH sales AS
    (SELECT il.quantity, g.name, c.country
      FROM invoice_line il
      LEFT JOIN track t ON il.track_id = t.track_id
      LEFT JOIN genre g ON t.genre_id = g.genre_id 
      LEFT JOIN invoice i ON i.invoice_id = il.invoice_id
      LEFT JOIN customer c ON c.customer_id = i.customer_id
      WHERE country ='USA')
SELECT name, 
       SUM(quantity) quantity, 
       CAST(SUM(quantity) AS float)/(SELECT SUM(quantity) FROM sales) percentage
FROM sales
GROUP BY name
ORDER BY quantity DESC
LIMIT 5;
 * sqlite:///chinook.db
Done.
Out[3]:
name quantity percentage
Rock 561 0.5337773549000951
Alternative & Punk 130 0.12369172216936251
Metal 124 0.11798287345385347
R&B/Soul 53 0.05042816365366318
Blues 36 0.03425309229305423

Based on the sales of tracks across different genres in the USA, we should purchase the new albums by the following artists:

  • Red Tone (Punk)
  • Slim Jim Bites (Blues)
  • Meteor and the Girls (Pop)

It's worth keeping in mind that combined, these three genres only make up only 17% of total sales, so we should be on the lookout for artists and albums from the rock genre, which accounts for 53% of sales.

Analyzing Employee Sales Performance¶

Each customer for the Chinook store gets assigned to a sales support agent within the company when they first make a purchase. We have been asked to analyze the purchases of customers belonging to each employee to see if any sales support agent is performing either better or worse than the others.

We might consider whether any extra columns from the employee table explain any variance, or whether the variance might instead be indicative of employee performance.

In [4]:
%%sql
SELECT employee_id, first_name, last_name, hire_date, SUM(total) total
FROM (SELECT *
      FROM invoice i
      LEFT JOIN customer c ON i.customer_id=c.customer_id
      LEFT JOIN employee e ON c.support_rep_id=e.employee_id
      )
GROUP BY employee_id
ORDER BY total DESC;
 * sqlite:///chinook.db
Done.
Out[4]:
employee_id first_name last_name hire_date total
3 Michelle Brooks 2017-04-01 00:00:00 1731.5100000000039
4 Dominique Lefebvre 2017-05-03 00:00:00 1584.0000000000034
5 Martha Silk 2017-10-17 00:00:00 1393.920000000002

While there is a 20% difference in sales between Jane (the top employee) and Steve (the bottom employee), the difference roughly corresponds to the differences in their hiring dates.

Analyzing Sales by Country¶

Our next task is to analyze the sales data for customers from each different country. We have used the country value from the customers table, and ignored the country from the billing address in the invoice table.

In particular, we have been directed to calculate data, for each country, on the:

  • total number of customers
  • total value of sales
  • average value of sales per customer
  • average order value

Because there are a number of countries with only one customer, you should group these customers as "Other" in your analysis.

In [5]:
%%sql
WITH country_other AS
    (SELECT CASE
              WHEN (
                SELECT COUNT(*)
                FROM customer
                WHERE country=c.country
                ) = 1 THEN 'Other'
              ELSE c.country
            END AS country,
            c.customer_id,
            i.total,
            i.invoice_id
    FROM customer c
    INNER JOIN invoice i ON c.customer_id=i.customer_id
    )
SELECT
    country,
    COUNT(DISTINCT customer_id) total_customer,
    SUM(total) total_value,
    SUM(total)/COUNT(DISTINCT customer_id) avg_value_customer,
    SUM(total)/COUNT(DISTINCT invoice_id) avg_value_invoice,
    CASE
        WHEN country = 'Other' THEN 1
        ELSE 0
    END AS sort
FROM country_other
GROUP BY country
ORDER BY sort ASC, total_value DESC
 * sqlite:///chinook.db
Done.
Out[5]:
country total_customer total_value avg_value_customer avg_value_invoice sort
USA 13 1040.4899999999998 80.0376923076923 7.942671755725189 0
Canada 8 535.5900000000001 66.94875000000002 7.047236842105265 0
Brazil 5 427.68000000000006 85.53600000000002 7.0111475409836075 0
France 5 389.0699999999999 77.81399999999998 7.781399999999998 0
Germany 4 334.62 83.655 8.161463414634147 0
Czech Republic 2 273.24000000000007 136.62000000000003 9.108000000000002 0
United Kingdom 3 245.52 81.84 8.768571428571429 0
Portugal 2 185.13000000000002 92.56500000000001 6.383793103448276 0
India 2 183.14999999999998 91.57499999999999 8.72142857142857 0
Other 15 1094.9400000000003 72.99600000000002 7.44857142857143 1

Based on the data, there may be opportunity in the following countries:

  • USA contributes the most sales from the highest number of the customers, it is always the biggest market to prioritize the resources in order to increase the sales.
  • Canada stands the second place for the sales with the second most of the customers. However the average customer purchasing value is the lowest. It is a good potential improve the sales by encourage individual customer to buy more.
  • Czech Republic gets the highest average customer purchase. This a great potential market to explore, because they love our products. We need to focus on get more people to know our business.

It's worth remembering because the amount of data from each of these countries is relatively low. As such, we should be cautious about spending too much money on new marketing campaigns because the sample size isn't large enough to give us high confidence. A better approach would be to run small campaigns in these countries, collecting and analyzing the new customers to make sure that these trends hold with new customers.

Albums vs. Individual Tracks¶

The Chinook store is setup in a way that allows customer to make purchases in one of the two ways:

  • purchase a whole album
  • purchase a collection of one or more individual tracks.

The store does not let customers purchase a whole album, and then add individual tracks to that same purchase (unless they do that by choosing each track manually). When customers purchase albums they are charged the same price as if they had purchased each of those tracks separately.

Management are currently considering changing their purchasing strategy to save money. The strategy they are considering is to purchase only the most popular tracks from each album from record companies, instead of purchasing every track from an album.

We have been asked to find out what percentage of purchases are individual tracks vs whole albums, so that management can use this data to understand the effect this decision might have on overall revenue.

In [6]:
%%sql
WITH buytrack AS (
    SELECT il.invoice_id invoice_id,
          CASE
            WHEN (
              SELECT track_id 
              FROM invoice_line
              WHERE invoice_id = il.invoice_id
              EXCEPT
              SELECT track_id
              FROM  track 
              WHERE album_id = t.album_id
            ) IS NULL
            AND (
              SELECT track_id
              FROM  track 
              WHERE album_id = t.album_id
              EXCEPT
              SELECT track_id 
              FROM invoice_line
              WHERE invoice_id = il.invoice_id
            ) IS NULL
              THEN "No"
            ELSE "YES"
            END AS if_track
    FROM invoice_line il
    LEFT JOIN track t ON il.track_id = t.track_id
              
    )
  SELECT  if_track,
          COUNT(DISTINCT invoice_id) number_of_invoice,
          COUNT(DISTINCT invoice_id)
          /CAST(
            (SELECT COUNT(invoice_id) FROM invoice) 
            AS FLOAT
            ) percentage
  FROM buytrack
  GROUP BY if_track
 * sqlite:///chinook.db
Done.
Out[6]:
if_track number_of_invoice percentage
No 114 0.18566775244299674
YES 500 0.8143322475570033

There are more than 80% of the purchase is by track from the sales report. It is good idea to change the strategy to purchase only the most popular tracks from each album from record companies, instead of purchasing every track from an album.