Developing A Mobile App For Alleviating Lottery Addiction

This RMarkdown file is intended to lay out the logic of a mobile app designed for those addicted to the lottery. By showing a user how to calculate the incredibly small probabilities of winning the lottery, we hope that the app will help them better grasp that buying multiple lottery tickets will do little to help them win. Through this understanding, they will hopefully stop purchasing lottery tickets in an unhealthy manner.

Core Functions

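# Note: this definition masks base R's factorial().
factorial <- function(n) {
  product <- 1
  # seq_len(n) is empty when n is 0, so factorial(0) correctly returns 1
  for (i in seq_len(n)) {
    product <- product * i
  }
  return(product)
}
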
combinations <- function(n, k) {
  numerator <- factorial(n)
  denominator <- factorial(k) * factorial(n - k)
  return(numerator / denominator)
}

permutations <- function(n, k) {
  numerator <- factorial(n)
  denominator <- factorial(n - k)
  return(numerator / denominator)
}
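
As a sanity check, hand-written helpers like these can be compared against base R's choose(). The sketch below is self-contained, so it redefines the helpers under hypothetical names (my_factorial, my_combinations) that are not part of the original file and do not mask anything in base R.

```r
# Hypothetical stand-ins for the helpers above, renamed to avoid masking base R
my_factorial <- function(n) {
  product <- 1
  for (i in seq_len(n)) product <- product * i
  product
}

my_combinations <- function(n, k) {
  my_factorial(n) / (my_factorial(k) * my_factorial(n - k))
}

# all.equal() tolerates the floating-point rounding that large factorials introduce
all.equal(my_combinations(49, 6), choose(49, 6))

# Number of possible 6/49 tickets
choose(49, 6)
```

choose(49, 6) evaluates to 13983816, the number of outcomes used throughout the rest of this file.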

One-Ticket Probability

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. A player wins the big prize if the six numbers on their ticket match all six numbers drawn. If a player has a ticket with the numbers {13, 22, 24, 27, 42, 44}, they only win the big prize if the numbers drawn are {13, 22, 24, 27, 42, 44}. Even if just one number differs, they won’t win.

For the first version of the app, we want players to be able to calculate the probability of winning the big prize with the various numbers they play on a single ticket (for each ticket a player chooses six numbers out of 49). So, we’ll start by building a function that calculates the probability of winning the big prize for any given ticket.

We discussed with the engineering team of the medical institute, and they told us we need to be aware of the following details when we write the function:

Inside the app, the user inputs six different numbers from 1 to 49. Under the hood, the six numbers will come as an R vector, which will serve as the single input to our function. The engineering team wants the function to print the probability value in a friendly way — in a way that people without any probability training are able to understand.

one_ticket_probability <- function(nums) {
  
  # Total number of possible outcomes
  total_combinations <- combinations(49, 6)
  
  # One ticket holds exactly one combination, so there is a single successful
  # outcome out of total_combinations; express the probability as a percentage.
  prob <- (1 / total_combinations) * 100
  
  # Format the probability so it is easy to read. sprintf() returns a character
  # vector; in "%1.9f", 1 is the minimum field width, 9 is the number of digits
  # after the decimal point, and f requests fixed-point notation.
  pretty_prob <- sprintf("%1.9f", prob)
  
  s <- paste("You have a ", pretty_prob, "% chance of winning the big prize.", sep = "")
  return(s)
}
one_ticket_probability(c(1, 2, 3, 4, 5, 6))
## [1] "You have a 0.000007151% chance of winning the big prize."
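
A percentage with this many leading zeros can be hard to read, so one alternative is to phrase the same probability as "1 in N" odds. This is only a sketch of another wording, not something the engineering team asked for; it uses base R's choose() and format().

```r
# Total number of possible 6/49 tickets
total_outcomes <- choose(49, 6)

# big.mark adds thousands separators; scientific = FALSE avoids 1e+07-style output
odds_text <- paste0(
  "Your odds of winning the big prize are 1 in ",
  format(total_outcomes, big.mark = ",", scientific = FALSE),
  "."
)
odds_text
```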

Historical Data Check for Canada Lottery

In the previous section, we wrote a function that tells users the probability of winning the big prize with a single ticket. For the first version of the app, however, users should also be able to compare their ticket against past winning combinations in the historical lottery data in Canada. This functionality will allow users to determine whether they would ever have won by now.

library(tidyverse)
lottery649 <- read_csv("649.csv")
print(dim(lottery649))
## [1] 3665   11
head(lottery649, 3)
## # A tibble: 3 × 11
##   PRODUCT `DRAW NUMBER` SEQUEN…¹ DRAW …² NUMBE…³ NUMBE…⁴ NUMBE…⁵ NUMBE…⁶ NUMBE…⁷
##     <dbl>         <dbl>    <dbl> <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1     649             1        0 6/12/1…       3      11      12      14      41
## 2     649             2        0 6/19/1…       8      33      36      37      39
## 3     649             3        0 6/26/1…       1       6      23      24      27
## # … with 2 more variables: `NUMBER DRAWN 6` <dbl>, `BONUS NUMBER` <dbl>, and
## #   abbreviated variable names ¹​`SEQUENCE NUMBER`, ²​`DRAW DATE`,
## #   ³​`NUMBER DRAWN 1`, ⁴​`NUMBER DRAWN 2`, ⁵​`NUMBER DRAWN 3`, ⁶​`NUMBER DRAWN 4`,
## #   ⁷​`NUMBER DRAWN 5`
tail(lottery649, 3)
## # A tibble: 3 × 11
##   PRODUCT `DRAW NUMBER` SEQUEN…¹ DRAW …² NUMBE…³ NUMBE…⁴ NUMBE…⁵ NUMBE…⁶ NUMBE…⁷
##     <dbl>         <dbl>    <dbl> <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1     649          3589        0 6/13/2…       6      22      24      31      32
## 2     649          3590        0 6/16/2…       2      15      21      31      38
## 3     649          3591        0 6/20/2…      14      24      31      35      37
## # … with 2 more variables: `NUMBER DRAWN 6` <dbl>, `BONUS NUMBER` <dbl>, and
## #   abbreviated variable names ¹​`SEQUENCE NUMBER`, ²​`DRAW DATE`,
## #   ³​`NUMBER DRAWN 1`, ⁴​`NUMBER DRAWN 2`, ⁵​`NUMBER DRAWN 3`, ⁶​`NUMBER DRAWN 4`,
## #   ⁷​`NUMBER DRAWN 5`

A New Data Structure

Before using pmap, take some time to practice creating and accessing data in lists.

data1 <- c(1, 3, 5)
data2 <- c(2, 4, 6)
data3 <- c(8, 9, 7)

unnamed_list <- list(data1, data2, data3)
first_vector <- unnamed_list[[1]]
named_list <-list(first = data1, second = data2, third = data3)
first_item_sum <- named_list$first[1] + named_list$second[1] + named_list$third[1]

Using pmap

data_list <- list(data1, data2, data3)

averages <- pmap(data_list, function(x, y, z) { (x + y + z) / 3 })
first_average <- unlist(averages)[1]
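
pmap() always returns a list, which is why unlist() is needed above. When every result is a single number, purrr's pmap_dbl() returns a numeric vector directly. A minimal sketch with the same three vectors:

```r
library(purrr)

data_list <- list(c(1, 3, 5), c(2, 4, 6), c(8, 9, 7))

# pmap_dbl() walks the vectors in parallel and returns a plain numeric vector
averages <- pmap_dbl(data_list, function(x, y, z) (x + y + z) / 3)
first_average <- averages[1]
averages
```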

A test for rowwise() and map()

data1 <- c(1, 3, 5)
data2 <- c(2, 4, 6)
data4 <- c('aaa,bbb,ccc,d,e', 'aa,bb,cc,dd', 'aa,bbb,c')
test <- tibble(x=data1, y=data2, z=data4)
test1<- test %>% 
  rowwise()%>%
  mutate(count = length(str_split(z,',')[[1]])) %>% 
  ungroup()
test1
## # A tibble: 3 × 4
##       x     y z               count
##   <dbl> <dbl> <chr>           <int>
## 1     1     2 aaa,bbb,ccc,d,e     5
## 2     3     4 aa,bb,cc,dd         4
## 3     5     6 aa,bbb,c            3
test2 <- test %>% 
  mutate(count = unlist(map(z, function(x) length(str_split(x, ',')[[1]]))))
test2
## # A tibble: 3 × 4
##       x     y z               count
##   <dbl> <dbl> <chr>           <int>
## 1     1     2 aaa,bbb,ccc,d,e     5
## 2     3     4 aa,bb,cc,dd         4
## 3     5     6 aa,bbb,c            3
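
Both versions above process one row at a time. For this particular task, a vectorized alternative is available in base R: strsplit() splits every string at once and lengths() counts the pieces in each result. A small sketch using the same strings:

```r
data4 <- c('aaa,bbb,ccc,d,e', 'aa,bb,cc,dd', 'aa,bbb,c')

# strsplit() returns one character vector per input string;
# lengths() reports how many pieces each of them contains
piece_counts <- lengths(strsplit(data4, ",", fixed = TRUE))
piece_counts
```

The counts match the count column in the two tibbles above (5, 4, 3).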

Function for Historical Data Check

In the previous sections, we focused on opening and exploring the Canada lottery data set. In this section, we’re going to write a function that will enable users to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now.

Inside the app, the user inputs six different numbers from 1 to 49.

  • Under the hood, the six numbers will come as an R vector and serve as an input to our function.
  • The engineering team wants us to write a function that prints:
    • the number of times the combination selected occurred in the Canada data set and
    • the probability of winning the big prize in the next drawing with that combination.

Extract all the winning six numbers from the historical data set into an R vector.

Use the pmap function to take the six NUMBER DRAWN columns and output a list of vectors. For example, the first row of the lottery data set is {3, 11, 12, 14, 41, 43}, so the first item of this list should be c(3, 11, 12, 14, 41, 43).

historical_lots <- pmap(
  list(
    u = lottery649$`NUMBER DRAWN 1`,
    v = lottery649$`NUMBER DRAWN 2`,
    w = lottery649$`NUMBER DRAWN 3`,
    x = lottery649$`NUMBER DRAWN 4`,
    y = lottery649$`NUMBER DRAWN 5`,
    z = lottery649$`NUMBER DRAWN 6`
  ),
  # combine the six numbers of each draw into one vector
  function(u, v, w, x, y, z) c(u, v, w, x, y, z)
)
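
The same list can be built without naming each column by hand. The sketch below selects every column whose name starts with "NUMBER DRAWN" and passes the resulting tibble to pmap(). It runs on a small made-up tibble (toy_draws, a hypothetical name with illustrative values) so that it does not depend on 649.csv.

```r
library(tibble)
library(dplyr)
library(purrr)

# Made-up stand-in for lottery649; the values are illustrative only
toy_draws <- tibble(
  `DRAW NUMBER`    = c(1, 2),
  `NUMBER DRAWN 1` = c(3, 1),
  `NUMBER DRAWN 2` = c(11, 2),
  `NUMBER DRAWN 3` = c(12, 3),
  `NUMBER DRAWN 4` = c(14, 4),
  `NUMBER DRAWN 5` = c(41, 5),
  `NUMBER DRAWN 6` = c(43, 6),
  `BONUS NUMBER`   = c(13, 7)
)

# A tibble is a list of columns, so pmap() visits it row by row;
# unname() drops the column names that c(...) would otherwise carry over
toy_lots <- toy_draws %>%
  select(starts_with("NUMBER DRAWN")) %>%
  pmap(function(...) unname(c(...)))

toy_lots[[1]]
```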

Write a function named check_historical_occurrences() that takes in two inputs: an R vector containing the user numbers and the list containing the sets of winning numbers from the last part.

  • Compare the numbers given by the user against the list you created. If the user numbers match the winning lot, then return TRUE. If not, return FALSE. The end result of the comparison should be a vector of Boolean values. The setequal() function may come in handy here.
  • Print information about the number of times the combination inputted by the user occurred in the past.
  • Print information (in an easy-to-understand way) about the probability of winning the big prize in the next drawing with that combination.

# setequal() works on plain vectors without any extra package, so the sets
# package is not loaded here (attaching it masks the %>% pipe used in this file).
check_historical_occurrences <- function(lot, hist_lots = historical_lots) {
  historical_matches <- unlist(map(hist_lots, function(x) setequal(x, lot) ))
  num_past_matches <- sum(historical_matches)
  s <- paste("The combination you entered has appeared ", 
             num_past_matches, 
             " times in the past. ",
             "Your chance of winning the big prize in the next drawing using this combination is 0.0000072%", sep = "")
  return(s)
}

Test the function with a few inputs

  • Try 3, 11, 12, 14, 41, and 43. This is the first row in the data set, so your function should be able to detect it.
  • Try a string of 6 consecutive values. It’s highly unlikely that 6 consecutive numbers would get picked together, so we shouldn’t see it in the data set.
check_historical_occurrences(c(3, 12, 11, 14, 41, 43))
## [1] "The combination you entered has appeared 1 times in the past. Your chance of winning the big prize in the next drawing using this combination is 0.0000072%"
check_historical_occurrences(c(1, 2, 3, 4, 5, 6))
## [1] "The combination you entered has appeared 0 times in the past. Your chance of winning the big prize in the next drawing using this combination is 0.0000072%"
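
The matching step can be tried in isolation on a tiny hand-made history. The sketch below uses base R only; count_matches and toy_history are hypothetical names, and the second draw is made up. It also shows that setequal() ignores the order in which the numbers are entered.

```r
# Made-up history of two draws, for illustration only
toy_history <- list(
  c(3, 11, 12, 14, 41, 43),
  c(8, 33, 36, 37, 39, 42)
)

# Count how many past draws contain exactly the same six numbers as the ticket
count_matches <- function(lot, hist_lots) {
  matches <- vapply(hist_lots, function(x) setequal(x, lot), logical(1))
  sum(matches)
}

count_matches(c(43, 41, 14, 12, 11, 3), toy_history)  # same numbers, reversed order
count_matches(c(1, 2, 3, 4, 5, 6), toy_history)       # not in the toy history
```

The first call returns 1 and the second returns 0.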

Multi-ticket Probability

So far, we’ve written two main functions for the app:

  • one_ticket_probability(): calculates the probability of winning the big prize with a single ticket
  • check_historical_occurrences(): checks whether a certain combination has occurred in the Canada lottery data set

One situation our functions do not cover is the issue of multiple tickets. Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. Our purpose is to help them better estimate their chances of winning.

We’re going to write a function that will allow the users to calculate the chances of winning for any number of different tickets.

We’ve talked with the engineering team and they gave us the following information:

Write a function named multi_ticket_probability that prints the probability of winning the big prize depending on the number of different tickets played.

  1. Start by calculating the total number of possible outcomes: this is the total number of combinations for a six-number lottery ticket. There are 49 total numbers, and six numbers are sampled without replacement. Use the combinations() function you wrote in a previous section.
  2. The number of successful outcomes is given by the number of tickets the user intends to play.
  3. Use the number of successful outcomes and the total number of possible outcomes to calculate the probability for the number of tickets inputted.
  4. The function should print the probability in a way that’s easy to understand. It’s up to you what you choose, but here are a few suggestions:
    • Print the probability as a percentage.
    • Use the sprintf() function to make the printed message more personalized with respect to what the user inputs.
multi_ticket_probability <- function(n) {
  total_combinations <- combinations(49, 6)
  prob <- (n / total_combinations) * 100
  pretty_prob <- sprintf("%1.9f", prob)
  s <- paste("you have a ", pretty_prob, "% chance of winning the big prize.", sep = "")
  return(s)
}

Test the function

test_amounts <- c(1, 10, 100, 10000, 1000000, 6991908, 13983816)
for (n in test_amounts) {
  # format() keeps large counts such as 1000000 out of scientific notation
  print(paste("For ", format(n, scientific = FALSE), " tickets, ", multi_ticket_probability(n), sep = ""))
}
## [1] "For 1 tickets, you have a 0.000007151% chance of winning the big prize."
## [1] "For 10 tickets, you have a 0.000071511% chance of winning the big prize."
## [1] "For 100 tickets, you have a 0.000715112% chance of winning the big prize."
## [1] "For 10000 tickets, you have a 0.071511238% chance of winning the big prize."
## [1] "For 1000000 tickets, you have a 7.151123842% chance of winning the big prize."
## [1] "For 6991908 tickets, you have a 50.000000000% chance of winning the big prize."
## [1] "For 13983816 tickets, you have a 100.000000000% chance of winning the big prize."
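
Because the probability grows linearly with the number of different tickets, the calculation can also be turned around: how many tickets would it take to reach a given chance of winning? A minimal sketch, where tickets_for_target is a hypothetical helper that is not part of the app specification:

```r
# Smallest number of different tickets whose combined probability
# of winning the big prize reaches target_prob (a value between 0 and 1)
tickets_for_target <- function(target_prob) {
  ceiling(target_prob * choose(49, 6))
}

tickets_for_target(0.01)  # tickets needed for a 1% chance
tickets_for_target(0.5)   # tickets needed for a 50% chance
```

A 1% chance already requires 139839 different tickets, and a 50% chance requires 6991908, the same figure used in the test above.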

Fewer Winning Numbers: Function

In this part, we’re going to write one more function to allow the users to calculate probabilities for three, four, or five winning numbers.

For extra context, in most 6/49 lotteries there are smaller prizes if a player’s ticket matches three, four, or five of the six numbers drawn. As a consequence, the users might be interested in knowing the probability of having three, four, or five winning numbers.

These are the engineering details we’ll need to be aware of:

To calculate the probabilities, we tell the engineering team that the specific combination on the ticket is irrelevant and we only need the integer between 3 and 5 representing the number of winning numbers expected. Consequently, we will write a function which takes in an integer and prints information about the chances of winning depending on the value of that integer.

Write a function named probability_less_6 which takes in an integer and prints information about the chances of winning depending on the value of that integer.

  1. Calculate the number of successful outcomes. For instance, if the user inputs 5, a successful draw contains five of the six numbers on the ticket and one of the 43 numbers that are not on it (the actual numbers are irrelevant here; the count is the same for each six-number ticket).
  2. Calculate the number of total possible outcomes. This is the number of six-number combinations from the set of 49 unique numbers that range from 1 to 49.
  3. Calculate the probability using the number of successful outcomes and the number of total possible outcomes.
  4. Display the probability value in a way that will be easy to understand for the user.
probability_less_6 <- function(n) {
  # Ways to choose which n of the six ticket numbers are drawn
  n_combinations_ticket <- combinations(6, n)
  # Ways to fill the other 6 - n drawn numbers from the 43 numbers not on the ticket
  n_combinations_remaining <- combinations(49 - 6, 6 - n)
  successful_outcomes <- n_combinations_ticket * n_combinations_remaining
  n_combinations_total <- combinations(49, 6)

  # Probability, as a percentage, of matching exactly n of the six numbers drawn
  prob <- (successful_outcomes / n_combinations_total) * 100
  pretty_prob <- sprintf("%1.9f", prob)

  s <- paste("You have a ", pretty_prob, "% chance of matching exactly ", n, " winning numbers.", sep = "")
  return(s)
}

Test the function on some possible inputs

winning_nums <- c(3, 4, 5)
for (n in winning_nums) {
  print(probability_less_6(n))
}
## [1] "You have a 1.765040387% chance of matching exactly 3 winning numbers."
## [1] "You have a 0.096861972% chance of matching exactly 4 winning numbers."
## [1] "You have a 0.001844990% chance of matching exactly 5 winning numbers."
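
The probability of matching exactly n of the six drawn numbers follows a hypergeometric distribution, so base R's dhyper() offers an independent cross-check. The sketch below compares it with the counting formula C(6, n) · C(43, 6 − n) / C(49, 6); the variable names are illustrative.

```r
matches <- 3:5

# dhyper(x, m, n, k): x = matches, m = 6 numbers on the ticket,
# n = 43 numbers not on the ticket, k = 6 numbers drawn
exact_match_prob <- dhyper(matches, m = 6, n = 43, k = 6)

# The same probabilities from direct counting
by_hand <- choose(6, matches) * choose(43, 6 - matches) / choose(49, 6)

all.equal(exact_match_prob, by_hand)
sprintf("%1.9f", exact_match_prob * 100)
```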

Possible features for a second version of the app

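One improvement is to compute the probability inside the function rather than hard-coding it in the message, as check_historical_occurrences() does.
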
check_historical_occurrences_update <- function(lot, hist_lots = historical_lots) {
  historical_matches <- unlist(map(hist_lots, function(x) setequal(x, lot) ))
  num_past_matches <- sum(historical_matches)
  total_combinations <- combinations(49, 6)
  prob <- (1 / total_combinations) * 100
  pretty_prob <- sprintf("%1.9f", prob)
  s <- paste("The combination you entered has appeared ", 
             num_past_matches, 
             " times in the past. ",
             "Your chance of winning the big prize in the next drawing using this combination is ",pretty_prob,"%", sep = "")
  return(s)
}

check_historical_occurrences_update(c(29,1,22,11,29,30))
## [1] "The combination you entered has appeared 0 times in the past. Your chance of winning the big prize in the next drawing using this combination is 0.000007151%"
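
The test call above passes 29 twice, which is not a valid ticket: the app expects six different numbers from 1 to 49, and setequal() would silently treat the input as a five-number set. A second version could validate the input first. The sketch below shows one possible check; is_valid_ticket is a hypothetical helper, not part of the original functions.

```r
# TRUE only for six distinct whole numbers between 1 and 49
is_valid_ticket <- function(nums) {
  is.numeric(nums) &&
    length(nums) == 6 &&
    !anyNA(nums) &&
    all(nums == round(nums)) &&
    all(nums >= 1 & nums <= 49) &&
    anyDuplicated(nums) == 0
}

is_valid_ticket(c(29, 1, 22, 11, 29, 30))  # FALSE: 29 appears twice
is_valid_ticket(c(3, 11, 12, 14, 41, 43))  # TRUE
```

The historical check could call this helper first and return a friendly error message when it yields FALSE.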