This RMarkdown file is intended to lay out the logic of a mobile app designed for those addicted to the lottery. By showing a user how to calculate the incredibly small probabilities of winning the lottery, we hope that the app will help them better grasp that buying multiple lottery tickets will do little to help them win. Through this understanding, they will hopefully stop purchasing lottery tickets in an unhealthy manner.
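factorial <- function(n) {
# seq_len(n) is empty when n is 0, so factorial(0) returns 1 (1:n would iterate over 1 and 0)
product <- 1
for (i in seq_len(n)) {
product <- product * i
}
return(product)
}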
combinations <- function(n, k) {
numerator <- factorial(n)
denominator <- factorial(k) * factorial(n - k)
return(numerator / denominator)
}
permutations <- function(n, k) {
numerator <- factorial(n)
denominator <- factorial(n - k)
return(numerator / denominator)
}
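As a sanity check, hand-written helpers like these can be compared with base R's built-in `choose()` and `factorial()`. The sketch below uses stand-alone copies under hypothetical names (`fact_loop`, `comb_loop`) so that it runs on its own; those names are assumptions for illustration and are not part of the app.

```r
# Hypothetical stand-alone copies of the helpers, for illustration only.
fact_loop <- function(n) {
  product <- 1
  for (i in seq_len(n)) {  # seq_len(0) is empty, so fact_loop(0) returns 1
    product <- product * i
  }
  product
}

comb_loop <- function(n, k) {
  fact_loop(n) / (fact_loop(k) * fact_loop(n - k))
}

# all.equal() tolerates the small rounding error of floating-point division.
all.equal(comb_loop(49, 6), choose(49, 6))
all.equal(fact_loop(10), base::factorial(10))
choose(49, 6)  # 13983816 possible 6/49 tickets
```

In practice `choose(n, k)` could replace the loop-based version entirely; the loop is kept here only because it mirrors the logic the app is meant to explain.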
In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. A player wins the big prize if the six numbers on their ticket match all six numbers drawn. If a player has a ticket with the numbers {13, 22, 24, 27, 42, 44}, they only win the big prize if the numbers drawn are {13, 22, 24, 27, 42, 44}. If even one number differs, they won’t win.
For the first version of the app, we want players to be able to calculate the probability of winning the big prize with the various numbers they play on a single ticket (for each ticket a player chooses six numbers out of 49). So, we’ll start by building a function that calculates the probability of winning the big prize for any given ticket.
We discussed this with the engineering team of the medical institute, and they told us we need to be aware of the following details when we write the function:
Inside the app, the user inputs six different numbers from 1 to 49. Under the hood, the six numbers will come as an R vector, which will serve as the single input to our function. The engineering team wants the function to print the probability value in a friendly way — in a way that people without any probability training are able to understand.
one_ticket_probability <- function(nums) {
#the total number of possible outcomes
total_combinations <- combinations(49, 6)
#The user inputs just one combination, which means the number of successful outcomes is 1. Use the number of successful outcomes and the total number of possible outcomes to calculate the probability for one ticket.
prob <- (1 / total_combinations) * 100
#The function should print the probability in a way that's easy to understand.
#sprintf(fmt, ...) is a wrapper that returns a character vector containing a formatted combination of text and variable values. A format specification starts with '%' and ends with a conversion character such as 'f'. In '%m.nf', m is the minimum field width and n is the precision (the number of digits after the decimal point); 'f' formats a double-precision value in fixed-point notation.
pretty_prob <- sprintf("%1.9f", prob)
s <- paste("You have a ", pretty_prob, "% chance of winning the big prize.", sep = "")
return(s)
}
one_ticket_probability(c(1, 2, 3, 4, 5, 6))
## [1] "You have a 0.000007151% chance of winning the big prize."
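The requirement that the user enters six different numbers from 1 to 49 could be checked before any probability is reported. The following is a minimal sketch under that assumption; `is_valid_ticket()` is a hypothetical helper introduced here for illustration, not something the engineering team specified.

```r
# Hypothetical helper: TRUE only for six distinct whole numbers between 1 and 49.
is_valid_ticket <- function(nums) {
  is.numeric(nums) &&
    length(nums) == 6 &&
    !anyNA(nums) &&
    all(nums == round(nums)) &&       # whole numbers only
    all(nums >= 1 & nums <= 49) &&    # within the 6/49 range
    anyDuplicated(nums) == 0          # no repeated numbers
}

is_valid_ticket(c(1, 2, 3, 4, 5, 6))    # TRUE
is_valid_ticket(c(1, 2, 3, 4, 5, 5))    # FALSE: 5 is repeated
is_valid_ticket(c(0, 2, 3, 4, 5, 6))    # FALSE: 0 is out of range
```

The checks are chained with `&&` so that later conditions are only evaluated once the earlier ones have passed.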
In the previous section, we wrote a function that tells users the probability of winning the big prize with a single ticket. For the first version of the app, however, users should also be able to compare their ticket against past winning combinations in the historical lottery data in Canada. This functionality will allow users to determine whether they would have ever won by now.
library(tidyverse)
lottery649 <- read_csv("649.csv")
print(dim(lottery649))
## [1] 3665 11
head(lottery649, 3)
## # A tibble: 3 × 11
## PRODUCT `DRAW NUMBER` SEQUEN…¹ DRAW …² NUMBE…³ NUMBE…⁴ NUMBE…⁵ NUMBE…⁶ NUMBE…⁷
## <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 649 1 0 6/12/1… 3 11 12 14 41
## 2 649 2 0 6/19/1… 8 33 36 37 39
## 3 649 3 0 6/26/1… 1 6 23 24 27
## # … with 2 more variables: `NUMBER DRAWN 6` <dbl>, `BONUS NUMBER` <dbl>, and
## # abbreviated variable names ¹`SEQUENCE NUMBER`, ²`DRAW DATE`,
## # ³`NUMBER DRAWN 1`, ⁴`NUMBER DRAWN 2`, ⁵`NUMBER DRAWN 3`, ⁶`NUMBER DRAWN 4`,
## # ⁷`NUMBER DRAWN 5`
tail(lottery649, 3)
## # A tibble: 3 × 11
## PRODUCT `DRAW NUMBER` SEQUEN…¹ DRAW …² NUMBE…³ NUMBE…⁴ NUMBE…⁵ NUMBE…⁶ NUMBE…⁷
## <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 649 3589 0 6/13/2… 6 22 24 31 32
## 2 649 3590 0 6/16/2… 2 15 21 31 38
## 3 649 3591 0 6/20/2… 14 24 31 35 37
## # … with 2 more variables: `NUMBER DRAWN 6` <dbl>, `BONUS NUMBER` <dbl>, and
## # abbreviated variable names ¹`SEQUENCE NUMBER`, ²`DRAW DATE`,
## # ³`NUMBER DRAWN 1`, ⁴`NUMBER DRAWN 2`, ⁵`NUMBER DRAWN 3`, ⁶`NUMBER DRAWN 4`,
## # ⁷`NUMBER DRAWN 5`
To learn how to use pmap(), take some time to practice creating and accessing data in lists.
data1 <- c(1, 3, 5)
data2 <- c(2, 4, 6)
data3 <- c(8, 9, 7)
unnamed_list <- list(data1, data2, data3)
first_vector <- unnamed_list[[1]]
named_list <-list(first = data1, second = data2, third = data3)
first_item_sum <- named_list$first[1] + named_list$second[1] + named_list$third[1]
data_list <- list(data1, data2, data3)
averages <- pmap(data_list, function(x, y, z) { (x + y + z) / 3 })
first_average <- unlist(averages)[1]
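Because `pmap()` always returns a list, the result usually has to be flattened before it can be used as a vector. As a sketch of an alternative, purrr's typed variant `pmap_dbl()` returns a numeric vector directly; the variable names `averages_list` and `averages_vec` are chosen here for illustration.

```r
library(purrr)

data1 <- c(1, 3, 5)
data2 <- c(2, 4, 6)
data3 <- c(8, 9, 7)

# pmap() returns a list with one element per position in the inputs.
averages_list <- pmap(list(data1, data2, data3), function(x, y, z) (x + y + z) / 3)

# pmap_dbl() returns a plain numeric vector, so no unlist() is needed.
averages_vec <- pmap_dbl(list(data1, data2, data3), function(x, y, z) (x + y + z) / 3)

is.list(averages_list)  # TRUE
averages_vec[3]         # 6, the mean of 5, 6 and 7
```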
rowwise() and map()
data1 <- c(1, 3, 5)
data2 <- c(2, 4, 6)
data4 <- c('aaa,bbb,ccc,d,e', 'aa,bb,cc,dd', 'aa,bbb,c')
test <- tibble(x=data1, y=data2, z=data4)
test1<- test %>%
rowwise()%>%
mutate(count = length(str_split(z,',')[[1]])) %>%
ungroup()
test1
## # A tibble: 3 × 4
## x y z count
## <dbl> <dbl> <chr> <int>
## 1 1 2 aaa,bbb,ccc,d,e 5
## 2 3 4 aa,bb,cc,dd 4
## 3 5 6 aa,bbb,c 3
#test2<- length(str_split(data3,',')[[1]])
test2<- test %>%
mutate(count = unlist( map(z, function(x) length(str_split(x , ',')[[1]]))))
test2
## # A tibble: 3 × 4
## x y z count
## <dbl> <dbl> <chr> <int>
## 1 1 2 aaa,bbb,ccc,d,e 5
## 2 3 4 aa,bb,cc,dd 4
## 3 5 6 aa,bbb,c 3
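Both approaches above apply a function once per row. For this particular task a vectorised alternative may be simpler: `str_split()` already returns one character vector per input string, and base R's `lengths()` reports the length of each. This is only a sketch of that idea on the same three strings.

```r
library(stringr)

z <- c("aaa,bbb,ccc,d,e", "aa,bb,cc,dd", "aa,bbb,c")

# str_split() returns a list with one character vector per input string;
# lengths() returns the length of each element of that list.
counts <- lengths(str_split(z, ","))
counts  # 5 4 3
```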
In the previous section, we focused on opening and exploring the Canada lottery data set. In this section, we’re going to write a function that will enable users to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now.
Inside the app, the user inputs six different numbers from 1 to 49.
Use the pmap function to take the 6 NUMBER DRAWN columns and output a list of vectors. For example, the first row of the lottery data set is {3, 11, 12, 14, 41, 43}, so the first item of this list should be c(3, 11, 12, 14, 41, 43).
historical_lots <- pmap(
list(
# name the list elements with = (using <- here would also create global variables u, v, ...)
u = lottery649$`NUMBER DRAWN 1`,
v = lottery649$`NUMBER DRAWN 2`,
w = lottery649$`NUMBER DRAWN 3`,
x = lottery649$`NUMBER DRAWN 4`,
y = lottery649$`NUMBER DRAWN 5`,
z = lottery649$`NUMBER DRAWN 6`
),
# combine the six numbers of each draw into one vector
function(u, v, w, x, y, z) c(u, v, w, x, y, z)
)
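To see what this kind of `pmap()` call produces without loading the full data set, here is a minimal sketch on a toy tibble. The values in `toy_draws` are made up for illustration, and only three columns are used to keep it short.

```r
library(purrr)
library(tibble)

# Toy data (made up for illustration): two draws, three numbers each.
toy_draws <- tibble(
  `NUMBER DRAWN 1` = c(3, 8),
  `NUMBER DRAWN 2` = c(11, 33),
  `NUMBER DRAWN 3` = c(12, 36)
)

# unname() drops the column names so pmap() passes the values by position;
# function(...) c(...) then combines each row's values into one vector.
toy_lots <- pmap(unname(as.list(toy_draws)), function(...) c(...))

length(toy_lots)  # 2, one vector per row
toy_lots[[1]]     # 3 11 12
```

Passing the columns by position avoids having to write one function argument per column, which may be convenient if the number of columns changes.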
Write a function named check_historical_occurrences() that takes in two inputs: an R vector containing the user numbers and the list containing the sets of the winning numbers from the last part. Compare the user numbers against each winning set: if they match, return TRUE. If not, return FALSE. The end result of the comparison should be a vector of Boolean values. The setequal() function may come in handy here.
library(sets)
##
## Attaching package: 'sets'
## The following object is masked from 'package:forcats':
##
## %>%
## The following object is masked from 'package:stringr':
##
## %>%
## The following object is masked from 'package:dplyr':
##
## %>%
## The following object is masked from 'package:purrr':
##
## %>%
## The following object is masked from 'package:tidyr':
##
## %>%
## The following object is masked from 'package:tibble':
##
## %>%
check_historical_occurrences <- function(lot, hist_lots = historical_lots) {
historical_matches <- unlist(map(hist_lots, function(x) setequal(x, lot) ))
num_past_matches <- sum(historical_matches)
s <- paste("The combination you entered has appeared ",
num_past_matches,
" times in the past. ",
"Your chance of winning the big prize in the next drawing using this combination is 0.0000072%", sep = "")
return(s)
}
check_historical_occurrences(c(3, 12, 11, 14, 41, 43))
## [1] "The combination you entered has appeared 1 times in the past. Your chance of winning the big prize in the next drawing using this combination is 0.0000072%"
check_historical_occurrences(c(1, 2, 3, 4, 5, 6))
## [1] "The combination you entered has appeared 0 times in the past. Your chance of winning the big prize in the next drawing using this combination is 0.0000072%"
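The comparison relies on how `setequal()` treats order and duplicates. The sketch below uses base R's `setequal()` on a small made-up history (`toy_history` is hypothetical data, not taken from the lottery file) and also shows purrr's `map_lgl()`, which returns a logical vector without a separate `unlist()` step.

```r
library(purrr)

# Toy history (made up for illustration).
toy_history <- list(c(3, 11, 12, 14, 41, 43), c(8, 33, 36, 37, 39, 41))

# Order does not matter to setequal().
base::setequal(c(43, 41, 14, 12, 11, 3), toy_history[[1]])   # TRUE

# Duplicates are ignored, so a ticket with a repeated number holds only
# five distinct values and can never equal a set of six distinct numbers.
base::setequal(c(3, 3, 11, 12, 14, 41), toy_history[[1]])    # FALSE

# map_lgl() returns a logical vector directly.
matches <- map_lgl(toy_history, function(x) base::setequal(x, c(3, 12, 11, 14, 41, 43)))
sum(matches)  # 1
```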
So far, we’ve written two main functions for the app:
one_ticket_probability(): calculates the probability of winning the big prize with a single ticket
check_historical_occurrences(): checks whether a certain combination has occurred in the Canada lottery data set
One situation our functions do not cover is the issue of multiple tickets. Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. Our purpose is to help them better estimate their chances of winning.
We’re going to write a function that will allow the users to calculate the chances of winning for any number of different tickets.
We’ve talked with the engineering team and they gave us the following information:
Write a function named multi_ticket_probability() that prints the probability of winning the big prize depending on the number of different tickets played.
multi_ticket_probability <- function(n) {
total_combinations <- combinations(49, 6)
prob <- (n / total_combinations) * 100
pretty_prob <- sprintf("%1.9f", prob)
s <- paste("you have a ", pretty_prob, "% chance of winning the big prize.", sep = "")
return(s)
}
test_amounts <- c(1, 10, 100, 10000, 1000000, 6991908, 13983816)
for (n in test_amounts) {
# format(n, scientific = FALSE) keeps large ticket counts from printing as 1e+06
print(paste("For ", format(n, scientific = FALSE), " tickets, ", multi_ticket_probability(n), sep = ""))
}
## [1] "For 1 tickets, you have a 0.000007151% chance of winning the big prize."
## [1] "For 10 tickets, you have a 0.000071511% chance of winning the big prize."
## [1] "For 100 tickets, you have a 0.000715112% chance of winning the big prize."
## [1] "For 10000 tickets, you have a 0.071511238% chance of winning the big prize."
## [1] "For 1000000 tickets, you have a 7.151123842% chance of winning the big prize."
## [1] "For 6991908 tickets, you have a 50.000000000% chance of winning the big prize."
## [1] "For 13983816 tickets, you have a 100.000000000% chance of winning the big prize."
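Since the chance of winning grows linearly with the number of different tickets, the relationship can also be read in reverse: how many different tickets would be needed to reach a given chance. The sketch below assumes a hypothetical helper named `tickets_needed()`, introduced only to illustrate the scale involved.

```r
# Hypothetical helper: smallest number of different tickets whose combined
# chance of winning the big prize is at least target_pct percent.
tickets_needed <- function(target_pct) {
  total_combinations <- choose(49, 6)  # 13983816 possible tickets
  ceiling(target_pct / 100 * total_combinations)
}

tickets_needed(1)    # 139839 tickets for a 1% chance
tickets_needed(50)   # 6991908 tickets for a 50% chance
```

Even a 1% chance requires well over a hundred thousand different tickets for a single drawing.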
In this part, we’re going to write one more function to allow the users to calculate probabilities for three, four, or five winning numbers.
For extra context, in most 6/49 lotteries there are smaller prizes if a player’s ticket matches three, four, or five of the six numbers drawn. As a consequence, the users might be interested in knowing the probability of having three, four, or five winning numbers.
These are the engineering details we’ll need to be aware of:
To calculate the probabilities, we tell the engineering team that the specific combination on the ticket is irrelevant and we only need the integer between 3 and 5 representing the number of winning numbers expected. Consequently, we will write a function which takes in an integer and prints information about the chances of winning depending on the value of that integer.
Write a function named probability_less_6() which takes in an integer and prints information about the chances of winning depending on the value of that integer.
probability_less_6 <- function(n) {
# Ways to choose which n of the 6 winning numbers appear on the ticket
n_combinations_ticket <- combinations(6, n)
# Ways to choose the other 6 - n numbers from the 43 non-winning numbers
n_combinations_remaining <- combinations(43, 6 - n)
successful_outcomes <- n_combinations_ticket * n_combinations_remaining
n_combinations_total <- combinations(49, 6)
prob <- (successful_outcomes / n_combinations_total) * 100
# prob is the probability (in percent) of matching exactly n of the 6 numbers drawn.
pretty_prob <- sprintf("%1.9f", prob)
s <- paste("you have a ", pretty_prob, "% chance of matching exactly that many numbers.", sep = "")
return(s)
}
winning_nums <- c(3, 4, 5)
for (n in winning_nums) {
print(paste("For ", n, " winning numbers, ", probability_less_6(n), sep = ""))
}
## [1] "For 3 winning numbers, you have a 1.765040387% chance of matching exactly that many numbers."
## [1] "For 4 winning numbers, you have a 0.096861972% chance of matching exactly that many numbers."
## [1] "For 5 winning numbers, you have a 0.001844990% chance of matching exactly that many numbers."
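The probability of matching exactly k of the six drawn numbers follows a hypergeometric distribution: a ticket must contain k of the 6 winning numbers and 6 - k of the 43 non-winning numbers. As a cross-check, the sketch below counts those tickets with base R's `choose()` and compares against `dhyper()`; the variable names are chosen here for illustration.

```r
# Number of tickets that match exactly k of the 6 drawn numbers:
# choose k of the 6 winners and 6 - k of the 43 non-winners.
k <- 0:6
ways <- choose(6, k) * choose(43, 6 - k)

# The counts for k = 0..6 cover every possible ticket exactly once.
isTRUE(all.equal(sum(ways), choose(49, 6)))  # TRUE

# dhyper(x, m, n, k): x matches, m winning numbers, n non-winning numbers, k picks.
exact_prob_pct <- dhyper(3:5, m = 6, n = 43, k = 6) * 100
sprintf("%1.9f", exact_prob_pct)
# "1.765040387" "0.096861972" "0.001844990"
```

Summing `exact_prob_pct` for the relevant values of k would give an "at least k matches" probability, if that is what the app should eventually report.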
library(sets)
check_historical_occurrences_update <- function(lot, hist_lots = historical_lots) {
historical_matches <- unlist(map(hist_lots, function(x) setequal(x, lot) ))
num_past_matches <- sum(historical_matches)
total_combinations <- combinations(49, 6)
prob <- (1 / total_combinations) * 100
pretty_prob <- sprintf("%1.9f", prob)
s <- paste("The combination you entered has appeared ",
num_past_matches,
" times in the past. ",
"Your chance of winning the big prize in the next drawing using this combination is ",pretty_prob,"%", sep = "")
return(s)
}
check_historical_occurrences_update(c(29,1,22,11,29,30))
## [1] "The combination you entered has appeared 0 times in the past. Your chance of winning the big prize in the next drawing using this combination is 0.000007151%"