by R. Grothmann

Compute the probability to get a choice with different objects, if m of out of n are chosen randomly. The formula is of course

We use the prod formula of Euler for this.

>function map f(m,n) := prod((n:-1:n-m+1)/n)

The following is the chance to get 5 different numbers, if 5 are chosen from 10 numbers.

>f(5,10)

0.3024

It is an almost safe bet, that there are at least two out of 60 persons with same birthday.

>f(60,365)

0.00587733913465

The bet on this event becomes favorable with more than 23 people.

>m=1:100; plot2d(m,f(m,365)):

Let us investigate Lotto in a small town like Eichstätt, Germany (assuming 10000 Lotto players). Chances are 1:36 against all Lotto sheets being different.

>n=bin(49,6)

13983816

>1/f(10000,n)

35.7323646644

For large n and m, it is better to use an approximation using Sterling's formula.

>function map fa(m,n) := sqrt(n/(n-m))*(n/(n-m))^(n-m)*exp(-m) >fa(60,365)

0.00587760311246

>f(60,365)

0.00587733913465

When does the probability pass 0.5 in our birthday problem?

>bisect("fa(x,365)-0.5",20,30)

22.7679316109

In Euler, the bisection method works for discrete functions.

>bisect("f(x,365)-0.5",20,30)

23

Indeed, the switch is between 22 and 23.

>f(21:30,365)

[0.556312, 0.524305, 0.492703, 0.461656, 0.4313, 0.401759, 0.373141, 0.345539, 0.319031, 0.293684]

We simulate the choice with a Monte Carlo simulation. The following function returns the percentage of j equal entries when choosing m out of n.

>function simulate (m,n,k=100000) ... res=zeros(1,m); loop 1 to k h=floor(random(1,m)*n); j=max(count(h[1:m],n)); res[j]=res[j]+1; end; j=max(nonzeros(res<>0)); return res[1:j]/k; endfunction

We select 200 people 100000 times. It is very likely to find three or more with same birthday.

>res=simulate(200,365); plot2d(res,bar=1):

The simulated result is close to the theoretical result.

>res=simulate(40,365); res[1], f(40,365),

0.1099 0.108768190182

When selecting 5 out of 10, finding 2 equal numbers has the largest chance.

>res=simulate(5,10); plot2d(res,bar=1):

The simulation result is close to the theoretical result.

>res[1], f(5,10),

0.30068 0.3024

What is the average waiting time until two persons with same birthday are found? We observe that the chance to find the first same birthday at the i-th person is f(i-1,n)*(i-1)/n, since the first i-1 must be different from each other and the next one must be one of the first i-1.

>n=365; i=2:n; sum(fmap(i-1,n)*(i-1)/n*i)

24.6165858946

Here is a distribution of the waiting time.

>plot2d(fmap(i-1,n)*(i-1)/n):