Let’s look at an example of how rare events in big data can occur a large number of times if the population is large enough.
Let’s consider our coin-tossing example again. The probability of getting a head in a single toss of a fair coin is 0.5, and the probability of getting 10 heads in a row is
$$p = 0.5^{10} \approx 0.00097$$
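The problem arises when we repeat this experiment many times, for example 10,000 times or a million times. The probability above describes a single experiment only; it does not account for the many independent experiments we are running, each of which is another chance for the rare outcome to appear.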
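To make this concrete, here is a small simulation sketch. It is only an illustration: it assumes a fair coin and independent tosses, and the function name `count_all_heads_runs`, its parameters, and the fixed seed are hypothetical choices made for this example. It counts how many experiments out of a large number produce 10 heads in a row and prints that count next to the expected count (the number of experiments times the single-experiment probability).

```python
import random


def count_all_heads_runs(num_experiments, tosses_per_experiment=10, seed=0):
    """Count experiments in which every toss of a fair coin lands heads.

    Assumption: each toss is an independent fair bit, where 1 means heads.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    all_heads = (1 << tosses_per_experiment) - 1  # bit pattern 111...1
    count = 0
    for _ in range(num_experiments):
        # getrandbits(n) draws n fair random bits at once;
        # the experiment is "all heads" when every bit is 1.
        if rng.getrandbits(tosses_per_experiment) == all_heads:
            count += 1
    return count


# Probability of 10 heads in a row in a single experiment.
p_single = 0.5 ** 10

for n in (10_000, 1_000_000):
    observed = count_all_heads_runs(n)
    expected = n * p_single
    print(f"experiments={n}: observed={observed}, expected about {expected:.1f}")
```

Although a single experiment almost never yields 10 heads in a row, the observed count should grow roughly in proportion to the number of experiments, which is the sense in which a rare event becomes common in a large population.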
Let $k$ be the number of experiments that we are doing, where each experiment consists of tossing the coin 10 times. We want to determine the probability that we see 10 heads in a row in at least one of the $k$ experiments. You can figure this out with the following calculations: