• Kalpesh Agrawal

# Statistical Concepts- 2

In part 1 of this article, you learned about the types of statistics and gained a detailed understanding of each type.

`Related: Statistical Concepts- 1`

### Distributions

Distributions, or more specifically probability distributions, are a way of gaining meaningful insight into data values: they describe how likely each outcome is, either at an individual value or over a range of values.

A set of standard distributions has been established for values that are DISCRETE in nature as well as for values that are CONTINUOUS in nature.

### Discrete Probability Distributions

Many of the outcomes we meet in day-to-day life are binary in nature:

• Whether or not the exam will be cleared

• Whether or not the milkman will come

• Whether or not India will win the match today

All these outcomes are BINARY in nature. Given an appropriate set of assumptions, we can model such outcomes, or counts built from them, using one of the discrete probability distributions listed below.

• Bernoulli

• Binomial

• Poisson

• Geometric

• Negative Binomial

But what if the outcome is DISCRETE BUT NOT BINARY in nature? In that case, we adopt the Goodness of Fit approach: we compare the observed frequencies with the frequencies expected under an assumed distribution and check whether the difference is statistically significant.
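The Goodness of Fit idea can be sketched with a chi-square statistic on a six-sided die, an outcome that is discrete but not binary. The counts below are hypothetical, and the critical value 11.070 is the standard 5% chi-square cut-off for 5 degrees of freedom; this is a minimal illustration under those assumptions, not a full testing procedure.

```python
# Hypothetical data: 120 rolls of a die, observed counts for faces 1 to 6.
observed = [18, 22, 16, 25, 19, 20]

# Under the assumption of a fair die, every face is expected equally often.
expected = [sum(observed) / len(observed)] * len(observed)


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


statistic = chi_square_statistic(observed, expected)

# 5% critical value of the chi-square distribution with 6 - 1 = 5 degrees of freedom.
CRITICAL_5PCT_DF5 = 11.070
reject_fair_die = statistic > CRITICAL_5PCT_DF5

print(f"chi-square statistic = {statistic:.3f}, reject fair die: {reject_fair_die}")
```

A statistic below the critical value means the observed counts are not significantly different from what a fair die would produce.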

### Bernoulli Distribution

This distribution is of great help in modelling binary outcomes. It rests on the following assumptions:

• There is only one trial

• Each trial has a defined probability of Success and Failure

• The probabilities are Independent and Exhaustive

Note: Exhaustive indicates that the probabilities add to 1.

What is meant by independence?

It simply means that the occurrence of one event neither influences the probability of another event nor is influenced by it.

Why is it an important distribution, given that it takes into consideration only a single trial?

It is important because it forms the basis for the other distributions discussed below.
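A single Bernoulli trial can be sketched in a few lines. The success probability `p = 0.3`, the seed, and the function names are assumptions chosen only for illustration.

```python
import random


def bernoulli_pmf(x, p):
    """P(X = x) for one trial with success probability p; x must be 0 or 1."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli outcome is either 0 (failure) or 1 (success)")
    return p if x == 1 else 1 - p


def bernoulli_trial(p, rng):
    """Simulate one trial: return 1 (success) with probability p, else 0."""
    return 1 if rng.random() < p else 0


p = 0.3                  # assumed probability of success
rng = random.Random(42)  # fixed seed so the run is repeatable

# Exhaustive: the two probabilities add up to 1.
total_probability = bernoulli_pmf(1, p) + bernoulli_pmf(0, p)

# Repeating the single trial many times shows the success rate settling near p.
draws = [bernoulli_trial(p, rng) for _ in range(10_000)]
success_rate = sum(draws) / len(draws)

print(f"total probability = {total_probability}, observed success rate = {success_rate:.3f}")
```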

### Binomial Distribution

If the Bernoulli trial above is repeated for n trials, each producing a binary outcome with the same probability of success, the number of successes is said to follow a Binomial Distribution. The assumption about the Independent and Exhaustive nature of the probabilities holds true for all n trials.

For what objective is the distribution ideally used?

This distribution is highly useful in cases where we want to model the number of successes in a fixed number of trials.

Since the successes can occur in any order, we take into consideration all the possible combinations.

Let us see how.

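Suppose the probability of success is $p$ and the corresponding probability of failure is $(1-p)$. Then the probability of getting $x$ successes in $n$ trials, in one particular order, is

$$p^{x}(1-p)^{n-x}$$

Since the order in which the successes are observed is not known, we multiply the above probability by the number of ways of choosing which $x$ of the $n$ trials are successes:

$$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$$

Thus, we obtain the required probability:

$$P(X = x) = \binom{n}{x}\,p^{x}(1-p)^{n-x}, \quad x = 0, 1, \dots, n$$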
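The binomial probability multiplies the probability of one particular ordering, p^x (1-p)^(n-x), by the number of possible orderings, C(n, x). A minimal sketch, with n = 10, p = 0.5 and x = 3 chosen purely as example values:

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    one_ordering = p ** x * (1 - p) ** (n - x)  # one specific sequence of outcomes
    orderings = comb(n, x)                      # ways to place x successes in n trials
    return orderings * one_ordering


n, p = 10, 0.5  # assumed example values
prob_three = binomial_pmf(3, n, p)
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))

print(f"P(X = 3) = {prob_three}, sum over all x = {total}")
```

Summing the probabilities over x = 0 to n gives 1, which is a quick sanity check on the formula.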

### Poisson Distribution

It is a limiting case of the Binomial Distribution discussed above. There may arise a situation in which the number of trials is so large that computing the binomial probability directly becomes impractical and, on top of that, the probability of observing a success is very small. In such a case, the Poisson distribution comes in handy to model rare events arising out of a large number of trials.

The mean of this distribution is the product of the number of trials and the probability that a trial results in a success.

If β is the mean of the distribution, then the required probability of getting x successes is

$$P(X = x) = \frac{e^{-\beta}\,\beta^{x}}{x!}, \quad x = 0, 1, 2, \dots$$
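To see the limiting behaviour numerically, the sketch below compares binomial probabilities for a large number of trials and a small success probability with Poisson probabilities that share the same mean β = n × p. The values n = 10,000 and p = 0.0003 are assumptions picked only to illustrate the point.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, beta):
    """Probability of exactly x events when the mean number of events is beta."""
    return exp(-beta) * beta ** x / factorial(x)


n, p = 10_000, 0.0003  # many trials, rare success (assumed values)
beta = n * p           # mean of the matching Poisson distribution

for x in range(6):
    print(f"x={x}: binomial={binomial_pmf(x, n, p):.6f}  poisson={poisson_pmf(x, beta):.6f}")

largest_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, beta)) for x in range(11))
```

The two columns agree closely, which is why the Poisson form is a convenient stand-in when n is large and p is small.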

A fitting example of the use of the Poisson distribution and its variants in the insurance sector is modelling the number of claims. This is appropriate because a claim being reported from a portfolio containing a large number of policies is a rare event with a low probability of occurrence.

But how do we know that the Poisson distribution is a good fit for modelling the number of claims, given that there are competing distributions such as the Negative Binomial (which we will discuss later)?

To ascertain whether the Poisson distribution is a good fit, we simulate samples and check whether the mean and the variance obtained from the simulations are significantly different. If they are not, the Poisson distribution is a plausible fit, because a Poisson variable has its variance equal to its expected value, so the two sample quantities should be close when the sample is large enough. A variance well above the mean points instead towards an alternative such as the Negative Binomial.
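The mean-versus-variance check can be sketched as follows. The sampler uses Knuth's multiplication method, and the mean of 2 claims, the sample size and the seed are assumptions made for the example. The ratio printed at the end is only a rough diagnostic, not a formal significance test.

```python
import math
import random
import statistics


def poisson_sample(beta, rng):
    """Draw one Poisson(beta) value using Knuth's multiplication method."""
    threshold = math.exp(-beta)
    count, product = 0, rng.random()
    # Keep multiplying uniform draws until the product falls below e^(-beta).
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(0)  # fixed seed so the run is repeatable
beta = 2.0              # assumed mean number of claims

claims = [poisson_sample(beta, rng) for _ in range(20_000)]

sample_mean = statistics.mean(claims)
sample_variance = statistics.variance(claims)
dispersion_ratio = sample_variance / sample_mean  # close to 1 for Poisson-like data

print(f"mean={sample_mean:.3f}  variance={sample_variance:.3f}  ratio={dispersion_ratio:.3f}")
```

With real claim counts in place of the simulated `claims` list, a ratio far above 1 would suggest that the data are more spread out than a Poisson model allows.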

### Geometric Distribution

It is a discrete distribution, built on trials with a binary outcome, that is helpful in ascertaining the probability of when the first success occurs.

We assume that the probabilities of success and failure are well defined and do not change from trial to trial, which reflects the independence property.

Properties

• Each trial has a defined probability of FAILURE AND SUCCESS

• The probabilities are EXHAUSTIVE and INDEPENDENT

The geometric distribution comes in two variants:

• Type 1

• Type 2

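The required probability for each variant, with $p$ denoting the probability of success, is:

Type 1 geometric distribution

$$P(X = x) = (1-p)^{x}\,p, \quad x = 0, 1, 2, \dots$$

Type 2 geometric distribution

$$P(X = n) = (1-p)^{n-1}\,p, \quad n = 1, 2, 3, \dots$$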

Difference between the two variants

Type 1 geometric distribution is used to model the number of failures experienced before the first success, WHILE Type 2 geometric distribution is used to model the trial on which the first success occurs, that is, the first success arriving on the nth trial.
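The two variants differ only in what is counted: failures before the first success (Type 1) or the trial number of the first success (Type 2). A minimal sketch, where the function names and the success probability 0.25 are assumptions for illustration:

```python
def geometric_failures_pmf(failures, p):
    """Type 1: probability of exactly `failures` failures before the first success."""
    return (1 - p) ** failures * p


def geometric_trials_pmf(trial, p):
    """Type 2: probability that the first success occurs on trial number `trial`."""
    return (1 - p) ** (trial - 1) * p


p = 0.25  # assumed probability of success on each trial

# Two failures followed by a success is the same event as
# the first success arriving on the third trial.
type_1 = geometric_failures_pmf(2, p)
type_2 = geometric_trials_pmf(3, p)

print(f"Type 1, 2 failures: {type_1}   Type 2, trial 3: {type_2}")
```

Shifting the count by one converts between the variants, so either can be used as long as the convention is stated.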

### Negative Binomial Distribution

This distribution comes in handy when the objective is to model the number of trials required to observe a predetermined number of successes, again assuming that the probabilities of success and failure are well defined.

The assumptions underlying the distribution are the same as above:

1) The probabilities of the outcomes are exhaustive in nature

2) The trials are IDENTICAL and INDEPENDENT

Example

Take, for instance, the Indian Premier League. Suppose we wanted to work out the probability of the Chennai Super Kings winning their 4th game in their 5th match or, looking at the larger picture, the probability attached to each number of matches it could take to reach a defined number of wins. In such cases, the negative binomial distribution comes in handy.
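The cricket example can be worked through with the standard negative binomial formula: for the r-th success to fall on trial n, the last trial must be a success and the first n - 1 trials must contain exactly r - 1 successes. The win probability of 0.6 per match is an assumption made up for the illustration, as is treating matches as independent.

```python
from math import comb


def negative_binomial_pmf(trials, successes, p):
    """Probability that the `successes`-th success occurs exactly on trial `trials`."""
    if trials < successes:
        return 0.0
    earlier_arrangements = comb(trials - 1, successes - 1)  # successes - 1 wins in the first trials - 1 matches
    return earlier_arrangements * p ** successes * (1 - p) ** (trials - successes)


p_win = 0.6  # assumed probability of winning any single match

# Probability that the 4th win arrives in the 5th match.
fourth_win_in_fifth = negative_binomial_pmf(5, 4, p_win)

# Probabilities over all possible match counts should add up to 1.
total = sum(negative_binomial_pmf(n, 4, p_win) for n in range(4, 200))

print(f"P(4th win in 5th match) = {fourth_win_in_fifth:.5f}, total = {total:.6f}")
```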

Written by: Lakshay Guglani

Special thanks to Lakshay Guglani for being a guest writer on our website.