
johzu


Probability Distributions

Probability distributions are mathematical functions that describe how probability is distributed over the possible values of a random variable. They are used in statistics and probability to model random phenomena and to predict the likelihood of different events.

Types of Probability Distributions

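They are divided into two main categories based on the type of random variable: discrete distributions, for variables that take a countable set of values (described by a probability mass function), and continuous distributions, for variables that take values in an interval of real numbers (described by a probability density function).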

Key Characteristics



Discrete Variable Distributions


Binomial distribution

The binomial distribution is a discrete probability distribution that models the number of successes in a series of independent Bernoulli trials, where each trial has only two possible outcomes: success or failure. It is widely used in situations where an experiment is repeated under the same conditions a fixed number of times.

Characteristics of the Binomial Distribution

Probability Mass Function (PMF)

The probability function of the binomial distribution is given by:

$$P\left( X = k \right) = {n \choose k} p^{k} \left(1 - p\right)^{n - k}$$

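where \(n\) is the number of trials, \(k\) is the number of successes \((k = 0, 1, \ldots, n)\), \(p\) is the probability of success in each trial, and \({n \choose k} = \frac{n!}{k!\,(n - k)!}\) is the binomial coefficient.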

Mean, Variance, and Standard Deviation

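For a random variable \(X\) that follows a binomial distribution \(B(n,p)\), the mean is \(E[X] = np\), the variance is \(\operatorname{Var}(X) = np(1 - p)\), and the standard deviation is \(\sigma = \sqrt{np(1 - p)}\).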

Example of Application

Suppose a factory produces light bulbs, and \( 5\ \% \) of them are defective. If 10 bulbs are randomly selected, the probability of finding exactly 2 defective bulbs is calculated using the binomial formula:

$$P\left( X = 2 \right) = {10 \choose 2} \left( 0.05 \right)^{2} \left( 0.95 \right)^{8}$$

This gives approximately \(0.0746\), meaning a \(7.46\ \%\) probability.

This type of problem is common in quality control, medical studies, surveys, etc.
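You can check the light-bulb calculation by evaluating the PMF directly. The following is a minimal Python sketch that uses only the standard library. The function name `binomial_pmf` is an illustrative choice and is not part of any library.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    if not 0 <= k <= n:
        return 0.0  # outside the support {0, 1, ..., n}
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Light-bulb example: n = 10 bulbs, p = 0.05 defect rate, exactly k = 2 defective
prob = binomial_pmf(2, 10, 0.05)
print(f"{prob:.4f}")  # 0.0746

# The PMF summed over the whole support should equal 1
total = sum(binomial_pmf(k, 10, 0.05) for k in range(11))
print(f"{total:.6f}")  # 1.000000
```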

characteristic expression
notation $$B(n,p)$$
parameters $$n \in \left\lbrace 0,1,2,\ldots \right\rbrace$$ $$p \in \left[0,1\right]$$ $$q = 1 - p$$
support $$k \in \left\lbrace 0,1,\ldots,n \right\rbrace$$
PMF $${n \choose k} p^{k} q^{n - k}$$
CDF $$I_{q} \left( n - \lfloor k \rfloor, 1 + \lfloor k \rfloor \right)$$
mean $$np$$
median $$\lfloor np \rfloor, \lceil np \rceil$$
mode $$\lfloor (n + 1)p \rfloor, \lceil (n + 1)p \rceil - 1$$
variance $$npq = np(1 - p)$$
skewness $$\frac{q - p}{\sqrt{npq}}$$
excess kurtosis $$\frac{1 - 6pq}{npq}$$
entropy $$\frac{1}{2} \log_{2} \left( 2 \pi enpq \right) + O\left( \frac{1}{n} \right)$$
MGF $$\left( q + pe^{t} \right)^{n}$$
CF $$\left( q + pe^{it} \right)^{n}$$
PGF $$G(z) = \left[ q + pz \right]^{n}$$
Fisher information $$g_{n} (p) = \frac{n}{pq}$$
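The closed forms for the mean, variance, skewness and excess kurtosis in this table can be checked numerically by computing the moments directly from the PMF. This is a minimal sketch that assumes the illustrative values \(n = 20\) and \(p = 0.3\), which do not come from the text. The helper name `binomial_moments` is hypothetical.

```python
from math import comb, sqrt

def binomial_moments(n: int, p: float) -> tuple[float, float, float, float]:
    """Mean, variance, skewness and excess kurtosis of B(n, p), summed over the PMF."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    # Central moments of order 2, 3 and 4
    m2 = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    m3 = sum((k - mean) ** 3 * pk for k, pk in enumerate(pmf))
    m4 = sum((k - mean) ** 4 * pk for k, pk in enumerate(pmf))
    return mean, m2, m3 / m2**1.5, m4 / m2**2 - 3  # excess kurtosis subtracts 3

n, p = 20, 0.3  # illustrative values
q = 1 - p
mean, var, skew, ex_kurt = binomial_moments(n, p)

# Each pair should agree up to floating-point rounding
print(mean, n * p)
print(var, n * p * q)
print(skew, (q - p) / sqrt(n * p * q))
print(ex_kurt, (1 - 6 * p * q) / (n * p * q))
```

The excess kurtosis is the fourth standardized moment minus 3, which is the convention the table's \(\frac{1 - 6pq}{npq}\) uses.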

Pascal distribution (soon)


Poisson distribution

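The Poisson distribution is a discrete probability distribution that models the number of events occurring in a given time or space interval, provided that the events occur independently of one another, at a constant average rate, and never two at exactly the same instant.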

Probability Mass Function (PMF)

The probability of exactly \(k\) events occurring in a given interval is given by the Poisson formula:

$$P\left( X = k \right) = \frac{e^{-\lambda}\lambda^{k}}{k!}$$

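where \(\lambda > 0\) is the average number of events in the interval, \(k = 0, 1, 2, \ldots\) is the number of events, and \(e \approx 2.71828\) is Euler's number.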

Mean, Variance, and Standard Deviation

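For a random variable \(X\) that follows a Poisson distribution with parameter \(\lambda\), the mean and the variance are both equal to \(\lambda\), that is, \(E[X] = \operatorname{Var}(X) = \lambda\), so the standard deviation is \(\sqrt{\lambda}\).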

Example of Application

Suppose a company receives an average of 3 calls per hour at its customer service center. What is the probability of receiving exactly 5 calls in an hour?

We use the Poisson formula with \(\lambda = 3\) and \(k = 5\):

$$P\left( X = 5 \right) = \frac{e^{-3}\, 3^{5}}{5!} = \frac{243\, e^{-3}}{120}$$

This calculation gives approximately \(0.1008\), meaning a \(10.08\ \%\) probability.
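Here is the same calculation as a minimal standard-library Python sketch. The helper `poisson_pmf` is a hypothetical name. The last line also evaluates the CDF entry from the characteristics table, the partial sum \(e^{-\lambda}\sum_{j=0}^{k}\lambda^{j}/j!\), for \(k = 5\).

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = e^(-lam) * lam^k / k! for X ~ Pois(lam)."""
    return exp(-lam) * lam**k / factorial(k)

# Call-center example: lam = 3 calls per hour on average, exactly k = 5 calls
print(f"{poisson_pmf(5, 3):.4f}")  # 0.1008

# P(X <= 5): the CDF as a partial sum of the PMF
cdf_5 = sum(poisson_pmf(j, 3) for j in range(6))
print(f"{cdf_5:.4f}")  # 0.9161
```

Computing \(\lambda^{k}/k!\) directly is fine for small \(k\). For large \(k\), a log-space version avoids overflow.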

Relationship with Other Distributions

characteristic expression
notation $$\mathrm{Pois}\left( \lambda \right)$$
parameters $$\lambda \in \left(0,\infty\right)$$
support $$k \in \mathbb{N}_{0}$$
PMF $$\frac{\lambda^{k}e^{-\lambda}}{k!}$$
CDF $$\frac{\Gamma \left( \lfloor k + 1 \rfloor, \lambda \right)}{\lfloor k \rfloor !},$$ $$ e^{-\lambda}\sum_{j=0}^{\lfloor k \rfloor}\frac{\lambda^{j}}{j!},$$ $$Q \left( \lfloor k + 1 \rfloor, \lambda \right)$$
mean $$\lambda$$
median $$\approx \left\lfloor \lambda + \frac{1}{3} - \frac{1}{50 \lambda} \right\rfloor$$
mode $$\lceil \lambda \rceil - 1,$$ $$\lfloor \lambda \rfloor$$
variance $$\lambda$$
skewness $$\frac{1}{\sqrt{\lambda}}$$
excess kurtosis $$\frac{1}{\lambda}$$
entropy $$\lambda \left[ 1 - \log( \lambda ) \right] + e^{-\lambda} \sum_{k = 0}^{\infty} \frac{\lambda^{k} \log\left( k! \right)}{k!}$$ $$\approx \frac{1}{2} \log (2 \pi e \lambda) - \frac{1}{12 \lambda} - \frac{1}{24 \lambda^{2}} - \frac{19}{360 \lambda^{3}} + \mathcal{O}\left( \frac{1}{\lambda^{4}} \right)$$
MGF $$\exp\left[ \lambda \left( e^{t} - 1 \right) \right]$$
CF $$\exp\left[ \lambda \left( e^{it} - 1 \right) \right]$$
PGF $$\exp\left[ \lambda \left( z - 1 \right) \right]$$
Fisher information $$\frac{1}{\lambda}$$
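Several entries of this table can be checked numerically by truncating the infinite support at a large \(k\). The sketch below assumes a truncation point `kmax` well above \(\lambda\); this choice is ours, not from the text. It computes the mean and the variance. It also computes the entropy in nats in two ways, from the definition \(-\sum p \log p\) and from the closed form in the table, and compares both with the asymptotic expansion. The function names are hypothetical.

```python
from math import e, exp, lgamma, log, pi

def poisson_pmf_list(lam: float, kmax: int) -> list[float]:
    """P(X = k) for k = 0..kmax, computed in log space so large k does not overflow."""
    return [exp(-lam + k * log(lam) - lgamma(k + 1)) for k in range(kmax + 1)]

def poisson_check(lam: float, kmax: int) -> tuple[float, float, float, float, float]:
    pmf = poisson_pmf_list(lam, kmax)  # support truncated at kmax (tail assumed negligible)
    mean = sum(k * pk for k, pk in enumerate(pmf))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    # Entropy in nats straight from the definition -sum p log p
    h_direct = -sum(pk * log(pk) for pk in pmf if pk > 0)
    # Closed form from the table: lam*(1 - log lam) + e^(-lam) * sum lam^k log(k!) / k!
    h_table = lam * (1 - log(lam)) + sum(pk * lgamma(k + 1) for k, pk in enumerate(pmf))
    # Asymptotic expansion from the table, dropping the O(1/lam^4) term
    h_approx = (0.5 * log(2 * pi * e * lam) - 1 / (12 * lam)
                - 1 / (24 * lam**2) - 19 / (360 * lam**3))
    return mean, var, h_direct, h_table, h_approx

for lam, kmax in ((3.0, 100), (50.0, 250)):
    print(lam, poisson_check(lam, kmax))
```

The asymptotic expansion is only expected to be accurate for large \(\lambda\). At \(\lambda = 3\), it may differ noticeably from the exact value.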

Hypergeometric distribution

The hypergeometric distribution is a discrete probability distribution that models the number of successes in a sample drawn without replacement from a finite population containing successes and failures. It is used when each selection affects the probabilities of the subsequent ones.

Characteristics of the Hypergeometric Distribution

Probability Mass Function (PMF)

The probability of obtaining exactly \(k\) successes in a sample of size \(n\) is:

$$P \left( X = k \right) = \frac{{K \choose k}{N - K \choose n - k}}{{N \choose n}}$$

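where \(N\) is the population size, \(K\) is the number of successes in the population, \(n\) is the sample size (number of draws), and \(k\) is the number of observed successes in the sample.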

Properties

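For a random variable \(X\) following a hypergeometric distribution, the mean is \(E[X] = n \frac{K}{N}\) and the variance is \(\operatorname{Var}(X) = n \frac{K}{N} \frac{N - K}{N} \frac{N - n}{N - 1}\).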

Example Application

Suppose a batch of 20 products contains 5 defective ones. If 4 products are randomly selected without replacement, what is the probability that exactly 2 are defective?

$$P \left( X = 2 \right) = \frac{{5 \choose 2}{15 \choose 2}}{{20 \choose 4}}$$

This gives \(\frac{10 \cdot 105}{4845} = \frac{1050}{4845} \approx 0.217\), meaning there is a \(21.7\ \%\) probability.
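The following is a minimal Python sketch of the hypergeometric PMF, used to reproduce the batch example. `hypergeom_pmf` is an illustrative name, and its parameters follow the \(N, K, n, k\) notation of the formula.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) = C(K, k) * C(N - K, n - k) / C(N, n)."""
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so values of k above the support get probability 0 automatically
    if k < 0 or n - k < 0:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Batch example: N = 20 products, K = 5 defective, sample n = 4, exactly k = 2 defective
print(f"{hypergeom_pmf(2, N=20, K=5, n=4):.4f}")  # 0.2167 (= 1050 / 4845)
```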

Difference from the Binomial Distribution

characteristic expression
parameters $$N \in \left\lbrace 0,1,2,\ldots \right\rbrace$$ $$K \in \left\lbrace 0,1,2, \ldots, N \right\rbrace$$ $$n \in \left\lbrace0,1,2, \ldots, N \right\rbrace$$
support $$k \in \left\lbrace \max \left( 0, n + K - N \right), \ldots, \min \left( n,K \right) \right\rbrace$$
PMF $$\frac{{K \choose k} {N - K \choose n - k}}{{N \choose n}}$$
CDF $$1 - \frac{{n \choose k + 1}{N - n \choose K - k - 1}}{{N \choose K}} {_3F_{2}} \left[{1, k + 1 - K, k + 1 - n \atop k + 2, N + k + 2 - K - n} ; 1\right]$$
mean $$n \frac{K}{N}$$
mode $$\left\lceil \frac{(n + 1)(K + 1)}{N + 2} \right\rceil - 1, \left\lfloor \frac{(n + 1)(K + 1)}{N + 2} \right\rfloor$$
variance $$n \frac{K}{N} \frac{N - K}{N} \frac{N - n}{N - 1}$$
skewness $$\frac{\left(N - 2K\right)\left(N - 1\right)^{\frac{1}{2}}\left(N - 2n\right)}{\left[ n K \left( N - K \right) \left( N - n \right) \right]^{\frac{1}{2}}\left( N - 2 \right)}$$
kurtosis $$\begin{split}&\frac{1}{nK(N - K)(N - n)(N - 2)(N - 3)} \cdot \\ &\left[ (N - 1)N^{2} \left( N (N + 1) - 6K (N - K) - 6n (N - n) \right) \right. \\ &\left.+ 6nK (N - K)(N - n)(5N - 6) \right] \end{split}$$
MGF $$\frac{{N - K \choose n} {_2F_1}\left( -n,-K;N - K - n + 1; e^{t} \right)}{{N \choose n}}$$
CF $$\frac{{N - K \choose n} {_2F_1}\left( -n,-K;N - K - n + 1; e^{it} \right)}{{N \choose n}}$$
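As a sanity check on the mean and variance entries, the sketch below sums directly over the support using the batch example's values \(N = 20\), \(K = 5\), \(n = 4\). For comparison, it also prints the binomial variance \(np(1 - p)\) with \(p = K/N\). The two variances differ by the factor \(\frac{N - n}{N - 1}\), which reflects sampling without replacement. The helper name is hypothetical.

```python
from math import comb

def hypergeom_mean_var(N: int, K: int, n: int) -> tuple[float, float]:
    """Mean and variance of the hypergeometric distribution, summed over its support."""
    k_min, k_max = max(0, n + K - N), min(n, K)
    pmf = {k: comb(K, k) * comb(N - K, n - k) / comb(N, n) for k in range(k_min, k_max + 1)}
    mean = sum(k * pk for k, pk in pmf.items())
    var = sum((k - mean) ** 2 * pk for k, pk in pmf.items())
    return mean, var

N, K, n = 20, 5, 4
mean, var = hypergeom_mean_var(N, K, n)
p = K / N

print(mean, n * p)                                # table: n K / N
print(var, n * p * (1 - p) * (N - n) / (N - 1))   # table: n (K/N) ((N-K)/N) ((N-n)/(N-1))
print(n * p * (1 - p))                            # binomial variance with p = K/N, for comparison
```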

Continuous Variable Distributions


Normal (Gaussian) distribution

The normal distribution, also known as the Gaussian distribution or bell curve, is one of the most important probability distributions in statistics and data science. It is used to model natural phenomena and random processes that tend to cluster around a mean value.

Characteristics of the Normal Distribution

Probability Density Function (PDF)

The normal distribution is defined by the function:

$$f\left( x \right) = \frac{1}{\sigma \sqrt{2 \pi}} e^{- \frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^{2}}$$

where \(\mu\) is the mean, \(\sigma > 0\) is the standard deviation, and \(x\) is any real number.

Standard Normal Distribution

When the mean is \(\mu = 0\) and the standard deviation is \(\sigma = 1\), the distribution is called the standard normal distribution and is denoted as \(N(0,1)\). Any normal distribution \(N(\mu,\sigma^{2})\) can be converted into a standard normal distribution using the standardized variable:

$$Z = \frac{X - \mu}{\sigma}$$
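The sketch below illustrates standardization with Python's standard-library `statistics.NormalDist`. The values \(\mu = 100\), \(\sigma = 15\) and \(x = 130\) are illustrative assumptions. Computing \(P(X \le x)\) directly gives the same probability as computing \(P(Z \le z)\) after standardizing.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0  # illustrative parameters, not taken from the text
x = 130.0

z = (x - mu) / sigma                     # Z = (X - mu) / sigma
# NormalDist takes the standard deviation sigma, not the variance sigma^2
p_direct = NormalDist(mu, sigma).cdf(x)  # P(X <= x) under N(mu, sigma^2)
p_standard = NormalDist(0, 1).cdf(z)     # P(Z <= z) under N(0, 1)

print(z)                     # 2.0
print(p_direct, p_standard)  # the same probability, about 0.977
```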

Empirical Rule (68-95-99.7 Rule)

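In a normal distribution, approximately \(68\ \%\) of the values lie within one standard deviation of the mean \((\mu \pm \sigma)\), about \(95\ \%\) within two standard deviations \((\mu \pm 2\sigma)\), and about \(99.7\ \%\) within three standard deviations \((\mu \pm 3\sigma)\).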
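The three percentages in the rule's name follow from standardizing: \(P(|X - \mu| < k\sigma) = P(|Z| < k) = 2\Phi(k) - 1 = \operatorname{erf}(k/\sqrt{2})\). Here is a minimal Python check. The function name is our own.

```python
from math import erf, sqrt

def prob_within_k_sigma(k: float) -> float:
    """P(mu - k*sigma < X < mu + k*sigma) for X ~ N(mu, sigma^2).

    Standardizing gives P(|Z| < k) = 2*Phi(k) - 1 = erf(k / sqrt(2)),
    so the result does not depend on mu or sigma.
    """
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, f"{prob_within_k_sigma(k):.4f}")
# 1 0.6827
# 2 0.9545
# 3 0.9973
```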

Data table

characteristic expression
notation $$\mathcal{N}\left( \mu,\sigma^{2} \right)$$
parameters $$\mu \in \mathbb{R}$$ $$\sigma^{2} \in \mathbb{R}_{>0}$$
support $$x \in \mathbb{R}$$
PDF $$\frac{1}{\sigma \sqrt{2 \pi}} e^{- \frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^{2}}$$
CDF $$\Phi \left( \frac{x - \mu}{\sigma} \right) = \frac{1}{2} \left[ 1 + \operatorname{erf} \left( \frac{x - \mu}{\sigma \sqrt{2}} \right) \right]$$
quantile $$\mu + \sigma \sqrt{2} \operatorname{erf}^{-1} \left( 2p - 1 \right)$$
mean $$\mu$$
median $$\mu$$
mode $$\mu$$
variance $$\sigma^{2}$$
MAD $$\sigma \sqrt{2} \operatorname{erf}^{-1} \left( 1/2 \right)$$
AAD $$\sigma \sqrt{2/\pi}$$
skewness $$0$$
excess kurtosis $$0$$
entropy $$\frac{1}{2} \log \left( 2 \pi e \sigma^{2} \right)$$
MGF $$\exp\left( \mu t + \sigma^{2} t^{2}/2 \right)$$
CF $$\exp\left( i \mu t - \sigma^{2} t^{2}/2 \right)$$
Fisher information $$\mathcal{I}\left( \mu, \sigma \right) = \begin{pmatrix} 1/\sigma^{2} & 0 \\ 0 & 2/\sigma^{2} \end{pmatrix}$$ $$\mathcal{I}\left( \mu, \sigma^{2} \right) = \begin{pmatrix} 1/\sigma^{2} & 0 \\ 0 & 1/(2\sigma^{4}) \end{pmatrix}$$
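The quantile, MAD and AAD entries can be checked numerically with `statistics.NormalDist`. Because \(\mu + \sigma\sqrt{2}\operatorname{erf}^{-1}(2p - 1) = \mu + \sigma\,\Phi^{-1}(p)\), the MAD entry corresponds to \(p = 3/4\). The sketch assumes the illustrative values \(\mu = 0\) and \(\sigma = 2\). It uses a simple midpoint rule for the AAD integral, which makes it a check rather than an efficient method.

```python
from math import pi, sqrt
from statistics import NormalDist

mu, sigma = 0.0, 2.0          # illustrative parameters
dist = NormalDist(mu, sigma)  # NormalDist takes the standard deviation, not the variance
std_normal = NormalDist()     # N(0, 1)

# Quantile entry: mu + sigma*sqrt(2)*erfinv(2p - 1) equals mu + sigma * Phi^{-1}(p).
# The MAD entry sigma*sqrt(2)*erfinv(1/2) is the case 2p - 1 = 1/2, i.e. p = 3/4.
mad = sigma * std_normal.inv_cdf(0.75)

# Half of the probability mass should lie within mu +/- MAD
half_mass = dist.cdf(mu + mad) - dist.cdf(mu - mad)

# AAD (mean absolute deviation about the mean), closed form from the table
aad = sigma * sqrt(2 / pi)

# Numerical check: E|X - mu| by the midpoint rule over mu +/- 10 sigma
steps = 200_000
h = 20 * sigma / steps
grid = (mu - 10 * sigma + (i + 0.5) * h for i in range(steps))
aad_numeric = sum(abs(x - mu) * dist.pdf(x) * h for x in grid)

print(mad, half_mass)    # half_mass should be very close to 0.5
print(aad, aad_numeric)  # the two values should agree to several decimals
```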

Student's t-distribution (soon)


Chi-squared distribution (soon)


F-distribution (soon)


Exponential distribution (soon)


Cauchy distribution (soon)


Weibull distribution (soon)


Gamma distribution (soon)


Beta distribution (soon)


Log-Normal distribution (soon)


Logistic distribution (soon)


See also

Probability and statistics formula sheet