Probability Distributions
Probability distributions are mathematical functions that describe how the values of a random variable are distributed over a set of possible outcomes. They are used in statistics and probability theory to model random phenomena and to quantify the likelihood of different events.
Types of Probability Distributions
They are divided into two main categories based on the type of random variable:
- Discrete Variable Distributions: These apply when the random variable takes a finite or countable set of values.
- Continuous Variable Distributions: These apply when the random variable can take any value within a continuous range.
Key Characteristics
- Probability mass function (PMF) for discrete variables or probability density function (PDF) for continuous variables: Determines the probability of each possible value.
- Expected value (mean): The long-run average value of the random variable.
- Variance and standard deviation: Measure the spread of values around the mean.
- Cumulative distribution function (CDF): Indicates the probability that the random variable takes a value less than or equal to a given number.
Interactive chart
Instructions
To switch to the cumulative distribution function graph, click the blue line button; click it again to return to the probability density function. Next, select a tab to choose the desired probability distribution and enter the required parameters, such as \(\mu\) and \(\sigma\). Finally, select the area under the probability density curve; the interval bounds can be entered below the chart.
Discrete Variable Distributions
Binomial distribution
The binomial distribution is a discrete probability distribution that models the number of successes in a series of independent Bernoulli trials, where each trial has only two possible outcomes: success or failure. It is widely used in situations where an experiment is repeated under the same conditions a fixed number of times.
Characteristics of the Binomial Distribution
- Fixed number of trials \((n)\): The number of times the experiment is conducted is constant.
- Independent trials: The outcome of one trial does not affect the others.
- Two possible outcomes in each trial: Each experiment results in either success \((1)\) or failure \((0)\).
- Constant probability of success \((p)\): The probability of success remains the same in each trial.
- Discrete random variable: Represents the number of successes in \(n\) trials.
Probability Mass Function (PMF)
The probability function of the binomial distribution is given by:
$$P\left( X = k \right) = {n \choose k} p^{k} \left(1 - p\right)^{n - k}$$
where:
- \(X\) is the number of successes in \(n\) trials,
- \({n \choose k} = \frac{n!}{k!\left( n - k \right)!}\) is the binomial coefficient, which counts the number of ways to obtain \(k\) successes in \(n\) attempts,
- \(p\) is the probability of success in a single trial,
- \(\left( 1 - p \right)\) is the probability of failure,
- \(k\) is the number of successes in the trials \(\left( k = 0,1,2,\ldots,n \right)\).
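The PMF above can be written directly as a short Python sketch (standard library only; the function name `binomial_pmf` and the coin-toss example are illustrative, not part of the text):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# A fair coin flipped 4 times: P(exactly 2 heads) = 6 * 0.5^2 * 0.5^2 = 0.375
print(binomial_pmf(2, 4, 0.5))  # 0.375
```

Because the PMF covers all possible outcomes, summing it over \(k = 0, \ldots, n\) gives 1.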
Mean, Variance, and Standard Deviation
For a random variable \(X\) that follows a binomial distribution \(B(n,p)\), the following properties hold:
- Expected value (mean): $$E\left(X\right) = np$$
- Variance: $$\operatorname{Var}\left(X\right) = np\left(1 - p\right)$$
- Standard deviation: $$\sigma = \sqrt{np \left( 1 - p \right)}$$
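As a sanity check, the mean and variance can be recovered numerically by summing over the PMF; this sketch assumes the hypothetical parameters \(n = 10\) and \(p = 0.3\):

```python
from math import comb

n, p = 10, 0.3  # hypothetical parameters for illustration
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# E(X) = sum of k * P(X = k); should equal n*p = 3.0
mean = sum(k * pmf[k] for k in range(n + 1))
# Var(X) = sum of (k - mean)^2 * P(X = k); should equal n*p*(1 - p) = 2.1
var = sum((k - mean) ** 2 * pmf[k] for k in range(n + 1))
print(mean, var)
```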
Example of Application
Suppose a factory produces light bulbs, and \( 5\ \% \) of them are defective. If 10 bulbs are randomly selected, the probability of finding exactly 2 defective bulbs is calculated using the binomial formula:
$$P\left( X = 2 \right) = {10 \choose 2} \left( 0.05 \right)^{2} \left( 0.95 \right)^{8}$$
This evaluates to approximately \(0.0746\), i.e., about a \(7.5\ \%\) chance of finding exactly 2 defective bulbs. This type of problem is common in quality control, medical studies, surveys, etc.
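The defective-bulb calculation can be checked with a short Python sketch (standard library only):

```python
from math import comb

# Defective-bulb example: n = 10 bulbs, p = 0.05 defect rate, k = 2 defective
n, p, k = 10, 0.05, 2
prob = comb(n, k) * p**k * (1 - p) ** (n - k)
print(round(prob, 4))  # ≈ 0.0746
```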
characteristic | expression |
---|---|
notation | $$B(n,p)$$ |
parameters | $$n \in \left\lbrace 0,1,2,\ldots \right\rbrace$$ $$p \in \left[0,1\right]$$ $$q = 1 - p$$ |
support | $$k \in \left\lbrace 0,1,\ldots,n \right\rbrace$$ |
PMF | $${n \choose k} p^{k} q^{n - k}$$ |
CDF | $$I_{q} \left( n - \lfloor k \rfloor, 1 + \lfloor k \rfloor \right)$$ |
mean | $$np$$ |
median | $$\lfloor np \rfloor, \lceil np \rceil$$ |
mode | $$\lfloor (n + 1)p \rfloor, \lceil (n + 1)p \rceil - 1$$ |
variance | $$npq = np(1 - p)$$ |
skewness | $$\frac{q - p}{\sqrt{npq}}$$ |
excess kurtosis | $$\frac{1 - 6pq}{npq}$$ |
entropy | $$\frac{1}{2} \log_{2} \left( 2 \pi e \, npq \right) + O\left( \frac{1}{n} \right)$$ |
MGF | $$\left( q + pe^{t} \right)^{n}$$ |
CF | $$\left( q + pe^{it} \right)^{n}$$ |
PGF | $$G(z) = \left[ q + pz \right]^{n}$$ |
Fisher information | $$g_{n} (p) = \frac{n}{pq}$$ |
Pascal distribution (soon)
Poisson distribution
The Poisson distribution is a discrete probability distribution that models the number of events occurring in a given time or space interval, under the following conditions:
- Events occur independently: The occurrence of one event does not affect the probability of another occurring.
- The average rate of occurrence is constant: In a fixed time or space interval, the expected number of events remains the same.
- Two events cannot occur at exactly the same time: Events are assumed to be discrete and non-simultaneous.
Probability Mass Function (PMF)
The probability of exactly \(k\) events occurring in a given interval is given by the Poisson formula:
$$P\left( X = k \right) = \frac{e^{-\lambda}\lambda^{k}}{k!}$$
where:
- \(X\) is the random variable representing the number of events in an interval,
- \(\lambda\) is the average rate of occurrence of events in the interval,
- \(k\) is the number of events \(\left( k = 0,1,2,3,\ldots \right)\),
- \(e\) is the base of the natural logarithm \(\approx 2.718\).
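The Poisson PMF can be sketched in a few lines of Python (standard library only; the function name `poisson_pmf` is illustrative):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Pois(lam): e^(-lam) * lam^k / k!"""
    return exp(-lam) * lam**k / factorial(k)

# P(X = 0) reduces to e^(-lam); with lam = 2 this is about 0.1353
print(poisson_pmf(0, 2))
```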
Mean, Variance, and Standard Deviation
For a random variable \(X\) that follows a Poisson distribution with parameter \(\lambda\), the following properties hold:
- Expected value (mean): $$E\left( X \right) = \lambda$$
- Variance: $$\operatorname{Var}(X) = \lambda$$
- Standard deviation: $$\sigma = \sqrt{\lambda}$$
Example of Application
Suppose a company receives an average of 3 calls per hour at its customer service center. What is the probability of receiving exactly 5 calls in an hour?
We use the Poisson formula with \(\lambda = 3\) and \(k = 5\):
$$P\left( X = 5 \right) = \frac{e^{-3} 3^{5}}{5!} = \frac{e^{-3} 243}{120}$$
This calculation gives approximately \(0.1008\), meaning a \(10.08\ \%\) probability.
Relationship with Other Distributions
- When \(n\) is large and \(p\) is small in a binomial distribution \(B(n,p)\), the binomial can be approximated by a Poisson distribution with \(\lambda = np\).
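The binomial-to-Poisson approximation can be illustrated with a small sketch, using the hypothetical values \(n = 1000\) and \(p = 0.003\) (so \(\lambda = np = 3\)):

```python
from math import comb, exp, factorial

n, p = 1000, 0.003  # hypothetical: large n, small p
lam = n * p         # Poisson rate lambda = np = 3

# Compare the two PMFs for the first few values of k
binom = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(5)]
pois = [exp(-lam) * lam**k / factorial(k) for k in range(5)]
for k in range(5):
    print(k, round(binom[k], 4), round(pois[k], 4))
```

The two columns agree to roughly two decimal places, which is the point of the approximation.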
- The Poisson distribution is also used to model arrival processes in queueing theory, where it is related to the exponential distribution.
characteristic | expression |
---|---|
notation | $$\mathrm{Pois}\left( \lambda \right)$$ |
parameters | $$\lambda \in \left(0,\infty\right)$$ |
support | $$k \in \mathbb{N}_{0}$$ |
PMF | $$\frac{\lambda^{k}e^{-\lambda}}{k!}$$ |
CDF | $$\frac{\Gamma \left( \lfloor k + 1 \rfloor, \lambda \right)}{\lfloor k \rfloor !},$$ $$ e^{-\lambda}\sum_{j=0}^{\lfloor k \rfloor}\frac{\lambda^{j}}{j!},$$ $$Q \left( \lfloor k + 1 \rfloor, \lambda \right)$$ |
mean | $$\lambda$$ |
median | $$\approx \left\lfloor \lambda + \frac{1}{3} - \frac{1}{50 \lambda} \right\rfloor$$ |
mode | $$\lceil \lambda \rceil - 1,$$ $$\lfloor \lambda \rfloor$$ |
variance | $$\lambda$$ |
skewness | $$\frac{1}{\sqrt{\lambda}}$$ |
excess kurtosis | $$\frac{1}{\lambda}$$ |
entropy | $$\lambda \left[ 1 - \log( \lambda ) \right] + e^{-\lambda} \sum_{k = 0}^{\infty} \frac{\lambda^{k} \log\left( k! \right)}{k!}$$ $$\approx \frac{1}{2} \log (2 \pi e \lambda) - \frac{1}{12 \lambda} - \frac{1}{24 \lambda^{2}} - \frac{19}{360 \lambda^{3}} + \mathcal{O}\left( \frac{1}{\lambda^{4}} \right)$$ |
MGF | $$\exp\left[ \lambda \left( e^{t} - 1 \right) \right]$$ |
CF | $$\exp\left[ \lambda \left( e^{it} - 1 \right) \right]$$ |
PGF | $$\exp\left[ \lambda \left( z - 1 \right) \right]$$ |
Fisher information | $$\frac{1}{\lambda}$$ |
Hypergeometric distribution
The hypergeometric distribution is a discrete probability distribution that models the number of successes in a sample drawn without replacement from a finite population containing successes and failures. It is used when each selection affects the probabilities of the subsequent ones.
Characteristics of the Hypergeometric Distribution
- Finite population: The total number of elements \(N\) is known.
- Two categories: The population contains \(K\) successes and \(N - K\) failures.
- Sampling without replacement: The selection of one element affects the probabilities of subsequent selections.
- Fixed sample size: \(n\) elements are drawn from the population.
- Discrete variable: Represents the number of successes \(X\) in the sample.
Probability Mass Function (PMF)
The probability of obtaining exactly \(k\) successes in a sample of size \(n\) is:
$$P \left( X = k \right) = \frac{{K \choose k}{N - K \choose n - k}}{{N \choose n}}$$
where:
- \(X\) is the number of successes in the sample,
- \(N\) is the total population size,
- \(K\) is the number of successes in the population,
- \(n\) is the sample size,
- \(k\) is the number of successes in the sample,
- \({a \choose b}\) is the binomial coefficient.
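The hypergeometric PMF translates directly into Python (standard library only; the function name `hypergeom_pmf` and the card-deck example are illustrative):

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): C(K, k) * C(N - K, n - k) / C(N, n)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Drawing 2 cards from a 52-card deck: P(both are aces)
print(hypergeom_pmf(2, 52, 4, 2))  # C(4,2)/C(52,2) = 6/1326 ≈ 0.0045
```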
Properties
For a random variable \(X\) following a hypergeometric distribution:
- Mean: $$E \left( X \right) = n \frac{K}{N}$$
- Variance: $$\operatorname{Var}\left( X \right) = n \frac{K}{N} \frac{N - K}{N} \frac{N - n}{N - 1}$$
- Standard deviation: $$\sigma = \sqrt{\operatorname{Var}\left(X\right)}$$
Example Application
Suppose a batch of 20 products contains 5 defective ones. If 4 products are randomly selected without replacement, what is the probability that exactly 2 are defective?
- \(N = 20\), \(K = 5\), \(n = 4\), \(k = 2\)
$$P \left( X = 2 \right) = \frac{{5 \choose 2}{15 \choose 2}}{{20 \choose 4}}$$
This gives approximately \(0.2167\), meaning there is a \(21.7\ \%\) probability.
Difference from the Binomial Distribution
- Hypergeometric: Used for sampling without replacement.
- Binomial: Used when sampling with replacement or when the population is very large.
- For large populations \(( N \gg n )\), the hypergeometric distribution can be approximated by a binomial distribution with parameter \(p = K/N\).
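The approximation for large populations can be demonstrated with a sketch, assuming the hypothetical values \(N = 10000\), \(K = 2000\), \(n = 10\) (so \(p = K/N = 0.2\)):

```python
from math import comb

N, K, n = 10000, 2000, 10  # hypothetical large population
p = K / N                  # binomial approximation parameter, p = 0.2

# Compare hypergeometric and binomial PMFs over the whole sample range
hyper = [comb(K, k) * comb(N - K, n - k) / comb(N, n) for k in range(n + 1)]
binom = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
for k in range(n + 1):
    print(k, round(hyper[k], 4), round(binom[k], 4))
```

Since \(N \gg n\), removing a few elements barely changes the success proportion, so the two columns nearly coincide.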
characteristic | expression |
---|---|
parameters | $$N \in \left\lbrace 0,1,2,\ldots \right\rbrace$$ $$K \in \left\lbrace 0,1,2, \ldots, N \right\rbrace$$ $$n \in \left\lbrace0,1,2, \ldots, N \right\rbrace$$ |
support | $$k \in \left\lbrace \max \left( 0, n + K - N \right), \ldots, \min \left( n,K \right) \right\rbrace$$ |
PMF | $$\frac{{K \choose k} {N - K \choose n - k}}{{N \choose n}}$$ |
CDF | $$1 - \frac{{n \choose k + 1}{N - n \choose K - k - 1}}{{N \choose K}} {_3F_{2}} \left[{1, k + 1 - K, k + 1 - n \atop k + 2, N + k + 2 - K - n} ; 1\right]$$ |
mean | $$n \frac{K}{N}$$ |
mode | $$\left\lceil \frac{(n + 1)(K + 1)}{N + 2} \right\rceil - 1, \left\lfloor \frac{(n + 1)(K + 1)}{N + 2} \right\rfloor$$ |
variance | $$n \frac{K}{N} \frac{N - K}{N} \frac{N - n}{N - 1}$$ |
skewness | $$\frac{\left(N - 2K\right)\left(N - 1\right)^{\frac{1}{2}}\left(N - 2n\right)}{\left[ n K \left( N - K \right) \left( N - n \right) \right]^{\frac{1}{2}}\left( N - 2 \right)}$$ |
kurtosis | $$\begin{split}&\frac{1}{nK(N - K)(N - n)(N - 2)(N - 3)} \cdot \\ &\left[ (N - 1)N^{2} \left( N (N + 1) - 6K (N - K) - 6n (N - n) \right) \right. \\ &\left.+ 6nK (N - K)(N - n)(5N - 6) \right] \end{split}$$ |
MGF | $$\frac{{N - K \choose n} {_2F_1}\left( -n,-K;N - K - n + 1; e^{t} \right)}{{N \choose n}}$$ |
CF | $$\frac{{N - K \choose n} {_2F_1}\left( -n,-K;N - K - n + 1; e^{it} \right)}{{N \choose n}}$$ |
Continuous Variable Distributions
Normal (Gaussian) distribution
The normal distribution, also known as the Gaussian distribution or bell curve, is one of the most important probability distributions in statistics and data science. It is used to model natural phenomena and random processes that tend to cluster around a mean value.
Characteristics of the Normal Distribution
- Symmetry: It is symmetric around the mean, meaning the probability of obtaining a value greater or smaller than the mean is the same.
- Bell-shaped curve: The curve is unimodal, with a single peak at the mean.
- Mean, median, and mode are equal: In a normal distribution, these three values coincide.
- Asymptotic nature: The tails of the curve approach the horizontal axis but never touch it.
- Total area under the curve = 1: Represents the total probability of all possible outcomes.
Probability Density Function (PDF)
The normal distribution is defined by the function:
$$f\left( x \right) = \frac{1}{\sigma \sqrt{2 \pi}} e^{- \frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^{2}}$$
where:
- \(\mu\) is the mean (center of the distribution),
- \(\sigma\) is the standard deviation (spread of the data),
- \(\sigma^{2}\) is the variance, which measures data variability.
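The density function can be sketched with a few lines of Python (standard library only; the function name `normal_pdf` is illustrative):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

# The density peaks at the mean, where it equals 1 / (sigma * sqrt(2*pi))
print(normal_pdf(0, 0, 1))  # ≈ 0.3989
```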
Standard Normal Distribution
When the mean is \(\mu = 0\) and the standard deviation is \(\sigma = 1\), the distribution is called the standard normal distribution and is denoted as \(N(0,1)\). Any normal distribution \(N(\mu,\sigma^{2})\) can be converted into a standard normal distribution using the standardized variable:
$$Z = \frac{X - \mu}{\sigma}$$
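A minimal numeric illustration of standardization, using hypothetical exam scores with \(\mu = 70\) and \(\sigma = 8\):

```python
# If X ~ N(mu, sigma^2), then Z = (X - mu) / sigma ~ N(0, 1).
mu, sigma = 70, 8  # hypothetical exam-score distribution
x = 86
z = (x - mu) / sigma
print(z)  # 2.0 -> a score of 86 lies two standard deviations above the mean
```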
Empirical Rule (68-95-99.7 Rule)
In a normal distribution:
- \(68\ \%\) of the data falls within one standard deviation of the mean \((\mu \pm \sigma)\).
- \(95\ \%\) falls within two standard deviations \((\mu \pm 2\sigma)\).
- \(99.7\ \%\) falls within three standard deviations \((\mu \pm 3\sigma)\).
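The three percentages can be reproduced from the normal CDF, written via the error function available in Python's standard library:

```python
from math import erf, sqrt

def normal_cdf(x, mu=0, sigma=1):
    """Phi((x - mu) / sigma) expressed through the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Probability mass within mu +/- m*sigma, for m = 1, 2, 3
for m in (1, 2, 3):
    print(m, round(normal_cdf(m) - normal_cdf(-m), 4))
# 1 -> 0.6827, 2 -> 0.9545, 3 -> 0.9973
```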
Data table
characteristic | expression |
---|---|
notation | $$\mathcal{N}\left( \mu,\sigma^{2} \right)$$ |
parameters | $$\mu \in \mathbb{R}$$ $$\sigma^{2} \in \mathbb{R}_{>0}$$ |
support | $$x \in \mathbb{R}$$ |
PDF | $$\frac{1}{\sigma \sqrt{2 \pi}} e^{- \frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^{2}}$$ |
CDF | $$\Phi \left( \frac{x - \mu}{\sigma} \right) = \frac{1}{2} \left[ 1 + \operatorname{erf} \left( \frac{x - \mu}{\sigma \sqrt{2}} \right) \right]$$ |
quantile | $$\mu + \sigma \sqrt{2} \operatorname{erf}^{-1} \left( 2p - 1 \right)$$ |
mean | $$\mu$$ |
median | $$\mu$$ |
mode | $$\mu$$ |
variance | $$\sigma^{2}$$ |
MAD | $$\sigma \sqrt{2} \operatorname{erf}^{-1} \left( 1/2 \right)$$ |
AAD | $$\sigma \sqrt{2/\pi}$$ |
skewness | $$0$$ |
excess kurtosis | $$0$$ |
entropy | $$\frac{1}{2} \log \left( 2 \pi e \sigma^{2} \right)$$ |
MGF | $$\exp\left( \mu t + \sigma^{2} t^{2}/2 \right)$$ |
CF | $$\exp\left( i \mu t - \sigma^{2} t^{2}/2 \right)$$ |
Fisher information | $$\mathcal{I}\left( \mu, \sigma \right) = \begin{pmatrix} 1/\sigma^{2} & 0 \\ 0 & 2/\sigma^{2} \end{pmatrix}$$ $$\mathcal{I}\left( \mu, \sigma^{2} \right) = \begin{pmatrix} 1/\sigma^{2} & 0 \\ 0 & 1/(2\sigma^{4}) \end{pmatrix}$$ |
Student distribution (soon)
Chi-squared distribution (soon)
F-distribution (soon)
Exponential distribution (soon)
Cauchy distribution (soon)
Weibull distribution (soon)
Gamma distribution (soon)
Beta distribution (soon)
Log-Normal distribution (soon)
Logistic distribution (soon)
See also
Probability and statistics formula sheet