## Order Statistics

What is the distribution of the maximum of $n$ random variables? What started out a utilitarian question in my exploration of some generalized versions of the secretary problem turns out to be quite a deep topic.

(Note that I have little background in probability and statistics. Please forgive (and inform me of, so I can fix!) errors of notation or language. Or general confusion on my part.)

### Geometric Interpretation of Maximum

Consider the cumulative density function $F_{\max(X_1,X_2)}(x) = P(\max(X_1,X_2) \leq x)$. You can picture two copies of the $X$ distribution as forming sides of a square. $\max(X_1, X_2)$ is the inside.

More generally, $P(\max(X_1, x_2 ... X_n) \leq x)$ can be visualized as an $n$-cube and equal to $P(X \leq x)^n$. By differentiation we get the probability density function,

$\displaystyle P(\max(X_1, X_2... X_n) = x) = n*P(X \leq x)^{n-1}P(X=x)$

$\displaystyle f_{\max(X_1, X_2... X_n)}(x) = n*F(x)^nf(x)$

(Note that we need to be a bit careful about the semantics of $P(X = x)$. This makes sense for discrete distributions, but for others we need to think of it as probability density. For the benefit of intuition, we will continue to use this notation, however.)

This can also be observed as a simple geometric fact, the probability being the expanding surface of the cube.

### Combinatorial Interpretation of Maximum

We may also observe that $P(\max(X_1, X_2... ~X_n) = x) = n * P(X\leq x)^{n-1}*P(X=x)$ in a nice combinatorial way.

Given $n$ random variables $\{X_n\}$, we imagine them ordered $X_{(1)} \leq X_{(2)} \leq ... X_{(n)}$. (Note the notational difference between $X_i$ which is some arbitrary random variable and $X_{(i)}$ which is the one that turned out to be in position $i$ when we ordered them.)

There are $n$ possibilities for which random variable $X_i$ is the last one ($X_{(n)}$). The probability that $X_i$, which will be equal to the maximum, will be $x$ is $P(X = x)$. The probability that the other $n-1$ variables will be less than equal to $x$ is $P(X \leq x)^{n-1}$.

The result follows.

### Minimum

The probability that $\min(X_1, X_2.. ~X_n) \leq x$ is the probability that $X_1, X_2 ...$ are not all greater than $x$.

The probability that that a particular variable is greater than $x$ is the probability that it isn’t less than or equal to it.

Therefore, $P(\min(X_1, X_2... X_n) \leq x) = 1 - (1-P(X \leq x))^n$

It this feels familiar to you, it’s probably because it’s just application of De Morgan’s law, equivalent to how A OR B is the same as NOT ((NOT A) AND (NOT B)).

You can also interpret this geometrically, similarly to how we interpreted maximum.

From here, differentiate to get probability density.

### Combinatorial Construction of Order Statistics

Suppose we have a number of random variables, $\{X_n\}$ from the same distribution. We want to know the distribution to expect from the random variable in a particular position after we order them, $X_{(1)} < X_{(2)} < X_{(3)} < ... X_{(n)}$, the order statistics. Let’s consider the distribution to expect of a variable $X_{(a,b)}$ which has $a$ variables less than (or equal to) it and $b$ variables greater than (or equal to) it.

The number of ways to select $a$ elements for the less than set, $1$ for the actual element, and $b$ for the greater than from $a+b+1$ elements is $\binom{a+1+b}{a,1,b} = \frac{(a+1+b)!}{a!1!b!}$. Then, the probability distribution of $X_{(a,b)}$ is

$\displaystyle P(X_{(a,b)} = x) = \binom{a+b+1}{a,b,1} * P(X \leq x)^a * P(X \geq x)^b * P(X = x)$

$\displaystyle f_{a,b}(x) = \binom{a+b+1}{a,b,1} * F(x)^a * (1-F(x))^b * f(x)$

It’s a little bit less elegant, but the usual way to think of this is as $X_{(a+1)}$, the $(a+1)$-th order statistic of $a+1+b$ random variables. Or more so, as the $i$-th order statistic of $n$ random variables.

$\displaystyle f_{X_{(i)}}(x) = i\binom{n}{i}F(x)^{i-1}(1-F(x))^{n-i}f(x)$

(This is my first writing done in emacs org mode + LaTeX and wordpress integration. Thanks to Sacha Chua for help! This is awesome.)

Tags: , ,

### 3 Responses to “Order Statistics”

1. Helping someone get started with Emacs and Org Mode through Org2Blog and LaTeX; troubleshooting steps » sacha chua :: living an awesome life Says:

[…] 2013-08-30: By the way, here’s the link to Christopher Olah’s first post using org2blog. […]

2. Anonymous Says: