High-dimensional space
- Vector space model: representing documents by vectors
- Dot product: measures similarity between two vectors
- \(A^TA\): pairwise similarities
- Better way: first find the best fit direction
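A minimal numpy sketch of these ideas, using a hypothetical toy term-document matrix (columns are documents): \(A^TA\) gives all pairwise similarities, and the best-fit direction is the top left singular vector of \(A\).

```python
import numpy as np

# Toy term-document matrix: each column is one document's term-frequency
# vector (hypothetical data for illustration).
A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 2.0]])

# A^T A: entry (i, j) is the dot product of documents i and j,
# i.e. all pairwise similarities at once.
similarities = A.T @ A

# The best-fit direction for the document vectors (columns of A) is the
# top left singular vector of A: it maximizes the sum of squared
# projections of the documents onto a single unit vector.
U, S, Vt = np.linalg.svd(A)
best_fit_direction = U[:, 0]
```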
- Orthogonality: orthogonal document vectors share no common terms, indicating disjoint topics and maximal disparity
- Chernoff bounds
- Central limit theorem: the distribution of distances from random points to a fixed point tends to concentrate about an average distance.
- Random projection theorem: a collection of vectors can be projected to a lower-dimensional space while approximately preserving all pairwise distances. It does not work for pairwise dot products.
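A numerical sketch of the random projection theorem (Johnson–Lindenstrauss): project with a scaled Gaussian matrix and compare a distance before and after. The dimensions 1000 and 200 are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 1000, 200   # original and target dimensions (illustrative choices)

x = rng.standard_normal(d)
y = rng.standard_normal(d)

# Random projection: a k x d Gaussian matrix scaled by 1/sqrt(k)
# approximately preserves pairwise distances with high probability.
R = rng.standard_normal((k, d)) / np.sqrt(k)

orig = np.linalg.norm(x - y)
proj = np.linalg.norm(R @ x - R @ y)
ratio = proj / orig   # close to 1 with high probability
```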
- Law of large numbers:
\[
P\left(\left|\frac{x_1+x_2+\cdots+x_n}{n}-E(x)\right| > \epsilon\right) \le \frac{\sigma^2}{n\epsilon^2}
\]
- Markov's inequality:
For nonnegative random variable \(x\) and for \(a>0\),
\[
P(x\ge a) \le \frac{E(x)}{a}.
\]
Proof:
\[
E(x) = \int_0^\infty x p(x) dx \ge \int_a^\infty xp(x) dx \ge a\int_a^\infty p(x) dx = a P(x\ge a).
\]
Corollary:
\[
P(x\ge cE(x)) \le \frac{1}{c}.
\]
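Markov's inequality can be sanity-checked by simulation; a sketch using an exponential variable with mean 1 (an arbitrary illustrative choice of nonnegative distribution):

```python
import random

random.seed(0)

# Check P(x >= a) <= E(x)/a for a nonnegative random variable:
# x exponential with mean 1, so E(x) = 1 and P(x >= a) = e^{-a}.
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]

a = 3.0
empirical = sum(s >= a for s in samples) / n   # near e^{-3} ~ 0.05
markov_bound = 1.0 / a                         # E(x)/a = 1/3
```

The bound is loose here (0.05 versus 1/3), which is typical: Markov uses only the mean.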
- Chebyshev's inequality: for a random variable \(x\) with mean \(m\) and variance \(\sigma^2\),
\[
P(|x-m|\ge a\sigma) \le \frac{1}{a^2}.
\]
Proof:
\[
P(|x-m|\ge a\sigma) = P((x-m)^2\ge a^2\sigma^2) \le \frac{E((x-m)^2)}{a^2\sigma^2} = \frac{1}{a^2}.
\]
- The proof of the law of large numbers follows directly from Chebyshev's inequality, since the sample mean has variance \(\sigma^2/n\):
\[
P\left(\left|\frac{x_1+x_2+\cdots+x_n}{n}-E(x)\right| > \epsilon\right) \le \frac{\sigma^2/n}{\epsilon^2}.
\]
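A quick numerical check of this bound, using uniform \([0,1]\) samples (so \(E(x)=1/2\) and \(\sigma^2 = 1/12\)):

```python
import random

random.seed(1)

# Empirical check of the bound sigma^2 / (n eps^2) on the probability
# that the sample mean deviates from E(x) by more than eps.
n, eps, trials = 1000, 0.05, 2000
sigma2 = 1.0 / 12.0   # variance of uniform [0, 1]

deviations = 0
for _ in range(trials):
    mean = sum(random.random() for _ in range(n)) / n
    if abs(mean - 0.5) > eps:
        deviations += 1

empirical = deviations / trials
chebyshev_bound = sigma2 / (n * eps * eps)   # = 1/30 here
```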
- As the dimension \(d\) increases, the volume of a unit cube stays at one, while the maximum possible distance between two points (the long diagonal) grows as \(\sqrt{d}\).
- For the unit-radius sphere, as \(d\) increases, the volume goes to zero, while the maximum possible distance between two points stays at two.
- A vertex of the unit cube is at distance \(\sqrt{d}/2\) from the origin and lies far outside the unit-radius sphere for large \(d\).
- The mid-point of each face of the cube is only at distance 1/2 from the origin and is inside the unit sphere.
- For large \(d\) almost all the volume of the cube is located outside the sphere.
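This can be seen by Monte Carlo: sample points uniformly in the cube \([-1/2,1/2]^d\) and count how many fall inside the unit-radius sphere.

```python
import random
import math

random.seed(0)

def fraction_inside_sphere(d, samples=20_000):
    """Monte Carlo estimate of the fraction of the unit cube
    [-1/2, 1/2]^d that lies inside the unit-radius sphere."""
    inside = 0
    for _ in range(samples):
        point = [random.uniform(-0.5, 0.5) for _ in range(d)]
        if math.sqrt(sum(x * x for x in point)) <= 1.0:
            inside += 1
    return inside / samples

low_d = fraction_inside_sphere(2)    # 1: the whole square fits in the disk
high_d = fraction_inside_sphere(50)  # near 0: the volume has moved outside
```

In 2 dimensions the square sits entirely inside the disk (its corners are at distance \(\sqrt{2}/2 < 1\)); by \(d = 50\) a typical cube point has norm near \(\sqrt{d/12} \approx 2\), far outside the sphere.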
- The surface area and volume of a unit-radius sphere in \(d\) dimensions:
\[
A(d) = \frac{2\pi^{d/2}}{\Gamma(d/2)}, \qquad V(d) = \frac{2\pi^{d/2}}{d\,\Gamma(d/2)}.
\]
The exponential \(\pi^{d/2}\) does not grow as fast as the factorial-like \(\Gamma(d/2)\), hence \(A(d)\) and \(V(d)\) go to zero as \(d\) goes to infinity.
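These formulas are easy to evaluate numerically; a sketch using the standard library's gamma function:

```python
import math

def sphere_volume(d):
    """Volume of the unit-radius sphere in d dimensions:
    V(d) = 2 * pi^(d/2) / (d * Gamma(d/2))."""
    return 2 * math.pi ** (d / 2) / (d * math.gamma(d / 2))

v2 = sphere_volume(2)      # pi, the area of the unit disk
v3 = sphere_volume(3)      # 4*pi/3, the familiar ball volume
v100 = sphere_volume(100)  # vanishingly small
```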
- The way to calculate the surface area of \(d\)-dimensional sphere is to make use of the \(d\)-dimensional Gauss integral:
\[
\int_{-\infty}^{\infty} e^{-(x_1^2 + x_2^2 + \cdots + x_d^2)}dx_1dx_2\cdots dx_n.
\]
- Volume is near the equator: for any fixed "north pole" direction, most of the volume lies within distance of order \(1/\sqrt{d-1}\) of the equatorial plane perpendicular to that direction.
- The volume is in a narrow annulus: most of the volume is contained in an annulus of width \(O(1/d)\) just inside the surface.
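The annulus claim can be checked numerically. The only fact needed is that the radius of a uniform point in the unit ball has CDF \(t^d\) (which follows from the volume formula), so radii can be sampled directly as \(U^{1/d}\):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 100_000

# For a point uniform in the unit ball, P(radius <= t) = t^d, so the
# radius can be sampled as U^(1/d) with U uniform on [0, 1].
radii = rng.random(n) ** (1.0 / d)

# Fraction of volume in the annulus 1 - 5/d <= r <= 1:
width = 5.0 / d
in_annulus = np.mean(radii >= 1 - width)   # roughly 1 - e^{-5}, about 0.99
```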
- The surface area is near the equator.
- Generating points uniformly at random on the surface of a sphere: generate \(d\) independent Gaussian variables and normalize the resulting \(d\)-dimensional vector to unit length; by spherical symmetry of the Gaussian, the result is uniform on the unit sphere.
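A sketch of this generator, together with a check of the equator concentration (a single coordinate of a uniform point on the sphere is typically of order \(1/\sqrt{d-1}\)):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 10_000

# Uniform points on the unit sphere: generate d independent Gaussians,
# then normalize.  Spherical symmetry makes the direction uniform.
g = rng.standard_normal((n, d))
points = g / np.linalg.norm(g, axis=1, keepdims=True)

norms = np.linalg.norm(points, axis=1)   # all equal to 1

# Equator concentration: the first coordinate is typically O(1/sqrt(d-1)).
near_equator = np.mean(np.abs(points[:, 0]) <= 2.0 / np.sqrt(d - 1))
```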
- The natural scale of a \(d\)-dimensional Gaussian is \(\sigma\sqrt{d}\).
- Mixture of spherical Gaussians can be separated provided their centers are separated by more than \(d^{1/4}\). But better algorithms are available.
- Suppose \(x\) and \(y\) are drawn from the same spherical unit-variance Gaussian; then \(|x-y| = \sqrt{2d} \pm O(1)\).
- Suppose \(x\) and \(y\) are drawn from two spherical unit-variance Gaussians whose centers are separated by \(\delta\); then \(|x-y|^2 = 2d + \delta^2 \pm O(\sqrt{d})\).
- Two randomly chosen points in high dimension are almost surely nearly orthogonal.
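Both claims are easy to observe numerically for a single pair of Gaussian points in high dimension:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10_000

x = rng.standard_normal(d)
y = rng.standard_normal(d)

# Distance between two points from the same unit-variance Gaussian:
# |x - y| = sqrt(2d) +- O(1).
dist = np.linalg.norm(x - y)
expected = np.sqrt(2 * d)

# Near-orthogonality: the cosine of the angle between two random
# high-dimensional vectors is O(1/sqrt(d)).
cosine = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
```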
- Algorithm for separating points from two Gaussians: compute all pairwise distances and cluster; distances within a cluster concentrate near \(\sqrt{2d}\), while distances across clusters concentrate near \(\sqrt{2d+\delta^2}\).
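A minimal sketch of this distance-based separation on synthetic data. The parameters (dimension 400, 40 points per Gaussian, separation 40) are illustrative choices, with a generous separation well above \(d^{1/4} \approx 4.5\) so the clusters split cleanly:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, delta = 400, 40, 40.0   # illustrative choices; delta >> d**0.25

# Synthetic data: n points from each of two spherical unit-variance
# Gaussians whose centers are delta apart.
center2 = np.zeros(d)
center2[0] = delta
points = np.vstack([rng.standard_normal((n, d)),
                    center2 + rng.standard_normal((n, d))])

# Intra-cluster distances concentrate near sqrt(2d), inter-cluster
# distances near sqrt(2d + delta^2); threshold between the two.
threshold = np.sqrt(2 * d + delta ** 2 / 2)
distances = np.linalg.norm(points - points[0], axis=1)
same_cluster = distances < threshold   # True for the first Gaussian's points
```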