High-dimensional space
- Vector space model: representing documents by vectors
- Dot product: measures similarity between two vectors
- \(A^TA\): pairwise similarities
- Better way: first find the best fit direction
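A minimal numpy sketch of these ideas, using a hypothetical toy term-document matrix (columns are documents): \(A^TA\) gives all pairwise similarities, and the best-fit direction is the top left singular vector of \(A\).

```python
import numpy as np

# Toy term-document matrix: each column is one document's term-frequency
# vector (hypothetical data for illustration).
A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 2.0]])

# A^T A: entry (i, j) is the dot product of documents i and j,
# i.e. all pairwise similarities at once.
similarities = A.T @ A

# The best-fit direction for the document vectors (columns of A) is the
# top left singular vector of A: it maximizes the sum of squared
# projections of the documents onto a single unit vector.
U, S, Vt = np.linalg.svd(A)
best_fit_direction = U[:, 0]
```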
- Orthogonality: orthogonal document vectors share no common terms, indicating disjoint topics and maximal disparity
- Chernoff bounds
- Central limit theorem: the distribution of distances from random points to a fixed point tends to concentrate about an average distance.
- Random projection theorem: a collection of vectors can be projected to a lower-dimensional space while approximately preserving all pairwise distances. It does not work for pairwise dot products.
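A numerical sketch of the random projection theorem (Johnson–Lindenstrauss): project with a scaled Gaussian matrix and compare a distance before and after. The dimensions 1000 and 200 are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 1000, 200   # original and target dimensions (illustrative choices)

x = rng.standard_normal(d)
y = rng.standard_normal(d)

# Random projection: a k x d Gaussian matrix scaled by 1/sqrt(k)
# approximately preserves pairwise distances with high probability.
R = rng.standard_normal((k, d)) / np.sqrt(k)

orig = np.linalg.norm(x - y)
proj = np.linalg.norm(R @ x - R @ y)
ratio = proj / orig   # close to 1 with high probability
```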
- Law of large numbers:
\[
P\left(\left|\frac{x_1+x_2+\cdots+x_n}{n}-E(x)\right| > \epsilon\right) \le \frac{\sigma^2}{n\epsilon^2}
\]
- Markov's inequality:
For nonnegative random variable \(x\) and for \(a>0\),
\[
P(x\ge a) \le \frac{E(x)}{a}.
\]
Proof:
\[
E(x) = \int_0^\infty x p(x) dx \ge \int_a^\infty xp(x) dx \ge a\int_a^\infty p(x) dx = a P(x\ge a).
\]
Corollary:
\[
P(x\ge cE(x)) \le \frac{1}{c}.
\]
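Markov's inequality can be sanity-checked by simulation; a sketch using an exponential variable with mean 1 (an arbitrary illustrative choice of nonnegative distribution):

```python
import random

random.seed(0)

# Check P(x >= a) <= E(x)/a for a nonnegative random variable:
# x exponential with mean 1, so E(x) = 1 and P(x >= a) = e^{-a}.
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]

a = 3.0
empirical = sum(s >= a for s in samples) / n   # near e^{-3} ~ 0.05
markov_bound = 1.0 / a                         # E(x)/a = 1/3
```

The bound is loose here (0.05 versus 1/3), which is typical: Markov uses only the mean.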
- Chebyshev's inequality: for a random variable \(x\) with mean \(m\) and variance \(\sigma^2\),
\[
P(|x-m|\ge a\sigma) \le \frac{1}{a^2}.
\]
Proof:
\[
P(|x-m|\ge a\sigma) = P((x-m)^2\ge a^2\sigma^2) \le \frac{E((x-m)^2)}{a^2\sigma^2} = \frac{1}{a^2}.
\]
- The proof of the law of large numbers follows directly from Chebyshev's inequality, since the sample mean has variance \(\sigma^2/n\):
\[
P\left(\left|\frac{x_1+x_2+\cdots+x_n}{n}-E(x)\right| > \epsilon\right) \le \frac{\sigma^2/n}{\epsilon^2}.
\]
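A quick numerical check of this bound, using uniform \([0,1]\) samples (so \(E(x)=1/2\) and \(\sigma^2 = 1/12\)):

```python
import random

random.seed(1)

# Empirical check of the bound sigma^2 / (n eps^2) on the probability
# that the sample mean deviates from E(x) by more than eps.
n, eps, trials = 1000, 0.05, 2000
sigma2 = 1.0 / 12.0   # variance of uniform [0, 1]

deviations = 0
for _ in range(trials):
    mean = sum(random.random() for _ in range(n)) / n
    if abs(mean - 0.5) > eps:
        deviations += 1

empirical = deviations / trials
chebyshev_bound = sigma2 / (n * eps * eps)   # = 1/30 here
```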
- As the dimension \(d\) increases, the volume of a unit cube stays at one, while the maximum possible distance between two points (the long diagonal) grows as \(\sqrt{d}\).
- For the unit-radius sphere, as \(d\) increases, the volume goes to zero, while the maximum possible distance between two points stays at two.
- A vertex of the unit cube is at distance \(\sqrt{d}/2\) from the origin and lies far outside the unit-radius sphere for large \(d\).
- The mid-point of each face of the cube is only at distance 1/2 from the origin and is inside the unit sphere.
- For large \(d\) almost all the volume of the cube is located outside the sphere.
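This can be seen by Monte Carlo: sample points uniformly in the cube \([-1/2,1/2]^d\) and count how many fall inside the unit-radius sphere.

```python
import random
import math

random.seed(0)

def fraction_inside_sphere(d, samples=20_000):
    """Monte Carlo estimate of the fraction of the unit cube
    [-1/2, 1/2]^d that lies inside the unit-radius sphere."""
    inside = 0
    for _ in range(samples):
        point = [random.uniform(-0.5, 0.5) for _ in range(d)]
        if math.sqrt(sum(x * x for x in point)) <= 1.0:
            inside += 1
    return inside / samples

low_d = fraction_inside_sphere(2)    # 1: the whole square fits in the disk
high_d = fraction_inside_sphere(50)  # near 0: the volume has moved outside
```

In 2 dimensions the square sits entirely inside the disk (its corners are at distance \(\sqrt{2}/2 < 1\)); by \(d = 50\) a typical cube point has norm near \(\sqrt{d/12} \approx 2\), far outside the sphere.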
- The surface area and volume of a unit-radius sphere in \(d\) dimensions:
\[
A(d) = \frac{2\pi^{d/2}}{\Gamma(d/2)}, \qquad V(d) = \frac{2\pi^{d/2}}{d\,\Gamma(d/2)}.
\]
The exponential \(\pi^{d/2}\) does not grow as fast as the factorial-like \(\Gamma(d/2)\), hence \(A(d)\) and \(V(d)\) go to zero as \(d\) goes to infinity.
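These formulas are easy to evaluate numerically; a sketch using the standard library's gamma function:

```python
import math

def sphere_volume(d):
    """Volume of the unit-radius sphere in d dimensions:
    V(d) = 2 * pi^(d/2) / (d * Gamma(d/2))."""
    return 2 * math.pi ** (d / 2) / (d * math.gamma(d / 2))

v2 = sphere_volume(2)      # pi, the area of the unit disk
v3 = sphere_volume(3)      # 4*pi/3, the familiar ball volume
v100 = sphere_volume(100)  # vanishingly small
```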
- The way to calculate the surface area of \(d\)-dimensional sphere is to make use of the \(d\)-dimensional Gauss integral:
\[
\int_{-\infty}^{\infty} e^{-(x_1^2 + x_2^2 + \cdots + x_d^2)}dx_1dx_2\cdots dx_n.
\]
- Volume is near the equator: for any fixed "north pole" direction, most of the volume lies within distance of order \(1/\sqrt{d-1}\) of the equatorial plane perpendicular to that direction.
- The volume is in a narrow annulus: most of the volume is contained in an annulus of width \(O(1/d)\) just inside the surface.
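The annulus claim can be checked numerically. The only fact needed is that the radius of a uniform point in the unit ball has CDF \(t^d\) (which follows from the volume formula), so radii can be sampled directly as \(U^{1/d}\):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 100_000

# For a point uniform in the unit ball, P(radius <= t) = t^d, so the
# radius can be sampled as U^(1/d) with U uniform on [0, 1].
radii = rng.random(n) ** (1.0 / d)

# Fraction of volume in the annulus 1 - 5/d <= r <= 1:
width = 5.0 / d
in_annulus = np.mean(radii >= 1 - width)   # roughly 1 - e^{-5}, about 0.99
```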
- The surface area is near the equator.
- Generating points uniformly at random on the surface of a sphere: generate \(d\) independent Gaussian variables and normalize the resulting \(d\)-dimensional vector to unit length; by spherical symmetry of the Gaussian, the result is uniform on the unit sphere.
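A sketch of this generator, together with a check of the equator concentration (a single coordinate of a uniform point on the sphere is typically of order \(1/\sqrt{d-1}\)):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 10_000

# Uniform points on the unit sphere: generate d independent Gaussians,
# then normalize.  Spherical symmetry makes the direction uniform.
g = rng.standard_normal((n, d))
points = g / np.linalg.norm(g, axis=1, keepdims=True)

norms = np.linalg.norm(points, axis=1)   # all equal to 1

# Equator concentration: the first coordinate is typically O(1/sqrt(d-1)).
near_equator = np.mean(np.abs(points[:, 0]) <= 2.0 / np.sqrt(d - 1))
```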
- The natural scale of a \(d\)-dimensional Gaussian is \(\sigma\sqrt{d}\).
- Mixture of spherical Gaussians can be separated provided their centers are separated by more than \(d^{1/4}\). But better algorithms are available.
- Suppose \(x\) and \(y\) are drawn from the same spherical unit-variance Gaussian; then \(|x-y| = \sqrt{2d} \pm O(1)\).
- Suppose \(x\) and \(y\) are drawn from two spherical unit-variance Gaussians whose centers are separated by \(\delta\); then \(|x-y|^2 = 2d + \delta^2 \pm O(\sqrt{d})\).
- Two randomly chosen points in high dimension are almost surely nearly orthogonal.
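Both claims are easy to observe numerically for a single pair of Gaussian points in high dimension:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10_000

x = rng.standard_normal(d)
y = rng.standard_normal(d)

# Distance between two points from the same unit-variance Gaussian:
# |x - y| = sqrt(2d) +- O(1).
dist = np.linalg.norm(x - y)
expected = np.sqrt(2 * d)

# Near-orthogonality: the cosine of the angle between two random
# high-dimensional vectors is O(1/sqrt(d)).
cosine = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
```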
- Algorithm for separating points from two Gaussians: compute all pairwise distances and cluster; distances within a cluster concentrate near \(\sqrt{2d}\), while distances across clusters concentrate near \(\sqrt{2d+\delta^2}\).
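A minimal sketch of this distance-based separation on synthetic data. The parameters (dimension 400, 40 points per Gaussian, separation 40) are illustrative choices, with a generous separation well above \(d^{1/4} \approx 4.5\) so the clusters split cleanly:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, delta = 400, 40, 40.0   # illustrative choices; delta >> d**0.25

# Synthetic data: n points from each of two spherical unit-variance
# Gaussians whose centers are delta apart.
center2 = np.zeros(d)
center2[0] = delta
points = np.vstack([rng.standard_normal((n, d)),
                    center2 + rng.standard_normal((n, d))])

# Intra-cluster distances concentrate near sqrt(2d), inter-cluster
# distances near sqrt(2d + delta^2); threshold between the two.
threshold = np.sqrt(2 * d + delta ** 2 / 2)
distances = np.linalg.norm(points - points[0], axis=1)
same_cluster = distances < threshold   # True for the first Gaussian's points
```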