High-dimensional space

  1. Vector space model: representing documents by vectors
  2. Dot-product: measure similarity
  3. \(A^TA\): pairwise similarities
  4. Better way: first find the best fit direction
  5. Orthogonality: disjoint, disparity
  6. Chernoff bounds
  7. Central limit theorem: distribution of distances from random points to a point tends to concentrate about an average distance.
  8. Random projection theorem: a collection of vectors can be projected to a lower-dimensional space approximately preserving all pairwise distances. Does not work for pairwise dot products,
  9. Law of large numbers: \[ P\left(\left|\frac{x_1+x_2+\cdots+x_n}{n}-E(x)\right| > \epsilon\right) \le \frac{\sigma^2}{n\epsilon^2} \]
  10. Markov's inequality: For nonnegative random variable \(x\) and for \(a>0\), \[ P(x\ge a) \le \frac{E(x)}{a}. \] Proof: \[ E(x) = \int_0^\infty x p(x) dx \ge \int_a^\infty xp(x) dx \ge a\int_a^\infty p(x) dx = a P(x\ge a). \] Corollary: \[ P(x\ge cE(x)) \le \frac{1}{c}. \]
  11. Chebyshev's inequality: \(x\) is a random variable with mean \(m\) and variance \(\sigma^2\), \[ P(|x-m|\ge a\sigma) \le \frac{1}{a^2}. \] Proof: \[ P(|x-m|\ge a\sigma) = P((x-m)^2\ge a^2\sigma^2) \le \frac{E((x-m)^2)}{a^2\sigma^2} = \frac{1}{a^2}. \]
  12. Proof of law of large numbers is straightforward from Chebyshev's inequality: \[ P\left(\left|\frac{x_1+x_2+\cdots+x_n}{n}-E(x)\right| > \epsilon\right) \le \frac{\sigma^2/n}{\epsilon^2}. \]
  13. As the dimension \(d\) increases, the volume of a unit cube is always one, and the maximum possible distance between two points grow as \(\sqrt{d}\).
  14. For unit-radius sphere, as \(d\) increases, its volume goes to zero, while the maximum possible distance between two points stays at two.
  15. A vertex of a unit cube is at distance \(\sqrt{d}/2\) from the origin, and lie far outside the unit radius sphere for large \(d\).
  16. The mid-point of each face of the cube is only at distance 1/2 from the origin and is inside the unit sphere.
  17. For large \(d\) almost all the volume of the cube is located outside the sphere.
  18. The surface area and volume of a unit sphere \[ A(d) = \frac{2\pi^{d/2}}{\Gamma(d/2)}, V(d) = \frac{2\pi^{d/2}}{d\Gamma(d/2)}. \]
  19. Exponential function does not grow as fast as the factorial function, hence \(A(d)\) and \(V(d)\) go to zero as \(d\) goes to infinity.
  20. The way to calculate the surface area of \(d\)-dimensional sphere is to make use of the \(d\)-dimensional Gauss integral: \[ \int_{-\infty}^{\infty} e^{-(x_1^2 + x_2^2 + \cdots + x_d^2)}dx_1dx_2\cdots dx_n. \]
  21. Volume is near the equator: that is, most of the volume is below or above the equator plane, with height of the order of \(1/\sqrt{d-1}\).
  22. The volume is in a narrow annulus. Most volume is contained in an annulus of width \(O(1/d)\).
  23. The surface area is near the equator.
  24. Generating points uniformly at random on the surface of a sphere: generate \(d\) Gaussian variables, then normalize this \(d\)-dimensional vector and project it onto the surface of a unit sphere.
  25. The natural scale of a \(d\)-dimensional Gaussian is \(\sigma\sqrt{d}\).
  26. Mixture of spherical Gaussians can be separated provided their centers are separated by more than \(d^{1/4}\). But better algorithms are available.
  27. Suppose \(x\) and \(y\) are taken from the same spherical unit variance Gaussian distribution, then \(|x-y| = \sqrt{2d} + O(1)\).
  28. Suppose \(x\) and \(y\) are taken from two spherical unit variance Gaussian distributions separated by \(\delta\), then \(|x-y|^2 = 2d + \delta^2 \pm O(\sqrt{d})\).
  29. Two randomly chosen points in high dimension are almost surely nearly orthogonal.
  30. Algorithm for separating points from two Gaussians.