Search results
Results from the WOW.Com Content Network
The only divergence for probabilities over a finite alphabet that is both an f-divergence and a Bregman divergence is the Kullback–Leibler divergence. [8] The squared Euclidean divergence is a Bregman divergence (corresponding to the function x 2 {\displaystyle x^{2}} ) but not an f -divergence.
In probability theory, an -divergence is a certain type of function (‖) that measures the difference between two probability distributions and . Many common divergences, such as KL-divergence , Hellinger distance , and total variation distance , are special cases of f {\displaystyle f} -divergence.
In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions. It is a type of f-divergence. The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909.
In mathematical statistics, the Kullback–Leibler (KL) divergence (also called relative entropy and I-divergence [1]), denoted (), is a type of statistical distance: a measure of how much a model probability distribution Q is different from a true probability distribution P.
The only divergence on that is both a Bregman divergence and an f-divergence is the Kullback–Leibler divergence. [ 6 ] If n ≥ 3 {\displaystyle n\geq 3} , then any Bregman divergence on Γ n {\displaystyle \Gamma _{n}} that satisfies the data processing inequality must be the Kullback–Leibler divergence.
The total variation distance (or half the norm) arises as the optimal transportation cost, when the cost function is (,) =, that is, ‖ ‖ = (,) = {(): =, =} = [], where the expectation is taken with respect to the probability measure on the space where (,) lives, and the infimum is taken over all such with marginals and , respectively.
The continuous mapping theorem states that for a continuous function g, if the sequence {X n} converges in distribution to X, then {g(X n)} converges in distribution to g(X). Note however that convergence in distribution of {X n} to X and {Y n} to Y does in general not imply convergence in distribution of {X n + Y n} to X + Y or of {X n Y n} to XY.
By Chentsov’s theorem, the Fisher information metric on statistical models is the only Riemannian metric (up to rescaling) that is invariant under sufficient statistics. [3] [4] It can also be understood to be the infinitesimal form of the relative entropy (i.e., the Kullback–Leibler divergence); specifically, it is the Hessian of