The two most important classes of divergences are the f-divergences and Bregman divergences; however, other types of divergence functions are also encountered in the literature. The only divergence for probabilities over a finite alphabet that is both an f-divergence and a Bregman divergence is the Kullback–Leibler divergence. [8]
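A minimal numerical sketch of this coincidence, using made-up example distributions: computing the same quantity once through the f-divergence formula with generator f(t) = t log t and once as the Bregman divergence generated by negative entropy should give the KL divergence both ways.

```python
import numpy as np

def f_divergence(p, q, f):
    """Discrete f-divergence D_f(P || Q) = sum_i q_i * f(p_i / q_i)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * f(p / q)))

def bregman_divergence(p, q, F, grad_F):
    """Bregman divergence B_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(F(p) - F(q) - np.dot(grad_F(q), p - q))

neg_entropy = lambda x: np.sum(x * np.log(x))    # F(x) = sum_i x_i log x_i
grad_neg_entropy = lambda x: np.log(x) + 1.0     # gradient of F

p = np.array([0.2, 0.5, 0.3])   # example distributions (arbitrary choices)
q = np.array([0.4, 0.4, 0.2])

kl_as_f = f_divergence(p, q, lambda t: t * np.log(t))   # generator f(t) = t log t
kl_as_bregman = bregman_divergence(p, q, neg_entropy, grad_neg_entropy)
print(kl_as_f, kl_as_bregman)   # both equal D_KL(P || Q)
```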
In probability theory, an f-divergence is a certain type of function D_f(P ‖ Q) that measures the difference between two probability distributions P and Q. Many common divergences, such as the KL divergence, Hellinger distance, and total variation distance, are special cases of f-divergence.
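As a sketch of how the special cases arise, the following evaluates the discrete f-divergence formula with a few standard generators (the example distributions are assumptions made for illustration; the Hellinger generator uses the 1/2 convention matching the definition below).

```python
import numpy as np

def f_divergence(p, q, f):
    """Discrete f-divergence D_f(P || Q) = sum_i q_i * f(p_i / q_i)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * f(p / q)))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])

generators = {
    "KL divergence":     lambda t: t * np.log(t),
    "total variation":   lambda t: 0.5 * np.abs(t - 1.0),
    "squared Hellinger": lambda t: 0.5 * (np.sqrt(t) - 1.0) ** 2,
}
for name, f in generators.items():
    print(f"{name}: {f_divergence(p, q, f):.4f}")
```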
In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions. It is a type of f-divergence. The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909.
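For discrete distributions the definition reduces to a scaled Euclidean distance between the square-root vectors; a small sketch (the test distributions are arbitrary):

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance H(P, Q) = (1/sqrt(2)) * || sqrt(P) - sqrt(Q) ||_2,
    which lies in [0, 1] for probability vectors."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2))

print(hellinger([0.2, 0.5, 0.3], [0.4, 0.4, 0.2]))   # ~0.16
print(hellinger([1.0, 0.0], [0.0, 1.0]))             # 1.0 for disjoint supports
```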
The only divergence on the probability simplex Γ_n that is both a Bregman divergence and an f-divergence is the Kullback–Leibler divergence. [6] If n ≥ 3, then any Bregman divergence on Γ_n that satisfies the data processing inequality must be the Kullback–Leibler divergence.
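A quick numerical illustration of the data processing inequality for the KL divergence: pushing both distributions through the same Markov kernel never increases the divergence. The random distributions and kernel here are assumptions for the demonstration, not part of the theorem's statement.

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    """KL divergence for strictly positive discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

# Random distributions on n = 4 points and a random Markov kernel K,
# whose rows are conditional distributions, so p @ K is again a distribution.
p = rng.dirichlet(np.ones(4))
q = rng.dirichlet(np.ones(4))
K = rng.dirichlet(np.ones(4), size=4)

print(kl(p, q), kl(p @ K, q @ K))   # the second value is never larger
```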
In mathematical statistics, the Kullback–Leibler (KL) divergence (also called relative entropy and I-divergence [1]), denoted D_KL(P ‖ Q), is a type of statistical distance: a measure of how much a model probability distribution Q is different from a true probability distribution P.
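A short sketch of the discrete formula D_KL(P ‖ Q) = Σ_i p_i log(p_i / q_i), cross-checked against SciPy (assuming SciPy is available; the distributions are arbitrary examples):

```python
import numpy as np
from scipy.stats import entropy   # entropy(pk, qk) returns the relative entropy

p = np.array([0.2, 0.5, 0.3])   # "true" distribution P
q = np.array([0.4, 0.4, 0.2])   # model distribution Q

manual = np.sum(p * np.log(p / q))   # D_KL(P || Q) = sum_i p_i log(p_i / q_i)
print(manual, entropy(p, q))         # the two agree (natural log by default)

# KL is asymmetric: D_KL(P || Q) != D_KL(Q || P) in general.
print(entropy(q, p))
```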
By Chentsov’s theorem, the Fisher information metric on statistical models is the only Riemannian metric (up to rescaling) that is invariant under sufficient statistics. [3] [4] It can also be understood to be the infinitesimal form of the relative entropy (i.e., the Kullback–Leibler divergence); specifically, it is the Hessian of the divergence with respect to the model parameters, evaluated where the two distributions coincide.
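As a sanity check of the Hessian interpretation, one can differentiate the KL divergence numerically for a one-parameter family. The sketch below uses the Bernoulli family, whose Fisher information is 1/(θ(1−θ)); the parameter value and step size are assumptions for the finite-difference estimate.

```python
import numpy as np

def kl_bernoulli(a, b):
    """KL divergence between Bernoulli(a) and Bernoulli(b)."""
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

theta, h = 0.3, 1e-4
# Second derivative of t -> KL(Ber(theta) || Ber(t)) at t = theta
hessian = (kl_bernoulli(theta, theta + h)
           - 2 * kl_bernoulli(theta, theta)
           + kl_bernoulli(theta, theta - h)) / h**2

print(hessian)                       # ~ 4.76
print(1 / (theta * (1 - theta)))     # Fisher information 1/(theta(1-theta))
```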
The total variation distance (or half the L¹ norm) arises as the optimal transportation cost when the cost function is c(x, y) = \mathbf{1}_{x \neq y}; that is,

\tfrac{1}{2}\|P - Q\|_{1} = \delta(P, Q) = \inf\{\Pr(X \neq Y) : \operatorname{Law}(X) = P,\ \operatorname{Law}(Y) = Q\} = \inf_{\pi} \operatorname{E}_{\pi}[\mathbf{1}_{x \neq y}],

where the expectation is taken with respect to the probability measure \pi on the space where (x, y) lives, and the infimum is taken over all such \pi with marginals P and Q, respectively.
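For discrete distributions the optimal coupling keeps as much mass as possible on the diagonal, so the transport cost reduces to 1 − Σ_i min(p_i, q_i); a small sketch with arbitrary example distributions:

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])

tv_half_l1 = 0.5 * np.sum(np.abs(p - q))        # half the L1 norm
# Optimal coupling: inf Pr(X != Y) = 1 - sum_i min(p_i, q_i).
tv_transport = 1.0 - np.sum(np.minimum(p, q))

print(tv_half_l1, tv_transport)   # both equal 0.2 here
```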
More technically, the divergence represents the volume density of the outward flux of a vector field from an infinitesimal volume around a given point. As an example, consider air as it is heated or cooled. The velocity of the air at each point defines a vector field. While air is heated in a region, it expands in all directions, and thus the divergence of the velocity field in that region has a positive value.
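A minimal numerical sketch of this vector-calculus notion of divergence, using the uniformly expanding flow v(x, y) = (x, y) as an assumed example field and finite differences on a grid:

```python
import numpy as np

# Sample v(x, y) = (x, y) on a grid and estimate div v = dvx/dx + dvy/dy.
xs = np.linspace(-1.0, 1.0, 101)
ys = np.linspace(-1.0, 1.0, 101)
X, Y = np.meshgrid(xs, ys, indexing="ij")
vx, vy = X, Y

dvx_dx = np.gradient(vx, xs, axis=0)
dvy_dy = np.gradient(vy, ys, axis=1)
div = dvx_dx + dvy_dy

print(div[50, 50])   # ~ 2.0: positive divergence, i.e. a source-like expanding flow
```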