INFORMATION CONTENT OF PROBABILITY DISTRIBUTIONS
 
IN ENGINEERING PRACTICE
 
(2023)
 
 

ABSTRACT

The evaluation is based on modified rewritings of the Chebyshev, Markov and Cantelli inequalities. The modifications aim at determining the information content of a distribution (the concentration of probability around the expected value, i.e. the peakedness), which can be computed from the empirical expected value and the variance, and at evaluating it with the Chebyshev, Markov and Cantelli inequalities. Peakedness is defined as 1 - 1/κ, where κ is the kurtosis.

 
 
 INTRODUCTION
There are mathematical methods for evaluating and comparing the characteristics of probability distributions: the variance (the squared standard deviation), the range, the flatness, and so on. Entropy is also used to evaluate and compare information content (in the continuous case see https://en.wikipedia.org/wiki/Differential_entropy; among discrete distributions on the non-negative integers with a given mean, the maximum is attained by the geometric distribution, a member of the exponential family; see also https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence), but we will not use these here. For a distribution, the kurtosis κ measures the weight of values far from the expected value, calculated from the standardized fourth moment.
A proposal is to define peakedness as 1 - 1/κ, where κ stands for the kurtosis. The kurtosis (https://en.wikipedia.org/wiki/Kurtosis#Excess_kurtosis) is the fourth standardized moment, defined as

        κ = E[(η - E(η))⁴] / σ⁴ = μ₄ / σ⁴,

where μ₄ is the fourth central moment and σ is the standard deviation.

A first attempt was to relate the reciprocal of the kurtosis to peakedness. The values of 1/κ = σ⁴/μ₄ are: 1/3 for the normal distribution, 5/9 for the uniform distribution, 1/9 for the exponential distribution, 1/6 for the Laplace distribution (https://en.wikipedia.org/wiki/Laplace_distribution), and p(1-p)/[1 - 3p(1-p)] for the Bernoulli distribution. For the geometric distribution (https://testbook.com/question-answer/the-excess-kurtosis-of-the-geometric-distribution--607e64a9c3ce62d9a72ef003) the value is 1/[9 + p²/(1-p)]; for the logistic distribution it is 0.238; for the Wigner semicircle distribution it is 0.5.

Unfortunately, 1/κ decreases as a distribution becomes more peaked, so it does not measure peakedness directly; we therefore also examined the values 1 - 1/κ = 1 - σ⁴/μ₄.

1 - 1/κ is a useful definition of peakedness, because its values increase with the peakedness. The values of 1 - 1/κ are: 2/3 for the normal distribution, 4/9 for the uniform distribution, 8/9 for the exponential distribution, 5/6 for the Laplace distribution, and [1 - 4p(1-p)]/[1 - 3p(1-p)] for the Bernoulli distribution. For the geometric distribution the value is [8 + p²/(1-p)]/[9 + p²/(1-p)]; for the logistic distribution it is 0.762; for the Wigner semicircle distribution it is 0.5.
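As an illustration, the proposed measure can be estimated directly from data. The following minimal Python sketch (the distributions, parameters and sample size are our own illustrative choices) estimates the kurtosis from samples and evaluates 1 - 1/κ:

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

samples = {
    "normal":      rng.normal(0.0, 1.0, n),     # kurtosis 3,   peakedness 2/3
    "uniform":     rng.uniform(-1.0, 1.0, n),   # kurtosis 9/5, peakedness 4/9
    "exponential": rng.exponential(1.0, n),     # kurtosis 9,   peakedness 8/9
    "Laplace":     rng.laplace(0.0, 1.0, n),    # kurtosis 6,   peakedness 5/6
}

for name, x in samples.items():
    z = x - x.mean()
    kappa = (z**4).mean() / (z**2).mean()**2    # standardized fourth moment mu4 / sigma^4
    print(f"{name:12s} kurtosis ~ {kappa:.3f}   1 - 1/kappa ~ {1 - 1/kappa:.3f}")

With a sample of this size the estimates approach the theoretical values listed above.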
 
A further possible measure of peakedness: consider a random variable η, which may be discrete or continuous. We assume that E(η²) is finite, so that the relation E(η²) = E²(η) + D²(η) holds, where E(η) denotes the (finite) expected value and D²(η) the variance. The measures of information content to be considered are A(η) = D²(η) / E(η²) ≤ 1 and B(η) = E²(η) / E(η²) ≤ 1, with A(η) + B(η) = 1. They are easily estimated empirically and can be represented in modified versions of the Markov, Chebyshev and Cantelli inequalities (https://en.wikipedia.org/wiki/Chebyshev%27s_inequality). The modified versions allow the information content (the concentration, or peakedness, of the density function) to be evaluated from the parameters of a distribution.
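A minimal sketch of the empirical estimation (the distribution and sample size are illustrative assumptions, not part of the text):

import numpy as np

def information_measures(x):
    # A = D^2(eta)/E(eta^2) and B = E^2(eta)/E(eta^2), estimated from a sample
    m2 = np.mean(np.square(x))        # E(eta^2)
    mean = np.mean(x)                 # E(eta)
    var = np.var(x)                   # D^2(eta), 1/n normalization
    return var / m2, mean**2 / m2

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=100_000)    # for the exponential, A = B = 1/2
A, B = information_measures(x)
print(f"A ~ {A:.4f}   B ~ {B:.4f}   A + B = {A + B:.4f}")

Since np.var uses the 1/n normalization, the identity E(η²) = E²(η) + D²(η) holds exactly for the sample, so A + B = 1 exactly.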
                                                                                                                                                                                         

MODIFIED FORMS OF THE CHEBYSHEV AND CANTELLI INEQUALITIES

Different forms of the Chebyshev inequality refer to the absolute value of the random variable η - E(η), with ε > 0:

        P{ |η - E(η)| ≥ ε D(η) } ≤ ε⁻²

or, replacing ε by ε / D(η),

        P{ |η - E(η)| ≥ ε } ≤ ε⁻² D²(η).

Substituting ε² E(η²) for ε² in the second version gives the form

        P{ |η - E(η)| ≥ ε E^(1/2)(η²) } ≤ D²(η) / (ε² E(η²)),

where the right-hand side equals D²(η) / E(η²) = A(η) when ε² = 1.
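As a numerical check of the ε = 1 case (the gamma distribution and the sample size here are our own illustrative choices), the empirical frequency of the event |η - E(η)| ≥ E^(1/2)(η²) can be compared with the bound A(η):

import numpy as np

rng = np.random.default_rng(2)
eta = rng.gamma(shape=4.0, scale=1.0, size=1_000_000)   # any distribution with finite E(eta^2)

m, m2 = eta.mean(), np.mean(eta**2)
A = eta.var() / m2                              # A(eta) = D^2(eta) / E(eta^2)
tail = np.mean(np.abs(eta - m) >= np.sqrt(m2))  # P{ |eta - E(eta)| >= E^(1/2)(eta^2) }
print(f"empirical tail {tail:.5f} <= A(eta) = {A:.5f}")

For this gamma distribution the empirical tail is about 0.03, well below the bound A(η) = 0.2; as usual with Chebyshev-type bounds, the inequality is far from tight.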
 
We turn to the standardized random variable Z = (η - E(η)) / D(η), which has zero expected value and unit variance.
 
For k > 0, the Cantelli inequality (https://en.wikipedia.org/wiki/Cantelli%27s_inequality) for a standardized variable Z has the form

        P{ Z ≥ k } ≤ 1 / (1 + k²).

Substituting k = |E(η)| / D(η), we obtain

        P{ (η - E(η)) / D(η) ≥ |E(η)| / D(η) } ≤ D²(η) / (E²(η) + D²(η)),

where D²(η) / (E²(η) + D²(η)) = A(η).

For the two-sided tail P{ |Z| ≥ k } with k = |E(η)| / D(η), applying the one-sided inequality to both tails gives

        P{ |Z| ≥ k } ≤ 2 D²(η) / (E²(η) + D²(η)) = 2 A(η).
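A numerical illustration (the exponential distribution and sample size are our own choices): with k = |E(η)| / D(η), the one-sided bound equals A(η) and the two-sided union bound equals 2 A(η):

import numpy as np

rng = np.random.default_rng(3)
eta = rng.exponential(scale=1.0, size=1_000_000)

m, s = eta.mean(), eta.std()
Z = (eta - m) / s                  # standardized variable
k = abs(m) / s                     # k = |E(eta)| / D(eta)
A = s**2 / (m**2 + s**2)           # A(eta) = D^2(eta) / (E^2(eta) + D^2(eta))

print(f"P(Z >= k)   ~ {np.mean(Z >= k):.4f}  <=   A(eta) = {A:.4f}")
print(f"P(|Z| >= k) ~ {np.mean(np.abs(Z) >= k):.4f}  <= 2 A(eta) = {2 * A:.4f}")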
MODIFIED FORMS OF THE MARKOV INEQUALITY

Consider the Markov inequality P{ η ≥ ε } ≤ E(η) / ε for η ≥ 0 and ε > 0. Replacing ε by ε E^(1/2)(η²) one gets

        P{ η ≥ ε E^(1/2)(η²) } ≤ E(η) / (ε E^(1/2)(η²)),

and for ε = 1

        P{ η ≥ E^(1/2)(η²) } ≤ E(η) / E^(1/2)(η²) = B^(1/2)(η).
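A minimal check of the ε = 1 case for a nonnegative variable (the Poisson distribution and sample size are illustrative choices):

import numpy as np

rng = np.random.default_rng(4)
eta = rng.poisson(lam=3.0, size=1_000_000).astype(float)   # nonnegative, finite E(eta^2)

m2 = np.mean(eta**2)
bound = eta.mean() / np.sqrt(m2)        # B^(1/2)(eta) = E(eta) / E^(1/2)(eta^2)
tail = np.mean(eta >= np.sqrt(m2))      # P{ eta >= E^(1/2)(eta^2) }
print(f"empirical tail {tail:.4f} <= B^(1/2)(eta) = {bound:.4f}")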
Note on the comparison of normal and uniform distributions. For the normal distribution N(E(η), D²(η)), the random variable η lies with probability 0.99730 in the neighbourhood of the expected value E(η) of half-width 3 D(η), i.e. in a range of T = 6 D(η) (Prékopa, cf. p. 224). A uniformly distributed random variable with expected value E(η) and range T = 6 D(η) has variance T²/12 = 3 D²(η), i.e. it is U(E(η), 3 D²(η))-distributed: its variance is three times that of the normal distribution, and then A(η) = 3 D²(η) / (E²(η) + 3 D²(η)) = 1 - B(η). (The discussion of the normal and uniform distributions, and of some similar families, with the inequality under consideration differs from that of other distributions, because a fixed expected value E(η) can be paired with several finite standard deviations D(η).)
Thus, comparing a normal distribution with the uniform distribution that covers its 0.99730-probability range, the value C(η) = E²(η) / D²(η) of the normal distribution is three times the C(η) of the uniform distribution with the same expected value. The values of C(η) in the table below can be used to evaluate the values of B(η) and A(η) of other distributions.
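Numerically, with illustrative values E(η) = 10 and D(η) = 2 (our own choices):

E, D = 10.0, 2.0                       # normal distribution N(E, D^2)
var_uniform = (6 * D)**2 / 12          # uniform over a width-6D interval: variance 3 D^2
C_normal = E**2 / D**2                 # C = E^2 / D^2
C_uniform = E**2 / var_uniform
print(C_normal, C_uniform, C_normal / C_uniform)   # 25.0, 8.33..., ratio 3.0

The ratio C_normal / C_uniform = 3 holds for any E(η) and D(η), since the matched uniform distribution always has three times the variance.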



APPENDIX: COMPARISON OF A(η), B(η) AND C(η) = E²(η) / D²(η) FOR DISTRIBUTIONS
A(η) = 1 / (1 + C(η)), with 0 < A(η) < 1 and 0 < B(η) < 1. The measures A(η) and B(η) are most sensitive in the middle of the (0, 1) interval; smaller A(η) and larger B(η) values correspond to higher information content.
The coefficients A(η) = D²(η) / (E²(η) + D²(η)), B(η) = E²(η) / (E²(η) + D²(η)) and C(η) = E²(η) / D²(η) are computed for some distributions. The parameters p, λ and n are given: p and λ describe the probability or rate of the event highlighted in the distribution (the definition of the event varies with the distribution considered) and n is the number of trials. The expected value and the variance are given in parentheses after the name of the distribution.
 
Distribution (E(η), D²(η))              A(η)              B(η)            C(η)
Geometric (1/p, (1-p)/p²)               (1-p)/(2-p)       1/(2-p)         1/(1-p)
Indicator (Bernoulli) (p, p(1-p))       1-p               p               p/(1-p)
Binomial (np, np(1-p))                  (1-p)/(np+1-p)    np/(np+1-p)     np/(1-p)
Negative binomial (n/p, n(1-p)/p²)      (1-p)/(1-p+n)     n/(1+n-p)       n/(1-p)
Exponential (1/λ, 1/λ²)                 1/2               1/2             1
Gamma (n/λ, n/λ²)                       1/(1+n)           n/(1+n)         n
Poisson (λ, λ)                          1/(1+λ)           λ/(1+λ)         λ
Inference based on the formulas A(η) = D²(η) / (E²(η) + D²(η)) and B(η) = 1 - A(η): for given p, λ and n, the minimum of A(η) can be calculated and compared using the above inequalities. For example, among normal distributions whose variance equals that of the negative binomial distribution above, those with expected value greater than n/p have higher information content, i.e. are more concentrated around the expected value relative to their spread, so the measure B(η) is larger.
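The table entries can be reproduced mechanically from the (expected value, variance) pairs; a sketch with illustrative parameter values (p = 1/4, λ = 2 and n = 5 are our own choices):

from fractions import Fraction as F

def ABC(mean, var):
    # A = D^2/(E^2 + D^2), B = E^2/(E^2 + D^2), C = E^2/D^2
    m2 = mean**2 + var                 # E(eta^2)
    return var / m2, mean**2 / m2, mean**2 / var

p, lam, n = F(1, 4), F(2), 5           # illustrative parameter values

rows = {
    "geometric":         (1 / p, (1 - p) / p**2),
    "Bernoulli":         (p, p * (1 - p)),
    "binomial":          (n * p, n * p * (1 - p)),
    "negative binomial": (n / p, n * (1 - p) / p**2),
    "exponential":       (1 / lam, 1 / lam**2),
    "gamma":             (n / lam, n / lam**2),
    "Poisson":           (lam, lam),
}
for name, (mean, var) in rows.items():
    A, B, C = ABC(mean, var)
    print(f"{name:17s} A = {A}   B = {B}   C = {C}")

With p = 1/4 the geometric row gives A = 3/7 = (1-p)/(2-p), B = 4/7 = 1/(2-p) and C = 4/3 = 1/(1-p), matching the table.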