Variants of the Markov inequality in engineering practice
(April 2023)
ABSTRACT
We describe four versions of the Markov inequality and two variants of the Cantelli inequality, the Cantelli inequalities being the sharper ones.
Variants of the Markov inequality
The inequality gives an upper bound on the probability that the value of a non-negative random variable is greater than or equal to a given threshold ε.
If X ≥ 0 is a real random variable, E[X] denotes its expected value, and ε > 0, then:
P {X ≥ ε } ≤ E[X] / ε .
I. If X ≥ 0, a further variant is obtained by replacing the parameter ε with the quantity ε D[X], i.e. ε times the standard deviation D[X] (the square root of the variance), or equivalently by replacing X with the relative variable X/D[X]:
 P {X ≥ ε D[X] } ≤ E[X] / (ε D[X]).
The relative expected value E[X]/D[X] is proportional to the information content (peakedness) around the expected value of the distribution, so for random variables with smaller variance we obtain a larger bound on the probability, although at a lower threshold value ε D[X]. The variant for the relative variable can be rewritten as P {X/D[X] ≥ ε } ≤ E[X] / (ε D[X]). For the variable Y = X/D[X], it holds that D[Y] = 1.
II. According to another version (https://en.wikipedia.org/wiki/Markov%27s_inequality), with the substitution ε → λ E[X], where λ > 0, we obtain P {X ≥ λ E[X] } ≤ 1/λ.
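The Markov bound in the form P {X ≥ λ E[X] } ≤ 1/λ can be checked numerically. The following is a minimal sketch, not part of the original text: the exponential sample, the seed and the helper names `markov_bound` and `tail_frequency` are assumptions chosen only for illustration.

```python
import random

def markov_bound(mean, eps):
    """Upper bound E[X]/eps on P{X >= eps} for a non-negative X."""
    return mean / eps

def tail_frequency(samples, eps):
    """Empirical estimate of P{X >= eps}."""
    return sum(x >= eps for x in samples) / len(samples)

random.seed(0)  # fixed seed so the sketch is reproducible
# Assumption: exponential samples with E[X] = 2 stand in for any X >= 0.
samples = [random.expovariate(0.5) for _ in range(100_000)]
mean = sum(samples) / len(samples)

for lam in (1.5, 2.0, 4.0):
    eps = lam * mean  # threshold expressed as a multiple of the mean
    freq = tail_frequency(samples, eps)
    bound = markov_bound(mean, eps)  # equals 1/lam
    print(f"lambda={lam}: empirical {freq:.4f} <= bound {bound:.4f}")
```

Because the sample mean is used as E[X], the bound holds exactly for the empirical distribution of the sample, so the printed frequency never exceeds 1/λ.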
III. Consider the substitution X → (X - E(X))² and ε → a², where a > 0; the Chebyshev inequality is then obtained, in which Var(X) = D²(X) is a constant:
 P { abs (X - E(X)) ≥ a } ≤ Var(X) / a².
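IV. Further, with the substitution a² → λ² Var(X), we obtain
 P { abs (X - E(X)) ≥ λ D(X) } ≤ 1/λ², i.e. P { abs (Z) ≥ λ } ≤ 1/λ²,
 where Z = (X - E(X))/D(X) denotes the standardized random variable. The table below lists this bound for several multiples k (= λ) of the standard deviation.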
| k | Min. % within k standard deviations of mean | Max. % beyond k standard deviations from mean | 
|---|---|---|
| 1 | 0% | 100% | 
| √2 | 50% | 50% | 
| 1.5 | 55.56% | 44.44% | 
| 2 | 75% | 25% | 
| 2√2 | 87.5% | 12.5% | 
| 3 | 88.8889% | 11.1111% | 
| 4 | 93.75% | 6.25% | 
| 5 | 96% | 4% | 
| 6 | 97.2222% | 2.7778% | 
| 7 | 97.9592% | 2.0408% | 
| 8 | 98.4375% | 1.5625% | 
| 9 | 98.7654% | 1.2346% | 
| 10 | 99% | 1% | 
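The rows of the table follow directly from the bound 1/k². A short sketch that recomputes them is given here as an illustration; the function name `chebyshev_row` is hypothetical and not from the original text.

```python
import math

def chebyshev_row(k):
    """Return (min % within, max % beyond) k standard deviations, from 1/k^2."""
    beyond = min(1.0, 1.0 / k**2) * 100.0
    return 100.0 - beyond, beyond

# Same multiples of the standard deviation as in the table.
for k in (1, math.sqrt(2), 1.5, 2, 2 * math.sqrt(2), 3, 4, 5, 6, 7, 8, 9, 10):
    within, beyond = chebyshev_row(k)
    print(f"k={k:.4f}  within >= {within:.4f}%  beyond <= {beyond:.4f}%")
```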
V. The Cantelli inequality gives a sharper bound than the previous inequalities for one-sided deviations (https://en.wikipedia.org/wiki/Cantelli%27s_inequality): for λ > 0 and σ = D(X),
 Pr { X - E(X) ≥ λ } ≤ σ² / (σ² + λ²),
or, by the substitution λ → λσ, one obtains
 Pr { Z ≥ λ } ≤ 1 / (1 + λ²),
 the form which is valid for the standardized variable Z.
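To see how much sharper the one-sided Cantelli bound is, the two bounds for a standardized variable can be tabulated side by side. This is an illustrative sketch only; the function names are assumptions, and the Chebyshev bound 1/λ² is used here as a (weaker) bound on the same one-sided tail.

```python
def chebyshev_two_sided(lam):
    """Bound on P{|Z| >= lam} for a standardized Z; it also bounds P{Z >= lam}."""
    return min(1.0, 1.0 / lam**2)

def cantelli_one_sided(lam):
    """Bound on P{Z >= lam} for a standardized Z and lam > 0."""
    return 1.0 / (1.0 + lam**2)

for lam in (0.5, 1.0, 2.0, 3.0):
    print(f"lambda={lam}: Chebyshev {chebyshev_two_sided(lam):.4f}, "
          f"Cantelli {cantelli_one_sided(lam):.4f}")
```

For every λ > 0 the Cantelli value is strictly smaller, and unlike 1/λ² it stays below 1 even for λ ≤ 1.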
In Cantelli's inequality, the substitution λ → D(Z²) gives a new inequality:
 Pr { Z ≥ D[Z²] } ≤ 1/κ,
where κ = E(Z⁴) = D²(Z²) + 1 denotes the flatness (kurtosis) of a standardized variable (Moors, J. J. A. (1986), "The meaning of kurtosis: Darlington reexamined", The American Statistician, 40 (4): 283-284). The identity follows from D²(Z²) = E(Z⁴) - [E(Z²)]² with E(Z²) = 1. In general the flatness has the value κ = μ₄/σ⁴, and for standardized variables σ = 1.
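The relation between the flatness and the variance of Z², and the resulting bound 1/κ, can be illustrated on a standardized sample. The sketch below is an assumption-laden example: the uniform sample, the seed and the helper `standardize` are not from the original text.

```python
import math
import random

def standardize(xs):
    """Rescale a sample to zero mean and unit (population) variance."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / std for x in xs]

random.seed(1)
# Assumption: a uniform sample is used only as an illustration.
z = standardize([random.random() for _ in range(50_000)])
n = len(z)

kappa = sum(v ** 4 for v in z) / n                 # E[Z^4], the flatness
var_z2 = kappa - (sum(v ** 2 for v in z) / n) ** 2  # D^2[Z^2]
threshold = math.sqrt(var_z2)                      # D[Z^2]
freq = sum(v >= threshold for v in z) / n          # empirical P{Z >= D[Z^2]}

print(f"kappa = {kappa:.4f}, D^2[Z^2] + 1 = {var_z2 + 1:.4f}")
print(f"P{{Z >= D[Z^2]}} ~ {freq:.4f} <= 1/kappa = {1 / kappa:.4f}")
```

For the uniform case the flatness is close to 9/5, so the bound 1/κ is close to 5/9, while the observed frequency is considerably smaller.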
 *
The odd-order central moments of symmetric distributions are zero (provided they exist), since in the sum the terms computed from values less than the expected value and the terms computed from values greater than the expected value cancel each other.
**
Comparing normal and uniform distributions, which are absolutely continuous distributions with a symmetric density function* around the expected value: for the normal distribution N(E(η), D²(η)), the random variable η falls with probability 0.99730 within ±3 D(η) of the expected value E(η), so its spread is T = 6 D(η). A random variable with a uniform distribution whose expected value is E(η) and whose spread is T = 6 D(η) has variance T²/12 = 3 D²(η), i.e. it follows a uniform distribution U(E(η), 3D²(η)): its variance is three times the variance of the normal distribution, and its standard deviation is √3 times larger. The sample estimates of E(η) and D(η) are independent random variables only in the normal case; in an important practical case, when E(η) = 0, the standard deviation of the uniform distribution is independent, which is an important statement for the Kalman filter.
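The arithmetic behind this comparison can be verified in a few lines. The sketch assumes a unit standard deviation for the normal variable; the function names are hypothetical.

```python
import math

def normal_prob_within(k):
    """P{|eta - E(eta)| <= k D(eta)} for a normal variable."""
    return math.erf(k / math.sqrt(2))

def uniform_variance(width):
    """Variance T^2/12 of a uniform distribution of width T."""
    return width ** 2 / 12

sigma = 1.0                  # D(eta) of the normal variable (assumed unit value)
T = 6 * sigma                # spread covering +-3 D(eta)
var_u = uniform_variance(T)

print(f"{normal_prob_within(3):.5f}")        # probability within +-3 D(eta)
print(var_u / sigma**2)                      # variance ratio uniform / normal
print(math.sqrt(var_u) / sigma)              # standard deviation ratio
```

The variance ratio is 3 and the standard deviation ratio is √3, in line with T²/12 = 3 D²(η).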
**
On independence: two events are independent if the probability of their occurring together is the product of their probabilities; a set of elementary events is pairwise independent if this holds for every pair. An equivalent definition, when the conditioning event has positive probability: the conditional probability of an event is equal to the probability of the event. Successive events that do not occur together form a sequence.
In the case of sequences, in which the elementary events are repeated, the following definitions apply: if the new element (or elements) depends only on the last element and is independent of the earlier elements, then the series is a Markov process. If the new element is independent of the last element, and also of the earlier elements, then the series is called a memoryless series. It can be proved that in the latter case the waiting times (run lengths) in the series have a geometric distribution in the discrete case and an exponential distribution in the continuous case.
An example in the discrete case: let the number of elementary events be b, which for sequences can be the base of a number system, and let the events be uniformly distributed with probability 1/b. Then, in an independent sequence (i.e. a sequence of random numbers), the lengths k of runs (patterns) of a repeated elementary event are geometrically distributed:
 P(k) = ((b-1)/b) (1/b)^(k-1), k = 1, 2, 3, ...,
where the parameter of the distribution is p = (b-1)/b and the expected value of the run length is b/(b-1), the reciprocal of the parameter. The stated expected value holds on condition that kmax is countably infinite, the sum of the probabilities, 1 - b^(-kmax), then tending to 1. Among all discrete probability distributions supported on {1, 2, 3, ... } with given expected value μ, the geometric distribution X with parameter p = 1/μ is the one with the largest entropy.
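The partial sum 1 - b^(-kmax) and the expected length b/(b-1) can be checked with exact rational arithmetic. This is a minimal sketch; the choice b = 10 (decimal digits), the truncation kmax = 12 and the name `run_length_pmf` are assumptions made for illustration.

```python
from fractions import Fraction

def run_length_pmf(b, k):
    """P(length = k) = ((b-1)/b) * (1/b)**(k-1) for b equally likely symbols."""
    return Fraction(b - 1, b) * Fraction(1, b) ** (k - 1)

b, k_max = 10, 12   # assumed illustration: decimal digits, truncated at k_max
total = sum(run_length_pmf(b, k) for k in range(1, k_max + 1))
mean = sum(k * run_length_pmf(b, k) for k in range(1, k_max + 1))

# The partial sum of the probabilities equals 1 - b**(-k_max) exactly.
print(total == 1 - Fraction(1, b) ** k_max)
# The truncated mean approaches b/(b-1) as k_max grows.
print(float(mean), b / (b - 1))
```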
For the exponential distribution: among all continuous probability distributions with support [0, ∞) and mean μ, the exponential distribution with λ = 1/μ has the largest differential entropy. In other words, it is the maximum entropy probability distribution for a random variate η which is greater than or equal to zero and for which E(η) is fixed (https://en.wikipedia.org/wiki/Exponential_distribution).
Definition: the peakedness is equal to σ⁴/μ₄, where μ₄ denotes the fourth central moment; it is the reciprocal of the flatness κ.
The peakedness has the value 1/3 for the normal distribution, 5/9 for the uniform distribution, 1/9 for the exponential distribution, 1/6 for the Laplace distribution (https://en.wikipedia.org/wiki/Laplace_distribution), p(1 - p) / [1 - 3p(1 - p)] for the Bernoulli distribution, (1 - p)/(p² - 9p + 9) for the geometric distribution (https://mathworld.wolfram.com/GeometricDistribution.html), 5/21 for the logistic distribution, and λ/(1 + 3λ) for the Poisson distribution (https://proofwiki.org/wiki/Excess_Kurtosis_of_Poisson_Distribution).
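For discrete distributions the peakedness σ⁴/μ₄ can be computed directly from the probability mass function and compared with the closed forms for the geometric and Poisson cases. The sketch below is illustrative: the parameter values, the truncation of the support at 400 terms and the function names are assumptions.

```python
import math

def peakedness(pmf, support):
    """sigma^4 / mu_4 for a discrete distribution given by pmf on support."""
    mean = sum(k * pmf(k) for k in support)
    var = sum((k - mean) ** 2 * pmf(k) for k in support)
    mu4 = sum((k - mean) ** 4 * pmf(k) for k in support)
    return var ** 2 / mu4

p, lam = 0.3, 2.0            # assumed illustrative parameters
support = range(0, 400)      # truncated support; the neglected tail is negligible here

def geometric_pmf(k):
    """Geometric distribution on {1, 2, 3, ...} with parameter p."""
    return p * (1 - p) ** (k - 1) if k >= 1 else 0.0

def poisson_pmf(k):
    """Poisson distribution with mean lam, evaluated via logarithms."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

print(peakedness(geometric_pmf, support), (1 - p) / (p**2 - 9 * p + 9))
print(peakedness(poisson_pmf, support), lam / (1 + 3 * lam))
```

Each printed pair should agree to within floating-point rounding.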
  
