MLH test for randomness based on the range of a pattern with uniform distribution

MLH test for the expected value based on the range of a pattern with uniform distribution

Abstract

The digits (figures) of random numbers with uniform distribution take the values 0, 1, 2, …, b-1 in base b. Each digit has relative frequency 1/b. It follows that the subsequences of length k, k = 1, 2, 3, …, have a geometric distribution with probabilities (1-1/b) b^(1-k) in the countably infinite case. Let k_max be finite; it is equal to the range of the pattern.

Then the sum of the probabilities (1-1/b) b^(1-k) has the form

Σ_k (b-1)/b ⋅ b^(1-k) = 1 - b^(-k_max), k = 1, 2, 3, …, k_max.

The readout estimate of the frequency of singles, θ² = (b-1)²/b², is the simplest estimate and yields a test, since an event with probability (b-1)/b follows a Bernoulli distribution (https://en.wikipedia.org/wiki/Bernoulli_distribution) in the binary case. One of the main points is that the expected value is the reciprocal of the probability of singles, (b-1)/b.

Introduction

Many tests are available for the randomness of sequences with independent digits. The proposed test may be one of the simplest. The random numbers considered are analogous to the normal numbers of Borel, restricted to the length k_max (which is equal to the range of the pattern).

The random variable X is the length k of a subsequence with a given pattern. For simple readout, the distribution of homogeneous subsequences is considered: any pattern of length k, even a rational one, has the same probability. In the countably infinite case the distribution has the form

P_k(X = k) = (1 - 1/b) / b^(k-1), k = 1, 2, 3, …,

since patterns of length k = 1 have probability 1 - 1/b. The next digit has probability 1/b of being equal to the previous one and probability 1 - 1/b of being different. Thus the homogeneous patterns have the probabilities (b-1)/b ⋅ 1/b^(k-1) for k = 1, 2, 3, …, and

Σ_k P_k(X = k) = (b-1)/b + (b-1)/b² + ... = 1.

Lemma: In the finite case,

Σ_k P_k(X = k) = (1 - b^(-k_max))^(-1) (1 - b^(-1)) Σ_k 1/b^(k-1) = 1,

since (1 - b^(-1)) Σ_k b^(-k+1) = 1 - b^(-k_max), k = 1, 2, 3, …, k_max.
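The two sums in the lemma can be checked with exact rational arithmetic. The following is a minimal sketch; the function names and the values b = 10, k_max = 6 are illustrative choices, not taken from the text.

```python
from fractions import Fraction

def geometric_pmf(k, b):
    """Untruncated probability (1 - 1/b) / b**(k - 1) of a pattern of length k."""
    return (1 - Fraction(1, b)) / Fraction(b) ** (k - 1)

def truncated_pmf(k, b, k_max):
    """The same probability renormalized on k = 1..k_max, as in the lemma."""
    return geometric_pmf(k, b) / (1 - Fraction(1, b) ** k_max)

b, k_max = 10, 6  # illustrative values
partial = sum(geometric_pmf(k, b) for k in range(1, k_max + 1))
total = sum(truncated_pmf(k, b, k_max) for k in range(1, k_max + 1))

print(partial == 1 - Fraction(1, b) ** k_max)  # partial sum equals 1 - b^(-k_max)
print(total == 1)                              # renormalized sum equals 1
```

Using Fraction avoids rounding, so both comparisons are exact identities rather than floating-point approximations.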

Note 1: Any pattern of length k has the same frequency as the homogeneous pattern of that length. The readout frequency of singles is (b-1)²/b², since the digit before and the digit after a single digit each differ from it with frequency (b-1)/b; this follows from the way the readout is done. The expected length is E(X) = b/(b-1).
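Both values in Note 1 can be illustrated by simulation. The sketch below assumes that a homogeneous pattern is a maximal run of equal digits and that a single is a digit differing from both of its neighbours; Python's pseudo-random generator stands in for the random source, and the seed and sample size are arbitrary.

```python
import random

def run_lengths(digits):
    """Lengths of the maximal runs of equal consecutive digits."""
    lengths, current = [], 1
    for prev, nxt in zip(digits, digits[1:]):
        if nxt == prev:
            current += 1
        else:
            lengths.append(current)
            current = 1
    lengths.append(current)
    return lengths

def singles_frequency(digits):
    """Fraction of interior positions whose two neighbours both differ."""
    hits = sum(
        1
        for i in range(1, len(digits) - 1)
        if digits[i - 1] != digits[i] != digits[i + 1]
    )
    return hits / (len(digits) - 2)

rng = random.Random(0)   # arbitrary seed, for reproducibility only
b, n = 10, 200_000       # illustrative base and sample size
digits = [rng.randrange(b) for _ in range(n)]

lengths = run_lengths(digits)
mean_run = sum(lengths) / len(lengths)
single_freq = singles_frequency(digits)

print(mean_run, b / (b - 1))              # empirical mean vs E(X) = b/(b-1)
print(single_freq, ((b - 1) / b) ** 2)    # readout frequency vs theta^2
```

The printed pairs should agree up to sampling error; the exact figures depend on the seed.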

Note 2: In the uncountably infinite binary case, b = 2, only the Haar measure is normalizable.

Randomness test

Statement of the problem: for Borel normal numbers (http://mathworld.wolfram.com/NormalNumber.html), the subsequences of length k_max have the frequency (b-1) b^(-k_max) in base b. We assume that the range of the pattern, k_max, is given.

Computing the distribution of the length-k subsequences with homogeneous patterns: the finite sequences are called semi-normal strings. The semi-normal sequences have a geometric distribution f(k,θ) = θ/b^(k-1) with the parameter θ = (b-1)/b in base b. In the finite case of semi-normal strings the distribution satisfies

Σ_k f(k,θ) = θ Σ_k 1/b^(k-1) = 1 - b^(-k_max), k = 1, 2, 3, …, k_max.

This distribution has the expected value θ^(-1), which is called the maximum entropy constraint (https://en.wikipedia.org/wiki/Maximum_entropy_probability_distribution). In the maximum entropy case the parameter θ is also equal to the probability of singles. The readout estimate of the frequency of singles, θ² = (b-1)²/b², is the simplest estimate and yields a test, since an event with probability (b-1)/b follows a Bernoulli distribution.

Statement of the Estimation Problem

The empirical mean of the subsequence lengths k is the maximum likelihood (MLH) estimate of the unknown parameter θ^(-1). For k ≥ 2, the subsequences of the pattern are read out with a string comparison command. The random sample is assumed to follow a geometric distribution with the mean θ^(-1); the task is then to find an MLH estimate of θ.
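The estimate can be sketched in a few lines. Here itertools.groupby stands in for the string comparison command mentioned above (an assumption, since the text does not name a specific command), and the binary string is a hypothetical example rather than data from the text.

```python
from itertools import groupby

def run_lengths_from_string(s):
    """Read out the lengths of maximal runs of equal characters."""
    return [len(list(group)) for _, group in groupby(s)]

def mle_theta(lengths):
    """MLH estimate of theta for a geometric sample: 1 / (empirical mean)."""
    return len(lengths) / sum(lengths)

digits = "0110100111010010"   # hypothetical binary string, b = 2
lengths = run_lengths_from_string(digits)
theta_hat = mle_theta(lengths)

print(lengths)
print(theta_hat, "expected under uniformity:", (2 - 1) / 2)
```

With so short a string the estimate is only illustrative; the last run is also cut off by the end of the string, which a careful readout would treat separately.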

The estimated value of θ maximizes the likelihood, i.e. the joint probability belonging to the semi-normal sequence of length k_max. The probability function of each k is f(k,θ). Because of independence, the joint probability function is L(θ) = f(1,θ)⋅f(2,θ)⋯f(k_max,θ).

The value of θ that maximizes L(θ) is determined by the known method. Because the logarithm is monotonically increasing, the value of θ that maximizes ln L(θ) or log_b L(θ) also maximizes L(θ).
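For reference, the standard calculation can be written out as follows. It assumes an independent sample k_1, …, k_n of observed lengths and the untruncated form f(k,θ) = θ(1-θ)^(k-1) with 1-θ = 1/b; the sample notation k_i, n and the mean k̄ are introduced here for illustration and do not appear in the text, which writes the likelihood as a product over k = 1, …, k_max.

```latex
\begin{align*}
\ln L(\theta) &= \sum_{i=1}^{n} \ln\bigl[\theta\,(1-\theta)^{k_i-1}\bigr]
  = n \ln\theta + \Bigl(\sum_{i=1}^{n} k_i - n\Bigr)\ln(1-\theta), \\
\frac{d \ln L}{d\theta} &= \frac{n}{\theta} - \frac{\sum_{i=1}^{n} k_i - n}{1-\theta} = 0
  \quad\Longrightarrow\quad
  \hat\theta = \frac{n}{\sum_{i=1}^{n} k_i} = \frac{1}{\bar k}.
\end{align*}
```

Under these assumptions the maximizer is the reciprocal of the empirical mean, in agreement with the statement that the empirical mean estimates θ^(-1).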

Likelihood Function and Maximum Entropy

Usually the mean of ln L(θ) is computed with the slope 1/k_max of the pattern. Computing the mean with f(k,θ) in place of 1/k_max gives

H = - Σ_k f(k,θ) ln f(k,θ), k = 1, 2, 3, …, k_max,

where Σ_k f(k,θ) = 1 - b^(-k_max). The readout estimate of the frequency of singles can be compared with the true value θ² = (b-1)²/b², since in the infinite case the probabilities of the sequences of length k ≥ 2 sum to 1/b and the probability of singles is 1 - 1/b. (In the case b = 2 both have the value 1/2.)
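The sum H is easy to evaluate numerically. The sketch below uses f(k,θ) = θ/b^(k-1) without renormalization, as in the formula above, and compares it with the standard closed form for the entropy of an untruncated geometric distribution; the function names and the chosen values of b and k_max are illustrative.

```python
import math

def f(k, b):
    """f(k, theta) = theta / b**(k - 1) with theta = (b - 1) / b."""
    theta = (b - 1) / b
    return theta / b ** (k - 1)

def entropy(b, k_max):
    """H = -sum_k f(k, theta) ln f(k, theta) over k = 1..k_max."""
    return -sum(f(k, b) * math.log(f(k, b)) for k in range(1, k_max + 1))

def geometric_entropy(b):
    """Closed form for k_max -> infinity (entropy of a geometric distribution)."""
    theta = (b - 1) / b
    return -(theta * math.log(theta) + (1 - theta) * math.log(1 - theta)) / theta

for b, k_max in [(2, 60), (10, 40)]:
    print(b, k_max, entropy(b, k_max), geometric_entropy(b))
```

For b = 2 the limit is 2 ln 2; for moderate k_max the truncated sum is already very close to the closed form because the omitted terms decay like b^(-k).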

Conclusions

The normality of Borel numbers can be tested easily, either by the point estimate of θ = 1 - 1/b or by the readout value θ² of the frequency of singles, which has a binomial distribution. The likelihood-ratio test known as the G-test (https://en.wikipedia.org/wiki/G-test) can be applied; G-tests are increasingly used in situations where chi-squared tests were previously recommended. (The R programming language has the likelihood.test function in the Deducer package. In SAS, a G-test can be conducted with the /chisq option of proc freq. In Stata, a G-test can be conducted with the lr option of the tabulate command.)
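As an alternative to the packages listed above, a G-test of the singles count can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the author's procedure: singles are counted only at every third position, so that the three-digit windows do not overlap and the trials are independent, and the p-value uses the chi-squared distribution with one degree of freedom. The function names and the counts in the usage lines are hypothetical.

```python
import math

def count_singles(digits, step=3):
    """Count singles at positions 1, 1+step, ...; step=3 keeps windows disjoint."""
    positions = range(1, len(digits) - 1, step)
    hits = sum(1 for i in positions if digits[i - 1] != digits[i] != digits[i + 1])
    return hits, len(positions)

def g_test_singles(n_single, n_total, b):
    """G-test of the observed singles count against theta^2 = ((b-1)/b)^2."""
    p0 = ((b - 1) / b) ** 2
    observed = [n_single, n_total - n_single]
    expected = [n_total * p0, n_total * (1 - p0)]
    g = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    g = max(g, 0.0)  # guard against tiny negative rounding errors
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(g / 2))
    return g, p_value

# Hypothetical counts for b = 10: 810 singles in 1000 trials matches theta^2 = 0.81.
print(g_test_singles(810, 1000, 10))
# A hypothetical deficit of singles gives a large G and a small p-value.
print(g_test_singles(700, 1000, 10))
```

A small p-value would suggest that the frequency of singles differs from θ², and hence that the digit sequence departs from the uniform, independent model.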
