MLH test for randomness of singles and of the expected values

 

based on the range of a pattern with uniform distribution

 

Abstract
The digits of uniformly distributed random numbers in base $b$ take the values $0, 1, 2, \ldots, b-1$. Subsequences of length $k$, $k = 1, 2, 3, \ldots$, have a geometric distribution with probabilities $(1-1/b)\,b^{1-k}$ in the countably infinite case. Let $k_{\max}$ be finite and equal to the range of the pattern.
Then the probabilities $(1-1/b)\,b^{1-k}$ sum to

$$\sum_{k=1}^{k_{\max}} \frac{b-1}{b}\, b^{1-k} = 1 - b^{-k_{\max}}.$$
The readout estimate of the frequency of singles, $\theta^2 = (b-1)^2/b^2$ with $\theta = (b-1)/b$, is the simplest estimate and also serves as a test, since it has a Bernoulli distribution (https://en.wikipedia.org/wiki/Bernoulli_distribution).

                                              

Introduction

Many tests are available for the randomness of sequences with independent digits. The one proposed here may be among the simplest. The random numbers considered are analogous to the normal numbers of Borel with length $k_{\max}$ (which is equal to the range of the pattern).

The random variable $X$ is the length $k$ of a subsequence with any pattern. For simple readout, the distribution of homogeneous subsequences is considered: any pattern of length $k$, even a rational one, has the same probability. In the countably infinite case the distribution function has the form

$$P_k(X=k) = \frac{1-1/b}{b^{k-1}}, \quad k = 1, 2, 3, \ldots,$$

since patterns of length $k = 1$ have probability $1 - 1/b$. The next digit has probability $1/b$ of being equal to the preceding one and probability $1 - 1/b$ of being different. Thus the homogeneous patterns have the probabilities $\frac{b-1}{b} \cdot \frac{1}{b^{k-1}}$ for $k = 1, 2, 3, \ldots$, and

$$\sum_{k=1}^{\infty} P(X=k) = \frac{b-1}{b} + \frac{b-1}{b^2} + \cdots = 1.$$
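The geometric law of the run lengths can be checked empirically. The following is a minimal Python sketch, not part of the original text: the function names `run_lengths` and `geometric_pmf`, the seed, the base `b = 10` and the sample size are all illustrative assumptions. It extracts the maximal runs of equal digits from a pseudo-random digit sequence and prints their relative frequencies next to $(1-1/b)/b^{k-1}$.

```python
import random
from collections import Counter


def run_lengths(digits):
    """Return the lengths of the maximal runs of equal digits (hypothetical helper)."""
    if not digits:
        return []
    lengths = []
    count = 1
    for prev, cur in zip(digits, digits[1:]):
        if cur == prev:
            count += 1
        else:
            lengths.append(count)
            count = 1
    lengths.append(count)  # close the final run
    return lengths


def geometric_pmf(k, b):
    """P(X = k) = (1 - 1/b) / b**(k - 1) for k = 1, 2, 3, ..."""
    return (1 - 1 / b) / b ** (k - 1)


if __name__ == "__main__":
    rng = random.Random(12345)  # arbitrary fixed seed (assumption)
    b = 10                      # illustrative base (assumption)
    digits = [rng.randrange(b) for _ in range(200_000)]
    lengths = run_lengths(digits)
    freq = Counter(lengths)
    for k in range(1, 5):
        # empirical frequency of runs of length k vs. the geometric probability
        print(k, freq[k] / len(lengths), geometric_pmf(k, b))
```

The two printed columns should agree up to sampling noise when the digits are independent and uniform.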

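Lemma: In the finite case,

$$\sum_{k=1}^{k_{\max}} P_k(X=k) = \left(1-b^{-k_{\max}}\right)^{-1} \left(1-b^{-1}\right) \sum_{k=1}^{k_{\max}} \frac{1}{b^{k-1}} = 1,$$

since

$$\left(1-b^{-1}\right) \sum_{k=1}^{k_{\max}} b^{-k+1} = 1 - b^{-k_{\max}}, \quad k = 1, 2, 3, \ldots, k_{\max}.$$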
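The lemma can be verified exactly for small parameters. The sketch below uses Python's `fractions.Fraction` for exact rational arithmetic; the function names `partial_sum` and `truncated_pmf` are hypothetical and introduced only for this illustration.

```python
from fractions import Fraction


def partial_sum(b, kmax):
    """(1 - 1/b) * sum_{k=1}^{kmax} b**(1 - k), before normalization."""
    theta = 1 - Fraction(1, b)
    return sum(theta * Fraction(1, b ** (k - 1)) for k in range(1, kmax + 1))


def truncated_pmf(k, b, kmax):
    """Normalized finite-case probability P_k(X = k) for k = 1..kmax."""
    norm = 1 - Fraction(1, b ** kmax)  # 1 - b**(-kmax)
    return (1 - Fraction(1, b)) * Fraction(1, b ** (k - 1)) / norm


if __name__ == "__main__":
    for b, kmax in [(2, 5), (10, 3), (16, 8)]:
        # the partial sum equals 1 - b**(-kmax); the normalized sum equals 1
        print(b, kmax, partial_sum(b, kmax),
              sum(truncated_pmf(k, b, kmax) for k in range(1, kmax + 1)))
```

Because the arithmetic is exact, the identities hold with equality rather than up to rounding.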

Note 1: Any pattern of length $k$ has the same frequency as the homogeneous pattern of the same length. The readout frequency of singles has the value $(b-1)^2/b^2$, since each of the digits standing before and after a single digit differs from it with frequency $(b-1)/b$, whereas $E(X) = b/(b-1)$; the two values differ because the way of readout is different.
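The readout of singles described in Note 1 can be sketched as follows. This is an illustration under stated assumptions: a "single" is taken to be an interior digit that differs from both of its neighbours, and the function name `singles_frequency`, the seed and the base are hypothetical choices.

```python
import random


def singles_frequency(digits):
    """Fraction of interior positions whose digit differs from both neighbours."""
    n = len(digits)
    if n < 3:
        raise ValueError("need at least three digits")
    singles = sum(
        1
        for i in range(1, n - 1)
        # chained comparison: left neighbour differs AND right neighbour differs
        if digits[i - 1] != digits[i] != digits[i + 1]
    )
    return singles / (n - 2)


if __name__ == "__main__":
    rng = random.Random(7)  # arbitrary fixed seed (assumption)
    b = 10                  # illustrative base (assumption)
    digits = [rng.randrange(b) for _ in range(200_000)]
    # empirical frequency of singles vs. (b - 1)**2 / b**2
    print(singles_frequency(digits), (b - 1) ** 2 / b ** 2)
```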

Note 2: In the uncountably infinite binary case, $b = 2$, the Haar measure is normalizable. Provided that the Haar measure is normalized to one, the frequencies of singles and of sequences are equal to each other and to $1/2$ a priori, which is the inverse of $E(X)$; this is stated without proof.
 

Randomness test

Statement of the problem: in Borel normal numbers (http://mathworld.wolfram.com/NormalNumber.html), subsequences of length $k_{\max}$ have frequencies $b^{-k_{\max}}$ in base $b$. We assume that the range of the pattern, $k_{\max}$, is given.

Computing the distribution of the length-$k$ subsequences with homogeneous patterns: the finite sequences are called semi-normal strings. Semi-normal sequences have a geometric distribution with parameter $\theta = (b-1)/b$ in base $b$. In the finite case of semi-normal strings the distribution has the form

$$\sum_{k=1}^{k_{\max}} f(k,\theta) = \theta \sum_{k=1}^{k_{\max}} \frac{1}{b^{k-1}} = 1 - b^{-k_{\max}}, \quad k = 1, 2, 3, \ldots, k_{\max}.$$

This distribution has the expected value $\theta^{-1}$, which is called the maximum entropy constraint. In the maximum entropy case the parameter $\theta$ is also equal to the probability of singles. The readout estimate of the frequency of singles, $\theta^2 = (b-1)^2/b^2$, is the simplest estimate and test, since it has a Bernoulli distribution (https://en.wikipedia.org/wiki/Maximum_entropy_probability_distribution).

Statement of the Estimation Problem

The empirical mean of the lengths $k$ of the sequences is the maximum likelihood (MLH) estimate of the unknown parameter $\theta^{-1}$. The subsequences of the pattern are read out by a string comparison command when $k \ge 2$. The random sample is assumed to follow a geometric distribution with mean $\theta^{-1}$; the task is then to find an ML estimate of $\theta$.
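The estimation problem stated above can be sketched in Python. This is only an illustration: it assumes the sample consists of the lengths of maximal runs of equal digits, obtained here with `itertools.groupby` rather than an explicit string comparison, and the names `run_lengths` and `mle_theta` are hypothetical.

```python
import random
from itertools import groupby


def run_lengths(digits):
    """Lengths of maximal runs of equal digits, in order of appearance."""
    return [len(list(group)) for _, group in groupby(digits)]


def mle_theta(lengths):
    """ML estimate of theta for a geometric sample: 1 / (empirical mean length)."""
    if not lengths:
        raise ValueError("need at least one run length")
    return len(lengths) / sum(lengths)


if __name__ == "__main__":
    rng = random.Random(2024)  # arbitrary fixed seed (assumption)
    b = 10                     # illustrative base (assumption)
    digits = [rng.randrange(b) for _ in range(100_000)]
    # estimate of theta vs. the value (b - 1) / b expected for uniform digits
    print(mle_theta(run_lengths(digits)), (b - 1) / b)
```

Writing the estimate as the number of runs divided by the number of digits avoids forming the mean first and inverting it.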

The estimated value of $\theta$ maximizes the likelihood, i.e. the joint probability belonging to the semi-normal sequence of length $k_{\max}$. The probability function of each $k$ is $f(k,\theta)$. By independence, the joint probability function is $L(\theta) = f(1,\theta) \cdot f(2,\theta) \cdots f(k_{\max},\theta)$.

The value of $\theta$ that maximizes $L(\theta)$ and $\ln L(\theta)$ is determined by the known method. The value of $\theta$ that maximizes the natural logarithm of the likelihood function, $\ln L(\theta)$ (or $\log_b L(\theta)$), is also the value of $\theta$ that maximizes the likelihood function $L(\theta)$.
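For reference, the standard maximization for a geometric sample can be written out as follows. This is a sketch under assumptions that go beyond the text: an independent sample of observed lengths $k_1, \ldots, k_n$ is assumed (the sample notation $k_i$, $n$, $\bar{k}$ and $\hat{\theta}$ is introduced here for illustration), and the truncation at $k_{\max}$ is ignored.

```latex
\begin{align*}
f(k,\theta) &= \theta\,(1-\theta)^{k-1}, \qquad 1-\theta = \tfrac{1}{b}, \\
\ln L(\theta) &= \sum_{i=1}^{n} \ln f(k_i,\theta)
   = n \ln\theta + \Bigl(\sum_{i=1}^{n} k_i - n\Bigr) \ln(1-\theta), \\
\frac{d}{d\theta} \ln L(\theta) &= \frac{n}{\theta}
   - \frac{\sum_{i=1}^{n} k_i - n}{1-\theta} = 0
   \;\Longrightarrow\;
   \hat{\theta} = \frac{n}{\sum_{i=1}^{n} k_i} = \frac{1}{\bar{k}} .
\end{align*}
```

Under these assumptions the estimate of $\theta^{-1}$ is the empirical mean $\bar{k}$ of the observed lengths.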

 Likelihood Function and Maximum Entropy 

Usually the mean of $\ln L(\theta)$ is computed with the slope $1/k_{\max}$ of the pattern. Computing the mean with $f(k,\theta)$ instead of $1/k_{\max}$ gives

$$H = -\sum_{k=1}^{k_{\max}} f(k,\theta) \ln f(k,\theta), \quad k = 1, 2, 3, \ldots, k_{\max},$$

where $\sum_k f(k,\theta) = 1 - b^{-k_{\max}}$. The readout estimate of the frequency of singles can be used with the true value $\theta^2 = (b-1)^2/b^2$, since the sum of the probabilities of the sequences of length $k \ge 2$ is $1/b$ and the probability of singles is $1 - 1/b$. (In the case of $b = 2$ both have the value $1/2$.)
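The entropy sum can be evaluated numerically. The sketch below is illustrative: it uses the unnormalized $f(k,\theta) = \theta / b^{k-1}$ with $\theta = (b-1)/b$, whose sum is $1 - b^{-k_{\max}}$ as stated above, and the function names `f` and `truncated_entropy` are hypothetical.

```python
import math


def f(k, b):
    """f(k, theta) = theta / b**(k - 1) with theta = (b - 1) / b."""
    theta = (b - 1) / b
    return theta / b ** (k - 1)


def truncated_entropy(b, kmax):
    """H = -sum_{k=1}^{kmax} f(k, theta) * ln f(k, theta), in nats."""
    return -sum(f(k, b) * math.log(f(k, b)) for k in range(1, kmax + 1))


if __name__ == "__main__":
    for kmax in (1, 5, 20):
        # for b = 2 the values approach 2 * ln 2 as kmax grows
        print(kmax, truncated_entropy(2, kmax))
```

For $b = 2$ each term is $2^{-k} \cdot k \ln 2$, so the sum tends to $2 \ln 2$ as $k_{\max}$ grows.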

Conclusions

The normality of Borel numbers can be tested easily, either by the point estimate of $\theta = 1 - 1/b$ or by the readout value $\theta^2$ of the frequency of singles, which has a binomial distribution. The likelihood-ratio test known as the G-test (https://en.wikipedia.org/wiki/G-test) can be applied; G-tests are widely used in situations where chi-squared tests were previously recommended. (The R programming language has the likelihood.test function in the Deducer package. In SAS, one can conduct a G-test by applying the /chisq option after proc freq. In Stata, one can conduct a G-test by applying the lr option after the tabulate command.)
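A G-test on the count of singles might look like the following Python sketch, which uses only the standard library. It rests on simplifying assumptions not made in the text: the positions are treated as independent Bernoulli trials with success probability $(b-1)^2/b^2$ (neighbouring positions share digits, so this is only approximate), and the statistic is compared with a chi-squared distribution with one degree of freedom. The function name `g_test_singles` is hypothetical.

```python
import math


def g_test_singles(n_singles, n_positions, b):
    """Return (G, p_value) for observed singles vs. the expected (b-1)^2 / b^2."""
    p = ((b - 1) / b) ** 2
    observed = [n_singles, n_positions - n_singles]
    expected = [n_positions * p, n_positions * (1 - p)]
    # G = 2 * sum O * ln(O / E); empty cells contribute zero
    g = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    g = max(g, 0.0)  # guard against tiny negative values from rounding
    # survival function of chi-squared with 1 degree of freedom
    p_value = math.erfc(math.sqrt(g / 2))
    return g, p_value


if __name__ == "__main__":
    # illustrative counts only, not measured data
    print(g_test_singles(8_050, 10_000, 10))
```

A small p-value would indicate that the observed frequency of singles is incompatible with uniform independent digits under the stated assumptions.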