MLH test for randomness of singles and of the expected values
based on the range of a pattern with uniform distribution
Many tests are available for the randomness of sequences of independent digits. The one proposed here may be among the simplest. The random numbers considered are analogous to the normal numbers of Borel with length $k_{\max}$ (which equals the range of the pattern).
The random variable X is the length k of a subsequence with a given pattern. For simple readout, the distribution of homogeneous subsequences (runs of one repeated digit) is considered: any pattern of length k, even a rational one, has the same probability. In the countably infinite case the distribution function has the form
$$P_k(X=k) = \frac{1 - 1/b}{b^{k-1}}, \qquad k = 1, 2, 3, \dots,$$
since patterns of length k = 1 have probability 1 - 1/b. Each further digit has probability 1/b of being equal to the previous one and probability 1 - 1/b of being different. Thus the homogeneous patterns have probabilities $\frac{b-1}{b} \cdot \frac{1}{b^{k-1}}$ for k = 1, 2, 3, ..., and
$$\sum_k P_k(X=k) = \frac{b-1}{b} + \frac{b-1}{b^2} + \dots = 1.$$
Lemma: In the finite case:
$$\sum_k P_k(X=k) = (1 - b^{-k_{\max}})^{-1} (1 - b^{-1}) \sum_k \frac{1}{b^{k-1}} = 1,$$
since $(1 - b^{-1}) \sum_k b^{-k+1} = 1 - b^{-k_{\max}}$, $k = 1, 2, 3, \dots, k_{\max}$.
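The lemma can be checked numerically. The following is a minimal Python sketch using exact rational arithmetic; the function names are illustrative assumptions and do not come from the text.

```python
from fractions import Fraction

def truncated_geometric_pmf(b, kmax):
    """Return [P(X=1), ..., P(X=kmax)], normalized over the finite range."""
    theta = Fraction(b - 1, b)             # theta = 1 - 1/b
    norm = 1 - Fraction(1, b) ** kmax      # 1 - b^(-kmax)
    return [theta / Fraction(b) ** (k - 1) / norm for k in range(1, kmax + 1)]

def unnormalized_mass(b, kmax):
    """Sum of (1 - 1/b) / b^(k-1) for k = 1..kmax, before normalization."""
    theta = Fraction(b - 1, b)
    return sum(theta / Fraction(b) ** (k - 1) for k in range(1, kmax + 1))

pmf = truncated_geometric_pmf(10, 5)
print(sum(pmf))                   # total probability in the finite case
print(unnormalized_mass(10, 5))   # compare with 1 - 10**-5
```

Exact fractions are used so that the identity is verified without floating-point rounding.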
Note 1: Any pattern of length k has the same frequency as the homogeneous pattern of that length. The readout frequency of singles is $(b-1)^2/b^2$, since the digit before and the digit after a single each differ from it with frequency (b-1)/b. The expected value is $E(X) = b/(b-1)$; the two quantities differ because the way of the readout is different.
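Both quantities in Note 1 can be illustrated by simulation. The sketch below assumes that the digits are drawn independently and uniformly by Python's pseudo-random generator; the seed, the sample size, and the function names are arbitrary choices made for the illustration.

```python
import random
from itertools import groupby

def run_lengths(digits):
    """Lengths of maximal runs of identical consecutive digits."""
    return [sum(1 for _ in group) for _, group in groupby(digits)]

def singles_per_digit(digits):
    """Fraction of interior positions whose two neighbours both differ."""
    n = len(digits)
    singles = sum(
        1 for i in range(1, n - 1)
        if digits[i - 1] != digits[i] != digits[i + 1]
    )
    return singles / (n - 2)    # only interior positions have two neighbours

b = 10
rng = random.Random(12345)      # fixed seed, arbitrary choice
digits = [rng.randrange(b) for _ in range(200_000)]
runs = run_lengths(digits)

mean_run = sum(runs) / len(runs)
print(mean_run, b / (b - 1))                              # empirical vs E(X)
print(singles_per_digit(digits), (b - 1) ** 2 / b ** 2)   # empirical vs theta^2
```

The mean is taken per run, while the frequency of singles is taken per digit position, which is one way to read the remark that the two readouts differ.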
Randomness test
Statement of the Problem: in Borel normal numbers (http://mathworld.wolfram.com/NormalNumber.html), subsequences of length $k_{\max}$ have frequencies of $b^{-k_{\max}}$ in base b. We assume that the range of the pattern, $k_{\max}$, is given.
Computing the distribution of the length-k subsequences with homogeneous patterns: the finite sequences are named semi-normal strings. Semi-normal sequences have a geometric distribution with parameter θ = (b-1)/b in base b. In the finite case of semi-normal strings the distribution satisfies
$$\sum_k f(k,\theta) = \theta \sum_k \frac{1}{b^{k-1}} = 1 - b^{-k_{\max}}, \qquad k = 1, 2, 3, \dots, k_{\max}.$$
This distribution has the expected value $\theta^{-1}$, which is named the maximum entropy constraint (https://en.wikipedia.org/wiki/Maximum_entropy_probability_distribution). In the maximum entropy case the parameter θ also equals the probability of singles. The readout estimate of the frequency of singles, $\theta^2 = (b-1)^2/b^2$, gives the simplest estimation and test, since it has a Bernoulli distribution.
Statement of the Estimation Problem
The empirical mean of the lengths k is the maximum likelihood (MLH) estimate of the unknown parameter $\theta^{-1}$. For k ≥ 2 the subsequences of the pattern are read out with a string comparison command. The random sample is assumed to follow a geometric distribution with mean $\theta^{-1}$; the task is then to find an ML estimate of θ.
The estimated value of θ maximizes the likelihood, i.e. the joint probability belonging to the semi-normal sequence of length $k_{\max}$. The probability density function of each k is f(k,θ). By independence, the joint probability density function is $L(\theta) = f(1,\theta) \cdot f(2,\theta) \cdots f(k_{\max},\theta)$.
The value of θ that maximizes L(θ) is determined by the known method: since the logarithm is monotone increasing, the value of θ that maximizes $\ln L(\theta)$ or $\log_b L(\theta)$ also maximizes L(θ).
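For a sample of n run lengths from a geometric distribution, the standard closed form of the ML estimate is n divided by the sum of the lengths, i.e. the reciprocal of the empirical mean. A minimal Python sketch follows. It assumes that the runs are read from a digit string with itertools.groupby, which stands in for the string comparison readout; the sample string and the function names are made up for the illustration.

```python
from itertools import groupby

def run_lengths(text):
    """Lengths of maximal runs of one repeated character in a digit string."""
    return [sum(1 for _ in group) for _, group in groupby(text)]

def geometric_mle(lengths):
    """ML estimate of theta for P(X=k) = theta * (1 - theta)**(k - 1).

    With 1 - theta = 1/b this is theta / b**(k - 1). Setting
    d/dtheta ln L = n/theta - (sum(k) - n)/(1 - theta) to zero gives
    theta_hat = n / sum(k), the reciprocal of the sample mean.
    """
    return len(lengths) / sum(lengths)

sample = "1223334444555667"          # hypothetical digit string
lengths = run_lengths(sample)        # [1, 2, 3, 4, 3, 2, 1]
theta_hat = geometric_mle(lengths)   # 7 runs over 16 digits
print(lengths, theta_hat, 1 / theta_hat)
```

For a long random string in base b, the estimate would be expected to lie close to θ = 1 - 1/b; the short sample here only shows the mechanics.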
Likelihood Function and Maximum Entropy
Usually the mean of ln L(θ) is computed with the slope $1/k_{\max}$ of the pattern. Computing the mean with f(k,θ) instead of $1/k_{\max}$ gives
$$H = -\sum_k f(k,\theta) \ln f(k,\theta), \qquad k = 1, 2, 3, \dots, k_{\max},$$
where $\sum_k f(k,\theta) = 1 - b^{-k_{\max}}$. The readout estimate of the frequency of singles can be used with the true value $\theta^2 = (b-1)^2/b^2$, since the probabilities of the sequences of length k ≥ 2 sum to 1/b (for large $k_{\max}$) and the probability of singles is 1 - 1/b. (In the case of b = 2 both have the value 1/2.)
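The entropy sum can be evaluated directly. The sketch below is a minimal Python illustration under the assumption that the unnormalized weights f(k,θ) = θ/b^(k-1) are used; it compares the finite sum with the standard closed form for the entropy of the untruncated geometric distribution. The function names are illustrative.

```python
import math

def truncated_geometric_entropy(b, kmax):
    """H = -sum_k f(k, theta) ln f(k, theta) for k = 1..kmax, in nats.

    f(k, theta) = theta / b**(k - 1) with theta = 1 - 1/b; the weights
    are left unnormalized, so they sum to 1 - b**(-kmax).
    """
    theta = 1 - 1 / b
    h = 0.0
    for k in range(1, kmax + 1):
        f = theta / b ** (k - 1)
        h -= f * math.log(f)
    return h

def geometric_entropy_limit(b):
    """Closed form for kmax -> infinity: (-(1-t) ln(1-t) - t ln t) / t."""
    t = 1 - 1 / b
    return (-(1 - t) * math.log(1 - t) - t * math.log(t)) / t

for base in (2, 10):
    print(base, truncated_geometric_entropy(base, 30), geometric_entropy_limit(base))
```

For b = 2 the limit is 2 ln 2, and the finite sum approaches it quickly as kmax grows.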
Conclusions
The normality of Borel numbers can be tested easily, either by the point estimate of θ = 1 - 1/b or by the readout value $\theta^2$ of the frequency of singles, which has a binomial distribution. The likelihood-ratio test named the G-test can be applied (https://en.wikipedia.org/wiki/G-test); G-tests are widely used in situations where chi-squared tests were previously recommended. (The R programming language has the likelihood.test function in the Deducer package. In SAS, one can conduct a G-test by applying the /chisq option after proc freq. In Stata, one can conduct a G-test by applying the lr option after the tabulate command.)
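As an illustration, the G statistic for the count of singles can be computed in a few lines of Python. This is a sketch under stated assumptions: the count of singles is treated as binomial, as in the text, the 5% significance level is an arbitrary choice, and the counts in the usage lines are invented for the example.

```python
import math

CHI2_CRIT_DF1_5PCT = 3.841   # chi-squared critical value, 1 d.o.f., alpha = 0.05

def g_statistic(observed, expected):
    """G = 2 * sum O_i * ln(O_i / E_i); cells with O_i = 0 contribute 0."""
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )

def singles_g_test(n_singles, n_positions, b):
    """G-test of the observed count of singles against theta^2 = (b-1)^2/b^2.

    Returns the statistic and whether it exceeds the 5% critical value.
    """
    p = (b - 1) ** 2 / b ** 2
    observed = [n_singles, n_positions - n_singles]
    expected = [n_positions * p, n_positions * (1 - p)]
    g = g_statistic(observed, expected)
    return g, g > CHI2_CRIT_DF1_5PCT

# Hypothetical counts in base 10: one close to theta^2 = 0.81, one far from it.
print(singles_g_test(8100, 10_000, 10))
print(singles_g_test(7000, 10_000, 10))
```

With two cells the statistic is compared against the chi-squared distribution with one degree of freedom; a statistical package would normally report a p-value instead of a fixed threshold.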