STOCHASTIC APPROXIMATION AS A FUNCTION OF THE SIGNS OF THE ERROR PROCESS
 
 
09. 09. 2019
 
 
 
 
 
 
 
 
 
ABSTRACT
In this comment the prediction error type output process is considered; it is assumed to be a zero mean normal white noise process, uncorrelated with the predicted output process. A check of the symmetry of the +/- signs in a moving window is proposed: by theory the window consists of 50% singles and 50% sequences (with a mean length of 3). A scalar is modified in every step to obtain symmetric signs, i.e. 50% + and 50% - signs in the window. Chebyshev type criteria are presented.
 
INTRODUCTION
 
In stochastic approximation type algorithms the error process can be an equation error, a statistical error, a residual (prediction, filtering) error, or some other error. The statistical disturbance type error is the deviation of the observed value from the (generally unobservable) true value; an example is the measurement noise. The residual error (that of predictors and filters) is the difference between the observed value and the estimated value. In the considered case the observed values are corrupted by measurement disturbances. The sum of the residuals is zero and the residuals are assumed to be uncorrelated (whitening in closed loop: https://arc.aiaa.org/doi/abs/10.2514/2.4574?journalCode=jgcd). The sum of squares of the residuals and the sample mean can be shown to be independent of each other (https://en.wikipedia.org/wiki/Errors_and_residuals). In numerical computation further errors may arise because of truncation and roundoff.
 
The considered problem is to compute the unknown parameter vector Θ of a dynamical system when only noise corrupted measurements of the outputs are available at discrete time n = 1, 2, 3, .... This problem is close to the classical linear regression and prediction problem with noise corrupted measurements, see the Kalman filter with whitened residuals (https://arc.aiaa.org/doi/abs/10.2514/2.4574?journalCode=jgcd). In the scalar case the predictor model, linear in the parameters, is assumed in the form y_n = Θ^T g_n, where T stands for the transpose and g_n denotes grad_Θ y, i.e. the gradient of the estimate y with respect to Θ in the n-th step.
The stochastic approximation algorithm family (STA, https://en.wikipedia.org/wiki/Stochastic_approximation) is the heuristic recursive estimation of Θ by a gradient method (https://en.wikipedia.org/wiki/Stochastic_gradient_descent):

 
                                                            Θ_n = Θ_{n-1} + (1/n) (y_{n,measured} − y_n) g_n .
 
To avoid large magnitude gradients the gradient is used in a normalized form (see the ADAGRAD type STA algorithm: https://en.wikipedia.org/wiki/Stochastic_gradient_descent):
 
                                                            Θ_n = Θ_{n-1} + (1/n) (y_{n,measured} − y_n) (Σ_i g_i²)^{−1/2} g_n
 
It is assumed that the inverse exists. It is easy to compute the elements g_i, i = 1, 2, ..., dim g, so the effects of large gradient values are avoided. In what follows the prediction error process is discussed, which goes asymptotically to zero in mean by O(1/n), where O denotes the Ordo (big-O) symbol and 1/n is sometimes considered as the learning rate, which can be accelerated up to n^{−1/2} (https://en.wikipedia.org/wiki/Stochastic_approximation). However, this acceleration can result in bias in the parameters. The prediction error y_{n,measured} − y_n converges at a rate between n^{−1} and n^{−1/2} (https://arxiv.org/abs/1805.08114) to the system noise process.
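
For illustration, a minimal Python sketch of the normalized recursive update above is given; the linear-in-parameters predictor y_n = Θ^T g_n, the 1/n step size and the element-wise gradient normalization follow the formulas of this section, while the simulated true parameters, the noise level and the regressor generation are assumptions made only for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    dim = 3
    theta_true = np.array([1.0, -0.5, 0.25])   # assumed "true" parameters, for the example only
    theta = np.zeros(dim)                      # initial estimate of Θ
    grad_sq_sum = np.zeros(dim)                # running sums Σ g_i² for the normalization

    for n in range(1, 2001):
        g = rng.normal(size=dim)                       # gradient of the linear predictor (regressor)
        y_meas = theta_true @ g + 0.1 * rng.normal()   # noise corrupted measurement
        y_pred = theta @ g                             # predictor y_n = Θ^T g_n
        err = y_meas - y_pred                          # prediction error

        grad_sq_sum += g**2                            # accumulate squared gradient elements
        norm = np.sqrt(grad_sq_sum) + 1e-12            # small constant so the inverse exists
        theta = theta + (1.0 / n) * err * g / norm     # normalized (ADAGRAD-like) STA step

    print(theta_true, theta)                           # the estimate should approach theta_true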
 
THE PREDICTION ERROR PROCESS WITH ZERO MEAN
The system, estimation and measurement noise processes are assumed to be zero mean white Gaussian noise processes uncorrelated with the predicted process. The assumed noise models determine the parameter estimation method as well. A short sample (≤ 20 steps) mean could serve as a device to control the mean and the signs of the prediction error process. The symmetry of the signs could be ensured by a scalar c_n > 0 to be updated in every step.
The original step size is modified by the scalar c_n, subject to the O(1/n) decrease of the step size and to a symmetric-sign criterion on the prediction errors: equal numbers of + and − signs in a moving window and an error process decreasing by O(1/n). A moving window of fixed width (16-20 steps) is applied, so the average of an indicator value and the symmetry of the signs must be discussed (a sketch of the window-based sign check is given below). First the average length of the sign runs is determined in the truly random case.
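
The following is a minimal Python sketch of such a window-based sign check, assuming a window width of 20; the exact update rule for c_n is not specified here, so the multiplicative adjustment, the factor and the tolerance in this sketch are hypothetical.

    from collections import deque

    def update_scale(c, window, err, width=20, factor=1.05):
        """Track the +/- signs of the last `width` prediction errors and adjust
        the step-size scalar c when the signs are not balanced 50/50.
        The direction and size of the adjustment are assumptions of this sketch."""
        window.append(1 if err >= 0 else -1)
        if len(window) > width:
            window.popleft()
        if len(window) == width:
            plus = sum(1 for s in window if s > 0)
            if abs(plus - width // 2) > 2:   # clearly asymmetric window of signs
                c *= factor                  # assumed: enlarge c so the error changes sign sooner
            else:
                c += 0.1 * (1.0 - c)         # assumed: relax c toward 1 when the window is balanced
        return c

    # usage with the recursion above: theta = theta + (c / n) * err * g / norm
    c, window = 1.0, deque()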
 
In base b the probability of occurrence of a run (subsequence) of length k – with equal frequencies 1/b of the symbols in independent trials – is P(X = k) when any pattern of length k is considered in the (countably) infinite case:

                                                                  P(X = k) = p (1 − p)^{k−1} = ((b−1)/b) (1/b)^{k−1},   for k = 1, 2, 3, ..., ∞,

where the parameter p = (b−1)/b is equal to the probability of singles. Any length k pattern has the same probability P(X = k). The expected value of the random variable X is equal to b/(b−1), the reciprocal of the parameter p = (b−1)/b of this geometric distribution (https://en.wikipedia.org/wiki/Geometric_distribution), which has maximum entropy among distributions on the positive integers with the given mean. The variance is equal to b/(b−1)². Thus in the case of b = 2 the expected value is equal to 2.

Comment: restricted to the sequences, i.e. to run lengths k = 2, 3, 4, ..., the conditional expected value is equal to (2b−1)/(b−1) = 3 when b = 2.
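
These run-length statistics can be checked by a short simulation; the Python sketch below generates a symmetric +/- sign sequence (b = 2) and estimates the mean and variance of the run lengths, the conditional mean of the sequences and the fraction of singles. The sample size is an arbitrary choice for the example.

    import numpy as np

    rng = np.random.default_rng(1)

    # Symmetric +/- signs (b = 2): the geometric law above gives p = 1/2,
    # E[X] = 2, Var[X] = 2, E[X | X >= 2] = 3 and P(X = 1) = 1/2.
    signs = rng.choice([-1, 1], size=1_000_000)
    flips = np.flatnonzero(np.diff(signs) != 0)                       # indices where the sign changes
    runs = np.diff(np.concatenate(([-1], flips, [signs.size - 1])))   # lengths of the constant-sign runs

    print(runs.mean())              # ~2.0  mean run length, b/(b-1)
    print(runs.var())               # ~2.0  variance, b/(b-1)^2
    print(runs[runs >= 2].mean())   # ~3.0  mean length of the "sequences", (2b-1)/(b-1)
    print((runs == 1).mean())       # ~0.5  fraction of singles among the runs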

For a finite k_max the sum (b − 1) Σ_k b^{−k} over the sequence lengths k = 2, 3, ..., k_max takes the form

                      (b − 1) Σ_{k=2}^{k_max} b^{−k} = 1/b − b^{−k_max} ≈ 1/2, for k_max ≈ 20 and b = 2.
 
It follows that a window of nearly 20 steps width consists, on average, of 50% singles and 50% sequences. Based on this criterion the scalar c_n is modified to get symmetric signs in the window of 20.
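
A quick numerical check of the finite sum above (b = 2, k_max = 20), purely for illustration:

    b, k_max = 2, 20

    # Probability mass of the sequences (run lengths k = 2, ..., k_max)
    partial_sum = (b - 1) * sum(b ** -k for k in range(2, k_max + 1))
    closed_form = 1 / b - b ** -k_max

    print(partial_sum)    # 0.4999990463256836  -> ~50% sequences
    print(closed_form)    # identical closed form 1/b - b^(-k_max)
    print((b - 1) / b)    # 0.5 = probability of singles, so ~50% singles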
Another way to compute the scalar c_n: in the optimal case – this is a guess which has not been proved – the scalar c_n is modified in such a way that the absolute value of (y_{n,measured} − y_n) decreases at a rate between 1/n and 1/n^{1/2} and its sign changes in every step.
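
One possible reading of this guess, sketched below with hypothetical adjustment directions and factors (none of which are specified above): keep |y_{n,measured} − y_n| inside the band [1/n, 1/√n] and require the sign to alternate from step to step.

    import math

    def update_scale_alt(c, err, prev_err, n, factor=1.05):
        """Hypothetical rule for the 'optimal case' guess: keep |err| between
        roughly 1/n and 1/sqrt(n) and let its sign alternate every step."""
        lo, hi = 1.0 / n, 1.0 / math.sqrt(n)
        if abs(err) > hi or (prev_err is not None and err * prev_err > 0):
            c *= factor      # error too large, or sign did not change: assumed larger correction
        elif abs(err) < lo:
            c /= factor      # error already below the 1/n floor: assumed smaller correction
        return c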
  