Wednesday, August 28, 2013

Note on probability and periodicty

Draft 1

Please let me know of errors. My email address is conant78@gmail.com



By PAUL CONANT

We consider a binary string that, we assume, began specifically at the first observation.

If that string appears to follow a periodic pattern, a question often asked is whether that string was nonrandomly generated, that is, whether the probabilities for bit selection are independent.

One approach is the runs test, which matches the number of runs against a normal distribution of runs of length n. This is a very useful test, but it fails for

00110011

which has the mean number of runs for n = 8 but which one might suspect is not as likely to be random as would be an aperiodic string.

So what we want to know are the number of periods, which we calculate as 2(1 + 2C2 + 4C2). That subset's members are all unique permutations, each of which we consider to occur with equal probability, based on the provisional condition of independence of probability for each bit, which is put at 1/2.

So let us consider this test for nonrandomness on a string with length n, where n is a composite.

Let n = 8

and the string is

00110011

On length 8 we have the factors 1, 2, 4, 8. Now a string composed of all 0s or all 1s is certainly periodic. So we use the factor 1, along with factors 2 and 4. But we do not consider the factor 8 because a period of length 8 with no repetitions gives us no information about the probability of periodicity (there is no obvious periodic pattern).

So then the cardinality of the set of periods is:

2[1 + 2C2 + 4C2] = 16

which we divide by 2^8, or

8/2^7 = 1/16.

So our reasoning is that the probability of happening upon a randomly generated periodic 8-bit string is 1/16, or 6.25% in contrast to happening upon an 8-bit string of a specific permutation agreed upon in advance, which is 2^-8 or 1/256, or 0.39%. The probability of happening upon an aperiodic bit string is of course 15/16 or 93.75%. This all seems reasonable, where p(specific permutation) < p(having property of periodicity when the bit-length is composite) < p(having property of periodicity on that same string length).

So we argue that the probability of nonrandom influence is 93.75% 

The general formula for strings with remainder 0 is

[1 + (A_1)C2 + ... (A_m)C2]/2^(n-1)

Let's check n = 9.

[1 + 3C2]/2^8 = 1/64

A caveat: sometimes one permutation corresponds to more than one period. It will be found that that only occurs when the number of bits equals p^m, where p is some prime and m is a positive integer. We check the case of 8 bits. Here we find that

0000,0000    0101,0101    0011,0011

and their mirror images are the only strings that have one period superposed on another. That means we might wish to subtract 3 from our set of periodic strings, giving  [2(1 + 2C2 + 4C2) - 3]/2^8 = 5/2^6 = 0.078. However, as n increases we will be able to neglect this adjustment.

We have been discussing exact periodicity. Often, however, we are confronted by partial periodicity, such as this:

00100100100

So what we want to know is the probability that this is part of string 001001001001

which we obtain by 1/64(1/2) = 1/128 = 0.0078;

similarly for 0010010010

where we calculate 1/64(1/4) = 1/256 =  0.0039. This represents the probability that the string is part of a periodic string of length 12.

This probability is distinct from the probability of happening upon a periodic 12-bit string, which is:

(1 + 2C2 + 3C2 + 4C2)/2^11 = 11/2^11 = 0.00537.

Important points:

1. The periodicity probabilities change in accordance with the primes, which are not distributed smoothly.

2. As bit length n tends to infinity, the numerator 2(set of combinations of aliquot factors + 1) tends to 0 with respect to denominator 2^n. This means that with n sufficiently large, the difference between the probability of periodicity and the probability of a specific permutation are close enough to be viewed as identical.

Point 2 permits us to look at a specific string of bit length n >> 5, see that it is periodic or "near" periodic, and assign it a probability of about 2^-n. This is important because we are able to discern the probability of dependence by use of a number that is traditionally only used to predict a specific bit string.

A nicety here is that the ratio of primes to composites diminishes as bit length goes to infinity. For a prime, there are only aperiodic strings. That is, we have pC2/2^n. In the case of 11 bits, we have 55/2048 = 0.0269. So as n increases, the probability that a randomly selected number could be periodic goes up. This consideration does not affect the basic idea we have given.

(The formula of periodicity -- with no repetitions of the period -- for a prime is simply 1/2^(p-1).)

Of course periodicity isn't the only sort of pattern. One can use various algorithms -- say all 0's except at the (n^2)th bit -- to make patterns.

A simple pattern is mirror imaging, in which the string on either side of a midpoint or mid-space is a mirror of the other; that is, bits are reversed.

To wit:

001001.100100

How many mirror pairs are there? Answer 2^6. So the probability of  happening upon a mirror pair is 2^6/2^12 =  1/64 = 0.015625..

No comments: