Finding an unknown pattern in a sequence of bit streams

Hello Gentlemen. I am new here, so preliminary apologies for any blunders. I would like to write a program to find an unknown pattern in a sequence of several bit streams of varying length. The bit streams are basically lines of binary in a text file that come from a bit of excel magic and an oscilloscope (and before you ask, my scope is cheap and coding this could be fun). So for example, if I was given a file with:

1111111111111000111000011101000001110000011000110011001111111111110101
11111000111100011111000001111000001110110011101111111100000000000000000000
111111111111111111110001111000011110000011100000111001100110011111111111100000

I would want to do something like this this to the streams:

111110001110000111010000011100000110001100110011111111
111110001111000111110000011110000011101100111011111111
111110001111000011110000011100000111001100110011111111
------------------------------------------------------
111110001110000011110000011100000111001100110011111111

Find a pattern between the streams, align the streams, and take an average (as noise is a factor) discarding any "junk data" (like the stream of 111111 or 0000 at the front or end of the bit stream). It is possible that the program could be provided with a very rough approximation of what the stream looks like initially.

I have a few ideas, however I am by no means an expert programmer. Is there a reasonably efficient way to do this? (streams will be around 500 to 750 bits long)

Thanks in advance for your consideration of my current conundrum.
I will try and help, but don't expect many things to come from me. but what do you have so far?
Pattern recognition is generally accomplished one of two ways: either by brute force checking for certain patterns (ie, you have a set of patterns in mind, and you want to check them one by one) or by artificial intelligence. The latter is orders of magnitude more complicated than the former.

So you need to clarify a bit. Are you just trying to splice a bitstream into N consecutive,
equal-sized subcomponents such that all subcomponents are exactly equal to one another,
"almost" equal to one another, or what?
It sounds as if he's looking for a pattern with noise. So there isn't an exact bit pattern match across matches...definitely non-trivial.
I think you could try cross correlation which might involve array multiplication. just a guess from my DSP class some one might have a better idea. but i think you can try this
an average of the bits (which isn't what you are showing as the result) would give something also...

111110001110000111010000011100000110001100110011111111 (x)
111110001111000111110000011110000011101100111011111111 (y)
111110001111000011110000011100000111001100110011111111 (z)
------------------------------------------------------
111110001110000011110000011100000111001100110011111111 (A)


so average won't work.. it looks like if it's 1 only in the top two only it's zero, and if it's 1 only in the bottom two or 1 only in the top and bottom (not the middle), then it's 1, otherwise, if it's 1 only in one then it's 0 and if it's 1 in all three its 1 and 0 in all three 0...

if that is what you are looking for , then without really checking it, you could (if the first is x, second is y, and third is z, and A is the number you are looking for) do this:

A= ( (x&z) | (y&z) )

could do it for you

10101
11100
01011
--------
01001

if you want average (if there are at least two 1's then 1), you can do this

A= ( (x|z) & (y|z) )


10101
11100
01011
--------
11101


depends on what you want to do
Last edited on
I think you are looking for something a bit different though, looking for gaps and such... that's a little more complicated for long streams...
Hi again, this is ark488, not sure why but I can't login to that account...

Anyways, I think I feel the same way as jsmith. I had similar thoughts when I initially considered this problem. It's not something that can be entirely brute forced as the streams are far too long and there is no absolute cohesion between them.

The method I've been using so far requires some information about the bit stream. Knowing the preamble of the data, the streams can be aligned in time, which will align the data, at which point a simple average of the streams can be taken to get a reasonable approximation (and hopefully cancel out some noise in the streams).

Topic archived. No new replies allowed.