Row matching

Hi,

I am hoping someone can help me with this problem. I have a large data set (over a million records) with about 100 variables holding the results for each person at various times.

I am looking for an algorithm that will tell me which rows, within each subgroup, match all the values in another row. Every row that matches row 1 will be given a tag of 1 and will not be evaluated further. Every remaining row that does not match row 1 will be evaluated to see if it matches row 2; if it does, it will be given a tag of 2 and will not be evaluated further, and so on. The additional rule is that if a result is missing, that result is treated as a wild card that can be matched to either a 0 or a 1 (the only 2 possible outcomes). However, once it has been matched to a 0 or a 1, it has to match that value in any subsequent processing. The data set must be sorted by time.

Any help or suggestions would be appreciated.

Thanks,
Sharon

ne555 (10692)

Let me see if I understand.
You've got > 1e6 rows, each one has 100 elements.
The elements may be 0, 1, or wildcard.
You want to find the exact matches.
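
If that reading is right, the tagging rule itself can be sketched in a few lines of Python. This is only a sketch under assumptions of mine that the question does not state: missing values are stored as `None`, the rows passed in belong to one subgroup and are already sorted by time, tags are numbered 1, 2, 3, ... in the order new patterns appear, and a wildcard in a tag's reference pattern is fixed to the first concrete value it gets matched against. The name `tag_rows` is made up for illustration.

```python
from typing import List, Optional, Sequence

# Each entry of a row is 0, 1, or None (None = missing result = wildcard).
Row = Sequence[Optional[int]]


def tag_rows(rows: Sequence[Row]) -> List[int]:
    """Tag the rows of ONE subgroup, assumed to be sorted by time."""
    patterns: List[List[Optional[int]]] = []  # patterns[k] belongs to tag k + 1
    tags: List[int] = []
    for row in rows:
        for k, pat in enumerate(patterns):
            # Compatible if every position agrees or at least one side is missing.
            if all(p is None or v is None or p == v for p, v in zip(pat, row)):
                # Bind wildcards in the pattern to the values just matched,
                # so later rows must agree with them.
                for i, v in enumerate(row):
                    if pat[i] is None:
                        pat[i] = v
                tags.append(k + 1)
                break
        else:
            # No existing pattern fits: this row starts a new tag.
            patterns.append(list(row))
            tags.append(len(patterns))
    return tags


rows = [(1, None, 0), (1, 1, 0), (1, 0, 0), (0, 0, 0), (None, 0, 0)]
print(tag_rows(rows))
```

For these five rows the tags come out as `[1, 1, 2, 3, 2]`: the second row fixes the wildcard of tag 1 to 1, so the third row no longer fits and opens tag 2. Every row is compared against every open pattern, so this may be slow when a subgroup has many distinct patterns.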

Try a trie
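
A rough idea of what that could look like, in Python. Treat it as a sketch, not a finished solution: `RowTrie`, `insert` and `first_match` are names I made up, missing values are assumed to be `None`, all rows are assumed to have the same length, and the lookup only finds the earliest stored row that is compatible with a query row. It does not handle the rule that a wildcard stays fixed once it has been matched.

```python
from typing import Dict, Optional, Sequence


class TrieNode:
    def __init__(self) -> None:
        # Keys are 0, 1, or None (None = missing value stored as a wildcard).
        self.children: Dict[Optional[int], "TrieNode"] = {}
        self.tag: Optional[int] = None  # set on the node that ends a stored row


class RowTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, row: Sequence[Optional[int]], tag: int) -> None:
        node = self.root
        for value in row:
            node = node.children.setdefault(value, TrieNode())
        if node.tag is None:  # duplicates keep the earliest tag
            node.tag = tag

    def first_match(self, row: Sequence[Optional[int]]) -> Optional[int]:
        """Smallest tag of a stored row compatible with `row`, else None."""
        best: Optional[int] = None
        stack = [(self.root, 0)]
        while stack:
            node, depth = stack.pop()
            if depth == len(row):
                if node.tag is not None and (best is None or node.tag < best):
                    best = node.tag
                continue
            value = row[depth]
            for key, child in node.children.items():
                # Follow a branch if either side is a wildcard or the values agree.
                if value is None or key is None or key == value:
                    stack.append((child, depth + 1))
        return best


trie = RowTrie()
trie.insert((1, 1, 0), 1)
trie.insert((1, 0, 0), 2)
trie.insert((0, None, 0), 3)
print(trie.first_match((None, 0, 0)))
```

Rows with no missing values follow a single path, so a lookup costs about one step per column. A query with many wildcards can fan out over several branches, so the benefit depends on how sparse the missing values are.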
