1,2,3,4,5,16,23,35
1,2,3,4,6,17,23,36
1,2,3,4,7,18,23,37
1,2,3,4,8,19,23,38
1,2,3,4,9,20,23,39
1,2,3,4,10,21,23,40
1,2,3,4,11,22,23,41
1,2,3,4,12,23,24,42
1,2,3,4,13,24,25,43
1,2,3,4,14,25,26,44
13,4,7,8,18,20
9,10,11,12,5,6,7,8,1,2,3,4,21,22,23,24,13,14,15,16,29,30,31,32,45,46,47,48
29,49,36,37,34,17,15,9,16,30,28,47,46,27,20,32,14,26,1,4,3,6,10,2,7,48,44,41
1 // since only 4 is in common with filter line 1
7 // since only 35 is not found in filter line 2
5 // since 5, 23 and 35 are not found in filter line 3
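The counts above can be reproduced with a straightforward set lookup. This is only a minimal sketch, assuming each line has already been parsed into a vector of ints; `countMatches` is a hypothetical helper name, not something from the original posts.

```cpp
#include <unordered_set>
#include <vector>

// Count how many numbers of one raw line also occur in one filter line.
// countMatches is a hypothetical name used for illustration only.
int countMatches(const std::vector<int>& raw, const std::vector<int>& filter) {
    // Put the filter numbers in a hash set so each lookup is O(1) on average.
    std::unordered_set<int> lookup(filter.begin(), filter.end());
    int matches = 0;
    for (int n : raw) {
        if (lookup.count(n) != 0) {
            ++matches;
        }
    }
    return matches;
}
```

Calling `countMatches` once per raw line and filter line gives the plain baseline that any faster approach has to beat.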
// Perform intersection with and (& instead of ^, since xor keeps the bits that differ):
// Count bits that are still set
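The code these two comments belonged to did not survive, so here is a minimal sketch of the idea rather than the original poster's code. It assumes every number lies in the range 0..63 so that a line fits in one 64-bit mask; `toMask` and `countCommon` are hypothetical names.

```cpp
#include <bitset>
#include <cstdint>
#include <vector>

// Build a 64-bit mask with bit n set for every number n in the line.
// Assumption: every number is in the range 0..63.
std::uint64_t toMask(const std::vector<int>& numbers) {
    std::uint64_t mask = 0;
    for (int n : numbers) {
        mask |= std::uint64_t{1} << n;
    }
    return mask;
}

// Perform the intersection with & and count the bits that are still set.
int countCommon(std::uint64_t rawMask, std::uint64_t filterMask) {
    return static_cast<int>(std::bitset<64>(rawMask & filterMask).count());
}
```

With the masks built once per line, each raw/filter comparison is a single AND plus a bit count instead of a loop over the numbers.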
Input file:
1,2,3,4,5,6,7,8
21,22,23,24,25,26,27,28
2,4,6,8,10,12,14,16
5,10,15,20,25,30,35,40
3, 7, 16, 18, 21, 23, 25, 38
23,43,25,17,9,3,22,14,22,41,31,39,16

Output:
0000000000111111111122222222223333333333444444444455555555556666
0123456789012345678901234567890123456789012345678901234567890123
0111111110000000000000000000000000000000000000000000000000000000 : 8
0000000000000000000001111111100000000000000000000000000000000000 : 8
0010101010101010100000000000000000000000000000000000000000000000 : 8
0000010000100001000010000100001000010000100000000000000000000000 : 8
0001000100000000101001010100000000000010000000000000000000000000 : 8
0001000001000010110000110100000100000001010100000000000000000000 : 12
0001000000000000100000010100000000000000000000000000000000000000 : 4
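The rows above show bit 0 in the leftmost column, which is the reverse of what `std::bitset::to_string` prints (it puts the highest bit first). A small sketch of a formatter that matches this layout might look like the following; `maskToString` is a hypothetical name and the 64-bit width is assumed from the column header.

```cpp
#include <cstdint>
#include <string>

// Render a 64-bit mask as 64 characters, with bit 0 in the leftmost column
// so that column k corresponds to the number k.
std::string maskToString(std::uint64_t mask) {
    std::string out(64, '0');
    for (int bit = 0; bit < 64; ++bit) {
        if ((mask & (std::uint64_t{1} << bit)) != 0) {
            out[bit] = '1';
        }
    }
    return out;
}
```

Printing `maskToString(rawMask & filterMask)` next to the bit count makes it easy to check an intersection by eye against the column header.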
that seems too much for your limits |
If each comparison takes a millionth of a second the run would take nine-and-a-half months. If each comparison takes a billionth of a second it will run for about 7 hours. |
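The two estimates can be checked against each other. Nine and a half months at a millionth of a second per comparison implies roughly 2.5e13 comparisons, and that many comparisons at a billionth of a second each comes to just under 7 hours. The sketch below does that arithmetic; the month length (365.25 / 12 days) and the function names are assumptions made for illustration.

```cpp
// Seconds in one month, taking a month as 365.25 / 12 days (an assumption).
constexpr double kSecondsPerMonth = 365.25 / 12.0 * 24.0 * 3600.0;

// Number of comparisons implied by a run of `months` months
// when each comparison takes `secondsPerComparison` seconds.
constexpr double impliedComparisons(double months, double secondsPerComparison) {
    return months * kSecondsPerMonth / secondsPerComparison;
}

// Run time in hours for `comparisons` comparisons
// when each comparison takes `secondsPerComparison` seconds.
constexpr double runHours(double comparisons, double secondsPerComparison) {
    return comparisons * secondsPerComparison / 3600.0;
}
```

For example, `runHours(impliedComparisons(9.5, 1e-6), 1e-9)` gives the second estimate from the first.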
And I have 5,000 raw data files and 40,000 filters to go!!! |
dhayden wrote:
Keep in mind that at 4 GHz, a single core runs through 4 billion clock cycles per second, so it can execute billions of instructions per second.
"And I have 5,000 raw data files and 40,000 filters to go!!!"

Do you need to compare each line in each raw file against all 40,000 filter files? Or is it more like "each raw file has 8 filter files that it must compare against"? Please be as specific as possible. I have an idea for another algorithm that might be faster if the numbers are big enough.
Our client is trying to build a "fortune telling" database from two Chinese fortune-telling methods: "date of birth and eight characters of a horoscope" and "Purple Star Astrology". Usually "date of birth and eight characters of a horoscope" ... ... encode the "Purple Star Astrology"
1,2,3,4,5,16,23,35 |