1) The slowness really would come from so much nesting in this algorithm. In the checkFound you should be able to switch to something that does a memcmp and do a line at a time (or for a C++ way compare one object to another).
2) The second thing of the top of my head is to reserve some size for the text array. If you know the size ahead of time then great... otherwise just make some sort of estimate (ie figure out what the common sizes are that you use and use that).
3) Investigate string find algorithms. Michael Abrash did a series on these a while back.
4) You may be able to pre-pass a row quickly by simply checking to see if "any of the numbers you are looking for are in that sequence" of the 12th position. If not jump 12 places. To be clear... you should be able to scan until you find a cell that has one of the numbers that is in the smaller matrix...
Further more you could potentially OR all the numbers in that 12x12 together to get 1 number which you can quickly reject a cell... If OR doesn't work... try some other hash :p
something like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
|
//Pseudo -> rushed sorry.
//This is our early out function
bool isValidValue(..., value)
{
if (value != 0 && value & quickHash == 0) //If none of my bits match the ones in the 12x12 then early out
{
return false;
}
for ([number of unique values in 12x12]) //Could be hashmap
{
if ([equal to current value])
{
return true;
}
}
return false;
}
void ComputeValidRangeForSmallMatrix([...])
{
[clear unique values vector]
ebmpBYTE quickhash = 0;
for ([each value in the small matrix])
{
quickhash |= value;
if ([value not in unique values vector]) //Could be hashmap
{
[add to unique values vector]
}
}
}
//Note you said you had a small matrix however it appears that checkFound is using the
// same colSize/rowSize and the external loops... so that doesn't add up to you saying that
//it searches smaller matrixes in the bigger one because in this algorithm they are the same size.
// I think there may be a bug. So to be clear I'm using [].
int i,j ;
for( vector<charRepPair>::iterator it = pairDB.begin() ; it != pairDB.end() ; it++)
{
[compute valid values for small matrix]
for( i = 0 ; i < byteRow - it->rowSize ; i++)
{
for( j = 0 ; j< byteCol - (it->colSize-1); j++)
{
while ([not at end] && !isValidValue(..., j + [small matrix size col - 1]))
{
j += [small matrix size col];
}
if ([at end of j])
{
break;
}
if( this->checkFound( bmpBytes , i,j,*it))
{
....
|
(Note I've probably made a tone of bugs in there... I'm only trying to get the concept across). This allows you to jump 12x12 blocks at a time if those blocks have no matching values. Once you've got that... you can try to improve it more... I'm sure you'll get more ideas.
5) (Warning Complicated)
This following technique may sound a little confusing to one who has not studied this stuff before but anyway... I'll try to give you a lead. You may be also able to use some sort of binary or quad search. I'd need to think about that more... and I haven't studied your algorithm completely to know all the edge cases.
Essentially in this sort of search you divide one (or both) of the bitmap up into 2 (binary) or 4 (quad). Each time you subdivide you check if the cells have the relevant information to traverse further (otherwise you stop recursively subdividing). How do you know when to stop? Well you have a couple of options.
You could pre-pass the bitmaps into a tree like structure. You can start at the top... at each node level you write something that gives you information about what is in the child nodes. For example you may store the range of bytes at each node level. Or you could figure this out has your going down a tree branch without a pre-pass.
Will this be faster? Maybe maybe not... but my guess is you could make it faster with some attention to detail. Currently your algorithm is something like O(N^6). The solution suggested with the pre-pass would be O(N^3)x2 (note added the x2 because you have to do this for the pre-pass as well)
Its kinda a hard tech to explain in the limited time I have. Its kinda like constructive solid geometry using quadtree or z-pyramid. (I described that in my thesis
http://www.gamasutra.com/features/20060417/anderson_01.shtml)... although its not quite the same.
Make no mistake though... this is not a simple approach however it will probably be the fastest from the ones I've mentioned so far.
6) Use CUDA or pure directX/OpenGL. That stuff is made for image processing.