(images) SIFT matching between two SIFT matrices

Jan 2, 2011 at 12:13am
Hi, this is my first post, so I'll get straight to the point.
I'm writing a function that takes as input the SIFT matrices of two images A and B and returns the number of matching SIFTs.
The matrices have the following characteristics:

1) one row describes one SIFT
2) a SIFT is described only by integers: x, y coordinates, intensity, orientation + 128 integers in the range 0-255
3) the 128 integers are what is actually used to measure the distance between two SIFT descriptors
4) SIFT x from image A matches SIFT y from image B if y is the closest SIFT to x in B, and the second closest SIFT z in B is such that d(x,y)/d(x,z) < alpha
- usually with alpha = 0.8.

My algorithm (I use C++, this is just pseudocode so I don't bore you):

int NumMatches = 0;

for each row in A
{
    for each row in B
    {
        compute euclidean_distance(row(A), row(B));
        select Minimum1  // the minimum distance
        select Minimum2  // the second minimum distance
    }
    if (Minimum1 / Minimum2 < alpha)
        NumMatches++;
}

return NumMatches;
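
In actual C++ it looks roughly like this (just a sketch of the above, the matrix parsing is omitted and the names are mine: each descriptor is a std::vector<int> holding only the 128 values):

#include <vector>
#include <cmath>
#include <limits>

// Squared Euclidean distance between two 128-dimensional descriptors.
// Working with squared distances avoids a square root per pair; the ratio
// test then only needs the square roots of the two best values.
double squaredDistance(const std::vector<int>& a, const std::vector<int>& b)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sum += d * d;
    }
    return sum;
}

// Brute-force matcher with the ratio test (alpha = 0.8 by default).
int countMatches(const std::vector<std::vector<int>>& A,
                 const std::vector<std::vector<int>>& B,
                 double alpha = 0.8)
{
    int numMatches = 0;
    for (const std::vector<int>& rowA : A)
    {
        double best = std::numeric_limits<double>::max();   // closest descriptor in B
        double second = std::numeric_limits<double>::max(); // second closest
        for (const std::vector<int>& rowB : B)
        {
            double d = squaredDistance(rowA, rowB);
            if (d < best)        { second = best; best = d; }
            else if (d < second) { second = d; }
        }
        // d(x,y)/d(x,z) < alpha  <=>  sqrt(best)/sqrt(second) < alpha
        if (second > 0.0 && std::sqrt(best) / std::sqrt(second) < alpha)
            ++numMatches;
    }
    return numMatches;
}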


The algorithm must be applied to about 30K images, so it needs to be as light as possible, and my O(N^2) complexity is too much to handle.
Is it possible to do better?
Maybe by changing the type of distance?
I need to know how I can make it faster!
Thanks!
Jan 2, 2011 at 8:40am
Man, SIFT matrices and what have you. Feels like the aliens are taking over the planet :)

Still, the best I could find is this: http://en.wikipedia.org/wiki/Nearest_neighbor_search
And this is also related: http://en.wikipedia.org/wiki/Kd-tree

Now, unless you are planning for months to come, I doubt that you must go this way. Those solutions only lower the average complexity to O(n log n), which is what matters, of course. The worst case is probably the same as with your approach.

I also thought that building a Voronoi diagram (space partitioning) would be helpful, but for high-dimensional spaces it seems to scale poorly in both execution time and memory. Execution time could be improved to "expected linear", or so some paper claims, but I wonder whether it could be used to solve your problem without vast storage space.

Regards

EDIT: I forgot to mention that, if I understood the SIFT matching thing correctly from Google, then you probably could have something like:
for each row in A
{
  for each row in B
  {
    compute euclidean_distance(row(A), row(B));
    select Minimum1 // the minimum distance
    select Minimum2 // the second minimum distance
  }
  if (Minimum1 / Minimum2 < alpha)
    NumMatches++;
  exclude the winning row from B or something
}
return NumMatches;


I mean, shouldn't the matching be one-to-one, not many-to-one?
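
A rough C++ sketch of the exclusion idea (reusing your squaredDistance() helper from above; the "used" flags and names are mine):

#include <vector>
#include <cmath>
#include <limits>

// Same helper as in the brute-force sketch above.
double squaredDistance(const std::vector<int>& a, const std::vector<int>& b);

// One-to-one variant: once a row of B has won a match, it is flagged and
// skipped for all later rows of A.
int countMatchesOneToOne(const std::vector<std::vector<int>>& A,
                         const std::vector<std::vector<int>>& B,
                         double alpha = 0.8)
{
    std::vector<bool> used(B.size(), false);
    int numMatches = 0;

    for (const std::vector<int>& rowA : A)
    {
        double best = std::numeric_limits<double>::max();
        double second = std::numeric_limits<double>::max();
        std::size_t bestIdx = B.size(); // "no candidate yet"

        for (std::size_t j = 0; j < B.size(); ++j)
        {
            if (used[j]) continue; // already matched to an earlier row of A
            double d = squaredDistance(rowA, B[j]);
            if (d < best)        { second = best; best = d; bestIdx = j; }
            else if (d < second) { second = d; }
        }

        if (bestIdx < B.size() && second > 0.0 &&
            std::sqrt(best) / std::sqrt(second) < alpha)
        {
            used[bestIdx] = true;
            ++numMatches;
        }
    }
    return numMatches;
}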
Last edited on Jan 2, 2011 at 9:14am
Jan 2, 2011 at 12:43pm
Thanks for the answer simeonz,

I was looking for something that would let me avoid computing so many Euclidean distances between the rows of A and B
(something like a particular data structure, as you suggested).
(There are more than 1000 SIFTs in an image.)

Excluding the winning row from B in your code may be a good idea, but I guess it won't bring the computation down by much.
Thanks again for answering me.
Jan 2, 2011 at 12:59pm
There is indeed another way.

Lowe stated that, for small vector distances, arccos(x) is a good replacement, approximating the angle between the vectors.
Computing only the real arccos(x) (a complex-valued computation would cost more than the Euclidean distance), preferably clamping the argument with a discrete threshold so the value stays real, may speed your whole process up significantly.
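
To illustrate the idea, a small sketch (my own assumptions: descriptors are normalized to unit length once up front, and the ratio test is then done on the angles instead of the Euclidean distances):

#include <vector>
#include <cmath>
#include <numeric>
#include <algorithm>

// Normalize a 128-dimensional descriptor to unit length once, up front.
// For unit vectors the angle acos(dot(a, b)) carries the same information
// as the Euclidean distance, since ||a - b||^2 = 2 - 2 * dot(a, b).
std::vector<double> normalize(const std::vector<int>& v)
{
    double norm = 0.0;
    for (int x : v) norm += static_cast<double>(x) * x;
    norm = std::sqrt(norm);
    std::vector<double> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        out[i] = (norm > 0.0) ? v[i] / norm : 0.0;
    return out;
}

double dot(const std::vector<double>& a, const std::vector<double>& b)
{
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}

// Ratio test on angles: keep only the two largest dot products per query,
// take acos of just those two, and compare angle1 / angle2 < alpha.
// Clamping the dot product into [-1, 1] keeps acos real.
bool anglesAccept(double bestDot, double secondDot, double alpha = 0.8)
{
    double a1 = std::acos(std::min(1.0, std::max(-1.0, bestDot)));
    double a2 = std::acos(std::min(1.0, std::max(-1.0, secondDot)));
    return a2 > 0.0 && a1 / a2 < alpha;
}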

I highly suggest you take a look at Lowe's papers. Good luck!
Jan 2, 2011 at 1:48pm
I meant that excluding the B row could be more proper for correctness' sake. It will indeed not improve the performance by much.
Jan 2, 2011 at 2:58pm
Thanks kpeleo, I've found this: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.157.3843&rep=rep1&type=pdf

It says that to improve efficiency I must not use a deterministic approach but a probabilistic algorithm, which however reduces accuracy.
Before looking for a new probabilistic approach, I'm still looking for a deterministic one.

I know pthreads, OpenMP and MPI, and I have access to a 20-PC cluster.
But the serial part is too much anyway!
Any new ideas?
Thanks for your cooperation!
Jan 2, 2011 at 3:43pm
Just curious, I am trying to learn. What was your reason for selecting the probabilistic approach? The paper whose URL you posted claims that using k-d trees is no improvement over exhaustive search. According to the Wikipedia article, the average complexity when using a k-d tree will be O(n log n) (O(log n) per query); in theory, with your ~1000 descriptors per image that is roughly 10 distance evaluations per query instead of 1000. This is a hell of a lot better than the O(n^2) average complexity of the exhaustive search. Of course, you may hit the worst case, but that should be the exception.
Jan 2, 2011 at 6:56pm
No worry, I'm trying to learn too, so just be patient if I say some crappy things.
I just wanted to try Best-Bin-First, which sounds like a probabilistic algorithm, but it's not.
In this case I would have to change all of the data structures in my program (from matrices to k-d trees), is that correct?
Any suggestions to improve it further?
Some libraries?
I forgot to say that I know CUDA too, but the quantity of data was too much even for a GTX 460.
Feb 10, 2011 at 2:37pm
You could use the already written approach in OpenCV.
Or write it in minutes with Eigen matrices...

You could use FLANN's implementation of kd-tree nearest neighbor search...
http://people.cs.ubc.ca/~mariusm/index.php/FLANN/FLANN

Or use the libmv library. libmv => http://code.google.com/p/libmv/
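
For example, with OpenCV something roughly like this should do the matching (just a sketch, not tested; it assumes your descriptors are already loaded into CV_32F Mats, one descriptor per row, and the function name is mine):

#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <vector>

// Count ratio-test matches between two descriptor matrices (one SIFT per
// row, 128 float columns) using OpenCV's FLANN-based matcher, which builds
// a kd-tree index over B internally instead of comparing every pair.
int countMatchesFlann(const cv::Mat& descA, const cv::Mat& descB,
                      float alpha = 0.8f)
{
    cv::FlannBasedMatcher matcher;
    std::vector<std::vector<cv::DMatch> > knn;
    matcher.knnMatch(descA, descB, knn, 2); // 2 nearest neighbours per row of A

    int numMatches = 0;
    for (std::size_t i = 0; i < knn.size(); ++i)
    {
        if (knn[i].size() < 2) continue;
        // Ratio test on the two nearest distances.
        if (knn[i][0].distance < alpha * knn[i][1].distance)
            ++numMatches;
    }
    return numMatches;
}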