I'm working on an algorithm that requires the following task as a subroutine:
Given a large (~10^7) array of identically-sized vectors of 'double' values, suppose that vector A "dominates" vector B if each element of vector A is greater than or equal to the corresponding element in vector B. (e.g. {a,b,c} dominates {d,e,f} if a >= d && b >= e && c >= f). For each vector in the array, determine whether there exists another vector in the array that dominates it and if so identify one such vector.
Is there a data structure that one could use to organize the array of vectors so that this task could be performed quickly? Because checking every pair of vectors is really slowing me down. Thanks.
Maybe a list of sets. For each set have one dominating vector that dominates all the other vectors in the set.
To construct take each vector and compare it to the dominating vector of each set. If it has a relation to the dominating vector
(dominates or is dominated by) then add it to the set. If it dominates, then it is the new dominating vector for the set.
After this you have a bunch of sets, each with a dominating vector. Repeat the process by seeing if any of the dominating vectors, dominate (or are dominated by) the dominating vectors of other sets. If so, the sets can be merged keeping the overall dominating vector as the head.
Eventually, you have a number of sets, each with a dominating vector. None of these dominating vectors have a relation to each other.
To classify a vector, find which set it is in. The dominating vector of that set is the identified vector.
There are some details to fill in there but that's the overall idea.
Well you should definitely write a top level function that will determine which vector dominates. Than you can use the sort algorithm on the array that stores them. (yes, you can use the sort algorithm on an array).
You will need to pass your top-level function in as the 3rd parameter since the built-in vector < operator will not work for you and you cannot overload an STL containers method.
Once you have your array sorted it will be pretty damn easy to identify which vectors are dominating. Just return the number of vectors above the one you are asked about. : )
P.S.- I have not actually tried this but it should work in theory and is extremely simple since the only real code you are providing is the function the compares a set of vectors.
IceThatJaw's method will not work. Consider the vectors {5, 5, 5}, {1, 2, 3}, and {2, 1, 3}. The first vector dominates the other two, but neither smaller vector dominates the other. Because of this, sorting the vectors by domination doesn't really make sense.
You can sort them all, but this would be a topological sort and not an ordinary sort.
(since there is not a < relationship between any pair)
I thought about that but rejected the idea since you would have to construct a DAG and with ~10^7 entities, this would just be too large.
An update on my idea. You do not need to store the whole sets - just the dominating vector for that set.
After the process you have a set of dominating vectors, there is no relation between them. i.e. no one dominates or is dominated by another.
To process a vector you just compare it to each of these vectors. If it is one of them it has no dominator. Otherwise it will be dominated by (at least one of) them.