Seeking data structure advice

I'm working on an algorithm that requires the following task as a subroutine:

Given a large (~10^7) array of identically-sized vectors of 'double' values, suppose that vector A "dominates" vector B if each element of vector A is greater than or equal to the corresponding element in vector B. (e.g. {a,b,c} dominates {d,e,f} if a >= d && b >= e && c >= f). For each vector in the array, determine whether there exists another vector in the array that dominates it and if so identify one such vector.

Is there a data structure that one could use to organize the array of vectors so that this task could be performed quickly? Because checking every pair of vectors is really slowing me down. Thanks.
This doesn't involve data structures, but you can use the fact that domination is a partial ordering to increase efficiency.

Not sure if there are even more improvements that can be made, but this'll help.
Last edited on
Maybe a list of sets. For each set have one dominating vector that dominates all the other vectors in the set.

To construct take each vector and compare it to the dominating vector of each set. If it has a relation to the dominating vector
(dominates or is dominated by) then add it to the set. If it dominates, then it is the new dominating vector for the set.

After this you have a bunch of sets, each with a dominating vector. Repeat the process by seeing if any of the dominating vectors, dominate (or are dominated by) the dominating vectors of other sets. If so, the sets can be merged keeping the overall dominating vector as the head.

Eventually, you have a number of sets, each with a dominating vector. None of these dominating vectors have a relation to each other.

To classify a vector, find which set it is in. The dominating vector of that set is the identified vector.

There are some details to fill in there but that's the overall idea.
Well you should definitely write a top level function that will determine which vector dominates. Than you can use the sort algorithm on the array that stores them. (yes, you can use the sort algorithm on an array).

You will need to pass your top-level function in as the 3rd parameter since the built-in vector < operator will not work for you and you cannot overload an STL containers method.

Once you have your array sorted it will be pretty damn easy to identify which vectors are dominating. Just return the number of vectors above the one you are asked about. : )

P.S.- I have not actually tried this but it should work in theory and is extremely simple since the only real code you are providing is the function the compares a set of vectors.
Last edited on
IceThatJaw's method will not work. Consider the vectors {5, 5, 5}, {1, 2, 3}, and {2, 1, 3}. The first vector dominates the other two, but neither smaller vector dominates the other. Because of this, sorting the vectors by domination doesn't really make sense.
Well, TS would obviously provide the correct < operator. There has to be an ultimate tie breaker or this whole notion is pointless.

TS decides what happens in a tie breaker. My idea is brilliant.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
#include<vector>
#include<algorithm>
#include<ctime>

using namespace std;



bool IsLessThan(vector<int> a, vector<int> b) {
	srand(time(0));
	int aCount= 0, bCount = 0;
        for (int i = 0; i < a.size() and i < b.size(); i++) {
		if (a[i] < b[i])
			aCount++;
                else 
			bCount++;
		}

        if (aCount == bCount)
           return rand() % 2;
        else 
             return aCount < bCount;
} 
      
int main () {

	const int size = 10;
	vector<int> arr[size];

	//load up
	for(int i = 0; i < size; i++)
		for(int j = 0; j < 10; j++) 		
			arr[i].push_back(rand()%10+1);
	
	sort(arr, arr + size, IsLessThan);

	// figure out how they choose a vector, haha

	return 0;
}
Last edited on
You can sort them all, but this would be a topological sort and not an ordinary sort.
(since there is not a < relationship between any pair)

I thought about that but rejected the idea since you would have to construct a DAG and with ~10^7 entities, this would just be too large.

An update on my idea. You do not need to store the whole sets - just the dominating vector for that set.

After the process you have a set of dominating vectors, there is no relation between them. i.e. no one dominates or is dominated by another.

To process a vector you just compare it to each of these vectors. If it is one of them it has no dominator. Otherwise it will be dominated by (at least one of) them.

It seems likely the OP is developing Pareto front software?

This link gives an algorithm in MATLAB. It may be possible to examine the code and convert to C++

http://www.mathworks.com/matlabcentral/fileexchange/17251-pareto-front
Topic archived. No new replies allowed.