How to remove duplicate array elements in an efficient way?

Hi guys.

I have a question regarding a 2D array of edges, where an edge consists of 2 pixels. I'm working just in plain C, so not in C++ and my List output looks like:

List[Edge][0]=514 and List[Edge][1]=1027
List[Edge][0]=1025 and List[Edge][1]=1538
List[Edge][0]=514 and List[Edge][1]=1538
List[Edge][0]=514 and List[Edge][1]=1538
List[Edge][0]=514 and List[Edge][1]=1027
List[Edge][0]=1025 and List[Edge][1]=1538

As you can see: after the first tree entries, the same results are outputted.

List is defined as:

int** List = new int*[nbEdges]

As said, each Edge contains two pixels:

List[Edge] = new int[2];

As a result output, I want something that looks like:

List[Edge][0]=514 and List[Edge][1]=1027
List[Edge][0]=1025 and List[Edge][1]=1538
List[Edge][0]=514 and List[Edge][1]=1538
List[Edge][0]= 0 and List[Edge][1] = 0
List[Edge][0]= 0 and List[Edge][1] = 0
List[Edge][0]= 0 and List[Edge][1] = 0

It would be even better to just have an output like:

List[Edge][0]=514 and List[Edge][1]=1027
List[Edge][0]=1025 and List[Edge][1]=1538
List[Edge][0]=514 and List[Edge][1]=1538

I have two problems with it:
1. How do I do that on an efficient way?
2. Do I fill the other elements with 0, or? Would a memset be usefull?

Can anyone point me in the right direction?

Thanks in advance!
Do you care about the order of the edges? If not, you can sort them first. After that duplicates will all be in the same place, and it will be trivial to remove them.
I suggest that you create a new array for the cleaned version. That would be more simple.
If you do care about order, maybe you could pair them with numbers (1..n), sort the pairs by edges, remove duplicates and then sort back using attachments. I'm not sure how intelligent would that be. I'm guessing that's better than the obvious algorithm.
hamsterman, thanks for your quick reply.

Well, the order of the edges isn't important. My problem in general has to do with the fact that the elements are linked by 2... Sorting is indeed a good way to see the duplicates, but how do I sort 2 linked elements? That's actually my main problem... Since if I only sort on the first column, I could end up with:

List[Edge][0]=514 and List[Edge][1]=1027
List[Edge][0]=514 and List[Edge][1]=1538
List[Edge][0]=514 and List[Edge][1]=1538
List[Edge][0]=514 and List[Edge][1]=1027
List[Edge][0]=1025 and List[Edge][1]=1538
List[Edge][0]=1025 and List[Edge][1]=1538

Instead of

List[Edge][0]=514 and List[Edge][1]=1027
List[Edge][0]=514 and List[Edge][1]=1027
List[Edge][0]=514 and List[Edge][1]=1538
List[Edge][0]=514 and List[Edge][1]=1538
List[Edge][0]=1025 and List[Edge][1]=1538
List[Edge][0]=1025 and List[Edge][1]=1538

I hope you see what I mean...
1
2
3
4
5
bool less(int first0, int first1, int second0, int second1){
   if (first0 < second0) return true;
   if (first0 > second0) return false;
   return first1 < second1;
}
First compare by first member, and if they are equal, compare by second.
soory for my lack of knowledge hamsterman ... but i dont understand can you elobrate .
AFAIK there is no new in C.
When sorting you must move both fields. It will be easier if you use a
1
2
3
struct edge{
  int begin, end; //or int pixel[2];
};
@bluecoder
We're comparing pairs of integers. Think of how you would compare strings. You'd check the first characters. If you have "apple" and "banana", you can tell which goes first in the dictionary by looking at the first char. Only when you have "apple" and "ant", you have to check the second char too.
This is the same, just with integers.
Or was it a different part you didn't understand?
In the meanwhile I wrote a function for my problem, but I think it can be coded better. For a very long array with List[i][0] and List[i][1] with i going from 0 to 262144, it takes ages. My current code is listed below. So my question: how can I speed it up or make it better? I use this function for two lists actually. In one list I also know that there will be only one duplicate, so hence I should put a break somewhere to go further with the next edge when that first duplicate is found.

Thanks in advance!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
void RemoveDupEdges(int **array, int length, int &dup) {
    	    
	int aantal = 0;
	int EdgeInd = 1;
	int start = 0;
	
	int ver1 = array[start][0];
	int ver2 = array[start][1];
	
	int tmp1, tmp2;
		
	while(EdgeInd<length) {
        mexPrintf("EdgeInd = %i\n",EdgeInd);
		if( (ver1 == array[EdgeInd][0] && ver2 == array[EdgeInd][1]) || (ver1 == array[EdgeInd][1] && ver2 == array[EdgeInd][0])) {
			//change element with the last element of the list
			tmp1 = array[length-1][0];
			tmp2 = array[length-1][1];
			array[length-1][0] = array[EdgeInd][0];
			array[length-1][1] = array[EdgeInd][1];
			array[EdgeInd][0] = tmp1;
			array[EdgeInd][1] = tmp2;
			//update the amount of duplicates
			dup++;
			//updaten the edge
			if( (EdgeInd == (length-1)) ) {
					start++;
					ver1 = array[start][0];
					ver2 = array[start][1];
					EdgeInd = start+1;
			}
			//update length
			length--;

		} 
		else if( (EdgeInd == length-1) && (start != length-dup) ) {
			//update start position
			start++;
			if(start==length-1) {
				break;
			}
			ver1 = array[start][0];
			ver2 = array[start][1];
			EdgeInd = start+1;
			
		}
		else if( (EdgeInd == length-1) && (start == length-dup)) {
			break;
		}
		else {
			EdgeInd++;
		}
			
	}
    }
Thanks hamsterman .. that was exactly i did not understand ..thanks...
I see you aren't sorting. If you used a good sorting algorithm, like quicksort, it would be much faster. Although it may not be easy to write.
Topic archived. No new replies allowed.