Closest pair, planar case

Hello, I've recently come across the closest pair problem and there are some things I don't understand.
Please tell me if the question is vague.

Questions :

1. Why do we need to sort the list of our points by Y coordinate?
Is it because it makes the comparison efficient?
Couldn't we find the closest pair by only sorting both sides of the image below by X?

2. Am I right when I say that we must compare our point on the left side of the dividing line with at most 6 points (p1 with a, b, c, d, e, f)?


     |+delta
     |
     |
     |e_____f
delta||     |
 +   ||     |
p(1) |c_____d
 +   ||     |
delta||     |
     |a_____b
     |
     |
     |+delta


It has to do with algorithmic complexity. The more points you have to compare with one another, the longer it takes to find the closest pair. So if you can reduce the number of points compared with point c, for example, by half, that makes a huge difference in how much time you must spend running the algorithm.

Also, you can do this by sorting either on X or Y -- both ways have been done.

You might want to google around this problem for a better explanation of what it is doing if you still need help.
> Please tell me if the question is vague.
It seems that you are discussing a specific algorithm to solve the problem.
I'm going to assume that you mean this approach https://en.wikipedia.org/wiki/Closest_pair_of_points_problem#Planar_case


> Why we need to sort the list of our points by Y coordinate ?
Because "we must compare our point on the left side of Y (?) with at most 6 points (p1 with a, b, c, d, e, f)".
How do you plan to get those 6 points?
If the central band is sorted by Y, then the points to compare against are all next to each other.
ne555
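To make that concrete, here is a minimal sketch (not code from this thread; `Pt`, `dist`, and `strip_scan` are made-up names) of why the y-sorted order matters: once the strip is sorted by Y, each point only needs to be compared with the handful of points that immediately follow it.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { float x, y; };

float dist(const Pt& a, const Pt& b) {
    return std::sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y));
}

// Scan a strip that is already sorted by y: each point is compared
// only with the at-most-7 points that follow it, so the whole pass
// is linear in the size of the strip.
float strip_scan(const std::vector<Pt>& strip, float best) {
    for (std::size_t i = 0; i < strip.size(); ++i)
        for (std::size_t j = i + 1; j < strip.size() && j <= i + 7; ++j)
            best = std::min(best, dist(strip[i], strip[j]));
    return best;
}
```

Without the y-sort, the inner loop could not be capped at a constant number of neighbors and the scan would degenerate to comparing every pair.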

I thought that sorting the points on the left side of the dividing line by their x coordinate, doing the same on the right side, and then calculating the distances would also work. Am I wrong?
(pretty hard to explain such problem on a forum)
> I thought that by sorting points the left side of Y axis by their x coordinate and again sorting the points by the x coordinate on the right side of Y axis and then calculating the distance would also work . Am I wrong ?
> (pretty hard to explain such problem on a forum)


The points on each side are already sorted by the X axis in one array and by the Y axis in another. And remember, this isn't the Y axis where the two halves are split; it's the vertical line that divides the point set in half. Ultimately you get lg n divisions.

In the conquer step, you need to account for the fact that the closest pair may not be contained in a single division. It may be that one point is in one and one is in another. But if so, they must each be within the 2*delta strip along the line that divides them.

Like ne555 said, you rely on the fact that you have an array of them sorted by Y in order to be able to check only the next 6, because if any two points in the array are as close as or closer than delta, then they are each members of some subsequence of length 7 in the array. This means you can do this step in O(n) time complexity.

If they weren't sorted, then for all you know the closest pair has one point at the end of the array and one at the beginning. You would have to check each point against every other, and this would have O(n^2) complexity.

Another critical thing to understand about the algorithm is that it is able to run in n lg n time rather than n lg^2 n time, because you can maintain sorted order with linear complexity rather than re-sorting in each recursive call.
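The linear maintenance of sorted order can also be done bottom-up, merge-sort style, instead of splitting Y top-down as the code below does. A minimal sketch of that alternative (illustrative names, not code from this thread):

```cpp
#include <cstddef>
#include <vector>

struct P2 { float x, y; };

// Merge two halves that are each already sorted by y into one
// y-sorted vector in O(n), exactly like the merge step of merge
// sort. Doing this instead of re-sorting at every level of the
// recursion is what avoids the extra log factor.
std::vector<P2> merge_by_y(const std::vector<P2>& l, const std::vector<P2>& r) {
    std::vector<P2> out;
    out.reserve(l.size() + r.size());
    std::size_t i = 0, j = 0;
    while (i < l.size() && j < r.size())
        out.push_back(l[i].y <= r[j].y ? l[i++] : r[j++]);
    while (i < l.size()) out.push_back(l[i++]);
    while (j < r.size()) out.push_back(r[j++]);
    return out;
}
```

Either way, each level of the recursion spends O(n) keeping the y-order, which is what the n lg n bound depends on.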

I wrote this to try to understand the algorithm a little better; it was never really meant to be run, so treat it as a sketch.

#include <cmath>
#include <limits>
#include <unordered_set>
#include <utility>
#include <vector>

using std::pair;
using std::unordered_set;
using std::vector;

struct Point {
    float x;
    float y;
};

bool operator==( const Point& a, const Point& b ) {
    return a.x == b.x && a.y == b.y;
}

// unordered_set<Point> needs a hash function for Point
struct PointHash {
    std::size_t operator()( const Point& p ) const {
        return std::hash<float>()( p.x ) ^ ( std::hash<float>()( p.y ) << 1 );
    }
};

float euclidean_distance( const Point& a, const Point& b ) {
    float dx = a.x - b.x, dy = a.y - b.y;
    return std::sqrt( dx*dx + dy*dy );
}

pair< pair< Point, Point >, float > brute_force_closest_pair(
    const vector<Point>& vp
) {
    float min = std::numeric_limits<float>::infinity();
    pair<Point, Point> CP;

    int vpsz = vp.size();

    // compare all distinct pairs ( j starts at i + 1 so a point is
    // never compared with itself, which would give distance 0 )
    for ( int i = 0; i < vpsz; ++i ) {
        for ( int j = i + 1; j < vpsz; ++j ) {
            float d = euclidean_distance( vp[i], vp[j] );
            if ( d < min ) {
                min = d;
                CP = pair<Point, Point>( vp[i], vp[j] );
            }
        }
    }

    return pair< pair< Point, Point >, float >( CP, min );
}

// big-O time complexity is in O(nlogn)
pair< pair< Point, Point >, float > closest_pair(
    unordered_set<Point, PointHash> P,
    vector<Point> X,
    vector<Point> Y
) {
    int sz = P.size();

    // base case
    // because the brute force method is only called with a max of 3 points,
    // the worst case is that the brute force algorithm will need to check
    // 3 pairs of points, which implies this call has constant time complexity
    if ( sz <= 3 )
        return brute_force_closest_pair( X );

    // linear time complexity
    // because X is already assumed to be sorted by the points' x values,
    // to partition into XL and XR, put the first half in XL, second in XR
    vector<Point> XL( X.begin(), X.begin() + sz/2 );
    vector<Point> XR( X.begin() + sz/2, X.end() );

    // linear time complexity ( unordered_set is hash based )
    // make PL and PR from XL and XR
    unordered_set<Point, PointHash> PL( XL.begin(), XL.end() );
    unordered_set<Point, PointHash> PR( XR.begin(), XR.end() );

    // linear time complexity
    // now split Y based on membership in PL and PR into YL and YR
    // YL and YR remain sorted by y values because Y is assumed to be
    // sorted by y values, and we iterate through and insert into YL, YR,
    // in order
    vector<Point> YL, YR;
    for ( int i = 0; i < sz; ++i ) {

        // constant time complexity ( unordered_set is hash based )
        unordered_set<Point, PointHash>::iterator it = PL.find( Y[i] );

        if ( it != PL.end() )
            YL.push_back( Y[i] );
        else
            YR.push_back( Y[i] );
    }

    // recursive calls
    // because with each recursive call, the problem is divided in half,
    // this happens lgn times
    pair< pair< Point, Point >, float > CPL = closest_pair( PL, XL, YL );
    pair< pair< Point, Point >, float > CPR = closest_pair( PR, XR, YR );

    // choose the closer pair of the two returned by the recursive calls
    pair< pair< Point, Point >, float > CP = CPL.second < CPR.second ? CPL : CPR;

    // represents the vertical line separating PL and PR
    // make its x value the average of the two most central points on the x-axis
    float lX = ( X[ sz/2 - 1 ].x + X[ sz/2 ].x ) / 2.0f;

    // make a new vector containing all points within the min distance of lX
    // if the closest pair has one Point in XR, and the other in XL,
    // then it must be within this 2*CP.second strip centered at lX
    vector<Point> YP;
    for ( int i = 0; i < sz; ++i )
        if ( std::fabs( Y[i].x - lX ) <= CP.second )
            YP.push_back( Y[i] );

    // time complexity for this part is linear.
    // Now check for points within this strip ( in YP ) that are
    // closer than the closest points either both in PL or both in PR.
    // For each point in YP, we only need to check the next 7 because
    // we know that each pair of points on the same side of the strip is
    // at least CP.second away from each other, and it follows that only
    // 8 points can reside in the area we are interested in looking at.
    // So we only need to check the next 7 points in YP ( j starts at 1
    // so a point is never compared with itself ).
    int ypsz = YP.size();
    for ( int i = 0; i < ypsz; ++i ) {
        for ( int j = 1; j <= 7 && i + j < ypsz; ++j ) {
            float dP = euclidean_distance( YP[i], YP[i+j] );
            if ( dP < CP.second ) {
                CP.first  = pair< Point, Point >( YP[i], YP[i+j] );
                CP.second = dP;
            }
        }
    }

    return CP;
}


> It has to do with algorithmic complexity. The more points you have to compare with one another, the longer it takes to find the closest pair. So if you can reduce the number of points compared with point c, for example, by half, that makes a huge difference in how much time you must spend running the algorithm.
>
> Also, you can do this by sorting either on X or Y -- both ways have been done.


Reducing the number of points by half wouldn't reduce the time complexity. If you only sort by Y, or only sort by X, then you probably still can't get better than O(n^2) time complexity. It's not about reducing the number of steps; it's about reducing the growth of the number of steps with respect to the size of the input.
OK, I got it, thanks.
> Reducing the number of points by half wouldn't reduce the time complexity.

It's a divide and conquer algorithm. You don't just reduce by half once. It's a recurrence. Hence, it has an asymptotic complexity -- O(n log n).
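For reference, assuming an O(n) combine step (which the strip check provides), the recurrence unrolls as:

```latex
T(n) = 2\,T(n/2) + cn
     = 4\,T(n/4) + 2cn
     = \dots
     = 2^k\,T(n/2^k) + k\,cn
```

With $k = \log_2 n$ this gives $T(n) = n\,T(1) + cn\log_2 n = O(n \log n)$.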

> It's not about reducing the number of steps, it's about reducing the growth of the number of steps with respect to the size of the input.

Splitting hairs with vocabulary. OP doesn't care.
> It's a divide and conquer algorithm. You don't just reduce by half once. It's a recurrence. Hence, it has an asymptotic complexity -- O(n log n).

If that was what you meant, it certainly wasn't the clearest way to say it. I'm sure most beginners would have been misled by it, as taking what you said literally would be completely wrong. The other thing you said, about sorting either by X or by Y, is also wrong.

> Splitting hairs with vocabulary. OP doesn't care.

It's a critical distinction. What you said is wrong. Maybe that isn't what you meant; we can't read your mind.