revised topic - greedy algorithm, advanced

Sorry for doing this, but I am reposting an alternate problem because I think I confused people the first time.

I have been reading about greedy algorithms (kruskals, primms) because I think they apply to my project. The nature of my problem (i dont have code for this yet) is to design an airport network over connecting all possible cities with the LEAST COST **This is where it is different... the real problem deals with "number of flights", but conceptually it is just a parameter to minimize, so I have replaced that with COST here to achieve the same affect.** Each city airport needs at least one incoming and one outgoing flight.


The input would look something like this: <city origin> <COST for route> <destination city>

Chicago # New York
Las Vegas # New York
Dallas # Detroit


etc. etc. for a decent sized input file. The important thing to understand is that "to/from" direction or orientation (whatever you wish to call it) is important. NOTE: All of these flights are one-way, meaning that Chicago to New York (if route exists in input) probably won't equal (again, if it exists) a New York to Chicago route in cost.

One of my main issues is dealing with this concept of bidirectionality? (not sure if that is the term) where I think the Kruskals / Primms algorithms neglect.

The other post was getting way too confusing and I think now people might be able to help me better. Thank you... I will delete my other post if I can.
Last edited on
If your going for a cost based objective function. Then you can have an optimisation implementation as you have suggested.
http://en.wikipedia.org/wiki/Optimization_(mathematics)

The greedy algorithm is just one of many different types you could implement. Although, as discussed in the Wiki article, it's highly unlikely to find the global optimum, so probably not an ideal choice.
http://en.wikipedia.org/wiki/Greedy_algorithm

What your really doing is trying to map a surface and find the lowest (or highest) point that best matches your situation. We use a quasi-newton method and a custom modified genetic method (taken from a differential evolutionary solver).
http://en.wikipedia.org/wiki/Quasi-Newton_method
http://en.wikipedia.org/wiki/Differential_evolution

The general pseudo-steps are:
1) Load base values
2) Generate candidate solution (through your chosen method)
3) Evalutate and generate a likelihood score (a fit-score)
4) Evaluate score against acceptable criteria
5) If score acceptable, take it. If not, go back to Step 1.
One of my main issues is dealing with this concept of bidirectionality? (not sure if that is the term) where I think the Kruskals / Primms algorithms neglect.


That in itself should not have any influence on the method. Your job is to provide the method with a way to pick the next best possible candidate solution. The way you manage your data structures should handle the bi-directional component of your problem.
I will look at those. All advice / suggestions are welcome. I am still unsure how to take the bidirectionality into consideration.

Is there any way to simplify this down? I think only some type of "brute force" natured algorithm will work here. For instance, say if I had a 6 line input file sorted by cost.

I could test all permutations, starting with row1, row1+row2, row1+row3, row1+row2+row3, row2+row3 ... and so on, but even then I am not sure how to program this out and for a big input file this would probably fail. I was thinking of implementing some kind of logic in there, but then once I try to ensure that every city appears both as a source/destination I lost sight of how to minimize.
You at some point have to specify a relationship between every airport.

e.g I have airports A, B, C and D. My likelihood function needs to know the cost of flying between them all.
A -> B = 100
A -> C = 150
A -> D = 50
B -> C = 75
B -> D = 100
C -> D = 300

From there, You will load your input file that has existing flights between cities as your starting point for the model. Then using the greedy method you'd find all airports with either a 0 inbound/outbound flight and connect them using the lowest possible cost flights.

Then, your likelihood score would be the total cost. And add a penalty if there is a break in your link.
e.g A->D & D->A & B->C & C->B where generated. You'd add a penalty because you couldn't get from A to B, and from A to C etc. So this would increase the score and make the solution less likely to be accepted. If you set the penalty to something really high like 500000, then any score under that is a valid solution, and you'd have your own criteria on whether or not it's actually acceptable.

Optimization a function isn't about finding the best solution, but a solution close to the global minimum. Without doing all possible combination's it's highly unlikely you'd get the perfect solution.

That' being said. You'd still find the best method was a simple loop solution that basically just picked the next lowest cost route continually until you have formed a complete loop.

e.g
Pick random airport (B)
Fly from B to lowest cost (D)
Fly from D to lowest cost (C)
Fly from C to lowest cost (A)
No more airports without inbound flights. Fly from A to B.
I think I sort of get what you are saying about the loop solution, but what if my input looked like this:

A -> C = 1
B -> A = 10
A -> B = 100
B -> C = 1000
C -> A = 10000
C -> B = 100000

if I start at the minimum route A -> C =1, from there I would go to B->A =10 and then finally to C->B = 100000.

This wouldn't be a minimum solution, because If I started with A->B = 100, I would go to B->C = 1000, then C->A = 10000 which would be smaller.

This is the current input file I am testing on... I probably should have just used this as a problem statement.

Your sort of getting it.

In a true greedy method, it's start with the lowest cost flight
A -> C (1) -> B (100,000) -> A (10) = 100,011 Total Score.

As mentioned. Greedy methods seldom find the global minimum. They are "greedy" in that they will take the best current method available to them at that time.

What you could do, is run the model X times where X is the number of airports you have. Starting from each airport and then taking the final best solution.

e.g
A -> C (1) -> B (100,000) -> A (10) = 100,011
B -> A (10) -> C (1) -> B (100,000) = 100,011
C -> A (10,000) -> B (100) -> C (1,000) = 11,100

So we'd find that starting @ C and going round is going to be 10x cheaper in cost than the other methods.
That is precisely what I was thinking of doing, but is it possible that for certain input configurations it still might not result in finding the solution? I am just thinking what a 30 line input file might result in (hard to visualize how this program could scale).

Let me ask you this, would for each airport iteration would you begin looking at the minimum all over again?

Say I start at B->C . Would you first tell the program to look at A->C, and if so, how would you deal with this? I know you were talking about the likelihood score and I think this is where it might come in. But say I was "blind" to the format of the input so my algorithm did not know for certain whether A->C would be part of the solution or not?
Last edited on
As mentioned. Greedy methods seldom find the global minimum.


It will find A good solution. But is highly unlikely to find THE best solution. That is a fundamental flaw in the greedy algorithm design.

If you picked your starting airport as (B). Then you'd look for the cheapest flight from (B) to any airport that doesn't have any incoming flight (Z). Now, your "pointer" is at (Z). So you look for the cheapest flight from (Z) and build that (D). And continue on that way. So your path-ways are very well defined based on the starting location.

The greedy algorithm has no knowledge of looking ahead and saying "well, if I flew to (Y) now it's going to cost a bit more, but I can get a much cheaper flight to (X) afterwards for a lower score". Hence the name "greedy"

To optimise a function, you want to generate candidate parameters or path and then run it against a likelihood function to get a score. In your case the base score is the total cost of flights. Then you can apply penalties or bonuses based on other criteria.

e.g. You could take the top 10% flights that cost the most, and apply penalty values if any of these were used. You could also apply bonuses if the cheapest 10% were used.

However, this wouldn't work with the greedy algorithm and you'd be better off using something more adaptive (genetic, differential evolution, collective intelligence etc). Personally, I'd go with a Differential evolution style and have quite a large population (20-50) that would give me a really good solution.
For the likelihood function, so this is just something you would run on a per- iteration-basis? Would you just keep a running vector containing the score for each iteration and then look back at this as a reference (i.e. find the lowest ?)

Thanks for the help Zaita. Right now it is back to the drawing board - I will definitely take and incorporate some of your advice.

The differential evolution style is well over my head i think. It is safe to say that you probably have sick skills/intelligence to know how to implement those methods.
Last edited on
For the likelihood function, so this is just something you would run on an iteration-basis? Would you just keep a running vector containing the score for each iteration and then look back at this as a reference (i.e. find the lowest ?)


It's run after each iteration. What you do with the score and results is upto you. Probably ideal to keep a running copy of the best solution/score.

Thanks for the help Zaita. Right now it is back to the drawing board - I will definitely take and incorporate some of your advice.

No worries. As mentioned though, you can use the greedy algorithm if you wished. But the results will be fairly simple and use a best-first approach.

The differential evolution style is well over my head i think.

Differential evolution is fairly simple.
http://www.icsi.berkeley.edu/~storn/code.html

It is safe to say that you probably have sick skills/intelligence to know how to implement those methods.

I know a lot more now than I did 12 months ago about function minimization. I've had to implement and improve a couple of different methods for a very complicated model :)
Topic archived. No new replies allowed.