Outputting the shortest path in a Directed Acyclic Graph

Can someone explain how I can modify the code below to actually output the shortest paths (using the topological sort, and hence in O(V+E) time)? The code was rewritten by me after studying the code from http://www.geeksforgeeks.org/shortest-path-for-directed-acyclic-graphs/
But the website does not explain how to actually print out the shortest paths, only the shortest distances. I can't seem to find that anywhere, and sadly I cannot figure it out from the code below.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
#include <iostream>
#include <list>
#include <array>
#include <stack>
#include <limits>

const int INFINITY = std::numeric_limits<int>::max();
 
// A graph is represented using an adjacency list. Every node of an adjacency list contains a vertex number of the vertex to which
// edge it connects. It also contains the weight of the edge.
class AdjListNode {
	private:
	    int vertexNumber;
	    int weight;
	public:
	    AdjListNode (int v, int w) : vertexNumber(v), weight(w) {}
	    int getVertexNumber() const {return vertexNumber;}
	    int getWeight() const {return weight;}
};
 
// Class to represent a graph using adjacency list representation.
template <std::size_t N>
class Graph {
	private:
		std::array<std::list<AdjListNode>, N> adj;  // adj[i] is the list of all AdjListNodes connected to vertex i as the sourde of the directed edge.
	public:
    	void addEdge (int u, int v, int weight) {adj[u].push_back(AdjListNode(v, weight));}
    	std::array<int, N> shortestPath (int) const;
    	static int getNumVertices() {return N;}
    private:
    	void topologicalSortHelper(int, std::array<bool, N>&, std::stack<int>&) const;
};
 
// A recursive function used by Graph<N>::shortestPath(int).
template <std::size_t N>
void Graph<N>::topologicalSortHelper (int v, std::array<bool, N>& visited, std::stack<int>& stack) const {
    visited[v] = true;  // Mark the current node as visited.
    for (const AdjListNode& x : adj[v]) {  // Recursive call for all the vertices adjacent to this vertex.
        if (!visited[x.getVertexNumber()])
            topologicalSortHelper(x.getVertexNumber(), visited, stack);
    }
    stack.push(v);  // 'stack' will contain a topological sort of the directed acycylic graph.
}
 
// Function to find the shortest paths from a given vertex. It uses recursive topologicalSortHelper() to get the topological sorting of the given graph.
template <std::size_t N>
std::array<int, N> Graph<N>::shortestPath (int source) const {
    std::stack<int> stack;
    std::array<bool, N> visited = {false};  // Mark all the vertices as not visited.
    
    // Call the recursive helper function to store the Topological Sort starting from all vertices one by one.
    for (int i = 0; i < N; i++)
        if (!visited[i])
            topologicalSortHelper(i, visited, stack);

    // Initialize distances to all vertices as infinite and distance to source as 0.
    std::array<int, N> distance;
    distance.fill(INFINITY);
    distance[source] = 0;
 
    // Process vertices in topological order.
    while (stack.empty() == false) {
        int u = stack.top();  // Get the next vertex from the topological order.
        stack.pop();
        // Update distances of all adjacent vertices.
        if (distance[u] != INFINITY) {
		for (const AdjListNode& x : adj[u])
             	    if (distance[x.getVertexNumber()] > distance[u] + x.getWeight())
                	distance[x.getVertexNumber()] = distance[u] + x.getWeight();
        }
    }
    return distance;
}
 
int main() {
    Graph<6> g;
    g.addEdge(0,1,5);
    g.addEdge(0,2,3);
    g.addEdge(1,3,6);
    g.addEdge(1,2,2);
    g.addEdge(2,4,4);
    g.addEdge(2,5,2);
    g.addEdge(2,3,7);
    g.addEdge(3,4,-1);
    g.addEdge(4,5,-2);
 
    const int s = 1;
    const auto& shortestDistances = g.shortestPath(s);
    std::cout << "The following are the shortest distances from source " << s << ":\n";
    for (int i = 0; i < g.getNumVertices(); i++)
        (shortestDistances[i] == INFINITY) ? std::cout << "INF " : std::cout << shortestDistances[i] << " ";
}
Last edited on
Here is a diagram of the above graph, with 0=r, 1=s, 2=t, 3=x, 4=y, 5=z. The graph has been linearized by the topological order and the shortest distances to each vertex updated in each step. But I cannot figure out the algorithm for printing the shortest paths themselves.

[IMG]http://i58.tinypic.com/2mev5hj.jpg[/IMG]

This is what I'm trying at the moment:
1
2
3
4
if (distance[x.getVertexNumber()] > distance[u] + x.getWeight()) {
    distance[x.getVertexNumber()] = distance[u] + x.getWeight();
    parent[x.getVertexNumber()] = u;  // *** Added
}

then figure out the shortest path from the 'parent' array somehow. If I'm off track here, or someone knows a better idea, please let me know. Or perhaps 'parent' needs to be a 2D-array, since the shortest paths from the source to each vertex is sought here.
Last edited on
Ok, the 'parent' array, kept as 1-dimensional, seems to be working when I print out the output from:

1
2
3
4
5
6
7
8
9
10
11
    // Compute the shortest path to each vertex from the 'parent' array.
    std::array<std::list<int>, N> shortestPath;
    shortestPath[source] = {source};
    for (int destination = 0; destination < N; destination++) {
    	if (parent[destination] == NO_PARENT) continue;
    	int node = destination;
    	do {
	    	shortestPath[destination].push_front(node);
    		node = parent[node];
    	} while (node != NO_PARENT);
    }


I mean the print-out of 'shortestPath' agrees with the answer that I get when I work out the diagram on paper:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
The following are the shortest distances from source 1:
INF 0 2 6 5 3

The parent of 0 is -1.
The parent of 1 is -1.
The parent of 2 is 1.
The parent of 3 is 1.
The parent of 4 is 3.
The parent of 5 is 4.

Destination 0: Not possible
Destination 1: No path required
Destination 2: 1->2
Destination 3: 1->3
Destination 4: 1->3->4
Destination 5: 1->3->4->5


But I don't know if this will always be correct. How do I go about proving this? If the algorithm is correct, I do think it needs proof in order to be complete.
Last edited on
It is always correct, though you should keep in mind that it may not necessarily be the only shortest path s⤳v.

For proof it helps to reduce down to the simplest possible considerations.

If you have s→u as the shortest path, that means there is no other edge s→u that is shorter. (This is pretty much given if your DAG does not permit multiple edges for s→u.)

Now we add s→u→v. If there is any path s⤳v shorter than s→u→v, then s→u→v is not a shortest path s⤳v, meaning u cannot be the parent of v in s⤳v.

Make sense?
If you have s→u as the shortest path, that means there is no other edge s→u that is shorter. (This is pretty much given if your DAG does not permit multiple edges for s→u.)

Now we add s→u→v. If there is any path s⤳v shorter than s→u→v, then s→u→v is not a shortest path s⤳v, meaning u cannot be the parent of v in s⤳v.


Why can't s→w→v be a shortest path s⤳v for some other node w than u? My algorithm above computed that u is parent of v, but does not specify for which destination x is u a parent of v in s⤳x. That's why I was skeptical when I kept the 'parent' array one-dimensional and was surprised that it gave me the correct answer anyway (for the above specific DAG). If u is a parent of v for any destination x for s, how to go about proving that then?
Last edited on
You don't yet realize it, but you are changing the rules in the middle of the proof.

If s→w→v is a shorter path than s→u→v, then s⤳v must include w and exclude u: w is the parent of v for s⤳v. This cannot change. (Why? Because changing it to anything but w causes it to no longer be the shortest path s⤳v.)


You are confusing yourself by confounding some other s⤳x with s⤳v⤳x. It does not matter that v⤳x is not yet known.

First, if you try to replace the shortest path s⤳v with some other s⤳v, then s⤳x can no longer be the shortest path that includes v, because the shortest path s⤳v includes v by definition. (In other words, s⤳v⤳x always includes v, and you cannot change s⤳v to not be the shortest path without also changing s⤳x to not be the shortest path.)

Second, it is possible that the algorithm will, at some point, find a shorter path s⤳x that excludes v. If that happens, then v and v's parent (w) are no longer part of the shortest path s⤳x.

It might help to draw yourself a few graphs to play with.

Hope this helps. Remember, proofs often work by recognizing the relationship between the simplest case and the whole. (And that's often the hard thing to see.)
Ok, I think I've come up with a formal proof of the correctness of the algorithm, after reading Duoas' reasoning:

Theorem:
Let s be a source node in a directed graph, and let x be a node such that a shortest path from s to some destination point goes through x. If u is the parent node of x along this path, then for any node that is reachable from s through x, there exists a shortest path from s to that node going through x such that u is a parent of x.

Proof:
Let s be a source node in a directed graph, and suppose d is a destination node in the graph such that a shortest path k from the source node s to d goes though node x, where x has parent node u in k. Let e be any node such that that there exists a path from s through x to e. We wish to prove that there exists a shortest path from s through x to e such that u is the parent node of x along that path. Suppose no such path exists. Then for any shortest path p from s to e through x, we have some vertex w != u that is the parent node of x along p. Let q be the path from s to u to x along path k, let r be the path from s to w to x along path p, let t be the path from x to e that ends off the path p, and let y be the path from x to d that ends off the path k.
The distance along path r must be strictly less than the distance along path q, because otherwise the path from s to e along path q followed by the path t would be a shorter path than p that also goes from s to x to e (with u as parent of x by definition of q), contradicting the assumption that no shortest path from p to e through x exists with u as parent of x. But if the distance along path r is strictly less than the distance along path q, then the path from s to d consisting of path r followed by path y is a shorter path that goes through x than the path k, since the path k is path q followed also by path y. This contradicts the hypothesis that k is a shortest path from s to d through x. Hence we get a contradiction still. Thus there must exist a shortest path from x to e through x such that u is the parent node of x along that path. Since e was an arbitrarily chosen destination node from s, this completes the proof.

Note that the acyclic property of a directed acyclic graph was not needed in this proof, and thus this proof holds for any directed graph.
Last edited on
Actually, proving this theorem still does not prove that my algorithm above is correct. The algorithm

1
2
3
4
5
6
7
8
9
10
11
    // Compute the shortest path to each vertex from the 'parent' array.
    std::array<std::list<int>, N> shortestPath;
    shortestPath[source] = {source};
    for (int destination = 0; destination < N; destination++) {
    	if (parent[destination] == NO_PARENT) continue;
    	int node = destination;
    	do {
	    	shortestPath[destination].push_front(node);  // (*)
    		node = parent[node];
    	} while (node != NO_PARENT);
    }

uses the above theorem repeatedly for each 'shortestPath' element. But the theorem does not prove that all shortest paths through x MUST have u as the parent of x, only that THERE EXISTS a shortest path through x with u as the parent of x.
I still need to prove that if the theorem is applied repeatedly for every node on a path from s through x to any destination node, then that path is still truly a shortest path. I think mathematical induction is needed for this next proof.

Update:
I think I got it. But the proof looks strange, because I've never given a proof that uses direct text from a code before.

Suppose 'shortestPath[destination]' in line (*) above, right before the push_front action has the elements
shortestPath[destination] = {node_n, node_(n-1_), ..., node_1, destination}.
The last element must be 'destination' since 'node' is initiatized with value 'destination'. We wish to prove that
node_n -> node_(n-1_) -> ... -> node_1 -> destination
is the ending portion of a shortest path from 'source' to 'destination' for any n. If shortestPath[destination] = {destination} only, then it obviously true. So let us assume that for some n >= 1,
node_n -> node_(n-1_) -> ... -> node_1 -> destination
is the ending portion of a shortest path from 'source' to 'destination'. If node_n = 'source', then we have a shortest path from 'source' to 'destination' and we are done, so assume node_n != 'source'.
The next iteration of the loop containing (*) will result in
shortestPath[destination] = {parent[node_n], node_n, node_(n-1_), ..., node_1, destination}.
Since parent[node_n] is a parent of node_n for some shortest path from 'source' through node_n to some destination node (which may or may not be 'destination' itself), then by the theorem above,
there exists a shortest path from 'source' to 'destination' through node_n with parent parent[node_n] along that path. Thus
{parent[node_n], node_n, node_(n-1_), ..., node_1, destination}
is indeed the ending portion of a shortest path from 'source' to 'destination', completing the inductive step. Thus by the principal of mathematical induction, shortestPath[destination] will always contain elements {node_n, node_(n-1_), ..., node_1, destination} such that
node_n -> node_(n-1_) -> ... -> node_1 -> destination
is the ending portion of a shortest path from 'source' to 'destination'. Then when eventually node_n = 'source', we have found a shortest path from 'source' to 'destination', proving the correctness of the algorithm.



Last edited on
You've got the general idea, but you are blowing this up into something waaay too complicated.

I've already given you the basic proof.

The algorithm is linked to the proof by providing invariants at each step of the algorithm: show that the algorithm maintains those invariants (before and after each step) and you show that your algorithm is correct.

Hope this helps.
The algorithm is linked to the proof by providing invariants at each step of the algorithm: show that the algorithm maintains those invariants (before and after each step) and you show that your algorithm is correct.


Yes. Your idea provides a quicker proof, though for the proof to be rigorous I still need to provide plenty of detail:

Shorter second proof (addresses the algorithm directly without proving any formal theorem):
Suppose at some stage in the loop, shortestPath[destination] has elements {node_n, node_(n-1), ..., node_1, destination} such that
node_n -> node_(n-1) -> ... -> node_1 -> destination
is the ending portion of a shortest path from 'source' to 'destination' (base case: when shortestPath[destination] = {destination} only, then it is obviously true). The next iteration of the loop gives
shortestPath[destination] = {parent[node_n], node_n, node_(n-1), ..., node_1, destination}.
Suppose
parent[node_n] -> node_n -> node_(n-1) -> ... -> node_1 -> destination
is NOT the ending portion of a shortest path from 'source' to 'destination'. Then there exists a shortest path p from 'source' to 'destination' through node_n with z as parent node of node_n, where
z != parent[node_n]. By definition, parent[node_n] is the parent node of node_n along some shortest path q from 'source' through node_n to some destination node d. Let r the sub-path of q
from 'source' to node_n, let s be the sub-path of q from node_n to d, let t be the sub-path of p from 'source' to node_n, and let y be the sub_path of p from node_n to 'destination'.
Denote distance(y) to be the distance along a path y, and denote xUy to be the union of two paths x and y, given that x ends where y begins. We claim that distance(r) <= distance(t).
Suppose instead that distance(r) > distance(t). Then since both t and r are paths that go from 'source' to node_n, and s starts at node_n (and ends at d), we have
distance(tUs) = distance(t) + distance(s) < distance(r) + distance(s) = distance(rUs) = distance(q).
Thus tUs is a shorter path than q. But tUs is a path from 'source' to d though node_n, contradicting the fact that q is a shortest path such path. Thus we must have distance(r) <= distance(t).
But this means that
distance(rUy) = distance(r) + distance(y) <= distance(t) + distance(y) = distance(tUy) = distance(p).
Since rUy is a path from 'source' to 'destination' through node_n, and p is a shortest such path, then the above must be strict equality, which means that rUy is also a shortest path from
'source' to 'destination' through node_n. But rUY has parent[node_n] as parent of node_n, contradicting the assumption that
parent[node_n] -> node_n -> node_(n-1) -> ... -> node_1 -> destination
is not the ending portion of a shortest path from 'source' to 'destination'. Hence that assumption cannot be true, which means that
shortestPath[destination] = {parent[node_n], node_n, node_(n-1), ..., node_1, destination}
given by the next iteration of the loop is also the ending portion of a shortest path from 'source' to 'destination', completing the inductive step. Hence every iteration of the loop creates
a longer ending portion of a shortest path from 'source' to 'destination', and when the loop exits with 'source' as the last parent (this must always be possible otherwise parent[node_n] could not exist
to begin with), we finally get a shortest path from 'source' to 'destination', completing the proof of the algorithm's correctness.
Last edited on
Uh, no. I'm not sure you quite understand how to formulate the proof correctly.

The proof should be short and easy to follow, using no more than three or four node names absolute maximum. The algorithm is validated ("proved") by showing that at each step of the algorithm the invariants of the proof are maintained. [As an aid, you should be able to express it as a numbered list.]


Before you spend more time on this, you should consider the following:

- Is this effort really necessary? That is, is it a significant part of the grade, if required?

- If it is necessary, what does your professor expect from you? Do you have example proofs, either from your professor's lectures or from your readings?

- Assuming the first two, have you asked your professor for help? (With the difficulty you are having, you really should.)

Let us know what you need.
Topic archived. No new replies allowed.