Directed Graphs

Jul 13, 2012 at 7:06am

Any ideas on data structure to represent directed graphs which is efficient on time?
This can be implemented using adjacency lists (as linked lists) but when the nodes and edges are in lakhs, searching in the list is a kill. Can we make it better?
Jul 13, 2012 at 7:41am
can B or B+ tree be useful because searching in that is easier regardless of numbers of nodes and edges.
Jul 13, 2012 at 7:56am
Graphs can be represented as trees if they have minimum number of edges, which can be (n-1) but its not necessary because they can have cycles, the edges can be directed as well as undirected. They can have parallel edges.

Typical issue like connectivity, min-max-flows, strong connections may not be represented using a tree. Or is it possible ?
Jul 13, 2012 at 8:59am
It depends on the density of the graph. If it's very sparse, use an adjacency list. O(n') is very fast if n' <<< n and every list has a roughly equal amount of outgoing arcs. If it's very dense, use an adjacency matrix. O(1) lookups for O(n²) memory. Very efficient if it's near-complete.

When the graph is rather sparse, but there's a very big variance in list sizes (some have 0 outgoing arcs, some have nearly n), you could try switching out the lists for a search tree. O(log n') can make a difference, even if trees have a bit more overhead compared to linked lists.

The answer doesn't change much if your arcs are weighted. For a linked list/tree implementation, save pairs <destination, weight>; for a matrix implementation, just save the value instead of a bool and specify an error-value for non-existing arcs (BIGM, -1, whatever is useful; be careful for sign errors and overflows if you plan to sum different weights without checking first!).
Jul 13, 2012 at 3:04pm
> but when the nodes and edges are in lakhs, searching in the list is a kill.
> Can we make it better?

For a production program, consider using BGL
http://www.boost.org/doc/libs/1_50_0/libs/graph/doc/index.html

For educational purposes, I've found that a std::map<> or std::unordered_map<> with a std::vector<> to hold the adjacent nodes reasonably efficient in terms of both programmer time and processor time.

An untested toy example using this technique (posted elsewhere sometime back):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
#include <map>
#include <vector>
#include <set>
#include <iostream>
#include <stack>
#include <queue>
#include <cassert>
#include <string>
#include <sstream>

template< typename T > using adjacency_list_type = std::map< T, std::vector<T> > ;

typedef int node_type ; // for exposition
typedef adjacency_list_type<node_type> adjacency_list ;
// input of pairs of adjacent vertices of a directed graph
// 1 7 => edge from 1 -------> 7
// 2 7 => edge from 2 -------> 7
// 1 5 => edge from 1 -------> 5
// etc.

adjacency_list create( std::istream& stm )
{
    adjacency_list graph ;
    node_type a, b ;
    while( stm >> a >> b ) { graph[a].push_back(b) ; graph[b] ; }
    return graph ;
}

bool topological_sort( const adjacency_list& graph )
{
    std::map< node_type, int > num_pred_map ;
    for( const auto& pair : graph )
    {
         num_pred_map[pair.first] ;
         for( auto n : pair.second ) ++num_pred_map[n] ;
    }
    while( !num_pred_map.empty() )
    {
        bool cyclic = true ;
        for( auto iter = num_pred_map.begin() ; iter != num_pred_map.end() ;  )
        {
            if( iter->second == 0 )
            {
                std::cout << iter->first << ' ' ;
                for( auto& v : graph.find(iter->first)->second )
                         --num_pred_map[v] ;
                iter = num_pred_map.erase(iter) ;
                cyclic = false ;
            }
            else ++iter ;
        }
        if(cyclic)
        {
            std::cerr << "graph is cyclic - nodes " ;
            for( auto& pair : num_pred_map ) std::cout << pair.first << ' ' ;
            std::cout << '\n' ;
            return false ;
        }
    }
    std::cout << '\n' ;
    return true ;
}

template< typename STACK_OR_QUEUE, typename NEXT_FN >
bool graph_search( const adjacency_list& graph, node_type start, node_type target,
                    NEXT_FN next )
{
    STACK_OR_QUEUE stk_or_queue ;
    std::set<node_type> visited ;
    stk_or_queue.push(start) ;
    while( !stk_or_queue.empty() )
    {
        node_type id = next(stk_or_queue) ;
        std::cout << id << ' ' ;
        visited.insert(id) ;
        if( id == target ) { std::cout << " found target\n" ; return true ; }
        else
        {
            stk_or_queue.pop() ;
            auto iter = graph.find(id) ;
            if( iter != graph.end() )
                for( auto& n : iter->second )
                    if( visited.find(n)==visited.end() ) stk_or_queue.push(n) ;
        }
    }
    std::cout << "  could not find target\n" ;
    return false ;
}

bool depth_first_search( const adjacency_list& graph, node_type start, node_type target )
{
    return graph_search< std::stack<node_type> >( graph, start, target,
                        []( const std::stack<node_type>& stk ) { return stk.top() ; } ) ;
}

bool breadth_first_search( const adjacency_list& graph, node_type start, node_type target )
{
    return graph_search< std::queue<node_type> >( graph, start, target,
                        []( const std::queue<node_type>& q ) { return q.front() ; } ) ;
}

int main() // trivial test driver
{
    const std::string str = "1 2   1 7   1 8   2 3   2 6   2 10  3 4   3 5   3 11   "
                      "6 10   6 12  8 9   8 12   9 10   9 11   11 7   12 5" ;
    std::istringstream stm(str) ;
    adjacency_list graph = create(stm) ;

    std::cout << "top sort => " ; topological_sort(graph) ;
    std::cout << "DFS 1 to 5 => " ; depth_first_search( graph, 1, 5 ) ;
    std::cout << "BFS 1 to 5 => " ; breadth_first_search( graph, 1, 5 ) ;
    std::cout << "DFS 1 to 6 => " ; depth_first_search( graph, 1, 6 ) ;
    std::cout << "BFS 1 to 6 => " ; breadth_first_search( graph, 1, 6 ) ;
}
Jul 13, 2012 at 8:02pm
Its not related to production or any office work. This is just my leisure activity. So using a library or STL is of no point.

Gaminic, I am more towards adj lists because matrix will have huge times in some cases like quadratic for searching and for some problems it can be as high as cubic. At the moment I am looking for a right data structure because implementation would not be an issue.

JLBorges, lets go with your example:
1
2
// 1 7 => edge from 1 -------> 7
// 2 7 => edge from 2 -------> 7 


Lets say we use adj list and have something like above. For DFS, lets say I visit node 7 from 1. Now I have to mark it as visited, but in adj lists, 7 will be present at multiple locations as above and so I have to search 7 in all the lists to make all the instances as marked. Otherwise, when I visite 7 from 2, it will be marked as not visited but in fact its visited from 1.
I don't know how much time it would take.

I am trying to implement kosaraju's algo for finding strongly connected components in a directed graph and am finding adj lists to be taking a lot of time when the nodes are lets say 10lakh and edges are lets say 50lakh. Some sort of data structure where duplicate nodes are not present is required. Also needs to be taken care is that graph doesn't take much time in creating itself.

Edit: some problems like finding min-cuts in a graph is based on randomization and so the algo needs to run many times to get correct answer. If the graph DS is bad, even 1000 runs of the algo to find min-cut would take very long time.

Last edited on Jul 13, 2012 at 8:06pm
Jul 14, 2012 at 8:11am
> For DFS, lets say I visit node 7 from 1. Now I have to mark it as visited,
> but in adj lists, 7 will be present at multiple locations as above
> and so I have to search 7 in all the lists to make all the instances as marked.
> Otherwise, when I visite 7 from 2, it will be marked as not visited but in fact its visited from 1.

Wouldn't you be using an auxiliary set (BST or hash table) of visited node ids?

For instance, in the example I'd posted earlier:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
template< typename STACK_OR_QUEUE, typename NEXT_FN >
bool graph_search( const adjacency_list& graph, node_type start, node_type target, NEXT_FN next )
{
    STACK_OR_QUEUE stk_or_queue ;
    std::set<node_type> visited ;
    
    // ...
        node_type id = next(stk_or_queue) ;
        // ...
        visited.insert(id) ;
            // ...
            stk_or_queue.pop() ;
            auto iter = graph.find(id) ;
            // ...
                for( auto& n : iter->second )
                    if( visited.find(n)==visited.end() ) stk_or_queue.push(n) ;
    // ...
}

Jul 16, 2012 at 2:43pm

This is what I did in the end. :-)
To search a linked list i used a hash map as an external DS and used a pre-allocated vector for visited nodes.

But still struggling to remove the duplicates. Memory becomes critical when the graph size is huge and small memory makes a lot of difference.

But thanks for looking at it. :)

Edit:
I was actually looking at something like this:

1
2
3
4
5
6
7
typedef struct vertex
{
	int data;
	bool visited;
	struct vertex *next;
	struct vertex **adj_list;
} vertex;


now the adj_list will just point to vertices and there will not be any copy or duplictes. So if I have:
1->3->2
2->3
3->2
now instead of creating new nodes for adj_list, i will just point to the vertices. So 1 will point to 2 and 3. its easy to traverse edges with less memory. What do you think?
Last edited on Jul 16, 2012 at 2:49pm
Jul 18, 2012 at 2:28pm
> used a pre-allocated vector for visited nodes.
> But still struggling to remove the duplicates.


Use a std::set<> / std::unordered_set<> instead?


> Memory becomes critical when the graph size is huge
> ...
> now the adj_list will just point to vertices and there will not be any copy or duplictes.

You could just store pointers to vertices in the hash map and set.
Last edited on Jul 18, 2012 at 2:32pm
Jul 18, 2012 at 5:48pm
Finally its finished (kosaraju's algo for finding strongly connected components). Although not very optimized but still better.
for a total of ~1million nodes and 5 million edges, took around 75secs. This includes creating the graph two times, once the original and second time reversing the graph and traversing both and lookup of nodes. I think it can be done under 30s.

Thanks for your tips JLBorges. :-)
Topic archived. No new replies allowed.