Visualizing Undirected Graph That's Too Large for GraphViz? [closed]
I need advice for rendering an undirected graph with 178,000 nodes and 500,000 edges. I've tried Neato, Tulip, and Cytoscape. Neato doesn't even come remotely close, and Tulip and Cytoscape claim they can handle it but don't seem to be able to. (Tulip does nothing and Cytoscape claims to be working, and then just stops.)
I'd just like a vector format file (ps or pdf) with a remotely reasonable layout of the nodes.
Solution 1:
Graphviz itself provides a solution for rendering large graphs.
Namely, Graphviz includes sfdp
, a multiscale version of fdp (also in graphviz, similar to neato) for the layout of large undirected graphs which has been useful for drawing large graphs (70k nodes, 500k edges) in my project.
You can find documentation for this software on the graphviz web site itself at http://www.graphviz.org/
For more information, here is Efficient and high quality force-directed graph drawing, a paper by Yifan Hu describing the underlying techniques and examples: http://yifanhu.net/PUB/graph_draw_small.pdf
And a web archive version: https://web.archive.org/web/20210812011222/http://yifanhu.net/PUB/graph_draw.pdf
Solution 2:
I suggest that you first do some preprocessing of the data, for example collapsing nodes to clusters and then visualizing the clusters. Collapsing will reduce the number of nodes and makes it easier for algorithms such as Kamada-Kawai or Fruchterman-Reingold to render the resulting graph.
If you really need to visualize 500.000 nodes then can you consider using a simple circular layout. This will be easy to render without the issues that force-based algorithms have. Take a look at Circos: http://mkweb.bcgsc.ca/circos/
Circos is graph visualization developed by bio-informatics people which is tailored to visualize genomes and other extremely large and complex data-sets.
It's a PERL based package, I hope that's not problematic.
Solution 3:
I've had good results using the graph-tool library in python. The below graph has 1,490 nodes and 19,090 edges - it took around 5min to render on my laptop.
The graph data comes from the political blogging network described by Adamic and Glance in “The political blogosphere and the 2004 US Election” pdf link here. If you zoom in you can see the blog urls for each node.
Here's the code I used to draw it (blog http://ryancompton.net/2014/10/22/stochastic-block-model-based-edge-bundles-in-graph-tool/ ):
import graph_tool.all as gt
import math
g = gt.collection.data["polblogs"] # http://www2.scedu.unibo.it/roversi/SocioNet/AdamicGlanceBlogWWW.pdf
print(g.num_vertices(), g.num_edges())
#reduce to only connected nodes
g = gt.GraphView(g,vfilt=lambda v: (v.out_degree() > 0) and (v.in_degree() > 0) )
g.purge_vertices()
print(g.num_vertices(), g.num_edges())
#use 1->Republican, 2->Democrat
red_blue_map = {1:(1,0,0,1),0:(0,0,1,1)}
plot_color = g.new_vertex_property('vector<double>')
g.vertex_properties['plot_color'] = plot_color
for v in g.vertices():
plot_color[v] = red_blue_map[g.vertex_properties['value'][v]]
#edge colors
alpha=0.15
edge_color = g.new_edge_property('vector<double>')
g.edge_properties['edge_color']=edge_color
for e in g.edges():
if plot_color[e.source()] != plot_color[e.target()]:
if plot_color[e.source()] == (0,0,1,1):
#orange on dem -> rep
edge_color[e] = (255.0/255.0, 102/255.0, 0/255.0, alpha)
else:
edge_color[e] = (102.0/255.0, 51/255.0, 153/255.0, alpha)
#red on rep-rep edges
elif plot_color[e.source()] == (1,0,0,1):
edge_color[e] = (1,0,0, alpha)
#blue on dem-dem edges
else:
edge_color[e] = (0,0,1, alpha)
state = gt.minimize_nested_blockmodel_dl(g, deg_corr=True)
bstack = state.get_bstack()
t = gt.get_hierarchy_tree(bstack)[0]
tpos = pos = gt.radial_tree_layout(t, t.vertex(t.num_vertices() - 1), weighted=True)
cts = gt.get_hierarchy_control_points(g, t, tpos)
pos = g.own_property(tpos)
b = bstack[0].vp["b"]
#labels
text_rot = g.new_vertex_property('double')
g.vertex_properties['text_rot'] = text_rot
for v in g.vertices():
if pos[v][0] >0:
text_rot[v] = math.atan(pos[v][1]/pos[v][0])
else:
text_rot[v] = math.pi + math.atan(pos[v][1]/pos[v][0])
gt.graph_draw(g, pos=pos, vertex_fill_color=g.vertex_properties['plot_color'],
vertex_color=g.vertex_properties['plot_color'],
edge_control_points=cts,
vertex_size=10,
vertex_text=g.vertex_properties['label'],
vertex_text_rotation=g.vertex_properties['text_rot'],
vertex_text_position=1,
vertex_font_size=9,
edge_color=g.edge_properties['edge_color'],
vertex_anchor=0,
bg_color=[0,0,0,1],
output_size=[4024,4024],
output='polblogs_blockmodel.png')
Solution 4:
Mathematica could very likely handle it, but I have to admit my first reaction was along the lines of the comment that said "take a piece of paper and color it black." Is there no way to reduce the density of the graph?
A possible issue is that you seem to be looking for layout, not just rendering. I have no knowledge about the Big O characteristics of the layouts implemented by various tools, but intuitively I would guess that it might take a long time to lay out that much data.