Removing degree-2 vertices from a graph
Consider a map of a river system. Each point on the river line is a graph vertex. Some points have degree 1, at the start and end of a river, some have degree > 2 where rivers merge (or more rarely, split) and a load of vertex points will have degree 2 as a river winds along. This graph is generated from the spatial coordinates of the river.
However, the degree 2 vertices are not relevant to the topology of the river system, so I want to remove them. I'll end up with a kind of 'skeleton' graph, with just vertices of degree 1 or >2 connected but without the degree-2 vertices.
Does this transformation have a name? I can't find anything from a bit of research, but it can sometimes be hard to get through all the graph theory jargon (cliques, induced subgraphs, etc etc).
I'm using R and the igraph package. I have coded up an iterative process that repeatedly removes a single degree-2 node and joins its neighbours until no degree-2 vertices are left, but I wonder if there's a trick I'm missing.
Since I want this to be more general than just river networks (which are likely to be but not necessarily be trees) I'll have to think about what to do with a circular graph where all vertices are degree-2. I think I want to end up with one arbitrary vertex with an edge to itself.
Anyway, maybe someone will know exactly what this is...
There seems to be multiple names for this operation.
If one has two edges $xy$ and $yz$, the operation of removing $xy$ and $yz$ while adding $xz$ is called lifting the edges $\{xy,yz\}$. Thus one can repeatedly lift edges adjacent to vertices of degree 2.
This is also referred to as smoothing here, where smoothing is described in the same way. Thus one could define the operation of smoothing a graph, as repeatedly smoothing the vertices of degree 2.
My incremental method was slow, so I came up with an algorithm that works slightly faster.
Start with graph G
Take the subgraph of G of all degree=2 vertices. This has to be a disconnected set of simple chains (ie two degree=1 vertices connected by zero or more degree=2 vertices) or single (degree-zero) vertices
-
For each of the chains,
find the neighbors of the endpoints of the chain from G (this relies on graph ID values being propagated in the subgraph process)
add a new edge to G connecting those endpoints
delete all vertices in G in the chain
Note the speed improvement comes from deleting potentially many vertices at once and making fewer new edges. Its not really an algorithmic improvement.
Still need to think about how to handle loops and multiple edges and so forth, but at the risk of turning this into a programming question and having it kicked to StackOverflow I'll stop now.