Kosaraju’s Algorithm for Strongly Connected Components

This video covers strongly connected components
in directed graphs. You should already understand graph basics
including depth first search, as well as BOTH topological sorting
algorithms. After
introducing strongly connected components, I will try to show some
intuition of where a linear time algorithm comes from and explain why
it works, all with a running example. The components of an undirected graph are
the different connected pieces of the graph. If there is any path from one vertex to another,
they are in the same component. If the graph is connected, it has
just one component. One easy way to partition the graph into its
separate components is to run depth first search on the entire graph,
and each time that a top level call finds a previously undiscovered
vertex, that vertex’s DFS will discover its entire component. Here, A
discovers D, F, and K, K indirectly through F, and that is the first
component found. Top level searches for B and C discover their
components, but D is already discovered, so its top level search will
just exit right away so that we can move on to E. When depth first
search finishes, we have some tree and back edges, and each of the
trees in the depth first search forest makes one component. If a graph is directed, how do we define components? Here, if A has a
path to G, but G has no path back, are they in the same component? We
say that that two vertices are strongly connected if each has a path
to the other, so they lie on some cycle with each other. Vertices in
a directed graph are partitioned into strongly connected components:
each component is a subset of vertices that can all reach each other. Each vertex will be in exactly one component,
along with the vertices that it can reach and that can reach it. Notice here, it’s clear
that A and D are in the same component, and so are A and F. Because
of that, D and F have to be in the same component: they can each reach
the other through their paths to and from A. If a vertex isn’t in
any cycle? Even if it has edges in and out of it, it’s
all alone in its own component. Here, I is all alone. I is so very, very lonely. Our goal is to find strongly connected components
from a graph, ideally with an efficient algorithm. If we don’t care about
efficiency, it is easy: grab a vertex like A, see what it can reach
maybe with depth first search. Then, if you reverse the graph and
search the same vertex A again, you can see what can reach it in the
original graph. By superimposing one result onto the other,
we can see the intersection of those two sets, the
vertices that A can reach and that can reach A. That is A’s strongly
connected component, A, D, F, and K here. You could do the same thing for every vertex
in the graph, but you only need do it once per component,
because once you know D is in A’s component, you don’t need
to repeat that work on D. How long that algorithm takes depends on how
you implement it, but it would be pretty easy to find one component
in time proportional to the size of the graph, so if there are X strongly
connected components, that would take runtime X times V plus E. If we want better efficiency, let’s stop to
really think about the strongly connected components. I’ll collapse all vertices from each
component into one super vertex, and draw edges between super vertices
if there are any edges between their corresponding sets of vertices. That new graph, the underlying component graph,
must be acyclic: if it had a cycle, all vertices from the cycle’s
corresponding vertices in the original graph could all reach each other,
through some combination of intra and inter component edges,
and they would all collapse into one component. If the original graph is acyclic, the
underlying component graph looks just like the original graph. Every
vertex is its own component. The two topological order algorithms discussed
in previous videos each found topological orders either from the beginning
or the end of the ordering. Those seemed like easier places to start. Is it easier to
find strongly connected components if we do it by topological order
for the underlying component graph? We started this video by finding components
in an undirected graph with depth first search. There, we didn’t need to intersect that set
with vertices with from those of a second search. Well, in this directed
graph, imagine if you could start a search on a vertex in the
topologically last component, maybe C here. You would discover vertex
C’s strongly connected component, in time linear in the number of
vertices and edges in that component. You wouldn’t need to reverse
the graph, do another search, and find the intersection, you would
just discover the component and be done with it. Because C’s
component is topologically last the search can’t escape it, so it
would be nice to search it first, and mark it as done. If we
topologically order the underlying component graph, A’s component
would be second to last, just before C’s component. After we have
marked C’s component, it would be great if we could next search
something from A’s component: it would discover all of its component,
but it won’t rediscover C’s component, which was already discovered
when we searched C. That’s a clever observation, but of course,
because we don’t know what the components are, we don’t know how to easily
grab a vertex from the last component to search it. Can we figure out some way to
topologically sort the underlying component graph, even if we don’t
know the components? For the two topological sorting algorithms
we’ve seen, each behaves differently if you run it on a graph with
cycles. For any vertex on a
cycle, or even reachable from a cycle, Kahn’s algorithm will never get
that vertex’s in-degree down to zero, so it will never make it into
the ordering. You will end up with an order that only includes
the other vertices. In this graph, Kahn’s algorithm returns just
vertex B. That’s not too helpful here. On the other hand, the depth first search
based topological ordering runs depth first search, and orders vertices
in the reverse order that they finish. That algorithm will order all of the vertices
of the graph, regardless of cycles. Of course it won’t be a topological
ordering, because none exists in a graph with cycles. But, the
ordering is related to the topological ordering of the underlying
component graph. So here, I have run depth first search on
the graph, and recorded discovery and finish times for each vertex. But I also want to
consider the discovery and finish times for the underlying component
super vertices. The algorithm itself doesn’t have any idea
what the components are, but we can just cheat and
look at them for now. We can define a component’s discovery time
as the first time one of its vertices is discovered, and its finish
time is when its last vertex is finished. Notice, within each component, those two times
come from the same vertex. For A’s component, A is the first node
discovered, and the last finished. The same goes for E, G, and of
course the two single vertex components. It has to be like that. The
first vertex discovered in any component can not finish until
everything it can reach, including its entire component, is finished. Vertices that are discovered later in the
component will close earlier, here D is even able to finish before
other vertices in the same component have been discovered, because
it only reaches the rest of the component through vertex A. So that brings us to the next clever observation:
for the same reason that depth first search topological sort works,
depth first search will finish underlying components in reverse
topological order. From
depth first search, tree, forward, and cross edges all have to
finish their target vertex before their source vertex. How about back
edges? They complete cycles, so they all go between
two vertices from the same component. Edges between different components have to
be something other than a back-edge. They all finish their target
vertex, and everything it can reach, including the target vertex’s
entire component, before their source vertex, so that source vertex’s
topologically earlier component has to finish after the target’s
topologically later component. We still don’t know what the
components are, but running depth first search tells us something
about their topological order anyway. Our goal was to learn enough about the underlying
component graph’s topological order to grab a vertex from the
topologically last component, C’s component here. But from these finish times, that
still seems hard. In this example, vertex D finishes first,
but its component is in the middle somewhere. The topologically last
component finishes first, at time 10 for vertex G, but that is hidden
from us because we don’t know the components, and G isn’t the first
vertex to finish, D is. Without knowing the components, the finish
times don’t help us to identify a vertex from the topologically last
component, which finishes first. However, it’s easy to promise you
that here, E’s component finishes last, because E finishes last, so
E’s component can go topologically first. Let’s save vertices in the
order of their reverse finish times, like depth first search
topological sort does. It would be nice to find and delete that entire
first component and continue on, similar to what we did while
developing Kahn’s algorithm, but how do we delete it if we don’t know where
it starts and ends? Unlike taking a vertex in a topologically
last component, if we search a vertex in the topologically first component,
we might search other components too. Here, E would discover everything except B.
That brings us to our last clever observation:
if you reverse the graph, the strongly connected components stay the
same. Any two vertices
that were on a cycle are still on a cycle, the cycle just goes the
other way. But, the edges in the underlying component
graph now go in the opposite direction. The topologically first component that
couldn’t be reached by any other component? Now it cannot reach any
other component, it is topologically last. So, we can grab vertex E,
which we know is in that topologically last component in the reversed
graph, search it, and bang, we discover its entire strongly connected
component. It has no outgoing edges to other components. Conveniently, the vertex order we saved from
our depth first search on the graph now lists vertices from the topologically
latest components, of the reversed graph, first. We use that order to search for
components in the reversed graph, and we don’t need to delete them
from the graph to continue. Depth first search will mark all vertices
in E’s component as explored, and then we can just continue top level
searches from the other vertices, using the next latest finish time
from our saved vertex order. Continuing our search, we go to H next, but
H is already discovered, so depth first search will ignore it which
is good, because we already know its component. The next vertex we run into that we haven’t
already discovered is B. Its component has the second latest finish
time, so it can be topologically second to last, just before E’s
component, in the reversed underlying component graph, or second in
the original underlying component graph. B doesn’t have any outgoing
edges in the reversed graph, but the next vertex, I does: it has edges
to both E and B’s components, but we already know what those
components are, we can’t rediscover them. We finish this second depth
first search, on the reversed graph, taking vertices in the reverse
order that they finished the initial depth first search in the
original graph. Every time we come across a new top level
vertex that hasn’t been previously discovered, it will
discover its strongly connected component. This last part of the algorithm looks just
like discovering components in an undirected graph,
using depth first search, except we need to do our top level
searches in the specific order given from the earlier depth first search. When we finish, the top-level searches that
found something, E, B, I, A, and G, have no parent nodes, and each discovered
its component. The depth first search forest shows the components,
with each tree being one component. All edges from one component to another are
cross edges, that isn’t by chance. The only edge that can go between
components in that final search is a cross edge, because in the
reversed graph, each component is explored and finished before any
other component has a chance to accidentally discover it by another
edge. There can also be cross edges within a component,
but all edges between components will definitely be cross
edges. Tree, back, and
forward edges all have to go between two vertices in the same
component. There don’t happen to be any forward edges
here. So this gives us an efficient algorithm: first,
run depth first search on the graph, ordering vertices by decreasing
finish time. Second,
reverse the graph. Third, run depth first search on that reversed
graph, but use the order from step one for your top level searches. Each step takes time linear in the size of
the graph, so the entire algorithm is linear. In some rough sense, the first depth first
search gives us a topological order for the underlying component
graph, and that order is the only thing we need from that first
search. The second depth
first search helps us discover and mark components in topological
order, like we saw when developing Kahn. It looks similar to our
original inefficient way to find just one strongly connected
component, but here, we don’t need to take any set intersections. Finally, I’ll run through the entire algorithm
uninterrupted for this graph. We run the first depth first search, saving
vertices in order of reverse finish time, and then reverse the
graph. We then run depth
first search on that reversed graph. Each time we get to a top level
search on a vertex that hadn’t yet been discovered, every vertex
discovered in that search is in the same component. We can see that
the root of each tree has the earliest discovery and latest finishing
time within its component or tree. We can see the components, in the order they
were discovered, and add edges to see the full underlying component
graph. Okay, that’s going to do it for this one,
I’ve got to go break up the sniffing cycle my dogs are making with each
other.