This video covers strongly connected components

in directed graphs. You should already understand graph basics

including depth first search, as well as BOTH topological sorting

algorithms. After

introducing strongly connected components, I will try to show some

intuition of where a linear time algorithm comes from and explain why

it works, all with a running example. The components of an undirected graph are

the different connected pieces of the graph. If there is any path from one vertex to another,

they are in the same component. If the graph is connected, it has

just one component. One easy way to partition the graph into its

separate components is to run depth first search on the entire graph,

and each time that a top level call finds a previously undiscovered

vertex, that vertex’s DFS will discover its entire component. Here, A

discovers D, F, and K, K indirectly through F, and that is the first

component found. Top level searches for B and C discover their

components, but D is already discovered, so its top level search will

just exit right away so that we can move on to E. When depth first

search finishes, we have some tree and back edges, and each of the

trees in the depth first search forest makes one component. If a graph is directed, how do we define components? Here, if A has a

path to G, but G has no path back, are they in the same component? We

say that that two vertices are strongly connected if each has a path

to the other, so they lie on some cycle with each other. Vertices in

a directed graph are partitioned into strongly connected components:

each component is a subset of vertices that can all reach each other. Each vertex will be in exactly one component,

along with the vertices that it can reach and that can reach it. Notice here, it’s clear

that A and D are in the same component, and so are A and F. Because

of that, D and F have to be in the same component: they can each reach

the other through their paths to and from A. If a vertex isn’t in

any cycle? Even if it has edges in and out of it, it’s

all alone in its own component. Here, I is all alone. I is so very, very lonely. Our goal is to find strongly connected components

from a graph, ideally with an efficient algorithm. If we don’t care about

efficiency, it is easy: grab a vertex like A, see what it can reach

maybe with depth first search. Then, if you reverse the graph and

search the same vertex A again, you can see what can reach it in the

original graph. By superimposing one result onto the other,

we can see the intersection of those two sets, the

vertices that A can reach and that can reach A. That is A’s strongly

connected component, A, D, F, and K here. You could do the same thing for every vertex

in the graph, but you only need do it once per component,

because once you know D is in A’s component, you don’t need

to repeat that work on D. How long that algorithm takes depends on how

you implement it, but it would be pretty easy to find one component

in time proportional to the size of the graph, so if there are X strongly

connected components, that would take runtime X times V plus E. If we want better efficiency, let’s stop to

really think about the strongly connected components. I’ll collapse all vertices from each

component into one super vertex, and draw edges between super vertices

if there are any edges between their corresponding sets of vertices. That new graph, the underlying component graph,

must be acyclic: if it had a cycle, all vertices from the cycle’s

corresponding vertices in the original graph could all reach each other,

through some combination of intra and inter component edges,

and they would all collapse into one component. If the original graph is acyclic, the

underlying component graph looks just like the original graph. Every

vertex is its own component. The two topological order algorithms discussed

in previous videos each found topological orders either from the beginning

or the end of the ordering. Those seemed like easier places to start. Is it easier to

find strongly connected components if we do it by topological order

for the underlying component graph? We started this video by finding components

in an undirected graph with depth first search. There, we didn’t need to intersect that set

with vertices with from those of a second search. Well, in this directed

graph, imagine if you could start a search on a vertex in the

topologically last component, maybe C here. You would discover vertex

C’s strongly connected component, in time linear in the number of

vertices and edges in that component. You wouldn’t need to reverse

the graph, do another search, and find the intersection, you would

just discover the component and be done with it. Because C’s

component is topologically last the search can’t escape it, so it

would be nice to search it first, and mark it as done. If we

topologically order the underlying component graph, A’s component

would be second to last, just before C’s component. After we have

marked C’s component, it would be great if we could next search

something from A’s component: it would discover all of its component,

but it won’t rediscover C’s component, which was already discovered

when we searched C. That’s a clever observation, but of course,

because we don’t know what the components are, we don’t know how to easily

grab a vertex from the last component to search it. Can we figure out some way to

topologically sort the underlying component graph, even if we don’t

know the components? For the two topological sorting algorithms

we’ve seen, each behaves differently if you run it on a graph with

cycles. For any vertex on a

cycle, or even reachable from a cycle, Kahn’s algorithm will never get

that vertex’s in-degree down to zero, so it will never make it into

the ordering. You will end up with an order that only includes

the other vertices. In this graph, Kahn’s algorithm returns just

vertex B. That’s not too helpful here. On the other hand, the depth first search

based topological ordering runs depth first search, and orders vertices

in the reverse order that they finish. That algorithm will order all of the vertices

of the graph, regardless of cycles. Of course it won’t be a topological

ordering, because none exists in a graph with cycles. But, the

ordering is related to the topological ordering of the underlying

component graph. So here, I have run depth first search on

the graph, and recorded discovery and finish times for each vertex. But I also want to

consider the discovery and finish times for the underlying component

super vertices. The algorithm itself doesn’t have any idea

what the components are, but we can just cheat and

look at them for now. We can define a component’s discovery time

as the first time one of its vertices is discovered, and its finish

time is when its last vertex is finished. Notice, within each component, those two times

come from the same vertex. For A’s component, A is the first node

discovered, and the last finished. The same goes for E, G, and of

course the two single vertex components. It has to be like that. The

first vertex discovered in any component can not finish until

everything it can reach, including its entire component, is finished. Vertices that are discovered later in the

component will close earlier, here D is even able to finish before

other vertices in the same component have been discovered, because

it only reaches the rest of the component through vertex A. So that brings us to the next clever observation:

for the same reason that depth first search topological sort works,

depth first search will finish underlying components in reverse

topological order. From

depth first search, tree, forward, and cross edges all have to

finish their target vertex before their source vertex. How about back

edges? They complete cycles, so they all go between

two vertices from the same component. Edges between different components have to

be something other than a back-edge. They all finish their target

vertex, and everything it can reach, including the target vertex’s

entire component, before their source vertex, so that source vertex’s

topologically earlier component has to finish after the target’s

topologically later component. We still don’t know what the

components are, but running depth first search tells us something

about their topological order anyway. Our goal was to learn enough about the underlying

component graph’s topological order to grab a vertex from the

topologically last component, C’s component here. But from these finish times, that

still seems hard. In this example, vertex D finishes first,

but its component is in the middle somewhere. The topologically last

component finishes first, at time 10 for vertex G, but that is hidden

from us because we don’t know the components, and G isn’t the first

vertex to finish, D is. Without knowing the components, the finish

times don’t help us to identify a vertex from the topologically last

component, which finishes first. However, it’s easy to promise you

that here, E’s component finishes last, because E finishes last, so

E’s component can go topologically first. Let’s save vertices in the

order of their reverse finish times, like depth first search

topological sort does. It would be nice to find and delete that entire

first component and continue on, similar to what we did while

developing Kahn’s algorithm, but how do we delete it if we don’t know where

it starts and ends? Unlike taking a vertex in a topologically

last component, if we search a vertex in the topologically first component,

we might search other components too. Here, E would discover everything except B.

That brings us to our last clever observation:

if you reverse the graph, the strongly connected components stay the

same. Any two vertices

that were on a cycle are still on a cycle, the cycle just goes the

other way. But, the edges in the underlying component

graph now go in the opposite direction. The topologically first component that

couldn’t be reached by any other component? Now it cannot reach any

other component, it is topologically last. So, we can grab vertex E,

which we know is in that topologically last component in the reversed

graph, search it, and bang, we discover its entire strongly connected

component. It has no outgoing edges to other components. Conveniently, the vertex order we saved from

our depth first search on the graph now lists vertices from the topologically

latest components, of the reversed graph, first. We use that order to search for

components in the reversed graph, and we don’t need to delete them

from the graph to continue. Depth first search will mark all vertices

in E’s component as explored, and then we can just continue top level

searches from the other vertices, using the next latest finish time

from our saved vertex order. Continuing our search, we go to H next, but

H is already discovered, so depth first search will ignore it which

is good, because we already know its component. The next vertex we run into that we haven’t

already discovered is B. Its component has the second latest finish

time, so it can be topologically second to last, just before E’s

component, in the reversed underlying component graph, or second in

the original underlying component graph. B doesn’t have any outgoing

edges in the reversed graph, but the next vertex, I does: it has edges

to both E and B’s components, but we already know what those

components are, we can’t rediscover them. We finish this second depth

first search, on the reversed graph, taking vertices in the reverse

order that they finished the initial depth first search in the

original graph. Every time we come across a new top level

vertex that hasn’t been previously discovered, it will

discover its strongly connected component. This last part of the algorithm looks just

like discovering components in an undirected graph,

using depth first search, except we need to do our top level

searches in the specific order given from the earlier depth first search. When we finish, the top-level searches that

found something, E, B, I, A, and G, have no parent nodes, and each discovered

its component. The depth first search forest shows the components,

with each tree being one component. All edges from one component to another are

cross edges, that isn’t by chance. The only edge that can go between

components in that final search is a cross edge, because in the

reversed graph, each component is explored and finished before any

other component has a chance to accidentally discover it by another

edge. There can also be cross edges within a component,

but all edges between components will definitely be cross

edges. Tree, back, and

forward edges all have to go between two vertices in the same

component. There don’t happen to be any forward edges

here. So this gives us an efficient algorithm: first,

run depth first search on the graph, ordering vertices by decreasing

finish time. Second,

reverse the graph. Third, run depth first search on that reversed

graph, but use the order from step one for your top level searches. Each step takes time linear in the size of

the graph, so the entire algorithm is linear. In some rough sense, the first depth first

search gives us a topological order for the underlying component

graph, and that order is the only thing we need from that first

search. The second depth

first search helps us discover and mark components in topological

order, like we saw when developing Kahn. It looks similar to our

original inefficient way to find just one strongly connected

component, but here, we don’t need to take any set intersections. Finally, I’ll run through the entire algorithm

uninterrupted for this graph. We run the first depth first search, saving

vertices in order of reverse finish time, and then reverse the

graph. We then run depth

first search on that reversed graph. Each time we get to a top level

search on a vertex that hadn’t yet been discovered, every vertex

discovered in that search is in the same component. We can see that

the root of each tree has the earliest discovery and latest finishing

time within its component or tree. We can see the components, in the order they

were discovered, and add edges to see the full underlying component

graph. Okay, that’s going to do it for this one,

I’ve got to go break up the sniffing cycle my dogs are making with each

other.