This video covers Kahn’s Algorithm for topological

sorting. No. Still no. Really? I don’t know if my students know who that

is, but they should. Those weren’t spelled right. Before watching this video, you should watch

the introduction, especially if you don’t know what topological

order is. If you just

want the clean, final, linear time algorithm, you can jump ahead to

it, but this video is part of the You’re Not a Freakin’ Moron series,

so if you want an idea of how an algorithm might be developed, don’t

skip anything. We will have a full run-through of the final

algorithm near the end. To start, notice that if a vertex has no incoming

edges, it can go first. Also, in an acyclic graph, there has to be

at least one vertex with no incoming edges: if you want to find

one, start at an arbitrary vertex, and follow a path of incoming edges

backwards. If you get to

a vertex with no incoming edges, great. If not, once you’ve gone back

V edges in a graph with V vertices, you must be repeating vertices,

which means you have a cycle. With those observations, we can make an

inefficient algorithm: find a vertex with no incoming edges to put

first. Here it could be A or C, I’ll use A because

I will break ties alphabetically, and then remove that vertex

and its outgoing edges from the graph, and repeat with the remaining

graph. C or E could go

next, I’ll pick C, and then we just continue on until all the vertices

are ordered. That algorithm works. Its efficiency depends on implementation

details, but if we get the graph in outgoing adjacency list format,

finding a vertex with no incoming edges might take a while, and we do

that once for each vertex we put into the ordering. To make it

faster, maybe we don’t need to do that from scratch after every

vertex. To start the algorithm, in time linear for

the size of the graph, we can create an incoming list for

each vertex. How to do that

was covered in the graph representation video from the graph basics

playlist. With those incoming adjacency lists, we can

find all vertices with no incoming edges, those are the vertices that

can go first. We can add

all of those vertices to make an ordered list of vertices that are

ready to go, A and C here. Vertex A goes first, and we use its

outgoing adjacency list to figure out which edges to delete from the

incoming adjacency lists of B, and then E. When we delete A’s edge to

vertex E, we see it is E’s last incoming edge, so we can add E to the

end of our growing, ordered list. Finally, we delete A from G’s list,

and then we’ve finished deleting A from the graph. We continue through the list this way, where

each time we get to a vertex, we look at its outgoing edges, delete

them from the corresponding incoming edge lists, and if

it is the last incoming edge for a vertex, we add that vertex to our list. Let’s add those three

changes to the first algorithm’s pseudocode: first, calculate incoming

adjacency lists which takes time linear in the graph size, then add a

list of all finished vertices in order V time, and finally, for each

vertex in our list that we consider, use its outgoing edges to modify

those newly created incoming adjacency lists. Now we don’t have to

scan the whole graph to look for new vertices to go next. This algorithm still follows the idea of the

first one, but it’s a bit more efficient in its implementation, because

it decreases some of the repeated work. Next, I’ll do a little clean up. Because of the changes we just made

to the algorithm, we don’t need to delete the vertex u from the graph

anymore, we just need to delete it from the incoming adjacency lists

it is in. We can imagine using those lists, that we

created at the start of the algorithm, as working space,

while the outgoing lists don’t have to change at all. Also, that first round of changes

introduced a bit of clutter: we don’t need to have one ordered list of

vertices that are ready to go, and another for our topological order,

especially because both of them are in the same order. Let’s drop the

extra list. How efficient is it now? The initialization stuff takes linear time,

and assuming that you can add to and get the next vertex from the list

in constant time, going through all vertices and edges takes linear

total time, excluding the time to remove edges from the incoming

lists. But that removal, for each edge, takes…

some non-constant time. It might be a bit slow. Linked lists or array lists can take

linear time in the in-degree, and a balanced tree would take

logarithmic time. Without analyzing it exactly, can we do better? Well, how do we really use those incoming

lists? The only thing we

use each one for is to see if it is empty. We are keeping these

perfect lists, and deleting just the right edge from each, in order to

see if it is empty. We never use the list contents, only its size. So, just track the size. Instead of incoming adjacency lists, track

in-degrees, and decrement them instead of deleting edges. If they go

to 0, add them to the list. That’s the whole algorithm. I won’t try to write more detailed pseudocode,

but I do want to analyze the efficiency of this version more

carefully. Finding all

in-degrees takes linear time. Over the course of the entire

algorithm, each vertex is added to the list at most once, either at

the start if it has no incoming edges, or when its in-degree gets

decremented to 0. Each vertex in the list is considered at most

once, and each edge from the vertex causes at most

one constant time decrement. So, the algorithm takes time linear in the

graph size. Something to note when developing this algorithm. What if you had

made those last changes, to deal with in-degrees instead of incoming

lists, but kept the separate topological order, alongside the ordered

list? The algorithm still runs in linear time. What if you replaced

your ordered list with a queue, or even a stack? It still works, and

still in linear time. This version uses the order that things are

removed from the stack, but what if we used the order that things went

in to the stack? It still works, and its still linear time. This is

a problem where, after a few iterations, you can have a lot of

different versions, all asymptotically efficient. You can still clean

those versions, maybe making them slightly faster, but lots of

different versions are pretty good. I like to build a single ordered

list and step through it while I build it. First in first out order

also somehow seems more fair to me than stack order, even though

the vertices don’t complain either way. Lines like “not done with S” are pretty informal,

you might want more precise pseudocode if you want to try to formally

prove that the algorithm works, but all of these variants

will have similar proofs. You could use a loop invariant stating that

no vertex gets added to the list until all vertices that must precede

it have been considered as vertex u. For each vertex, the in-degree represents

the number of vertices with edges to that vertex that have

not yet been considered as vertex u. The invariant initially holds when vertices

with no incoming edges are added to the list, because

no vertices need to precede them. It gets updated each time a new vertex u is

used to decrement in-degrees, which is the only way

those values get decreased. Now, a detailed example. For this graph, first go through all edges

to figure out vertex in-degrees. Add A and C to our list because they

both have in-degree 0. Taking A, we go through its outgoing edges. For each, decrement the in-degree of the vertex

it leads to. B gets

decremented, and then E, which was only waiting on 1 vertex, and this

edge tells us that it was A. Now, E isn’t waiting on anything else,

so add E to the end of the list. G gets decremented too. Next, C

decrements G, and E is up next. E decrements D, adding it to the

list, and G, which isn’t ready yet. D ends up decrementing B, F, G, adding all

of them to the queue. These

vertices don’t have any outgoing edges, so we run through them and

finish the list. If we rearrange the vertices to be in topological

order, we see that all edges point to the right. Finally, what happens if there is a loop in

the graph? It won’t find

a topological order, which doesn’t exist for a graph with cycles, but

what happens if we run the algorithm on a graph with a loop anyway? Any vertex on a cycle will never get down

to in-degree 0, so it will never be added to the list. The list returned by the algorithm will

only include vertices that aren’t in, or reachable from, any cycle. We can just check the size of the list to

see if we have the whole graph or not. If not, it depends on why we were finding

the order before we can figure out what we should

do next. The next video, with a Depth First Search

based topological sort, uses some of the same ideas used here, but behaves

very differently if there are cycles. Anyway, that’s it for this one. I have to work on

my acting skills for my next video. Kahn!