1   8   Guiding Principles for Analysis of Algorithms 15 min

1 8 Guiding Principles for Analysis of Algorithms 15 min

Having completed our first analysis of an
algorithm, namely an upper bound on the running time of the Merge Short algorithm.
What I wanna do next is take a step back, and be explicit about three assumptions,
three biases that we made when we did this analysis of Merge Short, and interpreted
the results. These three assumptions we will adopt as guiding principles for how
to reason about algorithms, and how to define a so called fast algorithm for the
rest of the course. So, the first guiding principle is that we used what’s often
called worst case analysis. By worst case. Analysis, I simply mean that our upper
bound of six N log N plus six N. Applies to the number of lines of executed for
every single input array of length end. We made absolutely no assumptions about the
input, where it comes from, what it looks like beyond what the input length N was.
Put differently, if, hypothetically, we had some adversary whose sole purpose in
life was to concoct some malevolent input designed to make our algorithm run as slow
as possible. The worst this adversary could do, is. Upper bounded by this same
number, 6N log N + 6N. Now, this, so, sort of worst case guarantee popped out so
naturally from our analysis of Merge Short, you might well be wondering, what
else could you do? Well, two other methods of analysis, which do have their place,
although we won’t really dicuss them in this course, are quote unquote, average
case analysis. And also the use of a set of prespecified benchmarks. By average
case analysis, I mean, you analyze the average running time of an algorithm under
some assumption about the relative frequencies of different inputs. So, for
example, in the sorting problem, one thing you could do, although it’s not what we do
here. You could assume that every possible input array is equally unlikely, and
then analyze the average running time of an algorithm. By benchmarks, I just mean
that one agrees up front about some set, say ten or twenty, Benchmark inputs, which
are thought to represent practical or typical inputs for the algorithm. Now,
both average-case analysis and benchmarks are useful in certain settings, but for
them to make sense, you really have to have domain knowledge about your problem.
You need to have some understanding of what inputs are more common than others,
what inputs better represent typical inputs than others. By contrast, in
worst-case analysis, by definition you’re making absolutely no assumptions about
where the input comes from. So, as a result, worst-case analysis is
particularly appropriate for general-purpose sub-routines. Sub-routines
that you design. Find without having any knowledge of how they will be used or what
kind of inputs they will be used on. And happily, another bonus of doing worst case
analysis, as we will in this course, it’s usually mathematically much more
attractable than trying to analyze the average performance of an algorithm under
some distribution over inputs. Or to understand the detailed behavior of an
algorithm on a particular set of benchmark inputs. This mathemetical tractabilty
was reflected in our Merge Sort analysis, where we had no a priori goal of
analyzing the worst case, per se. But it’s naturally what popped out of our reasoning
about the algorithm’s running time. The second and third guiding principles are
closely related. The second one is that, in this course, when we analyze
algorithms, we won’t worry unduly about small constant factors or lower order
terms. We saw this philosophy at work very early on in our analysis of merge sort.
When we discussed the number of lines of code that the merge subroutine requires.
We first upper-bounded it by 4m plus two, for an array of length m, and then we
said, eh, let’s just think about it as 6m instead. Let’s have a simpler, sloppy
upper-bound and work with that. So, that was already an example of not worrying
about small changes in the constant factor. Now, the question you should be
wondering about is, why do we do this, and can we really get away with it? So let me
tell you about the justifications for this guiding principle. So the first motivation
is clear, and we used it already in our merge short analysis. Which is simply way
easier mathematically, if we don’t have to, precisely pin down what the [inaudible] constant factors and lower-order terms are. The second justification is a little less
obvious, but is extremely important. So, I claim that, given the level at which we’re
describing and analyzing algorithms in this course, it would be totally
inappropriate to obsess unduly about exactly what the constant factors are.
Recall our discussion of the merge subroutine. So, we wrote that subroutine
down in pseudocode, and we gave it analysis of 4m plus two on the number of
lines of code executed, given an input of length m. We also noted that, it was
somewhat ambiguous exactly how many lines of code we should count it as, depending
on how you count loop increments and so on. So even there it’s small constant factors
could creep in given the under specification of the pseudo code.
Depending on how that pseudo code gets translated into an actual program language
like C or Java. You’ll see the number of lines of code deviate even further, not
but a lot but again by small constant factors. When such a program is then
compiled down into machine code, you’ll see even greater variance depending on the
exact processor, the compiler, the compiler optimizations, the programming
implementation, and so on. So to summarize, because we’re going to describe
algorithms at a level. That transcends any particular programming language. It would
be inappropriate to specify precise constants. The precise constants were
ultimately determined by more machine dependent aspects like who the programmer
is, what the compiler is, what the processor is, and so on. And now the third
justification is frankly, we’re just going to be able to get away with it. [sound]
That is, one might be concerned that ignoring things like small constant factors
leads us astray. That we wind up deriving results which suggest that an algorithm is
fast when it’s really slow in practice, or vice versa. And for the problems we
discuss in this course we’ll get extremely accurate predictive power. Even though we
won’t be keeping track of lower terms and constant factors. When the mathematical
analysis we do suggests that an algorithm is fast, indeed it will be. When it
suggests that it’s not fast, indeed that will be the case. So we lose a little bit
of granularity of information. But we don’t lose in what we really care about,
which is accurate guidance about what algorithms are gonna be faster than
others. So the first two justifications, I think, are pretty self evident. This third
justification is more of an assertion, but it’s one we’ll be baking up over and over
again as we proceed through this course. Now, don’t get me wrong. I’m not saying
constant factors aren’t important in practice. Obviously, for crucial programs
the constant factors are hugely important. If you’re running the sorta crucial loop,
you know, your start up’s survival depends on, by all means optimize the constant
like crazy. The point is just that understanding tiny constant factors in the
analysis is an inappropriate level of granularity for the kind of algorithm
analysis we’re going to be doing in this course. Okay, lets move on the, the third
and final guiding principle. So the third principle is that we’re going to use
what’s called asymptotic analysis, by which I mean we will focus on the case of
a large input sizes. The performance of an algorithm as the size N of the input grows
large, that is, tends to infinity. Now this focus on large input size is it was
already evident when we interpreted our bound on Merge Sort. So, how did we
describe the bound on Merge Sort? We said, oh, well, it needs a number of operations
proportional, a constant fact or times in login. And we very cavalierly declared
that this was better than any algorithm which has quadratic dependence of it’s
running time on the number of operations. So for example, we argued that merge sort
is a better, faster algorithm than something like insertion sort, without
actually discussing the constant factors at all. So mathematically. We were saying
the running time of merge short, which we know, which we can represent as the
function. Six N log base two of N + 6N is better than any function which has a
quadratic dependence on n. Even one with a small constant like lets say 1/2 N squared which might roughly be the running time of insertion sort. And this is a
mathematical statement that is true if and only if N is sufficiently large once N
grows large it is certainly true that the expression on the left is smaller than
the expression on the right but for small N the expression on the right is actually
going to be smaller because of the smaller leading term so in saying that merge sort
is superior to insertion sort the bias is that we’re focusing on problems with a
large N so the question you should have is that reasonable is that a justified
assumption to focus on large input sizes and the answer is certainly yes. So the
reason we focus on large input sizes is because, frankly, those are the only
problems which are even, which are at all interesting. If all you need to do is sort
100 numbers, use whatever method you want, and it’s gonna happen instantaneously on
modern computers. You don’t need to know say, the divide and conquer paradigm, if
all you need to do is sort 100 numbers. So one thing you might be wondering is if,
with computers getting faster all the time according to Moore’s Law, if really, it
doesn’t even matter to think about algorithmic analysis, if eventually all
problem sizes will just be trivially solvable on super fast computers. But, in
fact, the opposite is true. Moore’s Law, with computers getting faster, actually
says that our computational ambitions will naturally grow. We naturally focus on ever
larger problem sizes. And the gulf between an N squared algorithm and an m log n
algorithm will become ever wider. A different way to think about it is in
terms of how much bigger a problem size you can solve. As computers get faster. If
you are using an algorithm with a running time which is proportional to the input
size then the computers get faster by a factor of four then you can solve problems
that are factor of four or larger. Whereas if you are using an algorithm whose
running time is proportional to the square of the input size then a computer gets
faster by a factor of four, you can only solve double the problem size and we’ll
see even starker examples of this gulf between different algorithm approaches as
the time goes on. So to drive this point home. Let me show you a couple of graphs.
So what we’re looking at here, is we’re looking at a graph, of two functions. So
the solid function. Is the upper bound that we proved on merge sort. So this is
gonna be 6nlog(base2)n plus 6n. And the dotted line is an estimate. A rather
generous estimate about the running time of, [inaudible] sort. Namely one-half
times N. Squared. And we see here in the graph exactly the behavior that we
discussed earlier, which is that the small N. Down here. In fact because one-half N.
Squared has a smaller leading constant it’s actually a smaller function. And this
is true up to this crossing point of maybe 90 or so. Again, beyond n=90. The
quadratic growth in the N squared term. Overwhelms the fact that it had a smaller
constant and it starts being bigger than this other function six of N + six N so in
the regime below 90 it’s predicting that the insertion store will be better and in
the regime above 90 it’s predicting that merge sort will be faster. Now here’s
what’s interesting let’s scale the X axis let’s look well beyond this crossing point
of 90 let’s just increase it in order of magnitude up to a raise in size 1500. And
I want to emphasize these are still very small problem sizes. If all you need to do is
sort arrays of size 1500 you really don’t need to know Divide-and-conquer
or anything else we’ll talk about — that’s a pretty trivial problem on modern
computers. [sound]. So what we’re seeing is, that even for very modest problem
sizes here, array of, of, size, say 1500. The quadratic dependence in the insertion
sort bound is more than dwarfing the fact, that it had a lower constant factor.
So in this large regime, the gulf between the two algorithms is growing. And of
course, if I increased it another 10X or 100x or 1000x to get to genuinely
interesting problem sizes, the gap between these two algorithms would be even bigger,
it would be huge. That said, I’m not saying you should be completely ignorant
of constant factors when you implement algorithms. It’s still good to have a
general sense of what these constant factors are so for example in highly tuned
versions of Merge Sort which you’ll find in mny programming libraries. In
fact, because of the difference in constant factors, the algorithm will
actually switch from Merge Sort over to insertion sort, once the problem size
drops below some particular threshold, say seven elements, or something like that. So
for small problem sizes, you use the algorithm with smaller constant factors,
and the insertion sort for larger problem sizes, you use the algorithm and better
rate of growth, mainly merge short. So, to review our first guiding principal is that
we’re going to pursue worse case analysis. We’re going to look to bounds on the
performance, on the running time of an algorithm which make no domain
assumptions, which make no assumptions about which input of a given length the
algorithm is provided. The second guiding principal is we’re not going to focus on
constant factors or lower returns, that would be inappropriate, given the level of
granularity at which we’re describing algorithms and third is where going to
focus on the rate of growth of algorithms for large problem sizes. Putting these
three principles together, we get a mathematical definition of a fast
algorithm. Namely, we’re gonna pursue algorithms whose worst case running time
grows slowly as a function of the input size. So let me tell you how you should
interpret what I just wrote down in this box. So on the left hand side is clearly
what we want. Okay, we want algorithms which run quickly if we implement them.
And on the right hand side is a proposed mathematical surrogate of a fast
algorithm. Right, the left hand side is not a mathematical definition. The right
hand side is, as well become clear in the next set of lectures. So we’re identifying
fast algorithms, which, those that have good asymptotic running time which grows
slowly with the input size. Now, what would we want from a mathematical
definition? We’d want a sweet spot. On one hand we want something we can actually
reason about. This is why we zoom out and squint and ignore things like constant
factors and lower terms. We can’t keep track of everything. Otherwise we’d never
be able to analyze stuff. On the other hand we don’t want to throw out the baby with
the bath water, we want to retain predictive power and this turns out this
definition turns out for the problems we’re going to talk about in this course,
to be the sweet spot for reasoning about algorithms okay worst case analysis using
the asymptotic running time. We’ll be able to prove lots of theorems. We’ll be able to
establish a lot of performance guarantees for fundamental algorithms but at the same
time we’ll have good predictive power what the theory advocates will in fact be
algorithms that are known to be best in practice. So, the final explanation I owe
you, is, what do I mean by, the running time grows slowly, with respect to the
input size? Well, the answer depends a little bit on the context, but, for almost
all of the problems we’re going to discuss, the holy grail will be to have
what’s called a linear time algorithm, an algorithm whose number of instructions
grows proportional to the input size. So, we won’t always be able to achieve linear
time, but that’s, in some sense, the best case scenario. Notice, linear time is even
better than what we achieved with our merge short algorithm for sorting. Merge
short runs a little bit superlinear, it’s n times log n, running as the input size.
If possible, we. To be linear time. It’s not always gonna be possible, but that is
what we will aspire toward. For most of the problems we’ll discuss in this course.
Looking ahead, the next series of videos is going to have two goals. First of all,
on the analysis side, I’ll describe formally what I mean by asymptotic
running time. I’ll introduce “Big Oh” notation and its variants, explain its
mathematical definitions, and give a number of examples. On the design side,
we’ll get more experience applying the divide and conquer paradigm to further problems. See you then.

3 thoughts to “1 8 Guiding Principles for Analysis of Algorithms 15 min”

  1. This lesson can be reduced by a factor of two and give the same information.

Leave a Reply

Your email address will not be published. Required fields are marked *