Compilers Lecture 0: Introduction and Syllabus


>>CSC 151, Compiler
Construction. And my name is Ghassan Shobaki. All right. So today, like I said, it will
be an introductory lecture. And we’ll go through
the syllabus. So, hopefully, by the end of
this lecture it should be clear to you what we will be going
to cover in this course, and, you know, the work that
you are required to do. So this is the compiler course. So what’s a compiler?>>Translate source code
into machine object code.>>Okay. Source code. And what’s source code? Yeah?>>Languages like
Java and [inaudible]?>>Yeah. So how do you —
how do you define these? I mean, how do you — how would
you describe the difference between these and machine code? Yeah?>>The source code is
a high-level language, and machine code is low level.>>High level and low level. So what do you mean
by high level? Yeah?>>It combines operators
together such as, like, adding two numbers together and
[inaudible] it could take, like, four or five lines for
what you want to do. While in high level languages
it condenses down to one.>>Yeah. This is true. Another way of — yeah?>>So high-level language would
be something that is meant to be human readable,
human understandable. Whereas a low [inaudible]
language is something the computer itself understands.>>Yeah, exactly, exactly. So this is the —
this is the key point. You know, human — the high-level languages
that you are used to program in are human readable. So there are languages
that humans can understand, and that maximize productivity
so that when you program in these languages, you don’t
have to worry about many, you know, machine-level details. So you should focus on
your high-level languages, allow you to focus on the
problem that you are trying to solve by writing
this program. So this is the difference. And this is what
a compiler does. A compiler translates
a program written in a human-like language
into machine code. So, in fact — This is hidden. So the compiler will take — You know, it’s going to
take something, like — You know, z equals x plus
y. And to produce something like what will the machine-level
code for this look like? So this is an expression — or
it’s an assignment statement. This is an expression
assigned to a variable. So what will the machine-level
code look like here? Yes?>>It would set a register to x.
It would add to that register y, and then it would store that
register value back into z.>>Okay, this is good. Yeah. So assuming that z
and x and y are variables, program variables — so assuming
that they are in memory, so in your memory — and, by the
way, in most cases, you know, we assume that the
variable names that we work with in this course are
local variable names. So they are somewhere
on the stack. So you have these x and y and z. So these are variables
in memory. So what you want to do
in order to add these on a typical [inaudible]
machine, you would want to load x into some
register, say R1. And then you want to load
y, you bring it from memory, and put it into some
register R2. And then do what? [ Inaudible Speakers ]>>Add R1 and R2.>>Add R1 and R2. And –>>Store the result
in another register.>>Store the result in some
other register like R3. And then the final step –>>Store R3 back to z.>>Exactly. Store R3 back into variable z.
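For concreteness, here is a minimal sketch in C of a toy "emitter" that just prints that instruction sequence. The register names (r1, r2, r3) and the generic load/add/store notation are made up for illustration (they are not a real instruction set), and this is not the compiler you will build in this course; it only shows the shape of the output.

```c
/* A minimal sketch, assuming made-up register names and a generic,
   three-address style assembly; it prints the translation of an
   assignment of the form  dst = lhs + rhs. */
#include <stdio.h>

static void emit_add(const char *dst, const char *lhs, const char *rhs) {
    printf("load  %s => r1\n", lhs);   /* bring lhs from memory into r1 */
    printf("load  %s => r2\n", rhs);   /* bring rhs from memory into r2 */
    printf("add   r1, r2 => r3\n");    /* r3 = r1 + r2                  */
    printf("store r3 => %s\n", dst);   /* write the result back to dst  */
}

int main(void) {
    emit_add("z", "x", "y");           /* z = x + y */
    return 0;
}
```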
Don’t worry about the details. You know, for example, you
know, here in today’s lecture, we’re not going to
worry about any details. So x in this case is a variable
name, but in actual assembly, it’s not going to be a
variable name, you know. Assembly doesn’t
know variable names. This is going to be an address
of a variable in memory. So for variable names,
variable names in actual assembly are going
to be addresses in memory, or, more precisely, if there are
local variables on the stack, these are — you know, this x
will be replaced by the offset for that particular
variable relative to the — to the starting address for
that particular function, which is what we call the
activation record pointer, pointer to the activation
record of the function that we are currently compiling. Anyway, we’re not going to get
into any of these details today. But this is what
a compiler does. It takes an expression like
this, and it translates it into some machine-level —
some machine-level operations. Is this all that
the compiler does? Does it just do the translation?>>No.>>No.>>No.>>[inaudible] the
compiler itself does. But it’s only one step
in the full process.>>Okay. What else does it do?>>It’s going to have the — where you have the
preprocessor, then the compiler. Then you have the
linker that links it to an executable program.>>Okay. So what
does the linker do?>>[inaudible] links your
program or the [inaudible] code with library functions for the
— to get it [inaudible] –>>Okay. [ Inaudible Speaker ] Yeah. So the linker
does link your program with library functions. But assuming that you are not
using any library functions, does this mean that you
will not need a linker? [ Inaudible Speaker ] Yes?>>You still need — you
still need the linker to link with the C library, which
you kind of can’t avoid.>>Okay. In addition
to the C library. So in addition to the library,
what does the linker do?>>It, like — it puts — it
puts the object code which, like, doesn’t have —
it chooses a location to put the object code in memory
when the program is loaded.>>Okay. But what
does linking mean? You know, it’s going
to link something.>>It’s going to put all your
compiled, different source files [inaudible] –>>Exactly, yeah. Different source files. Now, this is the key. Different source files. So your program may consist
of multiple source files. And each source file will get
translated into one object file. And linking is about
linking these object files into an executable. So each object file may not
be executable by itself. So linking them will
produce the executable, in addition to linking
with library functions.>>I guess in a nutshell
its job is to resolve all external
references to the [inaudible].>>Yes.>>Internal and external
references.>>Yes, yes. But — yeah — okay. So this is still not — yeah. So there is still an important
thing that the compiler does in addition to doing this. Yeah. Yes, Eric [assumed
spelling].>>Would this be
optimization as well?>>Yeah, exactly. Optimization. So it will be applying
some optimizations. Even though the word
optimization is kind of misleading — using the word
optimization is misleading, because a compiler doesn’t
generate optimal code. You know? I’m not
aware of any compiler that generates optimal
code all the time. Okay? In fact, if,
you know, this — if someone ever figures
out how to build a compiler that generates optimal
code all the time, then researching compiler
optimizations will stop. But this hasn’t happened yet. And it’s unlikely to happen. But the compiler
generates good code. So it applies certain
enhancements. You know? So let’s
call it enhancements, certain transformations to the
code to make the code better, to make it more efficient. Of course, in this course, we will be learning
about optimizations. But the focus of the
course is not going to be on optimizations. The focus of the course
is going to be on how to do the basic translation. So this is the first compiler — this is your first
compiler course. And it’s going to focus on
doing the basic translation from high-level language
to machine code. But we will cover some — you know, a fair
amount of optimizations. But this will not be
the focus of this course. Now, the compiler is not
going to do this in one step. So it’s not going to go — you know, a typical compiler
is not going to go all the way from high-level to machine code. In fact, there are some
intermediate steps. So there are some intermediate
representations of the code. So the compiler will not go — you know, I’m not aware
of any compiler that goes from high-level to, you
know, this level of assembly. There is usually some
intermediate level. And a very typical
intermediate-level representation of the code
is the abstract syntax tree, which is a tree representation
of the code. So let’s take an example that is
a little bit more interesting. So what if we have something
like x equals a plus — or a times b plus c times d? Okay. What if this is —
what if this is our input? So in this case, the compiler — the front end of the compiler
is going to parse this. And it’s going to translate
this into a tree representation that we call the abstract
syntax tree, which will look like for — you know, for
this expression, we have the a and the b, and we multiply them. Then we multiply the c and
the d. And then the results of the two multiplications
are going to get added. And then you assign the
result of this addition to x. So this is the abstract syntax tree.
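As a rough illustration (the node layout and the names here are assumptions for this sketch, not the data structures the course project will actually use), the abstract syntax tree for x = a * b + c * d could be represented like this:

```c
/* A minimal sketch of an AST for  x = a*b + c*d.  A real compiler's nodes
   would carry more information (types, source positions, and so on). */
#include <stdio.h>
#include <stdlib.h>

typedef enum { NODE_VAR, NODE_MUL, NODE_ADD, NODE_ASSIGN } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *name;            /* used only by NODE_VAR leaves   */
    struct Node *left, *right;   /* children of the operator nodes */
} Node;

static Node *mk(NodeKind kind, const char *name, Node *left, Node *right) {
    Node *n = malloc(sizeof *n);
    n->kind = kind; n->name = name; n->left = left; n->right = right;
    return n;
}

int main(void) {
    /* The '=' is the root, '+' is its right child, and the two
       multiplication nodes sit below the '+'. */
    Node *ast = mk(NODE_ASSIGN, NULL,
                   mk(NODE_VAR, "x", NULL, NULL),
                   mk(NODE_ADD, NULL,
                      mk(NODE_MUL, NULL, mk(NODE_VAR, "a", NULL, NULL),
                                         mk(NODE_VAR, "b", NULL, NULL)),
                      mk(NODE_MUL, NULL, mk(NODE_VAR, "c", NULL, NULL),
                                         mk(NODE_VAR, "d", NULL, NULL))));
    printf("root node kind: %d\n", ast->kind);  /* a real compiler walks this tree */
    return 0;
}
```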
It’s an intermediate representation of the code. And a typical compiler
will, you know, generate this intermediate
representation, this hierarchical
tree-like representation. And it will apply some
optimizations to it. So there is the optimizer
of the code. Then the optimizer will
apply some transformations to improve the quality
of the code and eliminate some
redundancies, et cetera. And then it will
produce optimized IR, intermediate representation. So this abstract
syntax tree is one of the intermediate
representations that a compiler may use. So compilers typically
use multiple intermediate representations. And the abstract syntax
tree is one of them. So it’s, obviously,
something that is, you know, between high-level
representation and the assembly. And then the last part of the
compiler, which is the back end, is going to do the
code generation. So it’s going to do
instruction selection to select instructions. And there is instruction
scheduling, which is scheduling
instructions. You don’t have to understand — you know, to worry about
understanding the details of these concepts, because
we’ll go through them in detail. So it will find the right
ordering of the instructions. Then it will do what we call
register allocation, which is, you know, deciding which variables will
go into registers. And this will generate
the assembly. Like in this case, you know, what will the assembly
look like? So you have all of
these variables. So you’re going to load them. For example, load
a into register 1. Load b into register 2. And then multiply register 1 and register 2, and put the result in register 3. Then you want to do the same thing with c and d, loading them into registers R4 and R5, and the result of that multiplication will go into register R6. And then you will add registers R3 and R6. And you put the result into register R7. And then you store R7 back into x. Okay?
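Putting the pieces together, here is a minimal, assumed sketch of a tiny back end: it builds the same kind of expression tree sketched earlier for a*b + c*d and walks it, emitting generic assembly with made-up register names. Running it prints exactly the load, mult, add, store sequence just described, using registers r1 through r7.

```c
/* A minimal sketch (not the course's actual back end) of walking an
   expression tree for  a*b + c*d  and emitting generic assembly. */
#include <stdio.h>
#include <stdlib.h>

typedef enum { VAR, MUL, ADD } Kind;
typedef struct Expr { Kind kind; const char *name; struct Expr *left, *right; } Expr;

static Expr *mk(Kind kind, const char *name, Expr *left, Expr *right) {
    Expr *e = malloc(sizeof *e);
    e->kind = kind; e->name = name; e->left = left; e->right = right;
    return e;
}

static int next_reg = 0;   /* naive register namer: r1, r2, r3, ... */

/* Emit code for an expression and return the register holding its value. */
static int gen(const Expr *e) {
    if (e->kind == VAR) {
        int r = ++next_reg;
        printf("load  %s => r%d\n", e->name, r);
        return r;
    }
    int left  = gen(e->left);
    int right = gen(e->right);
    int r = ++next_reg;
    printf("%s  r%d, r%d => r%d\n", e->kind == MUL ? "mult" : "add ", left, right, r);
    return r;
}

int main(void) {
    /* x = a*b + c*d */
    Expr *rhs = mk(ADD, NULL,
                   mk(MUL, NULL, mk(VAR, "a", NULL, NULL), mk(VAR, "b", NULL, NULL)),
                   mk(MUL, NULL, mk(VAR, "c", NULL, NULL), mk(VAR, "d", NULL, NULL)));
    printf("store r%d => x\n", gen(rhs));
    return 0;
}
```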
Now, with this structure of the compiler, you have the front end that will take this
high-level representation and produce this tree-like
representation of the code, the abstract syntax tree. So getting from here to
here is going to take us about 60% of this course. So 60% of the time will be spent to understand how the compiler
can get from this to this. So the process that
the compiler — the process of translating this into this involves what we
call scanning and parsing. Then we have semantic analysis. So let’s have some high-level
descriptions of these. Scanning, so basically
you look at your input, and you translate this into — you divide it up
into tokens or words. In this case, you know,
this is one word — Oh, this is not a good example, because each one is going
to be a single word. But if you had a variable
like, you know, xy23, so this whole thing is
going to be one word, which is a variable name. This is another word which
is an assignment operator. This is a word, or a token. So scanning is going to break up your input program into tokens or words.
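As a small illustration (an assumed toy scanner for a hard-coded input string, not the scanner we will build in the course), breaking an input into tokens might look like this:

```c
/* A minimal sketch of scanning: split the input into identifiers and the
   operators =, +, * of a made-up toy language, printing one token per line. */
#include <ctype.h>
#include <stdio.h>

int main(void) {
    const char *p = "xy23 = a*b + c*d";
    while (*p) {
        if (isspace((unsigned char)*p)) { p++; continue; }
        if (isalpha((unsigned char)*p)) {    /* identifier: a letter, then letters/digits */
            const char *start = p;
            while (isalnum((unsigned char)*p)) p++;
            printf("IDENT(%.*s)\n", (int)(p - start), start);
        } else {
            switch (*p) {
            case '=': puts("ASSIGN");  break;
            case '+': puts("PLUS");    break;
            case '*': puts("STAR");    break;
            default:  printf("UNKNOWN(%c)\n", *p); break;
            }
            p++;
        }
    }
    return 0;
}
```

So the whole name xy23 comes out as one token, and the =, +, and * each come out as their own tokens.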
And parsing is going to take these words and construct sentences
out of them. So scanning is at
the word level. You know, taking letters
and forming words. And parsing is about
forming sentences, which is, you know, a complete program. This is a very high-level
description. Like I said, we’ll
spend most of the time in this course understanding
these — you know, scanning
and parsing processes, and how to get from
this to this. So this is what we will
be spending, you know, all of our time before the
midterm trying to understand. Then semantic analysis is about,
you know, checking for the — if this — these sentences that we have formed
are meaningful or not. Semantics are about meanings. You know, sometimes you can have
a grammatically correct sentence that has no meaning. Like, you know, for example, the
apple ate the boy or something. So this is grammatically
correct, but it’s meaningless. And, you know, in computer
programs, you can write programs that are syntactically correct, but they’re semantically
incorrect or meaningless. You know, like a
program, for example, that does not have
the right typing. It takes, you know, two
objects and adds them and assigns the result to
an integer, for example. Type mismatch is an example of a semantic error. So it will look grammatically correct, but it’s meaningless.
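For example (this is a made-up C snippet in the spirit of that description, and it deliberately does not compile, since rejecting it is exactly the semantic analyzer's job):

```c
/* Syntactically well formed, semantically wrong: the parser accepts it,
   but semantic analysis reports type errors.  This file will NOT compile. */
struct Point { int x, y; };

int main(void) {
    struct Point p = {1, 2}, q = {3, 4};
    int n = p + q;   /* adding two structs and assigning the result to an
                        int: grammatically fine, but a type mismatch */
    return n;
}
```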
So this stuff — this material is going to take us, like, you know, 60% of the time. Then we will look into
some optimizations. And we will — then we’ll spend
some time looking into these, you know, instruction selection,
instruction scheduling, selecting the instructions,
finding the right order for the instructions, and
then assigning variables to registers, because
normally, you know, your program will have
a lot more variables than you
have registers on the CPU. Right? Okay. So this is very much
what we are going to do. We will be building
a basic compiler. So in this course,
you are required to build a basic compiler. And we will be using
the C language. You will be building
a basic compiler for a basic programming language
that has the basic constructs of a programming language. What are the basic constructs
of a programming language? So this is an expression. Right? So our programming
language will have expressions and assignments —
and assignments. What else? [ Inaudible Speaker ]>>Okay. That’s within
expressions and assignments. What are the basic constructs
of a programming language that you can’t write a
meaningful program without? [ Inaudible Speaker ] What’s that? [ Inaudible Speaker ] What kind of rules?>>[inaudible] rules.>>No, no. The constructs from the programmer’s point of view. So you need to have
expressions — the ability to write
expressions to do assignments. What else do you do? [ Inaudible Speakers ] Yeah, conditionals.>>Yeah.>>If/else, which means that if/else is something we can’t live without, we can’t write a
meaningful program without. Also, what’s the other
important construct that we can’t write a
meaningful program without?>>Loops.>>Loops. So these are
the essential features of a programming language. These are the essential
features that we will learn how to build a compiler for,
how to build a compiler that takes a program that has
only these essential features, expressions, assignments,
conditionals and loops, and translates that into assembly code.
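As a rough illustration of that scope (the concrete source language and its syntax are not fixed here; C-like syntax is used only as a stand-in), a program for our basic compiler would use nothing beyond constructs like these:

```c
/* Only the essential constructs: expressions, assignments, a conditional,
   and a loop.  Sums the even numbers below 10. */
int main(void) {
    int i = 0;
    int sum = 0;
    while (i < 10) {            /* loop        */
        if (i % 2 == 0) {       /* conditional */
            sum = sum + i;      /* assignment with an expression */
        }
        i = i + 1;
    }
    return sum;                 /* 0 + 2 + 4 + 6 + 8 = 20 */
}
```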
So we will not be, you know, covering advanced language features like, you know, objects and inheritance and all of that. But you will be given the chance
to do extra work if you want. So the basic requirement, what you are required to do, is to build a compiler that
only implements these basic constructs, and translates
them into assembly language. You know, 90 or 95% of
the lectures are going to be white board —
on the white board. So we will not be using the — we will not be using
PowerPoint much. You know, this time I just put
the syllabus on the white board. But, you know, 95% of the time
we’ll be doing white board. So we will not be
switching, you know, between white board
and a PowerPoint. So these are the — so this
is what we will be doing in this course. So you have to be interested. You know, from my experience
I only taught it once at Sac State, and if you are
not interested, then, you know, you will not do well
in this course. You know, a compiler is a
non-trivial piece of software. So a compiler is an
intelligent piece of software. And in order to build that
intelligent piece of software, there is a lot of science
that you have to learn. And this is interesting because
I didn’t know this, you know, about software until I
took a compiler course. So, you know, I’m originally
an electrical engineer. My degree is in electrical
engineering. And I didn’t have formal
education in computer science. And I was programming. You know, I could program. And I thought of myself
as a good programmer. And I thought that I can
program anything I want. Until I took a compiler course,
I realized that, you know, being a good programmer is not
sufficient to build a system that is as intelligent
as a compiler. So no matter how good your
programming skills are, if you don’t learn the science
that people have developed in order to be able
to build a compiler, you will not be able
to build a compiler. So you have to learn
that science. And, in fact, that has — you
know, has changed my life, because I was under
the impression that computer science doesn’t
have enough science in it. It shouldn’t be called
computer science. I thought, you know,
it’s just programming, and I already know
how to program, so why should I study
computer science? But when I took a
compiler course, I realized that there is
science in computer science. And I decided, you know,
after taking compiler classes, to switch from electrical
engineering to computer science. So compilers convinced me that computer science
has enough science in it, has enough meat in it. And that’s why, you know, I decided to do my Ph.D.
in computer science. So it’s interesting. You know, all of
these, you know, constructs that people have
built in order to be able to — have developed in
order to be able to build a compiler
are interesting. And these are concepts
that you have taken in 135. So in CSC 135, which is
theory of computation, in theory of computation
you took — what were the concepts
that you [inaudible]? [ Inaudible Speaker ] Yeah, finite automata.>>[inaudible] grammars.>>Okay. Grammars.>>Yeah.>>And?>>Push down automata.>>And — yeah. Well, regular expressions.>>Regular, yeah, [inaudible].>>So you need to know these. Now, the good news is that
in this course we will be applying these. So the compiler is an
application of these. It applies these concepts to
build an intelligent system that we can’t live
without, you know, the system that everybody
uses to write their programs in a high-level language,
including Java programmers. So these are the constructs
that we will be using. Now, is there anyone
who hasn’t taken 135? So everyone here has taken 135. So this means that I will be — the concepts of 135 will
be reviewed fairly quickly in this course. So we will be learning how to use finite automata, regular expressions, and grammars to build
a compiler. Okay? So if — and we will
not be doing theory here. So, for example, how many
people like the pumping lemma? [ Laughter ] Okay.>>If anyone raised their
hand, they can leave.>>Okay. Well, in
fact, you know, in this course we will
not be doing theory. We will not be doing
the pumping lemma. This doesn’t mean that theory
is irrelevant, by the way. So I just wanted you to know that in this course
we will be applying. This is an application. A compiler is a real system. This doesn’t mean that
theory is irrelevant. You know, theory opens
new horizons for us. Just, you know, keep in
mind that all the technology that we are enjoying today
and all the electronics and all the wireless systems
that we’re enjoying today, they are based on the science that people have
built centuries ago. You know, they’re based
on nuclear theory. You know, electronics is
based on nuclear theory, because before people started
exploring nuclear theory, they didn’t know
about the electron.>>Right.>>And when they started
exploring that, that was — the only purpose
was pure knowledge. You know? People
were interested in knowing and understanding the
structure of matter. So pure theory and pure
knowledge may not have applications at some point, but you never know
what smart people — what applications smart
people may come up with later on in the future, maybe
decades in the future or centuries in the future. So anything theoretical, anything that is pure
theory today, you know, should not be underestimated. You know, it’s fine to
say I don’t like theory, but it’s not fine to say theory
is useless and irrelevant. This is not true. Theory opens new
horizons for us. Theoretical stuff, whatever
theoretical work people are working on today, it may have
some clever application sometime in the future. You don’t know what clever
people in the future, what applications
people may come up with for these theoretical concepts. So it’s okay to say that
you are like most people and don’t like to do theory. So most people do
not like theory. But it’s not okay
to say, you know, theory is irrelevant
and useless. These are the concepts. So now they should
make more sense. You know, we spend
a couple of — there will be an overview
first, a couple of weeks. It will be a relatively
long introduction so that you know the
big picture, you know, what are the different
parts of a compiler, and what are the
different concepts. So our introduction is going to
take two weeks, relatively long. Then we spend a couple
of weeks on scanning. Then three weeks on parsing. Then a couple of weeks on context sensitive
analysis or semantic analysis. Then a week on intermediate
representations. Then a week on basic
code generation, how to get from an
abstract syntax tree to assembly, generic assembly. Then we spend a week on
instruction selection, selecting the right
instructions, the right machine instructions
for a given sequence of code. Then we will do register
allocation, deciding how — you know, which variables
should be stored in registers. And then a week on
instruction scheduling, which is the right order
of the instructions. And then a week on
code optimizations, the different optimizations. So I hope that these concepts — you know, based on the
introduction that I gave today, I hope that they will
vaguely make some sense, on a high level they
make some sense. Of course, you know, we will be
— as you can see, there will be at least a week for
each one of these. So we will — you will
understand them better when we get there. But at least I hope
that on a high level, you know what each one of these,
you know, concepts or components of a compiler, what
each one of them is. The first three chapters that
discuss scanning and parsing, we will basically
cover the three — most of the topics in the book. I have the — you know, the
textbook for this is going to be Engineering a Compiler
by Cooper and Torczon. So this is going
to be the textbook. So the first three chapters will
get covered almost entirely. But then after that, we finish
the first three chapters — or the first four chapters, we will be covering
selected topics. So that’s why, you know, each one of these topics,
they are big topics. You know, register
allocation is a big topic. But we will only spend a week. So we will only cover a
subset of what’s in the book on register allocation or
instruction scheduling. And instruction scheduling is
my area of research, by the way. So it’s — you know, a
week is extremely short, it’s an extremely short
period of time for me. But we will be trying
to cover, you know, selected topics about these. Questions?
