Google I/O 2012 – Life of a Native Client Instruction


NICK BRAY: Hello. My name is Nick Bray. I’m a software engineer. And I’m working on
Native Client. This is, unfortunately,
where dinner parties get a little awkward. So Nick, what’s Native Client? It’s a developer thing. It’s part of Chrome. I work on Chrome. That’s usually a fairly
soul-crushing thing. But fortunately, this isn’t
a dinner party. And we can talk about
interesting things. So when I say “interesting,”
what I mean is– you get to drink from
the fire hose. So we’re going to be discussing
address space. We’re going to be discussing
instructions, assembly language, that kind of thing. I will try to make sure everyone
can follow along, even if you don’t have a huge
amount of background in this. But we’re going to get into
the nitty gritty technical details of how Native
Client works. Before we do this, of course,
the only kind thing to do is give a bit of an overview and
say how this fits in, why we’re doing it, what’s
important here. So one big thing we keep saying
is that Native Client allows native code to be as safe
and secure as JavaScript. And this is a very compressed
tag line, which unless you actually know what’s going on
behind the scenes, you aren’t quite sure what that means. So one picture is worth
a thousand words. And this is a picture, which
you probably are familiar with, is whenever you try to run
a piece of native code on a computer, you get a
scary dialog box, or run from the web. So say someone tries to install
an NPAPI plug-in on your computer or even download
an EXE from the Internet, then try to run it, well, the
operating system is typically skeptical of any binary which
is coming from the network. And says, hey wait, you probably
shouldn’t do this, but what do I know? You can do it anyways. So the problem with this is that
most users really cannot evaluate whether they should
click Run or not. And you sometimes lose 60 to
90% of your users when they get this dialog box. And even if they do hit Run,
then suddenly a lot of burden is on you. The burden becomes on you as
the developer to make sure that this piece of native code
you installed on your customer system is actually safe and
secure and doesn’t become an attack vector where someone
exploits your customers because you made a mistake. So native code, understandably,
is very scary, especially when you get
these dialog boxes. So does the story end there,
Native Client? Should we be doing this? As it turns out, it isn’t native
code itself which is the problem. It’s the fact that my
presentation has not been refreshed, and the bullet
points aren’t fading in. Hold on a second. OK, back in business. So the problem is that the
operating system has a very different notion of security
than the web does. So say I’m browsing a website
to watch videos of cats, because that’s what we all do,
we just don’t admit it. And then suddenly I notice my
tax return being uploaded to a Russian website. So this right here is a Windows API call to open a file.
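
[The slide itself isn't reproduced in the transcript. As a rough sketch of the kind of Win32 call being described; the path and the wrapper function are made up for illustration:]

    #include <windows.h>

    /* Open a file straight through the operating system: no browser,
       no permission prompt, just whatever the process is allowed to do. */
    HANDLE OpenTaxReturn(void) {
        return CreateFileA("C:\\Users\\me\\tax_return.xls",
                           GENERIC_READ, 0, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    }
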
So really, this Russian website does not need my tax return in order for me to
watch videos of cats. So there’s something seriously
wrong here. But as it turns out, operating
systems are secure. They just think that any
program running on your computer is acting
on your behalf. So because you can open your
own spreadsheet, it assumes that any native program should
have access to it. The web figured out this is
probably not the way things should run when you’re loading
other people’s programs. So instead they say that the
program is operating on behalf of the website. And it should only do things
which you authorize the website to do. What this means, of course, is
that if you load native code and it can talk to the operating
system, it can just blow right past the browser. And that is the fundamental
problem with native code, is it can do things that your web
browser says is unsafe. Another problem, which isn’t
immediately apparent, is this is a Windows API call
to open a file. And imagine if you were, say,
writing a nice application to view cat videos, which also
happens to upload files. Well, suddenly you have a cross-platform support issue. You can open the files on
Windows, but are you going to support Mac builds? Are you going to support
Linux builds? I mean, honestly, when you’re
writing malware, it becomes a huge problem. Of course, there’s some
honest developers who have the same problem. And that’s, when I distribute
native code, how do I make sure it actually
runs on all the operating systems out there? Another thing, again, which
isn’t immediately obvious, is this is a synchronous call. A lot of operating system APIs
were designed back in the days where synchronous blocking
of things seemed like a good idea. But with the advent of browsers
and JavaScript, a decision was made to eliminate
the use of threads for the most part within a single
JavaScript environment, a single document. And instead, everything was
single-threaded with asynchronous callbacks. So APIs have had to change in
order to support the web. Whenever you open a file on the
web, you, in fact, give it a callback to call you back. So, the big crux of Native
Client is making you talk to the web browser instead of
talking to the operating system, and in fact making it
impossible to talk to the operating system. So this is an example of what
a native program talking to the web browser would
look like. It’s a little ugly, but
it's from real code. And well, real code is ugly. So this is an example of doing a URL request to get the page www.google.com. It's analogous to what we were seeing with opening a file. But it's a different API, and it's routed through the browser.
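
[The code on the slide isn't captured in the transcript. Here is a rough sketch of what a URL request through the Pepper C API looks like; it assumes the interface pointers were already looked up from the browser, and that the URL string has already been wrapped in a PP_Var.]

    #include "ppapi/c/pp_completion_callback.h"
    #include "ppapi/c/ppb_url_loader.h"
    #include "ppapi/c/ppb_url_request_info.h"

    static const struct PPB_URLLoader* url_loader;         /* assumed already fetched */
    static const struct PPB_URLRequestInfo* request_info;  /* assumed already fetched */

    static void OnOpen(void* user_data, int32_t result) {
      /* The browser calls back asynchronously; read the response here. */
    }

    static void FetchGoogle(PP_Instance instance, struct PP_Var url) {
      /* url is assumed to already hold the string "http://www.google.com/". */
      PP_Resource request = request_info->Create(instance);
      request_info->SetProperty(request, PP_URLREQUESTPROPERTY_URL, url);
      PP_Resource loader = url_loader->Create(instance);
      /* Asynchronous, like everything on the web: the browser does the I/O,
         then calls OnOpen once the request has been opened. */
      url_loader->Open(loader, request, PP_MakeCompletionCallback(OnOpen, NULL));
    }
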
So Native Client provides a bunch of APIs for I/O that are mediated through the browser
through an API called the Pepper Plugin API. The Pepper Plugin API you can
think of as a successor to the Netscape Plugin API, where
things we’ve learned in the meantime, such as 3D graphics
are good, have been incorporated. And instead of just drawing to
a random window, you can now delegate to the browser and say
here’s some 3D content, just like WebGL. So ultimately, the Pepper Plugin
API gives you a lot of functionality similar
to JavaScript. As you can think, all the APIs
that JavaScript has to open URLs, to draw 3D content, it’s
also exposed to native code through Pepper. Not everything is I/O. So if you
want to spin up a thread or do things like that, that
actually occurs within a single process. You don’t need to talk
to the browser, don't need its approval. And for that, we've used
the POSIX API. So you can spawn threads
and do similar things. So if your code’s running on
Linux, you can port the I/O to use Pepper. And more or less, everything
else should look relatively the same. And why are we doing this? The ultimate goal is no
scary dialog box. You can just run the code. It follows the web safety rules,
so you don't have to warn the user. It's part of a seamless experience. And in fact, most users won't know they're running NaCl. We have a lot of games in the web store now where we aren't trumpeting NaCl. It's just that you can run Bastion
on your computer now. So the life cycle of a Native
Client application has three distinct stages. The first stage is what
the developer does on their computer. So you can get a bunch of
sources, existing library. Or say you’ve written a game,
and you want to port the game, run it on the web. You do supporting work on your
C files, and then you use a modified version of GCC,
which we provide. There’s one wrinkle on this. And that’s that you need to
use a version of GCC that targets binaries for different
platforms– or different architectures,
I should say, chip architectures. So the binaries that are
produced are OS independent, but for the moment, they have an architecture dependency, an instruction set architecture dependency. So for this talk, I'm going to
show you the internals for the x86-64 sandboxing model. And you can think that the
x86-32 and the ARM sandboxing models are quite similar. The details differ, but
spiritually they’re the same. At the end of the year,
we’re going to have a product called PNaCl. I should’ve defined this
a little earlier. When I say NaCl, I mean
Native Client. And I've just used this term so much, I use it automatically. So I need to make very sure that everyone knows I mean Native Client. So PNaCl, Portable Native Client, P-NaCl, is going to use an LLVM-based tool chain,
which will allow you to ship bit code, platform independent
bit code, across the wire. And that’ll get translated to
whatever architecture you want to use on the computer. So that’ll be roughly
the end of the year. And at the bottom level, what
we’re going to talk about today remains the same. So the interchange format will
change, but the sandboxing model, the inner mechanics,
is going to stay the same. So this modified version of GCC
outputs code which we can later statically analyze. And we’ll get into
what this is. We call it validation. Once you compile this code, you
upload it to a web server, just like a normal web app. In fact, it looks a lot
like a normal web app. You have an HTML file. You can have JavaScript. You can have CSS. And then within that app, you
have an embed tag somewhere. And the embed tag pulls in the
Native Client executable, and it can talk with the rest
of the web page. So you can make a UI
with HTML elements. And finally, at the end,
there is the user. The user is running a browser
on the computer. The browser loads the page,
loads the embed tag, pulls in the Native Client executable. So the question to
ask at this point is, where’s the security? Ultimately, the user wants to
say that this application I’m running on the network isn’t
going to harm my computer. So how are we able to make that assertion? We actually can't say that
about the compiler. So the compiler tries to output
code, which we can verify as safe. But we don’t trust it, because
at the end of the day who knows what the developer’s
intending. They could just have an
arbitrary binary blob that they put together with
the hex editor. And when the user gets it, they
have to look at it and verify it’s safe before
they run it. And similarly, even if
it isn’t malicious, compilers have bugs. So GCC, LLVM, very complex
pieces of software. They were not written with
safety in mind to begin with. So saying that these compilers
are going to produce perfect code, that’s a bad assumption
to make. Instead, on the web browser, we
look at the code before we run it and apply some simple
rules to try to verify it’s safe rather than saying this
big complicated piece of software is where
the safety is. When Native Client actually runs
an EXE, the process model looks a little bit like this. So what you think of as the web
browser, what you see is called the browser process. And that’s just a normal
application running on your computer, talking with the OS. But every time Chrome visits a
new domain, it usually splits it off into its own process
and says there is a render process for the specific site,
which can do all the JavaScript execution, all the
rendering of the DOM. And we’re going to try to
keep sites separate. So if one site is compromised,
it rattles around in its own process and has a much harder
time attacking another site, stealing your credentials from
your banking system, or things like that. So these renderer processes run
in something called the Chrome Sandbox. The Chrome Sandbox, you can
think of it as deprivileging the processes. It says, hey, if these processes
ask for your tax return, that’s probably
a bad idea. So don’t trust them, don’t
give it to them. So you’d think that this
solves most of the problems for NaCl. But as it turns out, we’re
following a pattern called defense in depth. We try to build layers, each of
which is secure on its own. And if one of those layers
fails, the other should catch the problem. And there’s actually some
subtle problems with the sandbox I’m not going
to get fully into. But Native Client tries to
provide an inner sandbox inside its own process. So when you have an embed tag
in the web page, instead of running the Native Client
executable inside the render process, it spins up yet another
process, and then applies the inner sandbox to
make sure it never can– or we try to make sure it can
never do anything bad. So for the rest of this
presentation, I’m going to be talking about the
inner sandbox. I’m going to be talking
about what happens in the NaCl process. Now, there’s a lot of little
pieces that build up in order for us to verify that the
process isn’t talking with the operating system, or more
correctly, the code that’s loaded across the network is not
talking directly with the operating system. And we can do very controlled
calls to provide services that are needed. So the first step in this
journey is being able to understand what code we have. So you’d think this is easy. You’ve done assembly language. You see a lot of instructions. And we just look at the
instructions and say, bad instruction. We’re not running it. End of the story. However, computers see the world
in a different way than humans usually do. And that’s that native code
is a stream of bytes. And they start executing
the stream of bytes– pull in bytes, execute, pull
in more bytes, execute. And if we really want to
understand what the processor is doing, we have to disassemble
the code. We have to look at it from the
CPU’s point of view and see what it’s going to execute. So before we get into why this
is all difficult, the question is, what are we looking for? What instructions do we
not want to execute? The first one which I’ve been
harping on is syscall. So syscall, just as a
convention, on the right I will have the bytes that these
instructions compile into. So syscall is a two-byte
instruction. And what this does is, it says,
hey, operating system, I want you to provide a service for me. And without the outer sandbox, without the Chrome sandbox, there are very obvious problems here, in that you can open files, do all sorts
of bad things. But even with the Chrome
sandbox, there are still a lot of problems. So there’s a recently publicized
vulnerability in the Intel implementation of the
x86-64 architecture, where the sysexit return– so in the
operating system return from a syscall, if you set it up in such a clever way, you could cause it to overwrite arbitrary memory inside the operating system and result in an exploit, where you could escalate privileges in
the operating system. So the silicon itself
allowed an attack on the operating system. The bottom line is, we
simply do not want to make these calls. These calls are the gateway
to the operating system. They are an attack surface. So even if we’re in a
de-privilege process, we don’t want to make them in
the first place. Another interesting
instruction. This is actually a fairly old
example, but famous, is the F00F instruction. The F00F instruction, because its encoding starts with the bytes F0 0F, had this nasty habit of actually
freezing your entire computer when executed on an
older Pentium. So under the hood what was going
on is it applied a lock, and then it tried to execute
an invalid instruction. And it never recovered
and unlocked so your entire CPU hung up. So if you talk to some security
people, they’ll say, well, this isn’t really a
security vulnerability. Because, well, you know, you
aren't losing your bank account information to some
random hacker. But if you think about it from
a web perspective, do you really want to surf to a web
page and have to power cycle your computer? It’s bad. So there is these classes of
instructions that again we want to blacklist and say if
we encounter these in an executable, obviously the
person's up to no good. So syscalls, F00F instructions,
we’re not going to mess with them and just
reject the binary outright and not run it. So there's a third class
of instruction, which is a little weird. And if you just look at this
instruction, all it does is multiply a bunch of
numbers together. Perfectly safe. There’s no problem with this. Well, as it turns out, the one
wrinkle here is that this is part of a new instruction set,
the SSE4 instruction set. So if you’re running on a
computer that doesn’t support this instruction set, what
happens when you try to execute it? So in theory, it should just
halt the process, but has everyone really tested every
invalid instruction possible on every chip? So instead of running this risk,
instead what we do when we encounter it, instead of
rejecting the program, we simply write over it with
halt instructions. So a well-formed executable
should not try to execute this instruction if it’s
not supported. But if it does, because we
overwrote it with halt instructions, that causes the
execution to stop when it encounters it, just like it
theoretically should if the instruction was not supported
by the processor. So overall, Native Client is
looking for a variety of instructions that wants to say
either don’t run the program, or overwrite this to be
sure that we’re safe. So how do we find these
instructions? That’s the crucial step. So previously, I said it’s
a stream of bytes. And you’re taking chunks
out of the stream of bytes as you go. And that’s nice until you
realize that you aren’t just going in one direction. You can occasionally hit
a jump that’ll take you somewhere else in
the execution. And it could be an
arbitrary byte. So there’s two classes
of jumps we’re going to deal with. One is direct jumps, jumps where
you know the address that you’re going to, and
indirect jumps, where your address is calculated from data,
and you may not know exactly where you’re
going upfront. So here’s an example of a
problematic direct jump. So two instructions here. The first instruction loads a
constant into a register. Now, this is a strange constant,
but we’ll assume the programmer knows what they’re
doing, and they have some reason for that constant. And then the next instruction
jumps backwards 4 bytes. So on the surface, this
should be OK. But the fundamental problem is
that the move instruction right before the jump backwards
is 5 bytes. So you’re actually jumping back
into the middle of the move instruction. So if we look at how the
processor sees this instead of how our human eyes see the
assembly instruction, it first sees the byte b8 and says, oh,
b8, that’s a move constant into eax instruction. And then there’s going to be
4 bytes following it, which define a constant. So it happily pulls out the
constant, moves in the register, goes on. Then it says, oh, there’s a
jump backwards 4 bytes. It’s actually a jump backwards 6
bytes, because the processor calculates from the end of the
instruction, whereas the assembly language calculates
from the beginning. Just a detail, but
some people may find it a little confusing. So you jump backwards 4 bytes,
and then the processor happily starts executing what
it previously treated as a constant. So it says oh, 0f 05. Hey, that’s a syscall. Let me do a syscall. And then suddenly, you don’t. And then it sees a jump, which
nicely takes you out past the previous jump, and you go on
with your normal execution. So in one single instruction, we managed to smuggle in two additional instructions, which just entirely compromised your system.
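
[The exact bytes on the slide aren't in the transcript; the following is one plausible encoding of the sequence just described, written out as data so both decodings are visible.]

    /* One possible encoding (not necessarily the slide's constant): a 5-byte
       mov followed by a 2-byte jump that lands inside the mov's immediate. */
    static const unsigned char smuggle[] = {
        0xb8, 0x0f, 0x05, 0xeb, 0x02,  /* mov eax, 0x02eb050f                 */
        0xeb, 0xfa                     /* jmp back 4 bytes in assembly terms  */
    };                                 /* (rel8 is -6, measured from the end) */
    /* Re-decoding from offset 1, where the backwards jump actually lands:
         0f 05   syscall            <- the smuggled instruction
         eb 02   jmp forward, past the original jump, back to normal code */
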
So Native Client doesn't play this game. If it ever detects a program
trying to jump into what it previously thought was an
instruction, it says I’m not going to run this program. I’m not going to touch it. Obviously, you’re doing
something sketchy. So if you generate code like this, we don't run it.
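
[A sketch of the check being described, assuming the validator keeps a record of every instruction start it saw while disassembling; the names here are illustrative, not the actual NaCl validator code.]

    #include <stdbool.h>
    #include <stdint.h>

    /* Set while disassembling: one flag per byte, true where an
       instruction starts (the "bit vector" mentioned in the Q&A later). */
    static bool is_inst_start[0x10000];

    /* A direct jump is only allowed to land on a recorded instruction start;
       anything else means the program is trying to jump into the middle of
       an instruction, and the whole binary is rejected. */
    static bool direct_jump_ok(uint32_t target) {
        return target < sizeof(is_inst_start) && is_inst_start[target];
    }
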
The other class of jumps are indirect jumps. These are a little harder. Because you don't know exactly
where you’re going. So how do we tell if we’re
jumping inside an instruction or not? So this is a C example of where
indirect jumps come in. You take a function pointer. You call the function pointer. So if you look at the assembly
language, this is very simplified, assuming
you have an aggressive optimizing compiler. You do a direct call to a known address, which returns a pointer we don't know the value of, so it's not fully disassembled. And then the return value ends up in rax, the register. And then you say, yeah, just call that, whatever it is.
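
[The C on the slide isn't in the transcript; a minimal example of the pattern being described might look like this, with GetCallback standing in for whatever produces the function pointer.]

    typedef void (*Func)(void);

    Func GetCallback(void);      /* hypothetical: hands back some function pointer */

    void Invoke(void) {
        Func f = GetCallback();  /* direct call to a known address; result in rax */
        f();                     /* indirect call through the register            */
    }
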
So if we were being aggressive, we could do some deep analysis, try to figure
out what the pointer could possibly point to. But that’s hard. That’s expensive. So instead, a much simpler
thing to do is look at individual instructions and
say can we infer from the sequence if what we’re
doing is safe? So what we’re saying is that
this function pointer could be a random number for
all we care. Can we make jumping to
a random number safe? Can we make sure that jumping to
a random number doesn’t get us inside an instruction? So the first step, which may not
make sense until you see the second step, is any pointer,
any function pointer, any instruction pointer that
we’re going to jump to, we first put a mask on it. And that mask says, drop
the lower 5 bits. Set the lower 5 bits to 0. So how does this help us out? What it means is that instead
of being able to jump anywhere, we can instead jump
to every 32 bytes, 1/32 of everywhere. And this isn’t immediately
obvious how it improves our lives. But we modified a
compiler, right? So we can tell the compiler
that instead of having instructions that could move
over or lap over the 32-byte boundaries, any time you would
potentially omit an instruction that overlaps the
boundary, nudge it down a little bit. Stick in extra operations
that do nothing. And then we know that any time
you do an indirect jump to a 32-byte boundary, you will be
hitting the start of an instruction instead of the
middle of an instruction. So the mask allows you to jump
to known safe locations, even if you don’t know what
those locations are. Here’s a more concrete
example. Here is that funky move again
with that constant. And if you generated it so that
it overlapped the 32-byte boundary, an indirect
jump could again execute this syscall. Because it would go to a mov 32
address, see the 0f 05, and boom, there goes your
tax return. So instead, the validation
algorithm would reject this, because it overlaps
the boundary. Instead the compiler would
generate this extra no-operation and move the
instruction down. So the combination of not
allowing direct jumps inside that instruction and making
sure that no instructions overlap 32-byte boundaries allow
you to know where all the control flow in your
program is going. Aha, you say, but I’m
a clever hacker. I can modify the code after
you validate it. So validation just happens
at the beginning. We say, we’ll look
at the code. If it’s good to go, we’ll
let you run the code. So, to prevent code
modification, we say any time we have a chunk of data, which
represents code, it’s going to be readable and executable
but you won't have the permission to write it. So everything that goes through
the validator, once we know what it does, we make sure
it keeps doing what we know it does. Aha, you say. But what about things
that aren’t code? So I can just do a buffer
overflow somewhere, jump to that buffer overflow, start
executing it, and I just executed code you haven’t
validated. Well, again, every piece of data we make sure can be read and written, but not executed. So this plugs the hole for self-modifying code.
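
[A sketch of the permission policy just described, assuming POSIX mprotect; the real service runtime does the equivalent, and these function names are only illustrative.]

    #include <stddef.h>
    #include <sys/mman.h>

    /* Validated code: readable and executable, never writable. */
    static int seal_validated_code(void* addr, size_t len) {
        return mprotect(addr, len, PROT_READ | PROT_EXEC);
    }

    /* Plain data: readable and writable, never executable, so a buffer
       you overflowed can never be run as code. */
    static int make_plain_data(void* addr, size_t len) {
        return mprotect(addr, len, PROT_READ | PROT_WRITE);
    }
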
So I just lied to you. And that's that things can
change after the initial setup of the program. So you can load dynamic
libraries. You can have just-in-time
compilers which emit new code and actually modify the code
in very controlled ways. But how you do that is
kind of complicated. Because you need to make sure
that if there’s multiple threads, you never get memory
de-coherency, where you execute an instruction
which is in the middle of being modified. At the end of this presentation,
I’ll have a link to the research papers. So if you’re really interested
in how we do memory safe code modification, you can read
up or ask me afterwards. But for this presentation, we’re
going to ignore this rather large, ugly issue. Another thing is that mprotect
is now security-critical. So syscalls, we’ve thought about
all the damage we could do with them. But now we can start doing
indirect damage like unprotecting a page, writing it,
then boom, we’re executing code that is invalid. Similarly, there’s other
syscalls like GetProcessID, not immediately obvious why
they’re dangerous, but they can be used to escalate attacks
by knowing where you’re going from. So the name of the game is
white-listing, only allowing functionality we know is safe
instead of saying, eh, do a syscall, whatever. So that's the basics of how we
allow code to be disassembled and validated. And if you actually start
looking at how this affects calling and returning a
function, there’s some interesting things that
get shaken out. So I'm going to do this in reverse. I'm going to show how you
return from a function. Then I’m going to show how you
call it, because the return impacts the call. So usually, returning from
a function is a single instruction. Return, pop an address off the
stack, jumps to that address, and you’re back to where
you called from. So you could call the same
function for multiple places. So the call records where you
called from on the stack in order to be able to
return to it. But implicitly, this is
an indirect jump. So a malicious program could
stick a random number on the stack and then jump instead of
calling to the function. And then when the function
returned, who knows where you are. So there is a type of exploit
called return-oriented programming, which uses this
kind of thing where the returns can be repurposed
for jumping to arbitrary locations. So we can try to fix this. We can manually pop the return
address off the stack, mask it, just like we should for
indirect jumps, push it back on the stack, and then return. So problem solved. Well, no. I mentioned threads earlier. And threads are a big problem
because there could be another thread in the background trying
to smash the stack. And so between the moment where
you push the address on the stack and when you return,
the memory could get changed out from under you, and you
could end up anywhere. Who knows where? So we can’t really trust any
addresses in memory. We can only trust addresses
in registers. So what this means is that in
order to return, we can’t use the return instruction. We pop off the stack, mask it,
and then jump to the address. This has a few consequences,
like branch prediction is a little harder. And we’re using more bytes
to do the same operation. So I mentioned earlier that
the sandboxing schemes for different architectures
were different. And this is largely due to the
fact that we are trying to minimize the cost and tailor
it to each architecture. So we try to keep the number
of bytes per sandbox instruction as low as possible
through being horrendously clever about how
we mask things. And if you want to talk about
this, again, we can talk about it afterwards, about clever
instruction encodings. So return. A very basic instruction is
actually dangerous because it’s doing an indirect jump
to a location from memory. So bad idea. We have to do it explicitly. So whenever we have masks, it
becomes critical that we don’t bypass the mask. So we may have two instructions,
the mask and then the jump. But if there’s some other jump
which goes between the instructions, it doesn’t
actually violate what I talked about previously. I said you can’t jump
into an instruction. But if you jump between these
two instructions that are critical for safety,
suddenly you just stripped off the mask. The entire security
model fails. And you have a problem. We call these
pseudo-instructions. So whenever we have a sequence
of instructions which is security critical, we say treat
it just like it was an instruction. So direct jumps cannot jump
inside a pseudo-instruction. Indirect jumps cannot jump
inside a pseudo-instruction, which means that the entire
pseudo-instruction has to not cross a 32-byte boundary. As a terminology, we
call this bundling. So what does this mean for
calling a function? If you just call it like you
expected, just from the middle of a bundle somewhere,
you do the call. You know, you see
the mask here. You see the indirect jump. And then where do
you return to? The problem is that the mask
drops the lower 5 bits of the address, so you aren’t returning
to the address that was pushed on the stack
if those lower bits were not all zeroes. So you end up returning to the
beginning of the bundle where the call was from. And this is obviously
not what you want. This starts to look a bit like
an infinite loop unless you account for it. Where you really want it to
return is immediately after the instruction. So the work-around for this is that whenever you have a call, you pad it down to the very end of the bundle. And this means that the return address is at the beginning of the very next bundle. So when you mask it, when you drop the lower 5 bits, it doesn't change it at all.
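
[A sketch of the padding rules just described; the bundle size is real, but the helper names and the way the padding is computed here are just an illustration of what the modified compiler has to do.]

    #include <stddef.h>

    enum { kBundleSize = 32 };

    /* Ordinary instruction: if it would straddle a bundle boundary, emit
       enough no-ops first that it starts at the next boundary instead. */
    static size_t padding_before(size_t offset, size_t len) {
        size_t room = kBundleSize - (offset % kBundleSize);
        return (len <= room) ? 0 : room;
    }

    /* Call instruction: pad so the call ENDS exactly on a boundary, which
       means the return address it pushes is already 32-byte aligned and the
       mask applied on return does not change it. */
    static size_t padding_before_call(size_t offset, size_t len) {
        size_t overhang = (offset + len) % kBundleSize;
        return (overhang == 0) ? 0 : kBundleSize - overhang;
    }
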
So all these instruction sequences that we're showing in fact should not change the
correctness of the program. They are simply there for the
validator to say, oh, yep, I can prove that this is safe. And if somehow garbage data gets
in here, I know that I’m going to be jumping
to a known place. But in normal operation, the
compiler will stick everything on 32-byte aligned boundaries
that we need to jump to indirectly. OK, so yet again, I lied. I seem to be a serial
liar here. I apologize. There’s more going on
in the process than just this bit of code. We can validate a lot of code,
but as it turns out, there’s other code and data in
the process that we don’t fully control. It needs to be there
so we can use it. So we have a world where we
have a single process with code we don’t trust and
code that we do trust. So this is the general view
of what I’ve shown so far. There’s untrusted code
and untrusted data. And what I mean by untrusted
is this code is coming from somewhere across the wire. And instead of having to have
this dialog box that says, well, you’re running
at your own risk. Instead, we validate it. And then we say this conforms
to our rules. So we’ll run it without having
to place trust in it. We will enforce the security
instead of trusting, so untrusted code, untrusted
data. Well, every time you launch a
process, the operating system likes to stick in some code. So you can talk with the
operating system. And we could do something really
nasty, like try to overwrite this, kick it out. But we’re going to need
it eventually. We’re going to need
to do something. Simply living within the
sandbox isn’t enough. So down the road, we’re going
to need to talk to the NTDLL on Windows, for instance. But we don’t want the untrusted
code to do it. Similarly, we’re going to be
talking with a web browser. So the easiest way to do this
is load the DLL for the web browser in the process. So we can call the same
functionality to talk between processes that Chrome uses. There’s also trusted data. So when we’re running the
sandbox, we have to keep track of things like where
code is mapped. Because if the untrusted code
says, well, there’s actually no code there, so why don’t
you map code there again? Then we could get weird
overwrites, partial instructions. So there’s bookkeeping data. And if you could clobber that
data, if you could go there and say overwrite the table
for where all the code is, then the untrusted code
could start doing, again, very bad things. What we need to do is we need to
make sure all the execution and all the data access that
can be done directly by the untrusted code only happens
within a confined region that doesn’t include NTDLL, that
doesn’t include Chrome DLL, that doesn’t include any bit of
code or data which could be used as an exploit. So on 64-bit systems, this is a
4-gigabyte range of memory. And we reserve one of the
registers, R15, to point to the bottom of this range. So one of our security-critical
properties is that R15 cannot
be overwritten by the untrusted code. So as the validator goes
through, it looks for anything that can modify R15. And if something does, it
goes, nope, not going to deal with it. A thing you may note also is
that this is a 4-gigabyte range, which happens to be 2 to the 32, which allows us to do some horrendously clever stuff
to make our masking as small as possible. We’ll get into that
in a second. So here’s a scenario that
we have to worry about. What happens when the untrusted
code tries to jump outside the sandbox? So it can’t do a direct jump
outside this constrained range, because the validator
can’t see the target. And because it can’t see the
target, it can’t tell whether it’s in the middle of an
instruction, so it says, no. But it could do an
indirect jump. So it could do an indirect
jump to a 32-byte aligned boundary somewhere in NTDLL. And we have to allow this,
because you could be loading shared libraries. So you may not know where the
code is before you load it. So what we have to do is we have
to make sure the indirect jumps only fall within this
constrained range. So how do we do that? We have to confine the jumps
to the 4-gigabyte range. Here’s an example. It’s just an empty function. What’s happening implicitly
here, however, is that it’s returning. And as we went through all these
explanations, this is what a return eventually
looks like. There’s this masked indirect
jump back to wherever you came from. But this could go into NTDLL. How do we fix it? So we confine it by masking
it and dropping the upper 32 bits. So we boil it down to
a 32-bit address. Then we add the offset. And then we actually use that. So remember I was saying
horrendously clever? This is not really
self-promotion. When I was doing this
presentation, I had to work through exactly how these
instructions worked. It’s actually pretty
interesting. So the “and” right here
is doing a 32-bit operation on a register. And then later the register is
being used as a 64-bit value. So doing the 32-bit operation
implicitly zeroes the upper bits. And this allows the actual “and”
to be packed down into a single byte data. So it says, it’s going to
be e0 sign extended. And then I’ll implicitly drop
the upper 32 bits, because it’s a 32-bit operation. Then you do a full 64-bit add
and a full 64-bit jump. So the cost of this is about
8 bytes as opposed to 2 bytes. So there's a bit of overhead for doing it this way, but we know where it's going. We know it's only going to be within the confined region. And we know it's only going to be to a 32-byte boundary. And we know there's going to be no instructions that are overlapping those 32-byte boundaries.
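
[The same and/add/jmp sequence written out as C arithmetic, assuming r15 holds the sandbox base; this only illustrates the address computation, not the emitted instructions.]

    #include <stdint.h>

    /* Where a sandboxed indirect jump is allowed to go:
       1. truncate to 32 bits (the 32-bit "and" implicitly zeroes the upper half),
       2. clear the lower 5 bits (32-byte bundle alignment),
       3. add R15, the base of the 4-gigabyte untrusted region. */
    static uint64_t sandboxed_jump_target(uint64_t r15_base, uint64_t target) {
        uint32_t low = (uint32_t)target & ~UINT32_C(0x1F);
        return r15_base + low;
    }
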
The next thing to worry about is reading and writing bits of data that are outside this confined range. Writing is obviously
a problem. If you can write to something,
you can change it. You can control it. It makes attacks much easier. Reading, debatably it’s not an
attack, but this can be used to help attacks. So if you can poke around
memory, find where things are, then you can do much more
controlled jumps, much more dangerous intended
actions than just jumping around randomly. So how do we confine
data access? Here’s an example
of a C function. We’re just taking a function
pointer, and we’re writing a constant to that pointer
wherever it may be. Thus far, we haven’t
talked about sandboxing writes at all. So the Intel instruction for
address points to. So to sandbox it, we do
something similar to jumps. We mask it by moving a 32-bit
register to itself. So again, we rely on the
implicit zeroing of the upper bits. But since we don’t need to
discard the lower bits, it’s just a move. Simple enough. And then we do a complicated
addressing mode, which actually adds R15 simultaneously
with moving the constant to the address
that's computed. So this move instruction is saying add R15 to rax and then multiply rax by 1, and there you go. Instead of 5 bytes to do this move constant, we got 9 bytes. Not too bad.
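
[The sandboxed store written out as C arithmetic, again assuming r15 is the sandbox base; the real thing is the single mov with the complex addressing mode described above.]

    #include <stdint.h>

    /* Truncate the untrusted pointer to 32 bits (the register-to-itself mov
       zeroes the upper half), then address relative to R15 during the store. */
    static void sandboxed_store(uint64_t r15_base, uint64_t untrusted_ptr,
                                uint32_t value) {
        uint32_t low = (uint32_t)untrusted_ptr;           /* implicit zero-extend */
        *(uint32_t*)(uintptr_t)(r15_base + low) = value;  /* mov value, (r15,rax) */
    }
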
There's a little curious thing here, though. There's that multiplier. So we know that rax is a
32-bit value, but that multiplier can be up to 8. So we aren’t actually operating
within a 4-gigabyte range, we’re potentially
doing a write to 8 times 4 gigabytes? And there’s ways you can rack
it up even further with constant offsets. So we could tweak this a little
harder and do the mask and get rid of the
multiplications, but sometimes compilers just like
to generate these. And the more features you get
rid of, the slower the code’s going to be. So instead of trying to do
instruction sequences that are safer, we actually say, well,
40 to 44 gigabytes on either side of this confined range,
we’re going to mark as– we own it. So you can’t use it. So you aren’t actually
allocating the memory. You’re just marking it as no one
gets this memory but us. And it’s illegal to read. It’s illegal to write. It’s illegal to jump to. It doesn’t exist. So if you can do a memory access
which is outside this 4-gigabyte range, you get caught
by the guard region. And that’s how we allow these
addressing modes. And just as a funny side note,
sometimes we get people benchmarking Native Client
and say, you take over 80 gigabytes of memory! And we’re like, do you have over
80 gigabytes of memory? But really what they’re looking
at is they’re looking at address space usage rather
than actual memory usage. So we can’t do anything fun. We can just go inside
the sandbox. How do we get out? How do we actually request the
URL like I showed in the beginning of this
presentation? To do that, at the bottom of
the sandbox Native Client inserts a bit of code called
the trampoline. Now the trampoline is code that
would not normally be validated, but allows you to do
a controlled jump outside the sandbox. So there’s a trampoline entry
for each service we provide, such as spawning threads. And when you want that, you
jump to the trampoline. And the trampoline jumps you out
into Chrome DLL, where we provide an implementation
for that. So the set of trampoline calls
you have, which are analogous to syscalls, are the same
on every platform. So in one swoop, we are
providing a cross-platform API and controlling exactly what
services the native code gets. The trampoline itself, again,
is small, but in some ways overly clever. And we take a constant address,
stick it in a register, and then call
that address. And there’s a few things
going on here. One is that we do the move into
the register instead of doing a direct call, so that
it’s easier to patch the code as we know exactly the address
we’re going to. And in fact, we can make that
address the same for all the trampolines. So if you have multiple
trampolines going to the same place, since direct jumps are
relative, you’d have to do a lot of math and make sure
that you’re jumping to the right address. But here we just jump to
a constant address. Another thing is that it’s a
call instead of a jump, so we can have a trace of where the
syscall’s coming from. So we know, oh, we’re going
through trampoline 4, therefore, we know what
service we’re getting. And then finally at the
end, there’s a halt. So even though we’re doing a
call, we never return to where we called from. It’s just a method to
trace the address of where we came from. So if anyone returns from inside
the system code, it’s going to hit the halt, and it’s going to prevent execution. This is all interesting because
it’s within 13 bytes. So this means that the
trampoline fits within the 32-byte bundle. And this means that indirect jumps will never go inside the trampoline. They can only go to the start of the trampoline. And this is what allows us to do safe exits outside of the NaCl sandbox.
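
[One plausible encoding of the trampoline being described; the scratch register and exact bytes are assumptions, but the point is that the whole thing fits comfortably inside one 32-byte bundle.]

    /* mov r10, imm64 ; call r10 ; hlt: 13 bytes for the mov and call,
       plus one byte of halt. (Register choice here is illustrative.) */
    static const unsigned char trampoline[] = {
        0x49, 0xba, 0, 0, 0, 0, 0, 0, 0, 0,  /* mov  r10, <address outside sandbox> */
        0x41, 0xff, 0xd2,                    /* call r10                            */
        0xf4                                 /* hlt: anything returning here stops  */
    };
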
So putting it all together, this is the API call which I started with. It is loading a URL. So to do this, the untrusted
code initiates it by jumping to the trampoline and saying
I want to do this request. The trampoline takes
it to Chrome DLL. Chrome DLL has an implementation
that says, OK, native code wants to
do a URL request. Well, I can’t do it myself
because I’m running inside the Chrome sandbox. So instead, what I have to do
is I have to talk to the Chrome browser via the
render process. So to do that, I’m going to need
to do some inter-process communication. So it talks to the operating
system and says, hey, send this bit of the data to the
renderer process, and then it will know what to do with it. And at that point, it’s
out of NaCl’s control. It’s just however the JavaScript
call would be. Same paths. That is Native Client
in a nutshell. And I hope you all
followed that. And we have questions
afterwards, if you don’t. There’s more to this. As I mentioned before, dynamic
code loading in JIT, memory consistency, making the
sandboxing model work is a whole other ball of wax. I find it very fascinating. I hope you guys look into it. Portable Native Client. This is the future. Bit code, LLVM tool
chain, fixes the architecture-specific issues
we have now, but you still can use it. You can write applications
now and switch to PNaCl when it’s available. You may have noticed that
nothing in this presentation really has to be inside
the browser. So this is a technical solution,
the sandboxing, software for the isolation. There have been some projects
to use the same sandboxing technology to, for instance, run
computation in the cloud. Or to just say, you know,
I don’t want to audit this piece of code. Well, I’ll just throw it in the
sandbox, and that way, I know that the third-party code
is going to be much more contained than it
would otherwise. Recommended reading. So every time we give a NaCl
talk, we point people to gonacl.com. This is a developer-oriented
site, where you download the compiler, the SDK tutorials
about how to get you started. This talk was a little
technical, more research-based, so we have a
bunch of research papers, too. I point you towards those. If you Google gonacl or Native
Client research papers, you’ll get these URLs. And my favorite one is actually
“A Tale of Two Pwnies.” So every year
or so, there’s a browser security contest. And this year, Chrome
had two exploits. One of them actually touched
NaCl but did not break it. So they used NaCl as
an attack platform to hit the GPU process. And I myself actually learned
a lot from reading this. And it’s really eye-opening to
see how many layers and levels you have to get through to
do a modern exploit. The one that involved
NaCl was six. The other one was 10 because
they had to chain that many different vulnerabilities
together to actually get an exploit. So security, very interesting
field. I strongly suggest you
read that paper. Now the fun part. Questions. [APPLAUSE] AUDIENCE: Thanks for
the presentation. NICK BRAY: My pleasure. AUDIENCE: Out of all of I/O,
it’s probably the most interesting, funnest one that
I’ve ever been to, so that was really cool. I’ve known NaCl for about
an hour and a half. I was wondering, does it matter
what version of C I use? Does NaCl care whether
it’s C99 or C90? NICK BRAY: It doesn’t even
matter if you use a C-compiler. So you can actually hand-write
NaCl code, and it’ll run. But we provide compilers that
generate code that’s compatible. So I think our version
of GCC supports C99. Do you know? Yeah, I think, so
you can use C99. It’s whatever GCC supports,
really. AUDIENCE: The one question
I had was about the indirect jumps. And it sounds like you are
relying on the compiler to put everything on 32-byte
boundaries. That seems to me like the
only position where a handwritten exploit– I mean, you were assuming that
all the code coming in would allow jumps to 32-byte
boundaries. But if I were to hand-write some asm where a jump to a 32-byte boundary was actually a bad execution, how do you manage that? NICK BRAY: Validation. So while we're going through
looking at what the instructions are, we record
where they are, too. So internally, you can think
that we have a bit vector, which contains a bit
for each byte. And every time we see an
instruction start, we say, boom, there’s a bit. So whenever we have a
direct jump we say, is the bit set there? We actually have to do this
after we see all the instructions. But in the final thing, we go,
OK, here’s all the jump targets, here’s all the
instructions, starts– AUDIENCE: But what about
the indirect jumps into the 32 byte– NICK BRAY: We do that
on the fly. So you say, OK, while I’m
parsing this instruction, I notice that it’s overlapping
the 32-byte boundary. Boom, that’s bad. AUDIENCE: OK. So you also make sure that all
32-byte boundary instructions, whatever is at a 32-byte
boundary is also safe. NICK BRAY: Yes. AUDIENCE: So it looks like
you’re creating a 4-gig memory limit again. Didn’t we just get
rid of that? NICK BRAY: It depends on
how you look at it. So there’s all sorts
of devices. But what you’re really saying
is, can I get more than 4 gigs of memory? And the answer is we could
change the sandboxing model, but there would be performance
implications. So a lot of the clever things
we were doing with dropping the upper 32 bits, suddenly
you’re carting around 8-byte constants. And that’s a generally
bad thing. So has it been a problem? And the answer is, we haven’t
really had any developers complain about it. We’ve been living under the
4-gig limit so long, that it’s not been an issue. Plus do you really want an
application in your web browser consuming that
much memory? Eh, most of the time, no. There are some applications
that you write that you may want to. AUDIENCE: In five years,
certainly? NICK BRAY: Yeah. So sandboxing models
are flexible. And once we get PNaCl running,
we can take another look at generating a new sandboxing
model or something along those lines. AUDIENCE: Kind of a
related question. You, at the beginning, showed
x64, x86, and ARM. On x86, you obviously can’t
do the same kind of jump constraint because you are– if you’re on 32-bit x86,
you’re then limited. You’re going to reduce your
memory space by far more and lose a very precious register. NICK BRAY: One gig. And you don’t lose precious
registers. We do something very perverse
on 32-bit Intel. We use segment registers. So we’re bringing back all
these things that you thought were dead. AUDIENCE: Awesome. NICK BRAY: So, for those of you
who don’t know, segment registers say this is the range
of memory you can use. So we say, OK, 256 megs for
code, 1 gig for data. If you jump outside
this, boom. And then we say you can’t change
the segment registers while you’re running. This has a few weird
implications, like most people thought they were dead. So the Intel atom processor
for instance, they didn’t spend so much time supporting
segment registers. They do lip service, but then
when you actually use them in nonstandard ways, it
slows down a lot. AUDIENCE: Thanks. AUDIENCE: Hi. It’s a great project. I love it. It is very perverse though, in
some ways, as you’re going through all of these things. I was just wondering, with the
LLVM thing that you’re going to be doing, does it get easier
now that you control the instruction set? I mean, can you somehow do
something to make this whole process simpler? NICK BRAY: Define the “whole
process simpler.” AUDIENCE: The verification
process. I mean, since you control the
intermediate format, is there some way to– will it become simpler when
you get to the LLVM model? NICK BRAY: The big problem
is that we can’t trust the compiler. So we can’t audit it. We can’t verify that it’s– you know, it should generate
the code we want. But at the end of the day, what
we do is, we have to have validation be the last
line of defense. So if the code doesn’t look
safe, we don’t run it. And we make no assumptions
about its provenance. So what LLVM would allow us to
do is do more creative things. Right now, the binary that’s
shipped across the wire is something that we’ve stabilized
and we said this is what we’re going to support. Once we start supporting bit
code, then we can generate other sandboxing models. We can generate other
interesting low-level things. And it decouples us and gives
us a lot of flexibility. But at the bottom level, there’s
going to have to be some algorithm that goes through
and says does this native code look right? And if it doesn’t,
out of there. And once LLVM’s in
the picture, it should always look right. But we aren't going to bank on that. We're going to always
have the last line. AUDIENCE: Will the bit code be
actual LLVM byte code, or will you have something of
your own nature? NICK BRAY: I think the plan
is actual LLVM byte code. AUDIENCE: That was actually very
similar to the question I was going to ask. When you’re going through the
initial design for NaCl, why did you choose native code
versus LLVM or comparing it to the JVMs and what they do? NICK BRAY: Why choose
native code instead of everything else? AUDIENCE: Was it to have a
simpler just run- time environment, not have to
have an actual JIT? NICK BRAY: Yep. Part of the view was
compatibility, because we have a lot of infrastructure
for native code. So if we’re just running native
code, a lot of that should be analogous, fairly
straightforward. Less overhead. You can access certain
intrinsic instructions directly. You can do threads. You don’t have to solve
all these ugly issues at the VM level. Instead, you can just validate
it and let it rip instead of trying to have a larger surface
area, which you are trying to prove is safe. Part of it was also
a technical issue. We realized, hey,
we CAN do this. So how can we bring
it to the web? So we finally realized
native code doesn’t have to be unsafe. And what are the
opportunities? So we’ve been seeing
a lot of people port games, for instance. And when you spend how many
years writing native code, and you want to port it, well, you
don’t want to jump through too many hoops. You can try to do weird cross
compiles into JavaScript VMs, but, it works some
of the time. Instead, why don’t you just run
the native code, and call the browser instead of the OS. That’s the general philosophy,
is trying to keep the surface area small and trying to make
it as close to other native code development as possible. AUDIENCE: Cool. Thanks. AUDIENCE: I’ve got
two questions. I think they’re both small. Trampolines got me thinking. Do you have any dev tools that
would debug what's going on, so that you can see in the inspector that, OK, it's making HTTP requests and so on? NICK BRAY: Debuggers are
something we’re working on. They’re harder than
you’d expect. Because they’ve made many silly
assumptions that native code just happens to work the
way native code does. So the moment we start adding
this R15 register to offset everything, there’s been a lot
of work to try to get the debuggers to get all
the right symbols. AUDIENCE: I’m thinking on a much
higher level actually. If you’re coming as a web
developer looking at things in the inspector, what’s going
on in this web page? What is it doing? Can I see what stuff is it
requesting on the web? NICK BRAY: At the moment,
half and half. So whenever you’re doing
a Pepper call, that usually gets traced. So every time you see a URL
load, Chrome has the console of all network activity. And it will get logged
in that. So you’re mediating through
the browser, so all the instrumentation the
browser has. What’s actually going on inside
the native process is a little more opaque than I’d
like at this point. And we’re thinking about ways
to expose health and metrics and pull that out
of the process. AUDIENCE: That’s awesome. The second question is, so
you’re defending against all these things that are unsafe
from the system’s perspective. But are you having any checks
and bounds on stuff that’s causing infinite loops that
just eat the CPU? That kind of stuff. NICK BRAY: Nope. AUDIENCE: OK. Thanks. AUDIENCE: This is similar to
the question before last. Once you move into the LLVM
world and you’re sending basically Virtual Machine
instructions and calling a limited API, how would you say
that PNaCl would compare to Java or dot-net? NICK BRAY: One thing about LLVM
is it’s a bit misnamed. So the Virtual Machine
name came earlier in its life cycle. And it’s more a compiler
IR than it is, strictly speaking, a VM. There’s some
architecture-specific things that have got leaked into it,
which have had to be hammered out in order to use it as
an interchange format. So how would byte code
compare against VMs? It’s an interesting question. I think the only real answer
I would add to that is surface area. Securing a VM is going to be
much harder than validating native code and just use
the model that’s there. And the VM will likely be
slower, give or take just-in-time compilers,
how well those do. AUDIENCE: So does the validator
in PNaCl work on native code still? Or is it still validating
LLVM code? NICK BRAY: We validate all
native code before we run it. So PNaCl you can think of as
largely a translator, which is I’m going to take this bit
code, then turn it into machine code. And then we pass it off
to the validator. The validator says, OK, you
did your job right. You’re good to go. AUDIENCE: Thanks. AUDIENCE: Have you done any
benchmarking on the difference between the unmodified code and the modified version you produce, with the added padding and things replaced by pseudo-instructions? NICK BRAY: Yes.
are the results? NICK BRAY: It depends. Also, one of these
horrible answers. I’m not trying to weasel
out, but the truth is, it does depend. So if you’re doing a numerical
application, for instance, you’re not going to have
a lot of jumps. You’re not going to have
a lot of calls. So you can usually rip through
those instructions. But certain benchmarks which are
doing indirect jumps all over the place, you’re going
to get no-op padding. You’re going to get a bunch
of guard instructions. And on 32-bit, because we’re
using segment registers, it’s actually more efficient. So I think on 32-bit, it’s like
a 10 to 20% slowdown compared to full native speed. On 64-bit, our guard sequences
are a little more complex. They take up more bytes,
a few more instructions here and there. And I’m not exactly sure what
the benchmarks are. Again, I say rule of thumb,
20%, although on some degenerate benchmarks, it’s down
40 to 50% just because of the way the code works out. Again, the answer is that no
one’s complained yet either. So it works as intended. AUDIENCE: Thanks. AUDIENCE: Another
PNaCl question. Since you’re going to have
LLVM living in PNaCl to generate the instructions, you
mentioned earlier that you can still do JITs in
Native Client. I’m thinking about some ways
in which you could do that. Are you going to expose
the LLVM translator to applications so they can just
use that instead of having another copy of LLVM inside
to do JIT-ing? NICK BRAY: Interesting
question. Very interesting question that
I can’t answer, because I’m not working on the
PNaCl project. So what they’re working
on there. But you can imagine all the
complexities of, well, since we can’t trust the translator,
how do we fit it in so we can run it in an untrusted
capacity. So one neat thing I didn’t
mention is that the actual ahead-of-time translator
is implemented inside of Native Client. So we have the LLVM compiler
running inside the Native Client sandbox to produce code
that then we run inside the Native Client sandbox. AUDIENCE: Thanks. NICK BRAY: Standard technique
for presentations is wait a few seconds. Usually someone gets
uncomfortable, stands up, and ask another question. If that doesn’t work, then you
say, OK, thanks for coming.
