I’d been writing and running Linux binaries for over a decade before I actually understood how a dynamically-linked function call works at the instruction level. I’d vaguely assumed “the loader replaces the call with a pointer to the actual function at load time,” which is sort of true and sort of very wrong. The real story involves two tables — the PLT and the GOT — and some clever indirection, and it’s genuinely beautiful once you see it.

Let’s start with a simple C program:

// hello.c
#include <stdio.h>
int main(void) {
    puts("hello");
    return 0;
}

Compile: gcc hello.c -o hello. The puts function lives in libc.so.6, not in our binary. But our binary has a call instruction. Where does it jump to?

objdump -d hello shows us:

0000000000001149 <main>:
    1149: push %rbp
    114a: mov  %rsp,%rbp
    114d: lea  0xeb0(%rip),%rdi        # 2004 <_IO_stdin_used+0x4>
    1154: callq 1050 <puts@plt>
    1159: mov  $0x0,%eax
    115e: pop  %rbp
    115f: retq

callq 1050 <puts@plt>. The call goes to puts@plt. Not to puts. What’s at puts@plt?

0000000000001050 <puts@plt>:
    1050: jmp   *0x2fca(%rip)        # 4020 <puts@GLIBC_2.2.5>
    1056: pushq $0x0
    105b: jmp   1020 <.plt>

The PLT entry for puts is a jump through a pointer at address 0x4020 (in the GOT). Initially that pointer contains the address of the next instruction in the PLT entry itself (1056), which pushes 0x0 (the index of this symbol) and jumps to the PLT header at 1020.

The PLT header calls the dynamic linker’s resolution function. The resolver looks up the symbol, writes the actual address of puts into the GOT slot, and jumps to puts. On subsequent calls, the indirect jump at 1050 goes directly to puts because the GOT slot now points there.

This is “lazy binding.” Symbols are resolved on first call, not at load time. It saves startup time — if your binary only uses 10% of the symbols it imports, you don’t resolve the other 90%.

Let me draw the structure:

main
  |
  v
puts@plt:  jmp *[GOT: puts slot]
                       |
                       v
 (first call)  ----> .plt entry  ----> resolver  ----> libc's puts
                                           |
                                           updates GOT slot to point to puts
 (later calls) ---> puts directly (GOT slot now correct)

There are two tables:

  • PLT (Procedure Linkage Table): a table of stubs. Each stub is a small code sequence that jumps through a GOT entry.
  • GOT (Global Offset Table): a table of pointers. After resolution, each entry points to the actual function (or variable).

The genius of this design: the PLT entries are in .text (read-only, executable), while the GOT entries live in writable data (.got.plt). With lazy binding, the GOT slots have to stay writable for the whole run, since any call could be the first. With full RELRO, the dynamic linker instead resolves every slot at load time and then remaps the GOT read-only (the slots end up in .data.rel.ro); more on that below.

You can see the GOT with readelf -r hello:

Relocation section '.rela.plt' at offset 0x5d8 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000004020  000300000007 R_X86_64_JUMP_SLO 0000000000000000 puts@GLIBC_2.2.5 + 0

The relocation says “at offset 4020 (the GOT slot for puts), put a JUMP_SLOT pointer to symbol puts from GLIBC_2.2.5.”

A few things that fall out of understanding this:

Every dynamically-linked function call is a double indirection. The call jumps to the PLT, which jumps through the GOT. That’s a few extra cycles per call — usually negligible, but measurable in hot loops that call many library functions.

LD_PRELOAD works because symbol resolution searches loaded objects in order. When you LD_PRELOAD=mylib.so, the dynamic linker searches mylib.so for symbols BEFORE libc. If mylib.so defines puts, the GOT slot behind your puts@plt gets filled with the mylib.so version on first call. This is how function interposition works for debugging, profiling, mocking, etc.

-fno-plt in GCC/Clang emits the indirect call through the GOT directly at each call site, bypassing the PLT stub. Slightly faster for hot calls, less introspectable (profilers lose the convenient puts@plt symbols), and incompatible with lazy binding: every symbol must be resolved at load time, as with -z now.

RELRO (-Wl,-z,relro,-z,now) marks the GOT read-only after load. -z now forces eager binding at load time (all symbols resolved up front), so the GOT doesn’t need to be writable during execution. This is a security hardening measure: it closes off GOT overwrite attacks, which were a common exploit vector.

Full static linking bypasses all of this. A statically-linked binary has libc’s puts code embedded directly, and the call is a direct call instruction. No PLT, no GOT, no dynamic linker. Faster to start, larger on disk, no LD_PRELOAD possible. Go, for example, produces statically linked binaries by default when cgo is disabled, which is why the same Go binary runs on glibc systems and musl-based ones like Alpine.

strace -e trace=openat ./hello shows the dynamic linker opening the shared libraries at startup. ldd hello lists the same dependencies up front (under the hood it runs the binary with the dynamic linker in a trace-only mode).

An optimization I stumbled onto: in a service that called clock_gettime hundreds of thousands of times per second, enabling -fno-plt gave us about a 2% overall CPU reduction. Not huge, but meaningful in aggregate across the fleet. The indirection is cheap, but it’s not free.

The historical reason for PLT/GOT is that PIC (position-independent code) can’t hardcode the address of a function in another library — those libraries can be loaded anywhere. The GOT solves this by putting all external references in one table, whose address is known at load time via the binary’s base. The PLT is the code-side wrapper that makes it look like a direct call.

If you’re curious about more, man 8 ld.so is a surprisingly good read, and Ulrich Drepper’s old paper “How to Write Shared Libraries” is where I went for the detailed story. I don’t think about PLT/GOT often, but when a profiler like perf shows you symbols like puts@plt eating CPU, you want to understand what you’re looking at.

Related: Mach-O on macOS has similar concepts with different names (lazy symbol pointers instead of PLT, nonlazy symbol pointers instead of GOT, with dyld as the resolver).