What a Computer Is
May 16, 2026·37 min read·beginner
Before we can talk about pipelines, caches, or instruction sets, we need a clear picture of what a computer actually is. The word is used loosely in everyday speech to mean almost any electronic device with a screen, but in the engineering sense it has a precise meaning. A computer is a physical machine that carries out a sequence of well-defined operations on data, where both the operations and the data are represented as patterns of bits stored inside the machine itself. Every idea developed in the rest of this book — registers, memory hierarchies, branch prediction, virtual memory — is in some way an answer to a question that arises out of this single definition.
This chapter sets up the vocabulary and the mental model. It is intentionally slow. Many of the words used here — architecture, organization, firmware, stored program — are used with overlapping meanings in casual writing, but they refer to genuinely distinct concepts in the discipline. Getting them straight now will save a great deal of confusion later.
01. The Purpose of Computation
At its core, computation is the mechanical transformation of information according to fixed rules. Given some input, a computer produces an output by following a procedure that does not require human judgment at any step. The procedure is the algorithm; the machine that carries it out is the computer; and the act of running the procedure on the machine is execution.
It is worth stepping back and asking why we build such machines at all. Three motivations have driven computing since its earliest days, and they remain the same today.
The first is speed. A human being can add two ten-digit numbers in perhaps ten seconds with careful work. A modern processor can perform roughly ten billion such additions in the same amount of time. Many problems in science, engineering, and commerce — weather prediction, payroll, simulation of physical systems, training machine-learning models — are not difficult in principle but require so many operations that no human or team of humans could finish them in a useful amount of time.
The second is accuracy. A correctly designed digital circuit, given the same inputs, produces the same outputs every time. It does not get tired, distracted, or bored. For tasks like banking, navigation, or controlling an aircraft, this consistency matters more than raw speed.
The third, and most subtle, is automation. Once a procedure has been encoded as a program, the machine can carry it out without further human involvement. This means computation can be embedded inside other systems — engines, telephones, medical devices, factories — where a person could not realistically sit and operate it by hand.
Formally, we can describe a computation as a function
$$f : X \to Y$$
mapping inputs from a set $X$ to outputs in a set $Y$. The job of the computer is to evaluate $f$ on a given input. Whether $f$ represents adding two numbers, sorting a list, decoding a video stream, or simulating a galaxy, the mathematical picture is the same. What changes from problem to problem is the algorithm chosen to compute $f$ and the resources (time, memory, energy) that the algorithm consumes.
This view has an important consequence. A computer does not understand meaning. It does not know that the number 98.6 stands for a body temperature, or that a particular byte represents the letter "A". It manipulates bit patterns according to rules. The meaning lives entirely in the mind of the programmer and the user. Much of computer architecture is the careful business of arranging hardware so that those bit patterns can be moved, combined, and stored quickly enough to be useful, while preserving the illusion that the machine is operating on the meaningful objects we care about.
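To make the point about meaning concrete, the following minimal sketch reads one fixed 32-bit pattern in three different ways. The particular byte values are chosen for illustration only; they happen to encode 98.6 when read as an IEEE 754 single-precision number.

```python
import struct

# One fixed 32-bit pattern, written as four bytes (illustrative values).
raw = bytes([0x42, 0xC5, 0x33, 0x33])

# The same bits under three different interpretations.
as_unsigned = int.from_bytes(raw, byteorder="big")  # an unsigned integer
as_float = struct.unpack(">f", raw)[0]              # an IEEE 754 single-precision float
as_text = raw.decode("latin-1")                     # four 8-bit characters

print(hex(as_unsigned))
print(round(as_float, 1))
print(repr(as_text))
```

Nothing in the four bytes says which reading is the right one. That choice is made by the program that consumes them, which is exactly the sense in which the meaning lives outside the machine.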
02. A Brief History of Computing
The machine on your desk is the latest member of a family whose ancestors stretch back several centuries. Knowing a little of that history helps explain why modern computers look the way they do, and why certain ideas — like the stored program, or the use of binary — are taken for granted today.
The earliest computers were not machines at all but human beings, usually clerks or astronomers, who carried out long arithmetic procedures by hand for navigation tables, ballistics tables, and census work. The word computer literally meant one who computes. Mechanical aids gradually appeared to make this work less tedious: Pascal's adding machine in the 1640s, Leibniz's stepped-drum calculator in the 1670s, and most ambitiously Charles Babbage's Analytical Engine in the 1830s, a fully mechanical design that already contained — at least on paper — a separation between a mill (arithmetic unit), a store (memory), and a control mechanism driven by punched cards. Babbage's machine was never finished, but Ada Lovelace's notes on it described what is now recognized as the first published algorithm intended for a machine, and the first hint that such a device might process more than just numbers.
The modern era opens in the 1930s and 1940s with a wave of electromechanical and electronic machines built under the pressure of the Second World War. Konrad Zuse's Z3 in Germany used electromechanical relays and was the first programmable, fully automatic digital computer. The Harvard Mark I, built at IBM, used relays as well. The Colossus machines at Bletchley Park used vacuum tubes and were specialized for codebreaking. ENIAC, completed in the United States in 1945, was the first large-scale electronic general-purpose computer, but it was programmed by physically rewiring patch panels — a job that could take weeks.
It was against this backdrop that the stored-program idea, discussed in Section 07 below, was articulated. The first machines actually to implement it were the Manchester Baby in 1948 and the EDSAC at Cambridge in 1949, followed shortly afterward by the EDVAC, for which the original report was written. From that point onward, every commercially successful general-purpose computer has been a stored-program machine.
The decades that followed are often described as a sequence of generations, distinguished by the underlying device technology used to build the logic. The first generation used vacuum tubes; machines were room-sized, fragile, and consumed tens of kilowatts. The second generation, in the late 1950s, replaced tubes with discrete transistors, shrinking machines by an order of magnitude and improving reliability dramatically. The third generation in the 1960s introduced integrated circuits, with several transistors fabricated on a single piece of silicon. The fourth generation, beginning in the 1970s, introduced the microprocessor: an entire CPU on one chip, starting with the Intel 4004 in 1971. The fifth and present era is one of very-large-scale integration (VLSI) and system-on-chip design, where billions of transistors, multiple processor cores, caches, memory controllers, graphics units, and accelerators all coexist on a single die a few square centimeters in area.
The ideas that drive computer architecture have changed surprisingly little across these generations. The technology beneath them has changed enormously. A useful working assumption when reading this book is that whenever an architectural idea seems extravagant (a deep cache hierarchy, a branch predictor, an out-of-order pipeline), it is almost always a response to a mismatch between two technologies that improved at different rates: typically, processor logic that got faster quickly and memory that got faster slowly.
03. Theoretical Foundations
Before the first electronic computer was built, mathematicians had already worked out, in remarkable detail, what computation actually is. The two key figures are Alan Turing and Alonzo Church, both writing in 1936.
Turing's contribution was the Turing machine, an imaginary device consisting of an infinite tape divided into cells, a read–write head that can move along the tape, and a small set of rules that say, given the current state and the symbol under the head, what symbol to write, which direction to move, and what state to enter next. Despite its almost trivial appearance, the Turing machine is powerful enough to compute anything that any modern computer can compute. It is the minimal model of mechanical computation.
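The description above translates almost line for line into a short simulator. The sketch below is illustrative rather than canonical: the function name, the rule-table format, the state names "start", "carry", and "halt", and the blank symbol "_" are all assumptions made for this example. The infinite tape is approximated by a dictionary that treats unwritten cells as blank, and a step limit guards against machines that never halt.

```python
def run_turing_machine(rules, tape, state="start", blank="_", max_steps=10_000):
    """Run a single-tape Turing machine and return the final tape contents.

    rules maps (state, symbol) -> (symbol_to_write, move, next_state),
    where move is -1 (left) or +1 (right). The machine stops when it
    enters the state "halt".
    """
    cells = dict(enumerate(tape))  # sparse tape: unwritten cells read as blank
    head = 0
    steps = 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("step limit reached without halting")
        symbol = cells.get(head, blank)
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += move
        steps += 1
    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)


# Example rule table: add one to a binary number written on the tape.
INCREMENT = {
    ("start", "0"): ("0", +1, "start"),  # scan right to the end of the number
    ("start", "1"): ("1", +1, "start"),
    ("start", "_"): ("_", -1, "carry"),  # step back onto the last digit
    ("carry", "1"): ("0", -1, "carry"),  # 1 + 1 = 0, carry continues left
    ("carry", "0"): ("1", -1, "halt"),   # 0 + 1 = 1, done
    ("carry", "_"): ("1", -1, "halt"),   # ran off the left edge: new leading 1
}

print(run_turing_machine(INCREMENT, "1011"))  # prints 1100
```

The simulator itself never changes; only the rule table does. Swapping in a different table makes the same function behave as a different machine.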
Church independently arrived at a different formalism, the lambda calculus, with the same expressive power. The two models, together with a third, general recursive functions, were soon shown to be equivalent. The conjecture that they capture every reasonable notion of what is effectively calculable is the Church–Turing thesis. It is a thesis rather than a theorem because it is a claim about a fuzzy, intuitive concept; but every formal model of computation proposed in the ninety years since has turned out to be either equivalent to them or weaker.
Two consequences of this work shape the rest of the book.
First, Turing showed that there exists a universal machine: a single Turing machine that, given the description of any other Turing machine on its tape, will simulate that machine's behaviour. The universal machine is the theoretical ancestor of every general-purpose computer. When you install a new application, you are exploiting the same idea — a fixed piece of hardware reading a description of some other behaviour and carrying it out.
Second, Turing also showed that some problems are undecidable: there are well-defined questions for which no algorithm, however clever, can give the right answer for every input. The most famous is the halting problem: deciding, in general, whether a given program will eventually stop or run forever. This places a hard ceiling on what any machine of any future technology can do. Computer architecture lives entirely below this ceiling, but it is worth knowing the ceiling is there.
A practical corollary is that all general-purpose computers are, in a precise sense, equivalent in what they can compute. They differ only in how fast, how cheaply, how reliably, and with how much energy they can do it. Almost the entire discipline of computer architecture is concerned with those four words.
04. Analog and Digital Computers
Almost every machine discussed in this book is digital: it represents information as discrete symbols, almost always the two values 0 and 1. This was not historically inevitable. For much of the twentieth century, analog computers were a serious alternative.
An analog computer represents a quantity by a continuous physical variable — typically a voltage, a current, the angle of a shaft, or the level of a fluid. Computation proceeds by exploiting physical laws: an operational amplifier integrates a signal because integration is what its capacitor naturally does; a gear train multiplies because of its ratio. Analog machines were used heavily for fire-control systems on warships, for flight simulators, and for solving differential equations in engineering long after the war.
Digital computers won decisively, for several reasons that are worth listing because they bear on every chapter that follows.
- Noise tolerance. Any real signal carries some noise. In an analog system, that noise accumulates with every operation and is impossible to undo. In a digital system, as long as the noise is small enough that a 0 still looks more like a 0 than a 1, the original value can be recovered exactly. Each gate cleans up its input.
- Reproducibility. Running the same digital program twice gives bit-identical results. Running an analog computation twice gives results that agree only to within the tolerance of the components.
- Storage. Digital information can be stored indefinitely without degradation by periodically refreshing it or by writing it to non-volatile media. Analog values stored on capacitors or magnetic tape drift over time.
- Manufacturability. Digital circuits can be built from a small number of standardized building blocks, replicated billions of times. Analog circuits typically require careful, individual tuning.
- Programmability. A digital machine can be redirected to a new task by loading new bits. An analog machine usually has to be physically rebuilt.
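The noise-tolerance argument in the list above can be illustrated with a toy simulation. This is a sketch under simplifying assumptions: noise is modelled as a bounded uniform disturbance (real noise is not bounded), each "stage" simply adds noise, and the function names and parameter values are invented for the example.

```python
import random


def analog_chain(value, stages, noise, rng):
    """Pass a continuous value through noisy stages; the errors accumulate."""
    for _ in range(stages):
        value += rng.uniform(-noise, noise)
    return value


def digital_chain(bit, stages, noise, rng):
    """Pass a bit through noisy stages, re-thresholding at every stage."""
    level = float(bit)
    for _ in range(stages):
        level += rng.uniform(-noise, noise)
        level = 1.0 if level > 0.5 else 0.0  # the gate restores a clean 0 or 1
    return int(level)


rng = random.Random(0)  # fixed seed so the run is repeatable
analog_out = analog_chain(1.0, stages=1000, noise=0.2, rng=rng)
digital_out = digital_chain(1, stages=1000, noise=0.2, rng=rng)

print(f"analog value after 1000 stages: {analog_out:.3f}")
print(f"digital bit after 1000 stages:  {digital_out}")
```

Because the assumed noise amplitude (0.2) is smaller than the distance to the threshold (0.5), the digital chain returns the original bit exactly, while the analog value wanders away from 1.0 by an amount that depends on the random draws.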
None of this means analog computation is dead. Specialized analog and mixed-signal circuits remain crucial wherever a computer must touch the real world: every microphone, sensor, radio, and motor controller has analog circuitry behind a digital interface. There is also active research in analog and neuromorphic accelerators for machine learning, where the energy advantage of letting physics do the arithmetic is large. But the central nervous system of a modern computer is digital, and that is what this book describes.
05. Hardware, Software, and Firmware
A working computer system has three layers of substance. They are physically and conceptually distinct, but they are designed to fit together so tightly that to a casual observer they appear to be one thing.
Hardware
Hardware is the physical machinery: the silicon dies, the printed circuit boards, the wires, the connectors, the power supply, the cooling fans. Hardware is what you can drop on your foot. It is built from transistors arranged into logic gates, gates arranged into functional blocks such as adders and registers, and blocks arranged into larger units such as processors and memory controllers. Once manufactured, hardware is essentially fixed. You cannot rewire a CPU after it leaves the factory; you can only choose which of its capabilities to use.
The fundamental property of hardware is that it operates in real, physical time. A transistor takes a measurable amount of time to switch. A signal takes a measurable amount of time to travel down a wire. These delays are not abstractions; they set hard limits on how fast a computer can operate, and a great deal of architectural cleverness is devoted to working around them.
Software
Software is the set of instructions and data that tells the hardware what to do. A program is software. So are the operating system, the web browser, the compiler, and the file you are reading. Software has no physical form of its own; it exists only as patterns of bits stored on some hardware medium: a disk, a memory chip, a network packet in flight. The same software can be copied from one machine to another and run identically, provided both machines understand the same instructions.
The decisive characteristic of software is that it is mutable. A program can be edited, recompiled, downloaded, patched, and replaced without changing any physical wiring. This is the source of the computer's enormous flexibility. The same piece of hardware can act as a word processor in the morning, a video player in the afternoon, and a chess engine in the evening, simply by loading different software into it.
Firmware
Firmware sits between hardware and software. It is software in the sense that it consists of instructions stored as bit patterns, but it lives very close to the hardware and is changed rarely if ever. The classic examples are the BIOS or UEFI code that runs when a PC is first powered on, the small program inside a hard drive controller, and the microcode inside a modern CPU that interprets complex instructions in terms of simpler internal operations.
Firmware exists for a practical reason. Some functions are too detailed or too device-specific to bake permanently into silicon, but too low-level and too rarely changed to ship as ordinary software. Putting them in non-volatile memory — a flash chip on the motherboard, a small ROM inside the processor — gives engineers a way to fix bugs and add features without redesigning the chip itself.
A useful way to picture the three layers is as a stack. Software at the top is general and changes frequently. Firmware in the middle is specialized and changes occasionally. Hardware at the bottom is fixed and changes only between product generations. Each layer hides the details of the one below it and offers a cleaner interface to the one above.
06. The Layers of Abstraction
The hardware–firmware–software split is a coarse division of substance. A finer and equally important division is by level of abstraction. A modern computing system is best understood as a tower of layers, each one offering a clean interface to the one above and resting on the one below. From bottom to top, a fairly standard list runs:
- Physics and devices. Electrons, electric fields, semiconductor junctions. The behaviour at this level is described by quantum mechanics and solid-state physics. Most of this book treats it as given, but it is worth remembering that when a transistor switches, what is actually happening is a change in carrier concentration in a piece of doped silicon.
- Circuits. Transistors connected into amplifiers, voltage references, current mirrors, sense amplifiers, and so on. This is the world of analog and mixed-signal design.
- Logic gates. Circuits arranged so that their output is, to a good approximation, a clean digital function of their input: AND, OR, NOT, XOR, and a handful of relatives. The behaviour above this level can usually be described entirely in terms of 0 and 1, with the messy continuous physics hidden underneath.
- Register-transfer level. Gates grouped into registers, multiplexers, adders, decoders, and the wires between them. This is where designs are typically described in a hardware description language such as Verilog or VHDL.
- Micro-architecture. Pipeline stages, caches, predictors, execution units, queues. Still hardware, but described in larger functional blocks.
- Instruction set architecture. The contract presented to software: instructions, registers, addressing modes, exceptions.
- Operating system. Manages processes, memory, files, devices, and provides system calls that look like a richer, safer machine to applications.
- Run-time libraries and language runtimes. The C standard library, the Java virtual machine, the Python interpreter, the garbage collector. These bridge a programming language's worldview to what the operating system actually offers.
- Applications. The programs the user actually runs.
Two features of this stack are worth dwelling on.
First, each layer is intended to be a clean abstraction: someone working at a given layer should be able to ignore the details below it. A web developer rarely thinks about cache associativity; a kernel developer rarely thinks about transistor threshold voltages. This abstraction is what makes the modern industry possible at all — no single person, and no single team, could hold all the layers in their head at once.
Second, the abstractions are imperfect. They leak. A web application that ignores cache behaviour can run an order of magnitude slower than one that respects it. A kernel that ignores branch prediction can be vulnerable to side-channel attacks. One of the recurring lessons of this book is that good engineers know which abstraction they are working at and know enough about the layer immediately below to recognize when its assumptions break down. Computer architecture as a discipline lives in the middle of this stack and is concerned, more than anything else, with the interfaces between adjacent layers.
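To see how one layer is built from the one below it, the following sketch assembles an adder (a register-transfer-level block) out of nothing but logic gates. Gates are modelled here as Python functions on the integers 0 and 1; the function names and the 8-bit default width are choices made for this example, not part of any particular hardware design.

```python
# Logic-gate layer: each gate is a function of bits (0 or 1).
def AND(a, b):
    return a & b


def OR(a, b):
    return a | b


def XOR(a, b):
    return a ^ b


def full_adder(a, b, carry_in):
    """Add three bits using only gates; return (sum_bit, carry_out)."""
    partial = XOR(a, b)
    total = XOR(partial, carry_in)
    carry_out = OR(AND(a, b), AND(partial, carry_in))
    return total, carry_out


def ripple_add(x, y, width=8):
    """Register-transfer layer: chain full adders into a width-bit adder."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry  # carry is 1 if the true sum did not fit in width bits


print(ripple_add(23, 42))    # prints (65, 0)
print(ripple_add(200, 100))  # prints (44, 1)
```

A caller of `ripple_add` can think purely in terms of numbers and ignore the gates, which is the abstraction working as intended. The second call shows the abstraction leaking: the fixed width of the underlying hardware surfaces as an overflow that ordinary arithmetic would never produce.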
07. The Stored-Program Idea
Of all the ideas in computer architecture, the single most important is the stored-program concept. It is so fundamental, and so familiar, that it is easy to miss how strange it once seemed.
In the earliest electronic computers of the 1940s, the program was not data. It was a physical configuration of the machine. To change what the computer did, an engineer had to rearrange plug-board cables, set banks of switches, or even rewire portions of the chassis. Loading a new problem could take days. The data being processed lived in a separate place — on punched cards, paper tape, or a small electronic memory — and was the only thing the machine could read and modify on the fly.
The stored-program idea, articulated most clearly in a 1945 report by John von Neumann describing the EDVAC machine (and anticipated by others, including Alan Turing and the team behind the Manchester Baby), is simple but radical:
Both the program and the data should be stored in the same memory, in the same format, and the machine should be able to read either one as it runs.
Once you accept this idea, several remarkable consequences follow.
A program can be loaded into memory just like data. You no longer have to rewire the machine to run a new application; you simply copy a new sequence of bits into memory and tell the processor to begin executing them. Software, in the modern sense, becomes possible.
A program can read another program. This is exactly what a compiler does: it takes the text of a high-level program as input data, processes it, and produces a different program as output. Without the stored-program model, compilers, assemblers, and interpreters would all be far harder to build.
A program can read or even modify itself. Whether this is wise is a separate question — self-modifying code is notoriously difficult to debug — but the capability falls out naturally from the design. Operating systems exploit a tame version of it every time they load a program from disk into memory and then jump into it.
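A high-level analogue of these consequences can be shown in a few lines. In the sketch below the "program" is Python source text rather than machine instructions, which is an assumption made for readability; the names `source`, `modified`, and `cube` are invented for the example. The text is first treated as plain data, then handed to the interpreter to be executed.

```python
# A program, held as ordinary data (a string).
source = "def square(n):\n    return n * n\n"

# Because it is data, another program can inspect and transform it.
modified = source.replace("n * n", "n * n * n").replace("square", "cube")

# The transformed data can then be loaded and executed as a program.
namespace = {}
exec(compile(modified, "<generated>", "exec"), namespace)
cube = namespace["cube"]

print(modified)
print(cube(3))  # prints 27
```

A compiler does the same thing at a much larger scale: it reads one program as input data and writes another as output.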
Almost every machine you will encounter today is a stored-program computer, sometimes called a von Neumann machine in honor of that early report. The processor repeatedly performs a small loop:
1. Read the instruction at the address held in the program counter.
2. Decode that instruction to figure out what to do.
3. Execute the operation, possibly reading or writing data in memory.
4. Update the program counter so it points to the next instruction.
5. Repeat.
This loop, called the fetch–decode–execute cycle, is the heartbeat of nearly every general-purpose computer. We will return to it in detail in Chapter 8. For now, the point is that the program counter is just another register holding bits, and the instructions themselves are just more bits in memory. There is nothing physically distinguishing an instruction from a number; the meaning is given entirely by how the processor uses the bits. A word fetched through the program counter is treated as an instruction, and a word read or written by an executing instruction is treated as data.
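The loop can be sketched as a toy stored-program machine. Everything about the instruction format here is hypothetical and chosen for readability: each memory word is a decimal integer, an instruction is encoded as `opcode * 100 + address`, and there is a single accumulator register. No real ISA is being described.

```python
# Hypothetical opcodes for a toy accumulator machine.
HALT, LOAD, ADD, STORE = 0, 1, 2, 3


def run(memory, max_steps=1000):
    """Execute the program stored in `memory`, starting at address 0."""
    pc = 0   # program counter
    acc = 0  # accumulator
    for _ in range(max_steps):
        word = memory[pc]                 # 1. fetch
        opcode, addr = divmod(word, 100)  # 2. decode
        if opcode == HALT:                # 3. execute
            return memory
        elif opcode == LOAD:
            acc = memory[addr]
        elif opcode == ADD:
            acc += memory[addr]
        elif opcode == STORE:
            memory[addr] = acc
        else:
            raise ValueError(f"unknown opcode {opcode} at address {pc}")
        pc += 1                           # 4. update the program counter
    raise RuntimeError("step limit reached without HALT")


# Program and data share one memory.
memory = [
    110,  # address 0: LOAD 10
    211,  # address 1: ADD 11
    312,  # address 2: STORE 12
    0,    # address 3: HALT
    0, 0, 0, 0, 0, 0,  # addresses 4-9: unused
    2,    # address 10: data
    3,    # address 11: data
    0,    # address 12: result goes here
]

run(memory)
print(memory[12])  # prints 5
```

The word at address 1 is the number 211. Fetched through the program counter it means "ADD 11", but an instruction such as `LOAD 1` would read the very same word as the plain number 211.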
08. Architecture, Organization, and Micro-Architecture
When practitioners talk about how a computer is built, they use three closely related but distinct words: architecture, organization, and micro-architecture. Casual writing often blurs them. In this book they will be kept apart, because the distinction is genuinely useful.
Architecture
The architecture of a computer, in the strict sense, is the part visible to the programmer. It is the contract between the hardware and the software. Specifically, the architecture defines:
- the set of instructions the processor can execute, including their meanings and encodings;
- the registers that the programmer can name;
- the rules for how memory is addressed;
- the way exceptions, interrupts, and system calls behave;
- the data types the machine recognizes and their sizes.
Together these elements form the Instruction Set Architecture, abbreviated ISA. Examples include x86-64, ARM AArch64, and RISC-V RV64. An ISA is, in essence, a specification document. It tells you what a correct implementation must do, but says little about how to do it.
A useful analogy is a written language. The ISA is the grammar and vocabulary. Any speaker who follows the rules can be understood by any listener who follows the same rules, regardless of their voice, accent, or speaking speed.
Organization
The organization of a computer describes the larger functional pieces and how they are connected. At this level we talk about caches, memory controllers, buses, I/O subsystems, and the high-level structure of the processor pipeline. Organization answers questions like: Is there one cache or two? How wide is the bus to main memory? How are devices connected to the CPU? Two machines with the same architecture can have very different organizations, and they will run the same programs at very different speeds and power levels because of it.
Micro-Architecture
The micro-architecture is the most detailed level. It describes exactly how the organization is implemented in hardware: the specific pipeline stages, the branch predictor design, the number of execution units, the policies used by the cache, the way instructions are decoded into internal micro-operations, and countless other details. Two processors that implement the same ISA — say, an Intel Core and an AMD Ryzen, both running x86-64 — have very different micro-architectures. They run the same software but achieve their performance through different internal mechanisms, and they have different strengths and weaknesses on different workloads.
A short summary may help fix the idea:
| Level | Question it answers | Visible to |
|---|---|---|
| Architecture (ISA) | What does the machine do? | Programmer, compiler |
| Organization | What major blocks does it have, and how are they wired? | System designer |
| Micro-architecture | How is each block implemented? | Hardware designer |
The same ISA can support a tiny embedded core that fits in a few square millimeters and a giant server processor that consumes hundreds of watts. The architecture is the same; the organization and micro-architecture are utterly different.
09. Classes of Modern Computers
The word computer covers an extraordinary range of devices, from a microcontroller no larger than a grain of rice to a warehouse-sized supercomputer drawing tens of megawatts. Although they are all stored-program machines and all built from broadly similar components, the design priorities differ enormously. It is useful to recognize the main classes early on, because examples in later chapters will be drawn from across the spectrum.
- Microcontrollers. A complete computer on a single chip, including a small CPU, a few kilobytes of memory, and assorted peripherals such as timers, analog-to-digital converters, and serial ports. Used in appliances, toys, sensors, automotive electronics, and almost any object that contains some logic. Sold in tens of billions of units per year. Cost is measured in cents and power in milliwatts.
- Embedded systems. Larger than a microcontroller, often built around a small application processor running a real-time operating system or a stripped-down Linux. Found in routers, printers, set-top boxes, industrial controllers, and the body computers of cars. Often optimized for low power, predictable timing, and long product lifetimes.
- Mobile devices. Smartphones, tablets, and wearables. Built around a system-on-chip that integrates several CPU cores, a graphics processor, a neural-network accelerator, image-signal processors, modems, and security blocks. Designed under tight power and thermal limits because they run on batteries and have no fans.
- Personal computers. Laptops and desktops. Less power-constrained than mobile devices, more performance-oriented, and traditionally programmable by the end user. The ISA is almost always x86-64 or, increasingly, ARM AArch64.
- Workstations and servers. Machines optimized for sustained throughput, large memory, many I/O channels, and high reliability. They typically run twenty-four hours a day, seven days a week, in air-conditioned rooms, and are designed to recover gracefully from component failures.
- Datacenter and cloud systems. Vast collections of servers operated as a single resource. Individual machine performance matters less than aggregate throughput per dollar and per watt. Architectural decisions are increasingly driven by what works well at the scale of tens of thousands of servers behaving as one.
- Supercomputers. The largest scientific machines, used for weather, climate, fluid dynamics, materials science, and nuclear simulation. They emphasize floating-point throughput, fast interconnects, and parallel scalability across millions of cores.
- Accelerators. Not full computers in the classical sense, but specialized engines — GPUs, tensor processors, network processors, video codecs — that handle particular workloads orders of magnitude more efficiently than a general-purpose CPU. They are increasingly the interesting part of a modern system, and Chapter 56 is devoted to them.
The interesting fact about this list is how much technology is shared across it. The same underlying transistor, the same broad organizational ideas, and often the very same ISA appear at every level. What changes is the balance of priorities: cost, power, performance, area, reliability, security, predictability. The next section makes that balance explicit.
10. Design Constraints and Tradeoffs
A computer architect never optimizes a single number in isolation. Every interesting design decision involves trading one good against another. The constraints that recur throughout this book are worth naming up front.
- Performance. How quickly the machine completes useful work. This is itself not a single number — it can mean latency (time for one task), throughput (tasks per unit time), responsiveness (time to react to an event), or scalability (how performance grows with added resources). Chapter 10 takes performance seriously; for now, it is enough to know that the word is ambiguous and that any benchmark measures only one slice of it.
- Power and energy. Power is the rate at which the machine draws energy; energy is the integral of power over time. Power sets the cooling requirement and the upper limit on clock speed. Energy sets battery life and the electricity bill. The two can be in tension: making a machine faster sometimes saves energy by letting it idle sooner (the so-called race-to-halt effect), but pushing the clock too high spends disproportionate power for marginal speed.
- Cost. The selling price has to cover the silicon area, the packaging, the testing, the cooling, the design effort amortized over the production run, and the supporting software ecosystem. A part that is twice as fast but five times more expensive will lose almost every market.
- Area. On a single die, every square millimeter spent on one feature is a square millimeter not spent on another. Caches, execution units, and accelerators all compete for space, and silicon area translates almost linearly into cost.
- Reliability. Modern chips contain billions of transistors, any of which can fail. Memory cells are perturbed by cosmic rays. Wires wear out by electromigration. Architectures must tolerate, detect, or recover from these events, and the bar is far higher in aircraft, medical, and automotive systems than in consumer electronics.
- Security. It is no longer acceptable to design a CPU as if every program running on it were trustworthy. Isolation between processes, between users, and between the operating system and the hardware itself has become a first-class architectural concern, and one whose subtleties are the subject of Chapter 51.
- Predictability. A real-time control system would rather have a guaranteed response within one millisecond than an average response of one microsecond with an occasional ten-millisecond stall. Many features that improve average performance — caches, branch predictors, dynamic frequency scaling — make worst-case timing harder to bound.
- Compatibility. A new processor that cannot run existing software faces a brutal market. Backward compatibility constrains nearly every commercially successful ISA and is the reason some surprisingly old features are still present in modern x86 chips.
- Time to market. A perfect design that ships two years late may be worth less than a merely good design that ships on time. Engineering effort and verification time are themselves constraints.
Different classes of computer weight these constraints very differently. A pacemaker prioritizes reliability and predictability over performance. A datacenter prioritizes performance per dollar and per watt. A smartphone prioritizes energy and area. A research supercomputer prioritizes peak floating-point throughput. The same architectural technique can be a brilliant idea in one context and a terrible one in another, which is why this book repeatedly returns to what is being optimized for whom.
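The tension between power and energy named in the list above can be made concrete with a little arithmetic. In the sketch below, every number is hypothetical and chosen only to show the shape of the tradeoff: a task must finish within a 10-second window, and the machine idles at 0.1 W once the task is done.

```python
def task_energy(active_power_w, active_time_s, idle_power_w, window_s):
    """Energy in joules over a fixed window: run the task, then idle."""
    if active_time_s > window_s:
        raise ValueError("task does not finish within the window")
    idle_time_s = window_s - active_time_s
    return active_power_w * active_time_s + idle_power_w * idle_time_s


# Three hypothetical operating points for the same task.
slow = task_energy(active_power_w=2.0, active_time_s=10.0, idle_power_w=0.1, window_s=10.0)
fast = task_energy(active_power_w=4.0, active_time_s=4.0, idle_power_w=0.1, window_s=10.0)
too_fast = task_energy(active_power_w=12.0, active_time_s=3.0, idle_power_w=0.1, window_s=10.0)

print(f"slow:     {slow:.1f} J")      # 20.0 J
print(f"fast:     {fast:.1f} J")      # 16.6 J
print(f"too fast: {too_fast:.1f} J")  # 36.7 J
```

With these assumed figures, the middle setting draws twice the power of the slow one but uses less energy, because it finishes early and idles. The fastest setting buys one more second at three times the power and ends up the most expensive of the three.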
11. Technology Scaling and the Engine of Progress
For most of the history of the integrated circuit, the dominant fact of life in the industry was that the number of transistors that could be economically placed on a single chip roughly doubled every two years. This observation, first made by Gordon Moore in 1965 and refined a decade later, is known as Moore's Law. It is not a law of nature but an industry roadmap that, for half a century, was self-fulfilling: companies invested on the assumption that it would hold, and the investment is part of what made it hold.
A companion observation, Dennard scaling, said that as transistors shrank, their voltage and current could be scaled down in proportion, so that power density — watts per square millimeter — stayed roughly constant. The combined effect was wonderful: each new process generation gave designers more transistors, switching faster, at the same total power. Single-thread performance roughly doubled every eighteen months for several decades on the back of these two trends, more or less for free as far as the architect was concerned.
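The claim that power density stays constant can be checked with the usual first-order model of switching power. The symbols here are assumptions introduced for this sketch and do not appear elsewhere in the chapter: $C$ is the capacitance a transistor switches, $V$ the supply voltage, $f$ the clock frequency, $\alpha$ the fraction of cycles in which the transistor switches, $A$ the area it occupies, and $\kappa > 1$ the linear shrink factor of one process generation.

```latex
\begin{align*}
P &= \alpha\, C\, V^{2} f
  && \text{(switching power of one transistor)} \\
C' &= \frac{C}{\kappa}, \quad V' = \frac{V}{\kappa}, \quad f' = \kappa f
  && \text{(after one generation of ideal scaling)} \\
P' &= \alpha\, \frac{C}{\kappa} \left(\frac{V}{\kappa}\right)^{2} \kappa f
    = \frac{P}{\kappa^{2}} \\
A' &= \frac{A}{\kappa^{2}}
  && \text{(area per transistor)} \\
\frac{P'}{A'} &= \frac{P/\kappa^{2}}{A/\kappa^{2}} = \frac{P}{A}
\end{align*}
```

Each transistor uses $1/\kappa^{2}$ of the power and $1/\kappa^{2}$ of the area, so a chip of the same size holds $\kappa^{2}$ times as many transistors, each running $\kappa$ times faster, within the same power budget.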
Dennard scaling broke down around 2005. Supply voltages stopped falling because transistor threshold voltages could not be lowered further without leakage current growing exponentially, and the power density of an actively switching chip began to rise faster than cooling could remove the heat. Moore's Law has continued, after a fashion, but each new process node is more expensive, gives smaller speed gains, and requires increasingly heroic engineering. The industry's response has been visible in every product since:
- the stagnation of single-thread clock speeds in the 3–5 GHz range;
- the move to multi-core designs, exploiting parallelism rather than higher frequency;
- the rise of heterogeneous designs that combine different kinds of cores and accelerators on one chip;
- the explosion of interest in domain-specific hardware, where giving up generality buys efficiency on a particular workload;
- and the architectural push, throughout this book, toward extracting more useful work per joule and per transistor rather than per second.
Much of what makes contemporary architecture interesting is precisely that the easy gains are gone. The free lunch that Dennard scaling provided is over, and the progress that continues has to be earned with cleverer designs.
12. Standards, Compatibility, and Ecosystems
A last point of context concerns the role of standards. A computer is rarely useful in isolation. It is part of an ecosystem of operating systems, compilers, libraries, peripherals, networks, and other computers, all of which assume certain behaviours. Most of these behaviours are written down in standards, ranging from the precise — the IEEE 754 standard for floating-point arithmetic, the PCI Express specification for I/O — to the de facto — the calling conventions used by particular operating systems on particular processors.
Three consequences are worth noting now.
First, ISAs are remarkably long-lived. The original Intel 8086 shipped in 1978, and modern x86-64 processors still execute, at least in principle, the instructions that ran on it. ARM's first chips appeared in 1985, and the ISA family has been continuously extended rather than replaced. The reason is the enormous installed base of software: an ISA is valuable only insofar as programs exist for it, and replacing it would invalidate that investment. New ISAs do appear — RISC-V is the most prominent recent example — but they succeed by capturing markets where the incumbent's grip is weak, not by frontal assault.
Second, ISAs split into open and proprietary designs. RISC-V can be implemented by anyone without paying licensing fees, and older architectures such as SPARC and Power (the latter through the OpenPOWER Foundation) have also been opened to outside implementers. ARM is proprietary but widely licensed. x86-64 is held by Intel and AMD under cross-licensing agreements. The choice between open and proprietary affects who can build chips, what features they can add, and who controls the long-term direction of the architecture.
Third, compatibility itself comes in degrees. Binary compatibility means the same executable runs without modification. Source compatibility means the same source code compiles and runs. Backward compatibility means new hardware runs old software; forward compatibility means old hardware can run, or at least gracefully tolerate, software written for newer hardware. Designing for the right kind of compatibility, and being explicit about which kinds are not offered, is a recurring architectural decision.
13. The Lifecycle of a Program on a Computer
To finish the chapter, let us trace what actually happens when you run a program. Imagine you have written a small C program that prints a greeting:
    #include <stdio.h>

    int main(void) {
        printf("Hello, architecture!\n");
        return 0;
    }
The journey from this text file to a flickering pixel pattern on your screen is long. It crosses every layer we have discussed and previews almost every part of this book.
Step 1: Source code
The program above is a plain text file, say hello.c. It is human-readable and means nothing to the hardware. The CPU has no notion of printf, of strings, or of int. Before anything can run, this text has to be translated into instructions the processor understands.
Step 2: Compilation
A compiler reads the source code and produces assembly code, which is a low-level textual representation of the machine's instructions. On an x86-64 machine, the body of main might look something like this:
    main:
        lea   rdi, [rip + .Lstr]   ; load address of string
        call  puts                 ; call the C library function
        xor   eax, eax             ; set return value to 0
        ret                        ; return to caller
    .Lstr:
        .asciz "Hello, architecture!"
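Each of the four lines under main corresponds to one instruction the CPU will eventually execute. The labels and the .asciz directive are not instructions; they tell the assembler where things are and what data to emit. Notice that the compiler has replaced printf with the simpler puts, which needs no format parsing and appends the newline itself, which is why the stored string no longer ends in \n. The compiler made many other decisions on the programmer's behalf: which registers to use, how to lay out the string in memory, how to call the standard library. In doing so it follows the calling convention, an agreement (covered in Chapter 14) about which registers hold arguments, which hold return values, and which must be preserved across calls.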
Step 3: Assembly and object files
The assembler takes the assembly code and turns it into a binary object file, typically with a .o extension. The object file contains the actual bit patterns of the machine instructions, plus tables describing which symbols (functions, variables) it defines and which it expects to find elsewhere. At this stage, the call to puts is not yet a real address — it is a placeholder that says, in effect, "please patch in the address of puts once you know it."
Step 4: Linking
The linker combines one or more object files with libraries to produce an executable. It resolves all the placeholders, assigns final addresses to functions and data, and produces a file the operating system knows how to load. The C standard library, which contains the implementation of puts, is linked in either statically (copied directly into the executable) or dynamically (referenced so it can be loaded at run time).
Step 5: Loading
When you run the program, the operating system's loader allocates memory for it, copies the executable's code and data into that memory, sets up a stack and a heap, initializes registers, and finally jumps to the program's entry point. This is the moment the stored-program idea earns its keep: the program, once just a file on disk, has become a sequence of bits in memory ready to be fetched by the CPU.
Step 6: Execution
The processor begins its fetch–decode–execute loop at the address the loader gave it. Each instruction passes through the pipeline. The bytes of the string "Hello, architecture!" are read out of memory and handed, by way of the C library, to a system call that asks the operating system to write them to standard output. The operating system, in turn, talks to a device driver, which talks to hardware, which eventually causes pixels to light up on the screen.
Step 7: Termination
When main returns, the C runtime calls the operating system's exit routine, which reclaims the memory the program was using, closes its files, and returns control to the shell or whatever program launched it. The bits that made up the running program are gone from memory, though the executable file on disk remains, ready to be loaded and run again.
It is worth pausing to appreciate how many layers the simple act of printing a line of text actually involves. The compiler and assembler depend on the ISA. The linker and loader depend on the operating system and on file formats. The execution depends on the processor's micro-architecture, on the memory hierarchy, on the I/O subsystem, and on the firmware that brought the machine up in the first place. This book is, in a sense, a tour of all of these layers, working downward from the program you write to the transistors that ultimately carry it out.
14. Why Study Computer Architecture
A reasonable question to ask, before settling into a long book on the subject, is why bother. Most programmers never design a chip. Compilers, operating systems, and runtimes already hide most of what the hardware does. Why should anyone working two or three layers above the metal care what happens beneath?
There are several honest answers.
Performance. The gap between code that respects the machine and code that ignores it is routinely a factor of ten and sometimes a factor of a hundred. The same algorithm, expressed in two ways that look equivalent in source, can hit cache or miss cache, vectorize or not vectorize, predict well or mispredict, with dramatic effects on running time and energy. Knowing what the hardware is doing turns these from mysteries into design choices.
Correctness. Modern hardware is concurrent in ways that are easy to forget. Memory orderings, atomic operations, cache coherence, and pipeline effects can produce program behaviours that no purely sequential reasoning predicts. A surprising fraction of subtle bugs in systems software are misunderstandings of what the underlying machine guarantees.
Security. Side-channel attacks like Spectre and Meltdown made it permanently clear that a processor's micro-architecture is part of its security boundary, not merely an implementation detail beneath it. Defending or attacking modern systems requires understanding what is happening below the ISA.
Judgment. Software design is full of decisions that, in the end, come down to what is cheap and what is expensive on real hardware. Whether to denormalize a database, whether to favour pointer-rich or array-rich data structures, whether to parallelize or vectorize, whether to compress data in memory — all of these have answers that depend on the machine. A practitioner who has internalized the basic costs makes those decisions almost automatically; one who has not, guesses.
Curiosity. Finally, a modern processor is one of the most intricate artifacts our civilization has ever produced. Several billion transistors, switching billions of times a second, cooperating to evaluate a function. It is genuinely worth understanding for its own sake, in the same way that the inside of a clock or a jet engine is worth understanding.
The rest of this book takes these motivations seriously. Each chapter, even the most theoretical, will try to connect the ideas it develops back to what they mean for someone writing or running real programs.
15. Summary
A computer is a machine that mechanically transforms information by executing stored instructions. Its substance is divided into hardware, firmware, and software, each with its own rate of change and its own role, and its conceptual structure is best seen as a tower of abstractions running from physics at the bottom to applications at the top. The discipline rests on a deep theoretical foundation (the Turing machine, the Church–Turing thesis, the universal machine) that tells us what computation is and where its limits lie, and on a long historical lineage that gradually moved computation out of human hands and into electronic ones. The decisive engineering choice was to build digital rather than analog machines, and the decisive conceptual one was the stored-program idea. To reason carefully about a stored-program machine, we distinguish between its architecture (what the programmer sees), its organization (what the major blocks are), and its micro-architecture (how those blocks are actually built). Real machines span an enormous range, from microcontrollers to supercomputers, and architects choose between them by trading performance, power, cost, area, reliability, security, predictability, and compatibility against one another, under the shadow of the slowing, but not yet ended, march of technology scaling. And every program, no matter how small, passes through a long chain of compilation, assembly, linking, and loading before its first instruction ever runs.
With this vocabulary in place, we can begin in Chapter 2 with the most basic question of all: how does a machine made of switches represent numbers, characters, and everything else we want it to compute on?