How the Internet Actually Works: Networks, Protocols, and the Web Under the Hood

Section 8 of 17

HTTP: The Language Websites Use to Communicate

We've got IP addresses that tell data where to go, packets that carry the data, TCP ensuring reliable delivery, and DNS translating names to addresses. Now comes the fun part: the actual language that browsers and web servers use to have a conversation. Enter HTTP, the HyperText Transfer Protocol.

HTTP is an application-layer protocol that browsers and web servers use to exchange everything on the web — HTML pages, JSON data, images, CSS files, JavaScript, video, you name it. Tim Berners-Lee dreamed this up between 1989 and 1991 as part of the World Wide Web project, and it's never really gone out of style. Here's what's remarkable: Berners-Lee didn't just invent HTTP. He created HTTP, HTML (the markup language), and URLs (the addressing scheme) all at once — a complete system for sharing scientific documents across networks. Each piece was equally critical. The protocol worked because it was deliberately simple: if you read the spec and could write basic socket code, you could build a server or a client. No magic required.

HTTP's Core Mental Model: Request and Response

HTTP is a client-server protocol. Your browser (or curl, or a Python script, or a mobile app, or a microservice) makes a request. A server gets it, processes it, sends back a response. That's the whole dance. No back-and-forth negotiating, no persistent stateful connection at the HTTP layer — just request and response.

Here's the weird part: HTTP is stateless. Every single request stands completely alone. The server forgets you the moment it sends a response. Make 100 requests to the same server, and as far as HTTP cares, you're a stranger each time. (This creates a whole problem we'll untangle later.)

Think of it like the old postal system: you write a letter (request), address it, mail it, wait for a reply (response), read it, then forget about it. The mail carrier doesn't remember your correspondence or keep notes about previous letters. If you need the other person to remember something from last time, you've got to mention it again in your new letter — or set up some kind of filing system (sessions and cookies) to remember on your behalf.

Anatomy of an HTTP Request

An HTTP/1.1 request is plain text. Boring, but easy to read and debug. Here's roughly what your browser sends when you visit http://example.com/ (real browsers add a few more headers):

GET / HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive

Let's break this down:

Request line: GET / HTTP/1.1

GET — the HTTP method (or verb), telling the server what you want to do
/ — the path you're requesting (everything after the domain name in the URL)
HTTP/1.1 — the protocol version you're speaking

Headers: Everything else is HTTP headers — key-value pairs that give the server context. You'll see these constantly:

Host: Required in HTTP/1.1 — tells the server which domain you want (essential when one server hosts dozens of domains)
User-Agent: Describes the client software making the request
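Accept: Which content types the client can handle. The q values are relative weights between 0 and 1, and a type without one defaults to q=1, so q=0.9 means "acceptable, but I'd rather have the types listed without a weight"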
Accept-Encoding: What compression methods the client understands (gzip and deflate are standard; br is Brotli, the newer kid on the block)
Connection: keep-alive: "Hey, don't close this TCP connection when you're done — I'll probably send more requests"
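To make the structure concrete, here is a minimal sketch of how a server might split those raw bytes into a request line and headers. The function name `parse_request` is made up for illustration, and the sketch skips things a real parser must handle (repeated headers, size limits, malformed input).

```python
def parse_request(raw: bytes) -> dict:
    """Split a raw HTTP/1.1 request into request line, headers, and body."""
    # A blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # The first line is the request line: method, target, version.
    method, target, version = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize to lowercase.
        headers[name.strip().lower()] = value.strip()

    return {"method": method, "target": target, "version": version,
            "headers": headers, "body": body}


raw = (
    b"GET / HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Accept-Encoding: gzip, deflate, br\r\n"
    b"Connection: keep-alive\r\n"
    b"\r\n"
)
request = parse_request(raw)
print(request["method"], request["target"], request["version"])
print(request["headers"]["host"])
```

On the wire, each line ends with a carriage return plus line feed (`\r\n`), which is why the sketch splits on that sequence rather than on a bare newline.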

Working Through a Real Example

Let's say you're logging into a website. You fill out a form on example.com with your username and password, hit submit, and it POSTs to /api/login. Here's what the browser actually sends:

POST /api/login HTTP/1.1
Host: example.com
Content-Type: application/json
Content-Length: 47
Connection: keep-alive

{"username":"[email protected]","password":"secret123"}

Notice what changed:

The method is now POST — you're doing something, not just asking to see something
Content-Type tells the server "the body I'm sending is JSON" (not HTML form data or XML)
Content-Length tells the server how many bytes are in the body — it knows exactly when the message ends
There's a blank line separating headers from the body
Then comes the actual data
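The pieces listed above can be assembled by hand. This is a rough sketch, not how a browser is implemented: `build_request` is a hypothetical helper, and the username `alice` is a placeholder value chosen for the example. The point is that Content-Length is computed from the encoded body, in bytes.

```python
import json


def build_request(method: str, path: str, host: str, payload: dict) -> bytes:
    """Assemble a raw HTTP/1.1 request with a JSON body."""
    # Encode first: Content-Length counts bytes, not characters.
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    head = (
        f"{method} {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # blank line: end of headers, start of body
    )
    return head.encode("ascii") + body


# Placeholder credentials, for illustration only.
message = build_request("POST", "/api/login", "example.com",
                        {"username": "alice", "password": "secret123"})
print(message.decode("utf-8"))
```

If the declared length and the real body size disagree, the server either waits for bytes that never arrive or treats leftover bytes as the start of the next request, so it is safer to compute the value than to type it.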

The server processes that and responds:

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 76
Set-Cookie: sessionid=abc123def456; Path=/; HttpOnly
Connection: keep-alive

{"status":"success","user_id":42,"user":"Alice Smith","auth_token":"xyz789"}

The 200 OK means success. The Set-Cookie header is the server saying "save this cookie on your computer, I'll need it next time." And notice the server also sends Connection: keep-alive — confirming that the TCP connection is staying open, so your browser can send more requests without a new handshake.
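Here is a small sketch of what a client does with that Set-Cookie header, using Python's standard `http.cookies` module. The cookie value is the one from the response above; how a real browser stores and scopes cookies is considerably more involved.

```python
from http.cookies import SimpleCookie

# The value of the Set-Cookie header from the response above.
set_cookie_value = "sessionid=abc123def456; Path=/; HttpOnly"

jar = SimpleCookie()
jar.load(set_cookie_value)

morsel = jar["sessionid"]
print(morsel.value)     # the session identifier
print(morsel["path"])   # the Path attribute

# On the next request, the client sends only name=value pairs back.
cookie_header = "; ".join(f"{name}={m.value}" for name, m in jar.items())
print("Cookie:", cookie_header)
```

Attributes such as Path and HttpOnly are instructions to the client and are not echoed back. Only `sessionid=abc123def456` travels in the Cookie header of later requests, which is how the server recognizes you despite HTTP being stateless.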

HTTP Methods: The Verbs

HTTP defines a set of request methods (think of them as verbs) that describe what you want to do:

GET: Fetch a resource. Safe to use repeatedly — no side effects. "Give me the current state of this resource." Examples: loading a web page, fetching a user's profile, downloading an image.
POST: Submit data to create or process something. Not safe to repeat — calling it twice creates two things. "Do something with this data, probably creating a new resource." Examples: submitting a form, uploading a file, posting a comment.
PUT: Replace an entire resource. Safe to repeat (the formal term is idempotent): calling it twice leaves the server in the same state as calling it once. "Overwrite this resource completely with what I'm sending." Example: PUT /api/users/42 with a complete user object replaces user 42 entirely.
PATCH: Update just part of a resource. "Change only these specific fields, leave the rest alone." Example: PATCH /api/users/42 with {"email":"[email protected]"} changes only the email.
DELETE: Remove a resource. "Get rid of this." Example: DELETE /api/users/42 deletes user 42.
HEAD: Like GET, but give me only the headers, skip the body. Useful for checking if something exists, checking its size or when it was last modified, without downloading everything. Developers use this for health checks or verifying a resource before fetching it.
OPTIONS: Tell me what methods are allowed on this resource. Used by browsers for CORS (Cross-Origin Resource Sharing) preflight requests: before sending certain cross-origin requests (for example, a POST with a JSON body to another domain), the browser asks "is this allowed?"

REST and Semantics: Why Methods Matter

For Django developers building APIs, these methods are your vocabulary. A properly designed REST API uses methods intentionally:

Operation	HTTP Method + Path	Safe to Repeat?	Changes Things?
List users	`GET /users/`	Yes	No
Create user	`POST /users/`	No	Yes (creates resource)
Retrieve user 42	`GET /users/42/`	Yes	No
Replace user 42	`PUT /users/42/`	Yes	Yes (overwrites)
Update user 42's email	`PATCH /users/42/`	Usually (PATCH isn't guaranteed idempotent)	Yes (modifies)
Delete user 42	`DELETE /users/42/`	Yes	Yes (deletes)
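The "safe to repeat" column is easier to see in code. Below is a toy in-memory sketch, not Django code: the `UserStore` class and its `handle` method are invented for this example, and they stand in for whatever views and database you would really use.

```python
import itertools


class UserStore:
    """Toy in-memory resource store illustrating method semantics."""

    def __init__(self):
        self.users = {}
        self._ids = itertools.count(1)

    def handle(self, method, path, body=None):
        parts = [p for p in path.split("/") if p]  # "/users/42/" -> ["users", "42"]
        if parts == ["users"]:
            if method == "GET":
                return 200, list(self.users.values())
            if method == "POST":
                # Every POST creates a brand-new resource: not repeatable.
                user_id = next(self._ids)
                self.users[user_id] = dict(body, id=user_id)
                return 201, self.users[user_id]
            return 405, None
        if len(parts) == 2 and parts[0] == "users" and parts[1].isdigit():
            user_id = int(parts[1])
            if method == "PUT":
                # PUT overwrites: the end state is the same however often it runs.
                created = user_id not in self.users
                self.users[user_id] = dict(body, id=user_id)
                return (201 if created else 200), self.users[user_id]
            if user_id not in self.users:
                return 404, None
            if method == "GET":
                return 200, self.users[user_id]
            if method == "PATCH":
                self.users[user_id].update(body)
                return 200, self.users[user_id]
            if method == "DELETE":
                del self.users[user_id]
                return 204, None
            return 405, None
        return 404, None


store = UserStore()
store.handle("POST", "/users/", {"name": "Alice"})
store.handle("POST", "/users/", {"name": "Alice"})   # creates a second user
store.handle("PUT", "/users/42/", {"name": "Bob"})
store.handle("PUT", "/users/42/", {"name": "Bob"})   # same end state as one PUT
print(len(store.users))
```

Note that a repeated DELETE returns 404 the second time in this sketch. That is still "safe to repeat" in the sense that matters: the server ends up in the same state, even though the status code differs.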

Here's a common mistake: using GET for actions that change state, like GET /users/42/delete/. This breaks HTTP semantics. GET should never have side effects. Why? Because browsers, crawlers, and caches assume they can prefetch GET requests, or retry them if the network hiccups. If a GET deletes something, you've created a nightmare of unreliability.

Anatomy of an HTTP Response

When the server sends back a response, it looks like this:

HTTP/1.1 200 OK
Date: Mon, 27 Jan 2025 12:00:00 GMT
Server: nginx/1.25.3
Content-Type: text/html; charset=UTF-8
Content-Length: 1256
Cache-Control: max-age=3600
Connection: keep-alive

<!DOCTYPE html>
<html>
<head><title>Example Domain</title></head>
...

Status line: HTTP/1.1 200 OK

Protocol version
Status code (200) — the three-digit result code
Reason phrase (OK) — human-readable explanation

Response headers: Metadata about what's coming:

Date: When the server sent this (helps clients spot clock problems)
Server: What software is running (mostly informational; some servers hide this for security)
Content-Type: What kind of thing is in the body (text/html, application/json, image/png, etc.)
Content-Length: Size of the body in bytes (the client knows exactly when the message ends)
Cache-Control: How long clients and proxies can keep this response cached (we'll dig into this later)

Empty line: A blank line marks where headers end and body begins.

Body: The actual content — HTML, JSON, image data, whatever was requested.
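The same structure can be taken apart in a few lines. This is a simplified sketch with an invented `parse_response` helper: it assumes the whole response is already in memory and that Content-Length is present, and it keeps only the last value if a header repeats.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.1 response into its four parts."""
    # The empty line marks where headers end and the body begins.
    head, _, rest = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: protocol version, status code, reason phrase.
    version, code, reason = status_line.split(" ", 2)

    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # Content-Length tells us exactly how many body bytes belong to this message.
    length = int(headers.get("content-length", len(rest)))
    return version, int(code), reason, headers, rest[:length]


body = b"<h1>Hello</h1>"
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    + f"Content-Length: {len(body)}\r\n".encode("ascii")
    + b"\r\n"
    + body
)
version, code, reason, headers, parsed_body = parse_response(raw)
print(code, reason, headers["content-type"])
```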

HTTP Status Codes: The Server's Report Card

HTTP status codes are three-digit numbers, organized into five categories. You'll see these constantly as a developer. Understanding them well makes debugging much easier.

1xx — Informational: The server is working on it. You rarely see these, though 101 Switching Protocols shows up when upgrading to WebSocket (a protocol that keeps a persistent bidirectional connection open, unlike HTTP's request-response model).

2xx — Success: It worked.

200 OK — Everything happened as expected.
201 Created — A new resource was created (common response to a successful POST). Best practice: put the new resource's URL in a Location header, or include its ID in the body, so the client knows where to find it.
204 No Content — Success, but there's no body to send (common for DELETE, or for POST/PATCH operations where you don't need the result echoed back).
206 Partial Content — The server is sending part of a resource, used for range requests (resuming downloads, streaming video).

3xx — Redirection: The resource is elsewhere. Follow me.

301 Moved Permanently — This resource has permanently moved to a new URL. Browsers cache this aggressively and go straight to the new URL next time, and search engines update their index to point at it, so only use 301 when the move really is permanent.
302 Found (or 307 Temporary Redirect) — Temporary redirect. The old URL still exists, but this request should go somewhere else. Don't cache this; try the original URL next time. The difference: 307 guarantees the client repeats the request with the same method and body, while many clients turn a redirected POST into a GET after a 302.
304 Not Modified — The resource hasn't changed since you last checked (usually because you sent an If-None-Match header with an ETag). Use your cached copy. This is a performance superpower — no body gets transmitted.
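The 304 flow can be sketched in a few lines. This is an illustration under assumptions, not how any particular server does it: `make_etag` and `conditional_get` are hypothetical helpers, and deriving the ETag from a SHA-256 hash of the body is just one common choice.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive an ETag from the content (one possible scheme among many)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, request_headers: dict):
    """Return (status, headers, body), honoring If-None-Match."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # The client's cached copy is current: send headers only, no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Length": str(len(body))}, body


page = b"<html>version 1</html>"

# First visit: no cached copy, so the full body comes back.
status, headers, body = conditional_get(page, {})
print(status, len(body))

# Second visit: the client echoes the ETag it saved.
status2, _, body2 = conditional_get(page, {"If-None-Match": headers["ETag"]})
print(status2, len(body2))
```

Once the page content changes, the hash changes with it, the ETags stop matching, and the client gets a fresh 200 with the new body.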

4xx — Client Errors (you did something wrong):

400 Bad Request — Your request was malformed (invalid JSON, missing required fields, etc.).
401 Unauthorized — You need to authenticate (log in). The response usually includes a WWW-Authenticate header explaining how.
403 Forbidden — You're authenticated but not allowed to do this (permissions issue).
404 Not Found — The resource doesn't exist. (The most famous status code — ask any developer and they know 404.)
405 Method Not Allowed — You used the wrong method (tried to POST somewhere that only accepts GET).
409 Conflict — Your request conflicts with the current state (trying to create a user with an email that already exists).
422 Unprocessable Entity — Your request was syntactically fine but semantically invalid (the JSON parses, but the email field doesn't contain a valid email address).
429 Too Many Requests — You've hit a rate limit. The server is defending itself. The response usually includes a Retry-After header telling you how long to wait.

5xx — Server Errors (something broke):

500 Internal Server Error — The server messed up. An unhandled exception, a database connection died, something went wrong in the code. These are bad — they mean you've got a bug.
502 Bad Gateway — The reverse proxy (like nginx) tried to reach the upstream server and couldn't connect or got an invalid response (Django crashed or isn't running).
503 Service Unavailable — The server is overloaded or in maintenance. Often includes a Retry-After header.
504 Gateway Timeout — The upstream server took forever to respond, so the proxy gave up.
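Python's standard library ships these codes as the `http.HTTPStatus` enum, which is handy when you want names instead of magic numbers. The `describe` helper and the `CATEGORIES` table below are just an illustration of the five families, keyed by the first digit.

```python
from http import HTTPStatus

# The first digit of a status code identifies its family.
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return the code, its standard reason phrase, and its family."""
    status = HTTPStatus(code)
    return f"{code} {status.phrase} ({CATEGORIES[code // 100]})"


for code in (201, 304, 404, 429, 502):
    print(describe(code))
```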

Why Status Codes Matter in APIs

When you're building an API, the status code you choose matters. For example:

POST /users/ with valid data and you create the user? Respond with 201 Created, not 200 OK. The client knows a resource was created and can grab its URL from the response.
DELETE /posts/42/ succeeds? Respond with 204 No Content, not 200 OK. There's no body, and the client should expect that.
Rate limit kicks in? Respond with 429 Too Many Requests, not 503. The client can tell the difference between "I asked for too much" and "the service is broken."
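Precise status codes pay off on the client side, where code has to decide what to do next. The sketch below shows one possible retry policy. The function name `retry_delay` and the one-second default are assumptions, and it only understands the seconds form of Retry-After, not the HTTP-date form.

```python
def retry_delay(status: int, headers: dict, default: float = 1.0):
    """Return seconds to wait before retrying, or None if a retry won't help."""
    if status in (429, 503):
        # The server may say exactly how long to back off.
        value = headers.get("Retry-After", "")
        return float(value) if value.isdigit() else default
    if status in (502, 504):
        # Proxy-level failures are often transient.
        return default
    # Other 4xx errors mean the request itself is wrong; resending it unchanged is pointless.
    return None


print(retry_delay(429, {"Retry-After": "30"}))
print(retry_delay(503, {}))
print(retry_delay(400, {}))
```

If the API answered every failure with 500, none of these branches could exist and the client would be left guessing.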

Status codes are HTTP semantics. Respecting them makes your API clearer and debugging easier.

The Evolution of HTTP: 1.0, 1.1, 2, and 3

HTTP has changed dramatically since its invention, and understanding that history explains why modern web performance works the way it does.

HTTP/0.9 (1991)

The original. Shockingly minimal: no headers, no status codes, only GET. A complete request-response looked like:

GET /page.html

<!DOCTYPE html>
...

You sent one line, the server sent back raw HTML, then closed the connection. There was no way to:

Request CSS, images, or JavaScript (none of that existed yet)
Report errors (there were no status codes, so a missing file produced, at best, an HTML page describing the problem)
Negotiate content types or encoding
Identify yourself or the server

It worked for sharing hyperlinked documents, which was the whole point. But it couldn't scale.

HTTP/1.0 (1996)

Added headers, status codes, and multiple methods. A typical transaction:

GET /page.html HTTP/1.0
User-Agent: Netscape 2.0

HTTP/1.0 200 OK
Content-Type: text/html
Content-Length: 1234

<html>...

But here's the killer: one TCP connection per request. Every request meant:

A full TCP three-way handshake (SYN, SYN-ACK, ACK) — roughly 100ms to a distant server
Sending the HTTP request
Receiving the response
Closing the connection

Fetch a web page with 10 resources (HTML, 5 images, 2 CSS, 2 JS)? That's 10 separate TCP connections. At the 200ms round-trip times common on 1990s connections, that's roughly 2 seconds spent on handshakes alone, before counting any actual data transfer. The web felt glacially slow.
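The arithmetic is worth spelling out. This back-of-the-envelope model assumes each handshake costs one round trip and that connections are opened one after another; the function name is made up, and real browsers overlap some of this work by opening a few connections in parallel.

```python
def handshake_overhead_ms(resources: int, rtt_ms: float, persistent: bool) -> float:
    """Time spent purely on TCP handshakes under a simple sequential model."""
    # One connection per resource, or a single reused connection.
    connections = 1 if persistent else resources
    return connections * rtt_ms


per_request = handshake_overhead_ms(resources=10, rtt_ms=200, persistent=False)
reused = handshake_overhead_ms(resources=10, rtt_ms=200, persistent=True)
print(per_request, reused)
```

Under this model, ten resources cost 2000 ms of handshakes when every request opens its own connection, against 200 ms if a single connection is reused.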

HTTP/1.1 (1997, ruled the web for 20 years)

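Made persistent connections the default. In HTTP/1.0, keeping a connection open was an opt-in extension requested with Connection: keep-alive; in HTTP/1.1 the TCP connection stays open after a request unless one side sends Connection: close, so the next request reuses it: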

GET /page.html HTTP/1.1
Host: example.com
Connection: keep-alive

HTTP/1.1 200 OK
Content-Length: 1234

<html>...
GET /style.css HTTP/1.1
Host: example.com

HTTP/1.1 200 OK
Content-Length: 456

body { ...

No new TCP handshake for the second request — transformative performance improvement. Also added:

The Host header, so one server could host many domains
Richer caching and partial transfers: Cache-Control, ETag-based revalidation, and range requests (the machinery behind 304 Not Modified and 206 Partial Content)
Chunked transfer encoding, so servers didn't need to know the content length upfront
Pipelining (send multiple requests before reading responses), though browsers largely avoided it because head-of-line blocking made it risky
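Chunked transfer encoding is simple enough to sketch by hand: each chunk is prefixed with its size in hexadecimal, and a zero-length chunk marks the end. The two helper functions below are illustrative names, and the decoder ignores optional trailers and assumes well-formed input.

```python
def encode_chunked(chunks) -> bytes:
    """Encode an iterable of byte strings as a chunked HTTP/1.1 body."""
    out = b""
    for chunk in chunks:
        if chunk:  # skip empty pieces: a zero-length chunk would end the body
            out += f"{len(chunk):x}\r\n".encode("ascii") + chunk + b"\r\n"
    return out + b"0\r\n\r\n"  # terminating chunk


def decode_chunked(data: bytes) -> bytes:
    """Reassemble the original body from a chunked encoding."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line is hexadecimal; ignore any ";extension" after it.
        size = int(data[pos:line_end].split(b";")[0], 16)
        if size == 0:
            return body
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data


encoded = encode_chunked([b"Hello, ", b"world!"])
print(encoded)
print(decode_chunked(encoded))
```

This is what lets a server start streaming a response while it is still generating the rest, sending `Transfer-Encoding: chunked` in place of Content-Length.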

HTTP/1.1 dominated for roughly 20 years, and it's still everywhere: plenty of sites, APIs, and proxy-to-application hops run on 1.1 today.

HTTP/2 (2015)

Designed for a web that had exploded in complexity. Modern pages have 20–100 resources, and HTTP/1.1's limitations were killing performance. Here's what broke:

Problem: Requesting tons of resources over one connection is slow. You keep one connection open to avoid handshake overhead. But HTTP/1.1 is fundamentally sequential — send a request, get a response, send the next request. If one response is slow, everything behind it stalls (head-of-line blocking).

Solution: Multiplexing. HTTP/2 sends multiple requests and responses over a single TCP connection in parallel. Send requests 1, 2, and 3 without waiting for response 1. The server responds whenever it's ready: maybe response 2 arrives first (it's small), then response 1, then response 3. Each request/response pair travels on its own numbered stream, and the client uses the stream ID carried in every frame to reassemble the responses correctly. One open connection, many requests flying in parallel. Huge win.

Timeline comparison:

HTTP/1.1 over one connection:
req1 ----wait for resp1---- req2 ----wait for resp2---- req3 ----wait for resp3---- [done]

HTTP/2 (multiplexed):
req1, req2, req3 ----[responses arrive in any order]---- [done, faster]
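Multiplexing works because every frame on the connection is tagged with the ID of the stream it belongs to. The toy sketch below reduces a frame to a `(stream_id, payload)` tuple, which is a big simplification of the real binary frame format, and shows how interleaved frames are sorted back into separate responses.

```python
from collections import defaultdict


def reassemble(frames):
    """Group interleaved frame payloads by stream ID, preserving order within a stream."""
    streams = defaultdict(bytes)
    for stream_id, payload in frames:
        streams[stream_id] += payload
    return dict(streams)


# Three responses interleaved on one connection, in arrival order.
# (Client-initiated HTTP/2 streams use odd IDs.)
frames = [
    (1, b"<html>"), (3, b"body{"), (5, b"console."),
    (3, b"}"), (1, b"</html>"), (5, b"log(1)"),
]
responses = reassemble(frames)
for stream_id, data in sorted(responses.items()):
    print(stream_id, data)
```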

Other wins:

Header compression using HPACK: HTTP headers are now compressed, cutting overhead for large cookies or repeated headers
Server push: Servers can send resources proactively ("I know you're fetching index.html; here's style.css too"). Fell out of favor in practice.
Binary framing: HTTP/2 uses binary (not plaintext), faster to parse and harder to mess up

In practice, HTTP/2 means HTTPS. The specification allows unencrypted HTTP/2, but browsers only speak HTTP/2 over TLS.

HTTP/3 (2022)

Rebuilt on top of QUIC (originally short for Quick UDP Internet Connections) instead of TCP. A fundamental redesign of the transport layer.

TCP's problem: TCP guarantees order. If packet 3 gets lost, TCP doesn't deliver packets 4, 5, 6 to the application until packet 3 is retransmitted and arrives. In HTTP/2 over TCP, a single lost packet stalls all parallel streams.

QUIC's answer: QUIC runs on UDP and implements its own reliable delivery, but with independent streams. A lost packet only delays the stream it belongs to. Stream 1 keeps receiving data while Stream 3 waits for that lost packet.
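A tiny simulation makes the difference visible. It is a deliberately simplified model: packets are `(sequence, stream, data)` tuples, exactly one packet is lost, and the `deliverable` function (an invented name) reports which packets the application can read while the lost one is being retransmitted.

```python
def deliverable(packets, lost_seq, per_stream):
    """Return sequence numbers the application can read before the retransmit arrives.

    per_stream=False models TCP: one ordered byte stream for everything.
    per_stream=True models QUIC: ordering is enforced per stream only.
    """
    lost_stream = next(stream for seq, stream, _ in packets if seq == lost_seq)
    ready = []
    for seq, stream, _ in packets:
        if seq == lost_seq:
            continue  # this packet never arrived
        if per_stream:
            blocked = stream == lost_stream and seq > lost_seq
        else:
            blocked = seq > lost_seq
        if not blocked:
            ready.append(seq)
    return ready


packets = [
    (1, "A", b"a1"), (2, "B", b"b1"), (3, "C", b"c1"),
    (4, "A", b"a2"), (5, "B", b"b2"), (6, "C", b"c2"),
]
tcp_like = deliverable(packets, lost_seq=3, per_stream=False)
quic_like = deliverable(packets, lost_seq=3, per_stream=True)
print(tcp_like)
print(quic_like)
```

With packet 3 lost, the TCP-style model hands over only packets 1 and 2, while the per-stream model also delivers 4 and 5, because only stream C has to wait.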

Other HTTP/3 superpowers:

0-RTT connection resumption: Connected to this server before? Send data on the first packet, no handshake. Huge for latency-sensitive apps.
TLS 1.3 built-in: Encryption is mandatory and integrated, not bolted on
Better mobile performance: QUIC handles connection migration elegantly. A QUIC connection is identified by a connection ID rather than by IP address and port, so when you switch from WiFi to cellular the same connection carries on. TCP requires a whole new connection.

HTTP/3 is newer and less deployed than HTTP/2, but adoption is accelerating. Major CDNs and cloud platforms support it now.

HTTP protocol evolution at a glance:

1991, HTTP/0.9: Single request/response; no headers or status codes
1996, HTTP/1.0: Added headers; one connection per request
1997, HTTP/1.1: Persistent connections; reduced handshake overhead
2015, HTTP/2: Multiplexing and header compression; one connection, many parallel requests
2022, HTTP/3: QUIC-based; independent streams; better mobile performance and 0-RTT resumption

HTTP/2 and HTTP/3 for Django Developers

Your Django code doesn't care which HTTP version the browser uses. The server or proxy facing the browser (nginx, Apache, a CDN) negotiates that transparently, and typically talks plain HTTP/1.1 to an application server like Gunicorn behind it. You write views that return responses; the framework handles the rest.

But understanding HTTP/2 pays dividends. It explains why:

A CDN between you and clients (Cloudflare, CloudFront) can dramatically improve performance — they speak HTTP/2 with browsers, multiplex requests, and cache aggressively
Domain sharding (split content across cdn1.example.com, cdn2.example.com) was a performance trick in the HTTP/1.1 days but backfires in HTTP/2 (multiple connections slow it down)
Modern bundlers like webpack combine JavaScript and CSS into few files — this strategy matters less in HTTP/2 (many parallel requests are cheap) but still helps

Most Django deployments run over HTTP/1.1 or HTTP/2, and you rarely need to think about the version. But when debugging performance or designing an API, this knowledge is invaluable.
