Artificial Intelligence Explained: A Plain-Language Guide for Everyone
Section 3 of 17

History of AI from Chess Machines to ChatGPT

8 min listen Updated

The dream showed up early, and it showed up confident. In the summer of 1956, a small group of mathematicians and engineers gathered at Dartmouth College for a workshop. Their pitch, written the year before, claimed that a machine could be made to simulate "every aspect" of learning and intelligence — and that a serious crack at it might take a couple of months with the right people in a room. They invented the term "artificial intelligence" right there in the proposal. They figured the hard part was getting started.

Seventy years later, the room is still working on it. And that gap — between the breezy confidence of the 1950s and the slow, lurching reality of the decades that followed — is the story this whole episode is built around. Because the history of AI isn't a smooth climb. It's a story of a really good idea that took fifty years to find the three things it needed to actually work. Understand what those three things were, and you understand why ChatGPT showed up in your life when it did, and not in 1985.

Start with how the first builders thought the job would go. The early approach to AI was beautifully simple in concept: if you want a machine to be smart, you write down everything a smart person knows, as rules, and the machine follows them. This is the world of if-then-else. IBM's own write-up on machine learning makes the point with the humblest example imaginable — a thermostat. Give it a rule: if the temperature drops below sixty-eight, turn on the heat. That's technically a decision made by a machine without a human in the loop. It's the simplest rules-based AI there is.

Now scale that up. In the 1970s and 80s, the dream became what people called expert systems — giant rulebooks built by interviewing actual human experts and writing down their knowledge as thousands of rules. A medical one might take a patient's symptoms, their circumstances, their other conditions, and walk a branching tree of rules to suggest a diagnosis. For narrow, tidy problems, this worked. It genuinely worked.

Here's where it hit a wall, and the wall is worth understanding because it explains everything that came after. As IBM puts it, rules-based models get "increasingly brittle" as the task gets more complicated. The reason is that for most real-world problems, you can't write down every rule. Take spam email. Sit down and try to write the universal, airtight rule for what makes an email spam. You can't. Spammers change tactics. The exceptions have exceptions. The British computer scientists working on this in the early days kept discovering the same painful thing — every rule they wrote needed three more rules to patch the cases it broke. The rulebook didn't converge on intelligence. It exploded into chaos.

That brittleness wasn't a bug in someone's code. It was a signal that the whole strategy was backwards. And nowhere did it sting harder than in language — which brings us to a single afternoon in January 1954 that captures the entire long, slow road of AI in miniature.

On that day, IBM and Georgetown University ran a public demonstration. A machine translated more than sixty sentences from Russian into English. The press loved it. The researchers predicted that machine translation would be a solved problem within three to five years. It was hand-coded — a small vocabulary, six grammar rules, carefully chosen sentences. And it dazzled.

Three to five years came and went. So did thirty. Language turned out to be the cruelest possible test for rules-based AI, because language is nothing but exceptions. Think about a sentence the system would have to handle. "The spirit is willing, but the flesh is weak." There's an old, possibly apocryphal story that an early translator turned that into Russian and back and got something like "the vodka is good, but the meat is rotten." Whether or not that exact story is true, it nails the problem perfectly. Words don't carry fixed meanings you can look up. Meaning lives in context, in the relationships between words, in things no rulebook captures. The Georgetown demo promised the finish line was close. Language AI then spent roughly the next half-century stuck.

So if writing the rules by hand was a dead end, what was the alternative? This is the hinge of the entire story. The big idea wasn't to tell the computer what to do. It was to show it examples and let it figure out the rule itself.

Go back to that spam filter. The rules-based way needs a human to invent perfect criteria. The other way, the one IBM describes, needs only two things — a pile of sample emails and a method for learning. You show the system thousands of emails. For each one, it guesses spam or not spam. You tell it how wrong it was. It nudges its own internal settings to be a little less wrong. Then you do it again. And again, thousands of times, until it's accurate. Nobody ever wrote the rule for spam. The machine learned the pattern on its own.

This idea is older than people think. The term "machine learning" traces back to an IBM researcher named Arthur Samuel, who in a 1959 article described a computer that learned to play checkers. Samuel put the goal beautifully — he wanted a computer that could "learn to play a better game of checkers than can be played by the person who wrote the program." Sit with that for a second. A machine that ends up better than its own maker, not because the maker was a genius, but because the machine practiced. That's the whole shift, right there, in 1959.

So here's the obvious question. If the learning idea showed up in the 1950s, why did the world wait until the 2010s and 2020s for it to actually pay off? — The honest answer is that a good idea isn't enough. The idea needed three ingredients to come together, and for fifty years, at least one of them was always missing.

The first missing piece was data. Learning from examples only works if you have a mountain of examples. In 1959 there was no mountain. There was barely a hill. You couldn't show a machine a billion photos of cats because nobody had a billion digitized photos of anything. The internet changed that completely. Suddenly humanity was producing text, images, clicks, and video at a scale no one could have imagined — the raw material a learning machine eats.

The second missing piece was computing power, and this one has a wonderful twist. The technique at the heart of modern AI — neural networks, layers of simple math stacked on top of each other — wasn't new at all. As an MIT explainer from 2017 lays out, neural networks were first proposed back in 1944 by two University of Chicago researchers, Warren McCulloch and Walter Pitts. The idea has been going in and out of fashion for more than seventy years. The MIT neuroscientist Tomaso Poggio has a lovely way of describing this — he compares scientific ideas to flu viruses, each strain coming back roughly every twenty-five years once a fresh generation is ready to fall in love with it again. Neural networks got hyped in the 1940s, declared dead by MIT mathematicians around 1969, revived in the 1980s, faded again, and then came roaring back in the 2010s.

What finally broke the cycle wasn't a clever new theory. It was, of all things, video game hardware. As that same MIT piece notes, the recent comeback was fueled largely by the raw horsepower of graphics chips — GPUs, the processors built to render video games. It turns out the math you do to draw an explosion on screen is remarkably similar to the math you do to train a neural network. Gamers, in a sense, paid for the chips that trained the AI. The hardware that beat the final boss is the hardware that wrote your last email draft.

So: a learning idea from the 1950s, oceans of data from the internet, and cheap massive computing power from gaming chips. That's three of the ingredients. But there was a fourth thing — a better algorithm — and that's the piece that turned a decent technology into the explosion you're living through.

This is the part that trips people up, so let it land slowly. By the early 2010s, neural networks were already getting good at recognizing images. Show one enough labeled photos and it could tell a cat from a car. But language stayed stubborn. The systems of that era read a sentence word by word, in order, and by the time they reached the end they'd half-forgotten the beginning. They had no good way to handle the long-distance relationships that meaning depends on — the way the first word of a sentence can change what the last word means.

Then, in 2017, a team of researchers at Google published a paper with a deliberately cheeky title — "Attention Is All You Need." It introduced an architecture called the transformer. The deep details of how it works belong to a later episode, so hold the curiosity for now. What matters for the history is what it unlocked. The transformer let a model look at an entire passage all at once, instead of crawling through it one word at a time, and figure out which earlier words mattered most for predicting what came next. That made training dramatically faster, which meant you could train on far more data, which meant the models could get far bigger and far more capable.

And that's the spark. Massive data was ready. Cheap computing was ready. The learning idea had been ready since the 1950s. The 2017 transformer was the match. Within five years you had ChatGPT, image generators, coding assistants — the whole generative AI boom. Not because anyone suddenly got smarter than the Dartmouth group in 1956, but because the last ingredient finally arrived and met the other three.

Here's a debate worth knowing, because serious people genuinely disagree on it. One camp treats the transformer as the moment everything changed — the real birth of modern AI. Another camp, including plenty of veteran researchers, pushes back hard. Their argument is that the transformer wasn't a conceptual revolution at all. It was the same old neural network idea from 1944, finally given enough data and enough chips to show what it could always do. On this reading, there was no genius breakthrough — there was just scale, patiently waiting for the world to catch up. The evidence leans toward the second view. The math underneath has been around for decades. What changed wasn't the idea. It was everything around the idea.

So if a friend asks you why AI happened now and not forty years ago, here's the line to give them. The idea was never the bottleneck — the data, the chips, and one well-timed algorithm were. For half a century the dream sat there, fully formed and mostly useless, waiting for the world to build the conditions it needed to work.

Strip it down and three things carry the whole story. The first builders tried to write every rule by hand, and that brittleness killed them. The real breakthrough was flipping it — showing machines examples and letting them find the pattern. And that pattern-finding only exploded when data, cheap computing, and the 2017 transformer finally lined up at the same time.

Which leaves one question hanging, the one that powers everything from here on out. That spam filter, nudging its own settings until it stops being wrong — how does a machine actually learn from an example, step by step, without anyone telling it the rule? That's the engine under all of it, and it's where this goes next.