Section 3 established the core flip: instead of writing rules, you show the machine examples and let it find the pattern. But "examples" covers a lot of ground. Sometimes those examples come with the answers already attached. Sometimes they don't. And sometimes there are no examples at all — just a score the machine gets for trying.
Those three situations produce three distinct styles of machine learning, and telling them apart is worth the effort, because each one shows up in completely different corners of the AI you use every day. Here's the map.
Start with the most common style by far. Supervised learning is what you'd do if you were studying for a vocabulary test with flashcards. On the front, a word. On the back, the definition. You look at the front, guess the answer, then flip the card to check. Get it wrong, and you adjust. Do this a few thousand times and the words start to stick.
That's supervised learning in one image. The machine gets a pile of examples where every example is already labeled with the right answer. A photo, and the label "cat." An email, and the label "spam." A house with its square footage, its neighborhood, its number of bedrooms — and the label, the price it actually sold for. The machine makes a guess, checks it against the real answer, and nudges itself a little closer each time.
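For readers who like to see the loop spelled out, here is a minimal sketch of "guess, check, nudge" in Python. It assumes a deliberately tiny setup: one input (house size), one label (sale price), and a model that is a single number `w`. The data and the names `examples`, `train`, and `learning_rate` are invented for illustration; real systems use far more inputs and far more examples.

```python
# Hypothetical labeled examples: (size in thousands of sq ft, sale price in $100k).
# The numbers are invented and follow price = 2 * size exactly.
examples = [(1.0, 2.0), (1.5, 3.0), (2.0, 4.0), (3.0, 6.0)]

def train(examples, steps=200, learning_rate=0.05):
    """Learn one weight w so that w * size comes close to the labeled price."""
    w = 0.0  # start with no idea at all
    for _ in range(steps):
        for size, price in examples:
            guess = w * size                   # make a guess
            error = guess - price              # check it against the real answer
            w -= learning_rate * error * size  # nudge w to shrink the error
    return w

w = train(examples)
print(round(w, 3))
```

Each pass through the flashcards moves `w` a little closer to the value that makes the guesses match the labels. Nothing in the code states the pricing rule; it is recovered from the labeled examples alone.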
The word "supervised" is doing real work here. A human had to supply those answers. Someone, somewhere, labeled thousands of emails as spam or not-spam. Someone tagged the photos. That labeling is often the most expensive, most tedious part of the whole operation, and it's why good labeled data is treated like treasure.
Supervised learning splits cleanly into two jobs, depending on the kind of answer you want back. The first is classification: sorting things into buckets. Spam or not spam. Cat, dog, or neither. Fraudulent transaction or normal one. The second is prediction of a number, which the field calls regression. How much will this house sell for? How many units will we ship next month? Same underlying idea, different shape of answer. One's a category, the other's a number.
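The "same idea, different shape of answer" point can be sketched in a few lines. The example below uses a stand-in technique, nearest-neighbor lookup (find the most similar labeled example and copy its answer), chosen only because it is short. The function name `nearest_answer` and both datasets are hypothetical.

```python
def nearest_answer(labeled_examples, new_value):
    """Return the answer attached to the labeled example closest to new_value."""
    closest = min(labeled_examples, key=lambda example: abs(example[0] - new_value))
    return closest[1]

# Classification: each label is a category.
# Invented data: (number of exclamation marks in the email, label).
emails = [(0, "not spam"), (1, "not spam"), (6, "spam"), (9, "spam")]

# Regression: each label is a number.
# Invented data: (square footage, sale price in dollars).
houses = [(900, 150_000), (1400, 210_000), (2000, 320_000)]

email_label = nearest_answer(emails, 7)     # a bucket comes back
house_price = nearest_answer(houses, 1500)  # a number comes back
print(email_label, house_price)
```

The function is identical in both calls. Only the kind of label in the training examples changes, and with it the kind of answer that comes out.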
Supervised learning is the workhorse behind an enormous amount of the AI you already touch. The spam filter quietly protecting your inbox. The system at your bank that flags a transaction in Bangkok when you've never left Ohio. The model that reads a medical scan and points a radiologist toward a suspicious patch. All of it learned from examples where humans had already supplied the answers.
Now take the answers away.
Picture the junk drawer in your kitchen. Batteries, rubber bands, takeout menus, a spare key, three pens, a screwdriver. Nobody handed you a labeling scheme. But if someone asked you to tidy it, you'd start making piles anyway — writing stuff here, electrical stuff there, the random keys in a corner. You'd find the groupings yourself, without anyone telling you the categories in advance.
That's unsupervised learning. The machine gets a heap of data with no labels, no answer key, no "here's what this is." Its job is to find the structure hiding inside — to notice that some things cluster together and others don't.
The classic business use is customer segmentation. A company feeds in everything it knows about its shoppers — what they buy, when, how often, how much — and the machine sorts them into natural groups. Maybe it surfaces a cluster of late-night bargain hunters, another of weekend big-spenders, another that only ever shows up during a sale. Nobody defined those groups ahead of time. The machine found them in the shape of the data, and now the marketing team can talk to each group differently.
The crucial difference is right there. In supervised learning, you know the categories and you're teaching the machine to apply them. In unsupervised learning, you don't know the categories yet — you're asking the machine to reveal them. It's less "is this spam?" and more "what kinds of things are even in here?"
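To make the junk-drawer idea concrete, here is a small sketch of one common clustering method, k-means, written in plain Python. Everything specific is an assumption for illustration: the nine invented customers, the two measurements per customer, the choice of three groups, and the simple rule used to pick starting points (begin with the first customer, then keep choosing whoever is farthest from the centers picked so far).

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pick_starting_centers(points, k):
    """Start from the first point, then repeatedly add the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(farthest)
    return centers

def k_means(points, k, rounds=10):
    centers = pick_starting_centers(points, k)
    for _ in range(rounds):
        # Step 1: put every point in the pile whose center is closest.
        labels = [min(range(k), key=lambda i: squared_distance(p, centers[i])) for p in points]
        # Step 2: move each center to the middle of its pile.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centers[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers

# Invented customers: (visits per month, average spend in tens of dollars).
# Note that there are no labels anywhere in this data.
customers = [
    (2.0, 1.0), (10.0, 2.0), (3.0, 9.0),
    (3.0, 1.5), (11.0, 2.5), (2.0, 10.0),
    (2.5, 1.2), (9.0, 1.8), (4.0, 9.5),
]

labels, centers = k_means(customers, k=3)
print(labels)
```

The group numbers that come out carry no meaning by themselves. It is still a person's job to look at each pile and decide that one is "occasional small shoppers" and another is "rare big spenders".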
Quick check before moving on. A streaming service wants to group viewers by their tastes without deciding the genres in advance. Supervised or unsupervised? Take a second.
It's unsupervised. There's no answer key — no one's labeled each viewer as "rom-com person" or "documentary person." The machine has to find those clusters on its own from the viewing data. That's the tell: no labels, looking for hidden structure, unsupervised.
The third style throws out the pile of examples entirely.
Think about training a dog. You don't hand the dog a textbook on sitting. You wait for it to do something in the right direction, and when it does, it gets a treat. Wrong move, no treat. Over many repetitions, the dog works out which behaviors lead to the good stuff and does more of those. It learned from consequences, not from examples.
That's reinforcement learning. There's no labeled dataset. Instead there's an agent — the learner — turned loose in some environment, free to take actions, and a reward signal that tells it how well things are going. The machine tries something, gets a score, and gradually shapes its behavior toward whatever earns the highest reward.
This is the style behind the headline-grabbing systems that mastered games. The program that beat the world's best players at Go didn't study a manual. It played millions of games, mostly against itself, getting rewarded for winning, and slowly discovered strategies no human had thought to teach it. The same approach trains robots to walk, to grip an unfamiliar object, to recover when they stumble — try, get a score, adjust, repeat.
Reinforcement learning fits a particular kind of problem: one where you can't easily write down the right answer for every situation, but you can recognize whether things are going well. You can't list every chess position and its correct move. You can tell who won. That feedback is enough.
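The try, score, adjust loop can be sketched with the dog from earlier. This is a minimal illustration, not how game-playing systems are built: the agent has three possible actions, the environment hands out a treat with some probability, and the agent keeps a running average of how each action has paid off. The treat probabilities, the names `TREAT_CHANCE` and `train_agent`, and the 10 percent exploration rate are all assumptions made up for the example.

```python
import random

# Hypothetical environment: the chance that each action earns a treat (reward of 1).
# The agent never sees these numbers; it only sees the rewards.
TREAT_CHANCE = {"bark": 0.1, "roll over": 0.3, "sit": 0.9}

def train_agent(steps=2000, explore_rate=0.1, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    actions = list(TREAT_CHANCE)
    value = {a: 0.0 for a in actions}  # running average reward per action
    tries = {a: 0 for a in actions}
    for _ in range(steps):
        if rng.random() < explore_rate:
            action = rng.choice(actions)          # occasionally try something at random
        else:
            action = max(actions, key=value.get)  # otherwise do what has worked best so far
        reward = 1.0 if rng.random() < TREAT_CHANCE[action] else 0.0
        tries[action] += 1
        # Shift the running average toward the reward just received.
        value[action] += (reward - value[action]) / tries[action]
    return value

value = train_agent()
best_action = max(value, key=value.get)
print(best_action)
```

No one tells the agent which action is correct. It only ever receives a reward or no reward, and over many tries its estimates come to favor the action that pays off most often.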
So far this sounds like three tidy boxes. In practice, the most powerful modern systems reach for more than one at a time.
The chatbots covered later in this course are the clearest example. A large language model first learns by predicting text, soaking up oceans of writing in a way that needs no human labeling, because the next word in the text is itself the answer. Then it gets refined using human feedback: people rate answers, and the machine adjusts toward the responses people prefer. That refinement step is reinforcement learning, layered on top. Two styles, stacked, each doing the part it's best at. A whole later section is devoted to exactly how that works.
The takeaway isn't that you should memorize crisp boundaries. It's that "machine learning" was never one thing. The style depends entirely on what you've got: answers attached, no answers, or no examples at all.
So, three piles. Supervised learning: flashcards with the answers on the back — spam filters, fraud detection, medical scans. Unsupervised learning: sorting the junk drawer with no labels — finding hidden groups like customer segments. Reinforcement learning: the dog and the treats — learning from rewards, the engine behind game-playing AI and robotics. Three different answers to one question: what does the machine have to learn from?