Getting Started with Primary Source Research

There's a particular feeling that comes from holding a document nobody has looked at in decades — a pension file, a ship manifest, a handwritten letter tucked into a court record — and realizing that the person who filled it out, signed it, or received it had absolutely no idea you would someday be reading it. It's equal parts privilege and puzzle. The document exists because a government needed to track something, an institution needed to protect itself, or a clerk somewhere was simply doing their job. It survived fire, flood, bureaucratic indifference, and the slow entropy of neglect. And now it's in front of you, and it contains something true. The question is what, exactly, and how to tell.

That question is what this course is about. Primary sources (original documents, records, and artifacts created at or near the time of the events they describe) are the raw material of history. But they don't arrive pre-interpreted, labeled with their own significance, or scrubbed clean of the distortions that shaped them. A census record tells you where a family lived in a given year, but it also reflects who the enumerator was, what categories the government had decided mattered, and who, systematically, got left out. An immigration manifest captures a name, but possibly not the name the passenger actually used: the clerk wrote what they heard, and what they heard was filtered through language, accent, and the chaos of a crowded dock. Finding these documents is one skill. Reading them is something else entirely. That means understanding why they exist, who made them, and what they can and cannot tell you. And that second skill is what separates a productive archive visit from hours spent going in circles.

Here's the thing about archives: they're not random. They follow logic. Understanding the systems behind record-keeping makes every single search more effective. Rather than just handing you a list of useful websites (though you'll get plenty of those), this course walks you through how archives actually work, why certain records survive and others disappear, and what bureaucratic thinking produced the documents you're hunting. Once you understand that logic, you stop searching blindly. You start searching intelligently. And you start finding things.

I'm imagining you as someone doing real research. Maybe you're a genealogist staring at a brick wall in the 1880s and you're tired of guessing. Maybe you're a journalist trying to reconstruct a company's regulatory history, and you know the answers exist somewhere — they're just not in the press releases. Maybe you're writing the history of your town, your neighborhood, your union local, and you've realized that the published books only scratch the surface. Whatever brought you here, this course assumes you're a capable adult who can handle nuance, tolerate ambiguity, and maybe even enjoy the occasional archival rabbit hole. (And if you're the kind of person who seeks out primary sources, you almost certainly are.)

By the end of this, you'll know how to walk into any archive with a plan, navigate finding aids without panic, evaluate what you find with genuine critical rigor, request records the government hasn't published, and build a research workflow that turns raw documents into actual answers. It's slow work sometimes. Occasionally frustrating. And every now and then — when the right folder opens to the right page — it's absolutely thrilling work.

Primary, Secondary, and Tertiary Sources: A Distinction That Actually Matters

Before we go any further, we need to get precise about what a primary source actually is — because the category is less obvious than it sounds, and the distinction carries real consequences for how you do research.

A primary source is a document, object, recording, or artifact created during the event or period being studied, or created later by someone with direct involvement in or firsthand knowledge of it. It is the evidence itself, not someone else's account of the evidence. A soldier's letter home from the front is a primary source. The battle orders he was following are a primary source. The pension application he filed thirty years later, describing what he experienced, is still a primary source: it was created by a direct participant and records his own testimony, however imperfect memory may have made it.

A secondary source is an interpretation, analysis, or synthesis built on primary sources (and often on other secondary sources). A historian's book about that battle is a secondary source. So is a journal article analyzing the campaign, and so is a documentary about the war. These aren't inferior to primary sources. They often do essential intellectual work, supplying context and argument that raw documents can't provide on their own. But they are always one step removed from the original evidence. Someone has already made decisions about what matters, what to quote, and what to leave out.

A tertiary source is a synthesis of secondary sources: encyclopedias (Wikipedia included), textbook summaries, subject guides. They can be useful for orientation and for identifying secondary sources worth reading. But by the time a fact has traveled from a primary document through a secondary analysis into a tertiary summary, it has been interpreted, compressed, and paraphrased at least twice. Small distortions compound.

The distinction sounds academic. It has practical consequences that will affect your research every single week.

Consider a single historical event: the Triangle Shirtwaist Factory fire of March 25, 1911, in New York City, which killed 146 garment workers, most of them young immigrant women. Here is how that event looks at each level of the source hierarchy:

In a tertiary source, say a history textbook chapter on the Progressive Era, you might read: "The Triangle fire, which killed 146 workers, galvanized the labor movement and led to sweeping workplace safety reforms." That sentence is accurate as far as it goes, but it compresses decades of historical argument into fewer than twenty words. It tells you nothing about the specific reforms, who fought for them, who resisted, or whether the workers who survived saw meaningful change in their own lifetimes.

In a secondary source — a historian's monograph on the fire and its aftermath — you get something much richer: argument, evidence, specific names, contested interpretations, footnotes pointing you toward the primary sources the author used. David Von Drehle's Triangle: The Fire That Changed America reconstructs the fire from newspaper accounts, trial testimony, investigation records, and survivor interviews. This is enormously valuable. But it is still Von Drehle's reading of those sources — his judgment about which testimony was credible, which details were significant, which causal connections to draw.

In a primary source — say, the transcript of the 1911 coroner's inquest — you encounter something no secondary account can fully replicate: the actual words of witnesses under oath, the specific questions investigators thought to ask (and the ones they didn't think to ask), the procedural language that reveals how the legal system categorized the deaths. You also encounter friction: witnesses who contradict each other, testimony that seems evasive, records that raise questions the inquest never resolves. A secondary source has already smoothed much of that friction. The primary source preserves it.

None of this means you should skip the secondary sources — quite the opposite. A good secondary source is your roadmap to the primary ones. Footnotes and bibliographies in scholarly books are, for the working researcher, almost as valuable as the argument itself. They tell you which archives hold the relevant collections, which record groups contain the useful files, which collections other researchers have found productive. Read the secondary literature first. Then go find the evidence it was built on.

The Core Mental Model: Creator, Purpose, Audience, Context

Here is the framework you'll apply to every document you encounter in this course and, if I've done my job, in every research project you undertake afterward. It has four elements:

Creator. Who made this document? Not just the name on the signature line, but what kind of person they were, what position they held, what they knew and didn't know, and what pressures they were under. A census enumerator canvassing a crowded immigrant neighborhood in 1900 was a temporary, often overworked government employee, typically paid according to how many people they counted. That fact shaped every entry they made.

Purpose. Why does this document exist? What was it created to accomplish? Documents are almost never created to serve future historians. They are created to meet an immediate institutional or personal need. A military pension file exists because the government needed to determine whether a veteran's injury claim was legitimate. That purpose shapes what questions were asked, what evidence was collected, and what the examining doctor was incentivized to write. Understanding the purpose tells you what the document was designed to capture — and therefore what it might systematically miss.

Audience. Who was this document written for? A letter intended for a commanding officer reads differently than a letter intended for a wife back home. A deposition given to a federal investigator reads differently than testimony given to a sympathetic church tribunal. The audience shapes what the creator chose to include, omit, emphasize, and soften. When you read a document, you are almost never the intended audience. That gap is where analysis lives.

Context. When and where was this document created, and what was happening at the time? A government report on labor conditions written in 1935, six years into the Depression and in the middle of the New Deal, belongs to a specific political moment. That moment shaped which facts its authors thought worth documenting and which conclusions they were inclined to reach. Strip away that context and you misread the document.

Let's run this framework on a specific example: a 1902 report on conditions at a reservation school, written by a field agent of the Office of Indian Affairs (the agency renamed the Bureau of Indian Affairs in 1947).

On its face, the report appears to be a neutral administrative document — columns of enrollment figures, attendance rates, crop yields from the school's farm, a brief narrative assessment of the school's "progress." It has the tone of bureaucratic objectivity. A researcher who takes it at face value might quote its statistics as straightforward facts about reservation life.

Apply the framework. The creator is a federal employee whose career advancement depended on demonstrating that government policy was working. His purpose was to justify the school's continued funding and demonstrate compliance with assimilation mandates. His audience was his superiors in Washington, who wanted evidence that the federal Indian boarding school program was achieving its goals. The context is a period of explicit government policy aimed at eliminating Indigenous languages and cultures — a policy the agent was employed to implement.

None of this means the enrollment figures are fabricated. They may be entirely accurate. But the narrative assessment of "progress" now looks very different: what it counted as progress, what it didn't mention, and who wasn't there to give a contradicting account. The document reveals something true about the federal government's self-perception and administrative machinery. About the actual experience of the enrolled children, it reveals far less, and only as filtered through the agent's eyes.

This is what primary source research actually looks like: not simply locating old documents and trusting what they say, but reading them through the lens of how and why they came to exist. The Creator-Purpose-Audience-Context framework gives you a repeatable set of questions to ask every time you open a new folder. By the end of this course, asking those questions will be automatic.

What Primary Source Research Can and Cannot Answer

Primary sources can answer questions about what was officially recorded, claimed, or documented at a specific moment in time. They can corroborate or contradict accounts that appear in secondary sources. They can reveal details about specific individuals, places, transactions, and events that no published history has ever synthesized. They are irreplaceable for reconstructing the texture of daily life — the actual names, wages, addresses, relationships, and movements of people who never wrote memoirs and never became famous.

What they cannot do, on their own, is tell you what they mean. A document is evidence, not argument. Moving from evidence to interpretation — deciding what a collection of sources, taken together, actually demonstrates — requires the skills of source criticism, corroboration, and synthesis that this course is designed to teach. It also requires intellectual honesty about what the evidence doesn't show, what records no longer survive, and where the gaps in the historical record are simply too large to bridge with confidence.

That tension, between what the documents say and what we can responsibly conclude from them, is where the real work of historical research happens. It's more demanding than a Google search. It's also considerably more interesting.

The rest of this course moves through the practical architecture of that work: how archives are organized and how to navigate them, how to find and read specific record types, how to evaluate what you find, and how to build a research workflow that actually produces answers. Each section builds on the last. By the time you finish, you won't just know where to look — you'll know how to think about what you find when you get there.

Let's get started.

What Is a Primary Source and Why It Matters