How to Transcribe and Organize Family Oral History Recordings
After the Recording Stops: Processing, Transcription, and Your Notes
The interview is over. Your narrator has just told you about the time they crossed a border with nothing but a suitcase and a photograph, or the way their mother's kitchen smelled on Sunday mornings, or what they saw during the war that they've never told anyone else. You're both a little wrung out in the best possible way — the emotional weight of a real conversation settling around you like dust after a storm.
You showed up. You listened. You cared. And in that room, something irreplaceable happened. But here's what separates a meaningful oral history project from a forgotten recording: what happens next.
The interview you just conducted exists right now in a single, fragile location: on a device that can fail, corrupt, or disappear. The context that makes it fully legible (where you were sitting, how your narrator's voice changed when they mentioned their father, the detail they whispered that isn't quite audible on the recording) is already evaporating from your memory with every passing hour. The field notes you took, the impressions that felt significant in the moment: these are the scaffolding that will help you (and anyone else who encounters this recording later) understand what they're hearing and why it matters.
What you do after the recording stops determines whether this interview becomes a lasting document or a file that sits on a hard drive, slowly becoming inaccessible and eventually incomprehensible. This section is about the work that transforms a raw audio file into something genuinely preserved, usable, and meaningful. None of it is glamorous. Some of it is slow. All of it matters enormously.
Make Three Copies Before You Do Anything Else
Okay, first things first. The moment your recording is complete, make two additional backup copies. Right now. Not later today. Now.
A single digital file is not a backup — it's a disaster waiting to happen. Devices fail. Files corrupt. A phone gets dropped. A hard drive clicks once and stops responding forever. You know this intellectually, but it doesn't feel real until it happens to you.
The standard practice in archival work is the 3-2-1 rule: three copies of important data, on two different kinds of media, with one copy stored offsite. For a family oral history project, this translates to:
- Copy 1: The device you recorded on
- Copy 2: An external hard drive kept at home
- Copy 3: Cloud storage (Dropbox, Google Drive, iCloud) or an external drive kept somewhere else
This takes maybe ten minutes the day of the interview. It might save your family's history.
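If you're comfortable running a small script, the copying step can be sketched like this. It is a minimal example, not a required tool: the function names and destination folders are hypothetical, and it assumes your external drive and your cloud-synced folder both show up as ordinary folders on your computer. It copies the recording to each destination and compares a checksum so you know each copy matches the original.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(recording: Path, destinations: list[Path]) -> list[Path]:
    """Copy the recording into each destination folder and verify every copy."""
    original_hash = sha256_of(recording)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy_path = folder / recording.name
        shutil.copy2(recording, copy_path)  # copy2 also keeps file timestamps
        if sha256_of(copy_path) != original_hash:
            raise IOError(f"Copy at {copy_path} does not match the original")
        copies.append(copy_path)
    return copies
```

You would call `back_up(Path("interview.m4a"), [external_drive_folder, cloud_folder])` with your own paths. The checksum comparison is the part worth keeping even if you copy files by hand: a copy you haven't verified is only probably a backup.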
Rename That File Today
While you're at it, rename the file. The default filename your phone or recorder creates — something like AUDIO_20241103_142857.m4a — tells you almost nothing useful and becomes completely opaque in three years when you're trying to find "Grandma Vera's interview about the factory."
A good naming convention for family oral history is:
YYYY-MM-DD_LastNameFirstName_Topic_V1
So: 2024-11-03_KowalskiVera_ImmigrationPoland_V1
This format sorts chronologically by default, includes the narrator's name, gestures at the topic, and notes the version (in case you later produce an edited or corrected copy of the file). If you record more than one session with the same narrator, add an identifier: _Session1, _Session2.
Choose a convention and use it consistently from the first interview onward. Inconsistent naming is one of the most common and most correctable problems in personal oral history collections — and it becomes a genuine headache when you're trying to find something years later.
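One way to stay consistent is to let a small function build the name for you. This is a sketch under stated assumptions: the function name is made up for illustration, and placing the session identifier just before the version is my choice, since the convention above doesn't fix its position.

```python
import re
from datetime import date
from typing import Optional


def _compact(text: str) -> str:
    """Remove spaces, punctuation, and underscores so each part is one token."""
    return re.sub(r"[\W_]+", "", text)


def interview_filename(day: date, last: str, first: str, topic: str,
                       version: int = 1, session: Optional[int] = None) -> str:
    """Build a name in the YYYY-MM-DD_LastNameFirstName_Topic_V1 pattern."""
    parts = [day.isoformat(), _compact(last) + _compact(first), _compact(topic)]
    if session is not None:
        parts.append(f"Session{session}")  # assumed position: before the version
    parts.append(f"V{version}")
    return "_".join(parts)
```

Calling `interview_filename(date(2024, 11, 3), "Kowalski", "Vera", "Immigration Poland")` returns `2024-11-03_KowalskiVera_ImmigrationPoland_V1`. Add the original extension (such as `.m4a`) yourself when renaming the file.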
Adding Metadata
Beyond the filename, take five minutes to fill in whatever metadata fields your device or software supports. If you're working with audio files on a computer, you can right-click and view or edit file properties. At minimum, record:
- Date of interview
- Full name of narrator
- Full name of interviewer
- Location of interview
- Duration
- Recording device and format
- Brief subject description (two or three sentences)
This metadata is the difference between a file that's findable and a file that's lost in plain sight. The Smithsonian Institution Archives oral history program builds documentation and archival practice into its post-interview process, although the specifics of its metadata workflow are not published in detail.
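File properties can be lost when a file moves between devices or services, so some people also keep the same information in a small text file saved next to the recording. The sketch below is one hypothetical way to do that: the field names and the `.json` sidecar approach are assumptions for illustration, not an archival standard.

```python
import json
from pathlib import Path

# Field names are illustrative; they mirror the minimum list above.
REQUIRED_FIELDS = [
    "interview_date", "narrator", "interviewer", "location",
    "duration", "device_and_format", "description",
]


def write_sidecar(recording: Path, metadata: dict) -> Path:
    """Save metadata as a JSON file next to the recording, refusing gaps."""
    missing = [field for field in REQUIRED_FIELDS if not metadata.get(field)]
    if missing:
        raise ValueError("Missing metadata fields: " + ", ".join(missing))
    sidecar = recording.with_suffix(".json")  # same name, .json extension
    sidecar.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return sidecar
```

The check for missing fields is the useful part: it stops you from saving a record with the location or the interviewer's name left blank.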
Write Your Field Notes Now
Here is the most urgent thing I can tell you about post-interview practice: write your field notes before you do anything else. Not tomorrow morning. Not before dinner. Right now, while the interview is still alive in your body.
Field notes are not a transcript. They're not even about what was said. They're about everything that wasn't captured on the recording — the contextual information, the observations, the emotional texture, the things your narrator almost said but didn't, the moment they paused for a full thirty seconds before answering.
These notes are a companion document to the recording, and they make the recording legible in ways it can't be on its own. A researcher (including future you) listening to your grandmother's interview fifty years from now will not know that she was holding a rosary the entire time, or that her voice went flat when she mentioned a certain name, or that the room smelled of the soup she'd made for you. Your field notes carry that information.
What to Include in Field Notes
Think of field notes as answering three questions: What was the setting? How did it feel? What happened around the edges?
Setting and logistics:
- Where did the interview take place? Describe the physical space.
- Who else was present, even briefly?
- What were the lighting and sound conditions?
- Did any technical problems occur? At what point in the interview?
- How long did the recording actually run?
Emotional and relational observations:
- What was your narrator's general mood? Relaxed, guarded, enthusiastic, tired?
- Did their demeanor shift at any point? What seemed to trigger that shift?
- Were there topics they visibly avoided or seemed relieved to reach?
- What seemed to matter most to them?
Content that didn't make it onto the recording:
- What did they say before you pressed record, or after you turned it off?
- Were there things they said they wanted to discuss "off the record"?
- Were there people, places, or events they mentioned that you should follow up on?
- What questions arose that you didn't get to ask?
Your own reflections:
- What surprised you?
- What do you wish you'd asked?
- What do you think this interview reveals that might not be obvious on the surface?
The Smithsonian Folklife and Oral History Interviewing Guide recommends that interviewers document their observations and impressions alongside the recorded interview. Your presence in this interview is not incidental. You shaped what was shared and how. Documenting your perspective is part of honest, thorough practice.
A useful target: aim for one to two pages of field notes per hour of interview. That's not a lot, roughly 500 words at the low end, and it will feel like more than enough effort in the moment. But those 500 words may be invaluable twenty years from now.
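If a blank page slows you down, a reusable skeleton can help. The sketch below generates one from the four groups of questions above and gives a rough words-per-hour count to compare against your target. Both function names are hypothetical, and the layout is only a suggestion.

```python
SECTIONS = [
    "Setting and logistics",
    "Emotional and relational observations",
    "Content that didn't make it onto the recording",
    "Your own reflections",
]


def field_notes_template(recording_name: str, written_on: str) -> str:
    """Return an empty field-notes skeleton with one heading per section."""
    lines = [f"Field notes for {recording_name}", f"Written: {written_on}", ""]
    for section in SECTIONS:
        lines += [section + ":", "", ""]
    return "\n".join(lines)


def words_per_hour(notes: str, interview_minutes: float) -> float:
    """Rough density check: words of notes per hour of interview."""
    return len(notes.split()) / (interview_minutes / 60)
```

The word count includes the headings, so treat it as a nudge rather than a measurement.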
Listening Back: The Review That Changes Everything
The Smithsonian Institution Archives recommends that interviewers listen to their recordings soon after the interview specifically to analyze their technique. That's useful professional development advice. But for family historians, listening back has a more immediate and personal payoff: you will hear things you completely missed while you were in the room.
This is one of the stranger experiences of doing this work. When you're sitting across from someone, your attention is divided — you're listening, watching their face, thinking about your next question, managing the equipment, feeling the weight of the moment. The recording captures everything while you were doing all that at once. When you listen back with no other demands on your attention, the interview opens up.
You'll notice the sentence they started and abandoned — and what they were about to say. You'll notice the long pause that preceded a significant disclosure. You'll notice that they answered a different question than the one you actually asked, and that their answer was more interesting than what you would have gotten. You'll hear exactly where the sound quality dipped (near the window, after they moved in their chair) so you can prepare better next time.
What to Listen For
When you do your first listen-back, try to do it in one sitting, without multitasking. Take notes — not a transcript, just flagged moments. Use a notebook or a simple document with timestamps:
- [12:43] — Voice drops here, worth noting in the transcript
- [27:15] — Background noise overwhelms for about 30 seconds
- [41:02] — The detail about the neighbor — needs follow-up question next session
- [58:30] — Didn't follow up on what she meant by "the arrangement" — schedule another interview
These timestamp notes become your working document for transcription and future interviews. They're also where you'll decide whether you need a second session — something you'll be in a much better position to judge after listening than right after the interview itself.
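If you keep your flagged moments in a plain text file, a few lines of code can sort them and convert the timestamps to seconds, which makes it easier to jump around in an audio player. This is a sketch that assumes the bracketed `[MM:SS]` or `[HH:MM:SS]` style shown above; the function name is made up for illustration.

```python
import re

# Matches an optional bullet, then [MM:SS] or [HH:MM:SS], then the note text.
STAMP = re.compile(r"^\s*-?\s*\[(\d{1,2}):(\d{2})(?::(\d{2}))?\](.*)$")


def parse_flags(notes: str) -> list[tuple[int, str]]:
    """Return (seconds, note) pairs sorted by time; other lines are skipped."""
    flags = []
    for line in notes.splitlines():
        match = STAMP.match(line)
        if not match:
            continue
        first, second, third, rest = match.groups()
        if third is None:  # MM:SS
            seconds = int(first) * 60 + int(second)
        else:  # HH:MM:SS
            seconds = int(first) * 3600 + int(second) * 60 + int(third)
        # Strip the dash or hyphen that separates the stamp from the note.
        flags.append((seconds, rest.lstrip(" \u2014\u2013-").strip()))
    return sorted(flags)
```

Lines without a timestamp are ignored rather than rejected, so you can mix flagged moments with ordinary notes in the same file.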
graph TD
A[Interview Ends] --> B[Immediate Backup - 3 copies]
B --> C[Rename file with convention]
C --> D[Add metadata]
D --> E[Write field notes - same day]
E --> F[Listen back - within 48 hours]
F --> G[Flag timestamps and gaps]
G --> H[Decide: transcribe or second session?]
H --> I[Begin transcription process]
Understanding Transcription: What It Is and Why It Matters
A transcript is a written record of a spoken interview. That sounds simple, but the decision to transcribe — and how — is more consequential than it first appears.
The case for transcription is practical and important: spoken audio is difficult to search, quote, share, or work with at a distance. A recording can become inaccessible if the format becomes obsolete, if the file is corrupted, or if the listener simply can't understand a word or accent without visual support. A transcript makes the content of an interview accessible to people who weren't there — including your own grandchildren reading it in 2070, or a researcher piecing together your family's history in a language that isn't English.
Transcription also forces you to engage with the material deeply. There's no passive transcribing. When you're converting speech to text, you have to listen to every word, catch every aside, decide how to render a laugh or a sob. That engagement often reveals things you missed even on your careful listen-back.
That said, transcription takes time. A professional transcriptionist typically spends four to six hours for every hour of audio. If you're doing it yourself, budget more. This is why choosing the right level of transcription for your project matters enormously.
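To see what that ratio means for your own recordings, the arithmetic can be sketched as below. The four-to-six range comes from the paragraph above; the function name is hypothetical, and you should raise the ratios if you are new to transcribing.

```python
def transcription_hours(audio_minutes: float,
                        low_ratio: float = 4.0,
                        high_ratio: float = 6.0) -> tuple[float, float]:
    """Estimate the (low, high) hours of work needed to transcribe a recording."""
    audio_hours = audio_minutes / 60
    return (audio_hours * low_ratio, audio_hours * high_ratio)
```

For a 90-minute interview, `transcription_hours(90)` gives `(6.0, 9.0)`: somewhere between six and nine hours of work at a professional pace.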
The Three Levels of Transcription
There is no single correct way to transcribe an oral history. The right approach depends on how you plan to use the interview, how much time you have, and what your narrator's words deserve.
Level 1: Full Verbatim Transcription
Full verbatim transcription captures everything — every "um" and "uh," every false start, every interrupted sentence, every laugh, cough, and pause. It looks like this:
"And — and my mother, she — she was — I don't know, she was maybe thirty-five? No. No, she was thirty-two. Yes, thirty-two. She — when we got to the border — [pause] — I'm sorry. It's — it's hard to talk about this."
This level of transcription is what academic oral historians and archivists typically produce. It preserves the full texture of how something was said, which carries meaning that edited text erases. The hesitations tell you something about certainty. The false starts reveal the shape of memory. The pauses locate emotion.
Full verbatim transcription is appropriate when:
- The interview has significant historical or documentary value
- You intend to deposit the interview with an archive or library
- You or future researchers may want to study the interview linguistically or rhetorically
- The narrator is a primary witness to significant events
It is also the most labor-intensive option and produces text that can be difficult to read in long stretches.
Level 2: Edited Transcription
Edited transcription removes most of the verbal tics and false starts while preserving the narrator's authentic voice, grammar, and sentence structures. It's the most common approach for family oral history projects because it balances readability with authenticity:
"And my mother was — I think she was thirty-two. When we got to the border — [pause] — I'm sorry. It's hard to talk about this."
This level is appropriate for most family history projects. The story remains the narrator's own, in their words, but it's easier for a reader to follow. Crucially, it still preserves hesitations and pauses where they carry meaning.
Level 3: Summary Transcription
Summary transcription doesn't reproduce the narrator's exact words; instead, it summarizes what was said, often in the third person, with key quotes pulled out verbatim:
Narrator describes her mother's age at the time of the border crossing (approximately 32 years old) and becomes emotional when recounting the experience. "It's hard to talk about this."
This level is appropriate when:
- You have a very large collection and limited time
- The interview covers some ground that's already well-documented elsewhere
- You want to create an index or finding aid to accompany the full audio
Summary transcription should never be used as a substitute for preserving the recording itself — it's a navigational tool, not an archive.
graph LR
A[Full Verbatim] --> |Most accurate| B[Every word, pause, and sound]
C[Edited Transcript] --> |Best balance| D[Authentic voice, readable text]
E[Summary Transcript] --> |Fastest, least detail| F[Paraphrase + key quotes]
A --> G[Academic/archival use]
C --> H[Family history projects]
E --> I[Finding aids and indexes]
AI Transcription Tools: Useful, Not Perfect
Let's talk about the technology in the room. AI transcription tools have become genuinely remarkable in the past few years. Services like Otter.ai, Descript, and Rev's AI tier, along with open-source models such as OpenAI's Whisper (which some transcription tools build on), can transcribe an hour of clear audio in minutes and produce a draft that's often 85–95% accurate for standard American English speech.
For family oral history, this is a game-changer in terms of time investment — but it comes with real caveats you need to understand before trusting the output.
What AI transcription does well:
- Clear speech in quiet environments
- Single speakers or well-separated speakers in alternating conversation
- Standard accents and vocabulary
- Fast turnaround at low or no cost
Where AI transcription struggles — and why this matters:
- Accents and dialects: If your narrator has a strong regional, ethnic, or foreign-language accent, accuracy drops significantly. An elderly narrator speaking in heavily accented English may produce AI transcripts full of plausible-sounding but wrong words — substitutions that change meaning.
- Names and places: AI tools frequently mishear proper nouns. Your grandmother's village in Ukraine will probably not be correctly transcribed. Neither will family names, local place names, or historical figures she mentions.
- Crosstalk and interruptions: If you or your narrator talk over each other, even briefly, accuracy degrades.
- Background noise: Any significant ambient noise — air conditioning, traffic, other family members in the background — increases errors.
- Emotional speech: Crying, very soft speech, or talking through laughter often produces errors or gaps.
The golden rule with AI transcription: treat it as a first draft that requires human review, not a finished document. The errors it makes are often plausible-sounding, which makes them dangerous. A listener checking the transcript against the audio will catch them. Someone reading the transcript cold will not.
For a practical workflow, use an AI tool to generate the draft, then sit with the audio and the transcript and correct it yourself. Even with heavy editing, this process is typically two to three times faster than transcribing from scratch. That's a real gift.
If your narrator speaks primarily in a language other than English, you'll need a human transcriptionist or translator who knows that language — preferably both the language and the dialect. AI tools are improving at non-English transcription, but their performance varies enormously by language and accent.
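For the draft-then-correct workflow described above, it helps to have every chunk of the AI draft labeled with its position in the audio, so you can check each line against the recording. The sketch below assumes your tool can export segments as a list of records with a `start` time in seconds and a `text` field, which is the general shape the open-source Whisper Python package returns. Other tools export differently, so treat the field names as assumptions.

```python
def format_timestamp(seconds: float) -> str:
    """Turn a number of seconds into HH:MM:SS."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def review_sheet(segments: list[dict]) -> str:
    """Render one timestamped line per segment, ready for human correction."""
    lines = []
    for segment in segments:
        stamp = format_timestamp(segment["start"])
        lines.append(f"[{stamp}] {segment['text'].strip()}")
    return "\n".join(lines)
```

The output is still only a draft. The point of the timestamps is to make the human review faster, not to replace it.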
Editing for Readability Without Distorting Meaning
Once you have a transcript draft — whether AI-generated or typed from scratch — the editing question becomes: how much is too much?
This is where many well-intentioned family historians go wrong. The urge to "clean up" a narrator's speech is understandable, but it can slide into something that subtly misrepresents them. Smoothing out sentence fragments, correcting grammar, and eliminating repetitions might make the text easier to read — but it can also erase the narrator's distinctive voice, make a regional dialect sound "standard," and strip out the exact texture that makes the interview human.
A useful rule of thumb: edit for clarity, not for style. Remove accidental repetitions ("I, I, I was thinking"), obvious verbal tics that add no meaning, and interviewer backchannels like "mm-hmm" and "right" that clutter the page. But leave in the grammar of the narrator's speech patterns, the structures of their dialect, the way they tend to circle back to an idea. That's not error — that's character.
When in doubt, preserve. You can always make a lightly edited version for family sharing while keeping a fuller version in your archive.
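Accidental repetitions are one of the few edits that can be partly automated. The sketch below collapses a word repeated across commas ("I, I, I was thinking"). It is a blunt tool with a hypothetical function name: it will also collapse deliberate emphasis such as "very, very", so compare its output with the original and keep the fuller version in your archive.

```python
import re

# A word followed one or more times by a comma and the same word again.
REPEAT = re.compile(r"\b(\w+)(?:,\s*\1\b)+", flags=re.IGNORECASE)


def collapse_repeats(text: str) -> str:
    """Reduce comma-separated stutters to a single instance of the word."""
    return REPEAT.sub(r"\1", text)
```

`collapse_repeats("I, I, I was thinking")` returns `"I was thinking"`, while a sentence like "No. No, she was thirty-two." is left alone because the repetition there crosses a full stop.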
Always indicate clearly in the transcript's header whether it is verbatim, edited, or summary — and who produced it, and when.
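A header like that can be generated the same way every time. This is a minimal sketch with a made-up layout and function name; the only parts taken from the guidance above are the three facts it records: the transcription level, who produced it, and when.

```python
from datetime import date

LEVELS = ("verbatim", "edited", "summary")


def transcript_header(source_file: str, level: str,
                      transcriber: str, produced_on: date) -> str:
    """Return a plain-text header block to place at the top of a transcript."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}")
    return "\n".join([
        f"Source recording: {source_file}",
        f"Transcription level: {level}",
        f"Transcribed by: {transcriber}",
        f"Date produced: {produced_on.isoformat()}",
    ])
```

Rejecting any level outside the three names keeps later readers from having to guess what "cleaned up" or "light edit" meant.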
The Narrator Review Process
Here is a practice that separates thoughtful oral historians from people who just happened to conduct an interview: giving your narrator the opportunity to review their own words before you share them anywhere.
This is both an ethical obligation and a practical one. The Oral History Association best practices are explicit: oral historians should, whenever possible, provide narrators an opportunity to approve the oral history prior to public release. This isn't censorship — it's respect for the person who trusted you with their story.
In practice, narrator review means sharing the transcript (or, if no transcript exists, the recording) with your narrator and giving them reasonable time to read it. Some narrators will have no changes. Some will catch factual errors in how you've represented their words. Some will want to add clarifications. Some will realize there's something they said that they're not comfortable having preserved.
All of these responses are valid and should be taken seriously.
Handling Corrections
Minor factual corrections — a date wrong by a year, a name slightly misspelled in the transcript — are easy to handle. Make the correction in the transcript and note it: [Narrator corrected: date should be 1953, not 1954].
More substantive corrections require judgment. If your narrator says they misspoke and want a passage amended, honor that. If they say they want a section removed, you need to have a conversation. They have rights over their own words — but you should also help them understand what "removed" means in context. Does it mean cut from the transcript? Sealed in the archive? Destroyed entirely?
Your release agreement (which you should have signed before the interview — see the ethics and consent section of this course) will govern some of this. But the spirit of that agreement matters more than the letter. If your grandmother is frightened by something she said and wants it gone, the relationship matters more than the document.
Handling Retractions
A retraction is more serious than a correction — it's a narrator's request to withdraw something they said. This can happen when a narrator realizes, on reflection, that they've disclosed something sensitive: information about living relatives, traumatic details they didn't mean to share, confidential information about their professional past.
Retractions should be honored, but they require documentation. Make a note in your records: what was retracted, when, and at whose request. In some cases, the appropriate solution is not deletion but restriction — sealing that portion of the interview for a period of time, or limiting access to direct descendants. The Oral History Association's approach to narrator rights strongly supports this kind of flexibility.
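The documentation itself can be very simple. Below is one hypothetical way to structure a retraction record so that every entry captures what was affected, when, at whose request, and what was done about it. The field names and the list of actions are assumptions for illustration, not Oral History Association terminology.

```python
import json
from dataclasses import dataclass, asdict
from datetime import date

# Illustrative options, echoing the choices discussed above.
ACTIONS = ("cut_from_transcript", "sealed", "restricted_access", "destroyed")


@dataclass
class RetractionRecord:
    interview_file: str
    passage: str        # locate it by timestamp range, not by quoting it
    requested_by: str
    requested_on: date
    action: str
    details: str = ""   # e.g. how long it is sealed, or who may access it

    def __post_init__(self):
        if self.action not in ACTIONS:
            raise ValueError(f"action must be one of {ACTIONS}")

    def to_json(self) -> str:
        """Serialize the record as one JSON line for a running log file."""
        row = asdict(self)
        row["requested_on"] = self.requested_on.isoformat()
        return json.dumps(row, ensure_ascii=False)
```

Describe the passage by its location rather than its content. A log that repeats the retracted words defeats the purpose of the retraction.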
Creating an Index and Summary Document
The last piece of post-interview work — and the one most likely to be skipped entirely — is creating a simple index or summary document for each interview.
Think of this as a road map to the recording. Someone who picks up this interview in ten years without having been there should be able to understand, quickly, what's in it and where to find specific content.
A good interview summary document includes:
Header information:
- Interview identifier and file name
- Date, time, and location
- Narrator full name, birth date, birthplace
- Interviewer name
- Recording duration
- Transcription level (verbatim, edited, summary, or none)
Abstract: A two- to three-paragraph description of what the interview covers: the main topics, key stories, time periods, and geographical areas discussed. This is what you'd read to decide whether to listen to the recording.
Thematic index: A list of topics covered, with timestamps. Even a rough index is enormously useful:
- Immigration to the United States: [00:05:30 – 00:18:45]
- Marriage to Josef Kowalski: [00:18:45 – 00:31:00]
- Working conditions at the textile factory: [00:31:00 – 00:47:20]
- The 1956 neighborhood fire: [00:47:20 – 00:52:10]
- Family recipes and food traditions: [00:52:10 – 01:04:00]
People and places mentioned: A simple list of names and locations that appear in the interview. This makes the interview searchable by genealogical connection — a future family member researching a specific ancestor can quickly find which interviews mention them.
Follow-up notes: Topics that need more exploration in a future session, questions you didn't get to ask, documents or photographs the narrator mentioned that you should try to locate.
This document doesn't need to be beautiful. A plain text file or a simple Word document is perfectly fine. The point is that it exists — that someone navigating your family archive can understand what they're looking at without listening to every hour of audio.
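If you'd rather keep the thematic index as data and generate the text from it, a sketch might look like this. It assumes `HH:MM:SS` timestamps as in the example index above; the function names are hypothetical. It sorts the entries by start time and rejects any entry whose end comes before its start, which is the most common typo in a hand-built index.

```python
def to_seconds(stamp: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def thematic_index(entries: list[tuple[str, str, str]]) -> str:
    """Render (topic, start, end) entries as index lines in time order."""
    for topic, start, end in entries:
        if to_seconds(start) >= to_seconds(end):
            raise ValueError(f"{topic}: start must come before end")
    ordered = sorted(entries, key=lambda entry: to_seconds(entry[1]))
    return "\n".join(
        f"- {topic}: [{start} \u2013 {end}]" for topic, start, end in ordered
    )
```

The result is plain text you can paste straight into your summary document.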
Building the Habit: The 48-Hour Window
If there's one thing to take from this entire section, it's this: most of the work described here — field notes, first backup, metadata, initial listen-back, beginning your summary document — should happen within 48 hours of the interview.
Memory fades. The emotional texture of the room, the things your narrator whispered, the question you meant to follow up on — all of it becomes blurry faster than you'd expect. The 48-hour window is when the interview is still fresh enough that your notes will be rich and accurate.
After that window, you can take your time with transcription, with narrator review, with building the index. Those don't require the same immediacy. But the first pass — the protective, contextual work — benefits enormously from speed.
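If a reminder helps you keep the habit, the window is easy to compute. The sketch below is purely illustrative: the task names paraphrase the list above, and the function names are made up. It reports how many hours remain and which first-pass tasks are still open.

```python
from datetime import datetime, timedelta

FIRST_PASS_TASKS = [
    "backups", "metadata", "field notes",
    "listen-back", "summary document started",
]


def first_pass_deadline(interview_end: datetime,
                        window_hours: int = 48) -> datetime:
    """Return the moment the first-pass window closes."""
    return interview_end + timedelta(hours=window_hours)


def status(interview_end: datetime, now: datetime,
           done: set[str]) -> tuple[float, list[str]]:
    """Return (hours left in the window, tasks not yet done)."""
    remaining = first_pass_deadline(interview_end) - now
    hours_left = max(remaining.total_seconds() / 3600, 0.0)
    pending = [task for task in FIRST_PASS_TASKS if task not in done]
    return hours_left, pending
```

For an interview that ended at 4 p.m. on a Sunday, checking at 10 a.m. on Monday with backups and field notes done shows 30 hours left and three tasks still open.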
One practical approach: some experienced oral historians write their field notes in the car before driving home, or in the parking lot, or on the train. They record a voice memo walking to their car. They do something in that first hour while the interview is still alive in them.
You don't have to be a professional archivist to do this work well. You just have to be the kind of person who sits down within 48 hours and writes a page of notes about what you just heard. That's all it takes to transform a recording from a file into a document.
The recording is the treasure. Everything we've talked about in this section is the map that ensures someone can find it.