Why Cuts Work: The Science and Psychology of Film Editing

Section 5 of 13

How Juxtaposition Creates Meaning in Film Editing

The Kuleshov Effect: How Juxtaposition Creates Meaning

We've established that editing works because it resonates with how your brain already segments and processes reality. Griffith intuited this through sheer experimental output, building a grammar that leveraged cognitive machinery shaped over millennia. But intuition and proof are different things. In 1921, a young Soviet filmmaker named Lev Kuleshov designed an experiment — deceptively simple, but devastatingly clear — that would make this cognitive mechanism visible and undeniable. His work would show, with surgical precision, exactly what we mean when we say that meaning in film isn't transmitted by the images themselves, but generated in the gap between them, by viewers' own interpretive machinery.

Here's what Kuleshov actually did. He took a single shot of the Russian actor Ivan Mozzhukhin: a composed, expressionless close-up from a Tsarist-era film. He cut that shot into three different sequences. In the first, Mozzhukhin's face was followed by a shot of a steaming bowl of soup, and then the sequence returned to the face. In the second, his face bracketed a shot of a woman's body in a coffin. In the third, his face appeared around a shot of a woman reclining on a divan. The audiences who saw these sequences were, by most accounts, moved. They praised Mozzhukhin's acting. They noted his hunger over the soup, his grief over the dead woman, his desire for the woman on the divan. Vsevolod Pudovkin, who later claimed co-credit for the experiment, wrote that viewers "raved about the acting." The face, every single time, was identical.

That's the Kuleshov effect, and it's the proof of everything we've been building toward. These people watched the same footage and saw three entirely different performances. The grief, the hunger, the desire: none of that was on the screen. It was in the gap between shots, and the audiences generated it themselves, then attributed it to the actor. This is what it means to say that editing taps into existing cognitive processes. The Kuleshov effect demonstrates this empirically rather than simply describing it. The face wasn't a blank screen; it was a canvas onto which the surrounding context projected meaning.

One thing that gets glossed over in popular accounts is that Mozzhukhin was not an unknown actor to these audiences — he was the leading romantic star of Tsarist cinema. This matters more than you might think. He wasn't simply "a face"; he was a face loaded with prior associations, charm, and audience affection. When viewers projected desire or grief onto him, they were also drawing on everything they already felt about him as a performer. The Kuleshov effect wasn't produced by blankness alone; it was catalyzed by meaningful blankness — a face known to be capable of expressiveness, now constrained and re-contextualized by what surrounded it. It's a detail that hints at something practitioners have long understood in their bones: a cut's effect depends not just on the two shots being joined, but on everything the audience already brings to each image. The "meaning between shots" is partly the relationship, partly the collision of prior associations. Context doesn't just modify meaning. Context is meaning.

Why Your Brain Does This: The Causality Engine

Here's the cognitive neuroscience version of the question: why would a brain interpret two sequential images as causally or emotionally related at all? Nothing in the physics of light hitting a screen requires you to infer that the person in shot A is feeling something about the thing in shot B. And yet viewers do it automatically, immediately, and with such confidence that they attribute imagined emotional performances to actors.

The answer is that humans are, at their evolutionary core, causality-detection machines. Our survival as a species has depended on our ability to rapidly infer relationships between events — to see a lion moving through grass and immediately update our model of the world to include "I may be in danger." The brain doesn't wait for proof of causality; it assumes it from sequential co-occurrence and then adjusts when contradicted. This is the inferential engine that David Hume was trying to describe philosophically, and it runs far faster and more automatically than conscious reasoning.

When you see two shots in sequence, your brain asks a question it cannot help but ask: why did I just see these two things together? This is not a choice. It's a reflex. The prefrontal cortex, working in close collaboration with the hippocampus and other associative memory systems, constructs a narrative bridge between images — a hypothesis about their relationship — and this happens before you have time to decide whether to do it.

graph TD
    A[Shot 1: Neutral Face] --> B{Brain asks: Why these two?}
    C[Shot 2: Emotional Scene] --> B
    B --> D[Prefrontal cortex builds causal link]
    D --> E[Hippocampus retrieves emotional associations]
    E --> F[Insula generates somatic response]
    F --> G[Orbitofrontal cortex assigns valence]
    G --> H[Meaning projected back onto Face]

This is why juxtaposition is the most efficient meaning-making tool in cinema. The editor doesn't have to show that the character is sad — they just have to show the character, then show something sad, then return to the character. The viewer's own cognitive machinery does the rest, and because the viewer generated the inference themselves, it often feels more vivid and personal than an explicit portrayal would. You didn't see grief described to you; you felt it, constructed it from the evidence provided, and believed it because you made it.

This is why a great cut often feels like revelation: the brain has arrived at an inference it believes is its own.

The Neural Correlates: What Brain Scans Show

For much of the 20th century, the Kuleshov effect was film theory's most beloved thought experiment — endlessly cited, rarely tested. The original footage is lost. Kuleshov's own account was imprecise. Pudovkin's description was written years later. When researchers finally tried to replicate it experimentally, the results were initially frustrating.

In 1992, Prince and Hensley ran the first rigorous experimental attempt, with 137 participants, and found no statistically significant effect. This was embarrassing for film theory, but the methodology suggests why. The study design was between-subjects (different participants saw different sequences), which is poorly suited to detecting within-person perceptual shifts, because ordinary differences between one viewer and the next can swamp the shift the study is looking for. The signal was real; the instrument was miscalibrated.

The decisive confirmation came in 2006, when Dean Mobbs and colleagues ran a within-subject fMRI study — meaning each participant served as their own control, seeing the same neutral face in different emotional contexts. The results clearly showed the effect: neutral faces shown after sad scenes were perceived as sad; neutral faces shown after happy scenes were perceived as happy. And because this was fMRI, the researchers could see where in the brain this was happening.
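The difference between the two designs is easy to see in a toy simulation. The sketch below is illustrative only: the function name `simulate`, the effect size, and the noise levels are all invented assumptions, not figures from Prince and Hensley or from Mobbs and colleagues. It gives every simulated viewer a large personal baseline for how "sad" any face looks, adds a small context effect, and compares how strongly each design detects that effect.

```python
import random
import statistics


def simulate(n_participants=100, effect=0.5, baseline_sd=2.0, noise_sd=0.3, seed=0):
    """Toy comparison of within-subject and between-subjects designs.

    All numbers are invented for illustration; none come from a
    published Kuleshov study.
    """
    rng = random.Random(seed)
    # Each viewer has a personal baseline for how "sad" any face looks.
    baselines = [rng.gauss(0.0, baseline_sd) for _ in range(n_participants)]

    def rating(baseline, context_shift):
        # One rating = personal baseline + context effect + trial noise.
        return baseline + context_shift + rng.gauss(0.0, noise_sd)

    # Within-subject: every viewer rates the face in both contexts,
    # so the personal baseline cancels out of each difference.
    diffs = [rating(b, effect) - rating(b, 0.0) for b in baselines]
    within_t = statistics.mean(diffs) / (statistics.stdev(diffs) / len(diffs) ** 0.5)

    # Between-subjects: half the viewers see one context, half the other,
    # so baseline differences between people stay in the comparison.
    half = n_participants // 2
    sad = [rating(b, effect) for b in baselines[:half]]
    neutral = [rating(b, 0.0) for b in baselines[half:]]
    se = (statistics.variance(sad) / len(sad)
          + statistics.variance(neutral) / len(neutral)) ** 0.5
    between_t = (statistics.mean(sad) - statistics.mean(neutral)) / se
    return within_t, between_t


if __name__ == "__main__":
    within_t, between_t = simulate()
    print(f"within-subject t: {within_t:.2f}")
    print(f"between-subjects t: {between_t:.2f}")
```

Under these assumed parameters the within-subject statistic comes out far larger, because subtracting each viewer's two ratings removes that viewer's baseline. This illustrates the general statistical point only; it says nothing about the actual data in either study.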

The most thorough neural mapping of the effect comes from a 2024 study that addressed a persistent methodological complaint: previous experiments often used static photographs rather than actual film footage. This study used authentic film clips, shot under professional direction, and integrated them into genuine cinematic sequences. The behavioral results confirmed the effect. The neuroimaging data identified its neural architecture.

The key regions activated during the Kuleshov effect tell a story about how deeply the brain engages with it:

The hippocampus and parahippocampal gyrus — memory and contextual association. The brain is pulling up everything it knows about the emotional valence of what it just saw (the coffin, the soup, the woman) and connecting that stored knowledge to the face.

The orbitofrontal cortex — the brain's valence-assignment center, which evaluates whether something is positive or negative and how rewarding or aversive it is. This is the region deciding whether the face is sad or hungry or desirous, not just "emotionally activated."

The insula — interoception and embodied emotion. The insula is where emotional experiences acquire their physical texture. When you feel a pang of grief watching a scene, the insula is involved in generating the bodily component of that feeling. Its activation during the Kuleshov effect suggests that the projected emotion isn't purely cognitive — it has a somatic quality that makes it feel genuine rather than inferred.

The cuneus and precuneus — visual processing and mental imagery, including self-referential processing and the construction of imagined perspectives.

What this neural map reveals is that the Kuleshov effect isn't a simple cognitive trick. It's a full-stack emotional experience, engaging memory, valuation, embodied feeling, and perspective-taking simultaneously. The neurocinematic evidence shows that viewers aren't just reasoning about what the character feels — they're generating a miniature version of that emotion in themselves, then attributing it outward.

This is a remarkable finding. It means that when the Kuleshov effect works — when you watch Cary Grant's face and feel his suspicion, or watch a character look out a window and infer their longing — your brain is not passively decoding a signal. You are co-authoring the performance in real time, using your own emotional architecture as the generative instrument.

The POV Structure and Why Reversal Rewrites Everything

The face-scene-face pattern is one of cinema's most powerful structures precisely because it's so directional. Order matters enormously. Let's be concrete about why.

Version A: Face (neutral) → Object → Face. Interpretation: The character is responding to the object. The object explains the face.

Version B: Object → Face (neutral) → Object. Interpretation: The character and the object share a scene; the meaning is more ambiguous, more environmental.

Version C: Face (happy) → Object (coffin) → Face. Interpretation: The character might be feeling happy despite the coffin, or pleased by it, or the juxtaposition creates irony or menace.

Hitchcock understood this viscerally. In his famous explanation of the effect to CBC's Fletcher Markle, he demonstrated it using himself as the subject: a close-up of him squinting, then footage of a woman with a baby, then a close-up of him smiling. The audience reads "kind old man." Swap the baby footage for a woman in a bikini: "dirty old man." Same squint, same smile. Entirely different moral character. The shots of his face hadn't changed; only the shot placed between them had.

This is what Hitchcock meant when he talked about "pure cinema" — the creation of meaning through editing alone, without relying on performance or dialogue. The director's most powerful tool, he argued, was not what an actor could express, but what the audience would infer from sequential images.

The practical lesson for editors is this: whenever you're choosing where to cut to a character's face, you're not just capturing a reaction — you're constructing an argument about what that character thinks and feels. And whenever you're choosing what to cut to after a face, you're directing the inference engine of every viewer in your audience. The face hasn't changed. You have changed what it means.

The Spatial Plausibility Constraint

One of the subtler findings in the experimental literature is that the Kuleshov effect is not unconditional. It doesn't work equally for all types of inserted images. Specifically, it appears to be strongest — and most cleanly described as the "point-of-view effect" — when viewers believe the character could plausibly be looking at the intercut image.

This is the spatial plausibility constraint, and it has significant practical implications. Suppose the scene has already placed a mountain range off to screen left. If you cut from a character looking left at a 30-degree angle to footage of the mountains, the viewer's causality engine constructs a POV relationship: the character is looking at the mountains. If you cut from the same character looking right to the same footage, the spatial mismatch may be enough to disrupt the automatic inference, or at least slow it down.

This is one of the reasons continuity editing developed matching eyelines as a core principle — the 180-degree rule, the concern with consistent screen direction, the careful choreography of where characters look. These rules aren't arbitrary grammar; they're calibrated to maximize the spatial plausibility of the POV relationship, which in turn maximizes the Kuleshov effect. When we look at Hollywood's continuity system in the next section, we'll see that many of its conventions can be understood as attempts to make point-of-view inference as automatic and powerful as possible.

Conversely, when directors violate spatial plausibility intentionally — cutting to an image that couldn't be what the character is looking at — the effect doesn't disappear; it shifts register. The viewer now infers a metaphorical rather than literal relationship between the shots. This is the territory of associative montage, which Soviet filmmakers would explore aggressively: cut from a character to a machine, and the relationship becomes thematic or symbolic. The causality engine doesn't turn off; it reclassifies the link from "they are looking at this" to "this image is about them somehow."
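The three outcomes described above (a literal POV reading, a disrupted reading, and a metaphorical reading) can be written down as a small decision rule. This is a deliberately crude sketch. The names `FaceShot`, `read_cut`, and `scene_layout` are hypothetical, and it assumes an earlier shot has already established which side of the frame each object sits on. Real viewers weigh far more cues than screen direction.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FaceShot:
    character: str
    gaze: str  # "left" or "right": the direction of the look on screen


def read_cut(face: FaceShot, insert: str, scene_layout: Dict[str, str]) -> str:
    """Guess how a viewer links a face shot to the shot that follows it.

    scene_layout maps each object already established in the scene to the
    side of the frame it occupies. This is a toy rule, not a validated
    model of perception.
    """
    side: Optional[str] = scene_layout.get(insert)
    if side is None:
        # The insert is not part of the scene's space, so the link is
        # read as thematic or symbolic rather than literal.
        return "associative"
    if side == face.gaze:
        # The eyeline matches the object's position: a plausible POV.
        return "pov"
    # The object is in the scene, but the character looks the wrong way.
    return "disrupted"


if __name__ == "__main__":
    layout = {"mountains": "left"}
    print(read_cut(FaceShot("hiker", "left"), "mountains", layout))
    print(read_cut(FaceShot("hiker", "right"), "mountains", layout))
    print(read_cut(FaceShot("hiker", "left"), "machine", layout))
```

The point of writing it this way is that the rule never returns "no relationship." Every branch assigns some link between the shots, which mirrors the claim that the causality engine reclassifies a cut rather than switching off.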

Meaning Lives Between Shots

It's worth pausing to really sit with the core philosophical implication of all this, because it's stranger than it first appears.

In almost every other art form, meaning lives within the artwork. The paint is on the canvas. The notes are in the score. The words are on the page. A filmmaker or editor can certainly create meaning within a shot — through composition, lighting, performance, color, lens choice — but the Kuleshov effect reveals that cinema has access to a unique additional dimension: meaning that exists between shots, in the inferential gap, generated not by the filmmaker but by the viewer's own cognitive architecture.

This is what Kuleshov meant when he argued that it was not the content of images that mattered in cinema, but their combination. His theoretical position, as documented in the foundational film literature, was that raw footage was essentially pre-material — fragments awaiting assembly into meaning. The same shot of a face means nothing definite. The same shot of a bowl of soup means nothing definite. Put them together, and you have hunger. Put different elements together, and you have grief.

This is a genuinely radical position, and it has never been entirely superseded. Every time a modern editor makes an associative cut — every time they leave the logic of chronological cause-and-effect to juxtapose something unexpected — they are working in the space Kuleshov identified. The viewer's meaning-making machinery will construct a relationship. The editor's job is to engineer which relationship gets constructed.

The implications extend beyond the obvious. Consider: every shot in a film is, to some degree, "about" the shot that comes before and after it. A beautiful sunset, standing alone, is beautiful. A beautiful sunset following a scene of betrayal becomes ironic. Following a reunion, it becomes affirmation. Following a death, it becomes transcendence or indifference, depending on context. The sunset has not changed. The edit has changed its meaning completely.

The Efficiency Argument

There's also a purely practical dimension to why juxtaposition beats extended performance: it's faster, and in most cases, more emotionally effective.

Imagine you want to communicate that a character is worried about their elderly parent. You could write a scene in which the character explicitly says they're worried, or have another character ask about it. You could shoot five minutes of performance in which the actor conveys worry through behavior. Or you could cut from the character's face, held for three seconds and left ambiguous, to a shot of an empty chair at a dinner table, then back to the face. A few seconds of footage. Zero dialogue. The viewer constructs the entire emotional meaning, draws on their own experiences of absence and anxiety, and likely feels it more personally than anything an actor could have been directed to perform.

This is the efficiency argument for juxtaposition, and it explains why cutting is not merely a logistical necessity of filmmaking (we couldn't possibly shoot everything in one continuous take) but an expressive advantage. The viewer's inferential machinery does work that no performance can replicate, because the inference is literally the viewer's own emotional material. The editor creates the occasion; the audience creates the experience.

This is also why juxtaposition can fail so badly when it goes wrong. If the relationship between shots is unclear, the viewer's causality engine still fires — it just constructs the wrong inference, or gives up and experiences confusion instead of meaning. The editor who lays down an associative cut that doesn't land isn't just making a puzzling choice; they're actively misfiring a neural mechanism. The audience isn't neutral when a cut doesn't work. They feel the wrongness of it, even if they can't articulate why.

Practical Implications: Every Cut Is an Argument

Let me try to translate all of this into something actionable for anyone who actually sits at an editing timeline.

The Kuleshov effect teaches that you are never simply "connecting" two shots. You are making an argument about their relationship. Sometimes that argument is causal ("this caused that"), sometimes emotional ("this character feels this way about that thing"), sometimes thematic ("these two things belong to the same idea"), sometimes ironic ("these two things contradict each other productively"). But there is always an argument, whether you intend one or not.

This means that the passive default — cutting from one shot to the next because that's what happened next in the script — is still making an argument, just probably an unexamined one. The character looked left, so we cut to what's on the left. Maybe that's exactly right. Maybe it's an opportunity missed. The question to ask at every cut is not "does this flow smoothly?" but "what relationship am I asserting, and is it the relationship I want to assert?"

It also means that the face is one of the most powerful tools in an editor's kit, precisely because of its malleability. A held face, in the right context, doesn't need to be doing anything to be doing everything. Some of the most affecting moments in great cinema are actors sitting still while the edit generates the emotional experience around them — Buster Keaton's studied blankness, the famous close-ups of Falconetti in The Passion of Joan of Arc, the long observational shots of faces in slow cinema. These moments work because the viewer's meaning-making mechanism, given a face and sufficient context, will fill the silence with something that feels utterly personal and true.

What Kuleshov discovered in a Moscow film workshop a century ago, and what neuroscientists have now confirmed with fMRI scanners, is that the audience is not a passive receiver of filmic information. They are co-creators of meaning, equipped with a sophisticated causality-inference engine that automatically builds relationships between sequential images and generates emotional experience from the gaps between shots. The neuroimaging evidence identifies exactly which neural systems participate in this construction — memory, valuation, embodied emotion, perspective-taking — and what it reveals is that a well-made cut doesn't just communicate an idea. It activates the viewer as a meaning-making participant.

Every cut is an act of applied neuroscience. And the Kuleshov effect is the foundational proof.
