Why Cuts Work: The Science and Psychology of Film Editing
Section 6 of 13

Soviet Film Montage Theory and Eisenstein

Soviet Montage Theory: Eisenstein, Pudovkin, and the Collision of Ideas

The Kuleshov effect wasn't some happy accident or laboratory curiosity. It was a deliberate discovery, born from systematic experimentation in a Moscow film workshop in the early 1920s. But Kuleshov's insight about how juxtaposition creates meaning was never meant to stand alone. It was the opening move in a much larger intellectual project: the attempt to build a complete, rigorous theory of editing as the fundamental language of cinema itself. The filmmakers who carried this project forward (Vsevolod Pudovkin, Sergei Eisenstein, and Dziga Vertov) took Kuleshov's empirical findings and pushed them into the realm of revolutionary ideology, artistic manifestos, and systematic philosophy. Working in post-revolutionary Moscow with scarce resources and an explicit mandate to reshape consciousness through cinema, they took the questions Kuleshov had answered as a starting point and demanded answers to deeper ones: What does juxtaposition do psychologically? Where exactly does meaning come from when two shots collide? Is editing merely the addition of images, or is it something more radical: a multiplication, a collision, a fusion that creates something entirely new? Pudovkin and Eisenstein in particular argued furiously with each other, and in many ways both were essentially right. What emerged was the first serious, systematic theory of editing in cinema's history.

Pudovkin's Constructive Montage: Bricks Building a Wall

Vsevolod Pudovkin represents the more accessible wing of Soviet montage theory—accessible not because his ideas are obvious, but because they extend from something you already intuitively understand. His foundational metaphor is almost comfortingly straightforward: montage is the art of building meaning from pieces, the way a mason builds a wall from bricks.

For Pudovkin, individual shots don't arrive with meaning pre-loaded. A shot of a face is raw material, not a statement. Meaning gets constructed in the editing—in the deliberate selection and sequencing of shots that guide the viewer's eye, attention, and emotional response toward a specific conclusion. Pudovkin laid out his theory in his 1926 book Film Technique, distinguishing five types of editing relationships: contrast, parallelism, symbolism, simultaneity, and leitmotif. Each describes a different way shots can be deliberately selected and arranged to produce a meaning greater than any individual image.

The key word in Pudovkin's system is linkage. Shots link together. They chain. Each one leads to the next in a causal or associative sequence that the editor constructs, leading the audience through an experience that feels continuous and directed even when assembled from pieces shot at different times in different places. Think of it like a sentence: words have individual meanings, but the sentence creates something that depends entirely on their combination and order. For Pudovkin, editing is fundamentally syntactic—it's about well-formed sequences that communicate clearly.

This makes Pudovkin's theory deeply compatible with narrative filmmaking. His approach explains how you build a scene, how you guide attention, how you create suspense by intercutting between events happening simultaneously in different locations. The 180-degree rule, eyeline matching, the logic of screen direction—these aren't Pudovkin's inventions, but they fit naturally within his constructive framework. You're building an experience for the viewer, brick by careful brick.

Pudovkin was also a brilliant practical filmmaker. Mother (1926) and The End of St. Petersburg (1927) demonstrate his approach beautifully: scenes constructed through precise shot selection, emotional states built gradually through accumulated images, the craft remaining invisible in service of the emotional result. Watch the sequence in Mother where the protagonist's son is arrested: Pudovkin doesn't cut to emphasize conflict or create cognitive dissonance. He cuts to guide attention, to show you what you need to see, to build the scene's emotional logic step by step.

The Pudovkin model is, in essence, what most editors do most of the time. It's the foundation of the Hollywood continuity system, even if Hollywood never called it by his name.

Eisenstein's Collision: The Third Thing That Belongs to Neither

And then there's Eisenstein.

Sergei Eisenstein agreed with Pudovkin on the essentials: montage was the essential act of cinema, and meaning was created in the cut, not in the individual shot. But he disagreed, emphatically and with characteristic intensity, about nearly everything else. Where Pudovkin saw linkage, Eisenstein saw collision. Where Pudovkin was building walls, Eisenstein was detonating them.

Here's the core of Eisenstein's theory: when you place two shots next to each other, you don't get the sum of those shots. You get something entirely new—a meaning that belongs to neither shot individually but is generated by their conflict. His model is explicitly dialectical, drawn from Hegel and Marx: thesis and antithesis produce synthesis. The synthesis is not thesis plus antithesis. It is a third thing.

Eisenstein found his theoretical model in an unexpected place: Japanese kanji. As he described in his 1929 essay "The Cinematographic Principle and the Ideogram", the Japanese written language combines simple pictographic elements to create meanings that transcend their components. The character for "eye" plus the character for "water" produces not "wet eye" or "watery eye" but "weeping." The character for "dog" plus the character for "mouth" produces "barking." Neither component contains the combined meaning—it emerges from their juxtaposition. For Eisenstein, a film cut worked exactly this way.

This is a more radical claim than it first appears. Pudovkin is saying: shots, carefully chosen and arranged, communicate the director's meaning to the audience. Eisenstein is saying something different: shots, placed in conflict, generate meaning that the audience constructs—a meaning that may be more complex, more ambiguous, more philosophically rich than any image could contain alone. The audience isn't being led to a conclusion; they're being forced into an act of creative synthesis.

graph TD
    A[Pudovkin: Linkage] --> B["Shot A + Shot B = A + B"]
    B --> C[Meaning is additive — shots build on each other]
    D[Eisenstein: Collision] --> E["Shot A × Shot B = C"]
    E --> F[Meaning is generative — a third idea emerges]
    G[Kuleshov Foundation] --> A
    G --> D
    C --> I[Practical result: Hollywood continuity editing]
    F --> J[Practical result: intellectual montage, Godard, music video, experimental film]

This also explains why Eisenstein's films feel different from Pudovkin's—more aggressive, more demanding, sometimes exhausting. Eisenstein isn't guiding you; he's assaulting your cognition with juxtapositions that force you to do work. A Pudovkin film builds a room and invites you in. An Eisenstein film throws bricks at you and expects you to understand what kind of building they might have formed.

The Five Methods: Eisenstein's Taxonomy of Montage

What makes Eisenstein particularly valuable as a theorist is that he didn't stop at the general principle. He went on to categorize the ways that collision could work—what variables a filmmaker could manipulate to create different kinds of meaning and different emotional effects. His taxonomy of five montage types, developed across several essays in the late 1920s, remains the most systematic attempt to describe what editors actually do when they cut.

Understanding these five types isn't just intellectual history. As we'll see, they map almost perfectly onto the psychological mechanisms we've been discussing throughout this course—the brain's systems for processing temporal pattern, motion, emotional tone, and abstract meaning.

Metric Montage: Rhythm as Pure Mathematics

Metric montage is the simplest and most extreme form: cutting at fixed intervals, regardless of what's happening in the frame. The length of each shot is determined by an absolute rhythmic beat, like a musical measure. Content is subordinate to temporal structure.

Eisenstein understood something fundamental about human perception here. The brain's temporal pattern-detection is one of its most primitive and most powerful systems. When the brain detects a regular rhythm, it anticipates the next beat. When the cut arrives exactly on that beat, there's a small satisfaction—a confirmation. When the rhythm accelerates (shorter and shorter intervals), the corresponding sense is urgency, mounting pressure, physiological arousal. The content of the individual shots barely matters; the body is responding to the temporal mathematics itself.
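To make the arithmetic concrete, here is a minimal Python sketch of the two patterns just described: a fixed beat and an accelerating one. Everything in it is an illustrative assumption rather than anything Eisenstein specified. The function names, the 24 frames-per-second timeline, the halving ratio, and the minimum shot length are all hypothetical choices.

```python
def metric_cuts(total_frames, shot_frames):
    """Cut points (in frames) when every shot has the same fixed length."""
    return list(range(shot_frames, total_frames, shot_frames))


def accelerating_cuts(total_frames, first_shot_frames, ratio=0.5, min_shot_frames=6):
    """Cut points when each shot is shorter than the last (hypothetical scheme).

    Each shot length is the previous one multiplied by `ratio`, but never
    shorter than `min_shot_frames`.
    """
    cuts = []
    position = 0
    length = first_shot_frames
    while position + length < total_frames:
        position += length
        cuts.append(position)
        length = max(int(length * ratio), min_shot_frames)
    return cuts


# Assumed timeline: 24 frames per second.
print(metric_cuts(96, 24))                     # [24, 48, 72]
print(accelerating_cuts(240, 96, 0.5, 6)[:5])  # [96, 144, 168, 180, 186]
```

Notice that neither function looks at what is in the shots. That is the defining feature of metric montage: the schedule is set by the clock alone.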

You've experienced metric montage in places you might not have expected. Modern action film trailers often operate on pure metric rhythm: cuts synchronized to a musical beat, independent of what's being shown. The effect is excitement, momentum, pulse. The meaning is almost entirely in the rhythm. Research on temporal pattern perception suggests that regular rhythms engage anticipatory neural circuits: a beat that arrives as predicted is experienced as mildly rewarding, while a rhythm that is violated or accelerated raises arousal. In that sense, metric montage is playing with your autonomic nervous system.

The limitation Eisenstein himself acknowledged: metric montage is blunt. It produces excitement or urgency, but it's not very precise about what kind, and it can numb the audience if used for too long without variation. He was already thinking about how to layer additional dimensions onto the base rhythm.

Rhythmic Montage: Rhythm as Visual Choreography

Rhythmic montage introduces a second variable: the movement within the frame. The cut doesn't happen at an absolute beat but in relationship to the visual rhythm created by what's moving inside each shot—how fast, in what direction, with what force.

Here's the key insight: a cut can feel rhythmically right or wrong depending on whether the visual motion in one shot resolves harmoniously with the motion in the next, or conflicts with it. An object moving right in frame A, cut to an object moving left in frame B, creates a sense of visual collision—your brain registered momentum in one direction and was abruptly confronted with the reverse. An object moving right, cut to an object moving right, creates visual continuity even across a spatial discontinuity.
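The directional logic above can be sketched in a few lines of Python. This is a deliberately crude model, and its parts are assumptions made for illustration: the function name, the idea of reducing each shot to a single signed horizontal velocity, and the "static" label for shots without lateral motion are all hypothetical.

```python
def classify_cut(outgoing_dx, incoming_dx):
    """Classify a cut by the horizontal screen motion on either side of it.

    Each argument is a signed velocity: positive means rightward motion,
    negative means leftward, zero means no lateral motion.
    """
    if outgoing_dx == 0 or incoming_dx == 0:
        return "static"       # at least one shot has no lateral motion
    if (outgoing_dx > 0) == (incoming_dx > 0):
        return "continuity"   # momentum carries across the cut
    return "collision"        # momentum is abruptly reversed


print(classify_cut(1.0, -2.0))  # collision
print(classify_cut(1.0, 3.0))   # continuity
```

A real shot contains many moving elements at once, so an editor's judgment is far richer than one number per shot. The sketch only isolates the single variable this paragraph describes.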

Eisenstein's masterwork demonstration of rhythmic montage is the Odessa Steps sequence in Battleship Potemkin (1925)—perhaps the most analyzed sequence in film history, and for good reason. The sequence shows Tsarist soldiers massacring civilians on the famous Odessa staircase, and Eisenstein uses rhythmic montage to create an overwhelming sense of mechanized brutality versus panicked human chaos.

Notice what he does with directional movement. The soldiers march steadily downward, frame left to right and top to bottom: organized, mechanical, inevitable. The civilians flee in every direction: chaotic, panicked, unpredictable. When Eisenstein cuts between these, the visual rhythms clash. But then he does something more subtle: he introduces shots that break from the prevailing motion. The mother carrying her dead child climbs against the flow of the crowd; the baby carriage begins an agonizingly slow roll down the steps. The contrast between the kinetic chaos and these slowed, isolated movements creates visual emphasis that no amount of cutting to action could produce. Film scholars have noted that the Odessa Steps sequence's power comes precisely from Eisenstein's manipulation of movement rhythm, not just cutting speed.

Rhythmic montage is the type most editors work with intuitively. When you feel that a cut is "off," it's often because the visual rhythms of the outgoing and incoming shots are fighting each other. When a cut feels "clean," it's often because you've found a moment where the rhythms align or resolve in a way that feels satisfying. Eisenstein's contribution was to make this explicit and systematic.

Tonal Montage: The Emotional Key of a Shot

Tonal montage operates on a different axis entirely—not time or movement, but what Eisenstein called the dominant emotional "tone" of a shot. This includes light and shadow, the degree of contrast, the sense of stillness or activity, warmth or coldness, density of visual information, and other qualities that give a shot an emotional temperature independent of what's being depicted.

The key idea is that shots can be cut together not to create narrative continuity or visual rhythm, but to create tonal contrast or consonance—to establish an emotional key, modulate it, and resolve it in a way that shapes the audience's feeling without their necessarily being able to articulate why.

Think of a sequence cut entirely in low-contrast, hazy shots with soft movement: it produces a dreamy, uncertain quality regardless of what's being shown. Cut to a shot with hard shadows and strong contrast, and the emotional shift is immediate—something has changed, something is sharper, more threatening. Eisenstein was describing what we'd now call the affective dimension of editing: how tone influences emotional state independent of semantic content.

Neuroscience research on visual affect suggests that low-level visual properties (contrast, brightness, spatial frequency, motion coherence) can influence emotional processing before the content of an image is consciously recognized. On this account, your amygdala is responding to the texture of the image before your cortex has worked out what the image is of. Tonal montage is, in Eisenstein's framework, the art of editing to those pre-semantic emotional responses.
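Two of the low-level properties named above, brightness and contrast, are simple enough to measure directly. The Python sketch below treats a shot as a flat list of grayscale pixel values from 0 to 255 and compares a hazy shot with a hard-shadowed one. The function names and the sample pixel values are invented for illustration, and mean and standard deviation are only rough stand-ins for what Eisenstein meant by tone.

```python
from statistics import mean, pstdev


def tone(pixels):
    """Return (brightness, contrast) for a flat list of 0-255 grayscale values.

    Brightness is the mean pixel value; contrast is the population standard
    deviation, a common rough measure of how far values spread from the mean.
    """
    return mean(pixels), pstdev(pixels)


def tonal_shift(shot_a, shot_b):
    """Change in (brightness, contrast) when cutting from shot_a to shot_b."""
    brightness_a, contrast_a = tone(shot_a)
    brightness_b, contrast_b = tone(shot_b)
    return brightness_b - brightness_a, contrast_b - contrast_a


# Hypothetical sample data: a soft, low-contrast shot and a hard-shadowed one.
hazy = [110, 120, 125, 130, 128, 118]
hard = [5, 250, 10, 245, 0, 255]

brightness_change, contrast_change = tonal_shift(hazy, hard)
print(contrast_change > 0)  # True: the cut jumps to a much harsher image
```

The two sample shots have similar average brightness, yet the contrast jumps sharply across the cut. That is the kind of shift a viewer can feel without being able to name it.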

This is the hardest of the five types to teach directly, because it requires a sensitivity to shot quality that goes beyond content analysis. It's why experienced editors talk about shots "feeling wrong" even when they can't immediately explain why—they're detecting tonal mismatches that create emotional dissonance even when the narrative logic is sound.

Overtonal Montage: The Full Chord

Overtonal montage is Eisenstein's synthesis of the previous three: a cutting strategy that operates simultaneously on metric rhythm, visual movement rhythm, and emotional tone to create a complex, layered effect he compared to the harmonics of musical sound.

When you hear a note played on a piano, you're not hearing a single pure frequency. You're hearing a fundamental tone plus a series of overtones—higher harmonics that give the note its particular color and resonance. A piano middle C and a violin middle C are the same pitch, but they sound different because their overtone structures are different. Eisenstein's argument is that truly masterful montage works the same way: the primary effect is the most obvious meaning, but it resonates with rhythmic and tonal overtones that give the sequence its specific emotional color.

The Odessa Steps sequence again: the primary effect is horror at the massacre. But the metric rhythm creates a sense of inevitability (regular, relentless). The rhythmic montage between organized military movement and chaotic civilian panic creates a sense of helplessness. The tonal contrast between the hard shadows and rigid ranks of the soldiers and the sunlit openness of the crowd creates a moral dimension. All three operate simultaneously, reinforcing each other, producing an experience that is more than the sum of its analytical parts. You can't fully account for how the sequence feels by describing any one of these dimensions alone.

Overtonal montage is where editing becomes, in Eisenstein's view, most like music—not because it's synchronized to sound, but because it creates layered, simultaneous structures that produce effects no single element could generate. This is also, incidentally, why it's so difficult to deconstruct masterfully edited sequences: you're always in danger of mistaking the overtone for the fundamental.

Intellectual Montage: Images That Think

The fifth type is where Eisenstein becomes genuinely radical—and genuinely controversial, both aesthetically and politically.

Intellectual montage is the use of images not to create narrative meaning or emotional response, but to construct abstract ideas. Not "I feel sad for this character" but "I understand a concept about capitalism" or "I see the relationship between two social forces." The images become ideograms in the sense Eisenstein found in Japanese kanji: raw material for thought, not depiction.

His most famous example comes from Strike (1925). During the sequence depicting the massacre of striking workers, Eisenstein cuts to footage of a bull being slaughtered in an abattoir. There is no bull in the story. The workers are not bulls. No one is claiming that the Tsarist police are butchers in any literal sense. But the juxtaposition produces a concept: these workers are being slaughtered like animals. The meaning is entirely a product of the collision—it exists nowhere in either image alone.

graph LR
    A["Shot: Workers being massacred"] --> C{"Intellectual Montage Cut"}
    B["Shot: Bull being slaughtered in abattoir"] --> C
    C --> D[Concept: The dehumanization of workers]
    D --> E[Political meaning transcending both images]
    style C fill:#8B0000,color:#fff
    style D fill:#2C3E50,color:#fff
    style E fill:#1a5276,color:#fff

This is a politically charged example, and deliberately so—intellectual montage, for Eisenstein, was explicitly a tool of ideological communication. But the technique extends beyond propaganda. Any time editing is used to construct a conceptual relationship between two things that aren't narratively connected, you're in the territory of intellectual montage. The classic horror film technique of cutting from a character's fear to an ambiguous stimulus to a predator—creating the concept of imminent threat even before threat has materialized—is a form of intellectual montage. Music video editing that cuts between a performer and a series of metaphorically resonant images is intellectual montage. The hard cut from a bone flying in the air to a spaceship in 2001: A Space Odyssey is perhaps the most famous single-cut intellectual montage in film history: the entire history of human technology, collapsed into a single splice.

Film scholars continue to debate the limits of intellectual montage, particularly how reliably audiences construct the intended abstract concept rather than something different, or something more emotional than conceptual. Eisenstein himself had significant failures: October (1927), which attempted sustained intellectual montage throughout an entire feature, bewildered audiences who didn't share his intellectual framework. The technique works when the collision is sharp enough to force synthesis and the two images are in the right kind of resonance with each other. When it fails, it just looks like a non sequitur.

The Odessa Steps: A Master Class

It's worth pausing to really sit with the Odessa Steps sequence, because it is one of those rare pieces of filmmaking that rewards detailed analysis at every level. It runs roughly six minutes in Battleship Potemkin, yet it depicts events that would have taken about two minutes in real time. Eisenstein's first manipulation is purely temporal: he expands duration through cutting.
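The size of that temporal stretch is easy to work out. The short Python sketch below uses only the two approximate durations given above; the function name is a hypothetical label for illustration.

```python
def expansion_factor(screen_seconds, story_seconds):
    """How many times longer an event lasts on screen than in the story world."""
    if story_seconds <= 0:
        raise ValueError("story_seconds must be positive")
    return screen_seconds / story_seconds


# Roughly six minutes of screen time for roughly two minutes of action.
print(expansion_factor(6 * 60, 2 * 60))  # 3.0
```

On these rough figures, every second of the massacre occupies about three seconds of screen time. A factor below 1.0 would describe the more common case, where cutting compresses time.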

The sequence has several distinct movements. It opens with civilians enjoying a peaceful afternoon on the steps—tonal montage establishing warmth, openness, community. The arrival of the soldiers introduces a visual and tonal shift: regular geometric formations, hard shadow, mechanical movement. Then the first shots are fired, and Eisenstein deploys rhythmic montage to make the soldiers' advance feel terrifyingly inevitable while the civilian panic feels impossibly fragmented and chaotic.

The baby carriage section is the emotional apex. A mother has been shot and releases her carriage at the top of the steps. The carriage begins its descent. Eisenstein crosscuts between the carriage rolling, the face of a horrified onlooker, and the faces of wounded civilians, in shots that become slightly longer as the tension builds: a rhythmic deceleration that paradoxically increases tension by making each moment feel more weighted. The carriage's descent is agonizingly slow, intercut with the massacre continuing around it, creating a moment of terrible focal clarity in the chaos. The sequence ends with the famous Cossack swinging his sword directly at the camera, a violation of the fourth wall that makes the violence intimate, personal, inescapable.

The entire sequence demonstrates all five montage types working simultaneously: the metric pulse of cutting speed, the rhythmic play of movement within and between shots, the tonal contrast between the innocent brightness of the crowd and the dark inevitability of the soldiers, the overtonal synthesis of these into a complex emotional experience, and periodic intellectual montage that creates conceptual statements about power, brutality, and helplessness.

What's remarkable, from a neuroscientific perspective, is how much of this works beneath conscious awareness. Most first-time viewers of the Odessa Steps sequence don't think "that's excellent rhythmic montage." They feel physically agitated, emotionally overwhelmed, perhaps nauseated or near tears. The analytical structures Eisenstein was operating are not visible on first viewing—but they are responsible for the effects. The craft is in the architecture, not the surface.

Dziga Vertov and the Kino-Eye: Editing as Revelation

No account of Soviet montage theory is complete without Dziga Vertov, who represents a third strand—more radical than Pudovkin, philosophically different from Eisenstein, and in some ways more prophetic of where cinema and editing would eventually go.

Vertov was not making fictional narrative films. He was making documentaries—or rather, he was making something that called into question the distinction between documentary and fiction, between reality and construction. His concept of the "Kino-Eye" (Kino-Glaz) held that the camera lens, freed from the limitations of human perception, could see truth that the naked eye could not. And the editing table, by assembling the camera's catches from different times and places, could reveal hidden connections in reality—relationships, patterns, truths that are real but invisible to unmediated observation.

Vertov's 1929 masterpiece Man with a Movie Camera is probably the most formally radical documentary ever made, and it remains astonishing nearly a century later. The film shows a day in the life of a Soviet city, a composite assembled from footage shot in several cities including Odessa, Kyiv, and Moscow, but it's not simply observational. Vertov uses every editing technique available to him (split screens, slow motion, reverse motion, freeze frames, superimpositions) to create a portrait of urban life as a symphony of interconnected rhythms. Workers, machines, traffic, faces, sports, entertainment: all cut together to reveal the underlying tempo of collective human existence.

Vertov's theory differs from Eisenstein's in a crucial way: Eisenstein imposed meaning on reality through the collision of images. Vertov believed editing discovered meaning that was already there in reality, waiting to be revealed by the right assemblage of shots. This isn't just an aesthetic distinction; it's a philosophical one about the relationship between cinema and truth. Vertov wrote extensively about the Kino-Eye as a way of seeing beyond ideology, beyond manipulation, into the material reality of the world.

Of course, the post-modern objection writes itself: every choice of what to film and how to assemble it is an act of authorship, not discovery. Vertov's "revelation" is as constructed as Eisenstein's "collision." But that tension—between the film that reveals and the film that constructs—is still at the center of documentary ethics and practice today. And Vertov's formal innovations in Man with a Movie Camera prefigure experimental film, music video, and essayistic documentary in ways that are still being worked out.

The specific editing rhythm of Man with a Movie Camera also anticipates something we now understand neurologically: the brain's capacity to extract pattern from rapidly presented, non-narrative stimuli. Vertov cuts at speeds that don't allow full semantic processing of individual shots, so the images register as impressions rather than thoughts. He's editing directly to the pre-attentive visual system, bypassing narrative cognition to create something that operates more like music than like story. Research on rapid serial visual presentation suggests that the brain can extract the gist of images shown for only a fraction of a second each, faster than viewers can reliably report what they saw. Vertov was discovering this empirically decades before the laboratory research caught up.

Why the Soviet Theorists Still Matter

It would be easy to dismiss Soviet montage theory as a historical artifact—the product of a specific political moment, useful for understanding early cinema but surpassed by more sophisticated approaches. This would be a mistake.

The categories Eisenstein developed—metric, rhythmic, tonal, overtonal, intellectual—remain the most precise vocabulary we have for describing what editing actually does and how it does it. Modern editors rarely use Eisenstein's specific terminology, but the underlying distinctions map cleanly onto the decisions editors make every day:

Should this cut happen at this exact beat, or should it happen slightly early or late? (Metric/rhythmic judgment.)

Does the movement in this shot resolve against the movement in the next? (Rhythmic judgment.)

Is the emotional temperature of these two shots compatible? (Tonal judgment.)

Am I creating an overall effect that is more than the sum of these individual shots? (Overtonal thinking.)

Am I trying to generate a concept that neither shot contains alone? (Intellectual montage thinking.)

Every one of these is a live question in a modern editing room, even if no one is invoking Eisenstein by name.

Contemporary research in cognitive film theory has largely vindicated the Soviet theorists' intuitions. The brain does process temporal rhythm, motion, and emotional tone as distinct dimensions of film experience. Juxtaposition does create meaning that transcends the component images. The construction of meaning from discontinuous shots does engage the same cognitive mechanisms that construct meaning from experience in general—the event segmentation and causal inference systems we discussed in earlier sections.

The Soviets got there first by necessity and intensity of focus. They were trying to build a new art form for a new society with no money and enormous ambition, and the pressure generated heat. Eisenstein in particular was not content with "this works"—he wanted to know why it works, wanted to build a system that could be taught, applied, extended. The system he built is imperfect and incomplete, but it remains indispensable.

There's also something worth honoring in the sheer ambition of the Soviet project. Eisenstein believed that intellectual montage could change how people thought—not just what they felt, but how they reasoned about the world. This turned out to be both his greatest achievement and his greatest failure: Battleship Potemkin remains perhaps the most emotionally and politically powerful film ever made, but the more explicitly ideological October and Old and New demonstrated the limits of intellectual montage for sustained philosophical argument. Film, it turned out, is extraordinarily good at creating emotional identification and tonal argument, somewhat good at creating intellectual concepts through juxtaposition, and quite limited at building the kind of sustained logical case that persuasion requires.

But that limitation is itself a lesson in the neuroscience of film. The brain watching cinema is primarily in an emotional and associative mode, not a propositional reasoning mode. Pudovkin and Eisenstein, through different routes, both discovered this truth empirically. The brain they were working with—the brain generating its own continuity from discontinuous images, constructing causality from juxtaposition, feeling rhythm in its autonomic nervous system—is the same brain we've been discussing throughout this course.

Which brings us to where these theories lead: to the Hollywood system that took Pudovkin's constructive approach and built an entire industrial grammar around it—a grammar of invisible editing that is, in its own way, just as sophisticated a manipulation of human perception as Eisenstein's most aggressive montage. The smooth continuity cut, as we'll see, turns out to exploit exactly the same perceptual mechanisms as the Kuleshov effect—just in service of concealment rather than collision.