Since 3:2 pulldown is an anomaly found only in NTSC TV, this document concerns NTSC video only. Europe and some other parts of the world use PAL TV, which does not have 3:2 pulldown. This document therefore assumes the NTSC TV standard and, for the most part, ignores the PAL TV standard.
Some time ago, the film industry decided that the best framerate to film at was 24 frames per second (FPS). Sure, more frames per second would show smoother movement, but would use up more film and that would be costly. Fewer frames per second would still show movement, but not smoothly enough. So that was that. 24 FPS it would be, and that is the framerate that movies are still shot at today. Animated cartoons followed the same protocol.
Then television came along. In the United States, the framerate was 30 FPS, or more specifically 60 fields per second. A field is half of a frame: every other horizontal line of the frame. So there are even fields and odd fields, and they display alternately 60 times per second. It makes for very smooth motion for live TV and video-sourced material. But broadcasting film-sourced material presents a challenge. 3:2 pulldown is the answer to that challenge.
When color TV came along, the rate was lowered slightly, by about 0.1%, to add in the color info (30 FPS became 30000/1001 FPS). So 30 FPS was changed to roughly 29.97 FPS. But I will continue to refer to it as 30 FPS for the remainder of this webpage. Incidentally, 24 FPS film for TV broadcast changed in the same way to 24000/1001 FPS, which is roughly 23.976 FPS.
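The rate arithmetic is easy to check with exact fractions. The sketch below assumes the commonly cited exact values of 30000/1001 and 24000/1001; the variable names are only illustrative.

```python
from fractions import Fraction

# Exact NTSC color rates: the nominal rate scaled by 1000/1001.
ntsc_frame_rate = Fraction(30000, 1001)
ntsc_field_rate = 2 * ntsc_frame_rate
film_on_ntsc = Fraction(24000, 1001)

print(round(float(ntsc_frame_rate), 3))  # 29.97
print(round(float(ntsc_field_rate), 3))  # 59.94
print(round(float(film_on_ntsc), 3))     # 23.976

# The 5:4 ratio between video frames and film frames survives the slowdown.
print(ntsc_frame_rate / film_on_ntsc)    # 5/4
```

Because both rates are scaled by the same factor, five video frames still correspond to exactly four film frames, which is why rounding to 30 and 24 is harmless for the rest of the discussion.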
Just how do you display 24 frames per second on TV at 60 fields per second? There are several methods that can get it done, but they all have their drawbacks. First of all, you could just speed up the film to 30 FPS. That would surely get it done, but it would be too fast. Everyone would talk like chipmunks. Better would be to repeat every fourth frame, so that 24 frames would stretch out to 30 frames. This would get it done, but cause a sort of rhythmic stuttering. The solution was to weave the frames together by splitting them into fields and displaying the fields in a certain pattern.
Take for example the following four frames:
The four frames above represent 1/6th of a second. In other words, these are four out of 24 frames which would display each second. If these four frames were to be shown in a theater they would be shown just as they are at 24 frames per second. This is what I call "progressive frames".
PROGRESSIVE FRAMES
In order to show them on TV, one frame out of every four needs to be repeated. This would stretch 24 FPS to 30 FPS. But instead of simply repeating one frame, one field from two different frames is repeated. The pattern looks like this. Keep in mind the captures below show both even and odd fields combined in the same frame. On a TV, the even and odd fields would be displayed alternately.
3:2 PULLDOWN
Here is what is happening. The top and bottom (also called even and odd) fields of frame A are shown, then the top and bottom fields of frame B. Next the top field of frame B is shown a second time, followed by the bottom field of frame C. Then the top field of frame C is shown, followed by the bottom field of frame D. Finally the top and bottom fields of frame D are shown. So looking back, frames B and D each have a repeating field (the top one for B and the bottom one for D). Frames C and D are each displayed bottom field first (does it matter?). The end result is four frames stretched into five frames by cleverly weaving their fields together. It's a smooth transition as well. The pattern looks like this: AABBBCCDDD. This is where 3:2 pulldown gets its name: three progressive frames followed by two weaved (sometimes called interlaced) frames, or two fields followed by three fields.
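The field order described above can be written out as a small sketch. The function names `pulldown_32` and `weave` are made up for illustration, and frames are represented by single letters rather than real images.

```python
def pulldown_32(frames):
    """Spread groups of four film frames over ten video fields.

    Each field is a (frame, parity) pair, where parity is "top" or "bottom".
    The cadence follows the description above: B repeats its top field
    and D repeats its bottom field.
    """
    if len(frames) % 4 != 0:
        raise ValueError("this sketch expects a multiple of four frames")
    fields = []
    for i in range(0, len(frames), 4):
        a, b, c, d = frames[i:i + 4]
        fields += [
            (a, "top"), (a, "bottom"),
            (b, "top"), (b, "bottom"),
            (b, "top"), (c, "bottom"),
            (c, "top"), (d, "bottom"),
            (d, "top"), (d, "bottom"),
        ]
    return fields


def weave(fields):
    """Pair consecutive fields into video frames, labelled top source + bottom source."""
    return [fields[i][0] + fields[i + 1][0] for i in range(0, len(fields), 2)]


fields = pulldown_32(["A", "B", "C", "D"])
print("".join(frame for frame, _ in fields))  # AABBBCCDDD
print(weave(fields))  # ['AA', 'BB', 'BC', 'CD', 'DD']
```

The second line of output shows the five video frames: AA, BB and DD are progressive, while BC and CD are the two weaved frames.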
Here is a closeup of weaved frames BC and CD. Keep in mind that your eye does not see the combed/weaved frames the way you see them displayed here. The eye may notice a split second of weaving between B and C and between C and D, but the brain still sees the complete C frame.
Laserdiscs and VHS tapes (and of course broadcast TV) are basically raw video, so they have to have 30 FPS video. This means that 3:2 pulldown must be applied to any film-based material before writing it to video. This process is sometimes called "telecining". DVDs on the other hand can have progressive 24 FPS material. DVD players apply the 3:2 pulldown treatment on the fly as the disc is being played. This is perfect. All frame data is there on the disc and any frame paused will show the complete progressive frame, not two fields weaved together.
When converting a laserdisc or VHS tape, a decision must be made concerning 3:2 pulldown. You could just leave it at 30 FPS, complete with the weaved (interlaced) fields. There is nothing wrong with this. The only drawbacks are that more space will be consumed on the DVD, and you will not be able to advance frame by frame to study the animation. But at least all of the frame data is there and your eye will see it just fine when you play it.
Reversing the pulldown pattern (sometimes called "inverse telecining") is usually not too difficult. The problems happen when the pattern is disrupted. The pattern usually changes at commercial breaks or disc/tape swaps. Other times the pattern gets disrupted when the original material was edited using video hardware prior to being broadcast or made into a VHS or laserdisc. One of my favorite shows (The PJs) does this. And sometimes, a show will be a hybrid of both video and film material. I won't comment on hybrid material for now.
The disrupted pattern can be overcome by clever scripting. I use a tool called AviSynth, which is basically a scripting language that gives great control over just about every aspect of the video. My manual_IVTC function has evolved over time to a state of near perfection. Basically, it adds one to four frames at the beginning so that the 3:2 pattern lines up correctly. Then it restores progressive frames by reassembling the C frame from the two fields. Lastly, it deletes the extra frames that were added at the beginning. The result is perfect 3:2 pulldown reversal with audio perfectly in sync. In fact, the audio is unchanged.
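To make those three steps concrete, here is a toy sketch in Python. It is not the manual_IVTC function and it is not AviSynth code; the name `inverse_telecine`, the `phase` parameter and the `PAD` placeholder are all assumptions made for illustration. Each video frame is just a (top, bottom) pair of source letters, and the cadence is assumed to be known and unbroken.

```python
PAD = ("?", "?")  # placeholder frame used only to line up the cadence


def inverse_telecine(video_frames, phase=0):
    """Recover film frames from (top, bottom) video frames.

    `phase` is the position (0-4) of the first video frame within the
    five-frame cadence AA BB BC CD DD. A trailing group with fewer than
    five frames is ignored in this sketch.
    """
    # Step 1: pad the start so every group of five begins on the A frame.
    padded = [PAD] * phase + list(video_frames)
    film = []
    # Step 2: rebuild four film frames from each complete group of five.
    for i in range(0, len(padded) - 4, 5):
        a, b, bc, cd, d = padded[i:i + 5]
        c = (cd[0], bc[1])  # top field of CD plus bottom field of BC
        film += [a, b, c, d]
    # Step 3: drop any film frame that was built from padding.
    return [f for f in film if "?" not in f]


video = [("A", "A"), ("B", "B"), ("B", "C"), ("C", "D"), ("D", "D")]
print(inverse_telecine(video))
# [('A', 'A'), ('B', 'B'), ('C', 'C'), ('D', 'D')]
```

If a clip starts partway through the cadence, for example on the BC frame, passing `phase=2` pads two placeholder frames in front so the grouping lines up, and those placeholders are removed again at the end.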
Some people think it's clever to blend the fields, but please don't do that. Why would anybody want to do that? If you do not know how to reverse 3:2 pulldown and don't know what else to do, then please leave it unchanged. At least then the progressive frames can be recovered. Please do not EVER blend the fields. This is what happens when you blend fields:
BLENDED FIELDS. DON'T DO THIS!
If I used lowercase letters to illustrate blended fields, the pattern would be like this: AABBbccdDD. 40% of the fields being shown are blurred, and the C frame cannot be recovered. Furthermore, your brain won't see the complete C frame, only a blurred mess. Blending fields causes extremely blurry movement and permanent loss of data. Please do not ever do this. Here is a closeup of blended frames BC and CD.
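A tiny numeric sketch shows why blending cannot be undone. It assumes a simple 50/50 average of two fields (real blending filters may weight things differently), and the brightness values are invented for the example.

```python
def blend(field_x, field_y):
    """Average two fields pixel by pixel (assumed 50/50 blend)."""
    return [(x + y) / 2 for x, y in zip(field_x, field_y)]


# Hypothetical one-line "fields" holding brightness values from 0 to 255.
b_field = [200, 200, 50, 50]
c_field = [50, 50, 200, 200]
print(blend(b_field, c_field))  # [125.0, 125.0, 125.0, 125.0]

# A completely different pair of fields produces the same blend, so the
# blended result alone cannot say what frame C originally looked like.
other_b = [125, 125, 125, 125]
other_c = [125, 125, 125, 125]
print(blend(other_b, other_c) == blend(b_field, c_field))  # True
```

With weaved fields, by contrast, every original line of C is still present somewhere in the video, which is what makes the reversal possible.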
Other times, people think it is necessary to deinterlace the video using an automated deinterlacing tool. This can cause even more serious damage to the video, and most certainly destroys the C frame! Furthermore, it brings with it the rhythmic stuttering movement. Ugh! For the sake of preserving classic animation, please just leave it alone. It would be better to use an automated inverse telecining tool such as Decomb.
What do you think? Let me know your thoughts.