Tuesday, April 10, 2012

Shrinking a Decade's Worth of Video Clips

My family just loves to shoot video clips with our digital cameras. I must admit, I love watching old videos of the kids when they were little. But storing all those videos is becoming a nightmare. The library has grown more than the kids!

I figured I could convert the videos to some other format and save some space. I was thinking there had to be a reason why AVI isn't used for streaming web videos. I poked around on the web, and after some trial and error, came up with the following command pattern using ffmpeg.

user@computer:~/Videos$ ffmpeg -threads 0 -i movie001.AVI -b 1500K \
  -vcodec libx264 -vpre slow movie001.mp4

As it turns out, all of my cameras are really bad at on-the-fly video compression. As an example, I have a 33-second, 61MB video clip. After transcoding to mp4 it's 6.4MB, roughly a tenth of the original size. The new clip looks as good as the original. This is going to solve all my video storage problems.
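
As a rough sanity check on those numbers, here is the arithmetic as a small Java sketch. The class and method names are made up for illustration, and I'm assuming that 1500K means 1,500,000 bits per second and that a MB is 1,000,000 bytes. The video stream alone should then come to a little over 6MB for a 33-second clip, with audio and container overhead making up the rest.

```java
/** Back-of-the-envelope size estimate for a constant-bitrate video stream. */
final class BitrateEstimate {

    /** Bytes of video data produced at the given bitrate over the given duration. */
    static long videoBytes(long bitsPerSecond, long seconds) {
        return bitsPerSecond * seconds / 8;
    }

    /** Percentage of space saved going from originalSize to newSize (same units). */
    static double percentSaved(double originalSize, double newSize) {
        return 100.0 * (1.0 - newSize / originalSize);
    }

    public static void main(String[] args) {
        // Assumption: -b 1500K is 1,500,000 bits per second.
        long bytes = videoBytes(1_500_000L, 33);
        System.out.printf("video stream: about %.1f MB%n", bytes / 1_000_000.0);
        System.out.printf("space saved: about %.0f%%%n", percentSaved(61.0, 6.4));
    }
}
```

The estimate lands close to the 6.4MB I actually measured, which suggests the bitrate setting is what drives the file size.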

Okay, so I am motivated, and I have a command that will do what I want. But I need to do this for hundreds of video clips, and my bash programming stinks. I am pretty good with one-liners. Throw in looping and awk or sed and I'm lost.

What's a Java programmer to do?

Java isn't the best scripting language. Perl and Python are great for scripting, but I use them so seldom that I am constantly looking up simple stuff. I like BeanShell, but it lacks polish for command-line programming. So how about Groovy?

Groovy syntax matches Java almost exactly (better than BeanShell does) and adds Ruby-esque extensions and code-scrunching syntactic sugar. In other words, I can write a concise script that I can still read in six months, when I need it again.

The Groovy script below takes a list of video files and converts them to mp4. You can modify it to allow different file types or to tweak the ffmpeg settings. Run it like this:

user@computer:~/Videos$ transcode.gy *.avi

Here is the full listing for transcode.gy. It's "enigmatic" Groovy (closures, syntactic sugar, etc.), but if you are comfortable with Java you won't have any trouble following along. Enjoy!

#!/usr/bin/env groovy
/*
 * This software is distributed WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. Copy it, use it, sell it, even
 * take credit for it if you want. But don't come back to me if there
 * is a problem.
 */
import java.util.regex.Matcher
import java.util.regex.Pattern

List<File> files = args.collect { new File(it).canonicalFile }

/* Validate every argument before transcoding anything. */
files.each {
    if(it.isDirectory())
        throw new IOException("Can't process folder: " + it.path)
    if(!it.isFile())
        throw new FileNotFoundException(it.path)
    /* Update this regular-expression to allow different video types. */
    if(!(it.name ==~ /(?i).*\.((avi)|(mod)|(mpeg))/))
        throw new IOException("File type not supported: " + it.path)
}

files.each {
    Matcher matcher = (it.name =~ /.*\.([a-zA-Z]+)/)
    /* The match has to run before group() can be called. */
    matcher.matches()
    String ext = matcher.group(1)
    /* Drop the extension and its dot, then append .mp4. */
    String newName = "${it.name[0..-ext.size()-2]}.mp4"
    Process p = [
        "ffmpeg",
        "-threads", "0",
            /* 0 indicates use all cores. */
        "-i", it.canonicalPath,
        "-b", "1500K",
            /* Default 200K. You can tweak this setting to change video
             * size and quality. */
        "-vcodec", "libx264",
        "-vpre", "slow",
            /* Preset file. Also default, normal, veryslow, max, hq.
             * Search for files ending in '.ffpreset' to see what's
             * available to you. */
        "-y",
            /* Overwrite the .mp4 file if one already exists. */
        new File(it.parentFile, newName)].execute()
    p.waitForProcessOutput(System.out, System.err)
    if(p.exitValue() != 0)
        throw new Exception("Failed: ffmpeg ${p.exitValue()} ${it.path}")
}
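
The script's comment says to update the regular expression to allow different video types. Here is a small Java sketch of what that check accepts and how it could be widened. The class name and the extra mov alternative are purely illustrative; I haven't tried transcoding .mov files with these ffmpeg settings.

```java
import java.util.regex.Pattern;

/** Exercises the script's file-type pattern; names here are hypothetical. */
final class VideoTypeCheck {

    /** The pattern from the script: case-insensitive, must match the whole name. */
    static final Pattern ORIGINAL =
            Pattern.compile("(?i).*\\.((avi)|(mod)|(mpeg))");

    /** Illustrative variant that also accepts names ending in .mov. */
    static final Pattern WITH_MOV =
            Pattern.compile("(?i).*\\.((avi)|(mod)|(mpeg)|(mov))");

    /** matches() tests the entire name, like Groovy's ==~ operator. */
    static boolean accepts(Pattern pattern, String fileName) {
        return pattern.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String[] names = {"movie001.AVI", "clip.mpeg", "clip.mov", "notes.txt"};
        for (String name : names) {
            System.out.println(name
                    + " original=" + accepts(ORIGINAL, name)
                    + " withMov=" + accepts(WITH_MOV, name));
        }
    }
}
```

Because the pattern has to match the whole name, something like movie001.avi.bak is rejected even though it contains .avi.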

Thanks to John Dyer for posting simplified command-line examples of ffmpeg for several popular video formats, including H.264.

Paul Williams pointed out in the comments that you can do much the same thing with this simple Bash script.

for i in `ls <input_dir>`
do
    ffmpeg \
        -threads 0 \
        -i $i \
        -b 1500K \
        -vcodec libx264 \
        -vpre slow \
        <output_dir>/`basename $i .AVI`.mp4
done

I hardly ever get to use a for ... in loop in my everyday Bash tasks because most commands I use take multiple files as input, whereas ffmpeg does not. I'll have to try to remember this next time I run into a similar situation.

If I left out the error checking and comments in the Groovy script, it would be a lot shorter, but not that short. I wonder how small I could make the program if I wrote it in Java? That could be interesting...

Here's a Java version using Jaks and Apache Commons IO for a cleaner implementation. There is a lot of standard Java boilerplate (I use Eclipse to manage that bit), and there is a Maven component to this that I'm not going to go into. Keeping in mind that I left in some error-checking code, the meat of the program is comparable in size to the Bash script and somewhat shorter than the original Groovy script, which I think is mostly due to me being a bit rusty with enigmatic Groovy.

import static java.util.Arrays.asList;
import static org.apache.commons.io.FilenameUtils.getBaseName;
import static org.apache.commons.io.FilenameUtils.getExtension;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.List;

import com.googlecode.jaks.cli.AbstractJaksCommand;
import com.googlecode.jaks.cli.JaksNonOption;
import com.googlecode.jaks.system.Subprocess;

public class Transcode extends AbstractJaksCommand
{
    @JaksNonOption(required=true)
    public List<File> vids;

    @Override
    public void execute() throws Exception
    {
        for(final File vid : vids)
        {
            // Compare in lower case so that .AVI is accepted as well as .avi.
            if(!asList("avi", "mod", "mpeg").contains(getExtension(vid.getName()).toLowerCase()))
            {
                throw new IllegalArgumentException(vid.getPath() + " is not a convertible file type.");
            }
            if(!vid.isFile())
            {
                throw new FileNotFoundException(vid.getPath());
            }
            new Subprocess("ffmpeg",
                    "-threads", "0",
                    "-i", vid.getCanonicalPath(),
                    "-b", "1500K",
                    "-vcodec", "libx264",
                    "-vpre", "slow",
                    "-y",
                    new File(vid.getParentFile(), getBaseName(vid.getName()) + ".mp4").getPath()
                ).call();
        }
    }
}
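
For comparison, here is a rough sketch of the same loop using only the JDK, with ProcessBuilder standing in for the Jaks Subprocess class and a couple of small helpers standing in for commons-io. The class name PlainTranscode and its helper methods are made up for this example, and it assumes ffmpeg is on the PATH and accepts the same flags as the listings above.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical JDK-only take on the transcoder; not part of Jaks. */
final class PlainTranscode {

    private static final List<String> SUPPORTED = Arrays.asList("avi", "mod", "mpeg");

    /** Lower-cased text after the last dot, or "" if the name has no dot. */
    static String extensionOf(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Same name with the extension (if any) swapped for .mp4. */
    static String mp4NameFor(String name) {
        int dot = name.lastIndexOf('.');
        return (dot < 0 ? name : name.substring(0, dot)) + ".mp4";
    }

    /** The ffmpeg argument list, using the same flags as the listings above. */
    static List<String> commandFor(File vid) {
        File out = new File(vid.getAbsoluteFile().getParentFile(), mp4NameFor(vid.getName()));
        return Arrays.asList("ffmpeg", "-threads", "0", "-i", vid.getAbsolutePath(),
                "-b", "1500K", "-vcodec", "libx264", "-vpre", "slow", "-y", out.getPath());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        for (String arg : args) {
            File vid = new File(arg);
            if (!SUPPORTED.contains(extensionOf(vid.getName()))) {
                throw new IllegalArgumentException(vid.getPath() + " is not a convertible file type.");
            }
            if (!vid.isFile()) {
                throw new FileNotFoundException(vid.getPath());
            }
            // inheritIO() sends ffmpeg's output straight to this console.
            int exit = new ProcessBuilder(commandFor(vid)).inheritIO().start().waitFor();
            if (exit != 0) {
                throw new IOException("Failed: ffmpeg " + exit + " " + vid.getPath());
            }
        }
    }
}
```

Building the argument list in its own method keeps the ffmpeg settings in one place and makes them easy to inspect without actually launching ffmpeg.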


  1. As a follow up, you can install Groovy on Ubuntu using apt-get or Ubuntu Software Center. Also, if you are not inclined towards a heavyweight IDE like NetBeans or Eclipse, vim and gvim have excellent Groovy syntax highlighting.

  2. The presets (-vpre ...) seem to have a big effect on the size of the transcoded video file. veryslow and max are ... well ... very slow (something like 10x slower). If you have some time to kill and want maximum compression, try them and see.

  3. Just read this for the first time... you might want to review the bash man pages. Your groovy script can be rewritten as:

    for i in `ls `; do ffmpeg -threads 0 -i $i -b 1500K -vcodec libx264 -vpre slow /`basename $i .AVI`.mp4; done

  4. Grr... helpful of blogger to remove my angle brackets instead of replace with entities..

    for i in `ls <input_dir>`; do ffmpeg -threads 0 -i $i -b 1500K -vcodec libx264 -vpre slow <output_dir>/`basename $i .AVI`.mp4; done