Program to automatically generate subtitles using speech-to-text?

I have a video that I want to create subtitles for. Is there a program that can perform rudimentary speech-to-text in order to

  1. set the correct start/stop of each individual subtitle
  2. create rudimentary text subtitles (using some sort of speech-to-text)

I know about gnome-subtitles. However, creating subtitles with it manually takes a lot of effort: you have to select the start and stop of each sentence yourself.

YouTube has the above features (it creates rudimentary text subtitles at the correct timings, using speech-to-text). However, I would rather not upload the videos to YouTube just to get my subtitles. Is it possible to create the subtitles efficiently in Ubuntu?

Update: I plan to use .srt subtitles only, and do not need to hard-code them onto the videos. My biggest requirement is that the program automatically finds the start/stop of each sentence, so that I only have to type in the text.
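For reference, an .srt file is just numbered cues, each with a start/stop line followed by text, so once the timings are known the empty cues are easy to generate. The following is a minimal Python sketch, assuming the timings are already available as (start, end) pairs in seconds. The function names `srt_timestamp` and `build_srt` and the `[text]` placeholder are made up for illustration and do not come from any of the tools mentioned here.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(segments, placeholder="[text]"):
    """Return SRT text with one numbered cue per (start, end) pair.

    Each cue gets the same placeholder text, to be replaced by hand later.
    """
    cues = []
    for index, (start, end) in enumerate(segments, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        cues.append(f"{index}\n{timing}\n{placeholder}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    # Hypothetical timings, in seconds, for two sentences.
    print(build_srt([(0.0, 1.5), (2.0, 3.25)]))
```

The resulting text can be saved with an .srt extension and opened in a subtitle editor to fill in the actual sentences.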

Update #2: There is speech-to-text software for Linux in the CMU Sphinx package. According to http://sourceforge.net/projects/cmusphinx/forums/forum/5471/topic/3949891 it is possible to use CMU Sphinx with a subtitle program. In addition, one web-based subtitle tool is aware of this CMU Sphinx feature (http://groups.google.com/group/universal-subtitles-testing/browse_thread/thread/613361ffb921b43b); however, there is no sign in the latest source code that CMU Sphinx support was actually added. The quest continues for a program that uses CMU Sphinx for rudimentary speech-to-text (which would set the correct timings as well), as YouTube already does.
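Short of real speech recognition, the start/stop timings alone can be roughly approximated by looking for pauses in the audio. The sketch below is not how CMU Sphinx or YouTube work; it is a plain loudness threshold, assuming the audio has already been extracted as mono PCM samples. The function name `find_speech_segments` and all default values (20 ms frames, a threshold of 500, a 300 ms pause) are illustrative assumptions that would need tuning per recording.

```python
def find_speech_segments(samples, sample_rate, frame_ms=20,
                         threshold=500.0, min_pause_ms=300):
    """Return (start, end) times in seconds for runs of loud audio.

    samples: sequence of signed PCM amplitudes (mono).
    A frame is "loud" when its mean absolute amplitude exceeds threshold.
    A segment ends once at least min_pause_ms of quiet follows it;
    shorter quiet gaps are kept inside the segment.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    max_quiet_frames = min_pause_ms // frame_ms
    segments = []
    start = None      # frame index where the current segment began
    last_loud = None  # index of the most recent loud frame

    def close_segment():
        segments.append((start * frame_len / sample_rate,
                         (last_loud + 1) * frame_len / sample_rate))

    for i in range(len(samples) // frame_len):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        loud = sum(abs(s) for s in frame) / len(frame) > threshold
        if loud:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud >= max_quiet_frames:
            close_segment()
            start = None

    if start is not None:  # audio ended in the middle of a segment
        close_segment()
    return segments
```

Background noise or music will confuse a threshold like this, so the detected segments are only a starting point to be corrected in a subtitle editor.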


I did not find a way to get the subtitle program to automatically add rudimentary subtitles, by analysing the voices in the video.

Therefore, the alternative that I use is

  1. Upload the video to YouTube (for example, privately) and use the built-in facility to automatically create rudimentary subtitles.

Then,

  1. Add the video to http://www.universalsubtitles.org/ and manually create the timeframes for each sentence, if the automated way in YouTube did not work or sentences are missing.
  2. Use GNOME Subtitles (found in the Software Center) in order to clean up the subtitles and fix any timings.

I personally like GNOME Subtitles. It is available in the repositories:

sudo apt-get install gnome-subtitles

I used Aegisub on Windows some years ago, and was really happy with it. Apparently it is available for Linux. It is pretty self-explanatory.

Aegisub only creates the subtitle file, e.g. an .srt file. To combine the video and the subtitles into a video with hard-coded subtitles, you still need a second program.
On Windows I used VirtualDub, but it is not available for Linux. You can use VLC to do this on Linux:

Create your subs in Aegisub, saving it as usual as a .ass file.

Use VLC to add that subtitle track to your video. Subtitle -> Add subtitle file...

Configure the subtitle display style and settings so they display to your liking. Tools -> Preferences -> Subtitles/OSD

You can now watch the video to make sure the subs are displaying as you intended. For example I can check certain subs that I've specified in Aegisub to be displayed at the top of the screen rather than the bottom.

The output will be identical to how it looks now, so make sure all is good.

  1. Go to Media -> Convert/Save... (Ctrl + R).

  2. Under File Selection, add your video file. Tick "Use a subtitle file" and browse to your .ass sub file.

  3. Click the down arrow on the Convert/Save button and click Convert... (Alt + O).

  4. Under Settings, ensure the Convert option is ticked. Tick the Display the output option. Subs aren't added for some reason unless you tick this.

  5. Edit the profile so the video and audio settings are what you want. Under the subtitle tab, tick the Subtitles box, and use DVB subtitle codec. Make sure you tick 'Overlay subtitles on the video'. Press save.

  6. Enter a destination folder and filename in the Destination box.

  7. Press start.

Wait for it to finish, and that's it. The caveat with this method is that the encoding happens in real time with the video, so a 2 hour video will take 2 hours. This is due to ticking the 'Display the output' box, but for some reason it only works when you tick this.

There are also other subtitle editors.

Update:
I don't remember Aegisub having a function to automatically set the beginning and end of a spoken sentence in the subtitle file, and I don't see such a function mentioned anywhere on the site. It is, however, pretty easy to set those times manually using key combinations.

Is there even any program which has such a function (in any OS)?