Remove audio snippet from many audio files

I have a lot of audio files (.mp3) which all contain a specific audio snippet at different timestamps. How could I remove that snippet automatically from all of the files?


Solution 1:

Only manually and file by file - if at all. How easy it is depends largely on what exactly you want to remove.

If the snippet has a clear, audible period of silence at both ends, you can simply open the file in an audio editor such as Audacity or Ocenaudio and cut it out, taking care to cut at zero crossings so as to avoid audible clicks. Bring the ends of the truncated waveform together and save the new file.
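For illustration, here is a minimal sketch of what "cut at zero crossings" means on raw sample data. It assumes the audio has already been decoded into a plain list of signed integer samples (mono); the function names are made up for this example and are not from any audio editor. It is not a substitute for doing the cut by ear in an editor.

```python
def nearest_zero_crossing(samples, index):
    """Return the sample index closest to `index` where the signal changes sign."""
    crossings = [
        i for i in range(1, len(samples))
        if (samples[i - 1] < 0 <= samples[i]) or (samples[i - 1] >= 0 > samples[i])
    ]
    if not crossings:
        return index  # no sign change anywhere; leave the cut point as it is
    return min(crossings, key=lambda i: abs(i - index))


def cut_region(samples, start, end):
    """Remove samples[start:end] after snapping both cut points to zero crossings."""
    start = nearest_zero_crossing(samples, start)
    end = nearest_zero_crossing(samples, end)
    return samples[:start] + samples[end:]


if __name__ == "__main__":
    wave = [1, 2, -1, -2, 3, 4, -5, -6, 7]
    print(cut_region(wave, 2, 6))
```

Snapping to a sign change only keeps the two joined ends near zero amplitude. It does nothing about the slope or the content on either side of the cut, which is why it only works well when there is silence around the snippet.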

If there's no clear silence, it gets difficult. An audio waveform is a very complex thing, and just cutting at zero crossings is usually not enough. I've tried that myself often enough in the past; matching the audio without audible clicks and pops is well-nigh impossible.

If the "snippet" is an overlaid recorded watermark such as "produced by ACME" on top of a sample, there's no way without destroying the original audio.

There's really no use for this kind of tool even in professional audio production, unless maybe in movie post-processing, which I'm not familiar with. Even if such an automated tool exists, it's bound to be VERY expensive, considering that professional toolsets for noise reduction etc. cost several thousand.

EDIT to clarify the "why".

Let's start with the fact that computers do not work with audio, they work with chunks of data. Bits and bytes, and only with bits and bytes. And computers are very stupid.

The ASCII-coded word "hello" consists of the characters 104, 101, 108, 108, 111. It's a clear-cut string of bytes, unchanging. Therefore it's a trivial task to remove or replace that single word in a multitude of text files. The words "Hello", "hEllo" or "HelLo" are no longer identical to it, so you will need to instruct the computer to handle those separately.
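To make the text case concrete, here is a small sketch. `remove_word` is a name invented for this example; it only wraps the standard `bytes.replace` method.

```python
def remove_word(data: bytes, word: bytes) -> bytes:
    """Remove every byte-exact occurrence of `word` from `data`."""
    return data.replace(word, b"")


if __name__ == "__main__":
    # The five byte values that make up the ASCII word "hello".
    print(list("hello".encode("ascii")))

    text = b"say hello and Hello"
    # Only the lowercase occurrence matches; "Hello" starts with byte 72, not 104.
    print(remove_word(text, b"hello"))
```

The capitalised variant survives because the match is on byte values, not on meaning.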

Record yourself saying "hello" three times, exactly the same way. A human being might not hear any difference, but you will have produced three unique waveforms, and the computer will see three unique strings of bits and bytes.

Record "hello" on your phone, then play it back three times while recording it with a computer. Any difference in the situation - moving the phone a millimeter in any direction, one of the recordings being ever so slightly louder, a hundredth of a second difference in starting the playback - will again have produced three unique waveforms, i.e. unique strings of bits and bytes.

Again, you will need to instruct the computer to remove each unique string of bits and bytes separately.
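The same point can be shown with synthetic data. The sketch below is an assumption-laden stand-in for a real recording: it renders a plain sine tone as 16-bit PCM three times, once as a reference, once 0.2% louder, and once shifted by half a sample, then searches for each as an exact byte string. `render_tone` and all its parameters are invented for this example.

```python
import math
import struct


def render_tone(gain, offset=0.0, n=1000, freq=440.0, rate=44100):
    """Render `n` samples of a sine tone as 16-bit little-endian PCM bytes."""
    samples = [
        round(gain * 32767 * math.sin(2 * math.pi * freq * (i + offset) / rate))
        for i in range(n)
    ]
    return struct.pack(f"<{n}h", *samples)


take_a = render_tone(gain=0.500)              # the reference "snippet"
take_b = render_tone(gain=0.501)              # ever so slightly louder
take_c = render_tone(gain=0.500, offset=0.5)  # started half a sample later

# A "file" that contains the reference take between two stretches of silence.
haystack = b"\x00" * 100 + take_a + b"\x00" * 100

if __name__ == "__main__":
    for name, take in (("a", take_a), ("b", take_b), ("c", take_c)):
        print(name, take in haystack)
```

Only the byte-identical take is found. The other two would sound the same to a listener, but an exact byte search treats them as unrelated data.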

The problem here is very similar to face recognition. As @Tetsujin already hinted, this requires an AI.

Removing absolute, perfect silence is another trivial task, as it's always an identical string of zero bytes. Removing background noise (almost-perfect silence) is less so; you will have to teach the system what "noise" is in the first place, and choose appropriate parameters to remove just the noise and nothing else. And it's effective only as long as the noise is constant... if the AC starts to blow harder, the background noise has changed and you will need to adjust your parameters.
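A rough sketch of that difference, again on a plain list of integer samples. `strip_silence`, `threshold` and `min_run` are names and parameters made up for this example; real noise reduction is far more involved than an amplitude threshold.

```python
def strip_silence(samples, threshold=0, min_run=4):
    """Drop runs of at least `min_run` samples whose magnitude is <= `threshold`.

    With threshold=0 only perfect digital silence (exact zeros) is removed.
    """
    out, run = [], []
    for s in samples:
        if abs(s) <= threshold:
            run.append(s)  # candidate silence; decide once the run ends
            continue
        if len(run) < min_run:
            out.extend(run)  # too short to be a pause, e.g. a zero crossing
        run = []
        out.append(s)
    if len(run) < min_run:
        out.extend(run)
    return out


if __name__ == "__main__":
    clean = [5, -3, 0, 0, 0, 0, 0, 7, 0, -2]
    noisy = [5, -3, 1, -1, 2, 0, -1, 7, 0, -2]
    print(strip_silence(clean))               # exact zeros: the gap is removed
    print(strip_silence(noisy))               # same gap with noise: nothing removed
    print(strip_silence(noisy, threshold=2))  # works only with a hand-picked threshold
```

The threshold of 2 fits this particular noise floor. If the noise gets louder, the same setting stops finding the gap, and a higher one starts eating quiet parts of the wanted signal.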

Then there's the other practical side...

Tools are created to perform tasks. The more common the task, the more tools are available and the cheaper they get. An example would be equalization or compression - we do it all the time, so there's a huge number of tools ranging from free to moderately expensive. There's a need, people will buy; even if good free options are available.

Clip repair, scratch removal and other such tasks are still part of the job, but far less often performed than just everyday mixing. Consequently the market for those tools is much smaller, and the prices are higher. Professional toolsets for this purpose cost thousands, but because there's a need, people will buy.

Removing a single word or sentence from a large number of audio files is not by any stretch of the imagination a common task. Even when we do remove and rearrange audio, it's always a one-time job per track, never on the scale you're talking about. Consequently, should someone create such a tool, the market would be minuscule, which would make the cost prohibitive.

I honestly can't even envision any legal purpose for it, except for maybe removing F-bombs. But due to the price it'd still be more cost-effective, not to mention more reliable and accurate, to just let the sound guys do it manually.

I hope this clarifies it. There are no "magic" apps - there are tools to fill needs, in market-driven proportion to said needs. No need, no market, no tool.