Using a lossless video codec for archiving (monochrome) scientific video data

Basic question: what is a suitable codec for storing/archiving scientific video data in a lossless manner?

I am trying to help my research group store/archive some videos recorded with a microscope. These (grayscale) videos are in uncompressed (rawvideo) BGR24 format, 660x492@61fps, and typically about 1 minute long. My lab mates are going crazy with the sheer size of these files (gigabytes each). I suggested compressing them with a lossless codec. (Lossless matters here because the videos are scientific data, and a lossy codec could alter the content in bad or unexpected ways.)
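
As a rough sanity check on "gigabytes each", here is a back-of-the-envelope sketch in Python. It assumes a recording of exactly 60 seconds and ignores container overhead; the constant and function names are made up for illustration.

```python
# Raw size of one recording, using the figures quoted above.
WIDTH, HEIGHT, FPS = 660, 492, 61
DURATION_S = 60  # "about 1 minute"; the exact length is an assumption


def raw_size_bytes(bytes_per_pixel):
    """Uncompressed size in bytes for the given pixel size."""
    return WIDTH * HEIGHT * bytes_per_pixel * FPS * DURATION_S


bgr24_bytes = raw_size_bytes(3)  # BGR24: three bytes per pixel
gray_bytes = raw_size_bytes(1)   # gray: one byte per pixel

print(f"BGR24: {bgr24_bytes / 1e9:.2f} GB")
print(f"gray:  {gray_bytes / 1e9:.2f} GB")
```

Under those assumptions a one-minute recording is roughly 3.6 GB as BGR24 and a third of that as single-channel gray, so dropping the redundant channels already helps before any codec is involved.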

Here is what I tried. First, I grabbed the first 10 seconds of one of these videos and converted it to a monochrome (raw) format using FFmpeg.

ffmpeg -t 10 -i RecordedData.avi -c:v rawvideo -pix_fmt gray raw-gray.mkv

Then, I attempted to use the lossless mode of libx264 (by setting -crf 0) to compress the resulting file.

ffmpeg -i raw-gray.mkv -c:v libx264 -crf 0 -pix_fmt yuv420p -color_range pc x264-yuv420p.mkv

Finally, I extracted the raw YUV data from both the raw and h264 MKV files and compared them.

ffmpeg -i raw-gray.mkv -c:v rawvideo -pix_fmt gray raw-gray.yuv
ffmpeg -i x264-yuv420p.mkv -c:v rawvideo -pix_fmt gray x264-decompressed.yuv
diff -sq raw-gray.yuv x264-decompressed.yuv

Here, the diff command reports that the files differ when I expected them to be the same. Why is this? Is this just some slight rounding error, or am I possibly losing something after doing the H264 (supposedly lossless) compression? There is some conversion of pixel formats happening (gray (YUV400) <-> YUV420), but the color (UV) channels should just be empty because the input is monochrome.
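
diff -sq only says whether the two files differ, not by how much. A minimal Python sketch like the following could quantify the mismatch. The helper name compare_raw is hypothetical, it uses only the standard library, and the byte-by-byte loop will be slow on files of this size.

```python
def compare_raw(path_a, path_b, chunk_size=1 << 20):
    """Return (size_a, size_b, differing_bytes, max_abs_diff) for two raw files.

    Differences are counted only over the length the two files share.
    """
    differing = 0
    max_diff = 0
    size_a = size_b = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            size_a += len(a)
            size_b += len(b)
            # Iterating over bytes objects yields integers in 0..255.
            for x, y in zip(a, b):
                if x != y:
                    differing += 1
                    max_diff = max(max_diff, abs(x - y))
    return size_a, size_b, differing, max_diff
```

Calling compare_raw("raw-gray.yuv", "x264-decompressed.yuv") would then show whether the sizes match, how many bytes differ, and how large the biggest difference is. That helps tell a small, systematic shift apart from real corruption.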

If I am indeed losing something, is there anything I can do to fix this? Is there another (lossless) codec that might be more appropriate for my data?


Update 1: I used hexdump to compare the contents of the uncompressed YUV data from raw-gray.yuv (never compressed) and x264-decompressed.yuv (compressed and then decompressed) in more detail. Here are the first few bytes.

[raw-gray.yuv]

00000000  4e 50 51 53 53 52 51 50  51 51 50 4f 50 50 50 50
00000010  51 51 50 51 52 53 51 51  52 52 53 53 52 51 51 53
00000020  51 53 54 55 53 51 52 54  53 53 52 50 51 50 52 52
00000030  51 52 51 51 51 52 54 52  52 52 51 51 51 53 57 58
00000040  57 57 55 54 54 52 53 51  51 52 53 55 55 54 53 53
00000050  51 51 52 52 53 52 51 50  50 50 50 51 51 4f 4f 4e
00000060  4c 4d 4e 4d 4f 50 4f 50  51 51 51 52 52 52 52 50
00000070  50 50 52 52 53 55 55 55  57 52 53 53 53 54 56 56

[x264-decompressed.yuv]

00000000  53 55 56 57 57 56 56 55  56 56 55 54 55 55 55 55
00000010  56 56 55 56 56 57 56 56  56 56 57 57 56 56 56 57
00000020  56 57 58 59 57 56 56 58  57 57 56 55 56 55 56 56
00000030  56 56 56 56 56 56 58 56  56 56 56 56 56 57 5b 5c
00000040  5b 5b 59 58 58 56 57 56  56 56 57 59 59 58 57 57
00000050  56 56 56 56 57 56 56 55  55 55 55 56 56 54 54 53
00000060  51 52 53 52 54 55 54 55  56 56 56 56 56 56 56 55
00000070  55 55 56 56 57 59 59 59  5b 56 57 57 57 58 5a 5a

The values in the former file are 4 to 5 less than those in the latter. The same pattern holds further into the file.
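
An offset of 4 to 5 at these brightness levels is what a full-range (0-255) to limited-range (16-235) luma rescaling would produce. The sketch below tests that idea on the first 16 bytes of each dump. The mapping 16 + round(219 * y / 255) is the usual 8-bit formula and is my assumption here, not something FFmpeg reported.

```python
# First 16 bytes of each dump, copied from the hexdump listings above.
raw = bytes.fromhex("4e 50 51 53 53 52 51 50 51 51 50 4f 50 50 50 50")
decoded = bytes.fromhex("53 55 56 57 57 56 56 55 56 56 55 54 55 55 55 55")


def full_to_limited(y):
    """Map full-range luma (0-255) to limited range (16-235), rounding to nearest.

    Assumed mapping; the exact rounding inside FFmpeg's scaler may differ.
    """
    return 16 + (y * 219 + 127) // 255


predicted = bytes(full_to_limited(y) for y in raw)
matches = predicted == decoded
print("prediction matches decoded bytes:", matches)
```

For these 16 bytes the prediction agrees with the decoded values, which would point to a range conversion somewhere in the pipeline rather than loss inside the H.264 codec itself.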


Update 2: If I use libx264 in RGB mode, I can get an exact match with the original by following the same procedure as above with these commands:

ffmpeg -i raw-gray.mkv -c:v libx264rgb -crf 0 -pix_fmt bgr24 x264-bgr24.mkv
ffmpeg -i x264-bgr24.mkv -c:v rawvideo -pix_fmt gray x264-bgr24-decomp.yuv
diff -sq raw-gray.yuv x264-bgr24-decomp.yuv

The last command reports that the two files are identical. Unfortunately, x264-bgr24.mkv is about 3 times larger than x264-yuv420p.mkv, so the compression in RGB mode is not as good.

I read somewhere that libx264 compresses grayscale video efficiently in YUV mode because it picks up on the fact that only the Y channel contains any real information (the U and V channels are constant for monochrome video). In RGB mode, I believe all three channels would contain identical info for monochrome input. Maybe libx264rgb does not take advantage of that.

So, is there a way for me to use YUV mode without altering the video, since the compression is much more efficient this way?


Update 3: I was able to solve the problem with libx264 by using -pix_fmt yuvj420p instead of -pix_fmt yuv420p -color_range pc. With that change, compression followed by decompression reproduces the original file exactly. From the FFmpeg documentation, I had the impression that these two sets of flags were equivalent, but this is evidently not the case. The only issue is that yuvj420p triggers a warning: [swscaler @ 0x55b56347fe20] deprecated pixel format used, make sure you set the range correctly. Also, I found this bug report that could be related to my issue. I am not sure of the "proper" way to do things without using the apparently deprecated yuvj420p pixel format.


Solution 1:

This isn't a straight answer to your actual problem, but I would consider using the FFmpeg-internal FFV1 codec:

$ ffmpeg -i raw-gray.mkv -c:v ffv1 ffv1.mkv

Alternatively, use version 3 of the codec:

$ ffmpeg -i raw-gray.mkv -c:v ffv1 -level 3 ffv1.mkv

Then:

$ ffmpeg -i ffv1.mkv -c:v rawvideo -pix_fmt gray raw-ffv1.yuv
$ diff -sq raw-ffv1.yuv raw-gray.yuv
Files raw-ffv1.yuv and raw-gray.yuv are identical

It's not as efficient as libx264 in lossless mode when using yuv420p, but it is more efficient than using libx264 with bgr24 (in my tests, data rate was somewhere in between). Some institutions like the Library of Congress also recognize FFV1 as a suitable preservation format.
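
Whichever codec you settle on, the round-trip check used throughout this thread can be scripted so it does not need large intermediate .yuv files. The following is only a sketch: it assumes ffmpeg is on the PATH, the helper names (decode_cmd, decoded_sha256, is_lossless) are made up for illustration, and it hashes the decoded gray frames streamed over a pipe.

```python
import hashlib
import subprocess


def decode_cmd(path, pix_fmt="gray"):
    """Build an ffmpeg command that writes decoded raw frames to stdout."""
    return ["ffmpeg", "-nostdin", "-v", "error", "-i", path,
            "-an", "-f", "rawvideo", "-pix_fmt", pix_fmt, "-"]


def decoded_sha256(path, pix_fmt="gray"):
    """Hash the decoded frames of `path` without writing a temporary file."""
    digest = hashlib.sha256()
    proc = subprocess.Popen(decode_cmd(path, pix_fmt), stdout=subprocess.PIPE)
    # Read the pipe in 1 MiB chunks until ffmpeg closes it.
    for chunk in iter(lambda: proc.stdout.read(1 << 20), b""):
        digest.update(chunk)
    if proc.wait() != 0:
        raise RuntimeError(f"ffmpeg failed on {path}")
    return digest.hexdigest()


def is_lossless(original, encoded, pix_fmt="gray"):
    """True if both files decode to byte-identical raw video."""
    return decoded_sha256(original, pix_fmt) == decoded_sha256(encoded, pix_fmt)
```

For example, is_lossless("raw-gray.mkv", "ffv1.mkv") should return True if the FFV1 file decodes to the same gray frames as the source. This is the same comparison as the diff above, just without the temporary files.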