Why is the CD-ROM subchannel different when dumping the same disc?
Solution 1:
The various CD formats are a bit involved, and the official specifications ("Red Book" for audio CD, "Yellow Book" for data CD) are not freely available. But you can find some details in publicly available standards like ECMA-130.
The original audio CD (also called CD-DA) was modelled on the vinyl record, which means it also uses a single spiral track of continuous audio data (unlike hard disks, which use concentric circular tracks). Interleaved within this audio data in a rather complex way are 8 subchannels (P to W), of which the Q subchannel contains timing information (literally as minutes/seconds/frames, at 75 frames per second) and the current track number. For the original purpose this was enough: for continuous play, the lens was just adjusted slightly to follow the track; to seek, the lens would move while decoding the Q subchannel until the right track was found. This positioning is a bit coarse, but completely adequate for listening to music.
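To make that timing information concrete, here is a minimal sketch of the conventional conversion between a Q-channel time and a linear sector number, assuming the usual 75 frames per second and the convention that logical sector 0 sits at time 00:02:00:

```python
def msf_to_lba(minutes: int, seconds: int, frames: int) -> int:
    """Convert a Q-subchannel MSF time to a logical block address.

    There are 75 frames (sectors) per second; logical block 0
    conventionally starts at 00:02:00, hence the offset of 150.
    """
    return (minutes * 60 + seconds) * 75 + frames - 150

def lba_to_msf(lba: int) -> tuple[int, int, int]:
    """Inverse conversion, for display purposes."""
    total = lba + 150
    return total // (60 * 75), (total // 75) % 60, total % 75

print(msf_to_lba(0, 2, 0))   # 0  -> first data sector
print(lba_to_msf(16))        # (0, 2, 16) -> where ISO 9660 metadata begins
```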
Even today, many computer CD drives cannot position the lens and synchronize the decoding circuitry accurately enough that reading starts at an exact audio sample. This is why many CD ripping programs have a "paranoia" mode, in which they do overlapping reads and compare the results to adjust for this "jitter". As part of the audio stream, the subchannel is also subject to jitter, and that is why you get different subchannel files when you rip on a CD drive that cannot position accurately.
When the data CD (CD-ROM) specification was developed as an extension of the CD-DA specification, the importance of addressing and reading data accurately was recognized, so the audio frame of 2352 bytes was subdivided into 12 sync bytes and 4 header bytes (for the sector address), leaving the remaining 2336 bytes for data and an additional level of error correction. Using this scheme, sectors can be addressed exactly without having to rely on the Q-channel information alone. Therefore the jitter effect doesn't apply: you always get the same data when you dump a CD-ROM, and no additional cleverness is needed when dumping.
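As an illustration of why this makes exact addressing possible, here is a sketch of what parsing the 16-byte prefix of a raw 2352-byte sector could look like; the 12-byte sync pattern and the BCD-coded minute/second/frame header follow ECMA-130:

```python
SYNC = bytes([0x00] + [0xFF] * 10 + [0x00])  # 12-byte sync pattern (ECMA-130)

def parse_raw_sector(sector: bytes):
    """Extract the address from a raw 2352-byte CD-ROM sector."""
    if len(sector) != 2352:
        raise ValueError("expected a raw 2352-byte sector")
    if sector[:12] != SYNC:
        raise ValueError("sync pattern not found")
    def from_bcd(b: int) -> int:
        return (b >> 4) * 10 + (b & 0x0F)
    minutes, seconds, frames = (from_bcd(b) for b in sector[12:15])
    mode = sector[15]            # 1 = 2048 data + EDC/ECC, 2 = 2336 data
    return minutes, seconds, frames, mode
```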
Edit with more details:
According to ECMA-130, the data is scrambled in stages: 24 bytes make up an F1-Frame, the bytes of 106 of these frames are distributed into 106 F2-Frames, which each get 8 extra bytes of error correction. Those frames in turn each get an extra byte (the "control byte") to make them into F3-Frames. The extra byte carries the subchannel information (one subchannel for each bit position). A group of 98 F3-Frames is called a section, and the 98 associated control bytes contain two sync bytes and 96 bytes of real subchannel data. The Q subchannel additionally carries a 16-bit CRC within those 96 bits, so errors in it can at least be detected.
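For illustration, a sketch of that CRC check, assuming the 12 Q-channel bytes of a section (10 payload bytes plus the 16-bit CRC, which to my knowledge is stored inverted on the disc) have already been extracted from the control bytes:

```python
def crc16_ccitt(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-16 with polynomial 0x1021, initial value 0."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021 if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc

def q_subchannel_ok(q: bytes) -> bool:
    """Check one 12-byte Q-subchannel packet.

    Bytes 0..9 are payload, bytes 10..11 the CRC; the CRC is
    assumed to be stored inverted (one's complement) on the disc.
    """
    if len(q) != 12:
        raise ValueError("expected 12 Q-channel bytes")
    stored = int.from_bytes(q[10:12], "big") ^ 0xFFFF
    return crc16_ccitt(q[:10]) == stored
```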
The idea behind this is to distribute the data on the surface of the disc in such a manner that scratches, dirt etc. don't affect a lot of contiguous bits, so the error correction can recover the lost data as long as the scratches are not too big.
As a consequence, the CD drive hardware needs to read a complete section after repositioning the lens to find out where it is in the data stream. The descrambling of the various stages is done by hardware, which needs to sync itself to the two sync bytes in the control-byte stream. Different drive models need different amounts of time to sync (you can test that by reading the same disc on two different drives, if you have them), depending on how the hardware is implemented. Also, many models don't always take exactly the same time to sync, so they can start a little early or late and don't always begin outputting the descrambled data at the same byte.
So when the ripping program issues a READ CD (0xBE) command, it supplies a transfer length and a start address (or rather, a Q-channel time). The drive positions the lens, descrambles the frames, extracts the Q channel, compares the time, and when it finds the correct time, it starts the transfer. As explained above, this transfer doesn't always begin at the same byte, so the results of multiple READ CD commands may be shifted against each other. That's why you see different subchannel files from your ripper.
Depending on the hardware and the circumstances when the lens is adjusted, it's more or less random whether the transfer starts a few samples early or a few samples late. So the only pattern you'll see in the results is that the shifts are a multiple of the transfer length.
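You can observe this yourself by comparing two dumps. A hypothetical sketch (the file names are made up) that slides one .sub file against the other in whole-sector steps of 96 subchannel bytes and reports the offset at which they agree best:

```python
def best_shift(a: bytes, b: bytes, sector: int = 96, max_shift: int = 50) -> int:
    """Find the whole-sector shift at which two .sub dumps agree most."""
    def score(shift: int) -> int:
        off = shift * sector
        start_a, start_b = max(off, 0), max(-off, 0)
        overlap = min(len(a) - start_a, len(b) - start_b)
        # Sample one byte per 96-byte sector to keep the comparison cheap.
        return sum(a[start_a + i] == b[start_b + i]
                   for i in range(0, max(overlap, 0), sector))
    return max(range(-max_shift, max_shift + 1), key=score)

# Hypothetical file names for two dumps of the same disc:
a = open("dump1.sub", "rb").read()
b = open("dump2.sub", "rb").read()
print("best alignment at a shift of", best_shift(a, b), "sectors")
```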
Some drive models actually have accurate hardware that will always start the transfer at the same position. The standard defines a bit in mode page 0x2A ("CD/DVD Capabilities and Mechanical Status Page") which indicates whether that is the case, but real-world experience shows that some drives claiming to be accurate are in fact not. (Under Linux, you can use sg_modes from the sg3_utils package to read the mode pages; I don't know which tool to use under Windows.)
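Once you have the raw bytes of mode page 0x2A (e.g. via sg_modes), checking the flag is a one-liner; as far as I know the "CD-DA Stream is Accurate" bit is bit 1 of byte 5 of that page, but verify against the MMC revision your drive implements:

```python
def cdda_stream_accurate(page_2a: bytes) -> bool:
    """Check the 'CD-DA Stream is Accurate' bit in mode page 0x2A.

    `page_2a` is the raw mode page starting at its page-code byte.
    Byte 5 bit 1 is assumed to carry the flag (check your MMC revision;
    and remember some drives set it without actually being accurate).
    """
    return bool(page_2a[5] & 0x02)
```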
Solution 2:
According to this Wikipedia article:
A frame comprises 33 bytes, of which 24 bytes are audio or user data, eight bytes are error correction (CIRC-generated), and one byte is for subcode.
This suggests there is no error correction for the subchannel data.
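These numbers line up with the sector layout from Solution 1; a quick sanity check of the arithmetic (a sector corresponds to 98 of these 33-byte frames, a "section" in ECMA-130 terms):

```python
frames_per_sector = 98           # one ECMA-130 section
user = 24 * frames_per_sector    # 2352 bytes: the raw sector / audio frame
circ = 8 * frames_per_sector     # 784 bytes of CIRC parity, consumed by the drive
subcode = 1 * frames_per_sector  # 98 control bytes: 2 sync + 96 subchannel bytes

assert user == 2352
assert subcode - 2 == 96         # what ends up per sector in a .sub file
```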
I have also found another question elsewhere. It's about audio CDs but I think it addresses the right issue:
All I can say is that I've never managed to obtain two identical subchannel readings (*.SUB file) when reading from the same CD-DA/CD-TEXT. Is that normal when reading in RAW mode because data isn't corrected because CD-DA/CD-TEXT format doesn't carry EDC/ECC in all subchannels?
The answer there:
Only audio data is subjected to Reed-Solomon coding (C1 & C2). Subcode channel data (channels P...W) are not subjected to interleaving or error protection.
While dirkt may be right in another answer to your question that you may not need .sub files, that answer doesn't explicitly address your question:

What is the explanation of this behavior?

My answer: you get different .sub files because subchannels don't have error correction. Read errors are corrected (or at least detected) while reading audio or user data, but a read error can pass through as-is when it occurs in a subchannel bit. Particular errors due to scratches or dust may appear during one reading session, not appear during another etc. – hence .sub files that differ.
Answer expanded to address the comment:
I have two copies of this disk, one being in excellent condition (no visible scratch), and the behavior is still the same. I also have other, older game CD-ROMs in worse condition that have consistent .sub files across multiple dumps.
I suspect (unfortunately without hard evidence) that different CDs may have been manufactured with different quality. In a case where subchannels don't matter, a lower-quality disc may still pass quality tests designed to detect data inconsistency only. Or it may simply be a probabilistic matter: one disc has its weak spot(s) (a bit that gives inconsistent readings) where error correction can fix them; another happens to have them in the subchannel area.
One such subchannel bit is enough to give you different checksums, while even thousands of "undecided" bits in the user data area may be silently corrected, as long as they are distributed widely enough that the error correction algorithm has to deal with only a few of them at a time.
Answer expanded in reaction to the KProbe 2 results:
As far as I know, C1 errors are allowed (up to some quantity) because they are silently corrected (more here). This correction works because of the error correction bits. As I said before, subchannels don't have such redundancy in general (dirkt mentions the Q-subchannel CRC, but that doesn't change much in my conclusion). Moreover, if an error occurs there, there is no way to know it, unless you know beforehand what the correct subchannel data is.
So you had a total of 1855 errors that you know about. Repeat the test (seriously, do it!) and you may get e.g. 1790 errors, or 1892. Yet the corrected output is the same every time you read.
If there is one subchannel bit for every 32 data bits, then I'd say you probably have about 1855/32 subchannel bits that were read with an undetected error. That's about 58 bits. Well, almost, because thanks to the Q-subchannel CRC some of these errors may at least be detected. Since Q is one of eight subchannels, I estimate you are left with about 50 erroneous bits in the other subchannels. Next time you read, you may get a few of these bits right, and a few new subchannel errors elsewhere. So you will get a different .sub file. And still you won't know for sure which of those bits were read correctly the first time or the second.