Solution 1:

PCM is a digital representation of an audio signal. It can be stored in memory or written on paper or whatever. An example of a 16-bit PCM audio sample might be something like 0x0152.
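
As a rough illustration of what "a digital representation" means here, the sketch below quantizes floating-point amplitudes into signed 16-bit PCM values and prints their hex form. The function names, the 8 kHz sample rate and the 1 kHz test tone are assumptions chosen for the example, not part of any standard API.

```python
import math

def float_to_pcm16(x):
    """Quantize an amplitude in [-1.0, 1.0] to a signed 16-bit PCM integer."""
    x = max(-1.0, min(1.0, x))          # clip out-of-range input
    return int(round(x * 32767))

def pcm16_to_hex(sample):
    """Format a signed 16-bit sample as its two's-complement hex pattern."""
    return f"0x{sample & 0xFFFF:04X}"

# Hypothetical input: eight samples of a 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE_HZ = 8000
TONE_HZ = 1000
samples = [
    float_to_pcm16(0.5 * math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE_HZ))
    for n in range(8)
]

if __name__ == "__main__":
    print([pcm16_to_hex(s) for s in samples])
```

The resulting list of integers is the PCM data; how it is stored or moved between chips is a separate question.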

I2S is an electrical serial interface used to transmit PCM data from one device to another. The interface has a line that delineates frames, called the frame clock, a line that marks individual bits, called the bit clock, and one or more data lines. At the start of each frame, a PCM sample is serialized bit by bit, with a high voltage for a 1 and a low voltage for a 0. Each bit is held at that level for one full bit-clock period, and then the line moves on to the next bit.

Here's some ASCII art showing how a single-channel, 8-bit sample 0x55 (binary 01010101) might be transmitted. The frame clock runs at the sample rate, the bit clock runs at 8 times the sample rate, and the data line carries the serialized sample bits.

        _______________                 _
FCLK  _|               |_______________|
        _   _   _   _   _   _   _   _   _
BCLK  _| |_| |_| |_| |_| |_| |_| |_| |_|
            ___     ___     ___     ___
DATA  ___0_| 1 |_0_| 1 |_0_| 1 |_0_| 1 |_
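
The serialization in the drawing can be sketched in a few lines of Python. This is only a model of the bit ordering, not driver code: the function names are made up for the example, and the frame clock shape (high for the first half of the frame) simply follows the drawing rather than any particular device.

```python
def serialize_msb_first(sample, width):
    """Return the bits of `sample` in transmission order, MSB first."""
    return [(sample >> i) & 1 for i in range(width - 1, -1, -1)]

def deserialize_msb_first(bits):
    """Rebuild the integer from bits received MSB first."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value

def frame_waveform(sample, width):
    """Return one (frame_clock, data) pair per bit-clock period.

    Assumption: the frame clock is high for the first half of the frame,
    as in the ASCII art, and low for the second half.
    """
    bits = serialize_msb_first(sample, width)
    return [(1 if i < width // 2 else 0, bit) for i, bit in enumerate(bits)]

if __name__ == "__main__":
    print(serialize_msb_first(0x55, 8))   # [0, 1, 0, 1, 0, 1, 0, 1]
    for fclk, data in frame_waveform(0x55, 8):
        print(fclk, data)
```

The receiver does the reverse: it shifts in one bit per bit clock and latches the completed word at the frame boundary.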

The Wikipedia articles on PCM and I2S do a pretty good job of explaining the details.

Solution 2:

From NXP documentation:

PCM

Most converters use a frame sync signal to signify the beginning of a new sample of audio data. These converters are usually associated with mono or single channel converters. The frame sync pulse frequency is usually the sample rate in the single channel converter. There are a few variations, such as whether the most significant bit (MSB) or least significant bit (LSB) comes first, or whether the data starts with the frame sync or one bit time after. Other variations have to do with frame sync and clock being active high or active low. The figures below show some examples of audio data formats. The frame sync signal determines when the next audio sample is to be transferred between the controller and the converter. Also, the frame sync signal as seen in the above figure can be one bit time or a long bit time. That is why the frame sync frequency is usually the sample rate. There are some variations to accommodate more audio channels, from having every other frame be a different channel to having the bit clock be fast enough to have more than one channel's data in each frame sync. For example, having 32 bits transferred each frame sync when the data sample size is 16 bits. These channel variations can be interfaced to the MPC5200 PSC, but usually stereo 2-channel converters use an I2S interface, as described in the next section.
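
To make the "32 bits per frame sync with 16-bit samples" case concrete, here is a minimal sketch that packs two 16-bit samples into one 32-bit frame and computes the bit clock that frame size implies. The helper names are hypothetical, and placing channel 0 in the upper half of the frame is an assumption; real converters document their own slot order.

```python
def bit_clock_hz(sample_rate_hz, bits_per_frame):
    """Bit clock needed to shift `bits_per_frame` bits out every frame sync."""
    return sample_rate_hz * bits_per_frame

def pack_frame(ch0, ch1):
    """Pack two signed 16-bit samples into one 32-bit frame.

    Assumption: channel 0 occupies the upper 16 bits (sent first when the
    frame is shifted out MSB first), channel 1 the lower 16 bits.
    """
    return ((ch0 & 0xFFFF) << 16) | (ch1 & 0xFFFF)

def unpack_frame(frame):
    """Split a 32-bit frame back into two signed 16-bit samples."""
    def to_signed16(value):
        return value - 0x10000 if value & 0x8000 else value
    return to_signed16((frame >> 16) & 0xFFFF), to_signed16(frame & 0xFFFF)

if __name__ == "__main__":
    print(bit_clock_hz(48000, 32))            # 1536000
    print(hex(pack_frame(0x0152, -1)))        # 0x152ffff
    print(unpack_frame(pack_frame(0x0152, -1)))
```

With one channel per frame the bit clock would be 16 times the sample rate; doubling the frame to carry two channels doubles the bit clock while the frame sync stays at the sample rate.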

I2S

I2S was defined by Philips for 2-channel stereo audio streams. The left or right channel audio data is defined by the state of the LRCK signal. The LRCK is the frame sync signal and defines the sample frequency for the data. I2S can accommodate any data size, usually from 8 to 32 bits for each channel, with the most significant bit (MSB) first. Notice the data is shifted by one bit from the start of the LRCK. Since the MSB comes first, the controller can output more or fewer bits than the converter is expecting. For example, if the converter is 32-bit but the controller only has 16-bit samples, the data can be left-justified to the MSB and have the lower 16 bits set to 0. The converter can still accurately represent the signal in 32 bits. The same connection can be used for 8- or 32-bit data samples without changing anything except the number of bits used in the audio sample. A variation on I2S, called left-justified, swaps the state meaning of the frame sync signal from low meaning left to high meaning left, and it removes the single clock delay for the first bit in relation to the frame sync signal. The MPC5200 PSC can easily work with either format.
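
The left-justification described above amounts to a shift. The sketch below, with hypothetical helper names, places a 16-bit sample in the top of a 32-bit slot and zero-fills the rest, then reads the slot back as a signed 32-bit value to show that the amplitude is simply scaled by 2^16.

```python
def left_justify(sample, sample_bits, slot_bits):
    """Place `sample` in the most significant bits of a wider slot.

    The lower (slot_bits - sample_bits) bits are set to 0.
    """
    mask = (1 << sample_bits) - 1
    return (sample & mask) << (slot_bits - sample_bits)

def to_signed(value, bits):
    """Interpret an unsigned bit pattern as a two's-complement integer."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

if __name__ == "__main__":
    slot = left_justify(0x0152, 16, 32)
    print(hex(slot))                      # 0x1520000
    print(to_signed(slot, 32) // 65536)   # 338, i.e. 0x0152 again
    print(hex(left_justify(-2, 16, 32)))  # 0xfffe0000
```

Because the sign bit lands in the slot's MSB, negative samples keep their sign, which is why a 32-bit converter can consume 16-bit data without any change to the wiring.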


https://www.nxp.com/docs/en/application-note/AN2979.pdf