"ValueError: Input 0 of layer "sequential" is incompatible with the layer" In prediction

I am trying to build a classifier that distinguishes my voice from other voices, which I then want to use in a future program. I used a CNN for this; during training it gave very good results. I convert the audio to a spectrogram so the CNN can work with it. The problem is in the prediction: I do the same conversion of the audio to a spectrogram, but I get this error.

ValueError: Input 0 of layer "sequential" is incompatible with the layer: expected shape=(None, 129, 1071, 1), found shape=(None, 1071)

Meanwhile, this is how I defined the model, and it raised no error:

model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(129, 1071, 1)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

This is my code for the prediction

### VOICE CLASSIFICATION ###
voice_model = load_model(os.path.abspath('Models/voiceclassify2.model'))
classes = ['Other', 'Bernardo']

sample = os.path.abspath('Voiceclassification/Data/me/5.wav')
samplerate, data = wavfile.read(str(sample))

# convert into a spectrogram
frecuencies, times, spectogram = signal.spectrogram(data, samplerate)


vc_prediction = voice_model.predict(spectogram)[0]
idx = np.argmax(vc_prediction)
label = classes[idx]

print(label, " | ", vc_prediction[idx]*100, "%")

any idea?


EDIT:

After some fiddling, this was the solution. On the one hand, there was an error with the final dimension of the input (the 1 in the input_shape). This represents the number of channels (think of the RGB channels of an image). To add this axis to our spectrogram we can use either

spectrogram = spectrogram.reshape(spectrogram.shape + (1,)) or

spectrogram = np.expand_dims(spectrogram, -1).

At this point the shape of spectrogram would be (129, 1071, 1).
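Both variants are equivalent; a minimal NumPy-only sketch (using a dummy array with the question's dimensions in place of the real spectrogram) confirms the resulting shape:

```python
import numpy as np

# dummy spectrogram with the shapes from the question: frequencies x time frames
spectrogram = np.zeros((129, 1071))

# both approaches append a trailing channel axis of size 1
a = spectrogram.reshape(spectrogram.shape + (1,))
b = np.expand_dims(spectrogram, -1)

print(a.shape)  # (129, 1071, 1)
print(b.shape)  # (129, 1071, 1)
```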

On the other hand, during inference the first dimension (129) was swallowed because TensorFlow interpreted it as the batch dimension. You can solve this by wrapping the spectrogram in a one-element NumPy array like this:

spectrogram = np.array([spectrogram])

Now spectrogram's shape is (1, 129, 1071, 1) which is exactly what we need.
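As a quick sketch with a dummy array (`np.expand_dims(spectrogram, 0)` or `spectrogram[np.newaxis, ...]` would do the same as the list wrapping):

```python
import numpy as np

# dummy spectrogram after the channel axis has been added
spectrogram = np.zeros((129, 1071, 1))

# wrap it in a one-element batch
batch = np.array([spectrogram])
print(batch.shape)  # (1, 129, 1071, 1)
```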


Original:

This is definitely more of a comment than an answer but I cannot write those due to a lack in reputation, so feel free to move it to comments...

So the problem is that the expected shape (and thus the architecture of your network) and your data's shape don't match. I guess that's because the predict() call expects you to hand over a batch of samples to evaluate (look at the first dimension of each shape). You may get around this by wrapping the spectrogram argument inside the predict call in a list: vc_prediction = voice_model.predict([spectogram])[0]. If that doesn't do the trick, I'd recommend investigating the shapes of the training and evaluation data further; I like to do this at runtime in debug mode.
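To illustrate the shape investigation suggested above, here is a NumPy-only sketch (the array contents are dummies, only the shapes matter) that builds the shape the model expects from the raw 2-D spectrogram in one step:

```python
import numpy as np

# dummy 2-D spectrogram with the shapes from the question
spectogram = np.zeros((129, 1071))
print(spectogram.shape)  # (129, 1071) -- what predict() actually received

# the model expects (batch, 129, 1071, 1); add batch and channel axes at once
batch = np.expand_dims(spectogram, (0, -1))
print(batch.shape)       # (1, 129, 1071, 1)
```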