Feature Extraction using MFCC

I want to know, how to extract the audio (x.wav) signal, feature extraction using MFCC? I know the steps of the audio feature extraction using MFCC. I want to know the fine coding in Python using the Django framework


Solution 1:

This is the most important step in building a speech recognizer because after converting the speech signal into the frequency domain, we must convert it into the usable form of the feature vector.

import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from python_speech_features import mfcc, logfbank

frequency_sampling, audio_signal = 
wavfile.read("/home/user/Downloads/OSR_us_000_0010_8k.wav")

audio_signal = audio_signal[:15000]

features_mfcc = mfcc(audio_signal, frequency_sampling)

print('\nMFCC:\nNumber of windows =', features_mfcc.shape[0])
print('Length of each feature =', features_mfcc.shape[1])



features_mfcc = features_mfcc.T
plt.matshow(features_mfcc)
plt.title('MFCC')

filterbank_features = logfbank(audio_signal, frequency_sampling)

print('\nFilter bank:\nNumber of windows =', filterbank_features.shape[0])
print('Length of each feature =', filterbank_features.shape[1])

filterbank_features = filterbank_features.T
plt.matshow(filterbank_features)
plt.title('Filter bank')
plt.show()

or you may use this code to extract the feature

import numpy as np
from sklearn import preprocessing
import python_speech_features as mfcc

def extract_features(audio,rate):
"""extract 20 dim mfcc features from an audio, performs CMS and combines 
delta to make it 40 dim feature vector"""    

        mfcc_feature = mfcc.mfcc(audio,rate, 0.025, 0.01,20,nfft = 1200, appendEnergy = True)    
        mfcc_feature = preprocessing.scale(mfcc_feature)
        delta = calculate_delta(mfcc_feature)
        combined = np.hstack((mfcc_feature,delta)) 
        return combined