[Voiceprint Recognition] Voiceprint Recognition Algorithm Based on MFCC

Mel-frequency cepstral coefficients (MFCCs) are a representation of the short-term power spectrum of a sound, based on a linear cosine transform of the logarithmic power spectrum on the nonlinear Mel frequency scale. MFCC is one of the most widely used features in automatic speech recognition, and it is also widely used in voiceprint recognition. The MFCC feature extraction process is shown in Figure 1.

Figure 1 MFCC feature extraction process

1) Pre-processing operations such as pre-emphasis, framing and windowing are performed on the original speech to obtain a short-term signal x(n);

2) Perform a fast Fourier transform (FFT) on each short-term signal x(n) to obtain the corresponding linear spectrum Xa(k);

3) Take the square of the modulus of Xa(k) to obtain the discrete power spectrum X(k);

4) Filter the obtained spectrum X(k) through the Mel filter bank [4], and then calculate the logarithmic energy mi of the filter bank output;

5) Perform discrete cosine transform (DCT) on mi to obtain MFCC, which can be simplified as:

In the formula: Cn represents the coefficient of MFCC; L represents the order of MFCC.
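The formula referred to above is commonly written in the following form. This is a sketch under assumptions: M denotes the number of Mel filters and m_k is the logarithmic energy of the k-th filter (the text's mi). The symbol M and the index k are not defined in the original, which only names Cn and L.

```latex
C_n = \sum_{k=1}^{M} m_k \cos\!\left[\frac{\pi n\,(k - 0.5)}{M}\right], \qquad n = 1, 2, \ldots, L
```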

Experiments show that once the order exceeds a certain level, further increases bring very little improvement in recognition performance while greatly increasing the complexity of the system. Therefore, in practical applications, cepstral coefficients of order 12 to 16 are generally sufficient to achieve good recognition performance.
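To illustrate keeping only the low-order coefficients, the sketch below evaluates the DCT sum of step 5 directly for n = 1 to L with L = 12. It assumes the standard DCT-II form of the sum, M = 20 Mel filters, and placeholder log energies instead of real speech data. The names mk, M, L, C and nn are chosen for this example only.

```matlab
% Sketch: C_n = sum_{k=1}^{M} m_k * cos(pi*n*(k-0.5)/M) for n = 1..L
% (assumed standard DCT-II form; all values below are illustrative).
M  = 20;                 % number of Mel filters (assumed)
L  = 12;                 % number of cepstral coefficients kept (assumed, within 12-16)
mk = log(1 + (1:M)');    % placeholder log energies m_k, not real speech data
k  = (1:M)';             % filter index as a column vector
C  = zeros(L, 1);        % one coefficient per order n = 1..L
for nn = 1:L
    C(nn) = sum(mk .* cos(pi * nn * (k - 0.5) / M));
end
disp(C');                % the L retained coefficients for this one frame
```

MATLAB's dct function applies an extra scale factor of sqrt(2/M) to these terms, so its output differs from C by that constant. The order n = 0 term is left out here, which corresponds to removing the first row of the dct output.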

function c=mfcc(s,fs) % Create a function mfcc, where c is the output variable, mfcc is the function name, and s and fs are the input variables;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  function: mfcc() Calculate Mel-frequency cepstral coefficients
% input :    s: input digital speech signal; fs: sampling frequency
% output:    c: MFCC feature coefficients (one column per frame)
% rewriter: zhuchunqiang
% time:     2020.5.29
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

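% pre-emphasis
a=0.98;
len=length(s);
s1=zeros(len,1); % preallocate the pre-emphasized signal
s1(1)=s(1); % the first sample has no predecessor, so it is copied unchanged
for i=2:len
    s1(i)=s(i)-a*s(i-1); % s1 is the pre-emphasized signal
end
%figure(2),plot(s1),title('pre-emphasized signal');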

% Frame parameters

n=320; % number of samples per frame
m=160; % frame shift: distance between the starting points of adjacent frames
% [Pxx,w]=pwelch(s1,n,m,256,10000); % optional: estimate the power spectral density of s1

% Framing
frame=floor((len-n)/m)+1; % number of frames; floor rounds down to the nearest integer
z=zeros(n,frame); % preallocate: one column per frame
for j=1:frame % one column is one frame
    for i=1:n
        z(i,j)=s1((j-1)*m+i); % sample i of frame j; indexing starts at s1(1), which exists even though the pre-emphasis loop starts at i=2
    end
end

% Hamming window
h=hamming(n);
for j=1:frame
    for i=1:n
        z2(i,j)=h(i)*z(i,j); % multiply each frame by the Hamming window
    end
end
% z3=z2(:);
% figure(4),plot(z3),title('window')

% FFT
for j=1:frame
    FFT(:,j)=fft(z2(:,j)); % Fourier transform of each windowed frame
end

% melfb generates a Mel-domain filter bank
m=melfb(20,n,fs); % call the melfb function (defined elsewhere) with p=20 filters; note that this reuses the name m, which held the frame shift above
n2=1+floor(n/2); % number of FFT bins from 0 to fs/2
mel=m*abs(FFT(1:n2,:)).^2; % abs(FFT(1:n2,:)).^2 is the power spectrum; multiplying by m gives the energy in each triangular Mel filter
c=dct(log(mel)); % take the logarithm of the filter bank output, then apply the DCT to obtain the Mel cepstral coefficients
c(1,:)=[]; % remove the first row of c (the zeroth-order coefficient)

% Process summary: input speech - pre-emphasis - framing - windowing - FFT - weighting of the power spectrum by the Mel filter bank - logarithm of the weighted energies - DCT - Mel cepstral coefficients;
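The listing calls melfb, which is not shown in this article. The following is a minimal sketch of what such a function could look like, not the melfb the author used. It assumes triangular filters with unit peak height, edges equally spaced on the Mel scale between 0 and fs/2, and the common mapping mel = 2595*log10(1 + f/700). The name melfb_sketch and all internal variable names are hypothetical.

```matlab
function fb = melfb_sketch(p, n, fs)
% MELFB_SKETCH  Hypothetical stand-in for melfb: p triangular Mel filters.
%   fb is p-by-(1+floor(n/2)); row i holds the weights of filter i
%   for the FFT bins from 0 Hz to fs/2 of an n-point FFT.
hz2mel = @(f) 2595 * log10(1 + f / 700);       % Hz to Mel (assumed variant)
mel2hz = @(mm) 700 * (10 .^ (mm / 2595) - 1);  % Mel to Hz (inverse mapping)
nbins  = 1 + floor(n / 2);                     % number of bins from 0 to fs/2
fbin   = (0:nbins-1) * fs / n;                 % frequency of each FFT bin in Hz
% p+2 edge frequencies, equally spaced on the Mel scale
edges  = mel2hz(linspace(0, hz2mel(fs / 2), p + 2));
fb = zeros(p, nbins);
for i = 1:p
    lo = edges(i); mid = edges(i+1); hi = edges(i+2);
    rise = (fbin - lo) / (mid - lo);           % rising side of the triangle
    fall = (hi - fbin) / (hi - mid);           % falling side of the triangle
    fb(i, :) = max(0, min(rise, fall));        % triangle, clipped at zero
end
end
```

With the frame length used above, melfb_sketch(20, 320, 8000) would return a 20-by-161 matrix, which has the shape needed for the product m*abs(FFT(1:n2,:)).^2. The sampling frequency 8000 is only an example value. Other implementations normalize the filters differently or use a different Mel approximation, so the resulting coefficients are not expected to match across implementations.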

【references】

[1] Wang Tao, Wang Guozhong, Zhu Linlin, et al. Design and implementation of an intelligent door lock system based on voiceprint recognition[J]. Electronic Measurement Technology, 2019, 42(3): 107-111.

[2] Li Hong, Xu Xiaoli, Wu Guoxin, et al. Research on speech emotion feature extraction based on MFCC[J]. Journal of Electronic Measurement and Instrumentation, 2017, 31(3): 448-453.

[3] Liu Xiang, Sun Jing, Zhao Yang, et al. Research on heart sound signal feature extraction and recognition based on MFCC[J]. Electronic Measurement Technology, 2018, 41(2): 1-5.

[4] Hu Zhengquan, Zeng Yumin, Zong Yuan, et al. Improvement of MFCC parameter extraction in speaker recognition[J]. Computer Engineering and Applications, 2014, 50(7): 217-220.

[5] Zhou Ping, Li Xiaopan, Li Jie, et al. Mixed MFCC feature parameters applied to speech emotion recognition[J]. Computer Measurement and Control, 2013, 21(7): 1966-1968, 1986.
