Audio Processing#

Objective#

The objective of this project is to implement an automatic system for the determination of the severity level of Parkinson’s Disease (PD) of a patient by using speech features. This system takes a speech utterance from an unknown speaker and provides their level of PD by analyzing their voice by means of machine learning techniques.

Patients with PD usually have difficulties in speaking because of reduced coordination of the muscles involved in the human speech production system. This causes distortions in the phoneme articulation, prosody, etc., diminishing the subject’s speech intelligibility.

We have tried two main approaches in this project: one based on fixed-length speech features and another based on time-varying speech features (sequences of features).

In both approaches we have considered different models, and in the first one also several feature extraction methods. We then selected the best overall alternative in terms of predictive performance.

The models considered are the following:

  • In the approach based on speech features of fixed length:

    • Random Forest (RF)

    • Extreme Gradient Boosting (XGBoost)

    • Multi-layer Perceptron (MLP)

    • Two Neural Networks implemented by means of PyTorch

  • In the approach based on time-varying speech features (sequences of features):

    • Gaussian Mixture Models (GMM)

    • Recurrent Neural Networks (RNN) implemented in PyTorch

The feature extraction methods are the following:

  • Mel-Frequency Cepstrum Coefficients (MFCC)

  • Chromagram

  • Spectral Centroid

  • Spectral Bandwidth

  • Spectral Contrast

  • Spectral Rolloff

  • Zero Crossing Rate

  • Tempogram

Requirements#

import numpy  as np
import polars as pl
import sys
import matplotlib.pyplot as plt
import librosa  # package for speech and audio analysis
import IPython.display as ipd
import seaborn as sns
sns.set_style('whitegrid')
sys.path.insert(0, r"C:\Users\fscielzo\Documents\Packages\PyAudio_Package_Private")
from PyAudio.preprocessing import get_X_audio_features, get_X_tensor_audio_features

Data#

We have a database composed of the following elements:

  • 20 speakers

  • 12 audios per speaker

  • Speakers with different Parkinson’s disease levels:

    • normal (0)

    • slight (1)

    • moderate (2)

    • severe (3)

The database has been recorded at a sampling frequency of 16000 Hz.

The dataset has been manually annotated following a subset of the Unified Parkinson’s Disease Rating Scale (UPDRS), a scoring scale used by neurologists for the clinical assessment of PD.

Reading Speech Files#

In this section we show how to read speech files by means of librosa.

Reading a class 0 (normal) speech#

Here we read a normal (0) speech.

# Reading a speech file from the database - Class 0

fs = 16000  # sampling frequency
audio_file = 'PDSpeechData/loc17/loc17_s01.wav'  # speech file

# 'audio_signal_1' is an array with the amplitude of the audio signal along the time:
audio_signal_1, sr = librosa.load(audio_file, sr=fs)

We have set a sampling frequency of 16000 Hz, which means that each second of audio is represented by 16000 amplitude values. The audio file is about 13 seconds long, so we expect approximately 13*16000 = 208000 amplitude values (the exact length, shown below, is 216500 samples, that is, about 13.5 seconds). These amplitude values are stored in the array audio_signal_1.

The underlying sound is continuous in time, so any one-second interval contains infinitely many amplitude values; the digital signal keeps only 16000 equally spaced samples per second. librosa.load(audio_file, sr=fs) returns the signal at that rate, resampling the file if it was stored at a different one.

audio_signal_1
array([ 0.04125977,  0.05276489,  0.03128052, ..., -0.00030518,
       -0.00030518, -0.00033569], dtype=float32)
time_audio_signal_1 = 13
time_audio_signal_1 * fs
208000
audio_signal_1.shape
(216500,)
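
The array is somewhat longer than 13*16000 because 13 seconds is a rounded figure. As a small sketch that uses only the array length printed above (the variable names are illustrative), the exact duration can be recovered by dividing the number of samples by the sampling frequency:

```python
fs = 16000            # sampling frequency (Hz), as used above
n_samples = 216500    # length of audio_signal_1, taken from the output above

# Duration in seconds = number of samples / samples per second
duration_s = n_samples / fs
print(duration_s)     # 13.53125
```
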
  • Plotting the audio signal amplitudes along time

audio_signal = audio_signal_1
filter = range(0,500)
fig, axes = plt.subplots(2, 1, figsize=(12,7))
axes = axes.flatten()  

sns.lineplot(y=audio_signal, x=range(len(audio_signal)), color='blue', ax=axes[0])
sns.lineplot(y=audio_signal[filter], x=filter, color='blue', ax=axes[1])

axes[0].set_title(audio_file.split('/')[-1] + ' - Class 0 (normal)')
axes[1].set_title(audio_file.split('/')[-1] + f' - {str(filter)}')

for i in range(len(axes)):
    axes[i].set_ylabel('Amplitude', size=11)
    axes[i].set_xlabel('Time', size=11)
plt.subplots_adjust(hspace=0.4, wspace=0.5) 
[Figure: amplitude of loc17_s01.wav over time. Top panel: full signal, Class 0 (normal). Bottom panel: first 500 samples.]
  • Displaying the audio file as sound:

ipd.Audio(audio_signal_1, rate=fs)

Reading a class 3 (severe) speech#

Now we read a severe (3) speech.

# Reading a speech file from the database - Class 3

fs = 16000  # sampling frequency
audio_file = 'PDSpeechData/loc18/loc18_s01.wav'  # speech file

audio_signal_2, sr = librosa.load(audio_file, sr=fs)
# 'audio_signal_2' is an array with the amplitude of the audio signal along the time
audio_signal_2
array([-0.25128174, -0.37490845, -0.24560547, ..., -0.04568481,
       -0.04675293, -0.04760742], dtype=float32)
time_audio_signal_2 = 7
time_audio_signal_2 * fs
112000
audio_signal_2.shape
(122500,)
  • Plotting the audio signal amplitudes along time

audio_signal = audio_signal_2
filter = range(0,500)
fig, axes = plt.subplots(2, 1, figsize=(12,7))
axes = axes.flatten()  

sns.lineplot(y=audio_signal, x=range(len(audio_signal)), color='blue', ax=axes[0])
sns.lineplot(y=audio_signal[filter], x=filter, color='blue', ax=axes[1])

axes[0].set_title(audio_file.split('/')[-1] + ' - Class 3 (severe)')
axes[1].set_title(audio_file.split('/')[-1] + f' - {str(filter)}')

for i in range(len(axes)):
    axes[i].set_ylabel('Amplitude', size=11)
    axes[i].set_xlabel('Time', size=11)
plt.subplots_adjust(hspace=0.4, wspace=0.5) 
[Figure: amplitude of loc18_s01.wav over time. Top panel: full signal, Class 3 (severe). Bottom panel: first 500 samples.]
  • Displaying the audio file as sound:

# Play the audio data
ipd.Audio(audio_signal_2, rate=fs)

Feature extraction#

In the following section we show an example of feature extraction for the speech signal audio_signal_1 read above.

The goal is, given an audio signal, to extract features that characterize it and can be used with Machine Learning algorithms, in this case to classify a new signal into one of the four PD levels mentioned above.

We distinguish two types of audio features:

  • Time-varying features (sequences)

    • This type is suitable for models that work well with sequential data, like Recurrent Neural Networks and Gaussian Mixture Models.

  • Fixed-length features

    • This type is suitable for models that work with tabular data, like Random Forest, XGBoost and Multi-Layer Perceptron Neural Networks.

    • These features are basically statistics computed on the time-varying features.

The feature extraction methods that we are going to use in this project are the following:

  • Mel-Frequency Cepstrum Coefficients (MFCC)

  • Chromagram

  • Spectral Centroid

  • Spectral Bandwidth

  • Spectral Contrast

  • Spectral Rolloff

  • Zero Crossing Rate

  • Tempogram

These methods are applied directly to the amplitude series of a given audio signal and return a matrix with the time-varying features for that signal. We can then compute a vector of statistics over the features of that matrix; these are the fixed-length features of that audio signal.

This process can be repeated for all \(n\) available audio signals, yielding \(n\) matrices of time-varying features.

These \(n\) matrices can then be arranged in a 3D array, also known as a tensor, which can be used as input by different Machine Learning algorithms, typically deep learning ones such as Recurrent Neural Networks (RNN).

These \(n\) matrices (2D arrays) can also be transformed into vectors (1D arrays) by computing statistics such as the mean or standard deviation of the features they contain. We then have \(n\) 1D arrays that can be stacked to build a 2D array, which can be interpreted as a predictor matrix (tabular data) and used as input by classic Machine Learning algorithms that work well with tabular data.

Mel-Frequency Cepstrum Coefficients (MFCC)#

MFCCs are a compact representation of the short-term power spectrum of an audio signal, obtained by applying a cosine transform to the log power spectrum on a nonlinear Mel frequency scale. The Mel scale aims to mimic the human ear’s response more closely than the linearly spaced frequency bands of the usual Fourier transform. This makes MFCCs particularly useful for applications like speech recognition or music analysis, where a perception-like representation of audio can be more beneficial than a purely physical one.
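
To make the nonlinearity of the Mel scale concrete, here is a minimal sketch of one common Hz-to-mel conversion, the HTK-style formula. This is an illustrative assumption: librosa's default conversion uses a different (Slaney-style) variant, and the helper name hz_to_mel below is defined here only for the example.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert Hz to mel with the HTK-style formula (one common convention)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

freqs_hz = np.array([250.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0])
mels = hz_to_mel(freqs_hz)

# Mels gained per Hz between consecutive frequencies: the values shrink,
# i.e. high frequencies are compressed relative to low ones.
mel_per_hz = np.diff(mels) / np.diff(freqs_hz)
print(np.round(mels, 1))
print(np.round(mel_per_hz, 2))
```

With this formula 1000 Hz maps to roughly 1000 mel, and the resolution in mel per Hz drops steadily as frequency increases, which is why a mel filterbank has narrow filters at low frequencies and wide ones at high frequencies.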

The MFCC components refer to the number of coefficients extracted from each frame of the audio signal. The choice of the number of components is somewhat arbitrary but is based on empirical evidence suggesting that the first few coefficients (usually the first 12 to 20) capture most of the useful information about the spectral envelope of the audio signal. The higher coefficients, which represent finer details of the spectrum, are often discarded.

In this case, we want to compute a sequence of Mel-Frequency Cepstrum Coefficients (MFCC) with the following configuration:

  • Size of the analysis window = 32 ms = 0.032 secs

  • Frame period or hop length = 8 ms = 0.008 secs

  • Number of filters in the mel filterbank = 40

  • Number of MFCC components = 20

To do so, we use the function mfcc from the feature module of the librosa package. This function has, among others, the following input arguments:

  • y: speech signal

  • sr: sampling frequency

  • n_fft: window size (in samples)

  • hop_length: frame period or hop length (in samples)

  • n_mels: number of filters in the mel filterbank

  • n_mfcc: number of MFCC components

Note that in this function the window size and the hop length must be expressed in samples. Taking into account that the sampling frequency (fs) indicates that 1 second corresponds to fs samples (in our case, as fs=16000 Hz, 1 second corresponds to 16000 samples), the conversion from seconds to samples is performed by:

samples = seconds*fs = seconds*16000
# Specifying variables for feature extraction
fs = 16000 # Sampling frequency
wst = 0.032 # Window size (seconds)
fpt = 0.008 # Frame period (seconds)
nfft = int(np.ceil(wst*fs)) # Window size (samples)
fp = int(np.ceil(fpt*fs)) # Frame period (samples)
nbands = 40 # Number of filters in the filterbank
ncomp = 20 # Number of MFCC components
# Feature extraction with MFCC 
x_MFCC = librosa.feature.mfcc(y=audio_signal_1, sr=fs, n_fft=nfft, hop_length=fp, n_mels=nbands, n_mfcc=ncomp).T
x_MFCC
array([[-1.5723482e+02,  8.1401794e+01, -1.0822573e+01, ...,
        -1.4883176e+00, -2.3103115e-01, -5.7656441e+00],
       [-1.4987587e+02,  8.2267227e+01, -1.5025454e+01, ...,
        -4.9614630e+00, -5.0628245e-01, -1.0207132e+01],
       [-1.5724088e+02,  7.8481644e+01, -2.0651829e+01, ...,
        -8.5300264e+00, -1.4817656e+00, -1.2810444e+01],
       ...,
       [-4.8450891e+02,  1.6448849e+01,  1.2042265e+01, ...,
         1.0532631e+00,  6.7879003e-01,  3.1149516e-01],
       [-4.8156741e+02,  2.0192478e+01,  1.4441982e+01, ...,
         8.4327966e-01, -4.1927201e-01,  1.3737071e-01],
       [-4.7447165e+02,  2.9626501e+01,  2.1557457e+01, ...,
         5.4327607e-01, -3.4168136e-01, -1.5610456e-01]], dtype=float32)
x_MFCC.shape
(1692, 20)

x_MFCC is a time-samples x ncomp = 1692 x 20 matrix, whose columns represent features and whose rows represent observations (frames).

  • 20 is the number of Mel-Frequency Cepstrum Coefficients (MFCC) components, and is defined a priori as a hyper-parameter.

  • 1692 is the number of time-samples (frames) for that audio signal, which here equals the number of samples of the audio signal divided by the frame period (in samples), rounded up.

ncomp
20
n_samples_audio_signal_1 = len(audio_signal_1)
time_samples_audio_signal_1 = np.ceil(n_samples_audio_signal_1 / fp)
time_samples_audio_signal_1
1692.0
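
As a hedged aside, the rounded-up ratio matches the frame count here, but assuming librosa's default centered framing (center=True), the number of frames is 1 + floor(n_samples / hop_length). The sketch below compares both expressions; the helper name n_frames_centered is hypothetical.

```python
import math

def n_frames_centered(n_samples, hop_length):
    """Frame count of a centered, padded STFT: 1 + floor(n_samples / hop_length)."""
    return 1 + n_samples // hop_length

n_samples, hop = 216500, 128   # values taken from the text above
print(n_frames_centered(n_samples, hop))   # 1692
print(math.ceil(n_samples / hop))          # 1692

# The two expressions differ when n_samples is an exact multiple of hop_length
print(n_frames_centered(1280, hop), math.ceil(1280 / hop))   # 11 10
```
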

The columns of x_MFCC, which are the MFCC components, can be interpreted as features of the audio signal, and its rows as the values of these features over time; this is why we call them ‘time-varying’ features.

It’s important to realize that x_MFCC is the MFCC matrix for one single audio signal, specifically audio_signal_1.

But we want to obtain this matrix for all the available audio signals, in order to build a tensor or a predictors matrix to be used with Machine Learning models in the classification task that is the main goal of this project.

From MFCC matrices to tensor#

The task is to transform the \(n\) MFCC matrices (one per audio) into a tensor (3D array) of shape n x ncomp x max-n_time_samples.

To do so we use our custom function get_X_tensor_audio_features, which takes a list of audio file paths, a feature extraction method and the parameters for that method. It loads each audio file as a signal and applies the specified feature extraction method to it, obtaining its time-varying features.

This process is repeated for each audio file, obtaining \(n\) MFCC matrices of size ncomp x max-n_time_samples, and the results are stored in a 3D array, that is, a tensor, of size n x ncomp x max-n_time_samples. The final output of our function is the desired tensor.

We explain later why the last dimension of the tensor is max-n_time_samples.

We read all the audio file paths along with their classes, and arrange them in a data-frame.

files_list_name = 'Files_List.txt'
files_df = pl.read_csv(files_list_name, separator='\t', has_header=False, new_columns=['path', 'level'])
files_df.head(3)
shape: (3, 2)
path                level
str                 i64
"PDSpeechData/l…    0
"PDSpeechData/l…    0
"PDSpeechData/l…    0

We have 240 audio files.

files_df.shape
(240, 2)

Now we can process all those audio files, extract time-varying features from them and build a tensor to be used in ML models like RNN.

In this case we are using MFCC as the feature extraction method, with the parameters defined previously.

X_MFCC_tensor = get_X_tensor_audio_features(paths=files_df['path'], method='MFCC', sr=fs, n_fft=nfft, hop_length=fp, n_mels=nbands, n_mfcc=ncomp)

As you can see, this is indeed a tensor, since it is a 3D array with shape (240, 20, 4403).

This means that for each of our 240 audios we have an MFCC matrix with the time-varying features that characterize it.

X_MFCC_tensor
array([[[-3.25005737e+02, -3.13607208e+02, -3.10144470e+02, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 6.67731781e+01,  7.59160614e+01,  7.93242340e+01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 1.46149426e+01,  1.50402470e+01,  1.77627831e+01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        ...,
        [-4.79171467e+00, -1.09497318e+01, -9.35082436e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [-3.61563778e+00, -1.05655603e+01, -6.97648907e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 8.39404225e-01, -1.94986522e+00,  3.36962867e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00]],

       [[-3.77771057e+02, -3.72950775e+02, -3.70798340e+02, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 1.37817211e+01,  1.93036613e+01,  2.15146694e+01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 8.53915405e+00,  1.09322910e+01,  1.14195042e+01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        ...,
        [ 2.99091190e-01,  1.31409812e+00, -1.37108326e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 1.42719173e+00,  2.57501340e+00,  6.80215955e-01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 1.89939249e+00,  3.12848997e+00,  3.17334318e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00]],

       [[-3.78744751e+02, -3.71267303e+02, -3.72638184e+02, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 3.59314842e+01,  4.38024063e+01,  4.41164932e+01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 8.90640831e+00,  1.22590275e+01,  1.53751431e+01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        ...,
        [-2.48288512e+00, -4.04505491e+00, -2.29102421e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 3.44525909e+00,  5.58475924e+00,  6.77276421e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 3.18025351e+00,  7.84416962e+00,  8.91281891e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00]],

       ...,

       [[-1.39822342e+02, -1.64667130e+02, -2.25252365e+02, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 1.03545357e+02,  1.08468513e+02,  9.81202545e+01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 9.58028412e+00,  1.53690796e+01,  3.00571747e+01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        ...,
        [ 5.99450636e+00,  1.21303453e+01,  3.11318607e+01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 3.47830296e+00,  6.51342058e+00,  1.70105896e+01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [-4.55060124e-01,  6.44598126e-01,  3.88159275e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00]],

       [[-1.84438980e+02, -1.90262070e+02, -2.01296692e+02, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 7.01741333e+01,  7.43718948e+01,  7.84279633e+01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 1.31448584e+01,  1.41633148e+01,  1.33200607e+01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        ...,
        [-1.57305622e+00,  9.78493154e-01, -5.54744959e-01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 4.42293644e+00,  2.46918893e+00,  2.14608788e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 3.28937721e+00,  3.15117645e+00,  1.88868785e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00]],

       [[-1.69065125e+02, -1.80024323e+02, -1.97941238e+02, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 7.73344879e+01,  7.82881241e+01,  7.72418594e+01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 2.05024719e+01,  2.66848755e+01,  3.05285492e+01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        ...,
        [-5.02676678e+00, -7.10136032e+00, -8.99715805e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [-4.17449474e-01, -1.66777277e+00, -2.24128127e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [-2.11845565e+00, -3.21187496e+00, -4.22027206e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00]]])
X_MFCC_tensor.shape
(240, 20, 4403)

For example, this is the MFCC matrix for our first audio file:

X_MFCC_tensor[0]
array([[-325.0057373 , -313.60720825, -310.14447021, ...,    0.        ,
           0.        ,    0.        ],
       [  66.7731781 ,   75.9160614 ,   79.32423401, ...,    0.        ,
           0.        ,    0.        ],
       [  14.61494255,   15.04024696,   17.76278305, ...,    0.        ,
           0.        ,    0.        ],
       ...,
       [  -4.79171467,  -10.94973183,   -9.35082436, ...,    0.        ,
           0.        ,    0.        ],
       [  -3.61563778,  -10.56556034,   -6.97648907, ...,    0.        ,
           0.        ,    0.        ],
       [   0.83940423,   -1.94986522,    3.36962867, ...,    0.        ,
           0.        ,    0.        ]])
X_MFCC_tensor[0].shape
(20, 4403)

As you can see, there are zeros at the end of each row; these are the result of padding the MFCC matrix.

Each audio has an MFCC matrix of a different size (in terms of time-samples, since all of them have the same number of components). To store all of them in a 3D array we need to enforce the same size, which is done by forcing every MFCC matrix to have the same number of time-samples, specifically the maximum one (max-n_time_samples), that is, the one of the largest MFCC matrix. All the MFCC matrices except the largest one therefore get extra positions along the time-samples axis, and those extra positions are filled with zeros. This process is called padding, and is done automatically by our function get_X_tensor_audio_features.
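
The padding step can be sketched with NumPy alone. This is an illustration on synthetic matrices under the assumptions stated in the comments, not the actual implementation of get_X_tensor_audio_features; the helper pad_and_stack is hypothetical.

```python
import numpy as np

def pad_and_stack(matrices):
    """Zero-pad (n_features, n_frames) matrices on the time axis and stack them."""
    max_frames = max(m.shape[1] for m in matrices)
    n_features = matrices[0].shape[0]   # assumed equal for every matrix
    tensor = np.zeros((len(matrices), n_features, max_frames), dtype=matrices[0].dtype)
    for i, m in enumerate(matrices):
        tensor[i, :, :m.shape[1]] = m   # copy the real frames; the rest stays zero
    return tensor

# Three synthetic "feature matrices" with 20 features and different lengths
rng = np.random.default_rng(0)
mats = [rng.normal(size=(20, t)) for t in (5, 8, 3)]
X = pad_and_stack(mats)
print(X.shape)   # (3, 20, 8)
```

Only the longest matrix fills its slice completely; the others end in columns of zeros, as in the output shown above.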

From MFCC matrices to predictors matrix (tabular data)#

The task is to transform the \(n\) MFCC matrices (one per audio) into a predictors matrix (2D array) of shape n x ncomp (or n x k*ncomp when \(k\) statistics are combined, as explained below).

To do so we use our custom function get_X_audio_features, which takes a list of audio file paths, a feature extraction method and the parameters for that method. It loads each audio file as a signal and applies the specified feature extraction method to it, obtaining its time-varying features. Statistics are then computed for each feature along the time dimension, obtaining a vector (1D array) of size ncomp per statistic.

This process is repeated for each audio file, obtaining \(n\) vectors, and the results are stored in a 2D array, that is, a matrix with \(n\) rows. The final output of our function is the desired predictors matrix.

Here, as an example, we compute two possible predictors (features) matrices using the MFCC method for feature extraction and two different statistics configurations, one with the mean and another with both the mean and the standard deviation.

X_MFCC_stats_1 = get_X_audio_features(paths=files_df['path'], method='MFCC', stats='mean', sr=fs, n_fft=nfft, hop_length=fp, n_mels=nbands, n_mfcc=ncomp)
X_MFCC_stats_2 = get_X_audio_features(paths=files_df['path'], method='MFCC', stats='mean-std', sr=fs, n_fft=nfft, hop_length=fp, n_mels=nbands, n_mfcc=ncomp)

These matrices have predictors as columns and observations/samples as rows.

Since we are working with 240 audios these matrices will have 240 rows, and the number of predictors depends on the statistics used.

If only one statistic is used, as in the first case, the number of predictors equals the number of components fixed for the feature extraction method, in this case 20.

If we use a combination of several statistics, say \(k\), the number of predictors will be k*ncomp.

In the first case, since we have used only the mean, the number of predictors is equal to the number of MFCC components, that is, 20. These predictors represent the mean of the time-varying components for each audio. For example, the first predictor contains the mean of the first time-varying MFCC component (the mean of that component over time) for each of the 240 available audios.

In the second case we have used two statistics, so the number of predictors is 40 (2*20): the first 20 predictors are the means of the 20 time-varying components for each audio file, and the next 20 are the standard deviations.
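
The reduction from time-varying matrices to fixed-length vectors can be sketched as follows. This is a simplified illustration on synthetic data, assuming matrices of shape (n_features, n_frames); the helper summarize is hypothetical and is not the code of get_X_audio_features.

```python
import numpy as np

def summarize(features, stats=("mean", "std")):
    """Collapse a (n_features, n_frames) matrix into a 1D vector of statistics."""
    funcs = {"mean": np.mean, "std": np.std, "median": np.median}
    # One block of n_features values per statistic, in the order given
    return np.concatenate([funcs[s](features, axis=1) for s in stats])

# Three synthetic "MFCC matrices" with 20 components and different lengths
rng = np.random.default_rng(0)
mats = [rng.normal(size=(20, t)) for t in (50, 80, 30)]

X_mean = np.stack([summarize(m, stats=("mean",)) for m in mats])
X_mean_std = np.stack([summarize(m) for m in mats])
print(X_mean.shape, X_mean_std.shape)   # (3, 20) (3, 40)
```

Because every vector has the same length regardless of the audio duration, no padding is needed in this approach.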

This idea can clearly be generalized: we can use any combination of statistics and feature extraction methods to build predictors matrices for ML algorithms in predictive scenarios like this one.

In the predictive part of this project we will explore these alternatives, combining both different statistics and different feature extraction methods.

Note: when we talk about combining feature extraction methods what we mean is to obtain matrices using different methods and then concatenate them to form a single predictors matrix that combines the information of them, which could improve the predictive performance of certain models. This option has been considered in the predictive part.

X_MFCC_stats_1
array([[-2.13158752e+02,  9.29840240e+01, -6.40546494e+01, ...,
        -2.17615294e+00,  2.29761638e-02, -5.47736108e-01],
       [-1.98984543e+02,  6.80483322e+01, -4.74277916e+01, ...,
         4.45366669e+00,  9.83190179e-01, -4.84144878e+00],
       [-2.40027390e+02,  5.97654343e+01, -2.60434890e+00, ...,
         7.12371826e-01, -3.60437250e+00, -7.20492887e+00],
       ...,
       [-2.91605225e+02,  8.18440247e+01,  2.95778332e+01, ...,
         1.58350534e+01,  1.41482115e+01,  1.16930342e+01],
       [-2.12697678e+02,  7.87770691e+01,  1.69434319e+01, ...,
        -4.82905912e+00,  2.94541955e+00,  1.33842838e+00],
       [-2.09183228e+02,  7.25918579e+01,  3.24228172e+01, ...,
        -1.01590958e+01,  1.20261431e+00, -3.76960754e+00]], dtype=float32)
X_MFCC_stats_1.shape
(240, 20)
X_MFCC_stats_2
array([[-213.15875  ,   92.984024 ,  -64.05465  , ...,    7.095715 ,
           6.177743 ,    5.351471 ],
       [-198.98454  ,   68.04833  ,  -47.42779  , ...,    6.012935 ,
           5.59839  ,    4.8842993],
       [-240.02739  ,   59.765434 ,   -2.604349 , ...,    5.646291 ,
           5.4601407,    6.5454264],
       ...,
       [-291.60522  ,   81.844025 ,   29.577833 , ...,   13.644189 ,
          11.3029785,    9.850929 ],
       [-212.69768  ,   78.77707  ,   16.943432 , ...,    2.2818298,
           2.13314  ,    2.4295764],
       [-209.18323  ,   72.59186  ,   32.422817 , ...,    2.3896735,
           3.0300555,    2.1277957]], dtype=float32)
X_MFCC_stats_2.shape
(240, 40)

Example of predictors matrix that combines different features extraction methods and statistics#

X_MFCC_stats = get_X_audio_features(paths=files_df['path'], method='MFCC', stats='mean-std', sr=fs, n_fft=nfft, hop_length=fp, n_mels=nbands, n_mfcc=ncomp)
X_chroma_stats = get_X_audio_features(paths=files_df['path'], method='chroma', stats='median-std', sr=fs, n_fft=nfft, hop_length=fp, n_mels=nbands, n_mfcc=ncomp)
X_MFCC_stats.shape
(240, 40)
X_chroma_stats.shape
(240, 24)
X_combined = np.concatenate((X_MFCC_stats, X_chroma_stats), axis=1)
X_combined
array([[-2.1315875e+02,  9.2984024e+01, -6.4054649e+01, ...,
         2.7830657e-01,  3.4890896e-01,  1.7714919e-01],
       [-1.9898454e+02,  6.8048332e+01, -4.7427792e+01, ...,
         3.7769288e-01,  1.5670222e-01,  2.0125444e-01],
       [-2.4002739e+02,  5.9765434e+01, -2.6043489e+00, ...,
         1.8112896e-01,  2.1978563e-01,  3.3394688e-01],
       ...,
       [-2.9160522e+02,  8.1844025e+01,  2.9577833e+01, ...,
         3.7720391e-01,  3.9648506e-01,  4.0834939e-01],
       [-2.1269768e+02,  7.8777069e+01,  1.6943432e+01, ...,
         4.0676277e-02,  7.6654352e-02,  9.9300966e-02],
       [-2.0918323e+02,  7.2591858e+01,  3.2422817e+01, ...,
         2.8915258e-02,  3.2900050e-02,  3.4263603e-02]], dtype=float32)
X_combined.shape
(240, 64)

Chromagram#

Chroma features are a powerful tool for analyzing music. They capture the harmonic, melodic and tonal content of musical signals. By projecting the entire spectrum onto 12 bins representing the 12 distinct semitones (pitch classes) of the chromatic scale in Western music, chroma features provide a high-level representation of audio in terms of pitch classes, independent of the octave. This can be useful for capturing the musical aspects of speech, which could correlate with disease states.
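
The idea of folding the spectrum onto 12 bins can be illustrated by mapping single frequencies to their nearest pitch class. This is a deliberately simplified sketch: it assumes equal temperament with A4 = 440 Hz and a hard assignment per frequency, whereas a chromagram such as the one computed by librosa spreads spectral energy over the bins with a filterbank. The helper pitch_class is hypothetical.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(f_hz, a4_hz=440.0):
    """Map a frequency to its nearest pitch class (0 = C, ..., 11 = B)."""
    midi = 69 + 12 * np.log2(f_hz / a4_hz)   # MIDI note number, A4 = 69
    return int(np.round(midi)) % 12

# Frequencies one octave apart fall into the same chroma bin
for f in (220.0, 440.0, 880.0):
    print(f, PITCH_CLASSES[pitch_class(f)])   # each one maps to "A"
```
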

# Feature extraction with Chromagram 
x_chroma = librosa.feature.chroma_stft(y=audio_signal_1, sr=fs, n_fft=nfft, hop_length=fp).T
x_chroma
array([[0.76920897, 0.776894  , 0.34586996, ..., 1.        , 0.63658094,
        0.48888198],
       [0.5365836 , 0.57130456, 0.07953023, ..., 1.        , 0.46217796,
        0.23196535],
       [0.59507215, 0.52440584, 0.04829044, ..., 1.        , 0.4172734 ,
        0.19030185],
       ...,
       [0.9172087 , 0.7431331 , 0.6895156 , ..., 0.8187049 , 0.9443104 ,
        1.        ],
       [0.9935548 , 0.934088  , 0.8525207 , ..., 0.9157632 , 0.99230087,
        1.        ],
       [0.93939584, 1.        , 0.96149266, ..., 0.8938448 , 0.92480946,
        0.94353443]], dtype=float32)
x_chroma.shape
(1692, 12)

From Chroma matrices to tensor#

X_chroma_tensor = get_X_tensor_audio_features(paths=files_df['path'], method='chroma', sr=fs, n_fft=nfft, hop_length=fp)
X_chroma_tensor
array([[[3.30834001e-01, 2.73084641e-01, 1.72916576e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [4.88729805e-01, 3.64229351e-01, 3.07732373e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [8.24708462e-01, 8.15815687e-01, 7.72476017e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        ...,
        [7.54282296e-01, 9.33874607e-01, 6.54246688e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [4.48958814e-01, 5.71549773e-01, 4.18673873e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [3.12655747e-01, 3.80960107e-01, 2.34689996e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00]],

       [[3.12040150e-01, 1.55773565e-01, 2.85095990e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [3.72878194e-01, 2.47185424e-01, 3.49979341e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [7.39971995e-01, 4.82317954e-01, 3.00103635e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        ...,
        [4.79578823e-01, 3.46601009e-01, 5.40694356e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [4.12601918e-01, 1.45907149e-01, 3.16213101e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [3.80797684e-01, 1.39935687e-01, 2.63296276e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00]],

       [[1.95253670e-01, 5.68398200e-02, 8.79043713e-02, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [2.79328316e-01, 1.09151907e-01, 1.37994885e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [4.16318625e-01, 2.32939079e-01, 2.45861769e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        ...,
        [1.15602165e-01, 4.24965061e-02, 6.30273521e-02, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [8.62279609e-02, 2.84093022e-02, 3.58258635e-02, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [1.22719601e-01, 3.72236483e-02, 4.10057195e-02, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00]],

       ...,

       [[5.85519336e-02, 5.56419091e-03, 3.07984083e-05, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [5.80269657e-02, 5.19187702e-03, 7.23518242e-05, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [7.61531293e-02, 5.30038262e-03, 1.78323127e-04, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        ...,
        [3.50973278e-01, 1.18097760e-01, 7.36766532e-02, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [1.37970760e-01, 1.42571330e-02, 1.69064559e-03, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [7.20229223e-02, 7.56322034e-03, 5.10804712e-05, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00]],

       [[5.96340179e-01, 4.91394937e-01, 4.66173530e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [7.40738869e-01, 6.75898135e-01, 6.55857086e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [8.93028915e-01, 8.82704973e-01, 8.74643266e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        ...,
        [4.46513116e-01, 2.38409176e-01, 1.97379783e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [4.71661240e-01, 3.15472454e-01, 2.87048072e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [5.34890950e-01, 3.94979298e-01, 3.70775521e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00]],

       [[5.93712926e-01, 3.97959381e-01, 3.34024280e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [6.57446086e-01, 5.45625567e-01, 5.14605403e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [8.18063438e-01, 7.82413363e-01, 7.81226754e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        ...,
        [4.74603772e-01, 2.39262685e-01, 2.04746559e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [5.70781469e-01, 3.43587726e-01, 2.86463976e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
        [6.04692757e-01, 3.66815865e-01, 2.94027925e-01, ...,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00]]])
X_chroma_tensor.shape
(240, 12, 4403)
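
The trailing zeros in the tensor above suggest that each recording's feature matrix is zero-padded on the right up to the length of the longest recording. The internals of get_X_tensor_audio_features are not shown here, so the following is only a minimal sketch of that idea under this assumption; the helper name pad_to_tensor and the toy data are hypothetical.

```python
import numpy as np

def pad_to_tensor(feature_matrices):
    """Stack (n_features, n_frames_i) matrices into one zero-padded 3-D array.

    Hypothetical helper: assumes every matrix has the same number of rows
    and that shorter recordings are padded with zeros on the right.
    """
    max_frames = max(m.shape[1] for m in feature_matrices)
    n_features = feature_matrices[0].shape[0]
    tensor = np.zeros((len(feature_matrices), n_features, max_frames))
    for i, m in enumerate(feature_matrices):
        # Copy the real frames; the remaining columns stay at zero.
        tensor[i, :, :m.shape[1]] = m
    return tensor

# Toy data: three "recordings" with 12 chroma bins and different lengths.
rng = np.random.default_rng(0)
toy = [rng.random((12, n_frames)) for n_frames in (5, 8, 3)]
toy_tensor = pad_to_tensor(toy)
print(toy_tensor.shape)  # (3, 12, 8)
```

With this layout the first axis indexes recordings, the second the feature dimension, and the third the (padded) time frames, which matches the (240, 12, 4403) shape reported above.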

From Chroma matrices to predictors matrix (tabular data)#

X_chroma_stats = get_X_audio_features(paths=files_df['path'], method='chroma', stats='mean-std', sr=fs, n_fft=nfft, hop_length=fp)
X_chroma_stats
array([[0.32919034, 0.48674947, 0.13121736, ..., 0.27830657, 0.34890896,
        0.17714919],
       [0.11371233, 0.19766188, 0.09701521, ..., 0.37769288, 0.15670222,
        0.20125444],
       [0.5519331 , 0.1438093 , 0.08654997, ..., 0.18112896, 0.21978563,
        0.33394688],
       ...,
       [0.25067928, 0.23875663, 0.26547787, ..., 0.3772039 , 0.39648506,
        0.4083494 ],
       [0.3393735 , 0.5189489 , 0.7807806 , ..., 0.04067628, 0.07665435,
        0.09930097],
       [0.30499187, 0.47429892, 0.7501884 , ..., 0.02891526, 0.03290005,
        0.0342636 ]], dtype=float32)
X_chroma_stats.shape
(240, 24)
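
The 24 columns are consistent with one mean and one standard deviation per chroma bin (12 + 12). As a rough sketch of how a 'mean-std' summary could be built for a single recording, assuming the statistics are taken over the time axis and the means are placed before the standard deviations (both are assumptions, since get_X_audio_features is not shown here):

```python
import numpy as np

def mean_std_summary(feature_matrix):
    """Collapse a (n_features, n_frames) matrix into a vector of length 2 * n_features.

    Hypothetical helper: the column order (all means, then all standard
    deviations) is an assumption made for illustration.
    """
    means = feature_matrix.mean(axis=1)  # one mean per feature row
    stds = feature_matrix.std(axis=1)    # one standard deviation per feature row
    return np.concatenate([means, stds])

# Toy chroma matrix: 12 pitch classes, 100 frames.
rng = np.random.default_rng(0)
toy_chroma = rng.random((12, 100))
row = mean_std_summary(toy_chroma)
print(row.shape)  # (24,)
```

Stacking one such row per recording gives a fixed-width predictors matrix regardless of the recording length.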

Spectral Centroid#

The Spectral Centroid represents the center of mass of the spectrum, providing a measure of the brightness of a sound. It is calculated as the weighted mean of the frequencies present in the sound, with their magnitudes as the weights. This feature gives an idea of how high or low the majority of the energy is in a sound spectrum.
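
To make the weighted mean concrete, here is a small NumPy sketch for a single frame. The function name and the toy spectrum are illustrative only; librosa applies the same idea to every frame of the magnitude spectrogram.

```python
import numpy as np

def spectral_centroid_frame(magnitudes, freqs):
    """Magnitude-weighted mean frequency of one spectrum frame."""
    return np.sum(freqs * magnitudes) / np.sum(magnitudes)

# Toy spectrum: four frequency bins (Hz) and their magnitudes.
freqs = np.array([0.0, 100.0, 200.0, 300.0])
mags = np.array([0.0, 1.0, 1.0, 2.0])
centroid = spectral_centroid_frame(mags, freqs)
print(centroid)  # 225.0
```

Because the 300 Hz bin carries the largest magnitude, the centroid is pulled above the midpoint of the occupied bins.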

# Feature extraction with spectral centroid 
x_spectral_centroid = librosa.feature.spectral_centroid(y=audio_signal_1, sr=fs, n_fft=nfft, hop_length=fp).T
x_spectral_centroid
array([[1173.49156085],
       [1173.44240948],
       [1225.6018241 ],
       ...,
       [2030.97788411],
       [2028.13073426],
       [1640.92396239]])
x_spectral_centroid.shape
(1692, 1)

From Spectral Centroid matrices to tensor#

X_spectral_centroid_tensor = get_X_tensor_audio_features(paths=files_df['path'], method='spectral_centroid', sr=fs, n_fft=nfft, hop_length=fp)
X_spectral_centroid_tensor
array([[[ 860.68242317,  737.32612638,  702.74733212, ...,
            0.        ,    0.        ,    0.        ]],

       [[1722.98555129, 1677.03902985, 1615.59102764, ...,
            0.        ,    0.        ,    0.        ]],

       [[1479.32504338, 1345.68771268, 1360.28338621, ...,
            0.        ,    0.        ,    0.        ]],

       ...,

       [[ 663.09360913,  586.61483978,  520.02672314, ...,
            0.        ,    0.        ,    0.        ]],

       [[1086.38030366,  971.71842423,  898.1939858 , ...,
            0.        ,    0.        ,    0.        ]],

       [[ 796.84943399,  704.26379079,  679.87846212, ...,
            0.        ,    0.        ,    0.        ]]])
X_spectral_centroid_tensor.shape
(240, 1, 4403)

From Spectral Centroid matrices to predictors matrix (tabular data)#

X_spectral_centroid_stats = get_X_audio_features(paths=files_df['path'], method='spectral_centroid', stats='mean-std', sr=fs, n_fft=nfft, hop_length=fp)
X_spectral_centroid_stats
array([[1222.51022012,  106.02869666],
       [1352.95675704,  292.90811358],
       [1573.29651356,  452.12319515],
       [1148.70689305,  102.08901252],
       [1030.46170316,  187.55238243],
       [1295.44084773,  208.27129492],
       [1263.05181668,  248.44794161],
       [1003.50312509,  133.57028631],
       [1070.73699491,  129.43194965],
       [1069.19199417,  149.36018421],
       [ 770.5005775 ,   59.60935519],
       [ 453.24031129,   29.71792047],
       [1344.7035196 ,  725.68509036],
       [1336.46868934,  608.80623466],
       [1227.09023004,  106.67904364],
       [1434.53104501,  117.39653373],
       [1932.02713538,  170.23029854],
       [ 932.59136027,  124.93733827],
       [ 696.62066277,  276.71068651],
       [1963.60256868,  720.26894443],
       [1915.64251365,  665.98189964],
       [ 919.65891344,  431.09062568],
       [1187.01412327,  447.66393129],
       [1199.32941239,  480.20315422],
       [ 852.61529121,  514.50788938],
       [ 546.81791865,  518.49770286],
       [1510.02991708,  871.70219926],
       [1461.9026807 ,  800.3389051 ],
       [1223.82662023,  393.717687  ],
       [1475.41466613,  469.09860929],
       [1659.40294229,  447.99165796],
       [ 925.68728539,  265.80684328],
       [ 548.3999444 ,  194.34276981],
       [1107.86299172,  579.81563808],
       [1111.05775534,  619.85001803],
       [ 998.67234754,  466.06555326],
       [1286.78972652,  485.63595525],
       [1117.61042445,  546.66797142],
       [ 827.13857334,  437.22395128],
       [ 625.32340127,  591.58583135],
       [1674.71503742,  880.832784  ],
       [1720.21758856,  893.16075877],
       [1103.06324294,  433.56105328],
       [ 844.4973583 ,  351.04323765],
       [ 718.10124674,  252.39332035],
       [ 673.64820996,  287.14242312],
       [ 410.11190074,  201.49233214],
       [1376.72184891,  792.21863795],
       [1599.82076939,  909.11019598],
       [1154.67824106,  125.49427835],
       [1282.40223179,  160.06453406],
       [1130.8653972 ,  131.49315532],
       [ 912.28588184,  266.4720747 ],
       [ 558.01275111,  173.14414971],
       [2232.41678324,  817.17077469],
       [2066.70869079,  803.96679082],
       [ 837.62234866,   38.27332706],
       [1088.90998097,  137.43539107],
       [1505.98407692,  183.41097446],
       [ 556.38792632,  146.56856867],
       [ 495.70141716,   75.9556951 ],
       [1325.96928205,  774.79824914],
       [1292.60576071,  750.76985192],
       [1043.15599798,  247.88217992],
       [1293.25415854,  223.72766107],
       [1199.97054074,  287.51526491],
       [ 827.96868198,  155.45311023],
       [ 518.68214836,  278.34153612],
       [1947.82163181,  661.609216  ],
       [2018.97652252,  634.50370732],
       [1455.71173113,  182.93397549],
       [2016.76583394,  176.43360352],
       [2351.53392258,  128.64121373],
       [1240.41281588,  123.93831135],
       [ 657.52351135,  125.86747875],
       [1457.03697135,  896.46501713],
       [1575.61867723,  651.81612992],
       [1016.49224637,  267.06308765],
       [1111.07640479,  178.33280696],
       [1366.00258619,  178.94406355],
       [ 696.8660634 ,   93.10377053],
       [ 467.16446452,  201.30926384],
       [1717.63722669,  633.03784541],
       [1974.89563535,  629.50697992],
       [ 936.66694381,  107.73159875],
       [1050.76892774,   87.84057546],
       [ 920.38419263,  206.05959952],
       [ 671.67588225,  228.64260454],
       [ 425.20081084,  169.67614849],
       [1594.78178032,  541.29980977],
       [1433.15712446,  588.8918704 ],
       [1305.59875918,  408.09844578],
       [1301.20365956,  243.24999637],
       [1262.78224327,  342.20760944],
       [ 869.19306719,  440.34424592],
       [ 670.95620752,  167.79673256],
       [1295.06480512,  765.94187184],
       [1207.9721599 ,  589.34359313],
       [1107.48436685,   87.46255333],
       [ 904.51082148,  138.23301117],
       [ 659.61637497,   60.49428587],
       [ 598.46330699,   58.91508254],
       [ 358.6928751 ,  118.92731077],
       [1214.56923866,  621.80405993],
       [1299.88392768,  765.12484925],
       [1120.55709123,  150.41430585],
       [1131.68454831,  183.76642489],
       [ 878.98036993,  217.63520394],
       [ 745.92270901,  123.39398383],
       [ 482.80653207,  294.521634  ],
       [1612.57562457,  796.35755135],
       [1619.05712347,  817.94451602],
       [1240.23108306,  105.99526918],
       [1412.34261618,  140.48262655],
       [1816.10740793,  173.0388888 ],
       [ 958.43213814,  125.73785492],
       [ 611.5171495 ,  127.67713156],
       [1513.01507093,  515.21928873],
       [1499.68837371,  448.9238126 ],
       [1086.24913696,  238.71201068],
       [1265.17937979,  397.08646852],
       [1211.67043901,  357.72861428],
       [1185.79322403,  497.15659448],
       [ 618.70811719,  248.36974082],
       [1195.64309399,  633.63411393],
       [1343.92304274,  435.36640078],
       [1081.99046491,  180.17088106],
       [1153.27757916,  158.65034769],
       [ 838.15272545,  342.98686133],
       [ 850.70770902,  287.40833282],
       [ 431.01773158,  191.55889467],
       [1147.73215026,  417.77017472],
       [1238.99110037,  386.54781237],
       [1004.98338201,  264.83963487],
       [ 919.22122622,  382.17909503],
       [ 895.12495843,  229.97896092],
       [ 789.85249873,  264.13786958],
       [ 503.55409286,  304.52139403],
       [1484.23600109,  642.2053158 ],
       [1271.51657377,  549.83873064],
       [ 954.67145971,  234.82784253],
       [ 857.57593056,   63.403128  ],
       [ 463.48546104,   59.01714627],
       [1019.09077456,  518.44780856],
       [1006.55020954,   77.57806083],
       [ 724.72350649,   65.99740948],
       [ 802.21762527,   67.01015058],
       [ 527.28815924,   83.46680564],
       [ 995.2176487 ,   47.12128062],
       [ 927.11164285,   48.01420616],
       [1051.79370871,   99.30393497],
       [ 539.37208744,   34.57917517],
       [ 411.45882376,   57.40506949],
       [ 986.18066408,  409.05198979],
       [1265.35478019,  485.8809656 ],
       [1061.94645059,  119.36894347],
       [ 802.38742318,   44.76000009],
       [ 672.86472897,   39.03110342],
       [ 845.10526287,   71.06031366],
       [ 652.83534576,  186.35787338],
       [1152.88158988,  609.00476796],
       [1146.19868864,  392.66038521],
       [ 879.9539933 ,  376.53670465],
       [1035.28409297,  454.65176759],
       [ 853.44004307,  356.04717158],
       [ 587.15479794,  272.61904313],
       [ 411.3928329 ,  268.69779277],
       [ 978.23016554,  430.44802835],
       [ 794.02097327,  358.97610528],
       [1305.449728  ,  130.67317489],
       [1085.74939654,  267.43239657],
       [1587.86977289,  465.04677322],
       [1106.49953298,  216.79341511],
       [ 972.45273034,  206.25977577],
       [1037.14410916,  133.76358633],
       [1118.91778742,   69.02621605],
       [ 949.21525276,   83.45813494],
       [ 715.89600178,   30.42736107],
       [ 485.03935647,  115.2463449 ],
       [1337.72019182,  169.79429395],
       [1388.97182208,  111.96171378],
       [1919.63043694,  144.22263796],
       [1201.63152769,  563.07556183],
       [ 820.95957046,  455.07968708],
       [ 522.48685935,  544.76495817],
       [1174.78984287,  370.00687267],
       [1520.36950547,  421.93178626],
       [1546.38885385,  481.50915732],
       [ 858.88042607,  223.6199381 ],
       [ 554.7293444 ,  176.71613639],
       [ 637.84029378,  270.84900721],
       [ 606.00574906,  259.01397586],
       [ 402.00421966,  180.32844886],
       [1047.91094996,   83.76691236],
       [1451.93038954,  211.62007407],
       [1169.71254647,  107.64251841],
       [ 821.60202288,   62.4673394 ],
       [ 568.13483024,   25.36348971],
       [ 846.09733504,  145.28262361],
       [1088.22458778,   51.95388124],
       [1199.58404603,   96.86233842],
       [ 584.10389212,  198.87244551],
       [ 410.27813316,   34.64098124],
       [ 968.22362454,   46.11834746],
       [1200.02490971,   50.67794376],
       [1291.47840853,  116.71885134],
       [ 776.78830327,   54.43044729],
       [ 520.73293129,   69.82289885],
       [1749.12939565,  113.14448623],
       [2141.48645895,  179.18077818],
       [2451.50580028,  254.43382404],
       [1518.5467943 ,   65.97152251],
       [ 744.13249207,   43.9597291 ],
       [1233.95619608,  128.76257334],
       [1703.33919116,  231.93582054],
       [1645.17497616,  142.02142305],
       [ 615.46242043,   41.83108505],
       [ 415.43269335,   36.21164266],
       [ 987.57728743,   62.81949447],
       [ 877.38299317,   52.83187359],
       [ 829.57967244,   73.9722045 ],
       [ 499.07556106,   41.32029723],
       [ 364.52448202,   60.77233679],
       [1217.35533547,  126.38074794],
       [1047.59734963,  110.17178051],
       [ 720.80693174,   65.85325014],
       [ 699.72368554,   17.16116139],
       [ 524.44371865,  433.14241086],
       [1274.51490416,   40.97357169],
       [1591.47096766,  100.78604912],
       [1468.84288746,  130.4952661 ],
       [ 930.91711822,   43.56742745],
       [ 557.03528738,   38.03243915],
       [1086.24913696,  238.71201068],
       [1265.17937979,  397.08646852],
       [1211.67043901,  357.72861428],
       [1185.79322403,  497.15659448],
       [ 618.70811719,  248.36974082],
       [1282.07792881,   64.07181308],
       [1123.23739168,   66.49508231]])
X_spectral_centroid_stats.shape
(240, 2)

Spectral Bandwidth#

Spectral Bandwidth measures how widely the spectrum is spread around its spectral centroid. In librosa it is computed, for each frame, as the p-th order moment of the frequencies about the centroid, weighted by the normalized spectral magnitudes (with p = 2 by default, which makes it a magnitude-weighted standard deviation of the frequencies). It reflects the spread of the spectrum and can indicate the complexity of a sound. A wider bandwidth signifies a noise-like or complex sound, while a narrow bandwidth indicates a tonal or simple sound.
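
As an illustration, the librosa documentation describes the bandwidth of a frame as the p-th order spread of the frequencies around the spectral centroid, weighted by the normalized magnitudes. The sketch below reproduces that formula for one frame with NumPy; the function name and toy spectra are hypothetical.

```python
import numpy as np

def spectral_bandwidth_frame(magnitudes, freqs, p=2):
    """p-th order spread of the spectrum around its centroid (one frame)."""
    weights = magnitudes / np.sum(magnitudes)   # normalize so weights sum to 1
    centroid = np.sum(freqs * weights)          # weighted mean frequency
    return np.sum(weights * np.abs(freqs - centroid) ** p) ** (1.0 / p)

freqs = np.array([100.0, 200.0, 300.0])
# All energy in one bin: no spread around the centroid.
narrow = spectral_bandwidth_frame(np.array([0.0, 1.0, 0.0]), freqs)
# Energy split between the two outer bins: spread of 100 Hz around 200 Hz.
wide = spectral_bandwidth_frame(np.array([1.0, 0.0, 1.0]), freqs)
print(narrow, wide)  # 0.0 100.0
```

A pure tone gives a bandwidth of zero, while energy far from the centroid increases it.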

# Feature extraction with spectral bandwidth
x_spectral_bandwidth = librosa.feature.spectral_bandwidth(y=audio_signal_1, sr=fs, n_fft=nfft, hop_length=fp).T
x_spectral_bandwidth
array([[1309.30089921],
       [1255.1158018 ],
       [1275.03885178],
       ...,
       [2427.90186889],
       [2420.073342  ],
       [2256.56008742]])
x_spectral_bandwidth.shape
(1692, 1)

From Spectral Bandwidth matrices to tensor#

X_spectral_bandwidth_tensor = get_X_tensor_audio_features(paths=files_df['path'], method='spectral_bandwidth', sr=fs, n_fft=nfft, hop_length=fp)
X_spectral_bandwidth_tensor
array([[[1143.32497568,  942.13602371,  870.97126742, ...,
            0.        ,    0.        ,    0.        ]],

       [[1785.06408858, 1760.67818379, 1734.84937353, ...,
            0.        ,    0.        ,    0.        ]],

       [[1574.27623182, 1486.74798403, 1545.91856   , ...,
            0.        ,    0.        ,    0.        ]],

       ...,

       [[ 901.28159703,  722.00737352,  537.66098854, ...,
            0.        ,    0.        ,    0.        ]],

       [[1617.36931068, 1553.21232249, 1469.45092787, ...,
            0.        ,    0.        ,    0.        ]],

       [[1417.14548475, 1323.33541644, 1265.27486656, ...,
            0.        ,    0.        ,    0.        ]]])
X_spectral_bandwidth_tensor.shape
(240, 1, 4403)

From Spectral Bandwidth matrices to predictors matrix (tabular data)#

X_spectral_bandwidth_stats = get_X_audio_features(paths=files_df['path'], method='spectral_bandwidth', stats='mean-std', sr=fs, n_fft=nfft, hop_length=fp)
X_spectral_bandwidth_stats
array([[ 547.71039463,  190.68833226],
       [ 949.60789304,  210.65348963],
       [1257.37962494,  221.11233883],
       [ 655.10901947,  266.61427025],
       [1036.72287665,  183.3495748 ],
       [1087.49948991,  499.1671853 ],
       [1112.26886734,  453.83392324],
       [1008.52278684,  169.65089348],
       [1136.31978971,  114.58698542],
       [1369.45105255,  137.89868859],
       [ 733.60358082,   90.93676414],
       [ 600.8755911 ,   55.60126607],
       [1797.74599018,  734.56558902],
       [1776.10775885,  612.33648523],
       [ 944.46518321,  149.55301631],
       [1434.2496765 ,  105.46849527],
       [1954.60189389,  100.56700799],
       [ 910.6004008 ,  154.15121229],
       [ 765.39650212,  290.37160189],
       [1983.59218098,  620.01437049],
       [1981.69727227,  581.72095076],
       [ 998.51875333,  408.45410957],
       [1450.18425688,  454.85092884],
       [1518.25136553,  482.11063008],
       [1094.91762071,  458.69315579],
       [ 781.96866483,  474.70930895],
       [1680.16490268,  888.4293138 ],
       [1668.06768676,  872.49278575],
       [ 928.87365496,  333.25661316],
       [1344.69901462,  431.16109861],
       [1744.15188681,  462.07526467],
       [ 752.74092272,  240.32506969],
       [ 576.19869778,  231.88184117],
       [1192.73007265,  740.45643954],
       [1200.84935927,  759.63084532],
       [1179.96885612,  430.67727142],
       [1414.42977622,  473.60777341],
       [1379.5033354 ,  547.01904526],
       [ 997.58955261,  458.86070221],
       [ 780.89645038,  545.48013567],
       [1834.09953118,  878.99621659],
       [1802.03908387,  843.64936634],
       [1391.77918637,  472.65750026],
       [1122.56229877,  425.60571427],
       [1130.8964514 ,  380.10071557],
       [ 924.95368606,  360.77284215],
       [ 658.94058718,  281.22860998],
       [1747.89250412,  837.00998138],
       [1876.11907605,  878.56767846],
       [1150.51058542,  163.12005018],
       [1391.4578276 ,  155.95806739],
       [1421.86726724,  116.50717496],
       [1016.2546602 ,  277.75449182],
       [ 760.3493956 ,  190.70772893],
       [2153.65553644,  625.44801062],
       [2118.80433895,  579.84525812],
       [1069.7433535 ,   56.92207264],
       [1291.27802182,  163.95136861],
       [1696.43408677,  156.86229481],
       [ 606.35181409,  174.97106457],
       [ 607.38289993,  113.47758636],
       [1611.89720517,  627.53012741],
       [1547.5327357 ,  714.24148487],
       [1041.59028291,  272.68499906],
       [1363.07790584,  178.53626408],
       [1423.32899711,  202.9157379 ],
       [ 869.5176206 ,  229.47322246],
       [ 624.08077935,  347.96106504],
       [2087.82096024,  525.85130172],
       [2121.54004258,  493.39515192],
       [1223.00502404,  185.80370432],
       [1574.62273867,  146.79569926],
       [1648.41297682,   88.48103811],
       [1135.67503014,  150.54060209],
       [ 776.96386065,  148.87374308],
       [1682.99049988,  680.78042209],
       [1743.76985717,  540.17759968],
       [1163.50460259,  224.07209218],
       [1210.70608323,  175.91447923],
       [1592.28224404,  168.65178026],
       [ 922.67344763,  157.202565  ],
       [ 666.76944657,  262.93292986],
       [1956.48308592,  488.23418425],
       [2095.05972543,  456.60586165],
       [1026.10393124,  131.87057915],
       [1287.98723388,  131.12888481],
       [1302.89833322,  201.50687481],
       [ 792.34975876,  270.71595399],
       [ 610.823858  ,  256.12631197],
       [1827.38726949,  548.96591199],
       [1618.47020489,  591.53908473],
       [1594.61260724,  488.85688559],
       [1432.35396378,  248.86526491],
       [1531.56994973,  315.53675823],
       [ 948.51498137,  435.15046369],
       [ 792.11089674,  251.55952899],
       [1639.67946997,  643.69785873],
       [1600.16586035,  572.76353883],
       [1131.27915623,  112.95820103],
       [ 982.40600485,  155.45666168],
       [ 887.76595649,   62.21072306],
       [ 722.65582589,  110.08114967],
       [ 431.76410788,  184.1218538 ],
       [1512.36677254,  628.8341862 ],
       [1509.77070167,  622.34071884],
       [1030.26075925,  160.63266403],
       [1262.29908095,  185.85317336],
       [1297.77434675,  241.76533414],
       [ 764.37461079,  188.07892445],
       [ 683.35134368,  301.70878573],
       [1829.97074471,  612.43151148],
       [1876.31467085,  614.45525045],
       [1108.41622112,  135.34501357],
       [1271.97741246,  121.13148386],
       [1679.86350265,   95.0385343 ],
       [ 941.60400606,  173.17222557],
       [ 726.77203105,  160.9577211 ],
       [1812.53980695,  453.54587947],
       [1837.47311421,  400.29246761],
       [1181.20945294,  264.87839746],
       [1339.13617025,  281.22975215],
       [1431.87077073,  300.64451574],
       [1473.11111715,  469.10600388],
       [ 765.28143359,  485.76426098],
       [1409.63193315,  573.17811736],
       [1640.4477585 ,  477.29232806],
       [1388.54848504,  221.80750609],
       [1492.73102686,  200.05235862],
       [1256.56298565,  330.96153532],
       [1192.32237196,  248.60279257],
       [ 703.39447663,  240.42795929],
       [1693.1146841 ,  387.49392021],
       [1801.11448512,  388.751357  ],
       [1216.93121413,  231.77629827],
       [1299.51681861,  335.36366565],
       [1293.62730691,  204.6915478 ],
       [1098.41744346,  239.14238942],
       [ 691.07720828,  328.55285461],
       [1887.37511249,  526.61287901],
       [1626.01162602,  551.22178193],
       [1404.84954745,  237.46471246],
       [1205.37324391,   91.12948719],
       [ 747.92342639,   89.61994738],
       [1266.49210553,  442.63119496],
       [1381.69916656,   99.82853962],
       [1113.57876763,   91.18169824],
       [1137.64027651,  120.18712096],
       [ 742.21175805,  115.58692178],
       [1231.78409791,   76.50641214],
       [1215.58342974,   67.6930879 ],
       [1362.75532412,   85.21833493],
       [ 635.46891407,   58.44085211],
       [ 536.71555155,   73.82526798],
       [1191.76321633,  332.56765386],
       [1470.35956029,  373.72426834],
       [1439.61783807,  140.32685033],
       [ 849.68501895,  162.7774635 ],
       [ 740.16355888,   94.14465483],
       [ 787.83229561,  109.67583212],
       [ 696.84896944,  199.117318  ],
       [1227.96684984,  577.06680691],
       [1433.67157464,  434.44200338],
       [1066.80459777,  468.9600619 ],
       [1195.8029285 ,  524.38543667],
       [1132.34848677,  489.94946752],
       [ 719.38047665,  338.51121742],
       [ 544.73690499,  339.57961169],
       [1252.72449374,  548.3030673 ],
       [1045.3532014 ,  474.423362  ],
       [ 546.60899718,  217.87787246],
       [ 856.17924212,  252.99940333],
       [1308.76355506,  206.04033045],
       [ 708.20961252,  313.05600769],
       [ 991.19036361,  152.4540067 ],
       [ 980.93440482,   96.21251148],
       [1157.47657461,   89.9392409 ],
       [1242.7218421 ,   82.83212673],
       [ 654.05963361,   54.67993009],
       [ 636.10528493,  148.59982779],
       [1100.26653408,  159.92706773],
       [1405.34438759,  103.9836515 ],
       [1933.27891787,  105.62788057],
       [1615.43907393,  566.80756349],
       [1028.6059865 ,  434.36789787],
       [ 690.21078958,  449.58579489],
       [ 902.52586509,  322.6426786 ],
       [1437.8354143 ,  405.07221715],
       [1646.75598976,  500.43886899],
       [ 684.41904321,  228.31053949],
       [ 571.03412365,  223.37679209],
       [1023.75064056,  440.86647935],
       [ 841.17986868,  371.71613754],
       [ 647.2927289 ,  315.58535673],
       [1067.50132366,   98.40618407],
       [1551.7717964 ,  167.8117326 ],
       [1454.25681417,   79.24770319],
       [ 898.37609186,   77.95295461],
       [ 807.11121289,   53.51446775],
       [1037.32271118,  155.88011782],
       [1241.43686083,   68.20723684],
       [1517.42713872,  103.66003445],
       [ 620.35148557,  270.95841517],
       [ 490.10342053,   54.01293507],
       [1102.42860971,   67.18981059],
       [1241.68262464,   85.18896581],
       [1387.5167652 ,   78.82992207],
       [ 774.11574743,   78.36547621],
       [ 614.72394938,   89.71526665],
       [1335.08933938,   78.1218587 ],
       [1529.72164262,   70.41657334],
       [1712.55220471,   68.95098386],
       [1370.8300994 ,   44.90133942],
       [ 831.70460643,   52.90542831],
       [1386.43632492,  154.02155861],
       [1684.60163611,  135.32612933],
       [1944.06337635,  109.00798274],
       [ 715.42204263,   80.54445512],
       [ 624.43774291,   70.63284885],
       [ 957.89068915,  101.27828858],
       [ 954.28540925,   64.33790172],
       [1057.14680113,   64.67690657],
       [ 541.41027048,   57.86087652],
       [ 431.83741107,   82.06721887],
       [1013.56250988,  150.78494287],
       [1172.32433873,  127.75601552],
       [1075.96804169,  114.88108511],
       [ 682.82001004,   58.45651973],
       [ 718.01197724,  432.07213832],
       [1129.95979784,   46.54439183],
       [1461.68454365,   50.35623328],
       [1554.21693409,   55.26445159],
       [ 903.88887738,   51.53510667],
       [ 642.34101444,   51.66781636],
       [1181.20945294,  264.87839746],
       [1339.13617025,  281.22975215],
       [1431.87077073,  300.64451574],
       [1473.11111715,  469.10600388],
       [ 765.28143359,  485.76426098],
       [1582.67326195,   83.16492397],
       [1462.25168166,   84.83617958]])
X_spectral_bandwidth_stats.shape
(240, 2)

Spectral Contrast#

Spectral Contrast considers the difference in amplitude between peaks and valleys in the spectrum, computed separately within each frequency sub-band. With librosa's default of 6 bands this yields 7 values per frame. This feature can be used to distinguish between different types of sound textures and timbres, as it effectively captures the dynamics of the spectral peaks and troughs over time.
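
The peak-versus-valley idea can be sketched for a single sub-band as follows. This is a simplified illustration, not librosa's exact implementation: the quantile value, the small constant added for numerical safety, and the function name are all assumptions.

```python
import numpy as np

def band_contrast_db(band_magnitudes, quantile=0.2, eps=1e-10):
    """Simplified peak-minus-valley contrast (in dB) for one sub-band.

    The peak is the mean of the largest magnitudes and the valley the mean
    of the smallest ones; `quantile` sets the fraction of bins used for each.
    """
    sorted_mags = np.sort(band_magnitudes)
    n = max(1, int(round(quantile * len(sorted_mags))))
    valley = sorted_mags[:n].mean()
    peak = sorted_mags[-n:].mean()
    return 10.0 * np.log10((peak + eps) / (valley + eps))

# A band with one strong peak versus a completely flat band.
tonal_band = np.array([0.01] * 9 + [1.0])
flat_band = np.full(10, 0.5)
tonal_contrast = band_contrast_db(tonal_band)
flat_contrast = band_contrast_db(flat_band)
print(tonal_contrast > flat_contrast)  # True
```

A band dominated by a clear peak yields a high contrast, whereas a flat, noise-like band yields a contrast close to zero.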

# Feature extraction with Spectral Contrast 
x_spectral_contrast = librosa.feature.spectral_contrast(y=audio_signal_1, sr=fs, n_fft=nfft, hop_length=fp).T
x_spectral_contrast
array([[ 5.63810593,  5.66707538,  6.30149297, ..., 13.96765089,
        12.25343919, 10.17588324],
       [ 6.21691956, 11.38575072, 11.8819769 , ..., 21.75429392,
        18.58376962, 15.36240143],
       [25.49436508, 18.267316  , 25.67031318, ..., 19.25143146,
        17.4834322 , 15.83951655],
       ...,
       [ 9.64391107,  3.7931424 , 10.92962245, ..., 17.23052401,
        15.0653737 , 11.47613243],
       [11.83635255, 10.54209236,  8.97554807, ..., 17.5021864 ,
        10.951186  , 12.79228362],
       [ 8.21206286,  2.71809055,  4.90784151, ..., 17.91953245,
        11.35885709, 12.94873939]])
x_spectral_contrast.shape
(1692, 7)

From Spectral Contrast matrices to tensor#

X_spectral_contrast_tensor = get_X_tensor_audio_features(paths=files_df['path'], method='spectral_contrast', sr=fs, n_fft=nfft, hop_length=fp)
X_spectral_contrast_tensor
array([[[ 5.11662149, 10.292627  , 18.93497539, ...,  0.        ,
          0.        ,  0.        ],
        [ 6.59230729, 15.67208379,  8.04153712, ...,  0.        ,
          0.        ,  0.        ],
        [ 8.45520947, 17.5710727 , 18.92689921, ...,  0.        ,
          0.        ,  0.        ],
        ...,
        [12.37949689, 17.93070151, 20.61663416, ...,  0.        ,
          0.        ,  0.        ],
        [19.41221672, 15.34366631, 17.42559859, ...,  0.        ,
          0.        ,  0.        ],
        [17.22184445, 15.558392  , 17.60068867, ...,  0.        ,
          0.        ,  0.        ]],

       [[ 8.85085044,  9.03121079, 11.46311676, ...,  0.        ,
          0.        ,  0.        ],
        [ 6.52008624, 12.60085564,  4.4187557 , ...,  0.        ,
          0.        ,  0.        ],
        [ 6.68093446,  8.71320422, 13.95862644, ...,  0.        ,
          0.        ,  0.        ],
        ...,
        [16.7197144 , 13.70447852, 10.81675158, ...,  0.        ,
          0.        ,  0.        ],
        [11.91122822, 11.42524776, 18.37071563, ...,  0.        ,
          0.        ,  0.        ],
        [13.56250443, 21.8735152 , 20.07728048, ...,  0.        ,
          0.        ,  0.        ]],

       [[ 5.21670479, 13.65392556, 14.58397327, ...,  0.        ,
          0.        ,  0.        ],
        [ 4.65187137,  9.55822925, 10.14831792, ...,  0.        ,
          0.        ,  0.        ],
        [10.39356395, 11.80113395, 11.72104283, ...,  0.        ,
          0.        ,  0.        ],
        ...,
        [13.04850063, 20.12447109, 14.33627266, ...,  0.        ,
          0.        ,  0.        ],
        [13.14200444, 14.82449875, 12.20986094, ...,  0.        ,
          0.        ,  0.        ],
        [11.37148345, 13.42516093, 17.88007679, ...,  0.        ,
          0.        ,  0.        ]],

       ...,

       [[ 0.64248729,  2.0961406 ,  6.65306162, ...,  0.        ,
          0.        ,  0.        ],
        [ 7.20312337, 11.00841146, 23.68205907, ...,  0.        ,
          0.        ,  0.        ],
        [13.08904135, 17.47975944, 33.00536649, ...,  0.        ,
          0.        ,  0.        ],
        ...,
        [11.80279665,  9.73920911, 20.00545211, ...,  0.        ,
          0.        ,  0.        ],
        [ 6.41256743,  9.66989794, 26.14086535, ...,  0.        ,
          0.        ,  0.        ],
        [ 9.23506418, 12.4874168 , 25.51032203, ...,  0.        ,
          0.        ,  0.        ]],

       [[ 9.32894184, 13.2649533 , 29.12562629, ...,  0.        ,
          0.        ,  0.        ],
        [ 4.84771882, 10.07119587,  9.77597993, ...,  0.        ,
          0.        ,  0.        ],
        [ 4.30891259,  9.2015415 , 16.8332916 , ...,  0.        ,
          0.        ,  0.        ],
        ...,
        [10.13391008, 14.07278776, 16.34504072, ...,  0.        ,
          0.        ,  0.        ],
        [12.76159498, 19.44753392, 18.42114983, ...,  0.        ,
          0.        ,  0.        ],
        [12.27059947, 35.52131359, 18.83002358, ...,  0.        ,
          0.        ,  0.        ]],

       [[ 5.41242976, 11.35503993, 29.50033658, ...,  0.        ,
          0.        ,  0.        ],
        [ 8.64307322,  9.49400294, 13.23320113, ...,  0.        ,
          0.        ,  0.        ],
        [ 7.9839711 , 15.35658127, 22.49312214, ...,  0.        ,
          0.        ,  0.        ],
        ...,
        [11.60944505, 16.85032802, 12.1691571 , ...,  0.        ,
          0.        ,  0.        ],
        [13.09084623, 17.7903069 , 20.16377032, ...,  0.        ,
          0.        ,  0.        ],
        [ 7.19329074, 11.87063984, 18.35145274, ...,  0.        ,
          0.        ,  0.        ]]])
X_spectral_contrast_tensor.shape
(240, 7, 4403)

From Spectral Contrast matrices to predictors matrix (tabular data)#

X_spectral_contrast_stats = get_X_audio_features(paths=files_df['path'], method='spectral_contrast', stats='mean-std', sr=fs, n_fft=nfft, hop_length=fp)
X_spectral_contrast_stats
array([[15.06088705, 13.66975386, 21.81056366, ...,  4.84396642,
         4.33531148,  3.91902237],
       [10.96765446, 10.62080312, 21.04077107, ...,  4.59791976,
         4.76635799,  3.87400562],
       [14.57345365, 13.03160549, 20.7638555 , ...,  5.36559019,
         4.00373309,  5.56484739],
       ...,
       [ 8.84012428, 20.74519205, 22.22548112, ...,  3.60031112,
         4.46534307,  5.43222996],
       [22.70844871, 13.80892128, 13.44845706, ...,  3.08630459,
         2.73138099,  4.70273429],
       [24.06449453, 14.47909753, 17.42508059, ...,  3.08591616,
         2.60190525,  3.64455257]])
X_spectral_contrast_stats.shape
(240, 14)

Spectral Rolloff#

Spectral Rolloff is a measure of the shape of the spectrum. It represents the frequency below which a certain percentage of the total spectral energy, typically between 85% and 95%, is contained (librosa uses 85% by default, which is the value applied here since roll_percent is not set). This can indicate whether the sound is noise-like or tone-like.
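
The rolloff of one frame can be sketched with a cumulative sum: accumulate the spectrum from the lowest bin upwards and return the first frequency at which the running total reaches the chosen fraction. The function name and toy spectrum below are hypothetical; the 0.85 default mirrors librosa's roll_percent default.

```python
import numpy as np

def spectral_rolloff_frame(magnitudes, freqs, roll_percent=0.85):
    """Lowest frequency at which the cumulative magnitude reaches roll_percent of the total."""
    cumulative = np.cumsum(magnitudes)
    threshold = roll_percent * cumulative[-1]
    # First bin whose cumulative magnitude is at or above the threshold.
    idx = np.searchsorted(cumulative, threshold)
    return freqs[idx]

freqs = np.array([0.0, 100.0, 200.0, 300.0, 400.0])
mags = np.array([4.0, 3.0, 2.0, 1.0, 0.0])
rolloff = spectral_rolloff_frame(mags, freqs)
print(rolloff)  # 200.0
```

Here the cumulative magnitudes are 4, 7, 9, 10, 10, and 85% of the total (8.5) is first reached at the 200 Hz bin.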

# Feature extraction with spectral rolloff 
x_spectral_rolloff = librosa.feature.spectral_rolloff(y=audio_signal_1, sr=fs, n_fft=nfft, hop_length=fp).T
x_spectral_rolloff
array([[1750.  ],
       [1687.5 ],
       [1718.75],
       ...,
       [5468.75],
       [5250.  ],
       [4562.5 ]])
x_spectral_rolloff.shape
(1692, 1)

From Spectral Rolloff matrices to tensor#

X_spectral_rolloff_tensor = get_X_tensor_audio_features(paths=files_df['path'], method='spectral_rolloff', sr=fs, n_fft=nfft, hop_length=fp)
X_spectral_rolloff_tensor
array([[[1375.  , 1312.5 , 1281.25, ...,    0.  ,    0.  ,    0.  ]],

       [[3781.25, 3750.  , 3656.25, ...,    0.  ,    0.  ,    0.  ]],

       [[3031.25, 2750.  , 2906.25, ...,    0.  ,    0.  ,    0.  ]],

       ...,

       [[ 843.75,  750.  ,  750.  , ...,    0.  ,    0.  ,    0.  ]],

       [[2343.75, 1843.75, 1593.75, ...,    0.  ,    0.  ,    0.  ]],

       [[1406.25,  937.5 ,  781.25, ...,    0.  ,    0.  ,    0.  ]]])
X_spectral_rolloff_tensor.shape
(240, 1, 4403)

From Spectral Rolloff matrices to predictors matrix (tabular data)#

X_spectral_rolloff_stats = get_X_audio_features(paths=files_df['path'], method='spectral_rolloff', stats='mean-std', sr=fs, n_fft=nfft, hop_length=fp)
X_spectral_rolloff_stats
array([[1578.6367205 ,  294.68382051],
       [2307.13619403,  645.09551762],
       [2839.43103941,  709.64216797],
       [1382.05467372,  513.57783209],
       [1389.07657658,  490.49725542],
       [2164.73642173,  946.00923455],
       [2131.14017572,  893.32214866],
       [1198.98875753,  441.14167795],
       [1719.21762126,  266.29955367],
       [1973.86210005,  596.13730246],
       [ 935.67607004,   58.1070769 ],
       [ 559.38546901,   43.88905301],
       [3432.1086262 , 2201.60451981],
       [3117.26238019, 2041.97472162],
       [1454.17764396,  311.29167221],
       [2514.0600159 ,  344.8225258 ],
       [3772.02406923,  242.67752925],
       [1033.64672482,  314.1257139 ],
       [ 857.72165698,  737.38720186],
       [4216.0543131 , 1976.77369894],
       [4163.7879393 , 1871.63426645],
       [1150.74325951,  957.16552888],
       [1879.84330484, 1069.92905233],
       [1613.3592832 , 1206.71699409],
       [1071.52650823, 1173.53613121],
       [ 742.73830935, 1209.61572622],
       [3458.38881491, 2228.09735205],
       [3318.49201065, 2146.37621317],
       [1563.86181193,  587.91731402],
       [2377.50536865,  794.41661805],
       [3324.47444612,  935.64808442],
       [1070.78607743,  450.20903218],
       [ 578.27019299,  320.58401002],
       [2141.57133244, 1623.96398923],
       [2255.82067371, 1731.8901964 ],
       [1338.34609684, 1144.35899745],
       [2138.61094762, 1135.79701637],
       [1576.4553429 , 1465.20702938],
       [1064.16693445, 1133.08929299],
       [ 890.07429164, 1388.3017005 ],
       [3858.66389914, 2224.35858576],
       [3799.33621718, 2181.85116565],
       [1435.80493741,  981.24560147],
       [ 819.73659717,  865.00084828],
       [ 387.73311049,  542.12799179],
       [ 707.5970201 ,  693.06733494],
       [ 367.19939117,  453.92438608],
       [3398.85831382, 2151.01256025],
       [3854.65730676, 2264.88491817],
       [1484.47888963,  424.14674694],
       [2131.84875328,  495.68605309],
       [1553.65861292,  872.72498138],
       [1139.91434203,  817.36289266],
       [ 511.07848372,  485.98713341],
       [4873.64682003, 1927.01127398],
       [4627.18563988, 1885.06710895],
       [ 885.60267857,  102.62042625],
       [1622.40740741,  405.69647845],
       [2545.17067124,  524.42803916],
       [ 728.3779985 ,  435.20021324],
       [ 614.2288197 ,  241.25372691],
       [2866.61161335, 2073.9756643 ],
       [2820.19297636, 2160.34752419],
       [1350.7054849 ,  731.81093646],
       [2099.0182803 ,  659.98831582],
       [2017.0386309 ,  922.31060334],
       [ 940.95595127,  541.41208085],
       [ 707.85501701,  802.86068199],
       [4471.5379494 , 1783.17038779],
       [4600.5859375 , 1677.91194826],
       [2507.421875  ,  568.81464629],
       [3827.36703682,  340.24512245],
       [3772.27489867,  219.35218214],
       [1837.5743205 ,  605.67155563],
       [ 803.093292  ,  381.85386404],
       [3169.83103198, 2294.27586906],
       [3678.22265625, 1740.43691076],
       [1223.09149184,  714.45317936],
       [1768.66442398,  561.82312587],
       [2358.01282051,  702.80736945],
       [ 863.04925157,  245.87019308],
       [ 639.68005498,  610.64071127],
       [3918.12586685, 1802.59150778],
       [4367.35048679, 1738.39212478],
       [1250.77033689,  334.76705459],
       [1385.63724193,  359.19361123],
       [ 746.30905512,  940.9836373 ],
       [ 993.31825658,  676.06153863],
       [ 494.01735624,  567.95022001],
       [3541.8667467 , 1619.45666857],
       [3040.68877551, 1681.95144235],
       [2542.87732042, 1530.66887727],
       [1905.97716588,  951.12761808],
       [1653.24039421, 1049.17231359],
       [1206.449877  , 1138.26826948],
       [ 839.93267629,  615.15670034],
       [2860.92862216, 1905.56375762],
       [2702.61310452, 1755.80060827],
       [1594.22451193,  327.23041889],
       [1675.16727494,  439.57349985],
       [ 518.33890031,  264.28908366],
       [ 658.03993694,  220.93226459],
       [ 458.62950763,  369.56606731],
       [2662.11752434, 1811.07551733],
       [2807.83913352, 1832.78082539],
       [1584.4495552 ,  404.61854163],
       [1358.60736926,  659.89217544],
       [ 495.10287486,  509.62082461],
       [ 972.95966229,  371.32572981],
       [ 574.41271963,  803.2310604 ],
       [3557.82312925, 2144.49266002],
       [3726.19047619, 2119.92357497],
       [1515.05245272,  301.74537836],
       [2398.69698992,  299.92106988],
       [3535.37936091,  318.58815764],
       [1169.01221455,  388.05427157],
       [ 713.34050721,  374.83249456],
       [3332.69583843, 1454.03962884],
       [3392.69950565, 1332.2369463 ],
       [1602.72051148,  738.35120943],
       [2125.28564899,  936.30804244],
       [2020.64620758, 1175.09501504],
       [2207.29041013, 1483.82146809],
       [ 999.25947867,  831.29376621],
       [2314.53894807, 1610.61560691],
       [2829.20396419, 1426.87374346],
       [1655.62157221,  728.62985827],
       [1711.31921824,  823.40781998],
       [ 838.90608181, 1046.28110475],
       [ 901.42413607,  781.104817  ],
       [ 451.66782087,  532.05512018],
       [2486.060253  , 1527.98495191],
       [2798.51190476, 1567.08923551],
       [1284.55305533,  760.17585358],
       [1079.35138081, 1303.49345057],
       [ 713.57725892,  832.30255882],
       [ 865.06689233,  770.61642087],
       [ 611.8787092 ,  853.21561376],
       [3565.29017857, 1940.59921084],
       [2920.45454545, 1728.230626  ],
       [ 708.52359209,  981.93363476],
       [ 792.44056464,  137.08473086],
       [ 372.12171053,  111.73951411],
       [1690.58035714, 1539.2421707 ],
       [ 922.27450284,  290.62243435],
       [ 360.97301136,   97.36575466],
       [ 736.77721088,   62.23804375],
       [ 529.17268786,  126.82099239],
       [1234.23549107,   72.30946634],
       [ 825.94722598,  259.25639394],
       [1294.78561047,  677.6963999 ],
       [ 715.99786932,   80.38751358],
       [ 423.50685379,   96.94177263],
       [1363.87343533,  964.31350655],
       [2075.49307036, 1381.86850838],
       [ 617.22452607,  329.05594076],
       [ 879.95173103,  183.29925875],
       [ 903.49786932,   80.90353576],
       [ 951.75289312,  112.24318425],
       [ 827.81668428,  508.84341295],
       [1702.40902965, 1509.62467979],
       [1802.83434232,  877.59688545],
       [1009.65698393,  601.88516404],
       [1498.53936039,  813.20652401],
       [ 785.60419236,  680.99685271],
       [ 696.26383764,  505.39312914],
       [ 511.87730627,  712.10340385],
       [1087.21187943,  774.4801005 ],
       [ 716.38492556,  694.43894157],
       [1642.89459885,  406.59378677],
       [1658.82402995,  885.7156183 ],
       [2958.08649289,  784.06241789],
       [1472.41257089,  718.68630045],
       [1217.11198094,  416.04509203],
       [1211.59306908,  222.08965433],
       [1761.02669783,  131.25278913],
       [1794.42084542,  511.0219202 ],
       [ 917.73177593,   37.24245717],
       [ 640.26889244,  348.26059351],
       [1551.57215558,  465.66772474],
       [2396.99777238,  326.26780501],
       [3725.06671588,  327.63435257],
       [1485.78973843, 1488.31734151],
       [1031.66986564, 1149.73662961],
       [ 694.88324176, 1182.76746538],
       [1541.82284876,  611.18272829],
       [2386.76205654,  710.5439903 ],
       [3066.91817301, 1004.65072277],
       [1062.02084332,  375.19180662],
       [ 591.82375823,  380.57416718],
       [ 316.61285363,  235.69780354],
       [ 623.47931873,  445.01984414],
       [ 378.21691176,  437.90651105],
       [1320.53450609,   61.98142231],
       [2320.18757688,  427.8118124 ],
       [2035.59348093,  688.9114003 ],
       [ 956.03241297,  110.41399843],
       [ 440.09182464,   28.70635485],
       [ 971.26582994,  455.02806315],
       [1676.99829932,  135.11377637],
       [1734.31258322,  557.0219678 ],
       [ 822.23701731,  644.89906846],
       [ 494.94363395,   53.96194693],
       [1147.77835408,   95.13631694],
       [1950.9422545 ,  190.85299681],
       [2314.54100145,  296.0713833 ],
       [ 894.4868608 ,   88.69579563],
       [ 580.39196568,   91.79478257],
       [3156.78206583,  332.77839626],
       [3688.32579972,  313.18270765],
       [3933.54485396,  155.21424184],
       [2919.3452381 ,  228.22044916],
       [ 846.37150466,   44.9196164 ],
       [1517.17687075,  671.18269131],
       [3185.18350291,  665.77052429],
       [3591.88771802,  834.20846904],
       [ 803.58403955,   88.53154462],
       [ 489.20068027,   75.8860565 ],
       [1400.94866071,   68.51900701],
       [1552.53120666,  319.54741893],
       [1524.29552023,  688.20548111],
       [ 643.68472585,   64.15240438],
       [ 475.21989175,  109.41470265],
       [1646.94888734,  136.62034684],
       [1394.30930398,  453.16206269],
       [ 423.51740057,  143.80506776],
       [ 910.06747159,   45.05668258],
       [ 761.76286073, 1183.23349718],
       [1596.90525588,   71.92158386],
       [2794.00510204,  118.31368454],
       [3195.28061224,  153.84387488],
       [1079.81178977,   23.01793203],
       [ 524.84151047,   87.25957695],
       [1602.72051148,  738.35120943],
       [2125.28564899,  936.30804244],
       [2020.64620758, 1175.09501504],
       [2207.29041013, 1483.82146809],
       [ 999.25947867,  831.29376621],
       [1644.75835756,  199.74250599],
       [1567.00680272,  338.52262425]])
X_spectral_rolloff_stats.shape
(240, 2)
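
The output shapes are consistent with `stats='mean-std'` collapsing each per-frame matrix into one fixed-length vector, with the per-feature means followed by the per-feature standard deviations. Since `get_X_audio_features` is part of a private package, the following is a minimal sketch under that assumption; the name `mean_std_summary` and the use of the population standard deviation are illustrative choices.

```python
import numpy as np

def mean_std_summary(feature_matrix):
    """Collapse (n_frames, n_features) into a vector of length 2 * n_features.

    Hypothetical helper: means first, then standard deviations, both over time.
    """
    return np.concatenate([feature_matrix.mean(axis=0), feature_matrix.std(axis=0)])

# Toy example: 2 frames, 2 features
frames = np.array([[1.0, 10.0],
                   [3.0, 10.0]])
summary = mean_std_summary(frames)
print(summary)
```

Because the summary has the same length for every recording, the resulting rows can be stacked directly into a predictors matrix for fixed-length models.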

Zero Crossing Rate#

The zero crossing rate is the rate at which the signal changes sign, from positive to negative or back, within each frame. It is often used as a rough measure of the noisiness or high-frequency content of a sound: a higher zero crossing rate indicates a noisier signal or more high-frequency content.
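
The definition can be sketched for a single frame as follows. The helper name `frame_zero_crossing_rate` is an assumption for this example, and the code is a simplified analogue rather than librosa's implementation.

```python
import numpy as np

def frame_zero_crossing_rate(frame):
    """Fraction of samples in the frame at which the signal changes sign.

    Hypothetical helper for illustration only.
    """
    signs = np.signbit(frame)
    # A crossing occurs wherever consecutive samples have different signs
    crossings = np.count_nonzero(signs[1:] != signs[:-1])
    return crossings / len(frame)

alternating = np.array([1.0, -1.0, 1.0, -1.0])  # changes sign at every step
increasing = np.array([1.0, 2.0, 3.0, 4.0])     # never changes sign

zcr_alternating = frame_zero_crossing_rate(alternating)
zcr_increasing = frame_zero_crossing_rate(increasing)
print(zcr_alternating, zcr_increasing)
```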

# Feature extraction with zero crossing rate
x_zero_crossing_rate = librosa.feature.zero_crossing_rate(y=audio_signal_1, hop_length=fp).T
x_zero_crossing_rate
array([[0.0390625 ],
       [0.04443359],
       [0.04882812],
       ...,
       [0.04443359],
       [0.04443359],
       [0.04443359]])
x_zero_crossing_rate.shape
(1692, 1)

From Zero Crossing Rate matrices to tensor#

X_zero_crossing_rate_tensor = get_X_tensor_audio_features(paths=files_df['path'], method='zero_crossing_rate', hop_length=fp)
X_zero_crossing_rate_tensor
array([[[0.03857422, 0.04638672, 0.05566406, ..., 0.        ,
         0.        , 0.        ]],

       [[0.04882812, 0.05175781, 0.05566406, ..., 0.        ,
         0.        , 0.        ]],

       [[0.03125   , 0.03466797, 0.03759766, ..., 0.        ,
         0.        , 0.        ]],

       ...,

       [[0.02294922, 0.02587891, 0.02880859, ..., 0.        ,
         0.        , 0.        ]],

       [[0.00927734, 0.01074219, 0.01171875, ..., 0.        ,
         0.        , 0.        ]],

       [[0.01123047, 0.01220703, 0.015625  , ..., 0.        ,
         0.        , 0.        ]]])
X_zero_crossing_rate_tensor.shape
(240, 1, 4403)

From Zero Crossing Rate matrices to predictors matrix#

X_zero_crossing_rate_stats = get_X_audio_features(paths=files_df['path'], method='zero_crossing_rate', stats='mean-std', hop_length=fp)
X_zero_crossing_rate_stats
array([[0.13968108, 0.01811889],
       [0.12713428, 0.05410704],
       [0.16139104, 0.09023381],
       [0.13278552, 0.0150368 ],
       [0.06845982, 0.01230139],
       [0.12291739, 0.02418128],
       [0.11398559, 0.03383005],
       [0.04721614, 0.0100022 ],
       [0.06379654, 0.00702339],
       [0.03594709, 0.00396019],
       [0.06673346, 0.00343725],
       [0.03630212, 0.00146984],
       [0.06123172, 0.03862254],
       [0.05755557, 0.03619177],
       [0.11102839, 0.01048388],
       [0.06200357, 0.00509845],
       [0.04921273, 0.01197553],
       [0.06549767, 0.00737761],
       [0.05590522, 0.01658055],
       [0.12777993, 0.04488347],
       [0.11122282, 0.03122729],
       [0.05421168, 0.02222161],
       [0.02710364, 0.03646258],
       [0.0251287 , 0.0306713 ],
       [0.02515645, 0.02913848],
       [0.02123953, 0.02857775],
       [0.07877582, 0.04501191],
       [0.06794651, 0.0396902 ],
       [0.10538364, 0.03681736],
       [0.08328533, 0.03037691],
       [0.05997303, 0.02158588],
       [0.07708809, 0.01790234],
       [0.04999805, 0.0154705 ],
       [0.0608518 , 0.03226082],
       [0.05940261, 0.03613893],
       [0.03698103, 0.0348455 ],
       [0.04904827, 0.03648653],
       [0.03641628, 0.03814897],
       [0.0438722 , 0.02820492],
       [0.04142663, 0.03438384],
       [0.07770218, 0.04534396],
       [0.0969215 , 0.06244353],
       [0.02222664, 0.01578338],
       [0.02415738, 0.01778952],
       [0.02024907, 0.00672536],
       [0.02500724, 0.01242779],
       [0.01912175, 0.00606162],
       [0.05932331, 0.03690528],
       [0.06844134, 0.0444455 ],
       [0.05044686, 0.00393384],
       [0.05008215, 0.00904949],
       [0.02599162, 0.00591953],
       [0.04956218, 0.01153753],
       [0.04733304, 0.00612864],
       [0.12416814, 0.05707454],
       [0.11248489, 0.07045394],
       [0.0385546 , 0.00645496],
       [0.04952745, 0.00487907],
       [0.04086106, 0.0131053 ],
       [0.05485935, 0.00818578],
       [0.04154545, 0.007506  ],
       [0.07722042, 0.05328447],
       [0.0638752 , 0.04123212],
       [0.07812189, 0.01389549],
       [0.05926286, 0.03108077],
       [0.03692485, 0.02160651],
       [0.06672064, 0.00761959],
       [0.03954165, 0.01084108],
       [0.11126636, 0.04463073],
       [0.11067893, 0.03837935],
       [0.08972039, 0.01590184],
       [0.14395165, 0.02377742],
       [0.09803964, 0.01855576],
       [0.059948  , 0.00446716],
       [0.04035064, 0.00278837],
       [0.09153002, 0.06177192],
       [0.09320068, 0.04530613],
       [0.03140502, 0.01943803],
       [0.04172315, 0.01466549],
       [0.03991436, 0.01126458],
       [0.02519894, 0.00839441],
       [0.01699947, 0.00671597],
       [0.09296577, 0.04268649],
       [0.10410509, 0.04983582],
       [0.03064898, 0.01241794],
       [0.02730601, 0.01510759],
       [0.02487658, 0.01435425],
       [0.03393201, 0.01164079],
       [0.0239655 , 0.0050883 ],
       [0.06358676, 0.03265331],
       [0.06741337, 0.03683844],
       [0.05646525, 0.02461556],
       [0.06902146, 0.01508865],
       [0.04853818, 0.00593081],
       [0.05983517, 0.01235953],
       [0.05137608, 0.00635608],
       [0.0541583 , 0.0339362 ],
       [0.05042856, 0.0248835 ],
       [0.06452007, 0.00909583],
       [0.05463855, 0.00777592],
       [0.03005288, 0.00136787],
       [0.0474926 , 0.00606619],
       [0.02826783, 0.00242811],
       [0.05432893, 0.02864796],
       [0.06451069, 0.02921676],
       [0.06711172, 0.01126371],
       [0.05211474, 0.00967578],
       [0.02653931, 0.01364669],
       [0.05070375, 0.00636863],
       [0.02818509, 0.01122906],
       [0.0873379 , 0.05396432],
       [0.08751794, 0.04881271],
       [0.07529112, 0.00654264],
       [0.07020491, 0.01723818],
       [0.04758654, 0.01144851],
       [0.08261003, 0.00918995],
       [0.03713323, 0.00767515],
       [0.06179716, 0.02684996],
       [0.05616752, 0.02458286],
       [0.08429478, 0.02419465],
       [0.09040344, 0.04778768],
       [0.05908203, 0.04133373],
       [0.06453676, 0.04603129],
       [0.03852678, 0.01008504],
       [0.06419566, 0.04955133],
       [0.06270543, 0.03789536],
       [0.02649751, 0.00803221],
       [0.02576678, 0.01562968],
       [0.02386902, 0.00755392],
       [0.026354  , 0.00931457],
       [0.02286716, 0.00867855],
       [0.04174707, 0.02722968],
       [0.03848985, 0.02107703],
       [0.03408534, 0.02487225],
       [0.02655455, 0.0291366 ],
       [0.02094493, 0.00716199],
       [0.02362545, 0.01310268],
       [0.02340092, 0.0119515 ],
       [0.07906015, 0.04495811],
       [0.06888025, 0.0442522 ],
       [0.02407665, 0.01066833],
       [0.03110707, 0.00732316],
       [0.02085261, 0.00096723],
       [0.04219727, 0.04750712],
       [0.01970742, 0.00096511],
       [0.01981423, 0.00196725],
       [0.02229153, 0.00542349],
       [0.02281374, 0.00604543],
       [0.03097825, 0.0064902 ],
       [0.02132051, 0.00103194],
       [0.02956018, 0.00963568],
       [0.02231737, 0.00410791],
       [0.02439048, 0.00455354],
       [0.05000706, 0.02383377],
       [0.064298  , 0.03223356],
       [0.04895482, 0.00267739],
       [0.07360824, 0.01488585],
       [0.05343212, 0.00351828],
       [0.06551245, 0.00669362],
       [0.04941723, 0.00875284],
       [0.04785902, 0.03845829],
       [0.0199244 , 0.01894239],
       [0.03905646, 0.01865368],
       [0.04302701, 0.02121043],
       [0.0276773 , 0.01179262],
       [0.03782888, 0.01656477],
       [0.03066322, 0.01703873],
       [0.02207308, 0.01316066],
       [0.02220953, 0.01314401],
       [0.15428638, 0.01690589],
       [0.09630176, 0.02930831],
       [0.15509438, 0.09276395],
       [0.11556943, 0.02297215],
       [0.06960384, 0.00895991],
       [0.06418306, 0.01284583],
       [0.06860784, 0.00750579],
       [0.03513338, 0.00232827],
       [0.06745328, 0.00315454],
       [0.03571403, 0.00211246],
       [0.11620473, 0.00912301],
       [0.06388811, 0.00827902],
       [0.05453125, 0.02038243],
       [0.02789524, 0.0417326 ],
       [0.02776713, 0.02875004],
       [0.02412834, 0.03925213],
       [0.09714187, 0.03477876],
       [0.07630937, 0.02205347],
       [0.05836017, 0.01956413],
       [0.07699613, 0.01966548],
       [0.05165145, 0.01457113],
       [0.01814448, 0.00765121],
       [0.01896218, 0.00942553],
       [0.01808077, 0.00771513],
       [0.04567775, 0.00221061],
       [0.02458043, 0.00284532],
       [0.0265816 , 0.00321473],
       [0.04837032, 0.00380908],
       [0.02571981, 0.00106344],
       [0.04332938, 0.01047251],
       [0.04883743, 0.00245721],
       [0.03439034, 0.00389592],
       [0.05813668, 0.00572955],
       [0.03028121, 0.00294335],
       [0.05586397, 0.00980335],
       [0.06183207, 0.00775425],
       [0.04027009, 0.00794348],
       [0.06504198, 0.00434342],
       [0.03683134, 0.00313853],
       [0.10267928, 0.00872451],
       [0.14867247, 0.01975351],
       [0.1257674 , 0.03768157],
       [0.06207882, 0.00630047],
       [0.04324247, 0.0021459 ],
       [0.03616404, 0.00538445],
       [0.06617276, 0.01587198],
       [0.03377799, 0.00847464],
       [0.02790307, 0.00540516],
       [0.01475672, 0.00068555],
       [0.06856428, 0.00596784],
       [0.0527039 , 0.00594065],
       [0.03204028, 0.00491591],
       [0.05535936, 0.00240478],
       [0.02998272, 0.00337078],
       [0.09314423, 0.01300252],
       [0.04803051, 0.00247526],
       [0.02579498, 0.00435521],
       [0.04774683, 0.00245418],
       [0.03134067, 0.01968811],
       [0.07954392, 0.00361803],
       [0.08026746, 0.02211988],
       [0.02960512, 0.00139591],
       [0.07879223, 0.00932616],
       [0.03649997, 0.00809681],
       [0.08429478, 0.02419465],
       [0.09040344, 0.04778768],
       [0.05908203, 0.04133373],
       [0.06453676, 0.04603129],
       [0.03852678, 0.01008504],
       [0.0267845 , 0.00542232],
       [0.02100207, 0.00238617]])
X_zero_crossing_rate_stats.shape
(240, 2)

Tempogram#

A tempogram is a time-tempo representation that shows how the tempo of a piece of music, or of any audio signal, varies over time. It is a two-dimensional feature: for each frame it holds the strength of periodicity at a range of lags, which gives a detailed view of the rhythmic dynamics of the audio. In music this helps describe structure and expression; in speech it can capture rhythm and articulation patterns.
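
librosa documents its tempogram as the local autocorrelation of the onset strength envelope. The sketch below shows the core idea on a single window: the autocorrelation of a periodic envelope peaks at the lag of its period. The function name `normalized_autocorrelation` and the toy envelope are assumptions for this example, and the windowing that librosa applies is left out.

```python
import numpy as np

def normalized_autocorrelation(envelope, max_lag):
    """Autocorrelation for lags 0..max_lag-1, scaled so that lag 0 equals 1.

    Hypothetical helper for illustration only.
    """
    envelope = np.asarray(envelope, dtype=float)
    n = len(envelope)
    ac = np.array([np.dot(envelope[: n - lag], envelope[lag:]) for lag in range(max_lag)])
    return ac / ac[0]

# Toy stand-in for an onset strength envelope: one pulse every 4 frames
envelope = np.tile([1.0, 0.0, 0.0, 0.0], 8)
ac = normalized_autocorrelation(envelope, max_lag=9)

# The strongest non-zero lag recovers the period of the pulses
dominant_lag = int(np.argmax(ac[1:])) + 1
print(ac, dominant_lag)
```

The value at lag 0 is always 1 after this scaling, which matches the leading 1.0 in each row of the tempogram output.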

# Feature extraction with tempogram
x_tempogram = librosa.feature.tempogram(y=audio_signal_1, hop_length=fp).T
x_tempogram
array([[ 1.00000000e+00,  9.41933158e-01,  8.76072103e-01, ...,
         1.59134570e-17, -3.85780776e-17,  6.93578724e-17],
       [ 1.00000000e+00,  9.42431885e-01,  8.76818837e-01, ...,
        -7.22920084e-17, -5.24033103e-17, -2.94649892e-17],
       [ 1.00000000e+00,  9.42927130e-01,  8.77559350e-01, ...,
         2.42083309e-17,  4.45072418e-17,  6.77565953e-17],
       ...,
       [ 1.00000000e+00,  9.82748236e-01,  9.40765518e-01, ...,
         1.59796469e-13,  1.67667967e-14, -6.19583679e-17],
       [ 1.00000000e+00,  9.82782184e-01,  9.40875757e-01, ...,
         1.23167723e-13,  1.18727972e-14, -2.76223874e-17],
       [ 1.00000000e+00,  9.82816140e-01,  9.40986261e-01, ...,
         8.18619518e-14,  6.58631321e-15, -5.36289546e-17]])
x_tempogram.shape
(1692, 384)

From Tempogram matrices to tensor#

X_tempogram_tensor = get_X_tensor_audio_features(paths=files_df['path'], method='tempogram', sr=fs, hop_length=fp)
X_tempogram_tensor
array([[[ 1.00000000e+00,  1.00000000e+00,  1.00000000e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 9.74025825e-01,  9.73998168e-01,  9.73970572e-01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 9.24148727e-01,  9.23955753e-01,  9.23763154e-01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        ...,
        [-1.28282802e-17,  1.08569700e-16,  1.05896581e-16, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [-5.90230469e-17,  2.06738282e-17,  7.99315540e-17, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 1.64078887e-17,  3.83782012e-17,  8.48954562e-17, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00]],

       [[ 1.00000000e+00,  1.00000000e+00,  1.00000000e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 9.53294547e-01,  9.53346317e-01,  9.53398304e-01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 8.48126253e-01,  8.48283160e-01,  8.48440825e-01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        ...,
        [ 4.15936158e-17,  2.20724551e-17,  3.39125953e-17, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 4.23709512e-17,  1.30717984e-17,  1.46523866e-17, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 6.67027846e-17,  3.10240619e-17,  4.69924712e-17, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00]],

       [[ 1.00000000e+00,  1.00000000e+00,  1.00000000e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 9.64979117e-01,  9.65065593e-01,  9.65152299e-01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 9.16047076e-01,  9.16234321e-01,  9.16422126e-01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        ...,
        [ 4.93275170e-17, -2.60084241e-17,  9.42100486e-19, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 6.90084451e-17,  5.23157957e-18, -4.78983721e-17, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 2.60784817e-17,  5.12694797e-17, -1.12246315e-17, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00]],

       ...,

       [[ 1.00000000e+00,  1.00000000e+00,  1.00000000e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 9.82744870e-01,  9.82783156e-01,  9.82820940e-01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 9.46394249e-01,  9.46493980e-01,  9.46592393e-01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        ...,
        [ 4.48524823e-17, -6.33349599e-17,  2.18787322e-17, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [-5.60620420e-17, -1.08045290e-16, -3.81772827e-17, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [-6.32051096e-19, -1.12831175e-16, -3.46585914e-17, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00]],

       [[ 1.00000000e+00,  1.00000000e+00,  1.00000000e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 9.42840089e-01,  9.43384184e-01,  9.43925014e-01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 8.96773308e-01,  8.97585307e-01,  8.98392989e-01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        ...,
        [-1.00166280e-16, -1.63361461e-16, -3.34007391e-17, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [-8.14673642e-17, -1.15189864e-16, -6.16683675e-17, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [-7.50167651e-17, -7.13336301e-17, -5.65139450e-17, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00]],

       [[ 1.00000000e+00,  1.00000000e+00,  1.00000000e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 9.51249855e-01,  9.51799326e-01,  9.52344081e-01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 9.10201027e-01,  9.11061540e-01,  9.11915205e-01, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        ...,
        [-1.94385523e-16,  9.77787920e-17,  3.71273595e-18, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [-1.31127758e-16,  1.35390064e-16, -9.61137305e-17, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [-1.75107854e-16,  1.55994476e-16, -3.74720293e-17, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00]]])
X_tempogram_tensor.shape
(240, 384, 4403)

From Tempogram matrices to predictors matrix#

X_tempogram_stats = get_X_audio_features(paths=files_df['path'], method='tempogram', stats='mean-std', sr=fs, hop_length=fp)
X_tempogram_stats
array([[1.00000000e+00, 9.82917312e-01, 9.40125605e-01, ...,
        1.97084272e-10, 2.59539915e-11, 9.25080531e-17],
       [1.00000000e+00, 9.82942999e-01, 9.42317549e-01, ...,
        1.96826479e-10, 2.51983908e-11, 9.82804325e-17],
       [1.00000000e+00, 9.88717836e-01, 9.62690178e-01, ...,
        2.05286095e-10, 2.61386952e-11, 1.08429054e-16],
       ...,
       [1.00000000e+00, 9.84447901e-01, 9.48210334e-01, ...,
        1.78410464e-11, 2.02061978e-12, 8.59678112e-17],
       [1.00000000e+00, 9.88605317e-01, 9.69314379e-01, ...,
        1.18753587e-10, 1.49967598e-11, 1.25303448e-16],
       [1.00000000e+00, 9.90035415e-01, 9.72834162e-01, ...,
        1.21425552e-10, 1.53642920e-11, 1.30194034e-16]])
X_tempogram_stats.shape
(240, 768)