
librosa.core.A_weighting(frequencies, min_db=-80.0)

Compute the A-weighting of a set of frequencies.

frequencies : scalar or np.ndarray [shape=(n,)]
One or more frequencies (in Hz)
min_db : float [scalar] or None
Clip weights below this threshold. If None, no clipping is performed.
A_weighting : scalar or np.ndarray [shape=(n,)]
A_weighting[i] is the A-weighting of frequencies[i]

perceptual_weighting

Get the A-weighting for CQT frequencies

>>> import matplotlib.pyplot as plt
>>> freqs = librosa.cqt_frequencies(108, librosa.note_to_hz('C1'))
>>> aw = librosa.A_weighting(freqs)
>>> plt.plot(freqs, aw)
>>> plt.xlabel('Frequency (Hz)')
>>> plt.ylabel('Weighting (log10)')
>>> plt.title('A-Weighting of CQT frequencies')
librosa.core.amplitude_to_db(S, ref=1.0, amin=1e-05, top_db=80.0)

Convert an amplitude spectrogram to dB-scaled spectrogram.

This is equivalent to power_to_db(S**2), but is provided for convenience.

S : np.ndarray
input amplitude
ref : scalar or callable

If scalar, the amplitude abs(S) is scaled relative to ref: 20 * log10(S / ref). Zeros in the output correspond to positions where S == ref.

If callable, the reference value is computed as ref(S).

amin : float > 0 [scalar]
minimum threshold for S and ref
top_db : float >= 0 [scalar]
threshold the output at top_db below the peak: max(20 * log10(S)) - top_db
S_db : np.ndarray
S measured in dB

logamplitude, power_to_db, db_to_amplitude

This function caches at level 30.
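
A minimal usage sketch (following the conventions of the other examples, with numpy imported as np): convert an STFT magnitude to dB relative to its peak.

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> S = np.abs(librosa.stft(y))
>>> S_db = librosa.amplitude_to_db(S, ref=np.max)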

librosa.core.autocorrelate(y, max_size=None, axis=-1)

Bounded auto-correlation

y : np.ndarray
array to autocorrelate
max_size : int > 0 or None
maximum correlation lag. If unspecified, defaults to y.shape[axis] (unbounded)
axis : int
The axis along which to autocorrelate. By default, the last axis (-1) is taken.
z : np.ndarray
truncated autocorrelation y*y along the specified axis. If max_size is specified, then z.shape[axis] is bounded to max_size.

This function caches at level 20.

Compute full autocorrelation of y

>>> y, sr = librosa.load(librosa.util.example_audio_file(), offset=20, duration=10)
>>> librosa.autocorrelate(y)
array([  3.226e+03,   3.217e+03, ...,   8.277e-04,   3.575e-04], dtype=float32)

Compute onset strength auto-correlation up to 4 seconds

>>> import matplotlib.pyplot as plt
>>> odf = librosa.onset.onset_strength(y=y, sr=sr, hop_length=512)
>>> ac = librosa.autocorrelate(odf, max_size=4 * sr // 512)
>>> plt.plot(ac)
>>> plt.title('Auto-correlation')
>>> plt.xlabel('Lag (frames)')
librosa.core.clicks(times=None, frames=None, sr=22050, hop_length=512, click_freq=1000.0, click_duration=0.1, click=None, length=None)

Returns a signal with the click signal placed at each specified time

times : np.ndarray or None
times to place clicks, in seconds
frames : np.ndarray or None
frame indices to place clicks
sr : number > 0
desired sampling rate of the output signal
hop_length : int > 0
if positions are specified by frames, the number of samples between frames.
click_freq : float > 0
frequency (in Hz) of the default click signal. Default is 1 kHz.
click_duration : float > 0
duration (in seconds) of the default click signal. Default is 100 ms.
click : np.ndarray or None
optional click signal sample to use instead of the default blip.
length : int > 0
desired number of samples in the output signal
click_signal : np.ndarray
Synthesized click signal
ParameterError
  • If neither times nor frames are provided.
  • If any of click_freq, click_duration, or length are out of range.
>>> # Sonify detected beat events
>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
>>> y_beats = librosa.clicks(frames=beats, sr=sr)
>>> # Or generate a signal of the same length as y
>>> y_beats = librosa.clicks(frames=beats, sr=sr, length=len(y))
>>> # Or use timing instead of frame indices
>>> times = librosa.frames_to_time(beats, sr=sr)
>>> y_beat_times = librosa.clicks(times=times, sr=sr)
>>> # Or with a click frequency of 880Hz and a 500ms sample
>>> y_beat_times880 = librosa.clicks(times=times, sr=sr,
...                                  click_freq=880, click_duration=0.5)

Display click waveform next to the spectrogram

>>> import matplotlib.pyplot as plt
>>> plt.figure()
>>> S = librosa.feature.melspectrogram(y=y, sr=sr)
>>> ax = plt.subplot(2,1,2)
>>> librosa.display.specshow(librosa.power_to_db(S, ref=np.max),
...                          x_axis='time', y_axis='mel')
>>> plt.subplot(2,1,1, sharex=ax)
>>> librosa.display.waveplot(y_beat_times, sr=sr, label='Beat clicks')
>>> plt.legend()
>>> plt.xlim(15, 30)
>>> plt.tight_layout()
librosa.core.cqt(y, sr=22050, hop_length=512, fmin=None, n_bins=84, bins_per_octave=12, tuning=0.0, filter_scale=1, norm=1, sparsity=0.01, window='hann', scale=True, real=<DEPRECATED parameter>, pad_mode='reflect')

Compute the constant-Q transform of an audio signal.

This implementation is based on the recursive sub-sampling method described by [1].

[1] Schoerkhuber, Christian, and Anssi Klapuri. “Constant-Q transform toolbox for music processing.” 7th Sound and Music Computing Conference, Barcelona, Spain. 2010.
y : np.ndarray [shape=(n,)]
audio time series
sr : number > 0 [scalar]
sampling rate of y
hop_length : int > 0 [scalar]
number of samples between successive CQT columns.
fmin : float > 0 [scalar]
Minimum frequency. Defaults to C1 ~= 32.70 Hz
n_bins : int > 0 [scalar]
Number of frequency bins, starting at fmin
bins_per_octave : int > 0 [scalar]
Number of bins per octave
tuning : None or float in [-0.5, 0.5)

Tuning offset in fractions of a bin (cents).

If None, tuning will be automatically estimated from the signal.

filter_scale : float > 0
Filter scale factor. Small values (<1) use shorter windows for improved time resolution.
norm : {inf, -inf, 0, float > 0}
Type of norm to use for basis function normalization. See librosa.util.normalize.
sparsity : float in [0, 1)

Sparsify the CQT basis by discarding up to sparsity fraction of the energy in each basis.

Set sparsity=0 to disable sparsification.

window : str, tuple, number, or function
Window specification for the basis filters. See filters.get_window for details.
scale : bool

If True, scale the CQT response by the square root of the length of each channel's filter. This is analogous to norm='ortho' in FFT.

If False, do not scale the CQT. This is analogous to norm=None in FFT.

real : bool [DEPRECATED]

If False, return a complex-valued constant-Q transform (default).

If True, return the CQT magnitude.

Warning

This parameter is deprecated in librosa 0.5.0. It will be removed in librosa 0.6.0.

pad_mode : string

Padding mode for centered frame analysis.

See also: librosa.core.stft and np.pad.

CQT : np.ndarray [shape=(n_bins, t), dtype=np.complex or np.float]
Constant-Q value for each frequency at each time.
ParameterError

If hop_length is not an integer multiple of 2**(n_bins / bins_per_octave)

Or if y is too short to support the frequency range of the CQT.

librosa.core.resample librosa.util.normalize

This function caches at level 20.

Generate and plot a constant-Q power spectrum

>>> import matplotlib.pyplot as plt
>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> C = librosa.cqt(y, sr=sr)
>>> librosa.display.specshow(librosa.amplitude_to_db(C, ref=np.max),
...                          sr=sr, x_axis='time', y_axis='cqt_note')
>>> plt.colorbar(format='%+2.0f dB')
>>> plt.title('Constant-Q power spectrum')
>>> plt.tight_layout()

Limit the frequency range

>>> C = librosa.cqt(y, sr=sr, fmin=librosa.note_to_hz('C2'),
...                 n_bins=60)
>>> C
array([[  8.827e-04,   9.293e-04, ...,   3.133e-07,   2.942e-07],
       [  1.076e-03,   1.068e-03, ...,   1.153e-06,   1.148e-06],
       ...,
       [  1.042e-07,   4.087e-07, ...,   1.612e-07,   1.928e-07],
       [  2.363e-07,   5.329e-07, ...,   1.294e-07,   1.611e-07]])

Using a higher frequency resolution

>>> C = librosa.cqt(y, sr=sr, fmin=librosa.note_to_hz('C2'),
...                 n_bins=60 * 2, bins_per_octave=12 * 2)
>>> C
array([[  1.536e-05,   5.848e-05, ...,   3.241e-07,   2.453e-07],
       [  1.856e-03,   1.854e-03, ...,   2.397e-08,   3.549e-08],
       ...,
       [  2.034e-07,   4.245e-07, ...,   6.213e-08,   1.463e-07],
       [  4.896e-08,   5.407e-07, ...,   9.176e-08,   1.051e-07]])
librosa.core.cqt_frequencies(n_bins, fmin, bins_per_octave=12, tuning=0.0)

Compute the center frequencies of Constant-Q bins.

>>> # Get the CQT frequencies for 24 notes, starting at C2
>>> librosa.cqt_frequencies(24, fmin=librosa.note_to_hz('C2'))
array([  65.406,   69.296,   73.416,   77.782,   82.407,   87.307,
         92.499,   97.999,  103.826,  110.   ,  116.541,  123.471,
        130.813,  138.591,  146.832,  155.563,  164.814,  174.614,
        184.997,  195.998,  207.652,  220.   ,  233.082,  246.942])
n_bins : int > 0 [scalar]
Number of constant-Q bins
fmin : float > 0 [scalar]
Minimum frequency
bins_per_octave : int > 0 [scalar]
Number of bins per octave
tuning : float in [-0.5, +0.5)
Deviation from A440 tuning in fractional bins (cents)
frequencies : np.ndarray [shape=(n_bins,)]
Center frequency for each CQT bin
librosa.core.db_to_amplitude(S_db, ref=1.0)

Convert a dB-scaled spectrogram to an amplitude spectrogram.

This effectively inverts amplitude_to_db:

db_to_amplitude(S_db) ~= 10.0**(0.5 * S_db/10 + log10(ref))
S_db : np.ndarray
dB-scaled spectrogram
ref: number > 0
Optional reference power.
S : np.ndarray
Linear magnitude spectrogram

This function caches at level 30.
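
A round-trip sketch; note that the inversion is exact only up to the amin and top_db thresholds applied by amplitude_to_db:

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> S = np.abs(librosa.stft(y))
>>> S_approx = librosa.db_to_amplitude(librosa.amplitude_to_db(S))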

librosa.core.db_to_power(S_db, ref=1.0)

Convert a dB-scale spectrogram to a power spectrogram.

This effectively inverts power_to_db:

db_to_power(S_db) ~= ref * 10.0**(S_db / 10)
S_db : np.ndarray
dB-scaled spectrogram
ref : number > 0
Reference power: output will be scaled by this value
S : np.ndarray
Power spectrogram

This function caches at level 30.
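
A round-trip sketch, analogous to db_to_amplitude above; the same amin and top_db caveats from power_to_db apply:

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> S_power = np.abs(librosa.stft(y))**2
>>> S_power_approx = librosa.db_to_power(librosa.power_to_db(S_power))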

librosa.core.dtw(X, Y, metric='euclidean', step_sizes_sigma=None, weights_add=None, weights_mul=None, subseq=False, backtrack=True, global_constraints=False, band_rad=0.25)

Dynamic time warping (DTW).

This function performs a DTW and path backtracking on two sequences. We follow the nomenclature and algorithmic approach as described in [1].

[1] Meinard Mueller. Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications. Springer Verlag, ISBN: 978-3-319-21944-8, 2015.
X : np.ndarray [shape=(K, N)]
audio feature matrix (e.g., chroma features)
Y : np.ndarray [shape=(K, M)]
audio feature matrix (e.g., chroma features)
metric : str
Identifier for the cost function as documented in scipy.spatial.distance.cdist()
step_sizes_sigma : np.ndarray [shape=[n, 2]]
Specifies allowed step sizes as used by the dtw.
weights_add : np.ndarray [shape=[n, ]]
Additive weights to penalize certain step sizes.
weights_mul : np.ndarray [shape=[n, ]]
Multiplicative weights to penalize certain step sizes.
subseq : binary
Enable subsequence DTW, e.g., for retrieval tasks.
backtrack : binary
Enable backtracking in accumulated cost matrix.
global_constraints : binary
Applies global constraints to the cost matrix C (Sakoe-Chiba band).
band_rad : float
The Sakoe-Chiba band radius (1/2 of the width) will be int(radius*min(C.shape)).
D : np.ndarray [shape=(N,M)]
accumulated cost matrix. D[N,M] is the total alignment cost. When doing subsequence DTW, D[N,:] indicates a matching function.
wp : np.ndarray [shape=(N,2)]
Warping path with index pairs. Each row of the array contains an index pair (n, m). Only returned when backtrack is True.
ParameterError
If you are doing diagonal matching and Y is shorter than X
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> y, sr = librosa.load(librosa.util.example_audio_file(), offset=10, duration=15)
>>> X = librosa.feature.chroma_cens(y=y, sr=sr)
>>> noise = np.random.rand(X.shape[0], 200)
>>> Y = np.concatenate((noise, noise, X, noise), axis=1)
>>> D, wp = librosa.dtw(X, Y, subseq=True)
>>> plt.subplot(2, 1, 1)
>>> librosa.display.specshow(D, x_axis='frames', y_axis='frames')
>>> plt.title('Database excerpt')
>>> plt.plot(wp[:, 1], wp[:, 0], label='Optimal path', color='y')
>>> plt.legend()
>>> plt.subplot(2, 1, 2)
>>> plt.plot(D[-1, :] / wp.shape[0])
>>> plt.xlim([0, Y.shape[1]])
>>> plt.ylim([0, 2])
>>> plt.title('Matching cost function')
>>> plt.tight_layout()
librosa.core.estimate_tuning(y=None, sr=22050, S=None, n_fft=2048, resolution=0.01, bins_per_octave=12, **kwargs)

Estimate the tuning of an audio time series or spectrogram input.

y: np.ndarray [shape=(n,)] or None
audio signal
sr : number > 0 [scalar]
audio sampling rate of y
S: np.ndarray [shape=(d, t)] or None
magnitude or power spectrogram
n_fft : int > 0 [scalar] or None
number of FFT bins to use, if y is provided.
resolution : float in (0, 1)
Resolution of the tuning as a fraction of a bin. 0.01 corresponds to measurements in cents.
bins_per_octave : int > 0 [scalar]
How many frequency bins per octave
kwargs : additional keyword arguments
Additional arguments passed to piptrack
tuning: float in [-0.5, 0.5)
estimated tuning deviation (fractions of a bin)
piptrack
Pitch tracking by parabolic interpolation
>>> # With time-series input
>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> librosa.estimate_tuning(y=y, sr=sr)
0.089999999999999969
>>> # In tenths of a cent
>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> librosa.estimate_tuning(y=y, sr=sr, resolution=1e-3)
0.093999999999999972
>>> # Using spectrogram input
>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> S = np.abs(librosa.stft(y))
>>> librosa.estimate_tuning(S=S, sr=sr)
0.089999999999999969
>>> # Using pass-through arguments to `librosa.piptrack`
>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> librosa.estimate_tuning(y=y, sr=sr, n_fft=8192,
...                         fmax=librosa.note_to_hz('G#9'))
0.070000000000000062
librosa.core.fft_frequencies(sr=22050, n_fft=2048)

Alternative implementation of np.fft.rfftfreq

sr : number > 0 [scalar]
Audio sampling rate
n_fft : int > 0 [scalar]
FFT window size
freqs : np.ndarray [shape=(1 + n_fft/2,)]
Frequencies (0, sr/n_fft, 2*sr/n_fft, ..., sr/2)
>>> librosa.fft_frequencies(sr=22050, n_fft=16)
array([     0.   ,   1378.125,   2756.25 ,   4134.375,
         5512.5  ,   6890.625,   8268.75 ,   9646.875,  11025.   ])
librosa.core.fill_off_diagonal(x, radius, value=0)

Sets all cells of a matrix to a given value if they lie outside a constraint region. In this case, the constraint region is the Sakoe-Chiba band which runs with a fixed radius along the main diagonal. When x.shape[0] != x.shape[1], the radius is expanded so that the corner cell x[-1, -1] always lies inside the band.

x will be modified in place.

x : np.ndarray [shape=(N, M)]
Input matrix, will be modified in place.
radius : float
The band radius (1/2 of the width) will be int(radius*min(x.shape)).
value : int
x[n, m] = value when (n, m) lies outside the band.
>>> x = np.ones((8, 8))
>>> librosa.fill_off_diagonal(x, 0.25)
>>> x
array([[1, 1, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 0, 0],
       [0, 0, 0, 0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 1, 1]])
>>> x = np.ones((8, 12))
>>> librosa.fill_off_diagonal(x, 0.25)
>>> x
array([[1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
       [0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0],
       [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0],
       [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]])
librosa.core.fmt(y, t_min=0.5, n_fmt=None, kind='cubic', beta=0.5, over_sample=1, axis=-1)

The fast Mellin transform (FMT) [1] of a uniformly sampled signal y.

When the Mellin parameter (beta) is 1/2, it is also known as the scale transform [2]. The scale transform can be useful for audio analysis because its magnitude is invariant to scaling of the domain (e.g., time stretching or compression). This is analogous to the magnitude of the Fourier transform being invariant to shifts in the input domain.

[1] De Sena, Antonio, and Davide Rocchesso. “A fast Mellin and scale transform.” EURASIP Journal on Applied Signal Processing 2007.1 (2007): 75-75.
[2] Cohen, L. “The scale representation.” IEEE Transactions on Signal Processing 41, no. 12 (1993): 3275-3292.
y : np.ndarray, real-valued
The input signal(s). Can be multidimensional. The target axis must contain at least 3 samples.
t_min : float > 0
The minimum time spacing (in samples). This value should generally be less than 1 to preserve as much information as possible.
n_fmt : int > 2 or None
The number of scale transform bins to use. If None, then n_fmt = over_sample * ceil(n * log((n-1)/t_min)) is taken, where n = y.shape[axis]
kind : str

The type of interpolation to use when re-sampling the input. See scipy.interpolate.interp1d for possible values.

Note that the default is to use high-precision (cubic) interpolation. This can be slow in practice; if speed is preferred over accuracy, then consider using kind='linear'.

beta : float
The Mellin parameter. beta=0.5 provides the scale transform.
over_sample : float >= 1
Over-sampling factor for exponential resampling.
axis : int
The axis along which to transform y
x_scale : np.ndarray [dtype=complex]
The scale transform of y along the axis dimension.
ParameterError
if n_fmt < 2 or t_min <= 0 or if y is not finite or if y.shape[axis] < 3.

This function caches at level 30.

>>> # Generate a signal and time-stretch it (with energy normalization)
>>> scale = 1.25
>>> freq = 3.0
>>> x1 = np.linspace(0, 1, num=1024, endpoint=False)
>>> x2 = np.linspace(0, 1, num=int(scale * len(x1)), endpoint=False)
>>> y1 = np.sin(2 * np.pi * freq * x1)
>>> y2 = np.sin(2 * np.pi * freq * x2) / np.sqrt(scale)
>>> # Verify that the two signals have the same energy
>>> np.sum(np.abs(y1)**2), np.sum(np.abs(y2)**2)
    (255.99999999999997, 255.99999999999969)
>>> scale1 = librosa.fmt(y1, n_fmt=512)
>>> scale2 = librosa.fmt(y2, n_fmt=512)
>>> # And plot the results
>>> import matplotlib.pyplot as plt
>>> plt.figure(figsize=(8, 4))
>>> plt.subplot(1, 2, 1)
>>> plt.plot(y1, label='Original')
>>> plt.plot(y2, linestyle='--', label='Stretched')
>>> plt.xlabel('time (samples)')
>>> plt.title('Input signals')
>>> plt.legend(frameon=True)
>>> plt.axis('tight')
>>> plt.subplot(1, 2, 2)
>>> plt.semilogy(np.abs(scale1), label='Original')
>>> plt.semilogy(np.abs(scale2), linestyle='--', label='Stretched')
>>> plt.xlabel('scale coefficients')
>>> plt.title('Scale transform magnitude')
>>> plt.legend(frameon=True)
>>> plt.axis('tight')
>>> plt.tight_layout()
>>> # Plot the scale transform of an onset strength autocorrelation
>>> y, sr = librosa.load(librosa.util.example_audio_file(),
...                      offset=10.0, duration=30.0)
>>> odf = librosa.onset.onset_strength(y=y, sr=sr)
>>> # Auto-correlate with up to 10 seconds lag
>>> odf_ac = librosa.autocorrelate(odf, max_size=10 * sr // 512)
>>> # Normalize
>>> odf_ac = librosa.util.normalize(odf_ac, norm=np.inf)
>>> # Compute the scale transform
>>> odf_ac_scale = librosa.fmt(librosa.util.normalize(odf_ac), n_fmt=512)
>>> # Plot the results
>>> plt.figure()
>>> plt.subplot(3, 1, 1)
>>> plt.plot(odf, label='Onset strength')
>>> plt.axis('tight')
>>> plt.xlabel('Time (frames)')
>>> plt.xticks([])
>>> plt.legend(frameon=True)
>>> plt.subplot(3, 1, 2)
>>> plt.plot(odf_ac, label='Onset autocorrelation')
>>> plt.axis('tight')
>>> plt.xlabel('Lag (frames)')
>>> plt.xticks([])
>>> plt.legend(frameon=True)
>>> plt.subplot(3, 1, 3)
>>> plt.semilogy(np.abs(odf_ac_scale), label='Scale transform magnitude')
>>> plt.axis('tight')
>>> plt.xlabel('scale coefficients')
>>> plt.legend(frameon=True)
>>> plt.tight_layout()
librosa.core.frames_to_samples(frames, hop_length=512, n_fft=None)

Converts frame indices to audio sample indices

frames : np.ndarray [shape=(n,)]
vector of frame indices
hop_length : int > 0 [scalar]
number of samples between successive frames
n_fft : None or int > 0 [scalar]
Optional: length of the FFT window. If given, time conversion will include an offset of n_fft / 2 to counteract windowing effects when using a non-centered STFT.
times : np.ndarray [shape=(n,)]
time (in samples) of each given frame number: times[i] = frames[i] * hop_length

frames_to_time : convert frame indices to time values samples_to_frames : convert sample indices to frame indices

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> tempo, beats = librosa.beat.beat_track(y, sr=sr)
>>> beat_samples = librosa.frames_to_samples(beats)
librosa.core.frames_to_time(frames, sr=22050, hop_length=512, n_fft=None)

Converts frame counts to time (seconds)

frames : np.ndarray [shape=(n,)]
vector of frame numbers
sr : number > 0 [scalar]
audio sampling rate
hop_length : int > 0 [scalar]
number of samples between successive frames
n_fft : None or int > 0 [scalar]
Optional: length of the FFT window. If given, time conversion will include an offset of n_fft / 2 to counteract windowing effects when using a non-centered STFT.
times : np.ndarray [shape=(n,)]
time (in seconds) of each given frame number: times[i] = frames[i] * hop_length / sr

time_to_frames : convert time values to frame indices frames_to_samples : convert frame indices to sample indices

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> tempo, beats = librosa.beat.beat_track(y, sr=sr)
>>> beat_times = librosa.frames_to_time(beats, sr=sr)
librosa.core.get_duration(y=None, sr=22050, S=None, n_fft=2048, hop_length=512, center=True, filename=None)

Compute the duration (in seconds) of an audio time series, feature matrix, or filename.

>>> # Load the example audio file
>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> librosa.get_duration(y=y, sr=sr)
61.44
>>> # Or directly from an audio file
>>> librosa.get_duration(filename=librosa.util.example_audio_file())
61.4
>>> # Or compute duration from an STFT matrix
>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> S = librosa.stft(y)
>>> librosa.get_duration(S=S, sr=sr)
61.44
>>> # Or a non-centered STFT matrix
>>> S_left = librosa.stft(y, center=False)
>>> librosa.get_duration(S=S_left, sr=sr)
61.3471201814059
y : np.ndarray [shape=(n,), (2, n)] or None
audio time series
sr : number > 0 [scalar]
audio sampling rate of y
S : np.ndarray [shape=(d, t)] or None
STFT matrix, or any STFT-derived matrix (e.g., chromagram or mel spectrogram).
n_fft : int > 0 [scalar]
FFT window size for S
hop_length : int > 0 [scalar]
number of audio samples between columns of S
center : boolean
  • If True, S[:, t] is centered at y[t * hop_length]
  • If False, then S[:, t] begins at y[t * hop_length]
filename : str
If provided, all other parameters are ignored, and the duration is calculated directly from the audio file. Note that this avoids loading the contents into memory, and is therefore useful for querying the duration of long files.
d : float >= 0
Duration (in seconds) of the input time series or spectrogram.
librosa.core.hybrid_cqt(y, sr=22050, hop_length=512, fmin=None, n_bins=84, bins_per_octave=12, tuning=0.0, filter_scale=1, norm=1, sparsity=0.01, window='hann', scale=True, pad_mode='reflect')

Compute the hybrid constant-Q transform of an audio signal.

Here, the hybrid CQT uses the pseudo CQT for higher frequencies where the hop_length is longer than half the filter length and the full CQT for lower frequencies.

y : np.ndarray [shape=(n,)]
audio time series
sr : number > 0 [scalar]
sampling rate of y
hop_length : int > 0 [scalar]
number of samples between successive CQT columns.
fmin : float > 0 [scalar]
Minimum frequency. Defaults to C1 ~= 32.70 Hz
n_bins : int > 0 [scalar]
Number of frequency bins, starting at fmin
bins_per_octave : int > 0 [scalar]
Number of bins per octave
tuning : None or float in [-0.5, 0.5)

Tuning offset in fractions of a bin (cents).

If None, tuning will be automatically estimated from the signal.

filter_scale : float > 0
Filter scale factor. Larger values use longer windows.
sparsity : float in [0, 1)

Sparsify the CQT basis by discarding up to sparsity fraction of the energy in each basis.

Set sparsity=0 to disable sparsification.

window : str, tuple, number, or function
Window specification for the basis filters. See filters.get_window for details.
pad_mode : string

Padding mode for centered frame analysis.

See also: librosa.core.stft and np.pad.

CQT : np.ndarray [shape=(n_bins, t), dtype=np.float]
Constant-Q energy for each frequency at each time.
ParameterError

If hop_length is not an integer multiple of 2**(n_bins / bins_per_octave)

Or if y is too short to support the frequency range of the CQT.

cqt pseudo_cqt

This function caches at level 20.
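
No example is given here; a minimal usage sketch, mirroring the cqt example above (note that hybrid_cqt returns real-valued magnitudes):

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> C = librosa.hybrid_cqt(y, sr=sr)
>>> librosa.display.specshow(librosa.amplitude_to_db(C, ref=np.max),
...                          sr=sr, x_axis='time', y_axis='cqt_note')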

librosa.core.hz_to_mel(frequencies, htk=False)

Convert Hz to Mels

>>> librosa.hz_to_mel(60)
array([ 0.9])
>>> librosa.hz_to_mel([110, 220, 440])
array([ 1.65,  3.3 ,  6.6 ])
frequencies : np.ndarray [shape=(n,)], float
scalar or array of frequencies
htk : bool
use HTK formula instead of Slaney
mels : np.ndarray [shape=(n,)]
input frequencies in Mels

mel_to_hz

librosa.core.hz_to_midi(frequencies)

Get the closest MIDI note number(s) for given frequencies

>>> librosa.hz_to_midi(60)
array([ 34.506])
>>> librosa.hz_to_midi([110, 220, 440])
array([ 45.,  57.,  69.])
frequencies : float or np.ndarray [shape=(n,), dtype=float]
frequencies to convert
note_nums : np.ndarray [shape=(n,), dtype=float]
closest MIDI notes to frequencies

midi_to_hz note_to_midi hz_to_note

librosa.core.hz_to_note(frequencies, **kwargs)

Convert one or more frequencies (in Hz) to the nearest note names.

frequencies : float or iterable of float
Input frequencies, specified in Hz
kwargs : additional keyword arguments
Arguments passed through to midi_to_note
notes : list of str
notes[i] is the closest note name to frequency[i] (or frequency if the input is scalar)

hz_to_midi midi_to_note note_to_hz

Get a single note name for a frequency

>>> librosa.hz_to_note(440.0)
['A4']

Get multiple notes with cent deviation

>>> librosa.hz_to_note([32, 64], cents=True)
['C1-38', 'C2-38']

Get multiple notes, but suppress octave labels

>>> librosa.hz_to_note(440.0 * (2.0 ** np.linspace(0, 1, 12)),
...                    octave=False)
['A', 'A#', 'B', 'C', 'C#', 'D', 'E', 'F', 'F#', 'G', 'G#', 'A']
librosa.core.hz_to_octs(frequencies, A440=440.0)

Convert frequencies (Hz) to (fractional) octave numbers.

>>> librosa.hz_to_octs(440.0)
array([ 4.])
>>> librosa.hz_to_octs([32, 64, 128, 256])
array([ 0.219,  1.219,  2.219,  3.219])
frequencies : np.ndarray [shape=(n,)] or float
scalar or vector of frequencies
A440 : float
frequency of A440 (in Hz)
octaves : np.ndarray [shape=(n,)]
octave number for each frequency

octs_to_hz

librosa.core.ifgram(y, sr=22050, n_fft=2048, hop_length=None, win_length=None, window='hann', norm=False, center=True, ref_power=1e-06, clip=True, dtype=<class 'numpy.complex64'>, pad_mode='reflect')

Compute the instantaneous frequency (as a proportion of the sampling rate) obtained as the time-derivative of the phase of the complex spectrum as described by [1].

Calculates regular STFT as a side effect.

[1] Abe, Toshihiko, Takao Kobayashi, and Satoshi Imai. “Harmonics tracking and pitch extraction based on instantaneous frequency.” International Conference on Acoustics, Speech, and Signal Processing, ICASSP-95, Vol. 1. IEEE, 1995.
y : np.ndarray [shape=(n,)]
audio time series
sr : number > 0 [scalar]
sampling rate of y
n_fft : int > 0 [scalar]
FFT window size
hop_length : int > 0 [scalar]
hop length, the number of samples between subsequent frames. If not supplied, defaults to win_length / 4.
win_length : int > 0, <= n_fft
Window length. Defaults to n_fft. See stft for details.
window : string, tuple, number, function, or np.ndarray [shape=(n_fft,)]
  • a window specification (string, tuple, number); see scipy.signal.get_window
  • a window function, such as scipy.signal.hanning
  • a user-specified window vector of length n_fft

See stft for details.

norm : bool
Normalize the STFT.
center : boolean
  • If True, the signal y is padded so that frame
    D[:, t] (and if_gram) is centered at y[t * hop_length].
  • If False, then D[:, t] begins at y[t * hop_length]
ref_power : float >= 0 or callable

Minimum power threshold for estimating instantaneous frequency. Any bin with np.abs(D[f, t])**2 < ref_power will receive the default frequency estimate.

If callable, the threshold is set to ref_power(np.abs(D)**2).

clip : boolean
  • If True, clip estimated frequencies to the range [0, 0.5 * sr].
  • If False, estimated frequencies can be negative or exceed 0.5 * sr.
dtype : numeric type
Complex numeric type for D. Default is 64-bit complex.
pad_mode : string
If center=True, the padding mode to use at the edges of the signal. By default, STFT uses reflection padding.
if_gram : np.ndarray [shape=(1 + n_fft/2, t), dtype=real]
Instantaneous frequency spectrogram: if_gram[f, t] is the frequency at bin f, time t
D : np.ndarray [shape=(1 + n_fft/2, t), dtype=complex]
Short-time Fourier transform

stft : Short-time Fourier Transform

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> frequencies, D = librosa.ifgram(y, sr=sr)
>>> frequencies
array([[  0.000e+00,   0.000e+00, ...,   0.000e+00,   0.000e+00],
       [  3.150e+01,   3.070e+01, ...,   1.077e+01,   1.077e+01],
       ...,
       [  1.101e+04,   1.101e+04, ...,   1.101e+04,   1.101e+04],
       [  1.102e+04,   1.102e+04, ...,   1.102e+04,   1.102e+04]])
librosa.core.interp_harmonics(x, freqs, h_range, kind='linear', fill_value=0, axis=0)

Compute the energy at harmonics of a time-frequency representation.

Given a frequency-based energy representation such as a spectrogram or tempogram, this function computes the energy at the chosen harmonics of the frequency axis. (See examples below.) The resulting harmonic array can then be used as input to a salience computation.

x : np.ndarray
The input energy
freqs : np.ndarray, shape=(x.shape[axis])
The frequency values corresponding to x's elements along the chosen axis.
h_range : list-like, non-negative
Harmonics to compute. The first harmonic (1) corresponds to x itself. Values less than one (e.g., 1/2) correspond to sub-harmonics.
kind : str
Interpolation type. See scipy.interpolate.interp1d.
fill_value : float
The value to fill when extrapolating beyond the observed frequency range.
axis : int
The axis along which to compute harmonics
x_harm : np.ndarray, shape=(len(h_range), [x.shape])
x_harm[i] will have the same shape as x, and measure the energy at the h_range[i] harmonic of each frequency.

scipy.interpolate.interp1d

Estimate the harmonics of a time-averaged tempogram

>>> y, sr = librosa.load(librosa.util.example_audio_file(),
...                      duration=15, offset=30)
>>> # Compute the time-varying tempogram and average over time
>>> tempi = np.mean(librosa.feature.tempogram(y=y, sr=sr), axis=1)
>>> # We'll measure the first five harmonics
>>> h_range = [1, 2, 3, 4, 5]
>>> f_tempo = librosa.tempo_frequencies(len(tempi), sr=sr)
>>> # Build the harmonic tensor
>>> t_harmonics = librosa.interp_harmonics(tempi, f_tempo, h_range)
>>> print(t_harmonics.shape)
(5, 384)
>>> # And plot the results
>>> import matplotlib.pyplot as plt
>>> plt.figure()
>>> librosa.display.specshow(t_harmonics, x_axis='tempo', sr=sr)
>>> plt.yticks(0.5 + np.arange(len(h_range)),
...            ['{:.3g}'.format(_) for _ in h_range])
>>> plt.ylabel('Harmonic')
>>> plt.xlabel('Tempo (BPM)')
>>> plt.tight_layout()

We can also compute frequency harmonics for spectrograms. To calculate sub-harmonic energy, use values < 1.

>>> h_range = [1./3, 1./2, 1, 2, 3, 4]
>>> S = np.abs(librosa.stft(y))
>>> fft_freqs = librosa.fft_frequencies(sr=sr)
>>> S_harm = librosa.interp_harmonics(S, fft_freqs, h_range, axis=0)
>>> print(S_harm.shape)
(6, 1025, 646)
>>> plt.figure()
>>> for i, _sh in enumerate(S_harm, 1):
...     plt.subplot(3, 2, i)
...     librosa.display.specshow(librosa.amplitude_to_db(_sh,
...                                                      ref=S.max()),
...                              sr=sr, y_axis='log')
...     plt.title('h={:.3g}'.format(h_range[i-1]))
...     plt.yticks([])
>>> plt.tight_layout()
librosa.core.istft(stft_matrix, hop_length=None, win_length=None, window='hann', center=True, dtype=<class 'numpy.float32'>, length=None)

Inverse short-time Fourier transform (ISTFT).

Converts a complex-valued spectrogram stft_matrix to time-series y by minimizing the mean squared error between stft_matrix and the STFT of y as described in [1].

In general, the window function, hop length, and other parameters should be the same as those used in stft, which mostly leads to perfect reconstruction of a signal from unmodified stft_matrix.

[1] D. W. Griffin and J. S. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Trans. ASSP, vol.32, no.2, pp.236–243, Apr. 1984.
stft_matrix : np.ndarray [shape=(1 + n_fft/2, t)]
STFT matrix from stft
hop_length : int > 0 [scalar]
Number of audio samples between adjacent STFT columns. If unspecified, defaults to win_length / 4.
win_length : int <= n_fft = 2 * (stft_matrix.shape[0] - 1)

When reconstructing the time series, each frame is windowed and each sample is normalized by the sum of squared window according to the window function (see below).

If unspecified, defaults to n_fft.

window : string, tuple, number, function, np.ndarray [shape=(n_fft,)]
  • a window specification (string, tuple, or number); see scipy.signal.get_window
  • a window function, such as scipy.signal.hanning
  • a user-specified window vector of length n_fft
center : boolean
  • If True, D is assumed to have centered frames.
  • If False, D is assumed to have left-aligned frames.
dtype : numeric type
Real numeric type for y. Default is 32-bit float.
length : int > 0, optional
If provided, the output y is zero-padded or clipped to exactly length samples.
y : np.ndarray [shape=(n,)]
time domain signal reconstructed from stft_matrix

stft : Short-time Fourier Transform

This function caches at level 30.

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> D = librosa.stft(y)
>>> y_hat = librosa.istft(D)
>>> y_hat
array([ -4.812e-06,  -4.267e-06, ...,   6.271e-06,   2.827e-07], dtype=float32)

Exactly preserving length of the input signal requires explicit padding. Otherwise, a partial frame at the end of y will not be represented.

>>> n = len(y)
>>> n_fft = 2048
>>> y_pad = librosa.util.fix_length(y, n + n_fft // 2)
>>> D = librosa.stft(y_pad, n_fft=n_fft)
>>> y_out = librosa.istft(D, length=n)
>>> np.max(np.abs(y - y_out))
1.4901161e-07
librosa.core.load(path, sr=22050, mono=True, offset=0.0, duration=None, dtype=<class 'numpy.float32'>, res_type='kaiser_best')

Load an audio file as a floating point time series.

path : string

path to the input file.

Any format supported by audioread will work.

sr : number > 0 [scalar]

target sampling rate

None uses the native sampling rate

mono : bool
convert signal to mono
offset : float
start reading after this time (in seconds)
duration : float
only load up to this much audio (in seconds)
dtype : numeric type
data type of y
res_type : str

resample type (see note)

Note

By default, this uses resampy's high-quality mode ('kaiser_best').

To use a faster method, set res_type='kaiser_fast'.

To use scipy.signal.resample, set res_type='scipy'.

y : np.ndarray [shape=(n,) or (2, n)]
audio time series
sr : number > 0 [scalar]
sampling rate of y
>>> # Load a wav file
>>> filename = librosa.util.example_audio_file()
>>> y, sr = librosa.load(filename)
>>> y
array([ -4.756e-06,  -6.020e-06, ...,  -1.040e-06,   0.000e+00], dtype=float32)
>>> sr
22050
>>> # Load a wav file and resample to 11.025 kHz
>>> filename = librosa.util.example_audio_file()
>>> y, sr = librosa.load(filename, sr=11025)
>>> y
array([ -2.077e-06,  -2.928e-06, ...,  -4.395e-06,   0.000e+00], dtype=float32)
>>> sr
11025
>>> # Load 5 seconds of a wav file, starting 15 seconds in
>>> filename = librosa.util.example_audio_file()
>>> y, sr = librosa.load(filename, offset=15.0, duration=5.0)
>>> y
array([ 0.069,  0.1  , ..., -0.101,  0.   ], dtype=float32)
>>> sr
22050
librosa.core.logamplitude(S, ref=1.0, amin=1e-10, top_db=80.0, ref_power=<DEPRECATED parameter>)

Convert a power spectrogram (amplitude squared) to decibel (dB) units

This computes the scaling 10 * log10(S / ref) in a numerically stable way.

S : np.ndarray
input power
ref : scalar or callable

If scalar, the amplitude abs(S) is scaled relative to ref: 10 * log10(S / ref). Zeros in the output correspond to positions where S == ref.

If callable, the reference value is computed as ref(S).

amin : float > 0 [scalar]
minimum threshold for abs(S) and ref
top_db : float >= 0 [scalar]
threshold the output at top_db below the peak: max(10 * log10(S)) - top_db
ref_power : scalar or callable

Warning

This parameter name was deprecated in librosa 0.5.0. Use the ref parameter instead. The ref_power parameter will be removed in librosa 0.6.0.

S_db : np.ndarray
S_db ~= 10 * log10(S) - 10 * log10(ref)

perceptual_weighting db_to_power amplitude_to_db db_to_amplitude

This function caches at level 30.

Get a power spectrogram from a waveform y

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> S = np.abs(librosa.stft(y))
>>> librosa.power_to_db(S**2)
array([[-33.293, -27.32 , ..., -33.293, -33.293],
       [-33.293, -25.723, ..., -33.293, -33.293],
       ...,
       [-33.293, -33.293, ..., -33.293, -33.293],
       [-33.293, -33.293, ..., -33.293, -33.293]], dtype=float32)

Compute dB relative to peak power

>>> librosa.power_to_db(S**2, ref=np.max)
array([[-80.   , -74.027, ..., -80.   , -80.   ],
       [-80.   , -72.431, ..., -80.   , -80.   ],
       ...,
       [-80.   , -80.   , ..., -80.   , -80.   ],
       [-80.   , -80.   , ..., -80.   , -80.   ]], dtype=float32)

Or compare to median power

>>> librosa.power_to_db(S**2, ref=np.median)
array([[-0.189,  5.784, ..., -0.189, -0.189],
       [-0.189,  7.381, ..., -0.189, -0.189],
       ...,
       [-0.189, -0.189, ..., -0.189, -0.189],
       [-0.189, -0.189, ..., -0.189, -0.189]], dtype=float32)

And plot the results

>>> import matplotlib.pyplot as plt
>>> plt.figure()
>>> plt.subplot(2, 1, 1)
>>> librosa.display.specshow(S**2, sr=sr, y_axis='log')
>>> plt.colorbar()
>>> plt.title('Power spectrogram')
>>> plt.subplot(2, 1, 2)
>>> librosa.display.specshow(librosa.power_to_db(S**2, ref=np.max),
...                          sr=sr, y_axis='log', x_axis='time')
>>> plt.colorbar(format='%+2.0f dB')
>>> plt.title('Log-Power spectrogram')
>>> plt.tight_layout()
librosa.core.magphase(D)

Separate a complex-valued spectrogram D into its magnitude (S) and phase (P) components, so that D = S * P.

D : np.ndarray [shape=(d, t), dtype=complex]
complex-valued spectrogram
D_mag : np.ndarray [shape=(d, t), dtype=real]
magnitude of D
D_phase : np.ndarray [shape=(d, t), dtype=complex]
exp(1.j * phi) where phi is the phase of D
>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> D = librosa.stft(y)
>>> magnitude, phase = librosa.magphase(D)
>>> magnitude
array([[  2.524e-03,   4.329e-02, ...,   3.217e-04,   3.520e-05],
       [  2.645e-03,   5.152e-02, ...,   3.283e-04,   3.432e-04],
       ...,
       [  1.966e-05,   9.828e-06, ...,   3.164e-07,   9.370e-06],
       [  1.966e-05,   9.830e-06, ...,   3.161e-07,   9.366e-06]], dtype=float32)
>>> phase
array([[  1.000e+00 +0.000e+00j,   1.000e+00 +0.000e+00j, ...,
         -1.000e+00 +8.742e-08j,  -1.000e+00 +8.742e-08j],
       [  1.000e+00 +1.615e-16j,   9.950e-01 -1.001e-01j, ...,
          9.794e-01 +2.017e-01j,   1.492e-02 -9.999e-01j],
       ...,
       [  1.000e+00 -5.609e-15j,  -5.081e-04 +1.000e+00j, ...,
         -9.549e-01 -2.970e-01j,   2.938e-01 -9.559e-01j],
       [ -1.000e+00 +8.742e-08j,  -1.000e+00 +8.742e-08j, ...,
         -1.000e+00 +8.742e-08j,  -1.000e+00 +8.742e-08j]], dtype=complex64)

Or get the phase angle (in radians)

>>> np.angle(phase)
array([[  0.000e+00,   0.000e+00, ...,   3.142e+00,   3.142e+00],
       [  1.615e-16,  -1.003e-01, ...,   2.031e-01,  -1.556e+00],
       ...,
       [ -5.609e-15,   1.571e+00, ...,  -2.840e+00,  -1.273e+00],
       [  3.142e+00,   3.142e+00, ...,   3.142e+00,   3.142e+00]], dtype=float32)
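
Since D = S * P, the two components recombine to the original spectrogram; a quick sanity check (up to floating-point rounding):

>>> np.allclose(D, magnitude * phase)
True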
librosa.core.mel_frequencies(n_mels=128, fmin=0.0, fmax=11025.0, htk=False)

Compute the center frequencies of mel bands.

n_mels : int > 0 [scalar]
number of Mel bins
fmin : float >= 0 [scalar]
minimum frequency (Hz)
fmax : float >= 0 [scalar]
maximum frequency (Hz)
htk : bool
use HTK formula instead of Slaney
bin_frequencies : ndarray [shape=(n_mels,)]
vector of n_mels frequencies in Hz which are uniformly spaced on the Mel axis.
>>> librosa.mel_frequencies(n_mels=40)
array([     0.   ,     85.317,    170.635,    255.952,
          341.269,    426.586,    511.904,    597.221,
          682.538,    767.855,    853.173,    938.49 ,
         1024.856,   1119.114,   1222.042,   1334.436,
         1457.167,   1591.187,   1737.532,   1897.337,
         2071.84 ,   2262.393,   2470.47 ,   2697.686,
         2945.799,   3216.731,   3512.582,   3835.643,
         4188.417,   4573.636,   4994.285,   5453.621,
         5955.205,   6502.92 ,   7101.009,   7754.107,
         8467.272,   9246.028,  10096.408,  11025.   ])
librosa.core.mel_to_hz(mels, htk=False)

Convert mel bin numbers to frequencies

>>> librosa.mel_to_hz(3)
array([ 200.])
>>> librosa.mel_to_hz([1,2,3,4,5])
array([  66.667,  133.333,  200.   ,  266.667,  333.333])
mels : np.ndarray [shape=(n,)], float
mel bins to convert
htk : bool
use HTK formula instead of Slaney
frequencies : np.ndarray [shape=(n,)]
input mels in Hz

hz_to_mel

librosa.core.midi_to_hz(notes)

Get the frequency (Hz) of MIDI note(s)

>>> librosa.midi_to_hz(36)
array([ 65.406])
>>> librosa.midi_to_hz(np.arange(36, 48))
array([  65.406,   69.296,   73.416,   77.782,   82.407,
         87.307,   92.499,   97.999,  103.826,  110.   ,
        116.541,  123.471])
notes : int or np.ndarray [shape=(n,), dtype=int]
midi number(s) of the note(s)
frequency : np.ndarray [shape=(n,), dtype=float]
frequency (frequencies) of notes in Hz

hz_to_midi note_to_hz

librosa.core.midi_to_note(midi, octave=True, cents=False)

Convert one or more MIDI numbers to note strings.

MIDI numbers will be rounded to the nearest integer.

Notes will be of the format 'C0', 'C#0', 'D0', ...

>>> librosa.midi_to_note(0)
'C-1'
>>> librosa.midi_to_note(37)
'C#2'
>>> librosa.midi_to_note(-2)
'A#-2'
>>> librosa.midi_to_note(104.7)
'A7'
>>> librosa.midi_to_note(104.7, cents=True)
'A7-30'
>>> librosa.midi_to_note(list(range(12, 24)))
['C0', 'C#0', 'D0', 'D#0', 'E0', 'F0', 'F#0', 'G0', 'G#0', 'A0', 'A#0', 'B0']
midi : int or iterable of int
Midi numbers to convert.
octave : bool
If True, include the octave number
cents : bool
If True, cent markers will be appended for fractional notes, e.g., midi_to_note(69.3, cents=True) == 'A4+30'
notes : str or iterable of str
Strings describing each midi note.
ParameterError
if cents is True and octave is False

midi_to_hz note_to_midi hz_to_note

librosa.core.note_to_hz(note, **kwargs)

Convert one or more note names to frequency (Hz)

>>> # Get the frequency of a note
>>> librosa.note_to_hz('C')
array([ 16.352])
>>> # Or multiple notes
>>> librosa.note_to_hz(['A3', 'A4', 'A5'])
array([ 220.,  440.,  880.])
>>> # Or notes with tuning deviations
>>> librosa.note_to_hz('C2-32', round_midi=False)
array([ 64.209])
note : str or iterable of str
One or more note names to convert
kwargs : additional keyword arguments
Additional parameters to note_to_midi
frequencies : np.ndarray [shape=(len(note),)]
Array of frequencies (in Hz) corresponding to note

midi_to_hz note_to_midi hz_to_note

librosa.core.note_to_midi(note, round_midi=True)

Convert one or more spelled notes to MIDI number(s).

Notes may be spelled out with optional accidentals or octave numbers.

The leading note name is case-insensitive.

Sharps are indicated with #, flats may be indicated with ! or b.

note : str or iterable of str
One or more note names.
round_midi : bool
  • If True, round fractional midi numbers to the nearest note
  • Otherwise, allow fractional midi notes
midi : float or np.array
Midi note numbers corresponding to inputs.
ParameterError
If the input is not in valid note format

midi_to_note note_to_hz

>>> librosa.note_to_midi('C')
12
>>> librosa.note_to_midi('C#3')
49
>>> librosa.note_to_midi('f4')
65
>>> librosa.note_to_midi('Bb-1')
10
>>> librosa.note_to_midi('A!8')
116
>>> # Lists of notes also work
>>> librosa.note_to_midi(['C', 'E', 'G'])
array([12, 16, 19])
librosa.core.octs_to_hz(octs, A440=440.0)

Convert octave numbers to frequencies.

Octaves are counted relative to A.

>>> librosa.octs_to_hz(1)
array([ 55.])
>>> librosa.octs_to_hz([-2, -1, 0, 1, 2])
array([   6.875,   13.75 ,   27.5  ,   55.   ,  110.   ])
octaves : np.ndarray [shape=(n,)] or float
octave number for each frequency
A440 : float
frequency of A440
frequencies : np.ndarray [shape=(n,)]
scalar or vector of frequencies

hz_to_octs

librosa.core.perceptual_weighting(S, frequencies, **kwargs)

Perceptual weighting of a power spectrogram:

S_p[f] = A_weighting(f) + 10 * log10(S[f] / ref)

S : np.ndarray [shape=(d, t)]
Power spectrogram
frequencies : np.ndarray [shape=(d,)]
Center frequency for each row of S
kwargs : additional keyword arguments
Additional keyword arguments to logamplitude.
S_p : np.ndarray [shape=(d, t)]
perceptually weighted version of S

logamplitude

This function caches at level 30.

Re-weight a CQT power spectrum, using peak power as reference

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> CQT = librosa.cqt(y, sr=sr, fmin=librosa.note_to_hz('A1'))
>>> freqs = librosa.cqt_frequencies(CQT.shape[0],
...                                 fmin=librosa.note_to_hz('A1'))
>>> perceptual_CQT = librosa.perceptual_weighting(np.abs(CQT)**2,
...                                               freqs,
...                                               ref=np.max)
>>> perceptual_CQT
array([[ -80.076,  -80.049, ..., -104.735, -104.735],
       [ -78.344,  -78.555, ..., -103.725, -103.725],
       ...,
       [ -76.272,  -76.272, ...,  -76.272,  -76.272],
       [ -76.485,  -76.485, ...,  -76.485,  -76.485]])
>>> import matplotlib.pyplot as plt
>>> plt.figure()
>>> plt.subplot(2, 1, 1)
>>> librosa.display.specshow(librosa.amplitude_to_db(CQT,
...                                                  ref=np.max),
...                          fmin=librosa.note_to_hz('A1'),
...                          y_axis='cqt_hz')
>>> plt.title('Log CQT power')
>>> plt.colorbar(format='%+2.0f dB')
>>> plt.subplot(2, 1, 2)
>>> librosa.display.specshow(perceptual_CQT, y_axis='cqt_hz',
...                          fmin=librosa.note_to_hz('A1'),
...                          x_axis='time')
>>> plt.title('Perceptually weighted log CQT')
>>> plt.colorbar(format='%+2.0f dB')
>>> plt.tight_layout()
librosa.core.phase_vocoder(D, rate, hop_length=None)

Phase vocoder. Given an STFT matrix D, speed up by a factor of rate

Based on the implementation provided by [1].

[1] Ellis, D. P. W. “A phase vocoder in Matlab.” Columbia University, 2002. http://www.ee.columbia.edu/~dpwe/resources/matlab/pvoc/
>>> # Play at double speed
>>> y, sr   = librosa.load(librosa.util.example_audio_file())
>>> D       = librosa.stft(y, n_fft=2048, hop_length=512)
>>> D_fast  = librosa.phase_vocoder(D, 2.0, hop_length=512)
>>> y_fast  = librosa.istft(D_fast, hop_length=512)
>>> # Or play at 1/3 speed
>>> y, sr   = librosa.load(librosa.util.example_audio_file())
>>> D       = librosa.stft(y, n_fft=2048, hop_length=512)
>>> D_slow  = librosa.phase_vocoder(D, 1./3, hop_length=512)
>>> y_slow  = librosa.istft(D_slow, hop_length=512)
D : np.ndarray [shape=(d, t), dtype=complex]
STFT matrix
rate : float > 0 [scalar]
Speed-up factor: rate > 1 is faster, rate < 1 is slower.
hop_length : int > 0 [scalar] or None

The number of samples between successive columns of D.

If None, defaults to n_fft/4 = (D.shape[0]-1)/2

D_stretched : np.ndarray [shape=(d, t / rate), dtype=complex]
time-stretched STFT
librosa.core.piptrack(y=None, sr=22050, S=None, n_fft=2048, hop_length=None, fmin=150.0, fmax=4000.0, threshold=0.1)

Pitch tracking on thresholded parabolically-interpolated STFT

[1] https://ccrma.stanford.edu/~jos/sasp/Sinusoidal_Peak_Interpolation.html
y: np.ndarray [shape=(n,)] or None
audio signal
sr : number > 0 [scalar]
audio sampling rate of y
S: np.ndarray [shape=(d, t)] or None
magnitude or power spectrogram
n_fft : int > 0 [scalar] or None
number of FFT bins to use, if y is provided.
hop_length : int > 0 [scalar] or None
number of samples to hop
threshold : float in (0, 1)
A bin in spectrum X is considered a pitch when it is greater than threshold*X.max()
fmin : float > 0 [scalar]
lower frequency cutoff.
fmax : float > 0 [scalar]
upper frequency cutoff.

Note

One of S or y must be provided.

If S is not given, it is computed from y using the default parameters of librosa.core.stft.

pitches : np.ndarray [shape=(d, t)]
magnitudes : np.ndarray [shape=(d, t)]

Where d is the subset of FFT bins within fmin and fmax.

pitches[f, t] contains instantaneous frequency at bin f, time t

magnitudes[f, t] contains the corresponding magnitudes.

Both pitches and magnitudes take value 0 at bins of non-maximal magnitude.

This function caches at level 30.

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> pitches, magnitudes = librosa.piptrack(y=y, sr=sr)
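
The pitches and magnitudes arrays are aligned; a minimal sketch (not part of the original example) for reducing them to one pitch estimate per frame, by taking the bin of maximal magnitude:

>>> best = magnitudes.argmax(axis=0)
>>> f0 = pitches[best, np.arange(pitches.shape[1])]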
librosa.core.pitch_tuning(frequencies, resolution=0.01, bins_per_octave=12)

Given a collection of pitches, estimate its tuning offset (in fractions of a bin) relative to A440=440.0Hz.

frequencies : array-like, float
A collection of frequencies detected in the signal. See piptrack
resolution : float in (0, 1)
Resolution of the tuning as a fraction of a bin. 0.01 corresponds to cents.
bins_per_octave : int > 0 [scalar]
How many frequency bins per octave
tuning: float in [-0.5, 0.5)
estimated tuning deviation (fractions of a bin)
estimate_tuning
Estimating tuning from time-series or spectrogram input
>>> # Generate notes at +25 cents
>>> freqs = librosa.cqt_frequencies(24, 55, tuning=0.25)
>>> librosa.pitch_tuning(freqs)
0.25
>>> # Track frequencies from a real spectrogram
>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> pitches, magnitudes = librosa.piptrack(y=y, sr=sr)
>>> # Select out pitches with high energy
>>> pitches = pitches[magnitudes > np.median(magnitudes)]
>>> librosa.pitch_tuning(pitches)
0.089999999999999969
librosa.core.power_to_db(S, ref=1.0, amin=1e-10, top_db=80.0, ref_power=<DEPRECATED parameter>)

Convert a power spectrogram (amplitude squared) to decibel (dB) units

This computes the scaling 10 * log10(S / ref) in a numerically stable way.

S : np.ndarray
input power
ref : scalar or callable

If scalar, the amplitude abs(S) is scaled relative to ref: 10 * log10(S / ref). Zeros in the output correspond to positions where S == ref.

If callable, the reference value is computed as ref(S).

amin : float > 0 [scalar]
minimum threshold for abs(S) and ref
top_db : float >= 0 [scalar]
threshold the output at top_db below the peak: max(10 * log10(S)) - top_db
ref_power : scalar or callable

Warning

This parameter name was deprecated in librosa 0.5.0. Use the ref parameter instead. The ref_power parameter will be removed in librosa 0.6.0.

S_db : np.ndarray
S_db ~= 10 * log10(S) - 10 * log10(ref)

perceptual_weighting db_to_power amplitude_to_db db_to_amplitude

This function caches at level 30.

Get a power spectrogram from a waveform y

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> S = np.abs(librosa.stft(y))
>>> librosa.power_to_db(S**2)
array([[-33.293, -27.32 , ..., -33.293, -33.293],
       [-33.293, -25.723, ..., -33.293, -33.293],
       ...,
       [-33.293, -33.293, ..., -33.293, -33.293],
       [-33.293, -33.293, ..., -33.293, -33.293]], dtype=float32)

Compute dB relative to peak power

>>> librosa.power_to_db(S**2, ref=np.max)
array([[-80.   , -74.027, ..., -80.   , -80.   ],
       [-80.   , -72.431, ..., -80.   , -80.   ],
       ...,
       [-80.   , -80.   , ..., -80.   , -80.   ],
       [-80.   , -80.   , ..., -80.   , -80.   ]], dtype=float32)

Or compare to median power

>>> librosa.power_to_db(S**2, ref=np.median)
array([[-0.189,  5.784, ..., -0.189, -0.189],
       [-0.189,  7.381, ..., -0.189, -0.189],
       ...,
       [-0.189, -0.189, ..., -0.189, -0.189],
       [-0.189, -0.189, ..., -0.189, -0.189]], dtype=float32)

And plot the results

>>> import matplotlib.pyplot as plt
>>> plt.figure()
>>> plt.subplot(2, 1, 1)
>>> librosa.display.specshow(S**2, sr=sr, y_axis='log')
>>> plt.colorbar()
>>> plt.title('Power spectrogram')
>>> plt.subplot(2, 1, 2)
>>> librosa.display.specshow(librosa.power_to_db(S**2, ref=np.max),
...                          sr=sr, y_axis='log', x_axis='time')
>>> plt.colorbar(format='%+2.0f dB')
>>> plt.title('Log-Power spectrogram')
>>> plt.tight_layout()
librosa.core.pseudo_cqt(y, sr=22050, hop_length=512, fmin=None, n_bins=84, bins_per_octave=12, tuning=0.0, filter_scale=1, norm=1, sparsity=0.01, window='hann', scale=True, pad_mode='reflect')

Compute the pseudo constant-Q transform of an audio signal.

This uses a single FFT size that is the smallest power of 2 that is greater than or equal to the max of:

  1. The longest CQT filter
  2. 2x the hop_length
y : np.ndarray [shape=(n,)]
audio time series
sr : number > 0 [scalar]
sampling rate of y
hop_length : int > 0 [scalar]
number of samples between successive CQT columns.
fmin : float > 0 [scalar]
Minimum frequency. Defaults to C1 ~= 32.70 Hz
n_bins : int > 0 [scalar]
Number of frequency bins, starting at fmin
bins_per_octave : int > 0 [scalar]
Number of bins per octave
tuning : None or float in [-0.5, 0.5)

Tuning offset in fractions of a bin (cents).

If None, tuning will be automatically estimated from the signal.

filter_scale : float > 0
Filter scale factor. Larger values use longer windows.
sparsity : float in [0, 1)

Sparsify the CQT basis by discarding up to sparsity fraction of the energy in each basis.

Set sparsity=0 to disable sparsification.

window : str, tuple, number, or function
Window specification for the basis filters. See filters.get_window for details.
pad_mode : string

Padding mode for centered frame analysis.

See also: librosa.core.stft and np.pad.

CQT : np.ndarray [shape=(n_bins, t), dtype=np.float]
Pseudo Constant-Q energy for each frequency at each time.
ParameterError

If hop_length is not an integer multiple of 2**(n_bins / bins_per_octave)

Or if y is too short to support the frequency range of the CQT.

This function caches at level 20.
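
As with hybrid_cqt, a minimal usage sketch mirroring the cqt example above (pseudo_cqt returns real-valued magnitudes):

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> C = librosa.pseudo_cqt(y, sr=sr)
>>> librosa.display.specshow(librosa.amplitude_to_db(C, ref=np.max),
...                          sr=sr, x_axis='time', y_axis='cqt_note')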

librosa.core.resample(y, orig_sr, target_sr, res_type='kaiser_best', fix=True, scale=False, **kwargs)

Resample a time series from orig_sr to target_sr

y : np.ndarray [shape=(n,) or shape=(2, n)]
audio time series. Can be mono or stereo.
orig_sr : number > 0 [scalar]
original sampling rate of y
target_sr : number > 0 [scalar]
target sampling rate
res_type : str

resample type (see note)

Note

By default, this uses resampy's high-quality mode ('kaiser_best').

To use a faster method, set res_type='kaiser_fast'.

To use scipy.signal.resample, set res_type='scipy'.

fix : bool
adjust the length of the resampled signal to be of size exactly ceil(target_sr * len(y) / orig_sr)
scale : bool
Scale the resampled signal so that y and y_hat have approximately equal total energy.
kwargs : additional keyword arguments
If fix==True, additional keyword arguments to pass to librosa.util.fix_length.
y_hat : np.ndarray [shape=(n * target_sr / orig_sr,)]
y resampled from orig_sr to target_sr

librosa.util.fix_length scipy.signal.resample resampy.resample

This function caches at level 20.

Downsample from 22.05 kHz to 8 kHz

>>> y, sr = librosa.load(librosa.util.example_audio_file(), sr=22050)
>>> y_8k = librosa.resample(y, sr, 8000)
>>> y.shape, y_8k.shape
((1355168,), (491671,))
librosa.core.salience(S, freqs, h_range, weights=None, aggregate=None, filter_peaks=True, fill_value=nan, kind='linear', axis=0)

Harmonic salience function.

S : np.ndarray [shape=(d, n)]
input time-frequency magnitude representation (stft, ifgram, etc.). Must be real-valued and non-negative.
freqs : np.ndarray, shape=(S.shape[axis])
The frequency values corresponding to S’s elements along the chosen axis.
h_range : list-like, non-negative
Harmonics to include in salience computation. The first harmonic (1) corresponds to S itself. Values less than one (e.g., 1/2) correspond to sub-harmonics.
weights : list-like
The weight to apply to each harmonic in the summation (default: uniform weights). Must be the same length as h_range.
aggregate : function
aggregation function (default: np.average) If aggregate=np.average, then a weighted average is computed per-harmonic according to the specified weights. For all other aggregation functions, all harmonics are treated equally.
filter_peaks : bool
If true, returns harmonic summation only on frequencies of peak magnitude. Otherwise returns harmonic summation over the full spectrum. Defaults to True.
fill_value : float
The value to fill non-peaks in the output representation. (default: np.nan) Only used if filter_peaks == True.
kind : str
Interpolation type for harmonic estimation. See scipy.interpolate.interp1d.
axis : int
The axis along which to compute harmonics
S_sal : np.ndarray, shape=S.shape
S_sal has the same shape as S, and measures the overall harmonic energy at each frequency.

interp_harmonics

>>> y, sr = librosa.load(librosa.util.example_audio_file(),
...                      duration=15, offset=30)
>>> S = np.abs(librosa.stft(y))
>>> freqs = librosa.core.fft_frequencies(sr)
>>> harms = [1, 2, 3, 4]
>>> weights = [1.0, 0.5, 0.33, 0.25]
>>> S_sal = librosa.salience(S, freqs, harms, weights, fill_value=0)
>>> print(S_sal.shape)
(1025, 646)
>>> import matplotlib.pyplot as plt
>>> plt.figure()
>>> librosa.display.specshow(librosa.amplitude_to_db(S_sal,
...                                                  ref=np.max),
...                          sr=sr, y_axis='log', x_axis='time')
>>> plt.colorbar()
>>> plt.title('Salience spectrogram')
>>> plt.tight_layout()
librosa.core.samples_to_frames(samples, hop_length=512, n_fft=None)

Converts sample indices into STFT frames.

samples : np.ndarray [shape=(n,)]
vector of sample indices
hop_length : int > 0 [scalar]
number of samples between successive frames
n_fft : None or int > 0 [scalar]

Optional: length of the FFT window. If given, time conversion will include an offset of - n_fft / 2 to counteract windowing effects in STFT.

Note

This may result in negative frame indices.

frames : np.ndarray [shape=(n,), dtype=int]
Frame numbers corresponding to the given samples: frames[i] = floor( samples[i] / hop_length )

samples_to_time : convert sample indices to time values
frames_to_samples : convert frame indices to sample indices

>>> # Get the frame numbers for every 256 samples
>>> librosa.samples_to_frames(np.arange(0, 22050, 256))
array([ 0,  0,  1,  1,  2,  2,  3,  3,  4,  4,  5,  5,  6,  6,
        7,  7,  8,  8,  9,  9, 10, 10, 11, 11, 12, 12, 13, 13,
       14, 14, 15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 20, 20,
       21, 21, 22, 22, 23, 23, 24, 24, 25, 25, 26, 26, 27, 27,
       28, 28, 29, 29, 30, 30, 31, 31, 32, 32, 33, 33, 34, 34,
       35, 35, 36, 36, 37, 37, 38, 38, 39, 39, 40, 40, 41, 41,
       42, 42, 43])

librosa.core.samples_to_time(samples, sr=22050)

Convert sample indices to time (in seconds).

samples : np.ndarray
Array of sample indices
sr : number > 0
Sampling rate
times : np.ndarray [shape=samples.shape, dtype=float]
Time values corresponding to samples (in seconds)

samples_to_frames : convert sample indices to frame indices
time_to_samples : convert time values to sample indices

Get timestamps corresponding to every 512 samples

>>> librosa.samples_to_time(np.arange(0, 22050, 512))
array([ 0.   ,  0.023,  0.046,  0.07 ,  0.093,  0.116,  0.139,
        0.163,  0.186,  0.209,  0.232,  0.255,  0.279,  0.302,
        0.325,  0.348,  0.372,  0.395,  0.418,  0.441,  0.464,
        0.488,  0.511,  0.534,  0.557,  0.58 ,  0.604,  0.627,
        0.65 ,  0.673,  0.697,  0.72 ,  0.743,  0.766,  0.789,
        0.813,  0.836,  0.859,  0.882,  0.906,  0.929,  0.952,
        0.975,  0.998])
librosa.core.stft(y, n_fft=2048, hop_length=None, win_length=None, window='hann', center=True, dtype=<class 'numpy.complex64'>, pad_mode='reflect')

Short-time Fourier transform (STFT)

Returns a complex-valued matrix D such that

np.abs(D[f, t]) is the magnitude of frequency bin f at frame t

np.angle(D[f, t]) is the phase of frequency bin f at frame t

y : np.ndarray [shape=(n,)], real-valued
the input signal (audio time series)
n_fft : int > 0 [scalar]
FFT window size
hop_length : int > 0 [scalar]
number of audio samples between STFT columns. If unspecified, defaults to win_length / 4.
win_length : int <= n_fft [scalar]

Each frame of audio is windowed by window(). The window will be of length win_length and then padded with zeros to match n_fft.

If unspecified, defaults to win_length = n_fft.

window : string, tuple, number, function, or np.ndarray [shape=(n_fft,)]
  • a window specification (string, tuple, or number); see scipy.signal.get_window
  • a window function, such as scipy.signal.hanning
  • a vector or array of length n_fft
center : boolean
  • If True, the signal y is padded so that frame D[:, t] is centered at y[t * hop_length].
  • If False, then D[:, t] begins at y[t * hop_length]
dtype : numeric type
Complex numeric type for D. Default is 64-bit complex.
pad_mode : string
If center=True, the padding mode to use at the edges of the signal. By default, STFT uses reflection padding.
D : np.ndarray [shape=(1 + n_fft/2, t), dtype=dtype]
STFT matrix

istft : Inverse STFT

ifgram : Instantaneous frequency spectrogram

np.pad : array padding

This function caches at level 20.

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> D = librosa.stft(y)
>>> D
array([[  2.576e-03 -0.000e+00j,   4.327e-02 -0.000e+00j, ...,
          3.189e-04 -0.000e+00j,  -5.961e-06 -0.000e+00j],
       [  2.441e-03 +2.884e-19j,   5.145e-02 -5.076e-03j, ...,
         -3.885e-04 -7.253e-05j,   7.334e-05 +3.868e-04j],
      ...,
       [ -7.120e-06 -1.029e-19j,  -1.951e-09 -3.568e-06j, ...,
         -4.912e-07 -1.487e-07j,   4.438e-06 -1.448e-05j],
       [  7.136e-06 -0.000e+00j,   3.561e-06 -0.000e+00j, ...,
         -5.144e-07 -0.000e+00j,  -1.514e-05 -0.000e+00j]], dtype=complex64)

Use left-aligned frames, instead of centered frames

>>> D_left = librosa.stft(y, center=False)

Use a shorter hop length

>>> D_short = librosa.stft(y, hop_length=64)

Display a spectrogram

>>> import matplotlib.pyplot as plt
>>> librosa.display.specshow(librosa.amplitude_to_db(D,
...                                                  ref=np.max),
...                          y_axis='log', x_axis='time')
>>> plt.title('Power spectrogram')
>>> plt.colorbar(format='%+2.0f dB')
>>> plt.tight_layout()
librosa.core.tempo_frequencies(n_bins, hop_length=512, sr=22050)

Compute the frequencies (in beats-per-minute) corresponding to an onset auto-correlation or tempogram matrix.

n_bins : int > 0
The number of lag bins
hop_length : int > 0
The number of samples between each bin
sr : number > 0
The audio sampling rate
bin_frequencies : ndarray [shape=(n_bins,)]

vector of bin frequencies measured in BPM.

Note

bin_frequencies[0] = +np.inf corresponds to 0-lag

Get the tempo frequencies corresponding to a 384-bin (8-second) tempogram

>>> librosa.tempo_frequencies(384)
array([      inf,  2583.984,  1291.992, ...,     6.782,
           6.764,     6.747])
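The bin frequencies follow the lag-to-BPM relation 60 * sr / (hop_length * lag); a hedged sketch of that relation:

>>> lags = np.arange(1, 384)
>>> bpm = 60.0 * 22050 / (512 * lags)
>>> # bpm[0] == 2583.984..., bpm[1] == 1291.992..., matching the output above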
librosa.core.time_to_frames(times, sr=22050, hop_length=512, n_fft=None)

Converts time stamps into STFT frames.

times : np.ndarray [shape=(n,)]
vector of time stamps
sr : number > 0 [scalar]
audio sampling rate
hop_length : int > 0 [scalar]
number of samples between successive frames
n_fft : None or int > 0 [scalar]

Optional: length of the FFT window. If given, time conversion will include an offset of - n_fft / 2 to counteract windowing effects in STFT.

Note

This may result in negative frame indices.

frames : np.ndarray [shape=(n,), dtype=int]
Frame numbers corresponding to the given times: frames[i] = floor( times[i] * sr / hop_length )

frames_to_time : convert frame indices to time values
time_to_samples : convert time values to sample indices

Get the frame numbers for every 100ms

>>> librosa.time_to_frames(np.arange(0, 1, 0.1),
...                         sr=22050, hop_length=512)
array([ 0,  4,  8, 12, 17, 21, 25, 30, 34, 38])
librosa.core.time_to_samples(times, sr=22050)

Convert timestamps (in seconds) to sample indices.

times : np.ndarray
Array of time values (in seconds)
sr : number > 0
Sampling rate
samples : np.ndarray [shape=times.shape, dtype=int]
Sample indices corresponding to values in times

time_to_frames : convert time values to frame indices
samples_to_time : convert sample indices to time values

>>> librosa.time_to_samples(np.arange(0, 1, 0.1), sr=22050)
array([    0,  2205,  4410,  6615,  8820, 11025, 13230, 15435,
       17640, 19845])
librosa.core.to_mono(y)

Force an audio signal down to mono.

y : np.ndarray [shape=(2,n) or shape=(n,)]
audio time series, either stereo or mono
y_mono : np.ndarray [shape=(n,)]
y as a monophonic time-series

This function caches at level 20.

>>> y, sr = librosa.load(librosa.util.example_audio_file(), mono=False)
>>> y.shape
(2, 1355168)
>>> y_mono = librosa.to_mono(y)
>>> y_mono.shape
(1355168,)
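In this version, downmixing is a plain average over channels; a hedged equivalence check:

>>> np.allclose(y_mono, np.mean(y, axis=0))
True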
librosa.core.zero_crossings(y, threshold=1e-10, ref_magnitude=None, pad=True, zero_pos=True, axis=-1)

Find the zero-crossings of a signal y: indices i such that sign(y[i]) != sign(y[i-1]).

If y is multi-dimensional, then zero-crossings are computed along the specified axis.

y : np.ndarray
The input array
threshold : float > 0 or None
If specified, values where -threshold <= y <= threshold are clipped to 0.
ref_magnitude : float > 0 or callable

If numeric, the threshold is scaled relative to ref_magnitude.

If callable, the threshold is scaled relative to ref_magnitude(np.abs(y)).

pad : boolean
If True, then y[0] is considered a valid zero-crossing.
zero_pos : boolean

If True then the value 0 is interpreted as having positive sign.

If False, then 0, -1, and +1 all have distinct signs.

axis : int
Axis along which to compute zero-crossings.
zero_crossings : np.ndarray [shape=y.shape, dtype=boolean]
Indicator array of zero-crossings in y along the selected axis.

This function caches at level 20.

>>> # Generate a time-series
>>> y = np.sin(np.linspace(0, 4 * 2 * np.pi, 20))
>>> y
array([  0.000e+00,   9.694e-01,   4.759e-01,  -7.357e-01,
        -8.372e-01,   3.247e-01,   9.966e-01,   1.646e-01,
        -9.158e-01,  -6.142e-01,   6.142e-01,   9.158e-01,
        -1.646e-01,  -9.966e-01,  -3.247e-01,   8.372e-01,
         7.357e-01,  -4.759e-01,  -9.694e-01,  -9.797e-16])
>>> # Compute zero-crossings
>>> z = librosa.zero_crossings(y)
>>> z
array([ True, False, False,  True, False,  True, False, False,
        True, False,  True, False,  True, False, False,  True,
       False,  True, False,  True], dtype=bool)
>>> # Stack y against the zero-crossing indicator
>>> np.vstack([y, z]).T
array([[  0.000e+00,   1.000e+00],
       [  9.694e-01,   0.000e+00],
       [  4.759e-01,   0.000e+00],
       [ -7.357e-01,   1.000e+00],
       [ -8.372e-01,   0.000e+00],
       [  3.247e-01,   1.000e+00],
       [  9.966e-01,   0.000e+00],
       [  1.646e-01,   0.000e+00],
       [ -9.158e-01,   1.000e+00],
       [ -6.142e-01,   0.000e+00],
       [  6.142e-01,   1.000e+00],
       [  9.158e-01,   0.000e+00],
       [ -1.646e-01,   1.000e+00],
       [ -9.966e-01,   0.000e+00],
       [ -3.247e-01,   0.000e+00],
       [  8.372e-01,   1.000e+00],
       [  7.357e-01,   0.000e+00],
       [ -4.759e-01,   1.000e+00],
       [ -9.694e-01,   0.000e+00],
       [ -9.797e-16,   1.000e+00]])
>>> # Find the indices of zero-crossings
>>> np.nonzero(z)
(array([ 0,  3,  5,  8, 10, 12, 15, 17, 19]),)

Beat tracking

Beat and tempo

Deprecated

librosa.beat.beat_track(y=None, sr=22050, onset_envelope=None, hop_length=512, start_bpm=120.0, tightness=100, trim=True, bpm=None, units='frames')

Dynamic programming beat tracker.

Beats are detected in three stages, following the method of [1]_:
  1. Measure onset strength
  2. Estimate tempo from onset correlation
  3. Pick peaks in onset strength approximately consistent with estimated tempo
[1]Ellis, Daniel PW. “Beat tracking by dynamic programming.” Journal of New Music Research 36.1 (2007): 51-60. http://labrosa.ee.columbia.edu/projects/beattrack/
y : np.ndarray [shape=(n,)] or None
audio time series
sr : number > 0 [scalar]
sampling rate of y
onset_envelope : np.ndarray [shape=(n,)] or None
(optional) pre-computed onset strength envelope.
hop_length : int > 0 [scalar]
number of audio samples between successive onset_envelope values
start_bpm : float > 0 [scalar]
initial guess for the tempo estimator (in beats per minute)
tightness : float [scalar]
tightness of beat distribution around tempo
trim : bool [scalar]
trim leading/trailing beats with weak onsets
bpm : float [scalar]
(optional) If provided, use bpm as the tempo instead of estimating it from onsets.
units : {‘frames’, ‘samples’, ‘time’}
The units to encode detected beat events in. By default, ‘frames’ are used.
tempo : float [scalar, non-negative]
estimated global tempo (in beats per minute)
beats : np.ndarray [shape=(m,)]
estimated beat event locations in the specified units (default is frame indices)

Note

If no onset strength could be detected, beat_track estimates 0 BPM and returns an empty list.

ParameterError

if neither y nor onset_envelope are provided

or if units is not one of ‘frames’, ‘samples’, or ‘time’

librosa.onset.onset_strength

Track beats using time series input

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
>>> tempo
64.599609375

Print the first 20 beat frames

>>> beats[:20]
array([ 320,  357,  397,  436,  480,  525,  569,  609,  658,
        698,  737,  777,  817,  857,  896,  936,  976, 1016,
       1055, 1095])

Or print them as timestamps

>>> librosa.frames_to_time(beats[:20], sr=sr)
array([  7.43 ,   8.29 ,   9.218,  10.124,  11.146,  12.19 ,
        13.212,  14.141,  15.279,  16.208,  17.113,  18.042,
        18.971,  19.9  ,  20.805,  21.734,  22.663,  23.591,
        24.497,  25.426])

Track beats using a pre-computed onset envelope

>>> onset_env = librosa.onset.onset_strength(y, sr=sr,
...                                          aggregate=np.median)
>>> tempo, beats = librosa.beat.beat_track(onset_envelope=onset_env,
...                                        sr=sr)
>>> tempo
64.599609375
>>> beats[:20]
array([ 320,  357,  397,  436,  480,  525,  569,  609,  658,
        698,  737,  777,  817,  857,  896,  936,  976, 1016,
       1055, 1095])

Plot the beat events against the onset strength envelope

>>> import matplotlib.pyplot as plt
>>> hop_length = 512
>>> plt.figure(figsize=(8, 4))
>>> times = librosa.frames_to_time(np.arange(len(onset_env)),
...                                sr=sr, hop_length=hop_length)
>>> plt.plot(times, librosa.util.normalize(onset_env),
...          label='Onset strength')
>>> plt.vlines(times[beats], 0, 1, alpha=0.5, color='r',
...            linestyle='--', label='Beats')
>>> plt.legend(frameon=True, framealpha=0.75)
>>> # Limit the plot to a 15-second window
>>> plt.xlim(15, 30)
>>> plt.gca().xaxis.set_major_formatter(librosa.display.TimeFormatter())
>>> plt.tight_layout()
librosa.beat.tempo(y=None, sr=22050, onset_envelope=None, hop_length=512, start_bpm=120, std_bpm=1.0, ac_size=8.0, max_tempo=320.0, aggregate=<function mean>)

Estimate the tempo (beats per minute)

y : np.ndarray [shape=(n,)] or None
audio time series
sr : number > 0 [scalar]
sampling rate of the time series
onset_envelope : np.ndarray [shape=(n,)]
pre-computed onset strength envelope
hop_length : int > 0 [scalar]
hop length of the time series
start_bpm : float [scalar]
initial guess of the BPM
std_bpm : float > 0 [scalar]
standard deviation of tempo distribution
ac_size : float > 0 [scalar]
length (in seconds) of the auto-correlation window
max_tempo : float > 0 [scalar, optional]
If provided, only estimate tempo below this threshold
aggregate : callable [optional]
Aggregation function for estimating global tempo. If None, then tempo is estimated independently for each frame.
tempo : np.ndarray [scalar]
estimated tempo (beats per minute)

librosa.onset.onset_strength
librosa.feature.tempogram

This function caches at level 30.

>>> # Estimate a static tempo
>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> onset_env = librosa.onset.onset_strength(y, sr=sr)
>>> tempo = librosa.beat.tempo(onset_envelope=onset_env, sr=sr)
>>> tempo
array([129.199])
>>> # Or a dynamic tempo
>>> dtempo = librosa.beat.tempo(onset_envelope=onset_env, sr=sr,
...                             aggregate=None)
>>> dtempo
array([ 143.555,  143.555,  143.555, ...,  161.499,  161.499,
        172.266])

Plot the estimated tempo against the onset autocorrelation

>>> import matplotlib.pyplot as plt
>>> # Convert to scalar
>>> tempo = np.asscalar(tempo)
>>> # Compute 2-second windowed autocorrelation
>>> hop_length = 512
>>> ac = librosa.autocorrelate(onset_env, 2 * sr // hop_length)
>>> freqs = librosa.tempo_frequencies(len(ac), sr=sr,
...                                   hop_length=hop_length)
>>> # Plot on a BPM axis.  We skip the first (0-lag) bin.
>>> plt.figure(figsize=(8,4))
>>> plt.semilogx(freqs[1:], librosa.util.normalize(ac)[1:],
...              label='Onset autocorrelation', basex=2)
>>> plt.axvline(tempo, 0, 1, color='r', alpha=0.75, linestyle='--',
...            label='Tempo: {:.2f} BPM'.format(tempo))
>>> plt.xlabel('Tempo (BPM)')
>>> plt.grid()
>>> plt.title('Static tempo estimation')
>>> plt.legend(frameon=True)
>>> plt.axis('tight')

Plot dynamic tempo estimates over a tempogram

>>> plt.figure()
>>> tg = librosa.feature.tempogram(onset_envelope=onset_env, sr=sr,
...                                hop_length=hop_length)
>>> librosa.display.specshow(tg, x_axis='time', y_axis='tempo')
>>> plt.plot(librosa.frames_to_time(np.arange(len(dtempo))), dtempo,
...          color='w', linewidth=1.5, label='Tempo estimate')
>>> plt.title('Dynamic tempo estimation')
>>> plt.legend(frameon=True, framealpha=0.75)
librosa.beat.estimate_tempo(onset_envelope, sr=22050, hop_length=512, start_bpm=120, std_bpm=1.0, ac_size=4.0, duration=90.0, offset=0.0)

Estimate the tempo (beats per minute) from an onset envelope

Warning

Deprecated in librosa 0.5. Functionality is superseded by librosa.beat.tempo.

onset_envelope : np.ndarray [shape=(n,)]
onset strength envelope
sr : number > 0 [scalar]
sampling rate of the time series
hop_length : int > 0 [scalar]
hop length of the time series
start_bpm : float [scalar]
initial guess of the BPM
std_bpm : float > 0 [scalar]
standard deviation of tempo distribution
ac_size : float > 0 [scalar]
length (in seconds) of the auto-correlation window
duration : float > 0 [scalar]
length of signal (in seconds) to use in estimating tempo
offset : float > 0 [scalar]
offset (in seconds) of signal sample to use in estimating tempo
tempo : float [scalar]
estimated tempo (beats per minute)

librosa.onset.onset_strength

This function caches at level 30.

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> onset_env = librosa.onset.onset_strength(y, sr=sr)
>>> tempo = librosa.beat.estimate_tempo(onset_env, sr=sr)
>>> tempo
103.359375

Plot the estimated tempo against the onset autocorrelation

>>> import matplotlib.pyplot as plt
>>> # Compute 2-second windowed autocorrelation
>>> hop_length = 512
>>> ac = librosa.autocorrelate(onset_env, 2 * sr // hop_length)
>>> freqs = librosa.tempo_frequencies(len(ac), sr=sr,
...                                   hop_length=hop_length)
>>> # Plot on a BPM axis.  We skip the first (0-lag) bin.
>>> plt.figure(figsize=(8,4))
>>> plt.semilogx(freqs[1:], librosa.util.normalize(ac)[1:],
...              label='Onset autocorrelation', basex=2)
>>> plt.axvline(tempo, 0, 1, color='r', alpha=0.75, linestyle='--',
...            label='Tempo: {:.2f} BPM'.format(tempo))
>>> plt.xlabel('Tempo (BPM)')
>>> plt.grid()
>>> plt.legend(frameon=True)
>>> plt.axis('tight')

Feature extraction

Feature extraction

Spectral features

Rhythm features

Feature manipulation

librosa.feature.chroma_cens(y=None, sr=22050, C=None, hop_length=512, fmin=None, tuning=None, n_chroma=12, n_octaves=7, bins_per_octave=None, cqt_mode='full', window=None, norm=2, win_len_smooth=41)

Computes the chroma variant “Chroma Energy Normalized” (CENS), following [1]_.

[1]Meinard Müller and Sebastian Ewert. Chroma Toolbox: MATLAB implementations for extracting variants of chroma-based audio features. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), 2011.
y : np.ndarray [shape=(n,)]
audio time series
sr : number > 0
sampling rate of y
C : np.ndarray [shape=(d, t)] [Optional]
a pre-computed constant-Q spectrogram
hop_length : int > 0
number of samples between successive chroma frames
fmin : float > 0
minimum frequency to analyze in the CQT. Default: ‘C1’ ~= 32.7 Hz
norm : int > 0, +-np.inf, or None
Column-wise normalization of the chromagram.
tuning : float
Deviation (in cents) from A440 tuning
n_chroma : int > 0
Number of chroma bins to produce
n_octaves : int > 0
Number of octaves to analyze above fmin
window : None or np.ndarray
Optional window parameter to filters.cq_to_chroma
bins_per_octave : int > 0
Number of bins per octave in the CQT. Default: matches n_chroma
cqt_mode : [‘full’, ‘hybrid’]
Constant-Q transform mode
win_len_smooth : int > 0
Length of temporal smoothing window. Default: 41
chroma_cens : np.ndarray [shape=(n_chroma, t)]
The output cens-chromagram
chroma_cqt
Compute a chromagram from a constant-Q transform.
chroma_stft
Compute a chromagram from an STFT spectrogram or waveform.

Compare standard cqt chroma to CENS.

>>> y, sr = librosa.load(librosa.util.example_audio_file(),
...                      offset=10, duration=15)
>>> chroma_cens = librosa.feature.chroma_cens(y=y, sr=sr)
>>> chroma_cq = librosa.feature.chroma_cqt(y=y, sr=sr)
>>> import matplotlib.pyplot as plt
>>> plt.figure()
>>> plt.subplot(2,1,1)
>>> librosa.display.specshow(chroma_cq, y_axis='chroma')
>>> plt.title('chroma_cq')
>>> plt.colorbar()
>>> plt.subplot(2,1,2)
>>> librosa.display.specshow(chroma_cens, y_axis='chroma', x_axis='time')
>>> plt.title('chroma_cens')
>>> plt.colorbar()
>>> plt.tight_layout()
librosa.feature.chroma_cqt(y=None, sr=22050, C=None, hop_length=512, fmin=None, norm=inf, threshold=0.0, tuning=None, n_chroma=12, n_octaves=7, window=None, bins_per_octave=None, cqt_mode='full')

Constant-Q chromagram

y : np.ndarray [shape=(n,)]
audio time series
sr : number > 0
sampling rate of y
C : np.ndarray [shape=(d, t)] [Optional]
a pre-computed constant-Q spectrogram
hop_length : int > 0
number of samples between successive chroma frames
fmin : float > 0
minimum frequency to analyze in the CQT. Default: ‘C1’ ~= 32.7 Hz
norm : int > 0, +-np.inf, or None
Column-wise normalization of the chromagram.
threshold : float
Pre-normalization energy threshold. Values below the threshold are discarded, resulting in a sparse chromagram.
tuning : float
Deviation (in cents) from A440 tuning
n_chroma : int > 0
Number of chroma bins to produce
n_octaves : int > 0
Number of octaves to analyze above fmin
window : None or np.ndarray
Optional window parameter to filters.cq_to_chroma
bins_per_octave : int > 0
Number of bins per octave in the CQT. Default: matches n_chroma
cqt_mode : [‘full’, ‘hybrid’]
Constant-Q transform mode
chromagram : np.ndarray [shape=(n_chroma, t)]
The output chromagram

librosa.util.normalize
librosa.core.cqt
librosa.core.hybrid_cqt
chroma_stft

Compare a long-window STFT chromagram to the CQT chromagram

>>> y, sr = librosa.load(librosa.util.example_audio_file(),
...                      offset=10, duration=15)
>>> chroma_stft = librosa.feature.chroma_stft(y=y, sr=sr,
...                                           n_chroma=12, n_fft=4096)
>>> chroma_cq = librosa.feature.chroma_cqt(y=y, sr=sr)
>>> import matplotlib.pyplot as plt
>>> plt.figure()
>>> plt.subplot(2,1,1)
>>> librosa.display.specshow(chroma_stft, y_axis='chroma')
>>> plt.title('chroma_stft')
>>> plt.colorbar()
>>> plt.subplot(2,1,2)
>>> librosa.display.specshow(chroma_cq, y_axis='chroma', x_axis='time')
>>> plt.title('chroma_cqt')
>>> plt.colorbar()
>>> plt.tight_layout()
librosa.feature.chroma_stft(y=None, sr=22050, S=None, norm=inf, n_fft=2048, hop_length=512, tuning=None, **kwargs)

Compute a chromagram from a waveform or power spectrogram.

This implementation is derived from chromagram_E [1]_

[1]Ellis, Daniel P.W. “Chroma feature analysis and synthesis” 2007/04/21 http://labrosa.ee.columbia.edu/matlab/chroma-ansyn/
y : np.ndarray [shape=(n,)] or None
audio time series
sr : number > 0 [scalar]
sampling rate of y
S : np.ndarray [shape=(d, t)] or None
power spectrogram
norm : float or None

Column-wise normalization. See librosa.util.normalize for details.

If None, no normalization is performed.

n_fft : int > 0 [scalar]
FFT window size if provided y, sr instead of S
hop_length : int > 0 [scalar]
hop length if provided y, sr instead of S
tuning : float in [-0.5, 0.5) [scalar] or None.
Deviation from A440 tuning, in fractions of a chroma bin. If None, it is automatically estimated.
kwargs : additional keyword arguments
Arguments to parameterize chroma filters. See librosa.filters.chroma for details.
chromagram : np.ndarray [shape=(n_chroma, t)]
Normalized energy for each chroma bin at each frame.
librosa.filters.chroma
Chroma filter bank construction
librosa.util.normalize
Vector normalization
>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> librosa.feature.chroma_stft(y=y, sr=sr)
array([[ 0.974,  0.881, ...,  0.925,  1.   ],
       [ 1.   ,  0.841, ...,  0.882,  0.878],
       ...,
       [ 0.658,  0.985, ...,  0.878,  0.764],
       [ 0.969,  0.92 , ...,  0.974,  0.915]])

Use an energy (magnitude) spectrum instead of power spectrogram

>>> S = np.abs(librosa.stft(y))
>>> chroma = librosa.feature.chroma_stft(S=S, sr=sr)
>>> chroma
array([[ 0.884,  0.91 , ...,  0.861,  0.858],
       [ 0.963,  0.785, ...,  0.968,  0.896],
       ...,
       [ 0.871,  1.   , ...,  0.928,  0.829],
       [ 1.   ,  0.982, ...,  0.93 ,  0.878]])

Use a pre-computed power spectrogram with a larger frame

>>> S = np.abs(librosa.stft(y, n_fft=4096))**2
>>> chroma = librosa.feature.chroma_stft(S=S, sr=sr)
>>> chroma
array([[ 0.685,  0.477, ...,  0.961,  0.986],
       [ 0.674,  0.452, ...,  0.952,  0.926],
       ...,
       [ 0.844,  0.575, ...,  0.934,  0.869],
       [ 0.793,  0.663, ...,  0.964,  0.972]])
>>> import matplotlib.pyplot as plt
>>> plt.figure(figsize=(10, 4))
>>> librosa.display.specshow(chroma, y_axis='chroma', x_axis='time')
>>> plt.colorbar()
>>> plt.title('Chromagram')
>>> plt.tight_layout()
librosa.feature.delta(data, width=9, order=1, axis=-1, trim=True)

Compute delta features: local estimate of the derivative of the input data along the selected axis.

data : np.ndarray
the input data matrix (e.g., spectrogram)
width : int >= 3, odd [scalar]
Number of frames over which to compute the delta feature
order : int > 0 [scalar]
the order of the difference operator. 1 for first derivative, 2 for second, etc.
axis : int [scalar]
the axis along which to compute deltas. Default is -1 (columns).
trim : bool
set to True to trim the output matrix to the original size.
delta_data : np.ndarray [shape=(d, t) or (d, t + window)]
delta matrix of data.

This function caches at level 40.

Compute MFCC deltas, delta-deltas

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> mfcc = librosa.feature.mfcc(y=y, sr=sr)
>>> mfcc_delta = librosa.feature.delta(mfcc)
>>> mfcc_delta
array([[  2.929e+01,   3.090e+01, ...,   0.000e+00,   0.000e+00],
       [  2.226e+01,   2.553e+01, ...,   3.944e-31,   3.944e-31],
       ...,
       [ -1.192e+00,  -6.099e-01, ...,   9.861e-32,   9.861e-32],
       [ -5.349e-01,  -2.077e-01, ...,   1.183e-30,   1.183e-30]])
>>> mfcc_delta2 = librosa.feature.delta(mfcc, order=2)
>>> mfcc_delta2
array([[  1.281e+01,   1.020e+01, ...,   0.000e+00,   0.000e+00],
       [  2.726e+00,   3.558e+00, ...,   0.000e+00,   0.000e+00],
       ...,
       [ -1.702e-01,  -1.509e-01, ...,   0.000e+00,   0.000e+00],
       [ -9.021e-02,  -7.007e-02, ...,  -2.190e-47,  -2.190e-47]])
>>> import matplotlib.pyplot as plt
>>> plt.subplot(3, 1, 1)
>>> librosa.display.specshow(mfcc)
>>> plt.title('MFCC')
>>> plt.colorbar()
>>> plt.subplot(3, 1, 2)
>>> librosa.display.specshow(mfcc_delta)
>>> plt.title(r'MFCC-$\Delta$')
>>> plt.colorbar()
>>> plt.subplot(3, 1, 3)
>>> librosa.display.specshow(mfcc_delta2, x_axis='time')
>>> plt.title(r'MFCC-$\Delta^2$')
>>> plt.colorbar()
>>> plt.tight_layout()
librosa.feature.melspectrogram(y=None, sr=22050, S=None, n_fft=2048, hop_length=512, power=2.0, **kwargs)

Compute a mel-scaled spectrogram.

If a spectrogram input S is provided, then it is mapped directly onto the mel basis mel_f by mel_f.dot(S).

If a time-series input y, sr is provided, then its magnitude spectrogram S is first computed, and then mapped onto the mel scale by mel_f.dot(S**power). By default, power=2 operates on a power spectrum.

y : np.ndarray [shape=(n,)] or None
audio time-series
sr : number > 0 [scalar]
sampling rate of y
S : np.ndarray [shape=(d, t)]
spectrogram
n_fft : int > 0 [scalar]
length of the FFT window
hop_length : int > 0 [scalar]
number of samples between successive frames. See librosa.core.stft
power : float > 0 [scalar]
Exponent for the magnitude melspectrogram. e.g., 1 for energy, 2 for power, etc.
kwargs : additional keyword arguments
Mel filter bank parameters. See librosa.filters.mel for details.
S : np.ndarray [shape=(n_mels, t)]
Mel spectrogram
librosa.filters.mel
Mel filter bank construction
librosa.core.stft
Short-time Fourier Transform
>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> librosa.feature.melspectrogram(y=y, sr=sr)
array([[  2.891e-07,   2.548e-03, ...,   8.116e-09,   5.633e-09],
       [  1.986e-07,   1.162e-02, ...,   9.332e-08,   6.716e-09],
       ...,
       [  3.668e-09,   2.029e-08, ...,   3.208e-09,   2.864e-09],
       [  2.561e-10,   2.096e-09, ...,   7.543e-10,   6.101e-10]])

Using a pre-computed power spectrogram

>>> D = np.abs(librosa.stft(y))**2
>>> S = librosa.feature.melspectrogram(S=D)
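>>> # Hedged sketch of the mapping described above: with a power
>>> # spectrogram input, the result is the mel filter bank applied to S
>>> mel_f = librosa.filters.mel(sr=sr, n_fft=2048)
>>> np.allclose(S, mel_f.dot(D))
True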
>>> # Passing through arguments to the Mel filters
>>> S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128,
...                                     fmax=8000)
>>> import matplotlib.pyplot as plt
>>> plt.figure(figsize=(10, 4))
>>> librosa.display.specshow(librosa.power_to_db(S,
...                                              ref=np.max),
...                          y_axis='mel', fmax=8000,
...                          x_axis='time')
>>> plt.colorbar(format='%+2.0f dB')
>>> plt.title('Mel spectrogram')
>>> plt.tight_layout()
librosa.feature.mfcc(y=None, sr=22050, S=None, n_mfcc=20, **kwargs)

Mel-frequency cepstral coefficients

y : np.ndarray [shape=(n,)] or None
audio time series
sr : number > 0 [scalar]
sampling rate of y
S : np.ndarray [shape=(d, t)] or None
log-power Mel spectrogram
n_mfcc: int > 0 [scalar]
number of MFCCs to return
kwargs : additional keyword arguments
Arguments to melspectrogram, if operating on time series input
M : np.ndarray [shape=(n_mfcc, t)]
MFCC sequence

melspectrogram

Generate mfccs from a time series

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> librosa.feature.mfcc(y=y, sr=sr)
array([[ -5.229e+02,  -4.944e+02, ...,  -5.229e+02,  -5.229e+02],
       [  7.105e-15,   3.787e+01, ...,  -7.105e-15,  -7.105e-15],
       ...,
       [  1.066e-14,  -7.500e+00, ...,   1.421e-14,   1.421e-14],
       [  3.109e-14,  -5.058e+00, ...,   2.931e-14,   2.931e-14]])

Use a pre-computed log-power Mel spectrogram

>>> S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128,
...                                    fmax=8000)
>>> librosa.feature.mfcc(S=librosa.power_to_db(S))
array([[ -5.207e+02,  -4.898e+02, ...,  -5.207e+02,  -5.207e+02],
       [ -2.576e-14,   4.054e+01, ...,  -3.997e-14,  -3.997e-14],
       ...,
       [  7.105e-15,  -3.534e+00, ...,   0.000e+00,   0.000e+00],
       [  3.020e-14,  -2.613e+00, ...,   3.553e-14,   3.553e-14]])
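MFCCs are conventionally the type-II DCT (orthonormal) of a log-power mel spectrogram; a hedged sketch of that relationship, assuming this implementation follows the same convention:

>>> import scipy.fftpack
>>> log_S = librosa.power_to_db(S)
>>> np.allclose(librosa.feature.mfcc(S=log_S),
...             scipy.fftpack.dct(log_S, axis=0, type=2, norm='ortho')[:20])
True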

Get more components

>>> mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)

Visualize the MFCC series

>>> import matplotlib.pyplot as plt
>>> plt.figure(figsize=(10, 4))
>>> librosa.display.specshow(mfccs, x_axis='time')
>>> plt.colorbar()
>>> plt.title('MFCC')
>>> plt.tight_layout()
librosa.feature.poly_features(y=None, sr=22050, S=None, n_fft=2048, hop_length=512, order=1, freq=None)

Get coefficients of fitting an nth-order polynomial to the columns of a spectrogram.

y : np.ndarray [shape=(n,)] or None
audio time series
sr : number > 0 [scalar]
audio sampling rate of y
S : np.ndarray [shape=(d, t)] or None
(optional) spectrogram magnitude
n_fft : int > 0 [scalar]
FFT window size
hop_length : int > 0 [scalar]
hop length for STFT. See librosa.core.stft for details.
order : int > 0
order of the polynomial to fit
freq : None or np.ndarray [shape=(d,) or shape=(d, t)]
Center frequencies for spectrogram bins. If None, then FFT bin center frequencies are used. Otherwise, it can be a single array of d center frequencies, or a matrix of center frequencies as constructed by librosa.core.ifgram
coefficients : np.ndarray [shape=(order+1, t)]

polynomial coefficients for each frame.

coefficients[0] corresponds to the highest degree (order),

coefficients[1] corresponds to the next highest degree (order-1),

down to the constant term coefficients[order].

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> S = np.abs(librosa.stft(y))

Fit a degree-0 polynomial (constant) to each frame

>>> p0 = librosa.feature.poly_features(S=S, order=0)

Fit a linear polynomial to each frame

>>> p1 = librosa.feature.poly_features(S=S, order=1)
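Each output column is a least-squares polynomial fit over frequency; a hedged sketch of the equivalence for a single frame, assuming default FFT bin center frequencies:

>>> freqs = librosa.fft_frequencies(sr=sr)
>>> np.allclose(p1[:, 0], np.polyfit(freqs, S[:, 0], 1))
True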

Fit a quadratic to each frame

>>> p2 = librosa.feature.poly_features(S=S, order=2)

Plot the results for comparison

>>> import matplotlib.pyplot as plt
>>> plt.figure(figsize=(8, 8))
>>> ax = plt.subplot(4,1,1)
>>> plt.plot(p2[2], label='order=2', alpha=0.8)
>>> plt.plot(p1[1], label='order=1', alpha=0.8)
>>> plt.plot(p0[0], label='order=0', alpha=0.8)
>>> plt.xticks([])
>>> plt.ylabel('Constant')
>>> plt.legend()
>>> plt.subplot(4,1,2, sharex=ax)
>>> plt.plot(p2[1], label='order=2', alpha=0.8)
>>> plt.plot(p1[0], label='order=1', alpha=0.8)
>>> plt.xticks([])
>>> plt.ylabel('Linear')
>>> plt.subplot(4,1,3, sharex=ax)
>>> plt.plot(p2[0], label='order=2', alpha=0.8)
>>> plt.xticks([])
>>> plt.ylabel('Quadratic')
>>> plt.subplot(4,1,4, sharex=ax)
>>> librosa.display.specshow(librosa.amplitude_to_db(S, ref=np.max),
...                          y_axis='log')
>>> plt.tight_layout()
librosa.feature.rmse(y=None, S=None, frame_length=2048, hop_length=512, center=True, pad_mode='reflect', n_fft=<DEPRECATED parameter>)

Compute root-mean-square (RMS) energy for each frame, either from the audio samples y or from a spectrogram S.

Computing the energy from audio samples is faster, as it doesn't require an STFT calculation. However, using a spectrogram gives a more accurate representation of energy over time because its frames can be windowed; prefer the S input if a spectrogram is already available.

y : np.ndarray [shape=(n,)] or None
(optional) audio time series. Required if S is not input.
S : np.ndarray [shape=(d, t)] or None
(optional) spectrogram magnitude. Required if y is not input.
frame_length : int > 0 [scalar]
length of analysis frame (in samples) for energy calculation
hop_length : int > 0 [scalar]
hop length for STFT. See librosa.core.stft for details.
center : bool

If True and operating on time-domain input (y), pad the signal by frame_length//2 on either side.

If operating on spectrogram input, this has no effect.

pad_mode : str
Padding mode for centered analysis. See np.pad for valid values.
n_fft : [DEPRECATED]

Warning

This parameter name was deprecated in librosa 0.5.0 Use the frame_length parameter instead. The n_fft parameter will be removed in librosa 0.6.0.

rms : np.ndarray [shape=(1, t)]
RMS value for each frame
>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> librosa.feature.rmse(y=y)
array([[ 0.   ,  0.056, ...,  0.   ,  0.   ]], dtype=float32)

Or from spectrogram input

>>> S, phase = librosa.magphase(librosa.stft(y))
>>> rms = librosa.feature.rmse(S=S)
>>> import matplotlib.pyplot as plt
>>> plt.figure()
>>> plt.subplot(2, 1, 1)
>>> plt.semilogy(rms.T, label='RMS Energy')
>>> plt.xticks([])
>>> plt.xlim([0, rms.shape[-1]])
>>> plt.legend(loc='best')
>>> plt.subplot(2, 1, 2)
>>> librosa.display.specshow(librosa.amplitude_to_db(S, ref=np.max),
...                          y_axis='log', x_axis='time')
>>> plt.title('log Power spectrogram')
>>> plt.tight_layout()

Use an STFT window of constant ones and no frame centering to get results consistent with the RMS energy computed from the audio samples y

>>> S = librosa.magphase(librosa.stft(y, window=np.ones, center=False))[0]
>>> librosa.feature.rmse(S=S)
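The time-domain computation is a per-frame root mean square; a hedged sketch, assuming the default frame_length=2048 and hop_length=512:

>>> frames = librosa.util.frame(y, frame_length=2048, hop_length=512)
>>> np.allclose(librosa.feature.rmse(y=y, center=False),
...             np.sqrt(np.mean(frames**2, axis=0)))
True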
librosa.feature.spectral_bandwidth(y=None, sr=22050, S=None, n_fft=2048, hop_length=512, freq=None, centroid=None, norm=True, p=2)

Compute p’th-order spectral bandwidth:

(sum_k S[k] * (freq[k] - centroid)**p)**(1/p)
y : np.ndarray [shape=(n,)] or None
audio time series
sr : number > 0 [scalar]
audio sampling rate of y
S : np.ndarray [shape=(d, t)] or None
(optional) spectrogram magnitude
n_fft : int > 0 [scalar]
FFT window size
hop_length : int > 0 [scalar]
hop length for STFT. See librosa.core.stft for details.
freq : None or np.ndarray [shape=(d,) or shape=(d, t)]
Center frequencies for spectrogram bins. If None, then FFT bin center frequencies are used. Otherwise, it can be a single array of d center frequencies, or a matrix of center frequencies as constructed by librosa.core.ifgram
centroid : None or np.ndarray [shape=(1, t)]
pre-computed centroid frequencies
norm : bool
Normalize per-frame spectral energy (sum to one)
p : float > 0
Power to raise deviation from spectral centroid.
bandwidth : np.ndarray [shape=(1, t)]
frequency bandwidth for each frame

From time-series input

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> spec_bw = librosa.feature.spectral_bandwidth(y=y, sr=sr)
>>> spec_bw
array([[ 3379.878,  1429.486, ...,  3235.214,  3080.148]])

From spectrogram input

>>> S, phase = librosa.magphase(librosa.stft(y=y))
>>> librosa.feature.spectral_bandwidth(S=S)
array([[ 3379.878,  1429.486, ...,  3235.214,  3080.148]])
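A hedged sketch of the formula above for the default p=2 (silent frames may be treated differently by the library's normalization):

>>> freqs = librosa.fft_frequencies(sr=sr)
>>> cent = librosa.feature.spectral_centroid(S=S)
>>> S_norm = S / np.maximum(S.sum(axis=0, keepdims=True), 1e-10)
>>> bw = np.sum(S_norm * np.abs(freqs[:, np.newaxis] - cent)**2, axis=0)**0.5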

Using variable bin center frequencies

>>> if_gram, D = librosa.ifgram(y)
>>> librosa.feature.spectral_bandwidth(S=np.abs(D), freq=if_gram)
array([[ 3380.011,  1429.11 , ...,  3235.22 ,  3080.148]])

Plot the result

>>> import matplotlib.pyplot as plt
>>> plt.figure()
>>> plt.subplot(2, 1, 1)
>>> plt.semilogy(spec_bw.T, label='Spectral bandwidth')
>>> plt.ylabel('Hz')
>>> plt.xticks([])
>>> plt.xlim([0, spec_bw.shape[-1]])
>>> plt.legend()
>>> plt.subplot(2, 1, 2)
>>> librosa.display.specshow(librosa.amplitude_to_db(S, ref=np.max),
...                          y_axis='log', x_axis='time')
>>> plt.title('log Power spectrogram')
>>> plt.tight_layout()
librosa.feature.spectral_centroid(y=None, sr=22050, S=None, n_fft=2048, hop_length=512, freq=None)

Compute the spectral centroid.

Each frame of a magnitude spectrogram is normalized and treated as a distribution over frequency bins, from which the mean (centroid) is extracted per frame.

y : np.ndarray [shape=(n,)] or None
audio time series
sr : number > 0 [scalar]
audio sampling rate of y
S : np.ndarray [shape=(d, t)] or None
(optional) spectrogram magnitude
n_fft : int > 0 [scalar]
FFT window size
hop_length : int > 0 [scalar]
hop length for STFT. See librosa.core.stft for details.
freq : None or np.ndarray [shape=(d,) or shape=(d, t)]
Center frequencies for spectrogram bins. If None, then FFT bin center frequencies are used. Otherwise, it can be a single array of d center frequencies, or a matrix of center frequencies as constructed by librosa.core.ifgram
centroid : np.ndarray [shape=(1, t)]
centroid frequencies
librosa.core.stft
Short-time Fourier Transform
librosa.core.ifgram
Instantaneous-frequency spectrogram

From time-series input:

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> cent = librosa.feature.spectral_centroid(y=y, sr=sr)
>>> cent
array([[ 4382.894,   626.588, ...,  5037.07 ,  5413.398]])

From spectrogram input:

>>> S, phase = librosa.magphase(librosa.stft(y=y))
>>> librosa.feature.spectral_centroid(S=S)
array([[ 4382.894,   626.588, ...,  5037.07 ,  5413.398]])
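The centroid is the first moment of the column-normalized magnitudes; a hedged sketch (silent frames may be treated differently by the library's normalization):

>>> freqs = librosa.fft_frequencies(sr=sr)
>>> S_norm = S / np.maximum(S.sum(axis=0, keepdims=True), 1e-10)
>>> cent_manual = np.sum(freqs[:, np.newaxis] * S_norm, axis=0)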

Using variable bin center frequencies:

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> if_gram, D = librosa.ifgram(y)
>>> librosa.feature.spectral_centroid(S=np.abs(D), freq=if_gram)
array([[ 4420.719,   625.769, ...,  5011.86 ,  5221.492]])

Plot the result

>>> import matplotlib.pyplot as plt
>>> plt.figure()
>>> plt.subplot(2, 1, 1)
>>> plt.semilogy(cent.T, label='Spectral centroid')
>>> plt.ylabel('Hz')
>>> plt.xticks([])
>>> plt.xlim([0, cent.shape[-1]])
>>> plt.legend()
>>> plt.subplot(2, 1, 2)
>>> librosa.display.specshow(librosa.amplitude_to_db(S, ref=np.max),
...                          y_axis='log', x_axis='time')
>>> plt.title('log Power spectrogram')
>>> plt.tight_layout()
librosa.feature.spectral_contrast(y=None, sr=22050, S=None, n_fft=2048, hop_length=512, freq=None, fmin=200.0, n_bands=6, quantile=0.02, linear=False)

Compute spectral contrast [1]_

[1]Jiang, Dan-Ning, Lie Lu, Hong-Jiang Zhang, Jian-Hua Tao, and Lian-Hong Cai. “Music type classification by spectral contrast feature.” In Multimedia and Expo, 2002. ICME '02. Proceedings. 2002 IEEE International Conference on, vol. 1, pp. 113-116. IEEE, 2002.
y : np.ndarray [shape=(n,)] or None
audio time series
sr : number > 0 [scalar]
audio sampling rate of y
S : np.ndarray [shape=(d, t)] or None
(optional) spectrogram magnitude
n_fft : int > 0 [scalar]
FFT window size
hop_length : int > 0 [scalar]
hop length for STFT. See librosa.core.stft for details.
freq : None or np.ndarray [shape=(d,)]
Center frequencies for spectrogram bins. If None, then FFT bin center frequencies are used. Otherwise, it can be a single array of d center frequencies.
fmin : float > 0
Frequency cutoff for the first bin [0, fmin]. Subsequent bins will cover [fmin, 2*fmin], [2*fmin, 4*fmin], etc.
n_bands : int > 1
number of frequency bands
quantile : float in (0, 1)
quantile for determining peaks and valleys
linear : bool

If True, return the linear difference of magnitudes: peaks - valleys.

If False, return the logarithmic difference: log(peaks) - log(valleys).

contrast : np.ndarray [shape=(n_bands + 1, t)]
each row of spectral contrast values corresponds to a given octave-based frequency
>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> S = np.abs(librosa.stft(y))
>>> contrast = librosa.feature.spectral_contrast(S=S, sr=sr)
>>> import matplotlib.pyplot as plt
>>> plt.figure()
>>> plt.subplot(2, 1, 1)
>>> librosa.display.specshow(librosa.amplitude_to_db(S,
...                                                  ref=np.max),
...                          y_axis='log')
>>> plt.colorbar(format='%+2.0f dB')
>>> plt.title('Power spectrogram')
>>> plt.subplot(2, 1, 2)
>>> librosa.display.specshow(contrast, x_axis='time')
>>> plt.colorbar()
>>> plt.ylabel('Frequency bands')
>>> plt.title('Spectral contrast')
>>> plt.tight_layout()
librosa.feature.spectral_rolloff(y=None, sr=22050, S=None, n_fft=2048, hop_length=512, freq=None, roll_percent=0.85)

Compute the roll-off frequency: for each frame, the lowest frequency below which at least roll_percent of the total spectral energy is contained.

y : np.ndarray [shape=(n,)] or None
audio time series
sr : number > 0 [scalar]
audio sampling rate of y
S : np.ndarray [shape=(d, t)] or None
(optional) spectrogram magnitude
n_fft : int > 0 [scalar]
FFT window size
hop_length : int > 0 [scalar]
hop length for STFT. See librosa.core.stft for details.
freq : None or np.ndarray [shape=(d,) or shape=(d, t)]

Center frequencies for spectrogram bins. If None, then FFT bin center frequencies are used. Otherwise, it can be a single array of d center frequencies, or a matrix of center frequencies as constructed by librosa.core.ifgram.

Note

freq is assumed to be sorted in increasing order

roll_percent : float [0 < roll_percent < 1]
Roll-off percentage.
rolloff : np.ndarray [shape=(1, t)]
roll-off frequency for each frame

From time-series input

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
>>> rolloff
array([[ 8376.416,   968.994, ...,  8925.513,  9108.545]])

From spectrogram input

>>> S, phase = librosa.magphase(librosa.stft(y))
>>> librosa.feature.spectral_rolloff(S=S, sr=sr)
array([[ 8376.416,   968.994, ...,  8925.513,  9108.545]])
>>> # With a higher roll percentage:
>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> librosa.feature.spectral_rolloff(y=y, sr=sr, roll_percent=0.95)
array([[ 10012.939,   3003.882, ...,  10034.473,  10077.539]])
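A hedged sketch of the definition: per frame, the lowest bin at which the cumulative magnitude reaches roll_percent of the total:

>>> freqs = librosa.fft_frequencies(sr=sr)
>>> cum = np.cumsum(S, axis=0)
>>> idx = np.argmax(cum >= 0.85 * cum[-1], axis=0)
>>> rolloff_manual = freqs[idx]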
>>> import matplotlib.pyplot as plt
>>> plt.figure()
>>> plt.subplot(2, 1, 1)
>>> plt.semilogy(rolloff.T, label='Roll-off frequency')
>>> plt.ylabel('Hz')
>>> plt.xticks([])
>>> plt.xlim([0, rolloff.shape[-1]])
>>> plt.legend()
>>> plt.subplot(2, 1, 2)
>>> librosa.display.specshow(librosa.amplitude_to_db(S, ref=np.max),
...                          y_axis='log', x_axis='time')
>>> plt.title('log Power spectrogram')
>>> plt.tight_layout()
librosa.feature.stack_memory(data, n_steps=2, delay=1, **kwargs)

Short-term history embedding: vertically concatenate a data vector or matrix with delayed copies of itself.

Each column data[:, i] is mapped to:

data[:, i] ->  [data[:, i],
                data[:, i - delay],
                ...
                data[:, i - (n_steps-1)*delay]]

For columns i < (n_steps - 1) * delay, the data will be padded. By default, the data is padded with zeros, but this behavior can be overridden by supplying additional keyword arguments which are passed to np.pad().

data : np.ndarray [shape=(t,) or (d, t)]
Input data matrix. If data is a vector (data.ndim == 1), it will be interpreted as a row matrix and reshaped to (1, t).
n_steps : int > 0 [scalar]
embedding dimension, the number of steps back in time to stack
delay : int != 0 [scalar]

the number of columns to step.

Positive values embed from the past (previous columns).

Negative values embed from the future (subsequent columns).

kwargs : additional keyword arguments
Additional arguments to pass to np.pad.
data_history : np.ndarray [shape=(m * d, t)]
data augmented with lagged copies of itself, where m == n_steps.

This function caches at level 40.

Keep two steps (current and previous)

>>> data = np.arange(-3, 3)
>>> librosa.feature.stack_memory(data)
array([[-3, -2, -1,  0,  1,  2],
       [ 0, -3, -2, -1,  0,  1]])

Or three steps

>>> librosa.feature.stack_memory(data, n_steps=3)
array([[-3, -2, -1,  0,  1,  2],
       [ 0, -3, -2, -1,  0,  1],
       [ 0,  0, -3, -2, -1,  0]])

Use reflection padding instead of zero-padding

>>> librosa.feature.stack_memory(data, n_steps=3, mode='reflect')
array([[-3, -2, -1,  0,  1,  2],
       [-2, -3, -2, -1,  0,  1],
       [-1, -2, -3, -2, -1,  0]])

Or pad with edge-values, and delay by 2

>>> librosa.feature.stack_memory(data, n_steps=3, delay=2, mode='edge')
array([[-3, -2, -1,  0,  1,  2],
       [-3, -3, -3, -2, -1,  0],
       [-3, -3, -3, -3, -3, -2]])

Stack time-lagged beat-synchronous chroma with edge padding

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> chroma = librosa.feature.chroma_stft(y=y, sr=sr)
>>> tempo, beats = librosa.beat.beat_track(y=y, sr=sr, hop_length=512)
>>> beats = librosa.util.fix_frames(beats, x_min=0, x_max=chroma.shape[1])
>>> chroma_sync = librosa.util.sync(chroma, beats)
>>> chroma_lag = librosa.feature.stack_memory(chroma_sync, n_steps=3,
...                                           mode='edge')

Plot the result

>>> import matplotlib.pyplot as plt
>>> beat_times = librosa.frames_to_time(beats, sr=sr, hop_length=512)
>>> librosa.display.specshow(chroma_lag, y_axis='chroma', x_axis='time',
...                          x_coords=beat_times)
>>> plt.yticks([0, 12, 24], ['Lag=0', 'Lag=1', 'Lag=2'])
>>> plt.title('Time-lagged chroma')
>>> plt.colorbar()
>>> plt.tight_layout()
librosa.feature.tempogram(y=None, sr=22050, onset_envelope=None, hop_length=512, win_length=384, center=True, window='hann', norm=inf)

Compute the tempogram: local autocorrelation of the onset strength envelope. [1]_

[1]Grosche, Peter, Meinard Müller, and Frank Kurth. “Cyclic tempogram - A mid-level tempo representation for music signals.” ICASSP, 2010.
y : np.ndarray [shape=(n,)] or None
Audio time series.
sr : number > 0 [scalar]
sampling rate of y
onset_envelope : np.ndarray [shape=(n,)] or None
Optional pre-computed onset strength envelope as provided by onset.onset_strength
hop_length : int > 0
number of audio samples between successive onset measurements
win_length : int > 0
length of the onset autocorrelation window (in frames/onset measurements). The default setting (384) corresponds to 384 * hop_length / sr ~= 8.9s.
center : bool
If True, onset autocorrelation windows are centered. If False, windows are left-aligned.
window : string, function, number, tuple, or np.ndarray [shape=(win_length,)]
A window specification as in core.stft.
norm : {np.inf, -np.inf, 0, float > 0, None}
Normalization mode. Set to None to disable normalization.
tempogram : np.ndarray [shape=(win_length, n)]
Localized autocorrelation of the onset strength envelope
ParameterError

if neither y nor onset_envelope are provided

if win_length < 1

librosa.onset.onset_strength
librosa.util.normalize
librosa.core.stft

>>> # Compute local onset autocorrelation
>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> hop_length = 512
>>> oenv = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop_length)
>>> tempogram = librosa.feature.tempogram(onset_envelope=oenv, sr=sr,
...                                       hop_length=hop_length)
>>> # Compute global onset autocorrelation
>>> ac_global = librosa.autocorrelate(oenv, max_size=tempogram.shape[0])
>>> ac_global = librosa.util.normalize(ac_global)
>>> # Estimate the global tempo for display purposes
>>> tempo = librosa.beat.tempo(onset_envelope=oenv, sr=sr,
...                            hop_length=hop_length)[0]
>>> import matplotlib.pyplot as plt
>>> plt.figure(figsize=(8, 8))
>>> plt.subplot(4, 1, 1)
>>> plt.plot(oenv, label='Onset strength')
>>> plt.xticks([])
>>> plt.legend(frameon=True)
>>> plt.axis('tight')
>>> plt.subplot(4, 1, 2)
>>> # We'll truncate the display to a narrower range of tempi
>>> librosa.display.specshow(tempogram, sr=sr, hop_length=hop_length,
...                          x_axis='time', y_axis='tempo')
>>> plt.axhline(tempo, color='w', linestyle='--', alpha=1,
...             label='Estimated tempo={:g}'.format(tempo))
>>> plt.legend(frameon=True, framealpha=0.75)
>>> plt.subplot(4, 1, 3)
>>> x = np.linspace(0, tempogram.shape[0] * float(hop_length) / sr,
...                 num=tempogram.shape[0])
>>> plt.plot(x, np.mean(tempogram, axis=1), label='Mean local autocorrelation')
>>> plt.plot(x, ac_global, '--', alpha=0.75, label='Global autocorrelation')
>>> plt.xlabel('Lag (seconds)')
>>> plt.axis('tight')
>>> plt.legend(frameon=True)
>>> plt.subplot(4,1,4)
>>> # We can also plot on a BPM axis
>>> freqs = librosa.tempo_frequencies(tempogram.shape[0], hop_length=hop_length, sr=sr)
>>> plt.semilogx(freqs[1:], np.mean(tempogram[1:], axis=1),
...              label='Mean local autocorrelation', basex=2)
>>> plt.semilogx(freqs[1:], ac_global[1:], '--', alpha=0.75,
...              label='Global autocorrelation', basex=2)
>>> plt.axvline(tempo, color='black', linestyle='--', alpha=.8,
...             label='Estimated tempo={:g}'.format(tempo))
>>> plt.legend(frameon=True)
>>> plt.xlabel('BPM')
>>> plt.axis('tight')
>>> plt.grid()
>>> plt.tight_layout()
librosa.feature.tonnetz(y=None, sr=22050, chroma=None)

Computes the tonal centroid features (tonnetz), following the method of [1]_.

[1]Harte, C., Sandler, M., & Gasser, M. (2006). “Detecting Harmonic Change in Musical Audio.” In Proceedings of the 1st ACM Workshop on Audio and Music Computing Multimedia (pp. 21-26). Santa Barbara, CA, USA: ACM Press. doi:10.1145/1178723.1178727.
y : np.ndarray [shape=(n,)] or None
Audio time series.
sr : number > 0 [scalar]
sampling rate of y
chroma : np.ndarray [shape=(n_chroma, t)] or None

Normalized energy for each chroma bin at each frame.

If None, a CQT chromagram is computed (see chroma_cqt).

tonnetz : np.ndarray [shape(6, t)]

Tonal centroid features for each frame.

Tonnetz dimensions:
  • 0: Fifth x-axis
  • 1: Fifth y-axis
  • 2: Minor x-axis
  • 3: Minor y-axis
  • 4: Major x-axis
  • 5: Major y-axis
chroma_cqt
Compute a chromagram from a constant-Q transform.
chroma_stft
Compute a chromagram from an STFT spectrogram or waveform.

Compute tonnetz features from the harmonic component of a song

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> y = librosa.effects.harmonic(y)
>>> tonnetz = librosa.feature.tonnetz(y=y, sr=sr)
>>> tonnetz
array([[-0.073, -0.053, ..., -0.054, -0.073],
       [ 0.001,  0.001, ..., -0.054, -0.062],
       ...,
       [ 0.039,  0.034, ...,  0.044,  0.064],
       [ 0.005,  0.002, ...,  0.011,  0.017]])

Compare the tonnetz features to chroma_cqt

>>> import matplotlib.pyplot as plt
>>> plt.subplot(2, 1, 1)
>>> librosa.display.specshow(tonnetz, y_axis='tonnetz')
>>> plt.colorbar()
>>> plt.title('Tonal Centroids (Tonnetz)')
>>> plt.subplot(2, 1, 2)
>>> librosa.display.specshow(librosa.feature.chroma_cqt(y, sr=sr),
...                          y_axis='chroma', x_axis='time')
>>> plt.colorbar()
>>> plt.title('Chroma')
>>> plt.tight_layout()
librosa.feature.zero_crossing_rate(y, frame_length=2048, hop_length=512, center=True, **kwargs)

Compute the zero-crossing rate of an audio time series.

y : np.ndarray [shape=(n,)]
Audio time series
frame_length : int > 0
Length of the frame over which to compute zero crossing rates
hop_length : int > 0
Number of samples to advance for each frame
center : bool
If True, frames are centered by padding the edges of y. This is similar to the padding in librosa.core.stft, but uses edge-value copies instead of reflection.
kwargs : additional keyword arguments

See librosa.core.zero_crossings

Note

By default, the pad parameter is set to False, which differs from the default specified by librosa.core.zero_crossings.

zcr : np.ndarray [shape=(1, t)]
zcr[0, i] is the fraction of zero crossings in the i th frame
librosa.core.zero_crossings
Compute zero-crossings in a time-series
>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> librosa.feature.zero_crossing_rate(y)
array([[ 0.134,  0.139, ...,  0.387,  0.322]])
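A hedged sketch of the computation with center=False: frame the signal, mark zero-crossings within each frame, then take the per-frame fraction:

>>> frames = librosa.util.frame(y, frame_length=2048, hop_length=512)
>>> crossings = librosa.zero_crossings(frames, axis=0, pad=False)
>>> np.allclose(librosa.feature.zero_crossing_rate(y, center=False),
...             np.mean(crossings, axis=0, keepdims=True))
True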

Harmonic-percussive source separation

Output methods

Output

Text output

Audio output

librosa.output.annotation(path, intervals, annotations=None, delimiter=', ', fmt='%0.3f')

Save annotations in a 3-column format:

intervals[0, 0],intervals[0, 1],annotations[0]\n
intervals[1, 0],intervals[1, 1],annotations[1]\n
intervals[2, 0],intervals[2, 1],annotations[2]\n
...

This can be used for segment or chord annotations.

path : str
path to save the output CSV file
intervals : np.ndarray [shape=(n, 2)]

array of interval start and end-times.

intervals[i, 0] marks the start time of interval i

intervals[i, 1] marks the end time of interval i

annotations : None or list-like [shape=(n,)]
optional list of annotation strings. annotations[i] applies to the time range intervals[i, 0] to intervals[i, 1]
delimiter : str
character to separate fields
fmt : str
format-string for rendering time data
ParameterError
if annotations is not None and length does not match intervals
>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> data = librosa.feature.mfcc(y=y, sr=sr, hop_length=512)

Detect segment boundaries

>>> boundaries = librosa.segment.agglomerative(data, k=10)

Convert to time

>>> boundary_times = librosa.frames_to_time(boundaries, sr=sr,
...                                         hop_length=512)

Convert event boundaries to intervals

>>> intervals = np.hstack([boundary_times[:-1, np.newaxis],
...                        boundary_times[1:, np.newaxis]])

Make some fake annotations

>>> labels = ['Seg #{:03d}'.format(i) for i in range(len(intervals))]

Save the output

>>> librosa.output.annotation('segments.csv', intervals,
...                           annotations=labels)
librosa.output.times_csv(path, times, annotations=None, delimiter=', ', fmt='%0.3f')

Save time steps in CSV format. This can be used to store the output of a beat tracker or segmentation algorithm.

If only times are provided, the file will contain each value of times on a row:

times[0]\n
times[1]\n
times[2]\n
...

If annotations are also provided, the file will contain delimiter-separated values:

times[0],annotations[0]\n
times[1],annotations[1]\n
times[2],annotations[2]\n
...
path : string
path to save the output CSV file
times : list-like of floats
list of event times (in seconds), e.g., beat times
annotations : None or list-like
optional annotations for each time step
delimiter : str
character to separate fields
fmt : str
format-string for rendering time
ParameterError
if annotations is not None and length does not match times

Write beat-tracker time to CSV

>>> y, sr = librosa.load(librosa.util.example_audio_file())
>>> tempo, beats = librosa.beat.beat_track(y, sr=sr, units='time')
>>> librosa.output.times_csv('beat_times.csv', beats)
librosa.output.write_wav(path, y, sr, norm=False)

Output a time series as a .wav file

path : str
path to save the output wav file
y : np.ndarray [shape=(n,) or (2,n)]
audio time series (mono or stereo)
sr : int > 0 [scalar]
sampling rate of y
norm : boolean [scalar]
enable amplitude normalization. For floating point y, scale the data to the range [-1, +1].

Trim a signal to 5 seconds and save it back

>>> y, sr = librosa.load(librosa.util.example_audio_file(),
...                      duration=5.0)
>>> librosa.output.write_wav('file_trim_5s.wav', y, sr)