librosa.decompose.hpss

librosa.decompose.hpss(S, kernel_size=31, power=2.0, mask=False, margin=1.0)

Median-filtering harmonic percussive source separation (HPSS).

If margin = 1.0, decomposes an input spectrogram S = H + P, where H contains the harmonic components and P contains the percussive components.

If margin > 1.0, decomposes an input spectrogram S = H + P + R, where R contains the residual components that are included in neither H nor P.

This implementation is based upon the algorithm described by [R30] and [R31].

[R30] Fitzgerald, Derry. “Harmonic/percussive separation using median filtering.” 13th International Conference on Digital Audio Effects (DAFX10), Graz, Austria, 2010.
[R31] Driedger, Jonathan, Meinard Müller, and Sascha Disch. “Extending harmonic-percussive separation of audio.” 15th International Society for Music Information Retrieval Conference (ISMIR 2014), Taipei, Taiwan, 2014.

Parameters:

S : np.ndarray [shape=(d, n)]

input spectrogram. May be real (magnitude) or complex.

kernel_size : int or tuple (kernel_harmonic, kernel_percussive)

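kernel size(s) for the median filters. The harmonic filter runs along the time axis (across frames) and the percussive filter runs along the frequency axis (across bins).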

  • If scalar, the same size is used for both harmonic and percussive.
  • If tuple, the first value specifies the width of the harmonic filter, and the second value specifies the width of the percussive filter.
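To make the axis convention concrete, here is a minimal sketch of the two median filters using `scipy.ndimage.median_filter`. The helper `median_enhance` is hypothetical (not part of librosa), and the `mode="reflect"` boundary handling is an assumption chosen for illustration. On a toy spectrogram containing one steady tone (a horizontal line) and one click (a vertical line), the time-axis filter keeps only the tone and the frequency-axis filter keeps only the click.

```python
import numpy as np
from scipy.ndimage import median_filter


def median_enhance(S_mag, kernel_size=31):
    """Hypothetical helper: median-filter a magnitude spectrogram of shape (d, n).

    kernel_size is a scalar or a (kernel_harmonic, kernel_percussive) tuple.
    The harmonic filter runs along time (axis 1), the percussive filter
    along frequency (axis 0).
    """
    if np.isscalar(kernel_size):
        win_harm = win_perc = kernel_size
    else:
        win_harm, win_perc = kernel_size
    harm = median_filter(S_mag, size=(1, win_harm), mode="reflect")
    perc = median_filter(S_mag, size=(win_perc, 1), mode="reflect")
    return harm, perc


# Toy spectrogram: d = 64 frequency bins, n = 64 frames
S_mag = np.zeros((64, 64))
S_mag[20, :] = 1.0  # steady tone: one frequency bin active in every frame
S_mag[:, 40] = 1.0  # click: every frequency bin active in one frame

harm, perc = median_enhance(S_mag, kernel_size=(13, 31))
print(harm[20, :].min(), harm.sum())  # the tone survives the time-axis filter
print(perc[:, 40].min(), perc.sum())  # the click survives the frequency-axis filter
```

Because each window contains at most one "off-axis" non-zero value, the median suppresses it, which is why the two filters separate horizontal and vertical structure.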

power : float > 0 [scalar]

Exponent for the Wiener filter when constructing soft mask matrices.
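The soft mask can be sketched as follows. `softmask_sketch` is a hypothetical, simplified stand-in for `util.softmask`: it assumes the mask is `X**power / (X**power + X_ref**power)`, sets the mask to 0 where both inputs are (near) zero, and treats `power=np.inf` as a hard (binary) mask. Larger exponents push the mask values toward 0 or 1.

```python
import numpy as np


def softmask_sketch(X, X_ref, power=2.0):
    """Hypothetical simplified soft mask: X**p / (X**p + X_ref**p).

    Inputs are non-negative arrays of the same shape. power=np.inf yields a
    hard mask (1 where X > X_ref, else 0).
    """
    if power <= 0:
        raise ValueError("power must be strictly positive")
    X = np.asarray(X, dtype=float)
    X_ref = np.asarray(X_ref, dtype=float)
    if np.isinf(power):
        return (X > X_ref).astype(float)
    # Rescale by the elementwise maximum to avoid overflow for large powers
    Z = np.maximum(X, X_ref)
    small = Z < np.finfo(float).tiny
    Z[small] = 1.0  # avoid dividing by zero where both inputs vanish
    num = (X / Z) ** power
    den = num + (X_ref / Z) ** power
    mask = np.zeros_like(num)
    np.divide(num, den, out=mask, where=~small)  # masked-out entries stay 0
    return mask


H = np.array([3.0, 1.0, 0.0])  # hypothetical harmonic-enhanced magnitudes
P = np.array([1.0, 3.0, 0.0])  # hypothetical percussive-enhanced magnitudes
print(softmask_sketch(H, P, power=1.0))
print(softmask_sketch(H, P, power=2.0))
print(softmask_sketch(H, P, power=np.inf))
```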

mask : bool

Return the masking matrices instead of components.

Masking matrices contain non-negative real values that can be used to measure the assignment of energy from S into harmonic or percussive components.

Components can be recovered as S * mask_H and S * mask_P.
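Because the masks are real and non-negative, multiplying a complex S by them rescales magnitudes but leaves the phase unchanged, and when the two masks sum to one (the margin = 1.0 case) the recovered components add back up to S. The sketch below checks both properties on random, hypothetical data rather than on actual `hpss` output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical complex spectrogram S of shape (d, n) = (4, 5)
S = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))

# Hypothetical complementary soft masks: values in [0, 1], summing to 1
mask_H = rng.uniform(0.0, 1.0, size=S.shape)
mask_P = 1.0 - mask_H

H = S * mask_H  # harmonic component
P = S * mask_P  # percussive component

# Real, positive mask values rescale magnitudes only, so the phase of S is kept
print(np.allclose(np.angle(H), np.angle(S)))
# Complementary masks reassemble the input (up to floating-point rounding)
print(np.allclose(H + P, S))
```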

margin : float or tuple (margin_harmonic, margin_percussive)

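margin size(s) for the masks (as described in [R31]). Each margin must be at least 1.0; larger values make the corresponding mask stricter and leave more energy in the residual.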

  • If scalar, the same size is used for both harmonic and percussive.
  • If tuple, the first value specifies the margin of the harmonic mask, and the second value specifies the margin of the percussive mask.
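The margin can be understood as inflating the competing component before the masks are built. The sketch below assumes a soft-mask variant of the margin rule from [R31]: the harmonic mask compares the harmonic-enhanced magnitudes against margin_harmonic times the percussive-enhanced ones, and vice versa. `masks_with_margin` and all input values are hypothetical. With margins above 1.0 the two masks sum to less than one, so a residual R = S - (H + P) remains.

```python
import numpy as np


def masks_with_margin(harm, perc, margin=1.0, power=2.0):
    """Hypothetical margin-based soft masks; assumes strictly positive inputs.

    margin is a scalar or a (margin_harmonic, margin_percussive) tuple.
    """
    if np.isscalar(margin):
        margin_harm = margin_perc = margin
    else:
        margin_harm, margin_perc = margin
    if margin_harm < 1 or margin_perc < 1:
        raise ValueError("margins must be >= 1.0")
    # Each component competes against the *other* one scaled by its margin
    mask_H = harm**power / (harm**power + (margin_harm * perc) ** power)
    mask_P = perc**power / (perc**power + (margin_perc * harm) ** power)
    return mask_H, mask_P


harm = np.array([4.0, 1.0, 2.0])   # hypothetical harmonic-enhanced magnitudes
perc = np.array([1.0, 4.0, 2.0])   # hypothetical percussive-enhanced magnitudes
S_mag = np.array([4.0, 4.0, 2.0])  # hypothetical mixture magnitudes

mask_H, mask_P = masks_with_margin(harm, perc, margin=3.0)
H = S_mag * mask_H
P = S_mag * mask_P
R = S_mag - (H + P)  # residual: energy assigned to neither component
print(mask_H + mask_P)
print(R)
```

With `margin=1.0` the same function returns masks that sum to one, so no residual is left.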

Returns:

harmonic : np.ndarray [shape=(d, n)]

harmonic component (or mask)

percussive : np.ndarray [shape=(d, n)]

percussive component (or mask)

See also

util.softmask

Notes

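This function caches at level 30: its results are cached only if librosa's cache is enabled and the cache level (LIBROSA_CACHE_LEVEL) is at least 30.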

Examples

Separate into harmonic and percussive components

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> import librosa
>>> import librosa.display
>>> y, sr = librosa.load(librosa.util.example_audio_file(), duration=15)
>>> D = librosa.stft(y)
>>> H, P = librosa.decompose.hpss(D)
>>> plt.figure()
>>> plt.subplot(3, 1, 1)
>>> librosa.display.specshow(librosa.amplitude_to_db(np.abs(D),
...                                                  ref=np.max),
...                          y_axis='log')
>>> plt.colorbar(format='%+2.0f dB')
>>> plt.title('Full power spectrogram')
>>> plt.subplot(3, 1, 2)
>>> librosa.display.specshow(librosa.amplitude_to_db(np.abs(H),
...                                                  ref=np.max),
...                          y_axis='log')
>>> plt.colorbar(format='%+2.0f dB')
>>> plt.title('Harmonic power spectrogram')
>>> plt.subplot(3, 1, 3)
>>> librosa.display.specshow(librosa.amplitude_to_db(np.abs(P),
...                                                  ref=np.max),
...                          y_axis='log')
>>> plt.colorbar(format='%+2.0f dB')
>>> plt.title('Percussive power spectrogram')
>>> plt.tight_layout()

Or with a narrower horizontal (harmonic) filter

>>> H, P = librosa.decompose.hpss(D, kernel_size=(13, 31))

Just get harmonic/percussive masks, not the spectra

>>> mask_H, mask_P = librosa.decompose.hpss(D, mask=True)
>>> mask_H
array([[  1.000e+00,   1.469e-01, ...,   2.648e-03,   2.164e-03],
       [  1.000e+00,   2.368e-01, ...,   9.413e-03,   7.703e-03],
       ...,
       [  8.869e-01,   5.673e-02, ...,   4.603e-02,   1.247e-05],
       [  7.068e-01,   2.194e-02, ...,   4.453e-02,   1.205e-05]], dtype=float32)
>>> mask_P
array([[  2.858e-05,   8.531e-01, ...,   9.974e-01,   9.978e-01],
       [  1.586e-05,   7.632e-01, ...,   9.906e-01,   9.923e-01],
       ...,
       [  1.131e-01,   9.433e-01, ...,   9.540e-01,   1.000e+00],
       [  2.932e-01,   9.781e-01, ...,   9.555e-01,   1.000e+00]], dtype=float32)

Separate into harmonic/percussive/residual components by using a margin > 1.0

>>> H, P = librosa.decompose.hpss(D, margin=3.0)
>>> R = D - (H+P)
>>> y_harm = librosa.core.istft(H)
>>> y_perc = librosa.core.istft(P)
>>> y_resi = librosa.core.istft(R)
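Since R = D - (H+P) and the inverse STFT is linear, the three reconstructed signals should add up to the inverse transform of D itself. The sketch below checks this with `scipy.signal.stft`/`scipy.signal.istft` standing in for librosa's STFT and with random masks standing in for `hpss` output; both substitutions are assumptions made for illustration.

```python
import numpy as np
from scipy.signal import stft, istft

rng = np.random.default_rng(0)
y = rng.standard_normal(4096)   # stand-in for an audio signal
_, _, D = stft(y, nperseg=512)  # complex spectrogram, shape (257, n_frames)

# Hypothetical real masks whose sum stays below 1, as with margin > 1.0
mask_H = rng.uniform(0.0, 0.5, size=D.shape)
mask_P = rng.uniform(0.0, 0.5, size=D.shape)
H = D * mask_H
P = D * mask_P
R = D - (H + P)  # residual spectrogram

_, y_harm = istft(H, nperseg=512)
_, y_perc = istft(P, nperseg=512)
_, y_resi = istft(R, nperseg=512)
_, y_full = istft(D, nperseg=512)

# Linearity: the harmonic, percussive, and residual signals sum to the full one
print(np.allclose(y_harm + y_perc + y_resi, y_full))
```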

Get a more isolated percussive component by widening its margin

>>> H, P = librosa.decompose.hpss(D, margin=(1.0, 5.0))


[Figure: full, harmonic, and percussive power spectrograms produced by the first example]