Skip to content

Operations

Tim Sharii edited this page Sep 18, 2021 · 13 revisions

High-level operations are based mostly, one way or another, on transforms and filters.

Here's the list of available operations:

Most operations can be carried out using the corresponding static methods of Operation class:

// convolution

var conv = Operation.Convolve(signal, kernel);
var xcorr = Operation.CrossCorrelate(signal1, signal2);

// block convolution

var filtered = Operation.BlockConvolve(signal, kernel, 4096, FilteringMethod.OverlapAdd);

// periodogram evaluation

var periodogram = Operation.Welch(signal, 2048, 1024);
var pgram = Operation.LombScargle(x, y, freqs);

// resampling

var resampled = Operation.Resample(signal, 22050);
var interpolated = Operation.Interpolate(signal, 3);
var decimated = Operation.Decimate(signal, 2);
var updown = Operation.ResampleUpDown(signal, 3, 2);

// tsm

var stretch = Operation.TimeStretch(signal, 0.7, TsmAlgorithm.Wsola);
var cool = Operation.TimeStretch(signal, 16, TsmAlgorithm.PaulStretch);

// envelope following

var envelope = Operation.Envelope(signal);

// peak / rms normalization

var peakNorm = Operation.NormalizePeak(signal, -3/*dB*/);
var rmsNorm = Operation.NormalizeRms(signal, -3/*dB*/);
var rmsChanged = Operation.ChangeRms(signal, -6/*dB*/);

// rectification

var halfRect = Operation.HalfRectify(signal);
var fullRect = Operation.FullRectify(signal);

// spectral subtraction

var clean = Operation.SpectralSubtract(signal, noise);

All of the above-mentioned functions are non-mutating and create new output signal for each input. These functions are OK for one-time call. For repeated operations it's usually better to call methods of special stateful classes responsible for each particular operation. Besides, there are some additional parameters that can be tweaked in methods (they weren't specified in the code above).

Convolver

Convolver class provides overloaded methods for carrying out fast convolution and cross-correlation via FFT. It works with its own internal buffers everytime its methods are called. Hence, this class should be used when processing signal sequentially, block after block.

var convolver = new Convolver(512); // FFT size

var kernel = new float[101];
// fill kernel

float[] output = new float[512];

convolver.Convolve(input1, kernel, output);
// do something with output

convolver.Convolve(input2, kernel, output);
// do something with output

convolver.Convolve(input3, kernel, output);
// do something with output


// cross-correlation has side-effect! it reverses second array

convolver.CrossCorrelate(input1, corr1, output);
// do something with output
// corr1 is now reversed

convolver.CrossCorrelate(input1, corr2, output);
// do something with output
// corr2 is now reversed

Remember, theoretical length of convolution/cross-correlation signal is input.Length + kernel.Length - 1. So the length of output array must be at least the nearest power of 2 to this value.

ComplexConvolver is similar class that convolves signals of type ComplexDiscreteSignal.

BlockConvolver

Two well-known techniques of block convolution are implemented:

  • Overlap-Add (OlaBlockConvolver class)
  • Overlap-Save (OlsBlockConvolver class)

Both classes implement IFilter and IOnlineFilter interfaces. Hence, they can be used as filters in offline and online processing of signals.

var kernel = firFilter.Kernel;
var processor = new OlaBlockConvolver(kernel, 4096);

// equivalent line:
var processor = OlaBlockConvolver.FromFilter(firFilter, 4096);

var filtered = processor.ApplyTo(signal);  // like any filter

Online processing:

// Overlap-Add / Overlap-Save

FirFilter filter = new FirFilter(kernel);

var blockConvolver = OlsBlockConvolver.FromFilter(filter, 16384);

// processing loop:
// while new input sample is available
{
     var outputSample = blockConvolver.Process(sample);
}

// or:
// while new input buffer is available
{
    blockConvolver.Process(input, output);
}

Note that the output will always be "late" by FftSize - KernelSize + 1 samples. The property (Ola|Ols)BlockConvoler.HopSize returns this value. So you might want to process first HopSize - 1 samples without storing the result anywhere (the samples will just get into delay line). For example, this is how offline method ApplyTo() is implemented for block convolvers:

var firstCount = Math.Min(HopSize - 1, signal.Length);

int i = 0, j = 0;

for (; i < firstCount; i++)    // first HopSize samples are just placed in the delay line
{
    Process(signal[i]);
}

var filtered = new float[signal.Length + _kernel.Length - 1];

for (; i < signal.Length; i++, j++)    // process
{
    filtered[j] = Process(signal[i]);
}

var lastCount = firstCount + _kernel.Length - 1;

for (i = 0; i < lastCount; i++, j++)    // get last 'late' samples
{
    filtered[j] = Process(0.0f);
}

See also OnlineDemoForm code.

For double-precision block convolution the 64-bit versions of block convolvers are available:

  • OlsBlockConvolver64
  • OlaBlockConvolver64

Resampler

Resampler class provides methods for simple decimation, interpolation, up-down resampling (for small factors) and band-limited resampling:

// signal is sampled at 16kHz

var resampler = new Resampler();

var signal_22_5 = resampler.Resample(signal, 22050);    // band-limited resampling

var signal_8 = resampler.Decimate(signal, 2);           // downsample to 8 kHz
var signal_48 = resampler.Interpolate(signal, 3);       // upsample to 48 kHz
var signal_24 = resampler.ResampleUpDown(signal, 3, 2); // resample to 24 kHz

For simple decimation/interpolation/resampling the three latter methods will work faster. Bandlimited resampling resampling is universal and will work for any sampling rates.

All methods use anti-aliasing low-pass filtering under the hood. By default, the lowpass filter is designed inside the routines (of order 101), but you can specify your own anti-aliasing filter as the 3rd parameter:

var lpFilter = DesignFilter.FirLp(301, 0.5f / 2);
var resampled = resampler.Decimate(signal, 2, lpFilter);

var fasterFilter = DesignFilter.FirLp(51, 0.5f / 3);
resampled = resampler.Interpolate(signal, 3, fasterFilter);

Rectification

Operation class provides methods for rectification. Unlike their DiscreteSignal analogs, these methods don't affect initial signal:

var fullRect = Operation.FullRectify(signal);
var halfRect = Operation.HalfRectify(signal);

signal.FullRectify();  // in-place
signal.HalfRectify();  // in-place

EnvelopeFollower

EnvelopeFollower class implements IOnlineFilter interface. It's used, for instance, in AutoWah audio effect.

The constructor has three parameters: 1) sampling rate; 2) attack time; 3) release time.

var envelope = new EnvelopeFollower(samplingRate, 0.01f, 0.05f);

// while new input sample is available
{
    var envelopeSample = envelope.Process(sample);
    //...
}

envelope.Reset();

In principle, envelope detection could also be achieved with simple low-pass filtering, but EnvelopeFollower usually gives better results.

Dynamics processing

Dynamics processors are another examples of systems based on envelope following. They are often classified as:

  • Compressor
  • Limiter
  • Expander
  • Noise gate

All these types are defined in enum DynamicsMode.

Algorithmically, the limiter and compressor are identical and the expander and noise gate are identical. The difference is only in the compression/expansion ratios:

Type Ratio
Limiter bigger (5:1, 10:1, etc.)
Compressor smaller (1.5:1, 2:1, etc.)
Expander smaller (1.5:1, 2:1, etc.)
Noise gate bigger (5:1, 7:1, etc.)
var sr = signal.SamplingRate;
var threshold = -10;/*dBFS*/
var makeupGain = 1;/*dB*/            // optional output gain

var attackTime = 0.01f;/*seconds*/   // optional parameter of underlying envelope follower
var releaseTime = 0.1f;/*seconds*/   // optional parameter of underlying envelope follower

var limiter = new DynamicsProcessor(DynamicsMode.Limiter,
                                    sr,
                                    threshold,
                                    9, // i.e. ratio 9:1
                                    makeupGain,
                                    attackTime,
                                    releaseTime);

var compressor = new DynamicsProcessor(DynamicsMode.Compressor, sr, threshold, 2);

var expander = new DynamicsProcessor(DynamicsMode.Expander, sr, threshold, 2);

var noiseGate = new DynamicsProcessor(DynamicsMode.NoiseGate, sr, threshold, 7);

// all parameters of DynamicsProcessor can be changed anytime during processing

limiter.Ratio = 8; /* 8:1 */
limiter.Threshold = -12/*dBFS*/;
limiter.MakeupGain = 0/*dB*/;
limiter.Attack = 0.03/*seconds*/;
limiter.Release = 0.12/*seconds*/;

// ...

Normalization

Methods for peak and RMS normalization of signal are available. We'll take a look at RMS functions first:

var rmsNorm = Operation.NormalizeRms(signal, -6/*dBFS*/);
// -6 dBFS means 6 dB decrease relative to 1.0 (full scale), i.e. approx. 0.5

// There's also method for changing RMS:

var rmsChanged = Operation.ChangeRms(signal, -6/*dB*/);
// -6 dB means 6 dB decrease relative to current RMS of the signal

// So, for example, if signal.Rms() = 0.12
// then
//   rmsNorm.Rms() = 0.5 (approx.)
//   rmsChanged.Rms() = 0.06 (approx.)

Same goes to peak normalization:

var peakNorm = Operation.NormalizePeak(signal, -6/*dBFS*/);
// -6 dBFS means 6 dB decrease relative to 1.0 (full scale), i.e. approx. 0.5

var peakChanged = Operation.ChangePeak(signal, -6/*dB*/);
// -6 dB means 6 dB decrease relative to current peak level of the signal

// So, for example, if signal.Peak() = 0.12
// then
//   peakNorm.Peak() = 0.5 (approx.)
//   peakChanged.Peak() = 0.06 (approx.)

// Extension method Peak():
// float Peak(this DiscreteSignal s) => s.Samples.Max(x => Math.Abs(x));

Periodogram evaluation

Basically, Welch's periodogram can be obtained simply as an averaged STFT:

var stft = new Stft(2048, 1024, WindowType.Hann);
float[] periodogram = stft.AveragePeriodogram(signal.Samples);

However sciPy version applies additional scaling to averaged periodogram. Hence, there's also separate method in NWaves, compliant with sciPy's version (and its scaling modes 'spectrum' and 'density'):

var periodogram = Operation.Welch(signal, 2048, 1024, WindowType.Hann, samplingRate: 0);
// sciPy equivalent: nperseg=2048, noverlap=1024, window="hann", scaling="spectrum"

var periodogram = Operation.Welch(signal, 2048, 1024, WindowType.Hann, samplingRate: 16000);
// sciPy equivalent: nperseg=2048, noverlap=1024, window="hann", scaling="density"

Lomb-Scargle periodogram evaluation is completely identical to sciPy version.

float[] x = ... ;   // sample times
float[] y = ... ;   // signal values at sample times

float[] freqs = ...;  // angular frequencies for output periodogram.

var periodogram = Operation.LombScargle(x, y, freqs, subtractMean: true, normalize: true);

TSM (time scale modification)

Four well-known TSM algorithms are implemented. Each one is reflected in TsmAlgorithm enum:

  • Phase vocoder
  • Phase vocoder with identity phase locking
  • WSOLA (waveform similarity overlap-add)
  • Paul stretch algorithm

In general, phase vocoder with phase locking (PVIPL) produces best results, so it's used by default in time-stretching operations. Wsola is usually good for speech signals. PaulStretch is different: it produces interesting sounds for large stretch factors (10 and more).

Each algorithm is coded in separate class implementing IFilter interface.

var wsola = new Wsola(0.75, windowSize, hopSize, maxDelta);
wsola = new Wsola(0.75); // parameters will be estimated automatically

var pvipl = new PhaseVocoderPhaseLocking(0.75, hopSize, fftSize);
var pv = new PhaseVocoder(1.25, hopSize, fftSize);
var paulStretch = new PaulStretch(16, hopSize, fftSize);

var output1 = wsola.ApplyTo(signal);
var output2 = pv.ApplyTo(signal);
var output3 = pvipl.ApplyTo(signal);
var output4 = paulStretch.ApplyTo(signal);

Parameters fftSize and hopSize can be tweaked. But general recommendation is to set relatively small hop length (corresponding to about 8-15ms), while size of FFT must be at least 6-7 times longer (and power of 2). For example, in case of signals sampled at 16kHz parameters fftSize=1024, hopSize=128 are OK (the computations will take longer time, though. Bigger hop length will lead to faster processing and poorer results).

Here are the parameters for WSOLA (can be viewed as the starting point for tuning):

// parameters are for 22.05 kHz sampling rate, so they will be adjusted for an input signal

if (_stretch > 1.5)        
{
    _windowSize = 1024;     // 46,4 ms
    _hopAnalysis = 128;     //  5,8 ms
}
else if (_stretch > 1.1)   
{
    _windowSize = 1536;     // 69,7 ms
    _hopAnalysis = 256;     // 10,6 ms
}
else if (_stretch > 0.6)
{
    _windowSize = 1536;     // 69,7 ms
    _hopAnalysis = 690;     // 31,3 ms
}
else
{
    _windowSize = 1024;     // 46,4 ms
    _hopAnalysis = 896;     // 40,6 ms
}

SpectralSubtractor

SpectralSubtractor class performs spectral subtraction according to

[1979] M.Berouti, R.Schwartz, J.Makhoul "Enhancement of Speech Corrupted by Acoustic Noise".

The class implements IFilter and IOnlineFilter interfaces.

// some noise signal is already measured or prepared

var subtractor = new SpectralSubtractor(noise, fftSize: 1024, hopSize: 300);
var clean = subtractor.ApplyTo(noisySignal);

// online:
// while input sample is available
{
    var outputSample = subtractor.Process(inputSample);
    //...
}

// noise can be re-estimated:

subtractor.EstimateNoise(newNoise);
clean = subtractor.ApplyTo(noisySignal);

Griffin-Lim algorithm

This algorithm allows reconstructing a signal from a given power or magnitude spectrogram. The algorithm is iterative so we can specify number of iterations in Reconstruct() method (by default, it's 20) or call Iterate() method any number of times.

var stft = new Stft(512, 200);

// someone has computed spectrogram before like this:
// var spectrogram = stft.Spectrogram(signal);

// and we'd like to reconstruct signal from spectrogram:

var griffinLim = new GriffinLimReconstructor(spectrogram, stft);    // for power spectrogram (default)
var griffinLim = new GriffinLimReconstructor(spectrogram, stft, 1); // for magnitude spectrogram
float[] reconstructed = griffinLim.Reconstruct();
var signal = new DiscreteSignal(16000, reconstructed);

// if we want more iterations:
float[] reconstructed = griffinLim.Reconstruct(50);

// step-by-step:
var reconstructed = griffinLim.Iterate();            // listen / visualize result
reconstructed = griffinLim.Iterate(reconstructed);   // listen / visualize result
reconstructed = griffinLim.Iterate(reconstructed);   // listen / visualize result

// done, the result is OK for us

HarmonicPercussiveSeparator

HarmonicPercussiveSeparator class performs HPSS according to

[2010] D.Fitzgerald. Harmonic/percussive separation using median filtering // 13th International Conference on Digital Audio Effects (DAFX10), Graz, Austria, 2010.

Conceptually the algorithm is pretty simple:

  • compute signal spectrogram
  • do median filtering along time axis (harmonic part)
  • do median filtering along frequency axis (percussive part)
  • post-process two filtered spectrograms using certain mask
  • reconstruct signals from updated spectrograms (optional)
var masking = HpsMasking.Wiener2;
var hpss = new HarmonicPercussiveSeparator(2048, 400, 25, 17, masking);

var signals = hpss.EvaluateSignals(signal);

// ...or we can compute only spectrograms
var spectrograms = hpss.EvaluateSpectrograms(signal);

harmonicSignal = signals.Item1;
percussiveSignal = signals.Item2;

harmonicSpectrograms = spectrograms.Item1;
percussiveSpectrograms = spectrograms.Item2;

Constructor parameters are:

  • FFT size for STFT
  • hop size for STFT
  • window size for "harmonic" median filter
  • window size for "percussive" median filter
  • masking mode (binary, Wiener1, Wiener2):
// we have two spectrograms;
// each sample in filtered spectrogram is post-processed based on itself (h) and
// the corresponding sample (p) of alternative spectrogram:

float BinaryMask(float h, float p)  =>  h > p ? 1 : 0;
float WienerMask1(float h, float p) =>  h + p > 1e-10 ? h / (h + p) : 0;
float WienerMask2(float h, float p) =>  h + p > 1e-10 ? h * h / (h * h + p * p) : 0;

Wave shaping

WaveShaper is a filter that maps an input signal to the output signal by applying certain arbitrary mathematical function (shaping function) to the input signal.

var waveShaper = new WaveShaper(x => x > 0 ? 1 : -1);
var shaped = waveShaper.ApplyTo(signal);


float ShapingFunction(float x)
{
    // ...
    return ...;
}

var waveShaper2 = new WaveShaper(ShapingFunction);
var shaped2 = waveShaper2.ApplyTo(signal);

Modulator

  • Amplitude modulation / demodulation
  • Frequency modulation / demodulation
  • Linear frequency modulation
  • Sinusoidal frequency modulation
  • Phase modulation
var modulator = new Modulator();

var ring = modulator.Ring(carrier, modulatorSignal);

var modAmp = modulator.Amplitude(carrier, 20/*Hz*/, 0.5f);
var demodAmp = modulator.DemodulateAmplitude(modAmp);

var modFreq = modulator.Frequency(baseband, carrierAmplitude: 1, carrierFrequency: 16000/*Hz*/, deviation: 0.1f);
var demodFreq = modulator.DemodulateFrequency(modFreq);

var modPhase = modulator.Phase(baseband, 1, 16000/*Hz*/, deviation: 0.1f);

Deconvolution

The deconvolution operation has well-known numerical problems when it's done via FFT. It will work OK if the corresponding polynomials can be divided.

var signal = new ComplexDiscreteSignal(1, new[] { 1, 0, 0, -27 });
var kernel = new ComplexDiscreteSignal(1, new[] { 1, -3 });

var c = new ComplexConvolver();
var deconvolved = c.Deconvolve(signal, kernel);

// deconvolved: { 1, 3, 9 }

// x^3 - 27 = (x - 3)(x^2 + 3x + 9)