AudioSignal
Base functionality
- class audiotools.core.audio_signal.AudioSignal(audio_path_or_array: Union[Tensor, str, Path, ndarray], sample_rate: Optional[int] = None, stft_params: Optional[STFTParams] = None, offset: float = 0, duration: Optional[float] = None, device: Optional[str] = None)[source]
Bases: EffectMixin, LoudnessMixin, PlayMixin, ImpulseResponseMixin, DSPMixin, DisplayMixin, FFMPEGMixin, WhisperMixin
This is the core object of this library. Audio is always loaded into an AudioSignal, which then enables all the features of this library, including audio augmentations, I/O, playback, and more.
The structure of this object is that the base functionality is defined in core/audio_signal.py, while extensions to that functionality are defined in the other core/*.py files. For example, all the display-based functionality (e.g. plotting spectrograms and waveforms, writing to TensorBoard) is in core/display.py.
- Parameters
audio_path_or_array (Union[torch.Tensor, str, Path, np.ndarray]) – Object to create AudioSignal from. Can be a tensor, numpy array, or a path to a file. The audio data is always reshaped to (batch_size, num_channels, num_samples).
sample_rate (int, optional) – Sample rate of the audio. If different from underlying file, resampling is performed. If passing in an array or tensor, this must be defined, by default None
stft_params (STFTParams, optional) – Parameters of STFT to use, by default None
offset (float, optional) – Offset in seconds to read from file, by default 0
duration (float, optional) – Duration in seconds to read from file, by default None
device (str, optional) – Device to load audio onto, by default None
Examples
Loading an AudioSignal from an array, at a sample rate of 44100.
>>> signal = AudioSignal(torch.randn(5*44100), 44100)
Note, the signal is reshaped to have a batch size, and one audio channel:
>>> print(signal.shape)
(1, 1, 220500)
You can treat AudioSignals like tensors, and many of the same functions you might use on tensors are defined for AudioSignals as well:
>>> signal.to("cuda")
>>> signal.cuda()
>>> signal.clone()
>>> signal.detach()
Indexing AudioSignals returns an AudioSignal:
>>> signal[..., 3*44100:4*44100]
The above signal is 1 second long, and is also an AudioSignal.
- property audio_data
Returns the audio data tensor in the object.
Audio data is always of the shape (batch_size, num_channels, num_samples). If the value has fewer than 3 dims (e.g. is (num_channels, num_samples)), then it will be reshaped to (1, num_channels, num_samples) - a batch size of 1.
- Parameters
data (Union[torch.Tensor, np.ndarray]) – Audio data to set.
- Returns
Audio samples.
- Return type
torch.Tensor
- classmethod batch(audio_signals: list, pad_signals: bool = False, truncate_signals: bool = False, resample: bool = False, dim: int = 0)[source]
Creates a batched AudioSignal from a list of AudioSignals.
- Parameters
audio_signals (list[AudioSignal]) – List of AudioSignal objects
pad_signals (bool, optional) – Whether to pad signals to length of the maximum length AudioSignal in the list, by default False
truncate_signals (bool, optional) – Whether to truncate signals to length of shortest length AudioSignal in the list, by default False
resample (bool, optional) – Whether to resample AudioSignal to the sample rate of the first AudioSignal in the list, by default False
dim (int, optional) – Dimension along which to batch the signals, by default 0.
- Returns
Batched AudioSignal.
- Return type
- Raises
RuntimeError – If not all AudioSignals have the same sample rate and resample=False, an error is raised.
RuntimeError – If not all AudioSignals are the same length and both pad_signals=False and truncate_signals=False, an error is raised.
Examples
Batching a bunch of random signals:
>>> signal_list = [AudioSignal(torch.randn(44100), 44100) for _ in range(10)]
>>> signal = AudioSignal.batch(signal_list)
>>> print(signal.shape)
(10, 1, 44100)
- property batch_size
Batch size of audio signal.
- Returns
Batch size of signal.
- Return type
int
- clone()[source]
Clones all tensors contained in the AudioSignal, and returns a copy of the signal with everything cloned. Useful when using AudioSignal within autograd computation graphs.
Relevant attributes are the stft data, the audio data, and the loudness of the file.
- Returns
Clone of AudioSignal.
- Return type
- compute_stft_padding(window_length: int, hop_length: int, match_stride: bool)[source]
Compute how the STFT should be padded, based on match_stride.
- Parameters
window_length (int) – Window length of STFT.
hop_length (int) – Hop length of STFT.
match_stride (bool) – Whether or not to match stride, making the STFT have the same alignment as convolutional layers.
- Returns
Amount to pad on either side of audio.
- Return type
tuple
- deepcopy()[source]
Copies the signal and all of its attributes.
- Returns
Deep copy of the audio signal.
- Return type
- detach()[source]
Detaches tensors contained in AudioSignal.
Relevant attributes are the stft data, the audio data, and the loudness of the file.
- Returns
Same signal, but with all tensors detached.
- Return type
- property device
Get device that AudioSignal is on.
- Returns
Device that AudioSignal is on.
- Return type
torch.device
- property duration
Length of audio signal in seconds.
- Returns
Length of signal in seconds.
- Return type
float
- classmethod excerpt(audio_path: Union[str, Path], offset: Optional[float] = None, duration: Optional[float] = None, state: Optional[Union[RandomState, int]] = None, **kwargs)[source]
Randomly draw an excerpt of duration seconds from an audio file specified at audio_path, between offset seconds and the end of the file. state can be used to seed the random draw.
- Parameters
audio_path (Union[str, Path]) – Path to audio file to grab excerpt from.
offset (float, optional) – Lower bound for the start time of the excerpt, in seconds, by default None.
duration (float, optional) – Duration of excerpt, in seconds, by default None
state (Union[np.random.RandomState, int], optional) – RandomState or seed of random state, by default None
- Returns
AudioSignal containing excerpt.
- Return type
Examples
>>> signal = AudioSignal.excerpt("path/to/audio", duration=5)
- static get_dct(n_mfcc: int, n_mels: int, norm: str = 'ortho', device: str = None)[source]
Create a discrete cosine transform (DCT) transformation matrix with shape (n_mels, n_mfcc); it can be normalized depending on norm. For more information about the DCT, see: http://en.wikipedia.org/wiki/Discrete_cosine_transform#DCT-II
- Parameters
n_mfcc (int) – Number of mfccs
n_mels (int) – Number of mels
norm (str) – Use “ortho” to get an orthogonal matrix, or None, by default “ortho”
device (str, optional) – Device to load the transformation matrix on, by default None
- Returns
The dct transformation matrix.
- Return type
torch.Tensor [shape=(n_mels, n_mfcc)]
- static get_mel_filters(sr: int, n_fft: int, n_mels: int, fmin: float = 0.0, fmax: float = None)[source]
Create a Filterbank matrix to combine FFT bins into Mel-frequency bins.
- Parameters
sr (int) – Sample rate of audio
n_fft (int) – Number of FFT bins
n_mels (int) – Number of mels
fmin (float, optional) – Lowest frequency, in Hz, by default 0.0
fmax (float, optional) – Highest frequency, by default None
- Returns
Mel transform matrix
- Return type
np.ndarray [shape=(n_mels, 1 + n_fft/2)]
- static get_window(window_type: str, window_length: int, device: str)[source]
Wrapper around scipy.signal.get_window so one can also get the popular sqrt-hann window. This function caches for efficiency using functools.lru_cache.
- Parameters
window_type (str) – Type of window to get
window_length (int) – Length of the window
device (str) – Device to put window onto.
- Returns
Window returned by scipy.signal.get_window, as a tensor.
- Return type
torch.Tensor
- hash()[source]
Writes the audio data to a temporary file, and then hashes it using hashlib. Useful for creating a file name based on the audio content.
- Returns
Hash of audio data.
- Return type
str
Examples
Creating a signal, and writing it to a unique file name:
>>> signal = AudioSignal(torch.randn(44100), 44100)
>>> hash = signal.hash()
>>> signal.write(f"{hash}.wav")
- istft(window_length: Optional[int] = None, hop_length: Optional[int] = None, window_type: Optional[str] = None, match_stride: Optional[bool] = None, length: Optional[int] = None)[source]
Computes inverse STFT and sets it to audio_data.
- Parameters
window_length (int, optional) – Window length of STFT, by default 0.032 * self.sample_rate.
hop_length (int, optional) – Hop length of STFT, by default window_length // 4.
window_type (str, optional) – Type of window to use, by default sqrt_hann.
match_stride (bool, optional) – Whether to match the stride of convolutional layers, by default False
length (int, optional) – Original length of signal, by default None
- Returns
AudioSignal with istft applied.
- Return type
- Raises
RuntimeError – Raises an error if stft was not called prior to istft on the signal, or if stft_data is not set.
- property length
Length of audio signal.
- Returns
Length of signal in samples.
- Return type
int
- load_from_array(audio_array: Union[Tensor, ndarray], sample_rate: int, device: str = 'cpu')[source]
Loads data from array, reshaping it to be exactly 3 dimensions. Used internally when AudioSignal is called with a tensor or an array.
- Parameters
audio_array (Union[torch.Tensor, np.ndarray]) – Array/tensor of audio samples.
sample_rate (int) – Sample rate of audio
device (str, optional) – Device to move audio onto, by default “cpu”
- Returns
AudioSignal loaded from array
- Return type
- load_from_file(audio_path: Union[str, Path], offset: float, duration: float, device: str = 'cpu')[source]
Loads data from file. Used internally when AudioSignal is instantiated with a path to a file.
- Parameters
audio_path (Union[str, Path]) – Path to file
offset (float) – Offset in seconds
duration (float) – Duration in seconds
device (str, optional) – Device to put AudioSignal on, by default “cpu”
- Returns
AudioSignal loaded from file
- Return type
- log_magnitude(ref_value: float = 1.0, amin: float = 1e-05, top_db: float = 80.0)[source]
Computes the log-magnitude of the spectrogram.
- Parameters
ref_value (float, optional) – The magnitude is scaled relative to ref: 20 * log10(S / ref). Zeros in the output correspond to positions where S == ref, by default 1.0
amin (float, optional) – Minimum threshold for S and ref, by default 1e-5
top_db (float, optional) – Threshold the output at top_db below the peak: max(10 * log10(S/ref)) - top_db, by default 80.0
- Returns
Log-magnitude spectrogram
- Return type
torch.Tensor
- property magnitude
Computes and returns the absolute value of the STFT, which is the magnitude. This value can also be set to some tensor. When set, self.stft_data is manipulated so that its magnitude matches what this is set to, and modulated by the phase.
- Returns
Magnitude of STFT.
- Return type
torch.Tensor
Examples
>>> signal = AudioSignal(torch.randn(44100), 44100)
>>> magnitude = signal.magnitude  # Computes stft if not computed
>>> magnitude[magnitude < magnitude.mean()] = 0
>>> signal.magnitude = magnitude
>>> signal.istft()
- markdown()[source]
Produces a markdown representation of AudioSignal, in a markdown table.
- Returns
Markdown representation of AudioSignal.
- Return type
str
Examples
>>> signal = AudioSignal(torch.randn(44100), 44100)
>>> print(signal.markdown())
| Key | Value
|---|---
| duration | 1.000 seconds |
| batch_size | 1 |
| path | path unknown |
| sample_rate | 44100 |
| num_channels | 1 |
| audio_data.shape | torch.Size([1, 1, 44100]) |
| stft_params | STFTParams(window_length=2048, hop_length=512, window_type='sqrt_hann', match_stride=False) |
| device | cpu |
- mel_spectrogram(n_mels: int = 80, mel_fmin: float = 0.0, mel_fmax: Optional[float] = None, **kwargs)[source]
Computes a Mel spectrogram.
- Parameters
n_mels (int, optional) – Number of mels, by default 80
mel_fmin (float, optional) – Lowest frequency, in Hz, by default 0.0
mel_fmax (float, optional) – Highest frequency, by default None
kwargs (dict, optional) – Keyword arguments to self.stft().
- Returns
Mel spectrogram.
- Return type
torch.Tensor [shape=(batch, channels, mels, time)]
- mfcc(n_mfcc: int = 40, n_mels: int = 80, log_offset: float = 1e-06, **kwargs)[source]
Computes mel-frequency cepstral coefficients (MFCCs).
- Parameters
n_mfcc (int, optional) – Number of MFCCs to compute, by default 40
n_mels (int, optional) – Number of mels, by default 80
log_offset (float, optional) – Small value to prevent numerical issues when trying to compute log(0), by default 1e-6
kwargs (dict, optional) – Keyword arguments to self.mel_spectrogram(); note that some of them will be passed on to self.stft()
- Returns
MFCCs.
- Return type
torch.Tensor [shape=(batch, channels, mfccs, time)]
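Examples
A minimal illustrative sketch computing a Mel spectrogram and MFCCs from one second of noise:
>>> signal = AudioSignal(torch.randn(44100), 44100)
>>> mels = signal.mel_spectrogram(n_mels=80)
>>> mfccs = signal.mfcc(n_mfcc=40, n_mels=80)
>>> print(mels.shape, mfccs.shape)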
- property num_channels
Number of audio channels.
- Returns
Number of audio channels.
- Return type
int
- numpy()[source]
Detaches self.audio_data, moves it to the CPU, and converts it to numpy.
- Returns
Audio data as a numpy array.
- Return type
np.ndarray
- property path_to_input_file
Path to input file, if it exists. Alias to path_to_file for backwards compatibility.
- property phase
Computes and returns the phase of the STFT. This value can also be set to some tensor. When set, self.stft_data is manipulated so that its phase matches what this is set to, combined with the original magnitude.
- Returns
Phase of STFT.
- Return type
torch.Tensor
Examples
>>> signal = AudioSignal(torch.randn(44100), 44100)
>>> phase = signal.phase  # Computes stft if not computed
>>> phase[phase < phase.mean()] = 0
>>> signal.phase = phase
>>> signal.istft()
- resample(sample_rate: int)[source]
Resamples the audio, using sinc interpolation. This works on both cpu and gpu, and is much faster on gpu.
- Parameters
sample_rate (int) – Sample rate to resample to.
- Returns
Resampled AudioSignal
- Return type
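Examples
A minimal sketch resampling one second of noise from 44100 Hz to 16000 Hz:
>>> signal = AudioSignal(torch.randn(44100), 44100)
>>> signal.resample(16000)
>>> signal.sample_rate
16000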
- classmethod salient_excerpt(audio_path: Union[str, Path], loudness_cutoff: Optional[float] = None, num_tries: int = 8, state: Optional[Union[RandomState, int]] = None, **kwargs)[source]
Similar to AudioSignal.excerpt, except it extracts excerpts only if they are above a specified loudness threshold, which is computed via a fast LUFS routine.
- Parameters
audio_path (Union[str, Path]) – Path to audio file to grab excerpt from.
loudness_cutoff (float, optional) – Loudness threshold in dB. Typical values are -40, -60, etc., by default None
num_tries (int, optional) – Number of tries to grab an excerpt above the threshold before giving up, by default 8.
state (Union[np.random.RandomState, int], optional) – RandomState or seed of random state, by default None
kwargs (dict) – Keyword arguments to AudioSignal.excerpt
- Returns
AudioSignal containing excerpt.
- Return type
Warning
If num_tries is set to None, salient_excerpt may try forever, which can result in an infinite loop if audio_path does not have any loud enough excerpts.
Examples
>>> signal = AudioSignal.salient_excerpt(
>>>     "path/to/audio",
>>>     loudness_cutoff=-40,
>>>     duration=5
>>> )
- property samples
Returns the audio data tensor in the object.
Audio data is always of the shape (batch_size, num_channels, num_samples). If the value has fewer than 3 dims (e.g. is (num_channels, num_samples)), then it will be reshaped to (1, num_channels, num_samples) - a batch size of 1.
- Parameters
data (Union[torch.Tensor, np.ndarray]) – Audio data to set.
- Returns
Audio samples.
- Return type
torch.Tensor
- property shape
Shape of audio data.
- Returns
Shape of audio data.
- Return type
tuple
- property signal_duration
Length of audio signal in seconds.
- Returns
Length of signal in seconds.
- Return type
float
- property signal_length
Length of audio signal.
- Returns
Length of signal in samples.
- Return type
int
- stft(window_length: Optional[int] = None, hop_length: Optional[int] = None, window_type: Optional[str] = None, match_stride: Optional[bool] = None, padding_type: Optional[str] = None)[source]
Computes the short-time Fourier transform of the audio data, with specified STFT parameters.
- Parameters
window_length (int, optional) – Window length of STFT, by default 0.032 * self.sample_rate.
hop_length (int, optional) – Hop length of STFT, by default window_length // 4.
window_type (str, optional) – Type of window to use, by default sqrt_hann.
match_stride (bool, optional) – Whether to match the stride of convolutional layers, by default False
padding_type (str, optional) – Type of padding to use, by default ‘reflect’
- Returns
STFT of audio data.
- Return type
torch.Tensor
Examples
Compute the STFT of an AudioSignal:
>>> signal = AudioSignal(torch.randn(44100), 44100)
>>> signal.stft()
Vary the window and hop length:
>>> stft_params = [STFTParams(128, 32), STFTParams(512, 128)]
>>> for stft_param in stft_params:
>>>     signal.stft_params = stft_param
>>>     signal.stft()
- property stft_data
Returns the STFT data inside the signal. Shape is (batch, channels, frequencies, time).
- Returns
Complex spectrogram data.
- Return type
torch.Tensor
- property stft_params
Returns the STFTParams object, which can be re-used for other AudioSignals.
This property can be set as well. If values are not defined in STFTParams, they are inferred automatically from the signal properties. The default is to use 32ms windows, with 8ms hop length, and the square root of the hann window.
- Returns
STFT parameters for the AudioSignal.
- Return type
Examples
>>> stft_params = STFTParams(128, 32)
>>> signal1 = AudioSignal(torch.randn(44100), 44100, stft_params=stft_params)
>>> signal2 = AudioSignal(torch.randn(44100), 44100, stft_params=signal1.stft_params)
>>> signal1.stft_params = STFTParams()  # Defaults
- to(device: str)[source]
Moves all tensors contained in signal to the specified device.
- Parameters
device (str) – Device to move AudioSignal onto. Typical values are “cuda”, “cpu”, or “cuda:n” to specify the nth gpu.
- Returns
AudioSignal with all tensors moved to specified device.
- Return type
- to_mono()[source]
Converts audio data to mono audio, by taking the mean along the channels dimension.
- Returns
AudioSignal with mean of channels.
- Return type
- trim(before: int, after: int)[source]
Trims the audio_data tensor before and after.
- Parameters
before (int) – How many samples to trim from beginning.
after (int) – How many samples to trim from end.
- Returns
AudioSignal with trimming applied.
- Return type
- truncate_samples(length_in_samples: int)[source]
Truncate signal to specified length.
- Parameters
length_in_samples (int) – Truncate to this many samples.
- Returns
AudioSignal with truncation applied.
- Return type
- classmethod wave(frequency: float, duration: float, sample_rate: int, num_channels: int = 1, shape: str = 'sine', **kwargs)[source]
Generate a waveform of a given frequency and shape.
- Parameters
frequency (float) – Frequency of the waveform
duration (float) – Duration of the waveform
sample_rate (int) – Sample rate of the waveform
num_channels (int, optional) – Number of channels, by default 1
shape (str, optional) – Shape of the waveform, by default “sine”. One of “sawtooth”, “square”, “sine”, “triangle”
kwargs (dict) – Keyword arguments to AudioSignal
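Examples
A minimal sketch generating one second of a 440 Hz sine wave:
>>> signal = AudioSignal.wave(440.0, 1.0, 44100, shape="sine")
>>> print(signal.shape)
(1, 1, 44100)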
- write(audio_path: Union[str, Path])[source]
Writes audio to a file. Only writes the audio that is in the very first item of the batch. To write other items in the batch, index the signal along the batch dimension before writing. After writing, the signal’s path_to_file attribute is updated to the new path.
- Parameters
audio_path (Union[str, Path]) – Path to write audio to.
- Returns
Returns original AudioSignal, so you can use this in a fluent interface.
- Return type
Examples
Creating and writing a signal to disk:
>>> signal = AudioSignal(torch.randn(10, 1, 44100), 44100)
>>> signal.write("/tmp/out.wav")
Writing a different element of the batch:
>>> signal[5].write("/tmp/out.wav")
Using this in a fluent interface:
>>> signal.write("/tmp/original.wav").low_pass(4000).write("/tmp/lowpass.wav")
- zero_pad(before: int, after: int)[source]
Zero pads the audio_data tensor before and after.
- Parameters
before (int) – How many zeros to prepend to audio.
after (int) – How many zeros to append to audio.
- Returns
AudioSignal with padding applied.
- Return type
- zero_pad_to(length: int, mode: str = 'after')[source]
Pad with zeros to a specified length, either before or after the audio data.
- Parameters
length (int) – Length to pad to
mode (str, optional) – Whether to prepend or append zeros to signal, by default “after”
- Returns
AudioSignal with padding applied.
- Return type
- classmethod zeros(duration: float, sample_rate: int, num_channels: int = 1, batch_size: int = 1, **kwargs)[source]
Helper function to create an AudioSignal of all zeros.
- Parameters
duration (float) – Duration of AudioSignal
sample_rate (int) – Sample rate of AudioSignal
num_channels (int, optional) – Number of channels, by default 1
batch_size (int, optional) – Batch size, by default 1
- Returns
AudioSignal containing all zeros.
- Return type
Examples
Generate 5 seconds of all zeros at a sample rate of 44100.
>>> signal = AudioSignal.zeros(5.0, 44100)
- class audiotools.core.audio_signal.STFTParams(window_length, hop_length, window_type, match_stride, padding_type)
Bases:
tuple
STFTParams object is a container that holds STFT parameters: window_length, hop_length, window_type, match_stride, and padding_type. Not all parameters need to be specified. Ones that are not specified will be inferred from the AudioSignal parameters.
- Parameters
window_length (int, optional) – Window length of STFT, by default 0.032 * self.sample_rate.
hop_length (int, optional) – Hop length of STFT, by default window_length // 4.
window_type (str, optional) – Type of window to use, by default sqrt_hann.
match_stride (bool, optional) – Whether to match the stride of convolutional layers, by default False
padding_type (str, optional) – Type of padding to use, by default ‘reflect’
- hop_length
Alias for field number 1
- match_stride
Alias for field number 3
- padding_type
Alias for field number 4
- window_length
Alias for field number 0
- window_type
Alias for field number 2
Displaying and visualizing
- class audiotools.core.display.DisplayMixin[source]
Bases:
object
- save_image(image_path: str, plot_fn: Union[Callable, str] = 'specshow', **kwargs)[source]
Save AudioSignal spectrogram (or whatever plot_fn is set to) to a specified file.
- Parameters
image_path (str) – Where to save the file to.
plot_fn (Union[Callable, str], optional) – How to create the image. Set to None to avoid plotting, by default “specshow”
kwargs (dict, optional) – Keyword arguments to audiotools.core.display.DisplayMixin.specshow() or whatever plot_fn is set to.
- specshow(preemphasis: bool = False, x_axis: str = 'time', y_axis: str = 'linear', n_mels: int = 128, **kwargs)[source]
Displays a spectrogram, using librosa.display.specshow.
- Parameters
preemphasis (bool, optional) – Whether or not to apply preemphasis, which makes high frequency detail easier to see, by default False
x_axis (str, optional) – How to label the x axis, by default “time”
y_axis (str, optional) – How to label the y axis, by default “linear”
n_mels (int, optional) – If displaying a mel spectrogram with y_axis = "mel", this controls the number of mels, by default 128.
kwargs (dict, optional) – Keyword arguments to audiotools.core.util.format_figure().
- waveplot(x_axis: str = 'time', **kwargs)[source]
Displays a waveform plot, using librosa.display.waveshow.
- Parameters
x_axis (str, optional) – How to label the x axis, by default “time”
kwargs (dict, optional) – Keyword arguments to audiotools.core.util.format_figure().
- wavespec(x_axis: str = 'time', **kwargs)[source]
Displays a waveform plot stacked above a spectrogram, using librosa.display.waveshow and specshow.
- Parameters
x_axis (str, optional) – How to label the x axis, by default “time”
kwargs (dict, optional) – Keyword arguments to audiotools.core.display.DisplayMixin.specshow().
- write_audio_to_tb(tag: str, writer, step: Optional[int] = None, plot_fn: Union[Callable, str] = 'specshow', **kwargs)[source]
Writes a signal and its spectrogram to Tensorboard. Will show up under the Audio and Images tab in Tensorboard.
- Parameters
tag (str) – Tag to write signal to (e.g. clean/sample_0.wav). The image will be written to the corresponding .png file (e.g. clean/sample_0.png).
writer (SummaryWriter) – A SummaryWriter object from the PyTorch library.
step (int, optional) – The step to write the signal to, by default None
plot_fn (Union[Callable, str], optional) – How to create the image. Set to None to avoid plotting, by default “specshow”
kwargs (dict, optional) – Keyword arguments to audiotools.core.display.DisplayMixin.specshow() or whatever plot_fn is set to.
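Examples
A minimal sketch writing a signal and its spectrogram to TensorBoard (the "runs/demo" log directory is just an illustration):
>>> from torch.utils.tensorboard import SummaryWriter
>>> writer = SummaryWriter("runs/demo")
>>> signal = AudioSignal(torch.randn(44100), 44100)
>>> signal.write_audio_to_tb("clean/sample_0.wav", writer, step=0)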
- audiotools.core.display.format_figure(func)[source]
Decorator for formatting figures produced by the code below. See audiotools.core.util.format_figure() for more.
- Parameters
func (Callable) – Plotting function that is decorated by this function.
Digital signal processing
- class audiotools.core.dsp.DSPMixin[source]
Bases:
object
- collect_windows(window_duration: float, hop_duration: float, preprocess: bool = True)[source]
Reshapes the signal into windows of the specified duration, separated by the specified hop length. Windows are placed along the batch dimension. Use with audiotools.core.dsp.DSPMixin.overlap_and_add() to reconstruct the original signal.
- Parameters
window_duration (float) – Duration of every window in seconds.
hop_duration (float) – Hop between windows in seconds.
preprocess (bool, optional) – Whether to preprocess the signal, so that the first sample is in the middle of the first window, by default True
- Returns
AudioSignal unfolded with shape
(nb * nch * num_windows, 1, window_length)
- Return type
- corrupt_phase(scale: Union[Tensor, ndarray, float])[source]
Corrupts the phase randomly by some scaled value.
- Parameters
scale (Union[torch.Tensor, np.ndarray, float]) – Standard deviation of noise to add to the phase.
- Returns
Signal with stft_data manipulated. Apply .istft() to get the masked audio data.
- Return type
- high_pass(cutoffs: Union[Tensor, ndarray, float], zeros: int = 51)[source]
High-passes the signal in-place. Each item in the batch can have a different high-pass cutoff, if the input to this signal is an array or tensor. If a float, all items are given the same high-pass filter.
- Parameters
cutoffs (Union[torch.Tensor, np.ndarray, float]) – Cutoff in Hz of high-pass filter.
zeros (int, optional) – Number of taps to use in high-pass filter, by default 51
- Returns
High-passed AudioSignal.
- Return type
- low_pass(cutoffs: Union[Tensor, ndarray, float], zeros: int = 51)[source]
Low-passes the signal in-place. Each item in the batch can have a different low-pass cutoff, if the input to this signal is an array or tensor. If a float, all items are given the same low-pass filter.
- Parameters
cutoffs (Union[torch.Tensor, np.ndarray, float]) – Cutoff in Hz of low-pass filter.
zeros (int, optional) – Number of taps to use in low-pass filter, by default 51
- Returns
Low-passed AudioSignal.
- Return type
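Examples
A minimal sketch filtering a batch of two signals, with per-item low-pass cutoffs and a shared high-pass cutoff:
>>> signal = AudioSignal(torch.randn(2, 1, 44100), 44100)
>>> signal.low_pass(torch.tensor([4000.0, 8000.0]))
>>> signal.high_pass(100)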
- mask_frequencies(fmin_hz: Union[Tensor, ndarray, float], fmax_hz: Union[Tensor, ndarray, float], val: float = 0.0)[source]
Masks frequencies between fmin_hz and fmax_hz, and fills them with the value specified by val. Useful for implementing SpecAug. The min and max can be different for every item in the batch.
- Parameters
fmin_hz (Union[torch.Tensor, np.ndarray, float]) – Lower end of band to mask out.
fmax_hz (Union[torch.Tensor, np.ndarray, float]) – Upper end of band to mask out.
val (float, optional) – Value to fill in, by default 0.0
- Returns
Signal with stft_data manipulated. Apply .istft() to get the masked audio data.
- Return type
- mask_low_magnitudes(db_cutoff: Union[Tensor, ndarray, float], val: float = 0.0)[source]
Mask away magnitudes below a specified threshold, which can be different for every item in the batch.
- Parameters
db_cutoff (Union[torch.Tensor, np.ndarray, float]) – Decibel value for which things below it will be masked away.
val (float, optional) – Value to fill in for masked portions, by default 0.0
- Returns
Signal with stft_data manipulated. Apply .istft() to get the masked audio data.
- Return type
- mask_timesteps(tmin_s: Union[Tensor, ndarray, float], tmax_s: Union[Tensor, ndarray, float], val: float = 0.0)[source]
Masks timesteps between tmin_s and tmax_s, and fills them with the value specified by val. Useful for implementing SpecAug. The min and max can be different for every item in the batch.
- Parameters
tmin_s (Union[torch.Tensor, np.ndarray, float]) – Lower end of timesteps to mask out.
tmax_s (Union[torch.Tensor, np.ndarray, float]) – Upper end of timesteps to mask out.
val (float, optional) – Value to fill in, by default 0.0
- Returns
Signal with stft_data manipulated. Apply .istft() to get the masked audio data.
- Return type
- overlap_and_add(hop_duration: float)[source]
Function which takes a list of windows and overlap-adds them into a signal the same length as audio_signal.
- Parameters
hop_duration (float) – How much to shift for each window (overlap is window_duration - hop_duration) in seconds.
- Returns
overlap-and-added signal.
- Return type
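Examples
A minimal sketch splitting a signal into 1-second windows with a 0.5-second hop, then reconstructing it with overlap-add:
>>> signal = AudioSignal(torch.randn(4 * 44100), 44100)
>>> windowed = signal.clone().collect_windows(1.0, 0.5)
>>> reconstructed = windowed.overlap_and_add(0.5)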
- preemphasis(coef: float = 0.85)[source]
Applies pre-emphasis to audio signal.
- Parameters
coef (float, optional) – How much pre-emphasis to apply; lower values do less and 0 does nothing, by default 0.85
- Returns
Pre-emphasized signal.
- Return type
- shift_phase(shift: Union[Tensor, ndarray, float])[source]
Shifts the phase by a constant value.
- Parameters
shift (Union[torch.Tensor, np.ndarray, float]) – What to shift the phase by.
- Returns
Signal with stft_data manipulated. Apply .istft() to get the masked audio data.
- Return type
- windows(window_duration: float, hop_duration: float, preprocess: bool = True)[source]
Generator which yields windows of specified duration from signal with a specified hop length.
- Parameters
window_duration (float) – Duration of every window in seconds.
hop_duration (float) – Hop between windows in seconds.
preprocess (bool, optional) – Whether to preprocess the signal, so that the first sample is in the middle of the first window, by default True
- Yields
AudioSignal – Each window is returned as an AudioSignal.
Audio effects
- class audiotools.core.effects.EffectMixin[source]
Bases:
object
- CODEC_PRESETS = {'8-bit': {'bits_per_sample': 8, 'encoding': 'ULAW', 'format': 'wav'}, 'Amr-nb': {'format': 'amr-nb'}, 'GSM-FR': {'format': 'gsm'}, 'MP3': {'compression': -9, 'format': 'mp3'}, 'Ogg': {'compression': -1, 'format': 'ogg'}, 'Vorbis': {'compression': -1, 'format': 'vorbis'}}
Presets for applying codecs via torchaudio.
- GAIN_FACTOR = 0.11512925464970229
Gain factor for converting between amplitude and decibels.
- apply_codec(preset: Optional[str] = None, format: str = 'wav', encoding: Optional[str] = None, bits_per_sample: Optional[int] = None, compression: Optional[int] = None)[source]
Applies an audio codec to the signal.
- Parameters
preset (str, optional) – One of the keys in self.CODEC_PRESETS, by default None
format (str, optional) – Format for audio codec, by default “wav”
encoding (str, optional) – Encoding to use, by default None
bits_per_sample (int, optional) – How many bits per sample, by default None
compression (int, optional) – Compression amount of codec, by default None
- Returns
AudioSignal with codec applied.
- Return type
- Raises
ValueError – If preset is not in self.CODEC_PRESETS, an error is thrown.
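Examples
A minimal sketch applying the MP3 preset (whether a given codec is available depends on the installed torchaudio backend):
>>> signal = AudioSignal(torch.randn(44100), 44100)
>>> signal.apply_codec(preset="MP3")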
- apply_ir(ir, drr: Optional[Union[Tensor, ndarray, float]] = None, ir_eq: Optional[Union[Tensor, ndarray]] = None, use_original_phase: bool = False)[source]
Applies an impulse response to the signal. If ir_eq is specified, the impulse response is equalized before it is applied, using the given curve.
- Parameters
ir (AudioSignal) – Impulse response to convolve with.
drr (Union[torch.Tensor, np.ndarray, float], optional) – Direct-to-reverberant ratio that impulse response will be altered to, if specified, by default None
ir_eq (Union[torch.Tensor, np.ndarray], optional) – Equalization that will be applied to impulse response if specified, by default None
use_original_phase (bool, optional) – Whether to use the original phase, instead of the convolved phase, by default False
- Returns
Signal with impulse response applied to it
- Return type
- clip_distortion(clip_percentile: Union[Tensor, ndarray, float])[source]
Clips the signal at a given percentile. The higher it is, the lower the threshold for clipping.
- Parameters
clip_percentile (Union[torch.Tensor, np.ndarray, float]) – Values are between 0.0 to 1.0. Typical values are 0.1 or below.
- Returns
Audio signal with clipped audio data.
- Return type
- convolve(other, start_at_max: bool = True)[source]
Convolves self with other. This function uses FFTs to do the convolution.
- Parameters
other (AudioSignal) – Signal to convolve with.
start_at_max (bool, optional) – Whether to start at the max value of other signal, to avoid inducing delays, by default True
- Returns
Convolved signal, in-place.
- Return type
- ensure_max_of_audio(max: float = 1.0)[source]
Ensures that abs(audio_data) <= max.
- Parameters
max (float, optional) – Max absolute value of signal, by default 1.0
- Returns
Signal with values scaled between -max and max.
- Return type
- equalizer(db: Union[Tensor, ndarray])[source]
Applies a mel-spaced equalizer to the audio signal.
- Parameters
db (Union[torch.Tensor, np.ndarray]) – EQ curve to apply.
- Returns
AudioSignal with equalization applied.
- Return type
- mel_filterbank(n_bands: int)[source]
Breaks signal into mel bands.
- Parameters
n_bands (int) – Number of mel bands to use.
- Returns
Mel-filtered bands, with last axis being the band index.
- Return type
torch.Tensor
- mix(other, snr: Union[Tensor, ndarray, float] = 10, other_eq: Optional[Union[Tensor, ndarray]] = None)[source]
Mixes noise with signal at specified signal-to-noise ratio. Optionally, the other signal can be equalized in-place.
- Parameters
other (AudioSignal) – AudioSignal object to mix with.
snr (Union[torch.Tensor, np.ndarray, float], optional) – Signal to noise ratio, by default 10
other_eq (Union[torch.Tensor, np.ndarray], optional) – EQ curve to apply to other signal, if any, by default None
- Returns
In-place modification of AudioSignal.
- Return type
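Examples
A minimal sketch mixing noise into a signal at 10 dB SNR, then normalizing the mixture to -24 LUFS:
>>> signal = AudioSignal(torch.randn(44100), 44100)
>>> noise = AudioSignal(torch.randn(44100), 44100)
>>> signal.mix(noise, snr=10)
>>> signal.normalize(-24)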
- mulaw_quantization(quantization_channels: Union[Tensor, ndarray, int])[source]
Applies mu-law quantization to the input waveform.
- Parameters
quantization_channels (Union[torch.Tensor, np.ndarray, int]) – Number of mu-law spaced quantization channels to quantize to.
- Returns
Quantized AudioSignal.
- Return type
- normalize(db: Union[Tensor, ndarray, float] = -24.0)[source]
Normalizes the signal’s volume to the specified db, in LUFS. This is GPU-compatible, making for very fast loudness normalization.
- Parameters
db (Union[torch.Tensor, np.ndarray, float], optional) – Loudness to normalize to, by default -24.0
- Returns
Normalized audio signal.
- Return type
- pitch_shift(n_semitones: int, quick: bool = True)[source]
Pitch shift the signal. All items in the batch get the same pitch shift.
- Parameters
n_semitones (int) – How many semitones to shift the signal by.
quick (bool, optional) – Using quick pitch shifting, by default True
- Returns
Pitch shifted audio signal.
- Return type
- quantization(quantization_channels: Union[Tensor, ndarray, int])[source]
Applies quantization to the input waveform.
- Parameters
quantization_channels (Union[torch.Tensor, np.ndarray, int]) – Number of evenly spaced quantization channels to quantize to.
- Returns
Quantized AudioSignal.
- Return type
- time_stretch(factor: float, quick: bool = True)[source]
Time stretch the audio signal.
- Parameters
factor (float) – Factor by which to stretch the AudioSignal. Typically between 0.8 and 1.2.
quick (bool, optional) – Whether to use quick time stretching, by default True
- Returns
Time-stretched AudioSignal.
- Return type
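Examples
A minimal sketch of both effects (applied via SoX, so SoX support in torchaudio is assumed):
>>> signal = AudioSignal(torch.randn(44100), 44100)
>>> signal.pitch_shift(2)       # shift by two semitones
>>> signal.time_stretch(0.9)    # stretch by a factor of 0.9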
- class audiotools.core.effects.ImpulseResponseMixin[source]
Bases:
object
These functions are generally only used with AudioSignals that are derived from impulse responses, not other sources like music or speech. These methods are used to replicate the data augmentation described in [1].
[1] Bryan, Nicholas J. “Impulse response data augmentation and deep neural networks for blind room acoustic parameter estimation.” ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020.
- alter_drr(drr: Union[Tensor, ndarray, float])[source]
Alters the direct-to-reverberant ratio of the impulse response.
- Parameters
drr (Union[torch.Tensor, np.ndarray, float]) – Direct-to-reverberant ratio that the impulse response will be altered to.
- Returns
Altered impulse response.
- Return type
FFMPEG routines
- class audiotools.core.ffmpeg.FFMPEGMixin[source]
Bases:
object
- ffmpeg_loudness(quiet: bool = True)[source]
Computes loudness of audio file using FFMPEG.
- Parameters
quiet (bool, optional) – Whether to show FFMPEG output during computation, by default True
- Returns
Loudness of every item in the batch, computed via FFMPEG.
- Return type
torch.Tensor
- ffmpeg_resample(sample_rate: int, quiet: bool = True)[source]
Resamples AudioSignal using FFMPEG. More memory-efficient than using julius.resample for long audio files.
- Parameters
sample_rate (int) – Sample rate to resample to.
quiet (bool, optional) – Whether to show FFMPEG output during computation, by default True
- Returns
Resampled AudioSignal.
- Return type
- classmethod load_from_file_with_ffmpeg(audio_path: str, quiet: bool = True, **kwargs)[source]
Loads AudioSignal object after decoding it to a wav file using FFMPEG. Useful for loading audio that isn’t covered by librosa’s loading mechanism. Also useful for loading mp3 files, without any offset.
- Parameters
audio_path (str) – Path to load AudioSignal from.
quiet (bool, optional) – Whether to show FFMPEG output during computation, by default True
- Returns
AudioSignal loaded from file with FFMPEG.
- Return type
- audiotools.core.ffmpeg.r128stats(filepath: str, quiet: bool)[source]
Takes a path to an audio file, returns a dict with the loudness stats computed by the ffmpeg ebur128 filter.
- Parameters
filepath (str) – Path to compute loudness stats on.
quiet (bool) – Whether to show FFMPEG output during computation.
- Returns
Dictionary containing loudness stats.
- Return type
dict
Perceptual loudness
- class audiotools.core.loudness.LoudnessMixin[source]
Bases:
object
- MIN_LOUDNESS = -70
Minimum loudness possible.
- loudness(filter_class: str = 'K-weighting', block_size: float = 0.4, **kwargs)[source]
Calculates loudness using an implementation of ITU-R BS.1770-4. Allows control over gating block size and frequency weighting filters. Measures the integrated gated loudness of a signal.
API is derived from PyLoudnorm, but this implementation is ported to PyTorch and is tensorized across batches. When on GPU, an FIR approximation of the IIR filters is used to compute loudness, for speed.
Uses the weighting filters and block size defined by the meter; the integrated loudness is measured based upon the gating algorithm defined in the ITU-R BS.1770-4 specification.
- Parameters
filter_class (str, optional) – Class of weighting filter used; one of ‘K-weighting’, ‘Fenton/Lee 1’, ‘Fenton/Lee 2’, or ‘Dash et al.’, by default “K-weighting”
block_size (float, optional) – Gating block size in seconds, by default 0.400
kwargs (dict, optional) – Keyword arguments to audiotools.core.loudness.Meter().
- Returns
Loudness of audio data.
- Return type
torch.Tensor
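Examples
A minimal sketch measuring integrated loudness, one LUFS value per item in the batch:
>>> signal = AudioSignal(torch.randn(2, 1, 44100), 44100)
>>> loudness = signal.loudness()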
- class audiotools.core.loudness.Meter(rate: int, filter_class: str = 'K-weighting', block_size: float = 0.4, zeros: int = 512, use_fir: bool = False)[source]
Bases:
Module
Tensorized version of pyloudnorm.Meter. Works with batched audio tensors.
- Parameters
rate (int) – Sample rate of audio.
filter_class (str, optional) – Class of weighting filter used; one of ‘K-weighting’, ‘Fenton/Lee 1’, ‘Fenton/Lee 2’, or ‘Dash et al.’, by default “K-weighting”
block_size (float, optional) – Gating block size in seconds, by default 0.400
zeros (int, optional) – Number of zeros to use in FIR approximation of IIR filters, by default 512
use_fir (bool, optional) – Whether to use the FIR approximation or the exact IIR formulation. If computing on GPU, use_fir=True will be used, as it is much faster, by default False
- apply_filter(data: Tensor)[source]
Applies the filter on either CPU or GPU, depending on whether the audio is on GPU or CPU, or if self.use_fir is True.
- Parameters
data (torch.Tensor) – Audio data of shape (nb, nch, nt).
- Returns
Filtered audio data.
- Return type
torch.Tensor
- apply_filter_cpu(data: Tensor)[source]
Performs IIR formulation of loudness computation.
- Parameters
data (torch.Tensor) – Audio data of shape (nb, nch, nt).
- Returns
Filtered audio data.
- Return type
torch.Tensor
- apply_filter_gpu(data: Tensor)[source]
Performs FIR approximation of loudness computation.
- Parameters
data (torch.Tensor) – Audio data of shape (nb, nch, nt).
- Returns
Filtered audio data.
- Return type
torch.Tensor
- property filter_class
- forward(data: Tensor)[source]
Computes integrated loudness of data.
- Parameters
data (torch.Tensor) – Audio data of shape (nb, nch, nt).
- Returns
Integrated loudness of the input data, in LUFS.
- Return type
torch.Tensor
- integrated_loudness(data: Tensor)[source]
Computes integrated loudness of data.
- Parameters
data (torch.Tensor) – Audio data of shape (nb, nch, nt).
- Returns
Integrated loudness of the input data, in LUFS.
- Return type
torch.Tensor
- training: bool
Listening to AudioSignals
These are utilities that allow one to embed an AudioSignal as a playable object in a Jupyter notebook, or to play audio from the terminal, etc.
- class audiotools.core.playback.PlayMixin[source]
Bases:
object
- embed(ext: Optional[str] = None, display: bool = True, return_html: bool = False)[source]
Embeds audio as a playable audio embed in a notebook, or HTML document, etc.
- Parameters
ext (str, optional) – Extension to use when saving the audio, by default “.wav”
display (bool, optional) – This controls whether or not to display the audio when called. This is used when the embed is the last line in a Jupyter cell, to prevent the audio from being embedded twice, by default True
return_html (bool, optional) – Whether to return the data wrapped in an HTML audio element, by default False
- Returns
Either the element for display, or the HTML string of it.
- Return type
str
- play()[source]
Plays an audio signal if ffplay from the ffmpeg suite of tools is installed. Otherwise, will fail. The audio signal is written to a temporary file and then played with ffplay.
- widget(title: Optional[str] = None, ext: str = '.wav', add_headers: bool = True, player_width: str = '100%', margin: str = '10px', plot_fn: str = 'specshow', return_html: bool = False, **kwargs)[source]
Creates a playable widget with spectrogram. Inspired (heavily) by https://sjvasquez.github.io/blog/melnet/.
- Parameters
title (str, optional) – Title of plot, placed in upper right of top-most axis.
ext (str, optional) – Extension for embedding, by default “.wav”
add_headers (bool, optional) – Whether or not to add headers (use for first embed, False for later embeds), by default True
player_width (str, optional) – Width of the player, as a string in a CSS rule, by default “100%”
margin (str, optional) – Margin on all sides of player, by default “10px”
plot_fn (function, optional) – Plotting function to use (by default self.specshow).
return_html (bool, optional) – Whether to return the data wrapped in an HTML audio element, by default False
kwargs (dict, optional) – Keyword arguments to plot_fn (by default self.specshow).
- Returns
HTML object.
- Return type
HTML
Utilities
- class audiotools.core.util.Info(sample_rate: float, num_frames: int)[source]
Bases:
object
Shim for torchaudio.info API changes.
- property duration: float
- num_frames: int
- sample_rate: float
- audiotools.core.util.chdir(newdir: Union[Path, str])[source]
Context manager for switching directories to run a function. Useful for when you want to use relative paths to different runs.
- Parameters
newdir (Union[Path, str]) – Directory to switch to.
- audiotools.core.util.choose_from_list_of_lists(state: RandomState, list_of_lists: list, p: Optional[float] = None)[source]
Choose a single item from a list of lists.
- Parameters
state (np.random.RandomState) – Random state to use when choosing an item.
list_of_lists (list) – A list of lists from which items will be drawn.
p (float, optional) – Probabilities of each list, by default None
- Returns
An item from the list of lists.
- Return type
Any
- audiotools.core.util.collate(list_of_dicts: list, n_splits: Optional[int] = None)[source]
Collates a list of dictionaries (e.g. as returned by a dataloader) into a dictionary with batched values. This routine uses the default torch collate function for everything except AudioSignal objects, which are handled by the
audiotools.core.audio_signal.AudioSignal.batch() function.
This function takes n_splits to enable splitting a batch into multiple sub-batches for the purposes of gradient accumulation, etc.
- Parameters
list_of_dicts (list) – List of dictionaries to be collated.
n_splits (int) – Number of splits to make when creating the batches (split into sub-batches). Useful for things like gradient accumulation.
- Returns
Dictionary containing batched data.
- Return type
dict
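Examples
A minimal sketch collating dataset items that mix AudioSignals with other values:
>>> from audiotools.core.util import collate
>>> items = [{"signal": AudioSignal(torch.randn(44100), 44100), "label": i} for i in range(4)]
>>> batch = collate(items)
>>> batch["signal"].batch_size
4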
- audiotools.core.util.ensure_tensor(x: Union[ndarray, Tensor, float, int], ndim: Optional[int] = None, batch_size: Optional[int] = None)[source]
Ensures that the input x is a tensor of specified dimensions and batch size.
- Parameters
x (Union[np.ndarray, torch.Tensor, float, int]) – Data that will become a tensor on its way out.
ndim (int, optional) – How many dimensions should be in the output, by default None
batch_size (int, optional) – The batch size of the output, by default None
- Returns
Modified version of x as a tensor.
- Return type
torch.Tensor
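Examples
A minimal sketch broadcasting a scalar to a 2-dimensional tensor with a batch size of 4:
>>> from audiotools.core.util import ensure_tensor
>>> x = ensure_tensor(0.5, ndim=2, batch_size=4)
>>> x.shape
torch.Size([4, 1])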
- audiotools.core.util.find_audio(folder: str, ext: List[str] = ['.wav', '.flac', '.mp3', '.mp4'])[source]
Finds all audio files in a directory recursively. Returns a list.
- Parameters
folder (str) – Folder to look for audio files in, recursively.
ext (List[str], optional) – Extensions to look for (including the dot), by default ['.wav', '.flac', '.mp3', '.mp4'].
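Examples
A minimal sketch collecting every audio file under a (hypothetical) "data/" folder:
>>> from audiotools.core.util import find_audio
>>> audio_files = find_audio("data/")
>>> len(audio_files)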
- audiotools.core.util.format_figure(fig_size: Optional[tuple] = None, title: Optional[str] = None, fig=None, format_axes: bool = True, format: bool = True, font_color: str = 'white')[source]
Prettifies the spectrogram and waveform plots. A title can be inset into the top right corner, and the axes can be inset into the figure, allowing the data to take up the entire image. Used by the plotting functions in audiotools.core.display (e.g. specshow() and waveplot()).
- Parameters
fig_size (tuple, optional) – Size of figure, by default (9, 3)
title (str, optional) – Title to inset in top right, by default None
fig (matplotlib.figure.Figure, optional) – Figure object; if None, plt.gcf() will be used, by default None
format_axes (bool, optional) – Format the axes to be inside the figure, by default True
format (bool, optional) – This formatting can be skipped entirely by passing format=False to any of the plotting functions that use this formatter, by default True
font_color (str, optional) – Color of font of axes, by default “white”
- audiotools.core.util.generate_chord_dataset(max_voices: int = 8, sample_rate: int = 44100, num_items: int = 5, duration: float = 1.0, min_note: str = 'C2', max_note: str = 'C6', output_dir: Path = 'chords')[source]
Generates a toy multitrack dataset of chords, synthesized from sine waves.
- Parameters
max_voices (int, optional) – Maximum number of voices in a chord, by default 8
sample_rate (int, optional) – Sample rate of audio, by default 44100
num_items (int, optional) – Number of items to generate, by default 5
duration (float, optional) – Duration of each item, by default 1.0
min_note (str, optional) – Minimum note in the dataset, by default “C2”
max_note (str, optional) – Maximum note in the dataset, by default “C6”
output_dir (Path, optional) – Directory to save the dataset, by default “chords”
- audiotools.core.util.hz_to_bin(hz: Tensor, n_fft: int, sample_rate: int)[source]
Closest frequency bin given a frequency, number of bins, and a sampling rate.
- Parameters
hz (torch.Tensor) – Tensor of frequencies in Hz.
n_fft (int) – Number of FFT bins.
sample_rate (int) – Sample rate of audio.
- Returns
Closest bins to the data.
- Return type
torch.Tensor
- audiotools.core.util.info(audio_path: str)[source]
Shim for torchaudio.info to make 0.7.2 API match 0.8.0.
- Parameters
audio_path (str) – Path to audio file.
- audiotools.core.util.prepare_batch(batch: Union[dict, list, Tensor], device: str = 'cpu')[source]
Moves items in a batch (typically generated by a DataLoader as a list or a dict) to the specified device. This works even if dictionaries are nested.
- Parameters
batch (Union[dict, list, torch.Tensor]) – Batch, typically generated by a dataloader, that will be moved to the device.
device (str, optional) – Device to move batch to, by default “cpu”
- Returns
Batch with all values moved to the specified device.
- Return type
Union[dict, list, torch.Tensor]
- audiotools.core.util.random_state(seed: Union[int, RandomState])[source]
Turn seed into a np.random.RandomState instance.
- Parameters
seed (Union[int, np.random.RandomState] or None) – If seed is None, return the RandomState singleton used by np.random. If seed is an int, return a new RandomState instance seeded with seed. If seed is already a RandomState instance, return it. Otherwise raise ValueError.
- Returns
Random state object.
- Return type
np.random.RandomState
- Raises
ValueError – If seed is not valid, an error is thrown.
- audiotools.core.util.read_sources(sources: List[str], remove_empty: bool = True, relative_path: str = '', ext: List[str] = ['.wav', '.flac', '.mp3', '.mp4'])[source]
Reads audio sources that can either be folders full of audio files, or CSV files that contain paths to audio files. CSV files that adhere to the expected format can be generated by
audiotools.data.preprocess.create_csv().
- Parameters
sources (List[str]) – List of audio sources to be converted into a list of lists of audio files.
remove_empty (bool, optional) – Whether or not to remove rows with an empty “path” from each CSV file, by default True.
- Returns
List of lists of rows of CSV files.
- Return type
list
- audiotools.core.util.sample_from_dist(dist_tuple: tuple, state: Optional[RandomState] = None)[source]
Samples from a distribution defined by a tuple. The first item in the tuple is the distribution type, and the rest of the items are arguments to that distribution. The distribution function is taken from the np.random.RandomState object.
- Parameters
dist_tuple (tuple) – Distribution tuple
state (np.random.RandomState, optional) – Random state, or seed to use, by default None
- Returns
Draw from the distribution.
- Return type
Union[float, int, str]
Examples
Sample from a uniform distribution:
>>> dist_tuple = ("uniform", 0, 1) >>> sample_from_dist(dist_tuple)
Sample from a constant distribution:
>>> dist_tuple = ("const", 0) >>> sample_from_dist(dist_tuple)
Sample from a normal distribution:
>>> dist_tuple = ("normal", 0, 0.5) >>> sample_from_dist(dist_tuple)
- audiotools.core.util.seed(random_seed, set_cudnn=False)[source]
Seeds all random states with the same random seed for reproducibility. Seeds numpy, random, and torch random generators. For full reproducibility, two further options must be set according to the torch documentation: https://pytorch.org/docs/stable/notes/randomness.html. To do this, set_cudnn must be True. It defaults to False, since setting it to True results in a performance hit.
- Parameters
random_seed (int) – Integer corresponding to the random seed to use.
set_cudnn (bool) – Whether or not to set cudnn into deterministic mode and turn off benchmark mode. Defaults to False.
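Examples
A minimal sketch seeding all generators before an experiment; set_cudnn=True trades speed for full determinism:
>>> from audiotools.core.util import seed
>>> seed(0)
>>> seed(0, set_cudnn=True)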