Layers

Base layers

class audiotools.ml.layers.base.BaseModel

Bases: Module

This class adds save/load functionality to a torch.nn.Module object. BaseModel objects can be saved as a torch.package, which makes them easy to move between machines without installing many dependencies. Models can also be saved as weights only, in the standard way.

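>>> import tempfile
>>>
>>> import torch
>>> from torch import nn
>>> from audiotools import ml
>>>
>>> class Model(ml.BaseModel):
>>>     def __init__(self, arg1: float = 1.0):
>>>         super().__init__()
>>>         self.arg1 = arg1
>>>         self.linear = nn.Linear(1, 1)
>>>
>>>     def forward(self, x):
>>>         return self.linear(x)
>>>
>>> def seed_and_run(model, *args, **kwargs):
>>>     # Illustrative helper, not part of audiotools: fix the seed, then run.
>>>     torch.manual_seed(0)
>>>     return model(*args, **kwargs)
>>>
>>> model1 = Model()
>>> x = torch.randn(10, 1)
>>> out1 = seed_and_run(model1, x)
>>>
>>> with tempfile.NamedTemporaryFile(suffix=".pth") as f:
>>>     # Save (as a package by default), reload, and compare outputs.
>>>     model1.save(
>>>         f.name,
>>>     )
>>>     model2 = Model.load(f.name)
>>>     out2 = seed_and_run(model2, x)
>>>     assert torch.allclose(out1, out2)
>>>
>>>     # A model loaded from a package can be re-saved as plain weights.
>>>     model1.save(f.name, package=True)
>>>     model2 = Model.load(f.name)
>>>     model2.save(f.name, package=False)
>>>     model3 = Model.load(f.name)
>>>     out3 = seed_and_run(model3, x)
>>>     assert torch.allclose(out1, out3)
>>>
>>> with tempfile.TemporaryDirectory() as d:
>>>     model1.save_to_folder(d, {"data": 1.0})
>>>     Model.load_from_folder(d)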

EXTERN = ['audiotools.**', 'tqdm', '__main__', 'numpy.**', 'julius.**', 'torchaudio.**', 'scipy.**', 'einops']: Names of libraries that are external to the torch.package saving mechanism. Source code from these libraries will not be packaged into the model. This can be edited by the user of this class by editing model.EXTERN.

INTERN = []: Names of libraries that are internal to the torch.package saving mechanism. Source code from these libraries will be saved alongside the model.

property device: Gets the device the model is on by looking at the device of the first parameter. May not be valid if model is split across multiple devices.

classmethod load(location: str, *args, package_name: Optional[str] = None, strict: bool = False, **kwargs)

Load model from a path. Tries first to load as a package, and if that fails, tries to load as weights. The arguments to the class are specified inside the model weights file.

Parameters

location (str) – Path to file.
package_name (str, optional) – Name of package, by default cls.__name__.
strict (bool, optional) – Whether the keys in the saved weights must match the model's keys exactly; if False, unmatched keys are ignored. By default False
kwargs (dict) – Additional keyword arguments to the model instantiation, if not loading from package.

Returns

A model that inherits from BaseModel.

Return type

BaseModel
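The two ideas in load(), falling back from one format to another and rebuilding the model from arguments stored in the file, can be illustrated without torch. The following is a minimal sketch under stated assumptions, not the audiotools implementation: ToyModel, load_with_fallback, the payload layout, and the rule that caller-supplied keyword arguments override the stored ones are all hypothetical, and pickle stands in for torch.save.

```python
import pickle
import tempfile
from pathlib import Path


class ToyModel:
    """Hypothetical stand-in for a model; not the audiotools class."""

    def __init__(self, arg1: float = 1.0):
        self.arg1 = arg1
        self.state = {"weight": 0.0}

    def save(self, path):
        # Store the constructor kwargs next to the weights so that
        # load() can rebuild the object without being told them.
        payload = {
            "state_dict": dict(self.state),
            "metadata": {"kwargs": {"arg1": self.arg1}},
        }
        Path(path).write_bytes(pickle.dumps(payload))
        return path

    @classmethod
    def load(cls, path, strict=False, **kwargs):
        payload = pickle.loads(Path(path).read_bytes())
        # Assumption: caller-supplied kwargs override the stored ones.
        init_kwargs = {**payload["metadata"]["kwargs"], **kwargs}
        model = cls(**init_kwargs)
        saved = payload["state_dict"]
        unmatched = set(saved) ^ set(model.state)
        if strict and unmatched:
            raise KeyError(f"unmatched keys: {sorted(unmatched)}")
        # Non-strict: copy only the keys both sides know about.
        for key in set(saved) & set(model.state):
            model.state[key] = saved[key]
        return model


def load_with_fallback(path, loaders):
    """Try each loader in order and return the first result that succeeds."""
    errors = []
    for loader in loaders:
        try:
            return loader(path)
        except Exception as err:  # a real implementation would be narrower
            errors.append(err)
    raise RuntimeError(f"could not load {path}: {errors}")
```

Passing a package loader first and ToyModel.load second to load_with_fallback mirrors the order described above: the weights path is only tried once the package path has raised.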

classmethod load_from_folder(folder: Union[str, Path], package: bool = True, strict: bool = False, **kwargs)

Loads the model from a folder generated by audiotools.ml.layers.base.BaseModel.save_to_folder(). Like that function, this one looks for a subfolder that has the name of the class (e.g. folder/generator/[package, weights].pth if the model name was Generator).

Parameters

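folder (Union[str, Path]) – Folder to load the model from, as written by save_to_folder().
package (bool, optional) – Whether to load the model with torch.package from package.pth rather than from weights.pth, by default True
strict (bool, optional) – Whether the keys in the saved weights must match the model's keys exactly; if False, unmatched keys are ignored. By default False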

Returns

tuple of model and extra data as saved by audiotools.ml.layers.base.BaseModel.save_to_folder().

Return type

tuple

save(path: str, metadata: Optional[dict] = None, package: bool = True, intern: list = [], extern: list = [], mock: list = [])

Saves the model, either as a torch package, or just as weights, alongside some specified metadata.

Parameters

path (str) – Path to save model to.
metadata (dict, optional) – Any metadata to save alongside the model, by default None
package (bool, optional) – Whether to use torch.package to save the model in a format that is portable, by default True
intern (list, optional) – List of additional libraries that are internal to the model, used with torch.package, by default []
extern (list, optional) – List of additional libraries that are external to the model, used with torch.package, by default []
mock (list, optional) – List of libraries to mock, used with torch.package, by default []

Returns

Path to saved model.

Return type

str

save_to_folder(folder: Union[str, Path], extra_data: Optional[dict] = None)

Dumps a model into a folder, as both a package and as weights, as well as anything specified in extra_data. extra_data is a dictionary of other pickleable files, with the keys being the paths to save them in. The model is saved under a subfolder specified by the name of the class (e.g. folder/generator/[package, weights].pth if the model name was Generator).

>>> with tempfile.TemporaryDirectory() as d:
>>>     extra_data = {
>>>         "optimizer.pth": optimizer.state_dict()
>>>     }
>>>     model.save_to_folder(d, extra_data)
>>>     Model.load_from_folder(d)

Parameters

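folder (Union[str, Path]) – Folder to save the model in. The model files go in a subfolder named after the class.
extra_data (dict, optional) – Additional pickleable objects to save, keyed by the path to save each one under, by default None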

Returns

Path to folder

Return type

str
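The folder layout described for save_to_folder() can be sketched with the standard library alone. This is an illustration under assumptions rather than the library's code: model_dir, dump_extras, and load_extras are hypothetical helpers, the subfolder is assumed to be the lowercased class name (as the Generator example suggests), the extra files are assumed to be written inside that same subfolder, and pickle stands in for torch.save.

```python
import pickle
import tempfile
from pathlib import Path


def model_dir(folder, class_name):
    # Assumption: the subfolder is the lowercased class name,
    # as in the "Generator" -> "generator" example.
    return Path(folder) / class_name.lower()


def dump_extras(folder, class_name, extra_data=None):
    """Write each extra_data value to <folder>/<class name>/<key>."""
    target = model_dir(folder, class_name)
    target.mkdir(parents=True, exist_ok=True)
    for rel_path, obj in (extra_data or {}).items():
        (target / rel_path).write_bytes(pickle.dumps(obj))
    return target


def load_extras(folder, class_name):
    """Read back every file in the subfolder except the model files."""
    target = model_dir(folder, class_name)
    skip = {"package.pth", "weights.pth"}
    return {
        p.name: pickle.loads(p.read_bytes())
        for p in sorted(target.iterdir())
        if p.is_file() and p.name not in skip
    }
```

Keying extra_data by file name keeps the round trip symmetric: whatever name an object was saved under is the key it comes back with.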

training: bool

Spectral gate

class audiotools.ml.layers.spectral_gate.SpectralGate(n_freq: int = 3, n_time: int = 5)

Bases: Module

Spectral gating algorithm for noise reduction, as in Audacity/Ocenaudio. The steps are as follows:

An FFT is calculated over the noise audio clip
Statistics are calculated over the FFT of the noise (in frequency)
A threshold is calculated based upon the statistics of the noise (and the desired sensitivity of the algorithm)
An FFT is calculated over the signal
A mask is determined by comparing the signal FFT to the threshold
The mask is smoothed with a filter over frequency and time
The mask is applied to the FFT of the signal, and the result is inverted back to the time domain

Implementation inspired by Tim Sainburg’s noisereduce:

https://timsainburg.com/noise-reduction-python.html
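The thresholding, masking, and smoothing steps listed above can be sketched in pure Python on small magnitude grids. This is a minimal sketch under stated assumptions, not the SpectralGate implementation: the STFT and inverse STFT are omitted, inputs are nested lists indexed as mag[frequency][time], and the decibel conversion, the box-shaped smoothing filter, and all function names are choices made for this example.

```python
import math
from statistics import mean, pstdev


def to_db(mag, floor=1e-4):
    """Convert a magnitude to decibels, clamping to avoid log(0)."""
    return 20.0 * math.log10(max(mag, floor))


def noise_thresholds(noise_mag, n_std=3.0):
    """Per-frequency threshold: mean + n_std * std of the noise, in dB."""
    thresholds = []
    for row in noise_mag:
        db = [to_db(m) for m in row]
        thresholds.append(mean(db) + n_std * pstdev(db))
    return thresholds


def gate_mask(signal_mag, thresholds):
    """1.0 where a bin falls below the noise threshold (treated as noise)."""
    return [
        [1.0 if to_db(m) < thr else 0.0 for m in row]
        for row, thr in zip(signal_mag, thresholds)
    ]


def smooth(mask, n_freq=3, n_time=5):
    """Box-filter the mask over an n_freq x n_time neighbourhood.

    Near the edges the average is taken over the bins that exist.
    """
    n_f, n_t = len(mask), len(mask[0])
    hf, ht = n_freq // 2, n_time // 2
    out = [[0.0] * n_t for _ in range(n_f)]
    for f in range(n_f):
        for t in range(n_t):
            window = [
                mask[i][j]
                for i in range(max(0, f - hf), min(n_f, f + hf + 1))
                for j in range(max(0, t - ht), min(n_t, t + ht + 1))
            ]
            out[f][t] = sum(window) / len(window)
    return out


def gains(signal_mag, noise_mag, denoise_amount=1.0, n_std=3.0):
    """Per-bin gain in [0, 1] to multiply the signal STFT by."""
    mask = smooth(gate_mask(signal_mag, noise_thresholds(noise_mag, n_std)))
    return [[1.0 - denoise_amount * m for m in row] for row in mask]
```

Multiplying the complex STFT of the signal by these gains and inverting the result would complete the pipeline. A denoise_amount below 1.0 attenuates the gated bins instead of zeroing them.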

Parameters

n_freq (int, optional) – Number of frequency bins to smooth by, by default 3
n_time (int, optional) – Number of time bins to smooth by, by default 5

forward(audio_signal: AudioSignal, nz_signal: AudioSignal, denoise_amount: float = 1.0, n_std: float = 3.0, win_length: int = 2048, hop_length: int = 512)

Perform noise reduction.

Parameters

audio_signal (AudioSignal) – Audio signal that noise will be removed from.
nz_signal (AudioSignal) – Noise signal to compute noise statistics from.
denoise_amount (float, optional) – Amount to denoise by, by default 1.0
n_std (float, optional) – Number of standard deviations above the noise mean at which the threshold is set, by default 3.0
win_length (int, optional) – Length of window for STFT, by default 2048
hop_length (int, optional) – Hop length for STFT, by default 512

Returns

Denoised audio signal.

Return type

AudioSignal

training: bool