Layers

Base layers

class audiotools.ml.layers.base.BaseModel

Bases: Module

Adds save/load functionality to a torch.nn.Module. BaseModel objects can be saved with torch.package, which makes them easy to move between machines without installing many dependencies. Models can also be saved as weights only, in the standard way.

>>> class Model(ml.BaseModel):
>>>     def __init__(self, arg1: float = 1.0):
>>>         super().__init__()
>>>         self.arg1 = arg1
>>>         self.linear = nn.Linear(1, 1)
>>>
>>>     def forward(self, x):
>>>         return self.linear(x)
>>>
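>>> model1 = Model()
>>> x = torch.randn(1, 1)  # example input; the shape is an assumption
>>> out1 = seed_and_run(model1, x)  # seed_and_run is a helper that is not defined in this snippet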
>>>
>>> with tempfile.NamedTemporaryFile(suffix=".pth") as f:
>>>     model1.save(
>>>         f.name,
>>>     )
>>>     model2 = Model.load(f.name)
>>>     out2 = seed_and_run(model2, x)
>>>     assert torch.allclose(out1, out2)
>>>
>>>     model1.save(f.name, package=True)
>>>     model2 = Model.load(f.name)
>>>     model2.save(f.name, package=False)
>>>     model3 = Model.load(f.name)
>>>     out3 = seed_and_run(model3, x)
>>>
>>> with tempfile.TemporaryDirectory() as d:
>>>     model1.save_to_folder(d, {"data": 1.0})
>>>     Model.load_from_folder(d)

EXTERN = ['audiotools.**', 'tqdm', '__main__', 'numpy.**', 'julius.**', 'torchaudio.**', 'scipy.**', 'einops']

Names of libraries that are external to the torch.package saving mechanism. Source code from these libraries is not packaged into the model. Users of this class can change the list by editing model.EXTERN.

INTERN = []

Names of libraries that are internal to the torch.package saving mechanism. Source code from these libraries will be saved alongside the model.
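
A minimal sketch of one way to edit such a list, using a stand-in class rather than BaseModel itself. The names Base and Model and the my_library.** pattern are hypothetical. Since EXTERN is documented as a class-level list, appending to it in place would change the list for every object that shares it; rebinding a new list on the instance avoids that. This assumes the value is later read through the instance.

```python
class Base:
    # Stand-in for a class-level EXTERN list (hypothetical, not audiotools code).
    EXTERN = ["numpy.**", "tqdm"]


class Model(Base):
    pass


model = Model()

# Rebinding builds a new list on this instance and leaves the class list alone.
model.EXTERN = model.EXTERN + ["my_library.**"]

# A second instance still sees the unmodified class-level list.
other = Model()
other_extern = list(other.EXTERN)
```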

property device

Gets the device the model is on by looking at the device of the first parameter. May not be valid if the model is split across multiple devices.

classmethod load(location: str, *args, package_name: Optional[str] = None, strict: bool = False, **kwargs)

Load model from a path. Tries first to load as a package, and if that fails, tries to load as weights. The arguments to the class are specified inside the model weights file.

Parameters
  • location (str) – Path to file.

  • package_name (str, optional) – Name of package, by default cls.__name__.

  • strict (bool, optional) – Whether to require the saved keys to match the model's keys exactly; if False, unmatched keys are ignored. By default False

  • kwargs (dict) – Additional keyword arguments to the model instantiation, if not loading from package.

Returns

A model that inherits from BaseModel.

Return type

BaseModel
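
The package-then-weights fallback described for load can be sketched in plain Python. This shows only the control flow and is not the library's implementation: load_with_fallback, load_package, and load_weights are hypothetical names, and the two loaders are stand-ins that do not read any file.

```python
from typing import Any, Callable, List, Sequence


def load_with_fallback(location: str, loaders: Sequence[Callable[[str], Any]]) -> Any:
    """Try each loader in order and return the first result that succeeds."""
    errors: List[Exception] = []
    for loader in loaders:
        try:
            return loader(location)
        except Exception as exc:  # remember the failure and try the next loader
            errors.append(exc)
    raise RuntimeError(f"Could not load {location!r}: {errors}")


def load_package(path: str) -> dict:
    # Hypothetical stand-in: pretend the file is not a package.
    raise ValueError(f"{path} is not a package")


def load_weights(path: str) -> dict:
    # Hypothetical stand-in: pretend the weights loaded fine.
    return {"source": "weights", "path": path}


result = load_with_fallback("model.pth", [load_package, load_weights])
```

Keeping the loaders in an ordered list makes the preference explicit and keeps every failure available for the final error message.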

classmethod load_from_folder(folder: Union[str, Path], package: bool = True, strict: bool = False, **kwargs)

Loads the model from a folder generated by audiotools.ml.layers.base.BaseModel.save_to_folder(). Like that function, this one looks for a subfolder that has the name of the class (e.g. folder/generator/[package, weights].pth if the model name was Generator).

Parameters
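  • folder (Union[str, Path]) – Folder that the model was saved to by save_to_folder().

  • package (bool, optional) – Whether to use torch.package to load the model, loading it from package.pth, by default True

  • strict (bool, optional) – Whether to require the saved keys to match the model's keys exactly; if False, unmatched keys are ignored. By default False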

Returns

Tuple of the model and the extra data saved by audiotools.ml.layers.base.BaseModel.save_to_folder().

Return type

tuple

save(path: str, metadata: Optional[dict] = None, package: bool = True, intern: list = [], extern: list = [], mock: list = [])

Saves the model, either as a torch package, or just as weights, alongside some specified metadata.

Parameters
  • path (str) – Path to save model to.

  • metadata (dict, optional) – Any metadata to save alongside the model, by default None

  • package (bool, optional) – Whether to use torch.package to save the model in a format that is portable, by default True

  • intern (list, optional) – List of additional libraries that are internal to the model, used with torch.package, by default []

  • extern (list, optional) – List of additional libraries that are external to the model, used with torch.package, by default []

  • mock (list, optional) – List of libraries to mock, used with torch.package, by default []

Returns

Path to saved model.

Return type

str

save_to_folder(folder: Union[str, Path], extra_data: Optional[dict] = None)

Dumps a model into a folder, as both a package and as weights, as well as anything specified in extra_data. extra_data is a dictionary of other pickleable files, with the keys being the paths to save them in. The model is saved under a subfolder specified by the name of the class (e.g. folder/generator/[package, weights].pth if the model name was Generator).

>>> with tempfile.TemporaryDirectory() as d:
>>>     extra_data = {
>>>         "optimizer.pth": optimizer.state_dict()
>>>     }
>>>     model.save_to_folder(d, extra_data)
>>>     Model.load_from_folder(d)

Parameters
  • folder (Union[str, Path]) – Folder to save the model into.

  • extra_data (dict, optional) – Dictionary of other pickleable objects to save, keyed by the path to save each one to, by default None

Returns

Path to folder

Return type

str
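
The folder layout described for save_to_folder can be sketched with pathlib. The helper name expected_model_files is hypothetical, and lowercasing the class name is inferred from the Generator example in the description; the sketch only builds paths and does not call audiotools.

```python
from pathlib import Path
from typing import Dict, Union


def expected_model_files(folder: Union[str, Path], class_name: str) -> Dict[str, Path]:
    """Return the package and weights paths implied by the documented layout."""
    # Assumption: the subfolder is the lowercased class name (Generator -> generator).
    model_dir = Path(folder) / class_name.lower()
    return {
        "package": model_dir / "package.pth",
        "weights": model_dir / "weights.pth",
    }


paths = expected_model_files("runs/exp1", "Generator")
```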

training: bool

Spectral gate

class audiotools.ml.layers.spectral_gate.SpectralGate(n_freq: int = 3, n_time: int = 5)

Bases: Module

Spectral gating algorithm for noise reduction, as in Audacity/Ocenaudio. The steps are as follows:

  1. An FFT is calculated over the noise audio clip

  2. Statistics are calculated over the FFT of the noise (in frequency)

  3. A threshold is calculated based upon the statistics of the noise (and the desired sensitivity of the algorithm)

  4. An FFT is calculated over the signal

  5. A mask is determined by comparing the signal FFT to the threshold

  6. The mask is smoothed with a filter over frequency and time

  7. The mask is applied to the FFT of the signal, and the masked FFT is inverted

Implementation inspired by Tim Sainburg’s noisereduce:

https://timsainburg.com/noise-reduction-python.html
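
The seven steps above can be sketched with NumPy alone. This is an illustration under stated assumptions, not the SpectralGate implementation: it works on 1-D arrays instead of AudioSignal objects, computes statistics in decibels, smooths with a plain moving average, uses a periodic Hann window, requires win_length to be a multiple of hop_length, and uses smaller window defaults than the class. All function names are hypothetical.

```python
import numpy as np


def stft(x, window, hop):
    """Frame x, apply the window, and take one FFT per frame -> (frames, bins)."""
    n = len(window)
    n_frames = 1 + (len(x) - n) // hop
    idx = np.arange(n)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.fft.rfft(x[idx] * window, axis=1)


def to_db(spec):
    """Magnitude in decibels, with a small floor to avoid log(0)."""
    return 20.0 * np.log10(np.abs(spec) + 1e-8)


def smooth(mask, n_time, n_freq):
    """Moving average over time (axis 0) and frequency (axis 1), edge-padded."""
    for axis, n in ((0, n_time), (1, n_freq)):
        pad = [(0, 0), (0, 0)]
        pad[axis] = (n // 2, n - 1 - n // 2)
        padded = np.pad(mask, pad, mode="edge")
        mask = np.apply_along_axis(np.convolve, axis, padded, np.ones(n) / n, mode="valid")
    return mask


def spectral_gate(signal, noise, denoise_amount=1.0, n_std=3.0,
                  n_freq=3, n_time=5, win_length=512, hop_length=128):
    # Assumption: win_length is a multiple of hop_length.
    n, hop = win_length, hop_length
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)  # periodic Hann

    # Steps 1-3: noise FFT, per-frequency statistics in dB, threshold.
    noise_db = to_db(stft(noise, window, hop))
    threshold = noise_db.mean(axis=0) + n_std * noise_db.std(axis=0)

    # Step 4: FFT of the signal, zero-padded so every sample is fully overlapped.
    padded = np.pad(signal, (n, n + (-len(signal)) % hop))
    spec = stft(padded, window, hop)

    # Steps 5-6: mask is 1 where the signal is below the threshold, then smoothed.
    mask = smooth((to_db(spec) < threshold).astype(float), n_time, n_freq)

    # Step 7: attenuate masked bins, then invert by windowed overlap-add.
    frames = np.fft.irfft(spec * (1.0 - denoise_amount * mask), n=n, axis=1) * window
    out = np.zeros(len(padded))
    norm = np.zeros(len(padded))
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n] += frame
        norm[i * hop : i * hop + n] += window**2
    out /= np.maximum(norm, 1e-8)
    return out[n : n + len(signal)]


rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440.0 * t)
noise = 0.1 * rng.standard_normal(sr)          # noise-only clip for the statistics
noisy = clean + 0.1 * rng.standard_normal(sr)  # tone plus fresh noise
denoised = spectral_gate(noisy, noise)
```

With denoise_amount=0.0 the sketch reduces to an STFT followed by its inverse, which is a convenient check that the overlap-add is set up correctly.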

Parameters
  • n_freq (int, optional) – Number of frequency bins to smooth by, by default 3

  • n_time (int, optional) – Number of time bins to smooth by, by default 5

forward(audio_signal: AudioSignal, nz_signal: AudioSignal, denoise_amount: float = 1.0, n_std: float = 3.0, win_length: int = 2048, hop_length: int = 512)

Perform noise reduction.

Parameters
  • audio_signal (AudioSignal) – Audio signal that noise will be removed from.

  • nz_signal (AudioSignal) – Noise signal to compute noise statistics from.

  • denoise_amount (float, optional) – Amount to denoise by, by default 1.0

  • n_std (float, optional) – Number of standard deviations above which to consider noise, by default 3.0

  • win_length (int, optional) – Length of window for STFT, by default 2048

  • hop_length (int, optional) – Hop length for STFT, by default 512

Returns

Denoised audio signal.

Return type

AudioSignal

training: bool