To avoid crashes and undefined behaviors, I think it's worth revising the API / metadata to cover expected input and output channel counts for waveform-to-waveform models. To illustrate the current behaviors, I wrote a simple "pass-through" effect that returns whatever audio is provided (via `torch.nn.Identity`; a sketch of the test effect follows the results). Here are the results:
| Input channels | Multichannel flag | Output labels | Result |
|---|---|---|---|
| 2 | False | 1 | Sums the stereo track to a single mono output track (via sum, not average) |
| 1 | False | 1 | Works as expected (creates one output track) |
| 2 | False | 2 | Sums the stereo track to a single mono output track (via sum, not average) and creates an additional empty track |
| 1 | False | 2 | Creates one output track with the input data and one empty track |
| 2 | True | 1 | Crashes upon applying the effect |
| 1 | True | 1 | Works as expected (creates one output track) |
| 2 | True | 2 | Works as expected (creates two output tracks); however, in some situations a user may wish to map a stereo track to a single stereo track without splitting |
| 1 | True | 2 | Creates one output track with the input data and one empty track |
To confirm the summing behavior, I doubled a mono track to stereo, set the effect gain to 0.5, inverted the summed output track, and made sure it canceled the stereo inputs.
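For reference, the pass-through effect was essentially the following (a minimal sketch; the import path and `do_forward_pass` hook reflect my reading of the audacitorch `WaveformToWaveformBase` interface, so treat the details as approximate):

```python
import torch
from audacitorch import WaveformToWaveformBase


class PassThrough(WaveformToWaveformBase):
    # Wraps torch.nn.Identity, so the effect returns exactly the audio it
    # receives; any summing / splitting observed above is done by the host.
    def do_forward_pass(self, x: torch.Tensor) -> torch.Tensor:
        return self.model(x)


wrapper = PassThrough(torch.nn.Identity())
scripted = torch.jit.script(wrapper)  # then saved and loaded into Audacity for the tests above
```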
I see a few issues with the current behaviors:
- the interaction between the `multichannel` flag and `labels` can cause crashes, even though the former is ostensibly responsible only for downmixing stereo inputs to mono
- `labels` implicitly determines the number of output tracks created, which should probably be specified explicitly instead
- in many instances a user may wish to map stereo inputs to stereo outputs rather than to multiple mono tracks; however, the current setup does not distinguish between output tracks and channels, so users have to upmix mono outputs to stereo manually
- finally, the metadata mixes attributes that determine actual model behavior (`sample_rate`, `multichannel`, etc.) with purely descriptive fields, so developers have to keep the metadata and the model behavior in sync by hand and make sure the two match correctly (see the example after this list)
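To make that last point concrete, the metadata for the pass-through effect looks roughly like this (field names approximate; the point is that `sample_rate`, `multichannel`, and `labels` affect runtime behavior while the rest are purely descriptive):

```python
metadata = {
    # descriptive fields -- only shown to the user
    "model_name": "passthrough",
    "author": "...",
    "short_description": "returns the input audio unchanged",
    # behavior-determining fields -- must agree with what the model actually does
    "sample_rate": 48000,     # host resamples input to this rate
    "multichannel": False,    # controls whether stereo input is downmixed
    "labels": ["output"],     # implicitly sets the number of output tracks
}
```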
Some possible fixes:
- In the Audacitorch API, remove the `multichannel` field and add metadata fields that specify (1) the input / output channel counts and (2) whether to automatically upmix the outputs to a single track, assuming this is feasible within Audacity
- Make behavior-determining attributes (`sample_rate`, etc.) constructor arguments to `WaveformToWaveformBase` and store them there
- Generate metadata directly from a `WaveformToWaveformBase` model with a utility function or class method that takes the remaining descriptors as arguments, ensuring that the metadata and the model are matched; this could also be folded into a single serialization / scripting utility if we really want to streamline (a rough sketch follows this list)
- If we can catch TorchScript exceptions within Audacity and surface informative error messages, perform runtime checks on input channel dimensions based on attributes stored in the scripted model or in the metadata
- During validation, trim or auto-generate the contents of `labels` to match the effect outputs, so that `labels` does not determine any behavior beyond track naming
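As a rough sketch of the second and third fixes (all names here are hypothetical and just illustrate the shape of the API, not a concrete proposal):

```python
import json
import torch
from torch import nn


class WaveformToWaveformBase(nn.Module):
    # Hypothetical revised base class: behavior-determining attributes live on
    # the model itself rather than in a separately maintained metadata file.
    def __init__(self, model: nn.Module, sample_rate: int,
                 in_channels: int, out_channels: int):
        super().__init__()
        self.model = model
        self.sample_rate = sample_rate
        self.in_channels = in_channels
        self.out_channels = out_channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Runtime channel check based on stored attributes, assuming the host
        # can surface TorchScript exceptions as readable error messages.
        assert x.shape[0] == self.in_channels, "unexpected number of input channels"
        return self.do_forward_pass(x)

    def do_forward_pass(self, x: torch.Tensor) -> torch.Tensor:
        raise NotImplementedError


def export_model(model: WaveformToWaveformBase, path: str, **descriptors) -> None:
    # Hypothetical serialization utility: behavior fields are read off the
    # model, so the metadata cannot drift out of sync with it; the caller only
    # supplies the remaining descriptive fields.
    metadata = dict(descriptors,
                    sample_rate=model.sample_rate,
                    in_channels=model.in_channels,
                    out_channels=model.out_channels)
    torch.jit.script(model).save(path)
    with open(path + ".json", "w") as f:
        json.dump(metadata, f)
```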