Improved Readme (#20)
* Added screenshot, updated plugin. Removed parameters.md

* Explained a bit the process of creating modelData files

* Added why not real-time section

* Addressed Tibor comments

* Added link to release

* Typo

* Removed points, typo
DamRsn authored Mar 24, 2023
1 parent bd280bf commit 80beecb
Showing 3 changed files with 54 additions and 116 deletions.
97 changes: 0 additions & 97 deletions Brainstorm UX UI/Parameters.md

This file was deleted.

Binary file added NeuralNote_UI.png
73 changes: 54 additions & 19 deletions README.md
@@ -9,6 +9,20 @@ your favorite Digital Audio Workstation.
- Lightweight and very fast transcription
- Can scale and time quantize transcribed MIDI directly in the plugin

## Install NeuralNote

Download the latest release for your platform [here](https://github.com/DamRsn/NeuralNote/releases) (Windows and
Mac (Universal) supported)!

Currently, only the raw `.vst3`, `.component` (Audio Unit), `.app` and `.exe` (Standalone) files are provided.
Installers will be created soon. In the meantime, you can manually copy the plugin/app file into the appropriate
directory. Also, the code is not yet signed (it will be soon), so you might have to authorize the plugin in your
security settings, as it currently comes from an unidentified developer.
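
For reference, each format has a conventional install folder per platform. The helper below is a hypothetical sketch (not part of NeuralNote) that maps the formats above to the standard per-user locations; double-check your DAW's documentation before copying files:

```python
import platform
from pathlib import Path
from typing import Optional

def default_plugin_dir(fmt: str, system: Optional[str] = None) -> Path:
    """Typical install location for a plugin format ('vst3' or 'component').

    Hypothetical helper: these are the standard OS conventions, not
    anything NeuralNote-specific.
    """
    system = system or platform.system()
    home = Path.home()
    if system == "Darwin":  # macOS
        dirs = {
            "vst3": home / "Library/Audio/Plug-Ins/VST3",
            "component": home / "Library/Audio/Plug-Ins/Components",
        }
    elif system == "Windows":
        dirs = {
            "vst3": Path("C:/Program Files/Common Files/VST3"),
        }
    else:
        raise ValueError(f"no standard plugin folder known for {system}")
    return dirs[fmt]
```

The standalone `.app`/`.exe` can live anywhere (e.g. `/Applications` or `Program Files`).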

## Usage

![UI](NeuralNote_UI.png)

NeuralNote comes as a simple AudioFX plugin (VST3/AU/Standalone app) to be applied on the track to transcribe.

The workflow is very simple:
@@ -19,26 +33,18 @@
- The midi transcription instantly appears in the piano roll section. Play with the different settings to adjust it.
- Export the MIDI transcription with a simple drag and drop from the plugin to a MIDI track.

**Watch our presentation video for the Neural Audio Plugin
competition [here](https://www.youtube.com/watch?v=6_MC0_aG_DQ)**.

NeuralNote internally uses the model from Spotify's [basic-pitch](https://github.com/spotify/basic-pitch). See
their [blogpost](https://engineering.atspotify.com/2022/06/meet-basic-pitch/)
and [paper](https://arxiv.org/abs/2203.09893) for more information. In NeuralNote, basic-pitch is run
using [RTNeural](https://github.com/jatinchowdhury18/RTNeural) for the CNN part
and [ONNXRuntime](https://github.com/microsoft/onnxruntime) for the feature part (Constant-Q transform calculation +
Harmonic Stacking).
As part of this project, [we contributed to RTNeural](https://github.com/jatinchowdhury18/RTNeural/pull/89) to add 2D
convolution support.

## Build from source

Use this when cloning:
@@ -65,23 +71,40 @@ with [ort-builder](https://github.com/olilarkin/ort-builder)) before calling CMake

#### IDEs

Once the build script has been executed at least once, you can load this project in your favorite IDE
(CLion/Visual Studio/VSCode/etc) and click 'build' for one of the targets.

## Reuse code from NeuralNote’s transcription engine

All the code to perform the transcription is in `Lib/Model` and all the model weights are in `Lib/ModelData/`. Feel free
to use only this part of the code in your own project! We'll try to isolate it more from the rest of the repo in the
future and make it a library.

The code to generate the files in `Lib/ModelData/` is not currently available as it required a lot of manual operations.
But here's a description of the process we followed to create those files:

- `features_model.onnx` was generated by converting a keras model containing only the CQT + Harmonic Stacking part of
the full basic-pitch graph using `tf2onnx` (with manually added weights for batch normalization).
- the `.json` files containing the weights of the basic-pitch CNN were generated from the TensorFlow.js model available
in the [basic-pitch-ts repository](https://github.com/spotify/basic-pitch-ts), then converted to ONNX with `tf2onnx`.
Finally, the weights were gathered manually into `.npy` files thanks to [Netron](https://netron.app/) and applied to a
split keras model created with [basic-pitch](https://github.com/spotify/basic-pitch) code.
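
For illustration, the `.npy` hand-off in the last step amounts to saving each gathered tensor to its own file and reloading it when rebuilding the split keras models. A minimal sketch of that round trip (the tensor name and shape here are made up, not the real basic-pitch weights):

```python
import os
import tempfile

import numpy as np

# Hypothetical example of the .npy hand-off: one file per gathered tensor.
conv_kernel = np.arange(72, dtype=np.float32).reshape(3, 3, 1, 8)

path = os.path.join(tempfile.mkdtemp(), "conv1_kernel.npy")
np.save(path, conv_kernel)   # dump the tensor read off in Netron
restored = np.load(path)     # reload it when rebuilding the split model

assert restored.dtype == np.float32
assert np.array_equal(conv_kernel, restored)
```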

The original basic-pitch CNN was split into 4 sequential models wired together, so they can be run with RTNeural.
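
The split itself is just function composition: cutting a chain of layers at three points yields four sub-models whose outputs feed the next one's input, so the chained result matches the unsplit network. A toy numpy sketch of the idea (the affine stages and split points are illustrative, not basic-pitch's actual layers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the four sub-models: each stage is a tanh-activated
# affine map here, where the real stages are CNN segments run by RTNeural.
weights = [rng.standard_normal((8, 8)) for _ in range(4)]

def run_full(x: np.ndarray) -> np.ndarray:
    # The original, unsplit network: all four stages in one pass.
    for w in weights:
        x = np.tanh(x @ w)
    return x

def run_split(x: np.ndarray) -> np.ndarray:
    # The same network as four chained sub-models wired together:
    # each sub-model's output becomes the next one's input.
    for stage in [lambda v, w=w: np.tanh(v @ w) for w in weights]:
        x = stage(x)
    return x

x = rng.standard_normal((1, 8))
assert np.allclose(run_full(x), run_split(x))  # same result either way
```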

## Roadmap

- Improve stability
- Save plugin internal state properly, so it can be loaded back when reentering a session
- Add tooltips
- Build a simple synth in the plugin so that one can listen to the transcription while playing with the settings, before
export
- Allow pitch bends on non-overlapping parts of overlapping notes
- Support transcription of mp3 files

## Bug reports and feature requests

If you have any request/suggestion concerning the plugin or encounter a bug, please file a GitHub issue. We'll do our
best to address it.

## Contributing

@@ -103,6 +126,18 @@ Here's a list of all the third party libraries used in NeuralNote and the licenses
- [basic-pitch](https://github.com/spotify/basic-pitch) (Apache-2.0 license)
- [basic-pitch-ts](https://github.com/spotify/basic-pitch-ts) (Apache-2.0 license)

## Could NeuralNote transcribe audio in real-time?

Unfortunately not, for a few reasons:

- Basic Pitch uses the Constant-Q transform (CQT) as its input feature. The CQT requires very long audio chunks (> 1 s)
to get amplitudes for the lowest frequency bins. This makes the latency too high for real-time transcription.
- The basic-pitch CNN has an additional latency of approximately 120 ms.
- Very few DAWs support plugins with audio input and MIDI output, as far as we know. This is partially why NeuralNote is
an Audio FX plugin (audio-to-audio) and why MIDI is exported via drag and drop.
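
The CQT constraint above can be made concrete with a quick estimate: resolving a bin centered at frequency f takes a window spanning roughly a fixed number of periods of f, so the lowest bins dominate the latency. A back-of-the-envelope sketch (both numbers below are illustrative, not basic-pitch's exact settings):

```python
# Rough CQT window-length estimate: a bin centered at f_min_hz needs a
# window of about n_cycles periods of that frequency to be resolved.
# n_cycles and the lowest-bin frequency are illustrative assumptions.
def cqt_window_seconds(f_min_hz: float, n_cycles: float = 35.0) -> float:
    return n_cycles / f_min_hz

window_s = cqt_window_seconds(32.7)  # ~C1 as the lowest bin: over 1 s
```

Halving the lowest frequency doubles the required window, which is why deep bass coverage and low latency are fundamentally at odds here.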

But if you have ideas, please share!

## Credits

NeuralNote was developed by [Damien Ronssin](https://github.com/DamRsn) and [Tibor Vass](https://github.com/tiborvass).
