ibm/DAC.speech.v1.0

@@ -3,21 +3,32 @@ license: cdla-permissive-2.0
 ---
 ## Model Summary
-[DAC auto-encoder models](https://github.com/descriptinc/descript-audio-codec) provide compact discrete tokenization of speech and audio signals that facilitate signal generation by cascaded generative AI models (e.g. multi-modal generative AI models) and high-quality reconstruction of the original signals. [The current models](https://www.isca-archive.org/interspeech_2024/shechtman24_interspeech.pdf) improve upon the [original DAC models](https://github.com/descriptinc/descript-audio-codec) by allowing a more compact representation for speech-only signals with high-quality signal reconstruction.
 ## Usage
-follow [DAC](https://github.com/descriptinc/descript-audio-codec) installation instructions
-download the model weights from the current repo (e.g., *weights_24khz_1.5kbps_v1.0*)
 ### Compress audio
 ```
-python3 -m dac encode /path/to/input --output /path/to/output/codes --weights_path /path/to/weights_24khz_1.5kbps_v1.0
 ```
 This command will create `.dac` files with the same name as the input files. It will also preserve the directory structure relative to input root and re-create it in the output directory. Please use `python -m dac encode --help` for more options.
 ### Reconstruct audio from compressed codes
 ```
-python3 -m dac decode /path/to/output/codes --output /path/to/reconstructed_input --weights_path /path/to/weights_24khz_1.5kbps_v1.0
 ```
 This command will create `.wav` files with the same name as the input files. It will also preserve the directory structure relative to input root and re-create it in the output directory. Please use `python -m dac decode --help` for more options.
@@ -28,7 +39,7 @@ import dac
 from audiotools import AudioSignal
 # Download a model
-model_path = /path/to/weights_24khz_1.5kbps_v1.0
 model = dac.DAC.load(model_path)
 model.to('cuda')

 ---
 ## Model Summary
+[DAC auto-encoder models](https://github.com/descriptinc/descript-audio-codec) provide a compact discrete tokenization of speech and audio signals that facilitates both signal generation by cascaded generative AI models (e.g., multi-modal generative AI models) and high-quality reconstruction of the original signals. [The current models](https://www.isca-archive.org/interspeech_2024/shechtman24_interspeech.pdf) improve upon the [original DAC models](https://github.com/descriptinc/descript-audio-codec) by allowing a more compact representation for wide-band speech-only signals with high-quality signal reconstruction.
+| Model     | Speech Sample Rate    | Codebooks | Bit Rate  | Token Rate | Version |
+| :---:     | :---:                 | :---:     | :---:     | :---:      | :---:   |
+| weights_24khz_3.0kbps_v1.0.pth | 24 kHz   | 4 | 3 kbps   | 300 Hz | 1.0 |
+| weights_24khz_1.5kbps_v1.0.pth | 24 kHz   | 2 | 1.5 kbps | 150 Hz | 1.0 |
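
The columns of the table above are related by simple arithmetic, which the following sketch works through. It assumes that each token is a 10-bit code (an index into a 1024-entry codebook, the default in the original DAC models) and that the token rate counts tokens across all codebooks. Neither assumption is stated in this README, but both are consistent with the figures in the table. The function names are illustrative only.

```python
# Assumption (not stated in the README): every token is a 10-bit code,
# i.e. an index into a 1024-entry codebook.
BITS_PER_TOKEN = 10


def frame_rate_hz(token_rate_hz, codebooks):
    """Frames per second, assuming one token per codebook per frame."""
    return token_rate_hz / codebooks


def bit_rate_kbps(token_rate_hz, bits_per_token=BITS_PER_TOKEN):
    """Bit rate in kbps implied by a token rate and a fixed code width."""
    return token_rate_hz * bits_per_token / 1000


# Values copied from the table.
models = {
    "weights_24khz_3.0kbps_v1.0.pth": {"codebooks": 4, "token_rate_hz": 300},
    "weights_24khz_1.5kbps_v1.0.pth": {"codebooks": 2, "token_rate_hz": 150},
}

for name, m in models.items():
    frames = frame_rate_hz(m["token_rate_hz"], m["codebooks"])
    kbps = bit_rate_kbps(m["token_rate_hz"])
    print(f"{name}: {frames} frames/s, {kbps} kbps")
```

Under these assumptions both models run at the same frame rate and differ only in the number of codebooks per frame, which is why halving the codebooks halves the bit rate.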
 ## Usage
+* Follow the [DAC](https://github.com/descriptinc/descript-audio-codec) installation instructions
+* Clone the current repo:
+```
+git clone https://huggingface.co/ibm/DAC.speech.v1.0
+cd DAC.speech.v1.0
+```
 ### Compress audio
 ```
+python3 -m dac encode /path/to/input --output /path/to/output/codes --weights_path weights_24khz_3.0kbps_v1.0.pth
 ```
 This command will create `.dac` files with the same name as the input files. It will also preserve the directory structure relative to input root and re-create it in the output directory. Please use `python -m dac encode --help` for more options.
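
The directory mirroring described above can be pictured with a small path calculation. This is only an illustration of the mapping, not the `dac` implementation; the helper name `mirrored_output_path` and the example file `speaker1/utt1.wav` are hypothetical.

```python
from pathlib import PurePosixPath


def mirrored_output_path(input_file, input_root, output_root, suffix=".dac"):
    """Map an input file to its output location, keeping the sub-directory
    layout relative to the input root and swapping the file extension."""
    relative = PurePosixPath(input_file).relative_to(input_root)
    return PurePosixPath(output_root) / relative.with_suffix(suffix)


# Hypothetical input file below the input root used in the command above.
print(mirrored_output_path(
    "/path/to/input/speaker1/utt1.wav",
    "/path/to/input",
    "/path/to/output/codes",
))
```

With these example paths the result is `/path/to/output/codes/speaker1/utt1.dac`.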
 ### Reconstruct audio from compressed codes
 ```
+python3 -m dac decode /path/to/output/codes --output /path/to/reconstructed_input --weights_path weights_24khz_3.0kbps_v1.0.pth
 ```
 This command will create `.wav` files with the same name as the input files. It will also preserve the directory structure relative to input root and re-create it in the output directory. Please use `python -m dac decode --help` for more options.
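
For scripted use, the two commands above can be chained from Python. The sketch below only assembles the same command lines shown in this README and runs them with `subprocess`; it uses the current interpreter (`sys.executable`) in place of `python3`, and the helper names `dac_command` and `round_trip` are hypothetical, not part of the `dac` package.

```python
import subprocess
import sys

# Weights file from the cloned repo, as used in the commands above.
WEIGHTS = "weights_24khz_3.0kbps_v1.0.pth"


def dac_command(mode, source, output, weights=WEIGHTS):
    """Build the argument list for `python -m dac encode|decode ...`."""
    if mode not in ("encode", "decode"):
        raise ValueError(f"unsupported mode: {mode!r}")
    return [
        sys.executable, "-m", "dac", mode, source,
        "--output", output,
        "--weights_path", weights,
    ]


def round_trip(input_dir, codes_dir, output_dir, weights=WEIGHTS):
    """Encode a directory to .dac codes, then decode the codes to .wav.

    Requires the dac package to be installed; raises CalledProcessError
    if either step exits with a non-zero status.
    """
    subprocess.run(dac_command("encode", input_dir, codes_dir, weights), check=True)
    subprocess.run(dac_command("decode", codes_dir, output_dir, weights), check=True)
```

A call such as `round_trip("/path/to/input", "/path/to/output/codes", "/path/to/reconstructed_input")` would then reproduce the two steps in order, run from inside the cloned repo so that the relative weights path resolves.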
 from audiotools import AudioSignal
 # Download a model
+model_path = 'weights_24khz_3.0kbps_v1.0.pth'
 model = dac.DAC.load(model_path)
 model.to('cuda')