update README
README.md
CHANGED
@@ -3,21 +3,32 @@ license: cdla-permissive-2.0
 ---
 
 ## Model Summary
-[DAC auto-encoder models](https://github.com/descriptinc/descript-audio-codec) provide compact discrete tokenization of speech and audio signals that facilitate signal generation by cascaded generative AI models (e.g. multi-modal generative AI models) and high-quality reconstruction of the original signals. [The current models](https://www.isca-archive.org/interspeech_2024/shechtman24_interspeech.pdf) improve upon the [original DAC models](https://github.com/descriptinc/descript-audio-codec) by allowing a more compact representation for speech-only signals with high-quality signal reconstruction.
+[DAC auto-encoder models](https://github.com/descriptinc/descript-audio-codec) provide compact discrete tokenization of speech and audio signals that facilitate signal generation by cascaded generative AI models (e.g. multi-modal generative AI models) and high-quality reconstruction of the original signals. [The current models](https://www.isca-archive.org/interspeech_2024/shechtman24_interspeech.pdf) improve upon the [original DAC models](https://github.com/descriptinc/descript-audio-codec) by allowing a more compact representation for wide-band speech-only signals with high-quality signal reconstruction.
+
+| Model | Speech Sample Rate | Codebooks | Bit Rate | Token Rate | Version |
+| :---: | :---: | :---: | :---: | :---: | :---: |
+| weights_24khz_3.0kbps_v1.0.pth | 24 kHz | 4 | 3 kbps | 300 Hz | 1.0 |
+| weights_24khz_1.5kbps_v1.0.pth | 24 kHz | 2 | 1.5 kbps | 150 Hz | 1.0 |
 
 ## Usage
-follow [DAC](https://github.com/descriptinc/descript-audio-codec) installation instructions
-
+* follow [DAC](https://github.com/descriptinc/descript-audio-codec) installation instructions
+
+* clone the current repo
+```
+git clone https://huggingface.co/ibm/DAC.speech.v1.0
+cd DAC.speech.v1.0
+```
+
 ### Compress audio
 ```
-python3 -m dac encode /path/to/input --output /path/to/output/codes --weights_path
+python3 -m dac encode /path/to/input --output /path/to/output/codes --weights_path weights_24khz_3.0kbps_v1.0.pth
 ```
 
 This command will create `.dac` files with the same name as the input files. It will also preserve the directory structure relative to input root and re-create it in the output directory. Please use `python -m dac encode --help` for more options.
 
 ### Reconstruct audio from compressed codes
 ```
-python3 -m dac decode /path/to/output/codes --output /path/to/reconstructed_input --weights_path
+python3 -m dac decode /path/to/output/codes --output /path/to/reconstructed_input --weights_path weights_24khz_3.0kbps_v1.0.pth
 ```
 
 This command will create `.wav` files with the same name as the input files. It will also preserve the directory structure relative to input root and re-create it in the output directory. Please use `python -m dac decode --help` for more options.
@@ -28,7 +39,7 @@ import dac
 from audiotools import AudioSignal
 
 # Download a model
-model_path =
+model_path = 'weights_24khz_3.0kbps_v1.0.pth'
 model = dac.DAC.load(model_path)
 
 model.to('cuda')
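
The rates listed for the two weight files in the model table are related by simple arithmetic, which the sketch below checks. It assumes a frame rate of 75 Hz (implied by the table: 300 Hz / 4 codebooks = 150 Hz / 2 codebooks) and 10 bits per token, i.e. 1024-entry codebooks (implied by the 3.0 kbps in the file name divided by 300 tokens per second). Neither value is stated in the README itself, and the helper names are hypothetical.

```python
import math

# Assumptions inferred from the model table, not stated in the README:
FRAME_RATE_HZ = 75     # 300 Hz / 4 codebooks = 150 Hz / 2 codebooks
CODEBOOK_SIZE = 1024   # 3000 bit/s / 300 tokens/s = 10 bits per token


def token_rate_hz(frame_rate_hz, n_codebooks):
    """Tokens per second: one token per codebook per frame."""
    return frame_rate_hz * n_codebooks


def bit_rate_bps(tokens_per_second, codebook_size):
    """Bits per second: each token carries log2(codebook_size) bits."""
    return tokens_per_second * math.log2(codebook_size)


if __name__ == "__main__":
    models = [
        ("weights_24khz_3.0kbps_v1.0.pth", 4),
        ("weights_24khz_1.5kbps_v1.0.pth", 2),
    ]
    for name, n_codebooks in models:
        tokens = token_rate_hz(FRAME_RATE_HZ, n_codebooks)
        bits = bit_rate_bps(tokens, CODEBOOK_SIZE)
        print(f"{name}: {tokens} tokens/s, {bits / 1000:.1f} kbps")
```

Under these assumptions, halving the number of codebooks halves both the token rate and the bit rate, which matches the two rows of the table.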
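
The encode and decode commands are described as preserving the directory structure relative to the input root and re-creating it in the output directory. The sketch below shows what that mapping looks like for a single file. `mirrored_output_path` is a hypothetical helper written for illustration; it is not part of the `dac` package and may differ from what the CLI does internally.

```python
from pathlib import PurePosixPath


def mirrored_output_path(input_root, input_file, output_root, suffix):
    """Map input_file to the same relative location under output_root.

    Hypothetical illustration only: the file's path relative to
    input_root is kept, and its extension is swapped for `suffix`.
    """
    relative = PurePosixPath(input_file).relative_to(input_root)
    return PurePosixPath(output_root) / relative.with_suffix(suffix)


if __name__ == "__main__":
    # Encoding: a .wav file maps to a .dac file in the same sub-directory.
    encoded = mirrored_output_path(
        "/path/to/input",
        "/path/to/input/speaker1/utt1.wav",
        "/path/to/output/codes",
        ".dac",
    )
    print(encoded)

    # Decoding: the .dac file maps back to a .wav file.
    decoded = mirrored_output_path(
        "/path/to/output/codes",
        encoded,
        "/path/to/reconstructed_input",
        ".wav",
    )
    print(decoded)
```

The paths in the example reuse the placeholders from the commands above; the `speaker1/utt1` file name is made up for the illustration.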