ibm/DAC.speech.v1.0

slavashe committed about 10 hours ago

Commit d00f47e • 1 Parent(s): 692fc03

Readme updates

Files changed (1)

README.md +2 -1


@@ -18,7 +18,8 @@ tags:
 ---
 ## Model Summary
-[DAC auto-encoder models](https://github.com/descriptinc/descript-audio-codec) provide compact discrete tokenization of speech and audio signals that facilitate signal generation by cascaded generative AI models (e.g. multi-modal generative AI models) and high-quality reconstruction of the original signals. [The current models](https://www.isca-archive.org/interspeech_2024/shechtman24_interspeech.pdf) improve upon the [original DAC models](https://github.com/descriptinc/descript-audio-codec) by allowing a more compact representation for wide-band speech-only signals with high-quality signal reconstruction.
+[DAC auto-encoder models](https://github.com/descriptinc/descript-audio-codec) provide compact discrete tokenization of speech and audio signals that facilitate signal generation by cascaded generative AI models (e.g. multi-modal generative AI models) and high-quality reconstruction of the original signals. [The current finetuned models](https://www.isca-archive.org/interspeech_2024/shechtman24_interspeech.pdf) improve upon the [original DAC models](https://github.com/descriptinc/descript-audio-codec) by allowing a more compact representation for wide-band speech signals with high-quality signal reconstruction. The models achieve speech reconstruction, which is [nearly indistinguishable from PCM](https://ibm.biz/IS24SpeechRVQ) with a rate of 150-300 tokens per second
+(1500-3000 bps). [The evaluation](https://www.isca-archive.org/interspeech_2024/shechtman24_interspeech.pdf) used comprehensive English speech data encompassing different recording conditions, including studio settings.
 | Model     | Speech Sample Rate    | codebooks | Bit Rate  | Token Rate| version|
 | :---:     | :---:                 | :---:     | :---:     | :---:     | :---: |
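
The token rates and bit rates quoted in the added text (150-300 tokens per second, 1500-3000 bps) can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming every token carries the same number of bits; that assumption and the function names `bits_per_token` and `implied_codebook_size` are illustrative and not taken from the README.

```python
def bits_per_token(bit_rate_bps: float, token_rate_hz: float) -> float:
    """Bits carried by one token, assuming a uniform cost per token."""
    return bit_rate_bps / token_rate_hz


def implied_codebook_size(bits: float) -> int:
    """Number of codebook entries that `bits` bits can index."""
    return round(2 ** bits)


# (token rate in tokens/s, bit rate in bps) pairs quoted in the README text.
quoted_rates = [(150, 1500), (300, 3000)]

for token_rate, bit_rate in quoted_rates:
    bits = bits_per_token(bit_rate, token_rate)
    size = implied_codebook_size(bits)
    print(f"{token_rate} tokens/s at {bit_rate} bps: "
          f"{bits:g} bits per token, {size} entries per codebook")
```

Both ends of the quoted range give the same bits-per-token figure, so the two sets of numbers are mutually consistent under the uniform-cost assumption.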