Edresson committed on
Commit 344ba1c
1 Parent(s): 2ebc057

Update README

Files changed (1):
  1. README.md +69 -0

README.md ADDED:

---
language: pt
datasets:
- Common Voice
metrics:
- wer
tags:
- audio
- speech
- wav2vec2
- pt
- portuguese-speech-corpus
- automatic-speech-recognition
- PyTorch
license: apache-2.0
model-index:
- name: Edresson Casanova Wav2vec2 Large 100k Voxpopuli fine-tuned with a single-speaker dataset plus Data Augmentation in Portuguese
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    metrics:
    - name: Test Common Voice 7.0 WER
      type: wer
      value: 33.96
---

# Wav2vec2 Large 100k Voxpopuli fine-tuned with a single-speaker dataset plus data augmentation in Portuguese

[Wav2vec2 Large 100k Voxpopuli](https://huggingface.co/facebook/wav2vec2-large-100k-voxpopuli) fine-tuned in Portuguese using a single-speaker dataset plus a data augmentation method based on TTS and voice conversion.

# Use this model

```python
from transformers import AutoTokenizer, Wav2Vec2ForCTC

tokenizer = AutoTokenizer.from_pretrained("Edresson/wav2vec2-large-100k-voxpopuli-ft-TTS-Dataset-plus-data-augmentation-portuguese")

model = Wav2Vec2ForCTC.from_pretrained("Edresson/wav2vec2-large-100k-voxpopuli-ft-TTS-Dataset-plus-data-augmentation-portuguese")
```
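
Below is a minimal transcription sketch showing how the model can be used on an audio file. The file name `example.wav`, the resampling step, and the use of a `Wav2Vec2Processor` (instead of the `AutoTokenizer` above) for feature extraction and decoding are assumptions, not something the original card specifies.

```python
import torch
import torchaudio
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Assumption: the repository provides a processor (feature extractor + tokenizer) config.
processor = Wav2Vec2Processor.from_pretrained("Edresson/wav2vec2-large-100k-voxpopuli-ft-TTS-Dataset-plus-data-augmentation-portuguese")
model = Wav2Vec2ForCTC.from_pretrained("Edresson/wav2vec2-large-100k-voxpopuli-ft-TTS-Dataset-plus-data-augmentation-portuguese")

# Hypothetical local audio file; the model expects 16 kHz mono input.
speech, sample_rate = torchaudio.load("example.wav")
speech = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16_000)(speech.squeeze(0))

inputs = processor(speech.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: take the most likely token per frame, then collapse repeats and blanks.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```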

# Results
For the results, check the [article (Soon)]().

# Example test with Common Voice Dataset

```python
import re

import torchaudio
from datasets import load_dataset

# Punctuation removed from reference transcripts before scoring; this definition is not
# given in the original card and is an assumption based on the usual Common Voice recipe.
chars_to_ignore_regex = '[,?.!;:"-]'

# Common Voice 7.0 Portuguese test split, loaded from a local copy of the corpus.
dataset = load_dataset("common_voice", "pt", split="test", data_dir="./cv-corpus-7.0-2021-07-21")

# Common Voice audio is 48 kHz; the model expects 16 kHz.
resampler = torchaudio.transforms.Resample(orig_freq=48_000, new_freq=16_000)

def map_to_array(batch):
    speech, _ = torchaudio.load(batch["path"])
    batch["speech"] = resampler.forward(speech.squeeze(0)).numpy()
    batch["sampling_rate"] = resampler.new_freq
    batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower().replace("’", "'")
    return batch
```
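
The final snippet below calls `map_to_pred` and a `wer` metric that the card itself never defines. The following is a minimal sketch of the usual definitions, assuming the repository provides a `Wav2Vec2Processor` and using the WER metric from `datasets`; both are assumptions rather than part of the original card.

```python
import torch
from datasets import load_metric
from transformers import Wav2Vec2Processor

# Assumptions: a processor config is available in the repository, and WER is computed
# with the `datasets` metric; the original card does not specify either.
processor = Wav2Vec2Processor.from_pretrained("Edresson/wav2vec2-large-100k-voxpopuli-ft-TTS-Dataset-plus-data-augmentation-portuguese")
wer = load_metric("wer")

def map_to_pred(batch):
    # Feature-extract the 16 kHz waveforms and run greedy CTC decoding with the model loaded above.
    inputs = processor(batch["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    batch["predicted"] = processor.batch_decode(predicted_ids)
    batch["target"] = batch["sentence"]
    return batch
```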

```python
ds = dataset.map(map_to_array)
result = ds.map(map_to_pred, batched=True, batch_size=1, remove_columns=list(ds.features.keys()))
print(wer.compute(predictions=result["predicted"], references=result["target"]))
```
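
For reference, the model-index metadata at the top of this card reports a WER of 33.96 on the Common Voice 7.0 Portuguese test set.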