HiFiGAN is a vocoder widely used in academia and industry in recent years. It converts the spectrograms produced by acoustic models into high-quality audio, using a generative adversarial network as its generative model.
# Install the requirements
pip3 install -r requirements.txt
# Clone the repo
git clone https://github.com/PaddlePaddle/PaddleSpeech.git
cd PaddleSpeech/examples/csmsc/voc5
Download CSMSC (BZNSYP) from this website and extract it to ./datasets, so that the dataset ends up in the directory ./datasets/BZNSYP.
We use MFA results to get phoneme durations for fastspeech2. You can download the pre-computed alignment archive baker_alignment_tone.tar.gz here.
Put the data directory structure like this:
voc5
├── baker_alignment_tone
├── conf
├── datasets
│ └── BZNSYP
│ ├── PhoneLabeling
│ ├── ProsodyLabeling
│ └── Wave
├── local
└── ...
Change the rootdir of the dataset in ./local/preprocess.sh to the dataset path, like this: --rootdir=./datasets/BZNSYP
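If you prefer not to edit the file by hand, a sed one-liner can make the change. The substitution is demonstrated below on a stand-in line; the same expression can be applied to ./local/preprocess.sh (the exact spelling of the flag in that script may differ).

```shell
# Demonstrate the substitution on a stand-in line; apply the same
# expression to ./local/preprocess.sh (flag spelling assumed).
echo '--rootdir=old/path' | sed 's|--rootdir=.*|--rootdir=./datasets/BZNSYP|'
# prints: --rootdir=./datasets/BZNSYP
```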
./run.sh --stage 0 --stop-stage 0
When it is done, a dump
folder is created in the current directory. The structure of the dump folder is listed below.
dump
├── dev
│ ├── norm
│ └── raw
├── test
│ ├── norm
│ └── raw
└── train
├── norm
├── raw
└── feats_stats.npy
You can choose how many GPUs to use for training by changing the gpus
parameter in the run.sh file and the ngpu
parameter in the ./local/train.sh file.
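The same kind of sed substitution can set these values. It is demonstrated below on stand-in lines (the variable names gpus and ngpu come from the text above; the values are examples); apply the expressions to run.sh and ./local/train.sh respectively.

```shell
# Stand-in demonstration of the two edits (example values):
echo 'gpus=0,1,2,3' | sed 's|^gpus=.*|gpus=0|'   # train on a single GPU
# prints: gpus=0
echo 'ngpu=4' | sed 's|^ngpu=.*|ngpu=1|'
# prints: ngpu=1
```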
Modify the ./local/train.sh
file to use python3:
sed -i 's/python /python3 /g' ./local/train.sh
Full training can take a long time; you can reduce it by lowering the train_max_steps
parameter in the ./conf/default.yaml file. To make sure a weight file is saved, keep the train_max_steps
parameter larger than the save_interval_steps
parameter.
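A minimal sketch of the relevant part of ./conf/default.yaml (the values below are illustrative for a short run, not the recipe's defaults):

```yaml
# ./conf/default.yaml (illustrative values, not the defaults)
train_max_steps: 1000        # shortened run
save_interval_steps: 500     # keep this smaller than train_max_steps
                             # so at least one weight file is saved
```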
./run.sh --stage 1 --stop-stage 1
Modify the ckpt_name
parameter in the run.sh file to the name of the weight file produced by training.
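For example, if training stopped at step 1000, the line in run.sh would look something like the following. The snapshot_iter_*.pdz naming is an assumption; check the actual file name under your experiment output directory.

```shell
# In run.sh -- hypothetical weight name; use the file actually
# produced by your training run.
ckpt_name=snapshot_iter_1000.pdz
```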
./run.sh --stage 2 --stop-stage 2
Main results after training for 1000 steps.
| GPUs | avg_ips | adversarial loss | feature matching loss | mel loss | generator loss | real loss | fake loss | discriminator loss |
|------|---------|------------------|-----------------------|----------|----------------|-----------|-----------|--------------------|
| BI V100 × 1 | 15.42 sequences/sec | 6.276 | 0.845 | 0.531 | 31.858 | 0.513 | 0.6289 | 1.142 |