HiFiGAN is a vocoder widely used in academia and industry in recent years. It converts the spectrograms produced by acoustic models into high-quality audio, using a generative adversarial network as its generative model.
# Install the requirements
pip3 install -r requirements.txt
# Clone the repo
git clone https://github.com/PaddlePaddle/PaddleSpeech.git
cd PaddleSpeech/examples/csmsc/voc5
Download CSMSC (BZNSYP) from this website and extract it to ./datasets, so that the dataset ends up in the directory ./datasets/BZNSYP.
We use MFA results to get phoneme durations for fastspeech2. You can download the pre-computed alignment archive baker_alignment_tone.tar.gz here.
Put the data directory structure like this:
voc5
├── baker_alignment_tone
├── conf
├── datasets
│ └── BZNSYP
│ ├── PhoneLabeling
│ ├── ProsodyLabeling
│ └── Wave
├── local
└── ...
Change the rootdir of the dataset in ./local/preprocess.sh to the dataset path, like this: --rootdir=./datasets/BZNSYP
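If you prefer not to edit the file by hand, a sed one-liner can make the change. The substitution is demonstrated below on a stand-in line; the same expression can be applied to ./local/preprocess.sh (the exact spelling of the flag in that script may differ).

```shell
# Demonstrate the substitution on a stand-in line; apply the same
# expression to ./local/preprocess.sh (flag spelling assumed).
echo '--rootdir=old/path' | sed 's|--rootdir=.*|--rootdir=./datasets/BZNSYP|'
# prints: --rootdir=./datasets/BZNSYP
```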
./run.sh --stage 0 --stop-stage 0
When it is done, a dump
folder is created in the current directory. The structure of the dump folder is listed below.
dump
├── dev
│ ├── norm
│ └── raw
├── test
│ ├── norm
│ └── raw
└── train
├── norm
├── raw
└── feats_stats.npy
You can choose how many GPUs to use for training by changing the gpus
parameter in the run.sh file and the ngpu
parameter in the ./local/train.sh file.
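The same kind of sed substitution can set these values. It is demonstrated below on stand-in lines (the variable names gpus and ngpu come from the text above; the values are examples); apply the expressions to run.sh and ./local/train.sh respectively.

```shell
# Stand-in demonstration of the two edits (example values):
echo 'gpus=0,1,2,3' | sed 's|^gpus=.*|gpus=0|'   # train on a single GPU
# prints: gpus=0
echo 'ngpu=4' | sed 's|^ngpu=.*|ngpu=1|'
# prints: ngpu=1
```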
Modify the ./local/train.sh
file to use python3:
sed -i 's/python /python3 /g' ./local/train.sh
Full training can take a long time; you can reduce it by lowering the train_max_steps
parameter in the ./conf/default.yaml file. To make sure a weight file is saved, keep the train_max_steps
parameter larger than the save_interval_steps
parameter.
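A minimal sketch of the relevant part of ./conf/default.yaml (the values below are illustrative for a short run, not the recipe's defaults):

```yaml
# ./conf/default.yaml (illustrative values, not the defaults)
train_max_steps: 1000        # shortened run
save_interval_steps: 500     # keep this smaller than train_max_steps
                             # so at least one weight file is saved
```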
./run.sh --stage 1 --stop-stage 1
Modify the ckpt_name
parameter in the run.sh file to the name of the weight file produced by training.
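For example, if training stopped at step 1000, the line in run.sh would look something like the following. The snapshot_iter_*.pdz naming is an assumption; check the actual file name under your experiment output directory.

```shell
# In run.sh -- hypothetical weight name; use the file actually
# produced by your training run.
ckpt_name=snapshot_iter_1000.pdz
```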
./run.sh --stage 2 --stop-stage 2
Main results after training for 1000 steps.
| GPUs | avg_ips | adversarial loss | feature matching loss | mel loss | generator loss | real loss | fake loss | discriminator loss |
|------|---------|------------------|-----------------------|----------|----------------|-----------|-----------|--------------------|
| BI V100 × 1 | 15.42 sequences/sec | 6.276 | 0.845 | 0.531 | 31.858 | 0.513 | 0.6289 | 1.142 |