PP-TTS-FastSpeech2

Model description

Non-autoregressive text-to-speech (TTS) models such as FastSpeech can synthesize speech significantly faster than previous autoregressive models with comparable quality. FastSpeech 2 improves on FastSpeech by training directly with ground-truth targets instead of the simplified output of a teacher model, and by conditioning on more variance information of speech (duration, pitch, and energy). FastSpeech 2s is the first attempt to directly generate the speech waveform from text in parallel, enjoying the benefit of fully end-to-end inference. Experimental results show that 1) FastSpeech 2 achieves a 3x training speed-up over FastSpeech, and FastSpeech 2s enjoys even faster inference speed; 2) FastSpeech 2 and 2s outperform FastSpeech in voice quality, and FastSpeech 2 can even surpass autoregressive models.

Step 1: Installation

# Install the Python requirements
pip3 install -r requirements.txt

# Clone the repo
git clone https://github.com/PaddlePaddle/PaddleSpeech.git
cd PaddleSpeech/examples/csmsc/tts3
# Install sqlite3
wget https://sqlite.org/2019/sqlite-autoconf-3290000.tar.gz
tar zxvf sqlite-autoconf-3290000.tar.gz
cd sqlite-autoconf-3290000
./configure
make && make install
cd ..

# Rebuild Python 3.7 so its sqlite3 module links against the new library
wget https://www.python.org/ftp/python/3.7.9/Python-3.7.9.tar.xz
tar xvf Python-3.7.9.tar.xz
cd Python-3.7.9
./configure LDFLAGS="-L/usr/local/lib" CPPFLAGS="-I/usr/local/include" --prefix=/usr/bin
make && make install

# Copy the rebuilt _sqlite3 extension into the existing Python 3.7 installation
cp /usr/bin/lib/python3.7/lib-dynload/_sqlite3.cpython-37m-x86_64-linux-gnu.so /usr/local/lib/python3.7/lib-dynload/_sqlite3.so
# Build GCC 8.3.0 from source to update libstdc++
wget http://ftp.gnu.org/gnu/gcc/gcc-8.3.0/gcc-8.3.0.tar.gz
tar -zxvf gcc-8.3.0.tar.gz
yum -y install bzip2
cd gcc-8.3.0
./contrib/download_prerequisites
mkdir build
cd build/
../configure --enable-checking=release --enable-languages=c,c++ --disable-multilib
make -j 10
make install

cp /usr/local/lib64/libstdc++.so.6.0.25 /usr/lib64
cd /usr/lib64
rm -rf libstdc++.so.6
ln -s libstdc++.so.6.0.25 libstdc++.so.6
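
As a quick sanity check, you can confirm that the rebuilt Python links against the new sqlite3 and that the updated libstdc++ is in place (GCC 8.3 ships libstdc++ symbols up to GLIBCXX_3.4.25):

# Verify the sqlite3 module works in the rebuilt Python
python3 -c "import sqlite3; print(sqlite3.sqlite_version)"
# Verify the new libstdc++ exposes the GCC 8 symbol versions
strings /usr/lib64/libstdc++.so.6 | grep GLIBCXX_3.4.25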

Step 2: Preparing datasets

Download and Extract

Download CSMSC (BZNSYP) from the official website and extract it to ./datasets. The dataset will then be in the directory ./datasets/BZNSYP.
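
After extraction, you can verify the layout; CSMSC contains 10,000 recorded sentences, so the Wave directory should hold 10,000 wav files:

# Expect PhoneLabeling, ProsodyLabeling and Wave
ls ./datasets/BZNSYP
# Expect 10000
ls ./datasets/BZNSYP/Wave | wc -l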

Get MFA Result and Extract

We use MFA (Montreal Forced Aligner) to get phoneme durations for FastSpeech2. You can download the precomputed alignment archive baker_alignment_tone.tar.gz.
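
Extract the alignment archive into the tts3 directory:

tar zxvf baker_alignment_tone.tar.gz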

Arrange the data directory structure like this:

tts3
├── baker_alignment_tone
├── conf
├── datasets
│   └── BZNSYP
│       ├── PhoneLabeling
│       ├── ProsodyLabeling
│       └── Wave
├── local
└── ...

Change the dataset rootdir in ./local/preprocess.sh to the dataset path, like this: --rootdir=./datasets/BZNSYP
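
For example, assuming the script passes --rootdir as a single token, the edit can be scripted with sed (a sketch; otherwise change the value by hand):

# Point preprocessing at the extracted dataset
sed -i 's|--rootdir=[^ ]*|--rootdir=./datasets/BZNSYP|' ./local/preprocess.sh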

Data preprocessing

PYTHONWARNINGS='ignore:semaphore_tracker:UserWarning' ./run.sh --stage 0 --stop-stage 0

When it is done, a dump folder is created in the current directory. The structure of the dump folder is listed below.

dump
├── dev
│   ├── norm
│   └── raw
├── phone_id_map.txt
├── speaker_id_map.txt
├── test
│   ├── norm
│   └── raw
└── train
    ├── energy_stats.npy
    ├── norm
    ├── pitch_stats.npy
    ├── raw
    └── speech_stats.npy
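
The *_stats.npy files hold statistics computed on the training split and used for normalization. Assuming they are plain NumPy arrays saved by the preprocessing scripts, you can inspect them like this:

# Peek at the pitch normalization statistics
python3 -c "import numpy as np; s = np.load('dump/train/pitch_stats.npy'); print(s.shape); print(s)"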

Step 3: Training

Model Training

You can choose how many GPUs to use for training by changing the gpus parameter in the run.sh file and the ngpu parameter in the ./local/train.sh file, as in the sketch below.
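
For example, to train on 4 GPUs (a sketch of the two edits described above):

# In run.sh
gpus=0,1,2,3
# In ./local/train.sh
ngpu=4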

PYTHONWARNINGS='ignore:semaphore_tracker:UserWarning' ./run.sh --stage 1 --stop-stage 1

Synthesizing

We use Parallel WaveGAN as the neural vocoder. Download the pretrained Parallel WaveGAN model pwg_baker_ckpt_0.4.zip and unzip it.

unzip pwg_baker_ckpt_0.4.zip

The Parallel WaveGAN checkpoint contains the files listed below.

pwg_baker_ckpt_0.4
├── pwg_default.yaml               # default config used to train parallel wavegan
├── pwg_snapshot_iter_400000.pdz   # model parameters of parallel wavegan
└── pwg_stats.npy                  # statistics used to normalize spectrogram when training parallel wavegan

Run synthesizing. Set the ckpt_name parameter in the run.sh file to the name of the checkpoint produced by training. Then add the parameter providers=['CUDAExecutionProvider'] to the onnxruntime.InferenceSession call in PaddleSpeech/paddlespeech/t2s/frontend/g2pw/onnx_api.py at line 80, like below:

self.session_g2pW = onnxruntime.InferenceSession(
    os.path.join(uncompress_path, 'g2pW.onnx'),
    sess_options=sess_options, providers=['CUDAExecutionProvider'])

Then run:

./run.sh --stage 2 --stop-stage 3
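
Stage 2 synthesizes from the extracted test metadata and stage 3 synthesizes end-to-end from input text. The generated wavs land under the experiment directory; the path below is an assumption based on the default tts3 layout:

# Inspect the synthesized audio (output path is an assumption)
ls exp/default/test_e2e/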

Inferencing

./run.sh --stage 4 --stop-stage 4

Results

| GPUs | avg_ips | l1 loss | duration loss | pitch loss | energy loss | loss |
|------|---------|---------|---------------|------------|-------------|------|
| BI-V100 × 8 | 71.19 sequences/sec | 0.603 | 0.037 | 0.327 | 0.151 | 1.118 |

Reference

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech (https://arxiv.org/abs/2006.04558)
