Non-autoregressive text to speech (TTS) models such as FastSpeech can synthesize speech significantly faster than previous autoregressive models with comparable quality. FastSpeech 2s is the first attempt to directly generate speech waveform from text in parallel, enjoying the benefit of fully end-to-end inference. Experimental results show that 1) FastSpeech 2 achieves a 3x training speed-up over FastSpeech, and FastSpeech 2s enjoys even faster inference speed; 2) FastSpeech 2 and 2s outperform FastSpeech in voice quality, and FastSpeech 2 can even surpass autoregressive models.
# Install the Python requirements
pip3 install -r requirements.txt
# Clone the repo
git clone https://github.com/PaddlePaddle/PaddleSpeech.git
cd PaddleSpeech/examples/csmsc/tts3
# Install sqlite3
wget https://sqlite.org/2019/sqlite-autoconf-3290000.tar.gz
tar zxvf sqlite-autoconf-3290000.tar.gz
cd sqlite-autoconf-3290000
./configure
make && make install
cd ..
wget https://www.python.org/ftp/python/3.7.9/Python-3.7.9.tar.xz
tar xvf Python-3.7.9.tar.xz
cd Python-3.7.9
./configure LDFLAGS="-L/usr/local/lib" CPPFLAGS="-I/usr/local/include" --prefix=/usr/bin
make && make install
cp /usr/bin/lib/python3.7/lib-dynload/_sqlite3.cpython-37m-x86_64-linux-gnu.so /usr/local/lib/python3.7/lib-dynload/_sqlite3.so
# Update GCC lib
wget http://ftp.gnu.org/gnu/gcc/gcc-8.3.0/gcc-8.3.0.tar.gz
tar -zxvf gcc-8.3.0.tar.gz
yum -y install bzip2
cd gcc-8.3.0
./contrib/download_prerequisites
mkdir build
cd build/
../configure --enable-checking=release --enable-languages=c,c++ --disable-multilib
make -j 10
make install
cp /usr/local/lib64/libstdc++.so.6.0.25 /usr/lib64
cd /usr/lib64
rm -rf libstdc++.so.6
ln -s libstdc++.so.6.0.25 libstdc++.so.6
Download CSMSC (BZNSYP) from its website and extract it to ./datasets, so that the dataset is in the directory ./datasets/BZNSYP.
We use MFA (Montreal Forced Aligner) to get durations for FastSpeech 2. You can download the alignment results as baker_alignment_tone.tar.gz.
Arrange the data directory like this:
tts3
├── baker_alignment_tone
├── conf
├── datasets
│   └── BZNSYP
│       ├── PhoneLabeling
│       ├── ProsodyLabeling
│       └── Wave
├── local
└── ...
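Before running preprocessing, it can help to confirm that the layout matches the tree above. The following is a minimal sketch, not part of the PaddleSpeech recipe; the helper name `missing_entries` and the `EXPECTED` list are assumptions derived from the tree shown here.

```python
from pathlib import Path

# Paths taken from the directory tree above, relative to the tts3 directory.
EXPECTED = [
    "baker_alignment_tone",
    "conf",
    "datasets/BZNSYP/PhoneLabeling",
    "datasets/BZNSYP/ProsodyLabeling",
    "datasets/BZNSYP/Wave",
    "local",
]


def missing_entries(root, expected=EXPECTED):
    """Return the expected sub-paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from inside the tts3 directory.
    missing = missing_entries(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Layout looks complete.")
```

An empty result means every directory from the tree is present.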
Change the rootdir of the dataset in ./local/preprocess.sh to the dataset path, for example: --rootdir=./datasets/BZNSYP
PYTHONWARNINGS='ignore:semaphore_tracker:UserWarning' ./run.sh --stage 0 --stop-stage 0
When it is done, a dump folder is created in the current directory. The structure of the dump folder is listed below.
dump
├── dev
│ ├── norm
│ └── raw
├── phone_id_map.txt
├── speaker_id_map.txt
├── test
│ ├── norm
│ └── raw
└── train
    ├── energy_stats.npy
    ├── norm
    ├── pitch_stats.npy
    ├── raw
    └── speech_stats.npy
You can choose how many GPUs to use for training by changing the gpus parameter in run.sh and the ngpu parameter in ./local/train.sh.
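The two settings should agree with each other. The sketch below assumes that the run.sh value is a comma-separated list of device IDs (for example `0,1,2,3`) and that `ngpu` is the number of devices; `count_gpus` is a hypothetical helper, not part of the recipe.

```python
def count_gpus(gpus):
    """Count device IDs in a comma-separated string such as "0,1,2,3"."""
    ids = [part.strip() for part in gpus.split(",") if part.strip()]
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate device id in {gpus!r}")
    return len(ids)


# Example: eight devices listed in run.sh would correspond to ngpu=8.
print(count_gpus("0,1,2,3,4,5,6,7"))  # prints 8
```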
PYTHONWARNINGS='ignore:semaphore_tracker:UserWarning' ./run.sh --stage 1 --stop-stage 1
We use Parallel WaveGAN as the neural vocoder. Download the pretrained Parallel WaveGAN model pwg_baker_ckpt_0.4.zip and unzip it.
unzip pwg_baker_ckpt_0.4.zip
The Parallel WaveGAN checkpoint contains the files listed below.
pwg_baker_ckpt_0.4
├── pwg_default.yaml # default config used to train parallel wavegan
├── pwg_snapshot_iter_400000.pdz # model parameters of parallel wavegan
└── pwg_stats.npy # statistics used to normalize spectrogram when training parallel wavegan
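To illustrate what "statistics used to normalize spectrogram" means, here is a minimal pure-Python sketch of per-dimension mean and standard-deviation normalization. It assumes the statistics consist of one mean and one standard deviation per mel bin. The actual layout of pwg_stats.npy is not described here, and the function names are hypothetical.

```python
def normalize(frames, mean, std):
    """Apply (x - mean) / std to every frame, per feature dimension."""
    return [[(x - m) / s for x, m, s in zip(frame, mean, std)] for frame in frames]


def denormalize(frames, mean, std):
    """Invert normalize(): x * std + mean."""
    return [[x * s + m for x, m, s in zip(frame, mean, std)] for frame in frames]


# Toy example with two frames of three "mel bins" each (made-up values).
mean = [1.0, 2.0, 3.0]
std = [2.0, 2.0, 2.0]
frames = [[1.0, 4.0, 7.0], [3.0, 2.0, 5.0]]
normed = normalize(frames, mean, std)
print(normed)  # [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0]]
```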
Run synthesizing
Set the ckpt_name parameter in run.sh to the name of the weight file produced by training.
Add the parameter providers=['CUDAExecutionProvider'] to the onnxruntime.InferenceSession call at line 80 of PaddleSpeech/paddlespeech/t2s/frontend/g2pw/onnx_api.py, like below:
self.session_g2pW = onnxruntime.InferenceSession(
    os.path.join(uncompress_path, 'g2pW.onnx'),
    sess_options=sess_options, providers=['CUDAExecutionProvider'])
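Hard-coding `CUDAExecutionProvider` assumes a GPU-enabled ONNX Runtime build. One possible alternative is to choose from the providers that are actually available and fall back to the CPU. The helper `choose_providers` below is hypothetical and not part of PaddleSpeech. In real use, the `available` list would come from `onnxruntime.get_available_providers()`.

```python
def choose_providers(available,
                     preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Return the preferred providers that are available, in priority order."""
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError("none of the preferred execution providers is available")
    return chosen


# Example list for a GPU-enabled build.
print(choose_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
# Example list for a CPU-only build.
print(choose_providers(["CPUExecutionProvider"]))
```

The result could then be passed as `providers=choose_providers(onnxruntime.get_available_providers())` in place of the fixed list.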
./run.sh --stage 2 --stop-stage 3
./run.sh --stage 4 --stop-stage 4
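The `--stage` and `--stop-stage` options select a contiguous range of steps. Below is a minimal sketch of that gating logic, assuming step N runs when stage <= N <= stop-stage, which is the usual convention in recipes of this kind. The actual run.sh is not reproduced here, and `last_stage=4` is simply the highest stage used in this document.

```python
def stages_to_run(stage, stop_stage, last_stage=4):
    """List the step numbers executed for a given --stage/--stop-stage pair."""
    return [n for n in range(last_stage + 1) if stage <= n <= stop_stage]


print(stages_to_run(0, 0))  # [0]
print(stages_to_run(2, 3))  # [2, 3]
print(stages_to_run(4, 4))  # [4]
```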
| GPUs | avg_ips | l1 loss | duration loss | pitch loss | energy loss | loss |
|---|---|---|---|---|---|---|
| BI 100 × 8 | 71.19 sequences/sec | 0.603 | 0.037 | 0.327 | 0.151 | 1.118 |
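The reported values are consistent with the total loss being the sum of the four component losses. The check below is only arithmetic on the numbers in the table. That the training code combines the terms with equal weights is an assumption inferred from these figures.

```python
import math

# Component losses copied from the results table.
components = {
    "l1 loss": 0.603,
    "duration loss": 0.037,
    "pitch loss": 0.327,
    "energy loss": 0.151,
}
reported_total = 1.118

total = sum(components.values())
print(round(total, 3))  # 1.118
assert math.isclose(total, reported_total, abs_tol=5e-4)
```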