Please note that if you have not explicitly discussed your planned development with Andy and Leo (the owners of the repository), by default it will not be merged into S3PRL and is better maintained in your own repository. Please see our pull request policy. Thanks for your understanding!
This is an open source toolkit called s3prl, which stands for Self-Supervised Speech Pre-training and Representation Learning. Self-supervised speech pre-trained models are called upstream models in this toolkit, and they are utilized in various downstream tasks.
The toolkit has three major usages:
Below is an intuitive illustration of how this toolkit may help you:
Feel free to use or modify our toolkit in your research. Here is a list of papers using our toolkit. Questions, bug reports, and improvement suggestions are all welcome; please open a new issue.
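As a rough sketch of the upstream interface described above (assuming `s3prl` and `torch` are installed; the `hub` entry point, the `"mockingjay"` upstream name, and the `"hidden_states"` output key follow the toolkit's documented usage, but please check the README under each upstream folder for model-specific details):

```python
import torch
import s3prl.hub as hub

# Load a pretrained upstream by name; each upstream is a torch.nn.Module.
model = getattr(hub, "mockingjay")()
model.eval()

# Upstreams take a list of variable-length mono waveforms sampled at 16 kHz.
wavs = [torch.randn(16000 * 2) for _ in range(4)]

with torch.no_grad():
    # The upstream returns a dict; "hidden_states" holds the layer-wise
    # representations that downstream tasks consume (e.g., via a learned
    # weighted sum over layers).
    reps = model(wavs)["hidden_states"]
```

This is only a sketch of the common pattern; downstream training itself is driven by the toolkit's own scripts and configs.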
If you find this toolkit helpful to your research, please do consider citing our papers, thanks!
Documentation for each upstream model can be found in the README.md under each upstream folder, e.g., upstream/pase/README.md.

To contribute, create a new branch from master, e.g., you create a branch new-awesome-feature.

The majority of S3PRL Toolkit is licensed under the Apache License version 2.0; however, all the files authored by Facebook, Inc. (which have an explicit copyright statement at the top) are licensed under CC-BY-NC.
@article{mockingjay,
title={Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders},
ISBN={9781509066315},
url={http://dx.doi.org/10.1109/ICASSP40776.2020.9054458},
doi={10.1109/ICASSP40776.2020.9054458},
journal={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
publisher={IEEE},
author={Liu, Andy T. and Yang, Shu-wen and Chi, Po-Han and Hsu, Po-chun and Lee, Hung-yi},
year={2020},
month={May}
}
@misc{tera,
title={TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech},
author={Andy T. Liu and Shang-Wen Li and Hung-yi Lee},
year={2020},
eprint={2007.06028},
archivePrefix={arXiv},
primaryClass={eess.AS}
}
@inproceedings{audio_albert,
title={Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation},
author={Po-Han Chi and Pei-Hung Chung and Tsung-Han Wu and Chun-Cheng Hsieh and Shang-Wen Li and Hung-yi Lee},
year={2020},
booktitle={SLT 2020},
}
@inproceedings{understanding_sat,
author={Shu-wen Yang and Andy T. Liu and Hung-yi Lee},
title={{Understanding Self-Attention of Self-Supervised Audio Transformers}},
year=2020,
booktitle={Proc. Interspeech 2020},
pages={3785--3789},
doi={10.21437/Interspeech.2020-2231},
url={http://dx.doi.org/10.21437/Interspeech.2020-2231}
}
Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning (Wu et al., 2020), code for computing LNSR: utility/observe_lnsr.py
@inproceedings{mockingjay_defense,
author={Haibin Wu and Andy T. Liu and Hung-yi Lee},
title={{Defense for Black-Box Attacks on Anti-Spoofing Models by Self-Supervised Learning}},
year=2020,
booktitle={Proc. Interspeech 2020},
pages={3780--3784},
doi={10.21437/Interspeech.2020-2026},
url={http://dx.doi.org/10.21437/Interspeech.2020-2026}
}
@misc{asv_ssl,
title={Adversarial defense for automatic speaker verification by cascaded self-supervised learning models},
author={Haibin Wu and Xu Li and Andy T. Liu and Zhiyong Wu and Helen Meng and Hung-yi Lee},
year={2021},
eprint={2102.07047},
archivePrefix={arXiv},
primaryClass={eess.AS}
}
@misc{s2vc,
title={S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations},
author={Jheng-hao Lin and Yist Y. Lin and Chung-Ming Chien and Hung-yi Lee},
year={2021},
eprint={2104.02901},
archivePrefix={arXiv},
primaryClass={eess.AS}
}
SUPERB: Speech processing Universal PERformance Benchmark (Yang et al., 2021)
@misc{superb,
title={SUPERB: Speech processing Universal PERformance Benchmark},
author={Shu-wen Yang and Po-Han Chi and Yung-Sung Chuang and Cheng-I Jeff Lai and Kushal Lakhotia and Yist Y. Lin and Andy T. Liu and Jiatong Shi and Xuankai Chang and Guan-Ting Lin and Tzu-Hsien Huang and Wei-Cheng Tseng and Ko-tik Lee and Da-Rong Liu and Zili Huang and Shuyan Dong and Shang-Wen Li and Shinji Watanabe and Abdelrahman Mohamed and Hung-yi Lee},
year={2021},
eprint={2105.01051},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Utilizing Self-supervised Representations for MOS Prediction (Tseng et al., 2021)
@misc{ssr_mos,
title={Utilizing Self-supervised Representations for MOS Prediction},
author={Wei-Cheng Tseng and Chien-yu Huang and Wei-Tsung Kao and Yist Y. Lin and Hung-yi Lee},
year={2021},
eprint={2104.03017},
archivePrefix={arXiv},
primaryClass={eess.AS}
}
If you find this toolkit useful, please consider citing the following papers.
@misc{tera,
title={TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech},
author={Andy T. Liu and Shang-Wen Li and Hung-yi Lee},
year={2020},
eprint={2007.06028},
archivePrefix={arXiv},
primaryClass={eess.AS}
}
@article{mockingjay,
title={Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders},
ISBN={9781509066315},
url={http://dx.doi.org/10.1109/ICASSP40776.2020.9054458},
doi={10.1109/ICASSP40776.2020.9054458},
journal={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
publisher={IEEE},
author={Liu, Andy T. and Yang, Shu-wen and Chi, Po-Han and Hsu, Po-chun and Lee, Hung-yi},
year={2020},
month={May}
}
@inproceedings{yang21c_interspeech,
author={Shu-wen Yang and Po-Han Chi and Yung-Sung Chuang and Cheng-I Jeff Lai and Kushal Lakhotia and Yist Y. Lin and Andy T. Liu and Jiatong Shi and Xuankai Chang and Guan-Ting Lin and Tzu-Hsien Huang and Wei-Cheng Tseng and Ko-tik Lee and Da-Rong Liu and Zili Huang and Shuyan Dong and Shang-Wen Li and Shinji Watanabe and Abdelrahman Mohamed and Hung-yi Lee},
title={{SUPERB: Speech Processing Universal PERformance Benchmark}},
year=2021,
booktitle={Proc. Interspeech 2021},
pages={1194--1198},
doi={10.21437/Interspeech.2021-1775}
}