TianYuan committed on 2022-08-30 20:29: Update README.md


Speaker Verification

Introduction

Speaker verification, in this demo, refers to the problem of extracting a speaker embedding from an audio clip.

This demo extracts a speaker embedding from a given audio file. It can be run with a single command or with a few lines of Python using PaddleSpeech.

Usage

1. Installation

See installation.

You can install paddlespeech in one of three ways: easy, medium, or hard.

2. Prepare Input File

The input of this CLI demo should be a WAV file (.wav), and its sample rate must match the sample rate of the model.

Sample files for this demo can be downloaded with:
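
Before running the demo, it can help to confirm that a file's sample rate matches the model's. The following is a minimal sketch that uses only Python's standard `wave` module, which reads uncompressed PCM WAV files. The helper name `check_sample_rate` is hypothetical and not part of PaddleSpeech; the 16000 Hz default mirrors the `sample_rate` default listed under Arguments.

```python
import wave


def check_sample_rate(wav_path, expected_rate=16000):
    """Return True if the WAV file's sample rate equals expected_rate.

    check_sample_rate is a hypothetical helper, not part of PaddleSpeech.
    """
    with wave.open(wav_path, "rb") as wav_file:
        return wav_file.getframerate() == expected_rate
```

For example, `check_sample_rate("85236145389.wav")` returns False when the file would need resampling before it is passed to a 16 kHz model.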

wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/123456789.wav

3. Usage

  • Command Line (Recommended)

    paddlespeech vector --task spk --input 85236145389.wav
    
    echo -e "demo1 85236145389.wav" > vec.job
    paddlespeech vector --task spk --input vec.job
    
    echo -e "demo2 85236145389.wav \n demo3 85236145389.wav" | paddlespeech vector --task spk
    
    paddlespeech vector --task score --input "./85236145389.wav ./123456789.wav"
    
    echo -e "demo4 85236145389.wav 85236145389.wav \n demo5 85236145389.wav 123456789.wav" > vec.job
    paddlespeech vector --task score --input vec.job
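
The job files in the commands above hold one job per line: an utterance id followed by one audio path for the `spk` task or two audio paths for the `score` task. For larger batches, such a file could be generated from Python instead of `echo`. This is a sketch under that reading of the examples; `write_job_file` is a hypothetical helper, not part of PaddleSpeech.

```python
def write_job_file(job_path, entries):
    """Write one job per line: an utterance id followed by its audio path(s).

    entries is a list of (utt_id, [paths]) pairs. One path per entry matches
    the spk examples, two paths per entry match the score examples.
    write_job_file is a hypothetical helper, not part of PaddleSpeech.
    """
    with open(job_path, "w", encoding="utf-8") as job_file:
        for utt_id, paths in entries:
            job_file.write(" ".join([utt_id, *paths]) + "\n")
```

Calling `write_job_file("vec.job", [("demo4", ["85236145389.wav", "85236145389.wav"]), ("demo5", ["85236145389.wav", "123456789.wav"])])` writes the same two jobs as the last `echo` command above.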

    Usage:

    paddlespeech vector --help

    Arguments:

    • input (required): Audio to process: a WAV file, a pair of WAV files for the score task, or a job file. As in the pipe example above, jobs can also be supplied on standard input.
    • task: Vector task to run, either spk or score. Default: spk.
    • model: Model type of vector task. Default: ecapatdnn_voxceleb12.
    • sample_rate: Sample rate of the model. Default: 16000.
    • config: Config of vector task. Use pretrained model when it is None. Default: None.
    • ckpt_path: Model checkpoint. Use pretrained model when it is None. Default: None.
    • device: Device on which to run model inference. Default: the default device of PaddlePaddle in the current environment.

    Output:

      demo [ -1.3251206    7.8606825   -4.620626     0.3000721    2.2648535
      -1.1931441    3.0647137    7.673595    -6.0044727  -12.02426
      -1.9496069    3.1269536    1.618838    -7.6383104   -1.2299773
    -12.338331     2.1373026   -5.3957124    9.717328     5.6752305
      3.7805123    3.0597172    3.429692     8.97601     13.174125
      -0.53132284   8.9424715    4.46511     -4.4262476   -9.726503
      8.399328     7.2239175   -7.435854     2.9441683   -4.3430395
    -13.886965    -1.6346735  -10.9027405   -5.311245     3.8007221
      3.8976038   -2.1230774   -2.3521194    4.151031    -7.4048667
      0.13911647   2.4626107    4.9664545    0.9897574    5.4839754
      -3.3574002   10.1340065   -0.6120171  -10.403095     4.6007543
      16.00935     -7.7836914   -4.1945305   -6.9368606    1.1789556
      11.490801     4.2380238    9.550931     8.375046     7.5089145
      -0.65707296  -0.30051577   2.8406055    3.0828028    0.730817
      6.148354     0.13766119 -13.424735    -7.7461405   -2.3227983
      -8.305252     2.9879124  -10.995229     0.15211068  -2.3820348
      -1.7984174    8.495629    -5.8522367   -3.755498     0.6989711
      -5.2702994   -2.6188622   -1.8828466   -4.64665     14.078544
      -0.5495333   10.579158    -3.2160501    9.349004    -4.381078
    -11.675817    -2.8630207    4.5721755    2.246612    -4.574342
      1.8610188    2.3767874    5.6257877   -9.784078     0.64967257
      -1.4579505    0.4263264   -4.9211264   -2.454784     3.4869802
      -0.42654222   8.341269     1.356552     7.0966883  -13.102829
      8.016734    -7.1159344    1.8699781    0.208721    14.699384
      -1.025278    -2.6107233   -2.5082312    8.427193     6.9138527
      -6.2912464    0.6157366    2.489688    -3.4668267    9.921763
      11.200815    -0.1966403    7.4916005   -0.62312716  -0.25848144
      -9.947997    -0.9611041    1.1649219   -2.1907122   -1.5028487
      -0.51926106  15.165954     2.4649463   -0.9980445    7.4416637
      -2.0768049    3.5896823   -7.3055434   -7.5620847    4.323335
      0.0804418   -6.56401     -2.3148053   -1.7642345   -2.4708817
      -7.675618    -9.548878    -1.0177554    0.16986446   2.5877135
      -1.8752296   -0.36614323  -6.0493784   -2.3965611   -5.9453387
      0.9424033  -13.155974    -7.457801     0.14658108  -3.742797
      5.8414927   -1.2872906    5.5694313   12.57059      1.0939219
      2.2142086    1.9181576    6.9914207   -5.888139     3.1409824
      -2.003628     2.4434285    9.973139     5.03668      2.0051203
      2.8615603    5.860224     2.9176188   -1.6311141    2.0292206
      -4.070415    -6.831437  ]
  • Python API

    import paddle  # needed for paddle.get_device() below
    from paddlespeech.cli.vector import VectorExecutor
    
    vector_executor = VectorExecutor()
    
    # Extract the embedding of the first (enrollment) audio file.
    audio_emb = vector_executor(
        model='ecapatdnn_voxceleb12',
        sample_rate=16000,
        config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
        ckpt_path=None,
        audio_file='./85236145389.wav',
        device=paddle.get_device())
    print('Audio embedding Result: \n{}'.format(audio_emb))
    
    # Extract the embedding of the second (test) audio file.
    test_emb = vector_executor(
        model='ecapatdnn_voxceleb12',
        sample_rate=16000,
        config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
        ckpt_path=None,
        audio_file='./123456789.wav',
        device=paddle.get_device())
    print('Test embedding Result: \n{}'.format(test_emb))
    
    # score range [0, 1]
    score = vector_executor.get_embeddings_score(audio_emb, test_emb)
    print(f"Eembeddings Score: {score}")

    Output:

    # Vector Result:
     Audio embedding Result:
      [ -1.3251206    7.8606825   -4.620626     0.3000721    2.2648535
        -1.1931441    3.0647137    7.673595    -6.0044727  -12.02426
        -1.9496069    3.1269536    1.618838    -7.6383104   -1.2299773
      -12.338331     2.1373026   -5.3957124    9.717328     5.6752305
        3.7805123    3.0597172    3.429692     8.97601     13.174125
        -0.53132284   8.9424715    4.46511     -4.4262476   -9.726503
        8.399328     7.2239175   -7.435854     2.9441683   -4.3430395
      -13.886965    -1.6346735  -10.9027405   -5.311245     3.8007221
        3.8976038   -2.1230774   -2.3521194    4.151031    -7.4048667
        0.13911647   2.4626107    4.9664545    0.9897574    5.4839754
        -3.3574002   10.1340065   -0.6120171  -10.403095     4.6007543
        16.00935     -7.7836914   -4.1945305   -6.9368606    1.1789556
        11.490801     4.2380238    9.550931     8.375046     7.5089145
        -0.65707296  -0.30051577   2.8406055    3.0828028    0.730817
        6.148354     0.13766119 -13.424735    -7.7461405   -2.3227983
        -8.305252     2.9879124  -10.995229     0.15211068  -2.3820348
        -1.7984174    8.495629    -5.8522367   -3.755498     0.6989711
        -5.2702994   -2.6188622   -1.8828466   -4.64665     14.078544
        -0.5495333   10.579158    -3.2160501    9.349004    -4.381078
      -11.675817    -2.8630207    4.5721755    2.246612    -4.574342
        1.8610188    2.3767874    5.6257877   -9.784078     0.64967257
        -1.4579505    0.4263264   -4.9211264   -2.454784     3.4869802
        -0.42654222   8.341269     1.356552     7.0966883  -13.102829
        8.016734    -7.1159344    1.8699781    0.208721    14.699384
        -1.025278    -2.6107233   -2.5082312    8.427193     6.9138527
        -6.2912464    0.6157366    2.489688    -3.4668267    9.921763
        11.200815    -0.1966403    7.4916005   -0.62312716  -0.25848144
        -9.947997    -0.9611041    1.1649219   -2.1907122   -1.5028487
        -0.51926106  15.165954     2.4649463   -0.9980445    7.4416637
        -2.0768049    3.5896823   -7.3055434   -7.5620847    4.323335
        0.0804418   -6.56401     -2.3148053   -1.7642345   -2.4708817
        -7.675618    -9.548878    -1.0177554    0.16986446   2.5877135
        -1.8752296   -0.36614323  -6.0493784   -2.3965611   -5.9453387
        0.9424033  -13.155974    -7.457801     0.14658108  -3.742797
        5.8414927   -1.2872906    5.5694313   12.57059      1.0939219
        2.2142086    1.9181576    6.9914207   -5.888139     3.1409824
        -2.003628     2.4434285    9.973139     5.03668      2.0051203
        2.8615603    5.860224     2.9176188   -1.6311141    2.0292206
        -4.070415    -6.831437  ]
      # get the test embedding
      Test embedding Result:
      [  2.5247195    5.119042    -4.335273     4.4583654    5.047907
        3.5059214    1.6159848    0.49364898 -11.6899185   -3.1014526
        -5.6589785   -0.42684984   2.674276   -11.937654     6.2248464
      -10.776924    -5.694543     1.112041     1.5709964    1.0961034
        1.3976512    2.324352     1.339981     5.279319    13.734659
        -2.5753925   13.651442    -2.2357535    5.1575427   -3.251567
        1.4023279    6.1191974   -6.0845175   -1.3646189   -2.6789894
      -15.220778     9.779349    -9.411551    -6.388947     6.8313975
        -9.245996     0.31196198   2.5509644   -4.413065     6.1649427
        6.793837     2.6328635    8.620976     3.4832475    0.52491665
        2.9115407    5.8392377    0.6702376   -3.2726715    2.6694255
        16.91701     -5.5811176    0.23362345  -4.5573606  -11.801059
        14.728292    -0.5198082   -3.999922     7.0927105   -7.0459595
        -5.4389      -0.46420583  -5.1085467   10.376568    -8.889225
        -0.37705845  -1.659806     2.6731026   -7.1909504    1.4608804
        -2.163136    -0.17949677   4.0241547    0.11319201   0.601279
        2.039692     3.1910992  -11.649526    -8.121584    -4.8707457
        0.3851982    1.4231744   -2.3321972    0.99332285  14.121717
        5.899413     0.7384519  -17.760096    10.555021     4.1366534
        -0.3391071   -0.20792882   3.208204     0.8847948   -8.721497
        -6.432868    13.006379     4.8956      -9.155822    -1.9441519
        5.7815638   -2.066733    10.425042    -0.8802383   -2.4314315
        -9.869258     0.35095334  -5.3549943    2.1076174   -8.290468
        8.4433365   -4.689333     9.334139    -2.172678    -3.0250976
        8.394216    -3.2110903   -7.93868      2.3960824   -2.3213403
        -1.4963245   -3.476059     4.132903   -10.893354     4.362673
        -0.45456508  10.258634    -1.1655927   -6.7799754    0.22885278
        -4.399287     2.333433    -4.84745     -4.2752337   -1.3577863
        -1.0685898    9.505196     7.3062205    0.08708266  12.927811
        -9.57974      1.3936648   -1.9444873    5.776769    15.251903
        10.6118355   -1.4903594   -9.535318    -3.6553776   -1.6699586
        -0.5933151    7.600357    -4.8815503   -8.698617   -15.855757
        0.25632986  -7.2235737    0.9506656    0.7128582   -9.051738
        8.74869     -1.6426028   -6.5762258    2.506905    -6.7431564
        5.129912   -12.189555    -3.6435068   12.068113    -6.0059533
        -2.3535995    2.9014351   22.3082      -1.5563312   13.193291
        2.7583609   -7.468798     1.3407065   -4.599617    -6.2345777
        10.7689295    7.137627     5.099476     0.3473359    9.647881
        -2.0484571   -5.8549366 ]
      # get the score between enroll and test
      Eembeddings Score: 0.45332613587379456
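
A score like the one above summarizes how close two embeddings are. One common way to compare speaker embeddings is cosine similarity, followed by a threshold to decide whether both clips come from the same speaker. The sketch below assumes that metric purely for illustration and is not taken from the `get_embeddings_score` implementation; `cosine_similarity`, `same_speaker`, and the threshold value are hypothetical.

```python
import math


def cosine_similarity(emb_a, emb_b):
    """Cosine of the angle between two equal-length vectors."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same length")
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold=0.5):
    """Accept the pair as one speaker when the similarity reaches the threshold.

    The default threshold is a placeholder; a usable value has to be tuned
    on labelled data.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold
```

With embeddings returned as arrays, `same_speaker(list(audio_emb), list(test_emb))` would apply the same decision rule to the two vectors printed above.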

4. Pretrained Models

Here is a list of pretrained models released by PaddleSpeech that can be used from the command line and the Python API:

Model                   Sample Rate
ecapatdnn_voxceleb12    16k