
🇨🇳Chinese | 🌐English | 📖Docs | 🤖Models


pycorrector: useful python text correction toolkit


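pycorrector: a Chinese text correction toolkit. It corrects phonetically similar, visually similar, and grammatical errors in Chinese text, and is developed in Python 3.

pycorrector implements text correction with a range of models, including Kenlm, ConvSeq2Seq, BERT, MacBERT, ELECTRA, ERNIE, and Transformer, and evaluates each of them on the SIGHAN dataset.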


Introduction

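The Chinese text correction task covers several common error types.

Not every error type appears in every business scenario. For example, proofreading for pinyin input methods and speech recognition focuses on phonetically similar errors; proofreading for Wubi input methods and OCR focuses on visually similar errors; search-engine query correction has to handle all error types.

This project focuses on phonetically similar, visually similar, grammatical, and proper-noun errors.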

News

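[2023/11/07] v1.0.0: added GPT models such as ChatGLM3 and LLaMA2 for Chinese text correction; released shibing624/chatglm3-6b-csc-chinese-lora, a spelling and grammar correction model based on ChatGLM3-6B; rewrote the implementations of DeepContext, ConvSeq2Seq, T5, and other models. See Release-v1.0.0 for details.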

Features

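  • Kenlm model: this project trained a Chinese n-gram language model with the Kenlm statistical language modeling toolkit. Combined with rule-based methods and a confusion set, it can correct Chinese spelling errors. It is fast and easy to extend; accuracy is moderate
  • DeepContext model: this project implements the DeepContext model for text correction in PyTorch. Its architecture follows Stanford University's NLC model, which took first place in a 2014 English error correction competition; accuracy is moderate
  • Seq2Seq model: this project implements the ConvSeq2Seq model for Chinese text correction in PyTorch. Used as a single model, it took third place in the NLPCC-2018 Chinese grammatical error correction competition. It can be trained in parallel and converges quickly; accuracy is moderate
  • T5 model: this project implements the T5 model for Chinese text correction in PyTorch, fine-tuning the Langboat/mengzi-t5-base pretrained model on Chinese correction datasets. The model has considerable room for further adaptation; accuracy is good
  • ERNIE_CSC model: this project implements the ERNIE_CSC model for Chinese text correction in PaddlePaddle. It is fine-tuned from ERNIE-1.0 with an architecture adapted to Chinese spelling correction; accuracy is good
  • MacBERT model [recommended]: this project implements the MacBERT4CSC model for Chinese text correction in PyTorch. It adds error detection and correction networks adapted to Chinese spelling correction; accuracy is good
  • GPT model: this project implements ChatGLM/LLaMA models for Chinese text correction in PyTorch. They are fine-tuned on Chinese CSC and grammatical error correction datasets and adapted to Chinese text correction; accuracy is good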

Demo

Run examples/macbert/gradio_demo.py to see the demo:

python examples/macbert/gradio_demo.py

Evaluation

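An evaluation script is provided at examples/evaluate_models/evaluate_models.py

  • Evaluation set: sighan15, the SIGHAN2015 test set at pycorrector/data/sighan2015_test.tsv, already converted to Simplified Chinese
  • Metric: correction precision and recall, computed strictly at the sentence level: a prediction counts as correct only if the corrected sentence is identical to the reference sentence; otherwise it counts as wrong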
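The sentence-level metric can be sketched in a few lines. This is a minimal illustration, not the project's evaluation script: the function name `sentence_level_scores` and the exact rule for counting true positives, false positives, and false negatives are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def sentence_level_scores(samples: List[Tuple[str, str, str]]) -> Dict[str, float]:
    """Compute strict sentence-level precision, recall and F1.

    Each sample is (source, reference, prediction). Counting rule assumed here:
    a sentence that needs correction is a true positive only if the prediction
    equals the reference exactly, otherwise a false negative; a sentence that
    needs no correction is a false positive if the model changed it.
    """
    tp = fp = fn = 0
    for source, reference, prediction in samples:
        if source != reference:          # the sentence contains errors
            if prediction == reference:
                tp += 1
            else:
                fn += 1
        elif prediction != reference:    # a correct sentence was altered
            fp += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


samples = [
    ("今天新情很好", "今天心情很好", "今天心情很好"),  # fixed correctly
    ("少先队员因该为老人让坐", "少先队员应该为老人让座", "少先队员应该为老人让坐"),  # only partly fixed
    ("我也很高兴。", "我也很高兴。", "我也很高心。"),  # correct sentence wrongly changed
    ("今天天气很好", "今天天气很好", "今天天气很好"),  # correct sentence left alone
]
scores = sentence_level_scores(samples)
print(scores)
```

Under this rule a partly corrected sentence earns no credit, which is why sentence-level scores are usually lower than character-level ones.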

Evaluation results

Evaluation dataset: SIGHAN2015 test set

GPU: Tesla V100, 32 GB memory

| Model Name | Model Link | Base Model | Device | Precision | Recall | F1 | QPS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Kenlm-CSC | shibing624/chinese-kenlm-klm | kenlm | CPU | 0.6860 | 0.1529 | 0.2500 | 9 |
| BART-CSC | shibing624/bart4csc-base-chinese | fnlp/bart-base-chinese | GPU | 0.6984 | 0.6354 | 0.6654 | 58 |
| Mengzi-T5-CSC | shibing624/mengzi-t5-base-chinese-correction | mengzi-t5-base | GPU | 0.8321 | 0.6390 | 0.7229 | 214 |
| MacBERT-CSC | shibing624/macbert4csc-base-chinese | hfl/chinese-macbert-base | GPU | 0.8254 | 0.7311 | 0.7754 | 224 |
| ChatGLM3-6B-CSC | shibing624/chatglm3-6b-csc-chinese-lora | THUDM/chatglm3-6b | GPU | 0.5574 | 0.4917 | 0.5225 | 4 |

Conclusion

Install

pip install -U pycorrector

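or install from source:

git clone https://github.com/shibing624/pycorrector.git
cd pycorrector
pip install -r requirements.txt
pip install --no-deps .

Either of the two methods above completes the installation. If you prefer not to install the dependencies, you can pull the Docker environment instead.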

  • Using Docker
docker run -it -v ~/.pycorrector:/root/.pycorrector shibing624/pycorrector:0.0.2

After that, run python inside the container. The image already has kenlm, pycorrector, and the other packages installed; see the Dockerfile for details


Usage

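One of the original goals of this project is to compare and survey different Chinese text correction methods, in the hope of prompting better ideas.

The project applies kenlm, macbert, seq2seq, ernie_csc, T5, deepcontext, LLaMA, and other models to the text correction task. Each model can predict right away with an already-trained correction model, or be trained and used for prediction on your own data.

kenlm model (statistical model)

Chinese spelling correction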

example: examples/kenlm/demo.py

from pycorrector import Corrector
m = Corrector()
print(m.correct_batch(['少先队员因该为老人让坐', '你找到你最喜欢的工作,我也很高心。']))

output:

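[{'source': '少先队员因该为老人让坐', 'target': '少先队员应该为老人让座', 'errors': [('因该', '应该', 4), ('坐', '座', 10)]},
{'source': '你找到你最喜欢的工作,我也很高心。', 'target': '你找到你最喜欢的工作,我也很高兴。', 'errors': [('心', '兴', 15)]}]
  • The Corrector() class implements correction with the kenlm statistical model. By default it loads the kenlm language model file from ~/.pycorrector/datasets/zh_giga.no_cna_cmn.prune01244.klm; if the file is not found, the program downloads it automatically. You can also download the model file (2.8G) manually and place it at that path
  • Return value: the correct method returns a dict, {'source': 'original sentence', 'target': 'corrected sentence', 'errors': [('wrong word', 'correct word', 'error position'), ...]}; the correct_batch method returns a list of such dicts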
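The relationship between the three fields of the result dict can be made concrete. The sketch below uses only the standard library and does not call pycorrector; `apply_errors` is a hypothetical helper that replays the `errors` tuples on `source` and should reproduce `target`, assuming each position is a 0-based character index into the source sentence as in the sample output.

```python
from typing import List, Tuple


def apply_errors(source: str, errors: List[Tuple[str, str, int]]) -> str:
    """Rebuild the corrected sentence from (wrong, right, position) tuples."""
    text = source
    # Apply edits from right to left so earlier positions stay valid
    # even if a replacement has a different length than the original.
    for wrong, right, pos in sorted(errors, key=lambda e: e[2], reverse=True):
        if text[pos:pos + len(wrong)] != wrong:
            raise ValueError(f"expected {wrong!r} at position {pos}")
        text = text[:pos] + right + text[pos + len(wrong):]
    return text


# Result dict copied from the sample output of Corrector.correct_batch.
result = {
    'source': '少先队员因该为老人让坐',
    'target': '少先队员应该为老人让座',
    'errors': [('因该', '应该', 4), ('坐', '座', 10)],
}
rebuilt = apply_errors(result['source'], result['errors'])
print(rebuilt == result['target'])
```

A check like this is handy when post-processing results, for example to confirm that the positions still line up after the source text has been normalized.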

Error detection

example: examples/kenlm/detect_demo.py

from pycorrector import Corrector
m = Corrector()
idx_errors = m.detect('少先队员因该为老人让坐')
print(idx_errors)

output:

[['因该', 4, 6, 'word'], ['坐', 10, 11, 'char']]
  • Return value: a list of [error_word, begin_pos, end_pos, error_type] entries; position indices start at 0.
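One way to use the detection result is to mark the detected spans inline. The sketch below is standard-library only; `mark_errors` is a hypothetical helper, and it assumes `end_pos` is exclusive, which is what the sample output suggests (positions 4 to 6 for the two-character word '因该').

```python
from typing import List, Sequence


def mark_errors(sentence: str, idx_errors: Sequence[Sequence]) -> str:
    """Wrap each detected span in brackets, tagged with its error type."""
    out: List[str] = []
    cursor = 0
    for _word, begin, end, err_type in sorted(idx_errors, key=lambda e: e[1]):
        out.append(sentence[cursor:begin])               # untouched text before the span
        out.append(f"[{sentence[begin:end]}|{err_type}]")  # the marked span itself
        cursor = end
    out.append(sentence[cursor:])                        # remaining tail
    return "".join(out)


# Detection result copied from the sample output of Corrector.detect.
idx_errors = [['因该', 4, 6, 'word'], ['坐', 10, 11, 'char']]
marked = mark_errors('少先队员因该为老人让坐', idx_errors)
print(marked)
```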

Idiom and proper-noun correction

example: examples/kenlm/use_custom_proper.py

from pycorrector import Corrector
m = Corrector(proper_name_path='./my_custom_proper.txt')
x = ['报应接中迩来', '这块名表带带相传',]
for i in x:
    print(i, ' -> ', m.correct(i))

output:

报应接中迩来  ->  {'source': '报应接踵而来', 'target': '报应接踵而来', 'errors': [('接中迩来', '接踵而来', 2)]}
这块名表带带相传  ->  {'source': '这块名表代代相传', 'target': '这块名表代代相传', 'errors': [('带带相传', '代代相传', 4)]}

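Custom confusion set

Loading a custom confusion set lets users correct known errors. It serves two purposes: 1) [higher precision] whitelisting false positives; 2) [higher recall] adding corrections the model misses.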

example: examples/kenlm/use_custom_confusion.py

from pycorrector import Corrector

error_sentences = [
    '买iphonex,要多少钱',
    '共同实际控制人萧华、霍荣铨、张旗康',
]
m = Corrector()
print(m.correct_batch(error_sentences))
print('*' * 42)
m = Corrector(custom_confusion_path_or_dict='./my_custom_confusion.txt')
print(m.correct_batch(error_sentences))

output:

('买iphonex,要多少钱', [])   # "iphonex" is missed; it should be "iphoneX"
('共同实际控制人萧华、霍荣铨、张启康', [('张旗康', '张启康', 14)]) # false positive: "张旗康" needs no correction
******************************************
('买iphonex,要多少钱', [('iphonex', 'iphoneX', 1)])
('共同实际控制人萧华、霍荣铨、张旗康', [])
  • The content of ./my_custom_confusion.txt uses the following format, with each pair separated by a space:
iPhone差 iPhoneX
张旗康 张旗康
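The idea behind a confusion set can be sketched without the library. The code below is an illustration under stated assumptions, not pycorrector's implementation: `load_confusion` and `apply_confusion` are hypothetical helpers, the entry `iphonex iphoneX` is made up for the demo, and a pair whose two sides are identical is treated as a whitelist entry that is left untouched.

```python
from typing import Dict, List, Tuple


def load_confusion(lines: List[str]) -> Dict[str, str]:
    """Parse 'variant correct' pairs, one per line, separated by whitespace."""
    confusion: Dict[str, str] = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:                 # skip blank or malformed lines
            confusion[parts[0]] = parts[1]
    return confusion


def apply_confusion(sentence: str,
                    confusion: Dict[str, str]) -> Tuple[str, List[Tuple[str, str, int]]]:
    """Replace known variants; identical pairs act as a whitelist (no change)."""
    errors: List[Tuple[str, str, int]] = []
    for variant, right in confusion.items():
        if variant == right:
            continue                        # whitelisted term: leave untouched
        pos = sentence.find(variant)
        while pos != -1:
            errors.append((variant, right, pos))
            sentence = sentence[:pos] + right + sentence[pos + len(variant):]
            # Resume the search after the inserted text to avoid re-matching it.
            pos = sentence.find(variant, pos + len(right))
    return sentence, sorted(errors, key=lambda e: e[2])


confusion = load_confusion(['iphonex iphoneX', '张旗康 张旗康'])
fixed, errors = apply_confusion('买iphonex,要多少钱', confusion)
print(fixed, errors)
```

In this sketch the whitelist entry simply produces no edit. In a full pipeline it would additionally have to suppress corrections proposed by the model for that term.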

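The ConfusionCorrector class for custom confusion sets can be used together with Corrector as shown above, together with MacBertCorrector, or on its own. Sample code: examples/macbert/model_correction_pipeline_demo.py

Custom language model

The default kenlm language model file that is downloaded and used, zh_giga.no_cna_cmn.prune01244.klm, is 2.8G, so pycorrector may strain machines with little memory.

Users can load a kenlm language model they trained themselves, or use a model trained on the 2014 People's Daily corpus. That model is small (140M) and slightly less accurate. Model download: shibing624/chinese-kenlm-klm | people2014corpus_chars.klm (password o5e9)

example: examples/kenlm/load_custom_language_model.py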

from pycorrector import Corrector
model = Corrector(language_model_path='people2014corpus_chars.klm')
print(model.correct('少先队员因该为老人让坐'))

English spelling correction

Word-level spelling correction for English is supported.

example: examples/kenlm/en_correct_demo.py

from pycorrector import EnSpellCorrector
m = EnSpellCorrector()
sent = "what happending? how to speling it, can you gorrect it?"
print(m.correct(sent))

output:

{'source': 'what happending? how to speling it, can you gorrect it?', 'target': 'what happening? how to spelling it, can you correct it?', 'errors': [('happending', 'happening', 5), ('speling', 'spelling', 24), ('gorrect', 'correct', 44)]}

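Simplified/Traditional Chinese conversion

Conversion from Traditional to Simplified Chinese and from Simplified to Traditional Chinese is supported.

example: examples/kenlm/traditional_simplified_chinese_demo.py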

import pycorrector

traditional_sentence = '憂郁的臺灣烏龜'
simplified_sentence = pycorrector.traditional2simplified(traditional_sentence)
print(traditional_sentence, '=>', simplified_sentence)

simplified_sentence = '忧郁的台湾乌龟'
traditional_sentence = pycorrector.simplified2traditional(simplified_sentence)
print(simplified_sentence, '=>', traditional_sentence)

output:

憂郁的臺灣烏龜 => 忧郁的台湾乌龟
忧郁的台湾乌龟 => 憂郁的臺灣烏龜

Command-line mode

Batch text correction with the kenlm method is supported

python -m pycorrector -h
usage: __main__.py [-h] -o OUTPUT [-n] [-d] input

@description:

positional arguments:
  input                 the input file path, file encode need utf-8.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        the output file path.
  -n, --no_char         disable char detect mode.
  -d, --detail          print detail info

case:

python -m pycorrector input.txt -o out.txt -n -d
  • Input file: input.txt; output file: out.txt; char-level correction disabled; detailed correction info printed; correction results are separated by \t

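MacBert4CSC model

A Chinese spelling correction model based on MacBERT with a modified network architecture. The model is open-sourced on HuggingFace Models: https://huggingface.co/shibing624/macbert4csc-base-chinese

Model architecture:

  • This project is a Chinese text correction model that modifies the MacBERT network architecture, and it supports BERT-style models as the backbone
  • It modifies the vanilla BERT model by adding a fully connected layer for error detection (the detection layer). During training, MacBERT4CSC weights the losses of the detection layer and the correction layer to get the final loss; at prediction time, only the correction weights of the BERT MLM are needed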
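The weighted loss described above can be illustrated with plain Python numbers. This is a sketch under assumptions, not the project's training code: it assumes the detection head is scored with binary cross-entropy and the correction head with cross-entropy, and the weight `w=0.3` is a hypothetical value chosen for the demo.

```python
import math
from typing import List


def binary_cross_entropy(probs: List[float], labels: List[int]) -> float:
    """Mean BCE over per-character 'is this character wrong?' probabilities."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(probs)


def cross_entropy(dists: List[List[float]], targets: List[int]) -> float:
    """Mean negative log-likelihood of the target character at each position."""
    eps = 1e-12
    return sum(-math.log(d[t] + eps) for d, t in zip(dists, targets)) / len(targets)


def combined_loss(det_loss: float, cor_loss: float, w: float) -> float:
    """Weighted sum of detection and correction losses (w is hypothetical)."""
    return w * det_loss + (1 - w) * cor_loss


# Two characters: the first is wrong (label 1), the second is fine (label 0).
det = binary_cross_entropy([0.9, 0.1], [1, 0])
# Toy two-character vocabulary; targets are the indices of the correct characters.
cor = cross_entropy([[0.7, 0.3], [0.2, 0.8]], [0, 1])
loss = combined_loss(det, cor, w=0.3)
print(det, cor, loss)
```

Because the detection term only enters the training loss, dropping the detection layer at prediction time leaves the correction output unchanged, which matches the note that the MLM correction weights alone are enough for inference.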


See examples/macbert/README.md for a detailed tutorial

Quick prediction with pycorrector

example: examples/macbert/demo.py

from pycorrector import MacBertCorrector
m = MacBertCorrector("shibing624/macbert4csc-base-chinese")
print(m.correct_batch(['今天新情很好', '你找到你最喜欢的工作,我也很高心。']))

output:

[{'source': '今天新情很好', 'target': '今天心情很好', 'errors': [('新', '心', 2)]},
{'source': '你找到你最喜欢的工作,我也很高心。', 'target': '你找到你最喜欢的工作,我也很高兴。', 'errors': [('心', '兴', 15)]}]

Quick prediction with transformers

See examples/macbert/README.md

T5 model

A Chinese spelling correction model based on T5. See examples/t5/README.md for a detailed training tutorial

Quick prediction with pycorrector

example: examples/t5/demo.py

from pycorrector import T5Corrector
m = T5Corrector()
print(m.correct_batch(['今天新情很好', '你找到你最喜欢的工作,我也很高心。']))

output:

[{'source': '今天新情很好', 'target': '今天心情很好', 'errors': [('新', '心', 2)]},
{'source': '你找到你最喜欢的工作,我也很高心。', 'target': '你找到你最喜欢的工作,我也很高兴。', 'errors': [('心', '兴', 15)]}]

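GPT model

Correction models fine-tuned from ChatGLM3, LLaMA, Baichuan, QWen, and similar models; see examples/gpt/README.md for the training method

The correction model fine-tuned with SFT on ChatGLM3-6B has been released on HuggingFace Models: https://huggingface.co/shibing624/chatglm3-6b-csc-chinese-lora

Quick prediction with pycorrector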

example: examples/gpt/demo.py

from pycorrector import GptCorrector
m = GptCorrector()
print(m.correct_batch(['今天新情很好', '你找到你最喜欢的工作,我也很高心。']))

output:

[{'source': '今天新情很好', 'target': '今天心情很好', 'errors': [('新', '心', 2)]},
{'source': '你找到你最喜欢的工作,我也很高心。', 'target': '你找到你最喜欢的工作,我也很高兴。', 'errors': [('心', '兴', 15)]}]

ErnieCSC model

A Chinese spelling correction model based on ERNIE; the model is open-sourced in PaddleNLP.

See examples/ernie_csc/README.md for a detailed tutorial

Quick prediction with pycorrector

example: examples/ernie_csc/demo.py

from pycorrector import ErnieCscCorrector

if __name__ == '__main__':
    error_sentences = [
        '真麻烦你了。希望你们好好的跳无',
        '少先队员因该为老人让坐',
    ]
    m = ErnieCscCorrector()
    batch_res = m.correct_batch(error_sentences)
    for i in batch_res:
        print(i)
        print()

output:

{'source': '真麻烦你了。希望你们好好的跳无', 'target': '真麻烦你了。希望你们好好的跳舞', 'errors': [{'position': 14, 'correction': {'无': '舞'}}]}
{'source': '少先队员因该为老人让坐', 'target': '少先队员应该为老人让座', 'errors': [{'position': 4, 'correction': {'因': '应'}}, {'position': 10, 'correction': {'坐': '座'}}]}
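Note that the `errors` entries here are dicts with `position` and `correction` keys, unlike the `(wrong, right, position)` tuples shown for the other correctors. If downstream code expects one shape, a small adapter helps. The sketch below is standard-library only, and `normalize_ernie_errors` is a hypothetical helper, not part of pycorrector.

```python
from typing import Dict, List, Tuple


def normalize_ernie_errors(errors: List[Dict]) -> List[Tuple[str, str, int]]:
    """Convert {'position': p, 'correction': {wrong: right}} items to tuples."""
    normalized: List[Tuple[str, str, int]] = []
    for item in errors:
        for wrong, right in item['correction'].items():
            normalized.append((wrong, right, item['position']))
    return sorted(normalized, key=lambda e: e[2])  # order by position in the sentence


# Error list copied from the ErnieCscCorrector sample output.
ernie_errors = [
    {'position': 4, 'correction': {'因': '应'}},
    {'position': 10, 'correction': {'坐': '座'}},
]
tuples = normalize_ernie_errors(ernie_errors)
print(tuples)
```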

Bart model

The Bart4CSC model, trained on the SIGHAN+Wang271K Chinese correction dataset, has been released on HuggingFace Models: https://huggingface.co/shibing624/bart4csc-base-chinese

from transformers import BertTokenizerFast
from textgen import BartSeq2SeqModel

tokenizer = BertTokenizerFast.from_pretrained('shibing624/bart4csc-base-chinese')
model = BartSeq2SeqModel(
    encoder_type='bart',
    encoder_decoder_type='bart',
    encoder_decoder_name='shibing624/bart4csc-base-chinese',
    tokenizer=tokenizer,
    args={"max_length": 128, "eval_batch_size": 128})
sentences = ["少先队员因该为老人让坐"]
print(model.predict(sentences))

output:

['少先队员应该为老人让座']

To train a Bart model, see https://github.com/shibing624/textgen/blob/main/examples/seq2seq/training_bartseq2seq_zh_demo.py

Dataset

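| Dataset | Corpus | Download link | Archive size |
| --- | --- | --- | --- |
| SIGHAN+Wang271K Chinese correction dataset | SIGHAN+Wang271K (270K samples) | Baidu Netdisk (password 01b9); shibing624/CSC | 106M |
| Original SIGHAN dataset | SIGHAN13 14 15 | Official csc.html | 339K |
| Original Wang271K dataset | Wang271K | Automatic-Corpus-Generation, provided by dimmywang | 93M |
| People's Daily 2014 corpus | People's Daily 2014 | Feishu (password cHcu) | 383M |
| NLPCC 2018 GEC official dataset | NLPCC2018-GEC | Official trainingdata | 114M |
| NLPCC 2018+HSK processed corpus | nlpcc2018+hsk+CGED | Baidu Netdisk (password m6fg); Feishu (password gl9y) | 215M |
| NLPCC 2018+HSK raw corpus | HSK+Lang8 | Baidu Netdisk (password n31j); Feishu (password Q9LH) | 81M |
| Collected Chinese correction competition data | Chinese Text Correction (CTC) | Collected Chinese correction dataset (Tianchi) | - |
| NLPCC 2023 Chinese grammatical error correction dataset | NLPCC 2023 Sharedtask1 | Task 1: Chinese Grammatical Error Correction (Training Set) | 125M |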

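Notes:

  • The SIGHAN+Wang271K Chinese correction dataset (270K samples) is obtained by converting the original SIGHAN 13, 14, and 15 datasets and the Wang271K dataset into a common format. It is in JSON format with error character positions; the SIGHAN part is test.json. The macbert4csc model can be trained directly on this dataset to reproduce the precision and recall reported in the paper; see pycorrector/macbert/README.md
  • NLPCC 2018 GEC official dataset, NLPCC2018-GEC: the training set trainingdata [114.5MB after decompression] is raw text without word segmentation.
  • Raw parallel corpus from the Hanyu Shuiping Kaoshi (HSK) and lang8 [HSK+Lang8], Baidu Netdisk (password n31j): this dataset is already word-segmented and can be used for data augmentation.
  • Data from NLPCC 2018 + HSK + CGED 16, 17, and 18, preprocessed by character-level segmentation, Traditional-to-Simplified conversion, and shuffling, yields the processed corpus for correction (nlpcc2018+hsk), Baidu Netdisk (password: m6fg) [1.3 million sentence pairs, 215MB]

Data format of the SIGHAN+Wang271K Chinese correction dataset: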

[
    {
        "id": "B2-4029-3",
        "original_text": "晚间会听到嗓音,白天的时候大家都不会太在意,但是在睡觉的时候这嗓音成为大家的恶梦。",
        "wrong_ids": [
            5,
            31
        ],
        "correct_text": "晚间会听到噪音,白天的时候大家都不会太在意,但是在睡觉的时候这噪音成为大家的恶梦。"
    }
]

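Field descriptions:

  • id: unique identifier, carries no meaning
  • original_text: the original text containing errors
  • wrong_ids: positions of the wrong characters, starting from 0
  • correct_text: the corrected text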
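The `wrong_ids` field can be derived from the two text fields, which is useful for validating a dataset file. The sketch below is standard-library only; `find_wrong_ids` is a hypothetical helper, and it assumes spelling errors are pure substitutions, so both texts have the same length.

```python
from typing import List


def find_wrong_ids(original_text: str, correct_text: str) -> List[int]:
    """Return 0-based positions where the two equal-length texts differ."""
    if len(original_text) != len(correct_text):
        raise ValueError("texts must have the same length")
    return [i for i, (a, b) in enumerate(zip(original_text, correct_text)) if a != b]


# Sample record copied from the data format shown above.
sample = {
    "id": "B2-4029-3",
    "original_text": "晚间会听到嗓音,白天的时候大家都不会太在意,但是在睡觉的时候这嗓音成为大家的恶梦。",
    "wrong_ids": [5, 31],
    "correct_text": "晚间会听到噪音,白天的时候大家都不会太在意,但是在睡觉的时候这噪音成为大家的恶梦。",
}
derived = find_wrong_ids(sample["original_text"], sample["correct_text"])
print(derived == sample["wrong_ids"])
```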

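Custom datasets

You can train a correction model on your own dataset: annotate it, save it in the same JSON format as the training samples, then load the data and train the model.

  1. If you already have many business-related error samples, the main work is annotating the error positions (wrong_ids) and the corrected sentences (correct_text)
  2. If you have no ready-made error samples, you can write a script to generate them (original_text) by replacing the characters at chosen positions (wrong_ids) of correct sentences with wrong characters, based on features such as phonetic and visual similarity. See also the third-party homophone generation script (homophone replacement)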
Language Model

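What is a language model? -wiki

The language model is critical to the correction step. The current default is zh_giga.no_cna_cmn.prune01244.klm (2.8G), a Chinese language model trained on a giga-scale Chinese text corpus. A lightweight language model trained on the People's Daily 2014 corpus, people2014corpus_chars.klm (password o5e9), is also provided

You can train a general-purpose language model on corpora such as Chinese Wikipedia (convert Traditional to Simplified first; pycorrector.utils.text_utils provides this), or train a more specialized language model on domain-specific corpora. A better-fitting language model noticeably improves correction results.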
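To show why the language model matters, here is a toy character bigram model that ranks candidate sentences. It is a deliberately simplified stand-in, not kenlm and not the project's scoring code: `CharBigramModel` and `pick_best` are hypothetical names, the three-sentence corpus is made up, and add-one smoothing replaces the smoothing a real n-gram toolkit would use.

```python
import math
from collections import Counter
from typing import Iterable, List


class CharBigramModel:
    """Tiny character bigram model with add-one smoothing (illustration only)."""

    def __init__(self, corpus: Iterable[str]) -> None:
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for sentence in corpus:
            chars = ['<s>'] + list(sentence) + ['</s>']
            self.unigrams.update(chars)
            self.bigrams.update(zip(chars, chars[1:]))
        self.vocab_size = len(self.unigrams)

    def log_prob(self, sentence: str) -> float:
        """Sum of smoothed log P(current char | previous char)."""
        chars = ['<s>'] + list(sentence) + ['</s>']
        score = 0.0
        for prev, cur in zip(chars, chars[1:]):
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.unigrams[prev] + self.vocab_size
            score += math.log(numerator / denominator)
        return score


def pick_best(candidates: List[str], model: CharBigramModel) -> str:
    """Choose the candidate the language model finds most likely."""
    return max(candidates, key=model.log_prob)


corpus = ['少先队员应该为老人让座', '我们应该帮助老人', '请给老人让座']
model = CharBigramModel(corpus)
best = pick_best(['少先队员因该为老人让坐', '少先队员应该为老人让座'], model)
print(best)
```

The candidate whose character sequences were seen in the corpus scores higher, which is the same signal a domain-specific language model provides at a much larger scale.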

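  1. For how to use the kenlm language model training tool, see the blog post: http://blog.csdn.net/mingzai624/article/details/79560063
  2. The training corpus <People's Daily 2014 processed corpus> is attached, including: 1) manually segmented and POS-tagged data people2014.tar.gz, 2) unsegmented text data people2014_words.txt, 3) the kenlm char-level language model file and its binary, people2014corpus_chars.arps/klm, 4) the kenlm word-level language model file and its binary, people2014corpus_words.arps/klm.

Please respect copyright and credit the source when redistributing.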

Contact

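  • GitHub Issue (recommended): GitHub issues
  • GitHub discussions: feel free to post in GitHub discussions (it does not disturb the developers) to talk openly about correction techniques and problems
  • Email: xuming: xuming624@qq.com
  • WeChat: add WeChat ID xuming624 to join the Python-NLP discussion group; include the note: name-company-NLP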

Citation

If you use pycorrector in your research, please cite it in the following format:

APA:

Xu, M. Pycorrector: Text error correction tool (Version 0.4.2) [Computer software]. https://github.com/shibing624/pycorrector

BibTeX:

@misc{Xu_Pycorrector_Text_error,
  title={Pycorrector: Text error correction tool},
  author={Ming Xu},
  year={2023},
  howpublished={\url{https://github.com/shibing624/pycorrector}},
}

License

pycorrector is licensed under the Apache License 2.0 and is free for commercial use. Please include a link to pycorrector and its license in your product documentation.

Contribute

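The project code is still rough. If you improve the code, you are welcome to contribute it back to this project. Before submitting, note the following two points:

  • Add corresponding unit tests under tests
  • Run all unit tests with python -m pytest and make sure they all pass

Then you can submit a PR.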

