stanfordcorenlp is a Python wrapper for Stanford CoreNLP. It provides a simple API for text processing tasks such as tokenization, part-of-speech tagging, named entity recognition, constituency parsing, dependency parsing, and more.
Java 1.8+ (check with the command: java -version) (Download Page)
Stanford CoreNLP (Download Page)
Py Version | CoreNLP Version |
---|---|
v3.7.0.1 v3.7.0.2 | CoreNLP 3.7.0 |
v3.8.0.1 | CoreNLP 3.8.0 |
v3.9.1.1 | CoreNLP 3.9.1 |
pip install stanfordcorenlp
# Simple usage
from stanfordcorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP(r'G:\JavaLibraries\stanford-corenlp-full-2018-02-27')
sentence = 'Guangdong University of Foreign Studies is located in Guangzhou.'
print('Tokenize:', nlp.word_tokenize(sentence))
print('Part of Speech:', nlp.pos_tag(sentence))
print('Named Entities:', nlp.ner(sentence))
print('Constituency Parsing:', nlp.parse(sentence))
print('Dependency Parsing:', nlp.dependency_parse(sentence))
nlp.close()  # Do not forget to close! The backend server consumes a lot of memory.
Output format (shown as printed under Python 2; Python 3 omits the u prefix on strings):
# Tokenize
[u'Guangdong', u'University', u'of', u'Foreign', u'Studies', u'is', u'located', u'in', u'Guangzhou', u'.']
# Part of Speech
[(u'Guangdong', u'NNP'), (u'University', u'NNP'), (u'of', u'IN'), (u'Foreign', u'NNP'), (u'Studies', u'NNPS'), (u'is', u'VBZ'), (u'located', u'JJ'), (u'in', u'IN'), (u'Guangzhou', u'NNP'), (u'.', u'.')]
# Named Entities
[(u'Guangdong', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'of', u'ORGANIZATION'), (u'Foreign', u'ORGANIZATION'), (u'Studies', u'ORGANIZATION'), (u'is', u'O'), (u'located', u'O'), (u'in', u'O'), (u'Guangzhou', u'LOCATION'), (u'.', u'O')]
# Constituency Parsing
(ROOT
  (S
    (NP
      (NP (NNP Guangdong) (NNP University))
      (PP (IN of)
        (NP (NNP Foreign) (NNPS Studies))))
    (VP (VBZ is)
      (ADJP (JJ located)
        (PP (IN in)
          (NP (NNP Guangzhou)))))
    (. .)))
# Dependency Parsing
[(u'ROOT', 0, 7), (u'compound', 2, 1), (u'nsubjpass', 7, 2), (u'case', 5, 3), (u'compound', 5, 4), (u'nmod', 2, 5), (u'auxpass', 7, 6), (u'case', 9, 8), (u'nmod', 7, 9), (u'punct', 7, 10)]
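In the sample output above, the NER result tags every token separately, and the dependency triples refer to tokens by 1-based position, with 0 standing for the artificial ROOT. The following is a minimal sketch of post-processing both, using only the sample values shown above. The helper names `group_entities` and `readable_dependencies` are hypothetical and are not part of stanfordcorenlp.

```python
from itertools import groupby

# Sample values copied from the output shown above.
tokens = ['Guangdong', 'University', 'of', 'Foreign', 'Studies',
          'is', 'located', 'in', 'Guangzhou', '.']
ner_tags = [('Guangdong', 'ORGANIZATION'), ('University', 'ORGANIZATION'),
            ('of', 'ORGANIZATION'), ('Foreign', 'ORGANIZATION'),
            ('Studies', 'ORGANIZATION'), ('is', 'O'), ('located', 'O'),
            ('in', 'O'), ('Guangzhou', 'LOCATION'), ('.', 'O')]
dependencies = [('ROOT', 0, 7), ('compound', 2, 1), ('nsubjpass', 7, 2),
                ('case', 5, 3), ('compound', 5, 4), ('nmod', 2, 5),
                ('auxpass', 7, 6), ('case', 9, 8), ('nmod', 7, 9),
                ('punct', 7, 10)]


def group_entities(tagged):
    """Merge runs of adjacent tokens that share a tag other than 'O'."""
    entities = []
    for tag, run in groupby(tagged, key=lambda pair: pair[1]):
        if tag != 'O':
            entities.append((' '.join(word for word, _ in run), tag))
    return entities


def readable_dependencies(words, triples):
    """Replace 1-based token positions with words; position 0 is ROOT."""
    lookup = ['ROOT'] + list(words)
    return [(rel, lookup[head], lookup[dep]) for rel, head, dep in triples]


print(group_entities(ner_tags))
for relation, head, dependent in readable_dependencies(tokens, dependencies):
    print(relation, head, dependent)
```

Note that this grouping merges two different entities if they are adjacent and carry the same tag, and that joining with a space suits English but not languages written without spaces.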
Note: to process a language other than English, you must download an additional model file and place it in the .../stanford-corenlp-full-2018-02-27 folder. For example, download the stanford-chinese-corenlp-2018-02-27-models.jar file if you want to process Chinese.
# _*_coding:utf-8_*_
# Other human languages support, e.g. Chinese
sentence = '清华大学位于北京。'
with StanfordCoreNLP(r'G:\JavaLibraries\stanford-corenlp-full-2018-02-27', lang='zh') as nlp:
    print(nlp.word_tokenize(sentence))
    print(nlp.pos_tag(sentence))
    print(nlp.ner(sentence))
    print(nlp.parse(sentence))
    print(nlp.dependency_parse(sentence))
General annotation loads all of the models and therefore needs more memory, so initialize the server with a larger memory setting; 8 GB is recommended.
# General json output
nlp = StanfordCoreNLP(r'path_to_corenlp', memory='8g')
print(nlp.annotate(sentence))
nlp.close()
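Assuming `annotate` returns the server's JSON as a string, it can be decoded with the standard `json` module. The sketch below uses a hand-written, heavily abbreviated sample in the shape CoreNLP's JSON output normally takes (a `sentences` list whose items each hold a `tokens` list). It is not real server output, the fields present depend on which annotators were run, and `iter_tokens` is a hypothetical helper name.

```python
import json

# Hand-written, abbreviated stand-in for the string that annotate() returns.
sample = '''
{"sentences": [{"index": 0, "tokens": [
    {"index": 1, "word": "Guangzhou", "pos": "NNP", "ner": "LOCATION"},
    {"index": 2, "word": ".", "pos": ".", "ner": "O"}]}]}
'''


def iter_tokens(raw_json):
    """Yield (word, pos, ner) for every token in every sentence."""
    doc = json.loads(raw_json)
    for sentence in doc.get('sentences', []):
        for token in sentence.get('tokens', []):
            # .get() returns None for fields whose annotator was not run.
            yield token.get('word'), token.get('pos'), token.get('ner')


print(list(iter_tokens(sample)))
```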
You can specify properties:
- annotators: tokenize, ssplit, pos, lemma, ner, parse, depparse, dcoref (See Detail)
- pipelineLanguage: en, zh, ar, fr, de, es (English, Chinese, Arabic, French, German, Spanish) (See Annotator Support Detail)
- outputFormat: json, xml, text
# The earlier instance was closed, so create a new one first
nlp = StanfordCoreNLP(r'path_to_corenlp')
text = 'Guangdong University of Foreign Studies is located in Guangzhou. ' \
       'GDUFS is active in a full range of international cooperation and exchanges in education. '
props = {'annotators': 'tokenize,ssplit,pos', 'pipelineLanguage': 'en', 'outputFormat': 'xml'}
print(nlp.annotate(text, properties=props))
nlp.close()
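A typo in a property value is easy to miss, so it can help to check the values before sending them. The sketch below is one possible way to do that, restricted to the values listed above (CoreNLP itself may accept others). `build_props` is a hypothetical helper, not part of stanfordcorenlp.

```python
# Values copied from the property list above; CoreNLP may accept more.
ANNOTATORS = ('tokenize', 'ssplit', 'pos', 'lemma',
              'ner', 'parse', 'depparse', 'dcoref')
LANGUAGES = ('en', 'zh', 'ar', 'fr', 'de', 'es')
OUTPUT_FORMATS = ('json', 'xml', 'text')


def build_props(annotators, language='en', output_format='json'):
    """Hypothetical helper: validate the values and build a properties dict."""
    unknown = [name for name in annotators if name not in ANNOTATORS]
    if unknown:
        raise ValueError('unknown annotators: %s' % ', '.join(unknown))
    if language not in LANGUAGES:
        raise ValueError('unsupported pipelineLanguage: %s' % language)
    if output_format not in OUTPUT_FORMATS:
        raise ValueError('unsupported outputFormat: %s' % output_format)
    return {
        'annotators': ','.join(annotators),  # the server expects one comma-separated string
        'pipelineLanguage': language,
        'outputFormat': output_format,
    }


props = build_props(['tokenize', 'ssplit', 'pos'], language='en', output_format='xml')
print(props)
```

The resulting dict has the same keys and values as the `props` literal in the example above, so it can be passed to `annotate` in the same way.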
Start a CoreNLP server by running this command from inside the CoreNLP folder:
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
And then:
# Use an existing server
nlp = StanfordCoreNLP('http://localhost', port=9000)
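If you prefer to launch the server from Python rather than a terminal, the command above can be assembled as an argument list and handed to `subprocess`. This is only a sketch: `server_command` and `start_server` are hypothetical helpers, and they assume `java` is on the PATH and that `corenlp_dir` is the unpacked CoreNLP folder.

```python
import subprocess


def server_command(memory='4g', port=9000, timeout=15000):
    """Build the argument list for the server command shown above."""
    return [
        'java', '-mx' + memory,
        '-cp', '*',  # java expands the classpath wildcard itself; no shell needed
        'edu.stanford.nlp.pipeline.StanfordCoreNLPServer',
        '-port', str(port),
        '-timeout', str(timeout),
    ]


def start_server(corenlp_dir, **options):
    """Start the server in the background; the caller must terminate it."""
    # cwd must be the unpacked CoreNLP folder so that "*" matches its jars.
    return subprocess.Popen(server_command(**options), cwd=corenlp_dir)


print(' '.join(server_command()))
```

`start_server` returns the `Popen` object, so the server can be stopped later with its `terminate()` method.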
import logging
from stanfordcorenlp import StanfordCoreNLP
# Debug the wrapper
nlp = StanfordCoreNLP(r'path_or_host', logging_level=logging.DEBUG)
# Check more info from the CoreNLP Server
nlp = StanfordCoreNLP(r'path_or_host', quiet=False, logging_level=logging.DEBUG)
nlp.close()
We use setuptools to package our project. You can build from the latest source code with the following command:
$ python setup.py bdist_wheel --universal
You will see the .whl file under the dist directory.