Datasaur
ML Assisted Labeling
Datasaur provides the following service providers to help you label your project.

spaCy

Tagset

PERSON, NORP, FACILITY, ORG, GPE, LOC, PRODUCT, EVENT, WORK_OF_ART, LAW, LANGUAGE, DATE, TIME, PERCENT, MONEY, QUANTITY, ORDINAL, CARDINAL.
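As a minimal sketch of how these labels surface in code, the snippet below collects entities from a spaCy pipeline and keeps only those whose label appears in the tagset above. The helper name `extract_entities` is hypothetical, and the page does not say which spaCy model Datasaur runs, so the pipeline is passed in by the caller.

```python
# Tagset listed above for the spaCy provider.
SPACY_TAGSET = frozenset({
    "PERSON", "NORP", "FACILITY", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT", "MONEY",
    "QUANTITY", "ORDINAL", "CARDINAL",
})


def extract_entities(nlp, text):
    """Run a spaCy pipeline and return (text, label, start, end) tuples.

    `nlp` is any loaded spaCy pipeline; which model to load is an
    assumption left to the caller. Entities whose label is not in
    SPACY_TAGSET are skipped.
    """
    doc = nlp(text)
    return [
        (ent.text, ent.label_, ent.start_char, ent.end_char)
        for ent in doc.ents
        if ent.label_ in SPACY_TAGSET
    ]
```

With spaCy and an English model installed (for example `en_core_web_sm`, an assumed choice), `extract_entities(spacy.load("en_core_web_sm"), text)` would return character-offset spans. Label names can differ between spaCy model versions, so it is worth checking the filter against the model you load.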

DistilBERT OPIEC

Tagset

date, duration, location, misc, money, number, ordinal, organization, percent, person, set, time.

NLTK

NLTK POS tagging is done with the English nltk.pos_tag, which internally uses nltk.PerceptronTagger, a greedy averaged perceptron tagger as implemented by Matthew Honnibal. For more implementation details, see https://explosion.ai/blog/part-of-speech-pos-tagger-in-python
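The call described above can be sketched as follows, assuming NLTK is installed and its perceptron tagger model has already been downloaded (the resource name passed to nltk.download differs between NLTK versions). The helper names `tag_tokens` and `group_by_tag` are illustrative only and are not part of Datasaur or NLTK.

```python
from collections import defaultdict


def tag_tokens(tokens):
    """Tag pre-tokenized English text with nltk.pos_tag.

    NLTK is imported lazily so this module loads even where it is absent.
    """
    import nltk  # requires the averaged perceptron tagger model

    return nltk.pos_tag(tokens)


def group_by_tag(tagged):
    """Group (token, tag) pairs into {tag: [tokens...]}, keeping order."""
    groups = defaultdict(list)
    for token, tag in tagged:
        groups[tag].append(token)
    return dict(groups)
```

For example, `group_by_tag(tag_tokens("Datasaur labels text".split()))` would group the three tokens under their Penn Treebank tags.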

Tagset

An off-the-shelf tagger is available for English. It uses the Penn Treebank tagset, which is listed in the Appendix under NLTK Treebank.

CoreNLP POS

CoreNLP POS tagging is done by a CoreNLP Server running the official pre-trained model, invoked through nltk.parse.corenlp.CoreNLPParser.
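A minimal sketch of this setup, assuming a CoreNLP server is already running and reachable. The URL `http://localhost:9000` and the helper names `make_corenlp_tagger` and `tag_text` are assumptions for illustration; how Datasaur configures the server is not described here.

```python
def make_corenlp_tagger(url="http://localhost:9000", tagtype="pos"):
    """Create an NLTK client for a running CoreNLP server.

    The URL is an assumed local default; tagtype is "pos" or "ner".
    """
    from nltk.parse.corenlp import CoreNLPParser  # lazy: needs NLTK

    return CoreNLPParser(url=url, tagtype=tagtype)


def tag_text(tagger, text):
    """Whitespace-tokenize `text` and return (token, tag) pairs."""
    return list(tagger.tag(text.split()))
```

With a server available, `tag_text(make_corenlp_tagger(), "Datasaur labels text")` would return POS pairs, and passing `tagtype="ner"` would request named-entity tags from the same client class.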

Tagset

What is the tag set used by the Stanford Tagger?
You can train models for the Stanford POS Tagger with any tag set. For the models we distribute, the tag set depends on the language, reflecting the underlying treebanks that models have been built from. (That is, the tag set was wholly or mainly decided by the treebank producers, not us.) Here are relevant links:

References

CoreNLP NER

CoreNLP NER tagging is done by a CoreNLP Server running the official pre-trained model, invoked through nltk.parse.corenlp.CoreNLPParser.

Tagset

For English, by default, this annotator recognizes named (PERSON, LOCATION, ORGANIZATION, MISC), numerical (MONEY, NUMBER, ORDINAL, PERCENT), and temporal (DATE, TIME, DURATION, SET) entities (12 classes). Adding the regexner annotator and using the supplied RegexNER pattern files adds support for the fine-grained and additional entity classes EMAIL, URL, CITY, STATE_OR_PROVINCE, COUNTRY, NATIONALITY, RELIGION, (job) TITLE, IDEOLOGY, CRIMINAL_CHARGE, CAUSE_OF_DEATH, (Twitter, etc.) HANDLE (12 classes) for a total of 24 classes. Named entities are recognized using a combination of three CRF sequence taggers trained on various corpora, including CoNLL, ACE, MUC, and ERE corpora. Numerical entities are recognized using a rule-based system.
PERSON, LOCATION, ORGANIZATION, MISC, MONEY, NUMBER, ORDINAL, PERCENT, DATE, TIME, DURATION, SET, EMAIL, URL, CITY, STATE_OR_PROVINCE, COUNTRY, NATIONALITY, RELIGION, TITLE, IDEOLOGY, CRIMINAL_CHARGE, CAUSE_OF_DEATH, HANDLE
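The class counts quoted above (12 default classes plus 12 RegexNER classes, 24 in total) can be checked with a short sketch. The constant and function names are hypothetical; the labels themselves are copied from the list above.

```python
# Default classes, grouped as in the description above.
DEFAULT_CLASSES = {
    "named": ["PERSON", "LOCATION", "ORGANIZATION", "MISC"],
    "numerical": ["MONEY", "NUMBER", "ORDINAL", "PERCENT"],
    "temporal": ["DATE", "TIME", "DURATION", "SET"],
}

# Fine-grained classes added by the regexner annotator.
REGEXNER_CLASSES = [
    "EMAIL", "URL", "CITY", "STATE_OR_PROVINCE", "COUNTRY", "NATIONALITY",
    "RELIGION", "TITLE", "IDEOLOGY", "CRIMINAL_CHARGE", "CAUSE_OF_DEATH",
    "HANDLE",
]


def all_classes():
    """Return every class: the default groups first, then RegexNER."""
    default = [tag for group in DEFAULT_CLASSES.values() for tag in group]
    return default + REGEXNER_CLASSES


def is_fine_grained(tag):
    """True if `tag` comes from the RegexNER pattern files."""
    return tag in REGEXNER_CLASSES
```

Keeping the two groups separate makes it easy to tell whether a label depends on the regexner annotator being enabled.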

SparkNLP POS

SparkNLP POS tagging is done using the en.pos spell on johnsnowlabs/nlp_server.

Tagset

SparkNLP NER

SparkNLP NER tagging is done using the en.ner spell on johnsnowlabs/nlp_server.

Tagset

LOC, ORG, PER, MISC

Appendix

NLTK Treebank

$       dollar e.g. $, -$, --$, A$, C$, HK$, M$, NZ$, S$, U.S.$, US$
''      closing quotation mark e.g. ', ''
(       opening parenthesis e.g. (, [, {
,       comma e.g. ,
--      dash e.g. --
.       sentence terminator e.g. ., !, ?
:       colon or ellipsis e.g. :, ;, ...
``      opening quotation mark e.g. `, ``

Treebank Tagset

Tag     Description
CC      conjunction, coordinating e.g. &, 'n, and, both, but, either, et, for, less, minus, neither, nor, or, plus, so, therefore, times, v., versus, vs., whether, yet
CD      numeral, cardinal e.g. mid-1890, nine-thirty, forty-two, one-tenth, ten, million, 0.5, one, forty-, seven, 1987, twenty, '79, zero, two, 78-degrees, eighty-four, IX, '60s, .025, fifteen, 271,124, dozen, quintillion, DM2,000, ...
DT      determiner e.g. all, an, another, any, both, del, each, either, every, half, la, many, much, nary, neither, no, some, such, that, the, them, these, this, those
EX      existential there e.g. there
FW      foreign word e.g. gemeinschaft, hund, ich, jeux, habeas, Haementeria, Herr, K'ang-si, vous, lutihaw, alai, je, jour, objets, salutaris, fille, quibusdam, pas, trop, Monte, terram, fiche, oui, corporis, ...
IN      preposition or conjunction, subordinating e.g. astride, among, uppon, whether, out, inside, pro, despite, on, by, throughout, below, within, for, towards, near, behind, atop, around, if, like, until, below, next, into, if, beside, ...
JJ      adjective or numeral, ordinal e.g. third, ill-mannered, pre-war, regrettable, oiled, calamitous, first, separable, ectoplasmic, battery-powered, participatory, fourth, still-to-be-named, multilingual, multi-disciplinary, ...
JJR     adjective, comparative e.g. bleaker, braver, breezier, briefer, brighter, brisker, broader, bumper, busier, calmer, cheaper, choosier, cleaner, clearer, closer, colder, commoner, costlier, cozier, creamier, crunchier, cuter, ...
JJS     adjective, superlative e.g. calmest, cheapest, choicest, classiest, cleanest, clearest, closest, commonest, corniest, costliest, crassest, creepiest, crudest, cutest, darkest, deadliest, dearest, deepest, densest, dinkiest, ...
LS      list item marker e.g. A, A., B, B., C, C., D, E, F, First, G, H, I, J, K, One, SP-44001, SP-44002, SP-44005, SP-44007, Second, Third, Three, Two, *, a, b, c, d, first, five, four, one, six, three, two
MD      modal auxiliary e.g. can, cannot, could, couldn't, dare, may, might, must, need, ought, shall, should, shouldn't, will, would
NN      noun, common, singular or mass e.g. common-carrier, cabbage, knuckle-duster, Casino, afghan, shed, thermostat, investment, slide, humour, falloff, slick, wind, hyena, override, subhumanity, machinist, ...
NNP     noun, proper, singular e.g. Motown, Venneboerger, Czestochwa, Ranzer, Conchita, Trumplane, Christos, Oceanside, Escobar, Kreisler, Sawyer, Cougar, Yvette, Ervin, ODI, Darryl, CTCA, Shannon, A.K.C., Meltex, Liverpool, ...
NNPS    noun, proper, plural e.g. Americans, Americas, Amharas, Amityvilles, Amusements, Anarcho-Syndicalists, Andalusians, Andes, Andruses, Angels, Animals, Anthony, Antilles, Antiques, Apache, Apaches, Apocrypha, ...
NNS     noun, common, plural e.g. undergraduates, scotches, bric-a-brac, products, bodyguards, facets, coasts, divestitures, storehouses, designs, clubs, fragrances, averages, subjectivists, apprehensions, muses, factory-jobs, ...
PDT     pre-determiner e.g. all, both, half, many, quite, such, sure, this
POS     genitive marker e.g. ', 's
PRP     pronoun, personal e.g. hers, herself, him, himself, hisself, it, itself, me, myself, one, oneself, ours, ourselves, ownself, self, she, thee, theirs, them, themselves, they, thou, thy, us
PRP$    pronoun, possessive e.g. her, his, mine, my, our, ours, their, thy, your
RB      adverb e.g. occasionally, unabatingly, maddeningly, adventurously, professedly, stirringly, prominently, technologically, magisterially, predominately, swiftly, fiscally, pitilessly, ...
