R/spacy_extract_entity.R
spacy_extract_entity.Rd
This function extracts named entities from texts, based on the entity tag
ent attributes of document objects parsed by spaCy (see
https://spacy.io/usage/linguistic-features#section-named-entities).
x: a character object or a TIF-compliant corpus data.frame (see https://github.com/ropenscilabs/tif)
output: type of returned object, either "list" or "data.frame".
type: type of named entities, either "named", "extended", or "all". See
https://spacy.io/docs/usage/entity-recognition#entity-types for details.
multithread: logical; if TRUE, the processing is parallelized using spaCy's
architecture (https://spacy.io/api)
...: unused
either a list or a data.frame of named entities
When the option output = "data.frame" is selected, the
function returns a data.frame with the following fields:
entity_type: type of entity (e.g. ORG for organizations)
start_id: serial number ID of the starting token. This number corresponds to
the token numbering in the data.frame returned by spacy_tokenize(x) with
default options.
length: number of words (tokens) included in a named entity (e.g. for the
entity "New York Stock Exchange", length = 4)
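As a rough illustration of how start_id and length together identify a span of tokens, the sketch below uses a hand-written token vector and a hypothetical entity record in place of real spaCy output. The names tokens, entity, span and entity_text are assumptions made for this example only, and it assumes that token IDs are numbered from 1 within a document.

```r
# Hypothetical tokens for one document, numbered from 1 in the order a
# tokenizer might return them (hand-written here, not produced by spaCy).
tokens <- c("He", "works", "at", "the", "New", "York", "Stock", "Exchange", ".")

# Hypothetical entity record using the fields described above.
entity <- data.frame(entity_type = "ORG", start_id = 5, length = 4,
                     stringsAsFactors = FALSE)

# Token positions covered by the entity: start_id, ..., start_id + length - 1.
span <- seq(from = entity$start_id, length.out = entity$length)

# Paste the covered tokens back together to recover the entity text.
entity_text <- paste(tokens[span], collapse = " ")
print(entity_text)
# [1] "New York Stock Exchange"
```

With real output, the same indexing would be applied per document, since token numbering is assumed here to restart for each document.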
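if (FALSE) {
library(spacyr)
# start the spaCy backend (requires a working spaCy installation)
spacy_initialize()

txt <- c(doc1 = "The Supreme Court is located in Washington D.C.",
         doc2 = "Paul earned a postgraduate degree from MIT.")

# extract entities with default options
spacy_extract_entity(txt)
# return the entities as a list instead
spacy_extract_entity(txt, output = "list")
}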