From an object parsed by spacy_parse, extract the entities as a separate object, or convert each multi-word entity into a single "token" consisting of the concatenated elements of that entity.

entity_extract(x, type = c("named", "extended", "all"),
  concatenator = "_")

entity_consolidate(x, concatenator = "_")

Arguments

x

output from spacy_parse.

type

type of named entities, either named, extended, or all. See https://spacy.io/docs/usage/entity-recognition#entity-types for details.

concatenator

the character(s) used to join the elements of multi-word named entities.

Value

entity_extract returns a data.frame of all named entities, containing the following fields:

  • doc_id name of the document containing the entity

  • sentence_id the sentence ID containing the entity, within the document

  • entity the named entity

  • entity_type type of the named entity (e.g. PERSON, ORG, PERCENT)
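
The shape of this result can be illustrated with a small base R sketch. It is not the spacyr implementation: sketch_extract() is a hypothetical helper, the input is a hand-built toy data.frame, and the "entity" column with "_B" (begin) and "_I" (inside) suffixes is an assumption about how the parsed tokens are tagged, not something stated on this page.

```r
# Toy stand-in for parsed output; the "entity" column layout is an assumption.
parsed <- data.frame(
  doc_id      = "text1",
  sentence_id = 1L,
  token       = c("Smith", "moved", "to", "San", "Francisco"),
  entity      = c("PERSON_B", "", "", "GPE_B", "GPE_I"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one output row per run of entity-tagged tokens.
sketch_extract <- function(x, concatenator = "_") {
  tagged <- x[x$entity != "", ]
  if (nrow(tagged) == 0) {
    return(data.frame(doc_id = character(), sentence_id = integer(),
                      entity = character(), entity_type = character(),
                      stringsAsFactors = FALSE))
  }
  # A new entity starts at every "_B" tag; cumsum() numbers the runs.
  run <- cumsum(grepl("_B$", tagged$entity))
  rows <- lapply(split(tagged, run), function(p) {
    data.frame(
      doc_id      = p$doc_id[1],
      sentence_id = p$sentence_id[1],
      entity      = paste(p$token, collapse = concatenator),
      entity_type = sub("_[BI]$", "", p$entity[1]),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

ents <- sketch_extract(parsed)
```

Under these assumptions, ents holds one row per entity, with the two tokens "San" and "Francisco" joined by the concatenator into a single entity value.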

entity_consolidate returns a modified data.frame of parsed results, where the named entities have been combined into a single "token". Currently, dependency parsing is removed when this consolidation occurs.
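
As a rough illustration of what consolidation means, the following base R sketch merges each run of entity tokens into one row and renumbers the tokens. It is only a sketch under stated assumptions: sketch_consolidate() is a hypothetical helper rather than spacyr code, the toy input is built by hand, the "_B"/"_I" tagging of the "entity" column is assumed, and lemma and dependency columns are left out for brevity.

```r
# Toy stand-in for parsed output (single sentence); column layout is assumed.
parsed <- data.frame(
  doc_id      = "text1",
  sentence_id = 1L,
  token_id    = 1:5,
  token       = c("aid", "to", "South", "Dakota", "."),
  pos         = c("NOUN", "ADP", "PROPN", "PROPN", "PUNCT"),
  entity      = c("", "", "GPE_B", "GPE_I", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse each multi-word entity into one "token".
sketch_consolidate <- function(x, concatenator = "_") {
  # Every token that is not an "_I" continuation opens a new group.
  grp <- cumsum(!grepl("_I$", x$entity))
  rows <- lapply(split(x, grp), function(p) {
    is_entity <- p$entity[1] != ""
    data.frame(
      doc_id      = p$doc_id[1],
      sentence_id = p$sentence_id[1],
      token       = paste(p$token, collapse = concatenator),
      pos         = if (is_entity) "ENTITY" else p$pos[1],
      entity_type = sub("_[BI]$", "", p$entity[1]),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  # Renumber tokens after merging (valid here because there is one sentence).
  out$token_id <- seq_len(nrow(out))
  rownames(out) <- NULL
  out
}

cons <- sketch_consolidate(parsed)
```

Here the five input tokens become four rows, and the merged row carries pos "ENTITY" and entity_type "GPE", mirroring the layout of the consolidated output in the Examples section.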

Examples

#> Found 'spacy_condaenv'. spacyr will use this environment
#> successfully initialized (spaCy Version: 2.0.10, language model: en)
#> (python options: type = "condaenv", value = "spacy_condaenv")
# entity extraction
txt <- "Mr. Smith of moved to San Francisco in December."
parsed <- spacy_parse(txt, entity = TRUE)
entity_extract(parsed)
#>   doc_id sentence_id        entity entity_type
#> 1  text1           1         Smith      PERSON
#> 2  text1           1 San_Francisco         GPE
entity_extract(parsed, type = "all")
#>   doc_id sentence_id        entity entity_type
#> 1  text1           1         Smith      PERSON
#> 2  text1           1 San_Francisco         GPE
#> 3  text1           1      December        DATE
# consolidating multi-word entities
txt <- "The House of Representatives voted to suspend aid to South Dakota."
parsed <- spacy_parse(txt, entity = TRUE)
entity_consolidate(parsed)
#>   doc_id sentence_id token_id                        token
#> 1  text1           1        1 The_House_of_Representatives
#> 2  text1           1        2                        voted
#> 3  text1           1        3                           to
#> 4  text1           1        4                      suspend
#> 5  text1           1        5                          aid
#> 6  text1           1        6                           to
#> 7  text1           1        7                 South_Dakota
#> 8  text1           1        8                            .
#>                          lemma    pos entity_type
#> 1 the_house_of_representatives ENTITY         ORG
#> 2                         vote   VERB
#> 3                           to   PART
#> 4                      suspend   VERB
#> 5                          aid   NOUN
#> 6                           to    ADP
#> 7                 south_dakota ENTITY         GPE
#> 8                            .  PUNCT