From an object parsed by spacy_parse, extract the multi-word noun phrases as a separate object, or convert each multi-word noun phrase into a single "token" consisting of its concatenated elements.

nounphrase_extract(x, concatenator = "_")

nounphrase_consolidate(x, concatenator = "_")

Arguments

x

output from spacy_parse

concatenator

the character(s) used to join elements of multi-word noun phrases
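As a rough illustration of what the concatenator does, joining the elements of a phrase behaves much like paste() with a collapse string. This is a base R sketch with made-up tokens, not spacyr's internal code.

```r
# Hypothetical tokens of one multi-word noun phrase (not real parser output)
phrase_tokens <- c("South", "Dakota")

# Join with the default concatenator, an underscore
joined_default <- paste(phrase_tokens, collapse = "_")

# Join with a different concatenator, here a single space
joined_space <- paste(phrase_tokens, collapse = " ")

print(joined_default)
print(joined_space)
```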

Value

nounphrase_extract returns a data.frame of all noun phrases, containing the following fields:

  • doc_id name of the document containing the noun phrase

  • sentence_id the sentence ID containing the noun phrase, within the document

  • nounphrase the noun phrase

  • root the root token of the noun phrase

nounphrase_consolidate returns a modified data.frame of parsed results, where the noun phrases have been combined into a single "token". Currently, dependency parsing is removed when this consolidation occurs.
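The consolidation described above can be sketched in base R on a toy data.frame. This is only an illustration of the idea under stated assumptions: consolidate_sketch and the phrase_id column are hypothetical names introduced here, and the code is not how spacyr implements nounphrase_consolidate.

```r
# Toy stand-in for parsed output. `phrase_id` is a hypothetical helper column
# marking which tokens belong to the same noun phrase (NA = not in a phrase);
# it is not a column produced by spacy_parse.
toy <- data.frame(
  doc_id      = "text1",
  sentence_id = 1L,
  token       = c("aid", "to", "South", "Dakota", "."),
  pos         = c("NOUN", "ADP", "PROPN", "PROPN", "PUNCT"),
  phrase_id   = c(1L, NA, 2L, 2L, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse each noun phrase into one row
consolidate_sketch <- function(x, concatenator = "_") {
  # Tokens outside a phrase each get their own group, so they pass through
  in_phrase <- !is.na(x$phrase_id)
  key <- ifelse(in_phrase,
                paste0("np", x$phrase_id),
                paste0("tok", seq_len(nrow(x))))
  # Number groups by first appearance so the original token order is kept
  group <- match(key, unique(key))
  first <- !duplicated(group)

  out <- x[first, c("doc_id", "sentence_id")]
  out$token_id <- seq_len(nrow(out))
  # Join the tokens of each group with the concatenator
  out$token <- vapply(split(x$token, group), paste, character(1),
                      collapse = concatenator, USE.NAMES = FALSE)
  # Consolidated rows are tagged "nounphrase"; other rows keep their tag
  out$pos <- ifelse(in_phrase[first], "nounphrase", x$pos[first])
  rownames(out) <- NULL
  out
}

result <- consolidate_sketch(toy)
print(result)
```

In this sketch, tokens that do not belong to any phrase pass through unchanged, while each phrase collapses to one row whose pos is set to "nounphrase", mirroring the shape of the output shown in the Examples.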

Examples

spacy_initialize()
#> spaCy is already initialized
#> NULL
# entity extraction
txt <- "Mr. Smith of moved to San Francisco in December."
parsed <- spacy_parse(txt, nounphrase = TRUE)
entity_extract(parsed)
#>   doc_id sentence_id        entity entity_type
#> 1  text1           1         Smith      PERSON
#> 2  text1           1 San_Francisco         GPE
# consolidating multi-word noun phrases
txt <- "The House of Representatives voted to suspend aid to South Dakota."
parsed <- spacy_parse(txt, nounphrase = TRUE)
nounphrase_consolidate(parsed)
#>    doc_id sentence_id token_id           token           lemma        pos
#> 1   text1           1        1       The_House       the_house nounphrase
#> 2   text1           1        2              of              of        ADP
#> 3   text1           1        3 Representatives representatives nounphrase
#> 4   text1           1        4           voted            vote       VERB
#> 5   text1           1        5              to              to       PART
#> 6   text1           1        6         suspend         suspend       VERB
#> 7   text1           1        7             aid             aid nounphrase
#> 8   text1           1        8              to              to        ADP
#> 9   text1           1        9    South_Dakota    south_dakota nounphrase
#> 10  text1           1       10               .               .      PUNCT