R/nounphrase-functions.R
nounphrase_extract.Rd
From an object parsed by spacy_parse(), extract the multi-word
noun phrases as a separate object, or convert the multi-word noun phrases
into a single "token" consisting of the concatenated elements of the multi-word
noun phrases.
nounphrase_extract(x, concatenator = "_")
nounphrase_consolidate(x, concatenator = "_")
x output from spacy_parse()
concatenator the character(s) used to join elements of multi-word noun phrases
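As a rough illustration of what the concatenator does, the base R sketch below joins the elements of a multi-word phrase with a chosen separator. It is not the package's implementation; the helper name join_phrase and the sample tokens are made up for this example.

```r
# Hypothetical helper (not part of spacyr): join the tokens of one
# multi-word phrase with a concatenator, mirroring the argument above.
join_phrase <- function(tokens, concatenator = "_") {
  paste(tokens, collapse = concatenator)
}

tokens <- c("San", "Francisco")           # sample tokens, chosen for illustration
default_join <- join_phrase(tokens)       # joined with the default "_"
space_join   <- join_phrase(tokens, " ")  # joined with a space instead
```

A space keeps the phrase readable, while the default "_" keeps each phrase free of whitespace, which can matter if the result is later split on spaces.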
nounphrase_extract returns a data.frame of all noun
phrases, containing the following fields:
doc_id name of the document containing the noun phrase
sentence_id the sentence ID containing the noun phrase, within the document
nounphrase the noun phrase
root the root token of the noun phrase
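To show how the fields listed above might be used downstream, here is a small base R sketch on a hand-built data.frame with the same four columns. The rows are illustrative stand-ins, not actual spaCy output, and the variable names are assumptions made for this example.

```r
# Hand-built stand-in with the four documented fields; the rows are
# illustrative values, not actual output from spaCy.
np <- data.frame(
  doc_id      = c("text1", "text1", "text2"),
  sentence_id = c(1L, 1L, 1L),
  nounphrase  = c("Mr._Smith", "San_Francisco", "South_Dakota"),
  root        = c("Smith", "Francisco", "Dakota"),
  stringsAsFactors = FALSE
)

# Number of noun phrases per document
phrases_per_doc <- table(np$doc_id)

# Root tokens of the phrases found in one document
roots_text1 <- np$root[np$doc_id == "text1"]
```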
nounphrase_consolidate returns a modified data.frame of
parsed results, where the noun phrases have been combined into a single
"token". Currently, dependency parsing is removed when this consolidation
occurs.
if (FALSE) {
library("spacyr")
spacy_initialize()
# noun phrase extraction
txt <- "Mr. Smith moved to San Francisco in December."
parsed <- spacy_parse(txt, nounphrase = TRUE)
nounphrase_extract(parsed)
}
if (FALSE) {
# consolidating multi-word noun phrases
txt <- "The House of Representatives voted to suspend aid to South Dakota."
parsed <- spacy_parse(txt, nounphrase = TRUE)
nounphrase_consolidate(parsed)
}