Extracts entities from a text using a regular expression. The regex captures words written entirely in upper case and words that begin with an upper-case letter. When several such words appear in sequence, the function captures them together as a single entity. For proper names joined by common lower-case connectors, such as "Wwwww of Wwwww", the function also captures the connector and the upper-case words that follow it.
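
The matching idea described above can be sketched in base R. This is a minimal illustration only, not the package's actual implementation: the variable names (`cap`, `conn`, `pattern`) and the two-word connector list are assumptions made for the example, and the sample text is plain ASCII so that the POSIX character classes behave the same in any locale.

```r
# Illustrative sketch only; the real function may use a different regex.
text <- "John Doe lives in New York in United States of America."

# A word that starts with an upper-case letter (also matches all-caps words).
cap <- "[[:upper:]][[:alpha:]]*"

# Hypothetical connector list; the package supplies its own via connectors().
conn <- "(?:of|the)"

# One capitalized word, followed by any number of further capitalized words,
# each optionally preceded by a single lower-case connector.
pattern <- paste0(cap, "(?:\\s+(?:", conn, "\\s+)?", cap, ")*")

entities <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
entities
# "John Doe" "New York" "United States of America"

# Joining the words of each entity with underscores makes it a single token.
entities_joined <- gsub("\\s+", "_", entities)
entities_joined
# "John_Doe" "New_York" "United_States_of_America"
```

Because the connector is only accepted when an upper-case word follows it, the lower-case "in" before "New York" and "United States" ends a match rather than extending it.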

Usage

extract_entity_rb(
  text,
  connect = connectors("misc"),
  sw = "the",
  underscore = TRUE
)

Arguments

text

an input text

connect

a vector of lowercase connectors. Supply your own, or use the function connectors() to obtain some common patterns.

sw

a vector of stopwords

underscore

logical; whether to keep underscores so that each compound entity is treated as a single word.

Examples

"John Doe lives in New York in United States of America." |> extract_entity_rb()
"João Ninguém mora em São José do Rio Preto. Ele esteve antes em Sergipe" |> extract_entity_rb(connect = connectors("pt"))
text |> extract_entity_rb()