R/entity_link.R
extract_entity.Rd
A rule-based entity extractor that pulls entities out of a text using a regular expression. The regex captures words written entirely in uppercase and words that begin with an uppercase letter. When several such words appear in sequence, the function captures the whole sequence as a single entity. For proper names joined by common lowercase connectors, as in "Wwwww of Wwwww", it also captures the connector and the uppercase words that follow it.
extract_entity(text, connect = connectors("misc"), sw = "the")
"John Does lives in New York in United States of America." |> extract_entity()
#> [1] "John Does" "New York"
#> [3] "United States of America"
"João Ninguém mora em São José do Rio Preto. Ele esteve antes em Sergipe" |> extract_entity(connect = connectors("pt"))
#> [1] "João Ninguém" "São José do Rio Preto" "Ele"
#> [4] "Sergipe"
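The capture rule in the description can be approximated in base R. The sketch below is not the package's implementation: `sketch_extract()` is a hypothetical helper, its regex handles ASCII letters only (so it would not reproduce the Portuguese example), `connect` is assumed to be a plain character vector of lowercase connectors, and `sw` is assumed to list stopwords that are dropped when they match on their own.

```r
# Hypothetical helper, for illustration only (not part of the package).
sketch_extract <- function(text, connect = c("of"), sw = "the") {
  # A capitalised or all-uppercase word (ASCII letters only, by assumption).
  word <- "[A-Z][A-Za-z]*"
  # Alternation of the allowed lowercase connectors, e.g. "of|de".
  conn <- paste(connect, collapse = "|")
  # One word, then any run of further words, each optionally preceded
  # by a single connector.
  pattern <- sprintf("%s(?:\\s+(?:(?:%s)\\s+)?%s)*", word, conn, word)
  hits <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
  # Assumed role of `sw`: discard matches that are just a stopword.
  hits[!tolower(hits) %in% tolower(sw)]
}

sketch_extract("John Does lives in New York in United States of America.")
```

Under these assumptions the call returns the same three entities as the first example above. A sentence-initial word such as "Ele" in the second example would still be captured, because the rule relies on capitalisation alone.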