Extract proper names from strings using regular expressions. The function assumes that any sequence of words starting with an uppercase letter followed by lowercase letters is a proper name. It is expected to return more matches than wanted, so the results usually need post-processing.
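
The heuristic described above can be approximated in base R. The sketch below is only an illustration: `sketch_extract_names()` is a hypothetical helper, not the implementation of `extract_ppn()`, and it assumes a PCRE build with Unicode property support (`\p{Lu}`, `\p{Ll}`).

```r
# Hypothetical helper, not the package's extract_ppn(): a base-R sketch of the
# "capitalized words joined by optional connectors" heuristic.
sketch_extract_names <- function(string, connector = "d[aeo]s?") {
  # One capitalized word: an uppercase letter followed by lowercase letters.
  word <- "\\p{Lu}\\p{Ll}+"
  # A name: one word, then any number of further words, each optionally
  # preceded by a connector such as "da" or "de".
  pattern <- paste0(word, "(?:\\s+(?:(?:", connector, ")\\s+)?", word, ")*")
  # Returns a list with one character vector of matches per input string.
  regmatches(string, gregexpr(pattern, string, perl = TRUE))
}

found <- sketch_extract_names("Maria Silva e Fulano de Tal foram ao mercado.")
print(found[[1]])
```

Under these assumptions, `found[[1]]` holds "Maria Silva" and "Fulano de Tal". Unlike the output of `extract_ppn()` shown in the examples, this sketch does not return all-caps tokens such as "STF".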

extract_ppn(string, connector = "d[aeo]s?")

Arguments

string

input text

connector

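Regular expression matching typical connectors between names, such as "of", "von", "van", or "del". The default, "d[aeo]s?", matches connectors such as "da", "de", "do", "das", and "dos".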
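
To see which connectors a given pattern accepts, it can be tested on its own before passing it to the function. The snippet below is a small sketch; the `wider` pattern is an illustrative assumption, not a value shipped with the package.

```r
# Candidate connector words to test against each pattern.
candidates <- c("da", "de", "do", "das", "dos", "del", "von")

# The default connector, anchored so the whole word has to match.
default_connector <- "d[aeo]s?"
default_hits <- grepl(paste0("^(?:", default_connector, ")$"), candidates, perl = TRUE)

# A hypothetical wider connector that also covers "del", "von", "van" and "of".
wider <- "d[aeo]s?|del|v[ao]n|of"
wider_hits <- grepl(paste0("^(?:", wider, ")$"), candidates, perl = TRUE)

print(setNames(default_hits, candidates))
print(setNames(wider_hits, candidates))
```

The default pattern accepts the first five candidates and rejects "del" and "von", while the wider pattern accepts all seven.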

Examples

"O Joaquim José da Silva Xavier, tambén conhecido como Tiradentes foi um auferes." |> extract_ppn()
#> [[1]]
#> [1] "Joaquim José da Silva Xavier" "Tiradentes"                  
#> 
"José da Silva e Fulano de Tal foram, bla Maria Silva. E depois disso, bla Joaquim José da Silva Xavier no STF" |> extract_ppn()
#> [[1]]
#> [1] "José da Silva"                "Fulano de Tal"               
#> [3] "Maria Silva"                  "Joaquim José da Silva Xavier"
#> [5] "STF"                         
#>
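
Because the function is expected to over-match, a post-processing step is usually needed. The sketch below shows one possible filter, assuming that all-caps tokens such as "STF" are unwanted; the input list is copied by hand from the output above, and `drop_acronyms()` is a hypothetical helper.

```r
# Result copied from the second example above, written out as a literal list.
res <- list(c("José da Silva", "Fulano de Tal", "Maria Silva",
              "Joaquim José da Silva Xavier", "STF"))

# Hypothetical post-processing helper: keep only entries that contain at least
# one lowercase letter (which drops acronyms) and remove duplicates.
drop_acronyms <- function(x) {
  unique(x[grepl("\\p{Ll}", x, perl = TRUE)])
}

cleaned <- lapply(res, drop_acronyms)
print(cleaned[[1]])
```

Here `cleaned[[1]]` keeps the four personal names and drops "STF". Whether acronyms should be removed depends on the use case, so treat this filter as a starting point only.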