Extract an undirected graph based on co-occurrence within a token. An edge is created only when two entities are mentioned in the same token (sentence or paragraph).

extract_graph(
  text,
  using = "sentences",
  connect = connectors("misc"),
  sw = gen_stopwords("en"),
  loop = FALSE
)

Arguments

text

an input text

using

the unit used to tokenize the text: sentence or paragraph. The default is "sentences".

connect

lowercase connectors that can appear inside an entity name, like the "von" in "John von Neumann".

sw

a vector of stopwords.

loop

if TRUE, loops (edges from a node to itself) are kept; if FALSE, the default, they are removed.

Examples

text <- "John Does lives in New York in United States of America. He  is a passionate jazz musician, often playing in local clubs."
extract_graph(text)
#> Tokenizing by sentences
#> # A tibble: 6 × 2
#>   n1                       n2                      
#>   <chr>                    <chr>                   
#> 1 John Does                New York                
#> 2 John Does                United States of America
#> 3 John Does                He                      
#> 4 New York                 United States of America
#> 5 New York                 He                      
#> 6 United States of America He
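
The pairing rule from the description (an edge for every two entities that share a token) can be illustrated in base R. This is only a minimal sketch, not the package's implementation: `cooccurrence_edges()` is a hypothetical helper, it assumes the entities have already been extracted for each token, and its `loop` argument merely mimics the documented behaviour.

```r
# Hypothetical helper (not part of the package): builds an undirected edge
# list from entities that were already extracted for each token.
cooccurrence_edges <- function(entities_by_token, loop = FALSE) {
  per_token <- lapply(entities_by_token, function(ents) {
    if (length(ents) < 2) return(NULL)  # a lone entity forms no edge
    pairs <- t(combn(ents, 2))          # every unordered pair, in order of mention
    data.frame(n1 = pairs[, 1], n2 = pairs[, 2], stringsAsFactors = FALSE)
  })
  edges <- do.call(rbind, per_token)
  if (is.null(edges)) {
    return(data.frame(n1 = character(), n2 = character()))
  }
  # Drop self-pairs unless loops were requested
  if (!loop) edges <- edges[edges$n1 != edges$n2, , drop = FALSE]
  rownames(edges) <- NULL
  edges
}

# The four entities from the example above, placed in a single token
# as the printed result suggests.
entities <- list(
  c("John Does", "New York", "United States of America", "He")
)
edges <- cooccurrence_edges(entities)
nrow(edges)  # 6 pairs, in the same order as the tibble above

# An entity mentioned twice in one token produces a self-pair,
# which is kept only when loop = TRUE.
cooccurrence_edges(list(c("A", "B", "A")), loop = TRUE)
```

Four entities in one token give choose(4, 2) = 6 edges, which is why the example returns six rows. Entities that fall in different tokens are never paired in this sketch, since `combn()` is applied to each token separately.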