Rule-based entity graph extractor — extract_graph

Extracts an undirected graph based on entity co-occurrence within a token. Two entities are connected only if they are mentioned in the same token (a sentence or a paragraph).
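To make the co-occurrence rule concrete, here is a minimal base R sketch. It is not the package's implementation: `cooccurrence_edges()` is a hypothetical helper, and the entity lists are assumed to have been extracted per token beforehand. Every unordered pair of entities found in the same token becomes one edge, and a token with a single entity contributes none.

```r
# Hypothetical helper, not part of the package: build undirected
# co-occurrence edges from entities already identified in each token.
cooccurrence_edges <- function(entities_by_token) {
  edges <- lapply(entities_by_token, function(ents) {
    ents <- unique(ents)
    if (length(ents) < 2) return(NULL)   # a lone entity yields no edge
    pairs <- t(combn(ents, 2))           # every unordered pair in the token
    data.frame(n1 = pairs[, 1], n2 = pairs[, 2], stringsAsFactors = FALSE)
  })
  edges <- Filter(Negate(is.null), edges)  # drop tokens without edges
  do.call(rbind, edges)
}

# Assumed input: one character vector of entities per sentence.
tokens <- list(
  c("John_Does", "New_York", "United_States"),  # sentence 1
  c("Jazz_Club")                                # sentence 2: no partner
)
edges <- cooccurrence_edges(tokens)
print(edges)
```

Because the graph is undirected, each pair appears once; `Jazz_Club` is absent from the result since nothing else shares its sentence.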

Usage

extract_graph_rb(
  text,
  using = "sentences",
  connect = connectors("misc"),
  sw = c("of", "the"),
  count = TRUE,
  loop = FALSE
)

Arguments

text: an input text
using: the unit used to tokenize the text, sentence or paragraph. The default is "sentences".
connect: lowercase connectors, like the "von" in "John von Neumann".
sw: stopwords vector.
count: if TRUE (default), counts the frequency of nodes and returns the result ordered by frequency.
loop: if TRUE, loops (a node pointing to itself) are not removed.
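
The effect of `count` and `loop` can be pictured with a small base R sketch on a made-up edge list. This only illustrates the behaviour described above, assuming that counting means tallying identical pairs; it is not the function's actual code, and the data are hypothetical.

```r
# Hypothetical edge list; the self-pair and the repeated pair are made up.
edges <- data.frame(
  n1 = c("New_York", "John_Does", "John_Does", "New_York"),
  n2 = c("America",  "New_York",  "New_York",  "New_York"),
  stringsAsFactors = FALSE
)

# loop = FALSE: drop edges whose two endpoints are the same node.
no_loops <- edges[edges$n1 != edges$n2, ]

# count = TRUE: tally identical pairs, then sort by descending frequency.
counted <- aggregate(n ~ n1 + n2, data = transform(no_loops, n = 1L), FUN = sum)
counted <- counted[order(-counted$n), ]
print(counted)
```

With `loop = TRUE` the `New_York` to `New_York` row would be kept instead of filtered out.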

Examples

text <- "John Does lives in New York in United States of America. He  is a passionate jazz musician, often playing in local clubs."
extract_graph_rb(text)
#> Tokenizing by sentences
#> # A tibble: 10 × 3
#>    n1            n2                n
#>    <chr>         <chr>         <int>
#>  1 America       He                1
#>  2 John_Does     America           1
#>  3 John_Does     He                1
#>  4 John_Does     New_York          1
#>  5 John_Does     United_States     1
#>  6 New_York      America           1
#>  7 New_York      He                1
#>  8 New_York      United_States     1
#>  9 United_States America           1
#> 10 United_States He                1