
Extract an undirected graph based on co-occurrence within a token. Two entities are connected only if they are mentioned in the same token (sentence or paragraph).

Usage

extract_graph_rb(
  text,
  using = "sentences",
  connect = connectors("misc"),
  sw = c("of", "the"),
  count = TRUE,
  loop = FALSE
)

Arguments

text

an input text

using

the unit used to tokenize the text, sentences or paragraphs. The default is "sentences".

connect

lowercase connectors, like the "von" in "John von Neumann".

sw

stopwords vector.

count

if TRUE (default), count how often each pair of nodes co-occurs and return the result ordered by frequency.

loop

if TRUE, loops (a node pointing to itself) are kept. If FALSE (default), they are removed.

Examples

text <- "John Does lives in New York in United States of America. He  is a passionate jazz musician, often playing in local clubs."
extract_graph_rb(text)
#> Tokenizing by sentences
#> # A tibble: 10 × 3
#>    n1            n2                n
#>    <chr>         <chr>         <int>
#>  1 America       He                1
#>  2 John_Does     America           1
#>  3 John_Does     He                1
#>  4 John_Does     New_York          1
#>  5 John_Does     United_States     1
#>  6 New_York      America           1
#>  7 New_York      He                1
#>  8 New_York      United_States     1
#>  9 United_States America           1
#> 10 United_States He                1
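
The co-occurrence rule described above can be illustrated with a minimal base R sketch. It is not the package's implementation: `cooccurrence_edges()` is a hypothetical helper, and it assumes the entities have already been extracted for each token (sentence or paragraph). Unlike the output shown above, the sketch sorts the two names within each pair alphabetically so that A-B and B-A are counted as the same edge.

```r
# Hypothetical helper, not part of the package: builds undirected
# co-occurrence edges from entities already extracted per token.
cooccurrence_edges <- function(entities_by_token) {
  pairs <- lapply(entities_by_token, function(ents) {
    ents <- unique(ents)                 # unique entities, so no self-loops
    if (length(ents) < 2) return(NULL)   # a single entity forms no edge
    p <- t(combn(ents, 2))               # one row per unordered pair
    p <- t(apply(p, 1, sort))            # A-B and B-A become the same edge
    data.frame(n1 = p[, 1], n2 = p[, 2], stringsAsFactors = FALSE)
  })
  edges <- do.call(rbind, pairs)
  if (is.null(edges)) {
    return(data.frame(n1 = character(), n2 = character(), n = integer()))
  }
  # count how often each pair occurs across tokens
  counts <- aggregate(n ~ n1 + n2, data = cbind(edges, n = 1L), FUN = sum)
  # most frequent pairs first
  counts[order(-counts$n, counts$n1, counts$n2), ]
}

# Assumed input: one character vector of entities per sentence
tokens <- list(
  c("John_Does", "New_York", "America"),
  c("John_Does", "New_York")
)
res <- cooccurrence_edges(tokens)
print(res)
```

Entities that appear in different tokens are never paired, which is why a longer token (a paragraph instead of a sentence) generally yields a denser graph.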