Extracts an undirected graph based on co-occurrence within a token and returns it as a tibble. Two entities are connected only if they are mentioned in the same token (sentence or paragraph).
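To make the co-occurrence idea concrete, here is a minimal base R sketch. It is not the package's implementation: the sentence splitter, the capitalized-word entity pattern, and the column names `n1` and `n2` are all assumptions chosen for illustration.

```r
# Illustrative sketch only; not how extract_graph_df() is implemented.
text <- "Alice met Bob in Paris. Carol stayed home. Bob called Carol."

# Naive sentence split: break on whitespace that follows ., ! or ?
sentences <- unlist(strsplit(text, "(?<=[.!?])\\s+", perl = TRUE))

# Naive entity detection (assumption): runs of capitalized words.
find_entities <- function(s) {
  unique(unlist(regmatches(s, gregexpr("[A-Z][a-z]+( [A-Z][a-z]+)*", s))))
}

# One undirected edge for every pair of entities sharing a sentence.
edge_list <- lapply(sentences, function(s) {
  ents <- sort(find_entities(s))
  if (length(ents) < 2) return(NULL)  # a lone entity produces no edge
  pairs <- t(combn(ents, 2))
  data.frame(n1 = pairs[, 1], n2 = pairs[, 2], stringsAsFactors = FALSE)
})

# Drop sentences without edges, then stack the rest (n1, n2 are hypothetical names).
edges <- do.call(rbind, Filter(Negate(is.null), edge_list))
```

Sorting the entities before pairing means that A-B and B-A collapse to the same row, which is one simple way to treat the graph as undirected.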

Usage

extract_graph_df(
  df,
  column_id,
  column_text,
  using = "sentences",
  connect = connectors("misc"),
  sw = c("of", "the"),
  loop = FALSE
)

Arguments

df

a data frame with two columns: text and id

column_id

name of the column with the id

column_text

name of the column with the text from which to extract the graph

using

tokenization unit: sentences or paragraphs. Defaults to "sentences".

connect

lowercase connectors, like the "von" in "John von Neumann".
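The effect of connectors can be sketched with a regular expression. This is an assumption about the general technique, not the package's code; the `conn` vector below is a hypothetical stand-in, since the actual contents of connectors("misc") are not shown here.

```r
# Hypothetical connector list; not the real value of connectors("misc").
conn <- c("von", "van", "de")

cap <- "[A-Z][a-z]+"  # a single capitalized word

# Capitalized words, optionally joined through one lowercase connector.
with_conn <- paste0(
  cap, "(?: (?:(?:", paste(conn, collapse = "|"), ") )?", cap, ")*"
)
# Same pattern without connector support, for comparison.
without_conn <- paste0(cap, "(?: ", cap, ")*")

x <- "mathematician John von Neumann met Alan Turing"

joined <- regmatches(x, gregexpr(with_conn, x, perl = TRUE))[[1]]
split_up <- regmatches(x, gregexpr(without_conn, x, perl = TRUE))[[1]]
```

With the connector list, "John von Neumann" is kept as a single entity; without it, the name breaks into "John" and "Neumann".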

sw

vector of stopwords. Defaults to c("of", "the").

loop

if TRUE, loops (a node pointing to itself) are kept. Defaults to FALSE, which removes them.
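As a rough illustration of what removing loops means, the base R sketch below filters self-referencing rows out of an edge list. The data frame and its column names `n1` and `n2` are hypothetical and do not reflect the tibble that extract_graph_df() returns.

```r
# Hypothetical edge list; column names are assumptions for illustration.
edges <- data.frame(
  n1 = c("Bob", "Bob", "Alice"),
  n2 = c("Carol", "Bob", "Bob"),
  stringsAsFactors = FALSE
)

# A loop is an edge whose two endpoints are the same node.
no_loops <- edges[edges$n1 != edges$n2, ]
```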

Examples

# creating a dataframe with text and id
DF <- data.frame(
  text = c(
    # first document: a short made-up text
    "John Doe lives in New York in United States of America. He is a passionate jazz musician, often playing in local clubs.",
    # second document: a raw string, so the inner double quotes need no escaping
    r"(John Michael "Ozzy" Osbourne (3 December 1948 – 22 July 2025) was an English singer, songwriter, and media personality. He co-founded the pioneering heavy metal band Black Sabbath in 1968, and rose to prominence in the 1970s as their lead vocalist. During this time, he adopted the title "Prince of Darkness".[3][4] He performed on the band's first eight albums, most notably including Black Sabbath, Paranoid (both 1970) and Master of Reality (1971), before he was fired in 1979 due to his problems with alcohol and other drugs.)"
  )
) |>
  # add one id per row: id_1, id_2, ...
  dplyr::mutate(id = paste0("id_", dplyr::row_number()))
extract_graph_df(DF, "id", "text")
#> Error in extract_graph_df(DF, "id", "text"): object 'DF' not found