Get co-occurrence of all words from pure text — cooccur

Takes plain text as input and returns a tibble/data frame of word co-occurrences.

Usage

cooccur_words(
  text,
  sw = "",
  token_by = "sentence",
  lower = TRUE,
  loop = FALSE,
  output = "df",
  count = TRUE
)

Arguments

text: The input text
sw: A vector of stopwords to be removed
token_by: Tokenize by sentence or paragraph
lower: Convert words to lowercase. Text that is already all lowercase when passed in can produce incorrect sentence and paragraph tokenization, so it is advised to keep the original casing in the input and do the lowercasing through this argument.
loop: If FALSE, self-referential pairs (e.g. n1 = x and n2 = x) are excluded. Default FALSE.
output: Output format: 1) a single tibble/data frame ("tlb", "df", "tibble", "datafame"); 2) a list of data frames with the co-occurrences of each vector element ("lst" or "list"); 3) a raw list, the least processed output of this function; or 4) "df2", a tibble/data frame that includes the document numbers.
count: Return the count of word co-occurrences (default TRUE)

Examples

txt <- "Lorem Ipsum. The Ipsum John. Dolor est. Lorem Ipsum dolor."
txt |> cooccur_words()
#> tokenizing sentences...
#> tokenizing words...
#> # A tibble: 7 × 3
#>   n1    n2        n
#>   <chr> <chr> <int>
#> 1 ipsum lorem     2
#> 2 dolor est       1
#> 3 dolor ipsum     1
#> 4 dolor lorem     1
#> 5 ipsum john      1
#> 6 ipsum the       1
#> 7 john  the       1
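
To make the counting concrete, here is a minimal base-R sketch of sentence-level co-occurrence. It is not the package's implementation: `count_pairs` is a hypothetical helper, the sentence splitting is deliberately naive, and it assumes each word counts once per sentence and that each pair is stored in alphabetical order.

```r
# Hypothetical helper, not part of the package: counts unordered word pairs
# that appear together in the same sentence.
count_pairs <- function(text) {
  # Naive sentence split on runs of ., ! or ? (assumption: no abbreviations)
  sentences <- strsplit(text, "[.!?]+\\s*")[[1]]

  pairs <- lapply(sentences, function(s) {
    # Lowercase, split on anything that is not a letter, drop empty strings
    words <- strsplit(tolower(s), "[^[:alpha:]]+")[[1]]
    words <- sort(unique(words[nzchar(words)]))
    if (length(words) < 2) return(NULL)
    # Each column of combn() is one pair; transpose to one pair per row
    t(combn(words, 2))
  })
  pairs <- do.call(rbind, pairs)

  if (is.null(pairs)) {
    return(data.frame(n1 = character(), n2 = character(), n = integer()))
  }

  df <- data.frame(n1 = pairs[, 1], n2 = pairs[, 2], n = 1L,
                   stringsAsFactors = FALSE)
  # Sum the per-sentence occurrences of each pair
  out <- aggregate(n ~ n1 + n2, data = df, FUN = sum)
  # Most frequent pairs first, then alphabetical
  out <- out[order(-out$n, out$n1, out$n2), ]
  rownames(out) <- NULL
  out
}

txt <- "Lorem Ipsum. The Ipsum John. Dolor est. Lorem Ipsum dolor."
res <- count_pairs(txt)
print(res)
```

On the example text this sketch should give the same seven pairs as the output above, with `ipsum`/`lorem` counted twice. Stopword removal, paragraph tokenization and the other output formats are left out.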