Transforms the frequencies in a co-occurrence data frame by setting a stipulated maximum value for word frequency per document. The input must be the tibble from cooccur_words(output=df2). The function groups the data by two columns, sums the counts within each group, and arranges the results in descending order. It is useful for limiting how much each document can contribute to the corpus through its node pairs, so that a word pair that is highly frequent in one or a few documents, but rare or absent in the others, is not taken as representative of the whole corpus.
Examples
# Example usage:
cooc_data <- data.frame(n1 = c(1, 1, 2), n2 = c(3, 4, 3), n = c(5, 6, 7))
cooc_data
#>   n1 n2 n
#> 1  1  3 5
#> 2  1  4 6
#> 3  2  3 7
reduce_freq(cooc_data, threshold = 5)
#> # A tibble: 3 × 3
#>      n1    n2     n
#>   <dbl> <dbl> <dbl>
#> 1     1     3     5
#> 2     1     4     5
#> 3     2     3     5
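
The example above has one row per pair, so the summing step is not visible. The following is a minimal sketch of the behaviour described above (cap, group, sum, arrange), written with dplyr. It is not the package's implementation: the helper name cap_and_sum() is hypothetical, and the input layout (one row per document in which a pair occurs, with columns n1, n2 and n) is an assumption made for illustration.

```r
library(dplyr)

# Hypothetical helper, NOT the package's code: cap each row's count at
# `threshold`, then sum the capped counts for every (n1, n2) pair.
cap_and_sum <- function(df, threshold) {
  df %>%
    mutate(n = pmin(n, threshold)) %>%          # cap the per-row frequency
    group_by(n1, n2) %>%                        # one group per node pair
    summarise(n = sum(n), .groups = "drop") %>% # add up the capped counts
    arrange(desc(n))                            # most frequent pairs first
}

# Assumed layout: each row is the count of a pair within one document.
# The pair a-b occurs 40 times in one document and twice in another.
cooc_docs <- data.frame(
  n1 = c("a", "a", "a", "b"),
  n2 = c("b", "b", "c", "c"),
  n  = c(40, 2, 3, 1)
)

capped <- cap_and_sum(cooc_docs, threshold = 5)
capped
```

With threshold = 5, the pair a-b contributes 5 + 2 = 7 instead of 40 + 2 = 42, so a single document can no longer dominate the total for that pair.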