
Given a data frame with a list-column of POS terms, returns a list with three elements: 1) "graphs": a tibble with the frequency of each co-occurring pair of nodes (columns n1, n2 and freq); 2) "isolated_nodes": a tibble with the isolated nodes, i.e. nodes without any connection; 3) "nodes": a tibble with the individual frequency of each node. List elements that contain only one item are removed.
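
To make the idea of counting co-occurring pairs concrete, here is a minimal base R sketch. It is not the package's implementation: count_pairs() is a hypothetical helper, and sorting each element so that (a, b) and (b, a) count as the same pair is an assumption of this sketch only.

```r
# Hypothetical helper for illustration; not part of the package.
count_pairs <- function(items_list) {
  # De-duplicate items within each element, then drop elements that
  # cannot form a pair (fewer than two items).
  items_list <- lapply(items_list, unique)
  items_list <- Filter(function(x) length(x) >= 2, items_list)
  if (length(items_list) == 0) {
    return(data.frame(n1 = character(), n2 = character(), freq = integer()))
  }
  # Build every unordered pair inside each element.
  pairs <- do.call(rbind, lapply(items_list, function(x) {
    m <- combn(sort(x), 2)
    data.frame(n1 = m[1, ], n2 = m[2, ], stringsAsFactors = FALSE)
  }))
  # Count how many elements each pair appears in.
  pairs$freq <- 1L
  out <- aggregate(freq ~ n1 + n2, data = pairs, FUN = sum)
  out <- out[order(-out$freq, out$n1, out$n2), ]
  rownames(out) <- NULL
  out
}

docs <- list(c("A", "B", "C"), c("A", "B"), c("D"))
res <- count_pairs(docs)
print(res)
```

In this toy input, the pair A/B appears in two elements and therefore gets the highest frequency, while the single-item element c("D") contributes no pair.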

Usage

graph_from_cooccurrence(df_cooccurrence, strip_rgx = "^the_", freq = TRUE)

Arguments

df_cooccurrence

a dataframe generated by get_pairs()

strip_rgx

regex pattern to strip from the node text. Default: "^the_". To erase nothing, use "".

freq

if TRUE (default), returns a dataframe with frequency. If FALSE, returns only the pairs without aggregation.
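
As a rough illustration of what a pattern such as "^the_" removes, the base R snippet below applies sub() to two node labels. Whether the package uses sub() or another function internally is an assumption; only the regex behaviour is shown here.

```r
# Illustration only: base R sub() stands in for whatever the package
# uses internally to apply strip_rgx.
nodes <- c("the_San_Francisco_Police_Department", "New_York_City")

# The default pattern removes a leading "the_" and leaves other labels alone.
stripped <- sub("^the_", "", nodes)

# An empty pattern replaces nothing, so the labels are unchanged.
unchanged <- sub("", "", nodes)

print(stripped)
print(unchanged)
```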

Examples

pos <- txt_wiki |>
  filter_by_query("Police") |>
  parsePOS()

entities_by_txt <- pos |>
  dplyr::group_by(doc_id) |>
  dplyr::summarise(entities = list(unique(entity)))

graph_from_cooccurrence(entities_by_txt)
#> $graphs
#> # A tibble: 88 × 3
#>    n1               n2                                 freq
#>    <chr>            <chr>                             <int>
#>  1 Ted_Kaczynski_'s Industrial_Society_and_Its_Future     2
#>  2 Altoona          Industrial_Society_and_Its_Future     1
#>  3 Altoona          McDonald                              1
#>  4 Altoona          Ted_Kaczynski_'s                      1
#>  5 Central_Park     Altoona                               1
#>  6 Central_Park     Industrial_Society_and_Its_Future     1
#>  7 Central_Park     Mangione                              1
#>  8 Central_Park     McDonald                              1
#>  9 Central_Park     New_York_City                         1
#> 10 Central_Park     San_Francisco                         1
#> # ℹ 78 more rows
#> 
#> $isolated_nodes
#> # A tibble: 1 × 2
#>   node      freq
#>   <chr>    <int>
#> 1 American     1
#> 
#> $nodes
#> # A tibble: 17 × 2
#>    node                                      freq
#>    <chr>                                    <int>
#>  1 Industrial_Society_and_Its_Future            2
#>  2 New_Jersey                                   2
#>  3 Ted_Kaczynski_'s                             2
#>  4 Altoona                                      1
#>  5 Central_Park                                 1
#>  6 Joseph_Kenny                                 1
#>  7 Mangione                                     1
#>  8 Manhattan                                    1
#>  9 McDonald                                     1
#> 10 NYPD                                         1
#> 11 New_York                                     1
#> 12 New_York_City                                1
#> 13 Pennsylvania                                 1
#> 14 San_Francisco                                1
#> 15 Upper_Manhattan                              1
#> 16 the_George_Washington_Bridge_Bus_Station     1
#> 17 the_San_Francisco_Police_Department          1
#>