Given a data frame with a list column of terms obtained from POS parsing (for example, the unique entities found in each document), returns a list with three elements:
1) "graphs": a tibble with the frequency of each pair of co-occurring nodes (columns n1, n2 and freq);
2) "isolated_nodes": a tibble with the isolated nodes, i.e. nodes without any connection;
3) "nodes": a tibble with the frequency of each individual node.
If a list element contains only one item, it is removed.
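The counting described above can be illustrated with a minimal base-R sketch. `cooccurrence_sketch()` is a hypothetical helper written for illustration, not the package's implementation. It rests on a few assumptions: each node is counted at most once per row, pairs are unordered (so the n1/n2 order may differ from the package's output), the item of a single-item row is reported as an isolated node, and node names contain no tab characters.

```r
# cooccurrence_sketch() is a HYPOTHETICAL helper, not the package's code:
# a base-R illustration of counting co-occurring pairs in a list column.
cooccurrence_sketch <- function(df, col = "entities") {
  # Drop duplicates within each row, so a node counts once per row
  items <- lapply(df[[col]], function(x) unique(as.character(x)))

  # Count occurrences of each value, most frequent first
  count_values <- function(x) {
    x <- as.character(x)
    u <- unique(x)
    out <- data.frame(
      node = u,
      freq = vapply(u, function(v) sum(x == v), integer(1), USE.NAMES = FALSE),
      stringsAsFactors = FALSE
    )
    out <- out[order(-out$freq, out$node), , drop = FALSE]
    rownames(out) <- NULL
    out
  }

  # Assumption: a row with a single item cannot form a pair, so its item is
  # reported as an isolated node and the row is dropped before counting pairs
  single <- lengths(items) == 1
  isolated_nodes <- count_values(unlist(items[single]))
  items <- items[lengths(items) > 1]

  # Every unordered pair within a row, encoded as "n1<TAB>n2";
  # sorting first makes (A, B) and (B, A) the same pair.
  # Assumes node names contain no tab characters.
  pair_keys <- unlist(lapply(items, function(x) {
    apply(combn(sort(x), 2), 2, paste, collapse = "\t")
  }))
  pair_counts <- count_values(pair_keys)
  ends <- strsplit(pair_counts$node, "\t", fixed = TRUE)
  graphs <- data.frame(
    n1 = vapply(ends, `[`, character(1), 1),
    n2 = vapply(ends, `[`, character(1), 2),
    freq = pair_counts$freq,
    stringsAsFactors = FALSE
  )

  list(
    graphs = graphs,
    isolated_nodes = isolated_nodes,
    nodes = count_values(unlist(items))
  )
}

# Toy input: one row per document, entities stored in a list column
df <- data.frame(doc_id = 1:3)
df$entities <- list(c("A", "B", "C"), c("B", "A"), "D")
res <- cooccurrence_sketch(df)
res$graphs
```

On this toy input, the pair A, B appears in two rows and therefore has freq 2, while D, the only item of the third row, ends up in isolated_nodes. Encoding each pair as a single string lets the same counting helper serve both nodes and pairs.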
Examples
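# Keep the texts that match "Police" and parse their POS
pos <- txt_wiki |>
  filter_by_query("Police") |>
  parsePOS()
# One row per document, with its unique entities in a list column
entities_by_txt <- pos |>
  dplyr::group_by(doc_id) |>
  dplyr::summarise(entities = list(unique(entity)))
# Count co-occurring entities across documents
graph_from_cooccurrence(entities_by_txt)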
#> $graphs
#> # A tibble: 88 × 3
#>    n1               n2                                 freq
#>    <chr>            <chr>                             <int>
#>  1 Ted_Kaczynski_'s Industrial_Society_and_Its_Future     2
#>  2 Altoona          Industrial_Society_and_Its_Future     1
#>  3 Altoona          McDonald                              1
#>  4 Altoona          Ted_Kaczynski_'s                      1
#>  5 Central_Park     Altoona                               1
#>  6 Central_Park     Industrial_Society_and_Its_Future     1
#>  7 Central_Park     Mangione                              1
#>  8 Central_Park     McDonald                              1
#>  9 Central_Park     New_York_City                         1
#> 10 Central_Park     San_Francisco                         1
#> # ℹ 78 more rows
#>
#> $isolated_nodes
#> # A tibble: 1 × 2
#>   node      freq
#>   <chr>    <int>
#> 1 American     1
#>
#> $nodes
#> # A tibble: 17 × 2
#>    node                                      freq
#>    <chr>                                    <int>
#>  1 Industrial_Society_and_Its_Future            2
#>  2 New_Jersey                                   2
#>  3 Ted_Kaczynski_'s                             2
#>  4 Altoona                                      1
#>  5 Central_Park                                 1
#>  6 Joseph_Kenny                                 1
#>  7 Mangione                                     1
#>  8 Manhattan                                    1
#>  9 McDonald                                     1
#> 10 NYPD                                         1
#> 11 New_York                                     1
#> 12 New_York_City                                1
#> 13 Pennsylvania                                 1
#> 14 San_Francisco                                1
#> 15 Upper_Manhattan                              1
#> 16 the_George_Washington_Bridge_Bus_Station     1
#> 17 the_San_Francisco_Police_Department          1
#>