Groups tokens in a data.frame generated by spacyr::spacy_parse() wherever consecutive tokens have entity = PERS* / PERSON. It works similarly to spacyr::entity_extract(), but preserves the dep_rel column.
Examples
# Example in Portuguese
# spacyr::spacy_finalize() # Run first if spaCy was previously initialized with another model.
spacyr::spacy_initialize(model = "pt_core_news_lg")
"Maria Jana ama John Smith e Maria é amada por Joaquim de Souza" |>
  spacyr::spacy_parse(dependency = TRUE) |>
  group_ppn()
#> # A tibble: 13 × 10
#> # Groups: name [2]
#> doc_id sentence_id token_id token lemma pos head_token_id dep_rel entity
#> <chr> <int> <int> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 text1 1 1 Maria Maria PROPN 2 compou… "PERS…
#> 2 text1 1 2 Jana Jana PROPN 3 compou… "PERS…
#> 3 text1 1 3 ama ama NOUN 3 ROOT ""
#> 4 text1 1 4 John John PROPN 5 compou… "PERS…
#> 5 text1 1 5 Smith Smith PROPN 3 appos "PERS…
#> 6 text1 1 6 e e PROPN 13 compou… ""
#> 7 text1 1 7 Maria Maria PROPN 13 nmod "NORP…
#> 8 text1 1 8 é é PROPN 13 compou… ""
#> 9 text1 1 9 amada amada PROPN 13 compou… ""
#> 10 text1 1 10 por por PROPN 13 compou… ""
#> 11 text1 1 11 Joaquim Joaqu… PROPN 13 compou… "PERS…
#> 12 text1 1 12 de de PROPN 13 compou… "PERS…
#> 13 text1 1 13 Souza Souza PROPN 5 appos "PERS…
#> # ℹ 1 more variable: name <chr>
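
The grouping idea from the description can be sketched in base R without a spaCy installation. This is only an illustration under stated assumptions, not the implementation of group_ppn(): the helper `label_person_runs()` and the toy `parsed` data are hypothetical, and the sketch assumes entity tags follow the spacyr convention of a type plus a `_B` (begin) or `_I` (inside) suffix, with person types starting with "PER".

```r
# Toy stand-in for spacyr::spacy_parse() output (values made up for illustration).
parsed <- data.frame(
  token_id = 1:7,
  token    = c("Maria", "Jana", "ama", "John", "Smith", "e", "Ana"),
  dep_rel  = c("nsubj", "flat:name", "ROOT", "obj", "flat:name", "cc", "conj"),
  entity   = c("PER_B", "PER_I", "", "PER_B", "PER_I", "", "PER_B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: give every token in a run of consecutive person
# tokens the same `name`, keeping all original columns (including dep_rel).
label_person_runs <- function(df) {
  is_person   <- grepl("^PER", df$entity)
  prev_person <- c(FALSE, head(is_person, -1))
  # A run starts at a person token tagged "_B" or one that follows a non-person token.
  starts <- is_person & (grepl("_B$", df$entity) | !prev_person)
  run_id <- cumsum(starts)

  # Paste together the tokens of each run.
  run_names <- vapply(
    split(df$token[is_person], run_id[is_person]),
    paste, character(1), collapse = " "
  )

  # Non-person tokens keep NA in the new column.
  df$name <- NA_character_
  df$name[is_person] <- unname(run_names[as.character(run_id[is_person])])
  df
}

result <- label_person_runs(parsed)
print(result[, c("token", "dep_rel", "entity", "name")])
```

Because each row is kept and only a `name` column is added, the per-token dep_rel values stay available for later filtering or grouping by name.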