
Groups consecutive tokens in a data.frame generated by spacyr::spacy_parse() whenever they form a sequence of person entities (entity = PERS* / PERSON). It works similarly to spacyr::entity_extract(), but preserves the dep_rel column.
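The grouping idea can be illustrated without spaCy installed. The following is a minimal base R sketch, not the package's implementation: the toy data.frame `toks`, the `_B`/`_I` entity suffixes, and the contents of the `name` column are assumptions made for illustration only.

```r
# Hypothetical toy input mimicking a few columns of spacyr::spacy_parse() output.
toks <- data.frame(
  token_id = 1:5,
  token    = c("Mary", "Jane", "loves", "John", "Smith"),
  entity   = c("PERSON_B", "PERSON_I", "", "PERSON_B", "PERSON_I"),
  dep_rel  = c("compound", "nsubj", "ROOT", "compound", "dobj"),
  stringsAsFactors = FALSE
)

# TRUE for tokens tagged as a person entity (PER* / PERSON).
is_person <- grepl("^PER", toks$entity)

# A new group starts at every non-person token, at every entity beginning (_B),
# and at a person token that follows a non-person token.
starts <- !is_person |
  grepl("_B$", toks$entity) |
  c(TRUE, !head(is_person, -1))
group_id <- cumsum(starts)

# Paste the tokens of each group into one label and map it back to every row,
# so all original columns (including dep_rel) are kept.
labels <- vapply(split(toks$token, group_id), paste, character(1), collapse = " ")
toks$name <- unname(labels[as.character(group_id)])

print(toks)
```

Because rows are labelled rather than collapsed, each token keeps its own dep_rel value, which is the property that distinguishes this approach from extracting entities into a separate table.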

Usage

group_ppn(DF)

Arguments

DF

A data.frame generated by spacyr::spacy_parse().

Examples

"Mary Jane loves John Smith, and Maria is loved by John Does" |>
  spacyr::spacy_parse(dependency = TRUE) |>
  group_ppn()
#> # A tibble: 13 × 10
#> # Groups:   name [8]
#>    doc_id sentence_id token_id token lemma pos   head_token_id dep_rel   entity 
#>    <chr>        <int>    <int> <chr> <chr> <chr>         <dbl> <chr>     <chr>  
#>  1 text1            1        1 Mary  Mary  PROPN             2 compound  "PERSO…
#>  2 text1            1        2 Jane  Jane  PROPN             3 nsubj     "PERSO…
#>  3 text1            1        3 loves love  VERB              3 ROOT      ""     
#>  4 text1            1        4 John  John  PROPN             5 compound  "PERSO…
#>  5 text1            1        5 Smith Smith PROPN             3 dobj      "PERSO…
#>  6 text1            1        6 ,     ,     PUNCT             3 punct     ""     
#>  7 text1            1        7 and   and   CCONJ             3 cc        ""     
#>  8 text1            1        8 Maria Maria PROPN            10 nsubjpass "PERSO…
#>  9 text1            1        9 is    be    AUX              10 auxpass   ""     
#> 10 text1            1       10 loved love  VERB              3 conj      ""     
#> 11 text1            1       11 by    by    ADP              10 agent     ""     
#> 12 text1            1       12 John  John  PROPN            11 pobj      "PERSO…
#> 13 text1            1       13 Does  do    VERB              3 conj      ""     
#> # ℹ 1 more variable: name <chr>

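# Example in Portuguese
# spacyr::spacy_finalize() # Run first if spaCy was previously initialized with another model.
spacyr::spacy_initialize(model = "pt_core_news_lg")
# Note: in the run recorded below, the unqualified call spacy_initialize() failed with
# 'could not find function "spacy_initialize"' (spacyr was not attached), so the
# output appears to come from the previously loaded model, not pt_core_news_lg.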
"Maria Jana ama John Smith e Maria é amada por Joaquim de Souza" |>
  spacyr::spacy_parse(dependency = TRUE) |>
  group_ppn()
#> # A tibble: 13 × 10
#> # Groups:   name [2]
#>    doc_id sentence_id token_id token   lemma  pos   head_token_id dep_rel entity
#>    <chr>        <int>    <int> <chr>   <chr>  <chr>         <dbl> <chr>   <chr> 
#>  1 text1            1        1 Maria   Maria  PROPN             2 compou… "PERS…
#>  2 text1            1        2 Jana    Jana   PROPN             3 compou… "PERS…
#>  3 text1            1        3 ama     ama    NOUN              3 ROOT    ""    
#>  4 text1            1        4 John    John   PROPN             5 compou… "PERS…
#>  5 text1            1        5 Smith   Smith  PROPN             3 appos   "PERS…
#>  6 text1            1        6 e       e      PROPN            13 compou… ""    
#>  7 text1            1        7 Maria   Maria  PROPN            13 nmod    "NORP…
#>  8 text1            1        8 é       é      PROPN            13 compou… ""    
#>  9 text1            1        9 amada   amada  PROPN            13 compou… ""    
#> 10 text1            1       10 por     por    PROPN            13 compou… ""    
#> 11 text1            1       11 Joaquim Joaqu… PROPN            13 compou… "PERS…
#> 12 text1            1       12 de      de     PROPN            13 compou… "PERS…
#> 13 text1            1       13 Souza   Souza  PROPN             5 appos   "PERS…
#> # ℹ 1 more variable: name <chr>