All functions

all_words()

Return a vector with the words of a language. It is intended for testing regex patterns.

connectors()

Lowercase connectors that can appear between two proper names.

count_vec()

Count the elements of a vector, optionally arrange (sort) the counts, and return a tibble.
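A minimal base-R sketch of the counting step, assuming the input is an atomic vector. It returns a plain data.frame instead of a tibble so it runs without extra packages. The name count_vec_sketch and its arrange argument are hypothetical and do not reflect count_vec's real signature.

```r
# Hypothetical sketch of count_vec(); returns a data.frame, not a tibble.
count_vec_sketch <- function(x, arrange = TRUE) {
  # table() counts each distinct element (names come out sorted)
  counts <- as.data.frame(table(x), stringsAsFactors = FALSE)
  names(counts) <- c("element", "n")
  # Optionally sort by decreasing count
  if (arrange) counts <- counts[order(-counts$n), ]
  rownames(counts) <- NULL
  counts
}

count_vec_sketch(c("b", "a", "b", "c", "b", "a"))
```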

extract_abbrev()

Extract abbreviations from text

extract_entity()

A rule-based entity extractor that extracts entities from text using regex. The pattern captures fully uppercase words and words that begin with an uppercase letter. When several such words appear in sequence, the whole sequence is captured as one entity. For proper names with common lowercase connectors, such as "Wwwww of Wwwww", the connector and the following capitalized words are captured as well.
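The rule described above can be sketched as a single PCRE pattern. This is only an illustration under stated assumptions: the connector list ("of", "the") and the names entity_rgx and extract_entity_sketch are hypothetical, not the package's actual pattern or function. Note that sentence-initial words such as "The" are captured too, which is why post-processing is usually needed.

```r
# Hypothetical sketch of a rule-based entity regex; not the package's pattern.
cap  <- "[[:upper:]][[:alpha:]]*"   # word starting with an uppercase letter
conn <- "(?:of|the)"                # assumed lowercase connectors
# A capitalized word, followed by more capitalized words that may be
# joined by a single lowercase connector ("Wwwww of Wwwww").
entity_rgx <- paste0(cap, "(?:\\s+(?:", conn, "\\s+)?", cap, ")*")

extract_entity_sketch <- function(text) {
  regmatches(text, gregexpr(entity_rgx, text, perl = TRUE))[[1]]
}

extract_entity_sketch("The Bank of England met NASA officials in New York.")
# "The Bank of England" "NASA" "New York"
```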

extract_graph()

Extract an undirected graph based on co-occurrence within a token: an edge is created only when two entities are mentioned in the same token (sentence or paragraph).
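A base-R sketch of the co-occurrence rule, assuming the entities have already been extracted per token. The function name cooccurrence_edges and the weight column (number of tokens in which a pair co-occurs) are illustrative choices, not the package's output format. Each pair is sorted so that A-B and B-A count as the same undirected edge.

```r
# Hypothetical sketch: undirected co-occurrence edges from per-token entities.
# Each list element holds the entities found in one token (sentence/paragraph).
cooccurrence_edges <- function(entities_per_token) {
  pairs <- lapply(entities_per_token, function(ents) {
    ents <- sort(unique(ents))
    if (length(ents) < 2) return(NULL)  # no edge without two entities
    t(combn(ents, 2))                   # all unordered pairs, one per row
  })
  pairs <- do.call(rbind, pairs)
  if (is.null(pairs)) {
    return(data.frame(from = character(), to = character(), weight = numeric()))
  }
  edges <- data.frame(from = pairs[, 1], to = pairs[, 2])
  # weight = number of tokens in which the pair co-occurs
  aggregate(list(weight = rep(1, nrow(edges))), by = edges, FUN = sum)
}

cooccurrence_edges(list(c("Ana", "Rio"), "Ana", c("Rio", "Ana", "Bia")))
```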

extract_graph_rgx()

Extract a graph from text, using matches of a custom regex pattern as nodes.

extract_ppn()

Extract proper names from strings using regular expressions. The function assumes that any sequence of an uppercase letter followed by lowercase letters is a proper name, so it is expected to return more matches than wanted; the results need post-processing.
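To illustrate why post-processing is needed, here is a sketch with a naive "uppercase then lowercase" pattern followed by a simple filter. The pattern, the sample text, and the common_words list are assumptions for illustration only, not extract_ppn's internals.

```r
# Hypothetical sketch: naive proper-name regex plus a post-processing filter.
ppn_rgx <- "[[:upper:]][[:lower:]]+(?:\\s+[[:upper:]][[:lower:]]+)*"
text <- "In 2020 Maria Silva visited Lisbon. She liked it."

candidates <- regmatches(text, gregexpr(ppn_rgx, text, perl = TRUE))[[1]]
candidates
# "In" "Maria Silva" "Lisbon" "She"  (over-captures, as expected)

# Post-processing: drop candidates found in an assumed list of common words
common_words <- c("In", "She")
candidates[!candidates %in% common_words]
# "Maria Silva" "Lisbon"
```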

extract_relation()

Tokenize text and select only the sentences or paragraphs that contain more than one entity.
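A sketch of the selection step, assuming a known list of entities and a naive sentence splitter. The function name relation_sentences is hypothetical, and the plain substring matching is a simplification (for example, "Ana" would also match inside "Banana").

```r
# Hypothetical sketch: keep only sentences that mention 2+ known entities.
relation_sentences <- function(text, entities) {
  # Naive sentence tokenizer: split after ., ! or ? followed by whitespace
  sentences <- strsplit(text, "(?<=[.!?])\\s+", perl = TRUE)[[1]]
  # Count how many entities occur in each sentence (plain substring match)
  n_entities <- vapply(sentences, function(s) {
    sum(vapply(entities, grepl, logical(1), x = s, fixed = TRUE))
  }, integer(1))
  sentences[n_entities > 1]
}

relation_sentences("Ana met Bia in Rio. Ana went home. Bia called Caio.",
                   c("Ana", "Bia", "Caio", "Rio"))
# "Ana met Bia in Rio." "Bia called Caio."
```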

f()

Easily paste and collapse character objects into one string. A wrapper around glue::glue.

gen_dict()

Generate a dictionary of specialized words: you supply a regex, and the function checks it against the dictionary of the language and returns the matching words. It is also useful for testing your regex pattern.
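The core idea can be sketched as a regex filter over a word list. Here the small made-up vector dictionary stands in for a language's word list (for instance, what all_words() might return), and gen_dict_sketch is a hypothetical name, not gen_dict's real signature.

```r
# Hypothetical sketch: filter a word list with a regex.
# `dictionary` stands in for a language's word list; the words are made up.
gen_dict_sketch <- function(pattern, dictionary) {
  grep(pattern, dictionary, value = TRUE, perl = TRUE)
}

dictionary <- c("biology", "geology", "theology", "biome", "logic")
gen_dict_sketch("logy$", dictionary)
# "biology" "geology" "theology"
```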

gen_stopwords()

Generate a list of stopwords for a given language using grammar categories.

grep2()

A grep to be used with the native pipe |>.

grepl2()

A grepl to be used with the native pipe |>.

gsub2()

Transform a string to first-letter capitalization, except for joining words*. A gsub that is easy to use with the native pipe |>; gsub2 is just a wrapper around gsub.
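What makes grep2, grepl2 and gsub2 pipe-friendly is, presumably, argument order: base grep, grepl and gsub take the pattern first, while |> passes its left-hand side as the first argument. The sketches below assume the wrappers simply move the data argument to the front; the _sketch names are hypothetical, and the native pipe requires R 4.1 or later.

```r
# Hypothetical sketches of pipe-friendly wrappers: the data argument comes
# first so the native pipe |> can feed it in (requires R >= 4.1).
grep2_sketch  <- function(x, pattern, ...) grep(pattern, x, ...)
grepl2_sketch <- function(x, pattern, ...) grepl(pattern, x, ...)
gsub2_sketch  <- function(x, pattern, replacement, ...) gsub(pattern, replacement, x, ...)

fruits <- c("apple", "banana", "cherry")
fruits |> grep2_sketch("an", value = TRUE)   # "banana"
fruits |> grepl2_sketch("an")                # FALSE TRUE FALSE
"2024-01-15" |> gsub2_sketch("-", "/")       # "2024/01/15"
```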

il()

Install libraries from a string.

ll()

Load libraries from a string.
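A sketch of loading packages named in a single string, assuming names are separated by commas and/or whitespace (the real il() and ll() may use a different separator). ll_sketch is a hypothetical name; it attaches only packages that are already installed.

```r
# Hypothetical sketch of ll(): split a string of package names and attach each.
# Assumes names are separated by commas and/or whitespace.
ll_sketch <- function(pkgs) {
  pkgs <- strsplit(trimws(pkgs), "[,[:space:]]+")[[1]]
  for (p in pkgs) library(p, character.only = TRUE)
  invisible(pkgs)
}

ll_sketch("stats, tools")
# An il() counterpart could pass the same vector to install.packages().
```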

ls2v()

Convert a list of strings into a list of vectors.

nothing()

An empty function.

plot_graph()

Plot a co-occurrence network of terms.

regex_NomeProprio

A regex pattern to capture Brazilian names, such as "Fulano de Tal" and "Ciclano dos Santos".
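A pattern in the same spirit can be sketched as two or more capitalized words optionally joined by Portuguese connectors. This is not the actual value of regex_NomeProprio; the connector list and the name nome_rgx are assumptions, and the sketch is ASCII-oriented, so accented capitals may need extra handling.

```r
# Hypothetical sketch in the spirit of regex_NomeProprio (not its real value):
# two or more capitalized words, optionally joined by Portuguese connectors.
nome_rgx <- paste0(
  "[[:upper:]][[:lower:]]+",
  "(?:\\s+(?:(?:dos|das|de|da|do)\\s+)?[[:upper:]][[:lower:]]+)+"
)

txt <- "Assinado por Fulano de Tal, Ciclano dos Santos e Beltrana da Silva."
regmatches(txt, gregexpr(nome_rgx, txt, perl = TRUE))[[1]]
# "Fulano de Tal" "Ciclano dos Santos" "Beltrana da Silva"
```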

s2ppn()

Convert a string into a proper name.
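One plausible way to do this conversion is to lowercase the string, then capitalize each word except lowercase connectors. The connector list and the name s2ppn_sketch are assumptions for illustration, not the package's implementation.

```r
# Hypothetical sketch: capitalize each word except assumed lowercase connectors.
s2ppn_sketch <- function(s, connectors = c("de", "da", "do", "dos", "das", "e")) {
  words <- strsplit(tolower(trimws(s)), "\\s+")[[1]]
  cap <- !(words %in% connectors)
  words[cap] <- paste0(toupper(substring(words[cap], 1, 1)), substring(words[cap], 2))
  paste(words, collapse = " ")
}

s2ppn_sketch("MARIA DA SILVA")   # "Maria da Silva"
```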

s2v()

Convert a string into a vector of elements.
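A sketch of s2v together with its list counterpart ls2v, assuming elements are separated by commas and/or whitespace. Both _sketch names and the separator rule are hypothetical.

```r
# Hypothetical sketches of s2v() and ls2v(); the separator rule
# (commas and/or whitespace) is an assumption.
s2v_sketch <- function(s) {
  v <- strsplit(trimws(s), "[,[:space:]]+")[[1]]
  v[nzchar(v)]  # drop empty pieces, e.g. from a leading comma
}
ls2v_sketch <- function(l) lapply(l, s2v_sketch)

s2v_sketch("dplyr, tidyr  stringr")        # "dplyr" "tidyr" "stringr"
ls2v_sketch(list(a = "x y", b = "z"))      # list(a = c("x", "y"), b = "z")
```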

s_extract_all()

extract all chars.

show_sw()

Show all stopword categories of a language.

shuffle_time()

Easily generate shuffle times (random waiting intervals), useful in web scraping.
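Assuming a "shuffle time" is a randomized pause between requests, a sketch could draw the pause length uniformly from a range. The name shuffle_time_sketch and its from/to arguments are hypothetical; the loop body is a placeholder where the actual request would go.

```r
# Hypothetical sketch: a random pause length (in seconds) drawn uniformly
# between `from` and `to`, so scraping requests are not evenly spaced.
shuffle_time_sketch <- function(from = 1, to = 5) runif(1, min = from, max = to)

for (url in c("page1", "page2")) {
  # ... fetch `url` here ...
  Sys.sleep(shuffle_time_sketch(0.1, 0.5))
}
```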

subs_ppn()

Replace the spaces in proper names/entities with underscores in the text.
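Joining the words of a name with underscores keeps a multi-word name as a single token in later tokenization. A sketch, assuming the names to replace are already known; subs_ppn_sketch is a hypothetical name. Longer names are processed first so that a shorter name contained in a longer one does not break it.

```r
# Hypothetical sketch: replace spaces inside known names with underscores,
# so each multi-word name becomes a single token.
subs_ppn_sketch <- function(text, names) {
  # Longest names first, so "Maria da Silva" is handled before "Maria"
  for (nm in names[order(-nchar(names))]) {
    text <- gsub(nm, gsub(" ", "_", nm, fixed = TRUE), text, fixed = TRUE)
  }
  text
}

subs_ppn_sketch("Maria da Silva met Joao Souza.", c("Joao Souza", "Maria da Silva"))
# "Maria_da_Silva met Joao_Souza."
```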