All functions

add_words_check()

Checks whether the input is a vector or a single unit. If it is a single unit, breaks it into a vector.

all_words()

Returns a vector with the words of a language. It is intended for testing regex patterns.

build_dict()

Reads a Hunspell .dic file and keeps only the words, to build a dictionary for the language.

connectors()

Lowercase connectors between two proper names.

count_vec()

Counts the elements of a vector.

extract_abbrev()

Extract abbreviations from text

extract_entity()

A rule-based entity extractor that extracts entities from a text using regex. The regex captures all-uppercase words and words that begin with an uppercase letter. If several of these patterns occur in sequence, the function captures the whole sequence. For proper names with common lowercase connectors, like "Wwwww of Wwwww", the function also captures the connector and the subsequent uppercase words.
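The rule described above can be sketched in base R. This is only an illustration: the pattern, the connector list, and the name extract_entities_sketch are assumptions made for the example, not the package's actual regex or API.

```r
# Hypothetical pattern, not the package's actual regex.
# A "word" is an uppercase letter followed by any letters,
# which covers both "NASA" and "Chicago".
cap_word <- "[[:upper:]][[:alpha:]]*"

# Assumed connector list, for illustration only.
connector <- "(?:of|de|da|do|dos|das|von|van)"

# One capitalized word, then any number of further capitalized words,
# each optionally preceded by a lowercase connector.
entity_rx <- paste0(
  cap_word, "(?:\\s+(?:", connector, "\\s+)?", cap_word, ")*"
)

extract_entities_sketch <- function(text) {
  # gregexpr() finds every match; regmatches() returns the matched substrings.
  regmatches(text, gregexpr(entity_rx, text, perl = TRUE))[[1]]
}

found <- extract_entities_sketch(
  "researchers at the University of Chicago met Maria dos Santos from NASA."
)
print(found)
```

A pattern like this also matches any capitalized word at the start of a sentence, so some post-processing of the result is usually needed.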

extract_graph()

Extracts an undirected graph based on co-occurrence within a token. An edge is extracted only if two entities are mentioned in the same token (sentence or paragraph).
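As a rough sketch of the co-occurrence idea, the base R code below builds an undirected edge list from entities that were already extracted per token. The input data and the name cooccurrence_edges are hypothetical, and the package's own function may return a different structure.

```r
# Hypothetical input: one character vector of entities per token
# (sentence or paragraph).
entities_by_token <- list(
  c("Maria", "NASA"),
  c("Maria"),                    # a single entity yields no edge
  c("NASA", "Chicago", "Maria")
)

cooccurrence_edges <- function(entities_by_token) {
  pairs <- lapply(entities_by_token, function(ents) {
    # Sorting makes A-B and B-A the same pair, so the graph is undirected.
    ents <- sort(unique(ents))
    if (length(ents) < 2) return(NULL)
    t(combn(ents, 2))            # one row per pair of entities
  })
  edges <- do.call(rbind, pairs) # NULL entries are dropped by rbind
  if (is.null(edges)) {
    return(data.frame(from = character(), to = character()))
  }
  data.frame(from = edges[, 1], to = edges[, 2])
}

edges <- cooccurrence_edges(entities_by_token)
print(edges)
```

Repeated rows in the edge list indicate pairs that co-occur in more than one token, which could be counted to weight the edges.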

extract_graph_rgx()

Extracts a graph from text, using a custom regex pattern as nodes.

extract_ppn()

Extracts proper names from strings using regular expressions. The function supposes that any sequence of an uppercase letter followed by lowercase letters is a proper name. It is expected to return more than wanted, so post-processing is needed.

extract_relation()

Tokenizes the text and selects only the sentences/paragraphs with more than one entity.

f()

Easily pastes and collapses character objects into one string.

gen_dict()

Generates a dictionary of specialized words. You supply a regex, and the function checks the dictionary of the language and returns the matched words. It is also useful for testing your regex pattern.

gen_stopwords()

Generates a list of stopwords for a given language using grammar categories.

grep2()

A grep to be used with the native pipe |>.

grepl2()

A grepl to be used with the native pipe |>.

gsub2()

Transforms a string to first-letter capitalized, except join words*. A gsub to be used easily with the native pipe |>. gsub2 is just a wrapper around gsub.
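To show why such a wrapper helps, here is a minimal sketch of a pipe-friendly gsub. The name gsub_pipe and its argument order are assumptions for this example; the real gsub2 may name or order its arguments differently. The native pipe requires R 4.1 or later.

```r
# Hypothetical wrapper, not the package's gsub2.
# base::gsub() takes the string as its third argument, but the native pipe
# passes the left-hand side as the first argument, so the wrapper moves
# the string to the front.
gsub_pipe <- function(x, pattern, replacement, ...) {
  gsub(pattern, replacement, x, ...)
}

result <- "Fulano de Tal" |> gsub_pipe(" ", "_")
print(result)
```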

il()

Install libraries from string.

ll()

Load libraries from string.

ls2v()

Converts a list of strings to a list of vectors.

nothing()

An empty function.

plot_graph()

Plots a co-occurrence network of terms.

regex_NomeProprio

A regex pattern to capture Brazilian names, like "Fulano de Tal", "Ciclano dos Santos".

s2ppn()

Convert a string into a proper name.

s2v()

Convert a string into a vector of elements.

s_extract_all()

Extracts all chars.

show_sw()

Shows all stopword categories of a language.

shuffle_time()

Easily generates shuffle times, useful for web scraping.

strip_txt()

Function to erase the text vector starting from the beginning of a given text pattern.

subs_ppn()

Substitutes the spaces in proper names/entities with underscores in the text.