All functions |
|
---|---|
Check whether the input is a vector or a single unit. If it is a unit, break it into a vector. |
|
Returns a vector with the words of a language. It is intended for testing regex patterns. |
|
Read a Hunspell .dic file and keep only the words, to build a dictionary for the language. |
|
Lowercase connectors between two proper names. |
|
Count the elements of a vector. |
|
Extract abbreviations from text. |
|
A rule-based entity extractor that pulls entities from a text using regex. The regex captures all-uppercase words and words that begin with an uppercase letter. If several such words occur in sequence, the function captures the whole sequence. For proper names with common lowercase connectors, like "Wwwww of Wwwww", the function also captures the connector and the subsequent uppercase words. |
|
Extract an undirected graph based on co-occurrence within a token. An edge is extracted only if two entities are mentioned in the same token (sentence or paragraph). |
|
Extract a graph from text, using the matches of a custom regex pattern as nodes. |
|
Extract proper names from strings using regular expressions. The function assumes that any sequence of an uppercase letter followed by lowercase letters is a proper name. It is expected to return more matches than wanted, so post-processing is needed. |
|
Tokenize the text and select only the sentences or paragraphs that contain more than one entity. |
|
Paste and collapse character objects into one string. |
|
Generate a dictionary of specialized words: given a regex, the function checks the dictionary of the language and returns the matching words. It is also useful for testing your regex pattern. |
|
Generate a list of stopwords for a given language using grammar categories. |
|
A grep to be used with the native pipe '|>'. |
|
A grepl to be used with the native pipe '|>'. |
|
Capitalize the words of a string, except joining words*. A gsub to be used easily with the native pipe '|>'; gsub2 is just a wrapper around gsub. |
|
Install libraries from string. |
|
Load libraries from string. |
|
Convert a list of strings to a list of vectors. |
|
An empty function. |
|
Plot a network of co-occurrence of terms. |
|
A regex pattern to capture Brazilian names, like "Fulano de Tal", "Ciclano dos Santos". |
|
Convert a string into a proper name. |
|
Convert a string into a vector of elements. |
|
Extract all characters. |
|
Show all stopword categories of a language. |
|
Easily generate shuffled times, useful in web scraping. |
|
Function to delete the text vector from the start of a given text pattern onward. |
|
Replace the spaces within proper names/entities with underscores in the text. |
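
The rule-based ideas described in this index (capturing capitalized sequences with lowercase connectors, linking entities that share a sentence, and joining entity words with underscores) can be illustrated with a small sketch. This is not the package's code: the index does not show function names, so the example below is a language-neutral illustration written in Python with only the standard `re` module. The names `extract_entities`, `cooccurrence_edges`, and `underscore_entities`, the connector list, and the ASCII-only word pattern are all assumptions made for the example.

```python
import re
from itertools import combinations

# Assumed pattern pieces (hypothetical, not taken from the package):
# a word starting with an ASCII uppercase letter, and a few lowercase connectors.
WORD = r"[A-Z][A-Za-z]*"
CONNECTOR = r"(?:of|de|da|do|dos|das)"
# One capitalized word, followed by any number of further capitalized words,
# each optionally preceded by a single lowercase connector.
ENTITY = re.compile(rf"\b{WORD}(?:\s+(?:{CONNECTOR}\s+)?{WORD})*\b")


def extract_entities(text):
    """Return every capitalized sequence, connectors included."""
    return [m.group(0) for m in ENTITY.finditer(text)]


def cooccurrence_edges(text):
    """Return undirected edges between entities found in the same sentence."""
    edges = set()
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        entities = sorted(set(extract_entities(sentence)))
        # Sorting makes each pair canonical, so (a, b) and (b, a) never both appear.
        edges.update(combinations(entities, 2))
    return sorted(edges)


def underscore_entities(text):
    """Replace whitespace inside each matched entity with underscores."""
    return ENTITY.sub(lambda m: re.sub(r"\s+", "_", m.group(0)), text)


sample = "Maria dos Santos met the King of Spain. NATO was absent."
print(extract_entities(sample))
print(cooccurrence_edges(sample))
print(underscore_entities(sample))
```

As the index notes for the proper-name extractor, a rule like this returns more than wanted: any capitalized word at the start of a sentence matches too, so some post-processing (for example, filtering against a stopword list) is usually needed.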