telegramR

R package

The telegramR package works with chats exported from the Telegram messenger. Telegram Desktop lets users export the data of chats and channels, but how to extract and process that data remains an open question. This package converts the HTML of exported chats/channels into a tibble/data frame and provides some functions that give a brief summary of the exported chats.

Installation: Option 1 with devtools package

To install this package using devtools:

# installing devtools
install.packages("devtools")
# installing telegramR
devtools::install_github("SoaresAlisson/telegramR")

Installation: Option 2 with remotes package

# if the remotes package is installed, load it; if not, install it
if (!require('remotes')) install.packages('remotes')
remotes::install_github('SoaresAlisson/telegramR')

Installation: Option 3 download the files and build it locally

Go to the GitHub repository, click the green "<> Code" button, then "Download ZIP". Unzip the archive and open the .Rproj file in RStudio. Once the project is loaded, build the package in one of two ways: 1) press Ctrl+Shift+B, or 2) go to Build / Install Package.

Running the package

Once installed, load the package:

library(telegramR)

To transform a single HTML file into a tibble/data frame, use the function html2df("html_file"):

html2df("~/Downloads/Telegram Desktop/ChatExport_2023-12-16 (1)/messages.html")
#> # A tibble: 999 × 6
#>    Chat_Name   msg_Id date_time           user_name   text     Audio_Video_image
#>    <chr>       <chr>  <dttm>              <chr>       <chr>    <chr>            
#>  1 Hacker News 69     2017-01-05 04:36:10 Hacker News 'Legiti… <NA>             
#>  2 Hacker News 70     2017-01-05 04:41:16 <NA>        Bitcoin… <NA>             
#>  3 Hacker News 71     2017-01-05 06:00:15 Hacker News Snapcha… <NA>             
#>  4 Hacker News 72     2017-01-05 06:15:47 Hacker News Rumors … <NA>             
#>  5 Hacker News 73     2017-01-05 13:41:36 Hacker News Mozilla… <NA>             
#>  6 Hacker News 74     2017-01-05 13:41:37 <NA>        Nvidia … <NA>             
#>  7 Hacker News 75     2017-01-05 13:41:37 <NA>        Easy 65… <NA>             
#>  8 Hacker News 76     2017-01-05 13:41:38 <NA>        Taichi … <NA>             
#>  9 Hacker News 77     2017-01-05 13:41:38 <NA>        .NET Co… <NA>             
#> 10 Hacker News 78     2017-01-05 13:41:42 <NA>        How to … <NA>             
#> # ℹ 989 more rows
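
Because the result is an ordinary tibble, it can be summarised with base R. The snippet below is only a sketch: it uses a small hand-made data frame (made-up values) that mimics the columns shown above instead of a real html2df() result, and counts messages per day and the share of rows without a user name.

```r
# Toy stand-in for the tibble returned by html2df(); all values are made up.
msgs <- data.frame(
  Chat_Name = "Hacker News",
  msg_Id    = c("69", "70", "71", "72"),
  date_time = as.POSIXct(
    c("2017-01-05 04:36:10", "2017-01-05 04:41:16",
      "2017-01-05 06:00:15", "2017-01-06 13:41:36"),
    tz = "UTC"
  ),
  user_name = c("Hacker News", NA, "Hacker News", "Hacker News"),
  text      = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Number of messages per calendar day.
per_day <- table(as.Date(msgs$date_time, tz = "UTC"))

# Share of rows whose user_name is missing.
na_share <- mean(is.na(msgs$user_name))

print(per_day)
print(na_share)
```

With a real export, replace the toy data frame with the object returned by html2df().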

To read every HTML file in a folder and combine all messages into a single tibble, use dir2df():

dir2df("~/Downloads/Telegram Desktop/ChatExport_2023-12-16 (1)/")
#> # A tibble: 170,156 × 6
#>    Chat_Name   msg_Id date_time           user_name   text     Audio_Video_image
#>    <chr>       <chr>  <dttm>              <chr>       <chr>    <chr>            
#>  1 Hacker News 69     2017-01-05 04:36:10 Hacker News 'Legiti… <NA>             
#>  2 Hacker News 70     2017-01-05 04:41:16 <NA>        Bitcoin… <NA>             
#>  3 Hacker News 71     2017-01-05 06:00:15 Hacker News Snapcha… <NA>             
#>  4 Hacker News 72     2017-01-05 06:15:47 Hacker News Rumors … <NA>             
#>  5 Hacker News 73     2017-01-05 13:41:36 Hacker News Mozilla… <NA>             
#>  6 Hacker News 74     2017-01-05 13:41:37 <NA>        Nvidia … <NA>             
#>  7 Hacker News 75     2017-01-05 13:41:37 <NA>        Easy 65… <NA>             
#>  8 Hacker News 76     2017-01-05 13:41:38 <NA>        Taichi … <NA>             
#>  9 Hacker News 77     2017-01-05 13:41:38 <NA>        .NET Co… <NA>             
#> 10 Hacker News 78     2017-01-05 13:41:42 <NA>        How to … <NA>             
#> # ℹ 170,146 more rows

The function's recursive parameter defaults to TRUE, so it also finds HTML files in nested subfolders. To search only the given folder, set recursive = FALSE.

dir2df("~/Downloads/Telegram Desktop/")
#> # A tibble: 207,592 × 6
#>    Chat_Name   msg_Id date_time           user_name   text     Audio_Video_image
#>    <chr>       <chr>  <dttm>              <chr>       <chr>    <chr>            
#>  1 Hacker News 69     2017-01-05 04:36:10 Hacker News 'Legiti… <NA>             
#>  2 Hacker News 70     2017-01-05 04:41:16 <NA>        Bitcoin… <NA>             
#>  3 Hacker News 71     2017-01-05 06:00:15 Hacker News Snapcha… <NA>             
#>  4 Hacker News 72     2017-01-05 06:15:47 Hacker News Rumors … <NA>             
#>  5 Hacker News 73     2017-01-05 13:41:36 Hacker News Mozilla… <NA>             
#>  6 Hacker News 74     2017-01-05 13:41:37 <NA>        Nvidia … <NA>             
#>  7 Hacker News 75     2017-01-05 13:41:37 <NA>        Easy 65… <NA>             
#>  8 Hacker News 76     2017-01-05 13:41:38 <NA>        Taichi … <NA>             
#>  9 Hacker News 77     2017-01-05 13:41:38 <NA>        .NET Co… <NA>             
#> 10 Hacker News 78     2017-01-05 13:41:42 <NA>        How to … <NA>             
#> # ℹ 207,582 more rows
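
To see what recursive and non-recursive discovery mean in practice, here is a minimal base R sketch using list.files(). It does not show how dir2df() is implemented; the folder and file names are made up and created in a temporary directory only for illustration.

```r
# Build a throwaway folder tree that mimics two exports (names are made up).
root <- file.path(tempdir(), "telegram_demo")
dir.create(file.path(root, "ChatExport_A"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "ChatExport_B"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(root, "top.html")))
invisible(file.create(file.path(root, "ChatExport_A", "messages.html")))
invisible(file.create(file.path(root, "ChatExport_B", "messages.html")))

# Recursive search: descends into every subfolder.
all_html <- list.files(root, pattern = "\\.html$", recursive = TRUE, full.names = TRUE)

# Non-recursive search: only files directly inside `root`.
top_only <- list.files(root, pattern = "\\.html$", recursive = FALSE, full.names = TRUE)

print(length(all_html))
print(length(top_only))

# Remove the temporary tree again.
unlink(root, recursive = TRUE)
```

The recursive listing finds the three HTML files, while the non-recursive one finds only top.html.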

Organizing the exported chats/channels

Telegram Desktop allows the user to export channels/chats for specific date ranges. If you export many Telegram channels/chats, it is easy to lose track of them, either because there are many groups/channels or because the same channel was exported several times with different date ranges. To help organize them, the function tm_info() (Telegram information) returns the name of the channel/chat and the first and last dates in each HTML file:

info_all_tm_chats <- tm_info("~/Downloads/Telegram Desktop/")
info_all_tm_chats
#> # A tibble: 209 × 5
#>    HTMLName                                 name  UserPics FirstDate  LastDate  
#>    <chr>                                    <chr> <chr>    <date>     <date>    
#>  1 ~/Downloads/Telegram Desktop/ChatExport… Hack… ""       2017-01-04 2017-01-23
#>  2 ~/Downloads/Telegram Desktop/ChatExport… Hack… ""       2017-01-23 2017-02-09
#>  3 ~/Downloads/Telegram Desktop/ChatExport… Hack… ""       2017-02-09 2017-02-26
#>  4 ~/Downloads/Telegram Desktop/ChatExport… Hack… ""       2017-02-26 2017-03-15
#>  5 ~/Downloads/Telegram Desktop/ChatExport… Hack… ""       2017-03-15 2017-04-01
#>  6 ~/Downloads/Telegram Desktop/ChatExport… Hack… ""       2017-04-01 2017-04-18
#>  7 ~/Downloads/Telegram Desktop/ChatExport… Hack… ""       2017-04-18 2017-05-05
#>  8 ~/Downloads/Telegram Desktop/ChatExport… Hack… ""       2017-05-05 2017-05-23
#>  9 ~/Downloads/Telegram Desktop/ChatExport… Hack… ""       2017-05-23 2017-06-10
#> 10 ~/Downloads/Telegram Desktop/ChatExport… Hack… ""       2017-06-10 2017-06-27
#> # ℹ 199 more rows

This way, if you want to export the same chat again, it is easy to find the last date covered by the previous export. To make it even easier, filter_last_export() returns only the most recent date for each chat:

filter_last_export(info_all_tm_chats)
#> # A tibble: 2 × 2
#>   name                             LastDate  
#>   <chr>                            <date>    
#> 1 Hacker News                      2023-12-16
#> 2 Science News Facts Updates Daily 2023-12-19
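
The same idea can be sketched in base R. This is not the code behind filter_last_export(); it only illustrates, on a hand-made data frame with made-up rows and the name, FirstDate and LastDate columns shown above, how to keep the latest LastDate for each chat.

```r
# Toy stand-in for the tm_info() result; all rows are made up.
info <- data.frame(
  name      = c("Hacker News", "Hacker News", "Science News", "Science News"),
  FirstDate = as.Date(c("2017-01-04", "2017-01-23", "2023-12-01", "2023-12-10")),
  LastDate  = as.Date(c("2017-01-23", "2017-02-09", "2023-12-10", "2023-12-19")),
  stringsAsFactors = FALSE
)

# Sort so that the newest LastDate of each chat comes first within its group.
ordered <- info[order(info$name, info$LastDate, decreasing = TRUE), ]

# Keep the first (newest) row per chat, then restore alphabetical order.
last_export <- ordered[!duplicated(ordered$name), c("name", "LastDate")]
last_export <- last_export[order(last_export$name), ]
rownames(last_export) <- NULL

print(last_export)
```

The result has one row per chat, which is the date to start from when exporting that chat again.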