R
How-to Guides and Useful Links
base
Preventing scientific notation: https://stackoverflow.com/questions/25946047/how-to-prevent-scientific-notation-in-r
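A minimal sketch of the usual fix, assuming base R only. Raising the `scipen` option makes R prefer fixed notation when printing, and `format(..., scientific = FALSE)` does the same for a single call. The value 999 is a common convention rather than a requirement, and the variable names are only for illustration.

```r
# Save the current options so the change can be undone later
old_opts <- options(scipen = 999)

big_number <- 1e10
fixed_global <- format(big_number)                      # uses the scipen setting
fixed_single <- format(big_number, scientific = FALSE)  # per-call override

print(fixed_global)
print(fixed_single)

# Restore the previous options
options(old_opts)
```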
package management
My current preferred package management workflow involves creating virtual environments with renv.
My (previously) preferred package for package management is pacman. Before loading dependencies, put this at the top of the script:
if(!require("pacman")) install.packages("pacman")
file/path management
I use the here package for file/path management.
To reset the home path (tidyverse equivalent of setwd()): set_here(). It’s a superseded function, but I don’t really like the replacement.
dplyr
Combine with purrr::map to read multiple CSVs into one data frame: https://www.mjandrews.org/blog/readmultifile/
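A rough sketch of that pattern, assuming purrr and dplyr are installed. The temporary folder, file names, and the `source_file` column are made up for the example, and `read.csv` stands in for whichever reader you prefer.

```r
library(purrr)
library(dplyr)

# Hypothetical setup: write two small CSV files to a fresh temporary folder
csv_dir <- tempfile("multi_csv_demo")
dir.create(csv_dir)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(csv_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(30, 40)),
          file.path(csv_dir, "b.csv"), row.names = FALSE)

# List the files, read each one, then stack the results into one data frame
csv_files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
combined <- csv_files %>%
  set_names(basename(csv_files)) %>%   # names become the .id column below
  map(read.csv) %>%
  bind_rows(.id = "source_file")

combined
```

Naming the list before `bind_rows()` keeps track of which file each row came from, which helps when the files differ only by date or country.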
Useful functions that I am constantly forgetting: na_if() and rowwise() (group_by for rows)
NOTE: Don’t get stuck in the trap of doing row-wise operations if pivoting makes more sense!
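slice(1L) for getting the max value of each group (after sorting in descending order):
grouped_data <- data %>%
  group_by(variable, group_vars) %>%
  summarize(values = sum(values)) %>%   # result stays grouped by `variable`
  mutate(grp = cur_group_id()) %>%
  arrange(desc(values)) %>%             # largest summed value first
  slice(1L)                             # keep the first row of each group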
recode() for recoding values in variables
replace_na() (from tidyr) for recoding NA values in variables
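A small sketch of how na_if(), replace_na(), and recode() fit together, assuming dplyr and tidyr are installed. The survey data and the -99 missing-value code are invented for illustration. Note that recent dplyr versions mark recode() as superseded, although it still works.

```r
library(dplyr)
library(tidyr)

# Hypothetical survey data where -99 is a missing-value code
survey <- tibble(
  age    = c(25, -99, 40),
  gender = c("f", "m", NA)
)

cleaned <- survey %>%
  mutate(
    age    = na_if(age, -99),                          # turn -99 into NA
    gender = replace_na(gender, "unknown"),            # turn NA into a label
    gender = recode(gender, f = "female", m = "male")  # relabel known values
  )

cleaned
```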
Do you want counts of variables in groups without deleting all the other variables? Use mutate() after group_by() instead of summarize(). Then subset accordingly. e.g.:
df <- df %>%
  group_by(country_person) %>%
  mutate(
    n_articles_total = n(),
    n_articles_before = sum(before_appoint == 1),
    n_articles_after = n_articles_total - n_articles_before,
    n_lang_en = sum(lang_en == 1),
    n_lang_other = n_articles_total - n_lang_en,
    av_text_length = mean(length)
  ) %>%
  ungroup() # save the new columns back to df so the next step can select them
df_tidy_subset <- df %>%
  select(
    id, country_person, n_articles_total, n_articles_before,
    n_articles_after, n_lang_en, n_lang_other, av_text_length
  ) %>%
  distinct() # rm duplicates
id numbers within groups
df %>% group_by(cat) %>% mutate(id = row_number())
String/character vector manipulation
Remove all characters that are non-numeric: STRING <- str_remove_all(STRING, "\\D+")
Extract substring between two strings: qdapRegex::ex_between()
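A short sketch of both operations using stringr only. The lookaround regex is a stand-in for qdapRegex::ex_between(), not a copy of how that function works, and the example strings are made up.

```r
library(stringr)

# Keep only the digits in a string
phone <- "Tel: (030) 123-456"
digits_only <- str_remove_all(phone, "\\D+")

# Extract the text between two marker strings using lookarounds
sentence <- "start[keep this]end"
between <- str_extract(sentence, "(?<=\\[).*?(?=\\])")

digits_only
between
```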
purrr
map
On reading in multiple files and combining result of a function into a data frame: https://clauswilke.com/blog/2016/06/13/reading-and-combining-many-tidy-data-files-in-r/
lubridate (/working with dates in general)
Create a date object from year and month columns with the ym() function (there are equivalents for a bunch of different ymd combinations as well). e.g.:
df %>%
  mutate(
    date = ym(paste(Year, Month))
  )
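A self-contained version of the same idea, assuming lubridate is installed. The data frame and its values are hypothetical, and the months are numeric here, which is an assumption about how the columns are stored.

```r
library(lubridate)

# Hypothetical data with separate year and month columns
df <- data.frame(Year = c(2021, 2022), Month = c(3, 11))

# paste() builds strings like "2021 3"; ym() parses them to the first of the month
df$date <- ym(paste(df$Year, df$Month))

df
```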
Manipulating Twitter Data
(NOTE: as of July 2023, there is extremely limited researcher access to Twitter data)
I used to collect my data using the twarc Python package, but work with my data in R. See the code below for an example of wrangling the JSON strings from the entities variables.
You might have to make things more complex if you want to also add where tweets came from, but hopefully the snippet below provides a good starting point!
# needs dplyr, purrr, and jsonlite (for fromJSON)
tweets_entities <- tweets %>%
  filter(entities.annotations != "") %>% # for some reason drop_na not working
  mutate(entities.annotations = gsub("\"\"", "\"", entities.annotations)) # unescape doubled quotes
# parse each JSON string, stack the results, and keep unique entities
entities <- map(tweets_entities$entities.annotations, fromJSON) %>%
  bind_rows() %>%
  select("type", "normalized_text") %>%
  distinct()
people <- entities %>%
  filter(type == "Person")
ggplot2
If you’re a Python user, stick to the Grammar of Graphics and use the plotnine library for visualization :D
Heatmap snippet
# scale_fill_viridis() comes from the viridis package
fig_df |>
  ggplot(aes(x = country, y = account_type, fill = n)) +
  geom_tile(color = "white") +
  geom_text(aes(label = n), color = "white", size = 15) +
  coord_fixed() +
  scale_fill_viridis(end = 0.7)
Faceting
Different categorical x-axes https://stackoverflow.com/questions/45019839/ggplot2-different-facet-width-for-categorical-x-axis
Themes
This is the theme_set() that I might use for now.
# add fonts (this might not be a necessary step)
# font_add_google() is defined in sysfonts, which showtext loads
sysfonts::font_add_google(name = "Fira Sans", family = "fira")
sysfonts::font_add_google(name = "Roboto", family = "roboto")
# themes and text defaults
theme_set(
  theme_minimal() +
    theme(
      legend.position = "bottom",
      plot.title = element_text(family = "fira"),
      text = element_text(family = "roboto")
    )
)
Labels
Use str_wrap() around different graphic elements to automatically wrap captions/text/legend labels. Sample code below:
# geom_bump() comes from the ggbump package
top_df %>%
  ggplot(aes(x = date, y = as.numeric(rank), color = str_wrap(game, 20))) +
  geom_point() +
  geom_bump() +
  scale_y_reverse(limits = c(10, 1), n.breaks = 10) +
  labs(
    title = "Top Games Streamed on Twitch",
    subtitle = str_wrap("Games shown are a subset of data with the top 200 ranked games over time. Each of these games has consistently ranked in the top 200, but not necessarily top 10 throughout the years.")
  ) +
  guides(col = guide_legend(ncol = 3))
If you want to wrap legend labels but keep factor levels, use the following helper function (thanks Hadley Wickham!)
# for wrapping legend labels while keeping original factor levels
# https://github.com/tidyverse/stringr/issues/107
str_wrap_factor <- function(x, ...) {
  levels(x) <- str_wrap(levels(x), ...)
  x
}
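A quick usage sketch for the helper, assuming stringr is installed. The factor, its level order, and the width of 10 are made up for the example, and the helper is repeated so the block runs on its own.

```r
library(stringr)

# Helper repeated here so the example runs on its own
str_wrap_factor <- function(x, ...) {
  levels(x) <- str_wrap(levels(x), ...)
  x
}

# Hypothetical factor with a deliberate (non-alphabetical) level order
game <- factor(
  c("League of Legends", "Grand Theft Auto V", "League of Legends"),
  levels = c("League of Legends", "Grand Theft Auto V")
)

# Only the level labels change; the level order and underlying codes stay put
wrapped <- str_wrap_factor(game, width = 10)
levels(wrapped)
```

Wrapping the levels rather than the values is what keeps the legend in its original order, since calling str_wrap() on the factor directly would return a plain character vector.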
How to customize which legends are shown based on aesthetic: guides(). Example:
data %>%
  ggplot(aes(x = type, y = fct_rev(abb), size = n, color = n)) +
  geom_point() +
  labs(
    title = "TITLE",
    x = "",
    y = "",
    color = "",
    caption = "Data Source: DATA_SOURCE\nVisualization: Allison Koh"
  ) +
  guides(size = "none")
Fonts
{extrafont} and {showtext} are useful for adding different fonts to viz. The former loads existing fonts from the system; the latter makes sure your text shows up in all graphics devices (and loads fonts from Google and other places).
LIFE HACK (or more likely, a common-sense thing that I often forget): Make sure to include font families in theme_set() at the top of a script instead of in individual graphics.
Useful lines of code for {extrafont} are as follows:
# register imported system fonts with R (run extrafont::font_import() once beforehand)
extrafont::loadfonts()
# show font names
extrafont::fonts()
# show a data frame of all fonts available
extrafont::fonttable()
Useful lines of code for {showtext} are as follows:
showtext_auto() # put at the beginning of a script to automatically show text in new graphics devices
geom_bernie()
Add Bernie Sanders to your plots :D because why not
Install
remotes::install_github("R-CoderDotCom/ggbernie@main")
Geom
geom_bernie(aes(x = 1930, y = 20100), bernie = "sitting")
Alt Text
Helper Function
# helper function for writing alt text
# https://twitter.com/thomas_mock/status/1375853258145734660
write_alt_text <- function(
  chart_type,
  type_of_data,
  reason,
  misc,
  source
){
  glue::glue(
    "{chart_type} of {type_of_data} where {reason}. \n\n{misc}\n\nData source from {source}"
  )
}
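A brief usage sketch for the helper, assuming glue is installed. The chart description is invented, and the function is repeated in compact form so the block runs on its own.

```r
library(glue)

# Compact copy of the helper so the example runs on its own
write_alt_text <- function(chart_type, type_of_data, reason, misc, source) {
  glue("{chart_type} of {type_of_data} where {reason}. \n\n{misc}\n\nData source from {source}")
}

# Hypothetical chart description
alt_text <- write_alt_text(
  chart_type   = "Heatmap",
  type_of_data = "account counts by country and account type",
  reason       = "darker tiles mark larger counts",
  misc         = "Each tile is labelled with its count.",
  source       = "DATA_SOURCE"
)

alt_text
```

In recent ggplot2 versions the resulting string can be attached to a plot with labs(alt = alt_text).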
Examples
The {TidyTuesdayAltText} package contains examples of alt text from #TidyTuesday posts between 2019 and 2021.
A future version of this package will include an annotated dataset of alt text + ratings according to feature: https://twitter.com/spcanelon/status/1405488036989870080. Until it is integrated into the package, the data can be found here: https://github.com/spcanelon/csvConf2021/blob/main/data/annotatedRubric1.csv
devtools::install_github("spcanelon/TidyTuesdayAltText")
Palette from picture with {paletteR}
https://datascienceplus.com/how-to-use-paletter-to-automagically-build-palettes-from-pictures/
devtools::install_github("andreacirilloac/paletter")
library(paletter)
# store the palette so it can be reused below
pal <- create_palette(image_path = "~/Desktop/410px-Piero_della_Francesca_046.jpg",
                      number_of_colors = 20,
                      type_of_variable = "categorical")
Test palette with the pie() function
pie(rep(1, length(pal)), col = pal)
Combining PDFs
https://stackoverflow.com/questions/17552917/merging-existing-pdf-files-using-r
install.packages("qpdf")
qpdf::pdf_combine(input = c("file.pdf", "file2.pdf", "file3.pdf"),
output = "output.pdf")
RSelenium
Reset port (for error message: Selenium server signals port = 4444 is already in use.) https://stackoverflow.com/questions/74708282/rselenium-is-not-working-when-creating-servers
library(qdapRegex) # rm_white()
library(stringr)   # word()
# clear busy port in Windows
port <- 4444L
tintern <- system("netstat -a -n -o", intern = TRUE)
irow1 <- grep(as.character(port), tintern)
if (length(irow1) > 0) {
  # use the first netstat row that mentions the port
  trow <- trimws(rm_white(tintern[irow1[1]]))
  # the PID is the last whitespace-separated field
  tpid <- word(trow, -1, -1)
  system(paste0("taskkill /pid ", tpid, " /F"))
}