df2ggplot: Quick and dirty ggplot

Tue, 30 Nov 2021 00:00:00 +0000

Purpose

To quickly create a ggplot chart in R from any data object that has rows and columns. The function applies typical defaults from the ggplot2 cheat sheet (https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-visualization.pdf) to produce a lightly styled chart. The aim is a quick graph for visualizing your data without much thought about the how and why, cutting down on the data management and time it takes to organize the data and build the graph.

NOTE: This function does not pass additional arguments through to ggplot2, so only a limited set of settings can be adjusted in the call itself. However, additional layers can be added afterwards because the output is a ggplot-compatible list object.
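As a minimal sketch of adding layers afterwards: the plot below is built directly with ggplot2 as a stand-in for the object returned by df2ggplot (an assumption, so that the example runs on its own), and the extra smoothing layer and subtitle are purely illustrative.

```r
library(ggplot2)

# Stand-in for the object returned by df2ggplot(); built by hand here so the
# example is self-contained (this construction is an assumption)
plt <- ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point() +
  theme_classic()

# Because the result is a ggplot object, further layers and labels can be
# added with `+` in the usual way
plt2 <- plt +
  geom_smooth(method = "lm", se = FALSE) +
  labs(subtitle = "Fuel economy vs. horsepower")

print(plt2)
```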

Why use this function?

This function allows you to quickly create a graph from any data object with rows and columns. Since most data is organized with variables in columns and participants in rows, the data can be passed in as is, without first conforming to the ggplot-specific format. R and ggplot have some idiosyncrasies that are not typical of other programming languages (e.g. pivoting tables), so this function extracts only the specified variables, cleans them, pivots the table, and creates the specified chart, if available, along with basic titles, labels, and scales. This should make ggplot more accessible for other types of objects (e.g. list, tibble, data frame) and for users coming from other programming languages, for whom a single function call may be more intuitive.
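The extract-and-pivot step described above can be sketched on its own as follows. This is an illustration using mtcars with hard-coded column names, not the function itself; the "scores" and "values" column names mirror the ones used in the function below.

```r
library(dplyr)
library(tidyr)

# Keep only the variables of interest from a "wide" data frame
wide <- mtcars %>%
  select(all_of(c("mpg", "hp", "gear")))

# Pivot the x variable ("mpg") into name/value columns, leaving the
# y ("hp") and group ("gear") columns untouched
long <- wide %>%
  pivot_longer(!c(hp, gear), names_to = "scores", values_to = "values")

head(long)
```

Each row of the result keeps its hp and gear values, with the mpg value moved into the values column.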

Function

#install.packages("tidyverse")
library(tidyverse)

df2ggplot <- function(data, x, y = NULL, group = NULL, type = NULL) {
  #' @author Darren Liang
  #' @note Last updated: 01 DEC 2021
  #' @description Quickly creates a chart with ggplot using the data and
  #' variables from a typically organized data frame (columns are variables
  #' and rows are participants). Since it is not typical to pivot tables in
  #' other programs, this function will extract and pivot only the required
  #' variables for ease of use in addition to providing basic titles and labels.
  #' @param data (list/ tibble/ data frame) A data object which contains columns
  #' and rows. It is coerced to a data frame.
  #' @param x (char) A character string which contains the name of a column
  #' which can be found in the data object. Plotted on the x-axis.
  #' @param y (char) OPTIONAL. A character string which contains the name of a
  #' column which can be found in the data object. Plotted on the y-axis.
  #' @param group (char) OPTIONAL. A character string which contains the name of
  #' a column which can be found in the data object. Used to color the points
  #' by group.
  #' @param type (char) OPTIONAL. A character string which contains the name of
  #' the type of chart you wish to create. Refer to the ggplot2 cheat sheet for
  #' the most popular usage types (only include the part after geom_). Not all
  #' graph types are included in this function.
  #' @usage df2ggplot(data = data, x = "var1", y = "var2", group = "var3",
  #' type = "chart")
  #' @return A ggplot object (a list) which can be viewed as a plot.
  #' @examples
  #' df2ggplot(data = mtcars, x = "mpg", y = "hp", group = "gear",
  #' type = "jitter")

  # create a new data frame by extracting variables from the input data
  data <- as.data.frame(data) %>%
    # drop any empty (zero-length) columns in case there are any
    compact() %>%
    # keep only the specified variables (NULL arguments are skipped)
    select(all_of(x), all_of(y), all_of(group)) %>%
    # pivot the x variable into a name column and a value column
    pivot_longer(!c(all_of(y), all_of(group)),
                 names_to = "scores", values_to = "values")

  # group_by() if applicable; the result must be assigned to take effect
  if (!is.null(group)) {
    data <- data %>%
      group_by(across(all_of(group)))
  }

  # plot the object using the specified features
  plt <- data %>%
    ggplot(aes(x = values)) +
    theme_classic() +
    labs(x = x, title = paste(x))
  if (!is.null(y)) {
    plt <- plt +
      # .data[[y]] looks the column up by its name stored in y
      aes(y = .data[[y]]) +
      labs(x = x, y = y, title = paste(x, "vs.", y))
  }
  if (!is.null(group)) {
    plt <- plt +
      aes(color = .data[[group]]) +
      labs(color = group)
  }

  # select type of graph based on type of data
  # Some but not all popular geom_* included from ggplot2 cheat sheet found at:
  # https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-visualization.pdf
  if (is.null(type)) {
    message("Type not specified, default visualization selected.")
    if (is.null(y)) {
      plt <- plt + geom_histogram(binwidth = 5)
    } else {
      plt <- plt + geom_jitter()
    }
  } else if (type == "area") {
    plt <- plt + geom_area()
  } else if (type == "density") {
    plt <- plt + geom_density(kernel = "gaussian")
  } else if (type == "dotplot") {
    plt <- plt + geom_dotplot()
  } else if (type == "freqpoly") {
    plt <- plt + geom_freqpoly()
  } else if (type == "histogram") {
    plt <- plt + geom_histogram(binwidth = 5)
  } else if (type == "bar") {
    plt <- plt + geom_bar()
  } else if (type == "point") {
    plt <- plt + geom_point()
  } else if (type == "rug") {
    plt <- plt + geom_rug(sides = "bl")
  } else if (type == "smooth") {
    plt <- plt + geom_smooth(method = "lm")
  } else if (type == "col") {
    plt <- plt + geom_col()
  } else if (type == "boxplot") {
    plt <- plt + geom_boxplot()
  } else if (type == "violin") {
    plt <- plt + geom_violin(scale = "area")
  } else if (type == "count") {
    plt <- plt + geom_count()
  } else if (type == "jitter") {
    plt <- plt + geom_jitter()
  } else if (type == "line") {
    plt <- plt + geom_line()
  } else {
    stop("Graph type not included. Try ggplot geom_* instead.")
  }
  return(plt)
}

Usage

df2ggplot(data = data, x = "var1", y = "var2", group = "var3", type = "chart")

Arguments

data (list/ tibble/ data frame): A data object which contains columns and rows. It is coerced to a data frame.

x (char): A character string in quotes (either " or ') which contains the name of a column which can be found in the data object. Plotted on the x-axis.

y (char): OPTIONAL. A character string in quotes (either " or ') which contains the name of a column which can be found in the data object. Plotted on the y-axis.

group (char): OPTIONAL. A character string in quotes (either " or ') which contains the name of a column which can be found in the data object. Used to color the points by group.

type (char): OPTIONAL. A character string in quotes (either " or ') which contains the name of the type of chart you wish to create. Refer to the ggplot2 cheat sheet for the most popular usage types (only include the part after geom_). Not all graph types are included in this function.
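To illustrate how a type string relates to a ggplot2 layer, here is a minimal sketch of a lookup helper. The name geom_from_type is hypothetical and not part of df2ggplot, and unlike the function above it calls each geom with its default arguments only (so, for example, no binwidth = 5 for histograms).

```r
library(ggplot2)

# Hypothetical helper (not part of df2ggplot): map a type string such as
# "jitter" to the matching ggplot2 layer, or fail with a clear message
geom_from_type <- function(type) {
  supported <- c("area", "density", "dotplot", "freqpoly", "histogram", "bar",
                 "point", "rug", "smooth", "col", "boxplot", "violin",
                 "count", "jitter", "line")
  if (!type %in% supported) {
    stop("Graph type not included. Try ggplot geom_* instead.")
  }
  # look up the exported ggplot2 function called geom_<type> and call it
  geom_fun <- getExportedValue("ggplot2", paste0("geom_", type))
  geom_fun()
}

plt <- ggplot(mtcars, aes(x = mpg, y = hp)) + geom_from_type("jitter")
```

The explicit list of supported names keeps the behavior predictable: anything outside it stops with the same message as the function above instead of reaching into ggplot2 for an arbitrary name.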

Returns

plt (list): A single list object containing a ggplot compatible plot.

Example

df2ggplot(data = mtcars, x = "mpg", y = "hp", group = "gear", type = "jitter")

See example output in knitted .Rmd to PDF.

psypy2df: Data importation

Fri, 26 Nov 2021 00:00:00 +0000

Purpose

For automating data importation, given an absolute or relative directory as a character vector. In particular, PsychoPy, a popular psychology experiment builder, provides a graphical user interface (GUI) for building and programming an experiment without specific coding knowledge. It outputs each participant's data in separate files and formats (.csv, .log, and .psydat). This function can be used beyond this purpose with small tweaks (commented in the code), but it was written with PsychoPy in mind. It reads all comma-separated values (.csv) files in a given directory into R, cleans up by removing the rows and columns that belong to empty files, and outputs a single data frame with all rows and columns of participant data.

NOTE: This function will retain all the columns and non-empty rows whether or not they may be useful to you. If a project was started with many variables or changes were made to variable names, it is important to check which columns are useful to you.
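One possible way to check which columns are useful is to look at the share of missing values per column. The sketch below uses a small made-up data frame in place of real imported data; the column names are illustrative only.

```r
# Small made-up data frame standing in for the imported PsychoPy data
df <- data.frame(
  participant = c(1, 1, 2),
  rt          = c(0.51, NA, 0.48),
  old_var     = c(NA, NA, NA)
)

# Proportion of missing values in each column
na_share <- colMeans(is.na(df))
na_share

# Keep only the columns that contain at least one non-missing value
df_useful <- df[, na_share < 1, drop = FALSE]
names(df_useful)
```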

Why use this function?

This function allows you to quickly import all the data from a given PsychoPy data directory and look at it all at once. This lets you check which files may or may not be useful in a single data frame rather than opening multiple comma-separated values files one at a time. Whether your directory contains old data, partial data, or other issues with the PsychoPy output data, this function should retain all of the columns so that you can check the output data frame and filter or select what is useful to you.
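The general idea of reading several files into one data frame can be sketched as follows. The temporary directory, file names, and columns are made up for illustration and only stand in for real PsychoPy output; stacking with bind_rows is one possible approach.

```r
library(readr)
library(dplyr)

# Create a temporary directory with two small files that stand in for
# PsychoPy output (file names and columns are made up for illustration)
data_dir <- file.path(tempdir(), "psychopy_demo")
dir.create(data_dir, showWarnings = FALSE)
write_csv(data.frame(participant = 1, rt = c(0.51, 0.63), frameRate = 60),
          file.path(data_dir, "p01.csv"))
write_csv(data.frame(participant = 2, rt = 0.48, key = "left", frameRate = 60),
          file.path(data_dir, "p02.csv"))

# List the .csv files, read each one, and stack them; columns that a file
# lacks are filled with NA
files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
combined <- bind_rows(lapply(files, read_csv, show_col_types = FALSE))

combined
```

Here the key column exists only in the second file, so the rows from the first file show NA in that column.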

Function

#install.packages("tidyverse")
library(tidyverse)

psypy2df <- function(directory) {
  #' @author Darren Liang
  #' @note Last updated: 01 DEC 2021
  #' @description Imports all *.csv files from an input directory as a single
  #' data frame object.
  #' @param directory (character) The absolute or relative path to a PsychoPy
  #' data directory.
  #' Usually ends with ../../data/ based on the default PsychoPy data structure.
  #' @usage psypy2df("directory")
  #' @return All the rows and columns from the .csv files in the directory as
  #' a single data frame object.
  #' @examples
  #' psypy2df("C:/Users/Admin/Documents/experiment/data")

  # obtain all the .csv files in the data directory (pattern is a regular
  # expression, so it is anchored to the file extension)
  filelist <- list.files(path = directory, pattern = "\\.csv$",
                         full.names = TRUE)
  # read each file into a list of data frames
  raw.list <- lapply(filelist, read_csv)
  # remove empty files (no columns) from the list
  clean.list <- raw.list %>%
    compact()
  # remove all columns after frameRate (frameRate is usually the end of the
  # useful information for PsychoPy)
  # if this function is being applied to another data structure, edit or remove
  # the select() line
  clean.list <- lapply(clean.list, function(file) {
    as.data.frame(file) %>%
      select(1:frameRate)
  })
  # stack all files into one data frame; columns missing from a file are
  # filled with NA (merging into an empty data frame would drop the first file)
  df <- bind_rows(clean.list)
  # output object as data frame
  return(df)
}

Usage

psypy2df("directory")

Arguments

directory (char): An absolute or relative path, in quotes (either " or '), to the directory containing all the .csv files which you wish to import into R.

Returns

df (dataframe): A single dataframe object containing all non-NA rows and columns from the imported .csv files.

Example

psypy2df("C:/Users/dliang55/Downloads/segmentation-timing-master/data")

See example output in knitted .Rmd to PDF.
