R Programming Journal – Christine Jacob

Posts

Module 11. Debugging and Defensive Programming in R

November 05, 2025

Error message reads: Error in outliers[, j] && tukey.outlier(x[, j]) : 'length = 10' in coercion to 'logical(1)'

Diagnosis of the bug: The problem happens because both outliers[, j] and tukey.outlier(x[, j]) are vectors with several values (here, 10 each). In R, the && operator is meant for a single TRUE or FALSE on each side and gives back one TRUE or FALSE. Older versions of R silently looked at only the first value of each vector; since R 4.3.0, passing a vector longer than one element is an error. So when the code uses && on two full vectors, R cannot coerce the 10 values into one single logical result, which causes the error shown above. To fix this, we should use & instead, because that operator works element by element across the whole vector.

Results after debugging:
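The fix can be sketched as below. The post only shows the failing expression, so the body of tukey.outlier() and the surrounding loop are assumptions written for illustration (a 1.5 × IQR rule), and the small matrix is made-up sample data.

```r
# Assumed helper: flags values outside the Tukey fences (1.5 * IQR rule).
tukey.outlier <- function(x, k = 1.5) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  iqr <- q[2] - q[1]
  (x < q[1] - k * iqr) | (x > q[2] + k * iqr)
}

# Assumed wrapper: a row counts as an outlier only if it is flagged in every column.
tukey_multiple <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))   # defensive check on the input
  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    # `&` compares element by element; `&&` here is what raised the error
    outliers[, j] <- outliers[, j] & tukey.outlier(x[, j])
  }
  apply(outliers, 1, all)
}

# Made-up data: the last row is extreme in both columns.
x <- cbind(c(1:9, 100), c(1:9, 200))
result <- tukey_multiple(x)
print(result)
```

The stopifnot() call is one small piece of defensive programming: it stops early with a clear message if the input is not a numeric matrix.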

Module 10.

November 02, 2025

GitHub link: https://github.com/christyj777/tidycleanr/tree/main

Scope and Purpose: The goal of tidycleanr is to make everyday data cleaning faster and more consistent for analysts and students who work with messy CSVs and survey data. Instead of repeating the same wrangling steps by hand, users can call short helper functions to standardize column names, guess variable types, and handle missing values. The package focuses on simple utilities that make it easy to prepare data for visualization or modeling.

Key Functions:
clean_names() – wraps janitor to quickly convert messy column names into snake_case and fix duplicates.
guess_types() – automatically converts character columns to numeric, date, or factor types where appropriate.
impute_fast() – fills missing values using the median for numeric variables and the mode for categorical variables.
drop_dupes() – removes duplicate rows across one or more key columns using dplyr.

Description: Some of the fields above such as (d...
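To make the median-or-mode idea concrete, here is a minimal base R sketch. It is not the package's actual impute_fast() code; the function name impute_sketch() and the sample data frame are hypothetical, chosen only to illustrate the behavior described above.

```r
# Hypothetical sketch: fill NAs with the median (numeric) or the mode (everything else).
impute_sketch <- function(df) {
  # Most frequent non-missing value; ties go to the value seen first
  stat_mode <- function(v) {
    v <- v[!is.na(v)]
    uniq <- unique(v)
    uniq[which.max(tabulate(match(v, uniq)))]
  }
  for (col in names(df)) {
    missing <- is.na(df[[col]])
    if (!any(missing) || all(missing)) next   # nothing to fill, or nothing to learn from
    fill <- if (is.numeric(df[[col]])) {
      median(df[[col]], na.rm = TRUE)
    } else {
      stat_mode(df[[col]])
    }
    df[[col]][missing] <- fill
  }
  df
}

# Made-up survey-style data
survey <- data.frame(
  age  = c(20, NA, 40),
  city = c("Tampa", NA, "Tampa")
)
cleaned <- impute_sketch(survey)
print(cleaned)
```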

Module 9. Visualization systems in R

October 24, 2025

How does the syntax and workflow differ between base, lattice, and ggplot2? Base R graphics follow a “build-as-you-go” approach, where you start with a simple plot and add elements like lines or text step by step. Lattice graphics use a formula interface and require you to specify everything in one function call, producing multi-panel (conditioned) plots automatically but with less post-hoc flexibility. In contrast, ggplot2 uses the “grammar of graphics,” layering data, aesthetics, and geoms with a clear and consistent syntax.

Which system gave you the most control or produced the most “publication‑quality” output with minimal code? ggplot2 produced the most publication-quality visuals with the least effort. Its default styling is clean and modern, legends are automatically generated, and themes allow easy customization for consistent, professional output. It also gave a better overall view of the data than the other systems.

Any challenges or surprises y...
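As a rough side-by-side, the same scatterplot can be sketched in all three systems. This uses the built-in mtcars data rather than the data from the original assignment, and assumes the lattice and ggplot2 packages are installed.

```r
library(lattice)
library(ggplot2)

# Base R: draw the plot first, then add pieces one call at a time
plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "MPG")
abline(lm(mpg ~ wt, data = mtcars), col = "red")

# lattice: one formula-based call; `| factor(cyl)` creates the panels
p_lattice <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
                    type = c("p", "r"))   # points plus a regression line
print(p_lattice)

# ggplot2: data + aesthetics, then layers added with `+`
p_gg <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ cyl)
print(p_gg)
```

Note that the lattice and ggplot2 calls return plot objects that can be stored and printed later, while the base R calls draw directly on the graphics device.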

Module 8. Assignment CSV files

October 14, 2025

(please click images for higher clarity) The images above show the R code and confirm that the file‑write operations succeeded. The first step imports the data, without commas, from the dataset.txt file. It then creates a summarized table that shows the average grade for males and females. The resulting data frame, gender_mean, is then written to a tab-delimited text file named gender_mean.txt for reference or sharing. The second step filters the data to find all students whose names contain the letter “i” or “I,” using the grepl() function. A smaller data frame, i_students, is created from these matching rows and saved to a CSV file. The final step produces a new CSV file that keeps all original columns but only the rows whose names include the letter "I". This is the final document for this project: a new data set containing just those names.
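Since the code itself is only visible in the images, here is a minimal sketch of the same steps in base R. The column names (Name, Sex, Grade), the whitespace-delimited layout of dataset.txt, the output name i_students.csv, and the four sample rows are all assumptions made for illustration; files are written to a temporary folder.

```r
out_dir <- tempdir()

# Made-up stand-in for dataset.txt (whitespace-delimited, with a header row)
sample_rows <- data.frame(
  Name  = c("Raul", "Lilly", "Booker", "Iris"),
  Sex   = c("Male", "Female", "Male", "Female"),
  Grade = c(80, 90, 70, 100)
)
data_path <- file.path(out_dir, "dataset.txt")
write.table(sample_rows, data_path, row.names = FALSE, quote = FALSE)

# Step 1: import the data and compute the average grade by gender
students <- read.table(data_path, header = TRUE)
gender_mean <- aggregate(Grade ~ Sex, data = students, FUN = mean)
write.table(gender_mean, file.path(out_dir, "gender_mean.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)

# Step 2: keep students whose name contains "i" or "I", with all columns
i_students <- students[grepl("i", students$Name, ignore.case = TRUE), ]
write.csv(i_students, file.path(out_dir, "i_students.csv"), row.names = FALSE)

print(gender_mean)
print(i_students)
```

Here ignore.case = TRUE is what lets one pattern match both “i” and “I”.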

Module 7. S3 and S4 object systems

October 08, 2025

Reflection: You can check whether an object is an S4 object by using isS4(object), which returns TRUE for S4 and FALSE otherwise. S3 objects are ordinary R objects that carry a class attribute, so you can look at their class with class(object); they are often created by simply assigning that attribute. To see the underlying data type of an object, you can use the typeof() function. It shows the low-level storage type, such as “integer,” “double,” “list,” or “character.” You can also use str(object) for a more detailed look at the object’s internal structure and types of its components.

A generic function in R is a function designed to work with different types of objects. It performs method dispatch, meaning it automatically calls a version of the function that matches the class of the object you provide. This allows R to adapt behavior to the data type, promoting cleaner and more intuitive code. Examples include print(), summary(), and plot(), which behave differently for data frames, lists, o...
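These checks can be tried on a small example. The class names student and StudentS4 and the generic describe() are made up for this sketch and are not from the original assignment.

```r
library(methods)  # provides setClass() and new(); usually attached already

# S3: an ordinary list with a class attribute assigned to it
s3_obj <- structure(list(name = "Ada", gpa = 3.9), class = "student")

# S4: a formally defined class with typed slots
setClass("StudentS4", representation(name = "character", gpa = "numeric"))
s4_obj <- new("StudentS4", name = "Ada", gpa = 3.9)

isS4(s3_obj)      # FALSE
isS4(s4_obj)      # TRUE
class(s3_obj)     # "student"
typeof(s3_obj)    # "list"
typeof(s4_obj)    # "S4"
str(s3_obj)       # shows the list components and their types

# A generic function: UseMethod() dispatches on the class of x
describe <- function(x, ...) UseMethod("describe")
describe.student <- function(x, ...) paste("student:", x$name)
describe.default <- function(x, ...) {
  paste("no specific method for class", class(x)[1])
}

describe(s3_obj)  # calls describe.student()
describe(42)      # falls back to describe.default()
```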

Module 6. Assignment: Creating basic and diagonal matrices

October 03, 2025

Adding two matrices combines their entries element by element, while subtraction takes the difference the same way. This makes it easy to see how corresponding values from A and B interact.

The diag() function builds a matrix with the given numbers along the main diagonal. All off-diagonal entries are set to zero automatically.

The custom matrix is built by combining a special first column with a diagonal block using cbind() and rbind(). This shows how smaller pieces can be stacked together to form a structured matrix.

R code pasted:

    # 1. Matrix Addition & Subtraction
    # Define matrices (values fill column by column)
    A <- matrix(c(2, 0, 1, 3), ncol = 2)
    B <- matrix(c(5, 2, 4, -1), ncol = 2)

    # Addition
    A_plus_B <- A + B
    A_plus_B

    # Subtraction
    A_minus_B <- A - B
    A_minus_B

    # 2. Build 4x4 diagonal matrix
    D <- diag(c(4, 1, 2, 3))
    D

    # 3. Construct a Custom 5×5 Matrix
    first_col <- c(3, 2, 2, 2, 2)
    diag_block <- diag(3, 4)   # 4x4 matrix with 3 on the diagonal

    # Bind them together: a row of 1s on top of the block, then the first column on the left
    M <- cbind(first_col, rbind(rep(1, 4), diag_block))
    M
