Posts

Module 12

Through this assignment, I learned how R Markdown brings together several important components of reproducible reporting. I became more comfortable with Markdown syntax, especially how headings, paragraphs, lists, and inline formatting work. I also learned how to incorporate LaTeX math into a document, using single dollar signs for inline expressions and double dollar signs for displayed equations. Seeing the rendered mathematical notation helped me understand how R Markdown communicates technical ideas clearly. I also gained experience integrating narrative text with executable R code chunks. It was useful to see how code and commentary work together in a single document, with the results of each chunk (such as tables or plots) appearing immediately below the code that produced them. This made the workflow feel more organized and made it easier to connect explanations with the output they refer to. One challenge I faced was understanding when certain content should or ...

Module 11. Debugging and Defensive Programming in R

Error message reads: Error in outliers[, j] && tukey.outlier(x[, j]) : 'length = 10' in coercion to 'logical(1)'

Diagnosis of the bug: Both outliers[, j] and tukey.outlier(x[, j]) are vectors with several values (here, 10 each). The && operator is meant for single TRUE/FALSE values: it expects each operand to have length one and returns one logical result. Older versions of R silently used only the first element of each vector and ignored the rest; since R 4.3.0, passing a longer vector to && is an error, which is the 'length = 10' in coercion to 'logical(1)' message above. Either way, && never compares all of the elements. To fix this, use & instead, because that operator works element by element across the whole vector.

Results after debugging:
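The fix can be sketched as follows. This is a minimal illustration, not the assignment's actual code: tukey_outlier() is a hypothetical stand-in for tukey.outlier(), flag_outlier_rows() is an assumed name for the surrounding function, and the sample matrix is made up.

```r
# Hypothetical stand-in for tukey.outlier(): TRUE where a value lies
# more than 1.5 * IQR outside the quartiles.
tukey_outlier <- function(v) {
  q <- stats::quantile(v, c(0.25, 0.75), names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  v < q[1] - fence | v > q[2] + fence
}

# Assumed wrapper: flag rows that are outliers in every column.
flag_outlier_rows <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))  # defensive input check
  outliers <- matrix(TRUE, nrow = nrow(x), ncol = ncol(x))
  for (j in seq_len(ncol(x))) {
    # `&` compares element by element; `&&` expects length-one operands
    outliers[, j] <- outliers[, j] & tukey_outlier(x[, j])
  }
  apply(outliers, 1, all)
}

# Made-up data: only the last row is extreme in both columns.
x <- cbind(c(1, 2, 3, 4, 100), c(2, 3, 4, 5, 200))
flag_outlier_rows(x)
```

The stopifnot() call at the top is one small example of defensive programming: it stops with a clear message when the input is not a numeric matrix, rather than failing later inside the loop.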

Module 10.

GitHub link: https://github.com/christyj777/tidycleanr/tree/main

Scope and Purpose: The goal of tidycleanr is to make everyday data cleaning faster and more consistent for analysts and students who work with messy CSVs and survey data. Instead of repeating the same wrangling steps by hand, users can call short helper functions to standardize column names, guess variable types, and handle missing values. The package focuses on simple utilities that make it easy to prepare data for visualization or modeling.

Key Functions:
clean_names() – wraps janitor to quickly convert messy column names into snake_case and fix duplicates.
guess_types() – automatically converts character columns to numeric, date, or factor types where appropriate.
impute_fast() – fills missing values using the median for numeric variables and the mode for categorical variables.
drop_dupes() – removes duplicate rows across one or more key columns using dplyr.

Description: Some of the fields above such as (d...
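To make the median/mode idea behind impute_fast() concrete, here is a rough base R sketch. It is not the tidycleanr source: the function name impute_sketch(), the helper stat_mode(), and the sample data frame are all assumptions for illustration, and it only targets numeric and character or factor columns.

```r
# Illustrative only: NOT the tidycleanr implementation.
impute_sketch <- function(df) {
  stopifnot(is.data.frame(df))
  # Most frequent non-missing value, returned as a string
  stat_mode <- function(v) {
    counts <- table(v)  # table() ignores NA by default
    names(counts)[which.max(counts)]
  }
  for (col in names(df)) {
    missing <- is.na(df[[col]])
    # Skip columns with nothing to fill or nothing to learn from
    if (!any(missing) || all(missing)) next
    if (is.numeric(df[[col]])) {
      df[[col]][missing] <- median(df[[col]], na.rm = TRUE)
    } else {
      df[[col]][missing] <- stat_mode(df[[col]])
    }
  }
  df
}

# Made-up survey data with one gap in each column
survey <- data.frame(
  age = c(21, NA, 35, 28),
  major = c("stats", "bio", NA, "stats"),
  stringsAsFactors = FALSE
)
impute_sketch(survey)
```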

Module 9. Visualization systems in R

How does the syntax and workflow differ between base, lattice, and ggplot2?

Base R graphics follow a “build-as-you-go” approach, where you start with a simple plot and add elements like lines or text step by step. Lattice graphics use a formula interface and require you to specify everything in one function call, producing multi-panel (conditioned) plots automatically but with less post-hoc flexibility. In contrast, ggplot2 uses the “grammar of graphics,” layering data, aesthetics, and geoms with a clear and consistent syntax.

Which system gave you the most control or produced the most “publication-quality” output with minimal code?

ggplot2 produced the most publication-quality visuals with the least effort. Its default styling is clean and modern, legends are generated automatically, and themes allow easy customization for consistent, professional output. It also gave a better overall view of the data than the other systems.

Any challenges or surprises y...
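The three workflows can be compared side by side with a short sketch. It uses the built-in mtcars data rather than the data from the assignment, and it assumes the lattice and ggplot2 packages are installed; the variable names are only for illustration.

```r
library(lattice)
library(ggplot2)

# Base R: start with a plot, then add pieces one call at a time
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(lm(mpg ~ wt, data = mtcars), col = "red")

# lattice: one call with a formula; `| factor(cyl)` creates the panels
p_lattice <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
                    xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
print(p_lattice)

# ggplot2: layers of data, aesthetics, and geoms; the legend is automatic
p_gg <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders") +
  theme_minimal()
print(p_gg)
```

Note that the lattice and ggplot2 calls return plot objects that can be stored and printed later, while the base R calls draw directly on the current graphics device.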

Module 8. Assignment CSV files

Image
(please click images for higher clarity) The images above show the R code and confirm that the file-write operations succeeded. The first step imports the data from the dataset.txt file, which does not use commas as separators. It then creates a summary table that shows the average grade for males and females. The resulting data frame, gender_mean, is written to a tab-delimited text file named gender_mean.txt for reference or sharing. The second step filters the data to find all students whose names contain the letter “i” or “I,” using the grepl() function. A smaller data frame, i_students, is created from these matching rows and saved to a CSV file. The final step produces a new CSV file that keeps all of the original columns but only the rows for names containing the letter “i”. This is the final output of the project: a new data set containing just those names.
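The steps described above could look roughly like this in base R. The sample rows, the column names Name, Sex, and Grade, and the output file name i_students.csv are assumptions made so the sketch runs on its own; the original code reads dataset.txt and may use different functions.

```r
# Made-up rows standing in for the contents of dataset.txt
students <- data.frame(
  Name  = c("Raul", "Booker", "Lauri", "Leonie", "Ivan"),
  Sex   = c("Male", "Male", "Female", "Female", "Male"),
  Grade = c(80, 83, 87, 98, 69)
)
out_dir <- tempdir()  # write to a temporary folder for this example

# Step 1: average grade per sex, saved as a tab-delimited text file
gender_mean <- aggregate(Grade ~ Sex, data = students, FUN = mean)
write.table(gender_mean, file.path(out_dir, "gender_mean.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)

# Step 2: keep every column, but only rows whose name contains "i" or "I"
i_students <- students[grepl("i", students$Name, ignore.case = TRUE), ]
write.csv(i_students, file.path(out_dir, "i_students.csv"),
          row.names = FALSE)

gender_mean
i_students
```

Setting ignore.case = TRUE lets a single pattern match both “i” and “I”; the pattern "[iI]" would do the same job.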

Module 7. S3 and S4 object systems

Reflection: You can check whether an object is an S4 object by using isS4(object), which returns TRUE for S4 and FALSE otherwise. For S3 objects, you can look at their class with class(object) and see if they were created by simply assigning a class attribute.

To see the underlying data type of an object, you can use the typeof() function. It shows the low-level storage type, such as “integer,” “double,” “list,” or “character.” You can also use str(object) for a more detailed look at the object’s internal structure and types of its components.

A generic function in R is a special kind of function designed to work flexibly with different types of objects. It performs method dispatch, meaning it automatically calls a version of the function that matches the class of the object you provide. This allows R to adapt behavior to the data type, promoting cleaner and more intuitive code. Examples include print(), summary(), and plot(), which behave differently for data frames, lists, o...
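A small sketch of these checks, using a made-up "student" example. The class names, slot names, and values are assumptions for illustration, not the objects from the assignment.

```r
# S3: an ordinary list with a class attribute attached
student_s3 <- structure(list(name = "Ana", gpa = 3.6), class = "student")

# S3 method: the print() generic dispatches here for class "student"
print.student <- function(x, ...) {
  cat("Student:", x$name, "\n")
  invisible(x)
}

# S4: a formal class definition with typed slots
setClass("StudentS4", representation(name = "character", gpa = "numeric"))
student_s4 <- new("StudentS4", name = "Ana", gpa = 3.6)

isS4(student_s3)    # FALSE
isS4(student_s4)    # TRUE
class(student_s3)   # "student"
typeof(student_s3)  # "list"
typeof(student_s4)  # "S4"
str(student_s3)     # internal structure of the S3 object
print(student_s3)   # uses print.student() via method dispatch
```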

Module 6. Assignment- Creating basic and diagonal matrices

Adding two matrices combines their entries element by element, while subtraction takes the difference the same way. This makes it easy to see how corresponding values from A and B interact. The diag() function builds a matrix with the given numbers along the main diagonal. All off-diagonal entries are set to zero automatically. The custom matrix combines a special first column with a diagonal block using cbind() and rbind(), which shows how smaller pieces can be stacked together to form a structured matrix.

R code pasted:

# 1. Matrix Addition & Subtraction
# Define matrices
A <- matrix(c(2, 0, 1, 3), ncol = 2)
B <- matrix(c(5, 2, 4, -1), ncol = 2)

# Addition
A_plus_B <- A + B
A_plus_B

# Subtraction
A_minus_B <- A - B
A_minus_B

# 2. Build 4x4 diagonal matrix
D <- diag(c(4, 1, 2, 3))
D

# 3. Construct a Custom 5×5 Matrix
first_col <- c(3, 2, 2, 2, 2)
diag_block <- diag(3, 4)  # 4x4 matrix with 3 on the diagonal

# Bind them together: a row of ones on top of the block, then the first column
M <- cbind(first_col, rbind(rep(1, 4), diag_block))
M
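As a quick sanity check on the results, the expected values can be asserted directly. This is only a sketch of one way to verify the output; the stopifnot() checks are an addition and were not part of the original assignment.

```r
A <- matrix(c(2, 0, 1, 3), ncol = 2)
B <- matrix(c(5, 2, 4, -1), ncol = 2)

# matrix() fills column by column, so A has rows (2, 1) and (0, 3)
stopifnot(
  identical(A + B, matrix(c(7, 2, 5, 2), ncol = 2)),
  identical(A - B, matrix(c(-3, -2, -3, 4), ncol = 2))
)

# The custom matrix: 3 on the diagonal, 1 across the rest of row one,
# 2 down the rest of column one
M <- cbind(c(3, 2, 2, 2, 2), rbind(rep(1, 4), diag(3, 4)))
stopifnot(
  all(dim(M) == c(5, 5)),
  all(diag(M) == 3),
  all(M[1, -1] == 1),
  all(M[-1, 1] == 2)
)
```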