
Showing posts from November, 2025

Module 12

Through this assignment, I learned how R Markdown brings together several important components of reproducible reporting. I became more comfortable with Markdown syntax, especially how headings, paragraphs, lists, and inline formatting work. I also learned how to incorporate LaTeX math into a document, using both inline expressions wrapped in single dollar signs and displayed equations wrapped in double dollar signs. Seeing the rendered mathematical notation helped me understand how R Markdown communicates technical ideas clearly.

I also gained experience integrating narrative text with executable R code chunks. It was useful to see how code and commentary work together in a single document, with the results of each chunk, such as tables or plots, appearing immediately below the code that produced them. This made the workflow feel more organized and made it easier to connect explanations with the output they refer to. One challenge I faced was understanding when certain content should or ...
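A minimal R Markdown fragment illustrating the pieces described above (the heading, formulas, and numbers here are invented for illustration):

````markdown
## Sample Mean

The sample mean can be written inline as $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

Displayed equations use double dollar signs:

$$ s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 $$

```{r}
x <- c(4, 8, 15, 16, 23, 42)
mean(x)  # the result renders directly below this chunk
```
````

When this document is knitted, the math renders as typeset notation and the chunk's output appears immediately under the code, which is exactly the code-plus-commentary workflow described above.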

Module 11. Debugging and Defensive Programming in R

Error message reads: Error in outliers[, j] && tukey.outlier(x[, j]) : 'length = 10' in coercion to 'logical(1)'

Diagnosis of the bug: The problem happens because both outliers[, j] and tukey.outlier(x[, j]) are vectors with several values (here, 10 each). In R, the && operator is meant for single TRUE/FALSE conditions, not whole vectors. Older versions of R silently compared only the first element of each side; since R 4.3, passing a longer vector is an error, because R refuses to coerce a length-10 vector down to a single logical value. That is exactly what the message "'length = 10' in coercion to 'logical(1)'" is saying. To fix this, we should use & instead, because that operator works element by element across the whole vectors.

Results after debugging:
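The difference between the two operators can be reproduced in a few lines of R (the vectors here are made up for illustration):

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

# Element-wise AND: compares every pair of elements and
# returns a vector of the same length.
a & b
#> TRUE FALSE FALSE

# Scalar AND: requires length-1 operands. In R >= 4.3 this errors:
# a && b
#> Error in a && b : 'length = 3' in coercion to 'logical(1)'
```

Using & in the outlier loop keeps the full logical vector intact, so every column entry is checked rather than just the first one.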

Module 10.

GitHub link: https://github.com/christyj777/tidycleanr/tree/main

Scope and Purpose: The goal of tidycleanr is to make everyday data cleaning faster and more consistent for analysts and students who work with messy CSVs and survey data. Instead of repeating the same wrangling steps by hand, users can call short helper functions to standardize column names, guess variable types, and handle missing values. The package focuses on simple utilities that make it easy to prepare data for visualization or modeling.

Key Functions:
clean_names() – wraps janitor to quickly convert messy column names into snake_case and fix duplicates.
guess_types() – automatically converts character columns to numeric, date, or factor types where appropriate.
impute_fast() – fills missing values using the median for numeric variables and the mode for categorical ones.
drop_dupes() – removes duplicate rows across one or more key columns using dplyr.

Description: Some of the fields above such as (d...
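As a sketch of the kind of helper listed above, here is one possible base-R implementation of impute_fast(); the actual tidycleanr code may differ:

```r
# Hypothetical sketch of impute_fast(): median for numeric columns,
# mode (most frequent non-NA value) for everything else.
impute_fast <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (anyNA(x)) {
      if (is.numeric(x)) {
        x[is.na(x)] <- stats::median(x, na.rm = TRUE)
      } else {
        # table() drops NAs by default, so this picks the modal value
        mode_val <- names(which.max(table(x)))
        x[is.na(x)] <- mode_val
      }
      df[[col]] <- x
    }
  }
  df
}

df <- data.frame(age = c(21, NA, 25), grp = c("a", "a", NA))
impute_fast(df)
#> age: 21, 23, 25; grp: "a", "a", "a"
```

Keeping each helper this small is what makes the package easy to chain into a normal dplyr pipeline before visualization or modeling.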