Module 4: Boxplots and Histograms
The boxplots and histograms produced by the code are shown below.
The boxplots show that blood pressure values differ visibly across assessment groups. In the first MD assessment, patients rated “Bad” tend to have higher median blood pressures than those rated “Good,” which supports the expectation that elevated BP is associated with worse evaluations. In the second MD assessment, patients labeled “High” have a noticeably wider range of blood pressure values, including extreme readings, while the “Low” group is more tightly clustered around lower readings. Finally, in the final decision, the “High” category clearly corresponds to higher blood pressures than the “Low” group, reinforcing the consistency between the clinical assessments and the ultimate classification.
The histograms reveal further patterns. The histogram of blood pressure shows a wide spread, with values ranging from the low 30s to just above 200. Its most noticeable feature is the outlier at 205, which stands apart from the bulk of the data and stretches the spread shown in the boxplots. In contrast, the histogram of visit frequency is more compact, with most values concentrated between 0.2 and 0.6. There are no extreme outliers in frequency, but the distribution is slightly uneven, suggesting some variability in how often patients visit.
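To make the outlier claim concrete, the sketch below applies the rule R's boxplot() uses for its whiskers: boxplot.stats() takes Tukey's hinges from fivenum() and, with its default coef = 1.5, flags any point more than 1.5 times the hinge spread beyond a hinge. It is written in Python for illustration (the post's own analysis used R). The nine readings are an assumption: the post does not list its raw data, so these values were chosen only because they reproduce the cleaned summary it reports (n = 9, mean 99.0, median 87). The sketch also pools all readings, whereas the post's boxplots split them by assessment group, where the fences would differ.

```python
import math
import statistics


def tukey_hinges(values):
    """Five-number summary with Tukey's hinges, ported from R's fivenum()."""
    x = sorted(values)
    n = len(x)
    if n == 0:
        raise ValueError("need at least one value")
    n4 = math.floor((n + 3) / 2) / 2
    # 1-based positions used by fivenum(): min, lower hinge, median, upper hinge, max
    positions = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    return [0.5 * (x[math.floor(d) - 1] + x[math.ceil(d) - 1]) for d in positions]


def boxplot_outliers(values, coef=1.5):
    """Return points beyond coef * IQR from the hinges, plus the (low, high) fences."""
    _, lower_hinge, _, upper_hinge, _ = tukey_hinges(values)
    iqr = upper_hinge - lower_hinge
    low_fence = lower_hinge - coef * iqr
    high_fence = upper_hinge + coef * iqr
    flagged = [v for v in values if v < low_fence or v > high_fence]
    return flagged, (low_fence, high_fence)


# Hypothetical BloodPressure readings (assumption: not listed in the post),
# chosen to match its cleaned summary: n = 9, mean 99.0, median 87.
bp = [103, 87, 32, 42, 59, 109, 78, 205, 176]

outliers, fences = boxplot_outliers(bp)
print("hinges:", tukey_hinges(bp))
print("fences:", fences)
print("outliers:", outliers)

# Compare how much the mean and the median move when the extreme value is dropped
largest = max(bp)
without_max = [v for v in bp if v != largest]
print("mean:", statistics.mean(bp), "->", statistics.mean(without_max))
print("median:", statistics.median(bp), "->", statistics.median(without_max))
```

With these values the hinges are 59 and 109, so the upper fence is 109 + 1.5 × 50 = 184 and only 205 is flagged. Dropping that one reading moves the mean from 99 to 85.75 but the median only from 87 to 82.5, which illustrates why a single extreme value affects the mean more than the median.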
While these trends align with medical practice, where higher blood pressure is typically classified as “Bad” or “High,” the limitations of this dataset are important. The sample size is extremely small (only nine usable observations after cleaning), and the data were constructed for the exercise rather than drawn from actual patients, which severely limits how far any conclusion can be generalized. In addition, a single extreme outlier disproportionately affects averages and spreads, which would not be acceptable in a rigorous medical study without careful statistical adjustment.
The dataset originally contained one missing value in the FirstAssess variable. Following the lab instructions, the entire row containing the NA was dropped using na.omit(). Because this was only one observation out of ten, the overall impact was limited, although it reduced the sample size from 10 to 9. The change is reflected in the summary statistics: the median of BloodPressure fell from 95 to 87, and the mean decreased from about 102.6 to 99.0 after the row with missing data was removed. The summary for FirstAssess no longer reports an NA; because the dropped row had no FirstAssess value, the remaining cases split into five “Bad” and four “Good,” the same counts as before. While these shifts are small, they show how removing even a single incomplete observation can move averages and medians in a small sample.
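The cleaning step amounts to listwise deletion: na.omit() drops every row that has a missing value in any column. The Python sketch below mimics that behavior on a hypothetical two-column reconstruction. The post does not list its raw data, so the values and the “Bad”/“Good” labels (with None standing in for R's NA) are assumptions chosen only to reproduce the statistics quoted above; the helper names drop_incomplete_rows and describe are illustrative and not part of any library.

```python
import statistics

# Hypothetical reconstruction (assumption: the post does not list its raw data).
# Values are chosen to reproduce the summaries quoted in the text; None plays the role of NA.
data = {
    "BloodPressure": [103, 87, 32, 42, 59, 109, 78, 205, 135, 176],
    "FirstAssess": ["Bad", "Bad", "Bad", "Bad", "Good", "Good", "Good", "Good", None, "Bad"],
}


def drop_incomplete_rows(columns):
    """Keep only rows with no missing value in any column (listwise deletion, like na.omit())."""
    names = list(columns)
    rows = zip(*(columns[name] for name in names))
    complete = [row for row in rows if all(value is not None for value in row)]
    return {name: [row[i] for row in complete] for i, name in enumerate(names)}


def describe(values):
    """Mean (rounded to one decimal) and median, the two statistics compared in the text."""
    return round(statistics.mean(values), 1), statistics.median(values)


cleaned = drop_incomplete_rows(data)

print("rows:", len(data["BloodPressure"]), "->", len(cleaned["BloodPressure"]))
print("BP (mean, median) before:", describe(data["BloodPressure"]))
print("BP (mean, median) after: ", describe(cleaned["BloodPressure"]))

# The dropped row had no FirstAssess value, so only the NA disappears from the counts
for label in ("Bad", "Good"):
    print(label, cleaned["FirstAssess"].count(label))
```

Run as written, it reports 10 rows falling to 9, a BloodPressure mean of 102.6 before and 99 after, medians of 95.0 and 87, and five “Bad” versus four “Good” cases after the drop.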