# Problem

Can we use daily accelerometer data to check for changes in physical activity?

## Applied Skills

Visualization, Exploratory Data Analysis

# Introduction

This analysis seeks to analyze the accelerometer data of a 63 year-old male with congestive heart failure in hopes of uncovering patterns in his activity over time. We have reason to believe that he becomes more active with time, so we’ll use their data to assess this hypothesis.

# Data Cleaning

The data is generated from an accelerometer, measured as some physical quantity. The patient accelerometer measured activity every minute for more than 47 weeks. That ends us with a lot of data that needs to be investigated. Here, I just change up the names to be more human readable.

library(tidyverse)
library(forcats)
library(knitr)
library(wesanderson)

knitr::opts_chunk$set( out.width = "90%" ) theme_set(theme_classic() + theme(legend.position = "bottom", plot.title = element_text(hjust = 0.5)) ) color_scheme = "FantasticFox1" # My favorite raw_accel = read.csv('./data/p8105_mtp_data.csv') colnames(raw_accel) = str_replace(colnames(raw_accel), "activity.", "minute_") tidy_accel = raw_accel %>% rename(., week_number = week, dow = day) %>% mutate(dow = ordered(dow, c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")), total_activity = rowSums(select(., minute_1:minute_1440))) %>% arrange(week_number, dow) %>% mutate(day_number = c(1:nrow(raw_accel))) ## Exploratory Data Analyses Weekends can be sources of greater activity or rest relative to weekdays, so I’ve tabulated the median counts for each day in isolation below and graphed each day’s 24 hour profiles. day_comparison = tidy_accel %>% group_by(dow) %>% summarize(n_days = n(), Q1_activity = round(quantile(total_activity, probs = c(0.25)), 0), median_activity = round(median(total_activity), 0), Q3_activity = round(quantile(total_activity, probs = c(0.75)), 0)) profiles_by_dow = tidy_accel %>% gather(., key = minute, val = activity, minute_1:minute_1440) %>% mutate(minute_num = as.numeric(str_replace(minute, "minute_", ""))) %>% ggplot(data = ., aes(x = minute_num, y = activity, color = dow)) + geom_point(alpha = 0.1) + geom_smooth(color = "black", geom = "line", size = 1, se = FALSE) + facet_grid(. ~ dow) + scale_color_manual(values = wes_palette(color_scheme, 7, type = "continuous")) + labs( title = "24 hour profiles by day of the week", x = "Minute of the day", y = "Total activity" ) + theme(legend.position = "none", axis.text.x = element_text(angle = 90, hjust = 1)) kable(day_comparison) profiles_by_dow The graph suggests that weekends are more active, evidenced by denser spikes of activity on Friday night and Saturday/Sunday midday relative to weekdays. To test if this trend applies over time, we’ll plot that out. trends_by_day = tidy_accel %>% filter(., total_activity > 1440) %>% # Ignore no activity days in early data gather(., key = minute, val = activity, minute_1:minute_1440) %>% mutate(minute_num = as.numeric(str_replace(minute, "minute_", ""))) %>% group_by(day_number, dow) %>% summarize(total_activity_for_day = sum(activity)) %>% # Plotting ggplot(data = ., aes(x = day_number, y = total_activity_for_day, color = dow)) + geom_point(alpha = 0.2) + geom_smooth(geom = "line", size = 1, se = FALSE) + facet_grid(. ~ dow) + scale_x_continuous(limits = c(0, 330)) + scale_y_continuous(limits = c(0, 550000)) + labs( title = "Total activity through time by day of week", x = "Days since accelerometer start", y = "Total activity" ) + scale_color_manual(values = wes_palette(color_scheme, 7, type = "continuous")) + theme(legend.position = "none") trends_by_day The graph demonstrates that total activity generally increases with time regardless of day of the week. Friday and Saturday are the busiest days overall through time, while Sunday shows the greatest increase relative to day 1. Some early days were excluded for poor data (no activity), but the increasing trend is still preserved. Saturday has early days with only partial use, contributing to its “large” rise. ## Time-dependence of activity By considering activity over all 24 hours, we risk washing out bursts of activity in waking hours with sleeping hours. Therefore, it merits examining how activity differs by 8 hour blocks to account for this. density_by_regime = tidy_accel %>% gather(., key = minute, val = activity, minute_1:minute_1440) %>% mutate(minute_num = as.numeric(str_replace(minute, "minute_", "")), regime = ifelse(minute_num < 481, "First 8 hrs", ifelse(minute_num < 961, "Second 8 hrs", "Third 8 hrs"))) %>% group_by(day_number, regime) %>% summarize(regime_total = sum(activity)) %>% # Plotting ggplot(data = ., aes(x = regime_total, fill = regime, color = regime)) + geom_density(alpha = 0.4) + labs( title = "Distribution of total activity values by 8 hour regimes", x = "Total activity", y = "Probability density" ) + theme(legend.position = "none") + scale_fill_manual(values = wes_palette(color_scheme, 3, type = "continuous")) + scale_color_manual(values = wes_palette(color_scheme, 3, type = "continuous")) activity_by_regime = tidy_accel %>% filter(., total_activity > 1440) %>% # Ignore no activity days gather(., key = minute, val = activity, minute_1:minute_1440) %>% mutate(minute_num = as.numeric(str_replace(minute, "minute_", "")), regime = ifelse(minute_num < 481, "First 8 hrs", ifelse(minute_num < 961, "Second 8 hrs", "Third 8 hrs"))) %>% group_by(day_number, regime) %>% summarize(regime_total = sum(activity)) %>% # Plotting ggplot(data = ., aes(x = day_number, y = regime_total, color = regime)) + geom_point(alpha = 0.2) + stat_smooth(geom = "line", size = 1) + labs( title = "Change in total activity by day 8-hour regime", x = "Days since accelerometer start", y = "Total activity" ) + scale_color_manual(values = wes_palette(color_scheme, 3, type = "continuous")) density_by_regime / activity_by_regime The resulting graphs confirm our suspiscion that the first 8 hour regime has less activity compared to the others. We can also confirm visually that activity in the later 2 regimes rises as days pass, whereas the first does not. # Assessment We have shown visually that the patient becomes more active with time, accounting for both day of the week and time of day. We will test the hypothesis that the first 2 weeks of activity is different from the last two with a paired t-test on a 0.01 significance level. $H_0: \Delta = 0,$ $H_1: \Delta \neq 0$ where the true difference, $$\Delta$$, is estimated by day-to-day differences between first 2 to last 2 weeks. desired_weeks = c(1:2, 46:47) paired_data = tidy_accel %>% filter(., week_number %in% desired_weeks) first_weeks = paired_data %>% filter(., week_number == 1 | week_number == 2) last_weeks = paired_data %>% filter(., week_number == 46 | week_number == 47) mean_difference = mean(first_weeks$total_activity - last_weeks$total_activity) std_difference = var(first_weeks$total_activity - last_weeks\$total_activity)^(1/2)

T_statistic = mean_difference / (std_difference/sqrt(nrow(first_weeks)))
T_crit = qt(0.995, df = 14 - 1) # 2 weeks

The magnitude of the test statistic is 3.04 while the critical value is 3.01. We can reject the null hypothesis and conclude that activity is different between the first and last two weeks with 99% confidence. -3.04 is negative, indicating more activity in the latter weeks.