Photo by Jeremy Lapak on Unsplash

Photo by Jeremy Lapak on Unsplash

Problem

Can we use daily accelerometer data to check for changes in physical activity?

Applied Skills

Visualization, Exploratory Data Analysis

Introduction

This analysis seeks to analyze the accelerometer data of a 63 year-old male with congestive heart failure in hopes of uncovering patterns in his activity over time. We have reason to believe that he becomes more active with time, so we’ll use their data to assess this hypothesis.

Data Cleaning

The data is generated from an accelerometer, measured as some physical quantity. The patient accelerometer measured activity every minute for more than 47 weeks. That ends us with a lot of data that needs to be investigated. Here, I just change up the names to be more human readable.

Exploratory Data Analyses

Weekends can be sources of greater activity or rest relative to weekdays, so I’ve tabulated the median counts for each day in isolation below and graphed each day’s 24 hour profiles.

The graph suggests that weekends are more active, evidenced by denser spikes of activity on Friday night and Saturday/Sunday midday relative to weekdays. To test if this trend applies over time, we’ll plot that out.

The graph demonstrates that total activity generally increases with time regardless of day of the week. Friday and Saturday are the busiest days overall through time, while Sunday shows the greatest increase relative to day 1. Some early days were excluded for poor data (no activity), but the increasing trend is still preserved. Saturday has early days with only partial use, contributing to its “large” rise.

Time-dependence of activity

By considering activity over all 24 hours, we risk washing out bursts of activity in waking hours with sleeping hours. Therefore, it merits examining how activity differs by 8 hour blocks to account for this.

density_by_regime = tidy_accel %>% 
  gather(., key = minute, val = activity, minute_1:minute_1440) %>% 
  mutate(minute_num = as.numeric(str_replace(minute, "minute_", "")),
         regime = ifelse(minute_num < 481, "First 8 hrs", 
                         ifelse(minute_num < 961, "Second 8 hrs", "Third 8 hrs"))) %>% 
  group_by(day_number, regime) %>% 
  summarize(regime_total = sum(activity)) %>% 
  
  # Plotting
  ggplot(data = ., aes(x = regime_total, fill = regime, color = regime)) +
  geom_density(alpha = 0.4) +
  labs(
    title = "Distribution of total activity values by 8 hour regimes",
    x = "Total activity",
    y = "Probability density"
  ) + 
  theme(legend.position = "none") +
  scale_fill_manual(values = wes_palette(color_scheme, 3, type = "continuous")) +
  scale_color_manual(values = wes_palette(color_scheme, 3, type = "continuous"))

activity_by_regime = tidy_accel %>% 
  filter(., total_activity > 1440) %>% # Ignore no activity days
  gather(., key = minute, val = activity, minute_1:minute_1440) %>% 
  mutate(minute_num = as.numeric(str_replace(minute, "minute_", "")),
         regime = ifelse(minute_num < 481, "First 8 hrs", 
                         ifelse(minute_num < 961, "Second 8 hrs", "Third 8 hrs"))) %>% 
  group_by(day_number, regime) %>% 
  summarize(regime_total = sum(activity)) %>% 
  
  # Plotting
  ggplot(data = ., aes(x = day_number, y = regime_total, color = regime)) +
  geom_point(alpha = 0.2) +
  stat_smooth(geom = "line", size = 1) +
  labs(
    title = "Change in total activity by day 8-hour regime",
    x = "Days since accelerometer start",
    y = "Total activity"
  ) +
  scale_color_manual(values = wes_palette(color_scheme, 3, type = "continuous"))

density_by_regime / activity_by_regime

The resulting graphs confirm our suspiscion that the first 8 hour regime has less activity compared to the others. We can also confirm visually that activity in the later 2 regimes rises as days pass, whereas the first does not.

Assessment

We have shown visually that the patient becomes more active with time, accounting for both day of the week and time of day. We will test the hypothesis that the first 2 weeks of activity is different from the last two with a paired t-test on a 0.01 significance level.

\[H_0: \Delta = 0,\] \[H_1: \Delta \neq 0\]

where the true difference, \(\Delta\), is estimated by day-to-day differences between first 2 to last 2 weeks.

The magnitude of the test statistic is 3.04 while the critical value is 3.01. We can reject the null hypothesis and conclude that activity is different between the first and last two weeks with 99% confidence. -3.04 is negative, indicating more activity in the latter weeks.

Copyright © 2019 Christian B. Pascual. All rights reserved. This site is hosted by Github Pages and is built on R Markdown.