| Title: | Estimate, Compare, and Visualize Healthcare Resource Utilization for Real-World Evidence |
|---|---|
| Description: | Tools to estimate, compare, and visualize healthcare resource utilization using data derived from electronic health records or real-world evidence sources. The package supports pre index and post index analysis, patient cohort comparison, and customizable summaries and visualizations for clinical and health economics research. Methods implemented are based on Scott et al. (2022) <doi:10.1080/13696998.2022.2037917> and Xia et al. (2024) <doi:10.14309/ajg.0000000000002901>. |
| Authors: | Maheshkumar Umbarkar [aut, cre, cph], Safiuddin Shoeb Syed [ctb] |
| Maintainer: | Maheshkumar Umbarkar <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0.9000 |
| Built: | 2026-05-31 08:38:38 UTC |
| Source: | https://github.com/mumbarkar/hcrur |
This function calculates estimates of healthcare resource utilization (HCRU) from electronic health record data across various care settings (e.g., IP, OP, ED/ER). It provides descriptive summaries of patient counts, encounters, costs, length of stay, and readmission rates for pre- and post-index periods.
estimate_hcru(
  data,
  cohort_col = "cohort",
  patient_id_col = "patient_id",
  admit_col = "admission_date",
  discharge_col = "discharge_date",
  index_col = "index_date",
  visit_col = "visit_date",
  encounter_id_col = "encounter_id",
  setting_col = "care_setting",
  cost_col = "cost_usd",
  readmission_col = "readmission",
  time_window_col = "period",
  los_col = "length_of_stay",
  custom_var_list = NULL,
  pre_days = 180,
  post_days = 365,
  readmission_days_rule = 30,
  group_var_main = "cohort",
  group_var_by = "care_setting",
  test = NULL,
  timeline = "Pre",
  gt_output = TRUE
)
| Argument | Description |
|---|---|
| data | A data frame containing the health care details. |
| cohort_col | A character specifying the name of the cohort column. |
| patient_id_col | A character specifying the name of the patient identifier column. |
| admit_col | A character specifying the name of the date of admission column. |
| discharge_col | A character specifying the name of the date of discharge column. |
| index_col | A character specifying the name of the index date or diagnosis column. |
| visit_col | A character specifying the name of the date of visit/claim column. |
| encounter_id_col | A character specifying the name of the encounter/claim column. |
| setting_col | A character specifying the name of the HCRU setting column (e.g., IP, ED, OP). |
| cost_col | A character specifying the name of the cost column. |
| readmission_col | A character specifying the name of the readmission column. |
| time_window_col | A character specifying the name of the time window column. |
| los_col | A character specifying the name of the length of stay column. |
| custom_var_list | A character vector listing additional columns. |
| pre_days | Number of days before index (default 180 days). |
| post_days | Number of days after index (default 365 days). |
| readmission_days_rule | Maximum number of days allowed when defining a readmission in the IP setting (default 30 days). |
| group_var_main | A character specifying the name of the main grouping column. |
| group_var_by | A character specifying the name of the secondary grouping column. |
| test | An optional named list of statistical tests (e.g., age ~ "wilcox.test"). |
| timeline | A character specifying the timeline window (default "Pre"). |
| gt_output | Logical; if TRUE, the returned list also includes a formatted gtsummary table (default TRUE). |
A list containing one or two summary data frames: a descriptive summary of HCRU metrics by cohort, setting, and time window, and, if gt_output = TRUE, formatted summary statistics produced with gtsummary.
df <- hcru_sample_data[sample(nrow(hcru_sample_data), 10), ]
estimate_hcru(data = df)
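The pre_days and post_days arguments define windows around the index date. The following is a minimal base R sketch of how an encounter could be assigned to a "Pre" or "Post" period. The data, the boundary convention (index day counted as "Post"), and the handling of out-of-window visits are assumptions for illustration, not the package's confirmed behavior.

```r
# Hypothetical encounters for one patient; column names follow the defaults above.
enc <- data.frame(
  patient_id = 1,
  index_date = as.Date("2024-01-01"),
  visit_date = as.Date(c("2023-05-01", "2023-09-15", "2024-03-10", "2025-06-01"))
)

pre_days  <- 180
post_days <- 365

# Signed offset in days from the index date to each visit.
offset <- as.numeric(enc$visit_date - enc$index_date)

# Assumed convention: visits before the index date are "Pre", visits on or
# after it are "Post", and anything outside both windows is set to NA.
enc$period <- ifelse(
  offset < 0 & offset >= -pre_days, "Pre",
  ifelse(offset >= 0 & offset <= post_days, "Post", NA_character_)
)
enc
```

Under these assumptions, only the second and third visits fall inside a window; the first is more than 180 days before index and the last is more than 365 days after it.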
A sample dataset representing a patient cohort with index dates.
hcru_sample_data
A data frame with columns:
Unique patient identifier
Cohort identifier (e.g., treatment group)
Index date (as Date)
Encounter/claim identifier (e.g., claim number)
HCRU domain types (e.g., IP, OP, ER, etc.)
Visit date (as Date)
Admission date (as Date)
Discharge date (as Date)
Encounter/Claim date (as Date)
Time period relative to index (e.g., Pre/Post)
Cost of utilization of health resources
Simulated data
This function visualizes HCRU events by care setting, grouped by cohort and time window.
plot_hcru(
  summary_df,
  x_var = "time_window",
  y_var = "Cost",
  cohort_col = "cohort",
  facet_var = "care_setting",
  facet_var_n = 3,
  title = "Average total cost by domain and cohort",
  x_label = "Healthcare Setting (Domain)",
  y_label = "Average total cost",
  fill_label = "Cohort"
)
| Argument | Description |
|---|---|
| summary_df | Output from estimate_hcru() |
| x_var | A character specifying the column name to be plotted on the x-axis |
| y_var | A character specifying the column name to be plotted on the y-axis |
| cohort_col | A character specifying the cohort column name |
| facet_var | A character specifying the column name used to generate faceted plots |
| facet_var_n | A numeric specifying the number of columns for facet output |
| title | A character specifying the plot title |
| x_label | A character specifying the x-axis label |
| y_label | A character specifying the y-axis label |
| fill_label | A character specifying the fill legend label |
Plot HCRU Event Summary
A ggplot object.
df <- data.frame(
  time_window = rep(c("Pre", "Post"), each = 2),
  cohort = rep(c("A", "B"), 2),
  care_setting = rep("Setting1", 4),
  Cost = c(100, 120, 110, 130)
)
plot_hcru(
  summary_df = df,
  x_var = "time_window",
  y_var = "Cost",
  cohort_col = "cohort",
  facet_var = "care_setting",
  facet_var_n = 1,
  title = "Example Plot",
  x_label = "Time Window",
  y_label = "Cost",
  fill_label = "Cohort"
)
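For readers who want to customize the figure beyond the arguments above, a roughly comparable plot can be sketched directly in ggplot2. This assumes a dodged bar chart filled by cohort and faceted by care setting; the geometry plot_hcru() actually uses is not stated here, so treat this as an illustration rather than the function's internals.

```r
library(ggplot2)

# Same toy summary as in the example above.
df <- data.frame(
  time_window  = rep(c("Pre", "Post"), each = 2),
  cohort       = rep(c("A", "B"), 2),
  care_setting = rep("Setting1", 4),
  Cost         = c(100, 120, 110, 130)
)

# Assumed layout: one bar per cohort within each time window,
# one panel per care setting.
p <- ggplot(df, aes(x = time_window, y = Cost, fill = cohort)) +
  geom_col(position = "dodge") +
  facet_wrap(~ care_setting, ncol = 1) +
  labs(title = "Example Plot", x = "Time Window", y = "Cost", fill = "Cohort")
p
```

Because the result is an ordinary ggplot object, further layers such as themes or scales can be added with `+` in the usual way.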
This function pre-processes health care resource utilization (HCRU) data from electronic health records for a given set of care settings (e.g., IP, OP, ED/ER).
preproc_hcru_fun(
  data,
  cohort_col = "cohort",
  patient_id_col = "patient_id",
  admit_col = "admission_date",
  discharge_col = "discharge_date",
  index_col = "index_date",
  visit_col = "visit_date",
  encounter_id_col = "encounter_id",
  setting_col = "care_setting",
  pre_days = 180,
  post_days = 365,
  readmission_days_rule = 30
)
| Argument | Description |
|---|---|
| data | A data frame containing the health care details |
| cohort_col | A character specifying the name of the cohort column |
| patient_id_col | A character specifying the name of the patient identifier column |
| admit_col | A character specifying the name of the date of admission column |
| discharge_col | A character specifying the name of the date of discharge column |
| index_col | A character specifying the name of the index date or diagnosis column |
| visit_col | A character specifying the name of the date of visit/claim column |
| encounter_id_col | A character specifying the name of the encounter/claim column |
| setting_col | A character specifying the name of the HCRU setting column (e.g., IP, ED, OP) |
| pre_days | Number of days before index (default 180 days) |
| post_days | Number of days after index (default 365 days) |
| readmission_days_rule | Maximum number of days allowed when defining a readmission in the IP setting (default 30 days) |
A data frame with HCRU estimates.
preproc_hcru_fun(data = hcru_sample_data)
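To make readmission_days_rule concrete, the sketch below derives length of stay and a readmission flag for inpatient stays in base R. It assumes length of stay is discharge minus admission and that a readmission is an admission within the rule's number of days after the same patient's previous discharge. Both definitions are assumptions for illustration; the package's exact logic may differ.

```r
# Hypothetical inpatient stays; column names follow the defaults above.
ip <- data.frame(
  patient_id     = c(1, 1, 1, 2),
  admission_date = as.Date(c("2024-01-01", "2024-01-20", "2024-04-01", "2024-02-01")),
  discharge_date = as.Date(c("2024-01-05", "2024-01-25", "2024-04-03", "2024-02-02"))
)
readmission_days_rule <- 30

# Order stays chronologically within each patient.
ip <- ip[order(ip$patient_id, ip$admission_date), ]

# Length of stay in days (assumed: discharge date minus admission date).
ip$length_of_stay <- as.numeric(ip$discharge_date - ip$admission_date)

# Discharge date of the same patient's previous stay (NA for a first stay).
prev_discharge <- ave(
  as.numeric(ip$discharge_date), ip$patient_id,
  FUN = function(x) c(NA, head(x, -1))
)

# Days between the previous discharge and the current admission.
gap <- as.numeric(ip$admission_date) - prev_discharge

# Flag a readmission when the gap falls within the rule.
ip$readmission <- as.integer(!is.na(gap) & gap >= 0 & gap <= readmission_days_rule)
ip
```

In this toy data only the second stay of patient 1 is flagged: it begins 15 days after the previous discharge, whereas the third stay begins 67 days after.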
Generate Detailed Descriptive Statistics
summarize_descriptives(
  data,
  patient_id_col = "patient_id",
  setting_col = "care_setting",
  cohort_col = "cohort",
  encounter_id_col = "encounter_id",
  cost_col = "cost_usd",
  los_col = "length_of_stay",
  readmission_col = "readmission",
  time_window_col = "time_window"
)
| Argument | Description |
|---|---|
| data | A data frame with variables to summarize. |
| patient_id_col | A character specifying the name of the patient identifier column |
| setting_col | A character specifying the name of the HCRU setting column |
| cohort_col | A character specifying the name of the cohort column |
| encounter_id_col | A character specifying the name of the encounter/claim column |
| cost_col | A character specifying the name of the cost column |
| los_col | A character specifying the name of the length of stay column |
| readmission_col | A character specifying the name of the readmission column |
| time_window_col | A character specifying the name of the time window column |
A table object
if (requireNamespace("dplyr", quietly = TRUE) &&
    requireNamespace("checkmate", quietly = TRUE)) {
  hcru_sample_data <- data.frame(
    patient_id = rep(1:10, each = 2),
    cohort = rep(c("A", "B"), 10),
    care_setting = rep(c("IP", "OP"), 10),
    admission_date = Sys.Date() - sample(1:100, 20, TRUE),
    discharge_date = Sys.Date() - sample(1:90, 20, TRUE),
    index_date = Sys.Date() - 50,
    visit_date = Sys.Date() - sample(1:100, 20, TRUE),
    encounter_id = 1:20,
    cost_usd = runif(20, 100, 1000)
  )
  df <- preproc_hcru_fun(data = hcru_sample_data)
  summary_df <- summarize_descriptives(data = df)
  # Set illustrative LOS and Readmission values (IP only) for demonstration
  summary_df$LOS <- ifelse(summary_df$care_setting == "IP",
                           sample(1:10, nrow(summary_df), TRUE), NA)
  summary_df$Readmission <- ifelse(summary_df$care_setting == "IP",
                                   sample(0:1, nrow(summary_df), TRUE), NA)
  summary_df$time_window <- "Pre"
  summary_df
}
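As a rough picture of the kind of per-patient summary involved, the base R sketch below counts distinct encounters and sums cost for each patient within cohort, care setting, and time window. The Visits and Cost column names mirror the variables used in the examples in this manual; the aggregation itself is an assumed illustration, not the function's actual implementation.

```r
# Hypothetical encounter-level data; column names follow the defaults above.
enc <- data.frame(
  patient_id   = c(1, 1, 2, 3, 3, 3),
  cohort       = c("A", "A", "A", "B", "B", "B"),
  care_setting = c("IP", "OP", "OP", "IP", "IP", "OP"),
  time_window  = "Pre",
  encounter_id = 1:6,
  cost_usd     = c(500, 100, 150, 800, 700, 120)
)

keys <- c("patient_id", "cohort", "care_setting", "time_window")

# Number of distinct encounters per patient and group.
visits <- aggregate(enc["encounter_id"], by = enc[keys],
                    FUN = function(x) length(unique(x)))

# Total cost per patient and group.
cost <- aggregate(enc["cost_usd"], by = enc[keys], FUN = sum)

# Combine both metrics into one per-patient summary.
per_patient <- merge(visits, cost, by = keys)
names(per_patient)[names(per_patient) == "encounter_id"] <- "Visits"
names(per_patient)[names(per_patient) == "cost_usd"] <- "Cost"
per_patient
```

Patient 3 has two inpatient encounters in this toy data, so that row shows two visits and their combined cost.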
Generate Detailed Descriptive Statistics with Custom P-Value Tests
summarize_descriptives_gt(
  data,
  patient_id_col = "patient_id",
  var_list = NULL,
  group_var_main = "cohort",
  group_var_by = "care_setting",
  test = NULL,
  timeline = "Pre"
)
| Argument | Description |
|---|---|
| data | A data frame with variables to summarize, taken from the output of the summarize_descriptives function. Filter the data to the desired timeline before passing it in. |
| patient_id_col | A character specifying the name of the patient identifier column. |
| var_list | Optional quoted variable list (e.g. care_setting). |
| group_var_main | A character specifying the name of the main grouping column. |
| group_var_by | A character specifying the name of the secondary grouping column. |
| test | Optional named list of statistical tests (e.g. age ~ "wilcox.test"). |
| timeline | A character specifying the timeline window (default "Pre"). |
A gtsummary table object
if (requireNamespace("gtsummary", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE) &&
    requireNamespace("purrr", quietly = TRUE) &&
    requireNamespace("checkmate", quietly = TRUE) &&
    requireNamespace("glue", quietly = TRUE)) {
  hcru_sample_data <- data.frame(
    patient_id = rep(1:10, each = 2),
    cohort = rep(c("A", "B"), 10),
    care_setting = rep(c("IP", "OP"), 10),
    admission_date = Sys.Date() - sample(1:100, 20, TRUE),
    discharge_date = Sys.Date() - sample(1:90, 20, TRUE),
    index_date = Sys.Date() - 50,
    visit_date = Sys.Date() - sample(1:100, 20, TRUE),
    encounter_id = 1:20,
    cost_usd = runif(20, 100, 1000)
  )
  df <- preproc_hcru_fun(data = hcru_sample_data)
  summary_df <- summarize_descriptives(data = df)
  # Set illustrative LOS and Readmission values (IP only) for demonstration
  summary_df$LOS <- ifelse(summary_df$care_setting == "IP",
                           sample(1:10, nrow(summary_df), TRUE), NA)
  summary_df$Readmission <- ifelse(summary_df$care_setting == "IP",
                                   sample(0:1, nrow(summary_df), TRUE), NA)
  summary_df$time_window <- "Pre"
  # Run the function (should execute within 5 seconds)
  summarize_descriptives_gt(
    data = summary_df,
    patient_id_col = "patient_id",
    var_list = c("Visits", "Cost", "LOS", "Readmission"),
    group_var_main = "cohort",
    group_var_by = "care_setting",
    timeline = "Pre"
  )
}
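The test argument takes formula-style entries such as age ~ "wilcox.test". To show what such an entry amounts to, the sketch below unpacks one entry and runs the named test directly with base R's stats functions. The data and the Cost ~ "wilcox.test" entry are hypothetical, and how the package forwards test to gtsummary is not shown in this manual, so this illustrates only the underlying comparison.

```r
# Formula-style test specification, mirroring the age ~ "wilcox.test" form above.
test_spec <- list(Cost ~ "wilcox.test")

# Hypothetical per-patient costs for two cohorts (no ties, clearly separated).
df <- data.frame(
  cohort = rep(c("A", "B"), each = 5),
  Cost   = c(100, 120, 130, 150, 170, 200, 220, 260, 300, 310)
)

# Left-hand side is the variable, right-hand side is the test name.
var_name  <- all.vars(test_spec[[1]])[1]
test_name <- test_spec[[1]][[3]]

# Build Cost ~ cohort and call the named test from the stats package.
res <- do.call(
  test_name,
  list(reformulate("cohort", response = var_name), data = df)
)
res$p.value
```

With two cohorts, "wilcox.test" performs a Wilcoxon rank-sum comparison. A different test name in the specification would be dispatched the same way, provided the function accepts a formula and a data argument.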