Introduction

This document showcases usage examples for RCodeBox functions using a real oncology dataset. See the following sections for example data preparation and usage examples. This is a living document that will be updated as new functions and additional functionality of existing functions are added.

Lets go ahead and load the RCodeBox package and other packages will need for this vignette.

# Main package
library(RCodeBox)

# Packages for prepping the example data
library(TCGAbiolinks)
library(SummarizedExperiment)

# Packages for general data wrangling, plotting, display
library(dplyr)
library(tidyr)
library(RColorBrewer)
library(DT)

Example Data Prep

The below section details how to download and prep the example dataset used to showcase RCodeBox function usage.

Note

The example data download and preparation process can be lengthy, so feel free to skip to the Usage Examples section and pickup with the prepped data that’s stored in and is automatically loaded from the package itself.

First we download some example transcriptomic and corresponding clinical data from The Cancer Genome Atlas (TCGA) data portal via the TCGAbiolinks package to use in our examples. We will obtain primary tumor and solid normal tissue data from colon adenocarcinoma (COAD) and rectum adenocarcinoma (READ) cases from TCGA.

First specify the query parameters for the TCGA data.

query = GDCquery(
  project = c("TCGA-COAD", "TCGA-READ"),
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  sample.type = c("Primary Tumor", "Solid Tissue Normal")
)

Then we can download the data using the query specifications.

GDCdownload(query)

We then use the GDCprepare function to prepare the downloaded data by concatenating all the individual RNA-seq files we just downloaded and attaching corresponding clinical data. The prepared data is a RangedSummarizedExperiment object, so we will need to use the SummarizedExperiment package to work with it. We can extract out the data we need to more standard dataframe formats to simplify further processing.

# Prepare downloaded data
gdc_data = GDCprepare(query, save = FALSE)

# Extract different data modalities into data.frames
clinical_data = data.frame(colData(gdc_data))
gex_data = data.frame(rowData(gdc_data))
gex_counts = data.frame(assay(gdc_data, "unstranded"))
gex_tpm = data.frame(assay(gdc_data, "tpm_unstrand"))

# Remove any temp files if they were created
if (file.exists("df.rds")) {
  file.remove("df.rds")
}
if (file.exists("results.rds")) {
  file.remove("results.rds")
}

As stated above, the prepared data from this section is contained within the RCodeBox package and can be readily accessed as soon as you load the package into R.

Usage Examples

The following sections show RCodeBox functions in action based on some typical use cases. These are not meant to cover everything you can do with the functions in RCodeBox, just give an idea of how they are used.

Lets load the prepared TCGA data.

data(clinical_data)
data(gex_data)
data(gex_counts)
data(gex_tpm)

Stratified Statistics Table

The function stratified_stat_table is a good workhorse function for quickly creating a publication-ready table of summary statistics with statistical testing for multiple categorical and quantitative variables. The function’s main input is a dataframe with a grouping variable of interest and other variables you are interested in calculating summary statistics for and determining if any significantly differ between groups in the grouping variable. You can either display the variable names as they are, or supply a named list to rename the variables (i.e., if you want more formatted labeling in the table). You can additionally supply covariates to adjust for during statistical testing. The function uses Firth’s penalized logistic regression (via the logistf package) or linear regression (via the base lm function) to test categorical and quantitative variables, respectively.

Two Group Table

Below we use the function on a two group grouping variable: determining what patient demographics and/or clinical variables may differ between the cancer histologies, i.e., colon and rectal adenocarcinoma. We can visualize this as an HTML-based table with the function kable_html_table, which is useful for when you are performing analyses in interactive notebooks like Quarto or Rmarkdown and rendering to HTML.

See Table 1 for results.

# Define grouping variable
groups = "name"

# Define columns of interest
columns = c(
  "age_at_index",
  "gender",
  "race",
  "ethnicity",
  "ajcc_pathologic_stage",
  "prior_treatment",
  "sample_type"
)

# Define named list to rename variables by
rename_dict = list(
  "Age",
  "Gender",
  "Race",
  "Ethnicity",
  "Clinical Stage",
  "Prior Treatment",
  "Tissue Sample Type"
)
names(rename_dict) = columns

# Create data for testing
test_data = clinical_data[, c("barcode", groups, columns)]

# Set not reported or unknown data to missing
for (col in columns) {
  if (is.character(test_data[[col]]) | is.factor(test_data[[col]])) {
    test_data[[col]][grepl(
      "not reported|unknown",
      test_data[[col]],
      ignore.case = TRUE
    )] = NA
  }
}

# Make race and ethnicity categories title case
test_data$race = stringr::str_to_title(test_data$race)
test_data$ethnicity = stringr::str_to_title(test_data$ethnicity)

# Simplify staging information
test_data$ajcc_pathologic_stage = factor(
  dplyr::case_when(
    grepl("Stage IV", test_data$ajcc_pathologic_stage) ~ "Stage IV",
    grepl("Stage III", test_data$ajcc_pathologic_stage) ~ "Stage III",
    grepl("Stage II", test_data$ajcc_pathologic_stage) ~ "Stage II",
    grepl("Stage I", test_data$ajcc_pathologic_stage) ~ "Stage I",
    .default = NA
  )
)

# Build and display table
results = stratified_stat_table(test_data, columns, groups, rename_dict)
kable_html_table(results, bold_columns = c(1, 2))

Table 1: Patient and clinical characteristics in colon and rectal adenocarcinoma cases

Variable	Categories	Total (N=698)	Colon Adenocarcinoma (N=522)	Rectum Adenocarcinoma (N=176)	Coef [95%CI]	P
Age, Mean±SD	-	66.6±12.8	67.3±13	64.5±11.9	-2.8 [-4.98; -0.61]	1.21e-02
Gender, N (%)	Female	330 (47.5%)	248 (47.7%)	82 (46.9%)	0.97 [0.69; 1.36]
	Male	365 (52.5%)	272 (52.3%)	93 (53.1%)	1.03 [0.73; 1.46]	8.50e-01
Race, N (%)	American Indian or Alaska Native	2 (0.5%)	2 (0.6%)	0 (0%)	0.7 [0.01; 8.68]	8.10e-01
	Asian	12 (2.9%)	11 (3.4%)	1 (1.1%)	0.45 [0.05; 1.91]	3.12e-01
	Black or African American	71 (16.9%)	65 (19.8%)	6 (6.5%)	0.3 [0.12; 0.65]	1.36e-03
	White	336 (79.8%)	250 (76.2%)	86 (92.5%)	3.61 [1.74; 8.62]	2.60e-04
Ethnicity, N (%)	Hispanic or Latino	5 (1.2%)	4 (1.3%)	1 (1.1%)	1.21 [0.12; 6.66]
	Not Hispanic or Latino	400 (98.8%)	314 (98.7%)	86 (98.9%)	0.83 [0.15; 8.31]	8.42e-01
Clinical Stage, N (%)	Stage I	107 (18.8%)	75 (17.7%)	32 (22.2%)	1.34 [0.83; 2.11]	2.24e-01
	Stage II	228 (40.1%)	183 (43.2%)	45 (31.2%)	0.6 [0.4; 0.89]	1.15e-02
	Stage III	163 (28.7%)	114 (26.9%)	49 (34%)	1.41 [0.93; 2.1]	1.02e-01
	Stage IV	70 (12.3%)	52 (12.3%)	18 (12.5%)	1.04 [0.58; 1.8]	8.99e-01
Prior Treatment, N (%)	No	578 (88.7%)	426 (88.4%)	152 (89.4%)	1.09 [0.64; 1.95]
	Yes	74 (11.3%)	56 (11.6%)	18 (10.6%)	0.92 [0.51; 1.57]	7.55e-01
Tissue Sample Type, N (%)	Primary Tumor	647 (92.7%)	481 (92.1%)	166 (94.3%)	1.37 [0.7; 2.89]
	Solid Tissue Normal	51 (7.3%)	41 (7.9%)	10 (5.7%)	0.73 [0.35; 1.42]	3.69e-01

We can see from the results that patients with rectal adenocarcinoma in this dataset were significantly younger than patients with colon adenocarcinoma by 2.8 years on average (P=0.01) and were more likely to be White (92.5%, P=3E-4) compared to colon adenocarcinoma, which had a significantly lower frequency of White patients (76.2%) and higher frequency of Black or African American patients (19.8% vs 6.5%, P=1E-3). These would be important confounding variables to consider in further analyses assessing differences between these two groups.

Using the export_to_xlsx function, we can export the results to a publication-ready table.

Tip

Make sure to specify the numeric_columns parameter if there are any numeric columns in the data, so it is formatted correctly or else it will come out as text (which is not the end of the world, just annoying).

export_to_xlsx(
  results,
  out_path = "example_output/table1.xlsx",
  sheet = "Table 1",
  numeric_columns = "P"
)

This is a snapshot of the exported table in Excel.

Multi Group Table

Below we use the function on a grouping variable with more than two groups: determining what patient demographics and/or clinical variables may differ between different gene expression quartiles of PD-L1 (CD274). The overall parameters and process for the stratified_stat_table are the same, but the output and interpretation are a little different. Instead of just one coefficient and corresponding p-value per variable as seen for a two group grouping variable, the function will output coefficients and p-values for each group specified in the grouping variable. This is because the function performs statistical testing for each group vs all other groups (one-vs-all testing framework) in lieu of performing an omnibus test for differences across groups (e.g., what’s done in a Kruskal-Wallis rank-sum test or ANOVA).

See Table 2 for results.

# Define grouping variable
groups = "pdl1_gex"

# Add cancer type to columns of interest and rename list since we're not
# using it as a grouping variable now
columns = c(columns, "name")
rename_dict["name"] = "Cancer Type"

# Create needed grouping variable
ensgid = gex_data$gene_id[gex_data$gene_name == "CD274"]
quantiles = quantile(unlist(as.vector(gex_tpm[ensgid, ])))
test_data[[groups]] = factor(
  dplyr::case_when(
    gsub("-", ".", rownames(test_data)) %in%
      colnames(gex_tpm)[gex_tpm[ensgid, ] >= quantiles[4]] ~ "Very High",
    gsub("-", ".", rownames(test_data)) %in%
      colnames(gex_tpm)[gex_tpm[ensgid, ] >= quantiles[3]] ~ "High",
    gsub("-", ".", rownames(test_data)) %in%
      colnames(gex_tpm)[gex_tpm[ensgid, ] >= quantiles[2]] ~ "Moderate",
    gsub("-", ".", rownames(test_data)) %in%
      colnames(gex_tpm)[gex_tpm[ensgid, ] < quantiles[2]] ~ "Low",
    .default = NA
  ),
  levels = c("Low", "Moderate", "High", "Very High")
)

# Build and display table
results = stratified_stat_table(test_data, columns, groups, rename_dict)
kable_html_table(results, bold_columns = c(1, 2))

Table 2: Patient and clinical characteristics stratified by PD-L1 (CD274) gene expression level

Variable	Categories	Total (N=698)	Low (N=175)	Coef [95%CI]	P	Moderate (N=174)	Coef [95%CI]	P	High (N=174)	Coef [95%CI]	P	Very High (N=175)	Coef [95%CI]	P
Age, Mean±SD	-	66.6±12.8	66.8±12.4	0.27 [-1.93; 2.46]	8.11e-01	64.3±13.3	-3.06 [-5.25; -0.87]	6.30e-03	66.6±12.7	-0.01 [-2.21; 2.19]	9.96e-01	68.7±12.4	2.76 [0.58; 4.95]	1.31e-02
Gender, N (%)	Female	330 (47.5%)	85 (48.6%)	1.06 [0.75; 1.49]		75 (43.6%)	0.81 [0.57; 1.15]		81 (46.8%)	0.97 [0.68; 1.36]		89 (50.9%)	1.2 [0.85; 1.69]
	Male	365 (52.5%)	90 (51.4%)	0.94 [0.67; 1.33]	7.38e-01	97 (56.4%)	1.23 [0.87; 1.74]	2.42e-01	92 (53.2%)	1.04 [0.73; 1.46]	8.43e-01	86 (49.1%)	0.84 [0.59; 1.18]	3.02e-01
Race, N (%)	American Indian or Alaska Native	2 (0.5%)	0 (0%)	0.57 [0; 7.04]	6.98e-01	2 (2%)	16.54 [1.33; 2288.6]	2.81e-02	0 (0%)	0.64 [0; 7.9]	7.60e-01	0 (0%)	0.54 [0; 6.7]	6.72e-01
	Asian	12 (2.9%)	4 (3.7%)	1.53 [0.43; 4.72]	4.85e-01	2 (2%)	0.76 [0.15; 2.69]	6.97e-01	4 (4%)	1.72 [0.49; 5.32]	3.76e-01	2 (1.8%)	0.64 [0.12; 2.24]	5.10e-01
	Black or African American	71 (16.9%)	18 (16.5%)	0.98 [0.54; 1.72]	9.47e-01	26 (26.3%)	2.2 [1.27; 3.77]	5.52e-03	19 (19%)	1.23 [0.68; 2.16]	4.89e-01	8 (7.1%)	0.31 [0.14; 0.63]	6.68e-04
	White	336 (79.8%)	87 (79.8%)	0.99 [0.58; 1.72]	9.70e-01	69 (69.7%)	0.47 [0.28; 0.8]	5.17e-03	77 (77%)	0.79 [0.47; 1.38]	4.06e-01	103 (91.2%)	3.19 [1.67; 6.67]	2.40e-04
Ethnicity, N (%)	Hispanic or Latino	5 (1.2%)	3 (2.8%)	3.97 [0.76; 24.13]		1 (1%)	1.07 [0.11; 5.85]		0 (0%)	0.3 [0; 2.68]		1 (0.9%)	0.9 [0.09; 4.92]
	Not Hispanic or Latino	400 (98.8%)	104 (97.2%)	0.25 [0.04; 1.31]	9.88e-02	95 (99%)	0.94 [0.17; 9.43]	9.46e-01	93 (100%)	3.34 [0.37; 440.59]	3.37e-01	108 (99.1%)	1.11 [0.2; 11.19]	9.10e-01
Clinical Stage, N (%)	Stage I	107 (18.8%)	21 (14.9%)	0.7 [0.41; 1.16]	1.73e-01	31 (22.1%)	1.33 [0.82; 2.1]	2.43e-01	27 (19.7%)	1.09 [0.66; 1.74]	7.37e-01	28 (18.7%)	0.99 [0.61; 1.58]	9.79e-01
	Stage II	228 (40.1%)	44 (31.2%)	0.6 [0.4; 0.9]	1.22e-02	48 (34.3%)	0.72 [0.48; 1.07]	1.04e-01	66 (48.2%)	1.54 [1.05; 2.27]	2.83e-02	70 (46.7%)	1.44 [0.99; 2.1]	5.80e-02
	Stage III	163 (28.7%)	46 (32.6%)	1.29 [0.85; 1.93]	2.31e-01	45 (32.1%)	1.25 [0.82; 1.88]	2.94e-01	32 (23.4%)	0.7 [0.45; 1.09]	1.15e-01	40 (26.7%)	0.88 [0.57; 1.32]	5.35e-01
	Stage IV	70 (12.3%)	30 (21.3%)	2.62 [1.56; 4.37]	3.52e-04	16 (11.4%)	0.91 [0.49; 1.61]	7.53e-01	12 (8.8%)	0.64 [0.32; 1.17]	1.52e-01	12 (8%)	0.56 [0.28; 1.02]	5.98e-02
Prior Treatment, N (%)	No	578 (88.7%)	141 (87.6%)	0.86 [0.51; 1.51]		141 (87.6%)	0.86 [0.51; 1.51]		143 (87.2%)	0.82 [0.49; 1.42]		153 (92.2%)	1.64 [0.91; 3.17]
	Yes	74 (11.3%)	20 (12.4%)	1.16 [0.66; 1.97]	5.88e-01	20 (12.4%)	1.16 [0.66; 1.97]	5.88e-01	21 (12.8%)	1.22 [0.7; 2.06]	4.72e-01	13 (7.8%)	0.61 [0.32; 1.1]	1.00e-01
Tissue Sample Type, N (%)	Primary Tumor	647 (92.7%)	171 (97.7%)	3.8 [1.58; 11.8]		152 (87.4%)	0.4 [0.23; 0.73]		156 (89.7%)	0.58 [0.32; 1.06]		168 (96%)	2.09 [1; 4.99]
	Solid Tissue Normal	51 (7.3%)	4 (2.3%)	0.26 [0.08; 0.63]	1.64e-03	22 (12.6%)	2.48 [1.38; 4.4]	2.75e-03	18 (10.3%)	1.73 [0.94; 3.11]	7.73e-02	7 (4%)	0.48 [0.2; 1]	5.04e-02
Cancer Type, N (%)	Colon Adenocarcinoma	522 (74.8%)	123 (70.3%)	0.73 [0.5; 1.08]		127 (73%)	0.88 [0.6; 1.3]		128 (73.6%)	0.91 [0.62; 1.36]		144 (82.3%)	1.76 [1.16; 2.75]
	Rectum Adenocarcinoma	176 (25.2%)	52 (29.7%)	1.36 [0.93; 1.99]	1.13e-01	47 (27%)	1.14 [0.77; 1.67]	5.16e-01	46 (26.4%)	1.09 [0.74; 1.61]	6.53e-01	31 (17.7%)	0.57 [0.36; 0.86]	7.48e-03

From these results we see that patients with tumors having very high PD-L1 gene expression tended to be older by 2.8 years on average compared to other gene expression levels (P=0.01) and tended to be from predominately White patients (91.2%, P=2E-4). These associations may be partially due to an enrichment of colon adenocarcinoma cases in very high PD-L1 expressing tumors (82.3% vs 70-74% in other groups, P=8E-3), which were associated with these patient demographics in the previous results.

To test and see if these associations with age and race are at least partly due to enrichment of colon adenocarcinoma cases, we can supply cancer type as a covariate to the stratified_stat_table function, adjusting the analyses for differences between cancer types.

Tip

For a sanity check, we can leave cancer types in the table, the results will just be moot since we are adjusting the analysis by this variable, but it will tell us that we are successfully adjusting for these effects.

See Table 3 for results.

# Define covariates
covariates = "name"

# Build and display table
results_adj = stratified_stat_table(
  test_data,
  columns,
  groups,
  rename_dict,
  covariates
)
kable_html_table(results_adj, bold_columns = c(1, 2))

Table 3: Patient and clinical characteristics stratified by PD-L1 (CD274) gene expression level and adjusting for cancer type

Variable	Categories	Total (N=698)	Low (N=175)	Coef [95%CI]	P	Moderate (N=174)	Coef [95%CI]	P	High (N=174)	Coef [95%CI]	P	Very High (N=175)	Coef [95%CI]	P
Age, Mean±SD	-	66.6±12.8	66.8±12.4	0.44 [-1.75; 2.63]	6.94e-01	64.3±13.3	-2.98 [-5.17; -0.8]	7.55e-03	66.6±12.7	0.02 [-2.17; 2.22]	9.82e-01	68.7±12.4	2.51 [0.32; 4.7]	2.45e-02
Gender, N (%)	Female	330 (47.5%)	85 (48.6%)	1.06 [0.75; 1.5]		75 (43.6%)	0.81 [0.58; 1.15]		81 (46.8%)	0.97 [0.68; 1.36]		89 (50.9%)	1.2 [0.85; 1.69]
	Male	365 (52.5%)	90 (51.4%)	0.94 [0.67; 1.33]	7.29e-01	97 (56.4%)	1.23 [0.87; 1.74]	2.44e-01	92 (53.2%)	1.03 [0.73; 1.46]	8.45e-01	86 (49.1%)	0.84 [0.59; 1.18]	3.09e-01
Race, N (%)	American Indian or Alaska Native	2 (0.5%)	0 (0%)	0.56 [0; 6.93]	6.91e-01	2 (2%)	16.95 [1.36; 2346.11]	2.67e-02	0 (0%)	0.65 [0; 8.07]	7.72e-01	0 (0%)	0.52 [0; 6.46]	6.53e-01
	Asian	12 (2.9%)	4 (3.7%)	1.51 [0.43; 4.68]	4.96e-01	2 (2%)	0.78 [0.15; 2.75]	7.21e-01	4 (4%)	1.76 [0.5; 5.45]	3.59e-01	2 (1.8%)	0.62 [0.12; 2.18]	4.81e-01
	Black or African American	71 (16.9%)	18 (16.5%)	0.97 [0.53; 1.71]	9.11e-01	26 (26.3%)	2.31 [1.32; 4]	3.72e-03	19 (19%)	1.26 [0.69; 2.24]	4.37e-01	8 (7.1%)	0.29 [0.13; 0.6]	3.92e-04
	White	336 (79.8%)	87 (79.8%)	1 [0.59; 1.76]	9.89e-01	69 (69.7%)	0.45 [0.26; 0.76]	3.23e-03	77 (77%)	0.77 [0.45; 1.35]	3.52e-01	103 (91.2%)	3.4 [1.77; 7.15]	1.21e-04
Ethnicity, N (%)	Hispanic or Latino	5 (1.2%)	3 (2.8%)	3.95 [0.76; 23.93]		1 (1%)	1.07 [0.11; 5.83]		0 (0%)	0.3 [0; 2.67]		1 (0.9%)	0.9 [0.09; 4.9]
	Not Hispanic or Latino	400 (98.8%)	104 (97.2%)	0.25 [0.04; 1.32]	9.95e-02	95 (99%)	0.94 [0.17; 9.42]	9.47e-01	93 (100%)	3.34 [0.37; 439.5]	3.37e-01	108 (99.1%)	1.12 [0.2; 11.22]	9.07e-01
Clinical Stage, N (%)	Stage I	107 (18.8%)	21 (14.9%)	0.69 [0.4; 1.14]	1.55e-01	31 (22.1%)	1.31 [0.81; 2.07]	2.68e-01	27 (19.7%)	1.09 [0.66; 1.75]	7.31e-01	28 (18.7%)	1.02 [0.63; 1.63]	9.34e-01
	Stage II	228 (40.1%)	44 (31.2%)	0.61 [0.41; 0.92]	1.65e-02	48 (34.3%)	0.74 [0.49; 1.1]	1.33e-01	66 (48.2%)	1.54 [1.05; 2.28]	2.88e-02	70 (46.7%)	1.38 [0.95; 2.02]	9.37e-02
	Stage III	163 (28.7%)	46 (32.6%)	1.27 [0.84; 1.9]	2.63e-01	45 (32.1%)	1.23 [0.81; 1.85]	3.33e-01	32 (23.4%)	0.7 [0.45; 1.09]	1.17e-01	40 (26.7%)	0.9 [0.59; 1.37]	6.36e-01
	Stage IV	70 (12.3%)	30 (21.3%)	2.62 [1.55; 4.37]	3.56e-04	16 (11.4%)	0.91 [0.49; 1.6]	7.50e-01	12 (8.8%)	0.64 [0.32; 1.17]	1.53e-01	12 (8%)	0.56 [0.28; 1.02]	5.98e-02
Prior Treatment, N (%)	No	578 (88.7%)	141 (87.6%)	0.85 [0.5; 1.5]		141 (87.6%)	0.86 [0.51; 1.5]		143 (87.2%)	0.82 [0.49; 1.42]		153 (92.2%)	1.67 [0.92; 3.22]
	Yes	74 (11.3%)	20 (12.4%)	1.17 [0.67; 1.98]	5.74e-01	20 (12.4%)	1.17 [0.67; 1.98]	5.80e-01	21 (12.8%)	1.22 [0.7; 2.06]	4.70e-01	13 (7.8%)	0.6 [0.31; 1.08]	9.17e-02
Tissue Sample Type, N (%)	Primary Tumor	647 (92.7%)	171 (97.7%)	3.74 [1.55; 11.61]		152 (87.4%)	0.4 [0.23; 0.72]		156 (89.7%)	0.57 [0.32; 1.06]		168 (96%)	2.17 [1.03; 5.2]
	Solid Tissue Normal	51 (7.3%)	4 (2.3%)	0.27 [0.09; 0.64]	1.94e-03	22 (12.6%)	2.5 [1.39; 4.44]	2.53e-03	18 (10.3%)	1.74 [0.94; 3.13]	7.47e-02	7 (4%)	0.46 [0.19; 0.97]	3.97e-02
Cancer Type, N (%)	Colon Adenocarcinoma	522 (74.8%)	123 (70.3%)	0.89 [0; 353.95]		127 (73%)	1.05 [0; 413.99]		128 (73.6%)	1.09 [0; 427.64]		144 (82.3%)	2.71 [0; 747.8]
	Rectum Adenocarcinoma	176 (25.2%)	52 (29.7%)	1.31 [0; 484.73]	9.72e-01	47 (27%)	1.34 [0; 473.22]	9.85e-01	46 (26.4%)	1.35 [0; 470]	9.85e-01	31 (17.7%)	1.64 [0; 424.59]	9.11e-01

We see that the age and race associations are still there for patients with very high PD-L1 expressing tumors, so we can conclude these associations are independent of the association seen between PD-L1 gene expression and cancer type (at least in the current dataset). As expected, the results for cancer types are basically null (odds ratio confidence intervals are huge and p-value is ~1), so we know the adjustment was working.

Using the export_to_xlsx function, we can export the results to a publication-ready table like we did before.

export_to_xlsx(
  results,
  out_path = "example_output/table2.xlsx",
  sheet = "Table 2",
  numeric_columns = "P"
)

Caution

While the kable_html_table and export_to_xlsx should be flexible to work with any dataframe, they were specifically designed to work with results from running the stratified_strat_table function, therefore, unexpected results may occur when using them not on this function’s output.
Additionally, I would only used kable_html_table on smaller tables as it has no pagination capability. If wanting to display larger tables in HTML, I would use the datatable function from the DT package, or something similar.
Theoretically, the stratified_stat_table function will work with any grouping variable regardless the number of groups contained within the variable, however, that doesn’t mean you SHOULD test a ton of groups, especially since the function does not do multiple testing correction. I would limit to 3 or 4 groups, any more than that you really should be doing multiple testing corrections and it will start to become difficult to interpret results. If you have a grouping variable with a lot of groups, see if you can collapse some or break it up somehow to make it more digestible.

Plotting

A number of functions included in this package can be used for quickly generating publication-ready data visualizations that are commonly performed in scientific research. These functions produce the following types of plots:

Traditional Plots

stratified_barplot: Plot for showing the frequency distribution of a categorical variable of interest stratified by a grouping variable.
stratified_violin_boxplot: Plot for visualizing the distributions of a quantitative variable of interest stratified by a grouping variable.

Specialty Plots

composition_barplot: Stacked barplot normalized to 1 (or 100%) for showing compositions of a categorical variable of interest within groups or individual observations (e.g., a breakdown of genetic ancestry compositions for a cohort of individuals).
longtail_plot: Non-normalized stacked barplots for showing compositions of a categorical variable of interest within groups while preserving (and sorting by) the total count/frequency of groups. Observations are sorted from most prevalent to least, creating a “long tail” of data, hence the name. Commonly used for visualizing genomic alteration frequencies grouped by genes and broken down by alteration type.
volcano_plot: Plot for visualizing the relationship between p-values (y-axis) and effect sizes (x-axis) obtained from statistically testing many features of a single data modality (e.g., testing for differentially expressed genes from RNA-seq experiments).

All functions have the option to save the image to file and modify the width and height of the exported file, so it’s publication-ready.

Note

The units for image width and height are in pixels (e.g., 1000 x 1500 pixels).

Tip

All functions use the ggplot2 package and packages that extend it’s functionality and return a ggplot2 figure object. This means that you have the ability to add more to the plots, or modify existing layers of the plot within the object if needed. These functions should get you 90% - 100% of the way, but every now and then you may need to tweak things.

Traditional Plots

The following functions produce plots that are useful for showing the distributions of categorical or quantitative variables stratified by a grouping variable. These plots are traditional distribution plots, used in any field involving statistical analyses.

Additional to the plotting functionality, you can optionally request pairwise statistical testing to be performed (with or without multiple testing correction) and significant associations annotated in the plots. Statistical testing is performed via Fisher’s exact test (fisher.test) or Chi-squared test (chisq.test) for categorical variables (i.e., those handled by stratified_barplot) and Welch’s t-test (t.test) or Wilcoxon rank-sum test (wilcox.test) for quantitative variables (i.e., those handled by stratified_violin_boxplot).

Stratified Bar Plots

To demonstrate the stratified_barplot function, lets visualize the distributions of self-reported race within each of the cancer types we’ve been looking at. Recall earlier that there were significantly more White patients in the rectal adenocarcinoma group compared to the colon adenocarcinoma group, which was more diverse.

See Figure 1 for the result.

g = stratified_barplot(
  test_data,
  groups = "name",
  column = "race",
  xlab = "Cancer Type",
  legendlab = "Patient Race",
  test = TRUE,
  alpha = 0.05,
  save = TRUE,
  figwidth = 2000,
  figheight = 1500,
  out_path = "example_output/figure1.jpg"
)

Figure 1: Distribution of self-reported race among patients with colon and rectal adenocarcinoma

Indeed we see the enrichment of self-reported White patients in the rectal adenocarcinoma group and that the overall racial distribution between the two groups is significantly different (P=0.003). From statistical testing previously, we know this is due to the enrichment of White patients in the rectal adenocarcinoma group.

Tip

The text size of plots have been left at the ggplot2 default since it’s difficult to try and calculate the most effective text size on the front end. If text is too small or too large, play around with the figwidth and figheight parameters to change the size of the exported image, which will influence how well the text fits.

Now lets visualize the relationship between cancer type and different gene expression levels of PD-L1 (another significant association we observed earlier).

See Figure 2 for the result.

g = stratified_barplot(
  test_data,
  groups = "pdl1_gex",
  column = "name",
  xlab = "PD-L1 Gene Expression Levels",
  legendlab = "Cancer Type",
  test = TRUE,
  alpha = 0.05,
  save = TRUE,
  figwidth = 2000,
  figheight = 1500,
  out_path = "example_output/figure2.jpg"
)

Figure 2: Distribution of colon and rectal adenocarcinoma stratified by PD-L1 gene expression level

Again we see tumors with very high PD-L1 gene expression had significantly different distributions of the two cancer types compared to other groups, mainly tumors with low or moderate PD-L1 gene expression.

Because both variables being visualized by the stratified_barplot function are categorical, we can switch out the grouping variable and target variable to pivot the plot and look at the data from the alternate end.

See Figure 3 for the results.

g = stratified_barplot(
  test_data,
  groups = "name",
  column = "pdl1_gex",
  xlab = "Cancer Type",
  legendlab = "PD-L1 Gene Expression Level",
  test = TRUE,
  alpha = 0.05,
  save = TRUE,
  figwidth = 2000,
  figheight = 1500,
  out_path = "example_output/figure3.jpg"
)

Figure 3: Distribution of PD-L1 gene expression level in colon and rectal adenocarcinoma

Caution

Be wary with how many columns are being plotted when using stratified_barplot. Similar to the stratified_stat_table function, there is nothing in the code (yet) that hinders you from specifying variables with any number of groups, but I again would caution not to specify a grouping variable with more than 3 or 4 groups. The longtail_plot or composition_barplot functions handle many groups much more efficiently. See section Specialty Plots for more on those.

Tip

If you do not specify a grouping variable to groups, then a simple barplot for all cases will be plotted and the categories in the variable supplied to column will become the x-axis groups.

Stratified Violin-Boxplots

To demonstrate the stratified_violin_boxplot function, lets visualize the distributions of patient age within each of the cancer types we’ve been looking at. Recall earlier that patients with colon adenocarcinoma tended to be a few years younger on average than patients with rectal adenocarcinoma.

See Figure 4 for the result.

g = stratified_violin_boxplot(
  test_data,
  groups = "name",
  column = "age_at_index",
  xlab = "Cancer Type",
  ylab = "Patient Age (Years)",
  test = "t.test",
  alpha = 0.05,
  save = TRUE,
  figwidth = 1500,
  figheight = 1500,
  out_path = "example_output/figure4.jpg"
)

Figure 4: Age distribution among patients with colon and rectal adenocarcinoma

As we did earlier, we see the significant, almost 3 year age difference between patients with colon and rectal adenocarcinoma.

Now lets visualize the distribution of patient age based on the PD-L1 gene expression levels we created. Recall very high PD-L1 gene expression was associated with older patient age on average.

See Figure 5 for the result.

g = stratified_violin_boxplot(
  test_data,
  groups = "pdl1_gex",
  column = "age_at_index",
  xlab = "PD-L1 Gene Expression Level",
  ylab = "Patient Age (Years)",
  test = "t.test",
  alpha = 0.05,
  save = TRUE,
  figwidth = 1500,
  figheight = 1500,
  out_path = "example_output/figure5.jpg"
)

Figure 5: Age distribution among patients with varying levels of tumor PD-L1 gene expression

From the distributions and pairwise testing, we see that the key difference lies between patients whose tumors have moderate PD-L1 gene expression compared to the very high expression group.

Specialty Plots

The following functions produce plots that are more niche in nature, typically used in certain situations commonly come across in biomedical research. Unlike the Traditional Plots, these plots won’t include any statistical testing and are just used for visualizing large amounts of data or results from larger analyses.

Composition Barplots

Composition barplots are stacked barplots that are normalized to 1 or 100% in order to show how groups or individual observations are composed of another categorical variable of interest. Common use cases for this type of plot include visualizing ancestry admixture in a cohort of individuals, or microbial compositions of biological samples taken from organisms or the environment, but it can be used for any situation where you want to show proportions of a categorical variable within individual observations or groups.

To showcase the composition_barplot function, we will visualize the breakdown of gene expression levels for a set of genes known as cancer testis antigen (CTA) genes in the two cancer types we’ve been working with. CTA genes are a class of genes primarily expressed in normal germ cells of the testis but can get reactivated in various cancers, which make them good potential targets for immunotherapy and cancer vaccines.

Similar to what we did with PD-L1 gene expression, we will first create a number of new variables indicating gene expression levels of different CTA genes, then, for each gene, we will plot the breakdown of gene expression levels in colon and rectal adenocarcinoma cases. This can give us an idea of what CTA genes are most highly expressed in these cohorts. We will target genes in the MAGE, GAGE, and SSX gene families.

Note

Could we plot the gene expression values for these genes as violin-boxplots? Absolutely, that would be another good option to get a feel for the data. Sometimes though, especially for highly skewed data, it’s nice to bin data into discrete categories because it’s difficult to see differences between groups with the distributions of numeric values. Because CTA genes are expressed only in normal germ cell tumors and only reactivated somatically by cancer cells, their expression is usually quite sparse and their distributions will be highly skewed toward 0. This would make most violin-boxplots look squished, with a dense bottom around 0 and very spindle-like towards higher values.

See Figure 6 for the results.

# Create needed grouping variables
ensgid = gex_data$gene_id[grepl("MAGE|GAGE|SSX", gex_data$gene_name)]
for (id in ensgid) {
  x = unlist(as.vector(gex_tpm[id, ]))
  quantiles = quantile(x[x > 0]) # Getting quantiles on non-zero data due to sparsity
  test_data[[gex_data[id, "gene_name"]]] = factor(
    dplyr::case_when(
      gsub("-", ".", rownames(test_data)) %in%
        colnames(gex_tpm)[gex_tpm[id, ] >= quantiles[4]] ~ "Very High",
      gsub("-", ".", rownames(test_data)) %in%
        colnames(gex_tpm)[gex_tpm[id, ] >= quantiles[3]] ~ "High",
      gsub("-", ".", rownames(test_data)) %in%
        colnames(gex_tpm)[gex_tpm[id, ] >= quantiles[2]] ~ "Moderate",
      gsub("-", ".", rownames(test_data)) %in%
        colnames(gex_tpm)[gex_tpm[id, ] >= quantiles[1]] ~ "Low",
      gsub("-", ".", rownames(test_data)) %in%
        colnames(gex_tpm)[gex_tpm[id, ] < quantiles[1]] ~ "Negative",
      .default = NA
    ),
    levels = c("Negative", "Low", "Moderate", "High", "Very High")
  )
}

# Create a data.frame in "long" format for plotting
long_data = tidyr::pivot_longer(
  test_data[, c(
    "name",
    grep("MAGE|GAGE|SSX", colnames(test_data), value = TRUE)
  )],
  cols = grep("MAGE|GAGE|SSX", colnames(test_data), value = TRUE),
  names_to = "genes",
  values_to = "level"
)

# Perform plotting
g = composition_barplot(
  long_data,
  groups = "genes",
  column = "level",
  subcols = "name",
  xlab = "CTA Genes",
  legendlab = "Gene Expression Level",
  color_list = RColorBrewer::brewer.pal(
    length(levels(long_data$level)),
    "Blues"
  ),
  save = TRUE,
  figwidth = 5000,
  figheight = 1000,
  out_path = "example_output/figure6.jpg"
)

Figure 6: Composition of gene expression levels for CTA genes in colon and rectal adenocarcinoma

As expected for CTA genes, there is a lot of low-level expression among these cases with only a handful of genes having fairly high expression levels.

The plot looks a little squished in horizontal orientation, so we can use the parameter flip_plot = TRUE to make it vertical. See Figure 7.

g = composition_barplot(
  long_data,
  groups = "genes",
  column = "level",
  subcols = "name",
  xlab = "CTA Genes",
  legendlab = "Gene Expression Level",
  color_list = RColorBrewer::brewer.pal(
    length(levels(long_data$level)),
    "Blues"
  ),
  flip_plot = TRUE,
  save = TRUE,
  figwidth = 2000,
  figheight = 2500,
  out_path = "example_output/figure7.jpg"
)

Figure 7: Composition of gene expression levels for CTA genes in colon and rectal adenocarcinoma (vertical orientation)

Definitely looks better in this orientation.

We can also add another grouping on the bars themselves to break them up into meaningful categories. Lets group the genes we’re showing by gene family (i.e., MAGE, GAGE, SSX).

# Add variable denoting gene family
long_data$gene_family = dplyr::case_when(
  grepl("MAGE", long_data$genes) ~ "MAGE",
  grepl("GAGE", long_data$genes) ~ "GAGE",
  grepl("SSX", long_data$genes) ~ "SSX",
  .default = NA
)

# Perform plotting
g = composition_barplot(
  long_data,
  groups = "genes",
  column = "level",
  subrows = "gene_family",
  subcols = "name",
  xlab = "CTA Genes",
  legendlab = "Gene Expression Level",
  color_list = RColorBrewer::brewer.pal(
    length(levels(long_data$level)),
    "Blues"
  ),
  flip_plot = TRUE,
  save = TRUE,
  figwidth = 2000,
  figheight = 2500,
  out_path = "example_output/figure8.jpg"
)

Figure 8: Composition of gene expression levels for CTA genes in colon and rectal adenocarcinoma (vertical orientation)

Tip

Depending on if you choose to flip the orientation of the plot with flip_plot, you may need to switch around the variables being specified to subrows and/or subcols to make the groupings work correctly. You can tell that the groupings are not working if you get blank columns in the panels.

Also, you’ll notice that if you make add_labels = TRUE (the default) then the plot function will assume that you are plotting few enough bars to see the labels in each bar nicely, so it will also make the plot more nicely formatted by adding a black border to the bars and segments and making the x-axis labels horizontal. If this is set to FALSE, then the assumption is you are plotting a lot of bars and want a traditional composition barplot look, so it does not add the extra borders (because that would look messy with a lot of bars) and it flips x-axis labels vertically (90 degrees rotation). If the x-axis labels are still too crowded because you are plotting a ton of bars (like where one bar is an individual in a cohort), then you can set remove_xaxis_text = TRUE to remove them entirely.

Longtail Plots

Longtail plots refer to non-normalized stacked barplots for showing compositions of a categorical variable of interest within groups while preserving (and usually sorting by) the total count/frequency of groups. Observations are sorted from most prevalent to least, creating a “long tail” of data, which is where the name comes from. A common use case for this type of plot is visualizing genomic alteration frequencies grouped by genes and broken down by alteration type. For sake of simplicity, we will use the same data as we did above for the composition barplot, except we will not plot the Negative values. This is the main difference between a “composition” and “longtail” barplot, not plotting the “negative”, null, or missing results, so that the bar doesn’t reach 100% all of the time. We will use the negative_value parameter in the longtail_barplot function to specify what is the negative value to not plot.

See Figure 9 for the results.

g = longtail_barplot(
  long_data,
  groups = "genes",
  column = "level",
  negative_value = "Negative",
  xlab = "CTA Genes",
  legendlab = "Gene Expression Level",
  color_list = RColorBrewer::brewer.pal(
    length(levels(long_data$level)) - 1,
    "Blues"
  ),
  save = TRUE,
  figwidth = 3500,
  figheight = 1700,
  out_path = "example_output/figure9.jpg"
)

Figure 9: Gene expression prevalence for CTA genes in colon and rectal adenocarcinoma

As we saw in the composition barplot, a handful of MAGE genes and one SSX are expressed in at least half of tumors, but most of the CTA genes were sparsely detected.

Volcano Plots

Volcano plots (given the name by their volcano-like appearance) are great for visualizing the relationship between p-values (plotted on the y-axis, typically after -log10 transformation) and the corresponding effect sizes (plotted on the x-axis) from statistically testing many features of a single data modality. The original use case for this type of plot was visualizing the results from differential gene expression analyses (derived from micro-array or RNA-seq), where the -log10 p-value and fold change (usually log2 transformed) are plotted together. However, this plot can be used for visualizing any results where many p-values and effect sizes are obtained (e.g., differential abundance analysis for microbiome data or differential protein expression from proteomics). For sake of examples though, we will go with the standard use case and utilize the RNA-seq data we’ve downloaded – performing a differential expression analysis between our cancer types and visualizing those results with the volcano_plot function.

Important

While the volcano_plot function can be used to visualize any number of p-values and effect sizes, it really is only appropriate to use when the data being tested was derived from the same modality of data, i.e., not mixing results from many different data types that are not related. For example, I would use this plot to visualize results from the same RNA-seq or proteomic experiment, but I would not use it to visualize p-values and effect sizes from testing a large number of different clinical variables, each of which can be related or not and have completely different scales and interpretations. There are better plots for those types of analyses if you want to show them side by side (e.g., forest plots).

First lets perform a quick differential expression analysis on the RNA-seq data between the two cancer types. We can use the stratified_stat_table function to do this and extract what we need for plotting from the results.

Note

Because we are analyzing a type of omics data with the stratified_stat_table function (i.e., high-throughput data that only contains numeric variables), I will set omics_mode = TRUE to modify the workflow as needed when working with omics data, including turning off the automatic formatting of variable names and performing multiple testing correction. If I didn’t set this, then all of the gene names would only have the first letter capitalized, which would make them look like mouse/rat gene names. Additionally, all of the gene names would have had “Mean±SD” added to them, which would have just added useless size to the data since all the data is numeric.

See Table 4 for results with a false discovery rate (FDR) corrected q-value < 0.05.

# Prepare the gene expression data for analysis by transposing data to be
# sample by gene, filtering genes by those that (1) have non-zero expression in
# at least 50% of cases and (2) are protein coding, transform the data using a
# simple centered log-ratio transformation, and add cancer type variable
gex_test_data = data.frame(t(gex_counts))
gex_test_data = gex_test_data[,
  colSums(gex_test_data > 0) > (0.5 * nrow(gex_test_data)) &
    colnames(gex_test_data) %in%
      gex_data$gene_id[gex_data$gene_type == "protein_coding"]
]
gex_test_data = data.frame(t(apply(gex_test_data, 1, function(x) {
  log(x + 1) - mean(log(x + 1))
})))
gex_test_data = data.frame(
  gex_test_data,
  name = test_data[gsub("\\.", "-", colnames(gex_counts)), "name"]
)

# Define grouping variable
groups = "name"

# Define columns of interest
columns = colnames(gex_test_data)[-ncol(gex_test_data)]

# Define named list to rename variables by
rename_dict = as.list(gex_data$gene_name)
names(rename_dict) = gex_data$gene_id

# Run analysis
results = stratified_stat_table(
  gex_test_data,
  columns,
  groups,
  rename_dict,
  omics_mode = TRUE
)

# Display significant results
sig_results = results[as.numeric(results$FDR) < 0.05, ]
sig_results = sig_results[order(as.numeric(sig_results$P)), ]
DT::datatable(sig_results, rownames = FALSE)

Table 4: Differential gene expression analysis results for colon vs rectal adenocarcinoma

Now lets plot the differential expression analysis results using the volcano_plot function.

See Figure 10 for the results.

g = volcano_plot(
  results,
  variable = "Variable",
  coef = "Coef",
  pvalue = "FDR",
  transform_pvalue = TRUE,
  alpha = 0.05,
  top_n = 5,
  first_group_name = "Colon Adenocarcinoma",
  second_group_name = "Rectum Adenocarcinoma",
  ylab = "-log10 FDR q-value",
  xlab = "Mean difference in CLR-transformed gene expression",
  save = TRUE,
  figwidth = 2000,
  figheight = 1500,
  out_path = "example_output/figure10.jpg"
)

Figure 10: Differential gene expression analysis results between colon and rectal adenocarcinoma

Just looking at our top 5 hits on either side of the plot, we can see that our results make sense for what we would expect in gene expression differences between colon and rectal adenocarcinoma cancer types:

Rectal-enriched (positive ΔCLR)

PRAC1 sits ~4 kb downstream of HOXB13 on 17q21 and is classically described as Prostate, Rectum And distal Colon expressed; both genes tend to be co‑regulated. Seeing them as the top rectal markers is expected.
HOXB13 is a posterior HOX gene with high expression in rectum; it is even used diagnostically as a rectal neuroendocrine tumor marker.
EVX2 and HOXD12 lie at/near the 5′ end of the HOXD cluster (2q31) and often travel with posterior HOXD regulation; seeing both enriched together is a nice internal consistency check.

Colon‑enriched (negative ΔCLR)

HOXC6/HOXC8 (and HOXD3/HOXD4) are more anterior than HOXB13/HOXD12 in the HOX “code”. Prior work repeatedly reports HOXC family up‑regulation in colon cancer (especially right colon), so finding them higher in “colon vs. rectum” is very plausible.

Important

If you have previously transformed p-values with a -log10 or other transformation and are setting transform_pvalue = FALSE, make sure that you specify alpha using the corresponding transformed value (e.g., alpha = -log10(0.05) if you already used a -log10 transform). If not, the function will not work correctly.

Utilities

Under Construction

Come back later!

R Session

R version 4.5.3 (2026-03-11)
Platform: aarch64-apple-darwin20
Running under: macOS Tahoe 26.4.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] DT_0.34.0                   RColorBrewer_1.1-3         
 [3] tidyr_1.3.2                 dplyr_1.2.1                
 [5] SummarizedExperiment_1.40.0 Biobase_2.70.0             
 [7] GenomicRanges_1.62.1        Seqinfo_1.0.0              
 [9] IRanges_2.44.0              S4Vectors_0.48.1           
[11] BiocGenerics_0.56.0         generics_0.1.4             
[13] MatrixGenerics_1.22.0       matrixStats_1.5.0          
[15] TCGAbiolinks_2.38.0         RCodeBox_0.2.0             

loaded via a namespace (and not attached):
  [1] rstudioapi_0.18.0           jsonlite_2.0.0             
  [3] shape_1.4.6.1               magrittr_2.0.5             
  [5] jomo_2.7-6                  farver_2.1.2               
  [7] logistf_1.26.1              nloptr_2.2.1               
  [9] rmarkdown_2.31              ragg_1.5.2                 
 [11] vctrs_0.7.3                 memoise_2.0.1              
 [13] minqa_1.2.8                 htmltools_0.5.9            
 [15] S4Arrays_1.10.1             progress_1.2.3             
 [17] curl_7.1.0                  broom_1.0.12               
 [19] SparseArray_1.10.10         mitml_0.4-5                
 [21] sass_0.4.10                 bslib_0.10.0               
 [23] htmlwidgets_1.6.4           plyr_1.8.9                 
 [25] httr2_1.2.2                 cachem_1.1.0               
 [27] lifecycle_1.0.5             iterators_1.0.14           
 [29] pkgconfig_2.0.3             Matrix_1.7-5               
 [31] R6_2.6.1                    fastmap_1.2.0              
 [33] rbibutils_2.4.1             digest_0.6.39              
 [35] AnnotationDbi_1.72.0        crosstalk_1.2.2            
 [37] textshaping_1.0.5           RSQLite_2.4.6              
 [39] filelock_1.0.3              labeling_0.4.3             
 [41] httr_1.4.8                  abind_1.4-8                
 [43] mgcv_1.9-4                  compiler_4.5.3             
 [45] bit64_4.8.0                 withr_3.0.2                
 [47] downloader_0.4.1            S7_0.2.2                   
 [49] backports_1.5.1             DBI_1.3.0                  
 [51] ggsignif_0.6.4              pan_1.9                    
 [53] biomaRt_2.66.1              MASS_7.3-65                
 [55] rappdirs_0.3.4              DelayedArray_0.36.1        
 [57] tools_4.5.3                 otel_0.2.0                 
 [59] zip_2.3.3                   nnet_7.3-20                
 [61] glue_1.8.1                  nlme_3.1-169               
 [63] grid_4.5.3                  operator.tools_1.6.3.1     
 [65] gtable_0.3.6                tzdb_0.5.0                 
 [67] formula.tools_1.7.1         data.table_1.18.2.1        
 [69] hms_1.1.4                   xml2_1.5.2                 
 [71] XVector_0.50.0              ggrepel_0.9.8              
 [73] foreach_1.5.2               pillar_1.11.1              
 [75] stringr_1.6.0               splines_4.5.3              
 [77] BiocFileCache_3.0.0         lattice_0.22-9             
 [79] survival_3.8-6              bit_4.6.0                  
 [81] tidyselect_1.2.1            Biostrings_2.78.0          
 [83] knitr_1.51                  reformulas_0.4.4           
 [85] svglite_2.2.2               xfun_0.57                  
 [87] stringi_1.8.7               yaml_2.3.12                
 [89] boot_1.3-32                 TCGAbiolinksGUI.data_1.30.0
 [91] kableExtra_1.4.0            evaluate_1.0.5             
 [93] codetools_0.2-20            tibble_3.3.1               
 [95] cli_3.6.6                   rpart_4.1.27               
 [97] systemfonts_1.3.2           Rdpack_2.6.6               
 [99] jquerylib_0.1.4             Rcpp_1.1.1-1.1             
[101] dbplyr_2.5.2                png_0.1-9                  
[103] XML_3.99-0.23               ggplot2_4.0.3              
[105] readr_2.2.0                 blob_1.3.0                 
[107] prettyunits_1.2.0           lme4_2.0-1                 
[109] glmnet_4.1-10               viridisLite_0.4.3          
[111] scales_1.4.0                openxlsx_4.2.8.1           
[113] purrr_1.2.2                 crayon_1.5.3               
[115] rlang_1.2.0                 KEGGREST_1.50.0            
[117] rvest_1.0.5                 mice_3.19.0