Finding Common Origins of Milky Way Stars

Author

Andersen Chang, Tiffany M. Tang, Tarek M. Zikry, Genevera I. Allen

Published

June 4, 2025

Exploratory Data Analysis

In this section, we conduct a brief exploratory data analysis, visualizing the overall distributions of the chemical abundance data, the galactic coordinates of the stars, and pairwise relationships between the chemical abundance features. For simplicity, all plots shown here are based on the mean-imputed training data.

# get X (using mean-imputed) and metadata for EDA
metadata <- metadata$train
feature_modes <- list("small" = 7, "medium" = 11, "big" = 19)
train_data_ls <- purrr::map(
  feature_modes,
  ~ get_abundance_data(data_mean_imputed$train, feature_mode = .x)
)
features_ls <- purrr::map(train_data_ls, ~ colnames(.x))
X <- train_data_ls$big
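
The imputation step itself is not shown in this section. As a rough illustration of what mean imputation does, the sketch below fills each missing value with the mean of its column. The helper `mean_impute` and the toy data frame are hypothetical and are not the pipeline that produced `data_mean_imputed`.

```r
# Hypothetical helper (not from the original analysis): replace the NAs in
# each numeric column with that column's observed mean.
mean_impute <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.numeric(col)) {
      col[is.na(col)] <- mean(col, na.rm = TRUE)
    }
    col
  })
  df
}

# Toy abundance table with one missing value per column
toy <- data.frame(
  FE_H = c(-1.2, NA, -0.8),
  MG_FE = c(0.3, 0.1, NA)
)
toy_imputed <- mean_impute(toy)
```

Because every imputed star receives the same value for a given feature, mean imputation can produce a visible spike at the feature mean in the histograms, which is worth keeping in mind when reading the distribution plots.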

Feature Distributions

# plot overall distribution
ggwrappers::plot_histogram(X) +
  ggplot2::facet_wrap(~ variable, scales = "free_x") +
  ggplot2::labs(x = "Data")

Distribution of abundance values per feature in (mean-imputed) training data.

# plot boxplots per GC
plt_df <- dplyr::bind_cols(
  X,
  metadata |> dplyr::select(GC_NAME)
) |> 
  dplyr::group_by(GC_NAME) |> 
  dplyr::mutate(
    GC_NAME = sprintf("%s (n = %d)", GC_NAME, dplyr::n())
  ) |> 
  dplyr::ungroup()
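plt_vars <- sort(colnames(X))
# build one boxplot panel per abundance feature; within each panel the GCs are
# ordered along the x-axis by their mean value of that feature
# (reorder() uses FUN = mean by default)
plt_ls <- list()
for (plt_var in plt_vars) {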
  plt_ls[[plt_var]] <- plt_df |> 
    ggplot2::ggplot() +
    ggplot2::aes(
      x = reorder(GC_NAME, !!rlang::sym(plt_var)), 
      y = !!rlang::sym(plt_var),
      fill = GC_NAME
    ) +
    ggplot2::geom_boxplot() +
    ggplot2::labs(x = "GC Name") +
    vthemes::theme_vmodern() +
    ggplot2::theme(
      axis.text.x = ggplot2::element_text(
        angle = 90, hjust = 1, vjust = 0.5
      ),
      legend.position = "none"
    )
}
plt <- patchwork::wrap_plots(plt_ls, ncol = 2) +
  patchwork::plot_layout(axis_titles = "collect")
subchunkify(
  plt, fig_height = 30, fig_width = 10, 
  caption = "'Distribution of abundance values per feature and GC in (mean-imputed) training data.'"
)

Distribution of abundance values per feature and GC in (mean-imputed) training data.
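
The boxplots order the GCs with `reorder()`, which sorts factor levels by the group mean unless told otherwise. Since boxplots display medians, the two orderings can disagree when a cluster has outliers. The toy example below (hypothetical data, base R only) sketches the difference; passing `FUN = median` is one option if a median-based ordering is preferred.

```r
# Toy data: cluster "A" has one high outlier that pulls its mean above "B"
toy <- data.frame(
  GC_NAME = factor(c("A", "A", "A", "B", "B", "B")),
  FE_H = c(-2.0, -1.9, 1.0, -1.2, -1.1, -1.0)
)

# Default: levels sorted by group mean
by_mean <- levels(reorder(toy$GC_NAME, toy$FE_H))

# Alternative: levels sorted by group median
by_median <- levels(reorder(toy$GC_NAME, toy$FE_H, FUN = median))
```

Here the mean of "A" is higher than that of "B" while its median is lower, so the two calls return the levels in opposite order.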


Galactic Coordinate Plots

For each coloring variable listed below, two scatter plots of the star locations (given by galactic coordinates) are shown: one unjittered and one jittered.

Coloring variables: GC_NAME, AL_FE, C_FE, CA_FE, CI_FE, CO_FE, CR_FE, FE_H, K_FE, MG_FE, MN_FE, N_FE, NA_FE, NI_FE, O_FE, S_FE, SI_FE, TI_FE, TIII_FE, V_FE.
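
The code for these scatter plots is not included in this section. The following is a minimal sketch of how an unjittered and a jittered version could be drawn with ggplot2, assuming the galactic longitude and latitude are stored in columns named `GLON` and `GLAT`. Those column names, the toy data, and the jitter amounts are assumptions for illustration, not values taken from the original analysis.

```r
library(ggplot2)

set.seed(1)
# Toy data: two hypothetical clusters whose member stars nearly coincide on
# the sky, so the unjittered points overlap heavily.
stars <- data.frame(
  GLON = rep(c(10, 50), each = 50) + rnorm(100, sd = 0.05),
  GLAT = rep(c(-20, 30), each = 50) + rnorm(100, sd = 0.05),
  GC_NAME = rep(c("GC A", "GC B"), each = 50)
)

base_plt <- ggplot(stars, aes(x = GLON, y = GLAT, color = GC_NAME)) +
  labs(x = "Galactic longitude", y = "Galactic latitude")

# Unjittered: points drawn at their recorded coordinates
plt_unjittered <- base_plt + geom_point(alpha = 0.6)

# Jittered: random noise of up to +/- 1 unit added in each direction so that
# overlapping stars become individually visible
plt_jittered <- base_plt + geom_jitter(width = 1, height = 1, alpha = 0.6)
```

Mapping `color` to a continuous abundance column such as `AL_FE` instead of `GC_NAME` would give the abundance-colored variants with a continuous color scale. Jittering trades positional accuracy for visibility, which is why both versions are useful side by side.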

Pair Plots