Replacement of Missing Values and Below-Detection Zeros in Compositional Data

Performs imputation (replacement) of missing values and/or values below the detection limit (BDL) in compositional datasets using the EM algorithm, assuming normality on the simplex. This function is designed to prepare compositional data for subsequent log-ratio transformations.
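Log-ratio transformations take logarithms of the parts, so a single zero or NA in a row spoils every coordinate of that row. A minimal illustration, using a hand-written centred log-ratio (clr) transform; `clr_row()` is a hypothetical helper, not part of this package:

```r
# Hypothetical helper: centred log-ratio of one composition,
# clr(x)_j = log(x_j) - mean(log(x))
clr_row <- function(x) {
  lx <- log(x)
  lx - mean(lx)
}

clr_row(c(0.2, 0.3, 0.5))  # finite coordinates that sum to zero
clr_row(c(0.0, 0.3, 0.7))  # log(0) = -Inf, so every coordinate is NaN or Inf
clr_row(c(NA, 0.3, 0.7))   # an NA propagates to every coordinate
```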

coda_replacement(
  X,
  DL = NULL,
  dl_prop = 0.65,
  eps = 1e-04,
  parameters = FALSE,
  debug = FALSE
)

Arguments

X: A compositional dataset: numeric matrix or data frame where rows represent observations and columns represent parts.
DL: An optional matrix or vector of detection limits. If NULL, the minimum non-zero value in each column of X is used.
dl_prop: A numeric value between 0 and 1 giving the fraction of the detection limit used to initialize zeros in the EM algorithm (default is 0.65).
eps: A small positive value controlling the convergence criterion for the EM algorithm (default is 1e-4).
parameters: Logical. If TRUE, returns additional output including estimated multivariate normal parameters (default is FALSE).
debug: Logical. If TRUE, prints the log-likelihood at every iteration (default is FALSE).
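A minimal sketch of how the documented defaults for DL and dl_prop could be computed, assuming DL is a vector with one limit per column (a matrix of detection limits, which DL also accepts, is not handled here). The helpers `default_dl()` and `init_zeros()` are hypothetical and not part of the package; the sketch also ignores the geometric-mean constraint described under Details, so it only approximates the starting values the function actually uses.

```r
# Hypothetical helper: DL = NULL default, i.e. the smallest strictly
# positive observed value in each column (NAs are ignored)
default_dl <- function(X) {
  apply(as.matrix(X), 2, function(x) {
    pos <- x[!is.na(x) & x > 0]
    if (length(pos) == 0) NA_real_ else min(pos)
  })
}

# Hypothetical helper: replace each observed zero by dl_prop * DL of its column
init_zeros <- function(X, DL = default_dl(X), dl_prop = 0.65) {
  X <- as.matrix(X)
  zero <- !is.na(X) & X == 0          # observed zeros; NAs are left as they are
  X[zero] <- dl_prop * DL[col(X)[zero]]  # col(X) gives each cell's column index
  X
}

X <- rbind(c(0.2, 0.0, 0.8),
           c(0.5, 0.1, NA),
           c(0.3, 0.4, 0.3))
default_dl(X)  # 0.2 0.1 0.3
init_zeros(X)  # the zero in column 2 becomes 0.65 * 0.1 = 0.065
```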

Value

If parameters = FALSE, returns a numeric matrix with imputed values. If parameters = TRUE, returns a list with two components:

X_imp: The imputed compositional data matrix.
info: A list containing information about the EM algorithm parameters and convergence diagnostics.

Details

- Missing values are imputed based on a multivariate normal model on the simplex.
- Zeros are treated as left-censored values (below the detection limit) and replaced accordingly.
- The EM algorithm iteratively estimates the missing and censored parts and the model parameters until the convergence criterion set by eps is met.
- To initialize the EM algorithm, zero values (considered below the detection limit) are replaced with a small positive value: each zero is replaced by dl_prop times the detection limit of its part (column). This restriction is imposed on the geometric mean of the parts with zeros relative to the non-missing positive parts, which helps preserve the compositional structure on the simplex.
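The two kinds of expectation an EM algorithm needs here can be illustrated under a normal model for log-ratio coordinates. This is a generic textbook sketch, not this package's implementation: `cond_mean_missing()` returns the conditional mean of missing coordinates given the observed ones, and `censored_mean()` returns the expected value of a normal variable known to lie below a limit (a value below the detection limit). Both names are hypothetical. A full EM iteration would also need second moments and would then re-estimate mu and Sigma; that part is omitted.

```r
# Conditional mean of the missing coordinates of a multivariate normal vector:
#   E[x_m | x_o] = mu_m + S_mo %*% solve(S_oo, x_o - mu_o)
cond_mean_missing <- function(x, mu, Sigma) {
  m <- is.na(x)
  o <- !m
  if (!any(m)) return(x)
  if (!any(o)) { x[m] <- mu[m]; return(x) }  # nothing observed: use the mean
  x[m] <- mu[m] + Sigma[m, o, drop = FALSE] %*%
    solve(Sigma[o, o, drop = FALSE], x[o] - mu[o])
  x
}

# Mean of N(mu, sigma^2) restricted to values below `limit` (left-censoring):
#   E[Y | Y < limit] = mu - sigma * dnorm(a) / pnorm(a),  a = (limit - mu) / sigma
censored_mean <- function(limit, mu, sigma) {
  a <- (limit - mu) / sigma
  mu - sigma * dnorm(a) / pnorm(a)
}

mu <- c(0, 0)
Sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
cond_mean_missing(c(NA, 2), mu, Sigma)  # 0 + 0.5 * (2 - 0) / 1 = 1, so c(1, 2)
censored_mean(0, mu = 0, sigma = 1)     # -dnorm(0) / 0.5, about -0.798
```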

Examples

# Simulate compositional data with zeros
set.seed(123)
X <- abs(matrix(rnorm(100), ncol = 5))
X[sample(length(X), 10)] <- 0  # Introduce some zeros
X[sample(length(X), 10)] <- NA  # Introduce some NAs
# Closed data before replacement (zeros and NAs still present)
summary(X/rowSums(X, na.rm=TRUE))
#>        V1                V2               V3                V4         
#>  Min.   :0.00000   Min.   :0.0000   Min.   :0.00000   Min.   :0.00000  
#>  1st Qu.:0.08166   1st Qu.:0.1276   1st Qu.:0.06565   1st Qu.:0.06958  
#>  Median :0.17094   Median :0.2154   Median :0.14397   Median :0.22717  
#>  Mean   :0.21250   Mean   :0.2317   Mean   :0.19922   Mean   :0.23875  
#>  3rd Qu.:0.29119   3rd Qu.:0.3202   3rd Qu.:0.29159   3rd Qu.:0.35619  
#>  Max.   :0.85974   Max.   :0.5172   Max.   :0.54402   Max.   :0.65742  
#>  NA's   :1         NA's   :2        NA's   :5                          
#>        V5         
#>  Min.   :0.00000  
#>  1st Qu.:0.06516  
#>  Median :0.17825  
#>  Mean   :0.22380  
#>  3rd Qu.:0.38013  
#>  Max.   :0.58262  
#>  NA's   :2        
summary(coda_replacement(X))
#>        V1                V2                V3                V4         
#>  Min.   :0.01229   Min.   :0.01068   Min.   :0.01825   Min.   :0.01008  
#>  1st Qu.:0.08175   1st Qu.:0.13631   1st Qu.:0.08862   1st Qu.:0.06958  
#>  Median :0.16266   Median :0.19768   Median :0.15228   Median :0.19568  
#>  Mean   :0.19994   Mean   :0.21418   Mean   :0.18270   Mean   :0.21449  
#>  3rd Qu.:0.25681   3rd Qu.:0.30352   3rd Qu.:0.26184   3rd Qu.:0.31558  
#>  Max.   :0.84115   Max.   :0.49140   Max.   :0.44513   Max.   :0.56703  
#>        V5           
#>  Min.   :0.0006653  
#>  1st Qu.:0.0488225  
#>  Median :0.1497661  
#>  Mean   :0.1886843  
#>  3rd Qu.:0.3208736  
#>  Max.   :0.5678503
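The summaries above show that, after replacement, every minimum is strictly positive and no NA counts remain. A minimal check of that property before applying a log-ratio transformation; `is_logratio_ready()` is a hypothetical helper, not part of the package. Applied to the output of `coda_replacement(X)`, it should return TRUE when every zero and NA has been replaced.

```r
# Hypothetical helper: TRUE when a matrix can be log-ratio transformed,
# i.e. it has no missing values and every entry is strictly positive
is_logratio_ready <- function(X) {
  X <- as.matrix(X)
  !anyNA(X) && all(X > 0)
}

set.seed(123)
X <- abs(matrix(rnorm(100), ncol = 5))
X[sample(length(X), 10)] <- 0
is_logratio_ready(X)  # FALSE: the data still contain zeros
```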