R/zero_replacement_imputation.R
coda_replacement.Rd
Imputes (replaces) missing values and/or values below the detection limit (BDL) in compositional datasets using an EM algorithm that assumes normality on the simplex. The function is designed to prepare compositional data for subsequent log-ratio transformations.
coda_replacement(
  X,
  DL = NULL,
  dl_prop = 0.65,
  eps = 1e-04,
  parameters = FALSE,
  debug = FALSE
)
X: A compositional dataset, given as a numeric matrix or data frame in which rows represent observations and columns represent parts.
DL: An optional matrix or vector of detection limits. If NULL, the minimum non-zero value in each column of X is used.
dl_prop: A numeric value between 0 and 1, used for initialization in the EM algorithm (default is 0.65).
eps: A small positive value controlling the convergence criterion for the EM algorithm (default is 1e-4).
parameters: Logical. If TRUE, returns additional output including estimated multivariate normal parameters (default is FALSE).
debug: Logical. If TRUE, shows the log-likelihood at every iteration (default is FALSE).
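The default for DL described above (the minimum non-zero value in each column of X) can be reproduced in base R. The following is a minimal sketch on toy data, not the package's internal code; the name default_dl is hypothetical, and the handling of NAs here is an assumption.

```r
# Toy data for illustration only: zeros mark values below the detection limit,
# NA marks a missing value.
X <- rbind(c(0.5, 0.0, 2.0),
           c(1.5, 0.3, NA),
           c(0.0, 0.9, 4.0))

# Smallest strictly positive value in each column, ignoring zeros and NAs
# (assumed to mirror the documented default for DL = NULL).
default_dl <- apply(X, 2, function(x) min(x[x > 0], na.rm = TRUE))
default_dl
```

Passing an explicit DL is preferable when the real detection limits of the measuring instrument are known, since the smallest observed value only gives an upper bound on them.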
If parameters = FALSE, returns a numeric matrix with imputed values.
If parameters = TRUE, returns a list with two components:
- The imputed compositional data matrix.
- A list containing information about the EM algorithm parameters and convergence diagnostics.
- Missing values are imputed based on a multivariate normal model on the simplex.
- Zeros are treated as censored values and replaced accordingly.
- The EM algorithm iteratively estimates the missing parts and model parameters.
- To initialize the EM algorithm, zero values (treated as below the detection limit) are replaced with a small positive value: each zero is replaced by dl_prop times the detection limit of its part (column). This restriction is imposed on the geometric mean of the parts with zeros relative to the non-missing positive parts, which helps preserve the compositional structure in the simplex.
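The initialization step described in the last item can be sketched in base R as follows. This is only an illustration of "each zero is replaced by dl_prop times the detection limit of its column"; it assumes the default detection limits, leaves NAs untouched, and does not reproduce the geometric-mean restriction or the EM iterations. The names X_init and zero_idx are hypothetical.

```r
# Toy data for illustration only (zeros = below detection limit, NA = missing).
X <- rbind(c(0.5, 0.0, 2.0),
           c(1.5, 0.3, NA),
           c(0.0, 0.9, 4.0))
dl_prop <- 0.65

# Assumed default detection limits: smallest positive value per column.
DL <- apply(X, 2, function(x) min(x[x > 0], na.rm = TRUE))

# Locate the zeros; which() drops NA comparisons, so missing values are skipped.
zero_idx <- which(X == 0, arr.ind = TRUE)

# Replace each zero with dl_prop times the detection limit of its column.
X_init <- X
X_init[zero_idx] <- dl_prop * DL[zero_idx[, "col"]]
X_init
```

Because dl_prop lies between 0 and 1, every starting value stays below the detection limit of its part, which is consistent with treating zeros as censored observations.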
# Simulate compositional data with zeros
set.seed(123)
X <- abs(matrix(rnorm(100), ncol = 5))
X[sample(length(X), 10)] <- 0 # Introduce some zeros
X[sample(length(X), 10)] <- NA # Introduce some NAs
# Summarize the closed data before replacement (zeros and NAs still present)
summary(X/rowSums(X, na.rm=TRUE))
#> V1 V2 V3 V4
#> Min. :0.00000 Min. :0.0000 Min. :0.00000 Min. :0.00000
#> 1st Qu.:0.08166 1st Qu.:0.1276 1st Qu.:0.06565 1st Qu.:0.06958
#> Median :0.17094 Median :0.2154 Median :0.14397 Median :0.22717
#> Mean :0.21250 Mean :0.2317 Mean :0.19922 Mean :0.23875
#> 3rd Qu.:0.29119 3rd Qu.:0.3202 3rd Qu.:0.29159 3rd Qu.:0.35619
#> Max. :0.85974 Max. :0.5172 Max. :0.54402 Max. :0.65742
#> NA's :1 NA's :2 NA's :5
#> V5
#> Min. :0.00000
#> 1st Qu.:0.06516
#> Median :0.17825
#> Mean :0.22380
#> 3rd Qu.:0.38013
#> Max. :0.58262
#> NA's :2
# Summarize the data after replacement
summary(coda_replacement(X))
#> V1 V2 V3 V4
#> Min. :0.01229 Min. :0.01068 Min. :0.01825 Min. :0.01008
#> 1st Qu.:0.08175 1st Qu.:0.13631 1st Qu.:0.08862 1st Qu.:0.06958
#> Median :0.16266 Median :0.19768 Median :0.15228 Median :0.19568
#> Mean :0.19994 Mean :0.21418 Mean :0.18270 Mean :0.21449
#> 3rd Qu.:0.25681 3rd Qu.:0.30352 3rd Qu.:0.26184 3rd Qu.:0.31558
#> Max. :0.84115 Max. :0.49140 Max. :0.44513 Max. :0.56703
#> V5
#> Min. :0.0006653
#> 1st Qu.:0.0488225
#> Median :0.1497661
#> Mean :0.1886843
#> 3rd Qu.:0.3208736
#> Max. :0.5678503
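To illustrate why replacement is needed before a log-ratio transformation, the sketch below applies a centered log-ratio (clr) transform to a toy composition with and without a zero. The clr helper is a hypothetical base R function written for this example, not part of the documented API, and the small positive value in P_pos is hand-picked rather than produced by coda_replacement.

```r
# Hypothetical helper: centered log-ratio transform, applied row by row.
clr <- function(P) {
  L <- log(P)
  L - rowMeans(L)  # subtract each row's mean log from that row
}

# Toy compositions (rows sum to 1); the second row of P_zero contains a zero.
P_zero <- rbind(c(0.20, 0.30, 0.50),
                c(0.00, 0.40, 0.60))
# Same data with the zero replaced by a small positive value (hand-picked).
P_pos <- rbind(c(0.20, 0.30, 0.50),
               c(0.01, 0.39, 0.60))

clr_zero <- clr(P_zero)  # row with the zero is non-finite, since log(0) = -Inf
clr_pos <- clr(P_pos)    # all entries finite

is.finite(clr_zero)
is.finite(clr_pos)
```

A single zero makes the whole row unusable after the transform, which is why strictly positive data are required at this stage.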