Impute missing values
impute() imputes missing values in a proteomics dataset.
Usage
impute(
se,
fun = c("QRILC", "bpca", "knn", "MLE", "MinDet", "MinProb", "man", "min", "zero",
"mixed", "nbavg", "RF", "GSimp"),
...
)
Arguments
- se
SummarizedExperiment, proteomics data (output from make_se() or make_se_parse()). It is advised to first remove proteins with too many missing values using filter_se() and to normalize the data using normalize_vsn().
- fun
"QRILC", "bpca", "knn", "MLE", "MinDet", "MinProb", "man", "min", "zero", "mixed", "nbavg", "RF" or "GSimp", the function used for data imputation, based on manual_impute ("man"), impute,MSnSet-method and missForest ("RF").
- ...
Additional arguments passed to the imputation functions, as described in manual_impute, missForest and impute_matrix.
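To illustrate what the shift and scale arguments of the "man" method control, here is a minimal base-R sketch of left-shifted Gaussian imputation. It assumes the common approach where, for each sample, missing values are drawn from a normal distribution centered shift standard deviations below the observed mean, with a standard deviation equal to scale times the observed one. The function impute_left_shift() is hypothetical and not part of this package; see manual_impute for the actual implementation.

```r
# Hypothetical helper, NOT part of the package: a sketch of left-shifted
# Gaussian imputation in the spirit of the "man" method (assumed behavior).
impute_left_shift <- function(mat, shift = 1.8, scale = 0.3, seed = NULL) {
  stopifnot(is.matrix(mat), is.numeric(mat))
  # Optional seed for reproducible draws (this changes the global RNG state).
  if (!is.null(seed)) set.seed(seed)
  for (j in seq_len(ncol(mat))) {
    missing <- is.na(mat[, j])
    observed <- mat[!missing, j]
    # Columns with nothing to impute, or fewer than 2 observed values
    # (sd undefined), are left untouched.
    if (!any(missing) || length(observed) < 2) next
    mu <- mean(observed)
    sigma <- sd(observed)
    # Draw from a narrower distribution shifted toward low intensities,
    # mimicking values missing because they fell below detection.
    mat[missing, j] <- rnorm(sum(missing),
                             mean = mu - shift * sigma,
                             sd = scale * sigma)
  }
  mat
}

# Toy log2-intensity matrix: rows = proteins, columns = samples.
set.seed(1)
toy <- matrix(rnorm(40, mean = 25, sd = 2), nrow = 10,
              dimnames = list(paste0("prot", 1:10), paste0("S", 1:4)))
toy[sample(length(toy), 8)] <- NA
imputed <- impute_left_shift(toy, shift = 1.8, scale = 0.3, seed = 42)
anyNA(imputed)  # FALSE: every column keeps at least 2 observed values
```

Observed values are never modified; only the NA cells are filled, each column from its own shifted distribution.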
Examples
# Load example data
data(Silicosis_pg)
data <- Silicosis_pg
data_unique <- make_unique(data, "Gene.names", "Protein.IDs", delim = ";")
# Make SummarizedExperiment
ecols <- grep("LFQ.", colnames(data_unique))
se <- make_se_parse(data_unique, ecols, mode = "delim", sep = "_")
# Filter and normalize
filt <- filter_se(se, thr = 0, fraction = 0.4,
                  filter_formula = ~ Reverse != "+" & Potential.contaminant != "+")
#> filter base on missing number is <= 0 in at least one condition.
#> filter base on missing number fraction < 0.4 in each row
#> filter base on giving formula
norm <- normalize_vsn(filt)
#> vsn2: 8762 x 20 matrix (1 stratum).
#> Please use 'meanSdPlot' to verify the fit.
# Impute missing values using different functions
imputed_MinProb <- impute(norm, fun = "MinProb", q = 0.05)
#> Imputing along margin 2 (samples/columns).
#> [1] 0.3026531
imputed_manual <- impute(norm, fun = "man", shift = 1.8, scale = 0.3)
imputed_QRILC <- impute(norm, fun = "QRILC")
#> Imputing along margin 2 (samples/columns).
imputed_knn <- impute(norm, fun = "knn", k = 10, rowmax = 0.9)
#> Imputing along margin 1 (features/rows).
#> Cluster size 8762 broken into 7012 1750
#> Cluster size 7012 broken into 2751 4261
#> Cluster size 2751 broken into 1861 890
#> Cluster size 1861 broken into 825 1036
#> Done cluster 825
#> Done cluster 1036
#> Done cluster 1861
#> Done cluster 890
#> Done cluster 2751
#> Cluster size 4261 broken into 1832 2429
#> Cluster size 1832 broken into 757 1075
#> Done cluster 757
#> Done cluster 1075
#> Done cluster 1832
#> Cluster size 2429 broken into 1236 1193
#> Done cluster 1236
#> Done cluster 1193
#> Done cluster 2429
#> Done cluster 4261
#> Done cluster 7012
#> Cluster size 1750 broken into 1139 611
#> Done cluster 1139
#> Done cluster 611
#> Done cluster 1750
if (FALSE) {
imputed_MLE <- impute(norm, fun = "MLE")
imputed_RF <- impute(norm, fun = "RF") # may take several minutes.
}
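The q argument passed with fun = "MinProb" in the examples sets a low quantile used as the center of the imputed values. A minimal base-R sketch of that idea follows. It assumes a MinProb-style approach in which each sample's missing values are drawn from a normal distribution centered at that sample's q-quantile, with one shared standard deviation taken as the median of per-protein standard deviations. The function impute_min_prob() and its tune_sigma parameter are hypothetical; impute() delegates to its own backends, so its results will differ.

```r
# Hypothetical helper, NOT part of the package: a sketch of "MinProb"-style
# imputation for left-censored data (assumed behavior).
impute_min_prob <- function(mat, q = 0.01, tune_sigma = 1, seed = NULL) {
  stopifnot(is.matrix(mat), is.numeric(mat), q > 0, q < 1)
  # Optional seed for reproducible draws (this changes the global RNG state).
  if (!is.null(seed)) set.seed(seed)
  # Per-sample low quantile: the center of that sample's imputed values.
  col_low <- apply(mat, 2, quantile, probs = q, na.rm = TRUE)
  # Shared spread: median SD over proteins with at least 2 observed values.
  n_obs <- rowSums(!is.na(mat))
  row_sd <- apply(mat[n_obs >= 2, , drop = FALSE], 1, sd, na.rm = TRUE)
  sigma <- median(row_sd) * tune_sigma
  for (j in seq_len(ncol(mat))) {
    missing <- is.na(mat[, j])
    # Fill only the NA cells of sample j; observed values are kept.
    mat[missing, j] <- rnorm(sum(missing), mean = col_low[j], sd = sigma)
  }
  mat
}

# Toy matrix: 15 proteins x 4 samples with 10 missing values.
set.seed(7)
toy <- matrix(rnorm(60, mean = 22, sd = 1.5), nrow = 15)
toy[sample(length(toy), 10)] <- NA
res <- impute_min_prob(toy, q = 0.05, seed = 123)
```

Lowering q moves the imputed values further into the low-intensity tail, which suits values assumed to be missing because they fell below the detection limit.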