impute imputes missing values in a proteomics dataset.

Usage

impute(
  se,
  fun = c("QRILC", "bpca", "knn", "MLE", "MinDet", "MinProb", "man", "min", "zero",
    "mixed", "nbavg", "RF", "GSimp"),
  ...
)

Arguments

se

SummarizedExperiment, Proteomics data (output from make_se() or make_se_parse()). It is advised to first remove proteins with too many missing values using filter_se() and to normalize the data using normalize_vsn().

fun

One of "bpca", "knn", "QRILC", "MLE", "MinDet", "MinProb", "man", "min", "zero", "mixed", "nbavg", "GSimp" or "RF". The function used for data imputation, based on manual_impute ("man"), impute,MSnSet-method and missForest ("RF").

...

Additional arguments for the imputation functions, as described in manual_impute, missForest and impute_matrix.

Value

An imputed SummarizedExperiment object.

Examples

# Load example
data(Silicosis_pg)
data <- Silicosis_pg
data_unique <- make_unique(data, "Gene.names", "Protein.IDs", delim = ";")

# Make SummarizedExperiment
ecols <- grep("LFQ.", colnames(data_unique))
se <- make_se_parse(data_unique, ecols, mode = "delim", sep = "_")

# Filter and normalize
filt <- filter_se(se, thr = 0, fraction = 0.4, filter_formula = ~ Reverse != "+" & Potential.contaminant != "+")
#> filter base on missing number is <= 0 in at least one condition.
#> filter base on missing number fraction < 0.4 in each row
#> filter base on giving formula 
norm <- normalize_vsn(filt)
#> vsn2: 8762 x 20 matrix (1 stratum). 
#> Please use 'meanSdPlot' to verify the fit.

# Impute missing values using different functions
imputed_MinProb <- impute(norm, fun = "MinProb", q = 0.05)
#> Imputing along margin 2 (samples/columns).
#> [1] 0.3026531
imputed_manual <- impute(norm, fun = "man", shift = 1.8, scale = 0.3)

imputed_QRILC <- impute(norm, fun = "QRILC")
#> Imputing along margin 2 (samples/columns).
imputed_knn <- impute(norm, fun = "knn", k = 10, rowmax = 0.9)
#> Imputing along margin 1 (features/rows).
#> Cluster size 8762 broken into 7012 1750 
#> Cluster size 7012 broken into 2751 4261 
#> Cluster size 2751 broken into 1861 890 
#> Cluster size 1861 broken into 825 1036 
#> Done cluster 825 
#> Done cluster 1036 
#> Done cluster 1861 
#> Done cluster 890 
#> Done cluster 2751 
#> Cluster size 4261 broken into 1832 2429 
#> Cluster size 1832 broken into 757 1075 
#> Done cluster 757 
#> Done cluster 1075 
#> Done cluster 1832 
#> Cluster size 2429 broken into 1236 1193 
#> Done cluster 1236 
#> Done cluster 1193 
#> Done cluster 2429 
#> Done cluster 4261 
#> Done cluster 7012 
#> Cluster size 1750 broken into 1139 611 
#> Done cluster 1139 
#> Done cluster 611 
#> Done cluster 1750 

if (FALSE) {
imputed_MLE <- impute(norm, fun = "MLE")

imputed_RF <- impute(norm, fun = "RF") # may take several minutes.
}
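
The "man" example above passes shift = 1.8 and scale = 0.3. To give an intuition for what such parameters usually mean, the following is a minimal base-R sketch of left-shifted Gaussian imputation on a plain numeric matrix. It is not the package's implementation: the helper name impute_left_shift and the toy data are hypothetical, and the sketch assumes that missing values in each sample are drawn from a normal distribution centred shift standard deviations below that sample's observed median, with a width of scale times the observed standard deviation. See manual_impute for the actual behaviour.

```r
# Hypothetical helper for illustration only; not part of the package.
# Left-shifted Gaussian imputation on a numeric matrix
# (rows = proteins, columns = samples).
impute_left_shift <- function(mat, shift = 1.8, scale = 0.3) {
  stopifnot(is.matrix(mat), is.numeric(mat))
  for (j in seq_len(ncol(mat))) {
    missing <- is.na(mat[, j])
    if (!any(missing)) next
    observed <- mat[!missing, j]
    # Assumption: draws are centred 'shift' SDs below the observed median,
    # with a width of 'scale' times the observed SD of that sample.
    mat[missing, j] <- rnorm(
      sum(missing),
      mean = median(observed) - shift * sd(observed),
      sd = scale * sd(observed)
    )
  }
  mat
}

# Toy data (made up): 20 proteins x 3 samples with 12 values set to NA.
set.seed(1)
toy <- matrix(rnorm(60, mean = 25, sd = 2), nrow = 20,
              dimnames = list(paste0("prot", 1:20), paste0("sample", 1:3)))
toy_na <- toy
toy_na[sample(length(toy_na), 12)] <- NA

toy_imputed <- impute_left_shift(toy_na, shift = 1.8, scale = 0.3)

# Check that no missing values remain after imputation.
anyNA(toy_imputed)
```

A larger shift pushes the imputed values further below the observed intensities, and a smaller scale makes them more tightly clustered. Observed values are left untouched.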