Data import
Zhenhuan Feng
02_Data_import.Rmd
Abstract
This vignette introduces how DEP2 receives results from upstream quantification tools such as MaxQuant, DIA-NN, Spectronaut, and FragPipe.
Introduction
The DEP2 package uses the make_unique function to format protein- and PTM/peptide-level tables, and then imports them into S4 objects via the make_se or make_pe functions. The parameters of these functions can be adjusted to accommodate the different result tables generated by search and quantification software. Here, we present the workflow and describe in detail the inputs from four different quantification programs.
Example data
We obtained a benchmark proteomics dataset by spiking yeast and E. coli lysates into a HeLa background. The raw data were acquired on an Orbitrap Fusion Lumos Tribrid instrument in either DDA or DIA mode. The DDA data were searched and quantified using MaxQuant or FragPipe, while the DIA data were analyzed using DIA-NN or Spectronaut in library-free mode.
The example tables used in this paper can be found in this repository.
Constructing S4 object
The DEP2 analysis workflow is built upon S4 objects, so the first step is to format the table and convert it into an S4 container.
Converting protein-level data into SE format
For protein-level data, the gene name or protein ID should be used as the feature identifier. DEP2 provides the make_unique and make_unique_ptm functions to format the feature identifiers in proteomics results. The make_unique function cleans the identifiers, keeps the first name/ID in each group, and makes repeated identifiers unique by adding a suffix (‘ProXXX’, ‘ProXXX.1’, ‘ProXXX.2’). Repeated identifiers are common when the library search includes isoforms or unreviewed sequences. The name and ids parameters specify the columns that hold the gene/protein names and the IDs, respectively. The unique identifiers are saved in the new columns “name” and “ID”. The make_se function can then convert the uniqued table into an SE object.
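The identifier clean-up described above can be sketched in base R. The helper sketch_make_unique below is hypothetical and is not DEP2's implementation: it assumes identifiers within a group are separated by ‘;’, keeps the first entry of each group, falls back to the first ID when the name is empty (as rows 2 to 4 of the MaxQuant output below suggest), and relies on base R's make.unique() for the ‘.1’, ‘.2’ suffixes. The toy table and its IDs are made up.

```r
# Hypothetical helper illustrating the make_unique idea (not DEP2's code)
sketch_make_unique <- function(df, name, ids, delim = ";") {
  # Keep the first entry of each delimiter-separated group; "" if the group is empty
  first_token <- function(x) {
    vapply(strsplit(as.character(x), delim, fixed = TRUE),
           function(t) if (length(t) == 0) "" else t[1],
           character(1))
  }
  nm <- first_token(df[[name]])
  id <- first_token(df[[ids]])
  # Fall back to the ID when no name is available
  missing_name <- is.na(nm) | nm == ""
  nm[missing_name] <- id[missing_name]
  # make.unique() appends ".1", ".2", ... to repeated names
  df$name <- make.unique(nm)
  df$ID <- id
  df
}

# Toy input (made-up values)
pg_toy <- data.frame(
  Gene.names  = c("NUDT4", "", "UBA6", "NUDT4"),
  Protein.IDs = c("Q9NZJ9;A0A024RBG1", "P0DPI2;A0A0B4J2D5", "A0AVT1", "Q00000")
)
res <- sketch_make_unique(pg_toy, name = "Gene.names", ids = "Protein.IDs")
res[, c("name", "ID")]
```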
Conversion from MaxQuant proteinGroups
In this pipeline, DEP2 uses the MaxQuant table proteinGroups.txt as input.
# Load DEP2 and the proteinGroups table
library(DEP2)
mq_pgfile <- "./A_spikeIn/MaxQuant_output/proteinGroups.txt.gz"
pg <- read.csv(mq_pgfile, sep = "\t")
The names and IDs are in the columns “Gene.names” and “Protein.IDs”, respectively.
## Generate unique names and ids
unique_pg <- make_unique(pg,
name = "Gene.names", #gene
ids = "Protein.IDs" #protein
)
## New columns "name" and "ID", which are required by make_se
head(unique_pg[,c("name","ID","Gene.names","Protein.IDs")])
#> name ID
#> 1 NUDT4 Q9NZJ9
#> 2 P0DPI2 P0DPI2
#> 3 A0A0U1RRE5 A0A0U1RRE5
#> 4 A0A0U1RRL7 A0A0U1RRL7
#> 5 JW5951 A0A385XJE6
#> 6 UBA6 A0AVT1
#> Gene.names
#> 1 NUDT4
#> 2
#> 3
#> 4
#> 5 JW5951;JW5936;JW5935;JW5928;JW5914;JW5895;JW1882;insH11;insH10;insH9;insH8;insH7;insH6;insH4;insH3;insH2;insH1;insH5
#> 6 UBA6
#> Protein.IDs
#> 1 Q9NZJ9;A0A024RBG1;Q96G61;Q8NFP7
#> 2 P0DPI2;A0A0B4J2D5
#> 3 A0A0U1RRE5
#> 4 A0A0U1RRL7
#> 5 A0A385XJE6;P0CE65;P0CE64;P0CE63;P0CE62;P0CE61;P0CE60;P0CE59;P0CE58;P0CE57;P0CE56;P0CE55;P0CE54;P0CE53;P0CE52;P0CE51;P0CE50;P0CE49;P76071
#> 6 A0AVT1
In this pipeline, DEP2 uses an SE (SummarizedExperiment) object as the container that stores the expression assay, the feature information (row data), and the experiment design (column data). After make_unique has generated unique identifiers, DEP2 provides the make_se and make_se_parse functions to convert the table into a SummarizedExperiment object. make_se requires an experiment design table that annotates the samples and must contain three columns: ‘label’, ‘condition’, and ‘replicate’. make_se_parse instead derives the experiment design automatically by parsing the column names of the abundance assay.
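For reference, a design table with the three required columns can also be built by hand instead of being read from a file. The sketch below uses only data.frame() and assumes, as in the expdesign.txt example that follows, that labels take the form condition_replicate; whether labels must exactly match the assay column names after prefix removal should be checked against the make_se documentation.

```r
# Hand-built experiment design with the three required columns
conditions <- c("A1", "B1")
n_rep <- 3
expdesign_manual <- data.frame(
  label     = paste(rep(conditions, each = n_rep),
                    rep(seq_len(n_rep), times = length(conditions)), sep = "_"),
  condition = rep(conditions, each = n_rep),
  replicate = rep(seq_len(n_rep), times = length(conditions))
)
expdesign_manual
```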
## Expression columns in the input data
ecols <- grep("LFQ.intensity.", colnames(unique_pg))
## Experiment design table
expDesign_file <- "./A_spikeIn/MaxQuant_output/expdesign.txt.gz"
expdesign <- read.table(expDesign_file, sep = "\t", header = TRUE)
expdesign
#> label condition replicate
#> 1 A1_1 A1 1
#> 2 A1_2 A1 2
#> 3 A1_3 A1 3
#> 4 B1_1 B1 1
#> 5 B1_2 B1 2
#> 6 B1_3 B1 3
## Convert SE with expdesign table
se <- make_se(unique_pg, columns = ecols, expdesign = expdesign)
## Alternatively, let make_se_parse derive the experiment design from the column names
se <- make_se_parse(unique_pg, columns = ecols, mode = "delim")
The experiment design can be extracted from column names either by splitting at a delimiter (mode = "delim") or by character position (mode = "char"):
## Split sample names at a delimiter
sample_names = c("Quantity.A_1","Quantity.A_2","Quantity.B_1",
"Quantity.B_3","Quantity.B_2","Quantity.A_3")
get_exdesign_parse(sample_names,mode = "delim", sep = "_", remove_prefix = TRUE)
#> label ID condition replicate
#> 1 A_1 A_1 A 1
#> 2 A_2 A_2 A 2
#> 3 B_1 B_1 B 1
#> 4 B_3 B_3 B 3
#> 5 B_2 B_2 B 2
#> 6 A_3 A_3 A 3
sample_names = c("Quantity.A1","Quantity.A2","Quantity.B1",
"Quantity.B3","Quantity.B2","Quantity.A3")
get_exdesign_parse(sample_names,mode = "char", chars = 1, remove_prefix = TRUE)
#> label ID condition replicate
#> 1 A1 A1 A 1
#> 2 A2 A2 A 2
#> 3 B1 B1 B 1
#> 4 B3 B3 B 3
#> 5 B2 B2 B 2
#> 6 A3 A3 A 3
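The two parsing modes can be approximated in base R as follows. sketch_parse_design is a hypothetical helper, not get_exdesign_parse itself. It assumes that remove_prefix strips the longest prefix shared by all names, that mode "delim" treats the text after the last separator as the replicate, and that mode "char" treats the last chars characters as the replicate, which is consistent with the two outputs above. One caveat of this simple prefix rule: if every sample shared the same condition, the shared prefix would swallow it.

```r
# Longest prefix shared by all strings in x
longest_common_prefix <- function(x) {
  p <- x[1]
  while (nchar(p) > 0 && !all(startsWith(x, p))) {
    p <- substr(p, 1, nchar(p) - 1)
  }
  p
}

# Hypothetical approximation of the "delim" and "char" parsing modes
sketch_parse_design <- function(sample_names, mode = c("delim", "char"),
                                sep = "_", chars = 1) {
  mode <- match.arg(mode)
  # Stand-in for remove_prefix = TRUE: drop the prefix shared by all names
  label <- substring(sample_names, nchar(longest_common_prefix(sample_names)) + 1)
  if (mode == "delim") {
    parts <- strsplit(label, sep, fixed = TRUE)
    replicate <- vapply(parts, function(p) p[length(p)], character(1))
    condition <- vapply(parts, function(p) paste(p[-length(p)], collapse = sep),
                        character(1))
  } else {
    n <- nchar(label)
    condition <- substr(label, 1, n - chars)
    replicate <- substr(label, n - chars + 1, n)
  }
  data.frame(label = label, condition = condition, replicate = replicate,
             stringsAsFactors = FALSE)
}

sketch_parse_design(c("Quantity.A_1", "Quantity.B_2"), mode = "delim")
sketch_parse_design(c("Quantity.A1", "Quantity.B2"), mode = "char", chars = 1)
```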
Conversion from FragPipe combined_protein
Next, we use the FragPipe table combined_protein.tsv as input.
# Load combined_protein.tsv table
FragP_pgfile <- "./A_spikeIn/FragPipe_output/combined_protein.tsv.gz"
FragP_pg <- read.csv(FragP_pgfile,sep = "\t")
Generate unique identifiers. The names and IDs are in the columns “Gene” and “Protein.ID”.
## Generate unique names and ids
unique_FragP_pg <- make_unique(FragP_pg,
name = "Gene", #gene
ids = "Protein.ID" #protein
)
## New columns "name" and "ID", which are required by make_se
head(unique_FragP_pg[,c("name", "ID", "Gene", "Protein.ID")])
#> name ID Gene Protein.ID
#> 1 ADH1 P00330 ADH1 P00330
#> 2 P00761 P00761 P00761
#> 3 CSN1S1 P02662 CSN1S1 P02662
#> 4 CSN1S2 P02663 CSN1S2 P02663
#> 5 CSN2 P02666 CSN2 P02666
#> 6 CSN3 P02668 CSN3 P02668
Convert to SE. Here, we use the “MaxLFQ Intensity” values. The “Intensity” values can also be used, but this choice may affect the test results.
## Expression col in the combined_protein.tsv
ecols <- grep(".MaxLFQ.Intensity", colnames(unique_FragP_pg), value = TRUE)
ecols
#> [1] "A_1.MaxLFQ.Intensity" "A_2.MaxLFQ.Intensity" "A_3.MaxLFQ.Intensity"
#> [4] "B_1.MaxLFQ.Intensity" "B_3.MaxLFQ.Intensity"
## Convert to SE, parsing the experiment design from column names
se <- make_se_parse(unique_FragP_pg,
columns = ecols,
mode = "delim",
remove_suffix = TRUE ## remove the col suffix ".MaxLFQ.Intensity"
)
## The suffix is removed in SE
colData(se)
#> DataFrame with 5 rows and 4 columns
#> label ID condition replicate
#> <character> <character> <character> <character>
#> A_1 A_1 A_1 A 1
#> A_2 A_2 A_2 A 2
#> A_3 A_3 A_3 A 3
#> B_1 B_1 B_1 B 1
#> B_3 B_3 B_3 B 3
colnames(se)
#> [1] "A_1" "A_2" "A_3" "B_1" "B_3"
Conversion from DIA-NN result
As in the examples above, make_se and make_se_parse accept wide-format tables, in which each row represents a feature (a protein or a protein group) and the columns contain the identification information and the per-sample quantities. However, long-format tables are also widely used, for example the report.tsv output from DIA-NN or MSstats-style reports. The reshape_long2wide function turns long-format tables into wide tables. Next, we show the difference between handling wide and long tables.
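The difference between the two layouts can be illustrated with base R's stats::reshape() on a toy table with made-up values. This is only a sketch of the general long-to-wide idea, not the internals of reshape_long2wide.

```r
# Toy long table: one row per (sample, feature) pair; values are made up
long_tab <- data.frame(
  sample  = c("A_1", "A_1", "A_2", "B_1"),
  feature = c("P1", "P2", "P1", "P2"),
  quant   = c(10, 20, 11, 25)
)
# One row per feature and one "quant.<sample>" column per sample;
# sample/feature combinations absent from the long table become NA
wide_tab <- reshape(long_tab, idvar = "feature", timevar = "sample",
                    direction = "wide")
wide_tab
```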
Wide table output
The report.pg_matrix.tsv from DIA-NN is a wide-format table similar to proteinGroups.
# Load the report.pg_matrix.tsv table
Diann_pgfile <- "./A_spikeIn/Diann_output/report.pg_matrix.tsv.gz"
Diann_pg <- read.csv(Diann_pgfile,sep = "\t", fileEncoding="latin1")
Generate unique identifiers. The names and IDs are in the columns “Genes” and “Protein.Group”.
## Generate unique names and ids
unique_diann_pg <- make_unique(Diann_pg,
name = "Genes", #gene
ids = "Protein.Group" #protein
)
## New columns "name" and "ID", which are required by make_se
head(unique_diann_pg[,c("name", "ID", "Genes", "Protein.Group")])
#> name ID Genes Protein.Group
#> 1 YER079C-A A0A023PZB8 YER079C-A A0A023PZB8
#> 2 PPIAL4C A0A0B4J2A2 PPIAL4C;PPIAL4G A0A0B4J2A2;P0DN37
#> 3 GATD3 A0A0B4J2D5 GATD3;GATD3B A0A0B4J2D5;P0DPI2
#> 4 PIGBOS1 A0A0B4J2F0 PIGBOS1 A0A0B4J2F0
#> 5 SIK1 A0A0B4J2F2 SIK1;SIK1B A0A0B4J2F2;P57059
#> 6 CENPVL1 A0A0U1RR11 CENPVL1;CENPVL2 A0A0U1RR11;P0DPI3
Convert to SE. The expression columns are named directly after the sample file names. We recommend renaming the columns, or renaming the MS files before the DIA-NN search.
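One way to shorten such column names in R is a regular expression that keeps only the trailing condition_replicate token. The file names below are simplified, hypothetical versions of the DIA-NN columns, and the pattern assumes every name ends in <condition>_<replicate>.raw; names that do not match are returned unchanged by sub().

```r
# Hypothetical, simplified DIA-NN column names
raw_cols <- c("F..DIA_5.20211207_Hela_FASP_DIA_A6_1.raw",
              "F..DIA_5.20211207_Hela_FASP_DIA_B6_2.raw")
# Keep only the trailing "<condition>_<replicate>" token before ".raw"
short_names <- sub("^.*_([^_]+_[0-9]+)\\.raw$", "\\1", raw_cols)
short_names
# The new names could then be applied to a table, e.g.
# colnames(df)[match(raw_cols, colnames(df))] <- short_names
```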
## Expression col in the DIA-NN report.pg_matrix
ecols <- grep(".raw$", colnames(unique_diann_pg), value = TRUE)
ecols
#> [1] "F..DIA_5.20211207_LM_SA_ZXF_LY_FZH_Hela_AH109_DH5.Á_FASP_DIA_A6_1.raw"
#> [2] "F..DIA_5.20211207_LM_SA_ZXF_LY_FZH_Hela_AH109_DH5.Á_FASP_DIA_A6_2.raw"
#> [3] "F..DIA_5.20211207_LM_SA_ZXF_LY_FZH_Hela_AH109_DH5.Á_FASP_DIA_A6_3.raw"
#> [4] "F..DIA_5.20211207_LM_SA_ZXF_LY_FZH_Hela_AH109_DH5.Á_FASP_DIA_B6_1.raw"
#> [5] "F..DIA_5.20211207_LM_SA_ZXF_LY_FZH_Hela_AH109_DH5.Á_FASP_DIA_B6_2.raw"
#> [6] "F..DIA_5.20211207_LM_SA_ZXF_LY_FZH_Hela_AH109_DH5.Á_FASP_DIA_B6_3.raw"
## Convert to SE, parsing the experiment design from column names
se <- make_se_parse(unique_diann_pg,
columns = ecols,
mode = "delim",
remove_prefix = TRUE, ## remove the file prefix
remove_suffix = TRUE ## remove the col suffix ".raw"
)
## The file prefix and suffix are removed in SE
colData(se)
#> DataFrame with 6 rows and 4 columns
#> label ID condition replicate
#> <character> <character> <character> <character>
#> A6_1 A6_1 A6_1 A6 1
#> A6_2 A6_2 A6_2 A6 2
#> A6_3 A6_3 A6_3 A6 3
#> B6_1 B6_1 B6_1 B6 1
#> B6_2 B6_2 B6_2 B6 2
#> B6_3 B6_3 B6_3 B6 3
colnames(se)
#> [1] "A6_1" "A6_2" "A6_3" "B6_1" "B6_2" "B6_3"
Long table output
Unlike the cases above, the report.tsv from DIA-NN is in long format. The reshape_long2wide function reshapes long tables into wide format before the SE object is constructed.
## Load the DIA-NN report.tsv table
Diann_repfile <- "./A_spikeIn/Diann_output/report.tsv.gz"
Diann_rep <- read.csv(Diann_repfile, sep = "\t")
dim(Diann_rep) # 390479 rows, each row is a precursor.
#> [1] 390479 64
## Keep only protein groups passing the threshold (PG.Q.Value < 0.01)
Diann_rep = dplyr::filter(Diann_rep, PG.Q.Value < 0.01)
## The report.tsv table stores both precursor- and PG-level information
DT::datatable(dplyr::mutate_if(head(Diann_rep), is.character, utf8::utf8_encode), options = list(scrollX = TRUE, pageLength = 6))
Reshape the long table.
Diann_rep_wided = reshape_long2wide(Diann_rep,
sample_col = "File.Name", # the column labeling sample names
feature_col = "Protein.Group", # PG IDs
expression_col = "PG.MaxLFQ", # PG quantity; the normalized value can also be used
shrink_ident_cols = "Genes", # Gene names
extend_ident_cols = "Protein.Q.Value", # optional, extra identification info per sample
remove_sample_prefix = FALSE, # keep the prefix in sample_col
remove_sample_suffix = FALSE # keep the suffix in sample_col
)
In the reshaped result, ‘Genes’ is kept as a single column, while ‘Protein.Q.Value’ is extended into six columns, one per sample. Variables in shrink_ident_cols are collapsed into one string per feature by concatenating their values with ‘;’. Variables in extend_ident_cols are expanded into multiple columns, each holding the values for one sample (e.g., the protein q-values).
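The shrink and extend behaviours can be mimicked in base R on a toy table with made-up values. This is a sketch under assumptions: it joins the unique values of the shrunk column (the exact joining rule in reshape_long2wide may differ) and uses stats::reshape() for the extended column.

```r
# Toy long table with one identification column of each kind (values are made up)
long_id <- data.frame(
  sample  = c("A_1", "A_2", "A_1", "A_2"),
  feature = c("PG1", "PG1", "PG2", "PG2"),
  gene    = c("G1", "G1", "G2", "G2b"),
  qvalue  = c(0.001, 0.002, 0.004, 0.003)
)
# "shrink": one string per feature, unique values joined with ";"
shrunk <- tapply(long_id$gene, long_id$feature,
                 function(v) paste(unique(v), collapse = ";"))
# "extend": one q-value column per sample
extended <- reshape(long_id[, c("feature", "sample", "qvalue")],
                    idvar = "feature", timevar = "sample", direction = "wide")
shrunk
extended
```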
Generate unique identifiers. The names and IDs are in the columns “Genes” and “Protein.Group”.
## Generate unique names and ids
unique_diann_pg2 <- make_unique(Diann_rep_wided,
name = "Genes", #gene
ids = "Protein.Group" #protein
)
## New columns "name" and "ID", which are required by make_se
head(unique_diann_pg2[,c("name", "ID", "Genes", "Protein.Group")],4)
#> name ID Genes
#> 1 GATD3 A0A0B4J2D5 GATD3;GATD3B
#> 2 PIGBOS1 A0A0B4J2F0 PIGBOS1
#> 3 SIK1 A0A0B4J2F2 SIK1;SIK1B
#> 4 insA1 A0A385XJ53 insA1;insA2;insA3;insA4;insA5;insA6;insA8;insA9
#> Protein.Group
#> 1 A0A0B4J2D5;P0DPI2
#> 2 A0A0B4J2F0
#> 3 A0A0B4J2F2;P57059
#> 4 A0A385XJ53;P0CF07;P0CF08;P0CF09;P0CF10;P0CF11;P0CF12;P0CF13
Convert to SE. The expression columns are again named directly after the sample file names.
## Expression columns in the reshaped DIA-NN report
ecols <- grep(".raw$", colnames(unique_diann_pg2), value = TRUE)
ecols # also matches the Protein.Q.Value columns, which are not quantities
#> [1] "F..DIA_5.20211207_LM_SA_ZXF_LY_FZH_Hela_AH109_DH5.Á_FASP_DIA_A6_1.raw"
#> [2] "F..DIA_5.20211207_LM_SA_ZXF_LY_FZH_Hela_AH109_DH5.Á_FASP_DIA_A6_2.raw"
#> [3] "F..DIA_5.20211207_LM_SA_ZXF_LY_FZH_Hela_AH109_DH5.Á_FASP_DIA_A6_3.raw"
#> [4] "F..DIA_5.20211207_LM_SA_ZXF_LY_FZH_Hela_AH109_DH5.Á_FASP_DIA_B6_1.raw"
#> [5] "F..DIA_5.20211207_LM_SA_ZXF_LY_FZH_Hela_AH109_DH5.Á_FASP_DIA_B6_2.raw"
#> [6] "F..DIA_5.20211207_LM_SA_ZXF_LY_FZH_Hela_AH109_DH5.Á_FASP_DIA_B6_3.raw"
#> [7] "Protein.Q.Value.F..DIA_5.20211207_LM_SA_ZXF_LY_FZH_Hela_AH109_DH5.Á_FASP_DIA_A6_1.raw"
#> [8] "Protein.Q.Value.F..DIA_5.20211207_LM_SA_ZXF_LY_FZH_Hela_AH109_DH5.Á_FASP_DIA_A6_2.raw"
#> [9] "Protein.Q.Value.F..DIA_5.20211207_LM_SA_ZXF_LY_FZH_Hela_AH109_DH5.Á_FASP_DIA_A6_3.raw"
#> [10] "Protein.Q.Value.F..DIA_5.20211207_LM_SA_ZXF_LY_FZH_Hela_AH109_DH5.Á_FASP_DIA_B6_1.raw"
#> [11] "Protein.Q.Value.F..DIA_5.20211207_LM_SA_ZXF_LY_FZH_Hela_AH109_DH5.Á_FASP_DIA_B6_2.raw"
#> [12] "Protein.Q.Value.F..DIA_5.20211207_LM_SA_ZXF_LY_FZH_Hela_AH109_DH5.Á_FASP_DIA_B6_3.raw"
ecols <- grep("^Protein\\.Q\\.Value", ecols, value = TRUE, invert = TRUE) # keep only the six quantity columns
## Convert to SE, parsing the experiment design from column names
se <- make_se_parse(unique_diann_pg2,
columns = ecols,
mode = "delim",
remove_prefix = TRUE, ## remove the file prefix
remove_suffix = TRUE ## remove the col suffix ".raw"
)
## The file prefix and suffix are removed in SE
colData(se)
#> DataFrame with 6 rows and 4 columns
#> label ID condition replicate
#> <character> <character> <character> <character>
#> A6_1 A6_1 A6_1 A6 1
#> A6_2 A6_2 A6_2 A6 2
#> A6_3 A6_3 A6_3 A6 3
#> B6_1 B6_1 B6_1 B6 1
#> B6_2 B6_2 B6_2 B6 2
#> B6_3 B6_3 B6_3 B6 3
colnames(se)
#> [1] "A6_1" "A6_2" "A6_3" "B6_1" "B6_2" "B6_3"
Conversion from Spectronaut Report
Spectronaut allows the output table to be customized. In this example, we use the built-in MSstats-style report, which is in long format.
Spe_repfile <- "./A_spikeIn/Spectronaut_output/DIA_MSStats_Report.xls.gz"
Spe_rep <- read.csv(Spe_repfile,sep = "\t")
dim(Spe_rep) # 449106 rows; each row is a precursor.
#> [1] 449106 12
## The report stores precursor-, peptide- and PG-level information
DT::datatable(dplyr::mutate_if(head(Spe_rep), is.character, utf8::utf8_encode), options = list(scrollX = TRUE, pageLength = 6))
Reshape the long table.
Spe_rep_wided = reshape_long2wide(Spe_rep,
sample_col = "R.FileName", # the column labeling sample names
feature_col = "PG.ProteinGroups", # PG IDs
expression_col = "PG.Quantity", # PG quantity.
shrink_ident_cols = c("PG.Genes","PG.Qvalue"), # gene names and PG q-values
remove_sample_prefix = TRUE, # remove prefix in sample_col
remove_sample_suffix = FALSE
)
Generate unique identifiers. The names and IDs are in the “PG.Genes” and “PG.ProteinGroups” columns.
## Generate unique names and ids
unique_spe_pg <- make_unique(Spe_rep_wided,
name = "PG.Genes", #gene
ids = "PG.ProteinGroups" #protein
)
## New columns "name" and "ID", which are required by make_se
head(unique_spe_pg[,c("name", "ID", "PG.Genes", "PG.ProteinGroups")],4)
#> name ID PG.Genes
#> 1 NUDT4B A0A024RBG1 NUDT4B;NUDT4
#> 2 GATD3B A0A0B4J2D5 GATD3B;GATD3
#> 3 MMP24OS A0A0U1RRL7 MMP24OS
#> 4 insA9 A0A385XJ53 insA9;insA1;insA2;insA3;insA4;insA5;insA6;insA8
#> PG.ProteinGroups
#> 1 A0A024RBG1;Q9NZJ9
#> 2 A0A0B4J2D5;P0DPI2
#> 3 A0A0U1RRL7
#> 4 A0A385XJ53;P0CF07;P0CF08;P0CF09;P0CF10;P0CF11;P0CF12;P0CF13
Convert to SE. The expression columns are named after the sample file names.
## Expression columns in the Spectronaut output
ecols <- 2:7
colnames(unique_spe_pg)[ecols]
#> [1] "A5_1" "A5_2" "A5_3" "B5_1" "B5_2" "B5_3"
## Convert to SE, parsing the experiment design from column names
se <- make_se_parse(unique_spe_pg,
columns = ecols,
mode = "delim",
remove_prefix = TRUE, ## remove the file prefix
remove_suffix = FALSE ## no suffix in this case
)
## Experiment design stored in the SE
colData(se)
#> DataFrame with 6 rows and 4 columns
#> label ID condition replicate
#> <character> <character> <character> <character>
#> A5_1 A5_1 A5_1 A5 1
#> A5_2 A5_2 A5_2 A5 2
#> A5_3 A5_3 A5_3 A5 3
#> B5_1 B5_1 B5_1 B5 1
#> B5_2 B5_2 B5_2 B5 2
#> B5_3 B5_3 B5_3 B5 3
colnames(se)
#> [1] "A5_1" "A5_2" "A5_3" "B5_1" "B5_2" "B5_3"
Converting peptide-level data to a QFeatures object
DEP2 provides a pipeline, built on the QFeatures package, to aggregate and summarize peptide quantities to the protein level. The protein-level quantities used in the preceding section were calculated by the upstream software, mostly with the MaxLFQ algorithm. This peptide-to-protein pipeline lets DEP2 apply other aggregation strategies.
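The general idea of peptide-to-protein aggregation can be illustrated with two simple summaries in base R: the sum of raw intensities and the median of log2 intensities per protein. These are generic examples on made-up data, not necessarily the strategies DEP2 implements.

```r
# Toy peptide table: one row per peptide, one column per sample (made-up values)
pep <- data.frame(
  protein = c("P1", "P1", "P1", "P2", "P2"),
  A_1     = c(100, 200, 300, 50, 70),
  A_2     = c(110, 190, NA, 55, 65)
)
samples <- c("A_1", "A_2")

# Sum of raw peptide intensities per protein, ignoring missing values
prot_sum <- aggregate(pep[samples], by = list(protein = pep$protein),
                      FUN = sum, na.rm = TRUE)

# Median of log2 peptide intensities per protein
log_pep <- pep
log_pep[samples] <- lapply(pep[samples], log2)
prot_med <- aggregate(log_pep[samples], by = list(protein = pep$protein),
                      FUN = median, na.rm = TRUE)
prot_sum
prot_med
```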
The following code snippet demonstrates the conversion of peptide-level data to a QFeatures object using different software outputs.
Conversion from MaxQuant peptides
The first step is to load the peptide-level data into a QFeatures object via the make_pe or make_pe_parse function. We start by reading the peptides.txt file from the MaxQuant txt directory.
mq_pepfile <- "./A_spikeIn/MaxQuant_output/peptides.txt.gz"
mq_pep <- read.csv(mq_pepfile,sep = "\t")
To convert the data into a QFeatures object, we extract the ‘Intensity’ columns.
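A sketch of selecting those columns follows. It assumes the per-sample columns in peptides.txt are named "Intensity <sample>" (read.csv() converts the space to ".") and that the aggregate "Intensity" column must be excluded; the column names below are a hypothetical subset. The make_pe_parse call in the final comment mirrors the FragPipe example and is not a confirmed setting for MaxQuant tables.

```r
# Hypothetical subset of peptides.txt column names after read.csv()
mq_cols <- c("Sequence", "Proteins", "Intensity",
             "Intensity.A1_1", "Intensity.A1_2", "Intensity.B1_1")
# Per-sample columns start with "Intensity."; the bare "Intensity" total is excluded
ecols <- grep("^Intensity\\.", mq_cols, value = TRUE)
ecols
# These columns could then be passed on, e.g.
# make_pe_parse(mq_pep, columns = ecols, mode = "delim")
```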
Conversion from FragPipe combined_peptides
Next, we process the combined_peptide.tsv file generated by FragPipe.
FragP_pepfile <- "./A_spikeIn/FragPipe_output/combined_peptide.tsv.gz"
FragP_pep <- read.csv(FragP_pepfile,sep = "\t")
We use the ‘Intensity’ columns as expression columns.
ecols <- grep("[0-9].Intensity", colnames(FragP_pep), value = TRUE) ## per-sample peptide intensity columns
FragP_pe <- make_pe_parse(FragP_pep, columns = ecols,
mode = "delim",
remove_suffix = TRUE ## remove the ".Intensity" suffix
)
colData(FragP_pe)
#> DataFrame with 5 rows and 4 columns
#> label ID condition replicate
#> <character> <character> <character> <integer>
#> A_1 A_1 A_1 A 1
#> A_2 A_2 A_2 A 2
#> A_3 A_3 A_3 A 3
#> B_1 B_1 B_1 B 1
#> B_3 B_3 B_3 B 3
Conversion from DIA-NN report
We reshape the peptide quantities into a wide table using the reshape_long2wide function. The ‘Stripped.Sequence’ column holds the peptide IDs. A stripped peptide may have multiple precursors due to variable modifications or different charge states; for such peptides, the function retains the maximum expression value.
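Keeping the maximum over a peptide's precursors can be sketched with base R's tapply() on made-up data; this illustrates the rule described above and is not reshape_long2wide's code. Empty peptide/sample cells become NA.

```r
# Toy precursor-level long table (made-up values);
# PEPTIDEK has two precursors in sample A6_1
long_prec <- data.frame(
  sample  = c("A6_1", "A6_1", "A6_1", "A6_2"),
  peptide = c("PEPTIDEK", "PEPTIDEK", "LLSEK", "PEPTIDEK"),
  quant   = c(100, 250, 40, 120)
)
# Peptide x sample matrix holding the maximum precursor quantity per cell
wide_pep <- with(long_prec, tapply(quant, list(peptide, sample), max))
wide_pep
```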
## Keep only protein groups passing the threshold (PG.Q.Value < 0.01)
Diann_rep = dplyr::filter(Diann_rep, PG.Q.Value < 0.01)
Diann_pep_wided = reshape_long2wide(Diann_rep,
sample_col = "File.Name", # the column labeling sample names
feature_col = "Stripped.Sequence", # peptide IDs
expression_col = "Precursor.Quantity", # precursor quantity; the normalized value can also be used
shrink_ident_cols = c("Protein.Group","Genes"), # protein group IDs and gene names
extend_ident_cols = c("Global.Q.Value"), # optional, extra identification info per sample
remove_sample_prefix = FALSE, # keep the prefix in sample_col
remove_sample_suffix = FALSE # keep the suffix in sample_col
)
ecols = 2:7
pe <- make_pe_parse(Diann_pep_wided, columns = ecols,
mode = "delim",
remove_suffix = TRUE ## remove suffix
)
Conversion from Spectronaut result
The Spectronaut report also stores peptide-level quantities.
Spe_repfile <- "./A_spikeIn/Spectronaut_output/DIA_MSStats_Report.xls.gz"
Spe_rep <- read.csv(Spe_repfile, sep = "\t")
dim(Spe_rep) # 449106 rows; each row is a precursor.
#> [1] 449106 12
## The report stores precursor-, peptide- and PG-level information
DT::datatable(dplyr::mutate_if(head(Spe_rep), is.character, utf8::utf8_encode), options = list(scrollX = TRUE, pageLength = 6))
Reshape the long table.
Spe_rep_wided2 = reshape_long2wide(Spe_rep,
sample_col = "R.FileName", # the column labeling sample names
feature_col = "PEP.StrippedSequence", # peptide sequence
expression_col = "PEP.Quantity", # peptide quantity.
shrink_ident_cols = c("PG.Genes","PG.ProteinAccessions"), # gene names and protein IDs
remove_sample_prefix = TRUE, # remove prefix in sample_col
remove_sample_suffix = FALSE
)
Convert to a QFeatures object.
ecols = 2:7
pe <- make_pe_parse(Spe_rep_wided2, columns = ecols,
mode = "delim",
remove_suffix = TRUE ## remove suffix
)
colData(pe)
#> DataFrame with 6 rows and 4 columns
#> label ID condition replicate
#> <character> <character> <character> <integer>
#> A5_1 A5_1 A5_1 A5 1
#> A5_2 A5_2 A5_2 A5 2
#> A5_3 A5_3 A5_3 A5 3
#> B5_1 B5_1 B5_1 B5 1
#> B5_2 B5_2 B5_2 B5 2
#> B5_3 B5_3 B5_3 B5 3
Session information
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#>
#> locale:
#> [1] LC_CTYPE=zh_CN.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=zh_CN.UTF-8 LC_COLLATE=zh_CN.UTF-8
#> [5] LC_MONETARY=zh_CN.UTF-8 LC_MESSAGES=zh_CN.UTF-8
#> [7] LC_PAPER=zh_CN.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Asia/Shanghai
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] dplyr_1.1.2 DEP2_0.3.7.3
#> [3] R6_2.5.1 limma_3.56.2
#> [5] MSnbase_2.26.0 ProtGenerics_1.32.0
#> [7] mzR_2.34.1 Rcpp_1.0.11
#> [9] MsCoreUtils_1.12.0 SummarizedExperiment_1.30.2
#> [11] Biobase_2.60.0 GenomicRanges_1.52.0
#> [13] GenomeInfoDb_1.36.1 IRanges_2.34.1
#> [15] S4Vectors_0.38.1 BiocGenerics_0.46.0
#> [17] MatrixGenerics_1.12.2 matrixStats_1.0.0
#> [19] BiocStyle_2.28.0
#>
#> loaded via a namespace (and not attached):
#> [1] RColorBrewer_1.1-3 rstudioapi_0.15.0
#> [3] jsonlite_1.8.7 shape_1.4.6
#> [5] umap_0.2.10.0 MultiAssayExperiment_1.26.0
#> [7] magrittr_2.0.3 MALDIquant_1.22.1
#> [9] rmarkdown_2.23 GlobalOptions_0.1.2
#> [11] fs_1.6.2 zlibbioc_1.46.0
#> [13] ragg_1.2.5 vctrs_0.6.3
#> [15] memoise_2.0.1 Rsamtools_2.16.0
#> [17] RCurl_1.98-1.12 askpass_1.1
#> [19] BiocBaseUtils_1.2.0 itertools_0.1-3
#> [21] htmltools_0.5.5 S4Arrays_1.0.4
#> [23] missForest_1.5 mzID_1.38.0
#> [25] sass_0.4.6 bslib_0.5.0
#> [27] htmlwidgets_1.6.2 desc_1.4.2
#> [29] plyr_1.8.8 impute_1.74.1
#> [31] cachem_1.0.8 GenomicAlignments_1.36.0
#> [33] igraph_1.5.0 lifecycle_1.0.3
#> [35] iterators_1.0.14 pkgconfig_2.0.3
#> [37] Matrix_1.5-4.1 fastmap_1.1.1
#> [39] GenomeInfoDbData_1.2.10 clue_0.3-64
#> [41] digest_0.6.33 fdrtool_1.2.17
#> [43] pcaMethods_1.92.0 colorspace_2.1-0
#> [45] DESeq2_1.40.2 rprojroot_2.0.3
#> [47] RSpectra_0.16-1 crosstalk_1.2.0
#> [49] textshaping_0.3.6 randomForest_4.7-1.1
#> [51] fansi_1.0.4 compiler_4.3.1
#> [53] rngtools_1.5.2 proxy_0.4-27
#> [55] withr_2.5.0 doParallel_1.0.17
#> [57] downloader_0.4 BiocParallel_1.34.2
#> [59] MASS_7.3-60 openssl_2.0.6
#> [61] DelayedArray_0.26.6 rjson_0.2.21
#> [63] tools_4.3.1 glue_1.6.2
#> [65] QFeatures_1.10.0 grid_4.3.1
#> [67] Rtsne_0.16 cluster_2.1.4
#> [69] reshape2_1.4.4 generics_0.1.3
#> [71] gtable_0.3.3 class_7.3-22
#> [73] preprocessCore_1.62.1 tidyr_1.3.0
#> [75] data.table_1.14.8 utf8_1.2.3
#> [77] XVector_0.40.0 foreach_1.5.2
#> [79] pillar_1.9.0 stringr_1.5.0
#> [81] circlize_0.4.15 splines_4.3.1
#> [83] lattice_0.21-8 survival_3.5-5
#> [85] tidyselect_1.2.0 ComplexHeatmap_2.16.0
#> [87] locfit_1.5-9.8 Biostrings_2.68.1
#> [89] knitr_1.43 bookdown_0.34
#> [91] edgeR_3.42.4 xfun_0.39
#> [93] DT_0.28 stringi_1.7.12
#> [95] lazyeval_0.2.2 yaml_2.3.7
#> [97] evaluate_0.21 codetools_0.2-19
#> [99] tibble_3.2.1 BiocManager_1.30.21
#> [101] cli_3.6.1 affyio_1.70.0
#> [103] reticulate_1.30 systemfonts_1.0.4
#> [105] munsell_0.5.0 jquerylib_0.1.4
#> [107] TCseq_1.23.0 png_0.1-8
#> [109] XML_3.99-0.14 parallel_4.3.1
#> [111] ellipsis_0.3.2 pkgdown_2.0.7
#> [113] ggplot2_3.4.2 assertthat_0.2.1
#> [115] doRNG_1.8.6 AnnotationFilter_1.24.0
#> [117] bitops_1.0-7 glmnet_4.1-7
#> [119] scales_1.2.1 affy_1.78.1
#> [121] e1071_1.7-13 ncdf4_1.21
#> [123] purrr_1.0.1 crayon_1.5.2
#> [125] GetoptLong_1.0.5 rlang_1.1.1
#> [127] vsn_3.68.0