The Proteomics Landscape: Insights into Protein Identification, Quantification, and Post-Translational Modifications
Proteomics is a rapidly advancing field that plays a pivotal role in understanding the molecular mechanisms underlying biological processes. It is the study of proteins on a comprehensive scale: it analyzes dynamic changes in protein composition, expression levels, and modification states within cells or tissues, and it seeks to uncover protein-protein interactions and their functional roles in cellular activities. Compared with genomics and transcriptomics, proteomics provides insights that are much closer to biological phenotypes, offering a deeper understanding of cellular, tissue, and organismal states. Proteomics research can be categorized into three main areas: protein identification, quantitative proteomics, and protein modification analysis. This blog provides an overview of these areas, highlighting their methodologies, applications, and contributions to our understanding of biological systems.
Protein Identification
Protein identification determines whether proteins are present in a sample and which specific proteins they are. The most widely used method for high-throughput, full-spectrum protein identification is liquid chromatography-tandem mass spectrometry (LC-MS/MS), which employs soft ionization techniques and allows highly sensitive protein separation, identification, and analysis. The technique works by digesting proteins into peptides, separating the peptides by liquid chromatography, ionizing them in a mass spectrometer, and measuring the mass-to-charge ratios of the peptides and their fragments. These measurements are then matched against a protein sequence database to identify the proteins.
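The digestion and mass-to-charge steps described above can be illustrated with a small sketch. It is not the workflow's actual software: it assumes the common tryptic rule (cleave after K or R unless the next residue is P), ignores missed cleavages and modifications, and uses standard monoisotopic residue masses. The function names and the example sequence are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to the residue sum of a free peptide
PROTON = 1.00728   # mass of one proton, added per positive charge


def trypsin_digest(sequence):
    """Split after K or R unless followed by P (assumed rule, no missed cleavages)."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", sequence) if pep]


def peptide_mz(peptide, charge):
    """Theoretical m/z of an unmodified peptide carrying `charge` protons."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral_mass + charge * PROTON) / charge


# Hypothetical sequence, used only to show the calls.
for pep in trypsin_digest("MKWVTFISLLRAAKRPGGK"):
    print(pep, round(peptide_mz(pep, 2), 4))
```

In practice, search engines compare such theoretical values with the measured precursor and fragment masses within a mass tolerance; the sketch only shows where the theoretical numbers come from.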
Sample Types:
- Target Protein Identification: Gel bands, lyophilized protein powders, or protein solutions.
- Protein-Protein Interaction Identification: Samples from immunoprecipitation (IP), co-immunoprecipitation (Co-IP), or pull-down experiments, including magnetic beads, protein elution solutions, and gel bands.
- Full-Spectrum Protein Identification: Samples from animal or plant tissues, cells, etc.
Experimental Workflow:
1. Sample Preparation: Depending on the sample type, protein identification requires different preprocessing approaches. There are two primary methods based on the sample type.
Method 1: Standard Tissue or Bead Sample Processing
(1) Sample preparation
(2) Protein extraction
(3) Reduction and alkylation, followed by trypsin digestion
(4) Peptide desalting, BCA quantification
(5) Mass spectrometry analysis
Method 2: Gel-based Sample Processing
For gel strip samples, in-gel digestion is necessary to obtain peptides.
(1) Cut the target band from the gel strip
(2) Retrieve the target band
(3) Destain the gel, followed by in-gel reduction, alkylation, and trypsin digestion
(4) Extract peptides
(5) Desalt peptides, BCA quantification
(6) Mass spectrometry analysis
2. Protein Detection: Perform liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis.
3. Bioinformatics Analysis:
(1) Overall Data Presentation: Statistical graphs of the identification results
(2) Qualitative Quality Evaluation: Peptide length distribution, peptide count distribution, missed cleavage site distribution, Venn diagrams of protein identification across samples
(3) Functional Annotation: GO annotation bar chart, KOG annotation bar chart, KEGG annotation bar chart, protein structure annotation, subcellular localization bar chart, signal peptide prediction results.
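The qualitative quality evaluation in step 3 (2) can be sketched as follows. This is a minimal illustration that assumes tryptic peptides and plain lists of identifiers as input; the function names are hypothetical and no plotting is done, only the counts that the distribution charts and Venn diagrams would display.

```python
from collections import Counter


def missed_cleavages(peptide):
    """Count internal K/R residues not followed by P (assumed tryptic rule)."""
    return sum(
        1
        for i, aa in enumerate(peptide[:-1])
        if aa in "KR" and peptide[i + 1] != "P"
    )


def qc_summary(peptides):
    """Peptide length and missed cleavage distributions for one sample."""
    return {
        "length": Counter(len(p) for p in peptides),
        "missed_cleavages": Counter(missed_cleavages(p) for p in peptides),
    }


def venn_regions(proteins_a, proteins_b):
    """Two-sample Venn regions: proteins unique to each sample and shared."""
    a, b = set(proteins_a), set(proteins_b)
    return {"only_a": a - b, "shared": a & b, "only_b": b - a}
```

A high share of peptides with one or more missed cleavages usually points to incomplete digestion, which is why this distribution is reported alongside the identification results.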
Quantitative Proteomics
Quantitative proteomics identifies the proteins in a sample and measures changes in their expression. It uses techniques such as SILAC, TMT, label-free quantification, and DIA to study the types and expression levels of proteins in cells, tissues, or organs, and to detect changes in protein expression under different conditions or at different time points.
Sample Types:
- Solid samples: Animal or plant tissues, etc.
- Liquid samples: Blood, cell supernatant, fermentation broth, saliva, etc.
- Microbial samples: Single bacteria or fungi
- Protein interaction samples: IP, Co-IP, or pull-down experiments, such as magnetic beads, protein eluates, or gel strips
- Special samples: Exosomes
Experimental Workflow:
1. Sample Preparation: Since quantitative proteomics deals with a wide variety of sample types, in addition to the standard extraction protocols for solid samples and gel-based samples (as mentioned in protein identification), different extraction methods are applied to ensure comprehensive protein recovery from different sample types.
2. Protein Detection: Performed using liquid chromatography-tandem mass spectrometry (LC-MS/MS).
3. Bioinformatics Analysis:
(1) Overall Data Presentation: Statistical charts of protein identification results and a general overview of functional annotation.
(2) Qualitative Quality Control: Peptide length distribution, peptide count distribution, and missed cleavage site distribution.
(3) Quantitative Quality Control: Protein abundance violin plots, PCA (Principal Component Analysis), and correlation analysis.
(4) Differential Protein Screening: Proteins are screened based on FC (fold change) and P-value. Generally, proteins with FC > 1.5 or FC < 0.6667 (that is, 1/1.5) and P-value < 0.05 are considered significantly differentially expressed.
(5) Gene Function Annotation and Enrichment Analysis: GO, KEGG, COG/KOG (Clusters of Orthologous Groups), subcellular localization, and signal peptide prediction.
(6) Custom Analysis: Including WGCNA (Weighted Gene Co-Expression Network Analysis), among others.
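The screening rule in step (4) can be written as a short sketch. It assumes the P-value has already been computed by a separate statistical test and that fold change is the ratio of mean abundances (treatment over control); the function names and default cutoffs mirror the thresholds stated above and are otherwise hypothetical.

```python
from statistics import mean


def fold_change(treated, control):
    """Ratio of mean abundances, treatment over control (assumed definition)."""
    return mean(treated) / mean(control)


def classify(fc, p_value, fc_cutoff=1.5, p_cutoff=0.05):
    """Label a protein as 'up', 'down', or 'not significant'."""
    if p_value >= p_cutoff:
        return "not significant"
    if fc > fc_cutoff:
        return "up"
    if fc < 1 / fc_cutoff:  # 1 / 1.5 is the 0.6667 lower bound
        return "down"
    return "not significant"
```

Using 1 / fc_cutoff for the lower bound keeps the two thresholds symmetric on a log scale, so a 1.5-fold increase and a 1.5-fold decrease are treated alike.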
PTM Proteomics
PTM proteomics is the study of post-translational modifications (PTMs): chemical modifications of protein amino acid residues through the addition or removal of specific groups after translation, which regulate protein activity, localization, folding, and interactions with other biomolecules. Currently, the UniProt protein database has cataloged over 450 types of PTMs, with the most common ones including phosphorylation, acetylation, ubiquitination, and glycosylation.
- Protein Phosphorylation: Phosphorylation occurs mainly on serine, threonine, or tyrosine residues and is one of the most extensively studied post-translational modifications. Phosphorylation plays a crucial regulatory role in processes such as cell cycle, growth, apoptosis, and signal transduction.
- Protein Glycosylation: Glycosylation significantly impacts protein folding, conformation, distribution, stability, and activity. Carbohydrates attached via asparagine (N-linked) or serine/threonine (O-linked) residues form key structural components of many cell surface and secreted proteins.
- Protein Ubiquitination: Ubiquitination is a vital post-translational modification. The ubiquitin-proteasome system mediates the degradation of 80%–85% of proteins in eukaryotic cells. In addition to degradation, ubiquitination can directly influence protein activity and localization, regulating diverse cellular processes such as the cell cycle, apoptosis, transcriptional regulation, DNA damage repair, and immune response.
- Protein Acetylation: Acetylation involves the attachment of an acetyl group to proteins. This modification occurs when acetyl groups are transferred to lysine residues or the N-terminus of proteins, catalyzed by acetyltransferases (or non-enzymatically). Protein acetylation is classified into two types: N-terminal acetylation and lysine acetylation. Lysine acetylation, in particular, involves the transfer of acetyl groups from acetyl-CoA to the ε-amino side chain of lysine, catalyzed by KATs (lysine acetyltransferases). This dynamic and reversible modification plays an important role in regulating protein function, chromatin structure, and gene expression.
Sample Types:
- Solid Samples: Animal and plant tissues, etc.
- Liquid Samples: Blood/serum, cell supernatant, fermentation broth, saliva, etc.
- Microbial Samples: Single bacteria or fungi
- Protein Interaction Samples: IP, Co-IP, or pull-down experiments, such as beads, protein eluates, or gel bands
Experimental Workflow:
1. Sample Preparation: The workflow for PTM (post-translational modification) proteomics is similar to that of classical proteomics, but since PTM-modified proteins are typically present in low abundance, exhibit dynamic changes in cells, and involve labile covalent bonds between modification groups and amino acid residues, extra steps are required. Peptides must be fractionated, and modified peptides should be enriched during the preparation process.
2. Protein Detection: Liquid chromatography-tandem mass spectrometry (LC-MS/MS)
3. Bioinformatics Analysis:
(1) Overall Data Presentation: Statistical charts for identification results, modification site distribution charts, and functional annotation summary charts
(2) Qualitative Quality Control: Peptide length distribution and missed cleavage site distribution
(3) Quantitative Quality Control: Protein abundance violin plots, PCA dimensionality reduction analysis, correlation analysis
(4) Differential Protein Screening: Proteins are screened based on FC (fold change) and P-value, with conventional thresholds of FC > 1.5 or FC < 0.6667 (that is, 1/1.5) and P-value < 0.05 used to define significant differential proteins.
(5) Motif Analysis: Amino acid motif analysis identifies conserved sequences around modification sites. By counting the occurrence of each amino acid in the 6 residues upstream and 6 residues downstream of the modification sites (for example, phosphorylation sites) on modified peptides, conserved motifs are identified and presented.
(6) Gene Functional Annotation and Enrichment Analysis: GO (Gene Ontology), KEGG, COG/KOG (Clusters of Orthologous Groups of proteins), subcellular localization, signal peptide prediction
(7) Custom Analysis: WGCNA (Weighted Gene Co-expression Network Analysis), protein-protein interaction (PPI) network analysis, etc.
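The counting step behind the motif analysis in step (5) can be sketched as follows. It assumes each modification site is given as a 0-based index into the full protein sequence and pads windows that run past a terminus with "_"; the function names and the padding character are illustrative choices, not part of any specific motif tool.

```python
from collections import Counter


def site_window(protein_seq, site_index, flank=6, pad="_"):
    """Residues from site - flank to site + flank, padded at the termini."""
    left = protein_seq[max(0, site_index - flank):site_index]
    right = protein_seq[site_index + 1:site_index + 1 + flank]
    return left.rjust(flank, pad) + protein_seq[site_index] + right.ljust(flank, pad)


def position_counts(windows):
    """Amino acid counts at each window position across all sites."""
    return [Counter(column) for column in zip(*windows)]
```

With flank=6 each window has 13 positions and the modified residue sits at position 6 (0-based). Dedicated motif tools additionally compare these counts with a background frequency to decide which positions are significantly enriched, which this sketch does not attempt.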