Comprehensive Guide to Proteomics Technology: Strategies, Methods, Principles, and Applications
Although genes encode proteins, factors such as post-transcriptional regulation, post-translational modifications, localization, conformational changes, and protein-protein interactions mean that mRNA expression levels do not directly equate to protein levels. As a result, transcriptomic data alone cannot represent the expression of active proteins within cells, nor can it capture the impact of modifications such as acetylation, ubiquitination, phosphorylation, or glycosylation on protein function and activity. Furthermore, proteomics is more closely related to phenotypic expression than other omics studies, making it a more comprehensive approach for explaining biological states from a molecular perspective.
1. What is Proteomics Used For?
Proteomics refers to the large-scale study of proteins, focusing on the dynamic changes in protein composition, expression levels, and modification states within cells (or tissues, etc.). It aims to understand the interactions and relationships between proteins, uncovering protein functions and the molecular mechanisms of cellular activities. Proteomics provides complementary insights to genomics and transcriptomics, offering crucial information for mapping complex, interconnected pathways, networks, and molecular systems that directly regulate vital biological processes, such as cell proliferation, differentiation, aging, and apoptosis.
2. Proteomics Research Strategies
In proteomics, there are two main approaches for protein identification (Figure 1): Bottom-up and Top-down. Top-down involves analyzing an intact protein, fragmenting it directly in the mass spectrometer to deduce its sequence from the resulting fragments. The Bottom-up approach, also known as shotgun proteomics, is the predominant method. It takes advantage of the fact that proteins can be selectively cleaved by specific enzymes at defined sites (trypsin, for example, cleaves after lysine and arginine). The strategy involves enzymatically digesting proteins into peptides, which are then analyzed by mass spectrometry. The identified peptide sequences are used to infer which proteins were present in the sample. All methods described below fall under the Bottom-up approach.
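The enzymatic cleavage step of the Bottom-up strategy can be illustrated with a small in-silico digestion. This is a minimal sketch that assumes trypsin and the common simplified rule (cleave after K or R unless the next residue is P); the function name, the `missed_cleavages` parameter, and the example sequence are made up for illustration.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return peptides from a simplified in-silico trypsin digestion.

    Assumed rule: cleave C-terminal to K or R, except when followed by P.
    """
    # First pass: split the sequence into fully cleaved fragments.
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])

    # Second pass: join neighbouring fragments to model missed cleavages.
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real protein.
    print(tryptic_digest("AAKRPGGRTTK"))
    print(tryptic_digest("AAKRPGGRTTK", missed_cleavages=1))
```

Real search engines apply additional constraints (peptide length limits, other enzymes, modifications) that are left out here.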
3. Proteomics Methodologies
Depending on whether the research targets specific proteins, proteomics methods fall into two categories: Targeted and Untargeted Proteomics.
- Targeted Proteomics focuses on studying specific proteins of interest.
- Untargeted Proteomics aims to detect as many proteins as possible globally. Furthermore, untargeted proteomics can be subdivided based on labeling methods and data acquisition strategies.
3.1 Targeted Proteomics
Targeted proteomics focuses on analyzing specific proteins or peptides within a complex mixture. It allows for the precise detection and quantification of preselected proteins of interest. Using LC-MS/MS-based targeted proteomics techniques, researchers can achieve high selectivity, specificity, reproducibility, and accuracy in the detection of target proteins or peptides, including those with post-translational modifications (PTMs). This enables both relative and absolute quantification of the target proteins or peptides. Two commonly used methods for targeted quantification are MRM (Multiple Reaction Monitoring) and PRM (Parallel Reaction Monitoring).
Experimental Principle
MRM and PRM work by selecting representative peptides of the target proteins for mass spectrometric signal acquisition. These methods capture quantitative information specifically for these peptides, providing insights into protein abundance. Relative quantification can be achieved for target proteins in samples, and by incorporating synthesized isotopically labeled peptides as standards, a calibration curve can be generated to allow for absolute quantification.
1) MRM is performed on a triple quadrupole instrument and involves two stages of mass selection. First, the first quadrupole (Q1) selects precursor ions of the target peptides. These precursor ions are then fragmented in the collision cell (Q2), and finally the third quadrupole (Q3) selectively transmits the characteristic fragment ions for detection. This method offers high precision and is ideal for the quantification of multiple target proteins in complex samples. Incorporating isotopically labeled peptides as internal standards allows for absolute protein quantification.
2) PRM builds upon MRM, utilizing high-resolution, accurate-mass instruments such as the Orbitrap. In PRM, the mass spectrometer selects precursor ions in Q1, fragments these ions in the collision cell, and then uses a high-resolution mass analyzer to detect all fragment ions from the selected precursor ion window. This enables more accurate and specific analysis of target proteins or peptides in complex biological samples.
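The precursor/fragment pairs (transitions) that MRM monitors, and that PRM extracts from its full fragment spectra, can be computed from the peptide sequence. The sketch below assumes unmodified peptides, monoisotopic masses, and singly charged y-ions only; the function names are illustrative and not taken from any particular software package.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def precursor_mz(peptide, charge):
    """m/z of the intact peptide ion, as selected in Q1."""
    neutral = sum(MONO[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def y_ion_mz(peptide, n, charge=1):
    """m/z of the y-ion made of the last n residues of the peptide."""
    neutral = sum(MONO[aa] for aa in peptide[-n:]) + WATER
    return (neutral + charge * PROTON) / charge


def transitions(peptide, precursor_charge=2):
    """List (Q1 m/z, fragment label, fragment m/z) for y1 .. y(n-1)."""
    q1 = round(precursor_mz(peptide, precursor_charge), 4)
    return [
        (q1, f"y{n}", round(y_ion_mz(peptide, n), 4))
        for n in range(1, len(peptide))
    ]


if __name__ == "__main__":
    for q1, label, q3 in transitions("PEPTIDE"):
        print(q1, label, q3)
```

In MRM, a few of these pairs would be chosen in advance for each peptide; in PRM, all fragments are recorded and the most informative ones are selected after acquisition.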
Experimental Workflow
1) Obtain known protein sequences of interest
2) Protein extraction
3) Protein digestion and desalting
4) Select target peptides
5) Perform MRM/PRM acquisition for target peptides on each sample
6) Bioinformatics analysis
Technical Advantages
- High sensitivity: Two stages of mass spectrometry ensure that only ions matching the target ions are selected, significantly reducing noise and improving detection accuracy.
- High throughput: Can identify over 200 proteins in a single run.
- Absolute quantification: No need for antibodies.
- High precision for low-abundance proteins: Quantification spans four orders of magnitude.
Technical Applications
- Validation of untargeted proteomics results (e.g., label-free proteomics).
- Absolute quantification of multiple proteins/peptides simultaneously.
- Study of protein families with high homology but lacking specific antibodies.
- Quantitative studies of post-translational modifications (PTMs).
- Absolute quantification of biomarkers in biological diseases.
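Absolute quantification with a calibration curve, as described in the experimental principle, comes down to a linear fit and an inversion. The sketch below assumes a fixed amount of isotopically labeled standard spiked into every sample and a linear response of the light/heavy peak-area ratio; the concentrations and ratios are synthetic illustration values, not measured data.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(area_ratio, slope, intercept):
    """Invert the calibration line to get a concentration."""
    return (area_ratio - intercept) / slope


if __name__ == "__main__":
    # Synthetic calibration points: known concentration (e.g. fmol/uL)
    # versus light/heavy peak-area ratio.
    known_conc = [1, 5, 10, 50, 100]
    area_ratio = [0.03, 0.11, 0.21, 1.01, 2.01]
    slope, intercept = fit_line(known_conc, area_ratio)
    # Concentration of an unknown sample with a measured ratio of 0.51.
    print(quantify(0.51, slope, intercept))
```

In practice, weighting, replicate measurements, and limits of quantification would also need to be considered.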
3.2 Untargeted Proteomics
Untargeted proteomics can be categorized into three types based on whether and how stable isotope labels are introduced: metabolic labeling proteomics (SILAC), chemical labeling proteomics (iTRAQ/TMT), and label-free proteomics (DDA/DIA).
3.2.1 Metabolic Labeling Quantitative Proteomics
SILAC, or Stable Isotope Labeling by Amino acids in Cell culture, is a powerful and straightforward technique for quantitative proteomics based on mass spectrometry (MS). Initially introduced for quantitative proteomics by Ong and colleagues from the Mann Lab in Denmark in 2002, SILAC offers accurate relative quantification without the need for chemical derivatization or other complex procedures. It is widely used in various research fields, including cell biology, biochemistry, and pharmacology.
Technical Principle
The fundamental principle of SILAC involves labeling newly synthesized proteins in cell cultures using stable isotope-labeled essential amino acids. Specifically, SILAC technology incorporates light, medium, or heavy forms of lysine (Lys) and arginine (Arg) into the culture medium. During cell growth, the cells utilize these amino acids to synthesize new proteins. Typically, multiple groups of cells are cultured, each using a different isotopic form. For instance, one group may use light (natural isotope) amino acids while another group uses heavy isotope-labeled amino acids. After 5-6 passages, proteins in all cells are uniformly labeled with the corresponding isotopes. Because the light and heavy forms of the same peptide differ by a known mass, they appear as separate peaks in the mass spectrum, and comparing their intensities reveals the abundance differences of proteins across groups, enabling quantitative analysis.
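The mass difference between the light and heavy forms of a peptide follows directly from its lysine and arginine content. The sketch below assumes one common heavy label set, Lys8 (13C6 15N2) and Arg10 (13C6 15N4); the text does not specify which isotopes are used, so these values and the function names are illustrative assumptions.

```python
# Assumed heavy labels: mass added per residue relative to the light form (Da).
HEAVY_SHIFT = {
    "K": 8.014199,   # Lys8: 13C6 15N2
    "R": 10.008269,  # Arg10: 13C6 15N4
}


def silac_mass_shift(peptide):
    """Total heavy-minus-light mass difference for a fully labeled peptide."""
    return sum(HEAVY_SHIFT.get(aa, 0.0) for aa in peptide)


def silac_mz_shift(peptide, charge):
    """Spacing between the light and heavy peaks on the m/z axis."""
    return silac_mass_shift(peptide) / charge


def heavy_to_light_ratio(heavy_intensity, light_intensity):
    """Relative abundance of a peptide between the two cell populations."""
    return heavy_intensity / light_intensity


if __name__ == "__main__":
    # Hypothetical tryptic peptide ending in K, observed as a 2+ ion.
    print(silac_mz_shift("AAGGK", 2))
    print(heavy_to_light_ratio(3.0e6, 1.5e6))
```

Since trypsin cleaves after lysine and arginine, nearly every tryptic peptide carries at least one labeled residue, which is why these two amino acids are the usual choice.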
Detection Workflow
1) Cell culture labeling
2) Protein extraction
3) 1:1 protein sample mixing
4) Protein reduction and digestion
5) LC-MS/MS analysis
6) Bioinformatics analysis
Technical Advantages
- Stable Labeling at the Cellular Level: The labeling effect is stable and effective, with labeling efficiency unaffected by lysis buffer.
- Reduced Sample Requirements: Typically, only a few dozen micrograms of protein are sufficient for each sample.
- High Throughput: Mass spectrometry allows for the simultaneous identification and quantification of multiple proteins.
- In Vivo Labeling Technique: This approach closely reflects the true state of the sample.
Technical Applications
SILAC-based quantitative proteomics can be employed to identify specific proteins involved in various protein-protein interactions (PPIs), including exogenous PPIs (Figure 6A), endogenous PPIs (Figure 6B), and specific interaction proteins in induced PPIs (Figure 6C).
1) Exogenous Protein-Protein Interactions: Wild-type cells or cells expressing affinity-tagged proteins are grown in either light or heavy media. The immune complexes are then purified from the mixed lysates of light and heavy cells.
2) Endogenous PPIs: Target proteins are downregulated in cells grown in light or heavy media using RNA interference (RNAi). The protein complexes are then immunoprecipitated from the mixed lysates of light and heavy cells using the corresponding antibodies.
3) Induced PPIs: Specific stimuli are applied to cells grown in light or heavy media to induce protein complexes, which are subsequently purified through immunoaffinity from the mixed lysates of light and heavy cells.
Once the protein complexes are obtained, they are digested into peptide fragments and analyzed using LC-MS/MS. Specific interacting proteins can be distinguished from non-specific background proteins based on their SILAC ratios.
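Separating specific interactors from background by SILAC ratio can be sketched as a simple threshold on the log2 ratio. The example assumes the bait condition is the heavy channel and the control is the light channel, so background proteins sit near a ratio of 1; the cutoff and the protein names are hypothetical.

```python
import math


def classify_interactors(heavy_light_ratios, log2_cutoff=1.0):
    """Split proteins into (specific, background) by log2(H/L) ratio.

    heavy_light_ratios maps protein name -> heavy/light ratio.
    The default cutoff of 1.0 (a twofold enrichment) is an arbitrary choice.
    """
    specific, background = [], []
    for protein, ratio in heavy_light_ratios.items():
        if math.log2(ratio) >= log2_cutoff:
            specific.append(protein)
        else:
            background.append(protein)
    return sorted(specific), sorted(background)


if __name__ == "__main__":
    # Hypothetical ratios from a pull-down experiment.
    ratios = {"BAIT": 8.0, "PARTNER_A": 4.0, "KERATIN": 1.0, "HSP70": 1.2}
    specific, background = classify_interactors(ratios)
    print("specific:", specific)
    print("background:", background)
```

A real analysis would typically combine the ratio with replicate measurements and a statistical test rather than rely on a fixed cutoff.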
3.2.2 Chemical Labeling Quantitative Proteomics Technology
Chemical labeling quantitative proteomics involves the in vitro labeling of peptides generated by protein digestion using isotope-labeled reagents, followed by the mixing of multiple samples for analysis. The two most widely used labeling reagents are iTRAQ (Isobaric Tags for Relative and Absolute Quantitation) and TMT (Tandem Mass Tags). These technologies, commercialized by AB Sciex and Thermo Fisher Scientific, respectively, employ similar principles for quantitative proteomics. iTRAQ is available with four or eight isobaric tags, whereas TMT is available with two, six, ten, sixteen, or eighteen isobaric tags, allowing a larger number of samples to be analyzed in a single run (the following example will focus on TMT technology).
Label Composition
- Reporter Ion Group: Used for quantifying different peptide segments.
- Balance Group: Ensures consistent precursor ion mass across the same peptide after labeling.
- Peptide Reactive Group: Reacts with the amino group at the N-terminus and the side chain amine of lysine residues to form connections.
Experimental Principle
The fundamental principle of this method is that each sample is reacted with a different labeling reagent, so that all peptides in a given sample carry the same tag. After completing the labeling process for all samples, equal amounts of the labeled peptides are mixed together, followed by offline separation and analysis via mass spectrometry (MS). During MS analysis, the same peptide from different samples appears as a single precursor peak because all reagents have an identical total mass. The signals from the same peptide across multiple samples are therefore superimposed, which facilitates its selection for tandem mass spectrometry (MS/MS) analysis. When the precursor ion of a peptide is selected and fragmented, the reporter groups from the different samples are released, forming free ions. In the second stage of mass spectrometry, up to eighteen distinct reporter ions are generated, one per labeled sample. By analyzing the intensity ratios of these reporter ions, the expression levels of the same protein across different samples can be quantified (see Figure 6).
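The reporter-ion comparison can be illustrated with a few lines of arithmetic. The sketch below assumes reporter intensities have already been extracted for each peptide-spectrum match (PSM), and it uses total-intensity normalization to correct for unequal loading, which is one common choice rather than a prescribed method. The function names and intensity values are illustrative.

```python
def normalize_channels(psms):
    """Scale every channel so that all channels have the same total intensity.

    psms is a list of dicts mapping channel name -> reporter ion intensity.
    In practice this is computed over all PSMs in the experiment.
    """
    channels = list(psms[0])
    totals = {c: sum(p[c] for p in psms) for c in channels}
    mean_total = sum(totals.values()) / len(channels)
    factors = {c: mean_total / totals[c] for c in channels}
    return [{c: p[c] * factors[c] for c in channels} for p in psms]


def protein_ratios(protein_psms, reference):
    """Sum reporter intensities over a protein's PSMs, relative to one channel."""
    summed = {c: sum(p[c] for p in protein_psms) for c in protein_psms[0]}
    return {c: summed[c] / summed[reference] for c in summed}


if __name__ == "__main__":
    # Hypothetical reporter intensities for two PSMs of one protein,
    # measured in two TMT channels.
    psms = [
        {"126": 100.0, "127N": 200.0},
        {"126": 300.0, "127N": 600.0},
    ]
    print(protein_ratios(psms, reference="126"))
```

Normalization factors should be derived from the whole dataset and then applied to each protein; applying `normalize_channels` to a single protein's PSMs would erase the very difference being measured.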
Experimental Workflow
1) Protein extraction
2) Protein digestion
3) iTRAQ/TMT labeling
4) HPLC fractionation
5) LC-MS/MS analysis
6) Protein database search
7) Bioinformatics analysis
Technical Advantages
- High Sensitivity: Capable of detecting low-abundance proteins.
- Strong Separation Capacity: Effective in separating acidic/basic proteins, proteins smaller than 10 kDa, larger than 200 kDa, and insoluble proteins.
- Wide Application Range: Suitable for identifying various types of proteins, including membrane proteins, nuclear proteins, and extracellular proteins.
- High Throughput: Allows for the simultaneous analysis of up to 18 samples, making it particularly well-suited for differential protein analysis across samples subjected to multiple treatments or varying treatment durations.
Technical Applications
- Study of Protein Expression Differences: By comparing protein expression variations in samples under different physiological states, disease conditions, or after drug treatments, this approach aims to identify potential biomarkers or therapeutic targets and enhance understanding of disease mechanisms.
- Protein Modification Research: Identification and quantification of post-translational modification types and levels across different samples—such as phosphorylation, acetylation, and methylation—facilitates a deeper understanding of the regulatory mechanisms governing protein function and their associations with diseases.
- Construction of Protein Interaction Networks: Identification of interactions between proteins to construct protein interaction networks provides crucial insights into the functions and regulatory mechanisms of biological systems.
3.2.3 Label-free Quantitative Proteomics
Label-free quantitative proteomics differs from SILAC and iTRAQ/TMT in that it does not require any labeling of samples. This technique relies on mass spectrometry (MS) analysis to achieve quantitative measurements. It infers the relative abundance of peptides in different samples by directly comparing their mass spectral characteristics.
Experimental Principle
The experimental principle of label-free quantitative proteomics involves taking equal amounts of protein from different sample groups, followed by enzymatic digestion. The resulting equal quantities of digested peptides are subjected to LC-MS analysis under consistent conditions. Relative quantification of each protein in different samples is achieved by matching peptides across runs by retention time and comparing their peak areas (or MS signal intensities). Because each sample is analyzed in a separate run, consistent chromatography and instrument conditions are essential for a fair comparison.
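Comparing peak areas between runs can be sketched as integrating each peptide's chromatographic peak and taking the ratio. The example assumes the peptide has already been matched between the two runs and uses simple trapezoidal integration; the retention times and intensities are synthetic.

```python
def peak_area(times, intensities):
    """Trapezoidal area under a chromatographic peak."""
    area = 0.0
    for k in range(1, len(times)):
        width = times[k] - times[k - 1]
        area += width * (intensities[k] + intensities[k - 1]) / 2.0
    return area


def relative_abundance(area_sample, area_control):
    """Fold change of a peptide between two runs."""
    return area_sample / area_control


if __name__ == "__main__":
    # Synthetic extracted-ion chromatograms of one peptide in two runs.
    times = [20.0, 20.1, 20.2, 20.3, 20.4]
    control = [0.0, 4.0e5, 1.0e6, 4.0e5, 0.0]
    sample = [0.0, 8.0e5, 2.0e6, 8.0e5, 0.0]
    fold = relative_abundance(peak_area(times, sample), peak_area(times, control))
    print(fold)
```

Production tools additionally align retention times between runs and normalize for differences in total signal before ratios are computed.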
Experimental Workflow
1) Protein extraction
2) Protein digestion
3) LC-MS/MS analysis
4) Protein database search
5) Bioinformatics analysis
Acquisition Mode
Based on different data acquisition modes, label-free quantitative proteomics can be further categorized into two types: Data Dependent Acquisition (DDA) and Data Independent Acquisition (DIA) (see Figures 10A and 10B).
- DDA (Data Dependent Acquisition) Principle: In the DDA mode, a full scan (MS1) is performed to record the parent ion information of peptides, capturing the mass-to-charge ratios (m/z) of the parent ions and ranking them by intensity from highest to lowest. The mass spectrometer selects the top N (typically 20 to 40) parent ions with the highest intensities for fragmentation in the second stage of mass spectrometry. The resulting fragment spectra are then searched against a protein database for protein identification.
- DIA (Data Independent Acquisition) Principle: In the DIA mode, a full scan (MS1) also records the parent ion information of peptides, but no individual parent ions are picked on the basis of intensity. Instead, the mass spectrometer divides the m/z scanning range into several windows and fragments all parent ions within each window, collecting the secondary spectra of all peptide ions for qualitative and quantitative analysis of proteins.
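The difference between the two acquisition modes can be made concrete with a toy precursor list. The sketch below is a simplification: it ignores dynamic exclusion, charge-state filtering, and overlapping or variable-width windows, all of which real instruments use. The function names and precursor values are made up for illustration.

```python
def dda_select(precursors, top_n):
    """DDA: pick the top_n most intense precursors from an MS1 scan.

    precursors is a list of (mz, intensity) tuples.
    """
    return sorted(precursors, key=lambda p: p[1], reverse=True)[:top_n]


def dia_windows(mz_start, mz_end, width):
    """DIA: tile the m/z range with fixed-width isolation windows."""
    windows, low = [], mz_start
    while low < mz_end:
        high = min(low + width, mz_end)
        windows.append((low, high))
        low = high
    return windows


def dia_assign(precursors, windows):
    """DIA: every precursor in a window is fragmented, whatever its intensity."""
    return {
        w: [p for p in precursors if w[0] <= p[0] < w[1]]
        for w in windows
    }


if __name__ == "__main__":
    # Hypothetical MS1 scan: (m/z, intensity).
    scan = [(450.2, 1e5), (612.8, 5e6), (733.4, 2e4), (455.9, 8e5)]
    print(dda_select(scan, top_n=2))
    windows = dia_windows(400, 800, 100)
    for window, members in dia_assign(scan, windows).items():
        print(window, members)
```

In this toy scan, DDA never fragments the two weakest precursors, whereas DIA fragments all four, at the cost of mixed fragment spectra that must be deconvoluted afterwards.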
Technical Advantages
- No Labeling Agents Required: This approach avoids the influence of labeling agents on protein structure and function, preserving the sample's native state.
- High Sensitivity: Enables the detection and quantification of low-abundance proteins.
- Wide Linear Range: Facilitates quantitative analysis of proteins across various concentration ranges.
- High Throughput: Allows simultaneous analysis of multiple samples.
Technical Applications
- Biomarker Discovery: By comparing the proteomes of samples from different physiological states or diseases, potential biomarkers can be identified, providing a basis for early diagnosis and treatment of diseases.
- Drug Target Identification: Research on drug-protein interactions helps identify the targets of drugs, guiding drug development.
- Protein Interaction Studies: Investigating protein interactions reveals the structure and function of protein networks, providing essential insights into biological processes.
- Protein Modification Research: Studying protein modifications uncovers mechanisms of functional regulation, offering explanations for the onset and progression of diseases.