
A Guide to Protein Database Selection

Proteomics identifies proteins by predicting spectra from the sequences in a protein database and comparing them with the spectra acquired by mass spectrometry. Protein databases are therefore the foundation of proteomic analysis, and their completeness and accuracy directly affect the quality of the final proteomic data.

Figure 1. The principle of protein identification
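The dependence on the database can be illustrated with a heavily simplified sketch. It assumes trypsin digestion (cleavage after K or R, but not before P) and matches on peptide mass alone, whereas real search engines also score fragment spectra. The toy sequences, the function names, and the 10 ppm tolerance are illustrative assumptions, not part of the workflow described here.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the residue sum of a free peptide


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def candidates(observed_mass, database, tol_ppm=10.0):
    """Return (protein_id, peptide) pairs whose mass matches within tol_ppm."""
    hits = []
    for protein_id, sequence in database.items():
        for peptide in tryptic_digest(sequence):
            mass = peptide_mass(peptide)
            if abs(mass - observed_mass) / mass * 1e6 <= tol_ppm:
                hits.append((protein_id, peptide))
    return hits


# Hypothetical two-protein database, for illustration only.
toy_db = {"TOY1": "AKRPGK", "TOY2": "GASPVK"}
hits = candidates(peptide_mass("AK"), toy_db)
misses = candidates(peptide_mass("MSLLK"), toy_db)
print(hits, misses)
```

A peptide whose sequence is absent from the database (here MSLLK) returns no candidates, which is why database completeness limits what can be identified.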

Human Proteome

Compared to other species, research on the human proteome is relatively well established, and the most commonly used databases are provided by UniProt. Within UniProt, there are three sub-databases dedicated to the human proteome: UniProtKB/Swiss-Prot (referred to as Swiss-Prot), Proteome (UP000005640), and UniProtKB (comprising Swiss-Prot and TrEMBL). These databases differ in protein count, accuracy, and annotation depth.

Database	Entry type	Total protein sequences	Unique protein sequences
UniProtKB	Swiss-Prot (Reviewed)	20,404	20,330
UniProtKB	TrEMBL (Unreviewed)	186,900	-
UniProtKB	Total	207,304	182,025
Proteome	Swiss-Prot (Reviewed)	20,389	-
Proteome	TrEMBL (Unreviewed)	61,448	-
Proteome	Total	81,837	81,579
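The gap between total and unique sequence counts in the table can be checked on any downloaded FASTA file. The sketch below assumes that "unique" means distinct amino acid sequences, which the table does not state explicitly. The toy records and function names are hypothetical.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_sequences(handle):
    """Return (total entries, distinct sequences) for a FASTA stream."""
    total, unique = 0, set()
    for _, sequence in read_fasta(handle):
        total += 1
        unique.add(sequence)
    return total, len(unique)


# Hypothetical records: TOY1 and TOY2 share one sequence (TOY1 is line-wrapped).
toy_fasta = StringIO(
    ">sp|X0001|TOY1\nMKTAYIAK\nQR\n"
    ">tr|X0002|TOY2\nMKTAYIAKQR\n"
    ">tr|X0003|TOY3\nMSLLK\n"
)
total, unique = count_sequences(toy_fasta)
print(f"total={total} unique={unique} redundancy={1 - unique / total:.1%}")
```

For a real file, the same function can be called on `open("path/to/database.fasta")`, where the path is whatever file was downloaded.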

Swiss-Prot stands out among these databases. It is a high-quality, manually curated, non-redundant database whose entries are drawn mainly from research findings in the literature and from computational analyses validated by E-value checks. Only entries that meet its quality standards are included, which is why Swiss-Prot is classed as reviewed.

UniProtKB/TrEMBL, in contrast, consists of protein sequences translated automatically from nucleotide sequences and then annotated and classified computationally rather than by manual curation. This database is categorized as unreviewed.

A Proteome contains the protein sequences translated and annotated from the nucleotide sequences of a species whose whole genome has been sequenced. Each proteome dataset is assigned a Unique Proteome Identifier (UPID), such as UP000005640 for human.

1) Comparison of Detected Protein Count

To assess the influence of different databases on the quality of proteomic data, Metware conducted searches using human cell proteomic data (divided into groups A and B, each with 4 replicates) across various databases. Subsequently, the qualitative and quantitative results of protein identification were evaluated.

Searches against the Swiss-Prot, Proteome, and UniProtKB databases identified 100,821, 100,723, and 99,597 peptides, respectively. The corresponding numbers of identified proteins were 7,798, 8,158, and 8,452. The proteins and peptides identified with all three databases accounted for 85.97% and 87.21% of the totals, respectively.
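A shared proportion of this kind can be computed from the identification lists of each search. The sketch below assumes the percentage is the number of identifiers found with every database divided by the number found with at least one, since the denominator is not stated in the text. The identifiers are made up for illustration.

```python
def shared_fraction(*id_sets):
    """Fraction of all identified IDs that were found in every search."""
    sets = [set(ids) for ids in id_sets]
    union = set().union(*sets)
    shared = set.intersection(*sets)
    return len(shared) / len(union) if union else 0.0


# Hypothetical protein identifiers from three database searches.
swiss_prot = {"P1", "P2", "P3", "P4"}
proteome = {"P1", "P2", "P3", "P5"}
uniprotkb = {"P1", "P2", "P3", "P5", "P6"}

fraction = shared_fraction(swiss_prot, proteome, uniprotkb)
print(f"shared by all three: {fraction:.2%}")
```

The same function applies to peptide sequences in place of protein identifiers.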

Figure 2. Differences in Protein Detection Data Across Different Databases for Cell Samples

2) Comparison of Missing Values

The missing values in the quantitative results were then compared across databases. The per-sample trend in missing values was consistent across the three databases, but Proteome and UniProtKB showed higher overall missing-value rates than Swiss-Prot, giving Swiss-Prot a slight advantage in this comparison.
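As a minimal sketch of how such rates can be derived, the example below counts missing intensities per sample and overall in a small quantification table. The table layout (protein mapped to a list of intensities, with None for a missing value) and all values are assumptions for illustration, not the data behind Figure 3.

```python
def missing_rate_per_sample(table, n_samples):
    """Fraction of proteins with no quantitative value, for each sample."""
    rates = []
    for j in range(n_samples):
        missing = sum(1 for row in table.values() if row[j] is None)
        rates.append(missing / len(table))
    return rates


def overall_missing_rate(table):
    """Fraction of missing cells over the whole protein-by-sample table."""
    cells = [value for row in table.values() for value in row]
    return sum(1 for value in cells if value is None) / len(cells)


# Hypothetical intensities for four proteins across four samples.
toy_table = {
    "P1": [1.0, 2.0, None, 4.0],
    "P2": [None, None, 3.0, 1.0],
    "P3": [5.0, 1.0, 2.0, 2.0],
    "P4": [None, 2.0, None, 1.0],
}

per_sample = missing_rate_per_sample(toy_table, n_samples=4)
overall = overall_missing_rate(toy_table)
print(per_sample, overall)
```

Running this once per database search gives the per-sample trend and the overall rate that can then be compared.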

Figure 3. Missing Values in Protein Detection across Different Databases for Cell Samples

Summary:

a) Compared to the Swiss-Prot database, searching against the Proteome and UniProtKB databases increased the number of identified proteins only modestly, by about 4.6% and 8.4%, respectively.

b) Compared to the Proteome and UniProtKB databases, the Swiss-Prot database contained the least sequence information and identified the fewest proteins. It nevertheless yielded the highest number of identified peptides, which suggests that the protein sequences in Swiss-Prot are more accurate and match the spectra better.

c) As the database size increased, the proportion of missing values among quantified proteins also increased, suggesting that many of the additionally identified proteins have low ion intensity and may be false positives introduced by the larger database.
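The arithmetic behind points a) and b) can be rechecked from the peptide and protein counts reported in section 1). The peptides-per-protein ratio is an illustrative metric added here, not one reported in the text. Values recomputed from the quoted counts may differ in the last digit from the percentages given above.

```python
# Counts as reported in the comparison of detected proteins.
results = {
    "Swiss-Prot": {"peptides": 100821, "proteins": 7798},
    "Proteome": {"peptides": 100723, "proteins": 8158},
    "UniProtKB": {"peptides": 99597, "proteins": 8452},
}


def relative_increase(value, baseline):
    """Percentage change of value relative to baseline."""
    return 100.0 * (value - baseline) / baseline


def peptides_per_protein(entry):
    """Average number of identified peptides per identified protein."""
    return entry["peptides"] / entry["proteins"]


baseline = results["Swiss-Prot"]["proteins"]
for name, entry in results.items():
    increase = relative_increase(entry["proteins"], baseline)
    ratio = peptides_per_protein(entry)
    print(f"{name}: {increase:+.2f}% proteins vs Swiss-Prot, "
          f"{ratio:.2f} peptides per protein")
```

The ratio falls as the database grows, which is consistent with the extra proteins being supported by fewer peptides each.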

In summary, while the Swiss-Prot database may have slightly fewer identified proteins, it offers superior qualitative accuracy and quantitative stability. Overall, for human cell samples, it is recommended to use the Swiss-Prot database for proteomic analysis.
