WGCNA Explained: Analysis, Tutorial & Online Tools for Omics Research

Weighted Gene Co-expression Network Analysis (WGCNA) is a powerful method for identifying gene modules, relating them to traits, and discovering hub genes in large-scale omics datasets. This article serves as both an in-depth guide and a step-by-step tutorial for conducting WGCNA analysis—whether you use the R package or online platforms like Metware Cloud.

What is WGCNA?
When and why is WGCNA used?
Step-by-Step WGCNA Tutorial
How to interpret WGCNA results?
FAQs about WGCNA
WGCNA Online with Metware Cloud Platform

01 What is WGCNA?

WGCNA, short for Weighted Gene Co-expression Network Analysis, is a widely used method for analyzing gene co-expression networks. It is also known as weighted correlation network analysis. The method, most commonly run through the WGCNA R package, examines correlation structures in high-dimensional datasets such as gene expression and proteomics data. For a deeper dive into interpreting WGCNA in research articles, see our Understanding WGCNA Analysis in Publications.

02 When and why is WGCNA used?

WGCNA is suited to complex transcriptome datasets with many samples, particularly studies of developmental regulation across different organs, tissues, and stages. It is also used to investigate response mechanisms to biotic and abiotic stresses at various time points, and differential network analysis can identify changes in connectivity patterns under varying conditions. Common scenarios include linking gene modules to phenotypic traits, identifying key metabolic regulators, and exploring developmental or stress-response networks. WGCNA is also increasingly applied in proteomics, metabolomics, and multi-omics integration to reveal biological mechanisms.

03 Step-by-Step WGCNA Tutorial

Data preparation & quality control – Ensure your expression matrix passes basic QC checks, removing low-expression genes and problematic samples.
Choosing soft-threshold power – Select β using the scale-free topology criterion; the chosen power controls how strongly weak correlations are suppressed when the network is built.
Constructing the network & detecting modules – Build the adjacency matrix, transform to TOM, and identify modules via dynamic tree cutting.
Relating modules to traits – Calculate module eigengenes and correlate them with phenotypic data to find biologically relevant modules.
Identifying hub genes – Use connectivity measures (kME, TOM) or literature evidence to find key regulators.
Functional enrichment & visualization – Apply GO/KEGG enrichment and create visual module-trait maps.
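The network-construction step in the list above can be sketched in a few lines. This is a minimal, dependency-free Python illustration and not the WGCNA R package's implementation. It assumes an unsigned network, a toy expression matrix with made-up values (rows are genes, columns are samples), and β = 6 chosen purely for demonstration; the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Division fails for constant genes; those should be removed during QC.
    return sxy / math.sqrt(sxx * syy)

def soft_threshold_adjacency(expr, beta):
    """Unsigned adjacency: a_ij = |cor(gene_i, gene_j)| ** beta."""
    n = len(expr)
    return [[abs(pearson(expr[i], expr[j])) ** beta for j in range(n)]
            for i in range(n)]

def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    # Connectivity k_i: summed adjacency to all other genes.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]  # diagonal stays at 1
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom

# Toy data (invented values): 4 genes measured in 5 samples.
expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.5],
    [5.0, 3.0, 4.0, 1.0, 2.0],
    [2.0, 1.0, 4.0, 3.0, 6.0],
]
adj = soft_threshold_adjacency(expr, beta=6)
tom = topological_overlap(adj)
for row in tom:
    print(["%.3f" % value for value in row])
```

Raising the correlation to a power keeps strong correlations near 1 while pushing weak ones toward 0, and the TOM step then rewards gene pairs that also share neighbors. Modules are typically detected on the dissimilarity 1 - TOM.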

For a code-free experience, try the Metware Cloud WGCNA Online Analysis Tool.

04 How to interpret WGCNA results?

4.1 Identifying gene co-expression network sets

Based on pairwise correlations of gene expression across all samples, genes with similar expression patterns are grouped into modules. This condenses thousands of differentially expressed genes into a manageable number of modules, typically a dozen or more. Using transcriptome data from tomato fruit development research as an example, WGCNA identified 12 gene modules. The representation commonly used in the literature is depicted in the figure below: the upper part is a hierarchical clustering dendrogram in which each leaf represents a gene, and the color bands below indicate the module to which each gene is assigned.

Consider using interactive visualization tools for WGCNA to explore module relationships, especially in large datasets. Such tools can help zoom into specific branches, inspect module membership, and link module eigengenes to traits in real time, making it easier to identify biologically meaningful patterns.
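To make the idea of grouping genes into modules concrete, here is a small Python sketch. WGCNA detects modules with dynamic tree cutting; the sketch deliberately swaps in a simpler technique, average-linkage clustering with a fixed cut height, so it only approximates the idea. The dissimilarity matrix (standing in for 1 - TOM), the cut height of 0.5, and the function name are all invented for illustration.

```python
def average_linkage_modules(dissim, cut_height):
    """Merge the closest clusters until the next merge would exceed cut_height."""
    clusters = [[i] for i in range(len(dissim))]

    def distance(a, b):
        # Average pairwise dissimilarity between two clusters.
        return sum(dissim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, ia, ib = min(
            (distance(a, b), ia, ib)
            for ia, a in enumerate(clusters)
            for ib, b in enumerate(clusters)
            if ia < ib
        )
        if best > cut_height:
            break  # remaining clusters are too dissimilar to merge
        merged = clusters[ia] + clusters[ib]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (ia, ib)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)

# Hypothetical dissimilarities for 5 genes: genes 0-2 are close, genes 3-4 are close.
dissim = [
    [0.00, 0.10, 0.20, 0.85, 0.90],
    [0.10, 0.00, 0.15, 0.80, 0.88],
    [0.20, 0.15, 0.00, 0.82, 0.86],
    [0.85, 0.80, 0.82, 0.00, 0.12],
    [0.90, 0.88, 0.86, 0.12, 0.00],
]
modules = average_linkage_modules(dissim, cut_height=0.5)
print(modules)
```

A fixed cut height treats every branch of the dendrogram the same way, whereas dynamic tree cutting adapts to the shape of each branch. That is one reason the latter is preferred on real data.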

4.2 Filtering key modules for functional enrichment analysis

Method 1: Filtering based on module characteristic expression patterns.

After identifying modules, WGCNA calculates a module eigengene for each module: the first principal component of the module's expression matrix, which summarizes the expression profile of all genes in the module. Examining eigengene values across samples helps identify modules closely associated with particular samples. For instance, in the tomato data, the "brown" module shows high (positive) eigengene values in samples from the first period, making it a key module for subsequent analysis. Calculating eigengenes requires the gene expression data as input.
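As a rough illustration of what a module eigengene is, the following Python sketch computes the first principal component of a small module by power iteration. It is a simplified stand-in, not the R package's routine: the expression values are invented, the function names are hypothetical, and the sign convention (aligning the eigengene with the module's average standardized expression) is an assumption made for readability.

```python
import math

def zscore(values):
    """Standardize one gene across samples (mean 0, sample SD 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def module_eigengene(module_expr, n_iter=200):
    """Leading sample-space singular vector of the standardized module matrix."""
    z = [zscore(gene) for gene in module_expr]
    n_genes, n_samples = len(z), len(z[0])
    # Start from a gene profile; a constant vector would be mapped to zero.
    vec = list(z[0])
    for _ in range(n_iter):
        # One power-iteration step on Z^T Z: project onto genes, then back.
        scores = [sum(z[g][s] * vec[s] for s in range(n_samples))
                  for g in range(n_genes)]
        vec = [sum(scores[g] * z[g][s] for g in range(n_genes))
               for s in range(n_samples)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    # Orient the eigengene so it follows the module's average expression.
    mean_profile = [sum(z[g][s] for g in range(n_genes)) / n_genes
                    for s in range(n_samples)]
    if sum(a * b for a, b in zip(vec, mean_profile)) < 0:
        vec = [-v for v in vec]
    return vec

# Invented module: 3 genes that all rise across 5 samples.
module_expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.5],
    [2.0, 1.0, 4.0, 3.0, 6.0],
]
eigengene = module_eigengene(module_expr)
print(["%.3f" % v for v in eigengene])
```

The result has one value per sample, so a positive value means the module as a whole is expressed above its average in that sample.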

Figure: Correlation heatmap between samples and modules

Method 2: Filtering through module-sample or module-phenotype correlation analysis. Calculating the correlation coefficient between each module and the sample or phenotype data identifies modules that are highly correlated with specific samples or phenotypes. In the tomato data, the strong correlation between JS3 and the "pink" module suggests that this module deserves special attention. If phenotype measurements are available for tomato fruit development, such as lycopene content, the modules with the highest correlation to lycopene content can also be selected.
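The module-trait correlation described above can be sketched as follows. Everything in this example is hypothetical: the eigengene values, the lycopene measurements, and the function names are made up for illustration. The p-value uses a Fisher z-transform approximation to stay within the Python standard library, so it may differ slightly from a Student t based p-value.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_p_value(r, n_samples):
    """Approximate two-sided p-value for a correlation via Fisher's z-transform."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n_samples - 3)
    return math.erfc(abs(z) / math.sqrt(2))

# Invented trait values (e.g. lycopene content) for 8 samples.
lycopene = [1.0, 1.5, 2.1, 3.0, 4.2, 5.1, 6.3, 7.0]

# Invented module eigengenes for the same 8 samples.
eigengenes = {
    "pink": [-0.40, -0.33, -0.21, -0.08, 0.05, 0.20, 0.35, 0.42],
    "brown": [0.30, -0.20, 0.10, -0.35, 0.25, -0.10, 0.15, -0.15],
}

results = {}
for module, values in eigengenes.items():
    r = pearson(values, lycopene)
    results[module] = (r, fisher_p_value(r, len(lycopene)))
    print("%s: r = %.3f, p = %.3g" % (module, *results[module]))
```

In a heatmap built from such a table, a module like the invented "pink" one would appear as a strongly colored cell with a small p-value, while the "brown" one would be pale.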

Method 3: Filtering through module gene function enrichment. Conducting functional enrichment analysis, such as Gene Ontology (GO) enrichment, for each module helps identify modules that correspond to biological processes related to the traits of interest. For example, in tomato fruit development, processes such as carotenoid metabolism and ethylene signaling are relevant to fruit ripening, so modules enriched in these GO terms deserve attention. Additionally, the connectivity distribution of the entire network can be plotted on logarithmic axes, with the y-axis showing the logarithm of the frequency of each connectivity value; an approximately linear trend indicates scale-free topology.
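One common way to test whether a module is enriched in a GO term is a one-sided hypergeometric (over-representation) test. The sketch below is a generic illustration rather than the procedure of any particular enrichment tool, and all gene counts are invented. Real analyses also correct the p-values for multiple testing across terms.

```python
from math import comb

def enrichment_p_value(n_background, n_annotated, n_module, n_overlap):
    """P(overlap >= n_overlap) when n_module genes are drawn at random.

    n_background: genes in the background set
    n_annotated:  background genes annotated to the GO term
    n_module:     genes in the module
    n_overlap:    module genes annotated to the GO term
    """
    upper = min(n_annotated, n_module)
    # Count the gene sets with at least n_overlap annotated genes exactly,
    # using integers, and divide once at the end.
    favorable = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_module - i)
        for i in range(n_overlap, upper + 1)
    )
    return favorable / comb(n_background, n_module)

# Invented counts: 20000 background genes, 150 annotated to the term,
# a module of 400 genes. About 3 annotated genes are expected by chance.
p_enriched = enrichment_p_value(20000, 150, 400, 12)
p_expected = enrichment_p_value(20000, 150, 400, 3)
print("overlap 12: p = %.3g" % p_enriched)
print("overlap  3: p = %.3g" % p_expected)
```

An overlap close to the chance expectation gives a large p-value, while an overlap several times larger gives a small one. This is what marks a term as enriched in the module.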

Method 4: Filtering modules through target gene selection. Based on research objectives, previous findings, and published literature, modules containing target genes of interest can be selected directly for further analysis. In the tomato fruit development data, key pectin-degradation genes such as PG2A and PL1 are found in the "yellow" module, making it a candidate for further investigation.

4.3 Identifying key genes

After narrowing down to candidate modules through the analyses above, the next step is to examine the internal composition of each module and identify its key genes, often referred to as hub genes. This is done by analyzing intramodular connectivity (TOM values, kME, or kIM) and selecting the genes with the highest connectivity in the network. RNA-Seq datasets from the Gene Expression Omnibus (GEO) are a valuable source for such transcriptomics research, providing data for many species and biological sample groups. Attention can also be directed toward genes with regulatory functions, such as transcription factors, as they generally act upstream in the module's regulatory network.

How do I find hub genes? Hub genes are typically those with high connectivity within a module (e.g., high kME or TOM values) and are often biologically central. Combining statistical measures with literature evidence improves reliability, and follow-up experimental validation, such as RT-qPCR, is recommended for candidate hub genes.
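Ranking genes by kME, the correlation between each gene's expression and its module eigengene, might look like the sketch below. The gene names, expression values, and eigengene are invented, and the |kME| > 0.8 cutoff is an assumed convention rather than a fixed rule. Adjust it to your data.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_by_kme(module_expr, eigengene):
    """Return (gene, kME) pairs sorted by decreasing |kME|."""
    kme = {gene: pearson(values, eigengene)
           for gene, values in module_expr.items()}
    return sorted(kme.items(), key=lambda item: abs(item[1]), reverse=True)

# Invented eigengene and expression profiles for 5 samples.
eigengene = [-0.6, -0.3, 0.0, 0.3, 0.6]
module_expr = {
    "gene_c": [3.0, 3.2, 2.9, 3.1, 3.0],
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_b": [2.0, 1.0, 4.0, 3.0, 6.0],
}

ranked = rank_by_kme(module_expr, eigengene)
KME_CUTOFF = 0.8  # assumed threshold, not prescribed by the method
hubs = [gene for gene, kme in ranked if abs(kme) > KME_CUTOFF]
for gene, kme in ranked:
    print("%s: kME = %.3f" % (gene, kme))
print("hub candidates:", hubs)
```

Candidates from such a ranking are a starting point. Cross-checking them against known regulators, such as transcription factors, helps prioritize genes for validation.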

05 FAQs about WGCNA

Q1: What is WGCNA used for?

To detect clusters (modules) of highly correlated genes and relate them to external traits.

Q2: How to choose soft-threshold power?

Use the scale-free topology fit index plot; pick the lowest β that reaches R² ≥ 0.8.
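The selection rule can be sketched in Python as below. The first function is a simplified version of a scale-free fit index: it bins connectivity values, regresses log10 frequency on log10 mean connectivity, and returns R² signed so that a decreasing trend is positive. Empty bins are skipped, which is a simplification. The binning, function names, and the table of fit values are assumptions for illustration, not output from the WGCNA R package.

```python
import math

def signed_scale_free_r2(connectivity, n_bins=10):
    """R^2 of log10(p(k)) vs log10(k); positive when frequency falls with k."""
    lo, hi = min(connectivity), max(connectivity)
    if hi == lo:
        raise ValueError("connectivity values are all identical")
    width = (hi - lo) / n_bins
    counts, totals = [0] * n_bins, [0.0] * n_bins
    for k in connectivity:
        b = min(int((k - lo) / width), n_bins - 1)
        counts[b] += 1
        totals[b] += k
    # One point per non-empty bin: (log mean connectivity, log frequency).
    points = [(math.log10(t / c), math.log10(c / len(connectivity)))
              for c, t in zip(counts, totals) if c > 0]
    xs, ys = zip(*points)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    r2 = sxy * sxy / (sxx * syy)
    return r2 if sxy < 0 else -r2

def pick_power(fit_by_power, threshold=0.8):
    """Lowest power whose fit index reaches the threshold, or None."""
    passing = [beta for beta, r2 in sorted(fit_by_power.items())
               if r2 >= threshold]
    return passing[0] if passing else None

# Synthetic connectivities following p(k) proportional to k^-2.
demo_connectivity = [1.0] * 64 + [2.0] * 16 + [4.0] * 4 + [8.0]
demo_r2 = signed_scale_free_r2(demo_connectivity)

# Hypothetical fit indices for a range of candidate powers.
fit_by_power = {1: 0.12, 2: 0.35, 4: 0.62, 6: 0.81, 8: 0.88, 10: 0.90}
chosen = pick_power(fit_by_power)
print("demo fit index: %.3f, chosen power: %s" % (demo_r2, chosen))
```

Preferring the lowest passing power keeps as much connectivity in the network as possible. If no power reaches the threshold, that is usually a signal to re-check sample quality or batch effects rather than to push β higher.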

Q3: Can I run WGCNA online without coding?

Yes. Platforms like Metware Cloud provide automated workflows.

Q4: How to interpret module-trait correlation heatmaps?

Red/blue intensity reflects strength and direction of correlation; focus on modules with high |r| and low p-values.

Q5: How to find hub genes in WGCNA?

Filter by high connectivity and trait correlation; validate with literature and biological experiments.

06 WGCNA Online with Metware Cloud Platform

MetwareBio’s Metware Cloud Platform delivers a no-code, online WGCNA workflow — from data upload to network construction, module detection, trait correlation, and interactive visualization. With an intuitive interface and fast processing, you can run WGCNA without installing R or writing scripts, focusing on biological insights instead of technical hurdles.

Watch the video to see how Metware Cloud streamlines WGCNA analysis, helping you visualize gene co-expression networks and connect them to phenotypic traits in just a few clicks.

Conclusion

WGCNA empowers researchers to decode complex biological networks and link them to phenotypic traits. Whether you prefer the flexibility of the WGCNA R package or the convenience of an online platform, the methods here can help streamline your workflow.

MetwareBio also offers LC-MS/MS detection and multi-omics data analysis services, including proteomics, transcriptomics, metabolomics, and multi-omics. Learn how our solutions can advance your research.
