WebGestaltR: The R interface for enrichment analysis with WebGestalt.
Source: R/WebGestaltR-package.R, R/WebGestaltR.R, R/WebGestaltRBatch.R
WebGestaltR.Rd
Main function for enrichment analysis
Usage
WebGestaltR(
  enrichMethod = "ORA",
  organism = "hsapiens",
  enrichDatabase = NULL,
  enrichDatabaseFile = NULL,
  enrichDatabaseType = NULL,
  enrichDatabaseDescriptionFile = NULL,
  interestGeneFile = NULL,
  interestGene = NULL,
  interestGeneType = NULL,
  interestGeneNames = NULL,
  collapseMethod = "mean",
  referenceGeneFile = NULL,
  referenceGene = NULL,
  referenceGeneType = NULL,
  referenceSet = NULL,
  minNum = 10,
  maxNum = 500,
  sigMethod = "fdr",
  fdrMethod = "BH",
  fdrThr = 0.05,
  topThr = 10,
  reportNum = 20,
  perNum = 1000,
  gseaP = 1,
  isOutput = TRUE,
  outputDirectory = getwd(),
  projectName = NULL,
  dagColor = "continuous",
  saveRawGseaResult = FALSE,
  gseaPlotFormat = c("png", "svg"),
  setCoverNum = 10,
  networkConstructionMethod = NULL,
  neighborNum = 10,
  highlightType = "Seeds",
  highlightSeedNum = 10,
  nThreads = 1,
  cache = NULL,
  hostName = "https://www.webgestalt.org/",
  useWeightedSetCover = FALSE,
  useAffinityPropagation = FALSE,
  usekMedoid = TRUE,
  kMedoid_k = 25,
  ...
)
WebGestaltRBatch(
  interestGeneFolder = NULL,
  enrichMethod = "ORA",
  isParallel = FALSE,
  nThreads = 3,
  ...
)
Arguments
- enrichMethod
Enrichment methods: ORA, GSEA, or NTA.
- organism
Currently, WebGestaltR supports 12 organisms. Users can call the function listOrganism to check the available organisms. Users can also set organism to others to run the enrichment analysis for organisms not supported by WebGestaltR. In that case, users need to provide the functional categories, the interesting gene list, and the reference list (for the ORA method). Because WebGestaltR does not perform ID mapping for other organisms, all of these inputs must use the same ID type.
- enrichDatabase
The functional categories for the enrichment analysis. Users can call the function listGeneSet to check the available functional databases for the selected organism. Multiple databases in a vector are supported for ORA and GSEA.
- enrichDatabaseFile
Users can provide one or more GMT files as the functional categories for the enrichment analysis. The file extension should be gmt. The first column of the file is the category ID and the second is the external link for the category; the genes annotated to the category start from the third column. All columns are separated by tabs. The GMT files will be combined with enrichDatabase.
- enrichDatabaseType
The ID type of the genes in the enrichDatabaseFile. If organism is set to others, users do not need to set this ID type because WebGestaltR will not perform ID mapping for other organisms. The ID types supported by WebGestaltR for the selected organism can be found with the function listIdType.
- enrichDatabaseDescriptionFile
Users can also provide description files for the custom enrichDatabaseFile. The extension of the description file should be des. The description file contains two columns: the first is the category ID, which must exactly match the category ID in the custom enrichDatabaseFile, and the second is the description of the category. All columns are separated by tabs.
- interestGeneFile
If enrichMethod is ORA or NTA, the extension of the interestGeneFile should be txt and the file can contain only one column: the interesting gene list. If enrichMethod is GSEA, the extension of the interestGeneFile should be rnk and the file should contain two tab-separated columns: the gene list and the corresponding scores.
- interestGene
Users can also use an R object as the input. If enrichMethod is ORA or NTA, interestGene should be an R vector object containing the interesting gene list. If enrichMethod is GSEA, interestGene should be an R data.frame object containing two columns: the gene list and the corresponding scores.
- interestGeneType
The ID type of the interesting gene list. The ID types supported by WebGestaltR for the selected organism can be found with the function listIdType. If the organism is others, users do not need to set this parameter.
- interestGeneNames
The names of the ID lists for multi-omics data.
- collapseMethod
The method used to collapse duplicate IDs with scores. mean, median, min, and max represent the mean, median, minimum, and maximum of the scores for the duplicate IDs.
- referenceGeneFile
For the ORA method, users need to provide the reference gene list. The extension of the referenceGeneFile should be txt and the file can contain only one column: the reference gene list.
- referenceGene
For the ORA method, users can also use an R object as the reference gene list. referenceGene should be an R vector object containing the reference gene list.
- referenceGeneType
The ID type of the reference gene list. The ID types supported by WebGestaltR for the selected organism can be found with the function listIdType. If the organism is others, users do not need to set this parameter.
- referenceSet
Users can directly select the reference set from the existing platforms in WebGestaltR instead of providing it through referenceGeneFile. All platforms supported by WebGestaltR can be found with the function listReferenceSet. If referenceGeneFile and referenceGene are NULL, WebGestaltR will use the referenceSet as the reference gene set. Otherwise, WebGestaltR will use the user-supplied reference set for the enrichment analysis.
- minNum
WebGestaltR will exclude categories with fewer than minNum annotated genes from the enrichment analysis. The default is 10.
- maxNum
WebGestaltR will exclude categories with more than maxNum annotated genes from the enrichment analysis. The default is 500.
- sigMethod
Two methods of significance are available in WebGestaltR: fdr and top. fdr means the enriched categories are identified based on the FDR; top means all categories are ranked by FDR and the top categories are selected as the enriched categories. The default is fdr.
- fdrMethod
For the ORA method, WebGestaltR supports six FDR methods: holm, hochberg, hommel, bonferroni, BH, and BY. The default is BH.
- fdrThr
The significance threshold for the fdr method. The default is 0.05.
- topThr
The threshold for the top method. The default is 10.
- reportNum
The number of enriched categories visualized in the final report. The default is 20. A larger reportNum may make the report slow to render.
- perNum
The number of permutations for the GSEA method. The default is 1000.
- gseaP
The exponential scaling factor of the phenotype score. The default is 1. When gseaP is 0, the enrichment score (ES) reduces to the standard Kolmogorov-Smirnov statistic (see the original paper for more details).
- isOutput
If isOutput is TRUE, WebGestaltR will create a folder named after the projectName and save the results in it. Otherwise, WebGestaltR will only return an R data.frame object containing the enrichment results. If hundreds of gene lists need to be analyzed simultaneously, it is better to set isOutput to FALSE. The default is TRUE.
- outputDirectory
The output directory for the results.
- projectName
The name of the project. If projectName is NULL, WebGestaltR will use a timestamp as the project name.
- dagColor
If dagColor is binary, the significant terms in the DAG structure will be colored steel blue for the ORA method, or steel blue (positively related) and dark orange (negatively related) for the GSEA method. If dagColor is continuous, the significant terms in the DAG structure will be colored with a color gradient based on the corresponding FDRs.
- saveRawGseaResult
Whether the raw result from GSEA is saved as an RDS file, which can be used for plotting. Defaults to FALSE. The saved list includes:
- Enrichment_Results
A data frame of GSEA results with statistics
- Running_Sums
A matrix of running sum of scores for each gene set
- Items_in_Set
A list with ranks of genes for each gene set
- gseaPlotFormat
The graphic format of the GSEA enrichment plots. Either svg, png, or c("png", "svg") (default).
- setCoverNum
The expected number of gene sets after applying set cover to reduce redundancy. Fewer sets may be returned if the coverage reaches 100%. The default is 10.
- networkConstructionMethod
Network construction method for NTA. Either Network_Retrieval_Prioritization or Network_Expansion. Network Retrieval & Prioritization first uses random walk analysis to calculate random walk probabilities for the input seeds, then identifies the relationships among the seeds in the selected network and returns a retrieval sub-network. The seeds with the top random walk probabilities are highlighted in the sub-network. Network Expansion first uses random walk analysis to rank all genes in the selected network based on their network proximity to the input seeds and then returns an expanded sub-network in which the nodes are the input seeds and their top-ranking neighbors and the edges represent their relationships.
- neighborNum
The number of neighbors to include in the NTA Network Expansion method.
- highlightType
The type of nodes to highlight in the NTA Network Expansion method, either Seeds or Neighbors.
- highlightSeedNum
The number of top input seeds to highlight in the NTA Network Retrieval & Prioritization method.
- nThreads
The number of cores to use for GSEA and set cover, and in the batch function.
- cache
A directory in which to cache data for reuse. Defaults to NULL, which disables caching.
- hostName
The server URL for accessing data. Mostly for development purposes.
- useWeightedSetCover
Use weighted set cover for ORA. Defaults to FALSE.
- useAffinityPropagation
Use affinity propagation for ORA. Defaults to FALSE.
- usekMedoid
Use k-medoid for ORA. Defaults to TRUE.
- kMedoid_k
The number of clusters for k-medoid. Defaults to 25.
- ...
In the batch function, passes parameters to the WebGestaltR function. Also handles backward compatibility for some parameters from old versions.
- interestGeneFolder
Run WebGestaltR for gene list files in the folder.
- isParallel
Whether the jobs in the batch are run in parallel.
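The file formats described for enrichDatabaseFile, enrichDatabaseDescriptionFile, and interestGeneFile can be produced with base R. The following is a minimal sketch, not part of the package: the gene set IDs, link, gene symbols, scores, and file names are all made up, and it assumes that the des and rnk files carry no header row. The last step only illustrates what collapsing duplicate IDs by mean amounts to; WebGestaltR applies collapseMethod itself.

```r
# Sketch only: all IDs, symbols, scores, and file names below are hypothetical.
work_dir <- tempfile("webgestalt_inputs_")
dir.create(work_dir)

# GMT: category ID, external link, then the annotated genes (tab-separated).
gene_sets <- list(
  SET_A = c("TP53", "BRCA1", "ATM"),
  SET_B = c("EGFR", "KRAS")
)
gmt_lines <- vapply(names(gene_sets), function(id) {
  paste(c(id, "http://example.org", gene_sets[[id]]), collapse = "\t")
}, character(1))
gmt_file <- file.path(work_dir, "custom.gmt")
writeLines(gmt_lines, gmt_file)

# Description file (.des): category ID and description (assumed: no header).
des <- data.frame(
  id = names(gene_sets),
  description = c("Hypothetical set A", "Hypothetical set B")
)
des_file <- file.path(work_dir, "custom.des")
write.table(des, des_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# ORA/NTA input (.txt): a single column of gene IDs.
gene_file <- file.path(work_dir, "interest.txt")
writeLines(c("TP53", "EGFR", "ATM"), gene_file)

# GSEA input (.rnk): gene ID and score, tab-separated (assumed: no header).
scores <- data.frame(
  gene = c("TP53", "TP53", "EGFR", "KRAS"),
  score = c(2.0, 3.0, -0.7, 1.3)
)
rnk_file <- file.path(work_dir, "ranked.rnk")
write.table(scores, rnk_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# What collapsing duplicate IDs by "mean" amounts to: one score per gene.
collapsed <- aggregate(score ~ gene, data = scores, FUN = mean)
```

The resulting paths could then be passed as enrichDatabaseFile, enrichDatabaseDescriptionFile, and interestGeneFile.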
Value
The WebGestaltR function returns a data frame containing the enrichment analysis result and also outputs a user-friendly HTML report if isOutput is TRUE.
The columns in the data frame depend on the enrichMethod and are the following:
- geneSet
ID of the gene set.
- description
Description of the gene set if available.
- link
Link to the data source.
- size
The number of genes in the set after filtering by minNum and maxNum.
- overlap
The number of mapped input genes that are annotated in the gene set.
- expect
Expected number of input genes that are annotated in the gene set.
- enrichmentRatio
Enrichment ratio, overlap / expect.
- enrichmentScore
Enrichment score, the maximum running sum of scores for the ranked list.
- normalizedEnrichmentScore
Normalized enrichment score, normalized against the average enrichment score of all permutations.
- leadingEdgeNum
Number of genes/phosphosites in the leading edge.
- pValue
P-value from hypergeometric test for ORA. For GSEA, please refer to its original publication or online at https://software.broadinstitute.org/gsea/doc/GSEAUserGuideTEXT.htm.
- FDR
Corrected P-value for multiple testing with fdrMethod for ORA.
- overlapId
The gene/phosphosite IDs of overlap for ORA (Entrez gene IDs or phosphosite sequence).
- leadingEdgeId
Genes/phosphosites in the leading edge in entrez gene ID or phosphosite sequence.
- userId
The gene/phosphosite IDs of overlap for ORA or leadingEdgeId for GSEA, in the user's input ID type.
- plotPath
Path of the GSEA enrichment plot.
- database
Name of the source database if multiple enrichment databases are given.
- goId
In NTA, like geneSet, the enriched GO terms of genes in the returned subnetwork.
- interestGene
In NTA, the gene IDs in the subnetwork with 0/1 annotations indicating if it is from user input.
The WebGestaltRBatch function returns a list of enrichment results.
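The relationship between the ORA columns overlap, expect, enrichmentRatio, pValue, and FDR can be sketched with base R. This is an illustration under assumptions, not the package's code: the counts are invented, expect is taken to be the overlap expected under random sampling from the reference list, and the P-value is the usual upper-tail hypergeometric probability. The method names listed for fdrMethod coincide with methods accepted by base R's p.adjust; whether WebGestaltR calls p.adjust internally is an assumption.

```r
# Hypothetical counts, for illustration only.
reference_size <- 20000  # genes in the reference list
interest_size  <- 200    # mapped input genes found in the reference
set_size       <- 100    # genes annotated to the gene set ("size")
overlap        <- 5      # input genes annotated to the gene set

# Expected overlap under random sampling (assumed definition of "expect"),
# and the enrichment ratio, overlap / expect.
expect <- interest_size * set_size / reference_size
enrichment_ratio <- overlap / expect

# Upper-tail hypergeometric P-value: P(X >= overlap).
p_value <- phyper(overlap - 1, set_size, reference_size - set_size,
                  interest_size, lower.tail = FALSE)

# Multiple-testing adjustment across several gene sets; the other P-values
# are invented placeholders.
p_values <- c(p_value, 0.2, 0.04, 0.6)
fdr <- p.adjust(p_values, method = "BH")
```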
Details
The WebGestaltR function can perform three enrichment analyses: ORA (Over-Representation Analysis), GSEA (Gene Set Enrichment Analysis), and NTA (Network Topology Analysis). Based on the user-supplied gene list or gene list with scores, the WebGestaltR function first maps the gene list to Entrez gene IDs and then summarizes the gene list based on the GO (Gene Ontology) Slim. After performing the enrichment analysis, the WebGestaltR function also returns a user-friendly HTML report containing the GO Slim summary and the enrichment analysis result. If the functional categories have a DAG (directed acyclic graph) structure or the genes in the functional categories have a network structure, those relationships can also be visualized in the report.
Examples
if (FALSE) {
  ####### ORA example #########
  geneFile <- system.file("extdata", "interestingGenes.txt", package = "WebGestaltR")
  refFile <- system.file("extdata", "referenceGenes.txt", package = "WebGestaltR")
  outputDirectory <- getwd()
  enrichResult <- WebGestaltR(
    enrichMethod = "ORA", organism = "hsapiens",
    enrichDatabase = "pathway_KEGG", interestGeneFile = geneFile,
    interestGeneType = "genesymbol", referenceGeneFile = refFile,
    referenceGeneType = "genesymbol", isOutput = TRUE,
    outputDirectory = outputDirectory, projectName = NULL
  )

  ####### GSEA example #########
  rankFile <- system.file("extdata", "GeneRankList.rnk", package = "WebGestaltR")
  outputDirectory <- getwd()
  enrichResult <- WebGestaltR(
    enrichMethod = "GSEA", organism = "hsapiens",
    enrichDatabase = "pathway_KEGG", interestGeneFile = rankFile,
    interestGeneType = "genesymbol", sigMethod = "top", topThr = 10, minNum = 5,
    outputDirectory = outputDirectory
  )

  ####### NTA example #########
  enrichResult <- WebGestaltR(
    enrichMethod = "NTA", organism = "hsapiens",
    enrichDatabase = "network_PPI_BIOGRID", interestGeneFile = geneFile,
    interestGeneType = "genesymbol", sigMethod = "top", topThr = 10,
    outputDirectory = getwd(), highlightSeedNum = 10,
    networkConstructionMethod = "Network_Retrieval_Prioritization"
  )
}
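Because the result is an ordinary data frame, it can be post-processed with standard data frame operations. A minimal sketch follows, using a mock data frame in place of a real result: the column names come from the Value section, while the gene set IDs and numbers are invented.

```r
# Mock stand-in for an ORA result; the values are invented for illustration.
mock_result <- data.frame(
  geneSet = c("SET_1", "SET_2", "SET_3"),
  enrichmentRatio = c(3.2, 1.1, 4.5),
  FDR = c(0.01, 0.40, 0.03),
  stringsAsFactors = FALSE
)

# Keep gene sets passing an FDR cutoff, strongest enrichment first.
significant <- mock_result[mock_result$FDR <= 0.05, ]
significant <- significant[order(-significant$enrichmentRatio), ]
```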