Multi-omics GSEA

Usage

WebGestaltRMultiOmicsGSEA(
  analyteLists = NULL,
  analyteListFiles = NULL,
  analyteTypes = NULL,
  enrichMethod = "GSEA",
  organism = "hsapiens",
  enrichDatabase = NULL,
  enrichDatabaseFile = NULL,
  enrichDatabaseType = NULL,
  enrichDatabaseDescriptionFile = NULL,
  collapseMethod = "mean",
  minNum = 10,
  maxNum = 500,
  fdrMethod = "BH",
  sigMethod = "fdr",
  fdrThr = 0.05,
  topThr = 10,
  reportNum = 100,
  setCoverNum = 10,
  perNum = 1000,
  gseaP = 1,
  isOutput = TRUE,
  outputDirectory = getwd(),
  projectName = NULL,
  dagColor = "binary",
  saveRawGseaResult = FALSE,
  gseaPlotFormat = "png",
  nThreads = 1,
  cache = NULL,
  hostName = "https://www.webgestalt.org/",
  useWeightedSetCover = TRUE,
  useAffinityPropagation = FALSE,
  usekMedoid = FALSE,
  kMedoid_k = 25,
  isMetaAnalysis = TRUE,
  mergeMethod = "mean",
  normalizationMethod = "rank",
  listNames = NULL
)

Arguments

analyteLists

A vector of the ID types of the corresponding interesting analyte lists. The supported ID types of WebGestaltR for the selected organism can be found by the function listIdType. If the organism is others, users do not need to set this parameter. The length of analyteTypes should be the same as the length of analyteListFiles or analyteLists.

analyteListFiles

If enrichMethod is ORA, the extension of the analyteListFiles should be txt and each file can only contain one column: the interesting analyte list. If enrichMethod is GSEA, the extension of the analyteListFiles should be rnk and the files should contain two columns separated by tab: the analyte list and the corresponding scores.
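
The two-column rnk layout described above can be produced with base R. This is a minimal sketch: the identifiers, scores and file names are hypothetical, and writing the files without a header row is an assumption rather than something this page states.

```r
# Hypothetical scores for two omics layers (IDs and values are illustrative).
rna <- data.frame(id = c("TP53", "EGFR", "MYC"), score = c(2.1, -0.7, 1.3))
prot <- data.frame(id = c("P04637", "P00533"), score = c(1.8, -1.2))

# Write a two-column, tab-separated file.
# Assumption: no header row, no quotes, no row names.
write_rnk <- function(df, path) {
  write.table(df, file = path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  path
}

out_dir <- tempdir()
analyte_files <- c(
  write_rnk(rna, file.path(out_dir, "rna.rnk")),
  write_rnk(prot, file.path(out_dir, "protein.rnk"))
)
```

The resulting character vector analyte_files has one path per omics layer, which is the shape analyteListFiles expects.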

analyteTypes

A vector containing the ID types of the analyte lists.

enrichMethod

Enrichment methods: ORA or GSEA.

organism

Currently, WebGestaltR supports 12 organisms. Users can use the function listOrganism to check available organisms. Users can also input others to perform the enrichment analysis for other organisms not supported by WebGestaltR. For other organisms, users need to provide the functional categories, interesting list and reference list (for ORA method). Because WebGestaltR does not perform the ID mapping for the other organisms, the above data should have the same ID type.

enrichDatabase

The functional categories for the enrichment analysis. Users can use the function listGeneSet to check the available functional databases for the selected organism. Multiple databases in a vector are supported for ORA and GSEA.

enrichDatabaseFile

Users can provide one or more GMT files as the functional category for enrichment analysis. The extension of the file should be gmt and the first column of the file is the category ID, the second one is the external link for the category. Genes annotated to the category are from the third column. All columns are separated by tabs. The GMT files will be combined with enrichDatabase.

enrichDatabaseType

The ID type of the genes in the enrichDatabaseFile. If users set organism as others, users do not need to set this ID type because WebGestaltR will not perform ID mapping for other organisms. The supported ID types of WebGestaltR for the selected organism can be found by the function listIdType.

enrichDatabaseDescriptionFile

Users can also provide description files for the custom enrichDatabaseFile. The extension of the description file should be des. The description file contains two columns: the first column is the category ID that should be exactly the same as the category ID in the custom enrichDatabaseFile and the second column is the description of the category. All columns are separated by tabs.
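
As an illustration of the gmt and des layouts described for enrichDatabaseFile and enrichDatabaseDescriptionFile, the sketch below writes a small matching pair of files with base R. The category IDs, links, gene symbols and file names are all hypothetical.

```r
# Hypothetical categories: an ID, an external link, and the annotated genes.
gene_sets <- list(
  SET_A = list(link = "http://example.org/SET_A",
               genes = c("TP53", "MDM2", "CDKN1A")),
  SET_B = list(link = "http://example.org/SET_B",
               genes = c("EGFR", "KRAS"))
)

# One GMT line per category: ID, link, then genes, all separated by tabs.
gmt_lines <- vapply(names(gene_sets), function(id) {
  paste(c(id, gene_sets[[id]]$link, gene_sets[[id]]$genes), collapse = "\t")
}, character(1))
gmt_file <- file.path(tempdir(), "custom.gmt")
writeLines(gmt_lines, gmt_file)

# Description file: category ID (identical to the GMT ID) and its description.
des <- data.frame(id = names(gene_sets),
                  description = c("Example set A", "Example set B"))
des_file <- file.path(tempdir(), "custom.des")
write.table(des, des_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```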

collapseMethod

The method to collapse duplicate IDs with scores. mean, median, min and max represent the mean, median, minimum and maximum of scores for the duplicate IDs.
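
The effect of the collapse options can be sketched with base R's aggregate. The helper collapse_scores and the toy data are hypothetical and only illustrate the idea; they are not the package's internal code.

```r
# Toy ranked list with duplicate IDs (values are illustrative).
scores <- data.frame(id = c("A", "B", "A", "C", "B"),
                     score = c(1, 4, 3, -2, 0))

# Collapse duplicate IDs to a single score using the chosen summary function.
collapse_scores <- function(df, method = c("mean", "median", "min", "max")) {
  method <- match.arg(method)
  aggregate(score ~ id, data = df, FUN = match.fun(method))
}

collapsed_mean <- collapse_scores(scores, "mean")
collapsed_max <- collapse_scores(scores, "max")
```

With these inputs, ID A has scores 1 and 3, so it collapses to 2 under mean and to 3 under max.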

minNum

WebGestaltR will exclude the categories with the number of annotated genes less than minNum for enrichment analysis. The default is 10.

maxNum

WebGestaltR will exclude the categories with the number of annotated genes larger than maxNum for enrichment analysis. The default is 500.
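
The combined effect of minNum and maxNum amounts to a size filter on the categories. The sketch below uses a hypothetical helper, filter_by_size, and counts every gene in each set; whether the package counts all annotated genes or only those present in the input list is not stated here, so treat that as an assumption.

```r
# Hypothetical gene sets of different sizes.
gene_sets <- list(
  small = paste0("g", 1:5),
  ok = paste0("g", 1:50),
  large = paste0("g", 1:600)
)

# Keep only sets whose size lies within [minNum, maxNum].
filter_by_size <- function(sets, minNum = 10, maxNum = 500) {
  sizes <- lengths(sets)
  sets[sizes >= minNum & sizes <= maxNum]
}

kept <- filter_by_size(gene_sets)
```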

fdrMethod

For the ORA method, WebGestaltR supports six FDR methods: holm, hochberg, hommel, bonferroni, BH and BY. The default is BH.
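
The listed names match the method names accepted by base R's p.adjust, so the behaviour of each option can be compared on a toy vector of p-values. That WebGestaltR delegates to p.adjust internally is an assumption; the p-values below are illustrative.

```r
# Illustrative raw p-values.
p <- c(0.001, 0.01, 0.02, 0.04, 0.5)

# Adjust the same p-values with each supported method for comparison.
methods <- c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY")
adjusted <- sapply(methods, function(m) p.adjust(p, method = m))

# One column per method, one row per input p-value.
print(round(adjusted, 4))
```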

sigMethod

Two methods of significance are available in WebGestaltR: fdr and top. With fdr, the enriched categories are those whose FDR passes the fdrThr threshold. With top, all categories are ranked by FDR and the top topThr categories are selected as the enriched categories. The default is fdr.

fdrThr

The significance threshold for the fdr method. The default is 0.05.

topThr

The threshold for the top method. The default is 10.
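
How sigMethod, fdrThr and topThr interact can be sketched as a small selection function. The helper select_enriched and the result table are hypothetical, and using a strict "less than" comparison against fdrThr is an assumption.

```r
# Hypothetical enrichment results.
res <- data.frame(geneSet = paste0("GS", 1:6),
                  FDR = c(0.2, 0.001, 0.04, 0.6, 0.03, 0.08),
                  stringsAsFactors = FALSE)

# Select enriched categories either by FDR threshold or by rank.
select_enriched <- function(res, sigMethod = c("fdr", "top"),
                            fdrThr = 0.05, topThr = 10) {
  sigMethod <- match.arg(sigMethod)
  ranked <- res[order(res$FDR), , drop = FALSE]
  if (sigMethod == "fdr") {
    ranked[ranked$FDR < fdrThr, , drop = FALSE]  # assumption: strict inequality
  } else {
    head(ranked, topThr)
  }
}

by_fdr <- select_enriched(res, "fdr")
by_top <- select_enriched(res, "top", topThr = 2)
```

Note that top always returns categories even when none passes the FDR threshold, which is the practical difference between the two modes.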

reportNum

The number of enriched categories visualized in the final report. The default is 100. A larger reportNum may be slow to render in the report.

setCoverNum

The number of expected gene sets after set cover to reduce redundancy. Fewer sets may be returned if the coverage reaches 100%. The default is 10.
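
To illustrate why fewer sets than setCoverNum can come back, here is a plain greedy set cover in base R. It is an unweighted simplification written for this page, not the weighted variant the package uses, and the function name greedy_set_cover and the toy sets are hypothetical.

```r
# Greedy set cover: repeatedly pick the set covering the most uncovered genes,
# stopping when everything is covered or topN sets have been chosen.
greedy_set_cover <- function(sets, topN = 10) {
  uncovered <- unique(unlist(sets))
  chosen <- character(0)
  while (length(uncovered) > 0 && length(chosen) < topN) {
    gain <- vapply(sets, function(s) sum(s %in% uncovered), integer(1))
    best <- names(gain)[which.max(gain)]
    if (gain[[best]] == 0L) break
    chosen <- c(chosen, best)
    uncovered <- setdiff(uncovered, sets[[best]])
  }
  chosen
}

sets <- list(A = c("g1", "g2", "g3"), B = c("g3", "g4"),
             C = c("g1", "g2"), D = c("g5"))
cover <- greedy_set_cover(sets, topN = 10)
```

Set C is never selected because A already covers its genes, so only three sets are returned even though topN is 10.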

perNum

The number of permutations for the GSEA method. The default is 1000.

gseaP

The exponential scaling factor of the phenotype score. The default is 1. When p=0, ES reduces to standard K-S statistics (See original paper for more details).
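
The role of the exponent can be seen in a short sketch of the weighted running-sum statistic as it is commonly described for GSEA. This is an illustrative base R version, not the package's implementation; the function name enrichment_score and the toy data are hypothetical.

```r
# Enrichment score for one gene set.
# scores: numeric score per gene; in_set: logical membership; p: exponent (gseaP).
enrichment_score <- function(scores, in_set, p = 1) {
  stopifnot(length(scores) == length(in_set), any(in_set), !all(in_set))
  ord <- order(scores, decreasing = TRUE)   # rank genes by score
  r <- abs(scores[ord])^p                   # hit weights; all equal to 1 when p = 0
  hit <- in_set[ord]
  p_hit <- cumsum(ifelse(hit, r, 0)) / sum(r[hit])
  p_miss <- cumsum(!hit) / sum(!hit)
  running <- p_hit - p_miss
  running[which.max(abs(running))]          # maximum deviation from zero
}

scores <- c(5, 4, 3, 2, 1, -1, -2, -3)
top_set <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
bottom_set <- rev(top_set)

es_top <- enrichment_score(scores, top_set, p = 1)
es_bottom <- enrichment_score(scores, bottom_set, p = 1)
es_unweighted <- enrichment_score(scores, top_set, p = 0)
```

With p = 0 every hit contributes equally, so the running sum depends only on gene ranks; with p = 1 hits are weighted by the magnitude of their scores.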

isOutput

If isOutput is TRUE, WebGestaltR will create a folder named after projectName and save the results in the folder. Otherwise, WebGestaltR will only return an R data.frame object containing the enrichment results. If hundreds of gene lists need to be analyzed simultaneously, it is better to set isOutput to FALSE. The default is TRUE.

outputDirectory

The output directory for the results.

projectName

The name of the project. If projectName is NULL, WebGestaltR will use a timestamp as the project name.

dagColor

If dagColor is binary, the significant terms in the DAG structure will be colored steel blue for the ORA method, or steel blue (positively related) and dark orange (negatively related) for the GSEA method. If dagColor is continuous, the significant terms in the DAG structure will be colored by a color gradient based on the corresponding FDRs.

saveRawGseaResult

Whether the raw result from GSEA is saved as an RDS file, which can be used for plotting. Defaults to FALSE. The list includes:

Enrichment_Results: A data frame of GSEA results with statistics
Running_Sums: A matrix of running sum of scores for each gene set
Items_in_Set: A list with ranks of genes for each gene set

gseaPlotFormat

The graphic format of GSEA enrichment plots. Either svg, png, or c("png", "svg"). The default is png.

nThreads

The number of cores to use for GSEA and set cover, and in batch function.

cache

A directory to save data cache for reuse. Defaults to NULL and disabled.

hostName

The server URL for accessing data. Mostly for development purposes.

useWeightedSetCover

Use weighted set cover for ORA. Defaults to TRUE.

useAffinityPropagation

Use affinity propagation for ORA. Defaults to FALSE.

usekMedoid

Use k-medoid for ORA. Defaults to FALSE.

kMedoid_k

The number of clusters for k-medoid. Defaults to 25.

isMetaAnalysis

Whether to perform meta-analysis. Defaults to TRUE.

mergeMethod

The method to merge the results from multiple omics (options: mean, max). Only used if isMetaAnalysis = FALSE. Defaults to mean.

normalizationMethod

The method to normalize the results from multiple omics (options: rank, median, mean). Only used if isMetaAnalysis = FALSE. Defaults to rank.

listNames

The names of the analyte lists.
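
Putting the arguments together, a call might look like the sketch below. The wrapper run_multiomics_gsea is hypothetical. It assumes that WebGestaltR is attached and that WebGestaltRMultiOmicsGSEA is visible on the search path. The database name "pathway_KEGG" and the ID types in the commented usage are also assumptions and should be checked with listGeneSet and listIdType.

```r
# Hypothetical wrapper: validates that the per-list arguments line up,
# then forwards them to the documented function.
run_multiomics_gsea <- function(analyte_files, analyte_types, list_names) {
  stopifnot(length(analyte_files) == length(analyte_types),
            length(analyte_files) == length(list_names))
  # Assumes library(WebGestaltR) has been attached by the caller.
  WebGestaltRMultiOmicsGSEA(
    analyteListFiles = analyte_files,
    analyteTypes = analyte_types,
    enrichMethod = "GSEA",
    organism = "hsapiens",
    enrichDatabase = "pathway_KEGG",  # assumption: verify with listGeneSet()
    isOutput = FALSE,                 # return results instead of writing a report
    isMetaAnalysis = TRUE,
    listNames = list_names
  )
}

# Example usage (file names and ID types are hypothetical):
# result <- run_multiomics_gsea(
#   analyte_files = c("rna.rnk", "protein.rnk"),
#   analyte_types = c("genesymbol", "uniprotswissprot"),
#   list_names = c("RNA", "Protein")
# )
```

Checking the vector lengths up front gives a clear error before any data is requested from the server.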