Contents

  1. Overview
  2. Select your input type
  3. Enter your data set
  4. Choose the analysis
  5. Configure the analysis
  6. Results
  7. Footnotes

Overview

Use the Set Analyzer tool to perform analyses such as set-based enrichment for collections of chemicals or genes, and pathway generation for collections of genes.

Select your input type

Chemicals

Select this option to analyze a set of chemicals that you enter in Step 2. You may specify chemicals by MeSH® name, synonym, or accession ID (“MESH:…”), or by CAS RN.

Genes

Select this option to analyze a set of genes that you enter in Step 2. You may specify genes by NCBI symbols or accession IDs (“GENE:…”).

Enter your data set

Enter or paste the names/symbols or accession IDs (see above) for your set of chemicals or genes.

You may separate terms by returns, tabs, or vertical bars (|).
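If you build your input list in a script before pasting it, the separator rule above can be sketched as follows. This is only an illustration: the function name `parse_terms` is hypothetical, and trimming whitespace and dropping blanks and duplicates are assumptions, not documented behavior of the tool.

```python
import re

def parse_terms(raw: str) -> list[str]:
    """Split pasted input on returns, tabs, or vertical bars."""
    parts = re.split(r"[\r\n\t|]+", raw)
    seen = set()
    terms = []
    for part in parts:
        term = part.strip()
        # Assumption: blank entries and repeated terms are dropped.
        if term and term not in seen:
            seen.add(term)
            terms.append(term)
    return terms

example = "BRCA1|EGFR\nGENE:7157\tMYC\r\n\nEGFR"
print(parse_terms(example))  # ['BRCA1', 'EGFR', 'GENE:7157', 'MYC']
```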

Choose the analysis

Enriched diseases

Displays the diseases (MEDIC terms) that are statistically enriched among your input genes/proteins. A disease is considered enriched if the proportion of genes annotated to it in a test set is significantly larger than the proportion of all genes annotated to it in the genome.

Many of the genes/proteins with curated chemical interactions in CTD are associated with human diseases. To provide insight into the diseases that may be influenced by a particular chemical, for example, you can run this analysis using the genes/proteins that have curated interactions with that chemical.

Enriched GO functional annotations

Displays the GO terms that are statistically enriched among your input genes/proteins. The enrichment calculations consider only human GO annotations. A GO term is considered enriched if the proportion of genes annotated to it in a test set is significantly larger than the proportion of all genes annotated to it in the genome.

Many genes/proteins with curated chemical interactions in CTD have GO annotations that provide information about their associated biological processes, molecular functions, and cellular components. To provide insight into the biological properties that may be affected by a chemical, for example, you can run this analysis using the genes/proteins that have curated interactions with that chemical.

Enriched pathways

Displays the pathways that are statistically enriched among your input genes/proteins. A pathway is considered enriched if the proportion of genes annotated to it in a test set is significantly larger than the proportion of all genes annotated to it in the genome.

Many of the genes/proteins with curated chemical interactions in CTD are represented in KEGG and REACTOME pathway maps that represent molecular interaction and reaction networks. To provide insight into the pathways and networks that may be affected by a particular chemical, for example, you can run this analysis using the genes/proteins that have curated interactions with that chemical.

Common gene–gene interactions

Displays the gene–gene and protein–protein interactions (from BioGRID) among your input genes/proteins.

CTD represents gene–gene interactions from BioGRID[2] that consist of genetic and protein interactions curated from primary literature for all major model organisms by BioGRID curators.
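Conceptually, this analysis keeps the known interactions whose two partners both appear in your input set. A minimal sketch, assuming interactions are available as (source, target) pairs; the function name and the gene names are placeholders, not real BioGRID records or the tool's actual implementation:

```python
def common_interactions(input_genes, interactions):
    """Keep only interactions whose source and target are both in the input set."""
    wanted = set(input_genes)
    return [(source, target) for source, target in interactions
            if source in wanted and target in wanted]

# Toy data: placeholder names, not real BioGRID records.
known = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"), ("GENE_C", "GENE_D")]
print(common_interactions(["GENE_A", "GENE_B", "GENE_C"], known))
# [('GENE_A', 'GENE_B'), ('GENE_B', 'GENE_C')]
```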

Configure the analysis

P-value

For enrichment analyses, you must specify a significance threshold—a corrected or raw p-value—for the results. Only enriched terms with a p-value less than your setting will appear.

We recommend that you interpret the significance of whether a particular disease, GO term, or pathway is enriched based on the corrected p-value. By default, the tool uses a corrected p-value threshold of 0.01.

The significance of enrichment is calculated using the hypergeometric distribution and adjusted for multiple testing using the Bonferroni method.[1] Depending on whether you selected diseases, GO terms, or pathways, Set Analyzer iterates over the diseases, GO terms, or pathways annotated to the gene set. For each one, the hypergeometric distribution gives the probability of observing, by chance alone, a fraction of genes in your set annotated to that disease, GO term, or pathway at least as large as the one actually observed, given the fraction of all human genes annotated to it in the genome.
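As an illustration of the calculation described above, the raw p-value for one term can be sketched as the upper tail of a hypergeometric distribution. The function and parameter names are hypothetical, and the use of the upper tail (at least as many annotated genes as observed) is an assumption based on standard enrichment practice; this is not the tool's own code.

```python
from math import comb

def hypergeom_upper_tail(set_hits: int, set_size: int,
                         genome_hits: int, genome_size: int) -> float:
    """P(X >= set_hits) when set_size genes are drawn without replacement
    from genome_size genes, genome_hits of which carry the annotation."""
    total = comb(genome_size, set_size)
    most_possible = min(set_size, genome_hits)
    favorable = sum(
        comb(genome_hits, i) * comb(genome_size - genome_hits, set_size - i)
        for i in range(set_hits, most_possible + 1)
    )
    return favorable / total

# Toy numbers: 10 genes in the genome, 4 annotated; a set of 3 genes, 2 annotated.
print(round(hypergeom_upper_tail(2, 3, 4, 10), 4))  # 0.3333
```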

For a particular disease, GO term, or pathway, the probability from the hypergeometric distribution is the raw p-value. As with any enrichment analysis, the raw p-value needs to be corrected for multiple testing as the number of false positives is proportional to the number of enrichment tests performed and the raw p-value threshold applied. The most conservative approach for multiple testing correction is the Bonferroni method where the raw p-value is corrected by multiplying it by the number of tests. In this case, the number of tests is the number of diseases, GO terms, or pathways that are annotated to one or more genes in your input list. Corrected p-values greater than 1.0 are displayed as 1.0.
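The Bonferroni step described above can be sketched in a few lines. The function name, the term labels, and the raw p-values are made up for illustration; the dictionary is assumed to hold one entry for every term annotated to at least one gene in the input list, so its size is the number of tests.

```python
def bonferroni(raw_p_values: dict[str, float]) -> dict[str, float]:
    """Multiply each raw p-value by the number of tests, capping at 1.0."""
    n_tests = len(raw_p_values)
    return {term: min(1.0, p * n_tests) for term, p in raw_p_values.items()}

# Made-up raw p-values for three terms (so the number of tests is 3).
raw = {"term A": 0.0004, "term B": 0.02, "term C": 0.6}
corrected = bonferroni(raw)

# Keep only terms whose corrected p-value is below the default threshold.
threshold = 0.01
enriched = [term for term, p in corrected.items() if p < threshold]
print(enriched)  # ['term A']
```

With three tests, "term B" moves from 0.02 to 0.06 and no longer passes the 0.01 threshold, and "term C" (0.6 × 3 = 1.8) is reported as 1.0.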

Results

Enrichment analyses

For each enriched disease, GO term, or pathway with a p-value less than the value you specified, the following information is displayed:

  1. The name of the disease, GO term, or pathway, linked to additional information about it.
  2. The accession ID of the disease, GO term, or pathway.
  3. The raw p-value.
  4. The corrected p-value calculated using the Bonferroni multiple testing adjustment (see P-value, above).
  5. The number of genes in your set annotated to the disease, GO term, or pathway, or to its descendants. Each number is linked to a list of those genes.
  6. The genome frequency: the fraction of genes in the genome annotated to the disease, GO term, or pathway.

Common gene–gene interactions

Pathway View

You may visualize the pathways of interacting genes by either:

  1. clicking the "Show graph" link or icon to use our online visualization (your browser must support both JavaScript and Flash), or
  2. downloading the XGMML file into Cytoscape (see below).

The XGMML file can be viewed using Cytoscape, an open-source pathway visualization application:

  1. Download and install Cytoscape.
  2. Launch Cytoscape.
  3. Import the downloaded XGMML file by going to File > Import > Network (Multiple File Types) > Local (or Ctrl+L) and selecting your file. See also: Cytoscape Manual
  4. Use the Layout menu to view different arrangements of the network. We suggest the Cytoscape > Force-Directed Layout as a starting point.
  5. To view additional details about the nodes and edges within a pathway, select the Node or Edge Attribute Browser and click the Select All Attributes button in the Data Panel.
    Cytoscape Data Panel: Select All Attributes button
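If you prefer to inspect the downloaded file in a script rather than in Cytoscape, XGMML is plain XML and can be read with a standard parser. The sketch below uses a tiny hand-written sample; the node and edge attributes that the actual download contains are not described here, so the sample content, the placeholder gene names, and the function name are assumptions.

```python
import xml.etree.ElementTree as ET

# XGMML elements live in this XML namespace.
XGMML_NS = {"x": "http://www.cs.rpi.edu/XGMML"}

# Hand-written toy sample, not an actual download from the tool.
SAMPLE = """<graph label="toy network" xmlns="http://www.cs.rpi.edu/XGMML">
  <node id="1" label="GENE_A"/>
  <node id="2" label="GENE_B"/>
  <edge source="1" target="2" label="GENE_A interacts with GENE_B"/>
</graph>"""

def read_xgmml(text: str):
    """Return node labels keyed by id, and edges as (source label, target label)."""
    root = ET.fromstring(text)
    labels = {node.get("id"): node.get("label")
              for node in root.findall("x:node", XGMML_NS)}
    edges = [(labels[edge.get("source")], labels[edge.get("target")])
             for edge in root.findall("x:edge", XGMML_NS)]
    return labels, edges

labels, edges = read_xgmml(SAMPLE)
print(edges)  # [('GENE_A', 'GENE_B')]
```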

Tabular View

The following data are presented for each gene/protein interaction:

  1. The source gene in the interaction.
  2. The target gene in the interaction.
  3. The organism of the target gene.
  4. The organism of the source gene.
  5. The experimental system (assay) in which the interaction was recorded.
  6. The type of the assay.
  7. The throughput level(s) of the interaction.
  8. The reference supporting the interaction (if not a reference detail view).

Sorting

Sort these data differently by clicking a column heading.

Download

Save these data into a comma-separated values (CSV), Excel, XML, or tab-separated values (TSV) file by clicking a Download link at the bottom of the table.

Footnotes

[1]
Boyle EI, Weng S, Gollub J, Jin H, Botstein D, Cherry JM, Sherlock G. GO::TermFinder—open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics. 2004 Dec 12;20(18):3710-5. PMID:15297299
[2]
Chatr-Aryamontri A, Breitkreutz BJ, Heinicke S, Boucher L, Winter A, Stark C, Nixon J, Ramage L, Kolas N, O'Donnell L, Reguly T, Breitkreutz A, Sellam A, Chen D, Chang C, Rust J, Livstone M, Oughtred R, Dolinski K, Tyers M. The BioGRID interaction database: 2013 update. Nucleic Acids Res. 2013 Jan;41(Database issue):D816-23. PMID:23203989