What's in a GeneCard?


This page provides information about the various GeneCards sections and tables.

General Comments

  • The sections that follow are linked to from the GeneCards question-mark help links, each of which is found in the corresponding GeneCards section.
  • Superscripts in the data refer to the sources (shown on the bottom of the card) from where the data was extracted.
  • Tooltips offering explanatory information about images and links can be viewed by placing your mouse over the images and links (see expression).
  • To keep the GeneCards page more compact, many of the tables/columns initially offer only partial, high-scoring results (e.g., the top 5 SNPs sorted by type, with coding, nonsynonymous SNPs shown first; the chemical compounds matching highest with the gene; etc.). In these cases, a hyperlink is always provided for viewing all of the information available for that item.

GeneCards Categories

Category            Description
protein-coding      Protein-coding according to HGNC, Ensembl, or Entrez Gene*
pseudogene          Pseudogene according to HGNC, Ensembl, or Entrez Gene*
ncRNA gene          RNA gene according to HGNC, Ensembl, or Entrez Gene*, or genes that are mined from RNAcentral and its external sources
Biological regions  Non-genic functional elements (e.g. enhancers, promoters) that have been described in the literature and are experimentally validated, according to Entrez Gene
gene cluster        Includes piwi-interacting RNA clusters (PIRCs) and symbols ending with '@'
genetic locus       None of the above, but there is disease information, or 'QTL' in the symbol
uncategorized       None of the above

* In cases of conflict, HGNC overrides Ensembl which overrides Entrez Gene.

  • Categories are based on Entrez Gene type and status, as well as several other factors.
  • The former categories 'predicted' and 'predicted with support' are now represented by the attribute 'predicted', which appears in the upper left box, where the category and GCid appear. This attribute means that the Entrez Gene status is 'PREDICTED', 'INFERRED', or 'MODEL', or that the symbol source is Ensembl.
  • The former category "reserved symbol" no longer exists because it is no longer used by HUGO.

GeneCard Header

This section provides the gene's symbol, category, GIFtS score (see below), and GCid.
The left side of the header also contains a short description of the gene, as approved by the HUGO Gene Nomenclature Committee (HGNC).

Next to the gene symbol and category, there is a star symbol allowing the user to mark the gene for future reference. All marked genes can be viewed in the My Genes page.

On the right side of the header, there are links to Gene and Participant pages at the Undiagnosed Diseases Network (UDN).

GeneCards Inferred Functionality Scores (GIFtS)

The GIFtS algorithm uses the wealth of GeneCards annotations to produce scores aimed at predicting the degree of a gene's functionality. Since the degree of known functionality is correlated with the amount of research done on a particular gene or its product, we use these annotations in a scoring system aimed at inferring functionality. Note that while the accumulation of data for a specific gene in certain databases is merely correlated with functionality, many GeneCards sources, like the Gene Ontology (GO) Consortium and Genatlas, provide definitive information about functionality.

Our goal is to use these two types of annotations in order to measure the functionality of GeneCards genes. Our first step was to produce, for each gene, a binary vector of 67 elements indicating the presence or absence of data in each relevant source. The GIFtS score of a particular gene is a percentage derived from the sum of these binary values divided by the number of sources (the vector length).

Improved GIFtS includes experimenting with increased resolution by sub-sectioning data sources and adjusting scores based on the presence or absence of detailed annotations within a source (currently SwissProt). In addition, we have introduced weights related to the quantitative aspects of annotation items, enabling better evaluation of the data relevant to annotation levels (currently orthologs and publications).

In order to enrich GIFtS with respect to protein data, we selected the pivotal bioinformatics source for such data, namely SwissProt, and dissected it into 6 sub-sources: protein subunit, subcellular location, post-translational modification, function, catalytic activity, and other. Each of these subfields received a binary score as described above, thereby increasing the GIFtS vector size by 5. To weight proteins effectively in the new vectors, the sum of the binary data was still divided by the original number of sources (with SwissProt treated as 1 source for this denominator, in spite of its sub-sources' contributions to the numerator).

To enrich GIFtS with ortholog or publication data, we define a new score for each of those components, which is then added to the default GIFtS. Specifically, the orthologs and publications scores for each gene are calculated as round(log_x(sum(i))), where the logarithm base x equals 3 for orthologs and 5 for publications, and sum(i) is the number of relevant orthologs or publications. Genes with no orthologs or publications receive a score of zero for the relevant component(s); scores rounded down to 0 (for low counts) are normalized to 1.
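The arithmetic described above can be illustrated with a minimal Python sketch. This is not the GeneCards implementation: the function names are hypothetical, the example vector is made up, and Python's built-in round() is assumed as the rounding rule, which the text does not specify.

```python
import math

def basic_gifts(presence):
    """Percentage score: sum of the binary values divided by the vector length."""
    return 100.0 * sum(presence) / len(presence)

def log_component(count, base):
    """round(log_base(count)); 0 when there are no items, otherwise at least 1."""
    if count == 0:
        return 0
    # A result that rounds down to 0 for a low, non-zero count is normalized to 1.
    return max(1, round(math.log(count, base)))

# Illustrative inputs only (a 4-source vector instead of the real 67 elements).
print(basic_gifts([1, 0, 1, 1]))
print(log_component(27, 3))   # orthologs use base 3
print(log_component(2, 5))    # publications use base 5
```

How the component scores are combined with the percentage is not spelled out beyond "added to the default GIFtS", so the sketch keeps them separate.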

Harel A, Inger A, Stelzer G, Strichman-Almashanu L, Dalah I, Safran M and Lancet D. GIFtS: annotation landscape analysis with GeneCards. BMC Bioinformatics 2009, 10:348.

GeneCards Sections

Aliases

This section displays synonyms and aliases for the relevant GeneCards gene, as extracted from OMIM, HGNC, Entrez Gene, UniProtKB (Swiss-Prot/TrEMBL), GeneLoc, Ensembl, DME, miRBase, ENA, GtRNAdb, LncBase, LncBook, Lncipedia, Modomics, Noncode, PDBe, Rfam, SILVA, snopy, SRPDB and TarBase. Also shown are accessions from HGNC, Entrez Gene, UniProtKB, OMIM, HORDE, and/or Ensembl, and previous symbols where relevant (for cases where GeneLoc deems it necessary to assign a new identifier to a gene based on updated information about its chromosomal location). Although gene symbols may change, GC ids will always remain with their original genes and will not be reused with other symbols.

Subcategories for genes with the 'RNA gene' category are derived from Ensembl's biotype, Entrez Gene's gene type, HGNC's locus type, and RNAcentral’s RNA type.

Number of RNA transcript sources

The number of databases that have annotations for transcripts of this gene, as compared to the total number of sources that provide non-coding transcripts data.

Finally, this section contains an option to search for the gene in outside databases by selecting from its aliases, associated disorders and/or other keywords.

Summaries

This section displays descriptions of a gene's function, cellular localization and effect on phenotype for the relevant GeneCards gene, as extracted from Entrez Gene, CIViC, UniProtKB (UniProtKB/Swiss-Prot and UniProtKB/TrEMBL), Tocris Bioscience, PharmGKB, Gene Wiki, and Rfam. The GeneCards-generated summary compiles significant annotations for the gene (such as aliases, diseases, paralogs, and pathways) into a descriptive text.

The Additional Gene Info subsection displays the gene's external ids, as well as links to The Monarch Initiative, DataMed, HumanCyc, and Open Targets Platform.

Genomics

This section displays the chromosome, cytogenetic band and map location of the GeneCards gene as extracted from GeneLoc, HGNC, Entrez Gene, Nature (405, 311-319) and miRBase, as well as genomic views from UCSC and Ensembl, RefSeq DNA sequence links and transcription factor binding sites from Qiagen and links to SPP. The GeneLoc integrated location is shown in red on the image. If this differs from the location provided by Entrez Gene and/or Ensembl, their locations are shown on the image in green and/or blue respectively. Also provided are links to the GeneLoc gene density information for this gene's chromosome, which shows the number of genes in each 1 Mb interval along the chromosome, and to detailed exon information as provided by GeneLoc.

We also integrated non-coding transcripts from RNAcentral. Transcripts with the same RNA type, same strand and at least one overlapping exon were grouped into clusters. Clusters overlapping with ncRNA genes from the main sources described above were added as annotations. Other clusters are defined as novel ncRNA genes.

GeneHancer Regulatory Elements

This subsection describes genomic regulatory elements related to the gene from GeneHancer. GeneHancer is a database of genome-wide enhancer-to-gene and promoter-to-gene associations, embedded in GeneCards. Regulatory elements were mined from the following sources:

  1. The ENCODE project (see paper) Z-Lab Enhancer-like regions
  2. Ensembl regulatory build (see paper)
  3. FANTOM5 atlas of active enhancers (see paper)
  4. VISTA Enhancer Browser enhancers validated by transgenic mouse assays (see paper)
  5. dbSUPER super-enhancers (see paper)
  6. EPDnew promoters (see paper)
  7. UCNEbase ultra-conserved noncoding elements (see paper)
  8. CraniofacialAtlas (see paper)

The GeneHancer table lists a set of enhancers and promoters associated with the gene. Gene-GeneHancer associations and likelihood-based scores were generated using information that helps link regulatory elements to genes:

  1. eQTLs (expression quantitative trait loci) from GTEx (see paper)
  2. Capture Hi-C promoter-enhancer long range interactions (see paper)
  3. Expression correlations between eRNAs and candidate target genes from FANTOM5 (see paper)
  4. Cross-tissue expression correlations between a transcription factor interacting with an enhancer and a candidate target gene
  5. GeneHancer-gene distance-based associations, scored utilizing inferred distance distributions. Associations include several approaches: (a) Nearest neighbors, where each GeneHancer is associated with its two proximal genes (from all gene categories). In cases where a proximal gene is not protein coding, the nearest protein coding gene is also included; (b) Overlaps with the gene territory (Intragenic); (c) Proximity (<2kb) to the gene TSS (transcription start site). TSS proximity scores are boosted to elevate Gene-GeneHancer associations in the vicinity of the gene TSS.

GeneHancer elements have unique, informative and persistent GeneHancer identifiers (GHids). The id begins with GH, which is followed by the chromosome number, a single letter related to the GeneHancer version (constant since version 4.8, ‘J’), and approximate kilobase start coordinate. Example: GH0XJ101383 is located on chromosome X, with starting position (in kb) of 101383.

GeneCards Version   GH id letter
4.8                 J
4.7                 I
4.6                 H
4.5                 G
4.4                 F
4.3                 E
4.2                 -
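The identifier layout described above (GH, chromosome, version letter, approximate kilobase start) can be unpacked with a short Python sketch. The function name and the pattern are hypothetical; in particular, the two-character, zero-padded chromosome field is an assumption inferred from the single example GH0XJ101383.

```python
import re

# Assumption: the chromosome field is two characters wide and zero-padded,
# as in the one example given in the text (GH0XJ101383).
GHID_PATTERN = re.compile(r"^GH([0-9A-Z]{2})([A-Z])(\d+)$")

def parse_ghid(ghid):
    """Split a GHid into chromosome, version letter and start coordinate (kb)."""
    match = GHID_PATTERN.match(ghid)
    if match is None:
        raise ValueError(f"not a recognizable GHid: {ghid!r}")
    chromosome, version_letter, start_kb = match.groups()
    return {
        "chromosome": chromosome.lstrip("0"),  # "0X" -> "X"
        "version_letter": version_letter,
        "start_kb": int(start_kb),
    }

print(parse_ghid("GH0XJ101383"))
```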

Each GeneHancer has a confidence score which is computed based on a combination of evidence annotations: (1) Number of sources; (2) Source scores; (3) TFBSs (from ENCODE). GeneHancers supported by two or more evidence sources were defined as elite and annotated accordingly with an asterisk. For every GeneHancer, the following annotations are included: GH id, GH type (promoter, enhancer or both), the sources with evidence for the GeneHancer, genomic size, GeneHancer confidence score, and a list of TFs (Transcription Factors) having TFBSs (Transcription Factor Binding Sites) within the GeneHancer (based on ChIP-Seq evidence). The expanded view also provides genomic location of the GeneHancer, and additional source-specific annotations such as identifiers, genomic locations, enhancer type (proximal/distal), a list of biological samples with evidence for the GeneHancer, a list of super-enhancers the enhancer belongs to, eRNA expression strength (maximum pooled expression of eRNA CAGE tag clusters), tissue pattern, and tissue pattern reproducibility.

For every gene-GeneHancer association the following annotations are displayed:

  • Gene-GeneHancer score - a general score for the gene-GeneHancer association (a combined score based on all methods; associations supported by two or more methods are defined as elite and annotated accordingly with an asterisk)
  • Total score - the product of the GeneHancer confidence score and the GeneHancer-gene association score
  • Gene-GeneHancer distance - calculated between the GeneHancer midpoint and the gene TSS, positive for downstream and negative for upstream
  • The number of genes having a TSS between the gene and the GeneHancer
  • A list of all genes associated with the GeneHancer

The expanded view also provides method-specific scores for the gene-GeneHancer association (p-values for eQTLs and co-expression, log(observed/expected) for C-HiC, and distance-inferred probability) and annotation of Topologically Associated Domains (TADs) shared by the GeneHancer and the gene (mined from the ENCODE project). A link to a UCSC GeneCards custom track presenting all GeneHancers within 100kb from the gene is located below the GeneHancers table.
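As a small illustration of two of the rules stated above, the following Python sketch flags elite status and computes the total score. The function names and the example numbers are hypothetical, and the way the individual confidence and association scores are themselves derived is not covered here.

```python
def is_elite(evidence):
    """Elite status: supported by two or more distinct evidence sources or methods."""
    return len(set(evidence)) >= 2

def total_score(confidence_score, association_score):
    """Total score: GeneHancer confidence score multiplied by the association score."""
    return confidence_score * association_score

# Illustrative values only.
print(is_elite(["ENCODE", "Ensembl"]))
print(total_score(2.0, 10.5))
```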

Disease-GeneHancer associations: GeneHancer-gene pairs were associated to diseases by integrating manually curated disease-associated variants within regulatory elements from (1) DiseaseEnhancer, PMID:29059320; (2) PMID:27569544.

GWAS phenotypes: GeneHancer elements were associated to phenotypes by mapping GWAS SNPs from the GWAS Catalog (PMID: 27899670).

Proteins

This section provides annotated information about the proteins encoded by GeneCards genes according to UniProtKB, HORDE, neXtProt, Ensembl, and/or GlyConnect; the capability to view phosphorylation sites using PhosphoSitePlus; Specific Peptides from DME; a link to the Protein Expression image from SPIRE MOPED, PaxDb, and MaxQB; and reference sequences (RefSeq) according to NCBI. Direct links are provided to three-dimensional visualizations of PDB structures in the OCA browser and Proteopedia, via the (3D) hyperlink for the OCA browser or the Proteopedia symbol hyperlink shown next to each PDB identifier.
Genes with similar ontologies can be seen using Genes Like Me (more information).

Post-translational modifications

This subsection provides annotated information about post-translational modifications according to UniProtKB, neXtProt, and GlyConnect, and the capability to view phosphorylation sites using PhosphoSitePlus. Specific amino acid identity and position of glycosylation and ubiquitination modifications are mined from neXtProt. Amino acid position refers to the sequence of the canonical isoform as defined in neXtProt.

Domains

This section provides annotated information about protein domains and families according to HGNC, IUPHAR, InterPro, ProtoNet, UniProtKB and Blocks, and Suggested Antigen Peptide Sequences from GenScript. We also show a list of protein family terms, as mined from the Human Protein Atlas (HPA). Original content and additional information can be found at the Human Protein Atlas available at www.proteinatlas.org (PubMed: 28495876).
Genes with similar domains can be seen using Genes Like Me (more information).

Function

This section provides annotated information about gene function according to MGI, UniProtKB, IUBMB, DME, Genatlas, and LifeMap Discovery™, including human phenotypes from GenomeRNAi, transcription factor targeting from Qiagen and/or HOMER, and miRNA gene targets from miRTarBase, as well as molecular function ontologies visualized by the Gene Ontology Consortium (more information).
Genes with similar ontologies can be seen using Genes Like Me (more information).

Information from MGI includes links to mouse knock-outs, phenotypes for mouse orthologs, and a popup table with information on phenotypic alleles of the orthologs. This table presents the following columns:

  • Experiment type - Type of allele by mode of origin
  • Name - Official symbol for the allele

Genes with similar phenotypes can be seen using Genes Like Me (more information).

GWAS Catalog

This table lists human phenotypes associated with the gene by GWAS (genome-wide association studies) from the GWAS Catalog (PMID: 27899670). The GWAS Catalog is a quality-controlled, manually curated, literature-derived collection of published genome-wide association studies assaying at least 100,000 SNPs and all SNP-trait associations with p-values < 1.0 x 10^-5. Each record in the table summarizes the phenotype association evidence for a given gene, including the best p-value, the average p-value, count of SNPs, count of studies, and a list of SNPs. Phenotype-gene associations were created via 3 routes: (1) Associations reported by the source - marked as 'GWAS' in the 'Gene Relation' column. (2) Associations created by mapping SNPs to gene exons - marked as 'GeneExon'. (3) Associations created by mapping SNPs to GeneHancer regulatory elements - marked as 'GeneHancer'.
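The per-phenotype summary described above can be sketched in Python as follows. The record layout (keys phenotype, snp, study, p_value) and the function name are hypothetical, the SNP and study labels are placeholders rather than real identifiers, and the "average p-value" is assumed to be an arithmetic mean.

```python
from statistics import mean

def summarize_associations(records):
    """Collapse SNP-level association records into one summary per phenotype."""
    grouped = {}
    for record in records:
        grouped.setdefault(record["phenotype"], []).append(record)

    summary = {}
    for phenotype, recs in grouped.items():
        p_values = [r["p_value"] for r in recs]
        snps = sorted({r["snp"] for r in recs})
        summary[phenotype] = {
            "best_p": min(p_values),        # smallest p-value is the best
            "average_p": mean(p_values),    # assumption: arithmetic mean
            "snp_count": len(snps),
            "study_count": len({r["study"] for r in recs}),
            "snps": snps,
        }
    return summary

# Placeholder records for illustration only.
records = [
    {"phenotype": "trait A", "snp": "snp_1", "study": "study_1", "p_value": 1e-8},
    {"phenotype": "trait A", "snp": "snp_2", "study": "study_1", "p_value": 3e-8},
]
print(summarize_associations(records)["trait A"]["snp_count"])
```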

Human Phenotype Ontology (HPO)

This table lists human phenotypes that were found to be linked to the gene by the Human Phenotype Ontology project. The HPO project has generated a set of 10,088 classes (terms) describing human phenotypic abnormalities. Links between phenotypes and genes are generated using the information about the phenotypes of a particular syndrome and the corresponding genes that are known to cause this syndrome when mutated. For each gene, the table first displays the most specific relevant HPO classes (in bold), followed by their ancestor terms. (This approach implements the transitivity of the HPO annotation method.)
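The ordering described above (most specific classes first, then their ancestors) can be sketched in Python. The parent map, the term names and the function names are hypothetical; real HPO terms and the actual GeneCards procedure are not reproduced here.

```python
def ancestors(term, parents):
    """All ancestor terms of `term`, following parent links transitively."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def display_terms(annotated, parents):
    """Return (most specific terms, inherited ancestor terms), each sorted."""
    inherited = set()
    for term in annotated:
        inherited |= ancestors(term, parents)
    # A term is "most specific" if no other annotated term descends from it.
    most_specific = sorted(t for t in annotated if t not in inherited)
    return most_specific, sorted(inherited)

# Toy ontology: C is a child of B, B and D are children of A.
toy_parents = {"C": ["B"], "B": ["A"], "D": ["A"]}
print(display_terms(["C", "B"], toy_parents))
```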

Localization

This section provides information about gene localization according to UniProtKB and COMPARTMENTS Subcellular localization database, as well as cellular component ontologies visualized by the Gene Ontology Consortium (more information).

Subcellular locations from COMPARTMENTS:

COMPARTMENTS localization data is integrated from literature manual curation, high-throughput microscopy-based screens, predictions from primary sequence, and automatic text mining (see COMPARTMENTS: unification and visualization of protein subcellular localization evidence). Unified confidence scores of the localization evidence are assigned based on evidence type and source, and visualized both in a table and in the schematic cell image. The confidence scale is color coded, ranging from light green (1) for low confidence to dark green (5) for high confidence. White (0) indicates an absence of localization evidence.

Subcellular locations from the Human Protein Atlas (HPA)

List of subcellular location terms, as mined from the Human Protein Atlas (HPA). Original content and additional information can be found at the Human Protein Atlas available at www.proteinatlas.org (PubMed: 28495876).

Pathways & Interactions

This section provides SuperPaths from PathCards; links to pathways according to information extracted from Kyoto Encyclopedia of Genes and Genomes (KEGG), Cell Signaling Technology, R&D Systems, GeneGo (Thomson Reuters), Reactome, BioSystems, Sino Biological, Tocris Bioscience, PharmGKB, Qiagen, and GeneTex; interactions according to UniProtKB, IID, STRING, MINT, and GPS-Prot; as well as biological process ontologies visualized by the Gene Ontology Consortium (more information).

Genes with similar ontologies and those in the same pathways can be seen using Genes Like Me (more information).
Links to the STRING Interaction Network for the relevant gene are also provided.

SuperPaths: unified GeneCards pathways

This table provides links to pathways in a unified view. All pathways from the sources listed above were clustered into SuperPaths for a better understanding of how the different pathways relate to one another. The left column contains a name representing the SuperPath, based on the most connected pathway in the SuperPath (this name-giving pathway may or may not contain the gene to which the GeneCard belongs). SuperPaths are linked to PathCards, an integrated database of human pathways and their annotations. Human pathways were clustered into SuperPaths based on gene content similarity. Each PathCard provides information on one SuperPath, which represents one or more human pathways. The right column contains all of the current gene's pathways that belong to this SuperPath. Each of the contained pathways (in the right column) is followed by its Jaccard similarity score (0-1) to the most similar pathway. The SuperPaths are sorted by abundance of sources and then by the number of gene-related pathways in the SuperPath.
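For reference, the Jaccard similarity mentioned above is the size of the intersection of two sets divided by the size of their union. The Python sketch below applies it to pathway gene sets, which matches the statement that clustering is based on gene content similarity; the function name and gene labels are hypothetical.

```python
def jaccard(genes_a, genes_b):
    """Jaccard similarity (0-1) between the gene contents of two pathways."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty gene sets
    return len(a & b) / len(union)

# Two toy pathways sharing 2 of 4 distinct genes.
print(jaccard({"G1", "G2", "G3"}, {"G2", "G3", "G4"}))
```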

Below this table, all relevant pathways are listed by source.

Interacting proteins

Each line in this table represents one interacting protein, according to MINT, UniProtKB, IID, and/or STRING. The following columns are presented:

  • Symbol - Links to the GeneCards page (first sub-column) of the interacting protein
  • External ID(s) - Links to the UniProtKB page and/or the Ensembl page for the interacting protein. Superscript links are to one of the following:
    • The comments section in the UniProtKB page for the interactant.
    • The page of all interactions between the two proteins, or all experiments supporting them, in the MINT database.
    • The page of all interactions with the interactant at IID.
    • The interaction network of the interactant at STRING.
  • Details - Links to the interaction page in the database from which it was retrieved. In the case of IntAct, this page may include several different experiments supporting the same interaction. In the MINT database, each distinct interaction definition or experiment supporting it is assigned a different MINT id; all are presented. In the IID database, the numbers of experimental and prediction studies are shown. In the STRING database, each interaction is given an experimental score (based on experimental datasets from other protein-protein interaction databases) and a database score (based on information from curated databases). These scores indicate the confidence that the predicted interaction exists.

SIGNOR Curated Interactions

Presents a deep link to the SIGnaling Network Open Resource (SIGNOR), as well as a list of interacting genes, all linked to their GeneCards. The interactions are categorized as Activates, Inactivates, Is activated by, Is inactivated by, or Other effect.

Drugs & Chemical Compounds

This section provides relationships between GeneCards genes and drugs from different sources, including DrugBank, ApexBio, DGIdb, FDA Approved Drugs, ClinicalTrials.gov, and/or PharmGKB in a unified table. This table is sorted by each drug's approval status, number and quality of sources, and group. Following this table there is a unified 'Additional Compounds' table that displays compounds from IUPHAR, Novoseek, HMDB, BitterDB, and/or Tocris Bioscience that are not also found at the above drug sources. Both tables have an "Options" drop-down menu to exclude (default) or include inferred text-mined information (from Novoseek).

This section also provides links to drugs and compounds for ordering at Tocris Bioscience and ApexBio.

Drugs

Drugs shown in this table are considered to be drugs by DrugBank, ApexBio, DGIdb, FDA Approved Drugs, ClinicalTrials.gov, and/or PharmGKB. Superscripts in the Name column are provided only for sources giving evidence for drug-gene relationships. Drug metadata in other columns is provided by DrugBank, ApexBio, DGIdb, FDA Approved Drugs, ClinicalTrials.gov, PharmGKB, IUPHAR, HMDB, BitterDB, Tocris Bioscience, and/or Novoseek.

This table presents the following columns:

  1. Name - Name is the unified name chosen for this drug. Different names from sources that unify with this name are supplied in the Synonyms column. Superscripts link to sources that provide an association between the drug and the gene.
  2. Status - Includes approved (with associated date when available), investigational, experimental, withdrawn. Status is disease-dependent. For example, a drug can be “approved” for one disease and have a status of “investigational” or “withdrawn” for others.
  3. Disease Links - Links to this drug's entry at MalaCards and/or MedlinePlus.
  4. Group - Denotes the group that the drug belongs to: pharmaceutical or nutraceutical.
  5. Role - The Role is specific to the gene-drug relation. Possible values for this field include transporters, enzyme, inhibitor, antagonist, binder, antibody, agonist.
  6. Mechanism of Action - Such as specific inhibitors or antagonists.
  7. Clinical Trials - Links to the clinical trials of this drug at ClinicalTrials.gov.

A plus icon to the left of the drug name expands the table row to show synonyms, the CAS number and PubChem IDs associated with this drug, and PubMed IDs of papers that associate this drug with this gene.

Additional Compounds

This table presents the following columns for compounds associated with this gene that are not classified as "drugs" in the preceding table:

  1. Name - Name is the unified name chosen for this compound. Superscripts link to sources that provide an association between the compound and the gene.
  2. Synonyms - Different names that unify with this name.
  3. Role - The Role is specific to the gene-compound relation. Possible values for this field include transporters, enzyme, inhibitor, antagonist, binder, antibody, agonist.
  4. CAS Number - Chemical Abstracts Service (CAS) Registry Number.
  5. PubChem IDs - PubChem identifiers associated with this compound.
  6. PubMed IDs - PubMed IDs of articles associated with this compound.

Genes with similar drug and compound relationships can be seen using Genes Like Me (more information).

Transcripts

This section contains RefSeq mRNAs, additional gene/cDNA sequences from GenBank, transcript links to Ensembl, an integrated table of ncRNA transcripts where relevant, CRISPR, miRNA, Inhibitory RNA, and Clone Products, exon structure information from GeneLoc, and alternative splicing information.

The GeneCards ncRNA compendium

Our gene-centric compendium of human non-coding RNA (ncRNA) genes was created by integrating data on the genomic locations of transcripts into a comprehensive, non-redundant view of human ncRNA genes.

GeneCards ncRNA gene records and annotations are integrated from EBI’s RNAcentral, and its expert sources of human ncRNA transcripts (including ENA, GtRNAdb, LncBase, LncBook, Lncipedia, miRBase, MirGeneDB, Modomics, Noncode, PDBe, Rfam, SILVA, snopy, snoDB, SRPDB, TarBase and 5SrRNAdb), as well as from Ensembl, NCBI Entrez Gene and HGNC.
Using the gene-centric GeneCards model, we clustered overlapping transcript entries from the aforementioned sources, applying an algorithm based on transcript annotations and genomic coordinates.

First, established ncRNA genes are generated using the GeneLoc algorithm, integrating genes from HGNC, NCBI and Ensembl.

Then, RNAcentral transcripts are processed as follows: If possible, transcripts are associated with established ncRNA genes based on their genomic coordinates and annotations (gene symbol, alias and RNA class). Remaining transcripts are clustered, based on 3 parameters:

  1. Belonging to the same RNA class.
  2. Mapped to the same DNA strand.
  3. Overlapping with at least 70% of one exon.

Each cluster is defined as a GeneCards gene, categorized as an RNA gene, and annotated with an ncRNA subcategory based on the RNA class of its clustered transcripts. Coordinates of these genes are defined to be the minimum and maximum coordinates of all of their respective clustered transcripts. Transcript annotations are used to determine the gene symbol and aliases.
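The three clustering parameters and the coordinate rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the GeneCards algorithm: the record layout and function names are hypothetical, exons are half-open (start, end) intervals, "at least 70% of one exon" is read as the overlap covering at least 70% of either of the two exons being compared, and clusters are grouped by single linkage.

```python
def exon_overlap_ok(exon_a, exon_b, min_fraction=0.7):
    """True if the overlap covers at least min_fraction of one of the two exons."""
    overlap = min(exon_a[1], exon_b[1]) - max(exon_a[0], exon_b[0])
    shorter = min(exon_a[1] - exon_a[0], exon_b[1] - exon_b[0])
    return overlap > 0 and overlap >= min_fraction * shorter

def belong_together(t1, t2):
    """Same RNA class, same strand, and at least one sufficiently overlapping exon pair."""
    if t1["rna_class"] != t2["rna_class"] or t1["strand"] != t2["strand"]:
        return False
    return any(exon_overlap_ok(a, b) for a in t1["exons"] for b in t2["exons"])

def cluster_transcripts(transcripts):
    """Single-linkage grouping: a transcript merges every cluster it matches."""
    clusters = []
    for t in transcripts:
        matching = [c for c in clusters if any(belong_together(t, m) for m in c)]
        clusters = [c for c in clusters if not any(c is m for m in matching)]
        clusters.append([t] + [m for c in matching for m in c])
    return clusters

def gene_span(cluster):
    """Gene coordinates: minimum and maximum coordinates of the clustered transcripts."""
    starts = [e[0] for t in cluster for e in t["exons"]]
    ends = [e[1] for t in cluster for e in t["exons"]]
    return min(starts), max(ends)

# Made-up transcripts for illustration.
transcripts = [
    {"id": "t1", "rna_class": "lncRNA", "strand": "+", "exons": [(100, 200)]},
    {"id": "t2", "rna_class": "lncRNA", "strand": "+", "exons": [(120, 210)]},
    {"id": "t3", "rna_class": "lncRNA", "strand": "-", "exons": [(100, 200)]},
]
for cluster in cluster_transcripts(transcripts):
    print([t["id"] for t in cluster], gene_span(cluster))
```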

For RNA genes, this section contains a table of integrated transcripts. Transcript information is based on RNAcentral's data of non-redundant ncRNA transcript sequences and their annotations from expert sources. Each row in the table represents a transcript associated with a given gene. Data for each transcript includes transcript subcategory, length (sum of the lengths of the transcript exons), the number of expert sources annotating the transcript, and linked source transcript identifiers.

Alternative Splicing

This subsection contains alternative splicing information according to ASD. Exons with alternative splice sites in different isoforms were broken into Exonic Units (ExUns). The letters indicate the order of the ExUns in the exon. The symbol ' ^ ' between ExUns indicates an intron, while ' · ' indicates the junction of two ExUns. Mouseovers on the dark blue squares show the ExUn's genomic coordinates, while mouseovers on the light blue squares show its transcript coordinates. When showing ASD's splice variants, GeneCards subtracts the 3000 bp flank that ASD adds to the transcript coordinates.
Note: We currently do not have any links to ASD, as their data has been frozen and their site taken down. We plan to upgrade this subsection.

Expression

This section contains expression images based on data from GTEx, BioGPS, Illumina Human BodyMap, and SAGE, with SAGE tags from CGAP, followed by a list of over-expressed tissues based on GTEx data, a table with expression data from LifeMap Discovery™, Protein Expression data from ProteomicsDB, SPIRE MOPED, PaxDb, and MaxQB, links to SPP and SOURCE, tissue specificity data from UniProtKB, evidence on tissue expression from TISSUES, and phenotype-based relationships between genes and organs from Gene ORGANizer.

GTEx

RNA-seq RPKM values were obtained from GTEx for 51 normal human tissues, cells and fluids, based on 2712 samples. Data was averaged across samples for each tissue, and rescaled by multiplying RPKM by 100 and then calculating the root. Multiple datasets describing different compartments of a tissue were further averaged to produce a single generalized dataset for the tissue (Adipose, Artery, Brain, Colon, Esophagus, Heart).
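The averaging and rescaling steps described above can be sketched in Python. The function names and sample values are hypothetical, and the text does not state which root is taken, so the degree is left as a parameter; the default of 2 (square root) is an assumption.

```python
from statistics import mean

def rescale(value, degree=2):
    """Multiply by 100, then take the degree-th root (degree=2 is an assumption)."""
    return (value * 100) ** (1 / degree)

def tissue_levels(samples_by_tissue, degree=2):
    """Average RPKM across the samples of each tissue, then rescale the average."""
    return {
        tissue: rescale(mean(values), degree)
        for tissue, values in samples_by_tissue.items()
    }

# Made-up RPKM values for illustration.
print(tissue_levels({"Liver": [0.5, 1.5], "Lung": [0.0, 0.0]}))
```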

BioGPS

Measurements were obtained for 76 normal human tissues and compartments hybridized against HG-U133A. The Affymetrix MAS5 algorithm was used for array processing and probesets were averaged per gene.

Illumina body map

RNA obtained from 16 normal human tissues was sequenced and mapped to genes via their transcripts. Fragments Per Kilobase of exon per Million fragments mapped (FPKM) were calculated using the Cufflinks program and thereupon rescaled by multiplying FPKM by 100 and then calculating the root.

CGAP: SAGE Normal

Serial Analysis of Gene Expression: For 19 normal human tissues, CGAP datasets Hs.frequencies and Hs.libraries are mined for information about the number of SAGE tags per tissue. Tags are reassigned to a Unigene cluster and after that to a particular gene by mining Hs.best_gene, Hs.best_tag and Hs_GeneData. The expression level of a particular gene in a particular tissue was calculated as the number of appearances of the corresponding tag divided by the total number of tags in libraries derived from that tissue. These fractions were then rescaled by making the geometric mean of all tissues equal. Please note: Currently, only associations with minimal ambiguity participate in the analysis.

Tissues and anatomical compartments are colored according to 6 categories - Immune (red), Nervous (green), Muscle (yellow), Internal (blue), Secretory (violet) and Reproductive (turquoise).

Normalized intensities are drawn on a root scale, which is an intermediate between log and linear scales. Values are not comparable between datasets (i.e. Microarray, RNAseq and SAGE).

Genes with similar binary patterns can be seen using Genes Like Me (more information).

mRNA Differential Expression

This sentence provides a list of tissues in which a gene is positively differentially expressed, based on RNA-seq reads from GTEx. Fold change values for each sample were calculated using DESeq software by comparing the sample's reads with the reads of all GTEx samples. Genes with a fold change value >4 in a tissue are defined as positively differentially expressed in that tissue. Genes whose maximal read count across tissues is lower than 5 were excluded from the calculations.

LifeMap Discovery™ Table

This table provides links to developmental and in vitro expression information in LifeMap Discovery™, the Embryonic Development and Stem Cells Database. Linked in-vivo cells or anatomical compartments where the gene is expressed also provide the tissue/organ of origin (using arrows). Links to stem cell differentiation are noted as "in vitro cells" or as "protocol derived cells". Additionally, there are links to datasets from external sources comprising high-throughput experiments, such as microarray and RNA sequencing. The expression level (selective marker (cell-identifying gene), positive, negative) is also presented for each of the gene expression links. The table is grouped by tissue and sorted by number of hits, so tissues with more information are shown first.

Protein Differential Expression

This sentence provides a list of anatomical entities for which a gene is positively differentially expressed, based on the 69 integrated normal proteomics datasets in HIPED (the Human Integrated Protein Expression Database, see below). Fold change values were calculated as the ratio between the protein abundance in the tested dataset and the average abundance over all datasets. Genes with a fold change value >6 and a protein abundance value >0.1 PPM in an anatomical entity are defined as positively differentially expressed in that entity.
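The protein rule above combines a fold-change cutoff with a minimum abundance. A minimal sketch, assuming a hypothetical dictionary that maps each anatomical entity to the gene's abundance in PPM, and assuming the average is a plain arithmetic mean over all entities:

```python
def protein_de_entities(abundance_ppm, fc_cutoff=6.0, min_ppm=0.1):
    """Entities where abundance / mean(all entities) > 6 and abundance > 0.1 PPM."""
    if not abundance_ppm:
        return []
    mean = sum(abundance_ppm.values()) / len(abundance_ppm)
    if mean == 0:
        return []  # gene not detected anywhere
    return sorted(
        entity
        for entity, value in abundance_ppm.items()
        if value / mean > fc_cutoff and value > min_ppm
    )
```

The abundance floor matters for weakly detected proteins: a gene seen at 0.05 PPM in a single entity has a large fold change but is still not reported.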

Expression Partners

This subsection provides a list of genes defined as expression partners with respect to protein and RNA expression. The 69 integrated normal proteomics datasets in HIPED (the Human Integrated Protein Expression Database, see below) were the platform used to calculate the pairwise similarity of across-tissue protein abundance patterns amongst all genes. Gene pairs having Pearson’s correlation coefficient of >0.7 are annotated as expression partners. A parallel analysis of RNA-seq data from GTEx was performed. Gene pairs having both RNA and protein correlations of >0.7 are annotated as ‘Elite’ expression partners.
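The pairwise comparison described above can be illustrated with a small pure-Python sketch. The function names and the labels "partner", "elite" and "none" are hypothetical; the vectors stand for across-tissue abundance patterns of two genes, and the protein correlation is treated as the primary criterion, which is one reading of the text.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant vector: correlation undefined, treated as 0 here
    return cov / (sd_x * sd_y)

def classify_pair(protein_a, protein_b, rna_a, rna_b, cutoff=0.7):
    """Label a gene pair from its protein and RNA across-tissue correlations."""
    if pearson(protein_a, protein_b) <= cutoff:
        return "none"
    if pearson(rna_a, rna_b) > cutoff:
        return "elite"
    return "partner"
```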

Protein Expression Images

HIPED (the Human Integrated Protein Expression Database) is a unified database of protein abundance in human tissues, residing within GeneCards. HIPED is based on publicly available mass spectrometry-based proteomics sources, integrating data for 69 normal anatomical entities (tissues, cells and fluids) and 125 cell lines. HIPED data sources:

  1. ProteomicsDB - Bernhard Küster, TUM School of Life Sciences Weihenstephan, Technische Universität München.
  2. MOPED - Eugene Kolker, Bioinformatics & High-throughput Analysis Lab, Seattle Children's Research Institute.
  3. PaxDb - Christian von Mering, Bioinformatics Group, Institute of Molecular Life Sciences, University of Zurich.
  4. MaxQB - Matthias Mann, Department of Proteomics and Signal Transduction, Max-Planck Institute of Biochemistry, Germany.

The data was normalized as follows:

  1. For each sample, ppm protein values were calculated if they were not already provided by the data source. For each sample from MaxQB and ProteomicsDB, iBAQ expression values were divided by the sum of all values in that sample and multiplied by 1,000,000. iBAQ, intensity-based absolute quantification, is a proxy for protein abundance levels (see https://www.nature.com/articles/nature10098#supplementary-information ). For all samples, data was aggregated gene-centrically by summing the expression values of all isoforms of each gene.
  2. Samples from similar tissues were averaged, using geometric mean.
  3. For better visualization of graphs, expression values are drawn on a root scale, which is an intermediate between log and linear scales, as used for our mRNA expression graphs [see Safran, M., Chalifa-Caspi, V., Shmueli, O., Olender, T., Lapidot, M., Rosen, N., Shmoish, M., Peter, Y., Glusman, G., Feldmesser, E., Adato, A., Peter, I., Khen, M., Atarot, T., Groner, Y., and Lancet, D. Human Gene-Centric Databases at the Weizmann Institute of Science: GeneCards, UDB, CroW 21 and HORDE. Nucleic Acids Research 31,1:142-146 (2003)].
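The first two normalization steps can be sketched as below. This is an illustration under stated assumptions rather than the HIPED implementation: the function names and the isoform-to-gene mapping are hypothetical, and zero or missing values are simply skipped when taking the geometric mean, since the text does not say how they are handled.

```python
import math
from collections import defaultdict

def ibaq_to_ppm(ibaq_by_isoform):
    """Step 1a: divide each iBAQ value by the sample total, times 1,000,000."""
    total = sum(ibaq_by_isoform.values())
    return {iso: value / total * 1_000_000 for iso, value in ibaq_by_isoform.items()}

def aggregate_by_gene(ppm_by_isoform, isoform_to_gene):
    """Step 1b: sum the values of all isoforms that belong to the same gene."""
    totals = defaultdict(float)
    for iso, ppm in ppm_by_isoform.items():
        totals[isoform_to_gene[iso]] += ppm
    return dict(totals)

def average_similar_samples(samples):
    """Step 2: geometric mean per gene across samples of similar tissues.

    Zero or missing values are skipped (an assumption of this sketch).
    """
    genes = {gene for sample in samples for gene in sample}
    averaged = {}
    for gene in genes:
        positives = [s[gene] for s in samples if s.get(gene, 0) > 0]
        if positives:
            averaged[gene] = math.exp(
                sum(math.log(v) for v in positives) / len(positives)
            )
    return averaged
```

For example, a sample with iBAQ values of 1, 1 and 2 for three isoforms yields 250,000, 250,000 and 500,000 ppm; if the first two isoforms belong to the same gene, that gene is reported at 500,000 ppm.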

The protein expression images present a protein expression vector for each gene, based on normalized abundances in 69 normal human anatomical entities. A selection of 23 cell lines is shown in an additional image.
List of samples and their sources:

Sample ProteomicsDB MOPED PaxDB MaxQB
Serum
Plasma
Monocyte
Neutrophil
B-lymphocyte
T-lymphocyte
CD4 T cells
CD8 T cells
NK cells
Periph. blood mononuclear cells
Platelet
Lymph node
Tonsil
Bone marrow stromal cell
Bone marrow mesench. stem cell
Brain
Brain, fetal
Frontal cortex
Cerebral cortex
Cerebrospinal fluid
Spinal cord
Retina
Heart
Heart, Fetal
Bone
Colon muscle
Oral epithelium
Nasopharynx
Nasal respiratory epithelium
Esophagus
Stomach
Cardia
Gut, fetal
Colon
Rectum
Liver
Liver, fetal
Liver secretome
Kidney
Spleen
Lung
Lung Alveolar lavage
Adipocyte
Synovial fluid
Amniocyte
Vitreous humor
Saliva
Salivary gland
Thyroid
Adrenal
Breast
Milk
Pancreas
Pancreatic juice
Islet of Langerhans
Gallbladder
Prostate
Urine
Urinary bladder
Skin
Hair follicle
Placenta
Uterus
Cervix
Ovary
Ovary, fetal
Testis
Testis, fetal
Seminal vesicle
T-cell leukemia, Jurkat
Myeloid leukemia, K562
Lymphoblastic leukemia, CCRF-CEM
Brain cancer, U251
Brain cancer, GAMG
Bone cancer, U2OS
Kidney, HEK293
Liver cancer, HuH-7
Liver cancer, HepG2
NSC lung cancer, NCI-H460
Lung cancer, A549
Kidney cancer, RXF393
Colon cancer, RKO
Colon cancer, Colo205
Melanoma, M14
Breast cancer, LCC2
Breast cancer, MCF7
Pancreas cancer
Ovarian cancer, SKOV3
Prostate cancer, LnCap
Prostate cancer, PC3
Cervical cancer, HeLa S3
Cervical cancer, HeLa

Evidence on tissue expression from TISSUES

TISSUES tissue expression data is integrated from manual curation of the literature, proteomics and transcriptomics screens, and automatic text mining (see TISSUES 2.0: an integrative web resource on mammalian tissue expression). The evidence is unified by assigning confidence scores that facilitate comparison of the different types and sources of evidence. The unified scores are visualized at the TISSUES website on a schematic human body. Pairs of tissue term and score are listed in this GeneCards subsection. The following 21 terms are included:

Heart, Intestine, Kidney, Liver, Lung, Lymph node, Muscle, Nervous system, Pancreas, Skin, Spleen, Stomach, Adrenal gland, Bone, Bone marrow, Eye, Gall bladder, Thyroid gland, Blood, Saliva, Urine. Terms with confidence score below 2 are filtered out.

Orthologs

This section contains orthologs from HomoloGene, Ensembl pan taxonomic compara*, euGenes, SGD, MGI, and FlyBase and WormBase (through Ensembl).

*Ensembl pan taxonomic compara doesn't have its own pages on the Ensembl site.

The table presents the following columns:

  • Organism - The names of the homologous species, using both scientific and popular terminology.
  • Taxonomy - The class of the species is presented (or a similar level if class is not defined). Higher taxonomic classifications are shown when hovering over the field (mouse-over).
  • Gene - The symbol for the gene in the homologous species. Its description is shown when hovering over the field (mouse-over).
  • Similarity - The percent similarity to the human gene, followed either by (n) where the comparison was based on nucleic acids or (a) for amino acid based comparisons.
  • Type - the type of orthology from Ensembl based on Ensembl gene trees:
    • 1 ↔ 1 (OneToOne) - one ortholog in the homologous species and one corresponding ortholog in human.
    • 1 → many - one ortholog in the homologous species and more than one corresponding ortholog in human.
    • many → 1 - more than one ortholog in the homologous species and one corresponding ortholog in human (changed from the 1 → many type obtained from Ensembl, in cases where there is only one human ortholog).
    • 1 ↔ many - a more complex one-to-many relationship that can be better understood by examining the gene tree (changed from the 1 → many type obtained from Ensembl, in cases where there is more than one ortholog in human as well as in the homologous species).
    • many ↔ many - more than one ortholog in the homologous species and more than one corresponding ortholog in human.
    • possible ortholog - type of orthology was not determined by Ensembl.
  • Details - The position of the gene in the homologous species, and IDs with links to sequences in other databases.

The species presented from Ensembl pan taxonomic compara were chosen to constitute a diverse collection of taxa including model organisms and species of interest. Currently, all available species from the HomoloGene database (old and new) are included. Species with no ortholog for the gene can be viewed just below the orthologs table.

Superscripts represent the source from which this data was extracted. Data from HomoloGene can have one of two superscripts. If the second one is cited, it means that data for this species exists only in the older version of HomoloGene, which used unfinished genomes and where the homologs found might not be true orthologs.

Following the table are links to Ensembl and TreeFam gene trees.

This section also includes links to Aminode (PMID: 29358731), a comprehensive automated tool for the discovery of evolutionarily constrained regions (ECRs).

Paralogs

This section contains Paralogs from HomoloGene, Ensembl (similarities shown on mouseover), and SIMAP (PMID: 19906725), and Pseudogenes from Pseudogene.org. Genes with similar paralogs can be seen using Genes Like Me (more information). Paralogs obtained from SIMAP were chosen according to a fixed similarity score, shown on mouseover, to allow an average of 30 paralogs per protein-coding gene.

Genomic Variants

This section contains SNPs/Variants from ClinVar, UniProtKB and dbSNP, with links to Ensembl. This is followed by Structural Variations (CNVs/InDels/Inversions) from the Database of Genomic Variants, and links to mutations from HGMD, The Human Cytochrome P450 Allele Nomenclature Database, BGMUT, SNPedia, and the BRCA Exchange.

Sequence variations, with clinical significance, from ClinVar and Humsavar, with links to dbSNP

This table presents SNPs/variants with clinical significance annotations. SNP information is currently extracted from ClinVar VCF and UniProt's Human polymorphisms and disease mutations files. The order of a gene's displayed SNPs can be modified by using the up/down arrows above the relevant columns: SNP ID, clinical significance and condition, and position on the chromosome.

This table presents the following columns:
  • SNP ID - The NCBI rs number or UniProt VAR number or ClinVar variation ID for this SNP
  • Clinical significance and condition - clinical interpretation associated with this SNP, which can be pathogenic, likely pathogenic, drug response, benign, and/or other. Additionally, this column lists associated conditions provided by ClinVar or UniProt
  • Chr pos - chromosomal position: position of variation (strand)
  • Variation - Base pair or amino acid variation
  • AA info - View record at dbSNP or UniProtKB/Swiss-Prot
  • Type - The SNP type:
    MISSENSE_VARIANT - A sequence variant that changes one or more bases, resulting in a different amino acid sequence but where the length is preserved.
    FRAMESHIFT_VARIANT - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three.
    NONSENSE - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened polypeptide (also known as ‘stop gained’).
    SYNONYMOUS_VARIANT - A sequence variant where there is no resulting change to the encoded amino acid.
    NON_CODING_TRANSCRIPT_VARIANT - A transcript variant of a non-coding RNA gene.
    FIVE_PRIME_UTR_VARIANT - 5’ UTR (untranslated region) - variation in transcript, but not in coding region interval
    THREE_PRIME_UTR_VARIANT - 3’ UTR (untranslated region) - variation in transcript, but not in coding region interval
    INTRON_VARIANT - A transcript variant occurring within an intron.
    SPLICE_DONOR_VARIANT - A splice variant that changes the 2 base pair region at the 5' end of an intron.
    SPLICE_ACCEPTOR_VARIANT - A splice variant that changes the 2 base region at the 3' end of an intron.
    INFRAME_DELETION - An inframe non synonymous variant that deletes bases from the coding sequence.
    INFRAME_INSERTION - An inframe non synonymous variant that inserts bases into the coding sequence.
    INITIATOR_CODON_VARIANT - A codon variant that changes at least one base of the first codon of a transcript.
    INFRAME_INDEL - A coding sequence variant where the change does not alter the frame of the transcript.
    STOP_LOST - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript.
    GENIC_UPSTREAM_TRANSCRIPT_VARIANT - A variant that falls upstream of a transcript, but within the genic region of the gene due to alternately transcribed isoforms.
    GENIC_DOWNSTREAM_TRANSCRIPT_VARIANT - A variant that falls downstream of a transcript, but within the genic region of the gene due to alternately transcribed isoforms.
    NO_SEQUENCE_ALTERATION - A position or feature within a sequence that is identical to the comparable position or feature of a specified reference sequence.
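
The distinction between frameshift and inframe variants in the list above comes down to whether the change in length is a multiple of three. The following is a simplified sketch for insertions and deletions inside a coding sequence only; the function name and the "NOT_AN_INDEL" label are hypothetical and not part of the GeneCards or ClinVar vocabularies.

```python
def classify_coding_indel(ref, alt):
    """Classify a coding-sequence insertion or deletion by its effect on the frame.

    ref, alt: reference and alternate allele strings (e.g. "A" and "ACGT").
    """
    length_change = len(alt) - len(ref)
    if length_change == 0:
        return "NOT_AN_INDEL"  # hypothetical label: no net length change
    if length_change % 3 != 0:
        # Inserted or deleted nucleotides are not a multiple of three.
        return "FRAMESHIFT_VARIANT"
    return "INFRAME_INSERTION" if length_change > 0 else "INFRAME_DELETION"
```

A real annotation tool also needs the transcript model to know whether the variant lies in coding sequence at all; that step is outside the scope of this sketch.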

Additional dbSNP identifiers (rs#s)

Variants from dbSNP that are not represented in the section above.

Structural Variation Table

Information about healthy variants is provided from the Database of Genomic Variants (DGV), containing each variant ID with its type (CNV or OTHER), its subtype (deletion, duplication, insertion, loss, gain, inversion, gain+loss, CNV, or complex), and a PubMed ID.

This section also provides Linkage Disequilibrium (LD) information from HapMap and Mutation information from HGMD.

Genic variance classification and scores

Genic intolerance – The ExAC RVIS (Residual Variation Intolerance Score based on the Exome Aggregation Consortium data) was mined from the Genic Intolerance database (see publication). The intolerance scoring system assesses whether genes have relatively more or less functional genetic variation than expected based on the apparently neutral variation found in the gene. Genes responsible for Mendelian diseases are significantly more intolerant to functional genetic variation than genes that do not cause any known disease. For each gene, the tolerance percentile is shown. Genes in the 25th percentile and below are considered intolerant to variation.

GDI (Gene damage index)

GDI values were mined from the Human Gene Damage Index (GDI) database (see publication). The GDI is the accumulated mutational damage of each human gene in the general human population. Highly damaged human genes are unlikely to be disease-causing, hence GDI might be used to filter out variants harbored in highly damaged (high GDI) genes. For every gene, the Phred-scale GDI score is shown along with the GDI percentile, using all disease-causing genes as a reference set. More specific reference sets are provided at the GDI database.

Disorders / Diseases

This section contains Disorders in which GeneCards genes are involved, according to MalaCards. Gene-disease associations in MalaCards are generated by integration of GeneCards searches with information from multiple external sources: OMIM, ClinVar, Orphanet, UniProtKB/Swiss-Prot, Genetic Testing Registry, miR2Disease, LncRNADisease, the University of Copenhagen DISEASES database and Novoseek.
Additional information is shown from UniProtKB and Genatlas.

Finally, links to additional disease sources related to the gene are also provided: HuGENavigator, TGDB, the ATLAS of Genetics and Cytogenetics in Oncology and Haematology and the Open Targets Platform.

The MalaCards diseases table: This section provides a unified table of diseases associated with this gene by MalaCards. The table, sorted by MalaCards gene-association score, shows the disease name, linked to MalaCards, with superscripts indicating the sources for the disease annotation and a mouseover showing the MalaCards score. This score ranks diseases by how closely they are associated with the gene, factoring in the relative reliability of the sources that associate them. Elite associations are marked with an asterisk next to the disease name. Elite status is conferred when the gene-to-disease association is manually curated. For genes listed in the Cancer Gene Census, the cancers they are associated with are marked with a CC icon. The Cancer Gene Census from COSMIC is an ongoing effort to catalogue those genes for which mutations have been causally implicated in cancer. The table further displays the most common alias for the disease, with a link to show all available aliases. A third column contains linked PubMed IDs associated with the disease. These can be seen by clicking on the magnifying glass icon, which appears whenever PubMed IDs are available.

Genes with similar disease relationships can be seen using GenesLikeMe (more information)

Publications

This section provides titles of and links to research articles in PubMed, as associated via Novoseek, HGNC, Entrez Gene, UniProtKB, GAD, HMDB, and/or DrugBank.

The articles are ranked first according to the number of GeneCards sources that associate the article with this gene, then by date of publication, and then according to the Novoseek score for this article/gene relationship. The year of publication appears in parentheses after the title of each article. Lower ranked articles may also appear in initial results if their titles or authors contain your search term.
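
The three-level ranking described above maps naturally onto a sort with a compound key. In this sketch the record fields (`sources`, `year`, `novoseek_score`) are hypothetical, and both "newest first" and "higher score first" are assumptions, since the text does not state the direction of those two criteria.

```python
def rank_articles(articles):
    """Sort by number of sources, then publication year, then Novoseek score.

    All three criteria are applied in descending order (an assumption).
    """
    return sorted(
        articles,
        key=lambda a: (-len(a["sources"]), -a["year"], -a["novoseek_score"]),
    )
```

Because Python's sort is stable, articles that tie on all three criteria keep their original relative order.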

Finally, links for searching for articles related to the gene in PubMed and other databases, as well as for viewing recent publications in Mastermind, are also provided.

Products



Repeating Data Sources

Gene Ontology (GO) Tables

The Gene Ontology sections in Function, Localization, and Pathways & Interactions display a table with the following columns:

GO ID
The identifier used by GO and linked to the GO entry
Qualified GO term
The description of this entry, possibly qualified with "NOT", "colocalizes with", or "contributes to" (see TPX2)
Evidence
A 2- or 3-letter code
Curator-assigned Evidence Codes
   Experimental Evidence Codes:
EXP: Inferred from Experiment
IDA: Inferred from Direct Assay
IPI: Inferred from Physical Interaction
IMP: Inferred from Mutant Phenotype
IGI: Inferred from Genetic Interaction
IEP: Inferred from Expression Pattern
   Computational Analysis Evidence Codes:
ISS: Inferred from Sequence or Structural Similarity
ISO: Inferred from Sequence Orthology
ISA: Inferred from Sequence Alignment
ISM: Inferred from Sequence Model
IGC: Inferred from Genetic Context
IBA: Inferred from Biological aspect of Ancestor
IBD: Inferred from Biological aspect of Descendant
IKR: Inferred from Key Residues
IRD: Inferred from Rapid Divergence
RCA: Inferred from Reviewed Computational Analysis
   Author Statement Evidence Codes:
TAS: Traceable Author Statement
NAS: Non-traceable Author Statement
   Curator Statement Evidence Codes:
IC: Inferred by Curator
ND: No biological Data available
Automatically-assigned Evidence Codes
IEA: Inferred from Electronic Annotation
Obsolete Evidence Codes
NR: Not Recorded
PubMed ids
References in the literature, if relevant, obtained from EntrezGene

GenesLikeMe

GenesLikeMe (formerly Partner Hunter) is available for ontologies, phenotypes, drugs and compounds, expression patterns, sequence-based paralogs, disorders, pathways, and domains. By clicking on the GenesLikeMe button for a particular section, one arrives at the GenesLikeMe home page, where the gene name has been entered and the appropriate fields selected from the attribute list. From this page, changes can be made to the data requested. Submitting this form brings up a result page containing a list of genes similar to the chosen gene and their descriptions.

Novoseek Scoring Algorithm

The relevance scores of elements related to genes (chemical substances and diseases) are based on the analysis of co-occurrences of two elements in Medline documents. The observed number of documents where both elements appear together and the number of documents where each appears on its own are compared to an expected value based on a hypergeometric distribution. The more co-occurrences are observed relative to the number expected, the less likely it is that this happened by chance, and the higher the score. The absolute values are not meaningful in themselves and only give an order of importance: in the list of chemicals related to a gene, the order is meaningful and the first chemicals in the list are statistically more strongly related to the gene than the following ones, but the absolute values of the scores may change from one release to another.
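
The exact Novoseek formula is not given here, so the following is only a generic sketch of a hypergeometric co-occurrence test of the kind described above. All names are hypothetical: `total_docs` is the corpus size, `docs_a` and `docs_b` are the numbers of documents mentioning each element, and `both` is the number mentioning the two together. The score is taken as the negative log10 of the upper-tail probability, which is an assumption of this example.

```python
from math import comb, log10

def hypergeom_tail(total_docs, docs_a, docs_b, both):
    """P(X >= both) when docs_b documents are drawn from total_docs,
    of which docs_a mention element A."""
    upper = min(docs_a, docs_b)
    favourable = sum(
        comb(docs_a, i) * comb(total_docs - docs_a, docs_b - i)
        for i in range(both, upper + 1)
    )
    return favourable / comb(total_docs, docs_b)

def cooccurrence_score(total_docs, docs_a, docs_b, both):
    """Higher when the observed co-occurrence is less likely by chance."""
    p = hypergeom_tail(total_docs, docs_a, docs_b, both)
    return float("inf") if p == 0 else -log10(p)
```

In a corpus of 10 documents where each element appears in 5, about 2.5 co-occurrences are expected by chance; observing 5 gives a tail probability of 1/252, while observing 3 gives 0.5. Exact binomial coefficients become slow for a corpus the size of Medline, so a production system would use a numerically stable approximation instead.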