This page provides information about the various GeneCards sections and tables.
General Comments
The sections that follow are reached through the GeneCards links labeled "About", "About this table", "About this scheme", and "About these images" in the corresponding GeneCards sections.
Superscripts in the data refer to the sources (shown in the left column of the card) from which the data were extracted.
Tooltips with explanatory information about images can be viewed by placing your mouse over the images (see expression).
The text background color changes so that mouseovers are easy to identify.
To keep the GeneCards page compact, many of the tables and columns initially show only partial, high-scoring results (e.g., the top 10 SNPs sorted by type, with coding non-synonymous SNPs shown first, or the chemical compounds that match the gene most strongly).
In these cases, a hyperlink is always provided for viewing all of the information available for that item.
protein-coding - Entrez Gene type is 'protein coding', or the data source is Ensembl and an Ensembl protein exists
pseudogene - Entrez Gene type is 'pseudo', or the symbol contains 'pseudo'
RNA gene - Entrez Gene type ends with 'RNA'
gene cluster - the symbol ends with '@'
genetic locus - none of the above, but there is disease information, or the symbol contains 'QTL'
uncategorized - none of the above
Categories are based on Entrez Gene type and status, as well as several other factors.
The former categories 'predicted' and 'predicted with support' are now represented by the attribute 'predicted', which appears in the upper left box next to the category and GCid. This attribute means that the Entrez Gene status is 'PREDICTED', 'INFERRED', or 'MODEL', or that the symbol source is Ensembl.
The former category "reserved symbol" no longer exists because HUGO no longer uses it.
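The category rules above are applied in order, so the first matching rule wins ("none of the above" in the later rows). The sketch below is an illustration of that logic, not GeneCards code. The function names, parameters, and the exact type strings (for example 'protein coding') are assumptions taken from the wording of the table, and the case-insensitive 'pseudo' match is also an assumption.

```python
from typing import Optional


def gene_category(entrez_type: Optional[str], symbol: str,
                  source: str = "Entrez Gene",
                  has_ensembl_protein: bool = False,
                  has_disease_info: bool = False) -> str:
    """Hypothetical helper: return a category by checking the rules in table order."""
    # protein-coding: Entrez type 'protein coding', or an Ensembl gene with a protein
    if entrez_type == "protein coding" or (source == "Ensembl" and has_ensembl_protein):
        return "protein-coding"
    # pseudogene: type 'pseudo', or 'pseudo' in the symbol (case-insensitive is assumed)
    if entrez_type == "pseudo" or "pseudo" in symbol.lower():
        return "pseudogene"
    # RNA gene: Entrez type ends with 'RNA' (e.g. 'ncRNA')
    if entrez_type is not None and entrez_type.endswith("RNA"):
        return "RNA gene"
    # gene cluster: symbol ends with '@'
    if symbol.endswith("@"):
        return "gene cluster"
    # genetic locus: disease information, or 'QTL' in the symbol
    if has_disease_info or "QTL" in symbol:
        return "genetic locus"
    return "uncategorized"


def is_predicted(entrez_status: Optional[str], symbol_source: str) -> bool:
    """Hypothetical helper for the 'predicted' attribute."""
    return entrez_status in {"PREDICTED", "INFERRED", "MODEL"} or symbol_source == "Ensembl"


print(gene_category("protein coding", "TP53"))  # protein-coding
print(gene_category(None, "HOXA@"))             # gene cluster
print(is_predicted("MODEL", "Entrez Gene"))     # True
```

Because the checks return on the first match, a pseudogene symbol that also ends with '@' is still reported as a pseudogene. This matches the "none of the above" wording.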
This section provides the gene's symbol, GCid, and category in the box on the left hand side.
Each gene category has its distinct color: protein-coding,
pseudogene,
RNA gene,
gene cluster,
genetic locus, and
uncategorized.
The gene's symbol and GCid are the color of the gene's category.
The header also includes a short description of the gene and indicates whether the gene symbol is approved by the HUGO Gene Nomenclature Committee (HGNC) database.
This section displays synonyms and aliases for the relevant GeneCards gene,
as extracted from GDB,
OMIM,
HGNC,
Entrez Gene,
UniProt
(Swiss-Prot/TrEMBL),
GeneLoc, and
Ensembl.
Also shown are accessions from HGNC, Entrez Gene, UniProt, and/or Ensembl, and previous GC identifiers where relevant (in cases where GeneLoc has to assign a new identifier to a gene because of updated information about its chromosomal location).
Such GCids always remain with their original genes and are never reused for other symbols.
This section displays the chromosome, cytogenetic band and map location of the GeneCards gene
as extracted from
GeneLoc,
HGNC,
Entrez Gene, and
miRBase,
as well as genomic views from UCSC
and Ensembl.
The GeneLoc integrated location is shown in red on the image. If this
differs from the location provided by Entrez Gene and/or Ensembl, their locations are shown on the
image in green and/or blue respectively.
Also provided are links to the
GeneLoc gene density information for this gene's chromosome, which shows the number of genes in
each 1 Mb interval along the chromosome, and to detailed exon information as provided by
GeneLoc.
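The gene density view counts the genes in each 1 Mb interval along a chromosome. Below is a minimal sketch of that binning. It assumes that each gene is assigned to a bin by its 0-based start coordinate. GeneLoc's actual assignment rule (start, midpoint, or overlap) is not stated here, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

BIN_SIZE = 1_000_000  # 1 Mb intervals


def gene_density(gene_starts: Iterable[int], chrom_length: int) -> Dict[int, int]:
    """Hypothetical helper: number of genes per 1 Mb bin; bin i covers [i*1Mb, (i+1)*1Mb)."""
    counts = Counter(start // BIN_SIZE for start in gene_starts)
    n_bins = (chrom_length + BIN_SIZE - 1) // BIN_SIZE  # round up to cover the chromosome end
    return {i: counts.get(i, 0) for i in range(n_bins)}


print(gene_density([10, 999_999, 1_000_000, 2_500_000], 3_000_000))  # {0: 2, 1: 1, 2: 1}
```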
This section provides annotated information about the proteins encoded by GeneCards genes
according to
UniProt, and/or
Ensembl,
the capability to view phosphorylation sites using
Phosphosite
and Invitrogen,
reference sequences (RefSeq) according to
NCBI,
cellular component ontologies visualized by the
Gene Ontology Consortium
(more information),
and links for ordering antibodies from
Invitrogen,
BIOMOL,
Cell Signaling Technology,
Abcam
and/or
GeneTex.
Direct links to three-dimensional visualization of PDB structures provided by the
OCA browser
are also provided via the (3D)
hyperlink shown next to each PDB identifier.
This section provides annotated information about protein domains and families according to
InterPro,
ProtoNet,
UniProt
and Blocks.
For InterPro entries, one can view other genes with these domains or in these families.
Selecting the InterPro entries of interest and clicking the GeneDecks link leads to a results page
containing a list of these genes and their descriptions.
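The GeneDecks lookup can be pictured as a search over a gene-to-annotation mapping. The sketch below is hypothetical. It does not say whether GeneDecks returns genes matching any of the selected entries or all of them, so the `require_all` flag is an assumption. The mapping and entry IDs in the usage line are placeholders.

```python
from typing import Dict, List, Set


def genes_sharing_entries(gene_entries: Dict[str, Set[str]],
                          selected: Set[str],
                          require_all: bool = False) -> List[str]:
    """Hypothetical helper: genes annotated with any (or all) of the selected entries."""
    hits = []
    for gene, entries in gene_entries.items():
        # 'all' means the selection is a subset; 'any' means a non-empty intersection
        matched = selected <= entries if require_all else bool(entries & selected)
        if matched:
            hits.append(gene)
    return sorted(hits)


toy = {"GENE_A": {"IPR_X", "IPR_Y"}, "GENE_B": {"IPR_Y"}, "GENE_C": {"IPR_Z"}}
print(genes_sharing_entries(toy, {"IPR_Y"}))  # ['GENE_A', 'GENE_B']
```

The same kind of lookup could be applied to pathway or probe-set selections.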
This section provides annotated information about gene function
according to MGD,
UniProt, IUBMB, and
Genatlas,
including RNAi, clone, and primer products from
Invitrogen,
cell-based and kinase function and binding assays from
DiscoveRx,
as well as molecular function ontologies visualized by the
Gene Ontology
Consortium (more information). Information from MGD includes
phenotypes for mouse orthologs and a popup table with information on
phenotypic alleles of the orthologs. This table presents the following
columns:
Allele Name - Official symbol for the allele with link to MGD
record
MGI id - MGI identifier of the allele (linked to in previous column)
Category - Type of allele by mode of origin
Observed Phenotypes in Mouse - Phenotypic details for all
genotypes that include at least one of the alleles
For KEGG, CST, and Invitrogen iPath pathways, one can view other genes that participate in these pathways.
Selecting the pathways of interest and clicking the GeneDecks link leads to a results page containing
a list of these genes and their descriptions.
Interacting proteins
Each line in this table represents one interacting protein, according to EBI-IntAct, MINT, or both.
The following columns are presented:
Interactant - Links to the GeneCards page (first sub-column) and the UniProt page (second
sub-column) for the interacting protein. Superscript links: 1 - the comments section in the UniProt
page for the interactant; 2 - the page of all interactions between the two proteins, or all
experiments supporting them, in the MINT database.
Interaction Details - Links to the interaction page in the database from which it was retrieved.
In the case of IntAct, this page may include several different experiments supporting the same
interaction. In the MINT database, each distinct interaction definition, or experiment supporting it, is
assigned a different MINT ID; all are presented.
This section provides relationships between GeneCards genes and both chemical compounds and
drugs, as well as links to drugs and compounds for purchase at BIOMOL.
Chemical compound relationships are from AKS.
Drug compound relationships are from PharmGKB.
AKS chemical compound relationships.
This table presents the following columns:
Compound - The name of the chemical compound related to this GeneCards gene.
Articles - The number of articles in which both the gene's symbol and the compound appear.
PubMed IDs for Articles with Shared Sentences (# sentences) - PubMed IDs of articles in which both
the gene symbol and the compound appear in the same sentence,
sorted by the number of sentences (shown in parentheses in the column) in which they both appear.
PharmGKB drug compound relationships.
This table presents the following columns:
Drug Compound - The name of the drug compound related to this GeneCards gene.
PharmGKB Relations - description of the relationship between the gene and the drug:
CO - Clinical Outcome
PD - Pharmacodynamics and Drug Response
PK - Pharmacokinetics
FA - Molecular and Cellular Functional Assays
GN - Genotype
PubMed IDs for articles supporting these relationships - PubMed IDs of articles in which both the
gene symbol and the drug are discussed.
This section contains associated
Unigene clusters and
representative sequences,
RefSeq mRNAs with
associated
expression assays from Applied Biosystems when available,
RNAi products from
Invitrogen,
OriGene clones,
DOTS assemblies
(sorted by a scoring scheme that gives preference to mRNA over EST associations),
GeneTide highest scoring ESTs,
transcript and alignment information from AceView,
additional gene/cDNA sequences from
GenBank, alternative splicing information, and transcript links to
Ensembl.
Alternative Splicing
This subsection contains alternative splicing information according to
ASD followed by
alternative splicing isoforms from ECgene. Exons with
alternative splice sites in different isoforms were broken into Exonic Units (ExUns). The letters
indicate the order of the ExUns in the exon. The symbol '^' between ExUns indicates an intron,
while '·' indicates the junction of two ExUns.
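The notation above can be read mechanically. Splitting at '^' gives the exons, and splitting each exon at '·' gives its ExUns. The sketch assumes ExUn labels such as "1a" (exon number followed by the order letter). That label format and the example string are hypothetical, not taken from ASD or ECgene output.

```python
from typing import List

INTRON = "^"
JUNCTION = "\u00b7"  # the middle dot '·' joining two ExUns


def parse_isoform(structure: str) -> List[List[str]]:
    """Hypothetical parser: split into exons at '^', then each exon into ExUns at '·'."""
    exons = []
    for exon in structure.split(INTRON):
        units = [u.strip() for u in exon.split(JUNCTION) if u.strip()]
        if units:
            exons.append(units)
    return exons


print(parse_isoform("1a \u00b7 1b ^ 2 ^ 3a"))  # [['1a', '1b'], ['2'], ['3a']]
```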
This section contains links to
expression assays
from Applied Biosystems,
experimental results from GeneNote,
probeset-to-gene annotations from GeneAnnot and
GeneTide,
electronic Northern data images and clone count from
UniGene,
SAGE
expression data images and tag counts based on data extracted from
CGAP, followed by links to
SOURCE,
and/or EXPOLDB,
and/or tissue specificity data from
UniProt.
A table presents the association of GeneCards genes with Affymetrix probe-sets, through GeneAnnot and GeneTide.
One can view other genes that share binary patterns of normal tissue expression based on GeneNote data (HG-U95 arrays).
Selecting the probe-sets of interest and clicking the GeneDecks link leads to a results page containing
a list of these genes and their descriptions.
Other columns include data from GeneAnnot and GeneTide, where an asterisk next to the
probe set name indicates lower quality annotation, as follows:
Array - The Affymetrix GeneChip® expression array. Note that U95-A refers to Affymetrix array AV2
(version 2).
GeneAnnot
# genes - The number of genes related to this probe set.
Sensitivity - The fraction of probes that hit transcripts related to that gene (range: 0-1).
Specificity - The degree to which the individual probes of a given probe set match a certain gene
and only that gene (range: 0-1, where 1 is most specific).
GeneNote
Correlation - see description below in "GeneNote - individual probe-sets variation" (range: 0-1).
Length - see description below in "GeneNote - individual probe-sets variation" (range:
0-4).
GeneTide
Gb_Accession - The mRNA's GenBank accession number.
Consensus - The fraction of annotating resources that agree that the cDNA belongs to the gene
(range: 0-1).
Uniqueness - A confidence score that says how convinced each resource is
that this is the only possible gene associated with the sequence (range: 0-1).
Score - The 'Consensus' and 'Uniqueness' parameters collapsed into one score (range: 0-1).
Rank - The position of the specific gene among all other genes associated with this transcript.
After the table, three pairs of images are presented, for GeneNote, electronic Northern, and SAGE tissue expression data
respectively, with the following tooltips:
GeneNote - expression arrays
Experimental tissue vectors: Duplicate measurements were obtained for twelve normal human tissues
hybridized against Affymetrix GeneChips HG-U95A-E. The intensity values (shown on the y-axis) were
normalized and drawn on a novel scale that is intermediate between logarithmic and linear scales. This
makes it possible to display several orders of magnitude on the same graph while still emphasizing the differences
between them. Noise was not subtracted, so values below 10 may be suspect. In addition, each
probe-set's expression profile was converted into binary form when possible. At most 5 unique binary
patterns, which reflect over-expression (in black) and under-expression (in white) in different
tissues, are shown per gene, with their counts on the left. (The grey stripes show undefined binary
patterns.) Please note: under-expression does not necessarily mean a lack of expression.
GeneNote - individual probe-sets variation
Multiple probe-sets corresponding to this gene are included in its tissue vector calculation only if their
normalized intensity levels reach a threshold in at least one tissue. The variation of included and excluded
probe-sets is visualized in the x-y plane: the x-axis shows Pearson's correlation between each individual
probe-set vector and the average tissue vector; the y-axis shows the relative length of an individual
probe-set vector (its scalar length divided by that of the average vector). The average is shown as a
black square, while individual probe-sets are depicted as colored circles.
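The two plot coordinates can be computed directly from the tissue vectors. The minimal sketch below makes three assumptions. First, the average tissue vector is the plain mean of the included probe-set vectors. Second, "scalar length" is the Euclidean norm. Third, the threshold is a caller-supplied number, because the actual GeneNote threshold is not given here. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def length(v: Sequence[float]) -> float:
    """Euclidean (scalar) length of a tissue vector."""
    return math.sqrt(sum(a * a for a in v))


def probe_set_variation(probe_sets: List[List[float]], threshold: float
                        ) -> Tuple[List[float], List[Tuple[float, float, bool]]]:
    """Return the average tissue vector and an (x, y, included) point per probe-set."""
    # A probe-set is included if it reaches the threshold in at least one tissue.
    flags = [max(p) >= threshold for p in probe_sets]
    included = [p for p, ok in zip(probe_sets, flags) if ok]
    if not included:
        raise ValueError("no probe-set reaches the threshold")
    n_tissues = len(probe_sets[0])
    # Average tissue vector over the included probe-sets only.
    avg = [sum(p[t] for p in included) / len(included) for t in range(n_tissues)]
    avg_len = length(avg)
    # x = correlation with the average, y = relative length; excluded sets are still plotted.
    points = [(pearson(p, avg), length(p) / avg_len, ok)
              for p, ok in zip(probe_sets, flags)]
    return avg, points


avg, pts = probe_set_variation([[1, 2, 3], [2, 4, 6], [0.1, 0.1, 0.2]], threshold=3)
print(avg)  # [1.5, 3.0, 4.5]
```

In this toy input the third probe-set never reaches the threshold. It is therefore excluded from the average but still gets a point on the plot.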
UniGene - electronic Northern
Electronic Northern: For the shown set of
non-fetal normal human tissues, NCBI's Unigene dataset (Hs.data) is mined for information about the
number of unique clones per gene per tissue. Clones are assigned to particular tissues by applying
data-mining heuristics to Unigene's library information file (Hs.lib.info). Electronic expression
results were calculated by dividing the number of clones for the gene in each tissue by the total number of clones for that tissue.
They were then normalized by multiplying by 1M, and the resulting normalized counts are presented on
the same root scale as the experimental tissue vectors.
CGAP:SAGE
Serial Analysis of Gene Expression: Data are shown for ten normal human tissues (the relevant SAGE
libraries are currently not available for spleen and thymus, which are shown in lower case and flagged with a *). The CGAP
datasets Hs.frequencies and Hs.libraries are mined for information about the number of
SAGE tags per tissue. Tags are reassigned to a Unigene cluster and then to a particular gene by
mining Hs.best_gene, Hs.best_tag and Hs_GeneData. The expression level of a
particular gene in a particular tissue was calculated as the number of appearances of the
corresponding tag divided by the total number of tags in libraries derived from that tissue. These
fractions were then normalized by multiplying by 1.2M, and the resulting normalized counts are presented
on the same root scale as that used for the electronic Northern pictures. Please note: currently,
only associations with minimal ambiguity participate in the analysis.
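Both the electronic Northern and the SAGE values are per-tissue fractions scaled by a constant: 1M for clones and 1.2M for tags. The sketch below shows only that arithmetic. It leaves out the heuristic assignment of clones and tags to tissues and genes. It also leaves out the final root-scale display, because the exact root used is not given. Omitting tissues with a zero total (no library, as with spleen and thymus for SAGE) is an assumption, and the function name is hypothetical.

```python
from typing import Dict

ELECTRONIC_NORTHERN_SCALE = 1_000_000  # clone fractions are multiplied by 1M
SAGE_SCALE = 1_200_000                 # SAGE tag fractions are multiplied by 1.2M


def normalized_counts(gene_counts: Dict[str, int],
                      tissue_totals: Dict[str, int],
                      scale: float) -> Dict[str, float]:
    """Hypothetical helper: (count for this gene / total count for the tissue) * scale.

    Tissues with a zero total (no library available) are omitted.
    """
    return {tissue: gene_counts.get(tissue, 0) / total * scale
            for tissue, total in tissue_totals.items() if total > 0}


clones = normalized_counts({"liver": 5},
                           {"liver": 50_000, "brain": 100_000, "spleen": 0},
                           ELECTRONIC_NORTHERN_SCALE)
print(sorted(clones))  # ['brain', 'liver']
```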
Organism - The names of the homologous species, using both scientific and popular terminology.
Gene - The symbol for the gene in the homologous species.
Locus - The position of the gene in the homologous species.
Description - The description of the gene in the homologous species.
Human Similarity - The percent similarity to the human gene, followed either by
(n) where the comparison was based on nucleic acids or (a) for amino acid based comparisons.
NCBI accessions - links to the sequences for the gene in NCBI databases including
GenBank and Entrez Gene.
Clicking the "Species with no ortholog" link opens a pop-up window listing the species that do NOT have an ortholog of the relevant gene.
Superscripts represent the source from which this data was extracted. If a '~'
follows the superscript for HomoloGene, it means that data for this species exists only
in the older version of HomoloGene, which used unfinished genomes and where the homologs found
might not be true orthologs.
Following the table is a link to Ensembl gene trees.
SNP information is currently extracted from
dbSNP XML files. Only SNPs that meet all of the following criteria are included: not artifacts, not connected to
gene duplication, not withdrawn by NCBI, fully specified, without ambiguous locations or low map quality,
and each having a single Entrez Gene ID and a single contig ID.
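The filter is a conjunction of the criteria listed above. In the sketch below, the record type and its field names are hypothetical, a flattened stand-in for whatever is parsed from the dbSNP XML. They are not the dbSNP schema.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class SnpRecord:
    """Hypothetical flattened view of a dbSNP record (not the dbSNP XML schema)."""
    rs_id: str
    is_artifact: bool
    in_gene_duplication: bool
    withdrawn: bool
    fully_specified: bool
    ambiguous_location: bool
    low_map_quality: bool
    entrez_gene_ids: FrozenSet[str]
    contig_ids: FrozenSet[str]


def passes_filter(s: SnpRecord) -> bool:
    """Keep a SNP only if every criterion listed above holds."""
    return (not s.is_artifact
            and not s.in_gene_duplication
            and not s.withdrawn
            and s.fully_specified
            and not s.ambiguous_location
            and not s.low_map_quality
            and len(s.entrez_gene_ids) == 1
            and len(s.contig_ids) == 1)
```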
The order of a gene's displayed SNPs can be set by the user. By default, SNPs are
sorted first (shown in the select box as 1st) by validation status (validated before
non-validated); then, within these groups, by location type as the secondary (2nd) nested criterion,
in the following order: coding non-synonymous, coding synonymous, coding, splice site, mRNA-UTR,
intron, locus, reference, and exception; and finally by the number of validations (up to 4).
The user can change this default sort order and define up to three hierarchical sorting priorities
using the select boxes above the relevant columns on the section's button line. The available fields are:
rs-numbers (ascending order), validation status, position on the chromosome (ascending order),
location type, allele frequencies (existing information before missing information), population types
(alphabetical order), and total sample size (largest to smallest).
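The default three-level ordering maps directly onto a tuple sort key. The direction of the last criterion is an assumption: the text does not say whether more validations come first, and the sketch places them first. The record type is hypothetical. Its location codes are the Type abbreviations used in the SNP table.

```python
from typing import List, NamedTuple

# Default location-type order described above, using the table's Type codes.
LOCATION_ORDER = ["nonsynon", "synon", "cds", "spl", "utr", "int", "loc", "ref", "exc"]


class Snp(NamedTuple):
    """Hypothetical minimal SNP record for sorting."""
    rs_id: str
    validated: bool
    location_type: str
    n_validations: int  # 0-4


def default_sort(snps: List[Snp]) -> List[Snp]:
    """1st: validated first; 2nd: location type order; 3rd: more validations first (assumed)."""
    return sorted(snps, key=lambda s: (not s.validated,
                                       LOCATION_ORDER.index(s.location_type),
                                       -s.n_validations))


snps = [Snp("rs3", False, "nonsynon", 0), Snp("rs1", True, "int", 2),
        Snp("rs2", True, "nonsynon", 1), Snp("rs4", True, "nonsynon", 3)]
print([s.rs_id for s in default_sort(snps)])  # ['rs4', 'rs2', 'rs1', 'rs3']
```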
Each displayed line includes genomic, expression, and allele frequency data sections. Only the summary
is shown for the expression and allele frequency sections, with a link to the detailed information
(via the magnifying glass icon).
This table presents the following columns:
AB - The AB logo is presented if an Applied Biosystems TaqMan genotyping assay exists for the SNP.
Click on the logo to access the relevant page at AB.
SNP ID - The NCBI rs number for this SNP
Valid - The validation method(s) associated with this SNP:
C - by-cluster
has 2+ submissions, with 1+ submissions assayed with a non-computational method
A - by-2hit-2allele
all alleles have been observed in 2+ chromosomes
F - by-frequency
subsnp has frequency data submitted
H - by-hapmap
validated by HapMap project
O - by-other-pop
Chr pos - chromosomal position: position of the variation (strand).
Sequence - The sequence flanking the base pair variation (highlighted in blue/orange/green/pink).
Lower case letters indicate repetitive or low-complexity sequence.
Recs - number of records for expression/allele frequency data
AAChg - The change in amino acid resulting from this SNP
Type - The SNP type:
nonsynon - coding, non-synonymous
change in peptide with respect to contig sequence
synon - coding, synonymous
no change in peptide for allele with respect to contig seq
cds - coding
variation in coding region of gene, assigned if allele-specific class unknown
spl - splice-site
variation in first 2 or last 2 bases of intron
utr - mRNA untranslated region
variation in transcript, but not in coding region interval
int - intron
variation in intron, but not in first 2 or last 2 bases of intron
exc - exception
variation in coding region with an exception raised on alignment. This occurs when a protein with
a gap in its sequence is aligned back to the contig sequence. Variations 3' of the gap have undefined
functional inference.
ref - reference
allele observed in reference contig sequence
loc - locus-region
variation in region of gene, but not in transcript
Allele freq - Average frequency of the alleles across all populations, displayed as a pie chart
(only if there are 2 alleles). Alleles are in the same orientation and color as the displayed SNP sequence.
Numeric information about the frequencies is available on mouseover.
Pop - population type
Total sample - total data sample size (number of chromosomes)
Additional columns in Expression data popup:
mRNA Accession - The mRNA sequence at NCBI
Protein Accession - The protein sequence at NCBI
Phase - Codon position (1, 2, or 3).
Protein Position - Position number of the amino acid in the protein.
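Phase and protein position both follow from a variant's position within the coding sequence. The sketch assumes a 1-based coding position counted from the first base of the start codon, which is standard codon arithmetic. How NCBI stores these fields is not described here, and the function name is hypothetical.

```python
from typing import Tuple


def codon_phase_and_aa_position(cds_position: int) -> Tuple[int, int]:
    """Hypothetical helper: (phase 1-3, 1-based amino acid position) for a 1-based CDS position."""
    if cds_position < 1:
        raise ValueError("cds_position must be >= 1")
    phase = (cds_position - 1) % 3 + 1         # position within the codon
    aa_position = (cds_position - 1) // 3 + 1  # which codon, i.e. which amino acid
    return phase, aa_position


print(codon_phase_and_aa_position(200))  # (2, 67)
```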
Additional columns in Allele Frequency data popup:
Het - estimated heterozygosity of population
Sample Size - population data sample size (number of chromosomes)
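The table does not say how dbSNP estimates heterozygosity. One standard estimate is the expected heterozygosity H = 1 - sum(p_i^2) computed from the allele frequencies. Given the sample size in chromosomes, it can be multiplied by Nei's small-sample correction n/(n-1). The sketch shows that textbook formula as an illustration only. Do not read it as the exact value reported in the popup.

```python
import math
from typing import Optional, Sequence


def expected_heterozygosity(allele_freqs: Sequence[float],
                            n_chromosomes: Optional[int] = None) -> float:
    """H = 1 - sum(p_i^2); optionally scaled by n/(n-1) (Nei's small-sample correction)."""
    if not math.isclose(sum(allele_freqs), 1.0, abs_tol=1e-6):
        raise ValueError("allele frequencies must sum to 1")
    h = 1.0 - sum(p * p for p in allele_freqs)
    if n_chromosomes is not None and n_chromosomes > 1:
        h *= n_chromosomes / (n_chromosomes - 1)
    return h


print(expected_heterozygosity([0.5, 0.5]))  # 0.5
```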
Additional SNPs, found in the Applied Biosystems data source but not in NCBI, are displayed under the table
(both "see all" options display these SNPs).
This section also provides Linkage Disequilibrium (LD) information from HapMap. The first link opens a
popup window showing the LD map across the length of the gene in the CEU population
(Utah residents with ancestry from northern and western Europe). For other populations (at HapMap),
click the second link.
Disorders & Mutations
Articles - The number of articles in which both the gene's symbol (or description) and the disease
appear.
PubMed IDs for Articles with Shared Sentences (# sentences) - PubMed IDs of articles in which both
the gene symbol and the disease appear in the same sentence,
sorted by the number of sentences (shown in parentheses in the column) in which they both appear.
PharmGKB disease relationships
This table presents the following columns:
Disease - The name of the disease related to this GeneCards gene.
PharmGKB Relations - description of the relationship between the gene and the disease:
CO - Clinical Outcome
PD - Pharmacodynamics and Drug Response
PK - Pharmacokinetics
FA - Molecular and Cellular Functional Assays
GN - Genotype
PubMed IDs for articles supporting these relationships - PubMed IDs of articles in which both the
gene symbol and the disease are discussed.
The articles are ranked,
first according to the number of GeneCards sources that associate the article with this gene, then by
date of publication,
and then according to the
AKS score for this
article/gene relationship. The year of publication appears in parentheses after the title of each
article.
Lower ranked articles may also appear in partial results if their titles or authors contain your
search term.
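The three-level article ranking fits naturally into one sort key. Two directions are assumptions here, since the text does not state them: newer publications come first, and higher AKS scores come first. The record type and placeholder IDs are hypothetical.

```python
from datetime import date
from typing import List, NamedTuple


class Article(NamedTuple):
    """Hypothetical article record."""
    pmid: str
    n_sources: int     # GeneCards sources associating the article with the gene
    published: date
    aks_score: float   # AKS score for the article/gene relationship


def rank_articles(articles: List[Article]) -> List[Article]:
    """More sources first, then newer first (assumed), then higher AKS score first (assumed)."""
    return sorted(articles,
                  key=lambda a: (-a.n_sources, -a.published.toordinal(), -a.aks_score))


arts = [Article("p1", 1, date(2005, 1, 1), 9.0), Article("p2", 2, date(2001, 1, 1), 1.0),
        Article("p3", 1, date(2005, 1, 1), 5.0), Article("p4", 1, date(2007, 6, 1), 0.5)]
print([a.pmid for a in rank_articles(arts)])  # ['p2', 'p4', 'p1', 'p3']
```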
This section allows the user to search
PubMed,
OMIM,
or NCBI Bookshelf. The current gene's aliases and disorders are provided,
as well as the search string that led to the gene, for use as search terms. The user can also
add new search terms.
The relevance scores of elements related to genes (chemical
substances and diseases) are based on an analysis of the co-occurrence of two
elements in Medline documents. The observed number of documents in which both
elements appear together, and the numbers of documents in which each appears
independently, are compared to an expected value based on a hypergeometric
distribution. The more the observed co-occurrences exceed the expected number,
the less likely it is that they occurred by chance, and the higher the score.
The absolute numbers are not meaningful on their own; they only establish an
order of importance. For example, in the list of chemicals related to a gene,
the order is meaningful: the chemicals at the top of the list are statistically
more strongly related to the gene than those that follow. The absolute values
of the scores, however, may change from one release to another.
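The comparison against a hypergeometric expectation can be made concrete with an upper-tail probability. Suppose N documents exist, element A appears in n_a of them and element B in n_b. The number of documents mentioning both then follows a hypergeometric distribution with mean n_a * n_b / N. The sketch computes that expectation and P(X >= observed). Turning the p-value into a score with -log10 is an assumption for illustration; the actual AKS scoring formula is not given here.

```python
import math


def cooccurrence_pvalue(n_docs: int, n_a: int, n_b: int, n_ab: int) -> float:
    """Upper-tail hypergeometric P(X >= n_ab).

    X counts documents mentioning both elements when the n_b documents that
    mention B are drawn from n_docs documents, n_a of which mention A.
    """
    upper = min(n_a, n_b)
    # math.comb returns 0 when k > n, so impossible terms contribute nothing.
    tail = sum(math.comb(n_a, k) * math.comb(n_docs - n_a, n_b - k)
               for k in range(n_ab, upper + 1))
    return tail / math.comb(n_docs, n_b)


def relevance_score(n_docs: int, n_a: int, n_b: int, n_ab: int) -> tuple:
    """Return (expected co-occurrences, -log10 p); a higher score means less likely by chance."""
    expected = n_a * n_b / n_docs
    p = cooccurrence_pvalue(n_docs, n_a, n_b, n_ab)
    score = math.inf if p == 0 else -math.log10(p)
    return expected, score


print(relevance_score(10, 5, 5, 5)[0])  # 2.5
```

As the text notes, only the ranking such a score produces is meaningful. Its absolute value depends on corpus size and would shift as Medline grows between releases.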