GeneDecks
About GeneDecks
GeneDecks is a novel analysis tool that provides a similarity metric by highlighting shared descriptors between genes, based on the rich annotation within the GeneCards compendium of human genes. GeneDecks features two components: Partner Hunter and Set Distiller.
In Partner Hunter, users supply a query gene, and the system finds putative
functional paralogs, namely genes that are similar to the query gene based on combinatorial similarity
of attribute annotations.
In Set Distiller, users supply a set of genes, and the system ranks descriptors
by their degree of sharing within the given gene set. GeneDecks enables the elucidation of unsuspected
putative functional paralogs, and a refined scrutiny of various gene-sets (e.g. from high-throughput
experiments) for discovering relevant biological patterns.
Partner Hunter Algorithm
Partner Hunter calculates similarity scores between the query gene and every other candidate gene in the GeneCards database for the 10 attributes listed in Table 1. For all attributes except Gene Ontology, sequence paralogy, and expression, the similarity score between a query gene and a candidate gene is calculated as follows: each descriptor score (DS) is the result of dividing the descriptor's rank by the log10 of its frequency in the database.
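The descriptor score described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the frequency is assumed to be a gene count greater than 1 (so that its log10 is positive), and combining descriptor scores into an attribute score by summing over shared descriptors is an assumption, since the text does not spell out that step.

```python
import math


def descriptor_score(rank: float, frequency: int) -> float:
    """DS = rank / log10(frequency), following the description in the text.

    Assumption: `frequency` is the number of genes in the database that
    carry the descriptor, and must exceed 1 so the divisor is positive.
    """
    if frequency <= 1:
        raise ValueError("frequency must be greater than 1")
    return rank / math.log10(frequency)


def attribute_score(shared_descriptors) -> float:
    """Combine descriptor scores for one attribute.

    `shared_descriptors` is an iterable of (rank, frequency) pairs for the
    descriptors shared by the query and candidate genes.
    ASSUMPTION: the scores are simply summed; this is illustrative only.
    """
    return sum(descriptor_score(rank, freq) for rank, freq in shared_descriptors)


if __name__ == "__main__":
    # A descriptor with rank 2 that annotates 100 genes in the database.
    print(descriptor_score(2, 100))
    # Two shared descriptors for a single attribute.
    print(attribute_score([(2, 100), (3, 1000)]))
```

Under this formula, a descriptor with a given rank contributes more when it is rare in the database than when it is common.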
For the sequence paralogy attribute, if a partner candidate is also identified as a sequence paralog (SP),
then it is assigned a value of 1 for this attribute, and 0 otherwise.
Each attribute score is then multiplied by the weight assigned to that attribute, and the weighted attribute scores are summed to give the Partner Hunter score (PHS).
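As a rough sketch of this final step, the Python below computes the sequence paralogy indicator and the weighted sum that yields the PHS. The function names, attribute keys, gene symbols, and weight values are all hypothetical placeholders, not values used by GeneDecks.

```python
def sequence_paralogy_score(candidate: str, paralogs_of_query: set) -> float:
    """Return 1 if the candidate is a known sequence paralog of the query, else 0."""
    return 1.0 if candidate in paralogs_of_query else 0.0


def partner_hunter_score(attribute_scores: dict, weights: dict) -> float:
    """PHS = sum over attributes of (weight * attribute score)."""
    return sum(weights[name] * score for name, score in attribute_scores.items())


if __name__ == "__main__":
    # Hypothetical inputs: gene symbols, scores, and weights are made up.
    scores = {
        "domains": 2.0,
        "sequence_paralogy": sequence_paralogy_score("GENE_B", {"GENE_B", "GENE_C"}),
    }
    weights = {"domains": 0.5, "sequence_paralogy": 3.0}
    print(partner_hunter_score(scores, weights))
```

Keeping the weights in a separate mapping makes it easy to see how changing the weight of a single attribute shifts the ranking of candidate partners.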
Set Distiller Algorithm
Set Distiller employs descriptors from 9 of the 10 attributes listed in Table 1,
for user-defined query gene sets. For each descriptor, a P-value is calculated from the
binomial distribution, testing the null hypothesis that the frequency of the descriptor
in the query set does not differ from what is expected under random sampling of genes,
given the frequency of the descriptor in the set of all genes. Expression levels, calculated as
quantiles, are grouped into low (1-3), medium (4-7), high (8-10), or no expression (0) and treated
like all other descriptors. Descriptors are sorted by increasing
P-value and then by decreasing occurrence count within the gene set. Bonferroni correction is used
to correct for multiple testing, and only descriptors with P-value < 0.05 are displayed.
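The steps in this paragraph can be illustrated with a short, standard-library Python sketch. Several choices here are assumptions rather than documented GeneDecks behavior: the test is taken to be one-sided (upper tail, i.e. enrichment), the Bonferroni factor is the number of descriptors tested, and all function names and example descriptors are hypothetical.

```python
from math import comb


def expression_bin(quantile: int) -> str:
    """Map an expression quantile (0-10) to the groups named in the text."""
    if quantile == 0:
        return "none"
    if 1 <= quantile <= 3:
        return "low"
    if 4 <= quantile <= 7:
        return "medium"
    if 8 <= quantile <= 10:
        return "high"
    raise ValueError("quantile must be between 0 and 10")


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p). ASSUMPTION: one-sided enrichment test."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def rank_descriptors(set_counts: dict, background_freq: dict, set_size: int,
                     alpha: float = 0.05) -> list:
    """Return (descriptor, count, corrected P-value) tuples that pass the cutoff.

    set_counts: descriptor -> number of query-set genes carrying it.
    background_freq: descriptor -> fraction of all genes carrying it.
    """
    n_tests = len(set_counts)  # ASSUMPTION: Bonferroni factor = descriptors tested
    rows = []
    for descriptor, count in set_counts.items():
        p_raw = binomial_upper_tail(count, set_size, background_freq[descriptor])
        p_adj = min(1.0, p_raw * n_tests)  # Bonferroni correction, capped at 1
        if p_adj <= alpha:
            rows.append((descriptor, count, p_adj))
    # Sort by increasing P-value, then by decreasing occurrence count.
    rows.sort(key=lambda row: (row[2], -row[1]))
    return rows


if __name__ == "__main__":
    counts = {"kinase domain": 8, "membrane": 3}
    background = {"kinase domain": 0.05, "membrane": 0.30}
    for row in rank_descriptors(counts, background, set_size=10):
        print(row)
```

In this made-up example, a descriptor carried by 8 of 10 query genes but only 5% of all genes passes the cutoff, while one carried by 3 of 10 query genes and 30% of all genes does not.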
Table 1
The attributes used in GeneDecks algorithms with their contributing data sources. Attribute inclusion in the Partner Hunter or Set Distiller algorithms is marked.
| Attribute | Partner Hunter | Set Distiller | Data Source |
|---|---|---|---|
| Sequence paralogy | + | | |
| Domains | + | + | |
| Pathways (Invitrogen) | + | + | |
| Pathways (CST) | + | + | |
| Pathways (KEGG) | + | + | |
| Expression patterns | + | + | |
| Phenotypes | + | + | |
| Compounds | + | + | |
| Disorders | + | + | |
| Gene Ontology | + | + | |