GeneDecks is a novel analysis tool which provides a similarity metric by highlighting shared descriptors between genes, based on the rich annotation within the GeneCards compendium of human genes. GeneDecks features Partner Hunter and Set Distiller.
In Partner Hunter, users supply a query gene, and the system finds putative functional paralogs, namely genes that are similar to the query gene based on combinatorial similarity of attribute annotations.
In Set Distiller, users supply a set of genes, and the system ranks descriptors by their degree of sharing within the given gene set. GeneDecks enables the elucidation of unsuspected putative functional paralogs, and a refined scrutiny of various gene-sets (e.g. from high-throughput experiments) for discovering relevant biological patterns.
Partner Hunter Algorithm
Partner Hunter calculates similarity scores between the query gene and every candidate gene in the GeneCards database for the 10 attributes listed in table 1. For all attributes except Gene Ontology and sequence paralogy, the similarity score between a query gene and a candidate gene is calculated in the following manner: each descriptor score (DS) is the result of dividing its rank by Log10 of its frequency in the database. Descriptor ranks are each assigned the value of 1, except for those associated with the Gene Ontology (GO) attribute, which are assigned according to the descriptor's evidence code (Buza et al. 2008); for example, Inferred from Direct Assay (IDA) will receive a descriptor score of 5. The attribute score (AS) is the sum of the descriptor scores for those descriptors shared by both the query gene and the candidate gene, divided by the sum of the descriptor scores for all descriptors associated with the query gene.
For the sequence paralogy attribute, a partner candidate that is also identified as a sequence paralog (SP) of the query gene is assigned a value of 1 for this attribute, and 0 otherwise.
Gene expression data was mined from BioGPS (http://biogps.org/). The similarity score is the mean Pearson correlation (P.Corr) between all expression vectors for the query gene and candidate gene. This improves how GeneDecks handles expression patterns, since it looks for vector correlations rather than exact matches of binary expression patterns and is therefore less stringent.
Each attribute score is then multiplied by the weight assigned to that attribute, and the weighted attribute scores are summed to give the Partner Hunter score (PHS).
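The scoring steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the GeneCards implementation: the function names, the toy descriptor names, the frequencies, and the weights are all hypothetical, and the sketch assumes that a descriptor's frequency is a count greater than 1 (the text does not say how a frequency of 1, where Log10 is zero, is handled).

```python
import math


def descriptor_score(frequency, rank=1.0):
    """DS = rank / log10(frequency of the descriptor in the database)."""
    if frequency <= 1:
        # Assumption: the source does not say how log10(frequency) <= 0 is handled.
        raise ValueError("this sketch needs a frequency greater than 1")
    return rank / math.log10(frequency)


def attribute_score(query_descriptors, candidate_descriptors, frequencies, ranks=None):
    """AS = sum of DS over shared descriptors / sum of DS over query descriptors."""
    ranks = ranks or {}  # descriptors without an explicit rank default to 1
    scores = {
        d: descriptor_score(frequencies[d], ranks.get(d, 1.0))
        for d in query_descriptors
    }
    total = sum(scores.values())
    if total == 0:
        return 0.0  # the query gene has no descriptors for this attribute
    shared = sum(s for d, s in scores.items() if d in candidate_descriptors)
    return shared / total


def partner_hunter_score(attribute_scores, weights):
    """PHS = sum over attributes of (weight * attribute score)."""
    return sum(weights[name] * score for name, score in attribute_scores.items())


# Toy data: descriptor names, frequencies, and weights are made up for illustration.
frequencies = {"d1": 10, "d2": 100, "d3": 1000}
pathway_as = attribute_score({"d1", "d2"}, {"d1", "d3"}, frequencies)
phs = partner_hunter_score(
    {"pathways": pathway_as, "sequence_paralogy": 1.0},
    {"pathways": 3.0, "sequence_paralogy": 1.0},
)
```

In the toy example the query gene carries two descriptors and shares only the rarer-scoring one with the candidate, so the shared descriptor contributes proportionally more than a simple overlap count would suggest. Dividing by Log10 of the frequency is what gives rare descriptors their larger influence.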
Set Distiller Algorithm
Set Distiller employs descriptors from 9 of the 10 attributes listed in table 1, for user-defined query gene sets. For each descriptor, a P-value is calculated from the binomial distribution, testing the null hypothesis that the frequency of the descriptor in the query set does not differ from what is expected under random sampling of genes, given the frequency of the descriptor in the set of all genes. Expression levels are converted to binary values, giving a binary expression vector for each gene, as previously described (Yanai et al. 2005). Only tissues in which expression was observed are used, and these are treated like all other descriptors. Descriptors are sorted by increasing P-value and then by decreasing occurrence count within the gene set. Bonferroni correction is used to correct for multiple testing, and only descriptors with P-value < 0.05 are displayed.
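As a rough illustration of this ranking procedure, the following Python sketch computes a binomial P-value per descriptor, applies a Bonferroni correction, filters, and sorts. Several choices here are assumptions rather than documented GeneDecks behaviour: the test is taken to be one-sided (probability of seeing at least the observed count), the correction multiplies by the number of descriptors tested, descriptors are kept when the corrected P-value falls below the 0.05 significance threshold, and all function and descriptor names are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def distill(set_counts, set_size, background_freq, alpha=0.05):
    """Rank descriptors by how strongly they are shared within a gene set.

    set_counts:      descriptor -> number of genes in the query set carrying it
    set_size:        number of genes in the query set
    background_freq: descriptor -> fraction of all genes carrying it
    """
    n_tests = len(set_counts)  # Bonferroni factor (assumed: one test per descriptor)
    rows = []
    for descriptor, count in set_counts.items():
        p_value = binomial_upper_tail(count, set_size, background_freq[descriptor])
        corrected = min(1.0, p_value * n_tests)
        if corrected < alpha:
            rows.append((descriptor, count, corrected))
    # Increasing P-value first, then decreasing occurrence count.
    rows.sort(key=lambda row: (row[2], -row[1]))
    return rows


# Toy data: a 10-gene set; descriptor names and background frequencies are made up.
result = distill(
    set_counts={"tissue_A": 8, "pathway_B": 1},
    set_size=10,
    background_freq={"tissue_A": 0.1, "pathway_B": 0.5},
)
```

Here `tissue_A` appears in 8 of 10 genes although only 10% of all genes carry it, so it survives the correction, while `pathway_B` (1 of 10 genes against a 50% background) is filtered out.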
Table 1. The attributes used in the GeneDecks algorithms, with their contributing data sources. Attribute inclusion in the Partner Hunter or Set Distiller algorithms is marked.
|Attribute||Partner Hunter||Set Distiller||Data Source|