GeneCards Search Guide

GeneCards Version 3 has an improved search that is fast and provides accurate results. Version 3.0 also provides paging capabilities and easily readable MiniCards. Version 3.0 is a hybrid system: the database, search, and MiniCards have been improved, but the cards themselves remain the same as those in 2.xx, with the addition of a search bar at the top of the page. As a result, there may be slight discrepancies between the search results and the information displayed in the card.

GeneCards Version 3.0 Search Features

  • V3 uses stemming in all of its searches so that similar words will be found rather than just exact matches.

  • A search for multiple words (e.g. zona pellucida) behaves as an AND at the gene level in 3.0 (i.e. each of the words must exist in at least one of the sections of the GeneCard). To search for an exact phrase in 3.0, enclose the phrase in double quotes.

  • Parentheses should be used in searches for complex boolean strings in order to indicate precedence. Otherwise AND operations will take precedence over OR operations. See the example in Multiword Search. Please note that booleans must be capitalized. A search using "and" will not produce the same results as one using "AND".

Table of Contents

  • What Do I Get?
  • Simple Search
  • Advanced Search
  • Search Examples
  • Relevance Scores

    What Do I Get?

    • Hit Context (Minicard) - The search results are first displayed closed, showing the gene symbol, description, category, GCID and a relevance score. To open the minicard click on the plus to the left of the gene symbol on the appropriate minicard. All fields of the GeneCard in which your search term(s) were found will be displayed. The sections' names, which appear on the left, are linked to the corresponding sections in the GeneCard. All of the keywords entered in your search, including any variants found due to stemming, will be highlighted in the minicards.

      The minicard list is sorted by relevance (determined by the relevance scoring method).

    • GeneCard - A detailed display of all the data concerning a specific gene, including relevant links to other important websites.

    • Superscripts throughout the card indicate the source of each piece of information and link directly to that source's page for the specific gene you are viewing.

    More on: The GeneCard

    Simple Search

    Enter an expression into the search field on the GeneCards homepage and choose one of the following search types:

    • Symbol only - Brings up the GeneCard for the specified symbol. This is done by selecting the Symbol only button and typing the requested gene symbol in the search box.
       
    • Symbol/Alias - Searches the database for a gene name, or its alias. This is done by selecting the Symbol/Alias button and typing the requested symbol or alias for the gene in the search box.
       
    • Symbol/Alias/Identifier - Searches the database for a gene name, its alias, or another Identifier such as the GCid or an External Id (accessions from HGNC, EntrezGene, UniProt, and Ensembl). This is done by selecting the Symbol/Alias/Identifier button and typing the requested Id in the search box.
       
    • Keywords - Searches the database in a full free text search. This is done by selecting the (default) keywords button and typing any kind of text relevant to the search in the search box.

    Notes:

    • The GeneCards search is case insensitive.
    • All variants of your search terms that are found due to stemming will be highlighted in the search results.
    • When search terms are enclosed in double quotes, an exact match is made. The exact match ignores trivial words such as "a", "and", and "then". For example, searching "heart brain" would also retrieve the string "heart and a brain". An exact match also lets you set the distance between two or more terms: add a tilde (~) followed by the maximum distance (trivial words are not counted). For example:
      "heart brain"~50 searches for heart and brain within a maximum distance of 50 words, excluding trivial words.

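    The quoted-phrase and tilde behavior described in the notes above can be illustrated with a minimal Python sketch. It is not the GeneCards implementation: the TRIVIAL_WORDS set, the function names, and the way distance is counted (the number of non-trivial words between the two terms, with the terms in the order given) are all assumptions made for illustration.

```python
import re

# Hypothetical list of trivial words; the guide does not give the full list.
TRIVIAL_WORDS = {"a", "an", "and", "the", "then", "of"}

def content_words(text):
    """Lowercase the text, split it into words, and drop trivial words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in TRIVIAL_WORDS]

def within_distance(text, first, second, max_gap):
    """Return True if `second` follows `first` with at most `max_gap`
    non-trivial words between them (assumed meaning of "distance")."""
    words = content_words(text)
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    return any(0 <= j - i - 1 <= max_gap
               for i in first_positions
               for j in second_positions)

# "heart brain" (gap 0) still matches because "and" and "a" are ignored.
print(within_distance("heart and a brain", "heart", "brain", 0))
# "heart brain"~50 tolerates other words between the two terms.
print(within_distance("heart muscle tissue brain", "heart", "brain", 50))
```

    With a gap of 0 the sketch behaves like the plain quoted search, and a larger gap corresponds to the number after the tilde.
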
    Wild card Searches
    The * character serves as a wildcard, which matches all possible character strings (including no characters). Additionally, the ? character can be used to represent exactly one character. Wild cards can be used anywhere in the search string except as the first character.

    For example:
    Searching for acid* will find genes whose cards have the strings acid, acidic, acidosis, and aciduria. Searching for acid? will yield those that have strings such as ACID2 and acidi. The term ac*d finds acid, ACY1D, and ACD, while ac?d finds acid and ACAD-8, among others.

    Note that this is different from stemming, which matches strings that are considered to be related to the specified keyword. Stemming allows the search for acid (no wild card) to find acid and acidic.
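
    The wild card rules above can be tried out with a short Python sketch. It uses the standard library's fnmatch module, whose * and ? have the same meaning as described here. The function name wildcard_matches and the word list are hypothetical, and the sketch matches whole words only, which may differ from how the GeneCards index splits text into terms.

```python
from fnmatch import fnmatchcase

def wildcard_matches(pattern, words):
    """Return the words matched by a wild card pattern, ignoring case.

    '*' matches any run of characters (including none) and '?' matches
    exactly one character. A leading wild card is rejected, as in the guide.
    """
    if pattern[:1] in ("*", "?"):
        raise ValueError("a wild card cannot be the first character")
    lowered = pattern.lower()
    return [w for w in words if fnmatchcase(w.lower(), lowered)]

# Hypothetical word list built from the strings mentioned in the guide.
words = ["acid", "acidic", "acidosis", "aciduria", "ACID2", "acidi", "ACY1D", "ACD"]

for pattern in ("acid*", "acid?", "ac*d", "ac?d"):
    print(pattern, wildcard_matches(pattern, words))
```

    Unlike stemming, this kind of matching is purely character based: acid? does not match acid itself, because ? requires exactly one additional character.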

    Advanced Search

    Advanced search enables you to browse GeneCards for more specific results. A broader variety of search options is offered in order to focus the search.

    There are two ways to use the Advanced search:

    1. Click the Advanced Search link next to the search box at the GeneCards homepage - a new page with more search options appears.

      First choose which type of search you wish to perform:

      • Keywords - only genes that contain the search term in the specified sections will be displayed.

      • Symbol Only - only genes with the symbols typed in this box will be displayed.

      • Symbol/Alias/Identifier - only genes with the symbols/aliases/Ids typed in this box will be displayed.

      • Symbol/Alias - only genes with the symbols/aliases typed in this box will be displayed.

      If you are performing a keywords search you will be able to choose which section of the GeneCard you would like to search in (e.g. location, summaries, pathways). This option is located in the menu between the type of search dropdown and the search box.

      Now type the search string in the search field. If you wish to search in more than one field, click on the + button next to the search box. This adds another row, which you can use to add more terms to your search or to search for your terms in a different section of the GeneCard. When adding another term in a separate row, you may choose to search for your first term AND your second term, or for your first term OR your second term, by changing the first select box in the new row from "and" to "or".

      As in the simple search, you may enter multiple terms and explicit sub-queries in each search field. If you have a long list of search terms you may upload them as a file. By default, only genes that contain all of the entered search terms will be displayed (AND behavior). To display results that contain any one of the entered search terms (OR behavior), check "match any in list or file".

    2. Perform a simple search, and refine it after getting the results - click on the Show Advanced Search link in the top left corner above the search results. The advanced search will appear at the top of the page containing your search results.

      Try it! Refine this simple search example

    See Search Examples for more on advanced searches.

    Search Examples:

    I.   Simple Search:

    Search String: brca1
    Matching Words: brca1, BRCA1
    Search Description: Exact word match (case insensitive).*

    Search String: x84746
    Matching Words: x84746
    Search Description: GenBank accession number

    * A Keyword Search will result in a list of MiniCards, even if there is only one gene that matches your search.
      A Symbol Only Search with only one result will show the GeneCard itself right away!

    II.   Wild Card Search (*):

    This type of search usually returns a large number of results.

    Search String: live
    Matching Words: live, lives, liver, lively, lived, living
    Search Description: Any object that begins with the string "live" or some derivative of "live" determined by the search engine's stemming algorithm.

    A search for live* will return many more results and will give priority to those that contain "liver" rather than "live", "lived", or "living".
    * Please note that wild card searches using initial wild cards, for example *gammaglob*, are not supported in version 3.0 searches.

    III.   Multiword Search:

    Search String: obesity diabetes
    Matching Words: diabetes,with hyperproinsulinemia... for obesity in association with Ig
                    action profile in obese non-diabetic subjects.
    Search Description: Search behaves as if an AND was used (see the AND search below!)

    Search String: obesity AND diabetes
    Matching Words: preponderance of insulin resistance, diabetes mellitus ... genetic predisposition to obesity
                    action profile in obese non-diabetic subjects.
    Search Description: All strings or variants of strings must exist in the GeneCard.

    Search String: obesity OR diabetes
    Matching Words: obesity, obese, diabetes, diabetic
    Search Description: At least one of the strings or its variants must exist in the GeneCard (notice the difference from the AND search!)

    Search String: (neurodegenerative OR senile) AND Alzheimer
    Matching Words: ...senile dementia of the Alzheimer type: relation with the cognitive state and with quantitative studies of senile plaques...
    Search Description: Finds all instances of either neurodegenerative AND alzheimer or senile AND alzheimer.

    Search String: "macular degeneration"
    Matching Words: associated with age-related macular degeneration type 4...
    Search Description: Exact phrase search. All words in the order that they are entered must be in the GeneCard.

    * In the search (neurodegenerative OR senile) AND Alzheimer, the use of parentheses is important. Without parentheses, AND takes precedence over OR, so the search is interpreted as neurodegenerative OR (senile AND Alzheimer).
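
    The effect of the parentheses can be sketched with Python sets, treating each keyword as the set of genes whose cards contain it. The gene names below are made up for illustration, and the sketch only models the precedence rule, not the search engine itself.

```python
# Hypothetical sets of genes whose cards contain each keyword.
neurodegenerative = {"GENE_A", "GENE_B"}
senile = {"GENE_B", "GENE_C"}
alzheimer = {"GENE_B", "GENE_C", "GENE_D"}

# (neurodegenerative OR senile) AND Alzheimer
with_parentheses = (neurodegenerative | senile) & alzheimer

# neurodegenerative OR senile AND Alzheimer: AND binds first, so this is
# read as neurodegenerative OR (senile AND Alzheimer).
without_parentheses = neurodegenerative | (senile & alzheimer)

print(sorted(with_parentheses))     # genes that mention Alzheimer plus either other term
print(sorted(without_parentheses))  # also includes genes that never mention Alzheimer
```

    In this toy data GENE_A appears only in the second result: it mentions neurodegenerative but not Alzheimer, which is exactly the kind of hit the parentheses exclude.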

    IV.   Advanced Search:

    Example 1: Search by section. Finds all instances where the word angiogenesis is found in summaries and the word cancer is found anywhere in the entry.
    Example 2: Search within the disorders section for alzheimers or dementia or within the literature section for those same keywords.

    Relevance Scores

    The search platform used is Solr, which is based on Apache's Lucene text search API.
    When a term is searched, Lucene returns a set of scored hits.
    A "hit" represents a document (in our case a GeneCard) whose fields (the actual annotations) were previously indexed by Lucene.
    The score is calculated by a Lucene-defined algorithm (see Lucene's Similarity class):
    score(q,d)   =   coord(q,d)  ·  queryNorm(q)  ·  Σ_{t in q} ( tf(t in d)  ·  idf(t)  ·  t.getBoost()  ·  norm(t,d) )
    The factors in this formula are (see the Solr Relevancy FAQ):
    • tf stands for term frequency - the more times a search term appears in a document, the higher the score
    • idf stands for inverse document frequency - matches on rarer terms count more than matches on common terms
    • coord is the coordination factor - if there are multiple terms in a query, the more terms that match, the higher the score
    • lengthNorm - matches on a smaller field score higher than matches on a larger field
    • index-time boost - if a boost was specified for a document at index time, scores for searches that match that document will be boosted.
    • query clause boost - a user may explicitly boost the contribution of one part of a query over another.
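
    To make the roles of these factors concrete, here is a toy Python sketch that follows the formula as written above. The specific definitions of tf, idf, and the length norm are the ones commonly documented for Lucene's classic default similarity and should be treated as assumptions here. queryNorm and all boosts are fixed at 1, and the function name and sample data are hypothetical.

```python
import math

def toy_score(query_terms, doc_terms, doc_freq, num_docs):
    """Toy score for one document field (a list of words).

    Assumed factor definitions (not confirmed by this guide):
      tf          = sqrt(number of occurrences of the term in the field)
      idf         = 1 + ln(num_docs / (doc_freq + 1))
      length norm = 1 / sqrt(number of words in the field)
      coord       = matched query terms / total query terms
    queryNorm and boosts are taken to be 1. Query terms are assumed distinct.
    """
    matched = [t for t in query_terms if t in doc_terms]
    if not matched:
        return 0.0
    coord = len(matched) / len(query_terms)
    length_norm = 1.0 / math.sqrt(len(doc_terms))
    total = 0.0
    for term in matched:
        tf = math.sqrt(doc_terms.count(term))
        idf = 1.0 + math.log(num_docs / (doc_freq[term] + 1))
        total += tf * idf * length_norm
    return coord * total

# Hypothetical index statistics: "diabetes" is rarer than "obesity".
doc_freq = {"obesity": 50, "diabetes": 5}
query = ["obesity", "diabetes"]
short_field = ["obesity", "diabetes"]
long_field = short_field + ["filler"] * 6

print(toy_score(query, short_field, doc_freq, 1000))
print(toy_score(query, long_field, doc_freq, 1000))
```

    The same two matches score higher in the short field than in the long one, which is the lengthNorm effect described in the list above.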

    Each field can be "boosted", meaning that the weight of a specific field is increased at search time.
    In GeneCards, we boost a few fields:
    • The Symbol
    • Aliases and Descriptions
    • Accessions for the major bioinformatics databases (NCBI, Ensembl, SwissProt)
    • Gene Summaries
    • Disorders

    You can read more about Lucene's scoring mechanism here: Apache Lucene - Scoring
    To show the score with the best precision, we display it as (base 2 log of the score) + 10.
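
    As a quick illustration of this transformation, the following Python sketch converts a raw score into the displayed value. The function name is hypothetical and the input values are arbitrary examples, not real GeneCards scores.

```python
import math

def displayed_score(raw_score):
    """Return (base 2 log of the raw score) + 10.

    The raw score must be positive, since the logarithm is undefined otherwise.
    """
    return math.log2(raw_score) + 10

# Doubling the raw score raises the displayed score by exactly 1.
for raw in (0.25, 1.0, 2.0):
    print(raw, displayed_score(raw))
```

    On this scale a raw score of 1 is displayed as 10, and raw scores below 1 are displayed as values below 10.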


  • Developed at the Crown Human Genome Center, Department of Molecular Genetics, the Weizmann Institute of Science

    Version: 3.12.250 16 Nov 2014
    hostname: 356977-web1.xennexinc.com index build: 128 solr: 1.4