Currently, GeneSieve only displays demonstration data from an internal training set. In the near future, the "Input trait" and "Input Gene(s)" fields will be entirely up to the user. In addition, the "Input Gene(s)" box will accept a full set of protein sequences in FASTA format. These proteins will be used to search the initial layer of the database.
As it stands, please select a trait from the drop-down menu; the "Input Gene(s)" field will then populate automatically.
Please note that currently, some demo queries do not return any candidates.
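The planned protein input can be illustrated with a small FASTA reader. This is a minimal Python sketch, not GeneSieve's code. The function name parse_fasta and the choice to use the first word of each ">" header line as the gene identifier are assumptions. It joins multi-line sequences into one string per record.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}.

    Hypothetical helper: the record ID is assumed to be the first
    whitespace-delimited token after '>'. Later records with a duplicate
    ID overwrite earlier ones.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        elif current_id is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequences may span several lines; normalize to upper case.
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


# Hypothetical example input with two protein records.
example = """>geneA hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>geneB
mssplq
"""
proteins = parse_fasta(example.splitlines())
for gene_id, seq in proteins.items():
    print(gene_id, len(seq))
```

In practice, an established parser such as Biopython's Bio.SeqIO.parse(handle, "fasta") covers more edge cases than this sketch.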
The resolution of most genetic experiments is limited, so they cannot pinpoint the single gene responsible for a trait. Typically, an experiment narrows the search down to a set of 20 to 100 candidate genes. GeneSieve's goal is to prioritize these genes based on previously determined associations between traits and genomic sequences from four model crop species. After performing a genetic experiment (in any species), the researcher can submit the set of candidate genes together with a natural language description of the trait, such as one found in the literature.
The GeneSieve database is essentially an association graph in which each query has two terminal nodes: a submitted gene and the trait description. Each gene in the submitted list spawns a set of paths through the association graph. The current graph has four types of weighted links: 1) protein similarity, 2) coexpression, 3) genetic association, and 4) trait similarity. All links are weighted from 0 to 1 based on biological criteria. For each gene the user has submitted, GeneSieve finds all paths through the existing database that connect that gene to the submitted trait. The higher the path scores and the more independent paths that connect a candidate to the trait, the more likely that candidate is to underlie the trait.
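As a rough illustration of this scoring idea, the following Python sketch enumerates simple paths from each submitted gene to the trait node of a small weighted graph and ranks the genes. Several choices are assumptions, not GeneSieve's documented method: a path's score is the product of its link weights, paths count as independent when they share no intermediate nodes (selected greedily, best first), and independent path scores are combined with a noisy-OR, 1 - prod(1 - s). All node names and weights are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: node -> list of (neighbor, weight) pairs.
Graph = Dict[str, List[Tuple[str, float]]]


def add_link(graph: Graph, a: str, b: str, weight: float) -> None:
    """Add an undirected weighted link; weights must lie in [0, 1]."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("link weights must be in [0, 1]")
    graph.setdefault(a, []).append((b, weight))
    graph.setdefault(b, []).append((a, weight))


def find_paths(graph: Graph, start: str, target: str,
               max_len: int = 4) -> List[Tuple[List[str], float]]:
    """Enumerate simple paths with at most max_len links from start to target.

    Assumption: a path's score is the product of its link weights.
    """
    results: List[Tuple[List[str], float]] = []

    def dfs(node: str, path: List[str], score: float) -> None:
        if node == target:
            results.append((list(path), score))
            return
        if len(path) - 1 >= max_len:  # path already has max_len links
            return
        for nbr, w in graph.get(node, []):
            if nbr not in path:  # keep paths simple (no repeated nodes)
                path.append(nbr)
                dfs(nbr, path, score * w)
                path.pop()

    dfs(start, [start], 1.0)
    return results


def candidate_score(paths: List[Tuple[List[str], float]]) -> Tuple[float, int]:
    """Greedily keep the best paths sharing no intermediate nodes, then noisy-OR."""
    used: set = set()
    kept: List[float] = []
    for path, score in sorted(paths, key=lambda p: p[1], reverse=True):
        inner = set(path[1:-1])  # exclude the gene and trait endpoints
        if inner & used:
            continue  # overlaps an already-kept path, so not independent
        used |= inner
        kept.append(score)
    miss = 1.0
    for s in kept:
        miss *= 1.0 - s
    return 1.0 - miss, len(kept)


def rank_candidates(graph: Graph, genes: List[str], trait: str,
                    max_len: int = 4) -> List[Tuple[str, float, int]]:
    """Return (gene, score, independent path count), best first."""
    scored = []
    for g in genes:
        score, n = candidate_score(find_paths(graph, g, trait, max_len))
        scored.append((g, score, n))
    return sorted(scored, key=lambda t: t[1], reverse=True)


# Toy graph with hypothetical nodes and weights.
demo: Graph = {}
add_link(demo, "query1", "Zm001", 0.9)        # protein similarity
add_link(demo, "Zm001", "traitA", 0.8)        # genetic association
add_link(demo, "traitA", "QUERY_TRAIT", 0.7)  # trait similarity
add_link(demo, "query1", "Os002", 0.6)        # protein similarity
add_link(demo, "Os002", "traitB", 0.5)        # genetic association
add_link(demo, "traitB", "QUERY_TRAIT", 0.5)  # trait similarity
add_link(demo, "query2", "Sb003", 0.4)        # protein similarity
add_link(demo, "Sb003", "traitA", 0.9)        # genetic association

ranking = rank_candidates(demo, ["query1", "query2", "query3"], "QUERY_TRAIT")
for gene, score, n in ranking:
    print(f"{gene}\t{score:.3f}\t{n} independent path(s)")
```

The noisy-OR combination rewards both properties described above: one strong path raises a gene's score, and each additional independent path can only increase it. A gene with no connecting path (query3 here) scores 0, analogous to a query that returns no candidates. Exhaustive path enumeration grows quickly with path length, which is why max_len caps it; networkx.all_simple_paths(G, source, target, cutoff=...) offers the same enumeration for larger graphs.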