Gene annotation data

Data sources

We currently obtain the gene annotation data from several public data resources and keep them up-to-date, so that you don’t have to do it:

Source        Update frequency                      Notes
NCBI Entrez   weekly snapshot
Ensembl       whenever a new release is available   Ensembl Pre! and EnsemblGenomes are not included at the moment
UniProt       whenever a new release is available
NetAffy       whenever a new release is available
PharmGKB      whenever a new release is available
UCSC          whenever a new release is available   For “exons” field
CPDB          whenever a new release is available   For “pathway” field

The most up-to-date information about the data can be accessed here.

Gene object

Gene annotation data are both stored and returned as a gene object, which is essentially a collection of fields (attributes) and their values:

{
    "_id": "1017",
    "taxid": 9606,
    "symbol": "CDK2",
    "entrezgene": 1017,
    "name": "cyclin-dependent kinase 2",
    "genomic_pos": {
        "start": 56360553,
        "chr": "12",
        "end": 56366568,
        "strand": 1
    }
}

The example above omits most of the available fields. For full examples, check out a few gene objects: CDK2, ADA. You can also explore them on our interactive API page.
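Because a gene object is plain JSON, it can be read with any JSON library. The sketch below parses the abbreviated CDK2 object shown above and builds a one-line summary. The helper name `describe` is hypothetical and only for illustration. Since most fields are optional, the code treats “genomic_pos” as possibly missing.

```python
import json

# Abbreviated gene object with the same fields as the CDK2 example above.
raw = """
{
    "_id": "1017",
    "taxid": 9606,
    "symbol": "CDK2",
    "entrezgene": 1017,
    "name": "cyclin-dependent kinase 2",
    "genomic_pos": {
        "start": 56360553,
        "chr": "12",
        "end": 56366568,
        "strand": 1
    }
}
"""

gene = json.loads(raw)


def describe(gene):
    """Return a one-line summary (hypothetical helper, not part of the service)."""
    summary = f"{gene['symbol']} (taxid {gene['taxid']}): {gene['name']}"
    # Not every gene object carries genomic coordinates, so look them up safely.
    pos = gene.get("genomic_pos")
    if pos:
        summary += f", chr{pos['chr']}:{pos['start']}-{pos['end']}"
    return summary


print(describe(gene))
```

Using `dict.get` for optional fields avoids a `KeyError` when a gene object lacks them.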

Species

We support all species annotated by NCBI and Ensembl. All of our services accept a “species” parameter that limits the query results. The “species” parameter takes taxonomy ids as input. You can look up the taxonomy id for your favorite species in NCBI Taxonomy.

For convenience, we allow you to pass these common names for commonly used species (e.g. “species=human,mouse,rat”):

Common name   Scientific name           Taxonomy id
human         Homo sapiens              9606
mouse         Mus musculus              10090
rat           Rattus norvegicus         10116
fruitfly      Drosophila melanogaster   7227
nematode      Caenorhabditis elegans    6239
zebrafish     Danio rerio               7955
thale-cress   Arabidopsis thaliana      3702
frog          Xenopus tropicalis        8364
pig           Sus scrofa                9823

If needed, you can pass “species=all” to query against all available species, although we recommend passing only the specific species you need for a faster response.
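As a rough sketch of how the “species” parameter might be assembled on the client side, the snippet below validates a list of common names or taxonomy ids and joins them into a comma-separated value. The base URL `https://example.org/query` and the `q` parameter are placeholders assumed for illustration, so substitute the real endpoint and query parameter of the service you are calling. The helper names are hypothetical as well.

```python
from urllib.parse import urlencode

# Common names and taxonomy ids taken from the table above.
COMMON_SPECIES = {
    "human": 9606,
    "mouse": 10090,
    "rat": 10116,
    "fruitfly": 7227,
    "nematode": 6239,
    "zebrafish": 7955,
    "thale-cress": 3702,
    "frog": 8364,
    "pig": 9823,
}

# Placeholder endpoint (an assumption, not the service's documented URL).
BASE_URL = "https://example.org/query"


def species_param(species):
    """Join common names, taxonomy ids, or 'all' into a 'species' value."""
    parts = []
    for item in species:
        text = str(item).strip().lower()
        if text == "all" or text in COMMON_SPECIES or text.isdigit():
            parts.append(text)
        else:
            raise ValueError(f"unknown species: {item!r}")
    return ",".join(parts)


def build_query_url(q, species):
    """Build a query URL; 'q' is an assumed name for the query parameter."""
    params = {"q": q, "species": species_param(species)}
    # safe="," keeps the comma-separated list readable in the URL.
    return BASE_URL + "?" + urlencode(params, safe=",")


print(build_query_url("cdk2", ["human", "mouse", "rat"]))
```

Rejecting unknown names locally gives a clearer error than sending a misspelled species to the server.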

Genome assemblies

Our gene query service supports genome interval queries. We import genomic location data from Ensembl, so all species available there are supported. You can find their reference genome assembly information here.

This table lists the genome assemblies for commonly used species:

Common name   Scientific name           Genome assembly
human         Homo sapiens              GRCh38 (hg38), also supports hg19
mouse         Mus musculus              GRCm38 (mm10), also supports mm9
rat           Rattus norvegicus         Rnor_5.0 (rn4)
fruitfly      Drosophila melanogaster   BDGP5 (dm3)
nematode      Caenorhabditis elegans    WBcel235 (ce10)
zebrafish     Danio rerio               Zv9 (danRer6)
frog          Xenopus tropicalis        JGI_4.2 (xenTro2)
pig           Sus scrofa                Sscrofa10.2 (susScr2)
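To illustrate what a genome interval looks like in practice, the sketch below formats an interval string and checks whether a gene object's “genomic_pos” overlaps it. The `chrN:start-end` string format is an assumption made for this example, since the exact query syntax is not spelled out in this section. Both helper names are hypothetical.

```python
def interval_query(chrom, start, end):
    """Format an interval as 'chrN:start-end' (assumed syntax, see lead-in)."""
    chrom = str(chrom)
    # Accept both "12" and "chr12" as input.
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    if start > end:
        raise ValueError("start must not exceed end")
    return f"chr{chrom}:{start}-{end}"


def overlaps(genomic_pos, start, end):
    """Return True if the gene's coordinates intersect [start, end]."""
    return genomic_pos["start"] <= end and start <= genomic_pos["end"]


# Coordinates of CDK2 from the gene object example.
cdk2_pos = {"start": 56360553, "chr": "12", "end": 56366568, "strand": 1}

print(interval_query(cdk2_pos["chr"], cdk2_pos["start"], cdk2_pos["end"]))
print(overlaps(cdk2_pos, 56360000, 56361000))
```

Coordinates only make sense relative to a specific assembly, so intervals taken from hg19 should not be compared with positions reported for GRCh38.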

Available fields

The table below lists all of the possible fields that could be in a gene object.

Field Indexed Type Notes