Welcome to MyGene.py’s documentation!¶

MyGene.Info provides simple-to-use REST web services to query/retrieve gene annotation data. It’s designed with simplicity and performance emphasized. mygene, is an easy-to-use Python wrapper to access MyGene.Info services.

Requirements¶

python >=2.6 (including python3)

requests (install using “pip install requests”)

Optional dependencies¶

pandas (install using “pip install pandas”) is required for returning a list of gene objects as DataFrame.

Installation¶

Option 1

pip install mygene

Option 2
download/extract the source code and run:
python setup.py install
Option 3
install the latest code directly from the repository:
pip install -e git+https://github.com/sulab/mygene.py#egg=mygene

Version history¶

CHANGES.txt

Tutorial¶

ID mapping using mygene module in Python

API¶

mygene.alwayslist(value)[source]¶

If input value if not a list/tuple type, return it as a single value list.

Example:

>>> x = 'abc'
>>> for xx in alwayslist(x):
...     print xx
>>> x = ['abc', 'def']
>>> for xx in alwayslist(x):
...     print xx

class mygene.MyGeneInfo(url='http://mygene.info/v3')[source]¶

This is the client for MyGene.info web services. Example:

>>> mg = MyGeneInfo()

metadata(verbose=True, **kwargs)[source]¶

Return a dictionary of MyGene.info metadata.

Example:

>>> metadata = mg.metadata

set_caching(cache_db='mygene_cache', verbose=True, **kwargs)[source]¶: Installs a local cache for all requests. cache_db is the path to the local sqlite cache database.

stop_caching()[source]¶: Stop caching.

clear_cache()[source]¶: Clear the globally installed cache.

get_fields(search_term=None, verbose=True)[source]¶

Return all available fields can be return from MyGene.info services.

This is a wrapper for http://mygene.info/metadata/fields

Parameters:	search_term – an optional string to search (case insensitive) for matching field names. If not provided, all available fields will be returned.

Example:

>>> mv.get_fields()
>>> mv.get_fields("uniprot")
>>> mv.get_fields("refseq")
>>> mv.get_fields("kegg")

Hint

This is useful to find out the field names you need to pass to fields parameter of other methods.

getgene(geneid, fields='symbol, name, taxid, entrezgene', **kwargs)[source]¶

Return the gene object for the give geneid. This is a wrapper for GET query of “/gene/<geneid>” service.

Parameters:	geneid – entrez/ensembl gene id, entrez gene id can be either a string or integer fields – fields to return, a list or a comma-separated string. If fields=”all”, all available fields are returned species – optionally, you can pass comma-separated species names or taxonomy ids email – optionally, pass your email to help us to track usage filter – alias for fields parameter
Returns:	a gene object as a dictionary, or None if geneid is not valid.
Ref:	http://mygene.info/doc/annotation_service.html for available fields, extra kwargs and more.

Example:

>>> mg.getgene(1017, email='abc@example.com')
>>> mg.getgene('1017', fields='symbol,name,entrezgene,refseq')
>>> mg.getgene('1017', fields='symbol,name,entrezgene,refseq.rna')
>>> mg.getgene('1017', fields=['symbol', 'name', 'pathway.kegg'])
>>> mg.getgene('ENSG00000123374', fields='all')

Hint

The supported field names passed to fields parameter can be found from any full gene object (when fields=”all”). Note that field name supports dot notation for nested data structure as well, e.g. you can pass “refseq.rna” or “pathway.kegg”.

getgenes(geneids, fields='symbol, name, taxid, entrezgene', **kwargs)[source]¶

Return the list of gene objects for the given list of geneids. This is a wrapper for POST query of “/gene” service.

Parameters:	geneids – a list/tuple/iterable or comma-separated entrez/ensembl gene ids fields – fields to return, a list or a comma-separated string. If fields=”all”, all available fields are returned species – optionally, you can pass comma-separated species names or taxonomy ids email – optionally, pass your email to help us to track usage filter – alias for fields as_dataframe – if True, return object as DataFrame (requires Pandas). df_index – if True (default), index returned DataFrame by ‘query’, otherwise, index by number. Only applicable if as_dataframe=True.
Returns:	a list of gene objects or a pandas DataFrame object (when as_dataframe is True)
Ref:	http://mygene.info/doc/annotation_service.html for available fields, extra kwargs and more.

Example:

>>> mg.getgenes([1017, '1018','ENSG00000148795'], email='abc@example.com')
>>> mg.getgenes([1017, '1018','ENSG00000148795'], fields="entrezgene,uniprot")
>>> mg.getgenes([1017, '1018','ENSG00000148795'], fields="all")
>>> mg.getgenes([1017, '1018','ENSG00000148795'], as_dataframe=True)

Hint

A large list of more than 1000 input ids will be sent to the backend web service in batches (1000 at a time), and then the results will be concatenated together. So, from the user-end, it’s exactly the same as passing a shorter list. You don’t need to worry about saturating our backend servers.

query(q, **kwargs)[source]¶

Return the query result. This is a wrapper for GET query of “/query?q=<query>” service.

Parameters:	q – a query string, detailed query syntax here fields – fields to return, a list or a comma-separated string. If fields=”all”, all available fields are returned species – optionally, you can pass comma-separated species names or taxonomy ids. Default: human,mouse,rat. size – the maximum number of results to return (with a cap of 1000 at the moment). Default: 10. skip – the number of results to skip. Default: 0. sort – Prefix with “-” for descending order, otherwise in ascending order. Default: sort by matching scores in decending order. entrezonly – if True, return only matching entrez genes, otherwise, including matching Ensemble-only genes (those have no matching entrez genes). email – optionally, pass your email to help us to track usage as_dataframe – if True, return object as DataFrame (requires Pandas). df_index – if True (default), index returned DataFrame by ‘query’, otherwise, index by number. Only applicable if as_dataframe=True. fetch_all – if True, return a generator to all query results (unsorted). This can provide a very fast return of all hits from a large query. Server requests are done in blocks of 1000 and yielded individually. Each 1000 block of results must be yielded within 1 minute, otherwise the request will expire on the server side.
Returns:	a dictionary with returned gene hits or a pandas DataFrame object (when as_dataframe is True)
Ref:	http://mygene.info/doc/query_service.html for available fields, extra kwargs and more.

Example:

>>> mg.query('cdk2')
>>> mg.query('reporter:1000_at')
>>> mg.query('symbol:cdk2', species='human')
>>> mg.query('symbol:cdk*', species=10090, size=5, as_dataframe=True)
>>> mg.query('q=chrX:151073054-151383976', species=9606)

querymany(qterms, scopes=None, **kwargs)[source]¶

Return the batch query result. This is a wrapper for POST query of “/query” service.

Parameters:	qterms – a list/tuple/iterable of query terms, or a string of comma-separated query terms. scopes – type of types of identifiers, either a list or a comma-separated fields to specify type of input qterms, e.g. “entrezgene”, “entrezgene,symbol”, [“ensemblgene”, “symbol”]. Refer to official MyGene.info docs for full list of fields. fields – fields to return, a list or a comma-separated string. If fields=”all”, all available fields are returned species – optionally, you can pass comma-separated species names or taxonomy ids. Default: human,mouse,rat. entrezonly – if True, return only matching entrez genes, otherwise, including matching Ensemble-only genes (those have no matching entrez genes). returnall – if True, return a dict of all related data, including dup. and missing qterms verbose – if True (default), print out infomation about dup and missing qterms email – optionally, pass your email to help us to track usage as_dataframe – if True, return object as DataFrame (requires Pandas). df_index – if True (default), index returned DataFrame by ‘query’, otherwise, index by number. Only applicable if as_dataframe=True.
Returns:	a list of gene objects or a pandas DataFrame object (when as_dataframe is True)
Ref:	http://mygene.info/doc/query_service.html for available fields, extra kwargs and more.

Example:

>>> mg.querymany(['DDX26B', 'CCDC83'], scopes='symbol', species=9606)
>>> mg.querymany(['1255_g_at', '1294_at', '1316_at', '1320_at'], scopes='reporter')
>>> mg.querymany(['NM_003466', 'CDK2', 695, '1320_at', 'Q08345'],
...              scopes='refseq,symbol,entrezgene,reporter,uniprot', species='human')
>>> mg.querymany(['1255_g_at', '1294_at', '1316_at', '1320_at'], scopes='reporter',
...              fields='ensembl.gene,symbol', as_dataframe=True)

Hint

querymany() is perfect for doing id mappings.

Hint

Just like getgenes(), passing a large list of ids (>1000) to querymany() is perfectly fine.

findgenes(id_li, **kwargs)[source]¶: Deprecated since version 2.0.0.

Use querymany() instead. It’s kept here as an alias of querymany() method.

Welcome to MyGene.py’s documentation!¶

Requirements¶

Optional dependencies¶

Installation¶

Version history¶

Tutorial¶

API¶

Indices and tables¶