GeneCodis2 Web Service

1. Description

The Web Service provides an API to run the same analyses as the web site at http://genecodis.dacya.ucm.es, but programmatically. The API is asynchronous: an analysis job is submitted for execution, its status is polled until the job finishes and then, if no error occurred, the results are retrieved.

All the information sent to and returned by the web service is formatted as numbers or text strings. The results are represented in an XML structure (see the results section for a description).

2. Connecting to the server

The easiest way to connect to the server is through the WSDL file: 'genecodis.wsdl'. Most SOAP toolkits can generate from it a driver that supports all the methods of the API.

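For example, in Ruby:

require 'soap/wsdlDriver'   # provided by the soap4r library
# WSDL of the cluster-based service (the alternatives are listed below)
wsdl_url = "http://genecodis.dacya.ucm.es/static/wsdl/genecodisGW.wsdl"
# Build an RPC driver exposing the API methods (analyze, status, results, ...)
driver = SOAP::WSDLDriverFactory.new(wsdl_url).create_rpc_driver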

There are two WSDL files available. The one that executes the analysis on one of our clusters is available at: http://genecodis.dacya.ucm.es/static/wsdl/genecodisGW.wsdl. The other executes the analysis in a multi-grid environment that integrates resources from the CyTED and EELA2 projects, and is available at: http://genecodis.dacya.ucm.es/static/wsdl/genecodisGW.wsdl.


Example clients are available on the web site in Ruby, Perl and PHP. You will need to provide a file of genes; an example file for S. cerevisiae is also available on the web site.
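A minimal sketch for loading such a gene file into the Array of Strings expected by the genelist parameter. It assumes one gene identifier per line (the file layout is an assumption), skips blank lines and drops duplicates; the method name read_gene_list is hypothetical.

```ruby
# Hypothetical helper: loads a gene list file into an Array of Strings
# suitable for the genelist (or reflist) parameter of analyze.
# ASSUMPTION: the file holds one gene identifier per line.
def read_gene_list(path)
  File.readlines(path, chomp: true)
      .map(&:strip)     # drop surrounding whitespace
      .reject(&:empty?) # skip blank lines
      .uniq             # submit each gene only once
end
```

The resulting array can be passed directly as the genelist argument of analyze.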

3. Web Service API

The API provides six methods:

  • analyze: Launches a new analysis job
  • status: Reports the status of the job: running, done, or aborted with an error.
  • info: Gathers the log from the job. This can be helpful to determine why an error occurred.
  • results: Returns the results of the analysis in tabulated text format.
  • resultsxml: Returns the XML structure holding the results of the analysis.
  • genesNotFound: Returns an XML document with the genes of the input list that do not appear in our databases. This can help detect submitted genes with unsupported identifiers, or an incorrectly selected organism.

The analyze method accepts a number of parameters, described in the following section, and returns a text job identifier. This identifier is the only parameter of all the other methods.
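The status, results and info calls combine into the asynchronous workflow described in section 1. A minimal sketch, assuming driver is the SOAP driver generated from the WSDL and that status returns a string containing "done" when the job has finished, or "error"/"abort" when it failed. Those words are guesses, so the matchers are parameters; the method name wait_for_results is hypothetical.

```ruby
# Hypothetical helper: polls a submitted GeneCodis2 job until it finishes.
# `driver` is any object responding to status, results and info with the job
# identifier (for example the SOAP driver generated from the WSDL).
# ASSUMPTION: status returns a String; the `done` and `failed` patterns below
# are guesses at its wording and should be checked against the real service.
def wait_for_results(driver, job_id, interval: 10, max_polls: 360,
                     done: /done/i, failed: /error|abort/i)
  max_polls.times do
    state = driver.status(job_id).to_s
    # Check for failure first, in case an error message also mentions "done".
    raise "Job #{job_id} failed: #{driver.info(job_id)}" if state.match?(failed)
    return driver.results(job_id) if state.match?(done)
    sleep interval # still running: wait before polling again
  end
  raise "Job #{job_id} did not finish after #{max_polls} polls"
end
```

Typical use would be to call driver.analyze with the eight parameters of section 4 and pass the returned identifier to wait_for_results. The interval (in seconds) is an arbitrary choice meant to avoid flooding the server with status requests.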

4. Parameters for analyze

Analyze takes eight parameters:

  • org => String: The organism on which to run the analysis. The possible values are:
    • At: Arabidopsis thaliana
    • Bt: Bos taurus
    • Ca: Candida albicans
    • Ce: Caenorhabditis elegans
    • Dm: Drosophila melanogaster
    • Dr: Danio rerio
    • Ec: Escherichia coli
    • Gg: Gallus gallus
    • Hs: Homo sapiens
    • Lm: Leishmania major
    • Mm: Mus musculus
    • Rn: Rattus norvegicus
    • Sc: Saccharomyces cerevisiae
    • Sp: Schizosaccharomyces pombe
    • Tb: Trypanosoma brucei
    • Vc: Vibrio cholerae
  • algorithm => Number: The type of analysis to perform
    • 1: co-annotations analysis
    • 2: singular annotations analysis
  • test => Number: Type of statistical test to use
    • 0: Hypergeometric
    • 1: Chi square
    • 2: Both
  • correction => Number: Type of multiple hypothesis correction
    • 0: None
    • n < 0: False Discovery Rate (FDR) method
    • n > 0: Permutation method with n iterations (n = 1000 recommended)
  • minsupport => Number: Minimum number of genes required for an annotation to be reported.
  • genelist => Array of Strings: List of genes to consider for the analysis. Genes must use one of the identifier formats supported for the organism; see the web site for more information about the allowed identifiers.
  • annotations => Array of Strings: Biological annotations to include in the analysis. Each element is a string, and the possible values are:
    • GO_Biological_Process: Lowest level of Gene Ontology biological process annotations
    • GO_Molecular_Function: Lowest level of Gene Ontology molecular function annotations
    • GO_Cellular_Component: Lowest level of Gene Ontology cellular component annotations
    • GO_Biological_Processk: Gene Ontology biological process annotations in level k of hierarchy, k must be in [3..7]
    • GO_Molecular_Functionk: Gene Ontology molecular function annotations in level k of hierarchy, k must be in [3..7]
    • GO_Cellular_Componentk: Gene Ontology cellular component annotations in level k of hierarchy, k must be in [3..7]
    • GOSlim_Process: GOSlim categories from biological process annotations
    • GOSlim_Function: GOSlim categories from molecular function annotations
    • GOSlim_Component: GOSlim categories from cellular components
    • KEGG_Pahtways: KEGG pathway annotations
    • InterPro_Motifs: InterPro motif annotations
    • MicroRNA: microRNAs from miRBase
    • Transcription_Factors: Transcription factors from TRANSFAC
  • reflist => Array of Strings: List of genes to use as reference (the whole genome by default).
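Because invalid values are only reported after a job has been submitted, it can help to check the arguments on the client first. A minimal sketch using the values listed above; it assumes the WSDL expects the parameters in the order they are listed, and that an empty reflist selects the default whole-genome reference. Both are assumptions, and the method name analyze_params is hypothetical.

```ruby
# Hypothetical client-side check of the analyze parameters, based on the
# values listed in section 4. ASSUMPTION: the WSDL expects the parameters in
# the order listed there (org, algorithm, test, correction, minsupport,
# genelist, annotations, reflist); confirm this before relying on it.
ORGANISMS = %w[At Bt Ca Ce Dm Dr Ec Gg Hs Lm Mm Rn Sc Sp Tb Vc].freeze
# GO_<ontology> (lowest level) or GO_<ontology>k with k in 3..7.
GO_PATTERN = /\AGO_(Biological_Process|Molecular_Function|Cellular_Component)[3-7]?\z/

# ASSUMPTION: an empty reflist selects the whole-genome reference.
def analyze_params(org:, algorithm:, test:, correction:, minsupport:,
                   genelist:, annotations:, reflist: [])
  raise ArgumentError, "unknown organism: #{org}" unless ORGANISMS.include?(org)
  raise ArgumentError, "algorithm must be 1 or 2" unless [1, 2].include?(algorithm)
  raise ArgumentError, "test must be 0, 1 or 2" unless [0, 1, 2].include?(test)
  # correction: 0 = none, negative = FDR, positive n = n permutations.
  raise ArgumentError, "correction must be an Integer" unless correction.is_a?(Integer)
  # ASSUMPTION: minsupport must be at least 1 (it is a number of genes).
  unless minsupport.is_a?(Integer) && minsupport >= 1
    raise ArgumentError, "minsupport must be a positive Integer"
  end
  raise ArgumentError, "genelist is empty" if genelist.empty?
  # Only GO_* names are checked strictly; other annotation names pass through.
  bad_go = annotations.select { |a| a.start_with?("GO_") && !a.match?(GO_PATTERN) }
  raise ArgumentError, "invalid GO annotations: #{bad_go.join(', ')}" unless bad_go.empty?
  [org, algorithm, test, correction, minsupport, genelist, annotations, reflist]
end
```

The returned array could then be splatted into the SOAP call, e.g. driver.analyze(*analyze_params(org: "Sc", algorithm: 2, test: 0, correction: -1, minsupport: 3, genelist: genes, annotations: ["GO_Biological_Process"])), where genes is your input list.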

5. Results: XML description

An example of results in XML format is available on the web site; it contains many results. Each result consists of a single annotation, or a set of annotations, that the GeneCodis2 algorithm found associated with some genes of the input list. Each result contains the following fields:

  • Items: One singular annotation or a list of annotations tags found
  • S: Number of genes in the input list with the annotation.
  • TS: Number of genes in the reference list (whole genome default) with the annotation.
  • Hyp: P-value for the hypergeometric test, if selected.
  • Hyp_c: P-value for the hypergeometric test after multiple hypothesis correction, if both the test and a correction were selected.
  • Chi: P-value for the Chi square test, if selected.
  • Chi_c: P-value for the Chi square test after multiple hypothesis correction, if both the test and a correction were selected.
  • Genes: The genes of the input list associated with the annotations of this result.
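The exact XML schema is not spelled out in this section, so the sketch below assumes that each result is a <result> element (a hypothetical name) whose child elements are named after the fields above. It converts S and TS to integers and the p-values to floats, leaving Items and Genes as raw text; adjust the XPath and field handling to the real example file. It uses REXML, which ships with standard Ruby installations, and the method name parse_results is hypothetical.

```ruby
require 'rexml/document'

# Hypothetical parser for the resultsxml output.
# ASSUMPTION: each result is a <result> element whose children are named
# after the documented fields (Items, S, TS, Hyp, Hyp_c, Chi, Chi_c, Genes).
COUNT_FIELDS = %w[S TS].freeze
PVALUE_FIELDS = %w[Hyp Hyp_c Chi Chi_c].freeze

def parse_results(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//result").map do |node|
    row = {}
    node.elements.each do |field|
      text = field.text.to_s.strip
      row[field.name] =
        if COUNT_FIELDS.include?(field.name) then text.to_i   # gene counts
        elsif PVALUE_FIELDS.include?(field.name) then text.to_f # p-values
        else text                                              # Items, Genes
        end
    end
    row
  end
end
```

With the rows as hashes, ranking is a one-liner, e.g. sorting by row["Hyp"] when the hypergeometric test was selected.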