Programs for Songjian Lu, Bo Jin, and Xinghua Lu’s ISMB paper

1. GO term summarization: Given a list of genes (downstream), use a set of GO terms from the biological process domain to summarize them. (Put all four files into the same folder and run userCode_1.py.)

a. userCode_1.py – Sample code showing how to use the class GoGraph.

b. GotermSummarization.py – The class that uses GO terms to summarize a list of genes.

c. gene_association.sgd – The gene-to-GO-term association file from http://www.geneontology.org.

d. newWeightedPubMedGO.xml – Weighted GO structure file, obtained by assigning semantic distances^*1 as weights to all edges of the GO structure from http://www.geneontology.org.
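The association file is the input that links genes to GO terms. The sketch below shows one way to read it and keep only biological process annotations, assuming gene_association.sgd follows the standard tab-separated GO Annotation File (GAF) layout (column 2 = database object ID, column 4 = qualifier, column 5 = GO ID, column 9 = aspect, where P marks biological process). The function load_bp_annotations and the sample rows are hypothetical and are not part of GotermSummarization.py, which may key genes differently (for example by the gene symbol in column 3).

```python
from collections import defaultdict


def load_bp_annotations(lines):
    """Map gene IDs to biological-process GO terms from GAF-formatted lines.

    Hypothetical helper; assumes the standard GAF column layout.
    """
    gene_to_terms = defaultdict(set)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # skip malformed rows
        gene_id, qualifier, go_id, aspect = cols[1], cols[3], cols[4], cols[8]
        if aspect != "P":
            continue  # keep the biological process domain only
        if "NOT" in qualifier.split("|"):
            continue  # drop negative annotations
        gene_to_terms[gene_id].add(go_id)
    return dict(gene_to_terms)


# Hypothetical sample rows in GAF layout (only the first 9 columns shown).
sample_rows = [
    "!gaf-version: 2.0",
    "\t".join(["SGD", "S000000001", "GENE1", "", "GO:0006811", "REF", "IEA", "", "P"]),
    "\t".join(["SGD", "S000000001", "GENE1", "", "GO:0005739", "REF", "IEA", "", "C"]),
    "\t".join(["SGD", "S000000002", "GENE2", "NOT", "GO:0006811", "REF", "IEA", "", "P"]),
]
annotations = load_bp_annotations(sample_rows)
print(annotations)  # {'S000000001': {'GO:0006811'}}
```

With the real file, an open file handle can be passed instead of the list, e.g. `with open("gene_association.sgd") as fh: annotations = load_bp_annotations(fh)`.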

2. Find a highly dense subgraph: Given a bipartite graph, find a high-density subgraph with the maximum score.

a. userCode_2.py – Sample code showing how to use the class DenseSubGraph.

b. DenseSubGraph.py – The class that finds a highly dense subgraph.
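The README does not spell out the score that DenseSubGraph.py maximizes. As a stand-in, the sketch below uses a different, well-known technique: Charikar's greedy peeling, a 1/2-approximation for the average-degree density |E(S)| / |S|. It repeatedly removes a minimum-degree node and remembers the densest intermediate node set. Node labels on the two sides of the bipartite graph must be distinct; the function name and example graph are hypothetical.

```python
from collections import defaultdict


def densest_subgraph_peeling(edges):
    """Greedy peeling for the densest subgraph under density |E(S)| / |S|.

    Hypothetical stand-in; DenseSubGraph.py may use a different score.
    Returns (best_node_set, best_density).
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = set(adj)
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_nodes = set(nodes)
    best_density = num_edges / len(nodes) if nodes else 0.0
    while nodes:
        # Remove a node of minimum degree in the current subgraph.
        u = min(nodes, key=lambda n: len(adj[n]))
        num_edges -= len(adj[u])
        for v in adj[u]:
            adj[v].discard(u)
        del adj[u]
        nodes.remove(u)
        # Keep the densest node set seen so far.
        if nodes and num_edges / len(nodes) > best_density:
            best_density = num_edges / len(nodes)
            best_nodes = set(nodes)
    return best_nodes, best_density


# Hypothetical bipartite graph: a complete K(3,3) block plus one separate edge.
left, right = ["a1", "a2", "a3"], ["b1", "b2", "b3"]
edges = [(a, b) for a in left for b in right] + [("a4", "b4")]
nodes, density = densest_subgraph_peeling(edges)
print(sorted(nodes), density)  # ['a1', 'a2', 'a3', 'b1', 'b2', 'b3'] 1.5
```

Peeling is chosen here because it is simple and runs on any bipartite edge list; if the actual score weights edges or the two node sides differently, the removal rule would need to change accordingly.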

3. Find a TF module for a set of co-expressed genes: Given a list of genes, find the set of TFs with the best score for regulating those genes. (Put all three files into the same folder and run userCode_3.py.)

a. userCode_3.py – Sample code showing how to use the class TF_Module.

b. TF_Module.py – The class that finds a TF module regulating a given list of co-regulated genes. This code uses a greedy algorithm to solve the t-cover hitting set problem. If you need the C++ implementation of the exact algorithm, please contact the authors.

c. protein-dna_interactions_Gid.txt – The TF-gene binding event data, obtained from published studies^*2*3.
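TF_Module.py is described as solving the t-cover hitting set problem greedily. Read literally, the goal is a set of TFs such that every gene in the list is bound by at least t of the chosen TFs. The sketch below is a generic greedy heuristic for that covering condition, not the authors' code: at each step it picks the TF that binds the most genes still short of t chosen TFs. It ignores whatever scoring TF_Module.py applies, and it takes binding data as an in-memory dictionary because the layout of protein-dna_interactions_Gid.txt is not described here. The function name, TF names, and genes are hypothetical.

```python
def greedy_t_cover(tf_targets, genes, t):
    """Pick TFs greedily until every gene is bound by at least t chosen TFs.

    tf_targets: dict mapping TF name -> set of genes it binds (hypothetical input).
    Raises ValueError if the binding data cannot satisfy the t-cover.
    """
    need = {g: t for g in genes}  # how many more chosen TFs each gene needs
    remaining = {tf: set(targets) for tf, targets in tf_targets.items()}
    chosen = []
    while any(n > 0 for n in need.values()):
        best_tf, best_gain = None, 0
        for tf in sorted(remaining):  # sorted for deterministic tie-breaking
            # Gain = number of bound genes that still need coverage.
            gain = sum(1 for g in remaining[tf] if need.get(g, 0) > 0)
            if gain > best_gain:
                best_tf, best_gain = tf, gain
        if best_tf is None:
            raise ValueError("some genes are bound by fewer than t TFs")
        chosen.append(best_tf)
        for g in remaining.pop(best_tf):  # each TF can be chosen only once
            if need.get(g, 0) > 0:
                need[g] -= 1
    return chosen


# Hypothetical binding data.
tf_targets = {
    "TF_A": {"g1", "g2", "g3", "g4"},
    "TF_B": {"g1", "g2"},
    "TF_C": {"g3", "g4"},
    "TF_D": {"g1", "g3"},
}
print(greedy_t_cover(tf_targets, ["g1", "g2", "g3", "g4"], t=2))
# ['TF_A', 'TF_B', 'TF_C']
```

A greedy choice like this carries the usual logarithmic approximation guarantee for covering problems but is not guaranteed to find the smallest module, which is why an exact solver is a separate piece of work.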

4. Find enriched GO terms: Given a list of genes (upstream), determine which GO terms are enriched among the genes in the list. (Put all four files into the same folder and run userCode_4.py.)

a. userCode_4.py – Sample code showing how to use the class GotermEnriched.

b. GotermEnriched.py – The class that finds GO terms enriched in a list of genes.

c. gene_association.sgd – The gene-to-GO-term association file from http://www.geneontology.org.

d. newWeightedPubMedGO.xml – Weighted GO structure file, obtained by assigning semantic distances^*1 as weights to all edges of the GO structure from http://www.geneontology.org.
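The README does not say which statistic GotermEnriched.py uses to decide enrichment. A common choice, shown here only as an assumed illustration, is a one-sided hypergeometric test per GO term followed by a Bonferroni correction. The sketch treats all annotated genes as the population, uses direct annotations only (it does not propagate annotations up the weighted ontology in newWeightedPubMedGO.xml), and needs Python 3.8+ for math.comb. Function names, the placeholder GO IDs, and the gene list are hypothetical.

```python
from collections import defaultdict
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


def enriched_terms(gene_list, gene_to_terms, alpha=0.05):
    """Return (term, overlap, Bonferroni-adjusted p) for terms enriched in gene_list."""
    population = set(gene_to_terms)
    selected = set(gene_list) & population
    term_to_genes = defaultdict(set)
    for gene, terms in gene_to_terms.items():
        for term in terms:
            term_to_genes[term].add(gene)
    # m = number of annotated terms, used as the Bonferroni multiplier.
    N, n, m = len(population), len(selected), len(term_to_genes)
    results = []
    for term, annotated in term_to_genes.items():
        k = len(annotated & selected)
        if k == 0:
            continue  # no overlap, cannot be enriched
        p_adj = min(1.0, hypergeom_upper_tail(k, n, len(annotated), N) * m)
        if p_adj <= alpha:
            results.append((term, k, p_adj))
    return sorted(results, key=lambda r: r[2])


# Hypothetical annotations: GO:X on g1..g4, GO:Y on g5..g20.
gene_to_terms = {f"g{i}": ({"GO:X"} if i <= 4 else {"GO:Y"}) for i in range(1, 21)}
print(enriched_terms(["g1", "g2", "g3"], gene_to_terms))
```

Here GO:X covers 4 of 20 genes but 3 of the 3 selected genes, so its tail probability is 4/1140 before correction; Bonferroni is used for simplicity, and a false discovery rate procedure would be a less conservative alternative.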

Note: The code requires the NetworkX package, version 1.3 or later.

Footnotes:

*1--B. Jin and X. Lu, Identifying informative subsets of the Gene Ontology with information bottleneck methods, Bioinformatics 26(19), pp. 2445-2451, 2010.

*2--S. Huang and E. Fraenkel, Integrating proteomic, transcriptional, and interactome data reveals hidden components of signaling and regulatory networks, Science Signaling 2(81), ra40, 2009.

*3--E. Yeger-Lotem, L. Riva, L. Su, A. Gitler, A. Cashikar, O. King, P. Auluck, M. Geddie, J. Valastyan, D. Karger, S. Lindquist, and E. Fraenkel, Bridging high-throughput genetic and transcriptional data reveals cellular responses to alpha-synuclein toxicity, Nature Genetics 41(3), pp. 316-323, 2009.