Programs for Songjian Lu, Bo Jin, and Xinghua Lu’s ISMB paper
1. GO term summarization: Given a list of (downstream) genes, use a number of GO terms
(biological process domain) to summarize them. (Put all four files into the same folder and run userCode_1.py.)
a. userCode_1.py
– Sample code showing how to use the class GoGraph.
b. GotermSummarization.py
– The class that uses GO terms to summarize a list of genes.
c. gene_association.sgd
– The gene to GO term association file from http://www.geneontology.org.
d. newWeightedPubMedGO.xml
– Weighted Gene Ontology structure file. The file is obtained by adding semantic distances*1
to all edges of the Gene Ontology structure from http://www.geneontology.org.
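For orientation, the sketch below shows one way to load a gene to GO term mapping from an annotation file in the tab-separated GAF format used by gene_association.sgd, keeping only biological process annotations. It is not the code in GotermSummarization.py. The function name read_gene_to_go and the sample records are made up for illustration, and the column positions assume the standard GAF layout (column 3 = gene symbol, column 4 = qualifier, column 5 = GO ID, column 9 = aspect).

```python
from collections import defaultdict


def read_gene_to_go(lines, aspect="P"):
    """Map gene symbol -> set of GO IDs for one aspect (P = biological process).

    `lines` is any iterable of GAF-formatted text lines, e.g. an open file.
    This helper is hypothetical and not part of the released classes.
    """
    gene_to_terms = defaultdict(set)
    for line in lines:
        # Skip blank lines and GAF comment/header lines, which start with "!".
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # malformed record
        symbol, qualifier, go_id, col_aspect = cols[2], cols[3], cols[4], cols[8]
        if col_aspect != aspect:
            continue
        # A "NOT" qualifier means the gene is explicitly NOT annotated to the term.
        if "NOT" in qualifier.split("|"):
            continue
        gene_to_terms[symbol].add(go_id)
    return dict(gene_to_terms)


# Made-up records (placeholder gene names and GO IDs), for illustration only.
sample = [
    "!gaf-version: 2.0\n",
    "\t".join(["SGD", "S0000001", "GENE_A", "", "GO:9000001", "REF:1", "IDA", "", "P"]) + "\n",
    "\t".join(["SGD", "S0000001", "GENE_A", "", "GO:9000002", "REF:1", "IDA", "", "F"]) + "\n",
    "\t".join(["SGD", "S0000002", "GENE_B", "NOT", "GO:9000001", "REF:2", "IMP", "", "P"]) + "\n",
    "\t".join(["SGD", "S0000002", "GENE_B", "", "GO:9000003", "REF:2", "IMP", "", "P"]) + "\n",
]
gene_to_go = read_gene_to_go(sample)
```

With a real file, the same helper could be called on an open file handle, for example read_gene_to_go(open("gene_association.sgd")).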
2. Find a highly dense subgraph: Given a bipartite graph, find the high-density subgraph with
the maximum score.
a. userCode_2.py
– Sample code showing how to use the class DenseSubGraph.
b. DenseSubGraph.py
– The class that finds a highly dense subgraph.
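This README does not define the score that DenseSubGraph maximizes. As a rough illustration of the general idea only, the sketch below uses a standard greedy peeling heuristic and assumes density means number of edges divided by number of nodes. The function name densest_subgraph_peeling and the toy gene/TF graph are hypothetical, and the approach may differ from what DenseSubGraph.py actually does.

```python
def densest_subgraph_peeling(edges):
    """Greedy peeling heuristic for a dense subgraph.

    Repeatedly removes a minimum-degree node and remembers the node set
    with the highest density (edges / nodes) seen along the way.
    Assumes an undirected graph without self-loops (true for bipartite
    graphs) and node names that can be sorted, e.g. strings.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2

    best_nodes, best_density = set(), 0.0
    while adj:
        density = n_edges / len(adj)
        if density > best_density:
            best_nodes, best_density = set(adj), density
        # Sorting first makes tie-breaking deterministic.
        victim = min(sorted(adj), key=lambda node: len(adj[node]))
        for nbr in adj.pop(victim):
            adj[nbr].discard(victim)
            n_edges -= 1
    return best_nodes, best_density


# Toy bipartite graph: three genes fully linked to two TFs, plus one stray edge.
toy_edges = [(g, tf) for g in ("g1", "g2", "g3") for tf in ("tf1", "tf2")]
toy_edges.append(("g4", "tf3"))
dense_nodes, dense_score = densest_subgraph_peeling(toy_edges)
```

Peeling is only an approximation; a different score (for example, one that weights the two sides of the bipartite graph differently) would need a different objective inside the loop.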
3. Find a TF module for a set of co-expressed genes: Given a list of genes, find the set of
TFs with the best score for regulating the list of genes. (Put all three files into the same folder and run userCode_3.py.)
a. userCode_3.py
– Sample code showing how to use the class TF_Module.
b. TF_Module.py
– The class that finds a TF module that regulates a given list of co-regulated
genes. This code uses a greedy algorithm to solve the t-cover hitting set problem. If you need the implementation of
the exact algorithm in C++, please contact the authors.
c. protein-dna_interactions_Gid.txt
– The TF-gene binding data, obtained from the papers cited in footnotes *2 and *3.
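The exact definition of the t-cover hitting set problem is given in the paper, not in this README. The sketch below is therefore only one plausible reading: pick TFs greedily until every gene in the list is bound by at least t selected TFs. The function name greedy_t_cover, the parameter t, and the toy binding data are assumptions for illustration and are not taken from TF_Module.py.

```python
def greedy_t_cover(tf_targets, genes, t=1):
    """Greedily choose TFs so that each gene is bound by at least t chosen TFs.

    tf_targets: dict mapping a TF name to the set of genes it binds.
    Returns (chosen TFs in pick order, sorted genes that could not be covered).
    """
    need = {gene: t for gene in genes}   # remaining coverage required per gene
    candidates = dict(tf_targets)        # TFs not yet chosen
    chosen = []

    def gain(tf):
        # Number of still-under-covered genes this TF would help with.
        return sum(1 for gene in candidates[tf] if need.get(gene, 0) > 0)

    while any(count > 0 for count in need.values()):
        # sorted() gives alphabetical tie-breaking, so results are deterministic.
        best = max(sorted(candidates), key=gain, default=None)
        if best is None or gain(best) == 0:
            break  # no remaining TF improves coverage
        chosen.append(best)
        for gene in candidates.pop(best):
            if need.get(gene, 0) > 0:
                need[gene] -= 1

    uncovered = sorted(gene for gene, count in need.items() if count > 0)
    return chosen, uncovered


# Toy binding data (made up for illustration).
toy_bindings = {
    "TF_A": {"g1", "g2", "g3"},
    "TF_B": {"g3", "g4"},
    "TF_C": {"g4"},
}
toy_genes = ["g1", "g2", "g3", "g4"]
module_t1 = greedy_t_cover(toy_bindings, toy_genes, t=1)
module_t2 = greedy_t_cover(toy_bindings, toy_genes, t=2)
```

A greedy choice like this is fast but not guaranteed to be optimal, which is consistent with the note above that an exact algorithm exists separately.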
4. Find enriched GO terms: Given a list of (upstream) genes, decide which GO terms are enriched
among the genes in the list. (Put all four files into the same folder and run userCode_4.py.)
a. userCode_4.py
– Sample code showing how to use the class GotermEnriched.
b. GotermEnriched.py
– The class that finds the GO terms enriched in a list of genes.
c. gene_association.sgd
– The gene to GO term association file from http://www.geneontology.org.
d. newWeightedPubMedGO.xml
– Weighted Gene Ontology structure file. The file is obtained by adding semantic distances*1
to all edges of the Gene Ontology structure from http://www.geneontology.org.
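This README does not say which statistic GotermEnriched uses. A common choice for this kind of question is a one-sided hypergeometric test, and the sketch below illustrates that approach under that assumption. The function names hypergeom_pvalue and enriched_terms, the alpha threshold, and the toy data are hypothetical. The sketch ignores the weighted ontology structure and multiple-testing correction, and needs Python 3.8 or later for math.comb.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


def enriched_terms(study, population, term_to_genes, alpha=0.05):
    """Return (term, p-value) pairs with p <= alpha, smallest p first."""
    population = set(population)
    study = set(study) & population
    results = []
    for term, genes in term_to_genes.items():
        annotated = set(genes) & population
        k = len(annotated & study)
        if k == 0:
            continue  # term not present in the study list
        p = hypergeom_pvalue(k, len(study), len(annotated), len(population))
        if p <= alpha:
            results.append((term, p))
    return sorted(results, key=lambda pair: pair[1])


# Toy data (made up): 20 background genes, 4 genes in the list of interest.
toy_population = ["g%d" % i for i in range(20)]
toy_study = ["g0", "g1", "g2", "g3"]
toy_terms = {
    "term_A": ["g0", "g1", "g2", "g3"],                   # exactly the study list
    "term_B": ["g0"] + ["g%d" % i for i in range(10, 19)],  # broad term, one hit
}
toy_enriched = enriched_terms(toy_study, toy_population, toy_terms)
```

In this toy example only term_A passes the threshold, because all four of its genes fall inside the list, while the broad term_B overlaps the list no more than chance would suggest.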
Note: The code requires the NetworkX package, version 1.3 or later.
Footnotes:
*1--Bo Jin, Xinghua Lu: Identifying informative subsets of the Gene Ontology with information bottleneck methods. Bioinformatics 26(19): 2445-2451 (2010).
*2--S. Huang and E. Fraenkel, Integrating Proteomic, Transcriptional, and Interactome Data Reveals Hidden Components of Signaling and Regulatory Networks, Science Signaling 2(81), ra40, 2009.
*3--E. Yeger-Lotem, L. Riva, L. Su, A. Gitler, A. Cashikar, O. King, P. Auluck, M. Geddie, J. Valastyan, D. Karger, S. Lindquist, and E. Fraenkel, Bridging high-throughput genetic and transcriptional data reveals cellular responses to alpha-synuclein toxicity, Nature Genetics 41(3), pp. 316-323, 2009.