Thursday 28 October 2010

Redirecting Eclipse Output to a file

Whilst I think this tip is quite fundamental, it's worth noting. Being able to direct output from the Eclipse console to a file is useful, particularly when a program you have little or no control over generates a lot of warnings and the output is large enough that its start scrolls off the console.

To redirect to a file:
1. Go to the Run menu
2. Select Run -> Run Configurations
3. Go to the Common tab
4. Under Standard Input and Output, tick File and specify a file name
5. The console output will then be written to that file.
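
For a Python script run under PyDev, a similar effect can be had from inside the program itself. The following is only a minimal sketch and not part of Eclipse: the names Tee and run_with_log are made up for illustration, and it captures only what is written through sys.stdout (not output from child processes or C extensions).

```python
import sys


class Tee(object):
    """Write everything to several streams at once, e.g. console and a file."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, text):
        for stream in self.streams:
            stream.write(text)

    def flush(self):
        for stream in self.streams:
            stream.flush()


def run_with_log(func, log_path):
    """Call func() while copying everything it prints into log_path."""
    original = sys.stdout
    with open(log_path, "w") as log_file:
        sys.stdout = Tee(original, log_file)
        try:
            return func()
        finally:
            # Always restore the real console stream, even if func() fails
            sys.stdout = original
```

Calling run_with_log(main, "console.log"), where main is your own entry function, keeps the output visible in the console while also saving the whole of it to the file.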

Sunday 17 October 2010

The Future Is Bright, The Future is Orange


A recent post by Richard L. Apodaca on the use of KNIME workflows in Eclipse for cheminformatics prompted me to look at another piece of software, Orange. Orange has been around for some time, and is an open-source data visualization and mining toolkit written in the Python programming language. The GUI is built on Qt.

I recently downloaded the Mac OS X bundle, and was pleasantly surprised by the ease with which workflows could be created (see attached image). The Orange GUI is easy to use: it lets you read in files of different formats, process or filter attributes, cleanly visualize the data and data distributions, classify data, and show confusion matrices, ROC curves, etc.

Since I am a big fan of Eclipse, I wanted to access the scripting side of the Orange library through Eclipse. Setting up a PyDev project is easy; however, when I came to run my program:

'''
Created on Oct 16, 2010
Example of using orange python -> constructs Naive Bayesian Classifier
@author: eoc21
'''
import os, sys, orange


class ClassifierExample():
    def __init__(self, fileName):
        # Load the data set and train a naive Bayes classifier
        # on the examples from index 2 onwards
        self.data = orange.ExampleTable(fileName)
        self.classifier = orange.BayesLearner(self.data[2:])

    def runBayesLearner(self):
        # Print the original and the predicted class for examples 2..19
        for i in range(2, 20):
            c = self.classifier(self.data[i])
            print "original", self.data[i].getclass(), "classified as", c

    def printProbabilities(self):
        # Print the predicted probability of the second class value (p[1])
        for i in range(2, 20):
            p = self.classifier(self.data[i], orange.GetProbabilities)
            print "%d: %5.3f (originally %s)" % (i+1, p[1], self.data[i].getclass())


if __name__ == '__main__':
    # The data file name is passed as the first command-line argument
    example = ClassifierExample(sys.argv[1])
    example.runBayesLearner()
    example.printProbabilities()

I came up against the error "orange.so can't work with 64 bit architecture". Since I'm running Snow Leopard, which defaults to 64 bits, I had to set an environment variable called VERSIONER_PYTHON_PREFER_32_BIT to yes.

Everything then worked cleanly.
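
One way to apply that setting without changing it globally is to launch the script from a small wrapper. This is a rough sketch under assumptions: the helper names prefer_32_bit_env and run_script_32_bit are hypothetical, and it assumes the script is run with Apple's /usr/bin/python, since as far as I know that system-supplied launcher is what reads the variable.

```python
import os
import subprocess


def prefer_32_bit_env(base_env=None):
    """Return a copy of the environment asking Apple's Python to run 32-bit."""
    env = dict(os.environ if base_env is None else base_env)
    env["VERSIONER_PYTHON_PREFER_32_BIT"] = "yes"
    return env


def run_script_32_bit(script, *args):
    """Run script with the system Python (assumed path) in 32-bit mode."""
    command = ["/usr/bin/python", script] + list(args)
    return subprocess.call(command, env=prefer_32_bit_env())
```

For example, run_script_32_bit("classifier_example.py", "data.tab") would run the program above, where both file names are placeholders for your own script and data file.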

Orange is primarily for machine learning; however, it also has tools to support workflows in bioinformatics, and its molecule visualizer can be used to view SMILES strings from a file.

Sunday 3 October 2010

CML - who uses it?


I was very intrigued to hear from a colleague and fellow developer that CML (Chemical Markup Language) is apparently not very widely used in the field of cheminformatics. Is this true only of industry and private companies? Does academia (other than the Murray-Rust group and Henry S. Rzepa) relish this format? I would be interested to hear people's views.

A search of the Journal of Chemical Information and Modeling for "Chemical Markup Language" retrieves 76 hits, of which 17 papers (approx. 22.4%) involve either Peter Murray-Rust (PMR) or Henry S. Rzepa.

If CML is not the chosen chemical format, what is the predominant format? SMILES, InChI, InChIKey, SDF, MOL2, etc.? What will be the predominant format of the future? RDF, OWL?