Sunday 17 October 2010

The Future Is Bright, The Future Is Orange


A recent post by Richard L. Apodaca on the use of KNIME workflows in Eclipse for cheminformatics prompted me to look at another piece of software, Orange. Orange has been around for some time and is an open-source data visualization and mining toolkit written in the Python programming language. The GUI is built on Qt.

I recently downloaded the Mac OS X bundle and was pleasantly surprised by the ease with which workflows can be created (see attached image). The Orange GUI is easy to use: it lets you read in files of different formats, process or filter attributes, visualize the data and its distributions, classify data, and show confusion matrices, ROC curves and so on.

Since I am a big fan of Eclipse, I wanted to access the scripting side of the Orange library through Eclipse. Setting up a PyDev project is easy. However, when I came to run my program:

'''
Created on Oct 16, 2010
Example of using Orange from Python: constructs a naive Bayesian classifier
@author: eoc21
'''
import sys

import orange


class ClassifierExample():
    def __init__(self, fileName):
        # Load the data set from the file named on the command line.
        self.data = orange.ExampleTable(fileName)
        # Train the naive Bayes learner on the examples from index 2 onwards.
        self.classifier = orange.BayesLearner(self.data[2:])

    def runBayesLearner(self):
        # Compare the original and the predicted class for examples 2..19.
        for i in range(2, 20):
            c = self.classifier(self.data[i])
            print "original", self.data[i].getclass(), "classified as", c

    def printProbabilities(self):
        # Print the predicted probability of the class value at index 1.
        for i in range(2, 20):
            p = self.classifier(self.data[i], orange.GetProbabilities)
            print "%d: %5.3f (originally %s)" % (i+1, p[1], self.data[i].getclass())


if __name__ == '__main__':
    # The data file is passed as the first command-line argument.
    example = ClassifierExample(sys.argv[1])
    example.runBayesLearner()
    example.printProbabilities()

I came up against the error "orange.so can't work with 64 bit architecture". Since I'm running Snow Leopard, which defaults to 64-bit, I had to set the environment variable VERSIONER_PYTHON_PREFER_32_BIT to yes.

Everything then worked cleanly.
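A quick way to confirm which mode the interpreter is actually running in is to ask Python itself. The snippet below is only a sketch: the function names are mine, not part of Orange or PyDev, and it assumes the Apple-supplied system Python, which is the one that honours VERSIONER_PYTHON_PREFER_32_BIT.

```python
import os
import struct


def interpreter_bits():
    """Return the pointer width of the running interpreter (32 or 64)."""
    return struct.calcsize("P") * 8


def prefers_32_bit(environ=None):
    """Return True if VERSIONER_PYTHON_PREFER_32_BIT is set to 'yes'.

    `environ` defaults to os.environ; a plain dict can be passed instead.
    """
    env = os.environ if environ is None else environ
    return env.get("VERSIONER_PYTHON_PREFER_32_BIT", "").lower() == "yes"


def describe(environ=None):
    """Summarise the interpreter width and the state of the flag."""
    flag = "set" if prefers_32_bit(environ) else "not set"
    return "%d-bit interpreter, prefer-32-bit flag %s" % (interpreter_bits(), flag)


if __name__ == "__main__":
    print(describe())
```

From a terminal the variable can be exported before launching Python (export VERSIONER_PYTHON_PREFER_32_BIT=yes). Inside Eclipse, I would expect the equivalent to be an entry in the run configuration's environment settings, though the exact location may differ between PyDev versions.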

Orange is primarily for machine learning, but it also has tools to support workflows in bioinformatics, and its molecule visualizer can be used to view SMILES strings from a file.
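For anyone curious what a naive Bayesian classifier like the one in my script does under the hood, here is a rough, dependency-free sketch for discrete attributes. It is not Orange's implementation: the class TinyNaiveBayes, its method names, the add-one smoothing and the toy data are all my own assumptions for illustration.

```python
from collections import Counter, defaultdict


class TinyNaiveBayes(object):
    """Naive Bayes over discrete attributes with add-one smoothing (illustrative)."""

    def __init__(self, rows, labels):
        self.class_counts = Counter(labels)
        # value_counts[(attribute index, label)] counts each attribute value.
        self.value_counts = defaultdict(Counter)
        # Distinct values seen per attribute, needed for the smoothing term.
        self.values = [set() for _ in rows[0]]
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.value_counts[(j, label)][value] += 1
                self.values[j].add(value)

    def probabilities(self, row):
        """Return a dict mapping each class label to its normalised probability."""
        total = float(sum(self.class_counts.values()))
        scores = {}
        for label, count in self.class_counts.items():
            score = count / total  # class prior
            for j, value in enumerate(row):
                seen = self.value_counts[(j, label)][value]
                # Smoothed estimate of P(value | label).
                score *= (seen + 1.0) / (count + len(self.values[j]))
            scores[label] = score
        norm = sum(scores.values())
        return dict((label, s / norm) for label, s in scores.items())

    def classify(self, row):
        """Return the most probable class label for the row."""
        probs = self.probabilities(row)
        return max(probs, key=probs.get)


if __name__ == "__main__":
    # Made-up toy data: (outlook, temperature) -> play?
    rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
    labels = ["no", "no", "yes", "yes"]
    model = TinyNaiveBayes(rows, labels)
    print("classified as %s" % model.classify(("sunny", "hot")))
```

The two methods loosely mirror the two calls in my script: classify plays the role of calling the classifier on an example, and probabilities that of passing orange.GetProbabilities.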

2 comments:

  1. And the AstraZeneca team in Mölndal/Sweden is developing a cheminformatics plugin for Orange:

    http://github.com/AZCompTox/AZOrange

  2. Hi,

    I am trying to set up Orange with Eclipse. I couldn't find any write-up on this. If you have one available, could you please help me?
