Cannomics

Friday, 13 September 2013

Seagate 3TB Hard Drive

I recently purchased a Seagate 3TB hard drive, primarily for my Ubuntu 12.04 machine. The drive did not show up there, although of course it showed up straight away on Windows 7.

To get the drive working:

1. sudo gedit /etc/default/grub

2. Change the line

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"

to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=nomsi"

3. Run sudo update-grub so that the change is written to the boot configuration, then reboot.
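
For anyone who would rather script the edit than make it by hand, here is a minimal sketch in Python. It only transforms the text of the file and does not write to /etc/default/grub itself. The function name add_kernel_param is made up for this illustration, and it assumes the option line uses double quotes as shown above.

```python
import re


def add_kernel_param(grub_text, param="pci=nomsi"):
    """Return grub_text with param appended to GRUB_CMDLINE_LINUX_DEFAULT.

    Hypothetical helper: it works on a string, not on the real file.
    """
    pattern = re.compile(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE
    )

    def _append(match):
        options = match.group(2).split()
        if param not in options:  # avoid adding the option twice
            options.append(param)
        return match.group(1) + " ".join(options) + match.group(3)

    return pattern.sub(_append, grub_text)


if __name__ == "__main__":
    sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
    print(add_kernel_param(sample))
```

The result would still need to be saved back with root privileges, followed by sudo update-grub and a reboot.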

Sunday, 5 December 2010

Edit & View Molecules on the go!

Having newly acquired an iPad, the first thing I did was download some useful apps for it that I already have on my iPhone. I then began to look for cheminformatics apps that might be useful as part of my work, and came across "Mobile Molecular Datasheets". This is an app that allows a user to view and edit chemical structures on the iPad. More details on the app can be found at: molmatinfo.com. The app is developed by Molecular Materials Informatics and was last updated on the 2nd of December 2010. It is supported on iOS 4.2+ and is a 1.9 MB download. It does however cost £8.99 (so not the cheapest of apps).
The molecules are organized into a collection of data sheets, which can be shared or sent via email. The formats currently supported are MOL and SDF files. The app has the advantage that it can be integrated into workflows.

Mobile Molecular Datasheets has support for:
1. New data sheets of molecules.
2. New reactions.
3. New molecule templates. Examples of templates include: large rings, crown ethers, multi-dentate ligands, cage structures, amino acids, cage complexes, biomolecules and saccharides.
4. A web service.

If anyone has any experience with other cheminformatics iPad or iPhone apps, it would be great to hear from you.

Thursday, 28 October 2010

Redirecting Eclipse Output to a file

Whilst I think this tip is quite fundamental, it's worth noting. Being able to direct output from the Eclipse console to a file is particularly useful when a program you have little to no control over generates a lot of warnings, the output is rather large, and the start of it scrolls off the console screen.

To redirect to a file simply:
1. Go to the Run menu
2. Run -> Run Configurations
3. Go to the Common tab
4. Under Standard Input and Output, tick File and choose the file to write to
5. The console output will then be written to that file.
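
If the program happens to be a Python script of your own (in a PyDev project, say), much the same effect can be had in code. The sketch below is not an Eclipse feature. It assumes Python 3, and the names Tee and run_with_log are made up for this illustration. Output is copied to a log file while still appearing on the console.

```python
import contextlib
import sys


class Tee(object):
    """Minimal file-like object that copies everything written to several streams."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, text):
        for stream in self.streams:
            stream.write(text)
        return len(text)

    def flush(self):
        for stream in self.streams:
            stream.flush()


def run_with_log(func, log_path):
    """Call func() while echoing anything it prints into log_path."""
    with open(log_path, "w") as log_file:
        with contextlib.redirect_stdout(Tee(sys.stdout, log_file)):
            return func()
```

Calling run_with_log(main, "console.log") would run a function main and leave a full copy of its printed output in console.log, however far the console has scrolled.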

Sunday, 17 October 2010

The Future Is Bright, The Future is Orange

A recent post by Richard L. Apodaca on the use of KNIME workflows in Eclipse for cheminformatics provoked me to look at another piece of software, Orange. Orange has been around for some time, and is an open-source data visualization and mining toolkit written in the Python programming language. The GUI is built on Qt.

I recently downloaded the Mac OS X bundle, and was pleasantly surprised by the ease with which workflows could be created (see attached image). The Orange GUI is easy to use: it allows you to read in files of different formats, process or filter attributes, cleanly visualize the data and data distributions, classify data, show confusion matrices and ROC curves, etc.

Since I am a big fan of Eclipse, I wanted to access the scripting side of the Orange library through Eclipse. Setting up a PyDev project is easy. However, when I came to run my program:

'''
Created on Oct 16, 2010
Example of using Orange from Python: constructs a naive Bayesian classifier
@author: eoc21
'''
import sys

import orange


class ClassifierExample():
    def __init__(self, fileName):
        # Load the data set and train a naive Bayes classifier on it
        # (the first two examples are left out of training).
        self.data = orange.ExampleTable(fileName)
        self.classifier = orange.BayesLearner(self.data[2:])

    def runBayesLearner(self):
        # Compare the original class with the predicted class for examples 2 to 19.
        for i in range(2, 20):
            c = self.classifier(self.data[i])
            print("original %s classified as %s" % (self.data[i].getclass(), c))

    def printProbabilities(self):
        # Print the predicted probability of the class at index 1 for the same examples.
        for i in range(2, 20):
            p = self.classifier(self.data[i], orange.GetProbabilities)
            print("%d: %5.3f (originally %s)" % (i + 1, p[1], self.data[i].getclass()))


if __name__ == '__main__':
    example = ClassifierExample(sys.argv[1])
    example.runBayesLearner()
    example.printProbabilities()

I came up against the error "orange.so can't work with 64 bit architecture". Since I'm running Snow Leopard, which defaults to 64 bits, I had to set the environment variable VERSIONER_PYTHON_PREFER_32_BIT to yes.

Everything then worked cleanly.
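
One way to set the variable for a single run, rather than globally, is to launch the script from a small wrapper. The sketch below is only an illustration. It assumes the interpreter is Apple's system Python at /usr/bin/python, which as far as I know is the one that honours this variable. The helper names and the script and data file arguments are hypothetical.

```python
import os
import subprocess


def prefer_32_bit_env(base_env=None):
    """Return a copy of the environment with the 32-bit preference switched on."""
    env = dict(os.environ if base_env is None else base_env)
    env["VERSIONER_PYTHON_PREFER_32_BIT"] = "yes"
    return env


def run_script_32_bit(script_path, data_file, python="/usr/bin/python"):
    """Run script_path on data_file in a child process that prefers 32 bits.

    The default interpreter path is an assumption about the machine.
    """
    return subprocess.call([python, script_path, data_file],
                           env=prefer_32_bit_env())
```

Inside Eclipse, the same variable can instead be added under the Environment tab of the run configuration.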

Orange is primarily for machine learning, but it also has tools to support workflows in bioinformatics. One can also use the molecule visualizer to view SMILES strings from a file.

Sunday, 3 October 2010

CML - who uses it?

I was very intrigued to hear from a colleague and fellow developer that CML (Chemical Markup Language) is apparently not very widely used in the field of cheminformatics. Is this true only of industry and private companies? Does academia (other than the Murray-Rust group and Henry S. Rzepa) relish this format? I would be interested to hear people's views.

Searching the Journal of Chemical Information and Modeling for "Chemical Markup Language" retrieves 76 hits. Of these papers, 17 include either Peter Murray-Rust (PMR) or Henry S. Rzepa (approx 22.4%).

If CML is not the chosen chemical format, what is the predominant format? SMILES, InChI, InChIKey, SDF, mol2, etc.? What will be the predominant format of the future? RDF, OWL?

Thursday, 2 September 2010

Core Cheminformatics Competencies

One article I found online that interested me was "10 Useful Bioinformatics Skills to Have". Clearly all ten points apply to a professional in the field of cheminformatics. However, it would be interesting to know which qualities are valued most highly by employers.

For this purpose, I have reviewed the jobs advertised on the Computational Chemistry List from the 19th of February 2010 to the 1st of September 2010. I agree that alternative job sources could have been used. However, the site is fairly representative of the types of jobs that people who have finished a degree in computational chemistry may consider.

Have Your Cake and Eat It
The pie chart above shows the distribution of the 125 cheminformatics jobs by country. Clearly the cheminformatics job market is dominated by the USA, which accounts for just over 50% of all the jobs advertised on CCL.net over this period. Germany takes silver on the podium with 16 positions available, whilst the UK takes bronze with 10.

What skills are employers seeking?
Well, this is a good question, now that we've established where the main demand for cheminformatics jobs is.

The top ten skills expected by an employer are:
1. PhD (75).
2. Experience in Molecular Dynamics (36).
3. Experience in programming (34).
4. Simulation experience (29).
5. Strong oral communication skills (26).
6. Strong written communication skills (26).
7. Experience with Linux (22).
8. Python programming (22).
9. Team player (17).
10. Experience with docking software (17).
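
To put these counts in proportion, they can be expressed as a share of the 125 adverts. The short sketch below assumes each count is the number of adverts that mention the skill, and takes the docking count (17) from the table further down. The names used are made up for this illustration.

```python
TOTAL_ADVERTS = 125  # jobs reviewed on CCL.net

# Counts for the ten skills listed above.
top_ten = {
    "PhD": 75,
    "Molecular Dynamics": 36,
    "Programming": 34,
    "Simulation": 29,
    "Oral communication": 26,
    "Written communication": 26,
    "Linux": 22,
    "Python": 22,
    "Team player": 17,
    "Docking": 17,
}


def share_of_adverts(count, total=TOTAL_ADVERTS):
    """Percentage of adverts mentioning a skill, to one decimal place."""
    return round(100.0 * count / total, 1)


shares = {skill: share_of_adverts(n) for skill, n in top_ten.items()}

if __name__ == "__main__":
    for skill, pct in sorted(shares.items(), key=lambda item: -item[1]):
        print("%-22s %5.1f%%" % (skill, pct))
```

On this reading, a PhD is asked for in 60% of the adverts, and no other single skill appears in more than about 29%.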

For the more interested reader, I have given a more detailed breakdown of the results in the table below.

Skill	Count
Linux	22
Molecular Dynamics	36
Programming Language	34
Organization skills	2
Oral Communications	26
Written communications	26
Working in a team	17
PhD	75
Simulation Experience	29
Docking	17
Python	22
R	5
Java	13
C	16
C++	16
Tcl	1
C#	2
Ruby	1
Perl	8
Fortran	7
Matlab	4
UML/XML	1
Database Knowledge	5
QSAR	11
Molecular Modeling	18
Virtual screening	11
Genomics	3
Pharmacophore Elucidation	5
Quantum Mechanics	19
Conformational Analysis	1
Homology Modeling	8
Ligand based design	7
Structure based design	9
Super computing experience	3
Parallel programming	3
Experience in force field development	3
Willingness to travel	3
Scripting	10
Workflow experience	3
Machine Learning	4
Chemogenomics	2
Semantic Web technologies	1

Thursday, 19 August 2010

Boston ACS Fall 2010

I am just making the last few tweaks to my poster before I present it for the CINF (Chemical Information) Scholarship for Scientific Excellence on Sunday the 22nd of August. This should be an exciting event, with 8 other posters ranging in content from semantic web applications through to toxicity prediction and virtual screening.

With regard to the rest of the program, I unfortunately cannot stay past Monday. Some talks that I would have liked to see (and may not get to on Sunday) include:

In the general papers
1. #84 Chemistry in your hand, by Dr Anthony J. Williams (ChemSpider).

2. #86 Extracting information from the IUPAC Green Book, by Prof Jeremy G Frey from the University of Southampton.

Data Intensive Drug Design
1. #12 Public-domain data resources at the European Bioinformatics Institute and their use in drug discovery, by Christoph Steinbeck.

2. #16 Data driven life sciences: The Pyramids meet the Tower of Babel, by Dr. Rajarshi Guha, NIH.

Recent Progress in Chemical Structure Representation

1. #67 Recent IUPAC recommendations for chemical structure representation: An overview, by Mr. Jonathan Brecher, CambridgeSoft.

2. #69 Line notations as unique identifiers, by Krisztina Boda PhD.

There are also a number of presentations from the Semantic Web in Chemistry division.

#4 Chemical e-Science Information Cloud (ChemCloud): A semantic web based eScience infrastructure, by Prof. Dr. Adrian Paschke, FIZ Chemie, Berlin.

#36 ChemicalTagger: A tool for semantic text-mining in chemistry, by Dr Lezan Hawizy, University of Cambridge.