Thursday, 19 August 2010

Boston ACS Fall 2010



I am just putting the last few tweaks to my poster before I present it at the CINF (Chemical Information) Scholarship for Scientific Excellence session on Sunday 22 August. This should be an exciting event; there are eight other posters, ranging in content from semantic web applications to toxicity prediction and virtual screening.


As for the rest of the program, unfortunately I cannot stay past Monday; however, some talks I would have liked to see (and may not get to see on Sunday) include:

General Papers

1. #84 Chemistry in your hand, by Dr Anthony J. Williams (ChemSpider).


2. #86 Extracting information from the IUPAC Green Book, by Prof Jeremy G Frey from the University of Southampton.

Data Intensive Drug Design
1. #12 Public-domain data resources at the European Bioinformatics Institute and their use in drug discovery, by Christoph Steinbeck.

2. #16 Data-driven life sciences: The Pyramids meet the Tower of Babel, by Dr. Rajarshi Guha, NIH.


Recent Progress in Chemical Structure Representation


1. #67 Recent IUPAC recommendations for chemical structure representation: An overview by Mr. Jonathan Brecher, CambridgeSoft.

2. #69 Line notations as unique identifiers, by Krisztina Boda PhD.

There are also a number of presentations in the Semantic Web in Chemistry session:

1. #4 Chemical e-Science Information Cloud (ChemCloud): A semantic web based eScience infrastructure, by Prof. Dr. Adrian Paschke, FIZ Chemie, Berlin.

2. #36 ChemicalTagger: A tool for semantic text-mining in chemistry, by Dr Lezan Hawizy, University of Cambridge.

Sunday, 8 August 2010

Wrapping it all up




Since I'm going to be dealing with quite a lot of C++ code, it's worth knowing how to use this functionality from a different programming language, all in the cozy environment of one's own living room: the Eclipse IDE.

To get started, all we need are three ingredients: Eclipse, SWIG, and the SWIG plugin for Eclipse called sKWash. For those unfamiliar with SWIG, its website describes it as

"a software development tool that connects programs written in C and C++ with a variety of high-level programming languages."

Essentially you take some C or C++ code,

/* File : example.c */

#include <time.h>

double My_variable = 3.0;

/* Recursive factorial */
int fact(int n) {
    if (n <= 1) return 1;
    else return n*fact(n-1);
}

/* Remainder of x divided by y */
int my_mod(int x, int y) {
    return (x%y);
}

/* Current local time as a string */
char *get_time()
{
    time_t ltime;
    time(&ltime);
    return ctime(&ltime);
}


then write an interface file,

/* example.i */
%module example
%{
/* Put header files here or function declarations like below */
extern double My_variable;
extern int fact(int n);
extern int my_mod(int x, int y);
extern char *get_time();
%}

extern double My_variable;
extern int fact(int n);
extern int my_mod(int x, int y);
extern char *get_time();


and build the module, e.g. for Python:

unix % swig -python example.i
unix % gcc -fPIC -c example.c example_wrap.c \
-I/usr/local/include/python2.1
unix % gcc -shared example.o example_wrap.o -o _example.so


We can now use the Python module as follows:

>>> import example
>>> example.fact(5)
120
>>> example.my_mod(7,3)
1
>>> example.get_time()
'Sun Feb 11 23:01:07 1996'
>>>

This, however, involves the command line. The advantage of sKWash is that it provides a GUI for SWIG within Eclipse. An example of its usage is shown here; the same approach can of course be extended to more complicated code, such as wrapping whole libraries, e.g. QuantLib, a C++ library for quantitative finance. The library can then be used from the desired programming language.

In brief, you create two C++ projects in Eclipse: one to store your main code, the other as a container for the C++ wrapper generation. Then create a project for your target language (e.g. Java); this acts as the container for the wrapper code in that language. Finally, create a sKWash project and specify the C++ wrapper generation project and the target language project. Eclipse will then generate the interface files and build the project, and the wrapper code should appear in your wrapper projects.

Sunday, 1 August 2010

Mash it all together using Yahoo! Pipes



I came across Yahoo! Pipes completely at random. It is essentially a web application provided by Yahoo! for building data mashups from web feeds, web pages and other web services, with a GUI that simplifies the process.

One interesting application of Yahoo! Pipes that I found is by Sara Nichols, a postdoc in the McCammon group, who uses it to aggregate job listings from CCL.net (Computational Chemistry Jobs Board).



She has created a pipe that can output either an RSS feed or JSON, which can be incorporated into web pages.
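Whatever produces the feed, consuming the RSS output is straightforward. The sketch below assumes a standard RSS 2.0 document with `<item>` elements containing `<title>` and `<link>`; the feed URL is supplied on the command line and is a placeholder, not the address of the actual pipe.

```python
import sys
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def feed_items(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


if __name__ == "__main__" and len(sys.argv) > 1:
    # sys.argv[1]: URL of an RSS feed (placeholder; supply your own)
    with urlopen(sys.argv[1]) as response:
        for title, link in feed_items(response.read()):
            print(title, "-", link)
```

Keeping the parsing separate from the download makes it easy to test against a saved copy of a feed.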

Saturday, 31 July 2010

This may seem obvious

For some this may be obvious or straightforward to implement. However, I have been asked astoundingly often for a script to transform integer-based fingerprints, such as MACCS keys or pharmacophore fingerprints output by MOE (the Molecular Operating Environment), into binary representations (e.g. 000110101), so that molecules can be compared by Tanimoto similarity or some other measure, or used as input to a machine learning algorithm.

So here is a script that converts integer-based fingerprints into binary fingerprints.

import sys

class ConvertIntegerFPToBinary(object):
    """Convert integer-based fingerprints (lists of set bit numbers, e.g.
    MACCS keys from MOE) into fixed-length binary fingerprints."""

    def __init__(self, descriptorFile, outputFile, label):
        self.iFile = descriptorFile
        self.oFile = outputFile
        self.binaryFingerprints = []
        self.integerList = []
        self.label = label

    def populateList(self, maxBitSize):
        # Bit numbers 1..maxBitSize, as strings to match the tokens read from file
        self.integerList = [str(i) for i in range(1, maxBitSize + 1)]

    def convertData(self, maximumBitSize):
        self.populateList(maximumBitSize)
        with open(self.iFile, 'r') as inputFile:
            data = inputFile.readlines()
        # Skip the first (header) line
        for line in data[1:]:
            splitdata = line.replace("\"", "").split()
            # 1 if the bit number appears on this line, otherwise 0
            binaryFingerprint = [1 if bit in splitdata else 0
                                 for bit in self.integerList]
            # The class label is appended as the last field
            binaryFingerprint.append(self.label)
            self.binaryFingerprints.append(binaryFingerprint)
        return self.binaryFingerprints

    def postProcessing(self, fp):
        # Turn a fingerprint list into a comma-separated string
        return ", ".join(str(x) for x in fp)

    def writeToFile(self):
        # One fingerprint per line, no trailing newline after the last one
        with open(self.oFile, 'w') as outputFile:
            outputFile.write("\n".join(self.postProcessing(fp)
                                       for fp in self.binaryFingerprints))

if __name__ == '__main__':
    # Usage: python convert.py input_file output_file label
    converter = ConvertIntegerFPToBinary(sys.argv[1], sys.argv[2], sys.argv[3])
    binaryFingerprints = converter.convertData(166)  # 166 MACCS keys
    converter.writeToFile()
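Once the fingerprints are binary, comparing two molecules is simple. Below is a minimal sketch of the Tanimoto coefficient (bits set in both fingerprints divided by bits set in either), assuming each line of the output file is a comma-separated list of 0/1 values followed by the label, as written by `writeToFile()`. The helper names `parse_line` and `tanimoto` are illustrative, and treating two all-zero fingerprints as identical is a convention chosen here.

```python
import sys


def parse_line(line):
    """Split 'b1, b2, ..., bn, label' into ([b1..bn] as ints, label)."""
    fields = [field.strip() for field in line.split(",")]
    return [int(bit) for bit in fields[:-1]], fields[-1]


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: bits on in both / bits on in either."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    # Convention assumed here: two empty fingerprints count as identical
    return 1.0 if either == 0 else both / float(either)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Compare the first two molecules in a file written by writeToFile()
    with open(sys.argv[1]) as handle:
        first, second = [parse_line(line)[0] for line in handle.readlines()[:2]]
    print(tanimoto(first, second))
```

For example, `tanimoto([0, 1, 1, 0, 1], [0, 1, 0, 0, 1])` has two bits in common out of three set in either, giving 2/3 (about 0.667).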

Wednesday, 28 July 2010

Protection, Protection, Protection!



A long, long time ago, as I can still remember (well, I think it was about three years ago), I set up a MediaWiki on my website (installation instructions are here). Whilst I was reaping the benefits of a nice Web 2.0 interactive site, where new content and pages were easy to add, something more sinister was lurking in the background: spam. Because there were few anti-spam extensions at the time, I thought it wouldn't happen to me. My server administrator later told me that I had a few gigabytes of spam on my wiki, which was slowing down the whole server.

Having removed the old wiki, I have now bravely decided to install it again (fortune favours the bold). This time round, however, the first thing I did was install every extension I could find to combat spam; there is an exhaustive list of extensions here. There are a number of ways to restrict spam:

1. Restrict reading of the wiki by setting $wgGroupPermissions['*']['read'] = false;
2. Prevent account creation, except by the administrator, by setting $wgGroupPermissions['*']['createaccount'] = false;
3. Restrict anonymous editing by setting $wgGroupPermissions['*']['edit'] = false;
4. As administrator, use the 'protect' facility to restrict access to certain pages.
5. Remove the login / create account links using the following function:

function NoLoginLinkOnMainPage( &$personal_urls ) {
    unset( $personal_urls['login'] );
    unset( $personal_urls['anonlogin'] );
    return true;
}
$wgHooks['PersonalUrls'][] = 'NoLoginLinkOnMainPage';

6. CAPTCHA extension (ConfirmEdit)
Because edits have to be confirmed by solving a CAPTCHA, most bots will not be able to edit the wiki. This extension is simple to install: download it and add require_once( "$IP/extensions/ConfirmEdit/ConfirmEdit.php" ); to your LocalSettings.php file, like all the other additions shown above.
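To check that the lockdown actually took effect, one option is to ask the wiki's API which rights an anonymous visitor has. This sketch assumes the standard MediaWiki query `action=query&meta=userinfo&uiprop=rights` on your `api.php` (passed on the command line); on a wiki where anonymous reading is disabled the API may answer with an error instead, which the sketch treats as "no rights". The function names are illustrative.

```python
import json
import sys
from urllib.parse import urlencode
from urllib.request import urlopen


def userinfo_url(api_url):
    """Build an api.php query for the rights of the user making the request."""
    params = {"action": "query", "meta": "userinfo",
              "uiprop": "rights", "format": "json"}
    return api_url + "?" + urlencode(params)


def rights_from_response(payload):
    """Extract the set of rights from the JSON response."""
    data = json.loads(payload)
    if "error" in data:
        # e.g. the API refusing anonymous requests on a locked-down wiki
        return set()
    return set(data["query"]["userinfo"]["rights"])


def leaked_rights(rights):
    """Rights the settings above are meant to withhold from anonymous users."""
    return sorted({"read", "edit", "createaccount"} & rights)


if __name__ == "__main__" and len(sys.argv) > 1:
    # sys.argv[1]: your wiki's api.php URL; requests are made without logging in
    with urlopen(userinfo_url(sys.argv[1])) as response:
        rights = rights_from_response(response.read().decode("utf-8"))
    print("Anonymous users can still:", leaked_rights(rights) or "none of these")
```

An empty result means none of read, edit or createaccount is granted to anonymous users.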


Now that my wiki is much more secure, I thought I would add a few more extensions. It is clearly useful to have your calendar displayed without having to go to, say, Gmail all the time, and MediaWiki has an extension for this here. Just download the extension, then go to this link to get the details of your calendar, and copy and paste all the information between "googlecalendar" tags, ignoring all text before the '?' and removing the "iframe" tag at the end.


I have also installed the chemistry extension, which works but is, in my opinion, at rather a beta stage of development. It allows you to write chemical formulas into your wiki between "chemform" tags, and it automatically applies subscripts and superscripts to atom counts and charges. See here for examples. One problem I have found with this extension, however, is that if, for example, you want dioxygen with a single negative charge and type O2-, it displays the molecule as though it were a single oxygen atom with a 2- charge. Clearly some work is needed on this extension, but it's interesting to see freely available chemistry extensions coming out of the woodwork for MediaWiki.
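The ambiguity is easy to reproduce: a trailing "2-" can be read either as a subscript followed by a charge, or as a 2- charge. In the Python sketch below, `naive_split` is a hypothetical rule that mimics the behaviour described above (it is not the extension's actual code), while `to_html` shows one way round the problem, assuming an explicit '^' before the charge (a convention invented for this example). It does not handle leading coefficients such as 2H2O.

```python
import re


def naive_split(formula):
    """Hypothetical rule: trailing digits plus a sign are read as the charge."""
    match = re.match(r"^(.*?)(\d*[+-])$", formula)
    if match is None:
        return formula, ""
    return match.group(1), match.group(2)


def to_html(formula):
    """Render a formula that marks its charge explicitly with '^', e.g. 'O2^-'."""
    body, _, charge = formula.partition("^")
    # Every run of digits in the body is treated as an atom count -> subscript
    html = re.sub(r"(\d+)", r"<sub>\1</sub>", body)
    if charge:
        html += "<sup>" + charge + "</sup>"
    return html


print(naive_split("O2-"))  # ('O', '2-'): one oxygen with a 2- charge
print(to_html("O2^-"))     # O<sub>2</sub><sup>-</sup>
```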

Thursday, 22 July 2010

Python makes the REST look easy

Web services are becoming increasingly common in the field of chemoinformatics, where data and services are now being published more openly on the internet. For those unfamiliar with the concept, Wikipedia defines web services as "application programming interfaces (API) or web APIs that are accessed via Hypertext Transfer Protocol (HTTP) and executed on a remote system hosting the requested services."

Traditionally, people used SOAP web services; however, these have waned in popularity relative to a newer style of web service called REST, which stands for Representational State Transfer. More information on RESTful web services can be found at the above link.

I would like to illustrate, with some code, how simple it is to access RESTful web services using Python. The services we will be accessing are chemoinformatics web services running on a server at Indiana University under the CHEMBIOGrid project.

The two services we will be accessing are:
1. 3D structure for a PubChem compound, retrieved from the Pub3D database
2. Generation of a 3D structure from a SMILES string, using the smi23d program

The code to access these services is shown below.


'''
Created on Jul 22, 2010
Program to access a REST web service hosted at Indiana University
@author: ed
'''
import sys
import urllib.error
import urllib.parse
import urllib.request

class ChemoinformaticsRESTWebServices(object):
    def __init__(self, url):
        # Base URL of the service; the query parameter is appended to it
        self.url = url

    def callWS(self, parameter):
        # Percent-encode the parameter so SMILES characters such as '#' survive
        url = self.url + urllib.parse.quote(parameter, safe="")
        try:
            data = urllib.request.urlopen(url).read()
            print(data.decode("utf-8", "replace"))
        except urllib.error.HTTPError as e:
            print("HTTP error: %d" % e.code)
        except urllib.error.URLError as e:
            print("Network error: %s" % e.reason)

if __name__ == '__main__':
    # Usage: python rest.py <PubChem CID> <PubChem CID> <SMILES>
    # 3D structure for a PubChem compound, retrieved from the Pub3D database
    ws = ChemoinformaticsRESTWebServices("http://cheminfov.informatics.indiana.edu/rest/db/pub3d/")
    ws.callWS(sys.argv[1])
    ws.callWS(sys.argv[2])
    # Generation of a 3D structure from a SMILES string, using the smi23d program
    ws2 = ChemoinformaticsRESTWebServices("http://cheminfov.informatics.indiana.edu/rest/thread/d3.py/SMILES/")
    ws2.callWS(sys.argv[3])
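Printing the response is fine for a demonstration, but usually you will want to keep the structure. Here is a minimal variant that returns the response text so it can be saved or processed further. It assumes the smi23d service returns a text structure format; the output file name is chosen by the caller, and the function names are illustrative.

```python
import sys
from urllib.parse import quote
from urllib.request import urlopen

SMI23D = "http://cheminfov.informatics.indiana.edu/rest/thread/d3.py/SMILES/"


def build_url(base, parameter):
    """Append a parameter to a base URL, percent-encoding characters such as '#' or '/'."""
    return base + quote(parameter, safe="")


def fetch(base, parameter, timeout=30):
    """Return the service's response as text (raises urllib.error.URLError on failure)."""
    with urlopen(build_url(base, parameter), timeout=timeout) as response:
        return response.read().decode("utf-8", "replace")


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python smi23d.py "<SMILES>" output_file
    with open(sys.argv[2], "w") as handle:
        handle.write(fetch(SMI23D, sys.argv[1]))
```

Encoding matters for SMILES: without it, the '#' in a triple bond such as C#N would be treated as the start of a URL fragment and never reach the server.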

Wednesday, 21 July 2010

ベンゼン

Not sure of the title? Well, if you knew Japanese you'd be OK. Japanese may not be the easiest language for many of us to pick up, and as an English speaker I haven't got a clue what it means. Other than picking up a dictionary or attending Japanese classes, you can use Google Translate, whose advantage is that it can translate between a multitude of languages. If the suspense is killing you, the title actually means "Benzene".

I would be interested to see how well Google Translate works on other chemical names. If you're interested in converting, say, a batch of names, there are Java and Python APIs available to access the web service.

Converting the English name Benzene to its Romanian counterpart, Benzen, works correctly.

Taking something more elaborate, like 13-amino-N-(2-{2-[(2-{[2-(2-aminoethyl)amino]ethyl}amino)ethyl]amino}ethyl)-2,5,8,11-tetraazatridecanamide

returns -> 13-amino-N-(2 - (2 - [(2 - ([2 - (2-amino) etil] amino) amino) etil] amino) etil) -2,5,8,11-tetraazatridecanamide

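which at first glance looks right (with the exception of the loss in curly brace notation). However, the original name contains four "ethyl" groups and the translation only three "etil", so something has been lost along the way.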
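A quick way to sanity-check a batch of translated names is to count how often key fragments occur before and after translation. The helper below is a hypothetical sketch, not part of any translation API; the fragment pairs (English "ethyl" to Romanian "etil") have to be supplied by hand for each language.

```python
ORIGINAL = ("13-amino-N-(2-{2-[(2-{[2-(2-aminoethyl)amino]ethyl}amino)ethyl]amino}ethyl)"
            "-2,5,8,11-tetraazatridecanamide")
TRANSLATED = ("13-amino-N-(2 - (2 - [(2 - ([2 - (2-amino) etil] amino) amino) etil] amino) etil)"
              " -2,5,8,11-tetraazatridecanamide")


def fragment_counts(original, translated, pairs):
    """Map each source fragment to (count in original, count of its translation)."""
    return {src: (original.count(src), translated.count(tgt)) for src, tgt in pairs}


if __name__ == "__main__":
    print(fragment_counts(ORIGINAL, TRANSLATED, [("ethyl", "etil"), ("amino", "amino")]))
```

For the name above this gives 4 "ethyl" against 3 "etil", while "amino" occurs 5 times in both, which points straight at the dropped group.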

Example Java code to translate

package eoc21;

import com.google.api.translate.Language;
import com.google.api.translate.Translate;

public class TranslateChemicalNames {

    public static void main(String[] args) {
        // The API expects an HTTP referrer identifying your site; replace with your own URL
        Translate.setHttpReferrer("http://www.example.com/");
        try {
            // Translate a chemical name from English to Romanian
            String translatedText = Translate.translate("tri-oxa-tri-silinane",
                    Language.ENGLISH, Language.ROMANIAN);
            System.out.println(translatedText);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}