Suresh Kumar - Research Interest

Genomic technologies are generating a vast amount of information. Bioinformatics, at the intersection of biology and computation, has recently been gaining importance. It addresses the specific needs in data acquisition, storage, analysis and integration that genomics research generates.

My research interest is primarily in the area of bioinformatics. My research focuses on:

Sequence level: sequence analysis of amino acids/DNA, prediction of secondary and tertiary protein structure, motif identification, remote homology identification, evolutionary studies using phylogenetics, domain identification, conserved orthologue identification and prediction of sub-cellular localization.
Genome level: genome analysis and comparative genomics, including identification of eukaryotic promoters and their regulatory elements.
Computation: database creation, machine learning and data mining, web-based resource creation and bioinformatics software creation.


1. Ph.D. Bioinformatics (August 2006 to January 2010)
Department of Structural and Computational Biology, University of Vienna, Austria
Thesis title: Likelihood of Protein Structure Determination
Supervisor: Prof. Oliviero Carugo
(PhD fellowship funded by Austrian Genome Programme (GENAU), Bioinformatics Integration Network (BIN-II)) 

My PhD research is on the development of novel techniques to predict the level of structural organization of a protein construct on the basis of its amino acid sequence. Several factors, such as conformational disorder, improper selection of domain boundaries and poor solubility, can hamper the production of protein constructs for structural biology. Reliable computational predictors of protein crystallization propensity, based on amino acid sequences, are consequently required.

(i). Consensus Prediction of Protein Conformational Disorder
Natively disordered or unfolded proteins are proteins that do not form a stable three-dimensional structure in their native state. A disordered protein can be either completely unfolded or comprise both folded and unfolded segments (Fink 2005). Natively disordered proteins carry out their functions by means of regions that lack a specific 3-D structure and exist as ensembles of flexible, unorganized molecules. Some are flexible along their entire length, while in others only localized regions lack organized structure. These proteins have been described by various terms, such as "rheomorphic, natively unfolded, natively denatured, intrinsically unstructured, and several variants of disordered" (Dunker et al. 2008).

Prediction of protein conformational disorder is important because disorder can make crystallization difficult. In my PhD work, a new procedure is presented that allows one to predict disordered residues with high accuracy on the basis of amino acid sequences, using a consensus method based on various prediction tools. The performance of the new procedure is significantly better than that of each individual predictor previously reported.
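The idea of a consensus over several predictors can be illustrated with a minimal sketch. It assumes a simple per-residue majority vote, which is only one possible way to combine predictors and is not necessarily the scheme used in the thesis. The predictor names and the toy calls are hypothetical.

```python
from typing import Dict, List, Optional


def consensus_disorder(predictions: Dict[str, List[int]],
                       min_votes: Optional[int] = None) -> List[int]:
    """Combine per-residue binary calls (1 = disordered, 0 = ordered).

    Assumption: a residue is called disordered when at least `min_votes`
    predictors agree; by default a strict majority is required.
    """
    if not predictions:
        raise ValueError("at least one predictor is required")
    lengths = {len(calls) for calls in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictors must cover the same residues")
    n_predictors = len(predictions)
    threshold = min_votes if min_votes is not None else n_predictors // 2 + 1
    # Count, for every residue position, how many predictors call it disordered.
    votes = [sum(column) for column in zip(*predictions.values())]
    return [1 if v >= threshold else 0 for v in votes]


# Hypothetical output of three predictors for a five-residue peptide.
example = {
    "predictor_a": [1, 1, 0, 0, 1],
    "predictor_b": [1, 0, 0, 1, 1],
    "predictor_c": [0, 1, 0, 0, 1],
}
print(consensus_disorder(example))
```

Raising `min_votes` makes the consensus stricter, trading sensitivity for fewer false disorder calls.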

(ii). Prediction of Quaternary Structure
Protein structure is described at four hierarchical levels: primary, secondary, tertiary, and quaternary. Quaternary structure refers to the arrangement of proteins that contain more than one polypeptide chain. Each polypeptide chain in such a protein is called a subunit. The subunits can be the same polypeptide chain or different ones. Protein assemblies composed of one polypeptide chain are termed monomers, and those composed of more than one polypeptide chain are called oligomers. Oligomers composed of identical subunits are called homo-oligomers, and those composed of different subunits are called hetero-oligomers.

A protein chain can correspond to a monomeric protein, or it can form oligomeric assemblies together with other chains, which can be either homo-oligomers or hetero-oligomers. In the latter case, one should avoid determining the three-dimensional structure of a single protomer, since it will not be functional and will also be extremely difficult to express in a soluble form. It is thus desirable to have a computational tool that allows one to predict whether a potential gene product is part of a permanent and obligate hetero-oligomeric assembly.

I have developed a novel machine learning method that discriminates hetero-oligomeric from monomeric and homo-oligomeric proteins, and also monomeric from homo-oligomeric proteins, on the basis of amino acid sequences.

(iii). Prediction of Metalloproteins
Metalloproteins are proteins capable of binding one or more metal ions or metal-containing cofactors, which are required for biological function, for the regulation of their activities or for structural purposes (Passerini et al. 2007). Metal-binding capabilities are encoded in the amino acid sequences, and these primary sequences are related to the protein's three-dimensional structure.

Predicting metalloproteins helps crystallographers select a proper growth medium for over-expression studies and increases the probability of obtaining a properly folded molecule. I have shown that the uptake of metal ions by proteins can be predicted on the basis of the amino acid composition. By employing machine learning methods, high accuracy was achieved, and it was also possible to discriminate between various metal species.
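Amino acid composition is the kind of sequence-derived feature such a predictor can start from. The following is a minimal sketch of how a composition vector could be computed before being passed to a machine learning method. The function name, the fixed residue order and the toy sequence are assumptions for illustration and are not taken from the thesis.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids in a fixed order (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Non-standard symbols (e.g. X, gaps) are ignored in this sketch.
    """
    counts = Counter(res for res in sequence.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Toy sequence, purely illustrative.
features = amino_acid_composition("MKHHCC")
for aa, fraction in zip(AMINO_ACIDS, features):
    if fraction > 0:
        print(aa, round(fraction, 3))
```

The resulting 20-dimensional vector has the same length for every protein, which is what makes it convenient as classifier input.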

(iv). Filamin Bioinformatics Characterization
Filamins are large cytoplasmic homodimeric proteins that crosslink cortical actin into three-dimensional structures and give mechanical force to cells by binding actin filaments into bundles or gel networks (Van der Flier and Sonnenberg 2001).

I have applied the methods developed above (prediction of protein disorder, quaternary structure and metalloproteins) to the bioinformatics characterization of filamin proteins.

(v). Protein Domain Boundary Predictions: A Structural Biology Perspective
Domains constitute the structural, functional and evolutionary units of proteins. Proteins can be built from a single domain or an assortment of domains. We have studied the performance of several computational approaches for protein domain boundary prediction that were made publicly available in the CASP 7 experiment. These predictors were compared, and the reliability of these prediction methods for practical application in structural biology was tested.
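One simple way to compare boundary predictors is to count a predicted boundary as correct when it falls within a few residues of a reference boundary. The sketch below illustrates that idea only. The tolerance of 8 residues, the function name and the example positions are assumptions, not the evaluation protocol used in our study.

```python
from typing import List, Tuple


def score_boundaries(predicted: List[int],
                     reference: List[int],
                     tolerance: int = 8) -> Tuple[float, float]:
    """Return (sensitivity, precision) for predicted domain boundaries.

    A boundary counts as matched when a counterpart lies within
    `tolerance` residues (an assumed, adjustable window).
    """
    # Reference boundaries recovered by at least one prediction.
    found = sum(
        1 for r in reference
        if any(abs(r - p) <= tolerance for p in predicted)
    )
    # Predictions that lie close to at least one reference boundary.
    correct = sum(
        1 for p in predicted
        if any(abs(p - r) <= tolerance for r in reference)
    )
    sensitivity = found / len(reference) if reference else 0.0
    precision = correct / len(predicted) if predicted else 0.0
    return sensitivity, precision


# Hypothetical residue positions for a three-domain protein.
sens, prec = score_boundaries(predicted=[118, 260, 400], reference=[120, 255])
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

Reporting both numbers matters in practice: a predictor that places many boundaries can reach high sensitivity while still being unreliable for construct design.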

2. Bioinformatician (July 2005 to December 2005)
IPGRI-INIBAP, Montpellier, France
Project title: Enabling Biological databases Interoperability to create an online integrated Musa information resource on Banana and Plantain accessible worldwide
Supervisor: Dr. Nicolas Roux
(Funded by Generation Challenge Programme fellowship)

The project involves data mining, clustering, annotation and maintenance of Musa EST sequences in a database. The main workflow of the project is:

  • Collection of EST sequences from Musa genome project sequencing centres, assembled and maintained as flat files
  • Removal of vector contamination from EST sequences using the cross_match application and the NCBI UniVec database
  • Masking of low-complexity sequences in the collected EST sequences using the Repeatbeater algorithm
  • Installation of the OpenSputnik pipeline and standalone BLAST on a Linux platform
  • Clustering of EST sequences into pools of related sequences by the HPT2 algorithm using the OpenSputnik pipeline
  • Assembly of the resulting clusters into unigene sequences
  • Annotation of the unigene sequences using a selection of bioinformatics tools such as TargetP, SignalP and BLAST
  • Development of a MySQL database containing the unigene sequences

3. M.Sc. Thesis (January 2005 to May 2005)
Department of Biology, University of Leicester, UK
Thesis title: Molecular Biodiversity and Banana Genomics: Anonymous markers, Gene Analysis and Access
Supervisor: Prof. (Pat) J.S. Heslop-Harrison
(Invited as Visiting Researcher; funded by GCP)

The main outline of my work is:
  • Analysis of diversity in a triploid crop using data from Generation CP in Musa: Quantitative analysis of accession relationships and phylogeny, taking into consideration complex genome relationships of the diploid species involved. Development of appropriate analysis methods for SSR and IRAP-based polymorphism analysis, taking into consideration data quality and ease of band scoring. Integration of diversity measurements made using different experimental methods, using nuclear, chloroplast and mitochondrial sequences
  • Development of optimized COS (conserved orthologue set) markers in Musa: Investigation of the EST databases of Musa, rice and Arabidopsis. Generation of data about Musa genome organization, including codon usage. Investigation of drought tolerance genes and control regions from the literature, and optimization with respect to Musa gene discovery
  • Analysis of Musa BAC sequence data: Extraction of diversity information and comparison with EST sequence information. Comparison of sequences with other species, with aims related to the diversity analysis above
  • Development of website for the group: with particular emphasis on dissemination 1) of protocols that are robust, clear and accessible; and 2) of technical information which is useful to a wide range of end-users, including linkage and integration with suitable analysis and database tools (Website URL:

4. Summer Internship (May to July 2004)
Centre for Cellular and Molecular Biology, India
Project title: Remote homolog identification of Drosophila-GAGA protein in Mouse
Supervisor: Dr. Rakesh Mishra

The Drosophila transcriptional activator known as GAGA factor, or Trithorax-like, functions by influencing chromatin structure. GAGA is a multipurpose protein that mediates gene-specific regulation but also plays a global role in chromosome function. The aim of my project is to find a remote homolog of the Drosophila GAGA protein in mouse through bioinformatics analysis. My work involves identification of the remote homolog using various bioinformatics tools such as PSI-BLAST, phylogenetic study using ClustalW, and homology modeling and superimposition using Swiss-PdbViewer.
