SureshKumar's Bioinformatics Blog

I am Suresh Kumar Sampathrajan. I completed my PhD in bioinformatics at the University of Vienna, Austria, in 2010. If you want to know more about me and my research, please use the menus at the top.

I started this bioinformatics blog mainly for undergraduate and postgraduate students of bioinformatics. It serves as an open resource for students and for anyone who wishes to learn about bioinformatics. The blog contains video tutorials, tips, bioinformatics software downloads, articles on bioinformatics and career opportunities.

Life inside a cell animation

This animation comes from a Molecular & Cellular Biology program that "transports Biology students into a three-dimensional journey through the microscopic world of a cell". The animation illustrates the mechanisms that allow a white blood cell to sense its surroundings and respond to an external stimulus.

Click here to see the movie

Specialized Immunology databases

Immunology databases provide more detailed information on immunologically relevant molecules, systems and processes. They are typically annotated by experts and contain immunology-specific annotations.

The Kabat database contains entries of proteins of immunological interest: Ig, T cell receptors (TCR), major histocompatibility complex (MHC) molecules and other immunological proteins.
It is no longer freely available through the website; it can be purchased for US$2250.

The IMGT databases contain high-quality annotations of DNA and protein sequences of Ig, TCR and MHC. They also contain IMGT-related genomic and structural data.

The FIMM database focuses on protein antigens, MHC molecules and structures, MHC-associated peptides and relevant disease associations.

The SYFPEITHI database contains entries of MHC ligands and peptide motifs.

The HIV molecular immunology database is an annotated, searchable repository of HIV-1 T cell and B cell epitopes.

Get a free copy of the Human Genome Landmarks poster!

The Human Genome Landmarks poster is a 24" by 36" wall poster that lists selected genes, traits, and disorders associated with each of the 24 different human chromosomes.

Request a free print copy of this poster online.

Click here to order online

Motifs discovery in groups of related DNA/protein sequences

1. Go to the MEME website and click on Discover Motifs.
2. Fill in the following fields in the MEME input form:
a. E-mail address: Enter the E-mail address where results are to be sent.
b. Description (optional): Enter information describing the sequences and/or parameters of the MEME run. This information will be included in the subject of the E-mail message received from MEME and can be very useful if submitting many MEME runs.
c. Name of a file: Use the Browse button to enter the path to the training set file.
d. Number of motifs: Enter 2.
3. Click on the Start Search button. This will submit the search to the MEME Web-server at the SDSC. Within a few seconds, the browser should display a verification message.
4. Use an E-mail reader to receive the confirmation message MEME will send. If this message does not arrive, it is possible that the E-mail address was mistyped. In that case, resubmit the MEME run.
5. Save the MEME results to a text file.
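
The training set file in step 2c is a plain-text sequence file, and FASTA is a format MEME accepts. Below is a minimal Python sketch for writing such a file. The function name, the sequence names and the sequences themselves are made up for illustration; replace them with your own related sequences.

```python
def format_fasta(records, width=60):
    """Return (name, sequence) pairs as FASTA text, wrapping lines at `width`."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        # Split each sequence into fixed-width chunks, one per line.
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example sequences, not real data.
    training_set = [
        ("seq1", "ACGTTGACCTGAAGGCTTACGATCGTAGCTAGCTAGGATCC"),
        ("seq2", "TTGACCTGAAGGCATCGATCGGATCGATTAGCTAGGCTAAC"),
    ]
    with open("training_set.fasta", "w") as handle:
        handle.write(format_fasta(training_set))
```

The resulting training_set.fasta is the file you would select with the Browse button.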

Toolbar for browsing biological data and databases

The biobar project is a bioinformatics power-browsing toolbar for Mozilla-based browsers including Firefox/Flock/Mozilla/Netscape and Seamonkey.

The primary advantage of this tool is that it allows a biologist to browse and retrieve data from genomic, proteomic, functional, literature, taxonomic, structural, plant-specific and animal-specific databases. In addition to the browsing features, biobar provides links to important bioinformatics sites and services, including those at the European Bioinformatics Institute (EBI), the National Center for Biotechnology Information (NCBI) and the DNA Data Bank of Japan (DDBJ). The tool also provides links to major data deposition sites for nucleotide, protein and 3D-structure data. Finally, the menu contains links to many sequence and structure alignment and analysis tools. Biobar provides browsing access to over 46 different databases (including Google Scholar, HubMed, etc.).

Install biobar toolbar

Don't have the Firefox browser? Get it now.

Keeping up with the Human genome - Tim Hubbard

Abstract from Tim Hubbard talk:
The human genome is thirty times bigger than the worm genome that we were only just getting to grips with, and it has far greater numbers of interested users. The Ensembl project was started from scratch to handle this data: a system to store the data in an RDBMS; a pipeline to generate a pre-computed set of analyses; an API to provide both web and programmatic access. Ensembl evolves continuously: a new release is made every 2 months, and in nearly every release the schema is updated to handle new data types. It now integrates more than thirty large genomes and provides researchers with a resource of >300Gb of data, all of which is free to download. The website alone generates >1 million page impressions per week. However, with genome sequencing output per machine having recently jumped 300-fold and costs having dropped 10-fold, with more drops promised, what Ensembl deals with now is tiny compared to what is to come.

Despite all this data, we are far from understanding our genome. Given the complexity of the system, it is probably only feasible to tackle it as a huge global collaborative project, making data integration and exchange critical. One of the most significant features of the genome sequence is that it provides a framework to organize other biological information. However, there's a limit to how much can be usefully imported into a single database, especially as new resources spring up continuously and are frequently of unknown scientific value. The web has been constructed on links; however, it's hard to compare data unless it is easily aggregated. The Distributed Annotation System (DAS) is essentially a system of standardized web services: each provider runs a DAS server; DAS clients can aggregate data from as many servers as they wish around a single coordinate system, i.e. a genome sequence. Ensembl is both a DAS server and a DAS client. There are analogies with layering data on Google Earth, except that here the servers of different layers are distributed. However, visual integration is only a first step: the genome is too big for researchers to explore manually. We are going to need to computationally guide researchers to the most interesting areas of the genome.
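
The aggregation idea behind DAS can be illustrated with a toy Python sketch. This is not the DAS protocol itself (which exchanges XML over HTTP): the "servers" here are in-memory lists, and the names Feature and aggregate, as well as the example annotations, are hypothetical. The point is only that independent sources can be combined because they share one coordinate system.

```python
from typing import Dict, List, NamedTuple, Tuple


class Feature(NamedTuple):
    start: int   # first position on the shared coordinate system
    end: int     # last position (inclusive)
    label: str


def aggregate(servers: Dict[str, List[Feature]],
              region_start: int,
              region_end: int) -> List[Tuple[int, int, str, str]]:
    """Collect features from every source that overlap the requested region."""
    hits = []
    for server_name, features in servers.items():
        for f in features:
            # Two inclusive intervals overlap if each starts before the other ends.
            if f.start <= region_end and f.end >= region_start:
                hits.append((f.start, f.end, server_name, f.label))
    return sorted(hits)


# Hypothetical annotation sources standing in for separate DAS servers.
servers = {
    "genes": [Feature(100, 200, "geneA"), Feature(900, 1000, "geneB")],
    "variants": [Feature(150, 150, "snp1")],
}

for hit in aggregate(servers, 120, 180):
    print(hit)
```

Each source stays independent; the client only needs every source to report positions on the same reference sequence.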

Computational docking

Computational docking is a technique for predicting the 3D structure of the complex formed by two or more molecules. Typically, its applications are confined to protein-protein complexes and to associations between proteins and small molecules. The 3D structure of the individual partners must be known, and computational docking can be considered an extension of the modelling techniques used to predict the 3D structure of proteins. Computationally, it is possible to determine the 3D structure of complexes that cannot be obtained through experimental techniques like crystallography or NMR spectroscopy.

A number of inter-molecular interactions are in fact transient, from a kinetics point of view, or weak, from a thermodynamics point of view. Consequently they cannot be studied experimentally, since the average concentration of the complex is too low. Computational docking is therefore the only possibility to determine the 3D features of these types of inter-molecular associations.
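
To see why a weak interaction leaves very little complex in solution, consider simple 1:1 binding, P + L <-> PL, with dissociation constant Kd = [P][L]/[PL]. The short Python sketch below solves this equilibrium for the complex concentration. The concentrations and Kd values are illustrative assumptions, not measurements.

```python
import math


def complex_concentration(p_total, l_total, kd):
    """Equilibrium [PL] for 1:1 binding, all values in the same units (e.g. mol/L).

    Solves (p_total - c) * (l_total - c) = kd * c for the physically
    meaningful root c (the smaller root of the quadratic).
    """
    b = p_total + l_total + kd
    return (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / 2.0


if __name__ == "__main__":
    total = 1e-6  # 1 micromolar of each partner (assumed)
    for kd in (1e-9, 1e-3):  # a tight and a weak interaction (assumed values)
        c = complex_concentration(total, total, kd)
        print(f"Kd = {kd:g} M: fraction in complex = {c / total:.4f}")
```

With a millimolar Kd, only about a thousandth of the molecules are in the complex at these concentrations, which is the situation described above.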

Membrane Proteins Structure database

A database specialized in membrane protein structures determined by X-ray and electron diffraction, with links to the Protein Data Bank and other useful sites. It is a typical example in which the sub-cellular location and, to some extent, the physiological function are the criteria for inclusion of the data. It provides several links to other sources of information.

Go to Membrane Proteins Known 3D Structure

Proteins that lack definite 3D structure

Not all proteins have a definite 3D structure. Partially and wholly unstructured proteins have been identified in all kingdoms of life, more commonly in eukaryotic organisms. These proteins are called disordered proteins or intrinsically disordered proteins. The unstructured regions are gaining importance because they take part in functionally important pathways (e.g. signal transduction pathways) and are associated with various disease-related proteins. Their functions are classified into four categories: molecular recognition, molecular assembly/disassembly, protein modification and entropic chains.

Protein disorder can be studied directly by NMR or circular dichroism, or detected indirectly by a variety of experimental methods, including stretches of missing electron density in X-ray crystallography maps, Raman spectra, hydrodynamic measurements or even limited, time-resolved proteolysis. Each of these methods detects different aspects of disorder, resulting in different operational definitions of protein disorder.

Visualize DNA structure through Music

DNA can be represented in a variety of ways, which can provide different visual perspectives of molecular structure. This Musical Atlas presents an aural representation of the B-DNA molecules without mismatches, drugs, or modifiers. For each structure, there is a "Plain Melody," which follows a simple algorithm to highlight the structure's sequence, and a "Composition," which follows a more complicated algorithm that features the base pairing of the structure.


In each melody, each base in the sequence is played for one beat.


For each composition, there are four measures in which every quarter note gets one beat.

The number of beats per measure depends on the length of the sequence: it is half the number of bases per strand.

The sequences used here all had an even number of base pairs. However, if a sequence contained an odd number of bases, the number of beats per measure would be half that number with the remainder dropped (that is, rounded down).

Each base in the asymmetrical strand is an eighth note (as opposed to the quarter note used in the Plain Melody; an eighth note is half the length of a quarter note).
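
The timing rules above can be written down as a small Python sketch. Only the durations are modelled; the mapping from bases to actual pitches is not given here, so none is assumed. The function names and the 12-base example sequence are illustrative.

```python
def plain_melody(sequence):
    """Plain Melody: every base is played for one beat (a quarter note)."""
    return [(base, 1.0) for base in sequence]


def beats_per_measure(sequence):
    """Composition: half the number of bases per strand, rounded down."""
    return len(sequence) // 2


def composition_measure(sequence):
    """Composition: every base is an eighth note, i.e. half a beat."""
    return [(base, 0.5) for base in sequence]


if __name__ == "__main__":
    seq = "CGCGAATTCGCG"  # example strand with an even number of bases
    print(beats_per_measure(seq))
    # For an even-length strand, one pass through the sequence in eighth
    # notes fills exactly one measure.
    print(sum(duration for _, duration in composition_measure(seq)))
```

Because an eighth note is half a beat, a strand of n bases played in eighth notes lasts n/2 beats, which is why one full pass through the sequence fits in one measure.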

The compositions consist of two lines:

Melodic Line

The melodic line is the melody derived directly from the sequence of the molecules. If the asymmetric strand is self-complementary, the DNA molecule will have only one melody. If the strand(s) in the asymmetrical unit is(are) not self-complementary, both the asymmetrical strand and its symmetry related strand each have a separate melody.

In this algorithm, there are four measures to each melody. The melodic line consists of the asymmetrical sequence repeated four times.

Bass Line

The first measure is a full measure rest for the bass line while the full sequence is played on the melodic line.

The second measure begins with the complementary strand. This strand is read 3' to 5' (essentially, it base pairs with the melody).
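
Reading the complementary strand 3' to 5' means that each bass base is simply the Watson-Crick partner of the melody base at the same position, with no reversal. A minimal Python sketch of that pairing follows (the function name is hypothetical, and only the bases are modelled, not the notes assigned to them):

```python
# Watson-Crick partners for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def paired_bases(melody_strand):
    """Partner base for each position of a strand written 5' to 3'.

    The result is the complementary strand as read 3' to 5', so it lines
    up position by position with the melody.
    """
    return "".join(COMPLEMENT[base] for base in melody_strand)


if __name__ == "__main__":
    melody = "CGCGAATTCGCG"
    bass = paired_bases(melody)
    print(melody)
    print(bass)
    # A self-complementary strand equals its partner read in the usual
    # 5' to 3' direction, i.e. the reverse of `bass`.
    print(bass[::-1] == melody)
```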

The third measure slightly expands upon the base pairing concept of the second measure. Using notes from the A minor scale, the base pairing note in the bass line is followed by specifically assigned notes to create counterpoint while the melody is being

Go to Musical Atlas

Database for RNA structural classification

The Structural Classification of RNA (SCOR) is a database designed to provide a comprehensive perspective and understanding of RNA motif structure, function, tertiary interactions and their relationships.

The structural elements are organized in a directed acyclic graph (DAG) architecture, allowing multiple parent classes for a motif. Users can browse the database or search by PDB or NDB identifier, keyword or sequence. Descriptions and cartoon representations of each of the classes are available.
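
What makes a DAG different from a simple tree is that a node may have more than one parent. The Python sketch below shows the idea with a tiny hand-made hierarchy. The class names are invented for illustration and are not taken from SCOR.

```python
def ancestors(node, parents):
    """Return every class reachable by following parent links upward."""
    seen = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


# Hypothetical hierarchy: each key maps to the list of its parent classes.
parents = {
    "example_motif": ["structural_class", "functional_class"],
    "structural_class": ["root"],
    "functional_class": ["root"],
    "root": [],
}

print(sorted(ancestors("example_motif", parents)))
```

Here the motif is filed under both a structural and a functional class, which a strict tree could not express.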

The SCOR database can be used for RNA functional prediction and for searching for functional RNAs in genomes. It can also be used for RNA design and the discovery of RNA protein.

Go to SCOR database
