SureshKumar's Bioinformatics Blog

I am Suresh Kumar Sampathrajan. I completed my PhD in bioinformatics at the University of Vienna, Austria, in 2010. If you want to know more about me and my research, please click the menus at the top.

I started this bioinformatics blog mainly for undergraduate and postgraduate students of bioinformatics. It serves as an open resource for students and for anyone who wishes to learn about bioinformatics. The blog contains video tutorials, tips, bioinformatics software downloads, articles on bioinformatics and career opportunities.

Human Protein Reference Database

The Human Protein Reference Database (HPRD) integrates information relevant to the function of human proteins in health and disease.

It holds data on thousands of protein-protein interactions, posttranslational modifications, enzyme/substrate relationships, disease associations, tissue expression, and subcellular localization for each protein in the human proteome.

Go to Human Protein Reference Database

A few suggestions for good biological database design

A few suggestions for good biological database design, from the editorial of the 2007 NAR database issue. Here is the summary:

1. The quality, quantity and originality of the data, as well as the quality of the web interface, are the most important.

2. A web database should be comprehensive (not overspecialized) and should attribute its original data sources.

3. Bulk data should be available as flat files.

4. The database web address should have a unique, easy-to-remember domain name. The web interface and searching should be easy to use.

5. Provide help and examples wherever necessary.

6. The server should not be slow.

The 2007 database issue update includes 968 databases, 110 more than the previous one.

It can be viewed online

Nucleic Acids Research, 2007, Vol. 35, Database issue D1-D2

A database of incorrect Protein conformations

The Decoys 'R' Us database contains a wide variety of decoys generated by different methods with the aim of fooling scoring functions. Decoys are computer-generated conformations of protein sequences that possess some characteristics of native proteins but are not biologically real.

Decoys have been based on discrete-state models, molecular dynamics trajectories, crystal structures of different resolutions, conformations with different loops, and amino acid sequences mounted on radically different folds.

In other words, this database provides incorrect conformations against which scoring functions can be tested, in order to improve protein structure prediction.

Organisation of decoy sets

1. The multiple decoy sets
2. The single decoy sets
3. The loop decoy sets

The current version of the entire decoy set is only available as a single tar and gzipped file to download.
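Since the whole decoy set comes as one tar and gzipped file, it can help to look inside the archive before unpacking it. The following is a minimal sketch using Python's standard `tarfile` module; the file name in the usage comment is a placeholder, not the real download name.

```python
import tarfile

def list_archive_members(archive_path):
    """Return the names of all entries in a gzip-compressed tar archive."""
    # "r:gz" opens the tar file for reading with gzip decompression.
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()

# Usage (hypothetical file name, assumed for illustration only):
# for name in list_archive_members("decoys.tar.gz"):
#     print(name)
```

Listing the members first shows how the decoy sets are laid out in directories without writing anything to disk.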

Go to Decoys 'R' Us database

Tips for using the new EBI search interface, "EB-eye"

EMBL-EBI has launched its new website interface with the "EB-eye", a powerful search engine that allows instant searches of all the EBI's databases from a single query.

EB-eye Search is built on top of the Apache Lucene project framework, an open-source, high-performance, full-featured text search engine library written entirely in Java. It uses this technology to index EBI databases in various formats (e.g. flat files, XML dumps, OBO format, etc.) and provides very fast access to the EBI's data resources. The system allows the user to search globally across all EBI databases, or individually in selected resources by using the Advanced Search.

1. Simple search

(i) Boolean operators
* AND - (default) meaning that term1 AND term2 must both exist in the searched documents, e.g. cytochrome AND c
* OR - meaning that either term1 OR term2 must exist in the searched documents, e.g. cytochrome OR c
* NOT - meaning that the term following NOT must not be present in any of the displayed documents (i.e. documents containing it are excluded), e.g. glutathione NOT transferase

* + '+term1' - Required operator: the document must contain term1.
* - '-term1' - Prohibit operator: the document must not contain term1.

At the bottom of any results page there is a 'Refine your search' box. It allows the user to add terms to the query and automatically joins them to the search with additional AND operators.

(ii) Term modifiers (wildcards):

* '*' - matches any run of characters, as in 'gluta*' (glutacin, glutamate, glutamic, etc.)
* '?' - matches exactly one character, as in 'b?nd' (bind, bond, band, etc.)
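To get a feel for the two wildcards, here is a small sketch that uses Python's shell-style `fnmatch` matching as a stand-in. This is an assumption for illustration: it is not how EB-eye works internally, but `*` (any run of characters) and `?` (exactly one character) behave the same way for simple terms. The word list is invented.

```python
from fnmatch import fnmatchcase

def matching_terms(pattern, terms):
    """Return the terms that match a wildcard pattern.

    '*' stands for any run of characters (including none),
    '?' stands for exactly one character.
    """
    return [term for term in terms if fnmatchcase(term, pattern)]

# Invented vocabulary, for illustration only.
vocabulary = ["glutamate", "glutamic", "glutathione", "bind", "bond", "band", "blind"]

print(matching_terms("gluta*", vocabulary))  # ['glutamate', 'glutamic', 'glutathione']
print(matching_terms("b?nd", vocabulary))    # ['bind', 'bond', 'band']
```

Note that "blind" is not matched by 'b?nd', because '?' replaces one character only.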

(iii) Grouping terms together using parentheses, e.g. (reductase OR transferase) AND glutathione
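The effect of the Boolean operators and of grouping can be pictured with plain Python sets. This is only a toy sketch: the index below, with its document numbers, is invented, and it says nothing about how EB-eye itself evaluates queries.

```python
# Toy inverted index: term -> set of document IDs containing that term.
# The document IDs are invented for illustration.
index = {
    "glutathione": {1, 2, 3},
    "reductase": {1, 4},
    "transferase": {2, 5},
}

def docs(term):
    """Return the set of documents containing the term (empty if unknown)."""
    return index.get(term, set())

# (reductase OR transferase) AND glutathione
grouped = (docs("reductase") | docs("transferase")) & docs("glutathione")

# reductase OR (transferase AND glutathione): a different grouping of the same terms
regrouped = docs("reductase") | (docs("transferase") & docs("glutathione"))

# glutathione NOT transferase
excluded = docs("glutathione") - docs("transferase")

print(sorted(grouped))    # [1, 2]
print(sorted(regrouped))  # [1, 2, 4]
print(sorted(excluded))   # [1, 3]
```

The two groupings return different documents, which is why parentheses are worth using whenever AND and OR appear in the same query.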

2. Advanced Search

(i) Searches with all the words in a string
(ii) Searches of the exact phrase - quoted string
(iii) Searches with at least one of the words in the string
(iv) Searches that display results where none of the words in the input string are present

3. Domain Specific Search

Allows the user to narrow searches to specific databases


Build a protein model from your sequence in an easy way

If your protein sequence shows significant homology to another protein of known three-dimensional structure, then a fairly accurate model of your protein 3D structure can be obtained via homology modelling.

The easiest way to do homology modelling automatically is through the SWISS-MODEL server's First Approach mode. SWISS-MODEL is a fully automated protein structure homology-modelling server.

1. Fill in your details: your email address (the PDB file of the homology model will be emailed to you), your name, and a title to identify your model.

2. Paste your protein sequence in the space provided. Sequences can be provided in RAW, SWISS-PROT, FASTA or GCG format.

3. Click Send request.

1. It is possible to send in a protein sequence only.
2. Recommended: for a good model, use this only if the degree of sequence identity between your query sequence and the template sequences is high (50% or greater). You can check this by searching your query sequence against the PDB database with the BLAST tool.
3. Carefully read the header section of the returned files to learn which templates and alignments were used during the model building process.
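The 50% rule of thumb above can be checked on any pairwise alignment. Below is a minimal sketch that counts identical residues over the aligned columns where neither sequence has a gap. Both sequences are invented, and other definitions of percent identity exist (for example, dividing by the full alignment length), so treat the number as a rough guide.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Invented example alignment (query vs. template), for illustration only.
query    = "MKTAYIAKQR-QISFVKSHF"
template = "MKTAYLAKQRGQLSFVKAHF"

identity = percent_identity(query, template)
print(f"{identity:.1f}% identity")  # 84.2% identity
print("above the 50% guideline" if identity >= 50 else "below the 50% guideline")
```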

Go to Swiss-Model Server

Ligand searching in Protein Data Bank (PDB)

Partial string search

Ligand name searching supports partial string matches. For example, searching for 'benz' will return all structures that contain benzene as well as those containing benzamidine.

Exact match search

For an exact match, the complete name of the ligand must be entered. Ligand searches can also be performed using the three-character ligand ID in the PDB file (the "HET" record). For example, searching for 'HEM' returns all structures that have a heme ligand.
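The difference between the partial name search and the exact ID search can be sketched in a few lines. The small table below is a simplified stand-in made up for illustration, not the PDB's actual ligand dictionary, and the function names are hypothetical.

```python
# Simplified stand-in table: three-character ligand ID -> ligand name.
# Entries are for illustration only.
ligands = {
    "BNZ": "benzene",
    "BEN": "benzamidine",
    "HEM": "heme",
}

def search_by_name(fragment):
    """Partial string search: IDs of ligands whose name contains the fragment."""
    fragment = fragment.lower()
    return sorted(lig_id for lig_id, name in ligands.items() if fragment in name.lower())

def search_by_id(ligand_id):
    """Exact match on the three-character ligand ID; returns the name or None."""
    return ligands.get(ligand_id.upper())

print(search_by_name("benz"))  # ['BEN', 'BNZ']
print(search_by_id("HEM"))     # heme
```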

MarvinSketch search

The PDB can also be searched for structures containing a given ligand by drawing the ligand in MarvinSketch (provided by ChemAxon).

SMILES string search

SMILES (Simplified Molecular Input Line Entry System) is a line notation that describes a chemical structure as a short text string.

e.g. SMILES strings for benzene: C1=CC=CC=C1 or c1ccccc1

The SMILES search feature lets you query for ligands using a SMILES string representation.
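The two benzene strings look different but describe the same six carbon atoms: one writes alternating double bonds explicitly, the other uses lower-case letters for aromatic atoms. The sketch below only lists the atoms in simple, bracket-free SMILES strings. It is not a real SMILES parser and does not check bonds or ring closures; a cheminformatics toolkit such as RDKit would be needed for that.

```python
import re

# Atoms of the SMILES "organic subset" written without brackets.
# Two-letter symbols come first so that "Cl" is not read as "C" plus "l".
ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")

def heavy_atoms(smiles):
    """List the heavy atoms in a bracket-free SMILES string, as element symbols."""
    return [symbol.capitalize() for symbol in ATOM.findall(smiles)]

kekule_form = "C1=CC=CC=C1"   # explicit alternating double bonds
aromatic_form = "c1ccccc1"    # lower-case letters mark aromatic atoms

print(heavy_atoms(kekule_form))    # ['C', 'C', 'C', 'C', 'C', 'C']
print(heavy_atoms(aromatic_form))  # ['C', 'C', 'C', 'C', 'C', 'C']
```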

Go to PDB Ligand search

Aligning two sequences

To compare only two sequences that are already known to be homologous, for example from related species, the 'BLAST 2 Sequences' tool can be used.

'BLAST 2 Sequences' uses the BLAST algorithm to align two protein or two nucleotide sequences (i.e. protein-protein or DNA-DNA comparison).

The resulting alignments are presented in both graphical and text form. A World Wide Web version of the program can be used interactively at the NCBI WWW site.

>>Strand option: forward strand, reverse strand or both strands
>>Parameters: reward for a match and penalty for a mismatch
>>View options: standard, mismatch highlighting
>>Masking colour option: black, grey and red
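To see what the reward and penalty parameters do, here is a minimal sketch that scores an ungapped nucleotide alignment column by column. The default values and the two sequences are chosen for illustration only; real BLAST scoring also handles gaps and alignment statistics, which are left out here.

```python
def ungapped_score(seq_a, seq_b, reward=1, penalty=-2):
    """Score an ungapped alignment: add reward per match, penalty per mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(reward if a == b else penalty for a, b in zip(seq_a, seq_b))

# Invented sequences: 7 matching positions and 1 mismatch.
first = "ACGTACGT"
second = "ACGTTCGT"

print(ungapped_score(first, second))                        # 5
print(ungapped_score(first, second, reward=2, penalty=-3))  # 11
```

A harsher mismatch penalty relative to the reward favours alignments between closely related sequences.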

Go to BLAST2 Sequences

PubMed Search Tips

PubMed was developed by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM) as part of the Entrez retrieval system.
It provides free access to MEDLINE, the NLM database of indexed citations and abstracts of medical, nursing, dental, veterinary, health care, and preclinical sciences journal articles.
It also includes selected life sciences journals not in MEDLINE. New citations are added Tuesday through Saturday.

Basic Search Techniques

1. Type any keyword or phrase into the search box. Use an asterisk (*) to retrieve variations on a word, e.g., bacter* retrieves bacteria, bacterium, bacteriophage, etc.

For a Subject Search: Enter one or more words (e.g., asthma drug therapy) in the query box and click on Go. PubMed automatically "ANDs" (combines) terms together so that all terms or concepts are present, and it translates your words into MeSH terms.

For an Author Search: Enter the author's name in the format of last name first followed by initials (e.g., byrnes ca).

Use Boolean operators (AND, OR, and NOT) to combine topics in the search box if desired, then click the Go button to run your search.
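The same kinds of searches can also be sent to PubMed from a script through NCBI's E-utilities, which this post does not otherwise cover. The sketch below only builds the request URL for the `esearch` service and does not contact the server. The function name is my own, and the `[au]` tag is PubMed's field tag for an author search.

```python
from urllib.parse import urlencode

# Base address of the E-utilities search service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(term, max_results=20):
    """Build an esearch URL for a PubMed query (no network access here)."""
    params = {"db": "pubmed", "term": term, "retmax": max_results}
    return ESEARCH + "?" + urlencode(params)

# Subject search with an explicit Boolean operator.
print(pubmed_search_url("asthma AND drug therapy"))

# Author search: last name first, followed by initials, with the author tag.
print(pubmed_search_url("byrnes ca[au]"))
```

`urlencode` takes care of escaping spaces and square brackets, so the query can be written exactly as it would be typed into the search box.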

3. Setting Limits

Click on 'Limits' on the Feature tabs. Choose the restrictions for your search, e.g. a specific language, article type, date, or subset of PubMed such as nursing journals, cancer or bioethics.

Note: Limits remain in place until you change or remove them. Limits other than language or date will exclude NEW records that are "in process" or "supplied by Publisher."

4. Anatomy of a PubMed Search
PubMed employs a process called Automatic Term Mapping. This means that your search term is matched against the following, in this order:

1. MeSH (Medical Subject Headings) Translation Table
2. Journals Translation Table
3. Phrase List
4. Author Index

For example:
Enter mad cow disease and PubMed will search for the mapped MeSH heading, Encephalopathy, Bovine Spongiform, OR the text words mad cow disease.
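The ordered lookup behind Automatic Term Mapping can be pictured as a loop over the four tables that stops at the first match. The sketch below is a toy version under stated assumptions: the table contents are tiny invented stand-ins, the function names are hypothetical, and the expanded query is simplified compared with what PubMed really produces.

```python
# Invented miniature stand-ins for the four tables, in the lookup order listed above.
TABLES = [
    ("MeSH", {"mad cow disease": "Encephalopathy, Bovine Spongiform"}),
    ("Journals", {"nucleic acids res": "Nucleic Acids Research"}),
    ("Phrase List", {"protein structure prediction": "protein structure prediction"}),
    ("Author Index", {"byrnes ca": "Byrnes CA"}),
]

def map_term(query):
    """Return (table name, mapped value) for the first table containing the query."""
    key = " ".join(query.lower().split())  # normalise case and spacing
    for table_name, table in TABLES:
        if key in table:
            return table_name, table[key]
    return None

def expand(query):
    """Simplified expansion: MeSH heading OR the original text words."""
    hit = map_term(query)
    if hit and hit[0] == "MeSH":
        return f'"{hit[1]}"[MeSH Terms] OR {query}[Text Word]'
    return query

print(map_term("byrnes ca"))      # ('Author Index', 'Byrnes CA')
print(expand("mad cow disease"))
```

Because the MeSH table is checked first, a phrase that is both a subject heading and, say, part of a journal title is treated as a subject.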

Go to PubMed
