SureshKumar's Bioinformatics Blog

I am Suresh Kumar Sampathrajan. I completed my PhD in bioinformatics at the University of Vienna, Austria, in 2010. If you want to know more about me and my research, please click the menus at the top.

I started this bioinformatics blog mainly for undergraduate and postgraduate students of bioinformatics. It is meant to serve as an open resource for students and for anyone who wishes to learn about bioinformatics. The blog contains video tutorials, tips, bioinformatics software downloads, articles on bioinformatics, and career opportunities.

Identifying Paralogs and Orthologs via the COG and KOG Databases

Orthologs and paralogs are defined as follows:

Orthologs: Homologous genes (or proteins) in different species that diverged from a common ancestral gene through speciation, not through gene duplication.

Paralogs: Related genes (or proteins) in the same genome. The related genes have arisen by gene duplication.

COG and KOG databases:

The COG (Clusters of Orthologous Groups) and KOG (euKaryotic Orthologous Groups) databases have been constructed using a careful analysis of BLAST hits.

First, low-complexity sequence regions and commonly occurring domains are masked to prevent spurious hits and to improve the statistical score analysis (E-values).

All gene sequences from one genome are then compared against all sequences from another genome, recording the best-scoring BLAST hit for each gene, and this is repeated for every possible pair of genomes.
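The best-hit bookkeeping in this step can be illustrated with a small Python sketch. It assumes the BLAST results have already been parsed into (query, subject, bit score) records; the `Hit` tuple, the `best_hits` function, the `genome_of` lookup, and the gene names are all made up for illustration and are not part of the COG/KOG pipeline itself.

```python
from collections import namedtuple

# Hypothetical record for one parsed BLAST hit (not a real BLAST API).
Hit = namedtuple("Hit", ["query", "subject", "bitscore"])


def best_hits(hits, genome_of):
    """Return {(query gene, target genome): best-scoring Hit}.

    Hits between genes of the same genome are skipped here,
    because this step only compares different genomes.
    """
    best = {}
    for hit in hits:
        query_genome = genome_of[hit.query]
        subject_genome = genome_of[hit.subject]
        if query_genome == subject_genome:
            continue
        key = (hit.query, subject_genome)
        # Keep only the highest-scoring hit per gene and target genome.
        if key not in best or hit.bitscore > best[key].bitscore:
            best[key] = hit
    return best


# Toy data: two genomes, A and B, with invented gene names and scores.
genome_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
hits = [
    Hit("a1", "b1", 250.0),
    Hit("a1", "b2", 90.0),
    Hit("a2", "b2", 180.0),
    Hit("b1", "a1", 245.0),
    Hit("b2", "a2", 175.0),
]
best = best_hits(hits, genome_of)
for (gene, target_genome), hit in sorted(best.items()):
    print(gene, "->", hit.subject, "in genome", target_genome)
```

With more than two genomes, the same function keeps one best hit per gene for each of the other genomes, which matches the "repeated for all pairs" idea above.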

Paralogous genes within a genome that result from gene duplication since the divergence of two species are identified as those that give a better-scoring BLAST hit with each other than either gives with any gene in the other genome.
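As a rough sketch of this rule in Python: a pair of genes from the same genome is flagged when their score against each other beats the best score either one gets against the other genome. The function name `recent_paralogs`, the input layout, and the scores are assumptions made for this example only.

```python
def recent_paralogs(within_hits, best_cross_score):
    """Flag same-genome gene pairs that look like recent duplicates.

    within_hits: iterable of (gene1, gene2, bitscore) for hits inside one genome.
    best_cross_score: {gene: best bitscore of that gene against the other genome}.
    A gene with no cross-genome hit is treated as having a best score of 0.
    """
    pairs = set()
    for gene1, gene2, score in within_hits:
        if gene1 == gene2:
            continue  # ignore self-hits
        beats_gene1 = score > best_cross_score.get(gene1, 0.0)
        beats_gene2 = score > best_cross_score.get(gene2, 0.0)
        if beats_gene1 and beats_gene2:
            pairs.add(frozenset((gene1, gene2)))
    return pairs


# Toy data for one genome; all names and scores are invented.
within_hits = [("a1", "a2", 300.0), ("a1", "a3", 80.0)]
best_cross_score = {"a1": 250.0, "a2": 240.0, "a3": 200.0}
paralog_pairs = recent_paralogs(within_hits, best_cross_score)
print(paralog_pairs)
```

Here a1 and a2 are flagged because 300 is higher than both of their best cross-genome scores, while a1 and a3 are not, since their mutual score of 80 is lower than what each gets against the other genome.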

Orthologous genes are found as groups of genes from different genomes that are reciprocal best BLAST hits of each other.
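The reciprocal-hit idea can be sketched in Python as follows. This is a simplified illustration, not the actual COG/KOG construction procedure: it assumes each gene's best hits in the other genomes are already known, keeps only the reciprocal pairs, and merges pairs that share a gene into groups. The function names and the toy gene names are hypothetical.

```python
def reciprocal_pairs(best_hits):
    """best_hits: {gene: set of its best-hit genes in the other genomes}.

    A pair is kept only if each gene is a best hit of the other.
    """
    pairs = set()
    for gene, partners in best_hits.items():
        for partner in partners:
            if gene in best_hits.get(partner, ()):
                pairs.add(frozenset((gene, partner)))
    return pairs


def group_orthologs(pairs):
    """Merge reciprocal pairs that share a gene into candidate groups."""
    groups = []
    for pair in pairs:
        merged = set(pair)
        remaining = []
        for group in groups:
            if group & merged:
                merged |= group  # overlapping group is absorbed
            else:
                remaining.append(group)
        remaining.append(merged)
        groups = remaining
    return groups


# Toy data: genes from genomes A, B and C (invented names).
best_hits = {
    "a1": {"b1", "c1"},
    "b1": {"a1", "c1"},
    "c1": {"a1", "b1"},
    "a2": {"b1"},  # one-way only: b1 does not point back to a2
    "b2": {"a2"},  # one-way only: a2 does not point back to b2
}
groups = group_orthologs(reciprocal_pairs(best_hits))
print(groups)
```

Only a1, b1 and c1 end up in a group, because they are mutually each other's best hits; a2 and b2 have one-way hits and are left out.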

All sequences in a COG or a KOG are assumed to have a related function, and thus the method can be used to predict gene and protein function.
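A minimal Python sketch of this kind of function transfer is shown below. It assumes a simple majority vote over the members of a group that already have an annotation; the voting rule, the `predict_function` name, and the annotations are assumptions for illustration, not how the COG or KOG databases assign functions.

```python
from collections import Counter


def predict_function(group, known_function):
    """Suggest a function for a group from its annotated members.

    Returns the most common annotation, or None if no member is annotated.
    """
    labels = [known_function[gene] for gene in group if gene in known_function]
    if not labels:
        return None
    return Counter(labels).most_common(1)[0][0]


# Toy data: c1 has no annotation yet (invented names and labels).
group = ["a1", "b1", "c1"]
known_function = {"a1": "DNA repair", "b1": "DNA repair"}
predicted = predict_function(group, known_function)
print("Predicted function for c1:", predicted)
```

A prediction obtained this way is only a hypothesis based on the assumption of related function within the group, so it should be checked against other evidence.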
