SureshKumar's Bioinformatics Blog

I am Suresh Kumar Sampathrajan. I completed my PhD in bioinformatics at the University of Vienna, Austria, in 2010. If you want to know more about me and my research, please use the menus at the top.

I started this bioinformatics blog mainly for undergraduate and postgraduate students of bioinformatics. It is meant as an open resource for students and for anyone who wants to learn about bioinformatics. The blog contains video tutorials, tips, bioinformatics software downloads, and articles on bioinformatics and career opportunities.

Ace of (data)base


A biological database is a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system. A simple database might be a single file containing many records, each of which includes the same set of information. For example, a record associated with a nucleotide sequence database typically contains information such as contact name; the input sequence with a description of the type of molecule; the scientific name of the source organism from which it was isolated; and, often, literature citations associated with the sequence.

For researchers to benefit from the data stored in a database, two additional requirements must be met:
1. Easy access to the information; and
2. A method for extracting only the information needed to answer a specific biological question.

The principal requirements on the public data services are:

* Data quality - data quality has to be of the highest priority. However, because the data services in most cases lack access to supporting data, the quality of the data must remain the primary responsibility of the submitter.
* Supporting data - database users will need to examine the primary experimental data, either in the database itself, or by following cross-references back to network-accessible laboratory databases.
* Deep annotation - deep, consistent annotation comprising supporting and ancillary information should be attached to each basic data object in the database.
* Timeliness - the basic data should be available on an Internet-accessible server within days (or hours) of publication or submission.
* Integration - each data object in the database should be cross-referenced to representation of the same or related biological entities in other databases. Data services should provide capabilities for following these links from one database or data service to another.
Primary databases (consisting of data derived experimentally)
a.) Sequence databases
DNA / nucleotide databases
GenBank

GenBank (Genetic Sequence Data Bank) is one of the fastest growing repositories of known genetic sequences. It has a flat-file structure: an ASCII text file, readable by both humans and computers. GenBank is part of the International Nucleotide Sequence Database Collaboration, which also comprises the DNA Data Bank of Japan (DDBJ) and the European Molecular Biology Laboratory (EMBL) database; the three exchange data on a daily basis. In addition to sequence data, GenBank entries contain information such as accession numbers, gene names, phylogenetic classification, and references to the published literature. As of 2006, GenBank contained publicly available DNA sequences for more than 170,000 different organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects.

http://www.ncbi.nlm.nih.gov
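To make the flat-file idea concrete, here is a minimal Python sketch that pulls a few fields out of a GenBank-style record. The record shown is an invented, heavily shortened example (real entries carry many more fields, including a FEATURES table), and the parser only handles the fields it names.

```python
# A shortened, hypothetical GenBank-style entry (not a real accession).
record = """\
LOCUS       AB000001    12 bp    DNA    linear
DEFINITION  Hypothetical example sequence.
ACCESSION   AB000001
  ORGANISM  Escherichia coli
ORIGIN
        1 atgcgtacgt ga
//"""

def parse_genbank(text):
    entry = {}
    seq_lines = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):       # end-of-record marker
            break
        if in_origin:
            # Sequence lines look like "        1 atgcgtacgt ga"
            seq_lines.append("".join(line.split()[1:]))
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("LOCUS"):
            entry["locus"] = line.split()[1]
        elif line.startswith("ACCESSION"):
            entry["accession"] = line.split()[1]
        elif line.strip().startswith("ORGANISM"):
            entry["organism"] = " ".join(line.split()[1:])
    entry["sequence"] = "".join(seq_lines)
    return entry

rec = parse_genbank(record)
print(rec["accession"], rec["organism"], rec["sequence"])
```

Because the format is plain text, even this naive line-by-line scan recovers the record's identity and sequence; production code would use an established parser instead.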

EMBL

The EMBL Nucleotide Sequence Database is a comprehensive database of DNA and RNA sequences collected from the scientific literature and patent applications, and directly submitted by researchers and sequencing groups. Data collection is carried out in collaboration with GenBank (USA) and the DNA Data Bank of Japan (DDBJ). The database doubles in size roughly every 18 months; as of June 1994 it contained nearly 2 million bases from 182,615 sequence entries.

http://www.ebi.ac.uk/embl/

DDBJ (DNA Data Bank of Japan)

DDBJ was established in 1986 at the National Institute of Genetics (NIG). It was reorganized as the Center for Information Biology and DNA Data Bank of Japan (CIB/DDBJ) in 2001.

http://www.ddbj.nig.ac.jp

Protein databases
SwissProt

Swiss-Prot was established in 1986. It is maintained collaboratively by the EMBL Outstation (the EBI) and the Swiss Institute of Bioinformatics (SIB). It is a protein sequence database that provides a high level of integration with other databases and has a very low level of redundancy (i.e., few identical sequences are present in the database).

http://www.expasy.org/sprot
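The "low redundancy" idea can be illustrated with a toy Python sketch that collapses identical sequences to a single representative. The identifiers and sequences are invented; real Swiss-Prot curation is manual and merges different reports of the same protein into one annotated entry, which this does not attempt.

```python
# Hypothetical entries: (identifier, sequence). P2 duplicates P1's sequence.
entries = [
    ("P1", "MKTAYIAKQR"),
    ("P2", "MKTAYIAKQR"),   # identical to P1 -> redundant
    ("P3", "MSLVNKKLTA"),
]

def deduplicate(entries):
    seen = {}
    for name, seq in entries:
        # Keep the first identifier observed for each distinct sequence
        seen.setdefault(seq, name)
    return [(name, seq) for seq, name in seen.items()]

print(deduplicate(entries))
```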

TrEMBL (Translation of EMBL Nucleotide Sequence Databases)

It was created in 1996 as a supplement to Swiss-Prot. It makes new sequences available as quickly as possible through computer-annotated entries derived from the translation of all coding sequences (CDS) in EMBL.

http://www.uniprot.org/database/knowledgebase.shtml

PIR (Protein Information Resource)
PIR was established in 1984 by the National Biomedical Research Foundation (NBRF) and has been maintained since 1988 by PIR-International. It is partitioned into four sections that differ in classification, annotation, redundancy, and cross-referencing to other biological databases.

http://pir.georgetown.edu
b.) Structure databases

PDB (Protein Data Bank)
The single worldwide repository for the processing and distribution of 3-D biological macromolecular structure data.

http://www.rcsb.org/pdb/

NDB (Nucleic Acid Database)
The Nucleic Acid Database Project (NDB) assembles and distributes structural information about nucleic acids. The data available consist of coordinates, experimental details used to determine the structures, and derived information about the geometry of the structures.

http://ndbserver.rutgers.edu/

CCDB / CSD (Cambridge Crystallographic Data Centre / Cambridge Structural Database)
A computerised database containing comprehensive data for organic and metal-organic compounds studied by X-ray and neutron diffraction.

http://www.ccdc.cam.ac.uk/prods/csd/csd.html

Secondary databases (derived information)

A secondary database contains information derived from a primary database, such as conserved sequences, signature sequences, and active-site residues of protein families, obtained by multiple sequence alignment of a set of related proteins. A secondary structure database organizes entries of the PDB (for instance, by classifying all PDB entries according to structural elements such as alpha-helices or beta-sheets) and also holds information on conserved secondary-structure motifs of particular proteins.
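As a toy illustration of deriving such information, the sketch below scans a small, made-up multiple sequence alignment and reports the columns where every sequence carries the same residue, a deliberately crude notion of "conserved" (real signature derivation weighs substitution patterns, not just identity).

```python
# Invented toy alignment: three aligned sequences, '-' marks a gap.
alignment = [
    "MKV-LIT",
    "MKVALIS",
    "MKV-LIT",
]

def conserved_columns(alignment):
    conserved = []
    for i, column in enumerate(zip(*alignment)):
        residues = set(column) - {"-"}          # ignore gap characters
        if len(residues) == 1 and "-" not in column:
            # Every sequence has the same residue at this position
            conserved.append((i, residues.pop()))
    return conserved

print(conserved_columns(alignment))
```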

ProSite (Database of Protein Families and Domains)
It contains patterns and profiles specific for more than a thousand protein families or domains and also background information on the structure and function of these proteins.

http://www.expasy.org/prosite
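PROSITE patterns use a compact syntax: positions separated by '-', 'x' for any residue, [..] for allowed residues, {..} for forbidden ones, and (n) for repeats. The sketch below converts a made-up pattern (not a real PROSITE entry) into a Python regular expression and scans a toy sequence; it deliberately ignores PROSITE features such as N/C-terminal anchors.

```python
import re

def prosite_to_regex(pattern):
    # Simplified converter for the subset of syntax described above.
    regex = pattern.replace("{", "[^").replace("}", "]")  # {P} -> [^P]
    regex = regex.replace("(", "{").replace(")", "}")     # x(2) -> x{2}
    regex = regex.replace("-", "")                        # drop separators
    regex = regex.replace("x", ".")                       # x -> any residue
    return regex

pattern = "C-x(2)-[DE]-H"           # invented pattern, for illustration only
regex = prosite_to_regex(pattern)   # -> "C.{2}[DE]H"
seq = "MKACLEDHVT"                  # invented sequence
match = re.search(regex, seq)
print(regex, match.group() if match else None)
```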

Pfam (Protein Families Database of Alignment and HMMs)
A large collection of multiple sequence alignments and hidden Markov models covering many protein domains and families. As of 2006, Pfam contained over 6,000 protein families and domains.

http://www.sanger.ac.uk/Software/Pfam/

Enzyme (Enzyme Nomenclature Database)
A repository of information on the nomenclature of enzymes, based primarily on the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB).

http://www.expasy.org/enzyme/

REBase (Restriction Enzyme Database)
A collection of information about restriction enzymes and related proteins. As of 2006, REBASE stored over 4,000 enzymes and over 7,000 references.

http://rebase.neb.com/rebase/rebase.html
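The kind of lookup such a database supports can be sketched in a few lines of Python: scanning a DNA sequence for a restriction enzyme's recognition site. EcoRI's site (GAATTC) is well known; the sequence below is made up. GAATTC is palindromic, so the complementary strand needs no separate scan.

```python
def find_sites(seq, site):
    # Report every 0-based start position of the site, including overlaps
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

dna = "TTGAATTCCGGAATTCAA"   # invented sequence with two EcoRI sites
print(find_sites(dna, "GAATTC"))
```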

Genome-related Information
OMIM (Online Mendelian Inheritance in Man)
It is incorporated into NCBI's Entrez system and can be queried in the same way as the other Entrez databases, such as PubMed and GenBank. It is a catalog of human genes and genetic disorders that includes information on genetic variation in humans, along with textual descriptions, pictures, and reference information.

http://www.ncbi.nlm.nih.gov/Omim/

TransFac (Transcription Factor Database)
A database of eukaryotic cis-acting regulatory DNA elements and trans-acting factors. It covers the whole range of organisms from yeast to human.

http://www.gene-regulation.com

Structure-related Information
HSSP (Homology-derived Secondary Structure of Proteins)
A database of homology-derived secondary structure of proteins (HSSP), built by aligning to each protein of known structure all sequences deemed homologous on the basis of a threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modelled explicitly.

http://www.sander.ebi.ac.uk/hssp/

FSSP (Fold classification based on Structure-Structure alignment of Proteins)
Based on an exhaustive all-against-all 3D structure comparison of the protein structures currently in the Protein Data Bank (PDB).

http://www.bioinfo.biocenter.helsinki.fi:8080/dali/


Pathway Information

KEGG (Kyoto Encyclopedia of Genes and Genomes)
It is a suite of databases and associated software that integrates knowledge on molecular interaction networks in biological processes, information about the universe of genes and proteins, and information about the universe of chemical compounds and reactions. It serves as a bioinformatics resource for understanding the higher-order functions of the cell or organism from its genome information.

http://www.genome.ad.jp/kegg

Composite databases

A composite database joins a variety of different primary database sources, which obviates the need to search multiple resources.

For more database listing see:

http://www.oxfordjournals.org/nar/database/cap/

References:

  • The Molecular Biology Database Collection: 2006 update. Nucleic Acids Research, 2006, Vol. 34, Database issue, D3-D5.

Impact of human genome project


The Human Genome Project (HGP) is a project to map and sequence the 3 billion nucleotides contained in the human genome and to identify all the genes present in it. Two draft sequences of the human genome were generated, one by the Human Genome Project (HGP) and one by Celera Genomics. The HGP used a hierarchical mapping and sequencing approach, involving generation of a series of overlapping clones that cover the entire genome and shotgun sequencing of each clone. The genome sequence was reconstructed by assembling the fragments on the basis of sequence overlap, together with mapping and chromosomal-position information for the clones. Celera Genomics used a whole-genome shotgun sequencing approach, without generating a series of overlapping clones, but also incorporated HGP information where available.

Project goals
# Identify all the approximately 20,000-25,000 genes in human DNA,
# Determine the sequences of the 3 billion chemical base pairs that make up human DNA,
# Store this information in databases,
# Improve tools for data analysis,
# Transfer related technologies to the private sector, and
# Address the ethical, legal, and social issues (ELSI) that may arise from the project.

Facts after sequencing human genome

The present human DNA sequence (assembly 35, May 2004) contains ~3,100,000,000 bp (depending on the actual source of the assembled DNA sequence); it covers most of the non-heterochromatic portions of the genome and contains some 250 gaps.

We have ~20,000-25,000 genes (International Human Genome Sequencing Consortium 2004), somewhat fewer than the estimates based on the preliminary reports of the human sequence (International Human Genome Sequencing Consortium 2001; Venter et al. 2001).

The sequence revealed the full extent to which human DNA is made up of abundant interspersed repeats, extending and completing what was already known: fully 45% of our DNA consists of repetitive elements interspersed within non-repetitive sequences. Interestingly, the extent and diversity of gene duplications contained in low-copy-number repeats were greater than expected; very extensive duplications of regions of DNA, both within and between chromosomes, were identified by the International Human Genome Sequencing Consortium (2001) and Venter et al. (2001).

Challenges to bioinformatics research

The first challenge to bioinformatics research relates to the analysis of data posted on the Web in advance of publication without violating ethical standards.

The second challenge to bioinformatics research derives not from restrictions on data access but from restrictions on downstream use, such as incorporation into new or existing databases.

Download the free PDF booklet: Bioinformatics and the Human Genome Project

The Human Genome: Future research

Genomics to biology

# Comprehensively identify the structural and functional components encoded in the human genome
# Elucidate the organization of genetic networks and protein pathways and establish how they contribute to cellular and organismal phenotypes
# Develop a detailed understanding of the heritable variation in the human genome
# Understand evolutionary variation across species and the mechanisms underlying it
# Develop policy options that facilitate the widespread use of genome information in both research and clinical settings

Genomics to health

# Translating genome-based knowledge into health benefits
# Develop robust strategies for identifying the genetic contributions to disease and drug response
# Develop strategies to identify gene variants that contribute to good health and resistance to disease
# Develop genome-based approaches to prediction of disease susceptibility and drug response, early detection of illness, and molecular taxonomy of disease states
# Use new understanding of genes and pathways to develop powerful new therapeutic approaches to disease
# Investigate how genetic risk information is conveyed in clinical settings, how that information influences health strategies and behaviours, and how these affect health outcomes and costs
# Develop genome-based tools that improve the health of all

Genomics to society

# Promoting the use of genomics to maximize benefits and minimize harms
# Develop policy options for the uses of genomics in medical and non-medical settings
# Understand the relationships between genomics, race and ethnicity, and the consequences of uncovering these relationships
# Understand the consequences of uncovering the genomic contributions to human traits and behaviours
# Assess how to define the ethical boundaries for uses of genomics

What next?

HapMap Project

The International HapMap Project is a multi-country effort to identify and catalog genetic similarities and differences in human beings. Using the information in the HapMap, researchers will be able to find genes that affect health, disease, and individual responses to medications and environmental factors.

The Project will compare the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared. By making this information freely available, it will help biomedical researchers find genes involved in disease and responses to therapeutic drugs.
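The comparison described above can be caricatured in Python: line up two sequences from the same region and report the positions where they differ, i.e. candidate variants. The sequences are invented, and real variant discovery works on alignments of many individuals with sequencing-quality information, not bare strings.

```python
def variant_positions(ref, sample):
    # Assumes the two sequences are already aligned position-by-position
    return [(i, r, s) for i, (r, s) in enumerate(zip(ref, sample)) if r != s]

individual_1 = "ATGCGTACGT"   # invented toy sequences
individual_2 = "ATGAGTACTT"
print(variant_positions(individual_1, individual_2))
```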

ENCODE project

ENCODE, the Encyclopedia Of DNA Elements, was launched in September 2003 to identify all functional elements in the human genome sequence. The project is being conducted in three phases: a pilot project phase, a technology development phase, and a planned production phase.

Archon X PRIZE for Genomics - Create technology that can successfully map 100 human genomes in 10 days and win $10 million.

On October 4, 2006, the X PRIZE Foundation announced the launch of its second prize, the Archon X PRIZE for Genomics. The $10 million cash prize has been created to revolutionize the medical world. The Archon X PRIZE for Genomics challenges scientists and engineers to create better, cheaper, and faster ways to sequence genomes. The knowledge gained by compiling and comparing a library of human genomes will create a new era of preventive and personalized medicine, transforming medical care from reactive to proactive.

The Competition Guidelines

The purpose of this X PRIZE competition is to develop radically new technology that will dramatically reduce the time and cost of sequencing genomes, and accelerate a new era of predictive and personalized medicine. The X PRIZE Foundation aims to enable the development of low-cost diagnostic sequencing of human genomes.

The preliminary guidelines for the competition have been written with this intent and will be further developed and interpreted by the X PRIZE Foundation towards this end.

The $10 million X PRIZE for Genomics prize purse will be awarded to the first Team that can build a device and use it to sequence 100 human genomes within 10 days or less, with an accuracy of no more than one error in every 10,000 bases sequenced, with sequences accurately covering at least 98% of the genome, and at a recurring cost of no more than $10,000 per genome.

If more than one Team attempts the competition at the same time, and more than one Team fulfills all the criteria, then Teams will be ranked according to the time of completion. No more than three teams will be ranked and will share the purse in the following manner: $7.5 million to the winner and $2.5 million to the second place team if two teams are successful, or $7 million, $2 million and $1 million if three teams are successful.

Actual competition events will take place twice a year with all eligible teams given the opportunity to make an attempt, starting at precisely the same time as the other teams.

For more information, please see: http://genomics.xprize.org/axp/

References:

  • Collins, F.S., Green, E.D., Guttmacher, A.E. and Guyer, M.S. A vision for the future of genomics research: a blueprint for the genomic era. Nature, 24 April 2003: 835.
  • Roos, D.S. Bioinformatics - trying to swim in a sea of data.
  • Aach, J. et al. Computational comparison of two draft sequences of the human genome.
  • Collins, F.S., Morgan, M. and Patrinos, A. The Human Genome Project: lessons from large-scale biology. Science, 11 April 2003: 286.
  • http://www.xprize.org/xprizes/

Genome projects and bioinformatics


Genome

A genome is all of the DNA in an organism, including its genes and a lot of DNA that does not contribute to genes. Each animal or plant has its own unique genome. Genetic DNA is the molecular code that carries information for making all the proteins required by a living organism. These proteins determine, among other things, how the organism looks, how well it adapts to its environment, and sometimes even how it behaves.


Genome sequencing

There are essentially two ways to sequence a genome. The BAC-to-BAC method, the first to be employed in human genome studies, is slow but sure. The BAC-to-BAC approach, also referred to as the map-based method, evolved from procedures developed by a number of researchers during the late 1980s and 1990s, and it continues to develop and change.

The other technique, known as whole genome shotgun sequencing, brings speed into the picture, enabling researchers to do the job in months to a year. The shotgun method was developed by J. Craig Venter in 1996.

BAC to BAC Sequencing

The BAC to BAC approach first creates a crude physical map of the whole genome before sequencing the DNA. Constructing a map requires cutting the chromosomes into large pieces and figuring out the order of these big chunks of DNA before taking a closer look and sequencing all the fragments.

1. Several copies of the genome are randomly cut into large pieces, each roughly 150,000 base pairs (bp) long.

2. Each of these fragments is inserted into a BAC (bacterial artificial chromosome), a man-made piece of DNA that can replicate inside a bacterial cell. The whole collection of BACs containing the entire human genome is called a BAC library, because each BAC is like a book in a library that can be accessed and copied.

3. These pieces are fingerprinted to give each one a unique identification tag that determines the order of the fragments. Fingerprinting involves cutting each BAC fragment with a single enzyme and finding common sequence landmarks in overlapping fragments, which determine the location of each BAC along the chromosome. Overlapping BACs with markers every 100,000 bp then form a map of each chromosome.

Each BAC is then broken randomly into 1,500 bp pieces and placed in another artificial piece of DNA called M13. This collection is known as an M13 library.

All the M13 libraries are then sequenced: 500 bp from one end of each fragment are read, generating millions of sequences. These sequences are fed into a computer program called PHRAP, which looks for common sequences that join two fragments together.

Whole Genome Shotgun Sequencing

The shotgun sequencing method goes straight to the job of decoding, bypassing the need for a physical map. Therefore, it is much faster.

1. Multiple copies of the genome are randomly shredded into pieces that are 2,000 base pairs (bp) long by squeezing the DNA through a pressurized syringe. This is done a second time to generate pieces that are 10,000 bp long.

2. Each 2,000 and 10,000 bp fragment is inserted into a plasmid, which is a piece of DNA that can replicate in bacteria. The two collections of plasmids containing 2,000 and 10,000 bp chunks of human DNA are known as plasmid libraries.

3. Both the 2,000 and the 10,000 bp plasmid libraries are sequenced: 500 bp from each end of each fragment are decoded, generating millions of sequences. Sequencing both ends of each insert is critical for assembling the entire chromosome.

Computer algorithms assemble the millions of sequenced fragments into a continuous stretch resembling each chromosome.
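The assembly step in both methods rests on the same idea: merge fragments that share overlapping sequence. The sketch below is a greedy toy assembler for error-free reads; it repeatedly joins the pair with the longest exact suffix/prefix overlap. Real assemblers such as PHRAP, or Celera's whole-genome assembler, must additionally cope with sequencing errors, repeats, and paired-end constraints.

```python
def overlap(a, b):
    # Length of the longest suffix of a that equals a prefix of b
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def assemble(fragments):
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, 0, 1)
        for i in range(len(frags)):
            for j in range(len(frags)):
                if i != j:
                    n = overlap(frags[i], frags[j])
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break                                  # no usable overlap left
        merged = frags[i] + frags[j][n:]           # join, keeping overlap once
        frags = [f for k, f in enumerate(frags) if k not in (i, j)]
        frags.append(merged)
    return frags

reads = ["ATGCGT", "CGTACC", "ACCTTG"]   # invented overlapping reads
print(assemble(reads))
```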

Genomic Projects and their importance

Genome projects are scientific endeavours that aim to map the genome of a living being or of a species (be it an animal, a plant, a fungus, a bacterium, an archaean, a protist or a virus), that is, the complete set of genes carried by that organism or virus. The Human Genome Project was such a project.


In the mid-1980s, the United States Department of Energy (DoE) initiated a number of projects to construct detailed genetic and physical maps of the human genome, to determine its complete nucleotide sequence, and to localise its estimated 100,000 genes. Work on this scale required the development of new computational methods for analysing genetic map and DNA sequence data, and demanded the design of new techniques and instrumentation for detecting and analysing DNA. To benefit the public most effectively, the projects also necessitated the use of advanced means of information dissemination in order to make the results available as rapidly as possible to scientists and physicians. The international effort arising from this vast initiative became known as the Human Genome Project. Similar research efforts were also launched to map and sequence the genomes of a variety of organisms used extensively in research laboratories as model systems: these included the bacterium Escherichia coli, the yeast Saccharomyces cerevisiae, the nematode worm Caenorhabditis elegans, the fruit fly Drosophila melanogaster, the common weed Arabidopsis thaliana, the domestic dog Canis familiaris, and the mouse Mus musculus. By April 1998, although only a small number of relatively small genomes had been completely sequenced, and the human genome was not expected to be complete until after the year 2000, the results of such projects were already beginning to pour into the public sequence databases in overwhelming numbers. We are now witnessing a dramatic change of focus towards sequence analysis, spurred on by the advent of the genome projects and the resultant sequence/structure deficit.

GOLD, the Genomes OnLine Database, is a World Wide Web resource for comprehensive access to information regarding complete and ongoing genome projects around the world.

Published complete genomes: 431
Metagenomes: 63
Ongoing genome projects:
1. Archaeal genomes: 57
2. Bacterial genomes: 994
3. Eukaryotic genomes: 634
------------------------------------
Total genome projects: 2179


Bioinformatics challenge:

The central challenge of bioinformatics is the rationalisation of the mass of sequence information, with a view not only to deriving more efficient means of data storage, but also to designing more incisive analysis tools. The imperative that drives this analytical process is the need to convert sequence information into biochemical and biophysical knowledge; to decipher the structural, functional and evolutionary clues encoded in the language of biological sequences.

References:
  • Attwood, T.K. and Parry-Smith, D.J. (1999) Introduction to Bioinformatics. Prentice Hall, London.
  • Burke, D.T. et al. Cloning of large segments of exogenous DNA into yeast by means of artificial chromosomal vectors. Science 236, 806-812 (1987).
  • Bernal, A., Ear, U. and Kyrpides, N. (2001) Genomes OnLine Database (GOLD): a monitor of genome projects world-wide. NAR 29, 126-127.
  • Kyrpides, N. (1999) Genomes OnLine Database (GOLD): a monitor of complete and ongoing genome projects world-wide. Bioinformatics 15, 773-774.
  • Liolios, K., Tavernarakis, N., Hugenholtz, P. and Kyrpides, N.C. The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide. NAR 34, D332-334.
  • Smith, L.M. et al. Fluorescence detection in automated DNA sequencing analysis. Nature 321, 674-679 (1986).
  • Shizuya, H. et al. Cloning and stable maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proc Natl Acad Sci USA 89, 8794-8797 (1992).
  • Venter, J.C. et al. A new strategy for genome sequencing. Nature 381, 364-366 (1996).
  • Venter, J.C. et al. Shotgun sequencing of the human genome. Science 280, 1540-1542 (1998).

Bioinformatics research fields


Bioinformatics research can be broadly classified into the following sub-fields. Some of these fields are interrelated with one another.

SEQUENCE ANALYSIS


The term "sequence analysis" implies subjecting a DNA or peptide sequence to sequence alignment, sequence databases, repeated sequence searches, or other bioinformatics methods on a computer.

Sequence analysis in bioinformatics is the automated, computer-based examination of characteristic sequence features. It basically covers five biologically relevant topics:

1. the comparison of sequences in order to find similar sequences (sequence alignment)
2. identification of gene-structures, reading frames, distributions of introns and exons and regulatory elements
3. prediction of protein structures
4. genome mapping
5. comparison of homologous sequences to construct a molecular phylogeny
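Topic 1, sequence alignment, is the most classical of these. Below is a minimal Python sketch of the Needleman-Wunsch dynamic-programming algorithm for global alignment, returning only the optimal score (no traceback), under an arbitrary toy scoring of +1 match, -1 mismatch, -1 gap rather than a real substitution matrix.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Boundary: aligning a prefix against nothing costs one gap per residue
    for i in range(rows):
        score[i][0] = i * gap
    for j in range(cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            score[i][j] = max(diag,              # align a[i-1] with b[j-1]
                              score[i-1][j] + gap,   # gap in b
                              score[i][j-1] + gap)   # gap in a
    return score[-1][-1]

print(nw_score("GATTACA", "GATCA"))   # toy sequences
```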


STRUCTURAL BIOINFORMATICS

Structural bioinformatics refers to the analysis of macromolecular structure, particularly of proteins, using computational tools and theoretical frameworks.

GENOME ANALYSIS

Genome analysis is the analysis of the full genomes of organisms that have been sequenced, with the aim of identifying genes predicted to have a particular biological function. Comparative genome analysis is one component of genome analysis.

Comparative genomics includes comparison of gene number, gene content, and gene location in both prokaryotic and eukaryotic groups of organisms.

GENE EXPRESSION

Gene expression, or simply expression, is the process by which a gene's DNA sequence is converted into the structures and functions of a cell. Non-protein coding genes (e.g. rRNA genes, tRNA genes) are not translated into protein.

A gene regulatory network (also called a GRN or genetic regulatory network) is a collection of DNA segments in a cell which interact with each other and with other substances in the cell, thereby governing the rates at which genes in the network are transcribed into mRNA.

Mathematical models of GRNs have been developed to allow predictions of the models to be tested. Various modeling techniques have been used, including Boolean networks, Petri nets, Bayesian networks, graphical Gaussian models, Stochastic Process Calculi and sets of differential equations.
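Of the modeling techniques listed, Boolean networks are the simplest to demonstrate. In the sketch below, three hypothetical genes are either ON or OFF, each update rule is an invented Boolean function of the previous state, and the network is stepped forward a few times to see where it settles.

```python
def step(state):
    a, b, c = state
    return (
        not c,        # gene A is repressed by C
        a,            # gene B is activated by A
        a and b,      # gene C needs both A and B
    )

state = (True, False, False)   # start: only A is ON
trajectory = [state]
for _ in range(4):
    state = step(state)
    trajectory.append(state)
print(trajectory)
```

Even this tiny network shows qualitative dynamics (here it cycles rather than reaching a fixed point), which is the kind of prediction a GRN model is tested against.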

SYSTEMS BIOLOGY

Systems biology is the coordinated study of biological systems by (1) investigating the components of cellular networks and their interactions, (2) applying experimental high-throughput and whole-genome techniques, and (3) integrating computational methods with experimental efforts.

DATA AND TEXT MINING

Data mining (DM), also called Knowledge-Discovery in Databases (KDD) or Knowledge-Discovery and Data Mining, is the process of automatically searching large volumes of data for patterns such as association rules. It applies computational techniques from statistics, information retrieval, machine learning and pattern recognition.
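As a toy example of the association-rule idea, the sketch below counts how often pairs of items co-occur across a small set of made-up "transactions" (here, genes observed together) and computes each pair's support, i.e. the fraction of transactions containing it. Real data mining uses dedicated algorithms such as Apriori or FP-growth to avoid enumerating every pair.

```python
from itertools import combinations
from collections import Counter

# Invented transactions: sets of items observed together
transactions = [
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneB"},
    {"geneB", "geneC"},
    {"geneA", "geneB", "geneC"},
]

pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] += 1

# Support = fraction of transactions containing the pair
support = {p: c / len(transactions) for p, c in pair_counts.items()}
print(support[("geneA", "geneB")])
```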

PHYLOGENETICS

Phylogenetics (Greek: phylon = tribe, race; genetikos = relative to birth, from genesis = birth) is the study of evolutionary relatedness among various groups of organisms (e.g., species, populations). Also known as phylogenetic systematics, phylogenetics treats a species as a group of lineage-connected individuals over time. The most commonly used methods to infer phylogenies include parsimony, maximum likelihood, and MCMC-based Bayesian inference.
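A simpler relative of the methods just named is distance-based tree building. The sketch below computes pairwise p-distances (the fraction of differing positions) between three short, invented aligned sequences and picks the closest pair, which is the first joining step of a clustering method such as UPGMA.

```python
# Invented, pre-aligned toy sequences
seqs = {
    "human": "AGTCGA",
    "chimp": "AGTCGG",
    "mouse": "ACTAGG",
}

def p_distance(a, b):
    # Fraction of aligned positions that differ
    return sum(x != y for x, y in zip(a, b)) / len(a)

pairs = {}
names = sorted(seqs)
for i, n1 in enumerate(names):
    for n2 in names[i+1:]:
        pairs[(n1, n2)] = p_distance(seqs[n1], seqs[n2])

closest = min(pairs, key=pairs.get)   # pair to join first
print(pairs)
print("first join:", closest)
```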

GENETICS AND POPULATION ANALYSIS

Genetic analysis: The study of a sample of DNA to look for mutations (changes) that may increase risk of disease or affect the way a person responds to treatment.

Population analysis: Population analysis encompasses methods used to characterize and understand changes in populations. Typically, through population analysis we are interested in being able to explain observed changes in population dynamics and make predictions regarding future possibilities. Knowledge from analyses is expressed as a model.

References:

  • Bourne, P.E. and Weissig, H. (2003) Structural Bioinformatics. Wiley.
  • Bower, J.M. and Bolouri, H. (eds.) (2001) Computational Modeling of Genetic and Biochemical Networks. Computational Molecular Biology Series, MIT Press, ISBN 0-262-02481-0.
  • Klipp, E. et al. (2005) Systems Biology in Practice. Wiley-VCH.
  • Tan, P.-N., Steinbach, M. and Kumar, V. (2005) Introduction to Data Mining. ISBN 0-321-32136-7.
  • http://en.wikipedia.org/wiki/Sequence_analysis
  • http://en.wikipedia.org/wiki/Phylogenetics

What is bioinformatics?


Bioinformatics has evolved into a full-fledged multidisciplinary subject that integrates developments in information and computer technology as applied to biotechnology and the biological sciences.

Roughly, bioinformatics describes any use of computers to handle biological information. In practice the definition used by most people is narrower; bioinformatics to them is a synonym for "computational molecular biology"- the use of computers to characterize the molecular components of living things.

The NIH Biomedical Information Science and Technology Initiative Consortium agreed on the following definition: bioinformatics is the research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

The National Center for Biotechnology Information defines bioinformatics as "the field of science in which biology, computer science, and information technology merge into a single discipline. There are three important sub-disciplines within bioinformatics: the development of new algorithms and statistics with which to assess relationships among members of large data sets; the analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures; and the development and implementation of tools that enable efficient access and management of different types of information."


(Molecular) bio-informatics: bioinformatics is conceptualising biology in terms of molecules (in the sense of physical chemistry) and applying "informatics techniques" (derived from disciplines such as applied maths, computer science and statistics) to understand and organise the information associated with these molecules, on a large scale. In short, bioinformatics is a management information system for molecular biology and has many practical applications.

References:
1. Luscombe, N.M., Greenbaum, D. and Gerstein, M. (2001) What is bioinformatics? A proposed definition and overview of the field. Methods Inf Med 40: 346-58.
2. http://www.bisti.nih.gov/
3. http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html
4. http://www.geocities.com/bioinformaticsweb/definition.html
