Bioinformatics glossary - D




Data Cleaning
A process in which automated or semi-automated algorithms remove noise, experimental errors and other artifacts from experimental data, in order to generate and store high-quality data for use in subsequent analysis. Data cleaning is typically required in high-throughput sequencing, where compression or other experimental artifacts limit the amount of usable sequence data generated from each sequencing run or "read."
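
As an illustration only, one common cleaning step is trimming low-quality base calls from the end of a read. The sketch below assumes each read comes with one Phred-style quality score per base; the function name clean_read, the threshold of 20 and the minimum length of 5 are hypothetical choices, not part of any particular pipeline.

```python
def clean_read(sequence, qualities, min_quality=20, min_length=5):
    """Trim low-quality bases from the 3' end; return None if the read is unusable."""
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have the same length")
    end = len(sequence)
    # Walk back from the 3' end while the base quality is below the threshold.
    while end > 0 and qualities[end - 1] < min_quality:
        end -= 1
    trimmed = sequence[:end]
    # Discard reads that are too short after trimming or contain ambiguous calls (N).
    if len(trimmed) < min_length or "N" in trimmed:
        return None
    return trimmed


# Hypothetical reads: (base calls, per-base quality scores).
reads = [
    ("ACGTACGTAC", [38, 37, 36, 35, 34, 33, 30, 12, 9, 5]),
    ("ACGNACGTAC", [38] * 10),
    ("ACGTAC", [30, 30, 8, 7, 6, 5]),
]
cleaned = [clean_read(seq, quals) for seq, quals in reads]
print(cleaned)
```

Only the first read survives here, shortened to its high-quality prefix; the other two are rejected rather than passed on to later analysis.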

Data Mining
The ability to query very large databases in order to test a hypothesis ("top-down" data mining); or to interrogate a database in order to generate new hypotheses based on rigorous statistical correlations ("bottom-up" data mining).

Data Processing
The systematic performance of operations upon data, such as handling, merging, sorting, and computing. The semantic content of the original data should not be changed, but the semantic content of the processed data may be changed.

Data Warehouses
Vast arrays of heterogeneous (biological) data, stored within a single logical data repository, that are accessible to different querying and manipulation methods.

Database
Any file system in which data is stored following a logical process (see also relational database).

Deconvolution
Mathematical procedure to separate out the overlapping effects of molecules such as mixtures of compounds in a high-throughput screen, or mixtures of cDNAs in a high density array.

Deletion
A chromosomal alteration in which a portion of the chromosome or the underlying DNA is lost.

Deletion mapping
Process in which different deletions in a region of DNA are created and used to map the functionally critical areas of that DNA. For example, the minimal region of DNA required for a test promoter can be ascertained by systematic deletions in the region of interest.
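
The reasoning behind the promoter example can be sketched in code. This is a simplified illustration that assumes a single contiguous critical element and an all-or-nothing activity readout; the function name, the half-open coordinates and the constructs are hypothetical. Any base that can be deleted without loss of activity is treated as dispensable, and the bases that remain bound the required region.

```python
def minimal_required_region(region_length, deletions):
    """Return (start, end) bounding the bases never deleted in an active construct.

    deletions: list of (start, end, retains_activity), half-open coordinates.
    """
    candidates = set(range(region_length))
    for start, end, retains_activity in deletions:
        if retains_activity:
            # Bases removed without loss of function are dispensable.
            candidates -= set(range(start, end))
    if not candidates:
        return None
    return min(candidates), max(candidates) + 1


# Hypothetical deletion constructs across a 100-base test region.
constructs = [
    (0, 30, True),     # 5' deletion, promoter still active
    (70, 100, True),   # 3' deletion, promoter still active
    (40, 60, False),   # internal deletion, activity lost
    (20, 50, False),   # overlapping deletion, activity lost
]
region = minimal_required_region(100, constructs)
print(region)
```

In this example the critical area is narrowed to bases 30 through 69. The inactivating deletions are consistent with that result because each of them overlaps the remaining region.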

Dendrogram 
A graphical representation of the output of a hierarchical clustering method. A dendrogram is strictly defined as a binary tree with a distinguished root, that has all the data items at its leaves. Conventionally, all the leaves are shown at the same level of the drawing. The ordering of the leaves is arbitrary, as is their horizontal position. The heights of the internal nodes may be arbitrary, or may be related to the metric information used to form the clustering.
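
To make the tree structure concrete, the following minimal sketch builds a dendrogram as nested pairs, using single-linkage agglomerative clustering on one-dimensional values. Single linkage is chosen here only for brevity; the function names and the expression values are hypothetical. Each internal node records the distance at which its two children were merged, which is one way of relating node height to the clustering metric.

```python
from itertools import combinations


def single_linkage_dendrogram(points):
    """Return (tree, root_height); the tree is nested 2-tuples with names at the leaves."""
    # Each cluster is (subtree, member values, height of the subtree root).
    clusters = [(name, [value], 0.0) for name, value in points.items()]
    while len(clusters) > 1:
        # Find the two clusters whose closest members are nearest to each other.
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = min(abs(a - b) for a in clusters[i][1] for b in clusters[j][1])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1], d)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    tree, _, height = clusters[0]
    return tree, height


def leaves(tree):
    """Collect the data items found at the leaves of the binary tree."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]


# Hypothetical expression values for four genes.
expression = {"geneA": 1.0, "geneB": 1.5, "geneC": 5.0, "geneD": 5.2}
tree, root_height = single_linkage_dendrogram(expression)
print(tree, root_height)
```

Every merge joins exactly two subtrees, so the result is a binary tree with a single root and all data items at the leaves. Swapping the two children of any node gives an equally valid drawing, which is why the leaf order is arbitrary.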


Dimer
A composite molecule formed by the binding of two molecules (see homodimers and heterodimers).

Disulphide bond
Covalent link formed between the sulphur atoms of two different cysteine residues in a protein. Important in maintaining the folded structure of a protein, and also for linking different proteins in a complex.

DNA (deoxyribonucleic acid)
The chemical that forms the basis of the genetic material in virtually all organisms. DNA is composed of the four nitrogenous bases Adenine, Cytosine, Guanine, and Thymine, which are covalently bonded to a backbone of deoxyribose-phosphate to form a DNA strand. Two complementary strands (where all Gs pair with Cs and As with Ts) form a double helical structure which is held together by hydrogen bonding between the cognate bases.
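
The base-pairing rule can be expressed as a small lookup table. The sketch below is illustrative only: the function names are hypothetical, and it assumes both strands are written 5' to 3' using only the letters A, C, G and T. Because the two strands of a helix run antiparallel, the partner of a strand is its reverse complement.

```python
# Watson-Crick pairing: G pairs with C, A pairs with T.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def are_complementary(strand_a, strand_b):
    """True if the two strands could pair along their full length."""
    return reverse_complement(strand_a) == strand_b.upper()


partner = reverse_complement("ATGCCG")
print(partner)
print(are_complementary("ATGCCG", partner))
```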

DNA fingerprinting
A technique for identifying human individuals based on a restriction enzyme digest of tandemly repeated DNA sequences that are scattered throughout the human genome and whose pattern is unique to each individual.

DNA microarrays
The deposition of oligonucleotides or cDNAs onto an inert substrate such as glass or silicon. Thousands of molecules may be organized spatially into a high-density matrix. These DNA chips may be probed to allow expression monitoring of many thousands of genes simultaneously. Uses include study of polymorphisms in genes, de novo sequencing or molecular diagnosis of disease.

DNA polymerase
An enzyme that catalyzes the synthesis of DNA from a DNA template given the deoxyribonucleotide precursors.

DNA probes
Short single-stranded DNA molecules of specific base sequence, labeled either radioactively or immunologically, that are used to detect and identify the complementary base sequence in a gene or genome by hybridizing specifically to that gene or sequence.
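
In sequence terms, finding where a probe can hybridize amounts to searching the target strand for the probe's reverse complement. The sketch below is a simplified illustration that assumes perfect matches only and ignores hybridization conditions; the function names and sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def binding_sites(target, probe):
    """Return 0-based positions on the target where the probe can base-pair exactly."""
    site = reverse_complement(probe)
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        # Continue one base further along so overlapping sites are also found.
        start = target.find(site, start + 1)
    return positions


# Hypothetical target strand and probe, both written 5' to 3'.
target = "GGATCCATGCAAGGATCC"
probe = "TTGCAT"
sites = binding_sites(target, probe)
print(sites)
```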

DNA sequencing
The technique in which the specific sequence of bases forming a particular DNA region is deciphered.

DNase (Deoxyribonuclease)
One of a series of enzymes that can digest DNA.

Domain (protein)
A region of special biological interest within a single protein sequence. A domain may also be defined as a region within the three-dimensional structure of a protein that accomplishes a specific function and that may encompass regions of several distinct protein sequences. A domain class is a group of domains that share a common set of well-defined properties or characteristics.

Drug
An agent that affects a biological process. Specifically, a molecule whose molecular structure can be correlated with its pharmacological activity.

Drug discovery cycle
The cycle of events required to develop a new drug. Typically this involves research, preclinical testing and clinical development, and can take from 5 to 12 years.