Sequencing of a genome often starts with a random shotgun sequencing strategy or with direct sequencing of genomic DNA. The DNA sequences of the clones or sequenced genome fragments often overlap, and merging these overlaps yields longer contiguous sequences (contigs).
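To make the idea of merging overlapping fragments concrete, here is a toy greedy overlap-merge in Python. It is a simplified sketch, not how production assemblers work: it assumes error-free fragments from a single strand, uses exact suffix/prefix matching, and does not handle fragments fully contained in others. The function names and the `min_overlap` threshold are illustrative choices.

```python
def overlap_length(a: str, b: str, min_overlap: int) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, provided it is at least `min_overlap` bases; otherwise 0."""
    max_len = min(len(a), len(b))
    for length in range(max_len, min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap
    until no pair overlaps by at least `min_overlap` bases."""
    contigs = list(fragments)
    while True:
        best = (0, -1, -1)  # (overlap length, index of left, index of right)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap_length(a, b, min_overlap)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs  # nothing left to merge
        merged = contigs[i] + contigs[j][olen:]
        # Remove the two merged fragments (larger index first) and add the result.
        for k in sorted((i, j), reverse=True):
            contigs.pop(k)
        contigs.append(merged)
```

For example, the fragments `ATGCGTAC`, `GTACCTTA` and `CTTAGGA` each overlap the next by four bases and collapse into the single contig `ATGCGTACCTTAGGA`.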
The genomic sequences are assembled into a series of genomic sequence contigs. These are then ordered, oriented with respect to each other, and placed along each chromosome with appropriately sized gaps inserted between adjacent contigs. The resulting genome assembly thus consists of a set of genomic sequence contigs and a specification for how to arrange the sequence contigs along each chromosome.
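The "specification for how to arrange the sequence contigs" can be pictured as an ordered list of placements, each giving a contig, its orientation, and the size of the gap that follows it. Real assemblies record this in dedicated layout files (for example NCBI's AGP format); the `Placement` record below is a hypothetical, much-simplified stand-in used only to show how such a layout turns into one chromosome sequence, with gaps filled by `N`.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Placement:
    """One contig in a chromosome layout (hypothetical record type)."""
    contig_id: str
    orientation: str  # "+" for forward, "-" for reverse complement
    gap_after: int    # number of unknown bases between this contig and the next


def build_chromosome(contigs: dict[str, str], layout: list[Placement]) -> str:
    """Concatenate contigs in layout order, orienting each one and
    filling each inter-contig gap with the matching number of N's."""
    parts = []
    for placement in layout:
        seq = contigs[placement.contig_id]
        if placement.orientation == "-":
            seq = reverse_complement(seq)
        elif placement.orientation != "+":
            raise ValueError(f"bad orientation: {placement.orientation!r}")
        parts.append(seq)
        parts.append("N" * placement.gap_after)
    return "".join(parts)
```

With contigs `ctgA = ACGT` and `ctgB = AACC`, placing `ctgA` forward with a 3-base gap and then `ctgB` reversed gives `ACGTNNNGGTT`. The last placement should normally have `gap_after=0`, otherwise trailing `N`s are added.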
A chromosome sequence is considered finished when any gaps that remain cannot be closed using current cloning and sequencing technology. In practice, therefore, the sequence for a finished chromosome usually consists of a small number of genomic sequence contigs.
Genomic sequence contigs for unfinished chromosomes are assembled and laid out based largely on the clone tiling path. However, the tiling paths do not specify the orientation of the clone sequences or how they should be joined; therefore, data on the alignment of the input genomic sequences to each other and to other sequences are also used to guide the assembly. Additional genomic sequences that augment the initial set of contigs built from the tiling path clones are also incorporated.
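Because the tiling path does not say which strand a clone sequence comes from, the orientation has to be inferred from alignments. The sketch below illustrates the principle with exact matching only: it checks whether the next sequence, or its reverse complement, overlaps the end of the previous one. Real pipelines use alignment tools that tolerate mismatches; the function names and the default `min_overlap` value are assumptions for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def suffix_prefix_overlap(a: str, b: str, min_overlap: int) -> int:
    """Longest suffix of `a` equal to a prefix of `b`, or 0 if that
    overlap would be shorter than `min_overlap`."""
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def orient_next(left: str, right: str, min_overlap: int = 4) -> tuple[str, int]:
    """Decide how `right` should follow `left` along the tiling path.

    Returns ("+", n) if `right` as given overlaps the end of `left` by n bases,
    ("-", n) if its reverse complement does, or ("?", 0) if neither does.
    """
    fwd = suffix_prefix_overlap(left, right, min_overlap)
    rev = suffix_prefix_overlap(left, reverse_complement(right), min_overlap)
    if fwd == 0 and rev == 0:
        return ("?", 0)
    return ("+", fwd) if fwd >= rev else ("-", rev)
```

A `"?"` result corresponds to the situation described above where the tiling path alone cannot settle how two sequences should be joined, and other alignment evidence is needed.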
To download complete human chromosome sequences:
Each chromosome can be downloaded as a whole sequence in FASTA format from the NCBI FTP site, which maintains a section called Assembled Chromosomes. We can download each chromosome sequence by clicking the file whose name starts with hs_ref.
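Once a chromosome file has been downloaded, it can be read with a small FASTA parser such as the one below. This is a minimal sketch using only the Python standard library; it also accepts gzip-compressed files in case the download is compressed. The file name in the commented usage line is hypothetical, since the exact names on the FTP site may differ.

```python
import gzip
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream.
    Sequence lines belonging to one record are joined without newlines."""
    header = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def open_fasta(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTA file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


if __name__ == "__main__":
    # Usage with a real download (hypothetical file name):
    # with open_fasta("hs_ref_chr21.fa.gz") as fh:
    #     for header, seq in read_fasta(fh): print(header, len(seq))
    demo = io.StringIO(">chr_demo\nACGT\nNNAC\n>second\nGG\n")
    for header, seq in read_fasta(demo):
        print(header, len(seq))
```

Reading record by record with a generator keeps only one sequence in memory at a time, which matters for whole-chromosome files.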
The Vega site, maintained by the Sanger Institute, presents data from the manual annotation of the human genome.
High-quality annotated human chromosome sequences
To download all annotated human contigs in a single FASTA file
Identification of genes
Genes are found using three complementary approaches: (a) known genes are placed primarily by aligning mRNAs to the assembled genomic contigs; (b) additional genes are located based on alignment of ESTs to the assembled genomic contigs; and (c) previously unknown genes are predicted using hints provided by protein homologies.
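The three approaches can be read as tiers of evidence. The sketch below is a hypothetical data model, not any real annotation pipeline: it assumes each candidate locus has already been aligned against mRNAs, ESTs and proteins, and simply labels the locus by the strongest evidence supporting it, in the order (a), (b), (c) given above.

```python
from dataclasses import dataclass, field


@dataclass
class Locus:
    """Evidence aligned to one region of a genomic contig (hypothetical model)."""
    name: str
    mrna_hits: list[str] = field(default_factory=list)
    est_hits: list[str] = field(default_factory=list)
    protein_hits: list[str] = field(default_factory=list)


def classify(locus: Locus) -> str:
    """Label a locus by its strongest evidence: mRNA alignments first,
    then EST alignments, then protein homology."""
    if locus.mrna_hits:
        return "known gene (mRNA alignment)"
    if locus.est_hits:
        return "additional gene (EST alignment)"
    if locus.protein_hits:
        return "predicted gene (protein homology)"
    return "no gene evidence"
```

Checking the tiers in this fixed order means a locus supported by both an mRNA and ESTs is reported as a known gene, matching the idea that ESTs and protein homologies are used to find genes beyond those already placed by mRNAs.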