Chapter 21: Genomic Analysis
Genomics is the study of entire genomes, and it integrates modern DNA sequencing methods, recombinant DNA techniques, and specialized bioinformatics applications. Bioinformatics, which merges information technology, biology, and mathematics, is crucial for storing, sharing, comparing, and analyzing nucleic acid and protein sequence data. The primary strategy for sequencing and assembling large genomes is Whole-Genome Sequencing (WGS), or shotgun sequencing, in which chromosomes are broken into overlapping fragments that are sequenced and then computationally aligned by sequence identity into longer continuous stretches called contigs. This process was dramatically advanced by High-Throughput Sequencing (HTS), which greatly increased sequence output and reduced costs compared with earlier approaches such as map-based cloning. Once a genome has been compiled into a reference sequence, annotation uses bioinformatics tools to identify functional elements, including protein-coding segments (Open Reading Frames, or ORFs) and gene-regulatory sequences such as promoters and enhancers. Applications such as BLAST search databases such as GenBank for sequences that are statistically similar to a query sequence. Such similarity helps infer function and identify evolutionary relationships, in particular by recognizing orthologs (homologous genes in different species) and paralogs (homologous genes in the same species). The monumental Human Genome Project (HGP) revealed several unexpected findings, including that humans have only approximately 20,000 protein-coding genes and that less than 2 percent of the 3.1 billion nucleotides in the human genome sequence code for proteins.
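The overlap-based assembly step of shotgun sequencing can be illustrated with a toy example. The following is a minimal sketch, not how production assemblers work: it assumes error-free reads from a single strand, uses a simple greedy merge, and the function names, the min_overlap parameter, and the example reads are all hypothetical choices made for illustration.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Return the length of the longest suffix of `left` that equals a
    prefix of `right`, or 0 if it is shorter than `min_overlap`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the longest overlap until
    no pair overlaps by at least `min_overlap` bases.

    Assumption: reads are error-free and come from the same strand.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        # Join the pair, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three made-up overlapping fragments of one short sequence.
reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
print(greedy_assemble(reads))
```

With these three reads the sketch merges everything into a single contig. Reads that share no sufficiently long overlap are left as separate contigs, which mirrors the gaps that remain between contigs in a real assembly.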
Post-HGP projects, including Personal Genome Projects (PGPs), show that individual variation is largely driven by Single-Nucleotide Polymorphisms (SNPs) and by large structural differences such as Copy Number Variations (CNVs). Because a single reference genome often underestimates this variation, the concept of the pangenome has emerged to represent all of the genetic diversity within a species. Specialized "omics" disciplines include Functional Genomics, which establishes gene function; Comparative Genomics, which analyzes evolutionary relationships by comparing the genomes of different species (humans and chimpanzees, for example, share approximately 98 percent sequence identity); Metagenomics, which studies genomes recovered from environmental samples (as in the Human Microbiome Project); Transcriptome Analysis (or transcriptomics), which studies the quantitative expression profiles of all expressed RNAs; and Proteomics, which analyzes the proteome (the full complement of proteins in a cell) using separation techniques such as Two-Dimensional Gel Electrophoresis (2DGE) and identification methods such as Mass Spectrometry (MS). Importantly, alternative splicing allows the human genome's modest gene count to produce a much larger number of proteins (potentially up to 290,000). The ENCODE project further reported that about 80 percent of the genome is biochemically functional, with much of it transcribed into various noncoding RNAs.
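Two quantities mentioned above, single-nucleotide differences and percent sequence identity, are easy to compute once two sequences are aligned. The sketch below assumes the two sequences are already aligned, gap-free, and of equal length. Real variant calling works from many aligned reads with quality scores, so this only illustrates the idea. The function names and the example sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which the two sequences match.

    Assumption: the sequences are pre-aligned, gap-free, and equal in length.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (1-based position, reference base, sample base) for every
    position where the sample differs from the reference."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be of equal length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


# Made-up 20-base reference and a sample that differs at one position.
reference = "ATGGCGTACGTTAGCATGCA"
sample = "ATGGCGTCCGTTAGCATGCA"

print(find_snps(reference, sample))         # single difference at position 8
print(percent_identity(reference, sample))  # 19 of 20 positions match
```

Comparing each individual only against one reference, as done here, is the single-reference model described in the text. Variation absent from that reference, such as large insertions, is invisible to this kind of position-by-position comparison.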