Molecular biology - Chapter 24: Genomics, proteomics, and bioinformatics

The National Center for Biotechnology Information (NCBI) website contains a vast store of biological information, including genomic and proteomic data. Start with a sequence and discover the gene to which it belongs, then compare that sequence with those of similar genes. Query the database with a topic for information. View protein structures in 3D by rotating the structure on your computer screen.


Molecular Biology, Fourth Edition
Chapter 24: Genomics, Proteomics, and Bioinformatics
Lecture PowerPoint to accompany Robert F. Weaver
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

24.1 Positional Cloning
- Positional cloning is a method for discovering genes involved in genetic traits
- Positional cloning was very difficult in the absence of genomic information
- It begins with mapping studies to pin down the location of the gene of interest to a relatively small region of DNA

Classical Tools of Positional Cloning
- Mapping depends on a set of landmarks to which gene position can be related
- Restriction fragment length polymorphisms (RFLPs) are landmarks in which the lengths of the restriction fragments produced by a specific enzyme vary from one individual to another
- Exon traps use a special vector to help clone exons only
- CpG islands are DNA regions containing unmethylated CpG sequences

[Figure: Detecting RFLPs]

[Figure: Exon Trapping]

Identifying the Gene Mutated in a Human Disease
- Using RFLPs, geneticists mapped the Huntington disease gene (HD) to a region near the end of chromosome 4
- They then used an exon trap to identify the gene itself
- The mutation causing the disease is an expansion of a CAG repeat from the normal range of 11-34 copies to an abnormal range of at least 38 copies
- The extra repeats cause extra glutamines (Gln, encoded by CAG) to be inserted into huntingtin, the product of the HD gene

24.2 Sequencing Genomes
- What information can be gleaned from a genome sequence?
  - Location of exact coding regions for all the genes
  - Spatial relationships among all the genes and exact distances between them
- How is a coding region recognized?
  - It contains an ORF long enough to code for a phage protein
  - The ORF must start with an ATG triplet and end with a stop codon
  - A phage or bacterial ORF is the same as a gene's coding region

Phage φX174 Genome
- The first genome sequenced was a very simple one, phage φX174
- Completed by Sanger in 1977
- 5375 nt, complete
- Note that some of these phage genes overlap

Genome Results
- The base sequences of viruses and organisms that have been obtained range from:
  - Phages
  - Bacteria
  - Animals
  - Plants
- A rough draft and finished versions of the human genome have also been obtained
- Comparison of the genomes of closely related and more distantly related organisms can shed light on the evolution of these species

[Figure: Sequencing Milestones]

The Human Genome Project
- In 1990, geneticists started to map and ultimately sequence the entire human genome
- The original plan was systematic and conservative
  - Prepare genetic and physical maps of the genome, with markers to allow DNA sequences to be pieced together in proper order
  - Most sequencing would be done only after mapping was complete

1998: Human Genome Project
- Celera, a private, for-profit company, shocked the genomic community by announcing that it would complete a rough draft of the human genome by 2000
- The method would be shotgun sequencing: the whole human genome would be chopped up and cloned
- Clones would be sequenced randomly
- Sequences would be pieced together using computer programs

Vectors for Large-Scale Genome Projects
- Two high-capacity vectors have been used extensively in the Human Genome Project
- Mapping was done mostly using the yeast artificial chromosome (YAC), which accepts inserts of a million base pairs
- Sequencing was done with bacterial artificial chromosomes (BACs), which accept about 300,000 bp
- BACs are more stable and easier to work with than YACs

Clone-by-Clone Strategy
- Mapping the human genome requires a set of landmarks to which we can relate the positions of genes
- Some of these markers are genes; many more are nameless stretches of DNA
  - RFLPs
  - VNTRs, variable number tandem repeats
  - STSs, sequence-tagged sites, expressed sequence tags, and microsatellites

Variable Number Tandem Repeats
- VNTRs derive from minisatellites, stretches of DNA that contain a short core sequence repeated over and over in tandem (head to tail)
- The number of repeats of the core sequence in a VNTR is likely to differ from one individual to another
- So VNTRs are highly polymorphic
- This makes them relatively easy to map
- Their disadvantage as genetic markers is that they tend to bunch together at chromosome ends

Sequence-Tagged Sites
- STSs are short sequences, 60-1000 bp long
- They are detectable by PCR
  - Short primers can be designed
  - The primers hybridize a few hundred bp apart
  - They amplify a predictable length of DNA

[Figure: Sequence-Tagged Sites Mapping]

Microsatellites
- STSs are very useful in physical mapping, or locating specific sequences in the genome
- They are worthless as markers in traditional genetic mapping unless they are polymorphic
- Microsatellites are a class of STSs that are highly polymorphic
- They are similar to minisatellites
  - They consist of a core sequence repeated over and over many times in a row
  - The core here is 2-4 bp long, much shorter

Contig
- A set of clones used by geneticists in physically mapping or sequencing a given region is called a contig
- It contains contiguous (or overlapping) DNAs spanning long distances
- It is used like putting together a jigsaw puzzle
  - Easier to complete with bigger pieces
  - Helpful to assemble in overlapping fashion

Shotgun Sequencing
- Massive sequencing projects can take two forms
- The map-then-sequence strategy
  - Produces a physical map of the genome, including STSs
  - Sequences the clones (mostly BACs) used in mapping
  - Places the sequences in order to be pieced together
- The shotgun approach
  - Assembles libraries of clones with different size inserts
  - Sequences the inserts at random
  - Relies on a computer program to find areas of overlap among the sequences and piece them together

[Figure: Shotgun-Sequencing Method]

Sequencing Standards
- A "working draft" may be:
  - Only 90% complete
  - Subject to an error rate of up to 1%
- A "final draft" (there is less consensus on this standard):
  - Has an error rate of less than 0.01%
  - Should have as few gaps as possible
- Some researchers hold that a sequence is not a "final draft" until every last gap is filled

Sequencing the Human Genome
- The first chromosome completed in the Human Genome Project was chromosome 22, in late 1999
- The second completed was chromosome 21
- These are the 2 smallest human autosomes, but they hold very valuable sequence information

Chromosome 22
- Only the long arm (22q) was sequenced
- The short arm (22p) is composed of pure heterochromatin, likely devoid of genes
- 11 gaps remained in the sequence
  - 10 are gaps between contigs, likely due to "unclonable" DNA
  - The other is a 1.5-kb region of cloned DNA that resisted sequencing

Findings from Chromosome 22
- We must learn to live with gaps in our sequence
- 679 annotated genes are categorized as:
  - 247 known genes, previously identified
  - 150 related genes, homologous to known genes
  - 148 predicted genes, with sequence homology to ESTs
  - 134 pseudogenes, sequences that are homologous to known genes but contain defects that preclude proper expression

[Figure: Contigs and Gaps]

More From Chromosome 22
- Coding regions of genes account for only a tiny fraction of the length of the chromosome
  - Annotated genes are 39% of the total length
  - Exons are only 3%
  - Repeat sequences (Alu, LINEs, etc.) are 41%
- The rate of recombination varies across the chromosome
  - Long regions of low recombination are interspersed with short regions where it is relatively frequent

[Figure: Repetitive DNA Content]

More From Chromosome 22
- There are local and long-range duplications
  - Immunoglobulin λ locus: 36 gene segments that can encode variable regions are clustered together
  - A 60-kb region is duplicated with greater than 90% fidelity almost 12 Mb away
  - Duplications found in few copies are low-copy repeats
- Large chunks of human chromosome 22q are conserved in several different mouse chromosomes
  - 113 human genes with mouse orthologs have been mapped to mouse chromosomes

Homologs
- Orthologs are homologous genes in different species that evolved from a common ancestor
  - 8 regions on 7 mouse chromosomes
- Paralogs are homologous genes that evolved by gene duplication within a species
- Homologs are any kind of homologous genes, both orthologs and paralogs

[Figure: Regions of Conservation]

Chromosome 21
- Human chromosome 21q and some of 21p have been sequenced
- The remaining gaps are relatively few and short
- The sequence reveals a relative poverty of genes
  - 225 genes
  - 59 pseudogenes
- All 24 genes known to be shared between mouse chromosome 10 and human chromosome 21 are in the same order in both chromosomes

The X Chromosome
- The sequence of 151 Mb of the human X chromosome (99.3% of its euchromatin) revealed:
  - 1098 protein-encoding genes
  - 168 genes governing X-linked phenotypes
  - Genes for 173 noncoding RNAs
- The chromosome is rich in LINE1 elements
  - These may serve as way stations for the X inactivation mechanism in female cells

X Chromosome Orthologs
- Comparison of the X chromosome sequence with the chicken whole genome confirmed that X (and its partner Y) evolved from an ancestral pair of autosomes
- Comparison of 3 mammalian X chromosome sequences demonstrates a high degree of synteny among these chromosomes
- This synteny likely reflects a high degree of evolutionary pressure to keep the order of genes on the X chromosome relatively stable

Human Genome Project Status
- The working draft of the human genome reported by 2 groups allowed estimates that the genome contains fewer genes than anticipated: 25,000 to 40,000
- About half the genome has derived from the action of transposons
- Transposons themselves have contributed dozens of genes to the genome
- Bacteria also have donated dozens of genes
- The finished draft is much more accurate than the working draft, but there are still gaps
- It also provides information about gene birth and death during human evolution

Other Vertebrate Genomes
- Comparing the human genome with those of other vertebrates has taught us much about similarities and differences among genomes
- Comparison has also helped to identify many human genes
- In the future, it will likely help identify defective genes involved in human genetic diseases
- Closely related species like the mouse can be used to find when and where genes are expressed, and so to predict when and where human genes are likely to be expressed

The Minimal Genome
- It is possible to define the essential gene set of a simple organism
  - Mutate one gene at a time
  - See which genes are required for life
- In theory, it is also possible to define the minimal genome: the minimum set of genes required for life
- The minimal genome is likely larger than the essential gene set
- In principle, it is possible to place a minimal genome into a cell lacking genes of its own and so create a new life form that can live and reproduce under lab conditions

The Barcode of Life
- A movement has begun to create a barcode to identify any species of life on earth
- The first such barcode will consist of the sequence of a 648-bp piece of the mitochondrial COI gene from each organism
- This sequence is sufficient to identify uniquely almost any organism
- Other sequences will be worked out for plants and perhaps later for bacteria

24.3 Applications of Genomics: Functional Genomics
- Functional genomics refers to those areas that deal with the function or expression of genomes
- The set of all transcripts an organism makes at any given time is its transcriptome
- The use of genomic information to block expression systematically is called genomic functional profiling
- The study of the structures and functions of the protein products of genomes is proteomics

Transcriptomics
- This area is the study of all transcripts an organism makes at any given time
- Create DNA microarrays and microchips that hold thousands of cDNAs or oligonucleotides
- Hybridize labeled RNAs from cells to these arrays or chips
- The intensity of hybridization at each spot reveals the extent of expression of the corresponding gene
- A microarray permits canvassing the expression patterns of many genes at once
- Clustering of the expression of genes in time and space suggests that the products of these genes collaborate in some process

[Figure: Oligonucleotides on a Glass Substrate]

Serial Analysis of Gene Expression
- Serial analysis of gene expression (SAGE) allows us to determine:
  - Which genes are expressed in a given tissue
  - The extent of that expression
- Short tags, characteristic of particular genes, are generated from cDNAs and ligated together between linkers
- These ligated tags are then sequenced to determine which genes are expressed and how abundantly

[Figure: SAGE]

Whole Chromosome Transcription Mapping
- High-density whole-chromosome transcriptional mapping studies have shown that a majority of sequences in cytoplasmic poly(A) RNAs derive from non-exon regions of human chromosomes
- Almost half of the transcription from these same chromosomes is nonpolyadenylated
- The results indicate that the great majority of stable nuclear and cytoplasmic transcripts from these chromosomes come from regions outside exons
- This helps to explain the great differences between species whose exons are almost identical

[Figure: Transcription Maps]

Genomic Functional Profiling
- Genomic functional profiling can be performed in several ways
- One is a type of mutation analysis, deletion analysis: mutants are created by replacing genes one at a time with an antibiotic resistance gene flanked by oligomers that serve as a barcode for that mutant
- A functional profile can be obtained by growing the whole group of mutants together under various conditions to see which mutants disappear most rapidly

RNAi Analysis
- Genomic functional analysis of complex organisms can also be done by inactivating genes via RNAi
- An application of this approach targeting the genes involved in early embryogenesis in C. elegans has identified:
  - 661 important genes
  - 326 are involved in embryogenesis

Tissue-Specific Functional Profiling
- Tissue-specific expression profiling can be done by examining the spectrum of mRNAs whose levels are decreased by an exogenous miRNA
- Then compare it to the spectrum of expression of genes at the mRNA level in various tissues
- If that miRNA causes a decrease in the levels of mRNAs that are naturally low in cells expressing the miRNA, this suggests that the miRNA is at least a partial cause of those natural low levels
- This type of analysis has implicated:
  - miR-124 in destabilizing mRNAs in brain tissue
  - miR-1 in destabilizing mRNAs in muscle tissue

Locating Target Sites for Transcription Factors
- Chromatin immunoprecipitation followed by DNA microarray analysis can be used to identify DNA-binding sites for activators and other proteins
- In organisms with small genomes, all of the intergenic regions can be included in the microarray
- If the genome is large, that is not practical
- To narrow the areas of interest, CpG islands can be used
  - These are associated with gene control regions
- If the timing or conditions of an activator's activity are known, the control regions of genes known to be activated at those times, or under those conditions, can be used

In Situ Expression Analysis
- The mouse can be used as a human surrogate in large-scale expression studies that would be ethically impossible to perform on humans
- Scientists have studied the expression of almost all the mouse orthologs of the genes on human chromosome 21
  - Expression was followed through various stages of embryonic development
  - The embryonic tissues in which these genes are expressed were catalogued

Single-Nucleotide Polymorphisms
- Single-nucleotide polymorphisms (SNPs) can probably account for many genetic conditions caused by single genes, and even some caused by multiple genes
- They might make it possible to predict response to a drug
- A haplotype map with over 1 million SNPs makes it easier to sort out important SNPs from those with no effect

Structural Variation
- Structural variation is a prominent source of variation in human genomes
  - Insertions
  - Deletions
  - Inversions
  - Rearrangements of DNA chunks
- Some structural variation can in principle predispose certain people to contract diseases
- Some variation is presumably benign
- Some is also demonstrably beneficial

24.4 Proteomics
- The sum of all proteins produced by an organism is its proteome
- The study of these proteins, even smaller subsets of them, is called proteomics
- Such studies give a more accurate picture of gene expression than transcriptomics studies do

Protein Separations and Analysis
- Current research in proteomics requires first that proteins be resolved, sometimes on a massive scale
  - The best tool for separating many proteins at once is 2-D gel electrophoresis
- After separation, proteins must be identified
  - The best method of identification involves digesting the proteins one by one with proteases
  - The resulting peptides are then identified by mass spectrometry
- In the future, microchips with antibodies attached may allow analysis of proteins in complex mixtures without separation

[Figure: MALDI-TOF Mass Spectrometry]

[Figure: Detecting Protein-Protein Interactions]

Protein Interactions
- Most proteins work with other proteins to perform their functions
- Several techniques are available to probe these interactions
- Yeast two-hybrid analysis has been used for some time; other methods are now available
  - Protein microarrays
  - Immunoaffinity chromatography with mass spectrometry
  - Other combinations

24.5 Bioinformatics
- Bioinformatics involves the building and use of biological databases
- Some of these databases contain the DNA sequences of genomes
- It is essential for mining the massive amounts of biological data for meaningful knowledge about gene structure and expression

Finding Regulatory Motifs in Mammalian Genomes
- Using computational biology techniques, Lander and Kellis have discovered highly conserved sequence motifs in 4 mammalian species, including humans:
  - In the promoter regions, these motifs probably represent binding sites for transcription factors
  - In the 3'-UTRs, the motifs probably represent binding sites for miRNAs

Using the Databases
- The National Center for Biotechnology Information (NCBI) website contains a vast store of biological information, including genomic and proteomic data
- Start with a sequence and discover the gene to which it belongs, then compare that sequence with those of similar genes
- Query the database with a topic for information
- View protein structures in 3D by rotating the structure on your computer screen
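
Worked example for 24.1 (the Huntington disease CAG repeat). The slide gives a normal range of 11-34 CAG copies and an abnormal range of at least 38. The Python sketch below counts the longest uninterrupted CAG run in a sequence and compares it with those two ranges. The function names and the toy sequence are illustrative assumptions, not part of the lecture, and this is in no sense a diagnostic procedure.

```python
import re

def longest_cag_run(seq):
    """Return the number of copies in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)

def classify_repeat_count(copies):
    """Compare a copy number with the ranges quoted on the slide."""
    if copies >= 38:
        return "abnormal range (at least 38 copies)"
    if 11 <= copies <= 34:
        return "normal range (11-34 copies)"
    # The slide does not say how to treat other values, so neither do we.
    return "outside the ranges given"

# Hypothetical toy sequence: 12 tandem CAGs, then an interrupted extra CAG.
toy = "GGC" + "CAG" * 12 + "CAACAG"
copies = longest_cag_run(toy)
print(copies, classify_repeat_count(copies))
```

Counts of 35 to 37 are deliberately left unclassified, because the slide gives no rule for them.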
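
Worked example for 24.2 (recognizing a coding region). The slide's criteria are an ORF that starts with an ATG triplet, ends with a stop codon, and is long enough to encode a protein. The Python sketch below applies them to the three forward reading frames. The name find_orfs, the min_codons threshold, and the toy sequence are assumptions made for illustration; a real gene finder would also scan the reverse strand and use a much larger length cutoff.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start is the index of the A in ATG; end is one past the stop codon.
    min_codons counts the codons before the stop, ATG included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open an ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # close it and keep scanning
    return orfs

toy = "CCATGAAATTTGGGTAACC"
for start, end, frame in find_orfs(toy):
    print(frame, toy[start:end])   # prints: 2 ATGAAATTTGGGTAA
```

Each frame is scanned independently, so ORFs in different frames may overlap, as some of the φX174 genes do.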
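
Worked example for 24.2 (shotgun sequencing). The shotgun approach relies on a computer program to find areas of overlap among sequences and piece them together. The Python sketch below shows one simple way to do that: greedily merge the pair of reads with the longest suffix-to-prefix overlap until no overlaps remain. It is a toy under stated assumptions (error-free reads, one strand, no repeats), and it is not the algorithm used by Celera or the Human Genome Project; the function names and reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise by best overlap; return the remaining contigs."""
    reads = list(dict.fromkeys(reads))        # drop exact duplicate reads
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return reads                      # nothing left to join
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]

# Three hypothetical reads sampled from the sequence ATGGCGTACGTTA.
print(greedy_assemble(["TACGTTA", "ATGGCGT", "GCGTACG"]))
```

Reads that share no overlap of at least min_len bases stay as separate contigs, which is the toy analogue of the gaps between contigs described for chromosome 22.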
