New RNA Motifs Suggest an Expanded Scope for Riboswitches in Bacterial Genetic Control (Supplementary Methods and Website)

Jeffrey E. Barrick, Keith A. Corbino, Wade C. Winkler, Ali Nahvi, Maumita Mandal, Jennifer Collins, Mark Lee, Adam Roth, Narasimhan Sudarsan, Inbal Jona, J. Kenneth Wickiser, and Ronald R. Breaker

BLISS Overview

The BLISS database integrates comparative genomics information to enable riboswitch discovery in Bacillus subtilis. It begins with automatically generated alignments seeded by BLAST hits between intergenic regions (IGRs) from fully sequenced bacterial genomes and incorporates uniform predictions of gene functions and intrinsic terminators. A web interface to the database allows intergenic regions to be sorted by genome position or by statistics derived from their sequence alignments. An integrated system for collaborative annotation facilitates the exhaustive manual examination of these IGRs for riboswitch candidates.

Genome Sequences

A complete list of the genomes analyzed is available. Genome sequences were downloaded in GenBank format from the NCBI bacterial reference sequence list. Genes on the same strand separated by fewer than 30 nt are usually part of the same transcriptional unit (1), and such short gaps are not large enough to harbor structured RNA sequences. Therefore, we only considered IGRs with a length of at least 30 nt. Organisms were classified into broad taxonomic groups based on the information in GenBank records and the Comprehensive Microbial Resource at TIGR. Our three-letter organism abbreviations are derived from the COG database when possible.
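The 30 nt length filter can be illustrated with a minimal Python sketch. It is not the code used to build BLISS: the `Gene` record and the `intergenic_regions` function are hypothetical names, gene coordinates are assumed to be 1-based and inclusive, strand is ignored, and the chromosome is treated as linear, so the region before the first gene and the wrap-around region of a circular genome are not reported.

```python
from typing import List, NamedTuple, Tuple

MIN_IGR_LENGTH = 30  # nt, the threshold stated in the text


class Gene(NamedTuple):
    """Hypothetical gene record with 1-based, inclusive coordinates."""
    name: str
    start: int
    end: int


def intergenic_regions(genes: List[Gene],
                       min_length: int = MIN_IGR_LENGTH) -> List[Tuple[int, int]]:
    """Return (start, end) of gaps between genes that are at least min_length nt."""
    igrs = []
    furthest_end = 0  # right-most gene end seen so far; 0 means no gene yet
    for gene in sorted(genes, key=lambda g: g.start):
        gap_start = furthest_end + 1
        gap_end = gene.start - 1
        gap_length = gap_end - gap_start + 1  # negative when genes overlap
        if furthest_end > 0 and gap_length >= min_length:
            igrs.append((gap_start, gap_end))
        # Track the furthest end so nested or overlapping genes are handled.
        furthest_end = max(furthest_end, gene.end)
    return igrs


example_genes = [
    Gene("geneA", 1, 100),
    Gene("geneB", 120, 200),   # 19 nt gap after geneA: too short
    Gene("geneC", 250, 300),
    Gene("geneD", 240, 260),   # overlaps geneC
]
example_igrs = intergenic_regions(example_genes)
```

In this toy input only the 39 nt gap between geneB and geneD passes the filter.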

IGR Sequence Comparisons

We used version 2.2.5 of the BLAST package to compare Bacillus subtilis intergenic regions to intergenic region databases for every other genome. The program BLASTN was used with a word size of 7 nucleotides, a gap opening penalty of 2, a gap extension penalty of 2, and a nucleotide mismatch penalty of -2 (-W 7 -G 2 -E 2 -q -2). These parameters were found to maximize the ratio of known positive comparisons (hits between two riboswitches) to known negative comparisons (hits between riboswitch intergenic regions and other IGRs) when tested on a set of known riboswitch-containing IGRs within the B. subtilis genome (J.E.B., data not shown). BLAST results were symmetrized by taking the higher (less significant) E-value for each pair of unidirectional hits between two intergenic regions. Bidirectional hits with E-values <= 0.01 were individually aligned to the B. subtilis sequence using the program ssearch34 from version 3.4 of the FASTA package (2) with a gap opening penalty of 15 (-f -15).
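The symmetrization step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the original pipeline: hits are assumed to arrive as (query, subject, E-value) tuples already parsed from BLAST output, the lowest E-value is kept when a direction has several hits, and the function name and IGR identifiers are hypothetical.

```python
from typing import Dict, Iterable, Tuple

E_VALUE_CUTOFF = 0.01  # cutoff for bidirectional hits stated in the text


def symmetrize_hits(hits: Iterable[Tuple[str, str, float]],
                    cutoff: float = E_VALUE_CUTOFF) -> Dict[Tuple[str, str], float]:
    """Map each bidirectional IGR pair to the higher of its two E-values."""
    # Assumption: keep the lowest E-value seen for each direction.
    best: Dict[Tuple[str, str], float] = {}
    for query, subject, evalue in hits:
        key = (query, subject)
        if key not in best or evalue < best[key]:
            best[key] = evalue

    pairs: Dict[Tuple[str, str], float] = {}
    for (query, subject), forward in best.items():
        reverse = best.get((subject, query))
        if reverse is None:
            continue  # hit found in one direction only: not bidirectional
        score = max(forward, reverse)  # the higher, less significant E-value
        if score <= cutoff:
            a, b = sorted((query, subject))  # order-independent pair key
            pairs[(a, b)] = score
    return pairs


# Hypothetical IGR identifiers, for illustration only.
example_hits = [
    ("bsu_igr1", "oih_igr7", 1e-5),
    ("oih_igr7", "bsu_igr1", 1e-3),
    ("bsu_igr1", "cac_igr2", 1e-6),   # no reverse hit
    ("bsu_igr2", "oih_igr9", 1e-8),
    ("oih_igr9", "bsu_igr2", 0.5),    # reverse hit too weak
]
example_pairs = symmetrize_hits(example_hits)
```

Taking the higher E-value is the conservative choice: a pair is retained only if the match is significant from both directions.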

Gene Function Predictions

We used the COG database (September 2003) to uniformly assign gene functions to the genomic data sets (3, 4). Specifically, each annotated protein gene was filtered with the COILS 2.2 program (5) and compared to proteins in the COG database using BLASTPGP with default parameters. From these similarity results, proteins were assigned to COGs by a local version of the COGnitor program. Proteins that result from gene fusions are often assigned to multiple COGs.

Gene descriptions and names for each COG are derived from the "whog" file of the database distribution. Gene names were assigned from identified genes in a COG with the following priority: (1) if the COG contains an E. coli gene, use its name; (2) otherwise, if the COG contains a B. subtilis gene, use its name; (3) otherwise, do not assign a name (designated with a dash).
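The naming priority above can be written as a short Python sketch. It assumes the gene names in a COG have already been extracted into a dictionary keyed by organism code. The codes "Eco" and "Bsu", the function name, and the example gene names are illustrative assumptions, and parsing of the "whog" file itself is not shown.

```python
from typing import Dict

# Assumed organism codes, in priority order: E. coli first, then B. subtilis.
NAME_PRIORITY = ("Eco", "Bsu")


def assign_gene_name(names_by_organism: Dict[str, str]) -> str:
    """Pick a gene name for a COG, or '-' when neither organism supplies one."""
    for organism in NAME_PRIORITY:
        name = names_by_organism.get(organism)
        if name:  # skip missing or empty names
            return name
    return "-"


# Example gene names are placeholders for illustration.
name_with_both = assign_gene_name({"Eco": "thiC", "Bsu": "thiA"})
name_bsu_only = assign_gene_name({"Bsu": "ykoF"})
name_neither = assign_gene_name({"Hin": "HI0001"})
```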

Terminator Predictions

Intrinsic terminators were predicted using the software program TransTerm, available from TIGR (6). The source code was modified to (1) ignore distinctions between head-to-tail and tail-to-tail intergenic regions when scoring terminators and (2) leave separate confidence values for overlapping terminators on opposite strands. The altered source for "smooth_confidence.perl" is available. We consider terminators with >98% confidence to be high-quality predictions.

IGR Annotation

The BLISS database links intergenic regions to the open source TWiki collaboration tool. TWiki allows webpages to be edited by any registered user and supports full version control to record a history of all page edits. BLISS generates a separate TWiki webpage for each intergenic region automatically when a user chooses to add annotation. Keywords within these pages are recognized by the web interface to prominently display information on the sortable list of IGRs. Our lab has used these pages to record known riboswitches, transcription-factor binding sites, T boxes, noncoding RNAs and other sequence features in B. subtilis IGRs that cause clusters of BLAST hits. Every remaining intergenic region alignment with at least 5 sequences has been examined for conservation indicative of a regulatory RNA motif.

The archived pages presented here have editing disabled so that a snapshot of the annotation process is preserved. See the current BLISS pages for the ongoing annotation effort.

Candidate Phylogenies

We manually aligned the BLAST hits from the most promising candidate intergenic regions to create initial models for each putative RNA structure. Matches to these RNA motifs, which consist of blocks of consensus sequence and base pairing, were compiled from our curated set of complete genomes using the program SequenceSniffer (J.E.B., unpublished program). This program displays the relation of matches to nearby genes so that sequences regulating related genes are readily recognized. The published RNAMotif program allows more general searching for RNA patterns (7). Further BLAST matches to the conserved sequence families were found in other organisms with the NCBI Microbial BLAST page. We iteratively relaxed the RNA consensus model with each expanded alignment and repeated these searches until no new matches could be found.
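The iterate-until-no-new-matches procedure has the shape of a fixed-point loop, sketched below in Python. This shows only the control flow: in the work described here, searching and model relaxation were carried out with SequenceSniffer, BLAST, and manual alignment, whereas the `search` and `relax` callables, the `max_rounds` safety limit, and the toy mismatch-count model are all hypothetical stand-ins.

```python
from typing import Callable, Set, Tuple, TypeVar

Model = TypeVar("Model")


def refine_until_converged(model: Model,
                           search: Callable[[Model], Set[str]],
                           relax: Callable[[Model, Set[str]], Model],
                           max_rounds: int = 50) -> Tuple[Model, Set[str]]:
    """Alternate searching and model relaxation until a search adds nothing new."""
    matches: Set[str] = set()
    for _ in range(max_rounds):
        new_matches = search(model) - matches
        if not new_matches:
            break  # converged: the current model finds no further sequences
        matches |= new_matches
        model = relax(model, matches)  # rebuild the model from the larger set
    return model, matches


# Toy stand-in: the "model" is the number of mismatches tolerated, and each
# database sequence is annotated with its distance from the seed consensus.
toy_database = {"GGAUCC": 0, "GGAUCU": 1, "GGAACU": 2, "CCCCCC": 6}


def toy_search(tolerance: int) -> Set[str]:
    return {seq for seq, dist in toy_database.items() if dist <= tolerance}


def toy_relax(tolerance: int, matches: Set[str]) -> int:
    return tolerance + 1 if tolerance < 2 else tolerance  # cap the relaxation


final_model, final_matches = refine_until_converged(0, toy_search, toy_relax)
```

Capping the relaxation, as the toy `relax` function does, keeps the loop from drifting toward a model so permissive that it matches unrelated sequences.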

References

1. Salgado, H., Moreno-Hagelsieb, G., Smith, T.F., and Collado-Vides, J. 2000. Operons in Escherichia coli: Genomic analyses and predictions. Proc. Natl. Acad. Sci. U. S. A. 97: 6652-6657.

2. Pearson, W.R. 2000. Flexible Sequence Similarity Searching with the FASTA3 Program Package. In Bioinformatics Methods and Protocols. (eds. S. Misener, and S.A. Krawetz), pp. 185-219. Humana Press, Totowa, NJ.

3. Tatusov, R.L., Koonin, E.V., and Lipman, D.J. 1997. A genomic perspective on protein families. Science 278: 631-637.

4. Tatusov, R.L., Natale, D.A., Garkavtsev, I.V., Tatusova, T.A., Shankavaram, U.T., Rao, B.S., Kiryutin, B., Galperin, M.Y., Fedorova, N.D., and Koonin, E.V. 2001. The COG database: New developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 29: 22-28.

5. Lupas, A. 1996. Prediction and analysis of coiled-coil structures. Methods Enzymol. 266: 513-525.

6. Ermolaeva, M.D., Khalak, H.G., White, O., Smith, H.O., and Salzberg, S.L. 2000. Prediction of transcription terminators in bacterial genomes. J. Mol. Biol. 301: 27-33.

7. Macke, T.J., Ecker, D.J., Gutell, R.R., Gautheret, D., Case, D.A., and Sampath, R. 2001. RNAMotif, an RNA secondary structure definition and search algorithm. Nucleic Acids Res. 29: 4724-4735.

Revision r1.11 - 04 Dec 2003 - 16:16 GMT - JeffreyBarrick