On the Importance of RNA Tertiary Structures

A Molecule Long Overlooked

For much of the twentieth century, RNA was understood primarily as a messenger — a disposable photocopy of the genome whose only job was to carry instructions from DNA to the ribosome. Under this dogma, the overwhelming majority of the genome — the part that does not encode protein — was quietly dismissed as "junk." It was genetic noise, evolutionary detritus, the biological equivalent of commented-out code no one had gotten around to deleting.

We now know this view was spectacularly wrong.

Only about 1–2% of the human genome codes for protein, yet a far larger share, the majority by most estimates, is transcribed into RNA. Much of that output is non-coding RNA (ncRNA): thousands of distinct species, many of which fold into precise three-dimensional architectures and carry out regulatory, structural, and catalytic functions in their own right. Understanding those architectures, and why they matter, is one of the defining challenges of twenty-first-century molecular biology.

From Sequence to Structure

RNA, like its cousin DNA, is a polymer of nucleotides. But while DNA exists as a stable double helix driven by complementary base pairing between two separate strands, RNA is typically single-stranded — and that distinction opens an enormous landscape of structural possibility.

We conventionally divide RNA structure into three levels:

Primary structure is simply the linear sequence of nucleotides: adenine (A), uracil (U), guanine (G), and cytosine (C). This is the information encoded in the gene, and it is the starting point for everything that follows, although the fold a sequence actually adopts also depends on ions, temperature, and binding partners. Each nucleotide consists of a nitrogenous base attached to a ribose sugar via a glycosidic bond, with a phosphate group linking successive sugars along the backbone.

Secondary structure refers to local patterns of base pairing within the single strand. Because an RNA molecule can fold back on itself, complementary segments pair up to form stem-loop hairpins, internal loops, bulges, and multi-way junctions. These elements are thermodynamically stable and can be predicted with reasonable accuracy using free-energy minimization algorithms. They are also highly conserved across species — even when the exact sequence diverges, the secondary structure is often maintained, a powerful evolutionary signal of functional importance.

Tertiary structure is where things become extraordinary. Secondary structural elements pack against one another through long-range interactions — non-Watson-Crick base pairs, ribose zipper contacts, A-minor motifs, and metal ion coordination — to generate a compact, precisely folded three-dimensional shape. It is this shape that determines function. Just as an enzyme's active site depends on the precise geometry of its folded protein chain, an RNA's activity depends on the geometry of its tertiary fold.
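Returning to the secondary-structure level for a moment: the idea that base pairing can be predicted from sequence is easiest to see in a deliberately simplified form. The sketch below uses the classic Nussinov dynamic-programming recursion, which maximizes the number of nested base pairs. It is not a free-energy minimizer (real tools score stacking and loop energies), and the function name, the allowed-pair set, and the minimum hairpin loop of three nucleotides are assumptions made for illustration.

```python
# Simplified secondary-structure prediction: Nussinov base-pair maximization.
# This maximizes the COUNT of nested pairs; it does not minimize free energy.

# Watson-Crick pairs plus the G-U wobble pair (an assumption of this sketch).
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq, min_loop=3):
    """Return (max_pairs, dot_bracket) for an RNA sequence.

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = maximum number of nested pairs within seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in ALLOWED_PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best

    # Traceback: recover one optimal structure in dot-bracket notation.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in ALLOWED_PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + 1 + dp[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return dp[0][n - 1], "".join(structure)


if __name__ == "__main__":
    count, fold = nussinov_max_pairs("GGGAAAUCCC")
    print(count, fold)
```

Because the recursion only allows nested pairs, it cannot represent pseudoknots or any tertiary contact. That limitation is the gap between secondary and tertiary structure described above.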

The Ionic Hierarchy of RNA Folding

RNA folding is fundamentally an electrostatic problem. Every nucleotide carries one negative charge on its phosphate group — meaning a 100-nucleotide RNA has a backbone of 100 closely spaced negative charges. Without counterions, this charge repulsion prevents any folding whatsoever.

Monovalent ions (Na⁺, K⁺) are the first line of defense. These ions accumulate diffusely around the backbone through non-specific electrostatic attraction — the Manning condensation layer. By reducing the effective backbone charge, monovalent ions allow complementary regions to approach each other and form Watson-Crick base pairs. Secondary structure — stem-loops, hairpins, and multi-helix junctions — is largely accessible with monovalent salt concentrations in the 50–150 mM range, comparable to physiological conditions.

Divalent ions (Mg²⁺) are required for most tertiary folding. Because Mg²⁺ carries two charges, it can bridge two phosphate groups simultaneously, drawing together backbone segments that would otherwise repel. But Mg²⁺ does more than bridge: it coordinates backbone oxygens and water molecules at specific three-dimensional positions, acting almost as a structural component of the RNA itself. Two coordination modes are characteristic: outer-sphere, where the ion contacts the RNA through its hydration shell, and inner-sphere, where it contacts the RNA directly. Both stabilize bent backbone conformations, the elbow regions of L-shaped folds, and the metal ion cores of ribozyme active sites. At physiological monovalent salt concentrations, structures like the tRNA L-shape and the hammerhead ribozyme catalytic centre are not stably populated in the absence of Mg²⁺.

Trivalent and higher-valence cations (spermidine, cobalt hexammine, spermine) can drive even further compaction. Spermidine and cobalt hexammine each carry three positive charges; spermine, a naturally occurring polyamine with four, is particularly effective at inducing the most compact tertiary structures in large RNAs. These ions are also used experimentally: cobalt hexammine, for example, is an exchange-inert mimic of hexahydrated magnesium, Mg(H₂O)₆²⁺, that allows NMR experiments to map specific ion binding sites on RNA.
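The charge-screening argument that runs through the three ion classes above can be made roughly quantitative with two textbook formulas: ionic strength, I = ½ Σ cᵢzᵢ², and the Debye screening length from Debye-Hückel theory. The sketch below is a mean-field estimate only. It assumes water at 25 °C with a relative permittivity of 78.4, and it says nothing about site-specific Mg²⁺ binding or counterion condensation, which is where the theory breaks down for RNA. The function names are illustrative.

```python
import math

# Physical constants (SI units).
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
KB = 1.380649e-23            # Boltzmann constant, J/K
NA = 6.02214076e23           # Avogadro constant, 1/mol
E_CHARGE = 1.602176634e-19   # elementary charge, C


def ionic_strength(ions):
    """Ionic strength in mol/L from (molar concentration, charge) pairs."""
    return 0.5 * sum(conc * charge * charge for conc, charge in ions)


def debye_length_nm(ionic_strength_molar, temp_k=298.15, eps_r=78.4):
    """Debye screening length in nm (mean-field Debye-Hueckel estimate)."""
    if ionic_strength_molar <= 0:
        raise ValueError("ionic strength must be positive")
    conc_si = ionic_strength_molar * 1000.0  # mol/L -> mol/m^3
    length_m = math.sqrt(
        EPS0 * eps_r * KB * temp_k / (2.0 * NA * E_CHARGE ** 2 * conc_si)
    )
    return length_m * 1e9


if __name__ == "__main__":
    # 10 mM NaCl versus 10 mM MgCl2: same molarity, different screening.
    nacl = ionic_strength([(0.010, +1), (0.010, -1)])
    mgcl2 = ionic_strength([(0.010, +2), (0.020, -1)])
    for label, strength in (("10 mM NaCl", nacl), ("10 mM MgCl2", mgcl2)):
        print(f"{label}: I = {strength:.3f} M, "
              f"Debye length = {debye_length_nm(strength):.2f} nm")
```

Because charge enters the ionic strength squared, a divalent salt screens more strongly than a monovalent salt at the same molarity. The sketch captures only that diffuse effect; the specific bridging and coordination roles of Mg²⁺ need structural or explicit-ion models.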

Figure (interactive in the original): the three structural levels. Primary structure unfolds as a linear chain of nucleotides. Monovalent Na⁺/K⁺ ions (yellow) orbit the backbone diffusely in all stages, and their cloud tightens as the backbone folds. In the secondary stage, base pair hydrogen bonds form between complementary nucleotides across the stem, stabilised by the screened backbone. In the tertiary L-fold, Mg²⁺ ions (teal spheres) appear at the elbow junction, directly coordinating the compact architecture that neither monovalents nor geometry alone can achieve.

The Catalytic Discovery That Changed Everything

The first hint that RNA was more than a passive messenger came in the early 1980s, when Thomas Cech discovered that an intron in the pre-ribosomal RNA of the protozoan Tetrahymena thermophila could catalyze its own splicing, cutting and rejoining the RNA without any protein involvement. Independently, and at around the same time, Sidney Altman showed that the RNA subunit of RNase P, an enzyme that processes transfer RNA precursors, was responsible for its catalytic activity. The two shared the 1989 Nobel Prize in Chemistry.

These catalytic RNAs — ribozymes — shattered the protein-centric view of biology and gave rise to the RNA World hypothesis: the idea that life began as self-replicating, self-catalyzing RNA molecules before proteins evolved to take over most enzymatic roles. The ribosome itself — the universal machine that synthesizes all proteins in all living things — turns out to be a ribozyme. The peptidyl transferase center, which forms the peptide bond linking amino acids together, is built of RNA. The ribosome is not a protein that carries some RNA; it is an RNA that carries some protein.

Ribozymes achieve their catalytic activity through precisely formed tertiary structures that position reactive groups with enzyme-like precision. The hammerhead ribozyme folds into a Y-shaped tertiary architecture in which a catalytic Mg²⁺ ion is positioned by RNA backbone contacts to cleave a specific phosphodiester bond. The geometry must be exact: even a single nucleotide change that perturbs the tertiary fold can reduce activity by orders of magnitude.

Riboswitches: Nature's RNA-Only Gene Switches

Among the most elegant examples of RNA tertiary structure in action are riboswitches: RNA elements, found mainly in the 5' untranslated regions of bacterial messenger RNAs, that directly sense small molecules and respond by changing their fold.

A riboswitch consists of two functionally distinct but structurally coupled domains: an aptamer domain that binds the target ligand with high affinity and selectivity, and an expression platform whose structure determines whether the downstream gene is turned on or off. When the ligand binds the aptamer domain, it stabilizes a specific tertiary conformation that propagates a structural change to the expression platform — either occluding the ribosome binding site to suppress translation, or triggering premature termination of transcription.

What makes riboswitches remarkable is the selectivity they achieve. The TPP riboswitch, which responds to thiamine pyrophosphate (the active coenzyme form of vitamin B1), can discriminate against structurally similar analogs by factors exceeding 1000-fold, comparable to the selectivity of protein-based receptors. It accomplishes this with a carefully shaped RNA binding pocket that positions every hydrogen-bond donor and acceptor in precise geometric register with the ligand. Two Mg²⁺ ions at the heart of the binding pocket coordinate the pyrophosphate group of TPP, and removing Mg²⁺ from solution severely compromises binding even at high ligand concentrations.
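To see what a 1000-fold selectivity means in practice, a single-site binding isotherm is enough: the fraction of aptamer occupied is [L] / (Kd + [L]). The sketch below compares a cognate ligand with an analog whose dissociation constant is 1000 times weaker. The 50 nM cognate Kd is a hypothetical round number chosen for illustration, not a measured value for the TPP riboswitch, and the model assumes ligand in excess over RNA so that free and total ligand are approximately equal.

```python
def fraction_bound(ligand_conc, kd):
    """Fraction of aptamer occupied for simple 1:1 binding.

    ligand_conc and kd must share the same concentration unit.
    Assumes free ligand is approximately equal to total ligand.
    """
    if ligand_conc < 0 or kd <= 0:
        raise ValueError("ligand_conc must be >= 0 and kd must be > 0")
    return ligand_conc / (kd + ligand_conc)


if __name__ == "__main__":
    KD_COGNATE_NM = 50.0                    # hypothetical value, illustration only
    KD_ANALOG_NM = 1000.0 * KD_COGNATE_NM   # 1000-fold weaker binding

    for conc_nm in (10.0, 100.0, 1000.0, 10000.0):
        cognate = fraction_bound(conc_nm, KD_COGNATE_NM)
        analog = fraction_bound(conc_nm, KD_ANALOG_NM)
        print(f"[L] = {conc_nm:>7.0f} nM  cognate: {cognate:.3f}  analog: {analog:.3f}")
```

Under these assumptions there is a wide concentration window in which the cognate ligand nearly saturates the aptamer while the analog barely registers, which is what lets the switch act on one metabolite and ignore its chemical relatives.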

Over 40 distinct classes of riboswitches have now been identified, controlling genes involved in vitamin biosynthesis, metal ion homeostasis, amino acid metabolism, and second-messenger signaling. In pathogenic bacteria, riboswitches control virulence factors, making them attractive targets for a new class of antibiotics that work by jamming a gene-regulatory switch rather than targeting a conventional enzyme.

Figure (interactive in the original): a riboswitch responding to its ligand. The aptamer domain (teal) collapses around the small molecule, Mg²⁺ ions coordinate the binding pocket, and the resulting conformational change propagates to sequester the ribosome binding site (gold) in an inaccessible stem-loop. When the ligand is removed, translation resumes: a complete molecular logic gate built entirely from RNA.

Pseudoknots, G-Quadruplexes, and Long-Range RNA Architecture

The tertiary structural repertoire of RNA extends well beyond the examples above. A few particularly important motifs:

Pseudoknots form when a loop region base-pairs with a complementary sequence outside its own stem. The resulting interlocked structure contains crossing base pairs, so it cannot be represented as a simple branched tree (a limitation of traditional secondary structure diagrams), and it confers unusual mechanical rigidity. Pseudoknots are found in viral RNA genomes, where they stimulate programmed ribosomal frameshifting: the ribosome slips into a different reading frame, allowing one RNA molecule to encode multiple proteins.

G-quadruplexes form in guanosine-rich sequences, where four guanines arrange in a planar quartet stabilized by Hoogsteen hydrogen bonds and a central monovalent cation. Multiple quartets stack to form a highly stable helical structure found in both DNA and RNA. G-quadruplexes are implicated in the regulation of oncogenes and telomere biology, generating intense interest as anticancer drug targets.

Tertiary contact motifs — including A-minor interactions (where adenosine residues dock into the minor groove of a nearby helix), ribose zippers (alternating 2'-OH contacts along the RNA backbone), and kink-turns (sharp bends induced by internal loop geometry) — function as the nuts and bolts that hold large RNA architectures together. The ribosome contains hundreds of such contacts, each contributing incrementally to the thermodynamic stability of the overall assembly.
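Two of the motifs above have simple computational signatures. A pseudoknot shows up as a pair of crossing base pairs (i, j) and (k, l) with i < k < j < l, and candidate G-quadruplex sequences are commonly flagged with the pattern of four runs of at least three guanines separated by loops of one to seven nucleotides. The sketch below implements both checks. The function names are illustrative, positions are 0-based, and a pattern match only marks a candidate: it does not show that a quadruplex actually forms.

```python
import re


def find_crossing_pairs(pairs):
    """Return all pairs of base pairs that cross (the pseudoknot signature).

    pairs: iterable of (i, j) index tuples, in any order.
    Two pairs (i, j) and (k, l) cross when i < k < j < l.
    """
    normalized = sorted((min(a, b), max(a, b)) for a, b in pairs)
    crossings = []
    for idx, (i, j) in enumerate(normalized):
        for k, l in normalized[idx + 1:]:
            if i < k < j < l:
                crossings.append(((i, j), (k, l)))
    return crossings


# Four G-runs (>= 3 G each) separated by loops of 1-7 nucleotides.
G4_PATTERN = re.compile(
    r"G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}"
)


def find_g4_candidates(seq):
    """Return (start, end) spans of candidate G-quadruplex sequences."""
    rna = seq.upper().replace("T", "U")  # accept DNA-style input as well
    return [(m.start(), m.end()) for m in G4_PATTERN.finditer(rna)]


if __name__ == "__main__":
    hairpin = [(0, 10), (1, 9)]                        # nested: no pseudoknot
    h_type_knot = [(0, 10), (1, 9), (5, 15), (6, 14)]  # two interleaved stems
    print(len(find_crossing_pairs(hairpin)))
    print(len(find_crossing_pairs(h_type_knot)))
    print(find_g4_candidates("AAGGGAGGGAGGGAGGGAA"))
```

The crossing test is also why nested-only folding algorithms miss pseudoknots: any structure for which `find_crossing_pairs` returns a non-empty list lies outside what a tree-shaped diagram can draw.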

Seeing RNA in Three Dimensions

The methods used to determine RNA tertiary structures have evolved dramatically in parallel with our appreciation of RNA's functional diversity.

X-ray crystallography has provided the highest-resolution views, including complete ribosome structures at resolutions better than 3 Å. However, crystallization requires a fixed conformation and periodic packing, conditions that disfavor dynamic or flexible regions.

Cryo-electron microscopy (cryo-EM) has largely overcome these limitations. By rapidly freezing molecules in vitreous ice and averaging images from hundreds of thousands of particles, cryo-EM can determine structures at near-atomic resolution without crystals. It is particularly powerful for large complexes and can, with appropriate particle classification, capture multiple conformational states simultaneously. The methodology has recently been extended to medium-sized RNAs — the TPP riboswitch, small ribozymes, and select lncRNA domains — bringing previously intractable structures into reach.

NMR spectroscopy probes RNA in solution and is uniquely capable of accessing dynamics — the conformational fluctuations on timescales from picoseconds to seconds that are crucial for function. An RNA that appears static in a crystal may sample several distinct conformations in solution, only some of which are functionally active.

Chemical probing methods (SHAPE, DMS-MaPseq, and related techniques) use small molecules that react with flexible or unpaired nucleotides to read out structure at single-nucleotide resolution across entire transcriptomes. These approaches have revealed that secondary and tertiary structure are widespread across the transcriptome — not confined to classical structured RNAs but found in thousands of messenger RNA regions where structure influences translation efficiency, splicing, and stability.
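One common way to use chemical probing data is to convert each nucleotide's normalized reactivity into a pseudo-free-energy term that penalizes pairing at reactive (likely flexible) positions: ΔG = m · ln(reactivity + 1) + b. The sketch below shows that conversion. The default slope and intercept (2.6 and −0.8 kcal/mol) are widely quoted starting values for SHAPE data and should be treated as tunable assumptions. Returning zero for negative reactivities (taken here to mean missing data) and the 0.7 "highly reactive" cutoff are conventions of this sketch.

```python
import math


def shape_pseudo_energy(reactivity, slope=2.6, intercept=-0.8):
    """Pseudo-free-energy (kcal/mol) for pairing a nucleotide.

    High reactivity -> positive penalty (pairing disfavored).
    Low reactivity  -> small negative bonus (pairing favored).
    Negative reactivities are treated as missing data and contribute 0.
    """
    if reactivity < 0:
        return 0.0
    return slope * math.log(reactivity + 1.0) + intercept


def flag_flexible(reactivities, threshold=0.7):
    """Indices of nucleotides at or above a reactivity cutoff (hypothetical)."""
    return [i for i, r in enumerate(reactivities) if r >= threshold]


if __name__ == "__main__":
    profile = [0.05, 0.10, 1.40, 0.90, 0.02, -999.0]  # made-up example values
    for position, value in enumerate(profile):
        print(position, round(shape_pseudo_energy(value), 2))
    print(flag_flexible(profile))
```

In a real pipeline these per-nucleotide terms are added to the thermodynamic model inside a folding program. They are not used on their own, so the function is only the data-conversion step.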

Long Non-Coding RNAs: A Structural Frontier

Among the ncRNAs of increasing interest are long non-coding RNAs (lncRNAs): RNA molecules more than 200 nucleotides in length that do not code for protein. Estimates of their number vary widely between annotation efforts; the more inclusive catalogs list more than 60,000 putative human lncRNA genes, compared to roughly 20,000 protein-coding genes. Most have no assigned function; a smaller but growing number have been shown to play critical roles in epigenetic regulation, nuclear organization, and development.

XIST: Orchestrating Chromosome-Wide Silencing

XIST is the paradigmatic large lncRNA. At approximately 17 kilobases, it is one of the largest known functional lncRNAs and is responsible for X-chromosome inactivation: the process by which female mammals silence one of their two X chromosomes and maintain that silencing through subsequent cell divisions, equalizing X-linked gene dosage between the sexes.

XIST is expressed exclusively from the chromosome that will be inactivated and coats that chromosome in cis — remaining physically associated with its chromosome of origin rather than diffusing through the nucleus. This coating silences the ~1000 genes on that chromosome through recruitment of Polycomb repressive complexes, DNA methylation machinery, and histone-modifying enzymes.

The molecule is organized into modular structural domains, each with distinct functions:

The A-repeat domain consists of 8.5 tandem repeats of a 26-nucleotide consensus sequence, each capable of forming an RNA duplex with its neighbor. Recent cryo-EM and SAXS studies (Lu et al., Nature 2020; EMDB-10610) revealed that A-repeat pairs fold into tandem RNA duplexes that are recognized by the m6A reader protein YTHDC1 — the entry point for PRC1 Polycomb complex recruitment and gene silencing. The A-repeat fold is stabilized by Mg²⁺; monovalent substitution disrupts the duplex architecture and abolishes protein binding.

The B and C repeats are required for XIST spreading along the chromosome and physical interaction with the nuclear scaffold. Their 3D structures remain largely unresolved — a consequence of both their intrinsic flexibility and the difficulty of isolating structured lncRNA domains from full-length transcripts.

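SPEN/SHARP, a transcriptional repressor that shuts down RNA Pol II activity on the inactive X, is recruited at the earliest identifiable step in silencing, preceding chromatin compaction and DNA methylation. Published crosslinking studies map its binding to the A-repeat region rather than the E repeat; the E repeat has instead been linked to keeping XIST anchored on the inactive X at later stages.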

Understanding XIST structure is technically demanding because the full 17 kb molecule is long, highly flexible, and appears to adopt its functional architecture only when assembled with protein partners in a nuclear context. This is an area where computational methods — molecular dynamics simulation, machine learning-based structure prediction, and integrative modeling — are playing an increasingly important role.

NEAT1 and Other Nuclear Architectural RNAs

NEAT1 forms the structural scaffold of paraspeckles — membraneless nuclear organelles assembled entirely through RNA-protein interactions driven by the tertiary fold of NEAT1. Unlike XIST, NEAT1 does not silence genes but acts as a platform, concentrating specific RNA-binding proteins and retaining certain mRNAs in the nucleus under stress conditions.

The broader class of nuclear architectural RNAs — MALAT1, TUG1, and dozens of others — similarly perform their functions through 3D structure rather than sequence complementarity to a target. They are the scaffolds, platforms, and decoys of the transcriptome.

The Outlook

We are entering a period of unprecedented power in RNA structural biology. AlphaFold2 revolutionized protein structure prediction, and RNA-focused deep-learning methods (RoseTTAFold2NA, trRosettaRNA, RhoFold) are now making real progress on secondary and partial tertiary structure, although their accuracy still lags behind what has been achieved for proteins. Meanwhile, advances in cryo-EM are enabling routine structure determination of RNA-protein complexes that would have been intractable a decade ago.

The practical implications are profound. mRNA therapeutics, delivered so dramatically in COVID-19 vaccines, depend on designing RNA sequences that fold into stable, immunosilent structures that survive delivery to the cytoplasm. RNA-based gene editing tools are exquisitely dependent on tertiary structure for their activity. Small-molecule drugs targeting RNA tertiary structure, once considered impossible, are now under active development, including compounds that bind the IRES element in hepatitis C virus RNA and riboswitch-targeting antibiotic candidates.

The ion physics that governs RNA folding — monovalents for secondary, Mg²⁺ for tertiary, polyamines for the most compact architectures — is not merely a biochemical detail. It is the physical principle that connects the RNA sequence to its three-dimensional shape, and that shape to its function. Understanding it quantitatively, and using that understanding to design RNA molecules with prescribed folds, is the frontier where structural biology meets molecular engineering.

The molecule once dismissed as junk has turned out to be one of the most structurally sophisticated, functionally versatile, and therapeutically important classes of biomolecule known. We are only beginning to understand it.