Building a drug screening model for RNA therapeutics

This is the full reflection. It’s still in progress, but I’ve written some early thoughts in another post. I’m currently prioritizing an explainer and building a usable web tool.

By mid-2025, I had already visited the emergency room 20 times, and the burden of Ehlers-Danlos Syndrome (hyperfragile skin) became too much. Whenever my family had the chance to go out (say) hiking, all it took was one twig and bam. All the plans we made were gone in a snap, and my parents’ hard work wasted on stitches and late nights in the ER.

So I took to trusty old Google Search to find a treatment. But though it started simply as reading a few articles in my free time, by following one lead after another, I now find myself six months later with one of my biggest projects yet. It’s an ML screening model for discovering RNA-targeted drugs. I’m really proud (and also in disbelief) to say it now performs better than state-of-the-art binding affinity predictors.

Since then, I’ve learned there are far more important issues than EDS, like cancers and pandemic viruses, that we should prioritize this technology for. While I might not get a cure very soon for my own skin, I’m all the more excited to see where it goes. I’m now reflecting on findings so far in a paper and integrating the model into a web app for clinical use (will soon be more than a mockup). And between other experiments, I’ve been stealing moments to set up COL5A1 mRNA screening for EDS.

You can also find the model’s source code on GitHub.

I gained a lot of useful insights from this project so far. Here, I want to document how the dominos fell and discuss some of the takeaways beyond the paper.

This is partly an attempt to grapple with the whole discovery process. If you had asked me just a year ago, I’d have had barely any clue how “research” was conducted by professionals, let alone in one’s own room. From pure bouncing around, trying whatever looked most promising (and telling myself I’m just using this to learn ML), I’ve happened to stumble upon something new that works. Is this a fluke? Or might the piecemeal progress look anything like how new things are normally built?

I. Searching for a treatment

Back to that first weekend search. The early results Google yielded were thick sleeves, braces, creams, orthotics or PT, all symptom-level remedies the doctor had already suggested. I wanted to find an actual pill or treatment, and dug deeper:

Celiprolol, available in the UK, to reduce arterial stress in vascular Ehlers-Danlos Syndrome.
Prolotherapy, where they inject a sugar solution (dextrose) into hypermobile ligaments to trigger a controlled inflammatory response. The resulting scar tissue theoretically tightens the ligament. Available in clinics but not FDA-regulated; formal trials are ongoing.
Excellagen, a topical gel for wound care (post-injury).

But these were still no cure. We make pills that change our brains and weaponize our own T-cells into living cancer drugs! Is this really the best we can do for EDS??

The root problem is that collagen is a bunch of long protein strands (fibrils), unlike enzymes or receptors with pockets. There’s no place for drugs to sit and act.
So the solution would be upstream. Everyone has two copies of the COL5A1 gene, which is responsible for producing collagen (namely Type V in the skin), and in classical EDS, there’s an unhealthy copy that leaves the collagen unstable.
CRISPR might be the answer. CRISPRa (CRISPR activation) has already cured this kind of one-gene-broken “haploinsufficiency” in e.g. obesity!
In fact, researchers are actively working on applying similar CRISPRa techniques to boost the healthy copy of COL5A1 in EDS.

This was thrilling to discover. Yet, after reading about all the ongoing studies, I was still left with no fix. Do I just keep waiting? It seemed so far away. CRISPR has so far only acted on tiny targets (just the hypothalamus in the obesity study). Collagen genes are enormous and expressed in so many tissues, which would make delivery challenging even after we prove the concept.

So I dug deeper into delivery methods, and found out about RNA-targeted therapies. They are the exciting next frontier. RNAs are filled with pockets, unlike many proteins, and very active. Just find a small molecule that binds the 5’ UTR of the COL5A1 mRNA and speeds up translation (e.g., by stabilizing a conformation that recruits the ribosome more efficiently). A 1.8x boost on the 50% output of the one healthy copy would take protein production to $1.8 \times 0.50 = 0.9$ of the way to fully fixed. Send that small molecule as a drug throughout the bloodstream, and we’d have a cure!
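To make that arithmetic concrete, here is a minimal Python sketch. It assumes that 0.50 stands for the fraction of normal output left with one healthy COL5A1 copy, and that 1.8x is a hypothetical translation boost from a drug. The function names are made up for illustration and are not part of the model.

```python
def restored_fraction(fold_boost, healthy_copies=1, total_copies=2):
    """Fraction of normal protein output after boosting the healthy copy.

    Assumes output scales linearly with the number of healthy gene copies
    and with the fold boost (a simplification, not a measured result).
    """
    baseline = healthy_copies / total_copies  # 0.5 with one healthy copy of two
    return fold_boost * baseline


def fold_boost_needed(target_fraction, healthy_copies=1, total_copies=2):
    """Fold boost required to reach a target fraction of normal output."""
    baseline = healthy_copies / total_copies
    return target_fraction / baseline


print(restored_fraction(1.8))    # the 1.8 x 0.50 case from the text
print(fold_boost_needed(1.0))    # boost needed to get all the way to normal
```

Under the same linear assumption, getting all the way to normal output from one healthy copy would take a 2x boost.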
How can we find these molecules that target RNAs? We could use one of these guys to screen for binding affinity by weighing RNAs in a vacuum to see if they picked up a small molecule. Yeah... I don’t know if I can afford that. The current experimental search is just too time- and resource-intensive to devote to EDS.
- 200k a run, and 1-3 months including prep ($0.10-$1.00 per well (compound), for a round of 400k compounds)
- This gives a 0.021% hit rate without pre-screening
- https://cancer.wisc.edu/research/resources/ddc/smsf/equipment-services/
- https://pubmed.ncbi.nlm.nih.gov/12014959/
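The back-of-the-envelope numbers above can be sketched in a few lines of Python. This assumes the per-well price range and the hit rate from the notes apply uniformly across a 400k-compound round; `screen_estimate` is a hypothetical helper for illustration, not part of the model’s code.

```python
def screen_estimate(n_compounds, cost_low, cost_high, hit_rate):
    """Rough cost range and expected hit count for one screening round.

    cost_low / cost_high are per-well (per-compound) prices in dollars,
    hit_rate is a fraction (0.021% -> 0.00021).
    """
    if hit_rate <= 0:
        raise ValueError("hit_rate must be positive")
    total_low = n_compounds * cost_low
    total_high = n_compounds * cost_high
    expected_hits = n_compounds * hit_rate
    return {
        "total_cost": (total_low, total_high),
        "expected_hits": expected_hits,
        "cost_per_hit": (total_low / expected_hits, total_high / expected_hits),
    }


# Numbers taken from the notes above: 400k compounds, $0.10-$1.00 per well,
# 0.021% hit rate without pre-screening.
estimate = screen_estimate(400_000, 0.10, 1.00, 0.021 / 100)
print(estimate)
```

With these inputs the round comes out at roughly $40k to $400k for about 84 expected hits, so somewhere around $500 to $5,000 per hit before any follow-up work.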
But the slow-search problem reminded me of AlphaFold, and how they turned protein-folding, where each problem was worth a full-length PhD, into a click of a button.
Except when I looked for the equivalent of CASP14, there didn’t seem to be as lively a community. Is there just not enough experimental data to learn useful patterns?

II. Models, one after another

[still consolidating old scattered notes to piece together the timeline]

Update 2026-01-18: I’ve reflected on the overall process underlying this project in another essay. I’ll focus more on the technical insights of the model here.