Future directions and projects we’re excited about
Last time we talked about some delivery and CRISPR fundamentals. This post focuses on things we believe are useful avenues for getting the field to the finish line.
Enabling experimental best practices
Give away WGS for off-target effect assessment
One of the main challenges when talking about CRISPR applied to the human genome is off-targeting. To date, there are a ton of studies showing different rates of deleterious cutting, which we talked about in part 1, but there is no unified consensus on how often it occurs. We propose that one strategy for mitigating this inconsistency is standardized use of whole-genome sequencing (WGS) to assess off-target cuts. Of course, this is expensive, especially if we’re proposing the field aims for sufficient sequencing depth (which we are). Luckily, sequencing is getting cheaper, even outpacing what Moore’s law would predict, putting us in the era of the $1000 genome.
We propose an opportunity for a non-profit to provide subsidized WGS for research projects to better understand the true rate of off-targeting with every new editing study, whether in vivo or ex vivo.
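To make “sufficient sequencing depth” concrete, here is a minimal back-of-envelope sketch of how depth requirements scale with the frequency of an off-target edit in a bulk sample. It uses a simple binomial model with illustrative thresholds (at least 3 supporting reads, 95% detection probability) and ignores sequencing error, mapping artifacts, and uneven coverage; the numbers are assumptions, not recommendations.

```python
from scipy.stats import binom

def min_depth_for_detection(allele_freq, min_alt_reads=3, prob_detect=0.95, max_depth=2000):
    """Smallest per-site depth at which an edit present at `allele_freq`
    would show up in at least `min_alt_reads` reads with probability
    `prob_detect` (simple binomial model)."""
    for depth in range(min_alt_reads, max_depth + 1):
        # P(X >= min_alt_reads) where X ~ Binomial(depth, allele_freq)
        p_detect = 1.0 - binom.cdf(min_alt_reads - 1, depth, allele_freq)
        if p_detect >= prob_detect:
            return depth
    return None

# Illustrative allele frequencies: an off-target edit present in 10%, 2%,
# and 1% of alleles in a bulk sample.
for af in (0.10, 0.02, 0.01):
    print(f"allele frequency {af:.2f}: ~{min_depth_for_detection(af)}x depth needed")
```

The takeaway from even this toy model is that rare off-target edits push required depth well past a standard 30x genome, which is exactly why subsidized WGS would matter.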
Better datasets and benchmarks
There are lots of places in the editing world where having open, clean datasets would help us and others both understand how far a problem is from being solved and make progress on solutions. One powerful example of clear benchmarks and open datasets enabling rapid progress is AlphaFold. AlphaFold was only possible because decades of researchers spent their time incrementally solving structures and contributing them to the open and standardized Protein Data Bank (PDB). Combined with painstakingly compiled, open evolutionary data, PDB structures provided the raw material for AlphaFold, its antecedents, and its successors. And AlphaFold is only the most salient, currently popular example of the impact the PDB has had. While there may not be an exact analogue to the PDB for editing, we believe open datasets can have a similar magnitude of impact on progress in editing as the PDB had on protein structure.
Cross-editor and component off-target rates at thousands of targets
Right now, it’s hard to know how far we are from reaching negligible off-target rates (defined as lower than the mutation rate of normal body processes such as cell division). While several of the papers we linked above include measurements of off-target edit rates, they tend to look at a small number of editing setups (e.g. Sniper-Cas9 + CBEs/ABEs). On top of this, with a few exceptions, most don’t do WGS, meaning they can’t easily translate off-target edit rates into a unit comparable to natural benchmarks (e.g. mutations per cell division).
Given this, collecting and sharing a large dataset covering the combinatorial space of editing system, Cas protein variant, and guide, with off-target edit rates measured genome-wide, would be extremely valuable. Ideally this could be done across multiple cell lines to get a sense of how much rates vary under different conditions. Having such a dataset would allow the field to answer questions like:
How far are we from negligible off-target edit rates?
Which combination of components currently has the lowest average off-target edit rate?
This would help not only current researchers but also funders and future researchers deciding where to invest their money and/or time.
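As a minimal sketch of what such a dataset could look like, and how it would answer the second question above, here is a toy table keyed by editing system, Cas variant, guide, and cell line. The rows, names, and numbers are entirely hypothetical; a real dataset would hold WGS-derived counts for thousands of guides.

```python
import pandas as pd

# Hypothetical rows: one per (editing system, Cas variant, guide, cell line),
# with genome-wide off-target edit counts derived from WGS.
df = pd.DataFrame([
    {"system": "ABE", "cas": "SpCas9-HF1",  "guide": "g1", "cell_line": "HEK293T", "off_targets_per_cell": 0.8},
    {"system": "ABE", "cas": "Sniper-Cas9", "guide": "g1", "cell_line": "HEK293T", "off_targets_per_cell": 0.3},
    {"system": "CBE", "cas": "SpCas9-HF1",  "guide": "g2", "cell_line": "K562",    "off_targets_per_cell": 1.5},
    {"system": "CBE", "cas": "Sniper-Cas9", "guide": "g2", "cell_line": "K562",    "off_targets_per_cell": 0.9},
])

# "Which combination of components currently has the lowest average off-target rate?"
best = (df.groupby(["system", "cas"])["off_targets_per_cell"]
          .mean()
          .sort_values()
          .head(1))
print(best)
```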
(More) deep mutational scans for Cas proteins + editor combinations
While the prior project would enable a better understanding of current editing systems’ and components’ off-target edit rates, this project could enable much faster progress in Cas protein engineering. Inspired by Spencer et al. and by how deep mutational scans in other areas like AAV catalyzed modeling progress, we think deep mutational scans of more Cas variants, especially engineered ones, would lead to faster Cas engineering progress. In particular, ML-guided engineering of the Cas protein would benefit from having more such datasets.
While Spencer et al. measure knockout efficacy, we’re excited about projects that find ways to measure substitution and insertion efficiency in a deep mutational scan. This would require creative science but, if possible, would come with the benefit of directly enabling data-driven design of Cas proteins for a wider range of applications.
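To make “data-driven design” concrete, here is a minimal sketch of the kind of supervised model a deep mutational scan would enable: one-hot encode variant sequences and fit a simple regressor to measured editing efficiency. The sequences, efficiencies, and model choice are placeholders; a real pipeline would use full-length Cas variants, thousands of measurements, and likely a protein language model rather than ridge regression.

```python
import numpy as np
from sklearn.linear_model import Ridge

AAS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq: str) -> np.ndarray:
    """Flattened one-hot encoding of a fixed-length protein sequence."""
    x = np.zeros((len(seq), len(AAS)))
    for i, aa in enumerate(seq):
        x[i, AAS.index(aa)] = 1.0
    return x.ravel()

# Hypothetical deep mutational scan output: toy variant sequences and a
# measured editing efficiency (e.g. substitution rate at a reporter locus).
variants = ["MKRNYILGL", "MKRNYILGV", "MKRNFILGL", "MKRNYLLGL", "MKRNYILGI"]
efficiency = [0.72, 0.65, 0.31, 0.58, 0.61]

X = np.stack([one_hot(v) for v in variants])
y = np.array(efficiency)

# Fit on all but the last variant, then predict the held-out one.
model = Ridge(alpha=1.0).fit(X[:-1], y[:-1])
print("predicted efficiency for held-out variant:",
      round(float(model.predict(X[-1:])[0]), 2))
```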
ML prediction and design challenge
While there’s more work – much of which we’ve already discussed – coming out using ML to improve various components of editors, we feel much of this work is hampered by the (unfortunately common) tight coupling of ML work to wet lab work. As an analogy, imagine if the only people who worked on improving accuracy on ImageNet or WikiText were the people who collected the data.
Given this, and inspired by the Bio Align protein engineering tournament and Novozymes’ enzyme stability prediction Kaggle competition, we see an opportunity to broaden participation in ML-guided editor engineering to a wider, ideally ML-savvy pool by running a challenge centered around predicting and designing better editing components. We are agnostic to which component – Cas protein, gRNAs, etc. – makes the most sense here, as we think the higher-order bit is being able to collect a large, clean dataset that sits in the sweet spot of being hard enough that ML is required but still tractable to predict and use for subsequent design.
This challenge could not only result in better editor components but also increase interest in the field’s ML problems and improve the tech used to collect large datasets.
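As a minimal sketch of how the prediction track of such a challenge might be scored, here is a leaderboard function that rank-correlates a team’s predictions against held-out measurements. The file names and column layout (`id`, `efficiency`) are hypothetical; Spearman correlation is a common choice for this kind of competition, but the real metric would be up to the organizers.

```python
import pandas as pd
from scipy.stats import spearmanr

def score_submission(truth_csv: str, submission_csv: str) -> float:
    """Rank-correlation score for a prediction challenge.

    Both CSVs are assumed (hypothetically) to have `id` and `efficiency`
    columns; the submission's `efficiency` values are the model's
    predictions for the held-out test set."""
    truth = pd.read_csv(truth_csv).set_index("id")
    sub = pd.read_csv(submission_csv).set_index("id")
    merged = truth.join(sub, lsuffix="_true", rsuffix="_pred", how="inner")
    rho, _ = spearmanr(merged["efficiency_true"], merged["efficiency_pred"])
    return float(rho)

# Usage, with hypothetical file names:
# print(score_submission("hidden_test_set.csv", "team_submission.csv"))
```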
Longitudinal / observational studies in model systems
Repeat dosing experiments in model systems
Tools exist today that would allow for repeated dosing of an animal. A proposed plan aimed at introducing the Dec2 mutation (aka the short-sleeper gene) into an organism would start with a standard CRISPR-Cas9 product loaded into a wild-type AAV9. It works! The animal is now exhibiting all of the wonderful traits that make Dec2 mutants so blessed. Our next aim is to correct a point mutation linked to neuroblastoma. Because we targeted the CNS in the first round of dosing, and we’re now going back to target the same area, there are remaining questions about how a subsequent treatment will go, even if we’re looking at fixing something new. Sure, we can change the delivery vector, but will the animal have a negative immune reaction to Cas9 itself? What if it’s a base editor this time – what happens then? The main takeaway is that open questions remain about whether some of the immunogenicity problems can be solved simply by swapping out components of a therapy. It might be possible that changing the type of CRISPR-effector and vector is enough to allow repeated edits. As of right now, it’s unclear, but it would be great to see studies trying to make that happen.
Assessing long-term risk of off-targeting
By far the biggest issue preventing us from taking editing into humans is off-targeting. It’s clear that unintended editing and chimeras happen, and do so for a variety of reasons, but it’s important to differentiate scientific questions aimed at understanding mechanisms from the questions that actually influence a decision on whether we’re ready to take CRISPR-effectors into the clinic.
We see an opportunity to frame the problem of off-targeting against a different benchmark: the natural mutational machinery of the human body itself. There are plenty of studies that show off-targeting occurs, but few longitudinal studies directly quantify the long-term risk associated with single- or multi-dose gene therapies. Organisms accumulate somatic mutations in their cells over time, something first recognized in cancer tissue. However, even normal tissue will have substantial genetic diversity, with mutation rates varying by tissue (see table 1). As a general rule, germ cells have the lowest rate of mutations per year, while every tissue from neurons (which are post-mitotic and not reproducing) to skin cells has a higher rate of mutations. Apart from the overall rate of mutations, different tissues have “tissue-specific” mutational processes which lead to different patterns of mutations. Thus, any benchmark assessing off-target edits should be viewed in the context of the substantial intra-organism genetic diversity that already exists.
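One way such a benchmark could be expressed is to convert a therapy’s measured off-target burden into “years of normal somatic mutation” for the treated tissue. The sketch below uses placeholder per-tissue rates purely for illustration; they are not the values from table 1 or the literature.

```python
# Placeholder somatic mutation rates (mutations per cell per year); these are
# illustrative stand-ins, not measured values.
SOMATIC_MUTATIONS_PER_YEAR = {
    "germ cells": 2,
    "neurons": 15,
    "skin": 40,
}

def years_of_background_mutation(off_target_edits_per_cell: float, tissue: str) -> float:
    """Express a measured off-target burden as the number of years of that
    tissue's normal somatic mutation accumulation it corresponds to."""
    return off_target_edits_per_cell / SOMATIC_MUTATIONS_PER_YEAR[tissue]

for tissue in SOMATIC_MUTATIONS_PER_YEAR:
    print(f"{tissue}: 5 off-target edits per cell ≈ "
          f"{years_of_background_mutation(5, tissue):.2f} years of background mutation")
```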
What we didn’t cover
There are other outstanding challenges that stand between the bench and the clinic when it comes to genetic engineering. While we covered many things relating to delivery and cutting machinery, target discovery is another massive component of this operation. Gene therapy programs work best when we have an exact, fully understood disease mechanism. Even if we are able to cut down on off-targeting, or deliver exactly where we want to go, a bad target will lead to a bad therapeutic outcome. There are also plenty of regulatory hurdles, but since we aren’t experts in all things policy, that topic is left out of this piece.