
Smallest life forms have smallest working CRISPR system

Metagenomic search turns up compact Cas protein ideal for disease diagnostics

An ancient group of microbes that contains some of the smallest life forms on Earth also has the smallest CRISPR gene-editing machinery discovered to date.


Many Archaea have CRISPR systems to protect themselves from attacking viruses. Cas14, the smallest CRISPR system found to date, was discovered in the genome of one such archaeon, which scientists have so far been unable to grow in the lab.

The peewee protein machinery, dubbed Cas14, is related to but one-third the size of the Cas9 protein, the business end of the revolutionary gene-editing tool CRISPR-Cas9. While Cas9 was isolated from bacteria, Cas14 was found in the genome of a group of Archaea – a primitive relative of bacteria – that contains some of the smallest cells and smallest genomes known.

Cas9 and other Cas proteins are part of a defense system that microbes evolved to protect themselves from viruses. All are targeted enzymes that seek out and bind very selectively to a specific DNA or RNA sequence (in microbes, sequences that match those stored in their CRISPR memory banks after earlier viral infections) and then cut the DNA or RNA to disable the new invader.

Like Cas9, Cas14 has potential as a biotech tool. Because of its small size, Cas14 could be useful in editing genes in small cells or in some viruses. But with its single-stranded DNA cutting activity, it is more likely to improve rapid CRISPR diagnostic systems now under development for infectious diseases, genetic mutations and cancer.

“For molecular diagnostics, you want to be able to target double-stranded DNA, single-stranded DNA and RNA,” said Lucas Harrington, a UC Berkeley graduate student and first author of a paper reporting the discovery. “Cas12 is really good at double-stranded DNA recognition, Cas13 is really good at single-stranded RNA recognition and now Cas14 completes the set: it is really good at single-stranded DNA recognition.”

Cas14 resembles Cas12 and Cas13 in that, after binding to its target sequence, it begins indiscriminately cutting other nucleic acids nearby: Cas14 and Cas12 cut any single-stranded DNA, while Cas13 cuts RNA. Cas9, in contrast, binds and cuts only the targeted DNA.

The wanton cutting of DNA is a possible disadvantage in therapy, but a great advantage in diagnostics. The Cas14 protein can be paired with a fluorescent marker attached to a piece of single-stranded DNA. When Cas14 binds to its target DNA sequence – a cancer gene or a gene in infectious bacteria – and starts cutting DNA, it will also cut the DNA linked with the marker, generating a fluorescent signal.

“Cas14 targets single-stranded DNA in a much more specific way than Cas12 does,” added Harrington’s colleague, Janice Chen, who recently received her Ph.D. from UC Berkeley. “That was a really unexpected finding. Because it is so small, we barely thought it could work, but actually it is super-specific, which makes it also a really powerful addition to the diagnostic toolbox.”

Scientists from the Innovative Genomics Institute explain CRISPR diagnostics and the roles of the various Cas proteins in detecting infections, cancer and other diseases. (IGI video)

Harrington, Chen and their colleagues, including CRISPR-Cas9 inventor Jennifer Doudna, a UC Berkeley professor of molecular and cell biology and of chemistry, have adapted Cas14 to work with their diagnostic system, called DETECTR, which now uses Cas12 and Cas13 to quickly detect the presence of infectious organisms and genetic mutations. Harrington, Doudna and Chen are co-founders of a company, Mammoth Biosciences, that is commercializing DETECTR.

The discovery will be reported online Oct. 18 in advance of print publication in the journal Science. Doudna is a Howard Hughes Medical Institute investigator, co-director of the Innovative Genomics Institute and a faculty scientist at Lawrence Berkeley National Laboratory. Co-author Jill Banfield is the microbiology lead for IGI and a Berkeley Lab affiliate.

Mining metagenomes

The Cas14 protein was found by co-first authors Harrington and David Burstein, now a professor at Tel Aviv University in Israel, as they looked for Cas variants in a database of microbial genomes and metagenomes assembled over the past 15 years by colleagues at the Department of Energy’s Joint Genome Institute in Walnut Creek, California.


The smallest known CRISPR gene-editing system was found in a database of microbial genomes that includes genomes sequenced from samples collected at a toxic cleanup site in Rifle, Colorado. (Iris Burstein image)

Numbering in the tens of thousands, the genomes were obtained by metagenomic sequencing of all the DNA in samples from a variety of exotic environments, and many of them were assembled by co-author Jill Banfield, a UC Berkeley professor of earth and planetary science and of environmental sciences, policy and management. The database, the Integrated Microbial Genomes & Microbiomes (IMG/M) system, is the world's largest collection of microbial genes, currently at 55 billion and growing.

The main Cas14 protein studied was found in the genome of Archaea sequenced from groundwater samples obtained from a toxic cleanup site in Rifle, Colorado, though other variants were discovered in microbial genomes sequenced from other environments.

Two years ago, Harrington and Burstein discovered other small Cas proteins, CasX and CasY, while mining the metagenomics database.

At 400 to 700 amino acids long, Cas14 is about half the size of CasX and smaller than all other known Cas proteins, which range in length from 950 to 1,400 amino acids.

“By happenstance, we found these very small proteins, which other people just throw away because they don’t look like previously known CRISPR systems. They are too small,” Harrington said. “We decided, what the heck, let’s give it a shot. We tested it out and we were actually shocked to find that these were actual functional systems.”

Finding the gene for Cas14 in the database was only the beginning. Most Cas proteins found to date come from bacteria, and thus work well in the standard lab bacterium, E. coli. But Cas14 comes from Archaea, specifically from DPANN, a group that includes some of the smallest Archaea. All Cas proteins incorporate bits of RNA for targeting and binding, but Cas14 won't work with CRISPR-Cas9 RNAs, so the team also had to fish out of the database the two RNAs that must be present for Cas14 to function.

In addition, DPANN Archaea cannot be grown in the lab – they appear to be parasitic or in some way dependent on other larger Archaea – so the researchers had to create the right environment in a test tube.


The archaeal and eukaryotic (including animal and plant) branches of the Tree of Life, showing the position of the DPANN group from which Cas14 was isolated. The bacteria, not shown, are along the line that extends out of the top of the image.

Consistent with its origins in a more primitive microbe, the slimmed-down Cas14 appears to be a more primitive version of the larger and more complex Cas9 and Cas12 proteins, Harrington said, hinting that these molecules have evolved over eons to become more specialized. Because such primitive Cas proteins retain only the essential components of a Cas enzyme, the researchers hope to learn from them how to design the most compact and streamlined gene cutters possible.

Harrington noted that the metagenomic mining turned up various versions of Cas14 that may prove to be useful biotech tools. “One amazing thing … is just how diverse these systems are,” he said. “We’ve described more than 40 new CRISPR-Cas14 systems and eight different subtypes. This opens up the floodgates for investigation of these new CRISPR systems.”

Co-authors with Harrington, Burstein, Chen, Doudna and Banfield are Enbo Ma, Isaac Witte and Joshua Cofsky of UC Berkeley, and David Paez-Espino and Nikos Kyrpides of JGI.

The work was funded by the National Science Foundation, U.S. Department of Energy, Innovative Genomics Institute and Paul Allen Institute.