Revolutionizing Cancer Research: DeepSomatic AI Identifies Genetic Variants in Tumors (2025)

Cancer is fundamentally a genetic disease, and a complex one. It comes in many types, each with its own genetic underpinnings, so finding effective treatments requires a detailed understanding of the mutations within tumor cells. DeepSomatic is a tool for identifying these mutations more accurately than existing methods.

Unraveling the Genetic Mystery of Cancer

DeepSomatic was developed in collaboration with researchers at the University of California, Santa Cruz Genomics Institute and other institutions. It uses convolutional neural networks, a machine learning technique, to identify genetic variants in tumor cells, and it adapts to different sequencing platforms and sample processing methods. It is also flexible enough to generalize to cancer types not included in its training data.

We've made DeepSomatic and its high-quality training dataset freely available to the research community, hoping to accelerate cancer research and move towards precision medicine. This is part of a broader Google initiative to apply AI in cancer research, including analyzing mammograms and CT scans for breast and lung cancer screening, and partnering to advance research on gynecological cancers.

The Complexity of Cancer Genetics

Genetic variation in cancer is intricate. Cancer is often driven not by inherited variants, which are present in every cell of the body, but by variants acquired after birth. Environmental factors like UV light and chemical carcinogens, along with random DNA replication errors, can cause somatic cells to develop new variants. Sometimes these acquired variants disrupt the normal behavior of cells, leading to uncontrolled replication and the development of cancer.

Identifying these acquired variants in tumor cells is challenging. Tumor cells may carry a diverse set of variants at different frequencies, and the sequencing error rate can be higher than the frequency at which a somatic variant appears in a sample. DeepSomatic is designed to tackle these challenges, accurately identifying somatic variants and providing insight into what drives tumor growth.
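To make the signal-versus-noise problem concrete, here is a minimal Python sketch. The read depth, alt-read count, and error rate are illustrative assumptions, not values from the DeepSomatic work, and the binomial tail calculation is a textbook check, not DeepSomatic's method. The function names are hypothetical.

```python
from math import comb


def variant_allele_fraction(alt_reads: int, depth: int) -> float:
    """Fraction of reads at a site that support the alternate allele."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


def prob_at_least_k_errors(depth: int, k: int, error_rate: float) -> float:
    """P(X >= k) for X ~ Binomial(depth, error_rate): the chance that
    sequencing errors alone yield k or more alt-supporting reads."""
    below = sum(
        comb(depth, i) * error_rate**i * (1 - error_rate) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - below


# Hypothetical site: 60x depth, 2 reads support the variant, and an assumed
# 1% chance per read of an error that mimics this alternate allele.
depth, alt_reads, error_rate = 60, 2, 0.01
vaf = variant_allele_fraction(alt_reads, depth)
p_noise = prob_at_least_k_errors(depth, alt_reads, error_rate)
print(f"VAF = {vaf:.3f}, P(>= {alt_reads} error reads) = {p_noise:.3f}")
```

With these assumed numbers the variant appears in about 3% of reads, yet errors alone would produce two or more such reads roughly 12% of the time, which is why a simple read-count threshold is unreliable for low-frequency somatic variants.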

Training DeepSomatic: A Deep Dive

DeepSomatic is trained to differentiate between the reference genome, inherited variants, and cancer-induced somatic variants in tumor cells. It can also identify somatic variation in tumor-only samples, for example in leukemia, where obtaining a sample of purely normal cells is difficult. This versatility makes DeepSomatic applicable to a wide range of research and clinical settings.
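The three-way distinction can be illustrated with a deliberately naive rule that compares allele fractions in tumor and normal reads. This is only an intuition aid: DeepSomatic uses a trained neural network, not a threshold, and the function name and the 5% cutoff below are assumptions made for the example.

```python
def classify_site(tumor_alt: int, tumor_depth: int,
                  normal_alt: int, normal_depth: int,
                  min_vaf: float = 0.05) -> str:
    """Naive three-way label from alt-read fractions (illustrative only)."""
    tumor_vaf = tumor_alt / tumor_depth if tumor_depth else 0.0
    normal_vaf = normal_alt / normal_depth if normal_depth else 0.0
    if normal_vaf >= min_vaf:
        # Variant is visible in normal cells, so it was likely inherited.
        return "germline"
    if tumor_vaf >= min_vaf:
        # Variant is visible only in the tumor, so it was likely acquired.
        return "somatic"
    return "reference"


print(classify_site(0, 50, 0, 40))    # no alt reads in either sample
print(classify_site(25, 50, 20, 40))  # alt reads in both samples
print(classify_site(10, 50, 0, 40))   # alt reads in the tumor only
```

A rule like this has no answer for the tumor-only case, where no normal reads exist to compare against, which is one reason a learned model is attractive there.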

Similar to DeepVariant, DeepSomatic converts genetic sequencing data into a set of images. These images represent various aspects of the sequencing data, including alignment, quality, and other variables. The convolutional neural network then analyzes this data, distinguishing between the different types of variants and discarding variations caused by sequencing errors. The result is a list of cancer-related variants or mutations.
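The idea of turning aligned reads into an image can be sketched as follows. This is a toy model under stated assumptions: the four channels (base identity, base quality, mapping quality, mismatch flag), the intensity values, and the names Read and encode_pileup are all hypothetical, and the sketch ignores insertions and deletions. It is not the actual encoding used by DeepVariant or DeepSomatic.

```python
from dataclasses import dataclass

# Arbitrary intensity per base, chosen only for this illustration.
BASE_CODE = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}


@dataclass
class Read:
    start: int        # 0-based reference position of the first base
    bases: str        # aligned bases (no indels in this toy model)
    base_quals: list  # per-base Phred quality scores
    mapq: int         # mapping quality of the whole read


def encode_pileup(reads, reference, window_start, width, max_qual=60):
    """Return a nested list shaped [reads][width][4]: one row per read, one
    column per reference position, four channels per cell."""
    image = []
    for read in reads:
        row = [[0.0, 0.0, 0.0, 0.0] for _ in range(width)]
        for offset, base in enumerate(read.bases):
            pos = read.start + offset
            col = pos - window_start
            if not 0 <= col < width or pos >= len(reference):
                continue  # base falls outside the window or the reference
            row[col] = [
                BASE_CODE.get(base, 0.0),
                min(read.base_quals[offset], max_qual) / max_qual,
                min(read.mapq, max_qual) / max_qual,
                1.0 if base != reference[pos] else 0.0,
            ]
        image.append(row)
    return image


reference = "ACGTACGTAC"
reads = [
    Read(start=2, bases="GTAC", base_quals=[30, 30, 30, 30], mapq=60),
    Read(start=3, bases="TGCG", base_quals=[30, 12, 30, 30], mapq=40),
]
image = encode_pileup(reads, reference, window_start=2, width=5)
print(len(image), len(image[0]), len(image[0][0]))
```

In this example the second read carries a low-quality mismatch at reference position 4, so the cell at row 1, column 2 has its mismatch channel set and a low quality value. A pattern like that is the kind of evidence a classifier can learn to treat as a likely sequencing error.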

Training accurate models for different cancer types requires comprehensive, high-quality data. For this purpose, a new training and evaluation dataset was created by sequencing tumor cells and accompanying normal cells from breast and lung cancer samples. Whole-genome sequencing of these samples was performed on three leading platforms, and the outputs were combined into an accurate reference dataset called the Cancer Standards Long-read Evaluation dataset (CASTLE).

Testing DeepSomatic's Performance

DeepSomatic was trained on three breast cancer genomes and two lung cancer genomes from the CASTLE dataset. Its performance was then tested in various ways, including on a breast cancer genome not included in its training data and on chromosome 1 from each sample, which was also excluded from training.
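The hold-out scheme described here, one whole sample plus chromosome 1 of every sample kept out of training, can be sketched as a simple partition. The record layout, sample names, and function name are hypothetical placeholders, not identifiers from the CASTLE dataset.

```python
def split_train_test(records, held_out_chrom="chr1", held_out_samples=()):
    """Send a record to the test set if it lies on the held-out chromosome
    or belongs to a held-out sample; everything else is used for training."""
    train, test = [], []
    for rec in records:
        if rec["chrom"] == held_out_chrom or rec["sample"] in held_out_samples:
            test.append(rec)
        else:
            train.append(rec)
    return train, test


# Hypothetical labeled sites; sample names are placeholders.
records = [
    {"sample": "breast_A", "chrom": "chr1", "pos": 1000},
    {"sample": "breast_A", "chrom": "chr2", "pos": 2000},
    {"sample": "lung_A", "chrom": "chr7", "pos": 3000},
    {"sample": "breast_heldout", "chrom": "chr2", "pos": 4000},
]
train, test = split_train_test(records, held_out_samples={"breast_heldout"})
print(len(train), "training records,", len(test), "test records")
```

Splitting by chromosome and by sample, instead of shuffling individual sites, keeps the model from being evaluated on regions or genomes it has already seen.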

Results show that DeepSomatic models outperformed other methods on each of the major sequencing platforms, identifying more tumor variants with higher accuracy. In particular, DeepSomatic excelled at identifying cancer variants involving insertions and deletions (indels) of genetic code, significantly improving the F1-score, a balanced measure of precision and recall.
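For reference, the F1-score is the harmonic mean of precision and recall. The sketch below computes it by comparing a set of called variants against a truth set; the variant tuples are made-up examples, not calls from any real benchmark, and the function name is hypothetical.

```python
def precision_recall_f1(called, truth):
    """Compare called variants with a truth set.

    Each variant is a hashable key such as (chrom, pos, ref, alt).
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # true positives: called and real
    fp = len(called - truth)   # false positives: called but not real
    fn = len(truth - called)   # false negatives: real but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up example: 5 true variants, 4 calls, 3 of which are correct.
truth = [("chr1", 100, "A", "T"), ("chr1", 200, "G", "GA"),
         ("chr1", 300, "CT", "C"), ("chr1", 400, "C", "G"),
         ("chr1", 500, "T", "A")]
called = [("chr1", 100, "A", "T"), ("chr1", 200, "G", "GA"),
          ("chr1", 300, "CT", "C"), ("chr1", 999, "A", "C")]
precision, recall, f1 = precision_recall_f1(called, truth)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a caller cannot score well on F1 by reporting everything (high recall, low precision) or by reporting almost nothing (high precision, low recall).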

DeepSomatic was also tested on a preserved breast cancer tumor sample that had been formalin-fixed and paraffin-embedded (FFPE), a common preservation method that introduces additional DNA damage. The tool was trained on data from this sample and then tested on chromosome 1, which was reserved from training. DeepSomatic again outperformed other tools, suggesting that it can work with lower-quality or historic tumor samples.

Expanding Horizons: DeepSomatic and Other Cancers

To explore DeepSomatic's applicability to other cancer types, it was used to analyze a sample of glioblastoma, an aggressive brain cancer with a small number of variants. DeepSomatic successfully identified these variants, demonstrating that it can generalize what it has learned to cancer types outside its training data.

Additionally, DeepSomatic was employed to analyze pediatric leukemia samples in collaboration with Children's Mercy in Kansas City. Despite the challenge of working with tumor-only samples, DeepSomatic identified previously known variants and discovered 10 new ones, showcasing its effectiveness in such scenarios.

The Future of Cancer Research

We hope that research labs and clinicians will embrace DeepSomatic, using it to detect known cancer variants and potentially discover new ones. This could lead to the development of novel therapies and more effective treatments for patients. With tools like DeepSomatic, we can continue to unravel the complexities of cancer, driving progress towards a future where precision medicine becomes a reality.

Acknowledgments

We extend our gratitude to all research participants and donors whose contributions made this work and other biomedical research possible. Special thanks to our collaborators at UC Santa Cruz Genomics Institute, the National Cancer Institute, the Frederick National Laboratory for Cancer Research, Children's Mercy Hospital, and NYU. We thank Hannah Hickey for writing contributions, and Avinatan Hassidim, Katherine Chou, Lizzie Dorfman, and Yossi Matias for research leadership support. Communications support was provided by Resham Parikh and Isha Mishra.


Author: Moshe Kshlerin
