11 February, 2026
Machine Learning Revolutionizes Nucleic Acid Aptamer Analysis

A research team led by Weihong Tan, Xiaohong Fang, and Tao Bing from the Hangzhou Institute of Medical Sciences, Chinese Academy of Sciences, has reported a machine learning method for analyzing nucleic acid aptamer sequences. The approach infers secondary structures directly from single-round screening data, which significantly accelerates the discovery and optimization of nucleic acid aptamers. The findings were published as an open access Research Article in CCS Chemistry, the flagship journal of the Chinese Chemical Society.

Nucleic acid aptamers recognize target molecules with high specificity, but their diverse and complex secondary structures make them difficult to characterize. Traditional techniques such as electron microscopy and X-ray crystallography have not resolved these structures efficiently, which has hindered aptamer optimization. The new method addresses these limitations by using machine learning to identify the core sequences shared within aptamer families from a single round of screening.

Advancements in Aptamer Analysis

To tackle the challenges in aptamer analysis, the researchers developed a machine learning-based analytical method. This approach employs unsupervised autoencoder clustering and deep learning to identify core sequences from single-round screening data. By using these core sequences as indices, the method extracts common secondary structural features of nucleic acid aptamers, facilitating rational truncation and performance optimization.
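An autoencoder needs fixed-length numeric input, so the sequences have to be encoded before any clustering can happen. The article does not say how the team encoded their data. The sketch below assumes plain one-hot encoding, a common choice for DNA, and the function name `one_hot` and the padding behavior are illustrative only.

```python
ALPHABET = "ACGT"  # assumed DNA alphabet; other symbols (e.g. N) are left as zeros


def one_hot(seq, length):
    """Encode a DNA sequence as a flat 0/1 list of size length * 4.

    Sequences longer than `length` are cut off; shorter ones are padded
    with zeros so every vector has the same size.
    """
    vec = [0] * (length * len(ALPHABET))
    for i, base in enumerate(seq[:length].upper()):
        j = ALPHABET.find(base)
        if j >= 0:  # skip ambiguous or unknown bases
            vec[i * len(ALPHABET) + j] = 1
    return vec


if __name__ == "__main__":
    # Two bases encoded into a three-position window (last position is padding).
    print(one_hot("AC", 3))
```

Vectors of this kind could then be fed to an autoencoder, whose compressed representation is what a clustering step would operate on.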

In their study, the researchers focused on the CD8 protein, analyzing sequences from a single round of aptamer screening. Despite the heterogeneous sequence background, they discovered a common core sequence, “GTGAGGAGCTTGAAA”, which traditional alignment methods had failed to identify. The team synthesized a library containing this core sequence and confirmed its presence in over 20,000 nucleic acids obtained from screening, demonstrating the method’s efficacy.
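Once a core sequence is known, checking how widely it occurs in a screening pool is a simple substring search. The sketch below shows that check for the reported core. The pool `TOY_POOL` is made up for illustration and is not data from the study, and the helper names are hypothetical.

```python
CORE = "GTGAGGAGCTTGAAA"  # core sequence reported in the article

# Made-up example reads, not data from the study.
TOY_POOL = [
    "ACGTGTGAGGAGCTTGAAATTCG",
    "TTGTGAGGAGCTTGAAAC",
    "ACGTACGTACGTACGT",
]


def core_hits(sequences, core=CORE):
    """Return the sequences that contain the core as an exact substring."""
    core = core.upper()
    return [s for s in sequences if core in s.upper()]


def core_fraction(sequences, core=CORE):
    """Return the fraction of sequences containing the core (0.0 if empty)."""
    if not sequences:
        return 0.0
    return len(core_hits(sequences, core)) / len(sequences)


if __name__ == "__main__":
    print(len(core_hits(TOY_POOL)), "of", len(TOY_POOL), "toy reads contain the core")
```

Exact matching is the simplest possible criterion. It only confirms a core that has already been found and does not replace the discovery step described above.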

Machine Learning’s Role in Structural Analysis

Machine learning algorithms further analyzed the secondary structures mediated by the core sequences. In the fixed-region sequence “5′-AGCTTGAAA-3′”, 62.4% of the sequences formed stem-loop structures. The sequence “GTGA” was prevalent in both multi-branched loops and stem structures, highlighting the method’s ability to deduce shared secondary structures capable of binding to the same target epitope.
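A statistic like the 62.4% figure can be pictured as a count over predicted structures. The sketch below assumes each sequence already has a secondary structure in dot-bracket notation from some external prediction tool, and it treats a motif as part of a stem-loop when it overlaps a hairpin (a base pair enclosing at least three unpaired bases). Both the criterion and the toy records are assumptions for illustration, not the authors' procedure or data.

```python
import re

# Innermost base pair enclosing at least 3 unpaired bases, e.g. "(....)".
HAIRPIN = re.compile(r"\(\.{3,}\)")


def motif_in_hairpin(seq, structure, motif):
    """Return True/False if the motif overlaps a hairpin, None if the motif is absent."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    start = seq.find(motif)
    if start < 0:
        return None
    end = start + len(motif)
    # Two half-open intervals overlap when each starts before the other ends.
    return any(m.start() < end and start < m.end()
               for m in HAIRPIN.finditer(structure))


def hairpin_fraction(records, motif):
    """Fraction of motif-containing records whose motif overlaps a hairpin."""
    flags = [motif_in_hairpin(s, d, motif) for s, d in records]
    flags = [f for f in flags if f is not None]
    return sum(flags) / len(flags) if flags else 0.0


# Hand-written toy records (sequence, dot-bracket), not predictions or study data.
TOY_RECORDS = [
    ("CCAGCTTGAAAGCTGG", "((((((....))))))"),
    ("AGCTTGAAATTTT", "............."),
    ("ACGTACGT", "........"),
]

if __name__ == "__main__":
    print(hairpin_fraction(TOY_RECORDS, "AGCTTGAAA"))
```

Records without the motif are excluded from the denominator, so the fraction describes only sequences that carry the motif.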

By applying this analysis, the researchers truncated and optimized the nucleic acid sequences, significantly enhancing their affinity. This resulted in the successful identification of over 10,000 potentially active CD8-specific aptamers.

Broader Applications and Implications

To demonstrate the method’s versatility, the researchers applied it to fibroblast activation protein (FAP) screening data. They identified a highly conserved core sequence, “5′-GGGGTCTGCTTCGGATTGCGG-3′”, which suggests a G-quadruplex structure. The result indicates that the method can handle different structural types, and that truncation and optimization guided by the analysis can improve binding affinity.

This machine learning-enabled approach challenges traditional paradigms by emphasizing the importance of spatial conformation in molecular recognition. It opens new avenues for designing functional nucleic acids and developing AI-driven virtual screening platforms, which could revolutionize next-generation nucleic acid aptamer technologies for precision diagnosis and treatment.

Support and Future Prospects

This research was supported by several prestigious organizations, including the National Natural Science Foundation of China and the Zhejiang Provincial “Pioneer” and “Leading Goose” R&D Program. The successful application of this method across different aptamer types highlights its potential to become a standard in nucleic acid analysis.

As the field of nucleic acid aptamer research continues to evolve, the integration of machine learning techniques promises to enhance the efficiency and accuracy of aptamer discovery. This approach not only accelerates the discovery process but also provides a framework for exploring non-coding RNA-protein interactions and other complex molecular systems.

The findings published in CCS Chemistry represent a significant leap forward in the field of chemical sciences, showcasing the potential of interdisciplinary research to drive innovation. As the research community continues to explore the applications of machine learning in molecular biology, the future of nucleic acid aptamer technologies looks promising.
