Editorial – Using Artificial Intelligence to Select Drug Targets in Oncology

Author(s) :

Tudor I Oprea1,2,4 and Virgil Păunescu3,4

1 The University of New Mexico School of Medicine, Albuquerque, New Mexico, USA
2 Expert Systems Inc., San Diego, USA
3 “Victor Babeş” University of Medicine and Pharmacy, Timişoara, Romania
4 Oncogen Center for Gene and Cellular Cancer Therapies, Timișoara, Romania

Corresponding author: Tudor I Oprea, Email: tudorzinho@gmail.com

Published: IV, 1, 5 July 2024, v - viii DOI: 10.53011/JMRO.2024.01.01



July 4, 2024

For decades, scientists have approached cancer as a disease of the genome (1). Efforts to collect multi-faceted, heterogeneous data, such as tissue-based somatic mutations (2) and cancer cell line expression and perturbation (3), have contributed to breakthroughs such as the Hallmarks of Cancer (4,5) and The Cancer Genome Atlas (TCGA) (6). These efforts have framed our understanding of cancer at the molecular level and laid the foundational roadmap for drug target identification in oncology. The therapeutic management of cancer, an out-of-control process of cellular proliferation and dissemination, typically aims to selectively inhibit specific molecules or pathways crucial for tumor growth and survival (7). Targeting specific mutations, such as BRAF V600E and KRAS G12C, has resulted in clinically successful treatments for melanoma (e.g., vemurafenib, a BRAF inhibitor) and non-small cell lung carcinoma (e.g., sotorasib, a KRAS inhibitor) (8).

Target selection is a critical step in pharmaceutical research and development, as it remains the major driver of therapeutic efficacy and patient safety. As outlined elsewhere (8), target selection starts with identifying tumor-specific actionable mutations via Next-Generation Sequencing (NGS). This nucleic acid sequencing technology identifies common and rare genetic aberrations in cancer. Point-of-care diagnostic tools further support this process by evaluating mutations through sequential oligonucleotide capture, amplification, and NGS. In addition to patient-derived clinical data, pan-cancer analyses and biomedical literature are frequently used to understand the molecular pathways affected by specific mutations, further guiding therapeutic target selection. Functional genomics (9), genome-wide association studies (GWAS), and polygenic scores (10) are increasingly incorporated in clinical model assessments of cancer therapeutic targets.

Despite the widespread use of these methodologies, several limitations have become apparent. First, cancer is a complex disease, with a subtle interplay between the environmental and genetic factors that influence tumor growth and survival. Intra-tumor heterogeneity studies improve our understanding of the evolutionary forces driving subclonal selection (11), whereas genetic (clonal) and non-genetic adaptive reprogramming events can explain primary and secondary drug resistance in cancer (12). Furthermore, elucidating the exact mechanism of action (MoA) targets of drugs in cancer is not trivial, as many anti-cancer drugs continue to exhibit tumoricidal activity even after the (suspected) MoA targets have been knocked out (13). Indeed, off-target effects often confound the interpretation of biological phenotypes (e.g., loss of cell viability or slowing tumor growth) (14). Against this backdrop, large-scale data integration coupled with artificial intelligence and machine learning (AIML) (15) can improve target selection in oncology.

AIML technologies can rapidly process a diverse set of oncology-related resources such as TCGA (6), COSMIC (2), DepMap (16), and others by coalescing large datasets into a seamlessly integrated platform. This is particularly true if large language models (LLMs) such as GPT-4 (17) are incorporated into the data ingestion workflow. From genomic and transcriptomic data to real-world evidence, AIML can sift through layers of evidence and produce models faster than traditional methods. This potential gain in efficiency, together with the ability to develop multiple models in parallel, can yield testable hypotheses.

The ability to integrate and analyze vast datasets with AIML techniques holds promise for uncovering novel insights and therapeutic targets in various fields of medicine. These AIML advances can be applied to most complex diseases, not just oncology. For instance, neurodegenerative diseases like Alzheimer’s disease present similar challenges due to their multifactorial nature and the interplay between genetic and environmental factors.

Recognizing the potential of AIML in complex disease biology modeling, we integrated a set of 17 different resources focused on expression data, pathways, functional terms, and phenotypic information with XGBoost (18), an optimized gradient boosting (machine learning) algorithm, and MetaPath (19), a feature-extraction technique, to seek novel genes associated with Alzheimer’s disease (20). Of the top-20 ML-predicted genes not previously associated with Alzheimer’s pathology, five were experimentally confirmed using multiple methods. The same set of integrated resources, combined with MetaPath and XGBoost, resulted in the temporally validated identification of seven top-20 and two bottom-20 genes associated with autophagy (21).
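To make the idea of metapath-based feature extraction more concrete, the following is a minimal sketch, not the MetaPath implementation used in the cited studies. It assumes a toy knowledge graph with a single path type (gene to pathway to labeled gene); the gene names, pathway names, and the function `metapath_counts` are hypothetical placeholders. Counts of this kind could then serve as input features for a gradient boosting classifier.

```python
from collections import defaultdict

# Hypothetical toy annotations (placeholders, not data from the cited studies).
GENE_TO_PATHWAYS = {
    "GENE_A": {"P1", "P2"},
    "GENE_B": {"P2"},
    "GENE_C": {"P3"},
    "GENE_D": {"P1", "P2", "P3"},
}
KNOWN_DISEASE_GENES = {"GENE_A", "GENE_C"}


def metapath_counts(gene_to_pathways, labeled_genes):
    """Count gene -> pathway -> labeled-gene paths for every gene."""
    # Invert the annotation map so each pathway lists its member genes.
    pathway_to_genes = defaultdict(set)
    for gene, pathways in gene_to_pathways.items():
        for pathway in pathways:
            pathway_to_genes[pathway].add(gene)

    counts = {}
    for gene, pathways in gene_to_pathways.items():
        total = 0
        for pathway in pathways:
            # Exclude the gene itself so that its own label never
            # contributes to its own feature value.
            neighbors = pathway_to_genes[pathway] - {gene}
            total += len(neighbors & labeled_genes)
        counts[gene] = total
    return counts


features = metapath_counts(GENE_TO_PATHWAYS, KNOWN_DISEASE_GENES)
for gene in sorted(features):
    print(gene, features[gene])
```

In this toy graph, a gene that shares several pathways with labeled genes receives a higher count than a gene that shares none. A real workflow would compute many such path types across all integrated resources.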

Building on our success in Alzheimer’s and autophagy research, we used this integrated approach (the above dataset and algorithms) to develop 41 distinct blood cancer AIML models starting from primary tumor type and histology (22). We contrasted 725 cancer-specific genes curated in the COSMIC cancer gene census, serving as the positive set, with 440 manually curated housekeeping genes that served as the negative set. The 41 AIML models identified the expected “frequent hitters,” such as GAPDH, AKT1, HRAS, TLR4, and TP53, all having well-understood roles in cancer. Other genes, such as IRAK3, EPHB1, ITPKB, ACVR2B, and CAMK2D, were predicted to be relevant in 10 or more hematologic malignancies. In contrast, some genes were associated with just one cancer: for example, LPAR5, GPR18, and FCER2 are predicted to be relevant only in primary bone diffuse large B cell lymphoma (22). Cell-based validation studies for some of these genes are ongoing.
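The distinction between genes predicted across many models and genes predicted in only one can be illustrated with a short sketch. This is an assumption about how such a summary could be computed, not the procedure used in the cited work (22). The model names, gene names, and the function `summarize_hits` are hypothetical, and the toy threshold of 2 stands in for the "10 or more" of 41 models mentioned above.

```python
from collections import Counter


def summarize_hits(predictions_by_model, broad_threshold):
    """Split predicted genes into broadly recurring and single-model hits."""
    model_counts = Counter()
    for genes in predictions_by_model.values():
        # Count each gene at most once per model.
        model_counts.update(set(genes))
    broad = sorted(g for g, n in model_counts.items() if n >= broad_threshold)
    specific = sorted(g for g, n in model_counts.items() if n == 1)
    return model_counts, broad, specific


# Hypothetical toy output of three models (placeholder names only).
toy_predictions = {
    "model_1": ["GENE_A", "GENE_B", "GENE_C"],
    "model_2": ["GENE_A", "GENE_B"],
    "model_3": ["GENE_A", "GENE_D"],
}

counts, broad, specific = summarize_hits(toy_predictions, broad_threshold=2)
print("recurring across models:", broad)
print("predicted in one model only:", specific)
```

With real predictions, the recurring list would correspond to the "frequent hitters" and the single-model list to cancer-specific candidates.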

Although AI-based target selection in oncology primarily relies on gene-phenotype association models, it also offers other potential applications: 1) processing oncology biomarkers for therapeutic targeting; 2) enhancing the understanding of gene variants of uncertain significance (VUS) through in-depth context and real-world evidence; and 3) improving the interpretation of animal and other preclinically validated models by incorporating human pathology and physiology.

Challenges and limitations of AIML technologies include: 1) data and information quality, where the maxim “garbage in, garbage out” underscores the importance of data veracity; 2) model interpretability, which is increasingly addressed through “explainable AI” to ensure that AIML models can be interpreted by humans and can aid decision-making in research and clinical development; and 3) awareness of data bias and leakage as well as ethical considerations, to prevent discriminatory practices and ensure fairness in model development.
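Two simple safeguards against label conflicts and data leakage can be sketched as follows. This is an illustrative assumption, not a procedure described in the cited studies: the helper names `check_label_sets` and `assign_fold` and all gene names are hypothetical. The first helper flags genes that appear in both the positive and negative sets; the second assigns each gene to a cross-validation fold deterministically, so that every record for a given gene lands in the same fold.

```python
import hashlib


def check_label_sets(positives, negatives):
    """Return genes present in both label sets (should be empty)."""
    return sorted(set(positives) & set(negatives))


def assign_fold(gene_symbol, n_folds=5):
    """Map a gene symbol to a fold index in a reproducible way."""
    digest = hashlib.sha256(gene_symbol.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_folds


# Hypothetical placeholder gene names.
positive_set = ["GENE_A", "GENE_B", "GENE_X"]
negative_set = ["GENE_H1", "GENE_H2", "GENE_X"]

conflicts = check_label_sets(positive_set, negative_set)
print("genes labeled both positive and negative:", conflicts)

folds = {gene: assign_fold(gene) for gene in positive_set + negative_set}
print(folds)
```

Splitting by gene identifier rather than by row is one common way to reduce leakage when the same gene contributes several records; whether it is sufficient depends on how the features were derived.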

The future of target selection in oncology is likely to incorporate AIML technologies. By processing vast datasets more rapidly and efficiently and by offering enhanced context for gene VUS, somatic mutations, and biomolecular pathways, AIML models are poised to improve target identification and validation for common and rare cancers.
