Genomics and Artificial Intelligence Technologies

At Produvia, we produce intelligent software. We also write letters about artificial intelligence (AI) to founders, executives, and decision-makers from all industries. These letters are meant to inspire and motivate companies, government agencies, and countries on the topics of AI, machine learning, and deep learning technologies.

At Produvia, we believe that artificial intelligence technologies will fundamentally change how genomics, biotechnology, and life sciences startups and companies turn data into actionable insights.

Before we talk about artificial intelligence, it is important to understand the genomics industry first.

Genomics Industry

The global genomics industry was worth USD 16.4 billion in 2018 and is expected to reach USD 41.2 billion by 2025. The industry consists of genomic products and services. Genomic products are expected to dominate the market because of the recurrent use of instruments and reagents in genomics research and the rising number of research programs undertaken by government and private organizations. Genomic services include next-generation sequencing, core genomics, biomarker translation, and many others. [1]

According to AngelList, there are 160+ genomics startups, 4,228+ biotechnology startups, 9,893+ life sciences startups, and 4,893,827+ startups overall around the world [2-5]. In other words, genomics startups represent about 4 percent of biotechnology startups, 2 percent of life sciences startups, and roughly 0.003 percent of all startups.
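
The percentages above follow from simple division of the AngelList counts. A minimal Python sketch, using only the figures quoted in this section (the variable and function names are illustrative), reproduces them:

```python
# Startup counts quoted above from AngelList (retrieved October 2019).
genomics = 160
totals = {
    "biotechnology": 4_228,
    "life sciences": 9_893,
    "all startups": 4_893_827,
}


def share_percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


shares = {name: share_percent(genomics, total) for name, total in totals.items()}

for name, pct in shares.items():
    print(f"Genomics startups as a share of {name}: {pct:.4f}%")
```

Because the counts are lower bounds ("160+", "4,228+"), the resulting shares are best read as rough estimates rather than exact figures.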

Today, the genomics industry is booming thanks to the increasing amount of data. Over the next 10 years, the volume of genomics data is projected to equal or surpass that of other data-intensive domains, including social media and online video. [6]

Artificial Intelligence in Genomics

At Produvia, we predict that genomic startups that combine deep learning, computer vision, and natural language processing technologies will establish a competitive edge in the marketplace.

Deep learning, a sub-field of artificial intelligence, is combined with computer vision techniques to analyze the growing amount of genomics imagery data. In computer vision, deep learning algorithms that excel include convolutional neural networks and recurrent neural networks. These machine learning models are solving computer vision tasks such as image classification, semantic segmentation, and image retrieval.

Deep learning is also combined with natural language processing techniques to analyze the expanding amount of genomics-related text found in publicly available research papers. Deep neural networks are solving tasks such as named entity recognition, relation extraction, and information retrieval. Deep learning technologies are well suited to natural language processing tasks because they offer state-of-the-art performance and reduce the need for manual feature engineering.

At Produvia, we recognize how complex it is to apply artificial intelligence in the genomics industry. As a result, we wrote this article as a guide for all stakeholders, including patients, research participants, the public, providers, researchers, advocacy groups, payers, and policymakers.

How AI and Genomics Will Save The Planet

In 2015, the United Nations (UN) set seventeen Global Goals, also known as the Sustainable Development Goals (SDGs). The SDGs were adopted by all UN Member States as a universal call to action to end poverty, protect the planet, and ensure that all people enjoy peace and prosperity by 2030. [6]

Of the seventeen SDGs, the Produvia team identified five goals that can be solved with genomics and artificial intelligence technologies.

AI Goal #1: No Poverty

Can we really end poverty? Can we grow the middle class? These are really hard questions to answer. Satellite imagery has been combined with machine learning to predict poverty [7]. Poverty has been linked to disease, chronic illness, childhood obesity, elevated blood lead levels, academic achievement, and DNA methylation [8-12]. How can machine learning help with these genomic causations or correlations? If we can predict disease or DNA methylation across genes, we can take preventative action in the fight against poverty.

AI Goal #2: Zero Hunger

How can humanity end hunger? Can we achieve a stable food supply? Can we end hidden hunger, also known as micronutrient deficiency? Certain hormones regulate hunger and satiety [13]. Hunger can be detected in crying infants using machine learning [14]. Analyzing how people eat, or their consumption patterns, can reveal hidden hunger and micronutrient gaps. Can people improve nutrition and promote sustainable agriculture? To answer these questions, consider that plant breeding and other agricultural technologies are greatly improved by machine learning. Increasing crop yields will close the gap between crop output and hunger. Genetically improving cultivars and improving agronomic practices are two ways to increase crop productivity [15]. If we make agriculture more productive, we can reduce world hunger.

AI Goal #3: Good Health and Well-Being

Can we live healthier lives? Can we promote the well-being of all humanity? Better detection of AIDS, tuberculosis, malaria, and neglected tropical diseases is now possible thanks to deep learning. Imagine being able to create a personalized genomic profile of each person on earth. Knowing where susceptibility lies would allow us to predict disease outbreaks. Humanity has the potential to edit human reproduction. With gene editing, we can create the next generation of humans, who are immune to the latest diseases and common health conditions. Combining gene editing with machine learning will allow humanity to achieve customized genetic and genomic profiles of individuals. If we can better understand how the aging process affects health and longevity, we can create healthier societies. Today, we can use deep learning to detect changes in biomarkers (i.e., physiological variables, composite indices) using data from longitudinal studies.

AI Goal #4: Life Below Water

Can we conserve ocean life? Can humanity use the oceans, seas, and marine resources sustainably? Genomics and machine learning can solve many problems to ensure the continuation of life below water. For example, we can classify ocean acidity to help address declining fish stocks. We can apply conservation genomics with deep learning technologies to predict the biodiversity of living organisms. Can we improve our aquaculture? Over the past few decades, advancements in agricultural biotechnology have changed the way research is analyzed. Today, genomic data is analyzed using a variety of computational tools, including machine learning and deep learning.

AI Goal #5: Life on Land

Can we protect our ecosystems? Can we restore and promote the sustainable use of terrestrial ecosystems, sustainably manage forests, and combat desertification? Lastly, can humanity halt and reverse land degradation and halt biodiversity loss? Understanding complex ecosystems and how genes are affected by the environment is possible thanks to machine learning technologies. Machine and deep learning are being combined with genome-scale metabolic modeling [16]. Machine learning technologies have demonstrated the ability to analyze large, complex biological data. Furthermore, the massive and rapid advancements in both biological data generation and machine learning methodologies are promising for further understanding of genomics and biological data. It’s now possible to classify microbial roles in ecosystems using deep learning [17]. Genomic tools, such as population genomics, meta-omics, and genome editing, can also help restore ecosystems and biodiversity. Meta-omics can improve the assessment and monitoring of restoration outcomes. Gene editing can generate novel genotypes for restoring challenging environments. Using machine learning to analyze population genomics, meta-omics, and genome editing data will aid companies in developing solutions to improve life on earth.

AI Research in Genomics

Artificial intelligence research is driving technological breakthroughs across all industry verticals, genomics included. Reading academic papers takes time, and the technical language is not easy to understand. At Produvia, we keep up to date with the latest academic research papers so you don’t have to. Below, we highlight 20 AI and machine learning use cases for genomics [18-26]:

Genomics

Genomics is an interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes. Here are five AI and machine learning applications for genomics:

  1. Extract genomic and epigenomic variants of clinical utility

  2. Identify genes

  3. Predict genomic associations

  4. Predict protein functions

  5. Predict the sequence specificity of DNA- and RNA-binding proteins
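
To make application 5 more concrete, the sketch below shows the basic operation that sequence-specificity models build on: one-hot encoding a DNA sequence and sliding a scoring matrix along it, which is what a single convolutional filter computes. Everything here is an assumption for illustration: the function names, the toy motif weights, and the example sequence are hypothetical, and real models learn their weights from binding data rather than using hand-set values.

```python
BASES = "ACGT"


def one_hot(sequence):
    """Encode a DNA string as a list of 4-element vectors (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in sequence.upper()]


def scan(sequence, weights):
    """Slide a motif weight matrix along the sequence.

    weights has one row per motif position and one column per base.
    Returns one score per window, like a 1-D convolution without padding.
    """
    encoded = one_hot(sequence)
    width = len(weights)
    scores = []
    for start in range(len(encoded) - width + 1):
        window = encoded[start:start + width]
        score = sum(
            w * x
            for w_row, x_row in zip(weights, window)
            for w, x in zip(w_row, x_row)
        )
        scores.append(score)
    return scores


# Hypothetical hand-set weights that favour the motif "TATA".
toy_weights = [
    [0.0, 0.0, 0.0, 1.0],  # T
    [1.0, 0.0, 0.0, 0.0],  # A
    [0.0, 0.0, 0.0, 1.0],  # T
    [1.0, 0.0, 0.0, 0.0],  # A
]

scores = scan("GCTATAGC", toy_weights)
best_start = max(range(len(scores)), key=scores.__getitem__)
print(best_start, scores[best_start])
```

The highest-scoring window marks where the toy motif matches best. A deep learning model would stack many such filters and learn their weights from data instead of fixing them by hand.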

Regulatory Genomics

Regulatory genomics is the study of genomic regions or features and how they regulate genes. At Produvia, we list five AI and machine learning applications for regulatory genomics:

  1. Classify gene expression

  2. Predict gene expression from genotype

  3. Predict promoters and enhancers

  4. Predict splicing

  5. Predict transcription factors and RNA-binding proteins

Functional Genomics

Functional genomics is the field of molecular biology that attempts to describe gene functions and interactions. Here are five AI applications for functional genomics:

  1. Classify mutations and functional activities

  2. Classify subcellular localization

  3. Predict promoters and enhancers

  4. Predict splicing

  5. Predict transcription factors and RNA-binding proteins

Structural Genomics

Structural genomics is the field of genomics that involves the characterization of genome structures. At Produvia, we list five AI and machine learning applications for structural genomics:

  1. Classify protein tertiary structures

  2. Classify structures of proteins

  3. Predict contact maps

  4. Predict physical properties

  5. Predict protein secondary structures

AI Ideas for Genomics

You’re interested in artificial intelligence and machine learning, but don’t know where to start. At Produvia, we brainstormed several ideas for the application of artificial intelligence technologies in genomics. Here are thirty-five AI ideas for genomics:

  1. Annotate genes based on structure and chromosomes

  2. Classify cancer from gene expression profiles

  3. Classify genes

  4. Classify genomic profiles

  5. Classify mutation types

  6. Design targeted therapies

  7. Detect deoxyribonucleic acid regions that are predictive of gene expression

  8. Determine relationships between genotypes and phenotypes

  9. Discover drugs for genomic medicine

  10. Distinguish between cancer and adenoma

  11. Estimate prevalence for chromatin marks

  12. Extract transcriptome patterns

  13. Identify biomarkers for a disease

  14. Identify enhancers

  15. Identify pairwise variable associations between genomic data types

  16. Identify positioned nucleosomes

  17. Identify potentially valuable disease biomarkers

  18. Identify promoters

  19. Identify subtype of breast cancer tumor

  20. Identify transcription factor binding sites

  21. Identify transcription start sites, splice sites, and exons

  22. Interpret regulatory control in single cells

  23. Model regulatory elements

  24. Partition and label the genome with chromatin state annotation

  25. Predict chromatin marks from deoxyribonucleic acid sequences

  26. Predict disease phenotype or prognosis

  27. Predict gene function

  28. Predict genetic interactions

  29. Predict protein backbones from protein sequences

  30. Predict regulatory functions and relationships

  31. Predict the sequence specificity of enhancers and cis-regulatory regions

  32. Predict the specificities of deoxyribonucleic acid-binding and ribonucleic acid-binding proteins

  33. Predict the splicing activity of individual exons

  34. Predict variant deleteriousness

  35. Quantify effects of single nucleotide variants on chromatin accessibility
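
As a rough illustration of idea 2 (classifying cancer from gene expression profiles), the sketch below implements a nearest-centroid classifier in plain Python. It is a deliberately simple stand-in, not a deep learning model and not a description of Produvia's methods; the expression values, labels, and function names are made up for illustration.

```python
import math


def centroid(profiles):
    """Average a list of equal-length expression profiles gene by gene."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]


def fit(profiles, labels):
    """Compute one centroid per class label."""
    by_label = {}
    for profile, label in zip(profiles, labels):
        by_label.setdefault(label, []).append(profile)
    return {label: centroid(group) for label, group in by_label.items()}


def predict(centroids, profile):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], profile))


# Made-up expression levels for three hypothetical genes per sample.
train_profiles = [
    [8.1, 1.2, 5.0],
    [7.9, 0.8, 5.4],
    [2.0, 6.5, 5.1],
    [2.4, 7.1, 4.9],
]
train_labels = ["tumor", "tumor", "normal", "normal"]

model = fit(train_profiles, train_labels)
prediction = predict(model, [7.5, 1.5, 5.2])
print(prediction)
```

Real expression datasets have thousands of genes and far fewer samples than features, so a practical system would also need normalization, feature selection, and held-out validation.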

Challenges and Opportunities in Genomics

The use of artificial intelligence technologies to solve genomics problems poses many challenges. These industry challenges also present opportunities for AI technology providers, such as Produvia, to solve market problems and create AI solutions. Below, we list three genomics challenges:

  1. Generating ground-truth labels or genomics datasets can be expensive

  2. “Right to an explanation” laws must be addressed

  3. Longitudinal studies are required

How can AI companies overcome these challenges? At Produvia, we believe that industry collaboration will overcome Challenge #1, algorithmic transparency will overcome Challenge #2, and long-term research projects will overcome Challenge #3.

Conclusion

The combination of artificial intelligence technologies and genomics has the potential to end poverty, end hunger, and protect, restore, and promote aquatic and terrestrial ecosystems.

Next Step

Are you interested in solving genomics problems?

Schedule a discovery call with Slava Kurilyak, Founder/CEO at Produvia.

Slava Kurilyak helps purpose-driven organizations to increase revenue and decrease expenses by developing artificial intelligence solutions that drive impact.

At Produvia, we serve companies with $1+ million in revenue to accelerate the development of artificial intelligence technologies.

References

  1. Zion Market Research. (2019). Global Genomics Market Will Reach USD 41.2 Billion By 2025: Zion Market Research. GlobeNewswire News Room. Retrieved 1 September 2019, from https://www.globenewswire.com/news-release/2019/04/10/1801776/0/en/Global-Genomics-Market-Will-Reach-USD-41-2-Billion-By-2025-Zion-Market-Research.html

  2. Genomics Startups. (2019, October 26). Retrieved October 26, 2019, from AngelList website: https://angel.co/genomics-2

  3. Biotechnology Startups. (2019, October 26). Retrieved October 26, 2019, from AngelList website: https://angel.co/biotechnology

  4. Life Sciences Startups. (2019, October 26). Retrieved October 26, 2019, from AngelList website: https://angel.co/life-sciences

  5. All Startups. (2019, October 26). Retrieved October 26, 2019, from AngelList website: https://angel.co/all-markets

  6. dpicampaigns. (2018). About the Sustainable Development Goals — United Nations Sustainable Development. Retrieved October 26, 2019, from United Nations Sustainable Development website: https://www.un.org/sustainabledevelopment/sustainable-development-goals/

  7. Jean, N., Burke, M., Xie, M., Davis, W. M., Lobell, D. B., & Ermon, S. (2016). Combining satellite imagery and machine learning to predict poverty. Science, 353(6301), 790–794. https://doi.org/10.1126/science.aaf7894

  8. Global genomics disparities in the wake of personalised medical services: International Journal of Medical Engineering and Informatics: Vol 1, No 4. (2009). Retrieved October 27, 2019, from International Journal of Medical Engineering and Informatics website: https://www.inderscienceonline.com/doi/abs/10.1504/IJMEI.2009.026812

  9. Newacheck, P. W. (1994). Poverty and Childhood Chronic Illness. Archives of Pediatrics & Adolescent Medicine, 148(11), 1143. https://doi.org/10.1001/archpedi.1994.02170110029005

  10. Chokshi, D. A. (2018). Income, Poverty, and Health Inequality. JAMA, 319(13), 1312. https://doi.org/10.1001/jama.2018.2521

  11. Wexler, B. E., Imal, Ahmet Esat, Pittman, B., & Bell, M. D. (2019). Executive Function Deficits Mediate Effects of Poverty on Academic Achievement: An Important Target for Interventions to Enhance Neurocognitive Development in At-Risk Children. Retrieved October 27, 2019, from Ssrn.com website: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3369774

  12. Poverty leaves a mark on our genes. (2019). Retrieved October 26, 2019, from Northwestern.edu website: https://news.northwestern.edu/stories/2019/04/poverty-leaves-a-mark-on-our-genes/

  13. Mesmar, B., & Steinle, N. (2020). Genomics of Eating Behavior and Appetite Regulation. Principles of Nutrigenetics and Nutrigenomics, 159–165. https://doi.org/10.1016/b978-0-12-804572-5.00020-3

  14. Barajas-Montiel, S. E., & Reyes-Garcia, C. A. (2019). Identifying Pain and Hunger in Infant Cry with Classifiers Ensembles. International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC’06). https://doi.org/10.1109/cimca.2005.1631561

  15. Borrill, P., Harrington, S. A., & Uauy, C. (2018). Applying the latest advances in genomics and phenomics for trait discovery in polyploid wheat. The Plant Journal. https://doi.org/10.1111/tpj.14150

  16. Zampieri, G., Vijayakumar, S., Yaneske, E., & Angione, C. (2019). Machine and deep learning meet genome-scale metabolic modeling. PLOS Computational Biology, 15(7), e1007084. https://doi.org/10.1371/journal.pcbi.1007084

  17. Handley, K. M. (2019). Determining Microbial Roles in Ecosystem Function: Redefining Microbial Food Webs and Transcending Kingdom Barriers. MSystems, 4(3). https://doi.org/10.1128/msystems.00153-19

  18. Akdemir, D. (2013). Locally epistatic genomic relationship matrices for genomic association, prediction and selection. arXiv.org. Retrieved 18 September 2019, from https://arxiv.org/abs/1302.3463

  19. Hoadley, E. (2011). Joint and individual variation explained (JIVE) for integrated analysis of multiple data types. ArXiv E-Prints, arXiv:1102.4110. Retrieved from https://ui.adsabs.harvard.edu/abs/2011arXiv1102.4110L/abstract

  20. Wikipedia Contributors. (2019, October 23). Sustainable Development Goals. Retrieved October 26, 2019, from Wikipedia website: https://en.wikipedia.org/wiki/Sustainable_Development_Goals

  21. Deep Learning in Medical Image Analysis. (2019). @AnnualReviews. Retrieved 7 October 2019, from https://www.annualreviews.org/doi/10.1146/annurev-bioeng-071516-044442

  22. SDGs .:. Sustainable Development Knowledge Platform. (2015). Retrieved October 26, 2019, from Un.org website: https://sustainabledevelopment.un.org/topics/sustainabledevelopmentgoals

  23. Deep learning for genomics. (2018). Nature Genetics, 51(1), 1–1. doi:10.1038/s41588-018-0328-0

  24. Xiong, M., & Ma, L. (2013). An Efficient Sufficient Dimension Reduction Method for Identifying Genetic Variants of Clinical Significance. arXiv.org. Retrieved 18 September 2019, from https://arxiv.org/abs/1301.3528

  25. Kwak, G. H.-J., & Hui, P. (2019). DeepHealth: Deep Learning for Health Informatics. Retrieved October 28, 2019, from arXiv.org website: https://arxiv.org/abs/1909.00384

  26. Dinalankara, W., & Bravo, H. (2013). Anomaly Classification with the Anti-Profile Support Vector Machine. arXiv.org. Retrieved 18 September 2019, from https://arxiv.org/abs/1301.3514