Data-driven Optimisation of Protein Production for Industrial Biotechnology


Introduction

The choice of synonymous codons used to encode a protein is known to significantly affect recombinant protein expression levels in engineered organisms, and tuning that choice is referred to as codon optimisation. The most common codon optimisation strategies are typically based on the relative abundance of each codon in the genes of the recipient manufacturing host. They are rather simplistic and are not directly linked to high expression. Ingenza aims to improve on these unreliable strategies by using AI/machine learning to better predict which codons are best suited to high protein expression. The company anticipates that this capability will become a routine customer requirement and will benefit the future Scottish economy.
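
To illustrate the conventional "relative abundance" strategy described above, the following is a minimal sketch, not Ingenza's method. It counts codon usage in a set of host coding sequences and then encodes a protein using the single most frequent codon for each amino acid. The toy host genes and the helper names (codon_usage, most_frequent_codons, back_translate) are hypothetical. The genetic code is the standard one (NCBI translation table 1).

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1), codons listed in TCAG order.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def codon_usage(host_genes):
    """Count in-frame codons across host coding sequences."""
    counts = Counter()
    for gene in host_genes:
        gene = gene.upper()
        # Step through complete codons only; a trailing partial codon is ignored.
        for i in range(0, len(gene) - len(gene) % 3, 3):
            codon = gene[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases such as N
                counts[codon] += 1
    return counts


def most_frequent_codons(usage):
    """Map each amino acid to its most abundant codon in the host."""
    best = {}
    for codon in sorted(usage):  # sorted order makes tie-breaking deterministic
        aa = CODON_TABLE[codon]
        if aa not in best or usage[codon] > usage[best[aa]]:
            best[aa] = codon
    return best


def back_translate(protein, best):
    """Encode a protein using one preferred codon per amino acid.

    Raises KeyError if an amino acid has no observed codon in the host data.
    """
    return "".join(best[aa] for aa in protein.upper())


# Hypothetical toy "host genes"; a real usage table would be built from the host genome.
host_genes = ["ATGGCTGCTAAATAA", "ATGGCCAAGAAATAA"]
usage = codon_usage(host_genes)
best = most_frequent_codons(usage)
print(back_translate("MAK*", best))  # ATGGCTAAATAA
```

Because every occurrence of an amino acid receives the same codon, this approach ignores sequence context entirely. That limitation is why the text describes such strategies as simplistic and not directly linked to expression.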


Challenge

The objective of this project was to increase the level of recombinant protein expression and secretion in Pichia pastoris.


Solution

IBioIC funding enabled Ingenza to draw on the bioinformatics expertise of Dr Chris Wood, University of Edinburgh, to supplement its own knowledge of machine learning. At the same time, Ingenza generated focused DNA libraries designed to maximise the training value of smaller biological data sets.
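
The library design method used in the project is not described here. The sketch below shows one generic way to keep a small library as diverse as possible. It uses greedy farthest-point (max-min) selection: the first candidate is kept, and each subsequent pick is the candidate whose minimum Hamming distance to the already-selected set is largest. The candidate pool, the choice of Hamming distance as the diversity measure, and the helper names are all hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def select_diverse(candidates, k):
    """Greedy max-min (farthest-point) selection of up to k diverse sequences.

    Starts from the first candidate, then repeatedly adds the candidate whose
    minimum Hamming distance to the already-chosen set is largest.
    """
    if not candidates or k <= 0:
        return []
    chosen = [0]
    # Distance from every candidate to its nearest chosen sequence.
    min_dist = [hamming(c, candidates[0]) for c in candidates]
    while len(chosen) < min(k, len(candidates)):
        remaining = (i for i in range(len(candidates)) if i not in chosen)
        # max() returns the first maximum, so ties go to the earlier candidate.
        nxt = max(remaining, key=lambda i: min_dist[i])
        chosen.append(nxt)
        # Update nearest-chosen distances with the newly added sequence.
        min_dist = [min(d, hamming(c, candidates[nxt]))
                    for d, c in zip(min_dist, candidates)]
    return [candidates[i] for i in chosen]


# Hypothetical pool of equal-length candidate coding sequences.
pool = ["AAAA", "AAAT", "TTTT", "AATT"]
print(select_diverse(pool, 3))  # ['AAAA', 'TTTT', 'AATT']
```

With a fixed library size, spreading the sequences apart in this way avoids spending measurements on near-duplicates. That goal is consistent with maximising the training value of a small data set.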

The project partners used AI/machine learning to identify the most effective DNA sequences for protein expression and secretion. Ingenza generated libraries of DNA sequences encoding secreted proteins and used bioinformatics to maximise sequence diversity within a given library size. Secreted protein levels were then measured using a combination of Pichia surface display and fluorescent antibody staining. The library was sorted by fluorescence-activated cell sorting (FACS) and sequenced, and the sequence data were used to develop predictive algorithms that identify the best DNA sequences for effective expression and secretion.
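
The predictive algorithms developed in the project are not specified. As a simple, hypothetical illustration of how FACS-sorted sequence data can be turned into a predictor, the sketch below makes two assumptions. First, it assumes the library was split into a high-fluorescence bin and a low-fluorescence bin. Second, it computes a per-codon log2 enrichment between the two bins, adding pseudocounts so that codons absent from one bin do not produce infinite values. New sequences are then scored by the mean enrichment of their codons. The bin layout, the function names and the toy reads are assumptions. This is a sketch, not the partners' model.

```python
import math
from collections import Counter
from itertools import product

ALL_CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_counts(seqs):
    """Count in-frame codons over a set of sequenced library members."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):  # skip codons with ambiguous bases
                counts[codon] += 1
    return counts


def codon_enrichment(high_seqs, low_seqs, pseudocount=1.0):
    """Log2 ratio of each codon's frequency in the high vs low sorted bin."""
    high, low = codon_counts(high_seqs), codon_counts(low_seqs)
    n = len(ALL_CODONS)
    high_total = sum(high.values()) + pseudocount * n
    low_total = sum(low.values()) + pseudocount * n
    return {
        c: math.log2(((high[c] + pseudocount) / high_total)
                     / ((low[c] + pseudocount) / low_total))
        for c in ALL_CODONS
    }


def score(seq, enrichment):
    """Mean codon enrichment of a sequence; higher suggests better expression."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    codons = [c for c in codons if c in enrichment]
    return sum(enrichment[c] for c in codons) / len(codons) if codons else 0.0


# Hypothetical reads from the high- and low-fluorescence FACS bins.
high_bin = ["GCTGCTGCT", "GCTGCTAAA"]
low_bin = ["GCCGCCGCC", "GCCGCCAAA"]
enr = codon_enrichment(high_bin, low_bin)
print(score("GCTGCTGCT", enr) > score("GCCGCCGCC", enr))  # True
```

This additive score ignores interactions between positions. A real model trained on project data would likely use richer sequence features, but the workflow is the same: measured, sorted and sequenced variants become labelled training examples.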


Outcome

The project strengthened the relationship between the project partners and highlighted the synergies of expertise which exist between them.

The data generated from RFP library sequencing will allow Ingenza to better predict optimal DNA coding sequences for expression and secretion in Pichia, which will help the company secure future projects and customers.