Multi-omics analysis of CHO cell lines
SUPERVISOR: Nicole Borth
Background.
Over the last 10 years, our group and others have collected big data on several levels of cellular
information across a wide array of CHO cell lines, including genome sequences of multiple cell lines,
epigenetic marks such as DNA-methylation and histone modifications, transcriptome and proteome
analysis. Few studies have been comprehensive in the sense that all of these were analysed from the same
experiment, so that a full integration of multiple layers of cellular information would be possible. In
addition, in most of these experiments only the data set generated in that experiment was analysed,
without a global view across the many different CHO cell lineages and subclones.
For example, the typical situation for a transcriptome analysis is that a handful of cell lines or culture
conditions/culture states are being compared to look for genes or pathways that are differentially
expressed reflecting the specific situation of the cell in one of the investigated conditions or connected to
a behavior of interest, such as high productivity. Given the high variation of different CHO cell lineages
and subclones and the variation in transcriptomes that ensues due to culture, process or medium changes,
the results are always limited to the specific cell line or condition investigated and thus cannot be
universally applied or extrapolated. In addition, even for the same purpose and under the same condition,
most cellular pathways are redundant: there are several ways to achieve the same goal, and cells
are able to use all of them. Therefore, most studies that investigated genes related to increased cellular
productivity identified similar pathways as affected, but rarely the same gene(s). This is further complicated by
the fact that the precise expression level of multiple genes, not just which genes are turned on or off,
may be involved in determining phenotypes. Just as in the Magic Eye 3D image below, the patterns that
underlie cellular properties will remain hidden amongst the complexity of the available data sets and will
require innovative thinking and tools to be made visible.
Aims and methods.
The main objectives of this project are:
- To obtain a comprehensive overview of both coding and non-coding genes expressed across the variety of CHO cell lines and lineages, using all in-house and publicly available datasets
- To develop new analysis pipelines, either by combining available scripts and programs or by writing new ones, that integrate multiple layers of regulation, for instance by connecting DNA-methylation patterns with the resulting transcriptome, or the transcriptome and miRNome with the resulting proteome
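As a rough illustration of what connecting two layers could look like, the sketch below correlates promoter methylation with expression for each gene across the samples that have both measurements. It is a minimal example under stated assumptions, not the pipeline to be developed in this project: the data layout (gene -> sample -> value), the gene and sample names, the toy values, the choice of Spearman rank correlation and the `min_samples` cutoff are all hypothetical.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; None if either vector has zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return None
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def layer_correlation(methylation, expression, min_samples=3):
    """Correlate two layers per gene, using only samples present in both."""
    result = {}
    for gene in methylation.keys() & expression.keys():
        shared = sorted(methylation[gene].keys() & expression[gene].keys())
        if len(shared) < min_samples:
            continue  # too few matched samples to say anything
        m = [methylation[gene][s] for s in shared]
        e = [expression[gene][s] for s in shared]
        result[gene] = spearman(m, e)
    return result


# Hypothetical toy data: promoter methylation fraction and expression level.
methylation = {
    "gene1": {"s1": 0.1, "s2": 0.5, "s3": 0.9, "s4": 0.3},
    "gene2": {"s1": 0.2, "s2": 0.8},
}
expression = {
    "gene1": {"s1": 100.0, "s2": 40.0, "s3": 5.0, "s4": 70.0},
    "gene2": {"s1": 10.0, "s2": 12.0},
}

correlations = layer_correlation(methylation, expression)
for gene, rho in sorted(correlations.items()):
    print(gene, rho)
```

Matching samples by identifier before correlating matters here because, as noted in the background, the different layers were rarely measured in the same experiment; genes with too few matched samples are skipped rather than reported.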
Currently, genome sequences for >20 cell lines are available, along with DNA-methylation patterns for 12 cell
lines/states, approx. 400 transcriptome data sets including coding and long non-coding RNAs, and
proteome data for 20 samples. Approximately two thirds of these data sets were generated in house, so
we also have detailed information about cell behavior and process performance. We also have a
comprehensive dataset for an industrial fed-batch process including all of the above analyses plus detailed
metabolic characterization (generated in collaboration within the eCHO systems ITN network).
Thus, we can now aim to apply existing and/or develop new pipelines for analysis of big data sets such as
these, to extract seemingly hidden players and complex interactions between genes that control relevant
cellular properties and stress responses, so as to identify new engineering targets or to control cellular
behavior for improved process performance. For instance, the majority of highly expressed genes seem
to be expressed across all samples investigated so far, albeit at different expression levels. Thus it
seems that the diversity in phenotypes observed in CHO cells is mainly driven by the individual pattern of
expression levels across this set of genes, rather than by new genes being turned on or off. This has
important implications for the design of engineering strategies, as it would entail being able to control
multiple genes in a finely tuned and individually adjusted manner, rather than the crude overexpression
of one gene employed so far.
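The distinction drawn above, between genes that are switched on or off and genes that are always expressed but at varying levels, can be made concrete with a small sketch. The example below is illustrative only: the gene names, the toy expression values, the expression cutoff of 1.0 and the use of the coefficient of variation as a measure of level variation are assumptions, not values or methods taken from our datasets.

```python
from statistics import mean, pstdev

# Hypothetical toy expression matrix (e.g. normalised counts): gene -> value per sample.
expression = {
    "geneA": [120.0, 60.0, 240.0, 90.0],  # always expressed, strongly varying level
    "geneB": [15.0, 14.0, 16.0, 15.0],    # always expressed, stable level
    "geneC": [0.0, 35.0, 0.2, 40.0],      # switched on in some samples only
    "geneD": [0.1, 0.0, 0.3, 0.0],        # never expressed
}

EXPRESSED_THRESHOLD = 1.0  # assumed cutoff for calling a gene "expressed"


def classify_gene(values, threshold=EXPRESSED_THRESHOLD):
    """Label a gene by whether it is expressed in all, none or some samples."""
    n_on = sum(v >= threshold for v in values)
    if n_on == len(values):
        return "always_on"
    if n_on == 0:
        return "always_off"
    return "switching"


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if the mean is 0)."""
    m = mean(values)
    return pstdev(values) / m if m > 0 else 0.0


def summarize(matrix):
    """Return gene -> (on/off label, coefficient of variation across samples)."""
    return {
        gene: (classify_gene(values), coefficient_of_variation(values))
        for gene, values in matrix.items()
    }


summary = summarize(expression)
for gene, (label, cv) in summary.items():
    print(f"{gene}\t{label}\tCV={cv:.2f}")
```

In such a summary, genes labelled `always_on` with a high coefficient of variation would be the candidates for level-driven phenotype differences, whereas `switching` genes correspond to the classical on/off picture.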
Further, the rules and connections between the different layers of regulation have not been investigated
in full detail yet, due to the lack of comprehensive studies that collect complete datasets (understandably
due to the cost of such endeavours). Given the rich datasets available in our team, we can begin to
implement the first steps and pipelines for such an integrated type of analysis.