This paper has now been published at PLOS Computational Biology.
“Active degradation of a regulator controls coordination of downstream genes” by Nicholas A. Rossi, Thierry Mora, Aleksandra M. Walczak, and Mary J. Dunlop
https://doi.org/10.1101/272120
In this article, Rossi and colleagues observed how noise develops in the expression of genes downstream of stress response activator gene MarA by manipulating MarA degradation and expression levels and then measuring variability in downstream gene expression in single cells. As the authors note, active degradation is rare in rapidly growing bacterial cells and imposes a significant fitness cost. It is therefore interesting to ask what advantages active degradation can provide beyond faster response times. Rossi and colleagues demonstrate in simulations and experiments that activator degradation influences the rate at which variability in the concentration of a regulatory protein is generated in a growing colony, and the degree to which that generates coordinated variability in two downstream genes.
I selected this article because it introduces a number of experimental and analytical approaches that could be immediately useful to groups performing similar work. I also appreciated that the authors recognized the interdisciplinary nature of the article and introduced helpful analogies and illustrations to communicate topics that will be unfamiliar to many readers. For example, with respect to the importance of coordinating expression of regulated genes:
“Upregulating one gene without the others may incur needless waste. Conceptually, this is like locking both the front door and the back door to a home in order to protect it; locking just one door does not make the home safer and simply expends energy. Thus, coordinating expression of stress response genes is essential.”
I thank the authors for sharing their work on bioRxiv and also for including all supplementary text, figures and movies in the bioRxiv submission.
I solicited reviews from 3 scientists (1 postdoctoral scholar, 1 independent researcher, and 1 professor) with experience both in stochastic modeling and relevant experimental techniques. I thank the reviewers for taking the time to provide such thorough reviews. All reviewers chose to remain anonymous.
The reviewers unanimously requested more details about simulation methods: a clearer description of the algorithm used, its parameters, and the connection between the parameters and the experimental system. All reviewers made additional, valuable suggestions, and the reviewers agreed that the work will be a valuable addition to the field.
Reviewer 1 (Anonymous Reviewer)
In “Active degradation of a regulator controls coordination of downstream genes” the authors present experimental and theoretical approaches to explore the effect of transcription factor lifetime on downstream fluctuations. I appreciate their general approach and like their experiments, but it seems to me that the interpretation and simulations need more careful consideration.
The starting point of their analysis is that phenotypic diversity in a population is useful, but it has to be coordinated, in the sense that for certain genes their fluctuations need to be correlated or anti-correlated, among other things. This is true, and an important point to make, but I think they are both too aggressive and too imprecise in their use of the words. With regard to the “aggressiveness”: while I am on the authors’ side on the issue of the importance of expression changes as an adaptive strategy, a large proportion of biologists still see expression noise as irrelevant or a niche subject. They might even object to referring to phenotypic variation as diversity, especially considering the word’s use in ecology. So phrases like “the rate at which diversity appears in downstream genes” sound either a bit jarring or confusing, as the word would most commonly refer to allelic variation. It might be intentional, to highlight the adaptive usefulness, but they go a bit too far.
This is perhaps a style issue, but it leads into the main point: the imprecision of the definitions. The authors say “We consider diversity to be measurable as the distribution of distinct protein concentrations present in the population”. But what specifically about those distributions? The CV? The entropy? They then say “We used mutual information to quantify increasing coordination of downstream genes”. This is a reasonable measure. But still, is it compared for a fixed total CV? Mutual information could increase because they are increasingly correlated or because their total noise went up. The perils of this imprecision are clearer in Fig. 2A and accompanying text. They say “the longer half-life allows the trajectory of the protein concentration to wander farther away from the mean, increasing the standard deviation”. But this could mean a noise reduction or (as seems to be implied) an increase. In the most basic models of stochasticity in gene expression, a longer protein lifetime reduces the protein noise (while increasing the average and standard deviation) by time averaging the mRNA noise, as the authors point out later in the context of transmitted noise from previous genes.
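For concreteness, the textbook result alluded to here can be written down for the standard two-stage (mRNA, protein) model. This is only a sketch: the symbols are assumptions introduced for illustration and are not taken from the manuscript. Here k_m and k_p are the transcription and per-mRNA translation rates, and \tau_m and \tau_p are the mean mRNA and protein lifetimes.

```latex
\langle m \rangle = k_m \tau_m, \qquad
\langle p \rangle = k_p \tau_p \langle m \rangle, \qquad
\mathrm{CV}_p^2 = \frac{\sigma_p^2}{\langle p \rangle^2}
  = \frac{1}{\langle p \rangle}
  + \frac{1}{\langle m \rangle}\,\frac{\tau_m}{\tau_m + \tau_p}
```

With the synthesis rates held fixed, increasing \tau_p raises the mean and the standard deviation but lowers the CV, because both terms on the right-hand side shrink. Whether "noise" goes up or down therefore depends on which quantity is reported and on what is held constant.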
This problem is exacerbated by the use of the Ornstein-Uhlenbeck process instead of a full Gillespie simulation. In figure 2A it would seem that the noise is increased by increasing the protein lifetime. But the means don’t change (or the normalization is not indicated) as would happen if the protein lifetime was changed. I don’t mean that an Ornstein-Uhlenbeck process can’t be used, but the relation with the actual biochemical parameters should be made explicit to avoid misinterpretations. It is reminiscent of something that happened in the early days of noise studies: the use of the Fano factor instead of the CV led to a few papers with good experiments but the misinterpretation that translation noise was the important one as opposed to mRNA noise. It was only clarified when comparing at constant mean.
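The concern can be illustrated with a minimal sketch of a generic Ornstein-Uhlenbeck process, dx = -(x - mu)/tau dt + sigma dW. This is not the authors' simulation code, and the names mu, sigma, tau and all parameter values are assumptions chosen for illustration. In this parameterization the stationary standard deviation is sigma * sqrt(tau / 2), so a longer correlation time widens the distribution while the mean stays at mu, unlike a birth-death model with a fixed synthesis rate, where the mean scales with the lifetime.

```python
import math
import random


def ou_stationary_std(sigma, tau):
    """Stationary standard deviation of dx = -(x - mu)/tau dt + sigma dW."""
    return sigma * math.sqrt(tau / 2.0)


def simulate_ou(mu, sigma, tau, dt, n_steps, seed=0):
    """Simulate the process using the exact one-step update (no Euler error)."""
    rng = random.Random(seed)
    decay = math.exp(-dt / tau)
    # Standard deviation of the Gaussian increment over one step of length dt.
    step_std = ou_stationary_std(sigma, tau) * math.sqrt(1.0 - decay ** 2)
    x = mu
    path = []
    for _ in range(n_steps):
        x = mu + (x - mu) * decay + step_std * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def mean_and_std(values):
    """Sample mean and (population) standard deviation of a sequence."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, math.sqrt(var)


# Hypothetical parameter values: same mu and sigma, two correlation times.
for tau in (1.0, 4.0):
    path = simulate_ou(mu=1.0, sigma=1.0, tau=tau, dt=0.05, n_steps=100_000)
    m, s = mean_and_std(path)
    print(f"tau={tau}: mean={m:.2f}, std={s:.2f}, "
          f"predicted std={ou_stationary_std(1.0, tau):.2f}, min={min(path):.2f}")
```

Because the process is Gaussian, nothing prevents the trajectory from dropping below zero when the standard deviation is comparable to the mean, which is one reason the mapping onto protein numbers or concentrations needs to be stated explicitly.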
A similar problem arises with figure 4 and the accompanying text: while the point is valid, the analogy needs to be made more precise. What is the biological importance of having the expression be more correlated while it reaches full level? The mutual information they discuss is between the two downstream genes, not between some upstream data and what is encoded in the expression of the downstream genes, so the analogy with the transmission of an image might be misleading.
Having said that, I believe the points they are trying to make are important and timely, in that our advancing knowledge of noise in gene expression requires us to look at subtler points in analyzing the resulting phenotypic diversity. I also agree that it is high time we looked at phenotypic diversity as a basic adaptive trait. In that sense, I especially like how they look at the noise as a function of colony size, because it makes the population aspect more explicit. This is sometimes lost in (standard) approaches that look at distributions as the snapshot of an idealized population that is both constant and growing.
In summary, while this is a valuable paper, it would greatly benefit from a more precise treatment of the terms, particularly “coordinated diversity”, and a clearer connection with biological function.
Minor Points
- It says “Phenotypic diversity arises in isogenic populations due to stochastic expression of genes”. Phenotypic diversity arises not only from stochasticity in gene expression, but also epigenetics and protein activation states.
- The intro changes from general explanation with the questions to be addressed interspersed to a final paragraph that is very succinct and to the point. Either style is fine but together they clash a little. They explain mutual information carefully but mention the Ornstein-Uhlenbeck process casually; in general mathematicians/physicists will know both and biologists neither.
- The cells in the last panels of figure 2H are what makes the point clearest; the authors should make those panels bigger, perhaps by splitting figure 2. The same applies to the cells in 1C: seeing them requires the panel to be too big, so perhaps separate them.
- It’s hard to compare the initial slopes in figures 2E and 3E. Would it be possible to use finer timesteps in the first 10 minutes? I understand this might be technically unfeasible, but especially for the wild type in 2E it looks like it just jumps to a constant level, and since they are looking in detail at the slope of the mutual information it would be useful to be able to compare.
- Why ignore the mRNA noise in the stochastic simulations? The time averaging that is usually assumed to simplify it is of the same order as the effect that is being studied. It also seems strange to compute the mutual information from the correlation.
- In the first sentence of the Variance section, a paper is cited by number but the Bibliography has no numbering.
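On the minor point about computing mutual information from the correlation: for two jointly Gaussian variables with Pearson correlation rho, the mutual information is -1/2 log2(1 - rho^2) bits. The sketch below evaluates that closed form. It assumes joint Gaussianity, which may not hold for measured expression levels, and the function name is hypothetical rather than taken from the manuscript.

```python
import math


def gaussian_mutual_information(rho):
    """Mutual information (bits) of a bivariate Gaussian with correlation rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return -0.5 * math.log2(1.0 - rho ** 2)


# Illustrative correlations only; these are not values from the paper.
for rho in (0.3, 0.6, 0.9):
    print(f"rho={rho}: I = {gaussian_mutual_information(rho):.3f} bits")
```

Under this assumption the mutual information is a function of rho alone and carries no information beyond the correlation coefficient, which is why an estimate made directly from the joint distribution would be a more informative check.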
Reviewer 2 (Anonymous Reviewer)
This is a study that integrates both theoretical and experimental approaches to understand how the half-life of an upstream gene product (e.g. a TF) can impact the rate of coordination and the maximum coordination of the downstream genes that such TF regulates. Of importance, the authors find evidence for a potential trade-off between these two metrics. The study is of relevance for the design of synthetic gene circuits where robust propagation of information and noise control are desired, or in naturally occurring systems where understanding the adaptive advantages of tuning TF half-lives within this trade-off is important.
The manuscript is concise and well written. I particularly appreciate the clarity and use of analogies to make complex concepts accessible to scientists from different backgrounds. However, I was left wanting to see more regarding the model development and stochastic simulation methods in order to properly assess the model’s predictions. The subsection dedicated to this, entitled “Stochastic simulations”, is nothing of the sort. Instead, it is a rather short description of the mathematical model. I would have liked to see how the authors derived the model. Why is there no synthesis term for activator X? Why is this a good model for the biological question being addressed? This is particularly important since the model is actually predicting negative protein concentrations (Figs. 1A,B, 2A, and 2B). This should be amended.
Also, why are concentration units not given in any of the graphs? Would a model considering protein numbers instead of concentrations be more appropriate? I believe it should be made clear whether the model is intended to be qualitative or quantitative, so that its scope and limitations can be fairly assessed by the reader. These modelling details should be included within the main methods section, whereas a description of the actual computer simulation methods (e.g. the particular stochastic simulation algorithm used, numerical integrator, pseudocode, justification for the choice of parameter values, modelling software or programming language package employed, etc.) could have been included as supplementary, but not ignored. These materials are as important for reproducibility as the supplementary information related to plasmids.
Reviewer 3 (Anonymous Reviewer)
In their manuscript “Active degradation of a regulator controls coordination of downstream genes” Rossi and coworkers address the question of how variability in an activator protein, as controlled by active degradation, can result in coordinated expression levels of target genes and their proteins. The authors arrive at the conclusion that two main aspects need to be considered: the rate at which coordination between the target proteins can be attained, and what maximal level of coordination can be attained. These conclusions are based on numerical simulations as well as experiments with bacterial colonies in which MarA acts as an activator of the target proteins inaA and acrAB.
The question addressed by the authors is highly interesting, and a solid mathematical as well as experimental study would be very valuable for the understanding of such a regulatory element in the context of gene regulation. However, the current study fails to analyze the mathematical relationships and regulatory principles to a level that allows systematic insight. Also, no model parameters, or description of how they were obtained, are given. Further, the experimental procedures are lacking crucial controls, and raise significant concerns regarding the proper characterization of the effect of the experimental treatments. Lastly, central statements on the regulatory logic made by the authors are not actually verified from their experimental data.
If so desired, the authors could also broaden the scope by addressing the following aspects. However, even if only the points currently addressed were executed in a reliable manner, this would already be a very valuable contribution. So these are just suggestions.
- Role of induction delay, expression stochasticity, and degradation properties of the target proteins
- Role of regulatory interactions between the downstream targets, such as positive and negative feedbacks
- In the analysis framework, attribute the increase in mutual information more thoroughly to colony size, properties of the activator, and properties of the target genes/proteins
- More detailed analysis of the different layers from which stochasticity arises, i.e. activator expression, activator degradation, target gene induction, target gene expression, target protein translation, target protein degradation
Major concerns
Unclear introduction: While the goal of the study is, in itself, quite clear, its explanation in the introduction confused me more than it helped me to understand what the authors aim to do. In my reading, the authors set out to measure the coordination between target genes of MarA, which they aim to quantify by the measure of mutual information between the expression levels of the target genes’ gene products (the translated and folded proteins). As they seed their colonies from single cells, the mutual information will increase with colony growth and increasing diversity in the colony, but at different rates and with different plateau levels. They then want to assess how these rates and plateau levels depend on the degradation rate of the main activator MarA. I find this a clear and interesting question, and appreciate that the introduction tries to explain this also in non-mathematical terms. However, this goal in my perception is not achieved, as I have to guess at what the authors did instead of the authors telling me exactly what to expect.
Missing model parameters: I could not find any information on simulation parameters, or how these parameters were chosen.
Missing control of off-target effects of an sgRNA with known off-target problems: The authors themselves cite work that has shown problems in the sgRNA-mediated knockdown of Lon. Then they proceed with a check for “qualitative morphological changes”, after which they conclude there are no problems. This is in spite of a detection of differences in growth rate (Fig. S1). This is insufficient and makes the results obtained with this method unreliable in terms of the stated research question. Also, there are no controls for CRISPRi transfection, so we do not know if the transfection or the knockdown is the cause of any experimental changes. This lack of a treatment control is not acceptable.
Missing quantification of endogenous MarA: The authors use the transfection of an additional MarA-CFP to monitor variability in the protein level. This is insufficient to quantify the variability in endogenous protein level, as the variability from plasmid transfection can be vastly different from the variability resulting from differences in endogenous gene expression.
Lack of clarity in the quantification of target protein response: Both target proteins were quantified based on the transfection with plasmids. The variability in plasmid transfection can play a big role here, contributing to or taking away from the mutual information between the reporter proteins. Therefore, this method to quantify coordination in the response seems unreliable.
Unclear explanation of the role of a faster-fluctuating activator: I can only “reverse-engineer” the authors’ logic in the second model analysis, carried out in Fig. 3. I think it simply means that the activator is produced faster but also degraded faster, resulting in the same variance in activator levels but faster fluctuations. The current explanation is not understandable, at least not for me. Also, it is not clear from the writing why this should lead to differences in mutual information level, even though this could be explained quite simply: slower activator fluctuations allow the target proteins to adjust to the activator level, leading to coordination. Lastly, the explanations of the underlying quantitative and regulatory principles are confusing and in some cases so vague that I cannot understand what was done.
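The simple explanation suggested above can be made quantitative with a toy linear model. This is a sketch under stated assumptions, not the authors' model. The activator is taken to be an Ornstein-Uhlenbeck process with unit variance and correlation time tau_activator. Each target relaxes toward the activator level with time constant tau_target and carries its own independent noise of stationary variance intrinsic_var. All names and values are hypothetical. Filtering the activator through a first-order response leaves a shared variance of tau_activator / (tau_activator + tau_target) in each target, which sets the correlation between the two targets.

```python
def target_correlation(tau_activator, tau_target, intrinsic_var):
    """Correlation between two targets driven by one shared activator.

    Toy linear model (an assumption for illustration): the activator has unit
    variance and correlation time tau_activator; each target is a first-order
    filter with time constant tau_target plus independent noise whose
    stationary variance is intrinsic_var.
    """
    # Variance in each target that is inherited from the shared activator.
    shared = tau_activator / (tau_activator + tau_target)
    # The shared part is the covariance; the total adds the independent part.
    return shared / (shared + intrinsic_var)


# Same activator variance, different fluctuation speeds (hypothetical values).
for tau_activator in (0.2, 1.0, 5.0):
    rho = target_correlation(tau_activator, tau_target=1.0, intrinsic_var=0.5)
    print(f"tau_activator={tau_activator}: target correlation = {rho:.3f}")
```

In this sketch, slowing the activator at fixed variance increases the correlation between the targets, which is the effect described in words above. It says nothing about how quickly that correlation builds up in a growing colony.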
No quantification of the activator dynamics in the second main experiment: The authors draw conclusions based on the autocorrelation times in the CFP levels in the second main experiment. As this figures prominently in their argument, it is disappointing to not see an actual quantification of these parameters, especially as the data should allow this analysis.
For the adjustment of the MarA-CFP levels by IPTG, there are no supporting data.
Minor concerns
In Fig. 2H, color overlays are shown to illustrate coordination vs. heterogeneity. I find it close to impossible to interpret these images. By my understanding, we should be shown two grayscale images of the two color channels, and then an overlay with two colors that are more easily distinguishable. Typically magenta and green work best.
Comments on the introduction
- DNA damage repair, gene expression, cell division, … are energetically costly. But I would not list these as “metabolic processes”.
- “it is still comparatively rare in bacteria due to their rapid growth rate, which produces stronger dilution effects.” There are probably also many situations where bacterial fitness is required when bacteria are not growing. Very sweeping statement, and I am unsure how accurate this is.
- I was wondering what is known in terms of an induction delay from MarA protein level increases to actual induction of the target genes? This could play a major role in passing MarA variability to target genes.
- The introduction of mutual information is rather confusing.