Learning ecosystem-scale dynamics from microbiome data with MDSINE2 - Nature Microbiology

Open-source computational tool for learning microbiome dynamics

To facilitate inference of accurate and interpretable large-scale dynamical systems models from microbiome timeseries data, we developed MDSINE2 (Fig. 1). Inputs to the open-source software package are: (1) timeseries measurements of bacterial abundances in the form of counts (for example, 16S rRNA gene amplicon or shotgun metagenomics data), (2) total bacterial concentrations (for example, from 16S rRNA gene qPCR measurements) and (3) associated metadata for the samples. The software also provides a variety of tools for interpreting the model that it learns from data, including plotting trajectories of taxa, analysing topological properties of the interaction network, quantitating the predicted ecological importance of individual taxa or modules ('keystoneness') and formally assessing the stability of the microbial ecosystem (Fig. 1e-g).

MDSINE2 uses a probabilistic machine learning model based on generalized Lotka-Volterra (gLV) dynamics, with several key innovations over state-of-the-art methods. First, MDSINE2 employs a fully Bayesian probability model that explicitly models the measurement uncertainty associated with microbiome sequencing and bacterial concentrations (Fig. 1d). An advantage of this approach is that MDSINE2 provides quantitative measures of uncertainty (Bayes factors, which are a Bayesian alternative to P values) for all model parameters that can be used to interpret the confidence of predictions and prioritize downstream analyses. Second, MDSINE2 includes stochastic effects in dynamics to capture random fluctuations in microbial trajectories that occur due to unmeasured effects on the ecosystem. Third, MDSINE2 extends the gLV model to automatically learn 'interaction modules' (Fig. 1b), which we define as groups of taxa that share common interaction structure (that is, are promoted or inhibited by the same taxa outside the module) and have a common response to external perturbations (for example, antibiotics). Interaction modules are motivated by both empirical observations that groups of microbial taxa covary and theoretical ecology concepts such as guilds (groups of taxa that utilize resources in a similar way). Modular structure (Fig. 1b,c(ii)) reduces the complexity of the system to be analysed, which increases interpretability and also enables scalability by reducing the number of parameters in the model from order quadratic in the number of taxa (that is, all potential pairwise interactions between taxa in the gLV equations) to order quadratic in the number of modules (which scales logarithmically with the number of taxa). The number of modules is treated probabilistically with full uncertainty quantification and learned from the data, alleviating the need for the user to pre-specify this information. 
See Methods section 'MDSINE2 model' and Supplementary Text 1 for further details on the model. Details on model inference can be found in Methods section 'Case-study model inference' and Supplementary Text 2. Formal sensitivity analysis of model hyperparameters can be found in Supplementary Text 3.
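As a rough illustration of how module structure shrinks the interaction parameterization, the sketch below expands a module-level interaction matrix into a taxon-level one and advances gLV dynamics by one stochastic Euler step in log space. The function names, toy values, noise form and step size are assumptions made for illustration, as is the choice to set within-module interactions to zero; this is not the MDSINE2 implementation, which is described in Methods and Supplementary Text 1.

```python
import math
import random


def expand_module_interactions(assignments, module_matrix, self_interactions):
    """Build a taxon-level interaction matrix from module-level interactions.

    assignments[i] is the module index of taxon i, and module_matrix[p][q] is
    the effect of module q on module p. Taxa in the same module do not
    interact with each other here (a simplifying assumption), apart from
    each taxon's own self-limitation term.
    """
    n = len(assignments)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                A[i][j] = -abs(self_interactions[i])  # self-limitation is negative
            elif assignments[i] != assignments[j]:
                A[i][j] = module_matrix[assignments[i]][assignments[j]]
    return A


def glv_step(x, growth, A, dt, noise_sd=0.0, rng=random):
    """One Euler step of gLV dynamics in log space with optional process noise."""
    new_x = []
    for i, xi in enumerate(x):
        # Per-capita growth rate: r_i + sum_j a_ij * x_j
        rate = growth[i] + sum(A[i][j] * x[j] for j in range(len(x)))
        noise = rng.gauss(0.0, noise_sd * math.sqrt(dt))
        new_x.append(math.exp(math.log(xi) + rate * dt + noise))
    return new_x


# Toy system: 5 taxa in 2 modules, so 2 between-module parameters stand in
# for the 5 * 4 = 20 pairwise interactions of an unconstrained gLV model.
assignments = [0, 0, 1, 1, 1]
module_matrix = [[0.0, -0.02], [0.01, 0.0]]
A = expand_module_interactions(assignments, module_matrix, [0.1] * 5)

rng = random.Random(0)
x = [1.0] * 5
for _ in range(100):
    x = glv_step(x, [0.5] * 5, A, dt=0.01, noise_sd=0.05, rng=rng)
```

Working in log space keeps abundances strictly positive even when the noise term is large, which is one common reason to model multiplicative rather than additive process noise.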

Given the importance of sufficient temporal resolution and perturbations for inferring dynamical systems models from data, we generated two new datasets to serve as benchmarking and analysis resources for the community. The data were generated from two cohorts of 'humanized' germ-free mice (Fig. 2 and Extended Data Fig. 1) that underwent faecal microbiota transplantation from a healthy human donor (n = 4 mice) and a donor with ulcerative colitis (n = 5 mice). After an equilibration period of 3 weeks, mice were subjected to a sequence of three perturbations (high-fat diet (HFD), vancomycin and gentamicin). These perturbations were chosen because they differentially affect components of the microbiome (for example, high-fat/simple-carbohydrate versus complex-carbohydrate utilizers and bacteria susceptible or resistant to different antibiotics). Mice were separately housed, and faecal samples were collected over a 65-day duration, with an average of 76 samples per mouse (Fig. 2a and Extended Data Fig. 1a), resulting in a total of 686 faecal samples. The samples were interrogated for relative abundance via 16S ribosomal RNA (rRNA) amplicon sequencing (Fig. 2d,e and Extended Data Fig. 1d,e) and total bacterial concentration via qPCR using a universal 16S rDNA primer (Fig. 2b and Extended Data Fig. 1b). The resulting ~50 million sequencing reads were bioinformatically processed using DADA2 and filtered to yield high-quality timeseries information for dynamical systems inference tasks (see Methods for details), resulting in 141 amplicon sequence variants (ASVs) in the healthy cohort and 121 ASVs in the ulcerative colitis/dysbiotic cohort. See Supplementary Text 4.1 for further information about basic taxonomic composition and other standard analyses of these datasets (Fig. 2c,f and Extended Data Figs. 1c,f, 2 and 3).
For brevity, and because the healthy cohort harboured more taxa and greater microbial diversity, we focus primarily on examples from this cohort in the main manuscript.

We evaluated MDSINE2's performance against state-of-the-art methods on our high-temporal-resolution datasets, using a standard metric in the field, forecasting of held-out microbial dynamics, which does not require ground-truth information and thus allows for benchmarking on real data. Specifically, we employed a one-subject-hold-out training and testing methodology. All data from one mouse in a cohort were held out while the model was trained on the remaining data from the other mice in that cohort. The model then forecast all taxa trajectories for the entire timeseries (except for the first timepoint) using the measured microbial abundances on the first timepoint in the held-out mouse as the initial condition. We evaluated performance using root-mean-squared error (RMSE) of log abundances over the timeseries, a measure of the difference between the predicted and measured abundances. Our experimental data included measurements of total microbial concentrations, which are formally necessary for inference of standard gLV models, including the state-of-the-art ridge regression (gLV-L2) and elastic-net regression (gLV-net) methods. To assess the impact of interaction modules on model performance and to more directly compare our method to state-of-the-art methods such as gLV-L2 and gLV-net, which do not infer modules, we also included MDSINE2 without interaction modules as a comparator method.
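The hold-one-subject-out protocol described above can be sketched as follows. This is a minimal outline under stated assumptions: `fit` and `forecast` are hypothetical callables standing in for any of the compared methods, `floor` is a hypothetical detection limit used only to avoid taking the log of zero, and the toy "persistence" model simply repeats the initial condition.

```python
import math


def rmse_log10(predicted, measured, floor=1.0):
    """RMSE between log10 abundances, given [timepoint][taxon] nested lists.

    Values below `floor` (a hypothetical detection limit) are clipped so the
    logarithm is always defined.
    """
    squared_errors = []
    for pred_t, meas_t in zip(predicted, measured):
        for p, m in zip(pred_t, meas_t):
            diff = math.log10(max(p, floor)) - math.log10(max(m, floor))
            squared_errors.append(diff ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def leave_one_subject_out(subjects, fit, forecast):
    """subjects maps subject id -> (times, abundances[timepoint][taxon])."""
    scores = {}
    for held_out, (times, abundances) in subjects.items():
        training = {k: v for k, v in subjects.items() if k != held_out}
        model = fit(training)
        # Forecast the full trajectory from the first measured timepoint only.
        predicted = forecast(model, abundances[0], times)
        # The first timepoint is the initial condition, so it is not scored.
        scores[held_out] = rmse_log10(predicted[1:], abundances[1:])
    return scores


# Toy stand-ins (assumptions): a "model" that predicts no change over time.
def fit_persistence(training):
    return None


def forecast_persistence(model, initial, times):
    return [list(initial) for _ in times]


subjects = {
    "mouse1": ([0, 1, 2], [[10.0, 100.0], [100.0, 100.0], [1000.0, 100.0]]),
    "mouse2": ([0, 1, 2], [[10.0, 100.0], [10.0, 100.0], [10.0, 100.0]]),
}
scores = leave_one_subject_out(subjects, fit_persistence, forecast_persistence)
```

Scoring on the log scale means that a tenfold error counts the same for a rare taxon as for an abundant one, so the metric is not dominated by the few most abundant taxa.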

MDSINE2 and MDSINE2 without modules significantly outperformed the two gLV comparator methods that were trained on microbial concentrations for both the healthy and dysbiotic cohorts (Fig. 3a(i),b(i)). MDSINE2 without modules showed slightly, but statistically significantly, better forecasting accuracy than MDSINE2, consistent with our previous finding that model constraints can impact forecasting performance to some extent. However, it is notable that MDSINE2 uses vastly fewer parameters than the other methods, including MDSINE2 without modules: for the total concentration forecasting task on the healthy cohort, for example, MDSINE2 used only 272 parameters as opposed to 19,740 for MDSINE2 without modules, a >72× reduction. Given that the actual gap in performance between MDSINE2 and MDSINE2 without modules was quite minor, our results suggest that the much more compact dynamical system representation learned by MDSINE2 still captures the system behaviour quite accurately. We additionally assessed performance of versions of MDSINE2 and comparator methods on relative abundance (RA) or reads-only (RO) data (that is, not including bacterial concentration information). These results demonstrated that RO versions of MDSINE2 also significantly outperformed all the comparator methods (Fig. 3a(ii),b(ii) and Supplementary Text 4.2).
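The two parameter counts quoted above coincide with the number of ordered off-diagonal pairs, n(n - 1), for the 141 ASVs of the healthy cohort and for the 17 interaction modules learned on that cohort. Reading the counts this way is an inference from the numbers rather than something the text states, so the short check below should be taken as a plausibility calculation only.

```python
def pairwise_interaction_count(n):
    """Number of ordered pairs (i, j) with i != j among n nodes."""
    return n * (n - 1)


taxon_level = pairwise_interaction_count(141)  # all taxon-taxon interactions
module_level = pairwise_interaction_count(17)  # all module-module interactions
reduction = taxon_level / module_level

print(taxon_level, module_level, round(reduction, 1))  # prints: 19740 272 72.6
```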

We sought to assess MDSINE2's ability to recover underlying dynamical systems' parameters and interaction network topologies. This type of analysis requires ground-truth information, which is unavailable for the microbiome. We thus benchmarked our method on fully synthetic and semi-synthetic data. For fully synthetic data, we used a benchmarking standard with 10 taxa that we previously published and found that MDSINE2 accurately recovered the dynamical system, and moreover significantly outperformed state-of-the-art methods, including our previous MDSINE method, on all metrics (Extended Data Fig. 4).

For realistically sized microbiome ecosystems, no established benchmarking dataset exists and theoretical principles needed to construct a realistic synthetic microbial ecosystem at this scale remain an active area of research, so we developed a semi-synthetic data generation procedure. Briefly, we used the parameters of the dynamical systems model inferred by MDSINE2 on the healthy cohort as ground-truth information (Fig. 4b) and forward simulated trajectories for the 141 taxa from the model to create a fully observed dataset. We then created corrupted versions of the dataset, with simulated measurement noise added to generate sequencing and qPCR measurements, as well as downsampling of the number of observed timepoints (Fig. 4a, see Methods section 'Semi-synthetic data and benchmarking' for complete details). The corrupted datasets were then used to assess the ability of different methods to recover parameters of the underlying dynamical system as shown in Fig. 4 and Extended Data Fig. 5. We assessed MDSINE2's performance with or without qPCR data as input, as well as performance of the two gLV models and their relative-abundance (RA)-only counterparts. Note that the scale of interactions and perturbation strengths are not identifiable without bacterial concentration measurements. Thus, to enable comparisons between methods on these parameters, we used a popular scale-invariant performance metric, the Spearman rank correlation. Higher values of the Spearman rank correlation represent stronger relationships, and a value of zero represents no relationship (random chance). To assess performance on binary inferences (presence/absence of interactions or perturbations and co-clustering of taxa into interaction modules), we used area under the receiver operating characteristic curve (AUC-ROC). An AUC-ROC of 0.5 indicates random chance, with higher AUC-ROC values indicating better performance.
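For readers unfamiliar with the two metrics, the following standard-library sketch computes a Spearman rank correlation and an AUC-ROC (in its Mann-Whitney rank form) on toy values. The toy numbers are invented for illustration and are not results from the study; the example also shows why Spearman correlation suits parameters whose scale is not identifiable, since rescaling the estimates leaves the ranks unchanged.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(truth, estimate):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(truth), average_ranks(estimate))


def auc_roc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    positive_rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (positive_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy interaction strengths; the estimate is off by an arbitrary scale factor.
true_strengths = [0.5, -0.2, 0.0, 1.5, -1.0]
estimated = [1000 * v for v in true_strengths]
rho = spearman(true_strengths, estimated)

# Toy edge presence labels and a method's confidence scores.
auc = auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
chance = auc_roc([1, 1, 0, 0], [0.3, 0.3, 0.3, 0.3])
```

In practice one would normally reach for library routines such as `scipy.stats.spearmanr` and `sklearn.metrics.roc_auc_score`; the hand-written versions here only serve to make the definitions explicit.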

Overall, we found that MDSINE2 accurately recovered key properties of the underlying ground-truth dynamical system (Fig. 4 and Extended Data Fig. 5). On the full timeseries, MDSINE2 accurately recovered microbial interactions both in terms of presence/absence and strength, with a median AUC-ROC of 0.91 and a Spearman rank coefficient of 0.53, respectively (Fig. 4c,e). In addition, MDSINE2 showed strong performance in accurately predicting when two taxa came from the same module (median AUC-ROC of 0.76 on the full timeseries, Fig. 4d). MDSINE2's performance degraded when module learning was turned off, when the number of temporal samples was reduced, or when qPCR measurements were withheld. The gLV-L2 and gLV-net methods were essentially unable to recover the interactions under either metric even with the full timeseries available. MDSINE2 and MDSINE2 without modules additionally significantly outperformed all other methods in recovering growth rates, perturbation strengths and perturbation presence/absence, except for the most sparsely sampled regime with 75% of the timepoints ablated (Extended Data Fig. 5). As with recovering interactions, reducing the number of temporal samples or removing the qPCR measurements typically reduced MDSINE2's performance. We note that while including modules significantly improved MDSINE2's performance for recovering interactions, this was not always the case for growth rates or perturbations in the scenarios assessed. Overall, these results indicate that MDSINE2 can accurately recover its underlying dynamical systems model from data that has realistic levels of measurement noise and numbers of observed timepoints, and that models trained on relative abundances/reads alone significantly underperform compared to models that include bacterial concentration estimates in nearly all scenarios.

To demonstrate the utility of the MDSINE2 software package for deriving biologically relevant information on ecosystem-scale microbiome dynamics from timeseries data, we performed an analysis of the healthy cohort data. MDSINE2 discovered 17 interaction modules (Fig. 5), ranging in size from 1 to 35 taxa, and connected through 56 interactions predicted with 'decisive evidence' (Bayes factor (BF) ≥ 100, Fig. 5c). This represents a 97% reduction in interaction parameters over MDSINE2 without modules (2,179 edges predicted with 'decisive evidence'; Extended Data Fig. 6), with nearly comparable forecasting performance as described above. As a basic measure of the biological relevance of modules, we evaluated the relatedness of taxa within modules and found that modules showed statistically significant enrichment for phylogenetic and taxonomic signals (Extended Data Figs. 7 and 8, and Supplementary Text 4.3.1).

To quantitatively evaluate the relative importance of each interaction module in the ecosystem, we performed a module-level keystoneness analysis (Fig. 5d and Methods section 'Keystoneness'). In ecology, keystone taxa are defined as fundamental to the integrity of the ecological community, and have been suggested as drivers of microbial community structure and function. Here we extend the concept to groups of taxa (modules) and also generalize to a quantitative measure of 'keystoneness' with both positive and negative values. Positive keystone modules ('promoters') are those that, when removed, result in a reduction in the microbial abundances of the other members of the ecosystem; negative keystone modules ('suppressors') are those that, when removed, result in increases in abundances of the other members of the ecosystem. The magnitude of the keystoneness measure thus represents the degree of community-wide disruption (in terms of microbial abundance change) caused by the removal of the module.
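The removal experiment behind this idea can be sketched on a toy gLV system. The code below is an illustrative proxy only: it assumes keystoneness can be summarized as the mean log10 ratio of steady-state abundances of the remaining taxa with versus without the module, approximates steady states by simple Euler integration, and uses invented parameters. The measure actually used is defined in Methods section 'Keystoneness'.

```python
import math


def steady_state(growth, A, present, dt=0.01, steps=5000):
    """Approximate the gLV steady state by Euler integration.

    Taxa flagged as absent start at zero and therefore stay at zero.
    """
    x = [0.1 if p else 0.0 for p in present]
    n = len(x)
    for _ in range(steps):
        rates = [growth[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [xi + dt * xi * ri for xi, ri in zip(x, rates)]
    return x


def module_keystoneness(growth, A, assignments, module):
    """Mean log10(baseline / after-removal) abundance over the remaining taxa.

    Positive values mean the other taxa drop when the module is removed
    (a 'promoter'); negative values mean they rise (a 'suppressor').
    Assumes no remaining taxon goes extinct in either scenario.
    """
    n = len(growth)
    baseline = steady_state(growth, A, [True] * n)
    kept = [assignments[i] != module for i in range(n)]
    after_removal = steady_state(growth, A, kept)
    changes = [math.log10(baseline[i] / after_removal[i]) for i in range(n) if kept[i]]
    return sum(changes) / len(changes)


# Toy system: module 0 holds taxon 0; module 1 holds taxa 1 and 2.
# A[i][j] is the effect of taxon j on taxon i.
assignments = [0, 1, 1]
growth = [1.0, 1.0, 1.0]
promoting = [[-1.0, 0.0, 0.0], [0.5, -1.0, 0.0], [0.5, 0.0, -1.0]]
suppressing = [[-1.0, 0.0, 0.0], [-0.5, -1.0, 0.0], [-0.5, 0.0, -1.0]]

promoter_score = module_keystoneness(growth, promoting, assignments, module=0)
suppressor_score = module_keystoneness(growth, suppressing, assignments, module=0)
```

In the promoting case the dependent taxa settle at 1.5 with the module present and 1.0 without it, so the score is positive; flipping the sign of the interaction makes it negative.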

For our cohort, the top positive and negative keystoneness modules were M3 and M4, respectively. Examining their roles in the ecological network, we see that all the outgoing edges of M3 are promoting, while all the outgoing edges of M4 are repressive, suggesting that these modules play different ecological roles in the network. M3 is enriched for the family Ruminococcaceae. Promoting M3 are two other positive keystoneness modules, M11 and M12, each containing taxa capable of degrading resistant starches (Ruminococcus bromii ASV95, Gemmiger ASV220) and others with butyrate production capabilities (Faecalibacterium ASV77, Butyricicoccus faecihominis ASV124, Butyricicoccus ASV115). Notable downstream modules promoted by M3 are M10 (enriched for Bacteroidaceae), M13 (enriched for Bacteroidetes and the largest module in the network) and M14 (enriched for Lachnospiraceae). One explanation for this structure, consistent with known biology, is that the positive keystoneness modules are connected in a cross-feeding chain beginning with specialized starch-degrading taxa that ultimately support the more abundant generalist taxa (for example, Bacteroidaceae). In contrast to this specialist-to-generalist structure, the module with the highest negative keystoneness, M4, contains a diverse group of taxa and also suppresses multiple modules in the network, including M3 and M11, the top two positive keystoneness modules, as well as the primarily Gram-negative modules M10 and M13. An annotated module network is provided in Extended Data Fig. 9.

To assess the overall robustness of our inferred ecosystem to external perturbations, we performed a formal stability analysis, which showed that the dynamics of the inferred model were 80% more likely to be stable than expected by chance (Fig. 6b and Supplementary Text 4.3.2). We next sought to identify features of the ecological network inferred by MDSINE2 that could explain this remarkable stability. Stability and control theory have established that the feedback cycle is the core topological feature driving stability. Pairwise interactions, the simplest form of feedback cycles, have particular interpretations in ecology, and their contributions to stability are well characterized for linear and gLV dynamical systems: mutualism (+,+) and competition (-,-) are destabilizing, and parasitism (+,-) is stabilizing. For cycles of length three and higher, more complex ecological interactions arise, and any sign combination is potentially destabilizing (Fig. 6a). For all cycle lengths analysed, we found that MDSINE2's inferred model of dynamics had a significantly lower number of cycles than expected by chance (Fig. 6c).
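The sign-based classification of pairwise feedback cycles described above can be made concrete with a small counting routine. This is a minimal sketch on an invented signed interaction matrix; the function name and the `tol` parameter are assumptions, and it covers only cycles of length two, not the longer cycles also analysed in the study.

```python
from collections import Counter


def classify_two_cycles(A, tol=0.0):
    """Count pairwise feedback cycles by sign pattern.

    A[i][j] is the signed effect of node j on node i. A two-cycle exists only
    when both directions are present (magnitude above `tol`).
    """
    counts = Counter()
    n = len(A)
    for i in range(n):
        for j in range(i + 1, n):
            forward, backward = A[i][j], A[j][i]
            if abs(forward) <= tol or abs(backward) <= tol:
                continue  # a one-way edge is not a feedback cycle
            if forward > 0 and backward > 0:
                counts["mutualism"] += 1    # (+,+), destabilizing
            elif forward < 0 and backward < 0:
                counts["competition"] += 1  # (-,-), destabilizing
            else:
                counts["parasitism"] += 1   # (+,-), stabilizing
    return counts


# Toy 4-node network: one cycle of each type plus a one-way edge.
A = [
    [0.0, 0.3, -0.2, 0.0],
    [0.4, 0.0, 0.0, -0.1],
    [-0.5, 0.0, 0.0, 0.6],
    [0.0, 0.2, 0.0, 0.0],
]
cycle_counts = classify_two_cycles(A)
```

Here nodes 0 and 1 form a mutualism cycle, nodes 0 and 2 a competition cycle, and nodes 1 and 3 a parasitism cycle, while the edge from node 3 to node 2 has no return edge and is skipped.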

We next sought to understand the influence of uncertainty in the inferred network structure itself on stability estimates. To gain insight into this phenomenon, we evaluated networks at different levels of evidence for edges: 'substantial' [BF ≥ √10], 'strong' [BF ≥ 10] and 'decisive' [BF ≥ 100] evidence (Fig. 6d(i) and Supplementary Text G). As the evidence threshold for an edge being included in the model was decreased, the number of edges in the network increased from 56 to 163 (Fig. 6d(ii)). As the number of edges increased, there was also an increase in the number of two-cycles, as expected. Interestingly, for networks with more edges, the number of parasitism (+,-) cycles increased disproportionately among the two-cycles present, consistent with the property that stability becomes less likely the denser a network becomes (the more edges there are for a fixed number of nodes), unless the cycles in the network are only parasitism (Fig. 6a). Previous work has also examined the role of mutualism and competition cycles, hypothesizing that for healthy ecosystems, the mutualism to competition ratio (MCR: (+,+)/(-,-)) would be less than one, and demonstrating this phenomenon on networks inferred on small microbial ecosystems. Our results provide support for this hypothesis, showing that the mutualism to competition ratio was significantly lower than chance on networks with strong evidence for the existence of edges (Fig. 6d(iii)).
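To make the thresholding step concrete, the sketch below computes a Bayes factor as posterior odds divided by prior odds, filters edges at each evidence level and computes a mutualism to competition ratio from signed edges. The edge names and values are invented, the helper names are hypothetical, and the √10 cut-off for 'substantial' evidence follows the conventional Jeffreys-style scale, which is treated here as an assumption.

```python
import math

# Conventional evidence scale; the sqrt(10) cut-off is an assumption here.
EVIDENCE_THRESHOLDS = {
    "substantial": math.sqrt(10),
    "strong": 10.0,
    "decisive": 100.0,
}


def bayes_factor(posterior_prob, prior_prob):
    """Posterior odds over prior odds that an edge is present.

    Undefined when either probability equals 1 (division by zero).
    """
    posterior_odds = posterior_prob / (1 - posterior_prob)
    prior_odds = prior_prob / (1 - prior_prob)
    return posterior_odds / prior_odds


def edges_at_level(edge_bfs, level):
    """Edges whose Bayes factor meets the named evidence threshold."""
    threshold = EVIDENCE_THRESHOLDS[level]
    return {edge for edge, bf in edge_bfs.items() if bf >= threshold}


def mutualism_competition_ratio(signed_edges):
    """signed_edges maps (source, target) -> +1 or -1.

    Returns None when there are no competition cycles to divide by.
    """
    mutualism = competition = 0
    for (src, dst), sign in signed_edges.items():
        # Visit each unordered pair once and require a same-signed return edge.
        if src < dst and signed_edges.get((dst, src)) == sign:
            if sign > 0:
                mutualism += 1
            else:
                competition += 1
    return mutualism / competition if competition else None


# Toy Bayes factors for four directed module-module edges (invented values).
edge_bfs = {("M1", "M2"): 250.0, ("M2", "M1"): 12.0,
            ("M3", "M1"): 4.0, ("M1", "M3"): 1.5}
edge_counts = {level: len(edges_at_level(edge_bfs, level))
               for level in EVIDENCE_THRESHOLDS}

signed_edges = {("M1", "M2"): 1, ("M2", "M1"): 1,
                ("M1", "M3"): -1, ("M3", "M1"): -1,
                ("M2", "M4"): -1, ("M4", "M2"): -1}
mcr = mutualism_competition_ratio(signed_edges)
```

Because the thresholds are nested, every edge kept at a stricter level is also kept at each looser level, which is why the edge count can only grow as the threshold is relaxed.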
