We conducted a prospective clinical cohort study enrolling patients with PD and type 2 DM to test the hypothesis that patients with concurrent PD and DM share similar features of gut microbiota dysbiosis and disease pathogenesis.
Procedures involving participant sampling and experiments were reviewed and approved by the Institutional Review Board of China Medical University Hospital (CMUH111-REC1-170 for healthy control [HC] subjects, CMUH111-REC2-209 for PD participants, and CMUH111-REC3-204 for DM patients), in accordance with ethical standards for research involving human subjects. All participants signed an informed consent form before enrollment.
We conducted a prospective clinical cohort microbiota study at China Medical University Hospital (CMUH), a major academic healthcare center in Taichung, Taiwan. Neurologists specializing in movement disorders enrolled 40 patients with PD, based on the United Kingdom Parkinson's Disease Society Brain Bank clinical diagnostic criteria. Detailed clinical data were collected, including age, sex, duration of PD, Hoehn and Yahr stage, off-state Unified Parkinson's Disease Rating Scale (UPDRS, recorded by video and analyzed by a nurse specialized in movement disorders), probiotic use, and comorbidities. In parallel, an endocrinologist recruited a prospective cohort of patients with type 2 DM; a total of 172 participants with DM were enrolled through the endocrinology outpatient clinic. We excluded participants who were younger than 20 or older than 80 years, had end-stage renal disease (estimated glomerular filtration rate < 15 mL/min/1.73 m²), or had a history of hospitalization for heart failure, acute kidney injury, or renal replacement therapy within the 12 weeks prior to enrollment. For DM patients, we also excluded those who had used laxatives, prebiotics, probiotics, or antibiotics within the 8 weeks preceding stool sample collection. Ninety-eight age- and sex-matched healthy control subjects were recruited from individuals visiting our hospital for health examinations. We defined the healthy controls as adults aged 18 and above, with no history of chronic diseases (such as PD, DM, or hypertension) and no recent use of prescription medications, especially antibiotics. Among patients with concurrent diagnoses of PD and DM, eight were from the PD cohort and two were from the DM cohort, with diagnoses confirmed by both a neurologist and an endocrinologist. Due to the exploratory nature of our study, no statistical methods were used to pre-determine the sample sizes.
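As an illustration only, the exclusion criteria described above can be written as a simple screening function. The following is a minimal sketch in Python, not the procedure actually used at enrollment; the `Candidate` record, its field names, and the `exclusion_reasons` helper are hypothetical and introduced here solely for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical screening record; field names are assumptions for this sketch."""
    age: int                             # years
    egfr: float                          # estimated GFR, mL/min/1.73 m^2
    recent_hospitalization: bool         # heart failure, AKI, or renal replacement
                                         # therapy within 12 weeks before enrollment
    is_dm: bool = False                  # belongs to the type 2 DM cohort
    recent_gut_modifiers: bool = False   # laxatives, prebiotics, probiotics, or
                                         # antibiotics within 8 weeks before stool collection


def exclusion_reasons(c: Candidate) -> list[str]:
    """Return every exclusion criterion the candidate meets (empty list = eligible)."""
    reasons = []
    if c.age < 20 or c.age > 80:
        reasons.append("age outside 20-80 years")
    if c.egfr < 15:
        reasons.append("end-stage renal disease (eGFR < 15)")
    if c.recent_hospitalization:
        reasons.append("hospitalization for heart failure, AKI, or renal "
                       "replacement therapy within 12 weeks")
    # The gut-modifier criterion is stated in the text for DM patients only.
    if c.is_dm and c.recent_gut_modifiers:
        reasons.append("laxative, prebiotic, probiotic, or antibiotic use "
                       "within 8 weeks of stool collection")
    return reasons
```

Returning the list of reasons rather than a single Boolean keeps the screening decision auditable, which is convenient when reporting how many candidates were excluded under each criterion.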
The fecal sample collection kit was provided by the Department of Laboratory Medicine at China Medical University Hospital (CMUH). This sterile kit is specifically designed for microbiome preservation. Participants collected their fecal samples at home and returned them to the hospital at room temperature within 48 h. The sampling procedure has been approved by the Ministry of Health and Welfare, Taiwan, under the Laboratory Developed Tests (LDTs) certification, with the approval number 2022LDT0023. Upon receipt, samples were immediately stored at -80 °C until further processing.
Fecal DNA was extracted using the QIAamp PowerFecal Pro DNA Kit (QIAGEN N.V., Hilden, Germany) along with the QIAcube HT (QIAGEN N.V.), an automated nucleic acid extraction system that reduces the risk of contamination. Genomic DNA was isolated from the fecal samples, and the concentration was measured using the Qubit 3.0 Fluorometer (Qubit™ dsDNA HS Assay Kit, Invitrogen, Life Technologies, Carlsbad, CA, USA). Samples were stored at -80 °C until library construction. Prior to using the commercial kit, a positive control test was performed using the 20 Strain Even Mix Genomic Material (MSA-1002™, American Type Culture Collection, Manassas, Virginia, USA), with an expected concentration range of 2.69 ng/µL to 4.48 ng/µL (https://www.atcc.org/products/msa-1002). In our laboratory, aseptic water was used as a negative control (Table S1). Effective results were defined as having at least 8,000 to 10,000 reads per stool sample.
The 16S rRNA gene was amplified for PacBio (Pacific Biosciences, Menlo Park, CA, USA) barcoding using the primers 27F (5'-AGRGTTYGATYMTGGCTCAG-3') and 1492R (5'-RGYTACCTTGTTACGACTT-3'). The PCR reaction was carried out in a total volume of 25 µL, using KAPA HiFi HotStart 2X ReadyMix PCR Reagent (Kapa Biosystems, Woburn, MA, USA), 0.375 mM of each primer, and 500-1000 ng of fecal sample DNA as the template. The thermal cycling conditions were as follows: initial denaturation at 95 °C for 3 min, followed by 25 cycles of denaturation at 95 °C for 30 s, annealing at 57 °C for 30 s, and extension at 72 °C for 1 min. A final extension was performed at 72 °C for 1 min.
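Both primers contain IUPAC degenerate bases (R, Y, and M), so each primer name denotes a small pool of oligonucleotides rather than a single sequence. The short Python sketch below enumerates the concrete sequences in each pool; it is given for illustration only, and the helper name `expand_primer` is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

PRIMER_27F = "AGRGTTYGATYMTGGCTCAG"
PRIMER_1492R = "RGYTACCTTGTTACGACTT"


def expand_primer(primer: str) -> list[str]:
    """List every non-degenerate sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer)
    return ["".join(combo) for combo in product(*choices)]


variants_27f = expand_primer(PRIMER_27F)
variants_1492r = expand_primer(PRIMER_1492R)
```

With four two-fold degenerate positions, 27F expands to 16 sequences, while 1492R, with two such positions, expands to 4.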
PCR products were first evaluated via 0.8% agarose gel electrophoresis and then quantified using the Qubit 3.0 Fluorometer (Qubit™ dsDNA HS Assay Kit). The quantified PCR products were purified using an equal volume of AMPure PB beads (PacBio, Menlo Park, CA, USA).
Purified PCR amplicons (500 ng) were used as templates to construct the SMRTbell library, following the protocol provided in the SMRTbell Express Template Prep Kit 2.0 (PacBio). After end-repair, adapters were ligated to the ends of the sequences. SMRT sequencing was then carried out on the PacBio Sequel IIe system using the SMRT 8 M Cell v3 (PacBio). Primary filtering analysis was performed on the Sequel system, while secondary analysis followed the standard procedures in SMRT Link 9.0.
Full-length 16S rRNA gene analysis was performed using pb-16s-nf (version 0.7, Pacific Biosciences of California, Inc.). The sequenced reads were processed using Quantitative Insights Into Microbial Ecology 2 (QIIME 2, version 2023.2). Taxonomy was then assigned at 99% similarity based on the SILVA taxonomy and reference database (SILVA 138 SSURef NR99, full-length), and a rooted phylogenetic tree was built using the "align-to-tree-mafft-fasttree" pipeline from QIIME 2. For validation, the dataset was additionally re-analyzed with the most recent QIIME 2 version, 2025.7, and the results were consistent with those obtained using version 2023.2. Outputs from the re-analysis are provided in the Supplementary Figures.
After quality control and processing, we obtained a total of 11,491,321 sequence reads from 310 samples, with an average of 37,069 ± 31,716 reads per sample. The resulting data were visualized using QIIME 2 software.
We used QIIME 2 to perform both α-diversity (within-sample) and β-diversity (between-sample) analyses. For α-diversity, we calculated observed features, Shannon diversity, evenness, and Faith's phylogenetic diversity. For β-diversity, we used Bray-Curtis dissimilarity, weighted and unweighted UniFrac, and Jaccard distance, with Emperor plots for visualization. Additionally, we applied permutational multivariate analysis of variance (PERMANOVA), a distance-based analysis of variance method based on permutation, to assess differences between groups. Metrics with statistically significant p-values were reported following rarefaction.
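To make the non-phylogenetic metrics concrete, the following minimal NumPy sketch computes observed features, Shannon diversity, Pielou's evenness, Bray-Curtis dissimilarity, Jaccard distance, and a basic PERMANOVA pseudo-F test on a toy count table. It is an illustration under stated assumptions, not the QIIME 2 implementation used in this study: the count matrix and group labels are invented, the logarithm base for Shannon diversity (base 2) is an assumption, and Faith's phylogenetic diversity and UniFrac are omitted because they require the phylogenetic tree.

```python
import numpy as np


def observed_features(counts):
    """Number of features with a non-zero count."""
    return int(np.count_nonzero(counts))


def shannon(counts, base=2.0):
    """Shannon diversity H = -sum(p * log(p)) over non-zero features."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(base))


def pielou_evenness(counts):
    """Pielou's evenness J = H / ln(S), with H in natural-log units."""
    s = observed_features(counts)
    return shannon(counts, base=np.e) / np.log(s) if s > 1 else 0.0


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.abs(u - v).sum() / (u + v).sum())


def jaccard(u, v):
    """Jaccard distance on presence/absence."""
    a, b = np.asarray(u) > 0, np.asarray(v) > 0
    union = np.logical_or(a, b).sum()
    return float(1.0 - np.logical_and(a, b).sum() / union) if union else 0.0


def permanova(dist, labels, n_perm=999, seed=0):
    """Pseudo-F statistic and permutation p-value from a square distance matrix."""
    dist, labels = np.asarray(dist, dtype=float), np.asarray(labels)
    n, groups = len(labels), np.unique(labels)
    ss_total = (dist[np.triu_indices(n, 1)] ** 2).sum() / n

    def pseudo_f(lab):
        ss_within = 0.0
        for g in groups:
            idx = np.flatnonzero(lab == g)
            sub = dist[np.ix_(idx, idx)]
            ss_within += (sub[np.triu_indices(len(idx), 1)] ** 2).sum() / len(idx)
        ss_among = ss_total - ss_within
        return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))

    rng = np.random.default_rng(seed)
    f_obs = pseudo_f(labels)
    hits = sum(pseudo_f(rng.permutation(labels)) >= f_obs for _ in range(n_perm))
    return float(f_obs), (hits + 1) / (n_perm + 1)


# Toy example: six samples (rows) by four features (columns); values are invented.
counts = np.array([[10, 0, 0, 5], [8, 1, 0, 6], [9, 0, 1, 4],
                   [0, 10, 7, 0], [1, 9, 8, 0], [0, 11, 6, 1]])
labels = np.array(["A", "A", "A", "B", "B", "B"])
dist = np.array([[bray_curtis(x, y) for y in counts] for x in counts])
f_stat, p_value = permanova(dist, labels)
```

The permutation p-value uses the conventional (hits + 1) / (permutations + 1) form so that it can never be exactly zero. In practice these metrics would be computed on rarefied tables, as described above.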
We applied the non-parametric Kruskal-Wallis (KW) rank-sum test to identify features with significantly different abundances across the class of interest. Biological consistency was then assessed using pairwise comparisons among subclasses via the unpaired Wilcoxon rank-sum test. Finally, linear discriminant analysis (LDA) effect size (LEfSe) was used to estimate the effect size of each differentially abundant feature. This approach supports high-dimensional class comparisons, particularly in metagenomic analyses. Class comparison methods typically identify biomarkers as features that deviate from the null hypothesis of no difference between classes. Additionally, we detected a subset of features with abundance patterns consistent with an algorithmically encoded biological hypothesis and estimated the magnitude of significant variations.
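The two non-parametric screening steps described above can be sketched for a single feature with SciPy. This is a simplified illustration, not the LEfSe implementation: each group is treated as one subclass, the significance threshold `alpha` of 0.05 is an assumption, the function name `passes_screening` is hypothetical, and the final LDA effect-size step is omitted.

```python
from itertools import combinations

from scipy.stats import kruskal, mannwhitneyu


def passes_screening(abundance_by_group, alpha=0.05):
    """Screen one feature: Kruskal-Wallis across all groups, then pairwise
    unpaired Wilcoxon rank-sum (Mann-Whitney U) tests between groups.

    abundance_by_group maps a group name to a sequence of relative abundances.
    Returns True only if the overall test and every pairwise test are significant.
    """
    groups = list(abundance_by_group.values())

    # Step 1: overall test for any difference among the groups.
    _, p_kw = kruskal(*groups)
    if p_kw >= alpha:
        return False

    # Step 2: every pair of groups must also differ.
    for a, b in combinations(groups, 2):
        _, p_pair = mannwhitneyu(a, b, alternative="two-sided")
        if p_pair >= alpha:
            return False
    return True


# Invented relative abundances for one feature in three groups.
separated = {
    "HC": [0.010, 0.012, 0.011, 0.013, 0.015, 0.014, 0.016, 0.017],
    "PD": [0.050, 0.052, 0.051, 0.053, 0.055, 0.054, 0.056, 0.057],
    "DM": [0.100, 0.102, 0.101, 0.103, 0.105, 0.104, 0.106, 0.107],
}
interleaved = {
    "HC": [1, 4, 7, 10, 13, 16],
    "PD": [2, 5, 8, 11, 14, 17],
    "DM": [3, 6, 9, 12, 15, 18],
}
```

In this toy data, `passes_screening(separated)` retains the feature because the three groups do not overlap, whereas `passes_screening(interleaved)` rejects it at the Kruskal-Wallis step.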
At the time of stool collection, peripheral venous blood samples (20 mL) were concurrently obtained from each participant for biochemical and metabolomic profiling. Blood draws were performed in the morning, between 05:00 and 11:00 h, following an overnight fast of at least 8 h to minimize diurnal and dietary variability in circulating metabolites. Samples were collected in serum-separating and ethylenediamine-tetraacetic acid (EDTA)-containing tubes, promptly transported to the central laboratory of China Medical University Hospital, and processed within 1 h of collection. Serum and plasma were separated by centrifugation at 3,000 × g for 10 min at 4 °C, aliquoted into cryovials, and immediately stored at -80 °C to prevent metabolite degradation. Biochemical parameters, including serum creatinine and glycated hemoglobin (HbA1c), were measured using standard enzymatic and immunoassay methods in the hospital's clinical laboratory, which is accredited by both the Taiwan Accreditation Foundation and the College of American Pathologists (CAP). All analyses were performed strictly in accordance with standardized operating procedures for clinical diagnostics. Quality control samples and internal standards were incorporated to ensure analytical reproducibility, inter-assay consistency, and accuracy across all measurement batches.
We used MATLAB (R2024a, Natick, Massachusetts), Python (v3.11.8) with NumPy (v1.24.0), pandas (v1.5.3), and SciPy (v1.14.1) to calculate p-values and confidence intervals. Independence, normality, and homogeneity of variances were assessed for each variable to select the appropriate statistical test. For example, for age across four groups, the Shapiro-Wilk test indicated non-normality in two groups, and Levene's test showed unequal variances; therefore, the non-parametric Kruskal-Wallis test was applied.
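The assumption checks described above can be expressed as a short decision routine with SciPy. The sketch below is illustrative rather than the exact script used: the function name `compare_groups` is hypothetical, the 0.05 threshold is an assumption, and one-way ANOVA is assumed as the parametric alternative, which the text does not specify.

```python
from scipy.stats import f_oneway, kruskal, levene, shapiro


def compare_groups(groups, alpha=0.05):
    """Choose and run a multi-group test based on assumption checks.

    groups is a list of 1-D sequences, one per study group.
    Returns the name of the test that was applied and its p-value.
    """
    # Shapiro-Wilk within each group: p > alpha means normality is not rejected.
    normal = all(shapiro(g).pvalue > alpha for g in groups)
    # Levene's test across groups: p > alpha means equal variances are not rejected.
    equal_var = levene(*groups).pvalue > alpha

    if normal and equal_var:
        # Assumed parametric choice; not stated in the text.
        name, result = "one-way ANOVA", f_oneway(*groups)
    else:
        name, result = "Kruskal-Wallis", kruskal(*groups)
    return name, float(result.pvalue)
```

Called with four lists of ages, one per group, the routine falls back to the Kruskal-Wallis test whenever any group fails the Shapiro-Wilk check or Levene's test indicates unequal variances, mirroring the age example given above.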