
The joint estimation of uncertainty and its relationship with psychotic-like traits and psychometric schizotypy - npj Mental Health Research

To test these predictions, we focused on both direct behavioural readouts -- such as LR, performance error, and beam width -- and computational modelling parameters. The latter included: κx, indexing the coupling between higher and lower-level belief updating about the mean (i.e. volatility); κa, indexing coupling between levels for updating about variance (i.e. noise); and β1, reflecting the extent to which participants relied on their noise belief when adjusting beam width. Normative learning would be reflected by higher LRs and performance errors in volatile blocks, wider beam use in high-noise blocks, and appropriate calibration of these parameters to environmental structure. Non-normative learning, particularly in individuals high in psychosis-relevant traits, was expected to manifest as reduced LR calibration, inappropriate beam use, diminished β1, or elevated κx across all conditions -- potentially reflecting a belief that the environment is more volatile than it actually is.

Ethical approval was obtained through REMAS (MRA-19/20-19444) at King's College London. Individuals were given information detailing the study and provided informed consent via Gorilla. Data collected via Gorilla were compliant with the EU GDPR (2016) and Gorilla's "Data Processing Agreement" (consistent with NIHR guidance).

Across all experiments, data were subject to quality-control screening to identify participants who were not attending to the task in the expected or necessary way. Exclusion criteria included technical failure, failure to move the beam and rover in at least 10% of trials (n = 21), and/or completion of at least 10% of trials in less than 1000 ms.

For Experiments 1 and 2, eligibility criteria included being aged 18-40 years, UK-based, speaking English as a first language, having no history of head injury, and having no diagnosis of cognitive impairment, dementia, or autism spectrum disorder. Participants were recruited and assessed over a 7-day period. In Experiment 2, psychometric schizotypy scores were collected from 1000 participants using the Schizotypal Personality Questionnaire -- Brief (SPQ) on the online platform Qualtrics. The top and bottom 10% of scorers were selected and invited to take part in further cognitive testing on the online platform Gorilla (see gorilla.sc).

Experiment 3 was conducted in person with participants who had previously taken part in Experiment 1 (n = 16) and Experiment 2 (n = 3). Participants were re-contacted through Prolific to take part in the on-campus experimental sessions. Recruitment and assessment took place over the course of 1 month at the Institute of Psychiatry, Psychology and Neuroscience.

Prolific Academic (see prolific.ac), an online crowd-sourcing platform, was used to recruit all participants anonymously from the UK. Participants were then invited to complete the experiment via Gorilla, an online behavioural research platform, with the option to access the tasks via a computer/laptop or mobile phone. Before proceeding with the study, participants provided informed consent and completed a baseline demographic survey. Participants were randomised either to first respond to the symptom questionnaires or to play the Space JUnc Task to minimise fatigue effects. Within the demographic questionnaire, participants answered a question on the amount of time that they gamed per week. This was because the research involved them playing a gamified cognitive task, the success of which may be influenced by their familiarity with gaming in general. During the experiment, two attention checks were implemented. During the attention checks, participants had to answer a simple question about the colour of a square. On completion of the study, participants were debriefed, and each received an average of £6.50 per hour spent on the tasks and an additional performance-dependent bonus payment up to £2 in the Space Game. They were also invited to share any feedback or comments that they might have regarding the study.

In the on-campus study, all participants used the same laptop to complete the experiment via Gorilla. Upon completion, they were awarded a £40 Amazon voucher, comprising £35 for attendance and a £5 performance-based bonus. Participants travelling from outside London also received additional compensation for travel expenses.

Psychotic-like experiences, specifically paranoia and delusional ideation, were assessed using the Green Paranoid Thoughts Scale (GPTS) and the Peters Delusion Inventory (PDI), respectively. The GPTS includes subscales for ideas of social reference and persecutory ideas, along with associated concern, conviction, and stress. The PDI evaluates endorsement, distress, conviction, and preoccupation related to various delusional beliefs. Psychosis susceptibility was operationalised as high psychometric schizotypy, measured through the SPQ. The SPQ assesses nine subscales based on DSM-III-R criteria, including odd beliefs, unusual perceptual experiences, and paranoid ideation. Sample 1 used the 22-item brief version, while sample 2 employed the full 74-item version. Depression and anxiety symptoms were evaluated using the 62-item Mood and Anxiety Symptom Questionnaire, which measures general distress, anxious arousal, and anhedonia. Depression-related and anxiety-related subscales were combined into single measures. Two questions concerning suicide and death were excluded due to ethical considerations. Symptom scales correlated with each other moderately (see Supplementary Fig. 3 and Supplementary Table 4).

Learning under uncertainty was assessed using the Jumping Uncertainty Task (Space JUnc Task), a novel gamified paradigm designed to probe participants' ability to infer and adapt to changes in environmental statistics. The task comprised 219 trials (intertrial interval: 1000-5000 ms), divided into two practice and four task blocks, each manipulating volatility and/or noise via changes to the underlying probability distribution of stimuli (see Supplementary Table 1 and Fig. 1B).

The first block required participants to pass a minimal threshold of success to continue. The second block introduced stable conditions (no mean shifts) with either high (SD = 0.12) or low (SD = 0.03) noise. Block three ("Noise block"; n = 40) maintained a fixed mean while manipulating noise alone. Block four ("Volatility block"; n = 40) introduced volatility by switching the distribution mean every ~5 trials (p = 0.25), with stable noise. Blocks five ("Combined Uncertainty"; n = 84) and six ("Bivalent Combined Uncertainty"; n = 40) varied both mean and SD approximately every 10 trials, featuring novel and previously seen means. These two blocks differed only in reward structure: the bivalent condition included potential point losses for incorrect predictions. The task was counterbalanced across participants (i.e. forward/reverse order of volatility and noise blocks). The exact means and SDs used, alongside the rules by which the distributions change, are displayed in Supplementary Table 1. These distributions were modelled after prior work. The task inputs (space-junk locations) were sampled from these distributions. For each block, candidate input sequences were simulated (100,000 replications), and the most normally distributed sequence was selected. This ensured that the sampled inputs were not non-normally distributed by chance, i.e. did not contain outliers that could confuse participants.
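A minimal sketch of how such a selection procedure could be implemented is given below, assuming the inputs for a block segment are drawn from a normal distribution and that normality is scored with the Shapiro-Wilk statistic; the mean, SD, and segment length shown are illustrative placeholders rather than the values in Supplementary Table 1.

```r
# Sketch: repeatedly sample candidate input sequences from the generative
# distribution and keep the one whose sample is closest to normal, so that
# no sequence contains misleading outliers by chance.
# The mean, SD, segment length, and normality criterion are assumptions.
set.seed(1)

sample_block_inputs <- function(mu, sigma, n_trials, n_reps = 100000) {
  best_w   <- -Inf
  best_seq <- NULL
  for (i in seq_len(n_reps)) {
    candidate <- rnorm(n_trials, mean = mu, sd = sigma)
    w <- shapiro.test(candidate)$statistic  # Shapiro-Wilk W: closer to 1 = more normal
    if (w > best_w) {
      best_w   <- w
      best_seq <- candidate
    }
  }
  best_seq
}

# e.g. a 40-trial high-noise segment with an illustrative mean of 0.5
noise_inputs <- sample_block_inputs(mu = 0.5, sigma = 0.12, n_trials = 40)
```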

Participants aimed to "catch" falling space junk -- emitted from hidden spacecraft -- by adjusting an electromagnetic rover's horizontal position (x axis) and beam width (y axis). Smaller beams conserved fuel and yielded higher (monetary) rewards, incentivising precise inference of the spacecraft's location and variability. Limited information about the size, location, and frequency of the incoming unseen spacecraft was given at the start of every block (see Supplementary Table 2). On the first trial of each block, participants were shown where the last piece of space junk fell and instructed to use this location to predict the location of the next piece of junk. The falling junk was presented on each trial for 5000 ms. To indicate their prediction, participants adjusted the rover and beam (which appeared at a random location on the screen's x axis at the start of each trial) using the arrow keys (or WASD), either on their keyboards or on the screen (left/right for the rover; up/down for the beam). Once participants had indicated their prediction, they were shown where the junk in fact fell, i.e. a prediction error (which could subsequently be used to make a new prediction on the next trial).

Primary performance was indexed by trial-wise scores (0-10), integrating prediction accuracy (catch success) and beam precision (smaller beams rewarded). Beam width served as an auxiliary measure of inferred variance. Derived metrics included performance error (distance between participant estimate and true mean) and LR, quantified via the beta coefficient from a linear model relating the signed prediction error (participant rover position at t − space junk position at t) to the signed rover position change (participant rover position at t + 1 − participant rover position at t).
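As a rough illustration of how such a coefficient can be obtained, the sketch below regresses the trial-wise rover update on the signed prediction error; the column names, sign conventions, and regression direction are assumptions for illustration rather than the exact analysis pipeline.

```r
# Sketch: learning rate as the slope of a linear model relating trial-wise
# rover updates to signed prediction errors for one participant in one block.
# Column names (rover_position, junk_position) are illustrative assumptions.
estimate_lr <- function(trials) {
  pe     <- trials$junk_position - trials$rover_position                # signed prediction error at t
  update <- c(trials$rover_position[-1], NA) - trials$rover_position    # rover shift from t to t + 1
  ok     <- !is.na(update)
  unname(coef(lm(update[ok] ~ pe[ok]))[2])                              # slope ~ learning rate
}

# Toy example
trials <- data.frame(
  rover_position = c(0.40, 0.48, 0.52, 0.55, 0.56),
  junk_position  = c(0.60, 0.58, 0.61, 0.59, 0.62)
)
estimate_lr(trials)
```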

Participant feedback was presented trial-wise in terms of success, beam efficiency, and cumulative score, with financial bonuses tied to performance. Bonus points were awarded for every 250 points, and each bonus point earned the participant an additional 50p on top of the payment for completing the experiment.

Analysis was carried out using R 4.2.1 (GUI 1.79, High Sierra build 8095). Full analysis code can be made available. Linear mixed-effects (LME) models were computed for the task's main outcome measures. The structure of the data included both within- and between-subject factors, including nested elements; as such, the models needed to include multiple nested and non-nested random effects. The R package "lme4" was used, and random slopes and intercepts were fit for relevant variables to account for the structure of the data. Specifically, all models included random effects of (i) participant and (ii) counterbalancing condition (forward, reverse, priming block high or low) nested within block (not including practice blocks). This was to (i) inform the model that each participant's data were interrelated within subject but differed between subjects, and (ii) allow, for example, data from the forward counterbalancing condition to differ from the reverse (and so on). Fixed effects were chosen for each dependent variable via model selection, so the included fixed effects varied slightly between models. These are fully described in the Supplement.
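A minimal lme4 sketch of this random-effects structure is shown below, fit to synthetic data; the variable names (score, block, counterbalance, participant) and the single fixed effect are assumptions, since the actual fixed effects were chosen per outcome by model selection.

```r
# Sketch: the random-effects structure described above, fit to synthetic data.
library(lme4)
set.seed(1)

# Illustrative data: 20 participants x 4 task blocks x 10 trials
task_df <- expand.grid(
  participant = factor(1:20),
  block       = factor(c("Noise", "Volatility", "Combined", "Bivalent")),
  trial       = 1:10
)
task_df$counterbalance <- factor(ifelse(as.integer(task_df$participant) %% 2 == 0,
                                        "forward", "reverse"))
task_df$score <- rnorm(nrow(task_df), mean = 5, sd = 2)

m_score <- lmer(
  score ~ block +                       # illustrative fixed effect
    (1 | participant) +                 # repeated measures within participant
    (1 | block:counterbalance),         # counterbalancing condition nested within block
  data = task_df,
  REML = FALSE                          # ML fit, so models with different fixed effects are comparable
)
summary(m_score)
```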

Each task block represented a different learning environment. To compare metrics between learning environments, backward contrasts were employed: each block was statistically compared to the block before it, i.e. the Volatility Block was compared to the Noise Block, and the Combined Uncertainty Block to the Volatility Block. The final block included a shift in reward policy; here, the Bivalent Combined Uncertainty Block was compared to the Combined Uncertainty Block. In line with all other analyses, backward contrasts were also used to compare extracted computational parameters. However, as priors for the Noise and Volatility Blocks were not estimated for the κx and κa parameters, respectively, LME contrasts for the κx model set did not include the Noise Block, and the κa model set did not include the Volatility Block.
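One way to implement such backward contrasts in R is with successive-difference contrasts (MASS::contr.sdif), sketched below against the synthetic task_df from the previous example; the block ordering is assumed.

```r
# Sketch: backward-difference contrasts so each block is tested against the block
# that precedes it (Volatility vs Noise, Combined vs Volatility, Bivalent vs Combined).
library(MASS)

block_levels <- c("Noise", "Volatility", "Combined", "Bivalent")   # assumed presentation order
task_df$block <- factor(task_df$block, levels = block_levels)
contrasts(task_df$block) <- contr.sdif(length(block_levels))

# Refitting the model above now yields one coefficient per successive block comparison
m_score_bc <- update(m_score, data = task_df)
```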

Model selection was carried out to identify the best model for each metric. To compare models directly, an information theoretic approach (Akaike information criterion, AIC) was used. This approach is conservative because of the penalties it imposes on model complexity (e.g. additional fixed effects), and it allows comparison of nested models. Because we wanted to compare models with differing numbers of fixed effects and the data were nested, the log likelihood (maximum likelihood estimation) was used for these comparisons; Restricted Maximum Likelihood (REML) was then used to estimate the best model once it had been identified. The reported results were read from the models with the lowest AIC values among those compared, including the simplest (null) model (fixed effects set to 1), confirming that the simpler models were the least informative. A chi-square difference test was also used to test whether each model was significantly improved relative to baseline, i.e. the simplest model. Chi-square tests for model selection are appropriate when comparing nested models, i.e. where the models differ only in their fixed effects. Details of the model selection process for the linear mixed models and outcomes for each model set are in the Supplement (Supplementary Tables 5-11 for Experiment 1; Supplementary Tables 12-18 for Experiment 2; and Supplementary Tables 20-23 for Experiment 3).
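The selection workflow might look like the sketch below (continuing with the synthetic data above): candidate models, including a null model, are fit by maximum likelihood, compared on AIC and with a chi-square likelihood-ratio test, and the winning model is then refit with REML for reporting; all model formulas here are illustrative.

```r
# Sketch: AIC-based model selection with a null baseline, followed by a
# chi-square likelihood-ratio test and a REML refit of the winning model.
library(lme4)

m_null  <- lmer(score ~ 1     + (1 | participant) + (1 | block:counterbalance),
                data = task_df, REML = FALSE)
m_block <- lmer(score ~ block + (1 | participant) + (1 | block:counterbalance),
                data = task_df, REML = FALSE)

AIC(m_null, m_block)          # lower AIC preferred; extra fixed effects are penalised
anova(m_null, m_block)        # chi-square test of improvement over the null model

m_final <- update(m_block, REML = TRUE)   # estimates reported from the REML refit
```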

Given this statistical approach, there were seven model sets overall, each predicting a different dependent variable of interest. Model comparison was conducted within model sets (see Supplement for full model comparison details and results). As this process carried a risk of false positive results, a global false discovery rate (FDR) correction was applied: p values from all comparisons across model sets were combined into a single list, and the FDR correction was applied globally. This approach is more conservative, meaning p values are more likely to be adjusted upwards, leading to fewer significant results (i.e. more false negatives but stronger error control). It was appropriate because we wanted to control the FDR across the entire experiment and ensure no inflation of false discoveries overall.
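In R, such a global correction can be applied with p.adjust over the pooled p values, as in the sketch below; the p values and their labels are placeholders.

```r
# Sketch: pool p values from all model sets and apply one Benjamini-Hochberg
# FDR correction across the whole experiment. Values are placeholders.
all_p <- c(
  score_contrast      = 0.003,
  beam_width_contrast = 0.021,
  lr_contrast         = 0.048,
  kappa_x_contrast    = 0.160
)
p.adjust(all_p, method = "fdr")   # "fdr" is an alias for "BH"
```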

To reflect the parallel learning demands of the task, our computational modelling approach was designed to independently estimate learning about both the mean and the variance of the input distribution. This was achieved using an extension of the standard Hierarchical Gaussian Filter (HGF) as the perceptual model. In short, the HGF is an approximation to Bayesian learning based on (prediction) error-based learning rules. Crucially, higher-level beliefs determine the weight given to lower-level prediction errors, which in turn update the higher-level beliefs. For instance, if a subject believes the environment to be stable (volatility belief is low), prediction errors are "explained away" and will not lead to strong updates at lower levels; conversely, strong updating takes place when volatility is believed to be high. For a more in-depth explanation of the HGF, please see the original publications.
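The intuition can be illustrated with a deliberately simplified, non-HGF update rule in which the weight on the prediction error grows with the believed volatility relative to the believed noise; all quantities and the weighting function below are assumptions for illustration only.

```r
# Simplified illustration (not the HGF itself): a delta-rule update whose
# learning rate increases with believed volatility and decreases with believed noise.
update_mean_belief <- function(belief, input, volatility_belief, noise_belief) {
  prediction_error <- input - belief
  lr <- volatility_belief / (volatility_belief + noise_belief)  # Kalman-gain-like weighting
  belief + lr * prediction_error
}

# A volatile-world believer updates strongly ...
update_mean_belief(belief = 0.50, input = 0.62, volatility_belief = 0.20, noise_belief = 0.05)
# ... whereas a stable-world believer largely "explains away" the same error
update_mean_belief(belief = 0.50, input = 0.62, volatility_belief = 0.01, noise_belief = 0.15)
```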

The HGF JGET model (as implemented in the TAPAS toolbox of the TNU) was applicable to our paradigm because it is characterised by a two-branched HGF that "learns" about (1) the current mean value (x1) and (2) the variance (α1) of the input stream (u). It is -- of course -- also hierarchical; there are volatility "parents" for the variation of the mean values (x2) and of the variance (α2) (see Fig. 4A). As in "classic" HGF applications, the same free parameters capture inter-individual differences, e.g. κ for the coupling between levels, or the level-wise ω determining the constant step size for trial-wise updating. Unlike previous HGF applications, the two streams for learning about x and α are combined in an additional, common level that brings together the beliefs about x1 and α1 relevant for trial-by-trial behaviour. Further, each stream has its own set of parameters (e.g. κx for mean learning and κa for SD learning). In the decision model (a modified version of the 'tapas_gaussian_obs' script), we used the belief trajectories to predict the trial-by-trial responses: the belief about the mean x1 → rover position, and β1 weighting the belief about noise α1 → beam width. We introduced this additional free coefficient parameter β1 to capture an individual's general tendency to use the acquired belief for beam-width adaptation (see Fig. 4). We did not fit an equivalent regression coefficient weighting the effect of the belief about the mean on the rover position, as there was no variance in this parameter when setting up the modelling. This might indicate that subjects directly translated their mean belief into positioning the rover, while the translation of noise inference into beam width may have been more complex. Note that within the Matlab-based HGF toolbox, subjects are fitted individually using a quasi-Newton optimisation algorithm.

To assess the test-retest reliability of the key outcome measures of our task, intraclass correlations (ICCs) were calculated for overall score, beam width, prediction errors, performance errors, and LRs. The average amount of time between the online and in-person assessments was 14.32 months, with a range of 11.60-27.20 months. ICCs were computed using a two-way mixed-effects model to assess absolute agreement among single measurements. To further assess the validity of our learning measures, we compared data from our in-person sample with performance on the probabilistic reversal learning (PRL) task by Reed et al., which was collected in the same experimental session. This task was selected because reversal learning and related instrumental learning tasks have consistently shown alterations in individuals with psychosis, making them strong reference points for our volatility-related measures -- unlike tasks that more directly target learning about means or noise. Despite differences in format (categorical choices in PRL vs continuous predictions in our task), both tasks require belief updating in dynamic environments. Whereas the PRL manipulates uncertainty via probabilistic feedback, our task presents it explicitly through noise and volatility. We focused on PRL block 2 (80-40-20% reward probabilities), which most closely resembles the volatility and combined uncertainty conditions in our task. Given that the PRL assesses strategy-level adaptation (e.g. win-switch, lose-stay rates) and does not include an explicit noise component, we focused on our volatility-related outcome measure: LR. As noise is continuous and embedded in all levels of our task, but block-specific and probabilistic in the Reed task, overall performance reflects different types of uncertainty. Therefore, comparisons are most appropriate at the strategy level, although we have included performance correlations for completeness.
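A minimal sketch of this ICC computation, using the irr package and illustrative placeholder scores for the two sessions (T1 online, T2 in person), is shown below.

```r
# Sketch: two-way mixed-effects ICC for absolute agreement of single measurements,
# here for overall task score across the two sessions. Data are placeholders.
library(irr)

scores <- data.frame(
  T1 = c(152, 180, 121, 167, 143),   # online session
  T2 = c(149, 175, 130, 171, 138)    # in-person session ~1-2 years later
)
icc(scores, model = "twoway", type = "agreement", unit = "single")
```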
