A multi-paradigm EEG dataset for studying upper limb rehabilitation exercises - Scientific Data

After data collection, the data were preprocessed using the EEGLAB toolbox in MATLAB (R2023b). First, all behavioral samples from each subject were merged. The merged data then underwent downsampling, baseline correction, filtering, segmentation, and manual removal of bad segments. The original data, sampled at 1000 Hz, were downsampled to 512 Hz and filtered within a frequency range of 0.1 Hz to 100 Hz. After preprocessing, the ".set" files contain EEG structure variables whose key fields include the data field, formatted as (channels × time points × trials), and the event field, which contains event labels and their onset timestamps.
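As a rough illustration of the 1000 Hz to 512 Hz downsampling step, the following minimal sketch resamples a synthetic signal through its spectrum. It is not the EEGLAB routine used for the dataset: the function name `fft_resample` and the 10 Hz test signal are assumptions made for this example, and the FFT-based approach is a stand-in for whatever resampling filter the toolbox applies.

```python
import numpy as np

def fft_resample(x, n_out):
    """Resample a real signal to n_out samples along the last axis via its spectrum.

    Assumes the segment is (approximately) periodic. When downsampling,
    dropping the upper frequency bins also acts as a low-pass filter.
    """
    n_in = x.shape[-1]
    spectrum = np.fft.rfft(x, axis=-1)
    n_bins_out = n_out // 2 + 1
    n_keep = min(spectrum.shape[-1], n_bins_out)
    out_spec = np.zeros(x.shape[:-1] + (n_bins_out,), dtype=complex)
    out_spec[..., :n_keep] = spectrum[..., :n_keep]
    # Rescale so that amplitudes are preserved after the length change.
    return np.fft.irfft(out_spec, n=n_out, axis=-1) * (n_out / n_in)

fs_in, fs_out = 1000, 512                 # sampling rates stated in the text
t_in = np.arange(fs_in) / fs_in           # 1 s of data at 1000 Hz
x = np.sin(2 * np.pi * 10 * t_in)         # synthetic 10 Hz oscillation
y = fft_resample(x, fs_out)               # the same 1 s at 512 Hz
t_out = np.arange(fs_out) / fs_out
```

Because the function works along the last axis, an array laid out as (channels × time points) could be passed directly; an array laid out as (channels × time points × trials) would first need its time axis moved to the end.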

To verify the validity of the dataset, we focus on the following four aspects:

To ensure that event markers were temporally aligned with the onset of task-related EEG data, a U3D-based system was employed. Based on predefined conditions for each scenario, the system transmitted different event types via a serial port to the event buffer of the Neuracle data acquisition software. In paradigms such as motor execution and motor imagery, all experimental scenes were presented on a black background. When a task cue appeared, its onset time and type were simultaneously recorded within the Neuracle system. This synchronization between task events and data acquisition was achieved using a multi-parameter synchronizer provided by Neuracle: when the display computer detected a specific stimulus related to a motor paradigm, it triggered the synchronizer to send a timestamp signal to the data acquisition computer. All timestamps were stored in a .bdf file and used to align physiological recordings with experimental events. Note that in the mirror movement and mirror-assisted movement paradigms, auditory cues were used to prompt the subjects. Because individual responses to auditory stimuli varied slightly, the event markers may deviate slightly from the actual onset of movement.

This section validates the effectiveness of two EEG datasets related to motor imagery tasks by calculating event-related desynchronization (ERD) and event-related synchronization (ERS). The analysis focuses on left-hand and right-hand grasping tasks and covers EEG activity from 59 channels. First, a bandpass filter is applied to the minimally preprocessed data. We selected the 8-13 Hz alpha band because it contains the mu rhythm, which is closely associated with somatosensory and motor activity, and therefore allows more precise extraction of neural oscillations related to sensorimotor function. Next, a continuous wavelet transform is applied to compute power during the task period (the motor imagery phase after the cue appears) and the baseline period (the resting state before the task). The ERD/ERS percentage is then calculated as (task power - baseline power)/baseline power × 100% and visualized with scalp topographic maps. For the baseline period, the power values were log-transformed before being plotted as scalp topographic maps.
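The ERD/ERS formula above can be illustrated with a minimal sketch. The function name `erd_ers_percent` and the three power values are hypothetical, chosen only to show the sign convention: negative values indicate desynchronization (ERD), positive values indicate synchronization (ERS).

```python
import numpy as np

def erd_ers_percent(task_power, baseline_power):
    """ERD/ERS (%) = (task power - baseline power) / baseline power * 100."""
    task_power = np.asarray(task_power, dtype=float)
    baseline_power = np.asarray(baseline_power, dtype=float)
    return (task_power - baseline_power) / baseline_power * 100.0

# Hypothetical alpha-band power for three channels (arbitrary units).
baseline = np.array([4.0, 5.0, 2.0])
task = np.array([3.0, 5.0, 3.0])
erd = erd_ers_percent(task, baseline)   # one value per channel
```

In this toy example the first channel shows a 25% power decrease (ERD), the second shows no change, and the third shows a 50% increase (ERS).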

Two types of analysis were performed, as shown in Fig. 11. The first (cross-subject analysis) displays the average power topographic maps of the baseline period (-1 to 0 s) across all subjects, together with the average ERD/ERS topographic maps for successive task time windows (0-1 s, 1-2 s, 2-3 s, 3-4 s). This part presents the baseline activity and the evolution of neural activity over the course of the task across subjects. For the second (single-subject analysis), we randomly selected four subjects (S201, S210, S215, S228) and displayed their baseline power topographic maps and corresponding ERD/ERS topographic maps (here the baseline period is -1 to 0 s and the task period is 0-2 s throughout). This part presents individual differences in baseline activity and task responses, as well as the general signal quality.

To assess whether the collected dataset can distinguish between different task states, we conducted a classification experiment on the EEG data of each subject after preprocessing. The methods used for classification include CSP + SVM and FBCSP + SVM. CSP (Common Spatial Pattern) is a classical method for feature extraction in EEG signals. The basic idea of the CSP method is as follows: (1) For each class of data, compute the covariance matrix and normalize it. Then, perform generalized eigenvalue decomposition on each class's covariance matrix to obtain spatial filtering matrices and eigenvalues. (2) Sort the eigenvalues in descending order, select the most significant eigenvectors, and apply a whitening transformation to ensure eigenvalue stability. (3) Apply the whitening transformation to the covariance matrix and perform feature decomposition to obtain the final CSP transformation matrix. (4) Finally, apply the CSP transformation matrix to each trial's data and calculate the log variance as features, thus extracting the most discriminative features. The FBCSP (Filter Bank CSP) method is an extension of the CSP method, which enhances the representational capacity of features by introducing frequency band division and filtering. Its basic principles are as follows: (1) Frequency band division: The raw EEG signal is first divided into multiple sub-bands. This study utilized six frequency bands covering 4-28 Hz (4-8 Hz, 8-12 Hz, 12-16 Hz, 16-20 Hz, 20-24 Hz, 24-28 Hz). (2) Sub-band feature extraction: The CSP method is applied to each sub-band to extract its features. (3) Feature selection: From the features generated by each frequency band, the top two and bottom two CSP features are selected, forming a 24-dimensional feature vector. 
Compared to the traditional CSP method, FBCSP decomposes the EEG signal into multiple frequency bands and applies CSP separately in each sub-band, ultimately producing a richer set of log-variance features, thereby improving classification accuracy and robustness.
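The CSP steps described above can be sketched for the two-class case as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it solves the generalized eigenvalue problem by whitening the composite covariance, and the function names, the synthetic trials, and the choice of 8 channels are hypothetical.

```python
import numpy as np

def normalized_cov(trial):
    """Trace-normalized spatial covariance of one trial (channels x samples)."""
    c = trial @ trial.T
    return c / np.trace(c)

def fit_csp(trials_a, trials_b, n_pairs=2):
    """Return 2*n_pairs CSP spatial filters (rows) for two classes of trials."""
    cov_a = np.mean([normalized_cov(t) for t in trials_a], axis=0)
    cov_b = np.mean([normalized_cov(t) for t in trials_b], axis=0)
    # Whitening transform P with P (Ca + Cb) P^T = I.
    evals, evecs = np.linalg.eigh(cov_a + cov_b)
    whiten = np.diag(evals ** -0.5) @ evecs.T
    # Eigendecomposition of the whitened class-A covariance, sorted descending.
    d, b = np.linalg.eigh(whiten @ cov_a @ whiten.T)
    order = np.argsort(d)[::-1]
    w = b[:, order].T @ whiten
    # Keep the first and last n_pairs filters, which separate the classes best.
    return np.vstack([w[:n_pairs], w[-n_pairs:]])

def log_var_features(filters, trial):
    """Log of the normalized variance of each spatially filtered signal."""
    var = np.var(filters @ trial, axis=1)
    return np.log(var / var.sum())

# Synthetic trials: class A has extra variance on channel 0, class B on channel 1.
rng = np.random.default_rng(0)

def make_trials(strong_channel, n_trials=20, n_channels=8, n_samples=200):
    trials = rng.normal(size=(n_trials, n_channels, n_samples))
    trials[:, strong_channel, :] *= 5.0
    return trials

trials_a, trials_b = make_trials(0), make_trials(1)
filters = fit_csp(trials_a, trials_b)
feat_a = log_var_features(filters, trials_a[0])
feat_b = log_var_features(filters, trials_b[0])
```

With `n_pairs=2`, each band contributes four log-variance features; repeating this for the six sub-bands of FBCSP and concatenating the results gives the 24-dimensional feature vector mentioned above.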

Support Vector Machine (SVM) is then used to classify the generated feature vectors from the two classes. For each experiment, 10-fold cross-validation was performed on the dataset, and the average result was taken as the final classification accuracy.
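The 10-fold cross-validation procedure can be sketched as below. To keep the example dependency-free, a nearest-centroid classifier stands in for the SVM; it is not the classifier used in the study, and all function names and the synthetic features are assumptions made for illustration.

```python
import numpy as np

def kfold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, n_folds)
    for i in range(n_folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]

def nearest_centroid_predict(x_train, y_train, x_test):
    """Stand-in classifier (NOT an SVM): pick the class with the closest mean."""
    classes = np.unique(y_train)
    centroids = np.stack([x_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(x_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

def cross_val_accuracy(x, y, n_folds=10):
    """Mean and standard deviation of the per-fold accuracy."""
    scores = []
    for train, test in kfold_indices(len(y), n_folds):
        pred = nearest_centroid_predict(x[train], y[train], x[test])
        scores.append(np.mean(pred == y[test]))
    return float(np.mean(scores)), float(np.std(scores))

# Synthetic, well-separated two-class features (50 trials per class).
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(-2, 1, (50, 4)), rng.normal(2, 1, (50, 4))])
y = np.repeat([0, 1], 50)
mean_acc, std_acc = cross_val_accuracy(x, y)
```

Each trial appears in exactly one test fold, so the averaged accuracy reflects performance on data the classifier did not see during training.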

Figure 12 presents the specific classification accuracy results (n = 28). In the CSP + SVM model, the average accuracies (mean ± standard deviation) for experiments G1 to G6 (G1: flexible rehabilitation glove, G2: motor execution, G3: motor imagery, G4: virtual reality motor imagery, G5: mirror glove, G6: mirror therapy) were 53.12% ± 13.22%, 42.37% ± 9.34%, 43.73% ± 8.25%, 53.37% ± 12.22%, 68.20% ± 11.35%, and 67.99% ± 10.87%, respectively. In comparison, the FBCSP + SVM model achieved higher accuracy across all tasks, with results of 79.11% ± 9.56%, 69.03% ± 10.71%, 70.77% ± 11.83%, 77.20% ± 9.63%, 86.59% ± 10.98%, and 85.88% ± 9.75%, respectively.

As shown in Table 2, the classification accuracy was validated across multiple subjects. The method used for classification was to apply the SVM to the feature vectors generated by FBCSP. Figure 13 presents the confusion matrix results for multiple subjects.

To evaluate the relative effectiveness of our dataset, we referenced another motor imagery EEG dataset involving healthy subjects (n = 52), which reported an average binary classification accuracy of 67.46% ± 13.17% using the CSP + FLDA method. The performance of the CSP + SVM model used in our study for binary classification tasks was G5: 68.20% ± 11.35%; G6: 67.99% ± 10.87%, showing highly similar accuracy rates. Furthermore, for the three-class tasks (G1-G4), the accuracy ranged from 42.37% ± 9.34% to 53.37% ± 12.22%, which also yielded results consistent with our expectations: as the number of categories increases, the classification difficulty increases, and the accuracy correspondingly decreases. Additionally, we referenced a published EEG dataset from acute stroke patients related to motor imagery (n = 50). That study reported average binary classification accuracies of 55.57%, 57.57%, 61.20%, and 72.21% using the CSP + LDA, FBCSP + SVM, TSLDA + DGFDRM, and TWFB + DGFDM methods, respectively. In our experiment, using the same method (FBCSP + SVM) on data from healthy subjects yielded higher classification accuracy (G5: 86.59% ± 10.98%, G6: 85.88% ± 9.75%). This difference was also expected, as acute stroke patients may have difficulty maintaining a sitting position and steadily performing motor imagery paradigms for extended periods.

In summary, these classification results support the reliability of our dataset.

This section presents the brain functional network diagrams corresponding to the six experimental paradigms. For all subjects, we calculated the Phase Lag Index (PLI) for the epochs of each paradigm. This index measures the asymmetry in the distribution of phase differences between two signals; it reflects how consistently one signal leads or lags the other in phase and serves as a useful measure of phase synchronization.
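A minimal sketch of the PLI computation follows, using the common definition PLI = |mean(sign(sin(Δφ)))|, where Δφ is the instantaneous phase difference between two channels. The instantaneous phase is obtained here from an FFT-based analytic signal; the function names and the synthetic three-channel epoch are assumptions for illustration, and the exact settings used for the dataset are not specified in the text.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal along the last axis (FFT-based Hilbert transform)."""
    n = x.shape[-1]
    spectrum = np.fft.fft(x, axis=-1)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain, axis=-1)

def pli_matrix(epoch):
    """PLI between all channel pairs of one epoch (channels x samples)."""
    phase = np.angle(analytic_signal(epoch))
    diff = phase[:, None, :] - phase[None, :, :]
    return np.abs(np.mean(np.sign(np.sin(diff)), axis=-1))

# Synthetic epoch: 1 s at 512 Hz, three channels with a 10 Hz oscillation.
t = np.arange(512) / 512
ch0 = np.sin(2 * np.pi * 10 * t)
ch1 = np.sin(2 * np.pi * 10 * t - np.pi / 4)   # constant phase lag
ch2 = ch0.copy()                               # zero phase lag
pli = pli_matrix(np.stack([ch0, ch1, ch2]))
```

A consistent nonzero lag (channels 0 and 1) yields a PLI near 1, whereas a zero-lag relationship (channels 0 and 2) yields 0, which is why the index is relatively insensitive to instantaneous common sources.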

After obtaining the PLI matrix, we used the 59 electroencephalography (EEG) electrode locations as network nodes and the values representing functional connectivity strength between nodes in the PLI matrix as edge weights. The connection weights in the matrix were transformed into a weight vector to clarify the connectivity relationships between nodes, ultimately completing the construction of the brain functional network.

As shown in Figs. 14 and 15, nodes of different colors in the network represent distinct brain regions (blue: frontal lobe; light blue: central region; green: parietal lobe; orange: occipital lobe; red: temporal lobe). Variations in line color and thickness indicate the strength of connections within the brain network. The networks reveal distinct patterns across the six experimental paradigms in connectivity among the primary motor cortex, premotor area, supplementary motor area, somatosensory cortex, and visual cortex. Notably, connectivity during actual motor execution was stronger than during motor imagery tasks. Furthermore, connectivity during actual motor execution exhibited global coordination and regulation across the entire brain, whereas motor imagery tasks primarily involved regulation of the motor cortex, with weaker connectivity in the somatosensory cortex. However, the results also showed that assisted movement tasks, specifically those involving soft rehabilitation gloves, enhanced connectivity in the somatosensory cortex.

For further analysis, we binarized the weighted matrix using the upper quartile of each PLI matrix as the threshold (values above the threshold were set to 1, all others to 0), thereby constructing a binarized brain functional network. On this basis, we calculated the assortativity coefficient (ASS), a network characteristic commonly used in complex network analysis. It describes the tendency of nodes with similar degree values to connect to each other and thus quantifies the correlation of node degrees in the network. Finally, we performed t-tests on this parameter across different paradigms or tasks.
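The binarization and assortativity steps can be sketched as follows. This is a minimal illustration, assuming the threshold is the 75th percentile of the off-diagonal PLI values and that ASS is the degree assortativity, computed as the Pearson correlation between the degrees at the two ends of each edge. The function names and the random matrix are hypothetical.

```python
import numpy as np

def binarize_upper_quartile(pli):
    """Set connections above the 75th percentile of off-diagonal values to 1."""
    iu = np.triu_indices_from(pli, k=1)
    threshold = np.percentile(pli[iu], 75)
    adj = (pli > threshold).astype(int)
    np.fill_diagonal(adj, 0)
    return adj

def degree_assortativity(adj):
    """Pearson correlation between the degrees at both ends of every edge."""
    degree = adj.sum(axis=0)
    rows, cols = np.nonzero(adj)   # each undirected edge appears in both directions
    return float(np.corrcoef(degree[rows], degree[cols])[0, 1])

# Hypothetical symmetric connectivity matrix for 10 nodes.
rng = np.random.default_rng(0)
r = rng.random((10, 10))
weights = (r + r.T) / 2
np.fill_diagonal(weights, 0.0)
adj = binarize_upper_quartile(weights)

# Sanity check on a star graph, which is maximally disassortative.
star = np.zeros((5, 5), dtype=int)
star[0, 1:] = 1
star[1:, 0] = 1
star_ass = degree_assortativity(star)
```

Positive values indicate that highly connected nodes tend to link to other highly connected nodes; negative values, as in the star graph, indicate that hubs mostly connect to low-degree nodes.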

The following two tables present the differences in the ASS parameter between paradigms (e.g., motor execution vs. motor imagery) or between tasks (e.g., left hand vs. right hand). Abbreviations: SRG: Soft Rehabilitation Glove; ME: Motor Execution; MI: Motor Imagery; VR: VR_Motor Imagery; IMA: Mirror Glove; MIR: Mirror Therapy; 1: left hand; 2: right hand; 3: both hands. For example, "SRG1 ME1" denotes Soft Rehabilitation Glove with the left hand vs. Motor Execution with the left hand. The results of the 28 subjects are provided in Supplementary File 1 (Tables 3, 4).
