From 222af50fead25e562f37cc5f2142af00c42e6a91 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Tue, 19 Nov 2024 01:17:08 +0000 Subject: [PATCH] Update arXiv papers --- README.md | 316 +++++++++--------- data_store/papers_2024-11-19.json | 516 ++++++++++++++++++++++++++++++ 2 files changed, 674 insertions(+), 158 deletions(-) create mode 100644 data_store/papers_2024-11-19.json diff --git a/README.md b/README.md index 2f492f0..005d5d8 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -

Updated on 2024-11-18

+

Updated on 2024-11-19

Brain

@@ -14,74 +14,74 @@ -2024-11-14 -SMILE-UHURA Challenge -- Small Vessel Segmentation at Mesoscopic Scale from Ultra-High Resolution 7T Magnetic Resonance Angiograms -Soumick Chatterjee, Hendrik Mattern, Marc Dörner, Alessandro Sciarra, Florian Dubost, Hannes Schnurre, Rupali Khatun, Chun-Chih Yu, Tsung-Lin Hsieh, Yi-Shan Tsai, Yi-Zeng Fang, Yung-Ching Yang, Juinn-Dar Huang, Marshall Xu, Siyu Liu, Fernanda L. Ribeiro, Saskia Bollmann, Karthikesh Varma Chintalapati, Chethan Mysuru Radhakrishna, Sri Chandana Hudukula Ram Kumara, Raviteja Sutrave, Abdul Qayyum, Moona Mazher, Imran Razzak, Cristobal Rodero, Steven Niederren, Fengming Lin, Yan Xia, Jiacheng Wang, Riyu Qiu, Liansheng Wang, Arya Yazdan Panah, Rosana El Jurdi, Guanghui Fu, Janan Arslan, Ghislain Vaillant, Romain Valabregue, Didier Dormont, Bruno Stankoff, Olivier Colliot, Luisa Vargas, Isai Daniel Chacón, Ioannis Pitsiorlas, Pablo Arbeláez, Maria A. Zuluaga, Stefanie Schreiber, Oliver Speck, Andreas Nürnberger -Link -The human brain receives nutrients and oxygen through an intricate network of blood vessels. Pathology affecting small vessels, at the mesoscopic scale, represents a critical vulnerability within the cerebral blood supply and can lead to severe conditions, such as Cerebral Small Vessel Diseases. The advent of 7 Tesla MRI systems has enabled the acquisition of higher spatial resolution images, making it possible to visualise such vessels in the brain. However, the lack of publicly available annotated datasets has impeded the development of robust, machine learning-driven segmentation algorithms. To address this, the SMILE-UHURA challenge was organised. This challenge, held in conjunction with the ISBI 2023, in Cartagena de Indias, Colombia, aimed to provide a platform for researchers working on related topics. The SMILE-UHURA challenge addresses the gap in publicly available annotated datasets by providing an annotated dataset of Time-of-Flight angiography acquired with 7T MRI. This dataset was created through a combination of automated pre-segmentation and extensive manual refinement. In this manuscript, sixteen submitted methods and two baseline methods are compared both quantitatively and qualitatively on two different datasets: held-out test MRAs from the same dataset as the training data (with labels kept secret) and a separate 7T ToF MRA dataset where both input volumes and labels are kept secret. The results demonstrate that most of the submitted deep learning methods, trained on the provided training dataset, achieved reliable segmentation performance. Dice scores reached up to 0.838 $\pm$ 0.066 and 0.716 $\pm$ 0.125 on the respective datasets, with an average performance of up to 0.804 $\pm$ 0.15. +2024-11-15 +Multi-Task Adversarial Variational Autoencoder for Estimating Biological Brain Age with Multimodal Neuroimaging +Muhammad Usman, Azka Rehman, Abdullah Shahid, Abd Ur Rehman, Sung-Min Gho, Aleum Lee, Tariq M. Khan, Imran Razzak +Link +Despite advances in deep learning for estimating brain age from structural MRI data, incorporating functional MRI data is challenging due to its complex structure and the noisy nature of functional connectivity measurements. To address this, we present the Multitask Adversarial Variational Autoencoder, a custom deep learning framework designed to improve brain age predictions through multimodal MRI data integration. This model separates latent variables into generic and unique codes, isolating shared and modality-specific features. By integrating multitask learning with sex classification as an additional task, the model captures sex-specific aging patterns. Evaluated on the OpenBHB dataset, a large multisite brain MRI collection, the model achieves a mean absolute error of 2.77 years, outperforming traditional methods. This success positions M-AVAE as a powerful tool for metaverse-based healthcare applications in brain age estimation. -2024-11-14 -VPBSD:Vessel-Pattern-Based Semi-Supervised Distillation for Efficient 3D Microscopic Cerebrovascular Segmentation -Xi Lin, Shixuan Zhao, Xinxu Wei, Amir Shmuel, Yongjie Li -Link -3D microscopic cerebrovascular images are characterized by their high resolution, presenting significant annotation challenges, large data volumes, and intricate variations in detail. Together, these factors make achieving high-quality, efficient whole-brain segmentation particularly demanding. In this paper, we propose a novel Vessel-Pattern-Based Semi-Supervised Distillation pipeline (VpbSD) to address the challenges of 3D microscopic cerebrovascular segmentation. This pipeline initially constructs a vessel-pattern codebook that captures diverse vascular structures from unlabeled data during the teacher model's pretraining phase. In the knowledge distillation stage, the codebook facilitates the transfer of rich knowledge from a heterogeneous teacher model to a student model, while the semi-supervised approach further enhances the student model's exposure to diverse learning samples. Experimental results on real-world data, including comparisons with state-of-the-art methods and ablation studies, demonstrate that our pipeline and its individual components effectively address the challenges inherent in microscopic cerebrovascular segmentation. +2024-11-15 +Adapting the Biological SSVEP Response to Artificial Neural Networks +Emirhan Böge, Yasemin Gunindi, Erchan Aptoula, Nihan Alp, Huseyin Ozkan +Link +Neuron importance assessment is crucial for understanding the inner workings of artificial neural networks (ANNs) and improving their interpretability and efficiency. This paper introduces a novel approach to neuron significance assessment inspired by frequency tagging, a technique from neuroscience. By applying sinusoidal contrast modulation to image inputs and analyzing resulting neuron activations, this method enables fine-grained analysis of a network's decision-making processes. Experiments conducted with a convolutional neural network for image classification reveal notable harmonics and intermodulations in neuron-specific responses under part-based frequency tagging. These findings suggest that ANNs exhibit behavior akin to biological brains in tuning to flickering frequencies, thereby opening avenues for neuron/filter importance assessment through frequency tagging. The proposed method holds promise for applications in network pruning, and model interpretability, contributing to the advancement of explainable artificial intelligence and addressing the lack of transparency in neural networks. Future research directions include developing novel loss functions to encourage biologically plausible behavior in ANNs. -2024-11-14 -Imagined Speech and Visual Imagery as Intuitive Paradigms for Brain-Computer Interfaces -Seo-Hyun Lee, Ji-Ha Park, Deok-Seon Kim -Link -Recent advancements in brain-computer interface (BCI) technology have emphasized the promise of imagined speech and visual imagery as effective paradigms for intuitive communication. This study investigates the classification performance and brain connectivity patterns associated with these paradigms, focusing on decoding accuracy across selected word classes. Sixteen participants engaged in tasks involving thirteen imagined speech and visual imagery classes, revealing above-chance classification accuracy for both paradigms. Variability in classification accuracy across individual classes highlights the influence of sensory and motor associations in imagined speech and vivid visual associations in visual imagery. Connectivity analysis further demonstrated increased functional connectivity in language-related and sensory regions for imagined speech, whereas visual imagery activated spatial and visual processing networks. These findings suggest the potential of imagined speech and visual imagery as an intuitive and scalable paradigm for BCI communication when selecting optimal word classes. Further exploration of the decoding outcomes for these two paradigms could provide insights for practical BCI communication. +2024-11-15 +Brain-inspired Action Generation with Spiking Transformer Diffusion Policy Model +Qianhao Wang, Yinqian Sun, Enmeng Lu, Qian Zhang, Yi Zeng +Link +Spiking Neural Networks (SNNs) has the ability to extract spatio-temporal features due to their spiking sequence. While previous research has primarily foucus on the classification of image and reinforcement learning. In our paper, we put forward novel diffusion policy model based on Spiking Transformer Neural Networks and Denoising Diffusion Probabilistic Model (DDPM): Spiking Transformer Modulate Diffusion Policy Model (STMDP), a new brain-inspired model for generating robot action trajectories. In order to improve the performance of this model, we develop a novel decoder module: Spiking Modulate De coder (SMD), which replaces the traditional Decoder module within the Transformer architecture. Additionally, we explored the substitution of DDPM with Denoising Diffusion Implicit Models (DDIM) in our frame work. We conducted experiments across four robotic manipulation tasks and performed ablation studies on the modulate block. Our model consistently outperforms existing Transformer-based diffusion policy method. Especially in Can task, we achieved an improvement of 8%. The proposed STMDP method integrates SNNs, dffusion model and Transformer architecture, which offers new perspectives and promising directions for exploration in brain-inspired robotics. -2024-11-14 -Experimental Demonstration of Remote Synchronization in Coupled Nonlinear Oscillator -Sanjeev Kumar Pandey -Link -This study investigates remote synchronization in scale-free networks of coupled nonlinear oscillators inspired by synchronization observed in the brain's cortical regions and power grid. We employ the Master Stability Function (MSF) approach to analyze network stability across various oscillator models. Synchronization results are obtained for a star network using linearization techniques and extended to arbitrary networks with benchmark oscillators, verifying consistent behavior. Stable synchronous solutions emerge as the Floquet multiplier decreases and the MSF becomes negative. Additionally, we demonstrate remote synchronization in a star network, where peripheral oscillators communicate exclusively through a central hub, drawing parallels to neuronal synchronization in the brain. Experimental validation is achieved through an electronic circuit testbed, supported by nonlinear ODE modeling and LTspice simulation. Future work will extend the investigation to arbitrary network topologies, further elucidating synchronization dynamics in complex systems. +2024-11-15 +EEG Spectral Analysis in Gray Zone Between Healthy and Insomnia +Ha-Na Jo, Young-Seok Kweon, Seo-Hyun Lee +Link +This study investigates the sleep characteristics and brain activity of individuals in the gray zone of insomnia, a population that experiences sleep disturbances yet does not fully meet the clinical criteria for chronic insomnia. Thirteen healthy participants and thirteen individuals from the gray zone were assessed using polysomnography and electroencephalogram to analyze both sleep architecture and neural activity. Although no significant differences in objective sleep quality or structure were found between the groups, gray zone individuals reported higher insomnia severity index scores, indicating subjective sleep difficulties. Electroencephalogram analysis revealed increased delta and alpha activity during the wake stage, suggesting lingering sleep inertia, while non-rapid eye movement stages 1 and 2 exhibited elevated beta and gamma activity, often associated with chronic insomnia. However, these high-frequency patterns were not observed in non-rapid eye movement stage 3 or rapid eye movement sleep, suggesting less severe disruptions compared to chronic insomnia. This study emphasizes that despite normal polysomnography findings, EEG patterns in gray zone individuals suggest a potential risk for chronic insomnia, highlighting the need for early identification and tailored intervention strategies to prevent progression. 2024-11-14 -Exploring Zero-Shot Anomaly Detection with CLIP in Medical Imaging: Are We There Yet? -Aldo Marzullo, Marta Bianca Maria Ranzini -Link -Zero-shot anomaly detection (ZSAD) offers potential for identifying anomalies in medical imaging without task-specific training. In this paper, we evaluate CLIP-based models, originally developed for industrial tasks, on brain tumor detection using the BraTS-MET dataset. Our analysis examines their ability to detect medical-specific anomalies with no or minimal supervision, addressing the challenges posed by limited data annotation. While these models show promise in transferring general knowledge to medical tasks, their performance falls short of the precision required for clinical use. Our findings highlight the need for further adaptation before CLIP-based models can be reliably applied to medical anomaly detection. +A Self-Supervised Model for Multi-modal Stroke Risk Prediction +Camille Delgrange, Olga Demler, Samia Mora, Bjoern Menze, Ezequiel de la Rosa, Neda Davoudi +Link +Predicting stroke risk is a complex challenge that can be enhanced by integrating diverse clinically available data modalities. This study introduces a self-supervised multimodal framework that combines 3D brain imaging, clinical data, and image-derived features to improve stroke risk prediction prior to onset. By leveraging large unannotated clinical datasets, the framework captures complementary and synergistic information across image and tabular data modalities. Our approach is based on a contrastive learning framework that couples contrastive language-image pretraining with an image-tabular matching module, to better align multimodal data representations in a shared latent space. The model is trained on the UK Biobank, which includes structural brain MRI and clinical data. We benchmark its performance against state-of-the-art unimodal and multimodal methods using tabular, image, and image-tabular combinations under diverse frozen and trainable model settings. The proposed model outperformed self-supervised tabular (image) methods by 2.6% (2.6%) in ROC-AUC and by 3.3% (5.6%) in balanced accuracy. Additionally, it showed a 7.6% increase in balanced accuracy compared to the best multimodal supervised model. Through interpretable tools, our approach demonstrated better integration of tabular and image data, providing richer and more aligned embeddings. Gradient-weighted Class Activation Mapping heatmaps further revealed activated brain regions commonly associated in the literature with brain aging, stroke risk, and clinical outcomes. This robust self-supervised multimodal framework surpasses state-of-the-art methods for stroke risk prediction and offers a strong foundation for future studies integrating diverse data modalities to advance clinical predictive modelling. 2024-11-14 -EEG-Based Speech Decoding: A Novel Approach Using Multi-Kernel Ensemble Diffusion Models -Soowon Kim, Ha-Na Jo, Eunyeong Ko -Link -In this study, we propose an ensemble learning framework for electroencephalogram-based overt speech classification, leveraging denoising diffusion probabilistic models with varying convolutional kernel sizes. The ensemble comprises three models with kernel sizes of 51, 101, and 201, effectively capturing multi-scale temporal features inherent in signals. This approach improves the robustness and accuracy of speech decoding by accommodating the rich temporal complexity of neural signals. The ensemble models work in conjunction with conditional autoencoders that refine the reconstructed signals and maximize the useful information for downstream classification tasks. The results indicate that the proposed ensemble-based approach significantly outperforms individual models and existing state-of-the-art techniques. These findings demonstrate the potential of ensemble methods in advancing brain signal decoding, offering new possibilities for non-verbal communication applications, particularly in brain-computer interface systems aimed at aiding individuals with speech impairments. +ART-Rx: A Proportional-Integral-Derivative (PID) Controlled Adaptive Real-Time Threshold Receiver for Molecular Communication +Hongbin Ni, Ozgur B. Akan +Link +Molecular communication (MC) in microfluidic channels faces significant challenges in signal detection due to the stochastic nature of molecule propagation and dynamic, noisy environments. Conventional detection methods often struggle under varying channel conditions, leading to high bit error rates (BER) and reduced communication efficiency. This paper introduces ART-Rx, a novel Adaptive Real-Time Threshold Receiver for MC that addresses these challenges. Implemented within a conceptual system-on-chip (SoC), ART-Rx employs a Proportional-Integral-Derivative (PID) controller to dynamically adjust the detection threshold based on observed errors in real time. Comprehensive simulations using MATLAB and Smoldyn compare ART-Rx's performance against a statistically optimal detection threshold across various scenarios, including different levels of interference, concentration shift keying (CSK) levels, flow velocities, transmitter-receiver distances, diffusion coefficients, and binding rates. The results demonstrate that ART-Rx significantly outperforms conventional methods, maintaining consistently low BER and bit error probabilities (BEP) even in high-noise conditions and extreme channel environments. The system exhibits exceptional robustness to interference and shows the potential to enable higher data rates in CSK modulation. Furthermore, because ART-Rx is effectively adaptable to varying environmental conditions in microfluidic channels, it offers a computationally efficient and straightforward approach to enhance signal detection in nanoscale communication systems. This approach presents a promising control theory-based solution to improve the reliability of data transmission in practical MC systems, with potential applications in healthcare, brain-machine interfaces (BMI), and the Internet of Bio-Nano Things (IoBNT). 2024-11-14 -Towards Unified Neural Decoding of Perceived, Spoken and Imagined Speech from EEG Signals -Jung-Sun Lee, Ha-Na Jo, Seo-Hyun Lee -Link -Brain signals accompany various information relevant to human actions and mental imagery, making them crucial to interpreting and understanding human intentions. Brain-computer interface technology leverages this brain activity to generate external commands for controlling the environment, offering critical advantages to individuals with paralysis or locked-in syndrome. Within the brain-computer interface domain, brain-to-speech research has gained attention, focusing on the direct synthesis of audible speech from brain signals. Most current studies decode speech from brain activity using invasive techniques and emphasize spoken speech data. However, humans express various speech states, and distinguishing these states through non-invasive approaches remains a significant yet challenging task. This research investigated the effectiveness of deep learning models for non-invasive-based neural signal decoding, with an emphasis on distinguishing between different speech paradigms, including perceived, overt, whispered, and imagined speech, across multiple frequency bands. The model utilizing the spatial conventional neural network module demonstrated superior performance compared to other models, especially in the gamma band. Additionally, imagined speech in the theta frequency band, where deep learning also showed strong effects, exhibited statistically significant differences compared to the other speech paradigms. +SMILE-UHURA Challenge -- Small Vessel Segmentation at Mesoscopic Scale from Ultra-High Resolution 7T Magnetic Resonance Angiograms +Soumick Chatterjee, Hendrik Mattern, Marc Dörner, Alessandro Sciarra, Florian Dubost, Hannes Schnurre, Rupali Khatun, Chun-Chih Yu, Tsung-Lin Hsieh, Yi-Shan Tsai, Yi-Zeng Fang, Yung-Ching Yang, Juinn-Dar Huang, Marshall Xu, Siyu Liu, Fernanda L. Ribeiro, Saskia Bollmann, Karthikesh Varma Chintalapati, Chethan Mysuru Radhakrishna, Sri Chandana Hudukula Ram Kumara, Raviteja Sutrave, Abdul Qayyum, Moona Mazher, Imran Razzak, Cristobal Rodero, Steven Niederren, Fengming Lin, Yan Xia, Jiacheng Wang, Riyu Qiu, Liansheng Wang, Arya Yazdan Panah, Rosana El Jurdi, Guanghui Fu, Janan Arslan, Ghislain Vaillant, Romain Valabregue, Didier Dormont, Bruno Stankoff, Olivier Colliot, Luisa Vargas, Isai Daniel Chacón, Ioannis Pitsiorlas, Pablo Arbeláez, Maria A. Zuluaga, Stefanie Schreiber, Oliver Speck, Andreas Nürnberger +Link +The human brain receives nutrients and oxygen through an intricate network of blood vessels. Pathology affecting small vessels, at the mesoscopic scale, represents a critical vulnerability within the cerebral blood supply and can lead to severe conditions, such as Cerebral Small Vessel Diseases. The advent of 7 Tesla MRI systems has enabled the acquisition of higher spatial resolution images, making it possible to visualise such vessels in the brain. However, the lack of publicly available annotated datasets has impeded the development of robust, machine learning-driven segmentation algorithms. To address this, the SMILE-UHURA challenge was organised. This challenge, held in conjunction with the ISBI 2023, in Cartagena de Indias, Colombia, aimed to provide a platform for researchers working on related topics. The SMILE-UHURA challenge addresses the gap in publicly available annotated datasets by providing an annotated dataset of Time-of-Flight angiography acquired with 7T MRI. This dataset was created through a combination of automated pre-segmentation and extensive manual refinement. In this manuscript, sixteen submitted methods and two baseline methods are compared both quantitatively and qualitatively on two different datasets: held-out test MRAs from the same dataset as the training data (with labels kept secret) and a separate 7T ToF MRA dataset where both input volumes and labels are kept secret. The results demonstrate that most of the submitted deep learning methods, trained on the provided training dataset, achieved reliable segmentation performance. Dice scores reached up to 0.838 $\pm$ 0.066 and 0.716 $\pm$ 0.125 on the respective datasets, with an average performance of up to 0.804 $\pm$ 0.15. 2024-11-14 -Dynamic Neural Communication: Convergence of Computer Vision and Brain-Computer Interface -Ji-Ha Park, Seo-Hyun Lee, Soowon Kim, Seong-Whan Lee -Link -Interpreting human neural signals to decode static speech intentions such as text or images and dynamic speech intentions such as audio or video is showing great potential as an innovative communication tool. Human communication accompanies various features, such as articulatory movements, facial expressions, and internal speech, all of which are reflected in neural signals. However, most studies only generate short or fragmented outputs, while providing informative communication by leveraging various features from neural signals remains challenging. In this study, we introduce a dynamic neural communication method that leverages current computer vision and brain-computer interface technologies. Our approach captures the user's intentions from neural signals and decodes visemes in short time steps to produce dynamic visual outputs. The results demonstrate the potential to rapidly capture and reconstruct lip movements during natural speech attempts from human neural signals, enabling dynamic neural communication through the convergence of computer vision and brain--computer interface. +VPBSD:Vessel-Pattern-Based Semi-Supervised Distillation for Efficient 3D Microscopic Cerebrovascular Segmentation +Xi Lin, Shixuan Zhao, Xinxu Wei, Amir Shmuel, Yongjie Li +Link +3D microscopic cerebrovascular images are characterized by their high resolution, presenting significant annotation challenges, large data volumes, and intricate variations in detail. Together, these factors make achieving high-quality, efficient whole-brain segmentation particularly demanding. In this paper, we propose a novel Vessel-Pattern-Based Semi-Supervised Distillation pipeline (VpbSD) to address the challenges of 3D microscopic cerebrovascular segmentation. This pipeline initially constructs a vessel-pattern codebook that captures diverse vascular structures from unlabeled data during the teacher model's pretraining phase. In the knowledge distillation stage, the codebook facilitates the transfer of rich knowledge from a heterogeneous teacher model to a student model, while the semi-supervised approach further enhances the student model's exposure to diverse learning samples. Experimental results on real-world data, including comparisons with state-of-the-art methods and ablation studies, demonstrate that our pipeline and its individual components effectively address the challenges inherent in microscopic cerebrovascular segmentation. 2024-11-14 -Towards Scalable Handwriting Communication via EEG Decoding and Latent Embedding Integration -Jun-Young Kim, Deok-Seon Kim, Seo-Hyun Lee -Link -In recent years, brain-computer interfaces have made advances in decoding various motor-related tasks, including gesture recognition and movement classification, utilizing electroencephalogram (EEG) data. These developments are fundamental in exploring how neural signals can be interpreted to recognize specific physical actions. This study centers on a written alphabet classification task, where we aim to decode EEG signals associated with handwriting. To achieve this, we incorporate hand kinematics to guide the extraction of the consistent embeddings from high-dimensional neural recordings using auxiliary variables (CEBRA). These CEBRA embeddings, along with the EEG, are processed by a parallel convolutional neural network model that extracts features from both data sources simultaneously. The model classifies nine different handwritten characters, including symbols such as exclamation marks and commas, within the alphabet. We evaluate the model using a quantitative five-fold cross-validation approach and explore the structure of the embedding space through visualizations. Our approach achieves a classification accuracy of 91 % for the nine-class task, demonstrating the feasibility of fine-grained handwriting decoding from EEG. +Towards Neural Foundation Models for Vision: Aligning EEG, MEG, and fMRI Representations for Decoding, Encoding, and Modality Conversion +Matteo Ferrante, Tommaso Boccato, Grigorii Rashkov, Nicola Toschi +Link +This paper presents a novel approach towards creating a foundational model for aligning neural data and visual stimuli across multimodal representationsof brain activity by leveraging contrastive learning. We used electroencephalography (EEG), magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI) data. Our framework's capabilities are demonstrated through three key experiments: decoding visual information from neural data, encoding images into neural representations, and converting between neural modalities. The results highlight the model's ability to accurately capture semantic information across different brain imaging techniques, illustrating its potential in decoding, encoding, and modality conversion tasks. 2024-11-14 -Interdependent scaling exponents in the human brain -Daniel M. Castro, Ernesto P. Raposo, Mauro Copelli, Fernando A. N. Santos -Link -We apply the phenomenological renormalization group to resting-state fMRI time series of brain activity in a large population. By recursively coarse-graining the data, we compute scaling exponents for the series variance, log probability of silence, and largest covariance eigenvalue. The exponents clearly exhibit linear interdependencies, which we derive analytically in a mean-field approach. We find a significant correlation of exponent values with the gray matter volume and cognitive performance. Akin to scaling relations near critical points in thermodynamics, our findings suggest scaling interdependencies are intrinsic to brain organization and may also exist in other complex systems. +Imagined Speech and Visual Imagery as Intuitive Paradigms for Brain-Computer Interfaces +Seo-Hyun Lee, Ji-Ha Park, Deok-Seon Kim +Link +Recent advancements in brain-computer interface (BCI) technology have emphasized the promise of imagined speech and visual imagery as effective paradigms for intuitive communication. This study investigates the classification performance and brain connectivity patterns associated with these paradigms, focusing on decoding accuracy across selected word classes. Sixteen participants engaged in tasks involving thirteen imagined speech and visual imagery classes, revealing above-chance classification accuracy for both paradigms. Variability in classification accuracy across individual classes highlights the influence of sensory and motor associations in imagined speech and vivid visual associations in visual imagery. Connectivity analysis further demonstrated increased functional connectivity in language-related and sensory regions for imagined speech, whereas visual imagery activated spatial and visual processing networks. These findings suggest the potential of imagined speech and visual imagery as an intuitive and scalable paradigm for BCI communication when selecting optimal word classes. Further exploration of the decoding outcomes for these two paradigms could provide insights for practical BCI communication. @@ -100,6 +100,41 @@ +2024-11-15 +PFML: Self-Supervised Learning of Time-Series Data Without Representation Collapse +Einari Vaaras, Manu Airaksinen, Okko Räsänen +Link +Self-supervised learning (SSL) is a data-driven learning approach that utilizes the innate structure of the data to guide the learning process. In contrast to supervised learning, which depends on external labels, SSL utilizes the inherent characteristics of the data to produce its own supervisory signal. However, one frequent issue with SSL methods is representation collapse, where the model outputs a constant input-invariant feature representation. This issue hinders the potential application of SSL methods to new data modalities, as trying to avoid representation collapse wastes researchers' time and effort. This paper introduces a novel SSL algorithm for time-series data called Prediction of Functionals from Masked Latents (PFML). Instead of predicting masked input signals or their latent representations directly, PFML operates by predicting statistical functionals of the input signal corresponding to masked embeddings, given a sequence of unmasked embeddings. The algorithm is designed to avoid representation collapse, rendering it straightforwardly applicable to different time-series data domains, such as novel sensor modalities in clinical data. We demonstrate the effectiveness of PFML through complex, real-life classification tasks across three different data modalities: infant posture and movement classification from multi-sensor inertial measurement unit data, emotion recognition from speech data, and sleep stage classification from EEG data. The results show that PFML is superior to a conceptually similar pre-existing SSL method and competitive against the current state-of-the-art SSL method, while also being conceptually simpler and without suffering from representation collapse. + + +2024-11-15 +A Multi-Label EEG Dataset for Mental Attention State Classification in Online Learning +Huan Liu, Yuzhe Zhang, Guanjian Liu, Xinxin Du, Haochong Wang, Dalin Zhang +Link +Attention is a vital cognitive process in the learning and memory environment, particularly in the context of online learning. Traditional methods for classifying attention states of online learners based on behavioral signals are prone to distortion, leading to increased interest in using electroencephalography (EEG) signals for authentic and accurate assessment. However, the field of attention state classification based on EEG signals in online learning faces challenges, including the scarcity of publicly available datasets, the lack of standardized data collection paradigms, and the requirement to consider the interplay between attention and other psychological states. In light of this, we present the Multi-label EEG dataset for classifying Mental Attention states (MEMA) in online learning. We meticulously designed a reliable and standard experimental paradigm with three attention states: neutral, relaxing, and concentrating, considering human physiological and psychological characteristics. This paradigm collected EEG signals from 20 subjects, each participating in 12 trials, resulting in 1,060 minutes of data. Emotional state labels, basic personal information, and personality traits were also collected to investigate the relationship between attention and other psychological states. Extensive quantitative and qualitative analysis, including a multi-label correlation study, validated the quality of the EEG attention data. The MEMA dataset and analysis provide valuable insights for advancing research on attention in online learning. The dataset is publicly available at \url{https://github.com/GuanjianLiu/MEMA}. + + +2024-11-15 +EEG Spectral Analysis in Gray Zone Between Healthy and Insomnia +Ha-Na Jo, Young-Seok Kweon, Seo-Hyun Lee +Link +This study investigates the sleep characteristics and brain activity of individuals in the gray zone of insomnia, a population that experiences sleep disturbances yet does not fully meet the clinical criteria for chronic insomnia. Thirteen healthy participants and thirteen individuals from the gray zone were assessed using polysomnography and electroencephalogram to analyze both sleep architecture and neural activity. Although no significant differences in objective sleep quality or structure were found between the groups, gray zone individuals reported higher insomnia severity index scores, indicating subjective sleep difficulties. Electroencephalogram analysis revealed increased delta and alpha activity during the wake stage, suggesting lingering sleep inertia, while non-rapid eye movement stages 1 and 2 exhibited elevated beta and gamma activity, often associated with chronic insomnia. However, these high-frequency patterns were not observed in non-rapid eye movement stage 3 or rapid eye movement sleep, suggesting less severe disruptions compared to chronic insomnia. This study emphasizes that despite normal polysomnography findings, EEG patterns in gray zone individuals suggest a potential risk for chronic insomnia, highlighting the need for early identification and tailored intervention strategies to prevent progression. + + +2024-11-15 +A Hybrid Artificial Intelligence System for Automated EEG Background Analysis and Report Generation +Chin-Sung Tung, Sheng-Fu Liang, Shu-Feng Chang, Chung-Ping Young +Link +Electroencephalography (EEG) plays a crucial role in the diagnosis of various neurological disorders. However, small hospitals and clinics often lack advanced EEG signal analysis systems and are prone to misinterpretation in manual EEG reading. This study proposes an innovative hybrid artificial intelligence (AI) system for automatic interpretation of EEG background activity and report generation. The system combines deep learning models for posterior dominant rhythm (PDR) prediction, unsupervised artifact removal, and expert-designed algorithms for abnormality detection. For PDR prediction, 1530 labeled EEGs were used, and the best ensemble model achieved a mean absolute error (MAE) of 0.237, a root mean square error (RMSE) of 0.359, an accuracy of 91.8% within a 0.6Hz error, and an accuracy of 99% within a 1.2Hz error. The AI system significantly outperformed neurologists in detecting generalized background slowing (p = 0.02; F1: AI 0.93, neurologists 0.82) and demonstrated improved focal abnormality detection, although not statistically significant (p = 0.79; F1: AI 0.71, neurologists 0.55). Validation on both an internal dataset and the Temple University Abnormal EEG Corpus showed consistent performance (F1: 0.884 and 0.835, respectively; p = 0.66), demonstrating generalizability. The use of large language models (LLMs) for report generation demonstrated 100% accuracy, verified by three other independent LLMs. This hybrid AI system provides an easily scalable and accurate solution for EEG interpretation in resource-limited settings, assisting neurologists in improving diagnostic accuracy and reducing misdiagnosis rates. + + +2024-11-14 +Towards Neural Foundation Models for Vision: Aligning EEG, MEG, and fMRI Representations for Decoding, Encoding, and Modality Conversion +Matteo Ferrante, Tommaso Boccato, Grigorii Rashkov, Nicola Toschi +Link +This paper presents a novel approach towards creating a foundational model for aligning neural data and visual stimuli across multimodal representationsof brain activity by leveraging contrastive learning. We used electroencephalography (EEG), magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI) data. Our framework's capabilities are demonstrated through three key experiments: decoding visual information from neural data, encoding images into neural representations, and converting between neural modalities. The results highlight the model's ability to accurately capture semantic information across different brain imaging techniques, illustrating its potential in decoding, encoding, and modality conversion tasks. + + 2024-11-14 EEG-Based Speech Decoding: A Novel Approach Using Multi-Kernel Ensemble Diffusion Models Soowon Kim, Ha-Na Jo, Eunyeong Ko @@ -134,41 +169,6 @@ Link Decoding the directional focus of an attended speaker from listeners' electroencephalogram (EEG) signals is essential for developing brain-computer interfaces to improve the quality of life for individuals with hearing impairment. Previous works have concentrated on binary directional focus decoding, i.e., determining whether the attended speaker is on the left or right side of the listener. However, a more precise decoding of the exact direction of the attended speaker is necessary for effective speech processing. Additionally, audio spatial information has not been effectively leveraged, resulting in suboptimal decoding results. In this paper, we observe that, on our recently presented dataset with 15-class directional focus, models relying exclusively on EEG inputs exhibits significantly lower accuracy when decoding the directional focus in both leave-one-subject-out and leave-one-trial-out scenarios. By integrating audio spatial spectra with EEG features, the decoding accuracy can be effectively improved. We employ the CNN, LSM-CNN, and EEG-Deformer models to decode the directional focus from listeners' EEG signals with the auxiliary audio spatial spectra. The proposed Sp-Aux-Deformer model achieves notable 15-class decoding accuracies of 57.48% and 61.83% in leave-one-subject-out and leave-one-trial-out scenarios, respectively. - -2024-11-10 -Psycho Gundam: Electroencephalography based real-time robotic control system with deep learning -Chi-Sheng Chen, Wei-Sheng Wang -Link -The Psycho Frame, a sophisticated system primarily used in Universal Century (U.C.) series mobile suits for NEWTYPE pilots, has evolved as an integral component in harnessing the latent potential of mental energy. Its ability to amplify and resonate with the pilot's psyche enables real-time mental control, creating unique applications such as psychomagnetic fields and sensory-based weaponry. This paper presents the development of a novel robotic control system inspired by the Psycho Frame, combining electroencephalography (EEG) and deep learning for real-time control of robotic systems. By capturing and interpreting brainwave data through EEG, the system extends human cognitive commands to robotic actions, reflecting the seamless synchronization of thought and machine, much like the Psyco Frame's integration with a Newtype pilot's mental faculties. This research demonstrates how modern AI techniques can expand the limits of human-machine interaction, potentially transcending traditional input methods and enabling a deeper, more intuitive control of complex robotic systems. - - -2024-11-10 -Evaluating tDCS Intervention Effectiveness via Functional Connectivity Network on Resting-State EEG Data in Major Depressive Disorder -Vishwani Singh, Rohit Verma, Shaurya Shriyam, Tapan K. Gandhi -Link -Transcranial direct current stimulation (tDCS) has emerged as a promising non-invasive therapeutic intervention for major depressive disorder (MDD), yet its effects on neural mechanisms remain incompletely understood. This study investigates the impact of tDCS in individuals with MDD using resting-state EEG data and network neuroscience to analyze functional connectivity. We examined power spectral density (PSD) changes and functional connectivity (FC) patterns across theta, alpha, and beta bands before and after tDCS intervention. A notable aspect of this research involves the modification of the binarizing threshold algorithm to assess functional connectivity networks, facilitating a meaningful comparison at the group level. Our analysis using optimal threshold binarization techniques revealed significant modifications in network topology, particularly evident in the beta band, indicative of reduced randomization or enhanced small-worldness after tDCS. Furthermore, the hubness analysis identified specific brain regions, notably the dorsolateral prefrontal cortex (DLPFC) regions across all frequency bands, exhibiting increased functional connectivity, suggesting their involvement in the antidepressant effects of tDCS. Notably, tDCS intervention transformed the dispersed high connectivity into localized connectivity and increased left-sided asymmetry across all frequency bands. Overall, this study provides valuable insights into the effects of tDCS on neural mechanisms in MDD, offering a potential direction for further research and therapeutic development in the field of neuromodulation for mental health disorders. - - -2024-11-07 -Dynamic-Attention-based EEG State Transition Modeling for Emotion Recognition -Xinke Shen, Runmin Gan, Kaixuan Wang, Shuyi Yang, Qingzhu Zhang, Quanying Liu, Dan Zhang, Sen Song -Link -Electroencephalogram (EEG)-based emotion decoding can objectively quantify people's emotional state and has broad application prospects in human-computer interaction and early detection of emotional disorders. Recently emerging deep learning architectures have significantly improved the performance of EEG emotion decoding. However, existing methods still fall short of fully capturing the complex spatiotemporal dynamics of neural signals, which are crucial for representing emotion processing. This study proposes a Dynamic-Attention-based EEG State Transition (DAEST) modeling method to characterize EEG spatiotemporal dynamics. The model extracts spatiotemporal components of EEG that represent multiple parallel neural processes and estimates dynamic attention weights on these components to capture transitions in brain states. The model is optimized within a contrastive learning framework for cross-subject emotion recognition. The proposed method achieved state-of-the-art performance on three publicly available datasets: FACED, SEED, and SEED-V. It achieved 75.4% accuracy in the binary classification of positive and negative emotions and 59.3% in nine-class discrete emotion classification on the FACED dataset, 88.1% in the three-class classification of positive, negative, and neutral emotions on the SEED dataset, and 73.6% in five-class discrete emotion classification on the SEED-V dataset. The learned EEG spatiotemporal patterns and dynamic transition properties offer valuable insights into neural dynamics underlying emotion processing. - - -2024-11-06 -FLEXtime: Filterbank learning for explaining time series -Thea Brüsch, Kristoffer K. Wickstrøm, Mikkel N. Schmidt, Robert Jenssen, Tommy S. Alstrøm -Link -State-of-the-art methods for explaining predictions based on time series are built on learning an instance-wise saliency mask for each time step. However, for many types of time series, the salient information is found in the frequency domain. Adopting existing methods to the frequency domain involves naively zeroing out frequency content in the signals, which goes against established signal processing theory. Therefore, we propose a new method entitled FLEXtime, that uses a filterbank to split the time series into frequency bands and learns the optimal combinations of these bands. FLEXtime avoids the drawbacks of zeroing out frequency bins and is more stable and easier to train compared to the naive method. Our extensive evaluation shows that FLEXtime on average outperforms state-of-the-art explainability methods across a range of datasets. FLEXtime fills an important gap in the time series explainability literature and can provide a valuable tool for a wide range of time series like EEG and audio. - - -2024-11-05 -Elliptical Wishart distributions: information geometry, maximum likelihood estimator, performance analysis and statistical learning -Imen Ayadi, Florent Bouchard, Frédéric Pascal -Link -This paper deals with Elliptical Wishart distributions - which generalize the Wishart distribution - in the context of signal processing and machine learning. Two algorithms to compute the maximum likelihood estimator (MLE) are proposed: a fixed point algorithm and a Riemannian optimization method based on the derived information geometry of Elliptical Wishart distributions. The existence and uniqueness of the MLE are characterized as well as the convergence of both estimation algorithms. Statistical properties of the MLE are also investigated such as consistency, asymptotic normality and an intrinsic version of Fisher efficiency. On the statistical learning side, novel classification and clustering methods are designed. For the $t$-Wishart distribution, the performance of the MLE and statistical learning algorithms are evaluated on both simulated and real EEG and hyperspectral data, showcasing the interest of our proposed methods. - @@ -215,6 +215,13 @@ 2024-10-31 +Feature Selection via Dynamic Graph-based Attention Block in MI-based EEG Signals +Hyeon-Taek Han, Dae-Hyeok Lee, Heon-Gyu Kwak +Link +Brain-computer interface (BCI) technology enables direct interaction between humans and computers by analyzing brain signals. Electroencephalogram (EEG) is one of the non-invasive tools used in BCI systems, providing high temporal resolution for real-time applications. However, EEG signals are often affected by a low signal-to-noise ratio, physiological artifacts, and individual variability, representing challenges in extracting distinct features. Also, motor imagery (MI)-based EEG signals could contain features with low correlation to MI characteristics, which might cause the weights of the deep model to become biased towards those features. To address these problems, we proposed the end-to-end deep preprocessing method that effectively enhances MI characteristics while attenuating features with low correlation to MI characteristics. The proposed method consisted of the temporal, spatial, graph, and similarity blocks to preprocess MI-based EEG signals, aiming to extract more discriminative features and improve the robustness. We evaluated the proposed method using the public dataset 2a of BCI Competition IV to compare the performances when integrating the proposed method into the conventional models, including the DeepConvNet, the M-ShallowConvNet, and the EEGNet. The experimental results showed that the proposed method could achieve the improved performances and lead to more clustered feature distributions of MI tasks. Hence, we demonstrated that our proposed method could enhance discriminative features related to MI characteristics. + + +2024-10-31 Neurophysiological Analysis in Motor and Sensory Cortices for Improving Motor Imagination Si-Hyun Kim, Sung-Jin Kim, Dae-Hyeok Lee Link @@ -228,6 +235,13 @@ Brain computer interface (BCI) applications in robotics are becoming more famous and famous. People with disabilities are facing a real-time problem of doing simple activities such as grasping, handshaking etc. in order to aid with this problem, the use of brain signals to control actuators is showing a great importance. The Emotive Insight, a Brain-Computer Interface (BCI) device, is utilized in this project to collect brain signals and transform them into commands for controlling a robotic arm using an Arduino controller. The Emotive Insight captures brain signals, which are subsequently analyzed using Emotive software and connected with Arduino code. The HITI Brain software integrates these devices, allowing for smooth communication between brain activity and the robotic arm. This system demonstrates how brain impulses may be utilized to control external devices directly. The results showed that the system is applicable efficiently to robotic arms and also for prosthetic arms with Multi Degree of Freedom. In addition to that, the system can be used for other actuators such as bikes, mobile robots, wheelchairs etc. +2024-10-28 +Can EEG resting state data benefit data-driven approaches for motor-imagery decoding? +Rishan Mehta, Param Rajpura, Yogesh Kumar Meena +Link +Resting-state EEG data in neuroscience research serve as reliable markers for user identification and reveal individual-specific traits. Despite this, the use of resting-state data in EEG classification models is limited. In this work, we propose a feature concatenation approach to enhance decoding models' generalization by integrating resting-state EEG, aiming to improve motor imagery BCI performance and develop a user-generalized model. Using feature concatenation, we combine the EEGNet model, a standard convolutional neural network for EEG signal classification, with functional connectivity measures derived from resting-state EEG data. The findings suggest that although grounded in neuroscience with data-driven learning, the concatenation approach has limited benefits for generalizing models in within-user and across-user scenarios. While an improvement in mean accuracy for within-user scenarios is observed on two datasets, concatenation doesn't benefit across-user scenarios when compared with random data concatenation. The findings indicate the necessity of further investigation on the model interpretability and the effect of random data concatenation on model robustness. + + 2024-10-19 Evaluation Of P300 Speller Performance Using Large Language Models Along With Cross-Subject Training Nithin Parthasarathy, James Soetedjo, Saarang Panchavati, Nitya Parthasarathy, Corey Arnold, Nader Pouratian, William Speier @@ -241,20 +255,6 @@ Link The rapid evolution of Brain-Computer Interfaces (BCIs) has significantly influenced the domain of human-computer interaction, with Steady-State Visual Evoked Potentials (SSVEP) emerging as a notably robust paradigm. This study explores advanced classification techniques leveraging interpretable fuzzy transfer learning (iFuzzyTL) to enhance the adaptability and performance of SSVEP-based systems. Recent efforts have strengthened to reduce calibration requirements through innovative transfer learning approaches, which refine cross-subject generalizability and minimize calibration through strategic application of domain adaptation and few-shot learning strategies. Pioneering developments in deep learning also offer promising enhancements, facilitating robust domain adaptation and significantly improving system responsiveness and accuracy in SSVEP classification. However, these methods often require complex tuning and extensive data, limiting immediate applicability. iFuzzyTL introduces an adaptive framework that combines fuzzy logic principles with neural network architectures, focusing on efficient knowledge transfer and domain adaptation. iFuzzyTL refines input signal processing and classification in a human-interpretable format by integrating fuzzy inference systems and attention mechanisms. This approach bolsters the model's precision and aligns with real-world operational demands by effectively managing the inherent variability and uncertainty of EEG data. The model's efficacy is demonstrated across three datasets: 12JFPM (89.70% accuracy for 1s with an information transfer rate (ITR) of 149.58), Benchmark (85.81% accuracy for 1s with an ITR of 213.99), and eldBETA (76.50% accuracy for 1s with an ITR of 94.63), achieving state-of-the-art results and setting new benchmarks for SSVEP BCI performance. - -2024-10-15 -EEG-based 90-Degree Turn Intention Detection for Brain-Computer Interface -Pradyot Anand, Anant Jain, Suriya Prakash Muthukrishnan, Shubhendu Bhasin, Sitikantha Roy, Mohanavelu Kalathe, Lalan Kumar -Link -Electroencephalography (EEG)--based turn intention prediction for lower limb movement is important to build an efficient brain-computer interface (BCI) system. This study investigates the feasibility of intention detection of left-turn, right-turn, and straight walk by utilizing EEG signals obtained before the event occurrence. Synchronous data was collected using 31-channel EEG and IMU-based motion capture systems for nine healthy participants while performing left-turn, right-turn, and straight walk movements. EEG data was preprocessed with steps including Artifact Subspace Reconstruction (ASR), re-referencing, and Independent Component Analysis (ICA) to remove data noise. Feature extraction from the preprocessed EEG data involved computing various statistical measures (mean, median, standard deviation, skew, and kurtosis), and Hjorth parameters (activity, mobility, and complexity). Further, the feature selection was performed using the Random forest algorithm for the dimensionality reduction. The feature set obtained was utilized for 3-class classification using XG boost, gradient boosting, and support vector machine (SVM) with RBF kernel classifiers in a five-fold cross-validation scheme. Using the proposed intention detection methodology, the SVM classifier using an EEG window of 1.5 s and 0 s time-lag has the best decoding performance with mean accuracy, precision, and recall of 81.23%, 85.35%, and 83.92%, respectively, across the nine participants. The decoding analysis shows the feasibility of turn intention prediction for lower limb movement using the EEG signal before the event onset. - - -2024-10-13 -Towards Homogeneous Lexical Tone Decoding from Heterogeneous Intracranial Recordings -Di Wu, Siyuan Li, Chen Feng, Lu Cao, Yue Zhang, Jie Yang, Mohamad Sawan -Link -Recent advancements in brain-computer interfaces (BCIs) have enabled the decoding of lexical tones from intracranial recordings, offering the potential to restore the communication abilities of speech-impaired tonal language speakers. However, data heterogeneity induced by both physiological and instrumental factors poses a significant challenge for unified invasive brain tone decoding. Traditional subject-specific models, which operate under a heterogeneous decoding paradigm, fail to capture generalized neural representations and cannot effectively leverage data across subjects. To address these limitations, we introduce Homogeneity-Heterogeneity Disentangled Learning for neural Representations (H2DiLR), a novel framework that disentangles and learns both the homogeneity and heterogeneity from intracranial recordings across multiple subjects. To evaluate H2DiLR, we collected stereoelectroencephalography (sEEG) data from multiple participants reading Mandarin materials comprising 407 syllables, representing nearly all Mandarin characters. Extensive experiments demonstrate that H2DiLR, as a unified decoding paradigm, significantly outperforms the conventional heterogeneous decoding approach. Furthermore, we empirically confirm that H2DiLR effectively captures both homogeneity and heterogeneity during neural representation learning. - @@ -273,6 +273,13 @@ 2024-11-14 +Towards Neural Foundation Models for Vision: Aligning EEG, MEG, and fMRI Representations for Decoding, Encoding, and Modality Conversion +Matteo Ferrante, Tommaso Boccato, Grigorii Rashkov, Nicola Toschi +Link +This paper presents a novel approach towards creating a foundational model for aligning neural data and visual stimuli across multimodal representationsof brain activity by leveraging contrastive learning. We used electroencephalography (EEG), magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI) data. Our framework's capabilities are demonstrated through three key experiments: decoding visual information from neural data, encoding images into neural representations, and converting between neural modalities. The results highlight the model's ability to accurately capture semantic information across different brain imaging techniques, illustrating its potential in decoding, encoding, and modality conversion tasks. + + +2024-11-14 Interdependent scaling exponents in the human brain Daniel M. Castro, Ernesto P. Raposo, Mauro Copelli, Fernando A. N. Santos Link @@ -334,13 +341,6 @@ Link To understand collective network behavior in the complex human brain, pairwise correlation networks alone are insufficient for capturing the high-order interactions that extend beyond pairwise interactions and play a crucial role in brain network dynamics. These interactions often reveal intricate relationships among multiple brain networks, significantly influencing cognitive processes. In this study, we explored the correlation of correlation networks and topological network analysis with resting-state fMRI to gain deeper insights into these higher-order interactions and their impact on the topology of brain networks, ultimately enhancing our understanding of brain function. We observed that the correlation of correlation networks highlighted network connections while preserving the topological structure of correlation networks. Our findings suggested that the correlation of correlation networks surpassed traditional correlation networks, showcasing considerable potential for applications in various areas of network science. Moreover, after applying topological network analysis to the correlation of correlation networks, we observed that some high-order interaction hubs predominantly occurred in primary and high-level cognitive areas, such as the visual and fronto-parietal regions. These high-order hubs played a crucial role in information integration within the human brain. - -2024-11-05 -The Dynamics of Triple Interactions in Resting fMRI: Insights into Psychotic Disorders -Qiang Li, Vince D. Calhoun, Armin Iraji -Link -The human brain dynamically integrated and configured information to adapt to the environment. To capture these changes over time, dynamic second-order functional connectivity was typically used to capture transient brain patterns. However, dynamic second-order functional connectivity typically ignored interactions beyond pairwise relationships. To address this limitation, we utilized dynamic triple interactions to investigate multiscale network interactions in the brain. In this study, we evaluated a resting-state fMRI dataset that included individuals with psychotic disorders (PD). We first estimated dynamic triple interactions using resting-state fMRI. After clustering, we estimated cohort-specific and cohort-common states for controls (CN), schizophrenia (SZ), and schizoaffective disorder (SAD). From the cohort-specific states, we observed significant triple interactions, particularly among visual, subcortical, and somatomotor networks, as well as temporal and higher cognitive networks in SZ. In SAD, key interactions involved temporal networks in the initial state and somatomotor networks in subsequent states. From the cohort-common states, we observed that high-cognitive networks were primarily involved in SZ and SAD compared to CN. Furthermore, the most significant differences between SZ and SAD also existed in high-cognitive networks. In summary, we studied PD using dynamic triple interaction, the first time such an approach has been used to study PD. Our findings highlighted the significant potential of dynamic high-order functional connectivity, paving the way for new avenues in the study of the healthy and disordered human brain. - @@ -358,6 +358,13 @@ +2024-11-14 +Towards Neural Foundation Models for Vision: Aligning EEG, MEG, and fMRI Representations for Decoding, Encoding, and Modality Conversion +Matteo Ferrante, Tommaso Boccato, Grigorii Rashkov, Nicola Toschi +Link +This paper presents a novel approach towards creating a foundational model for aligning neural data and visual stimuli across multimodal representationsof brain activity by leveraging contrastive learning. We used electroencephalography (EEG), magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI) data. Our framework's capabilities are demonstrated through three key experiments: decoding visual information from neural data, encoding images into neural representations, and converting between neural modalities. The results highlight the model's ability to accurately capture semantic information across different brain imaging techniques, illustrating its potential in decoding, encoding, and modality conversion tasks. + + 2024-11-12 Search for the X17 particle in $^{7}\mathrm{Li}(\mathrm{p},\mathrm{e}^+ \mathrm{e}^{-}) ^{8}\mathrm{Be}$ processes with the MEG II detector The MEG II collaboration, K. Afanaciev, A. M. Baldini, S. Ban, H. Benmansour, G. Boca, P. W. Cattaneo, G. Cavoto, F. Cei, M. Chiappini, A. Corvaglia, G. Dal Maso, A. De Bari, M. De Gerone, L. Ferrari Barusso, M. Francesconi, L. Galli, G. Gallucci, F. Gatti, L. Gerritzen, F. Grancagnolo, E. G. Grandoni, M. Grassi, D. N. Grigoriev, M. Hildebrandt, F. Ignatov, F. Ikeda, T. Iwamoto, S. Karpov, P. -R. Kettle, N. Khomutov, A. Kolesnikov, N. Kravchuk, V. Krylov, N. Kuchinskiy, F. Leonetti, W. Li, V. Malyshev, A. Matsushita, M. Meucci, S. Mihara, W. Molzon, T. Mori, D. Nicolò, H. Nishiguchi, A. Ochi, W. Ootani, A. Oya, D. Palo, M. Panareo, A. Papa, V. Pettinacci, A. Popov, F. Renga, S. Ritt, M. Rossella, A. Rozhdestvensky. S. Scarpellini, P. Schwendimann, G. Signorelli, M. Takahashi, Y. Uchiyama, A. Venturini, B. Vitali, C. Voena, K. Yamamoto, R. Yokota, T. Yonemoto @@ -420,13 +427,6 @@ Link Epilepsy affects over 50 million people globally, with EEG/MEG-based spike detection playing a crucial role in diagnosis and treatment. Manual spike identification is time-consuming and requires specialized training, limiting the number of professionals available to analyze EEG/MEG data. To address this, various algorithmic approaches have been developed. However, current methods face challenges in handling varying channel configurations and in identifying the specific channels where spikes originate. This paper introduces a novel Nested Deep Learning (NDL) framework designed to overcome these limitations. NDL applies a weighted combination of signals across all channels, ensuring adaptability to different channel setups, and allows clinicians to identify key channels more accurately. Through theoretical analysis and empirical validation on real EEG/MEG datasets, NDL demonstrates superior accuracy in spike detection and channel localization compared to traditional methods. The results show that NDL improves prediction accuracy, supports cross-modality data integration, and can be fine-tuned for various neurophysiological applications. - -2024-09-23 -Dual Stream Graph Transformer Fusion Networks for Enhanced Brain Decoding -Lucas Goene, Siamak Mehrkanoon -Link -This paper presents the novel Dual Stream Graph-Transformer Fusion (DS-GTF) architecture designed specifically for classifying task-based Magnetoencephalography (MEG) data. In the spatial stream, inputs are initially represented as graphs, which are then passed through graph attention networks (GAT) to extract spatial patterns. Two methods, TopK and Thresholded Adjacency are introduced for initializing the adjacency matrix used in the GAT. In the temporal stream, the Transformer Encoder receives concatenated windowed input MEG data and learns new temporal representations. The learned temporal and spatial representations from both streams are fused before reaching the output layer. Experimental results demonstrate an enhancement in classification performance and a reduction in standard deviation across multiple test subjects compared to other examined models. - @@ -530,74 +530,74 @@ -2024-11-14 -Med-Bot: An AI-Powered Assistant to Provide Accurate and Reliable Medical Information -Ahan Bhatt, Nandan Vaghela -Link -This paper introduces Med-Bot, an AI-powered chatbot designed to provide users with accurate and reliable medical information. Utilizing advanced libraries and frameworks such as PyTorch, Chromadb, Langchain and Autogptq, Med-Bot is built to handle the complexities of natural language understanding in a healthcare context. The integration of llamaassisted data processing and AutoGPT-Q provides enhanced performance in processing and responding to queries based on PDFs of medical literature, ensuring that users receive precise and trustworthy information. This research details the methodologies employed in developing Med-Bot and evaluates its effectiveness in disseminating healthcare information. +2024-11-15 +Framework for Co-distillation Driven Federated Learning to Address Class Imbalance in Healthcare +Suraj Racha, Shubh Gupta, Humaira Firdowse, Aastik Solanki, Ganesh Ramakrishnan, Kshitij S. Jadhav +Link +Federated Learning (FL) is a pioneering approach in distributed machine learning, enabling collaborative model training across multiple clients while retaining data privacy. However, the inherent heterogeneity due to imbalanced resource representations across multiple clients poses significant challenges, often introducing bias towards the majority class. This issue is particularly prevalent in healthcare settings, where hospitals acting as clients share medical images. To address class imbalance and reduce bias, we propose a co-distillation driven framework in a federated healthcare setting. Unlike traditional federated setups with a designated server client, our framework promotes knowledge sharing among clients to collectively improve learning outcomes. Our experiments demonstrate that in a federated healthcare setting, co-distillation outperforms other federated methods in handling class imbalance. Additionally, we demonstrate that our framework has the least standard deviation with increasing imbalance while outperforming other baselines, signifying the robustness of our framework for FL in healthcare. -2024-11-14 -Exploring the capabilities of CNNs for 3D angiographic reconstructions from limited projection data using rotational angiography -Ahmad Rahmatpour, Allison Shields, Parmita Mondal, Parisa Naghdi, Michael Udin, Kyle A Williams, Mohammad Mahdi Shiraz Bhurwani, Swetadri Vasan Setlur Nagesh, Ciprian N Ionita -Link -This study leverages convolutional neural networks to enhance the temporal resolution of 3D angiography in intracranial aneurysms focusing on the reconstruction of volumetric contrast data from sparse and limited projections. Three patient-specific IA geometries were segmented and converted into stereolithography files to facilitate computational fluid dynamics simulations. These simulations first modeled blood flow under steady conditions with varying inlet velocities: 0.25 m/s, 0.35 m/s, and 0.45 m/s. Subsequently, 3D angiograms were simulated by labeling inlet particles to represent contrast bolus injections over durations of 0.5s, 1.0s, 1.5s, and 2.0s. The angiographic simulations were then used within a simulated cone beam C arm CT system to generate in-silico rotational DSAs, capturing projections every 10 ms over a 220-degree arc at 27 frames per second. From these simulations, both fully sampled (108 projections) and truncated projection datasets were generated the latter using a maximum of 49 projections. High fidelity volumetric images were reconstructed using a Parker weighted Feldkamp Davis Kress algorithm. A modified U Net CNN was subsequently trained on these datasets to reconstruct 3D angiographic volumes from the truncated projections. The network incorporated multiple convolutional layers with ReLU activations and Max pooling, complemented by upsampling and concatenation to preserve spatial detail. Model performance was evaluated using mean squared error (MSE). Evaluating our U net model across the test set yielded a MSE of 0.0001, indicating good agreement with ground truth reconstructions and demonstrating acceptable capabilities in capturing relevant transient angiographic features. This study confirms the feasibility of using CNNs for reconstructing 3D angiographic images from truncated projections. +2024-11-15 +Weakly-Supervised Multimodal Learning on MIMIC-CXR +Andrea Agostini, Daphné Chopard, Yang Meng, Norbert Fortin, Babak Shahbaba, Stephan Mandt, Thomas M. Sutter, Julia E. Vogt +Link +Multimodal data integration and label scarcity pose significant challenges for machine learning in medical settings. To address these issues, we conduct an in-depth evaluation of the newly proposed Multimodal Variational Mixture-of-Experts (MMVM) VAE on the challenging MIMIC-CXR dataset. Our analysis demonstrates that the MMVM VAE consistently outperforms other multimodal VAEs and fully supervised approaches, highlighting its strong potential for real-world medical applications. -2024-11-14 -MICCAI-CDMRI 2023 QuantConn Challenge Findings on Achieving Robust Quantitative Connectivity through Harmonized Preprocessing of Diffusion MRI -Nancy R. Newlin, Kurt Schilling, Serge Koudoro, Bramsh Qamar Chandio, Praitayini Kanakaraj, Daniel Moyer, Claire E. Kelly, Sila Genc, Jian Chen, Joseph Yuan-Mou Yang, Ye Wu, Yifei He, Jiawei Zhang, Qingrun Zeng, Fan Zhang, Nagesh Adluru, Vishwesh Nath, Sudhir Pathak, Walter Schneider, Anurag Gade, Yogesh Rathi, Tom Hendriks, Anna Vilanova, Maxime Chamberland, Tomasz Pieciak, Dominika Ciupek, Antonio Tristán Vega, Santiago Aja-Fernández, Maciej Malawski, Gani Ouedraogo, Julia Machnio, Christian Ewert, Paul M. Thompson, Neda Jahanshad, Eleftherios Garyfallidis, Bennett A. Landman -Link -White matter alterations are increasingly implicated in neurological diseases and their progression. International-scale studies use diffusion-weighted magnetic resonance imaging (DW-MRI) to qualitatively identify changes in white matter microstructure and connectivity. Yet, quantitative analysis of DW-MRI data is hindered by inconsistencies stemming from varying acquisition protocols. There is a pressing need to harmonize the preprocessing of DW-MRI datasets to ensure the derivation of robust quantitative diffusion metrics across acquisitions. In the MICCAI-CDMRI 2023 QuantConn challenge, participants were provided raw data from the same individuals collected on the same scanner but with two different acquisitions and tasked with preprocessing the DW-MRI to minimize acquisition differences while retaining biological variation. Submissions are evaluated on the reproducibility and comparability of cross-acquisition bundle-wise microstructure measures, bundle shape features, and connectomics. The key innovations of the QuantConn challenge are that (1) we assess bundles and tractography in the context of harmonization for the first time, (2) we assess connectomics in the context of harmonization for the first time, and (3) we have 10x additional subjects over prior harmonization challenge, MUSHAC and 100x over SuperMUDI. We find that bundle surface area, fractional anisotropy, connectome assortativity, betweenness centrality, edge count, modularity, nodal strength, and participation coefficient measures are most biased by acquisition and that machine learning voxel-wise correction, RISH mapping, and NeSH methods effectively reduce these biases. In addition, microstructure measures AD, MD, RD, bundle length, connectome density, efficiency, and path length are least biased by these acquisition differences. +2024-11-15 +EHRs Data Harmonization Platform, an easy-to-use shiny app based on recodeflow for harmonizing and deriving clinical features +Arian Aminoleslami, Geoffrey M. Anderson, Davide Chicco +Link +Electronic health records (EHRs) contain important longitudinal information on individuals who have received medical care. Traditionally, EHRs have been used to support a wide range of administrative activities such as billing and clinical workflow, but, given the depth and breadth of clinical and demographic data they contain, they are increasingly being used to provide real-world data for research. Although EHR data have enormous research potential, the full realization of that potential requires a data management strategy that extracts from large EHR databases, that are collected from a range of care settings and time periods, well-documented research-relevant data that can be used by different researchers. Having a common well-documented data management strategy for EHR will support reproducible research and sharing documentation on research variables that are derived from EHR variables is important to open science. In this short paper, we describe the EHRs Data Harmonization Platform. The platform is based on an easy to use web app a publicly available at https://poxotn-arian-aminoleslami.shinyapps.io/Arian/ and as a standalone software package at https://github.com/ArianAminoleslami/EHRs-Data Harmonization-Platform, that is linked to an existing R library for data harmonization called recodeflow. The platform can be used to extract, document, and harmonize variables from EHR and it can also be used to document and share research variables that have been derived from those EHR data. -2024-11-14 -Effect of Parametric Variation of Chordae Tendineae Structure on Simulated Atrioventricular Valve Closure -Nicolas R. Mangine, Devin W. Laurence, Patricia M. Sabin, Wensi Wu, Christian Herz, Christopher N. Zelonis, Justin S. Unger, Csaba Pinter, Andras Lasso, Steve A. Maas, Jeffrey A. Weiss, Matthew A. Jolley -Link -Many approaches have been used to model chordae tendineae geometries in finite element simulations of atrioventricular heart valves. Unfortunately, current "functional" chordae tendineae geometries lack fidelity that would be helpful when informing clinical decisions. The objectives of this work are (i) to improve synthetic chordae tendineae geometry fidelity to consider branching and (ii) to define how the chordae tendineae geometry affects finite element simulations of valve closure. In this work, we develop an open-source method to construct synthetic chordae tendineae geometries in the SlicerHeart Extension of 3D Slicer. The generated geometries are then used in FEBio finite element simulations of atrioventricular valve function to evaluate how variations in chordae tendineae geometry influence valve behavior. Effects are evaluated using functional and mechanical metrics. Our findings demonstrated that altering the chordae tendineae geometry of a stereotypical mitral valve led to changes in clinically relevant valve metrics and valve mechanics. Specifically, cross sectional area had the most influence over valve closure metrics, followed by chordae tendineae density, length, radius and branches. We then used this information to showcase the flexibility of our new workflow by altering the chordae tendineae geometry of two additional geometries (mitral valve with annular dilation and tricuspid valve) to improve finite element predictions. This study presents a flexible, open-source method for generating synthetic chordae tendineae with realistic branching structures. Further, we establish relationships between the chordae tendineae geometry and valve functional/mechanical metrics. This research contribution helps enrich our open-source workflow and brings the finite element simulations closer to use in a patient-specific clinical setting. +2024-11-15 +A Realistic Collimated X-Ray Image Simulation Pipeline +Benjamin El-Zein, Dominik Eckert, Thomas Weber, Maximilian Rohleder, Ludwig Ritschl, Steffen Kappler, Andreas Maier +Link +Collimator detection remains a challenging task in X-ray systems with unreliable or non-available information about the detectors position relative to the source. This paper presents a physically motivated image processing pipeline for simulating the characteristics of collimator shadows in X-ray images. By generating randomized labels for collimator shapes and locations, incorporating scattered radiation simulation, and including Poisson noise, the pipeline enables the expansion of limited datasets for training deep neural networks. We validate the proposed pipeline by a qualitative and quantitative comparison against real collimator shadows. Furthermore, it is demonstrated that utilizing simulated data within our deep learning framework not only serves as a suitable substitute for actual collimators but also enhances the generalization performance when applied to real-world data. -2024-11-14 -Assessing the Performance of the DINOv2 Self-supervised Learning Vision Transformer Model for the Segmentation of the Left Atrium from MRI Images -Bipasha Kundu, Bidur Khanal, Richard Simon, Cristian A. Linte -Link -Accurate left atrium (LA) segmentation from pre-operative scans is crucial for diagnosing atrial fibrillation, treatment planning, and supporting surgical interventions. While deep learning models are key in medical image segmentation, they often require extensive manually annotated data. Foundation models trained on larger datasets have reduced this dependency, enhancing generalizability and robustness through transfer learning. We explore DINOv2, a self-supervised learning vision transformer trained on natural images, for LA segmentation using MRI. The challenges for LA's complex anatomy, thin boundaries, and limited annotated data make accurate segmentation difficult before & during the image-guided intervention. We demonstrate DINOv2's ability to provide accurate & consistent segmentation, achieving a mean Dice score of .871 & a Jaccard Index of .792 for end-to-end fine-tuning. Through few-shot learning across various data sizes & patient counts, DINOv2 consistently outperforms baseline models. These results suggest that DINOv2 effectively adapts to MRI with limited data, highlighting its potential as a competitive tool for segmentation & encouraging broader use in medical imaging. +2024-11-15 +Fill in the blanks: Rethinking Interpretability in vision +Pathirage N. Deelaka, Tharindu Wickremasinghe, Devin Y. De Silva, Lisara N. Gajaweera +Link +Model interpretability is a key challenge that has yet to align with the advancements observed in contemporary state-of-the-art deep learning models. In particular, deep learning aided vision tasks require interpretability, in order for their adoption in more specialized domains such as medical imaging. Although the field of explainable AI (XAI) developed methods for interpreting vision models along with early convolutional neural networks, recent XAI research has mainly focused on assigning attributes via saliency maps. As such, these methods are restricted to providing explanations at a sample level, and many explainability methods suffer from low adaptability across a wide range of vision models. In our work, we re-think vision-model explainability from a novel perspective, to probe the general input structure that a model has learnt during its training. To this end, we ask the question: "How would a vision model fill-in a masked-image". Experiments on standard vision datasets and pre-trained models reveal consistent patterns, and could be intergrated as an additional model-agnostic explainability tool in modern machine-learning platforms. The code will be available at \url{https://github.com/BoTZ-TND/FillingTheBlanks.git} -2024-11-14 -Expert Study on Interpretable Machine Learning Models with Missing Data -Lena Stempfle, Arthur James, Julie Josse, Tobias Gauss, Fredrik D. Johansson -Link -Inherently interpretable machine learning (IML) models provide valuable insights for clinical decision-making but face challenges when features have missing values. Classical solutions like imputation or excluding incomplete records are often unsuitable in applications where values are missing at test time. In this work, we conducted a survey with 71 clinicians from 29 trauma centers across France, including 20 complete responses to study the interaction between medical professionals and IML applied to data with missing values. This provided valuable insights into how missing data is interpreted in clinical machine learning. We used the prediction of hemorrhagic shock as a concrete example to gauge the willingness and readiness of the participants to adopt IML models from three classes of methods. Our findings show that, while clinicians value interpretability and are familiar with common IML methods, classical imputation techniques often misalign with their intuition, and that models that natively handle missing values are preferred. These results emphasize the need to integrate clinical intuition into future IML models for better human-computer interaction. +2024-11-15 +ScribbleVS: Scribble-Supervised Medical Image Segmentation via Dynamic Competitive Pseudo Label Selection +Tao Wang, Xinlin Zhang, Yuanbin Chen, Yuanbo Zhou, Longxuan Zhao, Tao Tan, Tong Tong +Link +In clinical medicine, precise image segmentation can provide substantial support to clinicians. However, achieving such precision often requires a large amount of finely annotated data, which can be costly. Scribble annotation presents a more efficient alternative, boosting labeling efficiency. However, utilizing such minimal supervision for medical image segmentation training, especially with scribble annotations, poses significant challenges. To address these challenges, we introduce ScribbleVS, a novel framework that leverages scribble annotations. We introduce a Regional Pseudo Labels Diffusion Module to expand the scope of supervision and reduce the impact of noise present in pseudo labels. Additionally, we propose a Dynamic Competitive Selection module for enhanced refinement in selecting pseudo labels. Experiments conducted on the ACDC and MSCMRseg datasets have demonstrated promising results, achieving performance levels that even exceed those of fully supervised methodologies. The codes of this study are available at https://github.com/ortonwang/ScribbleVS. -2024-11-14 -OOD-SEG: Out-Of-Distribution detection for image SEGmentation with sparse multi-class positive-only annotations -Junwen Wang, Zhonghao Wang, Oscar MacCormac, Jonathan Shapey, Tom Vercauteren -Link -Despite significant advancements, segmentation based on deep neural networks in medical and surgical imaging faces several challenges, two of which we aim to address in this work. First, acquiring complete pixel-level segmentation labels for medical images is time-consuming and requires domain expertise. Second, typical segmentation pipelines cannot detect out-of-distribution (OOD) pixels, leaving them prone to spurious outputs during deployment. In this work, we propose a novel segmentation approach exploiting OOD detection that learns only from sparsely annotated pixels from multiple positive-only classes. %but \emph{no background class} annotation. These multi-class positive annotations naturally fall within the in-distribution (ID) set. Unlabelled pixels may contain positive classes but also negative ones, including what is typically referred to as \emph{background} in standard segmentation formulations. Here, we forgo the need for background annotation and consider these together with any other unseen classes as part of the OOD set. Our framework can integrate, at a pixel-level, any OOD detection approaches designed for classification tasks. To address the lack of existing OOD datasets and established evaluation metric for medical image segmentation, we propose a cross-validation strategy that treats held-out labelled classes as OOD. Extensive experiments on both multi-class hyperspectral and RGB surgical imaging datasets demonstrate the robustness and generalisation capability of our proposed framework. +2024-11-15 +Embedding Byzantine Fault Tolerance into Federated Learning via Virtual Data-Driven Consistency Scoring Plugin +Youngjoon Lee, Jinu Gong, Joonhyuk Kang +Link +Given sufficient data from multiple edge devices, federated learning (FL) enables training a shared model without transmitting private data to a central server. However, FL is generally vulnerable to Byzantine attacks from compromised edge devices, which can significantly degrade the model performance. In this paper, we propose a intuitive plugin that can be integrated into existing FL techniques to achieve Byzantine-Resilience. Key idea is to generate virtual data samples and evaluate model consistency scores across local updates to effectively filter out compromised edge devices. By utilizing this scoring mechanism before the aggregation phase, the proposed plugin enables existing FL techniques to become robust against Byzantine attacks while maintaining their original benefits. Numerical results on medical image classification task validate that plugging the proposed approach into representative FL algorithms, effectively achieves Byzantine resilience. Furthermore, the proposed plugin maintains the original convergence properties of the base FL algorithms when no Byzantine attacks are present. -2024-11-14 -Marker-free Human Gait Analysis using a Smart Edge Sensor System -Eva Katharina Bauer, Simon Bultmann, Sven Behnke -Link -The human gait is a complex interplay between the neuronal and the muscular systems, reflecting an individual's neurological and physiological condition. This makes gait analysis a valuable tool for biomechanics and medical experts. Traditional observational gait analysis is cost-effective but lacks reliability and accuracy, while instrumented gait analysis, particularly using marker-based optical systems, provides accurate data but is expensive and time-consuming. In this paper, we introduce a novel markerless approach for gait analysis using a multi-camera setup with smart edge sensors to estimate 3D body poses without fiducial markers. We propose a Siamese embedding network with triplet loss calculation to identify individuals by their gait pattern. This network effectively maps gait sequences to an embedding space that enables clustering sequences from the same individual or activity closely together while separating those of different ones. Our results demonstrate the potential of the proposed system for efficient automated gait analysis in diverse real-world environments, facilitating a wide range of applications. +2024-11-15 +Evaluating the role of `Constitutions' for learning from AI feedback +Saskia Redgate, Andrew M. Bean, Adam Mahdi +Link +The growing capabilities of large language models (LLMs) have led to their use as substitutes for human feedback for training and assessing other LLMs. These methods often rely on `constitutions', written guidelines which a critic model uses to provide feedback and improve generations. We investigate how the choice of constitution affects feedback quality by using four different constitutions to improve patient-centered communication in medical interviews. In pairwise comparisons conducted by 215 human raters, we found that detailed constitutions led to better results regarding emotive qualities. However, none of the constitutions outperformed the baseline in learning more practically-oriented skills related to information gathering and provision. Our findings indicate that while detailed constitutions should be prioritised, there are possible limitations to the effectiveness of AI feedback as a reward signal in certain areas. -2024-11-14 -When Mamba Meets xLSTM: An Efficient and Precise Method with the XLSTM-VMUNet Model for Skin lesion Segmentation -Zhuoyi Fang, KeXuan Shi, Qiang Han -Link -Automatic melanoma segmentation is essential for early skin cancer detection, yet challenges arise from the heterogeneity of melanoma, as well as interfering factors like blurred boundaries, low contrast, and imaging artifacts. While numerous algorithms have been developed to address these issues, previous approaches have often overlooked the need to jointly capture spatial and sequential features within dermatological images. This limitation hampers segmentation accuracy, especially in cases with indistinct borders or structurally similar lesions. Additionally, previous models lacked both a global receptive field and high computational efficiency. In this work, we present the XLSTM-VMUNet Model, which jointly capture spatial and sequential features within derma-tological images successfully. XLSTM-VMUNet can not only specialize in extracting spatial features from images, focusing on the structural characteristics of skin lesions, but also enhance contextual understanding, allowing more effective handling of complex medical image structures. Experiment results on the ISIC2018 dataset demonstrate that XLSTM-VMUNet outperforms VMUNet by 1.25% on DSC and 2.07% on IoU, with faster convergence and consistently high segmentation perfor-mance. Our code of XLSTM-VMUNet is available at https://github.com/MrFang/xLSTM-VMUNet. +2024-11-15 +CoSAM: Self-Correcting SAM for Domain Generalization in 2D Medical Image Segmentation +Yihang Fu, Ziyang Chen, Yiwen Ye, Xingliang Lei, Zhisong Wang, Yong Xia +Link +Medical images often exhibit distribution shifts due to variations in imaging protocols and scanners across different medical centers. Domain Generalization (DG) methods aim to train models on source domains that can generalize to unseen target domains. Recently, the segment anything model (SAM) has demonstrated strong generalization capabilities due to its prompt-based design, and has gained significant attention in image segmentation tasks. Existing SAM-based approaches attempt to address the need for manual prompts by introducing prompt generators that automatically generate these prompts. However, we argue that auto-generated prompts may not be sufficiently accurate under distribution shifts, potentially leading to incorrect predictions that still require manual verification and correction by clinicians. To address this challenge, we propose a method for 2D medical image segmentation called Self-Correcting SAM (CoSAM). Our approach begins by generating coarse masks using SAM in a prompt-free manner, providing prior prompts for the subsequent stages, and eliminating the need for prompt generators. To automatically refine these coarse masks, we introduce a generalized error decoder that simulates the correction process typically performed by clinicians. Furthermore, we generate diverse prompts as feedback based on the corrected masks, which are used to iteratively refine the predictions within a self-correcting loop, enhancing the generalization performance of our model. Extensive experiments on two medical image segmentation benchmarks across multiple scenarios demonstrate the superiority of CoSAM over state-of-the-art SAM-based methods. -2024-11-14 -Time-to-Event Pretraining for 3D Medical Imaging -Zepeng Huo, Jason Alan Fries, Alejandro Lozano, Jeya Maria Jose Valanarasu, Ethan Steinberg, Louis Blankemeier, Akshay S. Chaudhari, Curtis Langlotz, Nigam H. Shah -Link -With the rise of medical foundation models and the growing availability of imaging data, scalable pretraining techniques offer a promising way to identify imaging biomarkers predictive of future disease risk. While current self-supervised methods for 3D medical imaging models capture local structural features like organ morphology, they fail to link pixel biomarkers with long-term health outcomes due to a missing context problem. Current approaches lack the temporal context necessary to identify biomarkers correlated with disease progression, as they rely on supervision derived only from images and concurrent text descriptions. To address this, we introduce time-to-event pretraining, a pretraining framework for 3D medical imaging models that leverages large-scale temporal supervision from paired, longitudinal electronic health records (EHRs). Using a dataset of 18,945 CT scans (4.2 million 2D images) and time-to-event distributions across thousands of EHR-derived tasks, our method improves outcome prediction, achieving an average AUROC increase of 23.7% and a 29.4% gain in Harrell's C-index across 8 benchmark tasks. Importantly, these gains are achieved without sacrificing diagnostic classification performance. This study lays the foundation for integrating longitudinal EHR and 3D imaging data to advance clinical risk prediction. +2024-11-15 +Two-step registration method boosts sensitivity in longitudinal fixel-based analyses +Aurélie Lebrun, Michel Bottlaender, Julien Lagarde, Marie Sarazin, Yann Leprince +Link +Longitudinal analyses are increasingly used in clinical studies as they allow the study of subtle changes over time within the same subjects. In most of these studies, it is necessary to align all the images studied to a common reference by registering them to a template. In the study of white matter using the recently developed fixel-based analysis (FBA) method, this registration is important, in particular because the fiber bundle cross-section metric is a direct measure of this registration. In the vast majority of longitudinal FBA studies described in the literature, sessions acquired for a same subject are directly independently registered to the template. However, it has been shown in T1-based morphometry that a 2-step registration through an intra-subject average can be advantageous in longitudinal analyses. In this work, we propose an implementation of this 2-step registration method in a typical longitudinal FBA aimed at investigating the evolution of white matter changes in Alzheimer's disease (AD). We compared at the fixel level the mean absolute effect and standard deviation yielded by this registration method and by a direct registration, as well as the results obtained with each registration method for the study of AD in both fixelwise and tract-based analyses. We found that the 2-step method reduced the variability of the measurements and thus enhanced statistical power in both types of analyses. diff --git a/data_store/papers_2024-11-19.json b/data_store/papers_2024-11-19.json new file mode 100644 index 0000000..116b767 --- /dev/null +++ b/data_store/papers_2024-11-19.json @@ -0,0 +1,516 @@ +{ + "Brain": { + "2411.10100v1": { + "title": "Multi-Task Adversarial Variational Autoencoder for Estimating Biological Brain Age with Multimodal Neuroimaging", + "url": "http://arxiv.org/abs/2411.10100v1", + "authors": "Muhammad Usman, Azka Rehman, Abdullah Shahid, Abd Ur Rehman, Sung-Min Gho, Aleum Lee, Tariq M. Khan, Imran Razzak", + "update_time": "2024-11-15", + "abstract": "Despite advances in deep learning for estimating brain age from structural MRI data, incorporating functional MRI data is challenging due to its complex structure and the noisy nature of functional connectivity measurements. To address this, we present the Multitask Adversarial Variational Autoencoder, a custom deep learning framework designed to improve brain age predictions through multimodal MRI data integration. This model separates latent variables into generic and unique codes, isolating shared and modality-specific features. By integrating multitask learning with sex classification as an additional task, the model captures sex-specific aging patterns. Evaluated on the OpenBHB dataset, a large multisite brain MRI collection, the model achieves a mean absolute error of 2.77 years, outperforming traditional methods. This success positions M-AVAE as a powerful tool for metaverse-based healthcare applications in brain age estimation." + }, + "2411.10084v1": { + "title": "Adapting the Biological SSVEP Response to Artificial Neural Networks", + "url": "http://arxiv.org/abs/2411.10084v1", + "authors": "Emirhan B\u00f6ge, Yasemin Gunindi, Erchan Aptoula, Nihan Alp, Huseyin Ozkan", + "update_time": "2024-11-15", + "abstract": "Neuron importance assessment is crucial for understanding the inner workings of artificial neural networks (ANNs) and improving their interpretability and efficiency. This paper introduces a novel approach to neuron significance assessment inspired by frequency tagging, a technique from neuroscience. By applying sinusoidal contrast modulation to image inputs and analyzing resulting neuron activations, this method enables fine-grained analysis of a network's decision-making processes. Experiments conducted with a convolutional neural network for image classification reveal notable harmonics and intermodulations in neuron-specific responses under part-based frequency tagging. These findings suggest that ANNs exhibit behavior akin to biological brains in tuning to flickering frequencies, thereby opening avenues for neuron/filter importance assessment through frequency tagging. The proposed method holds promise for applications in network pruning, and model interpretability, contributing to the advancement of explainable artificial intelligence and addressing the lack of transparency in neural networks. Future research directions include developing novel loss functions to encourage biologically plausible behavior in ANNs." + }, + "2411.09953v1": { + "title": "Brain-inspired Action Generation with Spiking Transformer Diffusion Policy Model", + "url": "http://arxiv.org/abs/2411.09953v1", + "authors": "Qianhao Wang, Yinqian Sun, Enmeng Lu, Qian Zhang, Yi Zeng", + "update_time": "2024-11-15", + "abstract": "Spiking Neural Networks (SNNs) has the ability to extract spatio-temporal features due to their spiking sequence. While previous research has primarily foucus on the classification of image and reinforcement learning. In our paper, we put forward novel diffusion policy model based on Spiking Transformer Neural Networks and Denoising Diffusion Probabilistic Model (DDPM): Spiking Transformer Modulate Diffusion Policy Model (STMDP), a new brain-inspired model for generating robot action trajectories. In order to improve the performance of this model, we develop a novel decoder module: Spiking Modulate De coder (SMD), which replaces the traditional Decoder module within the Transformer architecture. Additionally, we explored the substitution of DDPM with Denoising Diffusion Implicit Models (DDIM) in our frame work. We conducted experiments across four robotic manipulation tasks and performed ablation studies on the modulate block. Our model consistently outperforms existing Transformer-based diffusion policy method. Especially in Can task, we achieved an improvement of 8%. The proposed STMDP method integrates SNNs, dffusion model and Transformer architecture, which offers new perspectives and promising directions for exploration in brain-inspired robotics." + }, + "2411.09875v1": { + "title": "EEG Spectral Analysis in Gray Zone Between Healthy and Insomnia", + "url": "http://arxiv.org/abs/2411.09875v1", + "authors": "Ha-Na Jo, Young-Seok Kweon, Seo-Hyun Lee", + "update_time": "2024-11-15", + "abstract": "This study investigates the sleep characteristics and brain activity of individuals in the gray zone of insomnia, a population that experiences sleep disturbances yet does not fully meet the clinical criteria for chronic insomnia. Thirteen healthy participants and thirteen individuals from the gray zone were assessed using polysomnography and electroencephalogram to analyze both sleep architecture and neural activity. Although no significant differences in objective sleep quality or structure were found between the groups, gray zone individuals reported higher insomnia severity index scores, indicating subjective sleep difficulties. Electroencephalogram analysis revealed increased delta and alpha activity during the wake stage, suggesting lingering sleep inertia, while non-rapid eye movement stages 1 and 2 exhibited elevated beta and gamma activity, often associated with chronic insomnia. However, these high-frequency patterns were not observed in non-rapid eye movement stage 3 or rapid eye movement sleep, suggesting less severe disruptions compared to chronic insomnia. This study emphasizes that despite normal polysomnography findings, EEG patterns in gray zone individuals suggest a potential risk for chronic insomnia, highlighting the need for early identification and tailored intervention strategies to prevent progression." + }, + "2411.09822v1": { + "title": "A Self-Supervised Model for Multi-modal Stroke Risk Prediction", + "url": "http://arxiv.org/abs/2411.09822v1", + "authors": "Camille Delgrange, Olga Demler, Samia Mora, Bjoern Menze, Ezequiel de la Rosa, Neda Davoudi", + "update_time": "2024-11-14", + "abstract": "Predicting stroke risk is a complex challenge that can be enhanced by integrating diverse clinically available data modalities. This study introduces a self-supervised multimodal framework that combines 3D brain imaging, clinical data, and image-derived features to improve stroke risk prediction prior to onset. By leveraging large unannotated clinical datasets, the framework captures complementary and synergistic information across image and tabular data modalities. Our approach is based on a contrastive learning framework that couples contrastive language-image pretraining with an image-tabular matching module, to better align multimodal data representations in a shared latent space. The model is trained on the UK Biobank, which includes structural brain MRI and clinical data. We benchmark its performance against state-of-the-art unimodal and multimodal methods using tabular, image, and image-tabular combinations under diverse frozen and trainable model settings. The proposed model outperformed self-supervised tabular (image) methods by 2.6% (2.6%) in ROC-AUC and by 3.3% (5.6%) in balanced accuracy. Additionally, it showed a 7.6% increase in balanced accuracy compared to the best multimodal supervised model. Through interpretable tools, our approach demonstrated better integration of tabular and image data, providing richer and more aligned embeddings. Gradient-weighted Class Activation Mapping heatmaps further revealed activated brain regions commonly associated in the literature with brain aging, stroke risk, and clinical outcomes. This robust self-supervised multimodal framework surpasses state-of-the-art methods for stroke risk prediction and offers a strong foundation for future studies integrating diverse data modalities to advance clinical predictive modelling." + }, + "2411.09787v1": { + "title": "ART-Rx: A Proportional-Integral-Derivative (PID) Controlled Adaptive Real-Time Threshold Receiver for Molecular Communication", + "url": "http://arxiv.org/abs/2411.09787v1", + "authors": "Hongbin Ni, Ozgur B. Akan", + "update_time": "2024-11-14", + "abstract": "Molecular communication (MC) in microfluidic channels faces significant challenges in signal detection due to the stochastic nature of molecule propagation and dynamic, noisy environments. Conventional detection methods often struggle under varying channel conditions, leading to high bit error rates (BER) and reduced communication efficiency. This paper introduces ART-Rx, a novel Adaptive Real-Time Threshold Receiver for MC that addresses these challenges. Implemented within a conceptual system-on-chip (SoC), ART-Rx employs a Proportional-Integral-Derivative (PID) controller to dynamically adjust the detection threshold based on observed errors in real time. Comprehensive simulations using MATLAB and Smoldyn compare ART-Rx's performance against a statistically optimal detection threshold across various scenarios, including different levels of interference, concentration shift keying (CSK) levels, flow velocities, transmitter-receiver distances, diffusion coefficients, and binding rates. The results demonstrate that ART-Rx significantly outperforms conventional methods, maintaining consistently low BER and bit error probabilities (BEP) even in high-noise conditions and extreme channel environments. The system exhibits exceptional robustness to interference and shows the potential to enable higher data rates in CSK modulation. Furthermore, because ART-Rx is effectively adaptable to varying environmental conditions in microfluidic channels, it offers a computationally efficient and straightforward approach to enhance signal detection in nanoscale communication systems. This approach presents a promising control theory-based solution to improve the reliability of data transmission in practical MC systems, with potential applications in healthcare, brain-machine interfaces (BMI), and the Internet of Bio-Nano Things (IoBNT)." + }, + "2411.09593v1": { + "title": "SMILE-UHURA Challenge -- Small Vessel Segmentation at Mesoscopic Scale from Ultra-High Resolution 7T Magnetic Resonance Angiograms", + "url": "http://arxiv.org/abs/2411.09593v1", + "authors": "Soumick Chatterjee, Hendrik Mattern, Marc D\u00f6rner, Alessandro Sciarra, Florian Dubost, Hannes Schnurre, Rupali Khatun, Chun-Chih Yu, Tsung-Lin Hsieh, Yi-Shan Tsai, Yi-Zeng Fang, Yung-Ching Yang, Juinn-Dar Huang, Marshall Xu, Siyu Liu, Fernanda L. Ribeiro, Saskia Bollmann, Karthikesh Varma Chintalapati, Chethan Mysuru Radhakrishna, Sri Chandana Hudukula Ram Kumara, Raviteja Sutrave, Abdul Qayyum, Moona Mazher, Imran Razzak, Cristobal Rodero, Steven Niederren, Fengming Lin, Yan Xia, Jiacheng Wang, Riyu Qiu, Liansheng Wang, Arya Yazdan Panah, Rosana El Jurdi, Guanghui Fu, Janan Arslan, Ghislain Vaillant, Romain Valabregue, Didier Dormont, Bruno Stankoff, Olivier Colliot, Luisa Vargas, Isai Daniel Chac\u00f3n, Ioannis Pitsiorlas, Pablo Arbel\u00e1ez, Maria A. Zuluaga, Stefanie Schreiber, Oliver Speck, Andreas N\u00fcrnberger", + "update_time": "2024-11-14", + "abstract": "The human brain receives nutrients and oxygen through an intricate network of blood vessels. Pathology affecting small vessels, at the mesoscopic scale, represents a critical vulnerability within the cerebral blood supply and can lead to severe conditions, such as Cerebral Small Vessel Diseases. The advent of 7 Tesla MRI systems has enabled the acquisition of higher spatial resolution images, making it possible to visualise such vessels in the brain. However, the lack of publicly available annotated datasets has impeded the development of robust, machine learning-driven segmentation algorithms. To address this, the SMILE-UHURA challenge was organised. This challenge, held in conjunction with the ISBI 2023, in Cartagena de Indias, Colombia, aimed to provide a platform for researchers working on related topics. The SMILE-UHURA challenge addresses the gap in publicly available annotated datasets by providing an annotated dataset of Time-of-Flight angiography acquired with 7T MRI. This dataset was created through a combination of automated pre-segmentation and extensive manual refinement. In this manuscript, sixteen submitted methods and two baseline methods are compared both quantitatively and qualitatively on two different datasets: held-out test MRAs from the same dataset as the training data (with labels kept secret) and a separate 7T ToF MRA dataset where both input volumes and labels are kept secret. The results demonstrate that most of the submitted deep learning methods, trained on the provided training dataset, achieved reliable segmentation performance. Dice scores reached up to 0.838 $\\pm$ 0.066 and 0.716 $\\pm$ 0.125 on the respective datasets, with an average performance of up to 0.804 $\\pm$ 0.15." + }, + "2411.09567v1": { + "title": "VPBSD:Vessel-Pattern-Based Semi-Supervised Distillation for Efficient 3D Microscopic Cerebrovascular Segmentation", + "url": "http://arxiv.org/abs/2411.09567v1", + "authors": "Xi Lin, Shixuan Zhao, Xinxu Wei, Amir Shmuel, Yongjie Li", + "update_time": "2024-11-14", + "abstract": "3D microscopic cerebrovascular images are characterized by their high resolution, presenting significant annotation challenges, large data volumes, and intricate variations in detail. Together, these factors make achieving high-quality, efficient whole-brain segmentation particularly demanding. In this paper, we propose a novel Vessel-Pattern-Based Semi-Supervised Distillation pipeline (VpbSD) to address the challenges of 3D microscopic cerebrovascular segmentation. This pipeline initially constructs a vessel-pattern codebook that captures diverse vascular structures from unlabeled data during the teacher model's pretraining phase. In the knowledge distillation stage, the codebook facilitates the transfer of rich knowledge from a heterogeneous teacher model to a student model, while the semi-supervised approach further enhances the student model's exposure to diverse learning samples. Experimental results on real-world data, including comparisons with state-of-the-art methods and ablation studies, demonstrate that our pipeline and its individual components effectively address the challenges inherent in microscopic cerebrovascular segmentation." + }, + "2411.09723v1": { + "title": "Towards Neural Foundation Models for Vision: Aligning EEG, MEG, and fMRI Representations for Decoding, Encoding, and Modality Conversion", + "url": "http://arxiv.org/abs/2411.09723v1", + "authors": "Matteo Ferrante, Tommaso Boccato, Grigorii Rashkov, Nicola Toschi", + "update_time": "2024-11-14", + "abstract": "This paper presents a novel approach towards creating a foundational model for aligning neural data and visual stimuli across multimodal representationsof brain activity by leveraging contrastive learning. We used electroencephalography (EEG), magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI) data. Our framework's capabilities are demonstrated through three key experiments: decoding visual information from neural data, encoding images into neural representations, and converting between neural modalities. The results highlight the model's ability to accurately capture semantic information across different brain imaging techniques, illustrating its potential in decoding, encoding, and modality conversion tasks." + }, + "2411.09400v1": { + "title": "Imagined Speech and Visual Imagery as Intuitive Paradigms for Brain-Computer Interfaces", + "url": "http://arxiv.org/abs/2411.09400v1", + "authors": "Seo-Hyun Lee, Ji-Ha Park, Deok-Seon Kim", + "update_time": "2024-11-14", + "abstract": "Recent advancements in brain-computer interface (BCI) technology have emphasized the promise of imagined speech and visual imagery as effective paradigms for intuitive communication. This study investigates the classification performance and brain connectivity patterns associated with these paradigms, focusing on decoding accuracy across selected word classes. Sixteen participants engaged in tasks involving thirteen imagined speech and visual imagery classes, revealing above-chance classification accuracy for both paradigms. Variability in classification accuracy across individual classes highlights the influence of sensory and motor associations in imagined speech and vivid visual associations in visual imagery. Connectivity analysis further demonstrated increased functional connectivity in language-related and sensory regions for imagined speech, whereas visual imagery activated spatial and visual processing networks. These findings suggest the potential of imagined speech and visual imagery as an intuitive and scalable paradigm for BCI communication when selecting optimal word classes. Further exploration of the decoding outcomes for these two paradigms could provide insights for practical BCI communication." + } + }, + "EEG": { + "2411.10087v1": { + "title": "PFML: Self-Supervised Learning of Time-Series Data Without Representation Collapse", + "url": "http://arxiv.org/abs/2411.10087v1", + "authors": "Einari Vaaras, Manu Airaksinen, Okko R\u00e4s\u00e4nen", + "update_time": "2024-11-15", + "abstract": "Self-supervised learning (SSL) is a data-driven learning approach that utilizes the innate structure of the data to guide the learning process. In contrast to supervised learning, which depends on external labels, SSL utilizes the inherent characteristics of the data to produce its own supervisory signal. However, one frequent issue with SSL methods is representation collapse, where the model outputs a constant input-invariant feature representation. This issue hinders the potential application of SSL methods to new data modalities, as trying to avoid representation collapse wastes researchers' time and effort. This paper introduces a novel SSL algorithm for time-series data called Prediction of Functionals from Masked Latents (PFML). Instead of predicting masked input signals or their latent representations directly, PFML operates by predicting statistical functionals of the input signal corresponding to masked embeddings, given a sequence of unmasked embeddings. The algorithm is designed to avoid representation collapse, rendering it straightforwardly applicable to different time-series data domains, such as novel sensor modalities in clinical data. We demonstrate the effectiveness of PFML through complex, real-life classification tasks across three different data modalities: infant posture and movement classification from multi-sensor inertial measurement unit data, emotion recognition from speech data, and sleep stage classification from EEG data. The results show that PFML is superior to a conceptually similar pre-existing SSL method and competitive against the current state-of-the-art SSL method, while also being conceptually simpler and without suffering from representation collapse." + }, + "2411.09879v1": { + "title": "A Multi-Label EEG Dataset for Mental Attention State Classification in Online Learning", + "url": "http://arxiv.org/abs/2411.09879v1", + "authors": "Huan Liu, Yuzhe Zhang, Guanjian Liu, Xinxin Du, Haochong Wang, Dalin Zhang", + "update_time": "2024-11-15", + "abstract": "Attention is a vital cognitive process in the learning and memory environment, particularly in the context of online learning. Traditional methods for classifying attention states of online learners based on behavioral signals are prone to distortion, leading to increased interest in using electroencephalography (EEG) signals for authentic and accurate assessment. However, the field of attention state classification based on EEG signals in online learning faces challenges, including the scarcity of publicly available datasets, the lack of standardized data collection paradigms, and the requirement to consider the interplay between attention and other psychological states. In light of this, we present the Multi-label EEG dataset for classifying Mental Attention states (MEMA) in online learning. We meticulously designed a reliable and standard experimental paradigm with three attention states: neutral, relaxing, and concentrating, considering human physiological and psychological characteristics. This paradigm collected EEG signals from 20 subjects, each participating in 12 trials, resulting in 1,060 minutes of data. Emotional state labels, basic personal information, and personality traits were also collected to investigate the relationship between attention and other psychological states. Extensive quantitative and qualitative analysis, including a multi-label correlation study, validated the quality of the EEG attention data. The MEMA dataset and analysis provide valuable insights for advancing research on attention in online learning. The dataset is publicly available at \\url{https://github.com/GuanjianLiu/MEMA}." + }, + "2411.09875v1": { + "title": "EEG Spectral Analysis in Gray Zone Between Healthy and Insomnia", + "url": "http://arxiv.org/abs/2411.09875v1", + "authors": "Ha-Na Jo, Young-Seok Kweon, Seo-Hyun Lee", + "update_time": "2024-11-15", + "abstract": "This study investigates the sleep characteristics and brain activity of individuals in the gray zone of insomnia, a population that experiences sleep disturbances yet does not fully meet the clinical criteria for chronic insomnia. Thirteen healthy participants and thirteen individuals from the gray zone were assessed using polysomnography and electroencephalogram to analyze both sleep architecture and neural activity. Although no significant differences in objective sleep quality or structure were found between the groups, gray zone individuals reported higher insomnia severity index scores, indicating subjective sleep difficulties. Electroencephalogram analysis revealed increased delta and alpha activity during the wake stage, suggesting lingering sleep inertia, while non-rapid eye movement stages 1 and 2 exhibited elevated beta and gamma activity, often associated with chronic insomnia. However, these high-frequency patterns were not observed in non-rapid eye movement stage 3 or rapid eye movement sleep, suggesting less severe disruptions compared to chronic insomnia. This study emphasizes that despite normal polysomnography findings, EEG patterns in gray zone individuals suggest a potential risk for chronic insomnia, highlighting the need for early identification and tailored intervention strategies to prevent progression." + }, + "2411.09874v1": { + "title": "A Hybrid Artificial Intelligence System for Automated EEG Background Analysis and Report Generation", + "url": "http://arxiv.org/abs/2411.09874v1", + "authors": "Chin-Sung Tung, Sheng-Fu Liang, Shu-Feng Chang, Chung-Ping Young", + "update_time": "2024-11-15", + "abstract": "Electroencephalography (EEG) plays a crucial role in the diagnosis of various neurological disorders. However, small hospitals and clinics often lack advanced EEG signal analysis systems and are prone to misinterpretation in manual EEG reading. This study proposes an innovative hybrid artificial intelligence (AI) system for automatic interpretation of EEG background activity and report generation. The system combines deep learning models for posterior dominant rhythm (PDR) prediction, unsupervised artifact removal, and expert-designed algorithms for abnormality detection. For PDR prediction, 1530 labeled EEGs were used, and the best ensemble model achieved a mean absolute error (MAE) of 0.237, a root mean square error (RMSE) of 0.359, an accuracy of 91.8% within a 0.6Hz error, and an accuracy of 99% within a 1.2Hz error. The AI system significantly outperformed neurologists in detecting generalized background slowing (p = 0.02; F1: AI 0.93, neurologists 0.82) and demonstrated improved focal abnormality detection, although not statistically significant (p = 0.79; F1: AI 0.71, neurologists 0.55). Validation on both an internal dataset and the Temple University Abnormal EEG Corpus showed consistent performance (F1: 0.884 and 0.835, respectively; p = 0.66), demonstrating generalizability. The use of large language models (LLMs) for report generation demonstrated 100% accuracy, verified by three other independent LLMs. This hybrid AI system provides an easily scalable and accurate solution for EEG interpretation in resource-limited settings, assisting neurologists in improving diagnostic accuracy and reducing misdiagnosis rates.", + "code_url": "https://github.com/tcs211/ai_eeeg_report" + }, + "2411.09723v1": { + "title": "Towards Neural Foundation Models for Vision: Aligning EEG, MEG, and fMRI Representations for Decoding, Encoding, and Modality Conversion", + "url": "http://arxiv.org/abs/2411.09723v1", + "authors": "Matteo Ferrante, Tommaso Boccato, Grigorii Rashkov, Nicola Toschi", + "update_time": "2024-11-14", + "abstract": "This paper presents a novel approach towards creating a foundational model for aligning neural data and visual stimuli across multimodal representationsof brain activity by leveraging contrastive learning. We used electroencephalography (EEG), magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI) data. Our framework's capabilities are demonstrated through three key experiments: decoding visual information from neural data, encoding images into neural representations, and converting between neural modalities. The results highlight the model's ability to accurately capture semantic information across different brain imaging techniques, illustrating its potential in decoding, encoding, and modality conversion tasks." + }, + "2411.09302v1": { + "title": "EEG-Based Speech Decoding: A Novel Approach Using Multi-Kernel Ensemble Diffusion Models", + "url": "http://arxiv.org/abs/2411.09302v1", + "authors": "Soowon Kim, Ha-Na Jo, Eunyeong Ko", + "update_time": "2024-11-14", + "abstract": "In this study, we propose an ensemble learning framework for electroencephalogram-based overt speech classification, leveraging denoising diffusion probabilistic models with varying convolutional kernel sizes. The ensemble comprises three models with kernel sizes of 51, 101, and 201, effectively capturing multi-scale temporal features inherent in signals. This approach improves the robustness and accuracy of speech decoding by accommodating the rich temporal complexity of neural signals. The ensemble models work in conjunction with conditional autoencoders that refine the reconstructed signals and maximize the useful information for downstream classification tasks. The results indicate that the proposed ensemble-based approach significantly outperforms individual models and existing state-of-the-art techniques. These findings demonstrate the potential of ensemble methods in advancing brain signal decoding, offering new possibilities for non-verbal communication applications, particularly in brain-computer interface systems aimed at aiding individuals with speech impairments." + }, + "2411.09243v1": { + "title": "Towards Unified Neural Decoding of Perceived, Spoken and Imagined Speech from EEG Signals", + "url": "http://arxiv.org/abs/2411.09243v1", + "authors": "Jung-Sun Lee, Ha-Na Jo, Seo-Hyun Lee", + "update_time": "2024-11-14", + "abstract": "Brain signals accompany various information relevant to human actions and mental imagery, making them crucial to interpreting and understanding human intentions. Brain-computer interface technology leverages this brain activity to generate external commands for controlling the environment, offering critical advantages to individuals with paralysis or locked-in syndrome. Within the brain-computer interface domain, brain-to-speech research has gained attention, focusing on the direct synthesis of audible speech from brain signals. Most current studies decode speech from brain activity using invasive techniques and emphasize spoken speech data. However, humans express various speech states, and distinguishing these states through non-invasive approaches remains a significant yet challenging task. This research investigated the effectiveness of deep learning models for non-invasive-based neural signal decoding, with an emphasis on distinguishing between different speech paradigms, including perceived, overt, whispered, and imagined speech, across multiple frequency bands. The model utilizing the spatial conventional neural network module demonstrated superior performance compared to other models, especially in the gamma band. Additionally, imagined speech in the theta frequency band, where deep learning also showed strong effects, exhibited statistically significant differences compared to the other speech paradigms." + }, + "2411.09170v1": { + "title": "Towards Scalable Handwriting Communication via EEG Decoding and Latent Embedding Integration", + "url": "http://arxiv.org/abs/2411.09170v1", + "authors": "Jun-Young Kim, Deok-Seon Kim, Seo-Hyun Lee", + "update_time": "2024-11-14", + "abstract": "In recent years, brain-computer interfaces have made advances in decoding various motor-related tasks, including gesture recognition and movement classification, utilizing electroencephalogram (EEG) data. These developments are fundamental in exploring how neural signals can be interpreted to recognize specific physical actions. This study centers on a written alphabet classification task, where we aim to decode EEG signals associated with handwriting. To achieve this, we incorporate hand kinematics to guide the extraction of the consistent embeddings from high-dimensional neural recordings using auxiliary variables (CEBRA). These CEBRA embeddings, along with the EEG, are processed by a parallel convolutional neural network model that extracts features from both data sources simultaneously. The model classifies nine different handwritten characters, including symbols such as exclamation marks and commas, within the alphabet. We evaluate the model using a quantitative five-fold cross-validation approach and explore the structure of the embedding space through visualizations. Our approach achieves a classification accuracy of 91 % for the nine-class task, demonstrating the feasibility of fine-grained handwriting decoding from EEG." + }, + "2411.08521v1": { + "title": "SAD-TIME: a Spatiotemporal-fused network for depression detection with Automated multi-scale Depth-wise and TIME-interval-related common feature extractor", + "url": "http://arxiv.org/abs/2411.08521v1", + "authors": "Han-Guang Wang, Hui-Rang Hou, Li-Cheng Jin, Chen-Yang Xu, Zhong-Yi Zhang, Qing-Hao Meng", + "update_time": "2024-11-13", + "abstract": "Background and Objective: Depression is a severe mental disorder, and accurate diagnosis is pivotal to the cure and rehabilitation of people with depression. However, the current questionnaire-based diagnostic methods could bring subjective biases and may be denied by subjects. In search of a more objective means of diagnosis, researchers have begun to experiment with deep learning-based methods for identifying depressive disorders in recent years. Methods: In this study, a novel Spatiotemporal-fused network with Automated multi-scale Depth-wise and TIME-interval-related common feature extractor (SAD-TIME) is proposed. SAD-TIME incorporates an automated nodes' common features extractor (CFE), a spatial sector (SpS), a modified temporal sector (TeS), and a domain adversarial learner (DAL). The CFE includes a multi-scale depth-wise 1D-convolutional neural network and a time-interval embedding generator, where the unique information of each channel is preserved. The SpS fuses the functional connectivity with the distance-based connectivity containing spatial position of EEG electrodes. A multi-head-attention graph convolutional network is also applied in the SpS to fuse the features from different EEG channels. The TeS is based on long short-term memory and graph transformer networks, where the temporal information of different time-windows is fused. Moreover, the DAL is used after the SpS to obtain the domain-invariant feature. Results: Experimental results under tenfold cross-validation show that the proposed SAD-TIME method achieves 92.00% and 94.00% depression classification accuracies on two datasets, respectively, in cross-subject mode. Conclusion: SAD-TIME is a robust depression detection model, where the automatedly-generated features, the SpS and the TeS assist the classification performance with the fusion of the innate spatiotemporal information in the EEG signals." + }, + "2411.06928v1": { + "title": "Electroencephalogram-based Multi-class Decoding of Attended Speakers' Direction with Audio Spatial Spectrum", + "url": "http://arxiv.org/abs/2411.06928v1", + "authors": "Yuanming Zhang, Jing Lu, Zhibin Lin, Fei Chen, Haoliang Du, Xia Gao", + "update_time": "2024-11-11", + "abstract": "Decoding the directional focus of an attended speaker from listeners' electroencephalogram (EEG) signals is essential for developing brain-computer interfaces to improve the quality of life for individuals with hearing impairment. Previous works have concentrated on binary directional focus decoding, i.e., determining whether the attended speaker is on the left or right side of the listener. However, a more precise decoding of the exact direction of the attended speaker is necessary for effective speech processing. Additionally, audio spatial information has not been effectively leveraged, resulting in suboptimal decoding results. In this paper, we observe that, on our recently presented dataset with 15-class directional focus, models relying exclusively on EEG inputs exhibits significantly lower accuracy when decoding the directional focus in both leave-one-subject-out and leave-one-trial-out scenarios. By integrating audio spatial spectra with EEG features, the decoding accuracy can be effectively improved. We employ the CNN, LSM-CNN, and EEG-Deformer models to decode the directional focus from listeners' EEG signals with the auxiliary audio spatial spectra. The proposed Sp-Aux-Deformer model achieves notable 15-class decoding accuracies of 57.48% and 61.83% in leave-one-subject-out and leave-one-trial-out scenarios, respectively." + } + }, + "BCI": { + "2411.09400v1": { + "title": "Imagined Speech and Visual Imagery as Intuitive Paradigms for Brain-Computer Interfaces", + "url": "http://arxiv.org/abs/2411.09400v1", + "authors": "Seo-Hyun Lee, Ji-Ha Park, Deok-Seon Kim", + "update_time": "2024-11-14", + "abstract": "Recent advancements in brain-computer interface (BCI) technology have emphasized the promise of imagined speech and visual imagery as effective paradigms for intuitive communication. This study investigates the classification performance and brain connectivity patterns associated with these paradigms, focusing on decoding accuracy across selected word classes. Sixteen participants engaged in tasks involving thirteen imagined speech and visual imagery classes, revealing above-chance classification accuracy for both paradigms. Variability in classification accuracy across individual classes highlights the influence of sensory and motor associations in imagined speech and vivid visual associations in visual imagery. Connectivity analysis further demonstrated increased functional connectivity in language-related and sensory regions for imagined speech, whereas visual imagery activated spatial and visual processing networks. These findings suggest the potential of imagined speech and visual imagery as an intuitive and scalable paradigm for BCI communication when selecting optimal word classes. Further exploration of the decoding outcomes for these two paradigms could provide insights for practical BCI communication." + }, + "2411.07652v1": { + "title": "On the BCI Problem", + "url": "http://arxiv.org/abs/2411.07652v1", + "authors": "Ted Dobson, Gregory Robson", + "update_time": "2024-11-12", + "abstract": "Let $G$ be a group. The BCI problem asks whether two Haar graphs of $G$ are isomorphic if and only if they are isomorphic by an element of an explicit list of isomorphisms. We first generalize this problem in a natural way and give a theoretical way to solve the isomorphism problem for the natural generalization. We then restrict our attention to abelian groups and, with an exception, reduce the problem to the isomorphism problem for a related quotient, component, or corresponding Cayley digraph. For Haar graphs of an abelian group of odd order with connection sets $S$ those of Cayley graphs (i.e. $S = -S$), the exception does not exist. For Haar graphs of cyclic groups of odd order with connection sets those of a Cayley graph, among others, we solve the isomorphism problem." + }, + "2411.02094v1": { + "title": "Alignment-Based Adversarial Training (ABAT) for Improving the Robustness and Accuracy of EEG-Based BCIs", + "url": "http://arxiv.org/abs/2411.02094v1", + "authors": "Xiaoqing Chen, Ziwei Wang, Dongrui Wu", + "update_time": "2024-11-04", + "abstract": "Machine learning has achieved great success in electroencephalogram (EEG) based brain-computer interfaces (BCIs). Most existing BCI studies focused on improving the decoding accuracy, with only a few considering the adversarial security. Although many adversarial defense approaches have been proposed in other application domains such as computer vision, previous research showed that their direct extensions to BCIs degrade the classification accuracy on benign samples. This phenomenon greatly affects the applicability of adversarial defense approaches to EEG-based BCIs. To mitigate this problem, we propose alignment-based adversarial training (ABAT), which performs EEG data alignment before adversarial training. Data alignment aligns EEG trials from different domains to reduce their distribution discrepancies, and adversarial training further robustifies the classification boundary. The integration of data alignment and adversarial training can make the trained EEG classifiers simultaneously more accurate and more robust. Experiments on five EEG datasets from two different BCI paradigms (motor imagery classification, and event related potential recognition), three convolutional neural network classifiers (EEGNet, ShallowCNN and DeepCNN) and three different experimental settings (offline within-subject cross-block/-session classification, online cross-session classification, and pre-trained classifiers) demonstrated its effectiveness. It is very intriguing that adversarial attacks, which are usually used to damage BCI systems, can be used in ABAT to simultaneously improve the model accuracy and robustness.", + "code_url": "https://github.com/xqchen914/abat" + }, + "2410.23639v1": { + "title": "Biologically-Inspired Technologies: Integrating Brain-Computer Interface and Neuromorphic Computing for Human Digital Twins", + "url": "http://arxiv.org/abs/2410.23639v1", + "authors": "Chen Shang, Jiadong Yu, Dinh Thai Hoang", + "update_time": "2024-10-31", + "abstract": "The integration of the Metaverse into a human-centric ecosystem has intensified the need for sophisticated Human Digital Twins (HDTs) that are driven by the multifaceted human data. However, the effective construction of HDTs faces significant challenges due to the heterogeneity of data collection devices, the high energy demands associated with processing intricate data, and concerns over the privacy of sensitive information. This work introduces a novel biologically-inspired (bio-inspired) HDT framework that leverages Brain-Computer Interface (BCI) sensor technology to capture brain signals as the data source for constructing HDT. By collecting and analyzing these signals, the framework not only minimizes device heterogeneity and enhances data collection efficiency, but also provides richer and more nuanced physiological and psychological data for constructing personalized HDTs. To this end, we further propose a bio-inspired neuromorphic computing learning model based on the Spiking Neural Network (SNN). This model utilizes discrete neural spikes to emulate the way of human brain processes information, thereby enhancing the system's ability to process data effectively while reducing energy consumption. Additionally, we integrate a Federated Learning (FL) strategy within the model to strengthen data privacy. We then conduct a case study to demonstrate the performance of our proposed twofold bio-inspired scheme. Finally, we present several challenges and promising directions for future research of HDTs driven by bio-inspired technologies." + }, + "2411.09709v1": { + "title": "Feature Selection via Dynamic Graph-based Attention Block in MI-based EEG Signals", + "url": "http://arxiv.org/abs/2411.09709v1", + "authors": "Hyeon-Taek Han, Dae-Hyeok Lee, Heon-Gyu Kwak", + "update_time": "2024-10-31", + "abstract": "Brain-computer interface (BCI) technology enables direct interaction between humans and computers by analyzing brain signals. Electroencephalogram (EEG) is one of the non-invasive tools used in BCI systems, providing high temporal resolution for real-time applications. However, EEG signals are often affected by a low signal-to-noise ratio, physiological artifacts, and individual variability, representing challenges in extracting distinct features. Also, motor imagery (MI)-based EEG signals could contain features with low correlation to MI characteristics, which might cause the weights of the deep model to become biased towards those features. To address these problems, we proposed the end-to-end deep preprocessing method that effectively enhances MI characteristics while attenuating features with low correlation to MI characteristics. The proposed method consisted of the temporal, spatial, graph, and similarity blocks to preprocess MI-based EEG signals, aiming to extract more discriminative features and improve the robustness. We evaluated the proposed method using the public dataset 2a of BCI Competition IV to compare the performances when integrating the proposed method into the conventional models, including the DeepConvNet, the M-ShallowConvNet, and the EEGNet. The experimental results showed that the proposed method could achieve the improved performances and lead to more clustered feature distributions of MI tasks. Hence, we demonstrated that our proposed method could enhance discriminative features related to MI characteristics." + }, + "2411.05811v1": { + "title": "Neurophysiological Analysis in Motor and Sensory Cortices for Improving Motor Imagination", + "url": "http://arxiv.org/abs/2411.05811v1", + "authors": "Si-Hyun Kim, Sung-Jin Kim, Dae-Hyeok Lee", + "update_time": "2024-10-31", + "abstract": "Brain-computer interface (BCI) enables direct communication between the brain and external devices by decoding neural signals, offering potential solutions for individuals with motor impairments. This study explores the neural signatures of motor execution (ME) and motor imagery (MI) tasks using EEG signals, focusing on four conditions categorized as sense-related (hot and cold) and motor-related (pull and push) conditions. We conducted scalp topography analysis to examine activation patterns in the sensorimotor cortex, revealing distinct regional differences: sense--related conditions primarily activated the posterior region of the sensorimotor cortex, while motor--related conditions activated the anterior region of the sensorimotor cortex. These spatial distinctions align with neurophysiological principles, suggesting condition-specific functional subdivisions within the sensorimotor cortex. We further evaluated the performances of three neural network models-EEGNet, ShallowConvNet, and DeepConvNet-demonstrating that ME tasks achieved higher classification accuracies compared to MI tasks. Specifically, in sense-related conditions, the highest accuracy was observed in the cold condition. In motor-related conditions, the pull condition showed the highest performance, with DeepConvNet yielding the highest results. These findings provide insights into optimizing BCI applications by leveraging specific condition-induced neural activations." + }, + "2410.22008v1": { + "title": "Neurofeedback-Driven 6-DOF Robotic Arm: Integration of Brain-Computer Interface with Arduino for Advanced Control", + "url": "http://arxiv.org/abs/2410.22008v1", + "authors": "Ihab A. Satam, R\u00f3bert Szabolcsi", + "update_time": "2024-10-29", + "abstract": "Brain computer interface (BCI) applications in robotics are becoming more famous and famous. People with disabilities are facing a real-time problem of doing simple activities such as grasping, handshaking etc. in order to aid with this problem, the use of brain signals to control actuators is showing a great importance. The Emotive Insight, a Brain-Computer Interface (BCI) device, is utilized in this project to collect brain signals and transform them into commands for controlling a robotic arm using an Arduino controller. The Emotive Insight captures brain signals, which are subsequently analyzed using Emotive software and connected with Arduino code. The HITI Brain software integrates these devices, allowing for smooth communication between brain activity and the robotic arm. This system demonstrates how brain impulses may be utilized to control external devices directly. The results showed that the system is applicable efficiently to robotic arms and also for prosthetic arms with Multi Degree of Freedom. In addition to that, the system can be used for other actuators such as bikes, mobile robots, wheelchairs etc." + }, + "2411.09789v1": { + "title": "Can EEG resting state data benefit data-driven approaches for motor-imagery decoding?", + "url": "http://arxiv.org/abs/2411.09789v1", + "authors": "Rishan Mehta, Param Rajpura, Yogesh Kumar Meena", + "update_time": "2024-10-28", + "abstract": "Resting-state EEG data in neuroscience research serve as reliable markers for user identification and reveal individual-specific traits. Despite this, the use of resting-state data in EEG classification models is limited. In this work, we propose a feature concatenation approach to enhance decoding models' generalization by integrating resting-state EEG, aiming to improve motor imagery BCI performance and develop a user-generalized model. Using feature concatenation, we combine the EEGNet model, a standard convolutional neural network for EEG signal classification, with functional connectivity measures derived from resting-state EEG data. The findings suggest that although grounded in neuroscience with data-driven learning, the concatenation approach has limited benefits for generalizing models in within-user and across-user scenarios. While an improvement in mean accuracy for within-user scenarios is observed on two datasets, concatenation doesn't benefit across-user scenarios when compared with random data concatenation. The findings indicate the necessity of further investigation on the model interpretability and the effect of random data concatenation on model robustness." + }, + "2410.15161v1": { + "title": "Evaluation Of P300 Speller Performance Using Large Language Models Along With Cross-Subject Training", + "url": "http://arxiv.org/abs/2410.15161v1", + "authors": "Nithin Parthasarathy, James Soetedjo, Saarang Panchavati, Nitya Parthasarathy, Corey Arnold, Nader Pouratian, William Speier", + "update_time": "2024-10-19", + "abstract": "Amyotrophic lateral sclerosis (ALS), a progressive neuromuscular degenerative disease, severely restricts patient communication capacity within a few years of onset, resulting in a significant deterioration of quality of life. The P300 speller brain computer interface (BCI) offers an alternative communication medium by leveraging a subject's EEG response to characters traditionally highlighted on a character grid on a graphical user interface (GUI). A recurring theme in P300-based research is enhancing performance to enable faster subject interaction. This study builds on that theme by addressing key limitations, particularly in the training of multi-subject classifiers, and by integrating advanced language models to optimize stimuli presentation and word prediction, thereby improving communication efficiency. Furthermore, various advanced large language models such as Generative Pre-Trained Transformer (GPT2), BERT, and BART, alongside Dijkstra's algorithm, are utilized to optimize stimuli and provide word completion choices based on the spelling history. In addition, a multi-layered smoothing approach is applied to allow for out-of-vocabulary (OOV) words. By conducting extensive simulations based on randomly sampled EEG data from subjects, we show substantial speed improvements in typing passages that include rare and out-of-vocabulary (OOV) words, with the extent of improvement varying depending on the language model utilized. The gains through such character-level interface optimizations are approximately 10%, and GPT2 for multi-word prediction provides gains of around 40%. In particular, some large language models achieve performance levels within 10% of the theoretical performance limits established in this study. In addition, both within and across subjects, training techniques are explored, and speed improvements are shown to hold in both cases.", + "code_url": "https://github.com/nithinparthasarthy/P300Speller" + }, + "2410.12267v1": { + "title": "iFuzzyTL: Interpretable Fuzzy Transfer Learning for SSVEP BCI System", + "url": "http://arxiv.org/abs/2410.12267v1", + "authors": "Xiaowei Jiang, Beining Cao, Liang Ou, Yu-Cheng Chang, Thomas Do, Chin-Teng Lin", + "update_time": "2024-10-16", + "abstract": "The rapid evolution of Brain-Computer Interfaces (BCIs) has significantly influenced the domain of human-computer interaction, with Steady-State Visual Evoked Potentials (SSVEP) emerging as a notably robust paradigm. This study explores advanced classification techniques leveraging interpretable fuzzy transfer learning (iFuzzyTL) to enhance the adaptability and performance of SSVEP-based systems. Recent efforts have strengthened to reduce calibration requirements through innovative transfer learning approaches, which refine cross-subject generalizability and minimize calibration through strategic application of domain adaptation and few-shot learning strategies. Pioneering developments in deep learning also offer promising enhancements, facilitating robust domain adaptation and significantly improving system responsiveness and accuracy in SSVEP classification. However, these methods often require complex tuning and extensive data, limiting immediate applicability. iFuzzyTL introduces an adaptive framework that combines fuzzy logic principles with neural network architectures, focusing on efficient knowledge transfer and domain adaptation. iFuzzyTL refines input signal processing and classification in a human-interpretable format by integrating fuzzy inference systems and attention mechanisms. This approach bolsters the model's precision and aligns with real-world operational demands by effectively managing the inherent variability and uncertainty of EEG data. The model's efficacy is demonstrated across three datasets: 12JFPM (89.70% accuracy for 1s with an information transfer rate (ITR) of 149.58), Benchmark (85.81% accuracy for 1s with an ITR of 213.99), and eldBETA (76.50% accuracy for 1s with an ITR of 94.63), achieving state-of-the-art results and setting new benchmarks for SSVEP BCI performance." + } + }, + "fMRI": { + "2411.09723v1": { + "title": "Towards Neural Foundation Models for Vision: Aligning EEG, MEG, and fMRI Representations for Decoding, Encoding, and Modality Conversion", + "url": "http://arxiv.org/abs/2411.09723v1", + "authors": "Matteo Ferrante, Tommaso Boccato, Grigorii Rashkov, Nicola Toschi", + "update_time": "2024-11-14", + "abstract": "This paper presents a novel approach towards creating a foundational model for aligning neural data and visual stimuli across multimodal representationsof brain activity by leveraging contrastive learning. We used electroencephalography (EEG), magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI) data. Our framework's capabilities are demonstrated through three key experiments: decoding visual information from neural data, encoding images into neural representations, and converting between neural modalities. The results highlight the model's ability to accurately capture semantic information across different brain imaging techniques, illustrating its potential in decoding, encoding, and modality conversion tasks." + }, + "2411.09098v1": { + "title": "Interdependent scaling exponents in the human brain", + "url": "http://arxiv.org/abs/2411.09098v1", + "authors": "Daniel M. Castro, Ernesto P. Raposo, Mauro Copelli, Fernando A. N. Santos", + "update_time": "2024-11-14", + "abstract": "We apply the phenomenological renormalization group to resting-state fMRI time series of brain activity in a large population. By recursively coarse-graining the data, we compute scaling exponents for the series variance, log probability of silence, and largest covariance eigenvalue. The exponents clearly exhibit linear interdependencies, which we derive analytically in a mean-field approach. We find a significant correlation of exponent values with the gray matter volume and cognitive performance. Akin to scaling relations near critical points in thermodynamics, our findings suggest scaling interdependencies are intrinsic to brain organization and may also exist in other complex systems." + }, + "2411.08973v1": { + "title": "Somatosensory and motor contributions to emotion representation", + "url": "http://arxiv.org/abs/2411.08973v1", + "authors": "Marianne C. Reddan, Luke Chang, Philip Kragel, Tor D. Wager", + "update_time": "2024-11-13", + "abstract": "Emotion is often described as something people 'feel' in their bodies. Embodied emotion theorists propose that this connection is not purely linguistic; perceiving an emotion may require somatosensory and motor re-experiencing. However, it remains unclear whether self-reports of emotion-related bodily sensations (i.e., 'lump in my throat') are related to neural simulations of bodily action and sensation or whether they can be explained by cognitive appraisals or the visual features of socioemotional signals. To investigate this, participants (N = 21) were shown arousing emotional images that varied in valence, complexity, and content while undergoing fMRI scans. Participants then rated the images on a set of emotion appraisal scales and indicated where, on a body map, they experienced sensation in response to the image. To derive normative models of responses on these scales, a separate larger online sample online (N = 56 - 128) also rated these images. Representational similarity analysis (RSA) was used to compare the emotional content in the body maps with appraisals and visual features. A pairwise distance matrix between the body maps generated for each stimulus was then used in a whole brain voxel-wise searchlight analysis to identify brain regions which reflect the representational geometry of embodied emotion. This analysis revealed a network including bilateral primary somatosensory and motor cortices, precuneus, insula, and medial prefrontal cortex. The results of this study suggest that the relationship between emotion and the body is not purely conceptual: It is supported by sensorimotor cortical activations." + }, + "2411.08424v1": { + "title": "A Heterogeneous Graph Neural Network Fusing Functional and Structural Connectivity for MCI Diagnosis", + "url": "http://arxiv.org/abs/2411.08424v1", + "authors": "Feiyu Yin, Yu Lei, Siyuan Dai, Wenwen Zeng, Guoqing Wu, Liang Zhan, Jinhua Yu", + "update_time": "2024-11-13", + "abstract": "Brain connectivity alternations associated with brain disorders have been widely reported in resting-state functional imaging (rs-fMRI) and diffusion tensor imaging (DTI). While many dual-modal fusion methods based on graph neural networks (GNNs) have been proposed, they generally follow homogenous fusion ways ignoring rich heterogeneity of dual-modal information. To address this issue, we propose a novel method that integrates functional and structural connectivity based on heterogeneous graph neural networks (HGNNs) to better leverage the rich heterogeneity in dual-modal images. We firstly use blood oxygen level dependency and whiter matter structure information provided by rs-fMRI and DTI to establish homo-meta-path, capturing node relationships within the same modality. At the same time, we propose to establish hetero-meta-path based on structure-function coupling and brain community searching to capture relations among cross-modal nodes. Secondly, we further introduce a heterogeneous graph pooling strategy that automatically balances homo- and hetero-meta-path, effectively leveraging heterogeneous information and preventing feature confusion after pooling. Thirdly, based on the flexibility of heterogeneous graphs, we propose a heterogeneous graph data augmentation approach that can conveniently address the sample imbalance issue commonly seen in clinical diagnosis. We evaluate our method on ADNI-3 dataset for mild cognitive impairment (MCI) diagnosis. Experimental results indicate the proposed method is effective and superior to other algorithms, with a mean classification accuracy of 93.3%." + }, + "2411.07506v1": { + "title": "FM-TS: Flow Matching for Time Series Generation", + "url": "http://arxiv.org/abs/2411.07506v1", + "authors": "Yang Hu, Xiao Wang, Lirong Wu, Huatian Zhang, Stan Z. Li, Sheng Wang, Tianlong Chen", + "update_time": "2024-11-12", + "abstract": "Time series generation has emerged as an essential tool for analyzing temporal data across numerous fields. While diffusion models have recently gained significant attention in generating high-quality time series, they tend to be computationally demanding and reliant on complex stochastic processes. To address these limitations, we introduce FM-TS, a rectified Flow Matching-based framework for Time Series generation, which simplifies the time series generation process by directly optimizing continuous trajectories. This approach avoids the need for iterative sampling or complex noise schedules typically required in diffusion-based models. FM-TS is more efficient in terms of training and inference. Moreover, FM-TS is highly adaptive, supporting both conditional and unconditional time series generation. Notably, through our novel inference design, the model trained in an unconditional setting can seamlessly generalize to conditional tasks without the need for retraining. Extensive benchmarking across both settings demonstrates that FM-TS consistently delivers superior performance compared to existing approaches while being more efficient in terms of training and inference. For instance, in terms of discriminative score, FM-TS achieves 0.005, 0.019, 0.011, 0.005, 0.053, and 0.106 on the Sines, Stocks, ETTh, MuJoCo, Energy, and fMRI unconditional time series datasets, respectively, significantly outperforming the second-best method which achieves 0.006, 0.067, 0.061, 0.008, 0.122, and 0.167 on the same datasets. We have achieved superior performance in solar forecasting and MuJoCo imputation tasks, significantly enhanced by our innovative $t$ power sampling method. The code is available at https://github.com/UNITES-Lab/FMTS.", + "code_url": "https://github.com/unites-lab/fmts" + }, + "2411.07121v1": { + "title": "Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models", + "url": "http://arxiv.org/abs/2411.07121v1", + "authors": "Yanchen Wang, Adam Turnbull, Tiange Xiang, Yunlong Xu, Sa Zhou, Adnan Masoud, Shekoofeh Azizi, Feng Vankee Lin, Ehsan Adeli", + "update_time": "2024-11-11", + "abstract": "Neural decoding, the process of understanding how brain activity corresponds to different stimuli, has been a primary objective in cognitive sciences. Over the past three decades, advancements in functional Magnetic Resonance Imaging and machine learning have greatly improved our ability to map visual stimuli to brain activity, especially in the visual cortex. Concurrently, research has expanded into decoding more complex processes like language and memory across the whole brain, utilizing techniques to handle greater variability and improve signal accuracy. We argue that \"seeing\" involves more than just mapping visual stimuli onto the visual cortex; it engages the entire brain, as various emotions and cognitive states can emerge from observing different scenes. In this paper, we develop algorithms to enhance our understanding of visual processes by incorporating whole-brain activation maps while individuals are exposed to visual stimuli. We utilize large-scale fMRI encoders and Image generative models pre-trained on large public datasets, which are then fine-tuned through Image-fMRI contrastive learning. Our models hence can decode visual experience across the entire cerebral cortex, surpassing the traditional confines of the visual cortex. We first compare our method with state-of-the-art approaches to decoding visual processing and show improved predictive semantic accuracy by 43%. A network ablation analysis suggests that beyond the visual cortex, the default mode network contributes most to decoding stimuli, in line with the proposed role of this network in sense-making and semantic processing. Additionally, we implemented zero-shot imagination decoding on an extra validation dataset, achieving a p-value of 0.0206 for mapping the reconstructed images and ground-truth text stimuli, which substantiates the model's capability to capture semantic meanings across various scenarios.", + "code_url": "https://github.com/ppwangyc/wave" + }, + "2411.02949v1": { + "title": "A scalable generative model for dynamical system reconstruction from neuroimaging data", + "url": "http://arxiv.org/abs/2411.02949v1", + "authors": "Eric Volkmann, Alena Br\u00e4ndle, Daniel Durstewitz, Georgia Koppe", + "update_time": "2024-11-05", + "abstract": "Data-driven inference of the generative dynamics underlying a set of observed time series is of growing interest in machine learning and the natural sciences. In neuroscience, such methods promise to alleviate the need to handcraft models based on biophysical principles and allow to automatize the inference of inter-individual differences in brain dynamics. Recent breakthroughs in training techniques for state space models (SSMs) specifically geared toward dynamical systems (DS) reconstruction (DSR) enable to recover the underlying system including its geometrical (attractor) and long-term statistical invariants from even short time series. These techniques are based on control-theoretic ideas, like modern variants of teacher forcing (TF), to ensure stable loss gradient propagation while training. However, as it currently stands, these techniques are not directly applicable to data modalities where current observations depend on an entire history of previous states due to a signal's filtering properties, as common in neuroscience (and physiology more generally). Prominent examples are the blood oxygenation level dependent (BOLD) signal in functional magnetic resonance imaging (fMRI) or Ca$^{2+}$ imaging data. Such types of signals render the SSM's decoder model non-invertible, a requirement for previous TF-based methods. Here, exploiting the recent success of control techniques for training SSMs, we propose a novel algorithm that solves this problem and scales exceptionally well with model dimensionality and filter length. We demonstrate its efficiency in reconstructing dynamical systems, including their state space geometry and long-term temporal properties, from just short BOLD time series." + }, + "2411.01859v1": { + "title": "A Novel Deep Learning Tractography Fiber Clustering Framework for Functionally Consistent White Matter Parcellation Using Multimodal Diffusion MRI and Functional MRI", + "url": "http://arxiv.org/abs/2411.01859v1", + "authors": "Jin Wang, Bocheng Guo, Yijie Li, Junyi Wang, Yuqian Chen, Jarrett Rushmore, Nikos Makris, Yogesh Rathi, Lauren J O'Donnell, Fan Zhang", + "update_time": "2024-11-04", + "abstract": "Tractography fiber clustering using diffusion MRI (dMRI) is a crucial strategy for white matter (WM) parcellation. Current methods primarily use the geometric information of fibers (i.e., the spatial trajectories) to group similar fibers into clusters, overlooking the important functional signals present along the fiber tracts. There is increasing evidence that neural activity in the WM can be measured using functional MRI (fMRI), offering potentially valuable multimodal information for fiber clustering. In this paper, we develop a novel deep learning fiber clustering framework, namely Deep Multi-view Fiber Clustering (DMVFC), that uses joint dMRI and fMRI data to enable functionally consistent WM parcellation. DMVFC can effectively integrate the geometric characteristics of the WM fibers with the fMRI BOLD signals along the fiber tracts. It includes two major components: 1) a multi-view pretraining module to compute embedding features from fiber geometric information and functional signals separately, and 2) a collaborative fine-tuning module to simultaneously refine the two kinds of embeddings. In the experiments, we compare DMVFC with two state-of-the-art fiber clustering methods and demonstrate superior performance in achieving functionally meaningful and consistent WM parcellation results." + }, + "2411.01092v1": { + "title": "Cost efficiency of fMRI studies using resting-state vs task-based functional connectivity", + "url": "http://arxiv.org/abs/2411.01092v1", + "authors": "Xinzhi Zhang, Leslie A Hulvershorn, Todd Constable, Yize Zhao, Selena Wang", + "update_time": "2024-11-02", + "abstract": "We investigate whether and how we can improve the cost efficiency of neuroimaging studies with well-tailored fMRI tasks. The comparative study is conducted using a novel network science-driven Bayesian connectome-based predictive method, which incorporates network theories in model building and substantially improves precision and robustness in imaging biomarker detection. The robustness of the method lays the foundation for identifying predictive power differential across fMRI task conditions if such difference exists. When applied to a clinically heterogeneous transdiagnostic cohort, we found shared and distinct functional fingerprints of neuropsychological outcomes across seven fMRI conditions. For example, emotional N-back memory task was found to be less optimal for negative emotion outcomes, and gradual-onset continuous performance task was found to have stronger links with sensitivity and sociability outcomes than with cognitive control outcomes. Together, our results show that there are unique optimal pairings of task-based fMRI conditions and neuropsychological outcomes that should not be ignored when designing well-powered neuroimaging studies." + }, + "2411.00992v2": { + "title": "Correlation of Correlation Networks: High-Order Interactions in the Topology of Brain Networks", + "url": "http://arxiv.org/abs/2411.00992v2", + "authors": "Qiang Li, Jingyu Liu, Vince D. Calhoun", + "update_time": "2024-11-05", + "abstract": "To understand collective network behavior in the complex human brain, pairwise correlation networks alone are insufficient for capturing the high-order interactions that extend beyond pairwise interactions and play a crucial role in brain network dynamics. These interactions often reveal intricate relationships among multiple brain networks, significantly influencing cognitive processes. In this study, we explored the correlation of correlation networks and topological network analysis with resting-state fMRI to gain deeper insights into these higher-order interactions and their impact on the topology of brain networks, ultimately enhancing our understanding of brain function. We observed that the correlation of correlation networks highlighted network connections while preserving the topological structure of correlation networks. Our findings suggested that the correlation of correlation networks surpassed traditional correlation networks, showcasing considerable potential for applications in various areas of network science. Moreover, after applying topological network analysis to the correlation of correlation networks, we observed that some high-order interaction hubs predominantly occurred in primary and high-level cognitive areas, such as the visual and fronto-parietal regions. These high-order hubs played a crucial role in information integration within the human brain." + } + }, + "MEG": { + "2411.09723v1": { + "title": "Towards Neural Foundation Models for Vision: Aligning EEG, MEG, and fMRI Representations for Decoding, Encoding, and Modality Conversion", + "url": "http://arxiv.org/abs/2411.09723v1", + "authors": "Matteo Ferrante, Tommaso Boccato, Grigorii Rashkov, Nicola Toschi", + "update_time": "2024-11-14", + "abstract": "This paper presents a novel approach towards creating a foundational model for aligning neural data and visual stimuli across multimodal representationsof brain activity by leveraging contrastive learning. We used electroencephalography (EEG), magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI) data. Our framework's capabilities are demonstrated through three key experiments: decoding visual information from neural data, encoding images into neural representations, and converting between neural modalities. The results highlight the model's ability to accurately capture semantic information across different brain imaging techniques, illustrating its potential in decoding, encoding, and modality conversion tasks." + }, + "2411.07994v1": { + "title": "Search for the X17 particle in $^{7}\\mathrm{Li}(\\mathrm{p},\\mathrm{e}^+ \\mathrm{e}^{-}) ^{8}\\mathrm{Be}$ processes with the MEG II detector", + "url": "http://arxiv.org/abs/2411.07994v1", + "authors": "The MEG II collaboration, K. Afanaciev, A. M. Baldini, S. Ban, H. Benmansour, G. Boca, P. W. Cattaneo, G. Cavoto, F. Cei, M. Chiappini, A. Corvaglia, G. Dal Maso, A. De Bari, M. De Gerone, L. Ferrari Barusso, M. Francesconi, L. Galli, G. Gallucci, F. Gatti, L. Gerritzen, F. Grancagnolo, E. G. Grandoni, M. Grassi, D. N. Grigoriev, M. Hildebrandt, F. Ignatov, F. Ikeda, T. Iwamoto, S. Karpov, P. -R. Kettle, N. Khomutov, A. Kolesnikov, N. Kravchuk, V. Krylov, N. Kuchinskiy, F. Leonetti, W. Li, V. Malyshev, A. Matsushita, M. Meucci, S. Mihara, W. Molzon, T. Mori, D. Nicol\u00f2, H. Nishiguchi, A. Ochi, W. Ootani, A. Oya, D. Palo, M. Panareo, A. Papa, V. Pettinacci, A. Popov, F. Renga, S. Ritt, M. Rossella, A. Rozhdestvensky. S. Scarpellini, P. Schwendimann, G. Signorelli, M. Takahashi, Y. Uchiyama, A. Venturini, B. Vitali, C. Voena, K. Yamamoto, R. Yokota, T. Yonemoto", + "update_time": "2024-11-12", + "abstract": "The observation of a resonance structure in the opening angle of the electron-positron pairs in the $^{7}$Li(p,\\ee) $^{8}$Be reaction was claimed and interpreted as the production and subsequent decay of a hypothetical particle (X17). Similar excesses, consistent with this particle, were later observed in processes involving $^{4}$He and $^{12}$C nuclei with the same experimental technique. The MEG II apparatus at PSI, designed to search for the $\\mu^+ \\rightarrow \\mathrm{e}^+ \\gamma$ decay, can be exploited to investigate the existence of this particle and study its nature. Protons from a Cockroft-Walton accelerator, with an energy up to 1.1 MeV, were delivered on a dedicated Li-based target. The $\\gamma$ and the e$^{+}$e$^{-}$ pair emerging from the $^8\\mathrm{Be}^*$ transitions were studied with calorimeters and a spectrometer, featuring a broader angular acceptance than previous experiments. We present in this paper the analysis of a four-week data-taking in 2023 with a beam energy of 1080 keV, resulting in the excitation of two different resonances with Q-value \\SI{17.6}{\\mega\\electronvolt} and \\SI{18.1}{\\mega\\electronvolt}. No significant signal was found, and limits at \\SI{90}{\\percent} C.L. on the branching ratios (relative to the $\\gamma$ emission) of the two resonances to X17 were set, $R_{17.6} < 1.8 \\times 10^{-6} $ and $R_{18.1} < 1.2 \\times 10^{-5} $." + }, + "2411.03883v2": { + "title": "MEG: Medical Knowledge-Augmented Large Language Models for Question Answering", + "url": "http://arxiv.org/abs/2411.03883v2", + "authors": "Laura Cabello, Carmen Martin-Turrero, Uchenna Akujuobi, Anders S\u00f8gaard, Carlos Bobed", + "update_time": "2024-11-07", + "abstract": "Question answering is a natural language understanding task that involves reasoning over both explicit context and unstated, relevant domain knowledge. Large language models (LLMs), which underpin most contemporary question answering systems, struggle to induce how concepts relate in specialized domains such as medicine. Existing medical LLMs are also costly to train. In this work, we present MEG, a parameter-efficient approach for medical knowledge-augmented LLMs. MEG uses a lightweight mapping network to integrate graph embeddings into the LLM, enabling it to leverage external knowledge in a cost-effective way. We evaluate our method on four popular medical multiple-choice datasets and show that LLMs greatly benefit from the factual grounding provided by knowledge graph embeddings. MEG attains an average of +10.2% accuracy over the Mistral-Instruct baseline, and +6.7% over specialized models like BioMistral. We also show results based on Llama-3. Finally, we show that MEG's performance remains robust to the choice of graph encoder.", + "code_url": "https://github.com/lautel/meg" + }, + "2410.23386v1": { + "title": "STIED: A deep learning model for the SpatioTemporal detection of focal Interictal Epileptiform Discharges with MEG", + "url": "http://arxiv.org/abs/2410.23386v1", + "authors": "Raquel Fern\u00e1ndez-Mart\u00edn, Alfonso Gij\u00f3n, Odile Feys, Elodie Juven\u00e9, Alec Aeby, Charline Urbain, Xavier De Ti\u00e8ge, Vincent Wens", + "update_time": "2024-10-30", + "abstract": "Magnetoencephalography (MEG) allows the non-invasive detection of interictal epileptiform discharges (IEDs). Clinical MEG analysis in epileptic patients traditionally relies on the visual identification of IEDs, which is time consuming and partially subjective. Automatic, data-driven detection methods exist but show limited performance. Still, the rise of deep learning (DL)-with its ability to reproduce human-like abilities-could revolutionize clinical MEG practice. Here, we developed and validated STIED, a simple yet powerful supervised DL algorithm combining two convolutional neural networks with temporal (1D time-course) and spatial (2D topography) features of MEG signals inspired from current clinical guidelines. Our DL model enabled both temporal and spatial localization of IEDs in patients suffering from focal epilepsy with frequent and high amplitude spikes (FE group), with high-performance metrics-accuracy, specificity, and sensitivity all exceeding 85%-when learning from spatiotemporal features of IEDs. This performance can be attributed to our handling of input data, which mimics established clinical MEG practice. Reverse engineering further revealed that STIED encodes fine spatiotemporal features of IEDs rather than their mere amplitude. The model trained on the FE group also showed promising results when applied to a separate group of presurgical patients with different types of refractory focal epilepsy, though further work is needed to distinguish IEDs from physiological transients. This study paves the way of incorporating STIED and DL algorithms into the routine clinical MEG evaluation of epilepsy." + }, + "2410.20916v1": { + "title": "NeuGPT: Unified multi-modal Neural GPT", + "url": "http://arxiv.org/abs/2410.20916v1", + "authors": "Yiqian Yang, Yiqun Duan, Hyejeong Jo, Qiang Zhang, Renjing Xu, Oiwi Parker Jones, Xuming Hu, Chin-teng Lin, Hui Xiong", + "update_time": "2024-10-28", + "abstract": "This paper introduces NeuGPT, a groundbreaking multi-modal language generation model designed to harmonize the fragmented landscape of neural recording research. Traditionally, studies in the field have been compartmentalized by signal type, with EEG, MEG, ECoG, SEEG, fMRI, and fNIRS data being analyzed in isolation. Recognizing the untapped potential for cross-pollination and the adaptability of neural signals across varying experimental conditions, we set out to develop a unified model capable of interfacing with multiple modalities. Drawing inspiration from the success of pre-trained large models in NLP, computer vision, and speech processing, NeuGPT is architected to process a diverse array of neural recordings and interact with speech and text data. Our model mainly focus on brain-to-text decoding, improving SOTA from 6.94 to 12.92 on BLEU-1 and 6.93 to 13.06 on ROUGE-1F. It can also simulate brain signals, thereby serving as a novel neural interface. Code is available at \\href{https://github.com/NeuSpeech/NeuGPT}{NeuSpeech/NeuGPT (https://github.com/NeuSpeech/NeuGPT) .}", + "code_url": "https://github.com/neuspeech/neugpt" + }, + "2410.19986v1": { + "title": "Resolving Domain Shift For Representations Of Speech In Non-Invasive Brain Recordings", + "url": "http://arxiv.org/abs/2410.19986v1", + "authors": "Jeremiah Ridge, Oiwi Parker Jones", + "update_time": "2024-10-25", + "abstract": "Machine learning techniques have enabled researchers to leverage neuroimaging data to decode speech from brain activity, with some amazing recent successes achieved by applications built using invasive devices. However, research requiring surgical implants has a number of practical limitations. Non-invasive neuroimaging techniques provide an alternative but come with their own set of challenges, the limited scale of individual studies being among them. Without the ability to pool the recordings from different non-invasive studies, data on the order of magnitude needed to leverage deep learning techniques to their full potential remains out of reach. In this work, we focus on non-invasive data collected using magnetoencephalography (MEG). We leverage two different, leading speech decoding models to investigate how an adversarial domain adaptation framework augments their ability to generalize across datasets. We successfully improve the performance of both models when training across multiple datasets. To the best of our knowledge, this study is the first ever application of feature-level, deep learning based harmonization for MEG neuroimaging data. Our analysis additionally offers further evidence of the impact of demographic features on neuroimaging data, demonstrating that participant age strongly affects how machine learning models solve speech decoding tasks using MEG data. Lastly, in the course of this study we produce a new open-source implementation of one of these models to the benefit of the broader scientific community." + }, + "2410.19838v1": { + "title": "Non-invasive Neural Decoding in Source Reconstructed Brain Space", + "url": "http://arxiv.org/abs/2410.19838v1", + "authors": "Yonatan Gideoni, Ryan Charles Timms, Oiwi Parker Jones", + "update_time": "2024-10-20", + "abstract": "Non-invasive brainwave decoding is usually done using Magneto/Electroencephalography (MEG/EEG) sensor measurements as inputs. This makes combining datasets and building models with inductive biases difficult as most datasets use different scanners and the sensor arrays have a nonintuitive spatial structure. In contrast, fMRI scans are acquired directly in brain space, a voxel grid with a typical structured input representation. By using established techniques to reconstruct the sensors' sources' neural activity it is possible to decode from voxels for MEG data as well. We show that this enables spatial inductive biases, spatial data augmentations, better interpretability, zero-shot generalisation between datasets, and data harmonisation." + }, + "2410.14971v1": { + "title": "BrainECHO: Semantic Brain Signal Decoding through Vector-Quantized Spectrogram Reconstruction for Whisper-Enhanced Text Generation", + "url": "http://arxiv.org/abs/2410.14971v1", + "authors": "Jilong Li, Zhenxi Song, Jiaqi Wang, Min Zhang, Zhiguo Zhang", + "update_time": "2024-10-19", + "abstract": "Recent advances in decoding language from brain signals (EEG and MEG) have been significantly driven by pre-trained language models, leading to remarkable progress on publicly available non-invasive EEG/MEG datasets. However, previous works predominantly utilize teacher forcing during text generation, leading to significant performance drops without its use. A fundamental issue is the inability to establish a unified feature space correlating textual data with the corresponding evoked brain signals. Although some recent studies attempt to mitigate this gap using an audio-text pre-trained model, Whisper, which is favored for its signal input modality, they still largely overlook the inherent differences between audio signals and brain signals in directly applying Whisper to decode brain signals. To address these limitations, we propose a new multi-stage strategy for semantic brain signal decoding via vEctor-quantized speCtrogram reconstruction for WHisper-enhanced text generatiOn, termed BrainECHO. Specifically, BrainECHO successively conducts: 1) Discrete autoencoding of the audio spectrogram; 2) Brain-audio latent space alignment; and 3) Semantic text generation via Whisper finetuning. Through this autoencoding--alignment--finetuning process, BrainECHO outperforms state-of-the-art methods under the same data split settings on two widely accepted resources: the EEG dataset (Brennan) and the MEG dataset (GWilliams). The innovation of BrainECHO, coupled with its robustness and superiority at the sentence, session, and subject-independent levels across public datasets, underscores its significance for language-based brain-computer interfaces." + }, + "2410.08718v1": { + "title": "Determining sensor geometry and gain in a wearable MEG system", + "url": "http://arxiv.org/abs/2410.08718v1", + "authors": "Ryan M. Hill, Gonzalo Reina Rivero, Ashley J. Tyler, Holly Schofield, Cody Doyle, James Osborne, David Bobela, Lukas Rier, Joseph Gibson, Zoe Tanner, Elena Boto, Richard Bowtell, Matthew J. Brookes, Vishal Shah, Niall Holmes", + "update_time": "2024-10-11", + "abstract": "Optically pumped magnetometers (OPMs) are compact and lightweight sensors that can measure magnetic fields generated by current flow in neuronal assemblies in the brain. Such sensors enable construction of magnetoencephalography (MEG) instrumentation, with significant advantages over conventional MEG devices including adaptability to head size, enhanced movement tolerance, lower complexity and improved data quality. However, realising the potential of OPMs depends on our ability to perform system calibration, which means finding sensor locations, orientations, and the relationship between the sensor output and magnetic field (termed sensor gain). Such calibration is complex in OPMMEG since, for example, OPM placement can change from subject to subject (unlike in conventional MEG where sensor locations or orientations are fixed). Here, we present two methods for calibration, both based on generating well-characterised magnetic fields across a sensor array. Our first device (the HALO) is a head mounted system that generates dipole like fields from a set of coils. Our second (the matrix coil (MC)) generates fields using coils embedded in the walls of a magnetically shielded room. Our results show that both methods offer an accurate means to calibrate an OPM array (e.g. sensor locations within 2 mm of the ground truth) and that the calibrations produced by the two methods agree strongly with each other. When applied to data from human MEG experiments, both methods offer improved signal to noise ratio after beamforming suggesting that they give calibration parameters closer to the ground truth than factory settings and presumed physical sensor coordinates and orientations. Both techniques are practical and easy to integrate into real world MEG applications. This advances the field significantly closer to the routine use of OPMs for MEG recording." + }, + "2410.03191v2": { + "title": "Nested Deep Learning Model Towards A Foundation Model for Brain Signal Data", + "url": "http://arxiv.org/abs/2410.03191v2", + "authors": "Fangyi Wei, Jiajie Mo, Kai Zhang, Haipeng Shen, Srikantan Nagarajan, Fei Jiang", + "update_time": "2024-10-09", + "abstract": "Epilepsy affects over 50 million people globally, with EEG/MEG-based spike detection playing a crucial role in diagnosis and treatment. Manual spike identification is time-consuming and requires specialized training, limiting the number of professionals available to analyze EEG/MEG data. To address this, various algorithmic approaches have been developed. However, current methods face challenges in handling varying channel configurations and in identifying the specific channels where spikes originate. This paper introduces a novel Nested Deep Learning (NDL) framework designed to overcome these limitations. NDL applies a weighted combination of signals across all channels, ensuring adaptability to different channel setups, and allows clinicians to identify key channels more accurately. Through theoretical analysis and empirical validation on real EEG/MEG datasets, NDL demonstrates superior accuracy in spike detection and channel localization compared to traditional methods. The results show that NDL improves prediction accuracy, supports cross-modality data integration, and can be fine-tuned for various neurophysiological applications." + } + }, + "neuroAI": { + "2410.19315v1": { + "title": "A prescriptive theory for brain-like inference", + "url": "http://arxiv.org/abs/2410.19315v1", + "authors": "Hadi Vafaii, Dekel Galor, Jacob L. Yates", + "update_time": "2024-10-25", + "abstract": "The Evidence Lower Bound (ELBO) is a widely used objective for training deep generative models, such as Variational Autoencoders (VAEs). In the neuroscience literature, an identical objective is known as the variational free energy, hinting at a potential unified framework for brain function and machine learning. Despite its utility in interpreting generative models, including diffusion models, ELBO maximization is often seen as too broad to offer prescriptive guidance for specific architectures in neuroscience or machine learning. In this work, we show that maximizing ELBO under Poisson assumptions for general sequence data leads to a spiking neural network that performs Bayesian posterior inference through its membrane potential dynamics. The resulting model, the iterative Poisson VAE (iP-VAE), has a closer connection to biological neurons than previous brain-inspired predictive coding models based on Gaussian assumptions. Compared to amortized and iterative VAEs, iP-VAElearns sparser representations and exhibits superior generalization to out-of-distribution samples. These findings suggest that optimizing ELBO, combined with Poisson assumptions, provides a solid foundation for developing prescriptive theories in NeuroAI." + }, + "2409.05771v1": { + "title": "Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models", + "url": "http://arxiv.org/abs/2409.05771v1", + "authors": "Emily Cheng, Richard J. Antonello", + "update_time": "2024-09-09", + "abstract": "Research has repeatedly demonstrated that intermediate hidden states extracted from large language models are able to predict measured brain response to natural language stimuli. Yet, very little is known about the representation properties that enable this high prediction performance. Why is it the intermediate layers, and not the output layers, that are most capable for this unique and highly general transfer task? In this work, we show that evidence from language encoding models in fMRI supports the existence of a two-phase abstraction process within LLMs. We use manifold learning methods to show that this abstraction process naturally arises over the course of training a language model and that the first \"composition\" phase of this abstraction process is compressed into fewer layers as training continues. Finally, we demonstrate a strong correspondence between layerwise encoding performance and the intrinsic dimensionality of representations from LLMs. We give initial evidence that this correspondence primarily derives from the inherent compositionality of LLMs and not their next-word prediction properties." + }, + "2407.04117v2": { + "title": "Predictive Coding Networks and Inference Learning: Tutorial and Survey", + "url": "http://arxiv.org/abs/2407.04117v2", + "authors": "Bj\u00f6rn van Zwol, Ro Jefferson, Egon L. van den Broek", + "update_time": "2024-07-22", + "abstract": "Recent years have witnessed a growing call for renewed emphasis on neuroscience-inspired approaches in artificial intelligence research, under the banner of NeuroAI. A prime example of this is predictive coding networks (PCNs), based on the neuroscientific framework of predictive coding. This framework views the brain as a hierarchical Bayesian inference model that minimizes prediction errors through feedback connections. Unlike traditional neural networks trained with backpropagation (BP), PCNs utilize inference learning (IL), a more biologically plausible algorithm that explains patterns of neural activity that BP cannot. Historically, IL has been more computationally intensive, but recent advancements have demonstrated that it can achieve higher efficiency than BP with sufficient parallelization. Furthermore, PCNs can be mathematically considered a superset of traditional feedforward neural networks (FNNs), significantly extending the range of trainable architectures. As inherently probabilistic (graphical) latent variable models, PCNs provide a versatile framework for both supervised learning and unsupervised (generative) modeling that goes beyond traditional artificial neural networks. This work provides a comprehensive review and detailed formal specification of PCNs, particularly situating them within the context of modern ML methods. Additionally, we introduce a Python library (PRECO) for practical implementation. This positions PC as a promising framework for future ML innovations." + }, + "2306.10168v3": { + "title": "Beyond Geometry: Comparing the Temporal Structure of Computation in Neural Circuits with Dynamical Similarity Analysis", + "url": "http://arxiv.org/abs/2306.10168v3", + "authors": "Mitchell Ostrow, Adam Eisen, Leo Kozachkov, Ila Fiete", + "update_time": "2023-10-29", + "abstract": "How can we tell whether two neural networks utilize the same internal processes for a particular computation? This question is pertinent for multiple subfields of neuroscience and machine learning, including neuroAI, mechanistic interpretability, and brain-machine interfaces. Standard approaches for comparing neural networks focus on the spatial geometry of latent states. Yet in recurrent networks, computations are implemented at the level of dynamics, and two networks performing the same computation with equivalent dynamics need not exhibit the same geometry. To bridge this gap, we introduce a novel similarity metric that compares two systems at the level of their dynamics, called Dynamical Similarity Analysis (DSA). Our method incorporates two components: Using recent advances in data-driven dynamical systems theory, we learn a high-dimensional linear system that accurately captures core features of the original nonlinear dynamics. Next, we compare different systems passed through this embedding using a novel extension of Procrustes Analysis that accounts for how vector fields change under orthogonal transformation. In four case studies, we demonstrate that our method disentangles conjugate and non-conjugate recurrent neural networks (RNNs), while geometric methods fall short. We additionally show that our method can distinguish learning rules in an unsupervised manner. Our method opens the door to comparative analyses of the essential temporal structure of computation in neural circuits.", + "code_url": "https://github.com/mitchellostrow/dsa" + }, + "2305.11275v2": { + "title": "Explaining V1 Properties with a Biologically Constrained Deep Learning Architecture", + "url": "http://arxiv.org/abs/2305.11275v2", + "authors": "Galen Pogoncheff, Jacob Granley, Michael Beyeler", + "update_time": "2023-05-25", + "abstract": "Convolutional neural networks (CNNs) have recently emerged as promising models of the ventral visual stream, despite their lack of biological specificity. While current state-of-the-art models of the primary visual cortex (V1) have surfaced from training with adversarial examples and extensively augmented data, these models are still unable to explain key neural properties observed in V1 that arise from biological circuitry. To address this gap, we systematically incorporated neuroscience-derived architectural components into CNNs to identify a set of mechanisms and architectures that comprehensively explain neural activity in V1. We show drastic improvements in model-V1 alignment driven by the integration of architectural components that simulate center-surround antagonism, local receptive fields, tuned normalization, and cortical magnification. Upon enhancing task-driven CNNs with a collection of these specialized components, we uncover models with latent representations that yield state-of-the-art explanation of V1 neural activity and tuning properties. Our results highlight an important advancement in the field of NeuroAI, as we systematically establish a set of architectural components that contribute to unprecedented explanation of V1. The neuroscience insights that could be gleaned from increasingly accurate in-silico models of the brain have the potential to greatly advance the fields of both neuroscience and artificial intelligence." + }, + "2302.07243v4": { + "title": "A Deep Probabilistic Spatiotemporal Framework for Dynamic Graph Representation Learning with Application to Brain Disorder Identification", + "url": "http://arxiv.org/abs/2302.07243v4", + "authors": "Sin-Yee Yap, Junn Yong Loo, Chee-Ming Ting, Fuad Noman, Raphael C. -W. Phan, Adeel Razi, David L. Dowe", + "update_time": "2024-11-09", + "abstract": "Recent applications of pattern recognition techniques on brain connectome classification using functional connectivity (FC) are shifting towards acknowledging the non-Euclidean topology and dynamic aspects of brain connectivity across time. In this paper, a deep spatiotemporal variational Bayes (DSVB) framework is proposed to learn time-varying topological structures in dynamic FC networks for identifying autism spectrum disorder (ASD) in human participants. The framework incorporates a spatial-aware recurrent neural network with an attention-based message passing scheme to capture rich spatiotemporal patterns across dynamic FC networks. To overcome model overfitting on limited training datasets, an adversarial training strategy is introduced to learn graph embedding models that generalize well to unseen brain networks. Evaluation on the ABIDE resting-state functional magnetic resonance imaging dataset shows that our proposed framework substantially outperforms state-of-the-art methods in identifying patients with ASD. Dynamic FC analyses with DSVB-learned embeddings reveal apparent group differences between ASD and healthy controls in brain network connectivity patterns and switching dynamics of brain states. The code is available at https://github.com/Monash-NeuroAI/Deep-Spatiotemporal-Variational-Bayes.", + "code_url": "https://github.com/Monash-NeuroAI/Deep-Spatiotemporal-Variational-Bayes" + }, + "2301.09245v2": { + "title": "Towards NeuroAI: Introducing Neuronal Diversity into Artificial Neural Networks", + "url": "http://arxiv.org/abs/2301.09245v2", + "authors": "Feng-Lei Fan, Yingxin Li, Hanchuan Peng, Tieyong Zeng, Fei Wang", + "update_time": "2023-03-11", + "abstract": "Throughout history, the development of artificial intelligence, particularly artificial neural networks, has been open to and constantly inspired by the increasingly deepened understanding of the brain, such as the inspiration of neocognitron, which is the pioneering work of convolutional neural networks. Per the motives of the emerging field: NeuroAI, a great amount of neuroscience knowledge can help catalyze the next generation of AI by endowing a network with more powerful capabilities. As we know, the human brain has numerous morphologically and functionally different neurons, while artificial neural networks are almost exclusively built on a single neuron type. In the human brain, neuronal diversity is an enabling factor for all kinds of biological intelligent behaviors. Since an artificial network is a miniature of the human brain, introducing neuronal diversity should be valuable in terms of addressing those essential problems of artificial networks such as efficiency, interpretability, and memory. In this Primer, we first discuss the preliminaries of biological neuronal diversity and the characteristics of information transmission and processing in a biological neuron. Then, we review studies of designing new neurons for artificial networks. Next, we discuss what gains can neuronal diversity bring into artificial networks and exemplary applications in several important fields. Lastly, we discuss the challenges and future directions of neuronal diversity to explore the potential of NeuroAI." + }, + "2212.04401v1": { + "title": "A Rubric for Human-like Agents and NeuroAI", + "url": "http://arxiv.org/abs/2212.04401v1", + "authors": "Ida Momennejad", + "update_time": "2022-12-08", + "abstract": "Researchers across cognitive, neuro-, and computer sciences increasingly reference human-like artificial intelligence and neuroAI. However, the scope and use of the terms are often inconsistent. Contributed research ranges widely from mimicking behaviour, to testing machine learning methods as neurally plausible hypotheses at the cellular or functional levels, or solving engineering problems. However, it cannot be assumed nor expected that progress on one of these three goals will automatically translate to progress in others. Here a simple rubric is proposed to clarify the scope of individual contributions, grounded in their commitments to human-like behaviour, neural plausibility, or benchmark/engineering goals. This is clarified using examples of weak and strong neuroAI and human-like agents, and discussing the generative, corroborate, and corrective ways in which the three dimensions interact with one another. The author maintains that future progress in artificial intelligence will need strong interactions across the disciplines, with iterative feedback loops and meticulous validity tests, leading to both known and yet-unknown advances that may span decades to come." + }, + "2210.08340v3": { + "title": "Toward Next-Generation Artificial Intelligence: Catalyzing the NeuroAI Revolution", + "url": "http://arxiv.org/abs/2210.08340v3", + "authors": "Anthony Zador, Sean Escola, Blake Richards, Bence \u00d6lveczky, Yoshua Bengio, Kwabena Boahen, Matthew Botvinick, Dmitri Chklovskii, Anne Churchland, Claudia Clopath, James DiCarlo, Surya Ganguli, Jeff Hawkins, Konrad Koerding, Alexei Koulakov, Yann LeCun, Timothy Lillicrap, Adam Marblestone, Bruno Olshausen, Alexandre Pouget, Cristina Savin, Terrence Sejnowski, Eero Simoncelli, Sara Solla, David Sussillo, Andreas S. Tolias, Doris Tsao", + "update_time": "2023-02-22", + "abstract": "Neuroscience has long been an essential driver of progress in artificial intelligence (AI). We propose that to accelerate progress in AI, we must invest in fundamental research in NeuroAI. A core component of this is the embodied Turing test, which challenges AI animal models to interact with the sensorimotor world at skill levels akin to their living counterparts. The embodied Turing test shifts the focus from those capabilities like game playing and language that are especially well-developed or uniquely human to those capabilities, inherited from over 500 million years of evolution, that are shared with all animals. Building models that can pass the embodied Turing test will provide a roadmap for the next generation of AI." + }, + "2112.15459v3": { + "title": "Social Neuro AI: Social Interaction as the \"dark matter\" of AI", + "url": "http://arxiv.org/abs/2112.15459v3", + "authors": "Samuele Bolotta, Guillaume Dumas", + "update_time": "2022-04-11", + "abstract": "This article introduces a three-axis framework indicating how AI can be informed by biological examples of social learning mechanisms. We argue that the complex human cognitive architecture owes a large portion of its expressive power to its ability to engage in social and cultural learning. However, the field of AI has mostly embraced a solipsistic perspective on intelligence. We thus argue that social interactions not only are largely unexplored in this field but also are an essential element of advanced cognitive ability, and therefore constitute metaphorically the dark matter of AI. In the first section, we discuss how social learning plays a key role in the development of intelligence. We do so by discussing social and cultural learning theories and empirical findings from social neuroscience. Then, we discuss three lines of research that fall under the umbrella of Social NeuroAI and can contribute to developing socially intelligent embodied agents in complex environments. First, neuroscientific theories of cognitive architecture, such as the global workspace theory and the attention schema theory, can enhance biological plausibility and help us understand how we could bridge individual and social theories of intelligence. Second, intelligence occurs in time as opposed to over time, and this is naturally incorporated by dynamical systems. Third, embodiment has been demonstrated to provide more sophisticated array of communicative signals. To conclude, we discuss the example of active inference, which offers powerful insights for developing agents that possess biological realism, can self-organize in time, and are socially embodied." + } + }, + "medical": { + "2411.10383v1": { + "title": "Framework for Co-distillation Driven Federated Learning to Address Class Imbalance in Healthcare", + "url": "http://arxiv.org/abs/2411.10383v1", + "authors": "Suraj Racha, Shubh Gupta, Humaira Firdowse, Aastik Solanki, Ganesh Ramakrishnan, Kshitij S. Jadhav", + "update_time": "2024-11-15", + "abstract": "Federated Learning (FL) is a pioneering approach in distributed machine learning, enabling collaborative model training across multiple clients while retaining data privacy. However, the inherent heterogeneity due to imbalanced resource representations across multiple clients poses significant challenges, often introducing bias towards the majority class. This issue is particularly prevalent in healthcare settings, where hospitals acting as clients share medical images. To address class imbalance and reduce bias, we propose a co-distillation driven framework in a federated healthcare setting. Unlike traditional federated setups with a designated server client, our framework promotes knowledge sharing among clients to collectively improve learning outcomes. Our experiments demonstrate that in a federated healthcare setting, co-distillation outperforms other federated methods in handling class imbalance. Additionally, we demonstrate that our framework has the least standard deviation with increasing imbalance while outperforming other baselines, signifying the robustness of our framework for FL in healthcare." + }, + "2411.10356v1": { + "title": "Weakly-Supervised Multimodal Learning on MIMIC-CXR", + "url": "http://arxiv.org/abs/2411.10356v1", + "authors": "Andrea Agostini, Daphn\u00e9 Chopard, Yang Meng, Norbert Fortin, Babak Shahbaba, Stephan Mandt, Thomas M. Sutter, Julia E. Vogt", + "update_time": "2024-11-15", + "abstract": "Multimodal data integration and label scarcity pose significant challenges for machine learning in medical settings. To address these issues, we conduct an in-depth evaluation of the newly proposed Multimodal Variational Mixture-of-Experts (MMVM) VAE on the challenging MIMIC-CXR dataset. Our analysis demonstrates that the MMVM VAE consistently outperforms other multimodal VAEs and fully supervised approaches, highlighting its strong potential for real-world medical applications." + }, + "2411.10342v1": { + "title": "EHRs Data Harmonization Platform, an easy-to-use shiny app based on recodeflow for harmonizing and deriving clinical features", + "url": "http://arxiv.org/abs/2411.10342v1", + "authors": "Arian Aminoleslami, Geoffrey M. Anderson, Davide Chicco", + "update_time": "2024-11-15", + "abstract": "Electronic health records (EHRs) contain important longitudinal information on individuals who have received medical care. Traditionally, EHRs have been used to support a wide range of administrative activities such as billing and clinical workflow, but, given the depth and breadth of clinical and demographic data they contain, they are increasingly being used to provide real-world data for research. Although EHR data have enormous research potential, the full realization of that potential requires a data management strategy that extracts from large EHR databases, that are collected from a range of care settings and time periods, well-documented research-relevant data that can be used by different researchers. Having a common well-documented data management strategy for EHR will support reproducible research and sharing documentation on research variables that are derived from EHR variables is important to open science. In this short paper, we describe the EHRs Data Harmonization Platform. The platform is based on an easy to use web app a publicly available at https://poxotn-arian-aminoleslami.shinyapps.io/Arian/ and as a standalone software package at https://github.com/ArianAminoleslami/EHRs-Data Harmonization-Platform, that is linked to an existing R library for data harmonization called recodeflow. The platform can be used to extract, document, and harmonize variables from EHR and it can also be used to document and share research variables that have been derived from those EHR data." + }, + "2411.10308v1": { + "title": "A Realistic Collimated X-Ray Image Simulation Pipeline", + "url": "http://arxiv.org/abs/2411.10308v1", + "authors": "Benjamin El-Zein, Dominik Eckert, Thomas Weber, Maximilian Rohleder, Ludwig Ritschl, Steffen Kappler, Andreas Maier", + "update_time": "2024-11-15", + "abstract": "Collimator detection remains a challenging task in X-ray systems with unreliable or non-available information about the detectors position relative to the source. This paper presents a physically motivated image processing pipeline for simulating the characteristics of collimator shadows in X-ray images. By generating randomized labels for collimator shapes and locations, incorporating scattered radiation simulation, and including Poisson noise, the pipeline enables the expansion of limited datasets for training deep neural networks. We validate the proposed pipeline by a qualitative and quantitative comparison against real collimator shadows. Furthermore, it is demonstrated that utilizing simulated data within our deep learning framework not only serves as a suitable substitute for actual collimators but also enhances the generalization performance when applied to real-world data." + }, + "2411.10273v1": { + "title": "Fill in the blanks: Rethinking Interpretability in vision", + "url": "http://arxiv.org/abs/2411.10273v1", + "authors": "Pathirage N. Deelaka, Tharindu Wickremasinghe, Devin Y. De Silva, Lisara N. Gajaweera", + "update_time": "2024-11-15", + "abstract": "Model interpretability is a key challenge that has yet to align with the advancements observed in contemporary state-of-the-art deep learning models. In particular, deep learning aided vision tasks require interpretability, in order for their adoption in more specialized domains such as medical imaging. Although the field of explainable AI (XAI) developed methods for interpreting vision models along with early convolutional neural networks, recent XAI research has mainly focused on assigning attributes via saliency maps. As such, these methods are restricted to providing explanations at a sample level, and many explainability methods suffer from low adaptability across a wide range of vision models. In our work, we re-think vision-model explainability from a novel perspective, to probe the general input structure that a model has learnt during its training. To this end, we ask the question: \"How would a vision model fill-in a masked-image\". Experiments on standard vision datasets and pre-trained models reveal consistent patterns, and could be intergrated as an additional model-agnostic explainability tool in modern machine-learning platforms. The code will be available at \\url{https://github.com/BoTZ-TND/FillingTheBlanks.git}" + }, + "2411.10237v1": { + "title": "ScribbleVS: Scribble-Supervised Medical Image Segmentation via Dynamic Competitive Pseudo Label Selection", + "url": "http://arxiv.org/abs/2411.10237v1", + "authors": "Tao Wang, Xinlin Zhang, Yuanbin Chen, Yuanbo Zhou, Longxuan Zhao, Tao Tan, Tong Tong", + "update_time": "2024-11-15", + "abstract": "In clinical medicine, precise image segmentation can provide substantial support to clinicians. However, achieving such precision often requires a large amount of finely annotated data, which can be costly. Scribble annotation presents a more efficient alternative, boosting labeling efficiency. However, utilizing such minimal supervision for medical image segmentation training, especially with scribble annotations, poses significant challenges. To address these challenges, we introduce ScribbleVS, a novel framework that leverages scribble annotations. We introduce a Regional Pseudo Labels Diffusion Module to expand the scope of supervision and reduce the impact of noise present in pseudo labels. Additionally, we propose a Dynamic Competitive Selection module for enhanced refinement in selecting pseudo labels. Experiments conducted on the ACDC and MSCMRseg datasets have demonstrated promising results, achieving performance levels that even exceed those of fully supervised methodologies. The codes of this study are available at https://github.com/ortonwang/ScribbleVS.", + "code_url": "https://github.com/ortonwang/scribblevs" + }, + "2411.10212v1": { + "title": "Embedding Byzantine Fault Tolerance into Federated Learning via Virtual Data-Driven Consistency Scoring Plugin", + "url": "http://arxiv.org/abs/2411.10212v1", + "authors": "Youngjoon Lee, Jinu Gong, Joonhyuk Kang", + "update_time": "2024-11-15", + "abstract": "Given sufficient data from multiple edge devices, federated learning (FL) enables training a shared model without transmitting private data to a central server. However, FL is generally vulnerable to Byzantine attacks from compromised edge devices, which can significantly degrade the model performance. In this paper, we propose a intuitive plugin that can be integrated into existing FL techniques to achieve Byzantine-Resilience. Key idea is to generate virtual data samples and evaluate model consistency scores across local updates to effectively filter out compromised edge devices. By utilizing this scoring mechanism before the aggregation phase, the proposed plugin enables existing FL techniques to become robust against Byzantine attacks while maintaining their original benefits. Numerical results on medical image classification task validate that plugging the proposed approach into representative FL algorithms, effectively achieves Byzantine resilience. Furthermore, the proposed plugin maintains the original convergence properties of the base FL algorithms when no Byzantine attacks are present." + }, + "2411.10168v1": { + "title": "Evaluating the role of `Constitutions' for learning from AI feedback", + "url": "http://arxiv.org/abs/2411.10168v1", + "authors": "Saskia Redgate, Andrew M. Bean, Adam Mahdi", + "update_time": "2024-11-15", + "abstract": "The growing capabilities of large language models (LLMs) have led to their use as substitutes for human feedback for training and assessing other LLMs. These methods often rely on `constitutions', written guidelines which a critic model uses to provide feedback and improve generations. We investigate how the choice of constitution affects feedback quality by using four different constitutions to improve patient-centered communication in medical interviews. In pairwise comparisons conducted by 215 human raters, we found that detailed constitutions led to better results regarding emotive qualities. However, none of the constitutions outperformed the baseline in learning more practically-oriented skills related to information gathering and provision. Our findings indicate that while detailed constitutions should be prioritised, there are possible limitations to the effectiveness of AI feedback as a reward signal in certain areas." + }, + "2411.10136v1": { + "title": "CoSAM: Self-Correcting SAM for Domain Generalization in 2D Medical Image Segmentation", + "url": "http://arxiv.org/abs/2411.10136v1", + "authors": "Yihang Fu, Ziyang Chen, Yiwen Ye, Xingliang Lei, Zhisong Wang, Yong Xia", + "update_time": "2024-11-15", + "abstract": "Medical images often exhibit distribution shifts due to variations in imaging protocols and scanners across different medical centers. Domain Generalization (DG) methods aim to train models on source domains that can generalize to unseen target domains. Recently, the segment anything model (SAM) has demonstrated strong generalization capabilities due to its prompt-based design, and has gained significant attention in image segmentation tasks. Existing SAM-based approaches attempt to address the need for manual prompts by introducing prompt generators that automatically generate these prompts. However, we argue that auto-generated prompts may not be sufficiently accurate under distribution shifts, potentially leading to incorrect predictions that still require manual verification and correction by clinicians. To address this challenge, we propose a method for 2D medical image segmentation called Self-Correcting SAM (CoSAM). Our approach begins by generating coarse masks using SAM in a prompt-free manner, providing prior prompts for the subsequent stages, and eliminating the need for prompt generators. To automatically refine these coarse masks, we introduce a generalized error decoder that simulates the correction process typically performed by clinicians. Furthermore, we generate diverse prompts as feedback based on the corrected masks, which are used to iteratively refine the predictions within a self-correcting loop, enhancing the generalization performance of our model. Extensive experiments on two medical image segmentation benchmarks across multiple scenarios demonstrate the superiority of CoSAM over state-of-the-art SAM-based methods." + }, + "2411.10116v1": { + "title": "Two-step registration method boosts sensitivity in longitudinal fixel-based analyses", + "url": "http://arxiv.org/abs/2411.10116v1", + "authors": "Aur\u00e9lie Lebrun, Michel Bottlaender, Julien Lagarde, Marie Sarazin, Yann Leprince", + "update_time": "2024-11-15", + "abstract": "Longitudinal analyses are increasingly used in clinical studies as they allow the study of subtle changes over time within the same subjects. In most of these studies, it is necessary to align all the images studied to a common reference by registering them to a template. In the study of white matter using the recently developed fixel-based analysis (FBA) method, this registration is important, in particular because the fiber bundle cross-section metric is a direct measure of this registration. In the vast majority of longitudinal FBA studies described in the literature, sessions acquired for a same subject are directly independently registered to the template. However, it has been shown in T1-based morphometry that a 2-step registration through an intra-subject average can be advantageous in longitudinal analyses. In this work, we propose an implementation of this 2-step registration method in a typical longitudinal FBA aimed at investigating the evolution of white matter changes in Alzheimer's disease (AD). We compared at the fixel level the mean absolute effect and standard deviation yielded by this registration method and by a direct registration, as well as the results obtained with each registration method for the study of AD in both fixelwise and tract-based analyses. We found that the 2-step method reduced the variability of the measurements and thus enhanced statistical power in both types of analyses." + } + } +} \ No newline at end of file