From 907127058e09a8a82e46d8f04b791df8fa74b9f6 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Tue, 25 Feb 2025 01:15:23 +0000
Subject: [PATCH] Update arXiv papers

---
 README.md                         | 272 ++++++++--------
 data_store/papers_2025-02-25.json | 514 ++++++++++++++++++++++++++++++
 2 files changed, 650 insertions(+), 136 deletions(-)
 create mode 100644 data_store/papers_2025-02-25.json

diff --git a/README.md b/README.md
index 701bac3..2c250fc 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-

Updated on 2025-02-24

+

Updated on 2025-02-25

Brain

@@ -14,74 +14,74 @@ -2025-02-20 -Beyond Performance Scores: Directed Functional Connectivity as a Brain-Based Biomarker for Motor Skill Learning and Retention -Anil Kamat, Rahul Rahul, Lora Cavuoto, Harry Burke, Matthew Hackett, Jack Norfleet, Steven Schwaitzberg, Suvranu De -Link -Motor skill acquisition in fields like surgery, robotics, and sports involves learning complex task sequences through extensive training. Traditional performance metrics, like execution time and error rates, offer limited insight as they fail to capture the neural mechanisms underlying skill learning and retention. This study introduces directed functional connectivity (dFC), derived from electroencephalography (EEG), as a novel brain-based biomarker for assessing motor skill learning and retention. For the first time, dFC is applied as a biomarker to map the stages of the Fitts and Posner motor learning model, offering new insights into the neural mechanisms underlying skill acquisition and retention. Unlike traditional measures, it captures both the strength and direction of neural information flow, providing a comprehensive understanding of neural adaptations across different learning stages. The analysis demonstrates that dFC can effectively identify and track the progression through various stages of the Fitts and Posner model. Furthermore, its stability over a six-week washout period highlights its utility in monitoring long-term retention. No significant changes in dFC were observed in a control group, confirming that the observed neural adaptations were specific to training and not due to external factors. By offering a granular view of the learning process at the group and individual levels, dFC facilitates the development of personalized, targeted training protocols aimed at enhancing outcomes in fields where precision and long-term retention are critical, such as surgical education. 
These findings underscore the value of dFC as a robust biomarker that complements traditional performance metrics, providing a deeper understanding of motor skill learning and retention. +2025-02-21 +Causal Modeling of fMRI Time-series for Interpretable Autism Spectrum Disorder Classification +Peiyu Duan, Nicha C. Dvornek, Jiyao Wang, Lawrence H. Staib, James S. Duncan +Link +Autism spectrum disorder (ASD) is a neurological and developmental disorder that affects social and communicative behaviors. It emerges in early life and is generally associated with lifelong disabilities. Thus, accurate and early diagnosis could facilitate treatment outcomes for those with ASD. Functional magnetic resonance imaging (fMRI) is a useful tool that measures changes in brain signaling to facilitate our understanding of ASD. Much effort is being made to identify ASD biomarkers using various connectome-based machine learning and deep learning classifiers. However, correlation-based models cannot capture the non-linear interactions between brain regions. To solve this problem, we introduce a causality-inspired deep learning model that uses time-series information from fMRI and captures causality among ROIs useful for ASD classification. The model is compared with other baseline and state-of-the-art models with 5-fold cross-validation on the ABIDE dataset. We filtered the dataset by choosing all the images with mean FD less than 15mm to ensure data quality. Our proposed model achieved the highest average classification accuracy of 71.9% and an average AUC of 75.8%. Moreover, the inter-ROI causality interpretation of the model suggests that the left precuneus, right precuneus, and cerebellum are placed in the top 10 ROIs in inter-ROI causality among the ASD population. In contrast, these ROIs are not ranked in the top 10 in the control population. We have validated our findings with the literature and found that abnormalities in these ROIs are often associated with ASD. 
-2025-02-20 -Explanations of Deep Language Models Explain Language Representations in the Brain -Maryam Rahimi, Yadollah Yaghoobzadeh, Mohammad Reza Daliri -Link -Recent advances in artificial intelligence have given rise to large language models (LLMs) that not only achieve human-like performance but also share computational principles with the brain's language processing mechanisms. While previous research has primarily focused on aligning LLMs' internal representations with neural activity, we introduce a novel approach that leverages explainable AI (XAI) methods to forge deeper connections between the two domains. Using attribution methods, we quantified how preceding words contribute to an LLM's next-word predictions and employed these explanations to predict fMRI recordings from participants listening to the same narratives. Our findings demonstrate that attribution methods robustly predict brain activity across the language network, surpassing traditional internal representations in early language areas. This alignment is hierarchical: early-layer explanations correspond to the initial stages of language processing in the brain, while later layers align with more advanced stages. Moreover, the layers more influential on LLM next-word prediction$\unicode{x2014}$those with higher attribution scores$\unicode{x2014}$exhibited stronger alignment with neural activity. This work establishes a bidirectional bridge between AI and neuroscience. First, we demonstrate that attribution methods offer a powerful lens for investigating the neural mechanisms of language comprehension, revealing how meaning emerges from preceding context. Second, we propose using brain alignment as a metric to evaluate the validity of attribution methods, providing a framework for assessing their biological plausibility. 
+2025-02-21 +BAN: Neuroanatomical Aligning in Auditory Recognition between Artificial Neural Network and Human Cortex +Haidong Wang, Pengfei Xiao, Ao Liu, Jianhua Zhang, Qia Shan +Link +Drawing inspiration from neurosciences, artificial neural networks (ANNs) have evolved from shallow architectures to highly complex, deep structures, yielding exceptional performance in auditory recognition tasks. However, traditional ANNs often struggle to align with brain regions due to their excessive depth and lack of biologically realistic features, like recurrent connection. To address this, a brain-like auditory network (BAN) is introduced, which incorporates four neuroanatomically mapped areas and recurrent connection, guided by a novel metric called the brain-like auditory score (BAS). BAS serves as a benchmark for evaluating the similarity between BAN and human auditory recognition pathway. We further propose that specific areas in the cerebral cortex, mainly the middle and medial superior temporal (T2/T3) areas, correspond to the designed network structure, drawing parallels with the brain's auditory perception pathway. Our findings suggest that the neuroanatomical similarity in the cortex and auditory classification abilities of the ANN are well-aligned. In addition to delivering excellent performance on a music genre classification task, the BAN demonstrates a high BAS score. In conclusion, this study presents BAN as a recurrent, brain-inspired ANN, representing the first model that mirrors the cortical pathway of auditory recognition. -2025-02-20 -MedFuncta: Modality-Agnostic Representations Based on Efficient Neural Fields -Paul Friedrich, Florentin Bieder, Phlippe C. Cattin -Link -Recent research in medical image analysis with deep learning almost exclusively focuses on grid- or voxel-based data representations. We challenge this common choice by introducing MedFuncta, a modality-agnostic continuous data representation based on neural fields. 
We demonstrate how to scale neural fields from single instances to large datasets by exploiting redundancy in medical signals and by applying an efficient meta-learning approach with a context reduction scheme. We further address the spectral bias in commonly used SIREN activations, by introducing an $\omega_0$-schedule, improving reconstruction quality and convergence speed. We validate our proposed approach on a large variety of medical signals of different dimensions and modalities (1D: ECG; 2D: Chest X-ray, Retinal OCT, Fundus Camera, Dermatoscope, Colon Histopathology, Cell Microscopy; 3D: Brain MRI, Lung CT) and successfully demonstrate that we can solve relevant downstream tasks on these representations. We additionally release a large-scale dataset of > 550k annotated neural fields to promote research in this direction. +2025-02-21 +Confidence-Based Annotation Of Brain Tumours In Ultrasound +Alistair Weld, Luke Dixon, Alfie Roddan, Giulio Anichini, Sophie Camp, Stamatia Giannarou +Link +Purpose: An investigation of the challenge of annotating discrete segmentations of brain tumours in ultrasound, with a focus on the issue of aleatoric uncertainty along the tumour margin, particularly for diffuse tumours. A segmentation protocol and method is proposed that incorporates this margin-related uncertainty while minimising the interobserver variance through reduced subjectivity, thereby diminishing annotator epistemic uncertainty. Approach: A sparse confidence method for annotation is proposed, based on a protocol designed using computer vision and radiology theory. Results: Output annotations using the proposed method are compared with the corresponding professional discrete annotation variance between the observers. A linear relationship was measured within the tumour margin region, with a Pearson correlation of 0.8. 
The downstream application was explored, comparing training using confidence annotations as soft labels with using the best discrete annotations as hard labels. In all evaluation folds, the Brier score was superior for the soft-label trained network. Conclusion: A formal framework was constructed to demonstrate the infeasibility of discrete annotation of brain tumours in B-mode ultrasound. Subsequently, a method for sparse confidence-based annotation is proposed and evaluated. Keywords: Brain tumours, ultrasound, confidence, annotation. -2025-02-19 -Dynamic Activation with Knowledge Distillation for Energy-Efficient Spiking NN Ensembles -Orestis Konstantaropoulos, Theodoris Mallios, Maria Papadopouli -Link -While foundation AI models excel at tasks like classification and decision-making, their high energy consumption makes them unsuitable for energy-constrained applications. Inspired by the brain's efficiency, spiking neural networks (SNNs) have emerged as a viable alternative due to their event-driven nature and compatibility with neuromorphic chips. This work introduces a novel system that combines knowledge distillation and ensemble learning to bridge the performance gap between artificial neural networks (ANNs) and SNNs. A foundation AI model acts as a teacher network, guiding smaller student SNNs organized into an ensemble, called Spiking Neural Ensemble (SNE). SNE enables the disentanglement of the teacher's knowledge, allowing each student to specialize in predicting a distinct aspect of it, while processing the same input. The core innovation of SNE is the adaptive activation of a subset of SNN models of an ensemble, leveraging knowledge-distillation, enhanced with an informed-partitioning (disentanglement) of the teacher's feature space. By dynamically activating only a subset of these student SNNs, the system balances accuracy and energy efficiency, achieving substantial energy savings with minimal accuracy loss. 
Moreover, SNE is significantly more efficient than the teacher network, reducing computational requirements by up to 20x with only a 2% drop in accuracy on the CIFAR-10 dataset. This disentanglement procedure achieves an accuracy improvement of up to 2.4% on the CIFAR-10 dataset compared to other partitioning schemes. Finally, we comparatively analyze SNE performance under noisy conditions, demonstrating enhanced robustness compared to its ANN teacher. In summary, SNE offers a promising new direction for energy-constrained applications. +2025-02-21 +M2LADS Demo: A System for Generating Multimodal Learning Analytics Dashboards +Alvaro Becerra, Roberto Daza, Ruth Cobos, Aythami Morales, Julian Fierrez +Link +We present a demonstration of a web-based system called M2LADS ("System for Generating Multimodal Learning Analytics Dashboards"), designed to integrate, synchronize, visualize, and analyze multimodal data recorded during computer-based learning sessions with biosensors. This system presents a range of biometric and behavioral data on web-based dashboards, providing detailed insights into various physiological and activity-based metrics. The multimodal data visualized include electroencephalogram (EEG) data for assessing attention and brain activity, heart rate metrics, eye-tracking data to measure visual attention, webcam video recordings, and activity logs of the monitored tasks. M2LADS aims to assist data scientists in two key ways: (1) by providing a comprehensive view of participants' experiences, displaying all data categorized by the activities in which participants are engaged, and (2) by synchronizing all biosignals and videos, facilitating easier data relabeling if any activity information contains errors. 
-2025-02-19 -Freezing of Gait as a Complication of Pallidal Deep Brain Stimulation in DYT- KMT2B Patients with Evidence of Striatonigral Degeneration -Laura Cif, Diane Demailly, Xavier Vasques, Delphine de Verbizier, Philippe Coubes, Kathleen Gorman, Manju A Kurian -Link -Background: Mutations in KMT2B are a recognized cause of early-onset complex dystonia, with deep brain stimulation (DBS) of the internal globus pallidus (GPi-DBS) being an effective treatment. However, gait impairment, particularly freezing of gait (FOG), remains a significant challenge in DYT-KMT2B patients post-DBS. Objectives: To characterize the emergence of FOG in DYT-KMT2B patients treated with GPi-DBS and explore potential underlying mechanisms, including striatonigral degeneration. Methods: Five patients (four females) with KMT2B-related dystonia and protein-truncating variants (PTVs) were retrospectively analyzed. Clinical progression, response to GPi-DBS, and the presence of FOG were documented. Dopaminergic function was assessed using DaTscan (SPECT for ^123I-ioflupane) in four patients. Results: FOG developed in all patients, with onset ranging from 1 to 15.5 years post-DBS. DaTscan abnormalities, indicative of bilateral striatal dopaminergic denervation, were observed in four cases. Prior to DBS, all patients exhibited dystonia unresponsive to L-dopa, and post-DBS, FOG remained refractory to dopaminergic treatment in most cases. Despite initial improvements in gait post-DBS, only one patient maintained independent ambulation at the last follow-up. Conclusions: FOG is an emerging complication in DYT-KMT2B patients with PTVs undergoing GPi-DBS, potentially linked to underlying striatonigral degeneration. The findings suggest a need for long-term motor surveillance and consideration of alternative therapeutic strategies, including dopaminergic trials, in this patient population. 
Further studies are required to elucidate the precise mechanisms driving DBS-related hypokinetic gait disturbances in DYT-KMT2B dystonia. +2025-02-21 +Applications of wavelet transform in classification of local field potential recorded from the rat brain in conditioned place preference paradigm +AmirAli Kalbasi, Mahdi Aliyari Shoorehdeli, Shole Jamali, Abbas Haghparast +Link +This study investigates the multi-label classification of Local Field Potential (LFP) data from the hippocampus (HIP) and nucleus accumbens (NAc) in the rat brain, focusing on reward responses using the Conditioned Place Preference (CPP) paradigm. Rats were conditioned with saline, morphine, and food rewards, and LFP recordings were conducted from both HIP and NAc during pre- and post-tests. The LFP data were classified into four categories: treatment types, test phases, recording channels, and chamber positions within the CPP setup. Features were extracted using Continuous Wavelet Transform (CWT), Wavelet Coherence, and Wavelet Scattering. Classification was performed via Decision Trees, Multilayer Perceptrons, and Support Vector Machines. Notably, in the Food group, HIP and combined HIP-NAc features yielded the highest classification accuracy for CPP chambers, whereas NAc features excelled in the Morphine group. Employing wavelet scattering, an 80% classification accuracy was achieved across treatment groups, test phases, and channels. Exceptionally high classification accuracies were observed for Food-post-test-HIP (99.75%) and Morphine-post-test-NAc (99.58%). The study reveals that NAc activity is pivotal for morphine-induced CPP, whereas HIP and HIP-NAc connectivity are crucial for food-induced CPP. The proposed methodology provides a novel avenue for precisely classifying LFP data, shedding light on neural circuit activities underlying behavioral responses. 
-2025-02-20 -Emergence of the Primacy Effect in Structured State-Space Models -Takashi Morita -Link -Human and animal memory for sequentially presented items is well-documented to be more accurate for those at the beginning and end of the sequence, phenomena known as the primacy and recency effects, respectively. By contrast, artificial neural network (ANN) models are typically designed with a memory that decays monotonically over time. Accordingly, ANNs are expected to show the recency effect but not the primacy effect. Contrary to this theoretical expectation, however, the present study reveals a counterintuitive finding: a recently developed ANN architecture, called structured state-space models, exhibits the primacy effect when trained and evaluated on a synthetic task that mirrors psychological memory experiments. Given that this model was originally designed for recovering neuronal activity patterns observed in biological brains, this result provides a novel perspective on the psychological primacy effect while also posing a non-trivial puzzle for the current theories in machine learning. +2025-02-21 +Graph-Based Deep Learning on Stereo EEG for Predicting Seizure Freedom in Epilepsy Patients +Artur Agaronyan, Syeda Abeera Amir, Nunthasiri Wittayanakorn, John Schreiber, Marius G. Linguraru, William Gaillard, Chima Oluigbo, Syed Muhammad Anwar +Link +Predicting seizure freedom is essential for tailoring epilepsy treatment. But accurate prediction remains challenging with traditional methods, especially with diverse patient populations. This study developed a deep learning-based graph neural network (GNN) model to predict seizure freedom from stereo electroencephalography (sEEG) data in patients with refractory epilepsy. We utilized high-quality sEEG data from 15 pediatric patients to train a deep learning model that can accurately predict seizure freedom outcomes and advance understanding of brain connectivity at the seizure onset zone. 
Our model integrates local and global connectivity using graph convolutions with multi-scale attention mechanisms to capture connections between difficult-to-study regions such as the thalamus and motor regions. The model achieved an accuracy of 92.4% in binary class analysis, 86.6% in patient-wise analysis, and 81.4% in multi-class analysis. Node and edge-level feature analysis highlighted the anterior cingulate and frontal pole regions as key contributors to seizure freedom outcomes. The nodes identified by our model were also more likely to coincide with seizure onset zones. Our findings underscore the potential of new connectivity-based deep learning models such as GNNs for enhancing the prediction of seizure freedom, predicting seizure onset zones, connectivity analysis of the brain during seizure, as well as informing AI-assisted personalized epilepsy treatment planning. -2025-02-19 -MoM: Linear Sequence Modeling with Mixture-of-Memories -Jusen Du, Weigao Sun, Disen Lan, Jiaxi Hu, Yu Cheng -Link -Linear sequence modeling methods, such as linear attention, state space modeling, and linear RNNs, offer significant efficiency improvements by reducing the complexity of training and inference. However, these methods typically compress the entire input sequence into a single fixed-size memory state, which leads to suboptimal performance on recall-intensive downstream tasks. Drawing inspiration from neuroscience, particularly the brain's ability to maintain robust long-term memory while mitigating "memory interference", we introduce a novel architecture called Mixture-of-Memories (MoM). MoM utilizes multiple independent memory states, with a router network directing input tokens to specific memory states. This approach greatly enhances the overall memory capacity while minimizing memory interference. As a result, MoM performs exceptionally well on recall-intensive tasks, surpassing existing linear sequence modeling techniques. 
Despite incorporating multiple memory states, the computation of each memory state remains linear in complexity, allowing MoM to retain the linear-complexity advantage during training, while constant-complexity during inference. Our experimental results show that MoM significantly outperforms current linear sequence models on downstream language tasks, particularly recall-intensive tasks, and even achieves performance comparable to Transformer models. The code is released at https://github.com/OpenSparseLLMs/MoM and is also released as a part of https://github.com/OpenSparseLLMs/Linear-MoE. +2025-02-21 +BP-GPT: Auditory Neural Decoding Using fMRI-prompted LLM +Xiaoyu Chen, Changde Du, Che Liu, Yizhe Wang, Huiguang He +Link +Decoding language information from brain signals represents a vital research area within brain-computer interfaces, particularly in the context of deciphering the semantic information from the fMRI signal. Although existing work uses LLM to achieve this goal, their method does not use an end-to-end approach and avoids the LLM in the mapping of fMRI-to-text, leaving space for the exploration of the LLM in auditory decoding. In this paper, we introduce a novel method, the Brain Prompt GPT (BP-GPT). By using the brain representation that is extracted from the fMRI as a prompt, our method can utilize GPT-2 to decode fMRI signals into stimulus text. Further, we introduce the text prompt and align the fMRI prompt to it. By introducing the text prompt, our BP-GPT can extract a more robust brain prompt and promote the decoding of pre-trained LLM. We evaluate our BP-GPT on the open-source auditory semantic decoding dataset and achieve a significant improvement up to 4.61 on METEOR and 2.43 on BERTScore across all the subjects compared to the state-of-the-art method. The experimental results demonstrate that using brain representation as a prompt to further drive LLM for auditory neural decoding is feasible and effective. 
The code is available at https://github.com/1994cxy/BP-GPT. -2025-02-19 -Long-term follow-up of DYT1 dystonia patients treated by deep brain stimulation: an open-label study -Laura Cif, Xavier Vasques, Victoria Gonzalez, Patrice Ravel, Brigitte Biolsi, Gwenaelle Collod-Beroud, Sylvie Tuffery-Giraud, Hassan Elfertit, Mireille Claustres, Philippe Coubes -Link -Long-term efficacy of internal globus pallidus (GPi) deep-brain stimulation (DBS) in DYT1 dystonia and disease progression under DBS was studied. Twenty-six patients of this open-label study were divided into two groups: (A) with single bilateral GPi lead, (B) with a second bilateral GPi lead implanted owning to subsequent worsening of symptomatology. Dystonia was assessed with the Burke Scale. Appearance of new symptoms and distribution according to body region were recorded. In the whole cohort, significant decreases in motor and disability subscores (P < 0.0001) were observed at 1 year and maintained up to 10 years. Group B showed worsening of the symptoms. At 1 year, there were no significant differences between Groups A (without subsequent worsening) and B; at 5 years, a significant difference was found for motor and disability scores. Within Group B, four patients exhibited additional improvement after the second DBS surgery. In the 26 patients, significant difference (P = 0.001) was found between the number of body regions affected by dystonia preoperatively and over the whole follow-up. DBS efficacy in DYT1 dystonia can be maintained up to 10 years (two patients). New symptoms appear with long-term follow-up and may improve with additional leads in a subgroup of patients. +2025-02-20 +Estimating Neural Representation Alignment from Limited Inputs and Features +Chanwoo Chun, Abdulkadir Canatar, SueYeon Chung, Daniel D. Lee +Link +In both artificial and biological systems, the centered kernel alignment (CKA) has become a widely used tool for quantifying neural representation similarity. 
While current CKA estimators typically correct for the effects of finite stimuli sampling, the effects of sampling a subset of neurons are overlooked, introducing notable bias in standard experimental scenarios. Here, we provide a theoretical analysis showing how this bias is affected by the representation geometry. We then introduce a novel estimator that corrects for both input and feature sampling. We use our method for evaluating both brain-to-brain and model-to-brain alignments and show that it delivers reliable comparisons even with very sparsely sampled neurons. We perform within-animal and across-animal comparisons on electrophysiological data from visual cortical areas V1, V4, and IT data, and use these as benchmarks to evaluate model-to-brain alignment. We also apply our method to reveal how object representations become progressively disentangled across layers in both biological and artificial systems. These findings underscore the importance of correcting feature-sampling biases in CKA and demonstrate that our bias-corrected estimator provides a more faithful measure of representation alignment. The improved estimates increase our understanding of how neural activity is structured across both biological and artificial systems. -2025-02-19 -LaVCa: LLM-assisted Visual Cortex Captioning -Takuya Matsuyama, Shinji Nishimoto, Yu Takagi -Link -Understanding the property of neural populations (or voxels) in the human brain can advance our comprehension of human perceptual and cognitive processing capabilities and contribute to developing brain-inspired computer models. Recent encoding models using deep neural networks (DNNs) have successfully predicted voxel-wise activity. However, interpreting the properties that explain voxel responses remains challenging because of the black-box nature of DNNs. 
As a solution, we propose LLM-assisted Visual Cortex Captioning (LaVCa), a data-driven approach that uses large language models (LLMs) to generate natural-language captions for images to which voxels are selective. By applying LaVCa for image-evoked brain activity, we demonstrate that LaVCa generates captions that describe voxel selectivity more accurately than the previously proposed method. Furthermore, the captions generated by LaVCa quantitatively capture more detailed properties than the existing method at both the inter-voxel and intra-voxel levels. Furthermore, a more detailed analysis of the voxel-specific properties generated by LaVCa reveals fine-grained functional differentiation within regions of interest (ROIs) in the visual cortex and voxels that simultaneously represent multiple distinct concepts. These findings offer profound insights into human visual representations by assigning detailed captions throughout the visual cortex while highlighting the potential of LLM-based methods in understanding brain representations. Please check out our webpage at https://sites.google.com/view/lavca-llm/ +2025-02-20 +Fundamental Survey on Neuromorphic Based Audio Classification +Amlan Basu, Pranav Chaudhari, Gaetano Di Caterina +Link +Audio classification is paramount in a variety of applications including surveillance, healthcare monitoring, and environmental analysis. Traditional methods frequently depend on intricate signal processing algorithms and manually crafted features, which may fall short in fully capturing the complexities of audio patterns. Neuromorphic computing, inspired by the architecture and functioning of the human brain, presents a promising alternative for audio classification tasks. This survey provides an exhaustive examination of the current state-of-the-art in neuromorphic-based audio classification. 
It delves into the crucial components of neuromorphic systems, such as Spiking Neural Networks (SNNs), memristors, and neuromorphic hardware platforms, highlighting their advantages in audio classification. Furthermore, the survey explores various methodologies and strategies employed in neuromorphic audio classification, including event-based processing, spike-based learning, and bio-inspired feature extraction. It examines how these approaches address the limitations of traditional audio classification methods, particularly in terms of energy efficiency, real-time processing, and robustness to environmental noise. Additionally, the paper conducts a comparative analysis of different neuromorphic audio classification models and benchmarks, evaluating their performance metrics, computational efficiency, and scalability. By providing a comprehensive guide for researchers, engineers and practitioners, this survey aims to stimulate further innovation and advancements in the evolving field of neuromorphic audio classification. -2025-02-19 -Improving the Sparse Structure Learning of Spiking Neural Networks from the View of Compression Efficiency -Jiangrong Shen, Qi Xu, Gang Pan, Badong Chen -Link -The human brain utilizes spikes for information transmission and dynamically reorganizes its network structure to boost energy efficiency and cognitive capabilities throughout its lifespan. Drawing inspiration from this spike-based computation, Spiking Neural Networks (SNNs) have been developed to construct event-driven models that emulate this efficiency. Despite these advances, deep SNNs continue to suffer from over-parameterization during training and inference, a stark contrast to the brain's ability to self-organize. Furthermore, existing sparse SNNs are challenged by maintaining optimal pruning levels due to a static pruning ratio, resulting in either under- or over-pruning. 
In this paper, we propose a novel two-stage dynamic structure learning approach for deep SNNs, aimed at maintaining effective sparse training from scratch while optimizing compression efficiency. The first stage evaluates the compressibility of existing sparse subnetworks within SNNs using the PQ index, which facilitates an adaptive determination of the rewiring ratio for synaptic connections based on data compression insights. In the second stage, this rewiring ratio critically informs the dynamic synaptic connection rewiring process, including both pruning and regrowth. This approach significantly improves the exploration of sparse structure training in deep SNNs, adapting sparsity dynamically from the point view of compression efficiency. Our experiments demonstrate that this sparse training approach not only aligns with the performance of current deep SNNs models but also significantly improves the efficiency of compressing sparse SNNs. Crucially, it preserves the advantages of initiating training with sparse models and offers a promising solution for implementing edge AI on neuromorphic hardware. +2025-02-20 +Beyond Performance Scores: Directed Functional Connectivity as a Brain-Based Biomarker for Motor Skill Learning and Retention +Anil Kamat, Rahul Rahul, Lora Cavuoto, Harry Burke, Matthew Hackett, Jack Norfleet, Steven Schwaitzberg, Suvranu De +Link +Motor skill acquisition in fields like surgery, robotics, and sports involves learning complex task sequences through extensive training. Traditional performance metrics, like execution time and error rates, offer limited insight as they fail to capture the neural mechanisms underlying skill learning and retention. This study introduces directed functional connectivity (dFC), derived from electroencephalography (EEG), as a novel brain-based biomarker for assessing motor skill learning and retention. 
For the first time, dFC is applied as a biomarker to map the stages of the Fitts and Posner motor learning model, offering new insights into the neural mechanisms underlying skill acquisition and retention. Unlike traditional measures, it captures both the strength and direction of neural information flow, providing a comprehensive understanding of neural adaptations across different learning stages. The analysis demonstrates that dFC can effectively identify and track the progression through various stages of the Fitts and Posner model. Furthermore, its stability over a six-week washout period highlights its utility in monitoring long-term retention. No significant changes in dFC were observed in a control group, confirming that the observed neural adaptations were specific to training and not due to external factors. By offering a granular view of the learning process at the group and individual levels, dFC facilitates the development of personalized, targeted training protocols aimed at enhancing outcomes in fields where precision and long-term retention are critical, such as surgical education. These findings underscore the value of dFC as a robust biomarker that complements traditional performance metrics, providing a deeper understanding of motor skill learning and retention. @@ -100,6 +100,27 @@ +2025-02-21 +M2LADS Demo: A System for Generating Multimodal Learning Analytics Dashboards +Alvaro Becerra, Roberto Daza, Ruth Cobos, Aythami Morales, Julian Fierrez +Link +We present a demonstration of a web-based system called M2LADS ("System for Generating Multimodal Learning Analytics Dashboards"), designed to integrate, synchronize, visualize, and analyze multimodal data recorded during computer-based learning sessions with biosensors. This system presents a range of biometric and behavioral data on web-based dashboards, providing detailed insights into various physiological and activity-based metrics. 
The multimodal data visualized include electroencephalogram (EEG) data for assessing attention and brain activity, heart rate metrics, eye-tracking data to measure visual attention, webcam video recordings, and activity logs of the monitored tasks. M2LADS aims to assist data scientists in two key ways: (1) by providing a comprehensive view of participants' experiences, displaying all data categorized by the activities in which participants are engaged, and (2) by synchronizing all biosignals and videos, facilitating easier data relabeling if any activity information contains errors. + + +2025-02-21 +Graph-Based Deep Learning on Stereo EEG for Predicting Seizure Freedom in Epilepsy Patients +Artur Agaronyan, Syeda Abeera Amir, Nunthasiri Wittayanakorn, John Schreiber, Marius G. Linguraru, William Gaillard, Chima Oluigbo, Syed Muhammad Anwar +Link +Predicting seizure freedom is essential for tailoring epilepsy treatment. But accurate prediction remains challenging with traditional methods, especially with diverse patient populations. This study developed a deep learning-based graph neural network (GNN) model to predict seizure freedom from stereo electroencephalography (sEEG) data in patients with refractory epilepsy. We utilized high-quality sEEG data from 15 pediatric patients to train a deep learning model that can accurately predict seizure freedom outcomes and advance understanding of brain connectivity at the seizure onset zone. Our model integrates local and global connectivity using graph convolutions with multi-scale attention mechanisms to capture connections between difficult-to-study regions such as the thalamus and motor regions. The model achieved an accuracy of 92.4% in binary class analysis, 86.6% in patient-wise analysis, and 81.4% in multi-class analysis. Node and edge-level feature analysis highlighted the anterior cingulate and frontal pole regions as key contributors to seizure freedom outcomes. 
The nodes identified by our model were also more likely to coincide with seizure onset zones. Our findings underscore the potential of new connectivity-based deep learning models such as GNNs for enhancing the prediction of seizure freedom, predicting seizure onset zones, connectivity analysis of the brain during seizure, as well as informing AI-assisted personalized epilepsy treatment planning. + + +2025-02-21 +Assessing a Single Student's Concentration on Learning Platforms: A Machine Learning-Enhanced EEG-Based Framework +Zewen Zhuo, Mohamad Najafi, Hazem Zein, Amine Nait-Ali +Link +This study introduces a specialized pipeline designed to classify the concentration state of an individual student during online learning sessions by training a custom-tailored machine learning model. Detailed protocols for acquiring and preprocessing EEG data are outlined, along with the extraction of fifty statistical features from five EEG signal bands: alpha, beta, theta, delta, and gamma. Following feature extraction, a thorough feature selection process was conducted to optimize the data inputs for a personalized analysis. The study also explores the benefits of hyperparameter fine-tuning to enhance the classification accuracy of the student's concentration state. EEG signals were captured from the student using a Muse headband (Gen 2), equipped with five electrodes (TP9, AF7, AF8, TP10, and a reference electrode NZ), during engagement with educational content on computer-based e-learning platforms. Employing a random forest model customized to the student's data, we achieved remarkable classification performance, with test accuracies of 97.6% in the computer-based learning setting and 98% in the virtual reality setting. These results underscore the effectiveness of our approach in delivering personalized insights into student concentration during online educational activities. 
+ + 2025-02-20 Beyond Performance Scores: Directed Functional Connectivity as a Brain-Based Biomarker for Motor Skill Learning and Retention Anil Kamat, Rahul Rahul, Lora Cavuoto, Harry Burke, Matthew Hackett, Jack Norfleet, Steven Schwaitzberg, Suvranu De @@ -148,27 +169,6 @@ Link Human-robot collaboration (HRC) relies on accurate and timely recognition of human intentions to ensure seamless interactions. Among common HRC tasks, human-to-robot object handovers have been studied extensively for planning the robot's actions during object reception, assuming the human intention for object handover. However, distinguishing handover intentions from other actions has received limited attention. Most research on handovers has focused on visually detecting motion trajectories, which often results in delays or false detections when trajectories overlap. This paper investigates whether human intentions for object handovers are reflected in non-movement-based physiological signals. We conduct a multimodal analysis comparing three data modalities: electroencephalogram (EEG), gaze, and hand-motion signals. Our study aims to distinguish between handover-intended human motions and non-handover motions in an HRC setting, evaluating each modality's performance in predicting and classifying these actions before and after human movement initiation. We develop and evaluate human intention detectors based on these modalities, comparing their accuracy and timing in identifying handover intentions. To the best of our knowledge, this is the first study to systematically develop and test intention detectors across multiple modalities within the same experimental context of human-robot handovers. Our analysis reveals that handover intention can be detected from all three modalities. Nevertheless, gaze signals are the earliest as well as the most accurate to classify the motion as intended for handover or non-handover. 
- -2025-02-15 -Hybrid Brain-Machine Interface: Integrating EEG and EMG for Reduced Physical Demand -Daniel Wang, Katie Hong, Zachary Sayyah, Malcolm Krolick, Emma Steinberg, Rohan Venkatdas, Sidharth Pavuluri, Yipeng Wang, Zihan Huang -Link -We present a hybrid brain-machine interface (BMI) that integrates steady-state visually evoked potential (SSVEP)-based EEG and facial EMG to improve multimodal control and mitigate fatigue in assistive applications. Traditional BMIs relying solely on EEG or EMG suffer from inherent limitations; EEG-based control requires sustained visual focus, leading to cognitive fatigue, while EMG-based control induces muscular fatigue over time. Our system dynamically alternates between EEG and EMG inputs, using EEG to detect SSVEP signals at 9.75 Hz and 14.25 Hz and EMG from cheek and neck muscles to optimize control based on task demands. In a virtual turtle navigation task, the hybrid system achieved task completion times comparable to an EMG-only approach, while 90% of users reported reduced or equal physical demand. These findings demonstrate that multimodal BMI systems can enhance usability, reduce strain, and improve long-term adherence in assistive technologies. - - -2025-02-13 -Revisiting Euclidean Alignment for Transfer Learning in EEG-Based Brain-Computer Interfaces -Dongrui Wu -Link -Due to the non-stationarity and large individual differences of EEG signals, EEG-based brain-computer interfaces (BCIs) usually need subject-specific calibration to tailor the decoding algorithm for each new subject, which is time-consuming and user-unfriendly, hindering their real-world applications. Transfer learning (TL) has been extensively used to expedite the calibration, by making use of EEG data from other subjects/sessions. An important consideration in TL for EEG-based BCIs is to reduce the data distribution discrepancies among different subjects/session, to avoid negative transfer. 
Euclidean alignment (EA) was proposed in 2020 to address this challenge. Numerous experiments from 10 different BCI paradigms demonstrated its effectiveness and efficiency. This paper revisits the EA, explaining its procedure and correct usage, introducing its applications and extensions, and pointing out potential new research directions. It should be very helpful to BCI researchers, especially those who are working on EEG signal decoding. - - -2025-02-12 -Deep EEG Super-Resolution: Upsampling EEG Spatial Resolution with Generative Adversarial Networks -Isaac Corley, Yufei Huang -Link -Electroencephalography (EEG) activity contains a wealth of information about what is happening within the human brain. Recording more of this data has the potential to unlock endless future applications. However, the cost of EEG hardware is increasingly expensive based upon the number of EEG channels being recorded simultaneously. We combat this problem in this paper by proposing a novel deep EEG super-resolution (SR) approach based on Generative Adversarial Networks (GANs). This approach can produce high spatial resolution EEG data from low resolution samples, by generating channel-wise upsampled data to effectively interpolate numerous missing channels, thus reducing the need for expensive EEG equipment. We tested the performance using an EEG dataset from a mental imagery task. Our proposed GAN model provided 10^4 fold and 10^2 fold reduction in mean-squared error (MSE) and mean-absolute error (MAE), respectively, over the baseline bicubic interpolation method. We further validate our method by training a classifier on the original classification task, which displayed minimal loss in accuracy while using the super-resolved data. The proposed SR EEG by GAN is a promising approach to improve the spatial resolution of low density EEG headsets. 
- @@ -272,10 +272,24 @@ -2025-02-20 +2025-02-21 +Causal Modeling of fMRI Time-series for Interpretable Autism Spectrum Disorder Classification +Peiyu Duan, Nicha C. Dvornek, Jiyao Wang, Lawrence H. Staib, James S. Duncan +Link +Autism spectrum disorder (ASD) is a neurological and developmental disorder that affects social and communicative behaviors. It emerges in early life and is generally associated with lifelong disabilities. Thus, accurate and early diagnosis could facilitate treatment outcomes for those with ASD. Functional magnetic resonance imaging (fMRI) is a useful tool that measures changes in brain signaling to facilitate our understanding of ASD. Much effort is being made to identify ASD biomarkers using various connectome-based machine learning and deep learning classifiers. However, correlation-based models cannot capture the non-linear interactions between brain regions. To solve this problem, we introduce a causality-inspired deep learning model that uses time-series information from fMRI and captures causality among ROIs useful for ASD classification. The model is compared with other baseline and state-of-the-art models with 5-fold cross-validation on the ABIDE dataset. We filtered the dataset by choosing all the images with mean FD less than 15mm to ensure data quality. Our proposed model achieved the highest average classification accuracy of 71.9% and an average AUC of 75.8%. Moreover, the inter-ROI causality interpretation of the model suggests that the left precuneus, right precuneus, and cerebellum are placed in the top 10 ROIs in inter-ROI causality among the ASD population. In contrast, these ROIs are not ranked in the top 10 in the control population. We have validated our findings with the literature and found that abnormalities in these ROIs are often associated with ASD. 
+ + +2025-02-21 +BP-GPT: Auditory Neural Decoding Using fMRI-prompted LLM +Xiaoyu Chen, Changde Du, Che Liu, Yizhe Wang, Huiguang He +Link +Decoding language information from brain signals represents a vital research area within brain-computer interfaces, particularly in the context of deciphering the semantic information from the fMRI signal. Although existing work uses LLM to achieve this goal, their method does not use an end-to-end approach and avoids the LLM in the mapping of fMRI-to-text, leaving space for the exploration of the LLM in auditory decoding. In this paper, we introduce a novel method, the Brain Prompt GPT (BP-GPT). By using the brain representation that is extracted from the fMRI as a prompt, our method can utilize GPT-2 to decode fMRI signals into stimulus text. Further, we introduce the text prompt and align the fMRI prompt to it. By introducing the text prompt, our BP-GPT can extract a more robust brain prompt and promote the decoding of pre-trained LLM. We evaluate our BP-GPT on the open-source auditory semantic decoding dataset and achieve a significant improvement up to 4.61 on METEOR and 2.43 on BERTScore across all the subjects compared to the state-of-the-art method. The experimental results demonstrate that using brain representation as a prompt to further drive LLM for auditory neural decoding is feasible and effective. The code is available at https://github.com/1994cxy/BP-GPT. + + +2025-02-21 Explanations of Deep Language Models Explain Language Representations in the Brain Maryam Rahimi, Yadollah Yaghoobzadeh, Mohammad Reza Daliri -Link +Link Recent advances in artificial intelligence have given rise to large language models (LLMs) that not only achieve human-like performance but also share computational principles with the brain's language processing mechanisms. 
While previous research has primarily focused on aligning LLMs' internal representations with neural activity, we introduce a novel approach that leverages explainable AI (XAI) methods to forge deeper connections between the two domains. Using attribution methods, we quantified how preceding words contribute to an LLM's next-word predictions and employed these explanations to predict fMRI recordings from participants listening to the same narratives. Our findings demonstrate that attribution methods robustly predict brain activity across the language network, surpassing traditional internal representations in early language areas. This alignment is hierarchical: early-layer explanations correspond to the initial stages of language processing in the brain, while later layers align with more advanced stages. Moreover, the layers more influential on LLM next-word prediction (those with higher attribution scores) exhibited stronger alignment with neural activity. This work establishes a bidirectional bridge between AI and neuroscience. First, we demonstrate that attribution methods offer a powerful lens for investigating the neural mechanisms of language comprehension, revealing how meaning emerges from preceding context. Second, we propose using brain alignment as a metric to evaluate the validity of attribution methods, providing a framework for assessing their biological plausibility. @@ -327,20 +341,6 @@ Link In the medical field, most resting-state fMRI (rs-fMRI) data are collected from multiple hospital sites. Multi-site rs-fMRI data can increase the volume of training data, enabling auxiliary diagnostic algorithms for brain diseases to learn more accurate and stable models. However, due to the significant heterogeneity and domain shift in rs-fMRI data across different sites, the accuracy of auxiliary diagnosis remains unsatisfactory. 
Moreover, there has been limited exploration of multi-source domain adaptation algorithms, and the interpretability of models is often poor. To address these challenges, we proposed a domain-adaptive algorithm based on hyperbolic space embedding. Hyperbolic space is naturally suited for representing the topology of complex networks such as brain functional networks. Therefore, we embedded the brain functional network into hyperbolic space and constructed the corresponding hyperbolic space community network to effectively extract brain network representations. To address the heterogeneity of data across different sites and the issue of domain shift, we introduce a constraint loss function, HMMD (Hyperbolic Maximum Mean Discrepancy), to align the marginal distributions in the hyperbolic space. Additionally, we employ class prototype alignment to align the conditional distributions. This significantly improves the quality of brain representations and enhances diagnostic classification accuracy for Autism Spectrum Disorder (ASD). Experimental results demonstrated that the proposed algorithm is robust to multi-site heterogeneity and shows promising potential for brain network mechanism analysis. - -2025-02-07 -MindAligner: Explicit Brain Functional Alignment for Cross-Subject Visual Decoding from Limited fMRI Data -Yuqin Dai, Zhouheng Yao, Chunfeng Song, Qihao Zheng, Weijian Mai, Kunyu Peng, Shuai Lu, Wanli Ouyang, Jian Yang, Jiamin Wu -Link -Brain decoding aims to reconstruct visual perception of human subject from fMRI signals, which is crucial for understanding brain's perception mechanisms. Existing methods are confined to the single-subject paradigm due to substantial brain variability, which leads to weak generalization across individuals and incurs high training costs, exacerbated by limited availability of fMRI data. 
To address these challenges, we propose MindAligner, an explicit functional alignment framework for cross-subject brain decoding from limited fMRI data. The proposed MindAligner enjoys several merits. First, we learn a Brain Transfer Matrix (BTM) that projects the brain signals of an arbitrary new subject to one of the known subjects, enabling seamless use of pre-trained decoding models. Second, to facilitate reliable BTM learning, a Brain Functional Alignment module is proposed to perform soft cross-subject brain alignment under different visual stimuli with a multi-level brain alignment loss, uncovering fine-grained functional correspondences with high interpretability. Experiments indicate that MindAligner not only outperforms existing methods in visual decoding under data-limited conditions, but also provides valuable neuroscience insights in cross-subject functional analysis. The code will be made publicly available. - - -2025-02-07 -A Foundational Brain Dynamics Model via Stochastic Optimal Control -Joonhyeong Park, Byoungwoo Park, Chang-Bae Bang, Jungwon Choi, Hyungjin Chung, Byung-Hoon Kim, Juho Lee -Link -We introduce a foundational model for brain dynamics that utilizes stochastic optimal control (SOC) and amortized inference. Our method features a continuous-discrete state space model (SSM) that can robustly handle the intricate and noisy nature of fMRI signals. To address computational limitations, we implement an approximation strategy grounded in the SOC framework. Additionally, we present a simulation-free latent dynamics approach that employs locally linear approximations, facilitating efficient and scalable inference. For effective representation learning, we derive an Evidence Lower Bound (ELBO) from the SOC formulation, which integrates smoothly with recent advancements in self-supervised learning (SSL), thereby promoting robust and transferable representations. 
Pre-trained on extensive datasets such as the UKB, our model attains state-of-the-art results across a variety of downstream tasks, including demographic prediction, trait analysis, disease diagnosis, and prognosis. Moreover, evaluating on external datasets such as HCP-A, ABIDE, and ADHD200 further validates its superior abilities and resilience across different demographic and clinical distributions. Our foundational model provides a scalable and efficient approach for deciphering brain dynamics, opening up numerous applications in neuroscience. - @@ -530,74 +530,74 @@ -2025-02-20 -FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis -Fadillah Maani, Numan Saeed, Tausifa Saleem, Zaid Farooq, Hussain Alasmawi, Werner Diehl, Ameera Mohammad, Gareth Waring, Saudabi Valappi, Leanne Bricker, Mohammad Yaqub -Link -Foundation models are becoming increasingly effective in the medical domain, offering pre-trained models on large datasets that can be readily adapted for downstream tasks. Despite progress, fetal ultrasound images remain a challenging domain for foundation models due to their inherent complexity, often requiring substantial additional training and facing limitations due to the scarcity of paired multimodal data. To overcome these challenges, here we introduce FetalCLIP, a vision-language foundation model capable of generating universal representation of fetal ultrasound images. FetalCLIP was pre-trained using a multimodal learning approach on a diverse dataset of 210,035 fetal ultrasound images paired with text. This represents the largest paired dataset of its kind used for foundation model development to date. This unique training approach allows FetalCLIP to effectively learn the intricate anatomical features present in fetal ultrasound images, resulting in robust representations that can be used for a variety of downstream applications. 
In extensive benchmarking across a range of key fetal ultrasound applications, including classification, gestational age estimation, congenital heart defect (CHD) detection, and fetal structure segmentation, FetalCLIP outperformed all baselines while demonstrating remarkable generalizability and strong performance even with limited labeled data. We plan to release the FetalCLIP model publicly for the benefit of the broader scientific community. +2025-02-21 +Modeling Infectious Diseases: From SIR Models to Diffusion-Based Approaches and Numerical Solutions +Ayesha Baig, Li Zhouxin +Link +As global living standards improve and medical technology advances, many infectious diseases have been effectively controlled. However, certain diseases, such as the recent COVID-19 pandemic, continue to pose significant threats to public health. This paper explores the evolution of infectious disease modeling, from early ordinary differential equation-based models like the SIR framework to more complex reaction-diffusion models that incorporate both temporal and spatial dynamics. The study highlights the importance of numerical methods, such as the Runge-Kutta method, implicit-explicit time-discretization techniques, and finite difference methods, in solving these models. By analyzing the development and application of these methods, this research underscores their critical role in predicting disease spread, informing public health strategies, and mitigating the impact of future pandemics. -2025-02-20 -Step-by-Step Fact Verification System for Medical Claims with Explainable Reasoning -Juraj Vladika, Ivana Hacajová, Florian Matthes -Link -Fact verification (FV) aims to assess the veracity of a claim based on relevant evidence. The traditional approach for automated FV includes a three-part pipeline relying on short evidence snippets and encoder-only inference models. 
More recent approaches leverage the multi-turn nature of LLMs to address FV as a step-by-step problem where questions inquiring additional context are generated and answered until there is enough information to make a decision. This iterative method makes the verification process rational and explainable. While these methods have been tested for encyclopedic claims, exploration on domain-specific and realistic claims is missing. In this work, we apply an iterative FV system on three medical fact-checking datasets and evaluate it with multiple settings, including different LLMs, external web search, and structured reasoning using logic predicates. We demonstrate improvements in the final performance over traditional approaches and the high potential of step-by-step FV systems for domain-specific claims. +2025-02-21 +MHQA: A Diverse, Knowledge Intensive Mental Health Question Answering Challenge for Language Models +Suraj Racha, Prashant Joshi, Anshika Raman, Nikita Jangid, Mridul Sharma, Ganesh Ramakrishnan, Nirmal Punjabi +Link +Mental health remains a challenging problem all over the world, with issues like depression, anxiety becoming increasingly common. Large Language Models (LLMs) have seen a vast application in healthcare, specifically in answering medical questions. However, there is a lack of standard benchmarking datasets for question answering (QA) in mental health. Our work presents a novel multiple choice dataset, MHQA (Mental Health Question Answering), for benchmarking Language models (LMs). Previous mental health datasets have focused primarily on text classification into specific labels or disorders. MHQA, on the other hand, presents question-answering for mental health focused on four key domains: anxiety, depression, trauma, and obsessive/compulsive issues, with diverse question types, namely, factoid, diagnostic, prognostic, and preventive. We use PubMed abstracts as the primary source for QA. 
We develop a rigorous pipeline for LLM-based identification of information from abstracts based on various selection criteria and converting it into QA pairs. Further, valid QA pairs are extracted based on post-hoc validation criteria. Overall, our MHQA dataset consists of 2,475 expert-verified gold standard instances called MHQA-gold and ~56.1k pairs pseudo labeled using external medical references. We report F1 scores on different LLMs along with few-shot and supervised fine-tuning experiments, further discussing the insights for the scores. -2025-02-20 -MedVAE: Efficient Automated Interpretation of Medical Images with Large-Scale Generalizable Autoencoders -Maya Varma, Ashwin Kumar, Rogier van der Sluijs, Sophie Ostmeier, Louis Blankemeier, Pierre Chambon, Christian Bluethgen, Jip Prince, Curtis Langlotz, Akshay Chaudhari -Link -Medical images are acquired at high resolutions with large fields of view in order to capture fine-grained features necessary for clinical decision-making. Consequently, training deep learning models on medical images can incur large computational costs. In this work, we address the challenge of downsizing medical images in order to improve downstream computational efficiency while preserving clinically-relevant features. We introduce MedVAE, a family of six large-scale 2D and 3D autoencoders capable of encoding medical images as downsized latent representations and decoding latent representations back to high-resolution images. We train MedVAE autoencoders using a novel two-stage training approach with 1,052,730 medical images. 
Across diverse tasks obtained from 20 medical image datasets, we demonstrate that (1) utilizing MedVAE latent representations in place of high-resolution images when training downstream models can lead to efficiency benefits (up to 70x improvement in throughput) while simultaneously preserving clinically-relevant features and (2) MedVAE can decode latent representations back to high-resolution images with high fidelity. Our work demonstrates that large-scale, generalizable autoencoders can help address critical efficiency challenges in the medical domain. Our code is available at https://github.com/StanfordMIMI/MedVAE. +2025-02-21 +A biomechanical comparison of concussion and head acceleration events in elite-level American football and rugby union +Gregory Tierney +Link +Elite-level American football and rugby union are two high-contact sports with growing clinical and legal concerns over player safety, necessitating a comparative analysis. A biomechanical comparison of concussion and head acceleration events (HAEs) in elite-level American football and rugby union was undertaken. Rugby union players have a greater number of professional playing years and matches available in a season than their American football counterparts. Rugby union players have a greater number of concussions reported per match and a higher proportion of concussions occurring during training sessions, based on National Football League (NFL) and Rugby Football Union (RFU) injury reports. Preliminary findings indicate that rugby union forwards experience a higher incidence of HAEs per player match over lower and higher magnitude thresholds, than American football defensive players. Overall, elite-level rugby union appears less favourable than American football in almost all metrics pertinent to concussion and HAE exposure in the biomechanical comparison undertaken. 
The findings highlight the critical importance of independence, scientific rigour, and transparency in future concussion and HAE biomechanics research and real-world implementation, ensuring the development of more effective mitigation strategies. -2025-02-20 -Instrumented mouthguards in elite sports: Validity and head acceleration event (HAE) incidence in NCAA American Football -Mario Rotundo, Nicholas Murray, David Allan, Gregory Tierney -Link -Objectives: To determine the on-field validity of instrumented mouthguards (iMGs) in American football and to quantify head acceleration event (HAE) incidence in NCAA football players. Methods: Instrumented mouthguards were fitted to 35 male NCAA football players. Head kinematic data were collected during 64 player matches. On-field validity was determined through video review with positive predictive value (PPV) and sensitivity values calculated. HAE incidence was calculated as the number of HAEs per player match and stratified by Offense and Defense positions. Results: On-field validity of the Prevent Biometrics iMG in NCAA American Football indicates a sensitivity of 0.89 and PPV ranging from 0.76-0.98 based on false positive definitions. The incidence of PLA and PAA HAEs above a range of thresholds in Defense and Offense appears similar. The incidence of HAEs above 10 g was 11.2 and 11.3 HAEs per player match for Defense and Offense, respectively, while PAA incidence above 1.0 krad/s2 was 5.5 and 6.9 HAEs per player match for Defense and Offense, respectively. Incidence of HAEs above 30 g was 1.6 and 2.6 per player match and 0.9 and 1.4 for HAEs above 2.0 krad/s2 for Defense and Offense, respectively. Conclusion: The Prevent Biometrics iMG appears suitable for measuring HAEs in elite American football. The study provides a benchmark assessment of HAE incidence in elite American Football and lays a foundation for the development of position-specific interventions aimed at reducing HAE exposure. 
+2025-02-21 +Drug-Target Interaction/Affinity Prediction: Deep Learning Models and Advances Review +Ali Vefghi, Zahed Rahmati, Mohammad Akbari +Link +Drug discovery remains a slow and expensive process that involves many steps, from detecting the target structure to obtaining approval from the Food and Drug Administration (FDA), and is often riddled with safety concerns. Accurate prediction of how drugs interact with their targets and the development of new drugs by using better methods and technologies have immense potential to speed up this process, ultimately leading to faster delivery of life-saving medications. Traditional methods used for drug-target interaction prediction show limitations, particularly in capturing complex relationships between drugs and their targets. As an outcome, deep learning models have been presented to overcome the challenges of interaction prediction through their precise and efficient end results. By outlining promising research avenues and models, each with a different solution but similar to the problem, this paper aims to give researchers a better idea of methods for even more accurate and efficient prediction of drug-target interaction, ultimately accelerating the development of more effective drugs. A total of 180 prediction methods for drug-target interactions were analyzed throughout the period spanning 2016 to 2025 using different frameworks based on machine learning, mainly deep learning and graph neural networks. Additionally, this paper discusses the novelty, architecture, and input representation of these models. -2025-02-20 -ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation -Angxiao Yue, Zichong Wang, Hongteng Xu -Link -Protein backbone generation plays a central role in de novo protein design and is significant for many biological and medical applications. 
Although diffusion and flow-based generative models provide potential solutions to this challenging task, they often generate proteins with undesired designability and suffer computational inefficiency. In this study, we propose a novel rectified quaternion flow (ReQFlow) matching method for fast and high-quality protein backbone generation. In particular, our method generates a local translation and a 3D rotation from random noise for each residue in a protein chain, which represents each 3D rotation as a unit quaternion and constructs its flow by spherical linear interpolation (SLERP) in an exponential format. We train the model by quaternion flow (QFlow) matching with guaranteed numerical stability and rectify the QFlow model to accelerate its inference and improve the designability of generated protein backbones, leading to the proposed ReQFlow model. Experiments show that ReQFlow achieves state-of-the-art performance in protein backbone generation while requiring much fewer sampling steps and significantly less inference time (e.g., being 37x faster than RFDiffusion and 62x faster than Genie2 when generating a backbone of length 300), demonstrating its effectiveness and efficiency. The code is available at https://github.com/AngxiaoYue/ReQFlow. +2025-02-21 +Lung-DDPM: Semantic Layout-guided Diffusion Models for Thoracic CT Image Synthesis +Yifan Jiang, Yannick Lemaréchal, Josée Bafaro, Jessica Abi-Rjeile, Philippe Joubert, Philippe Després, Venkata Manem +Link +With the rapid development of artificial intelligence (AI), AI-assisted medical imaging analysis demonstrates remarkable performance in early lung cancer screening. However, the costly annotation process and privacy concerns limit the construction of large-scale medical datasets, hampering the further application of AI in healthcare. 
To address the data scarcity in lung cancer screening, we propose Lung-DDPM, a thoracic CT image synthesis approach that effectively generates high-fidelity 3D synthetic CT images, which prove helpful in downstream lung nodule segmentation tasks. Our method is based on semantic layout-guided denoising diffusion probabilistic models (DDPM), enabling anatomically reasonable, seamless, and consistent sample generation even from incomplete semantic layouts. Our results suggest that the proposed method outperforms other state-of-the-art (SOTA) generative models in image quality evaluation and downstream lung nodule segmentation tasks. Specifically, Lung-DDPM achieved superior performance on our large validation cohort, with a Fréchet inception distance (FID) of 0.0047, maximum mean discrepancy (MMD) of 0.0070, and mean squared error (MSE) of 0.0024. These results were 7.4×, 3.1×, and 29.5× better than the second-best competitors, respectively. Furthermore, the lung nodule segmentation model, trained on a dataset combining real and Lung-DDPM-generated synthetic samples, attained a dice coefficient (Dice) of 0.3914 and sensitivity of 0.4393. This represents 8.8% and 18.6% improvements in DICE and sensitivity compared to the model trained solely on real samples. The experimental results highlight Lung-DDPM's potential for a broader range of medical imaging applications, such as general tumor segmentation, cancer survival estimation, and risk prediction. -2025-02-20 -FIND: Fine-grained Information Density Guided Adaptive Retrieval-Augmented Generation for Disease Diagnosis -Mingyi Jia, Junwen Duan, Yan Song, Jianxin Wang -Link -Retrieval-Augmented Large Language Models (LLMs), which integrate external knowledge into LLMs, have shown remarkable performance in various medical domains, including clinical diagnosis. 
However, existing RAG methods struggle to effectively assess task difficulty to make retrieval decisions, thereby failing to meet the clinical requirements for balancing efficiency and accuracy. So in this paper, we propose FIND (\textbf{F}ine-grained \textbf{In}formation \textbf{D}ensity Guided Adaptive RAG), a novel framework that improves the reliability of RAG in disease diagnosis scenarios. FIND incorporates a fine-grained adaptive control module to determine whether retrieval is necessary based on the information density of the input. By optimizing the retrieval process and implementing a knowledge filtering module, FIND ensures that the retrieval is better suited to clinical scenarios. Experiments on three Chinese electronic medical record datasets demonstrate that FIND significantly outperforms various baseline methods, highlighting its effectiveness in clinical diagnosis tasks. +2025-02-21 +Image Translation-Based Unsupervised Cross-Modality Domain Adaptation for Medical Image Segmentation +Tao Yang, Lisheng Wang +Link +Supervised deep learning usually faces more challenges in medical images than in natural images, since annotations in medical images require the expertise of doctors and are more time-consuming and expensive. Thus, some researchers turn to unsupervised learning methods, which usually face inevitable performance drops. In addition, medical images may have been acquired at different medical centers with different scanners and under different image acquisition protocols, so the modalities of the medical images are often inconsistent. This modality difference (domain shift) also reduces the applicability of deep learning methods. In this regard, we propose an unsupervised cross-modality domain adaptation method based on image translation by transforming the source modality image with annotation into the unannotated target modality and using its annotation to achieve supervised learning of the target modality. 
In addition, the subtle differences between translated pseudo images and real images are overcome by self-training methods to further improve the task performance of deep learning. The proposed method showed mean Dice Similarity Coefficient (DSC) and Average Symmetric Surface Distance (ASSD) of $0.8351 \pm 0.1152$ and $1.6712 \pm 2.1948$ for vestibular schwannoma (VS), $0.8098 \pm 0.0233$ and $0.2317 \pm 0.1577$ for cochlea on the VS and cochlea segmentation task of the Cross-Modality Domain Adaptation (crossMoDA 2022) challenge validation phase leaderboard. -2025-02-20 -Enhancing nuclear cross-section predictions with deep learning: the DINo algorithm -Levana Gesson, Greg Henning, Jonathan Collin, Marie Vanstalle -Link -Accurate modeling of nuclear reaction cross-sections is crucial for applications such as hadron therapy, radiation protection, and nuclear reactor design. Despite continuous advancements in nuclear physics, significant discrepancies persist between experimental data and theoretical models such as TENDL, and ENDF/B. These deviations introduce uncertainties in Monte Carlo simulations widely used in nuclear physics and medical applications. In this work, DINo (Deep learning Intelligence for Nuclear reactiOns) is introduced as a deep learning-based algorithm designed to improve cross-section predictions by learning correlations between charge-changing and total cross-sections. Trained on the TENDL-2021 dataset and validated against experimental data from the EXFOR database, DINo demonstrates a significant improvement in predictive accuracy over conventional nuclear models. The results show that DINo systematically achieves lower chi2 values compared to TENDL-2021 across multiple isotopes, particularly for proton-induced reactions on a 12C target. Specifically, for 11C production, DINo reduces the discrepancy with experimental data by \sim 28\% compared to TENDL-2021. 
Additionally, DINo provides improved predictions for other relevant isotopes produced, such as 4He, 6Li, 9Be, and 10B, which play a crucial role in modeling nuclear fragmentation processes. By leveraging neural networks, DINo offers fast cross-section predictions, making it a promising complementary tool for nuclear reaction modeling. However, the algorithm's performance evaluation is sensitive to the availability of experimental data, with increased uncertainty in sparsely measured energy ranges. Future work will focus on refining the model through data augmentation, expanding its applicability to other reaction channels, and integrating it into Monte Carlo transport codes for real-time nuclear data processing. +2025-02-21 +Key Body Posture Characteristics of Short-distance Speed Skaters at the Start Based on Artificial Intelligence +Zhang Xueliana, Fang Yingjieb, Liu Hang +Link +Objective To conduct biomechanical analysis on the starting technique of male short-distance speed skating athletes in China and determine the key factors affecting the effectiveness of the starting movement. Methods 13 high-level male short-distance speed skating athletes were selected as the test subjects, and kinematic data were collected using an artificial intelligence video capture and analysis system. The body posture features and their effects on the starting movement performance were analyzed in the three stages of starting preparation, starting, and sprinting. Results The post-stability angle, anterior knee angle of the front leg, posterior knee angle of the rear leg, and stride length showed moderate to high positive correlations with the starting speed during the starting preparation stage. The trunk angle showed a high negative correlation with the starting speed. 
The trunk angle (TO4, TD4, TO6, TD6), hip angle (TO1, TO4, TO6), and knee angle (TD1) showed moderate to high negative correlations with the effectiveness of the starting movement during the starting and sprinting stages. The knee angle (TD2), ice-contact angle (TD2, TD4, TD5, TD6), and propulsion angle (TO1, TO4, TO7) showed moderate positive correlations with the effectiveness of the starting movement. Conclusion Stride length, left knee angle, and post-stability angle are the key factors affecting the starting speed. The larger the post-stability angle and left knee angle and the longer the stride length, the faster the starting speed. During the starting and sprinting stages, the smaller the ice-contact angle and propulsion angle, the greater the trunk angle and hip angle changes, the more effective the starting movement. -2025-02-20 -Vision Foundation Models in Medical Image Analysis: Advances and Challenges -Pengchen Liang, Bin Pu, Haishan Huang, Yiwei Li, Hualiang Wang, Weibo Ma, Qing Chang -Link -The rapid development of Vision Foundation Models (VFMs), particularly Vision Transformers (ViT) and Segment Anything Model (SAM), has sparked significant advances in the field of medical image analysis. These models have demonstrated exceptional capabilities in capturing long-range dependencies and achieving high generalization in segmentation tasks. However, adapting these large models to medical image analysis presents several challenges, including domain differences between medical and natural images, the need for efficient model adaptation strategies, and the limitations of small-scale medical datasets. This paper reviews the state-of-the-art research on the adaptation of VFMs to medical image segmentation, focusing on the challenges of domain adaptation, model compression, and federated learning. 
We discuss the latest developments in adapter-based improvements, knowledge distillation techniques, and multi-scale contextual feature modeling, and propose future directions to overcome these bottlenecks. Our analysis highlights the potential of VFMs, along with emerging methodologies such as federated learning and model compression, to revolutionize medical image analysis and enhance clinical applications. The goal of this work is to provide a comprehensive overview of current approaches and suggest key areas for future research that can drive the next wave of innovation in medical image segmentation. +2025-02-21 +Nonlinear Dynamical Systems for Automatic Face Annotation in Head Tracking and Pose Estimation +Thoa Thieu, Roderick Melnik +Link +Facial landmark tracking plays a vital role in applications such as facial recognition, expression analysis, and medical diagnostics. In this paper, we consider the performance of the Extended Kalman Filter (EKF) and Unscented Kalman Filter (UKF) in tracking 3D facial motion in both deterministic and stochastic settings. We first analyze a noise-free environment where the state transition is purely deterministic, demonstrating that UKF outperforms EKF by achieving lower mean squared error (MSE) due to its ability to capture higher-order nonlinearities. However, when stochastic noise is introduced, EKF exhibits superior robustness, maintaining lower mean square error (MSE) compared to UKF, which becomes more sensitive to measurement noise and occlusions. Our results highlight that UKF is preferable for high-precision applications in controlled environments, whereas EKF is better suited for real-world scenarios with unpredictable noise. These findings provide practical insights for selecting the appropriate filtering technique in 3D facial tracking applications, such as motion capture and facial recognition. 
2025-02-20 -Poststroke rehabilitative mechanisms in individualized fatigue level-controlled treadmill training -- a Rat Model Study -Yuchen Xu, Yulong Peng, Yuanfa Yao, Xiaoman Fan, Minmin Wang, Feng Gao, Mohamad Sawan, Shaomin Zhang, Xiaoling Hu -Link -Individualized training improved post-stroke motor function rehabilitation efficiency. However, the mechanisms of how individualized training facilitates recovery is not clear. This study explored the cortical and corticomuscular rehabilitative effects in post-stroke motor function recovery during individualized training. Sprague-Dawley rats with intracerebral hemorrhage (ICH) were randomly distributed into two groups: forced training (FOR-T, n=13) and individualized fatigue-controlled training (FAT-C, n=13) to receive training respectively from day 2 to day 14 post-stroke. The FAT-C group exhibited superior motor function recovery and less central fatigue compared to the FOR-T group. EEG PSD slope analysis demonstrated a better inter-hemispheric balance in FAT-C group compare to the FOR-T group. The dCMC analysis indicated that training-induced fatigue led to a short-term down-regulation of descending corticomuscular coherence (dCMC) and an up-regulation of ascending dCMC. In the long term, excessive fatigue hindered the recovery of descending control in the affected hemisphere. The individualized strategy of peripheral fatigue-controlled training achieved better motor function recovery, which could be attributed to the mitigation of central fatigue, optimization of inter-hemispheric balance and enhancement of descending control in the affected hemisphere. +Pseudoinverse Diffusion Models for Generative CT Image Reconstruction from Low Dose Data +Matthew Tivnan, Dufan Wu, Quanzheng Li +Link +Score-based diffusion models have significantly advanced generative deep learning for image processing. Measurement conditioned models have also been applied to inverse problems such as CT reconstruction. 
However, the conventional approach, culminating in white noise, often requires a high number of reverse process update steps and score function evaluations. To address this limitation, we propose an alternative forward process in score-based diffusion models that aligns with the noise characteristics of low-dose CT reconstructions, rather than converging to white noise. This method significantly reduces the number of required score function evaluations, enhancing efficiency and maintaining familiar noise textures for radiologists. Our approach not only accelerates the generative process but also retains CT noise correlations, a key aspect often criticized by clinicians for deep learning reconstructions. In this work, we rigorously define a matrix-controlled stochastic process for this purpose and validate it through computational experiments. Using a dataset from The Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC), we simulate low-dose CT measurements and train our model, comparing it with a baseline scalar diffusion process and conditional diffusion model. Our results demonstrate the superiority of our pseudoinverse diffusion model in terms of efficiency and the ability to produce high-quality reconstructions that are familiar in texture to medical professionals in a low number of score function evaluations. This advancement paves the way for more efficient and clinically practical diffusion models in medical imaging, particularly beneficial in scenarios demanding rapid reconstructions or lower radiation exposure. 2025-02-20 -Accelerated X-Ray Fluorescence Computed Tomography via Multi-Pencil-Beam Excitation -Ryder M. Schmidt, Daiki Hara, Jorge D. Vega, Marwan Abuhaija, Brett Bocian, Wendi Ma, Nesrin Dogan, Alan Pollack, Ge Wang, John C. 
Ford, Junwei Shi -Link -X-ray fluorescence computed tomography (XFCT), a form of X-ray molecular imaging, offers detailed quantitative imaging capabilities for high-Z metal nanoparticles (MNPs), which are widely studied for their applications in multifunctional theranostics. Due to its affordability and accessibility, the benchtop XFCT prototype typically employs a single-pixel detector (SPD) with single-pencil-beam (SPB) X-ray excitation. While this design (resembling the first-generation CT geometry) achieves reliable detection sensitivity, it is hindered by long imaging times. The use of simultaneous multiple-pencil-beam (MPB) excitation presents a promising solution to significantly reduce imaging times. In this study, we developed a repeatable workflow that combines Monte Carlo (MC) simulations and 3D printing to design Nbeam-MPB collimator, where Nbeam is the number of beams generated by the collimator. As an initial test, we fabricated a 2-MPB collimator and evaluated the performance of 2-MPB-based XFCT imaging on a physical phantom and small animals surgically implanted with agarose pellets containing gold chloride (H[AuCl4]). The results demonstrated a 2x acceleration in image acquisition without compromising the contrast-to-noise ratio (CNR). We further investigated the concept of Nbeam-MPB acceleration on the MC computational XFCT system, which confirmed the feasibility of achieving at least 4x acceleration with 4-MPB excitation. Combined with additional system optimization, such as X-ray beam flux optimization, XFCT imaging could be further accelerated, reducing acquisition time from hours to minutes and meeting the requirements for routine MNP imaging. +Multi-Source Static CT with Adaptive Fluence Modulation to Minimize Hallucinations in Generative Reconstructions +Matthew Tivnan, Amar Gupta, Kai Yang, Dufan Wu, Rajiv Gupta +Link +Multi-source static Computed Tomography (CT) systems have introduced novel opportunities for adaptive imaging techniques. 
This work presents an innovative method of fluence field modulation using spotlight collimators. These instruments block positive or negative fan angles of even and odd indexed sources, respectively. Spotlight collimators enable volume of interest imaging by increasing relative exposure for the overlapping views. To achieve high quality reconstructions from sparse-view low-dose data, we introduce a generative reconstruction algorithm called Langevin Posterior Sampling (LPS), which uses a score based diffusion prior and physics based likelihood model to sample a posterior random walk. We conduct simulation-based experiments of head CT imaging for stroke detection and we demonstrate that spotlight collimators can effectively reduce the standard deviation and worst-case scenario hallucinations in reconstructed images. Compared to uniform fluence, our approach shows a significant reduction in posterior standard deviation. This highlights the potential for spotlight collimators and generative reconstructions to improve image quality and diagnostic accuracy of multi-source static CT. diff --git a/data_store/papers_2025-02-25.json b/data_store/papers_2025-02-25.json new file mode 100644 index 0000000..980730e --- /dev/null +++ b/data_store/papers_2025-02-25.json @@ -0,0 +1,514 @@ +{ + "Brain": { + "2502.15595v1": { + "title": "Causal Modeling of fMRI Time-series for Interpretable Autism Spectrum Disorder Classification", + "url": "http://arxiv.org/abs/2502.15595v1", + "authors": "Peiyu Duan, Nicha C. Dvornek, Jiyao Wang, Lawrence H. Staib, James S. Duncan", + "update_time": "2025-02-21", + "abstract": "Autism spectrum disorder (ASD) is a neurological and developmental disorder that affects social and communicative behaviors. It emerges in early life and is generally associated with lifelong disabilities. Thus, accurate and early diagnosis could facilitate treatment outcomes for those with ASD. 
Functional magnetic resonance imaging (fMRI) is a useful tool that measures changes in brain signaling to facilitate our understanding of ASD. Much effort is being made to identify ASD biomarkers using various connectome-based machine learning and deep learning classifiers. However, correlation-based models cannot capture the non-linear interactions between brain regions. To solve this problem, we introduce a causality-inspired deep learning model that uses time-series information from fMRI and captures causality among ROIs useful for ASD classification. The model is compared with other baseline and state-of-the-art models with 5-fold cross-validation on the ABIDE dataset. We filtered the dataset by choosing all the images with mean FD less than 15mm to ensure data quality. Our proposed model achieved the highest average classification accuracy of 71.9% and an average AUC of 75.8%. Moreover, the inter-ROI causality interpretation of the model suggests that the left precuneus, right precuneus, and cerebellum are placed in the top 10 ROIs in inter-ROI causality among the ASD population. In contrast, these ROIs are not ranked in the top 10 in the control population. We have validated our findings with the literature and found that abnormalities in these ROIs are often associated with ASD." + }, + "2502.15503v1": { + "title": "BAN: Neuroanatomical Aligning in Auditory Recognition between Artificial Neural Network and Human Cortex", + "url": "http://arxiv.org/abs/2502.15503v1", + "authors": "Haidong Wang, Pengfei Xiao, Ao Liu, Jianhua Zhang, Qia Shan", + "update_time": "2025-02-21", + "abstract": "Drawing inspiration from neurosciences, artificial neural networks (ANNs) have evolved from shallow architectures to highly complex, deep structures, yielding exceptional performance in auditory recognition tasks. 
However, traditional ANNs often struggle to align with brain regions due to their excessive depth and lack of biologically realistic features, like recurrent connection. To address this, a brain-like auditory network (BAN) is introduced, which incorporates four neuroanatomically mapped areas and recurrent connection, guided by a novel metric called the brain-like auditory score (BAS). BAS serves as a benchmark for evaluating the similarity between BAN and human auditory recognition pathway. We further propose that specific areas in the cerebral cortex, mainly the middle and medial superior temporal (T2/T3) areas, correspond to the designed network structure, drawing parallels with the brain's auditory perception pathway. Our findings suggest that the neuroanatomical similarity in the cortex and auditory classification abilities of the ANN are well-aligned. In addition to delivering excellent performance on a music genre classification task, the BAN demonstrates a high BAS score. In conclusion, this study presents BAN as a recurrent, brain-inspired ANN, representing the first model that mirrors the cortical pathway of auditory recognition." + }, + "2502.15484v1": { + "title": "Confidence-Based Annotation Of Brain Tumours In Ultrasound", + "url": "http://arxiv.org/abs/2502.15484v1", + "authors": "Alistair Weld, Luke Dixon, Alfie Roddan, Giulio Anichini, Sophie Camp, Stamatia Giannarou", + "update_time": "2025-02-21", + "abstract": "Purpose: An investigation of the challenge of annotating discrete segmentations of brain tumours in ultrasound, with a focus on the issue of aleatoric uncertainty along the tumour margin, particularly for diffuse tumours. A segmentation protocol and method is proposed that incorporates this margin-related uncertainty while minimising the interobserver variance through reduced subjectivity, thereby diminishing annotator epistemic uncertainty. 
Approach: A sparse confidence method for annotation is proposed, based on a protocol designed using computer vision and radiology theory. Results: Output annotations using the proposed method are compared with the corresponding professional discrete annotation variance between the observers. A linear relationship was measured within the tumour margin region, with a Pearson correlation of 0.8. The downstream application was explored, comparing training using confidence annotations as soft labels with using the best discrete annotations as hard labels. In all evaluation folds, the Brier score was superior for the soft-label trained network. Conclusion: A formal framework was constructed to demonstrate the infeasibility of discrete annotation of brain tumours in B-mode ultrasound. Subsequently, a method for sparse confidence-based annotation is proposed and evaluated. Keywords: Brain tumours, ultrasound, confidence, annotation." + }, + "2502.15363v1": { + "title": "M2LADS Demo: A System for Generating Multimodal Learning Analytics Dashboards", + "url": "http://arxiv.org/abs/2502.15363v1", + "authors": "Alvaro Becerra, Roberto Daza, Ruth Cobos, Aythami Morales, Julian Fierrez", + "update_time": "2025-02-21", + "abstract": "We present a demonstration of a web-based system called M2LADS (\"System for Generating Multimodal Learning Analytics Dashboards\"), designed to integrate, synchronize, visualize, and analyze multimodal data recorded during computer-based learning sessions with biosensors. This system presents a range of biometric and behavioral data on web-based dashboards, providing detailed insights into various physiological and activity-based metrics. The multimodal data visualized include electroencephalogram (EEG) data for assessing attention and brain activity, heart rate metrics, eye-tracking data to measure visual attention, webcam video recordings, and activity logs of the monitored tasks. 
M2LADS aims to assist data scientists in two key ways: (1) by providing a comprehensive view of participants' experiences, displaying all data categorized by the activities in which participants are engaged, and (2) by synchronizing all biosignals and videos, facilitating easier data relabeling if any activity information contains errors." + }, + "2502.15230v1": { + "title": "Applications of wavelet transform in classification of local field potential recorded from the rat brain in conditioned place preference paradigm", + "url": "http://arxiv.org/abs/2502.15230v1", + "authors": "AmirAli Kalbasi, Mahdi Aliyari Shoorehdeli, Shole Jamali, Abbas Haghparast", + "update_time": "2025-02-21", + "abstract": "This study investigates the multi-label classification of Local Field Potential (LFP) data from the hippocampus (HIP) and nucleus accumbens (NAc) in the rat brain, focusing on reward responses using the Conditioned Place Preference (CPP) paradigm. Rats were conditioned with saline, morphine, and food rewards, and LFP recordings were conducted from both HIP and NAc during pre- and post-tests. The LFP data were classified into four categories: treatment types, test phases, recording channels, and chamber positions within the CPP setup. Features were extracted using Continuous Wavelet Transform (CWT), Wavelet Coherence, and Wavelet Scattering. Classification was performed via Decision Trees, Multilayer Perceptrons, and Support Vector Machines. Notably, in the Food group, HIP and combined HIP-NAc features yielded the highest classification accuracy for CPP chambers, whereas NAc features excelled in the Morphine group. Employing wavelet scattering, an 80% classification accuracy was achieved across treatment groups, test phases, and channels. Exceptionally high classification accuracies were observed for Food-post-test-HIP (99.75%) and Morphine-post-test-NAc (99.58%). 
The study reveals that NAc activity is pivotal for morphine-induced CPP, whereas HIP and HIP-NAc connectivity are crucial for food-induced CPP. The proposed methodology provides a novel avenue for precisely classifying LFP data, shedding light on neural circuit activities underlying behavioral responses." + }, + "2502.15198v1": { + "title": "Graph-Based Deep Learning on Stereo EEG for Predicting Seizure Freedom in Epilepsy Patients", + "url": "http://arxiv.org/abs/2502.15198v1", + "authors": "Artur Agaronyan, Syeda Abeera Amir, Nunthasiri Wittayanakorn, John Schreiber, Marius G. Linguraru, William Gaillard, Chima Oluigbo, Syed Muhammad Anwar", + "update_time": "2025-02-21", + "abstract": "Predicting seizure freedom is essential for tailoring epilepsy treatment. But accurate prediction remains challenging with traditional methods, especially with diverse patient populations. This study developed a deep learning-based graph neural network (GNN) model to predict seizure freedom from stereo electroencephalography (sEEG) data in patients with refractory epilepsy. We utilized high-quality sEEG data from 15 pediatric patients to train a deep learning model that can accurately predict seizure freedom outcomes and advance understanding of brain connectivity at the seizure onset zone. Our model integrates local and global connectivity using graph convolutions with multi-scale attention mechanisms to capture connections between difficult-to-study regions such as the thalamus and motor regions. The model achieved an accuracy of 92.4% in binary class analysis, 86.6% in patient-wise analysis, and 81.4% in multi-class analysis. Node and edge-level feature analysis highlighted the anterior cingulate and frontal pole regions as key contributors to seizure freedom outcomes. The nodes identified by our model were also more likely to coincide with seizure onset zones. 
Our findings underscore the potential of new connectivity-based deep learning models such as GNNs for enhancing the prediction of seizure freedom, predicting seizure onset zones, connectivity analysis of the brain during seizure, as well as informing AI-assisted personalized epilepsy treatment planning." + }, + "2502.15172v1": { + "title": "BP-GPT: Auditory Neural Decoding Using fMRI-prompted LLM", + "url": "http://arxiv.org/abs/2502.15172v1", + "authors": "Xiaoyu Chen, Changde Du, Che Liu, Yizhe Wang, Huiguang He", + "update_time": "2025-02-21", + "abstract": "Decoding language information from brain signals represents a vital research area within brain-computer interfaces, particularly in the context of deciphering the semantic information from the fMRI signal. Although existing work uses LLM to achieve this goal, their method does not use an end-to-end approach and avoids the LLM in the mapping of fMRI-to-text, leaving space for the exploration of the LLM in auditory decoding. In this paper, we introduce a novel method, the Brain Prompt GPT (BP-GPT). By using the brain representation that is extracted from the fMRI as a prompt, our method can utilize GPT-2 to decode fMRI signals into stimulus text. Further, we introduce the text prompt and align the fMRI prompt to it. By introducing the text prompt, our BP-GPT can extract a more robust brain prompt and promote the decoding of pre-trained LLM. We evaluate our BP-GPT on the open-source auditory semantic decoding dataset and achieve a significant improvement up to 4.61 on METEOR and 2.43 on BERTScore across all the subjects compared to the state-of-the-art method. The experimental results demonstrate that using brain representation as a prompt to further drive LLM for auditory neural decoding is feasible and effective. The code is available at https://github.com/1994cxy/BP-GPT." 
+ }, + "2502.15104v1": { + "title": "Estimating Neural Representation Alignment from Limited Inputs and Features", + "url": "http://arxiv.org/abs/2502.15104v1", + "authors": "Chanwoo Chun, Abdulkadir Canatar, SueYeon Chung, Daniel D. Lee", + "update_time": "2025-02-20", + "abstract": "In both artificial and biological systems, the centered kernel alignment (CKA) has become a widely used tool for quantifying neural representation similarity. While current CKA estimators typically correct for the effects of finite stimuli sampling, the effects of sampling a subset of neurons are overlooked, introducing notable bias in standard experimental scenarios. Here, we provide a theoretical analysis showing how this bias is affected by the representation geometry. We then introduce a novel estimator that corrects for both input and feature sampling. We use our method for evaluating both brain-to-brain and model-to-brain alignments and show that it delivers reliable comparisons even with very sparsely sampled neurons. We perform within-animal and across-animal comparisons on electrophysiological data from visual cortical areas V1, V4, and IT data, and use these as benchmarks to evaluate model-to-brain alignment. We also apply our method to reveal how object representations become progressively disentangled across layers in both biological and artificial systems. These findings underscore the importance of correcting feature-sampling biases in CKA and demonstrate that our bias-corrected estimator provides a more faithful measure of representation alignment. The improved estimates increase our understanding of how neural activity is structured across both biological and artificial systems." 
+ }, + "2502.15056v1": { + "title": "Fundamental Survey on Neuromorphic Based Audio Classification", + "url": "http://arxiv.org/abs/2502.15056v1", + "authors": "Amlan Basu, Pranav Chaudhari, Gaetano Di Caterina", + "update_time": "2025-02-20", + "abstract": "Audio classification is paramount in a variety of applications including surveillance, healthcare monitoring, and environmental analysis. Traditional methods frequently depend on intricate signal processing algorithms and manually crafted features, which may fall short in fully capturing the complexities of audio patterns. Neuromorphic computing, inspired by the architecture and functioning of the human brain, presents a promising alternative for audio classification tasks. This survey provides an exhaustive examination of the current state-of-the-art in neuromorphic-based audio classification. It delves into the crucial components of neuromorphic systems, such as Spiking Neural Networks (SNNs), memristors, and neuromorphic hardware platforms, highlighting their advantages in audio classification. Furthermore, the survey explores various methodologies and strategies employed in neuromorphic audio classification, including event-based processing, spike-based learning, and bio-inspired feature extraction. It examines how these approaches address the limitations of traditional audio classification methods, particularly in terms of energy efficiency, real-time processing, and robustness to environmental noise. Additionally, the paper conducts a comparative analysis of different neuromorphic audio classification models and benchmarks, evaluating their performance metrics, computational efficiency, and scalability. By providing a comprehensive guide for researchers, engineers and practitioners, this survey aims to stimulate further innovation and advancements in the evolving field of neuromorphic audio classification." 
+ }, + "2502.14731v1": { + "title": "Beyond Performance Scores: Directed Functional Connectivity as a Brain-Based Biomarker for Motor Skill Learning and Retention", + "url": "http://arxiv.org/abs/2502.14731v1", + "authors": "Anil Kamat, Rahul Rahul, Lora Cavuoto, Harry Burke, Matthew Hackett, Jack Norfleet, Steven Schwaitzberg, Suvranu De", + "update_time": "2025-02-20", + "abstract": "Motor skill acquisition in fields like surgery, robotics, and sports involves learning complex task sequences through extensive training. Traditional performance metrics, like execution time and error rates, offer limited insight as they fail to capture the neural mechanisms underlying skill learning and retention. This study introduces directed functional connectivity (dFC), derived from electroencephalography (EEG), as a novel brain-based biomarker for assessing motor skill learning and retention. For the first time, dFC is applied as a biomarker to map the stages of the Fitts and Posner motor learning model, offering new insights into the neural mechanisms underlying skill acquisition and retention. Unlike traditional measures, it captures both the strength and direction of neural information flow, providing a comprehensive understanding of neural adaptations across different learning stages. The analysis demonstrates that dFC can effectively identify and track the progression through various stages of the Fitts and Posner model. Furthermore, its stability over a six-week washout period highlights its utility in monitoring long-term retention. No significant changes in dFC were observed in a control group, confirming that the observed neural adaptations were specific to training and not due to external factors. 
By offering a granular view of the learning process at the group and individual levels, dFC facilitates the development of personalized, targeted training protocols aimed at enhancing outcomes in fields where precision and long-term retention are critical, such as surgical education. These findings underscore the value of dFC as a robust biomarker that complements traditional performance metrics, providing a deeper understanding of motor skill learning and retention." + } + }, + "EEG": { + "2502.15363v1": { + "title": "M2LADS Demo: A System for Generating Multimodal Learning Analytics Dashboards", + "url": "http://arxiv.org/abs/2502.15363v1", + "authors": "Alvaro Becerra, Roberto Daza, Ruth Cobos, Aythami Morales, Julian Fierrez", + "update_time": "2025-02-21", + "abstract": "We present a demonstration of a web-based system called M2LADS (\"System for Generating Multimodal Learning Analytics Dashboards\"), designed to integrate, synchronize, visualize, and analyze multimodal data recorded during computer-based learning sessions with biosensors. This system presents a range of biometric and behavioral data on web-based dashboards, providing detailed insights into various physiological and activity-based metrics. The multimodal data visualized include electroencephalogram (EEG) data for assessing attention and brain activity, heart rate metrics, eye-tracking data to measure visual attention, webcam video recordings, and activity logs of the monitored tasks. M2LADS aims to assist data scientists in two key ways: (1) by providing a comprehensive view of participants' experiences, displaying all data categorized by the activities in which participants are engaged, and (2) by synchronizing all biosignals and videos, facilitating easier data relabeling if any activity information contains errors." 
+ }, + "2502.15198v1": { + "title": "Graph-Based Deep Learning on Stereo EEG for Predicting Seizure Freedom in Epilepsy Patients", + "url": "http://arxiv.org/abs/2502.15198v1", + "authors": "Artur Agaronyan, Syeda Abeera Amir, Nunthasiri Wittayanakorn, John Schreiber, Marius G. Linguraru, William Gaillard, Chima Oluigbo, Syed Muhammad Anwar", + "update_time": "2025-02-21", + "abstract": "Predicting seizure freedom is essential for tailoring epilepsy treatment. But accurate prediction remains challenging with traditional methods, especially with diverse patient populations. This study developed a deep learning-based graph neural network (GNN) model to predict seizure freedom from stereo electroencephalography (sEEG) data in patients with refractory epilepsy. We utilized high-quality sEEG data from 15 pediatric patients to train a deep learning model that can accurately predict seizure freedom outcomes and advance understanding of brain connectivity at the seizure onset zone. Our model integrates local and global connectivity using graph convolutions with multi-scale attention mechanisms to capture connections between difficult-to-study regions such as the thalamus and motor regions. The model achieved an accuracy of 92.4% in binary class analysis, 86.6% in patient-wise analysis, and 81.4% in multi-class analysis. Node and edge-level feature analysis highlighted the anterior cingulate and frontal pole regions as key contributors to seizure freedom outcomes. The nodes identified by our model were also more likely to coincide with seizure onset zones. Our findings underscore the potential of new connectivity-based deep learning models such as GNNs for enhancing the prediction of seizure freedom, predicting seizure onset zones, connectivity analysis of the brain during seizure, as well as informing AI-assisted personalized epilepsy treatment planning." 
+ }, + "2502.15107v1": { + "title": "Assessing a Single Student's Concentration on Learning Platforms: A Machine Learning-Enhanced EEG-Based Framework", + "url": "http://arxiv.org/abs/2502.15107v1", + "authors": "Zewen Zhuo, Mohamad Najafi, Hazem Zein, Amine Nait-Ali", + "update_time": "2025-02-21", + "abstract": "This study introduces a specialized pipeline designed to classify the concentration state of an individual student during online learning sessions by training a custom-tailored machine learning model. Detailed protocols for acquiring and preprocessing EEG data are outlined, along with the extraction of fifty statistical features from five EEG signal bands: alpha, beta, theta, delta, and gamma. Following feature extraction, a thorough feature selection process was conducted to optimize the data inputs for a personalized analysis. The study also explores the benefits of hyperparameter fine-tuning to enhance the classification accuracy of the student's concentration state. EEG signals were captured from the student using a Muse headband (Gen 2), equipped with five electrodes (TP9, AF7, AF8, TP10, and a reference electrode NZ), during engagement with educational content on computer-based e-learning platforms. Employing a random forest model customized to the student's data, we achieved remarkable classification performance, with test accuracies of 97.6% in the computer-based learning setting and 98% in the virtual reality setting. These results underscore the effectiveness of our approach in delivering personalized insights into student concentration during online educational activities." 
+ }, + "2502.14731v1": { + "title": "Beyond Performance Scores: Directed Functional Connectivity as a Brain-Based Biomarker for Motor Skill Learning and Retention", + "url": "http://arxiv.org/abs/2502.14731v1", + "authors": "Anil Kamat, Rahul Rahul, Lora Cavuoto, Harry Burke, Matthew Hackett, Jack Norfleet, Steven Schwaitzberg, Suvranu De", + "update_time": "2025-02-20", + "abstract": "Motor skill acquisition in fields like surgery, robotics, and sports involves learning complex task sequences through extensive training. Traditional performance metrics, like execution time and error rates, offer limited insight as they fail to capture the neural mechanisms underlying skill learning and retention. This study introduces directed functional connectivity (dFC), derived from electroencephalography (EEG), as a novel brain-based biomarker for assessing motor skill learning and retention. For the first time, dFC is applied as a biomarker to map the stages of the Fitts and Posner motor learning model, offering new insights into the neural mechanisms underlying skill acquisition and retention. Unlike traditional measures, it captures both the strength and direction of neural information flow, providing a comprehensive understanding of neural adaptations across different learning stages. The analysis demonstrates that dFC can effectively identify and track the progression through various stages of the Fitts and Posner model. Furthermore, its stability over a six-week washout period highlights its utility in monitoring long-term retention. No significant changes in dFC were observed in a control group, confirming that the observed neural adaptations were specific to training and not due to external factors. 
By offering a granular view of the learning process at the group and individual levels, dFC facilitates the development of personalized, targeted training protocols aimed at enhancing outcomes in fields where precision and long-term retention are critical, such as surgical education. These findings underscore the value of dFC as a robust biomarker that complements traditional performance metrics, providing a deeper understanding of motor skill learning and retention." + }, + "2502.14534v1": { + "title": "Poststroke rehabilitative mechanisms in individualized fatigue level-controlled treadmill training -- a Rat Model Study", + "url": "http://arxiv.org/abs/2502.14534v1", + "authors": "Yuchen Xu, Yulong Peng, Yuanfa Yao, Xiaoman Fan, Minmin Wang, Feng Gao, Mohamad Sawan, Shaomin Zhang, Xiaoling Hu", + "update_time": "2025-02-20", + "abstract": "Individualized training improved post-stroke motor function rehabilitation efficiency. However, the mechanisms by which individualized training facilitates recovery are not clear. This study explored the cortical and corticomuscular rehabilitative effects in post-stroke motor function recovery during individualized training. Sprague-Dawley rats with intracerebral hemorrhage (ICH) were randomly distributed into two groups: forced training (FOR-T, n=13) and individualized fatigue-controlled training (FAT-C, n=13) to receive training respectively from day 2 to day 14 post-stroke. The FAT-C group exhibited superior motor function recovery and less central fatigue compared to the FOR-T group. EEG PSD slope analysis demonstrated a better inter-hemispheric balance in the FAT-C group compared to the FOR-T group. The dCMC analysis indicated that training-induced fatigue led to a short-term down-regulation of descending corticomuscular coherence (dCMC) and an up-regulation of ascending dCMC. In the long term, excessive fatigue hindered the recovery of descending control in the affected hemisphere. 
The individualized strategy of peripheral fatigue-controlled training achieved better motor function recovery, which could be attributed to the mitigation of central fatigue, optimization of inter-hemispheric balance and enhancement of descending control in the affected hemisphere." + }, + "2502.14227v1": { + "title": "SleepGMUformer: A gated multimodal temporal neural network for sleep staging", + "url": "http://arxiv.org/abs/2502.14227v1", + "authors": "Chenjun Zhao, Xuesen Niu, Xinglin Yu, Long Chen, Na Lv, Huiyu Zhou, Aite Zhao", + "update_time": "2025-02-20", + "abstract": "Sleep staging is a key method for assessing sleep quality and diagnosing sleep disorders. However, current deep learning methods face challenges: 1) postfusion techniques ignore the varying contributions of different modalities; 2) unprocessed sleep data can interfere with frequency-domain information. To tackle these issues, this paper proposes a gated multimodal temporal neural network for multidomain sleep data, including heart rate, motion, steps, EEG (Fpz-Cz, Pz-Oz), and EOG from WristHR-Motion-Sleep and SleepEDF-78. The model integrates: 1) a pre-processing module for feature alignment, missing value handling, and EEG de-trending; 2) a feature extraction module for complex sleep features in the time dimension; and 3) a dynamic fusion module for real-time modality weighting. Experiments show classification accuracies of 85.03% on SleepEDF-78 and 94.54% on WristHR-Motion-Sleep datasets. The model handles heterogeneous datasets and outperforms state-of-the-art models by 1.00%-4.00%." 
+ }, + "2502.13362v1": { + "title": "Dynamic directed functional connectivity as a neural biomarker for objective motor skill assessment", + "url": "http://arxiv.org/abs/2502.13362v1", + "authors": "Anil Kamat, Rahul Rahul, Anirban Dutta, Lora Cavuoto, Uwe Kruger, Harry Burke, Matthew Hackett, Jack Norfleet, Steven Schwaitzberg, Suvranu De", + "update_time": "2025-02-19", + "abstract": "Objective motor skill assessment plays a critical role in fields such as surgery, where proficiency is vital for certification and patient safety. Existing assessment methods, however, rely heavily on subjective human judgment, which introduces bias and limits reproducibility. While recent efforts have leveraged kinematic data and neural imaging to provide more objective evaluations, these approaches often overlook the dynamic neural mechanisms that differentiate expert and novice performance. This study proposes a novel method for motor skill assessment based on dynamic directed functional connectivity (dFC) as a neural biomarker. By using electroencephalography (EEG) to capture brain dynamics and employing an attention-based Long Short-Term Memory (LSTM) model for non-linear Granger causality analysis, we compute dFC among key brain regions involved in psychomotor tasks. Coupled with hierarchical task analysis (HTA), our approach enables subtask-level evaluation of motor skills, offering detailed insights into neural coordination that underpins expert proficiency. A convolutional neural network (CNN) is then used to classify skill levels, achieving greater accuracy and specificity than established performance metrics in laparoscopic surgery. This methodology provides a reliable, objective framework for assessing motor skills, contributing to the development of tailored training protocols and enhancing the certification process." 
+ }, + "2502.12814v1": { + "title": "Dimension reduction methods, persistent homology and machine learning for EEG signal analysis of Interictal Epileptic Discharges", + "url": "http://arxiv.org/abs/2502.12814v1", + "authors": "Annika Stiehl, Stefan Gei\u00dfels\u00f6der, Nicole Ille, Fabienne Anselstetter, Harald Bornfleth, Christian Uhl", + "update_time": "2025-02-18", + "abstract": "Recognizing specific events in medical data requires trained personnel. To aid the classification, machine learning algorithms can be applied. In this context, medical records are usually high-dimensional, although a lower dimension can also reflect the dynamics of the signal. In this study, electroencephalogram data with Interictal Epileptic Discharges (IEDs) are investigated. First, the dimensions are reduced using Dynamical Component Analysis (DyCA) and Principal Component Analysis (PCA), respectively. The reduced data are examined using topological data analysis (TDA), specifically using a persistent homology algorithm. The persistent homology results are used for targeted feature generation. The features are used to train and evaluate a Support Vector Machine (SVM) to distinguish IEDs from background activities.", + "code_url": "https://github.com/HS-Ansbach-CCS/dyca" + }, + "2502.12048v2": { + "title": "A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond", + "url": "http://arxiv.org/abs/2502.12048v2", + "authors": "Shreya Shukla, Jose Torres, Abhijit Mishra, Jacek Gwizdka, Shounak Roychowdhury", + "update_time": "2025-02-18", + "abstract": "Integration of Brain-Computer Interfaces (BCIs) and Generative Artificial Intelligence (GenAI) has opened new frontiers in brain signal decoding, enabling assistive communication, neural representation learning, and multimodal integration. BCIs, particularly those leveraging Electroencephalography (EEG), provide a non-invasive means of translating neural activity into meaningful outputs. 
Recent advances in deep learning, including Generative Adversarial Networks (GANs) and Transformer-based Large Language Models (LLMs), have significantly improved EEG-based generation of images, text, and speech. This paper provides a literature review of the state-of-the-art in EEG-based multimodal generation, focusing on (i) EEG-to-image generation through GANs, Variational Autoencoders (VAEs), and Diffusion Models, and (ii) EEG-to-text generation leveraging Transformer based language models and contrastive learning methods. Additionally, we discuss the emerging domain of EEG-to-speech synthesis, an evolving multimodal frontier. We highlight key datasets, use cases, challenges, and EEG feature encoding methods that underpin generative approaches. By providing a structured overview of EEG-based generative AI, this survey aims to equip researchers and practitioners with insights to advance neural decoding, enhance assistive technologies, and expand the frontiers of brain-computer interaction." + }, + "2502.11752v1": { + "title": "Early Detection of Human Handover Intentions in Human-Robot Collaboration: Comparing EEG, Gaze, and Hand Motion", + "url": "http://arxiv.org/abs/2502.11752v1", + "authors": "Parag Khanna, Nona Rajabi, Sumeyra U. Demir Kanik, Danica Kragic, M\u00e5rten Bj\u00f6rkman, Christian Smith", + "update_time": "2025-02-17", + "abstract": "Human-robot collaboration (HRC) relies on accurate and timely recognition of human intentions to ensure seamless interactions. Among common HRC tasks, human-to-robot object handovers have been studied extensively for planning the robot's actions during object reception, assuming the human intention for object handover. However, distinguishing handover intentions from other actions has received limited attention. Most research on handovers has focused on visually detecting motion trajectories, which often results in delays or false detections when trajectories overlap. 
This paper investigates whether human intentions for object handovers are reflected in non-movement-based physiological signals. We conduct a multimodal analysis comparing three data modalities: electroencephalogram (EEG), gaze, and hand-motion signals. Our study aims to distinguish between handover-intended human motions and non-handover motions in an HRC setting, evaluating each modality's performance in predicting and classifying these actions before and after human movement initiation. We develop and evaluate human intention detectors based on these modalities, comparing their accuracy and timing in identifying handover intentions. To the best of our knowledge, this is the first study to systematically develop and test intention detectors across multiple modalities within the same experimental context of human-robot handovers. Our analysis reveals that handover intention can be detected from all three modalities. Nevertheless, gaze signals are the earliest as well as the most accurate to classify the motion as intended for handover or non-handover." + } + }, + "BCI": { + "2502.12048v2": { + "title": "A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond", + "url": "http://arxiv.org/abs/2502.12048v2", + "authors": "Shreya Shukla, Jose Torres, Abhijit Mishra, Jacek Gwizdka, Shounak Roychowdhury", + "update_time": "2025-02-18", + "abstract": "Integration of Brain-Computer Interfaces (BCIs) and Generative Artificial Intelligence (GenAI) has opened new frontiers in brain signal decoding, enabling assistive communication, neural representation learning, and multimodal integration. BCIs, particularly those leveraging Electroencephalography (EEG), provide a non-invasive means of translating neural activity into meaningful outputs. Recent advances in deep learning, including Generative Adversarial Networks (GANs) and Transformer-based Large Language Models (LLMs), have significantly improved EEG-based generation of images, text, and speech. 
This paper provides a literature review of the state-of-the-art in EEG-based multimodal generation, focusing on (i) EEG-to-image generation through GANs, Variational Autoencoders (VAEs), and Diffusion Models, and (ii) EEG-to-text generation leveraging Transformer based language models and contrastive learning methods. Additionally, we discuss the emerging domain of EEG-to-speech synthesis, an evolving multimodal frontier. We highlight key datasets, use cases, challenges, and EEG feature encoding methods that underpin generative approaches. By providing a structured overview of EEG-based generative AI, this survey aims to equip researchers and practitioners with insights to advance neural decoding, enhance assistive technologies, and expand the frontiers of brain-computer interaction." + }, + "2502.11659v2": { + "title": "An Innovative Brain-Computer Interface Interaction System Based on the Large Language Model", + "url": "http://arxiv.org/abs/2502.11659v2", + "authors": "Jing Jin, Yutao Zhang, Ruitian Xu, Yixin Chen", + "update_time": "2025-02-18", + "abstract": "Recent advancements in large language models (LLMs) provide a more effective pathway for upgrading brain-computer interface (BCI) technology in terms of user interaction. The widespread adoption of BCIs in daily application scenarios is still limited by factors such as their single functionality, restricted paradigm design, weak multilingual support, and low levels of intelligence. In this paper, we propose an innovative BCI system that deeply integrates a steady-state visual evoked potential (SSVEP) speller with an LLM application programming interface (API). It allows natural language input through the SSVEP speller and dynamically calls large models to generate SSVEP paradigms. The command prompt, blinking frequency, and layout position are adjustable to meet the user's control requirements in various scenarios. More than ten languages are compatible with the multilingual support of LLM. 
A variety of task scenarios, such as home appliance control, robotic arm operation, and unmanned aerial vehicle (UAV) management are provided. The task interfaces of the system can be personalized according to the user's habits, usage scenarios, and equipment characteristics. By combining the SSVEP speller with an LLM, the system solves numerous challenges faced by current BCI systems and makes breakthroughs in functionality, intelligence, and multilingual support. The introduction of LLM not only enhances user experience but also expands the potential applications of BCI technology in real-world environments." + }, + "2502.10994v1": { + "title": "SSVEP-BiMA: Bifocal Masking Attention Leveraging Native and Symmetric-Antisymmetric Components for Robust SSVEP Decoding", + "url": "http://arxiv.org/abs/2502.10994v1", + "authors": "Yuxin Liu, Zhenxi Song, Guoyang Xu, Zirui Wang, Feng Wan, Yong Hu, Min Zhang, Zhiguo Zhang", + "update_time": "2025-02-16", + "abstract": "Brain-computer interface (BCI) based on steady-state visual evoked potentials (SSVEP) is a popular paradigm for its simplicity and high information transfer rate (ITR). Accurate and fast SSVEP decoding is crucial for reliable BCI performance. However, conventional decoding methods demand longer time windows, and deep learning models typically require subject-specific fine-tuning, leaving challenges in achieving optimal performance in cross-subject settings. This paper proposed a bifocal masking attention-based method (SSVEP-BiMA) that synergistically leverages the native and symmetric-antisymmetric components for decoding SSVEP. By utilizing multiple signal representations, the network is able to integrate features from a wider range of sample perspectives, leading to more generalized and comprehensive feature learning, which enhances both prediction accuracy and robustness. 
We performed experiments on two public datasets, and the results demonstrate that our proposed method surpasses baseline approaches in both accuracy and ITR. We believe that this work will contribute to the development of more efficient SSVEP-based BCI systems." + }, + "2502.09203v1": { + "title": "Revisiting Euclidean Alignment for Transfer Learning in EEG-Based Brain-Computer Interfaces", + "url": "http://arxiv.org/abs/2502.09203v1", + "authors": "Dongrui Wu", + "update_time": "2025-02-13", + "abstract": "Due to the non-stationarity and large individual differences of EEG signals, EEG-based brain-computer interfaces (BCIs) usually need subject-specific calibration to tailor the decoding algorithm for each new subject, which is time-consuming and user-unfriendly, hindering their real-world applications. Transfer learning (TL) has been extensively used to expedite the calibration, by making use of EEG data from other subjects/sessions. An important consideration in TL for EEG-based BCIs is to reduce the data distribution discrepancies among different subjects/session, to avoid negative transfer. Euclidean alignment (EA) was proposed in 2020 to address this challenge. Numerous experiments from 10 different BCI paradigms demonstrated its effectiveness and efficiency. This paper revisits the EA, explaining its procedure and correct usage, introducing its applications and extensions, and pointing out potential new research directions. It should be very helpful to BCI researchers, especially those who are working on EEG signal decoding." 
+ }, + "2502.08373v1": { + "title": "Uncertainty Aware Human-machine Collaboration in Camouflaged Object Detection", + "url": "http://arxiv.org/abs/2502.08373v1", + "authors": "Ziyue Yang, Kehan Wang, Yuhang Ming, Yong Peng, Han Yang, Qiong Chen, Wanzeng Kong", + "update_time": "2025-02-12", + "abstract": "Camouflaged Object Detection (COD), the task of identifying objects concealed within their environments, has seen rapid growth due to its wide range of practical applications. A key step toward developing trustworthy COD systems is the estimation and effective utilization of uncertainty. In this work, we propose a human-machine collaboration framework for classifying the presence of camouflaged objects, leveraging the complementary strengths of computer vision (CV) models and noninvasive brain-computer interfaces (BCIs). Our approach introduces a multiview backbone to estimate uncertainty in CV model predictions, utilizes this uncertainty during training to improve efficiency, and defers low-confidence cases to human evaluation via RSVP-based BCIs during testing for more reliable decision-making. We evaluated the framework in the CAMO dataset, achieving state-of-the-art results with an average improvement of 4.56\\% in balanced accuracy (BA) and 3.66\\% in the F1 score compared to existing methods. For the best-performing participants, the improvements reached 7.6\\% in BA and 6.66\\% in the F1 score. Analysis of the training process revealed a strong correlation between our confidence measures and precision, while an ablation study confirmed the effectiveness of the proposed training policy and the human-machine collaboration strategy. In general, this work reduces human cognitive load, improves system reliability, and provides a strong foundation for advancements in real-world COD applications and human-computer interaction. 
Our code and data are available at: https://github.com/ziyuey/Uncertainty-aware-human-machine-collaboration-in-camouflaged-object-identification.", + "code_url": "https://github.com/ziyuey/uncertainty-aware-human-machine-collaboration-in-camouflaged-object-identification" + }, + "2502.05334v1": { + "title": "Geometric Machine Learning on EEG Signals", + "url": "http://arxiv.org/abs/2502.05334v1", + "authors": "Benjamin J. Choi", + "update_time": "2025-02-07", + "abstract": "Brain-computer interfaces (BCIs) offer transformative potential, but decoding neural signals presents significant challenges. The core premise of this paper is built around demonstrating methods to elucidate the underlying low-dimensional geometric structure present in high-dimensional brainwave data in order to assist in downstream BCI-related neural classification tasks. We demonstrate two pipelines related to electroencephalography (EEG) signal processing: (1) a preliminary pipeline removing noise from individual EEG channels, and (2) a downstream manifold learning pipeline uncovering geometric structure across networks of EEG channels. We conduct preliminary validation using two EEG datasets and situate our demonstration in the context of the BCI-relevant imagined digit decoding problem. Our preliminary pipeline uses an attention-based EEG filtration network to extract clean signal from individual EEG channels. Our primary pipeline uses a fast Fourier transform, a Laplacian eigenmap, a discrete analog of Ricci flow via Ollivier's notion of Ricci curvature, and a graph convolutional network to perform dimensionality reduction on high-dimensional multi-channel EEG data in order to enable regularizable downstream classification. 
Our system achieves competitive performance with existing signal processing and classification benchmarks; we demonstrate a mean test correlation coefficient of >0.95 at 2 dB on semi-synthetic neural denoising and a downstream EEG-based classification accuracy of 0.97 on distinguishing digit- versus non-digit thoughts. Results are preliminary and our geometric machine learning pipeline should be validated by more extensive follow-up studies; generalizing these results to larger inter-subject sample sizes, different hardware systems, and broader use cases will be crucial." + }, + "2502.04132v1": { + "title": "Transfer Learning for Covert Speech Classification Using EEG Hilbert Envelope and Temporal Fine Structure", + "url": "http://arxiv.org/abs/2502.04132v1", + "authors": "Saravanakumar Duraisamy, Mateusz Dubiel, Maurice Rekrut, Luis A. Leiva", + "update_time": "2025-02-06", + "abstract": "Brain-Computer Interfaces (BCIs) can decode imagined speech from neural activity. However, these systems typically require extensive training sessions where participants imaginedly repeat words, leading to mental fatigue and difficulties identifying the onset of words, especially when imagining sequences of words. This paper addresses these challenges by transferring a classifier trained in overt speech data to covert speech classification. We used electroencephalogram (EEG) features derived from the Hilbert envelope and temporal fine structure, and used them to train a bidirectional long-short-term memory (BiLSTM) model for classification. Our method reduces the burden of extensive training and achieves state-of-the-art classification accuracy: 86.44% for overt speech and 79.82% for covert speech using the overt speech classifier." 
+ }, + "2502.03736v2": { + "title": "Decoding Human Attentive States from Spatial-temporal EEG Patches Using Transformers", + "url": "http://arxiv.org/abs/2502.03736v2", + "authors": "Yi Ding, Joon Hei Lee, Shuailei Zhang, Tianze Luo, Cuntai Guan", + "update_time": "2025-02-07", + "abstract": "Learning the spatial topology of electroencephalogram (EEG) channels and their temporal dynamics is crucial for decoding attention states. This paper introduces EEG-PatchFormer, a transformer-based deep learning framework designed specifically for EEG attention classification in Brain-Computer Interface (BCI) applications. By integrating a Temporal CNN for frequency-based EEG feature extraction, a pointwise CNN for feature enhancement, and Spatial and Temporal Patching modules for organizing features into spatial-temporal patches, EEG-PatchFormer jointly learns spatial-temporal information from EEG data. Leveraging the global learning capabilities of the self-attention mechanism, it captures essential features across brain regions over time, thereby enhancing EEG data decoding performance. Demonstrating superior performance, EEG-PatchFormer surpasses existing benchmarks in accuracy, area under the ROC curve (AUC), and macro-F1 score on a public cognitive attention dataset. The code can be found via: https://github.com/yi-ding-cs/EEG-PatchFormer .", + "code_url": "https://github.com/yi-ding-cs/eeg-patchformer" + }, + "2502.06828v1": { + "title": "Fine-Tuning Strategies for Continual Online EEG Motor Imagery Decoding: Insights from a Large-Scale Longitudinal Study", + "url": "http://arxiv.org/abs/2502.06828v1", + "authors": "Martin Wimpff, Bruno Aristimunha, Sylvain Chevallier, Bin Yang", + "update_time": "2025-02-05", + "abstract": "This study investigates continual fine-tuning strategies for deep learning in online longitudinal electroencephalography (EEG) motor imagery (MI) decoding within a causal setting involving a large user group and multiple sessions per participant. 
We are the first to explore such strategies across a large user group, as longitudinal adaptation is typically studied in the single-subject setting with a single adaptation strategy, which limits the ability to generalize findings. First, we examine the impact of different fine-tuning approaches on decoder performance and stability. Building on this, we integrate online test-time adaptation (OTTA) to adapt the model during deployment, complementing the effects of prior fine-tuning. Our findings demonstrate that fine-tuning that successively builds on prior subject-specific information improves both performance and stability, while OTTA effectively adapts the model to evolving data distributions across consecutive sessions, enabling calibration-free operation. These results offer valuable insights and recommendations for future research in longitudinal online MI decoding and highlight the importance of combining domain adaptation strategies for improving BCI performance in real-world applications. Clinical Relevance: Our investigation enables more stable and efficient long-term motor imagery decoding, which is critical for neurorehabilitation and assistive technologies.", + "code_url": "https://github.com/martinwimpff/eeg-continual" + }, + "2502.02830v1": { + "title": "Multimodal Brain-Computer Interfaces: AI-powered Decoding Methodologies", + "url": "http://arxiv.org/abs/2502.02830v1", + "authors": "Siyang Li, Hongbin Wang, Xiaoqing Chen, Dongrui Wu", + "update_time": "2025-02-05", + "abstract": "Brain-computer interfaces (BCIs) enable direct communication between the brain and external devices. This review highlights the core decoding algorithms that enable multimodal BCIs, including a dissection of the elements, a unified view of diversified approaches, and a comprehensive analysis of the present state of the field. 
We emphasize algorithmic advancements in cross-modality mapping, sequential modeling, besides classic multi-modality fusion, illustrating how these novel AI approaches enhance decoding of brain data. The current literature of BCI applications on visual, speech, and affective decoding are comprehensively explored. Looking forward, we draw attention on the impact of emerging architectures like multimodal Transformers, and discuss challenges such as brain data heterogeneity and common errors. This review also serves as a bridge in this interdisciplinary field for experts with neuroscience background and experts that study AI, aiming to provide a comprehensive understanding for AI-powered multimodal BCIs." + } + }, + "fMRI": { + "2502.15595v1": { + "title": "Causal Modeling of fMRI Time-series for Interpretable Autism Spectrum Disorder Classification", + "url": "http://arxiv.org/abs/2502.15595v1", + "authors": "Peiyu Duan, Nicha C. Dvornek, Jiyao Wang, Lawrence H. Staib, James S. Duncan", + "update_time": "2025-02-21", + "abstract": "Autism spectrum disorder (ASD) is a neurological and developmental disorder that affects social and communicative behaviors. It emerges in early life and is generally associated with lifelong disabilities. Thus, accurate and early diagnosis could facilitate treatment outcomes for those with ASD. Functional magnetic resonance imaging (fMRI) is a useful tool that measures changes in brain signaling to facilitate our understanding of ASD. Much effort is being made to identify ASD biomarkers using various connectome-based machine learning and deep learning classifiers. However, correlation-based models cannot capture the non-linear interactions between brain regions. To solve this problem, we introduce a causality-inspired deep learning model that uses time-series information from fMRI and captures causality among ROIs useful for ASD classification. 
The model is compared with other baseline and state-of-the-art models with 5-fold cross-validation on the ABIDE dataset. We filtered the dataset by choosing all the images with mean FD less than 15mm to ensure data quality. Our proposed model achieved the highest average classification accuracy of 71.9% and an average AUC of 75.8%. Moreover, the inter-ROI causality interpretation of the model suggests that the left precuneus, right precuneus, and cerebellum are placed in the top 10 ROIs in inter-ROI causality among the ASD population. In contrast, these ROIs are not ranked in the top 10 in the control population. We have validated our findings with the literature and found that abnormalities in these ROIs are often associated with ASD." + }, + "2502.15172v1": { + "title": "BP-GPT: Auditory Neural Decoding Using fMRI-prompted LLM", + "url": "http://arxiv.org/abs/2502.15172v1", + "authors": "Xiaoyu Chen, Changde Du, Che Liu, Yizhe Wang, Huiguang He", + "update_time": "2025-02-21", + "abstract": "Decoding language information from brain signals represents a vital research area within brain-computer interfaces, particularly in the context of deciphering the semantic information from the fMRI signal. Although existing work uses LLM to achieve this goal, their method does not use an end-to-end approach and avoids the LLM in the mapping of fMRI-to-text, leaving space for the exploration of the LLM in auditory decoding. In this paper, we introduce a novel method, the Brain Prompt GPT (BP-GPT). By using the brain representation that is extracted from the fMRI as a prompt, our method can utilize GPT-2 to decode fMRI signals into stimulus text. Further, we introduce the text prompt and align the fMRI prompt to it. By introducing the text prompt, our BP-GPT can extract a more robust brain prompt and promote the decoding of pre-trained LLM. 
We evaluate our BP-GPT on the open-source auditory semantic decoding dataset and achieve a significant improvement up to 4.61 on METEOR and 2.43 on BERTScore across all the subjects compared to the state-of-the-art method. The experimental results demonstrate that using brain representation as a prompt to further drive LLM for auditory neural decoding is feasible and effective. The code is available at https://github.com/1994cxy/BP-GPT." + }, + "2502.14671v2": { + "title": "Explanations of Deep Language Models Explain Language Representations in the Brain", + "url": "http://arxiv.org/abs/2502.14671v2", + "authors": "Maryam Rahimi, Yadollah Yaghoobzadeh, Mohammad Reza Daliri", + "update_time": "2025-02-21", + "abstract": "Recent advances in artificial intelligence have given rise to large language models (LLMs) that not only achieve human-like performance but also share computational principles with the brain's language processing mechanisms. While previous research has primarily focused on aligning LLMs' internal representations with neural activity, we introduce a novel approach that leverages explainable AI (XAI) methods to forge deeper connections between the two domains. Using attribution methods, we quantified how preceding words contribute to an LLM's next-word predictions and employed these explanations to predict fMRI recordings from participants listening to the same narratives. Our findings demonstrate that attribution methods robustly predict brain activity across the language network, surpassing traditional internal representations in early language areas. This alignment is hierarchical: early-layer explanations correspond to the initial stages of language processing in the brain, while later layers align with more advanced stages. Moreover, the layers more influential on LLM next-word prediction$\\unicode{x2014}$those with higher attribution scores$\\unicode{x2014}$exhibited stronger alignment with neural activity. 
This work establishes a bidirectional bridge between AI and neuroscience. First, we demonstrate that attribution methods offer a powerful lens for investigating the neural mechanisms of language comprehension, revealing how meaning emerges from preceding context. Second, we propose using brain alignment as a metric to evaluate the validity of attribution methods, providing a framework for assessing their biological plausibility." + }, + "2502.11096v1": { + "title": "Mixture of Tunable Experts -- Behavior Modification of DeepSeek-R1 at Inference Time", + "url": "http://arxiv.org/abs/2502.11096v1", + "authors": "Robert Dahlke, Henrik Klagges, Dan Zecha, Benjamin Merkel, Sven Rohr, Fabian Klemm", + "update_time": "2025-02-16", + "abstract": "We present the Mixture-of-Tunable-Experts (MoTE), a method that extends the Mixture-of-Experts architecture of Large Language Models (LLMs). Without additional training, MoTE enables meaningful and focused behavior changes in LLMs on-the-fly during inference time. By analyzing the digital LLM brain of DeepSeek-R1 using a technique we dub 'functional Token Resonance Imaging' (fTRI) -- inspired by fMRI and using prompts designed to elicit specific behavior (e.g., 'What happened {time}{place}?') -- we empirically identify distinctive experts associated with behaviors like refusal responses. Using MoTE we are able to intervene and control such specific behavior. We switched off the top 10 most refusal-relevant experts (0.07% of R1's 14,848 routed experts), achieving a 52% refusal reduction on sensitive reference prompts without performance degradation on MT-Bench. Random expert deactivation resulted in smaller behavioral shifts with increased noise, whereas forced expert activation led to significantly higher refusal rates. Our approach shares similarities with sparse autoencoders (SAEs) in terms of explainability and steerability. 
Unlike SAEs, MoTE does not require large training efforts, as within MoEs with a vast number of experts, specialization already emerged naturally during pretraining. Our findings suggest that significant functional mechanisms in Mixture-of-Experts architectures can at least partially be localized in a small number of specific experts, rather than being distributed throughout the model's weights. Expert subgroups can be tuned to trigger significant behavior variations, providing insights into the inner workings of LLMs." + }, + "2502.10662v1": { + "title": "Towards Zero-Shot Task-Generalizable Learning on fMRI", + "url": "http://arxiv.org/abs/2502.10662v1", + "authors": "Jiyao Wang, Nicha C. Dvornek, Peiyu Duan, Lawrence H. Staib, James S. Duncan", + "update_time": "2025-02-15", + "abstract": "Functional MRI measuring BOLD signal is an increasingly important imaging modality in studying brain functions and neurological disorders. It can be acquired in either a resting-state or a task-based paradigm. Compared to resting-state fMRI, task-based fMRI is acquired while the subject is performing a specific task designed to enhance study-related brain activities. Consequently, it generally has more informative task-dependent signals. However, due to the variety of task designs, it is much more difficult than in resting state to aggregate task-based fMRI acquired in different tasks to train a generalizable model. To resolve this complication, we propose a supervised task-aware network TA-GAT that jointly learns a general-purpose encoder and task-specific contextual information. The encoder-generated embedding and the learned contextual information are then combined as input to multiple modules for performing downstream tasks. We believe that the proposed task-aware architecture can plug-and-play in any neural network architecture to incorporate the prior knowledge of fMRI tasks into capturing functional brain patterns." 
+ }, + "2502.08694v1": { + "title": "Neuronal Correlates of Semantic Event Classes during Presentation of Complex Naturalistic Stimuli: Anatomical Patterns, Context-Sensitivity, and Potential Impact on shared Human-Robot Ontologies", + "url": "http://arxiv.org/abs/2502.08694v1", + "authors": "Florian Ahrens, Mihai Pomarlan, Daniel Be\u00dfler, Michael Beetz, Manfred Herrmann", + "update_time": "2025-02-12", + "abstract": "The present study forms part of a research project that aims to develop cognition-enabled robotic agents with environmental interaction capabilities close to human proficiency. This approach is based on human-derived neuronal data in combination with a shared ontology to enable robots to learn from human experiences. To gain further insight into the relation between human neuronal activity patterns and ontological classes, we introduced General Linear Model (GLM) analyses on fMRI data of participants who were presented with complex naturalistic video stimuli comparable to the robot tasks. We modeled four event classes (pick, place, fetch and deliver) attached to different environmental and object-related context and employed a Representational Similarity Analysis (RSA) on associated brain activity patterns as a starting point for an automatic hierarchical clustering. Based on the default values for the Hemodynamic Response Function (HRF), the activity patterns were reliably grouped according to their parent classes of object interaction and navigation. Although fetch and deliver events were also distinguished by neuronal patterns, pick and place events demonstrated higher ambiguity with respect to neuronal activation patterns. Introducing a shorter HRF time-to-peak leads to a more reliable grouping of all four semantic classes, despite contextual factors. These data might give novel insights into the neuronal representation of complex stimuli and may enable further research in ontology validation in cognition-enabled robotics." 
+ }, + "2502.08025v2": { + "title": "From Brainwaves to Brain Scans: A Robust Neural Network for EEG-to-fMRI Synthesis", + "url": "http://arxiv.org/abs/2502.08025v2", + "authors": "Kristofer Grover Roos, Atsushi Fukuda, Quan Huu Cap", + "update_time": "2025-02-15", + "abstract": "While functional magnetic resonance imaging (fMRI) offers rich spatial resolution, it is limited by high operational costs and significant infrastructural demands. In contrast, electroencephalography (EEG) provides millisecond-level precision in capturing electrical activity but lacks the spatial resolution necessary for precise neural localization. To bridge these gaps, we introduce E2fNet, a simple yet effective deep learning model for synthesizing fMRI images from low-cost EEG data. E2fNet is specifically designed to capture and translate meaningful features from EEG across electrode channels into accurate fMRI representations. Extensive evaluations across three datasets demonstrate that E2fNet consistently outperforms existing methods, achieving state-of-the-art results in terms of the structural similarity index measure (SSIM). Our findings suggest that E2fNet is a promising, cost-effective solution for enhancing neuroimaging capabilities. The code is available at https://github.com/kgr20/E2fNet.", + "code_url": "https://github.com/kgr20/e2fnet" + }, + "2502.06920v1": { + "title": "Direct Estimation of Pediatric Heart Rate Variability from BOLD-fMRI: A Machine Learning Approach Using Dynamic Connectivity", + "url": "http://arxiv.org/abs/2502.06920v1", + "authors": "Abdoljalil Addeh, Karen Ardila, Rebecca J Williams, G. Bruce Pike, M. Ethan MacDonald", + "update_time": "2025-02-10", + "abstract": "In many pediatric fMRI studies, cardiac signals are often missing or of poor quality. A tool to extract Heart Rate Variation (HRV) waveforms directly from fMRI data, without the need for peripheral recording devices, would be highly beneficial. 
We developed a machine learning framework to accurately reconstruct HRV for pediatric applications. A hybrid model combining one-dimensional Convolutional Neural Networks (1D-CNN) and Gated Recurrent Units (GRU) analyzed BOLD signals from 628 ROIs, integrating past and future data. The model achieved an 8% improvement in HRV accuracy, as evidenced by enhanced performance metrics. This approach eliminates the need for peripheral photoplethysmography devices, reduces costs, and simplifies procedures in pediatric fMRI. Additionally, it improves the robustness of pediatric fMRI studies, which are more sensitive to physiological and developmental variations than those in adults." + }, + "2502.05814v1": { + "title": "Topological Time Frequency Analysis of Functional Brain Signals", + "url": "http://arxiv.org/abs/2502.05814v1", + "authors": "Moo K. Chung, Aaron F. Struck", + "update_time": "2025-02-09", + "abstract": "We present a novel topological framework for analyzing functional brain signals using time-frequency analysis. By integrating persistent homology with time-frequency representations, we capture multi-scale topological features that characterize the dynamic behavior of brain activity. This approach identifies 0D (connected components) and 1D (loops) topological structures in the signal's time-frequency domain, enabling robust extraction of features invariant to noise and temporal misalignments. The proposed method is demonstrated on resting-state functional magnetic resonance imaging (fMRI) data, showcasing its ability to discern critical topological patterns and provide insights into functional connectivity. This topological approach opens new avenues for analyzing complex brain signals, offering potential applications in neuroscience and clinical diagnostics." 
+ }, + "2502.05493v1": { + "title": "Multi-Site rs-fMRI Domain Alignment for Autism Spectrum Disorder Auxiliary Diagnosis Based on Hyperbolic Space", + "url": "http://arxiv.org/abs/2502.05493v1", + "authors": "Yiqian Luo, Qiurong Chen, Yangsong Zhang", + "update_time": "2025-02-08", + "abstract": "In the medical field, most resting-state fMRI (rs-fMRI) data are collected from multiple hospital sites. Multi-site rs-fMRI data can increase the volume of training data, enabling auxiliary diagnostic algorithms for brain diseases to learn more accurate and stable models. However, due to the significant heterogeneity and domain shift in rs-fMRI data across different sites, the accuracy of auxiliary diagnosis remains unsatisfactory. Moreover, there has been limited exploration of multi-source domain adaptation algorithms, and the interpretability of models is often poor. To address these challenges, we proposed a domain-adaptive algorithm based on hyperbolic space embedding. Hyperbolic space is naturally suited for representing the topology of complex networks such as brain functional networks. Therefore, we embedded the brain functional network into hyperbolic space and constructed the corresponding hyperbolic space community network to effectively extract brain network representations. To address the heterogeneity of data across different sites and the issue of domain shift, we introduce a constraint loss function, HMMD (Hyperbolic Maximum Mean Discrepancy), to align the marginal distributions in the hyperbolic space. Additionally, we employ class prototype alignment to align the conditional distributions. This significantly improves the quality of brain representations and enhances diagnostic classification accuracy for Autism Spectrum Disorder (ASD). Experimental results demonstrated that the proposed algorithm is robust to multi-site heterogeneity and shows promising potential for brain network mechanism analysis." 
+ } + }, + "MEG": { + "2502.07429v2": { + "title": "From Thought to Action: How a Hierarchy of Neural Dynamics Supports Language Production", + "url": "http://arxiv.org/abs/2502.07429v2", + "authors": "Mingfang Zhang, Jarod L\u00e9vy, St\u00e9phane d'Ascoli, J\u00e9r\u00e9my Rapin, F. -Xavier Alario, Pierre Bourdillon, Svetlana Pinet, Jean-R\u00e9mi King", + "update_time": "2025-02-18", + "abstract": "Humans effortlessly communicate their thoughts through intricate sequences of motor actions. Yet, the neural processes that coordinate language production remain largely unknown, in part because speech artifacts limit the use of neuroimaging. To elucidate the unfolding of language production in the brain, we investigate with magnetoencephalography (MEG) and electroencephalography (EEG) the neurophysiological activity of 35 skilled typists, while they typed sentences on a keyboard. This approach confirms the hierarchical predictions of linguistic theories: the neural activity preceding the production of each word is marked by the sequential rise and fall of context-, word-, syllable-, and letter-level representations. Remarkably, each of these neural representations is maintained over long time periods within each level of the language hierarchy. This phenomenon results in a superposition of successive representations that is supported by a hierarchy of dynamic neural codes. Overall, these findings provide a precise computational breakdown of the neural dynamics that coordinate the production of language in the human brain." 
+ }, + "2502.05161v1": { + "title": "Estimated Roadway Segment Traffic Data by Vehicle Class for the United States: A Machine Learning Approach", + "url": "http://arxiv.org/abs/2502.05161v1", + "authors": "Brittany Antonczak, Meg Fay, Aviral Chawla, Gregory Rowangould", + "update_time": "2025-02-07", + "abstract": "The Highway Performance Monitoring System, managed by the Federal Highway Administration, provides essential data on average annual daily traffic across U.S. roadways, but it has limited representation of medium- and heavy-duty vehicles on non-interstate roads. This gap limits research and policy analysis on the impacts of truck traffic, especially concerning air quality and public health. To address this, we use random forest regression to estimate medium- and heavy-duty vehicle traffic volumes in areas with sparse data. This results in a more comprehensive dataset, which enables the estimation of traffic density at the census block level as a proxy for traffic-related air pollution exposure. Our high-resolution spatial data products, rigorously validated, provide a more accurate representation of truck traffic and its environmental and health impacts. These datasets are valuable for transportation planning, public health research, and policy decisions aimed at mitigating the effects of truck traffic on vulnerable communities exposed to air pollution." + }, + "2502.04658v1": { + "title": "Shifting Attention to You: Personalized Brain-Inspired AI Models", + "url": "http://arxiv.org/abs/2502.04658v1", + "authors": "Stephen Chong Zhao, Yang Hu, Jason Lee, Andrew Bender, Trisha Mazumdar, Mark Wallace, David A. Tovar", + "update_time": "2025-02-07", + "abstract": "The integration of human and artificial intelligence represents a scientific opportunity to advance our understanding of information processing, as each system offers unique computational insights that can enhance and inform the other. 
The synthesis of human cognitive principles with artificial intelligence has the potential to produce more interpretable and functionally aligned computational models, while simultaneously providing a formal framework for investigating the neural mechanisms underlying perception, learning, and decision-making through systematic model comparisons and representational analyses. In this study, we introduce personalized brain-inspired modeling that integrates human behavioral embeddings and neural data to align with cognitive processes. We took a stepwise approach, fine-tuning the Contrastive Language-Image Pre-training (CLIP) model with large-scale behavioral decisions, group-level neural data, and finally, participant-level neural data within a broader framework that we have named CLIP-Human-Based Analysis (CLIP-HBA). We found that fine-tuning on behavioral data enhances its ability to predict human similarity judgments while indirectly aligning it with dynamic representations captured via MEG. To further gain mechanistic insights into the temporal evolution of cognitive processes, we introduced a model specifically fine-tuned on millisecond-level MEG neural dynamics (CLIP-HBA-MEG). This model resulted in enhanced temporal alignment with human neural processing while still showing improvement on behavioral alignment. Finally, we trained individualized models on participant-specific neural data, effectively capturing individualized neural dynamics and highlighting the potential for personalized AI systems. These personalized systems have far-reaching implications for the fields of medicine, cognitive research, human-computer interfaces, and AI development." 
+ }, + "2502.04258v1": { + "title": "Detecting Mild Traumatic Brain Injury with MEG Scan Data: One-vs-K-Sample Tests", + "url": "http://arxiv.org/abs/2502.04258v1", + "authors": "Jian Zhang, Gary Green", + "update_time": "2025-02-06", + "abstract": "Magnetoencephalography (MEG) scanner has been shown to be more accurate than other medical devices in detecting mild traumatic brain injury (mTBI). However, MEG scan data in certain spectrum ranges can be skewed, multimodal and heterogeneous which can mislead the conventional case-control analysis that requires the data to be homogeneous and normally distributed within the control group. To meet this challenge, we propose a flexible one-vs-K-sample testing procedure for detecting brain injury for a single-case versus heterogeneous controls. The new procedure begins with source magnitude imaging using MEG scan data in frequency domain, followed by region-wise contrast tests for abnormality between the case and controls. The critical values for these tests are automatically determined by cross-validation. We adjust the testing results for heterogeneity effects by similarity analysis. An asymptotic theory is established for the proposed test statistic. By simulated and real data analyses in the context of neurotrauma, we show that the proposed test outperforms commonly used nonparametric methods in terms of overall accuracy and ability in accommodating data non-normality and subject-heterogeneity." + }, + "2501.18837v1": { + "title": "Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming", + "url": "http://arxiv.org/abs/2501.18837v1", + "authors": "Mrinank Sharma, Meg Tong, Jesse Mu, Jerry Wei, Jorrit Kruthoff, Scott Goodfriend, Euan Ong, Alwin Peng, Raj Agarwal, Cem Anil, Amanda Askell, Nathan Bailey, Joe Benton, Emma Bluemke, Samuel R. 
Bowman, Eric Christiansen, Hoagy Cunningham, Andy Dau, Anjali Gopal, Rob Gilson, Logan Graham, Logan Howard, Nimit Kalra, Taesung Lee, Kevin Lin, Peter Lofgren, Francesco Mosconi, Clare O'Hara, Catherine Olsson, Linda Petrini, Samir Rajani, Nikhil Saxena, Alex Silverstein, Tanya Singh, Theodore Sumers, Leonard Tang, Kevin K. Troy, Constantin Weisser, Ruiqi Zhong, Giulio Zhou, Jan Leike, Jared Kaplan, Ethan Perez", + "update_time": "2025-01-31", + "abstract": "Large language models (LLMs) are vulnerable to universal jailbreaks-prompting strategies that systematically bypass model safeguards and enable users to carry out harmful processes that require many model interactions, like manufacturing illegal substances at scale. To defend against these attacks, we introduce Constitutional Classifiers: safeguards trained on synthetic data, generated by prompting LLMs with natural language rules (i.e., a constitution) specifying permitted and restricted content. In over 3,000 estimated hours of red teaming, no red teamer found a universal jailbreak that could extract information from an early classifier-guarded LLM at a similar level of detail to an unguarded model across most target queries. On automated evaluations, enhanced classifiers demonstrated robust defense against held-out domain-specific jailbreaks. These classifiers also maintain deployment viability, with an absolute 0.38% increase in production-traffic refusals and a 23.7% inference overhead. Our work demonstrates that defending against universal jailbreaks while maintaining practical deployment viability is tractable." 
+ }, + "2501.17299v1": { + "title": "\"Ownership, Not Just Happy Talk\": Co-Designing a Participatory Large Language Model for Journalism", + "url": "http://arxiv.org/abs/2501.17299v1", + "authors": "Emily Tseng, Meg Young, Marianne Aubin Le Qu\u00e9r\u00e9, Aimee Rinehart, Harini Suresh", + "update_time": "2025-01-28", + "abstract": "Journalism has emerged as an essential domain for understanding the uses, limitations, and impacts of large language models (LLMs) in the workplace. News organizations face divergent financial incentives: LLMs already permeate newswork processes within financially constrained organizations, even as ongoing legal challenges assert that AI companies violate their copyright. At stake are key questions about what LLMs are created to do, and by whom: How might a journalist-led LLM work, and what can participatory design illuminate about the present-day challenges about adapting ``one-size-fits-all'' foundation models to a given context of use? In this paper, we undertake a co-design exploration to understand how a participatory approach to LLMs might address opportunities and challenges around AI in journalism. Our 20 interviews with reporters, data journalists, editors, labor organizers, product leads, and executives highlight macro, meso, and micro tensions that designing for this opportunity space must address. From these desiderata, we describe the result of our co-design work: organizational structures and functionality for a journalist-controlled LLM. In closing, we discuss the limitations of commercial foundation models for workplace use, and the methodological implications of applying participatory methods to LLM co-design." 
+ }, + "2501.15664v1": { + "title": "The Advanced Muon Facility: a proposed multi-purpose muon facility at Fermilab", + "url": "http://arxiv.org/abs/2501.15664v1", + "authors": "Sophie Middleton", + "update_time": "2025-01-26", + "abstract": "Charged lepton flavor violation (CLFV) is expected in a diverse set of new physics scenarios. The current generation of experiments probe CLFV in the muon sector in three complementary channels: $\\mu^-N \\rightarrow e^- N$ (Mu2e, COMET), $\\mu^+ \\rightarrow e^+ \\gamma$ (MEG-II), and $\\mu^+ \\rightarrow e^+e^+e^-$s (Mu3e). These experiments aim to enhance existing limits by several orders-of-magnitude in the coming decade and offer discovery potential to many new physics models. The proposed Advanced Muon Facility (AMF) would be a multi-purpose muon facility based at Fermilab and introduces an innovative approach based on a muon storage ring to enable a full suite of muon CLFV experiments. AMF would host CLFV experiments with sensitivities orders-of-magnitude beyond the present era. In the event of a signal in these currently planned experiments, AMF would enable additional measurements to elucidate the nature of the new physics observed. The design and R$\\&$D for AMF is in its infancy. This article outlines the motivations for AMF, detailing on-going R$\\&$D efforts, and highlighting potential synergies with the proposed muon collider." + }, + "2501.15322v2": { + "title": "Scaling laws for decoding images from brain activity", + "url": "http://arxiv.org/abs/2501.15322v2", + "authors": "Hubert Banville, Yohann Benchetrit, St\u00e9phane d'Ascoli, J\u00e9r\u00e9my Rapin, Jean-R\u00e9mi King", + "update_time": "2025-01-28", + "abstract": "Generative AI has recently propelled the decoding of images from brain activity. How do these approaches scale with the amount and type of neural recordings? 
Here, we systematically compare image decoding from four types of non-invasive devices: electroencephalography (EEG), magnetoencephalography (MEG), high-field functional Magnetic Resonance Imaging (3T fMRI) and ultra-high field (7T) fMRI. For this, we evaluate decoding models on the largest benchmark to date, encompassing 8 public datasets, 84 volunteers, 498 hours of brain recording and 2.3 million brain responses to natural images. Unlike previous work, we focus on single-trial decoding performance to simulate real-time settings. This systematic comparison reveals three main findings. First, the most precise neuroimaging devices tend to yield the best decoding performances, when the size of the training sets are similar. However, the gain enabled by deep learning - in comparison to linear models - is obtained with the noisiest devices. Second, we do not observe any plateau of decoding performance as the amount of training data increases. Rather, decoding performance scales log-linearly with the amount of brain recording. Third, this scaling law primarily depends on the amount of data per subject. However, little decoding gain is observed by increasing the number of subjects. Overall, these findings delineate the path most suitable to scale the decoding of images from non-invasive brain recordings." + }, + "2501.12184v1": { + "title": "Probing Type II Seesaw Leptogenesis Through Lepton Flavor Violation", + "url": "http://arxiv.org/abs/2501.12184v1", + "authors": "Chengcheng Han, Yijun Han, Sihui Huang, Zhanhong Lei", + "update_time": "2025-01-21", + "abstract": "Lepton flavor violation (LFV) offers a powerful probe of physics beyond the Standard Model, particularly in models addressing neutrino masses and the baryon asymmetry of the universe. In this study, we investigate LFV processes within the framework of type II seesaw leptogenesis, where the Standard Model is extended by an $SU(2)_L$ triplet Higgs field. 
We focus on key LFV processes including $\\mu^+\\to e^+\\gamma$, $\\mu^+ \\to e^+e^-e^+$, and $\\mu \\rightarrow e$ conversion in nuclei, deriving stringent constraints on the parameter space from current experimental data. We scan the 3$\\sigma$ range of neutrino oscillation parameters and identify the most conservative bounds consistent with existing measurements. Our results reveal that the MEG experiment currently provides the strongest constraints in the normal ordering (NO) scenario, while the SINDRUM experiment offers comparable sensitivity in the inverted ordering (IO) case. Future experiments, such as MEG II, Mu3e, Mu2e, and COMET, are predicted to significantly improve the sensitivity, testing larger regions of the parameter space. This work underscores the crucial role of LFV experiments in probing type II seesaw leptogenesis, providing an avenue to explore the connections between neutrino mass generation, baryogenesis, and inflation at experimentally accessible energy scales." + }, + "2501.11566v1": { + "title": "Artificial Neural Networks for Magnetoencephalography: A review of an emerging field", + "url": "http://arxiv.org/abs/2501.11566v1", + "authors": "Arthur Dehgan, Hamza Abdelhedi, Vanessa Hadid, Irina Rish, Karim Jerbi", + "update_time": "2025-01-20", + "abstract": "Magnetoencephalography (MEG) is a cutting-edge neuroimaging technique that measures the intricate brain dynamics underlying cognitive processes with an unparalleled combination of high temporal and spatial precision. MEG data analytics has always relied on advanced signal processing and mathematical and statistical tools for various tasks ranging from data cleaning to probing the signals' rich dynamics and estimating the neural sources underlying the surface-level recordings. Like in most domains, the surge in Artificial Intelligence (AI) has led to the increased use of Machine Learning (ML) methods for MEG data classification. 
More recently, an emerging trend in this field is using Artificial Neural Networks (ANNs) to address many MEG-related tasks. This review provides a comprehensive overview of how ANNs are being used with MEG data from three vantage points: First, we review work that employs ANNs for MEG signal classification, i.e., for brain decoding. Second, we report on work that has used ANNs as putative models of information processing in the human brain. Finally, we examine studies that use ANNs as techniques to tackle methodological questions in MEG, including artifact correction and source estimation. Furthermore, we assess the current strengths and limitations of using ANNs with MEG and discuss future challenges and opportunities in this field. Finally, by establishing a detailed portrait of the field and providing practical recommendations for the future, this review seeks to provide a helpful reference for both seasoned MEG researchers and newcomers to the field who are interested in using ANNs to enhance the exploration of the complex dynamics of the human brain with MEG." + } + }, + "neuroAI": { + "2501.02402v1": { + "title": "Asynchronous Hebbian/anti-Hebbian networks", + "url": "http://arxiv.org/abs/2501.02402v1", + "authors": "Henrique Reis Aguiar, Matthias H. Hennig", + "update_time": "2025-01-04", + "abstract": "Lateral inhibition models coupled with Hebbian plasticity have been shown to learn factorised causal representations of input stimuli, for instance, oriented edges are learned from natural images. Currently, these models require the recurrent dynamics to settle into a stable state before weight changes can be applied, which is not only biologically implausible, but also impractical for real-time learning systems. Here, we propose a new Hebbian learning rule which is implemented using plausible biological mechanisms that have been observed experimentally. 
We find that this rule allows for efficient, time-continuous learning of factorised representations, very similar to the classic noncontinuous Hebbian/anti-Hebbian learning. Furthermore, we show that this rule naturally prevents catastrophic forgetting when stimuli from different distributions are shown sequentially.", + "code_url": "https://github.com/henri-edinb/async_learning" + }, + "2411.18526v1": { + "title": "NeuroAI for AI Safety", + "url": "http://arxiv.org/abs/2411.18526v1", + "authors": "Patrick Mineault, Niccol\u00f2 Zanichelli, Joanne Zichen Peng, Anton Arkhipov, Eli Bingham, Julian Jara-Ettinger, Emily Mackevicius, Adam Marblestone, Marcelo Mattar, Andrew Payne, Sophia Sanborn, Karen Schroeder, Zenna Tavares, Andreas Tolias", + "update_time": "2024-11-27", + "abstract": "As AI systems become increasingly powerful, the need for safe AI has become more pressing. Humans are an attractive model for AI safety: as the only known agents capable of general intelligence, they perform robustly even under conditions that deviate significantly from prior experiences, explore the world safely, understand pragmatics, and can cooperate to meet their intrinsic goals. Intelligence, when coupled with cooperation and safety mechanisms, can drive sustained progress and well-being. These properties are a function of the architecture of the brain and the learning algorithms it implements. Neuroscience may thus hold important keys to technical AI safety that are currently underexplored and underutilized. In this roadmap, we highlight and critically evaluate several paths toward AI safety inspired by neuroscience: emulating the brain's representations, information processing, and architecture; building robust sensory and motor systems from imitating brain data and bodies; fine-tuning AI systems on brain data; advancing interpretability using neuroscience methods; and scaling up cognitively-inspired architectures. 
We make several concrete recommendations for how neuroscience can positively impact AI safety." + }, + "2411.14633v1": { + "title": "Evaluating Representational Similarity Measures from the Lens of Functional Correspondence", + "url": "http://arxiv.org/abs/2411.14633v1", + "authors": "Yiqing Bo, Ansh Soni, Sudhanshu Srivastava, Meenakshi Khosla", + "update_time": "2024-11-21", + "abstract": "Neuroscience and artificial intelligence (AI) both face the challenge of interpreting high-dimensional neural data, where the comparative analysis of such data is crucial for revealing shared mechanisms and differences between these complex systems. Despite the widespread use of representational comparisons and the abundance classes of comparison methods, a critical question remains: which metrics are most suitable for these comparisons? While some studies evaluate metrics based on their ability to differentiate models of different origins or constructions (e.g., various architectures), another approach is to assess how well they distinguish models that exhibit distinct behaviors. To investigate this, we examine the degree of alignment between various representational similarity measures and behavioral outcomes, employing group statistics and a comprehensive suite of behavioral metrics for comparison. In our evaluation of eight commonly used representational similarity metrics in the visual domain -- spanning alignment-based, Canonical Correlation Analysis (CCA)-based, inner product kernel-based, and nearest-neighbor methods -- we found that metrics like linear Centered Kernel Alignment (CKA) and Procrustes distance, which emphasize the overall geometric structure or shape of representations, excelled in differentiating trained from untrained models and aligning with behavioral measures, whereas metrics such as linear predictivity, commonly used in neuroscience, demonstrated only moderate alignment with behavior. 
These insights are crucial for selecting metrics that emphasize behaviorally meaningful comparisons in NeuroAI research." + }, + "2410.19315v1": { + "title": "A prescriptive theory for brain-like inference", + "url": "http://arxiv.org/abs/2410.19315v1", + "authors": "Hadi Vafaii, Dekel Galor, Jacob L. Yates", + "update_time": "2024-10-25", + "abstract": "The Evidence Lower Bound (ELBO) is a widely used objective for training deep generative models, such as Variational Autoencoders (VAEs). In the neuroscience literature, an identical objective is known as the variational free energy, hinting at a potential unified framework for brain function and machine learning. Despite its utility in interpreting generative models, including diffusion models, ELBO maximization is often seen as too broad to offer prescriptive guidance for specific architectures in neuroscience or machine learning. In this work, we show that maximizing ELBO under Poisson assumptions for general sequence data leads to a spiking neural network that performs Bayesian posterior inference through its membrane potential dynamics. The resulting model, the iterative Poisson VAE (iP-VAE), has a closer connection to biological neurons than previous brain-inspired predictive coding models based on Gaussian assumptions. Compared to amortized and iterative VAEs, iP-VAE learns sparser representations and exhibits superior generalization to out-of-distribution samples. These findings suggest that optimizing ELBO, combined with Poisson assumptions, provides a solid foundation for developing prescriptive theories in NeuroAI." + }, + "2409.05771v1": { + "title": "Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models", + "url": "http://arxiv.org/abs/2409.05771v1", + "authors": "Emily Cheng, Richard J. 
Antonello", + "update_time": "2024-09-09", + "abstract": "Research has repeatedly demonstrated that intermediate hidden states extracted from large language models are able to predict measured brain response to natural language stimuli. Yet, very little is known about the representation properties that enable this high prediction performance. Why is it the intermediate layers, and not the output layers, that are most capable for this unique and highly general transfer task? In this work, we show that evidence from language encoding models in fMRI supports the existence of a two-phase abstraction process within LLMs. We use manifold learning methods to show that this abstraction process naturally arises over the course of training a language model and that the first \"composition\" phase of this abstraction process is compressed into fewer layers as training continues. Finally, we demonstrate a strong correspondence between layerwise encoding performance and the intrinsic dimensionality of representations from LLMs. We give initial evidence that this correspondence primarily derives from the inherent compositionality of LLMs and not their next-word prediction properties." + }, + "2407.04117v2": { + "title": "Predictive Coding Networks and Inference Learning: Tutorial and Survey", + "url": "http://arxiv.org/abs/2407.04117v2", + "authors": "Bj\u00f6rn van Zwol, Ro Jefferson, Egon L. van den Broek", + "update_time": "2024-07-22", + "abstract": "Recent years have witnessed a growing call for renewed emphasis on neuroscience-inspired approaches in artificial intelligence research, under the banner of NeuroAI. A prime example of this is predictive coding networks (PCNs), based on the neuroscientific framework of predictive coding. This framework views the brain as a hierarchical Bayesian inference model that minimizes prediction errors through feedback connections. 
Unlike traditional neural networks trained with backpropagation (BP), PCNs utilize inference learning (IL), a more biologically plausible algorithm that explains patterns of neural activity that BP cannot. Historically, IL has been more computationally intensive, but recent advancements have demonstrated that it can achieve higher efficiency than BP with sufficient parallelization. Furthermore, PCNs can be mathematically considered a superset of traditional feedforward neural networks (FNNs), significantly extending the range of trainable architectures. As inherently probabilistic (graphical) latent variable models, PCNs provide a versatile framework for both supervised learning and unsupervised (generative) modeling that goes beyond traditional artificial neural networks. This work provides a comprehensive review and detailed formal specification of PCNs, particularly situating them within the context of modern ML methods. Additionally, we introduce a Python library (PRECO) for practical implementation. This positions PC as a promising framework for future ML innovations." + }, + "2306.10168v3": { + "title": "Beyond Geometry: Comparing the Temporal Structure of Computation in Neural Circuits with Dynamical Similarity Analysis", + "url": "http://arxiv.org/abs/2306.10168v3", + "authors": "Mitchell Ostrow, Adam Eisen, Leo Kozachkov, Ila Fiete", + "update_time": "2023-10-29", + "abstract": "How can we tell whether two neural networks utilize the same internal processes for a particular computation? This question is pertinent for multiple subfields of neuroscience and machine learning, including neuroAI, mechanistic interpretability, and brain-machine interfaces. Standard approaches for comparing neural networks focus on the spatial geometry of latent states. Yet in recurrent networks, computations are implemented at the level of dynamics, and two networks performing the same computation with equivalent dynamics need not exhibit the same geometry. 
To bridge this gap, we introduce a novel similarity metric that compares two systems at the level of their dynamics, called Dynamical Similarity Analysis (DSA). Our method incorporates two components: Using recent advances in data-driven dynamical systems theory, we learn a high-dimensional linear system that accurately captures core features of the original nonlinear dynamics. Next, we compare different systems passed through this embedding using a novel extension of Procrustes Analysis that accounts for how vector fields change under orthogonal transformation. In four case studies, we demonstrate that our method disentangles conjugate and non-conjugate recurrent neural networks (RNNs), while geometric methods fall short. We additionally show that our method can distinguish learning rules in an unsupervised manner. Our method opens the door to comparative analyses of the essential temporal structure of computation in neural circuits.", + "code_url": "https://github.com/mitchellostrow/dsa" + }, + "2305.11275v2": { + "title": "Explaining V1 Properties with a Biologically Constrained Deep Learning Architecture", + "url": "http://arxiv.org/abs/2305.11275v2", + "authors": "Galen Pogoncheff, Jacob Granley, Michael Beyeler", + "update_time": "2023-05-25", + "abstract": "Convolutional neural networks (CNNs) have recently emerged as promising models of the ventral visual stream, despite their lack of biological specificity. While current state-of-the-art models of the primary visual cortex (V1) have surfaced from training with adversarial examples and extensively augmented data, these models are still unable to explain key neural properties observed in V1 that arise from biological circuitry. To address this gap, we systematically incorporated neuroscience-derived architectural components into CNNs to identify a set of mechanisms and architectures that comprehensively explain neural activity in V1. 
We show drastic improvements in model-V1 alignment driven by the integration of architectural components that simulate center-surround antagonism, local receptive fields, tuned normalization, and cortical magnification. Upon enhancing task-driven CNNs with a collection of these specialized components, we uncover models with latent representations that yield state-of-the-art explanation of V1 neural activity and tuning properties. Our results highlight an important advancement in the field of NeuroAI, as we systematically establish a set of architectural components that contribute to unprecedented explanation of V1. The neuroscience insights that could be gleaned from increasingly accurate in-silico models of the brain have the potential to greatly advance the fields of both neuroscience and artificial intelligence." + }, + "2302.07243v4": { + "title": "A Deep Probabilistic Spatiotemporal Framework for Dynamic Graph Representation Learning with Application to Brain Disorder Identification", + "url": "http://arxiv.org/abs/2302.07243v4", + "authors": "Sin-Yee Yap, Junn Yong Loo, Chee-Ming Ting, Fuad Noman, Raphael C. -W. Phan, Adeel Razi, David L. Dowe", + "update_time": "2024-11-09", + "abstract": "Recent applications of pattern recognition techniques on brain connectome classification using functional connectivity (FC) are shifting towards acknowledging the non-Euclidean topology and dynamic aspects of brain connectivity across time. In this paper, a deep spatiotemporal variational Bayes (DSVB) framework is proposed to learn time-varying topological structures in dynamic FC networks for identifying autism spectrum disorder (ASD) in human participants. The framework incorporates a spatial-aware recurrent neural network with an attention-based message passing scheme to capture rich spatiotemporal patterns across dynamic FC networks. 
To overcome model overfitting on limited training datasets, an adversarial training strategy is introduced to learn graph embedding models that generalize well to unseen brain networks. Evaluation on the ABIDE resting-state functional magnetic resonance imaging dataset shows that our proposed framework substantially outperforms state-of-the-art methods in identifying patients with ASD. Dynamic FC analyses with DSVB-learned embeddings reveal apparent group differences between ASD and healthy controls in brain network connectivity patterns and switching dynamics of brain states. The code is available at https://github.com/Monash-NeuroAI/Deep-Spatiotemporal-Variational-Bayes.", + "code_url": "https://github.com/Monash-NeuroAI/Deep-Spatiotemporal-Variational-Bayes" + }, + "2301.09245v2": { + "title": "Towards NeuroAI: Introducing Neuronal Diversity into Artificial Neural Networks", + "url": "http://arxiv.org/abs/2301.09245v2", + "authors": "Feng-Lei Fan, Yingxin Li, Hanchuan Peng, Tieyong Zeng, Fei Wang", + "update_time": "2023-03-11", + "abstract": "Throughout history, the development of artificial intelligence, particularly artificial neural networks, has been open to and constantly inspired by the increasingly deepened understanding of the brain, such as the inspiration of neocognitron, which is the pioneering work of convolutional neural networks. Per the motives of the emerging field: NeuroAI, a great amount of neuroscience knowledge can help catalyze the next generation of AI by endowing a network with more powerful capabilities. As we know, the human brain has numerous morphologically and functionally different neurons, while artificial neural networks are almost exclusively built on a single neuron type. In the human brain, neuronal diversity is an enabling factor for all kinds of biological intelligent behaviors. 
Since an artificial network is a miniature of the human brain, introducing neuronal diversity should be valuable in terms of addressing those essential problems of artificial networks such as efficiency, interpretability, and memory. In this Primer, we first discuss the preliminaries of biological neuronal diversity and the characteristics of information transmission and processing in a biological neuron. Then, we review studies of designing new neurons for artificial networks. Next, we discuss what gains can neuronal diversity bring into artificial networks and exemplary applications in several important fields. Lastly, we discuss the challenges and future directions of neuronal diversity to explore the potential of NeuroAI." + } + }, + "medical": { + "2502.15439v1": { + "title": "Modeling Infectious Diseases: From SIR Models to Diffusion-Based Approaches and Numerical Solutions", + "url": "http://arxiv.org/abs/2502.15439v1", + "authors": "Ayesha Baig, Li Zhouxin", + "update_time": "2025-02-21", + "abstract": "As global living standards improve and medical technology advances, many infectious diseases have been effectively controlled. However, certain diseases, such as the recent COVID-19 pandemic, continue to pose significant threats to public health. This paper explores the evolution of infectious disease modeling, from early ordinary differential equation-based models like the SIR framework to more complex reaction-diffusion models that incorporate both temporal and spatial dynamics. The study highlights the importance of numerical methods, such as the Runge-Kutta method, implicit-explicit time-discretization techniques, and finite difference methods, in solving these models. By analyzing the development and application of these methods, this research underscores their critical role in predicting disease spread, informing public health strategies, and mitigating the impact of future pandemics." 
+ }, + "2502.15418v1": { + "title": "MHQA: A Diverse, Knowledge Intensive Mental Health Question Answering Challenge for Language Models", + "url": "http://arxiv.org/abs/2502.15418v1", + "authors": "Suraj Racha, Prashant Joshi, Anshika Raman, Nikita Jangid, Mridul Sharma, Ganesh Ramakrishnan, Nirmal Punjabi", + "update_time": "2025-02-21", + "abstract": "Mental health remains a challenging problem all over the world, with issues like depression, anxiety becoming increasingly common. Large Language Models (LLMs) have seen a vast application in healthcare, specifically in answering medical questions. However, there is a lack of standard benchmarking datasets for question answering (QA) in mental health. Our work presents a novel multiple choice dataset, MHQA (Mental Health Question Answering), for benchmarking Language models (LMs). Previous mental health datasets have focused primarily on text classification into specific labels or disorders. MHQA, on the other hand, presents question-answering for mental health focused on four key domains: anxiety, depression, trauma, and obsessive/compulsive issues, with diverse question types, namely, factoid, diagnostic, prognostic, and preventive. We use PubMed abstracts as the primary source for QA. We develop a rigorous pipeline for LLM-based identification of information from abstracts based on various selection criteria and converting it into QA pairs. Further, valid QA pairs are extracted based on post-hoc validation criteria. Overall, our MHQA dataset consists of 2,475 expert-verified gold standard instances called MHQA-gold and ~56.1k pairs pseudo labeled using external medical references. We report F1 scores on different LLMs along with few-shot and supervised fine-tuning experiments, further discussing the insights for the scores." 
+ }, + "2502.15405v1": { + "title": "A biomechanical comparison of concussion and head acceleration events in elite-level American football and rugby union", + "url": "http://arxiv.org/abs/2502.15405v1", + "authors": "Gregory Tierney", + "update_time": "2025-02-21", + "abstract": "Elite-level American football and rugby union are two high-contact sports with growing clinical and legal concerns over player safety, necessitating a comparative analysis. A biomechanical comparison of concussion and head acceleration events (HAEs) in elite-level American football and rugby union was undertaken. Rugby union players have a greater number of professional playing years and matches available in a season than their American football counterparts. Rugby union players have a greater number of concussions reported per match and a higher proportion of concussions occurring during training sessions, based on National Football League (NFL) and Rugby Football Union (RFU) injury reports. Preliminary findings indicate that rugby union forwards experience a higher incidence of HAEs per player match over lower and higher magnitude thresholds, than American football defensive players. Overall, elite-level rugby union appears less favourable than American football in almost all metrics pertinent to concussion and HAE exposure in the biomechanical comparison undertaken. The findings highlight the critical importance of independence, scientific rigour, and transparency in future concussion and HAE biomechanics research and real-world implementation, ensuring the development of more effective mitigation strategies." 
+ }, + "2502.15346v1": { + "title": "Drug-Target Interaction/Affinity Prediction: Deep Learning Models and Advances Review", + "url": "http://arxiv.org/abs/2502.15346v1", + "authors": "Ali Vefghi, Zahed Rahmati, Mohammad Akbari", + "update_time": "2025-02-21", + "abstract": "Drug discovery remains a slow and expensive process that involves many steps, from detecting the target structure to obtaining approval from the Food and Drug Administration (FDA), and is often riddled with safety concerns. Accurate prediction of how drugs interact with their targets and the development of new drugs by using better methods and technologies have immense potential to speed up this process, ultimately leading to faster delivery of life-saving medications. Traditional methods used for drug-target interaction prediction show limitations, particularly in capturing complex relationships between drugs and their targets. As an outcome, deep learning models have been presented to overcome the challenges of interaction prediction through their precise and efficient end results. By outlining promising research avenues and models, each with a different solution but similar to the problem, this paper aims to give researchers a better idea of methods for even more accurate and efficient prediction of drug-target interaction, ultimately accelerating the development of more effective drugs. A total of 180 prediction methods for drug-target interactions were analyzed throughout the period spanning 2016 to 2025 using different frameworks based on machine learning, mainly deep learning and graph neural networks. Additionally, this paper discusses the novelty, architecture, and input representation of these models." 
+ }, + "2502.15204v1": { + "title": "Lung-DDPM: Semantic Layout-guided Diffusion Models for Thoracic CT Image Synthesis", + "url": "http://arxiv.org/abs/2502.15204v1", + "authors": "Yifan Jiang, Yannick Lemar\u00e9chal, Jos\u00e9e Bafaro, Jessica Abi-Rjeile, Philippe Joubert, Philippe Despr\u00e9s, Venkata Manem", + "update_time": "2025-02-21", + "abstract": "With the rapid development of artificial intelligence (AI), AI-assisted medical imaging analysis demonstrates remarkable performance in early lung cancer screening. However, the costly annotation process and privacy concerns limit the construction of large-scale medical datasets, hampering the further application of AI in healthcare. To address the data scarcity in lung cancer screening, we propose Lung-DDPM, a thoracic CT image synthesis approach that effectively generates high-fidelity 3D synthetic CT images, which prove helpful in downstream lung nodule segmentation tasks. Our method is based on semantic layout-guided denoising diffusion probabilistic models (DDPM), enabling anatomically reasonable, seamless, and consistent sample generation even from incomplete semantic layouts. Our results suggest that the proposed method outperforms other state-of-the-art (SOTA) generative models in image quality evaluation and downstream lung nodule segmentation tasks. Specifically, Lung-DDPM achieved superior performance on our large validation cohort, with a Fr\\'echet inception distance (FID) of 0.0047, maximum mean discrepancy (MMD) of 0.0070, and mean squared error (MSE) of 0.0024. These results were 7.4$\\times$, 3.1$\\times$, and 29.5$\\times$ better than the second-best competitors, respectively. Furthermore, the lung nodule segmentation model, trained on a dataset combining real and Lung-DDPM-generated synthetic samples, attained a dice coefficient (Dice) of 0.3914 and sensitivity of 0.4393. 
This represents 8.8\\% and 18.6\\% improvements in Dice and sensitivity compared to the model trained solely on real samples. The experimental results highlight Lung-DDPM's potential for a broader range of medical imaging applications, such as general tumor segmentation, cancer survival estimation, and risk prediction." + }, + "2502.15193v1": { + "title": "Image Translation-Based Unsupervised Cross-Modality Domain Adaptation for Medical Image Segmentation", + "url": "http://arxiv.org/abs/2502.15193v1", + "authors": "Tao Yang, Lisheng Wang", + "update_time": "2025-02-21", + "abstract": "Supervised deep learning usually faces more challenges in medical images than in natural images, since annotations in medical images require the expertise of doctors and are more time-consuming and expensive. Thus, some researchers turn to unsupervised learning methods, which usually face inevitable performance drops. In addition, medical images may have been acquired at different medical centers with different scanners and under different image acquisition protocols, so the modalities of the medical images are often inconsistent. This modality difference (domain shift) also reduces the applicability of deep learning methods. In this regard, we propose an unsupervised cross-modality domain adaptation method based on image translation by transforming the source modality image with annotation into the unannotated target modality and using its annotation to achieve supervised learning of the target modality. In addition, the subtle differences between translated pseudo images and real images are overcome by self-training methods to further improve the task performance of deep learning. 
The proposed method showed mean Dice Similarity Coefficient (DSC) and Average Symmetric Surface Distance (ASSD) of $0.8351 \\pm 0.1152$ and $1.6712 \\pm 2.1948$ for vestibular schwannoma (VS), $0.8098 \\pm 0.0233$ and $0.2317 \\pm 0.1577$ for cochlea on the VS and cochlea segmentation task of the Cross-Modality Domain Adaptation (crossMoDA 2022) challenge validation phase leaderboard." + }, + "2502.15185v1": { + "title": "Key Body Posture Characteristics of Short-distance Speed Skaters at the Start Based on Artificial Intelligence", + "url": "http://arxiv.org/abs/2502.15185v1", + "authors": "Zhang Xueliana, Fang Yingjieb, Liu Hang", + "update_time": "2025-02-21", + "abstract": "Objective To conduct biomechanical analysis on the starting technique of male short-distance speed skating athletes in China and determine the key factors affecting the effectiveness of the starting movement. Methods 13 high-level male short-distance speed skating athletes were selected as the test subjects, and kinematic data were collected using an artificial intelligence video capture and analysis system. The body posture features and their effects on the starting movement performance were analyzed in the three stages of starting preparation, starting, and sprinting. Results The post-stability angle, anterior knee angle of the front leg, posterior knee angle of the rear leg, and stride length showed moderate to high positive correlations with the starting speed during the starting preparation stage. The trunk angle showed a high negative correlation with the starting speed. The trunk angle (TO4, TD4, TO6, TD6), hip angle (TO1, TO4, TO6), and knee angle (TD1) showed moderate to high negative correlations with the effectiveness of the starting movement during the starting and sprinting stages. The knee angle (TD2), ice-contact angle (TD2, TD4, TD5, TD6), and propulsion angle (TO1, TO4, TO7) showed moderate positive correlations with the effectiveness of the starting movement. 
Conclusion Stride length, left knee angle, and post-stability angle are the key factors affecting the starting speed. The larger the post-stability angle and left knee angle and the longer the stride length, the faster the starting speed. During the starting and sprinting stages, the smaller the ice-contact angle and propulsion angle, the greater the trunk angle and hip angle changes, the more effective the starting movement." + }, + "2502.15179v1": { + "title": "Nonlinear Dynamical Systems for Automatic Face Annotation in Head Tracking and Pose Estimation", + "url": "http://arxiv.org/abs/2502.15179v1", + "authors": "Thoa Thieu, Roderick Melnik", + "update_time": "2025-02-21", + "abstract": "Facial landmark tracking plays a vital role in applications such as facial recognition, expression analysis, and medical diagnostics. In this paper, we consider the performance of the Extended Kalman Filter (EKF) and Unscented Kalman Filter (UKF) in tracking 3D facial motion in both deterministic and stochastic settings. We first analyze a noise-free environment where the state transition is purely deterministic, demonstrating that UKF outperforms EKF by achieving lower mean squared error (MSE) due to its ability to capture higher-order nonlinearities. However, when stochastic noise is introduced, EKF exhibits superior robustness, maintaining lower mean square error (MSE) compared to UKF, which becomes more sensitive to measurement noise and occlusions. Our results highlight that UKF is preferable for high-precision applications in controlled environments, whereas EKF is better suited for real-world scenarios with unpredictable noise. These findings provide practical insights for selecting the appropriate filtering technique in 3D facial tracking applications, such as motion capture and facial recognition." 
+ }, + "2502.15064v1": { + "title": "Pseudoinverse Diffusion Models for Generative CT Image Reconstruction from Low Dose Data", + "url": "http://arxiv.org/abs/2502.15064v1", + "authors": "Matthew Tivnan, Dufan Wu, Quanzheng Li", + "update_time": "2025-02-20", + "abstract": "Score-based diffusion models have significantly advanced generative deep learning for image processing. Measurement conditioned models have also been applied to inverse problems such as CT reconstruction. However, the conventional approach, culminating in white noise, often requires a high number of reverse process update steps and score function evaluations. To address this limitation, we propose an alternative forward process in score-based diffusion models that aligns with the noise characteristics of low-dose CT reconstructions, rather than converging to white noise. This method significantly reduces the number of required score function evaluations, enhancing efficiency and maintaining familiar noise textures for radiologists. Our approach not only accelerates the generative process but also retains CT noise correlations, a key aspect often criticized by clinicians for deep learning reconstructions. In this work, we rigorously define a matrix-controlled stochastic process for this purpose and validate it through computational experiments. Using a dataset from The Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC), we simulate low-dose CT measurements and train our model, comparing it with a baseline scalar diffusion process and conditional diffusion model. Our results demonstrate the superiority of our pseudoinverse diffusion model in terms of efficiency and the ability to produce high-quality reconstructions that are familiar in texture to medical professionals in a low number of score function evaluations. 
This advancement paves the way for more efficient and clinically practical diffusion models in medical imaging, particularly beneficial in scenarios demanding rapid reconstructions or lower radiation exposure." + }, + "2502.15060v1": { + "title": "Multi-Source Static CT with Adaptive Fluence Modulation to Minimize Hallucinations in Generative Reconstructions", + "url": "http://arxiv.org/abs/2502.15060v1", + "authors": "Matthew Tivnan, Amar Gupta, Kai Yang, Dufan Wu, Rajiv Gupta", + "update_time": "2025-02-20", + "abstract": "Multi-source static Computed Tomography (CT) systems have introduced novel opportunities for adaptive imaging techniques. This work presents an innovative method of fluence field modulation using spotlight collimators. These instruments block positive or negative fan angles of even and odd indexed sources, respectively. Spotlight collimators enable volume of interest imaging by increasing relative exposure for the overlapping views. To achieve high quality reconstructions from sparse-view low-dose data, we introduce a generative reconstruction algorithm called Langevin Posterior Sampling (LPS), which uses a score based diffusion prior and physics based likelihood model to sample a posterior random walk. We conduct simulation-based experiments of head CT imaging for stroke detection and we demonstrate that spotlight collimators can effectively reduce the standard deviation and worst-case scenario hallucinations in reconstructed images. Compared to uniform fluence, our approach shows a significant reduction in posterior standard deviation. This highlights the potential for spotlight collimators and generative reconstructions to improve image quality and diagnostic accuracy of multi-source static CT." + } + } +} \ No newline at end of file