From e2652c13901a0dd3e7afbe7eb22ea474a31bed27 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Fri, 5 Jul 2024 01:02:49 +0000 Subject: [PATCH] Update arXiv papers --- README.md | 266 ++++++++-------- data_store/papers_2024-07-05.json | 506 ++++++++++++++++++++++++++++++ 2 files changed, 639 insertions(+), 133 deletions(-) create mode 100644 data_store/papers_2024-07-05.json diff --git a/README.md b/README.md index e669520..32210c6 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -

Updated on 2024-07-04

+

Updated on 2024-07-05

Brain

@@ -14,6 +14,41 @@ +2024-07-03 +MHNet: Multi-view High-order Network for Diagnosing Neurodevelopmental Disorders Using Resting-state fMRI +Yueyang Li, Weiming Zeng, Wenhao Dong, Luhui Cai, Lei Wang, Hongyu Chen, Hongjie Yan, Lingbin Bian, Nizhuan Wang +Link +Background: Deep learning models have shown promise in diagnosing neurodevelopmental disorders (NDD) like ASD and ADHD. However, many models either use graph neural networks (GNN) to construct single-level brain functional networks (BFNs) or employ spatial convolution filtering for local information extraction from rs-fMRI data, often neglecting high-order features crucial for NDD classification. Methods: We introduce a Multi-view High-order Network (MHNet) to capture hierarchical and high-order features from multi-view BFNs derived from rs-fMRI data for NDD prediction. MHNet has two branches: the Euclidean Space Features Extraction (ESFE) module and the Non-Euclidean Space Features Extraction (Non-ESFE) module, followed by a Feature Fusion-based Classification (FFC) module for NDD identification. ESFE includes a Functional Connectivity Generation (FCG) module and a High-order Convolutional Neural Network (HCNN) module to extract local and high-order features from BFNs in Euclidean space. Non-ESFE comprises a Generic Internet-like Brain Hierarchical Network Generation (G-IBHN-G) module and a High-order Graph Neural Network (HGNN) module to capture topological and high-order features in non-Euclidean space. Results: Experiments on three public datasets show that MHNet outperforms state-of-the-art methods using both AAL1 and Brainnetome Atlas templates. Extensive ablation studies confirm the superiority of MHNet and the effectiveness of using multi-view fMRI information and high-order features. Our study also offers atlas options for constructing more sophisticated hierarchical networks and explains the association between key brain regions and NDD. Conclusion: MHNet leverages multi-view feature learning from both Euclidean and non-Euclidean spaces, incorporating high-order information from BFNs to enhance NDD classification performance. + + +2024-07-03 +EDPNet: An Efficient Dual Prototype Network for Motor Imagery EEG Decoding +Can Han, Chen Liu, Crystal Cai, Jun Wang, Dahong Qian +Link +Motor imagery electroencephalograph (MI-EEG) decoding plays a crucial role in developing motor imagery brain-computer interfaces (MI-BCIs). However, decoding intentions from MI remains challenging due to the inherent complexity of EEG signals relative to the small-sample size. In this paper, we propose an Efficient Dual Prototype Network (EDPNet) to enable accurate and fast MI decoding. EDPNet employs a lightweight adaptive spatial-spectral fusion module, which promotes more efficient information fusion between multiple EEG electrodes. Subsequently, a parameter-free multi-scale variance pooling module extracts more comprehensive temporal features. Furthermore, we introduce dual prototypical learning to optimize the feature space distribution and training process, thereby improving the model's generalization ability on small-sample MI datasets. Our experimental results show that the EDPNet outperforms state-of-the-art models with superior classification accuracy and kappa values (84.11% and 0.7881 for dataset BCI competition IV 2a, 86.65% and 0.7330 for dataset BCI competition IV 2b). Additionally, we use the BCI competition III IVa dataset with fewer training data to further validate the generalization ability of the proposed EDPNet. We also achieve superior performance with 82.03% classification accuracy. Benefiting from the lightweight parameters and superior decoding accuracy, our EDPNet shows great potential for MI-BCI applications. The code is publicly available at https://github.com/hancan16/EDPNet. + + +2024-07-03 +MVGT: A Multi-view Graph Transformer Based on Spatial Relations for EEG Emotion Recognition +Yanjie Cui, Xiaohong Liu, Jing Liang, Yamin Fu +Link +Electroencephalography (EEG), a medical imaging technique that captures scalp electrical activity of brain structures via electrodes, has been widely used in affective computing. The spatial domain of EEG is rich in affective information.However, few of the existing studies have simultaneously analyzed EEG signals from multiple perspectives of geometric and anatomical structures in spatial domain. In this paper, we propose a multi-view Graph Transformer (MVGT) based on spatial relations, which integrates information from the temporal, frequency and spatial domains, including geometric and anatomical structures, so as to enhance the expressive power of the model comprehensively.We incorporate the spatial information of EEG channels into the model as encoding, thereby improving its ability to perceive the spatial structure of the channels. Meanwhile, experimental results based on publicly available datasets demonstrate that our proposed model outperforms state-of-the-art methods in recent years. In addition, the results also show that the MVGT could extract information from multiple domains and capture inter-channel relationships in EEG emotion recognition tasks effectively. + + +2024-07-03 +SemioLLM: Assessing Large Language Models for Semiological Analysis in Epilepsy Research +Meghal Dani, Muthu Jeyanthi Prakash, Zeynep Akata, Stefanie Liebe +Link +Large Language Models have shown promising results in their ability to encode general medical knowledge in standard medical question-answering datasets. However, their potential application in clinical practice requires evaluation in domain-specific tasks, where benchmarks are largely missing. In this study semioLLM, we test the ability of state-of-the-art LLMs (GPT-3.5, GPT-4, Mixtral 8x7B, and Qwen-72chat) to leverage their internal knowledge and reasoning for epilepsy diagnosis. Specifically, we obtain likelihood estimates linking unstructured text descriptions of seizures to seizure-generating brain regions, using an annotated clinical database containing 1269 entries. We evaluate the LLM's performance, confidence, reasoning, and citation abilities in comparison to clinical evaluation. Models achieve above-chance classification performance with prompt engineering significantly improving their outcome, with some models achieving close-to-clinical performance and reasoning. However, our analyses also reveal significant pitfalls with several models being overly confident while showing poor performance, as well as exhibiting citation errors and hallucinations. In summary, our work provides the first extensive benchmark comparing current SOTA LLMs in the medical domain of epilepsy and highlights their ability to leverage unstructured texts from patients' medical history to aid diagnostic processes in health care. + + +2024-07-02 +A Novel Approach to Image EEG Sleep Data for Improving Quality of Life in Patients Suffering From Brain Injuries Using DreamDiffusion +David Fahim, Joshveer Grewal, Ritvik Ellendula +Link +Those experiencing strokes, traumatic brain injuries, and drug complications can often end up hospitalized and diagnosed with coma or locked-in syndrome. Such mental impediments can permanently alter the neurological pathways in work and significantly decrease the quality of life (QoL). It is critical to translate brain signals into images to gain a deeper understanding of the thoughts of a comatose patient. Traditionally, brain signals collected by an EEG could only be translated into text, but with the novel method of an open-source model available on GitHub, DreamDiffusion can be used to convert brain waves into images directly. DreamDiffusion works by extracting features from EEG signals and then using the features to create images through StableDiffusion. Upon this, we made further improvements that could make StableDiffusion the forerunner technology in waves to media translation. In our study, we begin by modifying the existing DreamDiffusion codebase so that it does not require any prior setup, avoiding any confusing steps needed to run the model from GitHub. For many researchers, the incomplete setup process, errors in the existing code, and a lack of directions made it nearly impossible to run, not even considering the model's performance. We brought the code into Google Colab so users could run and evaluate problems cell-by-cell, eliminating the specific file and repository dependencies. We also provided the original training data file so users do not need to purchase the necessary computing power to train the model from the given dataset. The second change is utilizing the mutability of the code and optimizing the model so it can be used to generate images from other given inputs, such as sleep data. Additionally, the affordability of EEG technology allows for global dissemination and creates the opportunity for those who want to work on the shared DreamDiffusion model. + + 2024-07-02 AXIAL: Attention-based eXplainability for Interpretable Alzheimer's Localized Diagnosis using 2D CNNs on 3D MRI brain scans Gabriele Lozupone, Alessandro Bria, Francesco Fontanella, Claudio De Stefano @@ -48,41 +83,6 @@ Link Understanding the dynamics of electrical signals within neuronal assemblies is crucial to unraveling complex brain function. Despite recent advances in employing optically active nanostructures in transmembrane potential sensing, there remains room for improvement in terms of response time and sensitivity. Here, we report the development of such a nanosensor capable of detecting electric fields with a submillisecond response time at the single particle level. We achieve this by using ferroelectric nanocrystals doped with rare earth ions producing upconversion (UC). When such a nanocrystal experiences a variation of surrounding electric potential, its surface charge density changes, inducing electric polarization modifications that vary, via converse piezoelectric effect, the crystal field around the ions. The latter variation is finally converted into UC spectral changes, enabling optical detection of electric potential. To develop such a sensor, we synthesized erbium and ytterbium-doped barium titanate crystals of size $\approx160$ nm. We observed distinct changes in the UC spectrum when individual nanocrystals were subjected to an external field via a conductive AFM tip, with a response time of 100 $\mu$s. Furthermore, our sensor exhibits a sensitivity to electric fields of only 4.8 kV/cm/$\sqrt{\rm Hz}$, making possible the time-resolved detection of a neuron action potential. - -2024-07-02 -Adaptive Modality Balanced Online Knowledge Distillation for Brain-Eye-Computer based Dim Object Detection -Zixing Li, Chao Yan, Zhen Lan, Dengqing Tang, Xiaojia Xiang, Han Zhou, Jun Lai -Link -Advanced cognition can be extracted from the human brain using brain-computer interfaces. Integrating these interfaces with computer vision techniques, which possess efficient feature extraction capabilities, can achieve more robust and accurate detection of dim targets in aerial images. However, existing target detection methods primarily concentrate on homogeneous data, lacking efficient and versatile processing capabilities for heterogeneous multimodal data. In this paper, we first build a brain-eye-computer based object detection system for aerial images under few-shot conditions. This system detects suspicious targets using region proposal networks, evokes the event-related potential (ERP) signal in electroencephalogram (EEG) through the eye-tracking-based slow serial visual presentation (ESSVP) paradigm, and constructs the EEG-image data pairs with eye movement data. Then, an adaptive modality balanced online knowledge distillation (AMBOKD) method is proposed to recognize dim objects with the EEG-image data. AMBOKD fuses EEG and image features using a multi-head attention module, establishing a new modality with comprehensive features. To enhance the performance and robust capability of the fusion modality, simultaneous training and mutual learning between modalities are enabled by end-to-end online knowledge distillation. During the learning process, an adaptive modality balancing module is proposed to ensure multimodal equilibrium by dynamically adjusting the weights of the importance and the training gradients across various modalities. The effectiveness and superiority of our method are demonstrated by comparing it with existing state-of-the-art methods. Additionally, experiments conducted on public datasets and system validations in real-world scenarios demonstrate the reliability and practicality of the proposed system and the designed method. - - -2024-07-02 -EIT-1M: One Million EEG-Image-Text Pairs for Human Visual-textual Recognition and More -Xu Zheng, Ling Wang, Kanghao Chen, Yuanhuiyi Lyu, Jiazhou Zhou, Lin Wang -Link -Recently, electroencephalography (EEG) signals have been actively incorporated to decode brain activity to visual or textual stimuli and achieve object recognition in multi-modal AI. Accordingly, endeavors have been focused on building EEG-based datasets from visual or textual single-modal stimuli. However, these datasets offer limited EEG epochs per category, and the complex semantics of stimuli presented to participants compromise their quality and fidelity in capturing precise brain activity. The study in neuroscience unveils that the relationship between visual and textual stimulus in EEG recordings provides valuable insights into the brain's ability to process and integrate multi-modal information simultaneously. Inspired by this, we propose a novel large-scale multi-modal dataset, named EIT-1M, with over 1 million EEG-image-text pairs. Our dataset is superior in its capacity of reflecting brain activities in simultaneously processing multi-modal information. To achieve this, we collected data pairs while participants viewed alternating sequences of visual-textual stimuli from 60K natural images and category-specific texts. Common semantic categories are also included to elicit better reactions from participants' brains. Meanwhile, response-based stimulus timing and repetition across blocks and sessions are included to ensure data diversity. To verify the effectiveness of EIT-1M, we provide an in-depth analysis of EEG data captured from multi-modal stimuli across different categories and participants, along with data quality scores for transparency. We demonstrate its validity on two tasks: 1) EEG recognition from visual or textual stimuli or both and 2) EEG-to-visual generation. - - -2024-07-01 -Empirical Determination of Baseball Eras: Multivariate Changepoint Analysis in Major League Baseball -Mena CR Whalen, Gregory J Matthews, Brain M Mills -Link -We use multivariate change point analysis methods, to identify not only mean shifts but also changes in variance across a wide array of statistical time series. Our primary objective is to empirically discern distinct eras in the evolution of baseball, shedding light on significant transformations in team performance and management strategies. We leverage a rich dataset comprising baseball statistics from the late 1800s to 2020, spanning over a century of the sport's history. Results confirm previous historical research, pinpointing well-known baseball eras, such as the Dead Ball Era, Integration Era, Steroid Era, and Post-Steroid Era. Moreover, the study delves into the detection of substantial changes in team performance, effectively identifying periods of both dynasties and collapses within a team's history. The multivariate change point analysis proves to be a valuable tool for understanding the intricate dynamics of baseball's evolution. The method offers a data-driven approach to unveil structural shifts in the sport's historical landscape, providing fresh insights into the impact of rule changes, player strategies, and external factors on baseball's evolution. This not only enhances our comprehension of baseball, showing more robust identification of eras than past univariate time series work, but also showcases the broader applicability of multivariate change point analysis in the domain of sports research and beyond. - - -2024-07-01 -Domain Influence in MRI Medical Image Segmentation: spatial versus k-space inputs -Erik Gösche, Reza Eghbali, Florian Knoll, Andreas Rauschecker -Link -Transformer-based networks applied to image patches have achieved cutting-edge performance in many vision tasks. However, lacking the built-in bias of convolutional neural networks (CNN) for local image statistics, they require large datasets and modifications to capture relationships between patches, especially in segmentation tasks. Images in the frequency domain might be more suitable for the attention mechanism, as local features are represented globally. By transforming images into the frequency domain, local features are represented globally. Due to MRI data acquisition properties, these images are particularly suitable. This work investigates how the image domain (spatial or k-space) affects segmentation results of deep learning (DL) models, focusing on attention-based networks and other non-convolutional models based on MLPs. We also examine the necessity of additional positional encoding for Transformer-based networks when input images are in the frequency domain. For evaluation, we pose a skull stripping task and a brain tissue segmentation task. The attention-based models used are PerceiverIO and a vanilla Transformer encoder. To compare with non-attention-based models, an MLP and ResMLP are also trained and tested. Results are compared with the Swin-Unet, the state-of-the-art medical image segmentation model. Experimental results show that using k-space for the input domain can significantly improve segmentation results. Also, additional positional encoding does not seem beneficial for attention-based networks if the input is in the frequency domain. Although none of the models matched the Swin-Unet's performance, the less complex models showed promising improvements with a different domain choice. - - -2024-07-01 -Statistical signatures of abstraction in deep neural networks -Carlo Orientale Caputo, Matteo Marsili -Link -We study how abstract representations emerge in a Deep Belief Network (DBN) trained on benchmark datasets. Our analysis targets the principles of learning in the early stages of information processing, starting from the "primordial soup" of the under-sampling regime. As the data is processed by deeper and deeper layers, features are detected and removed, transferring more and more "context-invariant" information to deeper layers. We show that the representation approaches an universal model -- the Hierarchical Feature Model (HFM) -- determined by the principle of maximal relevance. Relevance quantifies the uncertainty on the model of the data, thus suggesting that "meaning" -- i.e. syntactic information -- is that part of the data which is not yet captured by a model. Our analysis shows that shallow layers are well described by pairwise Ising models, which provide a representation of the data in terms of generic, low order features. We also show that plasticity increases with depth, in a similar way as it does in the brain. These findings suggest that DBNs are capable of extracting a hierarchy of features from the data which is consistent with the principle of maximal relevance. - @@ -100,6 +100,34 @@ +2024-07-03 +EDPNet: An Efficient Dual Prototype Network for Motor Imagery EEG Decoding +Can Han, Chen Liu, Crystal Cai, Jun Wang, Dahong Qian +Link +Motor imagery electroencephalograph (MI-EEG) decoding plays a crucial role in developing motor imagery brain-computer interfaces (MI-BCIs). However, decoding intentions from MI remains challenging due to the inherent complexity of EEG signals relative to the small-sample size. In this paper, we propose an Efficient Dual Prototype Network (EDPNet) to enable accurate and fast MI decoding. EDPNet employs a lightweight adaptive spatial-spectral fusion module, which promotes more efficient information fusion between multiple EEG electrodes. Subsequently, a parameter-free multi-scale variance pooling module extracts more comprehensive temporal features. Furthermore, we introduce dual prototypical learning to optimize the feature space distribution and training process, thereby improving the model's generalization ability on small-sample MI datasets. Our experimental results show that the EDPNet outperforms state-of-the-art models with superior classification accuracy and kappa values (84.11% and 0.7881 for dataset BCI competition IV 2a, 86.65% and 0.7330 for dataset BCI competition IV 2b). Additionally, we use the BCI competition III IVa dataset with fewer training data to further validate the generalization ability of the proposed EDPNet. We also achieve superior performance with 82.03% classification accuracy. Benefiting from the lightweight parameters and superior decoding accuracy, our EDPNet shows great potential for MI-BCI applications. The code is publicly available at https://github.com/hancan16/EDPNet. + + +2024-07-03 +MVGT: A Multi-view Graph Transformer Based on Spatial Relations for EEG Emotion Recognition +Yanjie Cui, Xiaohong Liu, Jing Liang, Yamin Fu +Link +Electroencephalography (EEG), a medical imaging technique that captures scalp electrical activity of brain structures via electrodes, has been widely used in affective computing. The spatial domain of EEG is rich in affective information.However, few of the existing studies have simultaneously analyzed EEG signals from multiple perspectives of geometric and anatomical structures in spatial domain. In this paper, we propose a multi-view Graph Transformer (MVGT) based on spatial relations, which integrates information from the temporal, frequency and spatial domains, including geometric and anatomical structures, so as to enhance the expressive power of the model comprehensively.We incorporate the spatial information of EEG channels into the model as encoding, thereby improving its ability to perceive the spatial structure of the channels. Meanwhile, experimental results based on publicly available datasets demonstrate that our proposed model outperforms state-of-the-art methods in recent years. In addition, the results also show that the MVGT could extract information from multiple domains and capture inter-channel relationships in EEG emotion recognition tasks effectively. + + +2024-07-03 +Spatio-Temporal Adaptive Diffusion Models for EEG Super-Resolution in Epilepsy Diagnosis +Tong Zhou, Shuqiang Wang +Link +Electroencephalogram (EEG) technology, particularly high-density EEG (HD EEG) devices, is widely used in fields such as neuroscience. HD EEG devices improve the spatial resolution of EEG by placing more electrodes on the scalp, meeting the requirements of clinical diagnostic applications such as epilepsy focus localization. However, this technique faces challenges such as high acquisition costs and limited usage scenarios. In this paper, spatio-temporal adaptive diffusion models (STADMs) are proposed to pioneer the use of diffusion models for achieving spatial SR reconstruction from low-resolution (LR, 64 channels or fewer) EEG to high-resolution (HR, 256 channels) EEG. Specifically, a spatio-temporal condition module is designed to extract the spatio-temporal features of LR EEG, which then serve as conditional inputs to guide the reverse denoising process of diffusion models. Additionally, a multi-scale Transformer denoising module is constructed to leverage multi-scale convolution blocks and cross-attention-based diffusion Transformer blocks for conditional guidance to generate subject-adaptive SR EEG. Experimental results demonstrate that the proposed method effectively enhances the spatial resolution of LR EEG and quantitatively outperforms existing methods. Furthermore, STADMs demonstrate their value by applying synthetic SR EEG to classification and source localization tasks of epilepsy patients, indicating their potential to significantly improve the spatial resolution of LR EEG. + + +2024-07-02 +A Novel Approach to Image EEG Sleep Data for Improving Quality of Life in Patients Suffering From Brain Injuries Using DreamDiffusion +David Fahim, Joshveer Grewal, Ritvik Ellendula +Link +Those experiencing strokes, traumatic brain injuries, and drug complications can often end up hospitalized and diagnosed with coma or locked-in syndrome. Such mental impediments can permanently alter the neurological pathways in work and significantly decrease the quality of life (QoL). It is critical to translate brain signals into images to gain a deeper understanding of the thoughts of a comatose patient. Traditionally, brain signals collected by an EEG could only be translated into text, but with the novel method of an open-source model available on GitHub, DreamDiffusion can be used to convert brain waves into images directly. DreamDiffusion works by extracting features from EEG signals and then using the features to create images through StableDiffusion. Upon this, we made further improvements that could make StableDiffusion the forerunner technology in waves to media translation. In our study, we begin by modifying the existing DreamDiffusion codebase so that it does not require any prior setup, avoiding any confusing steps needed to run the model from GitHub. For many researchers, the incomplete setup process, errors in the existing code, and a lack of directions made it nearly impossible to run, not even considering the model's performance. We brought the code into Google Colab so users could run and evaluate problems cell-by-cell, eliminating the specific file and repository dependencies. We also provided the original training data file so users do not need to purchase the necessary computing power to train the model from the given dataset. The second change is utilizing the mutability of the code and optimizing the model so it can be used to generate images from other given inputs, such as sleep data. Additionally, the affordability of EEG technology allows for global dissemination and creates the opportunity for those who want to work on the shared DreamDiffusion model. + + 2024-07-02 Adaptive Modality Balanced Online Knowledge Distillation for Brain-Eye-Computer based Dim Object Detection Zixing Li, Chao Yan, Zhen Lan, Dengqing Tang, Xiaojia Xiang, Han Zhou, Jun Lai @@ -141,34 +169,6 @@ Link Integrating prior knowledge of neurophysiology into neural network architecture enhances the performance of emotion decoding. While numerous techniques emphasize learning spatial and short-term temporal patterns, there has been limited emphasis on capturing the vital long-term contextual information associated with emotional cognitive processes. In order to address this discrepancy, we introduce a novel transformer model called emotion transformer (EmT). EmT is designed to excel in both generalized cross-subject EEG emotion classification and regression tasks. In EmT, EEG signals are transformed into a temporal graph format, creating a sequence of EEG feature graphs using a temporal graph construction module (TGC). A novel residual multi-view pyramid GCN module (RMPG) is then proposed to learn dynamic graph representations for each EEG feature graph within the series, and the learned representations of each graph are fused into one token. Furthermore, we design a temporal contextual transformer module (TCT) with two types of token mixers to learn the temporal contextual information. Finally, the task-specific output module (TSO) generates the desired outputs. Experiments on four publicly available datasets show that EmT achieves higher results than the baseline methods for both EEG emotion classification and regression tasks. The code is available at https://github.com/yi-ding-cs/EmT. - -2024-06-26 -Online Learning of Multiple Tasks and Their Relationships : Testing on Spam Email Data and EEG Signals Recorded in Construction Fields -Yixin Jin, Wenjing Zhou, Meiqi Wang, Meng Li, Xintao Li, Tianyu Hu, Xingyuan Bu -Link -This paper examines an online multi-task learning (OMTL) method, which processes data sequentially to predict labels across related tasks. The framework learns task weights and their relatedness concurrently. Unlike previous models that assumed static task relatedness, our approach treats tasks as initially independent, updating their relatedness iteratively using newly calculated weight vectors. We introduced three rules to update the task relatedness matrix: OMTLCOV, OMTLLOG, and OMTLVON, and compared them against a conventional method (CMTL) that uses a fixed relatedness value. Performance evaluations on three datasets a spam dataset and two EEG datasets from construction workers under varying conditions demonstrated that our OMTL methods outperform CMTL, improving accuracy by 1\% to 3\% on EEG data, and maintaining low error rates around 12\% on the spam dataset. - - -2024-06-25 -SincVAE: a New Approach to Improve Anomaly Detection on EEG Data Using SincNet and Variational Autoencoder -Andrea Pollastro, Francesco Isgrò, Roberto Prevete -Link -Over the past few decades, electroencephalography (EEG) monitoring has become a pivotal tool for diagnosing neurological disorders, particularly for detecting seizures. Epilepsy, one of the most prevalent neurological diseases worldwide, affects approximately the 1 \% of the population. These patients face significant risks, underscoring the need for reliable, continuous seizure monitoring in daily life. Most of the techniques discussed in the literature rely on supervised Machine Learning (ML) methods. However, the challenge of accurately labeling variations in epileptic EEG waveforms complicates the use of these approaches. Additionally, the rarity of ictal events introduces an high imbalancing within the data, which could lead to poor prediction performance in supervised learning approaches. Instead, a semi-supervised approach allows to train the model only on data not containing seizures, thus avoiding the issues related to the data imbalancing. This work proposes a semi-supervised approach for detecting epileptic seizures from EEG data, utilizing a novel Deep Learning-based method called SincVAE. This proposal incorporates the learning of an ad-hoc array of bandpass filter as a first layer of a Variational Autoencoder (VAE), potentially eliminating the preprocessing stage where informative band frequencies are identified and isolated. Results indicate that SincVAE improves seizure detection in EEG data and is capable of identifying early seizures during the preictal stage as well as monitoring patients throughout the postictal stage. - - -2024-06-24 -Stationary and Sparse Denoising Approach for Corticomuscular Causality Estimation -Farwa Abbas, Verity McClelland, Zoran Cvetkovic, Wei Dai -Link -Objective: Cortico-muscular communication patterns are instrumental in understanding movement control. Estimating significant causal relationships between motor cortex electroencephalogram (EEG) and surface electromyogram (sEMG) from concurrently active muscles presents a formidable challenge since the relevant processes underlying muscle control are typically weak in comparison to measurement noise and background activities. Methodology: In this paper, a novel framework is proposed to simultaneously estimate the order of the autoregressive model of cortico-muscular interactions along with the parameters while enforcing stationarity condition in a convex program to ensure global optimality. The proposed method is further extended to a non-convex program to account for the presence of measurement noise in the recorded signals by introducing a wavelet sparsity assumption on the excitation noise in the model. Results: The proposed methodology is validated using both simulated data and neurophysiological signals. In case of simulated data, the performance of the proposed methods has been compared with the benchmark approaches in terms of order identification, computational efficiency, and goodness of fit in relation to various noise levels. In case of physiological signals our proposed methods are compared against the state-of-the-art approaches in terms of the ability to detect Granger causality. Significance: The proposed methods are shown to be effective in handling stationarity and measurement noise assumptions, revealing significant causal interactions from brain to muscles and vice versa. - - -2024-06-26 -Multimodal Physiological Signals Representation Learning via Multiscale Contrasting for Depression Recognition -Kai Shao, Rui Wang, Yixue Hao, Long Hu, Min Chen, Hans Arno Jacobsen -Link -Depression recognition based on physiological signals such as functional near-infrared spectroscopy (fNIRS) and electroencephalogram (EEG) has made considerable progress. However, most existing studies ignore the complementarity and semantic consistency of multimodal physiological signals under the same stimulation task in complex spatio-temporal patterns. In this paper, we introduce a multimodal physiological signals representation learning framework using Siamese architecture via multiscale contrasting for depression recognition (MRLMC). First, fNIRS and EEG are transformed into different but correlated data based on a time-domain data augmentation strategy. Then, we design a spatio-temporal contrasting module to learn the representation of fNIRS and EEG through weight-sharing multiscale spatio-temporal convolution. Furthermore, to enhance the learning of semantic representation associated with stimulation tasks, a semantic consistency contrast module is proposed, aiming to maximize the semantic similarity of fNIRS and EEG. Extensive experiments on publicly available and self-collected multimodal physiological signals datasets indicate that MRLMC outperforms the state-of-the-art models. Moreover, our proposed framework is capable of transferring to multimodal time series downstream tasks. - @@ -186,6 +186,13 @@ +2024-07-03 +EDPNet: An Efficient Dual Prototype Network for Motor Imagery EEG Decoding +Can Han, Chen Liu, Crystal Cai, Jun Wang, Dahong Qian +Link +Motor imagery electroencephalograph (MI-EEG) decoding plays a crucial role in developing motor imagery brain-computer interfaces (MI-BCIs). However, decoding intentions from MI remains challenging due to the inherent complexity of EEG signals relative to the small-sample size. In this paper, we propose an Efficient Dual Prototype Network (EDPNet) to enable accurate and fast MI decoding. EDPNet employs a lightweight adaptive spatial-spectral fusion module, which promotes more efficient information fusion between multiple EEG electrodes. Subsequently, a parameter-free multi-scale variance pooling module extracts more comprehensive temporal features. Furthermore, we introduce dual prototypical learning to optimize the feature space distribution and training process, thereby improving the model's generalization ability on small-sample MI datasets. Our experimental results show that the EDPNet outperforms state-of-the-art models with superior classification accuracy and kappa values (84.11% and 0.7881 for dataset BCI competition IV 2a, 86.65% and 0.7330 for dataset BCI competition IV 2b). Additionally, we use the BCI competition III IVa dataset with fewer training data to further validate the generalization ability of the proposed EDPNet. We also achieve superior performance with 82.03% classification accuracy. Benefiting from the lightweight parameters and superior decoding accuracy, our EDPNet shows great potential for MI-BCI applications. The code is publicly available at https://github.com/hancan16/EDPNet. + + 2024-06-26 L-Sort: An Efficient Hardware for Real-time Multi-channel Spike Sorting with Localization Yuntao Han, Shiwei Wang, Alister Hamilton @@ -248,13 +255,6 @@ Link Effective teaching relies on knowing what students know-or think they know. Revealing student thinking is challenging. Often used because of their ease of grading, even the best multiple choice (MC) tests, those using research based distractors (wrong answers) are intrinsically limited in the insights they provide due to two factors. When distractors do not reflect student beliefs they can be ignored, increasing the likelihood that the correct answer will be chosen by chance. Moreover, making the correct choice does not guarantee that the student understands why it is correct. To address these limitations, we recommend asking students to explain why they chose their answer, and why "wrong" choices are wrong. Using a discipline-trained artificial intelligence-based bot it is possible to analyze their explanations, identifying the concepts and scientific principles that maybe missing or misapplied. The bot also makes suggestions for how instructors can use these data to better guide student thinking. In a small "proof of concept" study, we tested this approach using questions from the Biology Concepts Instrument (BCI). The result was rapid, informative, and provided actionable feedback on student thinking. It appears that the use of AI addresses the weaknesses of conventional MC test. It seems likely that incorporating AI-analyzed formative assessments will lead to improved overall learning outcomes. - -2024-06-05 -Enhancing Computational Efficiency of Motor Imagery BCI Classification with Block-Toeplitz Augmented Covariance Matrices and Siegel Metric -Igor Carrara, Theodore Papadopoulo -Link -Electroencephalographic signals are represented as multidimensional datasets. We introduce an enhancement to the augmented covariance method (ACM), exploiting more thoroughly its mathematical properties, in order to improve motor imagery classification.Standard ACM emerges as a combination of phase space reconstruction of dynamical systems and of Riemannian geometry. Indeed, it is based on the construction of a Symmetric Positive Definite matrix to improve classification. But this matrix also has a Block-Toeplitz structure that was previously ignored. This work treats such matrices in the real manifold to which they belong: the set of Block-Toeplitz SPD matrices. After some manipulation, this set is can be seen as the product of an SPD manifold and a Siegel Disk Space.The proposed methodology was tested using the MOABB framework with a within-session evaluation procedure. It achieves a similar classification performance to ACM, which is typically better than -- or at worse comparable to -- state-of-the-art methods. But, it also improves consequently the computational efficiency over ACM, making it even more suitable for real time experiments. - @@ -273,6 +273,13 @@ 2024-07-03 +MHNet: Multi-view High-order Network for Diagnosing Neurodevelopmental Disorders Using Resting-state fMRI +Yueyang Li, Weiming Zeng, Wenhao Dong, Luhui Cai, Lei Wang, Hongyu Chen, Hongjie Yan, Lingbin Bian, Nizhuan Wang +Link +Background: Deep learning models have shown promise in diagnosing neurodevelopmental disorders (NDD) like ASD and ADHD. However, many models either use graph neural networks (GNN) to construct single-level brain functional networks (BFNs) or employ spatial convolution filtering for local information extraction from rs-fMRI data, often neglecting high-order features crucial for NDD classification. Methods: We introduce a Multi-view High-order Network (MHNet) to capture hierarchical and high-order features from multi-view BFNs derived from rs-fMRI data for NDD prediction. MHNet has two branches: the Euclidean Space Features Extraction (ESFE) module and the Non-Euclidean Space Features Extraction (Non-ESFE) module, followed by a Feature Fusion-based Classification (FFC) module for NDD identification. ESFE includes a Functional Connectivity Generation (FCG) module and a High-order Convolutional Neural Network (HCNN) module to extract local and high-order features from BFNs in Euclidean space. Non-ESFE comprises a Generic Internet-like Brain Hierarchical Network Generation (G-IBHN-G) module and a High-order Graph Neural Network (HGNN) module to capture topological and high-order features in non-Euclidean space. Results: Experiments on three public datasets show that MHNet outperforms state-of-the-art methods using both AAL1 and Brainnetome Atlas templates. Extensive ablation studies confirm the superiority of MHNet and the effectiveness of using multi-view fMRI information and high-order features. Our study also offers atlas options for constructing more sophisticated hierarchical networks and explains the association between key brain regions and NDD. Conclusion: MHNet leverages multi-view feature learning from both Euclidean and non-Euclidean spaces, incorporating high-order information from BFNs to enhance NDD classification performance. + + +2024-07-03 Deconvolving Complex Neuronal Networks into Interpretable Task-Specific Connectomes Yifan Wang, Vikram Ravindra, Ananth Grama Link @@ -334,13 +341,6 @@ Link Task-based fMRI uses actions or stimuli to trigger task-specific brain responses and measures them using BOLD contrast. Despite the significant task-induced spatiotemporal brain activation fluctuations, most studies on task-based fMRI ignore the task context information aligned with fMRI and consider task-based fMRI a coherent sequence. In this paper, we show that using the task structures as data-driven guidance is effective for spatiotemporal analysis. We propose STNAGNN, a GNN-based spatiotemporal architecture, and validate its performance in an autism classification task. The trained model is also interpreted for identifying autism-related spatiotemporal brain biomarkers. - -2024-06-13 -Oblivious subspace embeddings for compressed Tucker decompositions -Matthew Pietrosanu, Bei Jiang, Linglong Kong -Link -Emphasis in the tensor literature on random embeddings (tools for low-distortion dimension reduction) for the canonical polyadic (CP) tensor decomposition has left analogous results for the more expressive Tucker decomposition comparatively lacking. This work establishes general Johnson-Lindenstrauss (JL) type guarantees for the estimation of Tucker decompositions when an oblivious random embedding is applied along each mode. When these embeddings are drawn from a JL-optimal family, the decomposition can be estimated within $\varepsilon$ relative error under restrictions on the embedding dimension that are in line with recent CP results. We implement a higher-order orthogonal iteration (HOOI) decomposition algorithm with random embeddings to demonstrate the practical benefits of this approach and its potential to improve the accessibility of otherwise prohibitive tensor analyses. On moderately large face image and fMRI neuroimaging datasets, empirical results show that substantial dimension reduction is possible with minimal increase in reconstruction error relative to traditional HOOI ($\leq$5% larger error, 50%-60% lower computation time for large models with 50% dimension reduction along each mode). Especially for large tensors, our method outperforms traditional higher-order singular value decomposition (HOSVD) and recently proposed TensorSketch methods. - @@ -358,6 +358,13 @@ +2024-07-03 +Mobile Edge Generation-Enabled Digital Twin: Architecture Design and Research Opportunities +Xiaoxia Xu, Ruikang Zhong, Xidong Mu, Yuanwei Liu, Kaibin Huang +Link +A novel paradigm of mobile edge generation (MEG)-enabled digital twin (DT) is proposed, which enables distributed on-device generation at mobile edge networks for real-time DT applications. First, an MEG-DT architecture is put forward to decentralize generative artificial intelligence (GAI) models onto edge servers (ESs) and user equipments (UEs), which has the advantages of low latency, privacy preservation, and individual-level customization. Then, various single-user and multi-user generation mechanisms are conceived for MEG-DT, which strike trade-offs between generation latency, hardware costs, and device coordination. Furthermore, to perform efficient distributed generation, two operating protocols are explored for transmitting interpretable and latent features between ESs and UEs, namely sketch-based generation and seed-based generation, respectively. Based on the proposed protocols, the convergence between MEG and DT are highlighted. Considering the seed-based image generation scenario, numerical case studies are provided to reveal the superiority of MEG-DT over centralized generation. Finally, promising applications and research opportunities are identified. + + 2024-06-11 EEG-ImageNet: An Electroencephalogram Dataset and Benchmarks with Image Visual Stimuli of Multi-Granularity Labels Shuqi Zhu, Ziyi Ye, Qingyao Ai, Yiqun Liu @@ -420,13 +427,6 @@ Link EEG slowing is reported in various neurological disorders including Alzheimer's, Parkinson's and Epilepsy. Here, we investigate alpha rhythm slowing in individuals with refractory temporal lobe epilepsy (TLE), compared to healthy controls, using scalp electroencephalography (EEG) and magnetoencephalography (MEG). We retrospectively analysed data from 17,(46) healthy controls and 22,(24) individuals with TLE who underwent scalp EEG and (MEG) recordings as part of presurgical evaluation. Resting-state, eyes-closed recordings were source reconstructed using the standardized low-resolution brain electrographic tomography (sLORETA) method. We extracted low (slow) 6-9 Hz and high (fast) 10-11 Hz alpha relative band power and calculated the alpha power ratio by dividing low (slow) alpha by high (fast) alpha. This ratio was computed for all brain regions in all individuals. Alpha oscillations were slower in individuals with TLE than controls (p<0.05). This effect was present in both the ipsilateral and contralateral hemispheres, and across widespread brain regions. Alpha slowing in TLE was found in both EEG and MEG recordings. We interpret greater low (slow)-alpha as greater deviation from health. - -2024-04-14 -Foundational GPT Model for MEG -Richard Csaky, Mats W. J. van Es, Oiwi Parker Jones, Mark Woolrich -Link -Deep learning techniques can be used to first training unsupervised models on large amounts of unlabelled data, before fine-tuning the models on specific tasks. This approach has seen massive success for various kinds of data, e.g. images, language, audio, and holds the promise of improving performance in various downstream tasks (e.g. encoding or decoding brain data). However, there has been limited progress taking this approach for modelling brain signals, such as Magneto-/electroencephalography (M/EEG). Here we propose two classes of deep learning foundational models that can be trained using forecasting of unlabelled MEG. First, we consider a modified Wavenet; and second, we consider a modified Transformer-based (GPT2) model. The modified GPT2 includes a novel application of tokenisation and embedding methods, allowing a model developed initially for the discrete domain of language to be applied to continuous multichannel time series data. We also extend the forecasting framework to include condition labels as inputs, enabling better modelling (encoding) of task data. We compare the performance of these deep learning models with standard linear autoregressive (AR) modelling on MEG data. This shows that GPT2-based models provide better modelling capabilities than Wavenet and linear AR models, by better reproducing the temporal, spatial and spectral characteristics of real data and evoked activity in task data. We show how the GPT2 model scales well to multiple subjects, while adapting its model to each subject through subject embedding. Finally, we show how such a model can be useful in downstream decoding tasks through data simulation. All code is available on GitHub (https://github.com/ricsinaruto/MEG-transfer-decoding). - @@ -516,74 +516,74 @@ -2024-07-02 -MMedAgent: Learning to Use Medical Tools with Multi-modal Agent -Binxu Li, Tiankai Yan, Yuanting Pan, Zhe Xu, Jie Luo, Ruiyang Ji, Shilong Liu, Haoyu Dong, Zihao Lin, Yixin Wang -Link -Multi-Modal Large Language Models (MLLMs), despite being successful, exhibit limited generality and often fall short when compared to specialized models. Recently, LLM-based agents have been developed to address these challenges by selecting appropriate specialized models as tools based on user inputs. However, such advancements have not been extensively explored within the medical domain. To bridge this gap, this paper introduces the first agent explicitly designed for the medical field, named \textbf{M}ulti-modal \textbf{Med}ical \textbf{Agent} (MMedAgent). We curate an instruction-tuning dataset comprising six medical tools solving seven tasks, enabling the agent to choose the most suitable tools for a given task. Comprehensive experiments demonstrate that MMedAgent achieves superior performance across a variety of medical tasks compared to state-of-the-art open-source methods and even the closed-source model, GPT-4o. Furthermore, MMedAgent exhibits efficiency in updating and integrating new medical tools. +2024-07-03 +Accelerated Proton Resonance Frequency-based Magnetic Resonance Thermometry by Optimized Deep Learning Method +Sijie Xu, Shenyan Zong, Chang-Sheng Mei, Guofeng Shen, Yueran Zhao, He Wang +Link +Proton resonance frequency (PRF) based MR thermometry is essential for focused ultrasound (FUS) thermal ablation therapies. This work aims to enhance temporal resolution in dynamic MR temperature map reconstruction using an improved deep learning method. The training-optimized methods and five classical neural networks were applied on the 2-fold and 4-fold under-sampling k-space data to reconstruct the temperature maps. The enhanced training modules included offline/online data augmentations, knowledge distillation, and the amplitude-phase decoupling loss function. The heating experiments were performed by a FUS transducer on phantom and ex vivo tissues, respectively. These data were manually under-sampled to imitate acceleration procedures and trained in our method to get the reconstruction model. The additional dozen or so testing datasets were separately obtained for evaluating the real-time performance and temperature accuracy. Acceleration factors of 1.9 and 3.7 were found for 2 times and 4 times k-space under-sampling strategies and the ResUNet-based deep learning reconstruction performed exceptionally well. In 2-fold acceleration scenario, the RMSE of temperature map patches provided the values of 0.888 degree centigrade and 1.145 degree centigrade on phantom and ex vivo testing datasets. The DICE value of temperature areas enclosed by 43 degree centigrade isotherm was 0.809, and the Bland-Altman analysis showed a bias of -0.253 degree centigrade with the apart of plus or minus 2.16 degree centigrade. In 4 times under-sampling case, these evaluating values decreased by approximately 10%. This study demonstrates that deep learning-based reconstruction can significantly enhance the accuracy and efficiency of MR thermometry for clinical FUS thermal therapies. -2024-07-02 -Evaluating the Robustness of Adverse Drug Event Classification Models Using Templates -Dorothea MacPhail, David Harbecke, Lisa Raithel, Sebastian Möller -Link -An adverse drug effect (ADE) is any harmful event resulting from medical drug treatment. Despite their importance, ADEs are often under-reported in official channels. Some research has therefore turned to detecting discussions of ADEs in social media. Impressive results have been achieved in various attempts to detect ADEs. In a high-stakes domain such as medicine, however, an in-depth evaluation of a model's abilities is crucial. We address the issue of thorough performance evaluation in English-language ADE detection with hand-crafted templates for four capabilities: Temporal order, negation, sentiment, and beneficial effect. We find that models with similar performance on held-out test sets have varying results on these capabilities. +2024-07-03 +Biomechanics-informed Non-rigid Medical Image Registration and its Inverse Material Property Estimation with Linear and Nonlinear Elasticity +Zhe Min, Zachary M. C. Baum, Shaheer U. Saeed, Mark Emberton, Dean C. Barratt, Zeike A. Taylor, Yipeng Hu +Link +This paper investigates both biomechanical-constrained non-rigid medical image registrations and accurate identifications of material properties for soft tissues, using physics-informed neural networks (PINNs). The complex nonlinear elasticity theory is leveraged to formally establish the partial differential equations (PDEs) representing physics laws of biomechanical constraints that need to be satisfied, with which registration and identification tasks are treated as forward (i.e., data-driven solutions of PDEs) and inverse (i.e., parameter estimation) problems under PINNs respectively. Two net configurations (i.e., Cfg1 and Cfg2) have also been compared for both linear and nonlinear physics model. Two sets of experiments have been conducted, using pairs of undeformed and deformed MR images from clinical cases of prostate cancer biopsy. Our contributions are summarised as follows. 1) We developed a learning-based biomechanical-constrained non-rigid registration algorithm using PINNs, where linear elasticity is generalised to the nonlinear version. 2) We demonstrated extensively that nonlinear elasticity shows no statistical significance against linear models in computing point-wise displacement vectors but their respective benefits may depend on specific patients, with finite-element (FE) computed ground-truth. 3) We formulated and solved the inverse parameter estimation problem, under the joint optimisation scheme of registration and parameter identification using PINNs, whose solutions can be accurately found by locating saddle points. -2024-07-02 -Enable the Right to be Forgotten with Federated Client Unlearning in Medical Imaging -Zhipeng Deng, Luyang Luo, Hao Chen -Link -The right to be forgotten, as stated in most data regulations, poses an underexplored challenge in federated learning (FL), leading to the development of federated unlearning (FU). However, current FU approaches often face trade-offs between efficiency, model performance, forgetting efficacy, and privacy preservation. In this paper, we delve into the paradigm of Federated Client Unlearning (FCU) to guarantee a client the right to erase the contribution or the influence, introducing the first FU framework in medical imaging. In the unlearning process of a client, the proposed model-contrastive unlearning marks a pioneering step towards feature-level unlearning, and frequency-guided memory preservation ensures smooth forgetting of local knowledge while maintaining the generalizability of the trained global model, thus avoiding performance compromises and guaranteeing rapid post-training. We evaluated our FCU framework on two public medical image datasets, including Intracranial hemorrhage diagnosis and skin lesion diagnosis, demonstrating that our framework outperformed other state-of-the-art FU frameworks, with an expected speed-up of 10-15 times compared with retraining from scratch. The code and the organized datasets can be found at: https://github.com/dzp2095/FCU. +2024-07-03 +MVGT: A Multi-view Graph Transformer Based on Spatial Relations for EEG Emotion Recognition +Yanjie Cui, Xiaohong Liu, Jing Liang, Yamin Fu +Link +Electroencephalography (EEG), a medical imaging technique that captures scalp electrical activity of brain structures via electrodes, has been widely used in affective computing. The spatial domain of EEG is rich in affective information.However, few of the existing studies have simultaneously analyzed EEG signals from multiple perspectives of geometric and anatomical structures in spatial domain. In this paper, we propose a multi-view Graph Transformer (MVGT) based on spatial relations, which integrates information from the temporal, frequency and spatial domains, including geometric and anatomical structures, so as to enhance the expressive power of the model comprehensively.We incorporate the spatial information of EEG channels into the model as encoding, thereby improving its ability to perceive the spatial structure of the channels. Meanwhile, experimental results based on publicly available datasets demonstrate that our proposed model outperforms state-of-the-art methods in recent years. In addition, the results also show that the MVGT could extract information from multiple domains and capture inter-channel relationships in EEG emotion recognition tasks effectively. -2024-07-02 -CALICO: Confident Active Learning with Integrated Calibration -Lorenzo S. Querol, Hajime Nagahara, Hideaki Hayashi -Link -The growing use of deep learning in safety-critical applications, such as medical imaging, has raised concerns about limited labeled data, where this demand is amplified as model complexity increases, posing hurdles for domain experts to annotate data. In response to this, active learning (AL) is used to efficiently train models with limited annotation costs. In the context of deep neural networks (DNNs), AL often uses confidence or probability outputs as a score for selecting the most informative samples. However, modern DNNs exhibit unreliable confidence outputs, making calibration essential. We propose an AL framework that self-calibrates the confidence used for sample selection during the training process, referred to as Confident Active Learning with Integrated CalibratiOn (CALICO). CALICO incorporates the joint training of a classifier and an energy-based model, instead of the standard softmax-based classifier. This approach allows for simultaneous estimation of the input data distribution and the class probabilities during training, improving calibration without needing an additional labeled dataset. Experimental results showcase improved classification performance compared to a softmax-based classifier with fewer labeled samples. Furthermore, the calibration stability of the model is observed to depend on the prior class distribution of the data. +2024-07-03 +SemioLLM: Assessing Large Language Models for Semiological Analysis in Epilepsy Research +Meghal Dani, Muthu Jeyanthi Prakash, Zeynep Akata, Stefanie Liebe +Link +Large Language Models have shown promising results in their ability to encode general medical knowledge in standard medical question-answering datasets. However, their potential application in clinical practice requires evaluation in domain-specific tasks, where benchmarks are largely missing. In this study semioLLM, we test the ability of state-of-the-art LLMs (GPT-3.5, GPT-4, Mixtral 8x7B, and Qwen-72chat) to leverage their internal knowledge and reasoning for epilepsy diagnosis. Specifically, we obtain likelihood estimates linking unstructured text descriptions of seizures to seizure-generating brain regions, using an annotated clinical database containing 1269 entries. We evaluate the LLM's performance, confidence, reasoning, and citation abilities in comparison to clinical evaluation. Models achieve above-chance classification performance with prompt engineering significantly improving their outcome, with some models achieving close-to-clinical performance and reasoning. However, our analyses also reveal significant pitfalls with several models being overly confident while showing poor performance, as well as exhibiting citation errors and hallucinations. In summary, our work provides the first extensive benchmark comparing current SOTA LLMs in the medical domain of epilepsy and highlights their ability to leverage unstructured texts from patients' medical history to aid diagnostic processes in health care. 2024-07-03 -FedIA: Federated Medical Image Segmentation with Heterogeneous Annotation Completeness -Yangyang Xiang, Nannan Wu, Li Yu, Xin Yang, Kwang-Ting Cheng, Zengqiang Yan -Link -Federated learning has emerged as a compelling paradigm for medical image segmentation, particularly in light of increasing privacy concerns. However, most of the existing research relies on relatively stringent assumptions regarding the uniformity and completeness of annotations across clients. Contrary to this, this paper highlights a prevalent challenge in medical practice: incomplete annotations. Such annotations can introduce incorrectly labeled pixels, potentially undermining the performance of neural networks in supervised learning. To tackle this issue, we introduce a novel solution, named FedIA. Our insight is to conceptualize incomplete annotations as noisy data (i.e., low-quality data), with a focus on mitigating their adverse effects. We begin by evaluating the completeness of annotations at the client level using a designed indicator. Subsequently, we enhance the influence of clients with more comprehensive annotations and implement corrections for incomplete ones, thereby ensuring that models are trained on accurate data. Our method's effectiveness is validated through its superior performance on two extensively used medical image segmentation datasets, outperforming existing solutions. The code is available at https://github.com/HUSTxyy/FedIA. +MedPix 2.0: A Comprehensive Multimodal Biomedical Dataset for Advanced AI Applications +Irene Siragusa, Salvatore Contino, Massimo La Ciura, Rosario Alicata, Roberto Pirrone +Link +The increasing interest in developing Artificial Intelligence applications in the medical domain, suffers from the lack of high-quality dataset, mainly due to privacy-related issues. Moreover, the recent rising of Multimodal Large Language Models (MLLM) leads to a need for multimodal medical datasets, where clinical reports and findings are attached to the corresponding CT or MR scans. This paper illustrates the entire workflow for building the data set MedPix 2.0. Starting from the well-known multimodal dataset MedPix\textsuperscript{\textregistered}, mainly used by physicians, nurses and healthcare students for Continuing Medical Education purposes, a semi-automatic pipeline was developed to extract visual and textual data followed by a manual curing procedure where noisy samples were removed, thus creating a MongoDB database. Along with the dataset, we developed a GUI aimed at navigating efficiently the MongoDB instance, and obtaining the raw data that can be easily used for training and/or fine-tuning MLLMs. To enforce this point, we also propose a CLIP-based model trained on MedPix 2.0 for scan classification tasks. 2024-07-03 -Federated Distillation for Medical Image Classification: Towards Trustworthy Computer-Aided Diagnosis -Sufen Ren, Yule Hu, Shengchao Chen, Guanjun Wang -Link -Medical image classification plays a crucial role in computer-aided clinical diagnosis. While deep learning techniques have significantly enhanced efficiency and reduced costs, the privacy-sensitive nature of medical imaging data complicates centralized storage and model training. Furthermore, low-resource healthcare organizations face challenges related to communication overhead and efficiency due to increasing data and model scales. This paper proposes a novel privacy-preserving medical image classification framework based on federated learning to address these issues, named FedMIC. The framework enables healthcare organizations to learn from both global and local knowledge, enhancing local representation of private data despite statistical heterogeneity. It provides customized models for organizations with diverse data distributions while minimizing communication overhead and improving efficiency without compromising performance. Our FedMIC enhances robustness and practical applicability under resource-constrained conditions. We demonstrate FedMIC's effectiveness using four public medical image datasets for classical medical image classification tasks. +IM-MoCo: Self-supervised MRI Motion Correction using Motion-Guided Implicit Neural Representations +Ziad Al-Haj Hemidi, Christian Weihsbach, Mattias P. Heinrich +Link +Motion artifacts in Magnetic Resonance Imaging (MRI) arise due to relatively long acquisition times and can compromise the clinical utility of acquired images. Traditional motion correction methods often fail to address severe motion, leading to distorted and unreliable results. Deep Learning (DL) alleviated such pitfalls through generalization with the cost of vanishing structures and hallucinations, making it challenging to apply in the medical field where hallucinated structures can tremendously impact the diagnostic outcome. In this work, we present an instance-wise motion correction pipeline that leverages motion-guided Implicit Neural Representations (INRs) to mitigate the impact of motion artifacts while retaining anatomical structure. Our method is evaluated using the NYU fastMRI dataset with different degrees of simulated motion severity. For the correction alone, we can improve over state-of-the-art image reconstruction methods by $+5\%$ SSIM, $+5\:db$ PSNR, and $+14\%$ HaarPSI. Clinical relevance is demonstrated by a subsequent experiment, where our method improves classification outcomes by at least $+1.5$ accuracy percentage points compared to motion-corrupted images. -2024-07-02 -Towards a Holistic Framework for Multimodal Large Language Models in Three-dimensional Brain CT Report Generation -Cheng-Yi Li, Kao-Jung Chang, Cheng-Fu Yang, Hsin-Yu Wu, Wenting Chen, Hritik Bansal, Ling Chen, Yi-Ping Yang, Yu-Chun Chen, Shih-Pin Chen, Jiing-Feng Lirng, Kai-Wei Chang, Shih-Hwa Chiou -Link -Multi-modal large language models (MLLMs) have been given free rein to explore exciting medical applications with a primary focus on radiology report generation. Nevertheless, the preliminary success in 2D radiology captioning is incompetent to reflect the real-world diagnostic challenge in the volumetric 3D anatomy. To mitigate three crucial limitation aspects in the existing literature, including (1) data complexity, (2) model capacity, and (3) evaluation metric fidelity, we collected an 18,885 text-scan pairs 3D-BrainCT dataset and applied clinical visual instruction tuning (CVIT) to train BrainGPT models to generate radiology-adherent 3D brain CT reports. Statistically, our BrainGPT scored BLEU-1 = 44.35, BLEU-4 = 20.38, METEOR = 30.13, ROUGE-L = 47.6, and CIDEr-R = 211.77 during internal testing and demonstrated an accuracy of 0.91 in captioning midline shifts on the external validation CQ500 dataset. By further inspecting the captioned report, we reported that the traditional metrics appeared to measure only the surface text similarity and failed to gauge the information density of the diagnostic purpose. To close this gap, we proposed a novel Feature-Oriented Radiology Task Evaluation (FORTE) to estimate the report's clinical relevance (lesion feature and landmarks). Notably, the BrainGPT model scored an average FORTE F1-score of 0.71 (degree=0.661; landmark=0.706; feature=0.693; impression=0.779). To demonstrate that BrainGPT models possess objective readiness to generate human-like radiology reports, we conducted a Turing test that enrolled 11 physician evaluators, and around 74% of the BrainGPT-generated captions were indistinguishable from those written by humans. Our work embodies a holistic framework that showcased the first-hand experience of curating a 3D brain CT dataset, fine-tuning anatomy-sensible language models, and proposing robust radiology evaluation metrics. +2024-07-03 +Information Greenhouse: Optimal Persuasion for Medical Test-Avoiders +Zhuo Chen +Link +Patients often delay or reject medical tests due to information avoidance, which hinders timely reception of necessary treatments. This paper studies the optimal information policy to persuade an information-avoidant patient to undergo the test and make the best choice that maximizes his health. The patient sequentially decides whether to take the test and the optimal treatment plan. The information provided is about the background knowledge of the disease, which is complementary with the test result, and disclosure can take place both before and after the test decision. The optimal information policy depends on whether the patient is willing to be tested when he is completely pessimistic. If so, the optimal policy features \textit{preemptive warning}: the disclosure only takes place before the test, and the bad news guarantees the patient to be tested and be treated even without further information. If not, the optimal policy constructs an \textit{information greenhouse}: an information structure that provides high anticipatory utility is committed when the patient is tested and the test result is bad. I consider extensions to general information preference and ex ante participation constraint. -2024-07-02 -Interplay between MRI-based axon diameter and myelination estimates in macaque and human brain -Ting Gong, Chiara Maffei, Evan Dann, Hong-Hsi Lee, Hansol Lee, Jean C. Augustinack, Susie Y. Huang, Suzanne N. Haber, Anastasia Yendiki -Link -Axon diameter and myelin thickness are closely related microstructural tissue properties that affect the conduction velocity of action potentials in the nervous system. Imaging them non-invasively with MRI-based methods is thus valuable for studying brain microstructure and function. However, the relationship between MRI-based axon diameter and myelination measures has not been investigated across the brain, mainly due to methodological limitations in estimating axon diameters. In recent years, studies using ultra-high gradient strength diffusion MRI (dMRI) have demonstrated improved estimation of axon diameter across white-matter (WM) tracts in the human brain, making such investigations feasible. In this study, we aim to investigate relationships between tissue microstructure properties with MRI-based methods and compare the imaging findings to histological evidence from the literature. We collected dMRI with ultra-high gradient strength and multi-echo spin-echo MRI on ex vivo macaque and human brain samples on a preclinical scanner. From these data, we estimated axon diameter, intra-axonal signal fraction, myelin water fraction (MWF) and aggregate g-ratio and investigated their correlations. We found that the microstructural imaging parameters exhibited consistent patterns across WM tracts and species. Overall, the findings suggest that MRI-based axon geometry and myelination measures can provide complementary information about fiber morphology, and the relationships between these measures agree with prior histological evidence. +2024-07-03 +An Uncertainty-guided Tiered Self-training Framework for Active Source-free Domain Adaptation in Prostate Segmentation +Zihao Luo, Xiangde Luo, Zijun Gao, Guotai Wang +Link +Deep learning models have exhibited remarkable efficacy in accurately delineating the prostate for diagnosis and treatment of prostate diseases, but challenges persist in achieving robust generalization across different medical centers. Source-free Domain Adaptation (SFDA) is a promising technique to adapt deep segmentation models to address privacy and security concerns while reducing domain shifts between source and target domains. However, recent literature indicates that the performance of SFDA remains far from satisfactory due to unpredictable domain gaps. Annotating a few target domain samples is acceptable, as it can lead to significant performance improvement with a low annotation cost. Nevertheless, due to extremely limited annotation budgets, careful consideration is needed in selecting samples for annotation. Inspired by this, our goal is to develop Active Source-free Domain Adaptation (ASFDA) for medical image segmentation. Specifically, we propose a novel Uncertainty-guided Tiered Self-training (UGTST) framework, consisting of efficient active sample selection via entropy-based primary local peak filtering to aggregate global uncertainty and diversity-aware redundancy filter, coupled with a tiered self-learning strategy, achieves stable domain adaptation. Experimental results on cross-center prostate MRI segmentation datasets revealed that our method yielded marked advancements, with a mere 5% annotation, exhibiting an average Dice score enhancement of 9.78% and 7.58% in two target domains compared with state-of-the-art methods, on par with fully supervised learning. Code is available at:https://github.com/HiLab-git/UGTST -2024-07-02 -Abstract Dialectical Frameworks are Boolean Networks (full version) -Jesse Heyninck, Matthias Knorr, João Leite -Link -Dialectical frameworks are a unifying model of formal argumentation, where argumentative relations between arguments are represented by assigning acceptance conditions to atomic arguments. Their generality allow them to cover a number of different approaches with varying forms of representing the argumentation structure. Boolean regulatory networks are used to model the dynamics of complex biological processes, taking into account the interactions of biological compounds, such as proteins or genes. These models have proven highly useful for comprehending such biological processes, allowing to reproduce known behaviour and testing new hypotheses and predictions in silico, for example in the context of new medical treatments. While both these approaches stem from entirely different communities, it turns out that there are striking similarities in their appearence. In this paper, we study the relation between these two formalisms revealing their communalities as well as their differences, and introducing a correspondence that allows to establish novel results for the individual formalisms. +2024-07-03 +Membership Inference Attacks Against Time-Series Models +Noam Koren, Abigail Goldsteen, Ariel Farkash, Guy Amit +Link +Analyzing time-series data that may contain personal information, particularly in the medical field, presents serious privacy concerns. Sensitive health data from patients is often used to train machine-learning models for diagnostics and ongoing care. Assessing the privacy risk of such models is crucial to making knowledgeable decisions on whether to use a model in production, share it with third parties, or deploy it in patients homes. Membership Inference Attacks (MIA) are a key method for this kind of evaluation, however time-series prediction models have not been thoroughly studied in this context. We explore existing MIA techniques on time-series models, and introduce new features, focusing on the seasonality and trend components of the data. Seasonality is estimated using a multivariate Fourier transform, and a low-degree polynomial is used to approximate trends. We applied these techniques to various types of time-series models, using datasets from the health domain. Our results demonstrate that these new features enhance the effectiveness of MIAs in identifying membership, improving the understanding of privacy risks in medical data applications. -2024-07-02 -Extracting and Encoding: Leveraging Large Language Models and Medical Knowledge to Enhance Radiological Text Representation -Pablo Messina, René Vidal, Denis Parra, Álvaro Soto, Vladimir Araujo -Link -Advancing representation learning in specialized fields like medicine remains challenging due to the scarcity of expert annotations for text and images. To tackle this issue, we present a novel two-stage framework designed to extract high-quality factual statements from free-text radiology reports in order to improve the representations of text encoders and, consequently, their performance on various downstream tasks. In the first stage, we propose a \textit{Fact Extractor} that leverages large language models (LLMs) to identify factual statements from well-curated domain-specific datasets. In the second stage, we introduce a \textit{Fact Encoder} (CXRFE) based on a BERT model fine-tuned with objective functions designed to improve its representations using the extracted factual data. Our framework also includes a new embedding-based metric (CXRFEScore) for evaluating chest X-ray text generation systems, leveraging both stages of our approach. Extensive evaluations show that our fact extractor and encoder outperform current state-of-the-art methods in tasks such as sentence ranking, natural language inference, and label extraction from radiology reports. Additionally, our metric proves to be more robust and effective than existing metrics commonly used in the radiology report generation literature. The code of this project is available at \url{https://github.com/PabloMessina/CXR-Fact-Encoder}. +2024-07-03 +Multi-Attention Integrated Deep Learning Frameworks for Enhanced Breast Cancer Segmentation and Identification +Pandiyaraju V, Shravan Venkatraman, Pavan Kumar S, Santhosh Malarvannan, Kannan A +Link +Breast cancer poses a profound threat to lives globally, claiming numerous lives each year. Therefore, timely detection is crucial for early intervention and improved chances of survival. Accurately diagnosing and classifying breast tumors using ultrasound images is a persistent challenge in medicine, demanding cutting-edge solutions for improved treatment strategies. This research introduces multiattention-enhanced deep learning (DL) frameworks designed for the classification and segmentation of breast cancer tumors from ultrasound images. A spatial channel attention mechanism is proposed for segmenting tumors from ultrasound images, utilizing a novel LinkNet DL framework with an InceptionResNet backbone. Following this, the paper proposes a deep convolutional neural network with an integrated multi-attention framework (DCNNIMAF) to classify the segmented tumor as benign, malignant, or normal. From experimental results, it is observed that the segmentation model has recorded an accuracy of 98.1%, with a minimal loss of 0.6%. It has also achieved high Intersection over Union (IoU) and Dice Coefficient scores of 96.9% and 97.2%, respectively. Similarly, the classification model has attained an accuracy of 99.2%, with a low loss of 0.31%. Furthermore, the classification framework has achieved outstanding F1-Score, precision, and recall values of 99.1%, 99.3%, and 99.1%, respectively. By offering a robust framework for early detection and accurate classification of breast cancer, this proposed work significantly advances the field of medical image analysis, potentially improving diagnostic precision and patient outcomes. diff --git a/data_store/papers_2024-07-05.json b/data_store/papers_2024-07-05.json new file mode 100644 index 0000000..cd65954 --- /dev/null +++ b/data_store/papers_2024-07-05.json @@ -0,0 +1,506 @@ +{ + "Brain": { + "2407.03217v1": { + "title": "MHNet: Multi-view High-order Network for Diagnosing Neurodevelopmental Disorders Using Resting-state fMRI", + "url": "http://arxiv.org/abs/2407.03217v1", + "authors": "Yueyang Li, Weiming Zeng, Wenhao Dong, Luhui Cai, Lei Wang, Hongyu Chen, Hongjie Yan, Lingbin Bian, Nizhuan Wang", + "update_time": "2024-07-03", + "abstract": "Background: Deep learning models have shown promise in diagnosing neurodevelopmental disorders (NDD) like ASD and ADHD. However, many models either use graph neural networks (GNN) to construct single-level brain functional networks (BFNs) or employ spatial convolution filtering for local information extraction from rs-fMRI data, often neglecting high-order features crucial for NDD classification. Methods: We introduce a Multi-view High-order Network (MHNet) to capture hierarchical and high-order features from multi-view BFNs derived from rs-fMRI data for NDD prediction. MHNet has two branches: the Euclidean Space Features Extraction (ESFE) module and the Non-Euclidean Space Features Extraction (Non-ESFE) module, followed by a Feature Fusion-based Classification (FFC) module for NDD identification. ESFE includes a Functional Connectivity Generation (FCG) module and a High-order Convolutional Neural Network (HCNN) module to extract local and high-order features from BFNs in Euclidean space. Non-ESFE comprises a Generic Internet-like Brain Hierarchical Network Generation (G-IBHN-G) module and a High-order Graph Neural Network (HGNN) module to capture topological and high-order features in non-Euclidean space. Results: Experiments on three public datasets show that MHNet outperforms state-of-the-art methods using both AAL1 and Brainnetome Atlas templates. Extensive ablation studies confirm the superiority of MHNet and the effectiveness of using multi-view fMRI information and high-order features. Our study also offers atlas options for constructing more sophisticated hierarchical networks and explains the association between key brain regions and NDD. Conclusion: MHNet leverages multi-view feature learning from both Euclidean and non-Euclidean spaces, incorporating high-order information from BFNs to enhance NDD classification performance." + }, + "2407.03177v1": { + "title": "EDPNet: An Efficient Dual Prototype Network for Motor Imagery EEG Decoding", + "url": "http://arxiv.org/abs/2407.03177v1", + "authors": "Can Han, Chen Liu, Crystal Cai, Jun Wang, Dahong Qian", + "update_time": "2024-07-03", + "abstract": "Motor imagery electroencephalograph (MI-EEG) decoding plays a crucial role in developing motor imagery brain-computer interfaces (MI-BCIs). However, decoding intentions from MI remains challenging due to the inherent complexity of EEG signals relative to the small-sample size. In this paper, we propose an Efficient Dual Prototype Network (EDPNet) to enable accurate and fast MI decoding. EDPNet employs a lightweight adaptive spatial-spectral fusion module, which promotes more efficient information fusion between multiple EEG electrodes. Subsequently, a parameter-free multi-scale variance pooling module extracts more comprehensive temporal features. Furthermore, we introduce dual prototypical learning to optimize the feature space distribution and training process, thereby improving the model's generalization ability on small-sample MI datasets. Our experimental results show that the EDPNet outperforms state-of-the-art models with superior classification accuracy and kappa values (84.11% and 0.7881 for dataset BCI competition IV 2a, 86.65% and 0.7330 for dataset BCI competition IV 2b). Additionally, we use the BCI competition III IVa dataset with fewer training data to further validate the generalization ability of the proposed EDPNet. We also achieve superior performance with 82.03% classification accuracy. Benefiting from the lightweight parameters and superior decoding accuracy, our EDPNet shows great potential for MI-BCI applications. The code is publicly available at https://github.com/hancan16/EDPNet." + }, + "2407.03131v1": { + "title": "MVGT: A Multi-view Graph Transformer Based on Spatial Relations for EEG Emotion Recognition", + "url": "http://arxiv.org/abs/2407.03131v1", + "authors": "Yanjie Cui, Xiaohong Liu, Jing Liang, Yamin Fu", + "update_time": "2024-07-03", + "abstract": "Electroencephalography (EEG), a medical imaging technique that captures scalp electrical activity of brain structures via electrodes, has been widely used in affective computing. The spatial domain of EEG is rich in affective information.However, few of the existing studies have simultaneously analyzed EEG signals from multiple perspectives of geometric and anatomical structures in spatial domain. In this paper, we propose a multi-view Graph Transformer (MVGT) based on spatial relations, which integrates information from the temporal, frequency and spatial domains, including geometric and anatomical structures, so as to enhance the expressive power of the model comprehensively.We incorporate the spatial information of EEG channels into the model as encoding, thereby improving its ability to perceive the spatial structure of the channels. Meanwhile, experimental results based on publicly available datasets demonstrate that our proposed model outperforms state-of-the-art methods in recent years. In addition, the results also show that the MVGT could extract information from multiple domains and capture inter-channel relationships in EEG emotion recognition tasks effectively." + }, + "2407.03004v1": { + "title": "SemioLLM: Assessing Large Language Models for Semiological Analysis in Epilepsy Research", + "url": "http://arxiv.org/abs/2407.03004v1", + "authors": "Meghal Dani, Muthu Jeyanthi Prakash, Zeynep Akata, Stefanie Liebe", + "update_time": "2024-07-03", + "abstract": "Large Language Models have shown promising results in their ability to encode general medical knowledge in standard medical question-answering datasets. However, their potential application in clinical practice requires evaluation in domain-specific tasks, where benchmarks are largely missing. In this study semioLLM, we test the ability of state-of-the-art LLMs (GPT-3.5, GPT-4, Mixtral 8x7B, and Qwen-72chat) to leverage their internal knowledge and reasoning for epilepsy diagnosis. Specifically, we obtain likelihood estimates linking unstructured text descriptions of seizures to seizure-generating brain regions, using an annotated clinical database containing 1269 entries. We evaluate the LLM's performance, confidence, reasoning, and citation abilities in comparison to clinical evaluation. Models achieve above-chance classification performance with prompt engineering significantly improving their outcome, with some models achieving close-to-clinical performance and reasoning. However, our analyses also reveal significant pitfalls with several models being overly confident while showing poor performance, as well as exhibiting citation errors and hallucinations. In summary, our work provides the first extensive benchmark comparing current SOTA LLMs in the medical domain of epilepsy and highlights their ability to leverage unstructured texts from patients' medical history to aid diagnostic processes in health care." + }, + "2407.02673v1": { + "title": "A Novel Approach to Image EEG Sleep Data for Improving Quality of Life in Patients Suffering From Brain Injuries Using DreamDiffusion", + "url": "http://arxiv.org/abs/2407.02673v1", + "authors": "David Fahim, Joshveer Grewal, Ritvik Ellendula", + "update_time": "2024-07-02", + "abstract": "Those experiencing strokes, traumatic brain injuries, and drug complications can often end up hospitalized and diagnosed with coma or locked-in syndrome. Such mental impediments can permanently alter the neurological pathways in work and significantly decrease the quality of life (QoL). It is critical to translate brain signals into images to gain a deeper understanding of the thoughts of a comatose patient. Traditionally, brain signals collected by an EEG could only be translated into text, but with the novel method of an open-source model available on GitHub, DreamDiffusion can be used to convert brain waves into images directly. DreamDiffusion works by extracting features from EEG signals and then using the features to create images through StableDiffusion. Upon this, we made further improvements that could make StableDiffusion the forerunner technology in waves to media translation. In our study, we begin by modifying the existing DreamDiffusion codebase so that it does not require any prior setup, avoiding any confusing steps needed to run the model from GitHub. For many researchers, the incomplete setup process, errors in the existing code, and a lack of directions made it nearly impossible to run, not even considering the model's performance. We brought the code into Google Colab so users could run and evaluate problems cell-by-cell, eliminating the specific file and repository dependencies. We also provided the original training data file so users do not need to purchase the necessary computing power to train the model from the given dataset. The second change is utilizing the mutability of the code and optimizing the model so it can be used to generate images from other given inputs, such as sleep data. Additionally, the affordability of EEG technology allows for global dissemination and creates the opportunity for those who want to work on the shared DreamDiffusion model." + }, + "2407.02418v1": { + "title": "AXIAL: Attention-based eXplainability for Interpretable Alzheimer's Localized Diagnosis using 2D CNNs on 3D MRI brain scans", + "url": "http://arxiv.org/abs/2407.02418v1", + "authors": "Gabriele Lozupone, Alessandro Bria, Francesco Fontanella, Claudio De Stefano", + "update_time": "2024-07-02", + "abstract": "This study presents an innovative method for Alzheimer's disease diagnosis using 3D MRI designed to enhance the explainability of model decisions. Our approach adopts a soft attention mechanism, enabling 2D CNNs to extract volumetric representations. At the same time, the importance of each slice in decision-making is learned, allowing the generation of a voxel-level attention map to produces an explainable MRI. To test our method and ensure the reproducibility of our results, we chose a standardized collection of MRI data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). On this dataset, our method significantly outperforms state-of-the-art methods in (i) distinguishing AD from cognitive normal (CN) with an accuracy of 0.856 and Matthew's correlation coefficient (MCC) of 0.712, representing improvements of 2.4\\% and 5.3\\% respectively over the second-best, and (ii) in the prognostic task of discerning stable from progressive mild cognitive impairment (MCI) with an accuracy of 0.725 and MCC of 0.443, showing improvements of 10.2\\% and 20.5\\% respectively over the second-best. We achieved this prognostic result by adopting a double transfer learning strategy, which enhanced sensitivity to morphological changes and facilitated early-stage AD detection. With voxel-level precision, our method identified which specific areas are being paid attention to, identifying these predominant brain regions: the \\emph{hippocampus}, the \\emph{amygdala}, the \\emph{parahippocampal}, and the \\emph{inferior lateral ventricles}. All these areas are clinically associated with AD development. Furthermore, our approach consistently found the same AD-related areas across different cross-validation folds, proving its robustness and precision in highlighting areas that align closely with known pathological markers of the disease.", + "code_url": "https://github.com/GabrieleLozupone/AXIAL" + }, + "2407.02366v1": { + "title": "Hybrid Quantum-Classical Photonic Neural Networks", + "url": "http://arxiv.org/abs/2407.02366v1", + "authors": "Tristan Austin, Simon Bilodeau, Andrew Hayman, Nir Rotenberg, Bhavin Shastri", + "update_time": "2024-07-02", + "abstract": "Neuromorphic (brain-inspired) photonics leverages photonic chips to accelerate artificial intelligence, offering high-speed and energy efficient solutions for use in RF communication, tensor processing, and data classification. However, the limited physical size of integrated photonic hardware limits network complexity and computational capacity. In light of recent advance in photonic quantum technology, it is natural to utilize quantum exponential speedup to scale photonic neural network capacity without increasing the network size. Here we show a combination of classical network layers with trainable continuous variable quantum circuits yields hybrid networks with improved trainability and accuracy. On a classification task, hybrid networks achieve the same performance when benchmarked against fully classical networks that are twice the size. When the bit precision of the optimized networks is reduced through added noise, the hybrid networks still achieve greater accuracy when evaluated at state of the art bit precision. These hybrid quantum classical networks demonstrate a unique route to improve computational capacity of integrated photonic neural networks without increasing the network size." + }, + "2407.02235v1": { + "title": "Towards a Holistic Framework for Multimodal Large Language Models in Three-dimensional Brain CT Report Generation", + "url": "http://arxiv.org/abs/2407.02235v1", + "authors": "Cheng-Yi Li, Kao-Jung Chang, Cheng-Fu Yang, Hsin-Yu Wu, Wenting Chen, Hritik Bansal, Ling Chen, Yi-Ping Yang, Yu-Chun Chen, Shih-Pin Chen, Jiing-Feng Lirng, Kai-Wei Chang, Shih-Hwa Chiou", + "update_time": "2024-07-02", + "abstract": "Multi-modal large language models (MLLMs) have been given free rein to explore exciting medical applications with a primary focus on radiology report generation. Nevertheless, the preliminary success in 2D radiology captioning is incompetent to reflect the real-world diagnostic challenge in the volumetric 3D anatomy. To mitigate three crucial limitation aspects in the existing literature, including (1) data complexity, (2) model capacity, and (3) evaluation metric fidelity, we collected an 18,885 text-scan pairs 3D-BrainCT dataset and applied clinical visual instruction tuning (CVIT) to train BrainGPT models to generate radiology-adherent 3D brain CT reports. Statistically, our BrainGPT scored BLEU-1 = 44.35, BLEU-4 = 20.38, METEOR = 30.13, ROUGE-L = 47.6, and CIDEr-R = 211.77 during internal testing and demonstrated an accuracy of 0.91 in captioning midline shifts on the external validation CQ500 dataset. By further inspecting the captioned report, we reported that the traditional metrics appeared to measure only the surface text similarity and failed to gauge the information density of the diagnostic purpose. To close this gap, we proposed a novel Feature-Oriented Radiology Task Evaluation (FORTE) to estimate the report's clinical relevance (lesion feature and landmarks). Notably, the BrainGPT model scored an average FORTE F1-score of 0.71 (degree=0.661; landmark=0.706; feature=0.693; impression=0.779). To demonstrate that BrainGPT models possess objective readiness to generate human-like radiology reports, we conducted a Turing test that enrolled 11 physician evaluators, and around 74% of the BrainGPT-generated captions were indistinguishable from those written by humans. Our work embodies a holistic framework that showcased the first-hand experience of curating a 3D brain CT dataset, fine-tuning anatomy-sensible language models, and proposing robust radiology evaluation metrics.", + "code_url": "https://github.com/charlierabea/FORTE" + }, + "2407.02227v1": { + "title": "Interplay between MRI-based axon diameter and myelination estimates in macaque and human brain", + "url": "http://arxiv.org/abs/2407.02227v1", + "authors": "Ting Gong, Chiara Maffei, Evan Dann, Hong-Hsi Lee, Hansol Lee, Jean C. Augustinack, Susie Y. Huang, Suzanne N. Haber, Anastasia Yendiki", + "update_time": "2024-07-02", + "abstract": "Axon diameter and myelin thickness are closely related microstructural tissue properties that affect the conduction velocity of action potentials in the nervous system. Imaging them non-invasively with MRI-based methods is thus valuable for studying brain microstructure and function. However, the relationship between MRI-based axon diameter and myelination measures has not been investigated across the brain, mainly due to methodological limitations in estimating axon diameters. In recent years, studies using ultra-high gradient strength diffusion MRI (dMRI) have demonstrated improved estimation of axon diameter across white-matter (WM) tracts in the human brain, making such investigations feasible. In this study, we aim to investigate relationships between tissue microstructure properties with MRI-based methods and compare the imaging findings to histological evidence from the literature. We collected dMRI with ultra-high gradient strength and multi-echo spin-echo MRI on ex vivo macaque and human brain samples on a preclinical scanner. From these data, we estimated axon diameter, intra-axonal signal fraction, myelin water fraction (MWF) and aggregate g-ratio and investigated their correlations. We found that the microstructural imaging parameters exhibited consistent patterns across WM tracts and species. Overall, the findings suggest that MRI-based axon geometry and myelination measures can provide complementary information about fiber morphology, and the relationships between these measures agree with prior histological evidence." + }, + "2407.02000v1": { + "title": "Sub-millisecond electric field sensing with an individual rare-earth doped ferroelectric nanocrystal", + "url": "http://arxiv.org/abs/2407.02000v1", + "authors": "Athulya Muraleedharan, Jingye Zou, Maxime Vallet, Abdelali Zaki, Christine Bogicevic, Charles Paillard, Karen Perronet, Fran\u00e7ois Treussart", + "update_time": "2024-07-02", + "abstract": "Understanding the dynamics of electrical signals within neuronal assemblies is crucial to unraveling complex brain function. Despite recent advances in employing optically active nanostructures in transmembrane potential sensing, there remains room for improvement in terms of response time and sensitivity. Here, we report the development of such a nanosensor capable of detecting electric fields with a submillisecond response time at the single particle level. We achieve this by using ferroelectric nanocrystals doped with rare earth ions producing upconversion (UC). When such a nanocrystal experiences a variation of surrounding electric potential, its surface charge density changes, inducing electric polarization modifications that vary, via converse piezoelectric effect, the crystal field around the ions. The latter variation is finally converted into UC spectral changes, enabling optical detection of electric potential. To develop such a sensor, we synthesized erbium and ytterbium-doped barium titanate crystals of size $\\approx160$ nm. We observed distinct changes in the UC spectrum when individual nanocrystals were subjected to an external field via a conductive AFM tip, with a response time of 100 $\\mu$s. Furthermore, our sensor exhibits a sensitivity to electric fields of only 4.8 kV/cm/$\\sqrt{\\rm Hz}$, making possible the time-resolved detection of a neuron action potential." + } + }, + "EEG": { + "2407.03177v1": { + "title": "EDPNet: An Efficient Dual Prototype Network for Motor Imagery EEG Decoding", + "url": "http://arxiv.org/abs/2407.03177v1", + "authors": "Can Han, Chen Liu, Crystal Cai, Jun Wang, Dahong Qian", + "update_time": "2024-07-03", + "abstract": "Motor imagery electroencephalograph (MI-EEG) decoding plays a crucial role in developing motor imagery brain-computer interfaces (MI-BCIs). However, decoding intentions from MI remains challenging due to the inherent complexity of EEG signals relative to the small-sample size. In this paper, we propose an Efficient Dual Prototype Network (EDPNet) to enable accurate and fast MI decoding. EDPNet employs a lightweight adaptive spatial-spectral fusion module, which promotes more efficient information fusion between multiple EEG electrodes. Subsequently, a parameter-free multi-scale variance pooling module extracts more comprehensive temporal features. Furthermore, we introduce dual prototypical learning to optimize the feature space distribution and training process, thereby improving the model's generalization ability on small-sample MI datasets. Our experimental results show that the EDPNet outperforms state-of-the-art models with superior classification accuracy and kappa values (84.11% and 0.7881 for dataset BCI competition IV 2a, 86.65% and 0.7330 for dataset BCI competition IV 2b). Additionally, we use the BCI competition III IVa dataset with fewer training data to further validate the generalization ability of the proposed EDPNet. We also achieve superior performance with 82.03% classification accuracy. Benefiting from the lightweight parameters and superior decoding accuracy, our EDPNet shows great potential for MI-BCI applications. The code is publicly available at https://github.com/hancan16/EDPNet." + }, + "2407.03131v1": { + "title": "MVGT: A Multi-view Graph Transformer Based on Spatial Relations for EEG Emotion Recognition", + "url": "http://arxiv.org/abs/2407.03131v1", + "authors": "Yanjie Cui, Xiaohong Liu, Jing Liang, Yamin Fu", + "update_time": "2024-07-03", + "abstract": "Electroencephalography (EEG), a medical imaging technique that captures scalp electrical activity of brain structures via electrodes, has been widely used in affective computing. The spatial domain of EEG is rich in affective information.However, few of the existing studies have simultaneously analyzed EEG signals from multiple perspectives of geometric and anatomical structures in spatial domain. In this paper, we propose a multi-view Graph Transformer (MVGT) based on spatial relations, which integrates information from the temporal, frequency and spatial domains, including geometric and anatomical structures, so as to enhance the expressive power of the model comprehensively.We incorporate the spatial information of EEG channels into the model as encoding, thereby improving its ability to perceive the spatial structure of the channels. Meanwhile, experimental results based on publicly available datasets demonstrate that our proposed model outperforms state-of-the-art methods in recent years. In addition, the results also show that the MVGT could extract information from multiple domains and capture inter-channel relationships in EEG emotion recognition tasks effectively." + }, + "2407.03089v1": { + "title": "Spatio-Temporal Adaptive Diffusion Models for EEG Super-Resolution in Epilepsy Diagnosis", + "url": "http://arxiv.org/abs/2407.03089v1", + "authors": "Tong Zhou, Shuqiang Wang", + "update_time": "2024-07-03", + "abstract": "Electroencephalogram (EEG) technology, particularly high-density EEG (HD EEG) devices, is widely used in fields such as neuroscience. HD EEG devices improve the spatial resolution of EEG by placing more electrodes on the scalp, meeting the requirements of clinical diagnostic applications such as epilepsy focus localization. However, this technique faces challenges such as high acquisition costs and limited usage scenarios. In this paper, spatio-temporal adaptive diffusion models (STADMs) are proposed to pioneer the use of diffusion models for achieving spatial SR reconstruction from low-resolution (LR, 64 channels or fewer) EEG to high-resolution (HR, 256 channels) EEG. Specifically, a spatio-temporal condition module is designed to extract the spatio-temporal features of LR EEG, which then serve as conditional inputs to guide the reverse denoising process of diffusion models. Additionally, a multi-scale Transformer denoising module is constructed to leverage multi-scale convolution blocks and cross-attention-based diffusion Transformer blocks for conditional guidance to generate subject-adaptive SR EEG. Experimental results demonstrate that the proposed method effectively enhances the spatial resolution of LR EEG and quantitatively outperforms existing methods. Furthermore, STADMs demonstrate their value by applying synthetic SR EEG to classification and source localization tasks of epilepsy patients, indicating their potential to significantly improve the spatial resolution of LR EEG." + }, + "2407.02673v1": { + "title": "A Novel Approach to Image EEG Sleep Data for Improving Quality of Life in Patients Suffering From Brain Injuries Using DreamDiffusion", + "url": "http://arxiv.org/abs/2407.02673v1", + "authors": "David Fahim, Joshveer Grewal, Ritvik Ellendula", + "update_time": "2024-07-02", + "abstract": "Those experiencing strokes, traumatic brain injuries, and drug complications can often end up hospitalized and diagnosed with coma or locked-in syndrome. Such mental impediments can permanently alter the neurological pathways in work and significantly decrease the quality of life (QoL). It is critical to translate brain signals into images to gain a deeper understanding of the thoughts of a comatose patient. Traditionally, brain signals collected by an EEG could only be translated into text, but with the novel method of an open-source model available on GitHub, DreamDiffusion can be used to convert brain waves into images directly. DreamDiffusion works by extracting features from EEG signals and then using the features to create images through StableDiffusion. Upon this, we made further improvements that could make StableDiffusion the forerunner technology in waves to media translation. In our study, we begin by modifying the existing DreamDiffusion codebase so that it does not require any prior setup, avoiding any confusing steps needed to run the model from GitHub. For many researchers, the incomplete setup process, errors in the existing code, and a lack of directions made it nearly impossible to run, not even considering the model's performance. We brought the code into Google Colab so users could run and evaluate problems cell-by-cell, eliminating the specific file and repository dependencies. We also provided the original training data file so users do not need to purchase the necessary computing power to train the model from the given dataset. The second change is utilizing the mutability of the code and optimizing the model so it can be used to generate images from other given inputs, such as sleep data. Additionally, the affordability of EEG technology allows for global dissemination and creates the opportunity for those who want to work on the shared DreamDiffusion model." + }, + "2407.01894v1": { + "title": "Adaptive Modality Balanced Online Knowledge Distillation for Brain-Eye-Computer based Dim Object Detection", + "url": "http://arxiv.org/abs/2407.01894v1", + "authors": "Zixing Li, Chao Yan, Zhen Lan, Dengqing Tang, Xiaojia Xiang, Han Zhou, Jun Lai", + "update_time": "2024-07-02", + "abstract": "Advanced cognition can be extracted from the human brain using brain-computer interfaces. Integrating these interfaces with computer vision techniques, which possess efficient feature extraction capabilities, can achieve more robust and accurate detection of dim targets in aerial images. However, existing target detection methods primarily concentrate on homogeneous data, lacking efficient and versatile processing capabilities for heterogeneous multimodal data. In this paper, we first build a brain-eye-computer based object detection system for aerial images under few-shot conditions. This system detects suspicious targets using region proposal networks, evokes the event-related potential (ERP) signal in electroencephalogram (EEG) through the eye-tracking-based slow serial visual presentation (ESSVP) paradigm, and constructs the EEG-image data pairs with eye movement data. Then, an adaptive modality balanced online knowledge distillation (AMBOKD) method is proposed to recognize dim objects with the EEG-image data. AMBOKD fuses EEG and image features using a multi-head attention module, establishing a new modality with comprehensive features. To enhance the performance and robust capability of the fusion modality, simultaneous training and mutual learning between modalities are enabled by end-to-end online knowledge distillation. During the learning process, an adaptive modality balancing module is proposed to ensure multimodal equilibrium by dynamically adjusting the weights of the importance and the training gradients across various modalities. The effectiveness and superiority of our method are demonstrated by comparing it with existing state-of-the-art methods. Additionally, experiments conducted on public datasets and system validations in real-world scenarios demonstrate the reliability and practicality of the proposed system and the designed method.", + "code_url": "https://github.com/lizixing23/AMBOKD" + }, + "2407.01884v1": { + "title": "EIT-1M: One Million EEG-Image-Text Pairs for Human Visual-textual Recognition and More", + "url": "http://arxiv.org/abs/2407.01884v1", + "authors": "Xu Zheng, Ling Wang, Kanghao Chen, Yuanhuiyi Lyu, Jiazhou Zhou, Lin Wang", + "update_time": "2024-07-02", + "abstract": "Recently, electroencephalography (EEG) signals have been actively incorporated to decode brain activity to visual or textual stimuli and achieve object recognition in multi-modal AI. Accordingly, endeavors have been focused on building EEG-based datasets from visual or textual single-modal stimuli. However, these datasets offer limited EEG epochs per category, and the complex semantics of stimuli presented to participants compromise their quality and fidelity in capturing precise brain activity. The study in neuroscience unveils that the relationship between visual and textual stimulus in EEG recordings provides valuable insights into the brain's ability to process and integrate multi-modal information simultaneously. Inspired by this, we propose a novel large-scale multi-modal dataset, named EIT-1M, with over 1 million EEG-image-text pairs. Our dataset is superior in its capacity of reflecting brain activities in simultaneously processing multi-modal information. To achieve this, we collected data pairs while participants viewed alternating sequences of visual-textual stimuli from 60K natural images and category-specific texts. Common semantic categories are also included to elicit better reactions from participants' brains. Meanwhile, response-based stimulus timing and repetition across blocks and sessions are included to ensure data diversity. To verify the effectiveness of EIT-1M, we provide an in-depth analysis of EEG data captured from multi-modal stimuli across different categories and participants, along with data quality scores for transparency. We demonstrate its validity on two tasks: 1) EEG recognition from visual or textual stimuli or both and 2) EEG-to-visual generation." + }, + "2406.19884v1": { + "title": "Investigating the Timescales of Language Processing with EEG and Language Models", + "url": "http://arxiv.org/abs/2406.19884v1", + "authors": "Davide Turco, Conor Houghton", + "update_time": "2024-06-28", + "abstract": "This study explores the temporal dynamics of language processing by examining the alignment between word representations from a pre-trained transformer-based language model, and EEG data. Using a Temporal Response Function (TRF) model, we investigate how neural activity corresponds to model representations across different layers, revealing insights into the interaction between artificial language models and brain responses during language comprehension. Our analysis reveals patterns in TRFs from distinct layers, highlighting varying contributions to lexical and compositional processing. Additionally, we used linear discriminant analysis (LDA) to isolate part-of-speech (POS) representations, offering insights into their influence on neural responses and the underlying mechanisms of syntactic processing. These findings underscore EEG's utility for probing language processing dynamics with high temporal resolution. By bridging artificial language models and neural activity, this study advances our understanding of their interaction at fine timescales." + }, + "2406.19246v1": { + "title": "An Interpretable and Efficient Sleep Staging Algorithm: DetectsleepNet", + "url": "http://arxiv.org/abs/2406.19246v1", + "authors": "Shengwei Guo", + "update_time": "2024-06-27", + "abstract": "Sleep quality directly impacts human health and quality of life, so accurate sleep staging is essential for assessing sleep quality. However, most traditional methods are inefficient and time-consuming due to segmenting different sleep cycles by manual labeling. In contrast, automated sleep staging technology not only directly assesses sleep quality but also helps sleep specialists analyze sleep status, significantly improving efficiency and reducing the cost of sleep monitoring, especially for continuous sleep monitoring. Most of the existing models, however, are deficient in computational efficiency, lightweight design, and model interpretability. In this paper, we propose a neural network architecture based on the prior knowledge of sleep experts. Specifically, 1) Propose an end-to-end model named DetectsleepNet that uses single-channel EEG signals without additional data processing, which has achieved an impressive 80.9% accuracy on the SHHS dataset and an outstanding 88.0% accuracy on the Physio2018 dataset. 2) Constructure an efficient lightweight sleep staging model named DetectsleepNet-tiny based on DetectsleepNet, which has just 6% of the parameter numbers of existing models, but its accuracy exceeds 99% of state-of-the-art models, 3) Introducing a specific inference header to assess the attention given to a specific EEG segment in each sleep frame, enhancing the transparency in the decisions of models. Our model comprises fewer parameters compared to existing ones and ulteriorly explores the interpretability of the model to facilitate its application in healthcare. The code is available at https://github.com/komdec/DetectSleepNet.git." + }, + "2406.19189v1": { + "title": "BISeizuRe: BERT-Inspired Seizure Data Representation to Improve Epilepsy Monitoring", + "url": "http://arxiv.org/abs/2406.19189v1", + "authors": "Luca Benfenati, Thorir Mar Ingolfsson, Andrea Cossettini, Daniele Jahier Pagliari, Alessio Burrello, Luca Benini", + "update_time": "2024-06-27", + "abstract": "This study presents a novel approach for EEG-based seizure detection leveraging a BERT-based model. The model, BENDR, undergoes a two-phase training process. Initially, it is pre-trained on the extensive Temple University Hospital EEG Corpus (TUEG), a 1.5 TB dataset comprising over 10,000 subjects, to extract common EEG data patterns. Subsequently, the model is fine-tuned on the CHB-MIT Scalp EEG Database, consisting of 664 EEG recordings from 24 pediatric patients, of which 198 contain seizure events. Key contributions include optimizing fine-tuning on the CHB-MIT dataset, where the impact of model architecture, pre-processing, and post-processing techniques are thoroughly examined to enhance sensitivity and reduce false positives per hour (FP/h). We also explored custom training strategies to ascertain the most effective setup. The model undergoes a novel second pre-training phase before subject-specific fine-tuning, enhancing its generalization capabilities. The optimized model demonstrates substantial performance enhancements, achieving as low as 0.23 FP/h, 2.5$\\times$ lower than the baseline model, with a lower but still acceptable sensitivity rate, showcasing the effectiveness of applying a BERT-based approach on EEG-based seizure detection." + }, + "2406.18345v1": { + "title": "EmT: A Novel Transformer for Generalized Cross-subject EEG Emotion Recognition", + "url": "http://arxiv.org/abs/2406.18345v1", + "authors": "Yi Ding, Chengxuan Tong, Shuailei Zhang, Muyun Jiang, Yong Li, Kevin Lim Jun Liang, Cuntai Guan", + "update_time": "2024-06-26", + "abstract": "Integrating prior knowledge of neurophysiology into neural network architecture enhances the performance of emotion decoding. While numerous techniques emphasize learning spatial and short-term temporal patterns, there has been limited emphasis on capturing the vital long-term contextual information associated with emotional cognitive processes. In order to address this discrepancy, we introduce a novel transformer model called emotion transformer (EmT). EmT is designed to excel in both generalized cross-subject EEG emotion classification and regression tasks. In EmT, EEG signals are transformed into a temporal graph format, creating a sequence of EEG feature graphs using a temporal graph construction module (TGC). A novel residual multi-view pyramid GCN module (RMPG) is then proposed to learn dynamic graph representations for each EEG feature graph within the series, and the learned representations of each graph are fused into one token. Furthermore, we design a temporal contextual transformer module (TCT) with two types of token mixers to learn the temporal contextual information. Finally, the task-specific output module (TSO) generates the desired outputs. Experiments on four publicly available datasets show that EmT achieves higher results than the baseline methods for both EEG emotion classification and regression tasks. The code is available at https://github.com/yi-ding-cs/EmT.", + "code_url": "https://github.com/yi-ding-cs/emt" + } + }, + "BCI": { + "2407.03177v1": { + "title": "EDPNet: An Efficient Dual Prototype Network for Motor Imagery EEG Decoding", + "url": "http://arxiv.org/abs/2407.03177v1", + "authors": "Can Han, Chen Liu, Crystal Cai, Jun Wang, Dahong Qian", + "update_time": "2024-07-03", + "abstract": "Motor imagery electroencephalograph (MI-EEG) decoding plays a crucial role in developing motor imagery brain-computer interfaces (MI-BCIs). However, decoding intentions from MI remains challenging due to the inherent complexity of EEG signals relative to the small-sample size. In this paper, we propose an Efficient Dual Prototype Network (EDPNet) to enable accurate and fast MI decoding. EDPNet employs a lightweight adaptive spatial-spectral fusion module, which promotes more efficient information fusion between multiple EEG electrodes. Subsequently, a parameter-free multi-scale variance pooling module extracts more comprehensive temporal features. Furthermore, we introduce dual prototypical learning to optimize the feature space distribution and training process, thereby improving the model's generalization ability on small-sample MI datasets. Our experimental results show that the EDPNet outperforms state-of-the-art models with superior classification accuracy and kappa values (84.11% and 0.7881 for dataset BCI competition IV 2a, 86.65% and 0.7330 for dataset BCI competition IV 2b). Additionally, we use the BCI competition III IVa dataset with fewer training data to further validate the generalization ability of the proposed EDPNet. We also achieve superior performance with 82.03% classification accuracy. Benefiting from the lightweight parameters and superior decoding accuracy, our EDPNet shows great potential for MI-BCI applications. The code is publicly available at https://github.com/hancan16/EDPNet." + }, + "2406.18425v1": { + "title": "L-Sort: An Efficient Hardware for Real-time Multi-channel Spike Sorting with Localization", + "url": "http://arxiv.org/abs/2406.18425v1", + "authors": "Yuntao Han, Shiwei Wang, Alister Hamilton", + "update_time": "2024-06-26", + "abstract": "Spike sorting is essential for extracting neuronal information from neural signals and understanding brain function. With the advent of high-density microelectrode arrays (HDMEAs), the challenges and opportunities in multi-channel spike sorting have intensified. Real-time spike sorting is particularly crucial for closed-loop brain computer interface (BCI) applications, demanding efficient hardware implementations. This paper introduces L-Sort, an hardware design for real-time multi-channel spike sorting. Leveraging spike localization techniques, L-Sort achieves efficient spike detection and clustering without the need to store raw signals during detection. By incorporating median thresholding and geometric features, L-Sort demonstrates promising results in terms of accuracy and hardware efficiency. We assessed the detection and clustering accuracy of our design with publicly available datasets recorded using high-density neural probes (Neuropixel). We implemented our design on an FPGA and compared the results with state of the art. Results show that our designs consume less hardware resource comparing with other FPGA-based spike sorting hardware." + }, + "2406.17391v1": { + "title": "Comparing fingers and gestures for bci control using an optimized classical machine learning decoder", + "url": "http://arxiv.org/abs/2406.17391v1", + "authors": "D. Keller, M. J. Vansteensel, S. Mehrkanoon, M. P. Branco", + "update_time": "2024-06-25", + "abstract": "Severe impairment of the central motor network can result in loss of motor function, clinically recognized as Locked-in Syndrome. Advances in Brain-Computer Interfaces offer a promising avenue for partially restoring compromised communicative abilities by decoding different types of hand movements from the sensorimotor cortex. In this study, we collected ECoG recordings from 8 epilepsy patients and compared the decodability of individual finger flexion and hand gestures with the resting state, as a proxy for a one-dimensional brain-click. The results show that all individual finger flexion and hand gestures are equally decodable across multiple models and subjects (>98.0\\%). In particular, hand movements, involving index finger flexion, emerged as promising candidates for brain-clicks. When decoding among multiple hand movements, finger flexion appears to outperform hand gestures (96.2\\% and 92.5\\% respectively) and exhibit greater robustness against misclassification errors when all hand movements are included. These findings highlight that optimized classical machine learning models with feature engineering are viable decoder designs for communication-assistive systems." + }, + "2406.14801v1": { + "title": "Model Predictive Control of the Neural Manifold", + "url": "http://arxiv.org/abs/2406.14801v1", + "authors": "Christof Fehrman, C. Daniel Meliza", + "update_time": "2024-06-21", + "abstract": "Neural manifolds are an attractive theoretical framework for characterizing the complex behaviors of neural populations. However, many of the tools for identifying these low-dimensional subspaces are correlational and provide limited insight into the underlying dynamics. The ability to precisely control this latent activity would allow researchers to investigate the structure and function of neural manifolds. Employing techniques from the field of optimal control, we simulate controlling the latent dynamics of a neural population using closed-loop, dynamically generated sensory inputs. Using a spiking neural network (SNN) as a model of a neural circuit, we find low-dimensional representations of both the network activity (the neural manifold) and a set of salient visual stimuli. With a data-driven latent dynamics model, we apply model predictive control (MPC) to provide anticipatory, optimal control over the trajectory of the circuit in a latent space. We are able to control the latent dynamics of the SNN to follow several reference trajectories despite observing only a subset of neurons and with a substantial amount of unknown noise injected into the network. These results provide a framework to experimentally test for causal relationships between manifold dynamics and other variables of interest such as organismal behavior and BCI performance." + }, + "2406.14179v1": { + "title": "Single Channel-based Motor Imagery Classification using Fisher's Ratio and Pearson Correlation", + "url": "http://arxiv.org/abs/2406.14179v1", + "authors": "Sonal Santosh Baberwal, Tomas Ward, Shirley Coyle", + "update_time": "2024-06-20", + "abstract": "Motor imagery-based BCI systems have been promising and gaining popularity in rehabilitation and Activities of daily life(ADL). Despite this, the technology is still emerging and has not yet been outside the laboratory constraints. Channel reduction is one contributing avenue to make these systems part of ADL. Although Motor Imagery classification heavily depends on spatial factors, single channel-based classification remains an avenue to be explored thoroughly. Since Fisher's ratio and Pearson Correlation are powerful measures actively used in the domain, we propose an integrated framework (FRPC integrated framework) that integrates Fisher's Ratio to select the best channel and Pearson correlation to select optimal filter banks and extract spectral and temporal features respectively. The framework is tested for a 2-class motor imagery classification on 2 open-source datasets and 1 collected dataset and compared with state-of-art work. Apart from implementing the framework, this study also explores the most optimal channel among all the subjects and later explores classes where the single-channel framework is efficient." + }, + "2406.11799v1": { + "title": "Mix-Domain Contrastive Learning for Unpaired H&E-to-IHC Stain Translation", + "url": "http://arxiv.org/abs/2406.11799v1", + "authors": "Song Wang, Zhong Zhang, Huan Yan, Ming Xu, Guanghui Wang", + "update_time": "2024-06-17", + "abstract": "H&E-to-IHC stain translation techniques offer a promising solution for precise cancer diagnosis, especially in low-resource regions where there is a shortage of health professionals and limited access to expensive equipment. Considering the pixel-level misalignment of H&E-IHC image pairs, current research explores the pathological consistency between patches from the same positions of the image pair. However, most of them overemphasize the correspondence between domains or patches, overlooking the side information provided by the non-corresponding objects. In this paper, we propose a Mix-Domain Contrastive Learning (MDCL) method to leverage the supervision information in unpaired H&E-to-IHC stain translation. Specifically, the proposed MDCL method aggregates the inter-domain and intra-domain pathology information by estimating the correlation between the anchor patch and all the patches from the matching images, encouraging the network to learn additional contrastive knowledge from mixed domains. With the mix-domain pathology information aggregation, MDCL enhances the pathological consistency between the corresponding patches and the component discrepancy of the patches from the different positions of the generated IHC image. Extensive experiments on two H&E-to-IHC stain translation datasets, namely MIST and BCI, demonstrate that the proposed method achieves state-of-the-art performance across multiple metrics." + }, + "2406.11568v1": { + "title": "Towards an End-to-End Framework for Invasive Brain Signal Decoding with Large Language Models", + "url": "http://arxiv.org/abs/2406.11568v1", + "authors": "Sheng Feng, Heyang Liu, Yu Wang, Yanfeng Wang", + "update_time": "2024-06-17", + "abstract": "In this paper, we introduce a groundbreaking end-to-end (E2E) framework for decoding invasive brain signals, marking a significant advancement in the field of speech neuroprosthesis. Our methodology leverages the comprehensive reasoning abilities of large language models (LLMs) to facilitate direct decoding. By fully integrating LLMs, we achieve results comparable to the state-of-the-art cascade models. Our findings underscore the immense potential of E2E frameworks in speech neuroprosthesis, particularly as the technology behind brain-computer interfaces (BCIs) and the availability of relevant datasets continue to evolve. This work not only showcases the efficacy of combining LLMs with E2E decoding for enhancing speech neuroprosthesis but also sets a new direction for future research in BCI applications, underscoring the impact of LLMs in decoding complex neural signals for communication restoration. Code will be made available at https://github.com/FsFrancis15/BrainLLM.", + "code_url": "https://github.com/fsfrancis15/brainllm" + }, + "2406.11500v2": { + "title": "ESI-GAL: EEG Source Imaging-based Kinematics Parameter Estimation for Grasp and Lift Task", + "url": "http://arxiv.org/abs/2406.11500v2", + "authors": "Anant Jain, Lalan Kumar", + "update_time": "2024-06-18", + "abstract": "Objective: Electroencephalogram (EEG) signals-based motor kinematics prediction (MKP) has been an active area of research to develop brain-computer interface (BCI) systems such as exosuits, prostheses, and rehabilitation devices. However, EEG source imaging (ESI) based kinematics prediction is sparsely explored in the literature. Approach: In this study, pre-movement EEG features are utilized to predict three-dimensional (3D) hand kinematics for the grasp-and-lift motor task. A public dataset, WAY-EEG-GAL, is utilized for MKP analysis. In particular, sensor-domain (EEG data) and source-domain (ESI data) based features from the frontoparietal region are explored for MKP. Deep learning-based models are explored to achieve efficient kinematics decoding. Various time-lagged and window sizes are analyzed for hand kinematics prediction. Subsequently, intra-subject and inter-subject MKP analysis is performed to investigate the subject-specific and subject-independent motor-learning capabilities of the neural decoders. The Pearson correlation coefficient (PCC) is used as the performance metric for kinematics trajectory decoding. Main results: The rEEGNet neural decoder achieved the best performance with sensor-domain and source-domain features with the time lag and window size of 100 ms and 450 ms, respectively. The highest mean PCC values of 0.790, 0.795, and 0.637 are achieved using sensor-domain features, while 0.769, 0.777, and 0.647 are achieved using source-domain features in x, y, and z-directions, respectively. Significance: This study explores the feasibility of trajectory prediction using EEG sensor-domain and source-domain EEG features for the grasp-and-lift task. Furthermore, inter-subject trajectory estimation is performed using the proposed deep learning decoder with EEG source domain features." + }, + "2406.11081v1": { + "title": "A Bayesian dynamic stopping method for evoked response brain-computer interfacing", + "url": "http://arxiv.org/abs/2406.11081v1", + "authors": "Sara Ahmadi, Peter Desain, Jordy Thielen", + "update_time": "2024-06-16", + "abstract": "As brain-computer interfacing (BCI) systems transition from assistive technology to more diverse applications, their speed, reliability, and user experience become increasingly important. Dynamic stopping methods enhance BCI system speed by deciding at any moment whether to output a result or wait for more information. Such approach leverages trial variance, allowing good trials to be detected earlier, thereby speeding up the process without significantly compromising accuracy. Existing dynamic stopping algorithms typically optimize measures such as symbols per minute (SPM) and information transfer rate (ITR). However, these metrics may not accurately reflect system performance for specific applications or user types. Moreover, many methods depend on arbitrary thresholds or parameters that require extensive training data. We propose a model-based approach that takes advantage of the analytical knowledge that we have about the underlying classification model. By using a risk minimisation approach, our model allows precise control over the types of errors and the balance between precision and speed. This adaptability makes it ideal for customizing BCI systems to meet the diverse needs of various applications. We validate our proposed method on a publicly available dataset, comparing it with established static and dynamic stopping methods. Our results demonstrate that our approach offers a broad range of accuracy-speed trade-offs and achieves higher precision than baseline stopping methods.", + "code_url": "https://github.com/thijor/pyntbci" + }, + "2406.07481v1": { + "title": "The end of multiple choice tests: using AI to enhance assessment", + "url": "http://arxiv.org/abs/2406.07481v1", + "authors": "Michael Klymkowsky, Melanie M. Cooper", + "update_time": "2024-06-11", + "abstract": "Effective teaching relies on knowing what students know-or think they know. Revealing student thinking is challenging. Often used because of their ease of grading, even the best multiple choice (MC) tests, those using research based distractors (wrong answers) are intrinsically limited in the insights they provide due to two factors. When distractors do not reflect student beliefs they can be ignored, increasing the likelihood that the correct answer will be chosen by chance. Moreover, making the correct choice does not guarantee that the student understands why it is correct. To address these limitations, we recommend asking students to explain why they chose their answer, and why \"wrong\" choices are wrong. Using a discipline-trained artificial intelligence-based bot it is possible to analyze their explanations, identifying the concepts and scientific principles that maybe missing or misapplied. The bot also makes suggestions for how instructors can use these data to better guide student thinking. In a small \"proof of concept\" study, we tested this approach using questions from the Biology Concepts Instrument (BCI). The result was rapid, informative, and provided actionable feedback on student thinking. It appears that the use of AI addresses the weaknesses of conventional MC test. It seems likely that incorporating AI-analyzed formative assessments will lead to improved overall learning outcomes." + } + }, + "fMRI": { + "2407.03217v1": { + "title": "MHNet: Multi-view High-order Network for Diagnosing Neurodevelopmental Disorders Using Resting-state fMRI", + "url": "http://arxiv.org/abs/2407.03217v1", + "authors": "Yueyang Li, Weiming Zeng, Wenhao Dong, Luhui Cai, Lei Wang, Hongyu Chen, Hongjie Yan, Lingbin Bian, Nizhuan Wang", + "update_time": "2024-07-03", + "abstract": "Background: Deep learning models have shown promise in diagnosing neurodevelopmental disorders (NDD) like ASD and ADHD. However, many models either use graph neural networks (GNN) to construct single-level brain functional networks (BFNs) or employ spatial convolution filtering for local information extraction from rs-fMRI data, often neglecting high-order features crucial for NDD classification. Methods: We introduce a Multi-view High-order Network (MHNet) to capture hierarchical and high-order features from multi-view BFNs derived from rs-fMRI data for NDD prediction. MHNet has two branches: the Euclidean Space Features Extraction (ESFE) module and the Non-Euclidean Space Features Extraction (Non-ESFE) module, followed by a Feature Fusion-based Classification (FFC) module for NDD identification. ESFE includes a Functional Connectivity Generation (FCG) module and a High-order Convolutional Neural Network (HCNN) module to extract local and high-order features from BFNs in Euclidean space. Non-ESFE comprises a Generic Internet-like Brain Hierarchical Network Generation (G-IBHN-G) module and a High-order Graph Neural Network (HGNN) module to capture topological and high-order features in non-Euclidean space. Results: Experiments on three public datasets show that MHNet outperforms state-of-the-art methods using both AAL1 and Brainnetome Atlas templates. Extensive ablation studies confirm the superiority of MHNet and the effectiveness of using multi-view fMRI information and high-order features. Our study also offers atlas options for constructing more sophisticated hierarchical networks and explains the association between key brain regions and NDD. Conclusion: MHNet leverages multi-view feature learning from both Euclidean and non-Euclidean spaces, incorporating high-order information from BFNs to enhance NDD classification performance." + }, + "2407.00201v2": { + "title": "Deconvolving Complex Neuronal Networks into Interpretable Task-Specific Connectomes", + "url": "http://arxiv.org/abs/2407.00201v2", + "authors": "Yifan Wang, Vikram Ravindra, Ananth Grama", + "update_time": "2024-07-03", + "abstract": "Task-specific functional MRI (fMRI) images provide excellent modalities for studying the neuronal basis of cognitive processes. We use fMRI data to formulate and solve the problem of deconvolving task-specific aggregate neuronal networks into a set of basic building blocks called canonical networks, to use these networks for functional characterization, and to characterize the physiological basis of these responses by mapping them to regions of the brain. Our results show excellent task-specificity of canonical networks, i.e., the expression of a small number of canonical networks can be used to accurately predict tasks; generalizability across cohorts, i.e., canonical networks are conserved across diverse populations, studies, and acquisition protocols; and that canonical networks have strong anatomical and physiological basis. From a methods perspective, the problem of identifying these canonical networks poses challenges rooted in the high dimensionality, small sample size, acquisition variability, and noise. Our deconvolution technique is based on non-negative matrix factorization (NMF) that identifies canonical networks as factors of a suitably constructed matrix. We demonstrate that our method scales to large datasets, yields stable and accurate factors, and is robust to noise." + }, + "2406.19041v1": { + "title": "Multiscale Functional Connectivity: Exploring the brain functional connectivity at different timescales", + "url": "http://arxiv.org/abs/2406.19041v1", + "authors": "Manuel Morante, Kristian Fr\u00f8lich, Naveed ur Rehman", + "update_time": "2024-06-27", + "abstract": "Human brains exhibit highly organized multiscale neurophysiological dynamics. Understanding those dynamic changes and the neuronal networks involved is critical for understanding how the brain functions in health and disease. Functional Magnetic Resonance Imaging (fMRI) is a prevalent neuroimaging technique for studying these complex interactions. However, analyzing fMRI data poses several challenges. Furthermore, most approaches for analyzing Functional Connectivity (FC) still rely on preprocessing or conventional methods, often built upon oversimplified assumptions. On top of that, those approaches often ignore frequency-related information despite evidence showing that fMRI data contain rich information that spans multiple timescales. This study introduces a novel methodology, Multiscale Functional Connectivity (MFC), to analyze fMRI data by decomposing the fMRI into their intrinsic modes, allowing us to separate the neurophysiological activation patterns at multiple timescales while separating them from other interfering components. Additionally, the proposed approach accounts for the natural nonlinear and nonstationary nature of fMRI and the particularities of each individual in a data-driven way. We evaluated the performance of our proposed methodology using three fMRI experiments. Our results demonstrate that our novel approach effectively separates the fMRI data into different timescales while identifying highly reliable functional connectivity patterns across individuals. In addition, we further extended our knowledge of how the FC for these three experiments spans among different timescales." + }, + "2406.18531v1": { + "title": "A principled framework to assess information theoretical fitness of brain functional sub-circuits", + "url": "http://arxiv.org/abs/2406.18531v1", + "authors": "Duy Duong-Tran, Nghi Nguyen, Shizhuo Mu, Jiong Chen, Jingxuan Bao, Frederick Xu, Sumita Garai, Jose Cadena-Pico, Alan David Kaplan, Tianlong Chen, Yize Zhao, Li Shen, Joaqu\u00edn Go\u00f1i", + "update_time": "2024-06-26", + "abstract": "In systems and network neuroscience, many common practices in brain connectomic analysis are often not properly scrutinized. One such practice is mapping a predetermined set of sub-circuits, like functional networks (FNs), onto subjects' functional connectomes (FCs) without adequately assessing the information-theoretic appropriateness of the partition. Another practice that goes unchallenged is thresholding weighted FCs to remove spurious connections without justifying the chosen threshold. This paper leverages recent theoretical advances in Stochastic Block Models (SBMs) to formally define and quantify the information-theoretic fitness (e.g., prominence) of a predetermined set of FNs when mapped to individual FCs under different fMRI task conditions. Our framework allows for evaluating any combination of FC granularity, FN partition, and thresholding strategy, thereby optimizing these choices to preserve important topological features of the human brain connectomes. Our results pave the way for the proper use of predetermined FNs and thresholding methods and provide insights for future research in individualized parcellations." + }, + "2406.18344v1": { + "title": "AlignedCut: Visual Concepts Discovery on Brain-Guided Universal Feature Space", + "url": "http://arxiv.org/abs/2406.18344v1", + "authors": "Huzheng Yang, James Gee, Jianbo Shi", + "update_time": "2024-06-26", + "abstract": "We study the intriguing connection between visual data, deep networks, and the brain. Our method creates a universal channel alignment by using brain voxel fMRI response prediction as the training objective. We discover that deep networks, trained with different objectives, share common feature channels across various models. These channels can be clustered into recurring sets, corresponding to distinct brain regions, indicating the formation of visual concepts. Tracing the clusters of channel responses onto the images, we see semantically meaningful object segments emerge, even without any supervised decoder. Furthermore, the universal feature alignment and the clustering of channels produce a picture and quantification of how visual information is processed through the different network layers, which produces precise comparisons between the networks." + }, + "2406.17086v1": { + "title": "BrainMAE: A Region-aware Self-supervised Learning Framework for Brain Signals", + "url": "http://arxiv.org/abs/2406.17086v1", + "authors": "Yifan Yang, Yutong Mao, Xufu Liu, Xiao Liu", + "update_time": "2024-06-24", + "abstract": "The human brain is a complex, dynamic network, which is commonly studied using functional magnetic resonance imaging (fMRI) and modeled as network of Regions of interest (ROIs) for understanding various brain functions. Recent studies utilize deep learning approaches to learn the brain network representation based on functional connectivity (FC) profile, broadly falling into two main categories. The Fixed-FC approaches, utilizing the FC profile which represents the linear temporal relation within the brain network, are limited by failing to capture informative brain temporal dynamics. On the other hand, the Dynamic-FC approaches, modeling the evolving FC profile over time, often exhibit less satisfactory performance due to challenges in handling the inherent noisy nature of fMRI data. To address these challenges, we propose Brain Masked Auto-Encoder (BrainMAE) for learning representations directly from fMRI time-series data. Our approach incorporates two essential components: a region-aware graph attention mechanism designed to capture the relationships between different brain ROIs, and a novel self-supervised masked autoencoding framework for effective model pre-training. These components enable the model to capture rich temporal dynamics of brain activity while maintaining resilience to inherent noise in fMRI data. Our experiments demonstrate that BrainMAE consistently outperforms established baseline methods by significant margins in four distinct downstream tasks. Finally, leveraging the model's inherent interpretability, our analysis of model-generated representations reveals findings that resonate with ongoing research in the field of neuroscience." + }, + "2406.16981v1": { + "title": "Research on Feature Extraction Data Processing System For MRI of Brain Diseases Based on Computer Deep Learning", + "url": "http://arxiv.org/abs/2406.16981v1", + "authors": "Lingxi Xiao, Jinxin Hu, Yutian Yang, Yinqiu Feng, Zichao Li, Zexi Chen", + "update_time": "2024-06-23", + "abstract": "Most of the existing wavelet image processing techniques are carried out in the form of single-scale reconstruction and multiple iterations. However, processing high-quality fMRI data presents problems such as mixed noise and excessive computation time. This project proposes the use of matrix operations by combining mixed noise elimination methods with wavelet analysis to replace traditional iterative algorithms. Functional magnetic resonance imaging (fMRI) of the auditory cortex of a single subject is analyzed and compared to the wavelet domain signal processing technology based on repeated times and the world's most influential SPM8. Experiments show that this algorithm is the fastest in computing time, and its detection effect is comparable to the traditional iterative algorithm. However, this has a higher practical value for the processing of FMRI data. In addition, the wavelet analysis method proposed signal processing to speed up the calculation rate." + }, + "2406.15537v1": { + "title": "R&B -- Rhythm and Brain: Cross-subject Decoding of Music from Human Brain Activity", + "url": "http://arxiv.org/abs/2406.15537v1", + "authors": "Matteo Ferrante, Matteo Ciferri, Nicola Toschi", + "update_time": "2024-06-21", + "abstract": "Music is a universal phenomenon that profoundly influences human experiences across cultures. This study investigates whether music can be decoded from human brain activity measured with functional MRI (fMRI) during its perception. Leveraging recent advancements in extensive datasets and pre-trained computational models, we construct mappings between neural data and latent representations of musical stimuli. Our approach integrates functional and anatomical alignment techniques to facilitate cross-subject decoding, addressing the challenges posed by the low temporal resolution and signal-to-noise ratio (SNR) in fMRI data. Starting from the GTZan fMRI dataset, where five participants listened to 540 musical stimuli from 10 different genres while their brain activity was recorded, we used the CLAP (Contrastive Language-Audio Pretraining) model to extract latent representations of the musical stimuli and developed voxel-wise encoding models to identify brain regions responsive to these stimuli. By applying a threshold to the association between predicted and actual brain activity, we identified specific regions of interest (ROIs) which can be interpreted as key players in music processing. Our decoding pipeline, primarily retrieval-based, employs a linear map to project brain activity to the corresponding CLAP features. This enables us to predict and retrieve the musical stimuli most similar to those that originated the fMRI data. Our results demonstrate state-of-the-art identification accuracy, with our methods significantly outperforming existing approaches. Our findings suggest that neural-based music retrieval systems could enable personalized recommendations and therapeutic applications. Future work could use higher temporal resolution neuroimaging and generative models to improve decoding accuracy and explore the neural underpinnings of music perception and emotion.", + "code_url": "https://github.com/neoayanami/fmri-music-retrieve" + }, + "2406.12179v1": { + "title": "The Wisdom of a Crowd of Brains: A Universal Brain Encoder", + "url": "http://arxiv.org/abs/2406.12179v1", + "authors": "Roman Beliy, Navve Wasserman, Amit Zalcher, Michal Irani", + "update_time": "2024-06-18", + "abstract": "Image-to-fMRI encoding is important for both neuroscience research and practical applications. However, such \"Brain-Encoders\" have been typically trained per-subject and per fMRI-dataset, thus restricted to very limited training data. In this paper we propose a Universal Brain-Encoder, which can be trained jointly on data from many different subjects/datasets/machines. What makes this possible is our new voxel-centric Encoder architecture, which learns a unique \"voxel-embedding\" per brain-voxel. Our Encoder trains to predict the response of each brain-voxel on every image, by directly computing the cross-attention between the brain-voxel embedding and multi-level deep image features. This voxel-centric architecture allows the functional role of each brain-voxel to naturally emerge from the voxel-image cross-attention. We show the power of this approach to (i) combine data from multiple different subjects (a \"Crowd of Brains\") to improve each individual brain-encoding, (ii) quick & effective Transfer-Learning across subjects, datasets, and machines (e.g., 3-Tesla, 7-Tesla), with few training examples, and (iii) use the learned voxel-embeddings as a powerful tool to explore brain functionality (e.g., what is encoded where in the brain)." + }, + "2406.12065v1": { + "title": "STNAGNN: Spatiotemporal Node Attention Graph Neural Network for Task-based fMRI Analysis", + "url": "http://arxiv.org/abs/2406.12065v1", + "authors": "Jiyao Wang, Nicha C. Dvornek, Peiyu Duan, Lawrence H. Staib, Pamela Ventola, James S. Duncan", + "update_time": "2024-06-17", + "abstract": "Task-based fMRI uses actions or stimuli to trigger task-specific brain responses and measures them using BOLD contrast. Despite the significant task-induced spatiotemporal brain activation fluctuations, most studies on task-based fMRI ignore the task context information aligned with fMRI and consider task-based fMRI a coherent sequence. In this paper, we show that using the task structures as data-driven guidance is effective for spatiotemporal analysis. We propose STNAGNN, a GNN-based spatiotemporal architecture, and validate its performance in an autism classification task. The trained model is also interpreted for identifying autism-related spatiotemporal brain biomarkers." + } + }, + "MEG": { + "2407.02804v1": { + "title": "Mobile Edge Generation-Enabled Digital Twin: Architecture Design and Research Opportunities", + "url": "http://arxiv.org/abs/2407.02804v1", + "authors": "Xiaoxia Xu, Ruikang Zhong, Xidong Mu, Yuanwei Liu, Kaibin Huang", + "update_time": "2024-07-03", + "abstract": "A novel paradigm of mobile edge generation (MEG)-enabled digital twin (DT) is proposed, which enables distributed on-device generation at mobile edge networks for real-time DT applications. First, an MEG-DT architecture is put forward to decentralize generative artificial intelligence (GAI) models onto edge servers (ESs) and user equipments (UEs), which has the advantages of low latency, privacy preservation, and individual-level customization. Then, various single-user and multi-user generation mechanisms are conceived for MEG-DT, which strike trade-offs between generation latency, hardware costs, and device coordination. Furthermore, to perform efficient distributed generation, two operating protocols are explored for transmitting interpretable and latent features between ESs and UEs, namely sketch-based generation and seed-based generation, respectively. Based on the proposed protocols, the convergence between MEG and DT are highlighted. Considering the seed-based image generation scenario, numerical case studies are provided to reveal the superiority of MEG-DT over centralized generation. Finally, promising applications and research opportunities are identified." + }, + "2406.07151v1": { + "title": "EEG-ImageNet: An Electroencephalogram Dataset and Benchmarks with Image Visual Stimuli of Multi-Granularity Labels", + "url": "http://arxiv.org/abs/2406.07151v1", + "authors": "Shuqi Zhu, Ziyi Ye, Qingyao Ai, Yiqun Liu", + "update_time": "2024-06-11", + "abstract": "Identifying and reconstructing what we see from brain activity gives us a special insight into investigating how the biological visual system represents the world. While recent efforts have achieved high-performance image classification and high-quality image reconstruction from brain signals collected by Functional Magnetic Resonance Imaging (fMRI) or magnetoencephalogram (MEG), the expensiveness and bulkiness of these devices make relevant applications difficult to generalize to practical applications. On the other hand, Electroencephalography (EEG), despite its advantages of ease of use, cost-efficiency, high temporal resolution, and non-invasive nature, has not been fully explored in relevant studies due to the lack of comprehensive datasets. To address this gap, we introduce EEG-ImageNet, a novel EEG dataset comprising recordings from 16 subjects exposed to 4000 images selected from the ImageNet dataset. EEG-ImageNet consists of 5 times EEG-image pairs larger than existing similar EEG benchmarks. EEG-ImageNet is collected with image stimuli of multi-granularity labels, i.e., 40 images with coarse-grained labels and 40 with fine-grained labels. Based on it, we establish benchmarks for object classification and image reconstruction. Experiments with several commonly used models show that the best models can achieve object classification with accuracy around 60% and image reconstruction with two-way identification around 64%. These results demonstrate the dataset's potential to advance EEG-based visual brain-computer interfaces, understand the visual perception of biological systems, and provide potential applications in improving machine visual models.", + "code_url": "https://github.com/promise-z5q2sq/eeg-imagenet-dataset" + }, + "2406.01512v1": { + "title": "MAD: Multi-Alignment MEG-to-Text Decoding", + "url": "http://arxiv.org/abs/2406.01512v1", + "authors": "Yiqian Yang, Hyejeong Jo, Yiqun Duan, Qiang Zhang, Jinni Zhou, Won Hee Lee, Renjing Xu, Hui Xiong", + "update_time": "2024-06-03", + "abstract": "Deciphering language from brain activity is a crucial task in brain-computer interface (BCI) research. Non-invasive cerebral signaling techniques including electroencephalography (EEG) and magnetoencephalography (MEG) are becoming increasingly popular due to their safety and practicality, avoiding invasive electrode implantation. However, current works under-investigated three points: 1) a predominant focus on EEG with limited exploration of MEG, which provides superior signal quality; 2) poor performance on unseen text, indicating the need for models that can better generalize to diverse linguistic contexts; 3) insufficient integration of information from other modalities, which could potentially constrain our capacity to comprehensively understand the intricate dynamics of brain activity. This study presents a novel approach for translating MEG signals into text using a speech-decoding framework with multiple alignments. Our method is the first to introduce an end-to-end multi-alignment framework for totally unseen text generation directly from MEG signals. We achieve an impressive BLEU-1 score on the $\\textit{GWilliams}$ dataset, significantly outperforming the baseline from 5.49 to 10.44 on the BLEU-1 metric. This improvement demonstrates the advancement of our model towards real-world applications and underscores its potential in advancing BCI research. Code is available at $\\href{https://github.com/NeuSpeech/MAD-MEG2text}{https://github.com/NeuSpeech/MAD-MEG2text}$.", + "code_url": "https://github.com/neuspeech/mad-meg2text" + }, + "2406.16902v1": { + "title": "Learning Exemplar Representations in Single-Trial EEG Category Decoding", + "url": "http://arxiv.org/abs/2406.16902v1", + "authors": "Jack Kilgallen, Barak Pearlmutter, Jeffery Mark Siskind", + "update_time": "2024-05-31", + "abstract": "Within neuroimgaing studies it is a common practice to perform repetitions of trials in an experiment when working with a noisy class of data acquisition system, such as electroencephalography (EEG) or magnetoencephalography (MEG). While this approach can be useful in some experimental designs, it presents significant limitations for certain types of analyses, such as identifying the category of an object observed by a subject. In this study we demonstrate that when trials relating to a single object are allowed to appear in both the training and testing sets, almost any classification algorithm is capable of learning the representation of an object given only category labels. This ability to learn object representations is of particular significance as it suggests that the results of several published studies which predict the category of observed objects from EEG signals may be affected by a subtle form of leakage which has inflated their reported accuracies. We demonstrate the ability of both simple classification algorithms, and sophisticated deep learning models, to learn object representations given only category labels. We do this using two datasets; the Kaneshiro et al. (2015) dataset and the Gifford et al. (2022) dataset. Our results raise doubts about the true generalizability of several published models and suggests that the reported performance of these models may be significantly inflated." + }, + "2405.19479v1": { + "title": "Participation in the age of foundation models", + "url": "http://arxiv.org/abs/2405.19479v1", + "authors": "Harini Suresh, Emily Tseng, Meg Young, Mary L. Gray, Emma Pierson, Karen Levy", + "update_time": "2024-05-29", + "abstract": "Growing interest and investment in the capabilities of foundation models has positioned such systems to impact a wide array of public services. Alongside these opportunities is the risk that these systems reify existing power imbalances and cause disproportionate harm to marginalized communities. Participatory approaches hold promise to instead lend agency and decision-making power to marginalized stakeholders. But existing approaches in participatory AI/ML are typically deeply grounded in context - how do we apply these approaches to foundation models, which are, by design, disconnected from context? Our paper interrogates this question. First, we examine existing attempts at incorporating participation into foundation models. We highlight the tension between participation and scale, demonstrating that it is intractable for impacted communities to meaningfully shape a foundation model that is intended to be universally applicable. In response, we develop a blueprint for participatory foundation models that identifies more local, application-oriented opportunities for meaningful participation. In addition to the \"foundation\" layer, our framework proposes the \"subfloor'' layer, in which stakeholders develop shared technical infrastructure, norms and governance for a grounded domain, and the \"surface'' layer, in which affected communities shape the use of a foundation model for a specific downstream task. The intermediate \"subfloor'' layer scopes the range of potential harms to consider, and affords communities more concrete avenues for deliberation and intervention. At the same time, it avoids duplicative effort by scaling input across relevant use cases. Through three case studies in clinical care, financial services, and journalism, we illustrate how this multi-layer model can create more meaningful opportunities for participation than solely intervening at the foundation layer." + }, + "2405.17698v3": { + "title": "BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos", + "url": "http://arxiv.org/abs/2405.17698v3", + "authors": "Isla Duporge, Maksim Kholiavchenko, Roi Harel, Scott Wolf, Dan Rubenstein, Meg Crofoot, Tanya Berger-Wolf, Stephen Lee, Julie Barreau, Jenna Kline, Michelle Ramirez, Charles Stewart", + "update_time": "2024-06-03", + "abstract": "Using drones to track multiple individuals simultaneously in their natural environment is a powerful approach for better understanding group primate behavior. Previous studies have demonstrated that it is possible to automate the classification of primate behavior from video data, but these studies have been carried out in captivity or from ground-based cameras. To understand group behavior and the self-organization of a collective, the whole troop needs to be seen at a scale where behavior can be seen in relation to the natural environment in which ecological decisions are made. This study presents a novel dataset from drone videos for baboon detection, tracking, and behavior recognition. The baboon detection dataset was created by manually annotating all baboons in drone videos with bounding boxes. A tiling method was subsequently applied to create a pyramid of images at various scales from the original 5.3K resolution images, resulting in approximately 30K images used for baboon detection. The tracking dataset is derived from the detection dataset, where all bounding boxes are assigned the same ID throughout the video. This process resulted in half an hour of very dense tracking data. The behavior recognition dataset was generated by converting tracks into mini-scenes, a video subregion centered on each animal; each mini-scene was manually annotated with 12 distinct behavior types, resulting in over 20 hours of data. Benchmark results show mean average precision (mAP) of 92.62\\% for the YOLOv8-X detection model, multiple object tracking precision (MOTA) of 63.81\\% for the BotSort tracking algorithm, and micro top-1 accuracy of 63.97\\% for the X3D behavior recognition model. Using deep learning to classify wildlife behavior from drone footage facilitates non-invasive insight into the collective behavior of an entire group." + }, + "2405.13875v1": { + "title": "On the Inapproximability of Finding Minimum Monitoring Edge-Geodetic Sets", + "url": "http://arxiv.org/abs/2405.13875v1", + "authors": "Davide Bil\u00f2, Giordano Colli, Luca Forlizzi, Stefano Leucci", + "update_time": "2024-05-22", + "abstract": "Given an undirected connected graph $G = (V(G), E(G))$ on $n$ vertices, the minimum Monitoring Edge-Geodetic Set (MEG-set) problem asks to find a subset $M \\subseteq V(G)$ of minimum cardinality such that, for every edge $e \\in E(G)$, there exist $x,y \\in M$ for which all shortest paths between $x$ and $y$ in $G$ traverse $e$. We show that, for any constant $c < \\frac{1}{2}$, no polynomial-time $(c \\log n)$-approximation algorithm for the minimum MEG-set problem exists, unless $\\mathsf{P} = \\mathsf{NP}$." + }, + "2405.01012v1": { + "title": "Correcting Biased Centered Kernel Alignment Measures in Biological and Artificial Neural Networks", + "url": "http://arxiv.org/abs/2405.01012v1", + "authors": "Alex Murphy, Joel Zylberberg, Alona Fyshe", + "update_time": "2024-05-02", + "abstract": "Centred Kernel Alignment (CKA) has recently emerged as a popular metric to compare activations from biological and artificial neural networks (ANNs) in order to quantify the alignment between internal representations derived from stimuli sets (e.g. images, text, video) that are presented to both systems. In this paper we highlight issues that the community should take into account if using CKA as an alignment metric with neural data. Neural data are in the low-data high-dimensionality domain, which is one of the cases where (biased) CKA results in high similarity scores even for pairs of random matrices. Using fMRI and MEG data from the THINGS project, we show that if biased CKA is applied to representations of different sizes in the low-data high-dimensionality domain, they are not directly comparable due to biased CKA's sensitivity to differing feature-sample ratios and not stimuli-driven responses. This situation can arise both when comparing a pre-selected area of interest (e.g. ROI) to multiple ANN layers, as well as when determining to which ANN layer multiple regions of interest (ROIs) / sensor groups of different dimensionality are most similar. We show that biased CKA can be artificially driven to its maximum value when using independent random data of different sample-feature ratios. We further show that shuffling sample-feature pairs of real neural data does not drastically alter biased CKA similarity in comparison to unshuffled data, indicating an undesirable lack of sensitivity to stimuli-driven neural responses. Positive alignment of true stimuli-driven responses is only achieved by using debiased CKA. Lastly, we report findings that suggest biased CKA is sensitive to the inherent structure of neural data, only differing from shuffled data when debiased CKA detects stimuli-driven alignment.", + "code_url": "https://github.com/Alxmrphi/correcting_CKA_alignment" + }, + "2404.15588v1": { + "title": "Minimal Evidence Group Identification for Claim Verification", + "url": "http://arxiv.org/abs/2404.15588v1", + "authors": "Xiangci Li, Sihao Chen, Rajvi Kapadia, Jessica Ouyang, Fan Zhang", + "update_time": "2024-04-24", + "abstract": "Claim verification in real-world settings (e.g. against a large collection of candidate evidences retrieved from the web) typically requires identifying and aggregating a complete set of evidence pieces that collectively provide full support to the claim. The problem becomes particularly challenging when there exists distinct sets of evidence that could be used to verify the claim from different perspectives. In this paper, we formally define and study the problem of identifying such minimal evidence groups (MEGs) for claim verification. We show that MEG identification can be reduced from Set Cover problem, based on entailment inference of whether a given evidence group provides full/partial support to a claim. Our proposed approach achieves 18.4% and 34.8% absolute improvements on the WiCE and SciFact datasets over LLM prompting. Finally, we demonstrate the benefits of MEGs in downstream applications such as claim generation." + }, + "2404.10869v1": { + "title": "Alpha rhythm slowing in temporal epilepsy across Scalp EEG and MEG", + "url": "http://arxiv.org/abs/2404.10869v1", + "authors": "Vytene Janiukstyte, Csaba Kozma, Thomas W. Owen, Umair J Chaudhury, Beate Diehl, Louis Lemieux, John S Duncan, Fergus Rugg-Gunn, Jane de Tisi, Yujiang Wang, Peter N. Taylor", + "update_time": "2024-04-16", + "abstract": "EEG slowing is reported in various neurological disorders including Alzheimer's, Parkinson's and Epilepsy. Here, we investigate alpha rhythm slowing in individuals with refractory temporal lobe epilepsy (TLE), compared to healthy controls, using scalp electroencephalography (EEG) and magnetoencephalography (MEG). We retrospectively analysed data from 17,(46) healthy controls and 22,(24) individuals with TLE who underwent scalp EEG and (MEG) recordings as part of presurgical evaluation. Resting-state, eyes-closed recordings were source reconstructed using the standardized low-resolution brain electrographic tomography (sLORETA) method. We extracted low (slow) 6-9 Hz and high (fast) 10-11 Hz alpha relative band power and calculated the alpha power ratio by dividing low (slow) alpha by high (fast) alpha. This ratio was computed for all brain regions in all individuals. Alpha oscillations were slower in individuals with TLE than controls (p<0.05). This effect was present in both the ipsilateral and contralateral hemispheres, and across widespread brain regions. Alpha slowing in TLE was found in both EEG and MEG recordings. We interpret greater low (slow)-alpha as greater deviation from health." + } + }, + "neuroAI": { + "2306.10168v3": { + "title": "Beyond Geometry: Comparing the Temporal Structure of Computation in Neural Circuits with Dynamical Similarity Analysis", + "url": "http://arxiv.org/abs/2306.10168v3", + "authors": "Mitchell Ostrow, Adam Eisen, Leo Kozachkov, Ila Fiete", + "update_time": "2023-10-29", + "abstract": "How can we tell whether two neural networks utilize the same internal processes for a particular computation? This question is pertinent for multiple subfields of neuroscience and machine learning, including neuroAI, mechanistic interpretability, and brain-machine interfaces. Standard approaches for comparing neural networks focus on the spatial geometry of latent states. Yet in recurrent networks, computations are implemented at the level of dynamics, and two networks performing the same computation with equivalent dynamics need not exhibit the same geometry. To bridge this gap, we introduce a novel similarity metric that compares two systems at the level of their dynamics, called Dynamical Similarity Analysis (DSA). Our method incorporates two components: Using recent advances in data-driven dynamical systems theory, we learn a high-dimensional linear system that accurately captures core features of the original nonlinear dynamics. Next, we compare different systems passed through this embedding using a novel extension of Procrustes Analysis that accounts for how vector fields change under orthogonal transformation. In four case studies, we demonstrate that our method disentangles conjugate and non-conjugate recurrent neural networks (RNNs), while geometric methods fall short. We additionally show that our method can distinguish learning rules in an unsupervised manner. Our method opens the door to comparative analyses of the essential temporal structure of computation in neural circuits.", + "code_url": "https://github.com/mitchellostrow/dsa" + }, + "2305.11275v2": { + "title": "Explaining V1 Properties with a Biologically Constrained Deep Learning Architecture", + "url": "http://arxiv.org/abs/2305.11275v2", + "authors": "Galen Pogoncheff, Jacob Granley, Michael Beyeler", + "update_time": "2023-05-25", + "abstract": "Convolutional neural networks (CNNs) have recently emerged as promising models of the ventral visual stream, despite their lack of biological specificity. While current state-of-the-art models of the primary visual cortex (V1) have surfaced from training with adversarial examples and extensively augmented data, these models are still unable to explain key neural properties observed in V1 that arise from biological circuitry. To address this gap, we systematically incorporated neuroscience-derived architectural components into CNNs to identify a set of mechanisms and architectures that comprehensively explain neural activity in V1. We show drastic improvements in model-V1 alignment driven by the integration of architectural components that simulate center-surround antagonism, local receptive fields, tuned normalization, and cortical magnification. Upon enhancing task-driven CNNs with a collection of these specialized components, we uncover models with latent representations that yield state-of-the-art explanation of V1 neural activity and tuning properties. Our results highlight an important advancement in the field of NeuroAI, as we systematically establish a set of architectural components that contribute to unprecedented explanation of V1. The neuroscience insights that could be gleaned from increasingly accurate in-silico models of the brain have the potential to greatly advance the fields of both neuroscience and artificial intelligence." + }, + "2301.09245v2": { + "title": "Towards NeuroAI: Introducing Neuronal Diversity into Artificial Neural Networks", + "url": "http://arxiv.org/abs/2301.09245v2", + "authors": "Feng-Lei Fan, Yingxin Li, Hanchuan Peng, Tieyong Zeng, Fei Wang", + "update_time": "2023-03-11", + "abstract": "Throughout history, the development of artificial intelligence, particularly artificial neural networks, has been open to and constantly inspired by the increasingly deepened understanding of the brain, such as the inspiration of neocognitron, which is the pioneering work of convolutional neural networks. Per the motives of the emerging field: NeuroAI, a great amount of neuroscience knowledge can help catalyze the next generation of AI by endowing a network with more powerful capabilities. As we know, the human brain has numerous morphologically and functionally different neurons, while artificial neural networks are almost exclusively built on a single neuron type. In the human brain, neuronal diversity is an enabling factor for all kinds of biological intelligent behaviors. Since an artificial network is a miniature of the human brain, introducing neuronal diversity should be valuable in terms of addressing those essential problems of artificial networks such as efficiency, interpretability, and memory. In this Primer, we first discuss the preliminaries of biological neuronal diversity and the characteristics of information transmission and processing in a biological neuron. Then, we review studies of designing new neurons for artificial networks. Next, we discuss what gains can neuronal diversity bring into artificial networks and exemplary applications in several important fields. Lastly, we discuss the challenges and future directions of neuronal diversity to explore the potential of NeuroAI." + }, + "2212.04401v1": { + "title": "A Rubric for Human-like Agents and NeuroAI", + "url": "http://arxiv.org/abs/2212.04401v1", + "authors": "Ida Momennejad", + "update_time": "2022-12-08", + "abstract": "Researchers across cognitive, neuro-, and computer sciences increasingly reference human-like artificial intelligence and neuroAI. However, the scope and use of the terms are often inconsistent. Contributed research ranges widely from mimicking behaviour, to testing machine learning methods as neurally plausible hypotheses at the cellular or functional levels, or solving engineering problems. However, it cannot be assumed nor expected that progress on one of these three goals will automatically translate to progress in others. Here a simple rubric is proposed to clarify the scope of individual contributions, grounded in their commitments to human-like behaviour, neural plausibility, or benchmark/engineering goals. This is clarified using examples of weak and strong neuroAI and human-like agents, and discussing the generative, corroborate, and corrective ways in which the three dimensions interact with one another. The author maintains that future progress in artificial intelligence will need strong interactions across the disciplines, with iterative feedback loops and meticulous validity tests, leading to both known and yet-unknown advances that may span decades to come." + }, + "2210.08340v3": { + "title": "Toward Next-Generation Artificial Intelligence: Catalyzing the NeuroAI Revolution", + "url": "http://arxiv.org/abs/2210.08340v3", + "authors": "Anthony Zador, Sean Escola, Blake Richards, Bence \u00d6lveczky, Yoshua Bengio, Kwabena Boahen, Matthew Botvinick, Dmitri Chklovskii, Anne Churchland, Claudia Clopath, James DiCarlo, Surya Ganguli, Jeff Hawkins, Konrad Koerding, Alexei Koulakov, Yann LeCun, Timothy Lillicrap, Adam Marblestone, Bruno Olshausen, Alexandre Pouget, Cristina Savin, Terrence Sejnowski, Eero Simoncelli, Sara Solla, David Sussillo, Andreas S. Tolias, Doris Tsao", + "update_time": "2023-02-22", + "abstract": "Neuroscience has long been an essential driver of progress in artificial intelligence (AI). We propose that to accelerate progress in AI, we must invest in fundamental research in NeuroAI. A core component of this is the embodied Turing test, which challenges AI animal models to interact with the sensorimotor world at skill levels akin to their living counterparts. The embodied Turing test shifts the focus from those capabilities like game playing and language that are especially well-developed or uniquely human to those capabilities, inherited from over 500 million years of evolution, that are shared with all animals. Building models that can pass the embodied Turing test will provide a roadmap for the next generation of AI." + }, + "2112.15459v3": { + "title": "Social Neuro AI: Social Interaction as the \"dark matter\" of AI", + "url": "http://arxiv.org/abs/2112.15459v3", + "authors": "Samuele Bolotta, Guillaume Dumas", + "update_time": "2022-04-11", + "abstract": "This article introduces a three-axis framework indicating how AI can be informed by biological examples of social learning mechanisms. We argue that the complex human cognitive architecture owes a large portion of its expressive power to its ability to engage in social and cultural learning. However, the field of AI has mostly embraced a solipsistic perspective on intelligence. We thus argue that social interactions not only are largely unexplored in this field but also are an essential element of advanced cognitive ability, and therefore constitute metaphorically the dark matter of AI. In the first section, we discuss how social learning plays a key role in the development of intelligence. We do so by discussing social and cultural learning theories and empirical findings from social neuroscience. Then, we discuss three lines of research that fall under the umbrella of Social NeuroAI and can contribute to developing socially intelligent embodied agents in complex environments. First, neuroscientific theories of cognitive architecture, such as the global workspace theory and the attention schema theory, can enhance biological plausibility and help us understand how we could bridge individual and social theories of intelligence. Second, intelligence occurs in time as opposed to over time, and this is naturally incorporated by dynamical systems. Third, embodiment has been demonstrated to provide more sophisticated array of communicative signals. To conclude, we discuss the example of active inference, which offers powerful insights for developing agents that possess biological realism, can self-organize in time, and are socially embodied." + }, + "2011.07464v2": { + "title": "Predictive Coding, Variational Autoencoders, and Biological Connections", + "url": "http://arxiv.org/abs/2011.07464v2", + "authors": "Joseph Marino", + "update_time": "2021-10-23", + "abstract": "This paper reviews predictive coding, from theoretical neuroscience, and variational autoencoders, from machine learning, identifying the common origin and mathematical framework underlying both areas. As each area is prominent within its respective field, more firmly connecting these areas could prove useful in the dialogue between neuroscience and machine learning. After reviewing each area, we discuss two possible correspondences implied by this perspective: cortical pyramidal dendrites as analogous to (non-linear) deep networks and lateral inhibition as analogous to normalizing flows. These connections may provide new directions for further investigations in each field." + }, + "1909.02603v2": { + "title": "Additive function approximation in the brain", + "url": "http://arxiv.org/abs/1909.02603v2", + "authors": "Kameron Decker Harris", + "update_time": "2019-09-13", + "abstract": "Many biological learning systems such as the mushroom body, hippocampus, and cerebellum are built from sparsely connected networks of neurons. For a new understanding of such networks, we study the function spaces induced by sparse random features and characterize what functions may and may not be learned. A network with $d$ inputs per neuron is found to be equivalent to an additive model of order $d$, whereas with a degree distribution the network combines additive terms of different orders. We identify three specific advantages of sparsity: additive function approximation is a powerful inductive bias that limits the curse of dimensionality, sparse networks are stable to outlier noise in the inputs, and sparse random features are scalable. Thus, even simple brain architectures can be powerful function approximators. Finally, we hope that this work helps popularize kernel theories of networks among computational neuroscientists.", + "code_url": "https://github.com/kharris/sparse-random-features" + } + }, + "medical": { + "2407.03308v1": { + "title": "Accelerated Proton Resonance Frequency-based Magnetic Resonance Thermometry by Optimized Deep Learning Method", + "url": "http://arxiv.org/abs/2407.03308v1", + "authors": "Sijie Xu, Shenyan Zong, Chang-Sheng Mei, Guofeng Shen, Yueran Zhao, He Wang", + "update_time": "2024-07-03", + "abstract": "Proton resonance frequency (PRF) based MR thermometry is essential for focused ultrasound (FUS) thermal ablation therapies. This work aims to enhance temporal resolution in dynamic MR temperature map reconstruction using an improved deep learning method. The training-optimized methods and five classical neural networks were applied on the 2-fold and 4-fold under-sampling k-space data to reconstruct the temperature maps. The enhanced training modules included offline/online data augmentations, knowledge distillation, and the amplitude-phase decoupling loss function. The heating experiments were performed by a FUS transducer on phantom and ex vivo tissues, respectively. These data were manually under-sampled to imitate acceleration procedures and trained in our method to get the reconstruction model. The additional dozen or so testing datasets were separately obtained for evaluating the real-time performance and temperature accuracy. Acceleration factors of 1.9 and 3.7 were found for 2 times and 4 times k-space under-sampling strategies and the ResUNet-based deep learning reconstruction performed exceptionally well. In 2-fold acceleration scenario, the RMSE of temperature map patches provided the values of 0.888 degree centigrade and 1.145 degree centigrade on phantom and ex vivo testing datasets. The DICE value of temperature areas enclosed by 43 degree centigrade isotherm was 0.809, and the Bland-Altman analysis showed a bias of -0.253 degree centigrade with the apart of plus or minus 2.16 degree centigrade. In 4 times under-sampling case, these evaluating values decreased by approximately 10%. This study demonstrates that deep learning-based reconstruction can significantly enhance the accuracy and efficiency of MR thermometry for clinical FUS thermal therapies." + }, + "2407.03292v1": { + "title": "Biomechanics-informed Non-rigid Medical Image Registration and its Inverse Material Property Estimation with Linear and Nonlinear Elasticity", + "url": "http://arxiv.org/abs/2407.03292v1", + "authors": "Zhe Min, Zachary M. C. Baum, Shaheer U. Saeed, Mark Emberton, Dean C. Barratt, Zeike A. Taylor, Yipeng Hu", + "update_time": "2024-07-03", + "abstract": "This paper investigates both biomechanical-constrained non-rigid medical image registrations and accurate identifications of material properties for soft tissues, using physics-informed neural networks (PINNs). The complex nonlinear elasticity theory is leveraged to formally establish the partial differential equations (PDEs) representing physics laws of biomechanical constraints that need to be satisfied, with which registration and identification tasks are treated as forward (i.e., data-driven solutions of PDEs) and inverse (i.e., parameter estimation) problems under PINNs respectively. Two net configurations (i.e., Cfg1 and Cfg2) have also been compared for both linear and nonlinear physics model. Two sets of experiments have been conducted, using pairs of undeformed and deformed MR images from clinical cases of prostate cancer biopsy. Our contributions are summarised as follows. 1) We developed a learning-based biomechanical-constrained non-rigid registration algorithm using PINNs, where linear elasticity is generalised to the nonlinear version. 2) We demonstrated extensively that nonlinear elasticity shows no statistical significance against linear models in computing point-wise displacement vectors but their respective benefits may depend on specific patients, with finite-element (FE) computed ground-truth. 3) We formulated and solved the inverse parameter estimation problem, under the joint optimisation scheme of registration and parameter identification using PINNs, whose solutions can be accurately found by locating saddle points." + }, + "2407.03131v1": { + "title": "MVGT: A Multi-view Graph Transformer Based on Spatial Relations for EEG Emotion Recognition", + "url": "http://arxiv.org/abs/2407.03131v1", + "authors": "Yanjie Cui, Xiaohong Liu, Jing Liang, Yamin Fu", + "update_time": "2024-07-03", + "abstract": "Electroencephalography (EEG), a medical imaging technique that captures scalp electrical activity of brain structures via electrodes, has been widely used in affective computing. The spatial domain of EEG is rich in affective information.However, few of the existing studies have simultaneously analyzed EEG signals from multiple perspectives of geometric and anatomical structures in spatial domain. In this paper, we propose a multi-view Graph Transformer (MVGT) based on spatial relations, which integrates information from the temporal, frequency and spatial domains, including geometric and anatomical structures, so as to enhance the expressive power of the model comprehensively.We incorporate the spatial information of EEG channels into the model as encoding, thereby improving its ability to perceive the spatial structure of the channels. Meanwhile, experimental results based on publicly available datasets demonstrate that our proposed model outperforms state-of-the-art methods in recent years. In addition, the results also show that the MVGT could extract information from multiple domains and capture inter-channel relationships in EEG emotion recognition tasks effectively." + }, + "2407.03004v1": { + "title": "SemioLLM: Assessing Large Language Models for Semiological Analysis in Epilepsy Research", + "url": "http://arxiv.org/abs/2407.03004v1", + "authors": "Meghal Dani, Muthu Jeyanthi Prakash, Zeynep Akata, Stefanie Liebe", + "update_time": "2024-07-03", + "abstract": "Large Language Models have shown promising results in their ability to encode general medical knowledge in standard medical question-answering datasets. However, their potential application in clinical practice requires evaluation in domain-specific tasks, where benchmarks are largely missing. In this study semioLLM, we test the ability of state-of-the-art LLMs (GPT-3.5, GPT-4, Mixtral 8x7B, and Qwen-72chat) to leverage their internal knowledge and reasoning for epilepsy diagnosis. Specifically, we obtain likelihood estimates linking unstructured text descriptions of seizures to seizure-generating brain regions, using an annotated clinical database containing 1269 entries. We evaluate the LLM's performance, confidence, reasoning, and citation abilities in comparison to clinical evaluation. Models achieve above-chance classification performance with prompt engineering significantly improving their outcome, with some models achieving close-to-clinical performance and reasoning. However, our analyses also reveal significant pitfalls with several models being overly confident while showing poor performance, as well as exhibiting citation errors and hallucinations. In summary, our work provides the first extensive benchmark comparing current SOTA LLMs in the medical domain of epilepsy and highlights their ability to leverage unstructured texts from patients' medical history to aid diagnostic processes in health care." + }, + "2407.02994v1": { + "title": "MedPix 2.0: A Comprehensive Multimodal Biomedical Dataset for Advanced AI Applications", + "url": "http://arxiv.org/abs/2407.02994v1", + "authors": "Irene Siragusa, Salvatore Contino, Massimo La Ciura, Rosario Alicata, Roberto Pirrone", + "update_time": "2024-07-03", + "abstract": "The increasing interest in developing Artificial Intelligence applications in the medical domain, suffers from the lack of high-quality dataset, mainly due to privacy-related issues. Moreover, the recent rising of Multimodal Large Language Models (MLLM) leads to a need for multimodal medical datasets, where clinical reports and findings are attached to the corresponding CT or MR scans. This paper illustrates the entire workflow for building the data set MedPix 2.0. Starting from the well-known multimodal dataset MedPix\\textsuperscript{\\textregistered}, mainly used by physicians, nurses and healthcare students for Continuing Medical Education purposes, a semi-automatic pipeline was developed to extract visual and textual data followed by a manual curing procedure where noisy samples were removed, thus creating a MongoDB database. Along with the dataset, we developed a GUI aimed at navigating efficiently the MongoDB instance, and obtaining the raw data that can be easily used for training and/or fine-tuning MLLMs. To enforce this point, we also propose a CLIP-based model trained on MedPix 2.0 for scan classification tasks.", + "code_url": "https://github.com/chilab1/medpix-2.0" + }, + "2407.02974v1": { + "title": "IM-MoCo: Self-supervised MRI Motion Correction using Motion-Guided Implicit Neural Representations", + "url": "http://arxiv.org/abs/2407.02974v1", + "authors": "Ziad Al-Haj Hemidi, Christian Weihsbach, Mattias P. Heinrich", + "update_time": "2024-07-03", + "abstract": "Motion artifacts in Magnetic Resonance Imaging (MRI) arise due to relatively long acquisition times and can compromise the clinical utility of acquired images. Traditional motion correction methods often fail to address severe motion, leading to distorted and unreliable results. Deep Learning (DL) alleviated such pitfalls through generalization with the cost of vanishing structures and hallucinations, making it challenging to apply in the medical field where hallucinated structures can tremendously impact the diagnostic outcome. In this work, we present an instance-wise motion correction pipeline that leverages motion-guided Implicit Neural Representations (INRs) to mitigate the impact of motion artifacts while retaining anatomical structure. Our method is evaluated using the NYU fastMRI dataset with different degrees of simulated motion severity. For the correction alone, we can improve over state-of-the-art image reconstruction methods by $+5\\%$ SSIM, $+5\\:db$ PSNR, and $+14\\%$ HaarPSI. Clinical relevance is demonstrated by a subsequent experiment, where our method improves classification outcomes by at least $+1.5$ accuracy percentage points compared to motion-corrupted images.", + "code_url": "https://github.com/mdl-uzl/miccai24_immoco" + }, + "2407.02948v1": { + "title": "Information Greenhouse: Optimal Persuasion for Medical Test-Avoiders", + "url": "http://arxiv.org/abs/2407.02948v1", + "authors": "Zhuo Chen", + "update_time": "2024-07-03", + "abstract": "Patients often delay or reject medical tests due to information avoidance, which hinders timely reception of necessary treatments. This paper studies the optimal information policy to persuade an information-avoidant patient to undergo the test and make the best choice that maximizes his health. The patient sequentially decides whether to take the test and the optimal treatment plan. The information provided is about the background knowledge of the disease, which is complementary with the test result, and disclosure can take place both before and after the test decision. The optimal information policy depends on whether the patient is willing to be tested when he is completely pessimistic. If so, the optimal policy features \\textit{preemptive warning}: the disclosure only takes place before the test, and the bad news guarantees the patient to be tested and be treated even without further information. If not, the optimal policy constructs an \\textit{information greenhouse}: an information structure that provides high anticipatory utility is committed when the patient is tested and the test result is bad. I consider extensions to general information preference and ex ante participation constraint." + }, + "2407.02893v1": { + "title": "An Uncertainty-guided Tiered Self-training Framework for Active Source-free Domain Adaptation in Prostate Segmentation", + "url": "http://arxiv.org/abs/2407.02893v1", + "authors": "Zihao Luo, Xiangde Luo, Zijun Gao, Guotai Wang", + "update_time": "2024-07-03", + "abstract": "Deep learning models have exhibited remarkable efficacy in accurately delineating the prostate for diagnosis and treatment of prostate diseases, but challenges persist in achieving robust generalization across different medical centers. Source-free Domain Adaptation (SFDA) is a promising technique to adapt deep segmentation models to address privacy and security concerns while reducing domain shifts between source and target domains. However, recent literature indicates that the performance of SFDA remains far from satisfactory due to unpredictable domain gaps. Annotating a few target domain samples is acceptable, as it can lead to significant performance improvement with a low annotation cost. Nevertheless, due to extremely limited annotation budgets, careful consideration is needed in selecting samples for annotation. Inspired by this, our goal is to develop Active Source-free Domain Adaptation (ASFDA) for medical image segmentation. Specifically, we propose a novel Uncertainty-guided Tiered Self-training (UGTST) framework, consisting of efficient active sample selection via entropy-based primary local peak filtering to aggregate global uncertainty and diversity-aware redundancy filter, coupled with a tiered self-learning strategy, achieves stable domain adaptation. Experimental results on cross-center prostate MRI segmentation datasets revealed that our method yielded marked advancements, with a mere 5% annotation, exhibiting an average Dice score enhancement of 9.78% and 7.58% in two target domains compared with state-of-the-art methods, on par with fully supervised learning. Code is available at:https://github.com/HiLab-git/UGTST" + }, + "2407.02870v1": { + "title": "Membership Inference Attacks Against Time-Series Models", + "url": "http://arxiv.org/abs/2407.02870v1", + "authors": "Noam Koren, Abigail Goldsteen, Ariel Farkash, Guy Amit", + "update_time": "2024-07-03", + "abstract": "Analyzing time-series data that may contain personal information, particularly in the medical field, presents serious privacy concerns. Sensitive health data from patients is often used to train machine-learning models for diagnostics and ongoing care. Assessing the privacy risk of such models is crucial to making knowledgeable decisions on whether to use a model in production, share it with third parties, or deploy it in patients homes. Membership Inference Attacks (MIA) are a key method for this kind of evaluation, however time-series prediction models have not been thoroughly studied in this context. We explore existing MIA techniques on time-series models, and introduce new features, focusing on the seasonality and trend components of the data. Seasonality is estimated using a multivariate Fourier transform, and a low-degree polynomial is used to approximate trends. We applied these techniques to various types of time-series models, using datasets from the health domain. Our results demonstrate that these new features enhance the effectiveness of MIAs in identifying membership, improving the understanding of privacy risks in medical data applications." + }, + "2407.02844v1": { + "title": "Multi-Attention Integrated Deep Learning Frameworks for Enhanced Breast Cancer Segmentation and Identification", + "url": "http://arxiv.org/abs/2407.02844v1", + "authors": "Pandiyaraju V, Shravan Venkatraman, Pavan Kumar S, Santhosh Malarvannan, Kannan A", + "update_time": "2024-07-03", + "abstract": "Breast cancer poses a profound threat to lives globally, claiming numerous lives each year. Therefore, timely detection is crucial for early intervention and improved chances of survival. Accurately diagnosing and classifying breast tumors using ultrasound images is a persistent challenge in medicine, demanding cutting-edge solutions for improved treatment strategies. This research introduces multiattention-enhanced deep learning (DL) frameworks designed for the classification and segmentation of breast cancer tumors from ultrasound images. A spatial channel attention mechanism is proposed for segmenting tumors from ultrasound images, utilizing a novel LinkNet DL framework with an InceptionResNet backbone. Following this, the paper proposes a deep convolutional neural network with an integrated multi-attention framework (DCNNIMAF) to classify the segmented tumor as benign, malignant, or normal. From experimental results, it is observed that the segmentation model has recorded an accuracy of 98.1%, with a minimal loss of 0.6%. It has also achieved high Intersection over Union (IoU) and Dice Coefficient scores of 96.9% and 97.2%, respectively. Similarly, the classification model has attained an accuracy of 99.2%, with a low loss of 0.31%. Furthermore, the classification framework has achieved outstanding F1-Score, precision, and recall values of 99.1%, 99.3%, and 99.1%, respectively. By offering a robust framework for early detection and accurate classification of breast cancer, this proposed work significantly advances the field of medical image analysis, potentially improving diagnostic precision and patient outcomes." + } + } +} \ No newline at end of file