Quantitative Biology


Showing new listings for Wednesday, 19 November 2025

Total of 36 entries

New submissions (showing 23 of 23 entries)

[1] arXiv:2511.13960 [Chinese pdf, pdf, html, other]
Title: Classification of Aortic Shape with Topographical Pair Correlation Functions
Byung-Kwan Ko, Soowon Kim, Seo-Hyun Lee
Comments: 14 pages, 5 figures
Subjects: Tissues and Organs (q-bio.TO)

Quantitative descriptors convert high-dimensional medical images into low-dimensional features capable of differentiating organ shapes that correlate with injury or disease progression for diagnostic purposes. An important example is aortic dissections, which can be imaged using high-resolution CT scans and for which the shape of the true and false lumens of the aorta has long been used to predict disease state and the potential for positive surgical outcomes (namely thoracic endovascular repair or TEVAR). Here we present a method for calculating the topographical pair correlation function (TPCF), a descriptor of the spatial correlation of point estimates for Gaussian curvature, mean curvature, shape index, and bending ratio constrained to the surface of a meshed image. We used the TPCF as a metric to describe aortic shape and extracted quantitative features from the resulting curves. When the TPCF was parameterized by shape index, the area under the curve of the correlation function contributed to a classification accuracy of 0.95 for disease presence and/or impending TEVAR success. Comparison with single-point statistics suggests that the TPCF provides powerful features for classifying the disease state of aortas and more broadly in capturing structural correlations in anatomical data.
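
As a rough illustration of two ingredients named in the abstract, the sketch below computes a shape index from Gaussian and mean curvature and integrates a sampled correlation curve. It assumes the Koenderink form of the shape index, S = (2/pi) * arctan((k1 + k2) / (k1 - k2)), under one particular sign convention, and the function names `shape_index` and `curve_area` are hypothetical. The pair correlation step itself is not shown, and nothing here should be read as the authors' implementation.

```python
import numpy as np

def shape_index(K, H):
    """Shape index in [-1, 1] from Gaussian (K) and mean (H) curvature.

    Assumption: Koenderink-style definition; the sign convention depends
    on the orientation of the surface normal.
    """
    K = np.asarray(K, dtype=float)
    H = np.asarray(H, dtype=float)
    # Principal curvatures k1 >= k2; clip so rounding never yields a negative root.
    root = np.sqrt(np.clip(H ** 2 - K, 0.0, None))
    k1, k2 = H + root, H - root
    # arctan2 handles umbilic points (k1 == k2) without dividing by zero.
    return (2.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2)

def curve_area(r, g):
    """Trapezoidal area under a sampled correlation curve g(r)."""
    r = np.asarray(r, dtype=float)
    g = np.asarray(g, dtype=float)
    return float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(r)))
```

With this convention a unit sphere (K = 1, H = 1) maps to a shape index of 1 and a symmetric saddle (K = -1, H = 0) maps to 0. Planar points (K = 0, H = 0) are formally undefined and return 0 here.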

[2] arXiv:2511.14559 [Chinese pdf, pdf, html, other]
Title: Apo2Mol: 3D Molecule Generation via Dynamic Pocket-Aware Diffusion Models
Yue Ling, Peiqi Zhang, Zhenyi Zhang, Peijie Zhou
Comments: Accepted by AAAI 2026
Subjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)

Deep generative models are rapidly advancing structure-based drug design, offering substantial promise for generating small molecule ligands that bind to specific protein targets. However, most current approaches assume a rigid protein binding pocket, neglecting the intrinsic flexibility of proteins and the conformational rearrangements induced by ligand binding, limiting their applicability in practical drug discovery. Here, we propose Apo2Mol, a diffusion-based generative framework for 3D molecule design that explicitly accounts for conformational flexibility in protein binding pockets. To support this, we curate a dataset of over 24,000 experimentally resolved apo-holo structure pairs from the Protein Data Bank, enabling the characterization of protein structure changes associated with ligand binding. Apo2Mol employs a full-atom hierarchical graph-based diffusion model that simultaneously generates 3D ligand molecules and their corresponding holo pocket conformations from input apo states. Empirical studies demonstrate that Apo2Mol can achieve state-of-the-art performance in generating high-affinity ligands and accurately capture realistic protein pocket conformational changes.

[3] arXiv:2511.14188 [Chinese pdf, pdf, html, other]
Title: A region-specific brain dysfunction underlies cognitive impairment in long COVID brain fog
Lukas Picek, César Leblanc, Alexis Joly, Pierre Bonnet, Rémi Palard, Maximilien Servajean
Comments: 58 pages, 6 figures
Subjects: Neurons and Cognition (q-bio.NC)

Long COVID "brain fog" is a common and debilitating subjective syndrome often associated with persistent cognitive impairment after COVID-19 infection. Here we identify a specific regional brain dysfunction that mediates this cognitive impairment and provide evidence that targeted neuromodulation improves this deficit. In 120 patients with long COVID brain fog, we found an aberrant perceptual processing pattern. Patients with more severe brain fog committed significantly more false alarms (impulsive responses to non-signals) despite preserved overall accuracy. Both high-density (128-channel) EEG and structural MRI analyses provided converging evidence of a right inferior insula deficit, characterized by a blunted neural monitoring signal and cortical atrophy. We confirmed this deficit in a separate 796-participant UK Biobank longitudinal COVID re-imaging cohort, where COVID-19 survivors also showed selective impairment on a perceptual processing task and corresponding longitudinal atrophy of the right inferior insula compared with healthy controls. Finally, in a proof-of-principle randomized, sham-controlled trial (n = 40), a non-invasive, excitatory theta-burst ultrasound stimulation protocol targeting the right inferior insula rescued the perceptual deficit by reducing false alarms. These findings provide evidence of a causal role for right inferior insula dysfunction in long COVID-related perceptual impairment and show that modulation of this region can rescue the deficit, establishing it as a novel therapeutic target for long COVID cognitive impairment.

[4] arXiv:2511.13791 [Chinese pdf, pdf, html, other]
Title: XAI-Driven Deep Learning for Protein Sequence Functional Group Classification
Pratik Chakraborty, Aryan Bhargava
Comments: 8 pages, 4 figures
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Proteins perform essential biological functions, and accurate classification of their sequences is critical for understanding structure-function relationships, enzyme mechanisms, and molecular interactions. This study presents a deep learning-based framework for functional group classification of protein sequences derived from the Protein Data Bank (PDB). Four architectures were implemented: Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (BiLSTM), CNN-BiLSTM hybrid, and CNN with Attention. Each model was trained using k-mer integer encoding to capture both local and long-range dependencies. Among these, the CNN achieved the highest validation accuracy of 91.8%, demonstrating the effectiveness of localized motif detection. Explainable AI techniques, including Grad-CAM and Integrated Gradients, were applied to interpret model predictions and identify biologically meaningful sequence motifs. The discovered motifs, enriched in histidine, aspartate, glutamate, and lysine, represent amino acid residues commonly found in catalytic and metal-binding regions of transferase enzymes. These findings highlight that deep learning models can uncover functionally relevant biochemical signatures, bridging the gap between predictive accuracy and biological interpretability in protein sequence analysis.
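
The k-mer integer encoding mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: overlapping k-mers with stride 1, a vocabulary over the 20 standard amino acids, and id 0 reserved for padding and unknown k-mers. The names `build_kmer_vocab` and `encode` are hypothetical, and the abstract does not say which k or padding scheme the authors used.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def build_kmer_vocab(k):
    """Map every possible k-mer to a positive integer; 0 is kept for padding/unknown."""
    return {"".join(p): i + 1 for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}

def encode(sequence, k, vocab, max_len):
    """Slide a width-k window over the sequence and return a fixed-length id list."""
    ids = [vocab.get(sequence[i:i + k], 0) for i in range(len(sequence) - k + 1)]
    ids = ids[:max_len]                       # truncate long sequences
    return ids + [0] * (max_len - len(ids))   # right-pad short ones

if __name__ == "__main__":
    vocab = build_kmer_vocab(3)
    print(encode("MKTAYIAKQR", 3, vocab, max_len=12))
```

The resulting integer lists would typically feed an embedding layer in front of a CNN or BiLSTM; that wiring is left out here.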

[5] arXiv:2511.12205 [Chinese pdf, pdf, html, other]
Title: LCPan: efficient variation graph construction using Locally Consistent Parsing
Zhaoxuan Wang, Weichen Kang, Yutian Han, Lingyuan Zhao, Bo Li
Subjects: Genomics (q-bio.GN)

Efficient and consistent string processing is critical in an era of exponentially growing genomic data. Locally Consistent Parsing (LCP) addresses this need by partitioning an input genome string into short, exactly matching substrings (i.e., "cores"), ensuring consistency across partitions. Labeling the cores of an input string consistently not only provides a compact representation of the input but also enables the reapplication of LCP to refine the cores over multiple iterations, providing a progressively longer and more informative set of substrings for downstream analyses. We present the first iterative implementation of LCP with Lcptools and demonstrate its effectiveness in identifying cores with minimal collisions. Experimental results show that the number of cores at the i-th iteration is O(n/c^i) with c ≈ 2.34, while the average length and the average distance between consecutive cores are O(c^i). Compared to popular sketching techniques, LCP produces significantly fewer cores, enabling a more compact representation and faster analyses. To demonstrate the advantages of LCP in genomic string processing in terms of computation and memory efficiency, we also introduce LCPan, an efficient variation graph constructor. We show that LCPan generates variation graphs >10x faster than vg, while using >13x less memory.
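
To make the reported scaling concrete, the short sketch below tabulates n/c^i and c^i for a few iterations. It treats the O(.) bounds as if the hidden constants were 1, which is an assumption made purely for illustration. The genome length and the function names are likewise hypothetical.

```python
def expected_cores(n, level, c=2.34):
    """Order-of-magnitude core count n / c**level (constant factors ignored)."""
    return n / c ** level

def expected_core_length(level, c=2.34):
    """Order-of-magnitude average core length c**level (constant factors ignored)."""
    return c ** level

if __name__ == "__main__":
    n = 3_000_000_000  # hypothetical genome length in bases
    for level in range(1, 9):
        cores = round(expected_cores(n, level))
        length = round(expected_core_length(level), 1)
        print(f"iteration {level}: ~{cores} cores, ~{length} bases apart")
```

Each additional iteration shrinks the core set by roughly the same factor c, which is why a handful of iterations is enough to go from billions of bases to a far smaller set of anchors.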

[6] arXiv:2511.13786 [Chinese pdf, pdf, other]
Title: CellStream: Dynamical Optimal Transport Informed Embeddings for Reconstructing Cellular Trajectories from Snapshots Data
Hirokuni Miyamoto, Kenta Suzuki, Shigeharu Moriya, Makiko Matsuura, Naoko Tsuji, Teruno Nakaguma, Chitose Ishii, Takayuki Nagatsuka, Takashi Satoh, Wataru Suda, Tamotsu Kato, Chie Shindo, Atsushi Kurotani, Hiroaki Kodama, Hiroshi Masuya, Satoshi Wada, Nobuhiro Kawachi, Hisashi Miyamoto, Yukinari Tsuruda, Yohei Shimasaki, Shouzo Ogizo, Nobuo Suzuki, Tomoharu Yuge, Toshio Takahashi, Tomohito Ojima, Toshio Furota, Akio Sakamoto, Keiichi Takimoto, Kozo Kugimiya, Takehiro Tanaka, Takashi Kimura, Yuuji Oshima, Jun Kikuchi, Hiroshi Ohno
Comments: Oral presentation paper at the AAAI 2026 conference
Subjects: Genomics (q-bio.GN); Machine Learning (cs.LG)

Single-cell RNA sequencing (scRNA-seq), especially temporally resolved datasets, enables genome-wide profiling of gene expression dynamics at single-cell resolution across discrete time points. However, current technologies provide only sparse, static snapshots of cell states and are inherently influenced by technical noise, complicating the inference and representation of continuous transcriptional dynamics. Although embedding methods can reduce dimensionality and mitigate technical noise, the majority of existing approaches typically treat trajectory inference separately from embedding construction, often neglecting temporal structure. To address this challenge, here we introduce CellStream, a novel deep learning framework that jointly learns embedding and cellular dynamics from single-cell snapshot data by integrating an autoencoder with unbalanced dynamical optimal transport. Compared to existing methods, CellStream generates dynamics-informed embeddings that robustly capture temporal developmental processes while maintaining high consistency with the underlying data manifold. We demonstrate CellStream's effectiveness on both simulated datasets and real scRNA-seq data, including spatial transcriptomics. Our experiments indicate significant quantitative improvements over state-of-the-art methods in representing cellular trajectories with enhanced temporal coherence and reduced noise sensitivity. Overall, CellStream provides a new tool for learning and representing continuous streams from the noisy, static snapshots of single-cell gene expression.

[7] arXiv:2511.14065 [Chinese pdf, pdf, html, other]
Title: Intrinsic Resonance depends on Network Size of Coupled-Delayed Interacting Oscillators
Chengrui Li, Yunmiao Wang, Yule Wang, Weihan Li, Dieter Jaeger, Anqi Wu
Comments: 16 pages, 3 figures: 2 in the main text, 1 in the appendix
Subjects: Neurons and Cognition (q-bio.NC); Systems and Control (eess.SY)

The collective frequency that emerges from synchronized neuronal populations, the network resonance, shows a systematic relationship with brain size: large whole-brain networks oscillate slowly, whereas finer parcellations of fixed volume exhibit faster rhythms. This resonance-size scaling has been reported in delayed neural mass models and human neuroimaging, yet the physical mechanism has remained unresolved. Here we show that size-dependent resonance follows directly from propagation delays in delay-coupled phase oscillators. Starting from a Kuramoto model with heterogeneous delays, we linearize around the near-synchronous solution and obtain a closed-form approximation linking the resonance $\Omega$ to the mean delay and the effective coupling field. The analysis predicts a generic scaling law, $\Omega \approx (\sum_j c_{ij} \tau)^{-1}$, so resonance is delay-limited and therefore depends systematically on geometric size or parcellation density. We evaluate four growth scenarios (expanding geometry, fixed-volume parcellation, constant geometry, and an unphysical reference case) and show that only geometry-consistent scaling satisfies the analytical prediction. Numerical simulations with heterogeneous delays validate the law and quantify its error as a function of delay dispersion. These results identify a minimal physical mechanism for size-dependent cortical resonance and provide an analytical framework that unifies numerical simulation outputs.
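
The scaling law quoted in the abstract can be evaluated numerically in a few lines. The sketch below is only a reading of the formula as printed: it assumes a single mean delay tau shared by all connections and a coupling matrix whose row i holds the weights c_ij. The function name, the example matrix, and the delay values are hypothetical.

```python
import numpy as np

def predicted_resonance(coupling, tau):
    """Evaluate Omega_i ~ 1 / (sum_j c_ij * tau) for every node i.

    Assumptions: tau is one mean delay shared by all links, and
    coupling[i, j] holds the weight c_ij.
    """
    coupling = np.asarray(coupling, dtype=float)
    return 1.0 / (coupling.sum(axis=1) * tau)

if __name__ == "__main__":
    c = np.array([[0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])  # hypothetical all-to-all coupling
    for tau in (0.005, 0.010, 0.020):  # hypothetical mean delays in seconds
        print(tau, predicted_resonance(c, tau))
```

Doubling the mean delay halves the predicted resonance, which is the delay-limited behaviour the abstract describes. How delay itself grows with geometric size is not specified in the listing and is not modelled here.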

[8] arXiv:2511.14576 [Chinese pdf, pdf, html, other]
Title: Non-vanishing of Artin $L$-functions associated with $D_4$-quartic function fields ordered by conductor
Tomas Ascoli, Dhruba Pariyar Damay, Jing Li, Angela Peace, Gregory D. Mayer, Rebecca A. Everett
Comments: 34 pages
Subjects: Number Theory (math.NT)

We study the low-lying zeros of certain Artin $L$-functions associated with $D_4$-quartic function fields. Specifically, we prove that when ordered by conductor, at least $77\%$ of these $L$-functions are non-vanishing at the central point. This generalises and extends results over $\mathbb{Q}$ due to Durlanik, proving that an infinite number of these $L$-functions are non-vanishing. We obtain these results by examining the low-lying zeros of the $L$-functions using the one-level density. Specifically, we apply and extend a method used by Rudnick, who studied Dirichlet $L$-functions associated with quadratic function field extensions, to the $D_4$-case. The main difficulty is studying $L$-functions which are associated to $D_4$-fields whose quadratic subfield is of large discriminant. These $L$-functions are studied by utilising the so-called flipped field of a $D_4$ extension, combining a method introduced by Friedrichsen for counting $D_4$-fields, with explicit ramification theory in such fields provided by Altuğ, Shankar, Varma and Wilson.

[9] arXiv:2511.02340 [Chinese pdf, pdf, other]
Title: Chronic Kidney Disease Prognosis Prediction Using Transformer
Karthik Venuturimilli (1), Yang Ha (1) ((1) Berkeley National Laboratory)
Comments: 5 pages, 2 figures, 2 tables
Subjects: Artificial Intelligence (cs.AI); Other Quantitative Biology (q-bio.OT)

Chronic Kidney Disease (CKD) affects nearly 10% of the global population and often progresses to end-stage renal failure. Accurate prognosis prediction is vital for timely interventions and resource optimization. We present a transformer-based framework for predicting CKD progression using multi-modal electronic health records (EHR) from the Seoul National University Hospital OMOP Common Data Model. Our approach (ProQ-BERT) integrates demographic, clinical, and laboratory data, employing quantization-based tokenization for continuous lab values and attention mechanisms for interpretability. The model was pretrained with masked language modeling and fine-tuned for binary classification tasks predicting progression from stage 3a to stage 5 across varying follow-up and assessment periods. Evaluated on a cohort of 91,816 patients, our model consistently outperformed CEHR-BERT, achieving ROC-AUC up to 0.995 and PR-AUC up to 0.989 for short-term prediction. These results highlight the effectiveness of transformer architectures and temporal design choices in clinical prognosis modeling, offering a promising direction for personalized CKD care.
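
One plausible reading of "quantization-based tokenization for continuous lab values" is to bin each lab test by quantiles learned on training data and emit one discrete token per result. The sketch below shows that idea only; the quantile scheme, the token format, and the names `fit_bin_edges` and `tokenize_lab` are assumptions, not details taken from the paper.

```python
import numpy as np

def fit_bin_edges(train_values, n_bins):
    """Interior quantile edges learned from training values of one lab test."""
    qs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]  # drop the 0% and 100% quantiles
    return np.quantile(np.asarray(train_values, dtype=float), qs)

def tokenize_lab(name, value, edges):
    """Turn a continuous result into a discrete token such as 'creatinine|bin2'."""
    bin_id = int(np.searchsorted(edges, value, side="right"))
    return f"{name}|bin{bin_id}"

if __name__ == "__main__":
    train = [0.6, 0.8, 0.9, 1.0, 1.1, 1.3, 1.8, 2.4]  # hypothetical creatinine values
    edges = fit_bin_edges(train, n_bins=4)
    print([tokenize_lab("creatinine", v, edges) for v in (0.5, 1.05, 3.0)])
```

Tokens of this kind can share a vocabulary with diagnosis and medication codes, so a BERT-style model can be pretrained with masked language modeling over the mixed sequence.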

[10] arXiv:2511.14669 [Chinese pdf, pdf, html, other]
Title: Hyperbolic Graph Embeddings Reveal the Host-Pathogen Interactome
Nilay Kumar, Priyansh Bhandari, G. Maragatham
Subjects: Molecular Networks (q-bio.MN)

Infections depend on interactions between pathogen and host proteins, but comprehensively mapping these interactions is challenging and labor intensive. Many biological networks have hierarchical, scale-free structure, so we developed a deep learning framework, ApexPPI, that represents protein networks in hyperbolic Riemannian space to capture these features. Our model integrates multimodal biological data (protein sequences, gene perturbation experiments, and complementary interaction networks) to predict likely interactions between pathogen and host proteins through multi-task hyperbolic graph neural networks. Mapping protein features into hyperbolic space led to much higher accuracy than previous methods in predicting host-pathogen interactions. From tens of millions of possible protein pairs, our model identified thousands of high-confidence interactions, including many involving human G-protein-coupled receptors (GPCRs). We validated dozens of these predicted complexes using AlphaFold 3 structural modeling, supporting the accuracy of our predictions. This comprehensive map of host-pathogen protein interactions provides a resource for discovering new treatments and illustrates how advanced AI can unravel complex biological systems.
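
For readers unfamiliar with hyperbolic embeddings, the sketch below computes the geodesic distance between two points in the Poincaré ball, one common model of hyperbolic space. The abstract does not say which model ApexPPI uses, so the choice of the Poincaré ball (curvature -1) and the function name are assumptions for illustration.

```python
import numpy as np

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    sq_diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq_diff / denom))

if __name__ == "__main__":
    hub = [0.0, 0.0]     # hypothetical hub protein near the origin
    leaf_a = [0.9, 0.0]  # hypothetical peripheral proteins near the boundary
    leaf_b = [0.0, 0.9]
    print(poincare_distance(hub, leaf_a), poincare_distance(leaf_a, leaf_b))
```

Distances grow rapidly near the boundary, so points placed there are far from each other yet comparatively close to the origin. This is the property that makes such spaces a natural fit for hierarchical, scale-free networks.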

[11] arXiv:2511.14555 [Chinese pdf, pdf, html, other]
Title: DecNefLab: A Modular and Interpretable Simulation Framework for Decoded Neurofeedback
Cooper Bruno, Tiago Cecchi, Joseph A. Pugar, Luka Pocivavsek, Newell Washburn
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI)

Decoded Neurofeedback (DecNef) is a flourishing non-invasive approach to brain modulation with wide-ranging applications in neuromedicine and cognitive neuroscience. However, progress in DecNef research remains constrained by subject-dependent learning variability, reliance on indirect measures to quantify progress, and the high cost and time demands of experimentation. We present DecNefLab, a modular and interpretable simulation framework that formalizes DecNef as a machine learning problem. Beyond providing a virtual laboratory, DecNefLab enables researchers to model, analyze and understand neurofeedback dynamics. Using latent variable generative models as simulated participants, DecNefLab allows direct observation of internal cognitive states and systematic evaluation of how different protocol designs and subject characteristics influence learning. We demonstrate how this approach can (i) reproduce empirical phenomena of DecNef learning, (ii) identify conditions under which DecNef feedback fails to induce learning, and (iii) guide the design of more robust and reliable DecNef protocols in silico before human implementation. In summary, DecNefLab bridges computational modeling and cognitive neuroscience, offering a principled foundation for methodological innovation, robust protocol design, and ultimately, a deeper understanding of DecNef-based brain modulation.

[12] arXiv:2511.14466 [Chinese pdf, pdf, html, other]
Title: Effect of Dopamine in Enhancement of SNR of Cortico-Striatal-Thalamo-Cortical Loop Spiking
Felipe A. Torres, Alejandro Weinstein, Jesus M. Cortes, Wael El-Deredy
Comments: 9 pages
Subjects: Neurons and Cognition (q-bio.NC)

In this work, the effects of the neurotransmitter dopamine within the Cortico-Striatal-Thalamo-Cortical (CSTC) loop are studied. Simulations confirmed that dopamine facilitates movement via thalamic disinhibition. Analysis of its impact on the signal-to-noise ratio (SNR) revealed a complex, region-specific outcome: SNR increased in some regions (e.g., D2 Striatum: 3.41 dB to 6.25 dB), decreased in others (e.g., Thalamus VL: 6.24 dB to 3.93 dB), and remained stable elsewhere (e.g., M1: 3.16 dB to 3.13 dB). This heterogeneity stems from dopamine increasing the excitability of D1-receptor-expressing neurons, which amplifies channel conductance noise and reduces SNR in specific circuits. Thus, dopamine acts not as a uniform signal enhancer, but as a complex modulator that critically balances facilitation and noise within the CSTC loop.
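
The reported dB values are easier to compare once converted to linear ratios. The sketch below does that conversion for the three regions quoted in the abstract. It assumes the usual power convention, SNR_dB = 10 * log10(P_signal / P_noise); the abstract does not state which convention the authors used, and the function name is hypothetical.

```python
def db_change_to_power_ratio(before_db, after_db):
    """Linear power-ratio factor implied by a change in SNR expressed in dB.

    Assumption: dB values follow the 10 * log10 power convention.
    """
    return 10 ** ((after_db - before_db) / 10.0)

# SNR values (dB) without and with dopamine, as quoted in the abstract.
reported = {
    "D2 Striatum": (3.41, 6.25),
    "Thalamus VL": (6.24, 3.93),
    "M1": (3.16, 3.13),
}

for region, (before, after) in reported.items():
    factor = db_change_to_power_ratio(before, after)
    print(f"{region}: {after - before:+.2f} dB, power ratio x{factor:.2f}")
```

Under that assumption, the 2.84 dB gain in the D2 striatum corresponds to a power ratio of just under two, while the change in M1 is close to a factor of one.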

[13] arXiv:2511.14083 [Chinese pdf, pdf, html, other]
Title: Automated glenoid bone loss measurement and segmentation in CT scans for pre-operative planning in shoulder instability
Luke Piszkin, Dervis C. Vural
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)

Reliable measurement of glenoid bone loss is essential for operative planning in shoulder instability, but current manual and semi-automated methods are time-consuming and often subject to interreader variability. We developed and validated a fully automated deep learning pipeline for measuring glenoid bone loss on three-dimensional computed tomography (CT) scans using a linear-based, en-face view, best-circle method. Shoulder CT images of 91 patients (average age, 40 years; range, 14-89 years; 65 men) were retrospectively collected along with manual labels including glenoid segmentation, landmarks, and bone loss measurements. The multi-stage algorithm has three main stages: (1) segmentation, where we developed a U-Net to automatically segment the glenoid and humerus; (2) anatomical landmark detection, where a second network predicts glenoid rim points; and (3) geometric fitting, where we applied principal component analysis (PCA), projection, and circle fitting to compute the percentage of bone loss. The automated measurements showed strong agreement with consensus readings and exceeded surgeon-to-surgeon consistency (intraclass correlation coefficient (ICC) 0.84 vs 0.78), including in low- and high-bone-loss subgroups (ICC 0.71 vs 0.63 and 0.83 vs 0.21, respectively; P < 0.001). For classifying patients into low, medium, and high bone-loss categories, the pipeline achieved a recall of 0.714 for low and 0.857 for high severity, with no low cases misclassified as high or vice versa. These results suggest that our method is a time-efficient and clinically reliable tool for preoperative planning in shoulder instability and for screening patients with substantial glenoid bone loss. Code and dataset are available at https://github.com/Edenliu1/Auto-Glenoid-Measurement-DL-Pipeline.
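
The geometric fitting stage described above (PCA, projection, circle fitting) can be sketched with NumPy alone. This is a minimal illustration, assuming the rim landmarks are given as an N x 3 array and using an algebraic least-squares circle fit. The function names are hypothetical, the authors' repository may do this differently, and the final conversion to a bone loss percentage is not shown.

```python
import numpy as np

def fit_plane_pca(points):
    """Return the centroid and two orthonormal in-plane axes (top principal directions)."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    return centroid, vt[:2]

def fit_circle_2d(xy):
    """Algebraic least-squares fit of x^2 + y^2 = 2*a*x + 2*b*y + c."""
    A = np.column_stack([2 * xy[:, 0], 2 * xy[:, 1], np.ones(len(xy))])
    rhs = (xy ** 2).sum(axis=1)
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    # c = r^2 - a^2 - b^2, so the radius follows directly.
    return np.array([a, b]), float(np.sqrt(c + a * a + b * b))

def best_fit_circle(rim_points):
    """Project 3D rim landmarks onto their PCA plane and fit a circle there."""
    pts = np.asarray(rim_points, dtype=float)
    centroid, axes = fit_plane_pca(pts)
    xy = (pts - centroid) @ axes.T          # 2D coordinates in the fitted plane
    center_2d, radius = fit_circle_2d(xy)
    return centroid + center_2d @ axes, radius  # center mapped back to 3D
```

Fitting on an arc rather than a full ring matters in this setting, because bone loss removes part of the rim; the algebraic fit recovers the circle from the intact portion.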

[14] arXiv:2511.13916 [Chinese pdf, pdf, other]
Title: Structural Flexibility of the TCF7L2-DNA Complex with the Type 2 Diabetes SNP rs7903146
Jinhao Yang, Shaojiong Zhou, Zhibin Wang, Jiahua Xu, Jia Chen, Zhouqian Yin, Tao Wei, Chaofan Geng, Xiaoduo Liu, Xiang Li, Xiaoyu Zhou, Kun Li, Ruolei Gu, Raymond Dolan, Yi Tang, Yunzhe Liu
Comments: 10 pages, 6 figures. Accepted at the 2025 IEEE International Conference on e-Health and Bioengineering (EHB); conference proceedings to be indexed/published by Springer Nature
Subjects: Biomolecules (q-bio.BM)

The single nucleotide polymorphism (SNP) rs7903146 in the TCF7L2 gene has been identified as one of the strongest common genetic risk factors for Type 2 Diabetes (T2D). The location of the SNP in a non-coding region suggests a regulatory mechanism, meaning the SNP does not change the protein's own structure but rather affects how the TCF7L2 protein binds to DNA to control other genes. This binding, however, is highly dependent on the shape and flexibility of the DNA. This study aims to reveal the atomic-level effects of the SNP's cytosine-to-thymine substitution on the TCF7L2-DNA complex. We first utilized AlphaFold to generate individual high-confidence structures of the TCF7L2 protein and two 15-base pair DNA duplexes: one containing the reference C allele and one containing the variant T allele. These structures were then used as inputs for Neurosnap's Boltz2 deep learning model to generate two complete protein-DNA complexes of the TCF7L2 HMG-box bound to each DNA variant. Using the iMODS server, we conducted a Normal Mode Analysis (NMA) to predict and compare large-scale flexibility and differences in interactions between the complexes. The protein-DNA interface was dissected using PDBsum to locate atomic contacts, clefts, and interaction maps. Overall, our results show that the T allele variant exhibits increased global stiffness with a higher eigenvalue and reduced flexibility, suggesting that the SNP disrupts the mechanism and biomechanical balance needed for efficient TCF7L2-DNA binding, thus affecting downstream gene regulation.

[15] arXiv:2312.02040 [Chinese pdf, pdf, other]
Title: Unbounded matroids
Richard Golnik, Thomas Gatter, Peter F. Stadler, Nicola Vassena
Comments: 25 pages. Final version, to appear in the European Journal of Combinatorics
Subjects: Combinatorics (math.CO)

A matroid base polytope is a polytope in which each vertex has 0,1 coordinates and each edge is parallel to a difference of two coordinate vectors. Matroid base polytopes are described combinatorially by integral submodular functions on a boolean lattice, satisfying the unit increase property. We define a more general class of unbounded matroids, or U-matroids, by replacing the boolean lattice with an arbitrary distributive lattice. U-matroids thus serve as a combinatorial model for polyhedra that satisfy the vertex and edge conditions of matroid base polytopes, but may be unbounded. Like polymatroids, U-matroids generalize matroids and arise as a special case of submodular systems. We prove that every U-matroid admits a canonical largest extension to a matroid, which we call the generous extension; the analogous geometric statement is that every U-matroid base polyhedron contains a unique largest matroid base polytope. We show that the supports of vertices of a U-matroid base polyhedron span a shellable simplicial complex, and we characterize U-matroid basis systems in terms of shelling orders, generalizing Björner's and Gale's criteria for a simplicial complex to be a matroid independence complex. Finally, we present an application of our theory to subspace arrangements and show that the generous extension has a natural geometric interpretation in this setting.

[16] arXiv:2511.13797 [Chinese pdf, pdf, other]
Title: MAT-MPNN: A Mobility-Aware Transformer-MPNN Model for Dynamic Spatiotemporal Prediction of HIV Diagnoses in California, Florida, and New England
Peilun Song, Shuguang Yang, Xiujuan Geng, Zhenzhong Gan, Suiping Wang, Gangyi Feng
Comments: 21 pages, 20 figures, 1 table. Preprint
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Human Immunodeficiency Virus (HIV) has posed a major global health challenge for decades, and forecasting HIV diagnoses remains a critical area of research. However, capturing the complex spatial and temporal dependencies of HIV transmission is difficult. Conventional Message Passing Neural Network (MPNN) models rely on a fixed binary adjacency matrix that encodes only geographic adjacency and therefore cannot represent interactions between non-contiguous counties. We propose a deep learning architecture, the Mobility-Aware Transformer-Message Passing Neural Network (MAT-MPNN), to predict county-level HIV diagnosis rates across California, Florida, and the New England region. The model combines temporal features extracted by a Transformer encoder with spatial relationships captured through a Mobility Graph Generator (MGG). The MGG improves on conventional adjacency matrices by combining geographic and demographic information. Compared with the best-performing hybrid baseline, the Transformer MPNN model, MAT-MPNN reduced the Mean Squared Prediction Error (MSPE) by 27.9% in Florida, 39.1% in California, and 12.5% in New England, and improved the Predictive Model Choice Criterion (PMCC) by 7.7%, 3.5%, and 3.9%, respectively. MAT-MPNN also achieved better results than the Spatially Varying Auto-Regressive (SVAR) model in Florida and New England, with comparable performance in California. These results demonstrate that applying mobility-aware dynamic spatial structures substantially enhances predictive accuracy and calibration in spatiotemporal epidemiological prediction.

[17] arXiv:2511.13762 [Chinese pdf, pdf, other]
Title: Gene Incremental Learning for Single-Cell Transcriptomics
Hadi Barati, Ali Nayerifar, Mehdi Fardmanesh
Comments: Accepted by AAAI 2026
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Genomics (q-bio.GN)

Classes, as fundamental elements of computer vision, have been extensively studied within incremental learning frameworks. In contrast, tokens, which play essential roles in many research fields, exhibit similar growth characteristics, yet their incremental learning has received little study. This research gap stems primarily from the holistic nature of tokens in language, which makes it difficult to design incremental learning frameworks for them. To overcome this obstacle, we turn to one type of token, the gene, in a large-scale biological setting, single-cell transcriptomics, and formulate a pipeline for gene incremental learning together with corresponding evaluations. We find that the forgetting problem also exists in gene incremental learning, and we therefore adapt existing class incremental learning methods to mitigate the forgetting of genes. Through extensive experiments, we demonstrate the soundness of our framework design and evaluations, as well as the effectiveness of our method adaptations. Finally, we provide a complete benchmark for gene incremental learning in single-cell transcriptomics.

[18] arXiv:2403.20239 [Chinese pdf, pdf, html, other]
Title: A simple EEG-based decision tool for neonatal therapeutic hypothermia in hypoxic-ischemic encephalopathy
Alexander Olza, Roberto Santana, David Soto
Comments: 20 pages, 1 table, 2 figures
Subjects: Neurons and Cognition (q-bio.NC)

Objective: Accurate identification of hypoxic-ischemic brain injury in the early neonatal period is essential for initiating therapeutic hypothermia (TH) within 6 hours of birth to optimize neurodevelopmental outcomes. We aimed to develop a simple decision-making tool for identifying term neonates with hypoxic-ischemic encephalopathy (HIE) based on features of conventional electroencephalograms (EEG) recorded within 6 hours of birth. Methods: EEG recordings from 100 full-term neonates with HIE were graded by pediatric neurologists for severity. Amplitude in slow frequency bands was analyzed, focusing on delta (0.5-4 Hz) spectral power. Temporal fluctuations of delta power characterized each HIE grade, with joint level and duration probability densities estimated for delta oscillation power. This study is registered on ClinicalTrials.gov (NCT05114070). Results: These 2D EEG representations effectively distinguish mild HIE cases from those requiring hypothermia, achieving 98% accuracy, 99% sensitivity, 99% positive predictive value, 94% negative predictive value, an F1 score of 99%, and a false alarm rate of only 6%. This system accurately discriminates mild from moderate or severe HIE, with only one mild case mistakenly identified as requiring hypothermia and one moderate case erroneously flagged for treatment. Conclusions: Quantized probability densities of delta spectral features from early EEG (within 6 hours of birth) revealed significant differences between mild and moderate/severe HIE, enabling accurate discrimination of candidates for TH. Significance: Simple, interpretable biomarkers from early EEG can provide an efficient visual clinical decision support tool to identify full-term neonates with HIE eligible for therapeutic hypothermia.

[19] arXiv:2511.13790 [Chinese pdf, pdf, html, other]
Title: GeoPl@ntNet: A Platform for Exploring Essential Biodiversity Variables
Xinzhe Zheng, Shiyu Jiang, Gustavo Seabra, Chenglong Li, Yanjun Li
Comments: 4 pages, 5 figures and 2 tables
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI)

This paper describes GeoPl@ntNet, an interactive web application designed to make Essential Biodiversity Variables accessible and understandable to everyone through dynamic maps and fact sheets. Its core purpose is to allow users to explore high-resolution AI-generated maps of species distributions, habitat types, and biodiversity indicators across Europe. These maps, developed through a cascading pipeline involving convolutional neural networks and large language models, provide an intuitive yet information-rich interface to better understand biodiversity, with resolutions as precise as 50x50 meters. The website also enables exploration of specific regions, allowing users to select areas of interest on the map (e.g., urban green spaces, protected areas, or riverbanks) to view local species and their coverage. Additionally, GeoPl@ntNet generates comprehensive reports for selected regions, including insights into the number of protected species, invasive species, and endemic species.

[20] arXiv:2511.13739 [Chinese pdf, pdf, other]
Title: Subject-Independent Imagined Speech Detection via Cross-Subject Generalization and Calibration
Xiaoqiong Xia, Cesar de la Fuente-Nunez
Comments: 4 pages, 2 figures, Name of Conference: International Conference on Brain-Computer Interface
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Sound (cs.SD)

Achieving robust generalization across individuals remains a major challenge in electroencephalogram-based imagined speech decoding because of substantial variability in neural activity patterns. This study examined how training dynamics and lightweight subject-specific adaptation influence cross-subject performance in a neural decoding framework. A cyclic inter-subject training approach, involving shorter per-subject training segments and frequent alternation among subjects, led to modest yet consistent improvements in decoding performance on unseen target data. Furthermore, under the subject-calibrated leave-one-subject-out scheme, incorporating only 10% of the target subject's data for calibration achieved an accuracy of 0.781 and an AUC of 0.801, demonstrating the effectiveness of few-shot adaptation. These findings suggest that integrating cyclic training with minimal calibration provides a simple and effective strategy for developing scalable, user-adaptive brain-computer interface systems that balance generalization and personalization.

[21] arXiv:2511.13733 [Chinese pdf, pdf, other]
Title: THD-BAR: Topology Hierarchical Derived Brain Autoregressive Modeling for EEG Generic Representations
Xiaoqiong Xia, Cesar de la Fuente-Nunez
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)

Large-scale pre-trained models hold significant potential for learning universal EEG representations. However, most existing methods, particularly autoregressive (AR) frameworks, primarily rely on straightforward temporal sequencing of multi-channel EEG data, which fails to capture the rich physiological characteristics inherent to EEG signals. Moreover, their time-centered modeling approach also limits the effective representation of the dynamic spatial topology of brain activity. To address these challenges and fully exploit the potential of large-scale EEG models, we propose a novel Topology Hierarchical Derived Brain Autoregressive Modeling (THD-BAR) for EEG generic representations. The core innovation of THD-BAR lies in the introduction of the Brain Topology Hierarchy (BTH), which establishes a multi-scale spatial order for EEG channels. This hierarchical structure enables a redefinition of autoregressive learning as a "next-scale-time prediction" problem, effectively capturing both spatial and temporal dynamics. Based on BTH, we design a Topology-Hierarchical Vector Quantized-Variational Autoencoder (THVQ-VAE) for multi-scale tokenization and develop an enhanced Brain Autoregressive (BAR) module with specialized masking strategies for prediction. Through extensive large-scale pre-training on 17 datasets, followed by rigorous validation on 10 downstream datasets spanning 5 distinct tasks, THD-BAR consistently outperforms existing methods. These results highlight the superior generalization and modeling capabilities of our proposed approach.

[22] arXiv:2511.12931 [Chinese pdf, pdf, other]
Title: cryoSENSE: Compressive Sensing Enables High-throughput Microscopy with Sparse and Generative Priors on the Protein Cryo-EM Image Manifold
Yiyang Xu, Ziyou Shen, Yanqing Lv, Shutong Tan, Chun Sun, Juan Zhang
Subjects: Image and Video Processing (eess.IV); Biomolecules (q-bio.BM)

Cryo-electron microscopy (cryo-EM) enables the atomic-resolution visualization of biomolecules; however, modern direct detectors generate data volumes that far exceed the available storage and transfer bandwidth, thereby constraining practical throughput. We introduce cryoSENSE, the computational realization of a hardware-software co-designed framework for compressive cryo-EM sensing and acquisition. We show that cryo-EM images of proteins lie on low-dimensional manifolds that can be independently represented using sparse priors in predefined bases and generative priors captured by a denoising diffusion model. cryoSENSE leverages these low-dimensional manifolds to enable faithful image reconstruction from spatial and Fourier-domain undersampled measurements while preserving downstream structural resolution. In experiments, cryoSENSE increases acquisition throughput by up to 2.5$\times$ while retaining the original 3D resolution, offering controllable trade-offs between the number of masked measurements and the level of downsampling. Sparse priors favor faithful reconstruction from Fourier-domain measurements and moderate compression, whereas generative diffusion priors achieve accurate recovery from pixel-domain measurements and more severe undersampling. Project website: https://cryosense.github.io.

[23] arXiv:2509.00122 [Chinese pdf, pdf, html, other]
Title: Two Issues in Modelling Fish Migration
Rui Zhu, Xiaopu Zhou, Haixu Tang, Stephen W. Scherer, Lucila Ohno-Machado
Subjects: Populations and Evolution (q-bio.PE)

Fish migration is a dynamic phenomenon observed in many surface water bodies on Earth, yet it remains insufficiently understood. In particular, the biological mechanism behind fish migration is not fully understood. Moreover, it is often observed visually and hence manually, raising questions about the accuracy and interpretation of the sampled data. We address these two issues, mechanism and observation, using a recently developed mathematical model. The results obtained in this short paper show that fish migration can be characterized through a minimization principle, and they evaluate the error of its manual observation. The minimization principle we hypothesize is an optimal control problem in which the migrating fish population dynamically changes its size and fluctuation. We numerically investigate alternating and intensive observation schemes as case studies, demonstrating that under some realistic conditions the estimate of the total fish count is not reliable. We believe that this paper contributes to a deeper understanding of fish migration.

Cross submissions (showing 5 of 5 entries)

[24] arXiv:2511.13954 (cross-list from q-bio.NC) [Chinese pdf, pdf, html, other]
Title: A Brain Wave Encodes a Thousand Tokens: Modeling Inter-Cortical Neural Interactions for Effective EEG-based Emotion Recognition
Wenchao Yang, Weidong Yan, Wenkang Liu, Yulan Ma, Yang Li
Subjects: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG)

Human emotions are difficult to convey through words and are often abstracted in the process; however, electroencephalogram (EEG) signals can offer a more direct lens into emotional brain activity. Recent studies show that deep learning models can process these signals to perform emotion recognition with high accuracy. However, many existing approaches overlook the dynamic interplay between distinct brain regions, which can be crucial to understanding how emotions unfold and evolve over time, potentially aiding in more accurate emotion recognition. To address this, we propose RBTransformer, a Transformer-based neural network architecture that models inter-cortical neural dynamics of the brain in latent space to better capture structured neural interactions for effective EEG-based emotion recognition. First, the EEG signals are converted into Band Differential Entropy (BDE) tokens, which are then passed through Electrode Identity embeddings to retain spatial provenance. These tokens are processed through successive inter-cortical multi-head attention blocks that construct an electrode x electrode attention matrix, allowing the model to learn the inter-cortical neural dependencies. The resulting features are then passed through a classification head to obtain the final prediction. We conducted extensive experiments, specifically under subject-dependent settings, on the SEED, DEAP, and DREAMER datasets, over all three dimensions, Valence, Arousal, and Dominance (for DEAP and DREAMER), under both binary and multi-class classification settings. The results demonstrate that the proposed RBTransformer outperforms all previous state-of-the-art methods across all three datasets, over all three dimensions under both classification settings. The source code is available at: https://github.com/nnilayy/RBTransformer.

[25] arXiv:2511.14453 (cross-list from q-bio.NC) [Chinese pdf, pdf, html, other]
Title: Multi-network Topology Underlying Individual Language Learning Success
Jiaxin Qi, Yan Cui, Jianqiang Huang, Gaogang Xie
Subjects: Neurons and Cognition (q-bio.NC)

Adult language learning varies greatly among individuals. Traditionally associated with frontotemporal language regions, this variability is increasingly seen as stemming from distributed brain networks. However, the role of these networks and their topological organization in explaining these differences remains unclear. We hypothesize that graph-theory-based network analysis of intrinsic multimodal connectivities across multiple networks explains overall and component-specific variations in language learning. We tested this in 101 healthy adults who underwent resting-state fMRI, structural MRI, and diffusion tensor imaging before seven days of six artificial language training tasks. We identified one dominant general learning component shared across tasks and five task-specific ones. Cross-validated predictive models used multimodal multi-network graph-theoretic metrics to predict final learning outcomes (LO) and rates (LR). We significantly predicted the LO and LR of the general component, which were primarily contributed by dorsal attention and frontoparietal networks. Nodal local efficiency was the most consistent predictor, with additional contributions from node clustering coefficient and network centrality for LR, highlighting local robustness, mesoscale network segregation, and global influence in explaining individual differences. Only task-specific word learning LO was predictable, relying on default mode and frontoparietal hubs with high betweenness centrality and efficiency. These findings demonstrate that intrinsic network topologies underlie differences in language learning success, supporting a multiple-systems hypothesis in which attentional-control networks interact with default and subcortical systems to shape learning trajectories. This advances mechanistic understanding and paves the way for personalized language education.

[26] arXiv:2511.14676 (cross-list from q-bio.BM) [Chinese pdf, pdf, html, other]
Title: Exploring AlphaFold 3 for CD47 Antibody-Antigen Binding Affinity: An Unexpected Discovery of Reverse Docking
Zhonghao Liu, Hanxue Gu, Qihang Li, Michael Fox, Jay M. Levin, Maciej A. Mazurowski, Brian C. Lau
Comments: 15 pages, 4 figures, submitted to ACS Omega
Subjects: Biomolecules (q-bio.BM)

AlphaFold 3 (AF3) is a powerful biomolecular structure prediction tool based on the latest deep learning algorithms and a revolutionary AI model architecture. A few papers have already investigated its accuracy in predicting different biomolecular structures. However, the potential applications of AF3 beyond basic structure prediction have not been fully explored. In our study, we first focused on structure prediction for antibody-antigen (CD47) complexes, which is believed to be challenging for AF3 because few resolved cognate crystallographic structures are available. Furthermore, we assessed the potential of AF3 for pre-screening potent antibody candidates, as an auxiliary step, through binding affinity analysis compared with the molecular docking modules of commercial software, which would greatly benefit lead identification and optimization in drug development. In essence, this is not limited to antibody-antigen binding affinity, but extends to many other chemical or physical properties of any drug candidate, based on structures predicted by AF3 that are extremely close to reality. According to our experimental results, AF3 is a very promising competitor that can efficiently produce highly reliable molecular structures and subsequent binding energy predictions for most subjects. Surprisingly, an unexpected and nonrandom phenomenon, "reverse docking", was observed for two of our antibody subjects, suggesting new issues arising from the architectural revolution of AF3. Our analysis and error correction experiments show that this phenomenon is likely caused by the revolutionary AI model architecture, which provides important experience and a reminder for the optimization and design of AI for structural prediction. All software copyrights belong to the China Pharmaceutical University (CPU) and its affiliated School of Pharmacy and School of Science.

[27] arXiv:2511.14472 (cross-list from physics.hist-ph) [Chinese pdf, pdf, html, other]
Title: MacLaurin and Morality
Oriol Cabanas-Tirapu, Sergio Cobo-Lopez, Savannah E. Sanchez, Forest L. Rohwer, Marta Sales-Pardo, Roger Guimerà
Comments: 30 pages, 6 figures, 1 table. Draft of a proposed chapter for a volume arising from the 因康森纳恩斯 conference (Maynooth, 30 August to 1 September 2023), to be published in a science series and edited by 苏珊·戈特洛伯, Ciarán Mac an Bhaird and Kevin Tracey
Subjects: History and Philosophy of Physics (physics.hist-ph); History and Overview (math.HO)

Scottish mathematician Colin MacLaurin (1698-1746) is best known for his A Treatise of Fluxions (1742), An Account of Sir Isaac Newton's Philosophical Discoveries (1748), and the appellation for a type of power series. However, it is hardly known that in 1714, at the age of sixteen, MacLaurin penned a short manuscript in which he tried to apply Newtonian principles to morality, in an approach to mathematization that suggests strong continuities with earlier centuries. De viribus mentium bonipetis (On the good-seeking forces of minds) remained unpublished and hidden in the papers of the Colin Campbell Collection at the University of Edinburgh for over 250 years; it was only uncovered at the end of the twentieth century. De viribus provides a remarkable glimpse into how the young MacLaurin dealt with early Newtonianism, the tenets of the Church of Scotland, and the nascent interface between science and religion just prior to the dawn of the Scottish Enlightenment. Perhaps the most intriguing aspect of De viribus is the personal snippets related to Scottish Presbyterian morality that MacLaurin interjects throughout his mathematical discussion. These are often vague and oblique, and one must look to his mathematics, his contemporaries, and the social fabric of his surroundings to understand them. In the process, one gains insight not only into MacLaurin's family background and personal religious thought, but also into the culture and nature of teaching in Scottish universities at the time. De viribus not only demonstrates that ideas of mathematising morality were being canvassed in Scotland much earlier than the better known, but less sophisticated, later attempts by the likes of Francis Hutcheson, David Hume and George Turnbull, but also suggests continuities with approaches to mathematization in earlier centuries, providing a strong bridge into the Enlightenment era.

[28] arXiv:2511.14663 (cross-list from q-bio.BM) [Chinese pdf, pdf, other]
Title: ApexGen: Simultaneous design of peptide binder sequence and structure for target proteins
Sunday A. Adetunji
Subjects: Biomolecules (q-bio.BM)

Peptide-based drugs can bind to protein interaction sites that small molecules often cannot, and are easier to produce than large protein drugs. However, designing effective peptide binders is difficult. A typical peptide has an enormous number of possible sequences, and only a few of these will fold into the right 3D shape to match a given protein target. Existing computational methods either generate many candidate sequences without considering how they will fold, or build peptide backbones and then find suitable sequences afterward. Here we introduce ApexGen, a new AI-based framework that simultaneously designs a peptide's amino-acid sequence and its three-dimensional structure to fit a given protein target. For each target, ApexGen produces a full all-atom peptide model in a small number of deterministic integration steps. In tests on hundreds of protein targets, the peptides designed by ApexGen fit tightly onto their target surfaces and cover nearly the entire binding site. These peptides have shapes similar to those found in natural protein-peptide complexes, and they show strong predicted binding affinity in computational experiments. Because ApexGen couples sequence and structure design at every step of Euler integration within a flow-matching sampler, it is much faster and more efficient than prior approaches. This unified method could greatly accelerate the discovery of new peptide-based therapeutics.

Replacement submissions (showing 8 of 8 entries)

[29] arXiv:2511.13899 (replaced) [Chinese pdf, pdf, other]
Title: A Disentangled Low-Rank RNN Framework for Uncovering Neural Connectivity and Dynamics
Marc Fiammante (1,2), Anne-Isabelle Vermersch (3), Marie Vidailhet (1,4), Mario Chavez (5) ((1) Paris Brain Institute, Inserm U1127, CNRS UMR7225, Sorbonne Universite UM75, Inria Paris (Team Nerv), Pitie-Salpetriere Hospital, Paris, France, (2) Retired IBM Fellow, (3) Physiology & Paediatric Functional Explorations Unit, Armand Trousseau Hospital, Paris, France, (4) Institut de Neurologie, Pitie-Salpetriere Hospital, Paris, France, (5) CNRS UMR-7225, Pitie-Salpetriere Hospital, Paris, France)
Subjects: Neurons and Cognition (q-bio.NC); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)

Low-rank recurrent neural networks (lrRNNs) are a class of models that uncover low-dimensional latent dynamics underlying neural population activity. Although their functional connectivity is low-rank, it lacks disentanglement interpretations, making it difficult to assign distinct computational roles to different latent dimensions. To address this, we propose the Disentangled Recurrent Neural Network (DisRNN), a generative lrRNN framework that assumes group-wise independence among latent dynamics while allowing flexible within-group entanglement. These independent latent groups allow latent dynamics to evolve separately, but are internally rich for complex computation. We reformulate the lrRNN under a variational autoencoder (VAE) framework, enabling us to introduce a partial correlation penalty that encourages disentanglement between groups of latent dimensions. Experiments on synthetic, monkey M1, and mouse voltage imaging data show that DisRNN consistently improves the disentanglement and interpretability of learned neural latent trajectories in low-dimensional space and low-rank connectivity over baseline lrRNNs that do not encourage partial disentanglement.

[30] arXiv:2511.13799 (replaced) [Chinese pdf, pdf, html, other]
Title: Symbiotic causal network of seagrass-bacteria-alga-diatom interactions
Xinnan Zhang, Jialin Wu, Junyi Xie, Tianlong Chen, Kaixiong Zhou
Comments: 11 pages, 36 figures (5 main figures)
Subjects: Quantitative Methods (q-bio.QM)

Seagrass meadows contribute to the conservation of marine ecosystems, the reduction of global warming impacts, and pathogen control. However, the decline in seagrass habitats due to environmental loads has become an urgent global issue. One way to address this issue is to better understand healthy seagrass habitats. Here, we estimate the structural characteristics of symbiotic and metabolic systems in sediments from eight coastal regions of Japan, with each region containing both seagrass-covered areas and adjacent unvegetated areas. Notably, seagrasses commonly maintain a balanced symbiotic relationship characterized by a positive association with cable bacteria (Desulfobulbaceae), nitrogen-cycling bacteria (Hyphomonadaceae), and coralline algae (Corallinophycidae), and a negative association with diatoms (Diatomea). Furthermore, seagrass growth conditions influence metabolic pathways by activating nitrogen-related metabolism while attenuating methanogenesis. Our findings highlight the crucial roles of marine plants and their symbiotic systems in ensuring environmental conservation within the context of blue carbon storage across environmental gradients.

[31] arXiv:2511.14121 (replaced) [Chinese pdf, pdf, html, other]
Title: Canonical quantization for Equilibrium Thermodynamics
Lea Gassab, Onur Pusuluk, Marco Cattaneo, Özgür E. Müstecaplıoğlu
Comments: 15+3 pages, no figures. Comments are welcome!
Subjects: Quantum Physics (quant-ph)

We formulate a canonical quantization of Equilibrium Thermodynamics by applying Dirac's theory of constrained systems. Thermodynamic variables are treated as conjugate pairs of coordinates and momenta, allowing extensive and intensive quantities to be promoted to operators in a Hilbert space. The formalism is applied to the ideal gas, the van der Waals gas, and the photon gas, illustrating both first- and second-class quantization procedures. For the ideal gas, a Schrödinger-like equation emerges in which entropy plays the role of time, and the wave function acquires a phase determined by the internal energy. A pseudo-Hermitian framework restores Hermiticity of the temperature operator and establishes the equivalence among constraint realizations. The approach naturally leads to thermodynamic uncertainty relations and suggests extensions to quantum and topological phase transitions, as well as black-hole and non-equilibrium thermodynamics.

[32] arXiv:2503.22662 (replaced) [Chinese pdf, pdf, other]
Title: Three-phase Muskat problem: uniform lifespan with respect to the width of the strip between interfaces
Mareike Fischer, Steven Kelk, Sofia Vazquez Alferez
Subjects: Analysis of PDEs (math.AP)

We consider the three-phase Muskat problem with different densities and the same viscosities. The lifespan of the solutions with respect to the width of the strip between interfaces is studied. The interfaces are parameterized by the graphs of two functions $f(x,t)$ and $g(x,t)$, and we impose that $\|f(\cdot,0)-g(\cdot,0)\|_{L^\infty}\leq C\sigma$ and $\inf_x |f(x,0)-g(x,0)|\geq c\sigma$. Under stronger assumptions on $f(x,0)$ and $g(x,0)$, we show local existence with a lifespan independent of the parameter $\sigma$ (for $\sigma$ small enough). In order to prove such a result, we need to work in analytic spaces.

[33] arXiv:2511.14523 (replaced) [Chinese pdf, pdf, other]
Title: Teaching Longitudinal Linear Mixed Models End-to-End: A Reproducible Case Study in Mouse Body-Weight Growth
Hidekazu Yoshioka
Comments: 42 pages, 5 figures, 7 tables. Includes fully reproducible R code and teaching materials for longitudinal linear mixed models using a mouse body-weight case study
Subjects: Methodology (stat.ME); Other Quantitative Biology (q-bio.OT)

Background: Linear mixed-effects models are central for analyzing longitudinal continuous data, yet many learners meet them as scattered formulas or software output rather than as a coherent workflow. There is a need for a single, reproducible case study that links questions, model building, diagnostics, and interpretation. Methods: We reanalyze a published mouse body-weight experiment with 31 mice in three groups weighed weekly for 12 weeks. After reshaping the data to long format and using profile plots to motivate linear time trends, we fit three random-intercept linear mixed models: a common-slope model, a fully interacted group-by-time model, and a parsimonious model with group-specific intercepts, a shared slope for two groups, and an extra slope for the third. Models are compared using maximum likelihood, AIC, BIC, and likelihood ratio tests, and linear contrasts are used to estimate group differences in weekly means and 12-week gains. Results: The parsimonious model fits as well as the fully interacted model and clearly outperforms the common-slope model, revealing small and similar gains in two groups and much steeper growth in the third, with highly significant contrasts for excess weight gain. Interpretation: This case study gives a complete, executable workflow for longitudinal linear mixed modeling, from raw data and exploratory plots through model selection, diagnostics, and targeted contrasts. By making explicit the mapping from scientific questions to model terms and estimable contrasts, and by providing R code and a stepwise checklist, it serves as a practical template for teaching and applied work in biostatistics, epidemiology, and related fields.
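
The model-comparison step named in the abstract above (AIC, BIC, and likelihood ratio tests on maximum-likelihood fits) can be sketched as follows. This is an illustrative Python sketch, not the paper's R code: the function names are hypothetical, the log-likelihoods and parameter counts are assumed to come from an already fitted pair of nested models, and the choice of `n` for BIC (observations versus subjects) is left to the caller as an assumption.

```python
import math

def aic(loglik, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion; n is the sample size chosen by the caller."""
    return k * math.log(n) - 2 * loglik

def lrt(loglik_small, loglik_big, df):
    """Likelihood ratio test for nested models fitted by maximum likelihood.

    Returns (statistic, p_value). Only df = 1 or df = 2 are handled, using the
    closed-form chi-square survival functions for those cases.
    """
    stat = max(2.0 * (loglik_big - loglik_small), 0.0)
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        p_value = math.exp(-stat / 2.0)
    else:
        raise ValueError("this sketch only supports df = 1 or df = 2")
    return stat, p_value
```

Lower AIC or BIC favors a model, and a small likelihood-ratio p-value favors the larger of two nested models. The usual chi-square reference distribution is itself an approximation for mixed models.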

[34] arXiv:2511.14090 (replaced) [Chinese pdf, pdf, html, other]
Title: Evolutionary Hysteresis: Cycling about in a Rugged Landscape
Akmuhammet Ashyralyyev, Zülal Bingöl, Begüm Filiz Öz, Salem Malikic, Uzi Vishkin, S. Cenk Sahinalp, Can Alkan
Comments: 12 pages, 5 figures
Subjects: Populations and Evolution (q-bio.PE)

In this work, we integrate theoretical modeling, molecular simulation, and empirical analysis to identify and characterize evolutionary hysteresis. We first show how epistatic interactions create bistable fitness landscapes and structural hysteresis in a two-locus Wright-Fisher model, revealing two distinct hysteresis regimes under cyclic and noisy selection. Notably, an epistatically constrained population achieves maximal average fitness at an intermediate level of environmental stochasticity. We then extend this framework to more complex systems, demonstrating robust hysteresis loops in both a disordered multi-locus model and a biophysically realistic simulation of protein structural flexibility. Finally, we present direct empirical evidence of evolutionary hysteresis. By analyzing two decades of metagenomic time-series data from freshwater C. Nanopelagicaceae experiencing strong seasonal temperature cycles, we find that approximately 65% of seasonally oscillating alleles exhibit statistically significant hysteresis. Together, these results establish hysteresis as a general, measurable feature of evolution and a potential probe of complex fitness landscapes.
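
To make the two-locus Wright-Fisher setting mentioned in the abstract above concrete, here is a minimal Python sketch of haploid selection plus drift on four genotypes under a time-varying fitness table. It is not the authors' model: mutation and recombination are omitted, and the genotype labels, fitness values, and `cyclic_fitness` schedule are all hypothetical choices made for illustration.

```python
import random

GENOTYPES = ("ab", "Ab", "aB", "AB")

def wright_fisher_step(counts, fitness, rng):
    """One generation of selection plus drift at constant population size."""
    n = sum(counts[g] for g in GENOTYPES)
    # Parents are drawn with probability proportional to count * fitness.
    weights = [counts[g] * fitness[g] for g in GENOTYPES]
    offspring = rng.choices(GENOTYPES, weights=weights, k=n)
    return {g: offspring.count(g) for g in GENOTYPES}

def simulate(counts, fitness_for_generation, generations, seed=0):
    """Iterate the model; fitness_for_generation(t) returns the fitness table at time t."""
    rng = random.Random(seed)
    history = [dict(counts)]
    for t in range(generations):
        counts = wright_fisher_step(counts, fitness_for_generation(t), rng)
        history.append(counts)
    return history

def cyclic_fitness(t, period=40):
    """Hypothetical cyclic environment: the favored double genotype alternates,
    while both single mutants stay less fit (a fitness valley between two peaks)."""
    first_half = (t % period) < period // 2
    return {
        "ab": 1.0 if first_half else 0.9,
        "Ab": 0.7,
        "aB": 0.7,
        "AB": 0.9 if first_half else 1.0,
    }

history = simulate({"ab": 250, "Ab": 250, "aB": 250, "AB": 250}, cyclic_fitness, 200)
```

Because there is no mutation in this sketch, a genotype that is lost cannot reappear, so it only illustrates the sampling mechanics rather than reproducing any hysteresis loop.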

[35] arXiv:2407.04055 (replaced) [Chinese pdf, pdf, html, other]
Title: Benchmark on Drug Target Interaction Modeling from a Drug Structure Perspective
Yohan Lee, DongGyun Kang, SeHoon Park, Sa-Yoon Park, Kwangsoo Kim
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Predictive modeling of drug-target interactions is crucial to drug discovery and design, and has advanced rapidly owing to deep learning technologies. Recently developed methods, such as those based on graph neural networks (GNNs) and Transformers, demonstrate exceptional performance across various datasets by effectively extracting structural information. However, the benchmarking of these novel methods often varies significantly in terms of hyperparameter settings and datasets, which limits algorithmic progress. In view of this, we conducted a comprehensive survey and benchmark for drug-target interaction modeling from a structural perspective by integrating dozens of explicit (i.e., GNN-based) and implicit (i.e., Transformer-based) structure learning algorithms. We conducted a macroscopic comparison between these two classes of encoding strategies as well as the different featurization techniques that inform molecules' chemical and physical properties. We then carried out a microscopic comparison between all the integrated models across the six datasets by comprehensively benchmarking their effectiveness and efficiency. To ensure fairness, we investigated model performance under individually optimized configurations. Remarkably, the insights summarized from the benchmark studies led to the design of model combos. We demonstrate that our combos can achieve new state-of-the-art performance on various datasets with cost-effective memory and computation.

[36] arXiv:2511.14694 (replaced) [Chinese pdf, pdf, html, other]
Title: Near-Lossless Model Compression Enables Longer Context Inference in DNA Large Language Models
Zain Shabeeb, Daniel Saeedi, Darin Tsui, Vida Jamali, Amirali Aghazadeh
Subjects: Genomics (q-bio.GN); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Populations and Evolution (q-bio.PE)

Trained on massive cross-species DNA corpora, DNA large language models (LLMs) learn the fundamental "grammar" and evolutionary patterns of genomic sequences. This makes them powerful priors for DNA sequence modeling, particularly over long ranges. However, two major constraints hinder their use in practice: the quadratic computational cost of self-attention and the growing memory required for key-value (KV) caches during autoregressive decoding. These constraints force the use of heuristics such as fixed-window truncation or sliding windows, which compromise fidelity on ultra-long sequences by discarding distant information. We introduce FOCUS (Feature-Oriented Compression for Ultra-long Self-attention), a progressive context-compression module that can be plugged into pretrained DNA LLMs. FOCUS combines the established k-mer representation in genomics with learnable hierarchical compression: it inserts summary tokens at k-mer granularity and progressively compresses attention key and value activations across multiple Transformer layers, retaining only the summary KV states across windows while discarding ordinary-token KV. A shared-boundary windowing scheme yields a stationary cross-window interface that propagates long-range information with minimal loss. We validate FOCUS on an Evo-2-based DNA LLM fine-tuned on GRCh38 chromosome 1 with self-supervised training and randomized compression schedules to promote robustness across compression ratios. On held-out human chromosomes, FOCUS achieves near-lossless fidelity: compressing a 1 kb context into only 10 summary tokens (about 100x) shifts the average per-nucleotide probability by only about 0.0004. Compared to a baseline without compression, FOCUS reduces KV-cache memory and converts effective inference scaling from O(N^2) to near-linear O(N), enabling about 100x longer inference windows on commodity GPUs with near-lossless fidelity.
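
The "about 100x" figure in the abstract above follows from simple arithmetic if one assumes single-nucleotide tokens, so that a 1 kb context is 1000 tokens reduced to 10 retained summary tokens. The Python sketch below works through that ratio and the corresponding KV-cache size. The layer count, head count, head dimension, and 2-byte precision are hypothetical placeholders, not values reported for the model in the abstract.

```python
def compression_ratio(context_tokens, summary_tokens):
    """How many original tokens each retained summary token stands in for."""
    return context_tokens / summary_tokens

def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV-cache size: one key and one value vector per token,
    per attention layer, per KV head."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value

# Hypothetical model dimensions, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

full = kv_cache_bytes(1000, LAYERS, KV_HEADS, HEAD_DIM)      # 1 kb window, uncompressed
compressed = kv_cache_bytes(10, LAYERS, KV_HEADS, HEAD_DIM)  # 10 summary tokens kept

print(compression_ratio(1000, 10))  # 100.0
print(full // compressed)           # 100
```

Because cache size is linear in the number of retained tokens, the memory saving per window equals the token compression ratio regardless of the placeholder dimensions chosen.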
