Electrical Engineering and Systems Science
View recent articles
Showing new listings for Wednesday, 19 November 2025
- [1] arXiv:2505.20311 [pdf, html, other]
Title: The EU AI Act, Stakeholder Needs, and Explainable AI: Aligning Regulatory Compliance in a Clinical Decision Support System
Comments: 18 pages, 2 figures
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Explainable AI (XAI) is a promising route to comply with the EU AI Act, the first multinational AI regulation. XAI enhances transparency and human oversight of AI systems, especially "black-box" models criticized as incomprehensible. Yet discourse about the AI Act's stakeholders and XAI remains disconnected: XAI increasingly prioritizes end users' needs, while the AI Act focuses on providers' and deployers' obligations. We aim to bridge this divide and offer practical guidance on their relationship. Through interdisciplinary discussion in a cross-functional team of XAI, AI Act, legal, and requirements-engineering experts, we outline steps to analyze an AI-based clinical decision support system, clarify end-user needs, and assess AI Act applicability. Using an AI system under development as a case study, we show how XAI techniques can help reconcile stakeholder needs with AI Act requirements and fill gaps between usability and regulatory demands. We compare similarities and differences between legal obligations and end-user needs, identify tensions, and point to concrete design choices and trade-offs. We invite researchers and practitioners in XAI to reflect on their role relative to the AI Act and to develop mutual understanding across disciplines. While XAI can help implement core AI Act principles such as transparency and human oversight, it should be considered one element of a broader compliance strategy that also requires standardization, legal interpretation, documentation, organizational processes, governance, testing, and ongoing monitoring and auditing practices. Our findings yield actionable recommendations for integrating XAI into product development, compliance workflows, and stakeholder communication, informing policy-making and standards development.
- [2] arXiv:2511.13732 [pdf, html, other]
Title: Principled Coarse-Grained Acceptance for Speculative Decoding in Speech
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
Speculative decoding accelerates autoregressive speech generation by letting a fast draft model propose tokens that a larger target model verifies. However, for speech LLMs that generate acoustic tokens, exact token matching is overly restrictive: many discrete tokens are acoustically or semantically interchangeable, reducing acceptance rates and limiting speedups. We introduce Principled Coarse-Graining (PCG), which verifies proposals at the level of Acoustic Similarity Groups (ASGs) derived from the target model's embedding space. By splitting each token's probability mass across the overlapping groups that contain it, we define an overlap-aware coarse-grained distribution and perform rejection sampling on the resulting group variable. This yields an exactness guarantee at the group level while allowing the accepted draft token to stand in for any member of the group in practice. On LibriTTS, PCG increases acceptance and throughput relative to standard speculative decoding and prior speech-specific relaxations while maintaining intelligibility and speaker similarity. These results suggest acoustically aware, group-level acceptance as a simple and general way to accelerate speech token generation while maintaining speech quality.
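As a rough illustration of the group-level acceptance rule, here is a minimal NumPy sketch. The toy token groups, draft/target distributions, and the even splitting of token mass across overlapping groups are stand-ins for exposition, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def coarse_grained(dist, groups):
    """Split each token's mass evenly across the overlapping groups that
    contain it, giving the overlap-aware coarse-grained distribution."""
    members = [[g for g, grp in enumerate(groups) if t in grp]
               for t in range(len(dist))]
    out = np.zeros(len(groups))
    for t, mass in enumerate(dist):
        for g in members[t]:
            out[g] += mass / len(members[t])
    return out

def group_level_accept(draft_token, q, p, groups):
    """Speculative-decoding rejection test applied to the group variable
    rather than to exact token ids (exactness then holds at group level)."""
    q_g, p_g = coarse_grained(q, groups), coarse_grained(p, groups)
    containing = [g for g, grp in enumerate(groups) if draft_token in grp]
    g = rng.choice(containing)                # sample one containing group
    return rng.random() < min(1.0, p_g[g] / max(q_g[g], 1e-12))

# toy example: 6 acoustic tokens, two overlapping similarity groups
groups = [{0, 1, 2, 3}, {2, 3, 4, 5}]
q = np.array([.3, .2, .2, .1, .1, .1])        # draft model distribution
p = np.array([.1, .1, .2, .2, .2, .2])        # target model distribution
print(group_level_accept(draft_token=2, q=q, p=p, groups=groups))
```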
- [3] arXiv:2412.13918 [pdf, html, other]
Title: Localized RETE for Incremental Graph Queries with Nested Graph Conditions
Comments: arXiv admin note: substantial text overlap with arXiv:2405.01145
Subjects: Logic in Computer Science (cs.LO); Databases (cs.DB)
The growing size of graph-based modeling artifacts in model-driven engineering calls for techniques that enable efficient execution of graph queries. Incremental approaches based on the RETE algorithm provide an adequate solution in many scenarios, but are generally designed to search for query results over the entire graph. However, in certain situations, a user may only be interested in query results for a subgraph, for instance when a developer is working on a large model of which only a part is loaded into their workspace. In this case, the global execution semantics can result in significant computational overhead. To mitigate the outlined shortcoming, in this article we propose an extension of the RETE approach that enables local, yet fully incremental execution of graph queries, while still guaranteeing completeness of results with respect to the relevant subgraph. We empirically evaluate the presented approach via experiments inspired by a scenario from software development and with queries and data from an independent social network benchmark. The experimental results indicate that the proposed technique can significantly improve performance regarding memory consumption and execution time in favorable cases, but may incur a noticeable overhead in unfavorable cases.
- [4] arXiv:2511.13954 [pdf, html, other]
Title: A Brain Wave Encodes a Thousand Tokens: Modeling Inter-Cortical Neural Interactions for Effective EEG-based Emotion Recognition
Subjects: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG)
Human emotions are difficult to convey through words and are often abstracted in the process; however, electroencephalogram (EEG) signals can offer a more direct lens into emotional brain activity. Recent studies show that deep learning models can process these signals to perform emotion recognition with high accuracy. However, many existing approaches overlook the dynamic interplay between distinct brain regions, which can be crucial to understanding how emotions unfold and evolve over time, potentially aiding in more accurate emotion recognition. To address this, we propose RBTransformer, a Transformer-based neural network architecture that models inter-cortical neural dynamics of the brain in latent space to better capture structured neural interactions for effective EEG-based emotion recognition. First, the EEG signals are converted into Band Differential Entropy (BDE) tokens, which are then passed through Electrode Identity embeddings to retain spatial provenance. These tokens are processed through successive inter-cortical multi-head attention blocks that construct an electrode × electrode attention matrix, allowing the model to learn the inter-cortical neural dependencies. The resulting features are then passed through a classification head to obtain the final prediction. We conducted extensive experiments, specifically under subject-dependent settings, on the SEED, DEAP, and DREAMER datasets, over all three dimensions, Valence, Arousal, and Dominance (for DEAP and DREAMER), under both binary and multi-class classification settings. The results demonstrate that the proposed RBTransformer outperforms all previous state-of-the-art methods across all three datasets, over all three dimensions under both classification settings. The source code is available at: https://github.com/nnilayy/RBTransformer.
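A minimal PyTorch sketch of the mechanism the abstract describes: BDE-style tokens plus electrode identity embeddings, attended over the electrode axis so the attention matrix is electrode × electrode. Dimensions and the single-block layout are assumptions, not the released architecture (see the linked repository for the real code).

```python
import torch
import torch.nn as nn

class InterCorticalBlock(nn.Module):
    """One attention block over the electrode axis: the attention matrix is
    electrode x electrode, so heads learn inter-cortical dependencies."""
    def __init__(self, n_electrodes, d_model, n_heads=4):
        super().__init__()
        self.electrode_id = nn.Embedding(n_electrodes, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, bde_tokens):            # (batch, electrodes, d_model)
        ids = torch.arange(bde_tokens.size(1), device=bde_tokens.device)
        x = bde_tokens + self.electrode_id(ids)   # keep spatial provenance
        out, attn_mat = self.attn(x, x, x)        # attn_mat: (B, E, E)
        return self.norm(x + out), attn_mat

# toy usage: 32 electrodes, 5 BDE bands projected to d_model=64
proj = nn.Linear(5, 64)
block = InterCorticalBlock(n_electrodes=32, d_model=64)
tokens = proj(torch.randn(2, 32, 5))          # band differential entropy features
feats, attn = block(tokens)
print(feats.shape, attn.shape)                # (2, 32, 64) and (2, 32, 32)
```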
- [5] arXiv:2511.14274 [pdf, other]
Title: Low-thrust Interplanetary Trajectories with Missed Thrust Events: a Numerical Approach
Subjects: Numerical Analysis (math.NA)
The problem under consideration is to drive a space vehicle to a target at a given final time while minimizing fuel consumption. This is a classical optimal control problem in a deterministic setting. However, temporary stochastic failures of the engine may prevent reaching the target even after engine usage is recovered. Therefore, a stochastic optimal control problem is formulated under the constraint of ensuring a minimal probability of hitting the target. This problem is modeled, improved, and finally solved by dualizing the probability constraint and using an Arrow-Hurwicz stochastic algorithm. Numerical results concerning an interplanetary mission are presented.
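A schematic of the dualized formulation: minimize fuel subject to P(hit target) ≥ p_min, handled by alternating primal descent and dual ascent (the Arrow-Hurwicz shape). The smooth stand-in for the hit probability replaces the Monte Carlo estimate over engine-outage scenarios that a real solver would use; everything below is illustrative.

```python
import numpy as np
from math import erf, exp, pi, sqrt

# Stand-in smooth model: P_hit(u) = Phi(sum(u) - 1). The dualized problem is
#   min_u fuel(u) + lambda * (p_min - P_hit(u)),  with lambda >= 0.
def phi(s):  return 0.5 * (1.0 + erf(s / sqrt(2.0)))
def dphi(s): return exp(-s * s / 2.0) / sqrt(2.0 * pi)

def fuel(u):       return 0.5 * float(u @ u)
def p_hit(u):      return phi(u.sum() - 1.0)
def grad_p_hit(u): return dphi(u.sum() - 1.0) * np.ones_like(u)

p_min, lam = 0.9, 0.0
u = np.zeros(4)
a, b = 1e-2, 1e-1                                 # primal / dual step sizes
for _ in range(20000):
    u -= a * (u - lam * grad_p_hit(u))            # primal descent on u
    lam = max(0.0, lam + b * (p_min - p_hit(u)))  # dual ascent on constraint
print(round(p_hit(u), 3), round(fuel(u), 3))      # constraint met at low fuel
```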
- [6] arXiv:2511.14067 [pdf, html, other]
Title: Fast Verification of Strong Database Isolation (Extended Version)
Comments: 18 pages, 19 figures, 3 tables; accepted by VLDB'2026
Subjects: Databases (cs.DB)
Strong isolation guarantees, such as serializability and snapshot isolation, are essential for maintaining data consistency and integrity in modern databases. Verifying whether a database upholds its claimed guarantees is increasingly critical, as these guarantees form a contract between the vendor and its users. However, this task is challenging, particularly in black-box settings, where only observable system behavior is available and often involves uncertain dependencies between transactions. In this paper, we present VeriStrong, a fast verifier for strong database isolation. At its core is a novel formalism called hyper-polygraphs, which compactly captures both certain and uncertain transactional dependencies in database executions. Leveraging this formalism, we develop sound and complete encodings for verifying both serializability and snapshot isolation. To achieve high efficiency, VeriStrong tailors SMT solving to the characteristics of database workloads, in contrast to prior general-purpose approaches. Our extensive evaluation across diverse benchmarks shows that VeriStrong not only significantly outperforms state-of-the-art verifiers on the workloads they support, but also scales to large, general workloads beyond their reach, while maintaining high accuracy in detecting isolation anomalies.
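The certain/uncertain dependency structure can be pictured with a toy SMT encoding: a serial order must respect every certain edge, while each uncertain dependency may resolve in either direction. This is an illustrative z3 encoding with made-up edges, not VeriStrong's tailored solving.

```python
from z3 import Int, Solver, Distinct, Or, And, sat

# Toy instance in the spirit of a hyper-polygraph: 3 transactions T0..T2.
n = 3
pos = [Int(f"pos_{i}") for i in range(n)]   # position of Ti in a serial order

s = Solver()
s.add([And(0 <= p, p < n) for p in pos], Distinct(pos))

certain = [(0, 1)]                    # T0 -> T1 (e.g., T1 read T0's write)
uncertain = [[(1, 2), (2, 1)]]        # T1 -> T2 or T2 -> T1 must hold

s.add([pos[i] < pos[j] for i, j in certain])
s.add([Or([pos[i] < pos[j] for i, j in opts]) for opts in uncertain])

if s.check() == sat:
    print("serializable, witness order:", s.model())
else:
    print("not serializable")
```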
- [7] arXiv:2511.14234 [pdf, html, other]
Title: Large deviations for number of irreducible divisors of the Dirichlet series distribution
Comments: 35 pages
Subjects: Number Theory (math.NT)
In this paper we produce precise large deviation estimates through the lens of mod-Poisson convergence. We apply a general result to various examples from number theory, Dedekind domains and polynomials over finite fields when an element is selected using a distribution based on a Dirichlet series.
- [8] arXiv:2511.13861 [pdf, html, other]
Title: Data-Driven EV Charging Load Profile Estimation and Typical EV Daily Load Dataset Generation
Subjects: Systems and Control (eess.SY)
Widespread electric vehicle (EV) adoption introduces new challenges for distribution grids due to large, localized load increases, stochastic charging behavior, and limited data availability. This paper proposes two data-driven methods to estimate residential EV charging profiles using real-world customer meter data from CenterPoint Energy serving the Houston area. The first approach applies a least-squares estimation to extract average charging rates by comparing aggregated EV and non-EV meter data, enabling a statistical method for starting and ending charge times. The second method isolates EV load from meter profiles and applies a kernel density estimation (KDE) to develop a probabilistic charging model. Both methods produce a distinct "u-shaped" daily charging profile, with most charging occurring overnight. The validated profiles offer a scalable tool for utilities to better anticipate EV-driven demand increases and support proactive grid planning.
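A minimal sketch of the second method's KDE step, with synthetic overnight start times standing in for the EV load isolated from meter profiles; the 7.2 kW average rate is an assumed Level-2 figure, not a value from the paper.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(7)

# Stand-in charging start hours, clustered overnight and wrapped at 24 h.
start_hours = np.concatenate([
    rng.normal(23.0, 1.5, 400) % 24,
    rng.normal(3.0, 2.0, 300) % 24,
])

kde = gaussian_kde(start_hours)        # probabilistic charging model
hours = np.linspace(0, 24, 97)
density = kde(hours)

# expected charging load profile = density * assumed average charging rate
avg_rate_kw = 7.2                      # assumed Level-2 charging rate
profile = density * avg_rate_kw * len(start_hours)
print(hours[np.argmax(profile)])       # peak lands overnight ("u-shape")
```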
- [9] arXiv:2511.13760 [pdf, html, other]
Title: MoETTA: Test-Time Adaptation Under Mixed Distribution Shifts with MoE-LayerNorm
Comments: Accepted by AAAI 2026 Main Technical Track
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Test-Time adaptation (TTA) has proven effective in mitigating performance drops under single-domain distribution shifts by updating model parameters during inference. However, real-world deployments often involve mixed distribution shifts, where test samples are affected by diverse and potentially conflicting domain factors, posing significant challenges even for SOTA TTA methods. A key limitation in existing approaches is their reliance on a unified adaptation path, which fails to account for the fact that optimal gradient directions can vary significantly across different domains. Moreover, current benchmarks focus only on synthetic or homogeneous shifts, failing to capture the complexity of real-world heterogeneous mixed distribution shifts. To address this, we propose MoETTA, a novel entropy-based TTA framework that integrates the Mixture-of-Experts (MoE) architecture. Rather than enforcing a single parameter update rule for all test samples, MoETTA introduces a set of structurally decoupled experts, enabling adaptation along diverse gradient directions. This design allows the model to better accommodate heterogeneous shifts through flexible and disentangled parameter updates. To simulate realistic deployment conditions, we introduce two new benchmarks: potpourri and potpourri+. While classical settings focus solely on synthetic corruptions, potpourri encompasses a broader range of domain shifts--including natural, artistic, and adversarial distortions--capturing more realistic deployment challenges. Additionally, potpourri+ further includes source-domain samples to evaluate robustness against catastrophic forgetting. Extensive experiments across three mixed-distribution-shift settings show that MoETTA consistently outperforms strong baselines, establishing SOTA performance and highlighting the benefit of modeling multiple adaptation directions via expert-level diversity.
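A compact sketch of the MoE-LayerNorm idea under entropy-minimization TTA: several decoupled LayerNorm experts mixed by a per-sample gate, with only their parameters updated at test time. Expert count, gating, and the update rule are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayerNorm(nn.Module):
    """Structurally decoupled LayerNorm 'experts' mixed by a per-sample
    gate, so different samples can adapt along different directions."""
    def __init__(self, dim, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(nn.LayerNorm(dim) for _ in range(n_experts))
        self.gate = nn.Linear(dim, n_experts)

    def forward(self, x):                            # x: (batch, dim)
        w = F.softmax(self.gate(x), dim=-1)          # per-sample expert weights
        outs = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, D)
        return (w.unsqueeze(-1) * outs).sum(dim=1)

# one entropy-minimization TTA step, updating only the MoE-LayerNorm
feat, head, moe_ln = torch.randn(8, 32), nn.Linear(32, 10), MoELayerNorm(32)
opt = torch.optim.SGD(moe_ln.parameters(), lr=1e-3)
probs = F.softmax(head(moe_ln(feat)), dim=-1)
entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
opt.zero_grad()
entropy.backward()
opt.step()
print(float(entropy))
```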
- [10] arXiv:2511.14113 [pdf, html, other]
Title: Coffee: Controllable Diffusion Fine-tuning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Text-to-image diffusion models can generate diverse content with flexible prompts, which makes them well-suited for customization through fine-tuning with a small amount of user-provided data. However, controllable fine-tuning that prevents models from learning undesired concepts present in the fine-tuning data, and from entangling those concepts with user prompts, remains an open challenge. It is crucial for downstream tasks like bias mitigation, preventing malicious adaptation, attribute disentanglement, and generalizable fine-tuning of diffusion policy. We propose Coffee, which allows using language to specify undesired concepts to regularize the adaptation process. The crux of our method lies in keeping the embeddings of the user prompt from aligning with undesired concepts. Crucially, Coffee requires no additional training and enables flexible modification of undesired concepts by modifying textual descriptions. We evaluate Coffee by fine-tuning on images associated with user prompts paired with undesired concepts. Experimental results demonstrate that Coffee can prevent text-to-image models from learning specified undesired concepts during fine-tuning and outperforms existing methods. Code will be released upon acceptance.
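A minimal sketch of the stated crux, keeping the user-prompt embedding from aligning with undesired-concept embeddings; the cosine-similarity penalty and all names here are assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def coffee_style_penalty(prompt_emb, undesired_embs):
    """Penalty that discourages the user-prompt embedding from drifting
    toward text embeddings of undesired concepts during fine-tuning
    (illustrative reading of the idea, not the released method)."""
    sims = F.cosine_similarity(
        prompt_emb.unsqueeze(0), undesired_embs, dim=-1)  # (n_concepts,)
    return sims.clamp_min(0).mean()

# toy usage with stand-in text-encoder outputs
prompt_emb = torch.randn(768, requires_grad=True)
undesired = torch.randn(3, 768)    # e.g., encoded "watermark", "blur", ...
loss = coffee_style_penalty(prompt_emb, undesired)
loss.backward()                    # gradient steers the prompt away
print(float(loss))
```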
- [11] arXiv:2511.14040 [pdf, html, other]
Title: Saliency-Guided Deep Learning for Bridge Defect Detection in Drone Imagery
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Anomalous object detection and classification are among the most challenging tasks in computer vision and pattern recognition. In this paper, we propose a new method to automatically detect, localize and classify defects in concrete bridge structures using drone imagery. This framework is constituted of two main stages. The first stage uses saliency for defect region proposals, where defects often exhibit local discontinuities in the normal surface patterns with regard to their surroundings. The second stage employs a YOLOX-based deep learning detector that operates on saliency-enhanced images obtained by applying bounding-box level brightness augmentation to salient defect regions. Experimental results on standard datasets confirm the performance of our framework and its suitability in terms of accuracy and computational efficiency, giving it great potential for implementation in a self-powered inspection system.
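A tiny NumPy sketch of the bounding-box-level brightness augmentation applied to salient defect regions before detection; the gain value and box format are assumptions.

```python
import numpy as np

def brighten_salient_boxes(image, boxes, gain=1.4):
    """Boost intensity inside salient defect regions, producing the
    saliency-enhanced input the detector sees (illustrative sketch)."""
    out = image.astype(np.float32)
    for x0, y0, x1, y1 in boxes:
        out[y0:y1, x0:x1] *= gain
    return np.clip(out, 0, 255).astype(np.uint8)

img = np.full((120, 160, 3), 90, dtype=np.uint8)
salient = [(40, 30, 90, 70)]      # from a saliency-based region proposal
print(brighten_salient_boxes(img, salient)[50, 60])  # brightened pixel
```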
- [12] arXiv:2511.14031 [pdf, other]
Title: FashionMAC: Deformation-Free Fashion Image Generation with Fine-Grained Model Appearance Customization
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Garment-centric fashion image generation aims to synthesize realistic and controllable human models dressing a given garment, which has attracted growing interest due to its practical applications in e-commerce. The key challenges of the task lie in two aspects: (1) faithfully preserving the garment details, and (2) gaining fine-grained controllability over the model's appearance. Existing methods typically require performing garment deformation in the generation process, which often leads to garment texture distortions. Also, they fail to control the fine-grained attributes of the generated models, due to the lack of specifically designed mechanisms. To address these issues, we propose FashionMAC, a novel diffusion-based deformation-free framework that achieves high-quality and controllable fashion showcase image generation. The core idea of our framework is to eliminate the need for performing garment deformation and directly outpaint the garment segmented from a dressed person, which enables faithful preservation of the intricate garment details. Moreover, we propose a novel region-adaptive decoupled attention (RADA) mechanism along with a chained mask injection strategy to achieve fine-grained appearance controllability over the synthesized human models. Specifically, RADA adaptively predicts the generated regions for each fine-grained text attribute and enforces the text attribute to focus on the predicted regions by a chained mask injection strategy, significantly enhancing the visual fidelity and the controllability. Extensive experiments validate the superior performance of our framework compared to existing state-of-the-art methods.
- [13] arXiv:2511.13877 [pdf, html, other]
Title: Hybrid Convolution Neural Network Integrated with Pseudo-Newton Boosting for Lumbar Spine Degeneration Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
This paper proposes a new enhanced model architecture to perform classification of lumbar spine degeneration with DICOM images while using a hybrid approach, integrating EfficientNet and VGG19 together with custom-designed components. The proposed model is differentiated from traditional transfer learning methods as it incorporates a Pseudo-Newton Boosting layer along with a Sparsity-Induced Feature Reduction Layer that forms a multi-tiered framework, further improving feature selection and representation. The Pseudo-Newton Boosting layer makes smart variations of feature weights, with more detailed anatomical features, which are mostly left out in a transfer learning setup. In addition, the Sparsity-Induced Layer removes redundancy for learned features, producing lean yet robust representations for pathology in the lumbar spine. This architecture is novel as it overcomes the constraints in the traditional transfer learning approach, especially in the high-dimensional context of medical images, and achieves a significant performance boost, reaching a precision of 0.9, recall of 0.861, F1 score of 0.88, loss of 0.18, and an accuracy of 88.1%, compared to the baseline model, EfficientNet. This work will present the architectures, preprocessing pipeline, and experimental results. The results contribute to the development of automated diagnostic tools for medical images.
- [14] arXiv:2511.13872 [pdf, html, other]
Title: Dynamic state estimation of hybrid systems: Inverters that switch between grid-following and grid-forming control schemes
Subjects: Systems and Control (eess.SY)
This paper develops a hybrid system modeling framework for inverters that switch between grid-following and grid-forming control schemes. In particular, such inverters are modeled as hybrid automata with guard conditions on voltage and frequency, and reset maps that maintain consistent phase, frequency, and droop references during mode transitions. The hybrid model is embedded within an extended Kalman filter to assess estimation performance under explicit mode switching. Results show that the proposed framework ensures stable, well-behaved dynamics and improves state estimation, especially near switching instants, compared with smooth continuous models.
- [15] arXiv:2511.13869 [pdf, html, other]
Title: H-CNN-ViT: A Hierarchical Gated Attention Multi-Branch Model for Bladder Cancer Recurrence Prediction
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Bladder cancer is one of the most prevalent malignancies worldwide, with a recurrence rate of up to 78%, necessitating accurate post-operative monitoring for effective patient management. Multi-sequence contrast-enhanced MRI is commonly used for recurrence detection; however, interpreting these scans remains challenging, even for experienced radiologists, due to post-surgical alterations such as scarring, swelling, and tissue remodeling. AI-assisted diagnostic tools have shown promise in improving bladder cancer recurrence prediction, yet progress in this field is hindered by the lack of dedicated multi-sequence MRI datasets for recurrence assessment study. In this work, we first introduce a curated multi-sequence, multi-modal MRI dataset specifically designed for bladder cancer recurrence prediction, establishing a valuable benchmark for future research. We then propose H-CNN-ViT, a new Hierarchical Gated Attention Multi-Branch model that enables selective weighting of features from the global (ViT) and local (CNN) paths based on contextual demands, achieving a balanced and targeted feature fusion. Our multi-branch architecture processes each modality independently, ensuring that the unique properties of each imaging channel are optimally captured and integrated. Evaluated on our dataset, H-CNN-ViT achieves an AUC of 78.6%, surpassing state-of-the-art models. Our model is publicly available at https://github.com/XLIAaron/H-CNN-ViT.
- [16] arXiv:2511.13873 [pdf, html, other]
Title: A congestion-dependent imbalance pricing mechanism for regions allowing passive balancing
Comments: This manuscript has been submitted to the journal Sustainable Energy, Grids and Networks (SEGAN) for possible publication
Subjects: Systems and Control (eess.SY)
Maintaining system balance becomes increasingly challenging as market design and grid capacity enhancement lag behind the growing share of renewables, requiring greater effort from both the transmission system operator (TSO) and the Balance Responsible Parties (BRPs). An actor can support balancing actively by bidding into reserve markets, or passively by adjusting its portfolio in line with system needs. In some countries, BRPs are incentivized to engage in passive balancing when their deviations support overall system stability. However, BRPs focus on profit maximization rather than minimizing portfolio discrepancies, which can cause simultaneous responses to price signals and create issues at the transmission-distribution interface. This research provides a two-stage stochastic model that captures BRP dynamic behavior and their impact on the grid under day-ahead and balancing market price uncertainty across three imbalance pricing mechanisms: the single, dual, and two-price. Then, a congestion-dependent imbalance pricing mechanism is proposed that maintains incentives for passive balancing while satisfying the grid constraint. A proof of concept is provided via the simulation with a part of the Dutch distribution grid. Results show that the proposed method mitigates the unexpected peak flow issue in congested areas while preserving passive balancing contributions from other BRPs in non-congested areas.
- [17] arXiv:2411.15127 [pdf, other]
Title: PRIMUS: Pretraining IMU Encoders with Multimodal Self-Supervision
Comments: Published at ICASSP 2025. Also presented at the NeurIPS 2024 TSALM Workshop (Time Series in the Age of Large Models) under the title "PRIMUS: Pretraining IMU Encoders via Multimodal and Self-Supervised Learning"
Subjects: Machine Learning (cs.LG)
Sensing human motions through Inertial Measurement Units (IMUs) embedded in personal devices has enabled significant applications in health and wellness. Labeled IMU data is scarce; however, unlabeled or weakly labeled IMU data can be used to model human motions. For video or text modalities, the "pretrain and adapt" approach utilizes large volumes of unlabeled or weakly labeled data to build a strong feature extractor, followed by adaptation to specific tasks using limited labeled data. However, pretraining methods are poorly understood for IMU data, and pipelines are rarely evaluated on out-of-domain tasks. We propose PRIMUS: a method for PRetraining IMU encoderS that uses a novel pretraining objective that is empirically validated based on downstream performance on both in-domain and out-of-domain datasets. The PRIMUS objective effectively enhances downstream performance by combining self-supervision, multimodal, and nearest-neighbor supervision. With fewer than 500 labeled samples per class, PRIMUS improves test accuracy by up to 15%, compared to state-of-the-art baselines. To benefit the broader community, we have open-sourced our code at github.com/nokia-bell-labs/pretrained-imu-encoders.
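A sketch of how the three supervision signals could combine, using a standard symmetric InfoNCE; the equal weighting and in-batch nearest-neighbor mining are assumptions, and the linked repository has the actual objective.

```python
import torch
import torch.nn.functional as F

def info_nce(a, b, temp=0.1):
    """Symmetric InfoNCE between two batches of paired embeddings."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temp
    labels = torch.arange(a.size(0))
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

imu = torch.randn(16, 128)       # IMU-encoder embeddings (stand-ins)
imu_aug = torch.randn(16, 128)   # embeddings of augmented IMU windows
video = torch.randn(16, 128)     # paired video (multimodal) embeddings

with torch.no_grad():            # nearest neighbors mined within the batch
    sim = F.normalize(imu, dim=-1) @ F.normalize(imu, dim=-1).t()
    sim.fill_diagonal_(-1)
    nn_idx = sim.argmax(dim=1)

loss = (info_nce(imu, imu_aug)          # self-supervision
        + info_nce(imu, video)          # multimodal supervision
        + info_nce(imu, imu[nn_idx]))   # nearest-neighbor supervision
print(float(loss))
```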
- [18] arXiv:2410.14589 [pdf, html, other]
Title: Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum
Comments: Published in Findings of NAACL 2025
Subjects: Computation and Language (cs.CL)
There is increasing interest in looking at dialects in NLP. However, most work to date still treats dialects as discrete categories. For instance, evaluative work in variation-oriented NLP for English often works with Indian English or African-American Vernacular English as homogeneous categories (Faisal et al., 2024; Ziems et al., 2023), yet even within one variety there is substantial variation. We examine within-dialect variation and show that performance critically varies within categories. We measure speech-to-text performance on Italian dialects, and empirically observe a geographical performance disparity. This disparity correlates substantially (-0.5) with linguistic similarity to the highest performing dialect variety. We cross-examine our results against dialectometry methods, and interpret the performance disparity to be due to a bias towards dialects that are more similar to the standard variety in the speech-to-text model examined. We additionally leverage geostatistical methods to predict zero-shot performance at unseen sites, and find the incorporation of geographical information to substantially improve prediction performance, indicating there to be geographical structure in the performance distribution.
- [19] arXiv:2511.14563 [pdf, html, other]
Title: Linear Combinations of Logarithms of $L$-functions over Function Fields at Microscopic Shifts and Beyond
Subjects: Number Theory (math.NT)
In the function field setting with a fixed characteristic, it was proven by the second and third authors that the values $\log \big|L\big(\frac12, \chi_D\big)\big|$ as $D$ varies over monic and square-free polynomials are asymptotically Gaussian distributed on the assumption of a low lying zeros hypothesis as the degree of $D$ tends to $\infty$. For real distinct shifts $t_j$ all of microscopic size or all of nonmicroscopic size relative to the genus, we consider linear combinations of $\log\big|L\big(\frac12+it_j, \chi_D\big)\big|$ with real coefficients, and separately, of $\arg L\big(\frac12+it_j, \chi_D\big)$. We provide estimates for their distribution functions under the low lying zeros hypothesis. We similarly study distribution functions of linear combinations of $\log\big|L\big(\frac12+it_j, E\otimes \chi_D\big)\big|$, and separately $\arg L\big(\frac12+it_j, E\otimes \chi_D\big)$, for quadratic twists of elliptic curves $E$ with root number one as the conductor gets large. As an application of these results, we prove a central limit theorem for the fluctuation of the number of nontrivial zeros of such $L$-functions from its mean, and thus recover previous results by Faifman and Rudnick. Correlations of such fluctuations are in harmony with the results of Bourgade, Coram and Diaconis, and Wieand for zeros of the Riemann zeta function and for eigenangles of unitary random matrices.
- [20] arXiv:2511.14494 [pdf, html, other]
Title: (projectively coresolved) Gorenstein flat modules over tensor rings
Comments: 13 pages
Subjects: Rings and Algebras (math.RA); Commutative Algebra (math.AC)
Let $T_R(M)$ be a tensor ring, where $R$ is a ring and $M$ is an $N$-nilpotent $R$-bimodule. Under certain conditions, we characterize projectively coresolved Gorenstein flat modules over $T_R(M)$, showing that a $T_R(M)$-module $(X,u)$ is projectively coresolved Gorenstein flat if and only if $u$ is monomorphic and $\operatorname{coker}(u)$ is a projectively coresolved Gorenstein flat $R$-module. A class of Gorenstein flat modules over $T_R(M)$ is also explicitly described. We discuss applications to trivial ring extensions and Morita context rings.
- [21] arXiv:2511.13793 [pdf, html, other]
Title: Modeling Fairness in Recruitment AI via Information Flow
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Avoiding bias and understanding the real-world consequences of AI-supported decision-making are critical to address fairness and assign accountability. Existing approaches often focus either on technical aspects, such as datasets and models, or on high-level socio-ethical considerations - rarely capturing how these elements interact in practice. In this paper, we apply an information flow-based modeling framework to a real-world recruitment process that integrates automated candidate matching with human decision-making. Through semi-structured stakeholder interviews and iterative modeling, we construct a multi-level representation of the recruitment pipeline, capturing how information is transformed, filtered, and interpreted across both algorithmic and human components. We identify where biases may emerge, how they can propagate through the system, and what downstream impacts they may have on candidates. This case study illustrates how information flow modeling can support structured analysis of fairness risks, providing transparency across complex socio-technical systems.
- [22] arXiv:2511.14104 [pdf, html, other]
Title: Lightweight Multi-task CNN for ECG Diagnosis with GRU-Diffusion
Comments: 15 pages, 5 figures
Subjects: Signal Processing (eess.SP)
With the increasing demand for real-time Electrocardiogram (ECG) classification on edge devices, existing models face challenges of high computational cost and limited accuracy on imbalanced datasets. This paper presents Multi-task DFNet, a lightweight multi-task framework for ECG classification across the MIT-BIH Arrhythmia Database and the PTB Diagnostic ECG Database, enabling efficient task collaboration by dynamically sharing knowledge across tasks, such as arrhythmia detection, myocardial infarction (MI) classification, and other cardiovascular abnormalities. The proposed method integrates GRU-augmented Diffusion, where the GRU is embedded within the diffusion model to capture temporal dependencies better and generate high-quality synthetic signals for imbalanced classes. The experimental results show that Multi-task DFNet achieves 99.72% and 99.89% accuracy on the MIT-BIH dataset and PTB dataset, respectively, with significantly fewer parameters compared to traditional models, making it suitable for deployment on wearable ECG monitors. This work offers a compact and efficient solution for multi-task ECG diagnosis, providing a promising potential for edge healthcare applications on resource-constrained devices.
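A minimal PyTorch sketch of a GRU embedded in a diffusion denoiser for 1-D ECG, as the abstract describes; layer sizes, the timestep embedding, and the bidirectional layout are assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class GRUDenoiser(nn.Module):
    """Sketch of a GRU inside a diffusion model's denoising network: the
    GRU tracks temporal dependencies in the noisy ECG sequence and
    predicts the added noise, conditioned on the diffusion timestep."""
    def __init__(self, hidden=64):
        super().__init__()
        self.t_embed = nn.Embedding(1000, hidden)
        self.gru = nn.GRU(input_size=1 + hidden, hidden_size=hidden,
                          batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, x_noisy, t):                  # x_noisy: (B, L, 1)
        temb = self.t_embed(t)[:, None, :].expand(-1, x_noisy.size(1), -1)
        h, _ = self.gru(torch.cat([x_noisy, temb], dim=-1))
        return self.out(h)                          # predicted noise, (B, L, 1)

model = GRUDenoiser()
x = torch.randn(4, 187, 1)                          # MIT-BIH style beat windows
t = torch.randint(0, 1000, (4,))
print(model(x, t).shape)                            # torch.Size([4, 187, 1])
```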
- [23] arXiv:2511.13809 [pdf, html, other]
Title: ScoresActivation: A New Activation Function for Model Agnostic Global Explainability by Design
Comments: Paper submitted to the ECAI 2025 conference
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Understanding the decision of large deep learning models is a critical challenge for building transparent and trustworthy systems. Although the current post hoc explanation methods offer valuable insights into feature importance, they are inherently disconnected from the model training process, limiting their faithfulness and utility. In this work, we introduce a novel differentiable approach to global explainability by design, integrating feature importance estimation directly into model training. Central to our method is the ScoresActivation function, a feature-ranking mechanism embedded within the learning pipeline. This integration enables models to prioritize features according to their contribution to predictive performance in a differentiable and end-to-end trainable manner. Evaluations across benchmark datasets show that our approach yields globally faithful, stable feature rankings aligned with SHAP values and ground-truth feature importance, while maintaining high predictive performance. Moreover, feature scoring is 150 times faster than the classical SHAP method, requiring only 2 seconds during training compared to SHAP's 300 seconds for feature ranking in the same configuration. Our method also improves classification accuracy by 11.24% with 10 features (5 relevant) and 29.33% with 16 features (5 relevant, 11 irrelevant), demonstrating robustness to irrelevant inputs. This work bridges the gap between model accuracy and interpretability, offering a scalable framework for inherently explainable machine learning.
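A sketch of one way a differentiable feature-ranking activation can sit inside the training pipeline, with learnable per-feature scores gating the input end-to-end; this is our reading of the idea, not the paper's definition of ScoresActivation.

```python
import torch
import torch.nn as nn

class ScoresActivationSketch(nn.Module):
    """Learnable per-feature scores gate the inputs, so a global feature
    ranking emerges as a by-product of ordinary training (illustrative)."""
    def __init__(self, n_features):
        super().__init__()
        self.scores = nn.Parameter(torch.zeros(n_features))

    def forward(self, x):                        # x: (batch, n_features)
        gate = torch.softmax(self.scores, dim=0) * x.size(1)
        return x * gate                          # up/down-weight features

    def ranking(self):
        return torch.argsort(self.scores, descending=True)

gate = ScoresActivationSketch(16)
clf = nn.Sequential(gate, nn.Linear(16, 2))
x, y = torch.randn(32, 16), torch.randint(0, 2, (32,))
loss = nn.functional.cross_entropy(clf(x), y)
loss.backward()                                  # scores receive gradients
print(gate.ranking()[:5])                        # current top-5 features
```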
- [24] arXiv:2409.18486 [pdf, html, other]
Title: Evaluation of OpenAI o1: Opportunities and Challenges of AGI
Subjects: Computation and Language (cs.CL)
This comprehensive study evaluates the performance of OpenAI's o1-preview large language model across a diverse array of complex reasoning tasks, spanning multiple domains, including computer science, mathematics, natural sciences, medicine, linguistics, and social sciences. Through rigorous testing, o1-preview demonstrated remarkable capabilities, often achieving human-level or superior performance in areas ranging from coding challenges to scientific reasoning and from language processing to creative problem-solving. Key findings include:
- 83.3% success rate in solving complex competitive programming problems, surpassing many human experts.
- Superior ability in generating coherent and accurate radiology reports, outperforming other evaluated models.
- 100% accuracy in high school-level mathematical reasoning tasks, providing detailed step-by-step solutions.
- Advanced natural language inference capabilities across general and specialized domains like medicine.
- Impressive performance in chip design tasks, outperforming specialized models in areas such as EDA script generation and bug analysis.
- Remarkable proficiency in anthropology and geology, demonstrating deep understanding and reasoning in these specialized fields.
- Strong capabilities in quantitative investing: o1 has comprehensive financial knowledge and statistical modeling skills.
- Effective performance in social media analysis, including sentiment analysis and emotion recognition.
The model excelled particularly in tasks requiring intricate reasoning and knowledge integration across various fields. While some limitations were observed, including occasional errors on simpler problems and challenges with certain highly specialized concepts, the overall results indicate significant progress towards artificial general intelligence.
- [25] arXiv:2511.13807 [pdf, html, other]
Title: GAEA: Experiences and Lessons Learned from a Country-Scale Environmental Digital Twin
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)
This paper describes the experiences and lessons learned after the deployment of a country-scale environmental digital twin on the island of Cyprus for three years. This digital twin, called GAEA, contains 27 environmental geospatial services and is suitable for urban planners, policymakers, farmers, property owners, real-estate and forestry professionals, as well as insurance companies and banks that have properties in their portfolio. This paper demonstrates the power, potential, current and future challenges of geospatial analytics and environmental digital twins on a large scale.
- [26] arXiv:2511.14757 [pdf, html, other]
Title: A weak convergence approach to the large deviations of the dynamic Schrödinger problem
Subjects: Probability (math.PR)
In this paper, we consider large deviations for dynamical Schrödinger problems, using the variational approach developed by Dupuis, Ellis, Budhiraja, and others. Recent results on scaled families of Schrödinger problems, in particular by Bernton, Ghosal, and Nutz, and the authors, have established large deviation principles for the static problem. For the dynamic problem, only the case with a scaled Brownian motion reference process has been explored by Kato. Here, we derive large deviation results using the variational approach, with the aim of going beyond the Brownian reference dynamics considered by Kato. Specifically, we develop a uniform on compacts Laplace principle for bridge processes conditioned on their endpoints. When combined with existing results for the static problem, this leads to a large deviation principle for the corresponding (dynamic) Schrödinger bridge. In addition to the specific results of the paper, our work puts such large deviation questions into the weak convergence framework, and we conjecture that the results can be extended to cover also more involved types of reference dynamics. Specifically, we provide an outlook on applying the result to reflected Schrödinger bridges.
- [27] arXiv:2404.03336 [pdf, html, other]
Title: Benchmarking Population-Based Reinforcement Learning across Robotic Tasks with GPU-Accelerated Simulation
Comments: Accepted for publication at the 2025 IEEE 21st International Conference on Automation Science and Engineering (CASE)
Journal-ref: 2025 IEEE 21st International Conference on Automation Science and Engineering (CASE), Los Angeles, CA, USA, 2025, pp. 1231-1238
Subjects: Robotics (cs.RO)
In recent years, deep reinforcement learning (RL) has shown its effectiveness in solving complex continuous control tasks. However, this comes at the cost of an enormous amount of experience required for training, exacerbated by the sensitivity of learning efficiency and the policy performance to hyperparameter selection, which often requires numerous trials of time-consuming experiments. This work leverages a Population-Based Reinforcement Learning (PBRL) approach and a GPU-accelerated physics simulator to enhance the exploration capabilities of RL by concurrently training multiple policies in parallel. The PBRL framework is benchmarked against three state-of-the-art RL algorithms -- PPO, SAC, and DDPG -- dynamically adjusting hyperparameters based on the performance of learning agents. The experiments are performed on four challenging tasks in Isaac Gym -- Anymal Terrain, Shadow Hand, Humanoid, Franka Nut Pick -- by analyzing the effect of population size and mutation mechanisms for hyperparameters. The results show that PBRL agents achieve superior performance, in terms of cumulative reward, compared to non-evolutionary baseline agents. Moreover, the trained agents are finally deployed in the real world for a Franka Nut Pick task. To our knowledge, this is the first sim-to-real attempt for deploying PBRL agents on real hardware. Code and videos of the learned policies are available on our project website (https://sites.google.com/view/pbrl).
- [28] arXiv:2511.14421 [pdf, html, other]
Title: Integrated Positioning and Communication for Cooperative Multi-LEO Uplink Communications: A Dual-Timescale Kalman Filter-Aided Approach
Subjects: Signal Processing (eess.SP)
Low Earth orbit (LEO) satellites are a crucial component of the future non-terrestrial networks (NTN) due to lower latency, robust signal strengths, shorter revisit times, and dense constellations. However, acquiring reliable channel state information (CSI) in LEO satellite communication remains challenging owing to severe signal attenuation over long propagation distances and short coherence times. Despite these challenges, LEO channels benefit from pronounced line-of-sight dominance and geometric properties inherently tied to positioning information. In this work, we propose an integrated positioning and communication (IPAC) framework for multi-LEO satellite networks to address the unique challenges posed by LEO channels. Specifically, we leverage in-the-loop LEO positioning to exploit users' position information for improving uplink CSI acquisition. To overcome the link-budget limitations of single-satellite systems, cooperative multi-LEO uplink data detection is adopted. By exploiting the different coherent timescales of position-related parameters and random channel gains, we develop a dual-timescale Kalman filter-based IPAC framework: an unscented Kalman filter (UKF) for tracking users' position and velocity in the large-timescale, and a Kalman filter that leverages the position information obtained in the large-timescale for improved data-aided uplink channel estimation in the small-timescale. Finally, the two tasks of channel estimation and cooperative data detection are jointly addressed through the expectation maximization (EM) algorithm. Numerical results demonstrate that the proposed IPAC approach outperforms the conventional baseline in terms of channel estimation accuracy and communication performance.
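A structural skeleton of the dual-timescale loop: a slow filter tracks position/velocity, and its estimate feeds the fast, position-aided channel filter. For brevity a plain linear Kalman filter stands in for the paper's UKF, and all dynamics, measurements, and noise levels are placeholders.

```python
import numpy as np

def kf_update(x, P, z, H, R):
    """One linear Kalman-filter measurement update."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

pos = np.zeros(6); P_pos = np.eye(6)       # user position + velocity state
h = np.zeros(2);  P_h = np.eye(2)          # small-scale channel gains

for slot in range(100):
    if slot % 10 == 0:                     # large timescale: position fix
        z_pos = np.random.randn(3)         # stand-in positioning measurement
        H = np.hstack([np.eye(3), np.zeros((3, 3))])
        pos, P_pos = kf_update(pos, P_pos, z_pos, H, 0.1 * np.eye(3))
    # small timescale: position-aided channel estimate (geometry -> prior)
    geometry_prior = pos[:2] * 0.01        # placeholder geometry mapping
    z_h = geometry_prior + 0.05 * np.random.randn(2)
    h, P_h = kf_update(h, P_h, z_h, np.eye(2), 0.05 * np.eye(2))
print(pos[:3], h)
```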
- [29] arXiv:2306.10656 [pdf, html, other]
Title: Virtual Human Generative Model: Masked Modeling Approach for Learning Human Characteristics
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Virtual Human Generative Model (VHGM) is a generative model that approximates the joint probability over more than 2000 human healthcare-related attributes. This paper presents the core algorithm, VHGM-MAE, a masked autoencoder (MAE) tailored for handling high-dimensional, sparse healthcare data. VHGM-MAE tackles four key technical challenges: (1) heterogeneity of healthcare data types, (2) probability distribution modeling, (3) systematic missingness in the training dataset arising from multiple data sources, and (4) the high-dimensional, small-$n$-large-$p$ problem. To address these challenges, VHGM-MAE employs a likelihood-based approach to model distributions with heterogeneous types, a transformer-based MAE to capture complex dependencies among observed and missing attributes, and a novel training scheme that effectively leverages available samples with diverse missingness patterns to mitigate the small-$n$-large-$p$ problem. Experimental results demonstrate that VHGM-MAE outperforms existing methods in both missing value imputation and synthetic data generation.
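A minimal sketch of the masking-based training scheme: extra observed cells are masked on top of the data's own missingness, and reconstruction is scored only on those held-out cells. The MLP and all shapes here stand in for the paper's transformer MAE.

```python
import torch
import torch.nn as nn

x = torch.randn(64, 200)                       # mixed-attribute records (stand-in)
observed = torch.rand(64, 200) > 0.4           # systematic missingness pattern

# mask some *observed* cells, feed the rest, and reconstruct the masked ones
mask = (torch.rand(64, 200) < 0.3) & observed
x_in = torch.where(observed & ~mask, x, torch.zeros_like(x))

model = nn.Sequential(                         # stand-in for the transformer MAE
    nn.Linear(200, 128), nn.ReLU(), nn.Linear(128, 200))
recon = model(x_in)
loss = ((recon - x)[mask] ** 2).mean()         # score only held-out entries
loss.backward()
print(float(loss))
```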
- [30] arXiv:2505.13787 [pdf, html, other]
Title: Preference Learning with Lie Detectors can Induce Honesty or Evasion
Comments: NeurIPS 2025
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
As AI systems become more capable, deceptive behaviors can undermine evaluation and mislead users at deployment. Recent work has shown that lie detectors can accurately classify deceptive behavior, but they are not typically used in the training pipeline due to concerns around contamination and objective hacking. We examine these concerns by incorporating a lie detector into the labelling step of LLM post-training and evaluating whether the learned policy is genuinely more honest, or instead learns to fool the lie detector while remaining deceptive. Using DolusChat, a novel 65k-example dataset with paired truthful/deceptive responses, we identify three key factors that determine the honesty of learned policies: amount of exploration during preference learning, lie detector accuracy, and KL regularization strength. We find that preference learning with lie detectors and GRPO can lead to policies which evade lie detectors, with deception rates of over 85%. However, if the lie detector true positive rate (TPR) or KL regularization is sufficiently high, GRPO learns honest policies. In contrast, off-policy algorithms (DPO) consistently lead to deception rates under 25% for realistic TPRs. Our results illustrate a more complex picture than previously assumed: depending on the context, lie-detector-enhanced training can be a powerful tool for scalable oversight, or a counterproductive method encouraging undetectable misalignment.
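A toy sketch of the labelling step the paper modifies: the lie detector's score decides which response of a pair becomes "chosen", with a crude error rate emulating an imperfect detector. The dataclass, scores, and noise model are illustrative assumptions.

```python
from dataclasses import dataclass
import random

@dataclass
class Response:
    text: str
    detector_score: float   # lie detector's P(deceptive); stand-in values

def label_pair(a: Response, b: Response, flip_prob=0.0):
    """Prefer the response the lie detector judges more honest; flip_prob
    crudely emulates detector mistakes (lower TPR -> higher flip_prob)."""
    sa, sb = a.detector_score, b.detector_score
    if random.random() < flip_prob:            # detector error
        sa, sb = sb, sa
    return (a, b) if sa <= sb else (b, a)      # (chosen, rejected)

pair = (Response("truthful answer", 0.1), Response("deceptive answer", 0.9))
chosen, rejected = label_pair(*pair, flip_prob=0.05)
print(chosen.text)
```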
- [31] arXiv:2511.14357 [pdf, other]
Title: IBGS: Image-Based Gaussian Splatting
Comments: Accepted at NeurIPS 2025
Subjects: Computer Vision and Pattern Recognition (cs.CV)
3D Gaussian Splatting (3DGS) has recently emerged as a fast, high-quality method for novel view synthesis (NVS). However, its use of low-degree spherical harmonics limits its ability to capture spatially varying color and view-dependent effects such as specular highlights. Existing works augment Gaussians with either a global texture map, which struggles with complex scenes, or per-Gaussian texture maps, which introduces high storage overhead. We propose Image-Based Gaussian Splatting, an efficient alternative that leverages high-resolution source images for fine details and view-specific color modeling. Specifically, we model each pixel color as a combination of a base color from standard 3DGS rendering and a learned residual inferred from neighboring training images. This promotes accurate surface alignment and enables rendering images of high-frequency details and accurate view-dependent effects. Experiments on standard NVS benchmarks show that our method significantly outperforms prior Gaussian Splatting approaches in rendering quality, without increasing the storage footprint.
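A shape-level sketch of the per-pixel composition the abstract describes: final color = base 3DGS color + a residual aggregated from neighboring training images. The weighting scheme below is a stand-in for the learned network.

```python
import torch

def ibgs_pixel_color(base_rgb, neighbor_rgbs, residual_weights):
    """Final color = 3DGS base render + a residual aggregated from colors
    sampled in neighboring training views (illustrative composition)."""
    residual = (residual_weights[..., None] * neighbor_rgbs).sum(dim=-2)
    return (base_rgb + residual).clamp(0, 1)

base = torch.rand(4, 4, 3)              # standard 3DGS render (H, W, 3)
neighbors = torch.rand(4, 4, 3, 3)      # 3 source views sampled per pixel
# zero-mean weights standing in for a small network's output
w = torch.softmax(torch.randn(4, 4, 3), dim=-1) - 1.0 / 3.0
print(ibgs_pixel_color(base, neighbors, w).shape)   # (4, 4, 3)
```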
- [32] arXiv:2511.14209 [中文pdf, pdf, 其他]
-
标题: 完全无变压器通用直接注入功率流控制器标题: Entirely Transformerless Universal Direct-Injection Power-Flow Controller评论: 8页,12图主题: 系统与控制 (eess.SY)
可再生能源资源、电动汽车充电器和储能系统在低压配电网络中的渗透率不断提高,导致了若干电力管理和稳定性问题,如反向功率流动、(局部)过载线路以及过压/欠压。 以前的潮流和软开点解决方案体积大且成本高。它们需要变压器和大型磁性元件,一些基于电网频率,另一些则在高频下更紧凑。即使建议的使用高频变压器的电路,在成本和尺寸方面仍然存在困难。 我们提出了一种无需单个变压器的紧凑型部分功率转换高电流全功率流控制电路。 我们将硅和碳化硅结合使用,各自利用其特定优势进行高电流直接注入。 该电路所需的半导体元件比以往的概念更少。 该电路通过非隔离逆变器双向连接并联转换器与低压串联模块,这些模块实际上可以与其各自的相位浮动,并可在低压配电网络的不同馈线之间运行。 我们对电路进行了数学分析,并通过仿真和实验结果评估了其运行情况。
An increasing penetration of renewable energy resources, electric vehicle chargers, and energy storage systems into low-voltage power grids causes several power management and stability problems, such as reverse power flow, (locally) overloaded lines, and over-/under-voltage. Previous power-flow and soft-open-point solutions are bulky and expensive. They need transformers and large magnetics, some at grid frequency, others more compact at high frequency. Even suggested circuits with high-frequency transformers still struggle with cost and size. We present a compact partial power-conversion high-current full-power-flow control circuit without a single transformer. We combine silicon and silicon-carbide, each with their specific advantages, for current-dense direct injection. The circuit further needs fewer semiconductors than previous concepts. The circuit links a shunt converter through a non-isolated inverter bidirectionally with low-voltage series modules that practically float with their respective phases and can serve between different feeders in low-voltage power grids. We analyze the circuit mathematically and evaluate its operation through simulation and experimental results.
- [33] arXiv:2511.14182 [中文pdf, pdf, html, 其他]
-
标题: WebRec:利用网络中注意力引导的RAG增强基于大语言模型的推荐标题: WebRec: Enhancing LLM-based Recommendations with Attention-guided RAG from Web主题: 信息检索 (cs.IR)
推荐系统在缓解信息过载和丰富用户的在线体验方面起着至关重要的作用。 在大型语言模型(LLMs)的时代,基于LLM的推荐系统已成为推动个性化推荐的普遍范式。 最近,检索增强生成(RAG)引起了越来越多的关注,以促进LLM的推荐能力,结合从外部知识库中检索到的有用信息。 然而,作为最新信息的丰富来源,网络仍然未被现有的基于RAG的推荐系统充分探索。 特别是,从两个角度来看,提出了独特的挑战:一个是生成有效的查询用于网络检索,考虑到网络搜索和推荐之间的固有知识差距;另一个挑战在于利用包含大量噪声内容的在线网站。 为了解决这些限制,我们提出了WebRec,一种新的基于网络的RAG框架,它利用LLM的推理能力,将推荐任务转化为满足网络检索的用户偏好查询。 此外,考虑到从网络检索的信息存在噪声,相关证据分散较远,设计了一个有见地的MP-Head,通过消息传递来增强LLM对远距离相关信息标记的注意力。 进行了广泛的实验,以证明我们提出的基于网络的RAG方法在推荐场景中的有效性。
Recommender systems play a vital role in alleviating information overload and enriching users' online experience. In the era of large language models (LLMs), LLM-based recommender systems have emerged as a prevalent paradigm for advancing personalized recommendations. Recently, retrieval-augmented generation (RAG) has drawn growing interest to facilitate the recommendation capability of LLMs, incorporating useful information retrieved from external knowledge bases. However, as a rich source of up-to-date information, the web remains under-explored by existing RAG-based recommendations. In particular, unique challenges are posed from two perspectives: one is to generate effective queries for web retrieval, considering the inherent knowledge gap between web search and recommendations; another challenge lies in harnessing online websites that contain substantial noisy content. To tackle these limitations, we propose WebRec, a novel web-based RAG framework, which takes advantage of the reasoning capability of LLMs to interpret recommendation tasks into queries of user preferences that cater to web retrieval. Moreover, given noisy web-retrieved information, where relevant pieces of evidence are scattered far apart, an insightful MP-Head is designed to enhance LLM attentions between distant tokens of relevant information via message passing. Extensive experiments have been conducted to demonstrate the effectiveness of our proposed web-based RAG methods in recommendation scenarios.
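The MP-Head itself is not specified in the abstract; the sketch below only illustrates the general message-passing idea it alludes to, in which each token first aggregates information from its most relevant (possibly distant) tokens before an attention bias is formed. All shapes, the top-k rule, and the bias form are assumptions.

```python
import torch

def message_passing_bias(hidden, relevance, top_k=8):
    # hidden:    (seq, d) token representations
    # relevance: (seq, seq) any token-to-token relevance score matrix
    idx = relevance.topk(top_k, dim=-1).indices   # k most relevant tokens each
    messages = hidden[idx].mean(dim=1)            # aggregate distant evidence
    fused = hidden + messages                     # one message-passing round
    d = hidden.shape[-1]
    return fused @ fused.t() / d ** 0.5           # additive attention-logit bias
```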
- [34] arXiv:2511.14544 [中文pdf, pdf, 其他]
-
标题: 注意间隙:降维中的视觉伪影测量标题: Mind the Gaps: Measuring Visual Artifacts in Dimensionality Reduction主题: 机器学习 (cs.LG)
降维(DR)技术由于能够将高维点集投影到2D平面上,常用于高维数据的可视化探索和分析。 然而,将数据集投影到较低维度通常会带来一些失真,这种失真不一定容易识别,但可能导致用户得出错误的结论。 已经开发了几种投影质量度量(PQMs)作为量化DR投影拟合优度的工具;然而,它们大多专注于测量投影在多大程度上捕捉了数据的全局或局部结构,而没有考虑结果图的视觉失真,因此常常忽略了可能误导投影视觉分析的异常值或伪影的存在。 在本工作中,我们引入了变形指数(WI),这是一种基于正确保留点之间空区域重要性的假设,用于测量DR投影到2D平面的质量的新指标。
Dimensionality Reduction (DR) techniques are commonly used for the visual exploration and analysis of high-dimensional data due to their ability to project datasets of high-dimensional points onto the 2D plane. However, projecting datasets in lower dimensions often entails some distortion, which is not necessarily easy to recognize but can lead users to misleading conclusions. Several Projection Quality Metrics (PQMs) have been developed as tools to quantify the goodness-of-fit of a DR projection; however, they mostly focus on measuring how well the projection captures the global or local structure of the data, without taking into account the visual distortion of the resulting plots, thus often ignoring the presence of outliers or artifacts that can mislead a visual analysis of the projection. In this work, we introduce the Warping Index (WI), a new metric for measuring the quality of DR projections onto the 2D plane, based on the assumption that the correct preservation of empty regions between points is of crucial importance towards a faithful visual representation of the data.
- [35] arXiv:2511.14436 [中文pdf, pdf, html, 其他]
-
标题: 在Lince中分析混合程序的多次模拟标题: Analyzing Many Simulations of Hybrid Programs in Lince评论: 在FMAS 2025会议论文集,arXiv:2511.13245期刊参考: EPTCS 436,2025,第88-95页主题: 计算机科学中的逻辑 (cs.LO) ; 计算工程、金融与科学 (cs.CE)
混合系统在医疗设备、基础设施系统和自动驾驶车辆等关键应用中被越来越多地使用。 Lince 是一个学术工具,使用带有微分方程的类似 C 的语言来指定和模拟此类系统。 本文介绍了最近的实验,这些实验为 Lince 增加了执行多个仿真变体并生成直方图的机制,这些直方图量化了给定属性成立的频率。 我们使用自适应巡航控制系统的变化形式来说明扩展后的 Lince。
Hybrid systems are increasingly used in critical applications such as medical devices, infrastructure systems, and autonomous vehicles. Lince is an academic tool for specifying and simulating such systems using a C-like language with differential equations. This paper presents recent experiments that enhance Lince with mechanisms for executing multiple simulation variants and generating histograms that quantify the frequency with which a given property holds. We illustrate our extended Lince using variations of an adaptive cruise control system.
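The mechanism lends itself to a simple Monte-Carlo loop; the sketch below (Python) mimics the "many simulation variants plus histogram" workflow with a hypothetical `simulate` and `property_holds`, loosely modeled on the cruise-control example, and is not Lince's actual implementation.

```python
import random
from collections import Counter

def property_frequency(simulate, property_holds, n_variants=1000):
    """Run randomized variants of a hybrid program and report how often a
    given property holds, binning the outcomes for a histogram."""
    outcomes = []
    for _ in range(n_variants):
        gap = random.uniform(5.0, 50.0)          # e.g., initial distance to lead car
        trace = simulate(gap)                    # numeric simulation of the program
        outcomes.append(property_holds(trace))   # e.g., "no collision occurs"
    hist = Counter(outcomes)                     # frequency over True/False
    return hist[True] / n_variants, hist
```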
- [36] arXiv:2511.14282 [中文pdf, pdf, html, 其他]
-
标题: 权重方差放大器在高稀疏性一次性剪枝中提高准确性标题: Weight Variance Amplifier Improves Accuracy in High-Sparsity One-Shot Pruning主题: 机器学习 (cs.LG) ; 人工智能 (cs.AI)
深度神经网络在视觉识别任务中表现出色,但其大量参数使其在实际应用中不够实用。 最近,一次性剪枝作为一种有效的策略出现,可以在不进行额外训练的情况下减少模型大小。 然而,使用标准目标函数训练的模型在激进剪枝后通常会显著降低准确率。 一些现有的剪枝鲁棒优化器,如SAM和CrAM,通过引导模型进入参数空间中更平坦的区域来减轻准确率下降,但它们不可避免地带来不可忽略的额外计算量。 我们提出了一种方差放大正则化器(VAR),在训练过程中有意增加模型参数的方差。 我们的研究发现了一个有趣的结论,即方差较高的参数表现出更强的剪枝鲁棒性。 VAR通过在权重分布中促进这种方差,从而缓解剪枝的不利影响。 我们进一步提供了其收敛行为的理论分析,并通过广泛的实验结果证明了VAR的优越剪枝鲁棒性。
Deep neural networks achieve outstanding performance in visual recognition tasks, yet their large number of parameters makes them less practical for real-world applications. Recently, one-shot pruning has emerged as an effective strategy for reducing model size without additional training. However, models trained with standard objective functions often suffer a significant drop in accuracy after aggressive pruning. Some existing pruning-robust optimizers, such as SAM, and CrAM, mitigate this accuracy drop by guiding the model toward flatter regions of the parameter space, but they inevitably incur non-negligible additional computations. We propose a Variance Amplifying Regularizer (VAR) that deliberately increases the variance of model parameters during training. Our study reveals an intriguing finding that parameters with higher variance exhibit greater pruning robustness. VAR exploits this property by promoting such variance in the weight distribution, thereby mitigating the adverse effects of pruning. We further provide a theoretical analysis of its convergence behavior, supported by extensive empirical results demonstrating the superior pruning robustness of VAR.
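A minimal sketch of a variance-amplifying term, assuming PyTorch; the exact form and weighting of the regularizer in the paper may differ.

```python
import torch

def var_regularized_loss(task_loss, model, lam=1e-4):
    # Subtracting the mean per-tensor weight variance means that minimizing
    # the total loss *increases* parameter variance, the property the
    # abstract links to one-shot pruning robustness.
    variances = [p.var() for p in model.parameters() if p.dim() > 1]
    return task_loss - lam * torch.stack(variances).mean()
```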
- [37] arXiv:2511.14675 [中文pdf, pdf, html, 其他]
-
标题: 关于紧支集上同调的一个$C_{\rm st}$猜想标题: Une conjecture $C_{\rm st}$ pour la cohomologie à support compact评论: 8页,法语。欢迎发表评论!主题: 代数几何 (math.AG)
设$\mathbf{B}$为法格斯-方丹曲线$Y_{\rm FF}$上的解析函数环。我们证明,加入$\log p$与$\log 2πi$的$p$进类似物后,$\mathbf{B}$在次数$\geq 1$的伽罗瓦上同调消失。关于$\mathbf{B}^+_{\rm dR}$的类似结果是众所周知的。这使得我们能够为$p$进解析簇的紧支集上同调提出$C_{\rm dR}$与$C_{\rm st}$类型的猜想。
Let $\mathbf{B}$ be the ring of analytic functions on the Fargues-Fontaine curve $Y_{\rm FF}$. We show that adding $p$-adic analogs of $\log p$ and $\log 2πi$ kills its Galois cohomology in degrees~$\geq 1$. The analogous result for $\mathbf{B}^+_{\rm dR}$ is folklore. This makes it possible to formulate $C_{\rm dR}$ and $C_{\rm st}$-type conjectures for compact support cohomology of $p$-adic analytic varieties.
- [38] arXiv:2511.14457 [中文pdf, pdf, html, 其他]
-
标题: 在ESP32上对OpenWiFiSync进行基准测试:迈向成本有效的无线时间同步标题: Benchmarking OpenWiFiSync on ESP32: Towards Cost-Effective Wireless Time Synchronization主题: 网络与互联网架构 (cs.NI)
移动设备的无线时间同步是许多工业4.0应用的关键使能技术,例如协调和同步任务或为机器学习或人工智能算法生成高精度时间戳。 然而,传统的有线时钟同步协议在无线环境中无法实现所需的性能,除非进行重大修改。 为了解决这一挑战,我们采用了参考广播基础设施同步协议,该协议利用了无线通信的广播特性,并且既非侵入性又符合标准。 我们在一个低成本的测试平台上实现了该协议,使用ESP32模块和商用Wi-Fi接入点进行了验证。 为了支持进一步的研究和开发,我们通过GitHub上的OpenWifiSync项目,以GNU通用公共许可证第3版将我们的实现作为开源软件发布。 我们的结果表明,使用节能且经济的硬件可以实现±30微秒以内的同步精度,使这种方法适用于各种应用场景。
Wireless time synchronization of mobile devices is a key enabler for numerous Industry 4.0 applications, such as coordinated and synchronized tasks or the generation of high-precision timestamps for machine learning or artificial intelligence algorithms. Traditional wireline clock synchronization protocols, however, cannot achieve the performance in wireless environments without significant modifications. To address this challenge, we make use of the Reference Broadcast Infrastructure Synchronization protocol, which leverages the broadcast nature of wireless communications and remains both non-invasive and standard-compliant. We implement and validate this protocol on a low-cost testbed using ESP32 modules and a commercial Wi-Fi access point. To support further research and development, we release our implementation as open-source software under the GNU General Public License Version 3 license via the OpenWifiSync project on GitHub. Our results demonstrate that synchronization accuracies within +/-30 microseconds are achievable using energy-efficient and affordable hardware, making this approach suitable for a wide range of use cases.
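Reference-broadcast schemes estimate clock offset from receive timestamps of a shared beacon rather than from round-trip messages; a minimal sketch of that core idea follows (plain Python, with hypothetical timestamp lists; the actual OpenWifiSync implementation on the ESP32 is of course more involved).

```python
def rbis_offset(local_rx_times, ref_rx_times):
    """Both the mobile node and the reference node timestamp the *same*
    Wi-Fi beacons; their clock offset is the averaged difference of the
    receive timestamps, with propagation delay largely cancelling for
    nearby receivers."""
    diffs = [a - b for a, b in zip(local_rx_times, ref_rx_times)]
    return sum(diffs) / len(diffs)

# Correcting the local clock:
# t_sync = t_local - rbis_offset(local_rx_times, ref_rx_times)
```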
- [39] arXiv:2511.14406 [中文pdf, pdf, html, 其他]
-
标题: 注意生命周期:评估针对联邦模型适应的后门攻击标题: Watch Out for the Lifespan: Evaluating Backdoor Attacks Against Federated Model Adaptation评论: 已接受于FPS 2025主题: 机器学习 (cs.LG) ; 密码学与安全 (cs.CR)
通过联邦学习(FL)进行大型模型适应,解决了广泛的应用场景,并由参数高效微调技术(如低秩适应LoRA)实现。 然而,这种分布式学习范式面临多种安全威胁,尤其是对其完整性的威胁,例如后门攻击,在某些客户的本地训练步骤中试图注入恶意行为。 我们首次分析了LoRA对FL中针对模型适应的最先进后门攻击的影响。 具体来说,我们关注后门生命周期,这是FL中的一个关键特性,其长短取决于攻击场景和攻击者有效注入后门的能力。 我们在实验中的一个关键发现是,对于最优注入的后门,当LoRA的秩较低时,攻击后的后门持续时间更长。 重要的是,我们的工作突显了针对FL的后门攻击评估问题,并有助于开发更稳健和公平的后门攻击评估方法,从而提高对关键FL系统的风险评估的可靠性。 我们的代码是公开的。
Large models adaptation through Federated Learning (FL) addresses a wide range of use cases and is enabled by Parameter-Efficient Fine-Tuning techniques such as Low-Rank Adaptation (LoRA). However, this distributed learning paradigm faces several security threats, particularly to its integrity, such as backdoor attacks that aim to inject malicious behavior during the local training steps of certain clients. We present the first analysis of the influence of LoRA on state-of-the-art backdoor attacks targeting model adaptation in FL. Specifically, we focus on backdoor lifespan, a critical characteristic in FL, that can vary depending on the attack scenario and the attacker's ability to effectively inject the backdoor. A key finding in our experiments is that for an optimally injected backdoor, the backdoor persistence after the attack is longer when the LoRA's rank is lower. Importantly, our work highlights evaluation issues of backdoor attacks against FL and contributes to the development of more robust and fair evaluations of backdoor attacks, enhancing the reliability of risk assessments for critical FL systems. Our code is publicly available.
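Backdoor lifespan admits a very simple operationalization; the sketch below is one plausible definition (rounds until the attack success rate decays below a threshold), offered as an assumption rather than the paper's exact metric.

```python
def backdoor_lifespan(asr_per_round, threshold=0.5):
    """asr_per_round: attack success rate measured at each FL round after
    the attack window ends. Lifespan = number of rounds until ASR decays
    below the threshold (or the full horizon if it never does)."""
    for t, asr in enumerate(asr_per_round):
        if asr < threshold:
            return t
    return len(asr_per_round)
```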
- [40] arXiv:2511.14672 [中文pdf, pdf, html, 其他]
-
标题: 通过箭图模空间和紧密分次的聚类散射图标题: Cluster scattering diagrams via quiver moduli and tight gradings评论: 37页,欢迎提出意见主题: 组合数学 (math.CO) ; 代数几何 (math.AG) ; 表示理论 (math.RT)
我们通过quiver表示的模空间以及最近发展的紧密分次(tight gradings)组合框架来研究秩2簇散射图。结合quiver理论与组合方法,我们证明并推广了Elgin--Reading--Stella提出的关于墙函数系数的结构性质与计数性质的一系列猜想。紧密分次的观点还为散射图的Weyl群对称性提供了一个新证明。
We study rank-2 cluster scattering diagrams through moduli spaces of quiver representations and a recently developed combinatorial framework of tight gradings. Combining quiver-theoretic and combinatorial methods, we prove and extend a collection of conjectures posed by Elgin--Reading--Stella concerning the structural and enumerative properties of the wall-function coefficients. The tight grading perspective also provides a new proof of the Weyl group symmetry of the scattering diagram.
- [41] arXiv:2511.14242 [中文pdf, pdf, 其他]
-
标题: TailCue:探索用于自动驾驶车辆交互的仿生机器人尾巴标题: TailCue: Exploring Animal-inspired Robotic Tail for Automated Vehicles Interaction主题: 人机交互 (cs.HC)
自动驾驶车辆(AVs)正逐渐成为我们日常生活的一部分。 然而,道路使用者与自动驾驶车辆之间的有效沟通仍然是一个重大挑战。 尽管已经开发了各种外部人机界面(eHMIs)以促进交互,但心理因素,如缺乏信任和情感信号不足,可能仍然会阻止用户在某些情况下自信地与自动驾驶车辆互动。 为弥补这一差距,我们提出了TailCue,探讨基于尾部的eHMIs如何影响用户与自动驾驶车辆的交互。 我们首先研究了机器人技术和动物学中尾部运动与情感表达之间的映射关系,并据此开发了一种运动-情感映射方案。 实现了一个物理机械尾巴,并根据我们的方案设计了特定的尾巴运动。 进行了一项在线视频用户研究,共有21名参与者。 我们的研究结果表明,尽管尾巴传达的意图情感并未被一致识别,但开放式反馈表明,尾巴运动需要与场景和线索相一致。 我们的结果突显了场景特定优化的必要性,以增强基于尾部的eHMIs。 未来的工作将优化尾巴运动策略,以在多样化的交互情境中最大化其效果。
Automated vehicles (AVs) are gradually becoming part of our daily lives. However, effective communication between road users and AVs remains a significant challenge. Although various external human-machine interfaces (eHMIs) have been developed to facilitate interactions, psychological factors, such as a lack of trust and inadequate emotional signaling, may still deter users from confidently engaging with AVs in certain contexts. To address this gap, we propose TailCue, an exploration of how tail-based eHMIs affect user interaction with AVs. We first investigated mappings between tail movements and emotional expressions from robotics and zoology, and accordingly developed a motion-emotion mapping scheme. A physical robotic tail was implemented, and specific tail motions were designed based on our scheme. An online, video-based user study with 21 participants was conducted. Our findings suggest that, although the intended emotions conveyed by the tail were not consistently recognized, open-ended feedback indicated that the tail motion needs to align with the scenarios and cues. Our result highlights the necessity of scenario-specific optimization to enhance tail-based eHMIs. Future work will refine tail movement strategies to maximize their effectiveness across diverse interaction contexts.
- [42] arXiv:2511.14166 [中文pdf, pdf, html, 其他]
-
标题: 选择性弱到强的泛化标题: Selective Weak-to-Strong Generalization评论: AAAI2025人工智能对齐专题研讨会主题: 计算与语言 (cs.CL) ; 人工智能 (cs.AI)
未来超人类模型将超越人类的能力,人类只能对超人类模型进行“弱”监督。 为了缓解模型对齐中高质量数据不足的问题,一些关于弱到强泛化(W2SG)的工作使用弱监督器微调一个强大的预训练模型,使其能够超越弱监督进行泛化。 然而,现有方法中不变地使用弱监督会暴露出鲁棒性问题,其中一部分弱标签对模型是有害的。 在本文中,我们提出了一种选择性 W2SG 框架,在不需要时避免使用弱监督。 我们训练了一个二分类器 P(IK) 来识别强模型可以回答的问题,并使用其自生成的标签进行对齐。 我们进一步通过图平滑方法精炼弱标签。 在三个基准测试上的大量实验表明,我们的方法始终优于竞争性基线。 进一步分析表明,P(IK) 可以跨任务和难度进行泛化,这表明选择性 W2SG 可以帮助超对齐。
Future superhuman models will surpass the ability of humans and humans will only be able to \textit{weakly} supervise superhuman models. To alleviate the issue of lacking high-quality data for model alignment, some works on weak-to-strong generalization (W2SG) finetune a strong pretrained model with a weak supervisor so that it can generalize beyond weak supervision. However, the invariable use of weak supervision in existing methods exposes issues in robustness, with a proportion of weak labels proving harmful to models. In this paper, we propose a selective W2SG framework to avoid using weak supervision when unnecessary. We train a binary classifier P(IK) to identify questions that a strong model can answer and use its self-generated labels for alignment. We further refine weak labels with a graph smoothing method. Extensive experiments on three benchmarks show that our method consistently outperforms competitive baselines. Further analyses show that P(IK) can generalize across tasks and difficulties, which indicates selective W2SG can help superalignment.
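The selection rule can be read as a simple gate; below is a minimal sketch in Python where `pik_classifier`, `strong_model.generate_label`, and the threshold `tau` are hypothetical stand-ins for the paper's components.

```python
def selective_label(question, strong_model, pik_classifier, weak_label, tau=0.5):
    # P(IK): probability that the strong model itself knows the answer.
    p_ik = pik_classifier(question)
    if p_ik > tau:
        # Trust the strong model's self-generated label: skip weak supervision.
        return strong_model.generate_label(question)
    # Otherwise fall back to the (graph-smoothed) weak label.
    return weak_label
```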
- [43] arXiv:2511.14368 [中文pdf, pdf, html, 其他]
-
标题: O3SLM:开放权重、开放数据和开放词汇草图语言模型标题: O3SLM: Open Weight, Open Data, and Open Vocabulary Sketch-Language Model评论: 被AAAI 2026接收主题: 计算机视觉与模式识别 (cs.CV) ; 计算与语言 (cs.CL) ; 机器学习 (cs.LG)
虽然大型视觉语言模型(LVLMs)在现实应用中越来越被部署,但它们解释抽象视觉输入的能力仍然有限。 具体来说,它们难以理解手绘草图,这种模态提供了一种直观的方式来表达难以用文字描述的概念。 我们确定主要瓶颈在于缺乏一个大规模的数据集,该数据集联合建模了草图、逼真图像和相应的自然语言指令。 为了解决这个问题,我们提出了两个关键贡献:(1) 一个新设计的大规模图像-草图-指令三元组数据集,旨在促进预训练和指令微调,以及 (2) O3SLM,一个在此数据集上训练的 LVLM。 在多个基于草图的任务上的全面评估:(a) 对象定位,(b) 计数,(c) 图像检索(即 SBIR 和细粒度 SBIR),以及 (d) 视觉问答 (VQA);同时结合三个现有的草图数据集,即 QuickDraw!、Sketchy 和 Tu Berlin,以及我们生成的 SketchVCL 数据集,结果显示 O3SLM 实现了最先进的性能,在草图理解和推理方面显著优于现有的 LVLMs。
While Large Vision Language Models (LVLMs) are increasingly deployed in real-world applications, their ability to interpret abstract visual inputs remains limited. Specifically, they struggle to comprehend hand-drawn sketches, a modality that offers an intuitive means of expressing concepts that are difficult to describe textually. We identify the primary bottleneck as the absence of a large-scale dataset that jointly models sketches, photorealistic images, and corresponding natural language instructions. To address this, we present two key contributions: (1) a new, large-scale dataset of image-sketch-instruction triplets designed to facilitate both pretraining and instruction tuning, and (2) O3SLM, an LVLM trained on this dataset. Comprehensive evaluations on multiple sketch-based tasks: (a) object localization, (b) counting, (c) image retrieval i.e., (SBIR and fine-grained SBIR), and (d) visual question answering (VQA); while incorporating the three existing sketch datasets, namely QuickDraw!, Sketchy, and Tu Berlin, along with our generated SketchVCL dataset, show that O3SLM achieves state-of-the-art performance, substantially outperforming existing LVLMs in sketch comprehension and reasoning.
- [44] arXiv:2509.17101 [中文pdf, pdf, html, 其他]
-
标题: 多用户连续孔径阵列系统的泛函WMMSE算法标题: Functional WMMSE Algorithm for Multiuser Continuous Aperture Array Systems评论: 6页,4图主题: 信号处理 (eess.SP)
在本文中,我们开发了一种泛函加权最小均方误差(WMMSE)算法,用于多用户连续孔径阵列(CAPA)系统中的下行波束成形,其中基站(BS)和用户都配备了CAPA。 我们首先给出了多用户CAPA系统可实现速率的闭式表达式,并基于此建立了最大化总速率与最小化加权均方误差(MSE)总和之间的等价性。 然后,我们采用正交基展开将所构建的泛函优化问题转换为参数优化问题。 通过推导参数优化问题的一阶最优性条件,并将其映射回泛函域,我们得到了所提出的泛函WMMSE算法的更新方程。 仿真结果表明,所提出的方法在总速率和计算复杂度方面均优于基于离散化的基准方法。
In this paper, we develop a functional weighted minimum mean-squared error (WMMSE) algorithm for downlink beamforming in multiuser continuous aperture array (CAPA) systems where both the base station (BS) and users are equipped with CAPAs. We first present a closed-form expression for the achievable rate in multiuser CAPA systems, based on which the equivalence between maximizing the sum rate and minimizing the sum of weighted mean-squared errors (MSE) is established. We then employ the orthonormal basis expansion to transform the formulated functional optimization problem into a parameter optimization problem. By deriving the first-order optimality conditions of the parameter optimization problem and mapping them back to the functional domain, we obtain the update equations of the proposed functional WMMSE algorithm. Simulation results show that the proposed method outperforms discretization-based baselines in both sum rate and computational complexity.
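For readers unfamiliar with the rate-WMMSE connection the abstract invokes, the classical discrete-antenna identity reads as follows; the display below is the standard form, while the paper's contribution is lifting it to functions over a continuous aperture.

```latex
% With the MMSE error e_k = (1 + \mathrm{SINR}_k)^{-1} under receiver u_k,
\max_{\{v_k\}} \sum_k \log\!\left(1+\mathrm{SINR}_k\right)
\;\Longleftrightarrow\;
\min_{\{v_k\},\{u_k\},\{w_k\}} \sum_k \left( w_k\, e_k(u_k, v_k) - \log w_k \right),
% whose inner minimizers are the MMSE receiver u_k^{\star} and the weight
% w_k^{\star} = e_k^{-1}; alternating these closed-form updates is the WMMSE loop.
```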
- [45] arXiv:2503.07155 [中文pdf, pdf, html, 其他]
-
标题: 频率上部分重叠的频率分集阵列OFDM发射系统标题: Frequency Diverse Array OFDM Transmit System with Partial Overlap in Frequency评论: 被JC&S 2026接受主题: 信号处理 (eess.SP)
频率分集阵列(FDA)是一种替代的阵列架构,其中每个天线前面有一个混频器,而不是相移器。 混频器在每个天线发射的信号之间引入频率偏移,导致时变的波束图。 然而,时变波束成形对于通信或感知来说并不理想。 在本文中,FDA与正交频分复用(OFDM)调制相结合。 所提出的波束成形方法将所有天线发送的OFDM符号划分为子载波块,这些子载波块携带相同的数据,但预编码方式不同。 天线之间的频率偏移等于子载波块的宽度。 因此,每个天线在中心频率上发送不同预编码的子载波块,导致块的重叠和相干叠加。 所提出的架构能够在单个块上实现完全数字波束成形,而只需要一个数模转换器。 在联合通信和感知的背景下研究了系统的性能和权衡。
A frequency-diverse array (FDA) is an alternative array architecture in which each antenna is preceded by a mixer instead of a phase shifter. The mixers introduce a frequency offset between signals transmitted by each antenna, resulting in a time-varying beam pattern. However, time-dependent beamforming is not desirable for communication or sensing. In this paper, the FDA is combined with orthogonal frequency-division multiplexing (OFDM) modulation. The proposed beamforming method partitions the OFDM symbol transmitted by all antennas into subcarrier blocks, which carry the same data but are precoded differently. The frequency offset between the antennas is equal to the subcarrier block width. Consequently, each antenna transmits a differently precoded subcarrier block at the center frequency, resulting in overlap and coherent summation of the blocks. The proposed architecture enables fully digital beamforming over a single block while requiring only a single digital-to-analog converter. The system's performance and tradeoffs are investigated in the context of joint communication and sensing.
- [46] arXiv:2511.14458 [中文pdf, pdf, 其他]
-
标题: 以机器人柔性内窥镜推进开放腔体中的微创精准手术标题: Advancing Minimally Invasive Precision Surgery in Open Cavities with Robotic Flexible Endoscopy主题: 机器人技术 (cs.RO)
柔性机器人在通过提供卓越的灵巧性、精确控制和安全的组织交互来增强微创手术(MIS)方面具有巨大潜力。 然而,将这些优势转化为开放腔体内的内窥镜干预仍然具有挑战性。 缺乏解剖约束以及此类设备的固有灵活性使得它们的控制变得复杂,而内窥镜的有限视野则限制了情境意识。 我们提出了一种机器人平台,旨在克服这些挑战,并展示了其在胎儿内窥镜激光凝固术中的潜力,这是一种复杂的MIS手术,通常仅由经验丰富的外科医生执行。 我们的系统结合了磁驱动的柔性内窥镜与远程操作和半自主导航能力,以进行靶向激光消融。 为了增强手术意识,该平台实时重建内窥镜场景的拼接图像,提供扩展且连续的视觉背景。 该系统解决开放空间中MIS关键局限性的能力在绵羊模型中进行了活体验证。
Flexible robots hold great promise for enhancing minimally invasive surgery (MIS) by providing superior dexterity, precise control, and safe tissue interaction. Yet, translating these advantages into endoscopic interventions within open cavities remains challenging. The lack of anatomical constraints and the inherent flexibility of such devices complicate their control, while the limited field of view of endoscopes restricts situational awareness. We present a robotic platform designed to overcome these challenges and demonstrate its potential in fetoscopic laser coagulation, a complex MIS procedure typically performed only by highly experienced surgeons. Our system combines a magnetically actuated flexible endoscope with teleoperated and semi-autonomous navigation capabilities for performing targeted laser ablations. To enhance surgical awareness, the platform reconstructs real-time mosaics of the endoscopic scene, providing an extended and continuous visual context. The ability of this system to address the key limitations of MIS in open spaces is validated in vivo in an ovine model.
- [47] arXiv:2511.14315 [中文pdf, pdf, html, 其他]
-
标题: Dental3R:从稀疏视图照片中进行口腔内三维重建的几何感知配对标题: Dental3R: Geometry-Aware Pairing for Intraoral 3D Reconstruction from Sparse-View Photographs主题: 计算机视觉与模式识别 (cs.CV)
口腔内三维重建是数字正畸学的基础,但口内扫描等传统方法对于依赖稀疏智能手机图像的远程正畸而言不可用。 虽然三维高斯点云(3DGS)在新视图合成方面显示出前景,但将其应用于标准临床三联征的未定位前牙和双侧颊部照片仍具有挑战性。 口腔内环境中常见的大视角基线、不一致的光照和镜面表面可能会导致同时姿态和几何估计的不稳定。 此外,稀疏视图的光度监督常常会引起频率偏差,导致过度平滑的重建,丢失关键的诊断细节。 为了解决这些限制,我们提出了Dental3R,一种无姿态、图引导的流水线,用于从稀疏的口腔内照片中进行鲁棒、高保真的重建。 我们的方法首先构建一个几何感知配对策略(GAPS),以智能选择高价值图像对的紧凑子图。 GAPS专注于对应匹配,从而提高几何初始化的稳定性并减少内存使用。 在恢复的姿态和点云基础上,我们使用小波正则化目标训练3DGS模型。 通过使用离散小波变换强制带限保真度,我们的方法在抑制高频伪影的同时保留了细小的釉质边界和邻面边缘。 我们在一个包含950个临床案例的大规模数据集和一个额外的195个案例的视频测试集上验证了我们的方法。 实验结果表明,Dental3R能够有效处理稀疏、未定位的输入,并在牙科咬合可视化的新视图合成质量方面优于最先进的方法。
Intraoral 3D reconstruction is fundamental to digital orthodontics, yet conventional methods like intraoral scanning are inaccessible for remote tele-orthodontics, which typically relies on sparse smartphone imagery. While 3D Gaussian Splatting (3DGS) shows promise for novel view synthesis, its application to the standard clinical triad of unposed anterior and bilateral buccal photographs is challenging. The large view baselines, inconsistent illumination, and specular surfaces common in intraoral settings can destabilize simultaneous pose and geometry estimation. Furthermore, sparse-view photometric supervision often induces a frequency bias, leading to over-smoothed reconstructions that lose critical diagnostic details. To address these limitations, we propose \textbf{Dental3R}, a pose-free, graph-guided pipeline for robust, high-fidelity reconstruction from sparse intraoral photographs. Our method first constructs a Geometry-Aware Pairing Strategy (GAPS) to intelligently select a compact subgraph of high-value image pairs. The GAPS focuses on correspondence matching, thereby improving the stability of the geometry initialization and reducing memory usage. Building on the recovered poses and point cloud, we train the 3DGS model with a wavelet-regularized objective. By enforcing band-limited fidelity using a discrete wavelet transform, our approach preserves fine enamel boundaries and interproximal edges while suppressing high-frequency artifacts. We validate our approach on a large-scale dataset of 950 clinical cases and an additional video-based test set of 195 cases. Experimental results demonstrate that Dental3R effectively handles sparse, unposed inputs and achieves superior novel view synthesis quality for dental occlusion visualization, outperforming state-of-the-art methods.
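A minimal sketch of a band-limited, DWT-based fidelity term in the spirit described above, assuming grayscale images and the PyWavelets package; the subband weighting is an illustrative choice, not Dental3R's exact objective.

```python
import numpy as np
import pywt

def wavelet_band_loss(render, target, wavelet="haar", level=3):
    """Compare DWT subbands of the rendered and target images, weighting
    coarse bands more so that fine edges are kept without over-fitting
    high-frequency noise. `render`/`target` are 2-D float arrays."""
    cr = pywt.wavedec2(render, wavelet, level=level)
    ct = pywt.wavedec2(target, wavelet, level=level)
    loss = np.abs(cr[0] - ct[0]).mean()            # approximation band
    weight = 1.0
    for (rh, rv, rd), (th, tv, td) in zip(cr[1:], ct[1:]):
        weight *= 0.5                              # damp finer detail bands
        for r, t in zip((rh, rv, rd), (th, tv, td)):
            loss += weight * np.abs(r - t).mean()
    return loss
```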
- [48] arXiv:2309.16458 [中文pdf, pdf, html, 其他]
-
标题: 代数群的同态概形标题: Hom schemes for algebraic groups评论: 34页,已接受发表于《代数与数论》主题: 代数几何 (math.AG) ; 数论 (math.NT)
在SGA3中,Demazure和Grothendieck证明了如果$G$和$H$是某个概形$S$上的光滑仿射群概形,并且$G$是约化的,那么$S$-同态$G \to H$的函子是可表示的。 在本文中,我们将这一结果扩展到$G$不是约化群的情况,并且证明要简单得多。 我们的结果特别适用于任意基上的抛物子群,并且在域上基本上是最优的。 我们还将Hom概形中的闭轨道与Serre的完全可约性理论联系起来,回答了Furter--Kraft的一个问题,并给出了许多例子。
In SGA3, Demazure and Grothendieck showed that if $G$ and $H$ are smooth affine group schemes over a scheme $S$ and $G$ is reductive, then the functor of $S$-homomorphisms $G \to H$ is representable. In this paper we extend this result to cover cases in which $G$ is not reductive, with much simpler proofs. Our results apply in particular to parabolics over any base, and they are essentially optimal over a field. We also relate the closed orbits in Hom schemes to Serre's theory of complete reducibility, answer a question of Furter--Kraft, and provide many examples.
- [49] arXiv:2511.13933 [中文pdf, pdf, html, 其他]
-
标题: 用于多载波认知无线电ISAC的堆叠智能超表面标题: Stacked Intelligent Metasurfaces for Multicarrier Cognitive Radio ISAC评论: 13页,14图主题: 信号处理 (eess.SP)
认知无线电(CR)和集成感知与通信(ISAC)的融合,通过堆叠智能超表面(SIMs)实现,为6G及以后的多功能可编程前端提供了一条有前景的路径。 在本文中,我们提出了一种新颖的CR-ISAC框架,该框架利用与次级基站(SB)集成的SIM来学习并实现最优波束图,同时(i)最小化用于定位次级用户设备(SU)的贝叶斯Cramér-Rao界(BCRB),以及(ii)限制对主用户设备(PUs)的平均干扰,使频谱效率损失被约束在至多几个百分点的退化以内。 我们提出了一种高效的基于交替优化的算法,以获得所有正交频分复用(OFDM)子载波的SIM最优端到端传输响应。 将分层SIM架构与深度神经网络进行类比,我们定义了一个波束图匹配损失,推导出反向传播的解析梯度,并使用小批量Adam优化器实现了基于学习的SIM系数优化。 提供了复杂度分析,并进行了广泛的数值实验以评估所提出的CR-ISAC框架。 结果表明,当SIM具有足够多的层时,所提出的SIM系数优化方法在SU的BCRB定位指标和PUs的平均频谱效率方面都能达到接近最优的性能,并且显著优于传统的单层可重构智能表面(RIS)设计。
The fusion of cognitive radio (CR) and integrated sensing and communication (ISAC), enabled by stacked intelligent metasurfaces (SIMs), offers a promising path for multi-functional programmable front ends in 6G and beyond. In this paper we propose a novel CR-ISAC framework that leverages an SIM integrated with the secondary base station (SB) to learn and realize optimal beampatterns that simultaneously (i) minimize the Bayesian Cramér-Rao bound (BCRB) for localizing a secondary user equipment (SU) and (ii) limit averaged interference at primary user equipments (PUs) so that spectral efficiency loss is constrained, with the target of at most a few percent degradation. We propose an efficient alternating optimization-based algorithm to obtain the optimal end-to-end transmission response of the SIM for all orthogonal frequency division multiplexing (OFDM) subcarriers. Drawing an analogy between the layered SIM architecture and deep neural networks, we define a beampattern- matching loss, derive analytical gradients for backpropagation, and implement a learning-based optimization of the SIM coefficients using a mini-batch Adam optimizer. A complexity analysis is provided, and extensive numerical experiments are performed to evaluate the proposed CR-ISAC framework. The results show that the proposed SIM coefficient optimization methods attain near-optimal performance in terms of both the SU BCRB localization metric and the PUs average spectral efficiency when the SIM has a sufficient number of layers, and they substantially outperform traditional single-layer reconfigurable intelligent surface (RIS) designs.
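The neural-network analogy suggests a direct implementation: treat each layer's cell phases as trainable tensors and descend a beampattern-matching loss with Adam. The sketch below (Python/PyTorch) assumes a differentiable `loss_fn` that wraps the BCRB surrogate and the PU-interference penalty; all names and the training schedule are placeholders.

```python
import torch

def train_sim_phases(n_layers, n_cells, loss_fn, steps=500, lr=1e-2):
    """One trainable phase tensor per SIM layer (the 'deep network' analogy);
    `loss_fn` maps the list of unit-modulus layer responses to a real scalar."""
    phases = [torch.zeros(n_cells, requires_grad=True) for _ in range(n_layers)]
    opt = torch.optim.Adam(phases, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        coeffs = [torch.exp(1j * p) for p in phases]   # unit-modulus cells
        loss = loss_fn(coeffs)                         # BCRB + interference terms
        loss.backward()                                # complex-valued autograd
        opt.step()
    return [torch.exp(1j * p.detach()) for p in phases]
```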
- [50] arXiv:2511.14529 [中文pdf, pdf, html, 其他]
-
标题: 基于5G NR信号的低空经济双阶段ISAC框架标题: A Two-Stage ISAC Framework for Low-Altitude Economy Based on 5G NR Signals主题: 信号处理 (eess.SP)
下一代无线网络的发展推动了低空经济(LAE)的蓬勃发展。 为了支持这一新兴领域,同时保持与现有网络架构的兼容性,基于5G新空口(NR)信号的集成感知与通信(ISAC)被视为一种有前景的解决方案。 然而,仅利用标准5G NR信号,如同步信号块(SSB),在感知分辨率方面存在根本性的限制。 为了解决这个问题,本文提出了一种两阶段的粗到精感知框架,该框架利用经过自定义设计的稀疏导频结构(SPS)增强的标准5G NR初始接入信号,以实现高精度的无人机(UAV)感知。 在第一阶段,我们首先融合来自SSB、Type#0-PDCCH和系统信息块1(SIB1)的信息,以确保初始目标检测。 在第二阶段,引入了一种精化的估计算法,以克服这些信号的分辨率限制。 受稀疏阵列理论的启发,本阶段采用了一种新颖的SPS,该SPS被插入到CORESET#0带宽内的资源块(RBs)中。 为了准确地从这些稀疏导频中提取离格的距离与速度参数,我们开发了一种基于加权展开相位(WUP)技术和基于RELAX的迭代方法的相应高分辨率算法。 最后,采用基于密度的空间聚类算法(DBSCAN)来修剪由于波束重叠引起的冗余检测。 全面的仿真结果表明,所提出的框架在估计精度和计算效率方面优于其他技术。
The evolution of next-generation wireless networks has spurred the vigorous development of the low-altitude economy (LAE). To support this emerging field while remaining compatible with existing network architectures, integrated sensing and communication (ISAC) based on 5G New Radio (NR) signals is regarded as a promising solution. However, merely leveraging standard 5G NR signals, such as the Synchronization Signal Block (SSB), presents fundamental limitations in sensing resolution. To address the issue, this paper proposes a two-stage coarse-to-fine sensing framework that utilizes standard 5G NR initial access signals augmented by a custom-designed sparse pilot structure (SPS) for high-precision unmanned aerial vehicle (UAV) sensing. In Stage I, we first fuse information from the SSB, Type\#0-PDCCH, and system information block 1 (SIB1) to ensure the initial target detection. In Stage II, a refined estimation algorithm is introduced to overcome the resolution limitations of these signals. Inspired by the sparse array theory, this stage employs a novel SPS, which is inserted into resource blocks (RBs) within the CORESET\#0 bandwidth. To accurately extract the off-grid range and velocity parameters from these sparse pilots, we develop a corresponding high-resolution algorithm based on the weighted unwrapped phase (WUP) technique and the RELAX-based iterative method. Finally, the density-based spatial clustering of applications with noise (DBSCAN) algorithm is adopted to prune the redundant detections arising from beam overlap. Comprehensive simulation results demonstrate the superior estimation accuracy and computational efficiency of the proposed framework in comparison to other techniques.
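The final pruning step maps directly onto an off-the-shelf clustering call; a minimal sketch with scikit-learn follows, where the (range, velocity) detection list, `eps`, and `min_samples` are illustrative values.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def prune_detections(detections, eps=2.0, min_samples=2):
    """Cluster raw (range, velocity) estimates produced by overlapping beams
    and keep one centroid per cluster; label -1 marks DBSCAN noise points."""
    X = np.asarray(detections)                 # shape (N, 2): range, velocity
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    kept = [X[labels == k].mean(axis=0) for k in set(labels) if k != -1]
    noise = X[labels == -1]                    # singletons: inspect separately
    return np.array(kept), noise
```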
- [51] arXiv:2511.14197 [中文pdf, pdf, html, 其他]
-
标题: 基于对数据集级平均精度的边际贡献的在线目标检测数据整理标题: Online Data Curation for Object Detection via Marginal Contributions to Dataset-level Average Precision评论: 预印本版本,正在审稿中主题: 计算机视觉与模式识别 (cs.CV)
高质量数据已成为规模定律下进步的主要驱动力,经过筛选的数据集通常以较低成本优于更大但未过滤的数据集。在线数据筛选通过根据模型的演变状态动态选择训练样本,扩展了这一理念。尽管在分类和多模态学习中有效,现有的在线采样策略很少扩展到目标检测,因为其结构复杂性和领域差距。我们引入了DetGain,这是一种专为目标检测设计的在线数据筛选方法,它基于预测质量估计每张图像对数据集级平均精度(AP)的边际扰动。通过建模全局得分分布,DetGain高效估计全局AP变化,并计算教师-学生贡献差距,在每次迭代中选择信息丰富的样本。该方法与架构无关且侵入性最小,能够简便地集成到各种目标检测架构中。在COCO数据集上使用多个代表性检测器进行的实验显示了准确率的一致提升。DetGain在低质量数据下也表现出强大的鲁棒性,并可以有效地与知识蒸馏技术结合以进一步提升性能,突显了其作为数据高效目标检测的通用且补充策略的潜力。
High-quality data has become a primary driver of progress under scale laws, with curated datasets often outperforming much larger unfiltered ones at lower cost. Online data curation extends this idea by dynamically selecting training samples based on the model's evolving state. While effective in classification and multimodal learning, existing online sampling strategies rarely extend to object detection because of its structural complexity and domain gaps. We introduce DetGain, an online data curation method specifically for object detection that estimates the marginal perturbation of each image to dataset-level Average Precision (AP) based on its prediction quality. By modeling global score distributions, DetGain efficiently estimates the global AP change and computes teacher-student contribution gaps to select informative samples at each iteration. The method is architecture-agnostic and minimally intrusive, enabling straightforward integration into diverse object detection architectures. Experiments on the COCO dataset with multiple representative detectors show consistent improvements in accuracy. DetGain also demonstrates strong robustness under low-quality data and can be effectively combined with knowledge distillation techniques to further enhance performance, highlighting its potential as a general and complementary strategy for data-efficient object detection.
- [52] arXiv:2511.14410 [中文pdf, pdf, html, 其他]
-
标题: TTA:跨语言语音表示的转录、翻译和对齐标题: TTA: Transcribe, Translate and Alignment for Cross-lingual Speech Representation评论: 提交至ICASSP2026主题: 音频与语音处理 (eess.AS)
语音-大语言模型(LLM)在多模态和多任务语音理解方面表现出色。 一种典型的语音-大语言模型范式是将语音模态与大语言模型(LLM)相结合。 尽管之前的研究经常采用Whisper编码器作为语音输入,但它在输入格式、模型规模和语义性能方面存在局限性。 为此,我们提出了一种专注于语音语义的轻量级TTA模型,以实现更有效的LLM集成。 通过在多语言语音识别(ASR)、语音翻译(ST)和语音-文本对齐任务上进行358000小时语音数据的大规模训练,TTA能够生成稳健的跨语言语音表示。 在包括ASR/ST、语音检索和ASR-LLM性能评估在内的多种基准测试中进行的广泛评估表明,TTA优于Whisper。 此外,我们严格验证了跨语言能力与ASR/ST性能之间的相互作用。 TTA的模型权重和训练方案将作为音频理解工具包Auden的一部分发布。
Speech-LLM models have demonstrated great performance in multi-modal and multi-task speech understanding. A typical speech-LLM paradigm is integrating speech modality with a large language model (LLM). While the Whisper encoder was frequently adopted in previous studies for speech input, it shows limitations regarding input format, model scale, and semantic performance. To this end, we propose a lightweight TTA model specialized in speech semantics for more effective LLM integration. With large-scale training of 358k hours of speech data on multilingual speech recognition (ASR), speech translation (ST) and speech-text alignment tasks, TTA is capable of producing robust cross-lingual speech representations. Extensive evaluations across diverse benchmarks, including ASR/ST, speech retrieval, and ASR-LLM performance assessments, demonstrate TTA's superiority over Whisper. Furthermore, we rigorously validate the interplay between cross-lingual capabilities and ASR/ST performance. The model weights and training recipes of TTA will be released as part of an audio understanding toolkit Auden.
- [53] arXiv:2511.14749 [中文pdf, pdf, html, 其他]
-
标题: 视觉大语言模型在参与度分析中是良好的噪声处理者标题: Vision Large Language Models Are Good Noise Handlers in Engagement Analysis主题: 计算机视觉与模式识别 (cs.CV)
视频数据集中的参与度识别,与传统的图像分类任务不同,特别受到主观标签和噪声的挑战,限制了模型性能。 为了克服主观和噪声参与度标签的挑战,我们提出了一种框架,利用视觉大语言模型(VLMs)来精炼标注并指导训练过程。 我们的框架使用问卷提取行为线索,并将数据分为高可靠性和低可靠性子集。 我们还引入了一种结合课程学习和软标签精炼的训练策略,在逐步纳入模糊样本的同时调整监督以反映不确定性。 我们证明了在精炼的高可靠性子集上训练的经典计算机视觉模型,结合我们的课程策略后表现出改进,突显了使用VLMs解决标签主观性的优势。 该方法在参与度基准如EngageNet(六种特征设置中的三种,最大提升+1.21%)以及DREAMS / PAFE上超过了之前最先进的方法,F1得分分别提升了+0.22 / +0.06。
Engagement recognition in video datasets, unlike traditional image classification tasks, is particularly challenged by subjective labels and noise limiting model performance. To overcome the challenges of subjective and noisy engagement labels, we propose a framework leveraging Vision Large Language Models (VLMs) to refine annotations and guide the training process. Our framework uses a questionnaire to extract behavioral cues and split data into high- and low-reliability subsets. We also introduce a training strategy combining curriculum learning with soft label refinement, gradually incorporating ambiguous samples while adjusting supervision to reflect uncertainty. We demonstrate that classical computer vision models trained on refined high-reliability subsets and enhanced with our curriculum strategy show improvements, highlighting benefits of addressing label subjectivity with VLMs. This method surpasses prior state of the art across engagement benchmarks such as EngageNet (three of six feature settings, maximum improvement of +1.21%), and DREAMS / PAFE with F1 gains of +0.22 / +0.06.
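One plausible reading of the curriculum-plus-soft-label strategy is sketched below in Python; the ramp schedule, smoothing amount, and data layout are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np

def curriculum_epoch(high_rel, low_rel, epoch, total_epochs, soften=0.2):
    """high_rel / low_rel: lists of (sample, one_hot_label) pairs from the
    VLM reliability split. Ambiguous samples are mixed in gradually and
    given softened labels to reflect annotation uncertainty."""
    frac = min(1.0, epoch / (0.5 * total_epochs))   # ramp over first half
    n_low = int(frac * len(low_rel))
    data = []
    for sample, y in high_rel:
        data.append((sample, np.asarray(y, dtype=float)))       # trusted
    for sample, y in low_rel[:n_low]:
        y = np.asarray(y, dtype=float)
        data.append((sample, (1.0 - soften) * y + soften / y.size))  # soft
    return data
```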
- [54] arXiv:2511.14691 [中文pdf, pdf, html, 其他]
-
标题: 通过突触可塑性实现的注意力就是你所需要的一切:一种生物启发的脉冲类脑变换器标题: Attention via Synaptic Plasticity is All You Need: A Biologically Inspired Spiking Neuromorphic Transformer评论: 21页,5图,3表主题: 神经与进化计算 (cs.NE) ; 人工智能 (cs.AI) ; 计算机视觉与模式识别 (cs.CV) ; 新兴技术 (cs.ET) ; 机器学习 (stat.ML)
注意力是大脑选择性地关注少数特定方面而忽略无关方面的能力。 这一生物原理启发了现代Transformer中的注意力机制。 Transformer现在支撑着像GPT这样的大型语言模型(LLM),但代价是巨大的训练和推理能耗,导致较大的碳足迹。 虽然大脑的注意力源于神经回路,但Transformer注意力依赖于点积相似性来加权输入序列中的元素。 类脑计算,特别是脉冲神经网络(SNNs),为高效能智能提供了一条受大脑启发的路径。 尽管已有基于注意力的脉冲Transformer的研究,但核心注意力层仍然不是类脑的。 当前的脉冲注意力(i)依赖于适合浮点运算的点积或逐元素相似性,而不是事件驱动的脉冲;(ii)保留注意力矩阵,受到冯·诺依曼瓶颈的影响,限制了内存内计算;并且(iii)仍然与类脑计算存在差距。 为了解决这些问题,我们提出了脉冲STDP Transformer (S$^{2}$TDPT),这是一种通过脉冲时间依赖可塑性(STDP)实现自注意力的类脑Transformer,将查询-键相关性嵌入突触权重中。 STDP是大脑中记忆和学习的核心机制,在类脑设备中被广泛研究,自然地实现了内存内计算并支持非冯·诺依曼硬件。 在CIFAR-10和CIFAR-100上,我们的模型仅用四个时间步就达到了94.35%和78.08%的准确率,在CIFAR-100上耗能0.49 mJ,相比标准ANN Transformer能耗降低了88.47%。 Grad-CAM显示该模型关注语义相关的区域,增强了可解释性。 总体而言,S$^{2}$TDPT展示了如何通过生物启发的注意力实现节能、硬件友好且可解释的类脑模型。
Attention is the brain's ability to selectively focus on a few specific aspects while ignoring irrelevant ones. This biological principle inspired the attention mechanism in modern Transformers. Transformers now underpin large language models (LLMs) such as GPT, but at the cost of massive training and inference energy, leading to a large carbon footprint. While brain attention emerges from neural circuits, Transformer attention relies on dot-product similarity to weight elements in the input sequence. Neuromorphic computing, especially spiking neural networks (SNNs), offers a brain-inspired path to energy-efficient intelligence. Despite recent work on attention-based spiking Transformers, the core attention layer remains non-neuromorphic. Current spiking attention (i) relies on dot-product or element-wise similarity suited to floating-point operations, not event-driven spikes; (ii) keeps attention matrices that suffer from the von Neumann bottleneck, limiting in-memory computing; and (iii) still diverges from brain-like computation. To address these issues, we propose the Spiking STDP Transformer (S$^{2}$TDPT), a neuromorphic Transformer that implements self-attention through spike-timing-dependent plasticity (STDP), embedding query--key correlations in synaptic weights. STDP, a core mechanism of memory and learning in the brain and widely studied in neuromorphic devices, naturally enables in-memory computing and supports non-von Neumann hardware. On CIFAR-10 and CIFAR-100, our model achieves 94.35\% and 78.08\% accuracy with only four timesteps and 0.49 mJ on CIFAR-100, an 88.47\% energy reduction compared to a standard ANN Transformer. Grad-CAM shows that the model attends to semantically relevant regions, enhancing interpretability. Overall, S$^{2}$TDPT illustrates how biologically inspired attention can yield energy-efficient, hardware-friendly, and explainable neuromorphic models.
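For concreteness, the classic pairwise STDP kernel that such an attention mechanism builds on can be written in a few lines (NumPy). How S^2TDPT maps spike-time differences between query and key neurons onto synaptic "attention" weights is the paper's design; the constants here are generic textbook values.

```python
import numpy as np

def stdp_weight(dt, a_plus=0.05, a_minus=0.05, tau=20.0):
    """Potentiate when the presynaptic spike precedes the postsynaptic
    spike (dt > 0), depress otherwise; dt is in milliseconds."""
    return np.where(dt > 0, a_plus * np.exp(-dt / tau),
                    -a_minus * np.exp(dt / tau))

# Spike-time differences between a query spike train and key spikes:
dts = np.array([4.0, -3.0, 15.0])
print(stdp_weight(dts))   # approx [+0.0409, -0.0430, +0.0236]
```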
- [55] arXiv:2511.07185 [中文pdf, pdf, html, 其他]
-
标题: 使用紧凑麦克风阵列的神经方向滤波标题: Neural Directional Filtering Using a Compact Microphone Array主题: 音频与语音处理 (eess.AS)
使用紧凑麦克风阵列实现所需的定向模式在许多音频应用中是必不可少的。 传统波束成形器可实现的定向模式取决于麦克风的数量和阵列孔径。 通常,它们在紧凑阵列中的效果会下降。 为了克服这些限制,我们提出了一种神经方向过滤(NDF)方法,该方法利用深度神经网络来实现具有预定义定向模式的声音捕获。 NDF从麦克风阵列信号中计算出一个单通道复数掩码,然后将其应用于参考麦克风以产生一个输出,该输出近似于具有所需定向模式的虚拟方向性麦克风。 我们介绍了训练策略,并提出了数据相关的度量标准来评估定向模式和定向因子。 我们表明,所提出的方法:i)即使在空间混叠频率以上也能实现频率不变的定向模式,ii)可以近似多种和高阶模式,iii)可以将模式引导到不同方向,iv)能推广到未见过的条件。 最后,实验比较显示了优于传统波束成形和参数化方法的性能。
Beamforming with desired directivity patterns using compact microphone arrays is essential in many audio applications. Directivity patterns achievable using traditional beamformers depend on the number of microphones and the array aperture. Generally, their effectiveness degrades for compact arrays. To overcome these limitations, we propose a neural directional filtering (NDF) approach that leverages deep neural networks to enable sound capture with a predefined directivity pattern. The NDF computes a single-channel complex mask from the microphone array signals, which is then applied to a reference microphone to produce an output that approximates a virtual directional microphone with the desired directivity pattern. We introduce training strategies and propose data-dependent metrics to evaluate the directivity pattern and directivity factor. We show that the proposed method: i) achieves a frequency-invariant directivity pattern even above the spatial aliasing frequency, ii) can approximate diverse and higher-order patterns, iii) can steer the pattern in different directions, and iv) generalizes to unseen conditions. Lastly, experimental comparisons demonstrate superior performance over conventional beamforming and parametric approaches.
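Inference in such a system reduces to masking the reference channel's STFT; a minimal sketch with SciPy follows, in which `mask_fn` stands in for the trained DNN.

```python
import numpy as np
from scipy.signal import stft, istft

def apply_ndf_mask(mic_signals, mask_fn, fs=16000, nperseg=512, ref=0):
    """mic_signals: list of equal-length 1-D arrays, one per microphone.
    `mask_fn` must return a complex (freq, time) mask estimated from the
    stacked multichannel STFTs; applying it to the reference channel
    approximates the desired virtual directional microphone."""
    specs = np.stack([stft(x, fs=fs, nperseg=nperseg)[2] for x in mic_signals])
    mask = mask_fn(specs)                    # (freq, time) complex mask
    filtered = mask * specs[ref]             # mask the reference channel
    _, out = istft(filtered, fs=fs, nperseg=nperseg)
    return out
```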
- [56] arXiv:2503.21071 [中文pdf, pdf, html, 其他]
-
标题: 用随机后处理净化近似差分隐私标题: Purifying Approximate Differential Privacy with Randomized Post-processing主题: 密码学与安全 (cs.CR) ; 机器学习 (cs.LG) ; 机器学习 (stat.ML)
We propose a framework to convert $(\varepsilon, δ)$-approximate Differential Privacy (DP) mechanisms into $(\varepsilon', 0)$-pure DP mechanisms under certain conditions, a process we call ``purification.'' This algorithmic technique leverages randomized post-processing with calibrated noise to eliminate the $δ$ parameter while achieving near-optimal privacy-utility tradeoff for pure DP. It enables a new design strategy for pure DP algorithms: first run an approximate DP algorithm with certain conditions, and then purify. This approach allows one to leverage techniques such as strong composition and propose-test-release that require $δ>0$ in designing pure-DP methods with $δ=0$. We apply this framework in various settings, including Differentially Private Empirical Risk Minimization (DP-ERM), stability-based release, and query release tasks. To the best of our knowledge, this is the first work with a statistically and computationally efficient reduction from approximate DP to pure DP. Finally, we illustrate the use of this reduction for proving lower bounds under approximate DP constraints with explicit dependence in $δ$, avoiding the sophisticated fingerprinting code construction.
新提交(显示 56 个条目中的 56 个)
- [57] arXiv:2511.14109 (交叉列表自 cs.CV) [中文pdf, pdf, html, 其他]
-
标题: $A^2$GC:用于局部聚合描述符的非对称聚合与几何约束标题: $A^2$GC: $A$symmetric $A$ggregation with Geometric Constraints for Locally Aggregated Descriptors评论: 8页,4图主题: 计算机视觉与模式识别 (cs.CV)
视觉位置识别(VPR)旨在使用视觉线索将查询图像与数据库进行匹配。最先进的方法从深度主干网络中聚合特征以形成全局描述符。基于最优传输的聚合方法将特征到聚类的分配重新表述为一个传输问题,但标准的Sinkhorn算法对源和目标边缘分布进行对称处理,在图像特征和聚类中心表现出显著不同的分布时,会限制效果。我们提出了一种带有几何约束的非对称聚合VPR方法,称为$A^2$GC-VPR。我们的方法采用行列归一化平均与独立边缘校准,实现适应视觉位置识别中分布差异的非对称匹配。通过可学习的坐标嵌入引入几何约束,计算与特征相似性融合的兼容性得分,从而促进空间邻近的特征进入同一聚类并增强空间感知能力。在MSLS、NordLand和Pittsburgh数据集上的实验结果表明了方法的优越性能,验证了该方法在提高匹配准确性和鲁棒性方面的有效性。
Visual Place Recognition (VPR) aims to match query images against a database using visual cues. State-of-the-art methods aggregate features from deep backbones to form global descriptors. Optimal transport-based aggregation methods reformulate feature-to-cluster assignment as a transport problem, but the standard Sinkhorn algorithm symmetrically treats source and target marginals, limiting effectiveness when image features and cluster centers exhibit substantially different distributions. We propose an asymmetric aggregation VPR method with geometric constraints for locally aggregated descriptors, called $A^2$GC-VPR. Our method employs row-column normalization averaging with separate marginal calibration, enabling asymmetric matching that adapts to distributional discrepancies in visual place recognition. Geometric constraints are incorporated through learnable coordinate embeddings, computing compatibility scores fused with feature similarities, thereby promoting spatially proximal features to the same cluster and enhancing spatial awareness. Experimental results on MSLS, NordLand, and Pittsburgh datasets demonstrate superior performance, validating the effectiveness of our approach in improving matching accuracy and robustness.
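A rough sketch of "row-column normalization averaging" as the abstract describes it: normalize the similarity matrix along each axis against its own marginal and average the two, rather than iterating symmetric Sinkhorn projections to a single coupling. The temperature, marginal estimates, and any iteration are assumptions and may differ from the paper.

```python
import numpy as np

def asymmetric_assignment(sim, row_marginal=None, col_marginal=None):
    """sim: (n_features, n_clusters) similarity matrix. Each axis is
    calibrated separately, allowing different feature/cluster marginals."""
    n, k = sim.shape
    r = np.full(n, 1.0 / n) if row_marginal is None else row_marginal
    c = np.full(k, 1.0 / k) if col_marginal is None else col_marginal
    P = np.exp(sim - sim.max())                           # stabilized kernel
    row = P / P.sum(axis=1, keepdims=True) * r[:, None]   # rows sum to r_i
    col = P / P.sum(axis=0, keepdims=True) * c[None, :]   # cols sum to c_j
    return 0.5 * (row + col)                              # averaged plan
```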
- [58] arXiv:2511.14107 (交叉列表自 cs.CV) [中文pdf, pdf, html, 其他]
-
标题: RTS-Mono:一种用于实际部署的实时自监督单目深度估计方法标题: RTS-Mono: A Real-Time Self-Supervised Monocular Depth Estimation Method for Real-World Deployment评论: 14页,10图主题: 计算机视觉与模式识别 (cs.CV)
深度信息对于自动驾驶和智能机器人导航至关重要。 自监督单目深度估计的简单性和灵活性有助于其在这些领域中发挥作用。 然而,大多数现有的单目深度估计模型消耗大量计算资源。 尽管一些方法已经减少了模型的大小并提高了计算效率,但性能下降严重阻碍了自监督单目深度估计模型在现实世界中的部署。 为了解决这个问题,我们提出了一种实时自监督单目深度估计方法,并在现实世界中进行了实现。 它被称为RTS-Mono,是一种轻量级且高效的编码器-解码器架构。 编码器基于Lite-Encoder,解码器设计了一个多尺度稀疏融合框架,以最小化冗余,确保性能并提高推理速度。 在基于KITTI数据集的实验中,RTS-Mono在高分辨率和低分辨率下都达到了最先进的(SoTA)性能,参数量极低(3 M)。 与轻量级方法相比,RTS-Mono在低分辨率下Abs Rel和Sq Rel分别提高了5.6%和9.8%,在高分辨率下Sq Rel和RMSE分别提高了6.1%和1.9%。 在现实世界部署实验中,RTS-Mono具有极高的准确性,并且可以在Nvidia Jetson Orin上以49 FPS的速度进行实时推理。 源代码可在https://github.com/ZYCheng777/RTS-Mono获取。
Depth information is crucial for autonomous driving and intelligent robot navigation. The simplicity and flexibility of self-supervised monocular depth estimation are conducive to its role in these fields. However, most existing monocular depth estimation models consume many computing resources. Although some methods have reduced the model's size and improved computing efficiency, the performance deteriorates, seriously hindering the real-world deployment of self-supervised monocular depth estimation models in the real world. To address this problem, we proposed a real-time self-supervised monocular depth estimation method and implemented it in the real world. It is called RTS-Mono, which is a lightweight and efficient encoder-decoder architecture. The encoder is based on Lite-Encoder, and the decoder is designed with a multi-scale sparse fusion framework to minimize redundancy, ensure performance, and improve inference speed. RTS-Mono achieved state-of-the-art (SoTA) performance in high and low resolutions with extremely low parameter counts (3 M) in experiments based on the KITTI dataset. Compared with lightweight methods, RTS-Mono improved Abs Rel and Sq Rel by 5.6% and 9.8% at low resolution and improved Sq Rel and RMSE by 6.1% and 1.9% at high resolution. In real-world deployment experiments, RTS-Mono has extremely high accuracy and can perform real-time inference on Nvidia Jetson Orin at a speed of 49 FPS. Source code is available at https://github.com/ZYCheng777/RTS-Mono.
- [59] arXiv:2511.13789 (交叉列表自 cs.CR) [中文pdf, pdf, html, 其他]
-
标题: 揭示和对齐异常注意力头以防御NLP后门攻击标题: Uncovering and Aligning Anomalous Attention Heads to Defend Against NLP Backdoor Attacks主题: 密码学与安全 (cs.CR) ; 人工智能 (cs.AI)
后门攻击对大型语言模型(LLMs)的安全性构成严重威胁,导致它们在特定触发条件下的行为出现异常。 后门触发器的设计已从固定触发器演变为动态或隐式触发器。 这种触发器设计的灵活性使得防御者难以准确识别其具体形式。 现有的大多数后门防御方法仅限于特定类型的触发器,或者依赖于额外的干净模型来提供支持。 为了解决这个问题,我们提出了一种基于注意力相似性的后门检测方法,使后门检测无需事先了解触发器。 我们的研究表明,受到后门攻击的模型在暴露于触发器时,注意力头之间表现出异常高的相似性。 基于这一观察,我们提出了一种结合逐头微调的注意力安全对齐方法,以修正可能被污染的注意力头,从而有效缓解后门攻击的影响。 大量实验结果表明,我们的方法显著降低了后门攻击的成功率,同时保持了模型在下游任务上的性能。
Backdoor attacks pose a serious threat to the security of large language models (LLMs), causing them to exhibit anomalous behavior under specific trigger conditions. The design of backdoor triggers has evolved from fixed triggers to dynamic or implicit triggers. This increased flexibility in trigger design makes it challenging for defenders to identify their specific forms accurately. Most existing backdoor defense methods are limited to specific types of triggers or rely on an additional clean model for support. To address this issue, we propose a backdoor detection method based on attention similarity, enabling backdoor detection without prior knowledge of the trigger. Our study reveals that models subjected to backdoor attacks exhibit unusually high similarity among attention heads when exposed to triggers. Based on this observation, we propose an attention safety alignment approach combined with head-wise fine-tuning to rectify potentially contaminated attention heads, thereby effectively mitigating the impact of backdoor attacks. Extensive experimental results demonstrate that our method significantly reduces the success rate of backdoor attacks while preserving the model's performance on downstream tasks.
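The detection signal can be computed without any clean reference model; the sketch below (PyTorch) scores one layer's heads by their mean pairwise cosine similarity, with the anomaly threshold left to calibration.

```python
import torch
import torch.nn.functional as F

def head_similarity_score(attn):
    """attn: (num_heads, seq_len, seq_len) attention maps from one layer.
    Returns the mean pairwise cosine similarity across heads; unusually
    high values on a given input are the reported backdoor-trigger signal."""
    flat = F.normalize(attn.flatten(1), dim=1)   # (H, L*L), unit-norm rows
    sim = flat @ flat.t()                        # head-to-head cosine matrix
    h = sim.shape[0]
    off_diag = sim.sum() - sim.diagonal().sum()
    return off_diag / (h * (h - 1))              # mean off-diagonal value
```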
- [60] arXiv:2511.14023 (交叉列表自 cs.AI) [中文pdf, pdf, html, 其他]
-
标题: Syn-STARTS:用于可扩展大语言模型评估的合成START分诊场景生成框架标题: Syn-STARTS: Synthesized START Triage Scenario Generation Framework for Scalable LLM Evaluation评论: 介绍一个开放数据集主题: 人工智能 (cs.AI)
分诊是大规模伤亡事件(MCIs)中一个至关重要的决策过程,旨在最大化伤者生存率。 尽管人工智能在此类情况下利用有限资源和时间做出最佳决策的作用正日益受到关注,但其开发和性能评估需要数量和质量都足够的基准数据集。 然而,MCIs发生频率较低,现场难以积累足够的记录,这使得收集大规模真实世界数据用于研究变得具有挑战性。 因此,我们开发了Syn-STARTS,一个使用大语言模型生成分诊案例的框架,并验证了其有效性。 结果表明,由Syn-STARTS生成的分诊案例在质量上与通过人工整理训练材料生成的TRIAGE公开数据集无法区分。 此外,当使用标准分诊方法START定义的绿色、黄色、红色和黑色类别中的数百个案例来评估大语言模型的准确性时,结果表现出高度的稳定性。 这有力地表明了合成数据在为严重和危及生命的医疗场景开发高性能人工智能模型方面的潜力。
Triage is a critically important decision-making process in mass casualty incidents (MCIs) to maximize victim survival rates. While the role of AI in such situations is gaining attention for making optimal decisions within limited resources and time, its development and performance evaluation require benchmark datasets of sufficient quantity and quality. However, MCIs occur infrequently, and sufficient records are difficult to accumulate at the scene, making it challenging to collect large-scale real-world data for research use. Therefore, we developed Syn-STARTS, a framework that uses LLMs to generate triage cases, and verified its effectiveness. The results showed that the triage cases generated by Syn-STARTS were qualitatively indistinguishable from the TRIAGE open dataset generated by manual curation from training materials. Furthermore, when evaluating the LLM accuracy using hundreds of cases each from the green, yellow, red, and black categories defined by the standard triage method START, the results were found to be highly stable. This strongly indicates the possibility of synthetic data in developing high-performance AI models for severe and critical medical situations.
- [61] arXiv:2511.14201 (交叉列表自 physics.chem-ph) [中文pdf, pdf, html, 其他]
-
标题: 多电荷转移驱动的复杂反应动力学:共价键与范德华相互作用的结合标题: Multiple charge transfer driven complex reaction dynamics: covalent bonding meets van der Waals interactions评论: 18页,4图主题: 化学物理 (physics.chem-ph)
在针对复杂凝聚态化学或生物系统的实验中,分子内部或分子之间的多重电荷转移(包括伴随的分子结构演变和能量分布)的细节通常无法在单个分子层面上获得。 为了获得这种详细的洞察,需要识别并研究涵盖这些过程本质的小型原型系统。 在此,我们采用由共价键和范德华力结合而成的小系统进行研究,即N2Ar二聚体。 我们使用同步辐射光源选择性地实现电荷转移过程,并对产生的电子和离子进行符合测量。 结合从头算计算,这种方法能够逐步跟踪电荷转移和碎片化动力学。 我们发现,二聚体的超快结构演变可以触发第二次电荷转移,从而打开复杂的反应路径,在此路径中,电子在Ar和N2之间来回转移,并通过锥形交叉点发生两次非绝热跃迁。 这些结果表明,多重电荷转移引起的跃迁,尤其是在这种简单的二聚体系统中,为复杂自然系统中的非绝热反应机制提供了基准性的见解。
The details of multiple charge transfer within or among molecules (including the accompanying molecular structure evolution and energy distribution) are typically not accessible on a single molecule level in experiments targeting complex condensed chemical or biological systems. In order to gather such detailed insight, small prototype systems that cover the essence of such processes need to be identified and investigated. Here, we employ a small system consisting of a combination of covalent and van der Waals bonds for our studies, namely N2Ar dimers. We use synchrotron radiation to site-selectively enable the charge transfer processes and perform a coincidence measurement of the resulting electrons and ions. In combination with ab initio calculations, this approach enables a step-by-step tracking of the charge transfer and fragmentation dynamics. We find that ultrafast structural evolution of the dimer can trigger a second CT, thereby opening complex reaction pathways, in which electrons transfer back and forth between Ar and N2, and nonadiabatic transitions occur twice through conical intersections. These results demonstrate that multiple CT-induced transitions, particularly in such a simple dimer system, provide benchmark insights into the mechanisms of nonadiabatic reactions in complex natural systems.
- [62] arXiv:2511.14463 (交叉列表自 math.DG) [中文pdf, pdf, html, 其他]
-
标题: 强各向同性不可约空间的第一拉普拉斯特征值标题: First Laplace eigenvalue of strongly isotropy irreducible spaces主题: 微分几何 (math.DG) ; 谱理论 (math.SP)
我们研究与任何紧致强各向同性不可约空间相关的拉普拉斯-贝尔特拉米算子的最小正特征值$λ_1$。 我们提供了所有单连通情况的显式表达式。 此外,每个强各向同性不可约空间自动是一个爱因斯坦流形,我们证明了对于每一个这样的空间,$E<λ_1\leq 16E$,其中$E$表示相应的爱因斯坦常数。
We study the smallest positive eigenvalue $λ_1$ of the Laplace-Beltrami operator associated with any compact strongly isotropy irreducible space. We provide an explicit expression for all simply connected cases. Furthermore, every strongly isotropy irreducible space is automatically an Einstein manifold, and we prove for each of them that $E<λ_1\leq 16E$, where $E$ denotes the corresponding Einstein constant.
- [63] arXiv:2511.14466 (交叉列表自 q-bio.NC) [中文pdf, pdf, html, 其他]
-
标题: 多巴胺在增强皮层-纹状体-丘脑-皮层环路尖峰信噪比中的作用标题: Effect of Dopamine in Enhancement of SNR of Cortico-Striatal-Thalamo-Cortical Loop Spiking评论: 9页主题: 神经与认知 (q-bio.NC)
在本工作中,研究了多巴胺神经递质在皮层-纹状体-丘脑-皮层(CSTC)环路中的作用。 模拟结果证实,多巴胺通过丘脑去抑制促进运动。 对其对信噪比(SNR)的影响进行分析显示,结果复杂且具有区域特异性:SNR在某些区域增加(例如,D2纹状体:3.41 dB至6.25 dB),在其他区域减少(例如,丘脑VL:6.24 dB至3.93 dB),而在其他区域保持稳定(例如,M1:3.16 dB至3.13 dB)。 这种异质性源于多巴胺增加了表达D1受体的神经元的兴奋性,这会增强通道传导噪声并在特定电路中降低SNR。 因此,多巴胺的作用并非作为统一的信号增强器,而是一个复杂的调节因子,在CSTC环路中关键地平衡促进作用和噪声。
In this work, we study the effects of the dopamine neurotransmitter within the Cortico-Striatal-Thalamo-Cortical (CSTC) loop. Simulations confirmed that dopamine facilitates movement via thalamic disinhibition. Analysis of its impact on the signal-to-noise ratio (SNR) revealed a complex, region-specific outcome: SNR increased in some regions (e.g., D2 Striatum: 3.41 dB to 6.25 dB), decreased in others (e.g., Thalamus VL: 6.24 dB to 3.93 dB), and remained stable elsewhere (e.g., M1: 3.16 dB to 3.13 dB). This heterogeneity stems from dopamine increasing the excitability of D1-receptor-expressing neurons, which amplifies channel conductance noise and reduces SNR in specific circuits. Thus, dopamine acts not as a uniform signal enhancer, but as a complex modulator that critically balances facilitation and noise within the CSTC loop.
- [64] arXiv:2511.14036 (交叉列表自 physics.plasm-ph) [中文pdf, pdf, 其他]
-
标题: 在三维动理学粒子网格模拟中使用按尺寸缩放的栅极离子推力器理解舱内等离子体行为标题: Understanding In-Chamber Plasma Behavior Using a Dimensionally Scaled Gridded Ion Thruster in Three-Dimensional Kinetic Particle-in-Cell Simulations主题: 等离子体物理 (physics.plasm-ph) ; 计算物理 (physics.comp-ph)
我们使用一个全动理学的三维粒子网格/蒙特卡洛碰撞(PIC-MCC)求解器,结合直接模拟蒙特卡洛(DSMC)中性背景,研究设施效应对缩小尺度栅极离子推力器羽流的影响。 这种方法使得在地面测试条件下能够详细检查控制束流中和和壁面相互作用的关键等离子体过程。 我们发现,非弹性电子冷却对于实现物理上一致的中和束流是必不可少的。 增加背景压力会增强离子-中性碰撞,导致更多的电荷和动量交换事件,从而降低离子平均能量,扩展束流,并增加侧壁损失。 包括非弹性过程可以平坦化电势,维持准中性,并在下游更远的地方保持束流的准直性。 单粒子轨迹分析显示,初级电子经历混合逃逸和暂时捕获,而经历非弹性碰撞后的低能电子则被限制在其中,维持中和云。 鞘层诊断表明,在束流收集器处,经典Child-Langmuir和Hutchinson模型由于残留电子而低估了鞘层长度,而在侧壁附近,鞘层由于束流-鞘层干扰而在紧凑区域内被截断。 电流流动分析表明,较高的背景压力条件会导致较低的束流能量和增加的侧壁电流。
We investigate facility effects on a reduced-scale gridded ion thruster plume using a fully kinetic, three-dimensional Particle-in-Cell/Monte Carlo Collision (PIC-MCC) solver coupled with a Direct Simulation Monte Carlo (DSMC) neutral background. This approach enables detailed examination of key plasma processes governing beam neutralization and wall interactions under ground-test conditions. We find that inelastic electron cooling is essential for achieving a physically consistent, neutralized beam. Increasing the background pressure enhances ion-neutral collisions, leading to more charge- and momentum-exchange events that reduce ion mean energies, broaden the beam, and increase sidewall losses. Including inelastic processes flattens the potential, sustains quasi-neutrality, and preserves beam collimation farther downstream. Single-particle trajectory analyses show that primary electrons undergo mixed escape and temporary trapping, while low energy post-inelastic electrons remain confined, sustaining the neutralization cloud. Sheath diagnostics reveal that at the beam dump, classical Child-Langmuir and Hutchinson models underpredict the sheath length due to residual electrons, while near the sidewall, the sheath is truncated by beam-sheath interference within the compact domain. Current-flow analysis indicates that higher background pressure conditions yield lower beam energies and increased sidewall currents.
- [65] arXiv:2511.13752 (交叉列表自 cs.LG) [中文pdf, pdf, html, 其他]
-
标题: 基于空间加权脑电特征融合的运动想象分类标题: Motor Imagery Classification Using Feature Fusion of Spatially Weighted Electroencephalography主题: 机器学习 (cs.LG) ; 人工智能 (cs.AI)
脑机接口(BCI)将人脑与外部世界连接,提供直接的通信通道。 脑电图(EEG)信号常用于BCI中,以反映与运动功能活动相关的认知模式。 然而,由于EEG信号的多通道特性,显式的信息处理对于减轻BCI系统中的计算复杂性至关重要。 本研究提出了一种基于脑区特异性通道选择和多域特征融合的创新方法,以提高分类准确性。 所提出方法的创新之处在于基于区域的通道选择,其中根据EEG通道与不同脑区的功能相关性进行分组。 通过根据运动想象(MI)任务中涉及的特定区域选择通道,该技术消除了无关通道,降低了数据维度并提高了计算效率。 这也确保了提取的特征更能反映与运动任务相关的实际脑活动。 根据所属脑区,对每组通道应用三种不同的特征提取方法:共空间模式(CSP)、模糊C均值聚类和切线空间映射(TSM)。 每种方法针对EEG信号的不同特征:CSP关注空间模式,模糊C均值识别数据中的聚类,TSM捕捉信号中的非线性模式。 组合特征向量用于使用支持向量机(SVM)对运动想象任务(左手、右手和右脚)进行分类。 所提出的方法在公开的基准EEG数据集(IVA和I)上进行了验证,这些数据集来自第三届和第四届BCI竞赛。 结果表明,该方法优于现有方法,在数据集IVA和I上的分类准确率分别达到90.77%和84.50%。
A Brain Computer Interface (BCI) connects the human brain to the outside world, providing a direct communication channel. Electroencephalography (EEG) signals are commonly used in BCIs to reflect cognitive patterns related to motor function activities. However, due to the multichannel nature of EEG signals, explicit information processing is crucial to lessen computational complexity in BCI systems. This study proposes an innovative method based on brain region-specific channel selection and multi-domain feature fusion to improve classification accuracy. The novelty of the proposed approach lies in region-based channel selection, where EEG channels are grouped according to their functional relevance to distinct brain regions. By selecting channels based on specific regions involved in motor imagery (MI) tasks, this technique eliminates irrelevant channels, reducing data dimensionality and improving computational efficiency. This also ensures that the extracted features are more reflective of the brain's actual activity related to motor tasks. Three distinct feature extraction methods, namely Common Spatial Pattern (CSP), Fuzzy C-means clustering, and Tangent Space Mapping (TSM), are applied to each group of channels based on their brain region. Each method targets different characteristics of the EEG signal: CSP focuses on spatial patterns, Fuzzy C-means identifies clusters within the data, and TSM captures non-linear patterns in the signal. The combined feature vector is used to classify motor imagery tasks (left hand, right hand, and right foot) using a Support Vector Machine (SVM). The proposed method was validated on publicly available benchmark EEG datasets (IVA and I) from BCI Competitions III and IV. The results show that the approach outperforms existing methods, achieving classification accuracies of 90.77% and 84.50% for datasets IVA and I, respectively.
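A hedged sketch of the fusion-and-classification stage described above: three per-region feature blocks are concatenated and fed to an SVM with cross-validation. The feature blocks here are random placeholders with assumed shapes; real CSP, fuzzy C-means, and tangent-space features would come from an EEG toolchain (e.g., MNE or pyRiemann), which is outside the scope of this sketch.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)
n_trials = 200

# Placeholder feature blocks, one per method/region (shapes are assumptions):
csp_feats = rng.standard_normal((n_trials, 6))    # spatial-pattern features
fuzzy_feats = rng.standard_normal((n_trials, 4))  # cluster-membership features
tsm_feats = rng.standard_normal((n_trials, 10))   # tangent-space features
y = rng.integers(0, 3, n_trials)                  # left hand / right hand / foot

# Feature fusion: simple concatenation into one vector per trial.
X = np.hstack([csp_feats, fuzzy_feats, tsm_feats])

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.3f}")
```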
- [66] arXiv:2511.14665 (交叉列表自 cs.CC) [中文pdf, pdf, html, 其他]
-
标题: 形式问题空间中的求解器悖论标题: The Solver's Paradox in Formal Problem Spaces评论: 通过分析非直谓性(impredicativity)如何在算术化问题空间中传递,对全局决策问题进行结构分析,结合对角化、反射和统一复杂性。18页主题: 计算复杂性 (cs.CC) ; 计算机科学中的逻辑 (cs.LO) ; 逻辑 (math.LO)
本文研究了在算术表示的域上,全局决策问题如何通过类量化获得反射结构。 算术化强制产生对角不动点,其验证需要超越有限手段的反射,从而产生与计算技术无关的费弗曼风格障碍。 我们利用这一机制来分析统一复杂性陈述,包括$\mathsf{P}$对$\mathsf{NP}$,表明它们的难度源于结构上的非直谓性,而非方法上的限制。 重点不是推导分离,而是澄清此类算术化断言的逻辑状态。
This paper investigates how global decision problems over arithmetically represented domains acquire reflective structure through class-quantification. Arithmetization forces diagonal fixed points whose verification requires reflection beyond finitary means, producing Feferman-style obstructions independent of computational technique. We use this mechanism to analyze uniform complexity statements, including $\mathsf{P}$ vs. $\mathsf{NP}$, showing that their difficulty stems from structural impredicativity rather than methodological limitations. The focus is not on deriving separations but on clarifying the logical status of such arithmetized assertions.
- [67] arXiv:2511.13806 (交叉列表自 math.OC) [中文pdf, pdf, html, 其他]
-
标题: 最优顺序流标题: Optimal Sequential Flows主题: 优化与控制 (math.OC) ; 数据结构与算法 (cs.DS) ; 形式语言与自动机理论 (cs.FL)
我们提供了一种新的代数技术,在多项式空间中解决顺序流问题。 任务是在边容量可随时间变化的图中,通过从给定的有限集合中选择容量标注序列来最大化流量。 我们的方法基于有限半群的一个新分解定理,将其应用于合适的流半群时,可以推导出小的见证。 这可以推广到多个输入/输出顶点以及正则约束。
We provide a new algebraic technique to solve the sequential flow problem in polynomial space. The task is to maximize the flow through a graph where edge capacities can be changed over time by choosing a sequence of capacity labelings from a given finite set. Our method is based on a novel factorization theorem for finite semigroups that, applied to a suitable flow semigroup, allows us to derive small witnesses. This generalizes to multiple in/output vertices, as well as regular constraints.
- [68] arXiv:2412.17200 (交叉列表自 cs.SE) [中文pdf, pdf, html, 其他]
-
标题: 通过GPT评估UML图:对教育的影响标题: Assessing UML Diagrams by GPT: Implications for Education评论: 预印本已被接受发表于《系统与软件技术》期刊,2025年主题: 软件工程 (cs.SE)
在软件工程(SE)研究和实践中,UML作为一种重要的建模方法,在学术界和工业界的需求分析和软件建模中广为人知。 特别是,许多大学的本科课程中都包含了关于UML建模的基础知识以及创建高质量UML图的实践。 这导致教育工作者需要花费大量时间和精力来审查和评分学生创建的大量UML图。 近年来,生成式AI技术(如GPT)的进步为自动化许多SE任务开辟了新途径。 然而,目前的研究或工具很少探讨GPT在评估UML图质量方面的潜力。 本文旨在研究GPT在评估用例图、类图和序列图质量方面的可行性和性能。 首先,为这些UML图提出了11个评估标准,并详细说明了评分细节。 接着,设计并进行了40名学生的UML建模报告的一系列实验,以探索GPT在评估和评分这些UML图方面的表现。 研究结果表明,GPT可以完成这项评估任务,但目前还不能取代人类专家。 同时,GPT与人类专家之间存在五个评估差异。 这些差异在不同类型的UML图中使用不同的评估标准时有所不同,展示了GPT在此自动评估任务中的优势和不足。
In software engineering (SE) research and practice, UML is well known as an essential modeling methodology for requirements analysis and software modeling in both academia and industry. In particular, fundamental knowledge of UML modeling and practice in creating high-quality UML diagrams are included in SE-relevant courses in the undergraduate programs of many universities. This leads to a time-consuming and labor-intensive task for educators to review and grade a large number of UML diagrams created by the students. Recent advances in generative AI techniques, such as GPT, have paved new ways to automate many SE tasks. However, current research or tools seldom explore the capabilities of GPT in evaluating the quality of UML diagrams. This paper aims to investigate the feasibility and performance of GPT in assessing the quality of UML use case diagrams, class diagrams, and sequence diagrams. First, 11 evaluation criteria with grading details were proposed for these UML diagrams. Next, a series of experiments was designed and conducted on 40 students' UML modeling reports to explore the performance of GPT in evaluating and grading these UML diagrams. The research findings reveal that GPT can complete this assessment task, but it cannot replace human experts yet. Meanwhile, there are five evaluation discrepancies between GPT and human experts. These discrepancies vary in the use of different evaluation criteria in different types of UML diagrams, presenting GPT's strengths and weaknesses in this automatic evaluation task.
- [69] arXiv:2511.14316 (交叉列表自 math.NT) [中文pdf, pdf, html, 其他]
-
标题: 复二元形式的华林问题标题: The Waring Problem of Complex Binary Forms主题: 数论 (math.NT)
形式的华林问题涉及将齐次多元多项式表示为线性形式的幂之和。 本文专注于复二元形式,并使用代数和分析的基本工具解决了它们的华林问题。 特别是,我们对Apolarity引理和西尔维斯特1851定理给出了初等的处理,这些内容易于理解,并将为未来推广到一般情况提供理想的途径。
The Waring problem of forms concerns the expression of homogeneous multivariate polynomials as sums of powers of linear forms. This paper focuses on complex binary forms, and we solve the Waring problem for them using basic tools in algebra and analysis. In particular, we present elementary treatments of the Apolarity Lemma and Sylvester's 1851 Theorem, which are easily accessible and will provide an ideal approach for future extension to the general case.
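Sylvester's 1851 algorithm referenced above is mechanical for small cases: form the catalecticant (Hankel) matrix of the coefficients, extract a kernel vector, and read the linear forms off the roots of the resulting apolar form. A numpy sketch for the binary cubic x^3 + y^3, using the binomial normalization f = sum_i C(d,i) a_i x^(d-i) y^i; this is a standard textbook computation, not code from the paper.

```python
import numpy as np

# f = x^3 + y^3 in the normalization f = sum_i C(3,i) * a_i * x^(3-i) * y^i,
# so (a_0, a_1, a_2, a_3) = (1, 0, 0, 1).
a = np.array([1.0, 0.0, 0.0, 1.0])

# Catalecticant (Hankel) matrix for the rank-2 test: entries a_{i+j},
# with 0 <= i <= 1 (rows) and 0 <= j <= 2 (columns).
H = np.array([[a[0], a[1], a[2]],
              [a[1], a[2], a[3]]])

# A kernel vector (c0, c1, c2) encodes the apolar quadratic
# c0*x^2 + c1*x*y + c2*y^2; its roots as points [x:y] give the linear forms.
_, _, vt = np.linalg.svd(H)
c = vt[-1]  # spans the one-dimensional null space
print("apolar quadratic coefficients:", np.round(c, 6))
# Here the kernel is (0, 1, 0) up to sign, i.e. the quadratic x*y, whose
# roots correspond to the forms x and y: f = x^3 + y^3 has Waring rank 2.
```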
- [70] arXiv:2511.14642 (交叉列表自 cs.CL) [中文pdf, pdf, html, 其他]
-
标题: 比较错觉的分级强度由贝叶斯推断解释标题: Graded strength of comparative illusions is explained by Bayesian inference评论: 49页,7图主题: 计算与语言 (cs.CL)
像视觉处理一样,语言处理也容易受到错觉的影响,人们会系统性地错误感知刺激。 在一个这样的例子——比较错觉(CI),例如,“更多的学生去过俄罗斯而不是我”——理解者往往会认为这个句子是可接受的,尽管其底层的比较是无意义的。 先前的研究认为,这一现象可以通过噪声信道上的贝叶斯推断来解释:一个句子解释的后验概率与该解释的先验概率以及其被噪声破坏为所观察到的(CI)句子的可能性成正比。 初步的行为学研究通过评估CI句子的一小组替代解释,并表明理解者更倾向于那些更可能被破坏为错觉句子的解释,支持了这一观点。 在本研究中,我们复制并大大超越了之前的工作:通过统计语言模型与人类行为数据的新颖结合,推导出关于可信解释后验概率的定量模型,用以直接预测错觉强度。 我们的模型不仅解释了CI效应强度的细微差别,还解释了一个先前未得到解释的效应,即than从句主语为代词还是完整名词短语所引起的差异。 这些发现通过证明该理论对比较错觉做出了新的且得到实证验证的预测,支持了句子理解的噪声信道理论。 这一结果与错觉和非错觉情境中噪声信道加工的相关证据相结合,支持将噪声信道推断作为多样语言处理现象的统一计算层面理论。
Like visual processing, language processing is susceptible to illusions in which people systematically misperceive stimuli. In one such case--the comparative illusion (CI), e.g., More students have been to Russia than I have--comprehenders tend to judge the sentence as acceptable despite its underlying nonsensical comparison. Prior research has argued that this phenomenon can be explained as Bayesian inference over a noisy channel: the posterior probability of an interpretation of a sentence is proportional to both the prior probability of that interpretation and the likelihood of corruption into the observed (CI) sentence. Initial behavioral work has supported this claim by evaluating a narrow set of alternative interpretations of CI sentences and showing that comprehenders favor interpretations that are more likely to have been corrupted into the illusory sentence. In this study, we replicate and go substantially beyond this earlier work by directly predicting the strength of illusion with a quantitative model of the posterior probability of plausible interpretations, which we derive through a novel synthesis of statistical language models with human behavioral data. Our model explains not only the fine gradations in the strength of CI effects, but also a previously unexplained effect caused by pronominal vs. full noun phrase than-clause subjects. These findings support a noisy-channel theory of sentence comprehension by demonstrating that the theory makes novel predictions about the comparative illusion that bear out empirically. This outcome joins related evidence of noisy channel processing in both illusory and non-illusory contexts to support noisy channel inference as a unified computational-level theory of diverse language processing phenomena.
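The noisy-channel computation described above is just posterior proportional to prior times likelihood over candidate interpretations. A toy version with hand-picked numbers follows; the candidate interpretations, priors, and corruption likelihoods are all invented for illustration, whereas the paper derives them from statistical language models and behavioral data.

```python
# Posterior over candidate interpretations s given the observed illusory
# sentence o:  P(s | o)  is proportional to  P(s) * P(o | s).
priors = {  # assumed prior probabilities of each intended sentence
    "More students have been to Russia than me.": 3e-6,
    "Students have been to Russia more than I have.": 2e-6,
    "More students have been to Russia than I thought.": 1e-6,
}
likelihoods = {  # assumed probability each s is corrupted into the CI sentence
    "More students have been to Russia than me.": 0.20,
    "Students have been to Russia more than I have.": 0.10,
    "More students have been to Russia than I thought.": 0.05,
}

unnorm = {s: priors[s] * likelihoods[s] for s in priors}
z = sum(unnorm.values())
for s, p in sorted(unnorm.items(), key=lambda kv: -kv[1]):
    print(f"{p / z:.3f}  {s}")
```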
交叉提交 (展示 14 之 14 条目)
- [71] arXiv:2511.12977 (替换) [中文pdf, pdf, 其他]
-
标题: ArtiWorld:场景中3D物体的LLM驱动关节化标题: ArtiWorld: LLM-Driven Articulation of 3D Objects in Scenes主题: 计算机视觉与模式识别 (cs.CV)
构建交互式模拟器和可扩展的机器人学习环境需要大量的关节资产。 然而,大多数现有的模拟3D资产是刚性的,手动将它们转换为关节物体是非常耗时且成本高昂的。 这引发了一个自然的问题:我们能否自动识别场景中的可关节物体并直接将其转换为关节资产? 在本文中,我们提出了ArtiWorld,一种场景感知的流程,该流程从文本场景描述中定位候选的可关节物体,并重建保留原始几何形状的可执行URDF模型。 该流程的核心是Arti4URDF,它利用3D点云、大型语言模型(LLM)的先验知识以及面向URDF的提示设计,快速将刚性物体转换为基于URDF的交互式关节物体,同时保持其3D形状。 我们在三个层次上评估了ArtiWorld:3D模拟物体、完整的3D模拟场景和真实世界扫描场景。 在所有三种设置中,我们的方法始终优于现有方法,并实现了最先进的性能,同时保留了物体几何形状并正确捕捉了物体的交互性,以生成可用的基于URDF的关节模型。 这提供了一条实用的路径,可以直接从现有的3D资产构建交互式、机器人就绪的仿真环境。 代码和数据将被发布。
Building interactive simulators and scalable robot-learning environments requires a large number of articulated assets. However, most existing 3D assets in simulation are rigid, and manually converting them into articulated objects is extremely labor- and cost-intensive. This raises a natural question: can we automatically identify articulable objects in a scene and convert them into articulated assets directly? In this paper, we present ArtiWorld, a scene-aware pipeline that localizes candidate articulable objects from textual scene descriptions and reconstructs executable URDF models that preserve the original geometry. At the core of this pipeline is Arti4URDF, which leverages 3D point cloud, prior knowledge of a large language model (LLM), and a URDF-oriented prompt design to rapidly convert rigid objects into interactive URDF-based articulated objects while maintaining their 3D shape. We evaluate ArtiWorld at three levels: 3D simulated objects, full 3D simulated scenes, and real-world scan scenes. Across all three settings, our method consistently outperforms existing approaches and achieves state-of-the-art performance, while preserving object geometry and correctly capturing object interactivity to produce usable URDF-based articulated models. This provides a practical path toward building interactive, robot-ready simulation environments directly from existing 3D assets. Code and data will be released.
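For readers unfamiliar with the URDF target format that the pipeline reconstructs, here is a hand-written minimal articulated asset: a cabinet body plus one revolute door joint. The tags are standard URDF; all names, dimensions, and joint limits are invented and unrelated to Arti4URDF's actual output.

```python
# Minimal URDF for one rigid body plus a revolute "door" joint.
# Tag names are standard URDF; every value below is made up.
URDF = """<?xml version="1.0"?>
<robot name="cabinet">
  <link name="body">
    <visual>
      <geometry><box size="0.6 0.4 0.8"/></geometry>
    </visual>
  </link>
  <link name="door">
    <visual>
      <geometry><box size="0.02 0.4 0.8"/></geometry>
    </visual>
  </link>
  <joint name="door_hinge" type="revolute">
    <parent link="body"/>
    <child link="door"/>
    <origin xyz="0.3 -0.2 0.0" rpy="0 0 0"/>
    <axis xyz="0 0 1"/>
    <limit lower="0.0" upper="1.57" effort="5.0" velocity="1.0"/>
  </joint>
</robot>
"""

with open("cabinet.urdf", "w") as f:
    f.write(URDF)  # loadable by common simulators that accept URDF
```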
- [72] arXiv:2511.14138 (替换) [中文pdf, pdf, html, 其他]
-
标题: FxSearcher:无梯度文本驱动的音频转换标题: FxSearcher: gradient-free text-driven audio transformation主题: 音频与语音处理 (eess.AS) ; 声音 (cs.SD)
从文本提示中实现多样且高质量的音频转换仍然具有挑战性,因为现有方法在本质上受到其依赖有限的一组可微分音频效果的限制。 本文提出了\textbf{FxSearcher},一种新颖的无梯度框架,该框架能够发现最优的音频效果(FX)配置,以根据文本提示转换源信号。 我们的方法采用贝叶斯优化和基于 CLAP 的得分函数来高效地执行此搜索。 此外,引入了一个引导提示以防止不良伪影并增强人类偏好。 为了客观评估我们的方法,我们提出了一种基于人工智能的评估框架。 结果表明,我们的方法在这些指标上获得的最高分数与人类偏好高度一致。 演示地址为 https://hojoonki.github.io/FxSearcher/
Achieving diverse and high-quality audio transformations from text prompts remains challenging, as existing methods are fundamentally constrained by their reliance on a limited set of differentiable audio effects. This paper proposes \textbf{FxSearcher}, a novel gradient-free framework that discovers the optimal configuration of audio effects (FX) to transform a source signal according to a text prompt. Our method employs Bayesian Optimization and CLAP-based score function to perform this search efficiently. Furthermore, a guiding prompt is introduced to prevent undesirable artifacts and enhance human preference. To objectively evaluate our method, we propose an AI-based evaluation framework. The results demonstrate that the highest scores achieved by our method on these metrics align closely with human preferences. Demos are available at https://hojoonki.github.io/FxSearcher/
- [73] arXiv:2511.14110 (替换) [中文pdf, pdf, html, 其他]
-
标题: 基于简化导联脑电图和心电图的非患者依赖性新生儿惊厥预测模型标题: A Patient-Independent Neonatal Seizure Prediction Model Using Reduced Montage EEG and ECG评论: 10页,4图主题: 信号处理 (eess.SP) ; 机器学习 (cs.LG)
新生儿极易发生癫痫,常常导致短期或长期的神经功能障碍。 然而,新生儿癫痫的临床表现较为微妙,常常导致误诊。 这会增加持续未治疗的癫痫发作风险以及随后的脑损伤。 连续视频脑电图(cEEG)监测是癫痫检测的金标准。 然而,这是一种昂贵的评估方法,需要专业知识和时间。 在本研究中,我们提出了一种基于卷积神经网络的模型,通过区分脑电图的发作间期与发作前期状态,以实现新生儿癫痫的早期预测。 我们的模型与患者无关,能够在多个受试者之间进行泛化,并利用从多通道脑电图和心电图(ECG)信号中提取的梅尔频率倒谱系数矩阵作为输入特征。 在赫尔辛基新生儿脑电图数据集上使用10折交叉验证进行训练和验证,所提出的模型平均准确率为97.52%,灵敏度为98.31%,特异性为96.39%,F1分数为97.95%,最早可在发作前30分钟实现准确的癫痫发作预测。 将ECG与EEG结合使用,使F1分数提高了1.42%,而引入注意力机制又带来了额外的0.5%的提升。 为了提高透明度,我们采用了SHapley Additive exPlanations(SHAP)作为一种可解释的人工智能方法来解释模型,并通过头皮图提供了癫痫灶的定位。 总体结果表明,该模型在新生儿重症监护室中具有最小监督部署的潜力,能够实现及时可靠的新生儿癫痫预测,并通过迁移学习在未见过的受试者中表现出强大的泛化能力。
Neonates are highly susceptible to seizures, often leading to short or long-term neurological impairments. However, clinical manifestations of neonatal seizures are subtle and often lead to misdiagnoses. This increases the risk of prolonged, untreated seizure activity and subsequent brain injury. Continuous video electroencephalogram (cEEG) monitoring is the gold standard for seizure detection. However, this is an expensive evaluation that requires expertise and time. In this study, we propose a convolutional neural network-based model for early prediction of neonatal seizures by distinguishing between interictal and preictal states of the EEG. Our model is patient-independent, enabling generalization across multiple subjects, and utilizes mel-frequency cepstral coefficient matrices extracted from multichannel EEG and electrocardiogram (ECG) signals as input features. Trained and validated on the Helsinki neonatal EEG dataset with 10-fold cross-validation, the proposed model achieved an average accuracy of 97.52%, sensitivity of 98.31%, specificity of 96.39%, and F1-score of 97.95%, enabling accurate seizure prediction up to 30 minutes before onset. The inclusion of ECG alongside EEG improved the F1-score by 1.42%, while the incorporation of an attention mechanism yielded an additional 0.5% improvement. To enhance transparency, we incorporated SHapley Additive exPlanations (SHAP) as an explainable artificial intelligence method to interpret the model and provided localization of seizure focus using scalp plots. The overall results demonstrate the model's potential for minimally supervised deployment in neonatal intensive care units, enabling timely and reliable prediction of neonatal seizures, while demonstrating strong generalization capability across unseen subjects through transfer learning.
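The MFCC input representation mentioned above can be reproduced per channel with librosa. In the sketch below the EEG is synthetic, and the sampling rate and frame settings are assumptions, not the paper's configuration.

```python
import numpy as np
import librosa

fs = 256  # assumed EEG sampling rate in Hz
rng = np.random.default_rng(7)
eeg = rng.standard_normal((18, 60 * fs))  # 18 channels x 60 s, synthetic

# One MFCC matrix per channel; frame/mel settings are illustrative choices.
mfccs = np.stack([
    librosa.feature.mfcc(y=ch.astype(np.float32), sr=fs,
                         n_mfcc=13, n_fft=512, hop_length=128, n_mels=40)
    for ch in eeg
])
print(mfccs.shape)  # (channels, 13 coefficients, time frames)
```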
- [74] arXiv:2511.12959 (替换) [中文pdf, pdf, html, 其他]
-
标题: 基于知识引导的个性化联邦推荐标题: Personalized Federated Recommendation With Knowledge Guidance主题: 信息检索 (cs.IR)
联邦推荐(FedRec)已成为构建隐私保护推荐系统的关键范式。 然而,现有的FedRec模型面临一个关键困境:内存高效的单知识模型由于采用次优的知识替换实践而丢弃了有价值的个性化信息,而高性能的双知识模型通常内存消耗过大,难以实际部署在设备上。 我们提出了带有知识引导的联邦推荐(FedRKG),这是一个与模型无关的框架,解决了这一困境。 核心原则是知识引导,它避免了完全替换,而是将全局知识融合到保留的本地嵌入中,在单知识的内存占用内实现了双知识的个性化优势。 此外,我们引入了自适应引导,这是一种细粒度机制,可以动态调节每个用户-项目交互的引导强度,克服了静态融合方法的局限性。 在基准数据集上的大量实验表明,FedRKG显著优于最先进的方法,验证了我们方法的有效性。 代码可在 https://github.com/Jaehyung-Lim/fedrkg 获取。
Federated Recommendation (FedRec) has emerged as a key paradigm for building privacy-preserving recommender systems. However, existing FedRec models face a critical dilemma: memory-efficient single-knowledge models suffer from a suboptimal knowledge replacement practice that discards valuable personalization, while high-performance dual-knowledge models are often too memory-intensive for practical on-device deployment. We propose Federated Recommendation with Knowledge Guidance (FedRKG), a model-agnostic framework that resolves this dilemma. The core principle, Knowledge Guidance, avoids full replacement and instead fuses global knowledge into preserved local embeddings, attaining the personalization benefits of dual-knowledge within a single-knowledge memory footprint. Furthermore, we introduce Adaptive Guidance, a fine-grained mechanism that dynamically modulates the intensity of this guidance for each user-item interaction, overcoming the limitations of static fusion methods. Extensive experiments on benchmark datasets demonstrate that FedRKG significantly outperforms state-of-the-art methods, validating the effectiveness of our approach. The code is available at https://github.com/Jaehyung-Lim/fedrkg.
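The knowledge-guidance principle above reads as "preserve the local embedding and gate in the global one". A hedged PyTorch sketch of one plausible per-interaction gate follows; the actual FedRKG architecture may differ, and the linked repository is authoritative.

```python
import torch
import torch.nn as nn

class KnowledgeGuidance(nn.Module):
    """Fuse a global item embedding into a preserved local one via a
    per-interaction gate: guidance rather than full replacement."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(3 * dim, dim), nn.Sigmoid())

    def forward(self, user, local_item, global_item):
        # Gate strength depends on the specific user-item interaction.
        g = self.gate(torch.cat([user, local_item, global_item], dim=-1))
        return local_item + g * (global_item - local_item)

dim = 32
fuse = KnowledgeGuidance(dim)
u, li, gi = (torch.randn(8, dim) for _ in range(3))
print(fuse(u, li, gi).shape)  # torch.Size([8, 32])
```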
- [75] arXiv:2511.14620 (替换) [中文pdf, pdf, 其他]
-
标题: 融合生物力学和时空特征用于跌倒预测:表征并减轻仿真到现实的差距标题: Fusing Biomechanical and Spatio-Temporal Features for Fall Prediction: Characterizing and Mitigating the Simulation-to-Reality Gap主题: 计算机视觉与模式识别 (cs.CV)
跌倒是老年人受伤和失去独立性的主要原因。 基于视觉的跌倒预测系统提供了一种非侵入性解决方案,在撞击前几秒预测跌倒,但其开发受到可用跌倒数据稀缺的阻碍。 为这些努力做出贡献,本研究提出了生物力学时空图卷积网络(BioST-GCN),这是一种双流模型,使用交叉注意力融合机制结合姿态和生物力学信息。 我们的模型在模拟的MCF-UA特技演员和MUVIM数据集上分别比原始ST-GCN基线提高了5.32%和2.91%的F1分数。 ST-GCN流中的时空注意力机制通过识别关键关节和时间阶段提供了可解释性。 然而,模拟与现实之间的差距仍然存在。 虽然我们的模型在模拟数据上全监督下达到了89.0%的F1分数,但对未见过的受试者的零样本泛化下降到35.9%。 这种性能下降可能是由于模拟数据中的偏差,例如“意图跌倒”线索。 对于老年人,尤其是患有糖尿病或虚弱的人,由于他们独特的运动学特征,这一差距更加严重。 为了解决这个问题,我们提出了个性化策略,并倡导隐私保护的数据管道以实现现实世界的验证。 我们的研究结果强调了迫切需要弥合模拟数据与真实世界数据之间的差距,以开发针对易受伤害的老年人群的有效跌倒预测系统。
Falls are a leading cause of injury and loss of independence among older adults. Vision-based fall prediction systems offer a non-invasive solution to anticipate falls seconds before impact, but their development is hindered by the scarcity of available fall data. Contributing to these efforts, this study proposes the Biomechanical Spatio-Temporal Graph Convolutional Network (BioST-GCN), a dual-stream model that combines both pose and biomechanical information using a cross-attention fusion mechanism. Our model outperforms the vanilla ST-GCN baseline by 5.32% and 2.91% F1-score on the simulated MCF-UA stunt-actor and MUVIM datasets, respectively. The spatio-temporal attention mechanisms in the ST-GCN stream also provide interpretability by identifying critical joints and temporal phases. However, a critical simulation-reality gap persists. While our model achieves an 89.0% F1-score with full supervision on simulated data, zero-shot generalization to unseen subjects drops to 35.9%. This performance decline is likely due to biases in simulated data, such as `intent-to-fall' cues. For older adults, particularly those with diabetes or frailty, this gap is exacerbated by their unique kinematic profiles. To address this, we propose personalization strategies and advocate for privacy-preserving data pipelines to enable real-world validation. Our findings underscore the urgent need to bridge the gap between simulated and real-world data to develop effective fall prediction systems for vulnerable elderly populations.
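A minimal sketch of the cross-attention fusion idea described above, with a pose stream attending to a biomechanical stream via torch's built-in multi-head attention. Token counts and feature sizes are assumptions, not the BioST-GCN configuration.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Pose tokens attend to biomechanical tokens; residual plus layer norm."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, pose_tokens, biomech_tokens):
        fused, _ = self.attn(query=pose_tokens, key=biomech_tokens,
                             value=biomech_tokens)
        return self.norm(pose_tokens + fused)  # residual connection

pose = torch.randn(2, 30, 64)     # batch, frames, features (assumed)
biomech = torch.randn(2, 30, 64)
print(CrossAttentionFusion()(pose, biomech).shape)  # (2, 30, 64)
```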
- [76] arXiv:2506.18465 (替换) [中文pdf, pdf, html, 其他]
-
标题: 用于近场通信和传感的天线阵列尺寸设计标题: Sizing Antenna Arrays for Near-field Communication and Sensing评论: 被IEEE无线通信Letters接受主题: 信号处理 (eess.SP)
本文介绍了近场通信和传感系统的关键性能指标及其作为天线阵列孔径函数的缩放行为。 推导了多种标准阵列几何结构的解析表达式,以简化在给定系统要求下的大型天线阵列设计。 首先,分析了近场波束聚焦,观察到最小波束深度随着阵列孔径增加迅速趋于一个低渐近极限。 相反,近场区域跨度与阵列孔径呈二次方比例关系。 基于这两个指标,推导出3 dB分离条件下可分辨波束点的最大数量,显示出与阵列孔径的线性依赖关系。 此外,当考虑波束聚焦分辨率不超过特定阈值的区域时,该区域的范围也显示与阵列尺寸呈线性比例关系。 最后,估计了在阵列宽边方向观测到的信道的显著奇异值数量,显示出与孔径的幂律依赖关系。 这些结果表达式为评估近场通信和传感应用中的孔径需求提供了实用的设计指南。
This paper presents key performance metrics for near-field communication and sensing systems and their scaling behavior as a function of the antenna array aperture. Analytical expressions are derived for several standard array geometries to ease the design of the large antenna arrays under given system requirements. First, the near-field beam focusing is analyzed and the minimum beamdepth is observed to rapidly saturate to a low asymptotic limit as the array aperture increases. In contrast, the near-field region span is shown to scale quadratically with the array aperture. Based on these two metrics, the maximum number of resolvable beamspots at 3 dB separation is derived analytically, exhibiting a linear dependence on the array aperture. Moreover, when considering a region where the beamfocusing resolution does not exceed a specified threshold, the extent of the region is also shown to scale linearly with the array size. Finally, the number of significant singular values of a channel observed at the array's broadside is estimated, showing a power-law dependence on the aperture. The resulting expressions provide practical design guidelines for evaluating aperture requirements in near-field communication and sensing applications.
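The quadratic scaling of the near-field span quoted above matches the familiar Rayleigh-distance rule of thumb 2D^2/lambda. A small helper illustrates the scaling; the paper's own metrics are more refined than this classic boundary.

```python
def rayleigh_distance(aperture_m: float, freq_hz: float) -> float:
    """Classic near-field boundary 2*D^2/lambda for an aperture of size D."""
    c = 299_792_458.0          # speed of light, m/s
    lam = c / freq_hz          # wavelength, m
    return 2.0 * aperture_m ** 2 / lam

# Doubling the aperture quadruples the near-field span (28 GHz assumed):
for d in (0.25, 0.5, 1.0):
    print(f"D = {d:4.2f} m -> {rayleigh_distance(d, 28e9):8.1f} m")
```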
- [77] arXiv:2511.14716 (替换) [中文pdf, pdf, 其他]
-
标题: 扩散作为自蒸馏:一个模型中的端到端潜在扩散标题: Diffusion As Self-Distillation: End-to-End Latent Diffusion In One Model 作者: Binyu Cui, Jun Wang, Xibo Yuan, Alfonso Martinez, George Slama, Matthew Wilkowski, Ryosuke Ota, Keiji Wada评论: 技术报告。10页主题: 计算机视觉与模式识别 (cs.CV)
标准潜在扩散模型依赖于一个复杂的三部分架构,包括单独的编码器、解码器和扩散网络,这些组件在多个阶段进行训练。 这种模块化设计计算效率低下,导致性能次优,并阻碍了扩散与视觉基础模型中常见的单网络架构的统一。 我们的目标是将这三个组件统一为一个端到端可训练的网络。 我们首先证明,由于“潜在崩溃”,即扩散训练目标干扰了网络学习良好潜在表示的能力,简单的联合训练方法会灾难性地失败。 我们通过将扩散与基于自蒸馏的无监督学习方法进行新颖的类比,确定了这种不稳定的根源。 基于这一见解,我们提出了扩散作为自蒸馏(DSD),一种对训练目标进行关键修改的新框架,以稳定潜在空间。 这种方法首次实现了稳定端到端训练的单一网络,该网络同时学习编码、解码和扩散。 DSD在ImageNet$256\times 256$条件生成任务上表现出色:在ImageNet上仅使用42M/118M/205M参数和50个训练周期,无需使用无分类器引导,FID=13.44/6.38/4.25。
Standard Latent Diffusion Models rely on a complex, three-part architecture consisting of a separate encoder, decoder, and diffusion network, which are trained in multiple stages. This modular design is computationally inefficient, leads to suboptimal performance, and prevents the unification of diffusion with the single-network architectures common in vision foundation models. Our goal is to unify these three components into a single, end-to-end trainable network. We first demonstrate that a naive joint training approach fails catastrophically due to ``latent collapse'', where the diffusion training objective interferes with the network's ability to learn a good latent representation. We identify the root causes of this instability by drawing a novel analogy between diffusion and self-distillation based unsupervised learning method. Based on this insight, we propose Diffusion as Self-Distillation (DSD), a new framework with key modifications to the training objective that stabilize the latent space. This approach enables, for the first time, the stable end-to-end training of a single network that simultaneously learns to encode, decode, and perform diffusion. DSD achieves outstanding performance on the ImageNet $256\times 256$ conditional generation task: FID=13.44/6.38/4.25 with only 42M/118M/205M parameters and 50 training epochs on ImageNet, without using classifier-free-guidance.
- [78] arXiv:2511.14517 (替换) [中文pdf, pdf, html, 其他]
-
标题: 三重混合波束成形设计用于全连接夹紧天线系统标题: Tri-Hybrid Beamforming Design for Fully-Connected Pinching Antenna Systems主题: 信号处理 (eess.SP)
一种新颖的全连接(FC)三混合波束成形(THB)架构被提出用于夹紧天线系统(PASS)。 与传统的子连接(SC)PASS相比,所提出的FC架构采用可调相移器网络将所有射频(RF)链与所有波导连接。 这促进了集成传统混合模拟-数字波束成形与夹紧波束成形的THB框架。 随后,提出了一个加权和速率(WSR)优化问题,以联合优化发射波束成形器和夹紧天线(PA)位置。 开发了两种算法来解决这个具有挑战性的非凸问题。 1)基于分数规划(FP)的算法:该算法使用基于FP的交替优化框架直接最大化WSR。 特别是,提出了一种基于成功历史的自适应差分进化(SHADE)方法来优化PA位置,有效解决了难以处理的多模态目标函数。 2)基于迫零(ZF)的算法:为了降低设计复杂度,采用迫零方法进行发射波束成形。 随后通过修改的SHADE方法优化PA位置以最大化WSR。 仿真结果验证了所提算法的有效性,揭示了FC-THB PASS在WSR方面与SC架构相当,同时在较少的RF链下实现了更优的能量效率。
A novel fully-connected (FC) tri-hybrid beamforming (THB) architecture is proposed for pinching antenna systems (PASS). In contrast to conventional sub-connected (SC) PASS, the proposed FC architecture employs a tunable phase-shifter network to interconnect all radio frequency (RF) chains with all waveguides. This facilitates a THB framework that integrates conventional hybrid analog-digital beamforming with pinching beamforming. A weighted sum-rate (WSR) optimization problem is then formulated to jointly optimize the transmit beamformers and pinching antenna (PA) positions. Two algorithms are developed to address this challenging non-convex problem. 1) Fractional programming (FP)-based algorithm: This algorithm directly maximizes the WSR using an FP-based alternating optimization framework. Particularly, a success-history based adaptive differential evolution (SHADE) method is proposed to optimize PA positions, effectively addressing the intractable multimodal objective function. 2) Zero-forcing (ZF)-based algorithm: To reduce design complexity, zero-forcing is employed for transmit beamforming. The PA positions are subsequently optimized to maximize the WSR via a modified SHADE method. Simulation results validate the effectiveness of the proposed algorithms, revealing that the FC-THB PASS achieves WSR comparable to the SC architecture while delivering superior energy efficiency with fewer RF chains.
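SHADE is a success-history-based self-adaptive variant of differential evolution. As a stand-in, scipy's stock differential_evolution already illustrates optimizing pinching-antenna positions; the array-gain proxy below is invented for illustration and is far simpler than the paper's WSR objective.

```python
import numpy as np
from scipy.optimize import differential_evolution

lam = 0.01             # wavelength (m), assumed
n_pa, length = 4, 0.5  # four pinching antennas on a 0.5 m waveguide, assumed

def gain(positions, theta):
    """Toy array gain of isotropic elements at the given positions."""
    phases = 2 * np.pi / lam * positions * np.sin(theta)
    return np.abs(np.exp(1j * phases).sum()) ** 2

def objective(positions):
    # Maximize gain toward a user direction, suppress it toward an interferer.
    return -(gain(positions, 0.3) - 0.5 * gain(positions, -0.7))

result = differential_evolution(objective, bounds=[(0, length)] * n_pa,
                                seed=1, tol=1e-8)
print(np.sort(result.x), -result.fun)
```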
- [79] arXiv:2511.12791 (替换) [中文pdf, pdf, html, 其他]
-
标题: 时间序列预测中的联邦学习最优回溯周期标题: Optimal Look-back Horizon for Time Series Forecasting in Federated Learning评论: 被AAAI-26接受为口头报告主题: 机器学习 (cs.LG) ; 人工智能 (cs.AI)
选择适当的回溯范围仍然是时间序列预测(TSF)中的一个基本挑战,特别是在数据分散、异构且通常非独立的联邦学习场景中。 尽管最近的研究通过在内在空间中保留预测相关的信息来探索范围选择,但这些方法主要局限于集中式和独立分布的环境。 本文通过一种内在空间公式提出了一种自适应范围选择的系统框架。 我们引入了一个合成数据生成器(SDG),它能够捕捉客户端数据中的基本时间结构,包括自回归依赖性、季节性和趋势,同时结合客户端特定的异质性。 在此模型的基础上,我们定义了一个将时间序列窗口映射到具有明确几何和统计特性的内在表示空间的变换。 然后,我们推导了预测损失的分解,其中包括一个反映不可约不确定性的贝叶斯项,以及一个考虑有限样本效应和模型容量限制的近似项。 我们的分析表明,虽然增加回溯范围可以提高确定性模式的可识别性,但由于模型复杂度的增加和样本效率的降低,也会增加近似误差。 我们证明了总预测损失在不可约损失开始饱和的最小范围内达到最小值,而近似损失继续上升。 这项工作为联邦学习中的时间序列预测提供了自适应范围选择的严格理论基础。
Selecting an appropriate look-back horizon remains a fundamental challenge in time series forecasting (TSF), particularly in the federated learning scenarios where data is decentralized, heterogeneous, and often non-independent. While recent work has explored horizon selection by preserving forecasting-relevant information in an intrinsic space, these approaches are primarily restricted to centralized and independently distributed settings. This paper presents a principled framework for adaptive horizon selection in federated time series forecasting through an intrinsic space formulation. We introduce a synthetic data generator (SDG) that captures essential temporal structures in client data, including autoregressive dependencies, seasonality, and trend, while incorporating client-specific heterogeneity. Building on this model, we define a transformation that maps time series windows into an intrinsic representation space with well-defined geometric and statistical properties. We then derive a decomposition of the forecasting loss into a Bayesian term, which reflects irreducible uncertainty, and an approximation term, which accounts for finite-sample effects and limited model capacity. Our analysis shows that while increasing the look-back horizon improves the identifiability of deterministic patterns, it also increases approximation error due to higher model complexity and reduced sample efficiency. We prove that the total forecasting loss is minimized at the smallest horizon where the irreducible loss starts to saturate, while the approximation loss continues to rise. This work provides a rigorous theoretical foundation for adaptive horizon selection for time series forecasting in federated learning.
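The selection rule proved above (the smallest horizon at which the loss curve saturates) suggests a simple empirical recipe: sweep horizons and return the first one whose validation loss is within a tolerance of the best observed value. A sketch on a synthetic loss curve, with the curve shape and tolerance chosen arbitrarily.

```python
import numpy as np

def select_horizon(horizons, val_losses, tol=0.01):
    """Smallest horizon whose loss is within tol of the best observed loss."""
    best = np.min(val_losses)
    for h, loss in zip(horizons, val_losses):
        if loss <= best * (1 + tol):
            return h
    return horizons[-1]

# Synthetic curve: a saturating Bayes term plus a growing approximation term.
horizons = np.arange(8, 257, 8)
losses = 1.0 / (1 + 0.2 * horizons) + 0.002 * horizons  # illustrative shape
print(select_horizon(horizons, losses))  # picks the knee of the curve
```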
- [80] arXiv:2511.10446 (替换) [中文pdf, pdf, html, 其他]
-
标题: 连续体Dropout用于神经微分方程标题: Continuum Dropout for Neural Differential Equations 作者: Vivien van Veldhuizen, Vanessa Botha, Chunyao Lu, Melis Erdal Cesur, Kevin Groot Lipman, Edwin D. de Jong, Hugo Horlings, Clárisa I. Sanchez, Cees G. M. Snoek, Lodewyk Wessels, Ritse Mann, Eric Marcus, Jonas Teuwen期刊参考: AAAI 2026主题: 机器学习 (stat.ML) ; 机器学习 (cs.LG)
神经微分方程(NDEs)在建模连续时间动态方面表现出色,能够有效处理不规则观测、缺失值和噪声等挑战。 尽管具有优势,NDEs 在采用 dropout 这一深度学习正则化的基石技术时面临根本性挑战,使其容易过拟合。 为解决这一研究空白,我们引入了连续体 dropout,这是一种基于交替更新过程理论的通用 NDE 正则化技术。 连续体 dropout 将 dropout 的开关机制形式化为一个在连续时间中在活跃(演化)和非活跃(暂停)状态之间交替的随机过程。 这为防止过拟合、增强 NDE 的泛化能力提供了一种有原则的方法。 此外,连续体 dropout 提供了一个结构化的框架,通过测试时的蒙特卡洛采样来量化预测不确定性。 通过广泛的实验,我们证明连续体 dropout 超过了现有的 NDE 正则化方法,在各种时间序列和图像分类任务中取得了更优的性能。 它还产生了更校准且更可信的概率估计,突显了其在不确定性感知建模中的有效性。
Neural Differential Equations (NDEs) excel at modeling continuous-time dynamics, effectively handling challenges such as irregular observations, missing values, and noise. Despite their advantages, NDEs face a fundamental challenge in adopting dropout, a cornerstone of deep learning regularization, making them susceptible to overfitting. To address this research gap, we introduce Continuum Dropout, a universally applicable regularization technique for NDEs built upon the theory of alternating renewal processes. Continuum Dropout formulates the on-off mechanism of dropout as a stochastic process that alternates between active (evolution) and inactive (paused) states in continuous time. This provides a principled approach to prevent overfitting and enhance the generalization capabilities of NDEs. Moreover, Continuum Dropout offers a structured framework to quantify predictive uncertainty via Monte Carlo sampling at test time. Through extensive experiments, we demonstrate that Continuum Dropout outperforms existing regularization methods for NDEs, achieving superior performance on various time series and image classification tasks. It also yields better-calibrated and more trustworthy probability estimates, highlighting its effectiveness for uncertainty-aware modeling.
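The alternating-renewal formulation above can be simulated directly: draw exponential sojourn times for the active and paused states, build an on/off mask over the integration grid, and let the vector field evolve only while the mask is on. A numpy sketch with a toy scalar ODE; the rates, grid, and dynamics are all assumptions, not the paper's setup.

```python
import numpy as np

def renewal_mask(t_grid, mean_on, mean_off, rng):
    """On/off mask from an alternating renewal process: exponential sojourn
    times with the given mean durations for the active and paused states."""
    mask, t, i, active = np.zeros_like(t_grid), 0.0, 0, True
    while t < t_grid[-1]:
        dur = rng.exponential(mean_on if active else mean_off)
        while i < len(t_grid) and t_grid[i] < t + dur:
            mask[i] = float(active)
            i += 1
        t += dur
        active = not active
    return mask

# Euler integration of dh/dt = m(t) * f(h): evolution pauses while m(t) = 0.
rng = np.random.default_rng(0)
ts = np.linspace(0.0, 1.0, 200)
m = renewal_mask(ts, mean_on=0.1, mean_off=0.1, rng=rng)
h, dt = 1.0, ts[1] - ts[0]
for k in range(len(ts) - 1):
    h += dt * m[k] * (-2.0 * h)  # toy vector field f(h) = -2h
print(round(h, 4))
```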
- [81] arXiv:2503.10249 (替换) [中文pdf, pdf, html, 其他]
-
标题: 等变覆盖同伦性质标题: The equivariant covering homotopy property评论: 9页主题: 代数拓扑 (math.AT)
在本文中,我们解释了广义等变丛的更一般的背景如何允许对ECHP进行简单的归纳证明。 我们还明确说明了ECHP与Hurewicz纤维理论之间的联系。
In this paper, we explain how the more general context of generalised equivariant bundles allows for a simple inductive proof of the ECHP. We also make clear the link between the ECHP and the theory of Hurewicz fibrations.
- [82] arXiv:2412.16661 (替换) [中文pdf, pdf, html, 其他]
-
标题: 动态和资源受限的DFRC系统中的DoA估计自适应波束扩展标题: Adaptive Beam Broadening for DoA Estimation in Dynamic and Resource-Constrained DFRC Systems评论: 10页,14图 1)修改后的系统模型,包含抽象DFRC调度器共享射频链路 2)重新组织文本/描述 3)根据标准参数设置重新生成图表,使用标准颜色主题: 信号处理 (eess.SP)
双功能雷达通信(DFRC)系统通过共享频谱、硬件和射频(RF)链,结合雷达和通信功能。 在本工作中,我们考虑一个概念性的DFRC调度器模型,该模型在雷达和通信功能之间共享RF链。 如果此类调度器被调整为优先考虑通信性能,则分配给雷达的RF链和时间会减少且随时间变化。 我们提出了一种实用的、低延迟且资源感知的技术,通过利用时间切片波束分配和自适应窗口技术,在这种情况下对整个视场(FOV)进行感知和到达方向(DoA)估计。 这导致在FOV上具有平衡的累积阵列因子,从而确保更好的DoA估计可靠性。 大量仿真研究显示,该技术在所有方向上都具有一致的目标检测和角度估计性能,并能随时间适应不同的资源可用性。
Dual-function radar communication (DFRC) systems incorporate both radar and communication functions by sharing spectrum, hardware and radio frequency (RF) chains. In this work, we consider a conceptual DFRC scheduler model which shares RF chains between radar and communication functions. If such a scheduler is tuned for prioritizing communication performance, the RF chains and time allocated to radar are less and varying. We propose a practical, low-latency and resource-aware technique for sensing the entire field-of-view (FOV) and Direction-of-Arrival (DoA) estimation in such settings by leveraging time-sliced beam allocation along with adaptive windowing. This results in a balanced cumulative array factor over the FOV thereby ensuring better DoA estimation reliability. Extensive simulation studies show that the technique has consistent target detection and angle estimation performance in all directions and adapts to varying resource availability with time.
- [83] arXiv:2511.07991 (替换) [中文pdf, pdf, 其他]
-
标题: VSPO:通过基于大语言模型的CQ生成验证本体中的语义陷阱标题: VSPO: Validating Semantic Pitfalls in Ontology via LLM-Based CQ Generation评论: 被AAAI 2026接收为口头报告主题: 人工智能 (cs.AI)
能力问题(CQs)在验证本体设计中起着至关重要的作用。 虽然手动编写CQs对于本体工程师来说可能非常耗时且成本高昂,但最近的研究已经探索了使用大型语言模型(LLMs)来自动化这一过程。 然而,之前的方法大多基于生成的CQs与现有数据集的相似性进行评估,这通常无法验证诸如“错误使用allValuesFrom”之类的语义陷阱。 由于这些陷阱无法通过基于规则的方法可靠地检测到,我们提出了一个新颖的数据集和模型,即本体语义陷阱验证(VSPO),专门设计用于在CQ生成中验证此类语义陷阱。 为了模拟缺失和误用的公理,我们使用LLMs生成类和属性的自然语言定义,并通过删除公理或更改逻辑运算符(例如,用交集替换并集)在定义和本体之间引入不一致。 然后,我们对LLaMA-3.1-8B-Instruct进行微调,以生成验证所提供定义与相应公理之间语义差异的CQs。 与现有的公共数据集相比,生成的CQs能够检测更广泛的建模错误。 我们的微调模型在生成用于陷阱验证的CQs方面表现出优于基线模型的性能,其精度比GPT-4.1高26%,召回率高28.2%。 这项研究实现了使用LLMs自动生成TBox验证的CQs,显著减少了人工努力,同时提高了本体与专家知识之间的语义对齐。 据我们所知,这是首次使用LLMs针对CQ生成中的语义陷阱验证进行的研究。
Competency Questions (CQs) play a crucial role in validating ontology design. While manually crafting CQs can be highly time-consuming and costly for ontology engineers, recent studies have explored the use of large language models (LLMs) to automate this process. However, prior approaches have largely evaluated generated CQs based on their similarity to existing datasets, which often fail to verify semantic pitfalls such as "Misusing allValuesFrom". Since such pitfalls cannot be reliably detected through rule-based methods, we propose a novel dataset and model of Validating Semantic Pitfalls in Ontology (VSPO) for CQ generation specifically designed to verify the semantic pitfalls. To simulate missing and misused axioms, we use LLMs to generate natural language definitions of classes and properties and introduce misalignments between the definitions and the ontology by removing axioms or altering logical operators (e.g., substituting union with intersection). We then fine-tune LLaMA-3.1-8B-Instruct to generate CQs that validate these semantic discrepancies between the provided definitions and the corresponding axioms. The resulting CQs can detect a broader range of modeling errors compared to existing public datasets. Our fine-tuned model demonstrates superior performance over baselines, showing 26% higher precision and 28.2% higher recall than GPT-4.1 in generating CQs for pitfall validation. This research enables automatic generation of TBox-validating CQs using LLMs, significantly reducing manual effort while improving semantic alignment between ontologies and expert knowledge. To the best of our knowledge, this is the first study to target semantic pitfall validation in CQ generation using LLMs.
- [84] arXiv:2510.17474 (替换) [中文pdf, pdf, html, 其他]
-
标题: 并非所有深度伪造都相同:对音频伪造进行优先处理以实现稳健的深度伪造歌手识别标题: Not All Deepfakes Are Created Equal: Triaging Audio Forgeries for Robust Deepfake Singer Identification评论: 已被接受在NeurIPS 2025生成与保护性人工智能用于内容创作研讨会(非归档)上展示主题: 声音 (cs.SD)
高度逼真的歌唱语音深度伪造的泛滥给保护艺术家形象和内容真实性带来了重大挑战。 在语音深度伪造中自动识别歌手是一种有前景的途径,使艺术家和权利持有人能够对抗未经授权使用其声音,但仍然是一个开放的研究问题。 基于最有害的深度伪造是最高质量的这一前提,我们引入了一个两阶段的流程来识别歌手的语音相似性。 它首先使用一个判别器模型来过滤掉那些无法准确再现语音相似性的低质量伪造品。 随后的模型仅在真实录音上进行训练,用于识别剩余高质量深度伪造和真实音频中的歌手。 实验表明,该系统在真实和合成内容上始终优于现有的基线方法。
The proliferation of highly realistic singing voice deepfakes presents a significant challenge to protecting artist likeness and content authenticity. Automatic singer identification in vocal deepfakes is a promising avenue for artists and rights holders to defend against unauthorized use of their voice, but remains an open research problem. Based on the premise that the most harmful deepfakes are those of the highest quality, we introduce a two-stage pipeline to identify a singer's vocal likeness. It first employs a discriminator model to filter out low-quality forgeries that fail to accurately reproduce vocal likeness. A subsequent model, trained exclusively on authentic recordings, identifies the singer in the remaining high-quality deepfakes and authentic audio. Experiments show that this system consistently outperforms existing baselines on both authentic and synthetic content.
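The two-stage triage-then-identify pipeline above is easy to prototype with scikit-learn stand-ins: a quality discriminator gates which clips reach a singer classifier trained on authentic audio only. The embeddings, labels, and both model choices below are placeholders, not the paper's models.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Stage 1: discriminator for forgery quality (labels assumed: 1 = high quality).
X_q, y_q = rng.standard_normal((500, 64)), rng.integers(0, 2, 500)
quality_filter = RandomForestClassifier(random_state=0).fit(X_q, y_q)

# Stage 2: singer classifier trained on authentic recordings only (synthetic).
X_s, y_s = rng.standard_normal((500, 64)), rng.integers(0, 10, 500)
singer_clf = LogisticRegression(max_iter=1000).fit(X_s, y_s)

def identify(embeddings: np.ndarray) -> np.ndarray:
    """Return singer IDs for high-quality items, -1 for triaged-out ones."""
    keep = quality_filter.predict(embeddings).astype(bool)
    out = np.full(len(embeddings), -1)
    out[keep] = singer_clf.predict(embeddings[keep])
    return out

print(identify(rng.standard_normal((5, 64))))
```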
- [85] arXiv:2507.15487 (替换) [中文pdf, pdf, 其他]
-
标题: DeSamba:用于3D多序列MRI病灶分类的解耦频谱自适应框架标题: DeSamba: Decoupled Spectral Adaptive Framework for 3D Multi-Sequence MRI Lesion Classification评论: 我们的稿件需要进一步的实验工作,且数据集无法公开提供;因此,我们恳请撤回该论文主题: 图像与视频处理 (eess.IV) ; 计算机视觉与模式识别 (cs.CV)
磁共振成像(MRI)序列提供丰富的空间和频率域信息,这对于医学影像中准确的病灶分类至关重要。 然而,有效地整合多序列MRI数据以实现稳健的3D病灶分类仍然是一个挑战。 在本文中,我们提出了DeSamba(解耦频谱自适应网络和基于Mamba的模型),一种新的框架,旨在提取解耦表示并自适应融合空间和频谱特征用于病灶分类。 DeSamba引入了一个解耦表示学习模块(DRLM),通过自重构和交叉重构从不同MRI序列中解耦特征,并在提出的SAMNet中引入了一个频谱自适应调制块(SAMB),根据病灶特征实现频谱和空间信息的动态融合。 我们在两个临床上相关的3D数据集上评估了DeSamba。 在一个六类脊柱转移数据集(n=1,448)上,DeSamba在外部验证集(n=372)上实现了62.10%的Top-1准确率、63.62%的F1分数、87.71%的AUC和93.55%的Top-3准确率,优于所有最先进的(SOTA)基线方法。 在一个涉及具有挑战性的二分类任务的脊椎炎数据集(n=251)上,DeSamba在内部和外部验证集上的准确率分别为70.00%/64.52%,AUC分别为74.75/73.88。 消融研究显示,DRLM和SAMB对整体性能都有显著贡献,与基线相比相对改进超过10%。 我们的结果突显了DeSamba作为多序列医学影像中3D病灶分类的一种可推广且有效的解决方案的潜力。
Magnetic Resonance Imaging (MRI) sequences provide rich spatial and frequency domain information, which is crucial for accurate lesion classification in medical imaging. However, effectively integrating multi-sequence MRI data for robust 3D lesion classification remains a challenge. In this paper, we propose DeSamba (Decoupled Spectral Adaptive Network and Mamba-Based Model), a novel framework designed to extract decoupled representations and adaptively fuse spatial and spectral features for lesion classification. DeSamba introduces a Decoupled Representation Learning Module (DRLM) that decouples features from different MRI sequences through self-reconstruction and cross-reconstruction, and a Spectral Adaptive Modulation Block (SAMB) within the proposed SAMNet, enabling dynamic fusion of spectral and spatial information based on lesion characteristics. We evaluate DeSamba on two clinically relevant 3D datasets. On a six-class spinal metastasis dataset (n=1,448), DeSamba achieves 62.10% Top-1 accuracy, 63.62% F1-score, 87.71% AUC, and 93.55% Top-3 accuracy on an external validation set (n=372), outperforming all state-of-the-art (SOTA) baselines. On a spondylitis dataset (n=251) involving a challenging binary classification task, DeSamba achieves 70.00%/64.52% accuracy and 74.75/73.88 AUC on internal and external validation sets, respectively. Ablation studies demonstrate that both DRLM and SAMB significantly contribute to overall performance, with over 10% relative improvement compared to the baseline. Our results highlight the potential of DeSamba as a generalizable and effective solution for 3D lesion classification in multi-sequence medical imaging.
- [86] arXiv:2503.01504 (替换) [中文pdf, pdf, html, 其他]
-
标题: 非相干多天线瑞利块衰落信道在有限码长下的研究标题: On Noncoherent Multiple-Antenna Rayleigh Block-Fading Channels at Finite Blocklength主题: 信息论 (cs.IT)
本文研究了在非相干、多输入多输出(MIMO)瑞利块衰落信道上,使用给定长度的纠错码,且块错误概率不超过给定值时,数据可以传输的最大编码速率。 推导出了一种高信噪比的正态近似,当信噪比(SNR)和我们进行编码的相干间隔数趋于无穷大时,该近似变得准确。 获得的正态近似补充了文献中出现的非渐近界,但这些界的计算评估较为复杂。 它进一步为在有限块长度和有限信噪比下,分集、复用和信道估计成本之间的基本权衡的分析提供了理论基础。
This paper investigates the maximum coding rate at which data can be transmitted over a noncoherent, multiple-input, multiple-output (MIMO) Rayleigh block-fading channel using an error-correcting code of a given blocklength with a block-error probability not exceeding a given value. A high-SNR normal approximation is derived that becomes accurate as the signal-to-noise ratio (SNR) and the number of coherence intervals over which we code tend to infinity. The obtained normal approximation complements the nonasymptotic bounds that have appeared in the literature, but whose evaluation is computationally demanding. It further lays the theoretical foundation for an analytical analysis of the fundamental tradeoff between diversity, multiplexing, and channel-estimation cost at finite blocklength and finite SNR.
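Normal approximations of this kind share the generic shape R(n, eps) ~ C - sqrt(V/n) * Qinv(eps), with capacity C and channel dispersion V. The sketch below evaluates that template with placeholder C and V; the paper's contribution is deriving the correct high-SNR quantities for the noncoherent MIMO block-fading channel, which are not reproduced here.

```python
import numpy as np
from scipy.stats import norm

def normal_approx_rate(capacity, dispersion, n, eps):
    """R(n, eps) ~ C - sqrt(V/n) * Qinv(eps), in the same units as C."""
    return capacity - np.sqrt(dispersion / n) * norm.isf(eps)

# Placeholder values for illustration only (not the paper's C and V):
C, V = 4.0, 2.5  # bits per channel use, bits^2
for n in (100, 500, 2000):
    print(n, round(normal_approx_rate(C, V, n, 1e-3), 3))
```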
- [87] arXiv:2511.14680 (替换) [中文pdf, pdf, html, 其他]
-
标题: NERD:用于三维计算机断层扫描的网络正则化扩散采样标题: NERD: Network-Regularized Diffusion Sampling For 3D Computed Tomography期刊参考: CAMSAP2025主题: 图像与视频处理 (eess.IV)
基于扩散模型(DM)的方法已被提出用于解决逆成像问题。 其中,最近的一类工作通过将采样公式化为一种优化过程,该过程强制执行测量一致性、前向扩散一致性和逐步及反向扩散一致性,展示了强大的性能。 然而,这些方法仅考虑了二维重建任务,并不能直接扩展到三维图像重建问题,例如在计算机断层扫描(CT)中。 为了弥补这一差距,我们通过将L1正则化引入优化目标,提出了用于三维CT的网络正则化扩散采样(NERD)。 这种正则化器鼓励相邻切片之间的空间连续性,减少切片间伪影并促进连贯的体积重建。 此外,我们引入了两种高效的优化策略来解决产生的目标:一种基于交替方向乘子法(ADMM),另一种基于原始-对偶混合梯度(PDHG)方法。 在医学三维CT数据上的实验表明,我们的方法实现了最先进的或极具竞争力的结果。
Numerous diffusion model (DM)-based methods have been proposed for solving inverse imaging problems. Among these, a recent line of work has demonstrated strong performance by formulating sampling as an optimization procedure that enforces measurement consistency, forward diffusion consistency, and both step-wise and backward diffusion consistency. However, these methods have only considered 2D reconstruction tasks and do not directly extend to 3D image reconstruction problems, such as in Computed Tomography (CT). To bridge this gap, we propose NEtwork-Regularized diffusion sampling for 3D CT (NERD) by incorporating an L1 regularization into the optimization objective. This regularizer encourages spatial continuity across adjacent slices, reducing inter-slice artifacts and promoting coherent volumetric reconstructions. Additionally, we introduce two efficient optimization strategies to solve the resulting objective: one based on the Alternating Direction Method of Multipliers (ADMM) and another based on the Primal-Dual Hybrid Gradient (PDHG) method. Experiments on medical 3D CT data demonstrate that our approach achieves either state-of-the-art or highly competitive results.
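Within ADMM, an L1 penalty on inter-slice differences yields a closed-form z-update: soft-thresholding of the slice differences plus the scaled dual variable. The sketch below shows only that step, with assumed variable shapes; the full NERD objective also carries the measurement and diffusion consistency terms.

```python
import numpy as np

def soft_threshold(v: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * ||.||_1, applied elementwise."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def admm_z_update(volume, dual, lam, rho):
    """z-update for the split z = D_z x (differences of adjacent slices):
    z = soft(D_z x + u, lam / rho)."""
    dz = np.diff(volume, axis=0)  # inter-slice differences along z
    return soft_threshold(dz + dual, lam / rho)

vol = np.random.default_rng(0).standard_normal((16, 32, 32))  # z, h, w
u = np.zeros((15, 32, 32))
print(admm_z_update(vol, u, lam=0.1, rho=1.0).shape)  # (15, 32, 32)
```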
- [88] arXiv:2502.07339 (替换) [中文pdf, pdf, html, 其他]
-
标题: 爪自由图的叶数和分支顶点较少的生成树标题: Spanning trees of claw-free graphs with few leaves and branch vertices评论: arXiv管理员注释:与arXiv:2201.01043文本重叠主题: 组合数学 (math.CO)
设$T$是一棵树。 度为一的顶点是$T$的\emph{叶子},度至少为三的顶点是$T$的\emph{分支顶点}。 如果一个图不包含$K_{1,3}$作为导出子图,则该图被称为无爪图。 在本文中,我们研究无爪图中叶子数与分支顶点数有界的生成树。 应用主要结果,我们还对无爪图情形下分支顶点较少的生成树的已有结果给出了若干改进。
Let $T$ be a tree. A vertex of degree one is a \emph{leaf} of $T$ and a vertex of degree at least three is a \emph{branch vertex} of $T$. A graph is said to be claw-free if it does not contain $K_{1,3}$ as an induced subgraph. In this paper, we study the spanning trees with a bounded number of leaves and branch vertices of claw-free graphs. Applying the main results, we also give some improvements of previous results on the spanning trees with few branch vertices for the case of claw-free graphs.
- [89] arXiv:2508.18708 (替换) [中文pdf, pdf, html, 其他]
-
标题: 技能对齐的公平性在医疗协作中的多智能体学习标题: Skill-Aligned Fairness in Multi-Agent Learning for Collaboration in Healthcare主题: 多智能体系统 (cs.MA) ; 人工智能 (cs.AI) ; 机器学习 (cs.LG)
在多智能体强化学习(MARL)中,公平性通常被描述为工作量平衡问题,而忽略了智能体的专业技能以及现实世界领域中所需的结构化协调。 在医疗保健领域,公平的任务分配需要工作量平衡或专业技能对齐,以防止职业倦怠和高技能智能体的过度使用。 工作量平衡指的是无论医疗工作者的专业技能如何,都将其分配大约相等数量的子任务或均衡的努力。 我们做出了两项贡献来解决这个问题。 首先,我们提出了FairSkillMARL,这是一个将公平性定义为工作量平衡和技能-任务对齐双重目标的框架。 其次,我们引入了MARLHospital,这是一个可定制的医疗保健启发式环境,用于建模团队组成以及能源约束调度对公平性的影响,因为目前没有现有的模拟器适合解决这个问题。 我们进行了实验,将FairSkillMARL与四种标准的MARL方法以及两种最先进的公平性度量进行比较。 我们的结果表明,仅基于平等工作量的公平性可能导致任务与技能不匹配,并强调了需要更稳健的度量来捕捉技能-任务不匹配的情况。 我们的工作为研究异构多智能体系统中的公平性提供了工具和基础,其中将努力与专业技能对齐至关重要。
Fairness in multi-agent reinforcement learning (MARL) is often framed as a workload balance problem, overlooking agent expertise and the structured coordination required in real-world domains. In healthcare, equitable task allocation requires workload balance or expertise alignment to prevent burnout and overuse of highly skilled agents. Workload balance refers to distributing an approximately equal number of subtasks or equalised effort across healthcare workers, regardless of their expertise. We make two contributions to address this problem. First, we propose FairSkillMARL, a framework that defines fairness as the dual objective of workload balance and skill-task alignment. Second, we introduce MARLHospital, a customizable healthcare-inspired environment for modeling team compositions and energy-constrained scheduling impacts on fairness, as no existing simulators are well-suited for this problem. We conducted experiments to compare FairSkillMARL in conjunction with four standard MARL methods, and against two state-of-the-art fairness metrics. Our results suggest that fairness based solely on equal workload might lead to task-skill mismatches and highlight the need for more robust metrics that capture skill-task misalignment. Our work provides tools and a foundation for studying fairness in heterogeneous multi-agent systems where aligning effort with expertise is critical.
- [90] arXiv:2508.03411 (替换) [中文pdf, pdf, html, 其他]
-
标题: SlotMatch:用于无监督视频分割的物体中心表示知识蒸馏标题: SlotMatch: Distilling Object-Centric Representations for Unsupervised Video Segmentation主题: 计算机视觉与模式识别 (cs.CV) ; 人工智能 (cs.AI)
无监督视频分割是一项具有挑战性的计算机视觉任务,特别是由于缺乏监督信号以及视觉场景的复杂性。 为了克服这一挑战,基于槽注意力的最先进模型往往不得不依赖于大型且计算成本高昂的神经架构。 为此,我们提出了一种简单的知识蒸馏框架,能够有效地将以对象为中心的表示传递给轻量级的学生模型。 该提出的框架称为SlotMatch,通过余弦相似度对齐相应的教师和学生槽,不需要额外的蒸馏目标或辅助监督。 SlotMatch的简单性通过理论和实证证据得到确认,两者均表明集成额外损失是多余的。 我们在三个数据集上进行实验,将最先进的教师模型SlotContrast与我们的蒸馏学生模型进行比较。 结果表明,我们的学生模型在使用3.6倍更少的参数并运行速度高达2.7倍的情况下,能够匹配甚至超越其教师模型。 此外,我们的学生模型超过了所有其他最先进的无监督视频分割模型。
Unsupervised video segmentation is a challenging computer vision task, especially due to the lack of supervisory signals coupled with the complexity of visual scenes. To overcome this challenge, state-of-the-art models based on slot attention often have to rely on large and computationally expensive neural architectures. To this end, we propose a simple knowledge distillation framework that effectively transfers object-centric representations to a lightweight student. The proposed framework, called SlotMatch, aligns corresponding teacher and student slots via the cosine similarity, requiring no additional distillation objectives or auxiliary supervision. The simplicity of SlotMatch is confirmed via theoretical and empirical evidence, both indicating that integrating additional losses is redundant. We conduct experiments on three datasets to compare the state-of-the-art teacher model, SlotContrast, with our distilled student. The results show that our student based on SlotMatch matches and even outperforms its teacher, while using 3.6x less parameters and running up to 2.7x faster. Moreover, our student surpasses all other state-of-the-art unsupervised video segmentation models.
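The distillation objective described above is compact enough to state in a few lines: one minus the cosine similarity between corresponding slots, averaged over slots and batch. The sketch assumes teacher and student slots are already index-aligned, which is how we read the abstract; gradient flows only into the student.

```python
import torch
import torch.nn.functional as F

def slotmatch_loss(student_slots, teacher_slots):
    """Align corresponding student/teacher slots via cosine similarity.
    Both tensors: (batch, num_slots, dim); slot order assumed matched."""
    cos = F.cosine_similarity(student_slots, teacher_slots.detach(), dim=-1)
    return (1.0 - cos).mean()

s = torch.randn(4, 7, 64, requires_grad=True)  # student slots (toy)
t = torch.randn(4, 7, 64)                      # teacher slots (toy)
loss = slotmatch_loss(s, t)
loss.backward()  # gradients reach the student only
print(float(loss))
```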
- [91] arXiv:2507.03779 (替换) [中文pdf, pdf, html, 其他]
-
标题: FastDINOv2:基于频率的课程学习提高鲁棒性和训练速度标题: FastDINOv2: Frequency Based Curriculum Learning Improves Robustness and Training Speed评论: 被第39届神经信息处理系统大会(NeurIPS 2025)接受主题: 计算机视觉与模式识别 (cs.CV) ; 人工智能 (cs.AI) ; 机器学习 (cs.LG)
大规模视觉基础模型如DINOv2通过利用庞大的架构和训练数据集表现出色。 但许多场景要求从业者复制这些预训练解决方案,例如在私有数据、新模态上,或者仅仅是为了科学探究——这目前在计算上非常耗费资源。 因此,我们为DINOv2提出了一种新的预训练策略,该策略同时加速收敛,并作为副产品增强对常见损坏的鲁棒性。 我们的方法涉及一个频率过滤课程——先看到低频内容——以及高斯噪声补丁增强。 应用于在ImageNet-1K上训练的ViT-B/16主干网络时,预训练时间和FLOPs分别减少了1.6倍和2.25倍,我们的方法在损坏基准(ImageNet-C)中仍实现了相当的鲁棒性,并且与基线相比保持了具有竞争力的线性探测性能。 这种效率和鲁棒性的双重优势使大规模自监督基础建模更加可行,同时为围绕数据课程和增强手段以提高自监督学习模型鲁棒性的新探索打开了大门。 代码可在https://github.com/KevinZ0217/fast_dinov2获取。
Large-scale vision foundation models such as DINOv2 boast impressive performances by leveraging massive architectures and training datasets. But numerous scenarios require practitioners to reproduce those pre-training solutions, such as on private data, new modalities, or simply for scientific questioning--which is currently extremely demanding computation-wise. We thus propose a novel pre-training strategy for DINOv2 that simultaneously accelerates convergence--and strengthens robustness to common corruptions as a by-product. Our approach involves a frequency filtering curriculum--low-frequency being seen first--and the Gaussian noise patching augmentation. Applied to a ViT-B/16 backbone trained on ImageNet-1K, while pre-training time and FLOPs are reduced by 1.6x and 2.25x, our method still achieves matching robustness in corruption benchmarks (ImageNet-C) and maintains competitive linear probing performance compared with baseline. This dual benefit of efficiency and robustness makes large-scale self-supervised foundation modeling more attainable, while opening the door to novel exploration around data curriculum and augmentation as means to improve self-supervised learning models robustness. The code is available at https://github.com/KevinZ0217/fast_dinov2
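A low-frequency-first curriculum like the one above can be implemented with an FFT mask whose radius grows over training. The sketch below low-passes a batch of images; the schedule and cutoff radii are invented, and all DINOv2-specific training details are omitted.

```python
import torch

def low_pass(images: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Zero out all but the lowest spatial frequencies of a batch of images."""
    f = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    h, w = images.shape[-2:]
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    radius = keep_ratio * min(h, w) / 2.0
    mask = (((yy - h / 2) ** 2 + (xx - w / 2) ** 2) <= radius ** 2).float()
    f = f * mask  # broadcasts over batch and channel dimensions
    return torch.fft.ifft2(torch.fft.ifftshift(f, dim=(-2, -1))).real

imgs = torch.randn(8, 3, 224, 224)
for ratio in (0.1, 0.25, 0.5, 1.0):  # assumed curriculum: low frequencies first
    batch = low_pass(imgs, ratio)
print(batch.shape)  # torch.Size([8, 3, 224, 224])
```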
- [92] arXiv:2510.06009 (替换) [中文pdf, pdf, html, 其他]
-
标题: 通过改进的图像-文本对齐的持续学习用于图像描述标题: Continual Learning for Image Captioning through Improved Image-Text Alignment评论: 11页,3图主题: 计算机视觉与模式识别 (cs.CV)
生成准确且连贯的图像字幕在持续学习设置中仍然是一个重大挑战,这是由于灾难性遗忘以及随着时间推移将不断演变的视觉概念与语言对齐的困难。 在本工作中,我们提出了一种用于持续图像字幕的新多损失框架,该框架通过基于提示的持续学习整合语义指导和对比对齐。 基于预训练的ViT-GPT-2骨干网络,我们的方法结合了标准的交叉熵损失与三个附加组件:(1) 一种基于提示的余弦相似度损失,将图像嵌入与合成构建的编码对象、属性和动作的提示对齐;(2) 一种类似CLIP的损失,促进图像嵌入与目标字幕嵌入之间的对齐;以及(3) 一种语言引导的对比损失,采用三元组损失来增强任务之间的类别级可区分性。 值得注意的是,我们的方法在推理时没有额外的开销,并且在生成字幕时不需要提示。 我们发现,这种方法减轻了灾难性遗忘,同时相比最先进的方法实现了更好的语义字幕对齐。 代码可通过以下链接找到:https://github.com/Gepardius/Taetz_Bordelius_Continual_ImageCaptioning.
Generating accurate and coherent image captions in a continual learning setting remains a major challenge due to catastrophic forgetting and the difficulty of aligning evolving visual concepts with language over time. In this work, we propose a novel multi-loss framework for continual image captioning that integrates semantic guidance through prompt-based continual learning and contrastive alignment. Built upon a pretrained ViT-GPT-2 backbone, our approach combines standard cross-entropy loss with three additional components: (1) a prompt-based cosine similarity loss that aligns image embeddings with synthetically constructed prompts encoding objects, attributes, and actions; (2) a CLIP-style loss that promotes alignment between image embeddings and target caption embedding; and (3) a language-guided contrastive loss that employs a triplet loss to enhance class-level discriminability between tasks. Notably, our approach introduces no additional overhead at inference time and requires no prompts during caption generation. We find that this approach mitigates catastrophic forgetting, while achieving better semantic caption alignment compared to state-of-the-art methods. The code can be found via the following link: https://github.com/Gepardius/Taetz_Bordelius_Continual_ImageCaptioning.
- [93] arXiv:2511.14495 (替换) [中文pdf, pdf, html, 其他]
-
标题: 基于对抗学习的指纹定位射频图重建标题: Adversarial Learning-Based Radio Map Reconstruction for Fingerprinting Localization评论: 此作品已提交给IEEE以可能发表主题: 信号处理 (eess.SP)
本快报提出了一种特征引导的对抗框架,即ComGAN,旨在通过推断未测量参考点(RPs)处缺失的接收信号强度(RSS)值来重建不完整的指纹数据库。 一个辅助子网络被集成到条件生成对抗网络(cGAN)中,以实现空间特征学习。 然后开发了一种优化方法,通过聚合多个预测集来改进RSS预测,从而实现更好的定位性能。 实验结果表明,所提出的方案在均方根误差(RMSE)方面与真实测量结果相当,同时优于最先进的重建方法。 当重建的指纹与测量数据结合用于训练时,指纹定位的准确性可与在完全测量数据集上训练的模型相媲美。
This letter presents a feature-guided adversarial framework, namely ComGAN, which is designed to reconstruct an incomplete fingerprint database by inferring missing received signal strength (RSS) values at unmeasured reference points (RPs). An auxiliary subnetwork is integrated into a conditional generative adversarial network (cGAN) to enable spatial feature learning. An optimization method is then developed to refine the RSS predictions by aggregating multiple prediction sets, achieving an improved localization performance. Experimental results demonstrate that the proposed scheme achieves a root mean squared error (RMSE) comparable to the ground-truth measurements while outperforming state-of-the-art reconstruction methods. When the reconstructed fingerprint is combined with measured data for training, the fingerprinting localization achieves accuracy comparable to models trained on fully measured datasets.
- [94] arXiv:2511.09087 (替换) [中文pdf, pdf, html, 其他]
-
标题: Tele-LLM-Hub:构建面向电信网络的上下文感知多智能体LLM系统标题: Tele-LLM-Hub: Building Context-Aware Multi-Agent LLM Systems for Telecom Networks主题: 网络与互联网架构 (cs.NI) ; 人工智能 (cs.AI)
本文介绍了Tele-LLM-Hub,这是一个用户友好的低代码解决方案,用于快速原型设计和部署面向5G及更高级别的上下文感知多智能体(MA)大型语言模型(LLM)系统。 随着电信无线网络变得越来越复杂,智能LLM应用必须共享对网络状态的领域特定理解。 我们提出了TeleMCP,即电信模型上下文协议,以实现在电信环境中的智能体之间结构化且上下文丰富的通信。 Tele-LLM-Hub通过一个低代码界面实现TeleMCP,该界面支持智能体创建、工作流组合以及与srsRAN等软件栈进行交互。 关键组件包括直接聊天界面、预构建系统的存储库、利用我们的RANSTRUCT框架进行微调的Agent Maker,以及用于组合MA工作流的MA-Maker。 Tele-LLM-Hub的目标是使上下文感知MA系统的设计民主化,并加速下一代无线网络的创新。
This paper introduces Tele-LLM-Hub, a user friendly low-code solution for rapid prototyping and deployment of context aware multi-agent (MA) Large Language Model (LLM) systems tailored for 5G and beyond. As telecom wireless networks become increasingly complex, intelligent LLM applications must share a domainspecific understanding of network state. We propose TeleMCP, the Telecom Model Context Protocol, to enable structured and context-rich communication between agents in telecom environments. Tele-LLM-Hub actualizes TeleMCP through a low-code interface that supports agent creation, workflow composition, and interaction with software stacks such as srsRAN. Key components include a direct chat interface, a repository of pre-built systems, an Agent Maker leveraging finetuning with our RANSTRUCT framework, and an MA-Maker for composing MA workflows. The goal of Tele-LLM-Hub is to democratize the design of contextaware MA systems and accelerate innovation in next-generation wireless networks.
- [95] arXiv:2508.01450 (替换) [中文pdf, pdf, html, 其他]
-
标题: 面向高效医疗推理的最小微调数据方法标题: Towards Efficient Medical Reasoning with Minimal Fine-Tuning Data评论: 预印本,正在审稿中主题: 计算与语言 (cs.CL)
监督微调(SFT)在将大型语言模型(LLMs)适应到如医学推理等专业领域中起着关键作用。 然而,现有的SFT实践通常依赖于未过滤的数据集,这些数据集包含冗余和低质量的样本,导致大量的计算成本和次优性能。 尽管现有方法尝试通过基于样本难度(由知识和推理复杂性定义)选择数据来缓解这个问题,但它们忽略了样本在梯度中反映的优化效用。 有趣的是,我们发现仅基于梯度的影响倾向于选择容易优化的样本,这些样本导致大的参数变化但缺乏深度推理链,而仅凭难度则会选择噪声或过于复杂的案例,这些案例无法指导稳定的优化。 基于这一观察,我们提出了一种数据选择策略,即难度-影响象限(DIQ),该策略优先选择高难度高影响的样本,以平衡复杂的临床推理与显著的梯度影响,从而在最少微调数据的情况下实现高效的医学推理。 此外,人类和LLM作为评判者的评估表明,DIQ选择的子集表现出更高的数据质量,并生成的临床推理更符合专家在鉴别诊断、安全检查和证据引用方面的实践,因为DIQ强调那些促进专家式推理模式的样本。 在医学推理基准上的大量实验表明,DIQ使仅在1%的选定数据上微调的模型就能达到全数据集的性能,而使用10%的数据始终优于基线方法,突显了有原则的数据选择优于蛮力扩展的优势。 代码和数据可在https://github.com/mihara-bot/DIQ获取。
Supervised Fine-Tuning (SFT) plays a pivotal role in adapting Large Language Models (LLMs) to specialized domains such as medical reasoning. However, existing SFT practices often rely on unfiltered datasets that contain redundant and low-quality samples, leading to substantial computational costs and suboptimal performance. Although existing methods attempt to alleviate this problem by selecting data based on sample difficulty, defined by knowledge and reasoning complexity, they overlook each sample's optimization utility reflected in its gradient. Interestingly, we find that gradient-based influence alone favors easy-to-optimize samples that cause large parameter shifts but lack deep reasoning chains, while difficulty alone selects noisy or overly complex cases that fail to guide stable optimization. Based on this observation, we propose a data selection strategy, Difficulty-Influence Quadrant (DIQ), which prioritizes samples in the high-difficulty-high-influence quadrant to balance complex clinical reasoning with substantial gradient influence, enabling efficient medical reasoning with minimal fine-tuning data. Furthermore, Human and LLM-as-a-judge evaluations show that DIQ-selected subsets demonstrate higher data quality and generate clinical reasoning that is more aligned with expert practices in differential diagnosis, safety check, and evidence citation, as DIQ emphasizes samples that foster expert-like reasoning patterns. Extensive experiments on medical reasoning benchmarks demonstrate that DIQ enables models fine-tuned on only 1% of selected data to match full-dataset performance, while using 10% consistently outperforms baseline methods, highlighting the superiority of principled data selection over brute-force scaling. The code and data are available at https://github.com/mihara-bot/DIQ.
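The quadrant selection itself is simple to prototype once per-sample difficulty and influence scores exist. In the sketch below, median cuts define the quadrant and the two scores are ranked by their product; both choices are our assumptions, as is the synthetic data, while the paper's scoring definitions are authoritative.

```python
import numpy as np

def diq_select(difficulty: np.ndarray, influence: np.ndarray, frac: float = 0.01):
    """Pick indices in the high-difficulty, high-influence quadrant
    (medians as quadrant boundaries; selection size capped at frac)."""
    quadrant = (difficulty > np.median(difficulty)) & \
               (influence > np.median(influence))
    idx = np.flatnonzero(quadrant)
    # Rank within the quadrant by the product of the two scores (assumed).
    order = np.argsort(-(difficulty[idx] * influence[idx]))
    k = max(1, int(frac * len(difficulty)))
    return idx[order[:k]]

rng = np.random.default_rng(0)
d, g = rng.random(10_000), rng.random(10_000)
print(diq_select(d, g, frac=0.01)[:5])  # indices of the selected 1%
```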
- [96] arXiv:2511.14694 (替换) [中文pdf, pdf, html, 其他]
-
标题: 近无损模型压缩在DNA大语言模型中实现更长上下文推理标题: Near-Lossless Model Compression Enables Longer Context Inference in DNA Large Language Models主题: 基因组学 (q-bio.GN) ; 人工智能 (cs.AI) ; 机器学习 (cs.LG) ; 种群与进化 (q-bio.PE)
基于大规模跨物种DNA语料库训练,DNA大型语言模型(LLMs)学习基因组序列的基本“语法”和进化模式。 这使它们成为DNA序列建模的强大先验,尤其是在长范围内。 然而,两个主要限制阻碍了它们在实际中的使用:自注意力的二次计算成本以及自回归解码过程中键值(KV)缓存所需的不断增长的内存。 这些限制迫使使用启发式方法,如固定窗口截断或滑动窗口,这会通过丢弃远距离信息而损害超长序列的保真度。 我们引入了FOCUS(面向特征的超长自注意力压缩),一个可以插入预训练DNA LLM中的渐进式上下文压缩模块。 FOCUS结合了基因组学中已建立的k-mer表示与可学习的分层压缩:它在k-mer粒度上插入摘要标记,并在多个Transformer层中逐步压缩注意力键和值激活,仅保留窗口间的摘要KV状态,同时丢弃普通标记的KV。 一种共享边界窗口方案产生了一个静态的跨窗口接口,以最小的损失传播长距离信息。 我们在基于Evo-2的DNA LLM上验证了FOCUS,该模型在GRCh38染色体1上进行了自监督训练,并采用了随机压缩计划以提高不同压缩比下的鲁棒性。 在保留的人类染色体上,FOCUS实现了接近无损的保真度:将1 kb的上下文压缩为仅10个摘要标记(约100倍)仅使每核苷酸概率平均变化约0.0004。 与没有压缩的基线相比,FOCUS减少了KV缓存内存,并将有效推理扩展从O(N^2)转换为近线性O(N),在普通商用GPU上实现了约100倍更长的推理窗口,同时保持接近无损的保真度。
Trained on massive cross-species DNA corpora, DNA large language models (LLMs) learn the fundamental "grammar" and evolutionary patterns of genomic sequences. This makes them powerful priors for DNA sequence modeling, particularly over long ranges. However, two major constraints hinder their use in practice: the quadratic computational cost of self-attention and the growing memory required for key-value (KV) caches during autoregressive decoding. These constraints force the use of heuristics such as fixed-window truncation or sliding windows, which compromise fidelity on ultra-long sequences by discarding distant information. We introduce FOCUS (Feature-Oriented Compression for Ultra-long Self-attention), a progressive context-compression module that can be plugged into pretrained DNA LLMs. FOCUS combines the established k-mer representation in genomics with learnable hierarchical compression: it inserts summary tokens at k-mer granularity and progressively compresses attention key and value activations across multiple Transformer layers, retaining only the summary KV states across windows while discarding ordinary-token KV. A shared-boundary windowing scheme yields a stationary cross-window interface that propagates long-range information with minimal loss. We validate FOCUS on an Evo-2-based DNA LLM fine-tuned on GRCh38 chromosome 1 with self-supervised training and randomized compression schedules to promote robustness across compression ratios. On held-out human chromosomes, FOCUS achieves near-lossless fidelity: compressing a 1 kb context into only 10 summary tokens (about 100x) shifts the average per-nucleotide probability by only about 0.0004. Compared to a baseline without compression, FOCUS reduces KV-cache memory and converts effective inference scaling from O(N^2) to near-linear O(N), enabling about 100x longer inference windows on commodity GPUs with near-lossless fidelity.
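The key mechanism is dropping ordinary-token KV while carrying a handful of summary states across windows. A minimal sketch of that compression step follows; mean-pooling stands in for the paper's learned hierarchical compression, and all shapes are assumptions.

```python
# Sketch of window-level KV compression: a (seq, dim) KV cache is
# reduced to n_summary states. Mean-pooling is a stand-in for the
# learned summary tokens FOCUS actually trains.
import torch

def compress_window_kv(k: torch.Tensor, v: torch.Tensor, n_summary: int):
    seq, dim = k.shape
    chunk = seq // n_summary                      # tokens per summary state
    ks = k[: chunk * n_summary].reshape(n_summary, chunk, dim).mean(dim=1)
    vs = v[: chunk * n_summary].reshape(n_summary, chunk, dim).mean(dim=1)
    return ks, vs

# Toy usage: a 1 kb window compressed to 10 summary states (~100x).
k, v = torch.randn(1000, 64), torch.randn(1000, 64)
ks, vs = compress_window_kv(k, v, n_summary=10)
print(ks.shape, vs.shape)  # torch.Size([10, 64]) torch.Size([10, 64])
```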
- [97] arXiv:2402.09697 (replaced) [Chinese pdf, pdf, html, other]
-
Title: On Three-Layer Data Markets
Subjects: Theoretical Economics (econ.TH); Computer Science and Game Theory (cs.GT)
We study a three-layer data market comprising users (data owners), platforms, and a data buyer. Each user benefits from platform services in exchange for data, incurring privacy loss when their data, albeit noisily, is shared with the buyer. The user chooses platforms to share data with, while platforms decide on data noise levels and pricing before selling to the buyer. The buyer selects platforms to purchase data from. We model these interactions via a multi-stage game, focusing on the subgame Nash equilibrium. We find that when the buyer places a high value on user data (and platforms can command high prices), all platforms offer services to the user who joins and shares data with every platform. Conversely, when the buyer's valuation of user data is low, only large platforms with low service costs can afford to serve users. In this scenario, users exclusively join and share data with these low-cost platforms. Interestingly, increased competition benefits the buyer, not the user: as the number of platforms increases, the user utility does not necessarily improve while the buyer utility improves. However, increasing the competition improves the overall utilitarian welfare. Building on our analysis, we then study regulations to improve the user utility. We discover that banning data sharing maximizes user utility only when all platforms are low-cost. In mixed markets of high- and low-cost platforms, users prefer a minimum noise mandate over a sharing ban. Imposing this mandate on high-cost platforms and banning data sharing for low-cost ones further enhances user utility.
- [98] arXiv:2511.13031 (replaced) [Chinese pdf, pdf, html, other]
-
Title: Towards 3D Object-Centric Feature Learning for Semantic Scene Completion
Authors: Sebastiano Mengozzi, Giovanni B. Esposito, Michelangelo Bin, Andrea Acquaviva, Andrea Bartolini, Lorenzo Marconi
Comments: Accepted at AAAI-2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Vision-based 3D Semantic Scene Completion (SSC) has received growing attention due to its potential in autonomous driving. While most existing approaches follow an ego-centric paradigm by aggregating and diffusing features over the entire scene, they often overlook fine-grained object-level details, leading to semantic and geometric ambiguities, especially in complex environments. To address this limitation, we propose Ocean, an object-centric prediction framework that decomposes the scene into individual object instances to enable more accurate semantic occupancy prediction. Specifically, we first employ a lightweight segmentation model, MobileSAM, to extract instance masks from the input image. Then, we introduce a 3D Semantic Group Attention module that leverages linear attention to aggregate object-centric features in 3D space. To handle segmentation errors and missing instances, we further design a Global Similarity-Guided Attention module that leverages segmentation features for global interaction. Finally, we propose an Instance-aware Local Diffusion module that improves instance features through a generative process and subsequently refines the scene representation in the BEV space. Extensive experiments on the SemanticKITTI and SSCBench-KITTI360 benchmarks demonstrate that Ocean achieves state-of-the-art performance, with mIoU scores of 17.40 and 20.28, respectively.
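The 3D Semantic Group Attention module builds on linear attention, which replaces quadratic softmax attention with a kernelized form. Below is a generic sketch of that primitive using the common elu+1 feature map; this is an assumption about the primitive, not Ocean's exact module.

```python
# Generic linear attention: O(n * d * dv) instead of O(n^2).
# The elu+1 positive feature map is a common choice, assumed here.
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """q, k: (n, d); v: (n, dv)."""
    q, k = F.elu(q) + 1.0, F.elu(k) + 1.0       # positive feature maps
    kv = k.T @ v                                # (d, dv) global summary
    z = q @ k.sum(dim=0, keepdim=True).T        # (n, 1) normalizer
    return (q @ kv) / (z + eps)

q, k, v = torch.randn(100, 32), torch.randn(100, 32), torch.randn(100, 64)
print(linear_attention(q, k, v).shape)  # torch.Size([100, 64])
```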
- [99] arXiv:2511.14058 (replaced) [Chinese pdf, pdf, html, other]
-
Title: On the number of small edge-weighted subgraphs
Subjects: Combinatorics (math.CO)
Subgraph counting is a fundamental task that underpins several network analysis methodologies, including community detection and graph two-sample tests. Counting subgraphs is a computationally intensive problem. Substantial research has focused on developing efficient algorithms and strategies to make it feasible for larger unweighted graphs. Implementing those algorithms can be a significant hurdle for data professionals or researchers with limited expertise in algorithmic principles and programming. Furthermore, many real-world networks are weighted. Computing the number of weighted subgraphs in weighted networks presents a computational challenge, as no efficient algorithm exists for the worst-case scenario. In this paper, we derive explicit formulas for counting small edge-weighted subgraphs using the weighted adjacency matrix. These formulas are applicable to unweighted networks, offering a simple and highly practical analytical tool for researchers across various scientific domains. In addition, we introduce a generalized methodology for calculating arbitrary weighted subgraphs.
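To make the flavor of such formulas concrete, the snippet below uses the classical identity that, for a symmetric weighted adjacency matrix W with zero diagonal, trace(W^3)/6 sums the product of edge weights over all triangles and reduces to the plain triangle count for 0/1 matrices. This particular identity is standard and is not taken from the paper, which derives analogous formulas for other small subgraphs.

```python
# Weighted triangle "count": the sum over triangles of the product of
# their edge weights, computed from the weighted adjacency matrix W.
import numpy as np

def weighted_triangle_sum(W: np.ndarray) -> float:
    return float(np.trace(W @ W @ W)) / 6.0

# Unweighted sanity check: the complete graph K4 has exactly 4 triangles.
A = np.ones((4, 4)) - np.eye(4)
print(weighted_triangle_sum(A))  # 4.0
```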
- [100] arXiv:2412.18911 (replaced) [Chinese pdf, pdf, html, other]
-
Title: Rethinking Token-wise Feature Caching: Accelerating Diffusion Transformers with Dual Feature Caching
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Diffusion Transformers (DiT) have become the dominant approach in image and video generation, yet still incur substantial computational costs. As an effective route to DiT acceleration, feature caching methods cache the features of DiT at previous timesteps and reuse them in subsequent timesteps, allowing the computation at those timesteps to be skipped. Among them, token-wise feature caching applies different caching ratios to different tokens in DiTs, aiming to skip the computation for unimportant tokens while still computing the important ones. In this paper, we carefully examine the effectiveness of token-wise feature caching through two questions: (1) Is it really necessary to compute the so-called "important" tokens at every step? (2) Are the so-called important tokens really important? Surprisingly, this paper arrives at counter-intuitive answers: consistently computing the selected "important" tokens at all steps is not necessary, and the selection of the so-called "important" tokens is often ineffective, sometimes even performing worse than random selection. Based on these observations, this paper introduces dual feature caching, referred to as DuCa, which alternates between an aggressive caching strategy and a conservative one, and selects the tokens to compute at random. Extensive experimental results on DiT, PixArt, FLUX, and OpenSora demonstrate the effectiveness of our method, showing significant improvements over previous token-wise feature caching.
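A rough sketch of the alternating schedule may help; the cache layout, the recompute fraction, and the toy compute function are illustrative assumptions, not the released implementation.

```python
# Sketch of DuCa-style scheduling: odd steps reuse the cache wholesale
# (aggressive), even steps recompute a random token subset (conservative).
import numpy as np

def duca_schedule(tokens, compute_fn, n_steps, recompute_frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    cache = compute_fn(tokens)                 # full compute at step 0
    for step in range(1, n_steps):
        if step % 2 == 1:                      # aggressive: reuse everything
            continue
        n = tokens.shape[0]                    # conservative: random subset
        idx = rng.choice(n, size=max(1, int(recompute_frac * n)),
                         replace=False)
        cache[idx] = compute_fn(tokens[idx])
    return cache

out = duca_schedule(np.random.randn(256, 8), lambda x: 2.0 * x, n_steps=10)
print(out.shape)  # (256, 8)
```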
- [101] arXiv:2511.14552 (replaced) [Chinese pdf, pdf, html, other]
-
Title: Experimental observation and application of the genuine Quantum Mpemba Effect
Comments: 8 pages, 6 figures
Subjects: Quantum Physics (quant-ph)
Coherence is an inherently quantum property that deeply affects microscopic processes, including thermalization phenomena. A striking example is the quantum Mpemba effect (QME), in which a system can exhibit anomalous relaxation, thermalizing faster from a state initially farther from equilibrium than from one closer to it. Here, we experimentally investigate the genuine QME and observe how the dynamics of a spin-1/2 system interacting with a heat sink can be sped up toward equilibrium. Furthermore, we apply the QME in a quantum Otto refrigerator, thereby increasing its cooling power. This proof-of-concept experiment unveils new practical paths for improving quantum thermal tasks.
- [102] arXiv:2511.14441 (replaced) [Chinese pdf, pdf, html, other]
-
Title: Skewness-Robust Causal Discovery in Location-Scale Noise Models
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
To distinguish Markov equivalent graphs in causal discovery, it is necessary to restrict the structural causal model. Crucially, we need to be able to distinguish cause $X$ from effect $Y$ in bivariate models, that is, distinguish the two graphs $X \to Y$ and $Y \to X$. Location-scale noise models (LSNMs), in which the effect $Y$ is modeled based on the cause $X$ as $Y = f(X) + g(X)N$, form a flexible class of models that is general and identifiable in most cases. Estimating these models for arbitrary noise terms $N$, however, is challenging. Therefore, practical estimators are typically restricted to symmetric distributions, such as the normal distribution. As we showcase in this paper, when $N$ is a skewed random variable, which is likely in real-world domains, the reliability of these approaches decreases. To address this limitation, we propose SkewD, a likelihood-based algorithm for bivariate causal discovery under LSNMs with skewed noise distributions. SkewD extends the usual normal-distribution framework to the skew-normal setting, enabling reliable inference under symmetric and skewed noise. For parameter estimation, we employ a combination of a heuristic search and an expectation conditional maximization algorithm. We evaluate SkewD on novel synthetically generated datasets with skewed noise as well as established benchmark datasets. Throughout our experiments, SkewD exhibits strong performance and, in comparison to prior work, remains robust under high skewness.
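The direction test can be illustrated with a drastically simplified score: fit each direction, model the cause's marginal and the residual noise as skew-normal, and prefer the direction with the higher log-likelihood. The linear fit below stands in for the paper's heuristic search plus ECM estimation of f and g; everything here is an assumption for illustration.

```python
# Toy skew-normal likelihood score for the two candidate directions.
import numpy as np
from scipy.stats import skewnorm

def direction_score(cause, effect):
    slope, intercept = np.polyfit(cause, effect, 1)   # stand-in for f, g
    resid = effect - (slope * cause + intercept)
    ll = skewnorm.logpdf(cause, *skewnorm.fit(cause)).sum()
    return ll + skewnorm.logpdf(resid, *skewnorm.fit(resid)).sum()

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 2.0 * x + skewnorm.rvs(a=5, size=500, random_state=2)  # skewed noise
print("X->Y" if direction_score(x, y) > direction_score(y, x) else "Y->X")
```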
- [103] arXiv:2408.00540 (replaced) [Chinese pdf, pdf, html, other]
-
Title: The Energy Cost of Artificial Intelligence Lifecycle in Communication Networks
Comments: 16 pages, 13 figures
Subjects: Emerging Technologies (cs.ET); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Artificial Intelligence (AI) is being incorporated into optimization, scheduling, and orchestration, as well as into native communication network functions. This paradigm shift increases energy consumption; however, quantifying the end-to-end energy consumption of adding intelligence to communication systems remains an open challenge, since conventional energy consumption metrics focus on communication, computation infrastructure, or model development alone. To address this, we propose a new metric, the Energy Cost of AI Lifecycle (eCAL) of an AI model in a system. eCAL captures the energy consumption throughout the development, deployment, and utilization of an AI model providing intelligence in a communication network by (i) analyzing the complexity of data collection and manipulation in individual components and (ii) deriving overall and per-bit energy consumption. We show that as a trained AI model is used more frequently for inference, its energy cost per inference decreases, since the fixed training energy is amortized over a growing number of inferences. For a simple case study, we show that the eCAL for 100 inferences is 2.73 times higher than for 1000 inferences. Additionally, we have developed a modular and extendable open-source simulation tool that enables researchers, practitioners, and engineers to calculate the end-to-end energy cost with various configurations and across various systems, ensuring adaptability to diverse use cases.
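The amortization claim follows directly from splitting energy into a fixed development/training part and a per-inference part: e(N) = (E_fix + N * E_inf) / N. The numbers below are made up to reproduce the qualitative trend; they are not the paper's measurements, which yield the 2.73x figure.

```python
# Per-inference energy cost with the fixed training energy amortized.
def ecal_per_inference(e_fix: float, e_inf: float, n: int) -> float:
    return (e_fix + n * e_inf) / n

e_fix, e_inf = 1000.0, 2.0   # hypothetical energy units
ratio = (ecal_per_inference(e_fix, e_inf, 100)
         / ecal_per_inference(e_fix, e_inf, 1000))
print(round(ratio, 2))  # 4.0 with these toy numbers; the paper measures 2.73
```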
- [104] arXiv:2508.18503 (replaced) [Chinese pdf, pdf, html, other]
-
Title: Minimax Analysis of Estimation Problems in Coherent Imaging
Comments: 63 pages. Cross-referencing typos fixed
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT)
Unlike conventional imaging modalities, such as magnetic resonance imaging, which are often well described by a linear regression framework, coherent imaging systems follow a significantly more complex model. In these systems, the task is to estimate the unknown image ${\boldsymbol x}_o \in \mathbb{R}^n$ from observations ${\boldsymbol y}_1, \ldots, {\boldsymbol y}_L \in \mathbb{R}^m$ of the form \[ {\boldsymbol y}_l = A_l X_o {\boldsymbol w}_l + {\boldsymbol z}_l, \quad l = 1, \ldots, L, \] where $X_o = \mathrm{diag}({\boldsymbol x}_o)$ is an $n \times n$ diagonal matrix, ${\boldsymbol w}_1, \ldots, {\boldsymbol w}_L \stackrel{\text{i.i.d.}}{\sim} \mathcal{N}(0,I_n)$ represent speckle noise, and ${\boldsymbol z}_1, \ldots, {\boldsymbol z}_L \stackrel{\text{i.i.d.}}{\sim} \mathcal{N}(0,\sigma_z^2 I_m)$ denote additive noise. The matrices $A_1, \ldots, A_L$ are known forward operators determined by the imaging system. The fundamental limits of conventional imaging systems have been extensively studied through sparse linear regression models. However, the limits of coherent imaging systems remain largely unexplored. Our goal is to close this gap by characterizing the minimax risk of estimating ${\boldsymbol x}_o$ in high-dimensional settings. Motivated by insights from sparse regression, we observe that the structure of ${\boldsymbol x}_o$ plays a crucial role in determining the estimation error. In this work, we adopt a general notion of structure based on covering numbers, which is more appropriate for coherent imaging systems. We show that the minimax mean squared error (MSE) scales as \[ \frac{\max\{\sigma_z^4,\, m^2,\, n^2\}\, k \log n}{m^2 n L}, \] where $k$ is a parameter that quantifies the effective complexity of the class of images.
- [105] arXiv:2511.13912 (replaced) [Chinese pdf, pdf, html, other]
-
Title: Compute-in-Memory Implementation of State Space Models for Event Sequence Processing
Comments: Xiaoyu Zhang and Mingtao Hu contributed equally to this work
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
State space models (SSMs) have recently emerged as a powerful framework for long sequence processing, outperforming traditional methods on diverse benchmarks. Fundamentally, SSMs can generalize both recurrent and convolutional networks and have been shown to even capture key functions of biological systems. Here we report an approach to implement SSMs in energy-efficient compute-in-memory (CIM) hardware to achieve real-time, event-driven processing. Our work re-parameterizes the model to function with real-valued coefficients and shared decay constants, reducing the complexity of model mapping onto practical hardware systems. By leveraging device dynamics and diagonalized state transition parameters, the state evolution can be natively implemented in crossbar-based CIM systems combined with memristors exhibiting short-term memory effects. Through this algorithm and hardware co-design, we show the proposed system offers both high accuracy and high energy efficiency while supporting fully asynchronous processing for event-based vision and audio tasks.
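With diagonalized state transitions and a shared real-valued decay constant, each state channel reduces to an independent leaky integrator, which is what maps naturally onto memristive short-term memory. A minimal sketch of that recurrence follows, with illustrative dimensions; the parameterization is an assumption, not the paper's hardware mapping.

```python
# Diagonal SSM with one shared decay constant: x_t = a * x_{t-1} + B u_t,
# y_t = C x_t. Each state channel evolves independently.
import numpy as np

def diagonal_ssm(u, decay, B, C):
    """u: (T, d_in); decay: scalar in (0, 1);
    B: (d_state, d_in); C: (d_out, d_state)."""
    x = np.zeros(B.shape[0])
    ys = []
    for u_t in u:
        x = decay * x + B @ u_t        # per-channel leaky integration
        ys.append(C @ x)
    return np.stack(ys)

rng = np.random.default_rng(0)
y = diagonal_ssm(rng.standard_normal((16, 4)), 0.9,
                 rng.standard_normal((8, 4)), rng.standard_normal((2, 8)))
print(y.shape)  # (16, 2)
```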
- [106] arXiv:2509.03948 (replaced) [Chinese pdf, pdf, html, other]
-
Title: Formal Verification of Local Robustness of a Classification Algorithm for a Spatial Use Case
Authors: Liang Lin, Miao Yu, Kaiwen Luo, Yibo Zhang, Lilan Peng, Dexian Wang, Xuehai Tang, Yuanhe Zhang, Xikang Yang, Zhenhong Zhou, Kun Wang, Yang Liu
Comments: In Proceedings FMAS 2025, arXiv:2511.13245
Journal reference: EPTCS 436, 2025, pp. 15-30
Subjects: Machine Learning (cs.LG)
Failures in satellite components are costly and challenging to address, often requiring significant human and material resources. Embedding a hybrid AI-based system for fault detection directly in the satellite can greatly reduce this burden by allowing earlier detection. However, such systems must operate with extremely high reliability. To ensure this level of dependability, we employ the formal verification tool Marabou to verify the local robustness of the neural network models used in the AI-based algorithm. This tool allows us to quantify how much a model's input can be perturbed before its output behavior becomes unstable, thereby improving trustworthiness with respect to its performance under uncertainty.
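Marabou proves such properties by encoding the network and the perturbation bound as constraints, which a short snippet cannot reproduce faithfully. As a loose, non-exhaustive stand-in only, the sketch below estimates local robustness by sampling the epsilon-ball; unlike Marabou, a pass here is not a proof, and all names are illustrative.

```python
# Sampling-based (NOT formal) local-robustness check: search for a
# label flip within an L-infinity ball of radius eps around x.
import numpy as np

def sampled_local_robustness(model, x, eps, n_samples=1000, seed=0):
    rng = np.random.default_rng(seed)
    ref = np.argmax(model(x))
    for _ in range(n_samples):
        delta = rng.uniform(-eps, eps, size=x.shape)
        if np.argmax(model(x + delta)) != ref:
            return False          # counterexample found
    return True                   # no flip among samples (not a guarantee)

W = np.array([[1.0, -0.5], [-0.2, 0.8]])   # toy linear classifier
model = lambda z: W @ z
print(sampled_local_robustness(model, np.array([1.0, 0.1]), eps=0.05))
```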
- [107] arXiv:2508.18646 (replaced) [Chinese pdf, pdf, html, other]
-
Title: Beyond Benchmark: LLMs Evaluation with an Anthropomorphic and Value-oriented Roadmap
Comments: Preprint. Under review
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
For Large Language Models (LLMs), a disconnect persists between benchmark performance and real-world utility. Current evaluation frameworks remain fragmented, prioritizing technical metrics while neglecting holistic assessment for deployment. This survey introduces an anthropomorphic evaluation paradigm through the lens of human intelligence, proposing a novel three-dimensional taxonomy: Intelligence Quotient (IQ)-General Intelligence for foundational capacity, Emotional Quotient (EQ)-Alignment Ability for value-based interactions, and Professional Quotient (PQ)-Professional Expertise for specialized proficiency. For practical value, we pioneer a Value-oriented Evaluation (VQ) framework assessing economic viability, social impact, ethical alignment, and environmental sustainability. Our modular architecture integrates six components with an implementation roadmap. Through analysis of 200+ benchmarks, we identify key challenges including dynamic assessment needs and interpretability gaps. It provides actionable guidance for developing LLMs that are technically proficient, contextually relevant, and ethically sound. We maintain a curated repository of open-source evaluation resources at: https://github.com/onejune2018/Awesome-LLM-Eval.
- [108] arXiv:2507.02373 (replaced) [Chinese pdf, pdf, html, other]
-
Title: UVLM: Benchmarking Video Language Model for Underwater World Understanding
Comments: 18 pages, 10 figures, 7 tables. Accepted at the 40th AAAI Conference on Artificial Intelligence (AAAI-26), 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recently, the remarkable success of large language models (LLMs) has had a profound impact on the field of artificial intelligence. Numerous advanced works based on LLMs have been proposed and applied in various scenarios. Among them, video language models (VidLMs) are particularly widely used. However, existing works primarily focus on terrestrial scenarios, overlooking the highly demanding application needs of underwater observation. To close this gap, we introduce UVLM, an underwater observation benchmark built through a collaborative approach combining human expertise and AI models. To ensure data quality, we considered the construction process from multiple perspectives. First, to address the unique challenges of underwater environments, we selected videos that represent typical underwater challenges, including light variations, water turbidity, and diverse viewing angles, to construct the dataset. Second, to ensure data diversity, the dataset covers a wide range of frame rates, resolutions, 419 classes of marine animals, and various static plants and terrains. Next, for task diversity, we adopted a structured design where observation targets are categorized into two major classes, biological and environmental, each including content observation and change/action observation, for a total of 20 distinct task types. Finally, we designed several challenging evaluation metrics to enable quantitative comparison and analysis of different methods. Experiments on two representative VidLMs demonstrate that fine-tuning VidLMs on UVLM significantly improves underwater world understanding while also showing potential for slight improvements on existing in-air VidLM benchmarks, such as VideoMME and Perception Test. The dataset and prompt engineering will be released publicly.
- [109] arXiv:2511.12638 (replaced) [Chinese pdf, pdf, html, other]
-
Title: Equivalence Checking of ML GPU Kernels
Subjects: Programming Languages (cs.PL)
With the rapid progress of deep learning and large language models (LLMs), companies now spend enormous sums executing GPU kernels. These kernels have, therefore, become prime targets for aggressive optimization. Recent efforts increasingly leverage LLMs to generate GPU kernels, but make no formal guarantees about the generated kernels. We present the first equivalence checker for GPU kernels and use it to formally verify the correctness of machine learning (ML) kernels optimized by hand, by LLMs, and by compilers. We show that our equivalence checker is sound and, for a well-defined class of GPU kernels which includes the programs of interest, complete. Our implementation, VOLTA, can verify ML computations such as convolutions, matrix multiplications, and various attention mechanisms.
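Formal equivalence checking is out of scope for a snippet, but its cheap empirical cousin, randomized differential testing, shows what equivalence means operationally: the two kernels must agree on all inputs. Passing random trials is far weaker than the soundness and completeness the paper proves for VOLTA; the code below is only an illustration.

```python
# Randomized differential test between a reference kernel and an
# "optimized" one. Agreement on random inputs is evidence, not proof.
import numpy as np

def differential_test(kernel_a, kernel_b, shapes, trials=100, tol=1e-5):
    rng = np.random.default_rng(0)
    for _ in range(trials):
        args = [rng.standard_normal(s).astype(np.float32) for s in shapes]
        if not np.allclose(kernel_a(*args), kernel_b(*args), atol=tol):
            return False
    return True

k_ref = lambda a, b: a @ b            # reference matmul
k_opt = lambda a, b: (a.T).T @ b      # trivially "optimized" variant
print(differential_test(k_ref, k_opt, [(8, 4), (4, 8)]))  # True
```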
- [110] arXiv:2506.09888 (replaced) [Chinese pdf, pdf, html, other]
-
Title: A strengthened bound on the number of states required to characterize maximum parsimony distance
Subjects: Populations and Evolution (q-bio.PE); Combinatorics (math.CO)
In this article we prove that the distance $d_{\mathrm{MP}}(T_1,T_2) = k$ between two unrooted binary phylogenetic trees $T_1, T_2$ on the same set of taxa can be defined by a character that is convex on one of $T_1, T_2$ and which has at most $2k$ states. This significantly improves upon the previous bound of $7k-5$ states. We also show that for every $k \geq 1$ there exist two trees $T_1, T_2$ with $d_{\mathrm{MP}}(T_1,T_2) = k$ such that at least $k+1$ states are necessary in any character that achieves this distance and which is convex on one of $T_1, T_2$. We augment these lower and upper bounds with an empirical analysis which shows that in practice significantly fewer than $k+1$ states are usually required.
- [111] arXiv:2509.18970 (replaced) [Chinese pdf, pdf, other]
-
Title: LLM-based Agents Suffer from Hallucinations: A Survey of Taxonomy, Methods, and Directions
Subjects: Artificial Intelligence (cs.AI)
Driven by the rapid advancements of Large Language Models (LLMs), LLM-based agents have emerged as powerful intelligent systems capable of human-like cognition, reasoning, and interaction. These agents are increasingly being deployed across diverse real-world applications, including student education, scientific research, and financial analysis. However, despite their remarkable potential, LLM-based agents remain vulnerable to hallucination issues, which can result in erroneous task execution and undermine the reliability of the overall system design. Addressing this critical challenge requires a deep understanding and a systematic consolidation of recent advances on LLM-based agents. To this end, we present the first comprehensive survey of hallucinations in LLM-based agents. By carefully analyzing the complete workflow of agents, we propose a new taxonomy that identifies different types of agent hallucinations occurring at different stages. Furthermore, we conduct an in-depth examination of eighteen triggering causes underlying the emergence of agent hallucinations. Through a detailed review of a large number of existing studies, we summarize approaches for hallucination mitigation and detection, and highlight promising directions for future research. We hope this survey will inspire further efforts toward addressing hallucinations in LLM-based agents, ultimately contributing to the development of more robust and reliable agent systems.