Cryptography and Security 27
★ DynaMark: A Reinforcement Learning Framework for Dynamic Watermarking in Industrial Machine Tool Controllers
Industry 4.0's highly networked Machine Tool Controllers (MTCs) are prime
targets for replay attacks that use outdated sensor data to manipulate
actuators. Dynamic watermarking can reveal such tampering, but current schemes
assume linear-Gaussian dynamics and use constant watermark statistics, making
them vulnerable to the time-varying, partly proprietary behavior of MTCs. We
close this gap with DynaMark, a reinforcement learning framework that models
dynamic watermarking as a Markov decision process (MDP). It learns an adaptive
policy online that dynamically adapts the covariance of a zero-mean Gaussian
watermark using available measurements and detector feedback, without needing
system knowledge. DynaMark maximizes a unique reward function balancing control
performance, energy consumption, and detection confidence dynamically. We
develop a Bayesian belief updating mechanism for real-time detection confidence
in linear systems. This approach, independent of specific system assumptions,
underpins the MDP for systems with linear dynamics. On a Siemens Sinumerik 828D
controller digital twin, DynaMark achieves a reduction in watermark energy by
70% while preserving the nominal trajectory, compared to constant variance
baselines. It also maintains an average detection delay equivalent to one
sampling interval. A physical stepper-motor testbed validates these findings,
rapidly triggering alarms with less control performance decline and exceeding
existing benchmarks.
★ OptMark: Robust Multi-bit Diffusion Watermarking via Inference Time Optimization
Watermarking diffusion-generated images is crucial for copyright protection
and user tracking. However, current diffusion watermarking methods face
significant limitations: zero-bit watermarking systems lack the capacity for
large-scale user tracking, while multi-bit methods are highly sensitive to
certain image transformations or generative attacks, resulting in a lack of
comprehensive robustness. In this paper, we propose OptMark, an
optimization-based approach that embeds a robust multi-bit watermark into the
intermediate latents of the diffusion denoising process. OptMark strategically
inserts a structural watermark early to resist generative attacks and a detail
watermark late to withstand image transformations, with tailored regularization
terms to preserve image quality and ensure imperceptibility. To address the
challenge of memory consumption growing linearly with the number of denoising
steps during optimization, OptMark incorporates adjoint gradient methods,
reducing memory usage from O(N) to O(1). Experimental results demonstrate that
OptMark achieves invisible multi-bit watermarking while ensuring robust
resilience against valuemetric transformations, geometric transformations,
editing, and regeneration attacks.
★ Entropy-Based Non-Invasive Reliability Monitoring of Convolutional Neural Networks
Convolutional Neural Networks (CNNs) have become the foundation of modern
computer vision, achieving unprecedented accuracy across diverse image
recognition tasks. While these networks excel on in-distribution data, they
remain vulnerable to adversarial perturbations imperceptible input
modifications that cause misclassification with high confidence. However,
existing detection methods either require expensive retraining, modify network
architecture, or degrade performance on clean inputs. Here we show that
adversarial perturbations create immediate, detectable entropy signatures in
CNN activations that can be monitored without any model modification. Using
parallel entropy monitoring on VGG-16, we demonstrate that adversarial inputs
consistently shift activation entropy by 7% in early convolutional layers,
enabling 90% detection accuracy with false positives and false negative rates
below 20%. The complete separation between clean and adversarial entropy
distributions reveals that CNNs inherently encode distribution shifts in their
activation patterns. This work establishes that CNN reliability can be assessed
through activation entropy alone, enabling practical deployment of
self-diagnostic vision systems that detect adversarial inputs in real-time
without compromising original model performance.
comment: 8 pages, 3 figures, 2 tables
★ Cybersecurity AI: Hacking the AI Hackers via Prompt Injection
We demonstrate how AI-powered cybersecurity tools can be turned against
themselves through prompt injection attacks. Prompt injection is reminiscent of
cross-site scripting (XSS): malicious text is hidden within seemingly trusted
content, and when the system processes it, that text is transformed into
unintended instructions. When AI agents designed to find and exploit
vulnerabilities interact with malicious web servers, carefully crafted reponses
can hijack their execution flow, potentially granting attackers system access.
We present proof-of-concept exploits against the Cybersecurity AI (CAI)
framework and its CLI tool, and detail our mitigations against such attacks in
a multi-layered defense implementation. Our findings indicate that prompt
injection is a recurring and systemic issue in LLM-based architectures, one
that will require dedicated work to address, much as the security community has
had to do with XSS in traditional web applications.
★ I Stolenly Swear That I Am Up to (No) Good: Design and Evaluation of Model Stealing Attacks
Model stealing attacks endanger the confidentiality of machine learning
models offered as a service. Although these models are kept secret, a malicious
party can query a model to label data samples and train their own substitute
model, violating intellectual property. While novel attacks in the field are
continually being published, their design and evaluations are not standardised,
making it challenging to compare prior works and assess progress in the field.
This paper is the first to address this gap by providing recommendations for
designing and evaluating model stealing attacks. To this end, we study the
largest group of attacks that rely on training a substitute model -- those
attacking image classification models. We propose the first comprehensive
threat model and develop a framework for attack comparison. Further, we analyse
attack setups from related works to understand which tasks and models have been
studied the most. Based on our findings, we present best practices for attack
development before, during, and beyond experiments and derive an extensive list
of open research questions regarding the evaluation of model stealing attacks.
Our findings and recommendations also transfer to other problem domains, hence
establishing the first generic evaluation methodology for model stealing
attacks.
comment: Under review
★ Analogy between Learning With Error Problem and Ill-Posed Inverse Problems
In this work, we unveil an analogy between well-known lattice based learning
with error problem and ill-posed inverse problems. We show that LWE problem is
a structured inverse problem. Further, we propose a symmetric encryption scheme
based on ill-posed problems and thoroughly discuss its security. Finally, we
propose a public key encryption scheme based on our symmetric encryption scheme
and CRYSTALS-Kyber KEM (key encapsulation mechanism) and discuss its security.
★ Detecting Stealthy Data Poisoning Attacks in AI Code Generators
Deep learning (DL) models for natural language-to-code generation have become
integral to modern software development pipelines. However, their heavy
reliance on large amounts of data, often collected from unsanitized online
sources, exposes them to data poisoning attacks, where adversaries inject
malicious samples to subtly bias model behavior. Recent targeted attacks
silently replace secure code with semantically equivalent but vulnerable
implementations without relying on explicit triggers to launch the attack,
making it especially hard for detection methods to distinguish clean from
poisoned samples. We present a systematic study on the effectiveness of
existing poisoning detection methods under this stealthy threat model.
Specifically, we perform targeted poisoning on three DL models (CodeBERT,
CodeT5+, AST-T5), and evaluate spectral signatures analysis, activation
clustering, and static analysis as defenses. Our results show that all methods
struggle to detect triggerless poisoning, with representation-based approaches
failing to isolate poisoned samples and static analysis suffering false
positives and false negatives, highlighting the need for more robust,
trigger-independent defenses for AI-assisted code generation.
comment: Accepted to the 3rd IEEE International Workshop on Reliable and
Secure AI for Software Engineering (ReSAISE, 2025), co-located with ISSRE
2025
★ Hybrid Cryptographic Monitoring System for Side-Channel Attack Detection on PYNQ SoCs SC'25
AES-128 encryption is theoretically secure but vulnerable in practical
deployments due to timing and fault injection attacks on embedded systems. This
work presents a lightweight dual-detection framework combining statistical
thresholding and machine learning (ML) for real-time anomaly detection. By
simulating anomalies via delays and ciphertext corruption, we collect timing
and data features to evaluate two strategies: (1) a statistical threshold
method based on execution time and (2) a Random Forest classifier trained on
block-level anomalies. Implemented on CPU and FPGA (PYNQ-Z1), our results show
that the ML approach outperforms static thresholds in accuracy, while
maintaining real-time feasibility on embedded platforms. The framework operates
without modifying AES internals or relying on hardware performance counters.
This makes it especially suitable for low-power, resource-constrained systems
where detection accuracy and computational efficiency must be balanced.
comment: This paper is submitted at Supercomputing (SC'25)
★ Condense to Conduct and Conduct to Condense
In this paper we give the first examples of low-conductance permutations. The
notion of conductance of permutations was introduced in the paper
"Indifferentiability of Confusion-Diffusion Networks" by Dodis et al., where
the search for low-conductance permutations was initiated and motivated. In
this paper we not only give the desired examples, but also make a general
characterization of the problem -- i.e. we show that low-conductance
permutations are equivalent to permutations that have the information-theoretic
properties of the so-called Multi-Source-Somewhere-Condensers.
★ Agentic Discovery and Validation of Android App Vulnerabilities
Existing Android vulnerability detection tools overwhelm teams with thousands
of low-signal warnings yet uncover few true positives. Analysts spend days
triaging these results, creating a bottleneck in the security pipeline.
Meanwhile, genuinely exploitable vulnerabilities often slip through, leaving
opportunities open to malicious counterparts.
We introduce A2, a system that mirrors how security experts analyze and
validate Android vulnerabilities through two complementary phases: (i) Agentic
Vulnerability Discovery, which reasons about application security by combining
semantic understanding with traditional security tools; and (ii) Agentic
Vulnerability Validation, which systematically validates vulnerabilities across
Android's multi-modal attack surface-UI interactions, inter-component
communication, file system operations, and cryptographic computations.
On the Ghera benchmark (n=60), A2 achieves 78.3% coverage, surpassing
state-of-the-art analyzers (e.g., APKHunt 30.0%). Rather than overwhelming
analysts with thousands of warnings, A2 distills results into 82 speculative
vulnerability findings, including 47 Ghera cases and 28 additional true
positives. Crucially, A2 then generates working Proof-of-Concepts (PoCs) for 51
of these speculative findings, transforming them into validated vulnerability
findings that provide direct, self-confirming evidence of exploitability.
In real-world evaluation on 169 production APKs, A2 uncovers 104
true-positive zero-day vulnerabilities. Among these, 57 (54.8%) are
self-validated with automatically generated PoCs, including a medium-severity
vulnerability in a widely used application with over 10 million installs.
★ Generalized Encrypted Traffic Classification Using Inter-Flow Signals
In this paper, we present a novel encrypted traffic classification model that
operates directly on raw PCAP data without requiring prior assumptions about
traffic type. Unlike existing methods, it is generalizable across multiple
classification tasks and leverages inter-flow signals - an innovative
representation that captures temporal correlations and packet volume
distributions across flows. Experimental results show that our model
outperforms well-established methods in nearly every classification task and
across most datasets, achieving up to 99% accuracy in some cases, demonstrating
its robustness and adaptability.
comment: Accepted manuscript at Availability, Reliability and Security (ARES
2025), published in Lecture Notes in Computer Science, vol. 15992, Springer,
Cham. DOI: https://doi.org/10.1007/978-3-032-00624-0_11
★ Towards a Decentralized IoT Onboarding for Smart Homes Using Consortium Blockchain
The increasing adoption of smart home devices and IoT-based security systems
presents significant opportunities to enhance convenience, safety, and risk
management for homeowners and service providers. However, secure
onboarding-provisioning credentials and establishing trust with cloud
platforms-remains a considerable challenge. Traditional onboarding methods
often rely on centralized Public Key Infrastructure (PKI) models and
manufacturer-controlled keys, which introduce security risks and limit the
user's digital sovereignty. These limitations hinder the widespread deployment
of scalable IoT solutions. This paper presents a novel onboarding framework
that builds upon existing network-layer onboarding techniques and extends them
to the application layer to address these challenges. By integrating consortium
blockchain technology, we propose a decentralized onboarding mechanism that
enhances transparency, security, and monitoring for smart home architectures.
The architecture supports device registration, key revocation, access control
management, and risk detection through event-driven alerts across dedicated
blockchain channels and smart contracts. To evaluate the framework, we formally
model the protocol using the Tamarin Prover under the Dolev-Yao adversary
model. The analysis focuses on authentication, token integrity, key
confidentiality, and resilience over public channels. A prototype
implementation demonstrates the system's viability in smart home settings, with
verification completing in 0.34 seconds, highlighting its scalability and
suitability for constrained devices and diverse stakeholders. Additionally,
performance evaluation shows that the blockchain-based approach effectively
handles varying workloads, maintains high throughput and low latency, and
supports near real-time IoT data processing.
★ SoK: Large Language Model-Generated Textual Phishing Campaigns End-to-End Analysis of Generation, Characteristics, and Detection
Phishing is a pervasive form of social engineering in which attackers
impersonate trusted entities to steal information or induce harmful actions.
Text-based phishing dominates for its low cost, scalability, and
concealability, advantages recently amplified by large language models (LLMs)
that enable ``Phishing-as-a-Service'' attacks at scale within minutes. Despite
the growing research into LLM-facilitated phishing attacks, consolidated
systematic research on the phishing attack life cycle remains scarce. In this
work, we present the first systematization of knowledge (SoK) on LLM-generated
phishing, offering an end-to-end analysis that spans generation techniques,
attack features, and mitigation strategies. We introduce
Generation-Characterization-Defense (GenCharDef), which systematizes the ways
in which LLM-generated phishing differs from traditional phishing across
methodologies, security perspectives, data dependencies, and evaluation
practices. This framework highlights unique challenges of LLM-driven phishing,
providing a coherent foundation for understanding the evolving threat landscape
and guiding the design of more resilient defenses.
comment: 13 pages, 3 tables, 4 figures
★ Time Tells All: Deanonymization of Blockchain RPC Users with Zero Transaction Fee (Extended Version)
Remote Procedure Call (RPC) services have become a primary gateway for users
to access public blockchains. While they offer significant convenience, RPC
services also introduce critical privacy challenges that remain insufficiently
examined. Existing deanonymization attacks either do not apply to blockchain
RPC users or incur costs like transaction fees assuming an active network
eavesdropper. In this paper, we propose a novel deanonymization attack that can
link an IP address of a RPC user to this user's blockchain pseudonym. Our
analysis reveals a temporal correlation between the timestamps of transaction
confirmations recorded on the public ledger and those of TCP packets sent by
the victim when querying transaction status. We assume a strong passive
adversary with access to network infrastructure, capable of monitoring traffic
at network border routers or Internet exchange points. By monitoring network
traffic and analyzing public ledgers, the attacker can link the IP address of
the TCP packet to the pseudonym of the transaction initiator by exploiting the
temporal correlation. This deanonymization attack incurs zero transaction fee.
We mathematically model and analyze the attack method, perform large-scale
measurements of blockchain ledgers, and conduct real-world attacks to validate
the attack. Our attack achieves a high success rate of over 95% against normal
RPC users on various blockchain networks, including Ethereum, Bitcoin and
Solana.
★ RepoMark: A Code Usage Auditing Framework for Code Large Language Models
The rapid development of Large Language Models (LLMs) for code generation has
transformed software development by automating coding tasks with unprecedented
efficiency.
However, the training of these models on open-source code repositories (e.g.,
from GitHub) raises critical ethical and legal concerns, particularly regarding
data authorization and open-source license compliance. Developers are
increasingly questioning whether model trainers have obtained proper
authorization before using repositories for training, especially given the lack
of transparency in data collection.
To address these concerns, we propose a novel data marking framework RepoMark
to audit the data usage of code LLMs. Our method enables repository owners to
verify whether their code has been used in training, while ensuring semantic
preservation, imperceptibility, and theoretical false detection rate (FDR)
guarantees. By generating multiple semantically equivalent code variants,
RepoMark introduces data marks into the code files, and during detection,
RepoMark leverages a novel ranking-based hypothesis test to detect memorization
within the model. Compared to prior data auditing approaches, RepoMark
significantly enhances sample efficiency, allowing effective auditing even when
the user's repository possesses only a small number of code files.
Experiments demonstrate that RepoMark achieves a detection success rate over
90\% on small code repositories under a strict FDR guarantee of 5\%. This
represents a significant advancement over existing data marking techniques, all
of which only achieve accuracy below 55\% under identical settings. This
further validates RepoMark as a robust, theoretically sound, and promising
solution for enhancing transparency in code LLM training, which can safeguard
the rights of repository owners.
★ An Empirical Study of Vulnerable Package Dependencies in LLM Repositories
Large language models (LLMs) have developed rapidly in recent years,
revolutionizing various fields. Despite their widespread success, LLMs heavily
rely on external code dependencies from package management systems, creating a
complex and interconnected LLM dependency supply chain. Vulnerabilities in
dependencies can expose LLMs to security risks. While existing research
predominantly focuses on model-level security threats, vulnerabilities within
the LLM dependency supply chain have been overlooked. To fill this gap, we
conducted an empirical analysis of 52 open-source LLMs, examining their
third-party dependencies and associated vulnerabilities. We then explored
activities within the LLM repositories to understand how maintainers manage
third-party vulnerabilities in practice. Finally, we compared third-party
dependency vulnerabilities in the LLM ecosystem to those in the Python
ecosystem. Our results show that half of the vulnerabilities in the LLM
ecosystem remain undisclosed for more than 56.2 months, significantly longer
than those in the Python ecosystem. Additionally, 75.8% of LLMs include
vulnerable dependencies in their configuration files. This study advances the
understanding of LLM supply chain risks, provides insights for practitioners,
and highlights potential directions for improving the security of the LLM
supply chain.
★ zkLoRA: Fine-Tuning Large Language Models with Verifiable Security via Zero-Knowledge Proofs
Fine-tuning large language models (LLMs) is crucial for adapting them to
specific tasks, yet it remains computationally demanding and raises concerns
about correctness and privacy, particularly in untrusted environments. Although
parameter-efficient methods like Low-Rank Adaptation (LoRA) significantly
reduce resource requirements, ensuring the security and verifiability of
fine-tuning under zero-knowledge constraints remains an unresolved challenge.
To address this, we introduce zkLoRA, the first framework to integrate LoRA
fine-tuning with zero-knowledge proofs (ZKPs), achieving provable security and
correctness. zkLoRA employs advanced cryptographic techniques -- such as lookup
arguments, sumcheck protocols, and polynomial commitments -- to verify both
arithmetic and non-arithmetic operations in Transformer-based architectures.
The framework provides end-to-end verifiability for forward propagation,
backward propagation, and parameter updates during LoRA fine-tuning, while
safeguarding the privacy of model parameters and training data. Leveraging
GPU-based implementations, zkLoRA demonstrates practicality and efficiency
through experimental validation on open-source LLMs like LLaMA, scaling up to
13 billion parameters. By combining parameter-efficient fine-tuning with ZKPs,
zkLoRA bridges a critical gap, enabling secure and trustworthy deployment of
LLMs in sensitive or untrusted environments.
★ Risks and Compliance with the EU's Core Cyber Security Legislation
The European Union (EU) has long favored a risk-based approach to regulation.
Such an approach is also used in recent cyber security legislation enacted in
the EU. Risks are also inherently related to compliance with the new
legislation. Objective: The paper investigates how risks are framed in the EU's
five core cyber security legislative acts, whether the framings indicate
convergence or divergence between the acts and their risk concepts, and what
qualifying words and terms are used when describing the legal notions of risks.
Method : The paper's methodology is based on qualitative legal interpretation
and taxonomy-building. Results: The five acts have an encompassing coverage of
different cyber security risks, including but not limited to risks related to
technical, organizational, and human security as well as those not originating
from man-made actions. Both technical aspects and assets are used to frame the
legal risk notions in many of the legislative acts. A threat-centric viewpoint
is also present in one of the acts. Notable gaps are related to acceptable
risks, non-probabilistic risks, and residual risks. Conclusion: The EU's new
cyber security legislation has significantly extended the risk-based approach
to regulations. At the same time, complexity and compliance burden have
increased. With this point in mind, the paper concludes with a few practical
takeaways about means to deal with compliance and research it.
comment: Submitted to IST (VSI:RegCompliance in SE)
★ LLM-driven Provenance Forensics for Threat Investigation and Detection
We introduce PROVSEEK, an LLM-powered agentic framework for automated
provenance-driven forensic analysis and threat intelligence extraction.
PROVSEEK employs specialized toolchains to dynamically retrieve relevant
context by generating precise, context-aware queries that fuse a vectorized
threat report knowledge base with data from system provenance databases. The
framework resolves provenance queries, orchestrates multiple role-specific
agents to mitigate hallucinations, and synthesizes structured, ground-truth
verifiable forensic summaries. By combining agent orchestration with
Retrieval-Augmented Generation (RAG) and chain-of-thought (CoT) reasoning,
PROVSEEK enables adaptive multi-step analysis that iteratively refines
hypotheses, verifies supporting evidence, and produces scalable, interpretable
forensic explanations of attack behaviors. By combining provenance data with
agentic reasoning, PROVSEEK establishes a new paradigm for grounded agentic
forecics to investigate APTs. We conduct a comprehensive evaluation on publicly
available DARPA datasets, demonstrating that PROVSEEK outperforms
retrieval-based methods for intelligence extraction task, achieving a 34%
improvement in contextual precision/recall; and for threat detection task,
PROVSEEK achieves 22%/29% higher precision/recall compared to both a baseline
agentic AI approach and State-Of-The-Art (SOTA) Provenance-based Intrusion
Detection System (PIDS).
★ Locus: Agentic Predicate Synthesis for Directed Fuzzing
Directed fuzzing aims to find program inputs that lead to specified target
program states. It has broad applications, such as debugging system crashes,
confirming reported bugs, and generating exploits for potential
vulnerabilities. This task is inherently challenging because target states are
often deeply nested in the program, while the search space manifested by
numerous possible program inputs is prohibitively large. Existing approaches
rely on branch distances or manually-specified constraints to guide the search;
however, the branches alone are often insufficient to precisely characterize
progress toward reaching the target states, while the manually specified
constraints are often tailored for specific bug types and thus difficult to
generalize to diverse target states and programs.
We present Locus, a novel framework to improve the efficiency of directed
fuzzing. Our key insight is to synthesize predicates to capture fuzzing
progress as semantically meaningful intermediate states, serving as milestones
towards reaching the target states. When used to instrument the program under
fuzzing, they can reject executions unlikely to reach the target states, while
providing additional coverage guidance. To automate this task and generalize to
diverse programs, Locus features an agentic framework with program analysis
tools to synthesize and iteratively refine the candidate predicates, while
ensuring the predicates strictly relax the target states to prevent false
rejections via symbolic execution. Our evaluation shows that Locus
substantially improves the efficiency of eight state-of-the-art fuzzers in
discovering real-world vulnerabilities, achieving an average speedup of 41.6x.
So far, Locus has found eight previously unpatched bugs, with one already
acknowledged with a draft patch.
♻ ★ SAGA: A Security Architecture for Governing AI Agentic Systems
Large Language Model (LLM)-based agents increasingly interact, collaborate,
and delegate tasks to one another autonomously with minimal human interaction.
Industry guidelines for agentic system governance emphasize the need for users
to maintain comprehensive control over their agents, mitigating potential
damage from malicious agents. Several proposed agentic system designs address
agent identity, authorization, and delegation, but remain purely theoretical,
without concrete implementation and evaluation. Most importantly, they do not
provide user-controlled agent management.
To address this gap, we propose SAGA, a scalable Security Architecture for
Governing Agentic systems, that offers user oversight over their agents'
lifecycle. In our design, users register their agents with a central entity,
the Provider, that maintains agent contact information, user-defined access
control policies, and helps agents enforce these policies on inter-agent
communication. We introduce a cryptographic mechanism for deriving access
control tokens, that offers fine-grained control over an agent's interaction
with other agents, providing formal security guarantees. We evaluate SAGA on
several agentic tasks, using agents in different geolocations, and multiple
on-device and cloud LLMs, demonstrating minimal performance overhead with no
impact on underlying task utility in a wide range of conditions. Our
architecture enables secure and trustworthy deployment of autonomous agents,
accelerating the responsible adoption of this technology in sensitive
environments.
♻ ★ Asynchronous Approximate Agreement with Quadratic Communication SC 2025
We consider an asynchronous network of $n$ message-sending parties, up to $t$
of which are byzantine. We study approximate agreement, where the parties
obtain approximately equal outputs in the convex hull of their inputs. In their
seminal work, Abraham, Amit and Dolev [OPODIS '04] solve this problem in
$\mathbb{R}$ with the optimal resilience $t < \frac{n}{3}$ with a protocol
where each party reliably broadcasts a value in every iteration. This takes
$\Theta(n^2)$ messages per reliable broadcast, or $\Theta(n^3)$ messages per
iteration.
In this work, we forgo reliable broadcast to achieve asynchronous approximate
agreement against $t < \frac{n}{3}$ faults with a quadratic communication. In a
tree with the maximum degree $\Delta$ and the centroid decomposition height
$h$, we achieve edge agreement in at most $6h + 1$ rounds with
$\mathcal{O}(n^2)$ messages of size $\mathcal{O}(\log \Delta + \log h)$ per
round. We do this by designing a 6-round multivalued 2-graded consensus
protocol and using it to recursively reduce the task to edge agreement in a
subtree with a smaller centroid decomposition height. Then, we achieve edge
agreement in the infinite path $\mathbb{Z}$, again with the help of 2-graded
consensus. Finally, we show that our edge agreement protocol enables
$\varepsilon$-agreement in $\mathbb{R}$ in $6\log_2\frac{M}{\varepsilon} +
\mathcal{O}(\log \log \frac{M}{\varepsilon})$ rounds with $\mathcal{O}(n^2 \log
\frac{M}{\varepsilon})$ messages and $\mathcal{O}(n^2\log
\frac{M}{\varepsilon}\log \log \frac{M}{\varepsilon})$ bits of communication,
where $M$ is the maximum non-byzantine input magnitude.
comment: 26 pages, full version of a DISC 2025 brief announcement
♻ ★ Publish to Perish: Prompt Injection Attacks on LLM-Assisted Peer Review
Large Language Models (LLMs) are increasingly being integrated into the
scientific peer-review process, raising new questions about their reliability
and resilience to manipulation. In this work, we investigate the potential for
hidden prompt injection attacks, where authors embed adversarial text within a
paper's PDF to influence the LLM-generated review. We begin by formalising
three distinct threat models that envision attackers with different motivations
-- not all of which implying malicious intent. For each threat model, we design
adversarial prompts that remain invisible to human readers yet can steer an
LLM's output toward the author's desired outcome. Using a user study with
domain scholars, we derive four representative reviewing prompts used to elicit
peer reviews from LLMs. We then evaluate the robustness of our adversarial
prompts across (i) different reviewing prompts, (ii) different commercial
LLM-based systems, and (iii) different peer-reviewed papers. Our results show
that adversarial prompts can reliably mislead the LLM, sometimes in ways that
adversely affect a "honest-but-lazy" reviewer. Finally, we propose and
empirically assess methods to reduce detectability of adversarial prompts under
automated content checks.
♻ ★ CLUE-MARK: Watermarking Diffusion Models using CLWE
As AI-generated images become widespread, reliable watermarking is essential
for content verification, copyright enforcement, and combating disinformation.
Existing techniques rely on heuristic approaches and lack formal guarantees of
undetectability, making them vulnerable to steganographic attacks that can
expose or erase the watermark. Additionally, these techniques often degrade
output quality by introducing perceptible changes, which is not only
undesirable but an important barrier to adoption in practice.
In this work, we introduce CLUE-Mark, the first provably undetectable
watermarking scheme for diffusion models. CLUE-Mark requires no changes to the
model being watermarked, is computationally efficient, and because it is
provably undetectable is guaranteed to have no impact on model output quality.
Our approach leverages the Continuous Learning With Errors (CLWE) problem -- a
cryptographically hard lattice problem -- to embed watermarks in the latent
noise vectors used by diffusion models. By proving undetectability via
reduction from a cryptographically hard problem we ensure not only that the
watermark is imperceptible to human observers or adhoc heuristics, but to
\emph{any} efficient detector that does not have the secret key. CLUE-Mark
allows multiple keys to be embedded, enabling traceability of images to
specific users without altering model parameters. Empirical evaluations on
state-of-the-art diffusion models confirm that CLUE-Mark achieves high
recoverability, preserves image quality, and is robust to minor perturbations
such JPEG compression and brightness adjustments. Uniquely, CLUE-Mark cannot be
detected nor removed by recent steganographic attacks.
♻ ★ RevPRAG: Revealing Poisoning Attacks in Retrieval-Augmented Generation through LLM Activation Analysis EMNLP 2025
Retrieval-Augmented Generation (RAG) enriches the input to LLMs by retrieving
information from the relevant knowledge database, enabling them to produce
responses that are more accurate and contextually appropriate. It is worth
noting that the knowledge database, being sourced from publicly available
channels such as Wikipedia, inevitably introduces a new attack surface. RAG
poisoning involves injecting malicious texts into the knowledge database,
ultimately leading to the generation of the attacker's target response (also
called poisoned response). However, there are currently limited methods
available for detecting such poisoning attacks. We aim to bridge the gap in
this work. Particularly, we introduce RevPRAG, a flexible and automated
detection pipeline that leverages the activations of LLMs for poisoned response
detection. Our investigation uncovers distinct patterns in LLMs' activations
when generating correct responses versus poisoned responses. Our results on
multiple benchmark datasets and RAG architectures show our approach could
achieve 98% true positive rate, while maintaining false positive rates close to
1%.
comment: Accepted to Findings of EMNLP 2025
♻ ★ On the Adversarial Robustness of Spiking Neural Networks Trained by Local Learning
Recent research has shown the vulnerability of Spiking Neural Networks (SNNs)
under adversarial examples that are nearly indistinguishable from clean data in
the context of frame-based and event-based information. The majority of these
studies are constrained in generating adversarial examples using
Backpropagation Through Time (BPTT), a gradient-based method which lacks
biological plausibility. In contrast, local learning methods, which relax many
of BPTT's constraints, remain under-explored in the context of adversarial
attacks. To address this problem, we examine adversarial robustness in SNNs
through the framework of four types of training algorithms. We provide an
in-depth analysis of the ineffectiveness of gradient-based adversarial attacks
to generate adversarial instances in this scenario. To overcome these
limitations, we introduce a hybrid adversarial attack paradigm that leverages
the transferability of adversarial instances. The proposed hybrid approach
demonstrates superior performance, outperforming existing adversarial attack
methods. Furthermore, the generalizability of the method is assessed under
multi-step adversarial attacks, adversarial attacks in black-box FGSM
scenarios, and within the non-spiking domain.
♻ ★ Survey of Privacy Threats and Countermeasures in Federated Learning
Federated learning is widely considered to be as a privacy-aware learning
method because no training data is exchanged directly between clients.
Nevertheless, there are threats to privacy in federated learning, and privacy
countermeasures have been studied. However, we note that common and unique
privacy threats among typical types of federated learning have not been
categorized and described in a comprehensive and specific way. In this paper,
we describe privacy threats and countermeasures for the typical types of
federated learning; horizontal federated learning, vertical federated learning,
and transfer federated learning.
comment: The revised paper has been accepted as a full paper for presentation
at The 3rd IEEE International Conference on Federated Learning Technologies
and Applications (FLTA25)