ZKARCH 2025

The 1st Workshop on Architectures for
Zero-Knowledge Proofs and Verifiable Computation

To be held in conjunction with MICRO 2025.

The first ZKARCH was a success! A growing ZK hardware community – see you at the next ZKARCH.

Thank you to all speakers, organizers, and attendees for making it a success. We are excited to see where the next ZKARCH will be hosted, and volunteers are welcome!

Workshop Details

Workshop Date: Saturday, October 18, 2025.
Location: Charlotte Room, Lotte Hotel Seoul, Seoul, Korea (For more information, see MICRO 2025).

Saturday, October 18th, 2025 (all times are KST)
8:15 – 8:20 AM	Welcome & Opening Remarks
8:20 – 8:45 AM	Talk 1 (INDUSTRY TALK): Accelerating Verifiable Computing Revolution	Dr. Radi Cojbasic (Irreducible)
8:45 – 9:10 AM	Talk 2: LegoZK: a Dynamically Reconfigurable Accelerator for Zero Knowledge Proofs	Lutan Zhao (Chinese Academy of Sciences)
9:10 – 9:35 AM	Talk 3: Accelerating Zero-Knowledge Proofs with Multi-GPU Systems	Zhuoran Ji (Shandong University)
9:35 – 10:00 AM	Talk 4: Code Generation for Cryptographic Kernels using Multi-word Modular Arithmetic	Naifeng Zhang (Carnegie Mellon University)
10:00 – 10:20 AM	Break
10:20 – 10:45 AM	Talk 5 (INDUSTRY TALK): Accelerating the Sumcheck Protocol in Hardware	Shanie Winitz (Ingonyama)
10:45 – 11:10 AM	Talk 6: Accelerating HyperPlonk for Zero-Knowledge Proofs	Alhad Daftardar (New York University)
11:10 – 11:35 AM	Talk 7: AcclMT: A Highly Resource-Efficient and Flexible Poseidon Hash-Based Merkle Tree Architecture	Changxu Liu (Fudan University)
11:35 AM – 12:00 PM	Talk 8 (INDUSTRY TALK): Lessons from Building the World’s First Verifiable Processing Unit (VPU)	Michael Gao (Fabric Cryptography)
12:00 – 12:05 PM	Closing Remarks

Workshop Overview

In recent years, zero-knowledge proofs (ZKPs) and verifiable computation have moved from cryptographic theory into systems, applications, and real-world deployment — driving a surge of innovation across protocol design, implementation, and hardware acceleration. Dozens of new proof systems and variants (e.g., R1CS, PlonK, STARKs, Nova, zkVMs) are being published at a rapid pace, with parallel advances in tooling, languages, and accelerator architectures. However, each system has unique algebraic structures, performance profiles, and/or hardware requirements. Researchers and practitioners face fragmented toolchains, limited interoperability, and ad hoc design tradeoffs.

ZKARCH aims to provide a space for critical evaluation of these diverse protocols and frameworks, enabling the community to identify architectural bottlenecks, unify design patterns, and share lessons across different stacks. By bringing together developers of circuits, compilers, provers, and accelerators, this workshop will help bridge gaps between theory and deployment, and guide the co-evolution of protocols and systems in making ZKPs and verifiable computation practical and scalable.

Speaker List

Accelerating Verifiable Computing Revolution

Dr. Radi Cojbasic (Irreducible): In this talk, I will explore the rapid advances in high-performance zero-knowledge (ZK) proving and their implications for the future of secure computation. Zero-knowledge technology, once considered purely theoretical, now underpins a wide range of applications – from private credentials to scaling public blockchains such as Ethereum. Not long ago, generating a proof on a powerful CPU could take hours. Today, we can prove an entire Ethereum block in just 12 seconds.

This transformation reflects not only algorithmic breakthroughs but also a profound shift in computing paradigms. I will highlight the wave of innovation driving ZK acceleration, with a special focus on the role of hardware in reshaping what is computationally possible. From architectural design to system-level integration, hardware acceleration is enabling ZK proofs at unprecedented scale and efficiency. This keynote will provide both a survey of the state of the art and a forward-looking perspective on where this fast-moving field is headed.

LegoZK: a Dynamically Reconfigurable Accelerator for Zero Knowledge Proof

Lutan Zhao (Chinese Academy of Sciences): Zero-knowledge proof (ZKP) allows a prover to convince a verifier of the truth of a statement without revealing any secret information. This property is utilized in numerous privacy-preserving applications. However, the huge overhead of proof generation impedes the widespread adoption of ZKP. As a result, many ZKP accelerators have been developed to speed up proof generation. However, most existing accelerators are designed at the granularity of core operators and exhibit low hardware resource utilization and limited adaptability. In this work, we identify the commonality of all computation stages in proof generation at the level of basic finite field arithmetic operations. Based on this insight, we propose LegoZK, a dynamically reconfigurable hardware accelerator for ZKP. LegoZK employs finite field arithmetic units (FAUs) as its fundamental components and integrates these FAUs with a hierarchical on-chip network (NoC). By dynamically configuring the FAUs and the NoC, LegoZK can effectively accelerate the entire proof generation process, achieving higher overall performance.

Accelerating Zero-Knowledge Proofs with Multi-GPU Systems

Zhuoran Ji (Shandong University): Zero-knowledge proofs (ZKPs) enable the validation of statements without disclosing any underlying information, making them essential for applications such as verifiable outsourcing and digital currencies. However, widespread adoption of ZKPs remains constrained by lengthy proof generation times, predominantly due to two computationally intensive operations: Multi-Scalar Multiplication (MSM) and Number Theoretic Transform (NTT). Even with GPU acceleration, generating proofs for complex statements can require several minutes on a single GPU. This talk explores the potential of distributed multi-GPU systems to significantly accelerate ZKP generation, introducing two novel algorithms: DistMSM for MSM and UniNTT for NTT. Innovations are presented both at the algorithmic and GPU kernel levels. At the algorithmic level, MSM and NTT algorithms are adapted to effectively leverage multi-GPU architectures; at the GPU kernel level, optimized kernels are designed specifically for elliptic curve arithmetic and NTT computations, tailored to contemporary GPU hardware. Experiments demonstrate a 6.64X speedup in end-to-end proof generation on an 8-GPU system compared to single-GPU setups, underscoring the effectiveness of multi-GPU systems in enhancing the performance of ZKPs.
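
For readers new to the NTT named in this abstract, the following is a minimal single-threaded sketch of a radix-2 NTT over a toy prime field. It is not the UniNTT algorithm from the talk; the modulus 17, the root 2, and the function names are assumptions chosen only to keep the example small.

```python
def ntt(values, root, modulus):
    """Recursive radix-2 NTT.

    len(values) must be a power of two, and root must be a primitive
    len(values)-th root of unity modulo the prime modulus.
    """
    n = len(values)
    if n == 1:
        return list(values)
    root_sq = root * root % modulus
    even = ntt(values[0::2], root_sq, modulus)
    odd = ntt(values[1::2], root_sq, modulus)
    out = [0] * n
    twiddle = 1
    for k in range(n // 2):
        t = twiddle * odd[k] % modulus
        out[k] = (even[k] + t) % modulus
        # root**(n/2) == -1, so the second half uses the negated twiddle
        out[k + n // 2] = (even[k] - t) % modulus
        twiddle = twiddle * root % modulus
    return out


def naive_ntt(values, root, modulus):
    """Quadratic-time reference: out[k] = sum_j values[j] * root**(j*k)."""
    n = len(values)
    return [
        sum(values[j] * pow(root, j * k, modulus) for j in range(n)) % modulus
        for k in range(n)
    ]


# Toy parameters (assumed for illustration): 2 has order 8 modulo 17.
MODULUS, ROOT = 17, 2
data = [1, 2, 3, 4, 5, 6, 7, 8]
spectrum = ntt(data, ROOT, MODULUS)
```

Production ZKP provers run the same butterfly structure over fields of roughly 64 to 384 bits and sizes of 2^20 or more, which is what makes the transform a bottleneck.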

Code Generation for Cryptographic Kernels using Multi-word Modular Arithmetic

Naifeng Zhang (Carnegie Mellon University): Fully homomorphic encryption (FHE) and zero-knowledge proofs (ZKPs) are emerging as solutions for data security in distributed environments. However, the widespread adoption of these encryption techniques is hindered by their significant computational overhead, primarily resulting from core cryptographic operations that involve large integer arithmetic. This paper presents a formalization of multi-word modular arithmetic (MoMA), which breaks down large bit-width integer arithmetic into operations on machine words. We further develop a rewrite system that implements MoMA through recursive rewriting of data types, designed for compatibility with compiler infrastructures and code generators. We evaluate MoMA by generating cryptographic kernels, including basic linear algebra subprogram (BLAS) operations and the number theoretic transform (NTT), targeting various GPUs. Our MoMA-based BLAS operations outperform state-of-the-art multi-precision libraries by orders of magnitude, and MoMA-based NTTs achieve near-ASIC performance on commodity GPUs.
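
The idea of breaking wide integers into machine words can be illustrated with a small sketch. This is not the MoMA rewrite system itself: it shows only one level of decomposition (a 128-bit value as two 64-bit words) and only modular addition, and every name below is an assumption introduced for illustration.

```python
MASK64 = (1 << 64) - 1  # models one 64-bit machine word


def split(x):
    """Split an integer below 2**128 into a (hi, lo) pair of 64-bit words."""
    return (x >> 64) & MASK64, x & MASK64


def join(words):
    """Inverse of split."""
    hi, lo = words
    return (hi << 64) | lo


def add_words(a, b):
    """Double-word addition; returns (carry_out, (hi, lo))."""
    lo = (a[1] + b[1]) & MASK64
    carry_lo = int(lo < a[1])          # low word wrapped around
    hi = (a[0] + b[0]) & MASK64
    carry_hi = int(hi < a[0])          # high word wrapped around
    hi_with_carry = (hi + carry_lo) & MASK64
    carry_hi |= int(hi_with_carry < hi)
    return carry_hi, (hi_with_carry, lo)


def sub_words(a, b):
    """Double-word subtraction modulo 2**128 (the final borrow is dropped)."""
    lo = (a[1] - b[1]) & MASK64
    borrow = int(a[1] < b[1])
    hi = (a[0] - b[0] - borrow) & MASK64
    return hi, lo


def mod_add(a, b, q):
    """(a + b) mod q for a, b < q < 2**128, all given as (hi, lo) pairs."""
    carry, total = add_words(a, b)
    # Tuple comparison checks the high word first, then the low word.
    if carry or total >= q:
        total = sub_words(total, q)
    return total
```

Applying the same split recursively (256 bits as two 128-bit halves, and so on) is the general direction the abstract describes; multiplication needs more care because partial products span two words.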

Accelerating the Sumcheck Protocol in Hardware

Shanie Winitz (Ingonyama): The Sumcheck protocol is a key component in modern zero-knowledge proofs, verifying that the sum of multilinear polynomials over the Boolean hypercube equals a claimed value. We present Ingonyama’s Sumcheck IP core, a programmable hardware design that operates on polynomials in HBM and supports arbitrary combine functions. The architecture leverages parallel processing units to fold polynomials efficiently, providing scalable hardware acceleration for ZK proofs.
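
As background on the protocol itself, here is a minimal software sketch of Sumcheck for a single multilinear polynomial given by its evaluations on the Boolean hypercube, including the folding step the abstract mentions. It says nothing about the IP core's design; the field modulus, the function names, and the choice to run prover and verifier in one loop are all assumptions for illustration.

```python
import random

P = 2**61 - 1  # a Mersenne prime, assumed here as the field modulus


def fold(table, r):
    """Fix the first variable to r, halving the evaluation table."""
    half = len(table) // 2
    return [(table[j] + r * (table[half + j] - table[j])) % P for j in range(half)]


def mle_eval(table, point):
    """Evaluate the multilinear extension of table at point (direct formula)."""
    n = len(point)
    total = 0
    for idx, value in enumerate(table):
        weight = value
        for k, r in enumerate(point):
            bit = (idx >> (n - 1 - k)) & 1  # first variable is the top bit
            weight = weight * (r if bit else 1 - r) % P
        total = (total + weight) % P
    return total


def sumcheck(table, claimed_sum, rng):
    """Run an honest prover against the verifier; return True if accepted."""
    original = list(table)
    claim = claimed_sum % P
    challenges = []
    while len(table) > 1:
        half = len(table) // 2
        g0 = sum(table[:half]) % P   # round polynomial at 0
        g1 = sum(table[half:]) % P   # round polynomial at 1
        if (g0 + g1) % P != claim:   # verifier's consistency check
            return False
        r = rng.randrange(P)         # verifier's random challenge
        challenges.append(r)
        claim = (g0 + r * (g1 - g0)) % P  # round polynomial is linear in r
        table = fold(table, r)
    # Final check: one evaluation of the polynomial at the challenge point.
    return mle_eval(original, challenges) == claim


evals = [3, 1, 4, 1, 5, 9, 2, 6]
accepted = sumcheck(evals, sum(evals), random.Random(0))
```

With a combine function over several polynomials, each round polynomial has higher degree and the prover sends more evaluations per round, but the fold step stays the same.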

Accelerating HyperPlonk for Zero-Knowledge Proofs

Alhad Daftardar (New York University): Zero-Knowledge Proofs (ZKPs) have rapidly gained attention for their capability to prove the correctness of computations without revealing sensitive data. ZKPs have been proposed for blockchain technologies, verifiable machine learning, and electronic voting, but have yet to see widespread adoption due to their high computational complexity. Naturally, there has been recent work to accelerate ZKP primitives and protocols using GPUs and ASICs. However, the protocols considered so far face one of two challenges: they require a trusted setup for each new application or generate large proofs with high verification costs, limiting their applicability in scenarios with numerous verifiers or strict verification time constraints. HyperPlonk is a state-of-the-art ZKP protocol that supports both one-time, universal setup and small proof sizes/verification costs expected by publicly verifiable, consensus-based systems (e.g., blockchain). While HyperPlonk offers efficient setup and verification, its proving phase is expensive due to wide-bit computations, high-degree polynomials, and intensive kernels like MSM and SumCheck. We introduce zkSpeed, a hardware accelerator that supports all major primitives and schedules protocol phases across specialized units. Using high-level synthesis and full-chip simulation, we optimize design tradeoffs for performance and bandwidth. On a 366 mm² chip with 2 TB/s off-chip bandwidth, zkSpeed achieves an 801× geomean speedup over a CPU baseline.

AcclMT: A Highly Resource-Efficient and Flexible Poseidon Hash-Based Merkle Tree Architecture

Changxu Liu (Fudan University): In Zero-Knowledge Proof (ZKP) protocols, Merkle Trees play a critical role in ensuring cryptographic security, particularly in zk-STARK schemes, where they share significant computational tasks with the Number Theoretic Transform (NTT). A key aspect of Merkle Trees is the use of hash functions, with Poseidon Hash emerging as a leading choice due to its ZK-friendly properties. To enhance the performance of these protocols, we present AcclMT, an innovative hardware architecture for efficient Merkle Tree construction based on Poseidon Hash. AcclMT employs a hardware-software co-design that optimizes the data flow of hash computations, leading to a highly resource-efficient and area-efficient Poseidon Hash engine. This engine enhances the utilization of modular multiplication resources and is paired with hierarchical on-chip cache and optimized task scheduling to accelerate the building of large Merkle Trees. Additionally, the architecture is flexible, supporting a range of parameter configurations to cater to different needs. Our experimental results demonstrate the effectiveness of AcclMT: the Poseidon Hash engine delivers a 14.3× speedup compared to the latest FPGA-based implementations, while also reducing area usage by 14.8%. When building Merkle Trees, AcclMT achieves up to 1665× speedup over software solutions, with average utilization rates of 95.9% and 99.2% for its two hash engines.
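
To make the Merkle Tree workload concrete, the sketch below builds a binary tree level by level and verifies an opening path. It uses SHA-256 from the Python standard library purely as a stand-in, since Poseidon is not in the standard library; the function names and the power-of-two leaf count are likewise assumptions, and nothing here reflects AcclMT's architecture.

```python
import hashlib


def hash_leaf(data):
    """Stand-in leaf hash (SHA-256 here; a ZK system would use Poseidon)."""
    return hashlib.sha256(data).digest()


def hash_pair(left, right):
    """Stand-in two-to-one compression of a pair of child nodes."""
    return hashlib.sha256(left + right).digest()


def build_tree(leaves):
    """Return all levels, from hashed leaves up to the root.

    Assumes len(leaves) is a power of two.
    """
    level = [hash_leaf(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [hash_pair(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def open_path(levels, index):
    """Collect the sibling at each level for the leaf at index."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index >>= 1
    return path


def verify(root, leaf, index, path):
    """Recompute the root from a leaf and its sibling path."""
    node = hash_leaf(leaf)
    for sibling in path:
        if index & 1:
            node = hash_pair(sibling, node)
        else:
            node = hash_pair(node, sibling)
        index >>= 1
    return node == root


leaves = [bytes([i]) for i in range(8)]
levels = build_tree(leaves)
root = levels[-1][0]
```

A tree over n leaves needs n leaf hashes plus n - 1 pair hashes, and each level depends on the one below it, which is why hash-engine utilization and scheduling matter for large trees.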

Lessons from Building the World’s First Verifiable Processing Unit (VPU)

Michael Gao (Fabric Cryptography): Michael Gao is the founder and CEO at Fabric Cryptography, which has a special announcement today!

ZKARCH 2025 Organizers and Contact

Alhad Daftardar
ajd9396@nyu.edu

Jianqiao “Cambridge” Mo
jqmo@nyu.edu

Professor Brandon Reagen
bjr5@nyu.edu

Professor Siddharth Garg
sg175@nyu.edu

Registration

ZKARCH 2025 will be held in conjunction with the International Symposium on Microarchitecture (MICRO 2025). Refer to the main venue to continue with the registration process.