【zkMIPS系列】ZKM Prover之模拟器（Emulator）的运行

ZKM
发布于 2025-01-05 15:28
阅读 286

本文介绍了一个MIPS模拟器的设计，支持MIPS ELF程序的加载、执行和段生成。模拟器通过逐步执行指令并检查退出条件，支持常规执行和段分割两种模式。内存管理采用4KB页面并通过哈希树计算镜像ID，优化了修改页面的哈希计算，提升了性能。主要数据结构包括仿真状态、内存和段信息。

# 模拟器
模拟器主要实现了MIPS指令集的仿真操作，并提供了运行MIPS ELF程序和生成段的接口。所有代码可以在`zkm/emulator`目录下找到。

# 执行过程

MIPS程序的执行过程如下：（左边为常规执行过程，右边为执行过程中带有段分割的过程）

![elf_execuition_process.png](https://img.learnblockchain.cn/attachments/2025/01/ZLMTBmUU677a336948751.png)

主要步骤如下：

* **load_elf**: 将MIPS程序加载到模拟内存中。
* **patch_elf**: 隐藏一些可忽略的过程（例如运行时的某些检查）。
* **patch_stack**: 初始化初始的运行时栈（包括将程序参数填充到栈中）。
* **step**: 执行一条指令。在常规执行过程中，执行一步后直接判断是否触发退出条件。如果触发，进入退出过程，否则继续执行下一步；如果是带有段分割的执行过程，在检查退出条件后，还要检查当前已执行的步骤数是否达到段阈值。如果达到，进入`split_seg`过程。如果退出条件被触发，进入`split_seg + exit`过程。
* **split_seg**: 生成当前段的预内存镜像（包括系统架构状态）以及预/后镜像ID，并利用这些信息生成段数据结构，并将其写入对应的段输出文件。
* **Exit**: 结束程序执行。

# 主数据结构

主数据结构包括：

* **InstrumentedState**: 维护仿真系统的整体信息，包括MIPS架构状态、当前段ID、当前段的预状态（如PC、镜像ID、哈希根、输入等）。

```rust
pub struct InstrumentedState {
   /// state stores the state of the MIPS emulator
   pub state: Box<State>,

/// writer for stdout
   stdout_writer: Box<dyn Write>,
   /// writer for stderr
   stderr_writer: Box<dyn Write>,

pub pre_segment_id: u32,
   pre_pc: u32,
   pre_image_id: [u8; 32],
   pre_hash_root: [u8; 32],
   block_path: String,
   pre_input: Vec<Vec<u8>>,
   pre_input_ptr: usize,
   pre_public_values: Vec<u8>,
   pre_public_values_ptr: usize,
}
```

* **State**: 维护仿真系统的MIPS架构状态（寄存器、内存、堆指针等）。
```rust
pub struct State {
   pub memory: Box<Memory>,

/// the 32 general purpose registers of MIPS.
   pub registers: [u32; 32],
   /// the pc register stores the current execution instruction address.
   pub pc: u32,
   /// the next pc stores the next execution instruction address.
   next_pc: u32,
   /// the hi register stores the multiplier/divider result high(remainder) part.
   hi: u32,
   /// the low register stores the multiplier/divider result low(quotient) part.
   lo: u32,

/// heap handles the mmap syscall.
   heap: u32,

/// brk handles the brk syscall
   brk: u32,

/// tlb addr
   local_user: u32,

/// step tracks the total step has been executed.
   pub step: u64,
   pub total_step: u64,

/// cycle tracks the total cycle has been executed.
   pub cycle: u64,
   pub total_cycle: u64,

/// A stream of input values (global to the entire program).
   pub input_stream: Vec<Vec<u8>>,

/// A ptr to the current position in the input stream incremented by HINT_READ opcode.
   pub input_stream_ptr: usize,

/// A stream of public values from the program (global to entire program).
   pub public_values_stream: Vec<u8>,

/// A ptr to the current position in the public values stream, incremented when reading from public_values_stream.
   pub public_values_stream_ptr: usize,

pub exited: bool,
   pub exit_code: u8,
   dump_info: bool,
}
```
* **Memory**: 维护系统当前的内存镜像以及当前段的访问追踪信息。
```rust 
pub struct Memory {
   /// page index -> cached page
   pages: BTreeMap<u32, Rc<RefCell<CachedPage>>>,

// two caches: we often read instructions from one page, and do memory things with another page.
   // this prevents map lookups each instruction
   last_page_keys: [Option<u32>; 2],
   last_page: [Option<Rc<RefCell<CachedPage>>>; 2],

// for implement std::io::Read trait
   addr: u32,
   count: u32,

rtrace: BTreeMap<u32, [u8; PAGE_SIZE]>,
   wtrace: [BTreeMap<u32, Rc<RefCell<CachedPage>>>; 3],
}

```
* **Segment**: 维护段相关信息。
```rust
pub struct Segment {
   pub mem_image: BTreeMap<u32, u32>,  // initial memory image of segment
   pub pc: u32,                        // initial pc
   pub segment_id: u32,                // segment id
   pub pre_image_id: [u8; 32],         // image id of segment pre state 
   pub pre_hash_root: [u8; 32],       // hash root of segment pre memory image      
   pub image_id: [u8; 32],            // image id of segment post state 
   pub page_hash_root: [u8; 32],      // hash root of segment post memory image
   pub end_pc: u32,                   // end pc
   pub step: u64,                     // step number of cur segment
   pub input_stream: Vec<Vec<u8>>,
   pub input_stream_ptr: usize,
   pub public_values_stream: Vec<u8>,
   pub public_values_stream_ptr: usize,
}

```

# 指令仿真

模拟器使用指令解析方法执行指令：首先获取指令，然后根据指令编码解析并执行相应的函数，更新系统的状态/内存。主要代码：`mips_step()`可以在`state.rs`中找到。

支持的ISA可以在`mips_isa`中找到。

# 内存仿真和Image ID计算

内存以哈希树的形式组织，页面（4KB）作为节点。哈希页面的起始地址为0x8000000，程序地址空间为0\~0x8000000，根哈希页面地址为0x81020000，如下所示。

![memory.png](https://img.learnblockchain.cn/attachments/2025/01/5qESMYYZ677a33f0bc979.png)

页面哈希和镜像ID的计算过程如下：

1. 将内存（Mem）按页面（4KB）组织，计算相应的哈希，并存储在相应的哈希页面中；
2. 递归计算哈希页面的哈希值，直到只剩下一个哈希页面，即根哈希页面；
3. 在哈希页面的0x400偏移位置写入寄存器信息，计算根哈希页面的哈希值，并等待根哈希值；
4. 将根哈希值和对应的PC值拼接在一起，计算哈希值，得到镜像ID。

为了减少页面哈希更新的频率，在指令执行过程中会记录已修改的内存页面。因此，在镜像ID计算过程中，只需递归更新这些已修改的内存页面的哈希页面，计算根哈希并获得镜像ID。

模拟器

模拟器主要实现了MIPS指令集的仿真操作，并提供了运行MIPS ELF程序和生成段的接口。所有代码可以在zkm/emulator目录下找到。

执行过程

MIPS程序的执行过程如下：（左边为常规执行过程，右边为执行过程中带有段分割的过程）

主要步骤如下：

load_elf: 将MIPS程序加载到模拟内存中。
patch_elf: 隐藏一些可忽略的过程（例如运行时的某些检查）。
patch_stack: 初始化初始的运行时栈（包括将程序参数填充到栈中）。
step: 执行一条指令。在常规执行过程中，执行一步后直接判断是否触发退出条件。如果触发，进入退出过程，否则继续执行下一步；如果是带有段分割的执行过程，在检查退出条件后，还要检查当前已执行的步骤数是否达到段阈值。如果达到，进入split_seg过程。如果退出条件被触发，进入split_seg + exit过程。
split_seg: 生成当前段的预内存镜像（包括系统架构状态）以及预/后镜像ID，并利用这些信息生成段数据结构，并将其写入对应的段输出文件。
Exit: 结束程序执行。

主数据结构

主数据结构包括：

InstrumentedState: 维护仿真系统的整体信息，包括MIPS架构状态、当前段ID、当前段的预状态（如PC、镜像ID、哈希根、输入等）。

pub struct InstrumentedState {
   /// state stores the state of the MIPS emulator
   pub state: Box&lt;State>,

   /// writer for stdout
   stdout_writer: Box&lt;dyn Write>,
   /// writer for stderr
   stderr_writer: Box&lt;dyn Write>,

   pub pre_segment_id: u32,
   pre_pc: u32,
   pre_image_id: [u8; 32],
   pre_hash_root: [u8; 32],
   block_path: String,
   pre_input: Vec&lt;Vec&lt;u8>>,
   pre_input_ptr: usize,
   pre_public_values: Vec&lt;u8>,
   pre_public_values_ptr: usize,
}

State: 维护仿真系统的MIPS架构状态（寄存器、内存、堆指针等）。

pub struct State {
pub memory: Box&lt;Memory>,

/// the 32 general purpose registers of MIPS.
pub registers: [u32; 32],
/// the pc register stores the current execution instruction address.
pub pc: u32,
/// the next pc stores the next execution instruction address.
next_pc: u32,
/// the hi register stores the multiplier/divider result high(remainder) part.
hi: u32,
/// the low register stores the multiplier/divider result low(quotient) part.
lo: u32,

/// heap handles the mmap syscall.
heap: u32,

/// brk handles the brk syscall
brk: u32,

/// tlb addr
local_user: u32,

/// step tracks the total step has been executed.
pub step: u64,
pub total_step: u64,

/// cycle tracks the total cycle has been executed.
pub cycle: u64,
pub total_cycle: u64,

/// A stream of input values (global to the entire program).
pub input_stream: Vec&lt;Vec&lt;u8>>,

/// A ptr to the current position in the input stream incremented by HINT_READ opcode.
pub input_stream_ptr: usize,

/// A stream of public values from the program (global to entire program).
pub public_values_stream: Vec&lt;u8>,

/// A ptr to the current position in the public values stream, incremented when reading from public_values_stream.
pub public_values_stream_ptr: usize,

pub exited: bool,
pub exit_code: u8,
dump_info: bool,
}

Memory: 维护系统当前的内存镜像以及当前段的访问追踪信息。


pub struct Memory {
/// page index -> cached page
pages: BTreeMap&lt;u32, Rc&lt;RefCell&lt;CachedPage>>>,

// two caches: we often read instructions from one page, and do memory things with another page.
// this prevents map lookups each instruction
last_page_keys: [Option&lt;u32>; 2],
last_page: [Option&lt;Rc&lt;RefCell&lt;CachedPage>>>; 2],

// for implement std::io::Read trait
addr: u32,
count: u32,

rtrace: BTreeMap&lt;u32, [u8; PAGE_SIZE]>,
wtrace: [BTreeMap&lt;u32, Rc&lt;RefCell&lt;CachedPage>>>; 3],
}

* **Segment**: 维护段相关信息。
```rust
pub struct Segment {
   pub mem_image: BTreeMap&lt;u32, u32>,  // initial memory image of segment
   pub pc: u32,                        // initial pc
   pub segment_id: u32,                // segment id
   pub pre_image_id: [u8; 32],         // image id of segment pre state 
   pub pre_hash_root: [u8; 32],       // hash root of segment pre memory image      
   pub image_id: [u8; 32],            // image id of segment post state 
   pub page_hash_root: [u8; 32],      // hash root of segment post memory image
   pub end_pc: u32,                   // end pc
   pub step: u64,                     // step number of cur segment
   pub input_stream: Vec&lt;Vec&lt;u8>>,
   pub input_stream_ptr: usize,
   pub public_values_stream: Vec&lt;u8>,
   pub public_values_stream_ptr: usize,
}