vllm.v1.worker.gpu.spec_decode.eagle.cudagraph
DecodeEagleCudaGraphManager
Bases: EagleCudaGraphManagerBase
Eagle CudaGraphManager for decode draft generation, building its own attention metadata from scratch.
Source code in vllm/v1/worker/gpu/spec_decode/eagle/cudagraph.py
EagleCudaGraphManagerBase
Bases: CudaGraphManager
Base CudaGraphManager for Eagle with a dedicated graph pool.
Source code in vllm/v1/worker/gpu/spec_decode/eagle/cudagraph.py
PrefillEagleCudaGraphManager
Bases: EagleCudaGraphManagerBase
Eagle CudaGraphManager for prefill, using pre-built attention states from the target model's capture.
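
The relationship between the three classes above can be illustrated with a small stand-in sketch. This is not vLLM's implementation: the classes below only reuse the documented names and inheritance, and everything else is a hypothetical placeholder (the `new_graph_pool` helper, the `pool` and `graphs` attributes, the `capture` and `build_attn_metadata` methods, the `target_attn_states` argument, and the assumption that the parent class uses a shared pool while each Eagle manager gets its own). No CUDA or PyTorch calls are made; pools are plain integers.

```python
import itertools

# Hypothetical stand-in for allocating a CUDA graph memory pool handle.
_pool_ids = itertools.count()


def new_graph_pool() -> int:
    return next(_pool_ids)


class CudaGraphManager:
    """Stand-in parent class (assumption: it captures into a shared pool)."""

    shared_pool = new_graph_pool()

    def __init__(self) -> None:
        self.pool = CudaGraphManager.shared_pool
        self.graphs: dict[int, tuple[int, object]] = {}

    def build_attn_metadata(self, batch_size: int) -> object:
        raise NotImplementedError

    def capture(self, batch_size: int) -> None:
        # Record which pool was used and which attention metadata was captured.
        self.graphs[batch_size] = (self.pool, self.build_attn_metadata(batch_size))


class EagleCudaGraphManagerBase(CudaGraphManager):
    """Eagle base: replaces the shared pool with a dedicated one."""

    def __init__(self) -> None:
        super().__init__()
        self.pool = new_graph_pool()


class PrefillEagleCudaGraphManager(EagleCudaGraphManagerBase):
    """Prefill: reuses attention states already built for the target model."""

    def __init__(self, target_attn_states: dict[int, object]) -> None:
        super().__init__()
        self.target_attn_states = target_attn_states

    def build_attn_metadata(self, batch_size: int) -> object:
        return self.target_attn_states[batch_size]


class DecodeEagleCudaGraphManager(EagleCudaGraphManagerBase):
    """Decode: builds its own attention metadata from scratch."""

    def build_attn_metadata(self, batch_size: int) -> object:
        return {"batch_size": batch_size, "source": "built-by-decode-manager"}


prefill = PrefillEagleCudaGraphManager({4: "target-state-4"})
decode = DecodeEagleCudaGraphManager()
prefill.capture(4)
decode.capture(4)
```

In this sketch the two subclasses differ only in where their attention metadata comes from, which mirrors the distinction the class descriptions draw between the prefill and decode managers.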