SGLang Diffusion Code Walk Through

This document provides developers with a code walk-through of the SGLang Diffusion (`multimodal_gen`) backend.

Diffusion Models
SGLang-Diffusion supports efficient inference for diffusion models. Diffusion models are one of the fastest-developing and most popular generative frameworks for images and videos in recent years.
Broadly, diffusion models define a forward process: Data -> Gaussian Noise. From this forward process, the reverse process can be derived: Noise -> Data, which is the reconstruction of a sample from noise. Executing this reverse process using a trained model is what SGLang-Diffusion needs to do.
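To make the reverse process concrete, here is a minimal, self-contained sketch of a denoising loop. It is not SGLang-Diffusion code: `toy_model`, `ToyEulerScheduler`, and `DATA` are hypothetical names, and the linear interpolation between data and noise is just one possible parameterization chosen for illustration. The "model" is an oracle that already knows the clean sample, standing in for a trained network.

```python
import random

# Hypothetical "clean" sample x0; a real model never sees this directly.
DATA = [0.5, -1.0, 2.0]


def toy_model(x_t, t):
    """Stand-in for a trained network.

    Assumes the forward process x_t = (1 - t) * x0 + t * noise, so the
    velocity (noise - x0) equals (x_t - x0) / t. This oracle cheats by
    reading DATA; a trained model would predict the value instead.
    """
    return [(xi - di) / t for xi, di in zip(x_t, DATA)]


class ToyEulerScheduler:
    """Hypothetical scheduler: owns the timesteps and the per-step update."""

    def __init__(self, num_steps):
        # Timesteps run from 1.0 (pure noise) down to 1 / num_steps.
        self.timesteps = [1.0 - i / num_steps for i in range(num_steps)]
        self.dt = 1.0 / num_steps

    def step(self, model_output, x_t):
        # One Euler step of the reverse process: move against the velocity.
        return [xi - self.dt * vi for xi, vi in zip(x_t, model_output)]


def denoise(num_steps=10, seed=0):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in DATA]  # start from Gaussian noise
    scheduler = ToyEulerScheduler(num_steps)
    for t in scheduler.timesteps:
        x = scheduler.step(toy_model(x, t), x)
    return x


if __name__ == "__main__":
    print(denoise())  # approximately equal to DATA
```

The split mirrors the general shape of an inference loop: the model produces a prediction for the current sample and timestep, and the scheduler's `step` turns that prediction into the next, less noisy sample.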
As a code walk-through tutorial, we will not delve into complex mathematical formulas and principles here. Based on how models conceptualize the reverse process, diffusion models are generally divided into three categories:
These three modeling approaches interpret the reverse process from different angles, but their presentation within the inference framework is similar. They primarily differ in the denoise stage and are mainly handled by the following components:
- The `runtime/models/dits/` directory.
- The `runtime/models/schedulers/` directory.
- The `runtime/models/schedulers/` directory, via the Scheduler's `step` method.

The design of SGLang Diffusion aims to stay consistent with SGLang so that developers can carry over familiar concepts. The life cycle of a request in SGLang Diffusion is roughly as follows:
1. `Scheduler` and `GPUWorker` are launched on each rank. During initialization, `GPUWorker` builds the `ComposedPipeline` object, which involves loading the pipeline components.
2. The client submits a request (via `DiffusionGenerator` or the HTTP API).
3. The `Scheduler` on rank 0 receives the request via ZeroMQ and broadcasts it to all ranks. The Schedulers on all ranks receive the request and call their local `GPUWorker` to execute the task.
4. `GPUWorker` calls the `forward` method of `ComposedPipeline`. Internally, the pipeline schedules each `PipelineStage` in sequence via a `PipelineExecutor` (the `SyncExecutor` implementation by default).
5. The `Scheduler` on rank 0 collects the generated tensor data (pixel values) and returns it to the client.
6. The client (`DiffusionGenerator`) receives the tensor data and performs post-processing (e.g., format conversion, saving files) to obtain the final image or video.
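The pipeline execution step of the life cycle above can be sketched as a list of stages run one after another by a synchronous executor. This is a toy illustration only: the `Toy*` classes, the stage names `encode` and `decode`, and the arithmetic inside each stage are hypothetical and do not reflect the actual SGLang Diffusion classes or their signatures.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ToyRequest:
    """Hypothetical request object passed from stage to stage."""
    prompt: str
    state: Dict[str, object] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)


@dataclass
class ToyStage:
    """Hypothetical pipeline stage: a named function that mutates the request."""
    name: str
    fn: Callable[[ToyRequest], None]

    def forward(self, req: ToyRequest) -> ToyRequest:
        self.fn(req)
        req.trace.append(self.name)  # record execution order
        return req


class ToySyncExecutor:
    """Runs the stages sequentially, each one seeing the previous output."""

    def execute(self, stages: List[ToyStage], req: ToyRequest) -> ToyRequest:
        for stage in stages:
            req = stage.forward(req)
        return req


class ToyComposedPipeline:
    def __init__(self, stages: List[ToyStage],
                 executor: Optional[ToySyncExecutor] = None):
        self.stages = stages
        self.executor = executor or ToySyncExecutor()  # sync by default

    def forward(self, req: ToyRequest) -> ToyRequest:
        return self.executor.execute(self.stages, req)


def build_toy_pipeline() -> ToyComposedPipeline:
    # Placeholder computations; real stages would run encoders, a DiT, a VAE.
    def encode(req):
        req.state["embedding"] = [float(ord(c)) for c in req.prompt]

    def denoise(req):
        req.state["latent"] = [v / 255.0 for v in req.state["embedding"]]

    def decode(req):
        req.state["pixels"] = [round(v * 255) for v in req.state["latent"]]

    return ToyComposedPipeline(
        [ToyStage("encode", encode), ToyStage("denoise", denoise),
         ToyStage("decode", decode)]
    )


if __name__ == "__main__":
    out = build_toy_pipeline().forward(ToyRequest(prompt="hi"))
    print(out.trace, out.state["pixels"])
```

Keeping the executor separate from the pipeline means the stage list stays the same if a different scheduling strategy is swapped in; only the `execute` method would change.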
[!NOTE]
- This code walk-through is based on the SGLang-diffusion version (`35a9a073706e89a2f5740f578bbb080146cd48bf`).
- This code walk-through is inspired by the SGLang Code Walk Through.