Abstract: Existing generative image transformers follow a two-stage generation paradigm, where the first stage learns a codebook to encode images into discrete codes via vector quantization, and the ...