
modelee / vqgan_imagenet_f16_16384

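This repository does not declare an open-source license file (LICENSE). Before use, check the project description and the upstream dependencies of its code.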

VQGAN-f16-16384

Model Description

This is a Flax/JAX implementation of VQGAN, which learns a codebook of context-rich visual parts by combining convolutional methods with transformers. It was introduced in Taming Transformers for High-Resolution Image Synthesis (CVPR paper).

The model encodes an image as a sequence of tokens taken from the codebook. The sequence length is fixed for a given image size.
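To illustrate what "tokens taken from the codebook" means, here is a minimal pure-Python sketch of nearest-neighbour vector quantization, the general idea behind VQ models. The toy codebook, the feature vectors, and the `quantize` helper are all hypothetical and far smaller than the real model. This is not the repository's implementation.

```python
def quantize(vectors, codebook):
    """Return, for each vector, the index of its nearest codebook entry.

    Distance is squared Euclidean. `quantize` is an illustrative helper,
    not a function from this repository.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
        for v in vectors
    ]


# Toy codebook: 4 entries of dimension 2 (the real model has 16,384 entries).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Toy "encoder outputs": one feature vector per spatial position.
features = [[0.1, 0.2], [0.9, 0.1], [0.2, 0.8], [0.7, 0.9]]

tokens = quantize(features, codebook)
print(tokens)  # [0, 1, 2, 3]
```

Each integer in `tokens` is an index into the codebook, so an image becomes a sequence of such indices.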

This version of the model uses a reduction factor f=16 and a vocabulary of 16,384 tokens.

As an example of how the reduction factor works, images of size 256x256 are encoded to sequences of 256 tokens: (256/16) * (256/16) = 16 * 16 = 256. Images of 512x512 would result in sequences of 1024 tokens: (512/16) * (512/16) = 32 * 32 = 1024.
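The token-count arithmetic above can be written as a small helper. The function name `num_tokens` and the requirement that both dimensions be divisible by f are assumptions made for this sketch, not part of the model's API.

```python
def num_tokens(height, width, f=16):
    """Number of codebook tokens for an image, given reduction factor f.

    Assumes (for this sketch) that both dimensions are divisible by f.
    """
    if height % f != 0 or width % f != 0:
        raise ValueError("height and width must be multiples of f")
    return (height // f) * (width // f)


print(num_tokens(256, 256))  # 256
print(num_tokens(512, 512))  # 1024
```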

This model was ported to JAX using a checkpoint trained on ImageNet.

How to Use

The checkpoint can be loaded using Suraj Patil's implementation of VQModel.
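A loading sketch follows, under stated assumptions. It assumes the implementation is the `vqgan-jax` package exposing `vqgan_jax.modeling_flax_vqgan.VQModel` with a `from_pretrained` method. The default repository id is likewise an assumption and may need to point at wherever the checkpoint is actually hosted. The wrapper `load_vqgan` is hypothetical and imports lazily, so defining it does not require the package.

```python
def load_vqgan(repo_id="dalle-mini/vqgan_imagenet_f16_16384"):
    """Load the VQGAN checkpoint (illustrative wrapper, not an official API).

    Assumptions: the `vqgan-jax` package is installed, it provides
    `VQModel.from_pretrained`, and `repo_id` names a reachable checkpoint.
    """
    # Imported lazily so this sketch can be defined without the dependency.
    from vqgan_jax.modeling_flax_vqgan import VQModel

    return VQModel.from_pretrained(repo_id)


# Usage (needs the package and network access, so it is left commented out):
# model = load_vqgan()
```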

Other

This model can be used as part of the implementation of DALL·E mini. Our report contains more details on how to leverage it in an image encoding / generation pipeline.
