Lens Text Encoder (GPT-OSS)

Updated: May 28, 2026


Type: Text Encoder
Base Model: Lens
Published: May 28, 2026
File: SafeTensor, 12.33 GB (1 variant available)
Hash (AutoV2): 103D7759C7

This is the GPT-OSS text encoder used by Microsoft's Lens text-to-image model. Lens conditions its 48-block MMDiT denoiser on concatenated features from multiple GPT-OSS layers rather than on a CLIP-style text tower, and this choice is a large part of why it follows long captions well and generalizes across languages. This page mirrors the encoder weights so you can run Lens and Lens-Turbo on-site without downloading them separately.

The underlying language model is GPT-OSS, released by OpenAI in August 2025 under the Apache 2.0 license; all credit for it goes to OpenAI. The bundled weights come from the microsoft/Lens repository, where they sit in the text_encoder/ folder alongside the Lens denoiser and the FLUX.2 VAE. Civitai hosts this copy only as a mirror; see the upstream repositories for the canonical weights and updates.

Built by

OpenAI released the GPT-OSS family (gpt-oss-20b and gpt-oss-120b) in August 2025 as open-weight reasoning models with full chain-of-thought access, a mixture-of-experts (MoE) architecture, and MXFP4 quantization of the MoE weights. Microsoft Research chose GPT-OSS as the text encoder for Lens and bundles its weights with the Lens distribution. The Lens project leads are Dong Chen, Fangyun Wei, and Ziyu Wan; core contributors are Jiawei Zhang, Jinjing Zhao, Sirui Zhang, Yang Yue, and Zhiyang Liang.

How Lens uses it

The encoder is frozen; Lens does not fine-tune it. Features from several GPT-OSS layers are concatenated and fed to the MMDiT denoiser as conditioning. Drawing on multiple layers gives Lens stronger long-caption comprehension than extracting only the final layer, and Lens inherits GPT-OSS's multilingual coverage without any additional text-encoder training.
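A minimal NumPy sketch of multi-layer feature concatenation, for illustration only. It assumes the encoder's per-layer hidden states are already available as a list of `(seq_len, hidden_dim)` arrays (with Hugging Face `transformers`, passing `output_hidden_states=True` returns such a per-layer tuple). The layer indices, the toy sizes, and the `project_to_denoiser` step are hypothetical: this page does not say which layers Lens uses or how the concatenated features are mapped to the denoiser's width.

```python
import numpy as np


def concat_layer_features(hidden_states, layer_ids):
    """Concatenate per-token features from several encoder layers.

    hidden_states: list of arrays, one per layer, each (seq_len, hidden_dim).
    layer_ids: which layers to use (HYPOTHETICAL choice; Lens's actual
               layer selection is not documented here).
    Returns an array of shape (seq_len, len(layer_ids) * hidden_dim).
    """
    selected = [hidden_states[i] for i in layer_ids]
    return np.concatenate(selected, axis=-1)


def project_to_denoiser(features, weight):
    """HYPOTHETICAL linear projection from concatenated encoder features
    to the denoiser's text-stream width. weight: (in_dim, model_dim)."""
    return features @ weight


# Toy sizes: 6 "layers", 4 tokens, hidden size 8 (real models are far larger).
rng = np.random.default_rng(0)
num_layers, seq_len, hidden_dim, model_dim = 6, 4, 8, 16
hidden_states = [rng.standard_normal((seq_len, hidden_dim)) for _ in range(num_layers)]

# Three layers of width 8 concatenate to width 24 per token.
cond = concat_layer_features(hidden_states, layer_ids=[1, 3, 5])
print(cond.shape)  # (4, 24)

w = rng.standard_normal((cond.shape[-1], model_dim)) / np.sqrt(cond.shape[-1])
tokens = project_to_denoiser(cond, w)
print(tokens.shape)  # (4, 16)
```

Because the encoder is frozen, the concatenated features depend only on the prompt, so a pipeline could compute them once per prompt and reuse them across every denoising step.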

Variant

Microsoft's README does not name the specific GPT-OSS checkpoint, but the evidence points to gpt-oss-20b (21B total, 3.6B active parameters) rather than gpt-oss-120b: its active-parameter count is close to the 3.8B of the Lens denoiser, and the 12.33 GB file is far too small to hold gpt-oss-120b. The bundled weights are stored in MXFP4 by default; the inference scripts expose a --disable_mxfp4 flag that dequantizes them when the host GPU does not support MXFP4.
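For readers curious what dequantizing involves, here is a minimal NumPy sketch of MXFP4 decoding as described in the OCP Microscaling (MX) specification: each block of 32 values stores 4-bit E2M1 codes plus one shared 8-bit power-of-two (E8M0) scale. This is not the Lens or GPT-OSS implementation; the array layout (16 packed bytes per block, low nibble first) and the helper names `dequantize_mxfp4` and `mxfp4_min_gb` are assumptions for illustration.

```python
import numpy as np

# FP4 (E2M1) values indexed by the 4-bit code; bit 3 is the sign bit.
FP4_VALUES = np.array(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
    dtype=np.float32,
)
BLOCK_SIZE = 32  # elements sharing one E8M0 scale in MXFP4


def dequantize_mxfp4(packed, scales):
    """Decode MXFP4 blocks to float32 (illustrative sketch).

    packed: uint8 array (num_blocks, 16), two 4-bit codes per byte;
            the low nibble is ASSUMED to hold the earlier element.
    scales: uint8 array (num_blocks,), E8M0 exponents with bias 127.
    Returns a float32 array of shape (num_blocks, 32).
    """
    packed = np.asarray(packed, dtype=np.uint8)
    low = packed & 0x0F
    high = packed >> 4
    # Interleave nibbles: element 2k <- low nibble, element 2k+1 <- high nibble.
    codes = np.stack([low, high], axis=-1).reshape(packed.shape[0], BLOCK_SIZE)
    values = FP4_VALUES[codes]
    # E8M0 scale is 2 ** (e - 127); the NaN encoding 0xFF is not handled here.
    factor = np.exp2(np.asarray(scales, dtype=np.float32) - 127.0)
    return values * factor[:, None]


def mxfp4_min_gb(num_params):
    """Lower bound on checkpoint size (GB) if every weight were MXFP4:
    4 bits per element plus one 8-bit scale per 32 elements."""
    bits_per_param = 4 + 8 / BLOCK_SIZE  # = 4.25
    return num_params * bits_per_param / 8 / 1e9


# One block whose codes run 0..15 twice, with scale exponent 128 (factor 2).
packed = np.array([[0x10, 0x32, 0x54, 0x76, 0x98, 0xBA, 0xDC, 0xFE] * 2], dtype=np.uint8)
out = dequantize_mxfp4(packed, np.array([128], dtype=np.uint8))
print(out.shape)  # (1, 32)

print(round(mxfp4_min_gb(117e9), 1))  # 62.2 (gpt-oss-120b, 117B params)
print(round(mxfp4_min_gb(21e9), 1))   # 11.2 (gpt-oss-20b, 21B params)
```

The same format fixes the storage cost at 4 + 8/32 = 4.25 bits per weight, so even with every one of gpt-oss-120b's 117B parameters in MXFP4 the checkpoint would need about 62 GB, versus about 11 GB for gpt-oss-20b's 21B. Only a 20b-class checkpoint fits the 12.33 GB listed on this page; real checkpoints may keep some tensors in higher precision, which pushes actual sizes above these lower bounds.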

What this model page is for

You do not generate images with this encoder alone. It is a dependency of Lens and Lens-Turbo, mirrored here so on-site generation can resolve the full pipeline without external fetches. If you are looking for the image generator itself, see the Lens model page.

Links