Reddit r/LocalLLaMA · 10 d ago

Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5

Gemma 4 E2B now runs in the browser on WebGPU kernels written by Fable 5, reaching a reported 255 tokens per second on an M4 Max. The demo and the kernels are publicly available, so practitioners can try the model and inspect the kernels themselves. The result is relevant to engineers working on in-browser inference: it shows a small model running locally in the browser at interactive speed on an M4 Max.
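To put the 255 tok/s figure in perspective, it can be converted into a per-token latency and a rough generation time. This is a minimal sketch, assuming the reported number is a steady decode rate (the post does not say whether prompt processing is included); the helper names `msPerToken` and `secondsToGenerate` are hypothetical and not part of the demo.

```typescript
// Hypothetical helpers (not from the demo): convert a throughput figure
// in tokens per second into latency estimates.

// Average time spent per generated token, in milliseconds.
function msPerToken(tokensPerSecond: number): number {
  if (tokensPerSecond <= 0) {
    throw new RangeError("tokensPerSecond must be positive");
  }
  return 1000 / tokensPerSecond;
}

// Estimated wall-clock seconds to generate `tokenCount` tokens,
// assuming the rate stays constant for the whole generation.
function secondsToGenerate(tokenCount: number, tokensPerSecond: number): number {
  return (tokenCount * msPerToken(tokensPerSecond)) / 1000;
}

const reportedRate = 255; // tok/s, the figure reported for an M4 Max

console.log(msPerToken(reportedRate).toFixed(2));             // "3.92" (ms per token)
console.log(secondsToGenerate(512, reportedRate).toFixed(2)); // "2.01" (s for 512 tokens)
```

Under that assumption, a 512-token reply would take about two seconds, which is why the figure counts as interactive speed.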

gemma-4 · webgpu · local-llm · relevance 0.00 · engagement 0.00
