What is Google Gemma 4?
Gemma 4 is Google DeepMind's open multimodal AI model family released in April 2026. It processes text, images, audio, and video with on-device agentic capabilities. It supports tool use (Maps, Wikipedia, calculators, etc.) without requiring cloud upload, making it privacy-focused by design. Available in 2B, 4B, 12B, and 27B parameter sizes.
How does Gemma 4 compare to Llama 4?
Gemma 4 12B outperforms Llama 4 Scout 17B on most benchmarks while being more parameter-efficient. Key advantages include: higher MMLU scores (83.2 vs 79.6), significantly better multimodal understanding (MMMU: 58.9 vs 54.2), superior tool use accuracy (89.1 vs 78.5), and better on-device optimization. Llama 4 has an edge in raw code generation (HumanEval) with its larger parameter count.
Can Gemma 4 run on my phone?
Yes! Gemma 4 2B and 4B are designed for mobile deployment. The 2B model runs on phones with 4GB+ RAM at 30-40 tokens/second using INT4 quantization. The 4B model needs 6GB+ RAM. Both support full multimodal capabilities on-device, including image understanding and tool use. Supported devices include Android (Pixel 8+, Samsung S24+) and iOS (iPhone 15+).
What hardware do I need for Gemma 4 12B?
For Gemma 4 12B in FP16: 24GB RAM and 12GB VRAM (RTX 4070 or better). With INT4 quantization: 8GB RAM and 6GB VRAM are sufficient. On Apple Silicon: M2 Pro/Max or better with 16GB unified memory. The INT4 quantized version runs well on most modern laptops with 16GB RAM using CPU-only inference at ~15 tokens/second.
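Where do figures like 24GB come from? A rough rule of thumb: weight memory is roughly parameter count times bits per weight divided by 8. The sketch below shows that arithmetic; it counts weights only (activations and KV cache add overhead, which is why the answer above budgets extra headroom), and uses 1 GB = 1e9 bytes for simplicity.

```python
# Back-of-envelope weight-memory estimate: params * bits / 8.
# Weights only -- activations and KV cache need additional headroom.

def weight_memory_gb(params_billion: float, bits: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits / 8 / 1e9

# Gemma 4 12B:
print(weight_memory_gb(12, 16))  # FP16 -> 24.0 GB of weights
print(weight_memory_gb(12, 4))   # INT4 -> 6.0 GB of weights
```

The same arithmetic explains why INT4 quantization (4 bits per weight instead of 16) cuts the footprint to a quarter.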
How do I install Gemma 4 with Ollama?
Install Ollama with curl -fsSL https://ollama.com/install.sh | sh, then pull the model with ollama pull gemma4 (default 12B) or specify a size like ollama pull gemma4:2b. Start the server with ollama serve and interact via ollama run gemma4 or the REST API at localhost:11434.
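Once the server is running, the REST API at localhost:11434 can be called from any language. Below is a minimal Python sketch using only the standard library; the model tag "gemma4" follows the pull commands above, and the call obviously requires the server to be running locally.

```python
# Minimal client for Ollama's /api/generate endpoint (assumes an Ollama
# server is running on localhost:11434 with the gemma4 model pulled).
import json
import urllib.request

def build_request(prompt: str, model: str = "gemma4") -> dict:
    # stream=False asks the server to return one complete JSON object
    # instead of a stream of partial responses.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "gemma4") -> str:
    payload = json.dumps(build_request(prompt, model)).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires the server to be running):
# print(generate("Why is the sky blue?"))
```

The same payload shape works with curl if you prefer the command line.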
Does Gemma 4 support tool use and function calling?
Yes, Gemma 4 has built-in agentic tool use. It can call Google Maps for location queries, Wikipedia for knowledge, calculators for math, web search for real-time info, and custom REST/GraphQL APIs. All tool orchestration happens on-device. You define tools as JSON schemas, and Gemma 4 automatically decides when and how to call them.
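To make "define tools as JSON schemas" concrete, here is a sketch of a calculator tool definition and a manual dispatcher. The schema follows the common OpenAI-style function-calling convention; the exact format Gemma 4 expects is not specified in this FAQ, so treat the field names as illustrative.

```python
# Illustrative tool definition as a JSON schema (OpenAI-style convention;
# the exact shape Gemma 4 expects may differ).
calculator_tool = {
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Evaluate a basic arithmetic expression.",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "e.g. '2 + 2'"},
            },
            "required": ["expression"],
        },
    },
}

def dispatch(tool_call: dict) -> str:
    """Run a tool call the model emitted and return the result as text."""
    name = tool_call["name"]
    args = tool_call["arguments"]
    if name == "calculator":
        # Restricted eval for this demo only; a real dispatcher would use
        # a proper expression parser.
        return str(eval(args["expression"], {"__builtins__": {}}, {}))
    raise ValueError(f"unknown tool: {name}")

# Simulated model output requesting a tool call:
result = dispatch({"name": "calculator", "arguments": {"expression": "17 * 3"}})
print(result)  # -> 51
```

In practice the model emits the tool-call JSON, your code executes it, and the result is fed back into the conversation for the model's final answer.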
Is Gemma 4 free to use commercially?
Yes, Gemma 4 is released under Google's permissive open model license. It's free for both research and commercial use with no royalties. You can download weights from HuggingFace, Kaggle, or Google AI Studio. The only restriction is on using outputs to train competing models above a certain parameter threshold.
What modalities does Gemma 4 support?
Gemma 4 is natively multimodal across all variants. It supports: text (generation, summarization, translation), images (understanding, VQA, captioning), audio (transcription, understanding, analysis), and video (frame analysis, temporal understanding, action recognition). Even the 2B model supports all four modalities on-device.
Can I fine-tune Gemma 4?
Yes, Gemma 4 supports LoRA and QLoRA fine-tuning. The 2B model can be fine-tuned on a single RTX 3090, and the 4B on an A100. Google provides official fine-tuning guides, Keras/JAX integration, and PEFT-compatible checkpoint formats. HuggingFace PEFT, Unsloth, and Axolotl all support Gemma 4 fine-tuning.
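Why LoRA fits on a single consumer GPU: instead of updating a full weight matrix W, LoRA trains two small matrices A and B and applies W' = W + (alpha / r) * B @ A. The tiny pure-Python sketch below illustrates that update on a 2x2 matrix; real fine-tuning would go through PEFT, Unsloth, or Axolotl as noted above.

```python
# Illustration of the LoRA update used in parameter-efficient fine-tuning:
# freeze W, train A (r x d_in) and B (d_out x r), apply
# W' = W + (alpha / r) * B @ A. Not actual training code.

def matmul(X, Y):
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def lora_update(W, A, B, alpha):
    r = len(A)  # LoRA rank
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# 2x2 frozen weight with a rank-1 adapter. At this size there is no saving,
# but for a 4096x4096 layer a rank-8 adapter trains ~65K parameters
# instead of ~16.8M.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]          # r=1, d_in=2
B = [[0.5], [0.25]]       # d_out=2, r=1
W_new = lora_update(W, A, B, alpha=1.0)
print(W_new)  # -> [[1.5, 1.0], [0.25, 1.5]]
```

QLoRA pushes memory lower still by keeping the frozen base weights quantized to 4 bits while the adapters train in higher precision.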
What is Gemma 4's context window?
Gemma 4 12B and 27B support 128K token context windows. The 2B and 4B on-device variants support 32K tokens. The 128K context enables processing full codebases, long documents, and extended multi-turn conversations. RoPE-based position encoding, combined with context-length scaling, lets the model handle sequences beyond its base training length.
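The rotary position encoding (RoPE) mentioned above works by rotating each pair of embedding dimensions at position m by an angle m * theta_i, with theta_i = base^(-2i/d), so that relative position falls out naturally in attention dot products. A minimal sketch of that rotation, with the standard base of 10000 assumed:

```python
# Sketch of rotary position encoding (RoPE): rotate each dimension pair
# (x1, x2) at position m by angle m * theta_i, theta_i = base^(-2i/d).
import math

def rope(vec, position, base=10000.0):
    d = len(vec)  # must be even: dimensions are rotated in pairs
    out = []
    for i in range(0, d, 2):
        theta = base ** (-i / d)  # per-pair frequency
        angle = position * theta
        x1, x2 = vec[i], vec[i + 1]
        out.append(x1 * math.cos(angle) - x2 * math.sin(angle))
        out.append(x1 * math.sin(angle) + x2 * math.cos(angle))
    return out

# Position 0 leaves the vector unchanged; later positions rotate it.
print(rope([1.0, 0.0, 1.0, 0.0], position=0))  # -> [1.0, 0.0, 1.0, 0.0]
```

Context-length scaling methods typically rescale these angles (for example by interpolating positions) so attention stays stable on sequences longer than those seen in training.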
How does Gemma 4 ensure privacy?
Gemma 4 runs entirely on-device with zero data upload. No prompts, inputs, or outputs are sent to any server. This makes it suitable for HIPAA- and GDPR-regulated workloads, SOC 2 environments, and classified settings. The open weights allow full auditability. No telemetry, no usage tracking, no data collection. You control the complete inference pipeline.