Gemini — Google's Multimodal LLM Lineup

Gemini is the model series Google DeepMind released in late 2023. Multimodal input — handling images · audio · video · code alongside text — and the very long context that came in with 1.5 are the often-cited features.

1. About Gemini

Google DeepMind released Gemini 1.0 on December 6, 2023. The chatbot previously offered as Bard was unified into Gemini, and a Nano variant was placed onboard devices like the Pixel 8 Pro, spreading the lineup across desktop · mobile · server.

Time	Model	Note
2023-12	Gemini 1.0 (Ultra · Pro · Nano)	First release.
2024-02	Gemini 1.5 Pro	1M token context.
2024-05	Gemini 1.5 Flash	Fast and cheap variant.
2024-12	Gemini 2.0 (Flash etc.)	Reinforced multimodal output·tool use.
2025	Gemini 2.5 Pro · Flash	Reasoning-reinforced variants.

Position as generations pass:

Pro · Ultra — Greatest capability. Higher cost · latency.
Flash — Light variant. Throughput-oriented.
Nano — On-device built-in small variant.

Exact model names and availability change often by generation · date, so check the model card in official docs each time.

2. The 1M token context

Gemini 1.5 Pro was announced with a standard 1M token context at general availability (with research announcements introducing up to 2M). With very long context, use patterns become possible: putting a whole book · video · code base in at once.

Position effects like "lost in the middle" are still observed, so large context isn't always the answer.

3. Two API entry points

Google AI Studio (ai.google.dev) — Individual developers · experiments. Start with one API key.
Vertex AI (Google Cloud) — Enterprise entry point integrated with GCP project · IAM · logging · billing. Controls like data residency (region) · VPC-SC.

The same models, but auth · billing · feature availability · SLA can differ.

4. Call shape

from google import genai

client = genai.Client(api_key="...")
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Please summarize in one paragraph in Korean.",
)
print(response.text)

REST follows the same line. Inputs like images · PDFs · audio · video are split into Part units inside contents.

5. Multimodal input

Input	Note
Image	PNG · JPEG · WEBP · HEIC.
Audio	Voice · music. Captions · summary · analysis.
Video	MP4. Frame-based or timestamp-based.
PDF	Mixed pages · images · text documents.

Upload limits · supported formats vary by model · generation.

6. Function calling · JSON mode

Function calling — Pass function signatures to the model; the model produces and returns the call parameters (JSON). The actual call is made by the caller.
JSON mode · response schema — Force output format to JSON. Schema via JSON Schema or Pydantic.

7. Objective comparison with other models

Model family	Provider	Release	Trait
Gemini	Google DeepMind	2023-12	Multimodal breadth · very long context · GCP integration.
GPT (4 · 4o · o1 · o3)	OpenAI	2022-11	Tool ecosystem · broad adoption · reasoning model line.
Claude (3 · 3.5 · 4)	Anthropic	2023-03	Long context · strong writing·coding.
Mistral · Codestral	Mistral AI	2023	Europe-based · open-weight variants.
Llama (3 · 3.1 · 3.2)	Meta	2023~	Open weights (license separately).
Qwen	Alibaba	2023~	Open weights · multilingual.

Strengths and weaknesses shift quickly by generation · time. Your own domain evaluation is more reliable than a single benchmark.

8. Pricing · context caching

Pricing — Per-token billing (input · output separated, cache · context-caching separate). Free tiers exist in some places, and quotas · constraints differ. Vertex AI is bundled with general GCP billing, so other service costs (storage · logging · network) come along.

Context caching — A feature to cache large system prompts · documents on the server so they don't have to be sent every time, introduced from the 1.5 generation. Anthropic · OpenAI also have similar cache features, with differing pricing · TTL · key definitions per provider.

9. Safety settings · environment variables

The Gemini API allows setting category-level safety classifier thresholds (violence · sexual · harassment · dangerous). Verify the difference between defaults and changed values with your own data.

export GOOGLE_API_KEY=...           # macOS · Linux
$env:GOOGLE_API_KEY = "..."          # Windows PowerShell

Vertex AI auth is usually via ADC (Application Default Credentials) obtained with gcloud auth application-default login or a service account key file.

10. Spots where you often get stuck

Model name volatility — Aliases like gemini-1.5-pro-latest and date pins (gemini-1.5-pro-002) mean different things. Pinning is safer in operations.

Region constraints — Some models · features are limited to specific regions. Watch the location setting on Vertex AI.

Context limit vs actual limit — Even when 1M tokens is advertised, input·output total and per-model limits are defined separately. Output tokens usually have a separate, smaller cap.

Image · video token conversion — Non-text inputs are internally converted to tokens. Looking only at text tokens for cost calculation is off.

Blocking · filtering — Cases where safety classification blocks input · output. Check the reason · category code in the response.

Response length limit — If you set max_output_tokens small and forget, responses get cut.

AI Studio vs Vertex AI difference — The same code works on one side and needs additional permissions · settings on the other.

Data usage policy — There's notice that data-training-use policies differ between AI Studio free key and Vertex AI. Check the terms.

Closing thoughts

Gemini's appeal is multimodal breadth and very long context. However, model names · prices · limits change often, so for operations: pin the model + your own domain evaluation set + dev verification with WireMock cutting external dependencies — that's the safe path.

embeddings-deep
agents-overview

References: Google AI for Developers · Vertex AI Generative AI · Gemini API Models · Google DeepMind Gemini · Gemini 1.5 Report · LMArena · LiveBench.