codingstairs
NotesEDULifeContact
⌕Search⌘K
koen

Navigation

  • Intro
  • Blog
  • Life

Get in touch

Send without signing in. Add your email if you'd like a reply.

  • Leave a message anonymously →
  • ✉ warragon112@gmail.com
  • KakaoTalk Open Chat ↗

© 2026 codingstairs

  • Notes
  • EDU
  • Search
  • Life
  • Contact
  • Legal
  • RSS
  • GitHub
Notes›ai

Gemini — Google's Multimodal LLM Lineup

Published 2026-04-28· Updated 2026-05-18·0 views

Gemini — Google's Multimodal LLM Lineup

Gemini is the model series Google DeepMind released in late 2023. Multimodal input — handling images · audio · video · code alongside text — and the very long context that came in with 1.5 are the often-cited features.

1. About Gemini

Google DeepMind released Gemini 1.0 on December 6, 2023. The chatbot previously offered as Bard was unified into Gemini, and a Nano variant was placed onboard devices like the Pixel 8 Pro, spreading the lineup across desktop · mobile · server.

Time Model Note
2023-12 Gemini 1.0 (Ultra · Pro · Nano) First release.
2024-02 Gemini 1.5 Pro 1M token context.
2024-05 Gemini 1.5 Flash Fast and cheap variant.
2024-12 Gemini 2.0 (Flash etc.) Reinforced multimodal output·tool use.
2025 Gemini 2.5 Pro · Flash Reasoning-reinforced variants.

Position as generations pass:

  • Pro · Ultra — Greatest capability. Higher cost · latency.
  • Flash — Light variant. Throughput-oriented.
  • Nano — On-device built-in small variant.

Exact model names and availability change often by generation · date, so check the model card in official docs each time.

2. The 1M token context

Gemini 1.5 Pro was announced with a standard 1M token context at general availability (with research announcements introducing up to 2M). With very long context, use patterns become possible: putting a whole book · video · code base in at once.

Position effects like "lost in the middle" are still observed, so large context isn't always the answer.

3. Two API entry points

  • Google AI Studio (ai.google.dev) — Individual developers · experiments. Start with one API key.
  • Vertex AI (Google Cloud) — Enterprise entry point integrated with GCP project · IAM · logging · billing. Controls like data residency (region) · VPC-SC.

The same models, but auth · billing · feature availability · SLA can differ.

4. Call shape

from google import genai

client = genai.Client(api_key="...")
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Please summarize in one paragraph in Korean.",
)
print(response.text)

REST follows the same line. Inputs like images · PDFs · audio · video are split into Part units inside contents.

5. Multimodal input

Input Note
Image PNG · JPEG · WEBP · HEIC.
Audio Voice · music. Captions · summary · analysis.
Video MP4. Frame-based or timestamp-based.
PDF Mixed pages · images · text documents.

Upload limits · supported formats vary by model · generation.

6. Function calling · JSON mode

  • Function calling — Pass function signatures to the model; the model produces and returns the call parameters (JSON). The actual call is made by the caller.
  • JSON mode · response schema — Force output format to JSON. Schema via JSON Schema or Pydantic.

7. Objective comparison with other models

Model family Provider Release Trait
Gemini Google DeepMind 2023-12 Multimodal breadth · very long context · GCP integration.
GPT (4 · 4o · o1 · o3) OpenAI 2022-11 Tool ecosystem · broad adoption · reasoning model line.
Claude (3 · 3.5 · 4) Anthropic 2023-03 Long context · strong writing·coding.
Mistral · Codestral Mistral AI 2023 Europe-based · open-weight variants.
Llama (3 · 3.1 · 3.2) Meta 2023~ Open weights (license separately).
Qwen Alibaba 2023~ Open weights · multilingual.

Strengths and weaknesses shift quickly by generation · time. Your own domain evaluation is more reliable than a single benchmark.

8. Pricing · context caching

Pricing — Per-token billing (input · output separated, cache · context-caching separate). Free tiers exist in some places, and quotas · constraints differ. Vertex AI is bundled with general GCP billing, so other service costs (storage · logging · network) come along.

Context caching — A feature to cache large system prompts · documents on the server so they don't have to be sent every time, introduced from the 1.5 generation. Anthropic · OpenAI also have similar cache features, with differing pricing · TTL · key definitions per provider.

9. Safety settings · environment variables

The Gemini API allows setting category-level safety classifier thresholds (violence · sexual · harassment · dangerous). Verify the difference between defaults and changed values with your own data.

export GOOGLE_API_KEY=...           # macOS · Linux
$env:GOOGLE_API_KEY = "..."          # Windows PowerShell

Vertex AI auth is usually via ADC (Application Default Credentials) obtained with gcloud auth application-default login or a service account key file.

10. Spots where you often get stuck

Model name volatility — Aliases like gemini-1.5-pro-latest and date pins (gemini-1.5-pro-002) mean different things. Pinning is safer in operations.

Region constraints — Some models · features are limited to specific regions. Watch the location setting on Vertex AI.

Context limit vs actual limit — Even when 1M tokens is advertised, input·output total and per-model limits are defined separately. Output tokens usually have a separate, smaller cap.

Image · video token conversion — Non-text inputs are internally converted to tokens. Looking only at text tokens for cost calculation is off.

Blocking · filtering — Cases where safety classification blocks input · output. Check the reason · category code in the response.

Response length limit — If you set max_output_tokens small and forget, responses get cut.

AI Studio vs Vertex AI difference — The same code works on one side and needs additional permissions · settings on the other.

Data usage policy — There's notice that data-training-use policies differ between AI Studio free key and Vertex AI. Check the terms.

Closing thoughts

Gemini's appeal is multimodal breadth and very long context. However, model names · prices · limits change often, so for operations: pin the model + your own domain evaluation set + dev verification with WireMock cutting external dependencies — that's the safe path.

Next

  • embeddings-deep
  • agents-overview

References: Google AI for Developers · Vertex AI Generative AI · Gemini API Models · Google DeepMind Gemini · Gemini 1.5 Report · LMArena · LiveBench.

More in ai

All in this category →
  • Google NotebookLM — source-grounded Gemini notebook (RAG-shaped tool)
  • Google AI Studio — Gemini-powered AI Web IDE + app builder
  • LLM Landscape — Closed · Open · Korean-Specialized · Evaluation · Pricing
  • AI Agents — Definition · Patterns · Frameworks · Autonomy
  • Embeddings Deep — Models · Dimensions · Benchmarks · Cache
  • Prompt Design — Message Roles · CoT · ReAct · Sampling · Injection