
Data formats — JSON · YAML · TOML · XML

Published 2026-04-28 · Updated 2026-05-18


When programs exchange data with each other they need an agreed-upon notation. The four we meet most often in config files, API responses, CI workflows, and environment variables are JSON, YAML, TOML, and XML.

1. About the four formats

| Format | Origin | First standard | Common uses |
|---|---|---|---|
| JSON | Douglas Crockford extracted it from JS object syntax in 2001 | RFC 4627 (2006), current RFC 8259 (2017), ECMA-404 (2013) | API responses, configuration, data serialization |
| YAML | Clark Evans, Oren Ben-Kiki, Ingy döt Net, 2001 | YAML 1.0 (2004), 1.2 (2009) | CI workflows, Kubernetes, configuration |
| TOML | Tom Preston-Werner (GitHub co-founder), 2013 | TOML 1.0 (2021) | Rust Cargo, Python pyproject, static-site config |
| XML | W3C, 1998 | XML 1.0 (1998) | SOAP, RSS, some configs (Maven), documents |

JSON is the simplest. YAML is human-friendly but riddled with traps. TOML was created to dodge those traps. XML has the richest expressiveness but is heavy.

2. JSON

{
  "name": "lee",
  "age": 30,
  "tags": ["dev", "ko"],
  "active": true,
  "address": null,
  "profile": {
    "bio": "안녕"
  }
}

Six data types — string, number, boolean, null, array, object. No comments, no trailing commas (,]).
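To make the six types and the two restrictions concrete, here is a minimal Python sketch using only the standard-library json module. The helper name is_valid_json is made up for this illustration.

```python
import json

# The sample document from this section, as a JSON string.
text = ('{"name": "lee", "age": 30, "tags": ["dev", "ko"], '
        '"active": true, "address": null, "profile": {"bio": "안녕"}}')

data = json.loads(text)

# Each JSON type maps onto a built-in Python type.
type_names = {key: type(value).__name__ for key, value in data.items()}
print(type_names)

def is_valid_json(candidate: str) -> bool:
    """Hypothetical helper: True if the standard-library parser accepts the text."""
    try:
        json.loads(candidate)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json('["dev", "ko",]'))     # trailing comma
print(is_valid_json('{"a": 1} // note'))   # comment
```

Both checks at the end come back False: a strict parser treats the trailing comma and the comment as syntax errors rather than ignoring them.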

| Strengths | Weaknesses |
|---|---|
| Simple and universal standard. | No comments. |
| Almost every language supports it via the standard library. | Quotes and braces feel heavy when typed by hand. |
| Machine-friendly. | Large integers lose precision in parsers that store numbers as IEEE 754 doubles (anything beyond 2^53). |
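The large-integer weakness is easy to reproduce. The sketch below simulates a double-based parser with float(), since Python's own json module keeps integers exact; the key name "id" is just an example.

```python
import json

# One past 2**53, the point where doubles stop representing every integer.
big = 2**53 + 1

# Python's json module round-trips the integer exactly.
round_tripped = json.loads(json.dumps({"id": big}))["id"]

# A parser that stores every number as an IEEE 754 double (simulated with float) cannot.
as_double = int(float(big))

print(round_tripped == big)
print(big, as_double)
```

A common way around this is to send such identifiers as strings, so no parser ever tries to treat them as numbers.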

JSON5 (a 2012 variant allowing comments and trailing commas) and JSON Lines (one object per line, used in logs and streams) are close relatives.
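JSON Lines needs no special library: each line is an ordinary JSON document. A small sketch, with made-up log records:

```python
import json

# Three log records in JSON Lines form: one complete JSON object per line.
jsonl_text = (
    '{"level": "info", "msg": "start"}\n'
    '{"level": "warn", "msg": "slow"}\n'
    '{"level": "info", "msg": "done"}\n'
)

# Parse line by line, skipping blank lines.
records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

warnings = [record["msg"] for record in records if record["level"] == "warn"]
print(len(records), warnings)
```

Because every line stands alone, a reader can process a growing log or stream without waiting for a closing bracket.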

3. YAML

name: lee
age: 30
tags:
  - dev
  - ko
active: true
address: null
profile:
  bio: 안녕

Indentation defines structure, and only spaces are allowed (no tabs). Comments start with #. The data model is a superset of JSON's, with multi-document streams, anchors, and tags on top.

# Anchors and references (DRY)
defaults: &defaults
  retries: 3
  timeout: 30

dev:
  <<: *defaults
  host: localhost

prod:
  <<: *defaults
  host: example.com

# Multi-line strings
folded: >
  Multiple lines
  collapsed onto
  a single line
literal: |
  Multiple lines
  preserved
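To show what these two snippets are expected to load as, here is a hand-written Python equivalent. It does not call a YAML library; it only spells out the result a loader that supports the merge key (<<) and default block-scalar chomping should hand back.

```python
# Hand-written equivalent of the anchor/merge example; no YAML library involved.
defaults = {"retries": 3, "timeout": 30}

# `<<: *defaults` copies the anchored mapping in; keys written locally go on top.
dev = {**defaults, "host": "localhost"}
prod = {**defaults, "host": "example.com"}

# `>` folds single line breaks into spaces, `|` keeps them.
# With default chomping both end in exactly one newline.
folded = "Multiple lines collapsed onto a single line\n"
literal = "Multiple lines\npreserved\n"

print(dev)
print(prod)
print(repr(folded))
print(repr(literal))
```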

4. YAML's Norway problem

In YAML 1.1 the Norwegian country code NO is parsed as boolean false.

countries:
  - NO       # ← becomes boolean false
  - SE
  - DK

The spellings yes, no, on, off, Y, and N are also booleans in YAML 1.1. The 1.2 standard removed this interpretation, but many libraries still default to 1.1. To stay safe, quote them:

countries:
  - "NO"
  - "SE"
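The rule behind the trap can be sketched in a few lines of Python. This is an illustration of the YAML 1.1 boolean spellings, not a real parser, and resolve_plain_scalar is a hypothetical helper.

```python
import re

# Boolean spellings from the YAML 1.1 bool type (illustrative copy, not a real parser).
YAML11_TRUE = re.compile(r"^(y|Y|yes|Yes|YES|true|True|TRUE|on|On|ON)$")
YAML11_FALSE = re.compile(r"^(n|N|no|No|NO|false|False|FALSE|off|Off|OFF)$")

def resolve_plain_scalar(token: str, quoted: bool = False):
    """Hypothetical helper: mimic how a YAML 1.1 loader types a scalar."""
    if quoted:
        return token  # quoted scalars always stay strings
    if YAML11_TRUE.match(token):
        return True
    if YAML11_FALSE.match(token):
        return False
    return token

countries = [resolve_plain_scalar(code) for code in ["NO", "SE", "DK"]]
quoted_countries = [resolve_plain_scalar(code, quoted=True) for code in ["NO", "SE"]]

print(countries)
print(quoted_countries)
```

The unquoted list comes back with False in the first slot, while the quoted list keeps both country codes as strings.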

Another trap is octal interpretation. YAML 1.1 parsers read 010 as octal, which is the integer 8, while YAML 1.2 reads it as decimal 10 and writes octal as 0o10.
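The three possible readings of the same token, written out in Python as a small illustration:

```python
# "010" read three ways; which one you get depends on the parser and spec version.
as_octal = int("010", 8)     # YAML 1.1 style: a leading 0 means base 8
as_decimal = int("010", 10)  # YAML 1.2 core schema: plain decimal
as_string = "010"            # quoted in the file: no numeric interpretation

print(as_octal, as_decimal, as_string)
```

For values where the leading zero matters, quoting is the only spelling that means the same thing to every parser.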

5. TOML

name = "lee"
age = 30
active = true
address = ""    # TOML has no null; an empty string stands in here

tags = ["dev", "ko"]

[profile]
bio = "안녕"

[servers.dev]
host = "localhost"
port = 8080

[servers.prod]
host = "example.com"
port = 443

[[items]]
id = 1
[[items]]
id = 2

Clear key=value syntax. Comments start with #. Data types are richer than JSON, with first-class dates and times.

| Strengths | Weaknesses |
|---|---|
| Little ambiguity. | Deep nesting gets verbose. |
| Comments allowed. | Not as expressive as YAML. |
| Comfortable to type by hand. | Readability drops in complex structures. |

Cargo (Cargo.toml) and Python packaging (pyproject.toml) standardize on it, and static-site generators such as Hugo and Zola use it for configuration.

6. XML

<?xml version="1.0" encoding="UTF-8"?>
<user id="42">
  <name>lee</name>
  <age>30</age>
  <tags>
    <tag>dev</tag>
    <tag>ko</tag>
  </tags>
</user>

XML offers tags, attributes, namespaces, and DTD/XSD schemas, so it is expressive in many directions. It remains the basis of SOAP, RSS, Atom, and Office Open XML (.docx), but for APIs and configuration it has lost ground to JSON in recent years.
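Reading the example with Python's standard-library ElementTree shows one practical difference from JSON: attributes and element text all arrive as strings, so type conversion is left to the caller. A minimal sketch (the XML declaration is omitted for brevity):

```python
import xml.etree.ElementTree as ET

xml_text = """<user id="42">
  <name>lee</name>
  <age>30</age>
  <tags>
    <tag>dev</tag>
    <tag>ko</tag>
  </tags>
</user>"""

root = ET.fromstring(xml_text)

user_id = root.get("id")                       # attribute values are strings
age = int(root.findtext("age"))                # element text is a string too; convert by hand
tags = [tag.text for tag in root.iter("tag")]  # repeated elements play the role of an array

print(user_id, age, tags)
```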

7. At a glance

| Item | JSON | YAML | TOML | XML |
|---|---|---|---|---|
| Comments | None | Yes | Yes | Yes |
| Indent-sensitive | No | Yes | No | No |
| Human friendliness | Medium | High | High | Low |
| Ambiguity | Low | High | Low | Low |
| Schema | JSON Schema | Borrows JSON Schema | Sparse | Rich (XSD, DTD) |

8. Other paths

Formats we run into in particular niches:

  • Protocol Buffers (protobuf) — Google, 2008. Binary. Schema first.
  • MessagePack — binary JSON. JSON-compatible plus smaller size.
  • CBOR — RFC 8949. IoT-friendly binary format.
  • HOCON — Typesafe's Config. A human-friendly variant of JSON.
  • EDN — Clojure data format.
  • CSV · TSV — the simplest tabular format. Riddled with comma and quote escaping pitfalls.
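The CSV escaping pitfall from the last bullet can be seen with Python's standard-library csv module. The sample row is made up; the point is that a field containing a comma and quotes survives a proper reader but not a naive split.

```python
import csv
import io

rows = [["name", "note"], ["lee", 'said "hi", then left']]

# Writing: the csv module quotes the field and doubles the inner quotes.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
encoded = buffer.getvalue()

# Reading it back with the same module recovers the original fields.
decoded = list(csv.reader(io.StringIO(encoded, newline="")))

# A naive split on commas cuts the quoted field in two.
naive = encoded.splitlines()[1].split(",")

print(encoded)
print(decoded == rows)
print(len(naive))
```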

9. Standard tools per language

// JS: only JSON ships in the standard library
const obj = JSON.parse('{"a":1}');
const s = JSON.stringify(obj, null, 2);

// YAML/TOML need libraries (js-yaml, smol-toml)
import yaml from "js-yaml";
const data = yaml.load(text);   // text: a YAML string read elsewhere

# Python
import json
data = json.loads(s)
s = json.dumps(data, indent=2, ensure_ascii=False)

import yaml         # PyYAML (third-party)
data = yaml.safe_load(text)

import tomllib      # standard library since 3.11
data = tomllib.loads(text)

Command-line conversion:

jq . data.json                    # macOS · Linux. On Windows: choco install jq
yq -o=json . config.yaml          # YAML → JSON (mikefarah/yq v4 syntax)
yq -P . data.json                 # JSON → YAML

10. Common pitfalls

JSON: no trailing commas, no comments, and keys must be double-quoted strings.

YAML: never indent with tabs (spaces only). Watch for boolean traps like the Norway problem. Null and the empty string are different values: ~, null, and a key with nothing after the colon all load as null, while "" is an empty string.

TOML: defining the same key in more than one place is an error. The difference between an array of tables ([[items]]) and a regular table ([profile]) can be confusing at first.

XML: namespaces make parser code more verbose. XXE (XML external entity) attacks are a real risk, so check that your parser has external entity resolution turned off.

Encoding: almost always UTF-8. A leading byte order mark (BOM) can make some parsers reject the file or fail to find the first key.
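The BOM problem is reproducible with Python's json module. In this sketch the bytes stand in for a file saved as "UTF-8 with BOM":

```python
import json

raw = b'\xef\xbb\xbf{"name": "lee"}'  # UTF-8 BOM followed by a JSON object

# Decoding as plain utf-8 keeps the BOM as U+FEFF at the start of the string.
with_bom = raw.decode("utf-8")
try:
    json.loads(with_bom)
    bom_accepted = True
except json.JSONDecodeError:
    bom_accepted = False

# The utf-8-sig codec strips a leading BOM if one is present.
data = json.loads(raw.decode("utf-8-sig"))

print(bom_accepted, data)
```

Opening files with encoding="utf-8-sig" is harmless when no BOM is present, so it is a reasonable default for input that may come from Windows editors.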

Formats without comments: to leave a note in a JSON config, people sometimes wedge in a _comment key as a workaround. JSON5 or JSONC (as used by VS Code) is an alternative.
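If a config uses the _comment workaround, the application usually wants those keys gone before it reads the values. A minimal sketch, where strip_comments is a hypothetical helper and the config content is invented:

```python
import json

config_text = (
    '{"_comment": "port 8080 is reserved for the dev proxy", "port": 8081, '
    '"db": {"_comment": "read replica", "host": "localhost"}}'
)

def strip_comments(node):
    """Hypothetical helper: drop every "_comment" key, recursively."""
    if isinstance(node, dict):
        return {key: strip_comments(value) for key, value in node.items() if key != "_comment"}
    if isinstance(node, list):
        return [strip_comments(item) for item in node]
    return node

config = strip_comments(json.loads(config_text))
print(config)
```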

Versions: YAML 1.1 and 1.2 interpret unquoted values differently. Check the library docs to know which version your parser implements.

Closing thoughts

Each data format has settled into its own role and is hard to swap out arbitrarily. JSON is the standard for APIs and config, YAML for CI, Kubernetes, and Docker Compose, TOML for language package managers (Rust, Python), and XML for legacy systems. Once this matrix clicks, even an unfamiliar file reads quickly.

For reference: RFC 8259 (JSON) · ECMA-404 · YAML 1.2 · TOML 1.0 · XML 1.0 · Norway Problem · JSON5 · jq · yq · JSON Schema.
