Static vs dynamic — BS4 + Playwright

Pick the wrong tool and the scraper can be 10x slower and 10x more likely to be blocked. Decide which kind of page you are dealing with first.

1. Static (server-rendered)

A plain curl of the URL already returns the data you want.

Latency: 100–300ms per page
Resources: light (no browser process)
Tools: requests or httpx + BeautifulSoup

2. Dynamic (JS-rendered)

The page source contains an empty <div id="app"> that JavaScript fills in after load.

Latency: 2–10s per page
Resources: hundreds of MB of memory per browser
Tools: Playwright or Selenium

3. Decide fast

curl https://target.com/page | grep "the text you want"

No match? Open DevTools → Network → XHR. Often there's a JSON API you can call directly.
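
The same check can be scripted when there are many targets to triage. The following is a minimal sketch using only the standard library; the function names (looks_static, fetch_raw_html, choose_tool) and the MyBot/1.0 user agent are illustrative assumptions, not part of any library.

```python
import urllib.request


def looks_static(html: str, needle: str) -> bool:
    """Return True if the text we want is already in the raw HTML (like grep)."""
    return needle in html


def fetch_raw_html(url: str, timeout: float = 10.0) -> str:
    """Fetch the page the way curl would: no JavaScript is executed."""
    req = urllib.request.Request(url, headers={"User-Agent": "MyBot/1.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def choose_tool(html: str, needle: str) -> str:
    """Map the result of the check to the tool this article suggests."""
    if looks_static(html, needle):
        return "httpx + BS4"
    return "check the Network tab for a JSON API, then Playwright"
```

A typical call would be choose_tool(fetch_raw_html("https://target.com/page"), "the text you want"). The match is case-sensitive, like grep without -i.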

4. Hidden APIs

Many "dynamic" sites actually call REST APIs. Calling them directly beats Playwright in speed and stability.
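
As a rough illustration of calling such an API directly, the sketch below fetches a JSON endpoint and flattens the response. The endpoint URL and the {"items": [...]} response shape are assumptions made for the example; a real site will have its own URL, parameters, and field names, which you read off the Network tab.

```python
import json
import urllib.request

# Hypothetical endpoint, as it might appear in DevTools → Network → XHR.
API_URL = "https://example.com/api/items?page=1"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """GET a JSON endpoint directly, skipping HTML parsing and the browser."""
    req = urllib.request.Request(
        url,
        headers={"User-Agent": "MyBot/1.0", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_items(payload: dict) -> list[dict]:
    """Pull title/price out of an assumed {"items": [...]} response shape."""
    return [
        {"title": str(item.get("title", "")).strip(), "price": item.get("price")}
        for item in payload.get("items", [])
    ]
```

Usage would look like extract_items(fetch_json(API_URL)). Keeping the parsing step separate from the network call makes it easy to test against a saved response.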

5. httpx + BS4

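import httpx
from bs4 import BeautifulSoup


async def scrape_items(url="https://example.com/page"):
    """Yield one dict per div.item on a server-rendered page."""
    async with httpx.AsyncClient(headers={"User-Agent": "MyBot/1.0"}) as client:
        resp = await client.get(url)
        resp.raise_for_status()  # fail loudly instead of parsing an error page
        soup = BeautifulSoup(resp.text, "html.parser")
        for item in soup.select("div.item"):
            title = item.select_one(".title")
            price = item.select_one(".price")
            if title is None or price is None:
                continue  # skip items missing either field
            yield {
                "title": title.text.strip(),
                "price": price.text.strip(),
            }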

6. Playwright

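from playwright.async_api import async_playwright


async def scrape_titles(url):
    """Render a JS-driven page in headless Chromium and return its item titles."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        try:
            page = await browser.new_page()
            # "networkidle" waits for network activity to settle
            await page.goto(url, wait_until="networkidle")
            await page.wait_for_selector(".item")  # make sure items are rendered
            return await page.locator(".item .title").all_inner_texts()
        finally:
            await browser.close()  # release the browser even if a step fails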

7. Optimizations

await page.route("**/*.{png,jpg,gif,svg,woff,woff2,css}", lambda r: r.abort())

Blocking images, fonts, and stylesheets can make page loads 3–5x faster.
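
An extension-based pattern misses assets served without a file extension. One alternative, sketched here, is to filter on the request's resource type instead. The handler name and the BLOCKED_TYPES set are illustrative choices; route.request.resource_type, route.abort(), and route.continue_() are the Playwright calls it relies on.

```python
# Resource types to skip; the names follow Playwright's request.resource_type values.
BLOCKED_TYPES = {"image", "media", "font", "stylesheet"}


async def block_heavy_resources(route):
    """Route handler: abort heavy assets, let everything else through."""
    if route.request.resource_type in BLOCKED_TYPES:
        await route.abort()
    else:
        await route.continue_()


# With a real Playwright page, register it before navigating:
#     await page.route("**/*", block_heavy_resources)
```

Matching every URL with "**/*" means the handler runs for each request, so keep it cheap.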

8. Reusable context

context = await browser.new_context(user_agent="MyBot/1.0")
page1 = await context.new_page()
page2 = await context.new_page()

Pages opened from the same context share cookies and storage, so a login done on one page carries over to the others.

9. Hybrid

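Use Playwright once for the login or the JS-rendered listing page, then fetch the detail pages in parallel with httpx + BS4.

import asyncio
import httpx

# Inside an async function; extract_urls_with_playwright() and fetch_bs4() are your own helpers.
urls = await extract_urls_with_playwright(list_page)
async with httpx.AsyncClient() as client:
    details = await asyncio.gather(*[fetch_bs4(client, u) for u in urls])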
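
Two details tend to matter when wiring the two halves together: carrying the browser session over to the HTTP client, and capping how many detail requests run at once. The sketch below shows one way to do both with the standard library only. The helper names and the default limit of 5 are assumptions for illustration, and the cookie conversion drops domain and path scoping, so it only suits a single-site crawl.

```python
import asyncio


def cookies_for_http_client(playwright_cookies):
    """Convert Playwright's list of cookie dicts into a plain name -> value dict."""
    return {c["name"]: c["value"] for c in playwright_cookies}


async def gather_bounded(fetch, urls, limit=5):
    """Run fetch(url) for every URL with at most `limit` requests in flight."""
    sem = asyncio.Semaphore(limit)

    async def worker(url):
        async with sem:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs
    return await asyncio.gather(*(worker(u) for u in urls))
```

In the hybrid flow, the cookies would come from await context.cookies() and be passed as httpx.AsyncClient(cookies=...), while gather_bounded would replace the bare asyncio.gather call so a long URL list does not hit the site all at once.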

10. Gotchas

Playwright for static pages — wasteful
BS4 for SPAs — empty HTML
Missing hidden APIs — check Network tab
Default Playwright timeout (30s) can be too short on slow sites: raise it with page.set_default_timeout() or a per-call timeout= argument

Closing

"curl first, hidden API next, Playwright last" — preserves speed, stability, and politeness.
