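If you’ve ever tried to grab captions from a streaming service like HBO Max, Paramount+, or even some corporate video platforms, you’ve likely run into a file format that isn’t .srt or .vtt. Instead, you saw .xml or .ttml. Welcome to the world of Timed Text Markup Language (TTML): a powerful, verbose, and often misunderstood standard.

Downloading a TTML file is only the first step. The real challenge is understanding its structure, converting it for practical use, and avoiding common pitfalls like missing styling or overlapping timings.

The Python function below downloads a TTML file with `requests`, parses it with `lxml`, and turns every `<p>` element that has `begin`, `end`, and text into an SRT cue. Set `convert_to_srt=False` to get the raw TTML back instead.

```python
import requests
from lxml import etree

TTML_NS = {"tt": "http://www.w3.org/ns/ttml"}


def clock_to_srt(value):
    # Convert a TTML clock time ("00:00:02.5") to SRT ("00:00:02,500").
    # Other TTML time forms (e.g. "2.5s", frames, ticks) raise ValueError.
    hours, minutes, seconds = value.split(":")
    whole, _, frac = seconds.partition(".")
    millis = (frac + "000")[:3]  # SRT needs exactly three millisecond digits
    return f"{int(hours):02d}:{int(minutes):02d}:{int(whole):02d},{millis}"


def download_ttml(url, headers=None, convert_to_srt=True):
    resp = requests.get(url, headers=headers, timeout=30)
    resp.raise_for_status()
    if not convert_to_srt:
        return resp.text  # return the raw TTML unchanged

    # Parse and convert to SRT manually
    root = etree.fromstring(resp.content)
    cues = []
    for p in root.xpath("//tt:p", namespaces=TTML_NS):
        begin, end = p.get("begin"), p.get("end")
        # Turn <br/> into real line breaks before flattening the text
        for br in p.xpath(".//tt:br", namespaces=TTML_NS):
            br.tail = "\n" + (br.tail or "")
        text = "".join(p.itertext()).strip()
        if not begin or not end or not text:
            continue
        # Number cues sequentially so skipped <p> elements leave no gaps
        index = len(cues) + 1
        cues.append(f"{index}\n{clock_to_srt(begin)} --> {clock_to_srt(end)}\n{text}\n")
    return "\n".join(cues)
```

To convert a file you already have, FFmpeg is a common first attempt:

```shell
ffmpeg -i input.ttml output.vtt
```

FFmpeg's TTML support is mainly on the writing side, so depending on your build it may refuse a `.ttml` input; parsing the file yourself, as above, avoids that dependency.

If you have seg_1.ttml, seg_2.ttml, etc.:

```shell
ffmpeg -i input
```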
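The Python conversion above assumes every `begin`/`end` value is a clock time like `00:00:02.000`. TTML also permits clock times with frames (`00:00:01:15`) and offset times such as `2.5s`, `500ms`, or tick counts like `10000000t`, which are divided by the document's `ttp:tickRate`. A minimal sketch of a more general parser follows; `ttml_time_to_seconds` and `seconds_to_srt` are hypothetical helper names, and the default frame and tick rates are assumptions you should replace with the values declared on the document's root `<tt>` element.

```python
import re

# TTML clock time: hh:mm:ss, optionally followed by .fraction or :frames
_CLOCK = re.compile(r"^(\d{2,}):(\d{2}):(\d{2})(?:\.(\d+)|:(\d+))?$")
# TTML offset time: a number followed by a unit (h, m, s, ms, f, t)
_OFFSET = re.compile(r"^(\d+(?:\.\d+)?)(h|m|s|ms|f|t)$")


def ttml_time_to_seconds(value, frame_rate=30.0, tick_rate=1.0):
    """Convert a TTML time expression to seconds.

    frame_rate and tick_rate are assumed defaults; pass the document's
    ttp:frameRate and ttp:tickRate values when present. Sub-frames and
    ttp:frameRateMultiplier are not handled.
    """
    value = value.strip()
    clock = _CLOCK.match(value)
    if clock:
        hours, minutes, seconds, fraction, frames = clock.groups()
        total = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        if fraction:
            total += float("0." + fraction)
        if frames:
            total += int(frames) / frame_rate
        return total
    offset = _OFFSET.match(value)
    if offset:
        number, unit = float(offset.group(1)), offset.group(2)
        scale = {"h": 3600, "m": 60, "s": 1, "ms": 0.001,
                 "f": 1 / frame_rate, "t": 1 / tick_rate}
        return number * scale[unit]
    raise ValueError(f"unsupported TTML time expression: {value!r}")


def seconds_to_srt(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"
```

Converting to seconds first, rather than rewriting the string, also makes it straightforward to shift, compare, or sort cues later.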
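For segmented captions like `seg_1.ttml`, `seg_2.ttml`, and so on, one approach is to parse every segment in Python and merge the cues. This sketch assumes all segments use clock times on the same absolute timeline (if yours are relative to each segment, add each segment's start offset first) and that a cue spanning a segment boundary may be repeated verbatim in both files, so exact duplicates are kept only once. `cues_from_segments` is a hypothetical name.

```python
from lxml import etree

TTML_NS = {"tt": "http://www.w3.org/ns/ttml"}


def _clock_seconds(value):
    # Clock times only ("hh:mm:ss" or "hh:mm:ss.fff"); other forms raise ValueError
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def cues_from_segments(segments):
    """Collect sorted, de-duplicated (start, end, text) cues from TTML documents.

    Assumes every segment shares one absolute timeline.
    """
    seen = set()
    for data in segments:
        root = etree.fromstring(data)
        for p in root.xpath("//tt:p", namespaces=TTML_NS):
            begin, end = p.get("begin"), p.get("end")
            # Collapse whitespace so a repeated cue compares equal across segments
            text = " ".join("".join(p.itertext()).split())
            if begin and end and text:
                seen.add((_clock_seconds(begin), _clock_seconds(end), text))
    return sorted(seen)
```

The input is a list of raw documents, for example `[Path(f"seg_{i}.ttml").read_bytes() for i in (1, 2, 3)]`; the merged list can then be numbered and written out as SRT or WebVTT.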
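Overlapping timings are one of the pitfalls mentioned above: two `<p>` elements can be active at once, which TTML allows but which SRT players may render inconsistently. One possible cleanup policy, sketched here with a hypothetical `Cue` tuple and `fix_overlaps` function, merges cues that start at the same moment into a single multi-line cue and then trims each cue so it ends when the next one begins. Stacking or repositioning overlapping lines is an equally valid alternative; this sketch simply favors sequential display.

```python
from typing import List, NamedTuple


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


def fix_overlaps(cues: List[Cue]) -> List[Cue]:
    """Merge simultaneous cues, then trim each cue to end at the next start.

    Assumes every input cue has end > start. Exact duplicates are dropped.
    """
    merged: List[Cue] = []
    for cue in sorted(set(cues)):  # sorts by (start, end, text)
        if merged and merged[-1].start == cue.start:
            prev = merged[-1]
            merged[-1] = Cue(prev.start, max(prev.end, cue.end), prev.text + "\n" + cue.text)
        else:
            merged.append(cue)

    fixed: List[Cue] = []
    for i, cue in enumerate(merged):
        end = cue.end
        if i + 1 < len(merged):
            end = min(end, merged[i + 1].start)  # stop before the next cue starts
        fixed.append(Cue(cue.start, end, cue.text))
    return fixed
```

Because merged cues have strictly increasing start times, the trimming step never produces a zero-length cue from valid input.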
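Missing styling, the other pitfall, usually comes from flattening text with `itertext()`, which returns only text and discards TTML style attributes. The sketch below keeps just one property: it resolves `tts:fontStyle="italic"`, whether set directly on an element or through a `style` reference to a `<style xml:id="s1">` in the head, and wraps the affected text in `<i>` tags, which WebVTT defines and many SRT players also accept. Style chaining, region styles, and every other property are ignored; the function names are hypothetical.

```python
from lxml import etree

TT = "http://www.w3.org/ns/ttml"
TTS = "http://www.w3.org/ns/ttml#styling"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def italic_style_ids(root):
    """Return the xml:id of every <style> that sets tts:fontStyle="italic"."""
    return {
        style.get(XML_ID)
        for style in root.iter(f"{{{TT}}}style")
        if style.get(f"{{{TTS}}}fontStyle") == "italic"
    }


def is_italic(el, italic_ids):
    # Inline attribute first, then any referenced style IDs (space-separated)
    if el.get(f"{{{TTS}}}fontStyle") == "italic":
        return True
    return any(ref in italic_ids for ref in (el.get("style") or "").split())


def render_text(el, italic_ids):
    """Flatten a <p> or <span> to text, keeping <br/> and italics as <i> tags."""
    parts = [el.text or ""]
    for child in el:
        if isinstance(child.tag, str):  # skip comments and processing instructions
            if child.tag == f"{{{TT}}}br":
                parts.append("\n")
            else:
                parts.append(render_text(child, italic_ids))
        parts.append(child.tail or "")
    text = "".join(parts)
    if text.strip() and is_italic(el, italic_ids):
        return f"<i>{text}</i>"
    return text
```

In a converter, `render_text(p, italic_style_ids(root))` would replace the plain `"".join(p.itertext())` call for each `<p>`.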