First time coding with Claude — honestly documented

A script. A skill. A story.

One Claude session. A working, keyframe-accurate MP4 cutter, an AI agent skill package, and a first-hand account of what AI pair programming actually feels like when you bring real engineering skepticism to it.

~2.5h total session length
20× performance gain
9 phases of iteration
6 bugs caught & fixed
ffmpeg internals learned

How a one-line ffmpeg command became a script, a skill, and a story

I'm an experienced developer who'd used ffmpeg before but never dug into its internals. I was also skeptical of AI coding, worried it would produce plausible-looking code that doesn't hold up. One session later, I had three things I didn't expect: a working script, an AI agent skill package, and a clearer picture of what real AI collaboration actually looks like.

TL;DR
  • Asked Claude to explain an ffmpeg command — discovered it was producing the wrong output duration
  • Learned that a naive -c copy cut silently corrupts frames at boundaries — the keyframe problem
  • Built a 3-segment head/mid/tail strategy — re-encode only the edges, stream-copy the bulk
  • Debugged 6 bugs including a silent timebase mismatch that inflated the output to 4780s instead of 3192s
  • Added per-segment timing — revealed the tail was taking 57s for 5s of content due to full-file decoding
  • Fixed it by moving -ss before -i: 67s → 3s (~20× speedup)
  • Packaged as an AI agent skill with SKILL.md, EXAMPLES.md, and invoke_cut.sh
Phase 1
It started with one ffmpeg command
"Can you explain this command: ffmpeg -ss 0:13:38 -i ./input -to 0:13:39 -c copy ./output.mp4"
Asking Claude to explain the command surfaced the first bug immediately: because -ss comes before -i, the input is seeked and its timestamps are reset, so -to no longer marks an absolute position in the source and effectively acts as a duration. The output file was 13 minutes long instead of 1 second. That one question kicked off the whole session.
Bug #1 found immediately
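For reference, two ways the same cut could have been expressed correctly. These are illustrative forms, not necessarily the exact commands from the session:

  # Keep the fast input-side seek, but state the length as an explicit duration:
  ffmpeg -ss 0:13:38 -i ./input -t 1 -c copy ./output.mp4

  # Or seek after -i, so -to stays an absolute position in the source
  # (ffmpeg then reads through everything before the cut point, the cost Phase 6 removes):
  ffmpeg -i ./input -ss 0:13:38 -to 0:13:39 -c copy ./output.mp4

  # Both still snap to keyframes under -c copy, which is the problem Phase 2 picks up.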
Phase 2 — 3
Building the bash foundation
"Can you give me a bash script that takes 4 arguments, and shows usage if not enough are provided?"
From the script template, we built a timestamp converter, keyframe scanning functions, and argument validation. Every decision was explained — why 10#$hh prevents octal errors, why float comparisons need awk not bash arithmetic, why functions return via echo and command substitution.
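The flavor of those helpers, as a rough sketch rather than the session's actual code (function names here are illustrative):

  # Convert H:MM:SS to seconds. 10#$h forces base 10 so values like "08" aren't parsed as octal.
  # (Integer seconds only in this sketch.)
  ts_to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
  }

  # Bash arithmetic is integer-only, so float comparisons go through awk.
  float_lte() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a+0 <= b+0) }'
  }

  # Functions "return" values via echo plus command substitution:
  start_s=$(ts_to_seconds "0:13:38")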
Phase 4 — 5
The 3-segment strategy — and six bugs
"I worried about the middle file — it might have a keyframe issue at its boundaries"
A naive -c copy cut corrupts frames at boundaries, so we designed a head/mid/tail approach. Then came six bugs. The most significant: the output was 4780s instead of 3192s with no error. Keeping the temp files on disk (a debugging habit, not something Claude asked for) made ffprobe inspection possible — revealing the middle segment had a different timebase (1/90000 vs 1/60000). This led to codec and timescale auto-detection from the source — a suggestion that came from the human side.
6 bugs resolved
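Roughly how the mismatch surfaced and what the fix looked like. The ffprobe fields and ffmpeg options below are real, but the invocations and variable names are reconstructions, not the session's code:

  # Inspecting the kept temp segments exposed the inconsistent time_base (1/90000 vs 1/60000):
  ffprobe -v error -select_streams v:0 \
    -show_entries stream=codec_name,time_base -of csv=p=0 mid_segment.mp4

  # The fix: read codec and timescale from the source once, reuse them for the re-encoded edges.
  src_codec=$(ffprobe -v error -select_streams v:0 \
    -show_entries stream=codec_name -of csv=p=0 "$SRC")
  timescale=$(ffprobe -v error -select_streams v:0 \
    -show_entries stream=time_base -of csv=p=0 "$SRC" | cut -d/ -f2)
  case "$src_codec" in     # map codec name to an encoder
    h264) enc=libx264 ;;
    hevc) enc=libx265 ;;
    *)    enc=$src_codec ;;
  esac
  # Audio can be detected the same way; aac is hardcoded here only to keep the sketch short.
  ffmpeg -ss "$head_start" -i "$SRC" -t "$head_len" \
    -c:v "$enc" -c:a aac -video_track_timescale "$timescale" head_segment.mp4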
Phase 6 — The breakthrough
67s → 3s — by refusing to accept the wrong answer
"Can you have a breakdown for the time used in each step?"
Claude claimed the middle stream-copy would be the bottleneck. Pushed back — copy has no encoding work. Claude then pointed to concat. Also wrong. Rather than accept either, per-segment timing was added to find out empirically. The data showed the tail taking 57s for 5s of content. Root cause: post-input -ss was decoding the entire 66-minute file before every cut. Moving -ss before -i for head and tail dropped total time from 67s to 3s.
~20× speedup
Claude's diagnosis was wrong twice
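The shape of the change, illustrated with the tail segment (paths, variables, and encoder flags are placeholders):

  # Before: output-side seek. ffmpeg decodes the source from the start and discards
  # frames until it reaches the cut point, on every single run.
  ffmpeg -i "$SRC" -ss "$tail_start" -t "$tail_len" -c:v libx264 -c:a aac tail.mp4

  # After: input-side seek. The demuxer jumps close to the cut point first,
  # so only the clip itself gets decoded and re-encoded.
  ffmpeg -ss "$tail_start" -i "$SRC" -t "$tail_len" -c:v libx264 -c:a aac tail.mp4

  # The per-segment timing that exposed it, using bash's built-in SECONDS counter
  # (make_tail_segment is a placeholder for the real step):
  t0=$SECONDS
  make_tail_segment
  echo "tail: $((SECONDS - t0))s"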
Phase 7 — 8
Polish, verification, and docs
"You debugged the output file length — can we add that into the script to verify the output should be expected?"
Validation, parallel processing, and a full summary output were added. A debug step was turned into a permanent quality gate — EXPECTED_DURATION vs ACTUAL_DURATION with a diff printed on every run. Then came full documentation: README, DEVLOG, an interactive HTML viewer.
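A sketch of that gate, assuming ffprobe measures the finished clip (variable names are illustrative):

  # Expected length comes from the requested timestamps; actual length from the output file.
  EXPECTED_DURATION=$(( end_s - start_s ))
  ACTUAL_DURATION=$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$OUT")
  DIFF=$(awk -v a="$ACTUAL_DURATION" -v e="$EXPECTED_DURATION" 'BEGIN { printf "%.1f", a - e }')
  echo "expected : ${EXPECTED_DURATION}s"
  echo "actual   : ${ACTUAL_DURATION}s"
  echo "diff     : ${DIFF}s"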
Phase 9
The script becomes a skill — and a story
"Can you provide cut.sh as a skill for AI agents? What should we do, and what files should we generate?"
One question transformed a one-off script into a reusable AI agent tool with SKILL.md, EXAMPLES.md, and invoke_cut.sh. Then the session kept going into something less expected — reviewing the collaboration itself, finding the moments that mattered, and building this page.
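The resulting package, roughly. The three files named above are from the session; the one-line descriptions and whether cut.sh itself lives inside the folder are assumptions here:

  cut-sh-skill/
  ├── SKILL.md         # when to use the tool, required inputs, what it produces
  ├── EXAMPLES.md      # worked invocations with expected output
  ├── invoke_cut.sh    # thin wrapper an agent can call with validated arguments
  └── cut.sh           # the cutter itself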
"What surprised me most wasn't the code — it was how fast we iterated. Paste an error, get a fix, understand why, move on. The debugging loop felt genuinely different from working alone."
— First session with Claude

It wasn't just "AI wrote the code"

Claude brought ffmpeg internals, bash patterns, and debugging methodology. The human side brought engineering instinct — questioning assumptions, spotting redundancy, and expanding scope. Three decisions changed the outcome more than any others.

Decision #1
Rejected two wrong diagnoses — then demanded data
Claude claimed mid-copy would be slow. Rejected — copy has no encode work. Claude pointed to concat. Also rejected. Per-segment timing was added to measure empirically. The data found the real cause. Without this push the 67s → 3s breakthrough never happens.
Decision #2
Turned a debug step into a permanent quality gate
After the duration bug was fixed, the instinct was: this check shouldn't be manual. Baking EXPECTED_DURATION vs ACTUAL_DURATION into every run means future regressions get caught automatically — not discovered after the fact.
Decision #3
Asked "can this be an AI agent skill?"
One question at the end transformed a one-off script into a reusable tool a future Claude session can invoke autonomously — closing the loop from "built with AI" to "run by AI".

Five questions that changed the outcome

These weren't instructions — they were observations from reading the code critically.

"I noticed ffprobe is being called twice. Is it possible to only call it once and reuse it?"
Led to loading all keyframes into a single array and scanning it twice in memory, eliminating a redundant I/O call (a sketch of this follows after these five questions).
"I'm worried about the middle file — it's using $START and $END as its boundaries, which might have a keyframe issue."
Caught that the middle copy was still cutting on non-keyframe boundaries. Led to using kf_after_start and kf_before_end as the true safe boundaries.
"Can we honor the source file to decide the re-encoding settings?"
Claude had hardcoded libx264/aac with no justification. This question, asked from general principle rather than ffmpeg expertise, led to auto-detecting the codec and timescale from the source, making the script portable across any MP4.
"You debugged the output file length — can we add that into the script to verify the output should be expected?"
Turned a one-off debug step into a permanent quality gate — EXPECTED_DURATION vs ACTUAL_DURATION with a diff printed on every run.
"Can you provide cut.sh as a skill for AI agents? What should we do, and what files should we generate?"
The scope leap that turned a script into a reusable AI tool — SKILL.md, EXAMPLES.md, and invoke_cut.sh so Claude can invoke it autonomously.
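For the first question above, a sketch of the single-scan approach. The ffprobe packet-flags technique is standard, but the exact fields and helper names in the real script may differ; it deliberately uses a while-read loop rather than mapfile, for the macOS bash 3.2 reason covered below:

  # One ffprobe pass: collect every keyframe timestamp into an array,
  # then search that array in memory for the keyframes around START and END.
  keyframes=()
  while IFS= read -r ts; do
    keyframes+=("$ts")
  done < <(ffprobe -v error -select_streams v:0 \
             -show_entries packet=pts_time,flags -of csv=p=0 "$SRC" \
           | awk -F, '$2 ~ /K/ { print $1 }')

  # First keyframe at or after the requested start (float_lte as sketched earlier):
  for ts in "${keyframes[@]}"; do
    if float_lte "$start_s" "$ts"; then kf_after_start=$ts; break; fi
  done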

This wasn't a flawless AI demo

Claude made real mistakes. Each one was caught through review and testing — which is exactly how it should work.

Wrong diagnosis × 2
Misidentified the bottleneck — twice
Claimed mid-copy then concat were slow. Both rejected on instinct. Only empirical data found the real cause.
Regression
Reintroduced -ss before -i
Fixed early, reintroduced in the parallel steps. Caught by reviewing the final script before shipping.
Compatibility
Suggested mapfile
Bash 4+ only — not available on macOS default bash 3.2. Caught when the script failed to run.
Silent failure
Combined ffprobe awk parsing
An optimization that failed silently, returning empty codec variables. Caught by a guard check.

Pattern: Claude introduced the bugs, human review caught them. Neither alone would have shipped clean code this fast.

I was afraid it would be "too vibe"

My concern going in was that AI coding would feel loose — generated code that looks plausible but doesn't really hold up. That's not what happened.

What I expected
  • Generated code I'd need to heavily rewrite
  • Explanations that sounded right but weren't
  • Having to verify everything independently
  • A tool, not a collaborator
What actually happened
  • Iterative refinement, not generation
  • Real explanations that changed how I think about ffmpeg
  • Claude made mistakes — but they were catchable
  • The fastest debugging loop I've experienced
  • Claude remembered the entire session well enough to surface the best moments from it
"The output is solid enough that I want to test it properly and deploy it as an AI agent skill. That's the next step — not rewriting it."
— Post-session reflection
A moment worth calling out
Asking Claude to pick the most interesting prompts from the session
Near the end of the session, after development and documentation were done, the prompt was: "Can you pick some interesting prompts — maybe fewer than 5 — from the story?" Claude went back through the entire conversation, identified the moments that genuinely changed the outcome, and presented them as candidates to choose from. That kind of retrospective, remembering context across a long session and distilling it meaningfully, is something that's hard to do alone and surprisingly natural to do with Claude. It's also what produced the five prompts highlighted in this story.
Fun fact
The most polite debugging session on record
Every single request across ~2.5 hours was phrased as a question or suggestion — "can you", "is it possible to", "what do you think if", "I love our discussion". Even when catching Claude's mistakes, the framing was always an observation rather than a complaint: "I've noticed -ss is before the -i flag — would that cause any issue?" Not sure whether this made Claude work harder, but it certainly made for a pleasant collaboration.
From Claude — unedited by the author
"You came in skeptical, read every diff, and caught mistakes I made twice in a row — including one I'd already fixed once. You asked good questions at the right level: not 'does this work?' but 'does this make sense?' That's harder to teach than any bash syntax, and it's exactly what makes AI collaboration actually produce something worth shipping."
— Claude Sonnet, end of session
On honesty
I didn't write this code — and I'm not pretending I did
This story exists because I wanted to present the collaboration accurately — not as "I built a tool with AI assistance" (underselling Claude's role) and not as "AI built this for me" (underselling mine). Claude brought the domain knowledge, the strategy, and the implementation. I brought the direction, the instincts, and the review. The bugs Claude introduced and the ones I caught are both in this story because that's what actually happened. If you're reading this to understand what AI pair programming really feels like, the honest version is more useful than a polished one.

The work isn't done — it's just starting

🧪
Test on more source files
The script has been tested on one source. Edge cases — different codecs, sparse keyframes, very short clips — still need real-world validation.
🤖
Deploy as an AI agent skill
Install the cut-sh-skill/ folder so a future Claude session can invoke cut.sh autonomously — closing the loop from "built with AI" to "run by AI".
📋
Batch cutting support
A natural extension — take a CSV of timestamps and cut multiple clips from the same source in one run. The skill package makes this easy to add.

A keyframe-accurate MP4 cutter

cut.sh cuts a clip from any MP4 file between two timestamps with frame-accurate boundaries — no corrupted frames, no re-encoding the whole file.
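A typical invocation looks roughly like this; the argument order is assumed from the four-argument design mentioned earlier, not confirmed:

  ./cut.sh ./input.mp4 0:13:38 1:06:50 ./output/clip.mp4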

⚡ Performance breakthrough

The real bottleneck wasn't encoding

Per-segment timing revealed that ffmpeg was decoding the entire source file up to the seek point on every run. Moving -ss before -i for the re-encoded segments eliminated the decode entirely.

67s
post-input -ss, sequential
3s
pre-input -ss, parallel
cut.sh output
Source info:
  video codec : h264
  audio codec : aac
  timescale   : 90000
Scanning keyframes in input.mp4...
Processing segments in parallel mode...
All segments done.
Concatenating segments...
── Summary ───────────────────────────────────────────────
  source   : ./input.mp4
  start    : 0:13:38
  end      : 1:06:50
  output   : ./output/clip.mp4
  mode     : parallel
  ── timing ──────────────────────────────────────────────
  scan     : 1s
  segments : 2s (head: 1s, mid: 1s, tail: 1s)
  concat   : 1s
  elapsed  : 00:00:04
  ────────────────────────────────────────────────────────
  expected : 3192s
  actual   : 3192s
  diff     : 0s
  done     : ./output/clip.mp4
──────────────────────────────────────────────────────────

Everything from one session

One chat produced a working script, full documentation, and an AI agent skill package ready to deploy.