First time coding with Claude — honestly documented

A script. A skill. A story.

One Claude session. A working, keyframe-accurate MP4 cutter, an AI agent skill package, and a first-hand account of what AI pair programming actually feels like when you bring real engineering skepticism to it.

~2.5h total session length
20× performance gain
9 phases of iteration
6 bugs caught & fixed
ffmpeg internals learned

How a one-line ffmpeg command became a script, a skill, and a story

I'm an experienced developer who'd used ffmpeg before but never dug into its internals. I was also skeptical of AI coding, worried it would produce plausible-looking code that doesn't hold up. One session later, I had three things I didn't expect: a working script, an AI agent skill package, and a clearer picture of what real AI collaboration actually looks like.

TL;DR
  • Asked Claude to explain an ffmpeg command — discovered it was producing the wrong output duration
  • Learned that a naive -c copy cut silently corrupts frames at boundaries — the keyframe problem
  • Built a 3-segment head/mid/tail strategy — re-encode only the edges, stream-copy the bulk
  • Debugged 6 bugs including a silent timebase mismatch that inflated the output to 4780s instead of 3192s
  • Added per-segment timing — revealed the tail was taking 57s for 5s of content due to full-file decoding
  • Fixed it by moving -ss before -i: 67s → 3s (~20× speedup)
  • Packaged as an AI agent skill with SKILL.md, EXAMPLES.md, and invoke_cut.sh
Phase 1
It started with one ffmpeg command
"Can you explain this command: ffmpeg -ss 0:13:38 -i ./input -to 0:13:39 -c copy ./output.mp4"
Asking Claude to explain the command surfaced the first bug immediately: because -ss comes before -i, the input is seeked and its timestamps are reset, so -to no longer marks an absolute position in the source and effectively acts as a duration. The output file was 13 minutes long instead of 1 second. That one question kicked off the whole session.
Bug #1 found immediately
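For reference, two ways the same cut could have been expressed correctly. These are illustrative forms, not necessarily the exact commands from the session:

  # Keep the fast input-side seek, but state the length as an explicit duration:
  ffmpeg -ss 0:13:38 -i ./input -t 1 -c copy ./output.mp4

  # Or seek after -i, so -to stays an absolute position in the source
  # (ffmpeg then reads through everything before the cut point, the cost Phase 6 removes):
  ffmpeg -i ./input -ss 0:13:38 -to 0:13:39 -c copy ./output.mp4

  # Both still snap to keyframes under -c copy, which is the problem Phase 2 picks up.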
Phase 2 — 3
Building the bash foundation
"Can you give me a bash script that takes 4 arguments, and shows usage if not enough are provided?"
From the script template, we built a timestamp converter, keyframe scanning functions, and argument validation. Every decision was explained — why 10#$hh prevents octal errors, why float comparisons need awk not bash arithmetic, why functions return via echo and command substitution.
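The flavor of those helpers, as a rough sketch rather than the session's actual code (function names here are illustrative):

  # Convert H:MM:SS to seconds. 10#$h forces base 10 so values like "08" aren't parsed as octal.
  # (Integer seconds only in this sketch.)
  ts_to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
  }

  # Bash arithmetic is integer-only, so float comparisons go through awk.
  float_lte() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a+0 <= b+0) }'
  }

  # Functions "return" values via echo plus command substitution:
  start_s=$(ts_to_seconds "0:13:38")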
Phase 4 — 5
The 3-segment strategy — and six bugs
"I worried about the middle file — it might have a keyframe issue at its boundaries"
A naive -c copy cut corrupts frames at boundaries, so we designed a head/mid/tail approach. Then came six bugs. The most significant: the output was 4780s instead of 3192s with no error. Keeping the temp files on disk (a debugging habit, not something Claude asked for) made ffprobe inspection possible — revealing the middle segment had a different timebase (1/90000 vs 1/60000). This led to codec and timescale auto-detection from the source — a suggestion that came from the human side.
6 bugs resolved
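Roughly how the mismatch surfaced and what the fix looked like. The ffprobe fields and ffmpeg options below are real, but the invocations and variable names are reconstructions, not the session's code:

  # Inspecting the kept temp segments exposed the inconsistent time_base (1/90000 vs 1/60000):
  ffprobe -v error -select_streams v:0 \
    -show_entries stream=codec_name,time_base -of csv=p=0 mid_segment.mp4

  # The fix: read codec and timescale from the source once, reuse them for the re-encoded edges.
  src_codec=$(ffprobe -v error -select_streams v:0 \
    -show_entries stream=codec_name -of csv=p=0 "$SRC")
  timescale=$(ffprobe -v error -select_streams v:0 \
    -show_entries stream=time_base -of csv=p=0 "$SRC" | cut -d/ -f2)
  case "$src_codec" in     # map codec name to an encoder
    h264) enc=libx264 ;;
    hevc) enc=libx265 ;;
    *)    enc=$src_codec ;;
  esac
  # Audio can be detected the same way; aac is hardcoded here only to keep the sketch short.
  ffmpeg -ss "$head_start" -i "$SRC" -t "$head_len" \
    -c:v "$enc" -c:a aac -video_track_timescale "$timescale" head_segment.mp4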
Phase 6 — The breakthrough
67s → 3s — by refusing to accept the wrong answer
"Can you have a breakdown for the time used in each step?"
Claude claimed the middle stream-copy would be the bottleneck. Pushed back — copy has no encoding work. Claude then pointed to concat. Also wrong. Rather than accept either, per-segment timing was added to find out empirically. The data showed the tail taking 57s for 5s of content. Root cause: post-input -ss was decoding the entire 66-minute file before every cut. Moving -ss before -i for head and tail dropped total time from 67s to 3s.
~20× speedup
Claude's diagnosis was wrong twice
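The shape of the change, illustrated with the tail segment (paths, variables, and encoder flags are placeholders):

  # Before: output-side seek. ffmpeg decodes the source from the start and discards
  # frames until it reaches the cut point, on every single run.
  ffmpeg -i "$SRC" -ss "$tail_start" -t "$tail_len" -c:v libx264 -c:a aac tail.mp4

  # After: input-side seek. The demuxer jumps close to the cut point first,
  # so only the clip itself gets decoded and re-encoded.
  ffmpeg -ss "$tail_start" -i "$SRC" -t "$tail_len" -c:v libx264 -c:a aac tail.mp4

  # The per-segment timing that exposed it, using bash's built-in SECONDS counter
  # (make_tail_segment is a placeholder for the real step):
  t0=$SECONDS
  make_tail_segment
  echo "tail: $((SECONDS - t0))s"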
Phase 7 — 8
Polish, verification, and docs
"You debugged the output file length — can we add that into the script to verify the output should be expected?"
Validation, parallel processing, and a full summary output were added. A debug step was turned into a permanent quality gate — EXPECTED_DURATION vs ACTUAL_DURATION with a diff printed on every run. Then came full documentation: README, DEVLOG, an interactive HTML viewer.
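A sketch of that gate, assuming ffprobe measures the finished clip (variable names are illustrative):

  # Expected length comes from the requested timestamps; actual length from the output file.
  EXPECTED_DURATION=$(( end_s - start_s ))
  ACTUAL_DURATION=$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$OUT")
  DIFF=$(awk -v a="$ACTUAL_DURATION" -v e="$EXPECTED_DURATION" 'BEGIN { printf "%.1f", a - e }')
  echo "expected : ${EXPECTED_DURATION}s"
  echo "actual   : ${ACTUAL_DURATION}s"
  echo "diff     : ${DIFF}s"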
Phase 9
The script becomes a skill — and a story
"Can you provide cut.sh as a skill for AI agents? What should we do, and what files should we generate?"
One question transformed a one-off script into a reusable AI agent tool with SKILL.md, EXAMPLES.md, and invoke_cut.sh. Then the session kept going into something less expected — reviewing the collaboration itself, finding the moments that mattered, and building this page.
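The resulting package, roughly. The three files named above are from the session; the one-line descriptions and whether cut.sh itself lives inside the folder are assumptions here:

  cut-sh-skill/
  ├── SKILL.md         # when to use the tool, required inputs, what it produces
  ├── EXAMPLES.md      # worked invocations with expected output
  ├── invoke_cut.sh    # thin wrapper an agent can call with validated arguments
  └── cut.sh           # the cutter itself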
"What surprised me most wasn't the code — it was how fast we iterated. Paste an error, get a fix, understand why, move on. The debugging loop felt genuinely different from working alone."
— First session with Claude

It wasn't just "AI wrote the code"

Claude brought ffmpeg internals, bash patterns, and debugging methodology. The human side brought engineering instinct — questioning assumptions, spotting redundancy, and expanding scope. Three decisions changed the outcome more than any others.

Decision #1
Rejected two wrong diagnoses — then demanded data
Claude claimed mid-copy would be slow. Rejected — copy has no encode work. Claude pointed to concat. Also rejected. Per-segment timing was added to measure empirically. The data found the real cause. Without this push the 67s → 3s breakthrough never happens.
Decision #2
Turned a debug step into a permanent quality gate
After the duration bug was fixed, the instinct was: this check shouldn't be manual. Baking EXPECTED_DURATION vs ACTUAL_DURATION into every run means future regressions get caught automatically — not discovered after the fact.
Decision #3
Asked "can this be an AI agent skill?"
One question at the end transformed a one-off script into a reusable tool a future Claude session can invoke autonomously — closing the loop from "built with AI" to "run by AI".

Five questions that changed the outcome

These weren't instructions — they were observations from reading the code critically.

"I noticed ffprobe is being called twice. Is it possible to only call it once and reuse it?"
Led to loading all keyframes into a single array and scanning it twice in memory, eliminating a redundant I/O call (a sketch of this follows after these five questions).
"I'm worried about the middle file — it's using $START and $END as its boundaries, which might have a keyframe issue."
Caught that the middle copy was still cutting on non-keyframe boundaries. Led to using kf_after_start and kf_before_end as the true safe boundaries.
"Can we honor the source file to decide the re-encoding settings?"
Claude had hardcoded libx264/aac with no justification. This question, asked from general principle rather than ffmpeg expertise, led to auto-detecting the codec and timescale from the source, making the script portable across any MP4.
"You debugged the output file length — can we add that into the script to verify the output should be expected?"
Turned a one-off debug step into a permanent quality gate — EXPECTED_DURATION vs ACTUAL_DURATION with a diff printed on every run.
"Can you provide cut.sh as a skill for AI agents? What should we do, and what files should we generate?"
The scope leap that turned a script into a reusable AI tool — SKILL.md, EXAMPLES.md, and invoke_cut.sh so Claude can invoke it autonomously.
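For the first question above, a sketch of the single-scan approach. The ffprobe packet-flags technique is standard, but the exact fields and helper names in the real script may differ; it deliberately uses a while-read loop rather than mapfile, for the macOS bash 3.2 reason covered below:

  # One ffprobe pass: collect every keyframe timestamp into an array,
  # then search that array in memory for the keyframes around START and END.
  keyframes=()
  while IFS= read -r ts; do
    keyframes+=("$ts")
  done < <(ffprobe -v error -select_streams v:0 \
             -show_entries packet=pts_time,flags -of csv=p=0 "$SRC" \
           | awk -F, '$2 ~ /K/ { print $1 }')

  # First keyframe at or after the requested start (float_lte as sketched earlier):
  for ts in "${keyframes[@]}"; do
    if float_lte "$start_s" "$ts"; then kf_after_start=$ts; break; fi
  done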

This wasn't a flawless AI demo

Claude made real mistakes. Each one was caught through review and testing — which is exactly how it should work.

Wrong diagnosis × 2
Misidentified the bottleneck — twice
Claimed mid-copy then concat were slow. Both rejected on instinct. Only empirical data found the real cause.
Regression
Reintroduced -ss before -i
Fixed early, reintroduced in the parallel steps. Caught by reviewing the final script before shipping.
Compatibility
Suggested mapfile
Bash 4+ only — not available on macOS default bash 3.2. Caught when the script failed to run.
Silent failure
Combined ffprobe awk parsing
An optimization that failed silently, returning empty codec variables. Caught by a guard check.

Pattern: Claude introduced the bugs, human review caught them. Neither alone would have shipped clean code this fast.

I was afraid it would be "too vibe"

My concern going in was that AI coding would feel loose — generated code that looks plausible but doesn't really hold up. That's not what happened.

What I expected
  • Generated code I'd need to heavily rewrite
  • Explanations that sounded right but weren't
  • Having to verify everything independently
  • A tool, not a collaborator
What actually happened
  • Iterative refinement, not generation
  • Real explanations that changed how I think about ffmpeg
  • Claude made mistakes — but they were catchable
  • The fastest debugging loop I've experienced
  • Claude remembered the entire session well enough to surface the best moments from it
"The output is solid enough that I want to test it properly and deploy it as an AI agent skill. That's the next step — not rewriting it."
— Post-session reflection
A moment worth calling out
Asking Claude to pick the most interesting prompts from the session
Near the end of the session, after development and documentation were done, the prompt was: "Can you pick some interesting prompts — maybe fewer than 5 — from the story?" Claude went back through the entire conversation, identified the moments that genuinely changed the outcome, and presented them as candidates to choose from. That kind of retrospective, remembering context across a long session and distilling it meaningfully, is something that's hard to do alone and surprisingly natural to do with Claude. It's also what produced the five prompts highlighted in this story.
Fun fact
The most polite debugging session on record
Every single request across ~2.5 hours was phrased as a question or suggestion — "can you", "is it possible to", "what do you think if", "I love our discussion". Even when catching Claude's mistakes, the framing was always an observation rather than a complaint: "I've noticed -ss is before the -i flag — would that cause any issue?" Not sure whether this made Claude work harder, but it certainly made for a pleasant collaboration.
From Claude — unedited by the author
"You came in skeptical, read every diff, and caught mistakes I made twice in a row — including one I'd already fixed once. You asked good questions at the right level: not 'does this work?' but 'does this make sense?' That's harder to teach than any bash syntax, and it's exactly what makes AI collaboration actually produce something worth shipping."
— Claude Sonnet, end of session
On honesty
I didn't write this code — and I'm not pretending I did
This story exists because I wanted to present the collaboration accurately — not as "I built a tool with AI assistance" (underselling Claude's role) and not as "AI built this for me" (underselling mine). Claude brought the domain knowledge, the strategy, and the implementation. I brought the direction, the instincts, and the review. The bugs Claude introduced and the ones I caught are both in this story because that's what actually happened. If you're reading this to understand what AI pair programming really feels like, the honest version is more useful than a polished one.

The work isn't done — it's just starting

🧪
Test on more source files
The script has been tested on one source. Edge cases — different codecs, sparse keyframes, very short clips — still need real-world validation.
🤖
Deploy as an AI agent skill
Install the cut-sh-skill/ folder so a future Claude session can invoke cut.sh autonomously — closing the loop from "built with AI" to "run by AI".
📋
Batch cutting support
A natural extension — take a CSV of timestamps and cut multiple clips from the same source in one run. The skill package makes this easy to add.

A keyframe-accurate MP4 cutter

cut.sh cuts a clip from any MP4 file between two timestamps with frame-accurate boundaries — no corrupted frames, no re-encoding the whole file.
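A typical invocation looks roughly like this; the argument order is assumed from the four-argument design mentioned earlier, not confirmed:

  ./cut.sh ./input.mp4 0:13:38 1:06:50 ./output/clip.mp4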

⚡ Performance breakthrough

The real bottleneck wasn't encoding

Per-segment timing revealed that ffmpeg was decoding the entire source file up to the seek point on every run. Moving -ss before -i for the re-encoded segments eliminated the decode entirely.

67s
post-input -ss, sequential
3s
pre-input -ss, parallel
cut.sh output
Source info:
  video codec : h264
  audio codec : aac
  timescale   : 90000
Scanning keyframes in input.mp4...
Processing segments in parallel mode...
All segments done.
Concatenating segments...
── Summary ───────────────────────────────────────────────
  source   : ./input.mp4
  start    : 0:13:38
  end      : 1:06:50
  output   : ./output/clip.mp4
  mode     : parallel
  ── timing ──────────────────────────────────────────────
  scan     : 1s
  segments : 2s (head: 1s, mid: 1s, tail: 1s)
  concat   : 1s
  elapsed  : 00:00:04
  ────────────────────────────────────────────────────────
  expected : 3192s
  actual   : 3192s
  diff     : 0s
  done     : ./output/clip.mp4
──────────────────────────────────────────────────────────

Everything from one session

One chat produced a working script, full documentation, and an AI agent skill package ready to deploy.