
How we shipped 98 pull requests in four days

We build for us. We build for you.

How Brumm Labs shipped 98 pull requests in four days through a self-built AI pipeline — and why this is exactly the method we use to build your software, too.

The moment it clicked

It was a Thursday in April. Discord messages ticked across the screen every few minutes:

14:02
14:05
14:08
14:12

We sat in the kitchen, drank coffee, talked about other things. In the background, our product was building itself. Story by story, feature by feature. 35 merges in a single day. No night shifts. No "just ship it real quick". No regressions.

That was the day we knew: this isn't just a tool for us anymore. This is how software should be built today. And we want our clients to benefit from it.

The problem we set ourselves

Brumm Labs is a workshop. We build multiple products in parallel: Vertellen, Brummlabs Backoffice, a design system, a Discord bot, several websites, and your products. Each of them has its own backlog. Each product shouts for attention.

As a founder or entrepreneur, you know the feeling. The list of things you want to build grows faster than the list of things you can build. You wake up at night thinking about three things you should do, four new ideas your clients have been asking about for weeks, and that one refactor that would make everything easier if you only had the time.

AI coding tools were the hope. Cursor, Copilot, Devin. We tried them all. And yes: they make typing faster. They help with experiments and prototypes. They are brilliant in the moment.

But: they don't help you with the lifecycle. They don't write requirements documents. They don't plan implementation. They don't deliver structured changes you can trace. They don't make sure everything that was running in production still works. They don't look at their own work again two days later.

What we were missing wasn't a faster code editor. What we were missing was a workshop.

What we built

We call it the Guild. A guild of specialized AI agents, each with a single, clearly defined job, all coordinated through something every development team already has: GitHub issues and labels.

You write an issue. You set a label. You drink coffee. The Guild handles the rest.

The five trades

Scribe, the writer.

She reads your issue and translates fuzzy product ideas into sharp specifications: a requirements document, epics, individual stories with measurable acceptance criteria. She makes assumptions explicit instead of peppering you with questions. Only when she really can't move forward does she knock.

Smith, the implementer.

She takes Scribe's stories and writes code. Across multiple repos, in the right order. Tests first. Minimal diff. If a bug is too complex, she escalates to Scribe. No hero stunts.

Warden, the guardian.

She reviews every feature with three adversarial perspectives at once: a blind-hunter view, an edge-case view, an acceptance-auditor view. If she finds something critical, she fixes it herself. Up to two iterations, then she escalates. And after the merge she looks again, this time as an auditor, turning every overlooked detail into its own ticket for later.

Seal, the sealkeeper.

She opens the pull request, watches the CI pipeline, fixes small errors herself, and merges cleanly as soon as everything is green. If Chromatic needs a visual approval, she pings us with a screenshot diff.

Werkstatt, the conductor.

She orchestrates. She reads labels, dispatches the next agent, documents everything, and keeps us in the loop via Discord. From idea to live deployment, easy to trace from the kitchen table.

In the background runs Forge, which orchestrates multiple pipelines in parallel. One founder can have ten issues processed at the same time without the pipelines getting in each other's way.
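For readers who think in code, here is a minimal sketch of what running several pipelines side by side could look like. It is not Forge's actual implementation: run_pipeline, run_all, and the max_parallel parameter are hypothetical names, and the pipeline body is a placeholder. Only the figure of ten parallel issues comes from the text above.

```python
import asyncio
from typing import List


async def run_pipeline(issue: int) -> str:
    """Placeholder for one full pipeline run on one issue (hypothetical)."""
    await asyncio.sleep(0)  # stands in for the real agent work
    return f"issue #{issue}: merged"


async def run_all(issues: List[int], max_parallel: int = 10) -> List[str]:
    """Process issues concurrently, never more than max_parallel at once."""
    gate = asyncio.Semaphore(max_parallel)

    async def guarded(issue: int) -> str:
        async with gate:  # wait here if max_parallel pipelines are running
            return await run_pipeline(issue)

    # gather returns results in the same order as the input issues.
    return await asyncio.gather(*(guarded(i) for i in issues))


if __name__ == "__main__":
    for line in asyncio.run(run_all([101, 102, 103])):
        print(line)
```

The semaphore is the whole trick: the cap on parallel work is a single number that can be raised or lowered without touching the pipelines themselves.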

What happened

Over the course of four days, the Guild carried 98 pull requests through our repositories. Here are the numbers, unvarnished:

The headline

Pull requests merged: 98
Period: April 9–12, 2026 (4 active days)
Median lead time: 14.3 minutes
Peak day: 35 pull requests
Products worked on in parallel: 6
Code volume: +683,351 / -36,569 lines
Manual interventions (HITL): 0

The daily curve

Figure: Daily curve. 98 pull requests in 4 days, merged autonomously through the Guild pipeline. Day 1: 10. Day 2: 35 (peak). Day 3: 22. Day 4: 31. Median lead time: 14.3 min. Products in parallel: 6. Self-correction: 10 % audit findings. Interventions: 0.

Four days. 98 pull requests. This isn't marketing magic. This is a backlog that had grown over months, cleared in a few days, while we kept thinking about strategic decisions.

What the Guild built

Not just bugfixes. Not just boilerplate. Real, productive work:

  • NIS2 compliance. Analytics API, annual report, cohort reporting — coordinated across three products (backend, frontend, design system).
  • Self-hosted video support. Video model, agent API, blob cleanup, studio UI — in one sweep.
  • A new agent. guild-renovator, which tests and merges dependency updates autonomously. Built by the Guild for the Guild.

The second pass that keeps the system honest

Ten percent of all merged pull requests were audit findings. Meaning: after Warden had waved a change through, she came back to the same code later, this time as an independent auditor. She found missing tests. She spotted duplicated code. She caught a blocking HTTP call in an async context. Every one of those findings became its own ticket, which the pipeline then picked up itself.

The Guild improves its own code. Without us doing anything.

Why this works

Three design decisions carry the entire system.

Labels instead of a workflow engine

The pipeline has no central server, no proprietary database, no lock-in. GitHub issue labels are the single source of truth. If ready-for-dev is on the issue, Smith knows what to do. If ready-for-review is on it, Warden knows she's up. If someone wanted to rewrite the Guild tomorrow, they could. There's nothing to migrate.
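A minimal sketch of what label-driven dispatch could look like. The two labels and the two agent names are taken from the paragraph above; the lookup table, the function name next_agent, and the handling of conflicting labels are assumptions for illustration, not the Guild's actual code.

```python
from typing import Iterable, Optional

# Label -> agent. Only these two pairs are named in the text; a real
# pipeline would have more entries (hypothetical table).
LABEL_TO_AGENT = {
    "ready-for-dev": "Smith",
    "ready-for-review": "Warden",
}


def next_agent(labels: Iterable[str]) -> Optional[str]:
    """Return the agent responsible for an issue, based only on its labels."""
    matches = [LABEL_TO_AGENT[label] for label in labels if label in LABEL_TO_AGENT]
    if len(matches) > 1:
        # Two pipeline labels at once is an ambiguous state; refuse to guess.
        raise ValueError(f"ambiguous pipeline labels: {sorted(matches)}")
    return matches[0] if matches else None


if __name__ == "__main__":
    print(next_agent(["bug", "ready-for-dev"]))
    print(next_agent(["ready-for-review"]))
    print(next_agent(["question"]))
```

Because the function reads nothing but the labels, the issue itself carries all the state. Any replacement dispatcher could pick up where this one left off.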

Adversarial review instead of rubber-stamping

Three independent AI reviewers look at the same diff, and none of them knows what the others are doing. One doesn't know the stories and hunts for code smells. One walks every boundary case systematically. One checks whether each acceptance criterion was actually met. If you want to convince one reviewer, you have to convince three. That catches more than any single pair of eyes.
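Sketched in code, the isolation could look like this. All names here (Finding, adversarial_review, approved, the stub reviewers) are hypothetical, and whether the edge-case reviewer sees the stories is an assumption. The text only states that one reviewer works without them and that no reviewer knows what the others are doing.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Finding:
    reviewer: str
    message: str
    critical: bool


# A reviewer sees the diff and, optionally, the stories. It never sees
# another reviewer's findings.
Reviewer = Callable[[str, Optional[List[str]]], List[Finding]]


def adversarial_review(
    diff: str,
    stories: List[str],
    blind_hunter: Reviewer,
    edge_case: Reviewer,
    acceptance_auditor: Reviewer,
) -> List[Finding]:
    """Run three isolated reviews of the same diff and collect all findings."""
    findings: List[Finding] = []
    findings += blind_hunter(diff, None)      # blind: receives no stories
    findings += edge_case(diff, stories)      # assumption: may see stories
    findings += acceptance_auditor(diff, stories)
    return findings


def approved(findings: List[Finding]) -> bool:
    """The diff passes only if no reviewer raised a critical finding."""
    return not any(f.critical for f in findings)


# Hypothetical stub reviewers, for demonstration only.
def stub_blind_hunter(diff, stories):
    assert stories is None  # the blind hunter must not receive the stories
    return [Finding("blind-hunter", "TODO left in code", False)] if "TODO" in diff else []


def stub_edge_case(diff, stories):
    return []


def stub_acceptance_auditor(diff, stories):
    # Toy check: a story counts as covered if its id appears in the diff.
    missing = [s for s in stories if s not in diff]
    return [Finding("acceptance-auditor", f"not covered: {s}", True) for s in missing]


if __name__ == "__main__":
    result = adversarial_review(
        "story-1 # TODO", ["story-1", "story-2"],
        stub_blind_hunter, stub_edge_case, stub_acceptance_auditor,
    )
    for finding in result:
        print(finding)
    print("approved:", approved(result))
```

The reviewers run one after another here for simplicity. What matters for the adversarial effect is that no reviewer's input contains another reviewer's output.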

Hard limits instead of endless loops

Every fix attempt has a hard maximum: two iterations for Warden, two for Seal, three for GreenLight (our CI repair). After that: escalation. No zombie jobs running in circles burning tokens. If a machine can't do it, we see it immediately and decide.
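A bounded retry loop of this kind fits in a few lines. The three limits below are the ones stated above; the names MAX_ATTEMPTS, Escalation, and run_with_limit are hypothetical and only illustrate the pattern.

```python
from typing import Callable

# Attempt limits as stated in the text.
MAX_ATTEMPTS = {"Warden": 2, "Seal": 2, "GreenLight": 3}


class Escalation(Exception):
    """Raised when an agent has used up its attempts and a human must decide."""


def run_with_limit(agent: str, attempt_fix: Callable[[int], bool]) -> int:
    """Call attempt_fix(attempt_number) until it succeeds or the limit is hit.

    Returns the number of attempts used. Raises Escalation when the limit
    is reached without success.
    """
    limit = MAX_ATTEMPTS[agent]
    for attempt in range(1, limit + 1):
        if attempt_fix(attempt):
            return attempt
    raise Escalation(f"{agent} gave up after {limit} attempts")


if __name__ == "__main__":
    # A fix that only works on the third try: within GreenLight's limit,
    # beyond Warden's.
    works_on_third_try = lambda attempt: attempt == 3
    print(run_with_limit("GreenLight", works_on_third_try))
    try:
        run_with_limit("Warden", works_on_third_try)
    except Escalation as exc:
        print("escalated:", exc)
```

The loop cannot run longer than the limit, so the worst case is known in advance: a fixed number of attempts, then a message to a human.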

What the Guild can't do

Here's the honest line we don't cross:

  • Product strategy. What should be built is decided by humans. The Guild executes — it doesn't decide what.
  • Deep design work. Visual judgment, user research, brand instinct — that stays human.
  • Architecture with long-term strategic weight. When a decision reaches years into the future, we're at the table, not an AI.
  • Production incident response. When something's on fire at 3am, you don't want an AI in the middle.

Our take: naming these limits honestly is exactly why the Guild works so reliably within its limits.

That doesn't mean we work without AI in these areas. We have agents for product strategy, design-thinking workshops, user research, incident analysis. But that's a story for another time. What matters is the clear separation: every agent has its own space, its own limits, its own responsibility. The Guild builds software. Other helpers think about other things. The separation is no accident — it's what carries the quality.

Does this replace developers, designers, or stakeholders?

No.

Honestly: the Guild gets you 80 to 90 percent of the way. The final few percent that separate good software from adequate software come from humans. Developers who flip an architectural switch. Designers who feel an interaction moment. Stakeholders who throw out a product assumption because they know their market.

The point is: those 10 to 20 percent now happen in the right place. No longer spread over weeks of routine work that dilutes good ideas. But at clean touch points in the workflow, where a glance at the PRD, a comment on a story, an override in review, or a strategic question has immediate impact.

The result: more software. At higher quality. That stays sustainable, because it's documented, tested, and audited from day one. And humans who spend their energy where humans are actually irreplaceable.

The Guild replaces no one. It frees up the humans who work with it.

What this means for you

When you work with Brumm Labs, this workshop builds for you.

We build for us. That's exactly why we can build for you.

The Guild wasn't born in a hackathon. It wasn't built for marketing. It came into being because we decided to run multiple products at the same time without giving ourselves up.

Every agent you read about above (Scribe, Smith, Warden, Seal, Werkstatt) was forged through real work. Every hard-limit rule comes from a moment when something went wrong and we learned what can't be automated. Every convention comes from a day when something would have slipped through the cracks if we hadn't had structure.

That's the difference. Most AI agencies sell a promise. We show you a workshop.

If you want to know what it feels like when it works on your product, write to us. We're happy to open the door.

Brumm Labs is a workshop for AI-powered software in Hamburg. We build products for our own brands — and step in for clients from the first sketch to a successful launch.