Careers at Keel Labs

Teach a machine to read the fine print no human will.

We are a small, remote team building the AI that reads benefits plan documents and answers real people's questions, with a citation. The questions are hard, the stakes are real, and the work shows up on the site every day.
Remote-first · async by default
Small team · high ownership
Ships daily · a paper a week, a note a day

How we build

Frontier engineering, pointed at a boring trillion dollars.

The hard part of benefits is not the chatbot. It is reading contradictory documents correctly, proving it, and doing it cheaply enough to offer for free. This is what the day actually looks like.

Remote

Wherever you do your best thinking

Async by default, deep work protected, meetings to decide and not to update. We hire for judgment, then get out of the way.

Token economics

Cheap and right beats expensive and right

Every answer has a cost. We treat tokens like a budget: prompt caching, context pruning, and a hard eye on cost per resolved question, because the answer has to be free to the member.
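As a rough illustration of "cost per resolved question", the sketch below divides total token spend by the number of questions that were actually resolved, so unresolved attempts still count against the budget. The prices, the `Usage` fields, and the function names are all hypothetical, chosen for the example and not taken from Keel's stack.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical prices in dollars per million tokens (assumptions, not real rates).
PRICE_INPUT = 3.00    # uncached prompt tokens
PRICE_CACHED = 0.30   # prompt tokens served from a prompt cache
PRICE_OUTPUT = 15.00  # generated tokens


@dataclass(frozen=True)
class Usage:
    """Token usage for one attempt at answering a question."""
    input_tokens: int
    cached_tokens: int
    output_tokens: int
    resolved: bool  # did this attempt resolve the member's question?


def request_cost(u: Usage) -> float:
    """Dollar cost of a single attempt."""
    return (
        u.input_tokens * PRICE_INPUT
        + u.cached_tokens * PRICE_CACHED
        + u.output_tokens * PRICE_OUTPUT
    ) / 1_000_000


def cost_per_resolved_question(usages: Iterable[Usage]) -> float:
    """Total spend divided by the count of resolved questions.

    Failed attempts add to the numerator but not the denominator,
    so wasted tokens make the metric worse.
    """
    usages = list(usages)
    resolved = sum(1 for u in usages if u.resolved)
    if resolved == 0:
        raise ValueError("no resolved questions to average over")
    return sum(request_cost(u) for u in usages) / resolved
```

Under these made-up prices, moving prompt tokens from `input_tokens` to `cached_tokens` cuts their cost tenfold, which is why caching and pruning show up directly in the metric.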

Model routing

The smallest model that can do the job

No single model for everything. We route each task to the cheapest model that can answer it and escalate to the frontier only when the question earns it. The router is its own research problem.
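One simple way to picture "cheapest model first, escalate when needed" is a confidence-gated cascade. The sketch below is an assumption about the general shape, not a description of Keel's router: the `Tier` type, the per-call costs, the confidence score, and the threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Tier:
    """One model tier. `answer` returns (text, confidence in [0, 1])."""
    name: str
    cost_per_call: float  # hypothetical flat cost, in dollars
    answer: Callable[[str], Tuple[str, float]]


def route(
    question: str,
    tiers: Sequence[Tier],
    min_confidence: float = 0.8,
) -> Tuple[str, str, float]:
    """Try tiers from cheapest to most expensive.

    Stops at the first tier whose confidence clears the threshold.
    If none does, the most expensive tier's answer is returned.
    Returns (tier name, answer text, total dollars spent).
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    spent = 0.0
    for tier in sorted(tiers, key=lambda t: t.cost_per_call):
        text, confidence = tier.answer(question)
        spent += tier.cost_per_call  # every attempted tier is paid for
        if confidence >= min_confidence:
            break
    return tier.name, text, spent
```

Note that an escalated question pays for every tier it touched, so a router that can predict difficulty up front, instead of cascading, would spend less on hard questions.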

Training

Benefits-native, not off-the-shelf

We fine-tune and post-train on real plan documents and real questions, graded by licensed experts. A general model does not know what a qualifying life event is, or which clause governs.

Inference

Latency and cost are product features

We push inference to the cheapest place it can run correctly and watch p50 and p95 like vital signs. A right answer that arrives too late is a wrong answer to the person waiting.
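For readers less familiar with the terms, p50 is the median latency and p95 is the latency that 95% of requests come in under. A minimal sketch of computing both from a list of observed latencies, using Python's standard `statistics` module (the function name and the sample data are illustrative only):

```python
import statistics
from typing import Sequence, Tuple


def latency_percentiles(latencies_ms: Sequence[float]) -> Tuple[float, float]:
    """Return (p50, p95) for a sample of request latencies in milliseconds.

    statistics.quantiles with n=100 returns the 99 percentile cut points,
    so index 49 is the 50th percentile and index 94 is the 95th.
    """
    if len(latencies_ms) < 2:
        raise ValueError("need at least two samples")
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return cuts[49], cuts[94]


# Illustrative data: latencies of 1 ms through 101 ms, one request each.
p50, p95 = latency_percentiles(list(range(1, 102)))
```

The two numbers answer different questions: p50 describes the typical request, while p95 describes the slow tail that a waiting member is most likely to notice.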

Evals

Nothing ships on a guess

Every release clears the eval suite first. We would rather hold a launch than ship a confident wrong answer on someone's coverage. The bar is the whole job.

Open roles

Three problems we have not cracked.

Each role owns one. Earlier-career or senior, if you can do the work, we want to talk.

Grounding

Research Engineer, Grounding

Make the model find the right answer inside contradictory plan documents, and prove it back to the exact line.
  • Build retrieval and reasoning over the reconciled benefits graph, not raw PDFs.
  • Resolve documents that disagree with each other, and decide which clause wins.
  • Design the provenance layer so every answer cites the line it came from.
  • Push accuracy on the questions members actually ask, where wrong is not an option.
Apply

Evals

Member of Technical Staff, Evaluations

There is no standard for whether benefits advice is correct. Build it, and make it the gate every model release has to clear.
  • Design eval sets with licensed benefits experts, graded against primary plan documents.
  • Build automated grading and regression suites that run on every change.
  • Red-team Fathom for the failure that matters most: a confident wrong answer on real coverage.
  • Own the bar for what is allowed to ship.
Apply

Agents

Applied Engineer, Agents

Build software that enrolls, files appeals, and follows up for a member, with a full record of everything it did and why.
  • Design the agent loop: tool use, action planning, human-in-the-loop checkpoints.
  • Route every action through the governance layer before it executes, and log it.
  • Turn a denied claim into a filed appeal in minutes, end to end.
  • Own reliability where a mistake touches someone's coverage and their paycheck.
Apply

The Keel screen

Three problems. Each is a day in one of these roles.

No trivia. Real benefits, the way they arrive: priced to confuse, written to contradict, easy to get wrong. Work each one. You only see the next after you clear the last.

1 · Grounding 2 · Evals 3 · Judgment 4 · Prove or break
Every visit draws a fresh set, so there is no answer to look up or pass along.
Three plans, identical coverage, different prices. One can never be the cheapest, no matter how the year goes. Find it.

Don't see your exact role?

If you can do the work on this page, write us. Tell us what you would build.