The right plan for this exact life

Open enrollment hands an ordinary person a genuinely hard problem: dozens of plan combinations, a year of their own health they cannot yet see, and a vocabulary most insured adults cannot define. Then it gives them a deadline.

There is an objection worth granting up front, because it sounds right: people barely try. In Voya's 2024 survey, 91 percent of working Americans said they typically re-select last year's plan, and 49 percent spend under twenty minutes on the whole review, less time than many people give a restaurant booking. So maybe the fix is effort, or a sterner reminder from HR. I believed a version of that until I read the choice studies, which tested effort and menu length directly and found that neither moves the result much. Comprehension moves it. Plan choice is an optimization with more variables than a person can hold in their head, and we hand it, without tools, to the one party who has never seen the math. This paper walks through what that costs, who pays it, why the popular fixes backfire, and what a system built for the job has to look like.

A Keel Labs paper, written by Garrett · Figures are point-in-time and directional · Every number traces to a source at the end

The mistake

A pay cut you can sign up for

The cleanest evidence in benefits research comes from one large US firm that let its 23,894 employees compose their own coverage: pick a deductible, pick a coinsurance rate, pick a copay tier, pick an out-of-pocket maximum. Forty-eight combinations. Every combination carried the same insurer, the same network, the same covered services. The only thing an employee could change was the financial plumbing, which gave the menu a property almost no real menu has: every choice could be scored objectively against every other choice, for every possible amount of care. Bhargava, Loewenstein and Sydnor published the score in the Quarterly Journal of Economics as Choose to Lose.

The pricing did something strange. Cutting the deductible from $1,000 to $750 cost an average of $528 in extra annual premium. The most that $250 of extra deductible coverage can ever return is $250. Anyone buying that upgrade accepted a loss of at least $278 a year before seeing a single doctor, and the rest of the low-deductible menu was priced the same way. In the language of economics, nearly every low-deductible plan was dominated: more expensive at every possible level of health spending, sick or healthy, lucky or not.

Figure 2 · The arithmetic of a dominated plan

One cell of the studied menu: $528 of premium goes out to buy a deductible $250 lower. Watch the return lane stop at $250. The loss is locked in before any care happens. Bhargava, Loewenstein and Sydnor, QJE 2017.
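The arithmetic of that one cell can be written out in a few lines. This is a minimal sketch, not the authors' code: the function names are ours, and it assumes the two plans differ only in deductible and premium, with cost-sharing above the deductible ignored. Under that assumption the $250 figure is the most the upgrade can return.

```python
def guaranteed_loss(extra_premium: float, deductible_reduction: float) -> float:
    """Smallest possible net loss from buying the lower deductible.

    A lower deductible can never pay back more than the reduction itself,
    so the best case is extra_premium - deductible_reduction.
    """
    return extra_premium - deductible_reduction


def net_loss_at(spending: float, high_ded: float, low_ded: float,
                extra_premium: float) -> float:
    """Net loss from the low-deductible plan at one level of health spending.

    Assumption: below the deductible the member pays every dollar, and
    everything else about the two plans is identical.
    """
    saved = min(spending, high_ded) - min(spending, low_ded)
    return extra_premium - saved


# The cell quoted in the text: $528 of premium buys a deductible $250 lower.
EXTRA_PREMIUM = 528
HIGH_DEDUCTIBLE, LOW_DEDUCTIBLE = 1_000, 750

floor = guaranteed_loss(EXTRA_PREMIUM, HIGH_DEDUCTIBLE - LOW_DEDUCTIBLE)

# Loss at every spending level from $0 to $20,000 in $50 steps.
losses = [
    net_loss_at(s, HIGH_DEDUCTIBLE, LOW_DEDUCTIBLE, EXTRA_PREMIUM)
    for s in range(0, 20_001, 50)
]
```

The loss starts at $528 for someone who uses no care and never falls below $278, however sick the year turns out. That floor is what "dominated" means here.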

61 percent of employees picked a dominated plan anyway. Adjust for the tax treatment of premiums and the figure is still 55 percent. The average employee in a dominated plan overpaid by $372 a year, which is 24 percent of the premium they chose and about 2 percent of salary. Picture a 2 percent pay cut, self-renewing every autumn, exchanged for nothing.

61%

of 23,894 employees chose a plan that was worse at every possible level of health spending. Average overspend: $372 a year, 24 percent of the chosen premium, roughly 2 percent of salary.

And it ran downhill. Employees in the firm's three lowest salary bands, two-thirds of the workforce, chose dominated plans at 63 percent, against 38 percent for everyone above them. In the full regression, an employee earning under $20,000 was 24 percentage points more likely to choose a dominated plan than one earning over $100,000, all else equal, and because the dollar loss was similar up and down the pay scale, the loss as a share of salary was steeply regressive. The next year, lower earners were also less likely to switch out: 20 percent switched plans, against 28 percent of higher earners. The most regressive tax in benefits is collected through a dropdown menu.

The diagnosis

Effort was never the variable

Three explanations sound plausible, and the authors tested all three. Maybe 48 options is too many. Maybe people will not spend the time. Maybe some people knowingly pay extra to escape a big deductible, because peace of mind is worth something. So the authors ran experiments with the menu cut to four plans in a single table, differing only in deductible and premium, where search cost was close to zero. 66 percent of subjects still chose a dominated plan, a rate above the employees' own. Raising the stakes did not repair it. And the field choices were too internally inconsistent to read as a taste for protection: many employees paid heavily to trim the deductible while accepting cost-sharing elsewhere that handed the exposure right back.

What did predict the error was comprehension, measured directly. Among experiment participants who scored high on a basic insurance literacy test, 22 percent chose dominated plans; among those who scored low, 45 percent. Score people instead on whether they understood a plan well enough to estimate what care would cost under it, and the gap widens to 8 percent against 47. Among participants who scored high on every measure, dominated choice nearly vanished. Dominated choice behaves like a reading problem: it disappears once people can read the plan.

Now look at how the population reads. When Loewenstein's group quizzed insured adults on the four input variables of every plan comparison, 78 percent understood a deductible, 72 percent a copay, 55 percent an out-of-pocket maximum, and 34 percent coinsurance. Fourteen percent understood all four. Asked to compute the cost of a four-day hospital stay under a simple plan, 11 percent got it right, and the misses ran to thousands of dollars. The overconfidence is the sharp edge: 57 percent said they understood coinsurance before 34 percent demonstrated it. KFF found the same shape in a national sample: 4 percent answered all ten of its basic insurance questions, and 16 percent could compute an out-of-network lab bill that required applying a coinsurance rate. These are the variables of the optimization. Most of the people asked to optimize cannot read them, and do not know they cannot.

Figure 3 · The input variables, as read by the public

Each strip is 100 insured adults; filled cells understood the term. Coinsurance, the least understood input, is the one modern plan design leans on hardest. Loewenstein et al., Journal of Health Economics 2013.
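To make the four inputs concrete, here is a minimal sketch of how they combine into what a member pays for care in a year. The plan and the hospital bills are hypothetical, not the quiz items from the study, and the model is simplified: it assumes copays count toward the out-of-pocket maximum but not toward the deductible, and it leaves premiums out. Real plan documents vary on all of these points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    deductible: float   # member pays 100% of bills up to this amount
    coinsurance: float  # member's share of bills above the deductible (0.20 = 20%)
    copay: float        # flat fee per office visit
    oop_max: float      # yearly cap on the member's total cost-sharing


def member_cost(plan: Plan, billed: float, visits: int = 0) -> float:
    """Member's cost-sharing for the year under the simplified model above."""
    below = min(billed, plan.deductible)          # paid in full by the member
    above = max(billed - plan.deductible, 0.0)    # split by the coinsurance rate
    uncapped = below + plan.coinsurance * above + plan.copay * visits
    return min(uncapped, plan.oop_max)            # the cap binds last


# Hypothetical plan and bills, for illustration only.
example = Plan(deductible=1_500, coinsurance=0.20, copay=30, oop_max=5_000)

stay = member_cost(example, billed=10_000)      # 1,500 + 20% of 8,500
big_stay = member_cost(example, billed=40_000)  # stops at the out-of-pocket maximum
```

Coinsurance is the only one of the four that requires a multiplication against a bill the member has not seen yet, which may be part of why it is the least understood.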

34%

of insured adults could identify what coinsurance means, against 57 percent who were sure they knew. The least understood term in the plan is the one doing the most financial work.

The pattern

Not one firm, and it does not heal

One firm could be one badly priced menu. It is not. Liu and Sydnor took national data on what employers actually offer and found that where a firm pairs a high-deductible plan with a lower-deductible plan, the high-deductible option carries lower maximum possible spending 62 percent of the time, and strictly dominates its sibling at roughly half of firms, with typical savings above $500 a year. Sinaiko and Hirth found a University of Michigan plan that was dominated outright; a third of covered workers sat in it. The trap in Choose to Lose is the normal architecture of American employer menus, which is why we gave it a paper of its own.

The natural rebuttal is that people learn. Medicare Part D is the longest-running test of that hope, and the hope failed. Abaluck and Gruber found that only about 12 percent of seniors chose the plan that minimized their drug costs, that the average enrollee could have cut total spending by about 30 percent, and that welfare would have been roughly 27 percent higher under fully rational choice. Their follow-up tracked the same market through 2010 and found forgone savings growing, with little learning detectable at the individual or cohort level. Honesty requires the footnote: Ketcham, Kuminoff and Powers published a formal Comment disputing the welfare framing, arguing most choices can be reconciled with consumer theory under full information, and Abaluck and Gruber replied. We rest nothing on the contested welfare magnitudes. The descriptive facts, few enrollees minimize cost and the gap did not close with experience, survived the exchange.

Meanwhile the wrong choice, once made, hardens. Handel measured inertia at a large employer and found workers behaving as if forgoing $2,032 a year to avoid switching plans. Ericson showed insurers pricing against exactly that: Part D carriers enter markets cheap and ratchet premiums on locked-in enrollees, so an older plan runs about 10 percent pricier than an identical newer one. Set those findings beside Voya's 91 percent re-selection rate and the picture closes. A wrong choice made once stings. Re-stamped unexamined every autumn, it compounds into an annuity paid out against the employee, and the supply side prices the annuity in.

Figure 4 · The default

Year after year, the same plan gets re-stamped without a second look. Watch the check land before any comparison happens. Voya Financial survey, October 2024.

$2,032.

What the average employee in Handel's study behaved as if they were willing to forgo, per year, to avoid re-deciding. The market prices plans against that reluctance.

The trap

Why the obvious fixes point at each other

By this point two fixes look obvious: shorten the menu, and nudge people out of their defaults. Both have been tested, and the results should slow down anyone selling either one alone. Shortening the menu was the four-plan experiment: 66 percent still chose dominated plans, because the comparison itself, at any menu length, exceeded the reader. Nudging is worse. Handel modeled a policy that eliminates three-quarters of measured inertia. Individual choices improve, exactly as intended. Then the healthy re-sort themselves into cheaper plans, the sick are left concentrated in expensive ones, and adverse selection worsens enough to roughly double the existing 8.2 percent welfare loss in his setting. The inertia everyone wants to cure was accidentally holding the risk pool together.

Take both findings seriously and the space of honest fixes gets narrow. A blanket nudge moves everyone the same direction and unravels the pool. A simplified menu still asks a person to run a comparison they cannot read. What survives is harder and more specific: solve the comparison for one person at a time, correctly, while watching what the solutions do to the pool in aggregate. That is an engineering claim, and it is the claim this paper exists to make.

The menu is priced against the mistake. The mistake is measurable. So is its repair.

The shape of the problem

This is a job for a model, and we can say why

Choosing a plan is a per-person decision, made under uncertainty, repeated every year, with a measurable outcome. In machine learning that shape has a name: it is close to a contextual bandit. Each recommendation is conditioned on a context vector for one specific person, their family, their conditions, their budget, their tolerance for downside, and the system learns from how each recommended choice turns out. The same problem shape drives the recommendations people already trust for trivial decisions. Here it is pointed at one that costs real money.
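The shape described above can be sketched in its simplest form: an epsilon-greedy contextual bandit that keeps one running cost estimate per context and plan. This is a toy under stated assumptions, not a description of any production system. The class name, the context tuple, the plan labels and the dollar figures are all hypothetical, contexts are treated as discrete buckets, and random exploration as written here would not be acceptable on real enrollees without much more care.

```python
import random
from collections import defaultdict


class EpsilonGreedyPlanBandit:
    """Toy contextual bandit: one running cost estimate per (context, plan)."""

    def __init__(self, plans, epsilon=0.1, seed=0):
        self.plans = list(plans)
        self.epsilon = epsilon               # share of recommendations spent exploring
        self.rng = random.Random(seed)       # seeded so runs are repeatable
        self.count = defaultdict(int)        # (context, plan) -> outcomes observed
        self.mean_cost = defaultdict(float)  # (context, plan) -> mean realized cost

    def recommend(self, context):
        # Try every plan once per context before trusting any estimate.
        for plan in self.plans:
            if self.count.get((context, plan), 0) == 0:
                return plan
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.plans)  # explore
        # Exploit: lowest average realized cost seen so far for this context.
        return min(self.plans, key=lambda p: self.mean_cost[(context, p)])

    def update(self, context, plan, realized_cost):
        # Incremental mean; the label is what the year actually cost.
        key = (context, plan)
        self.count[key] += 1
        self.mean_cost[key] += (realized_cost - self.mean_cost[key]) / self.count[key]


# Hypothetical usage: epsilon=0.0 turns exploration off so the result is deterministic.
bandit = EpsilonGreedyPlanBandit(["high_deductible", "low_deductible"], epsilon=0.0)
family = ("family_of_four", "managed_chronic_condition")
bandit.update(family, "high_deductible", 7_400.0)
bandit.update(family, "low_deductible", 5_100.0)
choice = bandit.recommend(family)  # the plan with the lower observed cost
```

The point of the sketch is the interface rather than the algorithm: a recommendation conditioned on a context, and an update driven by realized annual cost instead of a click.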

Be precise about where the model earns its keep, because part of this job needs no model at all. Screening out dominated plans is arithmetic; a checklist does it, and any enrollment tool that fails to is malpractice. The model matters on the plans that survive the screen, where the right answer genuinely depends on the life: a family expecting a second child, a managed chronic condition, a thin emergency fund that makes a high deductible a different proposition than it is for a saver. The key shift is the objective. A generic benefits wizard optimizes for getting you to the end of the form, and it succeeds; the form gets finished, the dominated plan gets chosen, everyone moves on. The right objective is to minimize a person's realized cost across the whole year, including the care they will probably need and the tail risk they cannot afford to carry. Optimize the year, not the click.
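One way to write "optimize the year" as an objective is expected annual cost plus a penalty for the out-of-pocket spending a household could not cover from savings. The sketch below assumes that form; the plans, the three spending scenarios, the probabilities and the risk weight are hypothetical, and a real system would use a far richer forecast of likely care.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    premium: float      # annual employee premium
    deductible: float
    coinsurance: float  # member's share above the deductible
    oop_max: float

    def out_of_pocket(self, spending: float) -> float:
        share = (min(spending, self.deductible)
                 + self.coinsurance * max(spending - self.deductible, 0.0))
        return min(share, self.oop_max)


def score(plan, scenarios, emergency_fund, risk_weight):
    """Expected annual cost plus a weighted expected shortfall.

    scenarios: list of (probability, total health spending) pairs.
    Shortfall is the out-of-pocket cost the emergency fund cannot absorb.
    """
    expected_cost = 0.0
    expected_shortfall = 0.0
    for probability, spending in scenarios:
        oop = plan.out_of_pocket(spending)
        expected_cost += probability * (plan.premium + oop)
        expected_shortfall += probability * max(oop - emergency_fund, 0.0)
    return expected_cost + risk_weight * expected_shortfall


def best_plan(plans, scenarios, emergency_fund, risk_weight=2.0):
    return min(plans, key=lambda p: score(p, scenarios, emergency_fund, risk_weight))


# Hypothetical menu and forecast: a quiet year, a rough year, a catastrophic year.
MENU = [
    Plan("high_deductible", premium=1_200, deductible=3_000, coinsurance=0.20, oop_max=6_000),
    Plan("low_deductible", premium=3_000, deductible=500, coinsurance=0.20, oop_max=3_000),
]
SCENARIOS = [(0.70, 500), (0.25, 5_000), (0.05, 40_000)]

saver_pick = best_plan(MENU, SCENARIOS, emergency_fund=10_000)   # can absorb any year
thin_fund_pick = best_plan(MENU, SCENARIOS, emergency_fund=1_000)
```

With these made-up numbers the saver is matched to the high-deductible plan and the household with a thin emergency fund to the low-deductible one. The menu and the health forecast are identical in both cases; only the life changed.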

Figure 5 · Profile to fit

The signals of a real life assemble into a context, and the context matches to the plan that protects it best. Change the life, change the match.

A person cannot brute-force forty-eight plan combinations against their own uncertain future. A model can, and it can show its work while it does.

The recursion

The recommender that grades itself on real outcomes

Here is the part that compounds, and the reason this sits in a research paper rather than a product brochure. Every recommendation produces an outcome: a plan was chosen, the year happened, the costs arrived. Those outcomes are labels. Labeled outcomes train a better recommender, which produces better outcomes, which produce better labels. The loop closes on itself, and unlike engagement metrics, it cannot be gamed without getting caught by reality, because the label is what the year actually cost.

a16z has argued, correctly in our view, that raw data volume makes a weak moat; rows are a commodity and scale effects flatten quickly. The asset that is hard to replicate is outcome-labeled benefits data: the record of what was chosen, what it actually cost, and whether it held when someone got sick. We will say plainly that this is an argument we hold rather than a settled finding, and we intend to publish the evidence as our own numbers accumulate.

The limits

Where this can go wrong, and what would change our mind

Personalization can overfit to noisy signals, and it can quietly encode bias against the very people Choose to Lose found erring most. So a recommendation a person cannot understand is a recommendation we will not ship. Every suggestion has to be explainable to the human receiving it and checkable against the plan documents themselves, and the person who disagrees keeps the last word.

Three findings would make us retract this paper's framing, and we are watching for all three. If recommendations tuned to individual context cannot beat the dumb rule, screen out dominated plans and take the cheapest survivor, then the optimization story is overbuilt and a checklist would have done. If the error rate in our own system tracks income the way dominated choices did, we will have automated the original injustice instead of repairing it. And if our recommendations re-sort risk the way Handel's modeled nudge did, improving each choice while degrading the pool that prices everyone's coverage, the welfare claim fails even where the individual claim holds. Those are the three numbers we intend to publish, whichever way they come out.
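The dumb rule in the first test is simple enough to write down, which is what makes it a fair benchmark. A minimal sketch follows, with assumptions labeled: plans are represented by their total annual cost (premium plus out-of-pocket) at a shared grid of spending levels, dominance is checked only at those grid points, and "cheapest" is read as lowest premium. The cost table is hypothetical, loosely modeled on the deductible-only cell discussed earlier.

```python
def is_dominated(name, cost_table):
    """True if some other plan costs strictly less at every spending level."""
    mine = cost_table[name]
    return any(
        all(theirs < ours for theirs, ours in zip(other, mine))
        for other_name, other in cost_table.items()
        if other_name != name
    )


def baseline_pick(cost_table, premiums):
    """The dumb rule: drop dominated plans, then take the lowest premium."""
    survivors = [name for name in cost_table if not is_dominated(name, cost_table)]
    return min(survivors, key=lambda name: premiums[name])


# Hypothetical menu. Columns are total annual cost at health spending of
# $0, $1,000, $5,000 and $20,000, assuming the member pays every dollar
# up to the deductible and nothing after it.
PREMIUMS = {"deductible_1000": 1_000, "deductible_750": 1_528, "deductible_0": 1_900}
COST_TABLE = {
    "deductible_1000": [1_000, 2_000, 2_000, 2_000],
    "deductible_750":  [1_528, 2_278, 2_278, 2_278],
    "deductible_0":    [1_900, 1_900, 1_900, 1_900],
}

flagged = [name for name in COST_TABLE if is_dominated(name, COST_TABLE)]
pick = baseline_pick(COST_TABLE, PREMIUMS)
```

Any personalized recommender would need to beat `baseline_pick` on realized annual cost, across the same members and the same menus, for the optimization story to hold.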

What Keel Labs is building

An enrollment engine that designs the plan around one life, and gets better every time it learns how the year turned out.

This sits directly on top of the rest of our research, so the parts connect cleanly. The relay paper showed why no one can see the whole. The no-price paper showed how to hold a person's plan, needs, and real costs in one place. Personalized enrollment is what you do once you can: stop handing people a menu and start designing the choice around their life. Here is how it assembles.

Screen the menu before anyone chooses

Dominated options are arithmetic, and at roughly half of US firms they are sitting on the menu. Keel flags them first, so no member ever starts from a trapped choice set.

Build the context from a real life

Family, conditions, budget, risk tolerance, and likely care become a structured per-person context, drawn from the benefits graph, never a guess from a generic profile.

Optimize the year, not the click

Every surviving plan is scored against the person's predicted annual cost and downside risk, instead of nudging them to finish a form. The objective is their outcome, full stop.

Close the outcome loop

What was chosen, what it cost, and whether it protected the member becomes labeled data. That data trains the next recommendation. The system grades itself on reality, not on engagement.

Fathom and Amanda put the answer in reach

Fathom makes the recommendation, grounded in the plan documents, and explains in plain terms why this plan beats the default. Amanda walks one person through it at the moment they decide, and recovers the $372 they were about to overspend.

The mistake in Choose to Lose was a hard optimization handed to the one party least equipped to solve it. We are moving that work to a system built for it, publishing the three numbers that would falsify us, and keeping the person in charge of the answer.

End the default, and you end the most expensive recurring mistake in benefits, one person and one year at a time.

What this paper does not claim

We are not claiming a model can predict any one person's year. Individual utilization is hard to forecast, and an honest system reasons about likelihoods and downside risk rather than pretending to certainty. The contextual-bandit framing is a model of the problem, not a finished product, and the outcome-labeled-data moat is an argument we hold, not a proven result.

On the evidence: the Choose to Lose numbers come from a single large firm and may not generalize in magnitude; 61 percent is the nominal dominated-choice rate and 55 percent the tax-adjusted one; the $528-for-$250 example is one cell of that menu; and the income gradient quoted here, 63 percent against 38, is the paper's in-text comparison of the three lowest salary bands against the rest, with the 24-point figure from its regression of the under-$20,000 band against the over-$100,000 band. Popular summaries quote a wider 70-versus-30 spread read off the paper's figure; we use the printed numbers. The Part D welfare estimates are formally disputed by Ketcham, Kuminoff and Powers, which is why we rely only on the descriptive findings. Handel's $2,032 is a structural estimate of behavior consistent with inertia, not a measured payment, and some of it reflects real switching costs rather than pure error. The Voya figures are self-reported answers to one benefits vendor's survey; we use them for direction. The Loewenstein comprehension sample was 202 insured adults. The savings we cite are the recovery of documented gaps, not a new promise.

Sources

Bhargava, Loewenstein and Sydnor, "Choose to Lose: Health Plan Choices from a Menu with Dominated Options," Quarterly Journal of Economics 132(3), 2017 (23,894 employees; 48 options; 61% nominal and 55% tax-adjusted dominated choice; $372/yr average excess; 24% of chosen premium; 2% of salary; $528 premium for $250 of deductible reduction; 63% vs 38% by salary band; under-$20K band +24 points vs $100K+; next-year switching 20% vs 28%; four-plan experiment 66%; dominated choice 45% vs 22% by literacy and 47% vs 8% by plan comprehension)

Loewenstein et al., "Consumers' Misunderstanding of Health Insurance," Journal of Health Economics 32(5), 2013 (n=202; deductible 78%, copay 72%, OOP maximum 55%, coinsurance 34%; all four 14%; hospital-stay computation 11%; 57% self-reported coinsurance understanding)

KFF, "Assessing Americans' Familiarity with Health Insurance Terms and Concepts," Nov 2014 (n=1,292; 4% answered all ten; 16% computed the out-of-network lab cost)

Liu and Sydnor, "Dominated Options in Health Insurance Plans," AEJ: Economic Policy 14(1), 2022 (62% lower maximum spending; dominance at roughly half of firms; typical savings >$500/yr)

Sinaiko and Hirth, "Consumers, Health Insurance and Dominated Choices," Journal of Health Economics 30(2), 2011 (one-third of workers enrolled in the dominated plan)

Abaluck and Gruber, "Choice Inconsistencies Among the Elderly," American Economic Review 101(4), 2011 (~12% chose the cost-minimizing plan; ~30% potential savings; welfare ~27% higher under rational choice) and "Evolving Choice Inconsistencies in Choice of Prescription Drug Insurance," AER 106(8), 2016 (forgone savings grew 2006 to 2010; little learning); Ketcham, Kuminoff and Powers, AER Comment, 2016, and Abaluck and Gruber, Reply, 2017 (the welfare dispute noted in the endnote)

Handel, "Adverse Selection and Inertia in Health Insurance Markets: When Nudging Hurts," American Economic Review 103(7), 2013 (inertia $2,032/yr, population SD $446; a three-quarters reduction in inertia roughly doubles the 8.2% welfare loss from adverse selection)

Ericson, "Consumer Inertia and Firm Pricing in the Medicare Part D Prescription Drug Insurance Exchange," AEJ: Economic Policy 6(1), 2014 (older plans ~10% pricier than comparable new entrants)

Voya Financial Consumer Insights & Research, open enrollment survey, October 2024 (91% typically re-select the prior plan; 49% spend under 20 minutes)

Casado and Lauten, "The Empty Promise of Data Moats," a16z, 2019.

Figures are point-in-time and directional.

Keep reading: the fix is not only a better chooser. At half of US firms, the menu itself is broken.