Why purchase-accounting questions break finance LLMs

Ask a capable model to allocate the purchase price in an acquisition and it will do something that looks almost right. The fixed assets get written up, the intangibles get named, the residual lands in goodwill. And then, very often, the balance sheet is wrong by exactly the amount of one line nobody asked about: the deferred tax liability on the step-up.

This is the single most reliable structural failure I see in the M&A cluster, and it is worth being precise about why. It is not an arithmetic slip. The model can multiply. It is a synthesis failure — the kind that appears when the correct answer to one part of a question depends on having already answered a different part, in a different section of the standard, correctly.

The mechanism

When one company acquires another in a transaction that is treated as a business combination for accounting but gives no corresponding step-up in tax base, the acquirer revalues the target's identifiable assets to fair value on its opening balance sheet. The tax authorities, however, keep carrying those assets at their old tax base. The book value is now higher than the tax base, and that difference is a taxable temporary difference: when the stepped-up asset is eventually depreciated or sold, the tax deduction will be smaller than the book charge, so more tax will be paid than the book numbers alone suggest.

Accounting recognises that future tax cost today, as a deferred tax liability equal to the step-up multiplied by the tax rate. Crucially, that liability is itself part of the purchase price allocation. It reduces the net identifiable assets acquired, which means it increases the residual: the goodwill.

Illustrative: €100 step-up, 25% rate

Asset step-up to fair value: +100
Deferred tax liability (25% × 100): +25
Net identifiable assets added: +75
Goodwill, versus a no-DTL allocation: +25 higher

Omit the deferred tax liability and two things happen at once. Net identifiable assets are overstated by the tax amount, because they rise by the full step-up rather than the step-up net of tax, and goodwill is understated by the same amount. The plug is wrong, and because goodwill is the plug, the error hides in the one number a reader is least likely to recompute.
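A minimal sketch of the two allocations side by side may make the plug effect concrete. The €100 step-up and 25% rate come from the illustration above; the consideration of 500 and book net assets of 300 are hypothetical figures chosen only so the arithmetic has something to land on. The sketch also assumes the tax base is unchanged, that there are no other temporary differences, and that no deferred tax is recognised on goodwill itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    net_identifiable_assets: float
    deferred_tax_liability: float
    goodwill: float


def allocate(consideration, book_net_assets, step_up, tax_rate, recognise_dtl=True):
    """Allocate the purchase price; goodwill is the residual."""
    # Assumption: the tax base is unchanged, so the whole step-up is a
    # taxable temporary difference.
    dtl = step_up * tax_rate if recognise_dtl else 0.0
    # The DTL is a liability assumed, so it reduces net identifiable assets.
    net_identifiable = book_net_assets + step_up - dtl
    goodwill = consideration - net_identifiable
    return Allocation(net_identifiable, dtl, goodwill)


# Hypothetical deal: 500 paid for a target with 300 of book net assets.
with_dtl = allocate(500.0, 300.0, 100.0, 0.25)
without_dtl = allocate(500.0, 300.0, 100.0, 0.25, recognise_dtl=False)

print(with_dtl)
print(without_dtl)
print("Goodwill understated by:", with_dtl.goodwill - without_dtl.goodwill)
```

Both allocations sum to the consideration paid. Only the split between identifiable net assets and goodwill differs, which is why the omission is easy to miss.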

Why models miss it specifically

A model trained on a great deal of financial writing has seen purchase price allocations described many times. It has also seen deferred tax explained many times. What it has seen far less often is the two in the same breath: the explicit statement that the DTL on a non-deductible step-up feeds back into the goodwill calculation. The two pieces of knowledge sit in different parts of the model's training data, the same way they sit in different chapters of the textbook.

The failure is not ignorance of either rule. It is the absence of the bridge between them.

This is what makes purchase accounting such a clean diagnostic. A single-lookup question — "what is the corporate tax rate in this jurisdiction" — tells you almost nothing about whether a model is safe to deploy on deal work. A PPA question tells you a great deal, because it cannot be answered correctly by retrieval. It requires the model to hold the allocation open while it reasons about tax, recognise that the tax conclusion changes the allocation, and then close the loop. Most models do not close the loop. They answer each part well and never notice that the parts interact.

What it distorts downstream

The opening balance sheet is the foundation of everything that follows a deal. Goodwill drives the impairment test and the headroom against it. The asset step-up drives the incremental depreciation and amortization that depress post-deal earnings. Pro forma equity feeds the accretion–dilution bridge. An error in the opening allocation is not contained; it is inherited by every period and every schedule built on top of it. A model that gets the allocation wrong will produce a fully internally consistent five-year model that is wrong from year one — and internal consistency is exactly what makes the error survive review.
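One way to see the inheritance is to roll the step-up forward. The sketch below is illustrative only: it assumes the €100 step-up is amortised straight-line over a hypothetical five years, that the charge is not tax-deductible, and that the 25% rate never changes. When the DTL was recognised at the opening, it unwinds each year as a deferred tax credit; when it was omitted, there is nothing to unwind.

```python
def step_up_schedule(step_up, tax_rate, years, recognise_dtl=True):
    """Yearly book effect of amortising a non-deductible step-up."""
    annual_charge = step_up / years
    dtl = step_up * tax_rate if recognise_dtl else 0.0
    rows = []
    for year in range(1, years + 1):
        # The DTL reverses in line with the amortisation of the step-up.
        release = annual_charge * tax_rate if recognise_dtl else 0.0
        dtl -= release
        rows.append({
            "year": year,
            "pre_tax_charge": -annual_charge,
            "deferred_tax_credit": release,
            "net_income_effect": -annual_charge + release,
            "closing_dtl": dtl,
        })
    return rows


# Hypothetical five-year life; step-up and rate as in the illustration.
correct = step_up_schedule(100.0, 0.25, 5)
flawed = step_up_schedule(100.0, 0.25, 5, recognise_dtl=False)

for good, bad in zip(correct, flawed):
    print(good["year"], good["net_income_effect"], bad["net_income_effect"])
```

Under these assumptions every year of the flawed schedule carries a different net income effect from the correct one, and neither schedule contains an internal inconsistency that would flag the problem.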

The grading point

You cannot catch this with a correctness score on the final number, because there is rarely a single final number — there is a balance sheet, and it balances. It balances incorrectly. Catching it requires a grader who reads the allocation the way a senior practitioner reads it: not "is the total right" but "is the goodwill the residual after the right set of items, recognised in the right order." That is a structural call, and it is precisely the call an LLM-as-judge at temperature zero is not positioned to make. It is the reason the diagnostic grades structure, and the reason a model can pass a leaderboard on finance and still misallocate the first deal you put in front of it.
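To make the contrast between "does it balance" and "is the residual built from the right items" concrete, here is a deliberately narrow sketch. It is not the diagnostic described here; the field names and both functions are hypothetical, and it encodes only the single step-up rule from the illustration, under the same assumptions (unchanged tax base, no other temporary differences). The submission shown is the no-DTL allocation with a hypothetical consideration of 500 and book net assets of 300.

```python
def balances(sub, tolerance=0.01):
    """True if identifiable net assets plus goodwill equal the consideration."""
    net_identifiable = (
        sub["book_net_assets"]
        + sub["step_up"]
        - sub.get("deferred_tax_liability", 0.0)
    )
    return abs(net_identifiable + sub["goodwill"] - sub["consideration"]) <= tolerance


def structural_findings(sub, tax_rate, tolerance=0.01):
    """List structural problems in a submitted allocation (one rule only)."""
    findings = []
    expected_dtl = sub["step_up"] * tax_rate
    recorded_dtl = sub.get("deferred_tax_liability", 0.0)
    if abs(recorded_dtl - expected_dtl) > tolerance:
        findings.append(
            f"DTL on step-up is {recorded_dtl}, expected {expected_dtl}"
        )
    # Goodwill must be the residual after the DTL, not before it.
    expected_goodwill = sub["consideration"] - (
        sub["book_net_assets"] + sub["step_up"] - expected_dtl
    )
    if abs(sub["goodwill"] - expected_goodwill) > tolerance:
        findings.append(
            f"Goodwill is {sub['goodwill']}, expected {expected_goodwill}"
        )
    return findings


# Hypothetical model output that omits the DTL entirely.
submission = {
    "consideration": 500.0,
    "book_net_assets": 300.0,
    "step_up": 100.0,
    "goodwill": 100.0,
}

print(balances(submission))
print(structural_findings(submission, tax_rate=0.25))
```

The submission passes the balance check and still produces two findings. A hard-coded rule like this only catches the pattern it was written for, which is a long way short of a practitioner reading the whole allocation.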

— Written from practice. The example figures are illustrative, not drawn from any engagement.
