
Estimate Extraction: PDF to ESX Pipeline

Convert insurance estimate PDFs into Xactimate ESX files. Walk through the 6-phase matching funnel, OCR detection, the math validator, and how the corrections loop trains future matches.

Intermediate | 6 min read | Updated 2026-05-23
All roles

You'll learn

  • The high-level pipeline that turns a PDF estimate into an ESX file
  • The 6 matching phases and which ones are free vs. paid (in compute cost)
  • When the "Place under Miscellaneous" escape hatch is the right call
  • How the math validator catches extraction errors before they hit your ESX
  • How correcting a match teaches the system for future estimates

Why this matters

The carrier sent you their estimate as a PDF. Sometimes it is a clean Xactimate export, sometimes it is a scan of a printout that has been faxed twice. Either way, you need it in Xactimate format so you can do your line-item analysis, compare scopes, and write your own position.

Without extraction, you re-type every line item by hand. Item description, quantity, unit price, RCV, depreciation, ACV, all of it. A typical roof claim has 40 to 80 line items, and the math has to reconcile when you are done. Three hours per estimate is normal. Errors creep in. Evenings disappear. You start cutting corners on the analysis just to get the file out the door.

This is most useful if you do line-item review, which most appraisers eventually do. Carrier appraisers feel the pain most because they almost always work from the carrier's existing estimate as the starting point. Insured appraisers benefit too once they have a carrier scope to push against.

AwardLettr can take an insurance estimate PDF, whether it is a native Xactimate export or a carrier-formatted spreadsheet, and convert it into a Xactimate ESX file you can import. The hard part is matching free-text line items ("R&R asphalt shingles, laminated, high grade") to the right Xactimate catalog code (RFG ASPHS H). That is what the matching funnel exists for.

The pipeline at a glance

1. PDF upload: You upload the estimate PDF
2. OCR detection: Text PDF vs. scanned PDF detection
3. Extraction: Claude Sonnet pulls structured line items from clean text
4. 6-phase matching: Each line item runs through the funnel
5. Math validator: RCV/ACV sums sanity-checked
6. ESX generation: Download the ESX and import into Xactimate

OCR: text PDF vs. scanned PDF

Most carrier estimates (about 60%) are text-based PDFs. AwardLettr extracts the text directly with a lightweight parser. The other 40% are scanned images (faxed estimates, photocopies of mailed packets) and need OCR. For those, AwardLettr runs AWS Textract to recover the text along with confidence scores.
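One common way to tell the two kinds of PDF apart is to look at how much text a direct parser recovers from each page. The sketch below assumes that heuristic. The function name `needs_ocr`, the 50-character threshold, and the "more than half the pages" rule are illustrative assumptions, not documented AwardLettr behavior.

```python
from typing import Sequence

# Hypothetical threshold: a page with fewer non-whitespace characters than
# this is treated as image-only. Not a documented AwardLettr value.
MIN_CHARS_PER_PAGE = 50


def needs_ocr(page_texts: Sequence[str], min_chars: int = MIN_CHARS_PER_PAGE) -> bool:
    """Return True when most pages yield too little text to be a text PDF.

    `page_texts` holds whatever a direct text parser recovered for each page
    (an empty string for a page that is only a scanned image).
    """
    if not page_texts:
        return True  # nothing extractable at all, so fall back to OCR
    sparse_pages = sum(
        1 for text in page_texts if len("".join(text.split())) < min_chars
    )
    # Route to OCR when more than half of the pages are sparse.
    return sparse_pages * 2 > len(page_texts)
```

Counting non-whitespace characters per page, rather than over the whole file, keeps a single text-bearing cover sheet from masking a packet that is otherwise all scans.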

Why the two paths

Once both paths have clean text, the extraction step uses Claude Sonnet on text tokens instead of vision tokens. That is dramatically cheaper than feeding the raw PDF as an image, and the extraction quality is the same as long as the OCR step produced good text.

The 6-phase matching funnel

For each line item Claude extracts, AwardLettr walks through six matching phases, numbered 0 through 5. Phases 0 through 2 are free lookups, and each phase after that costs more (in compute or LLM spend) than the one before. The funnel short-circuits as soon as one phase returns a high-confidence match, so most items get resolved cheaply in the early phases.

Phase | Source | Cost | What it does
0 | Global corrections | Free, instant | Waze-style community consensus mappings: corrections every workspace has agreed on.
1 | Workspace corrections | Free, instant | Mappings YOU have made previously. Your past corrections become future free matches.
2 | Normalized description | Free, instant | Deterministic string match against catalog descriptions and aliases.
3 | Category classifier | Cheap | Keyword-based TF-IDF narrows the search to 1-3 of the 85 categories.
4 | Vector search | Medium | pgvector + OpenAI embeddings within the classified categories only.
5 | LLM arbitration | Most expensive | Claude Haiku picks the best fit from a shortlist when the vector search is ambiguous.
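The short-circuit behavior can be sketched as a loop over phases ordered from cheapest to most expensive. This is a minimal illustration, not AwardLettr's implementation: the `Match` type, the `run_funnel` function, and the 0.9 confidence threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Match:
    code: str          # catalog code, e.g. "RFG ASPHS H"
    confidence: float  # 0.0 to 1.0
    phase: int         # which phase produced the match


# A phase takes a line-item description and returns a Match or None.
Phase = Callable[[str], Optional[Match]]


def run_funnel(
    description: str,
    phases: Sequence[Phase],
    threshold: float = 0.9,  # hypothetical cut-off for "high confidence"
) -> Optional[Match]:
    """Try phases in order, cheapest first, and stop at the first confident match."""
    for phase in phases:
        match = phase(description)
        if match is not None and match.confidence >= threshold:
            return match  # short-circuit: later, costlier phases never run
    # Assumption: with no confident match, the caller flags the item for review.
    return None
```

Because the loop returns on the first confident result, a line item that a free lookup already resolves never triggers an embedding query or an LLM call.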

The corrections loop trains the system

When you correct a match, the original description is added as an alias on the target catalog item. Future matches with the same description resolve in Phase 2 (free, instant) instead of going through vector search or LLM arbitration. Your corrections also flow back into the global corrections layer (Phase 0) over time, so the whole community benefits.
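To make the mechanism concrete, here is a small sketch of how a correction could become a free lookup. It assumes an in-memory dictionary and a simple normalization rule (lowercase, punctuation collapsed to spaces). The `normalize` function and `AliasIndex` class are hypothetical stand-ins, not AwardLettr's actual storage or normalization.

```python
import re
from typing import Dict, Optional


def normalize(description: str) -> str:
    """Lowercase, replace punctuation runs with a space, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9&]+", " ", description.lower())
    return " ".join(cleaned.split())


class AliasIndex:
    """Hypothetical in-memory stand-in for the Phase 2 alias lookup."""

    def __init__(self) -> None:
        self._aliases: Dict[str, str] = {}

    def record_correction(self, description: str, catalog_code: str) -> None:
        # The corrected description becomes an alias on the target item.
        self._aliases[normalize(description)] = catalog_code

    def lookup(self, description: str) -> Optional[str]:
        # Deterministic exact match on the normalized form; no model call.
        return self._aliases.get(normalize(description))


index = AliasIndex()
index.record_correction("R&R asphalt shingles, laminated, high grade", "RFG ASPHS H")
# A later estimate with different casing and punctuation hits the same alias.
code = index.lookup("r&r Asphalt Shingles - laminated, HIGH grade")
```

Normalizing on both the write and the read side is what lets small formatting differences between carriers resolve to the same alias.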

"Place under Miscellaneous": the escape hatch

Some line items genuinely do not have a clean Xactimate match. Custom labor entries, carrier-specific overhead lines, weird one-off charges. Rather than force a bad match, you can mark those items "Place under Miscellaneous" and AwardLettr drops them into the ESX as a USR (user-defined) entry in the Miscellaneous category with the original description preserved.

  • Use it when no catalog item is even close to right.
  • Do NOT use it as a shortcut for items that DO have a match but you have not corrected the wrong guess yet. Correcting the match teaches the system; "Place under Miscellaneous" does not.
  • The line item still keeps its RCV/ACV values, just under a USR/MISC placeholder code.

The math validator

After extraction, AwardLettr validates that the RCV and ACV totals reported in the PDF match the sum of the line items it extracted. If they do not match, something went wrong in extraction (a line was missed, or a number was misread).

  • Pre-tax subtotal must match the sum of line-item amounts.
  • O&P (overhead & profit) lines are accounted for separately and validated.
  • Depreciation deltas (RCV - ACV) are sanity-checked against per-line depreciation values.
  • If validation fails, you see a warning before ESX generation, with the discrepancy highlighted.
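The checks above can be sketched as follows. This simplified version covers only the RCV and ACV totals and the per-line depreciation check; it leaves out tax and O&P handling. The `LineItem` fields, the `validate_totals` function, and the one-cent tolerance are assumptions made for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Sequence


@dataclass(frozen=True)
class LineItem:
    description: str
    rcv: Decimal           # replacement cost value
    depreciation: Decimal
    acv: Decimal           # actual cash value


def validate_totals(
    items: Sequence[LineItem],
    reported_rcv: Decimal,
    reported_acv: Decimal,
    tolerance: Decimal = Decimal("0.01"),  # hypothetical rounding allowance
) -> List[str]:
    """Return human-readable discrepancies; an empty list means the math reconciles."""
    problems: List[str] = []
    rcv_sum = sum((i.rcv for i in items), Decimal("0"))
    acv_sum = sum((i.acv for i in items), Decimal("0"))
    if abs(rcv_sum - reported_rcv) > tolerance:
        problems.append(f"RCV: line items sum to {rcv_sum}, PDF reports {reported_rcv}")
    if abs(acv_sum - reported_acv) > tolerance:
        problems.append(f"ACV: line items sum to {acv_sum}, PDF reports {reported_acv}")
    for item in items:
        # Per-line check: RCV - ACV should equal the stated depreciation.
        if abs((item.rcv - item.acv) - item.depreciation) > tolerance:
            problems.append(f"Depreciation mismatch on '{item.description}'")
    return problems
```

Using `Decimal` instead of floats avoids binary rounding noise, so a reported difference reflects a real extraction problem rather than floating-point arithmetic.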

Always review the validator output

A clean validator pass means the math reconciles. It does NOT mean every line item matched the right Xactimate code. Spot-check the matches themselves, especially for unusual trades.

Generating the ESX

Once matches are confirmed (and any "Place under Miscellaneous" items are flagged), AwardLettr writes the ESX file. The file uses the standard Xactimate import format, so you can open it directly in Xactimate desktop or Xactimate Online.

Free-form item names can be rejected on import

Some price lists (notably certain State Farm price lists) reject items with special characters or unusual phrasing in the description field. If Xactimate refuses to import an ESX, the most common culprit is a free-form item name in a USR entry. Strip special characters from the description and re-export.
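If you clean descriptions in bulk before re-exporting, a small helper along these lines may help. The allowed character set is a conservative guess, not a documented Xactimate rule, and `sanitize_usr_description` is a hypothetical name.

```python
import re
import unicodedata


def sanitize_usr_description(description: str) -> str:
    """Reduce a free-form item name to letters, digits, spaces, and . , / -"""
    # Spell out ampersands first so "R&R" stays readable as "R and R".
    text = description.replace("&", " and ")
    # Fold accented letters to ASCII; characters with no ASCII form,
    # such as curly quotes, are dropped by the encode step.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Drop anything outside a conservative whitelist (an assumption).
    text = re.sub(r"[^A-Za-z0-9 .,/-]", "", text)
    return " ".join(text.split())
```

Replacing the ampersand with "and" rather than deleting it is a design choice: deleting it would turn "R&R" into "RR", which is harder to recognize during review.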

Old way vs. AwardLettr

The old way

  • Read the PDF, type each line into Xactimate by hand
  • Look up the matching code for every line in the price list
  • Hope you do not transpose a digit or miss a line
  • Two to three hours per estimate, then you double-check the math

AwardLettr

  • Upload PDF, watch the funnel match line items in seconds
  • Confirm or correct flagged matches
  • Math validator catches extraction errors before export
  • Download ESX and import into Xactimate — minutes, not hours

Common pitfalls

  • Scanned PDFs with poor source quality (scans of faxes, low-res photocopies). Textract recovers most of it, but illegible numbers stay illegible. Re-scan or rekey the worst sections.
  • Special characters in item names break TOL/ESX parsing on certain price lists. Strip ampersands and curly quotes from USR descriptions before export if Xactimate rejects the import.
  • Using "Place under Miscellaneous" as a shortcut to skip corrections. That item will need the same one-off treatment on every future estimate that contains it.
  • Trusting the math validator alone. A clean math pass confirms totals reconcile, not that line items matched the right codes. Spot-check unusual trades by hand.