Estimate Extraction: PDF to ESX Pipeline
Convert insurance estimate PDFs into Xactimate ESX files. Walk through the 6-phase matching funnel, OCR detection, the math validator, and how the corrections loop trains future matches.
You'll learn
- The high-level pipeline that turns a PDF estimate into an ESX file
- The 6 matching phases and which ones are free vs. paid (in compute cost)
- When the "Place under Miscellaneous" escape hatch is the right call
- How the math validator catches classifier errors before they hit your ESX
- How correcting a match teaches the system for future estimates
Why this matters
The carrier sent you their estimate as a PDF. Sometimes it is a clean Xactimate export, sometimes it is a scan of a printout that has been faxed twice. Either way, you need it in Xactimate format so you can do your line-item analysis, compare scopes, and write your own position.
Without extraction, you re-type every line item by hand. Item description, quantity, unit price, RCV, depreciation, ACV, all of it. A typical roof claim has 40 to 80 line items, and the math has to reconcile when you are done. Three hours per estimate is normal. Errors creep in. Evenings disappear. You start cutting corners on the analysis just to get the file out the door.
This is most useful if you do line-item review, which most appraisers eventually do. Carrier appraisers feel this most because they almost always work from the carrier's existing estimate as the starting point. Insured appraisers benefit too once they have a carrier scope to push against.
AwardLettr can take an insurance estimate PDF, whether it is a native Xactimate export or a carrier-formatted spreadsheet, and convert it into a Xactimate ESX file you can import. The hard part is matching free-text line items ("R&R asphalt shingles, laminated, high grade") to the right Xactimate catalog code (RFG ASPHS H). That is what the matching funnel exists for.
The pipeline at a glance
- PDF upload: You upload the estimate PDF.
- OCR detection: AwardLettr detects whether the file is a text PDF or a scanned PDF.
- Extraction: Claude Sonnet pulls structured line items from clean text.
- 6-phase matching: Each line item runs through the funnel.
- Math validator: RCV/ACV sums are sanity-checked.
- ESX generation: Download the ESX and import it into Xactimate.
OCR: text PDF vs. scanned PDF
Most carrier estimates (about 60%) are text-based PDFs. AwardLettr extracts the text directly with a lightweight parser. The other 40% are scanned images (faxed estimates, photocopies of mailed packets) and need OCR. For those, AwardLettr runs AWS Textract to recover the text along with confidence scores.
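The routing decision between the two paths can be sketched as a simple heuristic over the text layer. This is a minimal illustration, not AwardLettr's actual detector: it assumes some parser has already returned the extracted text for each page, and the function name `needs_ocr` and both thresholds are hypothetical.

```python
def needs_ocr(
    page_texts: list[str],
    min_chars_per_page: int = 50,       # assumed threshold, for illustration
    min_text_page_ratio: float = 0.8,   # assumed threshold, for illustration
) -> bool:
    """Return True when too few pages carry a usable text layer."""
    if not page_texts:
        return True  # nothing extractable, so fall back to OCR
    pages_with_text = sum(
        1 for text in page_texts if len(text.strip()) >= min_chars_per_page
    )
    return pages_with_text / len(page_texts) < min_text_page_ratio
```

A file where every page yields a few hundred characters goes down the direct-parse path; a file where half the pages come back empty is treated as a scan.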
Why the two paths
The 6-phase matching funnel
For each line item Claude extracts, AwardLettr walks through six matching phases. The phases are ordered from cheapest to most expensive (in compute or LLM cost): phases 0 through 2 are free lookups, and the cost rises from there. The funnel short-circuits as soon as one phase returns a high-confidence match, so most items get resolved cheaply in the early phases.
| Phase | Source | Cost | What it does |
|---|---|---|---|
| 0 | Global corrections | Free, instant | Waze-style community consensus mappings — corrections every workspace has agreed on. |
| 1 | Workspace corrections | Free, instant | Mappings YOU have made previously. Your past corrections become future free matches. |
| 2 | Normalized description | Free, instant | Deterministic string match against catalog descriptions and aliases. |
| 3 | Category classifier | Cheap | Keyword-based TF-IDF narrows the search to 1-3 of the 85 categories. |
| 4 | Vector search | Medium | pgvector + OpenAI embeddings within the classified categories only. |
| 5 | LLM arbitration | Most expensive | Claude Haiku picks the best fit from a shortlist when the vector search is ambiguous. |
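The short-circuit behavior in the table above can be sketched in a few lines. This is an illustration under stated assumptions, not AwardLettr's code: every name here (`normalize`, `lookup_phase`, `run_funnel`, `record_correction`), the 0.9 confidence threshold, and the normalization rules are hypothetical, and phases 3 through 5 are simplified to independent callables even though the table describes them as building on one another.

```python
import re
from typing import Callable, Optional

Match = tuple[str, float]                 # (catalog code, confidence in [0, 1])
Phase = Callable[[str], Optional[Match]]  # a phase returns a match or None


def normalize(description: str) -> str:
    """Lowercase, turn punctuation (except '&') into spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9&]+", " ", description.lower())
    return " ".join(cleaned.split())


def lookup_phase(table: dict[str, str]) -> Phase:
    """Build a free exact-lookup phase (like phases 0-2) over normalized keys."""
    def phase(description: str) -> Optional[Match]:
        code = table.get(normalize(description))
        return (code, 1.0) if code else None
    return phase


def run_funnel(description: str, phases: list[Phase], threshold: float = 0.9):
    """Try phases in order and stop at the first high-confidence match."""
    for index, phase in enumerate(phases):
        match = phase(description)
        if match is not None and match[1] >= threshold:
            return index, match[0]
    return None  # unresolved: leave the item flagged for review


# Phase 1 storage: a correction made once becomes a free match next time.
workspace_corrections: dict[str, str] = {}


def record_correction(description: str, code: str) -> None:
    workspace_corrections[normalize(description)] = code
```

Because `record_correction` stores the normalized description, a later estimate that words the same item with different casing or punctuation would resolve in the free workspace phase rather than falling through to vector search or LLM arbitration.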
The corrections loop trains the system
"Place under Miscellaneous": the escape hatch
Some line items genuinely do not have a clean Xactimate match. Custom labor entries, carrier-specific overhead lines, weird one-off charges. Rather than force a bad match, you can mark those items "Place under Miscellaneous" and AwardLettr drops them into the ESX as a USR (user-defined) entry in the Miscellaneous category with the original description preserved.
- Use it when no catalog item is even close to right.
- Do NOT use it as a shortcut for items that DO have a match but you have not corrected the wrong guess yet. Correcting the match teaches the system; "Place under Miscellaneous" does not.
- The line item still keeps its RCV/ACV values, just under a USR/MISC placeholder code.
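As a rough picture of what the escape hatch does to a line item, the sketch below swaps in placeholder codes and leaves everything else alone. The `LineItem` fields and the function name are hypothetical; only the USR/MISC placeholder and the preserved description and amounts come from the behavior described above.

```python
from dataclasses import dataclass, replace
from decimal import Decimal


@dataclass(frozen=True)
class LineItem:
    description: str
    rcv: Decimal
    acv: Decimal
    category: str = ""
    code: str = ""


def place_under_miscellaneous(item: LineItem) -> LineItem:
    """Return a copy with placeholder codes; description and amounts are untouched."""
    return replace(item, category="MISC", code="USR")
```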
The math validator
After extraction, AwardLettr validates that the RCV and ACV totals reported in the PDF match the sum of the line items it extracted. If they do not match, something went wrong in extraction (a line was missed, or a number was misread).
- Pre-tax subtotal must match the sum of line-item amounts.
- O&P (overhead & profit) lines are accounted for separately and validated.
- Depreciation deltas (RCV - ACV) are sanity-checked against per-line depreciation values.
- If validation fails, you see a warning before ESX generation, with the discrepancy highlighted.
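The checks above amount to a handful of arithmetic comparisons. The sketch below shows two of them, the per-line depreciation check and the totals check. It is an assumption-laden illustration: the `Line` fields, the `validate` signature, and the one-cent tolerance are hypothetical, and tax and O&P handling are left out.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Line:
    description: str
    rcv: Decimal
    depreciation: Decimal
    acv: Decimal


def validate(lines, reported_rcv, reported_acv, tolerance=Decimal("0.01")):
    """Return a list of discrepancies; an empty list means the math reconciles."""
    problems = []
    for line in lines:
        # Per-line check: RCV minus ACV should equal the stated depreciation.
        if abs((line.rcv - line.acv) - line.depreciation) > tolerance:
            problems.append(f"{line.description}: RCV - ACV does not equal depreciation")
    rcv_sum = sum((line.rcv for line in lines), Decimal("0"))
    acv_sum = sum((line.acv for line in lines), Decimal("0"))
    # Totals check: extracted lines should add up to what the PDF reports.
    if abs(rcv_sum - reported_rcv) > tolerance:
        problems.append(f"RCV total off by {reported_rcv - rcv_sum}")
    if abs(acv_sum - reported_acv) > tolerance:
        problems.append(f"ACV total off by {reported_acv - acv_sum}")
    return problems
```

`Decimal` is used instead of `float` so that cent-level sums compare exactly. A non-empty result usually points to a missed line or a misread number, which is the kind of discrepancy the warning highlights.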
Always review the validator output
Generating the ESX
Once matches are confirmed (and any "Place under Miscellaneous" items are flagged), AwardLettr writes the ESX file. The file is a standard Xactimate import format, so you can open it directly in Xactimate desktop or Xactimate Online.
Free-form item names can be rejected on import
Old way vs. AwardLettr
The old way
- Read the PDF, type each line into Xactimate by hand
- Look up the matching code for every line in the price list
- Hope you do not transpose a digit or miss a line
- Three hours per estimate, then you double-check the math
AwardLettr
- Upload PDF, watch the funnel match line items in seconds
- Confirm or correct flagged matches
- Math validator catches extraction errors before export
- Download ESX and import into Xactimate — minutes, not hours
Common pitfalls
- Scanned PDFs with poor source quality (scans of faxes, low-res photocopies). Textract recovers most of it, but illegible numbers stay illegible. Re-scan or rekey the worst sections.
- Special characters in item names break TOL/ESX parsing on certain price lists. Strip ampersands and curly quotes from USR descriptions before export if Xactimate rejects the import.
- Using "Place under Miscellaneous" as a shortcut to skip corrections. That item will need the same one-off treatment on every future estimate that contains it.
- Trusting the math validator alone. A clean math pass confirms totals reconcile, not that line items matched the right codes. Spot-check unusual trades by hand.
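For the special-character pitfall above, a cleanup pass over USR descriptions might look like the sketch below. The function name is hypothetical, and the character set covers only the ampersands and curly quotes mentioned here. Whether a given price list rejects other characters is not something this sketch knows.

```python
# Characters to delete: ampersand plus left/right single and double curly quotes.
_STRIP = str.maketrans("", "", "&\u2018\u2019\u201c\u201d")


def sanitize_usr_description(text: str) -> str:
    """Delete ampersands and curly quotes, then collapse leftover whitespace."""
    return " ".join(text.translate(_STRIP).split())
```

Deleting characters outright follows the advice literally. If readability matters, replacing "&" with "and" is an alternative worth considering, since stripping turns "R&R" into "RR".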
Next steps