Estimate Extraction: PDF to ESX Pipeline

Why this matters

The carrier sent you their estimate as a PDF. Sometimes it is a clean Xactimate export, sometimes it is a scan of a printout that has been faxed twice. Either way, you need every line in a workable, coded format so you can do your line-item analysis, compare scopes, and write your own position.

Without extraction, you read each line and look up its code by hand. Item description, quantity, unit price, RCV, depreciation, ACV, all of it. A typical roof claim has 40 to 80 line items, and the math has to reconcile when you are done. Three hours per estimate is normal. Errors creep in. Evenings disappear. You start cutting corners on the analysis just to get the file out the door.

This is most useful if you do line-item review, which most appraisers eventually do. Carrier appraisers feel it most because they almost always start from the carrier's existing estimate. Insured appraisers benefit too once they have a carrier scope to push against.

AwardLettr can take an insurance estimate PDF, whether it is a native Xactimate export or a carrier-formatted spreadsheet, and convert it into a Xactimate ESX file you import through Xactimate Online. Prefer a spreadsheet? The same coded line items also export as a CSV you can open in Excel or hand to a VA. The hard part is matching free-text line items ("R&R asphalt shingles, laminated, high grade") to the right Xactimate catalog code (RFG ASPHS H). That is what the matching funnel exists for.

The pipeline at a glance

PDF upload: You upload the estimate PDF.
OCR detection: AwardLettr detects whether the file is a text PDF or a scanned PDF.
Extraction: Claude Sonnet pulls structured line items from the clean text.
6-phase matching: Each line item runs through the funnel.
Math validator: RCV/ACV sums are sanity-checked.
ESX generation: Download the ESX (or CSV) and import into Xactimate.

OCR: text PDF vs. scanned PDF

Most carrier estimates (about 60%) are text-based PDFs. AwardLettr extracts the text directly with a lightweight parser. The other 40% are scanned images (faxed estimates, photocopies of mailed packets) and need OCR. For those, AwardLettr runs AWS Textract to recover the text along with confidence scores.
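
The routing decision between the two paths can be sketched as a simple heuristic on the text a parser returns for each page. This is an illustration only: the function name, the 50-character threshold, and the rule that a single near-empty page sends the whole file to OCR are assumptions, not AwardLettr's actual logic.

```python
from typing import List


def needs_ocr(page_texts: List[str], min_chars_per_page: int = 50) -> bool:
    """Return True when the PDF looks scanned and should be routed to OCR.

    page_texts holds whatever a text parser extracted from each page's
    text layer. Scanned pages usually come back empty or nearly empty.
    The threshold is an assumed value chosen for illustration.
    """
    if not page_texts:
        return True  # nothing extractable at all: treat as a scan
    for text in page_texts:
        # Count visible (non-whitespace) characters on this page.
        visible = sum(1 for ch in text if not ch.isspace())
        if visible < min_chars_per_page:
            return True  # at least one page has no usable text layer
    return False
```

A stricter or looser rule (for example, requiring a majority of pages to be empty) would trade OCR cost against the risk of missing a scanned page inside an otherwise text-based file.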

Why the two paths

Once both paths have clean text, the extraction step uses Claude Sonnet on text tokens instead of vision tokens. That is dramatically cheaper than feeding the raw PDF as an image, and the extraction quality is the same as long as the OCR step produced good text.

The 6-phase matching funnel

For each line item Claude extracts, AwardLettr walks through six matching phases, ordered from cheapest to most expensive (in compute or LLM cost): the first three are free lookups, and the last three cost progressively more. The funnel short-circuits as soon as one phase returns a high-confidence match, so most items get resolved cheaply in the early phases.

Phase	Source	Cost	What it does
0	Global corrections	Free, instant	Waze-style community consensus mappings — corrections every workspace has agreed on.
1	Workspace corrections	Free, instant	Mappings YOU have made previously. Your past corrections become future free matches.
2	Normalized description	Free, instant	Deterministic string match against catalog descriptions and aliases.
3	Category classifier	Cheap	Keyword-based TF-IDF narrows the search to 1-3 of the 85 categories.
4	Vector search	Medium	pgvector + OpenAI embeddings within the classified categories only.
5	LLM arbitration	Most expensive	Claude Haiku picks the best fit from a shortlist when the vector search is ambiguous.
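
The short-circuit behavior can be sketched in a few lines. This is a simplified illustration, not AwardLettr's code: the Match type, the run_funnel name, and the 0.9 confidence threshold are all assumptions, and each phase is modeled as an independent lookup even though in the real funnel the category classifier feeds the vector search.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Match:
    code: str          # catalog code proposed by a phase
    confidence: float  # 0.0 to 1.0
    phase: int         # index of the phase that produced the match


# A phase takes a description and returns (code, confidence), or None.
Phase = Callable[[str], Optional[Tuple[str, float]]]


def run_funnel(description: str, phases: List[Phase],
               threshold: float = 0.9) -> Optional[Match]:
    """Try phases in order (cheapest first) and stop at the first confident match."""
    best: Optional[Match] = None
    for index, phase in enumerate(phases):
        result = phase(description)
        if result is None:
            continue  # this phase had no candidate; fall through to the next
        code, confidence = result
        if confidence >= threshold:
            # Short-circuit: later, more expensive phases never run.
            return Match(code, confidence, index)
        if best is None or confidence > best.confidence:
            best = Match(code, confidence, index)
    # No phase was confident: return the best weak candidate (or None) for review.
    return best
```

Because the loop returns on the first confident result, a line item that hits a corrections lookup never triggers an embedding query or an LLM call.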

The corrections loop trains the system

When you correct a match, the original description is added as an alias on the target catalog item. Future matches with the same description then resolve in the free, instant phases (as a workspace correction in Phase 1 or an alias match in Phase 2) instead of going through vector search or LLM arbitration. Your corrections also flow back into the global corrections layer (Phase 0) over time, so the whole community benefits.
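
The alias mechanism can be sketched as a normalized-string lookup. The normalization rules and the AliasIndex class below are hypothetical, chosen only to show why a corrected description becomes a free, deterministic match the next time it appears.

```python
import re
from typing import Dict, Optional


def normalize(description: str) -> str:
    """Lowercase, turn punctuation runs into single spaces, trim the ends.

    The exact normalization rules are an assumption for illustration.
    """
    lowered = description.lower()
    cleaned = re.sub(r"[^a-z0-9&]+", " ", lowered)
    return cleaned.strip()


class AliasIndex:
    """Maps normalized free-text descriptions to catalog codes."""

    def __init__(self) -> None:
        self._aliases: Dict[str, str] = {}

    def record_correction(self, description: str, code: str) -> None:
        # A user correction stores the original wording as an alias.
        self._aliases[normalize(description)] = code

    def lookup(self, description: str) -> Optional[str]:
        # Deterministic dictionary lookup: no embeddings, no LLM call.
        return self._aliases.get(normalize(description))


index = AliasIndex()
index.record_correction("R&R asphalt shingles, laminated, high grade", "RFG ASPHS H")
```

After the correction is recorded, a later estimate that spells the same item with different casing or punctuation still resolves through the dictionary lookup.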

"Place under Miscellaneous": the escape hatch

Some line items genuinely do not have a clean Xactimate match. Custom labor entries, carrier-specific overhead lines, weird one-off charges. Rather than force a bad match, you can mark those items "Place under Miscellaneous" and AwardLettr drops them into the ESX as a USR (user-defined) entry in the Miscellaneous category with the original description preserved.

Use it when no catalog item is even close to right.
Do NOT use it as a shortcut for items that DO have a catalog match where you simply have not corrected the wrong guess yet. Correcting the match teaches the system; "Place under Miscellaneous" does not.
The line item still keeps its RCV/ACV values, just under a USR/MISC placeholder code (and flagged UNMATCHED in the CSV's Status column).

The math validator

After extraction, AwardLettr validates that the RCV and ACV totals reported in the PDF match the sum of the line items it extracted. If they do not match, something went wrong in extraction (a line was missed, or a number was misread).

Pre-tax subtotal must match the sum of line-item amounts.
O&P (overhead & profit) lines are accounted for separately and validated.
Depreciation deltas (RCV - ACV) are sanity-checked against per-line depreciation values.
If validation fails, you see a warning before ESX generation, with the discrepancy highlighted.
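
The reconciliation checks above can be sketched as follows. This is a minimal illustration under stated assumptions: the LineItem fields, the validate_totals name, and the one-cent tolerance are hypothetical, and tax and O&P handling are left out to keep the example short.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass
class LineItem:
    description: str
    rcv: Decimal
    depreciation: Decimal
    acv: Decimal


def validate_totals(items: List[LineItem], reported_rcv: Decimal,
                    reported_acv: Decimal,
                    tolerance: Decimal = Decimal("0.01")) -> List[str]:
    """Return human-readable warnings; an empty list means the math reconciles."""
    warnings: List[str] = []

    # Per-line check: RCV - ACV should equal the stated depreciation.
    for item in items:
        delta = item.rcv - item.acv
        if abs(delta - item.depreciation) > tolerance:
            warnings.append(
                f"{item.description}: RCV - ACV is {delta}, "
                f"but depreciation is {item.depreciation}"
            )

    # Totals check: sums of extracted lines should match the PDF's totals.
    rcv_sum = sum((item.rcv for item in items), Decimal("0"))
    acv_sum = sum((item.acv for item in items), Decimal("0"))
    for label, reported, computed in (("RCV", reported_rcv, rcv_sum),
                                      ("ACV", reported_acv, acv_sum)):
        if abs(reported - computed) > tolerance:
            warnings.append(
                f"{label} total: PDF reports {reported}, "
                f"line items sum to {computed}"
            )
    return warnings
```

Decimal is used instead of float so that currency sums do not pick up binary rounding noise that could trip the tolerance check.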

Always review the validator output

A clean validator pass means the math reconciles. It does NOT mean every line item matched the right Xactimate code. Spot-check the matches themselves, especially for unusual trades.

Generating the ESX

Once matches are confirmed (and any "Place under Miscellaneous" items are flagged), AwardLettr writes the ESX file. Import it through Xactimate Online (X1): Desktop does not accept the file directly, but once you import it online the project syncs to Desktop through your Xactimate profile. Prefer a spreadsheet instead? The same line items also export as a CSV (Excel), each row carrying the original description, quantity, unit, prices, RCV/ACV, the matched catalog code, and a Status column.
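
To give a feel for the CSV shape, here is a sketch that writes rows with the fields listed above. The column headers, the "USR MISC" code text, and the to_csv helper are assumptions for illustration; only the UNMATCHED status value comes from this article, and the real export may name or order its columns differently.

```python
import csv
import io
from typing import Dict, Iterable

# Hypothetical column headers modeled on the fields the article lists.
COLUMNS = ["Description", "Quantity", "Unit", "Unit Price",
           "RCV", "ACV", "Code", "Status"]


def to_csv(rows: Iterable[Dict[str, str]]) -> str:
    """Serialize line-item rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)  # the csv module quotes commas inside descriptions
    return buffer.getvalue()


example = to_csv([{
    "Description": "Custom labor, haul-off",
    "Quantity": "1", "Unit": "EA", "Unit Price": "250.00",
    "RCV": "250.00", "ACV": "250.00",
    "Code": "USR MISC",        # placeholder code text is an assumption
    "Status": "UNMATCHED",     # status value named in this article
}])
```

Filtering the Status column for UNMATCHED in Excel is a quick way to list every item that was placed under Miscellaneous.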

Import through Xactimate Online, not Desktop

Xactimate Desktop rejects the file if you try to import it directly. Import it through Xactimate Online (X1) instead, then open the synced project in Desktop. If Xactimate refuses an import for another reason, the most common culprit is a free-form item name in a USR entry: strip special characters from the description and re-export.
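
If you want to pre-clean descriptions yourself before re-exporting, a conservative filter might look like the sketch below. Which characters Xactimate actually rejects is not documented here, so the allowed set (letters, digits, spaces, periods, commas, hyphens) is an assumption, as is the function name.

```python
import re


def sanitize_description(description: str) -> str:
    """Replace anything outside a conservative allowed set with a space.

    Allowed: letters, digits, spaces, periods, commas, hyphens.
    The allowed set is an assumption, not a documented Xactimate rule.
    """
    cleaned = re.sub(r"[^A-Za-z0-9 .,-]", " ", description)
    # Collapse the gaps left behind by removed characters.
    return re.sub(r"\s+", " ", cleaned).strip()


print(sanitize_description('Haul-off: debris "misc" <2 loads> #*'))
```

Replacing rejected characters with spaces, rather than deleting them, keeps adjacent words from running together.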

Old way vs. AwardLettr

The old way

Read the PDF, type each line into Xactimate by hand
Look up the matching code for every line in the price list
Hope you do not transpose a digit or miss a line
Three hours per estimate, then you double-check the math

AwardLettr

Upload PDF, watch the funnel match line items in seconds
Confirm or correct flagged matches
Math validator catches extraction errors before export
Download the ESX and import it into Xactimate Online in minutes, not hours (or grab the CSV for Excel)
