Paper receipts are the last mile of expense data. Everything else in a modern finance stack is structured, searchable, and automatable. Receipts are photos of thermal paper.
If you're building anything that touches expenses — a reimbursement workflow, an accounting integration, a spend analytics dashboard — you eventually need to solve the receipt parsing problem. Doing it yourself means training models, handling edge cases, dealing with a hundred different receipt formats in a hundred different languages. That's a lot of infrastructure for something that isn't your core product.
The alternative: use an API that handles the extraction and returns clean JSON.
What the JSON output contains
When Receipt Converter processes a receipt, the extracted data follows a consistent structure:
{
  "vendor": "Whole Foods Market",
  "vendor_address": "945 N Michigan Ave, Chicago, IL 60611",
  "vendor_phone": "(312) 587-0648",
  "date": "2026-02-19",
  "time": "10:42",
  "items": [
    {
      "name": "Organic Oat Milk",
      "quantity": 2,
      "unit_price": 5.98,
      "total_price": 11.96
    },
    {
      "name": "Sourdough Loaf",
      "quantity": 1,
      "unit_price": 4.49,
      "total_price": 4.49
    }
  ],
  "subtotal": 28.44,
  "tax": 1.42,
  "tax_rate": "5%",
  "tip": null,
  "total": 29.86,
  "payment_method": "Visa ****4821",
  "currency": "USD",
  "receipt_number": "WF-20260219-4821"
}
Every field is typed and consistently named across all receipts. Null fields appear when the data isn't on the receipt rather than being omitted — so your parsing code doesn't need to handle missing keys.
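Because null fields are present rather than omitted, downstream code can index keys directly. As a rough illustration, here is a minimal sanity check that could run on each parsed receipt before it is stored. The function name `check_receipt_totals` and the one-cent tolerance are assumptions for this sketch, not part of Receipt Converter.

```python
from decimal import Decimal


def check_receipt_totals(receipt: dict, tolerance: str = "0.01") -> list[str]:
    """Return a list of problems found; an empty list means the amounts add up."""
    problems = []
    tol = Decimal(tolerance)

    def money(value) -> Decimal:
        # Null fields (for example "tip": null) count as zero.
        return Decimal(str(value)) if value is not None else Decimal("0")

    # Each line item: quantity * unit_price should match total_price.
    for item in receipt["items"]:
        expected = money(item["quantity"]) * money(item["unit_price"])
        if abs(expected - money(item["total_price"])) > tol:
            problems.append(f"line total mismatch for {item['name']!r}")

    # Receipt level: subtotal + tax + tip should match total.
    expected_total = (
        money(receipt["subtotal"]) + money(receipt["tax"]) + money(receipt["tip"])
    )
    if abs(expected_total - money(receipt["total"])) > tol:
        problems.append("subtotal + tax + tip does not equal total")
    return problems
```

The sketch deliberately does not compare the sum of line items to the subtotal, since discounts and fees can make those differ on real receipts.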
What this looks like in practice
How to get JSON output from the UI
If you're evaluating the extraction quality before building an integration, you can test it manually:
- Go to Receipt Converter and upload a receipt photo
- After extraction, click Export and choose JSON
- Download and inspect the output
The JSON from the UI is identical to what you'd get from an API call. Use this to verify the extraction quality on your specific receipt types before building anything.
Building your own extraction pipeline
If you need to process receipts programmatically, there are three approaches:
Option 1: Use the Receipt Converter API (coming soon)
A direct REST API is on the roadmap. Submit a multipart form with the receipt image, get back structured JSON. No model training, no infrastructure.
Option 2: Build on top of GPT-4 Vision
Receipt Converter is built on GPT-4 Vision with a carefully engineered system prompt that handles edge cases: multi-tax receipts, foreign currencies, handwritten amounts, poor image quality. You can replicate this approach:
import base64
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def extract_receipt(image_path: str) -> dict:
    # Encode the image as base64 so it can be sent inline as a data URL
    with open(image_path, "rb") as f:
        image_data = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image_data}",
                        "detail": "auto"
                    }
                },
                {
                    "type": "text",
                    "text": "Extract all receipt data as JSON with fields: vendor, date, items (name, quantity, unit_price, total_price), subtotal, tax, tax_rate, tip, total, currency, payment_method."
                }
            ]
        }],
        # JSON mode: the model returns a single valid JSON object
        response_format={"type": "json_object"},
    )
    # The reply is a JSON string; parse it into a dict
    return json.loads(response.choices[0].message.content)
The key gotchas when building this yourself:
- Force JSON output mode with response_format
- Handle not_a_receipt cases (someone uploads a non-receipt image)
- Resize images before sending: large images slow down processing and cost more tokens
- Add retry logic for API timeouts
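Two of these gotchas, retries and non-receipt images, can be handled in one small wrapper. The sketch below is only an illustration under stated assumptions: it assumes your prompt tells the model to reply with {"error": "not_a_receipt"} for non-receipt images (the prompt shown earlier does not do this yet), and the names `extract_with_retries` and `NotAReceiptError` are hypothetical. It retries on the built-in TimeoutError by default; with the OpenAI SDK you would likely pass `retry_on=(openai.APITimeoutError,)` instead.

```python
import time


class NotAReceiptError(ValueError):
    """Raised when the model reports that the image is not a receipt."""


def extract_with_retries(extract, image_path, attempts=3, base_delay=1.0,
                         retry_on=(TimeoutError,), sleep=time.sleep):
    """Call extract(image_path), retrying timeouts with exponential backoff.

    `extract` is any callable that returns a dict, such as extract_receipt.
    """
    for attempt in range(attempts):
        try:
            data = extract(image_path)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
            continue
        # Assumed convention: the model flags non-receipt images explicitly.
        if data.get("error") == "not_a_receipt":
            raise NotAReceiptError(image_path)  # retrying would not help
        return data
```

The `sleep` parameter exists so the backoff can be swapped out in tests; in production the default `time.sleep` is used.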
Option 3: Use an OCR service + post-processing
AWS Textract, Google Document AI, and Azure Form Recognizer (since renamed Azure AI Document Intelligence) can extract text from receipts. The challenge is that raw OCR output is unstructured: you still need to parse it into the fields you need. Accuracy on complex receipts (multiple tax rates, discounts, handwritten fields) can be lower than with vision model approaches.
Common receipt parsing edge cases
If you're building a parser, these are the cases that will catch you:
Multi-tax receipts. Some jurisdictions charge different tax rates on different items (e.g., prepared food vs. packaged goods). A single receipt might have multiple tax lines. Your schema needs to handle arrays of tax entries, not just a single tax field.
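One way to make room for multiple tax lines without breaking code that reads the single tax field is to normalize both shapes into a list. In this sketch the "taxes" array and its "label", "rate", and "amount" keys are hypothetical; they are not part of the output shown earlier.

```python
from decimal import Decimal


def tax_lines(receipt: dict) -> list[dict]:
    """Return tax entries as a list, whichever shape the receipt uses."""
    # Hypothetical multi-tax shape: "taxes": [{"label", "rate", "amount"}, ...]
    if receipt.get("taxes"):
        return list(receipt["taxes"])
    # Single-tax shape: wrap "tax" and "tax_rate" in a one-element list.
    if receipt.get("tax") is not None:
        return [{"label": "Tax", "rate": receipt.get("tax_rate"),
                 "amount": receipt["tax"]}]
    return []


def total_tax(receipt: dict) -> Decimal:
    """Sum all tax amounts using Decimal to avoid float rounding drift."""
    return sum((Decimal(str(line["amount"])) for line in tax_lines(receipt)),
               Decimal("0"))
```

Callers that only need one number use `total_tax`, while reporting code that cares about the breakdown iterates over `tax_lines`.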
Tip lines. Restaurant receipts often have a pre-filled tip or a tip line. The pre-tip total and post-tip total both appear. Make sure you're capturing the right total for expense purposes.
Quantity formats. "2x" versus "2" versus "qty: 2" versus the item listed twice. Vision models handle these better than rule-based parsers.
Handwritten amounts. Especially on restaurant receipts where a server writes in the tip. Legibility varies but vision models generally handle clear handwriting.
Foreign currency and dates. € vs EUR, 14/03/2026 vs March 14, 2026 vs 2026-03-14. Normalize these in your post-processing.
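A normalization pass for these two fields might look like the sketch below. The symbol table is a small illustrative subset, and mapping "$" to USD is an assumption that only holds if your receipts are mostly from the United States. Numeric dates such as 03/04/2026 are ambiguous, so the caller has to say whether the day comes first. Month names are parsed with `%B`, which depends on the process locale (English by default).

```python
from datetime import datetime

# Illustrative subset; "$" as USD is an assumption about where receipts come from.
CURRENCY_SYMBOLS = {"€": "EUR", "£": "GBP", "$": "USD", "¥": "JPY"}


def normalize_currency(raw: str) -> str:
    """Map a symbol or a code in any case to an upper-case currency code."""
    raw = raw.strip()
    return CURRENCY_SYMBOLS.get(raw, raw.upper())


def normalize_date(raw: str, day_first: bool = True) -> str:
    """Return the date as ISO 8601 (YYYY-MM-DD), trying a few known formats."""
    numeric = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    for fmt in ("%Y-%m-%d", numeric, "%B %d, %Y"):
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # not this format, try the next one
    raise ValueError(f"unrecognized date format: {raw!r}")
```

Raising on unknown formats, rather than guessing, keeps bad dates out of the database and gives you a list of formats to add support for.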
If your users are primarily in one country or industry, test on receipts from that context. Restaurant receipts look different from retail receipts, which in turn look different from hotel folios. The more specific your test set, the more confident you can be in your extraction quality.
Storing and querying the structured data
Once you have JSON, you can store it in any database and query it like any other structured data:
-- Find all receipts over $100 from the last 30 days
SELECT vendor, date, total, currency
FROM receipts
WHERE total > 100
AND date >= CURRENT_DATE - INTERVAL '30 days'
ORDER BY total DESC;
-- Sum expenses by vendor for a date range
SELECT vendor, SUM(total) as total_spend, COUNT(*) as receipt_count
FROM receipts
WHERE date BETWEEN '2026-01-01' AND '2026-03-31'
GROUP BY vendor
ORDER BY total_spend DESC;
The consistent JSON structure makes this straightforward. Vendor names from the same merchant might have minor variations ("Whole Foods" vs "WHOLE FOODS MKT") — you'll want a normalization pass if exact grouping matters.
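A normalization pass can be as simple as lower-casing, stripping punctuation, and applying an alias table you maintain by hand. The sketch below shows that idea; `VENDOR_ALIASES` and its entries are hypothetical examples, and a real system would likely need a larger table or fuzzy matching.

```python
import re
from collections import defaultdict
from decimal import Decimal

# Hypothetical alias table mapping cleaned-up variants to one canonical key.
VENDOR_ALIASES = {
    "whole foods mkt": "whole foods",
    "whole foods market": "whole foods",
}


def vendor_key(name: str) -> str:
    """Lower-case, drop punctuation, collapse whitespace, then apply aliases."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    cleaned = " ".join(cleaned.split())
    return VENDOR_ALIASES.get(cleaned, cleaned)


def spend_by_vendor(receipts: list[dict]) -> dict[str, Decimal]:
    """Sum receipt totals per normalized vendor key."""
    totals = defaultdict(Decimal)  # Decimal() starts each vendor at 0
    for receipt in receipts:
        totals[vendor_key(receipt["vendor"])] += Decimal(str(receipt["total"]))
    return dict(totals)
```

Storing the result of `vendor_key` in its own column alongside the raw vendor name lets the SQL above group on the normalized value while keeping the original text for audits.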
If you're not building a system but just need JSON output for occasional use, the manual export from the UI works well. The batch processing workflow covers how to handle larger volumes without writing any code.
Test the JSON extraction on your receipts. Try Receipt Converter free →