Receipt to JSON: How to Automate Expense Extraction as a Developer

If you're building an expense workflow, reimbursement system, or financial tool, you need structured data from receipts. Here's what the JSON output looks like and how to use it.

March 3, 2026 · 6 min read

Paper receipts are the last mile of expense data. Everything else in a modern finance stack is structured, searchable, and automatable. Receipts are photos of thermal paper.

If you're building anything that touches expenses — a reimbursement workflow, an accounting integration, a spend analytics dashboard — you eventually need to solve the receipt parsing problem. Doing it yourself means training models, handling edge cases, dealing with a hundred different receipt formats in a hundred different languages. That's a lot of infrastructure for something that isn't your core product.

The alternative: use an API that handles the extraction and returns clean JSON.

What the JSON output contains

When Receipt Converter processes a receipt, the extracted data follows a consistent structure:

{
  "vendor": "Whole Foods Market",
  "vendor_address": "945 N Michigan Ave, Chicago, IL 60611",
  "vendor_phone": "(312) 587-0648",
  "date": "2026-02-19",
  "time": "10:42",
  "items": [
    {
      "name": "Organic Oat Milk",
      "quantity": 2,
      "unit_price": 5.98,
      "total_price": 11.96
    },
    {
      "name": "Sourdough Loaf",
      "quantity": 1,
      "unit_price": 4.49,
      "total_price": 4.49
    }
  ],
  "subtotal": 28.44,
  "tax": 1.42,
  "tax_rate": "5%",
  "tip": null,
  "total": 29.86,
  "payment_method": "Visa ****4821",
  "currency": "USD",
  "receipt_number": "WF-20260219-4821"
}

Every field is typed and consistently named across all receipts. Fields that aren't on the receipt are returned as null rather than omitted, so your parsing code doesn't need to handle missing keys.
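Because every key is always present, you can map the payload straight onto typed objects. Here's a minimal sketch, assuming the field names from the sample above. The Receipt and LineItem classes and the parse_receipt helper are hypothetical names for illustration, not part of Receipt Converter, and money is converted to Decimal to avoid float rounding:

```python
import json
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class LineItem:
    name: str
    quantity: int
    unit_price: Decimal
    total_price: Decimal


@dataclass
class Receipt:
    vendor: str
    date: str
    items: list
    subtotal: Decimal
    tax: Optional[Decimal]
    tip: Optional[Decimal]
    total: Decimal
    currency: str


def _money(value):
    """Convert a JSON number to Decimal, passing null through as None."""
    return None if value is None else Decimal(str(value))


def parse_receipt(raw: str) -> Receipt:
    """Parse one extraction payload; assumes all keys are present."""
    data = json.loads(raw)
    items = [
        LineItem(
            name=i["name"],
            quantity=i["quantity"],
            unit_price=_money(i["unit_price"]),
            total_price=_money(i["total_price"]),
        )
        for i in data["items"]
    ]
    return Receipt(
        vendor=data["vendor"],
        date=data["date"],
        items=items,
        subtotal=_money(data["subtotal"]),
        tax=_money(data["tax"]),
        tip=_money(data["tip"]),  # null in the JSON becomes None
        total=_money(data["total"]),
        currency=data["currency"],
    )
```

Direct key access like data["tip"] is safe here only because of the null-instead-of-omitted convention. If you can't rely on that, switch to data.get("tip").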

What this looks like in practice

Sample receipt:

WHOLE FOODS MARKET
Order #WF-20260219-4821
Organic Oat Milk      2x5.98
Sourdough Loaf        4.49
Avocado               3x5.97
Cold Brew Coffee      12.00
Subtotal:             28.44
Tax (5% GST):         1.42
Total:                29.86
Visa ****4821         29.86
Feb 19, 2026 · 10:42 AM

Extracted spreadsheet rows:

Date     Item               Qty   Amount   Category
Feb 19   Organic Oat Milk   2     $5.98    Groceries
Feb 19   Sourdough Loaf     1     $4.49    Groceries
Feb 19   Avocado            3     $5.97    Groceries
Feb 19   Cold Brew Coffee   1     $12.00   Groceries

4 items · Feb 19, 2026 · Total: $29.86

How to get JSON output from the UI

If you're evaluating the extraction quality before building an integration, you can test it manually:

  1. Go to Receipt Converter and upload a receipt photo
  2. After extraction, click Export and choose JSON
  3. Download and inspect the output

The JSON from the UI is identical to what you'd get from an API call. Use this to verify the extraction quality on your specific receipt types before building anything.

Building your own extraction pipeline

If you need to process receipts programmatically, here's the approach:

Option 1: Use the Receipt Converter API (coming soon)

A direct REST API is on the roadmap. Submit a multipart form with the receipt image, get back structured JSON. No model training, no infrastructure.

Option 2: Build on top of GPT-4 Vision

Receipt Converter is built on GPT-4 Vision with a carefully engineered system prompt that handles edge cases: multi-tax receipts, foreign currencies, handwritten amounts, poor image quality. You can replicate this approach:

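import base64
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def extract_receipt(image_path: str) -> dict:
    # Encode the image as base64 so it can be sent inline as a data URL.
    with open(image_path, "rb") as f:
        image_data = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        # Assumes a JPEG; change the MIME type for PNG or other formats.
                        "url": f"data:image/jpeg;base64,{image_data}",
                        "detail": "auto"
                    }
                },
                {
                    "type": "text",
                    "text": "Extract all receipt data as JSON with fields: vendor, date, items (name, quantity, unit_price, total_price), subtotal, tax, tax_rate, tip, total, currency, payment_method."
                }
            ]
        }],
        # JSON mode: ask the model to reply with a single JSON object.
        response_format={"type": "json_object"}
    )

    # The reply content is a JSON string; parse it into a dict.
    return json.loads(response.choices[0].message.content)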

The key gotchas when building this yourself:

  • Force JSON output mode with response_format
  • Handle not_a_receipt cases (someone uploads a non-receipt image)
  • Resize images before sending — large images slow down processing and cost more tokens
  • Add retry logic for API timeouts
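Two of these gotchas can be sketched with the standard library alone. The helpers below are illustrative: with_retries and check_extraction are hypothetical names, and the not_a_receipt key assumes you have prompted the model to return that flag for non-receipt images. The API does not do this on its own.

```python
import time


class NotAReceiptError(ValueError):
    """Raised when the model reports that the image is not a receipt."""


def with_retries(fn, attempts=3, base_delay=1.0,
                 retry_on=(TimeoutError,), sleep=time.sleep):
    """Call fn(), retrying with exponential backoff on the given exceptions."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            sleep(base_delay * (2 ** attempt))


def check_extraction(data: dict) -> dict:
    """Reject non-receipts and payloads missing the fields we rely on.

    Assumes the prompt asks the model to return {"not_a_receipt": true}
    when the image is not a receipt.
    """
    if data.get("not_a_receipt"):
        raise NotAReceiptError("image does not look like a receipt")
    missing = [k for k in ("vendor", "date", "total") if k not in data]
    if missing:
        raise ValueError(f"extraction is missing fields: {missing}")
    return data
```

In a real pipeline you would pass the timeout and rate-limit exception types raised by your HTTP client or SDK as retry_on, and wrap the extraction call in a lambda: with_retries(lambda: extract_receipt(path)).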

Option 3: Use an OCR service + post-processing

AWS Textract, Google Document AI, and Azure AI Document Intelligence (formerly Form Recognizer) can extract text from receipts. The challenge is that raw OCR output is unstructured, so you still need to parse it into the fields you need. Accuracy on complex receipts (multiple tax rates, discounts, handwritten fields) tends to be lower than with vision model approaches.

Common receipt parsing edge cases

If you're building a parser, these are the cases that will catch you:

Multi-tax receipts. Some jurisdictions charge different tax rates on different items (e.g., prepared food vs. packaged goods). A single receipt might have multiple tax lines. Your schema needs to handle arrays of tax entries, not just a single tax field.
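One way to accommodate this is to normalize every receipt to a list of tax entries, even when only one rate applies. This is a sketch under assumptions: the taxes key and the label/rate/amount entry shape are hypothetical and are not part of the sample output shown earlier.

```python
from decimal import Decimal


def normalize_taxes(receipt: dict) -> list:
    """Return tax data as a list of entries, whether the source used a
    hypothetical 'taxes' array or the single 'tax'/'tax_rate' fields."""
    if receipt.get("taxes"):
        return [
            {
                "label": t.get("label"),
                "rate": t.get("rate"),
                "amount": Decimal(str(t["amount"])),
            }
            for t in receipt["taxes"]
        ]
    if receipt.get("tax") is None:
        return []  # no tax printed on the receipt
    return [{
        "label": None,
        "rate": receipt.get("tax_rate"),
        "amount": Decimal(str(receipt["tax"])),
    }]


def total_tax(receipt: dict) -> Decimal:
    """Sum all tax lines; Decimal avoids float rounding on money."""
    return sum((t["amount"] for t in normalize_taxes(receipt)), Decimal("0"))
```

Downstream code then only ever deals with a list, and a single-rate receipt is just the one-element case.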

Tip lines. Restaurant receipts often have a pre-filled tip or a tip line. The pre-tip total and post-tip total both appear. Make sure you're capturing the right total for expense purposes.
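A quick arithmetic check can tell you which total the extraction captured. The sketch below assumes the subtotal, tax, tip, and total fields from the sample output. reconcile_total is a hypothetical helper, and the one-cent tolerance is an arbitrary choice.

```python
from decimal import Decimal


def _d(value) -> Decimal:
    """Treat null as zero and convert JSON numbers to Decimal."""
    return Decimal("0") if value is None else Decimal(str(value))


def reconcile_total(receipt: dict, tolerance=Decimal("0.01")) -> dict:
    """Compare the printed total with subtotal + tax, with and without tip."""
    pre_tip = _d(receipt.get("subtotal")) + _d(receipt.get("tax"))
    post_tip = pre_tip + _d(receipt.get("tip"))
    printed = _d(receipt.get("total"))
    if abs(printed - post_tip) <= tolerance:
        kind = "post_tip"  # total already includes any tip
    elif abs(printed - pre_tip) <= tolerance:
        kind = "pre_tip"   # total excludes the tip; add it for expenses
    else:
        kind = "mismatch"  # flag for manual review
    return {"pre_tip": pre_tip, "post_tip": post_tip,
            "printed": printed, "kind": kind}
```

A receipt with no tip reports post_tip, since the two sums are identical. A mismatch usually means a discount, a fee, or a misread digit, and is worth routing to a human.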

Quantity formats. "2x" versus "2" versus "qty: 2" versus the item listed twice. Vision models handle these better than rule-based parsers.
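To see why rules get brittle, here is roughly what a rule-based approach to those four styles looks like. It's an illustrative sketch, not an exhaustive parser; split_quantity and merge_duplicates are hypothetical helper names.

```python
import re
from collections import Counter

# Tried in order; the first pattern that matches wins.
_PATTERNS = [
    re.compile(r"^(?P<qty>\d+)\s*x\s+(?P<name>.+)$", re.IGNORECASE),       # "2x Oat Milk"
    re.compile(r"^(?P<name>.+?)\s+qty:?\s*(?P<qty>\d+)$", re.IGNORECASE),  # "Oat Milk qty: 2"
    re.compile(r"^(?P<qty>\d+)\s+(?P<name>.+)$"),                          # "2 Oat Milk"
]


def split_quantity(text: str):
    """Return (name, quantity) for one item description, defaulting to 1."""
    text = text.strip()
    for pattern in _PATTERNS:
        m = pattern.match(text)
        if m:
            return m.group("name").strip(), int(m.group("qty"))
    return text, 1


def merge_duplicates(lines):
    """Collapse repeated lines (an item listed twice) into summed quantities."""
    counts = Counter()
    for line in lines:
        name, qty = split_quantity(line)
        counts[name] += qty
    return dict(counts)
```

The weakness shows up quickly: a product called "7 Up Soda" matches the third pattern and comes back as seven units of "Up Soda". Every fix for a case like that is another rule to maintain.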

Handwritten amounts. Especially on restaurant receipts where a server writes in the tip. Legibility varies but vision models generally handle clear handwriting.

Foreign currency and dates. € vs EUR, 14/03/2026 vs March 14, 2026 vs 2026-03-14. Normalize these in your post-processing.
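A normalization pass for both can be quite small. In this sketch, normalize_currency and normalize_date are hypothetical helpers, the symbol table is deliberately tiny, and the day_first flag is an assumption you would set per market, because a string like 03/04/2026 cannot be disambiguated from the text alone.

```python
from datetime import date, datetime

# Symbol to ISO 4217 code. "$" is shared by several currencies, so it is
# resolved with a caller-supplied default instead of being listed here.
_SYMBOLS = {"€": "EUR", "£": "GBP"}


def normalize_currency(raw: str, dollar_default: str = "USD") -> str:
    """Map a currency symbol or code to an uppercase ISO-style code."""
    raw = raw.strip()
    if raw == "$":
        return dollar_default
    return _SYMBOLS.get(raw, raw.upper())


def normalize_date(raw: str, day_first: bool = True) -> date:
    """Parse common receipt date styles into a date object.

    day_first decides how slash-separated dates are read.
    Month names are matched in the current locale (English by default).
    """
    slash = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    for fmt in ("%Y-%m-%d", slash, "%B %d, %Y", "%b %d, %Y"):
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")
```

Storing the result of normalize_date(...).isoformat() gives you the 2026-03-14 form, which sorts and compares correctly as a plain string.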

Test on real receipts from your target market

If your users are primarily in one country or industry, test on receipts from that context. Restaurant receipts look different from retail receipts, which in turn look different from hotel folios. The more specific your test set, the more confident you can be in your extraction quality.
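If you hand-label a small set of receipts, you can put a number on that confidence. The sketch below is a hypothetical field_accuracy helper that assumes you have a list of extracted dicts and a matching list of hand-checked dicts in the same order. It uses exact matching, which is strict for vendor names.

```python
def field_accuracy(predictions, ground_truth,
                   fields=("vendor", "date", "total", "currency")):
    """Return, per field, the share of receipts where the extracted value
    exactly matched the hand-labeled value."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must be the same length")
    if not ground_truth:
        return {}
    hits = {f: 0 for f in fields}
    for predicted, expected in zip(predictions, ground_truth):
        for f in fields:
            if predicted.get(f) == expected.get(f):
                hits[f] += 1
    return {f: hits[f] / len(ground_truth) for f in fields}
```

Breaking accuracy out per field is more useful than a single score: a pipeline that is near-perfect on totals but shaky on dates needs a different fix than one that misreads vendors.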

Storing and querying the structured data

Once you have JSON, you can store it in any database and query it like any other structured data:

-- Find all receipts over $100 from the last 30 days
SELECT vendor, date, total, currency
FROM receipts
WHERE total > 100
  AND date >= CURRENT_DATE - INTERVAL '30 days'
ORDER BY total DESC;

-- Sum expenses by vendor for a date range
SELECT vendor, SUM(total) as total_spend, COUNT(*) as receipt_count
FROM receipts
WHERE date BETWEEN '2026-01-01' AND '2026-03-31'
GROUP BY vendor
ORDER BY total_spend DESC;
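The queries above assume a receipts table already exists. Here's a minimal sketch of loading extracted JSON into one, using SQLite from the Python standard library so it runs without a database server. The column layout, the raw_json column, and the store_receipt and spend_by_vendor helpers are all assumptions for illustration.

```python
import json
import sqlite3


def store_receipt(conn: sqlite3.Connection, receipt: dict) -> None:
    """Insert one extracted receipt, keeping the full payload alongside
    the columns the queries filter on."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS receipts ("
        "receipt_number TEXT, vendor TEXT, date TEXT, "
        "total REAL, currency TEXT, raw_json TEXT)"
    )
    conn.execute(
        "INSERT INTO receipts VALUES (?, ?, ?, ?, ?, ?)",
        (
            receipt.get("receipt_number"),
            receipt["vendor"],
            receipt["date"],  # ISO 8601 strings sort and compare correctly
            receipt["total"],
            receipt["currency"],
            json.dumps(receipt),
        ),
    )
    conn.commit()


def spend_by_vendor(conn: sqlite3.Connection, start: str, end: str):
    """The sum-by-vendor query, parameterized by date range."""
    return conn.execute(
        "SELECT vendor, SUM(total) AS total_spend, COUNT(*) AS receipt_count "
        "FROM receipts WHERE date BETWEEN ? AND ? "
        "GROUP BY vendor ORDER BY total_spend DESC",
        (start, end),
    ).fetchall()
```

Two caveats if you adapt this. The INTERVAL '30 days' syntax in the first query is PostgreSQL-style, and in SQLite the equivalent filter is date >= date('now', '-30 days'). REAL is also a floating-point type, so for production money columns a fixed-point type such as NUMERIC in PostgreSQL is the safer choice.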

The consistent JSON structure makes this straightforward. Vendor names from the same merchant might have minor variations ("Whole Foods" vs "WHOLE FOODS MKT") — you'll want a normalization pass if exact grouping matters.
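A normalization pass can be as simple as a cleaned-up lookup key plus an alias table. In this sketch, normalize_vendor and the contents of the alias table are hypothetical, and in practice you would build the table from the variations you actually see in your data.

```python
import re

# Hypothetical alias table keyed by a lowercase, punctuation-free spelling.
_ALIASES = {
    "whole foods": "Whole Foods Market",
    "whole foods mkt": "Whole Foods Market",
    "whole foods market": "Whole Foods Market",
}


def normalize_vendor(raw: str) -> str:
    """Map known vendor spellings to one canonical name.

    Unknown vendors are returned unchanged apart from trimmed whitespace.
    """
    key = re.sub(r"[^a-z0-9 ]", "", raw.lower())
    key = re.sub(r"\s+", " ", key).strip()
    return _ALIASES.get(key, raw.strip())
```

Run this before inserting rows, or store the canonical name in its own column so the original text from the receipt is preserved.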


If you're not building a system but just need JSON output for occasional use, the manual export from the UI works well. The batch processing workflow covers how to handle larger volumes without writing any code.

Test the JSON extraction on your receipts. Try Receipt Converter free →
