Part of our ai automation guide series

ai-automation

AI-Powered Expense Report Automation for Office Workers: No-Code Solutions

Praveen 5 min read
a computer chip with the letter a on top of it
Photo by Igor Omilaev on Unsplash

How Can AI Automate My Expense Reports Without Writing Code?

The short answer is: I used DeepSeek to generate a Python script that extracts data from receipt images, categorizes expenses, and formats them into a CSV report-all triggered by dragging files into a folder. The AI handled 80% of the code, while I focused on structuring the workflow and fixing library conflicts.

The Problem: Expense Reports Were Killing My Productivity

Every Friday, I wasted 2 hours manually:

  1. Cropping receipt images
  2. Typing amounts into Excel
  3. Assigning categories (Travel, Meals, etc.)
  4. Calculating totals

Our finance team needed reports in a specific CSV format with columns for Date, Vendor, Amount, and Category. The worst part? I’d often transpose numbers (like typing $56.20 as $65.20), requiring rework.

The AI Attempt: A One-Shot Prompt to DeepSeek

I fed DeepSeek this prompt:

Prompt:  
"Build a Python script that:  
1. Watches a 'receipts' folder for new image files (JPG/PNG)  
2. Uses OCR to extract text from each image  
3. Identifies the vendor name, date, and total amount  
4. Categorizes expenses as 'Travel', 'Meals', or 'Office Supplies' based on keywords  
5. Appends the data to a CSV with headers: Date,Vendor,Amount,Category  
6. Runs automatically when new files appear  

Use pytesseract for OCR and watchdog for file monitoring.  
Include error handling for bad OCR reads."  

First Output:
DeepSeek generated a 47-line script using:

  • pytesseract for OCR
  • watchdog for folder monitoring
  • Regex patterns to find amounts/dates

Where It Worked:

  • The folder watcher logic was flawless
  • CSV generation used proper escaping

Where It Broke: 3 Critical Failures

  1. Library Hell
    The script required pytesseract, but failed because I didn’t have Tesseract OCR installed system-wide. Error:

    TesseractNotFoundError: tesseract is not installed or it's not in your PATH  
  2. Vendor Detection Was Garbage
    For a Starbucks receipt, it output:

    Vendor: STARBUCKS INC 12345  
    Date: 2024-02-30 (invalid date)  
    Amount: $12.SO (misread '5' as 'S')  
  3. False Positives
    A screenshot of a Slack conversation triggered the script because it contained a dollar sign ($).

What I Had to Fix

1. The OCR Pipeline
I switched to EasyOCR after finding pytesseract struggled with thermal receipts. Required changes:

# Before (pytesseract)  
text = pytesseract.image_to_string(image)  

# After (EasyOCR)  
reader = easyocr.Reader(['en'])  
results = reader.readtext(image, detail=0)  

2. Date Validation
Added a function to reject impossible dates (like February 30th):

from datetime import datetime  
def validate_date(date_str):  
    try:  
        datetime.strptime(date_str, '%Y-%m-%d')  
        return True  
    except ValueError:  
        return False  

3. Vendor Cleanup
Used a hardcoded vendor list with fuzzy matching:

KNOWN_VENDORS = ['Starbucks', 'Uber', 'Amazon Web Services']  

def match_vendor(raw_text):  
    for vendor in KNOWN_VENDORS:  
        if vendor.lower() in raw_text.lower():  
            return vendor  
    return "Unknown"  

The Working Result

Final script features:

  • Drag-and-drop processing: Drop receipts into ~/expense_reports/receipts
  • Human review: Flags uncertain entries in yellow (requires manual confirmation)
  • Audit trail: Logs all actions to processing.log

Sample Output CSV:

Date,Vendor,Amount,Category  
2024-03-15,Starbucks,12.50,Meals  
2024-03-16,Uber,24.30,Travel  

Execution Time:

  • 0.8 seconds per image (tested on 20 receipts)
  • 95% accuracy on typed receipts, 70% on handwritten

What I Learned

  1. AI Can’t Handle Edge Cases Alone
    DeepSeek didn’t anticipate thermal receipt fade or handwritten notes. I had to build manual review steps.

  2. Prompt Engineering = Specification Writing
    The more precise my prompt (e.g., “include a 10% confidence threshold for OCR results”), the better the output.

  3. Library Choices Matter
    EasyOCR worked better than pytesseract for my use case, but added 300MB to the deployment size.

The Exact Prompt

Here’s the refined prompt that got me the final working version:

"Build a Python 3 script that:  
1. Uses EasyOCR instead of pytesseract for better receipt readability  
2. Watches a folder for new JPG/PNG files with watchdog  
3. Extracts:  
   - Date (must be valid, skip if invalid)  
   - Amount (must match regex ^\$?\d+\.\d{2}$)  
   - Vendor (fuzzy match against a hardcoded list)  
4. Categorizes expenses using these rules:  
   - 'Travel' if vendor is Uber/Lyft or contains 'taxi'  
   - 'Meals' if vendor is a restaurant  
   - 'Office' if vendor is Amazon/Staples  
5. Outputs to CSV with error logging  
6. Skips images where confidence is below 70%  

Include a function to manually review uncertain entries before CSV export."  

FAQ

Q: Can this handle multi-currency receipts?
A: Not yet. The current regex only catches USD ($). You’d need to modify the amount detection pattern.

Q: What about receipts in other languages?
A: EasyOCR supports 80+ languages. Change the Reader initialization to ['en','es','fr'] as needed.

Q: Is this GDPR compliant?
A: No-the script stores raw receipt images alongside processed data. You’d need to add auto-deletion of images after processing.

Q: Can I run this on my phone?
A: Only via Termux (Linux environment for Android). For iOS, you’d need a cloud trigger like AWS Lambda.

Q: How much did this cost to build?
A: $0 in tools (all open-source), but 3 hours of prompt tuning and debugging.

What expense workflow would you automate with this approach? Drop a comment with your nightmare process!

P

Praveen

Technology enthusiast helping people work smarter with practical guides and AI workflows.

Explore more: Browse all ai automation guides or check related articles below.