ai-automation
AI-Powered Expense Report Automation for Office Workers: No-Code Solutions
How Can AI Automate My Expense Reports Without Writing Code?
The short answer is: I used DeepSeek to generate a Python script that extracts data from receipt images, categorizes expenses, and formats them into a CSV report-all triggered by dragging files into a folder. The AI handled 80% of the code, while I focused on structuring the workflow and fixing library conflicts.
The Problem: Expense Reports Were Killing My Productivity
Every Friday, I wasted 2 hours manually:
- Cropping receipt images
- Typing amounts into Excel
- Assigning categories (Travel, Meals, etc.)
- Calculating totals
Our finance team needed reports in a specific CSV format with columns for Date, Vendor, Amount, and Category. The worst part? I’d often transpose numbers (like typing $56.20 as $65.20), requiring rework.
The AI Attempt: A One-Shot Prompt to DeepSeek
I fed DeepSeek this prompt:
Prompt:
"Build a Python script that:
1. Watches a 'receipts' folder for new image files (JPG/PNG)
2. Uses OCR to extract text from each image
3. Identifies the vendor name, date, and total amount
4. Categorizes expenses as 'Travel', 'Meals', or 'Office Supplies' based on keywords
5. Appends the data to a CSV with headers: Date,Vendor,Amount,Category
6. Runs automatically when new files appear
Use pytesseract for OCR and watchdog for file monitoring.
Include error handling for bad OCR reads."
First Output:
DeepSeek generated a 47-line script using:
pytesseractfor OCRwatchdogfor folder monitoring- Regex patterns to find amounts/dates
Where It Worked:
- The folder watcher logic was flawless
- CSV generation used proper escaping
Where It Broke: 3 Critical Failures
-
Library Hell
The script requiredpytesseract, but failed because I didn’t have Tesseract OCR installed system-wide. Error:TesseractNotFoundError: tesseract is not installed or it's not in your PATH -
Vendor Detection Was Garbage
For a Starbucks receipt, it output:Vendor: STARBUCKS INC 12345 Date: 2024-02-30 (invalid date) Amount: $12.SO (misread '5' as 'S') -
False Positives
A screenshot of a Slack conversation triggered the script because it contained a dollar sign ($).
What I Had to Fix
1. The OCR Pipeline
I switched to EasyOCR after finding pytesseract struggled with thermal receipts. Required changes:
# Before (pytesseract)
text = pytesseract.image_to_string(image)
# After (EasyOCR)
reader = easyocr.Reader(['en'])
results = reader.readtext(image, detail=0)
2. Date Validation
Added a function to reject impossible dates (like February 30th):
from datetime import datetime
def validate_date(date_str):
try:
datetime.strptime(date_str, '%Y-%m-%d')
return True
except ValueError:
return False
3. Vendor Cleanup
Used a hardcoded vendor list with fuzzy matching:
KNOWN_VENDORS = ['Starbucks', 'Uber', 'Amazon Web Services']
def match_vendor(raw_text):
for vendor in KNOWN_VENDORS:
if vendor.lower() in raw_text.lower():
return vendor
return "Unknown"
The Working Result
Final script features:
- Drag-and-drop processing: Drop receipts into
~/expense_reports/receipts - Human review: Flags uncertain entries in yellow (requires manual confirmation)
- Audit trail: Logs all actions to
processing.log
Sample Output CSV:
Date,Vendor,Amount,Category
2024-03-15,Starbucks,12.50,Meals
2024-03-16,Uber,24.30,Travel
Execution Time:
- 0.8 seconds per image (tested on 20 receipts)
- 95% accuracy on typed receipts, 70% on handwritten
What I Learned
-
AI Can’t Handle Edge Cases Alone
DeepSeek didn’t anticipate thermal receipt fade or handwritten notes. I had to build manual review steps. -
Prompt Engineering = Specification Writing
The more precise my prompt (e.g., “include a 10% confidence threshold for OCR results”), the better the output. -
Library Choices Matter
EasyOCR worked better than pytesseract for my use case, but added 300MB to the deployment size.
The Exact Prompt
Here’s the refined prompt that got me the final working version:
"Build a Python 3 script that:
1. Uses EasyOCR instead of pytesseract for better receipt readability
2. Watches a folder for new JPG/PNG files with watchdog
3. Extracts:
- Date (must be valid, skip if invalid)
- Amount (must match regex ^\$?\d+\.\d{2}$)
- Vendor (fuzzy match against a hardcoded list)
4. Categorizes expenses using these rules:
- 'Travel' if vendor is Uber/Lyft or contains 'taxi'
- 'Meals' if vendor is a restaurant
- 'Office' if vendor is Amazon/Staples
5. Outputs to CSV with error logging
6. Skips images where confidence is below 70%
Include a function to manually review uncertain entries before CSV export."
Related Guides
- Ga4 Data Delayed Or Missing Check Measurement Id Consent
- Ga4 Events Automatic Recommended Custom Tracking Guide
FAQ
Q: Can this handle multi-currency receipts?
A: Not yet. The current regex only catches USD ($). You’d need to modify the amount detection pattern.
Q: What about receipts in other languages?
A: EasyOCR supports 80+ languages. Change the Reader initialization to ['en','es','fr'] as needed.
Q: Is this GDPR compliant?
A: No-the script stores raw receipt images alongside processed data. You’d need to add auto-deletion of images after processing.
Q: Can I run this on my phone?
A: Only via Termux (Linux environment for Android). For iOS, you’d need a cloud trigger like AWS Lambda.
Q: How much did this cost to build?
A: $0 in tools (all open-source), but 3 hours of prompt tuning and debugging.
What expense workflow would you automate with this approach? Drop a comment with your nightmare process!
Praveen
Technology enthusiast helping people work smarter with practical guides and AI workflows.
Explore more: Browse all ai automation guides or check related articles below.