Part of our build in public guide series

build-in-public

I Built a Database Audit Script with DeepSeek

Praveen 11 min read
red Amor statuette
Photo by Tatiana Rodriguez on Unsplash

What Problem Drove Me to Build a Database Audit Script?

The short answer is I built a database audit Python script using DeepSeek after three failed attempts. Every week, I checked my Oracle and MySQL instances to identify oversized tables, unused schemas, and deprecated FLOAT columns. I could run manual queries, but the process took 30‑45 minutes and I had to copy‑paste results into spreadsheets for my weekly compliance report. My team kept asking for an automated “audit‑once‑and‑forget” CLI tool that would scan a list of databases, flag anomalies, and email a PDF summary. I’m not a developer-I’m an IT Operations Lead who spends my days on runbooks and monitoring dashboards-so I turned to AI as my junior developer. I asked DeepSeek (and OpenCode) to write the script, expecting a few tweaks. What followed was a three‑act drama of perfect‑looking code that never ran, snippets that hallucinated libraries, and a final script that actually worked-after I spent a solid 4 hours fixing it.

How I Asked DeepSeek and OpenCode to Write the Script (The Prompt)

My first experiment was a simple prompt that described the business logic and requested a CLI tool. I copied the prompt into a markdown file, sent it to DeepSeek’s web interface, and got back a 30‑line Python script. The code looked sleek, but it never executed. Here’s the exact prompt I used (I kept the token count at 312 tokens, which DeepSeek quoted as a 0.004 USD cost):

Prompt:
I need a Python CLI tool that audits a list of databases (Oracle, MySQL, PostgreSQL). For each DB:
1. Connect using credentials from a config file (format: db_type:host:port:user:password).
2. Run three checks:
   a. List tables larger than 100 MB.
   b. Find columns of type FLOAT or REAL.
   c. Identify schemas/users that have not been used in the last 30 days (use last_dml column if available).
3. Output a JSON report with entries: db_name, timestamp, findings (array of objects with type, object_name, details).
4. The script should accept a single argument --config <path> and optionally --output <path> (default stdout).
5. Use argparse for CLI, include a --help message.
6. Assume the required DB drivers are installed (cx_Oracle, mysql‑connector‑python, psycopg2).
7. Handle connection errors gracefully and continue with next DB.
8. Keep the script under 25 lines of functional code.

Please provide the complete script with inline comments.

The response arrived in 12.3 seconds, and the file size was 1.8 KB. I saved it as db_audit.py and ran python db_audit.py --help. The help text printed fine, but when I tried to run it against a dummy config, the script crashed with ModuleNotFoundError: No module named 'cx_Oracle'. I hadn’t installed Oracle’s binary, but the prompt explicitly said “assume the required DB drivers are installed.” That was my first failure.

Where the AI Attempts Broke: Three Failures I Faced

Failure #1 - Missing Drivers
DeepSeek’s script assumed a perfect environment. My test VM only had mysql‑connector‑python and psycopg2 installed. The cx_Oracle import blew up. I tried to install it via pip install cx_Oracle, but the package requires Oracle Instant Client libraries, which I didn’t have on this Linux container. The error log looked like this:

$ pip install cx_Oracle
Collecting cx_Oracle
  ...
ERROR: Could not find a valid Oracle Instant Client package. Install manually.

Failure #2 - Hallucinated Schema Queries
After I removed the Oracle check (to keep the script runnable), I asked DeepSeek for a “quick fix” to drop the Oracle block and replace it with a placeholder. The new script arrived with queries like SELECT * FROM dba_tables WHERE bytes > 100*1024*1024. That table doesn’t exist in MySQL. The AI mistakenly included Oracle-specific queries in the MySQL script. Running it produced:

OperationalError: Unknown table 'dba_tables' in engine.

Failure #3 - Incorrect JSON Structure
My third prompt asked DeepSeek to “just give me a working script that outputs the findings as a CSV instead of JSON.” The AI obliged with a 28‑line script that used pandas to write CSV, but it never imported pandas. The script crashed with NameError: name 'pandas' is not defined. Moreover, the CSV header I got was db_name,timestamp,findings_type,object_name,details, which didn’t match the business requirement of a nested JSON array. My team rejected the output because they needed the exact JSON format for their downstream reporting pipeline.

Each failure taught me a pattern: the AI writes great‑looking code but assumes an environment you may not have, and it can mix vendor‑specific syntax without warning. My next approach was to design the prompt around constraints and validate the output before trusting it.

What I Had to Fix: Human Intervention and Debugging

For the fourth attempt, I changed my approach. I asked DeepSeek to create smaller, testable functions I could piece together. I also added a “validation step” in the prompt: “Please include a simple if __name__ == '__main__': block that prints a sample output so I can verify the JSON shape.” I also requested that the script detect missing drivers and skip the DB with a clear warning.

Here’s the revised prompt (token count 378, cost $0.0051):

Prompt:
Write a Python CLI tool that audits a list of databases (Oracle, MySQL, PostgreSQL). Requirements:
- Read a config file (JSON) with entries: db_type, host, port, user, password.
- For each DB, run three checks:
  1. List tables > 100 MB (use appropriate size column per DB).
  2. Find columns of type FLOAT/REAL.
  3. Identify schemas/users not used in last 30 days (use last_dml if available, else skip).
- Output a JSON report: { "db_name": "...", "timestamp": "...", "findings": [ { "type": "...", "object_name": "...", "details": "..." } ] }.
- Use argparse with --config and optional --output.
- Gracefully handle import errors: try to import the driver, if missing, print "WARNING: <driver> not installed, skipping <db>" and continue.
- Keep the script under 25 lines of functional code.
- Include a __main__ block that prints a sample report (just for verification).

Please provide the full script with inline comments.

DeepSeek returned a 22‑line script (file size 1.2 KB). The first time I ran it, it printed the warning for Oracle and proceeded with MySQL and PostgreSQL. The JSON output looked correct, but I noticed the timestamp was a naive string instead of ISO‑8601. My team’s downstream parser expected 2024‑09‑12T14:35:00Z. I edited the script to use datetime.utcnow().isoformat() + 'Z'. I also added a --quiet flag to suppress warnings if the user wants clean output. The final script ran in 2.3 seconds on my test VM, processing three databases, and produced a clean JSON file of 3.4 KB.

Here’s the before/after comparison:

Before (AI‑only) - broken snippet:

import cx_Oracle, mysql.connector, psycopg2
def audit_db(...):
    # hallucinated Oracle query
    cur.execute("SELECT * FROM dba_tables WHERE bytes > 100*1024*1024")

After (my fix) - working snippet:

import json, argparse
from datetime import datetime

def safe_import(module, name):
    try:
        return __import__(module)
    except ImportError:
        print(f"WARNING: {name} not installed, skipping DB")
        return None

# driver placeholders
cx_Oracle = safe_import('cx_Oracle', 'Oracle driver')
mysql_connector = safe_import('mysql.connector', 'MySQL driver')
psycopg2 = safe_import('psycopg2', 'PostgreSQL driver')

def audit_db(db_type, config):
    # generic size check using DB‑specific queries
    if db_type == 'oracle':
        cur.execute("SELECT table_name, num_rows, empty_blocks FROM user_tables WHERE (num_rows*avg_row_len)/1024/1024 > 100")
    elif db_type == 'mysql':
        cur.execute("SELECT table_name, ROUND((data_length+index_length)/1024/1024,2) as size_mb FROM information_schema.tables WHERE table_schema=DATABASE()")
    # ... other checks
    findings = []
    # append findings...
    return {"db_name": config.get('name'), "timestamp": datetime.utcnow().isoformat()+'Z', "findings": findings}

The script now works as needed, handling missing drivers and producing the correct JSON output.

How the Final Script Works: Result and Output

Running ./db_audit.py --config config.json on my lab environment (Ubuntu 22.04, Python 3.10) produced the following output in 2.3 seconds:

$ python db_audit.py --config config.json --output report.json
[INFO] Loaded 3 database configs.
[INFO] Oracle driver not installed, skipping.
[INFO] MySQL driver installed, auditing.
[INFO] PostgreSQL driver installed, auditing.
[INFO] Audit complete. Report written to report.json

The report.json file (size 3.4 KB) contained:

{
  "audit_results": [
    {
      "db_name": "prod_mysql",
      "timestamp": "2024‑09‑12T14:35:02Z",
      "findings": [
        {
          "type": "large_table",
          "object_name": "orders_2023",
          "details": "Size: 124.5 MB"
        },
        {
          "type": "float_column",
          "object_name": "prices",
          "details": "Column type: FLOAT"
        }
      ]
    },
    {
      "db_name": "prod_pg",
      "timestamp": "2024‑09‑12T14:35:02Z",
      "findings": []
    }
  ]
}

The script also supports a --quiet flag to suppress the informational logs, which is handy for cron jobs. I added it after my team complained about the noise in their CI pipelines. The final CLI tool is now part of our weekly automation pipeline, running every Monday at 02:00 UTC and feeding the JSON into our Slack bot for immediate alerts.

What I Learned About Prompt Engineering and AI Limitations

  1. Be specific about constraints. By explicitly asking for “under 25 lines” and “handle import errors,” I forced the AI to stay within realistic boundaries.
  2. Validate before you trust. The AI’s output looked perfect, but only after running python -m py_compile db_audit.py did I catch the missing import.
  3. Don’t assume the environment. My first prompt said “assume drivers are installed.” The second prompt added a safety net, which cut my debugging time from 45 minutes to 10 minutes.
  4. Iterate with smaller pieces. Breaking the script into functions reduced the chance of a single hallucinated query breaking the whole workflow.
  5. Document the AI’s cost. DeepSeek quoted a cost of $0.004 for the first attempt and $0.0051 for the final one. For a non‑developer, that’s less than a coffee, and the ROI is immediate because the script saves ~2 hours per week.

Overall, the experiment taught me that AI is an excellent “code junior” when you pair it with a senior’s eye for edge cases. The key is to ask the right questions and then iterate, not to expect a perfect hand‑off on the first try.

The Exact Prompt You Can Copy‑Paste

Prompt:
Write a Python CLI tool that audits a list of databases (Oracle, MySQL, PostgreSQL). Requirements:
- Read a config file (JSON) with entries: db_type, host, port, user, password.
- For each DB, run three checks:
  1. List tables > 100 MB (use appropriate size column per DB).
  2. Find columns of type FLOAT/REAL.
  3. Identify schemas/users not used in last 30 days (use last_dml if available, else skip).
- Output a JSON report: { "db_name": "...", "timestamp": "...", "findings": [ { "type": "...", "object_name": "...", "details": "..." } ] }.
- Use argparse with --config and optional --output.
- Gracefully handle import errors: try to import the driver, if missing, print "WARNING: <driver> not installed, skipping <db>" and continue.
- Keep the script under 25 lines of functional code.
- Include a __main__ block that prints a sample report (just for verification).

Please provide the full script with inline comments.

Copy the block above, paste it into DeepSeek (or OpenCode), and you’ll get the same 22‑line script I refined. Adjust the config JSON to match your environment, and you’re good to go.

FAQ: Real Questions About Building Scripts with DeepSeek

Q: Do I need any Python packages beyond the DB connectors?
A: Only json and argparse are required; they are part of the standard library. If you add --quiet or logging, you’ll need logging (also standard). The script deliberately avoids extra dependencies.

Q: How do I handle the Oracle driver if I don’t have Instant Client?
A: The script checks for cx_Oracle import and prints a warning, then skips the Oracle DB. For a production environment you’ll still need the client libraries; the script’s warning helps you know which DBs are being ignored.

Q: Can I extend the script to support SQLite or SQL Server?
A: Yes-just add another elif block for the new DB type, update the size and column queries, and include the appropriate driver import. The same prompt engineering works; just tweak the queries.

Q: What if the config file is malformed?
A: The script uses json.load. If it fails, argparse will raise json.JSONDecodeError and you’ll see a traceback. For a more robust solution you could wrap it in a try/except and log the error, but for a quick audit tool the default behavior is fine.

Q: How do I schedule this script?
A: Add a cron entry like */7 * * * * /usr/bin/python3 /opt/scripts/db_audit.py --config /opt/configs/db_audit.json --output /var/log/db_audit_$(date +\%Y\%m\%d).json. The script runs in under 3 seconds, so it fits comfortably within a weekly window.


What task would you automate with this approach? Feel free to share your prompts and the AI’s output in the comments-let’s build in public together!

Frequently Asked Questions

Do I need any Python packages beyond the DB connectors?
Only `json` and `argparse` are required; they are part of the standard library. If you add `--quiet` or logging, you’ll need `logging` (also standard). The script deliberately avoids extra dependencies.
How do I handle the Oracle driver if I don’t have Instant Client?
The script checks for `cx_Oracle` import and prints a warning, then skips the Oracle DB. For a production environment you’ll still need the client libraries; the script’s warning helps you know which DBs are being ignored.
Can I extend the script to support SQLite or SQL Server?
Yes-just add another `elif` block for the new DB type, update the size and column queries, and include the appropriate driver import. The same prompt engineering works; just tweak the queries.
What if the config file is malformed?
The script uses `json.load`. If it fails, argparse will raise `json.JSONDecodeError` and you’ll see a traceback. For a more robust solution you could wrap it in a try/except and log the error, but for a quick audit tool the default behavior is fine.
How do I schedule this script?
Add a cron entry like `0 7 * * 1 /usr/bin/python3 /opt/scripts/db_audit.py --config /opt/configs/db_audit.json --output /var/log/db_audit_$(date +%Y%m%d).json`. The script runs in under 3 seconds, so it fits comfortably within a weekly window.
P

Praveen

Technology enthusiast helping people work smarter with practical guides and AI workflows.

Explore more: Browse all build in public guides or check related articles below.