# Automating VT Hash Check: Scripts and Best Practices
## Why automate VT hash checks
Automating VirusTotal (VT) hash lookups saves time, reduces human error, and scales threat triage for many files. Instead of manually submitting hashes to the web UI, scripts let you batch-query, integrate checks into pipelines (CI/CD, EDR workflows), and trigger downstream actions (quarantine, alerts, ticket creation).
## Common automation goals
- Batch-check large sets of file hashes (MD5/SHA1/SHA256).
- Enrich alerts with VT verdicts and vendor detections.
- Cache results to avoid repeated API calls and rate limits.
- Automatically escalate or block based on thresholds.
- Log and audit all queries for incident investigation.
## Prerequisites
- A VirusTotal API key (public or private).
- Basic scripting knowledge (Python, Bash, PowerShell).
- Hashes to check in a structured form (CSV, JSON, or plain text).
- Secure storage for your API key (environment variables, secrets manager).
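As a minimal sketch, the API key can live in an environment variable so it never appears in the script itself (the key value below is a placeholder):

```shell
# Export the key for the current shell session (placeholder value).
export VT_API_KEY="your-key-here"

# Scripts can then fail fast when the key is missing:
: "${VT_API_KEY:?VT_API_KEY is not set}"
```

For persistent use, prefer a secrets manager or your CI system's secret store over hard-coding the key in shell profiles.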
## Recommended workflow
- Read a list of hashes from a file or alert feed.
- Normalize hashes (trim whitespace, verify length/format).
- Check local cache/database for prior results.
- Query VT API only for uncached hashes, obeying rate limits.
- Parse VT response: detection ratio, first/last submission dates, related indicators.
- Store results in your cache and send relevant alerts/actions.
- Periodically refresh cached results for older entries.
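The normalization step above can be sketched as a small validator; the regexes assume hex-encoded MD5/SHA1/SHA256 digests, and the function name is illustrative:

```python
import re

# Map algorithm name to the expected hex-digest pattern
# (MD5 = 32 hex chars, SHA1 = 40, SHA256 = 64).
HASH_PATTERNS = {
    "md5": re.compile(r"^[0-9a-f]{32}$"),
    "sha1": re.compile(r"^[0-9a-f]{40}$"),
    "sha256": re.compile(r"^[0-9a-f]{64}$"),
}

def normalize_hash(raw):
    """Trim whitespace, lowercase, and classify a hash string.

    Returns (algorithm, normalized_hash), or (None, None) if the
    input is not a valid hex digest of a supported length.
    """
    h = raw.strip().lower()
    for algo, pattern in HASH_PATTERNS.items():
        if pattern.match(h):
            return algo, h
    return None, None
```

Rejecting malformed hashes before querying avoids wasting API quota on lookups that can never match.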
## Example: Python script (SHA256, VT v3 API)
```python
# Requires: requests
# Usage: set VT_API_KEY env var; provide hashes.txt with one SHA256 per line
import json
import os
import time

import requests

VT_API_KEY = os.getenv("VT_API_KEY")
HEADERS = {"x-apikey": VT_API_KEY}
INPUT_FILE = "hashes.txt"
CACHE_FILE = "vt_cache.json"
RATE_LIMIT_SLEEP = 15  # seconds between requests to avoid throttling

def load_cache():
    try:
        with open(CACHE_FILE, "r") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}

def save_cache(cache):
    with open(CACHE_FILE, "w") as f:
        json.dump(cache, f, indent=2)

def query_hash(h):
    url = f"https://www.virustotal.com/api/v3/files/{h}"
    r = requests.get(url, headers=HEADERS, timeout=30)
    if r.status_code == 200:
        return r.json()
    return {"error": r.status_code, "text": r.text}

def parse_result(resp):
    if "error" in resp:
        return {"status": "error", "code": resp["error"]}
    data = resp.get("data", {})
    attrs = data.get("attributes", {})
    stats = attrs.get("last_analysis_stats", {})
    return {
        "malicious": stats.get("malicious", 0),
        "suspicious": stats.get("suspicious", 0),
        "undetected": stats.get("undetected", 0),
        "total_votes": attrs.get("total_votes", {}),
        "first_submission_date": attrs.get("first_submission_date"),
        "last_analysis_date": attrs.get("last_analysis_date"),
        "links": data.get("links", {}),
    }

def main():
    cache = load_cache()
    with open(INPUT_FILE) as f:
        hashes = [line.strip() for line in f if line.strip()]
    for h in hashes:
        if h in cache:
            print(f"{h}: cached -> {cache[h].get('malicious')} malicious")
            continue
        parsed = parse_result(query_hash(h))
        cache[h] = parsed
        print(f"{h}: {parsed.get('malicious', 'err')} malicious")
        save_cache(cache)
        time.sleep(RATE_LIMIT_SLEEP)

if __name__ == "__main__":
    main()
```
## Best practices
- Respect rate limits: Use sleeps, exponential backoff, and monitor HTTP 429 responses.
- Cache aggressively: Store results with timestamps; refresh only when needed.
- Secure API keys: Use environment variables or secrets managers; never hard-code keys.
- Normalize inputs: Validate hash lengths (MD5=32, SHA1=40, SHA256=64 hex chars).
- Graceful error handling: Retry transient failures, log persistent errors for review.
- Use VT enrichment fields: Pull vendor detections, community votes, first/last submission dates, and crowdsourced tags.
- Define action thresholds: e.g., block if malicious vendors ≥ 3, quarantine if suspicious > 0. Tailor thresholds to your risk tolerance.
- Privacy and compliance: Avoid uploading sensitive content; prefer hash lookups over file uploads when privacy is a concern.
- Audit and logging: Keep query logs (without sensitive data) for investigations and compliance.
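The retry-with-backoff practice above can be sketched as a small helper; the function names here are illustrative, not part of any library, and the `sleep` callable is injectable so the delay logic can be tested:

```python
import time

def request_with_backoff(do_request, max_retries=5, base_delay=2.0,
                         sleep=time.sleep):
    """Call do_request() and retry on HTTP 429 / transient 5xx responses.

    do_request must return an object with a .status_code attribute
    (e.g. a requests.Response). The delay doubles on each retry:
    2s, 4s, 8s, ... After max_retries the last response is returned
    so the caller can log the persistent failure.
    """
    resp = None
    for attempt in range(max_retries):
        resp = do_request()
        if resp.status_code not in (429, 500, 502, 503):
            return resp
        sleep(base_delay * (2 ** attempt))  # exponential backoff
    return resp
```

A jittered delay (adding a small random component) further reduces the chance of many workers retrying in lockstep.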
## Integrations and scaling tips
- Push results to SIEM (Splunk, Elastic) or ticketing systems (Jira, ServiceNow).
- Use serverless functions (AWS Lambda, Azure Functions) for on-demand checks.
- Parallelize with worker queues but shard to respect per-key rate limits.
- Rotate API keys or use multiple keys/accounts if volume requires it.
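Round-robin rotation across several keys can be sketched with `itertools.cycle`; the key values are placeholders, and in practice you would load them from a secrets manager:

```python
import itertools

# Placeholder keys; load real keys from a secrets manager.
API_KEYS = ["key-a", "key-b", "key-c"]
_key_cycle = itertools.cycle(API_KEYS)

def next_headers():
    """Return VT v3 request headers using the next key in rotation."""
    return {"x-apikey": next(_key_cycle)}
```

Note that rotation only spreads load: each key's own rate limit still applies, so per-key throttling must be enforced alongside the rotation.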
## Quick decision matrix
| Use case | Recommended approach |
|---|---|
| One-off checks | Manual VT UI or simple script |
| Batch daily feeds | Scheduled script with cache and logging |
| Real-time alerts | Integrate into EDR/SIEM with async workers |
| High-volume automation | Sharded workers, multiple API keys, backoff logic |
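The action thresholds mentioned under best practices (e.g. block at ≥ 3 malicious vendors) can be encoded as a small policy function; the default numbers below are illustrative, not recommendations:

```python
def decide_action(stats, block_at=3, quarantine_at=1):
    """Map VT last_analysis_stats counts to an action string.

    stats: dict with 'malicious' and 'suspicious' vendor counts.
    Thresholds are example defaults; tune them to your risk tolerance.
    """
    malicious = stats.get("malicious", 0)
    suspicious = stats.get("suspicious", 0)
    if malicious >= block_at:
        return "block"
    if malicious > 0 or suspicious >= quarantine_at:
        return "quarantine"
    return "allow"
```

Keeping the policy in one function makes the thresholds easy to document, audit, and adjust without touching the query pipeline.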
## Final checklist before production
- API key stored securely and tested.
- Rate limiting and retry logic implemented.
- Caching and expiry policy defined.
- Alert/enforcement thresholds documented.
- Logging and monitoring in place.