A practitioner's reference for matching local LLM parameter counts to real-world jobs. What each tier can and can't do — with offensive security context.
**1-3B:** Essentially unusable for security tooling. Can't reliably parse scan output, hallucinates flags and tool syntax, and can't chain reasoning steps. The only viable use is as a fast classifier for log triage or alert categorization where you've fine-tuned on your own labeled data.
**7-8B:** Marginal for security work. Can assist with simple tasks like parsing a single nmap scan or explaining a known CVE, but struggles with anything requiring judgment — choosing between attack paths, chaining findings, or generating reliable exploit code. Fine-tuned coding variants (e.g., CodeLlama 7B, DeepSeek Coder 6.7B) are better for script assistance but still hallucinate tool flags regularly. Generally not usable for agentic security loops — expect ~50-60% JSON reliability.
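A figure like "~50-60% JSON reliability" is easy to measure for your own model and prompts. The sketch below is a minimal harness for doing so; `generate` is a hypothetical stand-in for whatever calls your local model (a llama.cpp server, Ollama, etc.) and returns raw text — swap in your real inference call.

```python
import json

def structured_output_rate(generate, prompt, required_keys, n=20):
    """Fraction of n samples that parse as JSON and contain the expected keys.

    `generate(prompt)` is a placeholder for your local inference call.
    """
    ok = 0
    for _ in range(n):
        raw = generate(prompt)
        try:
            obj = json.loads(raw)
        except (json.JSONDecodeError, TypeError):
            continue  # model emitted prose, markdown fences, or junk
        if isinstance(obj, dict) and all(k in obj for k in required_keys):
            ok += 1
    return ok / n

# Demo with a canned "model" that alternates valid and invalid replies:
outputs = iter(['{"tool": "nmap", "args": "-sV"}',
                'Sure! Here is the JSON: {...}'] * 10)
rate = structured_output_rate(lambda p: next(outputs),
                              "plan next step", ["tool", "args"])
```

Run it at temperature settings matching your real workflow; reliability drops noticeably at higher temperatures and longer required schemas.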
**13-14B:** Starting to become useful. Can parse nmap/Nessus output with decent accuracy, generate basic enumeration scripts, and explain CVEs with acceptable fidelity. Can handle simple structured output for tool integration. Still struggles with multi-step attack path reasoning and will confidently suggest wrong flags or non-existent tool features. Fine-tuned variants on security data could be viable for focused tasks like log analysis or IOC extraction. This is the bare minimum for agentic security workflows — it handles simple targets with some retries.
**27-32B:** The practical sweet spot for local security tooling. Can reliably parse complex scan output, generate working enumeration scripts, reason about attack paths across multiple findings, and maintain context in agentic loops. AD attack chain reasoning becomes viable — the model can connect Kerberoasting results to delegation abuse opportunities with reasonable accuracy. Exploit code generation is functional but should be validated. Fits a single RTX 3090 at Q4, making it ideal for homelab setups. The recommended tier for local agentic security platforms.
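Even at this tier, "some retries" is doing real work in an agentic loop. A minimal sketch of a retry wrapper, assuming a hypothetical `generate(prompt)` function in place of your actual inference call:

```python
import json

def get_tool_call(generate, prompt, max_retries=3):
    """Ask the model for a JSON tool call; re-prompt on malformed output.

    `generate(prompt)` is a placeholder for your local inference call.
    A wrapper like this keeps the agent loop from derailing on the
    occasional malformed response.
    """
    last_error = None
    for attempt in range(max_retries):
        raw = generate(prompt if attempt == 0
                       else f"{prompt}\n\nYour last reply was not valid JSON "
                            f"({last_error}). Reply with ONLY a JSON object.")
        try:
            call = json.loads(raw)
            if isinstance(call, dict) and "tool" in call:
                return call
            last_error = "missing 'tool' key"
        except json.JSONDecodeError as e:
            last_error = str(e)
    raise RuntimeError(f"no valid tool call after {max_retries} attempts: {last_error}")

# Simulate a model that fails once, then answers correctly:
replies = iter(["here you go:", '{"tool": "nmap", "args": ["-sV", "10.0.0.5"]}'])
call = get_tool_call(lambda p: next(replies), "Plan the next enumeration step.")
```

Feeding the parse error back into the re-prompt, rather than just resending the original prompt, meaningfully improves recovery rates with mid-size models.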
**70B:** Best local option for serious autonomous security tooling. Complex AD attack chain reasoning, multi-step exploit development, and nuanced vulnerability analysis are all viable. Can reason about environmental context during engagements — understanding network topology, making pivot decisions, and adapting exploitation strategy based on partial information. A dual RTX 3090 setup can run this tier at Q4. The tradeoff is speed: inference is significantly slower, so real-time interactive use during a live engagement can feel sluggish. Best deployed for pre/post-engagement analysis or overnight autonomous scanning.
**Cloud frontier models:** Maximum capability, but with operational constraints. Frontier models like Claude Opus, GPT-4, and Gemini Pro excel at everything from complex exploit chain reasoning to report generation. The critical consideration is data sensitivity: sending client network data, credentials, or engagement details through a cloud API requires careful evaluation of your ROE, client agreements, and the provider's data handling policies. Best suited for: pre-engagement planning, methodology development, tooling creation, report writing (with sanitized data), training/learning, and CTF work where data sensitivity isn't a concern. For live engagement data, a local model or a provider with a signed DPA is more appropriate.
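"With sanitized data" usually means a scrubbing pass before anything leaves the box. The patterns below are simplistic illustrations, not a complete scrubber; real engagement data needs manual review against your ROE before any upload, and a regex pass should only be the first layer.

```python
import re

# Illustrative redaction patterns for text headed to a cloud API.
# These are examples, not exhaustive: hostnames, hashes, internal
# domain names, and ticket blobs all need their own handling.
PATTERNS = [
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<REDACTED_IP>"),
    (re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
     "<REDACTED_EMAIL>"),
    (re.compile(r"(?i)(password|passwd|pwd)\s*[:=]\s*\S+"), r"\1=<REDACTED>"),
]

def sanitize(text):
    """Apply each redaction pattern in order and return the scrubbed text."""
    for pattern, repl in PATTERNS:
        text = pattern.sub(repl, text)
    return text

sample = "Admin login at 192.168.1.10, password: Winter2024! (contact bob@corp.local)"
clean = sanitize(sample)
```

Keep the redaction mapping local if you need to reverse it when the model's answer comes back.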
These tiers are guidelines, not hard rules. Quantization level, context length, task-specific fine-tuning, and the quality of your prompting and agent scaffolding all shift performance significantly in practice.
✅ = Comfortable fit with KV cache headroom · ⚠️ = Fits but tight at longer contexts · ❌ = Won't fit / requires offloading
| GPU | VRAM | 1-3B | 7-8B | 13-14B | 27-32B | 70B |
|---|---|---|---|---|---|---|
| **NVIDIA 30 SERIES** | | | | | | |
| RTX 3060 | 12 GB | ✅ | ✅ | ⚠️ | ❌ | ❌ |
| RTX 3070 | 8 GB | ✅ | ⚠️ | ❌ | ❌ | ❌ |
| RTX 3070 Ti | 8 GB | ✅ | ⚠️ | ❌ | ❌ | ❌ |
| RTX 3080 | 10/12 GB | ✅ | ✅ | ⚠️ | ❌ | ❌ |
| RTX 3080 Ti | 12 GB | ✅ | ✅ | ⚠️ | ❌ | ❌ |
| RTX 3090 | 24 GB | ✅ | ✅ | ✅ | ✅ | ❌ |
| RTX 3090 Ti | 24 GB | ✅ | ✅ | ✅ | ✅ | ❌ |
| 2× RTX 3090 | 48 GB | ✅ | ✅ | ✅ | ✅ | ⚠️ |
| **NVIDIA 40 SERIES** | | | | | | |
| RTX 4060 | 8 GB | ✅ | ⚠️ | ❌ | ❌ | ❌ |
| RTX 4060 Ti | 8/16 GB | ✅ | ✅ | ⚠️ (16GB) | ❌ | ❌ |
| RTX 4070 | 12 GB | ✅ | ✅ | ⚠️ | ❌ | ❌ |
| RTX 4070 Ti | 12 GB | ✅ | ✅ | ⚠️ | ❌ | ❌ |
| RTX 4070 Ti Super | 16 GB | ✅ | ✅ | ✅ | ❌ | ❌ |
| RTX 4080 | 16 GB | ✅ | ✅ | ✅ | ❌ | ❌ |
| RTX 4080 Super | 16 GB | ✅ | ✅ | ✅ | ❌ | ❌ |
| RTX 4090 | 24 GB | ✅ | ✅ | ✅ | ✅ | ❌ |
| 2× RTX 4090 | 48 GB | ✅ | ✅ | ✅ | ✅ | ⚠️ |
| **NVIDIA 50 SERIES** | | | | | | |
| RTX 5060 | 8 GB | ✅ | ⚠️ | ❌ | ❌ | ❌ |
| RTX 5060 Ti (8GB) | 8 GB | ✅ | ⚠️ | ❌ | ❌ | ❌ |
| RTX 5060 Ti (16GB) | 16 GB | ✅ | ✅ | ✅ | ❌ | ❌ |
| RTX 5070 | 12 GB | ✅ | ✅ | ⚠️ | ❌ | ❌ |
| RTX 5070 Ti | 16 GB | ✅ | ✅ | ✅ | ❌ | ❌ |
| RTX 5080 | 16 GB | ✅ | ✅ | ✅ | ❌ | ❌ |
| RTX 5090 | 32 GB | ✅ | ✅ | ✅ | ✅ | ❌ |
| 2× RTX 5090 | 64 GB | ✅ | ✅ | ✅ | ✅ | ✅ |
| **QUICK VRAM REFERENCE** | | | | | | |
| Model Size | — | ~1-2 GB | ~4-6 GB | ~8-10 GB | ~16-22 GB | ~40-48 GB |
Note: VRAM estimates are for model weights only at Q4 quantization. The KV cache grows with context length and will consume additional VRAM during inference. A card marked ⚠️ may work fine with short prompts but degrade or offload to CPU during longer sessions. The 50 series specs reflect announced/released configurations as of early 2026.
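The arithmetic behind the table is simple enough to sketch. The function below estimates quantized weight size plus fp16 KV cache; the default architecture numbers (32 layers, 8 KV heads via GQA, head_dim 128) approximate a Llama-3-8B-style config and are illustrative assumptions — substitute your model's real values.

```python
def estimate_vram_gb(params_b, quant_bits=4, n_layers=32, n_kv_heads=8,
                     head_dim=128, context_len=8192, kv_bytes=2):
    """Back-of-envelope VRAM estimate: quantized weights + fp16 KV cache.

    Defaults approximate a Llama-3-8B-style architecture; they are
    assumptions for illustration, not universal constants.
    """
    # Weights: parameter count times bits per parameter.
    weights_gb = params_b * 1e9 * quant_bits / 8 / 1e9
    # KV cache: K and V each store n_kv_heads * head_dim values
    # per layer per token of context.
    kv_gb = 2 * n_layers * n_kv_heads * head_dim * context_len * kv_bytes / 1e9
    return weights_gb, kv_gb

w, kv = estimate_vram_gb(8)   # 8B model at Q4, 8k context
# w ≈ 4.0 GB of weights, kv ≈ 1.07 GB of KV cache
```

Real quantized files run somewhat larger than the bare weight math (quantization scales, embedding and output layers often kept at higher precision), which is why the table's 7-8B column reads ~4-6 GB rather than a flat 4 GB.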