AI Engineering — Start Here
If you’re a developer adding LLM features to a production app, this is your reading order. These guides cover the infrastructure decisions that determine whether your AI feature works at scale — not just in a demo.
Assumed knowledge: You’re a working developer comfortable with TypeScript or Python. You’ve used an LLM API (OpenAI, Anthropic, etc.) at least once.
Step 1 — Learn the Vocabulary First
Before building anything, make sure you understand the terms. Misunderstanding these leads to expensive architecture mistakes.
- 📄 9 AI Terms Every Developer Must Know in 2026 — context windows, KV cache, quantization, evals and more explained clearly
- 📄 Context Window vs Context Collapse — why a bigger context window backfires
Step 2 — Understand the Infrastructure Costs
Most developers are surprised by their LLM bill. These three guides explain exactly where the money goes and how to control it.
- 📄 Why Your AI Bill Is Almost All Inference — the levers that cut it
- 📄 KV Cache: Why Long Context Isn’t Free — the memory math behind every long prompt
- 📄 LLM Quantization: FP16 to INT4 Explained — cut VRAM by 75% without wrecking accuracy
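To make the memory math concrete, here is a rough back-of-envelope sketch. The model shape (7B parameters, 32 layers, 32 KV heads, head dimension 128) is a hypothetical example chosen for illustration, not a figure from the guides. The weight formula counts raw weights only, so real quantized formats, which also store scales and zero points, save slightly less than the ideal 75%.

```python
GIB = 2**30  # bytes per GiB


def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for raw model weights, ignoring quantization metadata."""
    return n_params * bits_per_param / 8 / GIB


def kv_cache_gib(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int = 1,
    bytes_per_value: int = 2,  # 2 bytes assumes an FP16 cache
) -> float:
    """Approximate KV cache size: one key and one value tensor per layer."""
    values = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size
    return values * bytes_per_value / GIB


# Hypothetical 7B-parameter model (assumed shape, for illustration only).
fp16 = weight_memory_gib(7e9, 16)
int4 = weight_memory_gib(7e9, 4)
saving = 1 - int4 / fp16  # fraction of weight memory saved by going to 4 bits

cache_4k = kv_cache_gib(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)

print(f"FP16 weights: {fp16:.2f} GiB, INT4 weights: {int4:.2f} GiB")
print(f"Weight memory saved: {saving:.0%}")
print(f"KV cache at 4096 tokens: {cache_4k:.2f} GiB")
```

Under these assumptions the KV cache grows linearly with sequence length and batch size, which is why a long prompt costs memory even when the weights are heavily quantized.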
Step 3 — Build a RAG System That Actually Works
RAG is one of the most common LLM architectures in production apps. These guides cover the two decisions that make or break retrieval quality.
- 📄 Chunking Is Quietly Breaking Your RAG System — diagnose and fix bad retrieval
- 📄 GraphRAG vs Vector RAG — when relationships beat chunks
Step 4 — Ship Safely
Before you deploy an LLM feature to real users, you need two things: a way to catch regressions and a way to block unsafe outputs.
- 📄 How to Build Evals for Your AI Features — golden datasets, LLM-as-judge, CI gating
- 📄 How to Add Guardrails to an LLM App — input filters, in-flight constraints, output validation
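As a rough illustration of what a golden dataset with CI gating might look like, here is a minimal sketch. Every name in it (`GoldenCase`, `pass_rate`, `gate`, `fake_generate`) is hypothetical, the substring check is a deliberately simple stand-in for a real scorer such as an LLM-as-judge, and the 0.9 threshold is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GoldenCase:
    """One hand-verified example: a prompt and a string the answer must contain."""
    prompt: str
    must_contain: str


def pass_rate(cases: Sequence[GoldenCase], generate: Callable[[str], str]) -> float:
    """Fraction of golden cases whose output contains the expected string."""
    passed = sum(
        1 for case in cases
        if case.must_contain.lower() in generate(case.prompt).lower()
    )
    return passed / len(cases)


def gate(
    cases: Sequence[GoldenCase],
    generate: Callable[[str], str],
    threshold: float = 0.9,  # assumed threshold; tune per feature
) -> bool:
    """Return True when the pass rate meets the threshold."""
    return pass_rate(cases, generate) >= threshold


def fake_generate(prompt: str) -> str:
    """Stand-in for a real LLM call, so the sketch runs offline."""
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "What is 2 + 2?": "2 + 2 equals 4.",
    }
    return canned.get(prompt, "")


GOLDEN = [
    GoldenCase("What is the capital of France?", "Paris"),
    GoldenCase("What is 2 + 2?", "4"),
]

ok = gate(GOLDEN, fake_generate)
print("eval gate passed" if ok else "eval gate failed")
```

In a CI job, the script would typically exit with a non-zero status when `gate` returns `False`, so a prompt or model change that drops the pass rate blocks the merge.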
Get the Weekly AI Engineering Guide
Every week: one pattern, one decision, one implementation — for developers building LLM features in production.
About the Author
Mahmoud Hussien is a frontend engineer with 19 years of experience, currently focused on applied AI engineering for production applications. He writes about the decisions he makes in real projects — infrastructure costs, retrieval architecture, evaluation systems, and deployment patterns.