Tac Is a Map for the Agent Stack

Most AI writeups start at the model.

Tac starts lower.

A model can look excellent in a benchmark and still fail in production. The context is too long. The cache is wrong. The routing layer is blind. The tool loop falls apart. The evals are missing.

Tac exists because those failures are not edge cases. They are the work.

Tac is not a tutorial. It is a reference.

That choice matters. A tutorial tells you how to get something working once. A reference helps you decide what to build, where the failure modes live, and what tradeoff you are accepting.

What Tac Is

Tac is a structured map of the agentic AI stack. The site is organized around production decisions, not buzzwords. Each layer starts with the question a builder actually faces, then explains the tradeoff, then ends with a reality check.

That means you can open it when you are choosing a model family, designing a cache strategy, deciding whether to self-host serving, picking a runtime, or adding evals before a release.

The stack is four layers deep.

┌─────────────────────────────────────────────────────────────┐
│  Agent Applications   ← coding agents, assistants, support  │
├─────────────────────────────────────────────────────────────┤
│  Agent Runtime        ← tool loops, MCP, memory, planning   │
├─────────────────────────────────────────────────────────────┤
│  Agent Infrastructure ← serving, caching, routing, observ.  │
├─────────────────────────────────────────────────────────────┤
│  Foundation Models    ← transformers, tokenizers, weights   │
└─────────────────────────────────────────────────────────────┘

Foundation Models

This is the bottom of the stack. Tokenization, context windows, sampling, quantization, pricing, and model choice all start here.

The repo treats model selection as a production decision, not a benchmark contest. That is the right framing. A model family can be strong and still be wrong for your workload if the latency budget, output format, or cost profile does not fit.

Tac covers the parts that usually get hand-waved away:

tokens and cost
context windows
sampling behavior
quantization
reasoning model tradeoffs

That matters because every layer above foundation models inherits its constraints. If the model is expensive to call, the rest of the stack has to compensate. If the context window is big but unreliable, you need a different retrieval strategy. If structured output is the real requirement, prompting alone is not enough.

Tac is honest about that. A large context window does not make retrieval reliable. A cheap model does not make routing unnecessary. A good benchmark score does not make a good system.
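The cost constraint is easier to reason about with numbers. Here is a minimal sketch of per-request cost arithmetic. The prices, token counts, and request volume are hypothetical placeholders, not figures from Tac or from any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD (placeholders, not real rates)."""
    input_per_mtok: float
    output_per_mtok: float


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call: tokens times price, scaled down from per-million pricing."""
    total = (
        input_tokens * pricing.input_per_mtok
        + output_tokens * pricing.output_per_mtok
    )
    return total / 1_000_000


def monthly_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                 requests_per_day: int, days: int = 30) -> float:
    """Per-call cost multiplied out to a month of steady traffic."""
    return request_cost(pricing, input_tokens, output_tokens) * requests_per_day * days


# Illustrative numbers only.
large = Pricing(input_per_mtok=3.00, output_per_mtok=15.00)
small = Pricing(input_per_mtok=0.25, output_per_mtok=1.25)

for name, pricing in [("large", large), ("small", small)]:
    per_call = request_cost(pricing, input_tokens=8_000, output_tokens=500)
    per_month = monthly_cost(pricing, 8_000, 500, requests_per_day=100_000)
    print(f"{name}: ${per_call:.4f} per call, ${per_month:,.0f} per month")
```

With these made-up numbers the gap is roughly twelve to one. That kind of gap is why model choice shapes every layer above it.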

Agent Infrastructure

This is the layer that turns inference from a demo into something you can actually run. Serving, caching, routing, observability, and rate limits all live here.

This is also where the economics show up. Managed APIs are simple, but they are not free at scale. Self-hosting gives you control, but it gives you the bill too. Prompt caching can cut repeated context costs sharply if your prefixes are stable. Routing can save money if the classifier is accurate and lightweight.

Tac does not treat those as back-end details. They are the product.

That is why the infrastructure section exists as its own layer. A notebook workflow can ignore these questions. A production agent cannot. If you are calling models millions of times, caching and routing are not optimization work. They are the difference between a viable product and a burn rate problem.

The infrastructure topics reflect that:

serving
prompt caching
latency
KV cache and quantization
rate limits and concurrency
observability

That is the middle of the stack for a reason. It is where the cost and reliability problems stop being theoretical.
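The point about stable prefixes can be sketched with a toy simulation. This is not how any specific provider implements prompt caching. `PrefixCacheSim` and `prefix_key` are hypothetical names, and real caches match on token prefixes under provider-specific rules. The sketch only shows why a volatile value at the front of the prompt defeats reuse.

```python
import hashlib


def prefix_key(stable_parts: list[str]) -> str:
    """Hash of the prompt prefix; identical prefixes produce identical keys."""
    joined = "\n".join(stable_parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


class PrefixCacheSim:
    """Toy in-memory stand-in for a provider-side prefix cache (hypothetical)."""

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.hits = 0
        self.misses = 0

    def lookup(self, prefix_parts: list[str]) -> bool:
        """Record a hit if this exact prefix was seen before, else a miss."""
        key = prefix_key(prefix_parts)
        if key in self.seen:
            self.hits += 1
            return True
        self.seen.add(key)
        self.misses += 1
        return False


SYSTEM = "You are a support assistant."
TOOLS = "tools: search_orders, refund_order"
USER_MESSAGES = ["where is my order?", "cancel it", "thanks"]

# Stable prefix: the user message changes, but it sits after the cached part.
stable_cache = PrefixCacheSim()
for _ in USER_MESSAGES:
    stable_cache.lookup([SYSTEM, TOOLS])

# Churning prefix: a per-request value at the front changes the key every time.
churn_cache = PrefixCacheSim()
for i, _ in enumerate(USER_MESSAGES):
    churn_cache.lookup([f"request_id={i}", SYSTEM, TOOLS])

print(stable_cache.hits, stable_cache.misses)  # 2 1
print(churn_cache.hits, churn_cache.misses)    # 0 3
```

The design choice is ordering: keep the system prompt and tool definitions first, and push anything that changes per request to the end.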

Agent Runtime

This is the layer between model output and real action.

The runtime is where tool loops, memory, planning, frameworks, and protocol plumbing live. This is the layer that decides whether your model call becomes a helpful action or just another text response.

Tac focuses on the questions that actually break systems:

Do you need a framework, or is a raw loop enough?
What happens when a tool fails mid-task?
How much state should persist across turns?
When does orchestration help, and when does it add noise?

The docs are clear that tool design matters a lot. One action per tool. Structured responses. Idempotence where possible. Clear failure messages. A short tool list. Those sound like small details until you are debugging a bad agent and the whole run is one tangled context blob.
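Those tool rules can be sketched in a raw loop with no framework. This is a minimal illustration, not Tac's code or any SDK's API. `ToolResult`, `refund_order`, `run_tool`, `run_loop`, and the scripted model are all hypothetical names, and the "model" is a fixed script so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolResult:
    """Structured response: the loop never has to parse free text."""
    ok: bool
    data: dict[str, Any] = field(default_factory=dict)
    error: str = ""


_refunds: dict[str, dict[str, Any]] = {}  # stand-in for a real datastore


def refund_order(order_id: str, idempotency_key: str) -> ToolResult:
    """One action per tool. Repeating a call with the same key changes nothing."""
    if not order_id.startswith("ord_"):
        return ToolResult(
            ok=False,
            error=f"unknown order id {order_id!r}; expected an id starting with 'ord_'",
        )
    if idempotency_key not in _refunds:
        _refunds[idempotency_key] = {"order_id": order_id, "status": "refunded"}
    return ToolResult(ok=True, data=_refunds[idempotency_key])


TOOLS: dict[str, Callable[..., ToolResult]] = {"refund_order": refund_order}


def run_tool(name: str, args: dict[str, Any]) -> ToolResult:
    """Dispatch a tool call; every failure becomes a message the model can read."""
    tool = TOOLS.get(name)
    if tool is None:
        return ToolResult(ok=False, error=f"no tool named {name!r}; available: {sorted(TOOLS)}")
    try:
        return tool(**args)
    except TypeError as exc:  # wrong or missing arguments
        return ToolResult(ok=False, error=f"bad arguments for {name}: {exc}")


Step = dict[str, Any]


def run_loop(model_step: Callable[[list[Step]], Step], max_steps: int = 5) -> list[Step]:
    """Raw loop: ask for the next step, run the tool, feed the result back."""
    history: list[Step] = []
    for _ in range(max_steps):
        step = model_step(history)
        if step["type"] == "final":
            history.append(step)
            break
        result = run_tool(step["tool"], step["args"])
        history.append({"type": "tool_result", "tool": step["tool"],
                        "ok": result.ok, "data": result.data, "error": result.error})
    return history


def scripted_model(history: list[Step]) -> Step:
    """Fake model: a bad call first, then a corrected one, then a final answer."""
    script: list[Step] = [
        {"type": "tool_call", "tool": "refund_order",
         "args": {"order_id": "42", "idempotency_key": "k1"}},
        {"type": "tool_call", "tool": "refund_order",
         "args": {"order_id": "ord_42", "idempotency_key": "k1"}},
        {"type": "final", "text": "Refund issued."},
    ]
    return script[len(history)]


history = run_loop(scripted_model)
for entry in history:
    print(entry)
```

The first call fails with a message that says what a valid id looks like, so the next step can correct it. The `max_steps` cap keeps a confused run from looping forever.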

Tac also treats MCP the way it should be treated: as a protocol, not a magic solution. It helps with tool discovery and reuse. It does not solve security. It does not replace trust boundaries. It does not save you from prompt injection in tool results.

That is the right amount of skepticism.

The runtime section also makes a useful point about frameworks. LangGraph, CrewAI, AutoGen, Claude Code SDK, and raw loops all trade control for abstraction in different ways. Tac does not pretend there is one winner. It just helps you see the cost of each choice.

Agent Applications

This is the top of the stack. Coding agents, assistants, support bots, research systems. The user sees this layer, but the layers below determine whether it survives contact with reality.

The repo frames this as build versus buy. That is the right question. The useful decision is not “Can we make an agent?” The useful decision is “What do we own, what do we outsource, and where do we want the failure to sit?”

That is why the application layer is so practical. The same stack can produce very different products:

a coding agent with file and terminal access
a personal assistant with cross-session memory
a support bot with narrow domain knowledge
a research agent that browses, reads, and synthesizes

Each one puts different pressure on the lower layers. Coding agents need strong tool use and fast feedback. Support bots need retrieval and escalation paths. Research agents need source quality and citation discipline. Personal assistants need memory and long-term context handling.

Tac is useful because it keeps those differences visible.

Why The Structure Works

The best part of Tac is not that it covers a lot of topics. It is that the topics are arranged around decisions.

That matters because most AI material is organized around hype cycles. New model. New framework. New agent. New acronym. You can read a lot of that and still not know whether you should self-host, cache, route, benchmark, or simply lower your expectations.

Tac refuses that shape. It asks the builder to start with the problem. Then it maps the layer that owns the problem. Then it shows the tradeoff. Then it adds a production reality section so the nice theory does not drift away from the messy part.

That is the part I wanted from the start.

If you are building agentic systems, you do not need more hype. You need a map that still makes sense when the first design choice turns into a billing issue, a latency issue, a reliability issue, or a security issue.

Tac is trying to be that map.

Production Reality

Benchmarks do not predict production performance. A model that looks best on paper may still fail on your workload. Tac keeps the discussion grounded in cost, latency, and reliability instead of leaderboard theater.
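One way to check a model against your own workload is a small eval harness. The sketch below is an assumption about what that could look like, not something taken from Tac. `Case`, `run_eval`, and `fake_model` are hypothetical names, and `fake_model` stands in for a real model client.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    """One workload example plus a check that decides if the output is acceptable."""
    prompt: str
    check: Callable[[str], bool]


def run_eval(model: Callable[[str], str], cases: list[Case]) -> dict[str, float]:
    """Run every case once and report pass rate and worst-case latency."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = 0
    latencies: list[float] = []
    for case in cases:
        start = time.perf_counter()
        output = model(case.prompt)
        latencies.append(time.perf_counter() - start)
        if case.check(output):
            passed += 1
    return {"pass_rate": passed / len(cases), "max_latency_s": max(latencies)}


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call; swap in your own client."""
    return '{"intent": "refund"}' if "refund" in prompt else "I am not sure."


cases = [
    Case("I want a refund for order 42", lambda out: '"intent": "refund"' in out),
    Case("Where is my package?", lambda out: '"intent": "tracking"' in out),
]

report = run_eval(fake_model, cases)
print(report["pass_rate"])  # 0.5
```

The cases come from your traffic, not from a leaderboard. That is the whole point of the exercise.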

A large context window is not free architecture. It still carries cost, latency, and retrieval limits. If you need reliable access to knowledge, you still need to design for it.

Prompt caching matters when the prefix is stable. If your prompts churn on every request, you will not get the savings you expected. The same is true for routing. A smart routing layer helps only if the routing decision is cheaper than the work it saves.
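The routing break-even can be written as plain arithmetic. This is a simplified sketch with hypothetical costs and rates. It assumes two models, and it assumes that a request wrongly sent to the small model is retried on the large one.

```python
def routed_cost(router_cost: float, small_cost: float, large_cost: float,
                share_to_small: float, misroute_rate: float) -> float:
    """Expected cost per request with a router in front of two models.

    Assumption: a request wrongly sent to the small model is retried on the large one.
    """
    to_small = share_to_small * small_cost
    to_large = (1 - share_to_small) * large_cost
    retries = share_to_small * misroute_rate * large_cost
    return router_cost + to_small + to_large + retries


def routing_saves_money(router_cost: float, small_cost: float, large_cost: float,
                        share_to_small: float, misroute_rate: float) -> bool:
    """Compare against the baseline of sending every request to the large model."""
    routed = routed_cost(router_cost, small_cost, large_cost, share_to_small, misroute_rate)
    return routed < large_cost


# Illustrative per-request costs in USD (placeholders, not real prices).
LARGE, SMALL = 0.0315, 0.002625

# Cheap, accurate router that offloads most traffic.
print(routing_saves_money(0.0005, SMALL, LARGE, share_to_small=0.6, misroute_rate=0.1))  # True

# Expensive, inaccurate router that offloads little.
print(routing_saves_money(0.02, SMALL, LARGE, share_to_small=0.2, misroute_rate=0.3))  # False
```

The second case costs more than having no router at all. The router's own cost and its mistakes both count against the savings.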

Security is not a footnote. Once an agent can read untrusted content and call tools, prompt injection becomes part of the real threat model. Tac treats that as a core concern, not an appendix.

Is This Right for You?

Tac is right for you if you are past the stage where “just try a model” is enough.

If you are making decisions about pricing, routing, memory, tool design, evals, or deployment strategy, a stack-level reference saves time. It gives you a cleaner way to think before you build.

If you only want a quick prompt trick or a one-off demo, this is probably more than you need.

If you are building something that has to survive real users, it is the right shape.

Docs: https://srmdn.github.io/tac/

Repo: https://github.com/srmdn/tac