Stop building fragile AI wrappers. Start architecting resilient systems.
Most AI Engineering tutorials stop at `client.chat.completions.create()`. That works for a hackathon, but it breaks in production.
This book is not about prompt engineering. It is an architectural playbook for the non-deterministic, high-latency, and expensive reality of LLMs in production.
I am Sampriti Mitra, a Software Engineering Lead at Sumologic (ex-Razorpay, IIT BHU alumna). I’ve spent the last 6 years building scalable systems. I wrote System Design for the LLM Era to document the exact patterns needed to move from prototype to production grade, distilled from deep dives into whitepapers, engineering blogs, and post-mortems.
This book is the bridge between a working prototype and a resilient, scalable system.
What’s inside the final book:
We deep dive into architectural patterns, scaling issues, cost optimizations, and case studies from real companies.
Core System Design patterns for using LLMs
We examine the design patterns that matter most when integrating LLMs:
- The LLM Gateway Pattern: How to decouple your application from OpenAI/Anthropic using Tiered Fallback strategies
- Async Event-Driven Architecture: Decouple high-latency agentic workflows using Kafka/SQS message queues and worker patterns for long-running generation tasks
- Cost Optimization: How to combine Semantic Caching (Redis/Vector) and Model Routing (sending simple grammar tasks to cheaper models) to reduce redundant API calls and lower inference costs by ~40%
- Security at the Prompt Layer: Implementing Instruction/Data Separation and using Firewall LLMs to detect and block prompt injection attacks before they reach your expensive models
What Breaks at Scale
This book focuses on failure modes that only show up under load:
- The Cache Stampede: Why standard caching fails during high-traffic events and how to fix it with Coalesce Caching middleware
- The RAG Precision Drop: Why Naive RAG fails on multi-hop reasoning questions and how GraphRAG (Vector + Knowledge Graph) bridges the gap
- Golden Datasets: Why simple assertions fail for non-deterministic LLMs and how to implement LLM-as-a-Judge evaluation pipelines using Golden Datasets
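As a preview of the coalescing idea: when many requests miss the cache on the same key at once, only one should hit the expensive model while the rest wait and reuse its result. A minimal in-process sketch (the class name and structure are mine, not the book's; a production version would use distributed locks):

```python
import threading

class CoalescingCache:
    """Cache where concurrent misses on a key trigger only one upstream call."""

    def __init__(self):
        self._values = {}
        self._locks = {}                   # per-key locks for in-flight work
        self._guard = threading.Lock()     # protects the lock registry

    def get(self, key, compute):
        if key in self._values:            # fast path: cache hit
            return self._values[key]
        with self._guard:                  # get-or-create this key's lock
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:                         # only one caller computes...
            if key not in self._values:    # ...the rest find the value here
                self._values[key] = compute()
        return self._values[key]
```

Under a stampede, N concurrent misses result in one call to `compute` instead of N, which is exactly the behavior that keeps an LLM backend alive during a traffic spike.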
Case Study Deconstructions
We deep dive into the architecture of successful products based on their public engineering blogs and whitepapers:
- AI-Native IDEs (like Cursor/Copilot): Handling the Context Window problem with smart code indexing and low-latency code completion
- Adaptive Learning Platforms (like Duolingo): Architecting offline content pipelines vs. online serving paths
- AI-powered E-commerce search (for platforms like Amazon) and others!
Table of Contents:
Chapter 1: LLM System Design: Why Integration Requires New Patterns
- Beyond the hype: Understanding Tokens, Embeddings, and the RAG lifecycle.
- Why Naive RAG fails in production and how to fix it with GraphRAG.
- Agentic AI: Understanding the shift from simple prompts to autonomous agents.
- Operationalizing: Performance benchmarking, testing strategies, and handling failures.
Chapter 2: Core Architectural Patterns
- Resilience: Circuit breakers and fallbacks for when OpenAI goes down.
- Latency: Caching strategies to make LLM apps feel instant.
- Cost: Token optimization techniques to slash your API bill by 40%.
- Security: Injection attacks, data privacy, and Grounding strategies.
Chapter 3: Case Study: Designing an AI-Native IDE (like Cursor/Copilot)
- Handling the Context Window problem with smart code indexing.
- Privacy patterns for handling proprietary user code.
- Deep dive: Latency vs. Accuracy trade-offs in code completion.
Chapter 4: Case Study: Adaptive Learning Platform (like Duolingo)
- Architecting an offline content pipeline vs. an online serving path.
- Asynchronous processing patterns for generating personalized courseware.
- Database selection: When to use Vector DBs vs. Relational vs. Graph.
Chapter 5: Case Study: AI-Powered Search for E-Commerce (like Amazon)
- Moving beyond keyword search: Hybrid Search architecture.
- The Product Discovery flow: Ranking and re-ranking with LLMs.
- Caching strategies for high-traffic retail events.
Chapter 6: Case Study: AI Customer Support Agent
- The Golden Dataset: How to build an evaluation suite that actually works.
- LLM-as-a-Judge: Automating your quality assurance.
- Ingestion pipelines: Keeping your knowledge base fresh in real-time.