Every few weeks, a startup founder asks me to review their ML architecture. And every few weeks, I find the same pattern: a system designed for Google's scale, built by a team of three, serving a few thousand users.
They have Kubernetes orchestrating their containers. A feature store managing their feature pipelines. A model registry tracking their model versions. A sophisticated CI/CD pipeline for model deployment. MLflow for experiment tracking. Airflow for workflow orchestration.
And their actual product? One model. A few hundred predictions per hour. Three engineers who spend more time maintaining infrastructure than improving the model.
This is the most expensive mistake I see in startup AI, and it happens because the ML industry's "best practices" are written by people who work at companies with 500-person ML teams. Their advice is correct — for their context. For a Series A startup, it's a death sentence.
How this happens
The path is predictable. A startup hires their first ML engineer. That engineer's previous job was at a larger company with mature ML infrastructure. They know what "good" looks like at scale, so they build what they know. Or they follow the blog posts and tutorials written by ML teams at big tech companies, because those are the most visible and well-written resources.
The result is infrastructure that's optimized for problems the startup doesn't have yet: managing hundreds of models (they have one), coordinating across large teams (they have three engineers), handling millions of requests per second (they handle hundreds).
None of this is the engineer's fault. They're doing what they were trained to do. The problem is that nobody told them: at your current scale, the correct infrastructure looks completely different.
The minimal production ML stack
Here's what actually works at the seed-to-Series-A stage, based on what I've seen across dozens of startups and what I've built at AIshar:
Model serving: FastAPI + a single container. Not a model serving platform. Not KServe, Seldon, or TensorFlow Serving. A FastAPI application that loads your model, accepts requests, returns predictions. Deployed as a single container on whatever cloud service your application already uses.
This handles thousands of requests per second on a $50/month instance. When you need more, you add a load balancer and a second instance. You don't need a model serving platform until you have multiple models with different resource requirements being deployed by different teams. If that's not you yet, FastAPI is correct.
Training: A script that runs on a cloud instance. Not Airflow. Not Kubeflow. Not a DAG orchestration system. A Python script that you can run manually or on a cron schedule. It reads data, trains a model, saves the artifact, and optionally deploys it.
The script should be version-controlled. It should log its results somewhere you can find them. It should be reproducible. That's it. You don't need workflow orchestration until your training process has multiple interdependent stages managed by different people. A script handles single-model training perfectly.
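Here's what that script can look like in its entirety. The file paths, column name, and the mean-baseline "model" are placeholders for illustration; your real training code slots in where indicated:

```python
# Single-file training "pipeline": read data, fit, evaluate, save the
# artifact, log the result. Paths and the baseline model are illustrative.
import csv
import json
import statistics
import time
from pathlib import Path

def train(data_path: str, artifact_path: str) -> dict:
    with open(data_path) as f:
        labels = [float(row["label"]) for row in csv.DictReader(f)]

    # "Model": predict the historical mean. Swap in sklearn/xgboost here.
    mean = statistics.fmean(labels)
    mae = statistics.fmean(abs(y - mean) for y in labels)

    artifact = {"mean": mean, "n_rows": len(labels), "trained_at": time.time()}
    Path(artifact_path).write_text(json.dumps(artifact))

    # Log somewhere findable. Stdout is fine when cron emails you the output.
    print(f"trained on {len(labels)} rows, MAE={mae:.4f}")
    return artifact
```

Version-control the script, run it on a schedule, and you have everything L14 asks for: reproducibility, an artifact, and a record of results.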
Feature engineering: SQL + pandas. Not a feature store. Your features are queries against your database, transformed in Python. Store the feature computation code in your repo. Run it before training.
Feature stores solve a real problem — feature consistency between training and serving, feature sharing across teams, point-in-time correctness for time-series data. But at the stage where one person writes all the features and there's one model, the problem doesn't exist yet. The overhead of maintaining a feature store will cost you more than the bugs it prevents.
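Concretely, the whole "feature pipeline" is one query plus a few pandas lines, checked into the repo and run before training. The orders table and its columns below are invented for illustration:

```python
# Features as SQL + pandas: the query does the aggregation, pandas does
# the transforms that are awkward in SQL. Table and column names are
# hypothetical; point this at your real application database.
import sqlite3

import pandas as pd

FEATURE_QUERY = """
SELECT user_id,
       COUNT(*)         AS order_count,
       AVG(order_total) AS avg_order_total
FROM orders
GROUP BY user_id
"""

def build_features(conn) -> pd.DataFrame:
    df = pd.read_sql_query(FEATURE_QUERY, conn)
    df["is_repeat_buyer"] = (df["order_count"] > 1).astype(int)
    return df
```

Because the computation lives in code, training and serving call the same function, which is most of what a feature store would be buying you at this stage anyway.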
Experiment tracking: A spreadsheet. Not MLflow. Not Weights & Biases. A shared spreadsheet where you log: what you tried, what the metrics were, what you learned. When you have three engineers running ten experiments a week, a spreadsheet works. When you have twenty engineers running hundreds of experiments, invest in tooling. Not before.
I know this sounds absurd to anyone who's worked in a mature ML organization. But I promise you: the startup that ships with a spreadsheet and a FastAPI server will iterate faster than the one that spends three months setting up MLflow, Airflow, and a feature store before deploying their first model.
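If a shared spreadsheet feels too informal, an append-only CSV in the repo does the same job and gets version history for free. The columns below are one reasonable choice, not a standard:

```python
# Experiment log as an append-only CSV. Column names are an illustrative
# choice; pick whatever your team will actually fill in.
import csv
import datetime
from pathlib import Path

LOG_COLUMNS = ["date", "author", "change", "metric", "value", "notes"]

def log_experiment(path: str, author: str, change: str,
                   metric: str, value: float, notes: str = "") -> None:
    p = Path(path)
    new_file = not p.exists()
    with p.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_COLUMNS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "date": datetime.date.today().isoformat(),
            "author": author, "change": change,
            "metric": metric, "value": value, "notes": notes,
        })
```

Twenty lines, zero services to maintain, and the log is reviewable in the same pull request as the experiment itself.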
Monitoring: Application-level metrics + alerts. Not a dedicated ML monitoring platform. Add prediction latency and prediction distribution metrics to whatever monitoring your application already uses. Set alerts for latency spikes and distribution shifts. That covers 90% of model issues at startup scale.
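As one example of a distribution check that needs no ML-specific tooling: flag the model when the mean of recent predictions drifts far from a baseline window. The three-sigma threshold here is illustrative, not a recommendation; tune it against your own traffic.

```python
# Cheap distribution-shift alert: compare the mean of recent predictions
# to a baseline window, in units of the baseline's standard deviation.
import statistics

def distribution_shift_alert(baseline: list[float],
                             recent: list[float],
                             n_sigmas: float = 3.0) -> bool:
    mu = statistics.fmean(baseline)
    sigma = statistics.stdev(baseline)
    return abs(statistics.fmean(recent) - mu) > n_sigmas * sigma
```

Wire the boolean into whatever alerting your application already has (PagerDuty, Slack, email) and you have model monitoring that one engineer can own.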
When to add complexity
The question isn't whether to build proper ML infrastructure. It's when. Here are the actual triggers:
Add a model serving platform when you have 3+ models with meaningfully different resource requirements (some need GPUs, some don't) and different deployment cadences. Not before.
Add a feature store when you have features shared across multiple models and teams, or when training-serving skew is causing production bugs you can't diagnose from code review alone. Not before.
Add workflow orchestration when your training pipeline has more than three interdependent stages that need to run in a specific order, and failure in one stage needs to trigger specific recovery in another. Not before.
Add experiment tracking tooling when you're running more than 20 experiments per month and can't remember what you tried last week. Not before.
Add Kubernetes when you need to manage more than 10 different services with different scaling requirements, and your ops team has someone who actually knows Kubernetes. Not before. Especially not before.
The real cost of over-engineering
The obvious cost is time — months spent on infrastructure instead of the model. But the hidden cost is worse: over-engineered infrastructure makes your model harder to improve.
When deploying a model change requires updating a Kubernetes manifest, pushing to a model registry, triggering a CI/CD pipeline, waiting for a staged rollout, and monitoring a canary deployment, engineers stop experimenting. The activation energy for trying something new becomes so high that people default to the safe option: change nothing.
In a startup, speed of iteration is your primary competitive advantage. Every piece of infrastructure that adds friction to the question "what happens if I try this?" is making you slower.
What I actually build for startups
When AIshar works with a Series A startup, we typically deploy something like this:
One or two FastAPI-based model servers. A training script that runs on a schedule. Features computed from SQL queries. A simple A/B testing framework built on top of their existing analytics. And clear documentation about when each component should be replaced with something more sophisticated.
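The A/B piece can be as small as a deterministic hash-based assignment that your existing analytics can segment on. The salt format and default split below are illustrative:

```python
# Deterministic A/B assignment: hash (experiment, user_id), bucket by the
# hash. Same user always gets the same variant, no assignment table needed.
import hashlib

def variant(user_id: str, experiment: str,
            treatment_share: float = 0.5) -> str:
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "treatment" if bucket < treatment_share else "control"
```

Log the variant alongside the prediction event, and your analytics tool does the rest.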
The total infrastructure cost is usually under $500/month. Time to first production model: weeks, not months. And when the startup grows to the point where they need real ML infrastructure, they have a clear, documented migration path — not a tangled mess of premature abstractions.
The boring stack wins. Every time.
Want help with your AI stack?
If this post matches problems you're seeing, we can map the fastest path from architecture decisions to production outcomes.
Talk to Manmeet
Manmeet Singh
Founder & CEO, AIshar Labs · Ex-Apple, Ex-Instacart · 15 AI Patents
Built ML systems at Apple (Search: Maps, Safari, Spotlight) and Instacart (Search, Recommendations, Ranking). Writes about production AI tradeoffs and system design.
Follow on LinkedIn →