The Firestore JOIN Trap
You face a common Firestore problem. Your Firebase function throws a maximum batch size error. You need to join orders and customers for a dashboard. You usually duplicate data to fix this. But now your data is stale and inconsistent.
Google announced the Pipelines API to solve this. It allows JOIN operations across collections without duplicating data. Some developers report fast query times in small tests.
I spent a week testing this API under heavy load. Here is what the documentation does not tell you.
High Costs
Every pipeline execution reads from all involved collections. A JOIN between two collections bills you for reads in both. If you join two collections of 50,000 documents each, every execution can bill reads against both of them, and the cost scales poorly as the collections grow. It is not a simple linear cost.
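Here is a rough sketch of that billing math in Python. It assumes each execution reads every document in every joined collection exactly once, which is a simplification. The function names, the refresh rate, and the price are placeholders made up for illustration, so plug in the numbers from your own bill. Since the real cost is not simply linear, treat the result as a rough baseline, not a forecast.

```python
def estimate_join_reads(collection_sizes, executions_per_day, days=30):
    """Estimate billed document reads for a pipeline JOIN.

    Assumption: every execution reads each document in every joined
    collection exactly once. Real billing may differ.
    """
    reads_per_execution = sum(collection_sizes)
    return reads_per_execution * executions_per_day * days


def estimate_cost(total_reads, price_per_100k_reads):
    """Convert a read count into currency using a caller-supplied price."""
    return total_reads / 100_000 * price_per_100k_reads


# Two collections of 50,000 documents, dashboard refreshed 100 times a day
# (the refresh rate is a made-up example value).
reads = estimate_join_reads([50_000, 50_000], executions_per_day=100)
print(reads)  # 300000000 reads over 30 days

# 0.06 is a placeholder price per 100,000 reads, not a quoted rate.
print(estimate_cost(reads, price_per_100k_reads=0.06))
```

Even under this simple model, a dashboard that refreshes often multiplies the read count quickly, so run the numbers before you ship.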
Performance Limits
In my tests, a pipeline against 10,000 documents took 380 ms. When I tested 100,000 documents, the query timed out at 30 seconds. You are not fixing the problem. You are just turning a batch error into a timeout error.
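If you want to repeat this kind of measurement on your own data, a small timing harness is enough. The sketch below is generic Python and does not call any Firestore API. `benchmark` and `fake_query` are hypothetical names, and `fake_query` is a stand-in that you would replace with a function that runs your real pipeline.

```python
import statistics
import time


def benchmark(run_query, runs=5, budget_seconds=30.0):
    """Time a query callable several times and compare it to a budget.

    `run_query` is any zero-argument function that executes the query.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    worst = max(durations)
    return {
        "median_seconds": statistics.median(durations),
        "worst_seconds": worst,
        "within_budget": worst <= budget_seconds,
    }


# Stand-in for a real pipeline call; swap in your own query function.
def fake_query():
    time.sleep(0.01)


print(benchmark(fake_query, runs=3, budget_seconds=1.0))
```

Run it against a collection at production size, not a sample. The worst run matters more than the median, because that is the one your users hit.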
Cold Start Issues
Pipelines create a separate execution context. In serverless environments like Cloud Functions, this adds 2 to 4 seconds of delay. Your users will think your app is slow.
The Pipelines API is a tool for prototyping or small collections under 5,000 documents. It is not a replacement for a relational database. Google provides this to help you stay in the Firebase ecosystem instead of moving to PostgreSQL or Spanner.
If you use Pipelines, follow these rules:
• Audit your collection size. If a collection exceeds 20,000 documents, calculate the JOIN cost first.
• Limit complexity. A JOIN across three or more collections is a bad sign.
• Track read costs weekly. Pipeline reads appear differently on your bill.
• Keep your denormalized data. Use Pipelines as a supplement, not a total replacement.
• Test with real traffic. Benchmarks on quiet collections do not reflect production reality.
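The first two rules are easy to turn into a pre-flight check. This is a minimal Python sketch, not part of any Firebase SDK. `check_join_plan` is a hypothetical helper, the thresholds mirror the rules above, and the collection names and sizes in the example are made up.

```python
def check_join_plan(collection_sizes, size_limit=20_000, max_collections=2):
    """Return a list of warnings for a planned pipeline JOIN.

    `collection_sizes` maps collection names to document counts.
    The defaults mirror the rules of thumb in this post.
    """
    warnings = []
    for name, size in collection_sizes.items():
        if size > size_limit:
            warnings.append(
                f"{name} has {size} documents (over {size_limit}): "
                "calculate the JOIN cost first"
            )
    if len(collection_sizes) > max_collections:
        warnings.append(
            f"JOIN spans {len(collection_sizes)} collections: "
            "reconsider the data model"
        )
    return warnings


# Example plan with made-up sizes.
plan = {"orders": 120_000, "customers": 8_000, "products": 3_000}
for warning in check_join_plan(plan):
    print(warning)
```

You would still need to get the document counts from your own monitoring. The point is to make the check a habit before a JOIN reaches production.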
Do not use a band-aid to avoid a real architectural decision.
How do you handle relationships in Firestore? Do you use denormalization or client-side joins? Tell me in the comments.
