I’m planning to use Postgres for my next Saas, But all these “issues” worry me. I have used MySQL in the past and, even though my web services don’t handle too many connections/transactions, MySQL always perform good. I’ve seen other articles regarding Postgres issues when it comes to scaling. I’ve come to realize that I like MySQL when it comes to performance, but I don’t like it when it comes to semantics and design... hence my attempt to switch to Postgres.
Maybe balance your reading with other articles that talk about how well plain old vertical scaling (bigger box) works these days. You can fit 128 cores, huge amount of memory, and 24+ NVME drives into a 2U rack spot these days and not have to worry much about sharding, horizontal scaling, etc, for quite some time.
Both RDBMS have good performance. Lots of the issues we ran into really are related to scale. If you know that your data set will never be particularly large, most of these issues will not come up. Other issues like managing replicas, promotions, etc. will be shared between RDBMS.
One of the nice things about PostgreSQL regardless of scale are the tools they provide for optimizing your application. EXPLAIN and EXPLAIN ANALYZE are really powerful tools for figuring out why a query performs badly and validating that indexing you add actually improved query performance.
The article mentions that 75TB of data is stored across 40 nodes. Does this mean that sharding is already done ? What is the CPU/RAM/Disk of each node ?
Use pgbouncer once you start getting some traffic and vertically scale as needed. If you're that concerned about scaling you could also just start with Aurora and I think RDS even has a built in connection pooler now. That will get you really, really far. You could also just put pgbouncer on the application servers and use the RDS connection proxy. If you really start needing scale then call Citus and at that point you'd be able to afford them.