Scale, performance, reliability

Sharding for unlimited scale

Sharding is a proven database architecture, allowing you to spread out your data across many servers. This facilitates increased failure isolation, faster backups, cluster flexibility, and unlimited scale.

Sharding is used by hundreds of large organizations around the world, managing data sets of terabyte to petabyte scale. Shopify, Figma, Uber, Slack, and Cash App are just a few of the organizations that leverage a sharded architecture for their data stores.

Scaling issues? Get in touch for a demo of our sharding solution.

What are the benefits of sharding?

Who is sharding for?

Sharding is for organizations with large databases who need scale and flexibility. If you are currently using a single-primary database such as Amazon RDS, you should consider sharding in the following scenarios:
    • You have reached or are nearing the largest available instance size. For example, if you are on an m7a.24xlarge or m7a.32xlarge, it is likely time to shift to sharding.
    • Your monthly cloud bill incurs significant cost for extra IOPS and throughput. With sharding, your data and queries get spread out across many servers, reducing I/O bottlenecks.
    • You are facing operational challenges such as long backups, frequent server outages, or lock contention. Sharding with PlanetScale provides solutions to many of these all-too-common scaling difficulties.
    • You are planning for future growth and want to start with a platform that will provide scaling capabilities with minimal friction.
    Several customers have transitioned their database to a sharded architecture on PlanetScale with great success.

    What is sharding?

    PlanetScale builds its sharding solution on top of MySQL and Vitess. MySQL is the world's most popular open-source relational database. Vitess is an open-source software layer that sits between a fleet of MySQL instances and your application servers. It manages query routing, connection pooling, automatic failover, backups, and sharding.

    Sharding comes in two types: vertical and horizontal.

    With vertical sharding, each individual table resides on a single server, but collectively the tables are spread out across many servers.

    Perhaps you have a database with several hundred tables. Most of them are small, but two are large - 1TB each. Instead of housing everything on one server, we can use vertical sharding. The two large tables will get placed on their own dedicated servers, and the rest can remain on a shared server. This will still appear as a single, unified database from the application's perspective, since it connects to the database through a proxy layer.
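
The routing idea behind vertical sharding can be sketched in a few lines of Python. This is a minimal illustration, not how Vitess is configured: the table names, server names, and the `route_table` helper are all hypothetical.

```python
# Hypothetical table-to-server map for a vertically sharded database.
# All table and server names here are illustrative assumptions.
TABLE_TO_SERVER = {
    "orders": "server-orders",  # large table on its own dedicated server
    "events": "server-events",  # large table on its own dedicated server
}
DEFAULT_SERVER = "server-shared"  # every small table stays on the shared server


def route_table(table: str) -> str:
    """Return the server that holds the entire table."""
    return TABLE_TO_SERVER.get(table, DEFAULT_SERVER)


if __name__ == "__main__":
    for table in ("orders", "events", "users"):
        print(table, "->", route_table(table))
```

Because the lookup happens in the proxy layer, the application keeps issuing queries against what looks like one database.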

    Horizontal sharding is another way to spread tables out across servers. With horizontal sharding, we take the rows of an individual (large) table and spread them out across many servers. The proxy layer maintains metadata to keep track of which rows of a table live on which server, thus allowing it to effectively fulfill queries on this data.

    Perhaps we have a database with many small tables, and one huge table that has 1 trillion rows and is 2TB. The large table could get spread across four shards, each managing 500GB of the data. The proxy layer will manage routing queries to the appropriate shard.
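
As a rough sketch of the metadata a proxy layer might keep, the Python below assumes, purely for illustration, that each of four shards owns a contiguous range of row IDs. The shard names, the toy ID ranges, and the `shard_for_row` helper are hypothetical.

```python
from bisect import bisect_right

# Hypothetical routing metadata: SHARD_STARTS[i] is the first row ID
# owned by SHARDS[i]. The last shard owns every ID from 750 upward.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]
SHARD_STARTS = [0, 250, 500, 750]


def shard_for_row(row_id: int) -> str:
    """Find the shard whose ID range contains row_id."""
    if row_id < 0:
        raise ValueError("row_id must be non-negative")
    # bisect_right counts how many range starts are <= row_id;
    # subtracting one gives the index of the owning shard.
    index = bisect_right(SHARD_STARTS, row_id) - 1
    return SHARDS[index]


if __name__ == "__main__":
    for row_id in (0, 249, 250, 999):
        print(row_id, "->", shard_for_row(row_id))
```

A query that filters on the row ID can be sent to a single shard this way; a query that does not would need to be sent to every shard and the results combined.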

    The sharding strategy is the technique used to distribute the data. A common sharding strategy is ID hashing. With this technique, we choose one of the ID columns from the table as the shard key. Each time we receive a new row, Vitess generates a hash of this ID. Each shard server is responsible for storing the rows whose hashes fall within a given range, and each new row gets sent to the appropriate server.
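
Hash-based placement can be sketched as follows. This is an illustration under stated assumptions, not the Vitess implementation: SHA-256 stands in for whatever hash function the real system uses, the shard count of four is arbitrary, and `shard_for_id` and `distribution` are hypothetical helpers.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # assumed shard count; each shard owns an equal range of hashes


def shard_for_id(row_id: int) -> int:
    """Hash the ID and map the hash onto one of NUM_SHARDS equal ranges."""
    # SHA-256 is an assumption made for this sketch only.
    digest = hashlib.sha256(str(row_id).encode("utf-8")).digest()
    # The first byte is 0..255; scale it down to a shard index 0..NUM_SHARDS-1.
    return digest[0] * NUM_SHARDS // 256


def distribution(ids) -> Counter:
    """Count how many of the given IDs land on each shard."""
    return Counter(shard_for_id(i) for i in ids)


if __name__ == "__main__":
    # Sequential IDs end up scattered across shards instead of clustering.
    for shard, count in sorted(distribution(range(10_000)).items()):
        print(f"shard {shard}: {count} rows")
```

Hashing means that sequential IDs do not pile up on one server, which is one reason it is a common default when no natural grouping of rows exists.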

    A good choice of shard key can lead to excellent performance, but a poor choice can be detrimental to your database. PlanetScale enterprise support provides guidance on how to shard your database effectively.

    Vitess or PlanetScale?

    Vitess and MySQL are both widely used open-source projects. In light of this, why would one choose to use PlanetScale over self-managed Vitess?

    The history of Vitess and PlanetScale

    Vitess originated from an engineering team at YouTube, who needed to scale their massive fleet of MySQL databases to support millions of simultaneous users. YouTube was an early adopter of the sharded database architecture.

    Several years later, Vitess was donated to the CNCF, joining the likes of other battle-hardened tools like Kubernetes, Prometheus, and Argo.

    PlanetScale was later founded by the creators of Vitess. Today, we employ the majority of the maintainers of the Vitess project. We actively develop new features to enhance reliability and meet the needs of our large customers. Let us help you scale your database with our team of Vitess experts.

    Need to scale?

    Get started with sharding on PlanetScale

    PlanetScale gives you a shard-native platform to provide the ultimate solution for scalability and availability. Get in touch now for a quote.