• WaitWaitWha 12 minutes ago

    I'd split step 4 of their "high level, this is the general flow for data migrations" list:

    4.0 Freeze the old system.

    4.1 Cut over application traffic to the new system.

    4.2 Merge any diff that accumulated between the snapshot (1) and the cutover (4.1).

    4.3 Go live.

    To me, the above reduces the pressure on downtime, because the merge between freeze and go-live is significantly smaller than trying to go live with the entire environment. If timed well, the diff could be minuscule.
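    The snapshot-then-merge-the-diff idea above can be sketched in a few lines. This is a toy simulation, not any real migration tool; the dicts stand in for the two database systems and the names are illustrative.

```python
# Toy sketch of the freeze -> cutover -> merge-diff flow described above.
# A dict stands in for each system's data; keys are rows.

def take_snapshot(old):
    # Step 1: bulk-copy everything visible at this point in time.
    return dict(old)

def diff_since(old, snapshot):
    # Rows added or changed on the old system after the snapshot was taken.
    return {k: v for k, v in old.items() if snapshot.get(k) != v}

old = {"a": 1, "b": 2}
snapshot = take_snapshot(old)
new = dict(snapshot)            # long-running bulk load into the new system

old["c"] = 3                    # writes that land after the snapshot
old["a"] = 10

# 4.0 freeze the old system (no more writes), 4.1 cut traffic over,
# 4.2 merge only the small diff, 4.3 go live.
new.update(diff_since(old, snapshot))
assert new == old               # systems converge; downtime ~ size of the diff
```

    The downtime window is proportional to the diff, not the full dataset, which is the whole point of the split.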

    What they are describing is basically live-mirroring the resource. Okay, that is fancy. Nice. I would love to be able to do that. Some of us have a mildly chewed piece of bubble gum, a foot of duct tape, and a shoestring.

    • mattlord an hour ago

      Blog post author here. I'm happy to answer any related questions you may have.

      • redwood an hour ago

        That 400TB in the image is a large database! I'm guessing that's not the largest in the PlanetScale fleet either. Very impressive and a reminder that you're strongly differentiated against some of the recent database upstarts in terms of battle tested mission critical scale. Out of curiosity how many of these large clusters are using your true managed 'as a service' offering or are they mostly in the bring your own cloud mode? Do you offer zero downtime migrations from bring your own cloud to true as a service?

        • mattlord 43 minutes ago

          That particular cluster has grown significantly since the post was written, and yes there are now quite a few others that are challenging it for the "largest" claim. :-)

          These larger ones are fully using the PlanetScale SaaS, but they are using Managed -- meaning that there are resources dedicated to and owned by them. You can read more about that here: https://planetscale.com/docs/vitess/managed

          All of the PlanetScale features, including imports and online schema migrations or deployment requests (https://planetscale.com/docs/vitess/schema-changes/deploy-re...), are fully supported with PlanetScale Managed.

          • redwood 31 minutes ago

            Understood: that's great for your customers' EDP negotiations with their cloud providers!

      • clarabennett26 2 hours ago

        Running ALTER TABLE on a large MySQL table can lead to significant replication lag. The gh-ost/pt-osc approach of copying into a shadow table (trigger-based capture for pt-osc, binlog-based for gh-ost) has its limits, especially when dealing with foreign keys or backfilling computed columns with application logic. I'm interested in how PlanetScale manages the consistency window during cutover since, in my experience, those final seconds of syncing the changelog can be quite critical.
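        The shadow-table pattern the comment refers to can be illustrated with a toy simulation. This is not PlanetScale's (or gh-ost's/pt-osc's) actual mechanism, just a sketch of the general shape: backfill in the background while capturing concurrent writes in a changelog, then take a brief lock to replay the tail and swap tables.

```python
# Toy sketch of the shadow-table pattern (not any tool's real internals):
# copy rows while recording concurrent writes in a changelog, then briefly
# lock to replay the remaining tail before swapping tables.

table = {1: "a", 2: "b", 3: "c"}
shadow = {}
changelog = []                  # (op, key, value) events captured during the copy

def write(key, value):
    # Application write path during the migration: hit the live table
    # and record the change for later replay (trigger/binlog stand-in).
    table[key] = value
    changelog.append(("upsert", key, value))

# Phase 1: backfill (concurrent writes keep arriving meanwhile).
for key in list(table):
    shadow[key] = table[key]
write(2, "b2")                  # writes that land after their rows were copied
write(4, "d")

# Phase 2: the critical cutover window -- take a short lock, replay
# whatever is left in the changelog, then swap.
for op, key, value in changelog:
    if op == "upsert":
        shadow[key] = value
changelog.clear()

table, shadow = shadow, table   # stand-in for the atomic table rename/swap
assert table == {1: "a", 2: "b2", 3: "c", 4: "d"}
```

        The "final seconds" the comment mentions are phase 2: everything hinges on how long the changelog tail takes to drain while writes are blocked.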

        • ksec 19 minutes ago

          Missing 2024 in the Title.

          • redwood 2 hours ago

            Worth underlining that this is data migrations from one database server or system to another rather than schema migrations