I've been dealing with their support trying to delete my data. Here's the latest response [1]. The way I read it, they won't delete your genetic data, and it sure seems personally identifiable to me. Am I reading this wrong?
This is a follow-up from the 23andMe Team. Your inquiry
has been escalated to me for review. To clarify, once you
confirm your request to delete your account, we will
delete your data from our systems within 30 days,
unless we are required by law or regulation to
maintain data for a given timeframe.
For example, your Genetic Information, date of birth, and
sex will be retained by 23andMe and our third party
genotyping laboratory as required for compliance with
applicable legal obligations, including the U.S. Federal
Clinical Laboratory Improvement Amendments of 1988
(CLIA), California Business and Professional Code
Section 1265, and College of American Pathologists
accreditation requirements.
It is important to understand that the information stored
is distinct from the raw genotype data available within
your account. The raw data we receive from the lab
has not been processed by our interpretation software
to produce your individual-level genotype data (in
your account).
You can read more about our retention requirements in the
retention of personal information section of our Privacy
Statement.
I’m in a weird spot with 23andMe - when I signed up, I used a fake name as a fig leaf in case they decided to sell to insurance or whatever. Since then, several members of my immediate family have all signed up, so “the child of X and the sibling of Y” means that fig leaf is pretty useless now - except I can’t issue an actual CCPA request now, because fake name.
All of this is super predictable, but I wasn’t nearly cynical enough 15 years ago when I mailed my spit to them.
I wonder if you can convince them through the customer service portal? People make typos all the time…
Doubt it. I assume they're under HIPAA regulations and it'd be a massive cost if they did it even once.
HIPAA also includes the right to correct incorrect information in your records. 23 may have to get unconventional to verify the individual but they're a DNA lab and have everything they need to make a positive confirmation.
There’s no HIPAA concern if you just want to delete your info, I think.
Nope. Same boat. They want ID in the name I signed up with to do anything, and I haven’t been able to access my account since they mass reset passwords after their breaches.
I think it should be easier if the goal is just to get your data deleted. If you want to recover your account, that brings up some HIPAA concerns. But if you are just nuking it, that should be easier, right?
If you (and others in the replies) were to go and update and perfect your data, removing all these ambiguities (fake name, dob), you would then be in a position to ask them to delete it. Ie absolutely remove all doubts about who you are to then address your privacy concerns. Perverse, eh?
I lied about my birth date and apparently there's no way to delete your data without the fake date or a photo ID... with the fake birth date...
sigh
Have you tried emailing them a bit? It is worth a shot I think: you made a typo (people make them all the time), but you don’t really need to fully authenticate, because you are just making a deletion request anyway (not trying to access the data).
(Also keep in mind, customer service people have to argue with assholes all day long, staying polite but clear but on-target can go a long way. Stick to the topic and never give them an excuse to cut off communication).
Thanks, I want to get my genes sequenced but I'd also like to get my records deleted from the service provider. I guess it'll have to be my real name?
I moved my DNA data from 23andMe to Genomelink ~5 years ago. Sort of saw it coming.
They might delete it from their database, but it doesn't change the fact that it's been sold and shared in a way that we can't follow up on to remove that information. There's no transparency. It not only implicates you, but your relations and future generations.
Genetic testing done through the hospital for a completely unrelated procedure (for example, genetic testing for a child) can impact your life insurance. Minnesota state law prevents health insurance from changing. Laws need to protect the right to know, not just the right to use genetic information.
The (consumer) company I used to work for also allowed their customers to "delete" their data. Deletion was implemented as a boolean field in the database, "deleted - true/false". We called it "soft deletion". And why was it implemented like this? It's because actually deleting data is hard. There is no single database and the data is distributed across many servers. It's also backed up in different places. Running the delete operation can be extremely costly and can also create service interruptions and data integrity issues. I think there was a script that was supposed to actually delete the entries but it was not run very often and was there for legal and compliance issues.
Just remember that when you request to delete some data on the internet, it doesn't actually get deleted (right away anyway). The best way to deal with this is not to give random sites your real information in the first place. However, that can be difficult or impossible when dealing with government, financial institutions or shopping sites.
Edit: And just to address questions below, the actual delete script was not run daily. I don't know how often it was run (I was not an SRE) but I presume it was run at least once a month. I have no idea how other companies do this.
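For anyone who hasn't seen this pattern, here is a minimal sketch of the difference between the soft-delete flag and the separate purge script, using an in-memory SQLite database. The `users` table, its columns, and the function names are made up for illustration; this is not the actual schema or script from that company.

```python
import sqlite3

# Hypothetical schema: a users table with a soft-delete flag.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "uid INTEGER PRIMARY KEY, email TEXT, deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO users (uid, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com")],
)


def soft_delete(conn, uid):
    # The "delete" the customer sees: the row stays, only the flag flips.
    conn.execute("UPDATE users SET deleted = 1 WHERE uid = ?", (uid,))


def visible_users(conn):
    # Application queries filter on the flag, so the user looks gone.
    rows = conn.execute("SELECT uid FROM users WHERE deleted = 0 ORDER BY uid")
    return [r[0] for r in rows]


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def purge(conn):
    # The separate batch job: physically removes flagged rows.
    return conn.execute("DELETE FROM users WHERE deleted = 1").rowcount


soft_delete(conn, 1)
visible_after_soft = visible_users(conn)   # user 1 is hidden
rows_after_soft = row_count(conn)          # but the row is still stored
purged = purge(conn)
rows_after_purge = row_count(conn)
```

Until `purge` runs, the data is still sitting in the table; and even after it runs, this only covers one database, not backups or downstream copies.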
> there was a script that was supposed to actually delete the entries ... was there for legal and compliance issues.
Sounds like the laws worked in this case. They required data to be actually deleted, and it was due to those laws, and only due to those laws.
No you don't understand, the script exists for plausible deniability, it even runs sometimes! And if you find out we didn't delete your data, we might even go out of our way to run it for you. Except if the script doesn't run anymore because it's been broken. Or because 5 microservices were added since the last time we "actually had to run it", and so even running it makes no assurance it actually deletes everything about you.
But if an internal lawyer really puts their foot down, we might put an intern looking at it for a couple of days.
I'd bet a finger this is how it works in most companies, and I know I've seen worse versions.
Many businesses would still use soft-deletion even if distributed data wasn't an issue. The company I work for has soft-deletion enabled because they want to be able to help customers who accidentally delete something. I wish we would just tell them "better luck next time", but obviously management will never say that.
What annoys me more is how many companies give next to no insight into or control over data retention. It should be unambiguous how soon or often our data gets hard-deleted, if ever.
Let’s be clear that what you describe is absolutely not gdpr compliant, so it would be illegal if you do business in Europe
Did you read the whole comment? They say there was a batch script to comply with legal requirements.
They said they thought there was a script, but it wasn't run very often.
Didn’t seem sufficient to me at all, but I’m happy to be proven wrong.
I work for a company managing a team that has built this for GDPR compliance.
Customer submits a deletion request. We have a fan-out process that takes the deletion request and submits it to a bunch of different data locations. All of these must respond within 2 days (though the required time is 72h). Each of those data locations will queue up a job to remove access to (soft delete) the data, and schedule a hard delete for 28 days in the future. If the customer says they don't actually want the data to be deleted, we cancel the hard deletion and revert the soft delete. If nothing happens, the hard deletion goes through.
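Roughly, the flow described above could be sketched like this. It is a toy in-memory model, not the real system: `DataLocation`, `fan_out_deletion`, and the field names are all hypothetical, and only the 28-day delay comes from the description.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

HARD_DELETE_DELAY = timedelta(days=28)  # delay taken from the description above


@dataclass
class DataLocation:
    """Hypothetical stand-in for one system that holds user data."""
    name: str
    records: dict = field(default_factory=dict)         # uid -> data
    hidden: set = field(default_factory=set)            # soft-deleted uids
    hard_delete_at: dict = field(default_factory=dict)  # uid -> due time

    def request_deletion(self, uid, now):
        # Soft delete immediately, schedule the hard delete for later.
        self.hidden.add(uid)
        self.hard_delete_at[uid] = now + HARD_DELETE_DELAY

    def cancel_deletion(self, uid):
        # Customer changed their mind: revert the soft delete.
        self.hidden.discard(uid)
        self.hard_delete_at.pop(uid, None)

    def run_due_hard_deletes(self, now):
        due = [uid for uid, when in self.hard_delete_at.items() if when <= now]
        for uid in due:
            self.records.pop(uid, None)
            self.hidden.discard(uid)
            del self.hard_delete_at[uid]
        return due


def fan_out_deletion(locations, uid, now):
    # Submit the same request to every known data location.
    for loc in locations:
        loc.request_deletion(uid, now)


start = datetime(2024, 1, 1)
locations = [
    DataLocation("prod_db", {1: "alice", 2: "bob"}),
    DataLocation("data_lake", {1: "alice-events", 2: "bob-events"}),
]
fan_out_deletion(locations, 1, start)
fan_out_deletion(locations, 2, start)
for loc in locations:
    loc.cancel_deletion(2)  # user 2 cancels within the window
for loc in locations:
    loc.run_due_hard_deletes(start + timedelta(days=29))
```

The hard part in practice is the list of locations: the fan-out is only as complete as the inventory of places the data lives.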
Thanks, that’s insightful. In this case, it seems sensible to me at least.
> but that was not run very often
GDPR has strict rules about how long data can persist after the deletion request is made.
Who knows what "not very often" means. It could mean once a day or once a year. The point is that this could be made to be compliant with little extra effort, so pointing out "um actually it's not compliant" is not saying much.
I feel like that ship has sailed. Every software company I have ever worked for is dysfunctional in this regard. You might think your "delete my data" request succeeded but there is absolutely zero way to guarantee that it actually did, and chances are it didn't.
Agree, this is pointless. For one thing how many companies have the technical ability to remove specific records from all their database backups and logs? None that I’ve worked at
I’m not expecting my data get deleted from old backups or log files. I can see where that would be an issue.
What I do expect is my data is deleted from the production database and thus won’t be in any future backups/logs/etc. I guess to that end, they would need to keep a record of delete requests to re-delete them if they ever need to restore from backup.
If there is a data breach in a year where the company’s user data ends up on the internet, I expect to not be in that user list.
The problem is - imagine microservices - that data does not exist in one spot. And chances are no one actually knows 100% where the data lives. It probably lives in a prod db, an ETL data lake type platform (or two/three - and god knows if that has any kind of identifier to actually delete it) and chances are if you are big enough some 3rd party systems. So even if you delete it from prod, it still exists somewhere.
In a perfect world there would be some way to snap your fingers and delete it from every system - but we do not live in a perfect world. There is absolutely no incentive to build systems with this kind of requirement in mind. It's a waste of time and effort. Europeans will say "but hey wait! GDPR!" meanwhile the world keeps spinning and no one gives a shit.
Not that anyone is disagreeing, but it bears repeating: This is a lack of any real pressure from regulators, not a technical challenge. Or rather, there may be technical challenges but they absolutely can be overcome, and aren’t being tackled right now very simply because the business doesn’t care. As is so often the case, the business must be made to care.
> This is a lack of any real pressure from regulators, not a technical challenge.
Also, I think it's easy to misstep if we start thinking of it as a problem of "better regulators", since some of the blame lies on deeper legal aspects around (data-)ownership, contracts, and what happens in bankruptcies.
Even a company with great intentions may have difficulty ensuring the promises they made are kept long-term, especially if a bankruptcy court voids those promises in the name of repaying creditors.
GDPR mandates the ability to delete the data.
Not from all backups, or so I've heard.
You heard wrong. It doesn't have to be immediate though.
Disagree. Waste of time and resources. Let the data sit and rot, who cares. We are humans not Germans.
Data usually leaks from production though, no? So in that perspective it's not pointless.
On the other hand very few organizations I have worked at could definitely restore backups (at least it was not tested regularly) and logs will eventually roll off.
Exactly this. Especially for a currently failing company that has an incentive to NOT delete your data (because that's the only value they still have).
Not in the EU.
I feel like at best we'll get a soft delete
Historically, yes.
But don't the GDPR and CCPA et al. create liability around failure-to-delete after receiving a request?
Sure, but how will you know they didn't delete it?
update users set deleted=true where uid=123345;
And the data is "gone". Good luck proving that your data was not deleted.
GDPR and CCPA etc made it easy to send a request for deletion that will most probably be a frontend gimmick. How much effort are they really going to put into going back in their backups and deleting all your entries? I'm pretty sure it must be among the lowest roadmap priorities.
The financial penalties are pretty nasty.
And it's amazing how financial liability has a way of getting things on a VP's feature radar that common sense doesn't.
The reason it was haphazardly handled prior was that there was no liability. Who cared? (legally speaking)
From working inside a T25 American retail company, I can say that we went top-to-bottom and rearchitected for traceability and hard deletes as a result of the CCPA.
I have a feeling that it's also quite a difficult problem past some scale of infrastructure.
If I ask Google to delete my data (EU citizen), I have trouble believing that they actually go through all of their cold storage backups where it was stored and make sure it's erased. At best I could believe that the process is designed in such a way that my soft-deleted data is unlikely to be recovered (intentionally or not) and maybe unlikely to be possible to link to my account.
What they should do (I have no idea what they do) is to encrypt every record belonging to a user with an individual key. Live records, backups, everything. If a user wishes to be deleted, that live key is simply obliterated, making any data the user owns unrecoverable.
Since the key is not used for end to end encryption, and backends still have access to the data (as long as the key lives), it has different requirements on how it needs to be protected. The biggest challenge is backing up the key itself, as losing it means losing access to all the user’s data by design. But backing up and obliterating a single key is much, much easier than doing so for a whole set of loosely associated data across many databases.
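To make the idea concrete (it is often called crypto-shredding), here is a minimal sketch. Big caveat: the cipher below is a toy XOR keystream built from SHA-256 so the example runs on the standard library alone; it is only a stand-in for a real authenticated cipher such as AES-GCM and should not be used for anything real. `KeyStore`, `toy_encrypt`, and `toy_decrypt` are hypothetical names.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # TOY keystream (SHA-256 in counter mode). Illustration only.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


class KeyStore:
    """Hypothetical central store holding one key per user."""

    def __init__(self):
        self._keys = {}

    def create(self, uid):
        self._keys[uid] = secrets.token_bytes(32)

    def get(self, uid):
        return self._keys[uid]  # raises KeyError once the key is shredded

    def shred(self, uid):
        # "Deleting" the user means destroying this one small secret.
        del self._keys[uid]


plaintext = b"genotype: AA CT GG"
keys = KeyStore()
keys.create(42)
live_copy = toy_encrypt(keys.get(42), plaintext)
backup_copy = live_copy  # backups only ever hold ciphertext
recovered_before = toy_decrypt(keys.get(42), backup_copy)
keys.shred(42)
```

After `shred`, every copy of the ciphertext, including ones in cold backups, is unreadable without anyone having to touch those backups. The key store itself still has to be backed up and purged properly, which is the remaining problem mentioned above.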
Practically speaking, it also makes using and querying that data and doing any kind of analytics much, much more expensive. It is done that way in some cases, but in the absence of a technical requirement to do so, there are cheaper approaches.
Those are solvable problems. I could also argue how address space separation and more generally MMU protections make things so, so much more complex (they do!), yet we don’t question that one very much.
There is no end to end encryption involved here, so you don’t need to resort to such voodoo as homomorphic encryption.
Yes, I also expect that this is the way, but I think it makes the problem only partially smaller, since you still need to sync and back up the keys.
Also, is an encrypted piece of data with a lost key truly deleted? What if the encryption gets cracked?
I would say it is more deleted than toggling a `deleted` flag in the db and less deleted than burning the tapes in fire.
> the problem only partially smaller, since you still need to sync and back up the keys.
I mentioned that: It makes the problem much smaller, as you only have one single, small piece of data to back up and erase, instead of an ever-changing many-faceted blob of distributed data.
> Also, is an encrypted piece of data with a lost key truly deleted? What if the encryption gets cracked?
Oh boy. If simple symmetric encryption gets “cracked”, then you have much larger problems.
> I would say it is more deleted than toggling a `deleted` flag in the db and less deleted than burning the tapes in fire.
For all practical purposes symmetrically encrypted data that lost its keys is considered “random” data. If you “erase” data on a device before you sell it, most often it will just throw away the key to the disk contents nowadays.
They already do this (the encryption-at-rest part). Deleting the data is still a hard requirement. Also, the keys are never seen outside of the centralized encryption service. Deletion is still a must.
Encrypt with an individual key for each user. Throwing away the key is indistinguishable from deletion.
Google-scale companies have very capable people employed, both on the technical and legal side, who do nothing else than look for these kinds of oversights, and are empowered to make sure they get fixed.
That's why they get fined all the time?
Before you make a deletion request, make a subject data request and see what they have on you; then request deletion; then make a subject data request again.
The fact they cannot access the data during subject data request does not mean it has been deleted.
I can't speak for any other companies, but you don't need to speculate. You can search the internet and find several articles outlining that the correct strategy for businesses here is to delete the data from production systems, and then maintain a record of references to those deleted records such that a restored backup can ensure that deleted records are not put back into production.
There is generally an expectation that data may be retained in backups for a specified retention period, but will not be used or restored. Beyond that, it is up to the regulator to determine if this meets the standard, but it's worth noting that there are notions baked into the text and the interpretations of the text of GDPR that account for reasonable costs and efforts.
Auditors can and do test and monitor for this, both using audit processes and demanding evidence, and by performing manual testing and experimentation.
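A minimal sketch of that strategy, assuming the ledger stores only opaque user IDs and lives outside the backups it protects. The function names and the dict-based "databases" are hypothetical, just to show the shape of it.

```python
def delete_user(production: dict, tombstones: set, uid) -> None:
    # Remove from production and remember the reference, not the data.
    production.pop(uid, None)
    tombstones.add(uid)


def restore_from_backup(backup: dict, tombstones: set) -> dict:
    # Rows for erased users are filtered out before anything reaches production.
    return {uid: row for uid, row in backup.items() if uid not in tombstones}


production = {1: "alice", 2: "bob", 3: "carol"}
backup = dict(production)  # snapshot taken before the deletion request
tombstones = set()

delete_user(production, tombstones, 2)

# Later: disaster recovery from the old snapshot.
production = restore_from_backup(backup, tombstones)
```

The old snapshot still contains user 2 until it ages out, but the restore path never puts that row back into the live system.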
Fines for non-compliance with GDPR regarding data of European citizens can amount to 4% of annual revenue:
Under Art. 83(5) GDPR, the fine framework can be up to 20 million euros, or in the case of an undertaking, up to 4% of their total global turnover of the preceding fiscal year, whichever is higher.
I have built systems for a lot of EU companies, and they all took GDPR compliance very seriously.
Maybe some mom-and-pop shop would bodge it, but any serious business has legal counsel and wisely listens to them.
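To put numbers on the "whichever is higher" rule quoted above, here is a small sketch of the ceiling for an undertaking. The function name is made up, and this is only the upper bound; actual fines are set case by case and are usually far below it.

```python
def art_83_5_ceiling_eur(global_turnover_eur: int) -> int:
    # Upper bound for an undertaking: the higher of EUR 20 million
    # and 4% of the preceding year's total global turnover.
    # Integer arithmetic keeps the result exact.
    return max(20_000_000, global_turnover_eur * 4 // 100)


small_company = art_83_5_ceiling_eur(100_000_000)    # 4% would be 4M, so 20M applies
large_company = art_83_5_ceiling_eur(2_000_000_000)  # 4% is 80M, which is higher
```

So the flat 20 million figure dominates until turnover passes 500 million euros, after which the percentage takes over.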
100% this. It's laughable if you believe those requests work as expected. Sure they may "delete" some surface level bs like your account or login, but there is no way it's 100% scrubbed in the way it's supposed to work.
A lot of recourse is around intent and liability. I would like to believe my request is honored; in the event it is later proved to not have been honored, recourse is potentially available through legal and regulatory mechanisms.
23andme didn't implement strong customer identity and auth mechanisms, for example, and it cost them ~$30M to settle their data breach liability [1]. Take action, keep receipts, and failing good faith actions, step back while regulators and the legal system whack whack whack with a hammer.
[1] https://news.ycombinator.com/item?id=41536494 ("HN: 23andMe settles data breach lawsuit for $30M")
Oh nice, "~$30M to settle." That <$100 you get back in the class action will be amazing compensation. Sadly the legal route is a joke at this point.
> I'm happy if it contributes to the death of the org.
But not the death of your data. That will be sold onto someone else.
Note that even if they delete the data, if you have close relatives that submitted their samples a company can still infer quite a lot from that.
One instance where I am disappointed to be vindicated.
Considered doing 23andMe at the hype peak, discovered they had avoided HIPAA requirements, read through their privacy policy, and marked them off the possibility list.
It was pretty clear the delta between sequencing costs and price they were charging consumers equaled how much they thought they could make from your genetic information.
And because they don't fall under HIPAA, your data is theirs after they get it.
PS: Sequencing costs were also falling rapidly, so it isn't that expensive to get it done.
They do not do DNA sequencing. They do genotyping. It's far less detailed.
That too. And at full sequencing for <$1000 now, why not just pay for the whole shebang? It's not like someone is doing it monthly.
Do you know where one can get sequencing done at a HIPAA regulated provider?
I would talk to your primary care person, they should know.
I've had two members of my family die of ALS and was wondering what my odds are of getting it. One of the steps could be a full DNA sequence. In order to get to that step however, you have to do about six months of counseling, and several blood tests before they do the full DNA sequence. The counseling is to prepare yourself for the possibility of them essentially giving you a death sentence with the blood and DNA results.
I never got that far. My father convinced me it's better not to know and live your life accordingly, rather than trying to live a life always looking over your shoulder.
But my primary did have the information on how I could get it done, so I would start there.
I don't think I have ever seen a correctly implemented data deletion request system that worked well with the company's backups. If it's backed up, it's likely not getting deleted.
I have seen plenty. The key is to take frequent backups and aggressively delete older backups once you know you won't be restoring from that backup. Also, don't appropriate backups to do other things, such as audit logs.
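As a sketch of what "aggressively delete older backups" can look like: a pruning job with a fixed retention window. The 14-day window and the function name are assumptions for illustration, not a recommendation for any particular system.

```python
from datetime import datetime, timedelta


def prune_backups(backup_times, now, retention):
    """Split backup timestamps into (kept, deleted) by a retention cutoff."""
    cutoff = now - retention
    kept = [t for t in backup_times if t >= cutoff]
    deleted = [t for t in backup_times if t < cutoff]
    return kept, deleted


now = datetime(2024, 6, 30)
daily_backups = [now - timedelta(days=d) for d in range(60)]  # 60 daily snapshots
kept, deleted = prune_backups(daily_backups, now, timedelta(days=14))
```

With a window like this, a record hard-deleted from production today is gone from every remaining backup once the window has passed, without anyone editing a backup in place.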
I tried to download my raw data recently and it took days. Seems like a lot of customers are trying to download it and cancel after the turmoil. I think 23andme has always been held hostage by its scientists who have stopped it from offering a lot of entertaining information about health related studies that are not considered methodologically sound enough to constitute health advice. Why not just add a "speculative or insufficiently replicated / peer-reviewed" section and let us have fun with our data!
I tell ya, it's a great party conversation that begins with "Hey, I'm a Libra, 3% Neanderthal, and I share a haplogroup with Genghis Khan! Let's go out for some tacos with extra cilantro, and a dark chocolate churro!"
Don't know why you'd bother. I, and my friends, and soon my family are in All of Us. We'll be in every genomics dataset you want.
As a California company the data is subject to the CCPA. You can download your data but more importantly you can request they delete it. I highly recommend that everyone do so.
I can think of no more sensitive biometric data than your dna.
> I can think of no more sensitive biometric data than your dna.
I dunno, is that actually true? You leave DNA everywhere don't you? If someone really wanted tombert's DNA, they'd just have to follow me onto the train and swab the pole I'm grabbing, or grab the cup I was sipping on at McDonald's, or any number of things that could lead to a number of cells containing my DNA in a state that could be collected being dropped.
Hardly the same.
Your day to day DNA “leavings” aren’t neatly packaged up and associated with your other PII like name, location, email address, etc in a stolen, searchable dataset.
If they know who you are, it's easy. If all they have is the DNA, that database will link it to you. They have caught several serial killers because close relatives were in DNA databases, which allowed homing in on a small number of living suspects.
Following you probably costs more per hour than buying a whole country's DNA from a broker. And definitely costs more than a leaked dataset.
You do realize there's a difference between obsessing about tombert (or groby) vs hoovering up DNA at scale, right? Your insurance company probably won't follow you around personally, but if they can buy a bunch of DNA (yours included) for a few dollars/person and use that to strategically deny claims/increase costs, yeah, they'll sign up for that.
The issue with digital data is almost never the individual targeting case. Cheap mass surveillance is the concern.
Honestly it doesn't even matter. There is no proof the DNA is yours because they do no validation of users' identities.
Speculative results with statistical likelihood are still highly valuable to the right buyers.
People are convicted all the time without any "proof" of guilt. It all goes to "beyond a reasonable doubt" and with enough circumstantial evidence, that "beyond" can be achieved.
I still find it astonishing anyone would be so careless of their own and close blood relatives' privacy to hand over their genetic material to a private company. What were you thinking? You can't undo that and you can't change your DNA ever. You have no idea where it ends up any time -- and that "any time" covers your lifetime and your close blood relatives' entire lifetimes too. These companies should have never been able to get a single customer but I guess.
And here we are 18 years later and some people still think they can delete this. What else do you believe in? The tooth fairy? Santa Claus? Come on.
Also what have you thought they can tell you? An archaeogenetics teacher described this belief as "they think we throw a bone in the machine which tells us it was half hun, half avar, half bear and spoke slavic".
Y'all surrendered an intrinsic part of the privacy of yourself, your sister, your brother, even your unborn children for snake oil -- and paid for the privilege. I can't even.
commence the downvotes but you can't put the toothpaste back once it's been squeezed out.
I agree completely about it being careless etc but I am not astonished at all that so many people have done it.
Have you seen how simple minded the masses are? They find it hard to think! They are barely sentient!
When you ask a company to delete your data, you're actually asking them to pretend they deleted it by making it invisible to you. There's too much $$$ sloshing around for them to behave ethically.
What, you mean sending your DNA to a random startup to have them analyze it for you was a BAD idea? #surprisedpikachu
Why was it a BAD idea? What negative consequences have people faced so far? If they've also benefited from the service, how are they supposed to judge if it was a mistake or not?
Of course it was a mistake. That data will 100% be compromised, if it hasn't been already. If there's a way for it to be used against them it will be found.