Ironically, this article is full of the sort of semantic confusions that cause the problem in the first place. The reporter clearly hasn't run the article past an actual programmer, as she seems to think this outcome is a deliberate design choice rather than a bug:
> Null was first programmed 60 years ago by a British computer scientist named Tony Hoare ... Hoare probably wasn’t thinking about people with the 4,910th most common surname. He later called it his billion-dollar mistake, given the amount of programmer time it has used up and the damage it has inflicted on the user experience.
Obviously Hoare's statement wasn't about this problem at all. She's also giving readers the impression Microsoft has some sort of policy against using null values:
> “It’s a difficult problem to solve because it’s so widespread,” said Daan Leijen, a researcher at Microsoft, who says the company avoids use of null values in its software.
Whatever Leijen said, I'm pretty sure it wasn't that.
I really don't get why journalists so rarely do basic fact checking of their own articles by asking an independent source for a final read-through. Many of them actually have policies against doing this, which leads to an endless stream of garbled articles that undermines their credibility without them even noticing.
> > “It’s a difficult problem to solve because it’s so widespread,” said Daan Leijen, a researcher at Microsoft, who says the company avoids use of null values in its software.
> Whatever Leijen said, I'm pretty sure it wasn't that.
I had a good lol when I read that, imagining some top-level decree to NEVER use null values in any context in all of Microsoft.
> Whatever Leijen said, I'm pretty sure it wasn't that
What makes you so sure? This is Hoare's apology for creating the null reference:
> I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.
Given that null references cause crashes, why would it be unreasonable for a researcher at Microsoft to say they try to avoid them? Is this journalist really so far off?
> I really don't get why journalists so rarely do basic fact checking of their own articles
If you compare journalism to our other sources of information - such as comments on Hacker News and posts on social media - I think it holds up quite well, especially when the outlet is a reputable organization. It's quite fashionable for technology people to be highly critical of what they (pejoratively?) call "legacy media", but the alternatives the technology industry has brought forward, like social media, are far, far worse in terms of accuracy, and also do very little of the kind of investigative reporting that is crucial for holding powerful officials to account.
> I really don't get why journalists so rarely do basic fact checking
It takes time and effort with no discernible upside. In fact, knowing the true facts would make it harder for journalists to bias the story the way they want without feeling a bit bad about lying. It's easier for them if they don't know.
> that undermines their credibility without them even noticing.
Not really. Even the vanishingly small minority of readers who know the details of the story in question suffer from Gell-Mann amnesia, and continue to believe that all other stories (by the same paper, and even from the same reporter) are perfectly accurate.
Unfortunately, that seems to be the quality of "professional" journalism nowadays. I wouldn't be surprised if AI was complicit as well (though I don't suppose it'd make much difference: the slop was just as low quality prior to recent years, so it may as well have been AI-generated then too).
It used to be indie publications; now I find indie YouTubers tend to be generally superior (though you still have to do your own filtering and selection, of course).
I have my own mildly amusing story of breaking systems with my name. I have a twin with the same first initial. Any time we had to use a system at school which constructed usernames from some combination of first initial, surname, and date of birth, only one account would be provisioned between the two of us.
It became almost a ritual in the first term of the school year for us to make a visit to IT Support and request a second account... there was always a bit of contention between us about who got the 'proper' username and who got the disambiguated one!
I set up a directory system for a small school. The students' logins were a combination of initials and date of birth. When I created the scheme I knew that a set of twins would break the system. Three to five years later we finally got a set of twins that forced us to modify the system. I called them to my office, found out which one was born first, and appended 1 and 2 to their usernames.
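A minimal sketch of that kind of scheme with the numeric-suffix fix bolted on (the format, the "JD" initials, and the date here are illustrative assumptions, not the actual system):

```python
# Hypothetical initials + date-of-birth username scheme with a numeric
# suffix appended on collision (e.g. twins). Illustrative only.
def make_username(initials: str, dob: str, taken: set) -> str:
    base = initials.lower() + dob          # e.g. "jd20000101"
    if base not in taken:
        return base
    suffix = 1
    while base + str(suffix) in taken:     # first collision gets 1, next 2, ...
        suffix += 1
    return base + str(suffix)

taken = set()
for initials, dob in [("JD", "20000101"), ("JD", "20000101")]:  # twins
    name = make_username(initials, dob, taken)
    taken.add(name)
    print(name)
# jd20000101
# jd200001011
```

(Note the suffix runs straight into the date of birth, which is one reason a real scheme would want a separator or fixed-width fields.)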
At my school you were provisioned as <two-digit-start-year><first-three-of-first-name><first-three-of-surname>, e.g. Joe Bloggs starting in 2000 is 00joeblo.
Cue problems when two Simon Smiths joined in the same year. They were given 00simsmi and 00smisim, I think.
I am pretty sure they spent 7 years at school forwarding each other's email, as various teachers assumed the default would work.
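A rough reconstruction of that format, just to show where it collides (Joe Bloggs and the two Simon Smiths are the examples above; the function itself is my guess, not the actual school code):

```python
# Rough reconstruction of the <two-digit-start-year><first-three><last-three>
# scheme described above.
def school_username(first: str, last: str, start_year: int) -> str:
    return f"{start_year % 100:02d}{first[:3].lower()}{last[:3].lower()}"

print(school_username("Joe", "Bloggs", 2000))   # 00joeblo
print(school_username("Simon", "Smith", 2000))  # 00simsmi
print(school_username("Simon", "Smith", 2000))  # 00simsmi -- same string, so
                                                # the second Simon collides
```

The 00smisim workaround is just the name fields swapped, which of course nothing downstream knew about.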
A former employer had a first-letter-of-given-name + last-name (actually first 5 letters) convention for email addresses. They did have a fallback, usually using the second letter of the given name. But, of course, a lot of people just automatically emailed the conventional address, with the result that certain email "twins" got misdirected mail.
A very common one at one point was that the CFO shared a first initial with his daughter. As I recall, he actually had the email in the usual convention, so it's not like his daughter was receiving lots of highly confidential financial info, but there was regularly misdirected mail.
My university was [initial][lastname][year][letter][letter]@school.com, allowing for 26^2 (676) people to have the same initial and last name every year.
Despite having a rare last name and no twins, I was "AC".
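Assuming the two trailing letters were just handed out in sequence (which is my assumption, not the university's documented allocator), the disambiguation space looks like this:

```python
from itertools import product
from string import ascii_lowercase

# 26^2 = 676 two-letter suffixes per (initial, last name, year) bucket,
# assuming sequential allocation.
suffixes = ["".join(pair) for pair in product(ascii_lowercase, repeat=2)]
print(len(suffixes))   # 676
print(suffixes[:3])    # ['aa', 'ab', 'ac'] -- "ac" is only the third slot
```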
I never know whether to use the ć in my name when signing up for systems that require full legal names (banks et al.). Even my own country's gov't sites break when I input my real name, but refuse to accept the name without the ć. It was a real pain in the ass when I had to make an appointment and go in to some dingy office at 6am because their system doesn't support one of the most common letters in Serbian names. There's like a 90% chance the dude who made the system has a similar name with a ć or č in it, even!
Surprisingly, this has never broken for me in either Indonesia or the Netherlands: whenever I've put the ć in, it just got converted to a regular c, which is perfectly acceptable to me (for context, it's pretty easy to guess which c is actually a ć or č in Serbian, and similarly for s/š or z/ž, so seeing text without the proper diacritics doesn't really matter in most cases). My Dutch ID even correctly has the ć!
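That ć-to-c conversion is presumably something like Unicode decomposition with the combining marks dropped; a minimal sketch of that approach:

```python
import unicodedata

def strip_diacritics(name: str) -> str:
    # NFKD decomposes "ć" into "c" plus a combining acute accent; dropping
    # the combining marks leaves plain "c". The same works for č, š, and ž.
    # Caveat: "đ"/"Đ" have no decomposition and would need an explicit
    # mapping to "d"/"D".
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("Đorđević"))  # "Đorđevic" -- only the ć simplified
```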
In Severance S2E1, Mark W says to Mark S: "Would you be open to using a different first name to avoid confusion?"
Did anyone ever ask that of you or your twin?
Meanwhile, a friend of mine's last name was Li, and IDs were first initial + last name + #.
Her number had three digits.
I'm guessing it's the birthday that really messed the system up though?
It's weird that none of the systems automatically fell back to disambiguating with a number or something similar when the 'proper' username already existed. I'm wondering: if you had been a year or two apart instead, would the system simply have silently failed to create an account for the younger sibling when they joined?