I find floating point NaN != NaN quite annoying. But this is not specific to Rust: it affects all programming languages that support floating point. All libraries that want to support ordering for floating point need to handle this special case, that is, all sort algorithms, hash table implementations, etc. Maybe it would cause fewer issues if NaN didn't exist, or if NaN == NaN. At least, it would be much easier to understand and more consistent with other types.
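For anyone who hasn't run into this, here is a minimal sketch of what the special case looks like in Rust using only the standard library. The function names are made up for illustration; `partial_cmp` and `total_cmp` are the real `f64` methods.

```rust
use std::cmp::Ordering;

/// Returns true only if `x` compares equal to itself; this is false for NaN.
fn equals_itself(x: f64) -> bool {
    x == x
}

/// Compares two floats; yields None when either operand is NaN,
/// which is why `f64` implements PartialOrd but not Ord.
fn try_compare(a: f64, b: f64) -> Option<Ordering> {
    a.partial_cmp(&b)
}

/// Sorts using the IEEE 754 total order exposed by `f64::total_cmp`,
/// which gives every bit pattern (including NaNs) a defined position.
fn sort_total(values: &mut [f64]) {
    values.sort_by(|a, b| a.total_cmp(b));
}
```

Calling `values.sort()` on a `Vec<f64>` does not compile at all, so you are forced to pick one of these strategies explicitly.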
I agree. In my opinion NaNs were a big mistake in the IEEE 754 spec. Not only do they introduce a lot of special casing, they also consume a relatively big chunk of all values in 32-bit floats (~0.4%).
I am not saying we do not need NaNs (I would even love to see them in integers, see: https://news.ycombinator.com/item?id=45174074), but I would prefer if we had fewer of them in floats, with clear sorting rules.
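The ~0.4% figure can be sanity-checked from the binary32 layout: a NaN is any pattern with all 8 exponent bits set and a non-zero 23-bit mantissa, with either sign. A quick sketch (helper names are mine):

```rust
/// Number of 32-bit patterns that decode to NaN:
/// exponent all ones, non-zero 23-bit mantissa, either sign bit.
fn nan_pattern_count_f32() -> u64 {
    let mantissa_bits = 23;
    let nonzero_mantissas = (1u64 << mantissa_bits) - 1;
    2 * nonzero_mantissas
}

/// Share of all 2^32 bit patterns that are NaNs.
fn nan_fraction_f32() -> f64 {
    nan_pattern_count_f32() as f64 / (1u64 << 32) as f64
}
```

That works out to 16,777,214 patterns out of 4,294,967,296, a bit under 0.4%.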
There's a helpful crate that abstracts that away: https://docs.rs/ordered-float/latest/ordered_float/
You have a totally ordered `NotNan` struct that wraps a float that's guaranteed not to be NaN, and an `OrderedFloat` that considers all NaNs equal, and greater than all non-NaN values.
These are basically the special-cases you'd need to handle yourself anyway, and probably one of the approaches you'd end up taking.
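To make the "you'd handle it yourself anyway" point concrete, here is a rough hand-rolled wrapper with the same ordering rule described above (all NaNs equal, and greater than everything else). `NanLast` and `sort_nan_last` are hypothetical names for this sketch; this is not the crate's actual implementation.

```rust
use std::cmp::Ordering;

/// Hypothetical wrapper: all NaNs compare equal and sort after every other value.
#[derive(Debug, Clone, Copy)]
struct NanLast(f64);

impl PartialEq for NanLast {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for NanLast {}

impl PartialOrd for NanLast {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for NanLast {
    fn cmp(&self, other: &Self) -> Ordering {
        match (self.0.is_nan(), other.0.is_nan()) {
            (true, true) => Ordering::Equal,
            (true, false) => Ordering::Greater,
            (false, true) => Ordering::Less,
            // Neither side is NaN here, so partial_cmp cannot return None.
            (false, false) => self.0.partial_cmp(&other.0).unwrap(),
        }
    }
}

/// Sorts a slice of floats with NaNs pushed to the end.
fn sort_nan_last(values: &mut [f64]) {
    values.sort_by_key(|v| NanLast(*v));
}
```

Note that under this rule -0.0 and 0.0 still compare equal, since the non-NaN branch falls back to the ordinary float comparison.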
I wonder if "any code that would create a NaN would error" would suffice here. I don't think it makes sense when you actually start to implement it, but I do feel like making a NaN an error would be helpful. Why would you want to handle a NaN?
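One way to picture the "NaN is an error" idea at the library level is to check results at the boundary and return a `Result`. This is only a sketch; `NanProduced`, `reject_nan` and `checked_sqrt` are invented names, and a real design would have to wrap every operation that can yield NaN.

```rust
/// Hypothetical error type for an operation whose result was NaN.
#[derive(Debug, PartialEq)]
struct NanProduced;

/// Passes a value through unless it is NaN.
fn reject_nan(x: f64) -> Result<f64, NanProduced> {
    if x.is_nan() {
        Err(NanProduced)
    } else {
        Ok(x)
    }
}

/// Square root that reports an error instead of returning NaN
/// (for example, for negative inputs).
fn checked_sqrt(x: f64) -> Result<f64, NanProduced> {
    reject_nan(x.sqrt())
}
```

The cost is that every call site now has to deal with the `Result`, which is roughly where this tends to stop making sense in practice.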
Well floating point operations never throw an exception, which I kind of like, personally. I would rather go in the opposite direction and change integer division by zero to return MAX / MIN / 0.
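A small sketch of both halves of that: float division by zero quietly yields an infinity or NaN, while the integer version below applies the MAX / MIN / 0 rule suggested here. `div_or_saturate` is a hypothetical helper, not something the standard library provides (plain `/` on integers panics on a zero divisor).

```rust
/// Float division by zero never panics: it yields +inf, -inf, or NaN.
fn float_div_by_zero(x: f64) -> f64 {
    x / 0.0
}

/// Hypothetical integer rule: x / 0 yields MAX for positive x,
/// MIN for negative x, and 0 for 0 / 0.
fn div_or_saturate(x: i32, y: i32) -> i32 {
    if y == 0 {
        match x.signum() {
            1 => i32::MAX,
            -1 => i32::MIN,
            _ => 0,
        }
    } else {
        // saturating_div also covers the i32::MIN / -1 overflow case.
        x.saturating_div(y)
    }
}
```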
But NaN could be defined to be smaller or higher than any other value.
Well, there are multiple NaNs. And NaN isn't actually the only weirdness; there's also -0, and we have -0 == 0. I think equality for floating point is weird anyway, so then why not just define -0 < 0.
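For what it's worth, Rust's `f64::total_cmp` already orders things this way: `-0.0` sorts before `0.0` even though `==` says they are equal. The sketch below also shows two NaNs with different bit patterns; the helper names and the specific payload bits are just picked for illustration.

```rust
use std::cmp::Ordering;

/// Returns (`-0.0 == 0.0`, ordering of -0.0 relative to 0.0 under total_cmp).
fn zero_comparisons() -> (bool, Ordering) {
    let neg = -0.0_f64;
    let pos = 0.0_f64;
    (neg == pos, neg.total_cmp(&pos))
}

/// Builds two quiet NaNs that differ only in their payload bits
/// and reports whether both are NaN yet have distinct bit patterns.
fn nans_have_distinct_bit_patterns() -> bool {
    let a = f64::from_bits(0x7ff8_0000_0000_0000);
    let b = f64::from_bits(0x7ff8_0000_0000_0001);
    a.is_nan() && b.is_nan() && a.to_bits() != b.to_bits()
}
```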
I mentioned in a sibling comment, there's a crate that does this in a pretty simple and obvious way: https://docs.rs/ordered-float/latest/ordered_float/
If you don't handle NaN values, and there are NaNs in real observations (for example, from real sensors that sometimes return NaN and outliers), then the sort order is indeterminate regardless of whether NaN == NaN: the identity function collides, because there isn't enough entropy for a partial or total ordering if multiple records have the same key value of NaN.
How should an algorithm specify that it should sort by insertion order instead of memory address order if the sort key is NaN for multiple records?
That's the default in SQL Relational Algebra IIRC?
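In Rust, at least, one way to get insertion order for records whose key is NaN is to treat all NaN keys as equal and rely on a stable sort: `slice::sort_by` is documented as stable, so equal elements keep their original relative order. A sketch, with a made-up `Reading` record type:

```rust
use std::cmp::Ordering;

/// Hypothetical sensor record; `id` reflects insertion order.
#[derive(Debug, Clone)]
struct Reading {
    id: u32,
    value: f64,
}

/// Sorts by value with NaN keys last. Because sort_by is stable,
/// records with NaN keys stay in the order they were inserted.
fn sort_readings(readings: &mut [Reading]) {
    readings.sort_by(|a, b| {
        // false < true, so non-NaN values come before NaN values.
        a.value
            .is_nan()
            .cmp(&b.value.is_nan())
            // Both NaN: partial_cmp is None, treat as Equal.
            // Both non-NaN: partial_cmp is always Some.
            .then_with(|| a.value.partial_cmp(&b.value).unwrap_or(Ordering::Equal))
    });
}
```

Using `sort_unstable_by` instead would drop that guarantee, and the NaN records could come out in any order.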
What is a good sort key for Photons and Phonons? What is a good sort key for H2O water molecules?