Comments Page - Stepanov's biggest blunder? The curious case of adjacent difference

« Back Stepanov's biggest blunder? The curious case of adjacent differencemmapped.blogSubmitted by signa11 4 days ago

voidmain a day ago
partial_sum has the same problems (eg when T - T isn't T) and would be more generic if it took the initial accumulator as a separate parameter (and then it and a more generic adjacent_difference would be symmetrical again)
- kragen 7 hours ago
  I agree that that would solve the symmetry problem, and it would allow the type of the items in the generated sequence to be different from the input sequence. However, the problem is much less severe for partial_sum, because in most cases, it is possible to add two items of the relevant type.
  The concept of "torsors" seems relevant here: https://math.ucr.edu/home/baez//torsors.html
  > We can express this in terms of torsors as follows: energy differences lie in the group of real numbers R, but energies themselves do not: they lie in an "R-torsor".
  So in the date example, we have a set T of intervals like "three days" which we can add to members of the set D of datetimes to get new elements of D, which is the torsor of T:
  D + T : D
  (I always forget which one is the torsor. It's the one that doesn't have an identity, but I haven't yet said the identity of what.)
  And we can subtract two D elements to get the T from one to the other:
  D - D : T
  Which we define with the property
  d1 + (d2 - d1) = d2
  This is sufficient to define your suggested versions of the algorithms, which are, as you say, inverses.
  partial_sum : (D, T[n]) -> D[n+1] adjacent_difference : D[n+1] -> (D, T[n])
  That is, partial_sum is a function from a torsor element, and a sequence of n elements of the set T of deltas on that torsor set, to a sequence of n+1 torsor elements, and vice versa for adjacent_difference.
  Now, we can try to define an interval addition operation:
  T + T : T
  If we have two intervals t1 and t2, their sum is:
  t1 + t2 = ((d1 + t1) + t2) - d1
  But nothing above guarantees that this result is independent of d1! And I suspect that a real-world example might be lurking in calendar math, where you might support intervals of "±n days" and "±m months", and an interval like "+2 months" might be realized as a varying number of days depending on month lengths, but never returned from the datetime-subtraction function.
  In such a case your chosen set of intervals might not contain an element that could correctly represent "+2 months" + "+7 days". With different choices of d1, you might get a sum of "+67 days", "+66 days", "+69 days", etc.
  The STL prefix_sum can't handle that, because it needs T + T to be well-defined. But your more-generic version takes a starting D element, so it can handle it fine.
  However, I think it's fairly unusual to use such unruly data types with these algorithms. Torsors are defined, normally, on groups, so addition on T is guaranteed to not just exist, but have an identity and an inverse for every element, and be associative. And in such cases the STL's T[n] -> T[n] prefix_sum is in some sense "just as generic", because you can freely transform between D[n] and T[n] by adding or subtracting a predetermined "epoch date" or "freezing point".
  ——*——
  There are some lovely wildflowers here if you have the patience for rambling off the trail a bit! The associativity of group operations gives rise to an efficient parallel algorithm for partial_sum (prefix sum) closely related to the parallel sum algorithm that Stepanov says originally inspired his work on generic algorithms when he realized that it was applicable to any monoid.
  Often parallel algorithms correspond closely with incremental algorithms; that O(lg N) sum algorithm applicable to general monoids corresponds closely with an O(lg N) incremental sum update algorithm applicable to general monoids, where you sift updates up a binary tree from the changed input to the root. But for abelian groups like integer addition, there's an O(1) update algorithm: subtract the old value from the new, and add that difference to the sum.
  However, the O(lg N) version is applicable to other monoids, such as max and min, which yields efficient incremental algorithms for problems like morphological dilation and erosion, where prefix sums are not applicable.
fph 2 days ago
I think it's also because C++ has no generic concept of "zero"; otherwise one could have defined the first element of adjacent_difference(v) as v(1)- zero<typeof(v)>, and it would have been type-stable.
- plorkyeran a day ago
  T{} is now sort of that, but it didn't exist yet when adjacent_difference was written.
  tialaramex a day ago
  Sort of. T {} is IIUC C++ "zero initialization" but this doesn't actually promise to give us the zero (additive identity) of T but instead to basically smash all T's constituent elements to zero which may not have that effect. This does what you meant for simple types like the signed integers or floats at least.
- nurettin 5 hours ago
  They could have added an init parameter if int() or double() isn't enough
- akoboldfrying 2 days ago
  I think that would fix the issue at the type level (up to a point; for unsigned types, the "correct" type for the result of a subtraction is underdetermined -- it could be any of {same type, larger signed type, same-size signed type} depending on the circumstances).
  But I think the more serious niggle is the fact that that first element shows up in the output at all. OTOH, I suppose you could write a discard_first iterator adaptor that ignores the first write and increment and passes the rest through to the underlying output iterator.
AnotherGoodName 8 hours ago
std::vector<bool> is the actual biggest stl blunder fwiw.
It’s a overridden specialisation of vector that only occurs when you use bool as the type. It tries to store the bools in a bit array which completely breaks various vector and container functionality.
Universally agreed as a blunder and a great example of premature optimisation. Herbs commentary on it: http://www.gotw.ca/publications/N1185.pdf
- gpderetta 5 hours ago
  Was vector<bool> from the original STL or is it a committee invention?
  tialaramex 36 minutes ago
  It does look like the SGI STL from the 1990s has the std::vector<bool> mistake
  https://github.com/uatuko/sgi-stl/blob/master/lib/stl_bvecto...
  At first I thought OK there's no sign of it, and then I realised it'd be a specialisation in a separate part of the codebase and sure enough it's there, it's really obvious in hindsight that this just isn't the same type, it's all separate implementation, but hey that's a specialisation right ?
  I suppose it's possible this is a result of committee feedback and was not in the "original" STL from even before that ?
wowczarek 9 hours ago
As someone who's done very little with C++ - not enough to explore what std:: has to offer today anyway - but a lot with C and with time and frequency, looking past the quirk and the typing consequences, all I could see was "oh look, free Allan Deviation!".
getnormality 2 days ago
For anyone wanting to go deeper, Knuth's Concrete Mathematics covers the discrete calculus topics mentioned here (and much more).
- pizza 2 days ago
  This is also known as "The Fundamental Theorem of Stream Calculus" in stream calculus. Using coinduction for an (infinite) stream sigma, eg
  sigma(0) = head(sigma) sigma' = tail(sigma) (a ++ sigma)' = sigma