• dataflow 7 hours ago

    > There are problems for which deterministic exact sublinear time algorithms are known.

    I can imagine silly examples (like "find the minimum element in this list under the assumption that no more than O(sqrt(n)) elements exceed the minimum"...), but what's an interesting example of this?

    • _jab 7 hours ago

      Binary search is the obvious example.

      What it and your example have in common is that a significant constraint exists on the input. I can't imagine how a deterministic algorithm with unconstrained input can process that input in sublinear time, but I would love to learn otherwise.
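
      For reference, a minimal sketch (Python; the sorted-input precondition is exactly the constraint being pointed at):

```python
# Binary search: deterministic, exact, O(log n) -- but only because the
# input is constrained to be sorted.
def binary_search(xs, target):
    lo, hi = 0, len(xs) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if xs[mid] == target:
            return mid
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1  # not found
```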

      • karparov 2 hours ago

        Another obvious example: what's the mean of an unsorted list of integers? If you take a random sample of sqrt(n) values and average them, your guess is with high probability pretty good. Or a log(n) sample. (That's how election polling works too, which uses even an O(1) sample, though not a random one.)

        Edit: Ah, GP asked for deterministic exact.
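
        A minimal sketch of that sampling estimate (Python; probabilistic, not the deterministic-exact thing GP asked for, per the edit):

```python
import random
import statistics

# Estimate the mean from a sqrt(n)-sized random sample.
# Sublinear in n, but only correct with high probability, not exactly.
def sampled_mean(xs, rng=random):
    k = max(1, int(len(xs) ** 0.5))
    return statistics.fmean(rng.sample(xs, k))
```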

        • amelius 25 minutes ago

          What if one of the integers is significantly larger than the rest?

        • dataflow 7 hours ago

          I can't imagine that's what they meant? The text very specifically says: "Indeed, it is hard to imagine doing much better than that, since for any nontrivial problem, it would seem that an algorithm must consider all of the input in order to make a decision." For them to be thinking of binary search, they would have to be effectively saying "it is hard to think of binary search", which would be a rather absurd position from a CS professor, especially given binary search is quite literally the first algorithm every programmer learns.

          So I took it to mean there's something interesting here where the inputs could literally be anything, not heavily constrained. But I can't imagine what that might be.

          • kadoban 7 hours ago

            > especially given binary search is quite literally the first algorithm every programmer learns.

            I get what you're saying, and it doesn't change your point, but: no _way_ is binary search the first algorithm people learn. For binary search to even be intelligible you already have to know, at a minimum, linear search and the concept of sorting (you probably know a sorting algorithm first). You also learn dozens of other really simple algorithms first in practice.

            • ssivark 3 hours ago

              > no _way_ is binary search the first algorithm people learn. For binary search to even be intelligible you already have to know, at a minumum linear search and the concept of sorting

              Almost every 6-10 year old kid who had to use physical dictionaries intuitively learned (probably even discovered by themselves) something like binary search. It's a different matter whether they could formalize that into an algorithm and write code to handle all the edge cases. But the basic idea is very intuitive. Kids can also pick up the intuition to incorporate improvements even beyond balanced binary search, e.g. there might be a lot of words starting with "S", so split into two groups at a little less than the middle, etc.

              • karparov 2 hours ago

                Moving the goalposts?

                If you are asking which is the "first algorithm" a human learns in their life, then it's likely more related to movement (crawl? walk? move food towards mouth?) or selection (which item can I eat? who are my parents?) rather than a physical dictionary. And that's before considering that it's been a while since kids regularly encountered a physical dictionary.

                If you are asking about formal algorithms, then we're talking about the beginning of a programmer's or computer scientist's education, and then it's usually some form of O(n^2) sort that they will encounter first, if we don't count things like "how to add two multi-digit integers", which is typically an algorithm every kid learns in primary school.

                Binary search tends to be one of the first recursive algorithms that are taught, which is another level entirely in terms of intellectual development.

                • ssivark 24 minutes ago

                  I guess my response was to how I read your comment fitting in with the higher level discussion. My main point is that many of these algorithms are intuitive, and kids learn these much earlier than when they learn formal programming (which might typically be in their teens).

                  Looking over your comment again, I also don't dispute that linear search and sorting are simpler -- even toddlers learn these.

              • bee_rider 5 hours ago

                Agreed, WRT the bigger picture; lots of little algorithms could come before binary search.

                But, giving them binary search before sorting kinda works. It is motivating. If you do sorting first, it just seems like a weird high-effort excursion into a niche bookkeeping thing about lists. Once they see how fast binary search is (just give them pre-sorted lists to run it on), sorting becomes interesting, right?

                • linguae 5 hours ago

                  This is what I do in my introductory data structures and algorithms course at a Bay Area community college: I teach binary search as part of my introduction to recursion, and then the following lectures are a series of lessons on sorting algorithms, beginning with selection sort and then insertion sort. After teaching these O(n^2) algorithms, I spend a lecture on merge sort and then have a lecture on quicksort, both for covering O(n lg n) sorts and for introducing my students to divide-and-conquer recursive algorithms.
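
                  A minimal merge sort sketch along the lines of that lecture sequence (Python; illustrative, not taken from the course):

```python
# Merge sort: the divide-and-conquer O(n log n) sort -- split, sort
# each half recursively, then merge the two sorted halves.
def merge_sort(xs):
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    return merge(merge_sort(xs[:mid]), merge_sort(xs[mid:]))

def merge(a, b):
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:  # <= keeps the sort stable
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]
```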

                  • bee_rider 3 hours ago

                    It is a shame that quicksort has to be covered. I mean, it does have to be covered. But the fact that it has an O(n^2) cost for particular inputs, despite being generally considered O(n log n), seems to me to introduce some fuzziness into an otherwise solid concept.

                    But it does need to be covered.

                    Unfortunately.

                    (Mergesort is best).

                    • ncruces 39 minutes ago

                      Quicksort is an important, extremely flexible, and very hard to beat unstable comparison sort.

                      It's based on a very simple but powerful idea/strategy (divide & conquer). Its flexibility means it can be adapted to partially sort, find top-N, find the median or any other rank, all optimally.

                      And it's so much faster in practice than everything else (why?) that, even after mitigating its worst case, it often comes out ahead.

                      Also, it is relevant/necessary for teaching the concept of average, best, and worst cases in complexity analysis. What better way to do that than "the best sorting algorithm is terrible for some inputs"?

                      You can also use it to teach/learn adaptive algorithms (you're almost expected to): switch to something else in the base case, or in the worst case; can we do better for low cardinality; etc.

                      So, of course it needs to be covered. There's more to learn from 200 lines of Quicksort than from Mergesort: https://github.com/ncruces/sort/blob/main/quick/quick.go
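
                      For instance, the rank-finding flexibility mentioned above is quickselect: reuse the partition step but recurse into only one side. A minimal sketch (Python; first-element pivot for brevity, so it shares quicksort's adversarial worst case):

```python
# Quickselect: find the k-th smallest element (0-based k) by
# partitioning around a pivot and recursing into one side only.
# Expected O(n); worst case O(n^2) with this fixed pivot choice.
def quickselect(xs, k):
    pivot = xs[0]
    less = [x for x in xs[1:] if x < pivot]
    geq  = [x for x in xs[1:] if x >= pivot]
    if k < len(less):
        return quickselect(less, k)
    if k == len(less):
        return pivot
    return quickselect(geq, k - len(less) - 1)
```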

                      • chongli 2 hours ago

                        > But it has an O(n^2) cost for a particular input

                        Even worse is the fact that, for naive implementations (such as students might come up with), the worst-case behaviour occurs in very common cases, such as sorting already-sorted or reverse-sorted lists.
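
                        A quick way to see it: count element-vs-pivot comparisons in a naive first-element-pivot quicksort (Python sketch):

```python
# Naive quicksort with the first element as pivot. The counter tallies
# one element-vs-pivot comparison per partitioned element, to show the
# quadratic blow-up on already-sorted input.
def quicksort(xs, counter):
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    counter[0] += len(rest)
    less = [x for x in rest if x < pivot]
    geq  = [x for x in rest if x >= pivot]
    return quicksort(less, counter) + [pivot] + quicksort(geq, counter)

c = [0]
quicksort(list(range(200)), c)  # already sorted: 199 + 198 + ... + 1
# c[0] == 19900, versus roughly 1.39 * n * log2(n) ~ 2100 on average
```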

                    • kadoban 5 hours ago

                      Yeah, it does work as something you learn with/right-before/right-after a sorting algorithm, depending on the teaching style.

                  • FreakLegion 6 hours ago

                    They say a little ways down:

                    > there are classical optimization problems whose values can be approximated in sublinear time

                    This can actually be quite useful if the approximation is guaranteed, or even if it isn't, as long as it works well in practice.

                    https://en.wikipedia.org/wiki/Hardness_of_approximation

                    • dzaima 6 hours ago

                      I read that as saying that binary search isn't among those "nontrivial problem"s, along with most other things with known exact deterministic sublinear time algorithms.

                      And your first quote is followed by "However, for most natural problems", which further indicates that the known exact algorithms are for trivial problems.

                    • spoaceman7777 5 hours ago

                      I mean, yeah, binary search is sublinear, but the data first has to be sorted (or built into a search tree), which has a much more familiar (non-sublinear) runtime.

                      I have to assume the reason for the article wasn't to talk about the runtime of algorithms that operate on data that's already in a partially solved state.

                    • ssivark 4 hours ago

                      Think from an information theory perspective. It is rarely true that you cannot say anything more about the data than what is assumed by classical algorithms. We almost always have some more information depending on the specific domain under consideration. Eg: Sorting a list of ages might be very different from sorting a list of account balances.

                      Any time I have information that reduces the entropy of the dataset, I want to be able to leverage that into runtime improvements of algorithms for pertinent questions. And it would be great to develop a structured framework for that instead of handling special cases in an ad-hoc manner.
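
                      The ages example is concrete: if you know values fall in a small fixed range, counting sort gives O(n + k) instead of O(n log n). A sketch (Python; the 0-130 bound is an assumed prior for ages, not anything from the thread):

```python
# Counting sort: exploits the prior that all values lie in [0, k].
# Runs in O(n + k), beating comparison sorts when k is small.
def counting_sort(xs, k=130):
    counts = [0] * (k + 1)
    for x in xs:
        counts[x] += 1
    out = []
    for value, count in enumerate(counts):
        out.extend([value] * count)
    return out
```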

                      • ssivark 20 minutes ago

                        As one example of such a more general framework -- (variants of) belief propagation might be a good answer if dataset constraints could be cleanly formulated as distributions to be reasoned with.

                      • minutillo 5 hours ago
                        • dataflow 5 hours ago

                          Isn't that O(mn) worst-case run time?

                          • quuxplusone 4 hours ago

                            No, it's O(n) worst case. (The Wikipedia sidebar says "O(mn)," but that's apparently for a maimed version of the algorithm without a key part they're calling "the Galil rule." That's a special usage of the phrase "worst case"! In the absolute worst case, your implementation could have a bug and never terminate at all!)

                            Anyway, the point is that it's O(n/m) in the usual case. Which remains technically linear, not sub-linear; but at least it's O(n) with a constant factor smaller than 1.

                            • dataflow 4 hours ago

                              > No, it's O(n) worst case. (The Wikipedia sidebar says "O(mn)," but that's apparently for a maimed version of the algorithm without a key part they're calling "the Galil rule.")

                              It's not "maimed", it's literally the original algorithm. And what the article was specifically analyzing. And exactly what the parent was citing. "No" here makes no sense, unless your goal was just to write "no" to someone on the internet.

                              > That's a special usage of the phrase "worst case"! In the absolute worst case, your implementation could have a bug and never terminate at all!

                              Wikipedia is describing that algorithm, not a different broken one. If your code is buggy then you're not implementing that algorithm, you're implementing a different one that happens to be buggy. It's completely absurd to suggest "the absolute worst case" of an algorithm could include that of a different algorithm. Whether the latter is correct or buggy.

                              > Anyway, the point is that it's O(n/m) in the usual case.

                              Sure, and the halting problem is O(1) in the best case.

                              > Which remains technically linear, not sub-linear

                              So it's neither an example of what the page was talking about (sublinear) nor an answer to my question (interesting sublinear).

                              > but at least it's O(n) with a constant factor smaller than 1.

                              If "usually only reads a fraction of the input" was what I was looking for, I would've realized String.indexOf(char) or Array.find(element) is an answer, and not needed to ask a question here.

                        • qyph 5 hours ago

                          https://en.wikipedia.org/wiki/AKS_primality_test though it's number theory, and concerned with numbers of size n, rather than lists of length n.

                          Also relevant: https://www.cs.yale.edu/homes/aspnes/pinewiki/Derandomizatio...

                          • dataflow 5 hours ago

                            > https://en.wikipedia.org/wiki/AKS_primality_test though it's number theory, and concerned with numbers of size n, rather than lists of length n.

                            They were talking about not reading a lot of the input, so that's not it.

                            • alok-g 3 hours ago

                              For that case, a better 'n' to use could be the number of digits in the number.

                              • Ar-Curunir 4 hours ago

                                AKS is not sublinear. It runs in poly(n) time, where n is the number of bits in the input (i.e. input size).

                              • gleenn 7 hours ago

                                Anything probabilistic? There are so many interesting fields where you can assume the distribution of a dataset, then take a sample of data and assert things about it with a high degree of confidence. All of modern AI is built on so much of this. All the deep neural nets are making grand assumptions about the shape and meaning of data (they literally assume convexity of the space) and they have clearly very interesting results despite the imprecision of the model. Anything dealing with finance is also dealing with a lack of data. So if you had a list of prices of a stock over time, you could probably start making assumptions exactly like that: the probability that it doubles over a short time is so low that you can subsample the data and have it still be super useful, especially when you have intractably large data.

                                • dataflow 7 hours ago

                                  >> deterministic exact

                                  > Anything probabilistic?

                                  Are you sure you're answering the same question I'm asking?

                                • BugsJustFindMe 4 hours ago

                                  Consider that you often need to decide when to stop looking at data before making a decision using what you've seen so far.

                                  https://en.wikipedia.org/wiki/Optimal_stopping

                                  • dataflow 4 hours ago

                                    Cool as that is, I don't think that's a "deterministic exact sublinear time algorithm".

                                    • BugsJustFindMe 4 hours ago

                                      You're probably right. Apologies. I think I misread the question initially.

                                  • z2210558 6 hours ago

                                    Assuming shuffled list: estimate of mean, estimate of cardinality etc etc

                                    • munchler 5 hours ago

                                      I think any sort of estimation is ruled out by the word “exact”.

                                    • tbrownaw 7 hours ago

                                      Public opinion polling is sub-linear in the size of the population.

                                      • dataflow 7 hours ago

                                        > Public opinion polling is sub-linear in the size of the population.

                                        How is public opinion polling deterministic and exact?

                                      • latency-guy2 6 hours ago

                                        I wouldn't call that example silly IMO.

                                        I'd consider all the varieties of B-tree to be real examples, which covers any DBMS. You can extend this out in any direction you want, like logging, for concrete examples.

                                        GIS/Mapping/computer vision has tons of algorithms and data structures that all needed to do better than linear time as well.

                                        Stream processing in general is another, but that ends up being probabilistic more often than not, so only a weak punt in that direction.

                                        If you expand the use case out to sublinear space as well, I'd argue for compression of all kinds.

                                      • shae 7 hours ago

                                        This sounds like a useful way to mix in statistics and get useful approximations. I'm reading one of the survey links and it's approximately eye opening.

                                        • dooglius 7 hours ago

                                          Often for problems taking integer input x, formal CS will define the input to be something like 1^x (the character '1' repeated x times, as opposed to binary) so that the time complexity is in terms of x. This class of problems seems amenable to sublinear time, since one only needs log(x) steps to determine x.
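
                                          In a random-access probe model (not a single-tape TM), that log(x) step count can be realized by exponential search over the unary string, assuming out-of-range probes read as '0'. An illustrative sketch:

```python
# Exponential (galloping) search: determine x given probe(i), which
# returns True iff position i of the input 1^x holds a '1'.
# Uses O(log x) probes -- sublinear in x, assuming random access.
def find_length(probe):
    if not probe(0):
        return 0
    hi = 1
    while probe(hi):        # double until we overshoot the run of 1s
        hi *= 2
    lo = hi // 2            # binary search: probe(lo) True, probe(hi) False
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if probe(mid):
            lo = mid
        else:
            hi = mid
    return lo + 1           # lo is the last '1' position, so length lo + 1
```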

                                          • LPisGood 2 hours ago

                                            Actually if you try to formally define sublinear time algorithms in this manner they all collapse to constant time algorithms.

                                            To see why, realize that to determine x, the TM needs to look at x many bits. If this algorithm only needed, say, log(x) many bits to produce output then all input values with more than log(x) bits are indistinguishable by this TM.

                                          • doormatt 5 hours ago

                                            So like HyperLogLog?

                                            • qyph 5 hours ago

                                              HyperLogLog analyses generally assume access to the full data stream, and so are O(n) at a minimum. Perhaps by running HyperLogLog on a sublinear sample of the dataset you'd get an algorithm in this class.

                                              • doormatt 5 hours ago

                                                That makes sense, thanks for explaining!

                                              • dataflow 4 hours ago

                                                HyperLogLog uses sublinear space, not sublinear time.