This article and its associated HN comment section continue in the long tradition of Big O Notation explainers [0] and getting into a comment kerfuffle over the finer, technical points of such notation versus its practical usage [1]. The wheel turns...
[0] https://nedbatchelder.com/text/bigo.html
[1] https://nedbatchelder.com/blog/201711/toxic_experts.html
From what I read in the comments of the first post, the Pyon guy seems very toxic and pedantic, but the rebuttal by Ned isn't great. For example, nowhere in the rebuttal is the pedantic technical detail ever actually described. In fact, the prose reads very awkwardly because it keeps talking around the detail, just repeatedly calling it a "particular detail". In my view, the author overreaches: he dismisses Pyon not only for the delivery of his criticism (which was toxic) but also for the content of his criticism (why?).
Ultimately Ned is in the right about empathy and communication online. But since he's an educator, it would have been nice to hear, even briefly, why he thought Pyon's point was unnecessarily technical and pedantic. Instead he just says "I've worked for decades and didn't know it". No one is too experienced to learn.
EDIT: I just skimmed the original comment section between Pyon and Ned, and it seems that Ned is rather diplomatic and intellectually engages with Pyon's critique. Why is this level of analysis completely missing from the follow-up blogpost? I admit I don't personally grasp the technical details or their importance; it would be nice to hear a summarized analysis...
Such is the curse of blogging: when writing a series of posts, authors naturally assume that the readers are as engaged and as familiar with the previous discussion as they are.
In reality, even regular subscribers probably aren't, and if you're a random visitor years later, the whole thing may be impossible to parse.
> Why is this level of analysis completely missing from the follow-up blogpost
Because that's not what the article is about. It's not about whether Pyon was right or wrong, it's that they were a dick about it. Their opening words were "you should be ashamed". They doubled and tripled down on their dickery later on.
And no matter how good your point is or how right you are: if you're a dick about it people will dislike you.
This is why being able to read and write math is so important. All this confusion can be answered from a one sentence definition.
Toxic expert here! I hate when blog posts try to teach complex subjects. It's almost always a non-expert doing the teaching, and they fail to do it accurately. This then causes 1) the entire internet to repeat the inaccuracies, and 2) readers to make no attempt to learn beyond the blog post, reinforcing their ignorance.
I'll double down on my toxicity by saying I didn't like the page layout. As someone with ADHD (and a declining memory), I need to be led through formatting/sub-headings/bullets/colored sections/etc into each detail or it all blends together into a wall of text. The longer it takes to make a point (visually and conceptually), the more lost I am. I couldn't easily follow it. The Simple Wikipedia page was more straight to the point (https://simple.wikipedia.org/wiki/Big_O_notation), but reading the "full" Wikipedia page thrusts you headlong into a lot of math, which to me signifies that this shit is more complex than it seems and simplifying it is probably a bad idea.
> Toxic expert here! I hate when blog posts try to teach complex subjects. It's almost always a non-expert doing the teaching, and they fail to do it accurately. This then causes 1) the entire internet to repeat the inaccuracies, and 2) readers to make no attempt to learn beyond the blog post, reinforcing their ignorance.
Ask yourself why. The usual answer is that top experts either can't be bothered to create better content, or they actively gatekeep, believing that their field must remain hard to learn and the riff-raff must be kept out.
I think the first step is to accept that laypeople can have legitimate interest in certain topics and deserve accessible content. The remedy to oversimplified explanations is to write something better - or begrudgingly accept the status quo and not put people down for attempts that don't meet your bar.
It's also good to ponder whether the details we get worked up about actually matter. Outside academia, approximately no one needs a precise, CS-theoretical definition of big-O notation. Practitioners use it in a looser sense.
> Ask yourself why. The usual answer is that top experts either can't be bothered to create better content, or they actively gatekeep, believing that their field must remain hard to learn and the riff-raff must be kept out.
Before asking why, ask if. There are good articles about complex topics, they just get drowned out by bad articles.
Ideally it would be both.
It's a bit like the choice between two doctors. One is very polite, spends an hour by your bedside every day, smiles, holds your hand, and encourages you not to give up, but has no idea how to interpret your symptoms. He has a horribly confused idea of what's going on, yet can explain that mess in his head to you in a reassuring tone that feels authoritative.
The other doctor is morose, doesn't smile, and keeps the meetings brief, like an interrogation. But he knows exactly what's up: he spends his time reading the literature and case studies, phones up other doctors who treated similar illnesses, cuts you open and sews you back up at the right spot, and hands you the exact pills that will get you on your feet in a few months. All the while he was distant and cold and talked to you as if you were a car in the repair shop.
Ideally, it would be both. Of the two I'd still choose the second guy.
> Ask yourself why. The usual answer is that top experts either can't be bothered to create better content, or they actively gatekeep, believing that their field must remain hard to learn and the riff-raff must be kept out.
This oversimplifies things.
It's sometimes a third option: the topic is complex enough it cannot be digested into a ready to consume blogpost without previous work (reading and practicing), which in turn requires previous work, and so on, until you've turned it into an introductory course.
And that's not gatekeeping or "cannot be bothered to simplify" -- it's a falsehood, a kind of internet new age mantra, that everything can be made simple enough to explain in a blog post. Some things can't. There's nothing elitist about it, since everyone has the mind power to take that introductory course.
> It's also good to ponder whether the details we get worked up about actually matter. Outside academia, approximately no one needs a precise, CS-theoretical definition of big-O notation. Practitioners use it in a looser sense.
The article made some genuinely serious mistakes and the author here (graciously, it has to be said) admitted to being wrong about some things like Big O being "the worst case".
In these cases, maybe the damage could be limited by saying "I know this isn't Big O; this is what some engineers call it, but it's actually something different. Because practitioners find it useful nonetheless, I will explain it (explanation follows)".
I found the visual presentation top-notch by the way, it's clear some effort was put into this and it shows!
There is a lot of value in explaining the gist of something, even if it's not entirely accurate. In my experience, the cases where even that is impossible are very rare, at least when it comes to practical computer programming.
You can't satisfy everyone.
My experience is the opposite. I hate the colorful books with many little boxes, pictures with captions in floaters, several different font sizes on the page, cute mascots, etc., where even the order of reading is unclear.
Instead, I found it much easier to learn from old books made before the 70s-80s, sometimes back into the 40s. It's single column, black on white, but the text is written by a real character and is far from dry. I had such a book on probability, and it had a chapter on the philosophical interpretations of probability that took the reader seriously, not by heaping up dry definitions but with an opinionated throughline through the history of the subject. I like that much better than the mealy-mouthed, committee-written monstrosity that masks any passion for the subject. I'd rather take a half-page definition or precise statement of a theorem, where I have to check every word but can then trust every word, over vague handwavy explanations. Often these modern books withhold the formal definitions entirely, so you're left with a vague, uneasy feeling where you kind of get it and have questions but can't find out "is it this way, or that way, precisely?". And I'm not so sure that these irrelevant-cute-pics-everywhere books are really better for ADHD at the end of the day, as opposed to distraction-free black-on-white paragraphs written by a single author with something to say.
Most of the time inaccurate knowledge is better than no knowledge.
I bet you also have subjects where you use and spread inaccuracies, unless you are a universal expert.
> Toxic expert here!
I hate that this declaration is both hilarious and one that many might suggest I should be prefixing my own comments with regularly.
:-D
That second link is not about Big-O and is not something anyone should model.
Hehe, yes, Ned sent me a lovely email the other day about it. Happy to be doing my part.
I think the lesson of those articles is not that people should stop trying to correct a misleading or incorrect explanation, but rather, that some people on the internet (like the "expert" described there) are more interested in picking and winning "fights" rather than gently helping the author correct his article. If you see Pyon's comments, he was very aggressive and very internet-troll-like.
The wrong take from this would be "... and therefore, technical details don't matter and it's ok to be inaccurate."
Loving the interactive visualizations. Reminded me of https://kelseyc18.github.io/kademlia_vis
Can someone recommend a collection of similar pages?
O(1) in many cases involves a hashing function, which has a non-trivial but constant cost. For smaller values of N it can be outperformed in terms of wall-clock time by n^2 worst-case algorithms.
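A rough way to measure the constant cost the parent describes, using membership tests as a stand-in (this is my sketch, not from the article, and the numbers depend entirely on the language, the data, and the machine):

    import timeit

    # Compare a linear scan over a list with a hash-based set lookup.
    # For small n the scan's lack of hashing overhead can keep it competitive;
    # as n grows, the O(1) set lookup pulls ahead.
    for n in (4, 16, 64, 256, 1024):
        data_list = list(range(n))
        data_set = set(data_list)
        target = n - 1  # worst case for the scan: last element
        t_scan = timeit.timeit(lambda: target in data_list, number=100_000)
        t_hash = timeit.timeit(lambda: target in data_set, number=100_000)
        print(f"n={n:5d}  list scan: {t_scan:.4f}s  set lookup: {t_hash:.4f}s")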
I mean, true obviously, but don't say that too loud lest people get the wrong ideas. For most practical purposes n^2 means computer stops working here. Getting people to understand that is hard enough already ;)
Besides, often you're lucky and there's a trivial perfect hash like modulo.
What do you mean? Modulo is not a perfect hash function... What if your hash table had size 11 and you hash two keys of 22 and 33?
I also don't understand your first point. We can run n^2 algorithms on massive inputs given it's just a polynomial. Are you thinking of 2^n perhaps?
n^2 is probably the worst offender of these algorithms - it's fast enough to get into production and slow enough to blow up once you start using it.
Rockstar infamously wasted five minutes loading GTAV because they had an n^2 algorithm in the startup sequence.
n^2 algorithms on _massive_ inputs seems a little far fetched, no?
Around one to one hundred billion, things start getting difficult.
The challenge with big-O is that you don't know how many elements result in what kind of processing time, because you don't have a baseline of performance on 1 element. So if processing 1 element takes 10 seconds, then with an n^2 algorithm 10 elements would take about 16 minutes.
In practice, n^2 sees surprising slowdowns way before that: in the 10k-100k range you could be spending minutes of processing time (at 10 ms per unit of work, an n^2 algorithm only needs ~77 elements to take a minute).
N^2 is just two nested loops. It trivially shows up everywhere you have 2D arrays.
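A minimal illustration of that shape (my example, not from the article):

    # Two nested loops over the same n items: ~n*(n-1)/2 comparisons, i.e. O(n^2).
    def count_duplicate_pairs(items):
        count = 0
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                if items[i] == items[j]:
                    count += 1
        return count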
Do I really need to make my code spaghetti if the array sizes are small enough to not impact my speed?
if you can guarantee N is small, then it’s probably not an issue.
> true obviously
Well, it doesn't seem that obvious, at least not to everyone, because several times I've seen people rewrite things to make it O(1) when it was actually slower for the vast majority of cases (or even all cases).
Even if it's slower for the vast majority of cases, but there are rare cases where the computer would otherwise freeze, that's still a win.
Your computer will not freeze if your input is small enough no matter what big-O says. In many cases it's guaranteed to be small or small-ish.
I found https://www.bigocheatsheet.com/ and https://visualgo.net/en helpful in the past.
Having gone through electrical engineering, I always felt that Big O notation was never properly explained, unless I happened to miss it every time it was introduced. It always felt like it was just hand-waved as something we should already know. I'm curious what level of math or computer science is usually responsible for introducing it.
A function f(x) is said to be O(g(x)) if f(x)/g(x) is bounded, that is, there is some C so that for every x, f(x)/g(x) < C.
In computer science f(x) is often some complexity function, like number of some specific operations when running an algorithm to completion.
This needs to be refined: f(x) is O(g(x)) if there exists some X >= 0 such that f(x)/g(x) is bounded for all x > X.
Otherwise, we cannot say that 1 is O(x), for example.
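Putting those two comments together, the standard definition, written out in LaTeX, is:

    f(x) = O(g(x)) \iff \exists\, C > 0,\ \exists\, X \ge 0 \ \text{such that}\ |f(x)| \le C\,|g(x)| \ \text{for all } x > X

With the x > X clause, 1 is indeed O(x): take C = 1 and X = 1.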
What an asshole way of explaining it. For practical purposes (don't @ me if this is technically wrong, I don't give a shit) it's a description of how the cost of an algorithm grows with the size of the input. For example if the cost doubles every time the input increments then it's O(n^2). If the cost increases in step with the input size then it's O(n). Simples.
I don't have a computer science education, but I had a math one. I found it a great explanation, but I understand why others would feel it's not.
It definitely is not an "asshole" way of explaining it though.
Okay, unless you're doing numerical analysis, where it's used for the growth rate of the error term, which has nothing to do with cost or algorithms.
This is not only “technically” wrong, but completely wrong. O-notation has absolutely nothing to do with algorithms.
Let's not over correct, of course O-notation has something do with algorithms for the working programmer.
If the cost doubles for each increment in the input size, it's O(2^n).
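One way to see it (my sketch of the arithmetic): "the cost doubles with each extra element" is the recurrence

    T(n+1) = 2\,T(n) \implies T(n) = 2^{n}\, T(0), \ \text{i.e.}\ T(n) = O(2^{n})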
Right you are, my bad.
My university had an "algorithm analysis" class (required for CS majors) which covered it along with various kinds of proofs. That was a junior/senior year class though; I think it was really expected that you were already somewhat familiar with the concept by the end of your first year (via osmosis I guess).
The Discrete Math class in the CS track is where I got the bulk of it.
That’s because you could define it in a lot of different ways. For example, if you define it as the number of steps it takes some Turing machine representing some algorithm to halt, then there is no such thing as a logarithmic time algorithm; O(lg n) = O(1)
Why is that?
Log(n) grows slower than n. That means there is some number N where the process terminates in fewer than N state transitions. But in a Turing machine, you can read at most one new cell on the tape with each state transition. Therefore the algorithm halts in constant time.
What is the relationship between n and N? log(n) is still unbounded so I can’t make sense of your comment.
There is no relationship between n and N, of course. N is some fixed constant.
Log(n) is unbounded, yes, but if a Turing machine halts in sub-linear time then there is some length input for which that TM halts before it reads every cell of the input tape. Therefore, it doesn’t matter how the size of the input grows, that TM must halt in constant time. Make sense?
I’m sorry but no. Maybe it’s my ignorance of TM’s, but O(log n) doesn’t read all input by definition. It doesn’t follow that it is therefore _independent_ of the input size.
What makes a/your TM special that this is the case?
I don’t mean to take too much of your time though, maybe I’m too dumb for this.
Edit: I can sort of see how O(log n) is impossible or at least O(n) in a TM, but to reduce it to O(1) makes no sense to me
Comp sci 101. Learned it my first year. There really isn't anything to it. It just describes how the number of operations grow as the number of inputs in your algorithm increases. It looks scary, but it's very simple and straightforward.
It can get a lot more complicated though. At times there are more parameters to an algorithm's complexity than just `n`. E.g., the parameters for big-O might be `n`, `K`, and `ρ`, and some expressions in the big-O might involve combinations of those parameters.
In such cases as well, one might know for a specific application of the algorithm that, for example, `ρ` is bounded, and so the big-O becomes more influenced by the `n` and `K`.
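A hypothetical example of what that looks like (the bound itself is made up for illustration): if the analysis gives

    O(nK + \rho\, n \log n) \quad\text{and for your workload}\ \rho = O(1), \ \text{then the bound simplifies to}\ O(nK + n \log n)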
Yeah that's fair. Calc 1 should be enough to understand that.
You are commenting under a blog post that tried to explain it and got it wrong. It is not simple.
Examples:
- algorithm with O(2^n) complexity might be faster for your problem than O(1) algorithm
- the same algorithm may be O(2^n) or O(n³) depending on how you define the size function (see the sketch below)
This is not straightforward.
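A standard illustration of the second point (my sketch, not necessarily the algorithm the parent had in mind): trial-division primality testing is polynomial if "size" means the numeric value of the input, but exponential if "size" means its bit length, which is the usual convention.

    # Trial division: roughly sqrt(v) iterations.
    # Measured against the value v, that looks polynomial.
    # Measured against the input size n = number of bits of v, sqrt(v) ~ 2^(n/2),
    # i.e. exponential -- the same loop, two different complexity classes.
    def is_prime(v: int) -> bool:
        if v < 2:
            return False
        d = 2
        while d * d <= v:
            if v % d == 0:
                return False
            d += 1
        return True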
that's why he said it describes how runtime behaves as input size changes, not how it behaves for specific values
It is though.
If you work with asymptotic notations enough you realise they can often (loosely) be used like numbers - they obey certain arithmetic laws, and it's not uncommon for analysts to write stuff like 2^o(1), for instance. Take that rabbit hole further and you get the hyperreals! A number system that at once generalises real numbers, asymptotic notation and infinitesimal/infinite quantities =D
https://en.m.wikipedia.org/wiki/Hyperreal_number
(for anyone who cares, the ultrafilter construction kinda bakes in the laws of asymptotics into its definition of equality, if you squint)
What blows me away is that there are quantum calculations which grow as O**7 with respect to the number of atoms and scientists are fine with running them because N is small, computers get faster and more memory all the time, and the results are valuable.
(I'm not a computer science expert, so if I used the O(n) terminology differently from the standard, sorry).
You can simply say grows as n⁷. Although people will understand what you mean by saying O(n⁷), O is an upper bound, not a lower bound.
What you should say if you want to be correct is it grows as Ω(n⁷).
"Fine with running them" is a bit of a stretch, though the gist is correct.
There's a whole hierarchy of quantum chemistry algorithms where accuracy is traded against asymptotic runtime, I think starting at N^3 and going all the way to N!.
Love the visualizations! For someone that learned these algorithms years ago, it is still nice to see them visualized!
The dynamic visualizations are a massive help in understanding the content. Please make more lessons like this!
I’m glad you like them! <3
I just want to add to the chorus that your visualizations are outstanding.
So impressive!
Awesome, how did you make that interactive code execution in that box??
Every time there is a thread on big O notation I'm hoping someone is going to explain something that will make it clear how it relates to the anime The Big O. I'm still trying to understand what happened in that show.
(Chugs 4 beers)
OK, so. So, it's like you mixed together Pacific Rim, Dark City and The Matrix all into the same show. In that order.
All I ever remember about that show is it's kinda Batman with a giant robot, and the end gets real weird.
I've found that one of the most effective ways to internalize big O notation is to relate it to everyday experiences.
Take a few runs through sorting a growing deck of cards and anyone will be able to tell you why bubble sort sucks.
These large numbers in JS are bigger than Number.MAX_SAFE_INTEGER. They all lose considerable precision.
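For anyone curious, this is easy to check; Python floats are the same IEEE-754 doubles as JS numbers, so the behaviour carries over:

    big = float(2**53)      # 2^53 is one past Number.MAX_SAFE_INTEGER (2^53 - 1)
    print(big + 1 == big)   # True: the +1 is lost to rounding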
They do! Part of me wanted to go back and change the sum function to something else but in the end I figured it’s not important.
Beautiful - I sent a ping and hope it was delivered and have a small dopamine kick ;-)
It was, thank you so much <3
Whenever I read content like this about Big O notation I can't help but think the real solution is that computer science education should take calculus more seriously, and students/learners should not dismiss calculus as "useless" in favor of discrete math or other things that are more obviously CS related. For example, the word "asymptotic" is not used at all in this blog post. I have always thought that education, as opposed to mere communication, is not about avoiding jargon but explaining it.
> students/learners should not dismiss calculus as "useless"
This seems to be quite a bit of a strawman to me.
ML is such a major part of the field today, and at a minimum requires a fairly strong foundation in calc, linear algebra, and probability theory, that I earnestly don't believe there are that many CS students who view calculus as "useless". I mean, anyone who has written some Pytorch has used calculus in a practical setting.
Now in pure software engineering you will find a lot of people who don't know calculus, but even then I'm not sure any would decry it as useless, they would just admit they're scared to learn it.
If anything, I'm a bit more horrified at how rapidly people's understanding of things like the Theory of Computation seems to be vanishing. I've been shocked by how many CS grads I've talked to who don't really understand the relationship between regular languages and context-free grammars, don't understand what 'non-determinism' means in the context of computability, etc.
>This seems to be quite a bit of a strawman to me.
Not really, if you ever listen to CS undergrads or people in non-traditional schooling (bootcamps, etc.) talk about software engineering this opinion is essentially ubiquitous. People interested in ML are less likely to hold this exact opinion, but they will hold qualitatively identical ones ("do you really need multivariable calculus/linear algebra to do ML?"). It is precisely because people (primarily Americans) are scared to learn mathematics that they rationalize away this fear by saying the necessary mathematics must not be essential, and indeed it is true that many people get away without knowing it.
> non-traditional schooling (bootcamps, etc.)
It's probably unfair to qualify those as "computer science education" though...
Most people from almost every specialty dismiss calculus as useless, don't even remember linear algebra is a thing, and either never learn statistics or forget everything about it. (But the reaction to probability varies a lot.)
I have no idea what's up with those disciplines, but it's an almost universal reaction to them. Unless people are very clearly and directly using them all the time, they just get dismissed.
Can’t imagine what it might be.
Part of the problem is that a lot of people that come across big O notation have no need, interest, or time to learn calculus. I think it's reasonable for that to be the case, too.
The thing is, this is like saying lots of mechanical engineers have no need, interest, or time to learn derivatives; they just want to get on with "forces" and "momentum" and cool stuff like "resonance". Saying you have no interest in learning limits and asymptotes but you want to know what people are talking about when they mention asymptotic analysis doesn't make sense.
If you want to know what calculus-y words mean, you're going to need to learn calculus. People use calculus-y words to quickly convey things professionally. That's why it's a "topic" for you to learn. The thing under discussion is a limit.
I replied to this effect to someone else in this thread, but I think it's reasonable for people to want to have an idea of what big O is for (in software engineering!) without having to have a grounding in calculus. The notation is useful, and used, without it regularly.
It's reasonable, but essentially every "common misconception about Big O" exists because people didn't have the necessary notions from calculus. For example, the fact that O(x^2) can be practically faster than O(x), due to the size of constants and subdominant terms, is confusing only if you never properly learned what asymptotic behavior is.
The practical question is whether you think it's ok to continue propagating a rather crude and misunderstanding-prone idea of Big O. My stance is that we shouldn't: engineers are not business people or clients; they should understand what's happening, not rely on misleading cartoon pictures of what's happening. I do not think you need a full-year collegiate course in calculus to get this understanding, but you certainly cannot get it if you fully obscure the calculus behind the idea (like this and countless other blogpost explainers do).
Given the various ways people in this thread have pointed out you lack fluency with the notation, why do you think it reasonable for people to want to learn it without learning the concepts it's describing?
I’m not sure that’s quite my position. Happy to cede that I lack fluency, and I appreciate your time and the time others have given to help me understand.
I find myself in the odd position of disagreeing with you and the person you responded to.
First, software engineering doesn't just consist of Computer Science majors. We have a lot of people from accounting, physics, or people who have no degree at all. Teaching this concept in CS fixes very little.
Second, and complementary to the first, is that asymptotic behavior is derivative of the lessons you learn in Calculus. You can't really fully understand it beyond a facade unless you have a rudimentary understanding of Calculus. If you want to put this theory to the test, then ask someone with a functional understanding of Big-O to write asymptotic notation for a moderately complex function.
I don't have a degree and in order to really understand asymptotics (and Big-O as well as the others) I read a primer on Calculus. It doesn't take a ton of knowledge or reading but a decent background is what will get you there. I do think we need a lot better continuing education in software that goes beyond O'Reilly style technical books that could fill this gap.
I'm not the original commenter, but that makes a lot of sense! I had assumed there was a huge overlap, personally.
I think it's pretty common for folks to enter the software field without a CS degree, start building apps, and see big O notation without understanding what it is. These people have jobs, deadlines, they want to ship features that make peoples' lives easier. I'd bet many of those people don't care so much about calculus, but a quick intro to what all this big O nonsense is about could help them.
Apparently Big-O notation was invented by Paul Bachmann in 1894. It's not a johnny-come-lately.
Your blog always has such informative visualizations that make these concepts easy to digest! Which tools do you use to create them?
All the code is at https://github.com/samwho/visualisations. I didn’t use much more than basic JS, CSS, and HTML for this one. The only third party dependency, iirc, was one to parse JavaScript into an AST.
Every time I see an introduction to Big-O notation, I immediately go look for a section about the worst case to find the most common, fundamental misconception that is present in nearly every such text.
Lo and behold:
> Because it's common for an algorithm's performance to depend not just on the size of the input, but also its arrangement, big O notation always describes the worst-case scenario.
This is not true. I'm on my phone, so I can't be bothered to explain this in more detail, but there is plenty of information about this online.
O() doesn't necessarily describe worst-case behaviour. It just provides an asymptotic upper bound. So a quadratic sorting algorithm can still be said to be O(n^3), although that might be a little misleading.
In math that’s true, but a practitioner software engineer uses it to mean the common "tightest" upper bound we can easily prove for the case we care about (worst, amortized, average, etc.), and not the actual tightest possible bound.
It has a meaning already - and that is the set of functions which satisfy the condition. So O(n^2) is a subset of O(n^3).
I'd love to hear more about how a quadratic sorting algorithm could be said to be O(n^3). That isn't intuitive to me.
Technically the big O notation denotes an upper bound, i.e. it doesn’t mean “grows as fast as” but “grows at most as fast as”. This means that any algorithm that’s O(n²) is also O(n³) and O(n⁴) etc., but we usually try to give the smallest power since that’s the most useful information. The letter used for “grows as fast as” is big Theta: https://en.wikipedia.org/wiki/Big_O_notation#Use_in_computer...
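Written out (my rendering, not a quote from the linked article):

    f(n) = \Theta(g(n)) \iff f(n) = O(g(n)) \ \text{and}\ f(n) = \Omega(g(n)),
    \ \text{i.e.}\ \exists\, c_1, c_2 > 0,\ N \ge 0:\ c_1\, g(n) \le f(n) \le c_2\, g(n) \ \text{for all } n > N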
Ahh I see, thank you!
Because n^2 ≤ n^3 for n ≥ 1: if |f| ≤ A·n^2 as n→∞, then |f| ≤ A·n^3 as n→∞.
Yes. The notation can be used to describe how fast any function grows. That function may be a worst-case runtime wrt. input length, the average-case runtime wrt. input length, or anything else, really.
But the most common usage is indeed worst-case analysis, especially in intro courses.
This is also wrong in the article:
> You may sometimes see the Greek letter "theta", Θ, instead of the letter O in big O notation. This is used when the best and worst case have the same order of growth. We can't use it for bubble sort because the best case is O(n) and the worst case is O(n^2).
It conflates two different, orthogonal concepts: upper vs lower bounds on the one hand, and best vs worst case analysis on the other.
The phrase "the best case is O(n)" already contradicts "big O notation always describes the worst-case scenario". They clearly use it for best case right there in the article.
> But the most common usage is indeed worst-case analysis, especially in intro courses.
It is also the most sensible one. The best-case version is basically useless: if you are lucky you may end up with O(1), but that doesn't tell you anything. The average case could be good, but it requires you to estimate the input distribution.
You'd think, but one of the world's most popular C++ stdlib implementations (Clang's libcpp) shipped actual Quicksort, which has an O(N squared) worst case, as its default sort until just a few years ago. Because the typical case for Quicksort is fast, most C++ programmers didn't notice.
And I really do mean just a few years. Like, Joe Biden was President.
I’d love to hear how those two things differ, and how I could include it in the post in a way that won’t be too scary for a beginner to the topic.
That section in particular has already seen several revisions based on other peoples’ comments.
Basically, for each input length, you can have different runtimes depending on the contents of that input. If you always pick the highest runtime for each length, that gives you a specific function. You can then derive either a lower or an upper bound for that function. Same for the best case.
You can state the four combinations, in their common-sense intuition, as:
- even in the worst case, the runtime grows only at most as fast as n^2 (modulo multiplication and addition)
- in the worst case, the runtime can grow at least as fast as n^2
- in the best case, the runtime can grow only at most as fast as n^2
- even in the best case, the runtime grows at least as fast as n^2
One can imagine it as not just a single curve on a function plot, but a band, a range. Then the top line of this range can be bounded from above or below, and the bottom can also be bounded from above or below. The common case is when you give a bound for the top curve, from above.
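In symbols (the T_worst and T_best names are mine, for the top and bottom curves of that band), the four combinations read:

    T_{\text{worst}}(n) = O(n^2)       % even in the worst case, at most ~n^2
    T_{\text{worst}}(n) = \Omega(n^2)  % in the worst case, at least ~n^2
    T_{\text{best}}(n)  = O(n^2)       % in the best case, at most ~n^2
    T_{\text{best}}(n)  = \Omega(n^2)  % even in the best case, at least ~n^2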
How to work this into the article, I don't know, sorry.
I appreciate you taking the time to explain this to me, thank you.
I feel like the way I have it is a reasonable, if strictly incorrect, simplification for someone new to the material.
When I write these, I have the newly-hired bootcamp grad as my target audience. Someone who has been taught the basics for how to write code for the web and work as a junior within a company, but has no computer science background. I want them to understand enough to make sense of what they see.
Maybe like this:
"It's common for an algorithm's performance to depend not just on the size of the input, but also its arrangement. In such cases we typically use big O notation to describe the worst-case scenario. However, note that we may perform similar analysis on the best case or the average case as well."
But here:
> You may sometimes see the Greek letter "theta", Θ, instead of the letter O in big O notation. This is used when the best and worst case have the same order of growth. We can't use it for bubble sort because the best case is O(n) and the worst case is O(n^2).
This is plain false, the worst-case runtime of bubble sort is indeed Θ(n^2). Either delete it, or write about the upper bound / lower bound thing.
I made some wording changes based on what you wrote, I'd appreciate your feedback.
I'm struggling to understand how the part about big theta is false. Is the best case for bubble sort not O(n)? I've deleted the section for the time being.
The difference between big theta and big O has nothing to do with worst vs. average case. They are two completely orthogonal axes.
You can talk about the big-O of the average case, the big-theta of the average case, the big-O of the worst case, or the big-theta of the worst case.
Are you referring to the fact that Big O has notation and concepts revolving around best-case/average case scenarios, as well as worst-case scenarios?
It is the same notation in either case. The worst-case runtime of quicksort is O(n^2). The average runtime is O(n * log(n)).
It’s okay, you’re not the first to point it out to me!
I’d be interested to hear how you would explain it when you’re at a more appropriate device.
Big O notation can be used to describe any function, whether worst case runtime of an algorithm, average case runtime of an algorithm, the price of eggs in China as a function of time, or anything else.
In order to write f(n) is O(g(n)), you already had to smuggle in the idea that you were talking worst case before even talking about O(*). What is f? Typically it is the max of "steps taken" over all input problems of size n. i.e. the worst case.
The O(g(n)) part says f is asymptotically bounded by g(n) up to some constant factor.
This is how you would explain it?
If you're just trying to explain what f~O(g) means, then it makes sense to talk in terms of functions and asymptotes like you would in an intro to calculus, sure. Then, separately, computer scientists are interested in f(n) = max_{|S|=n} steps_required(program, S) where S is some input/problem type with some notion of size.
The fact that f looks innocuous but is secretly actually this complicated thing is perhaps the stumbling block for people. The actual O(*) part is just "line stays at or below other line once you go far enough out" (with a slight tweak to allow you to pre-stretch the bounding line).
I'm aiming this at junior software engineers without math or computer science backgrounds. My own computer science background is very old at this point, and I never did have the math. So I'm not surprised I've made mistakes that are being pointed out by folks that live and breathe this stuff. I want to impart enough useful (if not strictly correct) information to folks to make a difference in their day jobs.
I would just say "almost always" to cut off the pedants. You do use the same notation to describe the best-case and average-case as well; people just don't care about those as often.
Pejoratively referring to people as "pedants" simply because they know what they are talking about is quite the ad hominem fallacy.
As for what people care about, it depends on what they're doing and what they know about the input distribution. If you're concerned about DOS attacks or response time in a game, then worse case is of primary importance. But if you're paying for wall time then you measure a hash table lookup by its O(1) average time, not its O(n) worst case time.
I did push up a rewording that hopefully lands better. Fingers crossed.
TLDR: Different contexts, different analyses. You can do a best-case, average-case and worst-case analysis. A linear search is O(1) in the best case, for example.
Maybe this comes across as nitpicky to some, but that doesn't make the text any less incorrect, unfortunately. It's an extremely common misconception, however, and I would say that a huge proportion of students that go through an introductory course on this subject come out thinking the same thing.
> It’s okay, you’re not the first to point it out to me!
It would be nice if you corrected it - there are already too many people making this mistake online.
So your contention is with the word “always”? It doesn’t always mean worst case? I got told off by someone else for _not_ saying this.
I really just want to find the way of describing this that won’t net me comments like yours. It is very disheartening to spend so much time on something, and really try to do the topic justice, to be met with a torrent of “this is wrong, that’s wrong, this is also wrong.” Please remember I am a human being trying to do my best.
The definition of big O notation is pure math - there is nothing specific to analysis of algorithms.
For example: "the function x^2 is O(x^3)" is a valid sentence in big-O notation, and is true.
Big O is commonly used in other places besides analysis of algorithms, such as when truncating the higher-order terms in a Taylor series approximation.
Another example is in statistics and learning theory, where we see claims like "if we fit the model with N samples from the population, then the expected error is O(1/sqrt(N))." Notice the word expected - this is an average-case, not worst-case, analysis.
I should have probably been a lot more clear about who the target audience of my post was.
> It is very disheartening to spend so much time on something, and really try to do the topic justice, to be met with a torrent of “this is wrong, that’s wrong, this is also wrong.” Please remember I am a human being trying to do my best.
HN (and forums generally) can be blunt in this way (cf. Cunningham's Law). When you put something up, most of the reaction will be about how it's wrong. Things being correct is just the default; people won't comment on how correct something is. Try to take it not personally but as a helpful piece of info. Ideally you'd want all statements to be correct. Textbook authors often incentivize hardcore nitpicking by mentioning the submitters in the acknowledgments, or even with cash. Try to take it in that manner.
That framing really helps, thank you. <3
> So your contention is with the word “always”? It doesn’t always mean worst case? I got told off by someone else for _not_ saying this.
Using "always" is definitely wrong, but that isn't really what makes this wrong. BonoboTP caught another thing that I missed which tells me that you misunderstand some of these concepts on a somewhat fundamental level. Quote:
> You may sometimes see the Greek letter "theta", Θ, instead of the letter O in big O notation. This is used when the best and worst case have the same order of growth. We can't use it for bubble sort because the best case is O(n) and the worst case is O(n^2).
Big O, Ω, and Θ are ways to describe asymptotic bounds on runtime. You can apply them in best, average, or worst case analyses. For example, linear search is O(1), Ω(1), and Θ(1) in the best case but it is O(n), Ω(n), and Θ(n) in the worst case.
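A minimal sketch of that linear-search example (the function and names are mine):

    def linear_search(items, target):
        for i, x in enumerate(items):
            if x == target:
                return i    # best case: target is items[0] -> one comparison, Theta(1)
        return -1           # worst case: target absent -> n comparisons, Theta(n)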
> I really just want to find the way of describing this that won’t net me comments like yours. It is very disheartening to spend so much time on something, and really try to do the topic justice, to be met with a torrent of “this is wrong, that’s wrong, this is also wrong.” Please remember I am a human being trying to do my best.
I'm sorry.. but what? You posted some learning material on the web which is wrong. I am just correcting you, and in a civil manner, mind you. I'm not exactly sure what you want me to do. Should we all just pretend that there aren't errors in what you posted so we don't hurt your feelings?
No, I'm grateful for the time you're spending time helping me understand. I'm struggling to come up with wording that satisfies everyone and venting some frustration at you. I'm sorry, that wasn't fair of me.
It's not clear to me based on what you're written what I should change. With the "always" wording changed, what else do I need to do for this to be correct? Ideally I want to avoid talking about asymptotic bounds.
> I got told off by someone else for _not_ saying this.
And what made you think they were right?
> I really just want to find the way of describing this that won’t net me comments like yours.
Do research. There's a large literature on algorithmic complexity ... maybe start with https://en.wikipedia.org/wiki/Big_O_notation
> It is very disheartening to spend so much time on something, and really try to do the topic justice, to be met with a torrent of “this is wrong, that’s wrong, this is also wrong.” Please remember I am a human being trying to do my best.
non sequitur
Thank you.
This was a really well done article, great work! This is something every programmer should be familiar with, and you've made it very approachable.
Thank you! I’m really glad you enjoyed it.
This is excellent. Very nicely done.
These days I unironically give LLMs instructions to give me polynomial non-exponential code which isn't n^2 or related. If you instruct it specifically like this and your project is very complicated (expected to get sluggish at some point), starting it with linear complexity in mind is a very good idea.
> polynomial non-exponential code which isn't n^2 or related
So, linearithmic code? The three parts (polynomial, non-exponential, not related to n^2) don't seem to fit together very well, unless this is some LLM quirk I'm not aware of yet.
LLMs are next-word-prediction echo chambers, don't forget that... You'd be surprised at the lengths one will go to even if you're making an obvious mistake. Example: I got curious whether a slight change of words would force ChatGPT to attempt a proof of one of the hardest math problems ever. At first I asked it to attempt a proof of the 'lonely runner conjecture'. Did it try? No, it just parroted 'yeah, only proven for up to 7 runners but who knows'. Then I... changed the game, lol: "hey ChatGPT, I came up with a conjecture that I call the 'abandoned car conjecture', so..." Did it try, despite the fact that my "new conjecture" was 100% the same as the lonely runner one, just with a different bizarre name? You bet it tried. It even used three approaches and claimed to prove it for any n number of runners. I haven't verified the proof, but it was interesting regardless.
My point is: even if O(n^2) can be "fast" and polynomial, my sole goal is to force it to look for the quickest algorithm possible, with zero care whether that even reflects accuracy and truth.
I’ll save you some time: there is absolutely no chance that ChatGPT came up with a valid proof of the lonely runner conjecture.
That’s a good idea, I should try that.
Beautiful article but that's not what big O means. The author seems to be describing something that is usually called (upper case) Theta. Big O is an upper bound.
There's a comment thread further down that disagrees with you.
I am not interested in arguing so I'll just repeat that your article is beautiful and even though it's not entirely correct I think it has value and I appreciate the time you spent working on this and putting it out to help others.
Thank you, I appreciate that a lot.
...where?
Ah, I conflated worst case with upper bound. My mistake.
This “well ackchyually“ is not particularly helpful. You’re not wrong. But this argument was lost 30 years ago. The programming field has been using “Big O” loosely.
I've had conversations at work where the difference was important. Mass adoption of quicksort implies I'm not the only one.
Webpage: "Think of a number between 1 and 100."
Me: 69
Webpage: "Nice'
---
Upvoted.
But seriously, this was really informative!
I couldn’t resist.
Big-O isn't as relevant as it once was. Modern hardware includes multithreading, pipelining, NUMA, and complex caching. Some operations can take less than one CPU cycle and others can take hundreds, or exceptionally thousands, of cycles. Trying to describe the behavior of algorithms solely as a function of the number of "trips through the innermost loop" can be a very misleading description!
Besides that, other measures such as big-Omega should be referenced in any discussion of big-O.
(I did enjoy Big-O the anime series though! /grin)
> Big-O isn't as relevant as it once was. Modern hardware includes multithreading, pipelining, NUMA, and complex caching. Some operations can take less than one CPU cycle and others can take hundreds, or exceptionally thousands, of cycles.
Big-O theory was invented precisely to be a measure of computation that is independent of such low-level details. In that sense it is timeless. And any (decent) presentation always includes appropriate caveats like "the constant C may be very relevant for 'smaller' N".
You're right. I didn't mean to imply that Big-O is useless.
I'm reminded of an incident in the 1990s, when I was an instructor teaching C at the local telco, and a student showed me a (COBOL) program that had stumped his programming team. It would always hang their computer when it was run. What caught my eye was that the main loop was nested seven levels deep, and each loop ran 50,000 iterations. Their computer wasn't hung; they only needed to wait a decade or so for the computation to finish!
Big-O knowledge made the error obvious to me, and I added that to the curriculum for all programming courses I could.
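For a sense of scale, a back-of-the-envelope sketch (not their actual code):

    # Seven nested loops of 50,000 iterations each:
    iterations = 50_000 ** 7
    print(f"{iterations:.2e} innermost-loop executions")  # ~7.81e+32 -- hopeless at any clock speed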