>my professor projected video of himself writing on a piece of paper before a very large auditorium, and that guy was left-handed, and so his hand would cover his notes for like the entire time and it was impossible to see what he was writing. I only figured out that this was why it was so unpleasant like halfway through the class.
So many of my college math classes had some version of this professor who took a fascinating subject like linear algebra, statistics or algorithms and made it into a slog. The fact that most stats is taught by getting students to just memorize random ideas rather than building up a holistic and intuitive view really is a travesty.
Also makes sense why so many people, even though they took stats in college, have such a poor understanding of probability.
> The fact that most stats is taught by getting students to just memorize random ideas rather than building up a holistic and intuitive view really is a travesty.
We don't need first-principles thinking for everything. Granted, this comment is a bit vague, so I don't know how exactly you were taught or which type of class we're talking about, but generally, you can accept some axioms in applied mathematics classes. If we're talking the bare minimum classes like in the article (for apparently a business degree), this is likely a general applied prob/stat course. Things tend to get more in-depth in more advanced pure mathematics courses.
There's a middle path between rote memorization of outcomes, and building everything up from first principles. And I'm guessing it's probably what the parent poster had in mind.
A great statistics textbook along these lines is Principles of Statistics by MG Bulmer. It's one of those Dover classic textbooks that you can get for cheap. This book assumes you already know basic calculus and combinatorics. It then goes through a series of practical problems, and shows how you can use calculus or combinatorics to solve them. And, along the way, an intuitive and holistic perspective on statistics begins to form.
The overall effect is great. It's a lot like a 3blue1brown video series, only from the 1960s, and with problem sets.
I get where you're coming from, and obviously there are practical limitations on how deep one can and should go in an introductory class. But my recollection of AP Statistics 15 years ago is that, because the exam and therefore curriculum was so focused on running various tests on a TI-84, I learned way more about using this one specific graphing calculator than about statistics. I got a high score on the exam, but I never felt like I understood any of it until I got to college and took a statistics course that actually used calculus to show what was going on.
Intuition isn’t synonymous with working from first principles. You can have a very intuitive understanding of something you only understand at a higher level. Indeed, this is true for many applied mathematicians.
We don't need first-principles thinking every time, but having an understanding of why you can't just test 100 variations of your hypothesis and accept p = 0.05 as "statistically significant" is important.
Additionally, it's quite useful to have the background to understand the differences between Pearson correlation and Spearman rank correlation, or why you might want to use Welch's t-test vs. Student's, etc.
Not that you should know all of these things off the top of your head necessarily, but you should have the foundation to be able to quickly learn them, and you should know what assumptions the tests you're using actually make.
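To put a number on the multiple-testing problem: if you run 100 tests at the 5% level when no real effect exists, a spurious "significant" result is all but guaranteed. A quick back-of-the-envelope sketch (the independence assumption is mine, and it's the optimistic case):

```python
# Probability of at least one spurious "significant" result
# across 100 independent tests at the 5% level.
alpha = 0.05
n_tests = 100
p_false_positive = 1 - (1 - alpha) ** n_tests
print(round(p_false_positive, 4))  # ~0.9941
```

So "we tested 100 variations and one came up significant" is close to a mathematical certainty even under a true null.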
the reality is the vast majority of students have no interest in seeing the beauty of any mathematical or technical field.
they want the professor to tell them the passwords they need to memorize. then on the exam they repeat the passwords and get an A. this is understandable though because they are under a lot of pressure and these days nobody can afford to fail.
if the teaching style deviates from this they become annoyed, leave poor course reviews, and that professor has a hard time.
the professor could overcome this by being "good" -- when the students say a professor is "good" they mean it is easy to get an A.
In health care there have been studies that find an inverse correlation between patient satisfaction scores and patient outcomes. I don't know if the same is true in education, but I'd believe it.
Left-handed I could handle. Opaque accents seem to warrant some sort of consumer protection action by authorities.
> I think if I were in charge of presenting this material to students I'd do it by introducing the concept of memorylessness and by showing how good memorylessness is, how many wonderful things you can do with it. And then one day I'd be like, "well, it sure would be nice if we had any distributions like that!" and then whirl around with my piece of chalk to deliver the exciting news that we do. Exactly one, in fact.
Incidentally, this also goes for the determinant of a matrix. It's got a lot of neat and desirable properties, and it turns out to be the only thing that does. When it was finally taught to me this way, those weird algorithms we use to compute this seemingly-arbitrary number finally made sense. (And, in fact, this is the easiest way to prove that all those algorithms have to be computing the same seemingly-arbitrary number. Because the algorithms preserve the properties that define The Determinant, and The Determinant is the unique thing that preserves all of those properties, so must those algorithms all be computing The Determinant, no matter how different they might look.)
So I can vouch that this style of explanation really does work, at least for people like me.
Any sources that discuss this viewpoint wrt the determinant? Seems I'm still at the "seemingly-arbitrary number" stage.
If you like textbooks, try Section 1.3 of Artin's Algebra (find it at https://media.githubusercontent.com/media/storagelfs/books/m... among others).
Do be warned that Algebra is a... high-octane... text written for serious math students and can be... powerful.
The entire 3blue1brown series[0] on linear algebra is well worth watching, it has really intuitive graphical explanations of a bunch of concepts. Here's the one on determinants in particular[1].
TL;DW the determinant represents how much you scale the area/volume/hypervolume (depending on dimension) of a shape by applying a matrix transformation to each point.
[0] https://www.youtube.com/watch?v=fNk_zzaMoSs&list=PLZHQObOWTQ...
[1] https://www.youtube.com/watch?v=Ip3X9LOh2dk&list=PLZHQObOWTQ...
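A quick numerical check of that volume-scaling picture, using numpy (the particular matrix is just an arbitrary example of mine):

```python
import numpy as np

# An arbitrary 2-D transformation: stretch x by 2, y by 3, plus a shear.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# The unit square's edge vectors map to the columns of A.
u = A @ np.array([1.0, 0.0])
v = A @ np.array([0.0, 1.0])

# Area of the resulting parallelogram, via the 2-D cross product:
area = abs(u[0] * v[1] - u[1] * v[0])
print(area, np.linalg.det(A))  # both 6.0
```

The determinant and the geometric area of the transformed unit square come out identical, which is exactly the TL;DW above.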
Matrix multiplication and the Gaussian distribution are also like that. A lot of things are like that. I really dislike that this approach is not a core tool for teaching in math.
Can you give some examples for algorithms which aren't obviously logically connected but use the determinant for its nice properties?
Two ways to compute the determinant:
1) product of eigenvalues
2) cofactor expansion (e.g., https://textbooks.math.gatech.edu/ila/determinants-cofactors...)
The second is the one I was taught first, in the context of solving linear systems, but it does not seem obvious that it would be related to the product of eigenvalues. It's kind of a weird pile of arithmetic manipulations.
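A quick sanity check that the two computations agree numerically (numpy's `det` uses an LU factorization, i.e. it descends from the elimination-style algorithms; the random matrix here is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))

det_lu = np.linalg.det(A)                # elimination-style computation
det_eig = np.prod(np.linalg.eigvals(A))  # product of eigenvalues (complex pairs allowed)

print(det_lu, complex(det_eig).real)  # same number
```

The eigenvalues of a real matrix can be complex, but they come in conjugate pairs, so their product lands back on the real axis, right on top of the LU answer.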
Poisson processes are neat, they always end up working nicely in ways that many other distributions/processes very much don't.
Splitting a Poisson process into two lower rate processes is a neat trick. Even better is that you can do the same to convert a Poisson process into one with a variable rate, provided that rate is lower than the original (original may be variable as well).
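A quick simulation of that splitting/thinning property (the rate and keep-probability here are arbitrary choices of mine): keep each event of a rate-10 process independently with probability 0.3 and you get a rate-3 process.

```python
import random

random.seed(42)
lam, p, T = 10.0, 0.3, 10_000.0

# Simulate a rate-lam Poisson process on [0, T] via exponential gaps,
# thinning it by keeping each event independently with probability p.
t, kept = 0.0, 0
while True:
    t += random.expovariate(lam)
    if t > T:
        break
    if random.random() < p:
        kept += 1

rate_estimate = kept / T
print(rate_estimate)  # close to lam * p = 3.0
```

The variable-rate version works the same way, except the keep-probability becomes a function of time.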
And the fact that the partial sums of a bunch of exponential distributions results in the same distribution of values as picking Poisson(lambda * time) values uniformly at random is pure magic.
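That equivalence is easy to see numerically (a numpy sketch; the rate and horizon are arbitrary): build arrival times one way as cumulative sums of exponential gaps, the other way by drawing Poisson(lambda * T) points uniformly on [0, T], and compare.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, T, n_runs = 2.0, 1.0, 50_000

# Construction 1: cumulative sums of exponential gaps, keeping those <= T.
gaps = rng.exponential(1 / lam, size=(n_runs, 20))  # 20 gaps is plenty when lam*T = 2
times_a = np.cumsum(gaps, axis=1)
times_a = times_a[times_a <= T]

# Construction 2: N ~ Poisson(lam*T) points dropped uniformly on [0, T].
counts = rng.poisson(lam * T, size=n_runs)
times_b = rng.uniform(0, T, size=counts.sum())

print(len(times_a) / n_runs, counts.mean())  # both ~ lam*T = 2
print(times_a.mean(), times_b.mean())        # both ~ T/2 = 0.5
```

Event counts and the distribution of event times match between the two constructions: they are two descriptions of the same process.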
Another neat property of Poisson processes is that when raced against one another, they win in proportion to their underlying rates. This property is the basis of a clever random sampling algorithm that works well in SQL:
SELECT *
FROM Population
WHERE weight > 0
ORDER BY -LN(1.0 - RANDOM()) / weight
LIMIT 100 -- Sample size.
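For what it's worth, the same exponential race translates directly to Python (a sketch; `weighted_sample` is my own name for it, and it's the Efraimidis–Spirakis weighted-sampling idea in disguise):

```python
import math
import random

def weighted_sample(items, weights, k):
    """Weighted sampling without replacement: each item draws an
    Exponential(weight) "finish time"; the k earliest finishers win."""
    keyed = [
        (-math.log(1.0 - random.random()) / w, item)
        for item, w in zip(items, weights)
        if w > 0
    ]
    keyed.sort()
    return [item for _, item in keyed[:k]]

random.seed(1)
print(weighted_sample(["a", "b", "c", "d"], [1, 1, 1, 10], 2))
```

The weight-10 item tends to finish its exponential draw first, so it shows up in the sample far more often than the weight-1 items, in proportion to the underlying rates.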
For an explanation of how it works, see
https://blog.moertel.com/posts/2024-08-23-sampling-with-sql....
(author of OP) That post of yours was actually what got me tooling around with this stuff again :) it's a really excellent one.
Thanks! That was very kind of you to say. Whenever I write stuff like that, I wonder, "Does anyone find this useful?" It helps to hear every once in a while that the answer is sometimes yes.
Nice article!
This was a lot of fun. By skipping over the formulaic details and proofs and just explaining the lay of the land, it makes a good starting point for exploring further.
Either mathematically or with some Python dice rolls.
Really good. Same ethos as fast.ai courses.
Finally, that you can add the /3 and /5 to get a /8 distribution makes intuitive sense to me. (/ means lambda.)
This is because if you have people arriving at a train station, you could split them by eye colour, and there is no reason a particular eye colour would cause there to be dependencies. (Assuming spherical cows: families arriving together excepted. Assume it is downtown rush hour.)
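The superposition claim is easy to check by simulation (a numpy sketch; rates 3 and 5 are from the comment above, everything else is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
T, n_runs = 1.0, 50_000

# Counts from two independent Poisson processes (rates 3 and 5), merged...
merged = rng.poisson(3 * T, n_runs) + rng.poisson(5 * T, n_runs)
# ...versus counts from a single rate-8 process.
direct = rng.poisson(8 * T, n_runs)

# Poisson fingerprint: mean equals variance, and both match rate * T = 8.
print(merged.mean(), merged.var())
print(direct.mean(), direct.var())
```

Both the mean and the variance of the merged counts land on 8, matching the single rate-8 process, which is the Poisson signature.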
I doubt that this model of a queue, which overlays two independent Poisson processes (one for items arriving, the other for processing those items), is statistically valid. The processing starts only after the respective item arrives in the queue, so it's not independent. That needs to be modeled accordingly, or it requires a proof that this is equivalent to the suggested overlaying approach, and that wouldn't be obvious or trivial.
As an introduction to the topic it functions very well though. It doesn't matter whether it's valid or not. In fact, I would say that diving immediately into the validity of some bullshit independence assumptions and other nonsense is where you lose most students (it definitely lost me).
I think flawed examples lead to a great way of scaffolding towards the "true" nontrivial answer in a teaching setting at least... I am still exceptionally bitter at how I was taught and forced to learn stochastics and it was very much through a purely theoretical, proof driven, abstract lens with very crappy examples that were more of an afterthought... because of course the theory is all you need to make sense of it!
This bullshit independence is one of the most fundamental and important concepts of probability theory and that other nonsense is also relevant cause especially with statistics it's easy to concoct a model which only seems to be correct ... but in fact isn't.
Exactly my point, hence why at least from the learning side it works well towards making something that is actually correct. What I was aiming at is that too often people teaching this stuff get lost in the weeds without any clear motivation for what is actually being taught.
>The processing starts only after the respective item arrived in the queue
Further, you can't process anything if the queue is empty, so it breaks down in this most obvious of cases.
It might be trivial if you consider it a window of an infinite process.
the infinite process only solves the problem of having a processed event happen before anything to process has entered the queue - as far as I can tell.
I can't help but wonder if real systems have additional (perhaps subtle) signals, which can be provided to a neural network, which then outperforms these simple algorithms.
For example, customers arrive at the grocery store in clusters due to traffic lights, schools getting out, etc. Even without direct signals, a NN could potentially pick up on these "rules" given other inputs, e.g. time of day, weather, etc.
?
> For example, customers arrive at the grocery store in clusters due to traffic lights, schools getting out, etc.
You're kind of just describing seasonality components and exogenous regressors; RNNs do actually function quite well for demand forecasting of this type, but even simple models (Holt-Winters or a Bayesian state-space model or something) can be really effective.
LGTM; a NN is literally a probability distribution producer.
You can't combine rate of arriving with rate of leaving, can you? Leaving is dependent on arriving, so the latter distribution is dependent on the former.
Little's law is quite instructive here. Assuming that the system is stable, you can.
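Little's law says the long-run average number of items in a system, L, equals the arrival rate lambda times the average time W each item spends in it, with no distributional assumptions beyond stability. A toy example (the numbers are made up):

```python
# Little's law: L = lambda * W.
# E.g., customers arrive at 2 per minute and spend 3 minutes on average,
# so on average 6 customers are in the system at any moment.
arrival_rate = 2.0        # lambda, customers per minute
avg_time_in_system = 3.0  # W, minutes
avg_in_system = arrival_rate * avg_time_in_system
print(avg_in_system)  # 6.0
```

The striking part is that this holds for any stable queue, however the arrivals and service times are distributed.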
How does one know the application of such a Math concept for a particular software problem? I couldn't guess in a million years.
Another option, as someone who didn't go to school: learn a bunch of methods. Learn a bunch of problems. Match them up.
“A bunch” is actually a small enough number that brute forcing it works fine. If you have 20 methods in your head, you’re far ahead of most people.
You don’t even have to have them in your head. There’s an underutilized technology called a “list” where you just write things down one after another. Go down your list of methods. You’ll be able to throw out 90% of them because of something obvious.
You could be up to speed on this in a day of clicking through Wikipedia if you have some stats knowledge, two days if you don't. It's how I did it myself, for whatever that's worth.
You go to school for it: stats, applied mathematics, operations research, industrial engineering.
I went for industrial engineering. We learned the math as pure math, then the math as natural-language problems, then how to identify processes and collect data to characterize their attributes, then simulate and verify those processes, then test for variations in the underlying assumptions of those processes.
They never really did teach me to code well in a language that was useful, I had to pick that one up myself.
Hey, this might be a stupid implementation detail, but if the Poisson simulations of arrival and completion are started cold, then random assignment of events could assign a completion before an arrival, or there could be more completions than arrivals? Arrivals should always be greater than or equal to completions.
Edit: I remembered my stats class. Merging Poisson processes requires independent processes. Since start and completion are dependent, it's not mathematically valid.
but I am an engineer and it works, so whatever
My MSc thesis in 2004 was about this: a probabilistic model for least-recently-used queues, based on Poisson processes, for network packet flows. The model worked OK on a couple of datasets I could get at the time. I tried to publish the work but got rejected a couple of times, then I gave up. If anyone wants to read it (or even try to publish it), I am happy to share it.