• clircle 2 hours ago

    Does any living statistician come close to the level of Donald Rubin in terms of research impact? Missing data analysis, causal inference, the EM algorithm, and probably more. He just walks around creating new subfields.

    • selimthegrim an hour ago

      Efron?

    • xiaodai 3 hours ago

      I don’t know. I find Quanta articles very high-noise. They're always hyping something.

      • jll29 3 hours ago

        I don't find the article's language full of "hype"; it describes the history of different forms of imputation, from single to multiple to ML-based.

        The table is particularly useful, as it describes what the article is all about in a way that can stick in students' minds. I'm very grateful to Quanta Magazine for its popular science reporting.

        • vouaobrasil 3 hours ago

          I agree with that. I skip Quanta Magazine articles, mainly because the titles seem a little too hyped for my taste and don't represent the content as well as they should.

          • amelius an hour ago

            Yes, typically a short conversation with an LLM gives me more info and understanding of a topic than reading a Quanta article.

        • paulpauper 36 minutes ago

          Why not use regression on the existing entries to infer what the missing ones should be?
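
          A minimal sketch of what that looks like in practice, assuming scikit-learn is available: its IterativeImputer regresses each incomplete column on the other columns and fills the gaps with the predictions.

              import numpy as np
              from sklearn.experimental import enable_iterative_imputer  # noqa: F401
              from sklearn.impute import IterativeImputer

              # Toy matrix: column 1 is roughly twice column 0, with one entry missing.
              X = np.array([
                  [1.0, 2.0],
                  [2.0, np.nan],
                  [3.0, 6.1],
                  [4.0, 7.9],
              ])

              imputer = IterativeImputer(random_state=0)
              X_filled = imputer.fit_transform(X)
              print(X_filled)  # the NaN is replaced by a regression-based prediction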

          • light_hue_1 3 hours ago

            I wish they actually engaged with this issue instead of writing a fluff piece. There are plenty of problems with multiple imputation.

            Not the least of which is that it's far too easy to do the equivalent of p-hacking and get your data to be significant by playing games with how you do the imputation. Garbage in, garbage out.
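
            A toy illustration of that concern, assuming scikit-learn and SciPy are available (the data and model here are made up, not from the article): the same incomplete dataset can yield different p-values depending on which imputation strategy is chosen.

                import numpy as np
                from scipy import stats
                from sklearn.experimental import enable_iterative_imputer  # noqa: F401
                from sklearn.impute import SimpleImputer, IterativeImputer

                rng = np.random.default_rng(1)
                group = rng.integers(0, 2, size=120)          # treatment indicator
                outcome = 0.3 * group + rng.normal(size=120)  # small true effect
                outcome[rng.random(120) < 0.35] = np.nan      # ~35% of outcomes go missing

                X = np.column_stack([group, outcome])
                for name, imp in [("mean", SimpleImputer(strategy="mean")),
                                  ("regression", IterativeImputer(random_state=0))]:
                    filled = imp.fit_transform(X)
                    a = filled[filled[:, 0] == 0, 1]
                    b = filled[filled[:, 0] == 1, 1]
                    print(name, "imputation, p-value:", stats.ttest_ind(a, b).pvalue)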

            I think all of these methods should be abolished from the curriculum entirely. When I review papers in ML/AI, I automatically reject any paper or dataset that uses imputation.

            This is all a consequence of the terrible statistics used in most fields. Bayesian methods don't need to do this.
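
            For contrast, a minimal sketch of the Bayesian alternative on a made-up regression model: each missing value is treated as just another unknown and sampled inside the model (data augmentation in a Gibbs sampler), rather than filled in as a preprocessing step.

                import numpy as np

                rng = np.random.default_rng(0)

                # Simulate y = 1 + 2x + noise and make ~30% of the y values missing.
                n = 200
                x = rng.normal(size=n)
                y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
                missing = rng.random(n) < 0.3

                # Gibbs sampler with flat priors on (a, b); noise scale fixed for brevity.
                sigma = 0.5
                X = np.column_stack([np.ones(n), x])
                XtX_inv = np.linalg.inv(X.T @ X)
                a, b = 0.0, 0.0
                y_fill = np.where(missing, 0.0, y)
                posterior = []
                for it in range(2000):
                    # 1) Draw the missing y's from their conditional given the current (a, b).
                    y_fill[missing] = a + b * x[missing] + rng.normal(scale=sigma, size=missing.sum())
                    # 2) Draw (a, b) from their conditional given the completed data.
                    beta_hat = XtX_inv @ X.T @ y_fill
                    a, b = rng.multivariate_normal(beta_hat, sigma**2 * XtX_inv)
                    if it >= 500:  # discard burn-in
                        posterior.append((a, b))

                print("posterior mean of (a, b):", np.mean(posterior, axis=0))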

            • jll29 3 hours ago

              There are plenty of legitimate articles that discuss or survey imputation in ML/AI: https://scholar.google.com/scholar?hl=de&as_sdt=0%2C5&q=%22m...

              • light_hue_1 2 hours ago

                The prestigious journal "Artificial Intelligence in Medicine"? No. Just because it's on Google Scholar doesn't mean it's worth anything. These are almost all trash. On the first page there's one maybe-legit paper in an OK venue as far as ML is concerned (KDD, an adjacent field to ML), and it's 30 years old.

                No. AI/ML folks don't do imputation on our datasets. I cannot think of a single major dataset in vision, NLP, or robotics that does so, despite missing data being a huge issue in those fields. It's an antiquated method for an antiquated idea of how statistics should work, and it's doing far more damage than good.

              • DAGdug 2 hours ago

                Maybe in academia, where sketchy incentives rule. In industry, p-hacking is great till you’re eventually caught for doing nonsense that isn’t driving real impact (still, the lead time is enough to mint money).

                • light_hue_1 2 hours ago

                  Very doubtful. There are plenty of drugs that get approved and are of questionable value, and plenty of procedures that turn out not to be useful. The incentives in industry are even worse, because everything depends on lying with data if you can do it.

                  • hggigg 2 hours ago

                    Indeed. Even worse, some entire academic fields are built on pillars of lies. I was married to a researcher in one of them. Anything that compromises the existence of the field just gets written off. The end game is that this fed into life-changing healthcare decisions, so one should never assume academia is harmless. Watching it from the perspective of a mathematician was utterly painful.

                    • nerdponx an hour ago

                      I assume that by "in industry" they meant jobs where you are doing data analysis to support decisions that your employer is making. This would be any typical "data scientist" job nowadays. There, the consequences of BSing are felt by the entity that pays you, and will eventually come back around to you.

                      The incentives in medicine are more similar to those in academia, where your job is to cook up data that convinces someone else of your results, with highly imbalanced incentives that reward fraud.