• defrost 4 hours ago

    For any, like myself, wondering "Who is Ben Welsh" ?

      Hello. My name is Ben Welsh. I'm an Iowan living in New York City.
    
      I am a reporter, an editor and a computer programmer. My job is to use those skills, together, to find and tell stories.
    
      I work at Reuters, the world's largest multimedia news provider, where I founded the organization's News Applications Desk. In that role, I lead the development of dashboards, databases and automated systems that benefit clients, inform readers, empower reporters and serve the public interest.
    
      [...]
    
    ~ https://palewi.re/who-is-ben-welsh/
    • simonw 3 hours ago

      Ben is one of my favorite people in the world of data journalism. He's the author of many excellent training courses in the field, including:

      - https://github.com/palewire/first-python-notebook

      - https://github.com/palewire/first-web-scraper

      - https://github.com/palewire/first-graphics-app

      • dang 2 hours ago

        (Submitted title was "Ben Welsh made an index of all FiveThirtyEight articles on the Internet Archive" - we've since changed it)

        • defrost 2 hours ago

          Cheers for the clarity, that'll help me look less weird wrt above comment to future historians of archived HN threads :-)

          TBH I enjoyed looking up Ben and finding out what he's about and done in the past far more than I did just knowing there's a 538 archive on IA.

          • Barbing 2 hours ago

            What do you think was perceived wrong with the old title?

            • defrost 2 hours ago

              At a guess (my Telepathy/IP is weak today, I'm not reading dang at usual strength) .. the initially submitted title was "invented" for submission and didn't match the content title.

              HN veers toward "the guts of the content w/out decoration" - limited additional information, framing, weasel words, perceived slanting, etc.

              It's uncommon to name an author unless the author themself is an important part of "the story".

              I personally have no issue with the original title, however it's not really for me (non US citizen) to judge whether the reporter in question has a name / identity that carries weight in US IT circles.

              • Barbing an hour ago

                Insightful in spite of that difficulty :)

        • yogorenapan 4 hours ago

          Can't believe Ben Welsh is not Welsh, and FiveThirtyEight has nothing to do with Wales

        • nomilk 4 hours ago

          Couldn't figure out why archiving FTE articles matters, but a quick search yields:

          > Thousands of FiveThirtyEight articles seemingly vanish from the internet

          https://www.editorandpublisher.com/stories/thousands-of-five...

          And discussions here on hn:

          ABC News has taken all FiveThirtyEight articles offline https://news.ycombinator.com/item?id=48152553

          Disney erased FiveThirtyEight (article by Nate himself) https://news.ycombinator.com/item?id=48197703

          • culi 2 hours ago

            Unfortunately most of the most important visualizations are broken in the archived version. Including the gun deaths visualization and I think the P-hacking interactive

            https://web.archive.org/web/20230205124354/https://fivethirt...

            It's kinda sad to know no one else will get to experience those interactive visualizations. Though it's nice to see the approval comparison page still works

            https://web.archive.org/web/20241031232233/https://projects....

            • nl 4 hours ago

              This is because whoever owns FiveThirtyEight now (ABC?) deleted the whole archive of articles on the site.

              • bombcar 3 hours ago

                Don't we need more than an index of Archive.org because whoever controls the domain could robots.txt these out of existence if they wanted to?

                • ycombinete 3 hours ago
                  • zzo38computer an hour ago

                    The robots.txt file should be used to restrict (and, in some cases, slow down) crawling at the time it is being crawled, not for SEO or for restricting access to mirrors or for any other purpose. It should never apply retroactively. (Unfortunately it is sometimes used badly despite this.)
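
                    A minimal sketch of that crawl-time use, relying on Python's standard `urllib.robotparser`. The robots.txt body and the `ExampleBot` user agent are made up for illustration; the crawler consults the rules once, right before fetching, and nothing here reaches back to copies already mirrored.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body, not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""


def crawl_policy(robots_txt: str, user_agent: str, url: str):
    """Return (allowed, crawl_delay_seconds) for one URL at crawl time."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)


if __name__ == "__main__":
    for path in ("/articles/1", "/private/page"):
        allowed, delay = crawl_policy(
            ROBOTS_TXT, "ExampleBot", f"https://example.com{path}"
        )
        print(path, "allowed" if allowed else "blocked", f"delay={delay}s")
```

                    The delay covers the "slow down" case and the disallow rule covers the "restrict" case; both are decisions about the fetch that is about to happen.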

                    • Jiro an hour ago

                      People always use that link as reference to say that Internet Archive ignores robots.txt but it only actually says they are ignoring it for government sites. It suggests that they might do it for other sites in the future (of 2017), but does not actually say that they have done it.

                      https://blog.archive.org/2018/04/24/addressing-recent-claims... which is a year later mentions that they have an automated process which is still following robots.txt for displaying old pages where the robots.txt was added later.

                      https://help.archive.org/help/using-the-wayback-machine/ does say they follow it for scraping, but this is phrased in such a way that would still be true for past sites whether or not they changed the policy. There is a page https://www.sysjolt.com/2021/archive-org-no-longer-honors-ro... which claims they don't follow it, but the site owner misspelled "robots" as "robot".

                  • Avicebron 4 hours ago

                    Bourdieu. The field has structure, the structure has logics, the logics shape what counts as a publishable story, a promotable journalist, a credible source, a "balanced framing".

                    • tantalor 4 hours ago

                      Please, say that again in comprehensible English.

                      • Avicebron 4 hours ago

                        The ownership relationship was always load-bearing? The journalism in this case was a tenant, I highly recommend that people promote forms of independent journalism?

                        EDIT: dude have you heard of the s in https, http://johntantalo.com gets flagged.

                  • arlattimore 3 hours ago

                    I'm not a soccer guy, but I still think the piece on Lionel Messi was awesome

                    https://web.archive.org/web/20140701122958/http://fivethirty...

                    • ChocMontePy 5 hours ago
                      • internet2000 3 hours ago

                        I'm seeing a lot about this. What makes this situation different than any other website going offline?

                        • patcon 3 hours ago

                          I think it's that FiveThirtyEight is of historical significance, and Disney is one of the largest and wealthiest companies on the planet. So it's just kinda like "whoa, this is stratospheric negligence" or "whoa, what is the reason for this... assuming they are not idiots?"

                          • materielle an hour ago

                            Also, they don’t have any plans for the IP, and Nate would’ve paid above-market rate just to take over and preserve the content for posterity. He estimates that they deleted 200,000 hours of human labor.

                            This is just some Disney suits being extraordinarily petty.

                        • 3eb7988a1663 3 hours ago

                          If I wanted to get the complete WARC archive of 538 - how do you do this in a friendly way? No interest in history tracking, just want the last available version from Internet Archive.
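
                          One possible starting point, sketched under assumptions: the Wayback Machine's CDX server can list captures for a domain, and the newest capture per URL can be picked from that listing. This does not produce WARC files; it only yields raw-content replay URLs (the `id_` modifier). The helper names, the `limit` default and the sample rows below are illustrative, and the sketch deliberately makes no network calls.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain: str, limit: int = 1000) -> str:
    """Build a CDX query URL listing successful captures under a domain."""
    params = {
        "url": domain,
        "matchType": "domain",       # include subdomains
        "output": "json",            # header row followed by data rows
        "fl": "urlkey,timestamp,original",
        "filter": "statuscode:200",  # skip redirects and errors
        "limit": str(limit),         # keep each request small
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def latest_captures(rows):
    """Map each urlkey to its most recent capture record.

    `rows` is the parsed JSON response: a header row, then data rows.
    Timestamps are 14-digit strings, so string comparison orders them.
    """
    if not rows:
        return {}
    header, *data = rows
    latest = {}
    for row in data:
        record = dict(zip(header, row))
        key = record["urlkey"]
        if key not in latest or record["timestamp"] > latest[key]["timestamp"]:
            latest[key] = record
    return latest


def replay_url(record) -> str:
    """Wayback URL that serves the capture's original, unrewritten bytes."""
    return f"https://web.archive.org/web/{record['timestamp']}id_/{record['original']}"


if __name__ == "__main__":
    # Made-up rows in the shape of a CDX JSON response.
    sample = [
        ["urlkey", "timestamp", "original"],
        ["com,example)/a", "20200101000000", "http://example.com/a"],
        ["com,example)/a", "20230101000000", "https://example.com/a"],
        ["com,example)/b", "20210101000000", "https://example.com/b"],
    ]
    print(build_cdx_query("example.com"))
    for record in latest_captures(sample).values():
        print(replay_url(record))
```

                          To stay friendly, a real run would page through results in small batches and pause between requests rather than pulling everything at once.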

                          • stinkbeetle 3 hours ago

                            Those 2015-16 ones sure aged poorly, I'm reminded of this https://i.imgur.com/6Z9QQj3.jpeg

                            This is why people don't really buy the "but he had Trump at 30%, you just don't understand statistics" apologist line. Sure he hedged in the dying days of the campaign (a cynic might think to try to protect his credibility), but the tone overall was of a person who comprehensively failed to understand the mood of the country from beginning to end.

                            Which is a problem because these election predictions are not just pure "mathematical models" and "data driven" like 538 would have had you believe. What mathematical model should be used? What data should and should not be used? At some point those things are based on the modeller's understanding of reality.

                            • materielle an hour ago

                              He didn’t hedge at the end. Nate always writes the models before election season then doesn’t touch them apart from actual bug fixes. The model actually organically predicted 30%.

                              I still think that’s about accurate. Maybe it should’ve been 40%.

                              Everyone forgets that it was a pretty close election. Clinton could’ve won without the Comey announcement.

                              • stinkbeetle 36 minutes ago

                                I think he did hedge (or "strategically bug fix"). The prediction for Trump went from IIRC around 15 to 30 in the last week or so. It was a big swing, IIRC with a lot of waffle around why it happened but not a lot of verifiable fact.

                                > I still think that’s about accurate. Maybe it should’ve been 40%.

                                It wasn't accurate. This is something people misunderstand about these predictions. If the 2016 election was held 100 times, Trump would have won 100 times. It's not the same as rolling dice.

                                These election predictions don't say that. They say something like "the observations I have agree with scenarios that have Clinton winning, 70% of the time". Which is fine and correct as far as his data and model goes, but none of those scenarios were the reality he was trying to predict. They are all just figments of the model though. Getting down to the brass tacks, he predicted Clinton would win, and he was wrong.

                                Which is fine, we just can't know anything about his process from that failure. Certainly we can't conclude that it was "accurate", since it was not. If we had a good sample of elections where he used the same process and built up a good record then sure.
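
                                What "a good record over a good sample" could look like, as a rough sketch: collect many (forecast probability, outcome) pairs from the same process, then compare stated probabilities with observed frequencies and compute a Brier score. The numbers below are invented for illustration and are not 538's actual forecasts.

```python
def brier_score(forecasts):
    """Mean squared error between forecast probability and 0/1 outcome.

    Lower is better; always answering 0.5 scores 0.25.
    """
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)


def observed_frequency(forecasts, low, high):
    """Share of events that happened among forecasts with low <= p < high."""
    outcomes = [outcome for p, outcome in forecasts if low <= p < high]
    return sum(outcomes) / len(outcomes) if outcomes else None


if __name__ == "__main__":
    # Invented sample: ten events each forecast at 30%, three of which happened.
    sample = [(0.3, 1)] * 3 + [(0.3, 0)] * 7
    print("Brier score:", brier_score(sample))
    print("Observed frequency in [0.25, 0.35):", observed_frequency(sample, 0.25, 0.35))
```

                                A single election contributes one pair to such a list, which is why one outcome on its own says little about the process either way.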

                              • f1ay 3 hours ago

                                I think Nate did a phenomenal job calling out pollsters in that time. Since 538 was predominately a poll aggregator that did tricky stats to rank the reliability of each poll. I remember specifically an interview with him griping about some of the unusual data he was seeing from pollsters that made it look like, and I quote, 'Someone has their finger on the scales'

                                • stinkbeetle an hour ago

                                  Perhaps critiquing statistical methods used by polling was something he was good at. I have no real opinion of his work there, which I didn't pay attention to.

                                  But predicting an election requires a lot more than polling datasets and statistics textbooks. That's the problem that he made himself out to be an election prediction wizard, but really that was off the back of his good prediction in quite a bland and conventional election.

                                  When things got slightly more spicy and reality diverged from his vaunted "models", his "data science" predictably fell in a heap. The worst thing is almost not even that he got it wrong, it's that he seemed incapable of recognizing that present reality was quite significantly different from the past data he had used to build his models. Even after being wrong in so many of these predictions. He just kept churning out these pieces about how Trump was probably finished this time.

                                  • bonsai_spool an hour ago

                                    Okay, this is clearly an LLM response, but for the sake of being polite:

                                    > But predicting an election requires a lot more than polling datasets and statistics textbooks. That's the problem that he made himself out to be an election prediction wizard, but really that was off the back of his good prediction in quite a bland and conventional election.

                                    > When things got slightly more spicy and reality diverged from his vaunted "models", his "data science" predictably fell in a heap

                                    The models were correct in two elections - arguably three because a 30% chance means that an outcome will occur thirty times out of a hundred. That is not zero.

                                    To the person who is running this LLM, please find better things to do with yourself.

                                    • stinkbeetle 32 minutes ago

                                      Why would you be polite to an LLM? Obviously you don't believe that yourself, you're just incapable of a coherent response to the post so the only thing you felt you could use were insults. How pathetic.

                                      • bonsai_spool 28 minutes ago

                                        > Obviously you don't believe that yourself, you're just incapable of a coherent response to the post

                                        I definitely think a human was involved in signing up for the account and occasionally checks in.

                                        I think my response was plenty coherent.

                                        • stinkbeetle 20 minutes ago

                                          Doesn't fix your logic. Why would you feel the need to be polite to such a person? Absolutely pathetic. Or do you actually believe somebody is paid to use an LLM to make posts about Nate Silver on this forum? If so you have paranoid delusions.

                                          And you were incapable of addressing the substance of what I wrote.

                                          • bonsai_spool 8 minutes ago

                                            There was no substance to the text generated in the earlier comments. Good luck out there.

                              • ChrisArchitect 3 hours ago

                                Love Ben but title can simply be: Index of FiveThirtyEight articles preserved by the Internet Archive

                                • buildsjets 3 hours ago

                                  But that would be a false attribution. The Internet Archive did not create the index, Ben did. And the Internet Archive is not hosting the index, Ben is.

                                  • ChrisArchitect an hour ago

                                    Ah, yes, could be worded better, fair play. Point is the Ben attribution isn't needed in that place to avoid unnecessary confusion about who that is etc.