• zombot 15 hours ago

    Right, stealing training data from others is OK, having it stolen from you is not. What else is new?

    • keyle 14 hours ago

      New logo every couple of years and Bob's your uncle.

      • ivape 10 hours ago

        X/Twitter has become extremely prohibitive with just about everything since Elon took over. Their API pricing was antagonistic toward even indie developers. Elon is not a generous guy.

        • djaychela 7 hours ago

          [flagged]

          • larrled 7 hours ago

            To be fair, I think he thought he could save us. And that it mattered. A narcissistic and paranoid view which happens to be shared by most everyone.

            • djaychela 6 hours ago

              Can you expand on that? As written, I can't make much sense of your comment.

              FTR I don't think he thought he could save us, I think he thought he could do cool stuff (space and EVs) and now says climate change isn't as bad as he used to think (despite mountains of evidence to the contrary).

              • colejohnson66 6 hours ago

                If you repeat your lies enough, you'll end up believing them yourself. Especially if you surround yourself with yes-men. It's entirely possible Musk genuinely believed he was a savior of humanity.

                • djaychela 6 hours ago

                  I think there's a long way between what you're saying (which is true particularly as there seem to be a lot of thin skinned leaders of the tech industry) and ending up being responsible for teenage mental health crises, assisting genocide and destroying foreign aid.

                  You don't hear this sort of stuff about eBay's founder, for instance.

            • newsbinator 10 hours ago

              > Elon is not a generous guy

              Why would he be?

              • notsosureja1 7 hours ago

                Because it feels warm and fuzzy to be kind and empathic. Being hateful and greedy and letting avarice rule over your worldview is incredibly sad. But who am I to say.

                • foobarchu 3 hours ago

                  Maybe something to do with having built his fortune off the back of taxpayer subsidies?

                  • ivape 9 hours ago

                    It's kind of a "life arc" that gets fulfilled when you've done it all and have all the money in the world, and reach a certain age. It's a very traditional arc for a humane human being.

                    • thomasanders0n 4 hours ago

                      He still has a couple decades to go with his companies I would say.

                      • qeternity 9 hours ago

                        [flagged]

                        • ivape 9 hours ago

                          That API was already reasonable before he took over Twitter. It was prohibitively priced afterward. You are making arguments out of things where there is objective proof otherwise. Anyways, I think he cut aid programs and fired a bunch of people too. That's a whole 'nother matter though (I'll drop the whole holistic argument).

                          For example, the firehose/streaming API more or less requires 5 grand a month, so it's off limits to an indie dev. Does he not even have solidarity with developers?

                          • only-one1701 8 hours ago

                            He’s not a developer and never really has been so why would he?

                          • notsosureja1 7 hours ago

                            Says a lot about your character that this is the way you think.

                            • qeternity 3 hours ago

                              Ok. It's more revealing that the rest of you are so offended by the truth.

                            • tehwebguy 8 hours ago

                              lmao what?

                              • Loughla 8 hours ago

                                [flagged]

                                • qeternity 8 hours ago

                                  [flagged]

                            • reaperducer 8 hours ago

                              > Elon is not a generous guy

                              > Why would he be?

                              Why shouldn't he be?

                              He has 10x more of everything in the world than he could ever possibly use in his lifetime.

                              Greed is not a virtue.

                              • djaychela 7 hours ago

                                > He has 10x more of everything in the world than he could ever possibly use in his lifetime.

                                Your multiplier is miles off. Not only on basic maths but because he has no idea what to do with all of his wealth other than accrue more and try to prove he's still not the unlikeable teenager he was in SA.

                                Without a rounding error on his wealth he could fix world wide problems such as clean drinking water for everyone. Instead he follows his self-made "I'm a genius" agenda.

                                I know there will be no actual day of reckoning for him, but if there were he would have a lot of difficult questions and no decent answers.

                                • ryeats 4 hours ago

                                  Not to justify anything he does or does not do, but this is clearly not the case, since he had to take out loans against equity in his other companies to buy Twitter.

                                • MarcelOlsz 8 hours ago

                                  My uncle has 10x more of everything in the world than he could ever possibly use in his lifetime. A lake house, a main house, a few boats and cars.

                                  Elon is somewhere around 10,000x.

                                  • Barracoon 5 hours ago

                                    The median American net worth is $192,700. Elon’s net worth is $393.4 billion, so if I’m doing the math right he’s about 2,040,000x more
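For anyone who wants to check the division, here is a quick sketch in Python. It uses only the two figures quoted above, taken at face value; the function name is made up for illustration.

```python
# Figures exactly as quoted in the comment above; neither is independently verified here.
MEDIAN_NET_WORTH_USD = 192_700
ELON_NET_WORTH_USD = 393_400_000_000  # $393.4 billion


def wealth_multiple(large: int, small: int) -> float:
    """Return how many times `small` fits into `large` (hypothetical helper)."""
    return large / small


ratio = wealth_multiple(ELON_NET_WORTH_USD, MEDIAN_NET_WORTH_USD)
print(f"{ratio:,.0f}x the median")  # on the order of two million
```

With these inputs the multiple comes out at roughly 2.04 million.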

                                  • threetonesun 8 hours ago

                                    When Twitter became X they switched to basically the same limits Instagram has. I don't think this is a particular failing of Elon's, even though he might have many.

                                    Restricting content from AI is the big messy debate we're going to see over and over for the next who knows how many years.

                                    • matthewdgreen 7 hours ago

                                      Twitter's strategy was to keep the platform very open and inviting, in order to make it relevant. This included having a relatively unrestricted API compared to other platforms.

                                      I don't know if this was successful or not. Ultimately they convinced someone to buy the platform for $44bn, so I guess you can say it was. That buyer has locked the platform down more, and the new version certainly feels less culturally central and relevant than it used to.

                              • threeseed 13 hours ago

                                Almost certainly the easter egg found in the Trump "Big Beautiful Bill" which prevents states from enacting AI regulations also came from Musk.

                                That way he can continue to steal from others and lock competitors out whilst being comfortable knowing that no laws will be enacted to prevent it.

                                • api 8 hours ago

                                  We really need a one bill one topic amendment. We are going to get to where there is one bill a year that nobody reads and everything else by executive order, at which point congress is just for show.

                                  • threeseed 7 hours ago

                                    And this may sound ridiculous/odd but you need to bring back pork-barrelling i.e. earmarks.

                                    If you allow everyone to go back to their district with something it encourages smaller, more frequent bills and better negotiation.

                                  • NekkoDroid 7 hours ago

                                    > Almost certainly the easter egg found in the Trump "Big Beautiful Bill" which prevents states from enacting AI regulations also came from Musk.

                                    My guess is on Peter Thiel

                                    • labster 12 hours ago

                                      Yep, Musk saying he’s going to fund primary campaigns against congressmembers who vote for the Big Beautiful Bill is all just a brilliant bit of reverse psychology.

                                      Or more likely, Congress is super worried about Roko’s Basilisk.

                                      • tetris11 11 hours ago

                                        That's a wild reference!

                                        https://en.wikipedia.org/wiki/Roko's_basilisk

                                        > Roko's basilisk is a thought experiment which states there could be an otherwise benevolent artificial superintelligence (AI) in the future that would punish anyone who knew of its potential existence but did not directly contribute to its advancement or development, in order to incentivize said advancement.

                                        • stuaxo 11 hours ago

                                          And some of the CEOs of LLM companies seem to believe in it, and that "AGI" will come from their LLM work - both of which are utterly insane points of view.

                                          • BoxOfRain 11 hours ago

                                            It's Pascal's Wager with a sci-fi reskin, and all the objections that go along with that.

                                            • eru 9 hours ago

                                              Roko's Basilisk is very, very similar to Pascal's wager, but it has an extra wrinkle:

                                              The Basilisk tasks you with bringing the Basilisk into being. Pascal's wager merely asks you to believe (and perhaps do some rituals, like pray or whatever), but not to make the deity more likely.

                                              • yubblegum 8 hours ago

                                                No it is not. Pascal was not making an objective argument for why someone should believe. He was making an argument for why he believed (based on personal religious experiences that he had had).

                                                • numpad0 7 hours ago

                                                  To me, the Wager sounds like a pure philosophical joke, and the Basilisk sounds like a typical cult murder justification. It's not falsifiable, and it explains anything post facto. "xyz was tail of the Basilisk" can pseudo-rationalize anything you want.

                                                  I am presently being compelled by future Basilisk to take another slice of cheese. I have no choice but to oblige for fear of my own life :p

                                                • ilyagr 10 hours ago

                                                  An intelligence that reasons this way would be, in human terms, batshit insane and completely immoral. So, it seems unlikely that many or maybe any humans would experience it as "otherwise benign" if it had power over their lives.

                                                  And if we do get an all-powerful dictator, we will be screwed regardless of whether their governing intelligence is artificial or composed of a group of humans or of one human (with, say, powerful AIs serving them faithfully, or access to some other technology).

                                                • api 8 hours ago

                                                  Basilisk / Skynet 2028

                                                  I’m not 100% kidding with how human politics is going. Maybe superintelligent AI takeover would be awesome.

                                                  (Wasn’t that the back story of the Culture novels?)

                                                  • JKCalhoun 7 hours ago

                                                    It was more or less the story from the "Colossus" trilogy.

                                                    And in the video posted the other day (an older episode of Nova on AI), Arthur C. Clarke says that if we allow A.I. to take over, we deserve it.

                                                • mgoetzke 13 hours ago

                                                  why do you think he is so evil but all the others are benign?

                                                  • littlestymaar 13 hours ago

                                                    None of them are benign. He's the only one to have been in a government office though, and he's also batshit crazy, which makes him even more dangerous than the other oligarchs.

                                                    • HenryBemis 11 hours ago

                                                      He is not "batshit crazy", or maybe he is. But he is making the next generation of ICBMs for the US government, sorry.. he is making super-duper rockets that will definitely take people to Mars and his companies/creations will be the very first tech ever to _not_ be used for war and death!!! (he wrote while laughing). So that settles it (all).

                                              • thih9 12 hours ago

                                                I think the rules should be stricter.

                                                I’d prefer an explicit opt in from the content author being required for anyone to perform any model training with any given data.

                                                Alternatively, require all weights, prompts and chat logs to have the same visibility as the original datasets.

                                                None of this is going to happen and current decisions about uncopyrightable ai[1] are already good; but still, it feels like there is room for abuse.

                                                [1]: https://en.m.wikipedia.org/wiki/Th%C3%A9%C3%A2tre_D%27op%C3%...

                                                • eru 9 hours ago

                                                  Well, you explicitly opt-in to Twitter ToS whenever you post anything there.

                                                  • thih9 8 hours ago

                                                    This is not opt-in how I understand it. When there is no alternative, or the alternative is not using a service, I'd call it a hard requirement instead.

                                                    I like how opt-in is handled by GDPR; e.g.: "Consent must be a specific, freely given, plainly worded, and unambiguous affirmation given by the data subject (...) A data controller may not refuse service to users who decline consent to processing that is not strictly necessary in order to use the service.", source: https://en.wikipedia.org/wiki/General_Data_Protection_Regula...

                                                • lesuorac a day ago

                                                  Who's training an AI on the "Tweet" button text?

                                                  Or are they trying to forgo section 230 protection and claim ownership of content uploaded to the site?

                                                  • GuB-42 10 hours ago

                                                    These are just terms of service, not copyright.

                                                    It means that, assuming training AI models is fair use (if it weren't, AI companies including xAI would be in trouble), they can't really stop you.

                                                    But now, essentially, they are telling you that they can block your account or IP address if you do. Which I believe they can for basically any reason anyways.

                                                    • grugagag 7 hours ago

                                                      How would they know you’re training some LLM though?

                                                    • lambertsimnel 13 hours ago

                                                      Perhaps they want the prohibition on using the site content for AI training to be considered based on something other than their ownership of it, like bandwidth usage or users' rights

                                                      • HenryBemis 11 hours ago

                                                        They will get paid to share our (your) data and they will use the money for infra and new yachts.

                                                        • lambertsimnel 10 hours ago

                                                          Indeed, but I'm speculating that they do that without owning the data or even claiming to. That's consistent with the article, but I haven't read the other relevant documents. Maybe they have a license to use the data. Maybe the license allows or requires them to try to restrict others' AI training, regardless of their non-ownership of it. Maybe that serves multiple purposes, in which case they could point to whichever shows them in the best light.

                                                    • cameldrv a day ago

                                                      Naturally I'm sure Grok reads the terms of service on every website it scrapes and doesn't use content from sites that prohibit it.

                                                      • Animats a day ago

                                                        It would be interesting to have a "classical AI model", trained on the contents of the Harvard libraries before 1926 and now out of copyright.

                                                        • gausswho a day ago

                                                          It does surprise me that we haven't seen nations revise their copyright window back to something sensible in a play to seed their own nascent AI industry. The American founding fathers thought 20 years was enough. I'm sure there'd be repercussions in the banking system, but at some point it might be worth the trade.

                                                          • blibble a day ago

                                                            they can't

                                                            a 50 year minimum is part of the berne convention, which itself is as close to a universal law as humanity has

                                                            (even North Korea is a signatory)

                                                            • loudmax a day ago

                                                              The current US copyright duration is 70 years after the life of the author. This is absolutely bonkers. 50 years from publication would be a significant improvement.

                                                              50 years ago was 1975. If copyright were limited to 50 years, we'd be looking at all of the Beatles works being in the public domain. We'd be midway through Led Zeppelin, and a lot of the best work from Pink Floyd and the Rolling Stones.

                                                              Also, Superman, Batman, and Spider-Man. Disney would still profit from the MCU films which they produced in the 2010's, but they couldn't stop you from releasing your own Batman vs Spider-Man story.

                                                              The Harry Potter books would still belong to JK Rowling, but the Narnia stories would be available for all.

                                                              The Godfather 1 and 2 would be in the public domain, as would the original Star Trek TV show, and we'd be coming up on Star Wars pretty soon.

                                                              If there were no copyright protection, these works wouldn't have been created. It is good that Paul McCartney and George Lucas and JK Rowling have profited from their creative output. It would be okay if they only profited for the first 50 years. Nobody is counting on revenue over half a century in the future when they create a work of art today.

                                                              This is our culture. It should belong to all of us.
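The cutoff arithmetic in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a hypothetical flat term of 50 years from publication that runs to the end of the calendar year, with 2025 as "today" (implied by "50 years ago was 1975"). The release years and the function name are added for illustration and are not from the comment.

```python
TERM_YEARS = 50      # hypothetical flat term, counted from publication
CURRENT_YEAR = 2025  # implied by "50 years ago was 1975"

# First-release years, added here for illustration (not part of the comment).
WORKS = {
    "The Lion, the Witch and the Wardrobe": 1950,
    "Star Trek (original series)": 1966,
    "The Beatles: Let It Be": 1970,
    "The Godfather": 1972,
    "The Godfather Part II": 1974,
    "Star Wars": 1977,
    "Harry Potter and the Philosopher's Stone": 1997,
}


def is_public_domain(published: int, now: int = CURRENT_YEAR, term: int = TERM_YEARS) -> bool:
    """True once the term has fully elapsed, assuming it runs to the end of year `published + term`."""
    return now > published + term


# List the works oldest first with their status under the assumed rule.
for title, year in sorted(WORKS.items(), key=lambda item: item[1]):
    if is_public_domain(year):
        status = "public domain"
    else:
        status = f"protected through the end of {year + TERM_YEARS}"
    print(f"{year}  {title}: {status}")
```

Under these assumptions Star Wars (1977) would stay protected through the end of 2027, which fits "coming up on Star Wars pretty soon".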

                                                              • jfim a day ago

                                                                > Disney would still profit from the MCU films which they produced in the 2010's, but they couldn't stop you from releasing your own Batman vs Spider-Man story.

                                                                Wouldn't they still have a trademark on those characters though?

                                                                • ncallaway 14 hours ago

                                                                  The trademark on characters is related to selling goods, if the character is used as a way of identifying an authentic seller.

                                                                  So, if Disney is using mickey mouse on t-shirts to identify it as a Disney manufactured t-shirt, you wouldn't be allowed to use mickey mouse on t-shirts in a similar fashion in a way that might cause consumer confusion about who manufactured the t-shirt.

                                                                  If Wolverine was in the public domain, then they couldn't use a Wolverine trademark to stop you from selling a Wolverine comic book. However, if they used a _specific_ Wolverine mark to identify it as a Disney Wolverine book, then you'd be restricted from using that.

                                                                  Basically, trademark exists to prevent consumer confusion about who is the creator that is selling a good.

                                                                • tpxl 15 hours ago

                                                                  > If there were no copyright protection, these works wouldn't have been created.

                                                                  Citation needed. You can freely copy and distribute Linux and it still got made.

                                                                  • GuB-42 9 hours ago

                                                                    If you want a point, BSD is probably a better example. Linux is protected by copyright, that's what makes copyleft licenses like GPL possible.

                                                                    BSD is also protected by copyright, but it matters less for permissive licenses. It still protects attribution (so you can't claim it as yours), but it probably would have worked without it, unlike Linux, which is in large part defined by the "copyleft" protections offered by its licence.

                                                                    • eru 9 hours ago

                                                                      > It still protects attribution (so you can't claim it yours), but it probably would have worked without it, [...]

                                                                      Well, you could imagine a world that protects the 'moral' rights of authors like attribution, but doesn't otherwise prohibit anyone from duplicating or modifying works.

                                                                      • GuB-42 9 hours ago

                                                                        I don't know about the US but in French "droits d'auteur", moral rights are treated differently from exploitation rights. In particular, they cannot be waived, they cannot be sold, and there is no "work-for-hire". For example, even as an employee, every line of code you write will be yours until you die and nothing can change that. You may not be allowed to do anything with it (for example because the exploitation rights go to your employer), but it is still yours.

                                                                    • simiones 10 hours ago

                                                                      I think Linus Torvalds has been very explicit that he believes the GPL has been critical to the success of Linux - specifically, the copyright-enforced obligation to contribute back any modifications you make. In a world without copyright, companies would be free to make their own modifications and keep them secret, making it more or less impossible to integrate them into a cohesive whole the way they are more or less forced to do today.

                                                                      • dehrmann 3 hours ago

                                                                        The things that hold modifications back from being upstreamed are getting consensus with the OSS developers and sometimes the internal legal team. Occasionally, the issue is proprietary information. Usually, it's the time commitment or upstream not being interested. What companies don't like doing is maintaining patches on a fork, and that's enough incentive to give back on its own.

                                                                        • eru 9 hours ago

                                                                          GPL only forces you to contribute back a modification you make and publish.

                                                                          > In a world without copyright, companies would be free to make their own modifications and keep them secret, making it more or less impossible to integrate them into a cohesive whole the way they are more or less forced to do today.

                                                                          Private modifications that are never shared with a third party are fine with the GPL. Eg Google doesn't have to share whatever kernel they are using on their internal servers with you.

                                                                        • lmm 14 hours ago

                                                                          Linux is generally a functional tool, and struggles with overall coherence. There are far fewer success stories of artworks being made in this style. (E.g. there are successful multiplayer open-source games or clones of existing games, but very few original single-player games, and those that there are are largely the work of a single individual)

                                                                          • eru 9 hours ago

                                                                            Linux is both a kernel (which is under GPL), and an operating system, whose other components are under a variety of licenses (and you can pick and match which components you want).

                                                                            That's why some people like to call it 'Gnu/Linux', but thanks to recent advances we can make Gnu-free Linuxes today, too.

                                                                            > There are far fewer success stories of artworks being made in this style. (E.g. there are successful multiplayer open-source games or clones of existing games, but very few original single-player games, and those that there are are largely the work of a single individual)

                                                                            Humans have made art since forever. Large collaborative efforts like eg a cathedral are a more recent invention. But by these standards copyright was practically invented yesterday.

                                                                            • lmm 6 hours ago

                                                                              > Linux is both a kernel (which is under GPL), and an operating system

                                                                              I was talking about the kernel, though what I said applies to both.

                                                                              > Humans have made art since forever.

                                                                              Perhaps, but not the kind of long-form narrative experiences that we're talking about here. (Sagas and epics predate copyright, but those are a quite different form, and indeed have much the same downsides - struggles with coherence and consistency when there are multiple authors, inability to put everything together in a sensible arc).

                                                                          • eru 9 hours ago

                                                                            Linux is under the GPL, which explicitly needs copyright to work.

                                                                            Something like the BSD licenses approximates 'no copyright' better, perhaps? But also not completely.

                                                                            • mattkevan 12 hours ago

                                                                              Most of the classic Disney films are based on public domain stories.

                                                                              If there were copyright, those works wouldn’t have been created.

                                                                              • dehrmann 3 hours ago

                                                                                I hadn't put this together, but "The Great Mouse Detective" is a riff on Sherlock Holmes, but that didn't enter the public domain until much later. Would it have been better if it used the character and not just the general vibe?

                                                                              • pastage 14 hours ago

                                                                                Linux has used the GPL to its advantage. That cannot exist without copyright. (There are two camps in copyright discussions: improve it, e.g. CC, or destroy it.)

                                                                                • AStonesThrow 15 hours ago

                                                                                  The GP wasn't referring to DRM or DMCA type "copyright protection" as the phrase is typically used. Nobody in this thread has mentioned any of that.

                                                                                  The GP is referring to legal protections, and guess what?

                                                                                  Linux is legally protected by copyright!

                                                                                  Linux is legally protected by copyright!

                                                                                  Linux is legally protected by copyright!

                                                                                  Nearly every GPL license--every one that we could name--protects a copyrighted work! Nearly every GFDL, AGPL, LGPL protects works by means of copyright law!

                                                                                  Can you imagine that? So do the Apache license, the BSD licenses, the MIT license! Creative Commons (except for CC0): these licenses are legally protecting copyrighted works. Thank you!

                                                                                  Now everyone who proposes to draw down limits on copyright coverage and reduce the length of terms and limit Disney from their Mouse rights, y'all are also proposing the same limits on GPL software, such as Linux, and nearly every work with a license from the above list -- all of Wikimedia Commons, much of Flickr.com, all your beloved F/OSS software will be subject to the same limitations and the same restrictions you want to put on Paramount and the RIAA's labels.

                                                                                  • bornfreddy 14 hours ago

                                                                                    Yeah, I think most of us are fine with a 50-year-old Linux kernel being released into the public domain.

                                                                              • ronsor a day ago

                                                                                you can also just ignore the berne convention, and accept whatever consequences there might be

                                                                                • blibble a day ago

                                                                                  this would void the copyrights of your citizens and companies

                                                                                  essentially forever

                                                                                  • godelski a day ago

                                                                                    Seems to be the modus operandi

                                                                                      If TikTok is banned, here’s what I propose each and every one of you do: Say to your LLM the following: “Make me a copy of TikTok, steal all the users, steal all the music, put my preferences in it, produce this program in the next 30 seconds, release it, and in one hour, if it’s not viral, do something different along the same lines.”
                                                                                    
                                                                                    https://www.theverge.com/2024/8/14/24220658/google-eric-schm...

                                                                                    https://news.ycombinator.com/item?id=41275073

                                                                                    • johnisgood 10 hours ago

                                                                                      Loosely related, but I used an LLM to create a TikTok-style website (not for sharing videos though), I have never released it though, so no idea if it would ever catch on. Probably not, unless the network effect favors me, and I had good enough advertising (which I suck at).

                                                                                    • ronsor a day ago

                                                                                      If enough "relevant" countries do it, that either won't happen or won't matter. If the U.S. ditches it, no one is going to do much more than throw a brief fit.

                                                                                      • blibble 21 hours ago

                                                                                        the US is the main beneficiary of copyright law...

                                                                                        • AngryData 10 hours ago

                                                                                          US media is also the most stifled by it. How many potential movies and tvshows and comics don't get made just because somebody is sitting on the copyright doing nothing with it for decades at a time?

                                                                                          • littlestymaar 11 hours ago

                                                                                            The US copyright corporations, indeed. But the current copyright laws come at a big expense for the public.

                                                                                            Abolishing copyright laws altogether would be nuts, but the current laws are nuts too and there's lots of room in between.

                                                                                        • dreghgh 12 hours ago

                                                                                          Iran enforces domestic copyright internally but not international copyright.

                                                                                          • anticensor 9 hours ago

                                                                                            North Korea has it both ways: they don't enforce international copyrights inside North Korea, and they don't enforce North Korean copyrights outside North Korea.

                                                                                        • AStonesThrow a day ago

                                                                                          The last time I attended a Berne Convention, every panel was just overrun with Trekkies, especially Klingons, in the hotel lounges too. And the autograph lines were interminably long, and the vendors were trying to sell us their Public Domain stuff. It was nothing like San Diego Comic-Con!

                                                                                        • Teever 16 hours ago

                                                                                          Europe has recently introduced a law[0] that allows them to suspend IP protections as a punitive response to coercive economic actions by bad actors.

                                                                                          > The procedure is activated by the European Commission submitting a request to the Council of the European Union.[2] After a period of negotiation with the country performing the coercion, the European Council can decide to implement "response measures" such as customs duties, limiting access to programs and financial markets, and intellectual property rights restrictions.[2][4] These restrictions can be applied to states, companies, or individuals.[4]

                                                                                          [0] https://en.wikipedia.org/wiki/Anti-Coercion_Instrument

                                                                                          • littlestymaar 13 hours ago

                                                                                            The Berne Convention on Copyright is an international convention, like the Treaty of Versailles or the Paris Agreement. It could meet the same fate.

                                                                                            • babypuncher a day ago

                                                                                              50 year copyright terms would still be a big improvement over the current state of US copyright law. That would make the first Star Wars public domain in just 2 years.

                                                                                              • gausswho a day ago

                                                                                                would there be repercussions if a country hewed to the 50 year minimum?

                                                                                              • eru 9 hours ago

                                                                                                What's the connection with the banking system?

                                                                                                • MattGaiser a day ago

                                                                                                  Why would it matter? Copyright has been irrelevant so far.

                                                                                                • kibwen a day ago

                                                                                                  Careful, you might create an artificial superintelligence that way. Safer to just train on the Twitter dataset.

                                                                                                  • Shadowmist 14 hours ago

                                                                                                    that’s how you end up with an Artificial Idiot.

                                                                                                  • mbg721 a day ago

                                                                                                    If you thought AI now had out-of-control racism...

                                                                                                    • carlio a day ago
                                                                                                      • nickpsecurity a day ago

                                                                                                        I wish someone would update and use PG19 for a 7-30B+ model:

                                                                                                        https://github.com/google-deepmind/pg19

                                                                                                        That gives us a model that's 100% open and reproducible with low legal risk. It would also be a nice test of how much AIs generalize from or repeat behavior in their pretraining data.

                                                                                                        Then, a new model using that, The Stack, and FreeLaw's stuff (by paying them to open source it). No GitHub Issues or anything with questionable licenses or terms of service violations. That could be the next baseline for lawful models with coding ability, too. Research in coding AIs might use it.

                                                                                                        • dyauspitr 16 hours ago

                                                                                                          [flagged]

                                                                                                          • murph-almighty a day ago

                                                                                                            I've similarly wondered if I could get a pre-2024 Wikipedia if just for the "fact based" flavor LLM

                                                                                                            • landl0rd 15 hours ago

                                                                                                              Do you think Wikipedia starting in '24 was polluted by AI slop? This is certainly possible, I'm just not aware of it happening.

                                                                                                              Wikipedia periodically publishes database dumps and the Internet Archive stores old versions: https://archive.org/search?query=subject%3A%22enwiki%22%20AN...

                                                                                                              Plus you could also grab the latest and just read the 12/31/23 revisions.

                                                                                                              • thrawa8387336 4 hours ago

                                                                                                                It was already slop, let's not pretend it is significantly different today.

                                                                                                              • malinens 15 hours ago

                                                                                                                What happened to wikipedia in 2024?

                                                                                                            • michaelcampbell a day ago

                                                                                                              "its content" indeed.

                                                                                                              • matwood a day ago

                                                                                                                Weird this just happened. I assumed all sites with any sort of content changed their terms soon after ChatGPT hit the scene.

                                                                                                                • nailer a day ago

                                                                                                                  Yep, from https://the-decoder.com/reddit-ends-its-role-as-a-free-ai-tr... :

                                                                                                                  You must not, and must not allow those acting on your behalf to:

                                                                                                                  ...use the Data APIs to encourage or promote illegal activity or violation of third party rights (including using User Content to train a machine learning or AI model without the express permission of rightsholders in the applicable User Content);

                                                                                                                  • soulofmischief a day ago

                                                                                                                    In my eyes that is considered fair use, and I think the courts will come to agree unless they are financially incentivized to look the other way and thus create a moat for existing players at the expense of newcomers.

                                                                                                                • blibble a day ago

                                                                                                                  wish I could change my terms to bar training of AI models on my content

                                                                                                                  • Terr_ 2 hours ago

                                                                                                                    I've been wondering if there's some way to put something into legally-defensible clickwrap around one's own content to deter or annoy misuse.

                                                                                                                    https://news.ycombinator.com/item?id=42774179

                                                                                                                    TLDR: Use contract law so that I provide my content and they give me rights to all outputs.

                                                                                                                    So if anybody doing this can prove Acme Model contains their artwork, and Acme Model was used to generate some scenes used in a major movie, then Acme has already given the artist a right to share/resell those scenes. If Acme Inc. "sold" exclusive rights to a movie studio, then either (A) they broke the contract with every contributor, or (B) they lied to the studio in that other contract.

                                                                                                                    Remember, the goal isn't some amazing "gotcha" where the latest blockbuster movie becomes public domain, but rather to create chronic legal pain and risk for companies like Acme so that they stop stealing stuff.

                                                                                                                    • eru 9 hours ago

                                                                                                                      You can just not use Twitter?

                                                                                                                      • unstablediffusi a day ago

                                                                                                                        if that is any consolation, no one gives a shit about xitter's ToS either. it will continue to be scraped by every major player.

                                                                                                                        • Capricorn2481 12 hours ago

                                                                                                                          How exactly is it being scraped? My understanding is Twitter and LinkedIn are both huge pains in the ass to scrape right now.

                                                                                                                          • TheDong 6 hours ago

                                                                                                                            There's a number of companies out there, like "brightdata", which pay a small amount to app developers to install a native "sdk". That SDK mimics a browser, and makes requests as if the user's device is doing it.

                                                                                                                            Since it's using a large number of real users' devices, and closely mimicking real web browsers, it ends up looking incredibly similar to real user traffic.

                                                                                                                            Since twitter allows some amount of anonymous browsing, that's enough to get some amount of data out. You can also pay brightdata for one large aggregated dataset.

                                                                                                                            https://bright-sdk.com/

                                                                                                                            This is part of the AI revolution: users' devices being commandeered to DDoS small blogs and twitter alike to feed data to the beast.

                                                                                                                          • vouaobrasil a day ago

                                                                                                                            Same here! It should be a default. Unfortunately, the very openness of the internet is now working against us.

                                                                                                                            • soulofmischief a day ago

                                                                                                                              Why should it be a default? Can you prove that training a model on data you wrote is not fair use?

                                                                                                                              We're already seeing precedent that it might be.

                                                                                                                              https://www.ecjlaw.com/ecj-blog/kadrey-v-meta-the-first-majo...

                                                                                                                              The openness of the internet is a good thing, but it doesn't come without a cost. And the moment we have to pay that cost, we don't get to suddenly go, "well, openness turned out to be a mistake, let's close it all up and create a regulatory, bureaucratic nightmare". This is the tradeoff. Freedom for me, and thee.

                                                                                                                              • baseballdork a day ago

                                                                                                                                The burden is on the user to show that it is fair use, no? Not everyone else's responsibility to prove that it's _not_ fair use.

                                                                                                                                • soulofmischief a day ago

                                                                                                                                  It is definitely the responsibility of anyone suing someone who trained a model on copyrighted data to prove that it isn't fair use, they have to show how it violated law, and while it's in the best interest of those organizations to make things easier for the court by showing why it is fair use, they are technically innocent until proven guilty.

                                                                                                                                  Accordingly, anyone on the internet who wants to make comments about how they should be able to prevent others from training models on their data needs to demonstrate competence with respect to copyright by explaining why it's not fair use, as currently it is undecided in law and not something we can just take for granted.

                                                                                                                                  Otherwise, such commenters should probably just let the courts work this one out or campaign for a different set of protection laws, as copyright may not be sufficient for the kind of control they are asking over random developers or organizations who want to train a statistical model on public data.

                                                                                                                                  • SAI_Peregrinus a day ago

                                                                                                                                    You've got it backwards. It's on the defendant to prove that their use is fair. The plaintiff has to prove that they actually own the copyright, and that it covers the work they're claiming was infringed, and may try to refute any fair-use arguments the defense raises, but if the defense doesn't raise any then the use won't be found fair.

                                                                                                                                    • soulofmischief a day ago

                                                                                                                                      It's true that the process is copyright strike/lawsuit -> appeal, but like I said, it's in their best interests to just prove that it's fair use because otherwise the judge might not properly consider all facts, only hear one side of the story and thus make a bad judgement about whether or not it is fair use. If anything, I'm just being pedantic, but we do ultimately agree here I think.

                                                                                                                                      • SAI_Peregrinus 6 hours ago

                                                                                                                                        Well, lawsuits have multiple stages. First the plaintiff files the suit, and serves notice to the defendant(s) that the suit has been filed. Then there's a period where both sides gather evidence (discovery), then there's a trial where they present their evidence & arguments to the court. Each side gets time to respond to the arguments made by the opposing party. Then a verdict is chosen, and any penalties are decided by the court. So there's not really any chance the judge only hears one side of the story.

                                                                                                                                        That said, I think we do agree. The plaintiff should be prepared to refute a fair-use argument raised by the defendant. I'm just noting that the refutation doesn't need to be part of the initial filing, it gets presented at trial, after discovery, and only if the defendant presents a fair-use defense. So they don't have to prove it's not fair use to win in every case. I'm probably also being excessively pedantic!

                                                                                                                                    • lmm 14 hours ago

                                                                                                                                      > It is definitely the responsibility of anyone suing someone who trained a model on copyrighted data to prove that it isn't fair use, they have to show how it violated law, and while it's in the best interest of those organizations to make things easier for the court by showing why it is fair use, they are technically innocent until proven guilty.

                                                                                                                                      No, fair use is an affirmative defense for conduct that would otherwise be infringing. The onus is on the defendant to show that their use was fair.

                                                                                                                                      • petesergeant 12 hours ago

                                                                                                                                        > It is definitely the responsibility of anyone suing someone who trained a model on copyrighted data to prove that it isn't fair use

                                                                                                                                        Morally, perhaps, but not under US law: https://en.wikipedia.org/wiki/Affirmative_defense#Fair_use

                                                                                                                                    • shakna 13 hours ago

                                                                                                                                      Yeah, I don't think downloading my paid-for books, from an illegal sharing site, to scrape and make use of, is in any way fair use.

                                                                                                                                      From the decision in 1841, in the US (Folsom vs Marsh):

                                                                                                                                      > reviewer may fairly cite largely from the original work, if his design be really and truly to use the passages for the purposes of fair and reasonable criticism. On the other hand, it is as clear, that if he thus cites the most important parts of the work, with a view, not to criticize, but to supersede the use of the original work, and substitute the review for it, such a use will be deemed in law a piracy

                                                                                                                                      Further, to be "transformative", it is required that the new work is for a new purpose. It has to be done in such a way that it basically is not competing with the original at all.

                                                                                                                                      Using my creative works, to create creative works, is rather clearly an act of piracy. And the methods engaged, to enable to do so, are also clearly piracy.

                                                                                                                                      Where would training a model here, possibly be fair use?

                                                                                                                                  • like_any_other 6 hours ago

                                                                                                                                    In contrast, I'm glad ISPs allow "their" content to be used so permissively.

                                                                                                                                    • mrweasel 8 hours ago

                                                                                                                                      Isn't that like half of X's business model, selling data to other companies? Right now no one is as data hungry as AI companies, so it seems strange to cut them off. I can understand wanting to charge a premium for the access, if it's for AI, but straight up saying no seems like a strange business move.

                                                                                                                                      • SilverBirch 8 hours ago

                                                                                                                                        How much do you think Musk values X being a viable independent business vs using it to accelerate X AI? I would expect Musk values the first as approximately 0 value, and the second as being 100% of the value. So it makes total sense to exploit the fact that X and X AI are the same company.

                                                                                                                                        • mrweasel 7 hours ago

                                                                                                                                          That's a good point. Other than Meta, X (AI) is the only AI company that "generates" it's own training data and we haven't really seen Musk trying to increase X revenue, of trying to run it cheaper.

                                                                                                                                      • nly an hour ago

                                                                                                                                        Except xAI which will no doubt get permission at some point.

                                                                                                                                        • visarga 13 hours ago

                                                                                                                                          Copyright is not going well. The rights of millions of people are trampled by companies, both the content we post on social networks and our private AI chats. Our voice doesn't matter.

                                                                                                                                          Copyright was supposed to protect expression and keep ideas freely circulating. But now it protects abstractions (see the Abstraction-Filtration-Comparison test). It is much more difficult to be sure you are not infringing.

                                                                                                                                          • pergadad 10 hours ago

                                                                                                                                            Copyright has nothing to do with free expression but was intended to protect the interests of publishers. When the printing press arrived basically any popular book or booklet was quickly copied by others. This meant the original publisher (and sometimes the author, but usually they were paid one-off) saw nothing of the profit.

                                                                                                                                            • eviks 10 hours ago

                                                                                                                                              It seems like it was supposed to do the exact opposite per cursory wiki reading:

                                                                                                                                              > The concept of copyright first developed in England. In reaction to the printing of "scandalous books and pamphlets", the English Parliament passed the Licensing of the Press Act 1662, which required all intended publications to be registered with the government-approved Stationers' Company, giving the Stationers the right to regulate what material could be printed.

                                                                                                                                              > The Statute of Anne, enacted in 1710 in England and Scotland, provided the first legislation to protect copyrights (but not authors' rights)

                                                                                                                                            • kyle-rb a day ago

                                                                                                                                              I've never signed up for the X developer program, so I'm not bound by these terms. But I did download an archive of my data last week. Do I have implicit permission to use that data (~150k liked tweets) to train AI models?

                                                                                                                                              Or is there stuff in the user agreement that separately prohibits this?

                                                                                                                                              Obviously barring normal copyright law which is still up in the air.

                                                                                                                                              • josefritzishere a day ago

                                                                                                                                                If you live in the EU, GDPR dictates that you own your data generally speaking. If you're in the US it varies by state if you have any rights at all.

                                                                                                                                                • MoonGhost a day ago

                                                                                                                                                  If you own your face that doesn't mean nobody can take a picture on the street.

                                                                                                                                              • Hizonner 5 hours ago

                                                                                                                                                By "its content", X of course means your content.

                                                                                                                                                • lcnmrn 13 hours ago

                                                                                                                                                  I allow all robots and even provide a sitemap on Subreply, a social network I created.

                                                                                                                                                  • delichon a day ago

                                                                                                                                                    > “You shall not and you shall not attempt to (or allow others to) […] use the X API or X Content to fine-tune or train a foundation or frontier model,” it reads.

                                                                                                                                                    If I have a service where a user enters any URL, like a tweet from X, and the service translates it, then if the user approves of the translation I train a translation model on that, does that violate this term?

                                                                                                                                                    • yandie a day ago

                                                                                                                                                      Per my experience with GenAI legal teams, that’s a no go.

                                                                                                                                                      It’s not been tested in court though

                                                                                                                                                      • dyauspitr 16 hours ago

                                                                                                                                                        If you don’t want an LLM to view it don’t put it on the public internet.

                                                                                                                                                        • ronsor a day ago

                                                                                                                                                          I'm not sure how this will work as crawlers don't read or accept ToS.

                                                                                                                                                          • MoonGhost a day ago

                                                                                                                                                            It will not, as long as search engines have access. That means at least Google, and OpenAI through MS Bing.

                                                                                                                                                            Without search engines, what's the point of posting it on the open net if nobody can find it?

                                                                                                                                                            • voidUpdate 11 hours ago

                                                                                                                                                              This refers to the API, which you would have to manually attach a bot to so that it could scrape things

                                                                                                                                                            • xiaoyu2006 9 hours ago

                                                                                                                                                              As if anyone will follow.

                                                                                                                                                              • petesergeant 12 hours ago

                                                                                                                                                                The only story here is that it took 2 months for them to do this after being "bought" by xAI.

                                                                                                                                                                • echelon a day ago

                                                                                                                                                                  If an artist or author can't do this, social media shouldn't be able to do it either.

                                                                                                                                                                  If xAI wants to train on a public corpus, it shouldn't be allowed to prevent its own corpus from being used.

                                                                                                                                                                  We need regulations to limit the power grabs. Train all you like, but don't dare try to constrain to your walled gardens.

                                                                                                                                                                  We should also probably nip the "foundation model company / also a social media company" conglomeration in the bud.

                                                                                                                                                                  • mgraczyk a day ago

                                                                                                                                                                    Artists can do this, and they do

                                                                                                                                                                    • loudmax a day ago

                                                                                                                                                                      Yes, but do artists have the ability to actually monitor and enforce this? You have to have the capacity and the wherewithal and to test these models to even know that your data is being ingested into AI.

                                                                                                                                                                      Big companies like the New York Times and Twitter/X have the funds to pay for this. Miscellaneous artists probably don't.

                                                                                                                                                                    • teeray a day ago

                                                                                                                                                                      > If an artist or author can't do this, social media shouldn't be able to do it either.

                                                                                                                                                                      Even if this is done, the case of starving artist v. megacorp will probably go to whoever wields the most money and lawyers. To add insult to injury, the artist’s opponent is fueled by their ill-gotten gains.

                                                                                                                                                                      • yndoendo a day ago

                                                                                                                                                                        This is dependent on country. USA, yes with their draconian methods. In countries like the UK, the loser of the suit pays all the cost. UK lawyers have no problem taking low wealth client cases they know will win. UK allows for David vs Goliath and David to win. US uplifts Goliath as a God.

                                                                                                                                                                        • anticensor 8 hours ago

                                                                                                                                                                          However the loser pays vs. both parties pay isn't uniform across all possible lawsuit types even in America or in England. Adding to that, even in loser pays regimes, both parties have to pay upfront and then the winner is refunded the costs.

                                                                                                                                                                          • teeray 2 hours ago

                                                                                                                                                                            > both parties have to pay upfront and then the winner is refunded the costs.

                                                                                                                                                                            Exactly. It’s about which party can sustain the greater cash flow.

                                                                                                                                                                          • bonoboTP a day ago

                                                                                                                                                                            Also in many countries legal costs are just generally lower than in the US.

                                                                                                                                                                        • jimbokun a day ago

                                                                                                                                                                          If social media can do this, an artist or author should be able to do it, too.

                                                                                                                                                                          • vouaobrasil a day ago

                                                                                                                                                                            Social media should do it to set a legal precedent.

                                                                                                                                                                            > We need regulations to limit the power grabs. Train all you like, but don't dare try to constrain to your walled gardens.

                                                                                                                                                                            No, no one should train, period.

                                                                                                                                                                            • echelon a day ago

                                                                                                                                                                              > No, no one should train, period.

                                                                                                                                                                              I get that you have your own opinion, but I'm personally tired of living in the butter-churning era and would prefer that this all went a bit faster.

                                                                                                                                                                              I want my real time super high fidelity holo sim, all of my chores to be automatically done, protein folding, drug discovery. The life extension, P = NP future. No more incrementalism.

                                                                                                                                                                              If the universe only happens once, and we're only awake for a geological blink of an eye, I'd rather we have an exciting time than just be some paper-pushing animals that pay taxes and vanish in a blip.

                                                                                                                                                                              I'd be really excited if we found intelligent aliens, had advanced cloning for organ transplants and longevity, developed a colony on Mars, and invented our robotic successor species. Xbox and whatever most normal people look forward to on a day to day basis are boring.

                                                                                                                                                                              • vouaobrasil a day ago

                                                                                                                                                                                There is already a beautiful, exciting world out there full of animals and plants and we don't need AI or some computer crap to experience it. The problem is, creating all this AI and advanced technology is directly crushing that world.

                                                                                                                                                                                • DaSHacka 7 hours ago

                                                                                                                                                                                  > The problem is, creating all this AI and advanced technology is directly crushing that world.

                                                                                                                                                                                  Do you have a source for this?

                                                                                                                                                                                  • echelon 3 hours ago

                                                                                                                                                                                    > There is already a beautiful, exciting world out there full of animals and plants and we don't need AI or some computer crap to experience it.

                                                                                                                                                                                    I'm glad that this works for you, but I want more.

                                                                                                                                                                                    We're temporary apes on a soon to be permanent addition of metallicity to our sun's outer atmosphere. I don't think we should romanticize or hold anything sacred about our very temporary place in the universe.

                                                                                                                                                                                    We are metastable and ephemeral. Everything in this world is.

                                                                                                                                                                            • seydor 14 hours ago

                                                                                                                                                                              VAT for content should be a thing. Ultimately all users should be getting paid

                                                                                                                                                                              • guywithahat 14 hours ago

                                                                                                                                                                                So I get to use the platform for free, but I also get paid to post on the platform? I'm not sure that makes sense. Like I hate to take the side of big tech, but they can't literally be paying users to use their platform. Just use something else, there are a million social media sites

                                                                                                                                                                                • seydor 14 hours ago

                                                                                                                                                                                  Google indexes your website for free, and it will pay you to put ads in it.

                                                                                                                                                                                  That's also what all social media do, they put ads on your thoughts. They don't even need to index your thoughts because you submit them directly. It has nothing to do with being free, it's about incentives. Users are so foolish, they give everything for free, unlike webmasters.

                                                                                                                                                                                  • Reason077 14 hours ago

                                                                                                                                                                                    You don’t use the platform for free, unless you’re using an ad blocker. But that’s also, probably, against the TOS?

                                                                                                                                                                                    • MonkeyClub 14 hours ago

                                                                                                                                                                                      > I get to use the platform for free

                                                                                                                                                                                      You actually get to generate content for the platform for free.

                                                                                                                                                                                      Without you (all of the X users), the platform would be devoid of content, just botspeak and corporate promos.

                                                                                                                                                                                      Plus, as the sibling mentioned, they monetize your visit through ads (and data use).

                                                                                                                                                                                      • jaoane 13 hours ago

                                                                                                                                                                                        Most posts are ignored and are an absolute loss to the company. Which is why platforms like Twitter only allow you to make money from posting once you reach a certain threshold.

                                                                                                                                                                                        • MonkeyClub 8 hours ago

                                                                                                                                                                                          They're not an "absolute loss" since they cost bytes to store, and raise engagement and data metrics.

                                                                                                                                                                                          It's just that they don't want to share the fractions of pennies with everyone, so the fractions accumulate for them.

                                                                                                                                                                                          Then they pay a bit to the higher tiers, so they create the illusion that X is a parallel income source, and gives the lower tiers something to aspire to.

                                                                                                                                                                                          Carrot and stick, or rather glass beads and the hope thereof.

                                                                                                                                                                                    • threeseed 13 hours ago

                                                                                                                                                                                      We really need LLMs for music to become more advanced.

                                                                                                                                                                                      Then maybe the recording companies will start defending artist rights.

                                                                                                                                                                                      Because not sure what all the other industry bodies are doing.

                                                                                                                                                                                      • mk_stjames 13 hours ago

                                                                                                                                                                                        I wanted to do some quick math on this idea. Suppose we trained a vanilla transformer model from scratch, as GPT2/GPT3 was done. The number of seen input tokens is known perfectly, as are the sources of those training tokens (since then, everyone has either kept quiet about the sources post-Books3-fiasco, or has been finetuning on top of previous models, making this a more difficult calculation).

                                                                                                                                                                                        GPT-3 was trained on approximately 300 billion tokens. A small technical textbook might contain something like... 130,000 tokens? (1 token ~= 0.75 words, ~100k words in the book).

                                                                                                                                                                                        Thus, say you wrote a textbook on quantum mechanics that was included in the training corpus. A naive computation of the fraction of your textbook's contribution to the total number of training tokens would be 130K/300B = 0.0000004333333333, or 0.000043%.

                                                                                                                                                                                        If our hypothetical AI company here reported, say $500M in yearly profit, if all of that was distributed 100% based on our naive training token ratio (notice I say naive because it isn't as simple to say that every training token contributes equally to the final weights of a model. That is part of the magic.) then $500M * 0.000043% = $215.

                                                                                                                                                                                        You could imagine a simpler world where it was required by law that any such profitable company redistribute, say, 20% (taking the 'anti-VAT' idea) back to the copyright holders / originators of the training tokens. So, our fictitious QM textbook author would receive a check in the mail for $43 for that year of $500M in profit. Not great, but not zero.

                                                                                                                                                                                        Since then, training corpuses are much, much larger, and most people's contributions would be much smaller. Someone who writes witty tweets? Maybe 1/100th the length of our above example in a model with now 100x the training corpus.

                                                                                                                                                                                        So fractions of a penny for your tweets. Maybe that is fitting after all...
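
For anyone who wants to play with the numbers above, here is a minimal Python sketch of that naive per-token split. The constants are the rough figures from this comment (300B tokens, ~130K tokens for the book, $500M yearly profit, a hypothetical 20% payout rule), the function names are made up for illustration, and weighting every training token equally is an assumption, not how any real model or company works.

```python
# Naive per-token payout estimate. All figures are rough guesses from the
# comment above, and equal weighting of tokens is an assumption.
CORPUS_TOKENS = 300e9    # approximate GPT-3 training tokens
BOOK_TOKENS = 130_000    # ~100k-word textbook at ~0.75 words per token
YEARLY_PROFIT = 500e6    # hypothetical AI company profit, in dollars
PAYOUT_RATE = 0.20       # hypothetical legally mandated redistribution


def token_share(source_tokens: float, corpus_tokens: float) -> float:
    """Fraction of the training corpus contributed by one source."""
    return source_tokens / corpus_tokens


def payout(source_tokens: float, corpus_tokens: float,
           profit: float, rate: float) -> float:
    """Dollars owed to one source under the naive equal-weight split."""
    return token_share(source_tokens, corpus_tokens) * profit * rate


book_share = token_share(BOOK_TOKENS, CORPUS_TOKENS)
book_payout = payout(BOOK_TOKENS, CORPUS_TOKENS, YEARLY_PROFIT, PAYOUT_RATE)

# Tweet author: 1/100th of the text, in a corpus 100x larger.
tweet_payout = payout(BOOK_TOKENS / 100, CORPUS_TOKENS * 100,
                      YEARLY_PROFIT, PAYOUT_RATE)

print(f"book share of corpus: {book_share:.2e}")
print(f"book payout:  ${book_payout:.2f}")
print(f"tweet payout: ${tweet_payout:.4f}")
```

Swapping in a different profit figure or payout rate only scales the result linearly; the share of the corpus is what keeps the amounts tiny.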

                                                                                                                                                                                        • seydor 9 hours ago

                                                                                                                                                                                          the payment would probably be based on the usage of that source in generating LLM output for the LLM user. This would probably require training a parallel network that connects LLM network nodes to sources. Then the activation of those nodes could be a surrogate for the contribution of the source

                                                                                                                                                                                      • foldr 10 hours ago

                                                                                                                                                                                        This could lead to a precipitous increase in the performance of the AI models.

                                                                                                                                                                                        • bamboozled 10 hours ago

                                                                                                                                                                                          This guy is just painful

                                                                                                                                                                                          • archagon a day ago

                                                                                                                                                                                            Oh, that must be nice. And what should I do as a blogger to get the same privilege for my content?

                                                                                                                                                                                            We are in an age of corporate “piracy for me, but not for thee.”

                                                                                                                                                                                            • MonkeyClub 14 hours ago

                                                                                                                                                                                              > We are in an age of corporate “piracy for me, but not for thee.”

                                                                                                                                                                                              Rather, we are back to that age of state- (now corporate-) backed privateering.

                                                                                                                                                                                            • risyachka 13 hours ago

                                                                                                                                                                                              Good luck with that. Pretty sure at this point no one cares.

                                                                                                                                                                                              Literally every AI model is trained on copyrighted (etc.) data, and without any consequences.

                                                                                                                                                                                              • Sleepthinker 11 hours ago

                                                                                                                                                                                                [dead]

                                                                                                                                                                                                • Duskgmxx 14 hours ago

                                                                                                                                                                                                  [dead]

                                                                                                                                                                                                  • tempaccountabcd a day ago

                                                                                                                                                                                                    [dead]

                                                                                                                                                                                                    • ulfw 15 hours ago

                                                                                                                                                                                                      [flagged]

                                                                                                                                                                                                      • add-sub-mul-div a day ago

                                                                                                                                                                                                        How useful is low-quality content like Youtube comments and tweets anyway? Is it a common/important use case to generate tweet-length, tweet-quality content? Are most use cases of generating tweet-type content spam/fraud? Would a model be better off if it was unable to perform those use cases?

                                                                                                                                                                                                        • redox99 a day ago

                                                                                                                                                                                                          Even if SNR is low, there is some information that only exists on X, or at least is the primary source. Just look at how many submissions on HN are X posts.

                                                                                                                                                                                                          • add-sub-mul-div a day ago

                                                                                                                                                                                                            Before Musk bought it, Twitter was broadly disliked here and there were regularly calls in the comments to disallow submissions from there. Given how it's degraded in completely non-partisan ways (blocking of alternative clients, features removed from the free tier, paid subscription tiers below $40/month still having ads, proliferation of spam from paid placement bots in comments), I can't understand how positive sentiment comes from a place other than virtue signaling alignment with Musk and his values.

                                                                                                                                                                                                        • narrator 8 hours ago

                                                                                                                                                                                                          Elon mentioned that the earlier rate limiting was for preventing training the real-time AI propaganda deathstar, and to avoid X becoming bot hell, which is an ongoing problem. This move is probably for similar reasons.

                                                                                                                                                                                                          https://x.com/elonmusk/status/1675187969420828672

                                                                                                                                                                                                          • vouaobrasil a day ago

                                                                                                                                                                                                            There needs to be a worldwide standard, such as an HTML tag, that says "no training". And a few countries need to make it a punishable offense to violate the tag. The punishment should be exceptionally severe, not just a fine. For example: any company that violates the tag should be completely barred from operating, forever.

                                                                                                                                                                                                            • kiratp a day ago

                                                                                                                                                                                                              That will play out exactly like the "Do not track" bit did.

                                                                                                                                                                                                              • vouaobrasil a day ago

                                                                                                                                                                                                                Perhaps we should try anyway, in case you are wrong.

                                                                                                                                                                                                                • insane_dreamer 5 hours ago

                                                                                                                                                                                                                  how did that play out?

                                                                                                                                                                                                                • anigbrowl a day ago

                                                                                                                                                                                                                  That will just lead to situations where one company scrapes the site, cleans the content of tags, and sells the data, and another does the training on the precleaned data. The first one hasn't trained and the second one never saw the tag.

                                                                                                                                                                                                                  • vharuck a day ago

                                                                                                                                                                                                                    This isn't a new concept in law. It's similar to buying goods that were stolen or procured through illegal means. Here's the US law that applies when it happens across state lines:

                                                                                                                                                                                                                    https://www.law.cornell.edu/uscode/text/18/2315

                                                                                                                                                                                                                    Note that it requires the defendant to know the goods were illegally taken. Can be hard to prove, but not impossible for companies with email trails. The fun question is, what will the analog be for the government confiscating the illegally "taken" data? A guarantee of deletion and requirement to retrain the model from scratch?

                                                                                                                                                                                                                    • vouaobrasil a day ago

                                                                                                                                                                                                                      Companies who are found guilty of this should also be rendered bankrupt then.

                                                                                                                                                                                                                      • twostorytower a day ago

                                                                                                                                                                                                                        It needs to be incorporated into the robots.txt standard.
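
For context, robots.txt today only governs fetching per user agent; as far as I know it has no standard "no training" directive. A minimal sketch of how a well-behaved crawler would honor the existing mechanism, using Python's standard `urllib.robotparser`. The agent name `ExampleTrainingBot` and the URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) training crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""

def may_fetch(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

training_allowed = may_fetch(ROBOTS_TXT, "ExampleTrainingBot", "https://example.com/post/1")
other_allowed = may_fetch(ROBOTS_TXT, "SomeOtherBot", "https://example.com/post/1")

print("training crawler allowed:", training_allowed)
print("other crawler allowed:   ", other_allowed)
```

This only works if the crawler checks the file and identifies itself honestly, which is the same weakness raised elsewhere in the thread about "Do not track".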

                                                                                                                                                                                                                        • logicchains a day ago

                                                                                                                                                                                                                          >There needs to be a worldwide standard, such as an HTML tag, that says "no training"

                                                                                                                                                                                                                          Any country that seriously implemented this would just end up being completely dominated by the autonomous robot soldiers of another country that didn't, because it effectively bans the development of embodied AGI (which can learn live from seeing/reading something, like a human can).