Comments Page - If AI seems smarter, it's thanks to smarter human trainers

« Back If AI seems smarter, it's thanks to smarter human trainersreuters.comSubmitted by getwiththeprog 9 months ago

throwaway_2968 9 months ago
Throwaway account here. I recently spent a few months as a trainer for a major AI company's project. The well-paid gig mainly involved crafting specialized, reasoning-heavy questions that were supposed to stump the current top models. Most of the trainers had PhDs, and the company's idea was to use our questions to benchmark future AI systems.
It was a real challenge. I managed to come up with a handful of questions that tripped up the models, but it was clear they stumbled for pretty mundane reasons—outdated info or faulty string parsing due to tokenization. A common gripe among the trainers was the project's insistence on questions with clear-cut right/wrong answers. Many of us worked in fields where good research tends to be more nuanced and open to interpretation. I saw plenty of questions from other trainers that only had definitive answers if you bought into specific (and often contentious) theoretical frameworks in psychology, sociology, linguistics, history, and so on.
The AI company people running the projects seemed a bit out of their depth, too. Their detailed guidelines for us actually contained some fundamental contradictions that they had missed. (Ironically, when I ran those guidelines by Claude, ChatGPT, and Gemini, they all spotted the issues straight away.)
After finishing the project, I came away even more impressed by how smart the current models can be.
- EvgeniyZh 9 months ago
  I'm currently pursuing PhD in theoretical condensed matter physics. I tried submitting questions to Humanity Last Exam [1], and it was not too hard to think of a problem that none of top llms (Claude, gpt, Gemini + both o1 models) got right. What was surprising for me is how small my bag of tricks was. I could think of 5-6 questions in my direct area of expertise with simple numerical answer that were hard for llms, and another maybe 5 that they were able to solve. But basically that was all my expertise. Of course there is stuff that can't be checked with simple numerical answer (quite a lot in my case), and there are probably additional questions that would require more effort from me to give a correct answer. But all in all, I suddenly felt I'm a one-trick pony, and that's given that my PhD is relatively diverse.
  [1] https://agi.safe.ai/submit
- godelski 9 months ago
  > Many of us worked in fields where good research tends to be more nuanced and open to interpretation
  I've had a hard time getting people to understand this. It's always felt odd tbh. It's what's meant by "truth doesn't exist". Because it doesn't exist with infinite precision, though they are plenty of times where there's good answers. In our modern world I think one of the big challenges is that we've advanced enough that low order approximations are no longer good enough. It should make sense, as we get better we need more complex models. We need to account for more.
  In many optimization problems there are no global solutions. This isn't because we lack good enough models, it's just how things are. And the environment is constantly changing, the targets moving. So the complexity will always exist. There's beauty in that, because what fun is a game when you beat it? With a universe like this, there's always a new level ahead of us.
  ramblingrain 9 months ago
  > Many of us worked in fields where good research tends to be more nuanced and open to interpretation
  >> I've had a hard time getting people to understand this.
  Why, can't you just tell them "it's not a science, it's more like performance art."
- layer8 9 months ago
  I wouldn’t look for questions with yes/no answers, but for questions where the answers can have correct/incorrect reasoning. Of course, you can’t turn those into automated benchmarks, but that’s maybe kinda the point.
  varjag 9 months ago
  I think that's the point: correctness can be a sliding scale. There is Newton correct, and there's Einstein correct.
  rightbyte 9 months ago
  It wouldn't surprise me if Newton correct is closer to correct for 'small numbers' then Einstein correct will be in the end in the general case.
- jrussino 9 months ago
  > when I ran those guidelines by Claude, ChatGPT, and Gemini
  Did you mention this to folks running the project? I would think that pasting the "detailed guidelines" from an internal project into a competitor's tool would run afoul of some confidentiality policy. At least, this sort of restriction has been a barrier to using such LLM tools in my own professional work.
- spencerchubb 9 months ago
  > I saw plenth of questions that only had definitive answers if you bought into specific theoretical frameworks
  That kind of stuff would be great to train on. As long as the answer says something like "If you abide by x framework, then y"
- DavidSJ 9 months ago
  What were the contradictions?
JCharante 9 months ago
> AI models now require trainers with advanced degrees
Companies that create data for FM (foundational model) companies have been hiring people with degrees for years
> Invisible Tech employs 5,000 specialized trainers globally
Some of those companies have almost a million freelancers on their platforms, so 5k is honestly kinda medium sized.
> It takes smart humans to avoid hallucinations in AI
Many smart humans fail at critical thinking. I've seen people with masters fail at spotting hallucinations in elementary level word problems.
- aleph_minus_one 9 months ago
  > Many smart humans fail at critical thinking. I've seen people with masters fail at spotting hallucinations in elementary level word problems.
  This is like lamenting that a person who has a doctoral degree, say, in mathematics or physics often don't have a more than basic knowledge about, for example, medicine or pharmacy.
  visarga 9 months ago
  > This is like lamenting that a person who has a doctoral degree, say, in mathematics or physics often don't have a more than basic knowledge in, for example, medicine or pharmacy.
  It was word problems not rocket science. That tells a lot about human intelligence. We're much less smart than we imagine, and most of our intelligence is based on book learning, not original discovery. Causal reasoning is based on learning and checking exceptions to rules. Truly novel ideation is actually rare.
  We spent years implementing transformers in a naive way until someone figured out you can do it with much less memory (FlashAttention). That was such a face palm, it was a trivial idea thousands of PhDs missed. And the code is just 3 for loops, with a multiplication, a sum and an exponential. An algorithm that fits on a napkin in its abstract form.
  beepbooptheory 9 months ago
  Doesn't this lead you to, perhaps, question the category and measure of "intelligence" in general, especially how it is mobilized in this kind of context? Like this very angle does a lot to point out the contradictions in some speculative metaphysical category of "intelligence" or "being smart," but then you just seem to accept it in this particular kind of fatalism.
  Why not take away from this that "intelligence" is a word that obtains something relative to a particular society, namely, one which values some kind of behavior and speech over others. "Intelligence" is something important to society, its the individual who negotiates (or not) the way they think and learn with what this particular signifier connects with at a given place and time.
  Like I assume you don't agree, but just perhaps if we use our "intelligence" here we could maybe come to some different conclusions here! Everyone is just dying to be like mid-20th century behaviorist now, I just don't understand!
  visarga 9 months ago
  Yes, I think intelligence is social and we kind of write off the social part and prefer to think in heroic terms, like "Einstein was so smart!"
  I prefer to use the concept of search instead, it is better defined in search space and goal space. It doesn't hide the environment, the external part of intelligence, or the learning process.
  klabb3 9 months ago
  > And the code is just 3 for loops, with a multiplication, a sum and an exponential.
  All invented/discovered and formalized by humans. That we found so much (unexpected) power in such simple abstractions is not a failure but a testament to the absolute ingenuity of human pursuit of knowledge.
  The mistake is we’re over-estimating isolated discoveries and underestimating their second order effects.
  visarga 9 months ago
  > a testament to the absolute ingenuity of human pursuit of knowledge
  I think it is more like searching and stumbling onto some great idea than pure-brain-ingenuity. That is why searching and social collaboration is essential and why I say we're not that smart individually, but we search together. It's slow, it took us years to get to Flash version of attention, but we get there, someone finds their way onto a major discovery eventually.
  It took humanity 200K years to accumulate our current level of understanding, and if we lost it, it would take us another 200k years. Not even a whole human generation is that smart. It's also why I don't fault LLMs for mass-learning from human text. We do the same thing, 99% is inherited knowledge. The whole process of knowledge discovery moves slowly, and over large populations.
  sdenton4 9 months ago
  Of course, all of this pales next to the invention of the actual human brain, which was not driven by humans or AI.
  bapr 9 months ago
  It’s a failure in that for decades we thought we had to circumlocute theoretically about all kinds of made up things for consciousness to exist rather than just leverage a bit of looping evolution like the universe did.
  jokethrowaway 9 months ago
  I've also been impressed by how simple transformers, mamba and co are.
  Maybe we get an llm to design the next evolution and benchmark how potential improvements fare?
  JCharante 9 months ago
  It depends on your definition of smart. I think that holding a degree != smart.
  aniviacat 9 months ago
  > I think that holding a degree != smart.
  Does this mean that these two sentences are completely unrelated and only happen to exist within the same paragraph?
  > Many smart humans fail at critical thinking. I've seen people with masters [...]
  I've understood you to use "people with masters" as an example of "smart humans".
  JCharante 9 months ago
  Well the article began by talking about how before these data training companies would just hire generalists for $2/hr, but now they're hiring degree holders. And it mentions that smart people will be necessary. I'm just saying that degree holding != smart and it's a trap that those data training companies have to avoid.
  aleph_minus_one 9 months ago
  > It depends on your definition of smart. I think that holding a degree != smart.
  You wrote:
  > I've seen people with masters fail at spotting hallucinations in elementary level word problems.
  I wanted to express that having a master in some (even complicated) subject does not make you a master at [pun intended] spotting hallucinations. To give evidence for this statement, I gave a different, more down-to-earth example of a similar situation.
  JCharante 9 months ago
  It was a math problem, like this.
  Q: A farmer has 72 chickens. He sells 15 chickens at the market and buys 8 new chicks. Later that week, a fox sneaks into the coop and eats 6 chickens. How many chickens could the farmer sell at the market tomorrow?
  AI Answer: The farmer started with 72 chickens. After selling 15, he had 57 chickens left. Then he bought 8 new chicks, bringing the total to 65. Finally, the fox ate 6 chickens, so we subtract 6 from 65. This gives us 59 chickens. Therefore, the farmer now has 59 chickens that he could sell at the market tomorrow.
  --
  You'd expect someone who can read/understand proofs to be able to spot a a flow in the logic that it takes longer than 1 week for chicks to turn into chickens.
  aleph_minus_one 9 months ago
  > You'd expect someone who can read/understand proofs to be able to spot a a flow in the logic that it takes longer than 1 week for chicks to turn into chickens.
  Rather, I'd assume that someone who is capable of spotting the flow in the logic has a decent knowledge of the English language (in this case referring to the difference in meaning between "chick" and "chicken").
  Many people who are good mathematicians (i.e. capable of "reading/understanding proofs" as you expressed it) are not native English speakers or have a great L2 level of English.
  Viliam1234 9 months ago
  But I was told that humans have this thing called "general intelligence", which means they should be capable to do both math and English!
  If an AI made a similar mistake, people would laugh at it.
  aleph_minus_one 9 months ago
  > But I was told that humans have this thing called "general intelligence", which means they should be capable to do both math and English!
  You confuse "intelligence" with "knowledge". To keep to your example: there exist quite a lot of highly intelligent people on earth who don't or barely know English.
  david-gpu 9 months ago
  Some native English speakers might still question that statement subconsciously, so let me make it clearer for them: there are many highly intelligent people in the world who don't speak the Rarámuri language.
  JCharante 9 months ago
  That’s also true and a HUGE problem when companies are hiring non native English speakers to work on English datasets.
  echoangle 9 months ago
  As a layman, i have no clue at what point a chick turns into a chicken. I also think this isn’t even answerable, because „new chick“ doesn’t really imply „newborn“ but only means „new to the farmer“, so the chicks could be at an age where they would be chickens a week later, no?
  Kerb_ 9 months ago
  I still call my 12 year old cat a "kitty". If someone marked my answer as incorrect because "chicks aren't chickens yet" I would think they're wasting their time with riddles instead of actual intelligence testing. Besides, if the chicks were sellable to the farmer, why the hell wouldn't the farmer be able to sell them?
  echoangle 9 months ago
  Now I have to think of this Reddit thread that made me react pretty similarly: https://www.reddit.com/r/ChatGPT/s/jWlSqhJsOH
  The OP there also has a pretty bad riddle (due to a grammatical error that completely changes the meaning and makes the intended solution nonsensical, and a solution that many people wouldn’t even have heard of).
  Kerb_ 9 months ago
  Exactly! I read that riddle and thought "a couple islands over the international date line" solely because of the last line, but still had no idea what the name of these islands thousands of miles away from me were named. Might as well make the riddle who their little brother is, and make the answer "Fairway Rock", if niche knowledge is your goal. Which, completely to GPT-o1's credit, it did solve in a single prompt when I asked!
  https://chatgpt.com/share/66f9371a-d6a0-8003-b2b5-4af3b10e8a...
  resoluteteeth 9 months ago
  > Besides, if the chicks were sellable to the farmer, why the hell wouldn't the farmer be able to sell them?
  I think maybe the original poster is making some sort of additional assumption that the farmer must be selling chickens as meat at the market and a chick wouldn't be sold for that purpose until it's a mature chicken?
  (Of course depending on how you interpret the question a chick is a chicken (species) and there's nothing inherently preventing reselling the chicks so I don't really understand why OP thinks the ai answer is clearly objectively wrong. It seems more like a matter of interpretation.)
  JCharante 9 months ago
  After posting I realized that the farmer bought some chicks so it could be interpreted that way. I should have modified it to say that 6 chickens hatched.
  Anyways this thread is a perfect example of the chaotic datasets that are being used to train FMs. These arguments of whether it’s reasonable to assume a chick could mature into a chicken within a week are happening everyday and have been taking place for years. Safe to say a billion dollars has been spent on datasets to train FMs where everybody has a different interpretation and the datasets are not aligned.
  jonahx 9 months ago
  When an educated person misses this question, it's not because the temporal logic is out of their reach. It's because they scanned the problem and answered quickly. They're pattern matching to a type of problem that wouldn't include the "tomorrow/next week" trick, and then giving the correct answer to that.
  Imo it's evidence that humans make assumptions and aren't always thorough more than evidence of smart people being unable to perform elementary logic.
  JCharante 9 months ago
  The humans were prompted to read the AI responses very carefully because their hallucinations are very good at convincing you with words. It takes a certain skillset to question every word that comes out of a language model because most people will go “hmm yeah that logic seems right”. So hiring “smart” people is insufficient, you need very paranoid people who question every assumption.
  randomdata 9 months ago
  Implying there isn’t a market for chickens that are chicks? Clearly there is. The question literally states that the farmer bought chicks, so logically they could go back on the market. They don’t need to be older.
  The AI got it right.
  IanCal 9 months ago
  O1-preview I think gets this right. It assumes a distinction between adult chickens and chicks.
  https://chatgpt.com/share/66f890b2-04bc-8002-9724-2deaf3985d...
  randomdata 9 months ago
  It did a better job in explaining that there is ambiguity in the question, but still went ahead with an arbitrary assumption in order to answer it. I think it is fair to say it is right, but so was the other attempt. Each interpretation is quite valid.
  "Most right" would have been to ask questions about what is being asked instead of trying to answer an incomplete question. But rarely is the human even willing to do that as it is bizarrely seen as a show of weakness or something. An LLM is only as good as its training data, unfortunately.
  IanCal 9 months ago
  I agree both got it right, in the sense that it wouldn't be a stupid thing for a human to do. If there's a follow-up from someone, I'm sure the more basic llm would have been able to adjust.
  Regardless I think it's good showing that models are increasingly able to solve these "gotcha" questions, even though I think it's not hugely useful. Partly because I think it's a poor compliant and an easy shutdown.
  JCharante 9 months ago
  Nice. I used Claude 3.5 Sonnet to generate a word problem & false solution for my example.
  fragmede 9 months ago
  Or the other one, a flock of 8 birds are sitting on a fence. The farmer shoots 1. How many are left? 8-1 is 7, but the answer is zero, because the gun shot scared the rest of them off. Fwiw, ChatGPT says zero.
  At some point, we decided that compilers were good enough to convert code into assembly to just use them. even if an absolute master could write better assembly than the complier, we moved over to using compilers because of the advantages offered.
  JCharante 9 months ago
  Sorry I fail to see your point. Is it that conflating chicks and chickens is good enough performance?
  fragmede 9 months ago
  The question is for what. For the level of interaction that many day-to-day tasks require, ChatGPT meets that. When you’re going grocery shopping, how often do you get stopped at the door by a security guard who won’t let you pass unless you answer their riddle? The A in AI stands for artificial, so it’s going to look different than human intelligence but we’re at a point where I can throw some words at the computer and it will generate an essay for me relevant to the words I threw at it. It may not get every little detail right, but I’m amazed by that because I’ve had meaningful interactions with humans via text whom wouldn’t have caught the chicks versus chickens gotcha.
  Is ChatGPT an all knowing and infallible oracle? Clearly not. But holding it to a higher standard than we hold other humans to is a unfair test of its abilities.
  mkl 9 months ago
  Chicks are chickens, just like puppies are dogs, kittens are cats, etc. "Chicken" is the name of the species: https://en.wikipedia.org/wiki/Chicken#Nomenclature
  If you'd said "hens" you'd have a stronger point, but then you'd need to be talking about chicks and hens (and they could still cross whatever adulthood threshold you like within the week, as you didn't specify how young they are - "new" could just mean new to the farmer).
  spencerchubb 9 months ago
  Do you believe that holding a degree is dumb, or just that holding a degree is an insufficient condition for smartness? Technically what you wrote says the former
  JCharante 9 months ago
  Thanks. I meant that holding a degree does not guarantee that you are smart.
  bapr 9 months ago
  Yeah it’s a virtue signal that one has written some language that doesn’t focus on first person anecdote. It’s a sign someone is a hard drive of prior knowledge.
  We leaned on spoken tradition education to pass down knowledge as written literacy, paper, writing tools were hard to come by until the last century. It was never about the student but the future. Still the same today; one student isn’t propping up reality.
  People think learning a linguistic style means discovery of net new knowledge.
  dilawar 9 months ago
  I think many people like to believe that solving puzzles will somehow make them better at combinatorics. Lateral skill transfer in non-motor skills e.g. office works, academics works etc may not be any better than motor skills. It's easier to convince people that playing soccer everyday wouldn't make them any better at cricket, or even hockey.
  sudosysgen 9 months ago
  But motor skills transfer extremely well. It's not uncommon for professional athletes to switch sports, some even repeatedly.
  Der_Einzige 9 months ago
  There’s some famous ass basketball players with mediocre but still existent MLB careers.
  jononor 9 months ago
  Wealth, network and fame transfers incredibly well between fields. Possibly better than anything else. It should be accounted for when reasoning about success in disparate fields. In addition to luck, of course.
  mmooss 9 months ago
  > mediocre but still existent MLB careers.
  If you have a MLB career at all, you are an elite baseball player.
  thatcat 9 months ago
  Kobe Bryant played soccer, Michael Jordan played baseball, Lebron played football.. it actually makes you even better because you learn non traditional strategies to apply to the other sport you're playing.
- 39896880 9 months ago
  All the models do is hallucinate. They just sometimes hallucinate the truth.
  vharuck 9 months ago
  Nice George Box paraphrasing.
  therealdrag0 9 months ago
  A great deal of my own thinking could be described as hallucinating, given a sufficiently loose definition.
- undefined 9 months ago
  [deleted]
Stem0037 9 months ago
AI, at least in its current form, is not so much replacing human expertise as it is augmenting and redistributing it.
- alephnerd 9 months ago
  Yep. And that's the real value add that is happening right now.
  HN concentrates on the hype but ignores the massive growth in startups that are applying commoditized foundational models to specific domains and applications.
  Early Stage investments are made with a 5-7 year timeline in mind (either for later stage funding if successful or acquisition if less successful).
  People also seem to ignore the fact that foundational models are on the verge of being commoditized over the next 5-7 years, which decreases the overall power of foundational ML companies, as applications become the key differentiator, and domain experience is hard to build (look at how it took Google 15 years to finally get on track in the cloud computing world)
  MostlyStable 9 months ago
  I notice that a lot of people seem to only focus on the things that AI can't do or the cases where it breaks, and seem unwilling or incapable of focusing on things it can do.
  The reality is that both things are important. It is necessary to know the limitations of AI (and keep up with them as they change), to avoid getting yourself in trouble, but if you ignore the things that AI can do (which are many, and constantly increasing), you are leaving a ton of value on the table.
  aleph_minus_one 9 months ago
  > I notice that a lot of people seem to only focus on the things that AI can't do or the cases where it breaks, and seem unwilling or incapable of focusing on things it can do.
  I might be one of these people, but in my opinion, one should not concentrate on things that it can do, but for how many of the things where an AI might be of help for you,
  - it does work
  - it only "can" do it in a very broken way
  - it can't do that
  At least for the things that I am interested in an AI doing for me, the record is rather bad.
  signatoremo 9 months ago
  Just because AI doesn’t work for you, doesn’t mean it doesn’t work for other people. Ozempic may have no effect, or even harmful to you, but it’s a godsend for many others. Acknowledge that, instead of blindly insisting on your use cases. It’s fine to resist the hype, but it’s foolish to be willfully ignorant.
  vladms 9 months ago
  How do you define "can do" ? Would answering correctly 9 out of 10 questions correctly for a type of question (like give directions knowing a map) mean it "can do" or that it "can't do" ?
  Considering it works for so many cases, I think it is naturally to point out the examples where it does not work - to better understand the limit.
  Not to mention that practically, I did not see anything proving that it will always "be able" to do something . Yes, it works most of the times for many things, but it's important to remember it can (randomly?) fail and we don't seem to be able to fix that (humans do that too, but having computers fail randomly is something new). Other software lets say a numerical solver or a compiler, are more stable and predictable (and if they don't work there is a clear bug-fix that can be implemented).
  alephnerd 9 months ago
  Yep! Nuance is critical, and sadly it feels like nuance is dying on HN.
  Tepix 9 months ago
  This very discussion feels nuanced so i don't share your sentiment.
  skybrian 9 months ago
  It would be nice to have more examples. Without specifics, “massive growth in startups” isn’t easily distinguishable from hype.
  A trend towards domain-specific tools makes sense, though.
  alephnerd 9 months ago
  DevTools/Configuration Management and Automated SOC are two fairly significant example.
  jayd16 9 months ago
  Am I the only one unimpressed by the dev tool situation? Debugging and verifying the generated code is more work than simply writing it.
  I'm much more impressed with the advances in computer vision and image generation.
  Either way, what are the startups that I should be looking at?
  Terr_ 9 months ago
  And even when the output is perfect, it may be that the tool is helping you write the same thing a hundred times instead of abstracting it into a better library or helper function.
  Search/Replace as a service.
  skybrian 9 months ago
  Those are more like broad categories than examples of startups, though.
  danielbln 9 months ago
  Same with consultancy. There is a huge amount of automation that can be done with current gen LLMs, as long as you keep their shortcomings in mind. The "stochastic parrot" crowd seems an over correction to the hype bros.
  alephnerd 9 months ago
  It's because the kind of person who understands nuance isn't the kind of person to post in HN flame wars.
  The industry is still in it's infancy right now, and stuff can change in 3-5 years.
  Heck, 5 years ago models like GPT-4o were considered unrealistic in scale, and funding in the AI/ML space was drying up at the expense of crypto and cybersecurity. Yet look at the industry today.
  We're still very early and there are a lot of opportunities that are going to be discovered or are in the process of being discovered.
  parineum 9 months ago
  GPT4o is unrealistic at scale. OpenAI isn't making a profit running it.
  Workaccount2 9 months ago
  ...and then being blown up when the AI company integrates their idea.
  alephnerd 9 months ago
  Not exactly.
  At least in the cybersecurity space, most startups have 3-5 year plans to build their own foundational models and/or work with foundational model companies to not directly compete with each other.
  Furthermore, GTM is relationship and solution, and an "everything" company has a difficult time sympathizing or understanding GTM on a sector to sector basis.
  Instead, the foundational ML companies like OpenAI have worked to instead give seed/pre-seed funding to startups applying foundational MLs per domain.
  HeatrayEnjoyer 9 months ago
  OpenAI/Microsoft are building a $100B+ datacenter for foundation models and pitching ideas for $1T+. Compute is the primary bottleneck, startup competitors will not be physically possible.
- hanniabu 9 months ago
  Yes, it should really be called collective intelligence not artificial intelligence
recursive 9 months ago
It kind of seems like it got dumber to me. Maybe because my first exposure to it was so magical. But now, I just notice all the ways it's wrong.
- joe_the_user 9 months ago
  I think any given model is going to decay over time. The data used in them becomes out-dated and the models cost money to run and various cost-saving short-cuts are thus made to reduce accuracy. Also, having your old model seem clunky can make your new model seem great.
  Obviously, there are real ways new model get better too. But if we have diminishing returns, as many speculate, it will take a while for it to be entirely obvious.
theptip 9 months ago
I feel this is one of the major ways that most pundits failed with their “the data is going to run out” predictions.
First and foremost a chatbot generates plenty of new data (plus feedback!), but you can also commission new high-quality content.
Karpathy recently commented that GPT-3 needs so many parameters because most of the training set is garbage, and that he expects eventually a GPT-2 sized model could reach GPT-3 level, if trained exclusively on high-quality textbooks.
This is one of the ways you get textbooks to push the frontier capabilities.
- llm_trw 9 months ago
  I've not done pre-training for LLMs, but years ago I generated a completely synthetic dataset for table recognition using an off the shelf document segmentation model, raw TeX, a random table generator, a discriminator and an evolutionary algorithm to generate different styles of tables.
  The project got killed due to management but I still got results on that dataset better than state of the art in 2023 with no human annotation.
  The Venn diagram of people who know TeX well enough to write a modular script for table generation with metadata and people who know how to train LLMs has an intersection of a dozen people I imagine.
  theptip 9 months ago
  This is a great example. Areas where an expert can write a synthetic data generator (eg code, physics simulators, etc) are the dream scenario here.
  It seems to me there is a huge amount of unharvested low-hanging fruit here, for example IIUC GPT is not trained on synthetic code in languages other than Python (and maybe JS, I don’t recall).
- from-nibly 9 months ago
  At a good cost though? Last time I checked generating good data costs a tiny bit more than an http request to somebody elses website.
  theptip 9 months ago
  If the cost is not “good enough”, why are the big guys buying a lot of it?
CamperBob2 9 months ago
Which is fine. If all AI does is represent human knowledge in a way that makes it explainable and transformable rather than merely searchable, then the hype is justified... along with Google's howling, terrified panic.
The role played by humans on the training side is of little interest when considering the technology from a user's perspective.
- jumping_frog 9 months ago
  The problem is my back and forth with Claude is just Claude's data not available to any other. Unlike stack overflow which is fair game for every AI.
- iwontberude 9 months ago
  I think the most interesting aspect of it is the human training. Human blindsides, dogma, ignorance, etc. All on demand and faster than you can validate its accuracy or utility. This is good.
  CamperBob2 9 months ago
  Shrug... I don't know what anyone expected, once humans got involved. Like all of us (and all of our tools), AI is vulnerable to human flaws.
  ddulaney 9 months ago
  I think that’s really important to reinforce! You probably know better, but lots of the less technical people I talk to don’t think that way. It’s not at all obvious to an observer who doesn’t know how this stuff works that a computer could be racist or misogynist.
  CamperBob2 9 months ago
  Yeah, I do think that's going to be a problem.
  Years ago, my GF asked me why we bother with judges and juries, given all the uneven sentencing practices and other issues with the current legal system. "Why can't the courts run on computers?" This was back in the pre-Alpha Go era, so when I answered her, I focused on technical reasons why Computers Can't Do That... reasons that are all basically obsolete now, or soon will be.
  The real answer lies in the original premise of her question: because Humans Also Can't Do That with the degree of accuracy and accountability that she was asking for. Our laws simply aren't compatible with perfect mechanized jurisprudence and enforcement. Code may be law, but law isn't code.
  That problem exists in a lot of areas where people will be looking to AI to save us from our own faults. Again, this has little to do with how training is conducted, or how humans participate in it. Just getting the racism and misogyny out of the training data isn't going to be enough.
  Terr_ 9 months ago
  Also: It's not just about what task can/can't can't be done, but what other frameworks you/can't build around the executor to detect errors and handle exceptional cases.
yawnxyz 9 months ago
"raw dogging" non-RLHF'd language models (and getting good and unique output) is going to be a rare and sought-after skill soon. It's going to be a new art form
someone should write a story about that!
- zmgsabst 9 months ago
  I’m personally waiting on AI psychology to take off.
  Eg, why does ChatGPT like the concept of harmony so much and use it as a principle for its political analysis?
  yawnxyz 9 months ago
  I thought it's b/c of RLHF?
  I think the earliest GPT-3 wasn't too keen on harmony, but I might be mis-remembering
SamGyamfi 9 months ago
There is a cost-quality tradeoff companies are willing to make for AI model training using synthetic data. It shows up fairly often with AI research labs and their papers. There are also upcoming tools that remove the noise that would trip up some advanced models during annotation. Knowing this, I don't think the "human-labeled data is better" argument will last that long.
GaggiX 9 months ago
Let's not ignore better architectures, training techniques and computing power.
- ben_w 9 months ago
  It's both. I recently saw a comparison of various models on two IQ tests, one of which was public and the other of which was carefully curated to be not directly learnable from the likely training sets.
  On public tests, LLMs vary between "just below average human" and "genius".
  On the hopefully-private test (it's difficult to be sure*), the best was o1, which was "merely" just below an average human, Claude-3 Opus which was stupid, and all the rest were "would need a full time caretaker".
  In both cases, the improvements to the models came with higher scores; but there's still a lot you can do by learning for the test — and one thing that LLMs are definitely superhuman at is that.
  https://www.maximumtruth.org/p/massive-breakthrough-in-ai-in...
  * I could have said the same last year about last year's models, so I'm emphatically not saying o1 really is as smart as this test claims; I'm only saying this demonstrates these IQ tests are a learnable skill up to at least this magnitude of difference.
- kaycebasques 9 months ago
  Suppose you are competing to create the "best" frontier model and have finite R&D budget to allocate into the 4 buckets (plus a catchall in case we're missing something):
  * Data
  * Architecture
  * Training techniques
  * Compute
  * Other
  What allocation gives you the best chance for success? And how are you defining "best"?
  NitpickLawyer 9 months ago
  Right now I'd prioritise compute over anything, because it allows for more experiments, and some of those experiments might turn out to be the key to better models (either specific applications or overall generalist models).
  Meta did this with L3. They used L2 to pre-filter the training data, filtering out a lot of junk. They also used it to classify datasets. Then after pre-training (involving lots of compute) they also used almost exclusively synthetic data for fine-tuning (forgoing RLHF when it was surpassed). So yet more compute. The results are pretty good, L3, 3.1 and 3.2 are pretty high up there in terms of open access SotA.
  oAI did this with their o1 models. They used lots of compute to have the models go over the space of generating tokens, analysing, correcting, and so on. They RLd the "reasoning traces" in a way. Lots of compute. The results seem to be pretty good, with impressive showings on "reasoning" tasks, math, code, and so on.
  The thing is, they weren't the first ones to propose these techniques! What differentiates them is the available compute.
  WizardML tried and were really successful with their RLAIF implementation (tho never released code afaik) about a year ago. And while they were connected to MS research, they probably didn't have as much compute available as Meta. But the WizardML fine-tunes on open models like Mistral and Mixtral were pretty much SotA when released, scoring way higher than the creator's own fine-tunes.
  In the same vein, but at lower scales is the team behind DeepSeek. They used RL on math problems, in their DeepSeekMath-7bRL model, and that model was SotA at the time of release as well. It took a team of multiple really talented folks to fine-tune a better model (in the AIMO kaggle competition) and everyone except the 1st place used the RL model. The 1st place used the base model, with different fine-tuning. So again, the methods were tried, just at much lower scales.
  Yeah, I think compute would be my bet in the short run.
- JCharante 9 months ago
  using human feedback for reinforcement learning is a training technique
jwrallie 9 months ago
I wonder how much prompt skill is actually influencing the quality of the response.
After using LLMs daily for some time, I have developed a feeling on how to phrase my requests as to get better quality answers.
For example, ensure that it can process the information linearly, like asking to classify items in a list and adding the label directly after the item so that the order remains, instead of allowing it to create multiple lists as the output (which it tends to do by default).
So, at least for me, the prompts are getting smarter.
- layer8 9 months ago
  Smarter, but also more tedious. It would be great to have technology that automates this tedious work. ;)
- machiaweliczny 9 months ago
  Yeah, I feel like you get much better answers when you pull enough context with leading context (but without insinuating answers).
  I wonder if someone tried something like this: "Given user question generate prompt that tells LLM that he is a master in this subject and then ask the question" - something in this allows LLM to precompute what the focus of question will be so it can answer much better and it should be automated for one-off questions.
fsndz 9 months ago
> "In the early years, getting AI models like ChatGPT or its rival Cohere to spit out human-like responses required vast teams of low-cost workers helping models distinguish basic facts such as if an image was of a car or a carrot."
Starting the article by comparing what is necessary for ChatGPT to work and image labelling is a bit weird
tiku 9 months ago
I was watching the video of someone using openai voice chat, asking for a joke. Then I've tried it myself, asking for a joke and I got the exact same one. (Why don't skeletons fight eachother?).
Seems like if then else haha.
butz 9 months ago
I bet that every "top" performing GenAI has dozens upon dozens of "if" statements to make them seem smart.
bdjsiqoocwk 9 months ago
Submarine article placed by Cohere. Wtf is cohere.
wlindley 9 months ago
a/k/a It is all a clever scam. True, or true?
ysofunny 9 months ago
I feel weird being stubborn against free tier google gemini
I feel as though it 'extracts' some sort of "smartness" out of me (if any) and then whatever intelligence from me becomes part of google gemini
this is why I would never want to pay for using these tools, anything good that comes from me in the chat becomes google's by AI training, which is ok so long as it's free to use
i.e. I won't pay to make their stuff better through my own work
- simonw 9 months ago
  Several LLM providers have solid promises that they won't train on your inputs to them. OpenAI have this if you are using their paid API (though frustratingly not for their paid ChatGPT users, at least to my knowledge), and Anthropic have that for input to their free apps as well: https://support.anthropic.com/en/articles/7996885-how-do-you...
  I was hoping I could say the same for Gemini, but unfortunately their policy at https://support.google.com/gemini/answer/13594961?visit_id=6... says "Google uses this data, consistent with our Privacy Policy, to provide, improve, and develop Google products and services and machine-learning technologies"
  My intuition is that Google don't directly train on user conversations (because user conversations are full of both junk and sensitive information that no model would want to train on), but I can't state that with any credibility.
  fhdsgbbcaA 9 months ago
  I’m sure there’s absolutely zero chance that Sam Altman would lie about that, especially now that he’s gutted all oversight and senior-level opposition.
  light_hue_1 9 months ago
  Ah yes. Solid promises you can never verify. That companies would benefit massively from violating.
  That's worth literally nothing.
  Workaccount2 9 months ago
  I know this sounds heretical, but companies generally do not go against what they say they are doing. They might use clever language or do slimey things, but it's very rare that they will say "We do not do xyz" while they are in fact doing xyz. Especially for big companies.
  Reputation has far more value than whatever they gain by lying. Besides, they can just say "We do xyz" because <1% of users read the TOS and less than <0.1% care enough to not use the service.
  blooalien 9 months ago
  > Google: "Don't Be Evil" is our motto!
  > Also Google: "Let's do all the evil things..." ~ heavily "paraphrased" ;)
  My "tongue-in-cheek point" is that it seems like corporations beyond a certain point of "filthy-richness" just do as they please, and say what they please, and mostly neither thing has to agree with the other, nor does either one need affect their profits "bottom line" all that seriously much. Most of your typical "mega-corps" are really only able to be affected much by the laws and legal system, which they've been increasingly "capturing" in various ways so that happens very rarely anymore these days, and when it does it's most often a "slap on the wrist" and "don't do that!" sorta thing, followed by more business-as-usual.
  You know the old worry about the "paperclip production maximizer AI" eating everything to create paperclips? That's kinda where we're pretty-much already at with mega-corps. They're so utterly laser-focused on maximizing to extract every last dime of profit out of everything that they're gonna end up literally consuming all matter in the universe if they don't just destroy us all in the process of trying to get there.
  ahazred8ta 9 months ago
  It looks like you're trying to maximize paperclips. Would you like help?
  https://www.decisionproblem.com/paperclips/ #-the-game
  Workaccount2 9 months ago
  I mean from a non-subjective legal TOS perspective.
  I'm not arguing that the grocery store saying "fresh produce" guarantees that the produce is fresh. Fresh, like evil, is subjective.
  I'm saying that if the grocery puts "All our produce is no older than 10 days" you can be pretty sure they adhere to that and train employees to follow it. "10 days" is not subjective.
  pton_xd 9 months ago
  This is supremely naive, in my opinion.
  Big companies not only lie, some of them do so routinely, including breaking the law. Look at the banking industry: Wells Fargo fraudulent / fake account scandal, JPMorgan Chase UST and precious metals future fraud. Standard Charter bank caught money laundering for Iran, twice. Deutsche Bank caught laundering for Russia, twice. UBS laundering and tax evasion. Credit Suisse caught laundering for Iran. And so on.
  Really it comes down to what a company believes it can get away with, and what the consequences will be. If there are minimal consequences they'd be dumb not to try.
  Oh I just remembered a funny one: remember when it came out that Uber employees were using "God view" to spy on ex-partners, etc? For years. Yeah I'm pretty sure the TOS didn't have a section "Our employees may, from time to time, spy on you at their discretion." Actually the opposite, Uber explicitly said they couldn't access ride information for its users.
  startupsfail 9 months ago
  The company can certainly make a calculated risk of going against their TOS and their promise to the customers at the cost of potential risk of their reputation.
  Note that such reputation risks are external and internal. The reputation reflects on the executive team and there is a risk that the executive team members may leave or attempt to get the unscrupulous employee fired.
  choilive 9 months ago
  It would also destroy these companies if they were ever caught lying.
  atq2119 9 months ago
  That seems awfully optimistic, given what Sam Altman is getting away with transforming the governing structure of OpenAI.
  jart 9 months ago
  Not if the government required them to do it.
  immibis 9 months ago
  OpenAI also promised to remain open forever.
- buzzerbetrayed 9 months ago
  I totally sympathize with the sentiment. But how long until people who are taking a moral stand against AI are simply obsoleted by the people who don’t? Today it’s easy to code effectively without relying on AI. But in 10 years will you simply be too slow? Same argument can be made with nearly any industry.
  croes 9 months ago
  That's same logic as for frameworks like react.
  With react you are more productive, my web experience is worse than without a those frameworks.
  And LLMs get worse if they are trained on AI generated text. At the current speed I don't know if in 10 years AI is still worse the high costs.
  joshstrange 9 months ago
  > With react you are more productive, my web experience is worse than without a those frameworks.
  You cannot begin to know that for sure and really makes little to no sense if you think about it.
  As with the anti-electron crowd the options are not:
  * Electron app
  or
  * Bespoke, hand-crafted, made with love, native app
  The options are normally “electron app” or “nothing”.
  Same deal here. Taking away React/Angular/Vue won’t magically make people write more performant websites. I’m sure people bitched about (and continue to) PHP for making it easy for people to create websites that aren’t performant or Wordpress for all its flaws. It’s the same story that’s repeated over and over in tech circles and I find it both silly and incredibly short-sighted. Actually I find it tiring because you can always go one level deeper to one-up these absurd statements. It’s No True Scotsman all the way down.
  emptiestplace 9 months ago
  I feel like I (probably?) agree with what you are saying, but this is a very confusing comment. You started out with an epistemological argument, and then jumped into an analogy that's so close to what is being discussed that on first read I thought you were just confused. I'm not sure anyone can continue the discussion in a meaningful way from what you've written because so many aspects of your comment are ambiguous or contradictory.
  croes 9 months ago
  I mean retrospectively.
  In the time before all those framework like react the UX was better for me than now.
  Less flashy, animated but faster.
  smileson2 9 months ago
  I hate this analogy, even things from the rad days like vb were better than electron
  pixl97 9 months ago
  Pretty much like the people that don't care about privacy. You still get captured and tagged in their information and uploaded to the web. As an individual it's difficult to do much about it.
- undefined 9 months ago
  [deleted]
- JCharante 9 months ago
  tbh your data would be too unstructured, it's not really being used to train unless you flag it deliberately with a feedback mechanism