Comments Page - Designing Predictable LLM-Verifier Systems for Formal Method Guarantee

« Back Designing Predictable LLM-Verifier Systems for Formal Method Guaranteearxiv.orgSubmitted by PaulHoule 7 hours ago

brantmv 2 hours ago
Maybe I'm wrong, but it looks like the authors did not actually have any LLMs write or verify any code for their experiments. Instead, their experiments consist of simulating the simplified Markov chain model itself. They simulated their simple Markov chain and checked if the theorem's predictions matched empirical statistics. This amounts to a test not of their model, but of basic Markov chain theory.
Did I misread or miss something?
mapontosevenths 2 hours ago
This line made me pause:
"We prove that for any non-zero stage success probability, the system reaches the verified state almost surely"
What's the point if its still stochastic?
- jaggederest 2 hours ago
  "almost surely" means "happens with a probability 1", which in infinite set contexts doesn't mean that there aren't other outcomes, but that they have probability 0.
  So like, imagine that you had some finite list of integers, and you were picking a random number from 0 to infinity - because the domain is infinite, any finite set has 0 probability, but that doesn't mean it doesn't exist.
  https://en.wikipedia.org/wiki/Almost_surely
  mapontosevenths 38 minutes ago
  Thank you. That makes this a pretty big deal doesn't it?
  The ability to deterministcly identify that code eventually reaches a halting state, implies that we can use these stochastic tools to generate deterministic outcomes reliably in the future doesn't it?
  jaggederest 12 minutes ago
  Well, reliably but still with a chance of failure - in the same way that you can have a program which is provably correct but can still run into real world issues like being killed, but yes I would say that "almost surely" is a pretty large jump from "more than likely" (50%+1) where I'd say LLM output generally lives these days.
- IanCal 2 hours ago
  Hash collisions are possible but can be provably so rare that they’re not a relevant concern.