• jmull 24 minutes ago

    I don't understand "The Results" graph.

    The x-axis has integers, 0, 1, 2, 3, 4, 5, 6, but the text talks about models struggling at the 30 character mark? On the graph they all start getting bad around 3, depending on what you mean by bad. Is the x-axis tens of characters??

    Anyway...

    > anything longer than 20 characters would tend to have more issues, we flagged those for manual review.

    Even though the failure rate was smaller, is it OK if several of the shorter equations are wrong? Maybe they should have manually reviewed all of them.

    Edit: Now I see someone else brought up the x-axis issue. There's a response that seems to say the x-axis is buckets of 10 characters. I guess the update hasn't gone through yet.

    • gostsamo 4 hours ago

      Funnily enough, the images in the article do not actually have useful alt text and, like every image on Substack I've encountered so far, have no useful captions either.

      • bearjaws 4 hours ago

        How is the alt-text not useful? I even went through the effort of putting the data in the alt text for the bar chart. I tend to think of alt text as providing the same context as the image. For example, the line chart is meant to convey how 1.5-flash outperforms 4o, but I am not going to embed each discrete data point in the alt text.

        • SalmonSnarker 2 hours ago

          3 out of 5 images on the post have empty alt text (alt=""). Most Substacks are pretty careless about alt text, so the previous poster is just noting that your accessibility post follows this trend. (It's worth noting the post you made previous to this has 0 out of 4 images with alt text.)

          • gostsamo 4 hours ago

            Checking the later pictures that you mention, the alt text is indeed there. My recommendation, though, would be to give a summary of the data and not the conclusion, e.g. "Gemini flash has an error rate of x% while the others are at y% and z%."

            • gostsamo 4 hours ago

              Maybe something is lost in translation, but here is what my screen reader makes of the article:

              Along the way we realized some of our math courses had not been updated in quite some time, and some schools were still leveraging these courses to teach. Images for equations are bad m’kay

              It was immediately apparent was the use of images to represent equations like this: https%3A%2F%2Fsubstack-post-me… https%3A%2F%2Fsubstack-post-me… This is not great… the font is a bit on the smaller side and the font itself is not very legible, in my non-font expert opinion. Making matters worse, there is no alt-text provided that can explain the equation.

          • bearjaws 4 hours ago

            Funny, Google just released gemini-1.5-flash-8b moments ago, which scores slightly lower on vision. For clarity, this is on the "older" gemini-1.5-flash.

            https://developers.googleblog.com/en/gemini-15-flash-8b-is-n...

            • armoredkitten 4 hours ago

              What is the measurement on the x-axis in the graph?? The text is talking about equations of 20 or 30 characters, but the graph goes up to...6. Six what?? Characters? Terms? If it's characters, why do we only get to see the performance from 1-6, when apparently 7% of equations had more than 20?

              • bearjaws 4 hours ago

                That's a fair point, I bucketed them into lengths of 1-10, 11-20, 21-30. I'll do a quick update.
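
                For anyone following along, the bucketing could look roughly like the sketch below. It only assumes fixed 10-character buckets as described (1-10, 11-20, 21-30); the function names are hypothetical and whether bucket 0 lines up with the first tick on the chart is a guess, not something confirmed by the post.

```python
from collections import Counter

BUCKET_WIDTH = 10  # assumed from the "1-10, 11-20, 21-30" description


def bucket_index(length: int) -> int:
    """Map an equation length to a zero-based bucket: 1-10 -> 0, 11-20 -> 1, ..."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return (length - 1) // BUCKET_WIDTH


def bucket_label(index: int) -> str:
    """Human-readable range for a bucket, e.g. 2 -> '21-30'."""
    low = index * BUCKET_WIDTH + 1
    return f"{low}-{low + BUCKET_WIDTH - 1}"


def bucket_counts(equations):
    """Count how many (non-empty) equation strings fall into each labelled bucket."""
    counts = Counter(bucket_label(bucket_index(len(eq))) for eq in equations)
    return dict(counts)
```

                Labelling the axis with bucket_label(...) rather than the raw index would make it clear that a tick means a range of characters, not a single count.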

              • pumanoir 3 hours ago

                I've had great success converting math pics to LaTeX using qwen2-vl