• voidUpdate an hour ago

    Wouldn't "Serverless OCR" mean something like running tesseract locally on your computer, rather than creating an AI framework and running it on a server?

    • cachius an hour ago

      Serverless means spinning compute resources up on demand in the cloud vs. running a server permanently.
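      A minimal sketch of that distinction (a pure-Python stand-in for a serverless platform; the `handler(event, context)` signature mimics the AWS Lambda convention, and nothing here touches a real cloud): you write only the function, and the platform provisions compute on demand and invokes it per request, scaling from zero.

```python
# Stand-in for a serverless function: you write only the handler;
# the platform spins up compute on demand and calls it per event.
def handler(event, context=None):
    """Entry point the platform invokes for each incoming request."""
    name = event.get("name", "world")
    return {"statusCode": 200, "body": f"Hello, {name}!"}

# Simulating a single on-demand invocation locally:
result = handler({"name": "OCR"})
print(result["body"])
```

      No process sits idle between invocations, which is the contrast with a permanently running server.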

      • dsr_ an hour ago

        ~99.995% of the computing resources used on this are from somebody else's servers, running the LLM model.

      • normie3000 an hour ago

        Thanks for noting this - for a moment I was excited.

        • mrweasel 24 minutes ago

          When people mention the number of lines of code, I've started to become suspicious. More often than not it's X lines of code calling a massive library that loads a large model, either locally or remotely. We're just waiting for "spin up your entire company infrastructure in two lines of code", only to be presented with a Terraform shell script wrapper.

          I do agree with the use of serverless though. I feel like we agreed long ago that serverless just means you're not spinning up a physical or virtual server, but simply asking some cloud infrastructure to run your code, without having to care about how it's run.

      • kbyatnal 38 minutes ago

        Deepseek OCR is no longer state of the art. There are much better open source OCR models available now.

        ocrarena.ai maintains a leaderboard, and a number of other open source options like dots [1] or olmOCR [2] rank higher.

        [1] https://www.ocrarena.ai/compare/dots-ocr/deepseek-ocr

        [2] https://www.ocrarena.ai/compare/olmocr-2/deepseek-ocr

        • ckrapu a minute ago

          I wasn't aware of dots when I wrote the blog post. This is really good to know!! I would like to try again with some newer models.

          • tclancy 9 minutes ago

            The article mentions choosing the model for its ability to parse math well.

          • coolness an hour ago

            Slight tangent: I was wondering why DeepSeek would develop something like this. The linked paper says

            > In production, DeepSeek-OCR can generate training data for LLMs/VLMs at a scale of 200k+ pages per day (a single A100-40G).

            That... doesn't sound legal

            • apwheele an hour ago

              Question for the crowd -- with autoscaling, when a new pod is created it will still download the model from Hugging Face, right?

              I like to push everything into the image as much as I can. So in the image build I run a command to trigger downloading the model, then in the app just point to the locally downloaded model. Bigger image, but no need to redownload on startup.
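              A sketch of that pattern (assuming the `huggingface_hub` library; `MODEL_DIR` is a placeholder path and the DeepSeek-OCR repo id is used as an example): run the download at image-build time, then have the app prefer the baked-in copy at startup.

```python
import os

# Placeholder path baked into the image at build time,
# e.g. by running download_model() from the image build.
MODEL_DIR = "/models/deepseek-ocr"

def download_model(repo_id: str = "deepseek-ai/DeepSeek-OCR") -> str:
    """Image-build step: fetch the model weights into MODEL_DIR."""
    from huggingface_hub import snapshot_download  # assumed dependency
    return snapshot_download(repo_id=repo_id, local_dir=MODEL_DIR)

def resolve_model_path(repo_id: str = "deepseek-ai/DeepSeek-OCR") -> str:
    """App startup: use the baked-in copy if present, else download."""
    if os.path.isdir(MODEL_DIR) and os.listdir(MODEL_DIR):
        return MODEL_DIR  # already in the image; no network round-trip
    return download_model(repo_id)
```

              With this, a freshly autoscaled pod whose image already contains `MODEL_DIR` skips the Hugging Face download entirely.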

              • ddtaylor an hour ago

                How does this compare to Tesseract?