Wouldn't "Serverless OCR" mean something like running tesseract locally on your computer, rather than creating an AI framework and running it on a server?
Serverless means spinning compute resources up on demand in the cloud vs. running a server permanently.
~99.995% of the computing resources used on this are from somebody else's servers, running the LLM model.
Thanks for noting this - for a moment I was excited.
When people mention the number of lines of code, I've started to become suspicious. More often than not it's X lines of code calling a massive library that loads a large model, either locally or remotely. We're just waiting for "spin up your entire company infrastructure in two lines of code," which turns out to be a thin wrapper around a Terraform shell script.
I do agree with the use of "serverless" though. I feel like we agreed long ago that serverless just means you're not spinning up a physical or virtual server, but simply asking some cloud infrastructure to run your code, without having to care about how it's run.
DeepSeek-OCR is no longer state of the art. There are much better open source OCR models available now.
ocrarena.ai maintains a leaderboard, and a number of other open source options like dots [1] or olmOCR [2] rank higher.
I wasn't aware of dots when I wrote the blog post. This is really good to know!! I would like to try again with some newer models.
The article mentions choosing the model for its ability to parse math well.
Slight tangent: I was wondering why DeepSeek would develop something like this. The linked paper says
> In production, DeepSeek-OCR can generate training data for LLMs/VLMs at a scale of 200k+ pages per day (a single A100-40G).
That... doesn't sound legal
Question for the crowd -- with autoscaling, when a new pod is created, will it still download the model from Hugging Face each time?
I like to push everything into the image as much as I can. So in the image build, I would run a command to trigger downloading the model, then in the app just point to the locally downloaded model. You end up with a bigger image, but you don't need to re-download on startup.
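Roughly like this, as a sketch (not the exact setup from the post; I'm assuming the weights come from the Hugging Face Hub and that huggingface_hub is installed, and the local path is just illustrative):

    # download_model.py -- run once during the image build, not at container start
    from huggingface_hub import snapshot_download

    # Bake the weights into the image at a fixed, illustrative path.
    snapshot_download(
        repo_id="deepseek-ai/DeepSeek-OCR",
        local_dir="/models/deepseek-ocr",
    )

    # In the app, point at the baked-in copy instead of the Hub:
    # model_path = "/models/deepseek-ocr"

Startup then only costs loading weights from local disk, at the price of a larger image.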
How does this compare to Tesseract?