Wouldn't "Serverless OCR" mean something like running tesseract locally on your computer, rather than creating an AI framework and running it on a server?
Serverless means spinning compute resources up on demand in the cloud vs. running a server permanently.
~99.995% of the computing resources used on this are from somebody else's servers, running the LLM model.
Thanks for noting this - for a moment I was excited.
When people mention the number of lines of code, I've started to become suspicious. More often than not it's X lines of code calling a massive library that loads a large model, either locally or remotely. We're just waiting for "spin up your entire company infrastructure in two lines of code," which turns out to be a thin wrapper around a Terraform shell script.
I do agree with the use of "serverless" though. I feel like we agreed long ago that serverless just means you're not spinning up a physical or virtual server, but simply asking some cloud infrastructure to run your code, without having to care about how it's run.
DeepSeek-OCR is no longer state of the art. There are much better open source OCR models available now.
ocrarena.ai maintains a leaderboard, and a number of other open source options like dots [1] or olmOCR [2] rank higher.
I wasn't aware of dots when I wrote the blog post. This is really good to know!! I would like to try again with some newer models.
The article mentions choosing the model for its ability to parse math well.
Slight tangent: I was wondering why DeepSeek would develop something like this. The linked paper says
> In production, DeepSeek-OCR can generate training data for LLMs/VLMs at a scale of 200k+ pages per day (a single A100-40G).
That... doesn't sound legal
Question for the crowd -- with autoscaling, when a new pod is created, will it still download the model from Hugging Face each time?
I like to push everything into the image as much as I can. So in the image build, I would run a command to trigger downloading the model, then in the app just point to the locally downloaded model. You end up with a bigger image, but you don't need to re-download on startup.
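Roughly like this, as a sketch (not the exact setup from the post; I'm assuming the weights come from the Hugging Face Hub and that huggingface_hub is installed, and the local path is just illustrative):

    # download_model.py -- run once during the image build, not at container start
    from huggingface_hub import snapshot_download

    # Bake the weights into the image at a fixed, illustrative path.
    snapshot_download(
        repo_id="deepseek-ai/DeepSeek-OCR",
        local_dir="/models/deepseek-ocr",
    )

    # In the app, point at the baked-in copy instead of the Hub:
    # model_path = "/models/deepseek-ocr"

Startup then only costs loading weights from local disk, at the price of a larger image.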
How does this compare to Tesseract?