If your VLM-based pipeline is really that good for OCR - and there's no reason to believe it can't be - why don't you just launch that as a product? The way I see it, these are two separate products: VLM-based OCR for messy documents, and automated credit review for developing markets.
I have some experience setting up automated OCR systems for one of the largest fintechs in Australia. VLM-based pipelines can definitely give an extra edge, and the addressable market (TAM) is easily very large. However, existing players might also be upgrading their systems, so they might not be too easy to disrupt. That being said, credit analysis is also a very hard problem, and I am not sure how much quality OCR would help there.
Given what I know, I would focus on the VLM/OCR problem rather than the credit scoring one.
If you are eyeing the South African market - I can promise you granting credit here is waaaaayyy ahead of the US. There is a very solid credit bureau, and a few of the banks are already on the "use AI to process docs" train. The rest of Africa is bigger on using cellphone data (see Optasia). If you want some insight into the market, happy to have a chat (email on profile).
The document standardization problem you're describing maps closely to what we see in DeFi infrastructure: different chains, different data formats, no consistent standards, and existing tools breaking on real-world inputs. The "model agnostic base + market-specific fine-tuning" architecture is smart. Curious how you handle cases where the same lender operates across multiple markets with conflicting document conventions. Does the model layer stay separate per market, or do you find that cross-market signal bleeding actually helps fraud detection?
As you mentioned, the format of these documents can be wildly different between countries. Curious, do you normalize them to some sort of intermediate representation before running your lending algorithms? How long does it take to spin up a new country or a new domain?
Hey guys, this sounds very cool. I work in anti-fraud/credit-risk data science and I'd like to have a chat with you. I wonder if you could generalize this to other situations, like the one my company is facing right now (subscription-based business).
Very interesting. This should be useful in India also, if you ever get around to it.
The solution to the emerging markets capital access problem is not making the current predatory system more efficient, but investing in microcredit. That will never happen at scale because it generates lower returns (better to have 10 bad payers than 1000 good payers).
Also, in those markets, "venture capital" usually means vertical lending platforms. Healthtech? Nah, just credit for dental treatment. Edutech? Nah, just credit for classes. Etc.
It's a very crowded space
Would love to chat. I recently wrapped up an initial version of an automated real estate appraisal review app, which appears to have some of the same technical challenges and risks. https://getvalara.com / jwillis@valara.net
Would love to share notes. I was able to get away with landing.ai, some really careful schema design, and a multi-step workflow with a few agents sprinkled in at the end.
Sidenote: unlike in Tagalog, Kita means “day-care facility for children” in German, so names like Kita Capture and Kita Credit Agent could carry unintended connotations.
In Croatian, Bosnian and Serbian, kita is a derogatory word...
Luckily Germany is not an emerging market :)
Since when did Germany victimize itself into the emerging market category?
I would be even more worried by the fact that it resembles Keeta, a delivery brand owned by Meituan, which actually operates in a bunch of large emerging markets.
Germany (~5% of SMEs underfunded [0]) isn’t underbanked; it’s just bureaucratically annoying, so it's more of a UX problem (which Kita helps address) with a somewhat similar outcome.
The US ("63M underbanked businesses") is already targeted: "Automate document review for business loan applications so you can fund more enterprises without scaling your back office." https://www.usekita.com/united-states
[0] https://www.eib.org/files/publications/20230340_econ_eibis_2...