Has anyone figured out how to do “visual” chunking for RAG? I’m curious how this would be used in place of an OCR service.
HuggingFace link: https://huggingface.co/nvidia/NVLM-D-72B