Comments Page - OCRing Music from YouTube with Common Lisp

OCRing Music from YouTube with Common Lisp (nickfa.ro), submitted by superdisk 9 months ago

notpublic 9 months ago
Instead of doing a diff, I'm curious whether normalized compression distance (NCD)[1] would yield a better result. It is a very simple algorithm.
To compare two images, i1 and i2:
```
  l1  = length(gzip(i1))
  l2  = length(gzip(i2))
  l12 = length(gzip(concatenate(i1, i2)))

  ncd = (l12 - min(l1, l2))/max(l1, l2)
```
Here is a nice article where I found out about this long ago.
https://yieldthought.com/post/95722882055/machine-learning-t...
From the article:
"Basically it states that the degree of similarity between two objects can be approximated by the degree to which you can better compress them by concatenating them into one object rather than compressing them individually."
[1] https://en.wikipedia.org/wiki/Normalized_compression_distanc...
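A minimal Python sketch of the NCD formula above, assuming each image is available as a raw byte string (for example, an uncompressed pixel buffer). `zlib` stands in for gzip here since both use DEFLATE, and the helper names `compressed_len` and `ncd` are illustrative, not taken from the article or the linked post.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Length of the DEFLATE-compressed form of data (zlib, maximum level)."""
    return len(zlib.compress(data, 9))


def ncd(i1: bytes, i2: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    l1 = compressed_len(i1)
    l2 = compressed_len(i2)
    l12 = compressed_len(i1 + i2)  # compress the concatenation
    return (l12 - min(l1, l2)) / max(l1, l2)
```

Values near 0 suggest similar inputs and values near 1 suggest unrelated ones. Inputs that are already compressed (PNG, JPEG) would likely score near 1 regardless, so raw pixel data is the safer assumption.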
- johnisgood 9 months ago
  Oh interesting, I remember comparing images before, I think I was doing a diff as well, so I suppose this would have worked? Nice to know! They were very small images though.
  It probably would have added the overhead from compression which in my case would have been detrimental.
  notpublic 9 months ago
  Do try it. We use it for text search in one of our apps, and it works remarkably well. Basically, we use it to find which chunks contain the given text. Since the text can span multiple chunks, a simple string search will not work.
varjag 9 months ago
If you're also getting a 500:
https://web.archive.org/web/20250106075631/https://nickfa.ro...
- superdisk 9 months ago
  I just restarted the webserver. It's running on OpenBSD HTTPd + MediaWiki + SQLite, and keeping it up has been a perpetual thorn in my side. Oh well. I need to figure out some alternative setup probably.
  j45 9 months ago
  Modify your DNS to put cloudflare or bunny in front of it and you'll be good. Don't stop self-hosting :)
  zoezoezoezoe 9 months ago
  self-hosting means freedom, never stop self-hosting
  MonkeyClub 9 months ago
  Is your VPS on OpenBSD.Amsterdam by any chance? (The 46.23.. address seems familiar.)
  superdisk 9 months ago
  Yep, that's it. The host is (for the most part) fine, but there's either some problem with httpd or the PHP worker pool where it just dies after some number of requests.
  MonkeyClub 9 months ago
  Hi, neighbor! (I'm on server 7.)
  The service is indeed great, Mischa does an excellent job.
  Yeah, PHP on httpd can be flaky. I wish there were a lighter solution for wikis.
xenonite 9 months ago
To OCR music scores, see e.g., https://digitalcollection.zhaw.ch/items/276365b9-0a20-4286-a...
rcarmo 9 months ago
Holy cow.
kanwisher 9 months ago
honestly this would be better with an AI model
- secondplacetho 9 months ago
  ML is the second best answer to everything, and very rarely the first best answer.
  Of course it'd be better than something that is intentionally limiting itself. But that says nothing.
- teruakohatu 9 months ago
  > honestly this would be better with an AI model
  In the article, the author tried Tesseract, which uses ML and has some neural network models, and also tried ChatGPT.
  I came to the same conclusion as the author when doing OCR that needed 100% accuracy.
  When you know the font and spacing, and the layout is fixed, old-school statistical analysis of the pixels works a treat.
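As a rough illustration of the fixed-font case, here is a Python sketch of nearest-template matching by pixel (Hamming) distance. This is one simple form that pixel-level matching could take, not the article author's method. The 3x3 `TEMPLATES` bitmaps and the function names are hypothetical; a real setup would crop templates from the known font.

```python
# Hypothetical 3x3 glyph templates ("1" = ink, "0" = background).
TEMPLATES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}


def pixel_distance(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def classify(glyph):
    """Return the label of the template with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda label: pixel_distance(glyph, TEMPLATES[label]))
```

For example, `classify(("111", "010", "011"))` picks `"T"`: that glyph differs from the T template by one pixel, from I by three, and from L by five.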
  register 9 months ago
  Completely second that. This is my experience as well.
  Vampiero 9 months ago
  You can generalize that to anything: when you know the problem domain so well, why the hell are you using ChatGPT to solve any problem within it? Use the most specialized tool for the job or you're just wasting CPU and memory (and electricity, and money, and time). Same goes for a neural net trained on every possible character set. If you know the font and character size in advance it's way overkill.
  It's a bit more effort to set up since you actually have to set it up. But at least it's done right.
- curt15 9 months ago
  By "AI model" do you mean neural nets? "AI" and "ML" are just buzzwords that convey no real meaning about the underlying mathematics. The underlying models could be something as basic as linear or logistic regression, which, depending on the application, could actually be more appropriate than full-blown neural nets.