• duck a day ago
    • Ronsenshi a day ago

      For me this looks like a great way to build connections between books in order to create a recommendation engine - something better than what Goodreads & Co provides. Something actually useful.

      The cost of indexing using third party API is extremely high, however. This might work out well with an open source model and a cluster of raspberry pi for large library indexing?

      • padolsey 18 hours ago

        The incumbents Goodreads and their owner Amazon have indeed done a poor job at this. Seven years ago I tried creating a basic graph using collaborative filtering (effectively using our actual reading patterns as the embedding space instead of semantics [human X likes book Y, so likers of Y might like other things that human X has enjoyed]). It works well to this day (ablf.io), but the codebase is so ugly I've not had the bravery to update its data in a couple of years.
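        As a sketch of that idea, item-to-item collaborative filtering over reader shelves fits in a few lines. All names and data below are invented for illustration; the real site presumably uses far more signal:

```python
# Toy item-to-item collaborative filtering: each book's "embedding" is simply
# the set of readers who shelved it. All names and data here are invented.
from collections import defaultdict
import math

shelves = {
    "alice": {"dune", "hyperion", "foundation"},
    "bob":   {"dune", "foundation"},
    "carol": {"hyperion", "ubik"},
}

# Invert user -> books into book -> readers.
readers_of = defaultdict(set)
for user, books in shelves.items():
    for book in books:
        readers_of[book].add(user)

def similarity(a: str, b: str) -> float:
    """Cosine similarity between two books' reader sets."""
    ra, rb = readers_of[a], readers_of[b]
    if not ra or not rb:
        return 0.0
    return len(ra & rb) / math.sqrt(len(ra) * len(rb))

def recommend(book: str, k: int = 3) -> list[str]:
    """Books most co-read with `book`, best first."""
    scores = {other: similarity(book, other)
              for other in readers_of if other != book}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

        With this toy data, likers of Dune surface Foundation first, since both of its readers also shelved it.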

        • alansaber 18 hours ago

          Yes, imo this is very useful, but there's no clear industry standard on how to do it yet, which I imagine will change? Tell me if I'm missing something.

      • nubskr a day ago

        I've been using Claude Code for my research notes and had the same realization: it's less about perfecting prompts and more about building tools so it can surprise you. The moment I stopped treating it like a function and started treating it like a coworker who reads at 1000 wpm, everything clicked.

        • imranq 3 hours ago

          I really liked the approach of getting new topics to research via embeddings, trails, and Claude Code, but what will this give you beyond novelty?

          • jszymborski a day ago

            This is all interesting; however, I find myself most interested in how the topic tree is created. It seems super useful for lots of things. Can anyone point me to something similar with details?

            EDIT: Whoops, I found more details at the very end of the article.

            • alansaber 18 hours ago

              He asks g2.5 flash to assign a topic. I am also interested in the best way to develop a general schema: there is a good deal of literature on this but nothing stands out. I think the standard approach is open-ended classification generation using a single model, then binning. Actually, the novelty in his approach is first asking if a chunk is useful (i.e. adding a filter for non-semantic information), which I would normally do at the dataset creation stage.

            • rbbydotdev 14 hours ago

              I had a similar toy project, attempting to make custom day trips from guide books. I immediately ran into limitations naïvely chunking paragraphs into a RAG. For my next attempt I'm going to try using an LLM to extract "entities" like holidays/places/history and store them in a graph db, coupled with vectors and the original source text or index references (page + column).

              Still experimental and way outside my expertise; I'd love to hear from anyone with ideas or experience with this kind of problem.
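              One possible shape for that storage layer, sketched with invented names (entities keyed by name, each linked back to page/column references so the source text stays recoverable):

```python
# One possible storage shape: entities keyed by name, each keeping a list of
# (page, column) references back to the source text, plus plain edges.
# All names here are illustrative, not from the post.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SourceRef:
    page: int
    column: int

@dataclass
class Entity:
    name: str
    kind: str  # e.g. "place", "holiday", "history"
    refs: list[SourceRef] = field(default_factory=list)

class EntityGraph:
    def __init__(self) -> None:
        self.entities: dict[str, Entity] = {}
        self.edges: set[tuple[str, str]] = set()

    def add(self, name: str, kind: str, ref: SourceRef) -> Entity:
        """Create or update an entity, recording where it was seen."""
        ent = self.entities.setdefault(name, Entity(name, kind))
        ent.refs.append(ref)
        return ent

    def link(self, a: str, b: str) -> None:
        """Record a relation between two entities."""
        self.edges.add((a, b))
```

              An embedding per entity could live alongside `refs`; the graph edges then carry the relations a plain vector store loses.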

              • zkmon 18 hours ago

                I used AI to accelerate my reading of a book recently. This is an interesting use case, but it's the same as racing to the destination instead of enjoying the journey.

                It kills the tone, pace, and expressions of the author. It is pretty much the same as an assistant summarizing the whole book for you, if that's what you want. It misses the entire experience delivered by the author.

                • alansaber 18 hours ago

                  Yes, AI subsumes edge cases to produce very uniform, "optimal" writing (what we call AI slop). I am assuming this is a book you were reading for knowledge work, not for fun? I hadn't heard of people recreationally using AI for consumer content; that's a bridge too far for me lol.

                  • CuriouslyC 18 hours ago

                    It's not optimal. It's overwritten, repetitive, clichéd, and increasingly incoherent over longer generations. I say this as someone who likes AI and uses it to create rough drafts and structural revisions of my ideas.

                    • alansaber 17 hours ago

                      Exactly: stochastic, but statistically "optimal" based on a very broad range of text which often is not actually good writing.

                • ebiester a day ago

                  I did a similar thing with productivity books early last year, but never released it because it wasn't high enough quality. I keep meaning to get back to that project, but it had a much more rigid hypothesis in mind; getting this kind of classification is pretty difficult, and getting high value from it even more so.

                  • doytch a day ago

                    The mental model I had of this was actually on the paragraph or page level, rather than words like the post demos. I think it'd be really interesting if you're reading a take on a concept in one book and you can immediately fan-out and either read different ways of presenting the same information/argument, or counters to it.

                    • voidhorse a day ago

                      This was posted before and there were many good criticisms raised in the comments thread.

                      I'd just reiterate two general points of critique:

                      1. The point of establishing connections between texts is semantic, and terms can have vastly different meanings depending on the sphere of discourse in which they occur. Because of the way LLMs work, the really novel connections probably won't be found by an LLM, since the way they function is quite literally to uncover what isn't novel.

                      2. Part of the point in making these connections is the process that acts on the human being making the connections. Handing it all off to an LLM is no better than blindly trusting authority figures. If you want to use LLMs as generators of possible starting points or things to look at and verify and research yourself, that seems totally fine.

                      • smakt 13 hours ago

                        One has to be a special kind of stupid, blinded by efficiency promises from the LLM Church, to think the article is worth anything.

                        It's the usual jargon soup. Publish a vetted paper with repeatable steps instead of a hyped-up, garbage, supposed 100x productivity bomb.

                        And his best result is mechanical findings from where the LLM got the highest correlations between its vectors. Bravo; there's always going to be a top item in any ordered list, but that doesn't make it automatically interesting. Reading literature is about witnessing the journey the characters take. Reading technical material is about memorizing enough of it. In both cases the material has to go through a brain. I find it idiotic to assign any value to outputs like "Oh, King Lear's X is highly correlated to Antigone's Y."

                      • skeptrune a day ago

                        I really like the idea of the topic tree. That intuitively resonates.

                        • lloydatkinson 18 hours ago

                          How can anyone even trust crap like this? It was only a few days ago Claude and ChatGPT hallucinated a bunch of stuff from actual docs I sent them links to. When asked about it, they just apologised.

                          • mpalmer 18 hours ago

                            Synthesizing 500 words at a time into digestible topics is significantly less prone to error. You're giving it a lot of info and asking for an organized subset. It's good at following such direction.

                            In your example, you're doing the inverse (give me a lot of text based on a little), and that's where LLMs have no problem hallucinating the new information.

                            • alansaber 18 hours ago

                              Exactly: the more tightly scoped the problem, the less stochastic noise. Even better if you can add more signals based on deterministic algorithms, like keyword presence. It gets very domain-specific very fast.
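                              For example, a deterministic keyword signal can be blended with a model score like this (the keyword list and the 0.3 weight are invented for illustration):

```python
# Blending a deterministic keyword-presence signal with a model score.
# The keyword list and the default weight are invented for this example.
KEYWORDS = {"theorem", "proof", "algorithm"}

def keyword_signal(chunk: str) -> float:
    """Fraction of the keyword list present in the chunk (0.0 to 1.0)."""
    return len(set(chunk.lower().split()) & KEYWORDS) / len(KEYWORDS)

def combined_score(chunk: str, model_score: float, w: float = 0.3) -> float:
    """Weighted blend of a [0, 1] model score and the keyword signal."""
    return (1 - w) * model_score + w * keyword_signal(chunk)
```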

                          • kylehotchkiss a day ago

                            In several years, IMO the most interesting people are going to be the ones still actually reading paper books and not trying to shove everything into a LLM

                            • hungryhobbit a day ago

                              I don't think the Venn diagram of those people and everyone else is as separate as you imagine.

                              I'm a Literature major and avid reader, but projects like this are still incredibly exciting to me. I salivate at the thought of new kinds of literary analysis that AI is going to open up.

                              • imdsm a day ago

                                the people most likely to analyse books like this are those of us who are more likely to read them as well

                              • pradmatic a day ago

                                Sure but those people don't have to be mutually exclusive. At the very least, a tool like this can help me decide what to read next.

                                • alansaber 18 hours ago

                                  I can't wait to have the LLM autopilot my neuralink whilst I'm in VR mario kart.

                                  • fatherwavelet 17 hours ago

                                    I still read a lot of books and I use LLMs all the time. I have even got a bunch of book recommendations from LLMs. Imagine that. You actually have agency over these tools. I know it is hard to believe for some.

                                  • gulugawa a day ago

                                    [flagged]

                                    • dang a day ago

                                      "Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

                                      "Don't be curmudgeonly. Thoughtful criticism is fine, but please don't be rigidly or generically negative."

                                      https://news.ycombinator.com/newsguidelines.html

                                      • gjm11 a day ago

                                        I agree that we should be reading books with our eyes and that feeding a book into an LLM doesn't constitute reading it and confers few of the same benefits.

                                        But this thing isn't (so far as I can tell) even slightly proposing that we feed books into an LLM instead of reading them. It looks to me more like a discovery mechanism: you run this thing, it shows you some possible links between books, and maybe you think "hmm, that little snippet seems well written" or "well, I enjoyed book X, let's give book Y a try" or whatever.

                                        I don't think it would work particularly well for me; I'd want longer excerpts to get a sense of whether a book is interesting, and "contains a fragment that has some semantic connection with a fragment of a book I liked" doesn't feel like enough recommendation. Maybe it is indeed a huge waste of time. But if it is, it isn't because it's encouraging people to substitute LLM use for reading.

                                        • imdsm a day ago

                                          commenter above probably didn't read the post, ironically

                                          • ryan_n a day ago

                                            Guess we need “reading across hacker news articles with Claude code.”

                                          • gulugawa 4 hours ago

                                            The ideal way to find similarities between two books is to read both of them. If an LLM is finding links between two books, that means that the LLM read both of the books.

                                            To determine if a book is worth reading, I think it's better to ask someone for their recommendation or look at online reviews.

                                          • stavros a day ago

                                            I need a name for people who dismiss an entirely new and revolutionary class of technology without even trying it, so much so that they'll not even read about any new ideas that involve it.

                                            • dang a day ago

                                              The HN guidelines include the term "curmudgeonly", which IMO is fair.

                                              • smakt 10 hours ago

                                                And I need a name for shills who handwave past all the magical thinking in a blog post and conclude with "oh, my Claude Code pointed out correlations between Atlas Shrugged and Steve Jobs, I'm so much smarter and ready for the future that's coming."

                                                You are damn right I didn't try it out. I try things published in journals, vetted by peers, with clear explanations and instructions. On the other hand, when the tone is "It's All Magic Sprinkle(TM)" my pseudoscience alarm goes off.

                                                • stavros 10 hours ago

                                                  Why are you reading this comment section? Nothing here has been peer reviewed. In fact, all my comments here are written by an LLM, because I can't be bothered arguing with closed-minded people.

                                                  • smakt 7 hours ago

                                                    > Nothing here has been peer reviewed

                                                    Oh but everything here is peer reviewed all right: it's sheep-reviewed. All sheep singing the same note. Where's the explosion of groundbreaking, uber-creative, world-shattering, reliable software from MagicDust LLMs that turn you into a 10x engineer? If anything, it generates a lot of noise. Tell you what: being 10x more productive with a statistical engine that will only bring out the most normal of normal solutions is the dream of the incompetent.

                                                • imdsm a day ago

                                                  we call them luddites

                                                  • lsaferite a day ago

                                                    I'm not entirely sure that's a fair association. The Luddites weren't against technology in general, they were fighting for their livelihoods. There very well could be a fresh luddite movement centered around the use of AI tools, but I don't think "luddite" is the right term in this specific case.

                                                    • ironbound a day ago

                                                      No that was a labor issue, abusive factory owners got targeted.

                                                  • mikkupikku a day ago

                                                    I zgrep my epubs, is that a problem too?