• brynary 15 hours ago

    This looks great! Duplication and dead code are especially tricky to catch because they are not visible in diffs.

    Since you mentioned the implementation details, a couple questions come to mind:

    1. Are there any research papers you found helpful or influential when building this? For example, I need to read up on using tree edit distance for code duplication.

    2. How hard do you think this would be to generalize to support other programming languages?

    I see you are using tree-sitter which supports many languages, but I imagine a challenge might be CFGs and dependencies.

    I’ll add a Qlty plugin for this (https://github.com/qltysh/qlty) so it can be run with other code quality tools and reported back to GitHub as pass/fail commit statuses and comments. That way, the AI coding agents can take action based on the issues that pyscn finds directly in a cloud dev env.

    • d-yoda 14 hours ago

      Thank you! 1. For tree edit distance, I referred to "APTED: A Fast Tree Edit Distance Algorithm" (Pawlik & Augsten, 2016), but the algorithm runs in O(n²), so I also implemented a classic LSH scheme for large codebases. The other analyses also use classical compiler theory and techniques. 2. Should be straightforward! tree-sitter gives us parsers for 40+ languages. CFG construction is just tracking control flow, and the core algorithm stays the same.
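
      To give a rough idea of the bucketing step (a toy stand-in for the LSH filter, not pyscn's actual code): hash sliding windows of AST node-type names per function, and only run the expensive tree-edit-distance comparison on functions that land in a shared bucket.

          # Toy sketch with illustrative names; not pyscn's implementation.
          import ast
          import hashlib
          from collections import defaultdict
          from itertools import combinations

          def node_type_ngrams(func: ast.FunctionDef, n: int = 4) -> set:
              """Hash sliding windows of AST node-type names for one function."""
              types = [type(node).__name__ for node in ast.walk(func)]
              return {
                  hashlib.md5(" ".join(types[i:i + n]).encode()).hexdigest()[:8]
                  for i in range(max(len(types) - n + 1, 1))
              }

          def clone_candidates(source: str) -> set:
              """Bucket functions by shared n-gram hashes; only pairs that share a
              bucket need the expensive tree-edit-distance comparison."""
              buckets = defaultdict(set)
              for node in ast.walk(ast.parse(source)):
                  if isinstance(node, ast.FunctionDef):
                      for h in node_type_ngrams(node):
                          buckets[h].add(node.name)
              pairs = set()
              for names in buckets.values():
                  pairs.update(combinations(sorted(names), 2))
              return pairs

      Functions that never share a bucket are skipped entirely, which is what keeps the pairwise comparisons tractable on large codebases.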

      I focused on Python first because vibe coding with Python tends to accumulate more structural issues. But the same techniques should apply to other languages as well.

      Excited about the Qlty integration - that would make pyscn much more accessible and would be amazing!

    • amacbride 12 hours ago

      I'm going to push back hard on the folks dunking on "vibe coders" -- I have been programming longer than most of you have been alive, and there are times when I absolutely do vibe coding:

      1) Unfamiliar framework

      2) I just need to build a throwaway utility to help with a main task (and I don't want to split my attention)

      3) For fun: I think of it as "code sculpting" rather than writing

      So this is absolutely a utility I would use. (Kudos to the OP.)

      Remember the second-best advice for internet interactions (after Wheaton's Law): "Ssssshh. Let people enjoy things."

      • kelnos 10 hours ago

        I too have probably been programming longer than most people here, and I'll vibe code on occasion for your #2 reason. (Recently I needed to take an OpenAPI spec file and transform/reduce it in some mechanical ways; didn't feel like writing the code for it, didn't care if it was maintainable, and it was easy to verify correctness with a quick manual skim of its output.)

        I don't think #1 is a good place to vibe code; if it's code that I'll have to maintain, I want to understand it. In that case I'll sometimes use an LLM to write code incrementally in the new framework, but I'll be reading every line of it and using the LLM's work to help me understand and learn how it works.

        A utility like pyscn that determines code quality wouldn't be useful for me with #1: even in an unfamiliar framework, I'm perfectly capable of judging code quality on my own, and I still need and want to examine the generated code anyway.

        (I'm assuming we're using what I think is the most reasonable definition of "vibe coding": having an LLM do the work, and -- critically -- not inspecting or reviewing the LLM's output.)

        • amacbride 9 hours ago

          I was using the definition of “let the LLM take the lead in writing the code, but review it afterwards“ so I don’t think our opinions are in conflict.

          I think of coding agents as “talented junior engineers with no fatigue, but sometimes questionable judgment.”

        • convolvatron 11 hours ago

          we can have a pissing contest. I don't begrudge anyone their fun, but when my job becomes taking hundreds of thousands of lines of vibe code and just finding that one little change that will make it all work, we have a serious problem with expectations.

          • amacbride 11 hours ago

            I don't think we're at odds: I think "vibe coding" is strictly for fun and for prototypes. However, people will misuse any tool, so having utilities to mitigate the risk isn't a bad thing.

        • scuff3d 14 hours ago

          This is an interesting idea but you might be better off marketing it as a tool for software engineers, maybe to help with old code bases. Or even for someone stuck cleaning up vibe coded nonsense.

          Vibe coders don't care about quality and wouldn't understand why any of these things are a problem in the first place.

          • ryandrake 13 hours ago

            I agree with this. I've been pretty critical of AI coding, but at the urging of some other HN posters, I shelled out a few bucks and started giving Claude Code a chance. After about 2 months of using it for various personal Python and C++ projects, my current problem with it is 1. how much babysitting you need to do to keep it on track and writing code the way you'd like it written, and 2. how much effort you need to spend after it writes the code, to clean it up and fix it. This tool would probably help quite a bit with 2.

            I find for every 5 minutes of Claude writing code, I need to spend about 55 minutes cleaning up the various messes. Removing dead code that Claude left there because it was confused and "trying things". Finding opportunities for code reuse, refactoring, reusing functions. Removing a LOT of scaffolding and unnecessary cruft (e.g. this class with no member variables and no state could have just been a local function). And trivial stylistic things that add up, like variable naming, lint errors, formatting.

            It takes 5 minutes to make some ugly thing that works, but an hour to have an actual finished product that's sanded and polished. Would it have taken an hour just to write the code myself without assistance? Maybe? Probably? Jury is still out for me.

            • Wowfunhappy 12 hours ago

              Have you experimented with using a Claude.md file that describes your preferred coding style, including a few examples of what not to do and the corrected version? I haven't had complete success with this but it does seem to help.

              • scuff3d 11 hours ago

                Yeah in general I think agents are a mistake. People are desperately trying to make these things more useful than they are.

                It's more useful as a research assistant, documentation search, and writing code a few lines at a time.

                Or yesterday for work I had to generate a bunch of json schemas from Python classes. Friggin great for that. Highly structured input, highly structured output, repetitious and boring.

                • mlyle 10 hours ago

                  I still think vibe coding is a win. Sure, you can't turn it loose on a massive codebase, yet.

                  But in about 45 minutes I got 700 lines of relatively compact web code to use plotly, jszip, and papaparse to suck in video files, CSV telemetry, and logfiles, help you sync them up, and then show overlays of telemetry on the video. It can also save a package zip file of the whole situation for later use/review. Regex search of logs. Things linked so if you click on a log line, it goes to that part of the video. WASD navigation of the timeline. Templating all the frameworks into the beginning of the zip file so it works offline. etc.

                  I am not an expert web developer. It would have taken me many hours to do this myself. It looks crisp and professional and packs in a big feature set.

                  (Oh, yah, included in the 45 minutes but not the line count: it gave me a ringbuffer for telemetry and a CSV dumper for it and events, too).

                  The last couple of revisions, it was struggling under the weight of its context window a bit and I ended up making the suggested changes by hand rather than taking a big lump of code from it. So this feels like an approximate upper limit for the complexity of what I can get from ChatGPT5-thinking without using something like Claude Code. Still, a whole lot of projects are this size or smaller.

              • CuriouslyC 14 hours ago

                Vibe coders do care about quality, at least the ones that try to ship and get burned by a mountain of tech debt. People aren't as stupid and one dimensional as you assume.

                • scuff3d 14 hours ago

                  Given an entire industry is cropping up to fix the mess these people make, I think fewer of them care than you think.

                  • _joel 12 hours ago

                    Is it an industry, or just a meme job title? Serious question.

                    • scuff3d 11 hours ago

                      There have been plenty of articles about it recently, seems real enough to me.

                      • _joel 11 hours ago

                        I wonder if that's sustainable though, as either the tools get better or companies realise it's not a magic bullet? Time will tell.

                        • scuff3d 10 hours ago

                          It's not. The cost of fixing the garbage will outweigh any savings on the front end. Any experienced dev will tell you it's easier to spend a little extra up front to make things more maintainable than it is to fix a mess later.

                          And even if the tools get better, they'll never get to the point where you don't need experts to utilize them, as long as LLMs are the foundation.

                    • xkbarkar 12 hours ago

                      Hard disagree. Vibe code has its downsides but is not nearly as terrible as threatened coders on the forums make it seem.

                      • scuff3d 8 hours ago

                        It's not a threat to software engineers at all. These things are worse than useless when someone who doesn't know what they're doing tries to use them. If anything, they're going to create jobs.

                        Vibe coders are the new script kiddies.

                    • flare_blitz 12 hours ago

                      And where, exactly, did this commenter say that vibe coders are "stupid and one dimensional"? Stop putting words in people's mouths.

                      • CuriouslyC 12 hours ago

                        >> Vibe coders don't care about quality and wouldn't understand why any of these things are a problem in the first place.

                        He literally bucketed an entire group of people by a weak label and made strong claims about competence and conscientiousness.

                        • flare_blitz 12 hours ago

                          That comment sounds pretty benign to me. I also don't know why you're assuming the original commenter is male. The only person in the wrong here is you, and you're wrong twice over.

                    • lacy_tinpot 12 hours ago

                      This kind of weird disdain towards "vibe coders" is hilarious to me.

                      There was a time when hand soldered boards were not only seen as superior to automated soldering, but machine soldered boards were looked down on. People went gaga over a good hand soldered board and the craft.

                      People that are using AI to assist them to code today, the "vibe coders", I think would also appreciate tooling that assists in maintaining code quality across their project.

                      • scuff3d 12 hours ago

                        Whether the board is hand soldered or not, the person designing it still has to know what they're doing.

                        I think a comparison that fits better is probably PCB/circuit design software. Back in the day engineering firms had rooms full of people drafting and doing calculations by hand. Today a single engineer can do more in an hour than 50 engineers could in a day back then.

                        The critical difference is, you still have to know what you are doing. The tool helps, but you still have to have foundational understanding to take advantage of it.

                        If someone wants to use AI to learn and improve, that's fine. If they want to use it to improve their workflow or speed them up that's fine too. But those aren't "vibe coders".

                        People who just want the AI to shit something out they can use with absolutely no concern for how or why it works aren't going to be a group who care to use a tool like this. It goes against the whole idea.

                        • lacy_tinpot 11 hours ago

                          Sure, we can use that comparison if you'd like. And sure you need to know what you're doing as well.

                          But "vibe coding" is this vague term that is used on the entire spectrum, from people that do "build me a billion dollar SAAS now" kind of vibe coders, to the "build this basic boilerplate component" type of vibe coders. The former never really get too far.

                          The latter have staying power because they're actually able to make progress and build something tangible.

                          So now I'm assuming you're not against AI generated code, right?

                          If that's the case then it's clear that this kind of tool can be useful.

                          • scuff3d 11 hours ago

                            I don't think the term applies to the latter. By definition if you're "vibe coding" you don't care about the output, just that it "works".

                            I think AI is useful for research and digging through documentation. Also useful for generating small chunks of code at a time, documentation, or repetitive tasks with highly structured inputs and outputs. Anything beyond that, in my opinion, is a waste of time. Especially these crazy ass agent workflows where you write ten pages of spec and hope the thing doesn't go off the rails.

                            Doesn't matter how nice a house you build if you build it on top of sand.

                            • scoopdewoop 9 hours ago

                              By whose definition? Yours? That seems circular.

                              • scuff3d 8 hours ago

                                By the guy who gave birth to the whole stupid trend:

                                "... fully give in to the vibes, embrace exponentials, and forgete that the code even exists."

                                If you're "vibe coding" you don't know and you don't care what the code is doing.

                                • scoopdewoop 6 hours ago

                                  Ha, fair enough. I forgot entirely the essence of that tweet, but I was really getting swept away with AI code at the time and probably was projecting my experience onto his tweet. I guess vibe-coding isn't what I like doing.

                                • maleldil 8 hours ago

                                  Karpathy's, the person credited with coining the term.

                                  https://x.com/karpathy/status/1886192184808149383

                        • d-yoda 14 hours ago

                          "You're absolutely right!" - the messaging could be clearer. I built pyscn because more engineers than expected are using AI assistants these days (to varying degrees), and I wanted to give them a tool to check code quality. But the real value might be for engineers who inherit or maintain AI-generated codebases as you say, rather than those actively vibe coding.

                        • aDyslecticCrow 14 hours ago

                            Current AI is most proficient in JavaScript and Python because of the vast training data. But in the long run, I feel like languages with good static analysis, static type checks, clear language rules, memory leak detection, fuzzing, test-oriented code, and any number of other similar tools are gonna be the true game-changer. Directed learning using this tooling could improve the models beyond their training set, or simply allow humans to constrain AI output within certain bounds.

                          • d-yoda 14 hours ago

                            Great point! Golang is indeed one of those languages with strong "vibe coding resistance" - it's personally one of my favorites for that reason. On the flip side, I think there's a future where tools like pyscn work alongside AI to make languages with large communities like Python even more dominant.

                            • buremba 14 hours ago

                                I was more optimistic before, but if 95% of all software is written in these two languages, it will be very hard for any (better) alternative to disrupt them. The only way forward will likely be better profiling & debugging tools to help maintain existing codebases.

                              • d-yoda 14 hours ago

                                I'm actually more optimistic. While Python/JS have huge ecosystems, there are still things only Go/Rust can achieve.

                            • xrd 14 hours ago

                              I absolutely love this. Tests and code coverage metrics are still important, but so easy to leave behind as you are running toward the vibe. This is a nice addition to the toolbox.

                              • smoe 13 hours ago

                                I’d argue that those kinds of automated tools are much more important much earlier in a project than they used to be.

                                Personally, I can deal with quite a lot of jank and a lack of tests or other quality control tools in the early stages, but LLMs get lost so quickly. It’s like onboarding someone new to the codebase every hour or so.

                                You want to put them into a feedback loop with something or someone that isn’t you.

                                • d-yoda 14 hours ago

                                  Thank you! I'll keep improving it more and more!

                                • ktrnka 6 hours ago

                                    Looks good to me! The status bar did something weird on my codebase: 2% was really 100%, so it looked like it was gonna take hours but it only took a minute or so.

                                  I'll try hooking it into my refactor/cleanup workflow with copilot and see how it works as grounding.

                                  • derekcheng08 14 hours ago

                                    This is pretty awesome! If it's built on tree-sitter, is it fair to assume it's generalizable across languages?

                                    • d-yoda 14 hours ago

                                      Yes! tree-sitter supports multiple languages and the core algorithms should transfer easily. I focused on Python first because I saw many people struggling with code quality issues in Python.

                                      • derekcheng08 13 hours ago

                                        Just based on usage, I would assume js/ts would be very valuable as well. I see a lot of the same issues there and agree the core algos seem to apply generally. Very cool project!

                                        • _joel 12 hours ago

                                            Agreed, I do Python and TS depending on the use case; a TS version would be cool.

                                    • ok123456 11 hours ago

                                      I ran these on some (non-vibe-coded) large repositories of my code that I'm not too proud of, and it gave me an A. I feel validated.

                                      • scosman 13 hours ago

                                          this should be an MCP server the agent can use and optimize on

                                          I have an MCP server that wraps developer tool CLIs (linting, tests, etc.), but this would need a textual report instead of HTML.

                                        https://github.com/scosman/hooks_mcp

                                        • brynary 13 hours ago

                                          What benefits do you see from having the agent call a CLI like this via MCP as opposed to just executing the CLI as a shell command and taking action on the stdout?

                                          • scosman 11 hours ago

                                              A few things (see the sketch after this list):

                                            - Security/Speed: I leave "approve CLI commands" on in Cursor. This functions as a whitelist of known safe commands. It only needs to ask if running a non-standard command, 99% of the time it can use tools. It will also verify paths passed by the model are in the project folder (not letting it execute on external files)

                                            - Discoverability: For agents to work well, you need to explain which commands are available, when to use each, parameters, etc. This is a more formal version than a simple AGENTS.md, with typed parameters, tool descriptions, etc.

                                            - Correctness: I find models mess up command strings or run them in the wrong folders. This is more robust than pure strings, with short tool names, type checking, schemas, etc.

                                            - Parallel execution: MCP tools can run in parallel, CLI tools typically can't

                                            - Sharing across team: which dev commands to run can be spread across agents.md, github workflows, etc. This is one central place for the agents use case.

                                              - Prompts: MCP also supports prompts (a lesser-known MCP feature). Not really relevant to the "why not CLI" question, but it's a benefit of the tool. It provides a short description of the available prompts, then lets the model load any by name. It requires much less room in context than loading an entire /agents folder.
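
                                              As a rough illustration (hypothetical names, not hooks_mcp's actual code), wrapping the pyscn CLI as a typed MCP tool with the Python SDK's FastMCP helper looks something like this:

                                                  # Hypothetical sketch: server/tool names are illustrative.
                                                  import subprocess
                                                  from pathlib import Path

                                                  from mcp.server.fastmcp import FastMCP

                                                  mcp = FastMCP("code-quality")
                                                  PROJECT_ROOT = Path.cwd().resolve()

                                                  @mcp.tool()
                                                  def pyscn_analyze(path: str = ".") -> str:
                                                      """Run pyscn on a path inside the project and return its JSON report."""
                                                      target = (PROJECT_ROOT / path).resolve()
                                                      if not target.is_relative_to(PROJECT_ROOT):  # keep the model inside the project folder
                                                          raise ValueError(f"{path} is outside the project root")
                                                      result = subprocess.run(
                                                          ["pyscn", "analyze", "--json", str(target)],
                                                          capture_output=True, text=True,
                                                      )
                                                      # Assumes the JSON report goes to stdout; adjust if pyscn writes a file instead.
                                                      return result.stdout

                                                  if __name__ == "__main__":
                                                      mcp.run()  # stdio transport by default

                                              Registered this way, the agent sees a named tool with a typed parameter and a description instead of composing a free-form shell string.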

                                            • mogwire 13 hours ago

                                              This is one of the most important questions I see when people recommend an MCP server.

                                              If cursor and Claude code can already run an executable why do I need to add an MCP server in front of it?

                                              I feel like a lot of times it’s, “Because AI”

                                            • daydreamer9000 13 hours ago

                                              Coding agents usually have longer timeouts for bash commands than MCP server calls. So depending on how much time this tool takes, using a CLI by default could be more robust.

                                              • ano-ther 13 hours ago

                                                It has a JSON option, would that work?

                                                   pyscn analyze --json .                       # Generate JSON report
                                                • scosman 11 hours ago

                                                  it would!

                                                • d-yoda 12 hours ago

                                                  MCP integration could be a good option. If there's interest, feel free to raise an issue on GitHub.

                                                • CuriouslyC 14 hours ago

                                                  I'm surprised you went with go for this, you're going to encounter so much pain with large codebases.

                                                  • dangoor 14 hours ago

                                                    Curious why you say this. It says in the readme it can do 100K lines per second.

                                                    • CuriouslyC 14 hours ago

                                                      The SIMD story in Rust or another lower level systems language is much better, and the memory control is more fine grained without forfeiting inlining. For a hot loop that's amenable to SIMD, Rust can deliver twice the performance of Go if you don't hand roll platform specific code.

                                                      • d-yoda 13 hours ago

                                                        Rust is definitely the king of performance! I personally love Go, but Rust's performance is truly impressive.

                                                      • maleldil 8 hours ago

                                                        I'm not sure what that means. My codebase with 40k lines (via cloc) takes 20 seconds (M1 Pro).

                                                        • d-yoda 14 hours ago

                                                          Yeah Go is very fast!

                                                      • eric15342335 13 hours ago

                                                          What about Pylint? IIRC Pylint has a code duplication check as well. Is it the same thing?

                                                        • d-yoda 12 hours ago

                                                          Pylint's duplication check is text-based (compares lines), while pyscn uses tree edit distance on ASTs. This means pyscn can catch structural clones even when variable/function names differ.
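
                                                            A toy illustration (not pyscn's actual matcher): these two functions share no identical lines, so a line-based checker sees nothing, but their ASTs have the same shape once identifier names and literal values are ignored.

                                                                # Toy example with illustrative names; not pyscn's implementation.
                                                                import ast

                                                                SRC = """
                                                                def total_price(items):
                                                                    total = 0
                                                                    for item in items:
                                                                        total += item.price
                                                                    return total

                                                                def sum_weights(boxes):
                                                                    acc = 0
                                                                    for box in boxes:
                                                                        acc += box.weight
                                                                    return acc
                                                                """

                                                                def shape(func: ast.FunctionDef) -> str:
                                                                    # Keep only node types (structure), dropping names and literal values.
                                                                    return " ".join(type(node).__name__ for node in ast.walk(func))

                                                                funcs = [n for n in ast.walk(ast.parse(SRC)) if isinstance(n, ast.FunctionDef)]
                                                                print(shape(funcs[0]) == shape(funcs[1]))  # True: a structural clone despite different names

                                                            The real check uses tree edit distance rather than an exact shape match, so near-clones with small edits get caught as well.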

                                                        • guilhermesfc 14 hours ago

                                                          This is great! Is there something similar for Typescript?

                                                          • d-yoda 14 hours ago

                                                              Not yet! But the algorithms should transfer well - the core logic is language-agnostic and tree-sitter supports TypeScript, so it's definitely doable.

                                                          • senand 13 hours ago

                                                            How does this compare to ruff?

                                                            • d-yoda 13 hours ago

                                                              They complement each other - Ruff for style, pyscn for architecture. pyscn focuses on structural quality - checking if your code follows fundamental design principles like DRY, YAGNI, or other best practices.

                                                            • FergusArgyll 14 hours ago

                                                                Very cool! I've never seen a CLI that opens an HTML file when it's finished. I kinda like it, hope to see more of that in the future

                                                              • d-yoda 14 hours ago

                                                                Glad you like it! Trying to make it as user-friendly as possible.

                                                                • johtso 13 hours ago

                                                                  This is fairly common with linting/test coverage tools

                                                                • khimaros 10 hours ago

                                                                    see also https://github.com/mozilla/rust-code-analysis which also builds on tree-sitter, tracks similar metrics, and supports quite a few languages

                                                                  • d-yoda 6 hours ago

                                                                    Thanks for sharing! Good to know about rust-code-analysis. Always helpful to see what other tools are doing in this space.

                                                                  • joduplessis 12 hours ago

                                                                    There's no way vibe coders care about this. Focus on real engineers.

                                                                    • d-yoda 6 hours ago

                                                                      Fair point! My initial target was engineers using AI, but I'm open to refining the messaging.