• alexpovel 9 months ago

    These sorts of cases are why I wrote srgn [0]. It's based on tree-sitter too. Calling it as

         cat file.py | srgn --py def --py identifiers 'database' 'db'
    
    will replace all mentions of `database` inside identifiers inside (only!) function definitions (`def`) with `db`.

    An input like

        import database
        import pytest
    
    
        @pytest.fixture()
        def test_a(database):
            return database
    
    
        def test_b(database):
            return database
    
    
        database = "database"
    
    
        class database:
            pass
    
    
    is turned into

        import database
        import pytest
    
    
        @pytest.fixture()
        def test_a(db):
            return db
    
    
        def test_b(db):
            return db
    
    
        database = "database"
    
    
        class database:
            pass
    
    
    which seems roughly like what the author is after. Mentions of "database" outside function definitions are not modified. That sort of logic I always found hard to replicate in basic GNU-like tools. If run without stdin, the above command runs recursively, in-place (careful with that one!).

    Note: I just wrote this, and version 0.13.2 is required for the above to work.

    [0]: https://github.com/alexpovel/srgn

    • Jackevansevo 9 months ago

      This is super cool! I wish I'd known about this.

      • thatxliner 9 months ago

        How does this compare to https://github.com/ast-grep/ast-grep?

        • avianlyric 9 months ago

          Interesting use of tree-sitter. But I’m a little surprised that tree-sitter’s built-in query language wasn’t used.

          There’s no need to manually iterate through the tree and use if statements to select nodes. Instead you can just write a couple of simple queries (and even use tree-sitter’s web UI to test the queries), and have tree-sitter provide all the nodes for you.

          https://tree-sitter.github.io/tree-sitter/using-parsers#patt...

          • hetspookjee 9 months ago

            Having no experience with tree-sitter, I find the query language rather hard to parse. From a practical point of view, for someone experimenting with the library, I’m not surprised the author went with this nested for-loop approach.

            • jonathanyc 9 months ago

              The query language is definitely underdocumented. In case it helps you, what helped me was realizing it’s basically a funky pattern language, à la the match pattern sublanguages in OCaml/Haskell/Rust.

              But the syntax for variable binding is idiosyncratic and the opposite of normal pattern languages. Writing “x” doesn’t bind the thing at the position to the variable x; instead, you have to write e.g. foo @x to bind x to the child of type foo. Insanely, some Scheme dialects use @ with the exact opposite semantics!! There’s also a bizarre # syntax for conditionals and statements.

              Honestly there isn’t really an excuse for how weird they made the pattern syntax given that people have spent decades working on pattern matching for everything from XML to objects (even respecting abstraction!). I’ve slowly been souring on treesitter in general, but paraphrasing Stroustrup: there are things people complain about, and then there are things nobody uses.

              • ckolkey 9 months ago

                It's just a Scheme dialect. A bit odd, but not crazy.

                • jonathanyc 9 months ago

                  Not really. It uses S-expressions but Scheme pattern matching is totally different. The most common Scheme pattern matching syntax is basically the same as pattern matching in any other language: x means “bind the value at this position to x”, not “the child node of type”. See: https://www.gnu.org/software/guile/manual/html_node/Pattern-... or syntax-rules.

                  It’s as much a Scheme dialect as WASM’s S-expression form is a Scheme dialect.

                  Treesitter’s query syntax is slightly understandable in the sense that having x match a node among siblings of type x works well for extracting values out of sibling lists. Most conventional pattern syntaxes struggle with this, e.g. how do you match the string “foo” inside of a list of strings in OCaml or Rust without leaving the match expression and resorting to a loop?

                  But you could imagine a syntax-rules-like use of ellipses (…). There’s also a more powerful pattern syntax someone worked on for implementing Scheme-like macros in non-S-expression-based languages, whose name escapes me right now.

          • mhw 9 months ago

            I’ve been looking at codemod tools recently, just as a way to extend my editing toolbox. I came across https://ast-grep.github.io/, which looks like it might address part of this problem. My initial test case was to locate all calls to a method where a specific argument was ‘true’, and it handled that well - that’s the kind of thing an IDE seems to struggle with. I’m not yet sure whether it could handle renaming a variable though.

            I guess what I’m looking for is something that

            * can carry out the kind of refactorings usually reserved for an IDE

            * has a concise language for representing the refactorings so new ones can be built quite easily

            * can apply the same refactoring in multiple places, with some kind of search language (addressing the task of renaming a test parameter in multiple methods)

            * ideally does this across multiple programming languages

            • _jayhack_ 9 months ago

              Interesting refactor!

              This is trivial with codegen.com. Syntax below:

                # Iterate through all files in the codebase
                for file in codebase.files:
                    # Check for functions with the pytest.fixture decorator
                    for function in file.functions:
                        if any(d.name == "fixture" for d in function.decorators):
                            # Rename the 'db' parameter to 'database'
                            db_param = function.get_parameter("db")
                            if db_param:
                                db_param.set_name("database")
                                # Log the modification
                                print(f"Modified {function.name}")
              
              Live example: https://www.codegen.sh/codemod/4697/public/diff
              • poincaredisk 9 months ago

                Consider indenting your code block, it's unreadable as it is now.

                • _jayhack_ 9 months ago

                  Good call, thank you

                • jesus_meza 9 months ago

                  That's pretty sick. Super readable with python :)

                  Is each file getting parsed individually with tree-sitter or how is the codebase object constructed?

                  • _jayhack_ 9 months ago

                    We do advanced static analysis to provide programmatic access to the type system, etc., based on tree-sitter and in-house tech.

                    This enables APIs such as `function.call_sites`, `symbol.usages`, `class.parent_classes`, and more!

                    • pksunkara 9 months ago

                      Where can I learn more about this? You guys don't seem to have any docs available.

                • seanhunter 9 months ago

                  Tree-sitter is really powerful, but it's worth people learning a few methods they prefer to use because there are going to be situations where one method works better than another. Things I have found useful in the past include

                  - perl -pi -e 's/foo/bar/g' files

                  "-p" loops over the input and prints each line after running the expression, and "-i" edits the files in place. If you have a purely mechanical change like he's doing here it's a very reasonable choice. If you're not as much of a cowboy as I am, you can give "-i" a suffix and it will back the files up, so something like

                  perl -p -i.bak -e 's/db/database/g' *.py

                  For example, all your original '.py' files will then be copied to '.py.bak' and the new renamed versions will be '.py'

                  For vim users (I know emacs has the same thing but I don't remember the exact invocation because it has been >20 years since I used emacs as my main editor) it's worth knowing the "global" command. It lets you execute a particular command only on lines that match some regex. So say you want to delete all the lines which mention cheese

                  :%g/cheese/d

                  Say you want to replace "db" with "database" but only on lines which start with "def"

                  :%g/^def/s/db/database/

                  OK cool. Now if you go 'vim *py' you can do ":argdo g/^def/s/db/database/ | update" and it will perform that global command across all the files in the arg list and save the ones which have changed.

                  • Jackevansevo 9 months ago

                    Author here: I'm super familiar with this kind of find and replace syntax inside vim or with sed. Usually it works great!

                    But in this specific situation it was tricky to handle situations with things spanning over multiple lines + preventing accidental renames.

                    • tmoertel 9 months ago

                      For those tricky situations, there's "sledgehammer and review" and the second-order git-diff trick:

                      https://blog.moertel.com/posts/2013-02-18-git-second-order-d...

                      • seanhunter 9 months ago

                        I realise that and like the article. I was trying to convey in my response that devs should have these things in their toolkit not that you "did the wrong thing"[1] somehow by using treesitter for this.

                        [1] like that's even possible in this situation

                      • aulin 9 months ago

                        About the cowboy comment, that's what version control is for. Just modify in place and then stage hunk by hunk with magit or git add -p.

                        • _whiteCaps_ 9 months ago

                          I'd reach for argdo as well - but I don't think this covers his use case of:

                          > every instance of a pytest fixture

                          Although it's probably good enough for 99% of the use cases, and any extra accidental renames could be reverted when you look at the diff.

                          Maybe it could be covered with a multi line regex using `\_.`

                        • morgante 9 months ago

                          Nice (simple) introduction to the tree sitter APIs.

                          If you're looking for a higher level interface, GritQL[0] is built on top of tree-sitter and could handle the same refactor with this query:

                            language python
                          
                            `def $_($_): $_` as $func where $func <: contains `database` => `db`
                          
                          
                          [0] https://github.com/getgrit/gritql
                          • ievans 9 months ago

                            I wrote up a Semgrep rule as a comparison to add! (also tree-sitter based, `pip install semgrep`, https://github.com/semgrep/semgrep, or play with the live editor link: https://semgrep.dev/playground/s/nJ4rY)

                                pattern: |-
                                   def $FUNC(..., database, ...):
                                       $...BODY
                                fix: |-
                                  def $FUNC(..., db, ...):
                                      $...BODY
                            • otteromkram 9 months ago

                              Everyone's tossing in the name of other third-party packages, but have you explored the language section from Python's standard library?

                              https://docs.python.org/3/library/language.html

                              • benrutter 9 months ago

                                As I read it, I was also thinking that Python's `ast` module could be swapped in for tree-sitter, and I think it would work more or less the same (not sure it would have an advantage, though, unless you prefer standard-library tools where possible)

                              • desbo 9 months ago

                                Would’ve been easy with fastmod: https://github.com/facebookincubator/fastmod

                                • westurner 9 months ago

                                  > I do wish tree-sitter had a mechanism to directly manipulate the AST. I was unable to simply rename/delete nodes and then write the AST back to disk. Instead I had to use Jedi or manually edit the source (and then deal with nasty off-set re-parsing logic).

                                  Or libCST: https://github.com/Instagram/LibCST docs: https://libcst.readthedocs.io/en/latest/ :

                                  > LibCST parses Python 3.0 -> 3.12 source code as a CST tree that keeps all formatting details (comments, whitespaces, parentheses, etc). It’s useful for building automated refactoring (codemod) applications and linters.

                                  libcst_transformer.py: https://gist.github.com/sangwoo-joh/26e9007ebc2de256b0b3deed... :

                                  > example code for renaming variables using libcst [w/ Visitors and Transformers]

                                  Refactoring because it doesn't pass formal verification: https://deal.readthedocs.io/basic/verification.html#backgrou... :

                                  > 2021. deal-solver. We released a tool that converts Python code (including deal contracts) into Z3 theorems that can be formally verified

                                  • westurner 9 months ago

                                    Vim python-mode: https://github.com/python-mode/python-mode/blob/e01c27e8c17b... :

                                    > Pymode can rename everything: classes, functions, modules, packages, methods, variables and keyword arguments.

                                    > Keymap for rename method/function/class/variables under cursor

                                      let g:pymode_rope_rename_bind = '<C-c>rr'
                                    
                                    python-rope/ropevim also has mappings for refactorings like renaming a variable: https://github.com/python-rope/ropevim#keybinding :

                                      C-c r r   :RopeRename
                                      C-c f     find occurrences 
                                    
                                    https://github.com/python-rope/ropevim#finding-occurrences

                                    Their README now recommends pylsp-rope:

                                    > If you are using ropevim, consider using pylsp-rope in Vim

                                    python-rope/pylsp-rope: https://github.com/python-rope/pylsp-rope :

                                    > Finding Occurrences: The find occurrences command (C-c f by default) can be used to find the occurrences of a python name. If unsure option is yes, it will also show unsure occurrences; unsure occurrences are indicated with a ? mark in the end. Note that ropevim uses the quickfix feature of vim for marking occurrence locations. [...]

                                    > Rename: When Rename is triggered, rename the symbol under the cursor. If the symbol under the cursor points to a module/package, it will move that module/package files

                                    SpaceVim > Available Layers > lang#python > LSP key Bindings: https://spacevim.org/layers/lang/python/#lsp-key-bindings :

                                      SPC l e  rename symbol
                                    
                                    Vscode Python variable renaming:

                                    Vscode tips and tricks > Multi cursor selection: https://code.visualstudio.com/docs/getstarted/tips-and-trick... :

                                    > You can add additional cursors to all occurrences of the current selection with Ctrl+Shift+L. [And then rename the occurrences in the local file]

                                    https://code.visualstudio.com/docs/editor/refactoring#_renam... :

                                    > Rename symbol: Renaming is a common operation related to refactoring source code, and VS Code has a separate Rename Symbol command (F2). Some languages support renaming a symbol across files. Press F2, type the new desired name, and press Enter. All instances of the symbol across all files will be renamed

                                    • morningsam 9 months ago

                                      But does Rope understand pytest fixtures? I doubt it, but would be happy to be proven wrong.

                                      • westurner 9 months ago

                                        Yeah, in that case `sed` and `git diff` with one or more filenames in a variable might do.

                                        Because pytest resolves fixtures by parameter name when it collects tests, renaming fixtures is tough. In Jupyter notebooks, %%ipytest is also necessary to run functions that start with test_ and to apply pytest's assertion rewriting; e.g. a failing `assert a == b, error_expr` then produces a detailed AssertionError message, much like `assertEqual(a, b, error_expr)` would, even for comparisons of large lists and strings.

                                • caeruleus 9 months ago

                                  There is a Python library/tool called Bowler (https://pybowler.io/docs/basics-intro) that allows selecting and transforming elements on a concrete syntax tree. From my limited experience with it, I guess it would have been a nice fit for this refactoring.

                                • herrington_d 9 months ago

                                  This can also be achieved via ast-grep by the command `sg -p 'database' -r 'db'`.

                                  https://ast-grep.github.io/guide/rewrite-code.html

                                  • 147 9 months ago

                                    I've always wanted to do mechanical refactors and recently ran into the problem the author ran into where tree-sitter can't write back the AST as source. Is there an alternative that is able to do this for most programming languages?

                                    • nemoniac 9 months ago

                                      There are several straightforward ways to do this without needing Tree-sitter or Jedi.

                                      Here are two approaches in Emacs.

                                      https://emacs.stackexchange.com/a/69571

                                      https://rigsomelight.com/2010/02/14/emacs-interactively-find...

                                      • IanCal 9 months ago

                                        Is that the same? That looks like just a text based replacement.

                                      • ruined 9 months ago

                                        what are some other tools like jedi? it would be cool to have a list of the favored tool for each language, or a meta-tool.

                                        there's tsmod at least https://github.com/WolkSoftware/tsmod

                                        i've heard of fastmod, codemod but never used them.

                                        • rty32 9 months ago

                                          In the JavaScript world, jscodeshift and its upstream tool recast are frequently used. I believe you could do the same thing with esbuild and some Rust-based tools, but these two are probably the most popular.

                                        • iamcreasy 9 months ago

                                          Would it be possible to have Python parse its own AST and rename those variables?

                                          • 29athrowaway 9 months ago

                                            Use a query expression instead.

                                            • pbreit 9 months ago

                                              I'm wondering if this would be fairly easy to do with AI?

                                              • gloflo 9 months ago

                                                What kind of "AI"? LLM-based hype would probably miss random ones.

                                                • xrd 9 months ago

                                                  Check out the gritql example from morgante. That does a lot of cool things and is what you are looking for.

                                              • nfrankel 9 months ago

                                                I wonder if the author has ever heard of something called an IDE?

                                                • ErikBjare 9 months ago

                                                  I think this particular case would be difficult to refactor even in an IDE like PyCharm, which afaik is the best at refactoring Python (might be outdated).

                                                  • Jackevansevo 9 months ago

                                                    Author here, I'm not aware of any IDE that can do this specific refactor

                                                    • morningsam 9 months ago

                                                      PyCharm understands pytest fixtures and if this is really just about a single fixture called "database", it takes 3 seconds to do this refactoring by just renaming it.

                                                    • fiddlerwoaroof 9 months ago

                                                      IDEs are great if your refactorings fit in the predefined refactorings

                                                      • lispisok 9 months ago

                                                        yes but how does the IDE do it?

                                                        • rustyminnow 9 months ago

                                                          What's an IDE and how does it refactor hundreds of semantically unrelated identifiers in one go?

                                                          • cstrahan 9 months ago

                                                            I’m not a Python developer, but…

                                                            I believe the idea is that those identifiers are semantically related: that fixture decorator inspects the formal parameter names so that it can pass the appropriate arguments to each test when the tests are run. A sufficiently smart IDE and/or language server would thus know that these identifiers are related, and performing a rename on one instance would thus rename all of the others.

                                                            And maybe you were being facetious, but an IDE is an “Integrated Development Environment”.

                                                            Edit: Yep. Took all of 60 seconds to find what I’m looking for, as I type this from my phone while sitting in my throne room: https://docs.pytest.org/en/6.2.x/fixture.html

                                                            See the “Fixtures can request other fixtures” section, which describes the scenario from TFA.

                                                            And this post describes the PyCharm support for refactoring fixtures: https://www.jetbrains.com/guide/pytest/tutorials/visual_pyte...

                                                          • 1-more 9 months ago

                                                            Write instructions on how to do this in any IDE.

                                                            • morningsam 9 months ago

                                                              In PyCharm: Move the cursor onto any occurrence or definition of the "database" fixture, press the "Rename" hotkey (Shift+F6), delete the old name and type the new name, then press Enter to confirm.

                                                              • 1-more 9 months ago

                                                                Wouldn’t that be limited to the single function?

                                                                • morningsam 9 months ago

                                                                  A single fixture, yes. If there are many fixtures of the same name in different test modules, it wouldn't work, but that's not how I understood the problem in the blog post, which says

                                                                  >rename every instance of a pytest fixture from database -> db

                                                                  Every instance of a fixture, not every instance of all fixtures of the same name.

                                                              • cstrahan 9 months ago

                                                                The fine folks at JetBrains already have done just that:

                                                                https://www.jetbrains.com/guide/pytest/tutorials/visual_pyte...

                                                                • 1-more 9 months ago

                                                                  I don’t think that this will handle what the author wants, though. If each function takes db as an argument and uses it internally, those don’t count as references that you can change globally, right?