• loresuso a day ago

    Hey! Thanks for publishing my tool, and thanks everybody for the great feedback here. I've just started addressing some of your points.

    Anyway, my need for the tool mostly came down to these few points:

    - scripting can be much easier with psc, especially since you can output exactly the fields you want

    - eBPF iterators are so flexible: we can get anything defined in the task_struct, even fields that are not exposed in the proc filesystem. This alone makes the tool extremely powerful, and adding a new field takes a reasonable amount of effort (see the sketch after this list)

    - I really like querying my system with a simple language. Sometimes I tend to forget specific ss, lsof, or ps options; this way it's much easier for me to get what I need

    - no traditional tooling has native container context. It could even be extended to retrieve data from the kubelet, for instance, but I'll think about that
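
    To give an idea of what an eBPF task iterator looks like in practice, here's a rough sketch (the object and pin path names are just illustrative, not psc's actual layout): a compiled task-iterator program can be pinned to bpffs with bpftool and then read like a file, and each read walks the kernel's live task list.

        # pin a compiled task-iterator object (name is illustrative) to bpffs,
        # then read it; the kernel runs the iterator program on each read
        bpftool iter pin ./task_iter.bpf.o /sys/fs/bpf/task_iter
        cat /sys/fs/bpf/task_iter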

    Feel free to reach out if you have any particular need

    • dvfjsdhgfv 21 hours ago

      Excellent work, thank you!

    • grantseltzer 2 days ago

      I've played with bpf iterators and written a post about them [1]. The benefit of iterating over tasks instead of scanning procfs is a pretty astounding performance difference:

      > I ran benchmarks on current code in the datadog-agent which reads the relevant data from procfs as described at the beginning of this post. I then implemented benchmarks for capturing the same data with bpf. The performance results were a major improvement.

      > On a linux system with around 250 Procs it took the procfs implementation 5.45 ms vs 75.6 us for bpf (bpf is ~72x faster). On a linux system with around 10,000 Procs it took the procfs implementation ~296us vs 3ms for bpf (bpf is ~100x faster).

      [1] https://www.grant.pizza/blog/bpf-iter/

      • tanelpoder 2 days ago

        And with eBPF iterators you can bail out early and move on to the next item when you see one that's uninteresting (or should be filtered out), instead of emitting textual data for every item and grepping/filtering it out later in post-processing.

        I use early bailout a lot (in 0x.tools xcapture) when iterating through all threads in a system and determining which ones are “active” or interesting.

        • rfl890 a day ago

          It took less time for 10,000 processes? Maybe you made a typo.

          • stefan_ 2 days ago

            procfs and "everything is a file" are up there with fork on the list of "terrible useless technology that is undeservedly revered".

          • yjftsjthsd-h 2 days ago

              # Find processes connected to a specific port
              psc 'socket.dstPort == uint(443)'
            
              # Filter by PID range
              psc 'process.pid > 1000 && process.pid < 2000'
            
            
            It seems weird to require the user to remember that ports have to be wrapped in uint() when nothing else appears to need it.

            • ralferoo 2 days ago

              PIDs haven't been limited to 16 bits for a long time. I guess the default integer in these things is 32-bit signed.

              But, yeah, this could be solved if uint were promoted to a larger type for the comparison.
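
              For what it's worth, CEL also has unsigned integer literals, so if psc hands the expression straight to CEL (an assumption on my part), the conversion could be spelled either way; a hypothetical sketch:

                  # hedged sketch, assuming raw CEL literal syntax is accepted
                  psc 'socket.dstPort == uint(443)'
                  psc 'socket.dstPort == 443u'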

            • WD-42 2 days ago

              This is neat, but to me the examples comparing the tool against piping to grep undercut the argument. A couple of pipes through grep seems much easier to remember and type, especially given all the quoting psc needs. For scripts where you need exact output, though, this looks great.

              • pstoll 2 days ago

                I’m the opposite: I much prefer a structured query language (ahem) for this type of thing. If I’m looking at someone’s script (i.e., my own, six months later), I much prefer to see the explicit structure being queried rather than “why are we grepping for foo, or grabbing the 5th field based on squashed spaces as the separator?”

                Nice use of CEL too. Neat all around.

              • mrbluecoat 2 days ago

                Thanks for including so many examples! Perhaps include one example of the output as well. Other than the mention of the optional '--tree' parameter, it's unclear whether the default result is a list, a table, JSON, etc.

                • dundarious 2 days ago

                  I like this tool. I just replaced a multi-step script for finding running processes with deleted files open (e.g., an updated shared library or binary) that used to go as follows (roughly sketched below):

                  - grep /proc/*/maps for " (deleted)" (needs root)

                  - exclude irrelevancies like paths starting with "/memfd:" (I have lots of other similar exclusions) with grep -v

                  - extract the pid from the filename part of grep's output with sed

                  - for each pid, generate readable output from /proc/$pid/cmdline (which is NUL separated) with tr, xargs, bash printf

                  - show the pid, cmdline, file path

                  Yes, this is what needs-restarting does too.
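
                  The old chain, reconstructed as a sketch from the steps above (exclusion list abbreviated, file path column omitted, and it has to run as root):

                      # rough sketch of the old pipeline, not the exact script
                      grep " (deleted)" /proc/[0-9]*/maps 2>/dev/null \
                        | grep -v "/memfd:" \
                        | sed 's|^/proc/\([0-9]*\)/maps:.*|\1|' \
                        | sort -u \
                        | while read -r pid; do
                            printf '%s\t' "$pid"
                            tr '\0' ' ' < "/proc/$pid/cmdline"
                            printf '\n'
                          done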

                  With this tool, this pipe chain is now just:

                      doas psc -o "process.pid,process.cmdline,file.path" \
                        'file.path.endsWith(" (deleted)") && !file.path.startsWith("/memfd:") && !...' \
                        | sed 1d

                  • mgaunard 2 days ago

                    I'm not convinced by the need to embed CEL. You could just output JSON and pipe it to jq.
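
                    Something like this hypothetical sketch (the JSON flag is an assumption, and the field names just mirror the nginx example above):

                        # hypothetical: emit JSON and filter with jq instead of CEL
                        psc --json | jq '.[] | select(.process.name == "nginx" and .process.user == "root")'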

                    • guerrilla 2 days ago

                      Sounds less efficient in both space and time.

                      • pstuart 2 days ago

                        I guess it's a matter of muscle memory and workflow. It's nice to have options.

                        • guerrilla 6 minutes ago

                          Fair enough. Letting the computer do the work instead of the brain/body.

                    • fellowmartian 2 days ago

                      An unfortunate name that triggers everybody who’s ever worked at Meta :)

                      • foobarqux 2 days ago

                        Their first example is bad:

                            ps aux | grep nginx | grep root | grep -v grep
                        
                        can be done instead (from memory, not at a Linux machine ATM):

                            ps -u root -C nginx
                        
                        which is arguably better than their solution:

                            psc 'process.name == "nginx" && process.user == "root"'

                        • xorcist 2 days ago

                          The commands in their example are not equivalent. The ps | grep thing searches the full command line, including arguments, while ps -C (and, presumably, the psc thing) matches only the process name.

                          Should you for some reason want the former, it's most easily done with:

                            pgrep -u root -f nginx
                          
                          which exists on almost all platforms, with the notable exception of AIX.

                          Their other slightly convoluted example is:

                            psc 'socket.state == established && socket.dstPort == uint(443)'
                          
                          which is much more succinct with:

                            lsof -i :443 -s TCP:ESTABLISHED

                          • dundarious 2 days ago

                            It has process.cmdline as well as .name.

                            • wang_li 2 days ago

                              Many new tools appear because people don't know how to use the existing tools, or because they think the existing tool is too complicated. In time the new tool becomes just as complicated as the old one, or more so, because there's a reason the old tool is complicated: the problem requires complexity.

                            • mxey 2 days ago

                              “ss” also has filters, no need for grep:

                                  ss -o state established '( dport = :ssh or sport = :ssh )'

                            • apopapo 2 days ago

                              > psc uses eBPF iterators to read process and file descriptor information directly from kernel data structures. This bypasses the /proc filesystem entirely, providing visibility that cannot be subverted by userland rootkits or LD_PRELOAD tricks.

                              Is there a trade-off here?

                              • mgaunard 2 days ago

                                I found this justification dubious. To me, the main reason to use eBPF is that it gives more information and has lower overhead.

                                • tempay 2 days ago

                                  It requires root.

                                  • mgaunard 2 days ago

                                    Running eBPF programs doesn't strictly require root.

                                    • cpuguy83 2 days ago

                                      It requires CAP_BPF, which is considered a highly privileged capability.

                                      So yes, it requires root in the sense of what people usually mean by root.

                                      • mgaunard 2 days ago

                                        You can also enable unprivileged eBPF.
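
                                        For reference, the knob is the kernel.unprivileged_bpf_disabled sysctl; a sketch of checking it and, where the kernel still allows it, changing it (note that once it's set to 1 it can't be turned back off without a reboot):

                                            # 0 = unprivileged BPF allowed, 1 = locked off until reboot, 2 = off but changeable
                                            sysctl kernel.unprivileged_bpf_disabled
                                            sudo sysctl -w kernel.unprivileged_bpf_disabled=0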

                                • zokier 2 days ago

                                  How about comparing it to something sensible like osquery instead of to silly strawman ps pipelines?