• accounter8or 4 days ago

    I've always wanted this. You get so spoiled with Chrome DevTools when using JavaScript that you miss it when you don't have it.

  • zug_zug 4 days ago

    I've always used a proxy, like charles proxy, for this exact purpose.

    A neutral middle-man that gives exact timing/response data.

    • nghiatran_uit 4 days ago

      It's not as simple as it sounds; capturing Python traffic with Charles Proxy requires some code changes. For example, you might have to modify your Python code to use a proxy and accept Charles's self-signed certificate.
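
      For the "modify your code" route, a minimal stdlib sketch (assuming Charles on its default 127.0.0.1:8888; the CA path is a placeholder for wherever you exported the proxy's root certificate):

```python
import ssl
import urllib.request

# Route both schemes through the local debugging proxy
# (Charles listens on 127.0.0.1:8888 by default).
proxy = urllib.request.ProxyHandler({
    "http": "http://127.0.0.1:8888",
    "https": "http://127.0.0.1:8888",
})

# Trust the proxy's self-signed root certificate for HTTPS interception.
ctx = ssl.create_default_context()
# ctx.load_verify_locations("charles-root.pem")  # placeholder path

opener = urllib.request.build_opener(
    proxy, urllib.request.HTTPSHandler(context=ctx)
)
# opener.open("https://example.com")  # traffic now shows up in the proxy UI
```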

      If you need a 1-click solution, with no dependencies and no code required, check out Proxyman with Auto-Setup: https://docs.proxyman.io/automatic-setup/automatic-setup

      Works with all popular Python libs: requests, aiohttp, http.client, urllib3, etc.

      * Disclaimer: I'm Noah, creator of Proxyman. I knew the pain points of using Charles and decided to build a new tool to make life easier. Hope it helps you.

      • antman 3 days ago

        Is that only iOS?

        • nghiatran_uit 3 days ago

          Both, native macOS and iOS

      • actionfromafar 4 days ago

        That's fine if you can, but say you want to trace something deployed on stage, or, *shudders*, even production. Or you run tests against some CI and can only add Python.

        • zug_zug 4 days ago

          Well I would never use a script like this at a real job outside of local testing.

          What you'd want on stage is probably OpenTelemetry, which I believe has auto-instrumentation for all network calls. Then you have the data forever, and it's in a public, shared platform that is permissioned, that everybody knows, and that will still exist in 10 years.

        • cjbprime 4 days ago

          I'm really surprised this was downvoted. Running e.g. mitmproxy and pointing your Python process at it is absolutely the way to do this.
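
          A sketch of the env-var route (mitmproxy listens on 127.0.0.1:8080 by default; most Python HTTP clients honor these variables):

```python
import os
import urllib.request

# Most Python HTTP clients (urllib, requests, httpx, ...) honor the
# standard proxy environment variables, so exporting them is often
# the only "setup" mitmproxy needs.
os.environ["HTTP_PROXY"] = "http://127.0.0.1:8080"
os.environ["HTTPS_PROXY"] = "http://127.0.0.1:8080"

proxies = urllib.request.getproxies_environment()
print(proxies)
# For HTTPS you'd additionally trust mitmproxy's CA, e.g. by pointing
# SSL_CERT_FILE (or REQUESTS_CA_BUNDLE) at ~/.mitmproxy/mitmproxy-ca-cert.pem
```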

          • sureglymop 2 days ago

            This is how I do it. I use it for pretty much everything where I want to proxy and inspect the traffic, really.

        • Jugurtha 4 days ago

          That's pretty cool! I was playing last night and implemented resumable downloads[0] for pip so that it could pick up where it stopped after a network disconnect or a user interruption. It sucks when large packages, especially ML-related ones, fail at the last second and pip has to download from scratch. This tool would have been nice to have. Thanks a bunch.

          - [0]: https://asciinema.org/a/1r8HmOLCfHm40nSvEZBqwm89k

          • westurner 3 days ago

            It is important to check checksums (and signatures, if there are any) of downloaded packages prior to installing them; especially when resuming interrupted downloads.

            Pip has a hash-checking mode, but it only works if the hashes are listed in the requirements.txt file, and they're the hashes for the target platform. Pipfile.lock supports storing hashes for multiple platforms, but requirements.txt does not.

            If the package hashes are retrieved over the same channel as the package, they can be MITM'd too.

            You can store PyPI package hashes in Sigstore.

            There should be a way for package uploaders to sign their packages before uploading. (This is what .asc signatures on PyPI were for. But if they are retrieved over the same channel, cryptographic signatures can also be MITM'd.)

            IMHO (1) twine should prompt to sign the package (with a DID) before uploading the package to PyPI, and (2) after uploading packages, twine should download the package(s) it has uploaded to verify the signature.

            Also, a download resumed with Content-Range after a TCP RESET isn't itself hashed by the protocol.

            • Jugurtha 3 days ago

              Thanks for the pointers. The diff is tiny and deals only with resuming downloads, i.e. everything else is left as-is.

          • judofyr 4 days ago

            Looks neat!

            A similar tool for this would be VCR (originally built in Ruby, but since ported to other languages): https://vcrpy.readthedocs.io/en/latest/. It injects itself into the request pipeline and records the result in a local file, which can then be replayed later in tests. It's quite a nice approach when you want to write tests for (or just explore) a highly complicated HTTP API without actually hitting it all the time.
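
            The cassette idea can be illustrated with a toy stdlib version (just the pattern, not vcrpy's actual API; the URL and file path here are made up):

```python
import json
import pathlib

def cached_get(url, fetch, cassette):
    """Serve `url` from the cassette file, recording it on a first miss."""
    path = pathlib.Path(cassette)
    tape = json.loads(path.read_text()) if path.exists() else {}
    if url not in tape:                    # record...
        tape[url] = fetch(url)
        path.write_text(json.dumps(tape))
    return tape[url]                       # ...replay

# A fake transport stands in for a real HTTP call so the sketch is runnable.
calls = []
def fake_fetch(url):
    calls.append(url)
    return {"status": 200, "body": "hello"}

pathlib.Path("/tmp/tape.json").unlink(missing_ok=True)   # start fresh
first = cached_get("https://api.example.com/x", fake_fetch, "/tmp/tape.json")
second = cached_get("https://api.example.com/x", fake_fetch, "/tmp/tape.json")
print(first == second, len(calls))   # identical responses, one real "request"
```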

            • cle-b 4 days ago

              I really like vcrpy. I used it a lot with pytest in my previous job. httpdbg isn't exactly the same; the idea is more about seeing HTTP requests in real time and being able to study them easily.

              • seanc 4 days ago

                The inspection and debugging features this offers are great additions, though. I've stared at VCR YAML enough times to never want to do it again.

              • stuaxo 4 days ago

                This is great -

                It would be good to be able to have Django Debug Toolbar integration; that way I could see which requests were made to backend APIs without leaving Django.

                Having tried mitmproxy, something like httpdbg is definitely needed.

                • diegoallen 4 days ago

                  You can do that with Django Debug Toolbar. If you have an endpoint that doesn't return HTML, and hence wouldn't render the toolbar, you can go to the Django admin (or any other endpoint that would render DDT), open the History panel, check other requests, and switch to them.

                • nfgrars 4 days ago

                  Alternatively, use ngrep(1) for HTTP or openssl(1) for HTTPS.

                  • billconan 4 days ago

                    This is very useful, but why does it only work with Python code? At which level does it intercept the HTTP traffic?

                    Do I have to use a specific HTTP library?

                    • tredre3 4 days ago

                      It seems to intercept calls to some popular HTTP client libs:

                      https://github.com/cle-b/httpdbg/tree/main/httpdbg/hooks

                      • cle-b 4 days ago

                        It works only with Python code because it intercepts HTTP requests by hooking into certain Python functions.

                        It supports any HTTP library based on Python’s standard socket library. Specifically, it works with libraries like requests, httpx, aiohttp, and urllib3, as well as pytest, providing more detailed information about the initiator of the requests.

                      • cdfuller 4 days ago

                        Is there a way to use it with jupyter notebooks? `pyhttpdbg -m jupyter notebook` didn't work for me

                      • 10000truths 4 days ago

                        Is there a way to write the trace to a file, instead of spinning up a local web server?

                        • linuxdude314 3 days ago

                          Just use OpenTelemetry. No need to keep reinventing wheels.

                          • dmurray 4 days ago

                            This looks great. I have a use case for something similar: detecting calls to the file system. Lots of code I've inherited has a habit of loading configuration from some random network share, then failing when that config got moved or the production host doesn't have the same access.

                            I usually use strace(1) to track these down, but it's nowhere near as ergonomic as this tool. I'm wondering now if I could patch the `open` built-in instead.

                            • oefrha 3 days ago

                              CPython since 3.8 already has built-in audit events, including open, so you don't need to patch anything or use anything external. Just add an audit hook with sys.addaudithook().

                              Quick example:

                                  import inspect
                                  import pathlib
                                  import sys
                              
                              
                                  def callsite():
                                      try:
                                          pathlib.Path("/tmp/file").open()
                                      except:
                                          pass
                              
                              
                                  def audit_hook(event, args):
                                      if event == "open":
                                          path, mode, flags = args
                                          print(f"audit: open({path!r}, {mode!r}, 0o{flags:o})")
                                          # Not using traceback here because traceback will attempt to read the
                                          # source file, causing an infinite recursion of audit events.
                                          f = inspect.currentframe()
                                          while f := f.f_back:
                                              print(
                                                  f'File "{f.f_code.co_filename}", line {f.f_lineno}, in {f.f_code.co_name}'
                                              )
                              
                              
                                  def main():
                                      sys.addaudithook(audit_hook)
                                      callsite()
                              
                              
                                  if __name__ == "__main__":
                                      main()
                              
                              Prints:

                                  audit: open('/tmp/file', 'r', 0o100000000)
                                  File "/path/to/python/lib/python3.12/pathlib.py", line 1013, in open
                                  File "/tmp/audit.py", line 10, in callsite
                                  File "/tmp/audit.py", line 26, in main
                                  File "/tmp/audit.py", line 30, in <module>
                              
                              https://docs.python.org/3/library/audit_events.html

                              • dmurray 3 days ago

                                Sounds perfect. I didn't know of this, but I think I'll start here.

                              • mcoliver 4 days ago

                                On Linux or Windows you can use Procmon, or Instruments on macOS.

                                https://github.com/Sysinternals/ProcMon-for-Linux

                                • zerocool2750 4 days ago

                                  Went spelunking through the source. I think you absolutely could!

                                  There's actually not a whole lot I found that's really HTTP-library specific. It uses the traceback module in a decorator that ends up being manually wrapped around all of the functions of the specific libraries the author cared about.

                                  https://github.com/cle-b/httpdbg/blob/main/httpdbg/hooks

                                  Should be easy enough to extend this to other libraries.

                                  Super cool tool, thanks for sharing @dmurray!

                                  • sYnfo 4 days ago

                                    You might find the syscall tracing functionality of Cirron useful: https://github.com/s7nfo/Cirron

                                  • hartator 4 days ago

                                    I wonder if the same exists for Ruby?

                                    • ricardo81 4 days ago

                                      I could be lost here (mainly a C/PHP/Node coder, going by the code I've used).

                                      Why is it a special case to track HTTP(S) requests, such that they couldn't be logged like any other process/function? I'd guess most people use libcurl, and you can wrap something around that.

                                      I guess I'm lost on why this is HTTP- or Python-specific, or if it is, fine.

                                      • cle-b 4 days ago

                                        Unlike other tools such as proxies that allow you to trace HTTP requests, httpdbg makes it possible to link the HTTP request to the Python code that initiated it. This is why it is specific to Python and does not work with other languages.
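
                                        A minimal sketch of the idea, with a stand-in function instead of a real HTTP library (httpdbg's actual hooks are more involved):

```python
import sys

log = []   # (function, call site) pairs

def hook(fn):
    """Wrap an HTTP entry point so each call records its initiator."""
    def wrapper(*args, **kwargs):
        caller = sys._getframe(1)   # the application code making the request
        log.append((fn.__name__,
                    f"{caller.f_code.co_filename}:{caller.f_lineno}"))
        return fn(*args, **kwargs)
    return wrapper

# Stand-in for e.g. requests.get; a real tool patches the library object.
def fake_get(url):
    return {"url": url, "status": 200}

fake_get = hook(fake_get)
resp = fake_get("https://example.com/api")
print(log)   # links the request back to the exact line that made it
```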

                                        • ricardo81 4 days ago

                                          I'm still not understanding.

                                          If you're coding something up, why wouldn't you know that a piece of code makes an HTTP(S) request? Based on what you said, it sounds like a scenario where a programmer doesn't know how a request was made. Are there examples of scenarios where that's the case?

                                          Sounds like a bit of a security nightmare where there's code doing arbitrary requests.

                                          • bityard 4 days ago

                                            Maybe you are working with an application or library that you didn't write, and want to see the raw requests and responses it generates without reading the entirety of the source code.

                                            Maybe you are generating HTTP requests through an API and need to see which headers it sets by default, or which headers are or are not getting set due to a misconfiguration or bug.

                                            There are probably loads more use cases, and if I actually did programming for a living, I could probably list a lot more.

                                            • ricardo81 4 days ago

                                              The 3rd party library stuff makes sense, to an extent. But then you're debugging a 3rd party library.

                                              • diegoallen 4 days ago

                                                If a 3rd party library you depend on has bugs, you have bugs. And you need to either submit a patch to the library or find a workaround.

                                                • ricardo81 4 days ago

                                                  Or just not use arbitrary 3rd party stuff hoping it works :)

                                                  libcurl is used on billions of devices across the world and has plenty of debugging capabilities.

                                                  MITM proxy works across all languages.

                                                  • fragmede 4 days ago

                                                    The NIH is strong in this one.

                                            • whirlwin 4 days ago

                                              Here's a concrete scenario for you: say you are on a team of 10 developers with a huge codebase that has accumulated over 5+ years. If you're new to the team, you may need to understand when a specific HTTP header is sent, or just snoop on a value in the payload you otherwise wouldn't be able to see.

                                              • ricardo81 4 days ago

                                                Snooping traffic isn't new though, so what's specific about this tool and Python?

                                                • whirlwin 4 days ago

                                                  How would you otherwise easily snoop on outgoing HTTPS traffic anyway? mitmproxy requires some work to set up.

                                              • golergka 4 days ago

                                                > If you're coding something up, why wouldn't you know that piece of code does a HTTP/s request?

                                                Because tracing all side-effects in a huge codebase with a lot of libraries and layers can be a daunting task.

                                                 Update: if you haven't worked with a 20-year-old, >1M LOC codebase that went through many different teams and doesn't have any documentation whatsoever, you may lack the necessary perspective to see the value of tools like this.

                                                • ricardo81 4 days ago

                                                   Sounds like people dealing with code when they have no idea what it does. No amount of tools is going to help with that.

                                                  • actionfromafar 4 days ago

                                                    I think you attract downvotes because tools are helpful. If you have a huge unknown codebase, it can be nice to attack it from different angles. Reading code is useful, but observing what it does in runtime can be useful, too. Also, with hairier code, it can be more useful to first observe and prod it like a black box.

                                                    • ricardo81 4 days ago

                                                      Just the 1 downvote.

                                                      Yes, "tools are helpful", but whether there's a python/http specific tool that doesn't do what more generic tools do remains to be seen.

                                            • seanc 4 days ago

                                              In the old days we'd use tcpdump and Wireshark for this, but nowadays everything is encrypted at the application layer, so you need this kind of thing. Or tricky key-dumping hacks.
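
                                              For what it's worth, one such hack: Python 3.8+ can write TLS secrets in the SSLKEYLOGFILE format that Wireshark understands (the path below is a placeholder):

```python
import ssl

ctx = ssl.create_default_context()
# Append this context's TLS session secrets to a key log file; setting
# the SSLKEYLOGFILE environment variable achieves the same thing globally.
ctx.keylog_filename = "/tmp/tls-keys.log"
# Connections made through ctx can now be decrypted in Wireshark by
# pointing its TLS preferences at the key log alongside a tcpdump capture.
print(ctx.keylog_filename)
```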

                                            • robertlagrant 4 days ago

                                              I think the nice thing about this for HTTP is that different parts of the stack can introduce default headers etc., and it's helpful to be able to see the actual request after all that processing has been done.
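
                                              With the stdlib, for instance, you can at least dump what has been attached to a request object before it goes out (urllib normalizes header names, and adds things like User-Agent and Host only at send time, which is exactly the invisibility problem):

```python
import urllib.request

req = urllib.request.Request(
    "https://example.com/api",
    headers={"Accept": "application/json"},
)
req.add_header("X-Debug", "1")   # urllib stores this normalized as "X-debug"

# Headers the request will carry so far; User-Agent/Host appear at send time.
print(req.header_items())
```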

                                              • ricardo81 4 days ago

                                                With curl there's always CURLOPT_VERBOSE as per the library.

                                            • Too 4 days ago

                                              I can recommend OpenTelemetry if you need a more comprehensive tool like this.

                                              There is a whole library of so-called instrumentations that can monkeypatch standard functions and produce traces of them.

                                              Traces can also propagate across processes and RPCs, giving you a complete picture, even in a microservice architecture.