Comments Page - Pantheon: Parsing command line arguments

Pantheon: Parsing command line arguments (traxys.me). Submitted by lukastyrychtr 9 months ago

IgorPartola 9 months ago
Ages ago I was writing a lot of daemons in Python. They all needed to be well behaved UNIX processes: they needed to be able to background themselves, write a PID file atomically, not start a second copy if a PID file for a live process existed, etc.
The main thing I wanted them to do is to behave well with configuration both from a config file and from command line arguments. Lots of libraries exist for either but none existed for both: I wanted to have something that would be able to correctly pull something like the PID file location from a config file but also let it be overridden by a command line argument. I also wanted it to be as simple to use as possible. The code is hosted at https://github.com/ipartola/groper
I am not sure if in today’s world of systemd this is useful at all. The code probably needs clean up since it was written to be compatible with both Python 2 and 3 at the time. But I have to say that I haven’t seen anything like this since I wrote this code. If there is interest I can take a look at modernizing this.
Also in general I appreciate programs, especially daemons, that are “well behaved”: a sensible config file format, sensible argument parsing, sane defaults, no backgrounding without an explicit configuration to do so, logging to stdout and stderr by default. I feel like there is less and less emphasis on stuff like this lately.
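For anyone who wants to see the "config file, overridden by command line" idea in concrete form, here is a minimal sketch using only the standard library (`configparser` and `argparse`). It is not groper's API; the `load_settings` helper, the `[daemon]` section, the `mydaemon` program name and the default paths are all made up for illustration.

```python
import argparse
import configparser

# Hypothetical hard-coded defaults (lowest priority).
DEFAULTS = {"pid_file": "/var/run/mydaemon.pid", "daemonize": "no"}


def load_settings(argv, config_text=""):
    """Merge defaults, an INI [daemon] section, and CLI flags, in that order."""
    config = configparser.ConfigParser()
    config.read_dict({"daemon": DEFAULTS})  # defaults first
    config.read_string(config_text)         # config file overrides defaults

    parser = argparse.ArgumentParser(prog="mydaemon")
    # default=None lets us tell "flag not given" apart from a real value.
    parser.add_argument("--pid-file", default=None)
    parser.add_argument("--daemonize", action="store_true", default=None)
    args = parser.parse_args(argv)

    section = config["daemon"]
    pid_file = args.pid_file if args.pid_file is not None else section["pid_file"]
    if args.daemonize is not None:
        daemonize = args.daemonize
    else:
        daemonize = section.getboolean("daemonize")
    return {"pid_file": pid_file, "daemonize": daemonize}
```

The only real design choice here is giving every flag a `None` default so that an absent flag never masks the config file value.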
- o11c 9 months ago
  It's definitely a field where there's still a need. I recently started writing yet another argument parser, with all the knowledge of my prior attempts ...
  One thing I'm really starting to lean toward is the idea of using a bespoke data file to actually define the options, and having that generate the code.
  My key observation is that actually parsing the arguments is kind of secondary; the most important thing is to be able to generate good --help text (I've collected many samples and want to be flexible enough for them all). And for config files, to be able to generate a commented "default config" + merge with given command-line arguments. A GUI (or at least a JSON API for the web) for interactive selection, as well as tab-completion for the CLI, are also important.
  Now, I'm currently halfway through implementing a knockoff of the Unicode line-breaking algorithm so that people can translate my help text properly (for a program nobody else will probably ever use) ... I've basically forgotten the project that needed the argument parser.
  There are actually more sources than those you mentioned:
  * hard-coded defaults
  * config files
  * ENV=variables (may be a single TOOLNAME_OPTS variable or separate per-option variables)
  * --options and @files (may be nested, beware cycles)
  Beware of lists (default is port 8080, but I want 8000 and 8001 - wait, 8008 instead).
  And parsing also needs to handle more than just GNU style arguments. Single-dash long options are pretty common.
  Some tricky tools that most libraries can't handle: chmod, dd, more, ps, tar.
  For fused arguments, make sure your testsuite includes errors for `-a-b` and `-a-`.
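To make the fused-argument test cases concrete, here is a rough sketch of a splitter for clusters of argument-less short flags, written in Python. The names `split_fused` and `OptionError` are hypothetical and not taken from any library; the sketch assumes none of the flags take a value, which is the case where `-a-b` and `-a-` have to be rejected.

```python
class OptionError(ValueError):
    """Raised for a malformed or unknown short-option cluster."""


def split_fused(arg, flags):
    """Split a cluster such as '-ab' into ['-a', '-b'].

    `flags` is the set of known single-letter options that take no argument.
    """
    if len(arg) < 2 or not arg.startswith("-") or arg.startswith("--"):
        raise OptionError(f"not a short-option cluster: {arg!r}")
    result = []
    for ch in arg[1:]:
        # A '-' inside the cluster is never a valid flag, so '-a-b' and
        # '-a-' both fail here instead of being silently accepted.
        if ch not in flags:
            raise OptionError(f"unknown option -{ch} in {arg!r}")
        result.append("-" + ch)
    return result
```

If `-a` did take a value, `-a-b` would instead be ambiguous between "flag a with value -b" and an error, which is presumably why both cases belong in a test suite.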
  IgorPartola 9 months ago
  Those are all great points. I do like the idea of a stand-alone file, especially since you could then do code generation, and if you care about type hints you could now have them.
  groper uses getopt built into the Python standard library as well as the ini file parser. This way there are no dependencies on external stuff. And it automatically generates help text with all the options and can automatically generate a sample config file with all the default options hard coded in the code. Those two things have made it invaluable for me when I used it extensively. I think adding environment variables to the mix would be a good idea.
  I am not sure what you mean by --options. By @files do you mean like what curl can do? If so I have always solved that by just doing something like --in or --file or just making the input file the last argument to the command.
  Lists are definitely tricky. If I recall correctly, the version I put on GitHub allows you to specify the same option multiple times in order to get a list, but I’m not sure that’s the cleanest way to do this. Possibly having custom option types that do splitting on a delimiter for something like port numbers could be good.
  In either case it’s good to know that this space still could use some work because that’ll be the motivation for me to dust off this project.
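A small sketch of both list approaches mentioned above (repeating the option, and a custom type that splits on a delimiter), using Python's `argparse` rather than groper. The `port_list` and `parse_ports` names and the `--port` flag are invented for the example. It also sidesteps the default-list problem: with `action="append"`, argparse appends command-line values after a non-empty default, so the default is applied by hand instead.

```python
import argparse


def port_list(text):
    """argparse `type` callable: '8000,8001' -> [8000, 8001]."""
    try:
        return [int(part) for part in text.split(",")]
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid port list: {text!r}")


def parse_ports(argv, default=(8080,)):
    parser = argparse.ArgumentParser(prog="server")
    # Repeatable option; each occurrence may hold a comma-separated list.
    # default=None keeps argparse from mixing user values into the default.
    parser.add_argument("--port", action="append", type=port_list, default=None)
    args = parser.parse_args(argv)
    if args.port is None:
        return list(default)  # nothing given: fall back to the default
    # Flatten [[8000], [8001, 8008]] into [8000, 8001, 8008].
    return [port for group in args.port for port in group]
```

With this, `--port 8000 --port 8001` and `--port 8000,8001` give the same list, and giving any `--port` at all replaces the 8080 default rather than extending it.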
- bbkane 9 months ago
  I'm writing something similar for Go: passed flags override env vars, which override the config file, which overrides defaults set in the code.
  With 4(!) sources of configuration to keep track of, I've also made --help print out the current value of each option and where it came from (e.g. the password came from an environment variable and the base_url came from a config file).
  It's been tremendously satisfying to use this library for my own projects, as everything works exactly the way I expect (and if it doesn't I change the library).
  Code at https://github.com/bbkane/warg if you want to compare it to your library :)
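The precedence chain described here (flags over env vars over config file over defaults, with the source reported alongside the value) can be sketched in a few lines. This is Python, not Go, and it is not warg's API: `resolve`, `describe` and the `MYTOOL_` prefix are hypothetical, and the sources are modelled as plain dicts.

```python
def resolve(name, default, config, env, flags, env_prefix="MYTOOL_"):
    """Return (value, source) for one setting, highest-priority source first."""
    env_key = env_prefix + name.upper()
    if name in flags:
        return flags[name], "flag"
    if env_key in env:
        return env[env_key], "env"
    if name in config:
        return config[name], "config"
    return default, "default"


def describe(defaults, config, env, flags):
    """Build help-style lines showing each current value and where it came from."""
    lines = []
    for name, default in defaults.items():
        value, source = resolve(name, default, config, env, flags)
        lines.append(f"{name} = {value} (from {source})")
    return lines
```

In a real tool `env` would be `os.environ` and `flags` would hold only the options the user actually passed, otherwise every value would appear to come from a flag.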
- joshka 9 months ago
  There's a discussion happening in a new Rust config management crate at https://github.com/cbeck88/conf-rs/discussions/1 about a similar sort of ideal.