• nayuki 4 hours ago

    Oddly enough, BitTorrent's bencode format is a significant subset of JSON but is much easier to parse as a binary format. Bencode only supports integers, byte strings, lists, and dictionaries.

    I wrote a more detailed comparison on: https://www.nayuki.io/page/bittorrent-bencode-format-tools
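
    A decoder fits in a few lines of Python. Here is a rough sketch for illustration (not the code from the page above):

        def bdecode(data, i=0):
            """Decode one bencode value from bytes; returns (value, next_index)."""
            c = data[i:i+1]
            if c == b"i":                            # integer: i<digits>e
                end = data.index(b"e", i + 1)
                return int(data[i+1:end]), end + 1
            if c == b"l":                            # list: l<values>e
                i, items = i + 1, []
                while data[i:i+1] != b"e":
                    value, i = bdecode(data, i)
                    items.append(value)
                return items, i + 1
            if c == b"d":                            # dictionary: d<key value ...>e
                i, d = i + 1, {}
                while data[i:i+1] != b"e":
                    key, i = bdecode(data, i)
                    d[key], i = bdecode(data, i)
                return d, i + 1
            colon = data.index(b":", i)              # byte string: <length>:<bytes>
            n = int(data[i:colon])
            return data[colon+1:colon+1+n], colon + 1 + n

        print(bdecode(b"d3:bar4:spam3:fooli1ei2eee"))
        # ({b'bar': b'spam', b'foo': [1, 2]}, 26)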

    • amelius an hour ago

      So if you want strings, you need to guess what encoding was used or store the encoding in another field? I don't think that makes it a much nicer format. I do like the ability to store byte strings directly.

      • sriram_malhar 3 hours ago

        I really like bencode. The only thing I miss is floats.

        • victorstanciu 2 hours ago

          You can use two integers: one that represents the entire number including decimals, and one that represents the precision, so you know how many decimals there are. For example, you'd represent "123.45" as 12345 and 2. That's often how monetary amounts are stored in databases, to avoid a lot of common floating-point arithmetic pitfalls.
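
          In Python the idea looks something like this (just a sketch; the helper names are made up):

              def to_scaled(s):
                  """Parse a decimal string like '123.45' into (digits, scale)."""
                  whole, _, frac = s.partition(".")
                  return int(whole + frac), len(frac)

              def add_scaled(a, b):
                  """Add two (digits, scale) pairs by aligning their scales first."""
                  (da, sa), (db, sb) = a, b
                  scale = max(sa, sb)
                  return da * 10 ** (scale - sa) + db * 10 ** (scale - sb), scale

              print(to_scaled("123.45"))             # (12345, 2)
              print(add_scaled((12345, 2), (5, 1)))  # (12395, 2), i.e. 123.95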

          • OJFord 2 hours ago

            Or just '123' & '45'?

            • zinekeller 2 hours ago

              I think it is to optimize arithmetic operations. Significantly fewer steps with the first method, which only requires adjusting how many digits are treated as decimals, rather than the rejoin, do arithmetic, separate again that your proposal needs. Plus, a wider range of representable values.

              • noam_k an hour ago

                But then you can't tell the difference between 0.12 and 0.00012.

                Unless you're suggesting using the strings "0" and "00012", at which point you could just use a byte string with the UTF-8 encoding of the value.

              • thrance 2 hours ago

                But that's just floats with extra steps? Floats have two parts in their binary representation: mantissa and exponent (and sign), which correspond exactly to your "entire number" and "precision", only in base 2 instead of 10.
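
                For instance, Python exposes the base-2 pair directly; compare 12345 * 10**-2 from the scheme above:

                    import math
                    m, e = math.frexp(123.45)   # m * 2**e == 123.45, with 0.5 <= m < 1
                    print(m, e)                 # 0.964453125 7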

                • wavemode 2 hours ago

                  the difference being that with integers you never end up with rounding errors when doing addition, subtraction, or multiplication (only division)
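
                  A quick illustration in Python:

                      print(0.1 + 0.2 == 0.3)   # False: the float sum rounds in base 2
                      print(1 + 2 == 3)         # True: (1, 1) + (2, 1) -> (3, 1), exactly 0.3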

              • skerit 4 hours ago

                Oh, I didn't know about Bencode. It looks interesting. Thank you for sharing!

              • braggerxyz 3 hours ago

                I agree, and I feel like the reason for this is the mere existence of 'jq'. Without 'jq', working with JSON in a Unix shell would be a lot more uncomfortable, but not impossible.

                • rffn 3 hours ago

                  The article suggests that GNU Awk might soon improve its understanding of JSON.

                  Can somebody please shed some light on this? Will gawk get JSON support? Or is it already there and I just need a recent version?

                • enriquto 2 hours ago

                  sort of agree... but only because you can gron it to remove the madness and then grep/cut/sed/awk the output like a human being.

                  JSON is just a modern flavor of XML, and in a few years we'll likely mock both of them as if they were the same silly thing. They are functionally equivalent: not human-writable, cumbersome, crufty, with unclear semantics and usage, and all around ugly.

                  • maccard an hour ago

                    I unfortunately write a bit of both XML and JSON as part of my day to day. JSON is significantly easier to read and write as a human. So many XML files use a combination of the file structure, the nodes, and the attributes to encode their data, meaning that to parse one you need to know the specifics of how it was written. JSON is much simpler and 95% of the time can be thought of as map<string, JsonObject>, and it just works.

                    YAML goes too far on the brevity side: I find the 2-space indent, the use of '-' as a list delimiter, the whitespace sensitivity, and the confusing behaviour with quoted vs. unquoted strings incredibly hard to work with.

                    • enriquto 33 minutes ago

                      > 95% of the time can be thought of as map<string, JsonObject>

                      But for that case you don't need JSON. A Dockerfile-like text file with lines of the form

                          STRING other stuff
                      
                      is trivial to parse in any language and without requiring any library. And it's actually human-editable.

                      Using JSON for such trivial stuff is like using a linear algebra library to compute b/a (the quotient of two numbers) by calling linalg.solve([[a]],[b]). Of course it will work. Of course it is more general. But it feels silly.
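
                      For instance, in Python (a sketch; the file name is hypothetical):

                          settings = {}
                          with open("app.conf") as f:          # hypothetical file
                              for line in f:
                                  line = line.strip()
                                  if line and not line.startswith("#"):
                                      key, _, rest = line.partition(" ")
                                      settings[key] = rest
                          print(settings)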

                  • IcyWindows 3 hours ago

                    This is part of the reason I love working with PowerShell. I like having things already in a JSON-like format by default.

                    • kstenerud 2 hours ago

                      JSON solves enough problems and is a simple enough format to become ubiquitous. My only beef is with the serialization/deserialization costs.

                      That's why I've made a 1:1 binary format:

                      https://github.com/kstenerud/bonjson

                      • sim7c00 2 hours ago

                        About time to move from undefined to at least something... programs should have clear interfaces for input and output. I'm sure there were sound reasons, but slapping wads of unstructured text to and fro in 2025 seems almost primordial -_-

                        • skerit 4 hours ago

                          I like that more and more CLI tools are implementing a JSON output mode, like `ip -j a`.

                          • JasonFive 3 hours ago

                            I prefer JSON5 myself, but it's not well supported, unfortunately.

                            https://json5.org/
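
                            A taste of what it allows on top of plain JSON (an illustrative snippet, based on the features listed at the link above):

                                {
                                  // comments are allowed
                                  unquoted: 'and single-quoted strings too',
                                  trailing: [1, 2, 3,],   // trailing commas are fine
                                  hex: 0xDEADBEEF,
                                }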

                            • silon42 2 hours ago

                              Not sure why they had to add additional whitespace characters... also, single-line comments seem problematic in this respect... machine-readable JSON is often all on one line.

                            • teddyh 2 hours ago

                              • Ygg2 4 hours ago

                                JSON is a human-readable format. Machine-readable means that only a specialized program can make it understandable to a human.

                                • AndrewDucker 4 hours ago

                                  Machine readable just means a machine can read it. Whether humans can read it as well is irrelevant to the definition.

                                  (And there are pretty much no formats that a human hasn't learned to read, up to and including binary)

                                  • Ygg2 4 hours ago

                                    So every text is machine-readable? Because even English can now be read by a machine via an LLM. Then why say "machine-readable"? Just say text.

                                    • AndrewDucker 4 hours ago

                                      It generally means something with a well specified format which can be processed by a parser.

                                      • Ygg2 4 hours ago

                                        Technically, an LLM can be a parser, if you tell it to output its findings in another data format, as:

                                             A parser is a software component that takes input data (typically text) and builds a data structure
                                        • DonHopkins 4 hours ago

                                          Parsers don't hallucinate.

                                          • raverbashing 4 hours ago

                                            That is a very good way of shooting yourself in the foot

                                            Now every API call is a "call to an LLM"? How much will that cost? Oh and how are you calling the LLM API in the first place?

                                            • Ygg2 3 hours ago

                                              Keep in mind, I take "machine-readable" to mean readable only by a machine. I.e. a magnetic tape is machine-readable, while JSON is human-readable.

                                              • javawizard 3 hours ago

                                                > I consider what's machine-readable to mean [...]

                                                I hate to break it to you but the world has a very different and well agreed-upon definition of what "machine readable" means.

                                                You're going to get nowhere if you continue to argue that your definition is the correct one. That ship sailed long ago.

                                                • Ygg2 an hour ago

                                                  Ok. But what is machine-readable, then?

                                                  Is a picture of my passport machine-readable? Is a PDF machine-readable? How do you classify them? And what happens ten years in the future, once algorithms become more optimized? If an AI can read Shakespeare and parse its paragraphs for verbed nouns, is everything humans have written then machine-readable?

                                                  • iinnPP 26 minutes ago

                                                    The picture of your passport is machine-readable to any machine that can read it. That is not all machines.

                                                    The significance of JSON, and the submission itself, seems to be in the ubiquity of JSON making it more machine-readable than other formats.

                                                    You're being too strict with language and definitions.

                                                • oneeyedpigeon 3 hours ago

                                                  That's a minority definition. Few of the rest of us would say "JSON isn't machine-readable".

                                          • cladopa 3 hours ago

                                            At 2 tokens per second? Needing a network connection, or sending your data (including confidential data) to the cloud? Needing very expensive GPUs? With hallucinations and tremendous energy expense?

                                            I usually parse things at a rate of 2-20 million tokens per second, on a local computer. It never hallucinates and is always exact.

                                            Don't get me wrong. I use LLMs a lot, but they are good at what they are good at.

                                        • Philpax 3 hours ago

                                          "Trivially parseable by machines" is not mutually exclusive with "Trivially parseable by humans". JSON is both.

                                          • discreteevent 4 hours ago

                                            It's both. "Readable" means "possible to read". On the human-readability side, though, the obligation to quote string keys irks me a lot.

                                            • Ygg2 4 hours ago

                                              Having had the displeasure of writing parsers for YAML, the ability to start a word with a valid token symbol in an unquoted string (for example :this or ,this or &:hate_you) is so limiting and prevents many optimizations.

                                            • cladopa 3 hours ago

                                              No. It means that the parsing code is trivial to write, instead of needing some kind of LALR, SLR, or Earley monstrosity with hundreds of megabytes of grammar tables in memory, just to understand the output of a program like "ls" or "find" or "grep".

                                              JSON is ALSO easy for machines to read. I know because I made several JSON parsers myself. I have also made parsers for computer languages (with things like PEG) and natural languages, and there is a world of difference.
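
                                              A hand-rolled recursive-descent JSON parser really is tiny. A rough Python sketch (string escapes and number exponents omitted for brevity; numbers come back as floats):

                                                  def skip(s, i):
                                                      """Advance past whitespace."""
                                                      while i < len(s) and s[i] in " \t\r\n":
                                                          i += 1
                                                      return i

                                                  def parse(s, i=0):
                                                      """Parse one JSON value at s[i]; returns (value, next index)."""
                                                      i = skip(s, i)
                                                      c = s[i]
                                                      if c == '"':
                                                          end = s.index('"', i + 1)    # simplification: no \" escapes
                                                          return s[i + 1:end], end + 1
                                                      if c == '[':
                                                          arr, i = [], skip(s, i + 1)
                                                          while s[i] != ']':
                                                              value, i = parse(s, i)
                                                              arr.append(value)
                                                              i = skip(s, i)
                                                              if s[i] == ',':
                                                                  i = skip(s, i + 1)
                                                          return arr, i + 1
                                                      if c == '{':
                                                          obj, i = {}, skip(s, i + 1)
                                                          while s[i] != '}':
                                                              key, i = parse(s, i)
                                                              i = skip(s, i)           # s[i] is now the ':'
                                                              obj[key], i = parse(s, i + 1)
                                                              i = skip(s, i)
                                                              if s[i] == ',':
                                                                  i = skip(s, i + 1)
                                                          return obj, i + 1
                                                      for lit, val in (("true", True), ("false", False), ("null", None)):
                                                          if s.startswith(lit, i):
                                                              return val, i + len(lit)
                                                      j = i                            # number (no exponents here)
                                                      while j < len(s) and s[j] in "-+.0123456789":
                                                          j += 1
                                                      return float(s[i:j]), j

                                                  print(parse('{"a": [1, 2.5, true], "b": null}')[0])
                                                  # {'a': [1.0, 2.5, True], 'b': None}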

                                              • rusk 3 hours ago

                                                JSON is machine-readable first and human-readable for convenience. Its primary purpose is machine-to-machine communication, but it does allow human intervention. People have used that as a signal to start using it for configuration files, but that is an anti-pattern in my opinion.

                                                If you want something like JSON that is designed for humans take a look at YAML

                                                • orphea 2 hours ago

                                                  I will pick JSON over YAML every single time. Not because JSON is so good; it's just that YAML is so much more cancerous.

                                                  • yas_hmaheshwari an hour ago

                                                    Can you explain why YAML is cancerous? (Genuine question.)

                                                    I have always preferred (even without thinking about it) to use YAML for configuration files and kept JSON for interprocess communication.