Don't let dicts spoil your code (roman.pt)
Submitted by juniperplant 8 hours ago
  • cardanome 3 hours ago

    This is absolute key advice.

    Another way to look at it is the functional core, imperative shell pattern.

    Wrapping up your dict in a value object (a dataclass, or whatever the equivalent is in your language) early on means you handle the ugly stuff first. Parse, don't validate. Resist the temptation to add optional fields. Is there really anything you can do if the field is null? If not, don't make it optional. Let it crash early. Clearly define your data.

    Once you have put your data into neat value objects, you know what is in them. You know the types. You know all required fields are there. You will be so much happier. No checking for null throughout the code, no checking for empty strings. You can just focus on the business logic.

    Seriously, so much suffering can be avoided just by following this pattern.
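A minimal Python sketch of that pattern, assuming a hypothetical `User` record with two required fields: the raw dict is parsed once at the boundary into a frozen dataclass, and bad input raises immediately instead of leaking `None` or empty-string checks into the business logic.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class User:
    # Hypothetical record: every field is required, none is Optional
    id: int
    email: str

    @classmethod
    def from_dict(cls, raw: Mapping[str, Any]) -> "User":
        # Parse once at the boundary; a missing key raises KeyError right here
        user_id = raw["id"]
        email = raw["email"]
        # bool is a subclass of int, so exclude it explicitly
        if not isinstance(user_id, int) or isinstance(user_id, bool):
            raise TypeError(f"id must be an int, got {type(user_id).__name__}")
        if not isinstance(email, str) or not email:
            raise ValueError("email must be a non-empty string")
        return cls(id=user_id, email=email)


def greeting(user: User) -> str:
    # Business logic: no None checks or empty-string checks needed here
    return f"Hello, user {user.id} <{user.email}>"
```

With this in place, `User.from_dict({"id": 1})` fails with `KeyError` at the edge of the program rather than somewhere deep inside it.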

    • xenoxcs 12 minutes ago

      I'm a big fan of using Protobuf for the third-party API validation task. After a slightly finicky initial schema definition (helped by tools like json-to-proto.github.io), I can be sure the data I'm consuming from an external API is strongly typed, and the Protobuf functions that convert JSON to a Proto message instance blow up by default if there's an unexpected field in the API data they're consuming.

      I use it to parse and validate incoming webhook data in my Python AWS Lambda functions, then reuse the protobuf types when I later ship the webhook data to our Flutter-based frontend. Adding extensions to the protobuf fields gives me a nice, structured way to add flags and metadata to different fields in the webhook message. For example, I can add table & column names to the protobuf message fields, and have them populated automatically from the DB with some simple helper functions. It saves me from writing many lines of code that look like:

          MyProtoClass.field1 = DB.table.column1.val
          MyProtoClass.field2 = DB.table.column2.val

      • jimmytucson 3 hours ago

        Here’s an out-there take, but one I’ve held loosely for a long time and haven’t shed yet: dicts are not appropriate for what people mostly use them for, which is named access to member attributes.

        dict is an implementation of a hash table. Hash tables are designed for O(1) lookup of items. As such, they are backed by arrays kept much bigger than the number of items they store, so that items can be hashed to array indices while sidestepping collisions. They’re meant to act like an index that contains many records, not a single record.

        A single record is more like a tuple, except you want named access instead of title = movie[0], release_year = movie[1], etc. And Python had that, in NamedTuple, but it was kinda magical and no one used it (shoutout Raymond Hettinger).

        Granted, this rant is pretty much the meme of the guy explaining something to a brick wall, since dicts are so firmly entrenched as the "record" type of choice in Python (but not in other languages, which have structs, case classes, etc., and where JSON doesn’t just deserialize to a weak type, but I digress).
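A rough, CPython-specific way to see the size difference between a dict and a tuple-based record is `sys.getsizeof`, which is shallow: it measures the container only, not the objects inside it. Exact byte counts vary by Python version and platform, so none are quoted here; `Movie` is a hypothetical example record.

```python
import sys
from typing import NamedTuple


class Movie(NamedTuple):
    # Hypothetical record: named access, stored as a plain tuple underneath
    title: str
    release_year: int


as_tuple = Movie("Alien", 1979)
as_dict = {"title": "Alien", "release_year": 1979}

# Shallow sizes: only the containers are measured, not the str/int values
tuple_size = sys.getsizeof(as_tuple)
dict_size = sys.getsizeof(as_dict)
print(f"namedtuple: {tuple_size} bytes, dict: {dict_size} bytes")

# Named access replaces movie[0] / movie[1], but positional access still works
print(as_tuple.title, as_tuple[0])
```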

        • fallingsquirrel 2 hours ago

          NamedTuples are great, but they let you do too much with the objects. You probably don't want users of your GitHubRepo class to be able to do things like `repo[1]` or `for foo in repo`. Dataclasses have more constrained semantics, so I reach for them by default. In my ideal world they would default to frozen=True, kw_only=True, slots=True, but even without those they're a big improvement.

          • seabrookmx 20 minutes ago

            Subclassing NamedTuple is very ergonomic, and since NamedTuples are immutable, unlike dataclasses, I often reach for them by default. I still use Pydantic when I want custom validation or when it ties into another lib like FastAPI.

            • jsyang00 3 hours ago

              I think most modern Python codebases are using dataclasses or something like Pydantic. I think dicts are mostly seen, as the author suggests, because something you hacked up to work quickly ends up turning into actual software, and it's too much work to refactor the types.

              • aatarax an hour ago

                Dicts in Python are for when you have a thing and you aren't sure what the keys are. Dataclasses are for when you have a thing and you're sure what the keys (attributes) are. The trouble is when you have a thing and you're sort of sure, but not entirely: some things are definitely there, but not everything you might be thinking of.
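One way to express the "sort of sure" case with the standard library is a `TypedDict` split into required and optional keys. The sketch below uses the `total=False` inheritance approach (Python 3.11 also offers `NotRequired`); `Repo` and its keys are hypothetical, and only a static type checker such as mypy or pyright enforces the split, since at runtime a `TypedDict` value is a plain dict.

```python
from typing import TypedDict


class _RepoRequired(TypedDict):
    # Keys we are sure are always present
    id: int
    name: str


class Repo(_RepoRequired, total=False):
    # Keys that may or may not be there
    description: str
    homepage: str


def summary(repo: Repo) -> str:
    # A type checker expects a guard (or .get) before using optional keys
    desc = repo.get("description", "(no description)")
    return f"{repo['id']}: {repo['name']} - {desc}"
```

For example, `summary({"id": 1, "name": "cpython"})` returns `"1: cpython - (no description)"`.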

              • bigstrat2003 4 hours ago

                For better or for worse, Python doesn't do typing well. I don't disagree that I prefer well defined types, but if that is your desire then I think Python is perhaps not the correct choice of language.

                • Ey7NFZ3P0nzAe 2 hours ago

                  Personally, I became a huge fan of beartype: https://pypi.org/project/beartype/

                  Leyec, the magic dev behind it, managed to make a full Python runtime type checker with super advanced features and about zero overhead. It's crazy.

                  • skeledrew 18 minutes ago

                    I tried using it, but beartype quickly became a pain because I had to decorate things manually. Then I found typeguard, which goes even further, and never looked back. Instead of manually decorating each individual function, you can activate an import hook that automatically decorates any function with type annotations. Massive QoL improvement. I have it set to activate only during testing, though, as I'm unsure of the overhead.

                  • nerdponx 3 hours ago

                    Python does typing pretty darn well now for data like API requests and responses.

                    "Typed Python" does poorly (compared to e.g. Typescript) on things like overloading functions, generics, structural subtyping, et al.

                    • est 2 hours ago

                      > Python doesn't do typing well

                      Golang does typing, but JSON is a PITA to handle.

                      Try parsing something like `[{"a": 1, "b": "c", "d": [], "e": {}}, null, 1, "2"]` in Go.

                      Types are a blessing as well as a curse.

                      • Aditya_Garg 2 hours ago

                        That's only because your list has different types. It's a badly formed API, and if you really need to support that use case, you can use maps and reflection to handle it.

                        • est 2 hours ago

                          The problem is, programmers can't dictate what JSON should look like in the wild.

                          We used to have strict typed XML. Nobody even bothered.

                          • a57721 5 minutes ago

                            > The problem is, programmers can't dictate what JSON should look like in the wild.

                            Not JSONs in general, but a sane API would never return something like that.

                            > We used to have strict typed XML. Nobody even bothered.

                            Nowadays there is OpenAPI, GraphQL, protobuf, etc. and people do bother about such things.

                            • shiroiushi 3 minutes ago

                              >We used to have strict typed XML. Nobody even bothered.

                              Yeah, because it was ugly as hell and not human-readable.

                      • fhdsgbbcaA 4 hours ago

                        Seems like the issue is less using dicts than not treating external APIs as input that needs to be sanitized.

                        • pmarreck 4 hours ago

                          Agreed. If you sanitize/allowlist API data you should not have issues with dicts.

                          • imron 4 hours ago

                            You'll have issues if you ever rename things in the dict.

                            Linting tools will pick up on every instance where you forgot to rename the fields of a class, but won't do the same for dicts.

                            • FreakLegion 3 hours ago

                              TypedDicts solve the linting problem, but refactoring tools haven't caught up (unlike e.g. ForwardRef type annotations, which are strings but can be transformed alongside type literals).

                              • tomjakubowski 2 hours ago

                                Is there any advantage to using a TypedDict for a record over a dataclass?

                                • FreakLegion an hour ago

                                  TypedDicts "aren't real" in the sense that they're a compile-time feature, so you're getting typing without any deserialization cost beyond the original JSON. Dataclasses and Pydantic models are slow to construct, so that's not nothing.

                                  This of course means TypedDicts don't give you run-time validation. For that, and for full-blown custom types in general, I tend to favor msgspec Structs: https://jcristharif.com/msgspec/benchmarks.html#json-seriali....
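A small illustration of the "they aren't real" point, with hypothetical `MovieTD` and `MovieDC` types: annotating the result of `json.loads` as a `TypedDict` creates no new runtime object and performs no validation, whereas the dataclass needs a separate construction step (and, by default, still does no type validation either).

```python
import json
from dataclasses import dataclass
from typing import TypedDict


class MovieTD(TypedDict):
    title: str
    year: int


@dataclass
class MovieDC:
    title: str
    year: int


payload = '{"title": "Alien", "year": 1979}'

# TypedDict: just an annotation on the dict json.loads already built
td: MovieTD = json.loads(payload)

# Dataclass: an extra construction step on top of the parsed dict
dc = MovieDC(**json.loads(payload))

print(type(td).__name__, type(dc).__name__)  # dict MovieDC

# Nothing checks this at runtime (and json.loads returns Any, so a type checker can't either)
bad: MovieTD = json.loads('{"title": 1}')
```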

                        • cschneid 4 hours ago

                          I generally support this. When dealing with API endpoints especially, I like to wrap them in a class along the lines of the one below. I also like having nested data structures as their own class sometimes too. Depends on complexity & need, of course.

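                              # Hypothetical minimal wrapper for each history entry
                              class ThingHistory
                                def initialize(json)
                                  @json = json
                                end
                              end

                              class GetThingResult
                                def initialize(json)
                                  @json = json
                                end

                                # single thing
                                def thing_id
                                  @json.dig('wrapper', 'metadata', 'id')
                                end

                                # multiple things (empty list if the key is absent)
                                def history
                                  @json.fetch('history', []).map { |h| ThingHistory.new(h) }
                                end

                                # ... two dozen more things
                              end
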
                          • cle an hour ago

                            Dicts can be a problem, but this particular example isn't a great one, as in this diagram from the article:

                              External API <--dict--> Ser/De <--model--> Business Logic
                            
                            Life's all great until "External API" adds a field that your model doesn't know about, it gets dropped when you deserialize it, and then when you send it back (or around somewhere else) it's missing a field.

                            There's config for this in Pydantic, but it's not the default there, nor in most ser/de frameworks (TypeScript is a notable exception here).

                            Closed enums have a similar tradeoff.
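A stdlib-only sketch of one way to avoid losing unknown fields on a round trip, assuming a hypothetical `Account` model: unrecognized keys are kept in an `extra` dict when parsing and written back when serializing. This is not Pydantic's mechanism, just the same idea expressed with plain dataclasses.

```python
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class Account:
    # Hypothetical model: the fields our business logic actually uses
    id: int
    name: str
    # Anything the external API sent that the model doesn't know about
    extra: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "Account":
        known = {f.name for f in fields(cls)} - {"extra"}
        return cls(
            **{k: v for k, v in raw.items() if k in known},
            extra={k: v for k, v in raw.items() if k not in known},
        )

    def to_dict(self) -> dict[str, Any]:
        # Re-emit unknown fields so a round trip doesn't silently drop them
        out = dict(self.extra)
        out.update(id=self.id, name=self.name)
        return out
```

The trade-off is that the business logic still never sees `extra`, so it stays decoupled from fields it doesn't understand.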

                            • mjr00 16 minutes ago

                              If the external API adds a new field but your software already worked, you didn't need it in the first place, so why should it matter?

                              Dropping unknown/unused fields makes sense in 99% of cases.

                            • karmakurtisaani 2 hours ago

                              I've cleaned up code where input parameters came in a dict form. Absolute shit show.

                              - The only way to figure out which parameters are even possible was to search through the code for the uses of the dict.

                              - Default values were decided on the spot all over the place (input.getOrDefault(..)).

                              - Parameter names had to be typed out each time, so better be careful with correct spelling.

                              - Getting a concise overview of how the input was handled (sanitized) was practically impossible.

                              0/10 design decision, would not recommend.
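A minimal sketch of the alternative, assuming a hypothetical `ExportParams` type: the allowed parameters and their defaults are declared in one place, unknown (for example misspelled) keys are rejected instead of silently ignored, and sanitizing happens once in `from_mapping`.

```python
from collections.abc import Mapping
from dataclasses import dataclass, fields
from typing import Any


@dataclass(frozen=True)
class ExportParams:
    # Hypothetical parameters: the full set of options and their defaults live here
    path: str
    fmt: str = "csv"
    overwrite: bool = False

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "ExportParams":
        allowed = {f.name for f in fields(cls)}
        unknown = set(raw) - allowed
        if unknown:
            # A misspelled key fails loudly instead of being silently ignored
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        params = cls(**raw)  # a missing required key raises TypeError here
        if params.fmt not in {"csv", "json"}:
            raise ValueError(f"unsupported format: {params.fmt!r}")
        return params
```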

                              • hcarvalhoalves 4 hours ago

                                Debatable. Here's a counter-point:

                                https://www.youtube.com/watch?v=aSEQfqNYNAc

                                But ok, it's less bad in Python since objects are dicts anyway and you don't need getters.

                                • leoh 2 hours ago

                                  Big structs as params in Rust have similar issues.

                                  • saintfire an hour ago

                                    In what way? They're not opaque or mutable (by default).

                                    They can be unwieldy but they do define a pretty strongly typed API.

                                  • thebeardisred 4 hours ago

                                    FYI, posted in 2020, updated in 2021.

                                    • est 2 hours ago

                                      dicts are OK, because at least they do have a `key` and it does mean something.

                                      Un-annotated tuples and too many func params are cancer.

                                      • ramraj07 2 hours ago

                                        Who does this still??

                                        • stonethrowaway 2 hours ago

                                          No no,

                                          Un-annotated tuples and too many func params are OK, because at least they are pushed and popped from the stack.

                                          Calls and rets without a prologue and epilogue on the other hand…

                                          • est 2 hours ago

                                            > from the stack

                                            Or many, many stacks you can't comprehend nor amend.

                                            I can confidently add a new `key` to a dict; can you modify a func call or a tuple with the same confidence?

                                        • pmarreck 4 hours ago

                                          Less important in Elixir (where they are "maps") due to the immutable nature of them as well as the Struct type which is a structured map.

                                          • nesarkvechnep 10 minutes ago

                                            Yes, usually my APIs in Elixir receive their arguments as a well-typed map, not stringly keyed, and transform them to structs which the core business logic expects.

                                            • mikhmha 2 hours ago

                                              Yup! I find Elixir makes it really intuitive to know when to represent a collection as a map and when to use a list of tuples. And it's easy to transform between the two when needed.

                                            • Waterluvian 4 hours ago

                                              I think one really nice thing about Python is duck typing. Your interfaces are rarely asking for a dict as much as they’re asking for a dict-like. It’s pretty great how often you can worry about this kind of problem at the appropriate time (now, later, never) without much pain.

                                              There’s useful ideas in this post but I’d be careful not to throw the baby out with the bath water. Dicts are right there. There’s dict literals and dict comprehensions. Reach for more specific dict-likes when it really matters.
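A sketch of the "dict-like" point: annotating a parameter as `collections.abc.Mapping` instead of `dict` lets the same function accept a plain dict, a read-only `types.MappingProxyType` view, or a layered `collections.ChainMap`. The `describe` function and the config keys are hypothetical.

```python
from collections import ChainMap
from collections.abc import Mapping
from types import MappingProxyType


def describe(config: Mapping[str, object]) -> str:
    # Accepts any read-only dict-like, not just dict
    return ", ".join(f"{k}={config[k]}" for k in sorted(config))


defaults = {"debug": False, "workers": 4}
overrides = {"workers": 8}

print(describe(defaults))                       # debug=False, workers=4
print(describe(MappingProxyType(defaults)))     # same output, read-only view
print(describe(ChainMap(overrides, defaults)))  # debug=False, workers=8
```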

                                              • turnsout 3 hours ago

                                                Duck typing is so fragile… Once you have implementations that are depending on your naming or property structure, you can’t update the model without breaking them all.

                                                If you use a real type, you never have to worry about this.

                                                • pistoleer 8 minutes ago

                                                  You would still have to update everything if you rename a field in a struct, what do you mean you never have to worry?

                                              • Barrin92 2 hours ago

                                                It's a bit of an odd article because the second part kind of shows why dicts aren't a problem. You basically just need to apply the most old school of OO doctrines: "recipients of messages are responsible for how they interpret them", and that's exactly what the author advocates when he talks about treating dict data akin to data over the wire, which is correct.

                                                If you're programming correctly and take encapsulation seriously, then whatever shape incoming data in a dict has isn't something you should take issue with; you just need to check whether what you care about is in it (or not) and handle that appropriately within your own context.

                                                Rich Hickey once gave a talk about something like this talking about maps in Clojure and I think he made the analogy of the DHL truck stopping at your door. You don't care what every package in the truck is, you just care if your package is in there. If some other data changes, which data always does, that's not your concern, you should be decoupled from it. It's just equivalent to how we program networked applications. There are no global semantics or guarantees on the state of data, there can't be because the world isn't in sync or static, there is no global state. There's actually another Hickey-ism along the lines of "program on the inside the same way you program on the outside". Dicts are cool, just make sure that you're always responsible for what you do with one.
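The "only check for your own package" idea might look like this in Python; `Parcel`, `my_parcel`, and the message keys are hypothetical. The receiver reads just the keys it needs, ignores the rest of the message, and decides locally what to do when its data is absent (here it returns `None`; raising would be the stricter choice).

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Parcel:
    # Hypothetical: the only part of the message this component cares about
    tracking_id: str
    weight_kg: float


def my_parcel(message: Mapping[str, Any]) -> Optional[Parcel]:
    # Look only for the keys we need; ignore everything else on the "truck"
    parcel = message.get("parcel")
    if not isinstance(parcel, Mapping):
        return None
    tracking_id = parcel.get("tracking_id")
    weight = parcel.get("weight_kg")
    if not isinstance(tracking_id, str) or not isinstance(weight, (int, float)):
        return None
    return Parcel(tracking_id=tracking_id, weight_kg=float(weight))
```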

                                                • klyrs 4 hours ago

                                                  Lists and sets suffer the same drawbacks. If the advice is to not use any of the batteries included in the language, why are we using Python?

                                                  If you want an immutable mapping, why not use an enum?
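A quick sketch contrasting two stdlib options for fixed lookups, with illustrative names: an `Enum` for a closed set of names known when the code is written, and `types.MappingProxyType` for a read-only view over an ordinary dict. (The standard library already ships `http.HTTPStatus`; the toy enum here only shows the shape.)

```python
from enum import Enum
from types import MappingProxyType


class HttpStatus(Enum):
    # Closed set of names -> values; members can't be added or reassigned
    OK = 200
    NOT_FOUND = 404


# Read-only view over an ordinary dict, for when keys are data rather than code
STATUS_TEXT = MappingProxyType({200: "OK", 404: "Not Found"})

print(HttpStatus["NOT_FOUND"].value)  # 404 (lookup by name)
print(HttpStatus(200).name)           # OK (lookup by value)

try:
    STATUS_TEXT[500] = "Server Error"  # type: ignore[index]
except TypeError:
    print("MappingProxyType is read-only")
```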

                                                  • o11c 3 hours ago

                                                    This isn't arguing against them in general, but against the unfortunate Javascript-esque abandonment of specified semantics.

                                                    In particular, whenever anyone thinks that "deep clone vs shallow clone" is a meaningful distinction, that means their types are utterly void of meaning.

                                                  • gotoeleven an hour ago

                                                    Personally I find it is often helpful to keep Dicts in a BigBag, i.e.:

                                                    BigBag<Dict>