• mufasachan 5 months ago

    I do not know the background of many of the commenters here. They may have much more experience with tensors than I do. But in my deep learning code and work, when I need to design an operation that mixes as few as 3 tensors with 4+ dimensions, I always struggle. I need to draft some slices to understand which axes should be contracted, etc., and many times the shape of the output is not even clear in my mind. Add some padding masks on the tensors on top of that and it confuses me quite a lot. I really like this notation: the last example of 1.1 is readable in its sum formulation, but the diagram formulation is much more "alive" in my mind.

    I am really at a loss here as to whether I have missed something about index notation for tensors or some visualization technique. Or maybe how confusing a tensor operation is depends on the field? Or maybe I just lack practice and experience with index notation...

    • thomasahle 5 months ago

      Standard tensor notation in PyTorch and other libraries is very indirect, and often shapes are only documented in comments.
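
      For example, a typical (made-up) snippet where the shapes exist only in the comments:

          import torch

          def attention_scores(q, k):
              # q: (batch, heads, seq, dim), k: (batch, heads, seq, dim) -- only this comment says so
              return q @ k.transpose(-1, -2)  # (batch, heads, seq, seq) -- again, only the comment

          q = torch.randn(2, 4, 16, 64)
          k = torch.randn(2, 4, 16, 64)
          print(attention_scores(q, k).shape)  # torch.Size([2, 4, 16, 16])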

      I definitely find it helps to draw parts of your architecture as a tensor diagram. Or perhaps use a library like tensorgrad, which makes everything explicit.

    • chriskanan 5 months ago

      The Matrix Cookbook was one of the most useful texts I had in the late 2000s during my PhD for computing update rules for loss functions. With autograd now widely employed, I don't think it is as fundamental as it was.

      Given that, what additional value is The Tensor Cookbook providing such that it is worth learning an entirely new notation (for me)? It would probably require long-term usage to really benefit from these visual depictions.

      • kengoa 5 months ago

        I encountered it during my masters in 2019 and it still remains my favourite textbook to this day.

        • thomasahle 5 months ago

          The Matrix Cookbook has a lot of useful formulas, but they don't have much explanation.

          In the Tensor Cookbook I aim to show the same formulas using tensor diagrams, in a way that (hopefully) makes them seem so obvious you don't even need the book afterwards.

          • chriskanan 5 months ago

            That's a good point regarding the Matrix Cookbook. I didn't start using it until I'd already learned a lot of linear algebra and how to do matrix calculus by hand, so I didn't really need much of an explanation when I first discovered it in the 2000s.

            In contrast, the Tensor Cookbook was my first introduction to tensor diagrams, so I didn't have any prior experience with them to lean on.

            It certainly looks like a useful and powerful technique, but it seems like something that warrants almost a crash course in the topic with some exercises rather than just jumping in.

            • thomasahle 5 months ago

              > It certainly looks like a useful and powerful technique, but it seems like something that warrants almost a crash course in the topic with some exercises rather than just jumping in.

              I guess my personal crash course was writing a book on the topic. And a software library... Ironically this makes it harder for me to appreciate the level of explanation needed for others.

              Thus I rely on people like you to tell me where the chain comes off, so I can expand those sections. Please let me know which sections were skipped through too quickly!

        • gweinberg 5 months ago

          Why is counting lines easier than just using numbers? A vector is a 1-tensor, a matrix is a 2-tensor, a 3-tensor is a 3-tensor, and a scalar is a 0-tensor!

          • absolutelastone 5 months ago

            It's easier to see where the contractions are, since the same line connects two tensors, versus hunting for repeated indices in a mess of many indices among many tensors multiplied together.
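
            A small, made-up numpy example of the "hunting for repeated indices" problem:

                import numpy as np

                A = np.random.rand(2, 3, 4)
                B = np.random.rand(4, 5)
                C = np.random.rand(5, 3)

                # The contractions hide wherever a letter repeats: k ties A to B, l ties B to C,
                # and j ties C back to A. In a diagram each repeated index is just a line
                # drawn between two boxes.
                out = np.einsum('ijk,kl,lj->i', A, B, C)
                print(out.shape)  # (2,)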

          • fluorinerocket 5 months ago

            Writing code with einsum, and knowing what each index represents (time, space, element), is so much more intuitive than trying to reason about how you need to transpose things so that everything works out for the tensor contraction you want.
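
            For example (made-up axes, one letter per meaning):

                import numpy as np

                x = np.random.rand(10, 3, 8)   # (time, space, channel)
                w = np.random.rand(8, 8)       # (channel in, channel out)

                # reshape/matmul version: you have to track where every axis ends up
                y1 = (x.reshape(-1, 8) @ w).reshape(10, 3, 8)

                # einsum version: t = time, s = space, c/d = channel in/out
                y2 = np.einsum('tsc,cd->tsd', x, w)

                print(np.allclose(y1, y2))  # True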

            I will be studying this

            • thomasahle 5 months ago

              Please send me any feedback you have! I'm still figuring out what presentation and content is most interesting to people!

            • physicsguy 5 months ago

              Not saying this notation isn’t useful, but the big advantage of index notation is that a large, large number of people are already familiar with it, because it’s a core part of many Physics or Maths degrees, so they will be using it naturally. If you have to work with such people, then it’s useful to “speak a common language”.

              • thomasahle 5 months ago

                You can always draw the diagrams while you're thinking, and then "compile down" to index notation when you share it with others.

                But luckily tensor diagrams are quite standard in many fields.

              • galaxyLogic 5 months ago

                My ideal preference would be a notation I can type on a standard keyboard.

                • hnarayanan 5 months ago

                  What even is this notation??

                  • abecedarius 5 months ago

                    See the link at bottom right to "Penrose graphical notation".

                    (I have nothing to do with the site.)

                    • hnarayanan 5 months ago

                      Sorry. I was just making an exasperated statement that roughly boils down to “what is wrong with mainstream notation and who is going to learn or care about this completely different thing?”

                      • abecedarius 5 months ago

                        And I was just trying to help. :) Easy to miss that link. Unfortunately Wikipedia generally isn't great for motivation on math topics.

                        Tensor ops in e.g. PyTorch are pretty opaque to me; too much is implicit about shapes, which can change as you go down a pipeline. Maybe I'll come to appreciate this notation better.

                        • hnarayanan 5 months ago

                          And thank you. (I come from an old generation of people trained in linear algebra and differential geometry, well before the current ML era, so I am holding onto my "get off my lawn" grumpiness.)

                  • robblbobbl 5 months ago

                    Great job!

                    • thomasahle 5 months ago

                      Thank you! Please share any feedback you have!

                    • keithalewis 5 months ago

                      This is incomplete, incorrect, and irrelevant. Standard notation already exists. I'm sure it is fun to draw squiggly lines and some people enjoy reinventing the wheel. Spend some time learning what others have taught us before striking out on your own lonely path.

                      • llm_trw 5 months ago

                        This is standard notation that's been used for decades.

                        https://arxiv.org/abs/2402.01790v1

                        • chriskanan 5 months ago

                          This paper motivates and explains concepts much better than the Tensor Cookbook.

                          • thomasahle 5 months ago

                            I'm hoping the Tensor Cookbook can become as engaging to read for others as Jordan Taylor's paper was to me. If you have any thoughts on where I lose people, please share!

                            • llm_trw 5 months ago

                              The cookbook is a work in progress by the looks of it.

                            • keithalewis 5 months ago

                              "This book aims to standardize the notation for tensor diagrams..." https://youtu.be/zELbzXAmcUA?t=73

                              • thomasahle 5 months ago

                                Tensor diagrams are standard, but some notation is missing. My goal was to be able to handle the entire Matrix Cookbook.

                                For this I needed a good notation for functions applied to specific dimensions and broadcasting over the rest. Like softmax in a transformer.
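
                                Concretely, it's the kind of operation you would write in code as, say (a throwaway PyTorch sketch):

                                    import torch

                                    scores = torch.randn(2, 8, 16, 16)    # (batch, heads, query, key)
                                    attn = torch.softmax(scores, dim=-1)  # softmax along the key axis only,
                                                                          # broadcast over batch, heads, query
                                    print(attn.sum(dim=-1))               # ~all ones: each row is a distribution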

                                The function chapter is still under development in the book, though. So if you have any good references I might have missed for how this has been done graphically in the past, feel free to share them.

                                • absolutelastone 5 months ago

                                  You can do broadcasting with a tensor, at least for products and sums. The product is multilinear, and a sum can be done in two steps, the first step using a tensor to implement fanout (rough sketch below). Though I can see the value in representing structure that can be used more efficiently, versus just another box for a tensor. Anything beyond that (softmax?) seems kind of awkward, since you're outside the domain of your "domain specific language". I don't know why it's needed to extend the Matrix Cookbook to tensor diagrams.
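
                                  A rough numpy sketch of the two-step sum, with ones vectors playing the fanout role:

                                      import numpy as np

                                      a = np.random.rand(3)
                                      b = np.random.rand(4)

                                      # step 1: fan each vector out to a matrix via an outer product with ones
                                      # step 2: an ordinary sum of two tensors of the same shape
                                      M = (np.einsum('i,j->ij', a, np.ones(4))
                                           + np.einsum('i,j->ij', np.ones(3), b))

                                      print(np.allclose(M, a[:, None] + b[None, :]))  # True: M[i, j] = a[i] + b[j]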

                                  • llm_trw 5 months ago

                                    I come back to this every few months and do some work trying to make sense of how tensors are used in machine learning. Tensors as used in physics, whose notation these tools inherit, are there for coordinate transforms and nothing else.

                                    Tensors, as used in ML, are much closer to a key-value store with composite keys and scalar values, with most of the complexity coming from deciding how to filter on those composite keys.
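
                                    A toy version of the analogy (plain Python, not any particular library's API):

                                        import numpy as np

                                        T = np.random.rand(2, 3, 4)  # an ML "tensor"

                                        # the same data as a key-value store: composite keys, scalar values
                                        kv = {(b, t, c): float(T[b, t, c])
                                              for b in range(2) for t in range(3) for c in range(4)}

                                        # "filtering on the composite key" is what slicing and masking do
                                        batch0 = [v for k, v in kv.items() if k[0] == 0]
                                        print(np.allclose(np.array(batch0).reshape(3, 4), T[0]))  # True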

                                    Drop me a line if you're interested in a chat. This is something I've been thinking about for years now.

                                • thomasahle 5 months ago

                                  Highly recommend this note by Jordan Taylor.

                                • HighlandSpring 5 months ago

                                  Do point us at this standard notation

                                  • ok123456 5 months ago

                                    • keithalewis 5 months ago

                                      The author also seems to be unaware of Fréchet derivatives.

                                      • gsf_emergency_2 5 months ago

                                        I don't exactly know what you mean, but from your hint I found the, uh, clarifying bedtime story:

                                        https://arxiv.org/abs/2302.09687

                                        (On functions of 3rd-order "tensors")

                                        ((Whereas matrix-functions are of 2nd-order "tensors"))

                                        Playground: https://gitlab.com/katlund/t-frechet

                                        (MATLAB)

                                        • keithalewis 5 months ago

                                          The Wikipedia page on this is sufficient. If F:X -> Y is a function between normed linear spaces then DF:X -> L(X,Y), where L(X,Y) is the vector space of bounded linear operators from X to Y, satisfies F(x + h) = F(x) + DF(x)h + o(||h||). A function is differentiable if it can be locally approximated by a linear operator.

                                          Some confusion arises from the difference between f:R -> R and f':R -> R. Its Fréchet derivative is Df:R -> L(R,R) where Df(x)h = f'(x)h. Row vectors and column vectors are just a clumsy way of thinking about this.
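
                                          For instance, for F(X) = X^T X on matrices, F(X + H) = X^T X + X^T H + H^T X + H^T H, so DF(X)H = X^T H + H^T X and the remainder H^T H is o(||H||).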

                                          BTW, all you need in order to publish on arxiv.org is to know a FoF. There is no rigorous peer review. https://arxiv.org/abs/1912.01091, https://arxiv.org/abs/2009.10852.

                                        • thomasahle 5 months ago

                                          What content about Fréchet derivatives do you think would be useful to include?