• gcanyon 10 hours ago

    One that isn't listed here, and which is critical to machine learning, is the idea of near-orthogonality. In 2D or 3D space you can only have 2 or 3 orthogonal directions, and allowing for near-orthogonality doesn't really gain you anything. But in higher dimensions you can reasonably work with directions that are only approximately orthogonal, and "approximately" gets surprisingly loose once you reach thousands of dimensions -- something like 75 degrees is fine (I'm writing this from memory, don't quote me). And the number of orthogonal-enough directions you can fit may scale as fast as 10^sqrt(dimension_count), meaning that yes, if your embeddings have 10,000 dimensions, you might be able to fit literally 10^100 different orthogonal-enough directions. This is critical for turning embeddings + machine learning into LLMs.
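
    To make that concrete, here's a quick numerical sketch (my own, not from the article; exact numbers depend on the seed): sample a few hundred random unit vectors in a few thousand dimensions and check the pairwise angles.

        import numpy as np

        rng = np.random.default_rng(0)
        d, n = 4096, 500  # dimension, number of random directions

        # Random unit vectors: normalized Gaussian samples are uniform on the sphere.
        V = rng.standard_normal((n, d))
        V /= np.linalg.norm(V, axis=1, keepdims=True)

        # Pairwise cosines; off-diagonal entries hug 0 in high dimensions.
        cos = V @ V.T
        np.fill_diagonal(cos, 0.0)
        angles = np.degrees(np.arccos(cos))
        print(angles.min(), angles.max())  # roughly 86..94 degrees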

    • user070223 3 hours ago

      That's what's illustrated in the paper Toy Models of Superposition:

      https://arxiv.org/pdf/2209.10652

      • phreeza 6 hours ago

        By orthogonal-enough dimensions, do you mean vectors whose dot product is close to zero?

        • sigmoid10 3 hours ago

          This is actually just another way to see the third example (concentration of measure). As you increase the number of dimensions, the contribution of any single basis-vector component to, say, the cosine of the angle (i.e. via the scalar product) becomes less important. So in three dimensions you'll get a pretty large angle if one component points along a different basis vector, but in 10,000 dimensions the angle will be tiny.
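
          A tiny sanity check of that (my own sketch, not from the article): flip the sign of one component of the all-ones vector and see how the angle to the original shrinks with dimension.

              import math

              # v = (1, ..., 1) and w = v with one sign flipped:
              # v.w = d - 2 and |v||w| = d, so cos(angle) = (d - 2) / d.
              for d in (3, 100, 10_000):
                  print(d, math.degrees(math.acos((d - 2) / d)))
              # 3     -> ~70.5 degrees
              # 100   -> ~11.5 degrees
              # 10000 -> ~1.1 degrees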

          • westurner 9 hours ago

            Does distance in feature space require orthogonality?

            With real space (x,y,z) we omit the redundant units from each feature when describing the distance in feature space.

            But distance is just a metric, and often the space or paths through it are curvilinear.

            By Taxicab distance, it's 3 cats, 4 dogs, and 5 glasses of water away.

            Python now has math.dist() for Euclidean distance, for example.
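
            For example (a minimal sketch; the cats/dogs/water axes are just placeholder features):

                import math

                a = (0, 0, 0)
                b = (3, 4, 5)  # 3 cats, 4 dogs, 5 glasses of water

                euclidean = math.dist(a, b)                      # sqrt(50) ~= 7.07
                taxicab = sum(abs(p - q) for p, q in zip(a, b))  # 3 + 4 + 5 = 12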

            • epistasis 7 hours ago

              Near-orthogonality allows fitting in far more directions for distinct concepts than the dimension of the space. So even though the dimension of an LLM might be <2000, far, far more than 2000 distinct directions can fit into that space.

              The term most often used is "superposition." Here's some material on it that I'm working through right now:

              https://arena3-chapter1-transformer-interp.streamlit.app/%5B...

          • rectang 9 hours ago

            Time to share my favorite quote from Symbols, Signals and Noise by John R. Pierce, where he discusses how Shannon achieved a breakthrough in Information Theory:

            > This chapter has had another aspect. In it we have illustrated the use of a novel viewpoint and the application of a powerful field of mathematics in attacking a problem of communication theory. Equation 9.3 was arrived at by the by-no-means-obvious expedient of representing long electrical signals and the noises added to them by points in a multidimensional space. The square of the distance of a point from the origin was interpreted as the energy of the signal represented by a point.

            > Thus a problem in communication theory was made to correspond to a problem in geometry, and the desired result was arrived at by geometrical arguments.

            • remcob 7 hours ago

              The distance between two uniform random points on an n-sphere clusters around the equator. The article shows a histogram of the distribution in fig. 11. While it looks Gaussian, it is more closely related to the Beta distribution. I derived it in my notes, as (surprisingly) I could not find it easily in the literature:

              https://xn--2-umb.com/21/n-sphere
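
              A quick simulation of the effect (my own sketch, using the standard trick that normalized Gaussians are uniform on the sphere):

                  import numpy as np

                  rng = np.random.default_rng(0)
                  for d in (3, 30, 300, 3000):
                      # Normalized Gaussian samples are uniform on the (d-1)-sphere.
                      x = rng.standard_normal((2000, d))
                      x /= np.linalg.norm(x, axis=1, keepdims=True)
                      # Treat e_1 as the north pole; x[:, 0] is the height above the equator.
                      height = np.abs(x[:, 0]).mean()  # shrinks like 1/sqrt(d)
                      dist = np.linalg.norm(x - np.eye(d)[0], axis=1).mean()  # -> sqrt(2)
                      print(d, height, dist)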

              • zombot 5 hours ago

                > The distance between two uniform random points on an n-sphere clusters around the equator.

                This sentence makes no sense to me.

                • p1esk 5 hours ago

                  He means it clusters around the distance from a pole to the equator.

                  • remcob 5 hours ago

                    Correct. I was too terse in my comment. It's explained in the article: without loss of generality you can call one of the two points the 'north pole', and then the other one will be distributed close to the equator.

                  • isoprophlex 5 hours ago

                    Pick an equator on an n-sphere. It is a hyperplane of dimension (n-1) through the center, composed of all but one of the sphere's dimensions. The xy plane for a unit sphere in xyz, for example.

                    Uniformly distribute points on the sphere. For high n, all points will be very near the equator you chose.

                    Obviously, for a point to be far from this chosen equator, it must project close to 0 onto all the dimensions spanning the equatorial hyperplane, and not close to 0 onto the dimension making up the pole-to-pole axis.

                    • oersted 4 hours ago

                      My first thought is that it's rather obvious, but I'm probably wrong; can you help me understand?

                      The analogy I have in mind is: if you throw n dice, for large n, the likelihood of one specific chosen die being high-value and the rest being low-value is obviously rather small.

                      I guess the consequence is still interesting, that most random points in a high-dimensional n-sphere will be close to the equator. But they will be close to all arbitrarily chosen equators, so it's not that meaningful.

                      If the equator is defined as containing n-1 dimensions, then as n goes higher you'd expect it to "take up" more of the space of the sphere, hence most random points will be close to it. It is a surprising property of high-dimensional space, but I think that's mainly because we don't usually think about the general definition of an equator and how it scales to higher dimensions; once you understand that, it's not very surprising.

                      • isoprophlex 4 hours ago

                        > The analogy I have in mind is: if you throw n dice, for large n, the likelihood of one specific chosen dice being high value and the rest being low value is obviously rather small.

                        You're exactly right, this whole thing is indeed a bit of an obvious nothingburger.

                    • akdor1154 4 hours ago

                      "clusters" is acting as a verb here, not a noun.

                    • 7fYZ7mJh3RNKNaG 6 hours ago

                      Beautiful visualizations! How did you make them?

                      • remcob 4 hours ago

                        The first one IIRC with Geogebra, all the rest with Matplotlib. The design goal was to maximize the 'data-ink ratio'.

                    • FabHK 11 hours ago

                      For high-dimensional spheres, most of the volume is in the "shell", i.e. near the boundary [0] (quick check below). This sort of makes sense to me, but I don't know how to square it with the observation in the article that most of the surface area is near the equator. (In particular, by symmetry, it's near any equator; so, one would think, in their intersection. That is near the centre, though, not the shell.)

                      Anyway. Never buy a high-dimensional orange, it's mostly rind.

                      [0] https://www.math.wustl.edu/~feres/highdim
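
                      The rind claim is a one-liner, since volume scales like r^d (my arithmetic, not from the linked notes):

                          # Fraction of a d-ball's volume within 10% of the surface: 1 - 0.9**d.
                          for d in (1, 2, 3, 10, 100, 1000):
                              print(d, 1 - 0.9**d)
                          # d=3 -> 0.271, d=100 -> 0.99997..., d=1000 -> ~1.0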

                      • youoy 4 hours ago

                        If you like ML, this is also related to the results of this paper [0], where they show that learning in high dimensions amounts to extrapolation, as opposed to interpolation. Intuitively I think of this as the fact that points on the sphere are convexly independent, and most of the volume of the ball is near the boundary.

                        [0] https://arxiv.org/abs/2110.09485

                        • hansvm 10 hours ago

                          It's basically the same idea in both cases. Raising to a high power warps anything "slightly bigger" into dominating everything else. There's a bit more stuff near the outside than the inside, so with a high enough dimension the volume ends up in the rind. Similarly, the equator is a bit bigger than the other slices, so with enough dimensions its surface area dominates.
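
                          You can see the slice effect directly (my sketch; the height-z slice of the unit (d-1)-sphere has surface density proportional to (1 - z^2)^((d-3)/2)):

                              # Density of a slice at height z, relative to the equator (z = 0).
                              def slice_ratio(z, d):
                                  return (1 - z * z) ** ((d - 3) / 2)

                              for d in (3, 10, 100, 1000):
                                  print(d, slice_ratio(0.5, d))
                              # d=3 -> 1.0 (Archimedes: uniform in z); d=1000 -> ~5e-63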

                          • WiSaGaN 6 hours ago

                            Yes, this seems to be a result of the standard Euclidean metric rather than the high dimension itself. I guess most people assume the metric to be Euclidean, so it's OK.

                        • mattxxx 11 hours ago

                          Yeah - high-dimensional spaces are weird and hard to reason about... and we work in them very frequently, especially when dealing with ML.

                          • l33t7332273 10 hours ago

                            Luckily if you do enough math it becomes much easier to reason about such spaces

                            • JBiserkov 9 hours ago

                              - How do you even visualize an 11-dimensional space?

                              - oh that's easy - you just visualize an N-dimensional space and then set N equal to 11.

                              • rectang 8 hours ago

                                I think of high-dimensional spaces in terms of projection. Projecting a 3-dimensional space onto a 2-dimensional space loses information and the results depend on perspective. Same with an 11-dimensional space being projected onto a 10-dimensional space.

                                I find that this metaphor works pretty well for visualizing how a vector-space search engine represents how two documents can be "similar" in N-dimensional term-space: look at them from the right angle and they appear close together.
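
                                A small sketch of the metaphor (mine, with made-up term counts): orthogonal projection can only shrink distances, so two documents can look closer in the projection than they are in the full space, never farther apart.

                                    import numpy as np

                                    # Term-count vectors for two documents over a 5-term vocabulary.
                                    doc_a = np.array([3.0, 0.0, 1.0, 4.0, 0.0])
                                    doc_b = np.array([0.0, 2.0, 1.0, 0.0, 5.0])
                                    print(np.linalg.norm(doc_a - doc_b))  # ~7.35 in full term-space

                                    # Project onto the first two coordinates: one "viewing angle".
                                    P = np.zeros((2, 5))
                                    P[0, 0] = P[1, 1] = 1.0
                                    print(np.linalg.norm(P @ doc_a - P @ doc_b))  # ~3.61, they look closer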

                                • marcosdumay 8 hours ago

                                  Yeah, letting go of the need to visualize everything is one of the mechanisms usually adopted for working in high-dimensional spaces.

                            • derbOac 11 hours ago

                              I love this stuff because it's so counterintuitive until you've worked through some of it. There was an article linked to on HN a while back about high-dimensional Gaussian distributions that was similar in message, and probably mathematically related at some level. It has so many implications for much of the work in deep learning and large data, among other things.

                              • bmitc 11 hours ago

                                Actually, the most counterintuitive is 4-dimensional space. It is rather mathematically unique, often exhibiting properties no other dimension does.

                                • dullcrisp 10 hours ago

                                  Well I’m sure 2- and 3- dimensional space are also mathematically unique and interesting by the same token, but they’re nearer to our experience and intuition.

                                  • ngruhn 4 hours ago

                                    I've heard that knots only exist in 3 dimensions. In 2D you can't entangle anything and in 4D+ you can always untangle everything.

                                  • NL807 10 hours ago

                                    >often exhibiting properties no other dimension does.

                                    Isn't that true for some other dimensions as well? There's a whole bunch of mathematical concepts that are constrained to a specific dimension. For example, the cross product only makes sense in 3D. The perpendicular dot product (a special case of the determinant) only makes sense in 2D.
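
                                    For concreteness (a quick sketch):

                                        import numpy as np

                                        # 2D perp dot product: a signed area, only defined in the plane.
                                        a2, b2 = np.array([1.0, 2.0]), np.array([3.0, 4.0])
                                        perp_dot = a2[0] * b2[1] - a2[1] * b2[0]  # = -2.0

                                        # 3D cross product: a vector perpendicular to both inputs.
                                        a3, b3 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
                                        print(np.cross(a3, b3))  # [0. 0. 1.]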

                                    • immibis 4 hours ago

                                      Apparently there's also a 7D cross product - and no others!

                                    • elcritch 11 hours ago

                                      How so?

                                      • hansvm 10 hours ago

                                        The intuitive way to think about it is that with very few dimensions you have very few degrees of freedom, so it's easy to prove things possible or impossible. With lots of dimensions, you have enough wiggle room to prove most things possible. Somewhere in between, you have enough complexity to not trivialize the problems but not enough wiggle room to be able to easily circumvent the issue.

                                        Often in practice, that boundary is around 3-4 dimensions. See the Poincaré conjecture, various sphere-packing shenanigans, graph embeddings, ....

                                        • bmitc 10 hours ago

                                          There's a section here about phenomena in 4 dimensions: https://en.wikipedia.org/wiki/4-manifold

                                          One of the most surprising is that all smooth manifolds of dimension not equal to four have only a finite number of distinct smooth structures. In dimension four, there are manifolds with a countably infinite number of distinct smooth structures. It's the only dimension with that property.

                                          • elcritch 9 hours ago

                                            Fascinating that higher-dimensional manifolds are more restrictive!

                                            Though in a _very_ handwavy way it seems intuitive, given properties like the one in TFA where 4D is the only dimension where the inner sphere exactly touches the sides of the bounding cube. Especially given that that property seems related to the possible neighborhoods of points in 4-manifolds. Though I quickly get lost in the specifics of the maths on manifolds. :)

                                            > However in four dimensions something very interesting happens. The radius of the inner sphere is exactly 1/2, which is just large enough for the inner sphere to touch the sides of the cube!

                                            • ashishb 9 hours ago

                                              > One of the most surprising is that all smooth manifolds of dimension not equal to four have only a finite number of distinct smooth structures. In dimension four, there are manifolds with a countably infinite number of distinct smooth structures. It's the only dimension with that property.

                                              Can you give some intuition on smooth structures and manifolds? I read the Wikipedia articles a few times but still can't grasp them.

                                              • aithrowawaycomm 7 hours ago

                                                I am not sure the other comment was especially intuitive. Here is my understanding:

                                                Euclidean space is a vector space and therefore pretty easy to work with in computations (especially calculus) compared to something like the surface of a sphere, but the sphere doesn't simply abandon Euclidean vector structure. We can take halves of the sphere and "flatten them out," so instead of working with the sphere we can work with two planes, keeping in mind that the flattening functions define the boundary of those planes we're allowed to work within. Then we can do computations on the plane and "unflatten" them to get the result of those computations on the sphere.

                                                Manifolds are a generalization of this idea: you have a complicated topological structure S, but also some open subsets of S, S_i, which cover S, and smooth, invertible functions f_i: S_i -> R^n that tell you how to treat elements of S locally as if they were vectors in Euclidean space (and since the functions are invertible, they tell you how to map the vectors back to S, which is what you want).

                                                The manifold is a pair, the space S and the smooth functions f_i. The smoothness is important because ultimately we are interested in doing calculus on S, so if the mapping functions have "sharp edges" then we're introducing sharp edges into S that are entirely a result of the mapping and not S's own geometry.

                                                • bmitc 9 hours ago

                                                  Applying a smooth structure to a manifold to make it a smooth manifold is like a patching process that makes it locally look like a Euclidean space.

                                                  Most of calculus and undergraduate math, engineering, and physics takes place in Euclidean space R^n. So all the curves and surfaces directly embed into R^n, usually where n = 2 or n = 3. However, there are more abstract spaces that one would like to study, and those are manifolds. To do calculus on them, they need to be smooth manifolds. A smooth structure is a collection of "patches" (normally called charts) such that each patch (chart) is homeomorphic (topologically equivalent) to an open set in R^n. Such a manifold is called an n-dimensional manifold. The smoothness criterion is a technicality ensuring that the coordinate maps and the transition maps between overlapping charts are smooth, i.e., infinitely differentiable. Smooth manifolds are basically the extension of calculus to more general and abstract spaces.

                                                  For example, a circle is a 1-dimensional manifold since it locally looks like a line segment. A sphere (the shell of the sphere) is a 2-dimensional manifold because it locally looks like an open subset of R^2, i.e., it locally looks like a two-dimensional plane. Take Earth, for example: locally, a Euclidean x-y coordinate system works well.
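
                                                  Here's the circle example as code (my sketch; two angle charts that together cover S^1, with a smooth transition map on their overlap):

                                                      import numpy as np

                                                      def chart1(p):   # S^1 minus (-1, 0) -> (-pi, pi)
                                                          x, y = p
                                                          return np.arctan2(y, x)

                                                      def chart2(p):   # S^1 minus (1, 0) -> (0, 2*pi)
                                                          t = chart1(p)
                                                          return t if t > 0 else t + 2 * np.pi

                                                      def unchart(t):  # local coordinate -> point on S^1
                                                          return (np.cos(t), np.sin(t))

                                                      p = unchart(-3.0)  # a point covered by both charts
                                                      t1, t2 = chart1(p), chart2(p)
                                                      # On this part of the overlap the transition map is t -> t + 2*pi,
                                                      # which is smooth -- that smoothness is the "smooth structure".
                                                      assert np.isclose(t2 - t1, 2 * np.pi)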

                                                  For example, a circle is a 1-dimensional manifold since it locally looks like a line segment. A sphere (the shell of the sphere) is a 2-dimensional manifold because it locally looks like an open subset of R^2, i.e., it locally looks like a two dimensional plane. Take Earth for example. Locally, a Euclidean x-y coordinate system works well.