• patcon an hour ago

    Neat! Might this even be useful for imputing missing data in a sparse network of votes, for a system like this (pol.is) whose goal is to do dimensionality reduction and visualise the opinion space of divisive social topics? https://gwern.net/doc/sociology/2021-small.pdf

    200 voters on 50 statements would fall within the 10,000 sample threshold. This is well within the bounds of some existing conversations with open data, so it could be tested... Potential values on each statement are agree/disagree/pass (+1/-1/0)

    https://github.com/compdemocracy/openData/blob/master/brexit...

    https://github.com/compdemocracy/openData/blob/master/brexit...
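
    A minimal sketch of how the imputation idea above could be framed as an ordinary tabular problem, assuming votes arrive as (voter, statement, value) triples. The function names and the toy data are hypothetical and are not taken from pol.is or TabPFN. For one target statement, voters who voted on it become training rows and voters who did not become the rows to impute.

```python
from typing import List, Optional, Tuple

Vote = Optional[int]  # +1 agree, -1 disagree, 0 pass, None = no vote recorded


def build_matrix(votes: List[Tuple[int, int, int]],
                 n_voters: int, n_statements: int) -> List[List[Vote]]:
    """Turn sparse (voter, statement, value) triples into a dense voter x statement table."""
    matrix: List[List[Vote]] = [[None] * n_statements for _ in range(n_voters)]
    for voter, statement, value in votes:
        matrix[voter][statement] = value
    return matrix


def split_for_statement(matrix: List[List[Vote]], target: int):
    """Use the other statements as features and the target statement as the label."""
    train_X, train_y, impute_X, impute_rows = [], [], [], []
    for i, row in enumerate(matrix):
        features = row[:target] + row[target + 1:]  # may still contain None
        if row[target] is None:
            impute_X.append(features)
            impute_rows.append(i)
        else:
            train_X.append(features)
            train_y.append(row[target])
    return train_X, train_y, impute_X, impute_rows


# Toy data (hypothetical): 3 voters, 2 statements, voter 1 skipped statement 1.
votes = [(0, 0, 1), (0, 1, -1), (1, 0, 1), (2, 0, -1), (2, 1, 0)]
matrix = build_matrix(votes, n_voters=3, n_statements=2)
train_X, train_y, impute_X, impute_rows = split_for_statement(matrix, target=1)
```

    Any classifier that tolerates missing feature values could then be fitted on `train_X`/`train_y` and asked to predict for `impute_X`; whether the result is useful for vote data is exactly what would need testing.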

    • gcr 6 hours ago

      Thanks for such a cool project! It's immediately apparent how to use it and I appreciate the brief examples.

      Quick question: In the breast cancer example from the README, simple support vector machines from sklearn (the first thing I tried to compare baseline performance, incidentally) seem to outperform TabPFN. Is this expected? I know it's a baseline to demonstrate ease of use rather than SOTA performance, but I am curious.

          # (TabPFN)
          In [13]: print("ROC AUC:", roc_auc_score(y_test, prediction_probabilities[:, 1]))
          ROC AUC: 0.996299494264216
      
          # (LinearSVC)
          In [27]: from sklearn.svm import LinearSVC
          
          In [28]: clf=LinearSVC(C=0.01).fit(X_train, y_train)
          
          In [29]: roc_auc_score(y_test, clf.decision_function(X_test))
          Out[29]: 0.997532996176144
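
      To help gauge how small a gap of roughly 0.001 in ROC AUC is, here is a minimal pure-Python sketch of the pairwise reading of the metric: the fraction of (positive, negative) pairs in which the positive gets the higher score, with ties counted as half. The function name `roc_auc` and the toy data are illustrative assumptions, not part of the README example.

```python
def roc_auc(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)  # 3 of the 4 pairs are ordered correctly
```

      Under this reading, one mis-ordered pair moves the score by 1 / (n_pos * n_neg), so on a small test split a difference in the third decimal place can come down to a small number of pairs. The split size is not stated above, so this is only a way to reason about the gap, not a verdict on either model.
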
      • instanceofme 16 hours ago

        Related: CARTE-AI, which can also deal with multiple tables.

        https://soda-inria.github.io/carte/ https://arxiv.org/pdf/2402.16785

        The paper includes a comparison to TabPFN v1 (among others), noting the lack of categorical & missing values handling which v2 now seems to have. Would be curious to see an updated comparison.

        • onasta 10 hours ago

          TabPFN has been better on numerical data since v1 (see Figure 6 in the CARTE paper). CARTE's main strength is on text features, which are now also supported in the TabPFN v2 API version (https://github.com/PriorLabs/tabpfn-client). We compared this to CARTE and found our model to be generally better, and much faster. CARTE's multi-table approach is also very interesting, and we want to tackle this setting in the future.

        • tmostak 4 hours ago

          This looks amazing!

          Just looking through the code a bit, it seems that the model supports a (custom) attention mechanism both between features and between rows (the code uses the term "items")? If so, does the attention between rows improve accuracy significantly?

          Generally, for standard regression and classification use cases, rows (observations) are seen to be independent, but I'm guessing cross-row attention might help the model see the gestalt of the data in some way that improves accuracy even when the independence assumption holds?
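
          For readers unfamiliar with the idea, here is a toy sketch of what attention "between features" versus "between rows" means on a table whose cells are each embedded as a small vector. It uses plain dot-product self-attention with identity projections and no learned weights; it is not TabPFN's actual implementation, and all function names are hypothetical.

```python
import math


def self_attention(seq):
    """Dot-product self-attention over a list of vectors (queries = keys = values)."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v[i] for w, v in zip(weights, seq)) for i in range(d)])
    return out


def attend_between_features(cells):
    """Each row is its own sequence: a cell looks at the other cells in the same row."""
    return [self_attention(row) for row in cells]


def attend_between_rows(cells):
    """Each column is its own sequence: a cell looks at the same feature in other rows."""
    n_rows, n_feats = len(cells), len(cells[0])
    columns = [self_attention([cells[r][f] for r in range(n_rows)])
               for f in range(n_feats)]
    return [[columns[f][r] for f in range(n_feats)] for r in range(n_rows)]


# cells[row][feature] is a 2-dimensional embedding of one table cell (toy values).
cells = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
mixed = attend_between_rows(attend_between_features(cells))
```

          In this toy version the row-wise step is the only place where information from other observations can reach a cell, which is the part the question above is asking about.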

          • dist-epoch 16 minutes ago

            Speculating: cross-row attention might tell the model where a given row sits in the distribution of rows.

          • ggnore7452 11 hours ago

            anyone tried this? is this actually overall better than xgboost/catboost?

          • peepeepoopoo99 8 hours ago

            How can you train a tabular foundation model when the tabular features themselves are inherently domain-specific? Is there some kind of preprocessing step beforehand to match the inference time features with their closest analogues in the training set?

            • enigmaa99 9 hours ago

              I tried this on a few CARTE datasets and it works surprisingly better!! Woahhh

              • hooloovoo_zoo 7 hours ago

                Were your benchmark methods tuned per dataset or across datasets?

                • ersiees 29 minutes ago

                  Tuned per dataset

                • nickpsecurity 5 hours ago

                  A while back, I was looking for a project amateurs could do for experimenting with Transformer alternatives and optimization algorithms. My concept was grabbing objective test functions from the literature, making custom ones based on realistic data, and layering them together based on real-world depth. Then, training various approaches on them using consumer GPUs or spot instances of high-end GPUs.

                  What I read in this paper blew that idea out of the water! I mean, it’s still doable, but you’ve far exceeded it.

                  I love that you covered many types of structures, used 8x consumer GPUs much as OSS folks do (widely-accessible pretraining), claim no copyright infringement for pretraining, and use enough techniques in ML that people can enjoy Googling stuff for days.

                  I do have some questions about what I might have overlooked in the paper.

                  1. Are the training data and code available to reproduce the model and to iteratively improve its architectural decisions?

                  2. Most authors claiming their data was legal or open were actually committing copyright infringement. Your method might dodge that if users generate their own synthetic data using methods they can verify aren’t themselves encumbered. Is that code available under open licensing? If not, would you offer it for a fee for companies or free for researchers?

                  3. What specific, common uses could amateurs try that would display the model’s ability in a business setting? (Both to drive more research or build products on the model.)

                  I thank you for your time.

                  • bbstats 11 hours ago

                    looks amazing - finally, DL that beats a tuned catboost?

                    • OutOfHere 16 hours ago
                      • _giorgio_ 10 hours ago

                        It's probably the same model with the same limitations, released nearly two years ago?

                        https://arxiv.org/abs/2207.01848

                        • onasta 10 hours ago

                          There have been a ton of improvements! Much better performance overall, way larger data size limit (1K-->10K rows, 100-->500 features), regression support, native categorical data and missing values handling, much better support for uninformative or outlier features etc.
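
                          A minimal sketch of a pre-flight check against the size limits quoted above (10K rows, 500 features). The helper name `within_limits` is hypothetical, and treating the limits as inclusive is an assumption; the numbers come from this comment, not from a library API.

```python
MAX_ROWS = 10_000     # row limit as quoted in the comment above
MAX_FEATURES = 500    # feature limit as quoted in the comment above


def within_limits(n_rows, n_features,
                  max_rows=MAX_ROWS, max_features=MAX_FEATURES):
    """Return True if a table of the given shape fits the quoted limits (assumed inclusive)."""
    return n_rows <= max_rows and n_features <= max_features


fits_votes = within_limits(200, 50)     # e.g. 200 voters x 50 statements
fits_large = within_limits(50_000, 20)  # too many rows for the quoted limit
```

                          Note that the limits apply to rows and features separately, so a 200 x 50 table fits comfortably on both counts.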

                          • ersiees 10 hours ago

                            No, it is *much* stronger, uses a different architecture, and scales to 10x the number of examples. It can also do regression now, and handle categorical features. Please have a quick look at the abstract before making such claims.