Qarma

Question

4.96K views · 22 May 2024

Valentin Emiya91 3 July 2023 0 Comments

Add papers and vote!

How to: one “answer” = one paper; vote and comment on proposed papers.

28 Answers

score 0 · Answer 1 · 2024-05-21T15:51:49+00:00

Mind the spikes: Benign overfitting of kernels and neural networks in fixed dimension, Moritz Haas et al., NeurIPS 2023

Abstract: “The success of over-parameterized neural networks trained to near-zero training error has caused great interest in the phenomenon of benign overfitting, where estimators are statistically consistent even though they interpolate noisy training data. While benign overfitting in fixed dimension has been established for some learning methods, current literature suggests that for regression with typical kernel methods and wide neural networks, benign overfitting requires a high-dimensional setting, where the dimension grows with the sample size. In this paper, we show that the smoothness of the estimators, and not the dimension, is the key: benign overfitting is possible if and only if the estimator’s derivatives are large enough. We generalize existing inconsistency results to non-interpolating models and more kernels to show that benign overfitting with moderate derivatives is impossible in fixed dimension. Conversely, we show that benign overfitting is possible for regression with a sequence of spiky-smooth kernels with large derivatives. Using neural tangent kernels, we translate our results to wide neural networks. We prove that while infinite-width networks do not overfit benignly with the ReLU activation, this can be fixed by adding small high-frequency fluctuations to the activation function. Our experiments verify that such neural networks, while overfitting, can indeed generalize well even on low-dimensional data sets.”

Complementary paper: Kernel interpolation in Sobolev spaces is not consistent in low dimensions

score 7 · Answer 2 · 2023-10-27T09:26:42+00:00

Stephane Ayache15 Posted 18 July 2023 1 Comment

Competitive Physics Informed Networks
Qi Zeng, Yash Kothari, Spencer H Bryngelson, Florian Tobias Schaefer. ICLR 2023

Valentin Emiya commented 27 October 2023

A related work by a colleague at LMA : https://arxiv.org/abs/2308.11503

score 3 · Answer 3 · 2023-07-03T13:17:13+00:00

Accelerated gradient methods are fast, but why?

Su, W., Boyd, S., & Candes, E. A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. Journal of Machine Learning Research 17 (2016) 1-43 [pdf]
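
As a small illustration of what "accelerated" means here, the sketch below compares plain gradient descent with Nesterov's scheme on an ill-conditioned quadratic. It is a toy under stated assumptions, not code from the paper: the test function, the step size `s = 1/L`, the starting point, and all function names are hypothetical choices. The momentum coefficient `(k - 1) / (k + 2)` is the one whose continuous-time limit the paper studies as the ODE X'' + (3/t) X' + ∇f(X) = 0.

```python
import numpy as np

# Toy objective f(x) = 0.5 * x^T A x with condition number 100 (assumed example).
A = np.diag([1.0, 100.0])

def f(x):
    return 0.5 * x @ A @ x

def grad(x):
    return A @ x

def gradient_descent(x0, s, steps):
    """Plain gradient descent with constant step size s."""
    x = x0.copy()
    for _ in range(steps):
        x = x - s * grad(x)
    return x

def nesterov(x0, s, steps):
    """Nesterov's accelerated scheme with momentum (k - 1) / (k + 2)."""
    x_prev = x0.copy()
    y = x0.copy()
    for k in range(1, steps + 1):
        x = y - s * grad(y)                       # gradient step at the look-ahead point
        y = x + (k - 1) / (k + 2) * (x - x_prev)  # momentum extrapolation
        x_prev = x
    return x_prev

x0 = np.array([1.0, 1.0])
s = 1.0 / 100.0                                   # s = 1/L, L = largest eigenvalue of A
f_gd = f(gradient_descent(x0, s, 100))
f_nag = f(nesterov(x0, s, 100))
print(f"gradient descent: {f_gd:.3e}   Nesterov: {f_nag:.3e}")
```

With the same step size and number of iterations, the accelerated iterate reaches a lower objective value on this problem, consistent with the O(1/k²) versus O(1/k) worst-case rates.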

score 1 · Answer 4 · 2023-10-09T15:33:22+00:00

A ConvNet for the 2020s (CVPR 2022)

The “Roaring 20s” of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model. A vanilla ViT, on the other hand, faces difficulties when applied to general computer vision tasks such as object detection and semantic segmentation. It is the hierarchical Transformers (e.g., Swin Transformers) that reintroduced several ConvNet priors, making Transformers practically viable as a generic vision backbone and demonstrating remarkable performance on a wide variety of vision tasks. However, the effectiveness of such hybrid approaches is still largely credited to the intrinsic superiority of Transformers, rather than the inherent inductive biases of convolutions. In this work, we reexamine the design spaces and test the limits of what a pure ConvNet can achieve. We gradually “modernize” a standard ResNet toward the design of a vision Transformer, and discover several key components that contribute to the performance difference along the way. The outcome of this exploration is a family of pure ConvNet models dubbed ConvNeXt. Constructed entirely from standard ConvNet modules, ConvNeXts compete favorably with Transformers in terms of accuracy and scalability, achieving 87.8% ImageNet top-1 accuracy and outperforming Swin Transformers on COCO detection and ADE20K segmentation, while maintaining the simplicity and efficiency of standard ConvNets.

score 1 · Answer 5 · 2023-10-09T14:39:24+00:00

Hierarchical associative memory
Dmitry Krotov

Dense Associative Memories or Modern Hopfield Networks have many appealing properties of associative memory. They can do pattern completion, store a large number of memories, and can be described using a recurrent neural network with a degree of biological plausibility and rich feedback between the neurons. At the same time, up until now all the models of this class have had only one hidden layer, and have only been formulated with densely connected network architectures, two aspects that hinder their machine learning applications. This paper tackles this gap and describes a fully recurrent model of associative memory with an arbitrary large number of layers, some of which can be locally connected (convolutional), and a corresponding energy function that decreases on the dynamical trajectory of the neurons’ activations. The memories of the full network are dynamically “assembled” using primitives encoded in the synaptic weights of the lower layers, with the “assembling rules” encoded in the synaptic weights of the higher layers. In addition to the bottom-up propagation of information, typical of commonly used feedforward neural networks, the model described has rich top-down feedback from higher layers that help the lower-layer neurons to decide on their response to the input stimuli.

score 2 · Answer 6 · 2023-10-09T08:39:07+00:00

https://arxiv.org/abs/2306.09222

https://blog.research.google/2023/09/re-weighted-gradient-descent-via.html
Stochastic Re-weighted Gradient Descent via Distributionally Robust Optimization
We develop a re-weighted gradient descent technique for boosting the performance of deep neural networks, which involves importance weighting of data points during each optimization step. Our approach is inspired by distributionally robust optimization with f-divergences, which has been known to result in models with improved generalization guarantees. Our re-weighting scheme is simple, computationally efficient, and can be combined with many popular optimization algorithms such as SGD and Adam. Empirically, we demonstrate the superiority of our approach on various tasks, including supervised learning, domain adaptation. Notably, we obtain improvements of +0.7% and +1.44% over SOTA on DomainBed and Tabular classification benchmarks, respectively. Moreover, our algorithm boosts the performance of BERT on GLUE benchmarks by +1.94%, and ViT on ImageNet-1K by +1.01%. These results demonstrate the effectiveness of the proposed approach, indicating its potential for improving performance in diverse domains.
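
The re-weighting idea in the abstract above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes a KL-divergence penalty, for which the worst-case per-example weights are proportional to `exp(loss / tau)`, and it uses a toy least-squares problem. The names `reweighted_sgd_step`, `tau`, and `lr` are hypothetical.

```python
import numpy as np

def reweighted_sgd_step(w, X, y, lr=0.05, tau=1.0):
    """One re-weighted gradient step for least-squares regression.

    Assumption (not from the source): KL-regularized DRO, so the
    per-example weights are softmax(loss / tau) over the batch.
    """
    residual = X @ w - y                      # shape (n,)
    losses = 0.5 * residual ** 2              # per-example loss
    # Softmax over the batch; subtracting the max keeps exp() stable.
    weights = np.exp((losses - losses.max()) / tau)
    weights /= weights.sum()
    # Weighted gradient: sum_i weights_i * gradient of loss_i.
    grad = X.T @ (weights * residual)
    return w - lr * grad, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                                # noiseless targets for the toy problem

w0 = np.zeros(3)
_, first_weights = reweighted_sgd_step(w0, X, y)

w = w0.copy()
for _ in range(2000):
    w, weights = reweighted_sgd_step(w, X, y)

initial_mse = float(np.mean((X @ w0 - y) ** 2))
final_mse = float(np.mean((X @ w - y) ** 2))
print(initial_mse, final_mse)
```

Examples with larger loss receive larger weight, so each step emphasizes the hardest points in the batch; `tau` controls how sharply. Because the weights only rescale per-example gradients, the same step can be wrapped around other optimizers such as Adam.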

score 1 · Answer 7 · 2023-10-09T08:33:57+00:00

https://www.nature.com/articles/s41592-021-01284-3
Avoiding a replication crisis in deep-learning-based bioimage analysis
Deep learning algorithms are powerful tools for analyzing, restoring and transforming bioimaging data. One promise of deep learning is parameter-free one-click image analysis with expert-level performance in a fraction of the time previously required. However, as with most emerging technologies, the potential for inappropriate use is raising concerns among the research community. In this Comment, we discuss key concepts that we believe are important for researchers to consider when using deep learning for their microscopy studies. We describe how results obtained using deep learning can be validated and propose what should, in our view, be considered when choosing a suitable tool. We also suggest what aspects of a deep learning analysis should be reported in publications to ensure reproducibility. We hope this perspective will foster further discussion among developers, image analysis specialists, users and journal editors to define adequate guidelines and ensure the appropriate use of this transformative technology.

score 0 · Answer 8 · 2023-10-09T07:42:59+00:00

Heist, N., Paulheim, H. (2023). NASTyLinker: NIL-Aware Scalable Transformer-Based Entity Linker. In: Pesquita, C., et al. The Semantic Web. ESWC 2023. Lecture Notes in Computer Science, vol 13870. Springer.

Entity Linking (EL) is the task of detecting mentions of entities in text and disambiguating them to a reference knowledge base. Most prevalent EL approaches assume that the reference knowledge base is complete. In practice, however, it is necessary to deal with the case of linking to an entity that is not contained in the knowledge base (NIL entity). Recent works have shown that, instead of focusing only on affinities between mentions and entities, considering inter-mention affinities can be used to represent NIL entities by producing clusters of mentions. At the same time, inter-mention affinities can help to substantially improve linking performance for known entities. With NASTyLinker, we introduce an EL approach that is aware of NIL entities and produces corresponding mention clusters while maintaining high linking performance for known entities. The approach clusters mentions and entities based on dense representations from Transformers and resolves conflicts (if more than one entity is assigned to a cluster) by computing transitive mention-entity affinities. We show the effectiveness and scalability of NASTyLinker on NILK, a dataset that is explicitly constructed to evaluate EL with respect to NIL entities. Further, we apply the presented approach to an actual EL task, namely to knowledge graph population by linking entities in Wikipedia listings, and provide an analysis of the outcome.

score 2 · Answer 9 · 2023-10-09T07:24:22+00:00

Action Matching: Learning Stochastic Dynamics from Samples
Kirill Neklyudov, Rob Brekelmans, Daniel Severo, Alireza Makhzani

Learning the continuous dynamics of a system from snapshots of its temporal marginals is a problem which appears throughout natural sciences and machine learning, including in quantum systems, single-cell biological data, and generative modeling. In these settings, we assume access to cross-sectional samples that are uncorrelated over time, rather than full trajectories of samples. In order to better understand the systems under observation, we would like to learn a model of the underlying process that allows us to propagate samples in time and thereby simulate entire individual trajectories. In this work, we propose Action Matching, a method for learning a rich family of dynamics using only independent samples from its time evolution. We derive a tractable training objective, which does not rely on explicit assumptions about the underlying dynamics and does not require back-propagation through differential equations or optimal transport solvers. Inspired by connections with optimal transport, we derive extensions of Action Matching to learn stochastic differential equations and dynamics involving creation and destruction of probability mass. Finally, we showcase applications of Action Matching by achieving competitive performance in a diverse set of experiments from biology, physics, and generative modeling.
https://arxiv.org/abs/2210.06662

score 3 · Answer 10 · 2023-10-06T15:18:51+00:00

Replay and compositional computation (2023)

Ideas for AI architectures for lifelong, continual learning with strong generalization capabilities, based on a new proposal in neuroscience regarding the function of replay in human and animal brain activity.

score 3 · Answer 11 · 2023-10-06T11:30:14+00:00

B-cos Networks: Alignment is All We Need for Interpretability
Another approach to interpretability. Rejects standard CNNs and Transformers as well as the standard interpretable-recognition evaluation; proposes B-cos networks and a localization-focused evaluation.

https://arxiv.org/abs/2205.10268

score 4 · Answer 12 · 2023-07-18T12:17:47+00:00

Hamed Benazha38 Posted 18 July 2023 0 Comments

The Forward-Forward Algorithm, Hinton: https://arxiv.org/abs/2212.13345

score 3 · Answer 13 · 2023-07-18T12:15:48+00:00

Luo, Yuetian, and Anru R. Zhang. “Tensor clustering with planted structures: Statistical optimality and computational limits.” The Annals of Statistics 50.1 (2022): 584-613.

score 4 · Answer 14 · 2023-07-18T12:14:21+00:00

Li, Xingfeng, et al. “Auto-weighted tensor Schatten p-norm for robust multi-view graph clustering.” Pattern Recognition 134 (2023): 109083.

score 8 · Answer 15 · 2023-07-18T12:12:23+00:00

What Makes Multi-modal Learning Better than Single (Provably)
https://arxiv.org/pdf/2106.04538.pdf
NeurIPS 2021

score 0 · Answer 16 · 2023-07-18T11:59:31+00:00

A method for learning to robustly segment object instances from images inspired by the development of infant visual perception.

Chen, H., Venkatesh, R., Friedman, Y., Wu, J., Tenenbaum, J. B., Yamins, D. L., & Bear, D. M. (2022, October). Unsupervised segmentation in real-world images via Spelke object inference. In European Conference on Computer Vision (pp. 719-735). [pdf]

score 6 · Answer 17 · 2023-07-18T11:56:58+00:00

A brain-inspired method for maintaining a fixed-size representation of the past in recurrent neural networks.

Gu, A., Dao, T., Ermon, S., Rudra, A., & Ré, C. (2020). HiPPO: Recurrent memory with optimal polynomial projections. Advances in Neural Information Processing Systems, 33, 1474-1487. [pdf]

score 1 · Answer 18 · 2023-07-18T11:50:57+00:00

Counting Out Time: Class Agnostic Video Repetition Counting in the Wild

https://openaccess.thecvf.com/content_CVPR_2020/html/Dwibedi_Counting_Out_Time_Class_Agnostic_Video_Repetition_Counting_in_the_CVPR_2020_paper.html

Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 10387-10396

score 3 · Answer 19 · 2023-07-18T10:58:51+00:00

Choromanski, Krzysztof Marcin. “Taming graph kernels with random features.” International Conference on Machine Learning. PMLR, 2023. [pdf]

score 4 · Answer 20 · 2023-07-18T10:59:23+00:00

François-Xavier21 Posted 18 July 2023 0 Comments

François Chollet, On the Measure of Intelligence, ArXiv 2019, https://arxiv.org/abs/1911.01547

score 4 · Answer 21 · 2023-07-18T10:56:41+00:00

Tsigler, Alexander, and Peter L. Bartlett. “Benign overfitting in ridge regression.” J. Mach. Learn. Res. 24.123 (2023). [pdf]

score -2 · Answer 22 · 2023-07-18T12:45:26+00:00

Philippe Gautret, Jean-Christophe Lagier, Philippe Parola, Line Meddeb, Morgane Mailhe, Barbara Doudier, Johan Courjon, Valérie Giordanengo, Vera Esteves Vieira, Hervé Tissot Dupont, Stéphane Honoré, Philippe Colson, Eric Chabrière, Bernard La Scola, Jean-Marc Rolain, Philippe Brouqui, Didier Raoult, Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial, International journal of antimicrobial agents, 2020.

Beyond Expertise and Roles: A Framework to Characterize the Stakeholders of Interpretable Machine Learning and their Needs

https://dl.acm.org/doi/10.1145/3411764.3445088

score 4 · Answer 23 · 2023-07-03T13:07:14+00:00

Some theoretical insights into the benefits of deep learning

Chizat, L., & Bach, F. Implicit bias of gradient descent for wide two-layer neural networks trained with the logistic loss. (COLT 2020) [pdf]

score 4 · Answer 24 · 2023-07-04T08:38:19+00:00

An Image is Worth 16×16 Words. ICLR 2021. https://arxiv.org/abs/2010.11929

Introduces the Vision Transformer. Relevant because of the paradigm shift in CV and DL from convnets to transformers (could also make connections to language). Understanding transformers is quite relevant nowadays.

score 0 · Answer 25 · 2023-07-04T08:36:34+00:00

ResNet strikes back: An improved training procedure in timm. https://arxiv.org/abs/2110.00476

Same idea as A Metric Learning Reality Check. Shows that ResNet is still relevant even when compared with approaches as novel as transformers. The augmentation and training recipe matters and must therefore be taken into account for comparisons to be fair.

score 3 · Answer 26 · 2023-07-04T08:34:55+00:00

A Metric Learning Reality Check: https://arxiv.org/abs/2003.08505 (ECCV 2020)

Mostly about how current comparisons are not fair: comparing state-of-the-art losses and architectures trained under different regimes yields different results once state-of-the-art or the most recent regularization, augmentation, and training recipes are used.