Which papers shall we read together in 2023?

Add papers and vote!

How to: one “answer” = one paper; vote and comment on proposed papers.

Valentin Emiya Unselected an answer
0

Mind the spikes: Benign overfitting of kernels and neural networks in fixed dimension, Moritz Haas et al., NeurIPS 2023

Abstract: “The success of over-parameterized neural networks trained to near-zero training error has caused great interest in the phenomenon of benign overfitting, where estimators are statistically consistent even though they interpolate noisy training data. While benign overfitting in fixed dimension has been established for some learning methods, current literature suggests that for regression with typical kernel methods and wide neural networks, benign overfitting requires a high-dimensional setting, where the dimension grows with the sample size. In this paper, we show that the smoothness of the estimators, and not the dimension, is the key: benign overfitting is possible if and only if the estimator’s derivatives are large enough. We generalize existing inconsistency results to non-interpolating models and more kernels to show that benign overfitting with moderate derivatives is impossible in fixed dimension. Conversely, we show that benign overfitting is possible for regression with a sequence of spiky-smooth kernels with large derivatives. Using neural tangent kernels, we translate our results to wide neural networks. We prove that while infinite-width networks do not overfit benignly with the ReLU activation, this can be fixed by adding small high-frequency fluctuations to the activation function. Our experiments verify that such neural networks, while overfitting, can indeed generalize well even on low-dimensional data sets.”

Complementary paper: Kernel interpolation in Sobolev spaces is not consistent in low dimensions

 

Joachim Tomasi Changed status to publish
7

Competitive Physics Informed Networks
Qi Zeng, Yash Kothari, Spencer H Bryngelson, Florian Tobias Schaefer. ICLR 2023

Valentin Emiya Posted new comment

A related work by a colleague at LMA: https://arxiv.org/abs/2308.11503

3

Accelerated gradient methods are fast, but why?

Su, W., Boyd, S., & Candes, E. A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. Journal of Machine Learning Research 17 (2016) 1-43 [pdf]

Valentin Emiya Unselected an answer
1

A ConvNet for the 2020s (CVPR 2022)

The “Roaring 20s” of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model. A vanilla ViT, on the other hand, faces difficulties when applied to general computer vision tasks such as object detection and semantic segmentation. It is the hierarchical Transformers (e.g., Swin Transformers) that reintroduced several ConvNet priors, making Transformers practically viable as a generic vision backbone and demonstrating remarkable performance on a wide variety of vision tasks. However, the effectiveness of such hybrid approaches is still largely credited to the intrinsic superiority of Transformers, rather than the inherent inductive biases of convolutions. In this work, we reexamine the design spaces and test the limits of what a pure ConvNet can achieve. We gradually “modernize” a standard ResNet toward the design of a vision Transformer, and discover several key components that contribute to the performance difference along the way. The outcome of this exploration is a family of pure ConvNet models dubbed ConvNeXt. Constructed entirely from standard ConvNet modules, ConvNeXts compete favorably with Transformers in terms of accuracy and scalability, achieving 87.8% ImageNet top-1 accuracy and outperforming Swin Transformers on COCO detection and ADE20K segmentation, while maintaining the simplicity and efficiency of standard ConvNets.

Mimoun Mohamed Unselected an answer
1

Hierarchical associative memory
Dmitry Krotov

Dense Associative Memories or Modern Hopfield Networks have many appealing properties of associative memory. They can do pattern completion, store a large number of memories, and can be described using a recurrent neural network with a degree of biological plausibility and rich feedback between the neurons. At the same time, up until now all the models of this class have had only one hidden layer, and have only been formulated with densely connected network architectures, two aspects that hinder their machine learning applications. This paper tackles this gap and describes a fully recurrent model of associative memory with an arbitrary large number of layers, some of which can be locally connected (convolutional), and a corresponding energy function that decreases on the dynamical trajectory of the neurons’ activations. The memories of the full network are dynamically “assembled” using primitives encoded in the synaptic weights of the lower layers, with the “assembling rules” encoded in the synaptic weights of the higher layers. In addition to the bottom-up propagation of information, typical of commonly used feedforward neural networks, the model described has rich top-down feedback from higher layers that help the lower-layer neurons to decide on their response to the input stimuli.

Hamed Benazha Answered question
2

https://arxiv.org/abs/2306.09222

https://blog.research.google/2023/09/re-weighted-gradient-descent-via.html
Stochastic Re-weighted Gradient Descent via Distributionally Robust Optimization
We develop a re-weighted gradient descent technique for boosting the performance of deep neural networks, which involves importance weighting of data points during each optimization step. Our approach is inspired by distributionally robust optimization with f-divergences, which has been known to result in models with improved generalization guarantees. Our re-weighting scheme is simple, computationally efficient, and can be combined with many popular optimization algorithms such as SGD and Adam. Empirically, we demonstrate the superiority of our approach on various tasks, including supervised learning, domain adaptation. Notably, we obtain improvements of +0.7% and +1.44% over SOTA on DomainBed and Tabular classification benchmarks, respectively. Moreover, our algorithm boosts the performance of BERT on GLUE benchmarks by +1.94%, and ViT on ImageNet-1K by +1.01%. These results demonstrate the effectiveness of the proposed approach, indicating its potential for improving performance in diverse domains.
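A minimal sketch of the idea of importance-weighting data points at each optimization step, written for a toy 1-D linear regression. It is not the authors' implementation: the exponential weighting (the form that arises from a KL-divergence robustness set), the loss clipping, and the names `dro_weights`, `reweighted_sgd_step`, `temperature` and `clip` are all assumptions made for illustration.

```python
import math


def dro_weights(losses, temperature=1.0, clip=10.0):
    """Normalized weights proportional to exp(min(loss, clip) / temperature).

    Assumption: exponential tilting, as obtained from a KL-divergence
    robustness set; the paper's exact weighting may differ.
    """
    scaled = [min(loss, clip) / temperature for loss in losses]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def reweighted_sgd_step(w, b, xs, ys, lr=0.1, temperature=1.0):
    """One re-weighted gradient step for the model y ~ w * x + b."""
    residuals = [w * x + b - y for x, y in zip(xs, ys)]
    losses = [r * r for r in residuals]  # per-sample squared error
    # Weights are treated as constants: no gradient flows through them.
    weights = dro_weights(losses, temperature)
    grad_w = sum(wt * 2.0 * r * x for wt, r, x in zip(weights, residuals, xs))
    grad_b = sum(wt * 2.0 * r for wt, r in zip(weights, residuals))
    return w - lr * grad_w, b - lr * grad_b
```

Samples with a larger loss receive a larger weight, so each step focuses on the hardest points in the batch. A lower `temperature` sharpens that focus, and a very large one recovers plain uniform averaging.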

Chandrasekar Subramani-narayana Answered question
1

https://www.nature.com/articles/s41592-021-01284-3
Avoiding a replication crisis in deep-learning-based bioimage analysis
Deep learning algorithms are powerful tools for analyzing, restoring and transforming bioimaging data. One promise of deep learning is parameter-free one-click image analysis with expert-level performance in a fraction of the time previously required. However, as with most emerging technologies, the potential for inappropriate use is raising concerns among the research community. In this Comment, we discuss key concepts that we believe are important for researchers to consider when using deep learning for their microscopy studies. We describe how results obtained using deep learning can be validated and propose what should, in our view, be considered when choosing a suitable tool. We also suggest what aspects of a deep learning analysis should be reported in publications to ensure reproducibility. We hope this perspective will foster further discussion among developers, image analysis specialists, users and journal editors to define adequate guidelines and ensure the appropriate use of this transformative technology.

Chandrasekar Subramani-narayana Answered question
0

Heist, N., Paulheim, H. (2023). NASTyLinker: NIL-Aware Scalable Transformer-Based Entity Linker. In: Pesquita, C., et al. The Semantic Web. ESWC 2023. Lecture Notes in Computer Science, vol 13870. Springer.

Entity Linking (EL) is the task of detecting mentions of entities in text and disambiguating them to a reference knowledge base. Most prevalent EL approaches assume that the reference knowledge base is complete. In practice, however, it is necessary to deal with the case of linking to an entity that is not contained in the knowledge base (NIL entity). Recent works have shown that, instead of focusing only on affinities between mentions and entities, considering inter-mention affinities can be used to represent NIL entities by producing clusters of mentions. At the same time, inter-mention affinities can help to substantially improve linking performance for known entities. With NASTyLinker, we introduce an EL approach that is aware of NIL entities and produces corresponding mention clusters while maintaining high linking performance for known entities. The approach clusters mentions and entities based on dense representations from Transformers and resolves conflicts (if more than one entity is assigned to a cluster) by computing transitive mention-entity affinities. We show the effectiveness and scalability of NASTyLinker on NILK, a dataset that is explicitly constructed to evaluate EL with respect to NIL entities. Further, we apply the presented approach to an actual EL task, namely to knowledge graph population by linking entities in Wikipedia listings, and provide an analysis of the outcome.

Rafika Boutalbi Changed status to publish
2

Action Matching: Learning Stochastic Dynamics from Samples
Kirill Neklyudov, Rob Brekelmans, Daniel Severo, Alireza Makhzani

Learning the continuous dynamics of a system from snapshots of its temporal marginals is a problem which appears throughout natural sciences and machine learning, including in quantum systems, single-cell biological data, and generative modeling. In these settings, we assume access to cross-sectional samples that are uncorrelated over time, rather than full trajectories of samples. In order to better understand the systems under observation, we would like to learn a model of the underlying process that allows us to propagate samples in time and thereby simulate entire individual trajectories. In this work, we propose Action Matching, a method for learning a rich family of dynamics using only independent samples from its time evolution. We derive a tractable training objective, which does not rely on explicit assumptions about the underlying dynamics and does not require back-propagation through differential equations or optimal transport solvers. Inspired by connections with optimal transport, we derive extensions of Action Matching to learn stochastic differential equations and dynamics involving creation and destruction of probability mass. Finally, we showcase applications of Action Matching by achieving competitive performance in a diverse set of experiments from biology, physics, and generative modeling.
https://arxiv.org/abs/2210.06662

Swetali Nimje Answered question
3

Replay and compositional computation (2023)

Ideas for AI architectures for lifelong, continual learning with strong generalization capabilities, based on a new neuroscience proposal regarding the function of replay in human and animal brain activity.

Thomas Schatz Answered question
3

B-cos Networks: Alignment is All We Need for Interpretability
Another approach towards interpretability. Rejects standard CNNs and Transformers, rejects standard interpretable recognition evaluation, and proposes B-cos networks and localization-focused evaluation.

https://arxiv.org/abs/2205.10268

Felipe Torres Answered question
4

Forward-Forward Algorithm, Hinton: https://arxiv.org/abs/2212.13345

Hamed Benazha Answered question
3

Luo, Yuetian, and Anru R. Zhang. “Tensor clustering with planted structures: Statistical optimality and computational limits.” The Annals of Statistics 50.1 (2022): 584-613.

Rafika Boutalbi Answered question
4

Li, Xingfeng, et al. “Auto-weighted tensor Schatten p-norm for robust multi-view graph clustering.” Pattern Recognition 134 (2023): 109083.

Rafika Boutalbi Answered question
8

What Makes Multi-modal Learning Better than Single (Provably)
https://arxiv.org/pdf/2106.04538.pdf
NeurIPS 2021

Cecile Capponi Answered question
0

A method for learning to robustly segment object instances from images, inspired by the development of infant visual perception.

Chen, H., Venkatesh, R., Friedman, Y., Wu, J., Tenenbaum, J. B., Yamins, D. L., & Bear, D. M. (2022, October). Unsupervised segmentation in real-world images via Spelke object inference. In European Conference on Computer Vision (pp. 719-735). [pdf]

Thomas Schatz Answered question
6

A brain-inspired method for maintaining a fixed-size representation of the past in recurrent neural networks.

Gu, A., Dao, T., Ermon, S., Rudra, A., & Ré, C. (2020). HiPPO: Recurrent memory with optimal polynomial projections. Advances in Neural Information Processing Systems, 33, 1474-1487. [pdf]

Thomas Schatz Answered question
1

Counting Out Time: Class Agnostic Video Repetition Counting in the Wild

https://openaccess.thecvf.com/content_CVPR_2020/html/Dwibedi_Counting_Out_Time_Class_Agnostic_Video_Repetition_Counting_in_the_CVPR_2020_paper.html

Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 10387-10396

Chandrasekar Subramani-narayana Answered question
3

Choromanski, Krzysztof Marcin. “Taming graph kernels with random features.” International Conference on Machine Learning. PMLR, 2023. [pdf]

Hachem Kadri Edited answer
4

François Chollet, On the Measure of Intelligence, ArXiv 2019, https://arxiv.org/abs/1911.01547

François-Xavier Answered question
4

Tsigler, Alexander, and Peter L. Bartlett. “Benign overfitting in ridge regression.” J. Mach. Learn. Res. 24.123 (2023): 1-76. [pdf]

Hachem Kadri Answered question
-2

Philippe Gautret, Jean-Christophe Lagier, Philippe Parola, Line Meddeb, Morgane Mailhe, Barbara Doudier, Johan Courjon, Valérie Giordanengo, Vera Esteves Vieira, Hervé Tissot Dupont, Stéphane Honoré, Philippe Colson, Eric Chabrière, Bernard La Scola, Jean-Marc Rolain, Philippe Brouqui, Didier Raoult, Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial, International journal of antimicrobial agents, 2020.

Ronan Sicre Posted new comment

Beyond Expertise and Roles: A Framework to Characterize the Stakeholders of Interpretable Machine Learning and their Needs

https://dl.acm.org/doi/10.1145/3411764.3445088

4

Some theoretical insights into the benefits of deep learning

Chizat, L., & Bach, F. Implicit bias of gradient descent for wide two-layer neural networks trained with the logistic loss. (COLT 2020) [pdf]

Thomas Schatz Edited answer
4

An Image is Worth 16x16 Words. ICLR 2021. https://arxiv.org/abs/2010.11929

Introduces the Vision Transformer. Relevant because of the paradigm shift in CV and DL from convnets to transformers (could also make connections to language). Understanding transformers is quite relevant nowadays.

Felipe Torres Answered question
0

ResNet strikes back: An improved training procedure in timm. https://arxiv.org/abs/2110.00476

Same spirit as A Metric Learning Reality Check. Shows that ResNet is still relevant even when compared with approaches as novel as transformers. The augmentation/training recipe matters and must therefore be taken into account for comparisons to be fair.

Felipe Torres Answered question
3

A Metric Learning Reality Check: https://arxiv.org/abs/2003.08505 (ECCV 2020)

Mostly about how current comparisons are not fair: state-of-the-art losses and architectures trained under different regimes yield different results once state-of-the-art or most recent regularization/augmentation/training recipes are used.

Felipe Torres Answered question
4

B Biggio, B Nelson, P Laskov, Poisoning Attacks against Support Vector Machines, ICML 2012. [pdf]

Thomas Schatz Changed status to publish
2

R. Koenker and K. F. Hallock, Quantile Regression, Journal of Economic Perspectives, 2001. [pdf]

Valentin Emiya Edited answer