posts
writing as I learn
Tiny Recursive Models Pt. 1
a breakdown and some randomization experiments
A few months ago Hierarchical Reasoning Models (HRMs) showed remarkable ARC performance for their relatively tiny (27M) parameter count. While HRMs introduced a lot of…
Apr 11, 2026
9 min
Fuzzy matching professors to their reviews
lessons from a simple ML approach
During my undergrad, I built a simple schedule planning site for my university. By the third semester I’d grown tired of using Excel and learned enough JavaScript to code up…
Sep 9, 2025
9 min
Engression
exploring a lightweight approach to distributional regression
paper
Traditional regression models predict the conditional mean \(\mathbb{E}[Y \mid X = x]\), or sometimes a few quantiles. In contrast, distributional regression attempts to learn the en…
May 6, 2025
11 min
Distilling the Knowledge in a Neural Network
Revisiting and implementing part of the classical paper
deep learning
paper
This classic paper introduced distillation as a way of transferring knowledge from a big teacher network into a small one. The core observation is that we should use the big…
Jan 28, 2025
4 min
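A minimal sketch of the distillation objective the excerpt describes: cross-entropy against temperature-softened teacher probabilities, blended with ordinary hard-label cross-entropy. The function names, the \(T^2\) scaling, and the \(\alpha\)/\(T\) defaults follow the classic Hinton et al. formulation, but this is an illustrative sketch, not code from the post.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T produces softer distributions."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend soft-target cross-entropy (vs. the teacher at temperature T)
    with hard-label cross-entropy. The T**2 factor keeps the soft-target
    gradient magnitude comparable across temperatures."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student_T = np.log(softmax(student_logits, T))
    soft = -(p_teacher * log_p_student_T).sum(axis=-1).mean() * (T ** 2)
    log_p_student = np.log(softmax(student_logits))
    hard = -log_p_student[np.arange(len(labels)), labels].mean()
    return alpha * soft + (1 - alpha) * hard

# toy logits: a student that roughly agrees with its teacher
s = np.array([[2.0, 0.5, -1.0], [0.2, 1.8, -0.3]])
t = np.array([[1.6, 0.7, -0.8], [0.1, 2.0, -0.5]])
y = np.array([0, 1])
loss = distillation_loss(s, t, y)
```

The soft term dominates training in the original recipe; `alpha` here just makes the trade-off explicit.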
TENT: Fully Test-Time Adaptation By Entropy Minimization
An attempted (partial) paper reproduction
deep learning
paper
Once a model is deployed, the feature (covariate) data distribution might shift from the one seen during training. These shifts push models out-of-distribution and worsen…
Dec 29, 2024
3 min
A Closer Look at Memorization in Deep Networks
An attempted (partial) paper reproduction
deep learning
paper
This paper argues that memorization is a behavior exhibited by networks trained on random data, as, in the absence of patterns, they can only rely on remembering examples.…
Sep 7, 2024
4 min
Approximate Nearest Cosine Neighbors
Using Random Hyperplane LSH
cs
quick intro
Aug 9, 2024
4 min
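A hedged sketch of the random-hyperplane LSH idea this post's title refers to: draw random hyperplanes through the origin and hash each vector by which side of each plane it falls on, so cosine-similar vectors get signatures that agree in most bits. Function names and parameters here are illustrative, not taken from the post.

```python
import numpy as np

def hyperplane_signatures(vectors, n_planes=16, seed=0):
    """Hash row vectors to binary signatures via random hyperplanes.

    Each hyperplane is defined by a random normal vector; the sign of the
    dot product says which side of the plane a vector lies on. Vectors
    with small angular (cosine) distance flip few bits between them.
    """
    rng = np.random.default_rng(seed)
    dim = vectors.shape[1]
    planes = rng.standard_normal((n_planes, dim))  # one normal per plane
    return (vectors @ planes.T >= 0).astype(np.uint8)

def hamming_distance(sig_a, sig_b):
    """Number of differing signature bits (proxy for angular distance)."""
    return int(np.sum(sig_a != sig_b))

# nearly parallel vectors should share most bits; opposite ones almost none
a = np.array([[1.0, 2.0, 3.0]])
b = np.array([[1.01, 2.0, 3.0]])   # tiny angle to a
c = np.array([[-1.0, -2.0, -3.0]]) # points the opposite way
sa, sb, sc = (hyperplane_signatures(v, n_planes=32) for v in (a, b, c))
```

For a full index you would bucket vectors by signature (or signature prefix) and only compare candidates that collide, which is what makes the lookup sublinear.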
Understanding Batch Normalization
An attempted (partial) paper reproduction
deep learning
paper
The paper investigates the cause of batch norm’s benefits experimentally. The authors show that its main benefit is allowing for larger learning rates during training. In…
Jul 17, 2024
8 min
Deep Learning is Robust to Massive Label Noise
An attempted (partial) paper reproduction
deep learning
paper
The paper shows that neural networks can keep generalizing when large numbers of (non-adversarially) incorrectly labeled examples are added to datasets (MNIST, CIFAR, and…
Jun 18, 2024
4 min