jeff's blog

Bits from Rewards: Simple Information-Theoretic Bounds for RL vs. SFT in Verifiable Tasks

In a blog post, John Schulman (Thinking Machines) argues, backed by careful experiments, that LoRA RL fine-tuning can match full-parameter RL fine-tuning. His intuition is that in common reinforcement-learning-from-verifiable-rewards (RLVR) settings, the total number of bits fed to the learner is small, so you don't need to move all of the base weights to absorb the signal.
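A back-of-the-envelope version of that bound (illustrative numbers of my own, not taken from the post): with a verifiable reward that takes one of a few discrete values, each episode conveys at most log2(number of reward values) bits, so the total information in the reward stream is bounded by the episode count times that.

```python
import math

def reward_bits_upper_bound(num_episodes: int, reward_values: int = 2) -> float:
    """Upper bound, in bits, on the information carried by the rewards:
    each episode's reward is one draw from `reward_values` possible outcomes."""
    return num_episodes * math.log2(reward_values)

# Illustrative: 100k episodes with a binary pass/fail reward carry at most
# 1e5 bits, tiny next to the billions of parameters in a modern LLM.
bits = reward_bits_upper_bound(100_000)
```

The gap between this bound and the parameter count is the core of the "you don't need full-parameter updates" intuition.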

Why Patch Features Go Global: A First-Order View of Feature Degradation and Why Gram Anchoring Helps

Self-supervised learning has been a game-changer for training large-scale Vision Transformers (ViTs), allowing models to learn powerful visual representations without human-provided labels. However, popular methods like DINO and iBOT suffer from a curious problem: as they train longer, their ability to handle dense, local tasks degrades, even as their global performance improves. This phenomenon, known as feature degradation, is characterized by noisy patch similarity maps and an over-alignment of patch features with the global CLS token.
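As a rough illustration of the Gram-anchoring idea (a pure-Python sketch of my own, not the actual implementation, which would operate on batched torch tensors): the regularizer holds the patch-patch similarity (Gram) matrix of the current features close to that of an anchor model, preserving local similarity structure even as the global objective keeps training.

```python
import math

def cosine_gram(features):
    """Gram matrix of cosine similarities between patch feature vectors."""
    norms = [math.sqrt(sum(v * v for v in f)) for f in features]
    unit = [[v / n for v in f] for f, n in zip(features, norms)]
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]

def gram_anchor_loss(patches, anchor_patches):
    """Mean squared difference between the current and anchor Gram matrices,
    penalizing drift in patch-patch similarity structure."""
    g, g0 = cosine_gram(patches), cosine_gram(anchor_patches)
    n = len(g)
    return sum((g[i][j] - g0[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)
```

The loss is zero when the similarity structure is unchanged, regardless of how the features themselves rotate or rescale.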

Video Loading on PyTorch

In this post, we will see how to load a video using PyTorch, followed by a rant on how we perform evaluations of video models. We will try several methods for loading a video and converting it to a PyTorch tensor: VideoClips (from torchvision), torchvision.io (using PyAV), decord, and my own implementation using ffmpeg and FFCV. We will use the Kinetics-400 dataset as an example; you can find the dataset here. All the code is available in this repository. I modified FFCV to handle videos; you can find the fork here.
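Whichever decoder you pick, a loader also has to decide *which* frames to decode; a minimal sketch of the common evaluation-time uniform sampling (a hypothetical helper of my own, not part of any of the libraries above):

```python
def uniform_frame_indices(num_frames: int, clip_len: int) -> list[int]:
    """Pick `clip_len` frame indices evenly spaced across a video with
    `num_frames` total frames."""
    if num_frames <= 0 or clip_len <= 0:
        raise ValueError("num_frames and clip_len must be positive")
    # Center each sample inside its segment to avoid biasing toward frame 0.
    step = num_frames / clip_len
    return [min(int(step * i + step / 2), num_frames - 1) for i in range(clip_len)]
```

A decoder like decord or PyAV would then seek to exactly these indices instead of decoding the whole video.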

Link Collection 2

Some links and papers that I have found interesting this week. If you have any comments, please let me know.

Link Collection 1

Some links and papers that I have found interesting this week. If you have any comments, please let me know.