This event marks the first in-person YC Paper Club meetup, aimed at building a community that connects top founders with experienced researchers. The host emphasized the community's mission to address the information asymmetry between academic research and industrial deployment, and reflected on personally witnessing the early days of AI companies like OpenAI through YC. Located in Silicon Valley, the event was designed to tap into the region's rich AI talent, advance real-world applications through accessible paper discussions, and spark more collaboration between academia and industry.
⚡️ Speculative-Speculative Decoding (SSD) for Inference Acceleration
Tanishq presented Speculative-Speculative Decoding (SSD), a technique for speeding up large-model inference, and argued that inference is not just a cost problem but a fundamental capability constraint. The core idea is to parallelize steps that are traditionally serial: while the verifier is still checking the current draft, the draft model predicts the verifier's likely outputs and drafts ahead from them, which hides the latency of the drafting phase. This yields significant throughput gains on limited hardware. The broader point is that algorithmic innovation can not only cut costs but also turn inference "think time" into a core capability that delivers higher-quality output.
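The drafting-ahead idea can be illustrated with a toy, single-threaded simulation. This is only a sketch under stated assumptions, not the method from the talk: `target` and `draft` are made-up deterministic "models", decoding is greedy, and the look-ahead cache covers just one predicted outcome (every drafted token accepted, plus the draft model's guess at the verifier's bonus token). In a real system the look-ahead drafting and the verification would run concurrently; here they run one after the other and the code only counts how often the pre-computed draft could be reused.

```python
from typing import List, Tuple


def target(ctx: List[int]) -> int:
    """Toy verifier: the greedy next token is previous token + 1 (mod 10)."""
    return (ctx[-1] + 1) % 10


def draft(ctx: List[int]) -> int:
    """Toy draft model: emits 0 at every 5th position, which is usually wrong."""
    return 0 if len(ctx) % 5 == 0 else (ctx[-1] + 1) % 10


def draft_tokens(ctx: List[int], k: int) -> List[int]:
    """Autoregressively draft k tokens with the cheap model."""
    out: List[int] = []
    for _ in range(k):
        out.append(draft(ctx + out))
    return out


def verify(ctx: List[int], proposal: List[int]) -> List[int]:
    """Accept the longest matching prefix, then append one verifier token
    (a correction on mismatch, or a bonus token if everything matched)."""
    accepted: List[int] = []
    for tok in proposal:
        want = target(ctx + accepted)
        if tok != want:
            return accepted + [want]
        accepted.append(tok)
    return accepted + [target(ctx + accepted)]


def ssd_generate(prompt: List[int], n_new: int, k: int = 2) -> Tuple[List[int], int, int]:
    ctx = list(prompt)
    proposal = draft_tokens(ctx, k)
    hits = rounds = 0
    while len(ctx) - len(prompt) < n_new:
        # Look-ahead (would overlap with verification in a real system):
        # guess the post-verification context and draft from it in advance.
        guess_ctx = ctx + proposal + [draft(ctx + proposal)]
        cache = {tuple(guess_ctx): draft_tokens(guess_ctx, k)}
        # Verification of the current proposal.
        ctx = ctx + verify(ctx, proposal)
        rounds += 1
        cached = cache.get(tuple(ctx))
        if cached is not None:
            hits += 1  # next draft was already prepared: drafting latency hidden
            proposal = cached
        else:
            proposal = draft_tokens(ctx, k)  # fall back to serial drafting
    return ctx[len(prompt):len(prompt) + n_new], hits, rounds


def greedy_decode(prompt: List[int], n_new: int) -> List[int]:
    """Reference: plain token-by-token decoding with the verifier."""
    ctx = list(prompt)
    for _ in range(n_new):
        ctx.append(target(ctx))
    return ctx[len(prompt):]


tokens, hits, rounds = ssd_generate([0], n_new=8, k=2)
```

The output matches plain greedy decoding with the verifier, so the look-ahead changes only when drafting work happens, not what is generated. `hits` out of `rounds` is the fraction of rounds in which the next draft was ready as soon as verification finished.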
🤖 Diffusion Model Predictive Control (DMPC) for Robotics
Stanis introduced how diffusion models can be incorporated into Model Predictive Control (MPC) to build more robust robot policies. DMPC combines a multi-step action proposal with a multi-step dynamics model, which reduces error accumulation in long-horizon trajectory forecasting. A key advantage is real-time adaptation to new tasks and different dynamics: modifying the test-time reward function or updating the dynamics model is enough, with no need to retrain the full policy. Experiments show that DMPC can leverage video data more effectively than traditional imitation learning and generalizes well across diverse tasks.
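The planning loop described above can be sketched on a one-dimensional toy problem. Everything here is an illustrative assumption: a uniform random sampler stands in for the diffusion action proposal, the dynamics models are hand-written functions instead of learned diffusion models, and the names (`plan_step`, `run`, `reach`) are hypothetical. The point is only the structure: sample multi-step action plans, roll them out through a dynamics model, score them with a reward chosen at test time, and execute the first action of the best plan.

```python
import random
from typing import Callable, List

Dynamics = Callable[[float, float], float]  # (state, action) -> next state
Reward = Callable[[float], float]           # state -> scalar reward


def sample_action_plans(rng: random.Random, n_plans: int, horizon: int) -> List[List[float]]:
    # Stand-in for a diffusion action proposal: uniform random action sequences.
    return [[rng.uniform(-1.0, 1.0) for _ in range(horizon)] for _ in range(n_plans)]


def rollout(x: float, plan: List[float], dynamics: Dynamics) -> List[float]:
    """Predict the multi-step state trajectory for one action plan."""
    states = []
    for a in plan:
        x = dynamics(x, a)
        states.append(x)
    return states


def plan_step(x: float, model: Dynamics, reward: Reward, rng: random.Random,
              n_plans: int = 128, horizon: int = 4) -> float:
    """Sample plans, score each predicted trajectory, return the best first action."""
    plans = sample_action_plans(rng, n_plans, horizon)
    best = max(plans, key=lambda p: sum(reward(s) for s in rollout(x, p, model)))
    return best[0]


def run(x0: float, true_dynamics: Dynamics, model: Dynamics, reward: Reward,
        steps: int = 12, seed: int = 0) -> float:
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        a = plan_step(x, model, reward, rng)  # replan at every step (MPC)
        x = true_dynamics(x, a)
    return x


def reach(goal: float) -> Reward:
    """Reward that prefers states close to `goal`."""
    return lambda s: -abs(s - goal)


def unit(x: float, a: float) -> float:
    return x + a


def damped(x: float, a: float) -> float:
    return x + 0.5 * a


x_goal_a = run(0.0, unit, unit, reach(3.0))      # original task
x_goal_b = run(0.0, unit, unit, reach(-2.0))     # new task: only the reward changed
x_damped = run(0.0, damped, damped, reach(3.0))  # new dynamics: only the model changed
```

Nothing is retrained between the three runs. Swapping the reward or the dynamics function is the toy analogue of the test-time adaptation described above.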
🌍 LayWorld: An Efficient JEPA-Based World Model
Isaac presented LayWorld, an efficient world-model approach based on the Joint-Embedding Predictive Architecture (JEPA) and designed to avoid representation collapse, where the encoder maps every input to nearly the same embedding. A novel SigReg regularization term keeps the latent distribution well spread and the predictions stable. Compared with more complex existing methods, LayWorld trains efficiently with minimal hyperparameter tuning. It also shows strong uncertainty quantification, detecting anomalous input fluctuations in real time, which supports safe planning and decision-making by agents in the real world.
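The summary does not give the formula for the SigReg term, so the following is a sketch under an explicit assumption: that the regularizer pushes embeddings toward an isotropic Gaussian by projecting them onto random directions and comparing each projection's empirical characteristic function with that of a standard normal, exp(-t²/2). The function names, the grid of `t` values, and the way the penalty is added to the prediction loss are all hypothetical.

```python
import cmath
import math
import random
from typing import List, Sequence


def sigreg_penalty(embeddings: Sequence[Sequence[float]], n_dirs: int = 16,
                   t_grid: Sequence[float] = (0.5, 1.0, 1.5, 2.0), seed: int = 0) -> float:
    """Assumed form of the regularizer: mean squared gap between the empirical
    characteristic function of random 1-D projections and that of N(0, 1)."""
    rng = random.Random(seed)
    dim = len(embeddings[0])
    total = 0.0
    for _ in range(n_dirs):
        # Random unit direction in embedding space.
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in d))
        d = [v / norm for v in d]
        proj = [sum(e_i * d_i for e_i, d_i in zip(e, d)) for e in embeddings]
        for t in t_grid:
            ecf = sum(cmath.exp(1j * t * p) for p in proj) / len(proj)
            total += abs(ecf - math.exp(-0.5 * t * t)) ** 2
    return total / (n_dirs * len(t_grid))


def jepa_loss(predicted: List[List[float]], target_emb: List[List[float]],
              lam: float = 1.0) -> float:
    """Hypothetical combined objective: latent prediction error plus the penalty."""
    n = len(predicted) * len(predicted[0])
    mse = sum((p - q) ** 2 for ps, qs in zip(predicted, target_emb)
              for p, q in zip(ps, qs)) / n
    return mse + lam * sigreg_penalty(target_emb)


data_rng = random.Random(1)
collapsed = [[0.0] * 4 for _ in range(256)]  # every input mapped to one point
spread = [[data_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(512)]

penalty_collapsed = sigreg_penalty(collapsed)
penalty_spread = sigreg_penalty(spread)
```

A collapsed encoder predicts its own constant output perfectly, so the prediction error alone cannot detect collapse. Under the assumed penalty, the collapsed batch scores far worse than the Gaussian batch, which is what keeps the trivial solution from being optimal.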
🧠 Exploring the Nature of Deep Learning Generalization
Akshay examined why deep learning generalizes, focusing on why large models still perform well despite over-parameterization, when classical statistical learning theory would predict overfitting. Using the PAC-Bayes framework, he explained that compressibility and generalization are positively correlated: the fewer bits needed to describe a trained model, the tighter its generalization bound, and over-parameterized models tend to find smoother, more compressible solutions in parameter space. Soft inductive biases let neural networks keep a large, expressive hypothesis space while still preferring simple solutions that fit structured data. This view helps demystify deep learning and points toward improved sample efficiency.
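The link between compressibility and generalization can be made concrete with a compression-style bound. The sketch below uses the classical Occam form for a loss bounded in [0, 1] with a prior of 2^(-K) on a model that compresses to K bits: with probability at least 1 - delta, true error <= training error + sqrt((K ln 2 + ln(1/delta)) / (2n)). This is a standard textbook bound chosen for illustration; the exact PAC-Bayes variant used in the talk is not stated in this summary, and the numbers below are made up.

```python
import math


def compression_bound(train_error: float, compressed_bits: float,
                      n_samples: float, delta: float = 0.05) -> float:
    """Upper bound on true error for a model that compresses to `compressed_bits`.

    Assumes a loss bounded in [0, 1] and n_samples i.i.d. training examples.
    """
    complexity = compressed_bits * math.log(2) + math.log(1.0 / delta)
    return train_error + math.sqrt(complexity / (2.0 * n_samples))


# Same training error and dataset size, different compressed sizes (illustrative values).
small = compression_bound(0.10, compressed_bits=1e5, n_samples=1e6)
large = compression_bound(0.10, compressed_bits=1e7, n_samples=1e6)
```

With these illustrative numbers, the model that compresses to 10^5 bits gets a bound of about 0.29, while the one that needs 10^7 bits gets a bound above 1, which says nothing. The raw parameter count never appears in the formula; only the compressed description length does.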
📈 Infinite-Compute Training Strategies Under Data Constraints
Kan Wu addressed how to maximize data efficiency through algorithmic innovation when pre-training data is scarce but compute is abundant. The paper combines regularization with ensembling and finds that aggressive weight decay together with ensemble training significantly lowers the asymptotic loss, that is, the loss a training recipe approaches as compute keeps growing. The work studies scaling laws in settings where compute is unconstrained and shows that data epoching and distillation can match or even exceed traditional pre-training performance on smaller datasets. The result is a practical methodology for training models when data is scarce.
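The two ingredients named above, weight decay and ensembling, can be shown on a toy problem. This is a minimal sketch and not the paper's setup: the "models" are tiny linear regressors trained with SGD, the dataset is synthetic, and all hyperparameter values are arbitrary. Each ensemble member sees the same small dataset for many epochs but uses a different seed for initialization and data order, and the ensemble averages the members' predictions.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[List[float], float]


def make_data(rng: random.Random, n: int, noise: float = 0.5) -> List[Example]:
    """Synthetic regression data from a fixed linear rule plus Gaussian noise."""
    true_w = [1.0, -2.0, 0.5]
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in true_w]
        y = sum(w * xi for w, xi in zip(true_w, x)) + rng.gauss(0.0, noise)
        data.append((x, y))
    return data


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_member(data: List[Example], seed: int, weight_decay: float = 0.1,
                 lr: float = 0.05, epochs: int = 30) -> List[float]:
    """SGD over many epochs of the same small dataset, with weight decay."""
    rng = random.Random(seed)  # the seed controls both init and data order
    w = [rng.gauss(0.0, 1.0) for _ in data[0][0]]
    order = list(data)
    for _ in range(epochs):
        rng.shuffle(order)
        for x, y in order:
            err = predict(w, x) - y
            # Gradient step on squared error plus a weight-decay pull toward zero.
            w = [wi - lr * (err * xi + weight_decay * wi) for wi, xi in zip(w, x)]
    return w


def mse(predict_fn: Callable[[List[float]], float], data: List[Example]) -> float:
    return sum((predict_fn(x) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)
train, held_out = make_data(rng, 8), make_data(rng, 200)
members = [train_member(train, seed=s) for s in range(5)]

member_mse = [mse(lambda x, w=w: predict(w, x), held_out) for w in members]
ensemble_mse = mse(lambda x: sum(predict(w, x) for w in members) / len(members), held_out)
```

Because squared error is convex, the averaged prediction can never do worse than the members' average error. How much it helps in practice, and how strong the weight decay should be, are empirical questions that this toy example does not answer.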
Highlights
⚡ Speculative-Speculative Decoding treats inference "think time" as a core capability — not just a cost — by parallelizing serial steps to unlock higher-quality model output on limited hardware.
🤖 Diffusion Model Predictive Control enables robots to adapt to new tasks at test time by modifying reward functions or dynamics models without retraining, making real-world deployment far more practical.
🌍 The LayWorld model solves representation collapse with a lightweight SigReg term, enabling efficient world-model training with strong uncertainty quantification for safe agent decision-making.
🧠 Model compressibility positively correlates with generalization: over-parameterized networks find smoother, more compressible solutions, explaining why large models don't overfit as classical theory predicts.
📈 When data is scarce but compute is abundant, combining aggressive weight decay with ensemble training can match or exceed traditional pre-training performance — a practical blueprint for data-constrained builders.