
The Institute of Mathematical Statistics (IMS) is proud to introduce the annual IMS Frontiers in Statistical Machine Learning (FSML) workshop series. This series is dedicated to exploring emerging and impactful topics in the field of statistical machine learning that have yet to receive significant attention in leading IMS and ASA publications. Each year, the FSML workshop will spotlight 2-3 themes where research is rapidly evolving, encouraging the dissemination of novel ideas and fostering deeper engagement within the community.

 

Inspired by the dynamic format of machine learning conference workshops, FSML brings a fresh and interactive approach to the statistics landscape. The workshop will host an open call for short paper submissions, followed by a rigorous and transparent review process to ensure the highest quality of contributions. This inclusive model is designed to promote the exchange of innovative research, stimulating conversation and collaboration among attendees.


Accepted submissions will be presented through poster sessions and enriched by discussions that provide real-time feedback and networking opportunities. This structure is aimed at enhancing dialogue between researchers and practitioners, paving the way for future advances in the field.​​

The Inaugural Workshop on
Frontiers in Statistical Machine Learning (FSML)

Conference Venue

​The inaugural FSML Workshop will be held on August 2, 2025, at Vanderbilt University in Nashville, Tennessee—immediately preceding the 2025 Joint Statistical Meetings (JSM) and following the 2025 IMS NRC meeting.

 

The program will take place in the Commodore Ballroom of Vanderbilt’s Student Life Center, a convenient 10-minute walk from the NRC’s Holiday Inn Nashville-Vanderbilt venue and about a 10-minute drive (or 30-minute bus ride) from the JSM site at the Music City Center.

Topics

There will be two main streams in the 2025 workshop: 


1) The Science of Deep Learning

  • Theoretical Foundations: Exploring mathematical and statistical principles underlying deep learning.

  • Phenomenological Studies of Learning Systems: Cataloging and explaining intriguing behaviors in learning dynamics.

  • Interpretability, Alignment, and Safety: Understanding and guiding AI systems to ensure ethical and safe operation.

  • Emerging Learning Paradigms: Investigating new approaches, such as in-context learning and scaling laws.

  • Other Related Topics


2) Statistical Learning from Heterogeneous Data Sources and Generalization

  • Learning under Distribution Shifts: Addressing challenges caused by mismatches between training and test data, including covariate shifts, label shifts, and other distributional changes.

  • Distribution Shifts in Scientific Replications: Investigating how distributional changes impact the reproducibility and generalizability of scientific findings.

  • Domain Adaptation and Domain Generalization: Designing methods to adapt models to new domains or ensure robust performance across diverse domains.

  • Distributional Robustness: Developing techniques to maintain reliable performance under adversarial or worst-case distributional scenarios.

  • Semi-Supervised Learning: Combining labeled and unlabeled data to improve learning and generalization across heterogeneous sources, particularly when labeled data is scarce or distributions shift between domains.

  • Foundation Model Fine-Tuning: Leveraging pre-trained models for specific tasks or domains through transfer learning and fine-tuning.

  • Other Related Topics

 

Invited Speakers

Meet the distinguished researchers who will share their insights at FSML on 1) the science of deep learning and 2) statistical learning from heterogeneous data sources and generalization. Each invited speaker brings a unique perspective and expertise to the workshop.

Session One: The Science of Deep Learning


Qi Lei

New York University

Talk Title: Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension

Talk Abstract: Weak-to-strong (W2S) generalization is a type of finetuning (FT) where a strong (large) student model is trained on pseudo-labels generated by a weak teacher. Surprisingly, W2S FT often outperforms the weak teacher. We seek to understand this phenomenon through the observation that FT often occurs in intrinsically low-dimensional spaces. Leveraging the low intrinsic dimensionality of FT, we analyze W2S in the ridgeless regression setting from a variance reduction perspective. For a strong student - weak teacher pair with sufficiently expressive low-dimensional feature subspaces Vs, Vw, we provide an exact characterization of the variance that dominates the generalization error of W2S. This unveils a virtue of discrepancy between the strong and weak models in W2S: the variance of the weak teacher is inherited by the strong student in Vs ∩ Vw, while reduced by a factor of dim(Vs)/N in the subspace of discrepancy Vw \ Vs with N pseudo-labels for W2S. Further, our analysis casts light on the sample complexities and the scaling of performance gap recovery in W2S. The analysis is supported with experiments on synthetic regression and real vision and NLP tasks.
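As a rough, self-contained illustration of the mechanism described above (not the speaker's code or exact setting), the sketch below fits a weak teacher by ridgeless least squares on a small noisy sample, pseudo-labels a large unlabeled set, and fits a strong student whose low-dimensional feature subspace overlaps the teacher's only on the true signal directions; the student's excess risk comes out noticeably smaller than the teacher's. All dimensions, sample sizes, and the noise level are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)
    d, k = 60, 5                      # ambient dimension, true signal dimension
    beta = np.zeros(d)
    beta[:k] = 1.0
    sigma = 1.0                       # label noise level

    signal = np.arange(k)
    V_w = np.r_[signal, np.arange(5, 25)]    # weak teacher's feature subspace (dim 25)
    V_s = np.r_[signal, np.arange(25, 45)]   # strong student's subspace, overlapping only on the signal

    def sample(n, noisy=True):
        X = rng.standard_normal((n, d))
        y = X @ beta + (sigma * rng.standard_normal(n) if noisy else 0.0)
        return X, y

    # Weak teacher: ridgeless least squares on a small noisy labeled set.
    Xw, yw = sample(50)
    b_w = np.zeros(d)
    b_w[V_w] = np.linalg.lstsq(Xw[:, V_w], yw, rcond=None)[0]

    # Strong student: ridgeless fit on N pseudo-labels produced by the weak teacher.
    Xu, _ = sample(2000)
    b_s = np.zeros(d)
    b_s[V_s] = np.linalg.lstsq(Xu[:, V_s], Xu @ b_w, rcond=None)[0]

    # Excess risk against the noiseless truth on fresh test data.
    Xt, yt = sample(100000, noisy=False)
    for name, b in [("weak teacher", b_w), ("W2S strong student", b_s)]:
        print(f"{name:>20}: excess risk {np.mean((Xt @ b - yt) ** 2):.3f}")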


Samet Oymak

University of Michigan

Talk Title: Data, Architecture & Algorithms in In‑Context Learning

Talk Abstract: This talk introduces recent theoretical advancements on the in-context learning (ICL) capability of sequence models, focusing on the intricate interplay of data characteristics, architectural design, and the implicit algorithms models learn. We discuss how diverse architectural designs—ranging from linear attention to state-space models to gating mechanisms—implicitly emulate optimization algorithms that operate on the context and draw connections to variations of gradient descent and expectation maximization. We elucidate the critical influence of data characteristics, such as distributional alignment, task correlation, and the presence of unlabeled examples, on ICL performance, quantifying their benefits and revealing the mechanisms through which models leverage such information. Furthermore, we will explore the optimization landscapes governing ICL, establishing conditions for unique global minima and highlighting the architectural features (e.g., depth and dynamic gating) that enable sophisticated algorithmic emulation. As a central message, we advocate that the power of architectural primitives can be gauged from their capability to handle in-context regression tasks with varying sophistication.
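As a toy numerical check of one connection the abstract alludes to (illustrative only, not the speaker's construction): a single layer of unnormalized linear attention over an in-context regression prompt, with queries/keys given by the raw inputs and values given by the labels, reproduces the prediction of one gradient-descent step on the in-context least-squares objective started from a zero weight vector.

    import numpy as np

    rng = np.random.default_rng(1)
    d, n, eta = 8, 32, 0.05                 # feature dimension, context length, GD step size

    X = rng.standard_normal((n, d))         # in-context inputs x_1, ..., x_n
    y = X @ rng.standard_normal(d)          # in-context labels from a random linear rule
    x_q = rng.standard_normal(d)            # query input

    # One gradient step from w = 0 on L(w) = 0.5 * sum_i (x_i @ w - y_i)^2.
    w_one_step = eta * X.T @ y
    gd_pred = x_q @ w_one_step

    # Linear attention: the query attends to context token i with score x_q @ x_i
    # and aggregates eta * sum_i (x_q @ x_i) * y_i.
    attn_pred = eta * (x_q @ X.T) @ y

    print(gd_pred, attn_pred)               # agree up to floating-point error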



Weijie Su

University of Pennsylvania

Talk Title: Some Thoughts on What ML-Inclined Statisticians Should Do in the Age of LLMs

Talk Abstract: In this talk, I will share my perspective on a fundamental challenge facing those of us working at the intersection of statistics and machine learning: how should we engage with large language models (LLMs)? Statistics has a long-standing history of developing principled approaches for reasoning about data in response to emerging technologies, and LLMs represent the most recent example that renders text processing nearly as tractable as numerical computation. However, LLMs constitute a significant departure from previous examples due to their black-box nature, which renders their internal mechanisms largely incomprehensible. In light of this, I will argue that the statistics community should prioritize developing methodologies for LLMs that circumvent their internal workings---such as attention mechanisms---until new mathematical theories become available to (effectively) analyze these complex systems. I will argue that, at least for now, any research that attempts to develop statistical frameworks relying on the attention mechanism or other engineering complexities of Transformers is unlikely to have lasting value. Instead, the focus should be to leverage the probabilistic, autoregressive nature of next-token prediction in LLMs and to emphasize actionable methods that capture quantitative rather than qualitative aspects of observable behaviors. For illustration, I present examples of watermarking, alignment, and evaluation for LLMs. Collectively, these findings demonstrate how statistical reasoning can yield useful insights for improving LLM development, even while the mechanisms of network architecture remain elusive. This talk is based on arXiv:2505.19145, 2506.22343, 2411.13868, 2506.12350, 2505.20627, 2503.10990, and 2506.02058.
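To make the "leverage next-token prediction" point concrete, here is a toy of one such actionable statistic: a Gumbel-max style watermark with a simple detection score. The stand-in "language model" is just a random categorical distribution per step, and the keyed randomness depends on the position rather than on preceding tokens; both are simplifications for illustration and not the constructions analysed in the papers cited in the abstract.

    import numpy as np

    V, n, key = 50, 300, 1234                       # vocabulary size, text length, secret key

    def keyed_uniforms(t):
        # Pseudorandom uniforms shared by the generator and the detector.
        return np.random.default_rng([key, t]).random(V)

    def generate(watermark, seed=0):
        rng = np.random.default_rng(seed)
        tokens = []
        for t in range(n):
            p = rng.dirichlet(np.ones(V))           # stand-in next-token distribution
            if watermark:
                u = keyed_uniforms(t)
                tokens.append(int(np.argmax(u ** (1.0 / p))))   # still an exact sample from p
            else:
                tokens.append(int(rng.choice(V, p=p)))
        return tokens

    def detect(tokens):
        # Sum of -log(1 - u_t[w_t]): mean about 1 per token without the watermark, larger with it.
        u_sel = np.array([keyed_uniforms(t)[w] for t, w in enumerate(tokens)])
        return float(np.sum(-np.log(1.0 - u_sel)))

    print("watermarked text score:", round(detect(generate(True)), 1))
    print("plain text score:      ", round(detect(generate(False)), 1), "(about", n, "expected)")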


Danica J. Sutherland

University of British Columbia

Talk Title: Local Learning Dynamics Help Explain (Post-)Training Behaviour

Talk Abstract: Learning dynamics, which describes how the learning of specific training examples influences the model’s predictions on other examples, give us a powerful tool for understanding the behaviour of deep learning systems. This talk will cover how we can use a local understanding of training steps, building on empirical neural tangent kernels, to better understand phenomena that happen across the course of training a deep network. Applications include better understandings of knowledge distillation, fine-tuning, and particularly preference tuning for LLM post-training.
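A minimal sketch of the first-order picture the abstract describes: the empirical NTK between two inputs predicts how one SGD step on the first input moves the prediction on the second. The tiny network, data, and step size below are arbitrary placeholders; the talk concerns what this lens reveals for real (post-)training.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    net = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 1))
    params = list(net.parameters())
    x_a, y_a = torch.randn(1, 4), torch.tensor([[1.0]])   # the example we take a step on
    x_b = torch.randn(1, 4)                               # the example we watch

    def flat_grad(scalar_out):
        grads = torch.autograd.grad(scalar_out, params)
        return torch.cat([g.reshape(-1) for g in grads])

    f_a, f_b = net(x_a), net(x_b)
    g_a, g_b = flat_grad(f_a.sum()), flat_grad(f_b.sum())
    entk_ba = g_b @ g_a                                   # empirical NTK entry K(x_b, x_a)

    lr = 0.1
    # First-order prediction: delta f(x_b) ~= -lr * K(x_b, x_a) * dLoss/df(x_a).
    predicted = (-lr * entk_ba * (f_a - y_a)).item()

    # One actual SGD step on the squared loss at (x_a, y_a), then re-evaluate x_b.
    opt = torch.optim.SGD(params, lr=lr)
    loss = 0.5 * (net(x_a) - y_a).pow(2).sum()
    opt.zero_grad(); loss.backward(); opt.step()
    actual = (net(x_b) - f_b).item()

    print(predicted, actual)                              # close for a small step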

Session Two: Statistical Learning from Heterogeneous Data Sources and Generalization


Sivaraman Balakrishnan

Carnegie Mellon University

Talk Title: Using black-box predictors to mitigate distribution-shift

Talk Abstract: In this talk, I will discuss two fundamental distribution-shift problems -- the problem of label-shift estimation and the problem of estimating the average treatment effect in an observational study. In both cases practically successful methods leverage black-box high-dimensional predictors. We will explore how these methods have robust, structure-agnostic guarantees -- they can effectively mitigate distribution-shift so long as the black-box predictor satisfies some guarantee (for instance, is accurate or calibrated). We will also discuss the fundamental limits of these black-box methods.
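One concrete instance of the black-box recipe for label shift (a sketch in the spirit of black-box shift estimation, not the speaker's specific results): estimate the predictor's confusion matrix on held-out source data, measure the distribution of its hard predictions on unlabeled target data, and solve a linear system for the importance weights. The Gaussian toy data and nearest-mean predictor below are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(2)
    K = 3
    means = np.array([[0.0, 0.0], [2.5, 0.0], [0.0, 2.5]])         # class-conditional means
    p_src = np.array([1 / 3, 1 / 3, 1 / 3])
    p_tgt = np.array([0.6, 0.3, 0.1])                              # shifted label marginal

    def sample(n, prior):
        y = rng.choice(K, size=n, p=prior)
        return means[y] + rng.standard_normal((n, 2)), y

    def black_box(x):
        # Any fixed predictor will do; here, nearest class mean.
        return np.argmin(((x[:, None, :] - means[None]) ** 2).sum(-1), axis=1)

    # Joint confusion matrix C[i, j] = P_src(predict i, true label j) on held-out source data.
    xs, ys = sample(20000, p_src)
    C = np.zeros((K, K))
    np.add.at(C, (black_box(xs), ys), 1.0 / len(ys))

    # Distribution of hard predictions on unlabeled target data.
    xt, _ = sample(20000, p_tgt)
    mu_hat = np.bincount(black_box(xt), minlength=K) / len(xt)

    # Under label shift, C @ w = mu_hat for the importance weights w_j = p_tgt(j) / p_src(j).
    w = np.linalg.solve(C, mu_hat)
    print("estimated target label marginal:", np.round(w * p_src, 3))
    print("true target label marginal:     ", p_tgt)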


Krikamol Muandet

CISPA Helmholtz Center for Information Security

Talk Title: Imprecise Generalisation: From Invariance to Heterogeneity

Talk Abstract: The ability to generalise knowledge across diverse environments stands as a fundamental aspect of both biological and artificial intelligence (AI). In recent years, significant advancements have been made in out-of-domain (OOD) generalisation, including new algorithmic tools, theoretical advances, and the creation of large-scale benchmark datasets. However, unlike in-domain (IID) generalisation, OOD generalisation lacks a precise definition, leading to ambiguity in learning objectives.

 

In this talk, I will explain how tools from imprecise probability (IP) can be used to overcome the aforementioned ambiguity. Unlike its in-domain counterpart, OOD generalisation is challenging because it involves not only learning from empirical data but also deciding among various notions of generalisation, i.e., worst-case, average-case, and interpolations thereof. Consequently, learners face imprecision over the right notion of generalisation, particularly when there is an institutional separation between machine learners (e.g., ML engineers) and model operators (e.g., doctors), a common situation in practical applications of machine learning.

 

To address these challenges, I will then introduce the concept of imprecise learning, drawing connections to imprecise probability, and discuss our recent work in the context of domain generalisation (DG), hypothesis testing, and truthful elicitation of imprecise forecasts. By exploring the synergy between learning algorithms and decision-making processes, this talk aims to shed light on the potential impact of IP in machine learning, paving the way for future advancements in the field.
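A toy illustration of the ambiguity discussed above (a generic average/worst-case interpolation, not the speaker's imprecise-learning framework): given per-domain risk estimates, different aggregation choices, indexed here by a parameter lambda, can prefer different models, which is exactly the decision that may belong to the model operator rather than the learner.

    import numpy as np

    # Rows: candidate models; columns: estimated risk on each training domain.
    domain_risks = np.array([
        [0.10, 0.12, 0.50],   # model A: excellent on two domains, poor on the third
        [0.25, 0.24, 0.26],   # model B: uniformly mediocre
    ])

    def aggregate(risks, lam):
        # lam = 0: average-case; lam = 1: worst-case; in between: an interpolation.
        return (1 - lam) * risks.mean() + lam * risks.max()

    for lam in (0.0, 0.5, 1.0):
        scores = [aggregate(r, lam) for r in domain_risks]
        best = "AB"[int(np.argmin(scores))]
        print(f"lambda = {lam:.1f}: prefer model {best}  (scores: {np.round(scores, 3)})")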


Dominik Rothenhausler

Stanford University

Talk Title: Generalization under random Y|X shift: augmented inverse distribution weighting (AIDW)

Talk Abstract: A popular assumption in transfer learning is covariate shift, which posits that the conditional distribution p(y∣x) remains invariant. However, empirical evidence shows that shifts in this conditional distribution are common. In this talk, we first discuss empirical evidence for a novel distribution shift model under which the likelihood ratio is essentially white noise. We then develop tools to infer parameters based on partially observed data from randomly shifted distributions. Interestingly, our final estimator shares parallels with augmented inverse probability weighting but differs fundamentally in what is reweighted and how it is reweighted. This is ongoing work with Ying Jin and Naoki Egami.


Xinwei Shen

University of Washington

Talk Title: Generalization through the Lens of Distributional Learning

Talk Abstract: To achieve generalization of prediction models under distribution shifts, existing approaches have exploited frameworks such as distributional robustness. In this talk, we discuss a new perspective--fitting the full distribution of training data allows better generalization beyond the observed distribution. We first introduce a distributional learning method called "engression" that estimates the conditional distribution of the response given covariates. Then under different structural settings, we discuss several adaptations of engression to out-of-support covariate shifts (a.k.a., extrapolation), conditional shifts, and causal effect estimation. We demonstrate how estimating the distribution leads to stronger identification and hence better generalization than, e.g., fitting only the mean.
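A compressed sketch in the spirit of engression as described above: a conditional generator g(x, eps) is trained with an energy-score loss so that it fits the full conditional distribution of the response rather than only its mean. The architecture, noise dimension, toy data, and training schedule are arbitrary choices here; see the speaker's papers and software for the actual method and its guarantees.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    n, noise_dim = 2000, 8
    x = torch.rand(n, 1) * 4 - 2                       # covariate on [-2, 2]
    y = x.squeeze(1) ** 2 + 0.5 * torch.randn(n)       # toy nonlinear response with noise

    g = nn.Sequential(nn.Linear(1 + noise_dim, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(g.parameters(), lr=1e-3)

    def sample_g(xb, m):
        # Draw m samples from the learned conditional distribution at each x in the batch.
        eps = torch.randn(xb.shape[0], m, noise_dim)
        xrep = xb.unsqueeze(1).expand(-1, m, -1)
        return g(torch.cat([xrep, eps], dim=-1)).squeeze(-1)

    for step in range(2000):
        idx = torch.randint(0, n, (128,))
        xb, yb = x[idx], y[idx]
        s1, s2 = sample_g(xb, 16), sample_g(xb, 16)
        # Energy-score loss: match the whole conditional distribution, not just the mean.
        loss = (s1 - yb.unsqueeze(1)).abs().mean() - 0.5 * (s1 - s2).abs().mean()
        opt.zero_grad(); loss.backward(); opt.step()

    # Conditional mean and a 90% interval at x = 1.5, read off from generated samples.
    with torch.no_grad():
        draws = sample_g(torch.tensor([[1.5]]), 4000).squeeze(0)
    print(draws.mean().item(), torch.quantile(draws, torch.tensor([0.05, 0.95])))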

Schedule

09:00: Provided breakfast (coffee, tea, assorted pastries)

Session 1: Science of Deep Learning
  • 09:30-10:05: Invited Talk 1 -- Qi Lei
  • 10:05-10:40: Invited Talk 2 -- Samet Oymak
  • 10:40-11:00: Break
  • 11:00-11:35: Invited Talk 3 -- Weijie Su
  • 11:35-12:10: Invited Talk 4 -- Danica J. Sutherland

12:10-14:00: Lunch (self-organised)

Session 2: Statistical Learning from Heterogeneous Data Sources
  • 14:00-14:35: Invited Talk 1 -- Krikamol Muandet
  • 14:35-15:10: Invited Talk 2 -- Sivaraman Balakrishnan
  • 15:10-15:30: Break
  • 15:30-16:05: Invited Talk 3 -- Xinwei Shen
  • 16:05-16:40: Invited Talk 4 -- Dominik Rothenhausler

17:00-18:45: Poster session

Important Dates

See the important dates below:

  • Paper submission Round 1: March 2, 2025 (11:59 PM AoE) 

  • Paper submission Round 2: April 2, 2025 (11:59 PM AoE) 

  • Notice of acceptance: May 2, 2025

  • Registration deadline: July 2, 2025 (coinciding with the end of JSM regular registration)

  • Conference: all day, August 2, 2025

Submission Details

Paper Submission:

  • The review process for the workshop will be single-blind: reviewers will know the authors' identities, but authors will not know the reviewers'.

  • Submissions may present existing work or work that is currently under review at another venue.

  • This workshop does not host archival proceedings. Authors are encouraged to confirm dual-submission policies with the relevant journals or conferences; for instance, submitting to both this workshop and NeurIPS typically does not violate the NeurIPS dual-submission policy.

  • Each paper is limited to 5 pages, not including references and supplementary material.

  • Please use the TMLR LaTeX style and template, which can be found at https://jmlr.org/tmlr/author-guide.html.

  • If you encounter any problems with your submission, please contact the organizers' group email.

  • The workshop will use OpenReview as its submission system (click here for the submission portal); submissions open on February 1, 2025.


Travel awards:

The IMS is pleased to offer US$500 travel awards to support participation in the workshop. The top ten applicants, selected based on the quality of their paper submissions, will receive these awards.


  • Application Process:

    • Travel award applications must be submitted alongside your paper submission on OpenReview. Please ensure you check the designated box during the submission process to indicate your interest in applying for the travel award.

    • Along with your application, you are required to upload a CV through the OpenReview system.


  • ​Eligibility: If you have already applied for any of the following three IMS travel awards this year, you are not eligible for our travel award:

    • IMS Hannan Graduate Student Travel Award

    • IMS New Researcher Travel Award

    • NRC Conference Travel Cost Reimbursement (details available at NRC2024).

 

Post-submission updates:
After a competitive review process, we awarded ten travel grants to the best papers with lead student authors. They are (in no particular order):

  • Heterogeneous transfer learning for high dimensional regression with feature mismatch
    Jae Ho Chang, Subhadeep Paul, Massimiliano Russo

  • Out-of-distribution generalization via composition: a lens through induction heads in Transformers
    Jiajun Song, Zhuoyan Xu, Yiqiao Zhong

  • Double Descent and Overfitting under Noisy Inputs and Distribution Shift for Linear Denoisers
    Chinmaya Kausik, Kashvi Srivastava, Rishi Sonthalia

  • A Statistical Theory of Overfitting for Imbalanced Classification in Deep Learning
    Jingyang Lyu, Kangjie Zhou, Yiqiao Zhong

  • Detecting Topological Changes and OOD Misplacement in t-SNE and UMAP via Leave-one-out
    Zhexuan Liu, Rong Ma, Yiqiao Zhong

  • A Statistical Theory of Contrastive Learning via Approximate Sufficient Statistics
    Licong Lin, Song Mei

  • Minimax And Adaptive Transfer Learning for Nonparametric Classification under Distributed Differential Privacy Constraints
    Arnab Auddy, T. Tony Cai, Abhinav Chakraborty

  • Transfer Learning for Survival-based Clustering of Predictors with an Application to TP53 Mutation Annotation
    Xiaoqian Liu, Hao Yan, Haoming Shi, Emilie Montellier, Eric Chi, Pierre Hainaut, Wenyi Wang

  • Task Shift: From Classification to Regression in Overparameterized Linear Models
    Tyler LaBonte, Kuo-Wei Lai, Vidya Muthukumar

  • Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
    Tianyu Guo, Druv Pai, Yu Bai, Jiantao Jiao, Michael I. Jordan, Song Mei


Poster Instructions

Orientation: landscape (horizontal)


Maximum dimensions: 91 inches × 45 inches (231 cm × 114 cm)

 

Your poster will be displayed on a white foamboard standee measuring 94 inches × 48 inches (approximately 239 cm × 122 cm), oriented horizontally. We will provide push pins for installing the posters.

Registration

There are three registration types, including a discounted rate for students:

  • IMS Member - US$150

  • Non-IMS Member - US$175

  • Student - US$75​

 

Please click here to register.


ORGANISERS (alphabetical order)

Program Committee

Yuansi Chen

ETH Zürich, Switzerland

Program Co-chair

Feng Liu

University of Melbourne, Australia

Program Co-chair

Song Mei

UC Berkeley, United States

Program Co-chair

Pragya Sur

Harvard University, United States

Program Co-chair

Susan Wei

Monash University, Australia

Program Co-chair

Local Committee

Cullum

Vanderbilt University, United States

Local Arrangement Chair

Panpan Zhang

Vanderbilt University, United States

Local Arrangement Chair

GET IN TOUCH

If you have questions about the submission/registration process, don’t hesitate to reach out.


©2024 by The Inaugural Workshop on Frontiers in Statistical Machine Learning.
