
The Institute of Mathematical Statistics (IMS) is proud to introduce the annual IMS Frontiers in Statistical Machine Learning (FSML) workshop series. This series is dedicated to exploring emerging and impactful topics in the field of statistical machine learning that have yet to receive significant attention in leading IMS and ASA publications. Each year, the FSML workshop will spotlight 2-3 themes where research is rapidly evolving, encouraging the dissemination of novel ideas and fostering deeper engagement within the community.

 

Inspired by the dynamic format of machine learning conference workshops, FSML brings a fresh and interactive approach to the statistics landscape. The workshop will host an open call for short paper submissions, followed by a rigorous and transparent review process to ensure the highest quality of contributions. This inclusive model is designed to promote the exchange of innovative research, stimulating conversation and collaboration among attendees.


Accepted submissions will be presented through poster sessions and enriched by discussions that provide real-time feedback and networking opportunities. This structure is aimed at enhancing dialogue between researchers and practitioners, paving the way for future advances in the field.​​

The Inaugural Workshop on
Frontiers in Statistical Machine Learning (FSML)

Conference Venue

​The inaugural FSML Workshop will be held on August 2, 2025, at Vanderbilt University in Nashville, Tennessee—immediately preceding the 2025 Joint Statistical Meetings (JSM) and following the 2025 IMS NRC meeting.

 

The program will take place in the Commodore Ballroom of Vanderbilt’s Student Life Center, a convenient 10-minute walk from the NRC’s Holiday Inn Nashville-Vanderbilt venue and about a 10-minute drive (or 30-minute bus ride) from the JSM site at the Music City Center.

Topics

There will be two main streams in the 2025 workshop: 


1) The Science of Deep Learning

  • Theoretical Foundations: Exploring mathematical and statistical principles underlying deep learning.

  • Phenomenological Studies of Learning Systems: Cataloging and explaining intriguing behaviors in learning dynamics.

  • Interpretability, Alignment, and Safety: Understanding and guiding AI systems to ensure ethical and safe operation.

  • Emerging Learning Paradigms: Investigating new approaches, such as in-context learning and scaling laws.

  • Other Related Topics


2) Statistical Learning from Heterogeneous Data Sources and Generalization

  • Learning under Distribution Shifts: Addressing challenges caused by mismatches between training and test data, including covariate shifts, label shifts, and other distributional changes.

  • Distribution Shifts in Scientific Replications: Investigating how distributional changes impact the reproducibility and generalizability of scientific findings.

  • Domain Adaptation and Domain Generalization: Designing methods to adapt models to new domains or ensure robust performance across diverse domains.

  • Distributional Robustness: Developing techniques to maintain reliable performance under adversarial or worst-case distributional scenarios.

  • Semi-Supervised Learning: Combining labeled and unlabeled data to improve learning and generalization across heterogeneous sources, particularly when labeled data is scarce or distributions shift between domains.

  • Foundation Model Fine-Tuning: Leveraging pre-trained models for specific tasks or domains through transfer learning and fine-tuning.

  • Other Related Topics

 

Invited Speakers

Meet the distinguished researchers who will share their insights at FSML on 1) the science of deep learning and 2) statistical learning from heterogeneous data sources and generalization. Each invited speaker brings a unique perspective and expertise to the workshop.

Session One: The Science of Deep Learning


Qi Lei

New York University

Talk Title: Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension

Talk Abstract: Weak-to-strong (W2S) generalization is a type of finetuning (FT) where a strong (large) student model is trained on pseudo-labels generated by a weak teacher. Surprisingly, W2S FT often outperforms the weak teacher. We seek to understand this phenomenon through the observation that FT often occurs in intrinsically low-dimensional spaces. Leveraging the low intrinsic dimensionality of FT, we analyze W2S in the ridgeless regression setting from a variance reduction perspective. For a strong student - weak teacher pair with sufficiently expressive low-dimensional feature subspaces Vs, Vw, we provide an exact characterization of the variance that dominates the generalization error of W2S. This unveils a virtue of discrepancy between the strong and weak models in W2S: the variance of the weak teacher is inherited by the strong student in Vs ∩ Vw, while reduced by a factor of dim(Vs)/N in the subspace of discrepancy Vw \ Vs with N pseudo-labels for W2S. Further, our analysis casts light on the sample complexities and the scaling of performance gap recovery in W2S. The analysis is supported with experiments on synthetic regression and real vision and NLP tasks.
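As a rough, self-contained illustration of the mechanism described above (not the speaker's code or exact setting), the sketch below fits a weak teacher by ridgeless least squares on a small noisy sample, pseudo-labels a large unlabeled set, and fits a strong student whose low-dimensional feature subspace overlaps the teacher's only on the true signal directions; the student's excess risk comes out noticeably smaller than the teacher's. All dimensions, sample sizes, and the noise level are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)
    d, k = 60, 5                      # ambient dimension, true signal dimension
    beta = np.zeros(d)
    beta[:k] = 1.0
    sigma = 1.0                       # label noise level

    signal = np.arange(k)
    V_w = np.r_[signal, np.arange(5, 25)]    # weak teacher's feature subspace (dim 25)
    V_s = np.r_[signal, np.arange(25, 45)]   # strong student's subspace, overlapping only on the signal

    def sample(n, noisy=True):
        X = rng.standard_normal((n, d))
        y = X @ beta + (sigma * rng.standard_normal(n) if noisy else 0.0)
        return X, y

    # Weak teacher: ridgeless least squares on a small noisy labeled set.
    Xw, yw = sample(50)
    b_w = np.zeros(d)
    b_w[V_w] = np.linalg.lstsq(Xw[:, V_w], yw, rcond=None)[0]

    # Strong student: ridgeless fit on N pseudo-labels produced by the weak teacher.
    Xu, _ = sample(2000)
    b_s = np.zeros(d)
    b_s[V_s] = np.linalg.lstsq(Xu[:, V_s], Xu @ b_w, rcond=None)[0]

    # Excess risk against the noiseless truth on fresh test data.
    Xt, yt = sample(100000, noisy=False)
    for name, b in [("weak teacher", b_w), ("W2S strong student", b_s)]:
        print(f"{name:>20}: excess risk {np.mean((Xt @ b - yt) ** 2):.3f}")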


Samet Oymak

University of Michigan

Talk Title: Data, Architecture & Algorithms in In‑Context Learning

Talk Abstract: This talk introduces recent theoretical advancements on the in-context learning (ICL) capability of sequence models, focusing on the intricate interplay of data characteristics, architectural design, and the implicit algorithms models learn. We discuss how diverse architectural designs—ranging from linear attention to state-space models to gating mechanisms—implicitly emulate optimization algorithms that operate on the context and draw connections to variations of gradient descent and expectation maximization. We elucidate the critical influence of data characteristics, such as distributional alignment, task correlation, and the presence of unlabeled examples, on ICL performance, quantifying their benefits and revealing the mechanisms through which models leverage such information. Furthermore, we will explore the optimization landscapes governing ICL, establishing conditions for unique global minima and highlighting the architectural features (e.g., depth and dynamic gating) that enable sophisticated algorithmic emulation. As a central message, we advocate that the power of architectural primitives can be gauged from their capability to handle in-context regression tasks with varying sophistication.
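As a toy numerical check of one connection the abstract alludes to (illustrative only, not the speaker's construction): a single layer of unnormalized linear attention over an in-context regression prompt, with queries/keys given by the raw inputs and values given by the labels, reproduces the prediction of one gradient-descent step on the in-context least-squares objective started from a zero weight vector.

    import numpy as np

    rng = np.random.default_rng(1)
    d, n, eta = 8, 32, 0.05                 # feature dimension, context length, GD step size

    X = rng.standard_normal((n, d))         # in-context inputs x_1, ..., x_n
    y = X @ rng.standard_normal(d)          # in-context labels from a random linear rule
    x_q = rng.standard_normal(d)            # query input

    # One gradient step from w = 0 on L(w) = 0.5 * sum_i (x_i @ w - y_i)^2.
    w_one_step = eta * X.T @ y
    gd_pred = x_q @ w_one_step

    # Linear attention: the query attends to context token i with score x_q @ x_i
    # and aggregates eta * sum_i (x_q @ x_i) * y_i.
    attn_pred = eta * (x_q @ X.T) @ y

    print(gd_pred, attn_pred)               # agree up to floating-point error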



Weijie Su

University of Pennsylvania

Talk Title: Some Thoughts on What ML-Inclined Statisticians Should Do in the Age of LLMs

Talk Abstract: In this talk, I will share my perspective on a fundamental challenge facing those of us working at the intersection of statistics and machine learning: how should we engage with large language models (LLMs)? Statistics has a long-standing history of developing principled approaches for reasoning about data in response to emerging technologies, and LLMs represent the most recent example that renders text processing nearly as tractable as numerical computation. However, LLMs constitute a significant departure from previous examples due to their black-box nature, which renders their internal mechanisms largely incomprehensible. In light of this, I will argue that the statistics community should prioritize developing methodologies for LLMs that circumvent their internal workings---such as attention mechanisms---until new mathematical theories become available to (effectively) analyze these complex systems. I will argue that, at least for now, any research that attempts to develop statistical frameworks relying on the attention mechanism or other engineering complexities of Transformers is unlikely to have lasting value. Instead, the focus should be to leverage the probabilistic, autoregressive nature of next-token prediction in LLMs and to emphasize actionable methods that capture quantitative rather than qualitative aspects of observable behaviors. For illustration, I present examples of watermarking, alignment, and evaluation for LLMs. Collectively, these findings demonstrate how statistical reasoning can yield useful insights for improving LLM development, even while the mechanisms of network architecture remain elusive. This talk is based on arXiv:2505.19145, 2506.22343, 2411.13868, 2506.12350, 2505.20627, 2503.10990, and 2506.02058.
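To make the "leverage next-token prediction" point concrete, here is a toy of one such actionable statistic: a Gumbel-max style watermark with a simple detection score. The stand-in "language model" is just a random categorical distribution per step, and the keyed randomness depends on the position rather than on preceding tokens; both are simplifications for illustration and not the constructions analysed in the papers cited in the abstract.

    import numpy as np

    V, n, key = 50, 300, 1234                       # vocabulary size, text length, secret key

    def keyed_uniforms(t):
        # Pseudorandom uniforms shared by the generator and the detector.
        return np.random.default_rng([key, t]).random(V)

    def generate(watermark, seed=0):
        rng = np.random.default_rng(seed)
        tokens = []
        for t in range(n):
            p = rng.dirichlet(np.ones(V))           # stand-in next-token distribution
            if watermark:
                u = keyed_uniforms(t)
                tokens.append(int(np.argmax(u ** (1.0 / p))))   # still an exact sample from p
            else:
                tokens.append(int(rng.choice(V, p=p)))
        return tokens

    def detect(tokens):
        # Sum of -log(1 - u_t[w_t]): mean about 1 per token without the watermark, larger with it.
        u_sel = np.array([keyed_uniforms(t)[w] for t, w in enumerate(tokens)])
        return float(np.sum(-np.log(1.0 - u_sel)))

    print("watermarked text score:", round(detect(generate(True)), 1))
    print("plain text score:      ", round(detect(generate(False)), 1), "(about", n, "expected)")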


Danica J. Sutherland

University of British Columbia

Talk Title: Local Learning Dynamics Help Explain (Post-)Training Behaviour

Talk Abstract: Learning dynamics, which describes how the learning of specific training examples influences the model’s predictions on other examples, give us a powerful tool for understanding the behaviour of deep learning systems. This talk will cover how we can use a local understanding of training steps, building on empirical neural tangent kernels, to better understand phenomena that happen across the course of training a deep network. Applications include better understandings of knowledge distillation, fine-tuning, and particularly preference tuning for LLM post-training.
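A minimal sketch of the first-order picture the abstract describes: the empirical NTK between two inputs predicts how one SGD step on the first input moves the prediction on the second. The tiny network, data, and step size below are arbitrary placeholders; the talk concerns what this lens reveals for real (post-)training.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    net = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 1))
    params = list(net.parameters())
    x_a, y_a = torch.randn(1, 4), torch.tensor([[1.0]])   # the example we take a step on
    x_b = torch.randn(1, 4)                               # the example we watch

    def flat_grad(scalar_out):
        grads = torch.autograd.grad(scalar_out, params)
        return torch.cat([g.reshape(-1) for g in grads])

    f_a, f_b = net(x_a), net(x_b)
    g_a, g_b = flat_grad(f_a.sum()), flat_grad(f_b.sum())
    entk_ba = g_b @ g_a                                   # empirical NTK entry K(x_b, x_a)

    lr = 0.1
    # First-order prediction: delta f(x_b) ~= -lr * K(x_b, x_a) * dLoss/df(x_a).
    predicted = (-lr * entk_ba * (f_a - y_a)).item()

    # One actual SGD step on the squared loss at (x_a, y_a), then re-evaluate x_b.
    opt = torch.optim.SGD(params, lr=lr)
    loss = 0.5 * (net(x_a) - y_a).pow(2).sum()
    opt.zero_grad(); loss.backward(); opt.step()
    actual = (net(x_b) - f_b).item()

    print(predicted, actual)                              # close for a small step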

Session Two: Statistical Learning from Heterogeneous Data Sources and Generalization


Sivaraman Balakrishnan

Carnegie Mellon University

Talk Title: Using black-box predictors to mitigate distribution-shift

Talk Abstract: In this talk, I will discuss two fundamental distribution-shift problems -- the problem of label-shift estimation and the problem of estimating the average treatment effect in an observational study. In both cases practically successful methods leverage black-box high-dimensional predictors. We will explore how these methods have robust, structure-agnostic guarantees -- they can effectively mitigate distribution-shift so long as the black-box predictor satisfies some guarantee (for instance, is accurate or calibrated). We will also discuss the fundamental limits of these black-box methods.
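One concrete instance of the black-box recipe for label shift (a sketch in the spirit of black-box shift estimation, not the speaker's specific results): estimate the predictor's confusion matrix on held-out source data, measure the distribution of its hard predictions on unlabeled target data, and solve a linear system for the importance weights. The Gaussian toy data and nearest-mean predictor below are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(2)
    K = 3
    means = np.array([[0.0, 0.0], [2.5, 0.0], [0.0, 2.5]])         # class-conditional means
    p_src = np.array([1 / 3, 1 / 3, 1 / 3])
    p_tgt = np.array([0.6, 0.3, 0.1])                              # shifted label marginal

    def sample(n, prior):
        y = rng.choice(K, size=n, p=prior)
        return means[y] + rng.standard_normal((n, 2)), y

    def black_box(x):
        # Any fixed predictor will do; here, nearest class mean.
        return np.argmin(((x[:, None, :] - means[None]) ** 2).sum(-1), axis=1)

    # Joint confusion matrix C[i, j] = P_src(predict i, true label j) on held-out source data.
    xs, ys = sample(20000, p_src)
    C = np.zeros((K, K))
    np.add.at(C, (black_box(xs), ys), 1.0 / len(ys))

    # Distribution of hard predictions on unlabeled target data.
    xt, _ = sample(20000, p_tgt)
    mu_hat = np.bincount(black_box(xt), minlength=K) / len(xt)

    # Under label shift, C @ w = mu_hat for the importance weights w_j = p_tgt(j) / p_src(j).
    w = np.linalg.solve(C, mu_hat)
    print("estimated target label marginal:", np.round(w * p_src, 3))
    print("true target label marginal:     ", p_tgt)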


Krikamol Muandet

CISPA Helmholtz Center for Information Security

Talk Title: Imprecise Generalisation: From Invariance to Heterogeneity

Talk Abstract: The ability to generalise knowledge across diverse environments stands as a fundamental aspect of both biological and artificial intelligence (AI). In recent years, significant advancements have been made in out-of-domain (OOD) generalisation, including new algorithmic tools, theoretical advances, and the creation of large-scale benchmark datasets. However, unlike in-domain (IID) generalisation, OOD generalisation lacks a precise definition, leading to ambiguity in learning objectives.

 

In this talk, I will explain how tools from imprecise probability (IP) can be used to overcome the aforementioned ambiguity. Unlike its in-domain counterpart, OOD generalisation is challenging because it involves not only learning from empirical data but also deciding among various notions of generalisation, i.e., worst-case, average-case, and interpolations thereof. Consequently, learners face imprecision over the right notion of generalisation, particularly when there is an institutional separation between machine learners (e.g., ML engineers) and model operators (e.g., doctors), a common situation in practical applications of machine learning.

 

To address these challenges, I will then introduce the concept of imprecise learning, drawing connections to imprecise probability, and discuss our recent work in the context of domain generalisation (DG), hypothesis testing, and truthful elicitation of imprecise forecasts. By exploring the synergy between learning algorithms and decision-making processes, this talk aims to shed light on the potential impact of IP in machine learning, paving the way for future advancements in the field.
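A toy illustration of the ambiguity discussed above (a generic average/worst-case interpolation, not the speaker's imprecise-learning framework): given per-domain risk estimates, different aggregation choices, indexed here by a parameter lambda, can prefer different models, which is exactly the decision that may belong to the model operator rather than the learner.

    import numpy as np

    # Rows: candidate models; columns: estimated risk on each training domain.
    domain_risks = np.array([
        [0.10, 0.12, 0.50],   # model A: excellent on two domains, poor on the third
        [0.25, 0.24, 0.26],   # model B: uniformly mediocre
    ])

    def aggregate(risks, lam):
        # lam = 0: average-case; lam = 1: worst-case; in between: an interpolation.
        return (1 - lam) * risks.mean() + lam * risks.max()

    for lam in (0.0, 0.5, 1.0):
        scores = [aggregate(r, lam) for r in domain_risks]
        best = "AB"[int(np.argmin(scores))]
        print(f"lambda = {lam:.1f}: prefer model {best}  (scores: {np.round(scores, 3)})")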


Dominik Rothenhausler

Stanford University

Talk Title: Generalization under random Y|X shift: augmented inverse distribution weighting (AIDW)

Talk Abstract: A popular assumption in transfer learning is covariate shift, which posits that the conditional distribution p(y∣x) remains invariant. However, empirical evidence shows that shifts in this conditional distribution are common. In this talk, we first discuss empirical evidence for a novel distribution shift model under which the likelihood ratio is essentially white noise. We then develop tools to infer parameters based on partially observed data from randomly shifted distributions. Interestingly, our final estimator shares parallels with augmented inverse probability weighting but differs fundamentally in what is reweighted and how it is reweighted. This is ongoing work with Ying Jin and Naoki Egami.


Xinwei Shen

University of Washington

Talk Title: Generalization through the Lens of Distributional Learning

Talk Abstract: To achieve generalization of prediction models under distribution shifts, existing approaches have exploited frameworks such as distributional robustness. In this talk, we discuss a new perspective--fitting the full distribution of training data allows better generalization beyond the observed distribution. We first introduce a distributional learning method called "engression" that estimates the conditional distribution of the response given covariates. Then under different structural settings, we discuss several adaptations of engression to out-of-support covariate shifts (a.k.a., extrapolation), conditional shifts, and causal effect estimation. We demonstrate how estimating the distribution leads to stronger identification and hence better generalization than, e.g., fitting only the mean.
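A compressed sketch in the spirit of engression as described above: a conditional generator g(x, eps) is trained with an energy-score loss so that it fits the full conditional distribution of the response rather than only its mean. The architecture, noise dimension, toy data, and training schedule are arbitrary choices here; see the speaker's papers and software for the actual method and its guarantees.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    n, noise_dim = 2000, 8
    x = torch.rand(n, 1) * 4 - 2                       # covariate on [-2, 2]
    y = x.squeeze(1) ** 2 + 0.5 * torch.randn(n)       # toy nonlinear response with noise

    g = nn.Sequential(nn.Linear(1 + noise_dim, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(g.parameters(), lr=1e-3)

    def sample_g(xb, m):
        # Draw m samples from the learned conditional distribution at each x in the batch.
        eps = torch.randn(xb.shape[0], m, noise_dim)
        xrep = xb.unsqueeze(1).expand(-1, m, -1)
        return g(torch.cat([xrep, eps], dim=-1)).squeeze(-1)

    for step in range(2000):
        idx = torch.randint(0, n, (128,))
        xb, yb = x[idx], y[idx]
        s1, s2 = sample_g(xb, 16), sample_g(xb, 16)
        # Energy-score loss: match the whole conditional distribution, not just the mean.
        loss = (s1 - yb.unsqueeze(1)).abs().mean() - 0.5 * (s1 - s2).abs().mean()
        opt.zero_grad(); loss.backward(); opt.step()

    # Conditional mean and a 90% interval at x = 1.5, read off from generated samples.
    with torch.no_grad():
        draws = sample_g(torch.tensor([[1.5]]), 4000).squeeze(0)
    print(draws.mean().item(), torch.quantile(draws, torch.tensor([0.05, 0.95])))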

Schedule

09:00: Provided breakfast (coffee, tea, assorted pastries)

Session 1: Science of Deep Learning
  • 09:30-10:05: Invited Talk 1 -- Qi Lei
  • 10:05-10:40: Invited Talk 2 -- Samet Oymak
  • 10:40-11:00: Break
  • 11:00-11:35: Invited Talk 3 -- Weijie Su
  • 11:35-12:10: Invited Talk 4 -- Danica J. Sutherland

12:10-14:00: Lunch (self-organised)

Session 2: Statistical Learning from Heterogeneous Data Sources
  • 14:00-14:35: Invited Talk 1 -- Krikamol Muandet
  • 14:35-15:10: Invited Talk 2 -- Sivaraman Balakrishnan
  • 15:10-15:30: Break
  • 15:30-16:05: Invited Talk 3 -- Xinwei Shen
  • 16:05-16:40: Invited Talk 4 -- Dominik Rothenhausler

17:00-18:45: Poster session

Important Dates

See the important dates below:

  • Paper submission Round 1: March 2, 2025 (11:59 PM AoE) 

  • Paper submission Round 2: April 2, 2025 (11:59 PM AoE) 

  • Notice of acceptance: May 2, 2025

  • Registration deadline: July 2, 2025 (coinciding with the end of JSM regular registration)

  • Conference: all day, August 2, 2025

Submission Details

Paper Submission:

  • The review process for the workshop will be single-blind: reviewers will know the authors' identities, but authors will not know the reviewers'.

  • Submissions may present existing work or work that is currently under review at another venue.

  • This workshop does not host archival proceedings. Authors are encouraged to confirm dual-submission policies with the relevant journals or conferences; for instance, submitting to both this workshop and NeurIPS typically does not violate the NeurIPS dual-submission policy.

  • Each paper is limited to 5 pages, not including references and supplementary material.

  • Please use the TMLR LaTeX style and template, which can be found at https://jmlr.org/tmlr/author-guide.html.

  • If you encounter any problems with your submission, please contact the organizers' group email.

  • The workshop will use OpenReview as its submission system (click here for the submission portal); submissions open on February 1, 2025.


Travel awards:

The IMS is pleased to offer US$500 travel awards to support participation in the workshop. The top ten applicants, selected based on the quality of their paper submissions, will receive these awards.


  • Application Process:

    • Travel award applications must be submitted alongside your paper submission on OpenReview. Please ensure you check the designated box during the submission process to indicate your interest in applying for the travel award.

    • Along with your application, you are required to upload a CV through the OpenReview system.


  • ​Eligibility: If you have already applied for any of the following three IMS travel awards this year, you are not eligible for our travel award:

    • IMS Hannan Graduate Student Travel Award

    • IMS New Researcher Travel Award

    • NRC Conference Travel Cost Reimbursement (details available at NRC2024).

 

Post-submission updates:
After a competitive review process, we awarded ten travel grants to the best papers with lead student authors. They are (in no particular order):

  • Heterogeneous transfer learning for high dimensional regression with feature mismatch
    Jae Ho Chang, Subhadeep Paul, Massimiliano Russo

  • Out-of-distribution generalization via composition: a lens through induction heads in Transformers
    Jiajun Song, Zhuoyan Xu, Yiqiao Zhong

  • Double Descent and Overfitting under Noisy Inputs and Distribution Shift for Linear Denoisers
    Chinmaya Kausik, Kashvi Srivastava, Rishi Sonthalia

  • A Statistical Theory of Overfitting for Imbalanced Classification in Deep Learning
    Jingyang Lyu, Kangjie Zhou, Yiqiao Zhong

  • Detecting Topological Changes and OOD Misplacement in t-SNE and UMAP via Leave-one-out
    Zhexuan Liu, Rong Ma, Yiqiao Zhong

  • A Statistical Theory of Contrastive Learning via Approximate Sufficient Statistics
    Licong Lin, Song Mei

  • Minimax And Adaptive Transfer Learning for Nonparametric Classification under Distributed Differential Privacy Constraints
    Arnab Auddy, T. Tony Cai, Abhinav Chakraborty

  • Transfer Learning for Survival-based Clustering of Predictors with an Application to TP53 Mutation Annotation
    Xiaoqian Liu, Hao Yan, Haoming Shi, Emilie Montellier, Eric Chi, Pierre Hainaut, Wenyi Wang

  • Task Shift: From Classification to Regression in Overparameterized Linear Models
    Tyler LaBonte, Kuo-Wei Lai, Vidya Muthukumar

  • Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
    Tianyu Guo, Druv Pai, Yu Bai, Jiantao Jiao, Michael I. Jordan, Song Mei


Poster Instructions

Orientation: landscape (horizontal)


Maximum dimensions: 91 inches × 45 inches (231 cm × 114 cm)

 

Your poster will be displayed on a white foamboard standee measuring 94 inches × 48 inches (approximately 239 cm × 122 cm), oriented horizontally. We will provide push pins for installing the posters.

Registration

There are three registration types, including a discounted rate for students:

  • IMS Member - US$150

  • Non-IMS Member - US$175

  • Student - US$75​

 

Please click here to register.


ORGANISERS (alphabetical order)

Program Committee

Yuansi Chen

ETH Zürich, Switzerland

Program Co-chair

Feng Liu

University of Melbourne, Australia

Program Co-chair

Song Mei

UC Berkeley, United States

Program Co-chair

Pragya Sur

Harvard University, United States

Program Co-chair

Susan Wei

Monash University, Australia

Program Co-chair

Local Committee

Cullum

Vanderbilt University, United States

Local Arrangement Chair

Panpan Zhang

Vanderbilt University, United States

Local Arrangement Chair

GET IN TOUCH

If you have questions about the submission/registration process, don’t hesitate to reach out.


©2024 by The Inaugural Workshop on Frontiers in Statistical Machine Learning.
