# Publications

Publications in reversed chronological order. Please also see my Google Scholar profile for a complete list.

## 2022

- ICMLScalable First-Order Bayesian Optimization via Structured Automatic Differentiation
*Ament, Sebastian*, and Gomes, Carla*In International Conference on Machine Learning*2022Bayesian Optimization (BO) has shown great promise for the global optimization of functions that are expensive to evaluate, but despite many successes, standard approaches can struggle in high dimensions. To improve the performance of BO, prior work suggested incorporating gradient information into a Gaussian process surrogate of the objective, giving rise to kernel matrices of size nd × nd for n observations in d dimensions. Naïvely multiplying with (resp. inverting) these matrices requires O(n^2d^2) (resp. O(n^3d^3)) operations, which becomes infeasible for moderate dimensions and sample sizes. Here, we observe that a wide range of kernels gives rise to structured matrices, enabling an exact O(n^2d) matrix-vector multiply for gradient observations and O(n^2d^2) for Hessian observations. Beyond canonical kernel classes, we derive a programmatic approach to leveraging this type of structure for transformations and combinations of the discussed kernel classes, which constitutes a structure-aware automatic differentiation algorithm. Our methods apply to virtually all canonical kernels and automatically extend to complex kernels, like the neural network, radial basis function network, and spectral mixture kernels without any additional derivations, enabling flexible, problem-dependent modeling while scaling first-order BO to high d.

- ICASSPGeneralized Matching Pursuits for the Sparse Optimization of Separable Objectives
*Ament, Sebastian*, and Gomes, Carla*In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*2022Matching pursuit algorithms are a popular family of algorithms for compressed sensing and feature selection. Originally, Matching Pursuit (MP) was proposed as an algorithm for the least-squares objective, but has recently been generalized to arbitrary convex objectives. Here, we are concerned with the case of a general objective that is separable over observed data points, which encompasses most problems of practical interest: least-squares, logistic, and robust regression problems, and the class of generalized linear models. We propose efficient generalizations of Forward and Backward Stepwise Regression for this case, which take advantage of special structure in the Hessian matrix and are based on a locally quadratic approximation of the objective. Notably, the acquisition criterion of the generalized stepwise algorithms can be computed with the same complexity as the ones for the least-squares objective. We further propose a modification to the Newton step to avoid saddle points of non-convex objectives. Lastly, we demonstrate the generality and performance of the forward algorithm on least-squares, logistic, and robust regression problems, for which it compares favorably to generalized Orthogonal Matching Pursuit (OMP) on problems with moderate to large condition numbers.

- AISTATSThe Fast Kernel TransformRyan, John P.,
*Ament, Sebastian*, Gomes, Carla P., and Damle, Anil*In Proceedings of The 25th International Conference on Artificial Intelligence and Statistics*2022Kernel methods are a highly effective and widely used collection of modern machine learning algorithms. A fundamental limitation of virtually all such methods are computations involving the kernel matrix that naively scale quadratically (e.g., matrix-vector multiplication) or cubically (solving linear systems) with the size of the dataset N. We propose the Fast Kernel Transform (FKT), a general algorithm to compute matrix-vector multiplications (MVMs) for datasets in moderate dimensions with quasilinear complexity. Typically, analytically grounded fast multiplication methods require specialized development for specific kernels. In contrast, our scheme is based on auto-differentiation and automated symbolic computations that leverage the analytical structure of the underlying kernel. This allows the FKT to be easily applied to a broad class of kernels, including Gaussian, Matern, and Rational Quadratic covariance functions and Green’s functions, including those of the Laplace and Helmholtz equations. Furthermore, the FKT maintains a high, quantifiable, and controllable level of accuracy—properties that many acceleration methods lack. We illustrate the efficacy and versatility of the FKT by providing timing and accuracy benchmarks with comparisons to adjacent methods, and by applying it to scale the stochastic neighborhood embedding (t-SNE) and Gaussian processes to large real-world datasets.

## 2021

- Sci. Adv.Autonomous materials synthesis via hierarchical active learning of nonequilibrium phase diagrams
*Ament, Sebastian*, Amsler, Maximilian, Sutherland, Duncan R., Chang, Ming-Chiang, Guevarra, Dan, Connolly, Aine B., Gregoire, John M., Thompson, Michael O., Gomes, Carla P., and Dover, R. Bruce*Science Advances*2021Artificial intelligence accelerates the search and discovery of new metastable materials for energy applications. Autonomous experimentation enabled by artificial intelligence offers a new paradigm for accelerating scientific discovery. Nonequilibrium materials synthesis is emblematic of complex, resource-intensive experimentation whose acceleration would be a watershed for materials discovery. We demonstrate accelerated exploration of metastable materials through hierarchical autonomous experimentation governed by the Scientific Autonomous Reasoning Agent (SARA). SARA integrates robotic materials synthesis using lateral gradient laser spike annealing and optical characterization along with a hierarchy of AI methods to map out processing phase diagrams. Efficient exploration of the multidimensional parameter space is achieved with nested active learning cycles built upon advanced machine learning models that incorporate the underlying physics of the experiments and end-to-end uncertainty quantification. We demonstrate SARA’s performance by autonomously mapping synthesis phase boundaries for the Bi2O3 system, leading to orders-of-magnitude acceleration in the establishment of a synthesis phase diagram that includes conditions for stabilizing δ-Bi2O3 at room temperature, a critical development for electrochemical technologies.

- ICMLSparse Bayesian Learning via Stepwise Regression
*Ament, Sebastian*, and Gomes, Carla*In International Conference on Machine Learning*2021Sparse Bayesian Learning (SBL) is a powerful framework for attaining sparsity in probabilistic models. Herein, we propose a coordinate ascent algorithm for SBL termed Relevance Matching Pursuit (RMP) and show that, as its noise variance parameter goes to zero, RMP exhibits a surprising connection to Stepwise Regression. Further, we derive novel guarantees for Stepwise Regression algorithms, which also shed light on RMP. Our guarantees for Forward Regression improve on deterministic and probabilistic results for Orthogonal Matching Pursuit with noise. Our analysis of Backward Regression culminates in a bound on the residual of the optimal solution to the subset selection problem that, if satisfied, guarantees the optimality of the result. To our knowledge, this bound is the first that can be computed in polynomial time and depends chiefly on the smallest singular value of the matrix. We report numerical experiments using a variety of feature selection algorithms. Notably, RMP and its limiting variant are both efficient and maintain strong performance with correlated features.

- ICASSPOn the Optimality of Backward Regression: Sparse Recovery and Subset Selection
*Ament, Sebastian*, and Gomes, Carla*In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*2021Sparse recovery and subset selection are fundamental problems in varied communities, including signal processing, statistics and machine learning. Herein, we focus on an important greedy algorithm for these problems: Backward Stepwise Regression. We present novel guarantees for the algorithm, propose an efficient, numerically stable implementation, and put forth Stepwise Regression with Replacement (SRR), a new family of two-stage algorithms that employs both forward and backward steps for compressed sensing problems. Prior work on the backward algorithm has proven its optimality for the subset selection problem, provided the residual associated with the optimal solution is small enough. However, the existing bounds on the residual magnitude are NP-hard to compute. In contrast, our main theoretical result includes a bound that can be computed in polynomial time, depends chiefly on the smallest singular value of the matrix, and also extends to the method of magnitude pruning. In addition, we report numerical experiments highlighting crucial differences between forward and backward greedy algorithms and compare SRR against popular two-stage algorithms for compressed sensing. Remarkably, SRR algorithms generally maintain good sparse recovery performance on coherent dictionaries. Further, a particular SRR algorithm has an edge over Subspace Pursuit.

## 2019

- Nat. Comp. Mat.Multi-component background learning automates signal detection for spectroscopic data
*Ament, Sebastian*, Stein, Helge S, Guevarra, Dan, Zhou, Lan, Haber, Joel A, Boyd, David A, Umehara, Mitsutaro, Gregoire, John M, and Gomes, Carla P*npj Computational Materials*2019Automated experimentation has yielded data acquisition rates that supersede human processing capabilities. Artificial Intelligence offers new possibilities for automating data interpretation to generate large, high-quality datasets. Background subtraction is a long-standing challenge, particularly in settings where multiple sources of the background signal coexist, and automatic extraction of signals of interest from measured signals accelerates data interpretation. Herein, we present an unsupervised probabilistic learning approach that analyzes large data collections to identify multiple background sources and establish the probability that any given data point contains a signal of interest. The approach is demonstrated on X-ray diffraction and Raman spectroscopy data and is suitable to any type of data where the signal of interest is a positive addition to the background signals. While the model can incorporate prior knowledge, it does not require knowledge of the signals since the shapes of the background signals, the noise levels, and the signal of interest are simultaneously learned via a probabilistic matrix factorization framework. Automated identification of interpretable signals by unsupervised probabilistic learning avoids the injection of human bias and expedites signal extraction in large datasets, a transformative capability with many applications in the physical sciences and beyond.

- ArXiVExponentially-Modified Gaussian Mixture Model: Applications in Spectroscopy
*Ament, Sebastian*, Gregoire, John, and Gomes, Carla2019We propose a novel exponentially-modified Gaussian (EMG) mixture residual model. The EMG mixture is well suited to model residuals that are contaminated by a distribution with positive support. This is in contrast to commonly used robust residual models, like the Huber loss or ℓ1, which assume a symmetric contaminating distribution and are otherwise asymptotically biased. We propose an expectation-maximization algorithm to optimize an arbitrary model with respect to the EMG mixture. We apply the approach to linear regression and probabilistic matrix factorization (PMF). We compare against other residual models, including quantile regression. Our numerical experiments demonstrate the strengths of the EMG mixture on both tasks. The PMF model arises from considering spectroscopic data. In particular, we demonstrate the effectiveness of PMF in conjunction with the EMG mixture model on synthetic data and two real-world applications: X-ray diffraction and Raman spectroscopy. We show how our approach is effective in inferring background signals and systematic errors in data arising from these experimental settings, dramatically outperforming existing approaches and revealing the data’s physically meaningful components.

## 2017

- Stat. and Comp.Accurate and efficient numerical calculation of stable densities via optimized quadrature and asymptotics
*Ament, Sebastian*, and O’Neil, Michael*Statistics and Computing*Jan 2017Stable distributions are an important class of infinitely divisible probability distributions, of which two special cases are the Cauchy distribution and the normal distribution. Aside from a few special cases, the density function for stable distributions has no known analytic form and is expressible only through the variate’s characteristic function or other integral forms. In this paper, we present numerical schemes for evaluating the density function for stable distributions, its gradient, and distribution function in various parameter regimes of interest, some of which had no preexisting efficient method for their computation. The novel evaluation schemes consist of optimized generalized Gaussian quadrature rules for integral representations of the density function, complemented by asymptotic expansions near various values of the shape and argument parameters. We report several numerical examples illustrating the efficiency of our methods. The resulting code has been made available online.

- ArXiVSolving the stochastic Landau-Lifshitz-Gilbert-Slonczewski equation for monodomain nanomagnets : A survey and analysis of numerical techniques
*Ament, Sebastian*, Rangarajan, Nikhil, Parthasarathy, Arun, and Rakheja, ShalooJan 2017The stochastic Landau-Lifshitz-Gilbert-Slonczewski (s-LLGS) equation is widely used to study the temporal evolution of the macrospin subject to spin torque and thermal noise. The numerical simulation of the s-LLGS equation requires an appropriate choice of stochastic calculus and numerical integration scheme. In this paper, we comprehensively evaluate the accuracy and complexity of various numerical techniques to solve the s-LLGS equation. We focus on implicit midpoint, Heun, and Euler-Heun methods that converge to the Stratonovich solution of the s-LLGS equation. By performing numerical tests for both strong (path-wise) and weak (statistical) convergence, we quantify the accuracy of various numerical schemes used to solve the s-LLGS equation. We demonstrate a new method intended to solve Stochastic Differential Equations (SDEs) with small noise (RK4-Heun), and test its capability to handle the s-LLGS equation. We also discuss the circuit implementation of nanomagnets for large-scale SPICE-based simulations. We evaluate the efficacy of SPICE in handling the stochastic dynamics of the multiplicative noise in the s-LLGS equation. Numerical schemes such as Euler and Gear, typically used by SPICE-based circuit simulators do not yield the expected outcome when solving the Stratonovich s-LLGS equation. While the trapezoidal method in SPICE does solve for the Stratonovich solution, its accuracy is limited by the minimum time step of integration in SPICE. We implement the s-LLGS equation in both its cartesian and spherical coordinates form in SPICE and compare the stability and accuracy of the two implementations. The results in this paper will serve as guidelines for researchers to understand the tradeoffs between accuracy and complexity of various numerical methods and the choice of appropriate calculus to solve the s-LLGS equation.