Brian R.Y. Huang

I’m a machine learning engineer and researcher interested in making frontier AI systems more robust and reliable and in helping disseminate their benefits in an equitable and empowering way.

At MIT, I was fortunate to conduct research on adversarial robustness of deep learning models, advised by Hadi Salman and Aleksander Mądry. I’ve previously worked in finance, robotics, and AI safety research. I currently work on RL and post-training for code at Google DeepMind.

selected works

  1. Endless Jailbreaks with Bijection Learning
    Brian RY Huang and Maximilian Li
    ICLR, 2025
    We devise a "bijection attack," an encoding scheme taught to a language model in-context that bypasses model alignment and constitutes a highly effective jailbreak. We vary the complexity of our bijection scheme across different models and derive a quadratic scaling law, finding that, curiously, our bijection attack is stronger on higher-capability models.
  2. Adversarial Learned Soups: Neural Network Averaging for Joint Clean and Robust Performance
    Brian RY Huang
    Master’s Thesis, 2023
    Supervised by Hadi Salman and Aleksander Mądry.
    We introduce weight-space interpolation methods to the adversarial robustness regime, devising a wrapper architecture to optimize the interpolation coefficients of a "model soup" via adversarial training. Varying the intensity of adversarial training (perturbation distance, TRADES weightings, etc.) leads to a smooth tradeoff between the resulting clean and robust accuracy of the interpolated model.
  3. Does It Know?: Probing and Benchmarking Uncertainty in Language Model Latent Beliefs
    Brian RY Huang and Joe Kwon
    ATTRIB Workshop @ NeurIPS, 2023
    We extend Contrast-Consistent Search (Burns et al., 2023) to detect uncertainty in the factual beliefs of language models. We create a toy dataset of timestamped news factoids as a true/false/uncertain classification benchmark for LLMs with a known training cutoff date.
  4. On Sufficient Conditions for Trapped Surfaces in Spherically Symmetric Spacetimes
    Brian RY Huang
    presented at Siemens Competition, 2017
    Some differential geometry / general relativity research on black hole formation that I was fortunate to conduct during high school!