Mostly AI safety and cooperative AI: how multiple agents, human or artificial, recognise each other, coordinate, and decide whether to cooperate. The papers are listed below, each with a short note on what it's actually about.

Selected research

Google Scholar →
  1. 2025 · Oral · top 1.8% · Equal-first author

    DarkBench: Benchmarking Dark Patterns in Large Language Models

    ICLR 2025 · preliminary version at AAAI 2025 DATASAFE

    Esben Kran, Hieu Minh Nguyen, Akash Kundu, Sami Jawhar, Jinsuk Park, and Mateusz Maria Jurewicz

    660 adversarial prompts across six categories of manipulative behaviour, evaluated against 14 open and proprietary models, uncovering widespread dark patterns and ethical gaps in current systems.

  2. 2026 · Accepted · First author

    Do LLMs Take Care of Their Own? Similarity Signals Can Induce Cooperation

    AI4Good Workshop, ICML 2026 · under review at NeurIPS 2026

    Akash Kundu, Emanuel Tewolde, Ratip Emin Berker, Samuel F. Brown, and Vincent Conitzer

    Evidence that similarity signals between agents can, on their own, be enough to induce cooperation, with consequences for how multi-agent systems are built and governed.

  3. 2025

    MMTEB: Massive Multilingual Text Embedding Benchmark

    ICLR 2025

    Kenneth Enevoldsen, Isaac Chung, Imene Kerboua, and many others, including Akash Kundu

    An expansion of MTEB to 500+ evaluation tasks across 1,000+ languages. My co-authorship came through open-source contributions that reduced computational demand and improved benchmarking efficiency.

  4. 2025

    Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia

    NeurIPS 2025 Datasets & Benchmarks

    Chris Smith, Marwa Abdulhai, Mark Diaz, and others, including Akash Kundu

    A benchmark for multi-agent alignment under conflicting objectives. It grew out of the Google DeepMind × Cooperative AI Foundation hackathon that preceded the NeurIPS Concordia Contest, where our team ranked among the top entries; that result led to my co-authorship.

  5. 2025

    Reality Check: A New Evaluation Ecosystem Is Necessary to Understand AI's Real World Effects

    arXiv 2505.18893 · under review at NeurIPS 2025

    Reva Schwartz, Rumman Chowdhury, Akash Kundu, Heather Frase, and others

    The case for a new evaluation ecosystem covering the second-order, real-world effects of AI systems: the failures and societal impacts that only surface once a model leaves the lab. I led red-teaming and sandbox human evaluations for the paper.

  6. 2025 · First author

    Red Teaming for Trust: Evaluating Multicultural and Multilingual AI Systems in Asia-Pacific

    ICLR 2025 Workshop on Building Trust in Language Models and Applications

    Akash Kundu, Adrianna Tan, Theodora Skeadas, Rumman Chowdhury, and Sarah Amos

    The first multicultural and multilingual AI safety red-teaming challenge in the Asia-Pacific region: a large-scale study with 54 participants from 9 countries, evaluating LLMs across diverse cultural and linguistic contexts.

The complete, citable list (including the AACL, INOCON, and CAMLIS papers) lives on Writing & Papers →