Raj Dabre

Senior Research Scientist @ Google Research, Australia
Adjunct Faculty @ IIT Madras (AI4Bharat)
Honorary Visiting Assistant Professor @ IIT Bombay (CFILT)

I sense a soul in search of an answer!

I am a Senior Research Scientist at Google Research Australia, working on Gemini alongside Trevor Cohn and Grace Chung. I am also an Adjunct Faculty member at IIT Madras, working alongside Prof Mitesh Khapra and Dr Anoop Kunchukuttan of AI4Bharat, and a Visiting Assistant Professor at IIT Bombay (where I worked alongside the late Prof. Pushpak Bhattacharyya). My work focuses on LLM post-training, reinforcement learning, multilingualism, machine and speech translation, efficiency, and low-resource languages, especially Indic languages and, more recently, Creole languages and dialects.

Prior to this, I was a Researcher at the National Institute of Information and Communications Technology (NICT), under the direction of Masao Utiyama. I completed my Ph.D. in Informatics at Kyoto University, Japan, under the supervision of Prof. Sadao Kurohashi (currently director general of NII, Japan); my M.Tech. in Computer Science at IIT Bombay, India, under the supervision of the late Prof. Pushpak Bhattacharyya (ex-director of IIT Patna); and my Bachelor’s degree in Computer Engineering at the University of Mumbai, India. In the past, I was an intern at Google, Japan in the Rosetta team (Google Translate) headed by Dr. Hideto Kazawa.

At NICT, I focused on low-resource Machine Translation (MT) and Natural Language Generation (NLG) while working with an awesome team of international researchers. Back then, I collaborated with Dr Sheng Li of the Speech Lab at NICT. At AI4Bharat and CFILT, I am mainly involved in mentoring students on their research, which focuses on MT and evaluation, NLG and, more recently, pragmatics and model robustness. I am one of the co-creators of IndicBART, IndicTrans2, Airavata, YANMTT and Whisper-Streaming. I also have a (Japanese) patent on flexible decoding of generative models.

I have been a reviewer/committee member since 2012 for venues such as ACL, NAACL, IJCAI, EMNLP, CoNLL, WMT, WAT, MT Summit, IJCNLP, ALR, TALLIP, TASLP, CSUR and CSL, and an area chair for ACL, NAACL, EMNLP and EACL since 2022. I was also a senior area chair for ACL 2025. I have been an organizer of the Workshop on Asian Translation (WAT) since 2018 and of the M3Oriental workshop since 2023. Long ago, in 2012, I was one of the local organizers of COLING 2012.

I love astronomy, reading books (history, sci-fi and evolution) and video games (Steam Deck and Nintendo Switch), and I am a big-time anime and manga fan. I speak English, Hindi, Marathi and Japanese.

I have recently started consulting in my free time, so please contact me on my personal email (see here for details).


News

2025 (To be updated. See Google Scholar for an up-to-date publication list)
2024
  • November - Preprints of BhasaAnuvaad, a speech translation dataset, and Pralekha, a parallel document evaluation dataset, are public.
  • November - My paper on Kadodi, my native dialect of Marathi, which I co-authored with my mom, got a best paper award at WAT.
  • October - CIA has been spotted evaluating your LLM outputs with English references.
  • October - From CVQA to WorldCuisines. VQA for food.
  • September - Our paper on CVQA got accepted to NeurIPS 2024.
  • September - Invited talk titled Advances in Multilingual Machine Translation and Evaluation for Indian Languages.
  • August - Paper on Inline tag transfer for MT accepted to AMTA 2024.
  • August - We got an outstanding paper award and an area chair award at ACL 2024.
  • August - 5 papers accepted to EMNLP/WAT/CoNLL/MRL 2024.
  • May - Invited talk titled Addressing the Data and Modeling Challenges in NLG for Indian Languages.
  • May - 5 co-authored papers accepted to ACL 2024. See you in Thailand.
  • May - My co-workers and I received a best paper award from AfricaNLP at ICLR 2024 for our work on the NGLUENI benchmark and models.
  • May - I received an outstanding performance award at NICT for my role in developing IndicTrans2, which helps translation between Indic and Japanese.
  • April - Our tutorial proposal on “Linguistically Motivated Neural Machine Translation” has been accepted to EAMT 2024.
  • April - Our paper on subword centric decoding of subword regularized models has been accepted to EAMT 2024.
  • April - My appointment to IIT Bombay as a Visiting Assistant Professor was approved.
  • April - Our paper CreoleVal got accepted to TACL.
  • March - Announcing pre-prints on IndicLLMSuite and Translationese data for LM Pre-training.
  • March - Our paper on Creole Machine Translation has been accepted to NAACL 2024.
  • January - Happy to announce that I have been appointed as an Adjunct Faculty in the Computer Science department of IIT Madras.
  • January - Excited to announce the release of Airavata, an instruction-tuned Hindi LLM. Check out the Blog and Code.
  • January - Announcing pre-prints on our dialects survey, RomanSetu, MT Robustness and Pragmatics Benchmark.
2023