Raj Dabre

Researcher @ NICT, Japan
Adjunct Faculty @ IIT Madras (AI4Bharat)
Honorary Visiting Assistant Professor @ IIT Bombay (CFILT)

I sense a soul in search of an answer!

I am a Researcher at the National Institute of Information and Communications Technology (NICT), under the direction of Masao Utiyama; an Adjunct Faculty member at IIT Madras, working alongside Prof. Mitesh Khapra and Dr. Anoop Kunchukuttan of AI4Bharat; and an incoming Visiting Assistant Professor at IIT Bombay, working alongside Prof. Pushpak Bhattacharyya. I work on efficient multilingual understanding and generation for low-resource languages, especially Indic and, more recently, Creole languages.

I completed my Ph.D. in Informatics at Kyoto University, Japan, under the supervision of Prof. Sadao Kurohashi (currently director general of NII, Japan), my M.Tech. in Computer Science at IIT Bombay, India, under the supervision of Prof. Pushpak Bhattacharyya (ex-director of IIT Patna), and my Bachelor’s degree in Computer Engineering at the University of Mumbai, India. In the past, I was an intern at Google, Japan, in the Rosetta team (Google Translate) headed by Dr. Hideto Kazawa.

At NICT, I focus on low-resource Machine Translation (MT) and Natural Language Generation (NLG), while working with an awesome team of international researchers. I also collaborate with Dr. Sheng Li of the Speech Lab at NICT. At AI4Bharat and CFILT, I am mainly involved in mentoring students on their research, which focuses on MT and Evaluation, NLG and, more recently, pragmatics and model robustness. Now that I am an adjunct faculty member, I hope to offer a course or two on Generative AI someday soon. I am one of the co-creators of IndicBART, IndicTrans2, Airavata, YANMTT and Whisper-Streaming. I also have a (Japanese) patent on flexible decoding of generative models.

I have been a reviewer/committee member since 2012 for venues such as ACL, NAACL, IJCAI, EMNLP, CoNLL, WMT, WAT, MT Summit, IJCNLP, ALR, TALLIP, TASLP, CSUR and CSL, and an area chair for ACL, NAACL, EMNLP and EACL since 2022. I have also been an organizer of the Workshop on Asian Translation (WAT) since 2018 and of the M3Oriental workshop since 2023. Long ago, I was one of the local organizers of COLING 2012.

I love astronomy, reading books (history, sci-fi and evolution) and video games (Steam Deck and Nintendo Switch), and I am a big-time anime and manga fan. I speak English, Hindi, Marathi and Japanese.


News

2024
  • May - Invited talk titled “Addressing the Data and Modeling Challenges in NLG for Indian Languages”.
  • May - 5 co-authored papers accepted to ACL 2024. See you in Thailand.
  • May - My co-workers and I received a best paper award from AfricaNLP at ICLR 2024 for our work on the NGLUENI benchmark and models.
  • May - I received an outstanding performance award at NICT for my role in developing IndicTrans2, which helps translation between Indic languages and Japanese.
  • April - Our tutorial proposal on “Linguistically Motivated Neural Machine Translation” has been accepted to EAMT 2024.
  • April - Our paper on subword-centric decoding of subword-regularized models has been accepted to EAMT 2024.
  • April - My appointment to IIT Bombay as a Visiting Assistant Professor was approved.
  • April - Our paper CreoleVal got accepted to TACL.
  • March - Announcing pre-prints on IndicLLMSuite and Translationese data for LM Pre-training.
  • March - Our paper on Creole Machine Translation has been accepted to NAACL 2024.
  • January - Happy to announce that I have been appointed as an Adjunct Faculty in the Computer Science department of IIT Madras.
  • January - Excited to announce the release of Airavata, an instruction-tuned Hindi LLM. Check out the Blog and Code.
  • January - Announcing pre-prints on our dialects survey, RomanSetu, MT Robustness and Pragmatics Benchmark.
2023