publications

* denotes equal contribution

An up-to-date list is available on Google Scholar.

2024

  1. arXiv
    IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages
    Mohammed Safi Ur Rahman Khan, Priyam Mehta, Ananth Sankar, Umashankar Kumaravelan, Sumanth Doddapaneni, Suriyaprasaad G, and 6 more authors
    arXiv preprint, 2024
  2. arXiv
    Do Not Worry if You Do Not Have Data: Building Pretrained Language Models Using Translationese
    Meet Doshi, Raj Dabre, and Pushpak Bhattacharyya
    arXiv preprint, 2024
  3. arXiv
    RomanSetu: Efficiently unlocking multilingual capabilities of Large Language Models via Romanization
    Jaavid Aktar Husain, Raj Dabre, Aswanth Kumar, Ratish Puduppully, and Anoop Kunchukuttan
    arXiv preprint, 2024
  4. arXiv
    MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction
    Wangjin Zhou, Zhengdong Yang, Chenhui Chu, Sheng Li, Raj Dabre, Yi Zhao, and 1 more author
    arXiv preprint, 2024
  5. arXiv
    An Empirical Analysis of In-context Learning Abilities of LLMs for MT
    Pranjal A. Chitale, Jay Gala, Varun Gumma, Mitesh M. Khapra, and Raj Dabre
    arXiv preprint, 2024
  6. arXiv
    PUB: A Pragmatics Understanding Benchmark for Assessing LLMs’ Pragmatics Capabilities
    Settaluri Lakshmi Sravanthi, Meet Doshi, Tankala Pavan Kalyan, Rudra Murthy, Pushpak Bhattacharyya, and Raj Dabre
    arXiv preprint, 2024
  7. arXiv
    Natural Language Processing for Dialects of a Language: A Survey
    Aditya Joshi, Raj Dabre, Diptesh Kanojia, Zhuang Li, Haolan Zhan, Gholamreza Haffari, and 1 more author
    arXiv preprint, 2024
  8. arXiv
    Airavata: Introducing Hindi Instruction-tuned LLM
    Jay Gala, Thanmay Jayakumar, Jaavid Aktar Husain, Aswanth Kumar M, Mohammed Safi Ur Rahman Khan, Diptesh Kanojia, and 5 more authors
    arXiv preprint, 2024

2023

  1. arXiv
    CreoleVal: Multilingual Multitask Benchmarks for Creoles
    Heather Lent, Kushal Tatariya, Raj Dabre, Yiyi Chen, Marcell Fekete, Esther Ploeger, and 11 more authors
    arXiv preprint, 2023
  2. SelfSeg: A Self-supervised Sub-word Segmentation Method for Neural Machine Translation
    Haiyue Song, Raj Dabre, Chenhui Chu, Sadao Kurohashi, and Eiichiro Sumita
    ACM Trans. Asian Low-Resour. Lang. Inf. Process. Aug, 2023
  3. SuperShaper: A Pre-Training Approach for Discovering Efficient Transformer Shapes
    Vinod Ganesan, Gowtham Ramesh, Pratyush Kumar, and Raj Dabre
    In Workshop on Efficient Systems for Foundation Models @ ICML2023 Aug, 2023
  4. arXiv
    SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning
    Zhishen Yang, Raj Dabre, Hideki Tanaka, and Naoaki Okazaki
    arXiv preprint Aug, 2023
  5. Low-resource Multilingual Neural Translation Using Linguistic Feature-based Relevance Mechanisms
    Abhisek Chakrabarty, Raj Dabre, Chenchen Ding, Masao Utiyama, and Eiichiro Sumita
    ACM Trans. Asian Low-Resour. Lang. Inf. Process. Jul, 2023
  6. TMLR
    IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages
    Jay Gala*, Pranjal A. Chitale*, Raghavan AK, Varun Gumma, Sumanth Doddapaneni, Aswanth Kumar, and 8 more authors
    Transactions on Machine Learning Research Jul, 2023
  7. An Empirical Study of Leveraging Knowledge Distillation for Compressing Multilingual Neural Machine Translation Models
    Varun Gumma, Raj Dabre, and Pratyush Kumar
    In Proceedings of the 24th Annual Conference of the European Association for Machine Translation Jun, 2023
  8. IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation Metrics for Indian Languages
    Ananya Sai B, Tanay Dixit, Vignesh Nagarajan, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra, and 1 more author
    In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) Jul, 2023
  9. Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation
    Zhuoyuan Mao, Raj Dabre, Qianying Liu, Haiyue Song, Chenhui Chu, and Sadao Kurohashi
    In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) Jul, 2023
  10. YANMTT: Yet Another Neural Machine Translation Toolkit
    Raj Dabre, Diptesh Kanojia, Chinmay Sawant, and Eiichiro Sumita
    In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations) Jul, 2023
  11. DecoMT: Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models
    Ratish Puduppully, Anoop Kunchukuttan, Raj Dabre, Ai Ti Aw, and Nancy Chen
    In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing Dec, 2023
  12. Proceedings of the 10th Workshop on Asian Translation
    Sep, 2023
  13. Overview of the 10th Workshop on Asian Translation
    Toshiaki Nakazawa, Kazutaka Kinugawa, Hideya Mino, Isao Goto, Raj Dabre, Shohei Higashiyama, and 7 more authors
    In Proceedings of the 10th Workshop on Asian Translation Sep, 2023
  14. MT Metrics Correlate with Human Ratings of Simultaneous Speech Translation
    Dominik Macháček, Ondřej Bojar, and Raj Dabre
    In Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023) Jul, 2023
  15. NICT-AI4B’s Submission to the Indic MT Shared Task in WMT 2023
    Raj Dabre, Jay Gala, and Pranjal Chitale
    In Proceedings of the Eighth Conference on Machine Translation Dec, 2023
  16. Robustness of Multi-Source MT to Transcription Errors
    Dominik Macháček, Peter Polák, Ondřej Bojar, and Raj Dabre
    In Findings of the Association for Computational Linguistics: ACL 2023 Jul, 2023
  17. CTQScorer: Combining Multiple Features for In-context Example Selection for Machine Translation
    Aswanth M, Ratish Puduppully, Raj Dabre, and Anoop Kunchukuttan
    In Findings of the Association for Computational Linguistics: EMNLP 2023 Dec, 2023
  18. A Study on the Effectiveness of Large Language Models for Translation with Markup
    Raj Dabre, Bianka Buschbeck, Miriam Exel, and Hideki Tanaka
    In Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track Sep, 2023
  19. Variable-length Neural Interlingua Representations for Zero-shot Neural Machine Translation
    Zhuoyuan Mao, Haiyue Song, Raj Dabre, Chenhui Chu, and Sadao Kurohashi
    In Proceedings of the 1st International Workshop on Multilingual, Multimodal and Multitask Language Generation Jun, 2023

2022

  1. BERTSeg: BERT Based Unsupervised Subword Segmentation for Neural Machine Translation
    Haiyue Song, Raj Dabre, Zhuoyuan Mao, Chenhui Chu, and Sadao Kurohashi
    In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers) Nov, 2022
  2. FeatureBART: Feature Based Sequence-to-Sequence Pre-Training for Low-Resource NMT
    Abhisek Chakrabarty, Raj Dabre, Chenchen Ding, Hideki Tanaka, Masao Utiyama, and Eiichiro Sumita
    In Proceedings of the 29th International Conference on Computational Linguistics Oct, 2022
  3. NICT at MixMT 2022: Synthetic Code-Mixed Pre-training and Multi-way Fine-tuning for Hinglish–English Translation
    Raj Dabre
    In Proceedings of the Seventh Conference on Machine Translation (WMT) Dec, 2022
  4. IndicNLG Benchmark: Multilingual Datasets for Diverse NLG Tasks in Indic Languages
    Aman Kumar, Himani Shrotriya, Prachi Sahu, Amogh Mishra, Raj Dabre, Ratish Puduppully, and 3 more authors
    In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing Dec, 2022
  5. Overview of the 9th Workshop on Asian Translation
    Toshiaki Nakazawa, Hideya Mino, Isao Goto, Raj Dabre, Shohei Higashiyama, Shantipriya Parida, and 8 more authors
    In Proceedings of the 9th Workshop on Asian Translation Oct, 2022
  6. NICT’s Submission to the WAT 2022 Structured Document Translation Task
    Raj Dabre
    In Proceedings of the 9th Workshop on Asian Translation Oct, 2022
  7. IndicBART: A Pre-trained Model for Indic Natural Language Generation
    Raj Dabre, Himani Shrotriya, Anoop Kunchukuttan, Ratish Puduppully, Mitesh Khapra, and Pratyush Kumar
    In Findings of the Association for Computational Linguistics: ACL 2022 May, 2022
  8. When do Contrastive Word Alignments Improve Many-to-many Neural Machine Translation?
    Zhuoyuan Mao, Chenhui Chu, Raj Dabre, Haiyue Song, Zhen Wan, and Sadao Kurohashi
    In Findings of the Association for Computational Linguistics: NAACL 2022 Jul, 2022
  9. KreolMorisienMT: A Dataset for Mauritian Creole Machine Translation
    Raj Dabre and Aneerav Sukhoo
    In Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022 Nov, 2022
  10. A Multilingual Multiway Evaluation Data Set for Structured Document Translation of Asian Languages
    Bianka Buschbeck, Raj Dabre, Miriam Exel, Matthias Huck, Patrick Huy, Raphael Rubino, and 1 more author
    In Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022 Nov, 2022

2021

  1. Proceedings of the 8th Workshop on Asian Translation (WAT2021)
    Aug, 2021
  2. Overview of the 8th Workshop on Asian Translation
    Toshiaki Nakazawa, Hideki Nakayama, Chenchen Ding, Raj Dabre, Shohei Higashiyama, Hideya Mino, and 10 more authors
    In Proceedings of the 8th Workshop on Asian Translation (WAT2021) Aug, 2021
  3. NICT-5’s Submission To WAT 2021: MBART Pre-training And In-Domain Fine Tuning For Indic Languages
    Raj Dabre and Abhisek Chakrabarty
    In Proceedings of the 8th Workshop on Asian Translation (WAT2021) Aug, 2021
  4. Investigating Softmax Tempering for Training Neural Machine Translation Models
    Raj Dabre and Atsushi Fujita
    In Proceedings of Machine Translation Summit XVIII: Research Track Aug, 2021
  5. Studying The Impact Of Document-level Context On Simultaneous Neural Machine Translation
    Raj Dabre, Aizhan Imankulova, and Masahiro Kaneko
    In Proceedings of Machine Translation Summit XVIII: Research Track Aug, 2021

2020

  1. Proceedings of the 7th Workshop on Asian Translation
    Dec, 2020
  2. Overview of the 7th Workshop on Asian Translation
    Toshiaki Nakazawa, Hideki Nakayama, Chenchen Ding, Raj Dabre, Shohei Higashiyama, Hideya Mino, and 6 more authors
    In Proceedings of the 7th Workshop on Asian Translation Dec, 2020
  3. NICT’s Submission To WAT 2020: How Effective Are Simple Many-To-Many Neural Machine Translation Models?
    Raj Dabre and Abhisek Chakrabarty
    In Proceedings of the 7th Workshop on Asian Translation Dec, 2020
  4. Balancing Cost and Benefit with Tied-Multi Transformers
    Raj Dabre, Raphael Rubino, and Atsushi Fujita
    In Proceedings of the Fourth Workshop on Neural Generation and Translation Jul, 2020
  5. Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation
    Haiyue Song, Raj Dabre, Atsushi Fujita, and Sadao Kurohashi
    In Proceedings of the Twelfth Language Resources and Evaluation Conference May, 2020
  6. JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation
    Zhuoyuan Mao, Fabien Cromieres, Raj Dabre, Haiyue Song, and Sadao Kurohashi
    In Proceedings of the Twelfth Language Resources and Evaluation Conference May, 2020
  7. Pre-training via Leveraging Assisting Languages for Neural Machine Translation
    Haiyue Song, Raj Dabre, Zhuoyuan Mao, Fei Cheng, Sadao Kurohashi, and Eiichiro Sumita
    In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop Jul, 2020
  8. Combining Sequence Distillation and Transfer Learning for Efficient Low-Resource Neural Machine Translation Models
    Raj Dabre and Atsushi Fujita
    In Proceedings of the Fifth Conference on Machine Translation Nov, 2020
  9. Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages
    Diptesh Kanojia, Raj Dabre, Shubham Dewangan, Pushpak Bhattacharyya, Gholamreza Haffari, and Malhar Kulkarni
    In Proceedings of the 28th International Conference on Computational Linguistics Dec, 2020
  10. Improving Low-Resource NMT through Relevance Based Linguistic Features Incorporation
    Abhisek Chakrabarty, Raj Dabre, Chenchen Ding, Masao Utiyama, and Eiichiro Sumita
    In Proceedings of the 28th International Conference on Computational Linguistics Dec, 2020
  11. Multilingual Neural Machine Translation
    Raj Dabre, Chenhui Chu, and Anoop Kunchukuttan
    In Proceedings of the 28th International Conference on Computational Linguistics: Tutorial Abstracts Dec, 2020

2019

  1. Exploiting Multilingualism through Multistage Fine-Tuning for Low-Resource Neural Machine Translation
    Raj Dabre, Atsushi Fujita, and Chenhui Chu
    In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) Nov, 2019
  2. Proceedings of the 6th Workshop on Asian Translation
    Nov, 2019
  3. Overview of the 6th Workshop on Asian Translation
    Toshiaki Nakazawa, Nobushige Doi, Shohei Higashiyama, Chenchen Ding, Raj Dabre, Hideya Mino, and 7 more authors
    In Proceedings of the 6th Workshop on Asian Translation Nov, 2019
  4. NICT’s participation to WAT 2019: Multilingualism and Multi-step Fine-Tuning for Low Resource NMT
    Raj Dabre and Eiichiro Sumita
    In Proceedings of the 6th Workshop on Asian Translation Nov, 2019
  5. NICT’s Supervised Neural Machine Translation Systems for the WMT19 News Translation Task
    Raj Dabre, Kehai Chen, Benjamin Marie, Rui Wang, Atsushi Fujita, Masao Utiyama, and 1 more author
    In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1) Aug, 2019
  6. NICT’s Supervised Neural Machine Translation Systems for the WMT19 Translation Robustness Task
    Raj Dabre and Eiichiro Sumita
    In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1) Aug, 2019
  7. NICT’s Machine Translation Systems for the WMT19 Similar Language Translation Task
    Benjamin Marie, Raj Dabre, and Atsushi Fujita
    In Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2) Aug, 2019
  8. Exploiting Out-of-Domain Parallel Data through Multilingual Transfer Learning for Low-Resource Neural Machine Translation
    Aizhan Imankulova, Raj Dabre, Atsushi Fujita, and Kenji Imamura
    In Proceedings of Machine Translation Summit XVII: Research Track Aug, 2019

2018

  1. Overview of the 5th Workshop on Asian Translation
    Toshiaki Nakazawa, Katsuhito Sudoh, Shohei Higashiyama, Chenchen Ding, Raj Dabre, Hideya Mino, and 4 more authors
    In Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation Dec, 2018
  2. NICT’s Participation in WAT 2018: Approaches Using Multilingualism and Recurrently Stacked Layers
    Raj Dabre, Anoop Kunchukuttan, Atsushi Fujita, and Eiichiro Sumita
    In Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation Dec, 2018