publications
* denotes equal contribution
An up-to-date list is available on Google Scholar.
2026
2025
- Worldcuisines: A massive-scale benchmark for multilingual and multicultural visual question answering on global cuisines. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2025
- Exploiting word sense disambiguation in large language models for machine translation. In Proceedings of the First Workshop on Language Models for Low-Resource Languages, 2025
- Romanlens: The role of latent romanization in multilinguality in LLMs. In Findings of the Association for Computational Linguistics: ACL 2025, 2025
- IteRABRe: Iterative Recovery-Aided Block Reduction. arXiv preprint arXiv:2503.06291, 2025
- Tikzero: Zero-shot text-guided graphics program synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025
- Cammt: Benchmarking culturally aware multimodal machine translation. arXiv preprint arXiv:2505.24456, 2025
- Mark My Words: A Robust Multilingual Model for Punctuation in Text and Speech Transcripts. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2025
- CycleDistill: Bootstrapping Machine Translation using LLMs with Cyclical Distillation. In Proceedings of the Twelfth Workshop on Asian Translation (WAT 2025), 2025
- Quality Estimation and Post-Editing Using LLMs For Indic Languages: How Good Is It? In Proceedings of Machine Translation Summit XX: Volume 1, 2025
- BYTF: How Good Are Byte Level N-Gram F-Scores for Automatic Machine Translation Evaluation? In Proceedings of Machine Translation Summit XX: Volume 1, 2025
- When Alignment Hurts: Decoupling Representational Spaces in Multilingual Models. arXiv preprint arXiv:2508.12803, 2025
- Findings of the first shared task for creole language machine translation at WMT25. In Proceedings of the Tenth Conference on Machine Translation, 2025
- RiddleBench: A New Generative Reasoning Benchmark for LLMs. arXiv preprint arXiv:2510.24932, 2025
- Findings of the IWSLT 2025 evaluation campaign. In Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025), 2025
- Data and Model Centric Approaches for Expansion of Large Language Models to New languages. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts, 2025
- Multilingual Iterative Model Pruning: What Matters? In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2025
2024
- Do Not Worry if You Do Not Have Data: Building Pretrained Language Models Using Translationese. arXiv e-prints, 2024
- NGLUEni: Benchmarking and adapting pretrained language models for Nguni languages. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 2024
- Kreyòl-MT: Building MT for Latin American, Caribbean and colonial African creole languages. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024
- SubMerge: Merging Equivalent Subword Tokenizations for Subword Regularized Models in Neural Machine Translation. In Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1), 2024
- NICT’s Cascaded and End-To-End Speech Translation Systems using Whisper and IndicTrans2 for the Indic Task. In Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024), 2024
- Incorporating Hypernym Features for Improving Low-resource Neural Machine Translation. In Proceedings of the First International Workshop on Knowledge-Enhanced Machine Translation, 2024
- How effective is synthetic data and instruction fine-tuning for translation with markup using LLMs? In Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track), 2024
- Findings of WMT 2024’s MultiIndic22MT shared task for machine translation of 22 Indian languages. In Proceedings of the Ninth Conference on Machine Translation, 2024
- Bhasaanuvaad: A speech translation dataset for 14 Indian languages. arXiv e-prints, 2024
- Leveraging Adapters for Improved Cross-Lingual Transfer for Low-Resource Creole MT. In Proceedings of the Fourth Workshop on Multilingual Representation Learning (MRL 2024), 2024
- Machine translation of Marathi dialects: A case study of Kadodi. In Proceedings of the Eleventh Workshop on Asian Translation (WAT 2024), 2024
- Pralekha: An Indic document alignment evaluation benchmark. arXiv e-prints, 2024
- Proceedings of the Eleventh Workshop on Asian Translation (WAT 2024), 2024
- Linguistically Motivated Neural Machine Translation, 2024
2023
- IndicTrans2: Towards high-quality and accessible machine translation models for all 22 scheduled Indian languages. arXiv preprint arXiv:2305.16307, 2023
- Overview of the 10th workshop on Asian translation. In Proceedings of the 10th Workshop on Asian Translation, 2023
- A study on the effectiveness of large language models for translation with markup. In Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track, 2023
- DecoMT: Decomposed prompting for machine translation between related languages using large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
- Developing State-Of-The-Art Massively Multilingual Machine Translation Systems for Related Languages. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: Tutorial Abstract, 2023
- Large pre-trained language models with multilingual prompt for Japanese natural language tasks. In Proc. 29th Annu. Meet. Conf. Nat. Lang. Process, 2023
- Turning Whisper into Real-Time Transcription System (Version 2). arXiv, 2023
2022
- Self-supervised dynamic programming encoding for neural machine translation, 2022
- MorisienMT: A dataset for Mauritian Creole machine translation. arXiv preprint arXiv:2206.02421, 2022
- NICT’s Submission to the WAT 2022 Structured Document Translation Task. In Proceedings of the 9th Workshop on Asian Translation, 2022
- FeatureBART: Feature based sequence-to-sequence pre-training for low-resource NMT. In Proceedings of the 29th International Conference on Computational Linguistics, 2022
- BERTSeg: BERT based unsupervised subword segmentation for neural machine translation. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 2022
- KreolMorisienMT: A dataset for Mauritian Creole machine translation. In Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022, 2022
- A Multilingual Multiway Evaluation Data Set for Structured Document Translation of Asian Languages. In Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022, 2022
- NICT at MixMT 2022: Synthetic Code-Mixed Pre-training and Multi-way Fine-tuning for Hinglish–English Translation. In Proceedings of the Seventh Conference on Machine Translation (WMT), 2022
- IndicBART: A pre-trained model for Indic languages. In Proceedings of LREC, 2022
- IndicBART: a pre-trained model for Indic natural language generation of Indic languages, 2022
2021
- Investigating softmax tempering for training neural machine translation models. In Proceedings of Machine Translation Summit XVIII: Research Track, 2021
- Studying the impact of document-level context on simultaneous neural machine translation. In Proceedings of Machine Translation Summit XVIII: Research Track, 2021
- Proceedings of the 8th Workshop on Asian Translation (WAT2021), 2021
2020
- Domain adaptation of neural machine translation through multistage fine-tuning. In 26th Annual Conference of the Association for Natural Language Processing, 2020
- Combining sequence distillation and transfer learning for efficient low-resource neural machine translation models. In Proceedings of the Fifth Conference on Machine Translation, 2020
- NICT’s Submission To WAT 2020: How Effective Are Simple Many-To-Many Neural Machine Translation Models? In Proceedings of the 7th Workshop on Asian Translation, 2020
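- Multi-task pre-training based on linguistic knowledge for neural machine translation. In 26th Annual Meeting of the Association for Natural Language Processing, 2020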
- Extremely low-resource neural machine translation for Asian languages, vol. 34, no. 4, 2020
2019
- NICT’s supervised neural machine translation systems for the WMT19 news translation task. In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1), 2019
- Improving Transformer-Based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation. In Interspeech, 2019
- Proceedings of the 6th Workshop on Asian Translation, 2019
- NICT’s participation to WAT 2019: Multilingualism and Multi-step Fine-Tuning for Low Resource NMT. In Proceedings of the 6th Workshop on Asian Translation, 2019
- Multilingual Multi-Domain Adaptation Approaches for Neural Machine Translation. arXiv preprint arXiv:1906.07978, 2019
- Comparison of SMT and RBMT: The Requirement of Hybridization for Marathi–Hindi MT, 2019
2018
- Multilingual and multi-domain adaptation for neural machine translation. In Proceedings of the 24th Annual Meeting of the Association for Natural Language Processing (NLP 2018), 2018
- Exploiting Multilingualism and Transfer Learning for Low Resource Machine Translation, 2018
- NICT’s Participation in WAT 2018: Approaches Using Multilingualism and Recurrently Stacked Layers, 2018
- Overview of the 5th workshop on Asian translation. In Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation, 2018
2017
- An empirical study of language relatedness for transfer learning in neural machine translation. In Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation, 2017
- Kyoto University MT system description for IWSLT 2017. In Proceedings of the 14th International Conference on Spoken Language Translation, 2017
- Neural machine translation: Basics, practical aspects and recent trends. In Proceedings of the IJCNLP 2017, Tutorial Abstracts, 2017
2016
- Sophisticated Lexical Databases-Simplified Usage: Mobile Applications and Browser Plugins For Wordnets. In Proceedings of the 8th Global WordNet Conference (GWC), 2016
- Parallel sentence extraction from comparable corpora with neural network features. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), 2016
- Kyoto University participation to WAT 2016. In Proceedings of the 3rd Workshop on Asian Translation (WAT2016), 2016
2015
- Large-scale Japanese-Chinese scientific dictionary construction via pivot-based statistical machine translation. In Proceedings of the 21st Annual Meeting of the Association for Natural Language Processing (NLP 2015), 2015
- KyotoEBMT System Description for the 2nd Workshop on Asian Translation. In Proceedings of the 2nd Workshop on Asian Translation (WAT2015), 2015
- Large-scale dictionary construction via pivot-based statistical machine translation with significance pruning and neural network features. In Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation, 2015
- Augmenting Pivot based SMT with word segmentation. In Proceedings of the 12th International Conference on Natural Language Processing, 2015
2014
- Do not do processing, when you can look up: Towards a discrimination net for WSD. In Proceedings of the Seventh Global Wordnet Conference, 2014
- PaCMan: Parallel corpus management workbench. In Proceedings of the 11th International Conference on Natural Language Processing, 2014
- Tackling Close Cousins: Experiences In Developing Statistical Machine Translation Systems For Marathi And Hindi. In Proceedings of the 11th International Conference on Natural Language Processing, 2014
- Anou tradir: Experiences in building statistical machine translation systems for Mauritian languages–Creole, English, French. In Proceedings of the 11th International Conference on Natural Language Processing, 2014
2013
- A way to break them all: A compound word analyzer for Marathi. ICON, 2013
- Comparison of SMT and RBMT: The requirement of Hybridization for Marathi–Hindi MT. ICON, 2013
2012
- Morphological Analyzer for Affix Stacking Languages: A Case Study of Marathi. In COLING (Posters), 2012
- Morphology Analyser for Affix Stacking Languages: a case study in Marathi, 2012