David M Howcroft and Verena Rieser. 2021. What happens if you treat ordinal ratings as interval data? Human evaluations in NLP are even more under-powered than you think. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, edited by Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih, pages 8932–8939. Online and Punta Cana, Dominican Republic: Association for Computational Linguistics, November 2021. Conference publication, ID howcroft-rieser-2021-happens. DOI: 10.18653/v1/2021.emnlp-main.703. https://aclanthology.org/2021.emnlp-main.703/