Tadashi Nomoto


2023

RECESS: Resource for Extracting Cause, Effect, and Signal Spans
Fiona Anting Tan | Hansi Hettiarachchi | Ali Hürriyetoğlu | Nelleke Oostdijk | Tommaso Caselli | Tadashi Nomoto | Onur Uca | Farhana Ferdousi Liza | See-Kiong Ng
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

2022

The Fewer Splits are Better: Deconstructing Readability in Sentence Splitting
Tadashi Nomoto
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)

In this work, we focus on sentence splitting, a subfield of text simplification, primarily motivated by the unproven idea that dividing a sentence into pieces should make it easier to understand. Our primary goal in this paper is to determine whether this is true. In particular, we ask whether it matters if we break a sentence into two or three. We report findings based on data collected through Amazon Mechanical Turk. More specifically, we introduce a Bayesian modeling framework to investigate the degree to which a particular way of splitting a complex sentence affects readability, along with a number of other parameters adopted from diverse perspectives, including clinical linguistics and cognitive linguistics. The Bayesian modeling experiment provides clear evidence that bisecting a sentence enhances readability to a greater degree than trisecting it.

The Causal News Corpus: Annotating Causal Relations in Event Sentences from News
Fiona Anting Tan | Ali Hürriyetoğlu | Tommaso Caselli | Nelleke Oostdijk | Tadashi Nomoto | Hansi Hettiarachchi | Iqra Ameer | Onur Uca | Farhana Ferdousi Liza | Tiancheng Hu
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Despite the importance of understanding causality, corpora addressing causal relations are limited. There is a discrepancy between existing annotation guidelines for event causality and conventional causality corpora, which focus more on linguistics. Many guidelines restrict themselves to explicit relations or clause-based arguments. We therefore propose an annotation schema for event causality that addresses these concerns. We annotated 3,559 event sentences from protest event news with labels indicating whether each sentence contains causal relations. Our corpus is known as the Causal News Corpus (CNC). A neural network built upon a state-of-the-art pre-trained language model performed well, with an F1 score of 81.20% on the test set and 83.46% in 5-fold cross-validation. CNC is transferable across two external corpora: CausalTimeBank (CTB) and Penn Discourse Treebank (PDTB). Leveraging each of these external datasets for training, we achieved up to approximately 64% F1 on the CNC test set without additional fine-tuning. CNC also served as an effective training and pre-training dataset for the two external corpora. Lastly, we demonstrate the difficulty of our task for the layman in a crowd-sourced annotation exercise. Our annotated corpus is publicly available, providing a valuable resource for causal text mining researchers.

Extended Multilingual Protest News Detection - Shared Task 1, CASE 2021 and 2022
Ali Hürriyetoğlu | Osman Mutlu | Fırat Duruşan | Onur Uca | Alaeddin Gürel | Benjamin J. Radford | Yaoyao Dai | Hansi Hettiarachchi | Niklas Stoehr | Tadashi Nomoto | Milena Slavcheva | Francielle Vargas | Aaqib Javid | Fatih Beyhan | Erdem Yörük
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)

We report results of the CASE 2022 Shared Task 1 on Multilingual Protest Event Detection. This task is a continuation of CASE 2021, which consists of four subtasks: i) document classification, ii) sentence classification, iii) event sentence coreference identification, and iv) event extraction. The CASE 2022 extension expands the test data with more data in previously available languages, namely English, Hindi, Portuguese, and Spanish, and adds new test data in Mandarin, Turkish, and Urdu for Subtask 1, document classification. The training data from CASE 2021 in English, Portuguese, and Spanish were utilized. Therefore, predicting document labels in Hindi, Mandarin, Turkish, and Urdu occurs in a zero-shot setting. The CASE 2022 workshop also accepts reports on systems developed for predicting the test data of CASE 2021. We observe that the best systems submitted by CASE 2022 participants achieve between 79.71 and 84.06 F1-macro for new languages in a zero-shot setting. The winning approaches mainly rely on ensembling models and merging data in multiple languages. The best two submissions on CASE 2021 data outperform submissions from last year for Subtask 1 and Subtask 2 in all languages. Only the following scenarios were not outperformed by new submissions on CASE 2021: Subtask 3 Portuguese and Subtask 4 English.

2021

Grounding NBA Matchup Summaries
Tadashi Nomoto
Proceedings of the 14th International Conference on Natural Language Generation

The present paper summarizes an attempt we made to meet a shared task challenge on grounding machine-generated summaries of NBA matchups (https://github.com/ehudreiter/accuracySharedTask.git). In the first half, we discuss methods, and in the second, we report results, together with a discussion of which features may have affected performance.

2020

Meeting the 2020 Duolingo Challenge on a Shoestring
Tadashi Nomoto
Proceedings of the Fourth Workshop on Neural Generation and Translation

What is given below is a brief description of two systems, called gFCONV and c-VAE, which we built in response to the 2020 Duolingo Challenge. Both are neural models that aim at disrupting the sentence representation the encoder generates, with an eye on increasing the diversity of sentences that emerge from the process. Importantly, we decided not to turn to external sources for extra ammunition, curious to know how far we could go while confining ourselves to the data released by Duolingo. gFCONV works by taking over a pre-trained sequence model and intercepting the output its encoder produces on its way to the decoder. c-VAE is a conditional variational auto-encoder that seeks diversity by blurring the representation the encoder derives. Experiments on a corpus constructed from the public Duolingo dataset, containing some 4 million pairs of sentences, found that gFCONV is a consistent winner over c-VAE, though both suffered heavily from low recall.

2019

Generating Paraphrases with Lean Vocabulary
Tadashi Nomoto
Proceedings of the 12th International Conference on Natural Language Generation

In this work, we examine whether it is possible to achieve state-of-the-art performance in paraphrase generation with a reduced vocabulary. Our approach consists of building a convolution-to-sequence model (Conv2Seq) partially guided by reinforcement learning, and training it on a subword representation of the input. An experiment on the Quora dataset, which contains over 140,000 pairs of sentences and corresponding paraphrases, found that with fewer than 1,000 token types, we were able to achieve performance that exceeded the current state of the art.

2016

NEAL: A Neurally Enhanced Approach to Linking Citation and Reference
Tadashi Nomoto
Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL)

2015

MediaMeter: A Global Monitor for Online News Coverage
Tadashi Nomoto
Proceedings of the First Workshop on Computing News Storylines

2014

Lexico-syntactic text simplification and compression with typed dependencies
Mandya Angrosh | Tadashi Nomoto | Advaith Siddharthan
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2009

A Comparison of Model Free versus Model Intensive Approaches to Sentence Compression
Tadashi Nomoto
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

A Generic Sentence Trimmer with CRFs
Tadashi Nomoto
Proceedings of ACL-08: HLT

2005

Bayesian Learning in Text Summarization
Tadashi Nomoto
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2004

Multi-Engine Machine Translation with Voted Language Model
Tadashi Nomoto
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2003

Predictive models of performance in multi-engine machine translation
Tadashi Nomoto
Proceedings of Machine Translation Summit IX: Papers

The paper describes a novel approach to Multi-Engine Machine Translation (MEMT). We build statistical models of translation performance and use them to guide us in combining and selecting from the outputs of multiple MT engines. We empirically demonstrate that the MEMT system based on these models outperforms any of its component engines.

2002

Supervised Ranking in Open-Domain Text Summarization
Tadashi Nomoto | Yuji Matsumoto
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

1999

Learning Discourse Relations with Active Data Selection
Tadashi Nomoto | Yuji Matsumoto
1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora

1998

Discourse Parsing: A Decision Tree Approach
Tadashi Nomoto | Yuji Matsumoto
Sixth Workshop on Very Large Corpora

1997

Data Reliability and Its Effects on Automatic Abstracting
Tadashi Nomoto | Yuji Matsumoto
Fifth Workshop on Very Large Corpora

1996

Exploiting Text Structure for Topic Identification
Tadashi Nomoto | Yuji Matsumoto
Fourth Workshop on Very Large Corpora

1994

A Grammatico-Statistical Approach to Discourse Partitioning
Tadashi Nomoto | Yoshihiko Nitta
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics

1993

Resolving Zero Anaphora in Japanese
Tadashi Nomoto | Yoshihiko Nitta
Sixth Conference of the European Chapter of the Association for Computational Linguistics