Shasha Li


2024

Recommending Missed Citations Identified by Reviewers: A New Task, Dataset and Baselines
Kehan Long | Shasha Li | Pancheng Wang | Chenlong Bao | Jintao Tang | Ting Wang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Citing comprehensively and appropriately has become a challenging task with the explosive growth of scientific publications. Current citation recommendation systems aim to recommend a list of scientific papers for a given text context or a draft paper. However, no existing work focuses on the citations already included in full papers, which are imperfect and leave much room for improvement. In peer review, reviewers commonly identify submissions as missing vital citations. This can harm the credibility and validity of the research presented. To help improve the citations of full papers, we first define a novel task of Recommending Missed Citations Identified by Reviewers (RMC) and construct a corresponding expert-labeled dataset called CitationR. We conduct an extensive evaluation of several state-of-the-art methods on CitationR. Furthermore, we propose a new framework, RMCNet, with an Attentive Reference Encoder module that mines the relevance between papers, already-made citations, and missed citations. Empirical results show that RMC is challenging, with the proposed architecture outperforming previous methods on all metrics. We release our dataset and benchmark models to motivate future research on this challenging new task.

2022

Few-shot Named Entity Recognition with Entity-level Prototypical Network Enhanced by Dispersedly Distributed Prototypes
Bin Ji | Shasha Li | Shaoduo Gan | Jie Yu | Jun Ma | Huijun Liu | Jing Yang
Proceedings of the 29th International Conference on Computational Linguistics

Few-shot named entity recognition (NER) enables us to build an NER system for a new domain using very few labeled examples. However, existing prototypical networks for this task suffer from roughly estimated label dependency and closely distributed prototypes, which often cause misclassifications. To address these issues, we propose EP-Net, an Entity-level Prototypical Network enhanced by dispersedly distributed prototypes. EP-Net builds entity-level prototypes and treats text spans as candidate entities, so it no longer requires label dependency. In addition, EP-Net trains the prototypes from scratch to distribute them dispersedly and aligns spans to prototypes in the embedding space using a space projection. Experimental results on two evaluation tasks and the Few-NERD settings demonstrate that EP-Net consistently outperforms the previous strong models in terms of overall performance. Extensive analyses further validate the effectiveness of EP-Net.

Multi-Document Scientific Summarization from a Knowledge Graph-Centric View
Pancheng Wang | Shasha Li | Kunyuan Pang | Liangliang He | Dong Li | Jintao Tang | Ting Wang
Proceedings of the 29th International Conference on Computational Linguistics

Multi-Document Scientific Summarization (MDSS) aims to produce coherent and concise summaries for clusters of topic-relevant scientific papers. This task requires precise understanding of paper content and accurate modeling of cross-paper relationships. Knowledge graphs convey compact and interpretable structured information for documents, which makes them ideal for content modeling and relationship modeling. In this paper, we present KGSum, an MDSS model centred on knowledge graphs during both the encoding and decoding processes. Specifically, in the encoding process, two graph-based modules are proposed to incorporate knowledge graph information into paper encoding, while in the decoding process, we propose a two-stage decoder that first generates the knowledge graph information of the summary in the form of descriptive sentences and then generates the final summary. Empirical results show that the proposed architecture brings substantial improvements over baselines on the Multi-Xscience dataset.

2020

Span-based Joint Entity and Relation Extraction with Attention-based Span-specific and Contextual Semantic Representations
Bin Ji | Jie Yu | Shasha Li | Jun Ma | Qingbo Wu | Yusong Tan | Huijun Liu
Proceedings of the 28th International Conference on Computational Linguistics

Span-based joint extraction models have shown their efficiency on entity recognition and relation extraction. These models regard text spans as candidate entities and span tuples as candidate relation tuples. Span semantic representations are shared between entity recognition and relation extraction, but existing models cannot capture the semantics of these candidate entities and relations well. To address this problem, we introduce a span-based joint extraction framework with attention-based semantic representations. Specifically, attention is used to compute semantic representations, including span-specific and contextual ones. We further investigate the effects of four attention variants in generating contextual semantic representations. Experiments show that our model outperforms previous systems and achieves state-of-the-art results on ACE2005, CoNLL2004 and ADE.

2010

Comparable Entity Mining from Comparative Questions
Shasha Li | Chin-Yew Lin | Young-In Song | Zhoujun Li
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2008

Understanding and Summarizing Answers in Community-Based Question Answering Services
Yuanjie Liu | Shasha Li | Yunbo Cao | Chin-Yew Lin | Dingyi Han | Yong Yu
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)