Licheng Zhang


2024

pdf bib
IDEATE: Detecting AI-Generated Text Using Internal and External Factual Structures
Quan Wang | Licheng Zhang | Zikang Guo | Zhendong Mao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The effective detection of AI-generated text is a vital principle to ensure responsible use of large language models (LLMs). Previous studies mainly focused on discovering and utilizing internal evidences contained in the text itself to perform the detection, while ignoring external evidences implicated in an established knowledge graph (KG) which may also be key discriminative factors between AI-generated and human-written text. To address this deficiency, we propose IDEATE, a novel hierarchical graph network that utilizes both internal and external factual structures to detect AI-generated text. IDEATE consists of a mention-level subgraph at the bottom to describe internal factual structures of mentioned entities reflected in the input text, and an entity-level subgraph at the top to describe external factual structures of mentioned entities reflected in an external KG. Hierarchical graph convolution is then applied successively on the two subgraphs, through which the two types of factual structures will be embedded into the output and used for the final detection. Extensive experiments on four benchmarking datasets show that IDEATE consistently outperforms current state-of-the-art methods in detecting text generated by various LLMs, ranging from GPT-2 to the more powerful ChatGPT, verifying the necessity and superiority of introducing external evidences for AI-generated text detection.

2023

pdf bib
Random Entity Quantization for Parameter-Efficient Compositional Knowledge Graph Representation
Jiaang Li | Quan Wang | Yi Liu | Licheng Zhang | Zhendong Mao
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Representation Learning on Knowledge Graphs (KGs) is essential for downstream tasks. The dominant approach, KG Embedding (KGE), represents entities with independent vectors and faces the scalability challenge. Recent studies propose an alternative way for parameter efficiency, which represents entities by composing entity-corresponding codewords matched from predefined small-scale codebooks. We refer to the process of obtaining corresponding codewords of each entity as entity quantization, for which previous works have designed complicated strategies. Surprisingly, this paper shows that simple random entity quantization can achieve similar results to current strategies. We analyze this phenomenon and reveal that entity codes, the quantization outcomes for expressing entities, have higher entropy at the code level and Jaccard distance at the codeword level under random entity quantization. Therefore, different entities become more easily distinguished, facilitating effective KG representation. The above results show that current quantization strategies are not critical for KG representation, and there is still room for improvement in entity distinguishability beyond current strategies.

pdf bib
Text Style Transfer with Contrastive Transfer Pattern Mining
Jingxuan Han | Quan Wang | Licheng Zhang | Weidong Chen | Yan Song | Zhendong Mao
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Text style transfer (TST) is an important task in natural language generation, which aims to alter the stylistic attributes (e.g., sentiment) of a sentence and keep its semantic meaning unchanged. Most existing studies mainly focus on the transformation between styles, yet ignore that this transformation can be actually carried out via different hidden transfer patterns. To address this problem, we propose a novel approach, contrastive transfer pattern mining (CTPM), which automatically mines and utilizes inherent latent transfer patterns to improve the performance of TST. Specifically, we design an adaptive clustering module to automatically discover hidden transfer patterns from the data, and introduce contrastive learning based on the discovered patterns to obtain more accurate sentence representations, and thereby benefit the TST task. To the best of our knowledge, this is the first work that proposes the concept of transfer patterns in TST, and our approach can be applied in a plug-and-play manner to enhance other TST methods to further improve their performance. Extensive experiments on benchmark datasets verify the effectiveness and generality of our approach.

2020

pdf bib
Curriculum Learning for Natural Language Understanding
Benfeng Xu | Licheng Zhang | Zhendong Mao | Quan Wang | Hongtao Xie | Yongdong Zhang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

With the great success of pre-trained language models, the pretrain-finetune paradigm now becomes the undoubtedly dominant solution for natural language understanding (NLU) tasks. At the fine-tune stage, target task data is usually introduced in a completely random order and treated equally. However, examples in NLU tasks can vary greatly in difficulty, and similar to human learning procedure, language models can benefit from an easy-to-difficult curriculum. Based on this idea, we propose our Curriculum Learning approach. By reviewing the trainset in a crossed way, we are able to distinguish easy examples from difficult ones, and arrange a curriculum for language models. Without any manual model architecture design or use of external data, our Curriculum Learning approach obtains significant and universal performance improvements on a wide range of NLU tasks.