Jiawen Xie

2025

LOHRec: Leveraging Order and Hierarchy in Generative Sequential Recommendation
Jiawen Xie | Haiyang Wu | Deyi Ji | Yuekui Yang | Shaoping Ma
Findings of the Association for Computational Linguistics: EMNLP 2025

The sequential recommendation task involves predicting the items users will be interested in next based on their past interaction sequence. Recently, sequential recommender systems with generative retrieval have garnered significant attention. However, during training, these generative recommenders focus only on maximizing the prediction probability of the next target item in the temporal sequence, while neglecting the diverse set of other plausible items. Although introducing large language models (LLMs) with world knowledge and adding a set of auxiliary tasks that link item identifiers to their real-world meanings can alleviate this issue, the high inference costs of these LLM-based recommenders make them challenging to deploy in practical scenarios. In this paper, we propose a novel learning framework, LOHRec, which leverages the order and hierarchy in generative recommendation using quantized identifiers to further explore the performance ceiling of lightweight generative recommenders. Under fair comparisons with comparable backbone parameter sizes, comprehensive experiments show that all variants of generative recommenders using our framework outperform strong prior baselines across multiple datasets. Furthermore, we empirically demonstrate that LOHRec can efficiently align lightweight generative recommenders with LLM recommendation preferences in low-resource scenarios, which underscores its practical utility. Our code repository is available at [https://github.com/xjw-nlp/LOHRec](https://github.com/xjw-nlp/LOHRec).

2024

Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers
Jiawen Xie | Pengyu Cheng | Xiao Liang | Yong Dai | Nan Du
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Although dominant in natural language processing, transformer-based models still struggle with long-sequence processing, due to the computational costs of their self-attention operations, which grow quadratically with the length of the input sequence. To address this challenge, we propose a **Sim**ple framework to enhance the long-content processing of off-the-shelf pre-trained transformers via three steps: **C**hunk, **A**lign, and **S**elect (SimCAS). More specifically, we first divide each long-sequence input into a batch of chunks, then align the inter-chunk information during the encoding steps, and finally select the most representative hidden states from the encoder for the decoding process. With SimCAS, the computation and memory costs can be reduced to linear complexity. In experiments, we demonstrate the effectiveness of the proposed method on various real-world long-text summarization and reading comprehension tasks, in which SimCAS significantly outperforms prior long-sequence processing baselines. The code is at [https://github.com/xjw-nlp/SimCAS](https://github.com/xjw-nlp/SimCAS).
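
The three steps named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how alignment or selection is carried out, so sharing a mean boundary vector across chunks and keeping the top-k states by a supplied score are stand-in assumptions, and the function names `chunk`, `align`, and `select` are hypothetical.

```python
from typing import List, Sequence


def chunk(tokens: List[int], chunk_size: int, pad_id: int = 0) -> List[List[int]]:
    """Chunk: split a long token sequence into equal-length chunks.

    The last chunk is right-padded with pad_id (an assumed padding id).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
    if chunks and len(chunks[-1]) < chunk_size:
        chunks[-1] = chunks[-1] + [pad_id] * (chunk_size - len(chunks[-1]))
    return chunks


def align(boundary_states: List[List[float]]) -> List[List[float]]:
    """Align (assumed form): give every chunk the mean of one boundary
    vector per chunk, so each chunk sees a shared summary of the others."""
    n = len(boundary_states)
    dim = len(boundary_states[0])
    mean = [sum(vec[d] for vec in boundary_states) / n for d in range(dim)]
    return [list(mean) for _ in boundary_states]


def select(hidden_states: Sequence, scores: Sequence[float], k: int) -> list:
    """Select (assumed form): keep the k highest-scoring hidden states,
    preserving their original order for the decoder."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [hidden_states[i] for i in sorted(top)]


if __name__ == "__main__":
    print(chunk(list(range(1, 8)), chunk_size=3))
    print(align([[1.0, 2.0], [3.0, 4.0]]))
    print(select(["a", "b", "c", "d"], [0.1, 0.9, 0.3, 0.8], k=2))
```

Because each chunk has a fixed size, encoding cost in a scheme like this grows with the number of chunks rather than with the square of the full input length, which is consistent with the linear complexity the abstract reports.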

GECSum: Generative Evaluation-Driven Sequence Level Contrastive Learning for Abstractive Summarization
Jiawen Xie | Shaoting Zhang | Xiaofan Zhang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

While dominant in abstractive summarization, transformer-based language models trained with standard maximum likelihood estimation (MLE) remain challenged by two discrepancies: the misalignment between token-level training and sequence-level evaluation, and the divergence between teacher-forced training and auto-regressive generation. Recent studies have shown that sequence-level contrastive learning, which uses the quality differences between multiple summaries as prior information, can effectively mitigate these issues. However, because existing methods typically derive their contrastive signals from specific evaluation metrics, the model learns to match the preferences of those metrics, and its performance is limited by their evaluation capabilities. Inspired by prior works that treat the evaluation of generated text as a text generation problem, we propose a generative evaluation-driven contrastive learning framework, which leverages the semantic understanding capabilities of the abstractive model itself to evaluate summaries in reference-based settings. In this way, our method establishes a connection between the model’s reference-based evaluation and reference-free generation scenarios, allowing them to share the benefits of model capability enhancements. Extensive experiments on four summarization datasets demonstrate that our method outperforms the previous state-of-the-art regarding comprehensive performance. Various empirical analyses further substantiate the effectiveness of our method.
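
As a rough illustration of the sequence-level contrastive learning this abstract builds on, the sketch below computes a pairwise margin ranking loss over candidate summaries ordered by quality. It is a generic form of such an objective and not the loss defined in the paper; the function name `pairwise_ranking_loss`, the rank-scaled margin, and the example scores are all assumptions.

```python
from typing import Sequence


def pairwise_ranking_loss(log_probs: Sequence[float], margin: float = 0.01) -> float:
    """Margin ranking loss over candidate summaries.

    log_probs holds the model's (length-normalized) log-probability for each
    candidate, ordered from the best candidate to the worst according to some
    external quality signal. A pair contributes to the loss whenever the
    worse candidate is not scored at least margin * rank_gap below the
    better one.
    """
    loss = 0.0
    n = len(log_probs)
    for i in range(n):
        for j in range(i + 1, n):
            # Candidate i is ranked above candidate j, so it should score higher.
            loss += max(0.0, log_probs[j] - log_probs[i] + margin * (j - i))
    return loss


if __name__ == "__main__":
    # Scores already agree with the quality order: no penalty.
    print(pairwise_ranking_loss([-0.5, -1.0, -2.0], margin=0.1))
    # The worse candidate is scored higher: the pair is penalized.
    print(pairwise_ranking_loss([-2.0, -1.0], margin=0.5))
```

In an objective of this shape, the quality ordering is the only place the evaluation signal enters, which is why the choice of evaluator matters so much for what the model learns.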

2023

Alleviating Exposure Bias via Multi-level Contrastive Learning and Deviation Simulation in Abstractive Summarization
Jiawen Xie | Qi Su | Shaoting Zhang | Xiaofan Zhang
Findings of the Association for Computational Linguistics: ACL 2023

Most Transformer-based abstractive summarization systems suffer from a severe mismatch between training and inference, i.e., exposure bias. We address this gap from two perspectives, introducing a simple multi-level contrastive learning framework for abstractive summarization (SimMCS) and a tailored sparse decoder self-attention pattern (SDSA) to improve model performance. Compared with previous contrastive objectives focusing only on the relative order of probability mass assigned to non-gold summaries, SimMCS additionally takes their absolute positions into account, which ensures that the relatively high-quality (positive) summaries among them are assigned high probability mass, and further enhances the capability of discriminating summary quality beyond exploiting potential artifacts of specific metrics. SDSA simulates, during training, the deviations that can occur at inference time to get closer to the ideal paradigm. Our approaches outperform the previous state-of-the-art results on two summarization datasets while adding only modest overhead. Further empirical analysis shows our model preserves the advantages of prior contrastive methods and possesses strong few-shot learning ability.

Co-authors

Deyi Ji 1

Xiao Liang 1

Shaoping Ma 1

Qi Su (苏琪, 苏祺, 祺苏) 1

Haiyang Wu 1

Yuekui Yang 1
