
1 Introduction

Text style transfer (TST) seeks to rephrase a source text in a language style specific to certain attributes while preserving content that is independent of these style attributes. As depicted in Table 1, the style in TST could be any attribute requiring modification, such as sentiment, topic, gender, writing style, or a combination thereof [7, 23]. Although substantial strides have been made in multi-attribute text generation [6, 9], much of the existing research concentrates on achieving a balance between content preservation and style manipulation constraints, often ignoring the challenges inherent in satisfying multiple attributes simultaneously. In light of this, we turn to an approach that modifies the latent representations of texts without disentanglement, guided by classifier gradients [12, 26]. This method offers both training efficiency and flexibility in text transformation [12, 19], making it an attractive choice for our exploration of multi-attribute TST. Our empirical study reveals that, during the editing process, the overall editing gradient may conflict with the gradient directions of certain attributes, potentially leading to the generation of attribute-incomplete text.

To address this, we introduce a novel method grounded in mathematical programming for constrained multi-attribute text style transfer, based on a latent representation editing model. Specifically, our approach starts with an auto-encoder for sentence self-representation, which could also be a pre-trained model. It then establishes constraints on the editing direction of multiple attributes by guiding modifications to the latent representation via a classifier. Our method, for the first time, considers potential conflicts between different attributes, ensuring that generated text satisfies the required attributes to the greatest extent possible during the editing process. Experimental results on the widely-recognized YELP benchmark [10] demonstrate the efficacy of our method in enhancing text-transfer accuracy across multiple attributes. Our main contributions include:

  • We identify the limitations and possible reasons for the suboptimal performance of existing latent representation editing methods in multi-attribute scenarios.

  • We propose a novel, flexible multi-attribute TST model based on the latent-variable editing method, which is the first to take into account conflicts between, and satisfaction of, the multiple attributes of the generated text.

  • By utilizing Quadratic Programming (QP) with inequality constraints, we modify the editing gradient so that it remains consistent with the target attribute directions, resulting in improved overall accuracy across multiple attributes.

Table 1. Several common TST tasks with example sentences.

2 Related Work

Disentanglement-based methods [5, 15, 22] aim to separate the style and content of the original text and then generate new text from the target style and the content representation. Key techniques include adversarial learning [5, 17, 22], attention mechanisms [27, 29], and other approaches such as Levenshtein editing [11, 20]. Instead of disentangling content and style, entanglement-based methods rewrite the entangled representations directly in a specific manner, for example via reinforcement learning [14], back-translation techniques [1, 2, 21], or latent vector editing [12, 19, 26].

Multi-attribute style transfer is an extension of the single-attribute setting but is more difficult, and several studies focus specifically on multiple attributes. [9] proposed an adversarial training model with a word-level conditional architecture and a two-stage training procedure for multi-attribute generation. [10] implemented multi-attribute style transfer by adjusting the average embedding of each target attribute and combining DAE and back-translation techniques. [6] used multiple style-aware language models as discriminators, in combination with transformer-based encoder-decoders, to enhance rewriting capability.

3 Methodology

We start with a dataset \(\mathcal {D} = \{(x^{i}, s^{i})\}_{i=1}^{n}\), wherein each unit \((x^{i}, s^{i})\) denotes a sentence \(x^{i}\) together with its corresponding attribute vector \(s^{i}\). This attribute vector might cover multiple attributes, such as sentiment and gender, which can be represented as \(s = \{s_{sent}, s_{gend}\}\). The main aim is to transform a given sentence \(x^{i}\), accompanied by its associated attribute \(s^{i}\), into a new sentence \(\hat{x}\) that aligns with a target attribute \(\hat{s}\).

3.1 Preliminary

In pursuing this aim, we turn our attention to the latent representation revision method. The fundamental concept here involves fine-tuning the entangled latent representation of the input sentence to align with the desired attribute. As depicted in Fig. 1(A), the model typically integrates three core components: an encoder \(G_{enc}\), a decoder \(G_{dec}\), and an attribute classifier C. It’s noteworthy that some studies [12] prefer to utilize multiple classifiers. The process begins with the encoder, which translates a given input sentence x into a latent representation z. This representation integrates both the attribute and the content in a tangled manner. Following this, z is modified to match the target attribute, under the guidance of the classifier. Ultimately, the decoder converts z back into a sentence \(\hat{x}\), embodying the desired attribute.

The modification of z using the gradient provided by C [12, 26] is merely one among a host of potential strategies. Another option suggested by [19] involves steering z directly across the surface of the decision boundary. In this work, we have chosen to adhere to the gradient modification approach to execute multi-attribute style transfer.

Fig. 1. Overview of the latent representation revision method. (A) Model architecture. (B) Latent representation editing for multi-attribute TST. Given a sentence, the objective of the model is to rephrase it such that it incorporates both Attribute I and Attribute II.

3.2 Problem Analysis

Our approach modifies the latent representation z of the input sentence to incorporate the target attribute by using the gradient from the attribute classifier C. This gradient guides the search for a new representation \({z}'\) that satisfies the desired attribute \({s}'\) while staying close to the original sentence:

$$\begin{aligned} {z}' =z-\omega _{i}\nabla _{z}\mathcal {L}_{C}(C_{\theta _{C}}(z), {s}'), \end{aligned}$$
(1)

where \(\theta _{C}\) denotes the parameters of the classifier C and \(\omega _{i}\) is the adjustment factor (step size). This process is repeated until the classifier C confirms that \({z}'\) matches the target style.
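To make the procedure concrete, the following is a minimal PyTorch sketch of the iterative update in Eq. 1. The `classifier` object, the constant step size, and the stopping threshold are illustrative assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def edit_latent(z, classifier, target, step_size=1.0, max_iters=30, threshold=0.5):
    """Iterative latent editing following Eq. 1: z' = z - w * grad_z L_C(C(z), s').

    z          : (1, d) latent representation produced by the encoder
    classifier : module mapping z to per-attribute probabilities p(s|z)
    target     : (1, k) float tensor of desired attribute labels s' in {0, 1}
    """
    z = z.detach().clone()
    for _ in range(max_iters):
        z.requires_grad_(True)
        probs = classifier(z)                                # C(z) = p(s|z)
        loss = F.binary_cross_entropy(probs, target)         # L_C(C(z), s')
        grad = torch.autograd.grad(loss, z)[0]               # joint gradient over all labels (Eq. 2)
        z = (z - step_size * grad).detach()                  # gradient step (Eq. 1)
        with torch.no_grad():
            if torch.all((classifier(z) > threshold) == target.bool()):
                break                                        # classifier confirms the target style
    return z
```

As discussed next, the joint gradient computed in this loop is exactly where conflicts between individual attributes can arise.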

However, this method may compromise the generation quality when transferring multiple attributes. The attribute classifier provides a joint gradient on all labels \({s}'\) to update the latent representation, i.e.,

$$\begin{aligned} \sum _{s'}\nabla _{z}\mathcal {L}_{C}(C_{\theta _{C}}(z), {s}'). \end{aligned}$$
(2)

Given that the decision surfaces for each attribute in the classification may not completely overlap under multi-attribute style transfer, conflicts might arise between the gradient orientations of different attributes when modifying the latent representation along a specific gradient path. For example, certain steps might draw \({z}'\) nearer to attribute \(s_{1}\) while distancing it from attribute \(s_{2}\). As depicted in Fig. 1 (B), adjusting the latent representation along the gradient direction of path b, leading to a lower final loss, only meets the target attribute \(s_{2}\) criteria. Conversely, path a, despite a higher loss value, accommodates two attributes and thus produces a more desirable outcome. Our experiments confirmed this phenomenon by identifying instances of conflict between the editing gradient orientation and the single-attribute gradient orientation during the transfer process, as discussed in Sect. 4.4.

3.3 Model Architecture

Our framework is designed to be compatible with any auto-encoder (AE) and multi-attribute classifier, making it agnostic to the specific neural network architecture employed. In this work, we employed a transformer-based auto-encoder [24] in conjunction with an MLP-based classifier.

Transformer-Based Auto-Encoder G. Given an input sentence \(x=\{x_{1},x_{2},...,x_{m}\}\), the encoder \(G_{enc}(\theta _{enc};x)\) transforms it into a continuous latent representation: \(z \sim G_{enc}(\theta _{enc};x) = q(z|x),\) while the decoder \(G_{dec}(\theta _{dec};z)\) maps the latent representation z back to the sentence, reconstructing it: \( x \sim G_{dec}(\theta _{dec};z) = p(x|z). \) During training, the objective of G is to minimize the reconstruction error. The reconstruction loss is defined as:

$$\begin{aligned} \mathcal {L}_{G}(\theta _{enc},\theta _{dec};x)=-\frac{1}{|s|}\sum _{i=1}^{|s|}q(z|x)\log p(x|z), \end{aligned}$$
(3)

where |s| denotes the number of attributes.
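For illustration, here is a minimal PyTorch sketch of a transformer-based auto-encoder of the kind described above, which pools the encoder states into a single latent vector z and reconstructs the sentence from it. The dimensions, mean-pooling strategy, and the standard token-level cross-entropy shown in the comment are assumptions for this sketch, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class TransformerAE(nn.Module):
    """Minimal transformer auto-encoder: sentence -> latent z -> reconstruction."""
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=2, max_len=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def encode(self, x):
        """G_enc: token ids (B, T) -> latent representation z (B, d_model)."""
        h = self.encoder(self.embed(x) + self.pos[:, : x.size(1)])
        return h.mean(dim=1)                          # pool token states into a single z

    def decode(self, z, x_shifted):
        """G_dec: latent z plus shifted targets -> token logits (teacher forcing)."""
        tgt = self.embed(x_shifted) + self.pos[:, : x_shifted.size(1)]
        return self.out(self.decoder(tgt, z.unsqueeze(1)))   # z exposed as one-step memory

# Reconstruction objective (standard token-level cross-entropy as a stand-in for Eq. 3):
# z = model.encode(x); logits = model.decode(z, x_shifted)
# loss = nn.CrossEntropyLoss()(logits.reshape(-1, vocab_size), x.reshape(-1))
```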

Multiple-Attribute Classifier C. Our classifier is implemented as an MLP consisting of two linear layers and a sigmoid activation function. Specifically, it is defined as \(C(z)=MLP(z)=p(s|z)\). The attribute classification loss is:

$$\begin{aligned} \mathcal {L}_{C}(\theta _{C};z,s)=-\frac{1}{|s|}\sum _{i=1}^{|s|}\left[ s_i\log (p(s_i|z))+ (1-s_i)\log (1-p(s_i|z))\right] , \end{aligned}$$

where |s| is the number of attributes and \(s_i\) is the ground truth label for the i-th attribute.
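A minimal sketch of such a multi-label classifier and its loss is shown below; the hidden size and the intermediate ReLU are assumptions, since the text only specifies two linear layers and a sigmoid output.

```python
import torch.nn as nn

class AttributeClassifier(nn.Module):
    """Multi-label attribute classifier C(z) = MLP(z) = p(s|z)."""
    def __init__(self, latent_dim=256, hidden_dim=128, num_attributes=3):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.ReLU(),                      # intermediate nonlinearity (assumed)
            nn.Linear(hidden_dim, num_attributes),
            nn.Sigmoid(),                   # per-attribute probabilities p(s_i|z)
        )

    def forward(self, z):
        return self.mlp(z)

# The attribute classification loss is the averaged binary cross-entropy:
# loss = nn.BCELoss()(classifier(z), s)   # s: float tensor of 0/1 labels per attribute
```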

3.4 Multiple-Attributes Gradient Iterative Modification

Conflict Resolution in Modification. To align \({z}'\) with all target attributes, we adopt a gradient direction detection and conflict resolution strategy inspired by GME [13] when modifying z. We consider not only gradient conflicts with respect to z, but also conflicts that arise during the intermediate gradient propagation in C. Therefore, during the inference stage, we detect the gradient direction by computing the inner product of the gradient vectors in each linear layer as the gradient backpropagates through C. This enables us to identify potential directional conflicts between the gradient of any single attribute \(g_{i}\) and the overall gradient g (see Footnote 1). The constraint is satisfied when the gradient g agrees with all desired attribute directions, expressed as follows:

$$\begin{aligned} \left\langle g_{i},g \right\rangle := \left\langle \nabla _{z}\mathcal {L}_{C}(C_{\theta _{C}}(z), {s_{i}}'),\nabla _{z}\mathcal {L}_{C}(C_{\theta _{C}}(z), {s}') \right\rangle \ge 0. \end{aligned}$$
(4)
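In code, the check in Eq. 4 reduces to inner products between the joint gradient and each per-attribute gradient. The sketch below (with illustrative tensor shapes and names) computes both kinds of gradient with respect to z only; the paper additionally inspects intermediate layers of C during backpropagation, which is omitted here for simplicity.

```python
import torch
import torch.nn.functional as F

def attribute_gradients(z, classifier, target):
    """Per-attribute gradients g_i and the joint gradient g of L_C with respect to z."""
    z = z.detach().clone().requires_grad_(True)
    probs = classifier(z)
    per_attr = [
        torch.autograd.grad(F.binary_cross_entropy(probs[:, i], target[:, i]),
                            z, retain_graph=True)[0].flatten()
        for i in range(target.size(1))
    ]
    joint = torch.autograd.grad(F.binary_cross_entropy(probs, target), z)[0].flatten()
    return per_attr, joint

def has_conflict(per_attr, joint):
    """Eq. 4: the constraint <g_i, g> >= 0 must hold for every attribute i."""
    return any(torch.dot(g_i, joint) < 0 for g_i in per_attr)
```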

In cases where a conflict arises, updating z along the gradient in question could cause it to deviate from the target attribute associated with the conflicting direction. To tackle this, we project the gradient g onto the nearest gradient \(\tilde{g}\) that fulfills all attributes:

$$\begin{aligned} \textrm{minimize}_{\tilde{g}} \frac{1}{2} {\Vert g -\tilde{g} \Vert }^{2}_{2} \quad s.t. \langle g_i,\tilde{g} \rangle \ge 0. \end{aligned}$$
(5)

To address Eq. 5, which presents a Quadratic Program (QP) with inequality constraints [4, 13], it is useful to return to the primal form:

$$\begin{aligned} \textrm{minimize}_{r} \frac{1}{2} r^{\top }Hr+p^{\top }r\quad s.t. \ Ar\ge b, \end{aligned}$$
(6)

where \(H\in \mathbb {R}^{p\times p}\) is a symmetric, positive semi-definite matrix, \(p\in \mathbb {R}^p\), \(A\in \mathbb {R}^{|s|\times p}\), \(b\in \mathbb {R}^{|s|}\). The dual of the inequality-constrained problem (Eq. 6) [3] is:

$$\begin{aligned} \textrm{minimize}_{u,v} \frac{1}{2} u^{\top }Hu-b^{\top }v\quad s.t. \ A^{\top }v-Hu=p,v\ge 0. \end{aligned}$$
(7)

Drawing from Dorn’s duality theorem [3], if a solution \((u^{*}, v^{*})\) to Eq. 7 exists, then there exists a solution \(r^{*}\) to Eq. 6 that satisfies \(Hr^{*}=Hu^{*}\).

On this basis, the original QP (Eq. 5) can be expressed as:

$$\begin{aligned} \textrm{minimize}_{r} \frac{1}{2} r^{\top }r-g^{\top }r+\frac{1}{2} g^{\top }g\quad s.t. \ Gr\ge 0, \end{aligned}$$
(8)

where \(G=(g,g_{1},...,g_{|s|})\). The dual problem of (Eq. 8) is:

$$\begin{aligned} \textrm{minimize}_{v} \frac{1}{2} v^{\top }GG^{\top }v+g^{\top }G^{\top }v\quad s.t. \ v\ge 0, \end{aligned}$$
(9)

where \(u=G^{\top }v+g\) and \(g^{\top }g\) contributes only a constant term. This is a QP over the |s| attribute constraints. Once we solve problem (9) for \(v^{*}\), we obtain the adjusted gradient \(\tilde{g}=G^{\top }v^{*}+g\).
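As an illustration, the dual problem in Eq. 9 can be solved with any off-the-shelf QP or box-constrained solver; the SciPy-based sketch below is one possible choice and is an assumption, since the paper does not name a specific solver.

```python
import numpy as np
from scipy.optimize import minimize

def adjust_gradient(g, attr_grads):
    """Project g onto the closest gradient g~ satisfying <g_i, g~> >= 0 (Eqs. 5-9).

    g          : (p,) joint gradient (NumPy array)
    attr_grads : list of (p,) per-attribute gradients g_1, ..., g_|s|
    Returns g~ = G^T v* + g, where v* solves the dual QP of Eq. 9.
    """
    G = np.stack([g] + list(attr_grads))        # rows (g, g_1, ..., g_|s|) as in Eq. 8
    GGt = G @ G.T
    q = G @ g                                   # so that g^T G^T v = q . v

    objective = lambda v: 0.5 * v @ GGt @ v + q @ v
    gradient = lambda v: GGt @ v + q

    res = minimize(objective, np.zeros(G.shape[0]), jac=gradient,
                   bounds=[(0.0, None)] * G.shape[0],   # v >= 0
                   method="L-BFGS-B")
    return G.T @ res.x + g                      # adjusted gradient g~ = G^T v* + g
```

Whenever the check in Eq. 4 fails, this adjusted \(\tilde{g}\) replaces the raw joint gradient in the update of Eq. 1.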

Following the resolution of conflicts, the adjusted gradient \(\tilde{g}\) is propagated to the subsequent layer of the network. This gradient then steers the modifications applied to the latent representation \(z'\). The details of this process are outlined in Algorithm 1.

To preserve the attribute-independent content and linguistic integrity of the latent representation, we apply gradient modification only to large-step (large-magnitude) gradients. This mitigates the potential negative impact on the fluency and coherence of the decoded text that could result from adjusting small gradients, which induce only insignificant style changes. It is crucial to note that this procedure is applied strictly during the inference stage and plays no role during training.

Algorithm 1. Multiple-Attributes Fast Gradient Iterative Modification Algorithm.
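Since the algorithm figure itself is not reproduced here, the following is a plausible reconstruction of the inference-time loop based on Sects. 3.2-3.4, reusing the `attribute_gradients`, `has_conflict`, and `adjust_gradient` helpers sketched above; the `min_norm` threshold (restricting modification to large-step gradients) and the `generate` decoding routine are hypothetical.

```python
import torch

def multi_attribute_transfer(x_ids, autoencoder, classifier, target,
                             step_size=1.0, max_iters=30, min_norm=0.1, threshold=0.5):
    """Inference-time editing loop (cf. Algorithm 1, reconstructed from the text)."""
    z = autoencoder.encode(x_ids)                              # z = G_enc(x)
    target = target.float()
    for _ in range(max_iters):
        per_attr, joint = attribute_gradients(z, classifier, target)
        g = joint
        # Resolve conflicts only for large-step gradients (Sect. 3.4).
        if joint.norm() > min_norm and has_conflict(per_attr, joint):
            g_tilde = adjust_gradient(joint.detach().numpy(),
                                      [gi.detach().numpy() for gi in per_attr])
            g = torch.as_tensor(g_tilde, dtype=z.dtype)
        z = (z - step_size * g.view_as(z)).detach()            # Eq. 1 with the adjusted gradient
        with torch.no_grad():
            if torch.all((classifier(z) > threshold) == target.bool()):
                break                                          # all target attributes satisfied
    return autoencoder.generate(z)                             # autoregressive decoding (not shown)
```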

4 Experiments

4.1 Dataset

We evaluated our approach on the Yelp Review Dataset (YELP) [10], which contains complete reviews along with review sentiment, gender, and restaurant-category information. We conducted multi-attribute style transfer experiments on three attributes: sentiment, gender, and restaurant category. For the restaurant category, we chose three types: Asian, American, and Mexican.

4.2 Baselines

We compare our model with the most relevant state-of-the-art models: 1) StyIns [28] encodes sentences of a given style into vectors that serve as style instances, uses the generative flow technique to construct a latent style representation from them, and then decodes the input sentence together with the style representation to generate the desired text. 2) ControllableAttrTransfer (CAT) [26] edits the sentence latent representation under the guidance of an attribute classifier until it is evaluated as having the target style. 3) MultipleAttrTransfer (MAT) [10] is based on the denoising auto-encoding (DAE) [25] model and a back-translation strategy. 4) MUCOCO [8] performs controlled inference from a pre-trained model and formulates the decoding process as a multi-objective optimization problem, generating the target sentences using Lagrange multipliers and gradient-descent-based techniques.

4.3 Evaluation Metrics

Automatic Evaluation. Following previous works [7, 17, 22, 23], we use the following automatic metrics: 1) Style transfer accuracy. We train an external classifier to measure how accurately the transferred sentences match the required attributes. Specifically, we train a GPT-based [18] classifier for each attribute (sentiment, gender, category) on the training data. 2) Content preservation. We calculate the BLEU [16] score between the transferred sentence and the original input sentence (self-BLEU); higher scores mean more content is retained. 3) Fluency. We calculate the perplexity of transferred sentences with a transformer-based language model trained on the training data (the lower the better).
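For concreteness, a minimal sketch of the content-preservation and fluency metrics is given below; it uses NLTK's sentence-level BLEU and an off-the-shelf GPT-2 in place of the paper's in-domain transformer language model, which is an assumption.

```python
import torch
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def self_bleu(source: str, transferred: str) -> float:
    """BLEU between the transferred sentence and its source (content preservation)."""
    smooth = SmoothingFunction().method1
    return sentence_bleu([source.split()], transferred.split(),
                         smoothing_function=smooth)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def perplexity(sentence: str) -> float:
    """Perplexity of a sentence under a causal language model (fluency; lower is better)."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    loss = lm(ids, labels=ids).loss          # mean token-level cross-entropy
    return torch.exp(loss).item()
```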

Human Evaluation. We further conduct a human evaluation of the transfer results. Following previous works [6, 12, 20], evaluators are asked to rate sentences according to the three criteria described above, each on a 5-point Likert scale. In particular, for style transfer strength, a score of five is given when a sentence satisfies all target attributes and makes sense, with proportional reductions for missing attributes or insufficiently reasonable output.

4.4 Main Results

Gradient Conflict Detection. Here, we verify our claim that editing z along the joint gradient direction may conflict with the gradient directions of some attributes. For the well-trained model, we randomly select 200 examples from the test set, perform multi-attribute TST, and detect conflicts between the editing gradient direction and the single-attribute gradient directions during the transfer process. The results show that a gradient direction conflict occurred in 100% of the 200 texts. Notably, not every conflict leads to attribute-incomplete generated text, but conflicts increase the likelihood of such outcomes (we verify this in Sect. 4.5).

Table 2. Automatic and human evaluation results for multi-attribute transfer tasks on YELP. Note that, since this is a multi-attribute task, accuracy does not refer to the correct rate for a single attribute but to overall accuracy, i.e., the proportion of generated sentences that satisfy all target attributes.

Compare with Baselines. In Table 2, we present the automatic and human evaluation results of our model and the baseline models. The results indicate that our model outperforms the baselines in terms of style transfer accuracy, achieving the highest score with a significant improvement over the baseline models (t-test, \(p<0.05\)).

Notably, the automatic accuracy of all models is relatively low, as it requires all target attributes to be satisfied in a single transferred sentence. In reality, the accuracy of satisfying just one of the attributes would be much higher, as demonstrated in detail in the ablation study below. Furthermore, the BLEU and PPL scores are within a normal range. Our model achieves the best result on PPL and the second-best score on BLEU, which could be because slightly more of the original text was modified to satisfy the additional attributes.

In the human evaluation, we selected ten sentences from each model for each multi-attribute task and asked five evaluators to rate each comparison sample. In total, we evaluated 70 sentences for each model, covering the transfer of each attribute to the other (e.g., positive \(\rightarrow \) negative combined with any gender and category transfer, negative \(\rightarrow \) positive combined with any gender and category transfer, ...). Our model proved to be the clear leader, surpassing all others in both accuracy scores and average scores. Furthermore, we observed that the accuracy from human ratings significantly exceeded that of the automatic evaluation. This can be attributed to the fact that human ratings give credit to sentences that satisfy one or two target attributes, whereas the automatic evaluation only counts generations that fulfill all attributes when calculating accuracy. This fills a gap in the automatic evaluation, since transfer results that satisfy two attributes should be considered superior to results that satisfy only one attribute or none.

Moreover, to observe the characteristics of each model under the multi-attribute task more intuitively, we randomly sampled a set of output sentences and showed them in Table 4.

Table 3. Ablation study results on the YELP dataset, comparing the performance of our model without (w_o) and with (w_) the gradient conflict adjustment strategy. The accuracy for each target attribute, as well as the overall accuracy, is provided for both settings. 'Overall accuracy' refers to the percentage of generated text that satisfies all of the required target attributes simultaneously. Best results are in bold.
Table 4. Case study of text generated by all models. Blue words indicate text in the output that contains the target sentiment attribute, red words indicate the target gender attribute, and green words indicate the target category attribute.

4.5 Ablation Study

To further validate the reliability of our approach, we conduct an extensive analysis of the key components of our model in this section. Table 3 compares the performance of our model with and without the gradient conflict adjustment strategy on the YELP dataset. After applying the gradient programming strategy, the overall accuracy improves by 2.3 percentage points, a statistically significant gain (t-test, \(p<0.05\)), and for every single attribute the accuracy is equal to or greater than before. In particular, the category accuracy improves significantly (t-test, \(p<0.05\)): it is itself a multi-class attribute with three sub-categories, so our method is also effective within it. This finding underscores the effectiveness of our method in improving style transfer accuracy for transferred text in multi-attribute scenarios. Sentiment and gender are binary attributes involving only a transfer from one class to the other, so there is no issue of multiple directions and hence no improvement from our method. Moreover, the full model also achieves a better PPL score. This result confirms that our method improves attribute accuracy without sacrificing sentence fluency and, in some cases, even leads to better outcomes.

In addition, we can see that even though conflicts are detected in almost every transfer process, 38% of the transferred sentences still satisfy all attributes. After conflict resolution, the overall accuracy improves. This confirms that conflicts in gradient directions indeed affect the satisfaction of different attributes; not every conflict leads to attribute-incomplete generated text, but it increases the likelihood of such occurrences.

5 Conclusions

In this study, we presented a novel mathematical programming approach for coordinating and controlling multi-attribute style transfer, which we evaluated on the YELP dataset. Our experiments demonstrated that this method can effectively enhance the accuracy of multi-attribute transfer while maintaining the accuracy of each individual attribute. Furthermore, this method allows pre-trained auto-encoders to efficiently transfer language attributes, eliminating the need for additional tuning and enabling faster and more scalable learning. Moving forward, we plan to extend our approach to cross-lingual style transfer tasks and explore ways to optimize the algorithm's time efficiency.