1 Introduction

The goal of transactional memory is to simplify multi-core concurrent programming [1]. A programmer demarcates a section of source code as a transaction, while the transactional memory system provides run-time support for atomicity, concurrency, and isolation.

Transactional memory achieves data consistency for concurrently executing transactions by providing conflict resolution. A conflict arises when two or more transactions access a shared data item and at least one of them modifies it [2]. Version management defines whether modifications are performed directly on the shared data (eager version management) or are buffered as speculative writes (lazy version management). Speculative writes become visible in shared data at commit time, after conflict resolution. The conflict detection strategy, depending on detection time, may also be eager or lazy. Eager conflict detection tries to prevent possible conflicts by introducing synchronization mechanisms that stall the offending transactions. Lazy conflict detection reacts to the consequences of conflicts afterwards by aborting and restarting the offending transactions.

An implementation of transactional memory may be in software, hardware, or both. Hardware implementations provide low overhead by keeping transactions’ internal states in cache memory [3–5], but suffer from possible resource limitations [6]. Software implementations offer the flexibility to run on existing hardware and use main memory as an effectively unlimited resource for transactions’ internal states, but at reduced performance [7, 8]. Hybrid implementations attempt to find a compromise by bringing together low overhead and unlimited resources [9, 10].

Transactional memory Coherence and Consistency (TCC) is an example of a hardware implementation of transactional memory with lazy version management and lazy conflict detection [11]. TCC executes transactions on multiple cores with separate level-one (L1) caches that use the level-two (L2) cache as shared memory. During execution, TCC buffers each transaction’s writes locally as speculative writes. At commit time, TCC commits the transaction’s speculative writes to shared memory atomically and simultaneously broadcasts them to the other transactions in the system. The other transactions snoop the broadcasts to maintain cache coherence, while TCC aborts and restarts all transactions that accessed data modified by the committing transaction.

A transaction aborts by discarding all speculative writes and restarts by switching to the context corresponding to the transaction start. The main disadvantage of aborting and restarting a transaction is that all progress made before the restart is wasted. This paper presents an optimization that attempts to diminish the effects of a restart by reducing the time necessary for re-executing the restarted transaction. Re-execution time can be reduced by exploiting the progress made before the restart occurred: instead of switching to the start, the transaction switches to its last valid state. The last valid state corresponds to the context of the transaction just before the first access to the shared data that caused the restart. In order to reduce the memory requirements for retaining all necessary contexts, the proposed optimization saves the contexts incrementally.

In addition to proposing the restart optimization, this paper introduces analytical models for transactional memory with lazy conflict detection. The models cover transactional memories both with and without the restart optimization. The analytical models study system performance independently of the time between consecutive memory accesses inside a transaction by estimating the restart probability, execution time, relative performance gain, and spatial complexity. The model shows that introducing the restart optimization may bring the expected relative gain up to 0.33 for transactions that have been restarted at least once.

This paper has the following structure. Section 2 gives an overview of related work in hardware transactional memory from the perspective of this paper. Section 3 describes the principles of the proposed optimization, and explains the usage of additional hardware resources and the algorithms behind these principles with the TCC example. Section 4 presents the proposed analytical model for restart probability, execution time, relative performance gain, and spatial complexity of TCC with restart optimization. A critical evaluation comparing the analytical models with simulation results is discussed in Sect. 5, and Sect. 6 reviews the conclusions of this research.

2 Related Work

Restart optimization for lazy version management can be achieved through various activities in the execute, commit, and abort phases of a transaction. The execute phase encompasses saving the initial context, running the transaction, and, in some cases, saving additional contexts for the purpose of restart optimization. During the commit phase, speculative writes are broadcast and conflicts are detected. Resolution of detected conflicts is conducted in the abort phase, where offended transactions are aborted and restarted by returning to the appropriate context. Most work related to restart optimization suggests either saving intermediate contexts during the execute phase of a transaction or optimizing the commit/abort phases, as shown in Table 1, which gives an overview of the solution space for improving lazy conflict detection in the analyzed hardware transactional memories.

Table 1 Overview of solution space for improving lazy version management

The research presented in [12] suggests saving the entire context of a transaction during the execute phase each time the transaction accesses shared data for which a conflict is predicted. The prediction is created according to the history of previous executions of the transaction. The approach presented in this paper eliminates the need for predicting conflicting accesses by saving the contexts before any first access to shared data. The new problem that may arise is the number of contexts that need to be saved. In order to reduce the memory requirements for retaining all necessary contexts, the approach saves the contexts incrementally.

Most other related work is complementary to the approach presented in this paper and attempts either to remove conflicts when possible [13] or to optimize the commit/abort phases of a transaction to improve the overall execution time of a set of transactions. A possible optimization of the commit/abort phases is reducing the conflict detection time through memory-block signatures, as in Bulk [14], dealing with asymmetry in transactional data sets [15], or using early conflict detection combined with lazy conflict resolution [16]. The introduction of a new transactional memory coherence protocol, FasTM, which uses the L1 cache to save state before a transaction starts, may also optimize the abort phase [17]. In the case of many-core chip multiprocessors, the commit phase can be optimized by introducing a new cache coherence protocol (DiCo-CMP) aimed at reducing core-to-core traffic [18].

In addition to extending previous research regarding restart optimization, this paper introduces analytical models for transactional memory with lazy conflict detection. Other analytical frameworks for performance modeling of software transactional memory [19–24] are based on Markov chain models. The model presented in this paper instead observes transaction execution as a continuous process represented with a continuous-time model.

3 Restart Optimization

The main principle of the proposed optimization is to avoid re-executing the section of a transaction that is not affected by a restart. The restarted transaction continues execution from the place where the first access to the shared data that caused the restart occurred. Figure 1 gives an example of two transactions labeled T1 and T2 (Fig. 1a), and their schedules on the TCC (Fig. 1b) and TCC with restart optimization (RO-TCC) implementations (Fig. 1c). Transaction T1 uses data A, B, C, and D, while transaction T2 uses only C. The example shows that after the commit of transaction T2, transaction T1 aborts and restarts. After the restart, transaction T1 starts from the beginning in the case of the TCC implementation. In the case of RO-TCC, transaction T1 does not re-execute the section not affected by the committed data C, and starts from the moment when the data C was first accessed. The execution time of a transaction that has no restarts is the same on both the TCC and RO-TCC implementations.

Enabling the proposed restart optimization requires certain modifications within TCC. The modifications encompass changes in the behavior of the existing TCC hardware and additional hardware resources. The essence of the algorithm behind the additional hardware resources is that the processor state has to be saved before the first access to each shared cache line. If a conflict on a shared cache line causes a restart, then a valid state of the processor needs to be restored. The valid state is defined as the state immediately before the first access to the cache line that caused the restart. The state consists of a core’s context and the content of the corresponding L1 cache. The core’s context is saved each time in its entirety, while the content of the L1 cache is saved incrementally by creating copies of the appropriate cache lines. A copy of a cache line is created when it is accessed for a write operation, but only if some other cache line has been fetched between the current access and the previous access to the line. Such situations can be detected using a counter.

To support restart optimization, the TCC hardware needs additional resources for storing multiple valid states during the entire lifetime of a transaction, as shown in Fig. 2. After each new cache line fetch, a core’s context is added to the Context Buffer. Incremental storing of the content of the L1 cache can be implemented using the Version Cache, which keeps track of speculatively modified cache lines. Moreover, identifying when a new version needs to be created can be done with a counter and an additional line Id field in the tag of each cache line.

Fig. 1

a An example of two transactions, and their schedules on b TCC and c RO-TCC implementations

Fig. 2

The data cache organization for transactional memory with restart optimization

The algorithms describing the behavior of the RO-TCC hardware when performing read, write, and restart operations during a transaction are shown in Fig. 3. A read operation that hits in L1 requires only reading the data from the appropriate L1 cache line, as shown in Fig. 3a. On an L1 miss, the Version Cache is checked. The Version Cache may contain multiple versions of the same cache line, so on a Version Cache hit the most recent version (the one with the highest Id) of the requested line is copied to the L1 cache and then accessed. On a Version Cache miss, the counter i is incremented and the requested line is fetched from memory and stored in the L1 cache with its line Id tag field set to the value of counter i. At the same time, while the data is fetched from memory, the core’s context is saved in entry i of the Context Buffer.

A write operation that hits in L1, as shown in Fig. 3a, checks whether the Id tag field of the requested line is equal to or smaller than (it cannot be higher than) the value of counter i. If equal (meaning the requested line is the last fetched line, or that it has been modified since the last line was fetched), the cache line is updated without modifying the Id tag field. If smaller (meaning the requested line is neither the last fetched line nor has it been modified since the last line was fetched), the cache line is copied to the Version Cache, its Id tag field is set to the value of counter i, and the line is updated. On an L1 miss, the Version Cache is checked. On a Version Cache hit, the most recent version (the one with the highest Id) of the requested line is copied to the L1 cache and then the same steps are performed as on an L1 hit. On a Version Cache miss, the same steps are performed as for a read-operation Version Cache miss, with the addition that the fetched line is updated.

The restart operation first determines the value of the Id tag field of the oldest line containing the offending address that caused the restart, as shown in Fig. 3b. The determined Id corresponds to the core’s context created when the first access to the offending address occurred, and therefore that context needs to be restored. Moreover, all cache lines, in both L1 and the Version Cache, with an Id tag field greater than or equal to the determined Id (i.e., created after the first access to the offending address) have to be invalidated. Finally, the counter i is set to the determined Id.
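
For concreteness, the read, write, and restart flows above can also be modeled in software. The following is a minimal Python sketch (an illustration at cache-line granularity with simplified contexts and a stubbed memory, not the authors’ hardware design) of the structures in Fig. 2 and the algorithms in Fig. 3:

```python
class ROTCCCore:
    """Software model of one RO-TCC core's transactional cache state."""

    def __init__(self):
        self.i = 0                # fetch counter
        self.l1 = {}              # addr -> (line_id, data)
        self.version_cache = {}   # addr -> list of (line_id, data), oldest first
        self.context_buffer = {}  # line_id -> saved core context

    def _load_from_memory(self, addr):
        return 0                  # placeholder for a real memory read

    def _fetch(self, addr, context):
        # New line fetch: bump the counter, tag the line, save the context.
        self.i += 1
        self.context_buffer[self.i] = context
        self.l1[addr] = (self.i, self._load_from_memory(addr))

    def read(self, addr, context):
        if addr in self.l1:                               # L1 hit
            return self.l1[addr][1]
        if addr in self.version_cache:                    # Version Cache hit:
            self.l1[addr] = self.version_cache[addr][-1]  # most recent version
            return self.l1[addr][1]
        self._fetch(addr, context)                        # both miss
        return self.l1[addr][1]

    def write(self, addr, value, context):
        if addr not in self.l1:
            if addr in self.version_cache:
                self.l1[addr] = self.version_cache[addr][-1]
            else:
                self._fetch(addr, context)
        line_id, data = self.l1[addr]
        if line_id < self.i:
            # Another line was fetched since this line's previous access:
            # snapshot the old version, then retag with the current counter.
            self.version_cache.setdefault(addr, []).append((line_id, data))
            line_id = self.i
        self.l1[addr] = (line_id, value)

    def restart(self, offending_addr):
        # Oldest Id associated with the offending address (assumed present).
        versions = self.version_cache.get(offending_addr, [])
        rid = versions[0][0] if versions else self.l1[offending_addr][0]
        # Invalidate every line created at or after that first access.
        self.l1 = {a: v for a, v in self.l1.items() if v[0] < rid}
        for a in list(self.version_cache):
            kept = [v for v in self.version_cache[a] if v[0] < rid]
            if kept:
                self.version_cache[a] = kept
            else:
                del self.version_cache[a]
        self.i = rid
        return self.context_buffer[rid]   # context to restore
```

Restoring the context returned by `restart` and resuming execution from it reproduces the last valid state described above: later reads find the surviving old versions through the Version Cache.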

Fig. 3

Algorithms for performing a read, write, and b restart operations on the RO-TCC implementation

4 Analytical Model

Instead of explicitly modeling the concurrent execution of many transactions, the analytical model presented in this paper characterizes the representative behavior of a single transaction and the influence that the remaining concurrently executing transactions have on it. The impact of the other transactions is captured by computing the appropriate parameters. This approach assumes that all transactions have similar probabilistic behavior.

The model relies on the following three assumptions:

  1.

    Restart probability does not depend on how far a transaction has progressed, or whether the transaction has already been restarted.

  2.

    Moments at which a transaction accesses memory locations for the first time are uniformly distributed within the transaction.

  3.

    Moments at which a transaction accesses memory are uniformly distributed within the transaction (both first and subsequent access).

The model is defined with nine input parameters:

  • The average transaction execution time L, in the absence of restarts.

  • The average time between transactions V.

  • Access-set size K (data accessed whether for read, write, or both).

  • Write-set size \(K_{w}\).

  • Read-set size \(K_{r}\).

  • The number of successive read and write operations B (possibly repeated on the same data) that a transaction needs to complete successfully before it commits.

  • Working-set size U, representing the total number of available transactional data.

  • Probability that an access is a write operation \(P_{w}\), or a read operation \(P_{r }= 1-P_{w}\).

  • The number of cores N.

4.1 Transaction Restart Probability Model

The execution time of a set of transactions depends on how often they restart. The average number of times a transaction retries before it successfully commits could be modeled with the appropriate restart probability R. This section develops the analytical model for calculating R using the input parameters and given assumptions.

Let us consider a transaction A that accesses n different memory locations during an interval x. The interval x ranges from 0 to L and represents the time elapsed from the start of transaction A, assuming that no restarts occurred. If at the moment x some other transaction B with a write-set \(K_{w}\) commits, a conflict will not occur if all n memory locations fall into the set \(U-K_{w}\). The probability \(\hbox {f}_{{\mathrm{nr}}}(n)\) that transaction A will not detect a conflict is the number of ways that the n locations can be chosen from the set \(U-K_{w}\), divided by the number of ways that they can be chosen from the entire set U:

$$\begin{aligned} \hbox {f}_{{\mathrm{nr}}} (\hbox {n})=\frac{\left( {{\begin{array}{c} {\hbox {U}-\hbox {K}_{\mathrm{w}} } \\ \hbox {n} \\ \end{array} }} \right) }{\left( {{\begin{array}{c} \hbox {U} \\ \hbox {n} \\ \end{array} }} \right) }=\frac{(\hbox {U}-\hbox {n})!\cdot (\hbox {U}-\hbox {K}_{\mathrm{w}} )!}{\hbox {U}!\cdot (\hbox {U}-\hbox {n}-\hbox {K}_{\mathrm{w}} )!} \end{aligned}$$
(1)
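
As a quick sanity check of Eq. (1), the binomial ratio and its factorial expansion can be compared exactly with rational arithmetic; the values of U and \(K_{w}\) below are assumed for illustration, not taken from the paper:

```python
from fractions import Fraction
from math import comb, factorial

def f_nr_binom(n, U, Kw):
    # Ways to pick the n locations outside the write set, over the
    # ways to pick them from the whole working set.
    return Fraction(comb(U - Kw, n), comb(U, n))

def f_nr_fact(n, U, Kw):
    # The expanded factorial form on the right-hand side of Eq. (1).
    return Fraction(factorial(U - n) * factorial(U - Kw),
                    factorial(U) * factorial(U - n - Kw))

U, Kw = 64, 4   # assumed example sizes
assert all(f_nr_binom(n, U, Kw) == f_nr_fact(n, U, Kw) for n in range(U - Kw + 1))
```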

The probability that transaction A accessed exactly n different memory locations during the interval x can be modeled as a Poisson random variable P(n) with the distribution:

$$\begin{aligned} \hbox {P}(\hbox {n})=\hbox {e}^{-\overline{{{\mathrm{n}}}}}\cdot \frac{(\overline{{{\mathrm{n}}}})^{{\mathrm{n}}}}{\hbox {n}!} \end{aligned}$$

Using assumption 2, the average number of different memory locations accessed during interval x is:

$$\begin{aligned} \overline{{{\mathrm{n}}}}=\mathrm{K}\cdot \frac{\hbox {x}}{\hbox {L}} \end{aligned}$$

Let \(\hbox {f}_{{\mathrm{nr}}}(x)\) be the probability that transaction A will not restart at a moment x at which transaction B commits. The probability \(\hbox {f}_{{\mathrm{nr}}}(x)\) can be expressed using the probability that transaction A has accessed exactly n different memory locations \(\hbox {P}(n)\) and the probability that transaction A will not restart if it has accessed n memory locations \(\hbox {f}_{{\mathrm{nr}}}(n)\). Taking into consideration all possible values for n, the expression for \(\hbox {f}_{{\mathrm{nr}}}(x)\) is:

$$\begin{aligned} \hbox {f}_{{\mathrm{nr}}} (\mathrm{x})=\sum \limits _{{\mathrm{n}}=0}^{+\infty } {\hbox {P}(\hbox {n})\cdot \hbox {f}_{{\mathrm{nr}}} (\hbox {n})} =\sum \limits _{\mathrm{n}=0}^{+\infty } {\hbox {e}^{-\overline{{{\mathrm{n}}}}}\cdot \frac{(\overline{{\mathrm{n}}})^{{\mathrm{n}}}}{{\mathrm{n}}!}\cdot } \frac{(\hbox {U}-\hbox {n})!\cdot (\hbox {U}-\hbox {K}_{\mathrm{w}} )!}{\hbox {U}!\cdot (\hbox {U}-\hbox {n}-\hbox {K}_{\mathrm{w}} )!} \end{aligned}$$
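
The infinite sum above is readily evaluated numerically; a sketch, with parameter values assumed purely for illustration (the binomial form of \(\hbox {f}_{{\mathrm{nr}}}(n)\) is 0 once \(n > U-K_{w}\), so truncating the tail is safe):

```python
from math import comb, exp

def f_nr_x(x, L, K, U, Kw, n_max=200):
    # Truncated Poisson sum for f_nr(x).
    n_bar = K * x / L                    # assumption 2: n̄ = K · x / L
    total, term = 0.0, exp(-n_bar)       # term = e^(-n̄) · n̄^n / n!
    for n in range(n_max + 1):
        if n <= U - Kw:
            total += term * comb(U - Kw, n) / comb(U, n)
        term *= n_bar / (n + 1)
    return total

# Assumed illustrative values: the no-restart probability starts at 1
# and decays as the interval x grows.
L, K, U, Kw = 1.0, 16, 1024, 8
assert abs(f_nr_x(0.0, L, K, U, Kw) - 1.0) < 1e-12
assert f_nr_x(1.0, L, K, U, Kw) < f_nr_x(0.5, L, K, U, Kw) < 1.0
```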

Let us consider the event in which transaction A executes during interval x while s other independent transactions commit at the moments \(x_{1}\), ..., \(x_{\mathrm{s}}\). The conditional probability that transaction A will not restart until moment x, given that the commits occur at the moments \(x_{1}\), ..., \(x_{\mathrm{s}}\), is:

$$\begin{aligned} \hbox {P}_{{\mathrm{nrs}}} (\hbox {x}_1 ,\hbox {x}_2 ,\ldots ,\hbox {x}_\mathrm{{\mathrm{s}}} |\hbox {x})=\hbox {f}_{{\mathrm{nr}}} (\hbox {x}_1 )\cdot \hbox {f}_{{\mathrm{nr}}} (\hbox {x}_2 )\cdot \ldots \cdot \hbox {f}_{{\mathrm{nr}}} (\hbox {x}_{\mathrm{s}} ) \end{aligned}$$

Taking the expectation over the commit moments, the probability that transaction A will not restart until moment x is:

$$\begin{aligned} \hbox {P}_{{\mathrm{nrs}}} ({\mathrm{x}})= & {} {\mathrm{E}}({\mathrm{P}}_{{\mathrm{nrs}}} (\mathrm{x}_1 ,{\mathrm{x}}_2 ,\ldots ,{\mathrm{x}}_{\mathrm{s}} |{\mathrm{x}}))\\= & {} \int \limits _{-\infty }^{+\infty } {\ldots \int \limits _{-\infty }^{+\infty } {{\mathrm{g}}({\mathrm{x}}_1 ,\mathrm{x}_2 ,\ldots ,{\mathrm{x}}_s |\mathrm{x})\cdot \mathrm{P}_{\mathrm{nrs}} (\mathrm{x}_1 ,\mathrm{x}_2 ,\ldots ,\mathrm{x}_{\mathrm{s}} |{\mathrm{x}}){\mathrm{dx}}_1 {\mathrm{dx}}_2 \ldots {\mathrm{dx}}_{\mathrm{s}} } } \end{aligned}$$

where \(\hbox {g}(x_{1}\), ..., \(x_{{\mathrm{s}}} {\vert } x)\) is the distribution density function of moments at which other s transactions commit. Following assumption 1 and assuming s independent transactions it can be stated that:

$$\begin{aligned}&{\mathrm{g}}({\mathrm{x}}_1 ,{\mathrm{x}}_2 ,\ldots ,{\mathrm{x}}_{\mathrm{s}} |{\mathrm{x}})= {\mathrm{g}}({\mathrm{x}}_1 |\mathrm{x})\cdot {\mathrm{g}}({\mathrm{x}}_2 |{\mathrm{x}})\cdot \ldots \cdot {\mathrm{g}}({\mathrm{x}}_{\mathrm{s}} |{\mathrm{x}}) \\&{\mathrm{g}}({\mathrm{x}}_{\mathrm{i}} |{\mathrm{x}})\sim {\mathrm{Unif}}(0,{\mathrm{x}}) \end{aligned}$$

Combining the last three expressions:

$$\begin{aligned} {\mathrm{P}}_{{\mathrm{nrs}}} ({\mathrm{x}})= & {} \left( {\int \limits _0^{\mathrm{x}} {\frac{1}{\mathrm{x}}\cdot {\mathrm{f}}_{{\mathrm{nr}}} ({\mathrm{x}}_1 ){\mathrm{dx}}_1 } } \right) \cdot \ldots \cdot \left( {\int \limits _0^{\mathrm{x}} {\frac{1}{{\mathrm{x}}}\cdot \mathrm{f}_{{\mathrm{nr}}} ({\mathrm{x}}_{\mathrm{s}} ){\mathrm{dx}}_{\mathrm{s}} } } \right) \\= & {} \left( {\frac{1}{{\mathrm{x}}}\cdot \int \limits _0^{\mathrm{x}} {{\mathrm{f}}_{{\mathrm{nr}}} (\mathrm{x}_1 ){\mathrm{dx}}_1 } } \right) ^{{\mathrm{s}}}=\left( {\frac{{\mathrm{F}}_{\mathrm{nr}} ({\mathrm{x}})}{{\mathrm{x}}}} \right) ^{{\mathrm{s}}} \end{aligned}$$

where \(\hbox {F}_{\mathrm{nr}}(x)\) denotes the integral of \(\hbox {f}_{{\mathrm{nr}}}\) from 0 to x, so that \(\hbox {F}_{\mathrm{nr}}(x)/x\) is the average probability that transaction A does not restart at a single commit occurring within the interval.

The probability that exactly s other transactions commit during interval x can be modeled as a Poisson random variable \(\hbox {P}(s)\) with the distribution:

$$\begin{aligned} {\mathrm{P}}({\mathrm{s}})={\mathrm{e}}^{-\overline{{{\mathrm{s}}}}}\cdot \frac{(\overline{{{\mathrm{s}}}})^{{\mathrm{s}}}}{{\mathrm{s}}!} \end{aligned}$$

where \({\overline{{{\mathrm{s}}}}}\) is the average number of committed transactions during interval x. The value of the parameter \({\overline{{{\mathrm{s}}}}}\) depends on the number of cores N, the average transaction execution time \(\hbox {E}(T)\), and the average time between transactions V. During the time interval \(\hbox {E}(T)+V\), all N concurrently executed transactions commit. Among the N concurrent transactions there are, besides transaction A, \(N-1\) other concurrently executing transactions. The expression for the average number of committed transactions during interval x is:

$$\begin{aligned} \overline{{{\mathrm{s}}}}=(\mathrm{N}-1)\cdot \frac{{\mathrm{x}}}{{\mathrm{E}}({\mathrm{T}})+\mathrm{V}}=\frac{({\mathrm{N}}-1)}{\frac{{\mathrm{E}}({\mathrm{T}})}{{\mathrm{L}}}+\frac{\mathrm{V}}{{\mathrm{L}}}}\cdot \frac{{\mathrm{x}}}{{\mathrm{L}}} \end{aligned}$$

Let \(\hbox {P}_{{\mathrm{nr}}}(x)\) be the probability that transaction A will not restart until the moment x. The probability \(\hbox {P}_{{\mathrm{nr}}}(x)\) can be expressed using the probability \(\hbox {P}(s)\) that exactly s other transactions commit during interval x and the conditional probability \(\hbox {P}_{{\mathrm{nrs}}}(x)\) that transaction A will not restart during interval x given those s commits. Taking into consideration all possible values for s, the expression for \(\hbox {P}_{{\mathrm{nr}}}(x)\) is:

$$\begin{aligned} {\mathrm{P}}_{{\mathrm{nr}}} ({\mathrm{x}})= & {} \sum \limits _{{\mathrm{s}}=0}^{+\infty } {\mathrm{P}({\mathrm{s}})\cdot {\mathrm{P}}_{{\mathrm{nrs}}} ({\mathrm{x}})} =\sum \limits _{\mathrm{s}=0}^{+\infty } {{\mathrm{e}}^{-\overline{{{\mathrm{s}}}}}\cdot \frac{(\overline{{\mathrm{s}}})^{{\mathrm{s}}}}{{\mathrm{s}}!}\cdot } \left( {\frac{{\mathrm{F}}_{{\mathrm{nr}}} ({\mathrm{x}})}{{\mathrm{x}}}} \right) ^{{\mathrm{s}}}={\mathrm{e}}^{-\overline{{\mathrm{s}}}}\cdot \sum \limits _{{\mathrm{s}}=0}^{+\infty } {\frac{\left( {\frac{\overline{{{\mathrm{s}}}}\cdot {\mathrm{F}}_{{\mathrm{nr}}} ({\mathrm{x}})}{{\mathrm{x}}}} \right) ^{{\mathrm{s}}}}{{\mathrm{s}}!}}\\= & {} {\mathrm{e}}^{-\overline{{{\mathrm{s}}}}}\cdot \mathrm{e}^{\frac{\overline{{{\mathrm{s}}}}\cdot {\mathrm{F}}_{{\mathrm{nr}}} ({\mathrm{x}})}{\mathrm{x}}} \end{aligned}$$

The probability that transaction A will restart, R, is a complement of the event that transaction A will not restart during entire execution time, L, with probability \(\hbox {P}_{\mathrm{nr}}(L)\), which gives:

$$\begin{aligned} 1-{\mathrm{R}}={\mathrm{P}}_{{\mathrm{nr}}} ({\mathrm{L}}) \end{aligned}$$

furthermore:

$$\begin{aligned} {\mathrm{R}}=1-{\mathrm{e}}^{-\frac{({\mathrm{N}}-1)}{\frac{{\mathrm{E}}({\mathrm{T}})}{\mathrm{L}}+\frac{{\mathrm{V}}}{{\mathrm{L}}}}\cdot \left( {1-\frac{{\mathrm{F}}_{{\mathrm{nr}}} ({\mathrm{L}})}{{\mathrm{L}}}} \right) } \end{aligned}$$
(2)
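
Note that Eq. (2) contains \(\hbox {E}(T)\), which Sect. 4.2 in turn expresses through R (Eq. (6) for TCC), so the pair can be solved jointly. A numeric sketch, with assumed illustrative parameter values, using trapezoidal integration for \(\hbox {F}_{\mathrm{nr}}(L)\) and fixed-point iteration for R:

```python
from math import comb, exp

def f_nr_x(x, L, K, U, Kw, n_max=100):
    # Truncated Poisson sum for f_nr(x); the binomial form is 0 once n > U - Kw.
    n_bar = K * x / L
    total, term = 0.0, exp(-n_bar)      # term = e^(-n̄) · n̄^n / n!
    for n in range(n_max + 1):
        if n <= U - Kw:
            total += term * comb(U - Kw, n) / comb(U, n)
        term *= n_bar / (n + 1)
    return total

def restart_probability(L, V, K, U, Kw, N, iters=50, steps=200):
    # F_nr(L) by trapezoidal integration of f_nr over [0, L].
    h = L / steps
    F = h * (0.5 * f_nr_x(0, L, K, U, Kw) + 0.5 * f_nr_x(L, L, K, U, Kw)
             + sum(f_nr_x(k * h, L, K, U, Kw) for k in range(1, steps)))
    R = 0.0
    for _ in range(iters):              # fixed point between R and E(T)
        ET = (2 - R) / (2 * (1 - R)) * L                        # Eq. (6), TCC
        R = 1 - exp(-(N - 1) / (ET / L + V / L) * (1 - F / L))  # Eq. (2)
    return R

# Assumed parameters: 8 cores, moderate contention.
R = restart_probability(L=1.0, V=0.5, K=16, U=1024, Kw=8, N=8)
```

The iteration is damped (a larger R raises \(\hbox {E}(T)\), which lowers the commit rate seen by transaction A), so it converges quickly for these values.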

4.2 Transaction Execution Time Model for TCC

This section develops the analytical model for calculating the expected transaction execution time, \(\hbox {E}(T)\), for TCC implementation using the input parameters and the given assumptions. The expected execution time of a transaction can be calculated using:

$$\begin{aligned} {\mathrm{E}}({\mathrm{T}})=\sum \limits _{{\mathrm{i}}=0}^{+\infty } {{\mathrm{p}}_{\mathrm{i}} \cdot \hat{{\mathrm{T}}}_{\mathrm{i}} } \end{aligned}$$
(3)

where \(p_{{\mathrm{i}}}\) is a probability that the transaction restarts exactly i times, and \(\hat{{\mathrm{T}}}_{\mathrm{i}}\) is the expected execution time of the transaction, if it restarts exactly i times.

Using assumption 1, the probability \(p_{{\mathrm{i}}}\) that the transaction restarts exactly i times can be modeled with the geometric distribution with a parameter \(1-R\). The parameter \(1-R\) is the success probability in a sequence of Bernoulli trials and represents the probability that the transaction finishes without further restarts:

$$\begin{aligned}&{\mathrm{p}}_0 =(1-{\mathrm{R}}) \nonumber \\&{\mathrm{p}}_1 ={\mathrm{R}}\cdot (1-{\mathrm{R}})\nonumber \\&\ldots \nonumber \\&{\mathrm{p}}_{\mathrm{i}} ={\mathrm{R}}^{{\mathrm{i}}}\cdot (1-{\mathrm{R}}) \end{aligned}$$
(4)

The expected execution time \(T_{0}\) of a transaction that does not restart corresponds to the input parameter L:

$$\begin{aligned} {\mathrm{T}}_0 ={\mathrm{L}} \end{aligned}$$

In the case of a transaction restart, the execution time includes the additional time spent during unsuccessful trials. When a transaction restarts exactly once, the moment \(x_{1}\) at which the restart occurred corresponds to a random moment within the transaction and lies in the interval from 0 to L. The expected execution time \(T_{1}\) includes the time L needed for the successful completion of the transaction in the absence of restarts and the time interval \(x_{1}\) spent during the unsuccessful trial:

$$\begin{aligned} {\mathrm{T}}_1 ={\mathrm{x}}_1 +{\mathrm{L}} \end{aligned}$$

For a transaction that restarts exactly i times, the expected execution time \(T_{i}\) includes the time L needed for the successful completion of the transaction in the absence of restarts and all i time intervals spent during unsuccessful trials:

$$\begin{aligned} {\mathrm{{T}}}_{\mathrm{i}} ={\mathrm{x}}_1 +{\mathrm{x}}_2 +\cdots +{\mathrm{x}}_{\mathrm{i}} +\mathrm{L}={\mathrm{L}}+\sum \limits _{{\mathrm{j}}=1}^{\mathrm{i}} {{\mathrm{x}}_{\mathrm{j}} } \end{aligned}$$

All time intervals \(x_{{\mathrm{j}}}\) correspond to restart moments within the transaction and lie in the interval from 0 to L. The intervals \(x_{{\mathrm{j}}}\) are random variables and, according to assumption 1, are mutually independent with uniform distribution. The expected execution time of the transaction, if it restarts exactly i times, can be calculated as:

$$\begin{aligned}&\hat{{\mathrm{T}}}_{\mathrm{i}} ={\mathrm{E}}({\mathrm{T}}_{\mathrm{i}} )=\mathrm{L}+\sum \limits _{{\mathrm{j}}=1}^{\mathrm{i}} {{\mathrm{E}}({\mathrm{x}}_{\mathrm{j}} )} \nonumber \\&{\mathrm{x}}_{\mathrm{j}} \sim \hbox {Unif}(0,{\mathrm{L}}) \nonumber \\&{\mathrm{E}}({\mathrm{x}}_{\mathrm{j}} )=\int \limits _0^{\mathrm{L}} {\frac{{\mathrm{x}}_{\mathrm{j}} }{{\mathrm{L}}}\cdot {\mathrm{dx}}_{\mathrm{j}} } =\frac{1}{{\mathrm{L}}}\cdot \int \limits _0^{{\mathrm{L}}} {{\mathrm{x}}_{\mathrm{j}} \cdot {\mathrm{dx}}_{\mathrm{j}} } =\frac{{\mathrm{L}}}{2} \nonumber \\&{\mathrm{E}}({\mathrm{T}}_{\mathrm{i}} )={\mathrm{L}}+\sum \limits _{{\mathrm{j}}=1}^{\mathrm{i}} {\frac{{\mathrm{L}}}{2}} ={\mathrm{L}}+{\mathrm{i}}\cdot \frac{{\mathrm{L}}}{2}=\mathrm{L}\cdot \left( {1+\frac{{\mathrm{i}}}{2}} \right) \end{aligned}$$
(5)

Combining Eqs. (3), (4), and (5) the expression for the expected transaction execution time \(\hbox {E}(T)\) is:

$$\begin{aligned} {\mathrm{E}}({\mathrm{T}})= & {} \sum \limits _{{\mathrm{i}}=0}^{+\infty } {{\mathrm{R}}^{{\mathrm{i}}}\cdot (1-{\mathrm{R}})\cdot {\mathrm{L}}\cdot \left( {1+\frac{{\mathrm{i}}}{2}} \right) }\nonumber \\ {{\mathrm{E}}}({\mathrm{T}})= & {} \frac{2-{\mathrm{R}}}{2\cdot (1-{\mathrm{R}})}\cdot {\mathrm{L}} \end{aligned}$$
(6)
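
The closed form of Eq. (6) can be cross-checked by truncating the series obtained from Eqs. (3), (4), and (5); L and R below are arbitrary illustrative values:

```python
# Truncated series of Eqs. (3)-(5) versus the closed form of Eq. (6).
L, R = 1.0, 0.3
series = sum(R**i * (1 - R) * L * (1 + i / 2) for i in range(200))
closed = (2 - R) / (2 * (1 - R)) * L    # Eq. (6)
assert abs(series - closed) < 1e-9      # geometric-series identity holds
```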

4.3 Transaction Execution Time Model for RO-TCC

This section develops the analytical model for calculating the expected transaction execution time, \(\hbox {E}(T)\), for the RO-TCC implementation using the input parameters and given assumptions. In both cases, TCC and RO-TCC, the probability \(p_{{\mathrm{i}}}\) that the transaction restarts exactly i times is the same. In the case of RO-TCC, \(\hbox {E}(T)\) differs in the calculation of the expected execution time \(\hat{{\mathrm{T}}}_{{\mathrm{i}}}\) of the transaction, if it restarts exactly i times.

As in TCC, the expected execution time \(T_{0}\) of a transaction that does not restart corresponds to the input parameter L:

$$\begin{aligned} {\mathrm{T}}_0 ={\mathrm{L}} \end{aligned}$$

In the case of a transaction restart, the execution time includes the additional time spent during the unsuccessful trials. When a transaction restarts exactly once, the location \(x_{1}\) where the restart occurred corresponds to a random moment within the transaction and lies in the interval from 0 to L. Contrary to TCC, where the transaction restarts from the beginning, in the case of RO-TCC the location \(y_{1}\) from which the transaction restarts is a random moment within the time interval of the unsuccessful trial and lies in the interval from 0 to \(x_{1}\). The expected execution time \(T_{1}\) includes the time interval \(x_{1}\) until the restart and the time L needed for the successful completion of the transaction in the absence of restarts, diminished by \(y_{1}\), the point from which the transaction continued execution after the restart:

$$\begin{aligned} {\mathrm{T}}_1 ={\mathrm{x}}_1 +{\mathrm{L}}-{\mathrm{y}}_1 ={\mathrm{x}}_1 -{\mathrm{y}}_1 +{\mathrm{L}} \end{aligned}$$

In the case of a transaction restarting exactly twice, the location \(x_{1}\) where the first restart occurs and the location \(y_{1}\) from which the transaction continues execution are determined in the same way as in the case of the transaction restarting exactly once. After the first restart, the transaction does not re-execute the section of the transaction that comes before \(y_{1}\), and therefore the location \(x_{2}\) where the second restart occurs has to be after \(y_{1}\). After the second restart, the location \(y_{2}\) from which the transaction restarts is a random moment within the interval from 0 to \(x_{2}\).

$$\begin{aligned} {\mathrm{T}}_2 ={\mathrm{x}}_1 +{\mathrm{x}}_2 -{\mathrm{y}}_1 +{\mathrm{L}}-{\mathrm{y}}_2 =\mathrm{x}_1 -{\mathrm{y}}_1 +{\mathrm{x}}_2 -{\mathrm{y}}_2 +{\mathrm{L}} \end{aligned}$$

For a transaction that restarts exactly i times, the expected execution time \(T_{\mathrm{i}}\) includes time L needed for the successful completion of the transaction in the absence of restarts and all i time intervals during the unsuccessful trials:

$$\begin{aligned} {\mathrm{T}}_{\mathrm{i}}= & {} {\mathrm{x}}_1 +{\mathrm{x}}_2 +\cdots +{\mathrm{x}}_{\mathrm{i}} +\mathrm{L}-{\mathrm{y}}_1 -{\mathrm{y}}_2 -\cdots -{\mathrm{y}}_{\mathrm{i}} \\ {{\mathrm{T}}}_{{\mathrm{i}}}= & {} {{\mathrm{L}}}+\sum \limits _{{\mathrm{j}}=1}^{\mathrm{i}} {({\mathrm{x}}_{\mathrm{j}} -{\mathrm{y}}_{\mathrm{j}} )} \\ \end{aligned}$$

All time intervals \(x_{{\mathrm{j}}}\) and \(y_{{\mathrm{j}}}\) are mutually dependent random variables, modeled with uniform distributions on the intervals defined below. The expected execution time of the transaction, if it restarts exactly i times, can be calculated as:

$$\begin{aligned}&\hat{{\mathrm{T}}}_{\mathrm{i}} ={\mathrm{E}}({\mathrm{T}}_{\mathrm{i}} )=\mathrm{L}+\sum \limits _{{\mathrm{j}}=1}^{\mathrm{i}} {({\mathrm{E}}({\mathrm{x}}_{\mathrm{j}} )-{\mathrm{E}}({\mathrm{y}}_{\mathrm{j}} ))}\nonumber \\&{\mathrm{x}}_1 \sim \hbox {Unif}(0,{\mathrm{L}}) \nonumber \\&{\mathrm{y}}_1 \sim \hbox {Unif}(0,{\mathrm{x}}_1 ) \nonumber \\&\cdots \nonumber \\&{\mathrm{x}}_{\mathrm{i}} \sim \hbox {Unif}({\mathrm{y}}_{\mathrm{i}-1} ,{\mathrm{L}}) \nonumber \\&{\mathrm{y}}_{\mathrm{i}} \sim \hbox {Unif}(0,{\mathrm{x}}_{\mathrm{i}} ) \end{aligned}$$
(7)
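The chain of uniform distributions in Eq. (7) can be checked with a short Monte Carlo sketch. The code below (Python, with hypothetical helper names) samples the restart locations \(x_j\) and resume locations \(y_j\) exactly as defined above and compares the empirical mean of \(T_i\) with the closed form of Eq. (12):

```python
import random

def simulate_Ti(L=1.0, i=2, trials=200_000):
    """Monte Carlo estimate of E(T_i) for a transaction that restarts
    exactly i times, sampling the chain of uniforms from Eq. (7)."""
    total = 0.0
    for _ in range(trials):
        T, y_prev = L, 0.0
        for _ in range(i):
            x = random.uniform(y_prev, L)  # j-th restart occurs after y_{j-1}
            y = random.uniform(0.0, x)     # execution resumes from y_j
            T += x - y
            y_prev = y
        total += T
    return total / trials

def expected_Ti(L=1.0, i=2):
    """Closed-form expectation, Eq. (12)."""
    return (8/9 + i/3 + 1/(9 * 4**i)) * L
```

For i = 2 and L = 1 both evaluate to approximately 1.5625 (the closed form gives exactly 225/144).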

The mathematical expectations for the random variables \(x_{{\mathrm{i}}}\) and \(y_{{\mathrm{i}}}\) can be calculated as:

$$\begin{aligned} {\mathrm{E}}({\mathrm{x}}_{\mathrm{i}} )= & {} \int \limits _0^{\mathrm{L}} {\frac{1}{{\mathrm{L}}}} \int \limits _0^{{\mathrm{x}}_1 } {\frac{1}{{\mathrm{x}}_1 }} \cdots \int \limits _{{\mathrm{y}}_{{\mathrm{i}}-1} }^{\mathrm{L}} {\frac{{\mathrm{x}}_{\mathrm{i}} }{{\mathrm{L}}-{\mathrm{y}}_{{\mathrm{i}}-1} }{\mathrm{dx}}_1 {\mathrm{dy}}_1 \ldots {\mathrm{dx}}_{\mathrm{i}} } \nonumber \\ {\mathrm{E}}({\mathrm{y}}_{\mathrm{i}} )= & {} \int \limits _0^{\mathrm{L}} {\frac{1}{{\mathrm{L}}}} \int \limits _0^{{\mathrm{x}}_1 } {\frac{1}{{\mathrm{x}}_1 }} \cdots \int \limits _{{\mathrm{y}}_{{\mathrm{i}}-1} }^{\mathrm{L}} {\frac{1}{{\mathrm{L}}-\mathrm{y}_{{\mathrm{i}}-1} }} \int \limits _0^{{\mathrm{x}}_\mathrm{i} } {\frac{{\mathrm{y}}_{\mathrm{i}} }{{\mathrm{x}}_{\mathrm{i}} }{\mathrm{dx}}_1 {\mathrm{dy}}_1 \ldots {\mathrm{dx}}_{\mathrm{i}} {\mathrm{dy}}_{\mathrm{i}} } \end{aligned}$$
(8)

Solving Eq. (8) for variable \(y_{{\mathrm{i}}}\) shows a dependency between the mathematical expectations for \(x_{{\mathrm{i}}}\) and \(y_{{\mathrm{i}}}\):

$$\begin{aligned} {\mathrm{E}}({\mathrm{y}}_{\mathrm{i}} )= & {} \int \limits _0^{\mathrm{L}} {\frac{1}{{\mathrm{L}}}} \int \limits _0^{{\mathrm{x}}_1 } {\frac{1}{{\mathrm{x}}_1 }} \cdots \int \limits _{{\mathrm{y}}_{{\mathrm{i}}-1} }^{\mathrm{L}} {\frac{1}{{\mathrm{L}}-\mathrm{y}_{{\mathrm{i}}-1} }} \left( {\int \limits _0^{{\mathrm{x}}_{\mathrm{i}} } {\frac{{\mathrm{y}}_{\mathrm{i}} }{{\mathrm{x}}_{\mathrm{i}} }{\mathrm{dy}}_{\mathrm{i}} } } \right) {\mathrm{dx}}_1 {\mathrm{dy}}_1 \ldots {\mathrm{dx}}_\mathrm{i} \nonumber \\ {\mathrm{E}}({\mathrm{y}}_{\mathrm{i}} )= & {} \frac{1}{2}\cdot \int \limits _0^{\mathrm{L}} {\frac{1}{{\mathrm{L}}}} \int \limits _0^{{\mathrm{x}}_1 } {\frac{1}{{\mathrm{x}}_1 }} \cdots \int \limits _{{\mathrm{y}}_{{\mathrm{i}}-1} }^{\mathrm{L}} {\frac{{\mathrm{x}}_{\mathrm{i}} }{{\mathrm{L}}-{\mathrm{y}}_{{\mathrm{i}}-1} }} {\mathrm{dx}}_1 {\mathrm{dy}}_1 \ldots {\mathrm{dx}}_{\mathrm{i}} \nonumber \\ {\mathrm{E}}({\mathrm{y}}_{\mathrm{i}} )= & {} \frac{1}{2}\cdot {\mathrm{E}}({\mathrm{x}}_{\mathrm{i}} ) \end{aligned}$$
(9)

Solving Eq. (8) for variable \(x_{{\mathrm{i}}}\) shows a dependency between the mathematical expectations for \(x_{{\mathrm{i}}}\) and \(y_{\mathrm{i}-1}\):

$$\begin{aligned} {E}({x}_{i} )= & {} \int \limits _0^{L} {\frac{1}{{L}}} \int \limits _0^{{x}_1 } {\frac{1}{{x}_1 }} \cdots \int \limits _0^{{x}_{{i}-1} } {\frac{1}{{x}_{{i}-1} }} \left( {\int \limits _{{y}_{{i}-1} }^{L} {\frac{{x}_{i} }{{L}-{y}_{{i}-1} }{dx}_{i} } } \right) {dx}_1 {dy}_1 \ldots {dy}_{{i}-1}\nonumber \\ {E}({x}_{i} )= & {} \int \limits _0^{L} {\frac{1}{{L}}} \int \limits _0^{{x}_1 } {\frac{1}{{x}_1 }} \cdots \int \limits _0^{{x}_{{i}-1} } {\frac{1}{{x}_{{i}-1} }} \left( {\frac{1}{{L}-{y}_{{i}-1} }\cdot \left( {\frac{{L}^{2}}{2}-\frac{{y}^{2}_{{i}-1} }{2}} \right) } \right) {dx}_1 {dy}_1 \ldots {dy}_{{i}-1} \nonumber \\ {E}({x}_{i} )= & {} \frac{1}{2}\cdot {L}\cdot \int \limits _0^{L} {\frac{1}{{L}}} \int \limits _0^{{x}_1 } {\frac{1}{{x}_1 }} \cdots \int \limits _0^{{x}_{i-1} } {\frac{1}{{x}_{{i}-1} }} {dx}_1 {dy}_1 \ldots {dy}_{i-1}\nonumber \\&+\,\frac{1}{2}\cdot \int \limits _0^{L} {\frac{1}{{L}}} \int \limits _0^{{x}_1 } {\frac{1}{{x}_1 }} \cdots \int \limits _0^{{x}_{{i}-1} } {\frac{{y}_{{i}-1} }{{x}_{{i}-1} }} {dx}_1 {dy}_1 \ldots {dy}_{{i}-1} \nonumber \\ {E}({x}_{i} )= & {} \frac{1}{2}\cdot {L}+\frac{1}{2}\cdot {E}({y}_{{i}-1} ) \end{aligned}$$
(10)

Combining Eqs. (9) and (10) results in a difference equation expressing the recurrence relation between the mathematical expectations for \(x_{{\mathrm{i}}}\) and \(x_{{\mathrm{i}}-1}\):

$$\begin{aligned} {\mathrm{E}}({\mathrm{x}}_{\mathrm{i}} )=\frac{1}{2}\cdot {\mathrm{L}}+\frac{1}{4}\cdot {\mathrm{E}}({\mathrm{x}}_{{\mathrm{i}}-1} ) \end{aligned}$$

Solving the difference equation results in an expression for the mathematical expectation of random variable \(x_{{\mathrm{i}}}\):

$$\begin{aligned} {\mathrm{E}}({\mathrm{x}}_{\mathrm{i}} )=\frac{2}{3}\cdot \left( {1-\frac{1}{4^{{\mathrm{i}}}}} \right) \cdot {\mathrm{L}} \end{aligned}$$
(11)
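As a sanity check, the closed form in Eq. (11) can be compared against direct iteration of the difference equation \(E(x_{i})=\frac{1}{2}L+\frac{1}{4}E(x_{i-1})\) with the initial value \(E(x_{1})=L/2\). A minimal sketch with hypothetical function names:

```python
def E_x_iterated(L, i):
    """Iterate the difference equation E(x_i) = L/2 + E(x_{i-1})/4,
    starting from E(x_1) = L/2."""
    e = L / 2
    for _ in range(i - 1):
        e = L / 2 + e / 4
    return e

def E_x_closed(L, i):
    """Closed-form solution, Eq. (11)."""
    return (2 / 3) * (1 - 1 / 4**i) * L
```

Both agree for every i, confirming the solution of the difference equation.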

According to Eq. (7), and using the results from Eqs. (9) and (11), the expected execution time can be calculated as:

$$\begin{aligned} {\mathrm{E}}({\mathrm{T}}_{\mathrm{i}} )=\left( {\frac{8}{9}+\frac{{\mathrm{i}}}{3}+\frac{1}{9\cdot 4^{{\mathrm{i}}}}} \right) \cdot {\mathrm{L}} \end{aligned}$$
(12)
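Substituting Eqs. (9) and (11) into Eq. (7) term by term reproduces Eq. (12). The following sketch (hypothetical function names) verifies the algebra numerically:

```python
def E_Ti_sum(L, i):
    """E(T_i) = L + sum_j (E(x_j) - E(y_j)), with E(y_j) = E(x_j)/2
    from Eq. (9) and E(x_j) from Eq. (11)."""
    return L + sum(0.5 * (2/3) * (1 - 1/4**j) * L for j in range(1, i + 1))

def E_Ti_closed(L, i):
    """Closed form, Eq. (12)."""
    return (8/9 + i/3 + 1/(9 * 4**i)) * L
```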

Combining Eqs. (3), (4), and (12) yields the expression for the expected transaction execution time E(T) on an RO-TCC system:

$$\begin{aligned} {\mathrm{E}}({\mathrm{T}})=\frac{(2-{\mathrm{R}})^{2}}{(1-{\mathrm{R}})\cdot (4-{\mathrm{R}})}\cdot {\mathrm{L}} \end{aligned}$$
(13)
Fig. 4 The expected relative performance gain as a function of the restart probabilities

4.4 Model for Expected Relative Performance Gain

This section combines the results for the expected transaction execution time of TCC and RO-TCC in order to calculate the expected performance gain after the introduction of restart optimization. The calculation considers the execution time \(\hbox {E}(T_{{\mathrm{R}}})\) of those transactions that restart at least once during execution. The expected execution time \(\hbox {E}(T)\) of a transaction is equal to L in the case of successful completion in the absence of restarts, while in the opposite case, when the transaction restarts at least once, it is equal to \(\hbox {E}(T_{{\mathrm{R}}})\). The probability R that the transaction restarts is the complement of the probability \(1-R\) that the transaction does not restart during its entire execution. The relation between \(\hbox {E}(T_{{\mathrm{R}}})\) and \(\hbox {E}(T)\) can therefore be expressed as:

$$\begin{aligned} {\mathrm{E}}({\mathrm{T}})=(1-{\mathrm{R}})\cdot {\mathrm{L}}+{\mathrm{R}}\cdot {\mathrm{E}}({\mathrm{T}}_{\mathrm{R}} ) \end{aligned}$$

which in the case of TCC can be combined with Eq. (6) to give:

$$\begin{aligned} {\mathrm{E}}({\mathrm{T}}_{{\mathrm{R}}-{\mathrm{TCC}}} )=\frac{3-2\cdot {\mathrm{R}}_{\mathrm{TCC}} }{2\cdot (1-{\mathrm{R}}_{{\mathrm{TCC}}} )}\cdot {\mathrm{L}} \end{aligned}$$

or in the case of RO-TCC can be combined with Eq. (13) to give:

$$\begin{aligned} {\mathrm{E}}({\mathrm{T}}_{{\mathrm{R}}-{\mathrm{ROTCC}}} )=\frac{5-5\cdot \mathrm{R}_{{\mathrm{ROTCC}}} +{\mathrm{R}}_{{\mathrm{ROTCC}}}^2 }{(1-{\mathrm{R}}_{{\mathrm{ROTCC}}} )\cdot (4-{\mathrm{R}}_{{\mathrm{ROTCC}}} )}\cdot {\mathrm{L}} \end{aligned}$$
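The last expression can be cross-checked against Eq. (13): substituting \(\hbox {E}(T_{{\mathrm{R}}\hbox {-}{\mathrm{ROTCC}}})\) back into \(\hbox {E}(T)=(1-R)\cdot L+R\cdot \hbox {E}(T_{{\mathrm{R}}})\) must recover the expected RO-TCC execution time. A quick numerical sketch (hypothetical names):

```python
def ET_rotcc(R, L=1.0):
    """Expected transaction execution time on RO-TCC, Eq. (13)."""
    return (2 - R)**2 / ((1 - R) * (4 - R)) * L

def ET_restarted_rotcc(R, L=1.0):
    """Expected execution time of RO-TCC transactions that restart
    at least once (derived above from Eq. (13))."""
    return (5 - 5*R + R**2) / ((1 - R) * (4 - R)) * L
```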

The expected relative performance gain is calculated as the relative difference of the expected execution times of those transactions that restart at least once during the execution for TCC and RO-TCC:

$$\begin{aligned} {\updelta } _{\mathrm{R}} =\frac{{\mathrm{E}}({\mathrm{T}}_{{\mathrm{R}}\hbox {-}{\mathrm{TCC}}} )-{\mathrm{E}}({\mathrm{T}}_{{\mathrm{R}}\hbox {-}{\mathrm{ROTCC}}} )}{{\mathrm{E}}({\mathrm{T}}_{{\mathrm{R}}\hbox {-}{\mathrm{TCC}}} )} \end{aligned}$$

A possible consequence of restart optimization is a difference in restart probabilities between TCC and RO-TCC. The difference comes from the fact that, after a restart in RO-TCC, the access buffer is not empty, which can lead to new restarts occurring sooner, making the restart probability higher. On the other hand, after a restart in RO-TCC a restarted transaction has less work to perform before it commits, leaving less time for new restarts to occur, making the restart probability lower. Based on the analytical models for the expected execution times \(\hbox {E}(T_{{\mathrm{R}\hbox {-}\mathrm{TCC}}})\) and \(\hbox {E}(T_{\mathrm{R}\hbox {-}\mathrm{ROTCC}})\), Fig. 4 shows the expected relative performance gain \({\updelta }_{{\mathrm{R}}}\) for different values of the restart probabilities \(\hbox {R}_{{\mathrm{TCC}}}\) and \(\hbox {R}_{{\mathrm{ROTCC}}}\). Performance degradation could be expected in the case when the restart probability of RO-TCC is substantially higher than the restart probability of TCC.

Assuming that restart probabilities are approximately the same, an approximation that \(\hbox {R}_{{\mathrm{TCC}}}=\hbox {R}_{\mathrm{ROTCC}}=\hbox {R}\) can be made. The approximation would enable easy assessment of the expected gain after the introduction of the proposed restart optimization into existing transactional memory systems with an experimentally determined restart probability. With the approximation, the expected performance gain \({\updelta }_{{\mathrm{R}}}\) is:

$$\begin{aligned} {\updelta } _{\mathrm{R}} =\frac{2-{\mathrm{R}}}{(3-2\cdot {\mathrm{R}})\cdot (4-{\mathrm{R}})} \end{aligned}$$
(14)
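Under the approximation \(\hbox {R}_{{\mathrm{TCC}}}=\hbox {R}_{\mathrm{ROTCC}}=\hbox {R}\), Eq. (14) should agree with the relative difference of the two expressions for \(\hbox {E}(T_{{\mathrm{R}}})\) above, and reach 1/6 at R = 0 and 1/3 at R = 1. A minimal numerical check (hypothetical names):

```python
def delta_R(R):
    """Expected relative performance gain, Eq. (14)."""
    return (2 - R) / ((3 - 2*R) * (4 - R))

def ET_restarted_tcc(R):
    """E(T_R) for TCC, in units of L."""
    return (3 - 2*R) / (2 * (1 - R))

def ET_restarted_rotcc(R):
    """E(T_R) for RO-TCC, in units of L."""
    return (5 - 5*R + R**2) / ((1 - R) * (4 - R))
```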

The results for \({\updelta } _{{\mathrm{R}}}\) demonstrate that, in cases when the restart probability remains the same after the introduction of restart optimization, the expected relative gain for transactions that have restarted at least once is a monotonically increasing function with values ranging from 0.17 to 0.33 as R ranges from 0 to 1. Even though Eq. (14) does not demonstrate a strong sensitivity to small variations in R, Eqs. (6) and (13) imply that for higher values of the restart probability even small variations lead to large deviations in the calculation of the expected transaction execution time and relative gain.

4.5 Spatial Complexity Model

Assessing the effectiveness of restart optimization depends on the complexity of the implementation. The complexity can be viewed from two aspects, temporal and spatial. Temporal complexity is defined as the time needed to store/restore the proper context, while spatial complexity is defined as the space required to store different versions of the core's context and of cache lines. The influence of temporal complexity can be disregarded if the time needed for storing/restoring the core's context into/from the Context Buffer is less than the time needed for fetching a cache line during a miss.

The spatial complexity is determined by the Context Buffer size and the Version Cache size. The Context Buffer size depends on the number of core contexts that need to be stored and is equal to the product of the access-set size K and the size of a core's context. The Version Cache size depends on the number of versions created for speculatively modified cache lines. A new version of a cache line is created when a write to an older cache line occurs. The worst case for the spatial complexity is when each fetch of a new cache line is followed by write accesses to all cache lines already present in the cache, thus requiring a new version for each write. In the average case, it can be expected that write accesses are performed on only some cache lines. The probability \(P_{{\mathrm{wr}}}\) of a write access to a cache line that is already present in the cache can be calculated using the probability \(P_{{\mathrm{w}}}\) that an access is a write operation. The write probability \(P_{{\mathrm{w}}}\) can be expressed as the ratio between the number of write operations \(B_{{\mathrm{w}}}\) and the number B of subsequent read and write operations that a transaction needs to perform in order to finish successfully before it commits:

$$\begin{aligned} {\mathrm{P}}_{\mathrm{w}} =\frac{{\mathrm{B}}_{\mathrm{w}} }{{\mathrm{B}}}=\frac{\mathrm{B}_{{\mathrm{fw}}} +{\mathrm{B}}_{{\mathrm{nfw}}} }{{\mathrm{B}}}=\frac{{\mathrm{B}}_{\mathrm{fw}}^{\prime } +{\mathrm{B}}_{{\mathrm{fw}}}^{{\prime }{\prime }} +{\mathrm{B}}_{\mathrm{nfw}} }{{\mathrm{B}}} \end{aligned}$$

The number of write operations \(B_{{\mathrm{w}}}\) can be expressed as the sum of write miss operations \(B_{{\mathrm{fw}}}\) and write hit operations \(B_{\mathrm{nfw}}\). Write miss operations can be divided into \(B'_{{\mathrm{fw}}}\) operations that write data that will never be read within the current transaction (e.g., result data), and \(B''_{{\mathrm{fw}}}\) operations that write data that may again be read or written, with the probability \(P_{{\mathrm{wr}}}\), within the current transaction (e.g., intermediate data). \(B_{{\mathrm{fw}}}\) operations are divided into \(B'_{{\mathrm{fw}}}\) and \(B''_{{\mathrm{fw}}}\) operations because the read-set \(K_{{\mathrm{r}}}\) and the write-set \(K_{{\mathrm{w}}}\) can overlap (\(0\le K_{{\mathrm{r}}}\le K\); \(0\le K_{{\mathrm{w}}}\le K\); \(K\le K_{{\mathrm{r}}}+K_{{\mathrm{w}}}\)):

$$\begin{aligned} {\mathrm{B}}_{{\mathrm{fw}}}^{\prime }= & {} {\mathrm{K}}-{\mathrm{K}}_{\mathrm{r}} \\ {\mathrm{B}}_{{\mathrm{fw}}}^{{\prime }{\prime }}= & {} {\mathrm{P}}_{{\mathrm{wr}}} \cdot ({\mathrm{K}}_{\mathrm{r}} +{\mathrm{K}}_{\mathrm{w}} -{\mathrm{K}}) \\ {\mathrm{B}}_{{\mathrm{nfw}}}= & {} {\mathrm{P}}_{{\mathrm{wr}}} \cdot ({\mathrm{B}}-{\mathrm{K}}) \\ \end{aligned}$$

Therefore the expression for the probability \(P_{{\mathrm{wr}}}\) for a write access to a cache line that is already present in the cache is:

$$\begin{aligned} {\mathrm{P}}_{{\mathrm{wr}}} =\frac{{\mathrm{P}}_{\mathrm{w}} \cdot {\mathrm{B}}-(\mathrm{K}-{\mathrm{K}}_{\mathrm{r}} )}{\mathrm{B}-2\cdot {\mathrm{K}}+{\mathrm{K}}_{\mathrm{r}} +{\mathrm{K}}_{\mathrm{w}} } \end{aligned}$$
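The three expressions above form a linear balance \(P_{\mathrm{w}}\cdot B=B'_{\mathrm{fw}}+B''_{\mathrm{fw}}+B_{\mathrm{nfw}}\) that is solved for \(P_{{\mathrm{wr}}}\). The following sketch (with illustrative, hypothetical parameter values) verifies that the solution satisfies the balance:

```python
def p_wr(P_w, B, K, K_r, K_w):
    """Probability of a write to a cache line already present in the cache."""
    return (P_w * B - (K - K_r)) / (B - 2*K + K_r + K_w)

# illustrative values satisfying 0 <= K_r, K_w <= K <= K_r + K_w
P_w, B, K, K_r, K_w = 0.3, 1000, 64, 48, 32
p = p_wr(P_w, B, K, K_r, K_w)
# balance: write misses that are never re-read, write misses that may be
# re-accessed, and write hits must together account for all writes P_w * B
balance = (K - K_r) + p * (K_r + K_w - K) + p * (B - K)
```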

The required space for holding different versions can be estimated by determining the number of new versions added during the period between two cache misses. The number of new versions can be calculated as the number of different cache lines that are written to during the period, excluding writes to the data that caused the cache miss. Assuming that the cache contains n valid lines during the period, the number of new versions m(n, w) added after w write operations can be calculated according to the generalized birthday problem:

$$\begin{aligned} {\mathrm{m}}({\mathrm{n,w}})={\mathrm{n}}\cdot \left( {1-\left( {1-\frac{1}{{\mathrm{n}}}} \right) ^{{\mathrm{w}}}} \right) -\left( {1-\left( {\frac{{\mathrm{n}}-1}{{\mathrm{n}}}} \right) ^{{\mathrm{w}}}} \right) =({\mathrm{n}}-1)\cdot \left( {1-\left( {1-\frac{1}{{\mathrm{n}}}} \right) ^{{\mathrm{w}}}} \right) \end{aligned}$$
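The generalized-birthday expression can be validated by direct simulation: draw w uniform writes over n cache lines and count the distinct lines written, excluding the line that caused the miss. A small sketch (hypothetical names; the values of n and w are chosen arbitrarily):

```python
import random

def m_formula(n, w):
    """Expected number of new versions, generalized birthday problem."""
    return (n - 1) * (1 - (1 - 1/n)**w)

def m_simulated(n, w, trials=100_000):
    """Empirical mean of distinct lines written among n valid lines,
    excluding line 0 (the line that caused the miss)."""
    total = 0
    for _ in range(trials):
        written = {random.randrange(n) for _ in range(w)}
        written.discard(0)
        total += len(written)
    return total / trials
```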

The probability that the transaction has performed exactly w writes between two cache misses while the cache contains n valid cache lines can be modeled as the Poisson random variable P(w) with the distribution:

$$\begin{aligned} {\mathrm{P}}({\mathrm{w}})={\mathrm{e}}^{-\overline{{{\mathrm{w}}}}}\cdot \frac{(\overline{{{\mathrm{w}}}})^{{\mathrm{w}}}}{{\mathrm{w}}!} \end{aligned}$$

The average number of write operations \({\overline{{{\mathrm{w}}}}}\) between two misses can be calculated using the probability \(P_{{\mathrm{wr}}}\) for the write access to a cache line that is already present in the cache, the number of ‘cache hit’ accesses \(B-K\), and the number of ‘cache miss’ accesses K. The expression for the average number of write operations is:

$$\begin{aligned} \overline{{{\mathrm{w}}}}={\mathrm{P}}_{{\mathrm{wr}}} \cdot \frac{{\mathrm{B}}-{\mathrm{K}}}{{\mathrm{K}}} \end{aligned}$$

Let m(n) be the average number of new versions added between two cache misses in the case when the cache contains n valid cache lines. The number m(n) is calculated using the probability P(w) that there are exactly w write operations between two misses and the average number of new versions m(n, w) added in the case when there are exactly w write operations while the cache contains n valid lines:

$$\begin{aligned} {\mathrm{m}}({\mathrm{n}})=\sum \limits _{{\mathrm{w}}=0}^{+\infty } {\mathrm{P}}({\mathrm{w}})\cdot {\mathrm{m}}( {\mathrm{n}},{\mathrm{w}}) \end{aligned}$$

The required Version Cache size can be expressed as the sum, over all K misses, of the new versions added between successive misses:

$$\begin{aligned} {\mathrm{m}}=\sum \limits _{{\mathrm{n}}=1}^\mathrm{K} {{\mathrm{m}}({\mathrm{n}})} \end{aligned}$$

The expression for calculating m can be simplified using the set of approximations:

$$\begin{aligned} {\mathrm{m}}({\mathrm{n}})\approx & {} {\mathrm{m}}({\mathrm{n}},\overline{{{\mathrm{w}}}})=({\mathrm{n}}-1)\cdot \left( {1-\left( {1-\frac{1}{{\mathrm{n}}}} \right) ^{\overline{{{\mathrm{w}}}}}} \right) \approx ({\mathrm{n}}-1)\cdot \left( {1-\left( {1-\frac{\overline{{\mathrm{w}}}}{{\mathrm{n}}}} \right) } \right) \nonumber \\ {\mathrm{m}}\approx & {} \frac{{\mathrm{P}}_{\mathrm{w}} \cdot {\mathrm{B}}-({\mathrm{K}}-\mathrm{K}_{\mathrm{r}} )}{{\mathrm{B}}-2\cdot {\mathrm{K}}+{\mathrm{K}}_{\mathrm{r}} +{\mathrm{K}}_{\mathrm{w}} }\cdot ({\mathrm{B}}-{\mathrm{K}}) \end{aligned}$$
(15)
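The quality of approximation (15) can be checked against the exact Poisson-weighted sums above. The sketch below (hypothetical names; the parameter values are illustrative, chosen so that the average number of writes per miss is small) evaluates both. The approximation is an upper bound on the exact sum, since each Poisson-averaged term \((n-1)(1-(1-1/n)^{w})\) is bounded by \(\overline{w}\):

```python
import math

def m_exact(B, K, K_r, K_w, P_w, w_max=80):
    """Required space via the Poisson-weighted sums m = sum_n m(n)."""
    p_wr = (P_w*B - (K - K_r)) / (B - 2*K + K_r + K_w)
    w_bar = p_wr * (B - K) / K          # average writes between two misses
    total = 0.0
    for n in range(1, K + 1):
        m_n = sum(math.exp(-w_bar) * w_bar**w / math.factorial(w)
                  * (n - 1) * (1 - (1 - 1/n)**w)
                  for w in range(w_max))
        total += m_n
    return total

def m_approx(B, K, K_r, K_w, P_w):
    """Simplified closed form, Eq. (15)."""
    return (P_w*B - (K - K_r)) / (B - 2*K + K_r + K_w) * (B - K)
```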

5 Results of Evaluation

The evaluation of the proposed RO-TCC and its comparison with TCC were performed with a simulation based on the model described in Sect. 2 and the appropriate analytical models from Sect. 3. The simulation also served the purpose of verifying the proposed analytical models. In all experiments, the simulation parameters were set to values of the magnitude that might occur in typical benchmark applications [25, 26]. Table 2 shows the value ranges and default values for all simulation parameters. Each experiment varied only one simulation parameter, while the other parameters were set to their default values. This section presents the experiments that characterize the essence of the proposed solution.

Table 2 Value ranges and default values for all simulation parameters

As opposed to the proposed analytical model, which models one transaction and its interaction with the rest of the system, the simulation model implements N transactions explicitly as concurrent processes. Each simulated transaction initially selects a sequence of B memory read and write accesses, with the write probability \(\hbox {P}_{{\mathrm{w}}}\). The sequence is maintained throughout the lifetime of the transaction, so even after a restart the transaction retries the identical sequence of operations. The memory accesses are uniformly distributed over the entire duration L of the transaction. The moments when a transaction accesses data for the first time are important from the aspect of RO-TCC; for the purposes of the simulation, these moments were uniformly distributed over the entire duration of the transaction. After a transaction commits, the core that executed it starts a new transaction after the expiration of the period V. New transactions are started until the number of restarts becomes sufficient for a confidence level of 95 % and a relative error of 5 %.

Besides the simulation based on the model described in Sect. 2, this section also presents an experiment that uses an architectural simulator. The architectural simulation enables characterizing the behavior of the proposed solution by executing benchmark tests with a configurable cache memory, and serves the purpose of reevaluation.

Fig. 5 Analytical and simulation results for restart probability depending on the number of cores for three different values of the parameter V/L

5.1 Simulation Analysis of the Transaction Restart Probability Model

The analytical model shows that the transaction restart probability R depends on a number of parameters, including the number of cores, the percentage of time a core spends in transactions, and the sizes of the working, access, write, and read sets. In order to validate the analytical model, experiments were conducted varying all of these parameters. The simulations show that the predominant parameter for the restart probability is the number of cores. Increasing the number of cores leads to greater congestion, which results in an increased number of restarts. In the experiments, the number of cores ranged from 2 to 32, while all other parameters had the default values given in Table 2.

Figure 5 demonstrates how the restart probability for TCC and RO-TCC depends on the number of cores. The bars correspond to the simulation results, while the lines correspond to the analytical results. The figure shows that the analytical model accurately reflects the behavior of simulated systems with an average relative error of 1 %. The figure also shows that the restart probability enters saturation. In the case of parameters that correspond to the average load, the saturation occurs when the number of cores approaches 16. The difference between the simulation results for restart probability, R, obtained for TCC and RO-TCC indicates that in some cases the approximation \(\hbox {R}_{{\mathrm{TCC}}}=\hbox {R}_{\mathrm{ROTCC}}={\mathrm{R}}\) can be used.

5.2 Simulation Analysis of the Transaction Execution Time Model

The analytical expression obtained in Sect. 3 represents the dependency between the expected transaction execution time \(\hbox {E}(T)\) and the transaction restart probability R. Similar to the simulation analysis of the transaction restart probability, the predominant parameter is the number of cores. Figure 6 shows how the expected transaction execution time for TCC and RO-TCC depends on the number of cores. The transaction execution time overhead resulting from transaction restarts depends approximately linearly on the number of cores in the examined range (from 2 to 32). The analytical model accurately reflects the behavior of the simulated system, with a relative error of less than 10 %.

Fig. 6 Analytical and simulation results for normalized execution time depending on the number of cores for three different values of the parameter V/L

5.3 Simulation Analysis of the Expected Performance Gain

Relative gain, \({\updelta }\), represents the expected performance gain after the introduction of restart optimization into a TCC system. The relative gain directly depends on the expected transaction execution time. Section 5.2 demonstrates that the analytical model for \(\hbox {E}(T)\) has a relative error of up to 10 %, which results in the relative gain being calculated with a relative error below 20 %. The error in the calculated relative gain stems not only from the superposition of the errors of the two models (TCC and RO-TCC), but also from the fact that the transaction restart probability is not the same in the two models.

Figure 7 shows how the expected relative gain depends on the number of cores. Regardless of the differences between the experimental and analytical models, the figure shows that restart optimization brings a relative gain of around 10 %. A similar conclusion about the relative gain can also be derived from Fig. 8, which shows the dependency between the relative gain and the working set size.

Fig. 7 Analytical and simulation results for relative gain depending on the number of cores for three different values of the parameter V/L

Fig. 8 Analytical and simulation results for the relative gain depending on the working set size for three different values of the parameter V/L

Fig. 9 Analytical and simulation results for the required space depending on the write probability for three different values of the parameter \(K_{\mathrm{w}}\)

Fig. 10 Analytical and simulation results for the required space depending on the number of memory accesses for three different values of the parameter \(K_{\mathrm{w}}\)

5.4 Simulation Analysis of Spatial Complexity

Restart optimization depends on keeping track of multiple versions during transaction execution, making space a critical resource for achieving the expected performance gain. The analytical model shows that the spatial complexity depends on a number of parameters, including the number of memory accesses, the write probability, and the sizes of the access, write, and read sets. Figures 9 and 10 show how the required space depends on the write probability and the number of memory accesses. Both figures show an approximately linear dependency, which in the case of Fig. 9 can be explained by the fact that the access set size is almost equal to the sum of the write and read set sizes. The analytical model accurately reflects the required space, with a relative error of less than 1 %.

5.5 Reevaluation with Architectural Simulation

The reevaluation, performed as a separate experiment, aims to revisit the previous findings. In contrast to the previous experiments, where the model simulator was used, the reevaluation was conducted using an architectural simulator. The architectural simulator consists of an X86 processor simulator and a configurable cache memory simulator. The X86 processor is part of the JPC simulator, which simulates a full PC-based system at the behavioral level [27]. The JPC full-system simulator includes the processor, motherboard, disk controller, graphics controller, and user interface, but for the purpose of integration with the configurable cache memory only the processor is used [28]. The configurable cache memory simulator operates at the register-transfer level, provides statistical data, and logs the complete memory access trace. For the purpose of the reevaluation, the configurable cache memory simulator was modified to support TCC and RO-TCC.

The architectural simulations were conducted using the STAMP benchmark with parameters set to default values. The benchmark programs were compiled in debug mode using MS Visual Studio 6.1. The simulations used the configuration of the architectural simulator given in Table 3. During simulations, the sizes of Context Buffer and Version Cache were unlimited in order to determine the maximal performance gain.

Table 3 Configuration used for simulation with the architectural simulator

Figures 11 and 12 show the relative gain depending on the benchmark test and the number of cores used in the simulation. The relative gain is calculated according to the execution time spent in all transactions (Fig. 11) and the execution time spent in transactions that restarted at least once (Fig. 12). Regardless of the differences between the model and the architectural simulation, Fig. 12 shows that the results are within the expected ranges. The differences can be attributed to the fact that the transactions in the benchmark tests may have distributions different from the one used in the analytical model. The results obtained for the ssca2 test show that the execution times for TCC and RO-TCC had negligible differences, due to the fact that the test had a low contention level and that all restarts required re-execution from the beginning of the transactions. The experiment also showed that RO-TCC had more restarts than TCC, which confirms that a possible consequence of the restart optimization can be a higher restart probability. However, in the case of the intruder test, although the restart probability almost doubled with RO-TCC, the restart optimization brought a relative gain of 29 % when observing all transactions, or 49 % when observing only transactions that restarted at least once.

Fig. 11 Architectural simulation results for relative gain according to execution time spent in transactions, depending on the benchmark test and the number of cores

Fig. 12 Architectural simulation results for relative gain according to execution time spent in transactions that restarted at least once, depending on the benchmark test and the number of cores

5.6 Threats to Validity

The correctness of the proposed analytical model depends on the justification of the adopted assumptions. For example, the assumption that the locations within a transaction where a restart occurs and from which the transaction resumes are random moments with uniform distributions on the defined intervals may not model real situations adequately, which consequently limits the generalizability of the findings. Similarly, the workload chosen for evaluation purposes was based on the STAMP benchmark, which, even though widely used, may not represent the proper choice for some applications. Moreover, the evaluation was based on results obtained with a model simulator, which does not produce fine-grained information and therefore may represent a threat to the accuracy of the findings. Although the reevaluation with an architectural simulator confirmed that the relative gains are within the expected ranges, the usage of an architectural simulator based on JPC, which represents a single architecture, may also be considered a threat to the validity of the findings.

6 Conclusion

The goal of transactional memory is to simplify multi-core concurrent programming. A programmer demarcates a transaction while the transactional memory provides conflict resolution for all concurrently executing transactions. This paper has introduced an optimization for transactional memory with lazy conflict detection. The proposed optimization builds on previous research and differs in that it eliminates the need for predicting conflicting accesses and introduces incremental context saving. The application of the proposed optimization to TCC was analytically modeled using a continuous-time model.

The comparison of the analytical results with the results of simulations demonstrated both the accuracy and the capability of the proposed analytical model to reveal the trend behavior of transactions with and without the restart optimization as well as their relative performance. The simulation results indicate that the optimization does not have a significant influence on the value of the restart probability. However, at higher values of restart probability even small variations lead to large deviations in the calculation of the expected transaction execution time and relative gain. The results for the analytical model show that in cases when the restart probability remains the same after the introduction of restart optimization, the expected relative gain for transactions that have been restarted at least once is a monotonically increasing function with values ranging from 0.17 to 0.33. Even though the proposed optimization and its analytical model were developed to fulfill the specific requirements of the hardware transactional memory with lazy conflict detection, the generality of the presented approach allows its applicability to other types of transactional memory with lazy version management regardless of the conflict detection strategy.