
In order to further relax the memory model described in the previous section, we need to introduce load buffering, also referred to as delayed reads. The presence of load buffering, however, gives rise to a number of complications. For one, reads have to be executed asynchronously: a thread reading a value has to proceed even if the value is not yet available. Such "non-blocking" reads are problematic in that a thread's continuation after an asynchronous read typically depends on the value read. Process p0 in the code of Figure 4a and both processes in Figure 5a contain examples of this type of dependency. In these examples, the dependency is a pure data dependency: a "subsequent" instruction writes the value read back to memory.

Control dependencies, which have not yet been discussed in connection with the examples from Section 3, are more complicated than data dependencies. A control dependency happens, for example, when the condition of an if-then-else involves the value read by a prior load instruction. If the load is executed "out-of-order" and after the conditional expression, then both branches are in principle possible. This superposition leads to a form of speculative execution.

The asynchronous execution of a load instruction is similar to a simple future-like mechanism, one involving implicit futures. While futures allow the asynchronous execution of arbitrary sequential code that produces an eventual value, asynchronous reads execute only the load instruction and nothing else. We say that the load is done, or the "future" is resolved, when a concrete value can be passed back to the issuer of the asynchronous read.
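The analogy can be made concrete with a tiny implicit-future sketch (illustrative Python; not part of the formal semantics): the issuing thread receives an unresolved reference immediately and continues, and a later step of the system supplies the concrete value.

```python
class LoadFuture:
    """A placeholder for the value of an asynchronous load.

    The issuing thread continues immediately with the unresolved
    reference; the "future" is resolved later, when memory provides
    a concrete value. Hypothetical sketch, not the paper's rules.
    """
    def __init__(self, var):
        self.var = var
        self.value = None
        self.resolved = False

    def resolve(self, value):
        self.value = value
        self.resolved = True

# Issue the load: the thread gets the reference n back right away ...
n = LoadFuture("z")
assert not n.resolved  # the thread may proceed without the value
# ... and some later step of the semantics resolves it:
n.resolve(42)
assert n.resolved and n.value == 42
```

Unlike a conventional future, nothing here ever blocks on `n`; resolution is a separate step of the system, which mirrors the "no synchronization on reads" design discussed next.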

Asynchronous loads and futures differ in terms of synchronization. Conventionally, futures involve a form of synchronization that is sometimes called "wait-by-necessity," meaning that the caller blocks when accessing results that are not yet available. Given the model's strict separation between synchronization (achieved by sending to and receiving from channels) and communication, reads and writes are completely devoid of synchronization. Informally, the lack of synchronization can be seen as a "don't-wait-not-even-[…]" For example, the "preserved-program-order" edge in Figure 4b would be represented in our model as a →po-edge instead.

To delay the read and thereby account for out-of-order execution, configurations are extended with read events (in addition to the write events already discussed in the previous section). The semantics is also extended to deal with references to future values.

For example, before adding delayed reads, store instructions took the form z := v, where v denotes a user-level data value such as an integer. Once delays are added, a new form of the store instruction is needed: z := n, with n a "future reference" to an asynchronous read. Note that, as shown in rule R-READ, future references are written n[(σ, load z)].

When it comes to ensuring that "outdated" or shadowed writes are not observable by a read instruction (see equation (2)), the semantics with asynchronous reads differs from the semantics without them. In the presence of asynchronous reads, the future reference n[(σ, load z)] is responsible for making sure that shadowed writes are not observed. In other words, the future reference must ensure that the value resulting from an asynchronous read does not come from a write that is shadowed to the reader. In the semantics without asynchronous reads, a thread's shadow set was used to exclude visibility of shadowed writes. This exclusion is captured by the corresponding premise of the R-READ rule of Table 1.
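As a toy illustration (Python; purely illustrative and not the formal rules), the future reference can be modeled as carrying the reader's σ and filtering out shadowed writes when the candidate writes for a variable are collected:

```python
def candidates(read_sigma, writes):
    """Writes a pending read n[(sigma, load z)] may draw its value from:
    every write to z that is not in the reader's shadow set E_s.
    Sketch: sigma = (E_hb, E_s); writes maps write-event names to values.
    """
    _ehb, es = read_sigma
    return {name: val for name, val in writes.items() if name not in es}

writes_to_z = {"n0": 0, "n1": 1}       # n0: older write, n1: newer write
sigma = (set(), {"n0"})                # n0 is shadowed for this reader
print(candidates(sigma, writes_to_z))  # {'n1': 1}
```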

When it comes to causality (see equation (1)), the semantics lacking asynchronous reads trivially guarantees that a read instruction cannot observe writes that happen-after.

Once asynchronous reads are introduced, however, additional measures must be taken so that causality is not violated. Note that the σ-information of the reader is of no help here: in the operational execution, the write event is generated after the asynchronous issuing of n[(σ, load z)]. Consequently, the subsequent write is unknown to n[(σ, load z)] and, therefore, the write is not mentioned in σ. In general, the fact that a read event does not "know" about a write event is useless for determining whether the read can observe said write event (as both observing and not observing are consistent with the write not being mentioned in the local σ of the read event). Instead, it is from the perspective of the write event that one can determine whether a read can observe the write: if the write event "knows" that a read event has

"happened-before," then the write should not be made observable to the read. Thus, write events now also carry happens-before and shadow information and are of the form n(|σ, z := v|). Note also that the conditions corresponding to equations (1) and (2) are captured by the premises of rule R-OBS from Table 3.

The implicit future references not only occur on the right-hand side of the let-construct (see rule R-READ), but can also occur as stand-ins for the value to be loaded from memory. In particular, assignments may now take the form z := n, where n is a reference to a value that has not yet been resolved. Consequently, write events may take the form n1(|σ, z := n2|) and delayed read constructs can take the form of a reference to a reference: n1[(n2)]. As references are resolved into concrete values, the semantics must have rules to shorten chains of indirections. For instance, n1(|σ, z := n2|) ‖ n2[(v)] −→ n1(|σ, z := v|) ‖ n2[(v)]. Additionally, the semantics needs to deal with compound expressions containing future references, which likewise cannot be immediately evaluated.
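A rule shortening one such indirection can be sketched as follows (a hypothetical Python encoding of a configuration as a list of events; the actual semantics is given as rewrite rules):

```python
def shorten(config):
    """One indirection-shortening step: if a write event stores a
    reference to a read that is already resolved to a concrete value,
    substitute the value. Sketch of
        n1(|s, z := n2|) || n2[(v)]  ->  n1(|s, z := v|) || n2[(v)].
    Events: ("write", name, (var, rhs)) and ("read", name, val),
    where a string rhs/val stands for an unresolved reference.
    """
    resolved = {name: val for (kind, name, val) in config
                if kind == "read" and not isinstance(val, str)}
    out = []
    for kind, name, payload in config:
        if kind == "write":
            var, rhs = payload
            if isinstance(rhs, str) and rhs in resolved:
                payload = (var, resolved[rhs])
        out.append((kind, name, payload))
    return out

cfg = [("write", "n1", ("z", "n2")), ("read", "n2", 5)]
print(shorten(cfg))  # [('write', 'n1', ('z', 5)), ('read', 'n2', 5)]
```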

The corresponding dereferencing rules are left out of this discussion. Also left out of this note is the exact treatment of control dependencies in the presence of delayed reads.

R-READ
    p⟨σ, let r = load z in t⟩  −→  p⟨σ, let r = n in t⟩ ‖ n[(σ, load z)]

R-WRITE
    σ = (E_hb, E_s)    σ′ = (E_hb + (n, z), E_s + E_hb(z))    fresh(n)
    ───────────────────────────────────────────────────────────────────
    p⟨σ, z := v; t⟩  −→  p⟨σ′, t⟩ ‖ n(|σ, z := v|)

R-OBS
    σ1 = (_, E_s1)    σ2 = (E_hb2, _)    n2 ∉ E_s1    n1 ∉ E_hb2
    ───────────────────────────────────────────────────────────────────
    n1[(σ1, load z)] ‖ n2(|σ2, z := v|)  −→  n1[(v)] ‖ n2(|σ2, z := v|)

Table 3: Operational semantics: shared memory
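The two side conditions of rule R-OBS can be sketched directly: a read n1 may observe a write n2 only if the write is not shadowed to the reader, and the write does not "know" that the read happened before it. The following Python sketch (the paper's implementation is in the K framework; this is only illustrative) encodes σ as a pair of sets:

```python
def r_obs(read_name, read_sigma, write_name, write_sigma):
    """Side conditions of R-OBS (sketch), with sigma = (E_hb, E_s):
      - the write is not in the reader's shadow set (n2 not in E_s1), and
      - the write does not know the read happened-before (n1 not in E_hb2).
    """
    _ehb1, es1 = read_sigma
    ehb2, _es2 = write_sigma
    return write_name not in es1 and read_name not in ehb2

# A write the reader may observe:
assert r_obs("n1", (set(), set()), "n2", (set(), set()))
# A shadowed write is not observable:
assert not r_obs("n1", (set(), {"n2"}), "n2", (set(), set()))
# A write that knows the read happened-before is not observable either:
assert not r_obs("n1", (set(), set()), "n2", ({"n1"}, set()))
```

The second negative case is exactly the causality condition of equation (1): the check is made from the write's perspective, since the reader's σ cannot mention writes generated after the read was issued.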

When the decision on which branch to take depends on the value of a read that is not yet resolved, the semantics allows execution to continue in a speculative manner, collecting a symbolic representation of the branch condition in the form of a path condition.
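Speculation with path conditions can be sketched as follows (illustrative Python, not the formal rules): both branches are explored, each tagged with the condition it assumes about the unresolved value, and branches inconsistent with the eventually resolved value are discarded.

```python
def speculate(then_steps, else_steps):
    """Both branches of an if whose condition depends on an unresolved
    load: each branch carries a path condition on the eventual value v.
    Sketch of speculative execution with path conditions."""
    return [
        (lambda v: v != 0, then_steps),  # path condition: guard held
        (lambda v: v == 0, else_steps),  # path condition: guard failed
    ]

def commit(paths, v):
    """Once the load resolves to v, keep only consistent branches."""
    return [steps for cond, steps in paths if cond(v)]

paths = speculate(["y := 1"], ["y := 2"])
print(commit(paths, 5))  # [['y := 1']] -- only the then-branch survives
```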

Example 1 (Out-of-thin-air). For illustration, let us revisit the litmus test from Figure 5a. Assume that both threads start with a local state of σ0. We do not show the write events for the initialization of the two shared variables. Now, after executing four instructions, the resulting configuration may continue as follows, dereferencing the load reference:

p1⟨σ1, n1⟩ ‖ n1[(σ0, load x)] ‖ n′1(|σ0, y := n1|) ‖ n2[(σ0, load y)] ‖ n′2(|σ0, x := n2|) ‖ p2⟨σ2, n2⟩
−→ p1⟨σ1, n1⟩ ‖ n1[(n2)] ‖ n′1(|σ0, y := n1|) ‖ n2[(σ0, load y)] ‖ n′2(|σ0, x := n2|) ‖ p2⟨σ2, n2⟩
−→ p1⟨σ1, n1⟩ ‖ n1[(n2)] ‖ n′1(|σ0, y := n1|) ‖ n2[(n1)] ‖ n′2(|σ0, x := n2|) ‖ p2⟨σ2, n2⟩

After these two dereferencing steps, the configuration contains n1[(n2)] and n2[(n1)], each attempting to resolve the other's indirect reference, in some form of deadlock. Adding a rule that "resolves" such cyclic dependencies by picking a random value (in a form of self-justification) allows the modeling of out-of-thin-air behavior. ⊓⊔
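Detecting such a cycle and applying the self-justifying rule can be sketched as follows (illustrative Python; the choice of value is arbitrary by design, which is precisely what makes the behavior "out of thin air"):

```python
def find_cycle(refs):
    """Detect a cyclic chain of unresolved read references such as
    n1[(n2)] and n2[(n1)]. refs maps each unresolved reference to the
    reference it waits on; resolved references are absent. Sketch."""
    for start in refs:
        seen, cur = set(), start
        while cur in refs:
            if cur in seen:
                return True
            seen.add(cur)
            cur = refs[cur]
    return False

def self_justify(refs, value):
    """Hypothetical out-of-thin-air rule from Example 1: break the
    cycle by resolving every reference in it to an arbitrary value."""
    return {n: value for n in refs}

refs = {"n1": "n2", "n2": "n1"}   # n1[(n2)] || n2[(n1)]
assert find_cycle(refs)
print(self_justify(refs, 42))     # {'n1': 42, 'n2': 42}
```

A semantics that forbids out-of-thin-air values would reject this rule; including it is what lets the model exhibit the behavior under discussion.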

6 Conclusion

We present the ideas behind an operational specification for a weak memory model. The semantics is accompanied by an implementation in the K framework and by several examples and test cases [10]. We plan to use the implementation towards the verification of program properties such as data-race freedom.

References

[1] S. V. Adve and K. Gharachorloo. Shared memory consistency models: A tutorial. Research Report 95/7, Digital WRL, Sept. 1995.

[2] S. V. Adve and M. D. Hill. Weak ordering — a new definition. SIGARCH Computer Architecture News, 18(3a):2–14, 1990.

[3] J. Alglave, L. Maranget, and M. Tautschnig. Herding cats: Modelling, simulation, testing, and data-mining for weak memory. ACM TOPLAS, 36(2), 2014.

[4] D. Aspinall and J. Ševčík. Java memory model examples: Good, bad and ugly. Proc. of VAMP, 7, 2007.

[5] Programming languages — C++. ISO/IEC 14882:2011, 2011.

[6] H.-J. Boehm and S. V. Adve. Foundations of the C++ concurrency memory model. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, 2008.

[7] H.-J. Boehm and B. Demsky. Outlawing ghosts: Avoiding out-of-thin-air results. In Proceedings of the Workshop on Memory Systems Performance and Correctness, MSPC '14, pages 7:1–7:6, New York, NY, USA, 2014. ACM.

[8] W. W. Collier. Reasoning about Parallel Architectures. Prentice Hall, 1992.

[9] A. A. A. Donovan and B. W. Kernighan. The Go Programming Language. Addison-Wesley, 2015.

[10] D. Fava. Operational semantics of a weak memory model with channel synchronization. https://github.com/dfava/mmgo, Oct. 2017.

[11] D. Fava, M. Steffen, and V. Stolz. Operational semantics of a weak memory model with channel synchronization. In K. H. et al., editor, FM, volume 10951 of Lecture Notes in Computer Science, pages 1–19. Springer Verlag, July 2018.

[12] D. Fava, M. Steffen, and V. Stolz. Operational semantics of a weak memory model with channel synchronization: Proof of sequential consistency for race-free programs. Technical Report 477, University of Oslo, Faculty of Mathematics and Natural Sciences, Dept. of Informatics, Jan. 2018. Available at https://www.duo.uio.no/handle/10852/61977.

[13] The Go programming language specification. https://golang.org/ref/spec, Nov. 2016.

[14] The Go memory model. https://golang.org/ref/mem, 2014. Version of May 31, 2014, covering Go version 1.9.1.

[15] SPARC International, Inc. and D. L. Weaver. The SPARC Architecture Manual. Prentice-Hall, 1994.

[16] L. Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558–565, 1978.

[17] L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Transactions on Computers, C-28(9):690–691, Sept. 1979.

[18] J. Manson, W. Pugh, and S. V. Adve. The Java memory model. In POPL '05. ACM, Jan. 2005.

[19] L. Maranget, S. Sarkar, and P. Sewell. A tutorial introduction to the ARM and POWER relaxed memory models (version 120), Oct. 2012.

[20] W. Pugh. Fixing the Java memory model. In Proceedings of the ACM Java Grande Conference, June 1999.

[21] J. M. Tendler, J. S. Dodson, J. S. F. Jr., H. Q. Le, and B. Sinharoy. Power4 system microarchitecture. IBM Journal of Research and Development, 46(1):5–25, 2002.
