Abstract
Motivated by recent distributed systems technology, Aguilera et al. introduced a hybrid model of distributed computing, called the message-and-memory model or m&m model for short. In this model, processes can communicate by message passing and also by accessing some shared memory (e.g., through some RDMA connections). We first consider the basic problem of implementing an atomic single-writer multi-reader (SWMR) register shared by all the processes in m&m systems. Specifically, we give an algorithm that implements such a register in m&m systems and show that it is optimal in the number of process crashes that it tolerates. This generalizes the well-known ABD implementation of an atomic SWMR register in a pure message-passing system. We then combine our register implementation for m&m systems with a randomized consensus algorithm of Aspnes and Herlihy, and obtain a randomized consensus algorithm for m&m systems that is also optimal in the number of process crashes that it can tolerate. Finally, we determine the minimum number of RDMA connections that is sufficient to implement a SWMR register, or solve randomized consensus, in an m&m system with t process crashes, for any given t.
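The abstract builds on the well-known ABD register implementation of Attiya, Bar-Noy and Dolev for pure message-passing systems. The notes identify it as the special case of Algorithm 1 in which every process shares memory only with itself. The following is a minimal sequential Python simulation of that classic baseline. It is not the authors' m&m algorithm, and the class and method names are hypothetical.

- Each write tags its value with the writer's counter k, as in the pairs of the form ⟨k, val⟩ described in the notes, and stores the pair at a majority of replicas.
- Each read collects pairs from a majority, picks the pair with the largest k, writes it back to a majority, and returns its value.

The sketch makes several simplifying assumptions. Messages are replaced by direct function calls. Operations never overlap. A missing majority is reported as an exception rather than as an operation that blocks forever.

```python
import random
from dataclasses import dataclass


@dataclass
class Replica:
    tag: int = 0          # the counter k of the stored pair <k, val>
    val: object = None
    crashed: bool = False


class ABDRegister:
    """Hypothetical sequential sketch of the classic ABD SWMR register
    (pure message passing), not the m&m algorithm of the paper."""

    def __init__(self, n, seed=0):
        self.replicas = [Replica() for _ in range(n)]
        self.majority = n // 2 + 1
        self.k = 0                      # writer's local counter
        self.rng = random.Random(seed)  # picks which majority "responds"

    def crash(self, i):
        self.replicas[i].crashed = True

    def _quorum(self):
        alive = [r for r in self.replicas if not r.crashed]
        if len(alive) < self.majority:
            # In a real asynchronous system the operation would block forever.
            raise RuntimeError("no majority of correct replicas")
        return self.rng.sample(alive, self.majority)

    def _store(self, tag, val):
        for r in self._quorum():
            if tag > r.tag:             # keep only the newest tagged pair
                r.tag, r.val = tag, val

    def write(self, val):
        self.k += 1                     # increment k before each write
        self._store(self.k, val)        # propagate <k, val> to a majority

    def read(self):
        pairs = [(r.tag, r.val) for r in self._quorum()]
        tag, val = max(pairs, key=lambda p: p[0])
        self._store(tag, val)           # write-back phase
        return val
```

With n = 5, the register keeps working after two crashes but not after three. This matches the classical fact that ABD tolerates fewer than n/2 crashes. Any two majorities intersect, so every read quorum contains a replica that holds the latest write.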
Code availability
Not applicable.
Notes
This can be ensured by the writer w writing values of the form \(\langle k , val \rangle \) where k is the value of a counter that w increments before each write.
Note that L satisfies Assumption 1 because each \(S_i = N^+(p_i)\) contains \(p_i\).
As we mentioned in the introduction, we assume that the crash of \(p_i\) does not prevent the neighbours of \(p_i\) from accessing the shared registers \(R_i[*]\).
The ABD algorithm is the special case of Algorithm 1 for \({\mathcal {S}}_{L}\) where \(L = \{ \{ p_1 \} , \{ p_2 \} , \ldots , \{ p_n \} \}\).
Our algorithm’s SWMR registers store such pairs and are therefore unbounded.
A step of \({\mathcal {A}}\) executed by process p is one of the following: p sending a message, p receiving a message, or p writing or reading a shared register in \({\mathcal {S}}_L\).
As with the Hoffman-Singleton graph, the Petersen graph is a Moore graph with diameter 2 [23].
[7] considers randomized consensus algorithms only for uniform m&m systems.
Roughly speaking, this is because the remaining two processes \(p_1\) and \(p_4\) cannot simulate a majority of correct processes, as required by the technique used in [7].
Recall that such a system is modelled by a graph G where nodes represent processes and each edge between two processes represents an RDMA connection between these processes which allows them to share some SWMR registers.
Note that for reasons explained in Sect. 5.1, this transformation does not necessarily work for randomized algorithms.
The focus of [21] was not on minimizing the number of RDMA connections, but rather on the benefits that accrue from shared memory with memory permissions.
In a history of an object implementation, we omit all steps other than the invocation and response steps on that object.
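The notes above model an m&m system as a graph whose edges are RDMA connections. They take \(S_i = N^+(p_i)\), the closed neighbourhood of \(p_i\). They also cite the Petersen and Hoffman-Singleton graphs as Moore graphs of diameter 2. This Python sketch illustrates those graph notions only. All helper names are hypothetical. The sketch does the following:

- builds the Petersen graph as the Kneser graph K(5,2);
- computes closed neighbourhoods and confirms that \(p \in N^+(p)\);
- checks the Moore condition for diameter 2, namely that a d-regular graph of diameter 2 has exactly \(d^2 + 1\) nodes.

For a graph with no edges, every \(N^+(p)\) is the singleton \(\{p\}\). This matches the set L of the ABD special case mentioned in the notes.

```python
from collections import deque
from itertools import combinations


def petersen():
    """Petersen graph as the Kneser graph K(5,2): nodes are 2-subsets of
    {0,...,4}, and two nodes are adjacent iff the subsets are disjoint."""
    nodes = [frozenset(c) for c in combinations(range(5), 2)]
    return {u: {v for v in nodes if not (u & v)} for u in nodes}


def closed_neighbourhoods(adj):
    """N+(p) = {p} | N(p): the processes p shares registers with, itself included."""
    return {p: {p} | adj[p] for p in adj}


def eccentricity(adj, src):
    # Breadth-first search from src; infinite if some node is unreachable.
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values()) if len(dist) == len(adj) else float("inf")


def diameter(adj):
    return max(eccentricity(adj, p) for p in adj)


def is_moore_diameter2(adj):
    """True iff the graph is d-regular, has diameter 2 and has d*d + 1 nodes."""
    degrees = {len(nbrs) for nbrs in adj.values()}
    if len(degrees) != 1:
        return False                     # Moore graphs are regular
    d = degrees.pop()
    return diameter(adj) == 2 and len(adj) == d * d + 1
```

The Petersen graph passes the check with d = 3 and 10 nodes. The 5-cycle also passes, with d = 2 and 5 nodes. A complete graph fails because its diameter is 1.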
References
Gen-Z draft specifications. https://genzconsortium.org/bulk-download-of-completed-and-draft-gen-z-specifications-now-available/
Gen-Z DRAM and persistent memory theory of operation. https://genzconsortium.org/wp-content/uploads/2019/03/Gen-Z-DRAM-PM-Theory-of-Operation-WP.pdf
InfiniBand. https://en.wikipedia.org/wiki/InfiniBand
Lim, K., Chang, J., Mudge, T., Ranganathan, P., Reinhardt, S.K., Wenisch, T.F.: Disaggregated memory for expansion and sharing in blade servers. In: International Symposium on Computer Architecture, pp. 267–278 (2009)
RDMA over converged ethernet. https://en.wikipedia.org/wiki/RDMA_over_Converged_Ethernet
Aguilera, M.K., Ben-David, N., Calciu, I., Guerraoui, R., Petrank, E., Toueg, S.: Passing messages while sharing memory. In: Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing, PODC 2018, pp. 51–60 (2018)
Dragojević, A., Narayanan, D., Castro, M., Hodson, O.: FaRM: Fast remote memory. In: Symposium on Networked Systems Design and Implementation, pp. 401–414 (2014)
Kalia, A., Kaminsky, M., Andersen, D.G.: Using RDMA efficiently for key-value services. In: ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 295–306 (2014)
Kalia, A., Kaminsky, M., Andersen, D.G.: FaSST: Fast, scalable and simple distributed transactions with two-sided (RDMA) datagram RPCs. In: Symposium on Operating Systems Design and Implementation, pp. 185–201 (2016)
Tsai, S.Y., Zhang, Y.: LITE kernel RDMA support for datacenter applications. In: ACM Symposium on Operating Systems Principles, pp. 306–324 (2017)
Attiya, H., Bar-Noy, A., Dolev, D.: Sharing memory robustly in message-passing systems. J. ACM 42(1), 124–142 (1995)
Aspnes, J., Herlihy, M.: Fast randomized consensus using shared memory. J. Algorithms 11(3), 441–461 (1990)
Attiya, H., Enea, C.: Putting Strong Linearizability in Context: Preserving Hyperproperties in Programs That Use Concurrent Objects. In: 33rd International Symposium on Distributed Computing, DISC 2019, pp. 2:1–2:17 (2019)
Golab, W., Higham, L., Woelfel, P.: Linearizable implementations do not suffice for randomized distributed computation. In: Proceedings of the 2011 ACM Symposium on Theory of Computing, STOC 2011, pp. 373–382 (2011)
Hadzilacos, V., Hu, X., Toueg, S.: On register linearizability and termination. In: Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing, PODC 2021, pp. 521–531 (2021)
Hadzilacos, V., Hu, X., Toueg, S.: Randomized consensus with regular registers (June 11, 2020). arXiv:2006.06771. To appear in Information Processing Letters (March 2022)
Lamport, L.: On interprocess communication parts I-II. Distrib. Comput. 1(2), 77–101 (1986)
Poke, M., Hoefler, T.: Dare: High-performance state machine replication on RDMA networks. In: Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2015, pp. 107–118 (2015)
Yang, J., Izraelevitz, J., Swanson, S.: Orion: A distributed file system for non-volatile main memory and RDMA-capable networks. In: 17th USENIX Conference on File and Storage Technologies, FAST 2019, pp. 221–234 (2019)
Aguilera, M.K., Ben-David, N., Guerraoui, R., Marathe, V., Zablotchi, I.: The impact of RDMA on agreement. In: Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, PODC 2019, pp. 409–418 (2019)
Herlihy, M., Wing, J.: Linearizability: a correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst. 12(3), 463–492 (1990)
Hoffman, A.J., Singleton, R.R.: On Moore graphs with diameters 2 and 3. IBM J. Res. Dev. 4(5), 497–504 (1960)
Figure by Uzyel - Own work, CC BY-SA 3.0. https://commons.wikimedia.org/w/index.php?curid=10378641
Hoory, S., Linial, N., Wigderson, A.: Expander graphs and their applications. Bull. Amer. Math. Soc. 43(4), 439–561 (2006)
Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. ACM 32(2), 374–382 (1985)
Loui, M.C., Abu-Amara, H.H.: Memory requirements for agreement among unreliable asynchronous processes. Adv. Comput. Res. 4, 163–183 (1987)
Hadzilacos, V., Hu, X., Toueg, S.: Optimal register construction in m&m systems. In: 23rd International Conference on Principles of Distributed Systems, OPODIS 2019, pp. 28:1–28:16 (2019)
Hadzilacos, V., Hu, X., Toueg, S.: On atomic registers and randomized consensus in M&M systems (June 17, 2020). arXiv:1906.00298v2
Amza, C., Cox, A.L., Dwarkadas, S., Keleher, P., Lu, H., Rajamony, R., Yu, W., Zwaenepoel, W.: TreadMarks: Shared memory computing on networks of workstations. IEEE Comput. 29(2), 18–28 (1996)
Baumann, A., Barham, P., Dagand, P.E., Harris, T., Isaacs, R., Peter, S., Roscoe, T., Schüpbach, A., Singhania, A.: The multikernel: A new OS architecture for scalable multicore systems. In: ACM Symposium on Operating Systems Principles, pp. 29–44 (2009)
Bennett, J.K., Carter, J.B., Zwaenepoel, W.: Munin: Distributed shared memory based on type-specific memory coherence. In: ACM Symposium on Principles and Practice of Parallel Programming, pp. 168–176 (1990)
David, T., Guerraoui, R., Yabandeh, M.: Consensus inside. In: International Middleware Conference, pp. 145–156 (2014)
Kaxiras, S., Klaftenegger, D., Norgren, M., Ros, A., Sagonas, K.: Turning centralized coherence and distributed critical-section execution on their head: a new approach for scalable distributed shared memory. In: Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2015, pp. 3–14 (2015)
Kranz, D., Johnson, K., Agarwal, A., Kubiatowicz, J., Lim, B.H.: Integrating message-passing and shared-memory: Early experience. In: ACM Symposium on Principles and Practice of Parallel Programming, pp. 54–63 (1993)
Nelson, J., Holt, B., Myers, B., Briggs, P., Ceze, L., Kahan, S., Oskin, M.: Latency-tolerant software distributed shared memory. In: USENIX Annual Technical Conference, pp. 291–305 (2015)
Scales, D.J., Gharachorloo, K., Thekkath, C.A.: Shasta: A low overhead, software-only approach for supporting fine-grain shared memory. In: International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 174–185 (1996)
Attiya, H., Kumari, S., Schiller, N.: Optimal resilience in systems that mix shared memory and message passing. In: Q. Bramas, R. Oshman, P. Romano (eds.) 24th International Conference on Principles of Distributed Systems, OPODIS 2020, pp. 16:1–16:16 (2020)
Attiya, H., Kumari, S., Schiller, N.: Optimal resilience in systems that mix shared memory and message passing (December 20, 2020). arXiv:2012.10846
Afek, Y., Greenberg, D.S., Merritt, M., Taubenfeld, G.: Computing with faulty shared memory. In: Proceedings of the 1992 ACM Symposium on Principles of Distributed Computing, PODC 1992, pp. 47–58 (1992)
Jayanti, P., Chandra, T.D., Toueg, S.: Fault-tolerant wait-free shared objects. J. ACM 45(3), 451–500 (1998)
Acknowledgements
We thank the anonymous reviewers for their comments that helped improve this paper. This work is partially supported by the Natural Sciences and Engineering Research Council of Canada under Grant NSERC RGPIN-2014-05296.
Funding
This research was partially funded by the Natural Sciences and Engineering Research Council of Canada.
Ethics declarations
Conflict of interest
The authors declare no conflicts of interest related to this work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Hadzilacos, V., Hu, X. & Toueg, S. On atomic registers and randomized consensus in M&M systems. Distrib. Comput. 35, 81–103 (2022). https://doi.org/10.1007/s00446-021-00405-7
DOI: https://doi.org/10.1007/s00446-021-00405-7