Abstract
Various load balancing policies are known to achieve vanishing waiting times in the large-scale limit, that is, when the number of servers tends to infinity. These policies either require a communication overhead of one message per job or require job size information. Load balancing policies with an overhead below one message per job are called hyperscalable policies. While these policies often have bounded queue lengths in the large-scale limit and work well when the overhead is somewhat below one, their performance deteriorates as the communication overhead becomes small: the mean response time tends to infinity as the overhead tends to zero, even at low loads. In this paper, we introduce a hyperscalable load balancing policy, called Join-Up-To(m), that remains effective even when the communication overhead tends to zero. To study its performance under general job size distributions, we make use of the “queue at the cavity” approach. We provide explicit results for the first two moments of the response time, the generating function of the queue length distribution and the Laplace transform of the response time. These results show that the mean response time depends only on the first two moments of the job size distribution.
References
[1] Lu, Y., Xie, Q., Kliot, G., Geller, A., Larus, J.R., Greenberg, A.: Join-idle-queue: a novel load balancing algorithm for dynamically scalable web services. Perform. Eval. 68, 1056–1071 (2011). https://doi.org/10.1016/j.peva.2011.07.015
[2] Stolyar, A.L.: Pull-based load distribution in large-scale heterogeneous service systems. Queueing Syst. 80(4), 341–361 (2015). https://doi.org/10.1007/s11134-015-9448-8
[3] Anselmi, J.: Combining size-based load balancing with round-robin for scalable low latency. IEEE Trans. Parallel Distrib. Syst. 31(4), 886–896 (2020). https://doi.org/10.1109/TPDS.2019.2950621
[4] Van der Boor, M., Zubeldia, M., Borst, S.: Zero-wait load balancing with sparse messaging. Oper. Res. Lett. 48(3), 368–375 (2020). https://doi.org/10.1016/j.orl.2020.04.006
[5] Gamarnik, D., Tsitsiklis, J.N., Zubeldia, M.: Delay, memory, and messaging tradeoffs in distributed service systems. ACM SIGMETRICS Perf. Evaluat. Rev. 44(1), 1–12 (2016). https://doi.org/10.1287/stsy.2017.0008
[6] Van der Boor, M., Borst, S., van Leeuwaarden, J.: Hyper-scalable JSQ with sparse feedback. Proc. ACM Meas. Anal. Comput. Syst. 3(1), 1–37 (2019). https://doi.org/10.1145/3322205.3311075
[7] Hellemans, T., Kielanski, G., Van Houdt, B.: Performance of load balancers with bounded maximum queue length in case of non-exponential job sizes. IEEE/ACM Trans. Netw. (2022). https://doi.org/10.1109/TNET.2022.3221283
[8] Bramson, M., Lu, Y., Prabhakar, B.: Randomized load balancing with general service time distributions. ACM SIGMETRICS 2010, 275–286 (2010). https://doi.org/10.1145/1811039.1811071
[9] Shneer, S., Stolyar, A.: Large-scale parallel server system with multi-component jobs. Queueing Syst. 98, 21–48 (2021). https://doi.org/10.1007/s11134-021-09686-y
[10] Anselmi, J., Dufour, F.: Power-of-d-choices with memory: fluid limit and optimality. Math. Oper. Res. 45(3), 862–888 (2020). https://doi.org/10.1287/moor.2019.1014
[11] Hellemans, T., Van Houdt, B.: On the power-of-d-choices with least loaded server selection. Proc. ACM Meas. Anal. Comput. Syst. (2018). https://doi.org/10.1145/3224422
[12] Gast, N.: Expected values estimated via mean-field approximation are 1/N-accurate. Proc. ACM Meas. Anal. Comput. Syst. 1(1), 17 (2017). https://doi.org/10.1145/3084454
[13] Gross, D., Shortle, J.F., Thompson, J.M., Harris, C.M.: Fundamentals of Queueing Theory, 4th edn. Wiley-Interscience, New York (2008)
[14] Fuhrmann, S.W.: A note on the M/G/1 queue with server vacations. Oper. Res. 32(6), 1368–1373 (1984)
[15] Fuhrmann, S.W., Cooper, R.B.: Stochastic decompositions in the M/G/1 queue with generalized vacations. Oper. Res. 33(5), 1117–1129 (1985)
[16] Neuts, M.F.: Matrix-Geometric Solutions in Stochastic Models: an Algorithmic Approach. Johns Hopkins University Press, Baltimore (1981)
[17] Kurtz, T.G.: Approximation of Population Processes. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 36. SIAM, Philadelphia (1981). https://books.google.be/books?id=XbDd8SIYzFYC
[18] Gast, N., Gaujal, B.: Markov chains with discontinuous drifts have differential inclusion limits. Perform. Eval. 69(12), 623–642 (2012)
[19] Cohen, J.W.: The Single Server Queue, 2nd edn. North-Holland Series in Applied Mathematics and Mechanics, vol. 8. North-Holland, Amsterdam, The Netherlands (1982)
Appendices
A On the convergence to the queue at the cavity
In this appendix, we make a number of observations related to the convergence to the queue at the cavity for exponential job sizes and finite buffers of size B. The results in this section can also be generalized to phase-type distributed job sizes, at the cost of more involved notation. In (1*), we show that the stochastic system consisting of N servers is a density-dependent population process as defined by Kurtz [17]. In (2*), we present an expression for the drift function, which is not continuous everywhere, and define a differential inclusion based on this drift function. Leveraging the framework of Gast and Gaujal [18] allows us to show that the sample paths of the stochastic system converge to the set of solutions of the differential inclusion over finite time scales as N tends to infinity. If the differential inclusion has multiple solutions, the system may converge to any of them, depending on its random innovations. In (3*), we argue that there exists a solution of the differential inclusion that makes a so-called sliding motion in a certain region of the state space, and that this region contains a fixed point that corresponds to our queue at the cavity.
Given these three results, weak convergence of the steady-state measures to the Dirac measure of the fixed point in (3*) follows from Gast and Gaujal [18, Section 4.2], provided that we can show that the trajectory of any solution of the differential inclusion converges to this fixed point. In other words, to interchange the limits \(t \rightarrow \infty \) and \(N \rightarrow \infty \), it suffices to show that the fixed point mentioned in (3*) is a global attractor of any solution of the differential inclusion. A proof of this global attraction is still missing; we comment on a possible approach at the end of this appendix.
(1*) We first show that the stochastic system consisting of N servers with exponential service times and finite buffers of size B is a density-dependent population process. For \(0\le j\le i \le B\), define \(Y^{(N)}_{i,j}(t)\) as the fraction of the N servers that have queue length j at time t and for which the dispatcher has an estimated queue length equal to \(i \ge j\). Due to the exponential job sizes, the variables \(Y^{(N)}_{i,j}(t)\) form a continuous-time Markov chain on the state space \(\mathcal {S}^{(N)} = \{y_{i,j} \mid 0\le j\le i \le B, \sum _{i,j} y_{i,j}=1, Ny_{i,j} \in \mathbb {N}\} \subseteq \mathbb {Z}^{(B+1)(B+2)/2}/N\). This Markov chain is a density-dependent population process if there exists a finite set \(\mathcal {L} \subset \mathbb {Z}^{(B+1)(B+2)/2}\) (with \(0 \not \in \mathcal {L}\)) such that, for each \(\ell \in \mathcal {L}\) and \(y \in \mathcal {S}^{(N)}\), the rate of the transition from y to \(y + \ell /N\) is of the form \(N \beta _\ell (y) \ge 0\), where \(\beta _{\ell }(\cdot )\) does not depend on N. Let \(e_{(i,j)} \in \mathcal {S}^{(N)}\) be the vector with \(y_{i,j}=1\) (and zeros elsewhere). For the JUT(m) system, we have three types of transitions. (1) An arrival can be assigned to a queue with length j and estimated length i. These transitions are denoted as \(\ell _{a(i,j)} = -e_{(i,j)}+e_{(i+1,j+1)}\) (for \(i < B\)), as they change the queue state from (i, j) to \((i+1,j+1)\). Let \(\kappa (y)\) be the minimum of m and the smallest estimated queue length when the system is in state y, that is, \(\kappa (y) = \min (m, \min \{i \mid \exists j: y_{i,j} > 0 \})\). As jobs arrive at rate \(\lambda N\) and a job is assigned to a queue with the smallest estimated queue length if \(\kappa (y) < m\) and to a random queue otherwise, we have
\[ \beta _{\ell _{a(i,j)}}(y) = \begin{cases} \lambda \, y_{i,j} \big/ \sum _{s=0}^{i} y_{i,s}, & \text {if } i = \kappa (y) < m,\\ \lambda \, y_{i,j}, & \text {if } \kappa (y) = m,\\ 0, & \text {otherwise.} \end{cases} \]
(2) A service completion can occur at a server with queue length j and estimated queue length i. We denote these transitions by \(\ell _{s(i,j)} = -e_{(i,j)} + e_{(i,j-1)}\) for \(i \ge j > 0\). As service completions do not depend on the other queues and service times are exponential with mean 1, we have \(\beta _{\ell _{s(i,j)}}(y) = y_{i,j}\). (3) The last type of transition is an update from an idle server, which changes its state from (i, 0) to (0, 0) for \(i > 0\). We denote these transitions by \(\ell _{u(i,0)} = -e_{(i,0)}+e_{(0,0)}\). As such updates occur at rate \(\delta _0\) at any idle server, we have \(\beta _{\ell _{u(i,0)}}(y)=\delta _0 y_{i,0}\). Since none of the functions \(\beta _\ell (\cdot )\) depends on N, the Markov chain is a density-dependent population process.
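To make the three transition types concrete, here is a minimal Python sketch that lists, for a given state y, each jump \(\ell \) together with its rate \(\beta _\ell (y)\). It assumes exponential job sizes with mean 1, uniform tie-breaking among the servers with the smallest estimate, and, since the finite-B details are not spelled out here, that a server whose estimate already equals B receives no further jobs. The names `kappa` and `jump_rates` are hypothetical.

```python
def kappa(y, m):
    """kappa(y): min of m and the smallest estimate i with some y[(i, j)] > 0."""
    return min([m] + [i for (i, _), frac in y.items() if frac > 0])


def jump_rates(y, lam, delta0, m, B):
    """List (src, dst, beta) for every JUT(m) transition out of state y.

    y maps (i, j) -> fraction of servers with estimate i and queue length j.
    The CTMC jumps from y to y + l/N at rate N * beta, so beta is N-free.
    """
    k = kappa(y, m)
    u_k = sum(frac for (i, _), frac in y.items() if i == k)  # mass at estimate k
    rates = []
    for (i, j), frac in y.items():
        if frac <= 0:
            continue
        # (1) arrival: (i, j) -> (i+1, j+1); assumed blocked when i == B
        if i < B:
            if k < m and i == k:
                rates.append(((i, j), (i + 1, j + 1), lam * frac / u_k))
            elif k == m:
                rates.append(((i, j), (i + 1, j + 1), lam * frac))
        # (2) service completion: (i, j) -> (i, j-1) at rate 1 per busy server
        if j > 0:
            rates.append(((i, j), (i, j - 1), frac))
        # (3) update from an idle server: (i, 0) -> (0, 0) at rate delta0
        if j == 0 and i > 0:
            rates.append(((i, 0), (0, 0), delta0 * frac))
    return rates
```

Multiplying each returned rate by N gives the jump rate of \(y \rightarrow y + \ell /N\); because no rate involves N, this is exactly the density-dependence property.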
(2*) The drift function f(y) of a density-dependent population process, with components \(f_{(i,j)}(y)\) in our case, is defined as \(f(y) = \sum _{\ell \in \mathcal {L}} \beta _{\ell }(y) \ell \). Let \(u_i = \sum _{j=0}^i y_{i,j}\) denote the fraction of servers with estimated queue length i. Given the above discussion of the transitions in \(\mathcal {L}\), we have
\[ \begin{aligned} f_{(i,j)}(y) ={} & y_{i,j+1} - y_{i,j}\, 1[j>0] + \delta _0\, 1[i=j=0] \sum _{s>0} y_{s,0} - \delta _0\, y_{i,0}\, 1[i>0, j=0] \\ & + 1[\kappa (y)<m]\, \lambda \left( \frac{y_{i-1,j-1}}{u_{i-1}}\, 1[i-1=\kappa (y)] - \frac{y_{i,j}}{u_i}\, 1[i=\kappa (y)] \right) \\ & + 1[\kappa (y)=m]\, \lambda \left( - y_{i,j}\, 1[i \ge m] + y_{i-1,j-1}\, 1[i>m] \right), \quad \text {with } y_{i,j} = 0 \text { whenever } j<0 \text { or } j>i, \end{aligned} \tag{19} \]
where \(1[A]=1\) if A is true and \(1[A]=0\) otherwise. The first two terms are due to the service completions, the next two due to the updates and the remaining ones are a result of the arrivals. Note that the \(1[i \ge m]\) and \(1[i > m]\) conditions on the last two terms can be dropped as \(y_{i,j} = 0\) for \(i < m\) when \(\kappa (y) = m\). Further, for ease of presentation, the changes needed due to having a finite B are omitted.
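The drift is simply the rate-weighted sum of jump vectors, \(f(y)=\sum _\ell \beta _\ell (y)\ell \). The following sketch assembles it that way, under the same assumptions as before (exponential job sizes with mean 1, uniform tie-breaking among the smallest estimates, no arrivals to a server whose estimate equals B); `drift` is a hypothetical helper name. For the state \(y_{0,0}=\epsilon \), \(y_{1,1}=1-\epsilon \) with m = 1 it reproduces the values of f(y) used in the example at the end of this appendix.

```python
from collections import defaultdict


def drift(y, lam, delta0, m, B):
    """f(y) = sum over transitions l of beta_l(y) * l, as a dict (i, j) -> value."""
    k = min([m] + [i for (i, _), v in y.items() if v > 0])  # kappa(y)
    u_k = sum(v for (i, _), v in y.items() if i == k)        # mass at estimate k
    f = defaultdict(float)

    def jump(src, dst, beta):
        # l = -e_src + e_dst, occurring at rate N * beta
        f[src] -= beta
        f[dst] += beta

    for (i, j), v in y.items():
        if v <= 0:
            continue
        if j > 0:                                  # service completion
            jump((i, j), (i, j - 1), v)
        if j == 0 and i > 0:                       # update from an idle server
            jump((i, 0), (0, 0), delta0 * v)
        if i < B and k < m and i == k:             # arrival to a smallest estimate
            jump((i, j), (i + 1, j + 1), lam * v / u_k)
        elif i < B and k == m:                     # arrival to a random server
            jump((i, j), (i + 1, j + 1), lam * v)
    return dict(f)


eps, lam = 0.01, 0.6
f = drift({(0, 0): eps, (1, 1): 1 - eps}, lam, 0.5, 1, 4)
```

Since every jump vector has one entry \(-1\) and one entry \(+1\), the components of f(y) always sum to zero, reflecting that the fractions \(y_{i,j}\) sum to one.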
When the drift function f(y) is Lipschitz continuous, Kurtz showed that the sample paths of the stochastic system converge to the solution of the system of ODEs \(dy(t)/dt = f(y(t))\) over any finite time interval [0, T]. In our case, the drift function f is clearly not continuous due to the presence of the \(\kappa (\cdot )\) function. The result of Kurtz was, however, generalized by Gast and Gaujal [18, Theorem 5] to systems whose drifts contain discontinuities. More specifically, define the differential inclusion (DI) \(dy(t)/dt \in F(y(t))\) with \(y(0)=y_0\), where F(y) is the convex closure of the set of all limits \(\lim _n f(y_n)\) along sequences with \(\lim _n y_n = y\). Let \(\mathcal {G}_T(y_0)\) be the set of solutions of the DI on [0, T] with \(y(0)=y_0\), where a solution is an absolutely continuous function y such that \(dy(t)/dt \in F(y(t))\) almost everywhere. Gast and Gaujal [18, Theorem 5] then implies that
\[ \inf _{y \in \mathcal {G}_T(y_0)} \, \sup _{0 \le t \le T} \Vert Y^{(N)}(t) - y(t) \Vert \rightarrow 0 \quad \text {as } N \rightarrow \infty \]
in probability provided that \(\sup _y \sum _{\ell \in \mathcal {L}} \beta _\ell (y) < \infty \) and \(\sum _{\ell \in \mathcal {L}} \Vert \ell \Vert \sup _y \beta _\ell (y) < \infty \). Both conditions hold in our case as \(\mathcal {L}\) is finite and \(\sup _y \beta _\ell (y) \le \max (1,\delta _0)\).
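The sample-path behaviour can be illustrated (not proven) by simulating the N-server system directly. The following is a minimal event-driven sketch in plain Python. It assumes exponential job sizes with mean 1, an unbounded buffer, uniform tie-breaking among the servers with the smallest estimate, and idle servers with a nonzero estimate sending an update at rate \(\delta _0=\delta /(1-\lambda )\) (updates from servers whose estimate is already 0 are no-ops and are skipped). The function `simulate_jut` and its parameters are hypothetical; it returns time averages over the second half of the run.

```python
import random


def simulate_jut(N, lam, delta, m, T, seed=1):
    """Event-driven simulation of N servers under JUT(m), exp(1) job sizes.

    Returns (idle fraction, mean queue length), both time-averaged over [T/2, T].
    """
    rng = random.Random(seed)
    delta0 = delta / (1.0 - lam)          # update rate of an idle server
    q = [0] * N                           # true queue lengths
    est = [0] * N                         # dispatcher's estimated queue lengths
    t = idle_area = q_area = 0.0
    while t < T:
        busy = [s for s in range(N) if q[s] > 0]
        stale = [s for s in range(N) if q[s] == 0 and est[s] > 0]
        r_arr, r_svc, r_upd = lam * N, len(busy), delta0 * len(stale)
        total = r_arr + r_svc + r_upd
        dt = rng.expovariate(total)
        overlap = min(t + dt, T) - max(t, T / 2)   # part of [t, t + dt) in [T/2, T]
        if overlap > 0:
            idle_area += (N - len(busy)) / N * overlap
            q_area += sum(q) / N * overlap
        t += dt
        u = rng.random() * total
        if u < r_arr:                              # job arrival
            e_min = min(est)
            if e_min < m:                          # smallest estimate, ties at random
                s = rng.choice([k for k in range(N) if est[k] == e_min])
            else:                                  # otherwise a uniformly random server
                s = rng.randrange(N)
            q[s] += 1
            est[s] += 1
        elif u < r_arr + r_svc:                    # service completion
            q[rng.choice(busy)] -= 1
        else:                                      # idle server reports queue length 0
            est[rng.choice(stale)] = 0
    return idle_area / (T / 2), q_area / (T / 2)


idle, mean_q = simulate_jut(100, 0.7, 0.2, 2, 200.0)
```

The parameters \(\lambda = 0.7\), \(\delta = 0.2\), \(m = 2\) satisfy \(\lambda > \delta m\). Since each job brings one unit of work on average and no job is lost, the idle fraction should be close to \(1-\lambda \) for any N; the mean queue length is the quantity one would compare with the queue at the cavity as N grows.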
To define the set-valued function F(y), we introduce the vectors \(w^k(y)\) for \(k=0,\ldots , \kappa (y)-1\) with (i, j)-th component given by:
From (19), one finds that the set F(y) is the convex closure of the set \(\{w^0(y),\ldots ,w^{\kappa (y)-1}(y),f(y)\}\). When \(\kappa (y)=m\), this means that F(y) consists of all vectors \({\tilde{f}}(y)\), with components \({\tilde{f}}_{(i,j)}(y)\) of the form
with \(\alpha _{i} \in [0,1]\) and \(\sum _{i=0}^{m} \alpha _{i} =1\).
(3*) Suppose now that we are in a region of the state space where \(\kappa (y)= m\) and \(q_0(y)=\sum _{i\ge 0} y_{i,0} \le 1-\lambda \). To remain in this part of the state space by making a so-called sliding motion, the drift of \(y_{0,0}\) must be zero, so that \(y_{0,0}\) remains zero. Combining (21) with the fact that \(y_{i,j} = 0\) for \(i <m\) when \(\kappa (y)=m\) shows that \(\alpha _0 = \delta _0 q_0(y)/\lambda \). Further, if we demand that \(y_{i,j}\) remains zero for \(0< i < m\), then (21) indicates that \(\alpha _{i}=\alpha _{i-1}\). As the \(\alpha \)’s sum to one, we have \(\alpha _m = 1-\delta _0 q_0(y) m/\lambda \), which lies in [0, 1] when \(q_0(y)\le 1-\lambda \), due to our assumption throughout the paper that \(\lambda > \delta m\) and the fact that \(\delta _0 = \delta /(1-\lambda )\). If we now focus on the region with \(q_0(y)=1-\lambda \) during this sliding motion, we find that \(\alpha _{i} = \delta /\lambda \) for \(0\le i < m\) and \(\alpha _m = 1-\delta m/\lambda = {\tilde{\lambda }}/\lambda \). When we plug these \(\alpha \) values into (21), we find
where the second equality is due to \(\sum _{s >0}y_{s,0} = \sum _{s \ge m}y_{s,0} = 1-\lambda = \delta /\delta _0\) when \(\kappa (y)= m\).
If we sum these drifts over i and use the fact that \(\sum _{i>0} y_{i,0}=1-\lambda \), we find
where \(y_j = \sum _{i\ge j} y_{i,j}\). Recall now that the queue at the cavity for the JUT(m) policy with exponential job sizes is defined as an M/M/1 queue with arrival rate \({\tilde{\lambda }} = \lambda - \delta m\), except that when the queue is empty, there are also batch arrivals of size m that occur at rate \(\delta _0 = \delta /(1-\lambda )\) such that the probability that the queue is idle is given by \(1-\lambda \). By demanding that \(\sum _{i\ge j} {\tilde{f}}_{(i,j)}(y) = 0\) and by replacing \(1-\lambda \) by \(y_0\), we obtain the balance equations of such an M/M/1 queue given by
for \(m \not = j > 0\).
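The sliding-motion weights and the fixed point can be evaluated numerically. In the sketch below, `sliding_weights`, `cavity_distribution` and the truncation level `J` are hypothetical choices. The first function computes \(\alpha _0,\ldots ,\alpha _m\) as derived in (3*); the second solves the balance equations of the exponential queue at the cavity in their equivalent level-crossing form \(y_{j+1} = {\tilde{\lambda }} y_j + \delta _0 y_0 1[j<m]\) (the batch of size m crosses every level below m), truncated at J.

```python
def sliding_weights(lam, delta, m, q0=None):
    """alpha_0..alpha_m of the sliding motion in (3*); q0 defaults to 1 - lam."""
    if q0 is None:
        q0 = 1.0 - lam
    delta0 = delta / (1.0 - lam)
    a0 = delta0 * q0 / lam                 # keeps the drift of y_{0,0} at zero
    return [a0] * m + [1.0 - m * a0]       # alpha_i = alpha_{i-1} for 0 < i < m


def cavity_distribution(lam, delta, m, J=400):
    """P(queue length = j), j = 0..J, for the exponential queue at the cavity.

    Level crossing between j and j+1: y_{j+1} = lam_t*y_j + delta0*y_0*[j < m].
    """
    lam_t, delta0 = lam - delta * m, delta / (1.0 - lam)
    y = [1.0]                              # unnormalised, start from y_0 = 1
    for j in range(J):
        y.append(lam_t * y[j] + (delta0 * y[0] if j < m else 0.0))
    total = sum(y)
    return [v / total for v in y]


alpha = sliding_weights(0.7, 0.2, 2)
y = cavity_distribution(0.7, 0.2, 2)
```

For \(\lambda = 0.7\), \(\delta = 0.2\) and \(m = 2\), this gives \(\alpha = (2/7, 2/7, 3/7)\), an idle probability \(y_0 = 0.3 = 1-\lambda \) and a mean queue length of 9/7, so that Little's law yields a mean response time of \(90/49 \approx 1.84\).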
This completes items (1*) to (3*). To prove convergence of the stationary measures, we must show that the fixed point of (3*) is a global attractor for any solution of the differential inclusion. This could be done by first showing that there is a unique solution and subsequently showing that any trajectory of this solution (for any starting point \(y_0\), including all points with \(\kappa (y_0) < m\)) converges to this fixed point. A sufficient condition for having at most one solution is that the set-valued function F(y) is one-sided Lipschitz. This means that for any \(y, y' \in \mathbb {R}^{(B+1)(B+2)/2}\) and any \(z \in F(y), z' \in F(y')\), we have
\[ \langle y-y', z-z' \rangle \le L \Vert y-y' \Vert ^2 \]
for some constant L, where \(\langle x,y \rangle \) denotes the inner product. The following example shows that the set-valued function F(y) that characterizes our differential inclusion is not one-sided Lipschitz. Let \(m=1\) and let y be such that \(y_{0,0}=\epsilon \) and \(y_{1,1}=1-\epsilon \), which implies that the only nonzero components of f(y) are \(f_{(0,0)}(y)=-\lambda \), \(f_{(1,0)}(y)=1-\epsilon \) and \(f_{(1,1)}(y)=\lambda +\epsilon -1\). Let \(y'\) be such that \(y'_{1,1}=1-2\epsilon \) and \(y'_{2,1}=2\epsilon \); then the nonzero components of \(f(y')\) are given by \(f_{(1,0)}(y')=1-2\epsilon \), \(f_{(1,1)}(y')=-1+2\epsilon -\lambda \), \(f_{(2,2)}(y')=\lambda \), \(f_{(2,0)}(y')=2\epsilon \) and \(f_{(2,1)}(y')=-2\epsilon \). As \(f(y) \in F(y)\), \(f(y') \in F(y')\) and \(\Vert y-y' \Vert ^2 = 6\epsilon ^2\), one-sided Lipschitz continuity would require \(\langle y-y',f(y)-f(y') \rangle \le 6L\epsilon ^2 = O(\epsilon ^2)\). However,
Hence, F(y) is not one-sided Lipschitz and the uniqueness of the solution of the differential inclusion must be proven in some other manner. One possible approach could be to find a change of variables such that the set-valued drift does become one-sided Lipschitz. Once uniqueness of the solution is established, one can try to use monotonicity arguments to prove global attraction of the fixed point in (3*).
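The failure of the one-sided Lipschitz property in the m = 1 example can also be checked numerically. The sketch below rebuilds the drift under the assumptions used earlier (exponential job sizes with mean 1, uniform tie-breaking, and a uniformly random server when \(\kappa (y) = m\)). Because of the random assignment it also keeps arrival terms of order \(\epsilon \) at \(y'\), such as the jump from (2, 1) to (3, 2) at rate \(2\lambda \epsilon \); these do not affect the leading term \(\lambda \epsilon \) of the inner product. The helper names and the values \(\lambda = 0.6\), \(\delta _0 = 0.5\) are hypothetical.

```python
from collections import defaultdict


def drift(y, lam, delta0, m, B):
    """f(y) = sum_l beta_l(y) * l for JUT(m), as a dict (i, j) -> value."""
    k = min([m] + [i for (i, _), v in y.items() if v > 0])  # kappa(y)
    u_k = sum(v for (i, _), v in y.items() if i == k)
    f = defaultdict(float)
    for (i, j), v in y.items():
        if v <= 0:
            continue
        moves = []
        if j > 0:
            moves.append(((i, j - 1), v))                # service completion
        if j == 0 and i > 0:
            moves.append(((0, 0), delta0 * v))           # update from idle server
        if i < B and (k == m or i == k):                 # arrival
            moves.append(((i + 1, j + 1), lam * v if k == m else lam * v / u_k))
        for dst, beta in moves:
            f[(i, j)] -= beta
            f[dst] += beta
    return f


def one_sided_check(eps, lam=0.6, delta0=0.5, B=10):
    """Return (<y - y', f(y) - f(y')>, ||y - y'||^2) for the m = 1 example."""
    y = {(0, 0): eps, (1, 1): 1 - eps}
    yp = {(1, 1): 1 - 2 * eps, (2, 1): 2 * eps}
    fy, fyp = drift(y, lam, delta0, 1, B), drift(yp, lam, delta0, 1, B)
    keys = set(y) | set(yp) | set(fy) | set(fyp)
    dy = {s: y.get(s, 0.0) - yp.get(s, 0.0) for s in keys}
    inner = sum(dy[s] * (fy.get(s, 0.0) - fyp.get(s, 0.0)) for s in keys)
    return inner, sum(d * d for d in dy.values())
```

The inner product behaves like \(\lambda \epsilon \) while \(\Vert y-y'\Vert ^2 = 6\epsilon ^2\), so their ratio grows like \(\lambda /(6\epsilon )\) as \(\epsilon \rightarrow 0\) and no finite L can work.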
B Calculation of second (raw) moment of the response time
In this appendix, we derive a formula for the second (raw) moment \(E[R^2]\) of the response time, which, combined with E[R], yields a formula for the variance Var[R]. We have \(E[R^2] = R^{*\prime \prime }(0)\). By using (8) together with \(G^{*\prime }(0) = -E[G] = -1\), \(-{Y^*}'(0)=E[Y]=E[G^2]/2\) and \(G^{*\prime \prime }(0) = E[G^2]\), we obtain
One readily checks that \(Y^{*\prime \prime }(0)= E[G^3]/3\) and we already know that \(\pi ^\prime (1) = \lambda E[R]\). From (4), we have
Making use of (5), one finds
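The moment identities for Y used above can be sanity-checked numerically. The sketch assumes, consistently with \(E[Y]=E[G^2]/2\) when \(E[G]=1\), that Y has the equilibrium (residual-life) density \(P(G>x)/E[G]\); it integrates \(x^k P(G>x)\) with a midpoint rule for a job size G that is uniform on [0, 2], so that E[G] = 1, \(E[G^2] = 4/3\) and \(E[G^3] = 2\). The function name and the choice of distribution are illustrative only.

```python
def equilibrium_moments(survival, mean_g, upper, k_max=2, n=20_000):
    """E[Y^k] for k = 1..k_max, where Y has density survival(x) / mean_g on [0, upper].

    Uses the midpoint rule with n cells; survival(x) = P(G > x).
    """
    h = upper / n
    xs = [(i + 0.5) * h for i in range(n)]
    weights = [survival(x) * h / mean_g for x in xs]
    return [sum(w * x**k for x, w in zip(xs, weights)) for k in range(1, k_max + 1)]


# G uniform on [0, 2]: E[G] = 1, E[G^2] = 4/3, E[G^3] = 2
ey, ey2 = equilibrium_moments(lambda x: 1 - x / 2, 1.0, 2.0)
print(ey, (4 / 3) / 2)   # E[Y]   versus E[G^2]/2
print(ey2, 2 / 3)        # E[Y^2] versus E[G^3]/3
```

Both comparisons should agree up to the small discretization error of the midpoint rule, in line with \(E[Y]=E[G^2]/2\) and \(Y^{*\prime \prime }(0)=E[Y^2]=E[G^3]/3\).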
We still need to find \(\xi ^{\prime \prime }(1)\). Denote by \({\tilde{R}}\) and W, respectively, the response and waiting time of an ordinary M/G/1 queue with arrival rate \({\tilde{\lambda }}\). By using [13, (5.30)], we get \(\xi ^{\prime \prime }(1) = {\tilde{\lambda }}^2 E[{\tilde{R}}^2]\). As W and G are independent, we have \(E[{\tilde{R}}^2]=E[(W+G)^2] = E[W^2]+2E[W]E[G]+ E[G^2]\). As \(E[G] = 1\) and \(E[W] = \frac{\tilde{\lambda }E[G^2]}{2(1-{\tilde{\lambda }})}\), we obtain \(2E[W]E[G]+ E[G^2] = E[G^2]/(1-{\tilde{\lambda }})\). The second moment \(E[W^2]\) is given by [19, p.256]:
\[ E[W^2] = 2E[W]^2 + \frac{{\tilde{\lambda }} E[G^3]}{3(1-{\tilde{\lambda }})}. \]
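It follows that
\[ \xi ^{\prime \prime }(1) = {\tilde{\lambda }}^2 \left( \frac{{\tilde{\lambda }}^2 E[G^2]^2}{2(1-{\tilde{\lambda }})^2} + \frac{{\tilde{\lambda }} E[G^3]}{3(1-{\tilde{\lambda }})} + \frac{E[G^2]}{1-{\tilde{\lambda }}} \right). \]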
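The M/G/1 ingredients of \(\xi ^{\prime \prime }(1)\) are easy to evaluate. The sketch below (the function name is hypothetical) assumes \(E[G] = 1\) and \({\tilde{\lambda }} < 1\), and uses the Pollaczek-Khinchine mean and the Takács second moment of the waiting time. For exponential job sizes (\(E[G^2] = 2\), \(E[G^3] = 6\)) it recovers the M/M/1 value \(E[{\tilde{R}}^2] = 2/(1-{\tilde{\lambda }})^2\).

```python
def mg1_moments(lam_t, g2, g3):
    """Return E[W], E[W^2], E[R~^2] and xi''(1) for an M/G/1 queue with E[G] = 1.

    lam_t is the arrival rate (= load, since E[G] = 1); g2 = E[G^2], g3 = E[G^3].
    """
    if not 0 < lam_t < 1:
        raise ValueError("need 0 < lam_t < 1 for stability")
    ew = lam_t * g2 / (2 * (1 - lam_t))                 # Pollaczek-Khinchine
    ew2 = 2 * ew**2 + lam_t * g3 / (3 * (1 - lam_t))     # Takacs formula
    er2 = ew2 + g2 / (1 - lam_t)                         # E[(W + G)^2], W indep. of G
    return ew, ew2, er2, lam_t**2 * er2


ew, ew2, er2, xi2 = mg1_moments(0.3, 2.0, 6.0)          # exponential job sizes
```

Only the first three moments of G enter, which matches the structure of the expression for \(\xi ^{\prime \prime }(1)\) above.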
Putting everything together, we get
About this article
Cite this article
Kielanski, G., Hellemans, T. & Van Houdt, B. Join-Up-To(m): improved hyperscalable load balancing. Queueing Syst 105, 291–316 (2023). https://doi.org/10.1007/s11134-023-09897-5
DOI: https://doi.org/10.1007/s11134-023-09897-5