Problem definition and setting
Let us assume there is a planned large, longlasting event that consists of several subevents to be held in a given area. The area is composed of a road network with multiple venues, hotels, train stations and scenic spots that will be origin/destinations (ODs) for walking users. At different periods of each day, different subevents will be held in different venues. During walking outdoors between ODs, pedestrians are at risk of heatstroke. In this study, many factors that affect the heatstroke risk of pedestrians, such as walking distance, solar radiation, pedestrian flow (which is represented simply by ‘flow’ in the following) density, the number and location of relief stations and the supply volume, are taken into the consideration.
The proposed problem is set as optimizing the number, location and supply volume of each temporary station as well as the pedestrian routes in the road network to reduce heatstroke each day during the large outdoor event. It should be emphasized that the solution of the problem corresponds to the scheduling scheme for all the subevents every single day during the event. Before the optimization, there are some preparatory works needed to be done: (a) facilities and POIs extraction; (b) extraction and simplification of the road network for a given area; (c) the extraction and calculation of the heatstroke related data; (d) event schedule collection and pedestrian flow simulation. By handling the preparatory work and solving the optimization problem, we are not only able to provide event holders with a reasonable allocation scheme of relief stations and supply volume but also to recommend walking paths for pedestrians.
The optimization setting in our research can be described as follows:
Assumption:

1.
There will be several inflows before each event and outflows after the event between event venues and other facilities such as places of interests (POIs), hotels and stations.

2.
The location of the relief station cannot be changed during the day, the supply volume can be however reassigned at different times of the day.
Input:

1.
The road network information including nodes, edges, the length of each edge, and the factor value that increases the vulnerability of each edge, the coordinates of each node.

2.
The Environment related data of the given area.

3.
The numbers of time units and simulated flow density at each time interval.

4.
The sets of start nodes and end nodes of all flows.

5.
The maximum number of relief stations and supply volumes of each station, the maximum allowed heatstroke risk of all edges (road) in the road network, the edge set on the path of each flow and the edge set between every two adjacent stations on the path of each flow.
Determine:

1.
The numbers and locations of relief stations on a given day.

2.
The supply volume of each relief station at different time units.

3.
The optimal route is made up of a set of road segments with the least heatstroke risk of different flows at different time units.
To mathematically model the problem, we represent the road network as a graph \(G = \left( {V,E} \right)\) that consists of the vertex set V and edge set E. Specifically, a vertex \(v \in V\) of the road network represents either an origin or destination point of flows or the intersections of the road segments. Then we propose a heatstroke risk metric to measure the risk of each road segment at different times during the events with different indicators including both precalculated parameters and the decision variables to be optimized. With the objective function of minimizing global risk values, a Mixed Integer Nonlinear Programming (MINLP) model [34] is established to work out the optimal solutions. MINLP model refers to a model whose decision variables include integer variables and continuous variables and, at the same time, the objective function or constraint condition contains the nonlinear form of the decision variable. For this type of model is difficult to guarantee the global optimal solution. However, the relatively optimal solution can be obtained by using a metaheuristic algorithm such as genetic algorithm. All notations of the mathematical models that we are going to introduce are listed and described in the section of Nomenclature.
Measuring heatstroke risk
We use a framework of a traditional risk model with heterogeneous data collected from different data sources. A traditional risk model divides risk into three factors which are hazard, vulnerability and exposure. Then a simple approach for measuring emergency risk is realized by multiplication of these three factors:\(R = r_{hazard} \times r_{vulnerability} \times r_{exposurre}\).Generally, the hazard represents the possibility that the emergency happens [35] while vulnerability represents the lack of proper resistance to the emergency, which is dependent on the context information. Finally, exposure refers to the amount of time spent when exposed to the hazard or the number of people involved. Although several studies have been applied to estimate the heatstroke risk on a macro scale [36], few focus on microanalysis, which should have different indicators depending on the distinct micro context. We utilize different micro indicators to implement our risk model for micro heatstroke analysis. The indicators of the hazard, vulnerability and exposure factors are listed as follows:
Hazard
Hazard is measured by Wet Bulb Globe Temperature (WBGT) which has been applied in other heatstrokerelated studies [36, 37]. Specifically, we choose a datadriven approach to measure heatstroke hazard via historical WBGT data at different hours during summertime. We utilize a normalized index W_{t} to represent the probability of the heatstroke severity for each hour t. In particular, for each hour we evaluate the average WBGT in all summer days. Then the average WBGT is normalized by the min and max temperature based on the government guidance.^{Footnote 1}
Vulnerability
Vulnerability is the factor related to the contextual environment. Generally, the vulnerability could be generated by the existing contextual environment, or reduced by improving the environment via temporary service. In this study, vulnerability is denoted by the indicators of a road segment. Sky view factor (SVF) defines the ratio of sky hemisphere visible from the ground that is not obstructed by buildings, terrain or trees [38]. SVF has been proved to be quite an important indicator for computing solar radiation related to heatstroke. Generally, higher SVF in a place denotes more solar radiation, making the place more vulnerable to heatstroke [39]. Therefore, in our model SVF is utilized to measure the vulnerability of the existing contextual environment. On the other hand, the relief stations are set to reduce vulnerability and a station with a larger service volume (e.g., more volunteers) could help more pedestrians. Therefore, Vulnerability \(R_{i,t}^{V}\) could be measured by Eq. (1) with a given SVF value \(V_{i}^{I}\) and the vulnerability reduction indicator \(V_{i,t}^{R}\) computed from the supply volume \(N_{i,t}^{V}\) in each road segment i and time interval t.
$$R_{i,t}^{V} = \frac{{V_{i}^{I} }}{{V_{i,t}^{R} }} = \frac{{V_{i}^{I} }}{{1 + \left( {B_{i}^{S} N_{i,t}^{V} } \right)^{d} }} \, \forall i \in E,t \in T$$
(1)
where \(B_{i}^{S}\) are binary variables that refer to whether there is a relief station on the road segment i,\(N_{i,t}^{V}\) are integer variables that refer to the volumes of all the relief stations on the road segment i at time interval t. Finally, d is an index of vulnerability reduction indicator and is set to 1 in this research.
Exposure
Exposure is measured by the total walking time of all pedestrians for each road segment. To simplify the calculation, we assume all pedestrians walking at the same speed in all road segments and within alltime intervals. As a result, the exposure of heatstroke in this study is proportional to the number of pedestrians and the road length for each road segment. Therefore, for a given road segment i at time interval t, exposure volume is denoted by Eq. (2)
$$R_{i,t}^{E} = L_{i} \sum\limits_{f} {B_{i,f,t}^{P} N_{f,t}^{P} } \, \forall i \in E,f \in F,t \in T$$
(2)
where \(L_{i}\) is the length of road segment i,\(B_{i,f,t}^{P}\) are binary variables referring to whether flow f is observed on edge i at time interval t, while \(N_{f,t}^{P}\) denotes the number of people of flow f at time interval t.
With the factors defined, the risk for flow f in road segment i at time interval t could be denoted by Eq. (3).
$$\begin{gathered} R_{i,f,t} = B_{i,f,t}^{P} \left( {W_{t} } \right)^{a} \left( {R_{i,t}^{V} } \right)^{b} \left( {R_{i,t}^{E} } \right)^{c} \hfill \\ \, = B_{i,f,t}^{P} \left( {W_{t} } \right)^{a} \left( {\frac{{V_{i}^{I} }}{{1 + \left( {B_{i}^{S} N_{i,t}^{V} } \right)^{d} }}} \right)^{b} \left( {L_{i} \sum\limits_{f} {B_{i,f,t}^{P} N_{f,t}^{P} } } \right)^{c} \, \forall i \in E,f \in F,t \in T \hfill \\ \end{gathered}$$
(3)
where \(W_{t}\) is the hazard factor defined by WBGT score during the time interval t.
Optimization model
With the risk metric defined above, this study provides an MINLP model to work out the solutions for facility and path optimization with one objective function and several constraints. The model is solved by genetic algorithm with several strategies to accelerate the computation process.
Objective function
The objective function is to minimize the total heatstroke risk, i.e., the risk value generated by the risk metric introduced in the last section, for all flows during all the events within a time interval T, which can be denoted by Eq. (4):
$$\min \, Risk = \sum\limits_{i} {\sum\limits_{f} {\sum\limits_{t} {R_{i,f,t} } } } \quad \forall i \in E,f \in F,t \in T$$
(4)
Constraints
In this study, several constraints are set either for solutions to meet the predefined parameters or for overcoming the shortages of a single objective function for enabling more practical application. Generally, constraints can be categorized into the following four groups based on their target:

A.
Station constraints
The total number of supply stations established cannot exceed the maximum. It is described as below:
$$B_{i}^{S} = \left\{ \begin{gathered} 1, \, if \, \exists i = E_{s}^{S} \hfill \\ 0, \, if \, \forall i \ne E_{s}^{S} \hfill \\ \end{gathered} \right. \, \forall i \in E,s \in S$$
(5)
$$\sum\limits_{i} {B_{i}^{S} } \le N^{S,max} \, \forall i \in E$$
(6)

B.
Flow path constraints
In this study, since the flows are represented by a set of edges from the given origin and destination nodes, the path constraints mainly focus on edge connectivity and origin–destination connectivity.
Specifically, the origin–destination connectivity constraint denotes that the start (end) edge should be the only edge connected to the origin (destination) node. These constraints are as follows:
$$\sum\limits_{i} {B_{i,f,t}^{P} } = 1 \, \forall i \in E^{SE} ,f \in F,t \in T$$
(7)
$$\sum\limits_{i} {B_{i,f,t}^{P} } = 1 \, \forall i \in E^{EE} ,f \in F,t \in T$$
(8)
For each flow path at time interval t, the connectivity is judged by two Boolean matrices with the size of \(n_{f,t}^{E}\), adjacency matrix \(A_{f,t}\) and the reachability matrix \(P_{f,t}\) of the selected edges. Adjacency is represented by Constraint (11) which means if there is a point connected by both edge i and edge j, then the two edges are adjacent, the value of \(a_{i,j,f,t}\) is 1, otherwise it is 0. Reachability matrix \(P_{f,t}\) is obtained by Boolean addition and Boolean multiplication for adjacency matrix which are described by Eq. (13) and Eq. (14), and constraint (15) ensures that the graph composed of the selected edges is connected.
$$n_{f,t}^{E} = \sum\limits_{i} {B_{i,f,t}^{P} } \quad \forall i \in E,\,f \in F,\,t \in T$$
(9)
$$A_{f,t} = \left( {a_{i,j,f,t} } \right)_{{n_{f,t}^{E} \times n_{f,t}^{E} }} \, \forall i,j \in E^{FP} ,f \in F,t \in T$$
(10)
$$a_{i,j,f,t} = \left\{ \begin{gathered} 1, \, if \, N_{{_{i,f,t} }}^{EF} = N_{{_{j,f,t} }}^{EF} \, or \, N_{{_{i,f,t} }}^{EF} = N_{{_{j,f,t} }}^{ES} \, or \hfill \\ \, N_{{_{i,f,t} }}^{ES} = N_{{_{j,f,t} }}^{EF} \, or \, N_{{_{i,f,t} }}^{ES} = N_{{_{j,f,t} }}^{ES} \, \hfill \\ 0, \, otherwise \hfill \\ \end{gathered} \right. \, \forall i,j \in E^{FP} ,f \in F,t \in T$$
(11)
$$P_{f,t} = \left( {p_{i,j,f,t} } \right)_{{n_{f,t}^{E} \times n_{f,t}^{E} }} \, \forall i,j \in E^{FP} ,f \in F,t \in T$$
(12)
$$P_{f,t} = \bigcup\limits_{k = 1}^{{n_{f,t}^{E} }} {A_{f,t}^{\left( k \right)} } \, \forall f \in F,t \in T$$
(13)
$$A_{f,t}^{\left( k \right)} = A_{f,t}^{{\left( {k  1} \right)}} \odot A_{f,t} \, \forall f \in F,t \in T$$
(14)
$$p_{i,j,f,t} = 1 \quad \forall i,j \in E^{FP} ,f \in F,\,t \in T$$
(15)

C.
Volume constraints
The following volume constraints ensure that the total volume of service number of all supply stations at any time interval cannot exceed the max value. In addition, the constraint that the volume number of supply stations on each road segment should not exceed the maximum function is realized by implementing a sufficiently large constant M.
$$N_{i,t}^{V} = \left\{ \begin{gathered} \sum\limits_{s} {N_{s,t}^{V} \, } , \, if \, i = E_{s}^{S} \hfill \\ 0, \, otherwise \hfill \\ \end{gathered} \right. \, \forall i \in E,s \in S,t \in T$$
(16)
$$\sum\limits_{i} {\left( {B_{i}^{S} N_{i,t}^{V} } \right)} \le N^{V,max} \, \forall i \in E,t \in T$$
(17)
$$N_{i,t}^{V} \ge M\left( {B_{i}^{S}  1} \right) + 1 \, \forall i \in E,t \in T$$
(18)
$$N_{i,t}^{V} \le M\left( {1  B_{i}^{S} } \right) + N^{SV,max} \, \forall i \in E,t \in T$$
(19)
where \(N_{s,t}^{V}\) is the volume of relief station s at time interval t.

D.
Risk constraints
Risk constraints are set to exclude those solutions with concentrated stations in adjacent road segments. Specifically, the constraints should ensure that the value of the risks at each edge (\(R_{i,t}\)), at all edges of each flow (\(\sum\limits_{i} {R_{i,f,t} }\)) and between every two adjacent stations on the path of flow f (\(\, R_{s,k,f,t}\)) should not exceed the predetermined max risk values. The constraints are listed as follows:
$$R_{i,t} \le Ri^{max} \quad \forall i \in E,\,t \in T$$
(20)
$$\sum\limits_{i} {R_{i,f,t} \le Rf^{max} } \quad \forall i \in E,\,f \in F,\,t \in T$$
(21)
$$\, R_{s,k,f,t} \le Rs^{max} \quad \forall s,k \in S,\,f \in F,\,t \in T$$
(22)
Optimization algorithm
The proposed MINLP problem is NPhard and it is timeconsuming to directly apply any solution to the proposed model due to a large number of variables. We then apply several strategies to effectively generate efficient solutions.
In particular, the problem is solved by genetic algorithm (GA). GA is a widely used effective algorithm with good performance of global search and strong robustness. It is suitable for solving complex optimization problems that can be described as mixed linear models or mixed nonlinear models. The GA algorithm in this work starts with a set of the initial population that is generated based on certain requirements rather than randomly generated. This is beneficial to improve the convergence rate and the quality of the solution. Then the algorithm evaluates each individual in the population through the fitness function. A certain proportion of individuals at the top are selected as elite individuals and directly retained in the nextgeneration population. The nextgeneration population also includes children who are reproduced by selected elite individuals through crossover and mutation. With the process of evolution, the solution gradually approaches the optimal solution according to the principle of the survival of the fittest.
Generating initial GA population
Inspired by the assignment of initial solutions to ant colony optimization (ACO) in [40], the proposed method generated an initial GA population with feasible solutions to accelerate the convergence. In this study, we choose a group of paths based on the actual situation of excluding the solutions with large detours. Specifically, we apply Dijkstra algorithm to generate the shortest paths with the least SVF weighted length. Dijkstra is a classic algorithm for finding the shortest path from a given starting vertex x to all n1 other vertices in a positively weighted graph. Then a depthfirst search is applied to acquire all paths with the edge number smaller than or equal to the edge number in the shortest path + 5. Then the initial population can be represented by a combination of different paths for different flows.
Penalty function
To get solutions that satisfy constraints in the presented model, we use a penalty function to exclude the chromosomes that cannot meet the constraints in the evolution. Then the fitness function by which the next generation is bred could be represented by a sum of the objective function and penalty function, which is shown as Eq. (23).
$$\begin{aligned} F\left( {{\mathbf{X}},\delta_{n} } \right) & = f\left( {\text{X}} \right) + \delta_{1} \sum\limits_{i = 1}^{{n^{p} }} {\left( {p_{i} \left( {\mathbf{X}} \right)  1} \right)} + \delta_{2} \sum\limits_{j = 1}^{{n^{v} }} {\max \left\{ {0,\left( {v_{i} \left( {\mathbf{X}} \right)  N^{V,max} } \right)} \right\}} + \delta_{3} \sum\limits_{k = 1}^{{n^{ri} }} {\max \left\{ {0,\left( {ri_{i} \left( {\mathbf{X}} \right)  Ri^{max} } \right)} \right\}} \hfill \\ & \quad + \delta_{4} \sum\limits_{p = 1}^{{n^{rf} }} {\max \left\{ {0,\left( {rf_{i} \left( {\mathbf{X}} \right)  Rf^{max} } \right)} \right\}} + \delta_{5} \sum\limits_{q = 1}^{{n^{rs} }} {\max \left\{ {0,\left( {rs_{i} \left( {\mathbf{X}} \right)  Rs^{max} } \right)} \right\}} \hfill \\ \end{aligned}$$
(23)
where \({\mathbf{X}}\) represents a chromosome, i.e., a potential solution to the problem.\(f\left( {\text{X}} \right)\) is the objective function while the remaining items are penalty functions.\(\delta_{k} \left( {k = 1,2,3,4,5} \right)\) represents the weight of each penalty function which is usually a sufficiently large constant. The more variables that do not meet the constraints, the greater the value of the penalty function. Besides, the objective function is to minimize the risk of heatstroke. Therefore, the smaller the value of fitness, the better the solution. In addition, to further speed up the convergence, we compute the fitness function of the entire population in parallel.