A differential algebra based importance sampling method for impact probability computation on Earth resonant returns of Near Earth Objects

A differential algebra based importance sampling method for uncertainty propagation and impact probability computation on the first resonant returns of Near Earth Objects is presented in this paper. Starting from the results of an orbit determination process, we use a differential algebra based automatic domain pruning to estimate resonances and automatically propagate in time the regions of the initial uncertainty set that include the resonant return of interest. The result is a list of polynomial state vectors, each mapping specific regions of the uncertainty set from the observation epoch to the resonant return. Then, we employ a Monte Carlo importance sampling technique on the generated subsets for impact probability computation. We assess the performance of the proposed approach on the case of asteroid (99942) Apophis. A sensitivity analysis on the main parameters of the technique is carried out, providing guidelines for their selection. We finally compare the results of the proposed method to standard and advanced orbital sampling techniques.


INTRODUCTION
Over the last thirty years, significant efforts have been devoted to developing new tools for the detection and prediction of planetary encounters and potential impacts by Near Earth Objects (NEO). The task poses significant challenges due to the imperative of early detection and the accurate estimation and propagation of their state and associated uncertainty set (Chesley 2005). The problem is made more complicated by the fact that the dynamics describing the motion of these objects is highly nonlinear, especially during close encounters with major bodies. Nonlinearities of the orbital dynamics tend to significantly stretch the initial uncertainty sets during the time propagation. Nonlinearities are not confined to the object dynamics only: even simple conversions between coordinate systems introduce nonlinearities, thus affecting the accuracy of classical propagation techniques (Wittig et al. 2015). Present day approaches for the robust detection and prediction of planetary encounters and potential impacts by NEO mainly refer to linearised models or full nonlinear orbital sampling (Farnocchia et al. 2015). The impact probability computation by means of linear methods in the impact plane was introduced by Chodas (1993), whereas the introduction of the Monte Carlo technique to this problem was developed by Yeomans & Chodas (1994) and Chodas & Yeomans (1999), who suggested applying the method to sample the linear six dimensional confidence region at the observation epoch and then numerically integrating over the time interval of investigation using fully nonlinear equations (Milani et al. 2002). Milani et al. (1999), Milani (1999) and Milani et al. (2000a,b) applied the multiple solutions approach to sample the central Line of Variations (LOV) of the nonlinear confidence region at the initial epoch and then numerically integrate over the time span of interest in a similar way.
E-mail: matteo.losacco@polimi.it
Within the framework of the impact probability computation of resonant returns, a well-known approach relies on the concept of keyholes: small regions of the impact plane of a specific close encounter such that, if an asteroid passes through one of them, it will hit the Earth on a subsequent return (Gronchi & Milani 2001; Milani et al. 2002; Valsecchi et al. 2003).
The preferred approach to detecting potential impacts depends on the uncertainty of the estimated orbit, the investigated time window and the dynamics between the observation epoch and the epoch of the expected impact (Farnocchia et al. 2015). Linear methods are preferred when linear approximations are reliable for both the orbit determination and the uncertainty propagation. When these assumptions are not valid, one must resort to more computationally intensive techniques: among these, Monte Carlo methods are the most accurate but also the most computationally intensive, whereas the LOV method guarantees compute times 3-4 orders of magnitude lower than those required by MC simulations, though the LOV analysis may grow quite complex after the LOV has been stretched and folded by multiple close planetary encounters, leaving open the possibility of missing some pathological cases (Farnocchia et al. 2015).
Alternative approaches rely on the use of Differential Algebra (DA). Differential algebra supplies the tools to compute the derivatives of functions within a computer environment, i.e. it provides the Taylor expansion of the flow of Ordinary Differential Equations (ODEs) by carrying out all the operations of any explicit integration scheme in the DA framework (Berz 1999; Wittig et al. 2015). DA has already proven its efficiency in the nonlinear propagation of uncertainties (Armellin et al. 2010b; Morselli et al. 2012; Valli et al. 2013). Nonetheless, the accuracy of the method drastically decreases in highly nonlinear dynamics. The propagation of asteroid motion after a close encounter with a major body is a typical case.
A DA based automatic domain splitting algorithm was presented by the authors in the past to overcome the limitations of simple DA propagation (Wittig et al. 2014a,b, 2015). The method can accurately propagate large sets of uncertainties in highly nonlinear dynamics and over long time spans. The propagation algorithm automatically splits the initial uncertainty domain into subsets when the polynomial expansions representing the current state do not meet predefined accuracy requirements. The performance of the algorithm was assessed on the case of asteroid (99942) Apophis, providing a description of the evolution of the uncertainty set to the epoch of predicted close encounters with Earth in 2036 and 2037 (Wittig et al. 2015). Though representing a significant improvement with respect to simple DA propagation, the approach required a non-negligible computational effort in propagating the whole set of generated subdomains. Moreover, no information about the impact probability for asteroid Apophis was provided, as the propagation of the uncertainty set was stopped before the close encounters.
We present in this paper an evolution of the automatic domain splitting algorithm. The method, referred to as automatic domain pruning, automatically identifies possible resonances after a close encounter with a major body. Then, assuming no intervening close approaches with other celestial bodies in between, it optimizes the propagation to the first resonant returns by limiting the propagation of the uncertainty set to the regions that generate a close encounter with that celestial body at the investigated epoch. The result is a list of polynomial state vectors, each mapping only specific subsets of the initial domain to the resonant return epoch. Taking advantage of the availability of the polynomial maps, a DA based Monte Carlo importance sampling technique is then used to generate samples in the propagated subsets and provide an estimate of the impact probability at the epoch of the selected resonant return. The proposed approach does not apply any simplification step to the uncertainty domain associated with the orbit determination process. Thus, the method is proposed as an alternative to equivalent techniques, such as a full Monte Carlo simulation or other six-dimensional orbital sampling techniques, which will represent the main term of comparison for our analysis. The paper is organized as follows. First, we present a description of the automatic domain pruning and importance sampling techniques, showing the application to the case of the first resonant return. Then, we apply the method to the critical case of asteroid (99942) Apophis, providing an estimate of the impact probability for the resonant return in 2036. Finally, we carry out a sensitivity analysis on the main parameters of the method, presenting a comparison with standard and advanced orbital sampling techniques.

DIFFERENTIAL ALGEBRA AND AUTOMATIC DOMAIN SPLITTING
Differential algebra provides the tools to compute the derivatives of functions within a computer environment (Ritt 1932, 1948; Risch 1969, 1970; Kolchin 1973; Berz 1999). Historically, the treatment of functions in numerics has been based on the treatment of numbers, and the classical numerical algorithms are based on the evaluation of functions at specific points. The basic idea of DA is to bring the treatment of functions and the operations on them to the computer environment in a similar way as the treatment of real numbers (Berz 1999). Real numbers, indeed, are approximated by floating point (FP) numbers with a finite number of digits. With reference to Fig. 1, let us consider two real numbers a and b, and their FP counterparts ā and b̄, respectively: given any operation × in the set of real numbers, an adjoint operation ⊗ is defined in the set of FP numbers so that the diagram in the figure commutes. Consequently, transforming the real numbers a and b into their FP representations and operating on them in the set of FP numbers returns the same result as carrying out the operation in the set of real numbers and then transforming the achieved result into its FP representation. In a similar way, suppose two sufficiently regular functions f and g are given. In the framework of DA, these functions are converted into their Taylor series expansions, F and G respectively. In this way, the transformation of real numbers into their FP representation is now substituted by the extraction of the Taylor expansions of f and g (see Fig. 1, right). For each operation in the function space, an adjoint operation in the space of Taylor polynomials is defined such that the corresponding diagram commutes.
The implementation of DA in a computer environment provides the Taylor coefficients of any function of v variables up to a specific order n. More specifically, by substituting classical real algebra with the implementation of a new algebra of Taylor polynomials, any function f of v variables can be expanded into its Taylor expansion up to an arbitrary order n, along with the function evaluation, with a limited amount of effort. The Taylor coefficients of order n for sum and product of functions, as well as scalar products with real numbers, can be directly computed from those of summands and factors. As a consequence, the set of equivalence classes of functions can be endowed with well-defined operations, leading to the so-called truncated power series algebra. In addition to basic algebraic operations, differentiation and integration can be easily introduced in the algebra, thus finalizing the definition of the differential algebra structure of DA (Berz 1986, 1987). The DA used in this work is implemented in the DACE software (Rasotto et al. 2016).
A relevant application of DA is the automatic high order expansion of the solution of an ODE with respect to the initial conditions (Berz 1999; Di Lizia et al. 2008; Rasotto et al. 2016). This expansion can be achieved by considering that any integration scheme, explicit or implicit, is characterized by a finite number of algebraic operations, involving the evaluation of the ODE right hand side (RHS) at several integration points. Therefore, replacing the operations between real numbers with those on DA numbers yields the nth order Taylor expansion of the flow of the ODE, φ(t; δx0, t0) = M_φ(δx0), at each integration time, for a perturbed initial condition x0 + δx0. Without loss of generality, consider the scalar initial value problem

$$\dot{x} = f(x, t), \qquad x(t_0) = x_0, \tag{1}$$

and the associated flow φ(t; δx0, t0). For simplicity, consider uncertain initial conditions only. Starting from the nth order DA representation of the initial condition, [x0] = x0 + δx0, which is an (n+1)-tuple of Taylor coefficients, and performing all the operations in the DA framework, we can propagate the Taylor expansion of the flow in x0 forward in time, up to the final time tf. Consider, for example, the forward Euler scheme

$$x_{i+1} = x_i + f(x_i, t_i)\,\Delta t \tag{2}$$

and replace the initial value with the DA expression [x0] = x0 + δx0. The first time step yields

$$[x_1] = [x_0] + f([x_0], t_0)\,\Delta t. \tag{3}$$

If the function f is evaluated in the DA framework, the output of the first step, [x1], is the nth order Taylor expansion of the flow φ(t; δx0, t0) in x0 for t = t1. Note that, as a result of the DA evaluation of f([x0]), the (n+1)-tuple [x1] may include several nonzero coefficients corresponding to high order terms in δx0. The previous procedure can be repeated for the subsequent steps. The result at the final step is the nth order Taylor expansion of φ(t; δx0, t0) in x0 at the final time tf. Thus, the flow of a dynamical system is approximated, at each time step ti, with its nth order Taylor expansion with a fixed amount of effort.
Any explicit ODE integration scheme can be rewritten as a DA scheme. For the numerical integrations presented in this paper, a DA version of a 7/8 Dormand Prince (8th order solution for propagation, 7th order solution for step size control) Runge Kutta scheme is used.
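To make the mechanism concrete, the following is a minimal, self-contained Python sketch of the idea (not the DACE implementation, and all names are illustrative): a univariate truncated Taylor polynomial class whose arithmetic replaces real arithmetic inside a forward Euler loop for the toy dynamics ẋ = −x², so that the final state is a polynomial in the initial displacement δx0.

```python
import numpy as np

N = 5  # truncation order of the Taylor expansions

class Taylor:
    """Minimal univariate truncated Taylor polynomial in dx0 (order N)."""
    def __init__(self, coeffs):
        self.c = np.zeros(N + 1)
        c = np.asarray(coeffs, dtype=float)
        self.c[:len(c)] = c[:N + 1]

    def __add__(self, other):
        other = other if isinstance(other, Taylor) else Taylor([other])
        return Taylor(self.c + other.c)
    __radd__ = __add__

    def __mul__(self, other):
        if not isinstance(other, Taylor):        # scalar multiplication
            return Taylor(self.c * other)
        out = np.zeros(N + 1)                    # truncated Cauchy product
        for i in range(N + 1):
            out[i:] += self.c[i] * other.c[:N + 1 - i]
        return Taylor(out)
    __rmul__ = __mul__

    def eval(self, dx0):
        return sum(a * dx0**k for k, a in enumerate(self.c))

def euler_da(x0, dt=0.01, steps=100):
    """Forward Euler for dx/dt = -x^2 with DA initial condition [x0] = x0 + dx0."""
    x = Taylor([x0, 1.0])                # constant part x0, linear part dx0
    for _ in range(steps):
        x = x + (-1.0) * (x * x) * dt    # every operation acts on polynomials
    return x                             # Taylor expansion of the flow at t_f

flow = euler_da(1.0)
```

Evaluating `flow.eval(0.1)` reproduces, to within the truncation error of the order-5 expansion, a pointwise Euler integration started from the displaced initial condition 1.1: a single DA integration yields a map that can be cheaply re-evaluated for many initial displacements, which is what the sampling phase later exploits.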
The main advantage of the DA based approach is that there is no need to write and integrate variational equations to obtain high order expansions of the flow. It is therefore independent of the RHS of the ODE, and it is computationally efficient. Unfortunately, DA fails to accurately describe, with a single polynomial map, the evolution in time of an uncertainty set in the case of highly nonlinear dynamics or long term propagation. The approximation error is strictly related to the size of the domain the polynomial is defined on (Wittig et al. 2015). The approximation error between an (n+1) times differentiable function f ∈ C^(n+1) and its Taylor expansion P_f of order n, without loss of generality taken around the origin, is given by Taylor's theorem:

$$|f(\delta x) - P_f(\delta x)| \leq C\,|\delta x|^{n+1} \tag{4}$$

for some constant C > 0. Consider now the maximum error e_r of P_f on a domain B_r of radius r > 0 around the expansion point. Considering equation (4), we obtain:

$$e_r \leq C\, r^{n+1}. \tag{5}$$

If the domain of P_f is reduced from B_r to B_{r/2} of radius r/2, the maximum error of P_f over B_{r/2} will decrease by a factor 1/2^(n+1):

$$e_{r/2} \leq C \left(\frac{r}{2}\right)^{n+1} = \frac{C\, r^{n+1}}{2^{n+1}}. \tag{6}$$

By subdividing the initial domain into smaller domains and computing the Taylor expansion around the centre points of the new domains, the error greatly reduces, whereas the expansions still cover the entire initial set. Starting from these considerations, Automatic Domain Splitting (ADS) employs an automatic algorithm to determine at which time ti the flow expansion over the set of initial conditions is no longer able to describe the dynamics with enough accuracy (Wittig et al. 2015). Once this case has been detected, the domain of the original polynomial expansion is divided along one of the expansion variables into two domains of half their original size. By re-expanding the polynomials around the new centre points, two separate polynomial expansions are obtained.
Denoting by x_i the splitting direction, both generated polynomial expansions P_1 and P_2 have terms of order n in x_i smaller by a factor 2^n with respect to the original polynomial expansion P. Thus, the splitting procedure guarantees a more accurate description of the whole uncertainty set at the current time epoch t_i. After such a split occurs, the integration process is resumed on both generated subsets, until new splits are required. A representation of the ADS procedure is shown in Fig. 2.
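The error scaling that motivates splitting can be checked numerically. The sketch below is an illustrative example not tied to any specific dynamics: it measures the maximum error of the order-5 Taylor polynomial of exp(x) on an interval of radius r and on the halved interval, and the ratio comes out close to the theoretical factor 2^(n+1) = 64.

```python
import numpy as np
from math import factorial, exp

n = 5  # expansion order

def P(x):
    """Order-5 Taylor polynomial of exp(x) around 0."""
    return sum(x**k / factorial(k) for k in range(n + 1))

def max_error(r, m=2001):
    """Maximum |exp(x) - P(x)| over the interval B_r = [-r, r] (dense grid)."""
    xs = np.linspace(-r, r, m)
    return max(abs(exp(x) - P(x)) for x in xs)

r = 0.5
e_r = max_error(r)
e_half = max_error(r / 2)
print(e_r / e_half)   # close to (though not exactly) 2**(n+1) = 64
```

The ratio is not exactly 64 because the bound in Taylor's theorem is asymptotic: the neglected terms of order above n+1 also shrink when the domain is halved, which is precisely why repeated splits are so effective.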
The decision on the splitting epoch and, in the case of multivariate polynomials, on the splitting direction relies on estimating the size of the (n+1)th order terms of the polynomial using an exponential fit of the size of all the known nonzero terms up to order n. If the size of the truncated order becomes too large, we decide to split the polynomial. This method allows us to consider all the information available in the polynomial expansion and to obtain an accurate estimate of the size of the (n+1)th order term, the first discarded order. The exponential fit is chosen because, after reducing the domain with a sufficient number of splits, the coefficients of the resulting polynomial expansion decay exponentially, as a direct consequence of Taylor's theorem. A mathematical description is offered hereafter and follows the scheme presented in Wittig et al. (2015). Consider a polynomial P of order n written using multi-index notation,

$$P(\mathbf{x}) = \sum_{|\alpha| \leq n} a_{\alpha}\, \mathbf{x}^{\alpha}. \tag{7}$$

The size S_i of the terms of order i is computed as the sum of the absolute values of all coefficients of exact order i:

$$S_i = \sum_{|\alpha| = i} |a_{\alpha}|. \tag{8}$$

We denote by I the set of indices i for which S_i is nonzero. A least squares fit of the exponential function

$$f(i) = A\, e^{B i} \tag{9}$$

is used to determine the coefficients A and B such that f(i) = S_i, i ∈ I, is approximated optimally in the least squares sense. Then, the value of f(n+1) is used to estimate the size S_{n+1} of the truncated order n+1 of P. An example of the application of the method is shown in Fig. 3, where the polynomial is the Taylor expansion of 1 + x/2 up to order 9. The size S_i of each order is shown as bars, whereas the resulting fitted function f is shown as a line.
In the case of multivariate polynomials P(x) = P(x_1, x_2, . . . , x_v), the split is performed in one component x_i. We determine the splitting direction using a method similar to the one adopted for the splitting decision. For each j = 1, . . . , v we begin by factoring the known coefficients of P of order up to n with respect to x_j, i.e.

$$P(\mathbf{x}) = \sum_{m=0}^{n} x_j^{m}\, q_{j,m}(x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_v), \tag{10}$$
where the polynomials q j,m do not depend on x j . The size S j,m of the polynomials q j,m is estimated by the sum of the absolute values of their coefficients. Then, the exponential fitting routine is applied to estimate the size S j,n+1 of the truncated terms of order n + 1 in x j . Finally, the splitting direction i is chosen as the component x j with the largest truncation error S j,n+1 . In this way, all splits are performed in the direction of the variable that currently has the largest estimated contribution to the total truncation error of the polynomial P.
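A minimal Python sketch of the truncation-error estimator described above, fitting the exponential by least squares on the logarithms of the order sizes (the function name is illustrative): for a polynomial whose order sizes decay exactly exponentially, such as the Taylor coefficients of 1/(1 − x/2), the fit recovers the size of the first truncated order.

```python
import numpy as np

def estimate_truncated_size(sizes):
    """Least-squares exponential fit f(i) = A*exp(B*i) through the nonzero
    order sizes S_0..S_n; returns the estimate f(n+1) of the truncated order."""
    n = len(sizes) - 1
    I = [i for i, s in enumerate(sizes) if s > 0.0]   # indices with S_i != 0
    B, logA = np.polyfit(I, np.log([sizes[i] for i in I]), 1)
    return np.exp(logA + B * (n + 1))

# toy polynomial: Taylor coefficients of 1/(1 - x/2), c_i = 2**-i, so S_i = 2**-i
sizes = [0.5**i for i in range(10)]        # orders 0..9
S10_est = estimate_truncated_size(sizes)   # recovers 2**-10
```

For the splitting-direction choice, the same fitting routine would simply be run once per variable on the sizes S_{j,m} of the factored polynomials, and the variable with the largest estimated S_{j,n+1} selected.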
The main parameters of the algorithm are the tolerance for the splitting procedure and the maximum number of allowed splits N max . The first parameter is selected according to the required precision of the polynomial expansions and determines the splitting epochs: when the estimated truncation error exceeds the imposed tolerance, the current domain is split. As a direct consequence of the ADS procedure, the maximum error over the obtained set of polynomials decreases with the selected splitting precision. However, the maximum error is always larger than the selected integration precision. This difference is actually expected, as the splitting tolerance plays a similar role as the one-step error set in the automatic step size control of the integration scheme (Wittig et al. 2015). It is the maximum error that can accumulate at any time before the integrator takes action to reduce further error accumulation. However, the accumulated error at the time of the splitting cannot be undone as the splitting only re-expands the polynomial to prevent exponential growth in future integration steps. The ideal tolerance depends on both the dynamics and the integration time, and it has to be chosen heuristically to ensure the final result satisfies the accuracy requirements of the application. A numerical example is shown in Section 7.1.
The second parameter plays the role of limiting the number of generated subdomains by imposing a minimum size for the generated subsets: domain splitting is disabled on any set whose volume is less than 2 −N max times that of the initial domain. That is, any set is split at most N max times. Then, instead of splitting a set further, integration is stopped at the attempt to perform the (N max +1)th split and the resulting polynomial expansion is saved as incomplete. Incomplete polynomials are later treated separately in the analysis of the results (Wittig et al. 2015).
When each generated subset reaches either the final simulation time or the minimum box size, the ADS propagation terminates, and the result is a list of polynomial expansions, each covering a specific subset of the domain of initial conditions. A more detailed description of the ADS algorithm can be found in Wittig et al. (2015).

AUTOMATIC DOMAIN PRUNING
As described in Section 2, automatic domain splitting provides an accurate description of the evolution in time of a given uncertainty set by splitting the domain into subsets when required. Unfortunately, this method may entail a non-negligible computational effort, as all generated subsets are propagated to the final simulation time or until the minimum box size is reached. While this approach is unavoidable when the behaviour of the whole uncertainty set is analysed, it becomes a strong limitation when only a portion of the initial set is to be investigated. This is the case when the first resonant return of a Near Earth Object is studied. Resonant returns occur when, during a close encounter, an asteroid is perturbed into an orbit with a period T ∼ k/h years. After h revolutions of the asteroid and k revolutions of the Earth, both celestial bodies return to the region of the first close encounter, and a second encounter may occur. Given the initial uncertainty set, only a portion of it may lead to the resonant return. It would therefore be interesting to limit the propagation to this region only.
Starting from these considerations, the Automatic Domain Pruning (ADP) we present in this paper combines the ADS algorithm with a pruning technique with the aim of limiting the number of propagated subsets. We make here the assumption of no close approaches with other celestial bodies between the first close encounter and the selected resonant return. This assumption is easily checked right before the ADP propagation, as later explained in Section 7.2.
The first phase of the algorithm consists in propagating the whole uncertainty set by means of ADS propagation up to the epoch of the first close encounter. The availability of the polynomial expansion of the state vector of the object with respect to the initial uncertainty provides the polynomial expansion of the orbital period of the object after the close encounter. By using a polynomial bounder, we can estimate the range of all possible values of the orbital period after the close encounter and, thus, retrieve all possible resonances with the planet, i.e. all orbital periods included in the computed orbital period range leading to a resonant return with the planet.
Once all resonances are computed, the analysis focuses on a single resonance, and the propagation is resumed. Every time a new subset is generated, the method automatically identifies whether the set may lead to the investigated resonant return or not. By exploiting the knowledge of the DA state vector at the epoch of the first close encounter, indeed, we can assign a given orbital period range to each generated subset. This range, defined as ∆T_sub, is compared to a reference range ∆T_ref associated with the investigated resonance. We select the reference range in order to account for small dynamical perturbations between the first close encounter and the resonant return. If ∆T_sub is at least partially included in the reference range, then the current subset is retained, and its propagation is continued. If ∆T_sub is not included in the reference range, then the initial conditions included in the current subset do not lead to a resonant return at the investigated epoch, and the subset is discarded. In this way, subsets are dynamically pruned during the ADS propagation. An illustration of the ADP algorithm is shown in Fig. 4.
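The resonance enumeration and the pruning test can be sketched as follows. This is an illustrative toy, assuming the polynomial bounder has already produced bounds [T_min, T_max] on the post-encounter orbital period in years; `find_resonances` and `keep_subset` are hypothetical helper names, not part of the actual implementation.

```python
from fractions import Fraction

def find_resonances(T_min, T_max, max_rev=8):
    """Enumerate resonant periods T = k/h (years) inside the period range
    bounded by the polynomial bounder: h asteroid revolutions in k Earth years."""
    res = {Fraction(k, h)
           for h in range(1, max_rev + 1)
           for k in range(1, max_rev + 1)
           if T_min <= k / h <= T_max}
    return sorted(res)

def keep_subset(dT_sub, dT_ref):
    """ADP pruning test: a subset survives iff its orbital period range
    (a (low, high) pair) at least partially overlaps the reference range."""
    return dT_sub[0] <= dT_ref[1] and dT_ref[0] <= dT_sub[1]

# a period range straddling ~1.17 yr contains, among others, the 7/6
# resonance: 6 asteroid revolutions in 7 Earth years
print(find_resonances(0.95, 1.25))   # [1, 8/7, 7/6, 6/5, 5/4]
```

A post-encounter period near 7/6 yr, for instance, brings the asteroid back to the encounter region 7 Earth years later, which is the arithmetic behind the 2029–2036 pairing discussed for Apophis below.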
The ADP algorithm, therefore, does not alter the sequence of generated subdomains, but limits the propagation in time to those subsets that are involved in the investigated resonant return. This pruning action has a positive impact on the overall computational burden, since the computational effort required by the propagation of all the discarded subsets is saved. As only subsets with close approaches to the Earth at the epoch of the investigated resonant return are maintained, the result at the end is a set of subdomains whose propagation stops slightly before the epoch of the investigated resonant return for having reached their minimum box size.

Figure 4. ADP algorithm illustration. Pruning is performed by comparing the estimated subset orbital period range ∆T_sub with the reference range ∆T_ref.

IMPORTANCE SAMPLING METHOD
The output of the ADP propagation is a list of subsets at epochs close to the investigated resonant return. Still, no value for the impact probability is available. We obtain an estimate of the impact probability by sampling the generated subsets and propagating the samples until they reach their minimum geocentric distance. Among all possible sampling techniques, we employ the Importance Sampling (IS) method (Zio 2013). The IS method amounts to replacing the original probability density function (pdf) q_x(x) with an Importance Sampling Distribution (ISD) q̃_x(x), arbitrarily chosen by the analyst so as to generate a large number of samples in the importance region of the phase space F, the region of initial conditions leading to an impact with the Earth at the epoch of the resonant return. The impact probability can indeed be written as

$$p(F) = \int I_F(\mathbf{x})\, q_{\mathbf{x}}(\mathbf{x})\, \mathrm{d}\mathbf{x} = \int I_F(\mathbf{x})\, \frac{q_{\mathbf{x}}(\mathbf{x})}{\tilde{q}_{\mathbf{x}}(\mathbf{x})}\, \tilde{q}_{\mathbf{x}}(\mathbf{x})\, \mathrm{d}\mathbf{x}, \tag{11}$$

where I_F(x) is the indicator function of F, equal to 1 if x ∈ F and 0 otherwise. In the case under study, we select the auxiliary distribution in order to limit as much as possible the generation of the samples to the subsets that get through the dynamic pruning. The IS algorithm is the following:

(i) Define the ISD q̃_x(x).
(ii) Draw N samples x^k, k = 1, . . . , N, from the ISD. If a good choice for the auxiliary pdf is made, the generated samples concentrate in the region F.
(iii) Propagate each sample and evaluate the indicator function I_F(x^k).
(iv) Compute the estimate p̂(F) for the impact probability p(F) by resorting to equation (11):

$$\hat{p}(F) = \frac{1}{N} \sum_{k=1}^{N} I_F(\mathbf{x}^k)\, \frac{q_{\mathbf{x}}(\mathbf{x}^k)}{\tilde{q}_{\mathbf{x}}(\mathbf{x}^k)}. \tag{12}$$

(v) Compute the variance of the estimator p̂(F) as:

$$\sigma^{2}\!\left(\hat{p}(F)\right) = \frac{1}{N}\left[\frac{1}{N}\sum_{k=1}^{N} \left(I_F(\mathbf{x}^k)\, \frac{q_{\mathbf{x}}(\mathbf{x}^k)}{\tilde{q}_{\mathbf{x}}(\mathbf{x}^k)}\right)^{2} - \hat{p}^{2}(F)\right]. \tag{13}$$

The selection of the ISD represents the most critical point of the method. Several techniques have been developed in order to find the one giving a small variance for the estimator (Zio 2013). In this paper, we shape the ISD according to the result of the ADP propagation. As described in Section 3, the ADP propagation provides a list of subsets whose propagation is stopped slightly before the resonant return. All subsets are identified as Potentially Hazardous Subdomains (PHS's), but no probability ranking is provided by the ADP propagation. Starting from these considerations, we define the ISD as a uniform probability density function covering all the generated subsets. This selection allows us to increase the number of samples drawn in the PHS's and, eventually, in the impact-leading region.
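A one-dimensional toy version of the IS estimator and its variance may help fix ideas: here the "impact region" is the tail x > 3 of a standard normal, and the ISD is uniform over a single interval standing in for the surviving subsets. All choices (pdf, interval, sample size) are illustrative.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)

def q(x):
    """Original pdf: standard normal (illustrative stand-in for the
    orbit-determination pdf)."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

# impact region F = {x > 3}; the "PHS" found by the pruning covers [3, 8]
a, b = 3.0, 8.0
q_tilde = 1.0 / (b - a)          # uniform ISD over the surviving subset

Ns = 100_000
x = rng.uniform(a, b, Ns)        # all samples land inside the PHS
w = (x > 3.0) * q(x) / q_tilde   # indicator times likelihood ratio
p_hat = w.mean()                 # IS estimate of p(F)
var_hat = (np.mean(w**2) - p_hat**2) / Ns

p_true = 0.5 * (1.0 - erf(3.0 / sqrt(2.0)))   # ≈ 1.35e-3
```

Sampling the same tail directly from q would waste roughly 999 samples in 1000 outside F; the uniform ISD concentrates all of them in the region of interest, which is exactly the role played by the PHS's in the method.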

AUTOMATIC DOMAIN PRUNING IMPORTANCE SAMPLING METHOD
The combination of the methods presented in Sections 3 and 4 yields the ADP importance sampling method (ADP-IS) for uncertainty propagation and impact probability computation of the first resonant returns of NEO. The starting point is represented by the output of an orbit determination process of a given NEO at the observation epoch t 0 . This output can be expressed in terms of estimated state vector and related covariance matrix. Then, the steps of the ADP propagation phase are the following: (i) Consider the initial state vector and related pdf and perform an analysis to identify possible epochs of close encounters and resonant returns. The analysis is carried out by propagating the uncertainty set using ADS up to the first close encounter, computing the semi-major axis dispersion over the set with a polynomial bounder and identifying the resonant frequencies. The validity of the resonances is then checked as explained in Section 7.2.
(ii) Select a resonance and identify its epoch t res .
(iii) Perform an ADP propagation till the epoch t_res. Every time a split is required, compare the orbital period range of the current subset ∆T_sub with the reference range ∆T_ref: the subset is retained and further propagated only if ∆T_sub at least partially overlaps ∆T_ref, and it is discarded otherwise.

The method provides a set of n_PHS PHS's and the related polynomial state vectors [x_f^i], each component being a function of the initial conditions x_0^i. The IS phase is initialized by setting the value of the estimated impact probability p̂_old and the number of iterations n_it equal to zero. Then, the steps of the algorithm are the following:

(i) Define the ISD function q̃_x(x) as a uniform pdf including all the generated PHS's.
(ii) Set n_it = n_it + 1 and draw one sample x_0^it from q̃_x(x).
(iii) Check whether the sample belongs to one of the PHS's: if it is outside all PHS's, go back to step (ii); otherwise, identify the PHS i the sample belongs to.
(iv) Compute the algebraic state vector x_f^it corresponding to the drawn sample x_0^it at the truncation epoch t_f^i by performing a polynomial evaluation of the DA state vector [x_f^i].
(v) Numerically propagate the sample state from the truncation epoch t_f^i until the epoch of its closest approach with the Earth.
(vi) Compute the minimum geocentric distance d_res^it and evaluate the indicator I_F^it.
(vii) If I_F^it = 0, go back to step (ii); otherwise, evaluate the new impact probability estimate p̂_new. By reformulating equation (12), we obtain

$$\hat{p}_{\mathrm{new}} = \frac{1}{n_{it}\, \tilde{q}_{\mathbf{x}}} \sum_{k=1}^{n_{it}} I_F(\mathbf{x}_0^k)\, q_{\mathbf{x}}(\mathbf{x}_0^k),$$

where the summation accumulates the contributions of the impact-leading samples of the previous iterations. The total number of samples considered for the estimation is n_it, i.e. the number of drawn samples when the estimate is computed. Note that, since the ISD is uniform over the whole set of PHS's, it can be extracted from the summation.
(viii) Compare p̂_old and p̂_new: if the relative difference is larger than an imposed tolerance, set p̂_old = p̂_new and go back to step (ii); otherwise, stop.
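The iterative loop of steps (ii)-(viii) can be sketched in one dimension as follows. The `impact` check is a stand-in for the polynomial evaluation and pointwise propagation of steps (iv)-(vi), the single interval stands in for the union of PHS's, and the `min_hits` safeguard is an addition of this sketch (not part of the algorithm above) that prevents a spurious early stop before the running estimate has seen enough impact-leading samples.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(7)

def q(x):
    """Original pdf (standard normal, illustrative)."""
    return np.exp(-0.5 * x * x) / np.sqrt(2.0 * np.pi)

a, b = 3.0, 6.0                  # single PHS; uniform ISD over it
q_tilde = 1.0 / (b - a)

def impact(x):
    """Stand-in for steps (iv)-(vi): here the impact region is F = {x > 3.5}."""
    return x > 3.5

tol, min_hits, max_iter = 1e-3, 50, 200_000
p_old, s, n_it, n_hits = 0.0, 0.0, 0, 0
while n_it < max_iter:
    n_it += 1                            # step (ii): draw one sample in the PHS
    xk = rng.uniform(a, b)
    if not impact(xk):                   # step (vii): I = 0 -> draw again
        continue
    n_hits += 1
    s += q(xk) / q_tilde                 # accumulate weighted impact samples
    p_new = s / n_it                     # running estimate over all draws
    if n_hits >= min_hits and p_old > 0.0 and abs(p_new - p_old) / p_new < tol:
        break                            # step (viii): estimate has stabilized
    p_old = p_new

p_hat = s / n_it                         # converges towards ~2.3e-4 here
```

A relative-change criterion between consecutive estimates is a weak convergence test on its own, which is why the sensitivity analysis on the tolerance (and safeguards such as a minimum hit count) matters in practice.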

NUMERICAL SIMULATIONS: THE CASE OF ASTEROID (99942) APOPHIS
In this section, we assess the performance of the ADP-IS method on the evaluation of the impact probability for the test case of asteroid (99942) Apophis. Table 1 shows the nominal initial state and the associated uncertainties σ for Apophis on June 18, 2009, expressed in terms of equinoctial parameters p = (a, P1, P2, Q1, Q2, l), considering a diagonal covariance matrix. Data were obtained from the Near Earth Objects Dynamic Site in September 2009. We selected a diagonal covariance matrix in order to help distinguish the contribution of the six orbital parameters and to test our method in a scenario in which the uncertainty volume is maximized. In general, however, this selection may lead to quite inaccurate results, as uncertainties may be highly correlated. Nevertheless, the method can be applied in the most general case of a full covariance matrix in exactly the same way, with the only difference that the DA variables would be placed along the directions of the covariance eigenvectors to avoid artificially adding extra volume in the initial domain definition.
As previously stated, the starting point is the outcome of the orbit determination process reported in Table 1. The aim is therefore to apply the presented method to provide an estimate of the impact probability at the epoch of the first resonant return, in 2036. The motion of Apophis in the Solar System is modelled according to the (N + 1)-body problem, including relativistic corrections to the Newtonian forces (Seidelmann 1992; Wittig et al. 2015). In the resulting equation of motion, r is the position of Apophis in Solar System barycentric coordinates, G is the gravitational constant, m_i and r_i are the mass and the Solar System barycentric position of Solar System body i, r_i = |r_i − r|, c is the speed of light in vacuum, and β and γ are the parametrized post-Newtonian parameters measuring the nonlinearity in superposition of gravity and space curvature produced by unit rest mass (Seidelmann 1992). The position and velocity vectors of all celestial bodies are computed with NASA's SPICE library 2 . We used the planetary and lunar ephemeris DE432s. The N bodies include the Sun, the planets and the Moon. For planets with moons, with the exception of the Earth, the centre of mass of the system is considered. The dynamical model is written in the J2000 ecliptic reference frame. Figure 6 shows the geocentric distance profile in time for one thousand samples from the initial Gaussian distribution. As expected, the uncertainties significantly increase after 2029, paving the way to resonant returns in 2036 and 2037.
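The full post-Newtonian N-body expression is lengthy; as an illustrative sketch, the snippet below implements only the dominant Sun-only (Schwarzschild) correction to the Newtonian acceleration, with β = γ = 1 as in general relativity. This is a simplified stand-in, under stated assumptions, for the complete N-body equation used in the paper, not the authors' dynamical model.

```python
import numpy as np

GM_SUN = 1.32712440018e20        # m^3 s^-2
C = 299792458.0                  # m s^-1
BETA = GAMMA = 1.0               # PPN parameters (general-relativity values)

def acceleration(r_vec, v_vec):
    """Newtonian solar acceleration plus the leading relativistic correction
    (heliocentric Schwarzschild term only; planetary terms omitted)."""
    r = np.linalg.norm(r_vec)
    a_newton = -GM_SUN * r_vec / r**3
    a_rel = GM_SUN / (C**2 * r**3) * (
        (2.0 * (BETA + GAMMA) * GM_SUN / r - GAMMA * np.dot(v_vec, v_vec)) * r_vec
        + 2.0 * (1.0 + GAMMA) * np.dot(r_vec, v_vec) * v_vec)
    return a_newton + a_rel

# For a circular orbit at 1 AU the correction is a few parts in 1e8
# of the Newtonian term, of order GM_SUN / (c^2 r).
r = np.array([1.495978707e11, 0.0, 0.0])
v = np.array([0.0, np.sqrt(GM_SUN / r[0]), 0.0])
ratio = np.linalg.norm(acceleration(r, v) - (-GM_SUN * r / r[0]**3)) \
        / (GM_SUN / r[0]**2)
```

The smallness of this ratio explains why the relativistic terms matter here only through their cumulative effect on the encounter timing.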
Wittig et al. (2015) analysed the performance of the ADS algorithm for the propagation of the whole uncertainty set up to the second resonant return. Here the results are limited to the first resonant return, and they are used as a reference for assessing the performance of the ADP.
All the results presented in this section are obtained considering an expansion order equal to 8, a tolerance for the splitting procedure equal to 10^−10, a value of N_max equal to 12 and an initial uncertainty set with 3σ boundaries, i.e. a 6-dimensional (6D) rectangle extending 3σ in each direction.
The initial uncertainty set should be properly selected, as the neglected part of the probability mass, i.e. the integral of the pdf over the domain outside the considered box, could significantly alter the estimated impact probability. For the case under study, in which we are considering a 6-dimensional problem with uncorrelated variables, the selection of a 6D rectangular domain with 3σ boundaries corresponds to considering 98.4 per cent of the probability mass, and so the impact probability may be underestimated. The accuracy of the estimate improves for larger initial uncertainty sets. A detailed sensitivity analysis on the uncertainty set size and all the other available parameters is offered in Section 7. All computations are performed on a single core of an Intel i7-3770 CPU @ 3.4 GHz with 16 GB of RAM.
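The 98.4 per cent figure can be checked directly: for six uncorrelated Gaussian variables, the mass inside a kσ rectangle is the sixth power of the one-dimensional mass erf(k/√2). A quick check in plain Python (not paper-specific code):

```python
from math import erf, sqrt

def box_mass(k, dim=6):
    """Probability mass of a k-sigma rectangle for `dim` uncorrelated
    Gaussian variables: product of identical 1-D masses erf(k/sqrt(2))."""
    return erf(k / sqrt(2.0)) ** dim

print(round(100 * box_mass(3.0), 1))   # -> 98.4 (per cent), as quoted in the text
```

The same expression with k = 6 gives the near-unity mass used later for the 6σ domain.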
The number of subdomains obtained with ADS propagation without pruning is 653, while the computational time is 10 h 6 min. An analysis of the average number of splits per direction shows that most splits occur in the semi-major axis (a) and true longitude (l) directions (Wittig et al. 2015). Thus, though the problem is six-dimensional, the analysis of the dynamics can be focused on the projection of the initial conditions onto the a − l plane. Figure 5 shows the projection of the initial uncertainty box onto the a − l plane, along with the subdomains generated during the ADS propagation. Colours refer to the truncation epoch of the related subset: white regions represent subsets that were able to reach the final simulation time (May 31, 2036, after the expected resonant return), whereas coloured regions represent subsets whose propagation was stopped earlier because they reached their minimum box size. Figure 5 can be exploited to easily identify the regions of the initial set that are involved in the resonant return in 2036. While all initial conditions lying within white regions pose no risk of impacting the Earth, coloured subdomains represent sets of initial conditions that might lead to close encounters with the Earth at that epoch. That is, coloured regions represent PHS's. This behaviour is expected, as splits occur when the nonlinearities increase, which happens when trajectories get closer to the Earth. It is evident, however, that a significant portion of the computational effort required by the ADS propagation is spent on regions of the initial set that are not involved in the first resonant return. The application of a selective pruning technique such as the ADP aims at alleviating this inefficiency.
We now investigate the performance of the ADP method. The first part of the analysis is the propagation of the uncertainty set up to the epoch of the first close encounter in 2029. The DA propagation of the whole uncertainty set up to the close encounter in 2029 is performed with no splits. Therefore, the whole set can be described with a single polynomial map at the epoch of the first close encounter. The availability of the DA state vector of the asteroid then provides the polynomial expansion of its perturbed orbital period immediately after the close encounter with the Earth. This polynomial expansion allows us to estimate the asteroid orbital period range after the close encounter by means of a polynomial bounder: for the case under study, this range is [415.02, 428.91] days. From this range we can identify the first resonances: T_res1 = 7/6 T⊕ = 426.12 days (where T⊕ is the Earth orbital period) is the first resonant orbital period included in the computed range, and it represents a resonant return in 2036. This value is expected, as shown in Fig. 6. We can also notice that the expected second resonant return (in 2037, resonance 8:7) is also included (T_res2 = 417.43 days). The a priori identification of the resonances and the application of the ADP-IS method are strictly related to the assumption of no intervening close encounter with other major bodies in between. This assumption is checked immediately after the resonance computation, as explained in Section 7.2. For the case under study, the assumption is verified. We can therefore concentrate the analysis on the first resonant return, in 2036. Given the nominal value T = 7/6 T⊕, ΔT_ref is determined by setting a value of ε equal to 10^−3. The value of ε is selected in order to take into account small perturbations between the close encounter in 2029 and the resonant return in 2036. An analysis of the impact of ε on the results is carried out in Section 7.2.
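The resonance identification step can be cross-checked with a simple scan: given the bounded period range, keep every small-integer period ratio p:q such that T = (p/q) T⊕ falls inside it; a p:q resonance implies a return p years after the 2029 encounter. In the sketch below, T⊕ = 365.25 days and the upper limits on p and q are assumptions of this illustration, not values from the paper.

```python
from fractions import Fraction

T_EARTH = 365.25                      # days (assumed value for T_Earth)
LO, HI = 415.02, 428.91               # polynomial bound on the post-encounter period

# p Earth revolutions match q asteroid revolutions -> resonant return p years later
found = {}
for p in range(1, 10):
    for q in range(1, 10):
        ratio = Fraction(p, q)        # Fraction de-duplicates non-reduced ratios
        T = float(ratio) * T_EARTH
        if LO <= T <= HI:
            found[ratio] = T

for ratio, T in sorted(found.items()):
    print(f"{ratio.numerator}:{ratio.denominator} -> T = {T:.3f} d, "
          f"return in {2029 + ratio.numerator}")
# 8:7 -> T = 417.429 d, return in 2037   (T_res2 in the text)
# 7:6 -> T = 426.125 d, return in 2036   (T_res1 in the text)
```

Only the two resonances quoted in the text fall inside the bounded range for small-integer ratios.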
The propagation is then resumed as described in Section 5. Figure 7 shows the results of the ADP propagation in terms of the subdomain distribution on the a − l plane. A comparison with Fig. 5 clearly shows how the ADP restricts the propagation of the generated subdomains to a limited portion of the PHS's. That is, only subsets that are actually involved in the resonant return in 2036 are propagated until the end of the simulation. The pattern of subdomains is not altered by the introduction of the pruning; simply, a large portion of the initial set is no longer investigated. This has a strong impact on the number of propagated subdomains, which is now significantly lower (267). Consequently, the computational time required by the propagation reduces significantly (4 h 6 min).
The pattern of generated subdomains represents the starting point for the second phase, the application of the IS method for the computation of the impact probability in 2036. We initialize the method by defining as ISD a uniform pdf covering all the generated PHS's. The boundaries of the ISD on the a − l plane are represented in blue in Fig. 7. Then, samples are drawn from the ISD and each sample is associated with a PHS where possible. For samples belonging to the PHS's, the state vector corresponding to the drawn sample at the truncation epoch of the related PHS is reconstructed, and a pointwise propagation up to the epoch of minimum geocentric distance is performed. Figure 8 shows a close-up of the resulting subsets, whereas Fig. 9 shows the pattern of generated samples projected onto the a − l plane. Samples belonging to the PHS's are represented in blue, whereas impacting samples are represented in yellow. Black dots represent discarded samples. Not all samples belong to the PHS's, due to the shape of the selected ISD. A uniform ISD over a domain of regular shape enclosing all PHS's represents the easiest choice and can be applied regardless of the complexity of the PHS's pattern. On the other hand, this selection leads to the black dots shown in Fig. 9. These samples, however, have a minimal impact on the computational effort required by the method, as they are discarded as soon as they are identified.

Figure 9. Projection of the generated samples onto the a − l plane. In black, discarded samples. In blue, samples belonging to the PHS's. In yellow, impacting samples.

The selection of the IS method as the sampling technique allows us to significantly increase the number of samples lying within the PHS's with respect to a standard Monte Carlo approach, and this advantage is made possible by the pruning action of the ADP propagation. The analysis of the distribution of the impacting samples on the a − l plane, however, shows that these are confined to a limited region inside the PHS's. That is, not all PHS's actually contribute to the impact probability in 2036. This result is related to the selection of the amplitude of ΔT_ref: the value was set in order to grant a conservative pruning action on the subsets. A more detailed analysis is offered in Section 7.2. The trend of the estimated impact probability with the number of drawn samples is represented in Fig. 10. Impacting samples are represented with yellow circles. The tolerance for the stopping criterion was set equal to 0.01 per cent. After some initial significant oscillations, the impact probability asymptotically converges to the value of 1.17 · 10^−5. The estimate is of the same order of magnitude as the reference value (2.2 · 10^−5) obtained with a standard Monte Carlo analysis, though slightly lower (see Section 8). This difference can be explained by considering the size of the propagated uncertainty set, as later explained in Section 7.3.
An overview of the main results of the simulation is shown in Table 2. Results are expressed in terms of the number of PHS's n_PHS, the computational time required by the ADP propagation t_ADP, the number of generated samples for the IS method at convergence n_samples, the computational time required by the IS method t_IS, the overall computational time t_all, the estimated impact probability value p̃ and the related Poisson statistics uncertainty σ(p̃).

SENSITIVITY ANALYSIS
The analysis presented in Section 6 was carried out starting from predefined values of expansion order, tolerance for the splitting routine, maximum number of splits and size of the uncertainty set. In this section, we investigate the role of the different parameters and provide some guidelines for selecting the most appropriate set of parameters. The discussion distinguishes between parameters mostly affecting the ADP propagation (order, tolerance, minimum box size and reference orbital period range) and parameters mostly affecting the estimated impact probability (uncertainty box size).

Selection of splitting tolerance, expansion order and N max
As described in Section 2, the main parameters for the ADS propagation are the tolerance for the splitting procedure, the expansion order and the maximum number of splits. The selection of the tolerance is strictly related to the accuracy required in the description of the subsets at the end of the simulation. This concept is valid in both ADS and ADP propagation. Due to error accumulation during the integration process, indeed, the actual accuracy of the ADP result tends to decrease with respect to the imposed accuracy. This effect becomes more significant as the nonlinearities of the dynamics increase, so that, in order to grant a specific accuracy, the imposed tolerance must in some cases be some orders of magnitude lower. Table 3 shows the average accuracy in position for the subsets at the epoch of the first resonant return considering an expansion order equal to 8 and decreasing values of tolerance. We estimated the accuracy by comparing the results of pointwise propagations and polynomial evaluations for random samples drawn in the generated subsets. The error in position shown in Table 3 represents an average of the computed errors. As expected, there is a difference of around three orders of magnitude with respect to the imposed tolerance.

Table 3. Average error in position as a function of the imposed tolerance for subsets at the epoch of the first resonant return (ADP propagation, order 8, N_max 12, 3σ initial uncertainty set).

Tolerance    Error
10^−8        4.83 · 10^−6 AU
10^−9        1.24 · 10^−6 AU
10^−10       5.95 · 10^−7 AU
For the case under study, the error is strictly related to the intervening close encounter in 2029. As an example, if we perform an ADS propagation with order 8 and tolerance 10^−10, and we stop the propagation three months before the close encounter in 2029, we obtain a position error of about 10^−11 AU. This error grows to 10^−8 AU six months later, i.e. three months after the close encounter. That is, the close encounter yields an increase of about three orders of magnitude in the position error. As the propagation continues, the error accumulates and reaches 5.95 · 10^−7 AU at the epoch of the first resonant return. Therefore, the splitting tolerance is a critical parameter, and its selection must account for all the above aspects. For our analysis, we selected a tolerance capable of granting a maximum error in position of 100 km. This requirement results in a splitting tolerance of at least 10^−10.
Expansion order and minimum box size, instead, play quite different roles in ADS and ADP propagation. During a DA propagation, a reduction of the expansion order causes a decrease in the accuracy of the results at a given integration epoch. This decreased accuracy yields an increase in the required number of splits during the ADS propagation and, overall, a larger number of generated subsets. The role of the minimum box size, instead, is to limit the number of splits, so that, overall, both parameters have a strong influence on the number of generated subdomains and, as a consequence, on the required computational effort. The role of the expansion order is twofold, since a decrease in the order causes the number of subdomains to increase, but reduces the computational time required to perform a single integration step. Thus, it is reasonable to expect that there exists a specific expansion order that minimizes the computational effort required by the ADS propagation. This value, obviously, changes according to the specific case under study. The role of the minimum box size, instead, is unambiguous: by increasing the value of N_max, the computational effort required by the ADS propagation increases.
In the case of ADP propagation, the analysis is quite different. The roles of the two parameters in the two phases are reported in Table 6. More specifically, the ADP propagation aims to select only subsets whose integration stops before the resonant return of interest, having reached their minimum box size. A change in the expansion order modifies the splitting history, which could, but not necessarily would, modify the overall number of splits. This behaviour has a direct impact on the required computational time, though the description of the role of the expansion order is not immediate. A decrease in the expansion order, indeed, may cause just earlier splits performed with the same splitting sequence, or a complete change in the splitting history. In the first case, the role of the expansion order becomes unambiguous: a reduction in the expansion order causes a decrease in the computational effort. In the second case, the changes in the splitting history and the number of generated subsets may be so relevant that what is gained in performing single integration steps may be lost in the longer propagation of the generated subsets. Overall, the role of the order is not unambiguous, and there exists an order that minimizes the computational time required by the ADP propagation.
The role of the minimum box size, instead, is the same as in the ADS propagation: a decrease in the value of N max causes an earlier stop of the propagation of the subsets, and a reduction of the computational effort.
The whole procedure, however, includes both an ADP propagation and a sampling phase, and the roles played by the two parameters during the sampling phase differ from those in the ADP phase. The role of the expansion order is, again, twofold: a reduction of the order causes longer pointwise propagations, but faster polynomial evaluations. The relative weight of the two effects essentially depends on the number of required samples. The role of the minimum box size, instead, is unambiguous and opposite to that in the ADP propagation: a reduction of the value of N_max implies longer pointwise propagations.
The selection of the best combination of order and minimum box size, therefore, relies on all these aspects. Starting from these considerations, we performed a sensitivity analysis in order to quantify the impact of the two parameters on the performance of the ADP-IS method for the case of the first resonant return of asteroid Apophis.
The results of a sensitivity analysis on the expansion order are shown in Table 4, considering six different expansion orders. The comparison is performed by considering the same parameters as the analysis presented in Section 6. The second column shows the number of generated subdomains. The value is not affected by the expansion order down to order 4, while for order 3 the value is more than doubled. This trend can be explained by looking at the splitting history. For orders from 8 to 4, no split occurs before the close encounter in 2029, and the sequence of splits is exactly the same, though single splits are performed at different epochs. The situation changes completely with order 3, with 4 splits occurring before the 2029 close encounter. This change has a direct impact on the number of generated subsets. A difference can also be detected by looking at the required number of samples or at the estimated impact probability and related Poisson statistics uncertainty. Assuming the sequence of generated samples is not altered, values obtained with orders 8 to 4 are identical, whereas values obtained with order 3 are slightly different.
The expansion order has a significant impact on the computational effort required by the ADP and sampling phases, as shown in Fig. 11. We limited the analysis in Table 4 to order 3 as the minimum order, as the early splits that appear with this order are magnified with order 2, leading to 53 subsets generated before the 2029 close encounter. This behaviour exacerbates the limitations previously pointed out for low orders. Moreover, the error estimation procedure described in Section 2 does not work with linear approximations and tends to provide inaccurate estimates with order 2.
We performed a similar analysis by considering the effect of the minimum box size on the required computational effort and estimated impact probability. Table 5 shows the results of the analysis considering the optimal expansion order identified in Table 4 and the same values of tolerance and uncertainty box size as the previous simulations. As described before, the role of the minimum box size is unambiguous in the two phases, though opposite, and this trend is confirmed by the analysis: a decrease in the value of N_max causes a reduced computational effort in the ADP propagation but longer pointwise propagations for all samples. For the case under study, N_max equal to 10 allows us to minimize the required computational effort.
As in the previous case, a change in the value of N_max does not alter the estimated impact probability, though the pattern of generated subsets is now modified. This trend can be explained by considering the fact that a reduction of the value of N_max generates larger subsets at earlier truncation epochs, but with the same accuracy. Thus, if the drawn samples are fixed, their mapping to the epoch of the first resonant return is essentially the same. That is, the pattern of impacting samples is not altered.
As shown in the presented analysis, both expansion order and minimum box size influence the performance of the method. In particular, the expansion order plays a key role in determining the computational effort required by the method, while the minimum box size has a lower influence. As the method is composed of two phases, the order should be selected to minimize the computational effort required by the heaviest phase. In our method, the ADP propagation plays this role, so that the most effective way to decrease its impact on the overall computational time is to reduce the expansion order, while still limiting as much as possible the number of subsets generated before the first close encounter.

Definition of ∆T ref
As described in Section 3, the reference orbital period range ΔT_ref represents the key parameter for the ADP propagation. The range, centred on the selected resonant return period T, is defined to account for both inaccuracies in the estimation of the orbital period range of the subdomains and small dynamical perturbations between the first close encounter and the predicted resonant return. The semi-amplitude of this range, ε, therefore plays a key role, as it influences both the accuracy of the probability estimates and the required computational time. Table 7 shows the performance of the ADP-IS method for different values of the reference range semi-amplitude ε. With respect to the previous analyses, two additional parameters are shown: the number of generated subdomains n_D and the number of subdomains that include impacting samples n_imp. These two parameters provide, along with the number of PHS's n_PHS, a clear picture of the pruning action performed during the ADP propagation. We performed the analysis considering four values of ε. In particular, it is interesting to analyse the two limiting cases: ε = 0 and ε = 1.
In the first case, the reference orbital period range collapses to the value of T, i.e. subsets are maintained throughout the simulation only if their estimated orbital period range includes T. In this case, the number of PHS's is lower than the one obtained with ε = 10^−3, but the value of impact probability is exactly the same. This result can be explained by considering that the pattern of impacting samples is not altered, as confirmed by the parameter n_imp. That is, for the case under study, a less conservative selection of the parameter ε would allow us to obtain the same results, though no evident savings in computational time would be obtained. The selection of ε = 0 corresponds to assuming Keplerian motion between the two encounters. As described in Valsecchi et al. (2003), this assumption may provide quite accurate results for the timing, and for the case under study, where no significant perturbations exist between the two encounters, it can be considered acceptable.
Let us now analyse the second limiting case. If ε is set to 1, we are essentially selecting a very large reference range for the orbital period. That is, the ADP propagation becomes an ADS propagation, i.e. no pruning is performed and all subsets are propagated until the final simulation time or the maximum number of splits is reached. The sampling phase, instead, is not significantly altered: the ISD is defined including subsets whose propagation stops before the expected resonant return. Results are reported in the last row of Table 7. The value of impact probability is similar to the one obtained with the pruning action, but the number of generated subsets is much larger, which in turn affects the required computational time. That is, a very conservative selection of ε would yield a factor of three increase in computational time.

Table 7. Performance of the ADP-IS method for different values of ε (order 5, tolerance 10^−10, N_max 10, 3σ initial uncertainty set).
The selection of the parameter ε is therefore crucial. Within the assumptions of our method, i.e. small perturbations between the two encounters, we can define an upper threshold for the value of ε as

h ε T v⊕ ≤ 0.05 AU,

where h is the number of revolutions of the asteroid between the encounters, whereas v⊕ is the Earth heliocentric velocity. The expression on the left-hand side of the inequality represents the heliocentric arc covered by the Earth in the time range hεT. When we define the reference range ΔT_ref and compare it with ΔT_sub of a given subset, therefore, we are verifying that the uncertainty in the position of the current subset with respect to the Earth is lower than 0.05 AU, i.e. that the current subset can be labelled as potentially hazardous. For the case under study, this threshold is about 10^−3, the value selected for the analysis presented in the previous sections. As previously stated in the paper, the application of the ADP-IS method is strictly related to the assumption of no intervening close approaches with other major bodies in the investigated time window. In case of expected close approaches, indeed, the situation drastically changes, as the resonances estimated at the epoch of the first close encounter may lose their validity. In such cases, one must rely on the more conservative approach of ADS propagation for investigating a selected propagation window, obtaining accurate results with unavoidable drawbacks in efficiency.
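Substituting representative values into the inequality recovers the quoted threshold. In this sketch, h = 6 (the number of asteroid revolutions implied by the 7:6 resonance) and v⊕ ≈ 2π AU yr⁻¹ are assumed values, not quantities quoted by the paper.

```python
from math import pi

V_EARTH = 2.0 * pi            # AU/yr, mean Earth heliocentric speed (assumed)
T = 7.0 / 6.0                 # resonant period in years (7:6 resonance)
h = 6                         # asteroid revolutions between 2029 and 2036 (assumed)

# h * eps * T * v_E <= 0.05 AU  ->  eps <= 0.05 / (h * T * v_E)
eps_max = 0.05 / (h * T * V_EARTH)
print(f"{eps_max:.2e}")        # -> 1.14e-03, i.e. about 1e-3 as used in the text
```

The result is consistent with the ε = 10^−3 adopted in the previous sections.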
The decision whether to perform an ADP propagation or disable pruning is made based on a preliminary analysis of the possible trajectories of the asteroid between the two encounters. For the case under study, the availability of the dispersion of Apophis' orbital parameters after the first close encounter allows us to estimate the minimum orbit intersection distance (MOID) dispersion between the asteroid and the other main bodies of the Solar System (see Armellin et al. (2010a)). This fast survey provides us with an overview of possible close approaches between the two encounters and drives our decision on the propagation method. For the case under study, the analysis required less than one minute and allowed us to exclude any significant encounter with other planets in between.

Effect of the size of the uncertainty domain
The analysis presented in the previous sections was carried out considering different values of the expansion order, tolerance for the splitting procedure and minimum box size, whereas we considered only one size for the initial uncertainty set, that is, a 3σ 6-dimensional rectangle. In a 6-dimensional problem with uncorrelated variables, this selection corresponds to considering 98.4 per cent of the probability mass. In the following paragraphs, we present the impact of the size of the uncertainty set on the results of the ADP-IS method. It is worth noting that, as previously mentioned in Section 6, in the case of a full covariance matrix the initial uncertainty box would be defined in the eigenvector space in order to avoid the wrapping effect and the inclusion of very low probability solutions, so all the analyses and values presented in this section hold for the more general case of correlated variables.
The ADP propagation and the IS phase are strongly influenced by the selection of the size of the initial uncertainty set. The ADP propagation, indeed, limits the generation of the subsets to within the boundaries of the considered uncertainty set, and this aspect also influences the shape of the ISD for the impact probability computation phase. Samples, indeed, are confined within the initial uncertainty set: impacting samples are found only within these limits, and possible impacting samples that lie outside the initial uncertainty set are discarded. This aspect distinguishes our sampling approach from a standard Monte Carlo method, where samples are drawn directly from the original probability density function, and the probability of drawing samples is determined by the pdf itself. In principle, samples could lie anywhere in the uncertainty region.
Starting from these considerations, it is therefore interesting to study how the estimate of the impact probability changes with an increasing size of the initial uncertainty set. We initially performed the analysis by considering a size for the initial uncertainty set of 3σ (i.e. what was previously presented), 4σ and 5σ. Figure 12a shows the results of the ADP propagation considering order 5, tolerance 10^−10, N_max equal to 10 and an initial uncertainty set size of 4σ. The ISD boundaries on the a − l plane are represented in light blue. On the same plot, the 3σ boundaries are represented with black dashed lines, whereas the ISD boundaries for the 3σ case are represented with dashed blue lines. The plot allows us to compare the sample regions in the two cases. In particular, a portion of the PHS's generated during the 4σ ADP propagation is not considered in the 3σ case. The 5σ case is shown in Fig. 12b, with the 5σ ISD represented in cyan. Figure 13 shows a comparison of the results of the sampling phase for the 3σ, 4σ and 5σ cases. Blue points represent drawn samples belonging to the 3σ case, with impacting samples represented as yellow dots. Light blue points represent drawn samples belonging to the 4σ case, with impacting samples represented as black dots. Finally, cyan points represent accepted samples for the 5σ case, with impacting samples represented with red dots. The analysis of the plot offers a clear picture of how the sampling region changes in the three cases. Moreover, it is possible to see how the increasing size of the initial uncertainty set allows us to include impacting samples outside the 3σ domain. While the 3σ domain appears to be too narrow a selection, the 4σ and 5σ domains offer a better description of the impact region. Figures 14a and 14b show the distribution of the impacting samples for the three different simulations, with colours showing the contribution to the overall impact probability. Table 8 shows the results of the analysis, including the 6σ case. With reference to the previous analyses, we added the parameter p_out, which represents the probability mass outside the selected uncertainty set, i.e. the complement to 1 of the integral of the pdf over the considered domain. We remark that, because we use rectangular uncertainty sets, the values of p_out shown in Table 8 are significantly smaller than those corresponding to the more commonly used ellipsoidal uncertainty regions, for which p_out is equal to 0.17, 1.38 · 10^−2, 3.41 · 10^−4 and 2.76 · 10^−6 for the 3σ, 4σ, 5σ and 6σ cases respectively.

Figure 12. Results of the ADP propagation for the 4σ (a) and 5σ (b) initial uncertainty sets, with black dashed lines marking the 3σ boundaries. In light blue and cyan, the ISD boundaries for the 4σ and 5σ cases respectively. In dashed blue, the 3σ ISD boundaries.

Figure 13. Comparison of the samples distribution for the 3σ, 4σ and 5σ cases. In blue, light blue and cyan, drawn samples for the 3σ, 4σ and 5σ case respectively. In yellow, black and red, impacting samples for the 3σ, 4σ and 5σ case respectively.
By increasing the size of the initial uncertainty set, the computational time required by the ADP propagation increases. This trend is essentially due to the larger computational effort required by a single integration step and the larger number of subsets. An increase in the number of generated PHS's can be detected when passing from 3σ to 4σ, whereas this value remains essentially the same in the 5σ and 6σ cases. The analysis of the sampling phase shows some interesting results. As expected, an increase in the initial uncertainty size causes an increase in the estimated impact probability value. Essentially, regions of the uncertainty set that were not studied during the ADP propagation for the 3σ case are now considered, and impacting samples can be found in these regions as well. As a result, the estimated impact probability values for the 5σ and 6σ cases become very close to the reference value. The enlargement of the investigated region also causes an increase in the Poisson statistics uncertainty of the estimate. This result is expected too, as the variance is proportional to the sample region volume (see equation (13)). The analysis of the required number of samples at convergence shows that this value increases for larger initial uncertainty sets, and this trend reflects on the computational time required by the sampling phase.
The size of the uncertainty set should be selected to achieve the desired resolution on the impact probability, which is directly expressed by the parameter p_out. For the case under study, with an estimated impact probability of the order of 10^−5, we selected a 6σ domain, which excludes only 1.18 · 10^−8 of the probability mass.
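The p_out values quoted above can be reproduced with a few lines of standard-library Python. This is a sketch of the underlying computation, not code from the method itself, and it assumes the uncertainty set is expressed in the principal axes of the covariance so that the components decouple. The mass outside the kσ ellipsoid of a 6-dimensional Gaussian is the chi-square survival function with 6 degrees of freedom (written in closed form below), while the mass outside the rectangular kσ box follows from the one-dimensional error function:

```python
import math

def p_out_box(k, dim=6):
    # mass outside the [-k sigma, k sigma]^dim box of an uncorrelated
    # standard normal (each axis contributes erf(k / sqrt(2)))
    return 1.0 - math.erf(k / math.sqrt(2.0)) ** dim

def p_out_ellipsoid_6d(k):
    # mass outside the k-sigma ellipsoid in 6 dimensions: the chi-square
    # survival function with 6 dof, written in closed form
    x = k * k
    return math.exp(-x / 2.0) * (1.0 + x / 2.0 + x * x / 8.0)
```

For k = 3, ..., 6 the ellipsoidal function reproduces the quoted values 0.17, 1.38 · 10^−2, 3.41 · 10^−4 and 2.76 · 10^−6, while p_out_box(6) ≈ 1.18 · 10^−8, matching the value adopted for the 6σ rectangular domain.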

COMPARISON WITH STANDARD AND ADVANCED ORBITAL SAMPLING TECHNIQUES
The analysis presented in the previous sections showed that the ADP-IS method represents a valuable tool for uncertainty propagation and impact probability computation for the first resonant return of a NEO. In order to assess the efficiency of the method with respect to other impact probability computation tools, in this section we present a comparison with standard and advanced orbital sampling techniques. First, we compare our approach with Monte Carlo sampling techniques based on sample generation on the whole uncertainty set. We then present a general comparison with the most widely used technique for impact probability computation, the LOV method.
A first comparison can be made with a standard Monte Carlo approach, where samples are drawn from the covariance matrix directly at the initial epoch (June 18, 2009). This method is probably the most straightforward approach, but also the most expensive one, as the sampling is performed on the whole domain and the propagation of each sample starts from the observation epoch. By propagating one million samples, we obtain an estimated impact probability of 2.2 · 10^−5, with a Poisson statistics uncertainty of 4.71 · 10^−6. The number of samples was selected so as to detect a non-null value of the impact probability (Farnocchia et al. 2015).
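As an aside, the quoted Poisson statistics uncertainty is consistent with σ = √k/N for k ≈ 22 impactors out of N = 10^6 samples. A minimal sketch of such a crude MC estimator, with a one-dimensional toy event of comparable probability standing in for the full orbital propagation, could look as follows:

```python
import math
import random

def crude_mc(indicator, sampler, n, seed=0):
    # count hits and return the estimate with its Poisson uncertainty sqrt(k)/n
    rng = random.Random(seed)
    k = sum(1 for _ in range(n) if indicator(sampler(rng)))
    return k / n, math.sqrt(k) / n

# toy stand-in for the full orbital propagation: P(X > 4.1) for X ~ N(0,1)
# is ~2.1e-5, the same order of magnitude as the Apophis estimate
p_hat, sigma = crude_mc(lambda x: x > 4.1,
                        lambda rng: rng.gauss(0.0, 1.0), 10**6)
```

The sketch makes the cost structure plain: every one of the 10^6 samples requires a full evaluation of the indicator, which in the real problem is a 27-year pointwise propagation.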
Unfortunately, if the Monte Carlo simulation is performed under the same conditions as our method (i.e. same dynamics, single core), the required computational time is much larger. The average computational time required to perform a single pointwise propagation from the initial epoch to the epoch of the first resonant return is ≈ 1.2 s. As a result, within the computational time required by the ADP-IS method for the 6σ case (see Table 8), only about 4400 samples could be propagated, which is not enough to estimate the expected impact probability; propagating one million samples on a single core would take around two weeks. This figure is of course not realistic, as Monte Carlo analyses can typically be set up in a multi-thread environment, thus granting significant savings in computational time. It is interesting, however, to highlight the significant savings that our approach grants with respect to a standard MC approach under the same conditions. The ADP-IS method, indeed, employs a lower number of samples, as samples are drawn from just a subset of the uncertainty set. Moreover, the propagation of all samples starts immediately before the resonant return, whereas in a standard Monte Carlo approach each sample is propagated starting from June 18, 2009. Therefore, the computational effort required by the ADP propagation is largely repaid later by shorter pointwise propagations and a reduced number of samples.
We now show a comparison with an advanced Monte Carlo technique called Subset Simulation (SS). The basic idea of SS is to compute small failure probabilities as the product of larger conditional probabilities (Au & Beck 2001; Zio & Pedroni 2009; Zuev et al. 2012). Given a target failure event F, let F_1 ⊃ F_2 ⊃ ... ⊃ F_n = F be a sequence of intermediate failure events, so that F_k = ∩_{i=1}^{k} F_i, k = 1, 2, ..., n. Expressing the failure probability as a sequence of conditional probabilities yields

P(F) = P(F_n) = P(F_1) ∏_{i=1}^{n−1} P(F_{i+1} | F_i),    (19)

where P(F_{i+1} | F_i) represents the probability of F_{i+1} conditional to F_i. A detailed description of the algorithm can be found in Au & Beck (2001). In the problem under study, the failure event F represents an impact with the Earth, i.e. a geocentric distance smaller than the Earth radius. The method is initialized using standard MC to generate samples at the so-called conditional level (CL) 0, starting from the available nominal state vector and related uncertainty of the investigated object at the observation epoch. The number of samples generated at this level, referred to as N, is maintained for each subsequent conditional level. Once the failure region F_1 is identified, a Markov Chain Monte Carlo (MCMC) Metropolis-Hastings algorithm is used to generate conditional samples in the identified intermediate failure region. Another intermediate failure region is then located, and further samples are generated by means of MCMC. The procedure is repeated until the target failure region is identified. An illustration of the method is shown in Fig. 15. The approach was originally developed for the identification of structural failures, but it has also been used in other reliability-related research areas, such as the estimation of failure probabilities of thermo-hydraulic passive systems. The method was recently applied to the computation of space debris collision probabilities by Morselli et al. (2014).
In the presented approach, the intermediate failure regions are identified by assuming a fixed value of the conditional probability p_0 = P(F_{i+1} | F_i). The identification of each conditional level, therefore, is strictly related to this value, and changes accordingly step by step, as explained in the following. The resulting SS algorithm follows the general description presented in Morselli et al. (2014) and goes through the following steps:
(i) Set i = 0 and generate N samples x_0^(k), k = 1, ..., N at conditional level 0 by standard MC, starting from the available state estimate of the investigated object at the initial epoch t_0.
(ii) Propagate each sample up to the epoch of the first resonant return and compute its minimum geocentric distance. Note that, as in the ADP-IS method, the resonances can be easily determined by propagating the uncertainty set up to the epoch of the first close encounter by means of DA and evaluating the orbital period range.
(iii) Sort the N samples in descending order according to the associated geocentric distance at the epoch of the first resonant return.
(iv) Identify an intermediate threshold value D_{i+1} as the geocentric distance corresponding to the (1 − p_0)N-th element of the sample list. Define the (i + 1)-th conditional level as F_{i+1} = {d < D_{i+1}}, where d represents the geocentric distance. By construction of D_{i+1}, the associated conditional probability P(F_{i+1} | F_i) = p_0.
(v) If D_{i+1} < R⊕, i.e. the geocentric threshold distance is lower than the Earth radius, go to the last step; otherwise select the last p_0 N samples of the list, x_i^(j), j = 1, ..., p_0 N. By definition, these samples belong to the (i + 1)-th conditional level.
(vi) Using MCMC, generate (1 − p_0)N additional conditional samples starting from the previously selected seeds belonging to F_{i+1}. A candidate sample x_0 is classified with respect to the (i + 1)-th conditional level through the performance function g_{i+1}(x_0) = d(x_0) − D_{i+1}:

g_{i+1}(x_0) > 0:  x_0 is out of the (i + 1)-th CL
g_{i+1}(x_0) = 0:  x_0 is at the limit of the (i + 1)-th CL
g_{i+1}(x_0) < 0:  x_0 belongs to the (i + 1)-th CL

(vii) Set i = i + 1 and return to step (ii).
(viii) Stop the algorithm.
The total number of generated samples is

N_T = N + n(1 − p_0)N,

where n is the overall number of conditional levels required to reach the impact region. Since the conditional probability is equal to p_0 for each level, the impact probability expressed by equation (19) becomes

P(F) = p_0^n (N_n / N),

where N_n is the number of samples belonging to the last conditional level whose geocentric distance is lower than the Earth radius.
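As an illustration (not the paper's implementation), the steps above can be sketched in a few dozen lines of Python on a one-dimensional toy problem, where the "failure" event is a standard normal variate exceeding 4.5 (true probability ≈ 3.4 · 10^−6) and the performance function plays the role of the geocentric distance minus the Earth radius:

```python
import math
import random

def subset_simulation(perf, sample_prior, N=1000, p0=0.1,
                      proposal_sigma=1.0, seed=0, max_levels=20):
    # Estimate P(perf(x) <= 0) as a product of conditional probabilities p0.
    # perf is non-positive in the failure region; for the paper's problem it
    # would be the geocentric distance minus the Earth radius.
    rng = random.Random(seed)
    xs = [sample_prior(rng) for _ in range(N)]
    n_seeds = int(p0 * N)
    chain_len = N // n_seeds
    p = 1.0
    for _ in range(max_levels):
        # sort samples by performance: the smallest values are closest to failure
        ranked = sorted(xs, key=perf)
        g_thr = perf(ranked[n_seeds - 1])
        if g_thr <= 0.0:
            # target failure region reached: count failing samples at this level
            n_fail = sum(1 for x in xs if perf(x) <= 0.0)
            return p * n_fail / N
        p *= p0  # conditional probability of the new intermediate level
        seeds = ranked[:n_seeds]
        # Metropolis-Hastings chains confined to the new conditional level
        xs = []
        for x in seeds:
            for _ in range(chain_len):
                cand = x + rng.gauss(0.0, proposal_sigma)
                # accept with the standard-normal density ratio, rejecting
                # candidates that leave the conditional level
                ratio = math.exp(-(cand * cand - x * x) / 2.0)
                if rng.random() < min(1.0, ratio) and perf(cand) <= g_thr:
                    x = cand
                xs.append(x)
    raise RuntimeError("failure region not reached")

# toy failure event: a standard normal exceeding 4.5, P ~ 3.4e-6
p_ss = subset_simulation(lambda x: 4.5 - x,
                         lambda rng: rng.gauss(0.0, 1.0))
```

Note how the estimator self-corrects: if the MCMC chains mix poorly, each intermediate threshold moves less and more levels are needed, but every level contributes exactly the empirical p_0 quantile, so the product remains a consistent estimate.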
The main degrees of freedom of the method, which govern its accuracy and efficiency, are the fixed conditional probability p_0, the number of samples per conditional level and the proposal auxiliary distribution for the MCMC phase (Zuev et al. 2012). For our analysis we used 1000 samples per conditional level and a conditional probability equal to 0.1. A normal distribution with the same spread as the original pdf was selected as the proposal pdf for the MCMC algorithm. A comparison between the SS technique and the ADP-IS method is shown in Table 9, which reports the required number of samples, the overall computational time, the estimated impact probability and the related Poisson statistics uncertainty. Results for the ADP-IS method refer to the 6σ case.
Subset Simulation and ADP-IS have a similar computational burden, though the required numbers of samples are very different. This result is expected, as the propagation windows for the two cases differ. Figure 16 shows the distribution on the a − l plane of the conditional samples generated with SS, along with the thresholds per conditional level and the related colors. Impacting samples at the last conditional level are represented in black. The conditional samples progressively move to the left, until impacting samples at conditional level 4 are identified. Compared with Figs. 8-9, this region practically coincides with the PHS's identified during the ADP propagation. That is, SS and ADP-IS identify the same region in two completely independent ways.
The advantage of the ADP-IS method is that, by identifying the PHS's, the time span over which samples must be propagated is drastically reduced, which yields a similar computational burden even though the number of generated samples is significantly larger. Potentially, the ADP-IS method could take advantage of parallelization both during the ADP propagation and the sampling phase, while the benefit for SS would be lower, as parallelization could be introduced only for specific phases of the algorithm. This would widen the difference in efficiency between the two methods. On the other hand, the savings granted by SS could be incorporated into the ADP-IS method during the sampling phase, by replacing the standard MC performed on the ISD with an SS limited to the unpruned subsets. This aspect may represent a future development of the method. Overall, the combination of ADP propagation and importance sampling achieves a computational burden that is competitive with both standard and advanced Monte Carlo techniques.
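The rationale behind the importance sampling half of the method can be illustrated on the same kind of toy tail probability: drawing from an auxiliary density concentrated on the rare region and reweighting each hit by the density ratio gives an unbiased estimate with far fewer samples than crude MC. The sketch below is generic and one-dimensional; it is not the paper's 6-D ISD construction:

```python
import math
import random

def is_tail_probability(threshold=4.5, shift=4.5, n=20000, seed=0):
    # Importance sampling estimate of P(X > threshold) for X ~ N(0,1).
    # Samples are drawn from the shifted proposal N(shift, 1), so the rare
    # region is hit often; each hit is weighted by the density ratio
    # w = phi(x) / phi(x - shift).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(shift, 1.0)
        if x > threshold:
            total += math.exp(-(x * x - (x - shift) ** 2) / 2.0)
    return total / n

p_is = is_tail_probability()
```

With 2 · 10^4 proposal samples this recovers a probability of order 3 · 10^−6 to within a few per cent, whereas crude MC would need well over 10^6 samples for comparable accuracy; the same mechanism is what lets the ADP-IS method concentrate its samples on the PHS's.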
Finally, it is worth comparing the performance of the presented approach with the reference technique in the field of impact probability computation, the LOV method. The LOV method takes advantage of the fact that the orbital uncertainty grows with time by stretching into a long slender ellipsoid in Cartesian space (Farnocchia et al. 2015). The tendency of the uncertainty to stretch during propagation suggests a one-dimensional parametrization of the uncertainty region: the sampling and the generation of the so-called Virtual Asteroids (VAs) is performed along the line of weakness of the orbit determination and, if all orbits are sufficiently close to the LOV, significant savings in computational time with respect to a standard Monte Carlo approach are obtained without sacrificing reliability.

Table 9. Comparison of the performance of the ADP-IS method (order 5, tolerance 10^−10, N_max equal to 10, 6σ uncertainty set) and the SS method (N equal to 1000, p_0 equal to 0.1).

Method  | n samples | t_all      | p           | σ(p)
ADP-IS  | 353056    | 1 h 28 min | 2.09 · 10^−5 | 7.00 · 10^−6
SS      | 4600      | 1 h 30 min | 2.46 · 10^−5 | 5.04 · 10^−6
The analysis presented in Milani et al. (2005) offers a first term of comparison: a generic completeness level of 10^−7 can be obtained with the propagation of only ∼ 10^4 VAs. Compared with a standard MC approach, this leads to compute times 3-4 orders of magnitude below those required for similar completeness with MC simulations (Farnocchia et al. 2015). The analysis presented in the previous section showed that the ADP-IS method grants a reduction in computational burden of around two orders of magnitude with respect to standard MC. Therefore, the LOV method shows better performance than the ADP-IS method in its current implementation.
Nevertheless, there are cases in which the LOV method does not guarantee the same level of accuracy as a standard MC approach. A first case occurs when the observed arc of the investigated object is very short, i.e. 1 or 2 days (Milani et al. 2005). In this case, the confidence region is wide in two directions and a one-dimensional sampling may not be suitable: different LOVs, computed in different coordinates, provide independent samplings and may yield different results. That is, if some impacting samples lie well off the LOV and are separated from it by some strong nonlinearity, the VAs selected along the LOV may fail to indicate some potentially threatening encounters (Milani et al. 2002). In such cases, a standard MC approach would be more reliable, with unavoidable drawbacks in terms of computational time. As presented in this paper, the ADP-IS method, though maintaining a six-dimensional sampling, drastically reduces the computational effort by limiting the sampling to specific regions. For these reasons, the method may be considered a valuable trade-off between the efficiency of the LOV method and the reliability of standard MC in all those cases in which the former may prove inaccurate. The possibility of improving the efficiency of the method by means of parallelization in both the ADP propagation and sampling phases represents another step in this direction, as does an optimised coding of the dynamics.

CONCLUSIONS
This paper introduced the combination of automatic domain pruning and importance sampling for uncertainty propagation and impact probability computation for Earth resonant returns of Near Earth Objects. The automatic domain pruning represents an evolution of the DA based automatic domain splitting technique: it allows us to estimate possible resonances after a planetary close encounter and to limit the propagation of an uncertainty set to those subsets that may be involved in the resonant return of interest. During the propagation, the uncertainty domain is divided into subsets (Potentially Hazardous Subdomains) whose propagation stops just before the epoch of the resonant return. The identification of the PHS's represents the starting point for the sampling phase: an importance sampling probability density function is defined over these subdomains and samples are drawn directly from this auxiliary pdf. We tested the ADP-IS method on the case of asteroid (99942) Apophis, providing an estimate for the impact probability in 2036. We carried out a sensitivity analysis on the main parameters of the method, providing general guidelines for their selection. The comparison with a standard Monte Carlo approach showed that the ADP-IS method can reduce the computational effort by more than two orders of magnitude, while granting the same accuracy level for the impact probability estimate. In addition, the current algorithm can be implemented to exploit parallelization techniques in both the ADP and the IS phases, thus significantly reducing the required computational time. All these considerations suggest that the method may be used as a valuable alternative to standard MC in all those cases in which the LOV method does not guarantee the required level of accuracy.
Future developments include a more rigorous formulation of the reference orbital period for subset pruning, allowing us to extend the pruning algorithm to the more critical case of intervening close encounters with other celestial bodies between the two Earth encounters, and testing on a wider set of cases.