National Virtual Observatory
The XMatch Algorithm

The cross-matching algorithm is the heart of the SkyQuery distributed query engine and is critical to the success of SkyQuery as a distributed astronomical query service. We have devoted a great deal of thought to its design. It is encapsulated in a SQL stored procedure called spGetMatch, which is invoked by the XMatch() function of the SkyNode in the C# implementation. Below we describe the cross-matching strategy we adopted and the reasons for adopting it.

Likelihood analysis

The elements of our cross-matching strategy are as follows:
The cross-matching algorithm encoded in the stored procedure is a probabilistic calculation that minimizes a chi-square parameter over the positions of the objects being matched.
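One standard way to write such a chi-square is shown below. It is offered as a sketch rather than the exact SkyQuery definition, and all of its notation is assumed. Each detection i is treated as a unit vector x_i on the celestial sphere with positional error σ_i and weight w_i = 1/σ_i². Because every vector has unit length, the minimum over the common position x depends only on four running sums: a, a_x, a_y and a_z.

```latex
\chi^2(\mathbf{x}) = \sum_{i=1}^{n} w_i \,\lVert \mathbf{x} - \mathbf{x}_i \rVert^2,
\qquad w_i = \frac{1}{\sigma_i^2},
\qquad \lVert \mathbf{x} \rVert = \lVert \mathbf{x}_i \rVert = 1

\lVert \mathbf{x} - \mathbf{x}_i \rVert^2 = 2 - 2\,\mathbf{x}\cdot\mathbf{x}_i
\;\Longrightarrow\;
\chi^2(\mathbf{x}) = 2a - 2\,\mathbf{x}\cdot\mathbf{A}

a = \sum_i w_i,
\qquad
\mathbf{A} = (a_x, a_y, a_z) = \sum_i w_i\,\mathbf{x}_i

\hat{\mathbf{x}} = \frac{\mathbf{A}}{\lVert \mathbf{A} \rVert},
\qquad
\chi^2_{\min} = 2\left(a - \sqrt{a_x^2 + a_y^2 + a_z^2}\right),
\qquad
-\ln L = \tfrac{1}{2}\chi^2_{\min} + \text{const} = a - \sqrt{a_x^2 + a_y^2 + a_z^2} + \text{const}
```

Under this assumption, adding one more survey only updates the four sums, and the log-likelihood of a tuple can be recomputed from them at each step. The constant assumes Gaussian positional errors.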
We compute four cumulative quantities at each cross-identification step and derive from them the log-likelihood of each candidate tuple. This log-likelihood is divided by the number of surveys considered up to that point and compared to the tolerance. If a tuple's normalized log-likelihood exceeds this threshold, the tuple is killed. This cross-identification process is fully symmetric: the particular order of matching does not matter. The portal applies the cross-match to each node recursively when it runs the query execution plan.

Mandatory Matches and Dropouts

Most cross-matching queries search for objects that match in every one of the selected catalogs. This is the mandatory match mode: objects must meet the matching criterion in every archive the query runs on. However, users may instead want objects that exist in one or more archives but not in the others. These dropouts are often as scientifically important as the matches, e.g. quasars that appear in an optical source catalog but not in a radio source catalog. Our algorithm is designed to handle both cases. To express this, we introduced the XMATCH construct into ADQL/s, as illustrated in the example below:
SELECT …
FROM SDSS:photoobj p, 2MASS:photoobj t, FIRST:obj r
WHERE XMATCH( p, t, !r ) < 3 AND Region('Circle J2000 0.9 0.8 0.3')
AND (…)
--remaining constraints
which means "find all objects that satisfy the remaining constraints in the archives represented by p and t, but not in the archive represented by r". Hence we select mandatory matches in the first two archives and dropouts in the third.
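The following minimal Python sketch makes the chaining, pruning and dropout semantics concrete. It is not spGetMatch. The Detection and MatchTuple records, the xmatch_sketch function, the weighting w = 1/σ² with σ converted to radians, and the brute-force candidate loop are all hypothetical choices made for illustration. Each mandatory archive extends every surviving tuple and prunes it by the log-likelihood divided by the number of surveys. Each dropout archive then removes any tuple that would find an acceptable counterpart there, roughly mirroring XMATCH(p, t, !r) < 3.

```python
import math
from dataclasses import dataclass

ARCSEC = math.pi / (180.0 * 3600.0)  # one arcsecond in radians


def unit_vector(ra_deg: float, dec_deg: float) -> tuple:
    """Cartesian unit vector for an (RA, Dec) position given in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


@dataclass(frozen=True)
class Detection:
    """Hypothetical record for one object from one archive."""
    obj_id: str
    ra: float     # degrees
    dec: float    # degrees
    sigma: float  # positional error in arcseconds


@dataclass(frozen=True)
class MatchTuple:
    """Hypothetical partial match carrying the four running sums a, a_x, a_y, a_z."""
    ids: tuple = ()
    a: float = 0.0
    ax: float = 0.0
    ay: float = 0.0
    az: float = 0.0
    n: int = 0  # number of surveys contributing so far

    def add(self, d: Detection) -> "MatchTuple":
        """Return a new tuple extended with detection d (assumed weight 1/sigma^2, sigma in radians)."""
        w = 1.0 / (d.sigma * ARCSEC) ** 2
        x, y, z = unit_vector(d.ra, d.dec)
        return MatchTuple(self.ids + (d.obj_id,), self.a + w,
                          self.ax + w * x, self.ay + w * y, self.az + w * z, self.n + 1)

    def score(self) -> float:
        """-ln L = a - |A| (chi^2_min / 2, constants dropped), divided by the number of surveys.

        a - |A| is a difference of two large numbers, so it loses some
        precision for very small sigma; acceptable for a sketch.
        """
        return (self.a - math.sqrt(self.ax ** 2 + self.ay ** 2 + self.az ** 2)) / self.n


def xmatch_sketch(mandatory: list, dropouts: list, tolerance: float) -> list:
    """Chain archives one at a time, roughly like XMATCH(p, t, !r) < tolerance.

    mandatory: lists of Detection that every result must match.
    dropouts:  lists of Detection in which a result must have NO acceptable match.
    Candidates are paired by brute force; a real node would restrict them spatially.
    """
    first, *rest = mandatory
    tuples = [MatchTuple().add(d) for d in first]
    for catalog in rest:
        # Extend each surviving tuple with every candidate; prune by the normalized score.
        tuples = [ext for t in tuples for d in catalog
                  if (ext := t.add(d)).score() < tolerance]
    for catalog in dropouts:
        # Keep a tuple only if no detection in this archive would pass the tolerance.
        tuples = [t for t in tuples
                  if all(t.add(d).score() >= tolerance for d in catalog)]
    return tuples


# Sample data (hypothetical positions in degrees; sigma = 1 arcsec)
p = [Detection("p1", 10.0, 20.0, 1.0), Detection("p2", 10.1, 20.0, 1.0)]
t = [Detection("t1", 10.0 + 0.5 / 3600, 20.0, 1.0),
     Detection("t2", 10.1, 20.0 + 0.5 / 3600, 1.0)]
r = [Detection("r1", 10.0, 20.0, 1.0)]

# Rough analogue of XMATCH(p, t, !r) < 3
print([m.ids for m in xmatch_sketch([p, t], [r], 3.0)])  # [('p2', 't2')]
```

With these sample positions, both p1/t1 and p2/t2 pass the mandatory step. However, p1/t1 also has an acceptable counterpart in the third list, so only p2/t2 survives as a dropout. The pairwise loop is quadratic in catalog size. A practical implementation would first limit candidates to nearby objects, for example within the query's Region.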