SkyQuery Limitations
SkyQuery is a facility that lets users query individual astronomical
catalogs and compare them by finding positional cross-matches, subject to
any other conditions or constraints the user defines on the data in the
catalogs. The available catalogs/databases are listed on the left side of
the Query Screen, under the title Nodes. We call them SkyNodes.
Users should be aware that queries between SkyNodes (including MyData) are
always limited to a maximum of 1,000,000 rows. We apply this restriction so
that Web access remains possible and big queries don't swamp the systems.
What does this 1,000,000-row limit mean?
- Single node queries will be limited to 1,000,000 rows.
- Cross-matches between query sets that contain more than
1,000,000 objects are likely to be incomplete.
Why?
The way SkyQuery works is as follows:
- First, SkyNodes are queried for the number of rows that meet the query
constraints.
- Then a query plan is created so that the SkyNode with the fewest matching
rows is queried first. It sends its results to the next node in size, which
performs the first cross-match, and so on.
- If the first SkyNode has more than 1,000,000 objects meeting the WHERE
condition, the first cut is applied here. This is likely to happen when
the REGION constraint covers a big area with a lot of objects, and/or the
other conditions in the WHERE clause are not very restrictive.
- An additional cut may happen when the cross-match is performed and the
results are sent to the next node. During the cross-match process, each
object from the prior node is compared to the current catalog, looking for
matches. If the prior node provided about 1,000,000 rows, the xmatch table
might end up with more than 1,000,000 rows even when a one-to-one match is
expected, depending on how restrictive the confidence level in
"XMATCH () < confidence_level" is and on the catalog sigma.
http://www.skyquery.net/Sky/SkySite/help/algo.aspx
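The steps above can be sketched as a small simulation. This is only an illustrative model, not SkyQuery's actual implementation: the function names, the node names, and the idea of a single average "matches per object" factor are all assumptions made for the example.

```python
ROW_LIMIT = 1_000_000  # per-step row cap described above


def plan_order(node_counts):
    """Return node names sorted from fewest to most rows meeting the constraints."""
    return sorted(node_counts, key=node_counts.get)


def simulate_plan(node_counts, matches_per_object):
    """Walk the plan and report (node, rows kept, rows cut) for each step.

    node_counts: hypothetical mapping of node name -> rows meeting the WHERE
        condition at that node.
    matches_per_object: assumed average number of matches that each incoming
        object finds in the next node (1.0 means a strict one-to-one match).
    """
    order = plan_order(node_counts)
    steps = []

    # First node: the cut is applied directly to its own result set.
    produced = node_counts[order[0]]
    kept = min(produced, ROW_LIMIT)
    steps.append((order[0], kept, produced - kept))

    # Later nodes: each incoming row is cross-matched, then the cap is applied.
    for name in order[1:]:
        produced = round(kept * matches_per_object)
        kept = min(produced, ROW_LIMIT)
        steps.append((name, kept, produced - kept))
    return steps


if __name__ == "__main__":
    counts = {"NodeA": 3_000_000, "NodeB": 1_500_000, "NodeC": 5_000_000}
    for node, kept, cut in simulate_plan(counts, matches_per_object=1.5):
        print(f"{node}: kept {kept:,} rows, cut {cut:,} rows")
```

Any step that reports a non-zero number of cut rows is a point where the final cross-match would be incomplete under this model.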
Conclusion
If you expect a big overlap between catalogs, use constraints (PARAMETER
ranges or REGIONS) that keep the number of objects small. How small depends
on how many matches you expect per object.
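As a rough rule of thumb, the number of rows entering a cross-match times the expected matches per object should stay at or below 1,000,000. The sketch below turns that into a quick estimate; the helper name and the use of a single average match count are assumptions for illustration, not part of SkyQuery.

```python
import math

ROW_LIMIT = 1_000_000  # row cap applied to queries between SkyNodes


def max_input_rows(matches_per_object):
    """Estimate how many rows the first node may return without a cut.

    matches_per_object: assumed average number of matches per object in the
        next catalog (hypothetical input that the user must estimate).
    """
    if matches_per_object <= 0:
        raise ValueError("matches_per_object must be positive")
    # The first node is itself capped, so never suggest more than ROW_LIMIT.
    return min(ROW_LIMIT, math.floor(ROW_LIMIT / matches_per_object))


if __name__ == "__main__":
    for m in (1, 2, 4):
        print(f"{m} match(es) per object: keep the input under {max_input_rows(m):,} rows")
```

For example, if each object is expected to match about four objects in the next catalog, this estimate suggests constraining the first query to roughly 250,000 rows.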
We are already working on a parallel framework capable of doing
full catalog-to-catalog cross-matches.
Thank you for your patience!