Across many disciplines, the size and scope of data are quickly outpacing traditional methods
of analysis and visualization. Increasingly, new discoveries and results depend on sophisticated
statistical analysis of large datasets, which can be difficult for others to reproduce. Keeping the
scientific process healthy within this new paradigm will take a renewed effort toward open
research -- including open code and open data -- to lower the barrier to reproducing,
double-checking, and extending scientific results. AstroML is an initiative to
encourage this sort of open research in astronomy and astrophysics. It is a Python package
that contains an open-source compendium of statistics, data mining, and machine learning tools,
along with hundreds of examples of their use on data drawn from open-access astronomical
catalogs. The package takes advantage of the active work of the open-source scientific Python
community centered around the packages NumPy, SciPy, Matplotlib, and scikit-learn. In this talk
I will highlight some of the more interesting tools and datasets made available through AstroML,
and discuss some of the current research and education it enables.