Paper title

Astronomaly: Personalised Active Anomaly Detection in Astronomical Data

Authors

Michelle Lochner, Bruce A. Bassett

Abstract

Survey telescopes such as the Vera C. Rubin Observatory and the Square Kilometre Array will discover billions of static and dynamic astronomical sources. Properly mined, these enormous datasets will likely be wellsprings of rare or unknown astrophysical phenomena. The challenge is that the datasets are so large that most data will never be seen by human eyes, currently the most robust instrument we have to detect relevant anomalies. Machine learning is a useful tool for anomaly detection in this regime. However, it struggles to distinguish between interesting anomalies and irrelevant data such as instrumental artefacts or rare astronomical sources that are simply not of interest to a particular scientist. Active learning combines the flexibility and intuition of the human brain with the raw processing power of machine learning. By strategically choosing specific objects for expert labelling, it minimises the amount of data that scientists have to look through while maximising potential scientific return. Here we introduce Astronomaly: a general anomaly detection framework with a novel active learning approach designed to provide personalised recommendations. Astronomaly can operate on most types of astronomical data, including images, light curves and spectra. We use the Galaxy Zoo dataset to demonstrate the effectiveness of Astronomaly, as well as simulated data to thoroughly test our new active learning approach. We find that for both datasets, Astronomaly roughly doubles the number of interesting anomalies found in the first 100 objects viewed by the user. Astronomaly is easily extendable to include new feature extraction techniques, anomaly detection algorithms and even different active learning approaches. The code is publicly available at https://github.com/MichelleLochner/astronomaly.
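
The general idea in the abstract, ranking objects by an anomaly score and then using a handful of expert ratings to push uninteresting anomalies down the list, can be illustrated with a toy sketch. This is not Astronomaly's algorithm or API. The function names (`anomaly_scores`, `rerank`, `top_k`), the distance-from-mean score, the nearest-labelled-neighbour reweighting, the 0 to 5 rating scale and the ten 2-D feature vectors are all assumptions made up for this illustration.

```python
import math


def anomaly_scores(features):
    """Toy anomaly score: Euclidean distance from the per-feature mean."""
    n = len(features)
    dims = len(features[0])
    mean = [sum(row[d] for row in features) / n for d in range(dims)]
    return [math.dist(row, mean) for row in features]


def rerank(features, scores, labels, max_rating=5):
    """Scale each raw score by the user rating of the nearest labelled object.

    labels maps object index -> user rating in [0, max_rating]
    (hypothetical scale: 0 = not interesting, max_rating = very interesting).
    """
    if not labels:
        return list(scores)
    adjusted = []
    for i, row in enumerate(features):
        nearest = min(labels, key=lambda j: math.dist(row, features[j]))
        adjusted.append(scores[i] * labels[nearest] / max_rating)
    return adjusted


def top_k(scores, k):
    """Indices of the k highest-scoring objects, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Made-up feature vectors: 0-5 ordinary objects, 6-7 instrumental
# artefacts, 8-9 anomalies this particular user finds interesting.
features = [
    (0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),
    (0.1, 0.2), (-0.2, -0.2), (0.0, -0.1),
    (10.0, 10.0), (10.5, 9.5),
    (-8.0, 8.0), (-8.5, 7.5),
]

raw = anomaly_scores(features)
before = top_k(raw, 2)   # the artefacts dominate the raw ranking

# The user rates two of the highly ranked objects: one artefact, one real find.
labels = {7: 0, 9: 5}
adjusted = rerank(features, raw, labels)
after = top_k(adjusted, 2)  # the interesting anomalies now come first

print("top 2 before feedback:", before)
print("top 2 after feedback:", after)
```

In this sketch two ratings are enough to move both artefacts to the bottom of the list, because every unlabelled object inherits the rating of its nearest labelled neighbour. That shortcut ignores how far away the nearest label actually is, so it is only meant to convey the intuition of personalised re-ranking, not a method suitable for real survey data.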
