Paper title
TreeGen -- a Monte Carlo generator for data frames
Paper authors
Paper abstract
A typical problem in data science is building a structure that encodes how often unique elements occur in rows and how different rows of a data frame relate to one another. We present the probability tree, an abstract data structure that extends the decision tree by allowing more than two choices at a node, each with an assigned probability. Such a tree represents statistical relations between different rows of the data frame. The probability tree is paired with the Generator module, a Monte Carlo generator that traverses the tree. Both components are implemented in the TreeGen Python package. The package can be used to increase data multiplicity, to compress data while preserving its statistical information, to construct hierarchical models, to explore data, and to extract features.
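The idea of a probability tree with a Monte Carlo traversal can be illustrated with a minimal sketch. It does not use the TreeGen API: the names `Node`, `build_tree`, and `generate` are hypothetical. It also assumes that each record is a short sequence of categorical values and that level k of the tree branches on the k-th value, with branch probabilities estimated from occurrence counts.

```python
import random
from collections import defaultdict


class Node:
    """One node of a probability tree (hypothetical, not the TreeGen class)."""

    def __init__(self):
        self.counts = defaultdict(int)  # child value -> number of occurrences
        self.children = {}              # child value -> child Node

    def add(self, sequence):
        """Insert one observed sequence of values below this node."""
        if not sequence:
            return
        head, tail = sequence[0], sequence[1:]
        self.counts[head] += 1
        self.children.setdefault(head, Node()).add(tail)

    def probabilities(self):
        """Branch probabilities estimated from occurrence frequencies."""
        total = sum(self.counts.values())
        return {value: n / total for value, n in self.counts.items()}


def build_tree(records):
    """Build a probability tree from an iterable of value sequences."""
    root = Node()
    for record in records:
        root.add(tuple(record))
    return root


def generate(root, rng):
    """Walk from the root to a leaf, picking each branch by its probability."""
    path, node = [], root
    while node.counts:
        values = list(node.counts)
        weights = [node.counts[v] for v in values]
        choice = rng.choices(values, weights=weights, k=1)[0]
        path.append(choice)
        node = node.children[choice]
    return tuple(path)


# Toy data (made up for illustration): four records of two values each.
records = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z")]
tree = build_tree(records)

rng = random.Random(0)  # seeded for reproducibility
samples = [generate(tree, rng) for _ in range(1000)]
```

In this sketch the tree stores each distinct prefix only once together with its count, which is what makes it a compressed summary of the data. The generator can emit any number of new records, and every generated record follows a path that was observed in the input.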