Paper Title
A Benchmark of JSON-compatible Binary Serialization Specifications
Paper Authors
Paper Abstract
We present a comprehensive benchmark of JSON-compatible binary serialization specifications using the SchemaStore open-source test suite, a collection of over 400 JSON documents that match their respective schemas and are representative of their use across industries. We benchmark a set of schema-driven (ASN.1, Apache Avro, Microsoft Bond, Cap'n Proto, FlatBuffers, Protocol Buffers, and Apache Thrift) and schema-less (BSON, CBOR, FlexBuffers, MessagePack, Smile, and UBJSON) JSON-compatible binary serialization specifications. Existing literature on benchmarking JSON-compatible binary serialization specifications shows extensive gaps in specification coverage, reproducibility and representativeness, the role of data compression in binary serialization, and the choice and use of obsolete versions of binary serialization specifications. We introduce a tiered taxonomy for JSON documents consisting of 36 categories, grouped into Tier 1, Tier 2, and Tier 3, as a common basis for classifying JSON documents by their size, type of content, structural characteristics, and redundancy criteria. We built and published a free-to-use online tool that automatically categorizes JSON documents according to our taxonomy and generates related summary statistics. In the interest of fairness and transparency, we adhere to reproducible software development standards and publicly host the benchmark software and results on GitHub.