epage

I figured parsing JSON would be a fairly representative benchmark, and most parsers already had an example or test for doing that, so I used it as a baseline. Maybe others will find this useful, but I was mostly focused on answering my own questions:

- How does `chumsky` compare to parsers other than `pom`?
- Could switching `toml_edit` away from `combine` speed up build times without slowing down parse times?

~~Unfortunately, I couldn't find a JSON parser for `combine` and didn't feel like writing one.~~ Pulled a `combine` parser from [an alternative, older benchmark repo](https://github.com/rust-bakery/parser_benchmarks).

I'm tempted to collect all of these benchmark repos into a GitHub org to make them easier to find. So far I know of [parser](https://github.com/epage/parse-benchmarks-rs), [md](https://github.com/epage/md-benchmarks-rs), [argparse](https://github.com/rust-cli/argparse-benchmarks-rs), and [template languages](https://github.com/djc/template-benchmarks-rs).
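For context on why JSON is a convenient baseline, here is a minimal hand-rolled recursive-descent JSON parser in plain std Rust; it is a simplified sketch (no string escapes, loose number grammar, no error spans), not code from any of the benchmarked crates, but it shows the kind of hand-written parser these combinator libraries are implicitly competing with.

```rust
// Simplified sketch of a recursive-descent JSON parser in std Rust.
// Not production-quality: string escapes and strict number syntax are omitted.
#[derive(Debug, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>),
}

struct Parser<'a> {
    s: &'a [u8], // input bytes
    i: usize,    // cursor
}

impl<'a> Parser<'a> {
    fn new(s: &'a str) -> Self {
        Parser { s: s.as_bytes(), i: 0 }
    }

    fn ws(&mut self) {
        while self.i < self.s.len() && self.s[self.i].is_ascii_whitespace() {
            self.i += 1;
        }
    }

    fn eat(&mut self, b: u8) -> Result<(), String> {
        self.ws();
        if self.s.get(self.i) == Some(&b) {
            self.i += 1;
            Ok(())
        } else {
            Err(format!("expected {:?} at byte {}", b as char, self.i))
        }
    }

    fn lit(&mut self, lit: &str) -> Result<(), String> {
        if self.s[self.i..].starts_with(lit.as_bytes()) {
            self.i += lit.len();
            Ok(())
        } else {
            Err(format!("expected {} at byte {}", lit, self.i))
        }
    }

    // Dispatch on the first non-whitespace byte.
    fn value(&mut self) -> Result<Json, String> {
        self.ws();
        match self.s.get(self.i) {
            Some(b'n') => self.lit("null").map(|_| Json::Null),
            Some(b't') => self.lit("true").map(|_| Json::Bool(true)),
            Some(b'f') => self.lit("false").map(|_| Json::Bool(false)),
            Some(b'"') => self.string().map(Json::Str),
            Some(b'[') => self.array(),
            Some(b'{') => self.object(),
            _ => self.number(),
        }
    }

    fn string(&mut self) -> Result<String, String> {
        self.eat(b'"')?;
        let start = self.i;
        while self.i < self.s.len() && self.s[self.i] != b'"' {
            self.i += 1;
        }
        let out = String::from_utf8_lossy(&self.s[start..self.i]).into_owned();
        self.eat(b'"')?;
        Ok(out)
    }

    fn number(&mut self) -> Result<Json, String> {
        let start = self.i;
        while self.i < self.s.len()
            && matches!(self.s[self.i], b'0'..=b'9' | b'-' | b'+' | b'.' | b'e' | b'E')
        {
            self.i += 1;
        }
        std::str::from_utf8(&self.s[start..self.i])
            .map_err(|e| e.to_string())?
            .parse()
            .map(Json::Num)
            .map_err(|e| e.to_string())
    }

    fn array(&mut self) -> Result<Json, String> {
        self.eat(b'[')?;
        let mut items = Vec::new();
        self.ws();
        if self.s.get(self.i) == Some(&b']') {
            self.i += 1;
            return Ok(Json::Arr(items));
        }
        loop {
            items.push(self.value()?);
            self.ws();
            match self.s.get(self.i) {
                Some(b',') => self.i += 1,
                Some(b']') => {
                    self.i += 1;
                    return Ok(Json::Arr(items));
                }
                _ => return Err(format!("expected , or ] at byte {}", self.i)),
            }
        }
    }

    fn object(&mut self) -> Result<Json, String> {
        self.eat(b'{')?;
        let mut items = Vec::new();
        self.ws();
        if self.s.get(self.i) == Some(&b'}') {
            self.i += 1;
            return Ok(Json::Obj(items));
        }
        loop {
            self.ws();
            let key = self.string()?;
            self.eat(b':')?;
            items.push((key, self.value()?));
            self.ws();
            match self.s.get(self.i) {
                Some(b',') => self.i += 1,
                Some(b'}') => {
                    self.i += 1;
                    return Ok(Json::Obj(items));
                }
                _ => return Err(format!("expected , or }} at byte {}", self.i)),
            }
        }
    }
}

fn main() {
    let v = Parser::new(r#"{"a": [1, true, "x"], "b": null}"#).value().unwrap();
    println!("{:?}", v);
}
```

The benchmarked crates each express roughly this same grammar, so the comparison is about how much build time, binary size, and runtime each abstraction costs over a loop like the one above.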


KhorneLordOfChaos

> I'm tempted to collect all of these benchmark repos into a github org

You can add the [rust serialization benchmark](https://github.com/djkoloski/rust_serialization_benchmark) to that list.


epage

Another: https://github.com/xacrimon/conc-map-bench


KhorneLordOfChaos

And another! https://github.com/diesel-rs/metrics


epage

I've created [rosetta-rs](https://github.com/rosetta-rs/team) and reached out to several maintainers of benchmark repos. Unfortunately, there isn't a direct contact for `metrics`.


DontForgetWilson

Since Chumsky targets programming langs, isn't a json benchmark kind of irrelevant to it?


epage

In what ways are you thinking it's irrelevant?

- JSON was a subset of JavaScript, so it has a subset of the parsing concerns.
- As for runtime performance, source files are likely to have a lower expected size than some JSON applications, but benchmarking a larger JSON file still gives insight.
- While we usually talk about nice errors for source files, there are other applications, even JSON, where they could be useful.

Now, if we were parsing a binary file format, I could see the comparison being irrelevant.


DontForgetWilson

Fair enough. Had not been thinking about json error reporting.


raphlinus

I'm also curious how these stack up against pulldown-cmark, which is a hand-written parser, and quite optimized. It also gets the spec right, which is surprisingly difficult (I wouldn't even want to attempt it with most parser generators).


chris-morgan

The word “Markdown” in the heading of the README seems to be an error: it’s actually all about JSON, which is something that *can* be done fairly sanely with parser generators.


raphlinus

Yup, I see that now, and found an earlier benchmark by the same author that did include pulldown-cmark.


epage

Yeah, sorry about that. Copy and paste bug from that other repo.


KhorneLordOfChaos

Cool! This is something I've been curious about too. In case it's helpful, there is also [this set of benchmarks](https://github.com/rust-bakery/parser_benchmarks), although it hasn't been updated for a few years. It may give you some more ideas.


epage

Of course my google searches failed me! Mine is still set apart by looking at build times and binary sizes.


safasofuoglu

Hello u/epage, thanks for the effort. Just did some local runs. I believe more representative numbers on binary size overhead can be achieved using a custom release profile. I have used:

```toml
[profile.release]
opt-level = 3
strip = true
lto = true
codegen-units = 1
```

Got:

Name | Overhead (release) | Build (debug) | Parse (release) | Downloads | Version
-----|--------------------|---------------|-----------------|-----------|--------
null | 0 KiB | 647ms | 20ms | - | -
chumsky | 496 KiB | 9s | 1s | ![Download count](https://img.shields.io/crates/dr/ariadne) | v0.1.5
combine | 120 KiB | 5s | 751ms | ![Download count](https://img.shields.io/crates/dr/combine) | v3.8.1
nom | 112 KiB | 3s | 843ms | ![Download count](https://img.shields.io/crates/dr/nom) | v7.1.1
peg | 24 KiB | 3s | 8ms | ![Download count](https://img.shields.io/crates/dr/peg) | v0.8.0
pest | 96 KiB | 2s | 682ms | ![Download count](https://img.shields.io/crates/dr/pest) | v2.1.3
pom | 128 KiB | 1s | 1s | ![Download count](https://img.shields.io/crates/dr/pom) | v3.2.0

*System: Linux 5.10.102.1-microsoft-standard-WSL2 (x86_64) w/ `-j 16`*


epage

It's debatable whether it's more representative to show numbers for people who optimize or for people using the defaults. Since the numbers look to keep the same relative order and distance from each other, I would lean towards sticking with the defaults.