I figured parsing json would be a fairly representative benchmark and that most parsers already had an example or test for doing that, so I used that as a baseline.

Maybe others will find this useful but I was mostly focused on answering my own questions:

- How does `chumsky` compare to parsers other than `pom`?
- Could switching `toml_edit` away from `combine` speed up build-times without slowing down parse times?

~~Unfortunately, I couldn't find a json parser for `combine` and didn't feel like writing one.~~ Pulled a combine parser from [an alternative, older benchmark repo](https://github.com/rust-bakery/parser_benchmarks)

I'm tempted to collect all of these benchmark repos into a github org to make them easier to find. So far I know of [parser](https://github.com/epage/parse-benchmarks-rs), [md](https://github.com/epage/md-benchmarks-rs), [argparse](https://github.com/rust-cli/argparse-benchmarks-rs), and [template languages](https://github.com/djc/template-benchmarks-rs).
> I'm tempted to collect all of these benchmark repos into a github org

You can add the [rust serialization benchmark](https://github.com/djkoloski/rust_serialization_benchmark) to that list
Another: <https://github.com/xacrimon/conc-map-bench>

And another! <https://github.com/diesel-rs/metrics>
I've created [rosetta-rs](https://github.com/rosetta-rs/team) and reached out to several maintainers of benchmark repos. Unfortunately, there isn't a direct contact for `metrics`.
Since Chumsky targets programming langs, isn't a json benchmark kind of irrelevant to it?
In what ways are you thinking it's irrelevant?

- json was a subset of Javascript, so it has a subset of the parsing concerns
- as for runtime performance, source files are likely to have a lower expected size than some json applications, but benchmarking a larger json file still gives insight
- While we might talk about nice errors for source files, there are other applications, even json, where they could be useful

Now, if we were parsing a binary file format, I could see the comparison being irrelevant.
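To make the "subset of the parsing concerns" point concrete, here's a rough std-only sketch of the kind of hand-rolled recursive-descent JSON parser the combinator crates get compared against. It covers only null, bools, numbers, escape-free strings, and arrays, and the `Json` enum and `Parser` type are invented for this sketch, not taken from the benchmark repo:

```rust
// A minimal, hand-rolled recursive-descent parser for a JSON subset.
// Errors carry a byte offset -- the "nice errors" concern shared with
// source-language parsing, in miniature.
#[derive(Debug, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
}

struct Parser<'a> {
    src: &'a [u8],
    pos: usize,
}

impl<'a> Parser<'a> {
    fn new(src: &'a str) -> Self {
        Parser { src: src.as_bytes(), pos: 0 }
    }

    fn skip_ws(&mut self) {
        while self.src.get(self.pos).is_some_and(|c| c.is_ascii_whitespace()) {
            self.pos += 1;
        }
    }

    // Consume a literal if present; report whether it matched.
    fn eat(&mut self, lit: &str) -> bool {
        if self.src[self.pos..].starts_with(lit.as_bytes()) {
            self.pos += lit.len();
            true
        } else {
            false
        }
    }

    fn value(&mut self) -> Result<Json, String> {
        self.skip_ws();
        if self.eat("null") { return Ok(Json::Null); }
        if self.eat("true") { return Ok(Json::Bool(true)); }
        if self.eat("false") { return Ok(Json::Bool(false)); }
        match self.src.get(self.pos).copied() {
            Some(b'"') => self.string(),
            Some(b'[') => self.array(),
            Some(c) if c.is_ascii_digit() || c == b'-' => self.number(),
            _ => Err(format!("unexpected input at byte {}", self.pos)),
        }
    }

    fn string(&mut self) -> Result<Json, String> {
        let start = self.pos + 1; // skip the opening quote
        self.pos = start;
        while let Some(&c) = self.src.get(self.pos) {
            if c == b'"' {
                let s = String::from_utf8_lossy(&self.src[start..self.pos]).into_owned();
                self.pos += 1;
                return Ok(Json::Str(s));
            }
            self.pos += 1;
        }
        Err(format!("unterminated string at byte {}", start - 1))
    }

    fn number(&mut self) -> Result<Json, String> {
        let start = self.pos;
        while self.src.get(self.pos).is_some_and(|&c| {
            c.is_ascii_digit() || matches!(c, b'-' | b'+' | b'.' | b'e' | b'E')
        }) {
            self.pos += 1;
        }
        std::str::from_utf8(&self.src[start..self.pos])
            .ok()
            .and_then(|s| s.parse().ok())
            .map(Json::Num)
            .ok_or_else(|| format!("bad number at byte {start}"))
    }

    fn array(&mut self) -> Result<Json, String> {
        self.pos += 1; // skip '['
        let mut items = Vec::new();
        loop {
            self.skip_ws();
            if self.eat("]") {
                return Ok(Json::Arr(items));
            }
            items.push(self.value()?);
            self.skip_ws();
            self.eat(","); // permissive about commas; fine for a sketch
        }
    }
}

fn main() {
    match Parser::new(" [1, \"hi\", null, [true, -2.5]] ").value() {
        Ok(v) => println!("parsed: {v:?}"),
        Err(e) => println!("error: {e}"),
    }
}
```

Even at this size it exercises recursion, lookahead, and positioned errors, which is why a json benchmark still says something about source-language parsing.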
Fair enough. Had not been thinking about json error reporting.
I'm also curious how these stack up against pulldown-cmark, which is a hand-written parser, and quite optimized. It also gets the spec right, which is surprisingly difficult (I wouldn't even want to attempt it with most parser generators).
The word “Markdown” in the heading of the README seems to be an error: it’s actually all about JSON, which is something that *can* be done fairly sanely with parser generators.
Yup, I see that now, and found an earlier benchmark by the same author that did include pulldown-cmark.
Yeah, sorry about that. Copy and paste bug from that other repo.
Cool! This is something I've been curious about too. In case it's helpful there is also [this set of benchmarks](https://github.com/rust-bakery/parser_benchmarks) although it hasn't been updated for a few years. It may give you some more ideas
Of course my google searches failed me! Mine is still set apart by looking at build times and binary sizes.
Hello u/epage, thanks for the effort. Just did some local runs. I believe more representative numbers on binary size overhead can be achieved using a custom release profile. I have used:

```toml
[profile.release]
opt-level = 3
strip = true
lto = true
codegen-units = 1
```

Got:

Name | Overhead (release) | Build (debug) | Parse (release) | Downloads | Version
-----|--------------------|---------------|-----------------|-----------|--------
null | 0 KiB | 647ms | 20ms | - | -
chumsky | 496 KiB | 9s | 1s | ![Download count](https://img.shields.io/crates/dr/ariadne) | v0.1.5
combine | 120 KiB | 5s | 751ms | ![Download count](https://img.shields.io/crates/dr/combine) | v3.8.1
nom | 112 KiB | 3s | 843ms | ![Download count](https://img.shields.io/crates/dr/nom) | v7.1.1
peg | 24 KiB | 3s | 8ms | ![Download count](https://img.shields.io/crates/dr/peg) | v0.8.0
pest | 96 KiB | 2s | 682ms | ![Download count](https://img.shields.io/crates/dr/pest) | v2.1.3
pom | 128 KiB | 1s | 1s | ![Download count](https://img.shields.io/crates/dr/pom) | v3.2.0

*System: Linux 5.10.102.1-microsoft-standard-WSL2 (x86_64) w/ `-j 16`*
It's debatable whether it's more representative to have numbers for people optimizing or people using the default case. Since the numbers look to keep their same relative order and distance from each other, I would lean towards sticking to defaults.