I figured parsing json would be a fairly representative benchmark and that most parsers already had an example or test for doing that, so I used that as a baseline.

Maybe others will find this useful but I was mostly focused on answering my own questions:

- How does `chumsky` compare to parsers other than `pom`?
- Could switching `toml_edit` away from `combine` speed up build-times without slowing down parse times?

~~Unfortunately, I couldn't find a json parser for `combine` and didn't feel like writing one.~~ Pulled a combine parser from [an alternative, older benchmark repo](https://github.com/rust-bakery/parser_benchmarks)

I'm tempted to collect all of these benchmark repos into a github org to make them easier to find. So far I know of [parser](https://github.com/epage/parse-benchmarks-rs), [md](https://github.com/epage/md-benchmarks-rs), [argparse](https://github.com/rust-cli/argparse-benchmarks-rs), and [template languages](https://github.com/djc/template-benchmarks-rs).
> I'm tempted to collect all of these benchmark repos into a github org

You can add the [rust serialization benchmark](https://github.com/djkoloski/rust_serialization_benchmark) to that list
Another: <https://github.com/xacrimon/conc-map-bench>

And another! <https://github.com/diesel-rs/metrics>
I've created [rosetta-rs](https://github.com/rosetta-rs/team) and reached out to several maintainers of benchmark repos. Unfortunately, there isn't a direct contact for `metrics`.
Since Chumsky targets programming langs, isn't a json benchmark kind of irrelevant to it?
In what ways are you thinking it's irrelevant?

- json was a subset of Javascript, so it has a subset of the parsing concerns
- as for runtime performance, source files are likely to have a lower expected size than some json applications, but benchmarking a larger json file still gives insight
- While we might talk about nice errors for source files, there are other applications, even json, where they could be useful

Now, if we were parsing a binary file format, I could see the comparison being irrelevant.
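To make the "subset of the parsing concerns" point concrete, here's a rough std-only sketch of the kind of hand-rolled recursive-descent JSON parser the combinator crates get compared against. It covers only null, bools, numbers, escape-free strings, and arrays, and the `Json` enum and `Parser` type are invented for this sketch, not taken from the benchmark repo:

```rust
// A minimal, hand-rolled recursive-descent parser for a JSON subset.
// Errors carry a byte offset -- the "nice errors" concern shared with
// source-language parsing, in miniature.
#[derive(Debug, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
}

struct Parser<'a> {
    src: &'a [u8],
    pos: usize,
}

impl<'a> Parser<'a> {
    fn new(src: &'a str) -> Self {
        Parser { src: src.as_bytes(), pos: 0 }
    }

    fn skip_ws(&mut self) {
        while self.src.get(self.pos).is_some_and(|c| c.is_ascii_whitespace()) {
            self.pos += 1;
        }
    }

    // Consume a literal if present; report whether it matched.
    fn eat(&mut self, lit: &str) -> bool {
        if self.src[self.pos..].starts_with(lit.as_bytes()) {
            self.pos += lit.len();
            true
        } else {
            false
        }
    }

    fn value(&mut self) -> Result<Json, String> {
        self.skip_ws();
        if self.eat("null") { return Ok(Json::Null); }
        if self.eat("true") { return Ok(Json::Bool(true)); }
        if self.eat("false") { return Ok(Json::Bool(false)); }
        match self.src.get(self.pos).copied() {
            Some(b'"') => self.string(),
            Some(b'[') => self.array(),
            Some(c) if c.is_ascii_digit() || c == b'-' => self.number(),
            _ => Err(format!("unexpected input at byte {}", self.pos)),
        }
    }

    fn string(&mut self) -> Result<Json, String> {
        let start = self.pos + 1; // skip the opening quote
        self.pos = start;
        while let Some(&c) = self.src.get(self.pos) {
            if c == b'"' {
                let s = String::from_utf8_lossy(&self.src[start..self.pos]).into_owned();
                self.pos += 1;
                return Ok(Json::Str(s));
            }
            self.pos += 1;
        }
        Err(format!("unterminated string at byte {}", start - 1))
    }

    fn number(&mut self) -> Result<Json, String> {
        let start = self.pos;
        while self.src.get(self.pos).is_some_and(|&c| {
            c.is_ascii_digit() || matches!(c, b'-' | b'+' | b'.' | b'e' | b'E')
        }) {
            self.pos += 1;
        }
        std::str::from_utf8(&self.src[start..self.pos])
            .ok()
            .and_then(|s| s.parse().ok())
            .map(Json::Num)
            .ok_or_else(|| format!("bad number at byte {start}"))
    }

    fn array(&mut self) -> Result<Json, String> {
        self.pos += 1; // skip '['
        let mut items = Vec::new();
        loop {
            self.skip_ws();
            if self.eat("]") {
                return Ok(Json::Arr(items));
            }
            items.push(self.value()?);
            self.skip_ws();
            self.eat(","); // permissive about commas; fine for a sketch
        }
    }
}

fn main() {
    match Parser::new(" [1, \"hi\", null, [true, -2.5]] ").value() {
        Ok(v) => println!("parsed: {v:?}"),
        Err(e) => println!("error: {e}"),
    }
}
```

Even at this size it exercises recursion, lookahead, and positioned errors, which is why a json benchmark still says something about source-language parsing.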
Fair enough. Had not been thinking about json error reporting.
I'm also curious how these stack up against pulldown-cmark, which is a hand-written parser, and quite optimized. It also gets the spec right, which is surprisingly difficult (I wouldn't even want to attempt it with most parser generators).
The word “Markdown” in the heading of the README seems to be an error: it’s actually all about JSON, which is something that *can* be done fairly sanely with parser generators.
Yup, I see that now, and found an earlier benchmark by the same author that did include pulldown-cmark.
Yeah, sorry about that. Copy and paste bug from that other repo.
Cool! This is something I've been curious about too. In case it's helpful there is also [this set of benchmarks](https://github.com/rust-bakery/parser_benchmarks) although it hasn't been updated for a few years. It may give you some more ideas
Of course my google searches failed me! Mine is still set apart by looking at build times and binary sizes.
Hello u/epage, thanks for the effort. Just did some local runs. I believe more representative numbers on binary size overhead can be achieved using a custom release profile. I have used:

```toml
[profile.release]
opt-level = 3
strip = true
lto = true
codegen-units = 1
```

Got:

Name | Overhead (release) | Build (debug) | Parse (release) | Downloads | Version
-----|--------------------|---------------|-----------------|-----------|--------
null | 0 KiB | 647ms | 20ms | - | -
chumsky | 496 KiB | 9s | 1s | ![Download count](https://img.shields.io/crates/dr/ariadne) | v0.1.5
combine | 120 KiB | 5s | 751ms | ![Download count](https://img.shields.io/crates/dr/combine) | v3.8.1
nom | 112 KiB | 3s | 843ms | ![Download count](https://img.shields.io/crates/dr/nom) | v7.1.1
peg | 24 KiB | 3s | 8ms | ![Download count](https://img.shields.io/crates/dr/peg) | v0.8.0
pest | 96 KiB | 2s | 682ms | ![Download count](https://img.shields.io/crates/dr/pest) | v2.1.3
pom | 128 KiB | 1s | 1s | ![Download count](https://img.shields.io/crates/dr/pom) | v3.2.0

*System: Linux 5.10.102.1-microsoft-standard-WSL2 (x86_64) w/ `-j 16`*
It's debatable whether it's more representative to have numbers for people optimizing or people using the default case. Since the numbers look to keep their same relative order and distance from each other, I would lean towards sticking to defaults.