The goals of artifact evaluation
I have participated in artifact evaluation committees since 2021: five years of CHES, one year of EUROCRYPT, and two years of NDSS. Over these years, I have seen good artifacts, even great ones, and a fair share of “just OK” ones. There have been no truly terrible ones, though some required a bit of polish.
The approach to artifact evaluation also differs between venues, and reviewing an artifact for NDSS this year reminded me of the differences. In the following text, I share some opinions on those differences and on the aims of artifact evaluation at large. I also looked at how other conferences do artifact evaluation, even ones I was not reviewing for, such as USENIX Sec, CCS, CRYPTO, or ACSAC. I summarize how their approach changed over the years, what badges they use, what they focus on, and whether they require artifact appendices with a strict format. I also look at link rot and the availability of artifacts over time.
I want to highlight the work done by the community at secartifacts.github.io. Having one place that survives even if the conference pages disappear is quite nice.
Focus on reproducibility
NDSS, as well as a few other conferences (e.g., USENIX Sec), puts a heavy focus on reproducibility. These venues require that artifacts come with a detailed artifact appendix (USENIX Sec 2026 example). The appendix describes the artifact, its requirements, and its setup, but most importantly, it lays out the major claims of the paper, how the artifact supports them, and how reviewers can run experiments to verify them. The appendix is thus tightly focused on the details of the paper and on reproducing them exactly, and it certainly helps in reproducing the experiments.
However, I find that the artifact appendix, and this focus on reproducibility in general, does little for the reusability and extendability of the artifact. The conferences with this reproducibility goal do mention extendability and reusability as well, but with so much attention on reproducibility, the other goals get less of it, from both authors and reviewers. There is only a finite amount of effort authors will put towards an artifact, and if they choose to target the reproduced badge, the added workload is significant, especially given how detailed the artifact appendix needs to be.
The reviewers, guided by the artifact appendix, then also focus mainly on reproducibility. I have seen artifacts that “passed all the checks” in the artifact appendix get evaluated positively even though their code was a nightmare: riddled with bad practices, largely undocumented, and a pain to reuse.
While I think the focus on reproducibility is worthwhile, it should not come at the cost of extendability or reusability. I also think the added value of reviewers running the artifact and comparing its outputs to the outputs claimed in the paper is relatively small. Given the time reviewers are expected to spend on an evaluation, there is very little chance they will spot a bug in the code or an error in a dataset that completely invalidates the findings, and even less chance they will spot a malicious author spoofing the results.
This brings me to an interesting scenario. The badges at NDSS (and maybe some other venues) are not fully transitive, meaning you can request the functional and reproduced badges without requesting the available badge. In that case, you give reviewers access to the artifact for evaluation, but for whatever reason (usually legal/licensing), you do not release it publicly. The only outcome of such an evaluation is a stamp on your paper with the badge. Ideally, this stamp would increase trust in the paper's results, provided you trust that the artifact reviewers did the evaluation correctly. However, as I pointed out above, I believe there is only a small chance that artifact reviewers would actually spot a correctness issue or a malicious author spoofing the results.
Focus on reusability
Conferences like CHES focus on extendability and reusability. CHES explicitly dropped the reproduced badge in 2025. The exact rationale was not publicly described, so I can only hypothesize: reproducibility is a really tricky target for conferences like CHES, where expensive, or sometimes even custom, hardware is often a requirement. Reproducing someone’s experimental lab setup within the reasonable time allocated for artifact evaluation is a hefty goal.
With no artifact appendix or reproducibility goal, the evaluation is simplified for both the reviewers and the authors. I like this guideline from CHES a lot:
> When in doubt, imagine a first-year grad student in 2030 who is told by their supervisor: “See if you can change this artifact from CHES 2025 to do X.” We want to give them the best chance of success with the least amount of pain.
It summarizes the usual situation many of us have found ourselves in. This is where proper documentation, good practices, and structure can help out a lot.
Archiving
Conferences have different models with regard to when a public, long-term upload of the artifact should be done. Some require it upfront for the artifact available badge, whereas others only require it after the review process. This sometimes creates confusion among authors and reviewers. The IACR venues even take on the responsibility of archiving the artifacts themselves at artifacts.iacr.org. Having proper rules on where and how artifacts should be archived is important to ensure long-term availability and prevent link rot (as the ACSAC numbers below show).
Conferences
Below is a summary of the conferences I looked at. For some, I checked the artifact links from the last several years and evaluated whether they still work and point to an accessible website.
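Checking this kind of link rot is easy to script. The following is a minimal sketch in Python of how such a check can be done; the `requests` library, the `links.txt` input file, and the definition of “available” (any HTTP status below 400 after redirects) are my assumptions for illustration, not the exact methodology behind the tables below.

```python
# link_check.py -- minimal sketch of an artifact link-rot check.
# Assumptions (not from the post): URLs live in links.txt, one per line,
# and "available" means an HTTP status < 400 after following redirects.
import sys

import requests

def check(url: str, timeout: float = 15.0) -> str:
    try:
        # Some hosts reject HEAD, so fall back to GET without reading the body.
        resp = requests.head(url, timeout=timeout, allow_redirects=True)
        if resp.status_code in (403, 405):
            resp = requests.get(url, timeout=timeout, allow_redirects=True, stream=True)
        return "available" if resp.status_code < 400 else f"broken ({resp.status_code})"
    except requests.RequestException as exc:
        return f"broken ({type(exc).__name__})"

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "links.txt"
    with open(path) as fh:
        for url in (line.strip() for line in fh if line.strip()):
            print(f"{check(url):>24}  {url}")
```

Note that an HTTP 200 only means the link resolves; whether the page actually still hosts the artifact needs a manual look.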
CHES
- Badges: none until 2024; Available, Functional, and Reproduced from 2024; Reproduced dropped in 2025.
- No artifact appendix.
- Since 2021: 2021 CfA, 2022 CfA, 2023 CfA, 2024 CfA, 2025 CfA, 2026 CfA
- Artifacts archived by the IACR.
CRYPTO
- Badges: No badges
- No artifact appendix.
- Only happened in 2024: 2024 CfA
- Artifacts archived by the IACR.
EUROCRYPT
- Badges: Available, Functional, Reproduced since 2025; none before that.
- No artifact appendix.
- Since 2024: 2024 CfA, 2025 CfA, 2026 CfA
- Artifacts archived by the IACR.
ASIACRYPT
- Badges: Available, Functional, Reproduced
- No artifact appendix.
- Since 2024: 2024 CfA, 2025 CfA
- Artifacts archived by the IACR.
FSE
- Badges: Available, Functional
- No artifact appendix.
- Since 2025: no CfA online?
- Artifacts archived by the IACR.
NDSS
- Badges: Available, Functional, Reproduced
- Has an artifact appendix.
- Since 2024: 2024 CfA, 2025 CfA, 2026 CfA
- On secartifacts.github.io, only links to artifacts, not archived by the venue.
| Year | Total | Available | Not available | Broken |
|---|---|---|---|---|
| 2024 | 38 | 36 | 2 | 0 |
| 2025 | 63 | 61 | 2 | 0 |
| 2026 | 114 | 109 | 4 | 1 |
CCS
- Badges: the ACM badges (Available, Evaluated, Evaluated-Functional, Evaluated-Reusable, Reproduced); used the USENIX Sec badges in 2023.
- Has an artifact appendix, but it is only added after the artifact evaluation.
- Since 2023: 2023 CfA, 2024 CfA, 2025 CfA, 2026 CfA
- Not on secartifacts.github.io. It is actually unclear where the artifacts from the three years of CCS artifact evaluation ended up (see the discussion in the issue). They are not listed on the conference pages. Looking at the publisher’s site (ACM DL) and going year by year, one gets:
USENIX Sec
- Badges: Available, Functional, Reproduced
- Has an artifact appendix.
- Since 2020: 2020 CfA, 2021 CfA, 2022 CfA, 2023 CfA, 2024 CfA, 2025 CfA, 2026 CfA
- USENIX Sec has an open science policy starting in 2026: all papers must share their artifacts, unless they have none. Interesting!
- USENIX Sec definitely has the most developed artifact evaluation process of the venues here.
- On secartifacts.github.io, only links to artifacts, not archived by the venue.
- Artifact availability at provided links:
| Year | Total | Available | Broken |
|---|---|---|---|
| 2020 | 36 | 35 | 1 |
| 2021 | 28 | 28 | 0 |
| 2022 | 108 | 108 | 0 |
| 2023 | 138 | 138 | 0 |
| 2024 | 138 | 138 | 0 |
| 2025 | 380 | 379 | 1 |
S&P
- Badges: Available, Functional, Reproduced
- No artifact appendix.
- Since 2026: 2026 CfA
ACSAC
- Badges: IEEE Xplore badges (Available, Reviewed, Reproducible, Replicated) since 2024; ACM badges (Available, Evaluated, Evaluated-Functional, Evaluated-Reusable, Reproduced) before that.
- No artifact appendix.
- Since 2017: 2020 CfA, 2021 CfA, 2022 CfA, 2023 CfA, 2024 CfA, 2025 CfA, 2026 CfA
- On secartifacts.github.io, only links to artifacts, not archived by the venue.
- ACSAC (especially in 2024) seems to accept just about any link to the artifact, with the promise that it remains online and available. This means that even personal/lab websites make it into the mix, as well as sites like anonymous.4open.science, which are great for blinding during review but are not a permanent store of artifacts. This leads to the following state of affairs w.r.t. actual artifact availability and link rot:
| Year | Total | Available | Broken |
|---|---|---|---|
| 2017 | 12 | 8 | 4 |
| 2018 | 22 | 18 | 4 |
| 2019 | 19 | 19 | 0 |
| 2020 | 26 | 25 | 1 |
| 2021 | 20 | 18 | 2 |
| 2022 | 43 | 41 | 2 |
| 2023 | 38 | 34 | 4 |
| 2024 | 53 | 37 | 16 |
| 2025 | 55 | 54 | 1 |
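A venue that wants to catch such links early could triage submitted URLs against an allowlist of archival hosts before accepting them. The sketch below is purely hypothetical; the host lists and example URLs are my illustration, not any venue’s actual policy.

```python
# Hypothetical triage of artifact URLs by hosting longevity.
# The host lists below are illustrative assumptions, not any venue's policy.
from urllib.parse import urlparse

ARCHIVAL_HOSTS = {"zenodo.org", "artifacts.iacr.org", "doi.org", "figshare.com"}
EPHEMERAL_HOSTS = {"anonymous.4open.science"}  # review-time blinding, not archival

def triage(url: str) -> str:
    host = urlparse(url).hostname or ""
    if any(host == h or host.endswith("." + h) for h in ARCHIVAL_HOSTS):
        return "archival"
    if any(host == h or host.endswith("." + h) for h in EPHEMERAL_HOSTS):
        return "ephemeral: ask for a permanent copy"
    return "unknown: review manually (personal/lab sites tend to rot)"

if __name__ == "__main__":
    # Placeholder URLs for demonstration only.
    for u in ["https://zenodo.org/records/1234567",
              "https://anonymous.4open.science/r/example",
              "https://example-lab.university.edu/artifact"]:
        print(f"{triage(u)}  <-  {u}")
```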
Summary
There are many ways of doing artifact evaluation and many goals to strive for while doing so. Sometimes it is good to aim high and settle for an okay result after a meeting with reality. I do have some, perhaps opinionated, tips on artifact evaluation:
- Make sure that some goals do not come at the cost of others, such as reproducibility at the cost of reusability and extendability.
- Ensure proper long-term archiving and prevent link rot.
- Encourage submissions for the highest badge level achievable for a given artifact. Motivate authors not to settle for an easy available badge when a bit more effort would earn them functional.
- Discourage artifacts that skip the available badge; make sure the authors have a proper reason and that there really is no other way the artifact could be published.
Extra references
Finally, here are some extra references I collected. They should be useful for artifact evaluation chairs, reviewers, and authors.