PVLDB Reproducibility

Starting in 2018, PVLDB joins SIGMOD in encouraging the database community to develop a culture of sharing and cross-validation. PVLDB's reproducibility effort is being developed in coordination with SIGMOD's.

News

Submissions to PVLDB Reproducibility follow the PVLDB submission deadlines and should be made through the PVLDB CMT submission site under the Reproducibility track.

Recent Reproducibility Highlights

Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries, Nikolaos Tziavelis (Northeastern University); Deepak Ajwani (Northeastern University); Wolfgang Gatterbauer (Northeastern University); Mirek Riedewald (Northeastern University); Xiaofeng Yang (Northeastern University)
Prefix Filter: Practically and Theoretically Better Than Bloom, Tomer Even (Tel Aviv University); Guy Even (Tel Aviv University); Adam Morrison (Tel Aviv University)
A Critical Analysis of Recursive Model Indexes, Marcel Maltry (Saarland University), Jens Dittrich (Saarland University)
ByShard: Sharding in a Byzantine Environment, Jelle Hellings (McMaster University); Mohammad Sadoghi (University of California, Davis)
A four-dimensional Analysis of Partitioned Approximate Filters, Tobias Schmidt (TUM); Maximilian Bandle (TUM); Jana Giceva (TU Munich)
SetSketch: Filling the Gap between MinHash and HyperLogLog, Otmar Ertl (Dynatrace Research)
ORBITS: Online Recovery of Missing Values in Multiple Time Series Streams, Mourad Khayati (University of Fribourg)

What is PVLDB Reproducibility?

PVLDB Reproducibility has three goals:

Increase the impact of database research papers.
Enable easy dissemination of research results.
Enable easy sharing of code and experimentation set-ups.

In short, the goal is to build a culture where sharing the results, code, and scripts of database research is the norm rather than the exception. The challenge is to do this efficiently, which means building the technical expertise to do better research by making it repeatable and sharable. The PVLDB Reproducibility committee is here to help you with this.

Submission

Submit your accepted PVLDB papers for reproducibility through CMT (Reproducibility Track). To submit, you'll need the following information:

The title and abstract of your original, accepted PVLDB paper.

A link to your original, accepted PVLDB paper.

A short description of how the reviewer may retrieve your reproducibility submission. This should include at least the following information: a link to the code and how to use the scripts for (a) code compilation, (b) data generation, (c) experimentation (one hypothetical way to organize these scripts is sketched after this list).

A short description of the hardware needed to run your code and reproduce the experiments included in the paper, with a detailed specification of any unusual or not commercially available hardware. If your hardware is sufficiently specialized, please make arrangements to give the reviewers access to it.

A short description of any software or data necessary to run your code and reproduce the experiments included in the paper, particularly if it is restricted-access (e.g., commercial software without a free demo or academic version). If this is the case, please make arrangements to give the reviewers access to the necessary software or data.
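For illustration only, here is a minimal sketch of the kind of single entry point such a submission could expose. All file and directory names (reproduce.py, build.sh, scripts/gen_data.py, scripts/run_experiments.py, data/, results/) are hypothetical placeholders, not a required layout.

#!/usr/bin/env python3
# reproduce.py -- hypothetical top-level driver for a reproducibility submission.
# The scripts it invokes (build.sh, gen_data.py, run_experiments.py) are placeholders.
import subprocess
import sys
from pathlib import Path

ROOT = Path(__file__).resolve().parent

STEPS = [
    ["bash", str(ROOT / "build.sh")],                                   # (a) code compilation
    [sys.executable, str(ROOT / "scripts" / "gen_data.py"),
     "--out", str(ROOT / "data")],                                      # (b) data generation
    [sys.executable, str(ROOT / "scripts" / "run_experiments.py"),
     "--out", str(ROOT / "results")],                                   # (c) experimentation
]

def main() -> None:
    for cmd in STEPS:
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # abort as soon as any step fails

if __name__ == "__main__":
    main()

A single driver like this lets a reviewer reproduce steps (a)-(c) with one command and makes it obvious where each step lives.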

In keeping with PVLDB itself, the PVLDB Reproducibility effort uses a rolling, monthly deadline. Papers received by 5 PM EST on the first of each month will be distributed for that month's round of reviews. We aim to complete the reproducibility review within two months.

Why should I be part of this?

You will be making it easy for other researchers to compare against your work and to adopt and extend your research. This immediately means more recognition for your work and higher impact.

How much overhead is it?

At first, making research sharable seems like an extra overhead for authors. You just had your paper accepted in a major conference; why should you spend more time on it? The answer is to have more impact!

If you ask any experienced researcher in academia or in industry, they will tell you that they in fact already follow the reproducibility principles on a daily basis! Not as an afterthought, but as a way of doing good research.

Maintaining easily reproducible experiments simply makes working on hard problems much easier: you can repeat your analysis for different data sets, different hardware, different parameters, and so on. Like other leading system designers, you will save significant time because you will minimize the setup and tuning effort for your experiments. In addition, such practices help bring new students up to speed after a project has lain dormant for a few months.

Ideally, reproducibility should be close to zero effort.

Criteria and Process

Availability

Each submitted experiment should contain:

(1) A prototype system, provided either as a white box (source code, configuration files, build environment) or as a fully specified black-box system.
(2) Input data: either the process to generate the input data should be made available, or, when the data is not generated, the actual data itself (or a link to it) should be provided.
(3) The set of experiments (system configuration and initialization, scripts, workload, measurement protocol) used to produce the raw experimental data.
(4) The scripts needed to transform the raw data into the graphs included in the paper.
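As a purely illustrative example of item (4), a plotting script in such a package might look like the sketch below. The input file results/throughput.csv, its column names, and the output figures/figure_3.pdf are hypothetical placeholders.

#!/usr/bin/env python3
# plot_figure3.py -- hypothetical script turning raw results into one of the paper's graphs.
import csv
from collections import defaultdict
from pathlib import Path

import matplotlib.pyplot as plt

def main() -> None:
    series = defaultdict(list)  # system name -> list of (threads, throughput) points
    with open("results/throughput.csv", newline="") as f:
        for row in csv.DictReader(f):
            series[row["system"]].append((int(row["threads"]), float(row["throughput"])))

    for system, points in sorted(series.items()):
        points.sort()
        xs, ys = zip(*points)
        plt.plot(xs, ys, marker="o", label=system)

    plt.xlabel("threads")
    plt.ylabel("throughput (ops/s)")
    plt.legend()
    Path("figures").mkdir(exist_ok=True)
    plt.savefig("figures/figure_3.pdf")

if __name__ == "__main__":
    main()

Keeping the plotting logic in version-controlled scripts like this is what allows reviewers to regenerate the exact graphs in the paper from the raw experimental data.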

Reproducibility

The central results and claims of the paper should be supported by the submitted experiments.
Therefore, an independent team should be able to recreate result data and graphs that demonstrate behavior similar to that shown in the paper, using the authors' own artifacts.

Please note that typically, for some results (e.g., about response times), the exact numbers will depend on the underlying hardware. We do not expect results identical to those in the paper unless we happen to have access to identical hardware. Instead, what we expect to see is that the overall behavior matches the conclusions drawn in the paper, e.g., that a given algorithm is significantly faster than another one, or that a given parameter positively or negatively affects the behavior of a system.

Process

Each paper is reviewed by one database group. The process happens in communication with the reviewers so that authors and reviewers can iron out any technical issues that arise. The end result is a short report that describes the outcome of the process.

The goal of the committee is to properly assess and promote database research! While we expect authors to do their best to prepare a submission that works out of the box, we know that unexpected problems sometimes appear and that, in certain cases, experiments are very hard to fully automate. The committee will not dismiss submissions if something does not work out of the box; instead, it will contact the authors to get their input on how to properly evaluate their work.

Reproducibility Committee

Co-Chairs

Peter Triantafillou (University of Warwick)
Gokhan Kul (University of Massachusetts, Dartmouth)

Committee

TU Delft: Asterios Katsifodimos
Huazhong University of Science and Technology: Bolong Zheng
Seoul National University: Bongki Moon
Illinois Institute of Technology: Boris Glavic
ETH Zurich: Cedric Renggli
Lyon 1 University: Chao Zhang
UC Irvine: Chen Li
Tsinghua University: Chengliang Chai
PUC Chile: Cristian Riveros
Osaka University: Daichi Amagata
TU Dresden: Dirk Habich
Rutgers University - New Brunswick: Dong Deng
National University of Singapore: Dumitrel Loghin
Concordia University: Essam Mansour
Guangzhou University: Fan Zhang
ATHENA Research Center: George Papastefanatos
University of Modena and Reggio Emilia: Giovanni Simonini
University of Massachusetts Dartmouth: Gokhan Kul
FORTH-ICS: Haridimos Kondylakis
Tsinghua University: Huanchen Zhang
Google: Ingo Müller
Washington State University: Jia Yu
Zhejiang University: Jinfei Liu
The Hong Kong University of Science and Technology: Jing Tang
Boston University: John Liagouris
University of New South Wales: Kai Wang
Mohammed VI Polytechnic University: Karima Echihabi
Seoul National University: Kunsoo Park
The University of Sydney: Lijun Chang
Nanjing University of Science and Technology: Long Yuan
Aalborg University: Matteo Lissandrini
Humboldt-Universität zu Berlin: Matthias Weidlich
IIT Delhi: Maya Ramanath
University of Zurich: Michael H Böhlen
University of Helsinki: Michael Mathioudakis
University of Edinburgh: Milos Nikolic
The University of Western Ontario: Mostafa Milani
Huawei Technologies R&D (UK) Ltd: Nikos Ntarmos
Microsoft: Raghav Kaushik
Universität Mannheim: Rainer Gemulla
The University of Hong Kong, China: Reynold Cheng
MIT: Ryan C Marcus
Zhejiang University: Sai Wu
University of Auckland: Sebastian Link
University of Cincinnati: Seokki Lee
The Chinese University of Hong Kong: Sibo Wang
Nanyang Technological University: Siqiang Luo
Nanyang Technological University: Sourav S Bhowmick
Università degli Studi di Bergamo: Stefano Paraboschi
DFKI Berlin: Steffen Zeuch
Simon Fraser University: Tianzheng Wang
Aalborg University: Torben Bach Pedersen
Harvard University: Utku Sirin
University of New South Wales: Wenjie Zhang
Microsoft Research: Wentao Wu
Kent State University: Xiang Lian
Northeastern University: Xiaochun Yang
University of New South Wales: Xiaoyang Wang
Aalborg University: Yan Zhao
Kyoto University: Yang Cao
BUPT: Yingxia Shao
University of New South Wales: You Peng
University at Buffalo - SUNY: Zhuoyue Zhao
