Starting with PVLDB 2018, PVLDB joins SIGMOD in encouraging the database community to develop a culture of sharing and cross-validation. PVLDB's reproducibility effort is being developed in coordination with SIGMOD's.
Submissions to PVLDB Reproducibility follow the PVLDB submission deadlines and should be made through the PVLDB submission site in CMT, under the Reproducibility track.
Recent Reproducibility Highlights
- ByShard: Sharding in a Byzantine Environment, Jelle Hellings (McMaster University)*; Mohammad Sadoghi (University of California, Davis)
- A four-dimensional Analysis of Partitioned Approximate Filters, Tobias Schmidt (TUM)*; Maximilian Bandle (TUM); Jana Giceva (TU Munich)
- SetSketch: Filling the Gap between MinHash and HyperLogLog, Otmar Ertl (Dynatrace Research)
- ORBITS: Online Recovery of Missing Values in Multiple Time Series Streams, Mourad Khayati (University of Fribourg)
- A Critical Analysis of Recursive Model Indexes, Marcel Maltry (Saarland University), Jens Dittrich (Saarland University)
What is PVLDB Reproducibility?
PVLDB Reproducibility has three goals:
- Increase the impact of database research papers.
- Enable easy dissemination of research results.
- Enable easy sharing of code and experimentation set-ups.
In short, the goal is to help build a culture where sharing results, code, and scripts of database research is the norm rather than the exception. The challenge is to do this efficiently, which means building technical expertise in creating repeatable and shareable research. The PVLDB Reproducibility Committee is here to help you with this.
Submit your accepted PVLDB papers for reproducibility through CMT (Reproducibility Track). To submit, you'll need the following information:
- The title and abstract of your original, accepted PVLDB paper.
- A link to your original, accepted PVLDB paper.
- A short description of how the reviewer may retrieve your reproducibility submission. This should include at least a link to the code and instructions for using the scripts for (a) code compilation, (b) data generation, and (c) experimentation (a minimal sketch of an entry-point script appears after this list).
- A short description of the hardware needed to run your code and reproduce the experiments included in the paper, with a detailed specification of any unusual or not commercially available hardware. If your hardware is sufficiently specialized, please have a plan for giving the reviewers access to it.
- A short description of any software or data necessary to run your code and reproduce experiments included in the paper, particularly if it is restricted-access (e.g., commercial software without a free demo or academic version). If this is the case, please have plans to allow the reviewers access to any necessary software or data.
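As an illustration only, here is a minimal sketch of what a single entry-point script for such a package might look like. The repository layout and the script names (scripts/build.sh, scripts/generate_data.py, scripts/run_experiments.py) are hypothetical assumptions for this example, not requirements of the reproducibility track.

```python
#!/usr/bin/env python3
"""Hypothetical top-level driver for a reproducibility package.

All paths and script names below are illustrative assumptions;
PVLDB does not mandate any particular repository layout.
"""
import subprocess
import sys

STEPS = [
    # (a) code compilation
    ["bash", "scripts/build.sh"],
    # (b) data generation (or download of the published data sets)
    ["python3", "scripts/generate_data.py", "--output", "data/"],
    # (c) experimentation: run every experiment and write raw results
    ["python3", "scripts/run_experiments.py", "--results", "results/raw/"],
]

def main() -> int:
    for cmd in STEPS:
        print(f"==> {' '.join(cmd)}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"Step failed: {' '.join(cmd)}", file=sys.stderr)
            return result.returncode
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

With a layout like this, a reviewer could reproduce the full pipeline with a single command (e.g., `python3 reproduce.py`) from the repository root and consult the README for any deviations.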
In keeping with PVLDB itself, the PVLDB Reproducibility effort will use a rolling, monthly deadline. Papers received by 5PM EST on the first of each month will be distributed for that month's round of reviews. We aim to complete each reproducibility review within two months.
Why should I be part of this?
You will be making it easy for other researchers to compare with your work, to adopt and extend your research. This instantly means more recognition for your work and higher impact.
How much overhead is it?
At first, making research shareable seems like extra overhead for authors. You just had your paper accepted at a major conference; why should you spend more time on it? The answer is to have more impact!
If you ask any experienced researcher in academia or in industry, they will tell you that they in fact already follow the reproducibility principles on a daily basis! Not as an afterthought, but as a way of doing good research.
Maintaining easily reproducible experiments makes working on hard problems much easier: you can repeat your analysis for different data sets, different hardware, different parameters, and so on. Like other leading system designers, you will save significant amounts of time because you will minimize the setup and tuning effort for your experiments. In addition, such practices help bring new students up to speed after a project has lain dormant for a few months.
Ideally reproducibility should be close to zero effort.
Criteria and Process
Each submitted experiment should contain:
- (1) A prototype system provided as a white box (source, configuration files, build environment) or a fully specified black-box system.
- (2) Input data: either the process to generate the input data, or, when the data is not generated, the actual data itself or a link to it.
- (3) The set of experiments (system configuration and initialization, scripts, workload, measurement protocol) used to produce the raw experimental data.
- (4) The scripts needed to transform the raw data into the graphs included in the paper (a minimal sketch of such a transformation follows below).
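As an illustration of item (4), the following is a minimal sketch that turns hypothetical raw measurements into a figure. The input file results/raw/throughput.csv, its columns (system, threads, throughput), and the output figure name are assumptions made for this example; any equivalent scripted transformation works.

```python
"""Minimal sketch of item (4): turning raw result data into a paper figure.

The CSV path, its columns, and the figure name are hypothetical;
substitute the measurements your experiments actually produce.
"""
import csv
import os
from collections import defaultdict

import matplotlib.pyplot as plt

# Group raw measurements by system so each system becomes one line in the plot.
series = defaultdict(list)
with open("results/raw/throughput.csv", newline="") as f:
    for row in csv.DictReader(f):
        series[row["system"]].append((int(row["threads"]), float(row["throughput"])))

for system, points in sorted(series.items()):
    points.sort()
    xs, ys = zip(*points)
    plt.plot(xs, ys, marker="o", label=system)

plt.xlabel("threads")
plt.ylabel("throughput (ops/s)")
plt.legend()

os.makedirs("results/figures", exist_ok=True)
plt.savefig("results/figures/figure_3.pdf")  # hypothetical figure name
```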
The central results and claims of the paper should be supported by the submitted experiments.
Therefore, an independent team should be able to recreate result data and graphs that demonstrate behavior similar to that shown in the paper, using the authors' own artifacts.
Please note that typically, for some results (e.g., about response times), the exact numbers will depend on the underlying hardware. We do not expect to obtain results identical to those in the paper unless we happen to get access to identical hardware. Instead, what we expect to see is that the overall behavior matches the conclusions drawn in the paper, e.g., that a given algorithm is significantly faster than another one, or that a given parameter positively or negatively affects the behavior of a system.
Each paper is reviewed by one database group. The process happens in communication with the reviewers so that authors and reviewers can iron out any technical issues that arise. The end result is a short report that describes the outcome of the process.
The goal of the committee is to properly assess and promote database research! While we expect authors to do their best to prepare a submission that works out of the box, we know that unexpected problems sometimes appear and that in certain cases experiments are very hard to fully automate. The committee will not dismiss submissions if something does not work out of the box; instead, they will contact the authors to get their input on how to properly evaluate their work.
Reproducibility Committee
- Boston University: Charalampos Tsourakakis
- Chinese Academy of Sciences: Shimin Chen
- Duke University: Sudeepa Roy (Yuhao Wen, Prajakta Kalmegh, Zhengjie Miao, Yuchao Tao)
- ETH Zurich: Ce Zhang
- Hong Kong University of Science and Technology: Qiong Luo
- Illinois Institute of Technology: Boris Glavic
- IIT Delhi: Maya Ramanath (Madhulika Mohanty, Prajna Upadhyay)
- Imperial College London: Peter Pietzuch (George Theodorakis, Panagiotis Garefalakis)
- Leibniz University of Hannover: Ziawasch Abedjan
- National University of Singapore: Bingsheng He
- Northeastern University: Mirek Riedewald
- Tsinghua University: Guoliang Li (Ji Sun)
- TU Darmstadt: Carsten Binnig
- Télécom ParisTech University: Fabian Suchanek (Jonathan Lajus)
- UC Davis: Mohammad Sadoghi (Suyash Gupta, Thamir Qadah, Patrick Liao, Domenic Cianfichi)
- University of Massachusetts Dartmouth: Gokhan Kul
- University of Florida: Daisy Zhe Wang
- University of Glasgow and Huawei Research: Nikos Ntarmos
- University of Insubria: Elena Ferrari
- University of New South Wales: Wenjie Zhang
- University of Rochester: Fatemeh Nargesian