PVLDB Volume 18 — Contributions

New for PVLDB Volume 18

Updated Topics of Interest

Please also check the Submission Guidelines for updated expectations on submitting supplementary materials for transparency and reproducibility.

Overview

The Proceedings of the VLDB Endowment (PVLDB), established in 2008, is a scholarly journal for short and timely research papers that follows a strict quality assurance process. PVLDB is distinguished by a monthly submission process with rapid reviews. PVLDB issues are published regularly throughout the year. A paper will appear in PVLDB soon after acceptance, and possibly in advance of the VLDB Conference. All papers accepted for Volume 18 by June 15, 2025 will form the Research Track of the VLDB 2025 Conference, together with any rollover papers from Volume 17. Papers accepted to Volume 18 after June 15, 2025 will be rolled over to the VLDB 2026 Conference. At least one author of each accepted paper must attend the VLDB 2025 Conference. PVLDB is the only submission channel for research papers to appear in the VLDB 2025 Conference. Please see the Submission Guidelines for paper submission instructions. The submission process for other VLDB 2025 tracks, such as Demonstrations or Tutorials, is different and is described in their respective calls for papers.

Scope of PVLDB

PVLDB welcomes original research papers on a broad range of research topics related to all aspects of data management, where systems issues play a significant role, such as data management system technology and information management infrastructures, including their very large scale of experimentation, novel architectures, and demanding applications as well as their underpinning theory. The scope of a submission for PVLDB is also described by the subject areas given below. Moreover, the scope of PVLDB is restricted to scientific areas in which the journal's editorial board has the combined expertise to evaluate the submission's topic. Finally, the contributions in the submission should build on work already published in data management outlets, e.g., PVLDB, VLDB Journal, ACM SIGMOD, IEEE ICDE, EDBT, ACM TODS, IEEE TKDE, and should engage with that work beyond a merely syntactic citation.

Four Paper Categories

There are four equally important categories of papers in the research track:

Regular Research Papers
Scalable Data Science (SDS) Papers
Experiment, Analysis & Benchmark (EA&B) Papers
Vision Papers

See Submission Guidelines for page limits for these categories.

Regular Research Papers

PVLDB invites regular research papers with different flavors:

Foundations and Algorithms Papers: The primary contribution of foundations papers and algorithms papers lies in their formal underpinnings expressed through precise pseudocode or theoretical formalism. These papers are encouraged to include a prototype implementation and evaluation, including comparisons with alternate approaches, but these are typically limited to demonstrating the conceptual ideas.

Systems Papers: The primary contribution of systems papers lies in the development of novel and practical approaches. They typically have no proofs and no theoretical formalism, but include a solid prototype implementation and empirical evaluation, typically in a working system.

Information System Architectures Papers: The key contribution of information system architectures papers lies in an innovative architecture for a new type of data management system. These papers include an initial prototype implementation and evaluation, but their main contribution lies in the breadth and impact of the overall vision. The details of design goals (e.g., the class of workload to be supported), systems architecture, new abstractions, and design justifications are expected.

You may optionally append the flavor(s) of your paper as a suffix to the title, e.g., “Paper Title (Flavor: Systems)”, “Paper Title (Flavors: Information System Architectures and Systems)”.

Scalable Data Science (SDS) Papers

We solicit submissions of papers describing design, implementation, experience, or evaluation of solutions and systems for practical data science and data engineering tasks, including data management, data engineering, data analytics, data visualization, data quality, data integration, data mining, and machine learning on large-scale data.

Distinct from the Regular Research papers, papers in this category do not necessarily propose new algorithms or models, but emphasize solutions that either solve or advance the understanding of issues related to data science technologies in the real world. Note that SDS is not a "tech-lite" avenue to bypass Regular for publishing new research: it is an avenue for research that has other forms of valuable novelty (not just novelty of techniques) and has more readily apparent potential for practical impact or has already had practical impact.

We seek two types of submissions: (a) deployed solutions and (b) evaluated solutions.

Papers about deployed solutions describe the implementation of a system that solves a significant real-world problem and is (or was) in use for an extended period of time in industry, science, medicine, education, government, nonprofit organizations, or as open source. The paper should present the problem, its significance to the application domain, the design choices for the solution, the implementation challenges, and the lessons learned from successes and failures, including post-launch performance analysis. Papers that describe enabling infrastructure for deployment of applied machine learning also fall into this category. One example is an open-source, general-purpose entity linkage tool that takes data from any two data sources and links records that refer to the same real-world entity. Another is a paper on a low-latency system that automatically monitors online model predictions on streaming data at scale to detect concept drift and recommend how to react.

Papers about evaluated (but not necessarily deployed) solutions should describe fundamental insights derived from addressing a real-world problem. This might include papers that provide significant insights into an applied area/domain or papers that provide strong baselines that are thoroughly tested on real data. We also encourage papers that conclude that a problem is solved under particular conditions or is infeasible with current techniques. In addition to insights, the paper should explain what milestones were reached, what the practical impact is, and (if applicable) what the obstacles to deployment are. Straightforward improvements over trivial baseline solutions tested on small datasets are unlikely to qualify. Continuing with the previous example, a paper might present an entity linkage model that applies state-of-the-art deep learning techniques and obtains high performance on a few real-world datasets, showing the success of adaptations of recent techniques in helping solve an important and practical data science problem. Similarly, a paper on a system to handle concept drift in streaming prediction applications may apply or extend recent statistical or ML approaches, provided it demonstrates their efficacy and scalability convincingly with real-world datasets.

The papers need not cover all aspects of an application or give all details. Instead, we encourage papers with key insights supported by solid data points.

This category helps bridge the gap between the Regular Research papers and the Industrial Track papers, especially given the fast-evolving nature of data science. In particular, this category differs from the Industrial Track in both scope and the level of impact expected. This category focuses more specifically on new technology for data science-oriented workloads, while the Industrial Track is more general and covers all aspects of database technology. The Industrial Track focuses on technology that is already commercial, while this category also welcomes work that is not yet commercial or deployed and is still at the proof-of-concept stage, as long as it is convincingly validated and has good potential for impact. In relation to concurrent submissions, authors are not allowed to submit papers on the same work to any other category or track of VLDB, except for the Demonstrations Track.

Scalability is an important aspect of data science research at the cusp of practical impact. But scalability can refer to different axes and metrics in different data science contexts. Example axes include the number of data examples, number of attributes/features, number of data sources, number of models, number of users, and number of concurrent requests for access. Example metrics include response latency, system throughput, machine resource footprints, and monetary costs. It is not possible to enumerate a comprehensive list. Reviewers will assess whether the submission is sound on the scalability aspect based on the merits of the work and its target application setting.

It is our hope that this category will attract more of the cutting-edge and impactful real-world work in the scalable data science arena to VLDB for the benefit of the VLDB community, including spurring new technical connections, inspiring new follow-on research on scalable data science, and enhancing the impact of the VLDB community on data science practice.

Here are some example SDS papers from past volumes:

“HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework” (VLDB 2022 Best Scalable Data Science Paper)
“Optimizing Bipartite Matching in Real-World Applications by Incremental Cost Computation” (VLDB 2021 Best Scalable Data Science Paper)
“Inspector Gadget: A Data Programming-based Labeling System for Industrial Images” (VLDB 2021)
“Fine-Grained Lineage for Safer Notebook Interactions” (VLDB 2021)

Experiment, Analysis & Benchmark (EA&B) Papers

EA&B papers focus on the extensive evaluation of algorithms, data structures, and systems that are of wide interest. The scientific contribution of an EA&B paper lies in providing:

fundamentally new insights into the strengths and weaknesses of existing methods, or

new ways to evaluate existing methods.

Such contributions are essential because they can springboard new follow-up research by enabling the research community to see new design possibilities for new methods, new metrics that matter, and new but problematic corner cases, and by providing a common infrastructure to judge methods.

Some examples of paper types suitable for this category are:

Experimental survey: Experimental surveys that compare multiple existing solutions (including open-source solutions) to a problem and, through extensive experiments, provide a comprehensive perspective on their strengths and weaknesses.

Analysis: Papers that focus on relevant problems or phenomena and through analysis and/or experimentation provide insights on the nature or characteristics of these phenomena.

Benchmark: Papers that present new benchmarks, characteristics of the benchmark data, methods in generating the data and the gold standard, usage of the benchmarks, and optionally example experimental results on the benchmarks.

Reproducibility: Papers that verify or refute results published in the past and that, through a renewed performance evaluation, help to advance the state of the art.

For papers that report negative results, or results that contradict those published by third parties, the Review Board may ask the third party to comment on the submission and even allow a short rebuttal/explanation to be published along with the submission in the event of acceptance.

Vision Papers

Vision papers outline futuristic information systems and architectures or anticipate new challenges. Submissions should describe novel projects that are in an early stage but hold out the strong promise of eventual high impact. The focus should be on the key insight behind the project (e.g., a new set of ground rules or a novel technology), as well as on explaining how the key insight can be leveraged in building a system. The paper should describe what the success criteria are for the vision project.

Topics of Interest

PVLDB welcomes original research papers on a broad range of topics related to all aspects of data management. The themes and topics listed below are intended to serve primarily as indicators of the kinds of data-centric subjects that are of interest to PVLDB – they do not represent an exhaustive list.

Data Mining and Analytics
∟ Data warehousing, OLAP
∟ Parallel and distributed data mining
∟ Data stream mining
∟ Mining/analysis of different data types (e.g., scientific/business, social networks, text, web, graphs, rules, patterns, logs, time series, spatio-temporal)
∟ Explainable AI

Data Privacy and Security
∟ Access control and privacy
∟ Blockchain

Database Engines
∟ Access methods
∟ Concurrency control, recovery, and transactions
∟ Memory and storage management
∟ Multi-core processing and hardware acceleration
∟ Query processing and optimization
∟ Views, indexing, and search

Database Performance and Manageability
∟ Administration and manageability
∟ Tuning, benchmarking, and performance measurement

Distributed Database Systems
∟ Cloud data management, resource management, database as a service
∟ Data networking and content delivery
∟ Distributed analytics
∟ Distributed transactions

Graph and Network Data
∟ Graph data management
∟ Hierarchical, non-relational, and other modern data models
∟ Social networks

Information Integration and Data Quality
∟ Data cleaning, data preparation
∟ Heterogeneous and federated DBMS, metadata management
∟ Knowledge graphs and knowledge management
∟ Schema matching, data integration
∟ Source discovery
∟ Web data management and Semantic Web

Languages
∟ Data models and query languages
∟ Schema management and design

Machine Learning, AI, and Databases
∟ Applied ML and AI for data management
∟ Data management issues and support for ML and AI

Novel Database Architectures
∟ Data management on novel hardware
∟ Embedded and mobile databases
∟ Energy-efficient data systems
∟ Real-time databases, sensors and IoT, stream databases
∟ Video management and analytics systems
∟ Vector databases
∟ Time series databases

Provenance and Workflows
∟ Debugging
∟ Process mining
∟ Profile-based and context-aware data management
∟ Provenance analytics

Specialized and Domain-Specific Data Management
∟ Crowdsourcing
∟ Ethical data management
∟ Fuzzy, probabilistic, and approximate data
∟ Image and multimedia databases
∟ Scientific and medical data management
∟ Spatial and temporal databases
∟ Time series data
∟ High-dimensional vector data

Text and Semi-Structured Data
∟ Data extraction
∟ Information retrieval
∟ Semi-structured data management, RDF
∟ Text in databases

User Interfaces
∟ Data exploration tools
∟ Database support for visual analytics
∟ Database usability
∟ Explainable AI
∟ Interactive querying and visualization for large data
∟ NL interfaces to data
∟ Recommender engines
