updated: 2021-12-05

Utilizing Visualization-oriented Natural Language Interfaces (V-NLI) as a complementary input modality to direct manipulation for visual analytics can provide an engaging user experience. It enables users to focus on their tasks rather than on operating the interface of visualization tools. Over the past two decades, and especially in recent years, numerous V-NLI systems have been developed in both academic research and commercial software by leveraging advanced natural language processing technologies. In our survey, we conduct a comprehensive review of existing V-NLIs. To classify each paper, we develop categorical dimensions based on a classic information visualization pipeline extended with a V-NLI layer, as shown in the figure. The following seven stages are used: query interpretation, data transformation, visual mapping, view transformation, human interaction, dialogue management, and presentation. This website includes the survey of existing systems, the available datasets, and our evaluation results.


Survey (Click to open in Google Sheets)

The table lists the details of related works, including the NLP toolkit applied in each system, the chart types supported, the visualization recommendation algorithm adopted, and various other characteristics of V-NLIs.



Available Datasets for V-NLIs
| Name | Publication | # NL queries | # Data tables | Benchmark | Other Contributions | Website |
| --- | --- | --- | --- | --- | --- | --- |
| VisQA | CHI'20 | 629 | 52 | × | VQA with explanations | https://github.com/dhkim16/VisQA-release |
| Quda | arXiv'20 | 14,035 | 36 | × | Three Quda applications | https://freenli.github.io/quda/ |
| NLV | CHI'21 | 893 | 3 | √ | Characterization of utterances | https://nlvcorpus.github.io/ |
| nvBench | SIGMOD'21 | 25,750 | 780 | √ | NL2SQL-to-NL2VIS and SEQ2VIS | https://github.com/TsinghuaDatabaseGroup/nvBench |
  • nvBench

    nvBench is a large dataset for the complex and cross-domain NL2VIS task. It covers 105 domains, supports seven common types of visualizations, and contains 25,750 (NL, VIS) pairs. The repository provides the NL2VIS corpus in both JSON and Vega-Lite formats.
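
    As a rough illustration of what an (NL, VIS) pair looks like, the sketch below builds a hypothetical pair as a Python dict holding a Vega-Lite specification; the utterance, dataset path, and field names are invented for illustration and are not records from the actual corpus.

```python
import json

# Hypothetical (NL, VIS) pair in the spirit of nvBench; the utterance,
# dataset URL, and fields are placeholders, not taken from the corpus.
nl_query = "Show the average price for each city as a bar chart."
vega_lite_spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"url": "data/housing.csv"},  # assumed example table
    "mark": "bar",
    "encoding": {
        "x": {"field": "city", "type": "nominal"},
        "y": {"field": "price", "aggregate": "mean", "type": "quantitative"},
    },
}

print(nl_query)
print(json.dumps(vega_lite_spec, indent=2))
```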

  • NLV

    There is a lack of empirical understanding of how people specify visualizations through natural language. Researchers of NLV conducted an online study (N = 102), showing participants a series of visualizations and asking them to provide utterances they would pose to generate the displayed charts. From the responses, they curated a dataset of 893 utterances and characterized the utterances according to (1) their phrasing (e.g., commands, queries, questions) and (2) the information they contained (e.g., chart types, data aggregations).

  • Quda

    Quda aims to help V-NLIs recognize analytic tasks in free-form natural language by training and evaluating cutting-edge multi-label classification models. The dataset contains 14,035 diverse user queries, each annotated with one or more analytic tasks. Quda was built by first gathering seed queries from data analysts and then employing extensive crowdsourcing for paraphrase generation and validation. This work is the first attempt to construct a large-scale corpus for recognizing analytic tasks.
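
    The snippet below is a minimal baseline sketch of the multi-label task classification that Quda targets, using TF-IDF features with one-vs-rest logistic regression rather than the cutting-edge models the authors evaluate; the example queries and their task labels are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy stand-ins for Quda queries; each query is annotated with one or more
# low-level analytic tasks (Amar et al.).
queries = [
    "What is the highest salary in the sales team?",
    "How are house prices distributed across the city?",
    "Show the revenue of stores opened after 2019.",
]
task_labels = [
    ["Find Extremum"],
    ["Characterize Distribution"],
    ["Retrieve Value", "Filter"],
]

binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(task_labels)

# One binary classifier per task label over TF-IDF features.
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(queries, y)

# With such a tiny toy corpus the prediction is unreliable; this only
# demonstrates the multi-label plumbing.
predicted = model.predict(["Which month had the most rainfall?"])
print(binarizer.inverse_transform(predicted))
```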

  • VisQA

    People often use charts to analyze data, answer questions, and explain their answers to others. In a formative study, Kim et al. showed people various bar charts and line charts and collected the questions participants posed about the charts, along with their answers and explanations to those questions. They found that such human-generated questions and explanations commonly refer to visual features of charts. Based on this study, they developed an automatic chart question answering pipeline that generates visual explanations describing how an answer was obtained.


  • Test cases and evaluation results (Click to open in Google Sheets)

    To evaluate state-of-the-art publicly available V-NLIs (FlowSense, NL4DV, Microsoft's Power BI, and Tableau's Ask Data), we defined a two-dimensional space for visualization-oriented NLIs that varies in task level and information level. At the task level, we adopted the ten widely recognized low-level tasks (e.g., Characterize Distribution and Find Extremum) proposed by Amar et al. At the information level, human language is complex, and utterances can be divided into vague and specific according to the information they carry. Although communication between people is smooth regardless of whether an utterance is vague or specific, parsing the two types of utterances is very different for machines. Based on this space, we designed a set of test cases for each scenario, shown in the yellow-colored columns below.
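
    As a rough illustration of how one test case can be issued to one of the evaluated systems, the sketch below drives NL4DV (a Python toolkit) with a specific utterance for the Find Extremum task; the dataset path and query are placeholders, and configuration details may differ across NL4DV versions.

```python
from nl4dv import NL4DV

# Placeholder dataset; NL4DV accepts a tabular data source such as a CSV file.
nl4dv_instance = NL4DV(data_url="cars.csv")

# NL4DV requires a dependency parser; exact config keys may vary by version.
nl4dv_instance.set_dependency_parser(
    config={"name": "spacy", "model": "en_core_web_sm", "parser": None}
)

# A "specific" utterance exercising the Find Extremum task.
response = nl4dv_instance.analyze_query("Which car has the highest horsepower?")

# Each candidate visualization carries a Vega-Lite spec that can be compared
# against the expected chart when judging the result.
for vis in response.get("visList", []):
    print(vis.get("vlSpec"))
```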

    The evaluation results are shown in the pink-colored columns, including both the generated visualizations and the judged results. The results fall into three categories: (a) the system cannot parse the input query and produces no result (marked as “-”); (b) the system generates a visualization, but it does not meet the expected demand (marked as “×”); (c) the generated visualization correctly extracts the target data attributes and reflects the user's analytic task well (marked as “√”).

    Column descriptions: