Using Visualization-oriented Natural Language Interfaces (V-NLIs) as a complementary input modality to direct manipulation in visual analytics can provide an engaging user experience: users can focus on their analysis tasks rather than on how to operate the visualization tool's interface. In the past two decades, numerous V-NLI systems leveraging advanced natural language processing technologies have been developed in both academic research and commercial software, especially in recent years. In our survey, we conduct a comprehensive review of the existing V-NLIs. To classify each paper, we develop categorical dimensions based on the classic information visualization pipeline, extended with a V-NLI layer as shown in the figure. The following seven stages are used: query interpretation, data transformation, visual mapping, view transformation, human interaction, dialogue management, and presentation. This website includes the survey of existing systems, available datasets, and our evaluation results.
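As a compact reference, the sketch below encodes these seven stages as a Python enum; the class name, ordering, and comments are purely illustrative and not part of any surveyed system.

```python
from enum import Enum, auto

class VNLIPipelineStage(Enum):
    """Classic InfoVis pipeline extended with a V-NLI layer (illustrative only)."""
    QUERY_INTERPRETATION = auto()  # parse the natural language query
    DATA_TRANSFORMATION = auto()   # filter/aggregate the underlying data
    VISUAL_MAPPING = auto()        # map data attributes to visual encodings
    VIEW_TRANSFORMATION = auto()   # render and adjust the view
    HUMAN_INTERACTION = auto()     # handle follow-up manipulation of the view
    DIALOGUE_MANAGEMENT = auto()   # track conversational context across turns
    PRESENTATION = auto()          # present results (captions, narration, etc.)
```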
Survey (click to open the Google Sheet)
The table lists the details of related works, including the NLP toolkits each system applies, the chart types supported, the visualization recommendation algorithms adopted, and various other characteristics of V-NLIs.
Available Datasets for V-NLIs
| Name | Publication | # NL queries | # Data tables | Benchmark | Other contributions | Website |
| --- | --- | --- | --- | --- | --- | --- |
| VisQA | CHI'20 | 629 | 52 | × | VQA with explanations | https://github.com/dhkim16/VisQA-release |
| Quda | arXiv'20 | 14,035 | 36 | × | Three Quda applications | https://freenli.github.io/quda/ |
| NLV | CHI'21 | 893 | 3 | √ | Characterization of utterances | https://nlvcorpus.github.io/ |
| nvBench | SIGMOD'21 | 25,750 | 780 | √ | NL2SQL-to-NL2VIS and SEQ2VIS | https://github.com/TsinghuaDatabaseGroup/nvBench |
nvBench is a large dataset for the complex and cross-domain NL2VIS task; it covers 105 domains, supports seven common visualization types, and contains 25,750 (NL, VIS) pairs. The repository provides the NL2VIS corpus in both JSON and Vega-Lite formats.
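As a minimal sketch of how such (NL, VIS) pairs might be consumed, the snippet below loads a JSON corpus file and walks through its records; the file name and field names are hypothetical, so consult the nvBench repository for the actual schema.

```python
import json

# Hypothetical file name and record layout; see the nvBench repository
# for the actual corpus files and schema.
with open("nvbench_examples.json", encoding="utf-8") as f:
    corpus = json.load(f)

for example_id, example in corpus.items():
    nl_query = example["nl_queries"]       # natural language query/queries
    vega_lite_spec = example["vega_lite"]  # corresponding Vega-Lite spec
    print(example_id, nl_query)
```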
There is a lack of empirical understanding of how people specify visualizations through natural language. Researchers of NLV conducted an online study (N = 102), showing participants a series of visualizations and asking them to provide utterances they would pose to generate the displayed charts. From the responses, they curated a dataset of 893 utterances and characterized the utterances according to (1) their phrasing (e.g., commands, queries, questions) and (2) the information they contained (e.g., chart types, data aggregations).
Quda aims to help V-NLIs recognize analytic tasks from free-form natural language by training and evaluating cutting-edge multi-label classification models. The dataset contains 14,035 diverse user queries, each annotated with one or more analytic tasks. Quda achieves this by first gathering seed queries from data analysts and then employing crowd workers at scale for paraphrase generation and validation. This work is the first attempt to construct a large-scale corpus for recognizing analytic tasks.
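The sketch below shows the kind of multi-label setup such a corpus enables, using scikit-learn with TF-IDF features and a one-vs-rest classifier; the queries and task labels are invented placeholders, not actual Quda entries, and the model choice is only an assumption for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy queries and analytic-task labels; placeholders only, not Quda data.
queries = [
    "show the highest sales among all regions",
    "how are the exam scores distributed",
    "compare revenue of product A and product B over time",
]
task_labels = [
    ["Find Extremum"],
    ["Characterize Distribution"],
    ["Compute Derived Value", "Correlate"],
]

# Encode the label sets as a binary indicator matrix.
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(task_labels)

# One binary classifier per analytic task over TF-IDF features.
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(queries, y)

pred = model.predict(["which country has the largest population"])
print(mlb.inverse_transform(pred))
```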
People often use charts to analyze data, answer questions, and explain their answers to others. In a formative study, Kim et al. showed people various bar and line charts and collected the questions participants posed about the charts, along with their answers and explanations; they found that such human-generated questions and explanations commonly refer to visual features of charts. Based on this study, they developed an automatic chart question answering pipeline that generates visual explanations describing how each answer was obtained.
Test cases and evaluation results (click to open the Google Sheet)
To evaluate the state-of-the-art V-NLIs (the open-source FlowSense and NL4DV, Microsoft's Power BI, and Tableau's Ask Data), we defined a two-dimensional space for visualization-oriented NLIs that varies in task level and information level. At the task level, we adopted the ten widely recognized low-level tasks (e.g., Characterize Distribution and Find Extremum) proposed by Amar et al. At the information level, we classify utterances as either vague or specific according to the information they carry: people communicate smoothly with both kinds of utterance, but parsing them poses very different challenges for machines. Based on this space, we designed a set of test cases for each scenario, shown in the yellow-colored columns below.
The evaluation results, both the generated visualizations and the judged outcomes, are shown in the pink-colored columns. The results fall into three categories: (a) the system cannot parse the input query and produces no result (marked as “-”); (b) the system generates a visualization, but it does not meet the expected demand (marked as “×”); (c) the generated visualization correctly extracts the target data attributes and well reflects the user's analytic task (marked as “√”).
Column descriptions:
- Task: One of the ten low-level tasks of analytic activity in information visualization.
- Dataset: The dataset in whose context the designed query is issued. The ten datasets used in the evaluation can be found in the GitHub repository.
- Query type: Whether the query is specific or vague.
- Query: The query input to the systems.
- Target attributes: The data attributes the query is expected to analyze.
- Evaluation results (columns titled with the names of the four systems): The visualizations generated during the evaluation can be found in the GitHub repository. The marks in the table were determined after discussion among multiple researchers; in general, the results differed clearly enough that judging which was better was straightforward.
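To make the structure of the test-case sheet concrete, here is a minimal sketch of one test case and its judged results encoded as a Python dictionary; every field value is an invented placeholder rather than an entry from the actual evaluation sheet.

```python
# Illustrative encoding of one test case and its per-system judgments.
# All values are placeholders, not rows from the evaluation sheet.
test_case = {
    "task": "Find Extremum",             # one of the ten low-level tasks
    "dataset": "cars",                   # dataset the query is issued against
    "query_type": "specific",            # "specific" or "vague"
    "query": "Which car has the highest horsepower?",
    "target_attributes": ["Name", "Horsepower"],
    # Judged result per system: "√" correct, "×" wrong result, "-" not parsed.
    "results": {
        "FlowSense": "-",
        "NL4DV": "√",
        "Power BI": "√",
        "Ask Data": "×",
    },
}
```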