Misleading Axis Scales
Axis scales are fundamental to interpreting graphs as they provide a reference frame for understanding the magnitude and relationships within the dataset. A misleading axis scale can distort the true narrative of the data, whether intentionally or unintentionally. In scientific data visualization, maintaining data integrity is crucial to accurate storytelling.
Consider these tips for ensuring that your axis scales retain data integrity and accuracy:
- The y-axis of column and line charts should always start at 0. A truncated y-axis visually exaggerates differences between groups, leading to inaccurate interpretation of the data.
- Use the same scales for related graphs that are meant to be compared. As with a truncated y-axis, using different scales for related graphs introduces visual bias, making it difficult to draw accurate comparisons across graphs.
- Note: While this may leave some individual graphs with scales that extend far beyond their maximum data values, the trade-off is warranted to ensure accurate and unbiased graph comparisons.
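To make both tips concrete, here is a minimal Python sketch. The function names (visual_ratio, shared_y_limits) and the sample values are hypothetical, chosen only for illustration, and the sketch assumes non-negative data. It shows how a truncated baseline inflates the apparent ratio between two bars, and how a single zero-based range could be computed for a set of related graphs.

```python
def visual_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar heights for values a and b
    when the y-axis starts at `baseline` (hypothetical helper)."""
    return (b - baseline) / (a - baseline)


def shared_y_limits(datasets, padding=0.05):
    """One (ymin, ymax) pair to apply to every related graph.
    Assumes non-negative data, so the axis starts at 0."""
    top = max(max(values) for values in datasets)
    return (0.0, top * (1 + padding))


# Illustrative values: two groups that differ by 10%.
control, treated = 50.0, 55.0

# With a zero baseline, the bars differ by the same 10% as the data.
true_ratio = visual_ratio(control, treated)

# With the y-axis truncated at 45, the second bar is drawn twice as tall.
truncated_ratio = visual_ratio(control, treated, baseline=45.0)

# Two related panels get the same limits, driven by the largest value overall.
limits = shared_y_limits([[12, 18, 15], [40, 55, 48]])

print(true_ratio, truncated_ratio, limits)
```

The same limits would then be passed to whichever plotting tool draws each panel, so that bar heights stay comparable across graphs.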
Visual Perspective and Distortion
Visual perspective plays a significant role in how we perceive data in graphs. It involves the angle and distance from which the viewer sees the graph. One of the main culprits of data distortion is the use of 3D graphs. Unnecessarily adding 3-dimensionality to a graph causes data in the foreground to appear larger than data further in the background. This makes it nearly impossible for viewers to draw accurate comparisons between groups in a dataset.
Consider these tips for avoiding data distortions caused by inappropriate perspectives:
- Don’t use 3D charts for data that represent one or two variables. Two-dimensional charts are much clearer and provide an accurate view of the data, making direct comparisons possible. In rarer, more specialized cases, 3D charts are appropriate for data that represent three variables (plotted on x, y, and z coordinates).
- When representing data with circles (e.g. bubble charts), values should be proportional to the area, not the radius or diameter, of the circles. When viewing circles, we perceive their size based on the area they occupy, not their radius or diameter. Representing values by circle radius or diameter causes differences to appear exaggerated.
- Add clear labels and annotations to counteract any possible visual distortions, illusions, or ambiguity. Labels and annotations are powerful ways of adding clarity to data, whether to avoid ambiguity or to highlight key trends and differences.
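The area rule for circles comes down to a square root: because area grows with the square of the radius, the radius should scale with the square root of the value. The sketch below illustrates this in Python. The helper name radius_for_value and the sample numbers are assumptions made for the example.

```python
import math


def radius_for_value(value, max_value, max_radius):
    """Radius that makes circle AREA proportional to `value`.
    The largest value is drawn with radius `max_radius` (hypothetical helper)."""
    return max_radius * math.sqrt(value / max_value)


# Illustrative data: the second value is 4 times the first.
values = [10, 40]
radii = [radius_for_value(v, max(values), max_radius=20.0) for v in values]

# Area-based scaling: the drawn areas differ by the same factor as the data.
area_ratio = (radii[1] / radii[0]) ** 2

# Radius-based scaling (radius proportional to value) squares the difference.
naive_area_ratio = (values[1] / values[0]) ** 2

print(radii, area_ratio, naive_area_ratio)
```

Here a fourfold difference in the data is drawn as a fourfold difference in area, whereas scaling the radius directly would draw it as a sixteenfold difference.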
Showing Variability
Communicating variability is an important aspect of accurate data visualization and statistics. In most scientific graphs, the primary comparison is between group averages. But averages don’t tell the whole story. Representing the degree to which individual data points in a dataset diverge from the average values provides insights into the spread and reliability of the data.
When deciding on how to represent variability in your graphs, you will first need to determine the most appropriate measure(s) of variability to include. This will depend on the type of data, as well as context surrounding the data. Detailed coverage of this topic is best left to statistical resources, but here are common options to consider:
- Individual data points (particularly outliers)
- Standard deviation (SD; denoted σ)
- Standard error of the mean (SEM = σ/√n where n is the number of samples)
- Confidence interval (CI; typically 95% when statistical significance is set to α = 0.05)
- Range (maximum to minimum values)
- Quartiles and interquartile range (IQR; describes the spread of the middle 50% of the data and is used to identify outliers)
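As a rough illustration of how these measures relate to one another, the following Python sketch computes them for a small made-up sample using only the standard library. Several choices here are assumptions rather than recommendations: the function name summarize is hypothetical, the SD is the sample standard deviation (n - 1 denominator), the 95% CI is built from a t critical value looked up in a t-table and passed in by the caller, and the quartiles use the default method of statistics.quantiles.

```python
import math
import statistics


def summarize(values, t_critical):
    """Common variability measures for one group (hypothetical helper).
    `t_critical` is the two-sided 95% t value for n - 1 degrees of freedom."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # sample SD (n - 1 denominator)
    sem = sd / math.sqrt(n)            # SEM = SD / sqrt(n)
    half_width = t_critical * sem      # half-width of the 95% CI of the mean
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": mean,
        "sd": sd,
        "sem": sem,
        "ci95": (mean - half_width, mean + half_width),
        "range": (min(values), max(values)),
        "median": median,
        "iqr": q3 - q1,
    }


# Made-up sample of 8 measurements; 2.365 is the t-table value for 7 degrees of freedom.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = summarize(sample, t_critical=2.365)

for name, value in summary.items():
    print(name, value)
```

Note that the SEM is always smaller than the SD for n > 1, so error bars should state which measure they show.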
Once you’ve determined which variability measures to include, consider how to represent them in your graph. This will again depend on the type of data and graph. As a best practice, select the simplest way of representing variability so that this information is salient and clearly interpretable by the viewer.
BioRender Graph Tip
BioRender Graph gives you multiple options for representing data variability, depending on the type of graph you’re working with. For columns and points, the “Display Data” settings in the left panel allow you to specify what your error bars should represent: mean with SD, mean with SEM, or median with 95% CI.
You can also superimpose individual data points on your graph by checking “Data points” under “Additional settings” in the left panel.
If your dataset is compatible with boxplots, you can also change the “Graph type” to “Boxplot”.
For linear regression analyses, the “Show analysis” panel provides multiple display options. Check the “Best fit line” and “95% Confidence interval” options to display variability in your dataset. The 95% CI is displayed as a light color-matched shape behind your data points.
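For readers curious about what a best fit line with a 95% confidence band represents, here is a minimal Python sketch of one standard approach: ordinary least squares with a t-based interval for the fitted mean. This is an assumption for illustration and not a description of how BioRender Graph computes its display. The function name fit_line_with_band and the data are hypothetical, and the t critical value is taken from a t-table and passed in.

```python
import math


def fit_line_with_band(xs, ys, x0, t_critical):
    """Least-squares line evaluated at x0, with a 95% confidence
    interval for the fitted mean (hypothetical helper).
    `t_critical` is the two-sided 95% t value for n - 2 degrees of freedom."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual standard error with n - 2 degrees of freedom.
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(residual_ss / (n - 2))

    # Standard error of the fitted mean grows with distance from x_bar.
    se_fit = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    fitted = intercept + slope * x0
    return fitted, fitted - t_critical * se_fit, fitted + t_critical * se_fit


# Made-up data; 3.182 is the t-table value for 3 degrees of freedom.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

center = fit_line_with_band(xs, ys, x0=3, t_critical=3.182)
edge = fit_line_with_band(xs, ys, x0=1, t_critical=3.182)

print(center)
print(edge)
```

Evaluating the function across the x range traces out the shaded band. The band is narrowest at the mean of x and widens toward the ends of the data.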
References
- The Visual Display of Quantitative Information (second edition) by Edward R. Tufte
- Research Data Visualization and Scientific Graphics: for Papers, Presentations and Proposals by Martins Zaumanis
- Data Visualization: a practical guide to producing effective visualizations for research communication by Rebecca Wolfe (2014. RESYST Consortium)