
Top Five Questions to Ask About Data

In my recent “Introduction to Data” training video, I urged Vanderbilt data users to be thoughtful about the data they use. I have also heard many people argue that effective use of AI requires good critical thinking skills.

Being thoughtful and employing critical thinking sound like great ideas, but what exactly do they mean in practice? More than anything else, they mean asking questions – of yourself and of the person sharing the data or insights with you. Never be afraid of a simple question; simple questions are often the ones most able to ferret out problems or insights.

Here are five questions that I ask regularly when I am looking at a set of data or an output from an AI tool. I find them to be a good toolset for critical thinking with data, and I hope you find them a useful starting point.

1. Use your knowledge of the context to ask “does this make sense to me?” At the most basic level, this is often the first way that you get an inkling that numbers might be wrong – or surprising. Do the numbers seem oddly high or low? If you saw a dataset that suggested half of Vanderbilt undergraduates were unemployed a year after graduation, you would be right to be skeptical about whether such a talented group of people were truly unemployed. If you see something like that, ask questions! Sometimes that will teach you something you didn’t know and lead to more informed questions and insights. Sometimes it will highlight an issue. For instance, you might learn that the dataset left out all the Vanderbilt graduates who had gone on to pursue advanced degrees, which would make it a very misleading way of presenting Vanderbilt data.

2. Be sure you know where the number came from, and ask whether it is reasonable for this source to know this information. This tells you how much trust you can put in the data, and how you can use it. There are vendors out there who tell us that they have information on our graduates’ salaries. But when we ask how they sourced this information, they tell us that they use salary estimates for similar jobs in similar locations for people with similar years of experience. So they don’t have any idea how much Jane Smith is actually making; it’s really just an informed guess. If I am reporting on aggregated numbers, this is probably fine, but if I were to ask Jane whether this is what she is making, she would almost certainly tell me it was wrong. You also want to consider whether the data is timely; if I read a study on the impact of AI on work from 2023, shortly after the release of ChatGPT, I will be skeptical about whether the results still hold in 2026.

3. Put the number that is being quoted into a larger context. If you are quoted a percentage, consider “percentage of what” (also phrased as my favorite question: what is the denominator here?). At Vanderbilt, if I see that 20% of trombone majors are unhappy about something, that means something different than if I see that 20% of engineering majors are unhappy about something. Twenty percent of trombone majors might be one person, while 20% of engineering majors is several hundred individuals. A variant of this test for data visualizations is to make sure you look at the Y-axis in any chart. It is common for Y-axes to be truncated, for instance for survey results which might be quite closely clustered. This can make very small differences look alarmingly (or encouragingly) large. Sometimes our data will be presented with indicators of statistical significance or with standard deviation markers around it. Where those markers overlap, you are justified in asking whether the difference really is significant.

4. Consider what is included and what is missing to find out what the data can and can’t tell you. This is especially true with survey data. For instance, surveys now regularly have response rates below 20%. You have to wonder whether the people who responded were somehow different than the norm. Often they represent people who were very satisfied or very dissatisfied, but not the “quiet middle.” Are we really able to draw any broad conclusions? Famously, polls in 2016 undercounted white, non-college-educated voters because they were less likely to respond to surveys. This led to an underestimation of Trump supporters. Closer to home and in a slightly different vein, about once a year I am asked why we don’t participate in the National Survey of Student Engagement (NSSE). My reasoning has always been that NSSE schools are almost all large state institutions whose undergraduate populations are very different (less residential, less financial aid) than Vanderbilt’s undergraduate population. Our students would look very good compared to those students, and therefore we would learn very little that would actually improve the life of our students.

5. Come back to “does it make sense?” and think carefully about any data-based conclusions. This is perhaps the hardest question. Data shows you correlations (this occurred and then this occurred, or more people who had this attribute also did this) but it almost never directly answers the question of why something occurred. That requires an understanding of context. The challenge is biggest when you are making predictions or taking actions based on data. Sometimes you can be pretty certain about them because you understand the correlation, but sometimes you want to ask more. For instance, alumni households where both partners are Dores (the “double-Dore households”) are more likely to give and be engaged than those with only one Vanderbilt alum; the causality here is pretty clear, so it makes sense to lean into those relationships. But if you learned that alumni who live in Milwaukee are more likely to be engaged than those who live in Madison, you might reasonably want to know more about why before pouring money into your Milwaukee alumni chapter.
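
For readers who like to see the arithmetic, the denominator check from question 3 can be sketched in a few lines of Python. This is only an illustration: the group sizes (5 trombone majors, 1,500 engineering majors) are hypothetical numbers chosen to match the “one person” versus “several hundred” contrast, not actual enrollment figures, and the function name is made up for this example.

```python
def people_behind_percentage(percent: float, group_size: int) -> int:
    """Return roughly how many people a percentage of a group represents."""
    return round(group_size * percent / 100)


# Hypothetical group sizes for illustration only (not real enrollment data).
group_sizes = {"trombone majors": 5, "engineering majors": 1500}

for group, size in group_sizes.items():
    count = people_behind_percentage(20, size)
    print(f"20% of {size} {group} is about {count} people")
```

The same 20% turns into very different head counts depending on the denominator, which is why it is worth asking for the group size whenever a percentage is quoted on its own.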

One aid to critical thinking that I would use with my eyes open is asking a GenAI platform like ChatGPT or Amplify for help. Remember that GenAI tends to try to make you happy, so it will often give you the answer it thinks you want to hear. I once asked it to help me with a negotiation; the answer it gave when I presented myself as one party differed from the answer it gave when I said I was the other party by 20% of the total value of the deal. Use your prompt engineering skills and ask it to “Act like a skeptical Chief Data Officer and identify all the questions I should ask to find possible weaknesses,” and you will be on your way!

Olivia Kew
Chief Data Officer, Vanderbilt University
January 2026
