Sherlock Holmes Mode: Getting to Know Your Data

February 3rd, 2026

In Part 2, we discussed the importance of labelling your data to create a ground truth. Now that we have our dataset, the temptation is to immediately start “fixing” it.

However, you cannot fix what you do not understand.

This stage is called Exploratory Data Analysis (EDA). Think of this as the interview phase. You are interviewing your data to learn its secrets, its quirks, and its potential problems.

Visualising the Distribution

The first step is always visualisation. You need to see the shape of your data. If you simply calculate the average (mean) of a column, you might miss the full story.

For example, if you are analysing house prices, a few massive mansions could pull your average well above what a typical house costs. By using histograms and box plots, you can visualise the spread. Is the data bell-shaped (normal distribution)? Or is it heavily skewed to one side? Understanding this shape helps you choose the right statistical tools later on.
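To make the mansion effect concrete, here is a minimal sketch using only the Python standard library. The prices are invented for illustration and text_histogram is a hypothetical helper, not part of any library; in practice you would more likely reach for a plotting tool, but the idea is the same.

```python
from statistics import mean, median

# Hypothetical house prices in thousands; the last two are the "mansions".
prices = [210, 225, 240, 250, 265, 280, 295, 310, 2500, 4200]


def text_histogram(values, bin_width):
    """Count values per fixed-width bin and return {bin_start: count}."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


avg = mean(prices)
mid = median(prices)
print(f"mean={avg:.1f}  median={mid:.1f}")

# A crude text histogram: one '#' per value in each 500-wide bin.
for start, count in text_histogram(prices, 500).items():
    print(f"{start:>5}-{start + 499:<5} {'#' * count}")
```

With these made-up numbers the mean lands at 877.5 while the median is 272.5. A mean sitting far above the median is a quick hint that the distribution is skewed to the right, and the histogram shows why: eight values in the lowest bin and two far out on their own.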

The Outlier Dilemma: Error or Insight?

During EDA, you will almost certainly find data points that do not fit. These are outliers.

If you are analysing customer ages and find a value of “200”, that is clearly a data-entry error to be corrected or removed. However, if you are analysing credit card transactions and see a massive purchase, that might not be an error. That might be the fraud you are trying to predict.

Key takeaway: Never delete outliers blindly. Investigate them. They often hold the most valuable information in the entire dataset.
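One common way to surface candidates for investigation is the interquartile range (IQR) rule. The sketch below is an assumption on our part rather than a prescribed method: flag_outliers is a hypothetical helper, the ages are invented, and the multiplier of 1.5 is a convention, not a law. Note that it returns the suspicious values for review instead of deleting them.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] for manual review."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Flag only; deciding between "error" and "insight" is a human call.
    return [v for v in values if v < low or v > high]


# Hypothetical customer ages, including one impossible entry.
ages = [23, 31, 35, 38, 41, 44, 47, 52, 58, 200]
print(flag_outliers(ages))
```

Here the only flagged value is 200. The same function run on transaction amounts would flag the massive purchase too, which is exactly the point: the rule tells you where to look, not what to delete.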

Finding Relationships

Finally, we look for correlations. How do different variables interact?

A correlation matrix (often visualised as a heatmap) allows you to spot redundant features. If “variable A” and “variable B” move in near-perfect sync, you likely do not need both. Feeding the model redundant information can slow down training and may contribute to overfitting.
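As a rough sketch of the redundancy check, the code below computes Pearson correlation for every pair of columns and lists the pairs above a threshold. The column names, the data, the redundant_pairs helper and the 0.95 cut-off are all assumptions made for illustration. It also assumes no column is constant, since a zero-variance column would cause a division by zero.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # assumes neither column is constant


def redundant_pairs(columns, threshold=0.95):
    """List (name_a, name_b, r) for column pairs with |r| >= threshold."""
    pairs = []
    for (a, xs), (b, ys) in combinations(columns.items(), 2):
        r = pearson(xs, ys)
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical features: the same floor area in two units, plus house age.
columns = {
    "area_sqft": [800, 1000, 1200, 1500, 2000],
    "area_sqm": [74.3, 92.9, 111.5, 139.4, 185.8],
    "age_years": [30, 5, 42, 12, 25],
}
print(redundant_pairs(columns))
```

Only the two area columns are reported, because they are the same measurement in different units. A heatmap shows the same thing visually; the list form is simply easier to act on when you have dozens of columns.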

Once you understand the shape and structure of your data, you are finally ready to pick up the mop and bucket.

Next up: Part 4 covers Data Cleaning. We look at how to handle missing values and scrub the dataset until it shines.


