A couple of years ago I read R for Marketing Research and Analytics, by Chris Chapman and Elea McDonnell Feit. I definitely recommend this book if you’re new to marketing analytics and want to learn R while working with marketing data.
Chapter 3 of the book shows how to create simulated sales data and outlines a recommended approach to inspecting it. I use this approach in my own work to inspect new data before starting an analysis.
A basic data analysis workflow is as follows:
Load Data –> Inspect –> Wrangle –> Analyze –> Report
(The workflow described below relates to the Inspect phase.)
Inspect Data Workflow
To get started, load your data into R and make sure the data is in a data frame:
- Check that the data has the expected number of rows and columns.
- Check the first and last rows to make sure no header row was read in as data and there are no empty rows at the end.
- Check several random rows to spot-check data.
- Check for appropriate variable data types (especially factor type).
- Check for unexpected values (especially min and max).
- Check observation counts, trimmed mean and skew.
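Here is a minimal sketch of what those checks can look like using only base R. The `sales` data frame is made up for illustration, and the `skew()` helper is a simple moment-based estimate defined here as an assumption, not a function taken from a package.

```r
# Hypothetical sales data standing in for a freshly loaded file
set.seed(42)
sales <- data.frame(
  store = factor(rep(c("A", "B", "C", "D"), each = 5)),
  week  = rep(1:5, times = 4),
  units = rpois(20, lambda = 100),
  price = round(runif(20, min = 2, max = 3), 2)
)

dim(sales)                       # expected number of rows and columns
head(sales)                      # a header row read in as data shows up here
tail(sales)                      # empty trailing rows show up here
sales[sample(nrow(sales), 5), ]  # spot-check a few random rows
str(sales)                       # variable types, especially factors
summary(sales)                   # unexpected values, especially min and max

# Simple moment-based skew (illustrative helper, not from a package)
skew <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Observation count, 10% trimmed mean, and skew for each numeric column
numeric_cols <- sales[sapply(sales, is.numeric)]
sapply(numeric_cols, function(x) {
  c(n            = sum(!is.na(x)),
    trimmed_mean = mean(x, trim = 0.1, na.rm = TRUE),
    skew         = skew(x))
})
```

If you prefer a packaged version of the last step, `describe()` in the psych package reports the count, trimmed mean, and skew in one table.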
I have taken the workflow a step further and created a convenient utility function that runs through all of the steps at once. Once you import your data, simply call the function and it prints the summary statistics to the console for review.
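As a rough illustration of the idea, a wrapper along these lines bundles the checks into one call. This is a sketch under stated assumptions rather than the exact function: the name `inspect_data`, its `n_rows` argument, and the returned matrix are all hypothetical, and it sticks to base R so it runs without installing anything.

```r
# Hypothetical helper: runs the inspection checks in one call (base R only)
inspect_data <- function(df, n_rows = 5) {
  stopifnot(is.data.frame(df))

  cat("Dimensions (rows, columns):\n")
  print(dim(df))

  cat("\nFirst rows:\n")
  print(head(df, n_rows))
  cat("\nLast rows:\n")
  print(tail(df, n_rows))

  cat("\nRandom rows:\n")
  idx <- sample.int(nrow(df), min(n_rows, nrow(df)))
  print(df[idx, , drop = FALSE])

  cat("\nStructure:\n")
  str(df)

  cat("\nSummary:\n")
  print(summary(df))

  # Count, 10% trimmed mean, and moment-based skew per numeric column
  numeric_cols <- df[vapply(df, is.numeric, logical(1))]
  stats <- vapply(numeric_cols, function(x) {
    x <- x[!is.na(x)]
    d <- x - mean(x)
    c(n = length(x),
      trimmed_mean = mean(x, trim = 0.1),
      skew = mean(d^3) / mean(d^2)^1.5)
  }, c(n = 0, trimmed_mean = 0, skew = 0))

  cat("\nCount, trimmed mean, skew:\n")
  print(t(stats))

  # Return the statistics invisibly so they can be reused in later steps
  invisible(t(stats))
}

# Example call on a data set that ships with R
inspect_data(iris)
```

Returning the statistics invisibly is a design choice: the console output stays readable, but you can still capture the table with `stats <- inspect_data(my_data)` if you want to use it later.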
Data Inspection Workflow Output Example:
Note that the function utilizes two libraries that you will have to install:
It’s important to create reusable code so you can work through repeatable tasks quickly. Utility functions are useful for abstracting code, and they let you move through data preparation a little faster.
Feel free to tweak the code, make it your own, and change the workflow steps to match your process.