If you search for the term “data science” on Google or Bing, you will find quite a few definitions or explanations of what it is supposed to be. There does not seem to be a clear consensus on a single definition, and there is even less agreement about when the term originated.
We will not repeat these definitions here, nor will we try to choose one that we think is most correct or accurate. Instead, we provide our own definition, one that comes from a practitioner’s point of view:
Data science is the exploration of data via the scientific method to discover meaning or insight and the construction of software systems that utilize such meaning and insight in a business context.
This definition emphasizes two key aspects of data science.
First, it is about exploring data using the scientific method. In other words, it entails a process of discovery that in many ways resembles how other scientific discoveries are made: an iterative cycle of ask, hypothesize, implement/test, and evaluate. This iterative process is shown in Figure. The iterative nature of data science matters a great deal because, as we will see later, it strongly affects how we plan, estimate, and execute data science projects.
Second, and no less important, data science is also about implementing software systems that make the output of these techniques and algorithms available and immediately usable in the appropriate business context of day-to-day operations.