There is broad and fast-growing interest in data science and machine learning. It is fueled by an explosion in business applications that rely on automated detection of patterns and behaviors hidden in data. Software can find these patterns and exploit them to dramatically improve the way we market and sell products, optimize our inventory and supply chain, detect fraud, and support customers. In short, data science and machine learning improve how we make decisions in a wide range of situations based on patterns found in data.
For decades, mathematical modeling in business belonged to an obscure area at the intersection of business and IT. Now it is moving into the mainstream and the rush is on: Where do we find data scientists, how do we train them, and what tools do we give them? Is there a way we can scale analytics and data science to the point where they become a normal aspect of any software development project?
This series of blog posts is addressed to software engineers and technology managers who want to understand, in simple terms, how data science is used to solve common challenges in machine learning.
In thinking about the best ways to expose a large number of programmers to the basics of data science and machine learning, we took the same approach that helped introduce Java Spring to millions of developers: the Pet Clinic. It is a teaching-oriented demo application that is intuitive enough for any developer to relate to its business goals, complex enough to represent real-world requirements, and simple enough to keep the developer from being overwhelmed by the complexities found in real-world business applications.
“Social Movie Reviews” is what we’re calling our “Pet Clinic for data science and machine learning,” and here is how we are going to use it to expose you to the world of data science:
This data science guide explains how we built our Twitter sentiment analysis application in three parts: First, we discuss the data science process and key machine learning terminology. Second, we explain how to understand and process the raw data using dictionaries, machine learning, and test data sets. Third, the guide reviews how to tune the model and visualize insights derived from it.
This blog series is also a logical companion to our series of blog posts on In-Stream Processing, a popular approach to building a computational platform for mathematical analysis and machine learning. We use our In-Stream Processing Service Blueprint as the computational platform for this data science tutorial.
Victoria Livschitz, Anton Ovchinnikov, Joseph Gorelik