Last evening I was privileged to speak at the London Business Analytics Meetup. The talk was titled "An Introduction to Predictive Analytics". I promised the audience that due to my lack of slides I'd summarise my comments into a reference blog with some interesting links and resources to refer to.
What was the talk about?
A high-level introduction to predictive analytics
Structural Ideas and Concepts
Start with Rugby (not Netflix)
The measurement of rugby players is a good example of predictive analytics at work.
Collecting data from various sources to understand player performance
Similar ideas also applied in football, baseball, etc
Attempting to predict injury, performance, fatigue, etc
This approach is a good mental model to apply to your own business context
A general framework for describing the analytical maturity of a given organisation, which is composed of three parts:
These companies are mature when it comes to gathering, aggregating, modeling and using data within their organisations. They have had data science teams for several years, if not since inception, and are applying forecasting and prediction techniques to well-formed problems. They have typically invested heavily in systems, which are either built internally, purchased (Cloudera Stack), open-source (Hadoop), or a combination of these.
These efforts support the primary products of these companies
The outcome of their efforts directly affects the revenue model of their offering
Oh, I saw this article earlier today which I really liked! Worth a read too.
Earlier in the data analytics life cycle. The management or leadership are aware of the potential for value to exist within the various data sources across the organisation. They may have already invested in some tools and people to start understanding the data and to begin extracting its potential value.
The typical starting point is the single customer view
Multiple schema, multiple siloed systems
More about this later
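To make the single customer view a little more concrete, here is a minimal sketch of the idea: two siloed systems hold different fields about the same people under different schemas, and the work is to reconcile them into one record per customer. The system names, field names and the choice of a normalised email address as the matching key are all assumptions made for illustration; real projects usually need far fuzzier matching than this.

```python
from collections import defaultdict

# Hypothetical extracts from two siloed systems; all field names are invented.
crm_rows = [
    {"email": "Ann@Example.com ", "full_name": "Ann Lee", "segment": "retail"},
    {"email": "bob@example.com", "full_name": "Bob Ray", "segment": "trade"},
]
billing_rows = [
    {"customer_email": "ann@example.com", "total_spend": 120.0},
    {"customer_email": "ann@example.com", "total_spend": 80.0},
    {"customer_email": "cat@example.com", "total_spend": 15.5},
]


def normalise_email(raw):
    """Trim and lower-case an email so the same person matches across systems."""
    return raw.strip().lower()


def build_single_customer_view(crm, billing):
    """Merge both sources into one record per customer, keyed on normalised email."""
    view = defaultdict(
        lambda: {"name": None, "segment": None, "total_spend": 0.0, "sources": set()}
    )
    for row in crm:
        record = view[normalise_email(row["email"])]
        record["name"] = row["full_name"]
        record["segment"] = row["segment"]
        record["sources"].add("crm")
    for row in billing:
        record = view[normalise_email(row["customer_email"])]
        record["total_spend"] += row["total_spend"]
        record["sources"].add("billing")
    return dict(view)


customer_view = build_single_customer_view(crm_rows, billing_rows)
for email, record in sorted(customer_view.items()):
    print(email, record["name"], record["total_spend"], sorted(record["sources"]))
```

Keeping track of which sources contributed to each record is a cheap way to see where the silos disagree, for example a customer who appears in billing but not in the CRM.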
This is a catch-all for the rest of the world. Not meant in a negative or disparaging way - just a mental model. Some companies are simply too small to require prediction. Or within their industry prediction isn't possible due to the particular aspects of the problem space. Or the problem is of a form that isn't computationally possible in anything less than super-polynomial time as a function of the size of the inputs. If you are interested in understanding more about algorithmically hard problems, the following is a good book, although it doesn't cover data science or prediction. It is focused primarily on algorithmic problems and complexity: Algorithmics: The Spirit of Computing
Returning to the Data Motivated Category of our mental model
Organisations at this stage are often looking for exploratory tools, initial prototypes to interrogate the data and to identify potential clusters of customer behaviour.
Budget is a challenge while the value case is still being established. Typically enough money is available to start the ball rolling but nothing significant
Rapid prototyping and early momentum are key here
Navigating access to siloed information across the organisation can also be a challenge. Political will and strong internal champions are often needed
The build-or-buy question will inevitably rear its head at some stage
Don't underestimate the challenges of the enterprise architecture. Understand whether the company is still running a monolithic system, has adopted SOA, or is moving towards microservices. How will this affect your ability to source and mine data?
Key Takeaways
Spend more time defining the problem you are attempting to solve than you think you need.
Making more money for the business isn't a well defined prediction problem... it can be an outcome but you need to drill down into systems, process, people.
Find the organisational bottlenecks where intuition is being used and where the available data is too large or dense for the users to navigate.
Insert lightweight data capture points into existing processes if possible. Social media and website click-stream data will only get you so far... beware of the social media dashboard as a panacea!
Don't confuse agile with fast (read John Foreman on this: "DataScience at the speed of Hype")
In the end these are expert systems. Humans need to use them and understand the output effectively. This is where design, UX and data visualisation are vital. The world doesn't need another dashboard... Data viz is a huge topic, and unless you expect your account managers (or whoever) to interpret K-Means clustering in an IPython notebook, you will need to translate the output of your analysis into a consumable format.
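As a rough illustration of that last point, the sketch below runs a tiny K-Means by hand and then turns the cluster assignments into sentences an account manager could read. Everything in it is an assumption made for the example: the customer names, the two features, the fixed starting centroids and the wording of the summaries. In practice you would scale the features and use a library implementation rather than this toy loop.

```python
from statistics import mean

# Hypothetical accounts: (orders per year, average order value in GBP).
customers = {
    "Acme": (2, 40.0),
    "Birch": (3, 55.0),
    "Cedar": (24, 300.0),
    "Delta": (30, 280.0),
    "Elm": (1, 35.0),
    "Fjord": (26, 320.0),
}


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def k_means(points, initial_centroids, iterations=10):
    """Plain K-Means: alternate between assigning points and updating centroids."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        assignments = assign(points, centroids)
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # leave a centroid where it is if its cluster is empty
                centroids[i] = tuple(mean(dim) for dim in zip(*members))
    return assign(points, centroids), centroids


def describe_clusters(names, assignments, centroids):
    """Translate raw cluster output into one readable sentence per group."""
    summaries = []
    for i, (orders, value) in enumerate(centroids):
        members = [n for n, a in zip(names, assignments) if a == i]
        summaries.append(
            f"Group {i + 1} ({len(members)} accounts: {', '.join(members)}): "
            f"about {orders:.0f} orders a year at roughly GBP {value:.0f} each."
        )
    return summaries


names = list(customers)
points = [customers[n] for n in names]
# Fixed starting centroids keep the example deterministic.
assignments, centroids = k_means(points, [points[0], points[2]])
summaries = describe_clusters(names, assignments, centroids)
for line in summaries:
    print(line)
```

The clustering itself is the small part; the `describe_clusters` step is where the analysis becomes something a non-specialist can act on.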
Current Things we see in the market
Fat End of the Revenue Curve - providing account managers with recommendation systems to service customers better
Companies understanding the limitations of the social media dashboard... thankfully
Systems integration (SI) dressed up as data science to plumb systems together
The SaaS Infrastructure Gap (I'm planning another blog on this so check back soon!)
Other information sources:
Great weekly newsletter:
Data Visualisation Podcast
Literature: (various books I've found useful or have been recommended to me)
Some Random Quora Answers on the topic: