Business intelligence is the application of analytics to business problems. Analytics is math and statistics applied to data sets to look for patterns, find correlations, take measurements, and make projections.
Types of Statistics
Software vendors have sold the idea of analytics as a panacea that can provide business intelligence. Their products feature handsome dashboards with graphs and trend lines. But anyone looking at these dashboards can draw the wrong conclusions, and make bad business decisions, if they do not take the time to understand the principles on which those conclusions rest.
Businesses have come to realize this, which is why there is now a demand for data scientists in IT. The data scientist’s tools include the R programming language, the statistics packages built into SAP HANA PAL (Predictive Analysis Library), and other programming languages and APIs. Microsoft Excel also has many statistical functions.
The data scientist looks at a set of data and makes two kinds of observations:
- Descriptive: this means making a measurement of what you already have. The most common descriptive statistic is the average (mean). A measurement of the variation in data is called the standard deviation. There are hundreds of different descriptive statistics.
- Inferential: this means drawing some conclusion from the data. A forecast is one example: it predicts what is likely to happen in the future given what has happened in the past. A trend is a pattern observed in data over time. One of the simpler analytics algorithms is linear regression, which estimates the relationship between two variables. For example, analysis of a dataset could reveal that sales increase 3% for every 10% reduction in prices.
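To make the two kinds of observation concrete, here is a minimal Python sketch. The numbers are hypothetical, chosen only so that the fitted line reproduces the "3% more sales per 10% price cut" relationship mentioned above; the function names `describe` and `fit_line` are illustrative and do not come from any SAP or vendor API.

```python
from statistics import mean, stdev

# Hypothetical data (an assumption for illustration): price cut in percent
# and the sales change in percent observed afterwards.
price_cut_pct = [0, 5, 10, 15, 20]
sales_change_pct = [0.0, 1.5, 3.0, 4.5, 6.0]


def describe(values):
    """Descriptive statistics: summarize the data you already have."""
    return {"mean": mean(values), "stdev": stdev(values)}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


summary = describe(sales_change_pct)
slope, intercept = fit_line(price_cut_pct, sales_change_pct)

print("descriptive:", summary)
print("sales change per 1% price cut:", slope)
print("projected sales change for a 10% cut:", slope * 10 + intercept)
```

The first function only measures what is already in the data. The second fits a line that can then be used to project a value that was not observed, which is the step where a misread dashboard can lead to a bad decision.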
Anomaly Detection with SAP HANA
SAP HANA PAL is one example of an analytics tool, although there are many others. SAP HANA is an in-memory database. Analytics can run very fast there because the data is held in memory rather than read from a spinning disk, and because all the computations are done in the database, where the data is also located, there is no application server in between and no network latency.
Let’s illustrate this idea of analytics with a very simple example, anomaly detection, using SAP HANA.
An anomaly, also called an outlier, is a data point that differs from other data points to a degree that is statistically significant. Outliers can indicate that a computer has been hacked, that someone has cancer, or that a credit card transaction might be fraudulent. One way to find anomalies is to group data points into clusters and then find the points that lie outside those clusters. Those points are the anomalies.
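The cluster-then-flag idea can be sketched in a few lines of Python. This is not SAP HANA PAL's implementation: it is a plain k-means loop with hand-picked starting centroids, and the rule "flag the `n_outliers` points farthest from their cluster centre" is an assumption made for illustration. The data points are made up.

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points]


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points, recompute centroids, repeat until stable."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
                if members else old
            )
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def find_anomalies(points, centroids, labels, n_outliers=1):
    """Return the n_outliers points farthest from their own cluster centre."""
    distances = [dist(p, centroids[label]) for p, label in zip(points, labels)]
    ranked = sorted(zip(distances, points), reverse=True)
    return [p for _, p in ranked[:n_outliers]]


# Hypothetical data: two tight groups plus one point far from both.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.0),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1), (8.1, 8.0),
          (4.5, 15.0)]

centroids, labels = kmeans(points, [points[0], points[4]])
anomalies = find_anomalies(points, centroids, labels)
print(anomalies)  # [(4.5, 15.0)]
```

Note that the stray point still gets assigned to a cluster and pulls that cluster's centre toward itself; it is flagged only because it remains much farther from the centre than any other member.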
SAP HANA PAL functions are called just like any other SAP HANA function, which means you use extensions to the SQL language. For example, the complicated-looking call shown below tells SAP HANA to look at a table (or view) and then create a new table containing the data records that are anomalies:
CALL DM_PAL.PAL_ANOMALY_DETECTION_PROC(<input table>, <parameters table>, <output results table>) with OVERVIEW;
With SAP HANA you put the parameters for the statistical function in a table as well. This is where you need a data scientist: to measure how anomalous a point is, you need to pick from a list of distance measures. In this case, they are Manhattan, Euclidean, or Minkowski distance.
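For readers unfamiliar with the three distance measures, the following Python sketch shows how each one is computed between two points. The function names are illustrative and are not SAP HANA parameter names; how PAL encodes the choice in its parameter table is not shown here.

```python
def manhattan(a, b):
    """Sum of absolute differences along each axis."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def minkowski(a, b, p):
    """Generalization of both: p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


a, b = (0.0, 0.0), (3.0, 4.0)
print(manhattan(a, b))     # 7.0
print(euclidean(a, b))     # 5.0
print(minkowski(a, b, 3))  # a third option, using exponent p=3
```

The same pair of points is 7 units apart under one measure and 5 under another, so the choice of measure changes which points end up looking like outliers.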