
The Next Wave: Big Data Predictive Analytics

Shiv Sikand, IC Manage

DAC Panel 2015

(edited transcript)

There's a lot of talk about analytics and big data. There is a fundamental difference: analytics and analytical thinking require a cultural shift in the way you track and plan your activities. Those changing thought processes are based on where we are today. Most schedules and organizations are very, very reactive. We have standard reports. People fill in data. They report that information at status meetings. There's usually a leader or a project leader who tries to marshal and corral that information and present it in a succinct fashion during weekly or bi-weekly meetings, and tries to work out what data is relevant to the overall scope of the project, versus very individualized technical problems that a group may be facing.

Then of course there's this whole notion of semi-automated data gathering, where you're really relying on the engineers themselves to populate this data and present it to you.

As you know, people tend to color that data because they don't necessarily understand its full significance in the overall project context, and they only want to present the data that they feel is relevant. But the truth is much deeper than that.


Now, just because you gather all the big data doesn't automatically mean you are no longer reactive: I've built a big data system, I've grabbed all this stuff, and somehow I'm still reactive.

The transition from reactive to proactive is difficult. You don't get it out of the box. Forward-looking decisions, and I think Simon [Burke, Xilinx] pointed that out, really need new types of analysis. That analysis is typically continually changing. You have different constraints as you go through your project lifecycle, and then different constraints between projects, because they can have different flavors. In a typical semiconductor company you've got different flavors of devices, and IP is being delivered through different mechanisms. The key is to have the ability to keep writing analyses, identify the key indicators, and then drill down deeper to find out exactly what's going on. To do that you really need a powerful framework.


What are those framework challenges? The first one is that we're talking about very, very large amounts of unstructured data.

This is machine data on all the jobs that ran. But unless you can correlate the actual events that drove the creation of that data, it's very difficult to write a good model. That's where a well-structured SCM repository can help drive that correlation.
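As an illustration of what that correlation can look like, here is a minimal sketch in Python, using an invented schema rather than IC Manage's actual data model: each regression job from the machine logs is matched to the most recent SCM changelist submitted before the job started, so failures can be traced back to the design changes that drove them.

```python
# Minimal sketch (invented schema): correlate raw job logs with SCM events.
import pandas as pd

jobs = pd.DataFrame({                      # unstructured machine data, one row per job
    "job_id":    [101, 102, 103],
    "start":     pd.to_datetime(["2015-06-01 08:00", "2015-06-01 09:30", "2015-06-01 11:00"]),
    "status":    ["pass", "fail", "fail"],
    "runtime_s": [3600, 5400, 5200],
})

scm_events = pd.DataFrame({                # structured SCM repository history
    "changelist": [5001, 5002, 5003],
    "submitted":  pd.to_datetime(["2015-06-01 07:45", "2015-06-01 09:00", "2015-06-01 10:40"]),
    "author":     ["alice", "bob", "carol"],
})

# Pair each job with the latest changelist submitted before the job started
correlated = pd.merge_asof(
    jobs.sort_values("start"),
    scm_events.sort_values("submitted"),
    left_on="start", right_on="submitted",
)

# Failure rate per changelist is one simple key indicator to drill into
print(correlated.groupby("changelist")["status"].apply(lambda s: (s == "fail").mean()))
```

The point is simply that once job records and SCM events share a timeline, a key indicator such as failure rate per changelist falls out of a single join.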

It's also very important to remember that project leads are not database gurus, they're not data scientists. What they want is something that's usable. They don't want to have to train a group of people to run those analytics for them. They want to be able to do it in a fairly scalable fashion.

In terms of framework challenges, making data useful requires a certain level of optimization. Yes, you do need to grab a lot of your data and put it into the system, but you also want to make sure that it's optimized and that you don't put a lot of wasteful stuff in there, because it just becomes harder to find that needle in the haystack.

You need optimization, you need some notion of predictive modeling, and there's always a fair amount of statistical analysis. Usually that statistical analysis is based on historical trends or historical precedents that allow you to build your base model. Once you have your base model, then you can extrapolate it based on your complexity and your scaling factors.
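A minimal sketch of that idea, with invented numbers: fit a base model to historical projects, then extrapolate it for a new project by applying complexity and scaling factors. The specific metric (gate count versus schedule weeks) and the factor values here are assumptions for illustration only.

```python
# Minimal sketch: base model from historical precedent, then scaled extrapolation.
import numpy as np

# Historical precedent: gate count (millions) vs. weeks from RTL freeze to tape-out
gates_m = np.array([ 50, 120, 200, 350])
weeks   = np.array([ 20,  30,  38,  52])

# Base model: simple linear trend fitted to the historical data
slope, intercept = np.polyfit(gates_m, weeks, 1)

# New project: extrapolate the base model, then apply assumed adjustment factors
new_gates_m       = 500
complexity_factor = 1.15   # e.g. first design on a new process node (assumption)
scaling_factor    = 0.90   # e.g. larger compute farm / bigger team (assumption)

base_estimate = slope * new_gates_m + intercept
prediction    = base_estimate * complexity_factor * scaling_factor
print(f"Predicted schedule: {prediction:.0f} weeks")
```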

The key here is that vendors need to provide that value. Most of that value comes through providing library functions so that you can compute the metrics that you want, and provide easy programmatic interfaces so that not everyone has to be a database expert or a data scientist. Then provide presentation layers so that visualization, which is a very big part of analytics, comes up very, very quickly.
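To make that concrete, here is a hypothetical illustration of those three layers; none of the function names or metrics reflect any vendor's actual API, they only show the shape of the interface being described: a library function computes a metric, a plain programmatic call exposes it, and a presentation layer puts a chart up immediately.

```python
# Hypothetical illustration of library function, programmatic interface, and presentation layer.
import matplotlib.pyplot as plt

def weekly_bug_closure_rate(open_counts, closed_counts):
    """Library function: compute a project metric without exposing the database."""
    return [c / (o + c) if (o + c) else 0.0 for o, c in zip(open_counts, closed_counts)]

# Programmatic interface: a project lead calls one function with plain lists
weeks  = [1, 2, 3, 4, 5, 6]
opened = [40, 35, 30, 28, 20, 12]
closed = [10, 18, 25, 30, 34, 38]
rate   = weekly_bug_closure_rate(opened, closed)

# Presentation layer: visualization comes up immediately, no SQL and no data scientist
plt.plot(weeks, rate, marker="o")
plt.xlabel("Week")
plt.ylabel("Closure rate")
plt.title("Bug closure trend")
plt.show()
```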


As Simon has mentioned, the tape-out prediction comes after at least one design; you get actionable information, and accuracy improves over time. It's not some magic thing where you press a button and get a result from a tool.

Now, another big aspect of big data analytics is IP theft prevention. We see these trends of data leaking out of organizations. Typically what we find is that it's not one or two records that are stolen; it's four million or ten million records.


The key there is that analytics can enable you to stop those kinds of mega-breaches. Otherwise, when you have a vulnerability, the typical time to find it is about 280 days. Further, in general, that vulnerability is not found by you; it's found by some external organization.

One of our partners did a survey, and what they found was that most of the time that information was coming from third-party agencies like the FBI, the NSA or the CIA. They were being told, hey, we raided this office or we found this set of computers, and on it we found stuff marked "your company confidential." The key is to be able to prevent these mega-breaches, because once you have a vulnerability, this data starts leaking out, and you want to be able to apply threat prevention analytics very early on to stop large amounts of data leaking out of your system.
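One hedged sketch of what such early threat prevention analytics could look like, not IC Manage's actual detection logic: compare each user's repository access volume today against their own historical baseline and flag large deviations before millions of records walk out the door.

```python
# Minimal sketch (invented numbers): flag users whose access volume spikes far
# above their own historical baseline, the pattern behind a mega-breach.
from statistics import mean, stdev

def flag_exfiltration(history, today, threshold=4.0):
    """Return (user, count, z-score) for users far above their baseline today."""
    alerts = []
    for user, counts in history.items():
        mu, sigma = mean(counts), stdev(counts)
        z = (today.get(user, 0) - mu) / sigma if sigma else 0.0
        if z > threshold:
            alerts.append((user, today[user], round(z, 1)))
    return alerts

# Daily file-access counts over recent days (invented data)
history = {
    "alice": [120, 130, 110, 140, 125, 135],
    "bob":   [ 80,  90,  85,  75,  95,  88],
}
today = {"alice": 128, "bob": 40000}   # bob just pulled 40,000 files

print(flag_exfiltration(history, today))   # -> bob is flagged
```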

Thank you.
