
How to Develop Data Science Skills

In document The Data Science Handbook (Page 120-128)

GEORGE ROUMELIOTIS


which site visitor should be exposed to which advertisement in order to continuously optimize the revenue stream. As the site visitor behavior changed over time, the system would automatically adapt. Those were very exciting days, and our customers included MSN, eBay, and Cisco. Not so exciting was the Dot Com Crash in 2001, when the doors for additional funding slammed shut overnight. It was like nuclear winter for venture capital. We could not scale down our operations fast enough, so we had to shutter the company and sell off the technology and intellectual property.

Undeterred, I went on to found another start-up, JRG Software, with another set of business partners, this time raising about $10M. That start-up was in a completely different domain, namely factory scheduling for the food and beverage industry. The problem we solved was to enable factories to rapidly adapt to changing demand without holding a lot of inventory. One of our early customers was General Mills, which still uses our system to schedule all West Coast production of Cheerios! The business challenge was how to penetrate the headquarters of large companies like General Mills where SAP was firmly entrenched. We were eventually acquired by a public company that added our scheduling system to their product line.

At that point, my wife said something along the lines of, “Perhaps you should look at doing something other than a start-up next,” and I eventually arrived at Intuit as one of its first Data Scientists.

You were at Intuit before people started calling themselves data scientists, right?

That’s right. And it’s been a fascinating journey.

Along with the rest of the world, over the past five years Intuit has dramatically evolved its thinking regarding the applications of big data and advanced analytics. Five years ago, the focus was entirely on marketing optimization. Then, starting about three years ago, the scope increased to include improving the user experience by analyzing how users are interacting with our products. Now, the focus is squarely on leveraging big data and advanced analytics to create new products that solve important problems for our customers. Our unique aim is to deliver “Big Data for the Little Guy,” which empowers individuals and small businesses by allowing them to benefit from the power of their own data as well as the collective wisdom of millions of fellow Intuit customers. This means that small businesses now have access to insights that were once available only to big, multimillion-dollar companies, and that consumers can put their own data back to work for them.

You worked at Intuit before there was a lot of hype and discussion about this term “data science”. As someone who’s been in this field for a while, what are the myths and what are the truths when people talk about big data and data science?

You might have heard the joke, “What is a Data Scientist?” The punchline is, “A Data Scientist is a data analyst, who just happens to live in California.” I think the hype will go away, but Data Science will be a permanent feature of the business landscape. Data Science is its own unique discipline, combining elements of applied mathematics, computer science, business consulting, and, increasingly, new product development. I consider a good Data Scientist to be like a Swiss army knife, competent across all these areas, with deep expertise in one or two of them. More specifically, the technical table stakes for a Data Scientist are advanced statistics, machine learning, SQL and Hadoop, and a mainstream programming language like Java. So there’s a combination of applied mathematics and computer science. But of equal importance are business consulting skills. These are often overlooked, or added as an afterthought, but they are critical. Business consulting skills can be the difference between a Data Scientist and a Data “Gopher”.

A Data Gopher is someone who responds to incoming requests for analyzing this or that, but who never has a seat at the table when the business decisions are being made. On the other hand, a Data Scientist with business consulting skills is like a senior McKinsey consultant, who can translate fluently between business and technical domains, and who is a trusted advisor to business leaders. Those are highly non-trivial skills.

When you talked about skills required in data science, you talked about three things: classical statistics or machine learning, computer science and business consulting skills. What suggestions would you make to someone looking to build those skills?

In terms of database skills, it is essential to feel completely comfortable with SQL and Hadoop. If you are still on campus, for goodness’ sake take advantage of that by signing up for relevant basic courses that include a major project component.

In terms of programming skills, learning R is very important. It is kind of ugly, but it is the lingua franca. Personally, I would stay away from proprietary, commercial statistical programming languages. You know the ones I’m talking about. And certainly you need to learn a mainstream programming language like Java or C++. Learning a mainstream scripting language like Python or Perl also comes in handy.


If I had to assign a weight to help someone prioritize all this learning, it would look something like this:

SQL 40%

Hadoop 30%

R 15%

Mainstream programming language 10%

Mainstream scripting language 5%

In terms of acquiring business skills, you have to get creative. At Stanford there was a fabulous entrepreneurship course tailored to engineers and scientists. Simply listening to lots of entrepreneurs tell their story is very helpful. Subscribe to The Harvard Business Review. Talk your way into a challenging internship that presents you with an open-ended problem. Above all, just start an online business. It doesn’t need to be the next Google. Give yourself the challenge of starting with $100 and seeing how much you can make it grow in a month. That can be quite eye-opening. Don’t become a Data Scientist who has never operated as much as a lemonade stand.

You have a very unconventional experience in that you left your postdoctoral position to found companies. Not only did you transition from a postdoctoral position to a business environment, but you jumped into the deep end and decided to start your own company. What types of thinking did you feel like you benefited from during your experience in academia and what are the things that you felt were hindrances to you when you entered the business world?

Having the foundation of applied mathematics was extremely useful, because then I could pick up other math-based bodies of knowledge very easily. On the Ph.D. side, I mainly learned persistence.

What certainly didn’t help me, and what I had to unlearn, was how academics present their results to others. As academics, we’re trained to take an axiomatic approach. “Here at the start of the presentation are my axioms, and here in the middle are the detailed steps that I took, and then here at the very end are my results.” But if you do that in a business meeting, and you hand out copies of your slides beforehand, you’ll observe that the first thing the business leaders do is flip to the back of the deck to see your conclusions. They just don’t care about the detailed reasoning, because that is your job, not theirs. I have found it much more effective to start with “the bottom line up front” and then show the thought process if there are questions. This is a very different mindset from academia.


Also, in academia you get kudos and endorphins from doing something novel. But in business it’s all about the efficiency with which the company can transform money into more money. So a Data Scientist needs to resist the impulse to solve problems ab initio, or to spend time going from the 80% solution to the 90% solution. That effort sometimes doesn’t make much business sense. You’ve got to think about allocating your time as though you were the owner of the business.

Intuit is a very data-centric and financial-centric company. You didn’t start out in a business context, so what framework do you use to evaluate the success of potential ideas at Intuit?

The way I look at business has definitely evolved, especially from being at Intuit. I’ve learned to take a hypothesis-driven, experimental approach to developing solutions to business problems. We should all feel passionate about the problems we are solving, but we must not fall in love with our solutions. We design experiments to let the customers choose between Solution A and Solution B, rather than that choice being made by “the loudest voice in the room.” That’s a mistake I made in start-ups, and one that I saw a lot of other people make as well. We were all convinced that of course we knew what the market wanted, and we proceeded to spend a lot of time building it. The way I work now, and the way I would have advised my younger self to work, is to create minimalist prototypes and test them out on real customers. Don’t fall in love with your own ideas. Market feedback is the only thing that matters. You’ve got to do experiments, and you’ve got to be ruthless about changing your ideas based on the results of those experiments.
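As a rough illustration of letting customers choose between Solution A and Solution B, the sketch below compares the conversion rates of two prototypes with a two-proportion z-test. The interview does not describe a specific method, so the choice of test, the function name `two_proportion_z_test`, and the visitor counts are all assumptions made for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of two variants (hypothetical helper).

    conv_a, conv_b: number of customers who converted on each variant.
    n_a, n_b: number of customers exposed to each variant.
    Returns (z, p_value), where p_value is two-sided.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 120 of 2400 visitors converted on A, 156 of 2400 on B.
z, p_value = two_proportion_z_test(120, 2400, 156, 2400)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

A small p-value suggests the difference between the two prototypes is unlikely to be noise, which is one way to let the data, rather than the loudest voice in the room, make the call. In practice the sample size and the success metric would be fixed before the experiment starts.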

We’ve interviewed people who have recently made the transition to data science, but as someone who’s seen the growth and development of younger data scientists, what are some of the mistakes often made by younger hires?

First, you have to proactively build relationships with your non-technical colleagues. Data Scientists are often by temperament introverts, but if you want to be effective and successful, you need to step outside that comfort zone. Email a non-technical colleague you’ve never met, and ask them to lunch. Make it your responsibility to form such relationships before you need them.

Next, practice viewing the world in terms of business processes. What’s a business process? It’s a foreign concept to many new Data Scientists coming directly from academia. A business process encompasses the people, systems and steps involved in a business activity. Generally speaking, a Data Science project has the goal of improving some existing business process. The truth is, it’s really difficult to change a business process.

For example, it took me a long time to grasp that improving the efficiency of a business process might actually be perceived as threatening to someone’s job, and the natural reaction of that person might be to consciously or unconsciously undermine any progress. So you have to develop deep empathy for the people involved in business processes, and create solutions that help those people transition to higher-value work. That sounds like a lot of responsibility for a Data Scientist, but if you don’t think about things like that, your ideas might never be implemented in the real world.

Beyond these three attributes, what does it take to be a successful data scientist, in your opinion?

A successful Data Scientist changes the world around them. It comes down to mindset. One mindset is that your responsibilities are to analyze a situation, construct a solution, and then pass along that solution to others for implementation. But that is a recipe for frustration for anyone who is interested in moving the needle in the real world. A better mindset is to think of yourself as the business owner who is responsible for changing how the business works. That’s a whole different mindset, one of taking ownership for how your ideas are going to be implemented and measured. The additional skill you need is influence without power. How do you influence others to move forward with your recommendations when they don’t report to you?

How do you influence others without power?

It is not easy, that’s for sure.

As I said earlier, it starts with being proactive in forming relationships with your non-technical colleagues, because people want to work with those they know and like.

It is also important to go for small wins before trying to hit the ball out of the park. Small wins prove that you are a reliable partner.

And you have to make the connection between your recommendations and the bottom line. Yes, that’s often very hard. There are usually many links in the chain between your work as a Data Scientist and the outcome for the business. But nobody else will do that analysis if you don’t. It goes back to having the mindset of the business owner.

When you’re looking for data scientists, do you feel there is a necessity for having any form of senior academic credentials? A lot of the data scientists we’re seeing now have a Ph.D. background, but do you think this trend will continue into the future?

Back in the day, when relational databases were brand new to the world, the folks who were most comfortable with that technology were at IBM Research. It is not surprising that the first relational database experts in industry had Ph.D.s, but over time the barrier to entry has obviously been reduced. Data Science might be like that. Maybe. On the other hand, Data Science might be more like brain surgery than SQL. I think it is too early to tell. The well-rounded Data Scientist is competent in applied mathematics, computer science and business. Such people don’t exactly grow on trees.

What distinguishes a data scientist from someone who works as a data analyst in a traditional business intelligence role, or a statistician with programming knowledge? How deep do these distinctions go?

Statisticians might be steeped in mathematical tools for inference and prediction, but that alone is not going to make them effective Data Scientists. They also need to be completely self-sufficient in extracting and manipulating large data sets that are usually found in legacy systems, and which are often a noisy mess. They need SQL, NoSQL, and programming skills to do that. And even if they have all the programming skills, without superb consultative skills they will have very limited influence. I think a Data Scientist is a very different animal from a statistician. But that’s just my opinion. I’m not interested in getting into a religious argument about what constitutes a “real” Data Scientist.

You were the CTO of the first startup you founded and that seems to suggest that in addition to being an excellent physicist, you’re also quite skilled at building systems that provided software solutions built off your interests in image processing or scheduling. Were those programming skills something you picked up during the course of your graduate degree or was that more born out of need?


I found that software engineering courses at Stanford taught you how to write a program, but they did not necessarily teach you how to work in teams, or with diverse systems that you have to integrate. And they did not teach you how to build, deploy and maintain complex software. All that relates to project management and people skills which are usually not addressed in computer science programs. So I learned those skills via trial and error. Lots of error!

I acquired software engineering skills by actually building the Version 1 solutions at the two startups I co-founded. I think it’s very hard for Data Scientists to work effectively with software engineers if they haven’t done any software engineering themselves. I don’t think a Data Scientist necessarily needs to be a production software engineer, which is a different mindset yet again. But basic fluency — knowing how to write, document and test code, and how to create components that are used in larger systems, that’s important.

Where do you think data science is headed?

I think we are going to see an explosion of both consumer and enterprise products that are made possible by Data Science — that is, by the creative melding of big data and advanced analytics. To achieve that, some Data Scientists will need to become product designers, adding the skill of “design thinking” to their toolbox. Deep customer empathy, rapid iterative prototyping, and in-market experimentation will be essential to this emerging sub-type of Data Scientist. Or maybe we’ll need a new name. Data Product Designer, anyone?
