ERICH OWENS
96
realize how there is this massive glut of smart people learning how to game arbitrage with marginal returns. In the grand scheme of things, finance just seemed fruitless. Compare that to the Bay Area, where you have people learning how to build recommender systems and teaching systems to learn, and that to me is very exciting. I think personally that move seemed accessible to someone with a mathematics background. It required heavy use of high-dimensional vector spaces, linear programs, kernel methods, etc., which was a language I spoke already.
By contrast, client-server protocols and the more computer-science-oriented concepts were foreign to me.
You mentioned this a little bit earlier concerning your move into machine learning and data science. Now that you’re at Facebook, what would you say is the value that you add as a data scientist?
In the case of Quid, they had a whole team of data analysts who were interested in having humans label training data. For them to scale, it wasn’t about hiring hundreds more people; it was about teaching an algorithm to do what the analysts did. Their move is largely emblematic of the growth potential of Silicon Valley, where you get exponential returns by scaling hardware instead of people.
I think finding people who could play around in Python and C++ and build these learning systems was hard.
Briefly going back to your previous experience in academia, what would you say were your biggest challenges in going from research positions at SLAC or your Ph.D. program to your roles at Quid, Newsle or Facebook?
Academics don’t really learn to code the way engineers in the Bay Area do. You learn as an academic to hack together code to produce results for your research. There is no incentive to learn to code well or maintainably. You don’t think about object orientation, functional programming or other techniques in the academic environment, which can be an impediment.
Wearing a business hat also provides a higher-level end goal which is sometimes not present in the academic environment.
How did you overcome these challenges?
I first joined Quid as a quantitative analyst, and I had a very basic level of Python skills from academia. Fortunately some engineers at Quid took me under their wing and taught me the basics of good software engineering.
I think when you are a student in mathematics or physics, you think a vector is a vector or a matrix is a matrix, but you don’t really think about how those representations tie into the computer. You don’t think about sparsity, run-time, etc., which are very important in industry.
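To make the sparsity point concrete, here is a minimal Python sketch (an illustration, not something from the interview). It assumes a sparse vector is stored as a dictionary from index to nonzero value; the function name `sparse_dot` and the example vectors are hypothetical. The dot product then costs time proportional to the number of nonzero entries rather than to the nominal dimension.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {index: value} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the vector with fewer nonzero entries
    return sum(value * b[index] for index, value in a.items() if index in b)


# Two vectors in a million-dimensional space with only a few nonzero entries.
u = {3: 2.0, 999_999: 1.5}
v = {3: 4.0, 42: 7.0}
result = sparse_dot(u, v)  # only index 3 is shared by both vectors
```

A dense representation of the same two vectors would store and multiply a million entries each, almost all of them zero.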
Throughout our conversations we’ve talked to many people about their data science background because there is such a diverse set of ways for people to get into the field. What would you have done differently through school and work given the experiences you’ve acquired?
I wish I had plunged in more and built things, such as websites or projects. When you’re comfortable writing things on a whiteboard, you get scared of code. I think iterating a lot on a prototype is really empowering and lets you learn programming and languages.
I wish I had programmed more, because when I first moved to Silicon Valley, lack of coding skills was a big stumbling block. I think my roles at the earlier startups also demanded a lot of iteration and prototyping, which helped me learn a lot. The pressure to see results in industry made the learning process a lot quicker compared to if I were learning in school.
What would you say is the value that you bring to Facebook as a data scientist?
The value I bring is not so much as a data scientist, but as a software engineer. Although I borrow the tools of data science in terms of clustering, data analysis and classifiers, I have the ability to build a scalable full-stack system. So I am not just building stand-alone models that are pretty to look at and that I’ll write a paper about; where I add real value is by incorporating those models into a scalable system.
It’s really interesting that you say that because we’ve talked to people who say that their main value is not software engineering, but rather their quantitative skills. Of the people you’ve worked with, how many tend to come from the math to engineering transition and vice-versa?
Facebook has its own data science team which is full of brilliant academics. I talk to them to get advice on what features to build and what algorithms.
Having that isolated academic data science team is really useful for an engineer like me. We wouldn’t be as successful without them.
I sit at the intersection of data science and engineering.
Can you talk a little bit about where data science in Facebook sits in the organizational chart or the product pipeline?
I’m on the public content ranking team. We want to connect you to content that you may like. So in a sense we’re working on a content delivery system. In order for that to work, you really have to understand how newsfeed-ranking algorithms work and what the goal for that team is. It’s one thing to rank and display your friends’ content, which is quite a finite problem; it’s another to aggregate all content on Facebook at a given time to enable content discovery. The problem is much broader than that. Data science at Facebook is a stand-alone organization, but I’ve met several data scientists who have been embedded in different groups. So the structure depends on the product. On some teams, data is used to inform product decisions; on others, data is a core component.
In some ways these silos of data science remind one of Bell Labs, where you build great things and are not so worried from week to week about the details of short-term projects or metric gains.
So you’re more insulated from the hard product deadlines and have more freedom to explore?
That would be my guess, but I am a software engineer. I think that would be accurate as the data science team does publish a lot of papers with the data Facebook collects.
You’ve been in a lot of roles from startups to Facebook, and you’ve surely met a lot of data scientists along the way. What would you say are some of the qualities that separate the best data scientists from the rest?
The brilliant ones I’ve seen at the few companies I’ve worked at were the ones who
could read papers, prototype and then turn it into a scalable system. I’ve met quite a few people who would have a great idea, but would then take forever to implement it even in Matlab.
So I think strong programming skills coupled with systems-level thinking is very important. Building scalable systems may limit your ideas, but it makes them that much more powerful in terms of impact. At Quid, for instance, there were engineers who could build systems on their own and think theoretically. In my opinion, the combination of strong theory and the ability to implement it in a scalable manner is what makes a data scientist stand out.
Are there any developments in the field of data science and machine learning that really excite you?
I like the idea of wearable computing, for instance Google Glass. Say you’re in this neighborhood and you want coffee, but Glass could recommend a nearby art gallery. I like the idea of life recommendations, the idea of personal assistance, the idea of picking up on personal signals and making recommendations.
More advanced algorithms based off linear separations like support vector machines (SVMs) or deep neural networks that could learn intermediary steps or do automated feature engineering are very exciting.
Say at some point that you would like to move on; do you think that your background would facilitate an easy transition to another field?
I’ve thought about hypotheticals, where say in 10 years I’ve built a great career at Facebook and might go back to school to study quantum computers or some exciting technology at the time.
Having a strong mathematical background really emboldens you to do these things. The nature of hiring at that age is different as you would no longer be a fresh college graduate, but an experienced hire. At that point, you may also have enough experience to start your own company in an adjacent field.
How do you approach problems? What’s a mathematical way of approaching data science problems and how do you use that framework to solve other problems?
I’ll give you an example. When looking at time series data, one usually opts to analyze the entire data set, which requires a large amount of memory to store and can impede the actual analysis.
Having learned mathematics and signal theory, I could use a low-pass filter and just keep a small buffer to learn the exponential moving average at any given time. You see how an analog-to-digital converter can be useful in analyzing social data. I think spotting analogous metaphors between fields is the most useful thing someone from a rigorous background can do.
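As a rough illustration of this idea, the following Python sketch implements an exponential moving average as a streaming first-order low-pass filter. It is a sketch under stated assumptions, not the speaker's implementation: the class name `ExponentialMovingAverage`, the smoothing factor `alpha` and the sample values are all hypothetical, and the "small buffer" is reduced here to a single running value.

```python
class ExponentialMovingAverage:
    """Streaming first-order low-pass filter that uses O(1) memory."""

    def __init__(self, alpha):
        # alpha is an assumed smoothing factor: larger values react faster.
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value = None  # no observations seen yet

    def update(self, x):
        # The first sample initializes the filter; later samples are blended in.
        if self.value is None:
            self.value = float(x)
        else:
            self.value += self.alpha * (x - self.value)
        return self.value


ema = ExponentialMovingAverage(alpha=0.5)
smoothed = [ema.update(x) for x in [10, 20, 20, 40]]
```

Each update touches only the previous average and the new sample, so the full time series never has to be held in memory.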
Just building off analogous metaphors: simulated annealing was inspired by metallurgy. How have you found your background in mathematics useful in cross-pollinating ideas to your current role at Facebook?
When it comes to recommendation systems, people will often use singular value decomposition (SVD) to do dimensionality reduction. For me that makes sense from a mathematical background, but I’ve seen stumbling blocks when talking to engineers about why that concept would be useful.
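For readers less familiar with the idea, here is a minimal sketch of SVD-based dimensionality reduction using NumPy's `numpy.linalg.svd`. The ratings matrix, the helper name `truncated_svd` and the choice of `k = 2` are all assumptions made for illustration; nothing here is taken from a system described in the interview.

```python
import numpy as np

# Hypothetical user-by-item ratings matrix (rows: users, columns: items).
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
])


def truncated_svd(matrix, k):
    """Return the rank-k factors (U_k, s_k, Vt_k) of matrix."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


u_k, s_k, vt_k = truncated_svd(ratings, k=2)
user_factors = u_k * s_k       # each user as a 2-dimensional vector
approx = user_factors @ vt_k   # rank-2 approximation of the ratings
```

In the reduced space, users with similar tastes (the first two rows) land close together, which is what makes the low-dimensional factors useful for recommendations.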
The ability to read a paper and understand it is also very useful. For instance there is this beautiful technique called random projections where you populate a random projection matrix using ones, zeros and minus ones, scaled by some normalization term. You can throw such a projection matrix against a high-dimensional vector and map it to a lower-dimensional space. According to the Johnson-Lindenstrauss lemma, you can guarantee with high probability that the interpoint distances will be approximately preserved. It’s a remarkable property because you basically scatter your data into the wind, but it’s still useful with the added benefits of easier implementations and lower runtimes. It makes sense in terms of probability, but it seems really non-intuitive otherwise.
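The technique can be sketched in a few lines of plain Python. This is an illustration under assumptions, not code from the interview: the entry probabilities (1/6 for +1, 2/3 for 0, 1/6 for -1), the scale `sqrt(3 / k)`, the dimensions and the function names are choices made here, since the text only says the entries are ones, zeros and minus ones with a normalization term.

```python
import math
import random


def sparse_projection_matrix(k, d, rng):
    """Build a k x d matrix with entries in {+1, 0, -1} times sqrt(3 / k).

    Assumed probabilities: +1 and -1 with 1/6 each, 0 with 2/3.
    """
    scale = math.sqrt(3.0 / k)
    return [[scale * rng.choice((1, 0, 0, 0, 0, -1)) for _ in range(d)]
            for _ in range(k)]


def project(matrix, vector):
    """Map a d-dimensional vector to k dimensions."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
d, k = 1000, 200
R = sparse_projection_matrix(k, d, rng)
x = [rng.gauss(0, 1) for _ in range(d)]
y = [rng.gauss(0, 1) for _ in range(d)]

original = math.dist(x, y)
projected = math.dist(project(R, x), project(R, y))
ratio = projected / original  # close to 1 with high probability
```

The scale factor is chosen so that squared lengths are preserved in expectation; the lemma is what bounds how far an individual distance can stray from that expectation.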
What advice or feedback would you give to people who are just starting out on their transition to the industry?
I think the most useful thing about being in college and graduate school for so many years was that I was learning for the sake of it and it was just very interesting. When I was doing applied mathematics I ironically wasn’t that interested in applications. When I asked myself what I wanted during graduate school, I would say that I wanted the
autonomy to work on some really big and hard problems. That was as concrete as my career goals were.
I’m really lucky that the whole data science and machine learning industry existed when I got out of school. I worry that if I had been pragmatically focused on learning certain things, I might have missed more abstract concepts which have greater implications later.
So I guess I would encourage people to study what they like, but the way that worked out for me may not work for others. It’s difficult to give very specific advice.