Paper
Big data: what’s holding
you back?
Uncertainty about return on investment and skills shortages need to be overcome if the promise of big data technologies is to be fulfilled
Contents
Executive summary
A way to go
Snapshot
Business case
Operational efficiency
Adoption
Big data obstacles
Managerial obstacles to big data
Conclusion
About the sponsor, Talend
This document is property of Incisive Media. Reproduction and distribution of this publication in any form without prior written permission is forbidden.
Executive summary
Big data is often portrayed as a big-ticket IT endeavour which is within reach of only the largest enterprises. The situation is analogous to oil and gas exploration: you have good reason to believe that there is unrealised wealth buried at subterranean levels. But you do not know until you have sunk some highly expensive test wells. You may strike it rich; or you may surface empty handed.
However, this picture is both outdated and incomplete. Not only have tools come to market that can substantially reduce costs, but organisations have also identified that big data can lead to direct cost benefits such as reduced data warehouse costs, which is less about “exploration” and more about “optimisation”.
A recent survey of Computing’s readership reveals a snapshot of the uptake of big data in their organisations and the obstacles that hold them back from further adoption.
Compared with a year ago, this survey finds a marked uptick in organisations taking a first step into this new world.
However, as is usual with IT projects, budget and skills remain constraints for many, with organisations uncertain as to whether investment in infrastructure and personnel will yield the desired results.
While examining the data landscape, this paper argues that these obstacles can be overcome and a sound business case for big data adoption be built.
A way to go
The concept of big data has been around for a few years, but it would appear that it is still having difficulty making it out of the IT department. According to Computing’s survey, in some 41 percent of organisations, the term is unknown outside of IT circles (Fig. 1).
Fig. 1: What is the attitude to the concept of ‘big data’ in your organisation?
Clearly there is a requirement for education across the organisation. However, when talking to line of business managers and senior executives about big data, it is important to frame the concept in terms of business opportunity and risk – preferably in that order – not in technical language. “We are sitting on a vast wealth of market intelligence which could help us sell [insert latest product/service/project here], but if we do not exploit it our competitors will eat our lunch”, is a far more persuasive opener to an executive on profit-related remuneration than “we have 2PB of unstructured data outside our RDBMS which is inaccessible to our current query and analysis tools.” The good news for IT people tasked with proselytising the big data gospel to the organisation is that, according to the survey, in 25 percent of organisations, the attitude to big data is that it represents a big opportunity.
In only seven percent of organisations is big data dismissed as vendor hype, and just two percent regard it as a big problem the organisation faces, rather than an opportunity. When we asked this question of the Computing audience in May 2012, 34 percent said their organisations dismissed big data as vendor hype and 35 percent said it was a problem, not an opportunity. So the message is working, but there is still a way to go.
Fig. 1 responses:
Few people outside the IT department have heard of it: 41%
Dismissed as IT vendor hype: 7%
Shorthand for a big problem which our organisation faces: 2%
Shorthand for a big opportunity which our organisation should take: 25%
We think there might be value buried in un-analysed data, but we don’t know where to start; Don’t know; Other: the remaining 25% between them (shares of 13%, 7% and 5%)
Snapshot
To identify the opportunities in big data, it is vital to take stock of what is currently captured and analysed, who analyses it, and the volume of data which would be involved – now and in the foreseeable future.
According to Computing’s survey, email is the most captured source of data – by 66 percent of organisations (Fig. 2).
Fig. 2: What are the main sources of data currently collected in your organisation?
Email communications: 66%
Transactional data & ecommerce: 59%
Structured and selected unstructured (eg from office productivity documents): 58%
Web logs and traffic: 53%
Structured only: ie databases: 49%
Network traffic data: 47%
Call data records: 39%
Video/audio/still images: 30%
Sensor output and machine-to-machine: 19%
Every scrap of data that goes across our network: 12%
Scientific, medical, research: 10%
Only that which is required for legal compliance: 7%
Don’t know: 4%
Other: 3%
* Respondents could select multiple answers.
Unsurprisingly, transactional and ecommerce data (59%), and structured and some unstructured data (58%) are not far behind. Web logs and traffic are captured by 53 percent. Just less than half of respondents (49%) capture only structured data.
The type of data captured will depend on what business the organisation conducts. Sensor and machine-to-machine data will be captured only by companies that have networks of such devices, for example on process or manufacturing lines; student performance will be captured only in the education sector.
Given that the concept of big data is still in relative infancy, a significant proportion – 12 percent – say they capture every scrap of data that passes across their IT infrastructure. Just seven percent say they capture only that data which is legally required for compliance.
However, when it comes to what proportion of this data is analysed, the picture is less encouraging. One in four respondents simply do not know what proportion of data that is kept is analysed (Fig. 3).
Fig. 3: Approximately, what proportion of data collected by your organisation is used for analysis?
The majority of remaining respondents (46%) estimate that less than half of the data their organisations keep is analysed, and six percent analyse only data that can be captured in a relational database. That leaves 22 percent of organisations that analyse more than half the data they keep.
This raises the question: if so much data is unanalysed, why keep it? One reason is that in many industries corporate data must be retained for a certain period on legal grounds. For example, in the UK, telecoms operators are required to retain call data records (CDR) for one year.
Other firms simply hold on to data until they can work out what to do with it. As one respondent said, anecdotally: “the 90 per cent of unanalysed data is a work in progress”.
According to the survey, retained data is usually analysed by the IT department (44% of respondents) or the business department that owns the data stream (43%).
Twenty-nine percent of organisations employ the services of expert analysts and/or statisticians to examine data. In 23 percent of organisations senior management analyse data and in 24 percent anyone with authorised access can analyse data. Respondents were permitted to choose more than one option that applies to their organisation.
Fig. 3 response options: None; Less than 10%; 11 – 25%; 26 – 50%; 51 – 75%; 76 – 90%; More than 90%; Only data which can be put in a relational database management system (RDBMS); Don’t know; Other.
Chart values: 12%, 16%, 4%, 6%, 25%, 1%, 1%, 15%, 14%, 16%.
* Respondents could select multiple answers.
What about the volume of data involved? Nearly half of the respondents (43%) simply do not know how much data would be involved. For the majority that do, it is less than a petabyte (Fig. 4).
Fig. 4: Approximately what quantity of data under management is/would be involved in a big data project in your organisation?
Knowing how much data would be involved currently is one concern, but how much data will IT leaders have to plan for in the future? According to some estimates, the world’s data is doubling every year.
The largest proportion of respondents to Computing’s survey (42%) say their data volumes are growing by 10 – 20 percent per annum, although a quarter say they do not know the rate of increase (Fig. 5).
Fig. 4 response options: Less than 500TB; 501TB – 999TB; 1 – 5PB; 6 – 10PB; More than 10PB; Other.
Fig. 4 chart values: 48%, 15%, 9%, 1%, 5%, 22%.
Fig. 5: At what rate is the data quantity changing in your organisation?
Roughly static: 6%
Increasing 10 – 20% per annum: 42%
Increasing 21 – 50% per annum: 19%
Increasing 51 – 100% per annum and Increasing more than 100% per annum: 8% combined (shares of 6% and 2%)
Decreasing: 0%
Don’t know: 25%
For 19 percent of respondents, data in their organisation is increasing at a rate of between 21 and 50 percent per annum. For eight percent it is increasing by more than 50 percent per annum. None of the respondents said the volume of data is decreasing.
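To put these growth rates in planning terms, the short Python sketch below compounds a starting volume at a constant annual rate and works out how long a data estate takes to double. It is illustrative only: the 500TB starting point and the five-year horizon are assumptions chosen for the example, not survey findings, and the function names are hypothetical.

```python
import math


def projected_volume(current_tb: float, annual_growth: float, years: int) -> float:
    """Volume after `years` of compound growth at a constant annual rate."""
    return current_tb * (1.0 + annual_growth) ** years


def years_to_double(annual_growth: float) -> float:
    """Years needed for the volume to double at a constant annual rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    start_tb = 500.0  # assumed starting volume, not a survey figure
    horizon_years = 5  # assumed planning horizon
    # Rates taken from the survey bands plus the "doubling every year" estimate.
    for rate in (0.10, 0.20, 0.50, 1.00):
        volume = projected_volume(start_tb, rate, horizon_years)
        doubling = years_to_double(rate)
        print(
            f"{rate:.0%} per annum: {volume:,.0f}TB after {horizon_years} years, "
            f"doubling every {doubling:.1f} years"
        )
```

At 10 percent a year a data estate takes a little over seven years to double; at 20 percent, just under four. The gap between those figures and a volume that doubles annually shows why knowing the organisation's own growth rate matters for capacity planning.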
Business case
If IT is to make the business case for exploiting big data in the organisation, then it is important to draw up an exhaustive and detailed list of the potential business-oriented benefits and to identify which stakeholder groups are likely to benefit the most. The list of potential benefits in Fig. 6 is as good a place to start as any.
Fig. 6: What would you identify as potentially the main benefits (operational, strategic or technical) of adopting a big data strategy in your organisation?
Respondents were permitted to choose as many of the options as applied to their organisation. Just over half (52%) of respondents identified improved decision-making as one of the main benefits, 42 percent said better operational efficiency and 40 percent improved data quality and/or integrity.
Improving customer intelligence, identifying new trends and opportunities, improving overall business insight and improving customer service were also popular.
Improved decision-making: 52%
Better operational efficiency: 42%
Improved data quality/integrity: 40%
Improved customer intelligence: 37%
Identify new trends/opportunities: 37%
Improved overall business insight: 36%
Improved customer service: 31%
Faster queries and analysis: 29%
Reduced duplication of data/effort: 28%
Reduced operating costs: 25%
Improved risk-management: 20%
Enhanced sales revenue: 17%
Optimised marketing and branding: 16%
Better compliance, including in real time: 14%
Better collaboration with business partners: 13%
It is important to remember that these are generic benefits that need interpreting and expanding for each organisation. Selling big data to senior executives on the grounds that “it can improve business decisions” may sound like a veiled criticism of current business decisions. Better to identify what type of decisions it can improve and how, and, if at all possible, to attach an estimate of financial payback or elimination of risk.
For example, optimised marketing and branding was chosen by 16 percent of respondents. This may be a case of identifying customers who are ripe for up-selling or cross-selling. But equally it can also identify customers who already have a specific product or service yet are still receiving marketing materials, thereby reducing waste and the danger of annoying otherwise loyal customers.
Having identified the business benefits of big data, it is then easier to derive which stakeholder groups stand to gain the most.
In keeping with the previous answers, if improved decision-making is the benefit identified by most respondents, then it is fitting that senior management is the group most likely to gain (48% of respondents).
The marketing department (30% of respondents) is also a fairly obvious beneficiary, in terms of improved customer intelligence and identifying new trends and opportunities.
IT, sales, finance and middle management are all groups that are also seen as potential beneficiaries from analysing big data by a significant proportion of respondents.
Operational efficiency
Second on the list of potential benefits of big data systems (see Fig. 6) comes operational efficiency. Big data technologies can be used to pre-process data before it is moved to a data warehouse, or to provide a more cost-effective storage platform on which a proportion of the corporate data can be stored. Since big data technologies are generally designed to run on clusters of commodity servers with relatively cheap storage, savings on both hardware and software can soon add up.
Significant time-savings are also possible with applications built on big data systems. Since relational schemas are no longer required – or at least are much simplified – database applications can be built and deployed much more quickly with consequent savings to the development budget.
Adoption
Interest in and roll-out of big data strategies have moved on markedly since we last asked the Computing audience how far their organisations had got. But a significant proportion of organisations are still stuck in the starting blocks.
Fig. 7: How far has your organisation engaged with big data?
Just less than a quarter of respondents (24%) say there is no interest in big data in their organisation. That is down from 61 percent in May 2012. Only four percent have rejected dealing with their big data after discussion, up from one percent.
Those engaged in preliminary discussions about using a big data approach have increased from 24 percent of organisations to 36 percent.
About one in five organisations (19%) are at the planning and appraisal stage, up from eight percent previously. But the proportion engaged in a pilot of big data is still at four percent.
However, the surveys show a small number of big data leaders pulling away from the pack: one in 10 respondents to the latest survey are engaged in a large-scale roll-out, up from two percent last year. These are the companies that have proved the value of a big data approach in a pilot, are pressing ahead and will gain an early lead over their competitors.
But what is driving adoption? For nearly a quarter of respondents (24%) it is simply the increasing volume of data.
For around one in five respondents (19%) the very real business requirement to increase revenue drives the take-up of big data. The pressures of compliance and product and service development motivate 11 percent each.
Smaller proportions (6%) say big data in their organisation is driven by the requirement to find new customers, to keep up with competitors or to collaborate more closely with business partners.
Big data obstacles
Lack of budget and lack of skills are nearly always among the top three constraints identified in any IT endeavour, and big data is no different.
Fig. 7 data (May 2012, March 2013):
No interest shown: 61%, 24%
Discussed and rejected its use: 1%, 4%
Discussions about using it: 24%, 36%
Planning and appraisal stage: 8%, 19%
Pilot projects: 4%, 4%
Large-scale roll-out: 2%, 10%
Big data adoption is seen as a big-ticket solution, and skilled operatives and data scientists are thought to be thin on the ground (Fig. 8). Big data analytics requires a rare blend of programming and statistical skills, together with the ability to understand and communicate what the analyses mean for the organisation. Since big data is a relatively new field, it is not surprising that organisations might struggle to find – or afford – suitable candidates.
Fig. 8: What are the primary obstacles that prevent your organisation moving further ahead with analysing big data?
Listed in descending order of priority:
Budgetary constraints
Big data isn’t a strategic priority
Lack of skills
Technological restrictions
The need to comply with regulations
We don’t know where to start
Restricted scalability
Figure 9 drills down on the technical issues.
Fig. 9: What are the technical obstacles to implementing a big data strategy in your organisation?
Large variety of data structures: 51%
Linking structured to unstructured data: 40%
Deciding what to keep and what to discard: 37%
Privacy/data protection is a concern: 37%
Large volume of files to be incorporated: 31%
Rapid rate of change of the data to be analysed: 25%
Transfer of large datasets across the network: 21%
Authorisation of access is a concern: 19%
Insufficient hardware, eg processors and storage: 18%
The big data solutions provided by our IT partner(s) don’t fit our architecture: 16%
Backup is problematic: 13%
None - We have the technology in place to exploit our big data: 10%
Big data is said to be defined by “the three Vs”: volume (quantity of data); variety (different types of data – including structured and unstructured – and disparate databases); and velocity (the rate at which data is changing). All three are prevalent in the responses. The large variety of data structures is identified as an obstacle to big data by 51 percent of respondents, and linking unstructured and structured data by 40 percent. The large volume of files to be incorporated inhibits 31 percent, and the rapid rate of change of the data to be analysed is a problem for 29 percent.
Other major technical issues include deciding what to keep and what to discard, and privacy or data protection concerns (both identified by 37%).
On the skills issue, more than a fifth of respondents (22%) say the people with the right skills are too expensive, 18 percent do not have or cannot attract skilled data analysts, and 14 percent do not have or cannot attract people with the skills to integrate disparate data sources. However, 38 percent say they have sufficient people with the right skills.
Figure 10 breaks down the budgetary constraints.
Fig. 10: What impact does budget have on your big data strategy?
Nearly half of respondents (49%) say uncertainty about return on investment hampers big data in their organisation. This is understandable if big data is seen as a large expense and the business case has not been clearly defined.
Nearly a third (31%) say there is simply no budget available for new projects, hardware or software. Only a slightly smaller proportion (28%) relate budget to skills – hiring and training people in big data skills is too expensive.
For 22 percent, the lack of budget is also a scalability constraint: the more data they want to process, the more expensive it gets.
And 17 percent are held back by their IT providers: the big data solutions provided by their IT partner(s) are too expensive to buy or license.
Uncertainty about return on investment (RoI): 49%
There is no budget for new projects/hardware/software: 31%
Hiring and training people in big data skills is expensive: 28%
The more data we want to process the more expensive it gets: 22%
The big data solutions provided by our IT partner(s) are very expensive to buy/license: 17%
None - we have all the budget we need: 12%
Other: 5%
Managerial obstacles to big data
For over a quarter of the survey respondents (26%) there is insufficient buy-in or no champion for big data at senior level. Sixteen percent blame generic inertia; 14 percent say some senior executives are believers but others are not so keen; and the same proportion say line-of-business managers do not believe that the value exists or are unable to visualise the benefits. Sceptics of the IT vision can be annoying, but it is up to IT to build a business case that speaks in language non-technical managers can comprehend.
Happily, 14 percent say their senior managers are gung-ho for big data, while one respondent reports: “I wouldn’t quite say ‘gung-ho’ but we have a good level of senior buy-in.”
Conclusion
The obstacles to big data adoption are clearly numerous, but certainly not insurmountable. Budget and skills are constraints on nearly every IT endeavour including big data solutions. The potential scale of big data projects would indicate that these two parameters would be the major limitations, because big data requires people able to integrate a number of very large and often inflexible disparate data sources, and skilled data scientists to analyse the collective data streams.
Clearly big data need not be big-ticket. The ability to run open-source databases and integration tools on the open-source Hadoop platform enables big data integration and analysis applications to be run cheaply across clusters of commodity servers, reducing the outlay for hardware.
However, wary of the perceived immaturity of some of these open source players, many organisations will be looking to adopt solutions provided by their current enterprise software vendors.
But existing tools and traditional vendor solutions may not scale either technically or from a pricing or licensing standpoint, with costs rising in proportion to the volume of data to be analysed. Furthermore, vendor lock-in is rife, even when using supposedly open source databases and tools which still require proprietary emulators with expensive licensing models.
However, these obstacles can be overcome. Tools are available which can exploit Hadoop’s MapReduce framework, breaking the vendor lock-in of proprietary ETL engines that were never designed for the Hadoop platform and providing an escape route from escalating licence fees. Graphical tools have been developed which can be used by Java programmers, obviating the need for rare – and hence highly expensive – data scientists. Furthermore, these open-source tools can be downloaded and tried for free.
This effectively reduces the cost of sinking the big data equivalent of a test well, by removing the limitations on technical and economic scalability.
But what of the lack of awareness among senior executives and business managers faced with a fuzzy business case and uncertain RoI?
About the sponsor, Talend
Talend provides integration solutions that truly scale for any type of integration challenge, any volume of data, and any scope of project, no matter how simple or complex. Only Talend’s highly scalable data, application and business process integration platform enables organizations to effectively leverage all of their information assets. Talend unites integration projects and technologies to dramatically accelerate the time-to-value for the business.
Ready for big data environments, Talend’s flexible architecture easily adapts to future IT platforms. Talend’s unified solutions portfolio includes data integration, data quality, master data management (MDM), enterprise service bus (ESB) and business process management (BPM). A common set of easy-to-use tools implemented across all Talend products maximizes the skills of integration teams.
Unlike traditional vendors offering closed and disjointed solutions, Talend offers an open and flexible platform, supported by a predictable and scalable value-based subscription model.