Top Ten Data Management Trends
September, 2013
Raj Gill
Founder and President, Scalability Experts
Executive Summary
The amount of data that companies need to manage is doubling
every couple of years. With the expansion of Web 2.0 channels,
smartphones, online and mobile commerce and a variety of other new smart
technologies, it will continue to grow at an even faster pace in
the years to come.
Many IT organizations are having a hard time keeping up with the sheer
volume and criticality of this data. They are looking for more efficient
and effective ways to manage, store and leverage this information at
each point along the data lifecycle. Many companies are already deep
into consolidation, virtualization and plotting their path to the Cloud in
order to optimize their computing and database operations.
This paper will introduce you to the top ten data management trends
Scalability Experts is seeing in the market that will be important in 2011
and beyond. If you are an IT manager or CIO, you should familiarize
yourself with these technologies. In today’s fast-paced, data-rich
environment, you need to take advantage of every opportunity to gain
a competitive advantage. By leveraging the latest trends, you will be
able to make more insightful business decisions, reduce your IT costs
and make your environment more responsive, available and scalable.
Contents
Introduction
Top Ten Data Management Trends
1. Data Warehouse Appliance
2. Databases in the Cloud
3. Data Governance
4. Predictive and In-Database Analytics
5. Pervasive Insight
6. Data Integration
7. Master Data Management
8. Very Large Databases (Hadoop)
9. Data Replication
10. Geo-spatial data visualization
Importance of People, Processes and Technology
Recent Example – Smart Grid Technology
How Can Scalability Experts Help
Conclusion
Introduction
As experts in Data Management and Business Intelligence, Scalability Experts (SE) must keep up with the latest methodologies, best practices and technologies available in the market to help companies get more value from their data. Every month the company completes a wide range of mission-critical consulting engagements with customers from various industries. Based on this customer experience, this paper introduces the top ten data management trends SE is seeing in the market that will be important for 2011 and beyond. The following gives a quick introduction to some of the key drivers SE has identified.
Even though most companies have used some type of BI solution in the past, there has been a significant shift toward more advanced predictive analytics and associated technologies. This shift has occurred for many reasons, but one of the main drivers has been the exponential growth in the amount of rich data available to companies for better decision making. A recent issue of Database Trend Magazine[1] cited an independent survey of more than 500 companies that found that, on average, data is growing more than 50% per year. To keep up with this unprecedented growth, many of these companies will also need to upgrade their computing platforms to ensure that their environments are scalable and available. Many companies are becoming smarter about their data, and most now view their database operations as a way to gain a competitive advantage.
Another driver of change is the maturing of Cloud computing technologies. Over the last couple of years the Cloud has gained a lot of momentum, with a larger percentage of companies now taking a serious look at the benefits of on-premise private Cloud and off-premise public Cloud solutions. Cloud computing is also bringing alternatives to the traditional database to market, such as NoSQL databases and Hadoop clusters (using MapReduce), which can aggregate massive amounts of data quickly and reliably. For traditional environments, database appliances such as Microsoft’s Parallel Data Warehouse and Teradata are emerging.
As the volume and complexity of data increase, data integrity becomes a bigger challenge. Not only are there multiple streams of different types of data to comprehend, but there is also greater collaboration and sharing of that data across departments and along the supply chain. This poses new challenges for enterprise data integration, such as high data volumes, and raises the importance of Master Data Management and Data Governance as responsibility for organizing the data shifts from database administrators to business users. Depending on your needs, not all of these trends will apply to your current IT situation, but they should be on your radar for future additions to your roadmap. The market is changing quickly.
Top Ten Data Management Trends
The trends and technologies listed below will give you the latest solutions available to:
• Increase data availability and scalability
• Improve operational efficiency
• Ensure data integrity
• Lower total cost of ownership
• Increase speed to market
• Improve business decision making
Determining which solutions best fit your situation will depend on the size, scope and type of business challenges your company faces. As a first step, conducting an architectural design review session is recommended in order to assess the cost/benefit advantages and weigh the pros and cons of implementing any of these solutions. The following gives you SE’s top ten list (not necessarily in priority order):
1. Data Warehouse Appliance
As your data volume and user base grow, it may not always be feasible to re-architect your system and introduce disruptive changes in order to maintain predictable query response times. For this reason, many companies are opting to implement massively parallel processing (MPP) data warehouse appliances, which provide the required scalability. Data warehouse appliances such as Microsoft’s Parallel Data Warehouse, Teradata and other solutions provide a turnkey approach that scales capacity from tens to hundreds of terabytes while also lowering operational costs for improved ROI.
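The MPP idea behind these appliances can be sketched in a few lines of Python. This is a simplified illustration, not how any particular appliance is implemented: rows are spread across a hypothetical set of compute nodes by hashing a distribution key, each node aggregates only its own slice, and a control node merges the partial results. The node count, the `customer_id` key and the sample rows are assumptions chosen for illustration.

```python
from collections import defaultdict
from zlib import crc32

NUM_NODES = 4  # hypothetical number of compute nodes in the appliance


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Choose a compute node by hashing the distribution key (stable across runs)."""
    return crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, num_nodes: int = NUM_NODES):
    """Spread (customer_id, amount) rows across nodes; equal keys land on the same node."""
    nodes = [[] for _ in range(num_nodes)]
    for customer_id, amount in rows:
        nodes[node_for(customer_id, num_nodes)].append((customer_id, amount))
    return nodes


def local_aggregate(node_rows):
    """Work each node does on its own slice (in parallel on real hardware, sequential here)."""
    partial = defaultdict(float)
    for customer_id, amount in node_rows:
        partial[customer_id] += amount
    return partial


def merge(partials):
    """Control node combines the per-node partial results into the final answer."""
    total = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)


rows = [("c1", 10.0), ("c2", 5.0), ("c1", 2.5), ("c3", 7.0), ("c2", 1.0)]
partials = [local_aggregate(node) for node in distribute(rows)]
print(merge(partials))
```

Because each node scans and aggregates only its own slice, adding nodes in principle adds scan and aggregation capacity without re-architecting the queries, which is the scalability property described above.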
When implementing data warehouse appliances, customers can gain increased performance out of the box with less effort because the solutions are optimized for data warehousing. What distinguishes an appliance from a typical online transaction processing (OLTP) database is that all components, from CPU to disk, are balanced for online analytical processing (OLAP), with a primary emphasis on eliminating potential performance bottlenecks. Solutions like Microsoft’s Fast Track appliance are also optimized for sequential IO rather than random IO and are designed to provide up to 200 MB/s per CPU core. In most cases, choosing an appliance provides similar performance at less than a third of the price of traditional solutions.
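The per-core figure above lends itself to a quick back-of-the-envelope estimate. The sketch below assumes that scan throughput scales linearly with core count at the quoted best-case rate of 200 MB/s per core and uses decimal units (1 GB = 1,000 MB); the data size and core counts are hypothetical, and real throughput depends on storage, compression and query shape.

```python
def scan_seconds(data_gb: float, cores: int, mb_per_sec_per_core: float = 200.0) -> float:
    """Best-case time to scan data_gb sequentially, assuming linear scaling across cores."""
    aggregate_mb_per_sec = cores * mb_per_sec_per_core
    return data_gb * 1_000 / aggregate_mb_per_sec


for cores in (16, 32):
    # 1 TB = 1,000 GB in decimal units
    print(f"{cores} cores: {scan_seconds(1_000, cores):.1f} s to scan 1 TB")
```

At 32 cores the aggregate rate is 6,400 MB/s, so a full scan of 1 TB takes roughly two and a half minutes in the best case; halving the cores doubles that time.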
2. Databases in the Cloud
Organizations facing the challenge of optimally sizing their computing resources to meet rapidly changing business demands are looking at Cloud computing as a solution. Often their systems are either over-sized to meet the most demanding business scenarios, and therefore underutilized during non-peak times, or under-sized due to the lack of an accurate capacity model, and therefore fall short when demand spikes. Both scenarios call for elastic computing to allocate resources optimally. Cloud computing provides the required elasticity by making it easy to add or drop capacity quickly as load changes.
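A minimal sketch of the elasticity decision, assuming a hypothetical capacity model in which each unit (a server or database instance) handles a fixed load and the system keeps 20% headroom. Real Cloud platforms expose their own scaling policies and APIs; the function names, thresholds and limits here are illustrative only.

```python
import math


def target_units(load: float, capacity_per_unit: float, headroom: float = 0.2,
                 min_units: int = 1, max_units: int = 64) -> int:
    """Units needed to serve `load` with `headroom` spare, clamped to [min_units, max_units]."""
    needed = math.ceil(load * (1 + headroom) / capacity_per_unit)
    return max(min_units, min(max_units, needed))


def scaling_action(current_units: int, load: float, capacity_per_unit: float) -> str:
    """Describe whether to add capacity, drop capacity or hold steady."""
    target = target_units(load, capacity_per_unit)
    if target > current_units:
        return f"add {target - current_units}"
    if target < current_units:
        return f"drop {current_units - target}"
    return "hold"


# Hypothetical day: each unit handles 250 requests/s.
print(scaling_action(2, 900, 250))  # peak: 900 * 1.2 / 250 = 4.32 -> 5 units, so "add 3"
print(scaling_action(5, 100, 250))  # off-peak: 0.48 -> 1 unit, so "drop 4"
```

The clamp to `max_units` reflects that even elastic capacity usually has a budget or quota ceiling.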
On-demand, pay-as-you-use computing services delivered through a Cloud solution can be thought of as a marriage between Utility Computing and Service Oriented Architecture (SOA), built on an autonomic computing concept. Utility computing and SOA enable databases to be offered as a Platform-as-a-Service (PaaS), sometimes referred to as Database-as-a-Service (DaaS), which allows metered usage and chargebacks to the end user and lets the service scale up on demand while reducing TCO. The Cloud can be either private (hosted on-premise) or public (off-premise through a Cloud service provider such as Microsoft (Azure platform), Google, Amazon or others). Cloud platforms are also typically multi-tenant and self-provisioning.
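Metered usage and chargeback boil down to multiplying each tenant’s metered quantities by unit rates and totaling the result. The sketch below assumes three hypothetical meters and made-up rates; actual Cloud providers define their own meters and pricing.

```python
from collections import defaultdict

# Hypothetical unit rates (currency per unit); not any provider's actual pricing.
RATES = {"cpu_hours": 0.12, "storage_gb_month": 0.10, "io_millions": 0.05}


def chargeback(usage):
    """Price (tenant, meter, quantity) records and total them per tenant."""
    bills = defaultdict(float)
    for tenant, meter, quantity in usage:
        bills[tenant] += quantity * RATES[meter]
    return {tenant: round(amount, 2) for tenant, amount in bills.items()}


usage = [
    ("finance", "cpu_hours", 100),
    ("finance", "storage_gb_month", 50),
    ("marketing", "cpu_hours", 20),
    ("marketing", "io_millions", 10),
]
print(chargeback(usage))  # finance: 12.00 + 5.00, marketing: 2.40 + 0.50
```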
Cloud computing offers significant benefits, including greater scalability, manageability and lower operational costs; however, there are also risks to consider related to control, security, governance and regulatory compliance. Depending on your business, it can be challenging to decide the best way to take advantage of this new technology. Determining the right approach and road map requires a close evaluation of a company’s goals and current computing environment.
According to a March 2011 TechTarget [2] survey that polled 500 companies of all sizes across a variety of industries, a whopping 70% said they have budgeted for Cloud computing initiatives this year, compared with fewer than 10% of companies in 2010.
3. Data Governance
More and more business users are demanding access to data rather than just reports, since tools like Microsoft’s Excel now provide advanced analytic capabilities. This has elevated the importance of having an effective Data Governance process in place to maintain the integrity and quality of data used across the enterprise, i.e., being able to control where the data comes from, who controls data values, consistent data types, etc. A tight process also helps ensure that business data, especially sensitive information that could be exploited, is not exposed to unauthorized personnel. In the past, database administrators had exclusive access to the data and provided canned reports to business users. Now business users are being given greater access so that they can slice and dice the information and organize it in the way that best meets their needs. Making your Data Governance processes more robust not only allows broader use of your data to improve insight and decision making, but also helps ensure that data quality is maintained in a safe and secure way.
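Some of these governance rules can be expressed as a data catalog that records, for each column, an accountable owner, an agreed data type and the roles allowed to see it. The sketch below assumes a hypothetical `customers` data set and hypothetical role names; commercial governance and security tools implement this far more completely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnPolicy:
    owner: str                # business data steward accountable for the values
    dtype: type               # data type agreed across the enterprise
    allowed_roles: frozenset  # roles permitted to read the column


# Hypothetical catalog entries for a "customers" data set.
CATALOG = {
    "customer_id": ColumnPolicy("sales_ops", str, frozenset({"analyst", "finance"})),
    "region": ColumnPolicy("sales_ops", str, frozenset({"analyst", "finance"})),
    "credit_limit": ColumnPolicy("finance", float, frozenset({"finance"})),  # sensitive
}


def validate(row: dict) -> list:
    """Report quality problems: columns missing from the catalog or with the wrong type."""
    problems = []
    for column, value in row.items():
        policy = CATALOG.get(column)
        if policy is None:
            problems.append(f"{column}: not governed")
        elif not isinstance(value, policy.dtype):
            problems.append(f"{column}: expected {policy.dtype.__name__}")
    return problems


def project(row: dict, role: str) -> dict:
    """Return only the columns the given role is authorized to see."""
    return {c: v for c, v in row.items()
            if c in CATALOG and role in CATALOG[c].allowed_roles}


row = {"customer_id": "C-100", "region": "West", "credit_limit": 25000.0}
print(validate(row))
print(project(row, "analyst"))
```

Keeping ownership, typing and access rules in one catalog means that widening business access to the data does not require bypassing the controls.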
4. Predictive and In-Database Analytics
Using predictive analytics to find patterns in historical and transactional data that identify opportunities and risks for the business has become increasingly sophisticated over the last few years. New BI technology and new modeling/scenario-planning capabilities allow companies to capture relationships between explanatory variables and predicted variables from past occurrences and exploit them to predict future outcomes, i.e., future trends and behavior patterns. This helps companies better predict future demand for their products, enable just-in-time supply chain systems and anticipate Website usage spikes, to name just a few business planning scenarios. Using information from the full data lifecycle for forecasting and planning is driving greater operational efficiency and creating demand from a broader range of new users who want access to this data.
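The simplest way to capture a relationship between an explanatory variable and a predicted variable is an ordinary least-squares line. The sketch below fits one in plain Python; the variable (promotion spend) and the figures are hypothetical, and production predictive models are usually far richer (multiple variables, seasonality, validation on held-out data).

```python
def fit_line(xs, ys):
    """Ordinary least squares: return (slope, intercept) minimizing squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new value of the explanatory variable."""
    slope, intercept = model
    return intercept + slope * x


# Hypothetical history: promotion spend (thousands) vs. units sold.
spend = [20, 25, 30, 35]
units = [200, 250, 300, 350]
model = fit_line(spend, units)
print(predict(model, 40))  # 400.0
```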
As more business users gain access to your data, your systems may bog down while trying to handle the additional workload. One way for IT to avoid this problem is to provide users with in-database analytic capabilities, which minimize the impact of the additional queries these users generate by letting them run simulations and analytics close to the data. Traditionally, the data would reside in an enterprise data warehouse and analytics would be performed on that data; for advanced analytics, the data would be extracted into a data mart. However, with growing data volumes and the need for quicker response, it is desirable to eliminate this data movement in order to increase the availability of the enterprise data warehouse servers. In-database analytics enables advanced analytics such as data mining, predictive modeling and Monte Carlo simulations to be performed inside the database, close to the data, with minimal impact on the performance of the overall system.
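The principle of moving the computation to the data, rather than the data to the computation, can be illustrated with SQLite from Python’s standard library standing in for an enterprise data warehouse. The aggregation runs inside the database engine, so only one summary row per region is returned instead of every sales row. The table and its contents are hypothetical, and because SQLite has no built-in variance function, the population variance is computed as E[x^2] - E[x]^2.

```python
import sqlite3


def regional_summary(rows):
    """Aggregate inside the database so only summary rows are returned to the client."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        return conn.execute(
            """
            SELECT region,
                   COUNT(*)                                         AS n,
                   AVG(amount)                                      AS mean,
                   AVG(amount * amount) - AVG(amount) * AVG(amount) AS variance
            FROM sales
            GROUP BY region
            ORDER BY region
            """
        ).fetchall()
    finally:
        conn.close()


rows = [("east", 10.0), ("east", 20.0), ("east", 30.0), ("west", 5.0), ("west", 5.0)]
for region, n, mean, variance in regional_summary(rows):
    print(region, n, mean, round(variance, 2))
```

The single-pass variance formula can lose precision for large values; a database with a native variance function, or a two-pass calculation, is numerically safer in practice.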
5. Pervasive Insight
The term Pervasive Insight describes the goal of making data more valuable, more accessible and more available to a greater number of users across the enterprise. The aim is to push better decision making further down into the organization by providing easier 24/7 access to systems and tools so that users can query, manipulate and display data in a variety of ways. Many software vendors provide tools (like Microsoft Office) that integrate tightly with analytics and reporting tools to provide an easy-to-use, intuitive interface for reporting and forecasting. Making the right data more accessible to the right users can lead to better decision making and create a competitive advantage for the company.
6. Data Integration
Data Integration is the process of combining data residing in different sources and providing users with a unified view of that data so they can analyze and use the information for better decision making. If your organization is integrating data from various organizational units, it is important to understand the difference between sharing data and integrating data so that the organizations use the right tools for the right purposes. With new advances in customer relationship management (CRM) applications, supply chain management and even online tools that gather social media data, it becomes even more important to understand the types of data (structured and unstructured) being captured and to have the right systems and methodologies in place to review the entire business process holistically, from sales to support. This now requires companies to integrate their data enterprise-wide for cross-functional analysis.
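At its core, producing a unified view means joining records from different systems on a shared key. The sketch below assumes hypothetical CRM and support-desk extracts that already share a `customer_id`; in practice, establishing that shared key is often the hardest part, which is where Master Data Management comes in.

```python
def unified_view(crm_accounts, support_tickets):
    """Merge CRM accounts and support tickets into one record per customer_id."""
    view = {
        acct["customer_id"]: {**acct, "open_tickets": 0, "total_tickets": 0}
        for acct in crm_accounts
    }
    for ticket in support_tickets:
        # Customers known only to the support system still get a (partial) record.
        entry = view.setdefault(
            ticket["customer_id"],
            {"customer_id": ticket["customer_id"], "name": None, "revenue": 0.0,
             "open_tickets": 0, "total_tickets": 0},
        )
        entry["total_tickets"] += 1
        if ticket["status"] == "open":
            entry["open_tickets"] += 1
    return view


crm = [
    {"customer_id": "C1", "name": "Contoso", "revenue": 120000.0},
    {"customer_id": "C2", "name": "Fabrikam", "revenue": 45000.0},
]
tickets = [
    {"customer_id": "C1", "status": "open"},
    {"customer_id": "C1", "status": "closed"},
    {"customer_id": "C3", "status": "open"},
]
for record in unified_view(crm, tickets).values():
    print(record)
```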
7. Master Data Management
Viewed as an important part of data integration, Master Data Management refers to the processes and tools used to consistently define and manage the non-transactional data entities of an organization. Its objective is to follow a unified methodology for collecting, aggregating, matching, consolidating, quality-assuring, persisting and distributing such data throughout an organization in order to ensure consistency and control over the use of that information. As part of the process, it is important to maintain an ongoing understanding of the data and its sources. With the growing need for enterprise-wide data integration, there is an even greater need to standardize this data, since critical business decisions are based on it. Master Data Management helps ensure data integrity throughout the company.
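Two of the steps listed above, matching and consolidating, can be sketched as follows: source records are matched on a normalized email address and merged into a single "golden record" using a simple survivorship rule (the most recently updated non-empty value wins). The match key, the survivorship rule and the sample records are assumptions for illustration; real MDM tools support fuzzy matching and configurable rules.

```python
def normalize_email(email: str) -> str:
    """Standardize the match key so trivially different spellings compare equal."""
    return email.strip().lower()


def golden_records(records):
    """Match records on normalized email, then merge each group into one record."""
    groups = {}
    for rec in records:
        groups.setdefault(normalize_email(rec["email"]), []).append(rec)

    golden = {}
    for key, group in groups.items():
        merged = {"email": key}
        # Apply older records first so newer non-empty values overwrite them.
        for rec in sorted(group, key=lambda r: r["updated"]):
            for attr, value in rec.items():
                if attr not in ("email", "updated", "source") and value:
                    merged[attr] = value
        merged["sources"] = sorted(r["source"] for r in group)
        golden[key] = merged
    return golden


records = [
    {"source": "CRM", "email": " Ana@Example.com", "name": "Ana Lopez",
     "phone": "", "updated": "2011-03-01"},
    {"source": "ERP", "email": "ana@example.com", "name": "A. Lopez",
     "phone": "555-0100", "updated": "2011-01-15"},
]
print(golden_records(records))
```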
8. Very Large Databases (Hadoop)
Many organizations are realizing the importance of parallel computing for querying very large databases and are re-architecting their systems to take advantage of multi-core systems. Due to the rapid growth in data volumes (the data explosion), companies face new challenges in managing, storing and manipulating very large databases. For example, a query that takes a few seconds to process a million rows may take far longer, minutes or more, to process a billion rows. In many business situations this lag would be unacceptable. To improve query response time, various algorithms now use the MapReduce concept to process large amounts of data in parallel. One such implementation of MapReduce is a Hadoop cluster, which can work with petabytes of data. Cloud service providers like Amazon have also started offering Hadoop clusters in the Cloud to meet the massive processing needs of database-intensive applications.
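The MapReduce model itself is simple to sketch. In the Python illustration below, a thread pool stands in for the many machines of a Hadoop cluster: each map task parses its own split of the input into (key, value) pairs, a shuffle step groups values by key, and a reduce step combines each group. This shows the programming model only, not Hadoop’s API, and the input format ("region,amount" lines) is hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_task(lines):
    """Map: turn each 'region,amount' line of one input split into a (key, value) pair."""
    pairs = []
    for line in lines:
        region, amount = line.split(",")
        pairs.append((region, float(amount)))
    return pairs


def shuffle(pairs):
    """Shuffle: group all values emitted for the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reduce: combine the values for one key (here, a sum)."""
    return key, sum(values)


def map_reduce(splits, workers=4):
    """Run map tasks concurrently, then shuffle and reduce the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = pool.map(map_task, splits)
        grouped = shuffle(chain.from_iterable(mapped))
    return dict(reduce_task(k, v) for k, v in grouped.items())


splits = [["east,10", "west,5"], ["east,2.5", "north,1"], ["west,4"]]
print(map_reduce(splits))
```

Because map tasks share nothing and reduce tasks work on independent keys, both phases can be spread across many machines, which is what lets Hadoop scale to very large data sets.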
9. Data Replication
Data Replication is the process of sharing information to ensure consistency between redundant resources in order to improve reliability, fault tolerance or accessibility. Maintaining copies of critical data usually involves a parent/child relationship between the original and the copies. The parent database logs the updates, which then ripple through to the secondary (child) database. The child sends a message stating that it has received the update successfully, which allows the parent to send subsequent updates (re-sending an update until it is successfully applied). Multi-master replication, in which updates can be submitted to any database node and then ripple through to the other servers, is often desired, but it introduces substantially increased cost and complexity that may make it impractical in some situations. The most common challenge in multi-master replication is preventing or resolving transactional conflicts. Most synchronous (eager) replication solutions perform conflict prevention, while asynchronous (lazy) solutions must perform conflict resolution. The more complex and mission-critical your database applications are, the greater the need to ensure the reliability and availability of your data. Data Replication, used to maintain multiple copies of data, is critical to high-availability operations.
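The parent/child mechanism described above, in which a logged update is re-sent until the child acknowledges it, can be sketched as follows. Each update carries a log sequence number (LSN) so the child applies updates exactly once and in order. The classes, the `fail_next` knob used to simulate lost messages and the last-writer-wins rule for multi-master conflicts are all illustrative assumptions; database products implement replication and conflict handling in their own ways.

```python
from dataclasses import dataclass, field


@dataclass
class Child:
    data: dict = field(default_factory=dict)
    applied_lsn: int = 0  # last log sequence number applied
    fail_next: int = 0    # simulate this many lost messages

    def receive(self, lsn: int, key: str, value) -> bool:
        """Apply an update if it is next in sequence; returning True is the acknowledgment."""
        if self.fail_next > 0:
            self.fail_next -= 1
            return False  # message lost: no acknowledgment
        if lsn == self.applied_lsn + 1:
            self.data[key] = value
            self.applied_lsn = lsn
        return lsn <= self.applied_lsn  # duplicates are acknowledged, gaps are not


@dataclass
class Parent:
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, key: str, value) -> None:
        """Apply the update locally and append it to the replication log."""
        self.data[key] = value
        self.log.append((len(self.log) + 1, key, value))

    def replicate(self, child: Child, max_attempts: int = 5) -> None:
        """Ship unapplied log entries in order, re-sending each until it is acknowledged."""
        # child.applied_lsn stands in for the position the child reports back.
        for lsn, key, value in self.log[child.applied_lsn:]:
            for _ in range(max_attempts):
                if child.receive(lsn, key, value):
                    break
            else:
                raise RuntimeError(f"no acknowledgment for LSN {lsn}")


def last_writer_wins(version_a, version_b):
    """A common multi-master rule: keep the (timestamp, node_id, value) with the later
    timestamp; ties on timestamp are broken by node_id."""
    return max(version_a, version_b)


parent, child = Parent(), Child(fail_next=2)
parent.write("balance", 100)
parent.write("status", "active")
parent.replicate(child)
print(child.data, child.applied_lsn)
```

Last-writer-wins is simple but silently discards the losing update, which is one reason the text notes that multi-master replication adds cost and complexity.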
10. Geo-spatial data visualization
This refers to a system that captures, stores, analyzes, manages and presents data with reference to geographic location or geo-spatial relationships. It merges cartography, statistical analysis and database technology, allowing users to create interactive queries, analyze spatial information, edit data in maps and present the results of these operations visually. Producing graphics displayed on a device or analytical dashboard, or printed on paper, helps users visualize and thereby understand the results of analyses or simulations of potential events. Such on-the-fly visualization will become even more pervasive as more interactive devices and smartphones leverage GPS-enabled technology. From a business perspective, geo-spatial data provides another layer of rich information that can be used for business intelligence, enabling context-enriched e-commerce and other applications.
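A basic geo-spatial query is "find everything within a given distance of a point." The sketch below computes great-circle distances with the haversine formula, assuming a spherical Earth with a mean radius of 6,371 km, and filters a small hypothetical list of store locations. Spatial databases provide native geography types and spatial indexes for this, which scale far better than the linear scan shown here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the spherical model is an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(locations, center, radius_km):
    """Names of locations whose distance from center is at most radius_km."""
    lat0, lon0 = center
    return [name for name, lat, lon in locations
            if haversine_km(lat0, lon0, lat, lon) <= radius_km]


# Hypothetical store locations as (name, latitude, longitude).
stores = [("Store A", 32.78, -96.80), ("Store B", 32.75, -97.33), ("Store C", 29.76, -95.37)]
print(within_radius(stores, (32.90, -96.75), 50))
```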
Importance of People, Processes and Technology
E-commerce, social media and mobile, wireless, multi-purpose devices (including smartphones), as well as machine-to-machine communications using radio frequency identification (RFID) technology, are all generating new streams of rich data that need to be managed, stored and leveraged. As mentioned, this data explosion is beginning to tax the capabilities of legacy Data Management systems, creating added security risk and putting a strain on IT budgets. Organizations that do not stay on top of these changes will soon find that their ability to compete and respond to changing market conditions is impeded.
The key to overcoming this growing complexity is greater process standardization, more effective data integration and increased use of the latest Data Management and BI automation. Using data warehouse appliances, leveraging the Cloud and using advanced predictive analytics are just a few of the new technologies you may want to consider to increase your success. More important is making sure that your people, processes and technologies are all fully integrated and aligned. If you plan to take advantage of one of the new technologies
Recent Example – Smart Grid Technology
Predictive analytics is an important new data management technology. The following example of the growing demand for this type of capability comes from work SE completed with one of the world’s largest Smart Grid Technology companies to scale up their Smart Meter BI solution. The company needed help improving the performance and scalability of their predictive analytics capabilities.
With the cost of energy continuing to rise, there is increasing pressure from consumers and regulators on Utilities and Municipalities to find more efficient ways to generate, deliver and manage the consumption of electricity, gas, water and other consumables. The deployment of new Smart Grid transmission and generation networks, Smart Meters and advanced BI tools that leverage real-time data promises to deliver next-generation operational efficiencies that will allow operators to:
• More effectively forecast demand
• Pursue new pricing models
• Allow consumers to make more intelligent purchase decisions
• Move to a just-in-time generation model
As a result of Smart Grid technologies, instead of once-a-month usage reporting, Smart Meters will constantly monitor and provide real-time data to the operator regarding the supply, demand and consumption of energy. This represents a significant increase in the volume of data that must be captured and stored by the IT infrastructure and poses unprecedented challenges for managing and comprehending the rich data being created.
The Smart Meter BI system SE worked on needed to be scaled up to handle more than 50 million meter readings daily, giving the operator the ability to analyze a broad range of real-time data, perform predictive analytics and run various modeling scenarios to increase efficiency, improve productivity and optimize overall operations. Business intelligence gained from the data will allow the operator to predict consumer demand and better match it with energy generation, lowering costs and eliminating excess power generation; to predict when transformers are close to failure, preventing outages and reducing replacement costs; and to give consumers new levels of detail about their usage, resulting in greater visibility and control over their own daily consumption choices.
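The 50 million readings per day quoted above translate into a sustained ingest rate and a storage footprint that are easy to estimate. The sketch below assumes a hypothetical 100 bytes per stored reading (including indexes) and one year of retention; both are illustrative values, not figures from the engagement.

```python
SECONDS_PER_DAY = 86_400


def ingest_profile(readings_per_day: int, bytes_per_reading: int, retention_days: int):
    """Return (average readings/second, GB/day, TB retained) in decimal units."""
    per_second = readings_per_day / SECONDS_PER_DAY
    gb_per_day = readings_per_day * bytes_per_reading / 1e9
    tb_retained = gb_per_day * retention_days / 1e3
    return per_second, gb_per_day, tb_retained


rate, daily_gb, total_tb = ingest_profile(50_000_000, bytes_per_reading=100, retention_days=365)
print(f"{rate:,.0f} readings/s on average, {daily_gb:.1f} GB/day, {total_tb:.2f} TB per year")
```

Under these assumptions the system must absorb roughly 580 readings per second on average, about 5 GB per day and close to 2 TB per year, before replication, backups or peak-hour bursts are taken into account.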
Smart Meter data analytics services are projected to generate more than $4.2 billion in annual revenue by 2015, according to a report released by Pike Research[3]. Smart Grid implementations around the world represent a fast-growing and significant new market opportunity that is leveraging the latest Data Management technologies. They provide a good example of how an industry is taking advantage of one of the top ten trends described in this paper.
How Can Scalability Experts Help
Keeping up with the latest database technologies can be challenging, given the speed of change. Constraints on your IT budget, increases in regulatory requirements and the overall growth in enterprise data can create situations where IT organizations lack the right experienced resources. SE can help fill the gap and provide you with short-term or long-term strategic consulting and implementation support. SE services include:
• Business Intelligence – drive business performance with improved visibility and decision making
• Performance Management – drive efficiencies and optimize resources for mission-critical computing and operational excellence
• Platform Optimization – improve performance, mitigate risk and lower TCO by migrating or upgrading to the latest database platform. Consolidate and virtualize your computing environment to reduce your footprint, lower power consumption and increase utilization
• Cloud-Based Computing – improve productivity and efficiency by leveraging the latest technologies to set up a shared service or on-premise private Cloud, or to take advantage of off-premise public Cloud services
• Strategic Resourcing – Experts-as-a-Service (EaaS) is a pay-as-you-need, on-demand resourcing solution that provides experienced and certified best-in-class data management and BI architects, consultants and DBAs.
Conclusion
In today’s fast-paced, data-rich environment, you need to take advantage of every opportunity to gain a competitive advantage. The amount of data you need to manage is growing every year, and it will continue to grow at an even faster pace in the years to come. By leveraging the latest Data Management technologies, you will be able to make your computing environment more available and scalable, reduce your IT costs and gain insight from your data to make better business decisions. To determine what path to take, you should first evaluate the current state of your environment, determine your business needs and create a road map. Taking the right steps now can help you automate and streamline your processes and raise the performance of your operations.
Biography
Author – Raj Gill
Raj Gill is Founder and President of Scalability Experts, a global leader in data management and business intelligence services with locations in the U.S., Dubai, Singapore and India. Gill has over 20 years of experience in all areas of database architecture, data lifecycle management, deployment and development. Gill serves on Microsoft’s Advanced Infrastructure Solutions technology advisory board and over the years he has earned international recognition as a leading data management expert in database performance, platform scalability and transformational strategic planning involving server consolidation, virtualization and cloud computing. He has published many articles on the performance of Data Management platforms and has authored case studies addressing deployment best practices in the enterprise environment. Gill is a frequent keynote speaker and has presented at events such as PASS, Microsoft’s world-wide SQL Server launch events, CXO roundtables and various technical user groups.
About Scalability Experts
We are an award-winning global leader in Data Management and Business Intelligence solutions. Our services help you get more value from your data and increase the performance and scalability of your computing environment. With 10 years of industry experience and our deep understanding of every facet of the data lifecycle, we can optimize the performance of your database operations, make your systems more responsive and provide the business insight critical to gaining a competitive advantage. The world’s leading software and hardware companies such as Microsoft and HP rely on Scalability Experts to help their customers. Let us help you. Contact us at 469-635-6200 or visit our Website at www.scalabilityexperts.com. If you would like a Solutions Sales Manager to contact you, please send us an email at [email protected].