Towards the Next Generation Intelligent BPM In the Era of Big Data

Xiang Gao

Department of Management Information System,

China Mobile Communications Corporation, Beijing 100033, China

Abstract. Big data opens a new dimension and space that offers the advantage of gleaning intelligence from data and translating it into business benefits. It will lead to a knowledge revolution in all sectors, including Business Process Management (BPM). This paper sheds light on the key characteristics of intelligent BPM (iBPM) from an industrial point of view. It then proposes a big data perspective on iBPM, showing the challenges and potential opportunities in an attempt to catalyze ideas from insight to application. The exploration and practice of China Mobile Communications Corporation (CMCC) are presented, and these also point to future research directions for enterprise applications.

Keywords: big data, intelligent BPM.

1 From BPM to Intelligent BPM

Business Process Management (BPM) is recognized as a holistic management approach that promotes business effectiveness and efficiency while striving for innovation, flexibility, and integration with technology. It is a growing discipline in which new technologies are rapidly emerging, keeping BPM at center stage in both the business and technology domains [11].

Recently, intelligent BPM (iBPM) has gained new impetus from the integration of analytical technologies into orchestrated processes. It enables leading organizations to make their business operations more intelligent. It also gives process participants better real-time situational awareness and the ability to tailor their responses appropriately. Gartner considers iBPM the next stage in the evolution of BPM for the following reasons [7]. First, it will meet the ongoing need for process agility, especially for regulatory changes and more dynamic exception handling. Second, it aims to leverage the greater availability of data from inside and outside the enterprise as input to decision making. Third, it will facilitate interaction and collaboration in cross-boundary processes.

From an application infrastructure and middleware (AIM) point of view, an iBPM Suite inherits all the features of a traditional BPM Suite and complements them with more advanced technologies, which Gartner groups into 10 functional areas [7]. From the point of view of enterprise application and consolidation, the differences between the next generation of iBPM and the current one can be summarized as the following “4As”.

F. Daniel, J. Wang, and B. Weber (Eds.): BPM 2013, LNCS 8094, pp. 4–9, 2013.

– Analytical: The most prominent feature of iBPM is its capability for advanced analytics. It integrates state-of-the-art analytic technologies, including both pre-analytics and post-analytics. Pre-analytics mainly concentrates on analysis based on process models, such as model decomposition [6], clone detection [5], and similarity search [4]. Post-analytics makes use of historical logs and other information. It covers automatic business process discovery (i.e., process mining [15, 16]), social analysis [17], intelligent recommendation, prediction, and so on.

– Automatic: The enormous volumes of data require automated or semi-automated analysis techniques to detect patterns, identify anomalies, and extract knowledge. Take business process consolidation as an example. It is an extremely arduous task for large organizations that have thousands of process models. Instead of relying on manual work, iBPM should support a procedure that automatically reduces duplication and makes the differences between process models explicit.

– Adaptive: Dynamic changes in business processes, and in data from inside and outside the enterprise, should be captured and responded to flexibly. This can be done by adaptively adjusting the parameters of analysis algorithms, and also by selecting appropriate algorithms on demand through configuration.

– Agile: There is always a big gap between business and IT. Business analysts have a deep understanding of the business, but they cannot design process models independently without the support of IT staff, even when a notation-based modeling language is used. iBPM is expected to simplify this procedure. For example, incorporating process fragments with business semantics into the design tool can significantly improve modeling efficiency. Business analysts can then carry out most of the procedure with minimal IT effort.

It is worth noting that the era of big data brings new opportunities for achieving the “4As”.
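The Analytical and Automatic features can be illustrated with a minimal Python sketch. It assumes that each process variant is available as an event log of traces, where each trace is a list of activity labels. The sketch mines the directly-follows relation of each log, which is a basic building block of process mining. It then compares two variants with a set-based Jaccard similarity. This simple measure is our stand-in for the model similarity metrics of [4]. It is not the consolidation method used at CMCC, and the function names and example logs are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def directly_follows(traces: Iterable[List[str]]) -> Set[Edge]:
    """Collect every pair (a, b) where activity b directly follows a in some trace."""
    edges: Set[Edge] = set()
    for trace in traces:
        edges.update(zip(trace, trace[1:]))
    return edges


def compare_variants(log_a, log_b):
    """Return the Jaccard similarity, shared edges, and edges unique to each log."""
    a, b = directly_follows(log_a), directly_follows(log_b)
    union = a | b
    similarity = len(a & b) / len(union) if union else 1.0
    return similarity, a & b, a - b, b - a


# Hypothetical logs of the same approval process in two subsidiaries.
branch_1 = [["submit", "review", "approve", "archive"],
            ["submit", "review", "reject"]]
branch_2 = [["submit", "review", "sign", "approve", "archive"],
            ["submit", "review", "reject"]]

sim, shared, only_1, only_2 = compare_variants(branch_1, branch_2)
print(f"similarity={sim:.2f}")                    # 3 shared of 6 edges -> 0.50
print("only in branch 1:", sorted(only_1))
print("only in branch 2:", sorted(only_2))
```

In this example, the two variants share three of six directly-follows edges. The variant-specific edges make the single difference explicit: the extra `sign` step in the second branch. In a repository of thousands of models, such pairwise scores could serve to shortlist near-duplicates for review or merging.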

2 A Big Data Perspective on iBPM

The birth and growth of big data was the defining characteristic of the 2000s. As obvious and ordinary as this might sound to us today, we are still unraveling the practical and inspirational potential of this new era [13].

What does big data really mean for the evolution of BPM? Just one thing. W. Edwards Deming, a pioneer of quality management, stated it elegantly long before the concept of big data was introduced: “In God we trust; all others must bring data.” Before big data existed, we could only treat these words as a maxim. Now, however, we must treat them as an achievable technical criterion for our work, because big data brings unprecedented impetus and vitality to BPM. Driven by process data and other related data, BPM can become a new platform for research and development of intelligence based on big data. This would make Deming’s maxim a reality in the operation of future iBPM systems.

To find the needle in the big data iBPM haystack, one must first clarify what “Big Data” means in the business process field. In the common view, mobile sensors, social media services, genomic sequencing, and astronomy are among the myriad applications that have generated an explosion of data, which is naturally treated as big data. However, the biggest misnomer comes from the name itself. When we talk about big data, we must relate its size to the available resources, the question asked, and the kind of data. In this sense, the large volumes of historical logs and instance data generated by running business processes can be treated as “big data” because of their high variety and heterogeneity. This is especially true in a large-scale organization. Furthermore, from a generalized point of view, data with the following features can be recognized as process data [8]:

1) it is composed of events;
2) it spans multiple units and levels of analysis with ambiguous boundaries;
3) it has variable temporal embeddedness; and
4) it is eclectic.

Under this view, web search query logs, customer product-ordering data, and many other kinds of data all belong to process big data.
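The event-based view of process data [8] suggests a simple preprocessing step. Raw records, such as customer product orders, become process data once they are grouped by case and ordered in time. The Python sketch below shows this step. It assumes hypothetical records with `case_id`, `activity`, and `timestamp` fields, because the paper does not describe CMCC's actual data formats.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    case_id: str         # hypothetical field, e.g. an order number
    activity: str        # what happened in this step
    timestamp: datetime  # when it happened


def to_traces(events: List[Event]) -> Dict[str, List[str]]:
    """Group events by case and order each case's activities by timestamp."""
    by_case: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        by_case[event.case_id].append(event)
    return {case: [ev.activity for ev in sorted(evs, key=lambda ev: ev.timestamp)]
            for case, evs in by_case.items()}


# Hypothetical order records that arrive out of order.
records = [
    Event("o2", "pay", datetime(2013, 5, 1, 10, 5)),
    Event("o1", "order", datetime(2013, 5, 1, 9, 0)),
    Event("o2", "order", datetime(2013, 5, 1, 10, 0)),
    Event("o1", "pay", datetime(2013, 5, 1, 9, 30)),
    Event("o1", "ship", datetime(2013, 5, 2, 8, 0)),
]
print(to_traces(records))
# {'o2': ['order', 'pay'], 'o1': ['order', 'pay', 'ship']}
```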

On the path from insight to action, close attention must be paid to the following perspectives.

– Sparsity vs. Redundancy. The widespread use of traditional data mining and artificial intelligence algorithms has often exposed their limitations on sparse large-scale data sets and on problems of high dimensionality [2]. For example, user-based collaborative filtering systems have been very successful in the past, but their weakness has been revealed on large, sparse databases [12]. Process data, however, usually exhibits redundancy rather than sparsity. Consider a real scenario in the Office Automation (OA) systems of China Mobile Communications Corporation (CMCC). In total, more than 8000 processes run in these systems. Subsidiary organizations maintain and evolve these processes independently. Because of individual management requirements, processes that express the same business behavior are usually not exactly the same, although they are highly similar. iBPM therefore requires technology for automatically fragmenting process models and identifying highly reusable fragments.

– Sample vs. Population. Sample-based analysis is usually conducted to infer the behavior of the whole population. In the age of big data, however, the emphasis shifts from the sample to the population, because collecting and processing large amounts of data is now feasible. Take process mining as an example. Here the completeness of the event log plays an extremely important role. For a limited event log (i.e., a sample), global completeness must be evaluated by distribution fitting, or at least by bound estimation. For a complete event log (i.e., the population), global completeness is guaranteed by definition. The discovery problem thus seems to become easier and the result more accurate. However, much more attention must still be paid to special cases and to the quality of the event log. “Free control” is a term specific to CMCC. It describes behavior that allows an administrator to adjust a normal routine as desired. Because it is reserved for very special cases, it seldom appears even in the complete event log. Given its flexibility, it can hardly be discovered automatically. Data quality also affects the efficiency of mining algorithms, since population data suffers from missing values and noise.

– Individual vs. Network. Observing a large variety of complex systems often yields individual data sets and decentralized links, which can be integrated and consolidated into a data network. Big data is often associated with such complex data networks, and it offers a fresh perspective that is rapidly developing into a new discipline, network science [1]. It has also subtly influenced BPM research and applications. The Web Ontology Language (OWL) has been introduced for annotating process models. A set of mapping strategies has also been developed that performs the annotation by considering the semantic relationships between model artifacts. These strategies facilitate process knowledge management and semantic interoperability [9]. Moreover, business network management (BNM) strives to make the business network, formed by collaborating business processes, visible in network views. It also aims to combine automatic discovery, mining, and inference capabilities with expert knowledge in complex, dynamic, and heterogeneous enterprise domains [10].

– Causality vs. Correlation. A major concern in big data research is that correlation plays a much more important role than causality. For example, Google’s founding philosophy is that we do not know why one page is better than another. If the statistics of incoming links say it is, that is good enough, and no semantic or causal analysis is required. In the BPM field, however, we argue that causality and correlation are equally important. Process mining is strongly based on the rigorous deduction of causality between activities from the event log. Correlation also attracts much attention when clustering-based techniques are considered. In that case, process features should be described by business behavior and structure, and also by ontology-based business semantics.
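The Sample vs. Population and Causality vs. Correlation perspectives can both be illustrated on the same directly-follows counts. Process mining in the style of the α-algorithm [15] derives a causal relation a → b when b directly follows a somewhere in the log, but a never directly follows b. For the completeness question, the Python sketch below borrows the Good–Turing estimate from statistics. This estimate takes the share of observations whose pair was seen exactly once as an approximation of the chance that the next observation is a pair not seen before. The estimator is our illustrative choice, not a completeness measure proposed in this paper, and the log is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def df_counts(traces: List[List[str]]) -> Counter:
    """Count how often activity b directly follows activity a across all traces."""
    counts: Counter = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


def footprint(counts: Counter) -> Dict[str, Set[Pair]]:
    """Alpha-style relations over observed pairs: a -> b if a > b and not b > a."""
    seen = set(counts)
    causal = {(a, b) for (a, b) in seen if (b, a) not in seen}
    # Pairs observed in both orders are treated as parallel. Self-loops (a, a)
    # also land here, a known limitation of this simple footprint.
    parallel = {(a, b) for (a, b) in seen if (b, a) in seen}
    return {"causal": causal, "parallel": parallel}


def unseen_mass(counts: Counter) -> float:
    """Good-Turing estimate N1 / N of the chance that the next directly-follows
    observation is an unseen pair (N1 = number of pairs seen exactly once)."""
    total = sum(counts.values())
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / total if total else 1.0


# Hypothetical log: b and c occur in either order; one case takes a rare detour x.
log = [["a", "b", "c", "d"]] * 5 + [["a", "c", "b", "d"]] * 4 + [["a", "x", "d"]]
counts = df_counts(log)
relations = footprint(counts)
print(sorted(relations["causal"]))    # includes ('a', 'x') from a single trace
print(sorted(relations["parallel"]))  # [('b', 'c'), ('c', 'b')]
print(round(unseen_mass(counts), 3))  # 2 singletons / 29 observations -> 0.069
```

The single detour through `x` already yields causal relations. Yet it is exactly the kind of rare behavior that frequency-based noise filtering would discard. A nonzero unseen-mass estimate is one rough signal that even a large log may not yet be complete.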

3 Embrace the Idea of iBPM in the Era of Big Data

CMCC is committed to exploring and practicing iBPM in the era of big data, with the aim of facilitating process consolidation and improving analytical intelligence.

First of all, CMCC is concentrating on several key problems:

- business understanding and raw process reconstruction;
- complex business logic and recessive (i.e., implicit) rules;
- flexible modeling based on business semantics; and
- redundancy removal and the process repository.

Several existing algorithms, such as process mining, process model decomposition, similarity search, clustering, and merging, have been carefully considered and exploited. These algorithms are implemented and integrated into a tailor-made process model configuration tool, so that advanced analysis is appropriately integrated into the BPM life cycle. The empirical study also revealed some problems that need further consideration. For example, the refined process structure tree (RPST) [18] and its extensions mainly focus on structure rather than business logic. Business analysts therefore may not easily reuse the fragments obtained directly from the RPST for modeling. This makes incorporating business semantics into fragmentation algorithms of much interest. Current mining algorithms provide an efficient procedure for reconstructing mostly the control flow. In real scenarios, however, information from e-forms, rules, and organizational relationships must also be carefully considered. In addition, the business behavior of processes is always restricted by constraints on allowed actions. These constraints are usually implicit and need to be modeled semantically.

Secondly, most of the big data surge is data in the wild: unruly material such as words, images, and videos. A similar phenomenon partly occurs in heterogeneous process data, which is not typical grist for traditional databases. NoSQL databases based on distributed storage technology have obvious advantages for CRUD operations. They bring benefits in extensibility, data model flexibility, cost efficiency, accessibility, and so on. Based on the YCSB benchmark [3] and its analysis framework, CMCC developed a novel distributed cloud storage benchmark. It initially consisted of 15 x86 servers (IBM 3650M3). Function, performance, scalability, and consistency were the main test metrics. They were applied to mainstream distributed file systems (e.g., HDFS, Swift, and GPFS) and NoSQL databases (e.g., HBase, Cassandra, and MongoDB). Our tests showed that the distributed file systems offer the same standard read and write interface as traditional storage. They are thus ready to replace the traditional storage mode conveniently. MongoDB has the operation interface most similar to that of an RDBMS, with balanced performance and reliability in most cases. However, open source frameworks require detailed parameter selection and optimization.
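The CMCC benchmark itself is not published here, but its core idea follows YCSB [3]. A driver runs a store through a small CRUD interface with a configurable operation mix and request distribution, and it records throughput and latency. The Python sketch below shows such a driver under several assumptions. A hypothetical in-memory `DictStore` stands in for HDFS, HBase, MongoDB, and the other systems. The operation mix is 50/50 read/update, the proportions of YCSB's core workload A. Key popularity is Zipf-like. All parameter values are illustrative only.

```python
import random
import time
from statistics import median
from typing import Dict, List


class DictStore:
    """Hypothetical in-memory stand-in for a NoSQL store with CRUD operations."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def insert(self, key: str, value: str) -> None:
        self._data[key] = value

    def read(self, key: str) -> str:
        return self._data.get(key, "")

    def update(self, key: str, value: str) -> None:
        if key in self._data:          # update only existing records
            self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def run_workload(store, record_count: int = 1000, op_count: int = 10000,
                 read_proportion: float = 0.5, zipf_s: float = 0.99,
                 seed: int = 42) -> dict:
    """Load records, then run a YCSB-style read/update mix, timing each operation."""
    if record_count <= 0 or op_count <= 0:
        raise ValueError("record_count and op_count must be positive")
    rng = random.Random(seed)
    keys = [f"user{i}" for i in range(record_count)]
    for key in keys:                                   # load phase
        store.insert(key, "x" * 100)
    # Zipf-like popularity: the i-th most popular key (1-based) gets weight 1 / i**s.
    weights = [1.0 / (rank + 1) ** zipf_s for rank in range(record_count)]
    requests = rng.choices(keys, weights=weights, k=op_count)
    ops = {"read": 0, "update": 0}
    latencies: List[float] = []
    start = time.perf_counter()
    for key in requests:                               # run phase
        t0 = time.perf_counter()
        if rng.random() < read_proportion:
            store.read(key)
            ops["read"] += 1
        else:
            store.update(key, "y" * 100)
            ops["update"] += 1
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    return {
        "ops": ops,
        "throughput_ops_per_s": op_count / elapsed if elapsed > 0 else float("inf"),
        "median_latency_us": median(latencies) * 1e6,
        "p99_latency_us": latencies[int(0.99 * (len(latencies) - 1))] * 1e6,
    }


print(run_workload(DictStore()))
```

Replacing `DictStore` with a client for a real store would reuse the same driver. YCSB itself also controls the thread count and target throughput, and it supports insert and scan operations, all of which this sketch omits.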

Thirdly, CMCC has also conducted several related analyses. We have established a big data analytics platform based on open source software. It implements several time series analysis models for predicting CMCC’s operational indicators. By combining Hadoop and Mahout with the Parallel FP-Growth algorithm, we built an efficient recommendation engine on the platform for Mobile Market (CMCC’s app store, similar to Apple’s App Store). In addition, social network analysis (SNA) has been applied and improved to analyze user behavior, public sentiment, employee emotion, and many other topics. This analysis uses internal social network data and external micro-blog data from the Internet. These technologies will be further adapted and transferred to the BPM field to discover new value.
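The Mobile Market engine used Parallel FP-Growth on Hadoop and Mahout, which is beyond the scope of a short example. As a single-machine stand-in, the Python sketch below counts frequent app pairs directly. These are the same 2-itemsets that FP-Growth would also find. The sketch then turns the frequent pairs into association rules scored by support and confidence, and uses the rules to rank apps a user has not installed. The install histories, app names, and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, str]  # (antecedent, consequent)


def pair_rules(baskets: List[Set[str]], min_support: float,
               min_confidence: float) -> Dict[Rule, float]:
    """Mine rules x -> y from frequent item pairs; values are confidences."""
    if not baskets:
        return {}
    n = len(baskets)
    item_count: Counter = Counter()
    pair_count: Counter = Counter()
    for basket in baskets:
        item_count.update(basket)
        pair_count.update(combinations(sorted(basket), 2))
    rules: Dict[Rule, float] = {}
    for (a, b), count in pair_count.items():
        if count / n < min_support:
            continue                      # the pair is not frequent enough
        for x, y in ((a, b), (b, a)):     # try both rule directions
            confidence = count / item_count[x]
            if confidence >= min_confidence:
                rules[(x, y)] = confidence
    return rules


def recommend(installed: Set[str], rules: Dict[Rule, float]) -> List[str]:
    """Rank apps the user lacks by the best confidence of any matching rule."""
    scores: Dict[str, float] = {}
    for (x, y), confidence in rules.items():
        if x in installed and y not in installed:
            scores[y] = max(scores.get(y, 0.0), confidence)
    return sorted(scores, key=lambda app: (-scores[app], app))


# Hypothetical install histories from an app store.
histories = [
    {"maps", "weather", "music"},
    {"maps", "weather"},
    {"maps", "taxi"},
    {"music", "video"},
    {"maps", "weather", "taxi"},
]
rules = pair_rules(histories, min_support=0.3, min_confidence=0.6)
print(rules)  # {('maps', 'weather'): 0.75, ('weather', 'maps'): 1.0, ('taxi', 'maps'): 1.0}
print(recommend({"weather"}, rules))  # ['maps']
```

At Mobile Market scale, frequent-pattern counting is the expensive step, and this is the step that Parallel FP-Growth distributes over Hadoop. Scoring rules on top of the mined patterns would be cheap by comparison.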

As Prof. van der Aalst noted in his keynote at the previous BPM conference, the “Big Data” wave is providing new prospects for BPM research [14]. We believe that big data will bring technological advances that open the door to new approaches for rapidly improving both the theory and the application of iBPM.

References

[1] Barabási, A.-L.: The Network Takeover. Nature Physics 8(1), 14–16 (2011)

[2] Billsus, D., Pazzani, M.J.: Learning Collaborative Information Filters. In: Proc.

[3] Cooper, B.F., Silberstein, A., Tam, E., et al.: Benchmarking Cloud Serving Systems with YCSB. In: Proc. of the 1st ACM Symposium on Cloud Computing, pp. 143–154. ACM (2010)

[4] Dijkman, R.M., Dumas, M., van Dongen, B., Uba, R., Mendling, J.: Similarity of Business Process Models: Metrics and Evaluation. Information Systems 36(2), 498–516 (2011)

[5] Ekanayake, C.C., Dumas, M., García-Bañuelos, L., La Rosa, M., ter Hofstede, A.H.M.: Approximate Clone Detection in Repositories of Business Process Models. In: Barros, A., Gal, A., Kindler, E. (eds.) BPM 2012. LNCS, vol. 7481, pp. 302–318. Springer, Heidelberg (2012)

[6] Gschwind, T., Koehler, J., Wong, J.: Applying patterns during business process modeling. In: Dumas, M., Reichert, M., Shan, M.-C. (eds.) BPM 2008. LNCS, vol. 5240, pp. 4–19. Springer, Heidelberg (2008)

[7] Hill, J.B., Schulte, W.R.: BPM Suites Evolve into Intelligent BPM Suites. Gartner G00226553 (2011)

[8] Langley, A.: Strategies for Theorizing from Process Data. Academy of Management Review, 691–710 (1999)

[9] Lin, Y.: Semantic Annotation for Process Models. PhD thesis, Trondheim, Norway (2008)

[10] Ritter, D.: Towards a Business Network Management. In: Enterprise Information Systems of the Future, pp. 149–156. Springer, Heidelberg (2013)

[11] Samantha, S.: Research Index: New BPM Technologies Lead the Way to Achieving Process Adaptability. Gartner 00228461 (2012)

[12] Sarwar, B., Karypis, G., Konstan, J., Riedl, J., et al.: Item-based Collaborative Filtering Recommendation Algorithms. In: Proc. International Conference on World Wide Web 2001, pp. 285–295. ACM (2001)

[13] Schlieski, T., Johnson, B.D.: Entertainment in the Age of Big Data. Proceedings of the IEEE 100(5), 1404–1408 (2012)

[14] van der Aalst, W.M.P.: A Decade of Business Process Management Conferences: Personal Reflections on a Developing Discipline. In: Barros, A., Gal, A., Kindler, E. (eds.) BPM 2012. LNCS, vol. 7481, pp. 1–16. Springer, Heidelberg (2012)

[15] van der Aalst, W.M.P., Weijters, T., Maruster, L.: IEEE Transactions on Knowledge and Data Engineering 16(9), 1128–1142 (2004)

[16] van der Aalst, W.M.P., van Dongen, B., Herbst, J., Maruster, L., Schimm, G., Weijters, A.: Workflow Mining: A Survey of Issues and Approaches. Data & Knowledge Engineering 47(2), 237–267 (2003)

[17] van der Aalst, W.M.P., Song, M.S.: Mining Social Networks: Uncovering Interaction Patterns in Business Processes. In: Desel, J., Pernici, B., Weske, M. (eds.) BPM 2004. LNCS, vol. 3080, pp. 244–260. Springer, Heidelberg (2004)

[18] Vanhatalo, J., Völzer, H., Koehler, J.: The Refined Process Structure Tree. In: Dumas, M., Reichert, M., Shan, M.-C. (eds.) BPM 2008. LNCS, vol. 5240, pp. 100–115. Springer, Heidelberg (2008)