An Integrated Framework for Cloud Data Management in Educational Institutes
Indu Arora
Department of Computer Science and Applications
MCM DAV College for Women
Chandigarh, India
[email protected]
Dr. Anu Gupta
Department of Computer Science and Applications
Panjab University
Chandigarh, India
[email protected]
Abstract
Information and Communication Technology (ICT) has transformed the whole world into a global village. Acquiring and maintaining essential ICT infrastructure has become a great challenge, especially in the education sector. Being a human resource development sector, it needs to use expensive infrastructure more effectively, not only for providing education but also for its transactional applications. With the evolution of Cloud Computing, scalable IT-enabled services are delivered on-line to users on a pay-per-usage basis from anywhere through a variety of devices. The present paper emphasizes the use of Cloud Computing and Cloud databases for the transactional applications of educational institutes. It highlights the issues involved in managing data in the Cloud, such as conformity to ACID (Atomicity, Consistency, Isolation, Durability) guarantees for transactional data. The paper then proposes an Integrated Framework for managing transactional applications of educational institutes. The proposed framework provides an efficient and effective technique to manage transactional data. It also brings uniformity to the way transactional applications and data are stored and accessed by various educational institutes in the Cloud.
1. Introduction
Information and Communication Technology (ICT) has become an essential means of providing quality education in all educational institutes. In addition to the delivery of education through ICT, different applications such as Student Information System, Fee Management, Course Management, Examination System and Library Management are designed and used to meet the requirements of educational institutes. These systems are either transactional or analytical in nature. Educational institutes spend a major chunk of their budget on building and maintaining ICT infrastructure. Every educational institute, irrespective of its size and strength, manages its own Information Technology (IT) infrastructure with a number of laboratories as per the requirements of different courses. Apart from investing in hardware infrastructure, institutes also invest in software licenses for teaching, training and back-office activities. Faculty, administrative staff, students and research scholars need unlimited access to computing resources for carrying out practical and experimental work that handles large amounts of data. They also want to access the Internet for information, communication and collaboration.
Educational institutes present a very dim picture of the utilization of expensive resources. Utilization is very low compared to the investment made in setting up ICT infrastructure. The huge investment in procuring, installing and maintaining ICT infrastructure goes underutilized most of the time, as the infrastructure is sized to meet the demand during peak hours. Figure 1 shows the utilization of ICT resources during different hours of a day in educational institutes, based on observations of a few urban colleges.
Frequent upgrades in technology tend to make existing hardware and software obsolete after some time. Research projects require vast storage or processing capacity for a limited time. Application requirements of students also change with each semester or session. The IT department is not able to focus on its core issues; its workforce spends more time setting up and maintaining ICT infrastructure than adopting and exploring new technologies.
Figure 1. Utilization of ICT Resources in a Day
In developing countries like India, educational institutes use ICT in a closed environment. Their ICT infrastructure is not made accessible to other institutes, especially those in rural areas. Teachers are not able to combine traditional teaching-learning methods (talk and chalk, lecturing etc.) with the latest technologies due to the lack of required infrastructure. The need of the hour is to use ICT resources judiciously. Underutilization as well as limited availability of these expensive resources forces us to think about other alternatives. ICT infrastructure should not only be available but also scalable up or down as per curriculum and research requirements. Adoption of Cloud Computing (CC) is the best option to overcome problems in the existing system. Factors like high speed networks, fast commodity hardware, the World Wide Web and changing storage requirements have put more emphasis on data management using the latest technologies such as CC and Cloud databases. CC can be used as an alternative mechanism to provide expensive ICT infrastructure uniformly and economically to all educational institutes, while Cloud databases can be used to manage the data effectively and efficiently.
1.1. Cloud Computing
Cloud Computing (CC) is considered the fifth generation of computing after Mainframes, Personal Computers, Client-Server Computing and the World Wide Web. The term “Cloud” is used to represent the abstraction of the Internet [1]. It is a fast growing concept that is changing the ICT related perceptions of its users, as they want to use ICT as a commodity instead of owning it as an asset. CC allows people to use computing, storage and network resources on a pay-per-usage basis, just as telephone, electricity and water services are used. It takes advantage of many technologies such as server consolidation, huge and faster storage, utility computing, distributed computing, grid computing, virtualization, N-tier architecture and robust networks.
With CC, scalable IT-enabled services are delivered on-line to users on a pay-per-usage basis from anywhere through a variety of devices. Google's Gmail, Yahoo's mail, Facebook.com and Twitter.com are popular examples of CC.
According to the definition of the National Institute of Standards and Technology (NIST), “Cloud Computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g. networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction” [2]. Foster et al. define CC as “a large-scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualized, dynamically-scalable, managed computing power, storage, platforms and services are delivered on demand to external customers over the Internet” [3]. Elasticity, scalability, high availability, pay-per-usage and multi-tenancy are the main features of CC.
CC is composed of three service delivery models: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). Cloud has four deployment models which describe the nature of Cloud environment: Public, Private, Community and Hybrid Clouds [4, 5, 6].
1.2. Cloud Databases
A Cloud database is a database delivered to users on demand through the Internet from a Cloud database provider's servers. Cloud databases are also called NoSQL databases. NoSQL means 'Not Only SQL' or 'Not Relational'. It is defined as a non-relational, shared-nothing, horizontally scalable database without ACID (Atomicity, Consistency, Isolation, Durability) guarantees. NoSQL implementations are classified further into Key/Value stores, Document stores, Object stores, Tuple stores, Column stores and Graph stores. They can store and retrieve structured, semi-structured and unstructured data. Amazon's SimpleDB, Amazon RDS, Google's BigTable, Yahoo's Sherpa and Microsoft's SQL Azure are commonly used databases in the Cloud [7, 8, 9].
1.3. Cloud Data Management
Data Management primarily deals with capturing, storing and protecting data. It also improves the quality of information generated, manages information across enterprises, aligns data with technology and business needs, ensures privacy and maximizes the effective use of data. In the past, data used to be stored in flat files. With changes in requirements and advancement in technology, data is now stored in databases of different types such as Relational, Object-oriented, Column and NoSQL to meet the varying requirements of applications. Relational and Object-oriented databases are used in client-server and Internet applications for write-intensive transactional applications. Relational databases are ACID compliant. They are used mainly for transaction processing applications which manage structured data held in databases along with its metadata [10].
Column databases are becoming popular in Cloud Computing (CC) environment. NoSQL Cloud databases are used in Cloud for read-intensive analytical and scalable applications such as data warehousing, data mining and business intelligence. They are mainly used for semi-structured, unstructured data and user created contents such as documents and photos. They follow BASE (Basically Available, Soft state, Eventually consistent) in contrast to the ACID guarantees [11, 12].
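The practical meaning of "eventually consistent" can be illustrated with a toy replicated store. The following is a minimal sketch and does not describe how any particular Cloud database works: the class `EventuallyConsistentStore`, its methods and the key name used in the usage note are hypothetical, and replication is reduced to an explicit `synchronize` call.

```python
class EventuallyConsistentStore:
    """Toy key-value store: the primary is updated at once, replicas lag."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # writes not yet propagated to the replicas

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self.pending.append((key, value))

    def read(self, key, replica=0):
        # Reads are served by a replica, which may still be stale.
        return self.replicas[replica].get(key)

    def synchronize(self):
        # Stand-in for background replication: replay pending writes in order.
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()
```

A read issued between `write("S001:fee_paid", True)` and `synchronize()` returns `None`, which is acceptable for analytical workloads but not for a fee ledger that requires ACID behaviour.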
ElasTraS is a light-weight data store that supports only a sub-set of the operations supported by traditional database systems. It provides scalable transactional data access to the data store and employs two-level hierarchy of Transaction Managers (TM) for providing transactional guarantees along with scalability. Its ACID guarantees for transactions are limited to a single partition [13]. Cloudtran helps to write scalable applications for distributed and Cloud environments. It allows use of multiple databases located on different machines as a persistent data store for In-memory data. Its data layer server coordinates the movement of data between databases and the In-memory data grid. Extra coding is not required for ACID transactionality or location of data [14].
2. Cloud Data Management Issues in Educational Institutes
Data integrity is the most critical requirement of all update-intensive transactional applications and is maintained through database constraints. The lack of data integrity results in unexpected outputs. Although Cloud databases address the limitations of existing Relational databases related to scalability, ease of use and dynamic provisioning, they provide high availability at the cost of consistency. So, Cloud databases are not considered suitable for write-intensive transactional applications of educational institutes such as Student Information System, Fee Management, Library Management, Payroll and Personnel Information, Financial Accounting and Store Keeping due to their rigid requirement of ACID guarantees. The data in these applications must be written onto disk instantaneously. Cloud databases are distributed and replicated in nature, so it becomes difficult to handle frequent and immediate updates to the database in the Cloud. Analytical applications are best suited for the Cloud environment as they do not require ACID guarantees. Cloud databases such as SimpleDB, Bigtable and Dynamo are scalable, but are not able to provide transactional guarantees [15]. Existing applications running in educational institutes were mainly developed for lower transaction volumes on fewer computer nodes. They are not able to support unbounded transaction volumes operating in the Cloud. Serialized requests for data and disk I/O also affect overall performance [16].
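To make the ACID requirement concrete, the sketch below shows what atomicity and consistency mean for a Fee Management entry. It is only an illustration: it uses Python's built-in `sqlite3` module as a stand-in for a relational store, and the table names (`dues`, `receipts`), the function names and the sample student are assumptions made for the example.

```python
import sqlite3


def make_demo_db():
    """Create a small in-memory schema for the illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dues ("
        " student_id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("CREATE TABLE receipts (student_id TEXT, amount INTEGER)")
    conn.execute("INSERT INTO dues VALUES ('S001', 5000)")
    conn.commit()
    return conn


def record_fee_payment(conn, student_id, amount):
    """Log a receipt and reduce the dues as one atomic unit."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so the receipt can never exist without the matching balance update.
    with conn:
        conn.execute(
            "INSERT INTO receipts (student_id, amount) VALUES (?, ?)",
            (student_id, amount),
        )
        conn.execute(
            "UPDATE dues SET balance = balance - ? WHERE student_id = ?",
            (amount, student_id),
        )
```

If a payment exceeds the outstanding balance, the `CHECK` constraint rejects the update and the already inserted receipt is rolled back with it. This all-or-nothing behaviour is what stores without transactional guarantees cannot promise.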
Cloud Data Management deals with a variety of on-line transactions, distributed databases and numerous users. A distributed query has to access multiple nodes of a Cloud database. Querying a distributed database in the Cloud is a major challenge that developers face. Distributed access to a single resource of data, or access to distributed resources from a single application component, makes it difficult to provide ACID guarantees in the Cloud. So, it is difficult to ensure consistent and reliable access to a Cloud database. Such issues bring down performance and increase query response time. A huge volume of diversified transactions from different users puts extra load on the database layer. The different instances of the Application Server layer and the Web Server layer in web-based applications do not share state information within the same application. So, these two layers can be easily scaled up by spawning new machine instances to absorb the increased load. But the Database layer is not easily scalable, especially with write-intensive transactional applications, as they perform frequent updates, inserts and deletes on the database [17].
Virtualization adds an additional layer to the Cloud environment. The need to automatically discover, manage and govern any new instance of a transaction makes it difficult to scale transaction processing applications in the Cloud the way they used to scale on a mainframe [18].
3. Need for Integrated Cloud Data Management Framework
Cloud Computing (CC) has not been fully utilized for higher education, especially in India. It is mainly used for collaborative and communication activities by educational institutes in developed countries. CC has not been used for deploying back-office transaction processing applications of educational institutes. There is no uniform way to represent, store, access and integrate the data required by these applications. So, data is managed differently and individually by various educational institutes. Moreover, educational institutes are scattered across a wide area. They need to interact with other institutes located across the globe. The Cloud environment is also distributed in nature. It becomes difficult to handle OLTP (Online Transaction Processing) applications of educational institutes in highly distributed Cloud environments. Since the data management activity related to these applications has not been given due importance while deploying them in the Cloud for educational institutes, more extensive and thorough research is required so that new algorithms and techniques can be adopted.
Due to lack of integration of data, educational institutes are not able to use their digital data for analytical processing. Data used and produced by transactional applications can be mined via analytical applications for strategic planning. Due to these observations and limitations, a dire need is felt to have a uniform and cost-effective solution to handle transactional data of educational institutes.
This paper proposes a framework for educational institutes which can provide efficient and effective access to applications as well as data stored in the Cloud. Besides addressing the issues of data management, the proposed framework will help in reducing the digital divide among educational institutes. The proposed study will enable all needy or remote institutes to access these services using thin clients over the network in a consistent and identical manner. They can store data in the Cloud which can be used for transactional and analytical applications of these institutes.
This framework can be utilized by a group of universities or a single university which can become a Cloud provider to meet the requirements of affiliated institutes. These institutes can avail services like accessing/storing applications/data from any Internet-enabled device on the basis of mutual collaboration or on pay per usage.
4. Integrated Cloud Data Management Framework
The proposed architecture of the Integrated Cloud Data Management Framework for educational institutes is shown in Figure 2 along with the other layers required for managing data. A brief introduction to the other layers, namely the Cloud Interface Layer, Virtualization Layer and Cloud Infrastructure Layer, is also given below, with the main focus on the Cloud Data Management Layer.
4.1. Cloud Interface Layer
The Cloud Interface Layer will allow data in the Cloud database to be accessed through a variety of devices such as smartphones, tablets and personal computers. The Cloud provider will provide this layer. It will connect Cloud users to Cloud databases through the Internet using web-oriented application interfaces. This layer will apply authentication and authorization techniques to the users and their access to the resources.
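The authentication step at this layer could be sketched as follows. This is a minimal illustration under stated assumptions, not the framework's prescribed mechanism: the class `CloudInterface`, its methods, the role strings and the work factor are hypothetical, while `hashlib.pbkdf2_hmac`, `hmac.compare_digest` and `os.urandom` are standard-library calls.

```python
import hashlib
import hmac
import os

PBKDF2_ROUNDS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password, salt=None):
    """Return (salt, digest) computed with PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ROUNDS
    )
    return salt, digest


class CloudInterface:
    """Hypothetical entry point that authenticates users and reports roles."""

    def __init__(self):
        self._users = {}  # username -> (salt, digest, role)

    def register(self, username, password, role):
        salt, digest = hash_password(password)
        self._users[username] = (salt, digest, role)

    def authenticate(self, username, password):
        """Return the user's role on success, or None on failure."""
        record = self._users.get(username)
        if record is None:
            return None
        salt, digest, role = record
        _, candidate = hash_password(password, salt)
        # compare_digest avoids leaking information through timing.
        return role if hmac.compare_digest(candidate, digest) else None
```

Returning the role rather than a bare success flag lets later layers decide which applications and data the user may reach.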
4.2. Virtualization Layer
The Virtualization Layer will prepare a virtual machine comprising a guest Operating System (OS), data in an In-memory database and applications for the use of Cloud users. Different users can access different applications or the same applications, but each user will access its own data. In the case of common applications used by different users, the Cloud environment provider is responsible for configuring and deploying the applications. In the case of individual applications, the educational institute is responsible for configuring and deploying the application. Based on these scenarios, there are two types of users: System Users and Operational Users. A System User is responsible for deploying and configuring applications. An Operational User is responsible for making transactional entries in the system. For common applications, the System User will configure applications at the Cloud environment level; for individual applications, the System User will work at the institute level. The Operational User will always work from an individual institute.
4.3. Cloud Data Management Layer
The Cloud Data Management Layer will be a software layer used to manage the data, metadata, data consistency and data integrity of a particular Cloud client. This layer will be the main focus of the proposed study. The data management activities at this layer are given below:
Figure 2. Proposed Cloud Data Management Framework in Educational Institutes
Data Organization activity will be concerned with the physical and logical organization of metadata. This part of Cloud Data Management will identify the hardware and software requirements for storing replicated, distributed and partitioned data. Other sub-tasks such as backup and recovery and managing metadata and application data will also be carried out under this layer.
Data Collection part of Cloud Data Management will be concerned with capturing data from end users and performing validation and updates. It will also ensure data consistency and data integrity.
Data Distribution component of Cloud Data Management Layer will be used to make data available and accessible to the Cloud user depending on user requirement and defined security policy. This layer will be responsible for managing users, their roles and responsibilities. It will also be responsible for the implementation of defined security policy.
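The Data Distribution activity, which releases data according to user roles and a defined security policy, might be sketched as a simple filter. Everything named here is an assumption made for illustration: the `User` dataclass, the `READ_POLICY` table, the record layout and the role strings (which follow the System User and Operational User distinction of Section 4.2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    institute: str
    role: str  # "system_user" or "operational_user"


# Hypothetical security policy: record types each role is allowed to read.
READ_POLICY = {
    "system_user": {"student", "fee", "configuration"},
    "operational_user": {"student", "fee"},
}


def visible_records(user, records):
    """Return only the records the policy allows this user to see."""
    allowed_types = READ_POLICY.get(user.role, set())
    return [
        record
        for record in records
        # A user sees data of its own institute only, and only permitted types.
        if record["institute"] == user.institute
        and record["type"] in allowed_types
    ]
```

Keeping the policy in a data table rather than in code means it could be changed per institute without redeploying the application; an unknown role sees nothing by default.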
4.4. Cloud Infrastructure Layer
Cloud Infrastructure Layer will consist of hardware and host OS with capabilities of distributed hardware components such as server, storage devices, network devices etc. A virtualized environment comprising servers, storage, network devices, guest OS and system software will be provided at this layer.
5. Cloud Data Management Procedure for Educational Institutes
The following activities have been identified to manage Cloud data in educational institutes. The procedure for managing Cloud data for educational institutes is shown in Figure 3.
5.1. User Authentication and Authorization
The users belonging to educational institutes will access Cloud through web interface provided for this purpose. The users will then be authenticated. The information regarding roles and responsibilities in terms of application and data access would be obtained. This will be helpful for providing required data to user.
5.2. Creating Virtualized Environment
Based on user type and requirement of applications, a virtualized environment will be created within host Operating System (OS). For common application, a single Virtual Machine (VM) is sufficient having multiple users at operating system level. For individual applications, different VMs will be required for different educational institutes.
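The allocation rule described above, one shared VM for a common application and one VM per institute for an individual application, can be sketched as a small bookkeeping class. The names `VmAllocator` and `assign` and the `vm-N` identifiers are hypothetical; no real hypervisor API is called.

```python
import itertools


class VmAllocator:
    """Map application requests to virtual machine identifiers."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._shared = {}     # application -> VM id (common applications)
        self._dedicated = {}  # (application, institute) -> VM id

    def _new_vm(self):
        # Placeholder for asking the virtualization layer to start a VM.
        return f"vm-{next(self._ids)}"

    def assign(self, application, institute, common):
        if common:
            # One VM serves every institute that uses a common application.
            if application not in self._shared:
                self._shared[application] = self._new_vm()
            return self._shared[application]
        # Individual applications get a separate VM per institute.
        key = (application, institute)
        if key not in self._dedicated:
            self._dedicated[key] = self._new_vm()
        return self._dedicated[key]
```

Repeated requests for the same application and institute return the existing VM, so a VM is created only on first use.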
5.3. Loading Required Data into In-memory Database
This step will provide required data for the use of already loaded application. A data set will be generated from multiple storage locations and from large pool of data. Generated data set will be loaded into In-memory database residing in memory.
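A minimal sketch of this loading step is given below, assuming SQLite in-memory databases as stand-ins both for the storage locations and for the In-memory database. The `students` table, its columns and the helper names `make_source` and `load_working_set` are illustrative assumptions.

```python
import sqlite3

SCHEMA = "CREATE TABLE students (id TEXT PRIMARY KEY, institute TEXT, name TEXT)"


def make_source(rows):
    """Build a stand-in storage location holding the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO students VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def load_working_set(sources, institute_id):
    """Copy one institute's rows from several stores into an in-memory DB."""
    memory_db = sqlite3.connect(":memory:")
    memory_db.execute(SCHEMA)
    for source in sources:
        rows = source.execute(
            "SELECT id, institute, name FROM students WHERE institute = ?",
            (institute_id,),
        ).fetchall()
        with memory_db:  # commit each source's rows as one unit
            memory_db.executemany(
                "INSERT INTO students VALUES (?, ?, ?)", rows
            )
    return memory_db
```

Only the rows belonging to the requesting institute are copied, which keeps the in-memory data set small and keeps other institutes' data out of the virtual machine.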
5.4. Data Access to User
After loading data into In-memory database, applications would now be fully available for the user. Depending upon the type of application user, access would be given to data. All the interaction with database will be done in memory and then updated data will be stored on permanent storage devices periodically while ensuring ACID properties.
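The pattern of working in memory and periodically storing updates can be sketched as a write-behind store. This is an illustration under assumptions, not the framework's implementation: `WriteBehindStore`, the `kv` table and the use of `sqlite3` as the permanent store are hypothetical choices, and the caller decides when `flush` runs.

```python
import sqlite3


class WriteBehindStore:
    """Serve reads and writes from memory; flush to durable storage in batches."""

    def __init__(self, durable):
        self._durable = durable
        self._durable.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)"
        )
        self._durable.commit()
        # Load the current durable state into memory.
        self._cache = dict(self._durable.execute("SELECT key, value FROM kv"))
        self._dirty = set()  # keys changed since the last flush

    def get(self, key):
        return self._cache.get(key)

    def put(self, key, value):
        self._cache[key] = value
        self._dirty.add(key)

    def flush(self):
        """Write all dirty keys in a single transaction; return how many."""
        batch = [(key, self._cache[key]) for key in sorted(self._dirty)]
        with self._durable:  # all-or-nothing: rolls back on any error
            self._durable.executemany(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", batch
            )
        self._dirty.clear()
        return len(batch)
```

Each flush is atomic, but updates made since the previous flush exist only in memory, so the flush interval bounds how much work could be lost if the virtual machine fails. Durability in the ACID sense therefore depends on how frequently the flush is performed.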
5.5. Unloading Virtualized Environment
The last step will be to unload the virtualized environment from memory after the educational institute finishes its job. The main activity will be to shut down the guest OS so that the memory can be used for other virtual machines in the Cloud.
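Steps 5.2 to 5.5 form a lifecycle in which unloading must happen even if the application fails midway. One way to sketch that guarantee is a context manager. The function `virtualized_session` and the logged step descriptions are hypothetical, and the final flush before shutdown is an added assumption rather than a step stated in the procedure.

```python
from contextlib import contextmanager


@contextmanager
def virtualized_session(institute, log):
    """Run steps 5.2 to 5.5 around a block of work, always unloading."""
    log.append(f"create VM for {institute}")           # step 5.2
    log.append("load data into in-memory database")    # step 5.3
    try:
        yield log                                       # step 5.4: user works here
    finally:
        # Assumed final flush so that no buffered update is lost.
        log.append("flush data to permanent storage")
        log.append("shut down guest OS and release memory")  # step 5.5
```

Because the teardown sits in a `finally` clause, memory is released for other virtual machines whether the session ends normally or with an error.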
Figure 3. Activities of Cloud Data Management in Educational Institutes
6. Implementation tools
Different tools shall be used to implement the different layers of the proposed framework. The Cloud Interface Layer will be implemented using Java Enterprise Edition 7 (Java EE 7). The Virtualization Layer will be implemented using virtualization software from VMware (Workstation) and Citrix (XenDesktop). To implement the Cloud Data Management Layer, Cloudtran, Coherence and Java EE 7 shall be used, and to implement the Cloud Infrastructure Layer, VMware Workstation and MySQL as the database shall be used. Other tools shall also be explored.
7. Conclusion and Scope for Future Work
Cloud Computing (CC) is a cost-effective solution for educational institutes as it provides scalable, reliable and up-to-date ICT infrastructure on a pay-per-usage basis while optimizing the usage of expensive resources. Transactional applications and data are stored, processed and accessed in different ways in spite of the similar requirements of various institutes. Relational databases have played a dominant role in handling transactional data but cannot be employed in the Cloud easily due to their rigid ACID requirements. The proposed study attempts to bring uniformity to the way transactional applications and data are stored and accessed by various educational institutes. Besides enabling universities to use expensive infrastructure efficiently, it allows affiliated institutes to avail services such as infrastructure and software without much investment in costly ICT infrastructure. It may also help in reducing the digital divide among affiliated educational institutes.
8. References
[1] S.Rajan, “Cloud Computing: The Fifth Generation of Computing,” Proceedings of International Conference on Communication Systems and Network Technologies (CSNT), Jammu, India, 3-5 June 2011, pp. 665–667.
[2] Peter Mell and Tim Grance, “The NIST Definition of Cloud Computing,” http://www.nist.gov/itl/Cloud/upload/Cloud-def-v15.pdf [Last Accessed on September 25, 2012].
[3] Ian Foster, Yong Zhao, Ioan Raicu, Shiyong Lu, “Cloud Computing and Grid Computing 360-Degree Compared,” Grid Computing Environments Workshop (GCE'08), Austin, Texas, USA, 12-16 November 2008, pp. 1-10.
[4] Silky Bansal, Sawtantar Singh, Amit Kumar, “Use of Cloud Computing in Academic Institutions,” International Journal of Computer Science and Technology (IJCST), Vol. 3, Issue 1, Jan-March 2012, pp. 427-429.
[5] T. Ercan, “Effective Use of Cloud Computing in Educational Institutions,” Procedia Social and Behavioral Science, Vol. 2, 2010, pp. 938-942.
[6] M. Mircea, A. I. Andreescu, “Using Cloud Computing in Higher Education: A Strategy to Improve Agility in Current Financial Crisis,” International Business Information Management Association (IBIMA), Vol. 2011, 2011.
[7] Rick Cattell, “Scalable SQL and NoSQL Data Stores,” ACM Special Interest Group on Management of Data (ACM SIGMOD), New York, USA, Vol. 39, Issue 4, May 2011, pp. 12-27.
[8] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, R. E. Gruber, “Bigtable: A Distributed Storage System for Structured Data,” Proceedings of 7th Usenix Symposium on Operating Systems Design and Implementation (OSDI 06), Usenix Association, Seattle, Washington, 6-8 November 2006, pp. 205–218.
[9] Xeround Cloud Database, http://xeround.com [Last Accessed on May 25, 2012].
[10] S. Sakr, A. Liu, D. M. Batista, and M. Alomari, “A Survey of Large Scale Data Management Approaches in Cloud Environments,” IEEE Communications Surveys and Tutorials, Vol. 13, Issue 3, 2011, pp. 311-336.
[11] Daniel J. Abadi, “Data Management in the Cloud: Limitations and Opportunities,” Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, Vol. 32, No. 1, 2009, pp. 3-12.
[12] Daniel J. Abadi, “Column-stores vs. Row-stores: How Different Are They Really?,” Proceedings of International Conference on Management of Data (SIGMOD'08), Vancouver, Canada, 9-12 June 2008, pp. 967-980.
[13] Sudipto Das, “ElasTraS: An Elastic Transactional Data Store in the Cloud,” http://static.usenix.org/event/hotcloud09/tech/full_papers/das.pdf [Last Accessed on 9th Aug 2012].
[14] Cloudtran Technology Genesis, A Technology Whitepaper by Cloudtran Inc, 2011, http://www.Cloudtran.com/pdfs/CloudtranTechnologyBrief.pdf [Last Accessed on 9th Sep 2012].
[15] Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, Werner Vogels, “Dynamo: Amazon's Highly Available Key-value Store,” Proceedings of 21st ACM (Association for Computing Machinery) Symposium on Operating System Principles (SOSP), Stevenson, WA, USA, 14-17 October 2007, pp. 205-220.
[16] Donald Kossmann, Tim Kraska, Simon Loesing, “An Evaluation of Alternative Architectures for Transaction Processing in the Cloud,” Special Interest Group on Management of Data (SIGMOD'10), Indianapolis, Indiana, USA, 6-11 June 2010, pp. 579-590.
[17] Marcia Kaufman, “The Challenge of Managing On-line Transaction Processing Applications in the Cloud Computing World,” A Hurwitz Whitepaper [Last Accessed on 15th October 2012].
[18] K.Vasantha Kokilam and Samson Dinakaran, “Data Storage Virtualization in Cloud Computing,” International Journal of Advanced Research in Technology (IJART), Vol. 1, Issue 1, 2011, pp. 16-2