
Big Data


Academic year: 2020


Big Data

Big data is an evolving term that describes a large volume of structured, semi-structured and unstructured data that has the potential to be mined for information and used in machine learning projects and other advanced analytics applications.

Types of Data

Big data is commonly classified into three main types:

1. Structured data

Structured data refers to data that is already stored in databases in an ordered manner. It accounts for about 20% of all existing data and is the kind most often used in programming and other computer-related activities.

Structured data has two sources: machines and humans. Data received from sensors, web logs and financial systems is classified as machine-generated. Examples include readings from medical devices, GPS data, usage statistics captured by servers and applications, and the large volumes of data that move through trading platforms.

Human-generated structured data mainly consists of the data a person enters into a computer, such as a name and other personal details. When a person clicks a link on the internet, or even makes a move in a game, data is created. Companies can use this data to understand customer behavior and make appropriate decisions and modifications.

2. Unstructured data

While structured data resides in traditional row-column databases, unstructured data is the opposite: it has no clear storage format. The rest of the data created, about 80% of the total, is unstructured. Most of the data a person encounters belongs to this category, and until recently there was little to do with it except store it or analyze it manually.

Unstructured data is also classified by source as machine-generated or human-generated. Machine-generated unstructured data includes satellite images, scientific data from experiments, and radar data captured by various technologies.


Instagram handles, the videos we watch on YouTube and even the text messages we send all contribute to the gigantic heap that is unstructured data.

3. Semi-structured data

The line between unstructured and semi-structured data has always been unclear, since most semi-structured data appears unstructured at a glance. Information that is not in the traditional database format of structured data, but that contains some organizational properties which make it easier to process, is classified as semi-structured data. For example, NoSQL documents are considered semi-structured, since they contain keywords that can be used to process the document easily.
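The three types can be contrasted with a small Python sketch. The sample records, names and values below are invented for illustration and do not come from the source. The point is only that structured rows share one fixed set of fields, semi-structured records carry their own keys, and unstructured text has no fields at all.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (hypothetical sample).
structured_csv = "id,name,age\n1,Asha,31\n2,Ben,27\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: no fixed schema, but keys describe each value.
semi_structured = json.loads(
    '[{"id": 1, "name": "Asha", "tags": ["vip"]},'
    ' {"id": 2, "name": "Ben", "address": {"city": "Pune"}}]'
)

# Unstructured: free text with no fields to query.
unstructured = "Asha called on Monday to say the delivery arrived late."


def field_names(records):
    """Return the set of keys used across a list of dict records."""
    names = set()
    for record in records:
        names.update(record.keys())
    return names


print(sorted(field_names(rows)))             # same three fields in every row
print(sorted(field_names(semi_structured)))  # fields differ between records
print(len(unstructured.split()), "words, no fields")
```

Because the semi-structured records still carry keys, a program can process them without a predefined table layout, which is the property the paragraph above describes.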

Applications

• Academics

• Banking

• Healthcare

• Manufacturing

• Research

• IT

• Stock, etc.

IT as a Service

IT as a service (ITaaS) is an operational model where the information technology (IT) service provider delivers an information technology service to a business. The IT service provider can be an internal IT organization or an external IT services company. The recipients of ITaaS can be a line of business (LOB) organization within an enterprise or a small and medium business (SMB). The information technology is typically delivered as a managed service with a clear IT services catalog and pricing associated with each of the catalog items. At its core, ITaaS is a competitive business model where businesses have many options for IT services and the internal IT organization has to compete against those other external options in order to be the selected IT service provider to the business. Options for providers other than the internal IT organization may include IT outsourcing companies and public cloud providers.
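A minimal sketch of the catalog idea, in Python, may help. The provider names, service names and prices are hypothetical, and comparing on price alone is a simplifying assumption. A real selection would also weigh service levels, security and other factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogItem:
    provider: str         # who offers the service (hypothetical names)
    service: str          # catalog entry name
    monthly_price: float  # price attached to the catalog item


# Hypothetical catalog: the internal IT organization competes with external options.
catalog = [
    CatalogItem("internal-it", "vm-small", 42.0),
    CatalogItem("public-cloud", "vm-small", 35.0),
    CatalogItem("outsourcer", "vm-small", 50.0),
    CatalogItem("internal-it", "backup", 10.0),
]


def cheapest_offer(items, service):
    """Return the lowest-priced catalog item for the requested service."""
    offers = [item for item in items if item.service == service]
    if not offers:
        raise LookupError(f"no provider offers {service!r}")
    return min(offers, key=lambda item: item.monthly_price)


print(cheapest_offer(catalog, "vm-small").provider)
print(cheapest_offer(catalog, "backup").provider)
```

With these invented prices the line of business would pick the public cloud offer for the virtual machine and the internal IT organization for backup, which illustrates the competitive model described above.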

Virtualization


Benefits of Virtualization

• Cost

• Disaster recovery

• Energy savings

• Resource utilization

• IT flexibility, etc.

Application virtualization

This is a process where applications are virtualized and delivered from a server to the end user's device, such as a laptop, smartphone or tablet. Instead of logging into their computers at work, users can access the application directly from their own device, provided an Internet connection is available. This is particularly popular with businesses that need to use their applications on the go.

Hardware virtualization


Network virtualization

Network virtualization combines all physical networking equipment into a single, software-based resource. It also divides available bandwidth into multiple, independent channels, each of which can be assigned to servers and devices in real time. Businesses that benefit from network virtualization are those that have a large number of users and need to keep their systems up and running at all times. Because bandwidth is split into independently assignable channels, it can be directed to where it is needed, which can help deliver services and applications faster.

Storage virtualization

This type of virtualization is relatively easy and cost-effective to implement, since it involves pooling your physical hard drives into a single cluster. Storage virtualization is handy when planning for disaster recovery, since the data stored on your virtual storage can be replicated and transferred to another location. By consolidating your storage into a centralized system, you can eliminate the hassles and costs of managing multiple storage devices.
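The pooling idea can be sketched in Python as follows. The class `StoragePool`, the drive names and the sizes are hypothetical and only model capacity accounting. Real storage virtualization also handles block mapping, redundancy and replication, which this sketch leaves out.

```python
class StoragePool:
    """Presents several physical drives as one logical capacity."""

    def __init__(self, drive_sizes_gb):
        # Free space per physical drive, keyed by a made-up drive name.
        self.free = {f"disk{i}": size for i, size in enumerate(drive_sizes_gb)}
        self.volumes = {}  # volume name -> list of (drive, gb) extents

    def total_free(self):
        """Free capacity of the whole pool, regardless of which drive holds it."""
        return sum(self.free.values())

    def create_volume(self, name, size_gb):
        """Allocate a logical volume that may span more than one drive."""
        if size_gb > self.total_free():
            raise ValueError("not enough pooled capacity")
        extents, remaining = [], size_gb
        for drive, free_gb in list(self.free.items()):
            if remaining == 0:
                break
            take = min(free_gb, remaining)
            if take > 0:
                extents.append((drive, take))
                self.free[drive] -= take
                remaining -= take
        self.volumes[name] = extents
        return extents


pool = StoragePool([100, 100, 50])
print(pool.create_volume("backups", 150))  # larger than any single drive
print(pool.total_free())
```

The caller asks for 150 GB without knowing which drive supplies it, which is the consolidation benefit the paragraph above refers to.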

Hypervisor

A hypervisor or virtual machine monitor (VMM) is


Multitenancy

Multi-tenancy is the characteristic of a system that enables a resource component to serve different consumers (tenants) while keeping each of them isolated from the others. It enables a service provider to deliver computing services to multiple isolated consumers using a single, shared set of resources. It is also called ownership-free resource sharing. It lowers computing costs and increases resource utilization, both of which are important aspects of utility computing.

Thus, there is no one-to-one mapping between a consumer and a resource component in terms of resource use, nor is a resource component dedicated to any particular application. Resource components are allocated to users and applications solely on the basis of availability. This increases the resource utilization rate and in turn decreases investment.
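A rough Python sketch of these two properties, availability-based allocation and tenant isolation, is given here. The class `SharedPool`, the unit name and the tenant names are invented for illustration, and the isolation shown is only a per-tenant namespace, far simpler than what a real platform enforces.

```python
class SharedPool:
    """A fixed set of resource units shared by several isolated tenants."""

    def __init__(self, units):
        self.available = list(units)  # units not in use right now
        self.in_use = {}              # unit -> tenant currently holding it
        self.data = {}                # tenant -> that tenant's private data

    def acquire(self, tenant):
        """Hand the tenant whichever unit happens to be free."""
        if not self.available:
            raise RuntimeError("no free resource units")
        unit = self.available.pop(0)
        self.in_use[unit] = tenant
        return unit

    def release(self, tenant, unit):
        """Return a unit to the pool so that any tenant can use it next."""
        if self.in_use.get(unit) != tenant:
            raise PermissionError("unit is not held by this tenant")
        del self.in_use[unit]
        self.available.append(unit)

    def write(self, tenant, key, value):
        self.data.setdefault(tenant, {})[key] = value

    def read(self, tenant, key):
        # Isolation: a tenant can only see its own namespace.
        return self.data.get(tenant, {}).get(key)


pool = SharedPool(["vm-a"])
unit = pool.acquire("tenant1")
pool.write("tenant1", "report", "confidential")
pool.release("tenant1", unit)
print(pool.acquire("tenant2"))         # the same unit now serves another tenant
print(pool.read("tenant2", "report"))  # tenant2 cannot see tenant1's data
```

The same unit serves two tenants in turn, so no unit is tied to one consumer, yet each tenant's data stays invisible to the other.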

Types of Tenancy

1. Single

2. Multiple

Types of Tenants

1. Single

2. Mutually exclusive – cotenant

3. Non-mutually exclusive

Resource Provisioning


model, it is now just a matter of minutes to achieve the same, provided the required volume of resources is available.

Approaches of Resource Provisioning

1. Static

2. Dynamic
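The source names the two approaches without defining them. Under the usual reading, in which static provisioning fixes capacity in advance and dynamic provisioning adjusts it as demand changes, the difference can be sketched in Python. The demand figures, the per-server capacity and the function names are assumptions made for this example.

```python
import math


def static_provision(demand, capacity_per_server):
    """Fix the server count up front, sized for the peak of the forecast."""
    servers = math.ceil(max(demand) / capacity_per_server)
    return [servers] * len(demand)


def dynamic_provision(demand, capacity_per_server, minimum=1):
    """Resize in every period to fit that period's demand."""
    return [max(minimum, math.ceil(d / capacity_per_server)) for d in demand]


def utilization(demand, servers, capacity_per_server):
    """Fraction of provisioned capacity actually used over all periods."""
    return sum(demand) / (sum(servers) * capacity_per_server)


demand = [20, 35, 90, 40]  # hypothetical requests per second in four periods
static = static_provision(demand, 10)
dynamic = dynamic_provision(demand, 10)

print(static, round(utilization(demand, static, 10), 2))
print(dynamic, round(utilization(demand, dynamic, 10), 2))
```

In this toy forecast the static plan keeps peak-sized capacity running in every period, while the dynamic plan follows demand and so achieves a higher utilization rate.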

Pricing Models
