Development & Implementation - Project Report Final

4.1: Development Feasibility

A computer-controlled vending machine at the Stanford Artificial Intelligence Laboratory, which sold snack foods on credit, became one of the first Internet-connected appliances. This began the saga of pervasive connectivity, where every device is connected to everything else, and it became the defining trend of 2010 to 2020. The Internet of Things is anticipated to grow to about 26 billion units by 2020, excluding PCs, smartphones and tablets, and several categories of the devices that will be connected in 2020 may not even exist at present. The Internet of Things will not only explode connectivity but also create value: as much as US$ 6.2 trillion in annual revenue by 2025, according to one global consulting company. It will also create massive amounts of data, 40 zettabytes by 2020 according to one estimate. The bulk of this big data (over 80%) is unstructured and in motion, existing in a variety of forms and formats both inside and outside company walls.

Gathering this data is a huge challenge, but one that today's technology can meet. What comes next, extracting accurate insights in real time and turning them into foresight, is where enterprises have yet to succeed. The most difficult part of this project was collecting a large amount of data from each restaurant. The data is bulky in terms of both collection time and size, which places a heavy load on the system handling the work: the system must be capable of processing this volume of data and have enough capacity to hold it. The minimum requirements for such a system are as follows:

■ Dual quad-core CPUs or better, with Hyper-Threading enabled. We had to estimate our computing workload to decide whether a more powerful CPU was needed.
■ High Availability (HA) and dual power supplies for the master node's host machine.
■ 4-8 GB of memory per processor core, with 6% overhead for virtualization.
■ A 1 Gigabit Ethernet interface or faster to provide adequate network bandwidth.
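As a rough sizing aid, the memory guideline above can be turned into simple arithmetic. The sketch below is our own illustration, not part of any VMware tool; it assumes that "per processor core" means per physical core and that the 6% virtualization overhead is added on top of the per-core memory. The function name memory_range_gb is hypothetical.

```python
def memory_range_gb(cores, low_gb_per_core=4.0, high_gb_per_core=8.0,
                    virt_overhead=0.06):
    """Return (minimum, maximum) host RAM in GB for the given core count.

    Assumption: total = cores * GB-per-core * (1 + virtualization overhead).
    """
    factor = 1.0 + virt_overhead
    low = round(cores * low_gb_per_core * factor, 2)
    high = round(cores * high_gb_per_core * factor, 2)
    return low, high


if __name__ == "__main__":
    cores = 2 * 4  # dual quad-core CPUs = 8 physical cores
    low, high = memory_range_gb(cores)
    print(f"{cores} cores -> {low} GB to {high} GB of RAM")
```

For the dual quad-core configuration this gives a range of 33.92 GB to 67.84 GB of RAM under the stated assumptions.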

Although big data requires powerful systems to process, it is feasible to assemble a handful of capable systems instead of relying on many systems with limited or low power.

4.2: Implementation Specifications

Before beginning the Big Data Extensions deployment tasks, we made sure that our system met all of the prerequisites.

Big Data Extensions requires VMware to be installed and configured, and the environment must meet minimum resource requirements. We also had to make sure that we had licenses for the VMware components of the deployment.

VMware Requirements

Before we could install Big Data Extensions, we had to set up the following VMware products.

■ Install VMware 10.0 (or later) Enterprise or Enterprise Plus.

Note: The Big Data Extensions graphical user interface is only supported when using VMware Web Client 10.0 and later. If Big Data Extensions is installed on vSphere 9.0, all administrative tasks must be performed using the command-line interface. We therefore installed the latest version of VMware Workstation.

■ When installing Big Data Extensions, you must use VMware® vCenter™ Single Sign-On to provide user authentication. When logging in, authentication is passed to the VMware Single Sign-On server, which can be configured with multiple identity sources such as Active Directory and OpenLDAP (a free, open-source implementation of the Lightweight Directory Access Protocol (LDAP) developed by the OpenLDAP Project). On successful authentication, the username and password are exchanged for a security token, which is used to access VMware components such as Big Data Extensions.
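The credential-for-token exchange described above can be illustrated with a toy model. This is a conceptual sketch only: ToySSOServer and access_component are hypothetical names, they are not part of the vCenter Single Sign-On API, and the identity sources are plain dictionaries standing in for Active Directory or OpenLDAP (a real server never stores plaintext passwords).

```python
import secrets
import time


class ToySSOServer:
    """Hypothetical stand-in for an SSO server (NOT the vCenter API)."""

    def __init__(self, identity_sources, token_ttl_s=3600):
        # Each identity source is a dict {username: password}, standing in
        # for a directory such as Active Directory or OpenLDAP.
        self.identity_sources = identity_sources
        self.token_ttl_s = token_ttl_s
        self._tokens = {}  # token -> (username, expiry time)

    def login(self, username, password):
        """Try each identity source in turn; issue a token on success."""
        for source in self.identity_sources:
            stored = source.get(username)
            if stored is not None and secrets.compare_digest(stored, password):
                token = secrets.token_urlsafe(32)
                self._tokens[token] = (username, time.time() + self.token_ttl_s)
                return token
        return None  # authentication failed in every source

    def validate(self, token):
        """Return the username for a live token, or None."""
        entry = self._tokens.get(token)
        if entry is None or entry[1] < time.time():
            return None
        return entry[0]


def access_component(sso, token, component="Big Data Extensions"):
    """A component accepts the token instead of the user's password."""
    user = sso.validate(token)
    if user is None:
        raise PermissionError("invalid or expired token")
    return f"{user} accessed {component}"


if __name__ == "__main__":
    sso = ToySSOServer([{"alice": "a-secret"}, {"bob": "b-secret"}])
    token = sso.login("bob", "b-secret")
    print(access_component(sso, token))
```

The point of the design is that components only ever see the token, so the password is checked once, at the SSO server.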

■ Enable the vSphere Network Time Protocol on the ESXi hosts. The Network Time Protocol (NTP) daemon ensures that time-dependent processes occur in sync across hosts.

Cluster Settings

■ Enabled Hyper-V or virtualization support in the BIOS setup (Windows 10).
■ Enabled Host Monitoring.
■ Enabled Admission Control and set the desired policy. The default policy is to tolerate one host failure.
■ Set the virtual machine restart priority to High.
■ Set virtual machine monitoring to Virtual Machine and Application Monitoring.
■ Set the monitoring sensitivity to High.
■ Enabled vMotion and Fault Tolerance Logging.
■ All hosts in the cluster have Hardware VT enabled in the BIOS.
■ The Management Network VMkernel port has vMotion and Fault Tolerance Logging enabled.
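To reason about the admission control policy of tolerating one host failure, the following simplified check may help. It is our own approximation, not vSphere HA's slot-size or percentage-based algorithm: it assumes the largest hosts fail and checks whether the remaining capacity can still hold every virtual machine's memory reservation, ignoring whether each virtual machine fits on a single surviving host. The function name and the host and VM sizes in the example are hypothetical.

```python
def tolerates_host_failures(host_capacity_gb, vm_reservation_gb, failures=1):
    """Simplified admission-control check (not vSphere HA's algorithm).

    Assume the `failures` largest hosts are lost, then check that the total
    capacity of the surviving hosts covers every VM's reservation. This
    ignores fragmentation: it does not check that each VM fits on one host.
    """
    if failures >= len(host_capacity_gb):
        return False  # no host would survive
    survivors = sorted(host_capacity_gb)[: len(host_capacity_gb) - failures]
    return sum(vm_reservation_gb) <= sum(survivors)


if __name__ == "__main__":
    # Hypothetical: 4 hosts with 16 GB each, 3 VMs reserving 8 GB each.
    print(tolerates_host_failures([16, 16, 16, 16], [8, 8, 8]))
```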

Network Settings

Big Data Extensions deploys clusters on a single network. Virtual machines are deployed with one NIC, which is attached to a specific Port Group. The environment determines how this Port Group is configured and which network backs the Port Group.

Either a vSwitch or a vSphere Distributed Switch (vDS) can be used to provide the Port Group backing a Serengeti cluster. A vDS acts as a single virtual switch across all attached hosts, while a vSwitch is per-host and requires the Port Group to be configured manually on each host.

When configuring the network for use with Big Data Extensions, the following ports must be open as listening ports.

■ Ports 8080 and 8443 are used by the Big Data Extensions plug-in user interface and the Serengeti Command-Line Interface Client.
■ Port 9000 is used by SSH clients.
■ To avoid opening a network firewall port to access Hadoop services, log into the Hadoop client node and access the cluster from that node.
■ To connect to the Internet (for example, to create an internal Yum repository from which to install Hadoop distributions), a proxy may be used.
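A quick way to confirm that the listening ports above are reachable is a TCP connection test. The sketch below is a hypothetical helper of our own (check_listening is not part of Big Data Extensions); it only reports whether a TCP handshake succeeds and does not verify which service is answering.

```python
import socket


def check_listening(host, ports, timeout=1.0):
    """Hypothetical helper: map each port to True if a TCP connect succeeds."""
    results = {}
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            try:
                results[port] = sock.connect_ex((host, port)) == 0
            except OSError:  # e.g. name resolution failure
                results[port] = False
    return results


if __name__ == "__main__":
    # Run on the server itself; replace 127.0.0.1 with the server address
    # to test reachability through the network and firewall.
    for port, ok in check_listening("127.0.0.1", [8080, 8443, 9000]).items():
        print(port, "open" if ok else "closed or filtered")
```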

Direct Attached Storage

Direct Attached Storage should be attached and configured on the physical controller to present each disk separately to the operating system, a configuration commonly described as Just a Bunch of Disks (JBOD). We had to create VMFS datastores on the Direct Attached Storage following these disk drive recommendations:

■ 6-8 disk drives per host. The more disk drives per host, the better the performance.
■ 1-1.5 disk drives per processor core.
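The two disk rules above can conflict, so a small helper that intersects them may be useful. This is a sketch under our own reading of the recommendations: "cores" means physical cores per host, the function name is hypothetical, and it returns None when no whole number of disks satisfies both rules.

```python
import math


def recommended_disks_per_host(cores_per_host, per_core=(1.0, 1.5),
                               per_host=(6, 8)):
    """Intersect the per-core and per-host disk rules.

    Returns (min_disks, max_disks), or None if no whole number of disks
    satisfies both rules at once.
    """
    low = max(math.ceil(cores_per_host * per_core[0]), per_host[0])
    high = min(math.floor(cores_per_host * per_core[1]), per_host[1])
    return (low, high) if low <= high else None


if __name__ == "__main__":
    print(recommended_disks_per_host(8))  # dual quad-core host
```

For a dual quad-core host (8 cores), the per-core rule gives 8 to 12 disks and the per-host rule 6 to 8, so the intersection is exactly 8 disks.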

Resource Requirements for the vSphere Management Server and Templates

■ Resource pool with at least 3.5 GB RAM.
■ 40 GB or more (recommended) disk space for the management server and Hadoop template virtual disks.

Resource Requirements for the Hadoop Cluster

■ Datastore free space is not less than the total size needed by the Hadoop cluster, plus a swap disk for each Hadoop node equal to the memory size requested.
■ The network is configured across all relevant hosts and has connectivity with the network used by the management server.
■ HA is enabled for the master node if HA protection is needed. We used shared storage in order to use HA or FT to protect the Hadoop master node.
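The datastore free-space rule can be written as a simple sum. The sketch assumes that "the total size needed by the Hadoop cluster" is already known in GB and that every node gets one swap disk equal to its requested memory; the function names and the numbers in the example are hypothetical, not measurements from our setup.

```python
def required_datastore_gb(cluster_data_gb, node_memory_gb):
    """Cluster storage plus one swap disk per node sized to its memory."""
    return cluster_data_gb + sum(node_memory_gb)


def datastore_has_room(free_gb, cluster_data_gb, node_memory_gb):
    """True if the datastore's free space meets the requirement."""
    return free_gb >= required_datastore_gb(cluster_data_gb, node_memory_gb)


if __name__ == "__main__":
    # Hypothetical: 200 GB of cluster storage, three nodes with 4 GB RAM each.
    need = required_datastore_gb(200, [4, 4, 4])
    print(f"need at least {need} GB free")
```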

Hardware Requirements

The host hardware is listed in the VMware Compatibility Guide. For optimal performance, we installed our vSphere and Big Data Extensions environment on the following hardware.

■ Dual quad-core CPUs or better, with Hyper-Threading enabled. If the computing workload can be estimated, consider using a more powerful CPU.
■ High Availability (HA) and dual power supplies for the master node's host machine.
■ A 1 Gigabit Ethernet interface or faster to provide adequate network bandwidth.

Tested Host and Virtual Machine Support

The following is the maximum host and virtual machine support that has been confirmed to successfully run with Big Data Extensions.

■ We used My Visual Database (64-bit).
■ Virtual hosts deployed on 4 physical hosts, running 3 virtual machines.

Licensing

We had to use a vSphere Enterprise license or above in order to use VMware High Availability (HA) and VMware Distributed Resource Scheduler (DRS).

VMware's products predate the virtualization extensions to the x86 instruction set and do not require virtualization-enabled processors. On newer processors, the hypervisor is designed to take advantage of the extensions. However, unlike many other hypervisors, VMware still supports older processors. In such cases, it uses the CPU to run code directly whenever possible (for example, when running user-mode and virtual 8086 mode code on x86). When direct execution cannot operate, such as with kernel-level and real-mode code, VMware products use binary translation (BT) to rewrite the code dynamically. The translated code is stored in spare memory, typically at the end of the address space, which segmentation mechanisms can protect and make invisible. For these reasons, VMware operates dramatically faster than emulators, running at more than 80% of the speed at which the virtual guest operating system would run directly on the same hardware. In one study, VMware claims a slowdown over native ranging from 0 to 6 percent for VMware ESX Server.

VMware's approach avoids some of the difficulties of virtualization on x86-based platforms. Virtual machines may deal with offending instructions by replacing them, or by simply running kernel code in user mode. Replacing instructions runs the risk that the code may fail to find the expected content if it reads itself; one cannot protect code against reading while allowing normal execution, and replacing in place becomes complicated. Running the code unmodified in user mode will also fail, as most instructions which just read the machine state do not cause an exception and will betray the

Chapter-5

Results & Testing

5.1: Results

After collecting the menus of different restaurants around Delhi and Chittagong, we finally gathered around 135 GB of data, including pictures, videos, menus, and restaurant details. Our project has a vast scope for exploration. We began by creating a database that holds the menu, the delivery details of each food item, time, price, and a picture of the food; we have also added email as a channel for user feedback.

Figure 2.8: Layout of the website where customers look for food

We tried to make the layout as appealing as possible so that customers feel comfortable and spend time browsing the items. It is also user-friendly, as all the options are within easy reach for a new user. We are also working on adding live online help so that customers can get assistance with their orders. Together, these features let customers search for and find the right food more easily and quickly.

Figure 2.9: Section in the website where customers can see other customers' feedback

Figure 2.11: Database for menu and other details entry

Figure 2.12: Adding an item to the database

Adding an item means entering the food details: a photo of the food, its cost, the delivery date, the food category code, and the seller restaurant that dispatched the item.
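The item fields listed above map naturally onto a relational table. The following sketch uses Python's built-in sqlite3 module purely as a stand-in; it is not the database software we actually used, and the table and column names (including the added name column and the separate restaurant table) are hypothetical.

```python
import sqlite3

# Hypothetical schema mirroring the item fields described above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS restaurant (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    city TEXT
);
CREATE TABLE IF NOT EXISTS food_item (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    category_code TEXT NOT NULL,
    cost          REAL NOT NULL CHECK (cost >= 0),
    delivery_date TEXT,  -- ISO 8601 date, e.g. '2016-04-01'
    photo_path    TEXT,  -- file path or URL of the food photo
    restaurant_id INTEGER NOT NULL REFERENCES restaurant(id)
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def add_item(conn, name, category_code, cost, delivery_date, photo_path,
             restaurant_id):
    """Insert one food item and return its new row id."""
    cur = conn.execute(
        "INSERT INTO food_item (name, category_code, cost, delivery_date,"
        " photo_path, restaurant_id) VALUES (?, ?, ?, ?, ?, ?)",
        (name, category_code, cost, delivery_date, photo_path, restaurant_id),
    )
    conn.commit()
    return cur.lastrowid


if __name__ == "__main__":
    conn = open_db()
    conn.execute("INSERT INTO restaurant (name, city) VALUES (?, ?)",
                 ("Sample Restaurant", "Delhi"))
    item_id = add_item(conn, "Sample Dish", "MAIN", 250.0, "2016-04-01",
                       "photos/sample_dish.jpg", 1)
    print(conn.execute("SELECT name, cost FROM food_item WHERE id = ?",
                       (item_id,)).fetchone())
```

Keeping restaurants in their own table means each item only stores the seller's id, so restaurant details are entered once.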

Figure 2.13: Customer feedback

5.1.1: Success cases

Our primary target was to build a Hadoop single-node cluster. We successfully built 3 clusters; one of them (the mirror) runs on an Acer Aspire E-11. This system has 4 GB of RAM and a 2.67 GHz processor, which is enough to run the initial programs of a Hadoop cluster.

Figure 2.15: Hadoop Task tracker

This shows that we have successfully initiated a Hadoop single-node cluster on our system, since the page displays a summary of our work: the total number of running nodes, running MapReduce tasks, occupied MapReduce task capacity, and average tasks per node. In the TaskTracker status, we can see Hadoop's running tasks and their status, non-running tasks and their status, tasks from running jobs, and local logs. Installing Hadoop successfully required a successful installation of the JDK. There were three primary steps to complete before testing the integration of the Hadoop cluster in the system:

■ Formatting the NameNode
■ Starting the Hadoop Distributed File System daemons ('dfs')
■ Starting the MapReduce daemons ('mapred')
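The startup steps can be scripted. The sketch below assumes a Hadoop 1.x layout in which bin/hadoop, bin/start-dfs.sh and bin/start-mapred.sh live under the install directory; the function start_single_node and the /opt/hadoop path are hypothetical. It defaults to a dry run that only lists the commands, because formatting the NameNode erases any existing HDFS metadata.

```python
import os
import subprocess

# Hadoop 1.x scripts, relative to the Hadoop install directory.
STARTUP_SEQUENCE = [
    ["bin/hadoop", "namenode", "-format"],  # one-time; ERASES existing HDFS metadata
    ["bin/start-dfs.sh"],                   # NameNode, DataNode, SecondaryNameNode
    ["bin/start-mapred.sh"],                # JobTracker, TaskTracker
]


def start_single_node(hadoop_home, format_namenode=False, dry_run=True):
    """Run (or, by default, only list) the startup commands in order."""
    commands = STARTUP_SEQUENCE if format_namenode else STARTUP_SEQUENCE[1:]
    for cmd in commands:
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run([os.path.join(hadoop_home, cmd[0]), *cmd[1:]],
                           check=True)
    return [" ".join(cmd) for cmd in commands]


if __name__ == "__main__":
    for line in start_single_node("/opt/hadoop", format_namenode=True):
        print(line)
```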

These steps allow the user or admin to grant users login permission to access the database created by Hadoop. After starting dfs and mapred from the command prompt, the console lists the processes being started, confirming that both dfs and mapred have been loaded into memory. The local host then starts responding, so the system can answer queries from users and accept entries made by the admin.
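One common way to confirm that the processes have loaded is the JDK's jps tool, which lists running Java processes as "pid ClassName". The sketch below parses such output and reports which of the usual Hadoop 1.x single-node daemons are missing; the expected daemon set is our assumption for a single-node setup, the function names are hypothetical, and the sample output is illustrative rather than captured from our system.

```python
import subprocess

# Daemons we assume a Hadoop 1.x single-node cluster should be running.
EXPECTED_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode",
                    "JobTracker", "TaskTracker"}


def read_jps():
    """Return the raw output of the JDK's jps tool."""
    return subprocess.run(["jps"], capture_output=True, text=True,
                          check=True).stdout


def running_daemons(jps_output):
    """Parse lines of the form '<pid> <MainClass>' into a set of class names."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output):
    return EXPECTED_DAEMONS - running_daemons(jps_output)


if __name__ == "__main__":
    # Illustrative output; on a real node, use read_jps() instead.
    sample = "4821 NameNode\n4950 DataNode\n5080 SecondaryNameNode\n5350 Jps\n"
    print("missing:", sorted(missing_daemons(sample)))
```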

5.1.2: Failure cases

We had three localhost addresses for checking the status of our single-node cluster. Two of them, http://localhost:50060 and http://localhost:50030, worked properly; their screenshots are included in the success cases. However, http://localhost:50070 did not respond:

Figure 2.16: Failure in connecting localhost:50070
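In Hadoop 1.x the default web UI ports are 50030 for the JobTracker, 50060 for the TaskTracker and 50070 for the NameNode, so the failing address corresponds to the NameNode's web interface. The probe below is a hypothetical helper (not part of Hadoop) that requests each page and distinguishes "no response" from an HTTP error status; it bypasses any configured proxy, since these pages are served locally.

```python
import urllib.error
import urllib.request

# Default web UI ports in Hadoop 1.x.
WEB_UIS = {50030: "JobTracker", 50060: "TaskTracker", 50070: "NameNode"}

# Bypass any configured HTTP proxy: these pages are served locally.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url, timeout=3.0):
    """Return the HTTP status code, or None if nothing answered."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:      # refused, timed out, or unreachable (URLError included)
        return None


if __name__ == "__main__":
    for port, daemon in WEB_UIS.items():
        status = probe(f"http://localhost:{port}/")
        print(f"{daemon:<12} :{port} -> {status or 'no response'}")
```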

5.2: Testing

We tested the single-node cluster we built and got two of the addresses working properly, which is the minimum needed to confirm that a Hadoop single-node cluster has been set up on the system and can accept data entry. We installed Hadoop version 1.2.1, which uses the MapReduce model to let the user access the database incrementally once it reaches its estimated size of 135 GB. Test cases 1 and 2 were successful: the local host responded after we started MapReduce and the distributed file system and formatted the NameNode. This provided information on the status of the single node in the system we are currently working on.
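The MapReduce model mentioned above splits a job into a map phase, a shuffle that groups intermediate pairs by key, and a reduce phase. The following in-memory sketch mimics those phases in plain Python to count menu items per food category; it does not use the Hadoop API, and the comma-separated record format and sample records are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit (category_code, 1) for one 'restaurant,item,category' record."""
    _restaurant, _item, category = record.split(",")
    yield category.strip(), 1


def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)


def run_job(records):
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


if __name__ == "__main__":
    menu = ["R1,Biryani,RICE", "R2,Samosa,SNACK", "R1,Pulao,RICE"]
    print(run_job(menu))
```

In Hadoop the map and reduce functions run in parallel on different nodes and the framework performs the shuffle; here all three phases run sequentially in one process.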

5.2.1: Test results of various stages

First, we tested whether the Java JDK was successfully installed on the system.

Figure 2.17: showing jdk1.8.0_77 installed in the system

Next, we needed to check whether Hadoop was installed on the system.
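These two checks can also be done from a script. The sketch assumes that "java -version" prints its version line to standard error (as the JDK does) and that "hadoop version" prints a first line of the form "Hadoop 1.2.1" to standard output; the function names are hypothetical.

```python
import re
import subprocess


def parse_java_version(text):
    """Extract e.g. '1.8.0_77' from java -version output."""
    m = re.search(r'version "([^"]+)"', text)
    return m.group(1) if m else None


def parse_hadoop_version(text):
    """Extract e.g. '1.2.1' from a line starting with 'Hadoop '."""
    m = re.search(r"^Hadoop (\S+)", text, re.MULTILINE)
    return m.group(1) if m else None


def installed_versions():
    """Run both tools; assumes java and hadoop are on the PATH."""
    # java -version writes to stderr; hadoop version writes to stdout.
    java = subprocess.run(["java", "-version"], capture_output=True, text=True)
    hadoop = subprocess.run(["hadoop", "version"], capture_output=True, text=True)
    return parse_java_version(java.stderr), parse_hadoop_version(hadoop.stdout)
```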

Next, we tested the startup of MapReduce on the system. If MapReduce cannot load, an error message is shown; if it starts successfully, it asks for the user's password and starts the MapReduce service.

Figure 2.19: Successful startup of MapReduce

Next, we tested the Hadoop Distributed File System. To load it into the system, start DFS with the command: start-dfs.sh

Figure 2.20: Starting the Distributed File System on the system

Chapter-6:
