Using the Bluemix Analytics for Hadoop Service to Analyse Data

Academic year: 2021


Lab 1: Using the Bluemix Analytics for Hadoop Service to Analyze Data

Lab Objectives: This lab shows you how to use the Analytics for Hadoop Service in Bluemix to analyze large volumes of medical data collected by heart monitors. You will work in BigSheets, a spreadsheet-style tool accessible from the console of the Analytics for Hadoop Service.

Lab Duration: 45 minutes

1. Creating a Bluemix Application with a Hadoop Service Instance

In this section you'll create a Bluemix application with a Hadoop Service instance that will be used throughout the rest of the labs.

1. In your browser, go to the Bluemix URL http://bluemix.net and log in if necessary.

2. Make sure you're in the Dashboard tab (if not, click the Dashboard link at the top of the page to take you there).

3. Scroll down to the Applications section and click on CREATE AN APP

4. For the template choose Web

6. Select the Internet of Things boilerplate.

7. Choose a name and click Create.

Your application should restart; wait a few minutes for it to finish.

8. Add a new service to the app: from the app overview, click ADD A SERVICE.

9. Scroll down to the Big Data category and click the icon for the IBM Analytics for Hadoop service. Then click Create. The application must restage again.

2. Uploading the Medical Data to the Analytics for Hadoop Service Instance

In this section you'll upload medical data to the Analytics for Hadoop Service instance that you just created. You'll upload the files first, then create a working environment that behaves like a spreadsheet with several tabs.

1. Click the Hadoop service of your app, then click the Launch icon to open the console of the service instance you just created in another tab.

Figure 3 Analytics for Hadoop console icon

2. From the console click the Files tab

3. In the DFS navigator, expand the user directory and select the directory biblumix

4. Click the Upload icon.

Figure 4 Upload icon

5. Click Browse and select the file \BDADaysLabs\Lab1-BigInsight\Historical_Personal_Data.txt, where \BDADaysLabs is the root folder of the files provided to you by the instructor. Click OK.

6. Repeat to upload the file \BDADaysLabs\Lab1\Historical_Health_Data.txt

3. Importing the Data into BigSheets

Now that the sample data is uploaded to HDFS, you can import it into BigSheets and create workbooks that contain that data.

Figure 5 Create BigSheets Workbook

2. Name the workbook PersonalData. In the distributed file system browser, select the file Historical_Personal_Data.csv that you uploaded in the previous step.

3. In the preview pane on the right, select a new reader to map the data into a spreadsheet format. (Currently the reader is Line Reader.) Click the edit icon next to 'Line Reader' and select Comma Separated Value (CSV) Data from the drop-down list.

Figure 6 Changing the reader

4. Click the green check mark to change the reader. Click Fit Columns in the preview pane to make the tabular data appear more compact.

Figure 7 Fit Columns to width

5. Click the green check at the bottom to save the workbook PersonalData.

Figure 8 Workbooks link

Click Build new workbook.

13. Name the workbook HealthData. In the distributed file system browser, go to user/biblumix/ and select the file Historical_Health_Data that you uploaded in the previous step.

14. Change the Line Reader: in the preview pane on the right, select a new reader to map the data into a spreadsheet format. (Currently the reader is Line Reader.) Click the edit icon next to 'Line Reader' and select Comma Separated Value (CSV) Data from the drop-down list.

15. Click the green check mark to change the reader. Click Fit Columns in the preview pane to make the tabular data appear more compact.

16. Click the green check at the bottom to save the workbook HealthData.

17. You now have two workbooks: HealthData and PersonalData.
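The reader change in the steps above is what turns each raw line into separate columns. As a rough illustration only (this is not how BigSheets is implemented), the Python sketch below contrasts a line-by-line view with a CSV view of the same text. The sample rows and every column name except PatientID are hypothetical, since the lab does not list the contents of the files.

```python
import csv
import io

# Hypothetical sample rows; the real files' columns are not listed in this lab.
sample = 'PatientID,Name,Age\n1001,"Doe, Jane",54\n1002,John Roe,61\n'

# Line-reader view: each record is one undivided string.
line_rows = sample.splitlines()

# CSV-reader view: each record is split into fields, honoring quoted commas.
csv_rows = list(csv.reader(io.StringIO(sample)))

print(line_rows[1])
print(csv_rows[1])
```

With a line-style reader the whole record lands in a single column, which is why the workbook preview only becomes tabular after the CSV reader is selected.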

4. Combining the Data From Multiple Workbooks and Creating Charts

Since the two workbooks share a common field (PatientID), we can perform a "join" of these two workbooks as the basis for exploring the medical study data by heart failure.
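To make the idea of an inner join concrete, here is a minimal Python sketch, assuming two small in-memory tables. Only the PatientID column name comes from this lab; the other columns, the sample values, and the helper inner_join are hypothetical and stand in for what BigSheets does through its user interface.

```python
# Hypothetical rows; only the PatientID column name comes from the lab.
personal = [
    {"PatientID": "1001", "Age": "54"},
    {"PatientID": "1002", "Age": "61"},
    {"PatientID": "1003", "Age": "47"},
]
health = [
    {"PatientID": "1001", "HeartFailure": "Y"},
    {"PatientID": "1002", "HeartFailure": "N"},
    {"PatientID": "1004", "HeartFailure": "N"},
]


def inner_join(left, right, key):
    """Return one merged row for every left/right pair sharing the same key value."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            merged = dict(lrow)
            # Copy the right-hand columns, skipping the duplicate key column.
            merged.update({k: v for k, v in rrow.items() if k != key})
            joined.append(merged)
    return joined


patient_data = inner_join(personal, health, "PatientID")
for row in patient_data:
    print(row)
```

An inner join keeps only the patients present in both tables, so in this sketch patients 1003 and 1004 are dropped from the result.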

Figure 9 Workbooks link

2. Click the PersonalData workbook link.

3. Click Build New Workbook.

4. Rename it PatientData by clicking the pen icon, then click Save.

At this stage, at the bottom of the page you have:

5. In the top left-hand side, you should see a link called Add sheets. This allows you to perform additional analysis on your data within the current workbook. Click Add Sheets.

Figure 10 Load additional sheet

6. The Load option will allow you to load data into the current workbook from another workbook. Click

7. Set the Sheet name to HealthData1 and click the green arrow icon to load the new workbook into your current workbook.

8. Verify that you see two tabs at the bottom of your current workbook. Move your mouse over the second one, and a tool tip will show the action and the name you provided for this sheet/tab within your current workbook.

9. Next add another sheet (a sheet should be considered as a new tab in a workbook) in order to

Figure 12 Add sheet using JOIN

10. Name the sheet PatientByHeartFailure.

11. Select Inner as the Join type.

12. Click the arrow next to "Add sheets (at least 2) to join" and select PersonalData.

13. Click the green plus next to "Add sheets (at least 2) to join".

14. Click the arrow next to "Add sheets (at least 2) to join" and select HealthData1.

16. Select the PatientID column as the Join column.

Figure 14 Join column

17. Click the green checkmark icon to complete the operation.

18. Click the arrow in any column header and select Organize Columns…

19. Scroll down to the end and remove the column PatientID1 (there are two PatientID columns because of the join).

20. Click the green checkmark icon to complete the operation.

22. Click the Run button to run the workbook (it takes a while).

23. We will now export this workbook to a table so that the data is accessible to analytics tools such as SPSS and Cognos. In the next labs we will use this dataset in SPSS to build a predictive heart failure model and in Cognos to build a simple dashboard. Click the Create Table button and then click Confirm.

22. Let's now check that the table has been created. Go to the Files tab, then select the Catalog Tables tab. You should see the schema sheets and, under it, the table patientdata. We will use this table in the next labs with SPSS and Cognos.

23. Let's go back to the BigSheets tab and use the analytical functions built into the tool. Select the workbook PatientData.

24. To easily visualize the data by heart failure, you can create a chart. Click Add chart and then click on

25. Complete the following information to produce a pie chart
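A pie chart of this kind shows, for each distinct value of the chosen column, its share of all rows. The Python sketch below illustrates that calculation under stated assumptions: the column values are hypothetical sample data, and the lab does not give the actual name or coding of the heart-failure column.

```python
from collections import Counter

# Hypothetical values of a heart-failure column, one per patient row.
heart_failure = ["Y", "N", "N", "Y", "N", "N", "N", "Y"]

# Count the rows for each distinct value, then convert counts to percentages.
counts = Counter(heart_failure)
total = sum(counts.values())
shares = {value: 100.0 * n / total for value, n in counts.items()}

for value, pct in shares.items():
    print(f"{value}: {counts[value]} rows, {pct:.1f}% of the pie")
```

Each slice of the chart corresponds to one entry in shares, and the percentages always sum to 100.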

By clicking the small icon above, you get a view of the workflow used to create the PatientData workbook.

Congratulations!

You've successfully completed Lab 1, where you explored the Bluemix Hadoop Analytics service powered by IBM BigInsights.
