Using the Bluemix Analytics for Hadoop Service to Analyse Data
Lab 1: Using the Bluemix Analytics for Hadoop Service to Analyze Data
Lab Objectives: This lab shows you how to use BigSheets, a spreadsheet-style tool accessible from the console of the Analytics for Hadoop service in Bluemix, to analyse large volumes of medical data collected by heart monitors.
Lab Duration: 45 minutes
1. Creating a Bluemix Application with a Hadoop Service Instance
In this section you'll create a Bluemix application with a Hadoop service instance that will be used throughout the rest of the labs.
1. In your browser, go to the Bluemix URL http://bluemix.net and log in if necessary.
2. Make sure you're on the Dashboard tab (if not, click the Dashboard link at the top of the page).
3. Scroll down to the Applications section and click CREATE AN APP.
4. For the template, choose Web.
6. Select the Internet of Things boilerplate.
7. Choose a name for the application and click Create.
Your application should restart; wait a few minutes for it to do so.
8. Next, add a new service to the app: from the app overview, click ADD A SERVICE.
9. Scroll down to the Big Data category and click the icon for the IBM Analytics for Hadoop service. Then click Create. The application must restage again.
2. Uploading the Medical Data to the Analytics for Hadoop Service Instance
In this section you'll upload medical data to the Analytics for Hadoop service instance that you just created. You'll upload the files first, then create workbooks in BigSheets, which give you a spreadsheet-like environment with several tabs.
1. Click the Hadoop service of your app, then click the Launch icon to open the console of the service instance you just created in another browser tab.
Figure 3 Analytics for Hadoop console icon
2. From the console, click the Files tab.
3. In the DFS navigator, expand the user directory and select the biblumix directory.
4. Click the Upload icon.
Figure 4 Upload icon
5. Click Browse and select the file \BDADaysLabs\Lab1-BigInsight\Historical_Personal_Data.txt, where \BDADaysLabs is the root folder of the files provided to you by the instructor. Click OK.
6. Repeat steps 4 and 5 to upload the file \BDADaysLabs\Lab1\Historical_Health_Data.txt.
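Steps 4 to 6 upload the files through the console. If you prefer to script the upload, Hadoop's WebHDFS REST API offers an equivalent; the minimal sketch below only builds the URL for the first step of a WebHDFS CREATE request. It assumes the service exposes a standard WebHDFS endpoint; the host name, port, and path prefix are hypothetical placeholders, not values documented for this lab.

```python
from urllib.parse import quote, urlencode


def webhdfs_create_url(host: str, port: int, hdfs_path: str, user: str,
                       prefix: str = "/webhdfs/v1", overwrite: bool = False) -> str:
    """Build the URL for the first step of a WebHDFS CREATE (upload) request.

    WebHDFS uploads are two-step: an HTTP PUT to this URL answers with a
    307 redirect, and the file bytes are then PUT to the redirect location.
    host, port and prefix are assumptions; some deployments route WebHDFS
    through a gateway that uses a different path prefix.
    """
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be absolute, e.g. /user/biblumix/file.txt")
    query = urlencode({
        "op": "CREATE",
        "user.name": user,  # pseudo-authentication parameter
        "overwrite": str(overwrite).lower(),
    })
    # quote() percent-encodes the path but keeps the "/" separators.
    return f"https://{host}:{port}{prefix}{quote(hdfs_path)}?{query}"


# Hypothetical host and port; replace them with your service's endpoint.
url = webhdfs_create_url("hadoop.example.com", 8443,
                         "/user/biblumix/Historical_Personal_Data.txt", "biblumix")
print(url)
```

Secured clusters usually require credentials rather than the user.name parameter, so treat this as an illustration of the request shape only.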
3. Importing the Data into BigSheets
Now that the sample data is uploaded to HDFS, you can import it into BigSheets and create workbooks that contain that data.
Figure 5 Create BigSheets Workbook
2. Name the workbook PersonalData. In the distributed file system browser, select the file Historical_Personal_Data.csv that you uploaded in the previous step.
3. In the preview pane on the right, select a new reader to map the data into a spreadsheet format. (Currently the reader is Line Reader.) Click the edit icon next to 'Line Reader' and select Comma Separated Value (CSV) Data from the drop-down list.
Figure 6 Changing the reader
4. Click the green check mark to change the reader. Click Fit Columns in the preview pane to make the tabular data appear more compact.
Figure 7 Fit Columns to width
5. Click the green check mark at the bottom to save the PersonalData workbook.
Figure 8 Workbooks link
Click Build new workbook.
13. Name the workbook HealthData. In the distributed file system browser, go to user/biblumix/ and select the file Historical_Health_Data that you uploaded earlier.
14. Change the reader: in the preview pane on the right, select a new reader to map the data into a spreadsheet format. (Currently the reader is Line Reader.) Click the edit icon next to 'Line Reader' and select Comma Separated Value (CSV) Data from the drop-down list.
15. Click the green check mark to change the reader. Click Fit Columns in the preview pane to make the tabular data appear more compact.
16. Click the green check at the bottom to save the workbook HealthData.
17. You now have two workbooks: HealthData and PersonalData.
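Steps 3 and 14 switch the reader from Line Reader to the CSV reader, which is what turns each raw text line into separate columns. The minimal Python sketch below mimics that difference with the standard csv module. It assumes the first line of each file holds the column headers; the sample content and every column name other than PatientID are hypothetical, since the lab does not list the files' columns.

```python
import csv
import io


def read_as_lines(text: str) -> list[str]:
    """Like Line Reader: each line of the file stays a single cell."""
    return text.splitlines()


def read_as_csv(text: str) -> list[dict[str, str]]:
    """Like the CSV reader: the first line supplies the column names and
    every following line is split on commas, respecting quoted fields."""
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical file content; the real files' columns are not listed in the lab.
SAMPLE = 'PatientID,Age,City\n1001,54,"Paris, FR"\n1002,61,Lyon\n'

lines = read_as_lines(SAMPLE)
rows = read_as_csv(SAMPLE)
print(len(lines), lines[1])  # 3 1001,54,"Paris, FR"
print(rows[0]["City"])       # Paris, FR
```

The quoted "Paris, FR" field shows why a real CSV parser is preferable to naively splitting each line on commas.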
4. Combining the Data from Multiple Workbooks and Creating Charts
Because we have two workbooks with a common field (PatientID), we can perform a “join” of the two workbooks and use the result as the basis for exploring the medical study data by heart failure.
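The inner join configured in steps 10 to 17 keeps only patients whose PatientID appears in both sheets. The Python sketch below illustrates that logic on lists of row dictionaries; it is not how BigSheets implements joins internally. Renaming the clashing right-hand key to PatientID1 mirrors the duplicate column removed in step 19; applying the same "append 1" rule to any other clashing column is an assumption. The sample rows and the HeartFailure column name are hypothetical.

```python
def inner_join(left, right, key="PatientID"):
    """Inner join two lists of row dicts on `key`.

    Only key values present in both inputs produce output rows. Clashing
    right-hand column names get a "1" suffix, so the second PatientID
    becomes PatientID1 (assumed naming rule).
    """
    # Index the right-hand rows by key so each lookup is a dict access.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            merged = dict(lrow)
            for col, value in rrow.items():
                name = col + "1" if col in merged else col
                merged[name] = value
            joined.append(merged)
    return joined


def drop_column(rows, column):
    """Remove one column from every row (like Organize Columns > Remove)."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


# Hypothetical sample rows.
personal = [{"PatientID": "1001", "Age": "54"}, {"PatientID": "1002", "Age": "61"}]
health = [{"PatientID": "1001", "HeartFailure": "Yes"},
          {"PatientID": "1003", "HeartFailure": "No"}]

joined = inner_join(personal, health)
print(joined)
# [{'PatientID': '1001', 'Age': '54', 'PatientID1': '1001', 'HeartFailure': 'Yes'}]
print(drop_column(joined, "PatientID1"))
# [{'PatientID': '1001', 'Age': '54', 'HeartFailure': 'Yes'}]
```

Patients 1002 and 1003 each appear in only one input, so the inner join drops them.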
Figure 9 Workbooks link
2. Click the PersonalData workbook link.
3. Click Build New Workbook.
4. Rename it PatientData by clicking the pen (edit) icon, then click Save.
At this stage, at the bottom of the page you have:
5. At the top left, you should see a link called Add sheets. This allows you to perform additional analysis on your data within the current workbook. Click Add sheets.
Figure 10 Load additional sheet
6. The Load option lets you load data into the current workbook from another workbook. Click Load.
7. Set the Sheet name to HealthData1 and click the green arrow icon to load the new workbook into your current workbook.
8. Verify that you see two tabs at the bottom of your current workbook. Move your mouse over the second one; a tooltip shows the action and the name you provided for this sheet (tab) within your current workbook.
9. Next, add another sheet (think of a sheet as a new tab in a workbook) in order to join the two sheets.
Figure 12 Add sheet using JOIN
10. Name the sheet PatientByHeartFailure.
11. Select Inner as the Join type.
12. Click the arrow next to Add sheets (at least 2) to join and select PersonalData.
13. Click the green plus sign next to Add sheets (at least 2) to join.
14. Click the arrow next to Add sheets (at least 2) to join and select HealthData1.
16. Select the PatientID column as the join column.
Figure 14 Join column
17. Click the green check mark icon to complete the operation.
18. Click the arrow in any column header and select Organize Columns…
19. Scroll down to the end and remove the PatientID1 column (there are two patient ID columns because of the join).
20. Click the green check mark icon to complete the operation.
22. Click the Run button to run the workbook (this can take a while).
23. Next, export this workbook to a table so that the data is accessible to analytics tools such as SPSS and Cognos. In the next labs, you will use this dataset in SPSS to build a predictive heart failure model and in Cognos to build a simple dashboard. Click the Create Table button, then click Confirm.
24. Let’s now check that the table has been created. Go to the Files tab, then select the Catalog Tables tab. You should see the schema sheets and, under it, the table patientdata. You will use this table in the next labs with SPSS and Cognos.
25. Go back to the BigSheets tab to use the analytical functions built into the tool, and select the PatientData workbook.
26. To visualize the data by heart failure, create a chart. Click Add chart and then click on
27. Complete the following information to produce a pie chart.
Clicking the small icon above gives you a view of the workflow used to create the PatientData workbook.
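A pie chart of the joined data shows how many patients fall into each heart-failure category. The chart settings from the original screenshot are not reproduced here, so the Python sketch below assumes the chart counts rows per distinct value of a heart-failure column; the column name HeartFailure and the sample rows are hypothetical.

```python
from collections import Counter


def pie_slices(rows, column):
    """Count rows per distinct value of `column` and return, for each value,
    (count, percentage of all rows): the quantities a pie chart's slices show."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {value: (n, round(100 * n / total, 1)) for value, n in counts.items()}


# Hypothetical sample rows.
rows = [{"HeartFailure": "Yes"}, {"HeartFailure": "No"},
        {"HeartFailure": "No"}, {"HeartFailure": "No"}]
print(pie_slices(rows, "HeartFailure"))  # {'Yes': (1, 25.0), 'No': (3, 75.0)}
```

An empty input returns an empty mapping, so the percentage calculation never divides by zero.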