1 | P a g e © 2014. Agile ISS. All rights reserved.

CONFIGURING ECLIPSE FOR AWS EMR DEVELOPMENT

In this post we share a tutorial on configuring the Eclipse IDE (Integrated Development Environment) for Amazon AWS EMR scripting and development. Once we started creating our own bootstrap scripts for EMR, we quickly realized that it gets cumbersome to use Notepad++, PuTTY, WinSCP, Command Prompt for the EMR CLI, and Git all in different windows, and that it would be nice to have one integrated environment for all of it. Eclipse seemed perfect for this, as it has a plugin for almost anything one can think of.

We have developed a bootstrap script for launching a keep-alive MapR M3 cluster on AWS EMR using this environment and were quite happy with it.

All the steps are summarized in this index. Feel free to jump to a specific topic that interests you or follow all the steps from the beginning. Please leave your comments as we’d like to hear back from you!

Installing Amazon EMR Command Line Interface (CLI)

Setting up Eclipse IDE for AWS/Hadoop Development in Shell Scripts And Python

Installing Oracle JDK

Installing Eclipse

Installing PyDev Plugin

Installing AWS Toolkit

Installing ShellEd

Configuring SSH

Configuring GIT

Setting Up CMD Prompt Inside Eclipse

Developing EMR Bootstrap Script

Launching EMR MapR Cluster

Connecting To Master Node From Eclipse

Running the Bootstrap Script

Installing Amazon EMR Command Line Interface (CLI)

This installation is done on a local Windows computer (not on AWS). The EMR CLI is written in Ruby and therefore requires Ruby to be installed as a prerequisite.

1. Install Ruby 1.8.7 on Windows

a. Get installation package from http://rubyforge.org/frs/download.php/76524/rubyinstaller-1.8.7-p371.exe

c. Check that Ruby and RubyGems are installed properly by running “ruby -v” and “gem -v” from command prompt.

a. Unzip the content of elastic-mapreduce-ruby.zip into C:\AWS\elastic-mapreduce-cli

b. Create a credentials.json file under C:\AWS\elastic-mapreduce-cli

{
  "access_id": "AKIAJGJKJSHKDJF6GUIOIEUR",
  "private_key": "dfsdfKJKDFSDFldfsdf99484nksdjnwr934",
  "key-pair": "key",
  "key-pair-file": "C:\\key.ppk",
  "log_uri": "s3n://mybucket/logs/",
  "region": "us-east-1"
}
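A credentials file is easy to break with typographic quotes, an unescaped backslash in the Windows path, or a stray space inside a key. The following is a minimal Python sketch for checking the file before using the CLI. The function name check_credentials is hypothetical, and the list of required keys is an assumption taken only from the example above, not from the EMR CLI documentation.

```python
import json

# Keys taken from the credentials.json example in this tutorial (an assumption,
# not an authoritative list of what the EMR CLI requires).
REQUIRED_KEYS = ("access_id", "private_key", "key-pair",
                 "key-pair-file", "log_uri", "region")


def check_credentials(text):
    """Parse credentials.json text and return a list of problems found."""
    try:
        data = json.loads(text)
    except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
        return ["not valid JSON: %s" % err]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]
    problems = []
    for key in REQUIRED_KEYS:
        value = data.get(key)
        if not isinstance(value, str) or not value:
            problems.append("missing or empty value for '%s'" % key)
        elif value != value.strip():
            problems.append("stray whitespace around value of '%s'" % key)
    return problems
```

To use it, read the file with open(r"C:\AWS\elastic-mapreduce-cli\credentials.json").read() and pass the text to check_credentials; an empty list means none of these particular problems were found.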

Setting up Eclipse IDE for AWS/Hadoop Development in Shell Scripts And Python

Installing Oracle JDK

As stated in the Eclipse readme file, Oracle Java 7u9 is the best supported JDK for Eclipse. We tried installing version 8u5 and it worked fine. Download the JDK from http://download.oracle.com/otn-pub/java/jdk/8u5-b13/jdk-8u5-windows-x64.exe and follow the installation procedure.

Installing Eclipse

1. Download the Eclipse JEE version from https://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/kepler/SR2/eclipse-jee-kepler-SR2-win32-x86_64.zip&mirror_id=1135

2. Expand the downloaded zip archive into a local folder (in my case it was C:\Users\Dmitri\eclipse)

3. Launch eclipse.exe

4. Select workspace location

Installing PyDev Plugin

Installing AWS Toolkit

1. Go to Help->Install New Software and add the AWS repository at http://aws.amazon.com/eclipse

2. Provide AWS access keys and click Finish

4. Click OK and the environment should look like this:

Installing ShellEd

1. Go to Help->Install New Software and add the ShellEd repository at

2. Click Next and follow the installation procedure

Configuring SSH

1. Go to Window->Preferences->General->Network Connections->SSH2

2. Click the Key Management tab.

3. Click Load Existing Key button. Locate your AWS private key used in credentials.json file when setting up AWS EMR CLI.

4. Click Save Private Key button, click OK twice to bypass warnings and save id_rsa file in your .ssh directory. It should say that it has successfully saved public and private keys.
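If later SSH or Git connections fail, one quick thing to rule out is that the saved id_rsa file is not in the format you expect. The sketch below is a rough check, assuming that Eclipse writes the key as a PEM text file whose first line is a "BEGIN ... PRIVATE KEY" header and that .ssh lives under your home folder. Both helper names are hypothetical.

```python
import os


def looks_like_pem_private_key(text):
    """Return True if text starts with a PEM private-key header line."""
    lines = text.strip().splitlines()
    if not lines:
        return False
    first = lines[0].strip()
    return first.startswith("-----BEGIN ") and first.endswith("PRIVATE KEY-----")


def default_key_path():
    """Path where step 4 is assumed to save the key (.ssh under the home folder)."""
    return os.path.join(os.path.expanduser("~"), ".ssh", "id_rsa")
```

For example, looks_like_pem_private_key(open(default_key_path()).read()) returns False for a PuTTY *.ppk file, whose first line starts with "PuTTY-User-Key-File".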

Configuring GIT

1. Go to Window->Preferences->Team->Git->Configuration and change the user and email parameters

6. Right click on the project and go to Team->Share Project in the context menu.

8. On the Configure Git Repository screen click Create button

9. Enter the name of the new Git repository

10. Click Finish two times and the screen should look like this. NO-HEAD means that nothing has been committed yet.

11. Doing first commit. Right click on the project name and go to Team->Commit in the context menu. Select the .sh file, provide comments and click Commit.

13. Pushing the changes to GitHub. Bring up Git Repositories view by going to Window->Show View->Other->Git->Git Repositories.

15. Leave Remote name as “origin” and Configure push option selected and click OK

16. Create your GitHub repository if you haven’t already done so. If you don’t have a GitHub account, then sign up.

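17. Go to Account Settings -> SSH Keys and add a new key. Paste the public key that belongs to the *.ppk file used in credentials.json when setting up the EMR CLI (for example, the public key text shown on the Key Management tab in Eclipse). GitHub expects only the public key, so do not paste the private part of the *.ppk file.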

19. In Eclipse paste the copied URL, select ssh for protocol, leave the Password blank and click Finish.

21. If there are non-fast-forward errors, refer to the discussion and troubleshooting tips at http://stackoverflow.com/questions/3598355/i-am-not-able-to-push-on-git

Setting Up CMD Prompt Inside Eclipse

1. In Eclipse go to Run->External Tools->External Tools Configurations.

2. Click on Program and then on Create New Launch Configuration button (the one with “+” sign on the left toolbar)

3. Specify “CMD” as a Name, “C:\Windows\System32\cmd.exe” as a Location and a folder where EMR CLI is installed as a Working Directory.

5. Click Run and you have a running Windows console in your Eclipse. If you need to work with another AWS CLI, then change the Working Directory to point to it.

Developing EMR Bootstrap Script

Launching EMR MapR Cluster

1. Run the following command from CMD:

C:\AWS\elastic-mapreduce-cli>ruby elastic-mapreduce --create --alive --instance-type m1.large --num-instances 3 --supported-product mapr --name "MapR M3 Cluster" --args "--edition,m3" -v

2. Check the instances in EC2 Instances window.

C:\AWS\elastic-mapreduce-cli>ruby elastic-mapreduce --describe [Job Flow ID]

4. Check if MapR Control System (MCS) is running.

a. Go to Security Groups and click on ElasticMapReduce-master

b. Right click in the list of permissions and select Add Permission from the context menu. Enter port 8453. You can leave the default value for Network Mask, or restrict it to a specific IP address or subnet.

c. Open MCS in the browser at https://master-node-public-dns:8453. It’s going to warn about the certificate, ask about applying licenses, etc. Do all that.

d. Locate master node by hovering over green squares in the Dashboard view and click on it. Check the running services.
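If the MCS page does not load, it helps to know whether port 8453 is reachable at all before suspecting the browser or the certificate. Below is a small Python sketch of such a check using only the standard library. The function name port_is_open is hypothetical, and you would substitute the host name of your own master node.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

A call such as port_is_open("master-node-public-dns", 8453) returning False usually points at the security group rule from step b or at a cluster that is still starting, rather than at MCS itself.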

Connecting To Master Node From Eclipse

1. Go to Window->Open Perspective->Other->Remote System Explorer

2. Click the “Define a connection to remote system” button (the one with the “+” sign on the toolbar on the left)

5. Right click on the “EMR MapR Master Node” in the Remote Systems window and choose Connect from the context menu. Change User ID to hadoop, leave password blank and click OK.

6. In the Properties window it has to say “Some subsystems connected”

7. Right click on Ssh Terminals under EMR MapR Master Node connection and select Launch Terminal from the context menu. The terminal should launch with the master node prompt.

8. Switch back to AWS Management perspective

9. Open the Remote Systems view by going to Window->Show View->Other->Remote Systems->Remote Systems.

10. Open the Terminals view by going to Window->Show View->Other->Remote Systems->Terminals

11. Your environment should look like the screenshot below with AWS Explorer, Project Explorer and Remote Systems on the left, the opened file to work on at the top and all the EC2, Git, Windows CMD and Terminal connected to master node at the bottom.

12. Right click on Local Files under Local in Remote Systems view and select New->Filter from the context menu. Enter the path to the working directory for your project.

13. After the filter is set up you should see the project folder and the *.sh file you are editing

14. Create /opt/mapr/custom directory on the master node and change the owner to the hadoop user

$ sudo mkdir /opt/mapr/custom

$ sudo chown hadoop /opt/mapr/custom

15. Follow a similar procedure to set a filter under Sftp Files that points to the /opt/mapr/custom folder on the master node.

16. Now you should be able to drag and drop the file from Local to Sftp folders.

Running the Bootstrap Script

The finished script is located in GitHub at https://github.com/dmitrisafine/aws-emr-mapr/blob/master/aws-emr-mapr/emr-mapr-bootstrap.sh

1. Upload the script to S3.

ruby elastic-mapreduce --create --alive --instance-type m1.large --num-instances 3 --supported-product mapr --name "MapR M3 Cluster" --args "--edition,m3" --bootstrap-action "s3://mybucket/emr-mapr-bootstrap.sh"

3. Note your Job Flow ID.
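The two launch commands in this tutorial differ only in the bootstrap action, and both are easy to mistype because of the quoting around the cluster name and the --args value. As a sketch, the command can be assembled as an argument list in Python so that no shell quoting is involved. The helper names build_create_command and launch are hypothetical, and the flags are only the ones already shown in the commands above.

```python
import subprocess


def build_create_command(name, instance_type="m1.large", num_instances=3,
                         edition="m3", bootstrap_action=None):
    """Return the EMR CLI argument list used in this tutorial."""
    command = [
        "ruby", "elastic-mapreduce",
        "--create", "--alive",
        "--instance-type", instance_type,
        "--num-instances", str(num_instances),
        "--supported-product", "mapr",
        "--name", name,
        "--args", "--edition," + edition,
    ]
    if bootstrap_action:
        # Optional S3 location of the bootstrap script.
        command += ["--bootstrap-action", bootstrap_action]
    return command


def launch(command, cli_dir=r"C:\AWS\elastic-mapreduce-cli"):
    """Run the command from the CLI folder and return its text output."""
    return subprocess.check_output(command, cwd=cli_dir, universal_newlines=True)
```

For example, launch(build_create_command("MapR M3 Cluster", bootstrap_action="s3://mybucket/emr-mapr-bootstrap.sh")) would run the same command as above, assuming the CLI is installed in the folder used earlier.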

Testing the Cluster

1. Find out the public DNS of the master node by running

C:\AWS\elastic-mapreduce-cli>ruby elastic-mapreduce --describe <Job Flow ID>

2. Go to MCS at https://master_node_public_dns:8453 and log in with hadoop/hadoop as the username and password.

3. Go to Hue at http://master_node_public_dns:8888 and log in with hadoop/mapr as the username and password. It should say “All OK. Configuration check passed.”
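Step 1 above requires picking the master node's public DNS name out of the --describe output by eye. The sketch below shows one way to extract it in Python and build the two URLs used in steps 2 and 3. It assumes that the output is JSON with a "JobFlows" list whose entries carry "Instances" -> "MasterPublicDnsName"; check this against your own output, since the layout is an assumption here. Both function names are hypothetical.

```python
import json


def master_public_dns(describe_output):
    """Return the master node's public DNS name from --describe output, or None."""
    data = json.loads(describe_output)
    for flow in data.get("JobFlows", []):
        # "Instances" may be missing or null while the cluster is starting.
        name = (flow.get("Instances") or {}).get("MasterPublicDnsName")
        if name:
            return name
    return None


def cluster_urls(dns_name):
    """Build the MCS and Hue URLs used in the testing steps."""
    return {
        "mcs": "https://%s:8453" % dns_name,
        "hue": "http://%s:8888" % dns_name,
    }
```

A None result typically means the cluster has not been assigned a master node yet, in which case running --describe again a little later is the simplest fix.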
