1 | P a g e © 2014. Agile ISS. All rights reserved.
CONFIGURING ECLIPSE FOR AWS EMR DEVELOPMENT
With this post we would like to share a tutorial on configuring the Eclipse IDE (Integrated Development Environment) for Amazon AWS EMR scripting and development. Once we started creating our own bootstrap scripts for EMR, we quickly realized that juggling Notepad++, PuTTY, WinSCP, the Command Prompt for the EMR CLI and Git, each in its own window, gets cumbersome, and that an integrated environment would be much nicer. Eclipse seemed perfect for this, as it has a plugin for almost anything one can think of.
Using this environment, we developed a bootstrap script that launches a keep-alive MapR M3 cluster on AWS EMR, and we were quite happy with the result.
All the steps are summarized in this index. Feel free to jump to a specific topic that interests you or follow all the steps from the beginning. Please leave your comments as we’d like to hear back from you!
Installing Amazon EMR Command Line Interface (CLI)
Setting up Eclipse IDE for AWS/Hadoop Development in Shell Scripts And Python
Installing Oracle JDK
Installing Eclipse
Installing PyDev Plugin
Installing AWS Toolkit
Installing ShellEd
Configuring SSH
Configuring GIT
Setting Up CMD Prompt Inside Eclipse
Developing EMR Bootstrap Script
Launching EMR MapR Cluster
Connecting To Master Node From Eclipse
Running the Bootstrap Script
Testing the Cluster
Installing Amazon EMR Command Line Interface (CLI)
This installation is done on a local Windows computer (not on AWS). The EMR CLI is written in Ruby, so Ruby must be installed as a prerequisite.
1. Install Ruby 1.8.7 on Windows
a. Download the installer from http://rubyforge.org/frs/download.php/76524/rubyinstaller-1.8.7-p371.exe
c. Check that Ruby and RubyGems are installed properly by running “ruby -v” and “gem -v” from the command prompt.
a. Unzip the contents of elastic-mapreduce-ruby.zip into C:\AWS\elastic-mapreduce-cli
b. Create a credentials.json file under C:\AWS\elastic-mapreduce-cli
{ "access_id": "AKIAJGJKJSHKDJF6GUIOIEUR", "private_key": "dfsdfKJKDFSDFldfsdf99484nksdjnwr934", "key-pair": "key", "key-pair-file": "C:\\key.ppk", "log_uri": "s3n://mybucket/logs/", "region": "us-east-1" }
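Because a stray curly quote or an unescaped backslash in credentials.json is easy to miss, a quick check before the first CLI run can help. The following minimal Python sketch (Python is our choice here; the EMR CLI itself does not need it) parses the file and lists missing or blank fields. The REQUIRED_KEYS tuple simply mirrors the example above and is our assumption, not an official EMR CLI schema.

```python
from __future__ import annotations

import json
from pathlib import Path

# Field names copied from the example credentials.json above (assumption, not an official schema).
REQUIRED_KEYS = ("access_id", "private_key", "key-pair", "key-pair-file", "log_uri", "region")


def missing_credential_keys(text: str) -> list[str]:
    """Return the required keys that are absent or blank in the given credentials.json text."""
    # json.loads raises json.JSONDecodeError on curly quotes, or on a Windows path
    # whose backslash is not doubled (a single backslash starts an escape in JSON).
    data = json.loads(text)
    return [key for key in REQUIRED_KEYS if not str(data.get(key, "")).strip()]


def check_credentials_file(path: Path) -> list[str]:
    """Read credentials.json from disk; utf-8-sig also accepts files saved with a BOM."""
    return missing_credential_keys(path.read_text(encoding="utf-8-sig"))
```

For example, check_credentials_file(Path(r"C:\AWS\elastic-mapreduce-cli\credentials.json")) returns an empty list when every field from the example is present.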
Setting up Eclipse IDE for AWS/Hadoop Development in Shell Scripts And Python
Installing Oracle JDK
As stated in the Eclipse readme file, Oracle Java 7u9 is the best-supported JDK for Eclipse. We tried installing the 8u5 version and it worked fine. Download the JDK from http://download.oracle.com/otn-pub/java/jdk/8u5-b13/jdk-8u5-windows-x64.exe and follow the installation procedure.
Installing Eclipse
1. Download the Eclipse JEE version from https://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/kepler/SR2/eclipse-jee-kepler-SR2-win32-x86_64.zip&mirror_id=1135
2. Expand downloaded zip archive into a local folder (in my case it was C:\Users\Dmitri\eclipse)
3. Launch eclipse.exe
4. Select workspace location
Installing PyDev Plugin
Installing AWS Toolkit
1. Go to Help->Install New Software and add the AWS repository at http://aws.amazon.com/eclipse
2. Provide your AWS access keys and click Finish.
4. Click OK and the environment should look like this:
Installing ShellEd
1. Go to Help->Install New Software and add the ShellEd repository at
2. Click Next and follow the installation procedure
Configuring SSH
1. Go to Window->Preferences->General->Network Connections->SSH2.
2. Click the Key Management tab.
3. Click the Load Existing Key button and locate the AWS private key you referenced in credentials.json when setting up the EMR CLI.
4. Click the Save Private Key button, click OK twice to dismiss the warnings, and save the file as id_rsa in your .ssh directory. Eclipse should report that it has successfully saved the public and private keys.
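To confirm the keys ended up where Git will look for them, a small Python sketch like the one below can inspect the .ssh directory. It assumes Eclipse wrote the pair as id_rsa and id_rsa.pub under your home directory's .ssh folder (Path.home() / ".ssh"); if the SSH2 home on the General tab of the same preferences page points elsewhere, pass that folder instead.

```python
from __future__ import annotations

from pathlib import Path


def check_ssh_keys(ssh_dir: Path) -> list[str]:
    """Return problems found with id_rsa / id_rsa.pub in ssh_dir; an empty list means OK."""
    problems: list[str] = []
    private_key = ssh_dir / "id_rsa"
    public_key = ssh_dir / "id_rsa.pub"
    if not private_key.is_file():
        problems.append(f"missing {private_key}")
    if not public_key.is_file():
        problems.append(f"missing {public_key}")
    elif not public_key.read_text(encoding="utf-8").startswith("ssh-rsa "):
        # An OpenSSH-format RSA public key is a single line starting with "ssh-rsa ".
        problems.append(f"{public_key} does not look like an OpenSSH RSA public key")
    return problems
```

For example, check_ssh_keys(Path.home() / ".ssh") should return an empty list after the step above.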
Configuring GIT
1. Go to Window->Preferences->Team->Git->Configuration and set the user name and email parameters.
6. Right-click the project and go to Team->Share Project in the context menu.
8. On the Configure Git Repository screen, click the Create button.
9. Enter the name of the new Git repository
10. Click Finish twice. The project should now be labelled NO-HEAD, which means that nothing has been committed yet.
11. Make the first commit. Right-click the project name and go to Team->Commit in the context menu. Select the .sh file, enter a commit message and click Commit.
13. Push the changes to GitHub. Open the Git Repositories view by going to Window->Show View->Other->Git->Git Repositories.
15. Leave the remote name as “origin” and the Configure push option selected, then click OK.
16. Create your GitHub repository if you haven’t already done so. If you don’t have a GitHub account, then sign up.
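17. Go to Account Settings -> SSH Keys and add a new key. Paste the contents of the public key file (id_rsa.pub) that Eclipse saved in your .ssh directory in the Configuring SSH steps; GitHub expects an OpenSSH public key, not the private *.ppk file referenced in credentials.json.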
19. In Eclipse, paste the copied URL, select ssh as the protocol, leave Password blank and click Finish.
21. If the push fails with non-fast-forward errors, see the discussion and troubleshooting tips at http://stackoverflow.com/questions/3598355/i-am-not-able-to-push-on-git
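For readers who prefer the command line, the EGit steps above roughly correspond to the git commands built below. This is a sketch, not what EGit runs internally; the script name, commit message, remote URL and the "master" branch name are illustrative assumptions.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def git_first_push_commands(script: str, message: str, remote_url: str,
                            branch: str = "master") -> list[list[str]]:
    """Return git CLI commands that roughly mirror the EGit steps above."""
    return [
        ["git", "init"],                                 # Team->Share Project, Create repository
        ["git", "add", script],                          # select the .sh file in the Commit dialog
        ["git", "commit", "-m", message],                # Team->Commit
        ["git", "remote", "add", "origin", remote_url],  # remote named "origin"
        ["git", "push", "-u", "origin", branch],         # push to GitHub and track the branch
    ]


def run_all(commands: list[list[str]], project_dir: Path) -> None:
    """Run each command in project_dir, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=project_dir, check=True)
```

For example, run_all(git_first_push_commands("emr-mapr-bootstrap.sh", "Initial commit", "git@github.com:<user>/<repo>.git"), Path(r"C:\path\to\project")), where the user, repository and project path are placeholders for your own values.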
Setting Up CMD Prompt Inside Eclipse
1. In Eclipse go to Run->External Tools->External Tools Configurations.
2. Select Program and then click the Create New Launch Configuration button (the one with the “+” sign on the left toolbar).
3. Specify “CMD” as the Name, “C:\Windows\System32\cmd.exe” as the Location, and the folder where the EMR CLI is installed (C:\AWS\elastic-mapreduce-cli) as the Working Directory.
5. Click Run to open a Windows console inside Eclipse. If you need to work with another AWS CLI, change the Working Directory to point to it.
Developing EMR Bootstrap Script
Launching EMR MapR Cluster
1. Run the following command from CMD:
C:\AWS\elastic-mapreduce-cli>ruby elastic-mapreduce --create --alive --instance-type m1.large --num-instances 3 --supported-product mapr --name "MapR M3 Cluster" --args "--edition,m3" -v
2. Check the instances in the EC2 Instances window.
C:\AWS\elastic-mapreduce-cli>ruby elastic-mapreduce --describe [Job Flow ID]
4. Check that the MapR Control System (MCS) is running.
a. Go to Security Groups and click on ElasticMapReduce-master
b. Right-click in the list of permissions and select Add Permission from the context menu. Enter port 8453; you can leave the default Network Mask or restrict it to a specific IP address or subnet.
c. Open MCS in the browser at https://master-node-public-dns:8453. The browser will warn about the certificate, and MCS will ask about applying licenses and similar setup steps; accept the warning and complete those steps.
d. Locate the master node by hovering over the green squares in the Dashboard view, then click it and check the running services.
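The same launch can be scripted. The Python sketch below builds the argument list for the elastic-mapreduce calls used in this tutorial, which also sidesteps the curly-quote and dash mangling that copying commands from a web page can introduce, and it pulls the master node's public DNS out of the --describe output. The JSON layout assumed in master_public_dns (JobFlows, Instances, MasterPublicDnsName, as in the DescribeJobFlows API response) is our assumption about what the CLI prints; verify it against your CLI version.

```python
from __future__ import annotations

import json


def emr_create_command(instance_type: str = "m1.large", num_instances: int = 3,
                       name: str = "MapR M3 Cluster", edition: str = "m3",
                       bootstrap_action: str | None = None,
                       verbose: bool = False) -> list[str]:
    """Build the `elastic-mapreduce --create` argument list used in this tutorial."""
    cmd = [
        "ruby", "elastic-mapreduce", "--create", "--alive",
        "--instance-type", instance_type,
        "--num-instances", str(num_instances),
        "--supported-product", "mapr",
        "--name", name,                 # one list element, so no manual quoting is needed
        "--args", f"--edition,{edition}",
    ]
    if bootstrap_action:
        cmd += ["--bootstrap-action", bootstrap_action]
    if verbose:
        cmd.append("-v")
    return cmd


def master_public_dns(describe_output: str) -> str | None:
    """Return the master node's public DNS name from `--describe` JSON output, if present.

    Assumed layout: {"JobFlows": [{"Instances": {"MasterPublicDnsName": "..."}}]}
    """
    flows = json.loads(describe_output).get("JobFlows", [])
    if not flows:
        return None
    return flows[0].get("Instances", {}).get("MasterPublicDnsName")
```

For example, subprocess.run(emr_create_command(verbose=True), cwd=r"C:\AWS\elastic-mapreduce-cli", check=True) mirrors step 1; passing a list lets Python handle the quoting of "MapR M3 Cluster".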
Connecting To Master Node From Eclipse
1. Go to Window->Open Perspective->Other->Remote System Explorer
2. Click the “Define a connection to remote system” button (the one with the “+” sign on the toolbar on the left).
5. Right-click “EMR MapR Master Node” in the Remote Systems window and choose Connect from the context menu. Change the User ID to hadoop, leave the password blank and click OK.
6. The Properties window should say “Some subsystems connected”.
7. Right-click Ssh Terminals under the EMR MapR Master Node connection and select Launch Terminal from the context menu. The terminal should open with the master node prompt.
8. Switch back to the AWS Management perspective.
9. Open the Remote Systems view by going to Window->Show View->Other->Remote Systems->Remote Systems.
10. Open the Terminals view by going to Window->Show View->Other->Remote Systems->Terminals
11. Your environment should now show AWS Explorer, Project Explorer and Remote Systems on the left, the file you are working on at the top, and the EC2, Git, Windows CMD and master-node Terminal views at the bottom.
12. Right-click Local Files under Local in the Remote Systems view and select New->Filter from the context menu. Enter the path to your project's working directory.
13. After the filter is set up, you should see the project folder and the *.sh file you are editing.
14. Create the /opt/mapr/custom directory on the master node and change its owner to the hadoop user:
$ sudo mkdir /opt/mapr/custom
$ sudo chown hadoop /opt/mapr/custom
15. Follow a similar procedure to set up a filter under Sftp Files that points to the /opt/mapr/custom folder on the master node.
16. You should now be able to drag and drop the file from the Local folder to the Sftp folder.
Running the Bootstrap Script
The finished script is available on GitHub at https://github.com/dmitrisafine/aws-emr-mapr/blob/master/aws-emr-mapr/emr-mapr-bootstrap.sh
1. Upload the script to S3.
ruby elastic-mapreduce --create --alive --instance-type m1.large --num-instances 3 --supported-product mapr --name "MapR M3 Cluster" --args "--edition,m3" --bootstrap-action "s3://mybucket/emr-mapr-bootstrap.sh"
3. Note your Job Flow ID.
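The upload in step 1 can be done from the AWS Explorer in Eclipse. As a scripted alternative, here is a minimal Python sketch built around the upload_file(Filename, Bucket, Key) method of a boto3 S3 client. boto3 is an assumption on our part (it is not part of this tutorial's setup), so the client is passed in rather than imported, and the function returns the s3:// URI to use with --bootstrap-action.

```python
from __future__ import annotations

from pathlib import Path
from typing import Protocol


class S3Uploader(Protocol):
    """Anything with an upload_file(Filename, Bucket, Key) method, such as a boto3 S3 client."""

    def upload_file(self, Filename: str, Bucket: str, Key: str) -> None: ...


def upload_bootstrap(client: S3Uploader, script: Path, bucket: str,
                     key: str | None = None) -> str:
    """Upload the bootstrap script and return its s3:// URI for --bootstrap-action."""
    key = key or script.name  # default: store the file under its own name at the bucket root
    client.upload_file(str(script), bucket, key)
    return f"s3://{bucket}/{key}"
```

With boto3 installed and credentials configured, upload_bootstrap(boto3.client("s3"), Path("emr-mapr-bootstrap.sh"), "mybucket") would return "s3://mybucket/emr-mapr-bootstrap.sh", the URI used in the command above.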
Testing the Cluster
1. Find out the public DNS of the master node by running:
C:\AWS\elastic-mapreduce-cli>ruby elastic-mapreduce --describe <Job Flow ID>
2. Go to MCS at https://master_node_public_dns:8453 and log in with hadoop/hadoop as the username and password.
3. Go to Hue at http://master_node_public_dns:8888 and log in with hadoop/mapr as the username and password. It should say “All OK. Configuration check passed.”
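Before opening the browser, it can help to confirm that the MCS and Hue ports are reachable from your machine, for example when a security group change has not taken effect. This Python sketch only tests TCP connectivity to ports 8453 and 8888; it does not log in or verify the “All OK” check. If Hue does not respond, port 8888 may need a rule in the ElasticMapReduce-master security group just like 8453 (our assumption; the steps above only open 8453).

```python
from __future__ import annotations

import socket


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def check_cluster_ui(master_dns: str) -> dict[str, bool]:
    """Check the MCS (8453) and Hue (8888) ports used in this tutorial."""
    return {"MCS": port_open(master_dns, 8453), "Hue": port_open(master_dns, 8888)}
```

For example, check_cluster_ui("<master node public DNS>") returns {"MCS": True, "Hue": True} when both ports accept connections; the DNS name is a placeholder for the value from the --describe output.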