- Export the project as a runnable
jar using the run configuration that
you just implicitly created, but for
library handling pick "Copy required
libraries...". We are going to use
the libraries that are on the
cluster to avoid conflicts. Keep
track of where the jar goes.
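- If you would rather package from the
  command line than through Eclipse's
  export wizard, a roughly equivalent
  step is sketched below. It assumes
  your compiled classes are in bin/
  and your main class is named
  WordCount; both are placeholders, so
  adjust them to your project:
    jar cfe wordcount.jar WordCount -C bin .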
- On your local machine, download the
2.7.1 binary package of Hadoop from the
Apache Hadoop Releases page
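- For example, from a terminal (the
  URL below is the usual Apache
  archive location for 2.7.1; verify
  it against the releases page):
    wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz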
- Transfer these nine files to
  wcpkneel (e.g., with scp, as
  sketched after this list):
- The jar you exported
- The binary hadoop package
that ends in ".tar.gz"
- An input file for you to test your word counting on,
perhaps this one
- capacity-scheduler.xml
- core-site.xml
- hadoop-env.sh
- mapred-site.xml
- yarn-env.sh
- yarn-site.xml
This may take a long time if you are
doing it from a slow spot on the
net.
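- A sketch of the transfer using scp,
  assuming all nine files sit in your
  current local directory and
  <user_name> is your account on
  wcpkneel:
    scp <yourJar>.jar hadoop-2.7.1.tar.gz input.txt \
        capacity-scheduler.xml core-site.xml hadoop-env.sh \
        mapred-site.xml yarn-env.sh yarn-site.xml \
        <user_name>@wcpkneel:~/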
- On the remote machine, uncompress
  the Hadoop binary package. If the
  file ends in ".tar.gz" you can use
  the command "tar xvofz <file_name>"
  to do that.
- This will create a directory called hadoop-2.7.1
- cd into it and make a directory
(mkdir) called "conf" (for
configuration)
- Put the six configuration files
  listed above into that directory.
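- For example, assuming the six files
  landed in your home directory on
  wcpkneel:
    mv ~/capacity-scheduler.xml ~/core-site.xml ~/hadoop-env.sh \
       ~/mapred-site.xml ~/yarn-env.sh ~/yarn-site.xml conf/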
- Now, from the hadoop-2.7.1 directory
  you should be able to run
    bin/hdfs dfs -ls /
  to list what is on the distributed
  file system: the one that is split
  across all the nodes in the cluster.
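- If the commands don't seem to pick
  up your configuration, you can point
  at the conf directory explicitly;
  --config is a standard option of the
  stock launcher scripts:
    bin/hdfs --config conf dfs -ls /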
- Make your own directory in the
distributed file system with the
command:
bin/hdfs dfs -mkdir /user/<user_name>
- Look to see what is in your
distributed file system directory
with the command:
bin/hdfs dfs -ls /user/<user_name>
- You can also see what's in the
  distributed file system by browsing
  it in a web browser
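- In Hadoop 2.x the NameNode's web
  interface normally listens on port
  50070, so assuming wcpkneel hosts
  the NameNode, the browser view
  should be at something like:
    http://wcpkneel:50070/explorer.html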
- Make your own input directory in the
distributed file system with the
command:
bin/hdfs dfs -mkdir /user/<user_name>/input
- Copy your input file from
  wcpkneel's filesystem into the
  distributed file system using this
  command:
    bin/hdfs dfs -copyFromLocal input.txt /user/<user_name>/input
- Check to make sure the file
arrived there by using the -ls
command or the filesystem
browser
- If you need to delete a file, use
  the -rm command in place of -mkdir
  or -ls above.
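- For example, to delete the test
  input file copied in earlier:
    bin/hdfs dfs -rm /user/<user_name>/input/input.txt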
- The files on the dfs may be deleted
  at any time if the cluster crashes,
  so don't keep anything important
  there.
- If the warning about native
  libraries bothers you, you can
  replace the files in lib/native with
  these that I compiled locally on
  wcpkneel. It bugged me, so I
  replaced them.
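- One cautious way to swap them in,
  keeping the originals around in case
  something breaks (the archive name
  is a placeholder for however the
  replacement files arrive):
    mv lib/native lib/native.orig
    mkdir lib/native
    tar xvofz <native_libs>.tar.gz -C lib/native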
- Now run your program on the cluster:
bin/hadoop jar <yourJar>.jar /user/<user_name>/input /user/<user_name>/output
- If everything went okay, then your answer should be in the output directory. View it in the file browser.
- If you use the sample input file from above, then this should be your output: sample-output.txt
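- You can also print the result from
  the command line; with the default
  MapReduce output naming, reducers
  write files called part-r-00000,
  part-r-00001, and so on:
    bin/hdfs dfs -cat /user/<user_name>/output/part-r-*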
- You can see your program execute here
  on the cluster
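- The YARN ResourceManager web
  interface, which lists running and
  finished applications, normally
  listens on port 8088 in Hadoop 2.x,
  so assuming wcpkneel hosts the
  ResourceManager it should be at:
    http://wcpkneel:8088/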