Thursday 4 October 2012

Interacting With Hadoop Distributed File System (HDFS) - Hadoop Online Training


This section will familiarize you with the commands necessary to interact with HDFS, loading and retrieving data, as well as manipulating files. This section makes extensive use of the command-line.
The bulk of commands that communicate with the cluster are performed by a monolithic script namedbin/hadoop. This will load the Hadoop system with the Java virtual machine and execute a user command. The commands are specified in the following form:
  user@machine:hadoop$ bin/hadoop moduleName -cmd args...
The moduleName tells the program which subset of Hadoop functionality to use. -cmd is the name of a specific command within this module to execute. Its arguments follow the command name.
Two such modules are relevant to HDFS: dfs and dfsadmin. Their use is described in the sections below.

COMMON EXAMPLE OPERATIONS

The dfs module, also known as "FsShell," provides basic file manipulation operations. Their usage is introduced here.
A cluster is only useful if it contains data of interest. Therefore, the first operation to perform is loading information into the cluster. For purposes of this example, we will assume an example user named "someone" -- but substitute your own username where it makes sense. Also note that any operation on files in HDFS can be performed from any node with access to the cluster, whose conf/hadoop-site.xml is configured to set fs.default.name to your cluster's NameNode. We will call the fictional machine on which we are operating anynode. Commands are being run from the "hadoop" directory where you installed Hadoop. This may be /home/someone/src/hadoop on your machine, or/home/foo/hadoop on someone else's. These initial commands are centered around loading information into HDFS, checking that it's there, and getting information back out of HDFS.

No comments:

Post a Comment