Thursday 4 October 2012

Retrieving data from the Hadoop Distributed File System (HDFS)


There are multiple ways to retrieve files from the distributed file system. One of the easiest is to use cat to display the contents of a file on stdout. (It can, of course, also be used to pipe the data into other applications or destinations.)
Step 1: Display data with cat.
If you have not already done so, upload some files into HDFS. In this example, we assume that a file named "foo" has been loaded into your home directory on HDFS.
  someone@anynode:hadoop$ bin/hadoop dfs -cat foo
  (contents of foo are displayed here)
  someone@anynode:hadoop$
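Because cat writes to stdout, its output can be piped into ordinary Unix tools. As a quick illustration (assuming the same foo file as above), you could count the lines of an HDFS file without ever copying it to the local file system:
  someone@anynode:hadoop$ bin/hadoop dfs -cat foo | wc -l
  (number of lines in foo is displayed here)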
Step 2: Copy a file from HDFS to the local file system.
The get command is the inverse operation of put; it will copy a file or directory (recursively) from HDFS into a target of your choosing on the local file system. An equivalent operation is -copyToLocal.
  someone@anynode:hadoop$ bin/hadoop dfs -get foo localFoo
  someone@anynode:hadoop$ ls
  localFoo
  someone@anynode:hadoop$ cat localFoo
  (contents of foo are displayed here)
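If you prefer the more descriptive name, the same copy can be written with -copyToLocal. The sketch below assumes the same foo file and a hypothetical destination name localFoo2:
  someone@anynode:hadoop$ bin/hadoop dfs -copyToLocal foo localFoo2
  someone@anynode:hadoop$ ls
  localFoo  localFoo2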
Like the put command, get will operate on directories in addition to individual files.
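For example, assuming a hypothetical HDFS directory named myDir that you have already populated, passing the directory name instead of a file name causes get to copy it recursively:
  someone@anynode:hadoop$ bin/hadoop dfs -get myDir localDir
  someone@anynode:hadoop$ ls localDir
  (contents of myDir are listed here)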
