Thursday 4 October 2012

Retrieving data from the Hadoop Distributed File System (HDFS)


There are multiple ways to retrieve files from the distributed file system. One of the easiest is to use cat to display the contents of a file on stdout. (It can, of course, also be used to pipe the data into other applications or destinations.)
Step 1: Display data with cat.
If you have not already done so, upload some files into HDFS. In this example, we assume that a file named "foo" has been loaded into your home directory on HDFS.
  someone@anynode:hadoop$ bin/hadoop dfs -cat foo
  (contents of foo are displayed here)
  someone@anynode:hadoop$
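Because cat writes to stdout, its output can be piped into ordinary Unix tools. As a quick illustration (assuming the same foo file as above), you could count the lines of an HDFS file without ever copying it to the local file system:
  someone@anynode:hadoop$ bin/hadoop dfs -cat foo | wc -l
  (number of lines in foo is displayed here)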
Step 2: Copy a file from HDFS to the local file system.
The get command is the inverse operation of put; it will copy a file or directory (recursively) from HDFS into a target of your choosing on the local file system. An equivalent operation is -copyToLocal.
  someone@anynode:hadoop$ bin/hadoop dfs -get foo localFoo
  someone@anynode:hadoop$ ls
  localFoo
  someone@anynode:hadoop$ cat localFoo
  (contents of foo are displayed here)
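If you prefer the more descriptive name, the same copy can be written with -copyToLocal. The sketch below assumes the same foo file and a hypothetical destination name localFoo2:
  someone@anynode:hadoop$ bin/hadoop dfs -copyToLocal foo localFoo2
  someone@anynode:hadoop$ ls
  localFoo  localFoo2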
Like the put command, get will operate on directories in addition to individual files.
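For example, assuming a hypothetical HDFS directory named myDir that you have already populated, passing the directory name instead of a file name causes get to copy it recursively:
  someone@anynode:hadoop$ bin/hadoop dfs -get myDir localDir
  someone@anynode:hadoop$ ls localDir
  (contents of myDir are listed here)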
