HDFS Operations
The main objective of this article is to understand HDFS commands and operations. HDFS (Hadoop Distributed File System) commands are similar to Linux commands, so if you are familiar with Linux you will pick them up easily. You can perform almost all the operations on HDFS that you can on a local file system, such as creating a directory, copying a file, or changing permissions.
Hadoop provides two command prefixes to interact with the file system: hadoop fs (which works with any file system Hadoop supports) and hdfs dfs (which is specific to HDFS).
Before we begin, let's take a look at some system commands.
The version command shows the Hadoop version.
$ hadoop version
The fsck command checks the health of the file system and reports missing files and over-replicated, under-replicated, and corrupt blocks.
$ hdfs fsck /
List Files
The ls command lists all the files/directories under the given HDFS path:
$ hdfs dfs -ls /
The ls command accepts several options. The -d option lists directories as plain entries instead of listing their contents:
$ hdfs dfs -ls -d /
The -h option formats file sizes in a human-readable fashion:
$ hdfs dfs -ls -h /
The -R option recursively lists all files and subdirectories of the given directory:
$ hdfs dfs -ls -R /
or the deprecated shorthand:
$ hdfs dfs -lsr /
Read Files
The cat command displays the content of a file.
$ hdfs dfs -cat /test.log
The text command takes a source file and outputs the file in text format; unlike cat, it can also decode compressed input such as gzip files.
$ hdfs dfs -text /test.log
Creating, Deleting and Copying Files
The touchz command creates a file of zero length.
$ hdfs dfs -touchz /test.txt
The rm command deletes a file; if trash is enabled, the file is moved to the trash instead of being removed immediately.
$ hdfs dfs -rm /test.txt
The -skipTrash option bypasses the trash and deletes the specified file immediately.
$ hdfs dfs -rm -skipTrash /test.txt
The -r option deletes the directory and any content under it recursively.
$ hdfs dfs -rm -r /test.txt
or the deprecated shorthand:
$ hdfs dfs -rmr /test.txt
The cp command copies a file from source to destination within HDFS. In this case, file1 is copied from the hadoop directory to the hadoop1 directory.
$ hdfs dfs -cp /hadoop/file1 /hadoop1
The -f option overwrites the destination if it already exists.
$ hdfs dfs -cp -f /hadoop/file1 /hadoop1
The mv command moves files from source to destination within HDFS.
$ hdfs dfs -mv /hadoop/file1 /hadoop1
This command also accepts multiple sources, in which case the destination must be a directory.
$ hdfs dfs -mv /hadoop/file1 /hadoop/file2 /hadoop/file3 /hadoop/dir1
Directory Operations
To create a new directory on HDFS, we use the mkdir command. The following command creates a directory named data:
$ hdfs dfs -mkdir /data
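Like its Linux counterpart, mkdir also accepts a -p option that creates any missing parent directories along the path. A sketch (the path /data/raw/2024 is hypothetical, and a running HDFS cluster is assumed):

```shell
$ hdfs dfs -mkdir -p /data/raw/2024
```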
The rmdir command deletes a directory, which must be empty:
$ hdfs dfs -rmdir /data
Upload/Download Files
The put command copies a file from the local file system to HDFS.
$ hdfs dfs -put /home/user/file /hadoop
The copyFromLocal command is similar to the put command, except that the source is restricted to a local file reference. Its -f option overwrites the destination if it already exists.
$ hdfs dfs -copyFromLocal /home/user/file /hadoop
The moveFromLocal command works similarly to the put command, except that the local source is deleted after it is copied.
$ hdfs dfs -moveFromLocal /home/user/file /hadoop
The get command copies a file from HDFS to the local file system.
$ hdfs dfs -get /hadoop/file /home/user/
The copyToLocal command works similarly to the get command, except that the destination is restricted to a local file reference.
$ hdfs dfs -copyToLocal /hadoop/file /home/user/
The moveToLocal command works similarly to the get command, except that the HDFS source is deleted after it is copied. (Note that in many Hadoop releases this option only prints a "not implemented yet" message.)
$ hdfs dfs -moveToLocal /hadoop/file /home/user/
The appendToFile command appends the content of a local file (text1) to an HDFS file (text2).
$ hdfs dfs -appendToFile /home/user/text1 /hadoop/text2
Permissions and Ownership Operations
To change permissions of a file:
$ hdfs dfs -chmod 755 /hadoop/file
To change ownership of a file:
$ hdfs dfs -chown user:hadoop /hadoop/file
To change group association of a file:
$ hdfs dfs -chgrp user /hadoop
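To see what an octal mode such as 755 means, here is a minimal local-filesystem sketch (assuming a Linux shell with GNU coreutils; HDFS chmod interprets the digits the same way):

```shell
# Each octal digit encodes read(4) + write(2) + execute(1),
# so 755 = rwx for the owner, r-x for group and others.
touch demo.txt
chmod 755 demo.txt
stat -c '%A %a' demo.txt   # prints: -rwxr-xr-x 755
```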
Other Commands
stat returns the stat information on the path.
getfacl displays the Access Control Lists (ACLs) of files and directories.
setfacl sets the Access Control Lists (ACLs) of files and directories.
getmerge takes a source directory and a destination file as input and concatenates the files in the source directory into the destination local file.
setrep changes the replication factor of a file.
count counts the number of directories, files, and bytes under the paths that match the specified file pattern.
tail displays the last kilobyte of the file to stdout.
expunge empties the trash.
du displays the sizes of the files and directories contained in the given directory, or the length of a file.
dus displays a summary of file lengths (deprecated; use du -s instead).
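For reference, here is an illustrative invocation of each of these commands. The paths are hypothetical and a running HDFS cluster is assumed, so treat this as a sketch rather than guaranteed output:

```shell
$ hdfs dfs -stat "%r %o %n" /hadoop/file       # replication, block size, name
$ hdfs dfs -getfacl /hadoop/file
$ hdfs dfs -setfacl -m user:hadoop:rw- /hadoop/file
$ hdfs dfs -getmerge /hadoop/dir /home/user/merged.txt
$ hdfs dfs -setrep -w 2 /hadoop/file           # -w waits for replication to finish
$ hdfs dfs -count -h /hadoop
$ hdfs dfs -tail /hadoop/file
$ hdfs dfs -expunge
$ hdfs dfs -du -h /hadoop
$ hdfs dfs -du -s -h /hadoop                   # summary; replaces the deprecated dus
```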