HDFS Operations

Ahmet Okan YILMAZ
4 min read · May 4, 2022


The main objective of this article is to understand HDFS commands and operations. HDFS (Hadoop Distributed File System) commands are similar to Linux commands, so if you are familiar with Linux, you will pick them up easily. Almost all the operations you can do on a local file system — creating a directory, copying a file, changing permissions, and so on — can also be done on HDFS.

Hadoop provides two commands to interact with the file system: hadoop fs and hdfs dfs.
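The two are interchangeable for HDFS paths: hadoop fs is the more general entry point (it also works with the other file systems Hadoop supports, such as the local file system or S3), while hdfs dfs is specific to HDFS. For example, against an HDFS path these two commands produce the same listing:

$ hadoop fs -ls /
$ hdfs dfs -ls /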

Before we begin, let’s take a look at some system commands.

The version command shows the Hadoop version.

$ hadoop version

The fsck command checks the health of the file system: it finds missing files and over-replicated, under-replicated, and corrupted blocks.

$ hdfs fsck /

List Files

The ls command lists all files and directories under the given HDFS path:

$ hdfs dfs -ls /

The ls command has several useful options.

The -d option lists directories as plain entries, showing the directory itself rather than its contents:

$ hdfs dfs -ls -d /

The -h option formats file sizes in a human-readable fashion:

$ hdfs dfs -ls -h /

The -R option recursively lists all files and subdirectories of the given directory:

$ hdfs dfs -ls -R /

or the deprecated shorthand:

$ hdfs dfs -lsr /

Read Files

The cat command displays the contents of a file.

$ hdfs dfs -cat /test.log

The text command takes a source file and outputs it in text format; unlike cat, it can also decode compressed files and SequenceFiles.

$ hdfs dfs -text /test.log

Creating, Deleting and Copying Files

The touchz command creates a file of zero length.

$ hdfs dfs -touchz /test.txt

The rm command deletes a file, moving it to the trash if trash is enabled.

$ hdfs dfs -rm /test.txt

The -skipTrash option bypasses the trash and deletes the specified file immediately.

$ hdfs dfs -rm -skipTrash /test.txt

The -r option deletes a directory and any content under it recursively.

$ hdfs dfs -rm -r /test.txt

or the deprecated shorthand:

$ hdfs dfs -rmr /test.txt

The cp command copies a file from source to destination within HDFS. Here, file1 is copied from the hadoop directory to the hadoop1 directory.

$ hdfs dfs -cp /hadoop/file1 /hadoop1

The -f option overwrites the destination if it already exists.

$ hdfs dfs -cp -f /hadoop/file1 /hadoop1

The mv command moves files from source to destination.

$ hdfs dfs -mv /hadoop/file1 /hadoop1

This command also accepts multiple sources, in which case the destination must be a directory.

$ hdfs dfs -mv /hadoop/file1 /hadoop/file2 /hadoop/file3 /hadoop/dir1

Directory Operations

To create a new directory on HDFS, we use the mkdir command.

The following command will create a directory named data:

$ hdfs dfs -mkdir /data

The rmdir command deletes a directory, provided it is empty:

$ hdfs dfs -rmdir /data

Upload/Download Files

The put command copies a file from the local file system to HDFS.

$ hdfs dfs -put /home/user/file /hadoop

The copyFromLocal command is similar to the put command, except that the source is restricted to a local file reference. The -f option overwrites the destination if it already exists.

$ hdfs dfs -copyFromLocal /home/user/file /hadoop

The moveFromLocal command works similarly to the put command, except that the local source is deleted after it is copied.

$ hdfs dfs -moveFromLocal /home/user/file /hadoop

The get command copies a file from HDFS to the local file system.

$ hdfs dfs -get /hadoop/file /home/user/

The copyToLocal command works similarly to the get command, except that the destination is restricted to a local file reference.

$ hdfs dfs -copyToLocal /hadoop/file /home/user/

The moveToLocal command works similarly to the get command, except that the source is deleted after it is copied.

$ hdfs dfs -moveToLocal /hadoop/file /home/user/

The appendToFile command appends the content of a local file text1 to an HDFS file text2.

$ hdfs dfs -appendToFile /home/user/text1 /hadoop/text2

Permissions and Ownership Operations

To change permissions of a file:

$ hdfs dfs -chmod 755 /hadoop/file

To change ownership of a file:

$ hdfs dfs -chown user:hadoop /hadoop/file

To change group association of a file:

$ hdfs dfs -chgrp user /hadoop
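Each of chmod, chown, and chgrp also accepts a -R option to apply the change recursively through a directory tree, for example:

$ hdfs dfs -chmod -R 755 /hadoop
$ hdfs dfs -chown -R user:hadoop /hadoop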

Other Commands

The stat command returns the stat information on the path.

The getfacl command displays the Access Control Lists (ACLs) of files and directories.

The setfacl command sets the Access Control Lists (ACLs) of files and directories.

The getmerge command takes a source directory and a destination file as input and concatenates the files in the source directory into the destination local file.

The setrep command changes the replication factor of a file.

The count command counts the number of directories, files, and bytes under the paths that match the specified file pattern.

The tail command displays the last kilobyte of the file to stdout.

The expunge command empties the trash.

The du command displays the sizes of files and directories contained in the given directory, or the length of a file.

The dus command, deprecated in favor of du -s, displays a summary of file lengths.
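As a quick sketch of how these look in practice (the paths here are illustrative):

$ hdfs dfs -stat /hadoop/file
$ hdfs dfs -getfacl /hadoop/file
$ hdfs dfs -setfacl -m user:hadoop:rw- /hadoop/file
$ hdfs dfs -getmerge /hadoop /home/user/merged.txt
$ hdfs dfs -setrep 3 /hadoop/file
$ hdfs dfs -count /hadoop
$ hdfs dfs -tail /hadoop/file
$ hdfs dfs -expunge
$ hdfs dfs -du /hadoop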

