The Hadoop Distributed File System (HDFS) supports a write-once, read-many access model, and files in HDFS are typically read sequentially.
No Updates and No Local Cache
Typically, in HDFS we can:
1. Write new files from the local computer to HDFS.
2. Delete files or folders in HDFS.
3. Append data to the end of an existing file.
HDFS also has two limitations:
4. We cannot update the contents of an existing file in place.
5. HDFS does not provide a local cache.
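The rules above can be illustrated with a small in-memory model. This is only a sketch and not an HDFS client: the class `WriteOnceStore` and its method names are hypothetical, and rejecting a second `create` on the same path is a simplification chosen for the example.

```python
import io


class WriteOnceStore:
    """In-memory toy that mimics write-once, append-only file semantics."""

    def __init__(self):
        self._files = {}  # path -> bytearray holding the file contents

    def create(self, path, data):
        # Simplification: a path may be written only once in this toy.
        if path in self._files:
            raise FileExistsError(path)
        self._files[path] = bytearray(data)

    def append(self, path, data):
        # Appending adds bytes at the end and never changes existing bytes.
        self._files[path].extend(data)

    def update(self, path, offset, data):
        # In-place modification is the operation that is not offered.
        raise io.UnsupportedOperation("in-place updates are not supported")

    def delete(self, path):
        del self._files[path]

    def read(self, path):
        # Sequential read of the whole file from start to end.
        return bytes(self._files[path])


store = WriteOnceStore()
store.create("/logs/a.txt", b"one\n")
store.append("/logs/a.txt", b"two\n")
print(store.read("/logs/a.txt"))
```

In this model the only way to change bytes that are already written is to delete the file and write a new one.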
Name Node Metadata, Fsimage and Edit Log
The Name Node metadata tracks only file names, permissions, and the locations of each file's blocks; it does not hold the file data itself. This metadata is compact and is typically held in the Name Node's main memory.
The following two files are stored in the Name Node's local file system.
Fsimage: The persistent image of the file system namespace (the metadata store) is called the Fsimage. It contains the mapping of blocks to files and the file system properties, and it is stored in the Name Node's local file system.
EditLog: The transaction log that persistently records every change made to the file system is called the EditLog. It is also stored in the Name Node's local file system.
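One way to picture how the two files work together is a toy model in which the in-memory namespace is rebuilt from the last image plus the logged changes. This is a sketch under stated assumptions: `ToyNameNode`, the operation names, and the `checkpoint` method are hypothetical and do not reflect the real on-disk formats.

```python
import copy


class ToyNameNode:
    """Toy namespace kept in memory, backed by an image and an edit log."""

    def __init__(self, fsimage=None, edit_log=None):
        # fsimage: last saved namespace; edit_log: changes recorded since then.
        self.fsimage = copy.deepcopy(fsimage) if fsimage else {}
        self.edit_log = list(edit_log) if edit_log else []
        # Rebuild the in-memory namespace: load the image, then replay the log.
        self.namespace = copy.deepcopy(self.fsimage)
        for op in self.edit_log:
            self._apply(op)

    def _apply(self, op):
        kind, path, value = op
        if kind == "create":
            self.namespace[path] = {"perm": value, "blocks": []}
        elif kind == "add_block":
            self.namespace[path]["blocks"].append(value)
        elif kind == "delete":
            self.namespace.pop(path, None)

    def log_and_apply(self, kind, path, value=None):
        # Record the change in the log first, then apply it in memory.
        op = (kind, path, value)
        self.edit_log.append(op)
        self._apply(op)

    def checkpoint(self):
        # Fold the logged changes into a new image and start an empty log.
        self.fsimage = copy.deepcopy(self.namespace)
        self.edit_log = []


nn = ToyNameNode()
nn.log_and_apply("create", "/a", "rw-r--r--")
nn.log_and_apply("add_block", "/a", "blk_1")
nn.checkpoint()
nn.log_and_apply("add_block", "/a", "blk_2")

# A restart sees the same namespace from the image plus the remaining edits.
restarted = ToyNameNode(nn.fsimage, nn.edit_log)
print(restarted.namespace == nn.namespace)
```

Keeping a compact image together with a short log means each change needs only a small log append rather than a rewrite of the whole image.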
HDFS Communication Protocols
- All HDFS communication protocols are layered on top of the TCP/IP protocol.
- A client establishes a connection to a configurable TCP port on the Name Node machine. It talks the ClientProtocol with the Name Node.
- The Data Nodes talk to the Name Node using the Datanode protocol.
- A Remote Procedure Call (RPC) abstraction wraps both the ClientProtocol and the Datanode protocol.
- By design, the Name Node never initiates any RPCs. It only responds to RPC requests issued by Data Nodes or clients.
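The request and response pattern in the last point can be sketched as a passive server object. This is an illustration, not the Hadoop RPC layer: `PassiveNameNode` and its method names are hypothetical, and combining the block report with the heartbeat is a simplification made for the example.

```python
class PassiveNameNode:
    """Toy server that only answers calls and never calls out on its own."""

    def __init__(self):
        self.block_locations = {}   # block id -> set of Data Node ids
        self.pending_commands = {}  # Data Node id -> commands awaiting its next call

    def get_block_locations(self, block_id):
        # Stand-in for a ClientProtocol call made by a client.
        return sorted(self.block_locations.get(block_id, set()))

    def heartbeat(self, datanode_id, block_ids):
        # Stand-in for a Datanode protocol call made by a Data Node.
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(datanode_id)
        # Instructions go back only as the reply to the Data Node's own request.
        return self.pending_commands.pop(datanode_id, [])

    def schedule(self, datanode_id, command):
        # Queue the work instead of contacting the Data Node directly.
        self.pending_commands.setdefault(datanode_id, []).append(command)


nn = PassiveNameNode()
nn.heartbeat("dn1", ["blk_1"])
nn.heartbeat("dn2", ["blk_1", "blk_2"])
nn.schedule("dn1", "replicate blk_2")
print(nn.get_block_locations("blk_1"))
print(nn.heartbeat("dn1", []))
```

Because every exchange starts at a client or a Data Node, anything the Name Node wants a Data Node to do has to wait in a queue until that Data Node next calls in.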