Using Docker to Build a Distributed Hadoop Environment


01 Install Docker

To install Docker on Ubuntu, see the reference: Docker installation, deployment, and basic operations.

On Windows 10, install Docker Desktop for Windows.

Docker Desktop for Windows supports 64-bit Windows 10 Pro with Hyper-V enabled (on v1903 and above, enabling Hyper-V is not required), or 64-bit Windows 10 Home v1903 and above.

Installing Docker Desktop is straightforward: download the installation package from the official download page, then run the Docker Desktop Installer.exe file.

Installation errors

If an error about the WSL 2 kernel occurs during the installation of Docker Desktop, click the link in the error message to update the Linux kernel.

The link opens a download page for the WSL2 Linux kernel update package for x64 machines. Download and install the update package, then click Restart in the Docker Desktop error dialog; the installation will complete and Docker Desktop will start.
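If Docker Desktop still fails to start, it may also help to make WSL 2 the default WSL version. This extra step is an assumption, not part of the original walkthrough:

# Run in PowerShell; requires the WSL kernel update installed above
wsl --set-default-version 2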

Start running Docker

After the installation is complete, a Docker Desktop shortcut appears on the desktop. Double-click it to start Docker; once startup succeeds, the whale icon appears in the Windows taskbar.

You can now run Docker commands in PowerShell.
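As a quick sanity check (an addition, not in the original walkthrough), you can verify that the daemon is reachable and can run containers using the standard hello-world image:

# Show client/server versions to confirm the daemon is up
docker version
# Pull and run the official hello-world test image
docker run hello-world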

Mirror acceleration in China

Pulling images from Docker Hub can be slow in China, so configuring domestic registry mirrors helps speed things up.

Configuring this in Docker Desktop on Windows is simple: right-click the Docker icon in the taskbar tray and select Settings, choose Docker Engine in the left navigation menu, edit the JSON on the right as shown below, and click Apply & Restart. Docker will restart and apply the configured mirror addresses.

{
  "registry-mirrors": [
    "https://hub-mirror.c.163.com",
    "https://mirror.baidubce.com"
  ]
}

After the configuration is complete, run the docker info command to verify it; if the console output contains the following, the configuration succeeded:

Registry Mirrors:
 https://hub-mirror.c.163.com/

02 Download the Hadoop image

To create Hadoop containers, we need a suitable Hadoop image. Here we use the popular docker-hadoop project on GitHub. Clone the repository locally with the following command:

git clone https://github.com/big-data-europe/docker-hadoop.git

Then enter the docker-hadoop directory and run

docker-compose up -d

to download the Hadoop images and create the containers.
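The first run pulls several images and can take a while. If you want to watch startup progress, the standard compose log command works here:

# Follow the logs of all services; Ctrl+C to stop following
docker-compose logs -f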

After the command has finished, use the docker container ls command to view the running containers; you should see the following five nodes: namenode, datanode, resourcemanager, nodemanager, and historyserver.
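For a compact view (an optional variant of the command above, not from the original), you can restrict the listing to container names and status:

# Print only the Names and Status columns
docker container ls --format "table {{.Names}}\t{{.Status}}"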

After the Hadoop cluster has started successfully, each node can be reached at the following URLs:

Namenode: http://<dockerhadoop_IP_address>:9870/dfshealth.html#tab-overview
History server: http://<dockerhadoop_IP_address>:8188/applicationhistory
Datanode: http://<dockerhadoop_IP_address>:9864/
Nodemanager: http://<dockerhadoop_IP_address>:8042/node
Resource manager: http://<dockerhadoop_IP_address>:8088/

Opening the Namenode URL in a browser shows the Hadoop cluster management page.
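To find <dockerhadoop_IP_address>, one option is to ask Docker for the container's address on the compose network (a helper step, not in the original; on Docker Desktop, the ports published in docker-compose.yml may also be reachable via localhost):

# Print the namenode container's IP address on its Docker network
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' namenode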

Add data nodes

At this point, the Hadoop cluster has been created. To add nodes, modify the docker-compose.yml file.

For example, to add two more data nodes (datanode2 and datanode3) to the current cluster, modify the datanode section of the docker-compose.yml file as follows (see the note on volume declarations after the listing):

  datanode:
    image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
    container_name: datanode
    restart: always
    volumes:
      - hadoop_datanode:/hadoop/dfs/data
    environment:
      SERVICE_PRECONDITION: "namenode:9870"
    env_file:
      - ./hadoop.env
  datanode2:
    image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
    container_name: datanode2
    restart: always
    volumes:
      - hadoop_datanode2:/hadoop/dfs/data
    environment:
      SERVICE_PRECONDITION: "namenode:9870"
    env_file:
      - ./hadoop.env
  datanode3:
    image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
    container_name: datanode3
    restart: always
    volumes:
      - hadoop_datanode3:/hadoop/dfs/data
    environment:
      SERVICE_PRECONDITION: "namenode:9870"
    env_file:
      - ./hadoop.env
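Note that docker-compose also requires each new named volume to be declared at the top level. Assuming the repository's existing volume names, the volumes: section would be extended roughly like this:

volumes:
  hadoop_namenode:
  hadoop_datanode:
  hadoop_datanode2:
  hadoop_datanode3:
  hadoop_historyserver: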

Then re-run docker-compose up -d to add the nodes.
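To confirm that all three datanodes have registered with the namenode, one added check (not in the original) is an HDFS admin report:

# Run inside the namenode container; look for "Live datanodes (3)" in the output
docker exec namenode hdfs dfsadmin -report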

03 Test the Hadoop cluster

Test preparation

We use a simple word-count MapReduce job to test the Hadoop cluster.

First, download the hadoop-mapreduce-examples JAR package, hadoop-mapreduce-examples-2.7.1.jar (available from Maven Central, for example).

Then copy the JAR package into the namenode container with the following command:

docker cp .\hadoop-mapreduce-examples-2.7.1.jar namenode:/tmp/

Next, create a test file named input.txt with the following content:

We can only go faster, we can only aim higher, we can only become stronger by standing together — in solidarity.

Copy this input file into the namenode container as well:

docker cp .\input.txt namenode:/tmp/

Start testing

First, use the following commands to enter the namenode container and change into the tmp directory:

docker exec -it namenode /bin/bash
cd tmp/

Then use the following command to create an input directory in HDFS

hdfs dfs -mkdir -p /user/root/input

Upload the input.txt file to HDFS:

hdfs dfs -put input.txt /user/root/input
# View the content of the input file
hdfs dfs -cat /user/root/input/input.txt

Tip: the general form of the upload command is hdfs dfs -put <local-file> <hdfs-path>. HDFS itself decides which datanodes store the file's blocks, so you cannot target a specific datanode with -put.

Finally, use the following command to run the wordcount MapReduce job on the Hadoop cluster (the relative paths input and output resolve to /user/root/input and /user/root/output):

hadoop jar hadoop-mapreduce-examples-2.7.1.jar wordcount input output
# View the running result
hdfs dfs -cat output/part-r-00000

The word-frequency output for the input text should look roughly like this (wordcount splits on whitespace and keeps case and punctuation, so We and we are counted separately and faster, keeps its comma):

We	1
aim	1
become	1
by	1
can	3
faster,	1
go	1
higher,	1
in	1
only	3
solidarity.	1
standing	1
stronger	1
together	1
we	2
—	1
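If you want to re-run the job, delete the output directory first, because MapReduce refuses to write to an existing output directory:

# Remove the previous job output from HDFS
hdfs dfs -rm -r output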

References

Docker Hadoop: https://github.com/big-data-europe/docker-hadoop
