Docker in Data Science: A Friendly Beginner’s Guide to Docker

Ahmet Okan YILMAZ
5 min read · Mar 26, 2022

You have probably heard this sentence often: “The code worked on my machine!” If your code works on your machine, it should work anywhere. But how? The answer is Docker.

There are tons of tutorials about Docker on the internet, and at first they left me totally confused. I’ve read a lot of tutorials and watched a lot of videos on YouTube, so I decided to write down everything I learned.

Why should I learn Docker as a Data Scientist?

Using Docker means you can give everyone on your team the same data science environment, so the code runs identically on each machine. You can also deploy the same container to a production server and expect it to behave the same way there. Your projects become shareable and reproducible.

The main reason, in my opinion, is that it helps you turn your ML projects into applications and deploy models to production.

What is Docker?

Docker is an open platform for developing, shipping, and running applications. Docker enables you to separate your applications from your infrastructure so you can deliver software quickly. With Docker, you can manage your infrastructure in the same ways you manage your applications.

Source: https://towardsdatascience.com/how-docker-can-help-you-become-a-more-effective-data-scientist

Differences between Docker and VMs

Containers and virtual machines have similar resource isolation and allocation benefits, but function differently because containers virtualize the operating system instead of hardware.

Difference between containerized applications and virtual machines
Source: https://www.docker.com/resources/what-container/

Docker is a container-based technology, and containers are just the user space of the operating system. At a low level, a container is a set of processes that are isolated from the rest of the system, running from a distinct image that provides all the files necessary to support those processes. It is built for running applications. In Docker, running containers share the host operating system’s kernel.

Virtual machines are made up of the user space plus the kernel space of an operating system. With VMs, the server hardware is virtualized. Each VM has its own operating system and applications, and shares hardware resources with the host.
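
One way to see the shared kernel for yourself is to compare the kernel release reported by the host with the one reported from inside a container. This is only a small sketch: the alpine image is my own choice of a tiny example image, and the container part is skipped when no Docker daemon is reachable.

```shell
# Kernel release reported by the host.
host_kernel="$(uname -r)"
echo "host kernel:      $host_kernel"

# Kernel release reported from inside a container. The alpine image is only
# an example; this part is skipped when Docker is not installed or not running.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    container_kernel="$(docker run --rm alpine uname -r)" || container_kernel="unavailable"
    echo "container kernel: $container_kernel"
fi
```

On a native Linux host the two values should match, because the container has no kernel of its own. With Docker Desktop on macOS or Windows, the container reports the kernel of the Linux VM that Docker runs in the background.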

Docker Terminology

Here’s a bit of Docker terminology you should know:

Container: A container is a runnable instance of an image. You can create, start, stop, move, or delete a container using the Docker API or CLI.

Image: An image is a read-only template with instructions for creating a Docker container.

Dockerfile: Docker can build images automatically by reading the instructions from a Dockerfile. A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image.

Docker Daemon: The Docker daemon listens for Docker API requests and manages Docker objects such as images, containers, networks, and volumes.

Docker CLI: It is a command line tool that lets you talk to the Docker daemon.

Docker Registry: A Docker registry is a service that stores Docker images.

Docker Repository: A Docker repository is a collection of Docker images that share the same name but have different tags.

Docker Hub: Docker Hub is a service provided by Docker for finding and sharing container images with your team.
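
To make the repository and tag terms a bit more concrete, here is a minimal sketch that splits an image reference such as python:3.10-slim into its repository and tag. The function name split_image_ref is made up for this illustration and is not part of Docker. It relies on the fact that Docker uses the tag latest when you do not give one, and it does not handle digest references (the @sha256:... form).

```shell
# Split an image reference into repository and tag (illustrative helper).
split_image_ref() {
    ref="$1"
    last="${ref##*/}"   # last path component, e.g. "python:3.10-slim"
    case "$last" in
        # A colon in the last component separates the name from the tag.
        *:*) tag="${last##*:}"; repo="${ref%:*}" ;;
        # No tag given: Docker falls back to "latest".
        *)   tag="latest"; repo="$ref" ;;
    esac
    # The repository part keeps any registry host prefix, such as localhost:5000/.
    echo "repository=$repo tag=$tag"
}

split_image_ref python:3.10-slim
split_image_ref ubuntu
split_image_ref localhost:5000/team/model:v2
```

Looking only at the last path component keeps the port in a registry address like localhost:5000 from being mistaken for a tag.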

Play with Docker

If you want to try Docker without installing it, go to Play with Docker. Before logging in to Play with Docker, you need to create an account on Docker Hub.

Source: https://labs.play-with-docker.com/

After logging in, create an instance. Then enjoy Docker.

Install Docker

To install Docker, just follow the official installation guide for your operating system. I’m using Arch Linux, so I’ll explain the installation on Linux.

$ yay -S docker docker-compose

To enable docker.service so that it starts at boot:

$ sudo systemctl enable docker

And let’s start docker.service:

$ sudo systemctl start docker

To test that everything is working as expected, run the following docker command (on a fresh Linux install you may need to prefix it with sudo):

$ docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
   (amd64)
3. The Docker daemon created a new container from that image which runs the executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/

For more examples and ideas, visit:
https://docs.docker.com/get-started/

Let’s take a look at what happened:

· If you do not have the hello-world image locally, Docker pulls it from Docker Hub.

· Docker creates a new container.

· Docker allocates a read-write filesystem to the container.

· Docker creates a network interface to connect the container to the default network.

· Docker starts the container and executes its default command.

· Then the container stops but is not removed.

The docker run command did all of these tasks for you. If you want, you can customize the docker run command with options.
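
To get a feeling for what docker run bundles together, the steps can be approximated by hand with separate commands. This is a rough sketch, not how Docker is implemented internally, and the function name run_step_by_step is made up for this example. One difference to keep in mind: docker pull always contacts the registry, while docker run only pulls when the image is missing locally.

```shell
# Approximate "docker run IMAGE" with individual commands (illustrative helper).
run_step_by_step() {
    image="$1"
    # Fetch the image from the registry.
    docker pull "$image"
    # Create a container with its own read-write layer; print and keep its ID.
    container_id="$(docker create "$image")"
    # Connect the container to its network, start it, and stream its output.
    docker start --attach "$container_id"
    # The container has stopped but still exists, so it is listed here.
    docker ps --all --filter "id=$container_id"
    # Clean up the stopped container.
    docker rm "$container_id"
}
```

With Docker running, you could call it as run_step_by_step hello-world and compare the output with a plain docker run hello-world.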

Basic Docker Commands

You can find all Docker commands in the official Docker CLI reference. I have listed the most needed commands.
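
As a rough illustration, the function below walks through a handful of everyday commands using the tiny hello-world image. It is a small selection of my own and far from exhaustive; the function name docker_tour and the container name demo are made up for this example.

```shell
# A short tour of everyday commands, using the hello-world image.
docker_tour() {
    docker pull hello-world              # download an image from a registry
    docker images                        # list images stored locally
    docker run --name demo hello-world   # create and start a container named "demo"
    docker ps                            # list running containers
    docker ps -a                         # list all containers, including stopped ones
    docker logs demo                     # show the output the container produced
    docker rm demo                       # remove the stopped container
    docker rmi hello-world               # remove the image
}
```

For containers that keep running, docker stop followed by the container name stops them, and docker exec -it followed by the container name and bash opens a shell inside, provided the image ships bash.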

Before creating a basic ML deployment example, let’s talk about Dockerfile and Docker Compose.

Docker can build images automatically by reading the instructions from a Dockerfile. A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image.
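
To give an idea of what such a file looks like, the sketch below writes a small Dockerfile for a Python data science environment into a scratch directory. Everything in it is an assumption chosen for illustration: the python:3.10-slim base image, the two packages in requirements.txt, and the image name used afterwards.

```shell
# Work in a scratch directory so no existing files are overwritten.
project_dir="$(mktemp -d)"

# Example dependencies; replace them with what your project needs.
cat > "$project_dir/requirements.txt" <<'EOF'
scikit-learn
pandas
EOF

# A minimal Dockerfile: base image, dependencies, project files, default command.
cat > "$project_dir/Dockerfile" <<'EOF'
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "-c", "import sklearn; print(sklearn.__version__)"]
EOF

echo "Dockerfile written to $project_dir"
```

From inside that directory, docker build -t my-ds-env . would build the image, and docker run --rm my-ds-env would start a container that prints the installed scikit-learn version. Copying requirements.txt before the rest of the project lets Docker reuse the cached pip install layer when only your code changes.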

Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services. Then, with a single command, you create and start all the services from your configuration.
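
As an illustration, the sketch below writes a small docker-compose.yml with two services: a Jupyter notebook and a PostgreSQL database. The service names, the images, the port, the notebooks folder, and the password are all assumptions made for this example, not a recommended setup.

```shell
# Work in a scratch directory so no existing files are overwritten.
compose_dir="$(mktemp -d)"

# Two services: a notebook server and the database it depends on.
cat > "$compose_dir/docker-compose.yml" <<'EOF'
services:
  notebook:
    image: jupyter/scipy-notebook
    ports:
      - "8888:8888"
    volumes:
      - ./notebooks:/home/jovyan/work
    depends_on:
      - db
  db:
    image: postgres:14
    environment:
      POSTGRES_PASSWORD: example
EOF

echo "Compose file written to $compose_dir"
```

From inside that directory, docker-compose up -d would create and start both services, and docker-compose down would stop and remove them.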

Dockerfile and Docker Compose automate this work for us. We could also do the same steps manually, but these two tools save us from that. Let’s get our hands dirty in the next article.

Written by Ahmet Okan YILMAZ

Industrial Engineer | Data Scientist | Factory Manager
