This Tech Talk is free and open to everyone. Register below to get a link to join the live event.
Format: Presentation + Q&A
Date: Wednesday, August 5, 2020, 1:00–2:00 p.m. ET
If you can’t join us live, the video recording will be published here as soon as it’s available.
About the Talk
Kubernetes is hard. In our spirit for all things simple, a team of engineers and designers at DigitalOcean set out to create a Kubernetes experience that developers can love. They built features that help you go from zero to running applications as quickly as possible, without the hassle of management and maintenance.
Hear from Phil Dougherty, Senior Product Manager at DigitalOcean, who will walk through how you can easily set up your own Kubernetes cluster.
Cert-manager is a Kubernetes add-on designed to assist with the creation and management of TLS certificates. Similar to Certbot, cert-manager can automate the process of creating and renewing self-signed and signed certificates for a large number of use cases, with a specific focus on container orchestration tools like Kubernetes.
Note
This guide assumes a working knowledge of Kubernetes key concepts, including master and worker nodes, Pods, Deployments, and Services. For more information on Kubernetes, see our Beginner’s Guide to Kubernetes series.
Understanding cert-manager Concepts
cert-manager is divided into a number of components and microservices, each designed to perform a specific task in the certificate lifecycle.
Issuers and ClusterIssuers
Certificate creation begins with Issuers and ClusterIssuers, resources that represent certificate authorities and can generate signed certificates using a specific issuer type. An issuer type represents the method used to create your certificate, such as SelfSigned for a self-signed certificate and ACME for certificates requested from ACME servers, such as those operated by Let’s Encrypt. All supported issuer types are listed in cert-manager’s documentation.
While Issuer resources can only create certificates in the namespace they were created in, ClusterIssuers can create certificates for all namespaces. This guide provides an example that demonstrates how a ClusterIssuer creates certificates for all namespaces in the cluster.
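As a rough illustration of the difference in scope, the following sketch writes a minimal cluster-wide issuer manifest to a file. It assumes the installed cert-manager release serves the cert-manager.io/v1 API (older releases use a different API version). The resource name selfsigned-cluster-issuer and the file name are placeholders chosen for this example.

```shell
# Write a minimal self-signed ClusterIssuer manifest to a local file.
# Assumptions: the cert-manager.io/v1 API version and the issuer name
# are illustrative and may need adjusting for your installation.
cat > selfsigned-clusterissuer.yaml <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-cluster-issuer
spec:
  selfSigned: {}
EOF

# Once cert-manager is installed, the manifest can be applied with:
#   kubectl apply -f selfsigned-clusterissuer.yaml
```

Because a ClusterIssuer is not namespaced, the manifest has no namespace field under metadata. A namespaced Issuer would look the same apart from kind: Issuer and a namespace entry.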
Certificates and CertificateRequests
Although Issuers are responsible for defining the method used to create a certificate, a Certificate resource must also be created to define how a certificate is renewed and kept up to date.
When a Certificate resource is created or changed, or when the certificate it references needs renewal, cert-manager creates a corresponding CertificateRequest resource, which contains the base64-encoded X.509 certificate signing request (CSR). If issuance succeeds, the CertificateRequest also holds the signed certificate, and its Ready condition is set to True.
Note
A CertificateRequest resource is not designed to be used directly by a user; instead, it is consumed by controllers or similar mechanisms where needed.
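To make the renewal settings concrete, here is a sketch of a Certificate manifest. The names (example-tls, selfsigned-cluster-issuer), the domain example.com, the durations, and the cert-manager.io/v1 API version are all assumptions made for illustration rather than values from this guide.

```shell
# Write an example Certificate manifest to a local file.
# All names, the domain, and the durations below are hypothetical.
cat > example-certificate.yaml <<'EOF'
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-tls
  namespace: default
spec:
  secretName: example-tls   # Secret that stores the signed certificate and key
  duration: 2160h           # requested lifetime (90 days)
  renewBefore: 360h         # begin renewal 15 days before expiry
  dnsNames:
    - example.com
  issuerRef:
    name: selfsigned-cluster-issuer
    kind: ClusterIssuer
EOF

# After applying the manifest, the CertificateRequest that cert-manager
# creates on your behalf can be inspected with:
#   kubectl apply -f example-certificate.yaml
#   kubectl get certificaterequest --namespace default
```

The issuerRef block is what ties the Certificate to an issuer. Setting kind: ClusterIssuer lets a Certificate in any namespace reference the same cluster-wide issuer.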
ACME Orders and Challenges
For external certificates from ACME servers, cert-manager must be able to solve ACME challenges in order to prove ownership of DNS names and addresses being requested.
An Order resource represents and encapsulates the multiple ACME challenges the certificate request requires for domain validation. The Order resource is created automatically when a CertificateRequest referencing an ACME Issuer or ClusterIssuer has been created.
Challenge resources represent all of the steps in an ACME challenge that must be completed for domain validation. Although defined by the Order, a separate Challenge resource is created for each DNS name that is being validated, and each is scheduled separately.
ACME Order and Challenge resources are only created for Issuers and ClusterIssuers with a type of ACME.
Note
Order and Challenge resources are never created manually by a user; they are instead derived from CertificateRequest resources and the Issuer’s type. After they are issued, Order and Challenge resources cannot be changed.
This feature includes the ability to request certificates through Let’s Encrypt.
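An ACME issuer for Let’s Encrypt might look roughly like the sketch below. It assumes the cert-manager.io/v1 API, the Let’s Encrypt staging endpoint, an HTTP-01 solver, and an ingress controller with the class nginx. The issuer name, Secret name, and email address are placeholders.

```shell
# Write an example ACME ClusterIssuer manifest to a local file.
# Assumptions: staging endpoint, HTTP-01 solver, ingress class "nginx";
# the names and email address are hypothetical placeholders.
cat > letsencrypt-staging-issuer.yaml <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-staging-account-key   # stores the ACME account key
    solvers:
      - http01:
          ingress:
            class: nginx
EOF

# Orders and Challenges created for certificates that use this issuer
# can be listed with:
#   kubectl get orders,challenges --all-namespaces
```

Using the staging endpoint while testing avoids the stricter rate limits of the production Let’s Encrypt service. Certificates issued by staging are not trusted by browsers.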
Installing Cert-Manager
Cert-Manager can be easily installed through a single command as follows:
As the installation completes, you should see a number of required resources created, including a cert-manager namespace, RBAC rules, CRDs, and a webhook component. To confirm that the installation was a success, enter the following:
kubectl get pods --namespace cert-manager
The output is similar to the following:
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-766d5c494b-l9sdb              1/1     Running   0          19m
cert-manager-cainjector-6649bbb695-bz999   1/1     Running   0          19m
cert-manager-webhook-68d464c8b-86tqw       1/1     Running   0          19m
Next Steps
To learn how to apply some of the concepts learned in this guide, see the Configuring Load Balancing with TLS Encryption on a Kubernetes Cluster guide.
More Information
You may wish to consult the following resources for additional information on this topic. While these are provided in the hope that they will be useful, please note that we cannot vouch for the accuracy or timeliness of externally hosted materials.
This guide is published under a CC BY-ND 4.0 license.
Kubeflow is an excellent option for training and evaluating machine learning models in public and private clouds. Kubeflow is designed to make your machine learning experiments portable and scalable.
Start by creating Jupyter notebooks in the cloud. Once you’re confident in your model, you can scale your model to run on thousands of machines.
Kubeflow optimises your model and breaks it down into smaller tasks that can be processed in parallel. Then, it distributes the tasks to several computers and waits until the results are ready.
Are you ready to train your model at scale?
Before You Begin
You can run this tutorial locally using MiniKF. However, you should have at least 12 GB of RAM, 2 CPUs, and 50 GB of disk space available.
The kfctl command line interface (CLI) used to deploy Kubeflow currently only works on Linux and Mac. If you’re on Windows, you can use WSL2 or a Docker container to work around this limitation.
Most Kubeflow pipelines require Kubernetes Volumes that can be attached to several nodes at once (ReadWriteMany). Currently, the only mode supported by the Linode Block Storage CSI driver is ReadWriteOnce, meaning that it can only be connected to one Kubernetes node at a time.
Caution
This guide’s example instructions create several billable resources on your Linode account. If you do not want to keep using the example cluster created with this guide, be sure to delete it when you have finished. If you remove the resources afterward, you will only be billed for the hour(s) that the resources were present on your account. For more information see our How Linode Billing Works guide. For a full list of plan prices, visit our Pricing page.
Create an LKE Cluster
Follow the instructions in Deploying and Managing a Cluster with Linode Kubernetes Engine Tutorial to create and connect to an LKE cluster.
The official Kubeflow documentation recommends provisioning a cluster with at least 4 CPU cores, 12GB of memory and 50GB of space available. We recommend running three 16GB Linodes — that should give you enough resources to scale your models.
You can verify that the cluster is up and that all nodes are ready with:
kubectl get nodes
The output should be similar to:
NAME                        STATUS   ROLES    AGE   VERSION
lke7189-9006-5f05145fc9a3   Ready    <none>   8h    v1.17.3
lke7189-9006-5f051460a1e2   Ready    <none>   8h    v1.17.3
lke7189-9006-5f0514617a87   Ready    <none>   8h    v1.17.3
Install Kubeflow
To install Kubeflow, you need three parts:
A Kubernetes cluster, which you already provisioned in the previous step.
kfctl — a command-line tool that configures and deploys Kubeflow.
A KfDef file — a recipe of components that should be included in your Kubeflow installation.
Download and Install kfctl
You can download and install kfctl from the official repository:
Download the kfctl v1.0.2 release from the Kubeflow releases page.
Unpack the tarball with:
tar -xvf kfctl_v1.0.2_.tar.gz
Add the location of the kfctl binary to the PATH environment variable. If you don’t add the location of the binary to PATH, you must use the full path to the kfctl binary each time you run it.
export PATH=$PATH:<path to where kfctl was unpacked>
Note, at the time of writing this guide, there is no Windows release available. However, you can use WSL2 or a Docker container to work around this limitation.
Verify that the binary is installed correctly with:
kfctl version
The KfDef file
The last piece of the puzzle is the KfDef file. Think of the KfDef file as a list of components that should be installed with Kubeflow. As an example, Kubeflow can be configured to use the Spark operator, Katib (Hyperparameter Tuning), Seldon serving, etc.
You can download the following KfDef that includes a lighter version of Kubeflow as well as Dex — an identity provider useful for managing user access.
If you open the file, you can see the various components that are about to be installed.
Pay attention to lines 116 and 118. Those lines contain the default username and password that you will use to log in to Kubeflow, [email protected] and 12341234. It’s a good idea to change these to non-default values.
You should notice that a new folder named kustomize was created. If you inspect the folder, you should find a collection of components and configurations.
The command reads the KfDef definition and the kustomize folder, and submits all resources to the cluster.
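The build-then-apply flow described above can be sketched as a small shell function. It assumes the KfDef file has been saved locally and that kfctl is on your PATH. The function name deploy_kubeflow and the file name kfdef.yaml are hypothetical.

```shell
# Hypothetical helper: generate the kustomize folder from a KfDef file,
# then submit the resulting resources to the cluster.
deploy_kubeflow() {
  if [ -z "${1:-}" ]; then
    echo "usage: deploy_kubeflow <path-to-kfdef.yaml>" >&2
    return 2
  fi

  # Step 1: read the KfDef file and generate the kustomize folder locally.
  kfctl build -V -f "$1" || return 1

  # Step 2: submit the generated resources to the cluster.
  kfctl apply -V -f "$1"
}
```

You would call it as deploy_kubeflow kfdef.yaml from the directory where you want the kustomize folder to be created. Splitting the work into two steps leaves room to review or edit the generated configuration before anything reaches the cluster.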
The process could easily take 15 to 20 minutes, depending on your cluster specs. You can monitor the progress of the installation from another terminal with:
kubectl get pods --all-namespaces
Even after kfctl completes the installation, it might take a little longer for all Pods to reach the Running state.
Accessing Kubeflow
Once the installation is completed, you can decide how you will use Kubeflow. You have two options:
You can temporarily create a tunnel to your Kubeflow cluster. The cluster is private, and only you can access it.
You can expose the cluster to the internet with a NodeBalancer.
Option 1: Creating a Tunnel to Kubeflow
If you prefer creating a tunnel, execute the following command:
Visit http://localhost:8080 in your browser. Skip to the Logging In section.
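One common way to open such a tunnel is kubectl port-forward. The sketch below assumes the istio-ingressgateway Service in the istio-system namespace serves HTTP on port 80 and forwards it to a local port (8080 by default). The function name open_kubeflow_tunnel is hypothetical.

```shell
# Hypothetical helper: forward a local port to the Istio ingress gateway.
# The command keeps running until you interrupt it with Ctrl+C, and the
# tunnel closes as soon as it stops.
open_kubeflow_tunnel() {
  local_port="${1:-8080}"
  kubectl port-forward --namespace istio-system \
    service/istio-ingressgateway "${local_port}:80"
}
```

Running open_kubeflow_tunnel with no arguments makes the dashboard reachable at http://localhost:8080 for as long as the command is running.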
Option 2: Expose Kubeflow with a NodeBalancer
If you prefer a more permanent solution, you can expose the login page with a NodeBalancer. Execute the following command:
kubectl patch service --namespace istio-system istio-ingressgateway -p '{"spec": {"type": "LoadBalancer"}}'
The command exposes the Istio Ingress Gateway to external traffic.
You can obtain the IP address for the load balancer with:
kubectl get service --namespace istio-system istio-ingressgateway
The output should look like:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
istio-ingressgateway LoadBalancer 10.128.29.15 139.999.26.160 15020:31944/TCP,80:31380/TCP,443:31390/TCP,31400:31400/TCP,15029:32622/TCP,15030:31175/TCP,15031:31917/TCP,15032:31997/TCP,15443:30575/TCP 8m2s
Note the value in the EXTERNAL-IP column, and open that address in your browser.
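If you would rather read the address from a script than from the table, a JSONPath query can extract it. This is a sketch that assumes the load balancer reports an IP address rather than a hostname. The function name kubeflow_external_ip is hypothetical.

```shell
# Hypothetical helper: print only the external IP of the ingress gateway.
# The output is empty until the load balancer has been provisioned.
kubeflow_external_ip() {
  kubectl get service --namespace istio-system istio-ingressgateway \
    --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
```

An empty result usually means provisioning is still in progress, so it is worth retrying after a minute or two.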
Logging In
If you haven’t changed the credentials as previously suggested, log in with the default username and password from the KfDef file.
Once logged in, you should see Kubeflow’s dashboard.
You have now successfully installed Kubeflow.
Kubeflow has two frequently used features: pipelines and Jupyter notebooks. In the next section, you will create a Jupyter notebook.
Jupyter notebooks
A Jupyter notebook is a convenient environment to explore and create machine learning models. The notebook runs in Kubeflow, so you can create environments that are as big as your Linodes. For example, if you selected 16GB Linode nodes, you might be able to create notebook servers with up to 14GB of memory and 3 CPUs.
Creating a notebook is straightforward from the Kubeflow dashboard.
Please note, you can’t create notebook servers that have specs higher than a single Linode. The notebook server is not distributed and runs as a single Pod in the Kubernetes cluster. If you wish to run your model at scale, you might want to use Kubeflow’s Pipelines, which are demonstrated in the next section.
Kubeflow pipelines
Kubeflow pipelines are similar to CI/CD pipelines. You define sequential (or parallel) tasks and connect them.
In CI/CD pipelines, the tasks are usually a sequence of build, test, and deploy. In Kubeflow pipelines, you have similar stages: train, predict, and serve.
There are a few examples in the “Pipelines” section that you can test. As an example, you can select the “Data passing in Python components” pipeline.
The pipeline has three independent tasks that are meant to demonstrate how to:
Generate data (i.e. writing to a file).
Consume data (i.e. reading from a file).
Transform data (i.e. reading, processing and writing to a file).
You can explore the Python code using a Jupyter notebook.
The same code is used to generate a pipeline in Kubeflow.
Pipelines are written using Python and a Kubeflow domain-specific language (DSL). While you can leverage your existing Python models, a small number of changes are necessary to teach Kubeflow how to break your model into smaller parts for distributed processing.
The best resource to continue learning Kubeflow pipelines is the official documentation.
This guide is published under a CC BY-ND 4.0 license.