Principal Middleware Architect
Blog: http://blog.christianposta.com
Twitter: @christianposta
Email: christian@redhat.com
Committer on Apache ActiveMQ, Apache Camel, Fabric8
Technology evangelist, recovering consultant
Spent a lot of time working with one of the largest Microservices, web-scale, unicorn companies
Frequent blogger and speaker about open-source, cloud, microservices
Intro / Prep Environments
Day 1: Docker Deep Dive
Day 2: Kubernetes Deep Dive
Day 3: Advanced Kubernetes: Concepts, Management, Middleware
Day 4: Advanced Kubernetes: CI/CD, open discussions
Linux containers
Docker API
Images
Containers
Registry
Application distribution
Dependency management
Application density
Reduced management overhead in terms of VMs
On premise, hybrid, public cloud
Containers run on single Docker host
Containers are ephemeral
Nothing watchdogs the containers
Containers can have external persistence
Containers do not contain
Operating system matters
Immutable infrastructure
DevOps
CI/CD
Who cares: give me a platform to move faster!!!
Lab prerequisites in place!
Verify you have Oracle VirtualBox 4.3.x
Install Kubernetes http://kubernetes.io/v1.0/docs/getting-started-guides/vagrant.html
Windows: hardware virtualization (HVT/VT-x) must be enabled
Understand the architecture
Smoke test
Waiting for each minion to be registered with cloud provider
Validating we can run kubectl commands.
NAME      READY     STATUS    RESTARTS   AGE
Connection to 127.0.0.1 closed.
Kubernetes cluster is running. The master is running at:
https://10.245.1.2
The user name and password to use is located in ~/.kubernetes_vagrant_auth.
calling validate-cluster
Found 1 nodes.
     NAME         LABELS                              STATUS
  1  10.245.1.3   kubernetes.io/hostname=10.245.1.3   Ready
Validate output:
NAME                 STATUS    MESSAGE              ERROR
controller-manager   Healthy   ok                   nil
scheduler            Healthy   ok                   nil
etcd-0               Healthy   {"health": "true"}   nil
Cluster validation succeeded
Done, listing cluster services:
Kubernetes master is running at https://10.245.1.2
KubeDNS is running at https://10.245.1.2/api/v1/proxy/namespaces/kube-system/services/kube-dns
KubeUI is running at https://10.245.1.2/api/v1/proxy/namespaces/kube-system/services/kube-ui
Everything at Google runs in containers!!
Gmail, search, maps
2 billion containers a week
GCE? VMs in containers…
"helmsman of a ship"
Containers @ Google
Open source 6/2014
GKE…
Red Hat a top contributor
100% written in golang
Do we need this?
Different way to look at managing instances: scale
Design for failure
Efficient / Lean/ Simple
Portability
Extensible
How to place containers on a cluster
Smart placement
How to interact with a system that does placement
Different than configuration management
Immutable infrastructure principles
What to do when containers fail?
Containers will fail
Cluster security authZ/authN
Scaling
Grouping/Aggregates
Managing containers by hand is harder than VMs: won’t scale
Automate the boilerplate stuff
Runbooks → Scripts → Config management → Scale
Decouple application from machine!
Applications run on "resources"
Kubernetes manages this interaction of applications and resources
Manage applications, not machines!
What about legacy apps?
Simplicity, Simplicity, Simplicity
Pods
Labels / Selectors
Replication Controllers
Services
etcd
API Server
Scheduler
Controller manager
Open source project started at CoreOS
Distributed database
CAP Theorem? == CP
Raft algorithm/protocol
watchable
etcd provides HA datastore
Nodes are VMs / physical hosts
Nodes need connectivity between them
Ideally same network/data center/availability zone
Kubelet
Watches for pods to be assigned to node
Mount volumes
Install secrets
Runs the pod (via Docker)
Reports pod status / node status
kube-proxy
Connection forwarding
Kube services
Docker
Monitoring
DNS
UI
Logging
Pods
Labels / Selectors
Replication Controllers
Services
A pod is one or more docker containers
Ensures collocation / shared fate
Pods are scheduled and do not move nodes
Docker containers share resources within the pod
Volumes
Network / IP
Port space
CPU / Mem allocations
Pod health probes
Readiness
Liveness
Out of the box
Readiness
Liveness
Probe types (see the sketch after this list)
ExecAction
TCPSocketAction
HTTPGetAction
Results
Success
Failure
Unknown
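A rough sketch of how the three probe actions are expressed in a container spec (the command, image, path, and port values here are placeholders, not taken from the workshop manifests):
containers:
- name: example                     # hypothetical container
  image: docker.io/busybox:latest   # placeholder image
  livenessProbe:
    exec:                           # ExecAction: run a command in the container; exit 0 == Success
      command: ["cat", "/tmp/healthy"]
  readinessProbe:
    tcpSocket:                      # TCPSocketAction: Success if the port accepts a TCP connection
      port: 8086
# HTTPGetAction (used later in this workshop): Success on an HTTP 2xx/3xx response
#   httpGet:
#     path: /ping
#     port: 8086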
Pods restarted on single node
Only replication controller reschedules
RestartPolicy
Always (default)
OnFailure
Never
Applies to all containers in the pod
For replication controllers, only Always applicable
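restartPolicy is set at the pod spec level, next to the containers list; a minimal illustration (the name, image, and command are placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: restart-demo                # hypothetical name
spec:
  restartPolicy: OnFailure          # Always (default) | OnFailure | Never
  containers:
  - name: task
    image: docker.io/busybox:latest # placeholder image
    command: ["sh", "-c", "exit 1"] # non-zero exit triggers a restart under OnFailure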
Some Examples:
Pod is Running, 1 container, container exits success
Always: restart container, pod stays Running
OnFailure: pod becomes Succeeded
Never: pod becomes Succeeded
Pod is Running, 1 container, container exits failure
Always: restart container, pod stays Running
OnFailure: restart container, pod stays Running
Never: pod becomes Failed
Pod is Running, 2 containers, container 1 exits failure
Always: restart container, pod stays Running
OnFailure: restart container, pod stays Running
Never: pod stays Running
When container 2 exits…
Always: restart container, pod stays Running
OnFailure: restart container, pod stays Running
Never: pod becomes Failed
Help debug problems by putting messages into "termination log"
default location /dev/termination-log
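A minimal sketch of a container writing a termination message before it exits (the message and image are made up); the message then surfaces in the pod's status, e.g. in kubectl describe output:
spec:
  containers:
  - name: example
    image: docker.io/busybox:latest     # placeholder image
    command: ["sh", "-c", "echo 'could not reach backing store' > /dev/termination-log; exit 1"]
    terminationMessagePath: /dev/termination-log   # this is the default; override it here if needed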
Quick demo?
Let’s take a look at the Kubernetes resource file that we will use and evolve for the purposes of this first hands-on experience:
apiVersion: v1
kind: Pod
metadata:
  labels:
    name: influxdb
  name: influxdb
spec:
  containers:
  - image: docker.io/tutum/influxdb:latest
    name: influxdb
    ports:
    - containerPort: 8083
      name: admin
      protocol: TCP
    - containerPort: 8086
      name: http
      protocol: TCP
You can either copy/paste that locally, or download the pod resource file like this:
curl -s -L https://raw.githubusercontent.com/christian-posta/docker-kubernetes-workshop/master/demos/hands-on-pods/influxdb-simple.yaml -o influxdb-simple.yaml
Once you have the file downloaded locally, run the kubectl command (./cluster/kubectl.sh) like this:
kubectl create -f influxdb-simple.yaml
Wait for the Docker image to download and the pod to start. You can watch it like this:
kubectl get pod --watch
You can watch the Docker events in another window to keep track of what’s happening on the Docker side. Note: you’ll need to be logged into minion-1 directly to see the Docker events:
docker events
Now that we have our first pod up, let’s see where it’s deployed:
kubectl describe pod/influxdb
Note where it’s deployed (Node field).
ceposta@postamac(kubernetes (master)) $ kubectl describe pod/influxdb
Name:                           influxdb
Namespace:                      demo-pods
Image(s):                       docker.io/tutum/influxdb:latest
Node:                           10.245.1.3/10.245.1.3
Labels:                         name=influxdb
Status:                         Running
Reason:
Message:
IP:                             10.246.1.7
Replication Controllers:        <none>
Containers:
  influxdb:
    Image:          docker.io/tutum/influxdb:latest
    State:          Running
      Started:      Sun, 18 Oct 2015 17:09:16 -0700
    Ready:          True
    Restart Count:  0
Conditions:
  Type      Status
  Ready     True
Events: <truncated>
Let’s log in to our minion-1 machine and interact with the pod. We found out which machine the pod was running on in the last slide.
vagrant ssh minion-1
Let’s try pinging the pod to see if we can reach it from the minion (we could have also logged into the master and run the same commands; Kubernetes assumes a flat network reachable from anywhere in the cluster. More on the Kubernetes network model in Day 3).
ping 10.246.1.7
Note, you should use the IP address of the pod from the above kubectl describe
command.
[vagrant@kubernetes-master ~]$ ping 10.246.1.7
PING 10.246.1.7 (10.246.1.7) 56(84) bytes of data.
64 bytes from 10.246.1.7: icmp_seq=1 ttl=63 time=2.43 ms
64 bytes from 10.246.1.7: icmp_seq=2 ttl=63 time=8.41 ms
64 bytes from 10.246.1.7: icmp_seq=3 ttl=63 time=1.27 ms
64 bytes from 10.246.1.7: icmp_seq=4 ttl=63 time=2.23 ms
^C
--- 10.246.1.7 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 1.271/3.587/8.411/2.819 ms
Now, let’s issue a command to that IP address. Note: these commands must be run from either master or minion-1.
export INFLUX_NODE=10.246.1.7
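Alternatively, you can pull the IP straight out of the pod status instead of typing it by hand (this reuses the same -o template style used elsewhere in this workshop; the field path status.podIP is an assumption about your client version):
export INFLUX_NODE=$(kubectl get pod influxdb -o template -t "{{.status.podIP}}")
echo $INFLUX_NODE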
curl -G "http://$INFLUX_NODE:8086/query" --data-urlencode "q=CREATE DATABASE kubernetes"
curl -X POST "http://$INFLUX_NODE:8086/write?db=kubernetes" -d 'pod,host=node0 count=10'
curl -G "http://$INFLUX_NODE:8086/query?pretty=true" --data-urlencode "db=kubernetes" --data-urlencode "q=SELECT SUM(count) FROM pod"
We add a new database to the InfluxDB instance, then we write some data, and finally we query it. You should see 10 as the result.
At this point, we’ve deployed our first application on Kubernetes, but it doesn’t do too much. Earlier we discussed the different probe options we have for determining whether a pod is healthy or ready for action. Let’s add some of that to our pod. The pod YAML resource looks like this:
apiVersion: v1
kind: Pod
metadata:
  labels:
    name: influxdb
  name: influxdb
spec:
  containers:
  - image: docker.io/tutum/influxdb:latest
    name: influxdb
    readinessProbe:
      httpGet:
        path: /ping
        port: 8086
      initialDelaySeconds: 5
      timeoutSeconds: 1
    livenessProbe:
      httpGet:
        path: /ping
        port: 8086
      initialDelaySeconds: 15
      timeoutSeconds: 1
    ports:
    - containerPort: 8083
      name: admin
      protocol: TCP
    - containerPort: 8086
      name: http
      protocol: TCP
First, let’s stop and remove our previous influxdb pod since we’ll be re-doing it this time with health-checking:
kubectl stop pod/influxdb
Curl the pod resource as we did before.
curl -s -L https://raw.githubusercontent.com/christian-posta/docker-kubernetes-workshop/master/demos/hands-on-pods/influxdb-readiness.yaml -o influxdb-readiness.yaml
Use the kubectl command line again to create our pod:
kubectl create -f influxdb-readiness.yaml
Now that we’ve deployed our liveness and readiness checks, our pod is automatically monitored by Kubernetes. If there is ever an issue with it, Kubernetes will kill the container and, depending on the pod's restartPolicy, take action.
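If you want to see the probes doing their job, watch the restart count and the pod events (exact output varies):
kubectl get pod --watch        # RESTARTS increments when the liveness probe kills the container
kubectl describe pod/influxdb  # Events section shows probe failures and restarts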
We have readiness and liveness checks, but we may also want to constrain what the influxdb pod(s) do in terms of isolation and CPU/memory usage. We don’t want a single pod taking over all of the underlying resources and starving other processes.
Let’s alter our pod yaml resource descriptor to add in resource constraints:
apiVersion: v1
kind: Pod
metadata:
  labels:
    name: influxdb
  name: influxdb
spec:
  containers:
  - image: docker.io/tutum/influxdb:latest
    name: influxdb
    resources:
      limits:
        cpu: "500m"
        memory: "128Mi"
    readinessProbe:
      httpGet:
        path: /ping
        port: 8086
      initialDelaySeconds: 5
      timeoutSeconds: 1
    livenessProbe:
      httpGet:
        path: /ping
        port: 8086
      initialDelaySeconds: 15
      timeoutSeconds: 1
    ports:
    - containerPort: 8083
      name: admin
      protocol: TCP
    - containerPort: 8086
      name: http
      protocol: TCP
Let’s stop our previous pod, download the new manifest and run it:
kubectl stop pod/influxdb
Curl the pod resource as we did before.
curl -s -L https://raw.githubusercontent.com/christian-posta/docker-kubernetes-workshop/master/demos/hands-on-pods/influxdb-resource-limits.yaml -o influxdb-resource-limits.yaml
Use the kubectl command line again to create our pod:
kubectl create -f influxdb-resource-limits.yaml
Now our pod is allocated specific limits so we can sleep at night knowing we have resource isolation.
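You can confirm the limits were actually applied by dumping the pod back out and looking at the resources section:
kubectl get pod influxdb -o yaml | grep -A 3 "limits"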
Whenever we add new databases to our influxdb and then stop the pod, we find that the data in the influxdb is not persisted. Let’s add a way to persist that information.
Kubernetes offers a few persistence mechanisms out of the box:
emptyDir
hostPath
gcePersistentDisk
awsElasticBlockStore
nfs
glusterfs
gitRepo
In tomorrow’s demo, we’ll show how to use a shared-storage mechanism in Google Compute Engine to create stateful pods. But for now, we’ll just map to a location on the Node/host. Note this has implications for pod rescheduling, etc., that will be discussed in class.
Let’s create the Kubernetes PersistentVolume:
curl -s -L https://raw.githubusercontent.com/christian-posta/docker-kubernetes-workshop/master/demos/hands-on-pods/local-persistent-volume.yaml | kubectl create -f -
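The manifest we just created is roughly a hostPath-backed PersistentVolume along these lines (the name, size, and path are illustrative; the workshop file may differ):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: influx-pv                 # hypothetical name
spec:
  capacity:
    storage: 1Gi                  # illustrative size
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: /tmp/influxdb           # data lives on the node's own filesystem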
Now we’re going to create a "claim" for persistent volume space (details explained in class):
curl -s -L https://raw.githubusercontent.com/christian-posta/docker-kubernetes-workshop/master/demos/hands-on-pods/local-pvc.yaml | kubectl create -f -
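Conceptually the claim just requests a slice of storage that Kubernetes binds to a matching PersistentVolume; it looks roughly like this (again, the name and size are illustrative):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: influx-claim              # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi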
Make sure our influxdb pod is stopped
kubectl stop pod influxdb
And now deploy our new pod that uses the PersistentVolume:
curl -s -L https://raw.githubusercontent.com/christian-posta/docker-kubernetes-workshop/master/demos/hands-on-pods/influxdb-local-pv.yaml | kubectl create -f -
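The important change in this pod manifest is that it mounts the claim as a volume; conceptually the spec gains something like the following (the claim name and mount path are assumptions, not copied from the workshop file):
spec:
  containers:
  - image: docker.io/tutum/influxdb:latest
    name: influxdb
    volumeMounts:
    - name: influx-data
      mountPath: /data            # assumed data directory for this influxdb image
  volumes:
  - name: influx-data
    persistentVolumeClaim:
      claimName: influx-claim     # must match the claim created above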
Now let’s figure out which IP address the new influxdb pod lives at:
kubectl describe pod/influxdb
And now let’s query/interact with influxdb (substitute your pod’s IP address below):
export INFLUX_NODE=10.246.1.7
curl -G "http://$INFLUX_NODE:8086/query" --data-urlencode "q=CREATE DATABASE kubernetes"
curl -X POST "http://$INFLUX_NODE:8086/write?db=kubernetes" -d 'pod,host=node0 count=10'
curl -X POST "http://$INFLUX_NODE:8086/write?db=kubernetes" -d 'pod,host=node1 count=9'
curl -X POST "http://$INFLUX_NODE:8086/write?db=kubernetes" -d 'pod,host=node2 count=12'
curl -G "http://$INFLUX_NODE:8086/query?pretty=true" --data-urlencode "db=kubernetes" --data-urlencode "q=SELECT SUM(count) FROM pod"
Now let’s kill the pod and restart it. We should see the same databases and data in the influxdb even after we restart it:
kubectl stop pod/influxdb
Restart it:
curl -s -L https://raw.githubusercontent.com/christian-posta/docker-kubernetes-workshop/master/demos/hands-on-pods/influxdb-local-pv.yaml | kubectl create -f -
Query it:
curl -G "http://$INFLUX_NODE:8086/query?pretty=true" --data-urlencode "db=kubernetes" --data-urlencode "q=SELECT SUM(count) FROM pod"
Use PersistentVolumes for stateful pods/containers
Use Kubernetes secrets to distribute credentials
Leverage readiness and liveness probes
Consider resource (CPU/Mem) limits/quotas
Use termination messages
Decouple application dimensions from infrastructure deployments
Tiers
Release Tracks
Grouping of microservices
Kubernetes clusters are dynamic
Predicated on failure
Need a way to identify groups of services
Be able to add/remove from group on failures
Needs to be flexible
Queryable
Kubernetes Labels
Arbitrary metadata
Attach to any API objects
eg, Pods, ReplicationControllers, Endpoints, etc
Simple key=value assignments
Can be queried with selectors
release=stable, release=canary
environment=dev, environment=qa, environment=prod
tier=frontend, tier=backend, tier=middleware
partition=customerA, partition=customerB, etc
used to group “like” objects — no expectation of uniqueness
The structure of a key:
<prefix>/<name>
example key org.apache.hadoop/partition
Prefix can be up to 253 chars
Name can be up to 63 chars
Values can be alphanumeric ([a-z0-9A-Z]) with dashes (-), underscores (_), dots (.)
Values can be up to 63 chars
Labels are queryable metadata; selectors can do the queries:
Equality based
environment = production
tier != frontend
combinations: "tier != frontend, version = 1.0.0"
Set based
environment in (production, pre-production)
tier notin (frontend, backend)
partition (key exists, regardless of value)
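The same selectors work on the command line; for example (set-based syntax needs quoting, and requires a client/server version that supports it):
kubectl get pods -l environment=production,tier!=frontend
kubectl get pods -l 'environment in (production, pre-production)'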
Attaching useful metadata to objects; not queryable
Data that you want to know to build context; easy to have it close
Author information
build date/timestamp
links to SCM
PR numbers/gerrit patchsets
Annotations are not queryable (use labels for that)
Can build up tooling around annotations
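A sketch of labels vs. annotations on an object's metadata (the keys and values are invented for illustration):
metadata:
  name: inspector
  labels:                                  # queryable, used for grouping/selection
    app: inspector
    track: stable
  annotations:                             # not queryable, just attached context
    example.com/built-by: "jane@example.com"
    example.com/build-date: "2015-10-18T17:09:16Z"
    example.com/scm-commit: "abc123def"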
Everything.
Can configure the number of replicas of a pod
Will be scheduled across applicable nodes
Replication controllers can change replica value to scale up/down
Which pods are scaled depends on RC selector
Labels and selectors are the backbone of grouping
Can even do complicated things with labels/RCs
Take a pod out of cluster by changing its label
Can have multiple RCs so no SPOF
If a pod is unhealthy (fails its health probe), the RC can kill and restart it
apiVersion: v1
kind: ReplicationController
metadata:
  labels:
    app: inspector
    track: stable
  name: inspector
spec:
  replicas: 1
  selector:
    app: inspector
    track: stable
  template:
    metadata:
      labels:
        app: inspector
        track: stable
    spec:
      containers:
      - image: b.gcr.io/kuar/inspector:1.0.0
        imagePullPolicy: IfNotPresent
        name: inspector
        resources: {}
        terminationMessagePath: /dev/termination-log
      dnsPolicy: ClusterFirst
      restartPolicy: Always
status:
  observedGeneration: 1
  replicas: 1
On your cluster, run the following command:
curl -s -L https://raw.githubusercontent.com/christian-posta/docker-kubernetes-workshop/master/demos/replication-controllers/inspector-rc.yaml | kubectl create -f -
Then do:
kubectl get rc
You should see:
CONTROLLER   CONTAINER(S)   IMAGE(S)                        SELECTOR                     REPLICAS
inspector    inspector      b.gcr.io/kuar/inspector:1.0.0   app=inspector,track=stable   1
We want to change the value of replicas for our inspector replication controller to scale up:
kubectl scale rc inspector --replicas=5
If successful, you should see:
scaled
Now do a "get pods" to see the pods that were created:
kubectl get pod
NAME              READY     STATUS    RESTARTS   AGE
inspector-3sh6q   1/1       Running   0          13m
inspector-5djgp   1/1       Running   0          13m
inspector-9b68m   1/1       Running   0          13m
inspector-ohaod   1/1       Running   0          13m
inspector-r5bqo   1/1       Running   0          19m
Let’s scale the inspector app down to two pods:
kubectl scale rc inspector --replicas=2
Now when you get the pods, there should just be two. Kubernetes decides which ones to kill:
NAME              READY     STATUS    RESTARTS   AGE
inspector-ohaod   1/1       Running   0          15m
inspector-r5bqo   1/1       Running   0          20m
Ongoing work in the community
Use existing monitoring tools to provide scaling
Pods have their own IP address
Pods can come and go; they’re fungible
Replication controllers maintain replica invariants
So you have lots of Pods and IPs, which do you use?
Use labels for grouping
Kubernetes Services do the heavy lifting of finding Pods
Decouple providers and accessors of services
Don’t depend on Pod IPs directly
Need to find pods that match certain selection criteria
mmm… labels and selectors again :)
Pods can live on any node
Use a single IP that doesn’t change
Virtual IP load balancing and discovery
apiVersion: v1
kind: Service
metadata:
  name: inspector
  labels:
    app: inspector
spec:
  type: NodePort
  selector:
    app: inspector
  ports:
  - name: http
    nodePort: 36000
    port: 80
    protocol: TCP
Use selectors to locate existing kubernetes pods
Use a service w/out selectors to point to external services
DB running outside of Kubernetes
---
kind: "Service"
apiVersion: "v1"
metadata:
  name: "my-service"
spec:
  ports:
  - protocol: "TCP"
    port: 80
    targetPort: 9376
Endpoints objects will not be created automatically; you will need to create them yourself and point them where you want:
---
kind: "Endpoints"
apiVersion: "v1"
metadata:
  name: "my-service"
subsets:
- addresses:
  - IP: "1.2.3.4"
  ports:
  - port: 80
When a new service/endpoint object is created in etcd
kube-proxy opens a port on the node
Connections to that port will be proxied to the pods
kube-proxy maintains iptables routing from the clusterIP (VIP) to the node port
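If you are curious, you can peek at the plumbing on a node; the exact chain names vary by version, but the rules kube-proxy installed show up in the NAT table (run on a minion):
sudo iptables -t nat -L -n | grep -i kube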
Example: UI pods need to use DB pods
Environment variables
{svcname}_SERVICE_HOST
{svcname}_SERVICE_PORT
implies an ordering requirement: service must have been created first!
DNS (as a cluster addon — recommended)
{svcname}.{nsname}
eg: my-service.my-namespace would be how you find a service in the my-namespace namespace
Example:
REDIS_MASTER_SERVICE_HOST=10.0.0.11
REDIS_MASTER_SERVICE_PORT=6379
REDIS_MASTER_PORT=tcp://10.0.0.11:6379
REDIS_MASTER_PORT_6379_TCP=tcp://10.0.0.11:6379
REDIS_MASTER_PORT_6379_TCP_PROTO=tcp
REDIS_MASTER_PORT_6379_TCP_PORT=6379
REDIS_MASTER_PORT_6379_TCP_ADDR=10.0.0.11
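With the DNS add-on, the same lookup works by name from a shell inside any pod (e.g. a busybox container); the service names here reuse the earlier examples:
nslookup redis-master               # service in the same namespace
nslookup my-service.my-namespace    # fully qualified: {svcname}.{nsname}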
ClusterIP — default, create an internal IP that can be used within pods for service discovery
NodePort — open a port on all of the nodes that the service listens on (can be accessed from outside the cluster)
set type == “NodePort”
Will allocate a port in the default range (can be configured): 30000-32767
will set spec.ports.nodePort for you
can set it manually and it will try to use this fixed port, or it will fail
can then use your own loadbalancers and point to this specific port (external access)
LoadBalancer — useful for cloud providers that expose external loadbalancers
http://kubernetes.io/v1.0/docs/user-guide/services.html#type-loadbalancer
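A minimal sketch of a LoadBalancer-type service (only meaningful on a supporting cloud provider; the name is hypothetical and the port mirrors the earlier inspector service):
apiVersion: v1
kind: Service
metadata:
  name: inspector-lb                # hypothetical name
spec:
  type: LoadBalancer                # the cloud provider provisions an external load balancer
  selector:
    app: inspector
  ports:
  - name: http
    port: 80
    protocol: TCP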
---
kind: "List"
apiVersion: "v1"
items:
- kind: "Service"
  apiVersion: "v1"
  metadata:
    name: "mock"
    labels:
      app: "mock"
      status: "replaced"
  spec:
    ports:
    - protocol: "TCP"
      port: 99
      targetPort: 9949
    selector:
      app: "mock"
- kind: "ReplicationController"
  apiVersion: "v1"
  metadata:
    name: "mock"
    labels:
      app: "mock"
      status: "replaced"
  spec:
    replicas: 1
    selector:
      app: "mock"
    template:
      metadata:
        labels:
          app: "mock"
      spec:
        containers:
        - name: "mock-container"
          image: "kubernetes/pause"
          ports:
          - containerPort: 9949
            protocol: "TCP"
Assuming you still have the replication controller from the previous section, let’s create a service that exposes the inspector app (https://github.com/kelseyhightower/inspector/) directly on the Nodes.
curl -s -L https://raw.githubusercontent.com/christian-posta/docker-kubernetes-workshop/master/demos/services/inspector-svc.yaml | kubectl create -f -
This service is exposed via a NodePort on each Node. To see exactly which port it is exposed on:
kubectl get svc/inspector -o template -t "{{.spec.ports}}"
Then you can query the Node (by IP if using vagrant) and the port. If you do this from outside the VMs, make sure to map your VM port to your host.
[vagrant@kubernetes-master ~]$ curl -G http://10.245.1.3:30727/
<!DOCTYPE html>
<html lang="en">
<head>
  <link rel="stylesheet" href="css/bootstrap.min.css">
</head>
<div class="container">
<body>
  <h1>Inspector</h1>
  <p>
    <b>Home</b> | <a href="./env">Environment</a> | <a href="./mnt">Mounts</a> | <a href="./net">Network</a>
  </p>
  <hr/>
  <p>Version: Inspector 1.0.0</p>
  </ul>
</body>
</div>
</html>
Try running the curl command pointing to the /net path. For each call we can inspect what network information gets returned. And if you have more than 1 replica, you should see that network information change as we access different pods behind the service (load balanced):
[vagrant@kubernetes-master ~]$ curl -G http://10.245.1.3:30727/net
<snipped>
<tr>
  <td>eth0</td>
  <td>02:42:0a:f6:01:0e</td>
  <td><ul class="list-unstyled">
    <li>10.246.1.14/24</li>
    <li>fe80::42:aff:fef6:10e/64</li>
  </ul>
  </td>
</tr>
</snipped>