Principal Middleware Architect
Blog: http://blog.christianposta.com
Twitter: @christianposta
Email: christian@redhat.com
Committer on Apache ActiveMQ, Apache Camel, Fabric8
Technology evangelist, recovering consultant
Spent a lot of time working with one of the largest Microservices, web-scale, unicorn companies
Frequent blogger and speaker about open-source, cloud, microservices
Intro / Prep Environments
Day 1: Docker Deep Dive
Day 2: Kubernetes Deep Dive
Day 3: Advanced Kubernetes: Concepts, Management, Middleware
Day 4: Advanced Kubernetes: CI/CD, open discussions
Containers run on single Docker host
Containers are ephemeral
Nothing watchdogs the containers
Containers can have external persistence
Containers do not contain
Operating system matters
Smart placement
How to interact with a system that does placement
Different than configuration management
Containers will fail
Scaling
Managing containers by hand is harder than managing VMs: it won’t scale
Automate the boilerplate stuff
Runbooks → Scripts → Config management → Scale
Decouple application from machine!
Applications run on "resources"
Kubernetes manages this interaction of applications and resources
Manage applications, not machines!
What about legacy apps?
Simplicity, Simplicity, Simplicity
Pods
Labels / Selectors
Replication Controllers
Services
API
Immutable infrastructure
DevOps
CI/CD
Who cares: give me a platform to move faster!!!
Divide cluster across uses, tiers, and teams
Names are unique within a namespace, but not across namespaces
Very powerful when combined with Labels
Example: qa/dev/prod can be implemented with Namespaces
List the namespaces available to the cluster
kubectl get namespaces
List all the pods across all the namespaces
kubectl get pods --all-namespaces
Let’s create a new namespace for our guestbook application:
curl -s -L https://raw.githubusercontent.com/christian-posta/docker-kubernetes-workshop/master/demos/guestbook/namespace.yaml | kubectl create -f -
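For reference, the namespace.yaml being piped to kubectl is a tiny manifest; a minimal sketch of such a file looks like this (the exact labels in the workshop file may differ):
apiVersion: v1
kind: Namespace
metadata:
  name: guestbook
  labels:
    name: guestbook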
Let’s list the pods in the guestbook namespace (hint: there shouldn’t be any at the moment):
kubectl get pods --namespace=guestbook
You can log into multiple kubernetes clusters with the same client and switch between clusters/contexts at the command line. You can also specify which namespaces to use when pointing to specific clusters. For example, to view the current cluster context:
kubectl config view
Sample output:
contexts:
- context:
    cluster: master-fuse-osecloud-com:8443
    namespace: microservice
    user: admin/master-fuse-osecloud-com:8443
  name: microservice/master-fuse-osecloud-com:8443/admin
- context:
    cluster: vagrant
    user: vagrant
  name: vagrant
current-context: vagrant
kind: Config
preferences: {}
users:
- name: admin/master-fuse-osecloud-com:8443
  user:
    token: kZ_L5Oj5sJ8nJUVJD4quq813Q1pRv4yZWhOjuJEw79w
- name: vagrant
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED
    password: vagrant
    username: vagrant
We can create a new context that points to our vagrant cluster:
kubectl config set-context guestbook --namespace=guestbook --user=vagrant --cluster=vagrant
Now, let’s switch to use that context so we can put any new pods/RCs into this new namespace:
kubectl config use-context guestbook
Now double check we’re in the new context/namespace:
kubectl config view | grep current-context | awk '{print $2}'
Now let’s deploy a replication controller
curl -s -L https://raw.githubusercontent.com/christian-posta/docker-kubernetes-workshop/master/demos/guestbook/frontend-controller.yaml | kubectl create -f -
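For reference, frontend-controller.yaml defines a ReplicationController for the guestbook frontend. A rough sketch of such a manifest is shown below; the image, replica count, and labels are illustrative and may differ from the workshop file:
apiVersion: v1
kind: ReplicationController
metadata:
  name: frontend
spec:
  replicas: 1
  selector:
    name: frontend
  template:
    metadata:
      labels:
        name: frontend
    spec:
      containers:
      - name: frontend
        image: gcr.io/google_samples/gb-frontend:v3
        ports:
        - containerPort: 80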
Now let’s see how many pods we have:
kubectl get pods
NAME             READY     STATUS    RESTARTS   AGE
frontend-juz6j   0/1       Pending   0          5s
We have two good ways to group components for development purposes and then clean them up when you want to start over.
Use Kubernetes labels
Use namespaces
You can delete all resources in a namespace like this:
kubectl config use-context vagrant
kubectl delete namespace guestbook
This approach works fine for local development and grouping. In shared environments, the best approach is to properly label your components (services, RCs, pods, etc.) and delete them using labels:
kubectl delete all -l "label=value"
Most objects are in a namespace
pods
replication controllers
services
Namespaces themselves are not in a namespace
Neither are Nodes or PersistentVolumes
If ResourceQuota is passed to the kube-apiserver's --admission_control argument, then a namespace can set a ResourceQuota object to limit resources.
Example from the vagrant/master:
root 6055 0.0 0.0 3172 48 ? Ss 00:04 0:00 /bin/sh -c /usr/local/bin/kube-apiserver --address=127.0.0.1 --etcd_servers=http://127.0.0.1:4001 --cloud_provider=vagrant --runtime_config=api/v1 --admission_control=NamespaceLifecycle,NamespaceExists,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota --service-cluster-ip-range=10.247.0.0/16 --client_ca_file=/srv/kubernetes/ca.crt --basic_auth_file=/srv/kubernetes/basic_auth.csv --cluster_name=kubernetes --tls_cert_file=/srv/kubernetes/server.cert --tls_private_key_file=/srv/kubernetes/server.key --secure_port=443 --token_auth_file=/srv/kubernetes/known_tokens.csv --bind-address=10.245.1.2 --v=2 --allow_privileged=False 1>>/var/log/kube-apiserver.log 2>&1
Pods must specify resource limits or the API server will refuse to accept them (a LimitRange can be used to add default limits)
Admin creates a ResourceQuota for the namespace
If a Pod would cause the resource limits to be breached, the pod is rejected
If the aggregate resource limits are set higher than the actual available resources, it is first-come, first-served
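A minimal ResourceQuota sketch for a namespace like guestbook could look like this (the object name and the limits here are illustrative):
apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota
  namespace: guestbook
spec:
  hard:
    cpu: "20"
    memory: 10Gi
    pods: "10"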
You can organize your Nodes based on classifications/tiers/resource types. For example, for some data-intensive applications you may wish to request that the scheduler put those pods on nodes that have SSD storage/PV support:
kubectl label nodes node-foo disktype=ssd
Now if you add a nodeSelector section to your Pod, the pod will only end up on nodes with the disktype=ssd label:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
Appropriate boundaries between cluster, pods, users who manage cluster/application developers
Appropriate boundaries enforced between containers and hosts (via docker/linux cap/selinux/apparmor/etc)
Ability to delegate administrative functions to users where it makes sense
Hide credentials/keys/passwords from others
Administration/Full authority
Project/namespace admin
Developer
--client_ca_file: used to allow authentication via client certificates
--token_auth_file: allow authentication via tokens; tokens are long-lived and cannot be refreshed (at the moment)
--basic_auth_file: an HTTP basic-auth password file
The four attributes that apply to authorization measures:
The user (as authenticated already)
Read only/Write — GET commands are readonly
The resource in question (pod/RC/service,etc)
The namespace
Specifying policies: when starting the API server, pass a single-line JSON file to --authorization_policy_file
{"user":"ceposta"}
{"user":"ceposta", "resource": "pods", "readonly": true}
{"user":"ceposta", "resource": "events"}
{"user":"ceposta", "resource": "pods", "readonly": true, "ns": "projectBalvenie"}
This file is only reloaded when restarting the API server
Service accounts vs User accounts
User accounts for humans; service accounts for services w/in Pods
Service accounts are "namespaced"
Service account creation is much simpler/lightweight vs User creation
Allow services to access the Kubernetes API
Acts as part of the API server, decorates pods with Service Account information:
Will assign the default Service Account if one is not specified
Will reject a pod that specifies a Service Account that does not exist
Add ImagePullSecrets (for private repos)
Adds volume for token-based API access (secret)
Runs synchronously when pods are created
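For illustration, creating a dedicated Service Account is just another small manifest (the name here is hypothetical); a pod then opts in by setting serviceAccountName in its spec (older releases used the serviceAccount field):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: build-robot
  namespace: guestbook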
Image secrets
Secret Volumes
Service accounts actually use secrets to pass API tokens
Can pass sensitive data
passwords
keys
certificates
apiVersion: v1
kind: Secret
metadata:
  name: mysecret
type: Opaque
data:
  password: dmFsdWUtMg0K
  username: dmFsdWUtMQ0K
Secret "keys" in the map above, must follow DNS subdomain naming convention. The values are base64 encoded |
---
apiVersion: "v1"
kind: "Pod"
metadata:
  name: "mypod"
  namespace: "myns"
spec:
  containers:
  - name: "mypod"
    image: "redis"
    volumeMounts:
    - name: "foo"
      mountPath: "/etc/foo"
      readOnly: true
  volumes:
  - name: "foo"
    secret:
      secretName: "mysecret"
local, host-only bridge (docker0)
creates a new veth adapter attached to the bridge for each container that’s created
veth is mapped to eth0 on a container
eth0 is assigned an IP from the range dedicated to the virtual bridge
result: docker containers can talk to each other only on the same machine
containers on different hosts could have the exact same IP
in order for docker containers to communicate across hosts, they need to allocate ports on the host
this means containers must coordinate port usage appropriately or allocate ports dynamically (and avoid running out of them)
this is difficult to do and doesn’t scale very well
dynamic port allocation is tricky: each app must now take a “port” parameter and be configured at runtime
all pods can communicate with other pods w/out any NAT
all nodes can communicate with pods without NAT
the IP the pod sees is the same IP seen outside of the pod
cannot take docker hosts out of the box and expect kube to work
this is a simpler model
reduces friction when coming from VM environments where this is more or less true
Flat networking space
So the transition is consistent VM→Pod
No additional container or application gymnastics /NAT/etc to have to go through each time you deploy
Pods have their own “port space” independent of other pods
Don’t need to explicitly create “docker links” between containers (would only work on a single node anyway)
Otherwise, dynamic allocation of ports on Host every time a pod needs a port gets very complicated for orchestration and scheduling
exhaustion of ports
reuse of ports
tricky app config
watching/cache invalidation
redirection, etc
conflicts
NAT breaks self-registration mechanisms, etc
IP address visible inside and outside of the container
Self-registration works fine as you would expect as does DNS
Implemented as a “pod container” which holds the network namespace (net) and “app containers” which join it with Docker’s --net=container:<id>
In docker world, the IP inside the container is NOT what an entity outside of the container sees, even in another container
All containers behave as though they’re on a single host, i.e., they see the same ports and network and can communicate with each other over localhost
Simplicity (well known ports, 80, 22, etc)
Security (ports bound on localhost are only visible within the pod/containers, never outside)
Performance (don’t have to take network-stack penalties, marshaling/unmarshaling, etc)
Very similar to running multiple processes in a VM host for example
Drawback: no container-local ports, could clash, etc. but these are minor inconveniences at the moment and workarounds are being implemented
However, pods come with the premise of shared resources (volumes, CPU, memory, etc) so a reduction in isolation is really expected. If you need isolation, use Pods not containers to achieve this.
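As a concrete (hedged) sketch of the shared network namespace, the two containers in this pod talk over localhost; the images and command are illustrative:
apiVersion: v1
kind: Pod
metadata:
  name: shared-net-demo
spec:
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 80
  - name: poller
    image: busybox
    # shares the pod's network namespace, so localhost:80 reaches the nginx container
    command: ["sh", "-c", "while true; do wget -q -O- http://localhost:80 > /dev/null; sleep 5; done"]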
Service IPs are VIPs (virtual IPs)
kube-proxy alters iptables on the node to trap service IPs and redirect them to the correct backends
Simple, high-performance, HA solution
Getting traffic from outside the cluster is trickier
Need to set up an external load balancer to forward traffic for all service IPs and load balance across all nodes
kube-proxy on the node then traps that IP and sends it to the service
Exposing services directly on node hosts is suitable for PoC-type workloads, but not for real production workloads
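One hedged sketch of the “expose on node hosts” option is a Service of type NodePort, which opens the same port on every node and forwards it to the service backends (names and port numbers are illustrative):
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  type: NodePort
  selector:
    name: frontend
  ports:
  - port: 80
    targetPort: 80
    nodePort: 30080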
Add-ons implemented as Services and Replication Controllers
SkyDNS is used to implement the DNS add-on
A pod that bridges between kubernetes services and DNS
A kubernetes service that is the DNS provider (i.e., has a VIP, etc)
Kubelet is configured to decorate the pods with the correct DNS server
Can configure the kubelet manually if not automatically set up:
--cluster_dns=<DNS service ip>
--cluster_domain=<default local domain>
A records are created for services in the form svc-name.ns-name.svc.cluster.local
Headless services (no clusterIP) resolve to the pod IPs directly (DNS round-robin)
SRV records (for discovering services and ports): _my-port-name._my-port-protocol.my-svc.my-namespace.svc.cluster.local resolves to the hostname my-svc.my-namespace.svc.cluster.local and the port
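A headless service is declared by setting clusterIP to None; DNS then returns the pod IPs directly. A minimal sketch (service, label, and port names are illustrative):
apiVersion: v1
kind: Service
metadata:
  name: my-svc
spec:
  clusterIP: None
  selector:
    app: my-app
  ports:
  - name: my-port-name
    port: 8080
    protocol: TCP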
Log collector on each node
Implemented with fluentd, as a pod
Watches all containers' logs on that node and pumps them to an Elasticsearch cluster
Elasticsearch can be queried via Kibana
Quick Demo?
Need visibility into the cluster as an aggregate and individually where appropriate
cAdvisor
Heapster
Influxdb/Prometheus/Graphite
Grafana
HA master nodes
etcd datastore
Replicated, load-balanced, API server
Elected scheduler and controllers
Open source project started at CoreOS
Distributed database
CAP Theorem? == CP
Raft algorithm/protocol
watchable
etcd provides HA datastore
Run kubelet on the masters to monitor the API server process and restart it on failure
systemctl enable kubelet and systemctl enable docker.
Replicated etcd
Run shared storage locations for each of the etcd nodes
Network loadbalancers over the API servers
Run podmaster, which coordinates a lease-lock election using etcd
Canary release
Blue/green deployment
A/B testing
Rolling upgrade/rollback
Can do with labels (exclude certain labels)
Example:
App to deploy:
labels:
  app: guestbook
  tier: frontend
  track: canary
Existing set of apps:
labels:
  app: guestbook
  tier: frontend
  track: stable
Service selector
selector:
  app: guestbook
  tier: frontend
Have two separate replication controllers, one blue, one green
Have labels "color=green", "color=blue"
Service selector = "color=green"
Change selector to "color=blue" to switch
Can switch back
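A hedged sketch of the service doing the blue/green switch; flipping the color value in the selector moves traffic between the two replication controllers (names and ports are illustrative):
apiVersion: v1
kind: Service
metadata:
  name: guestbook
spec:
  selector:
    app: guestbook
    color: green   # change to blue to switch traffic over
  ports:
  - port: 80
    targetPort: 80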
Bring up a container with the new version in the same fleet of containers
Bring down one of the old version
Bring up a second container with the new version
Repeat
Be aware: the application may also be scaling while a Rolling Update is in progress
Use replication controllers to control the number of replicas at a given step
Use kubectl rolling-update
Replaces an old RC with a new RC
Must be in same namespace
Share at least one label name, different value
Example:
kubectl rolling-update frontend-v1 -f frontend-v2.json
What happens if a failure is introduced part-way through the Rolling update:
Kubernetes keeps track and annotates the RC with info:
kubernetes.io/desired-replicas: the number of replicas this replication controller needs to reach
kubernetes.io/update-partner: the other half of the rolling-update pair
Recovery is achieved by running the same command again
While size of foo-next < desired-replicas annotation on foo-next:
  increase size of foo-next
  if size of foo > 0, decrease size of foo
Then rename: delete the old RC foo and rename foo-next to foo
Set of maven goals for managing docker builds and containers
Can be run as part of a CI/build step in your existing build or CI pipelines
Requires access to a Docker Daemon for builds
Can build images, start/stop containers, etc
docker:start Create and start containers
docker:stop Stop and destroy containers
docker:build Build images
docker:watch Watch for doing rebuilds and restarts
docker:push Push images to a registry
docker:remove Remove images from local docker host
docker:logs Show container logs
mvn package docker:build
Can build a docker image as part of mvn lifecycle
Package files from project (build artifacts, configs, etc) into docker image
Which files are selected using maven-assembly-plugin
Selected files are inserted into the base image at a specified location (default: /maven)
See the assembly descriptor file format
Once image is built, can use maven-failsafe-plugin to run integration tests
<configuration>
  <images>
    <image>
      <alias>service</alias>
      <name>jolokia/docker-demo:${project.version}</name>
      <build>
        <from>java:8</from>
        <assembly>
          <descriptor>docker-assembly.xml</descriptor>
        </assembly>
        <ports>
          <port>8080</port>
        </ports>
        <cmd>
          <shell>java -jar /maven/service.jar</shell>
        </cmd>
      </build>
      <run>
        <ports>
          <port>tomcat.port:8080</port>
        </ports>
        <wait>
          <url>http://localhost:${tomcat.port}/access</url>
          <time>10000</time>
        </wait>
        <links>
          <link>database:db</link>
        </links>
      </run>
    </image>
    <image>
      <alias>database</alias>
      <name>postgres:9</name>
      <run>
        <wait>
          <log>database system is ready to accept connections</log>
          <time>20000</time>
        </wait>
      </run>
    </image>
  </images>
</configuration>
Can watch for changes in project and rebuild
Rebuild docker image
Re-start existing running container
Fast development feedback/loop
mvn package docker:build docker:watch -Ddocker.watchMode=build
mvn docker:start docker:watch -Ddocker.watchMode=run
<configuration>
  <!-- Check every 10 seconds by default -->
  <watchInterval>10000</watchInterval>
  <!-- Watch for doing rebuilds and restarts -->
  <watchMode>both</watchMode>
  <images>
    <image>
      <!-- Service checks every 5 seconds -->
      <alias>service</alias>
      ....
      <watch>
        <interval>5000</interval>
      </watch>
    </image>
    <image>
      <!-- Database needs no watching -->
      <alias>db</alias>
      ....
      <watch>
        <mode>none</mode>
      </watch>
    </image>
    ....
  </images>
</configuration>
fabric8:json
fabric8:apply
fabric8:rolling
fabric8:devops
fabric8:create-routes
fabric8:recreate
Generates a kubernetes.json file based on Maven settings
Can generate ReplicationControllers/Services/Pods
Attaches kubernetes.json and versions it as part of the build
Will be included in the artifacts uploaded to artifact repo
Options
Hand-generate your own file and let mvn coordinates be applied
Use default mvn properties and let fabric8:json generate the json file
Use annotation processors and typesafe DSL builders directly
Enrich the generated JSON with additional stuff
<project>
  ...
  <properties>
    <fabric8.env.FOO>bar</fabric8.env.FOO>
    ...
  </properties>
  ...
</project>
docker.image
Used by the docker-maven-plugin to define the output docker image name.
fabric8.combineDependencies
If enabled then the maven dependencies will be scanned for any dependency of <classifier>kubernetes</classifier> and <type>json</type> which are then combined into the resulting generated JSON file. See Combining JSON files
fabric8.container.name
The docker container name of the application in the generated JSON. This defaults to ${project.artifactId}-container
fabric8.containerPrivileged
Whether the generated container should be run in privileged mode (defaults to false)
fabric8.env.FOO = BAR
Defines the environment variable FOO and value BAR.
fabric8.extra.json
Allows an extra JSON file to be merged into the generated kubernetes json file. Defaults to using the file target/classes/kubernetes-extra.json.
fabric8.generateJson
If set to false then the generation of the JSON is disabled.
fabric8.iconRef
Provides the resource name of the icon to use; found using the current classpath (including the ones shipped inside the maven plugin). For example icons/myicon.svg to find the icon in the src/main/resources/icons directory. You can refer to a common set of icons by setting this option to a value of: activemq, camel, java, jetty, karaf, mule, spring-boot, tomcat, tomee, weld, wildfly
fabric8.iconUrl
The URL to use to link to the icon in the generated Template.
fabric8.iconUrlPrefix
The URL prefix added to the relative path of the icon file
fabric8.iconBranch
The SCM branch used when creating a URL to the icon file. The default value is master.
fabric8.imagePullPolicy
Specifies the image pull policy; one of Always, Never or IfNotPresent. Defaults to Always if the project version ends with SNAPSHOT, otherwise it is left blank. On newer OpenShift / Kubernetes versions a blank value implies IfNotPresent
fabric8.imagePullPolicySnapshot
Specifies the image pull policy used by default for SNAPSHOT maven versions.
fabric8.includeAllEnvironmentVariables
Should the environment variable JSON Schema files generated by the fabric-apt API plugin be discovered and included in the generated kubernetes JSON file. Defaults to true.
fabric8.includeNamespaceEnvVar
Whether we should include the namespace in the containers' env vars. Defaults to true
fabric8.label.FOO = BAR
Defines the kubernetes label FOO and value BAR.
fabric8.livenessProbe.exec
Creates an exec action liveness probe with this command.
fabric8.livenessProbe.httpGet.path
Creates an HTTP GET action liveness probe with this path.
fabric8.livenessProbe.httpGet.port
Creates an HTTP GET action liveness probe on this port.
fabric8.livenessProbe.httpGet.host
Creates an HTTP GET action liveness probe on this host.
fabric8.livenessProbe.port
Creates a TCP socket action liveness probe on the specified port.
fabric8.namespaceEnvVar
The name of the env var to add that will contain the namespace at container runtime. Defaults to KUBERNETES_NAMESPACE.
fabric8.parameter.FOO.description
Defines the description of the OpenShift template parameter FOO.
fabric8.parameter.FOO.value
Defines the value of the OpenShift template parameter FOO.
fabric8.port.container.FOO = 1234
Declares that the pod’s container has a port named FOO with a container port 1234.
fabric8.port.host.FOO = 4567
Declares that the pod’s container has a port named FOO which is mapped to host port 4567.
fabric8.provider
The provider name to include in resource labels (defaults to fabric8).
fabric8.readinessProbe.exec
Creates an exec action readiness probe with this command.
fabric8.readinessProbe.httpGet.path
Creates an HTTP GET action readiness probe with this path.
fabric8.readinessProbe.httpGet.port
Creates an HTTP GET action readiness probe on this port.
fabric8.readinessProbe.httpGet.host
Creates an HTTP GET action readiness probe on this host.
fabric8.readinessProbe.port
Creates a TCP socket action readiness probe on the specified port.
fabric8.replicas
The number of pods to create for the Replication Controller if the plugin is generating the App JSON file.
fabric8.replicationController.name
The name of the replication controller used in the generated JSON. This defaults to ${project.artifactId}-controller
fabric8.serviceAccount
The name of the service account to use in this pod (defaults to none)
fabric8.service.name
The name of the Service to generate. Defaults to ${project.artifactId} (the artifact Id of the project)
fabric8.service.port
The port of the Service to generate (if a kubernetes service is required).
fabric8.service.type
The type of the service. Set to "LoadBalancer" if you wish an external load balancer to be created.
fabric8.service.containerPort
The container port of the Service to generate (if a kubernetes service is required).
fabric8.service.protocol
The protocol of the service. (If not specified then kubernetes will default it to TCP).
fabric8.service.port.<portName>
The service port to generate (if a kubernetes service is required with multiple ports).
fabric8.service.containerPort.<portName>
The container port to target to generate (if a kubernetes service is required with multiple ports).
fabric8.service.protocol.<portName>
The protocol of this service port to generate (if a kubernetes service is required with multiple ports).
fabric8.volume.FOO.emptyDir = somemedium
Defines the empty dir volume with name FOO and medium somemedium.
fabric8.volume.FOO.hostPath = /some/path
Defines the host dir volume with name FOO.
fabric8.volume.FOO.mountPath = /some/path
Defines the volume mount with name FOO.
fabric8.volume.FOO.readOnly
Specifies whether or not a volume is read only.
fabric8.volume.FOO.secret = BAR
Defines the secret name to be BAR for the FOO volume.
Takes the kubernetes.json from fabric8:json and "applies" it to kubernetes
Synonymous with kubectl create -f <resource>
Can be applied as part of the mvn build/lifecycle
Just configure these environment variables:
KUBERNETES_MASTER: the location of the kubernetes master
KUBERNETES_NAMESPACE: the default namespace used on operations
mvn fabric8:apply -Dfabric8.recreate=true \
-Dfabric8.domain=foo.acme.com -Dfabric8.namespace=cheese
fabric8.apply.create
Should we create new resources (not in the kubernetes namespace). Defaults to true.
fabric8.apply.servicesOnly
Should only services be processed. This lets you run 2 builds, process the services only first; then process non-services. Defaults to false.
fabric8.apply.ignoreServices
Ignore any services in the JSON. This is useful if you wish to recreate all the ReplicationControllers and Pods but not recreate Services (which can cause PortalIP addresses to change for services, which can break some Pods and could cause problems for load balancers). Defaults to false.
fabric8.apply.createRoutes
If there is a route domain (see fabric8.domain) then this option will create an OpenShift Route for each service for the host expression: ${servicename}.${fabric8.domain}. Defaults to true.
fabric8.domain
The domain to expose the services as OpenShift Routes. Defaults to $KUBERNETES_DOMAIN.
fabric8.namespace
Specifies the namespace (or OpenShift project name) to apply the kubernetes resources to. If not specified it will use the
Demo time!!