Kubernetes API Conventions




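Preface

This article explains the conventions of the Kubernetes API. It is translated from the corresponding document maintained by the Kubernetes community. I translated it because I have referred to it many times while working with Kubernetes; it is very valuable and helps build a better understanding of the Kubernetes API. The translation is imperfect in places, so it is best read alongside the original. What follows is a partial translation.

This document is intended for developers who want a deep understanding of the structure of the Kubernetes API and who want to extend Kubernetes.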


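The conventions of the Kubernetes API are meant to ease client development and to ensure a consistent way of working across a wide range of use cases.

The general style of the Kubernetes API is RESTful: clients create, update, delete, or retrieve an object via the standard HTTP verbs (POST, PUT, DELETE, and GET), and those APIs preferentially accept and return JSON. Kubernetes also exposes additional endpoints for non-standard verbs and allows alternative content types (translator's note: for example log and exec). All of the JSON accepted and returned by the server has a schema, identified by the “kind” and “apiVersion” fields.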

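The following terms are defined:

  • Kind: the name of a particular object schema (for example, the “Cat” and “Dog” kinds would have different attributes and properties).
  • Resource: a representation of a system entity, sent or retrieved as JSON via HTTP. Resources are exposed via:
    • Collections: a list of resources of the same type, which may be queryable.
    • Elements: an individual resource, addressable via a URL.
  • API Group: a set of resources that are exposed together, identified through the “apiVersion” field, for example “policy.k8s.io/v1”.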

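Each resource typically accepts and returns data of a single kind. A kind may be accepted or returned by multiple resources. For instance, the kind “Pod” is exposed as a “pods” resource that allows end users to create, update, and delete pods, while a separate “pod status” resource (that acts on the “Pod” kind) allows automated processes to update a subset of the fields in that resource. (Translator's note: both resources operate on the same Pod object; the status resource only restricts which fields may be written.)

Resources are bound together in API groups. Each group may have one or more versions, and each version within a group may contain one or more resources. Group names are typically in domain name form. The Kubernetes project reserves the empty group, all single-word names, and all names ending in “*.k8s.io”. When choosing a group name, we recommend a subdomain that your organization owns, such as “widget.mycompany.com”.

Version strings should match the DNS_LABEL format.

Resource collections should be all lowercase and plural, whereas kinds are CamelCase and singular. Group names must be lowercase and must be valid DNS subdomains.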

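Kinds

Kinds are grouped into three categories:

  1. Objects represent a persistent entity in the system.

Creating an API object is a record of intent: once created, the system will work to ensure that resource exists. All API objects have common metadata (translator's note: ObjectMeta).

Examples: Pod, ReplicationController, Service, Namespace, Node

  2. Lists are collections of resources of one (usually) or more (occasionally) kinds. The name of a list kind must end with “List”. Lists have a limited set of common metadata. All lists use the required “items” field to contain the array of objects they return.

Most objects defined in the system should have an endpoint that returns the full set of resources, and may also have endpoints that return subsets of the full list. Some objects may be singletons (such as the current user) and may not have lists.

In addition, all lists that return objects with labels should support label filtering (see the labels documentation), and most lists should support filtering by fields (see the fields documentation).

Examples: PodList, ServiceList, NodeList

  3. Simple kinds are used for specific actions on objects and for non-persistent entities.

Given their limited scope, they have the same set of limited common metadata as lists.

For instance, the “Status” kind is returned when errors occur and is not persisted in the system.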

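Many simple resources are subresources, which are rooted at the API path of a specific resource. When a resource wants to expose alternative actions or views that are closely coupled to a single resource, it should do so using new subresources. Common subresources include:

  • /binding: used to bind a resource representing a user request (for example Pod or PersistentVolumeClaim) to a cluster infrastructure resource (for example Node or PersistentVolume).
  • /status: used to write just the status portion of a resource. For example, the /pods endpoint only allows updates to metadata and spec, since those reflect end-user intent. An automated process should be able to modify status for users to see by sending an updated Pod object to the server at the /pods/<name>/status endpoint. The alternate endpoint allows different rules to be applied to the update and access to be appropriately restricted.
  • /scale: used to read and write the count of a resource in a manner that is independent of the specific resource schema.

Two additional subresources, proxy and portforward, provide access to cluster resources.

The standard REST verbs (defined below) must return singular JSON objects. Some API endpoints may deviate from the strict REST pattern and return resources that are not singular JSON objects, such as streams of JSON objects or unstructured text log data.

A common set of “meta” API objects is used across all API groups and is thus considered part of the API group named meta.k8s.io. These types may evolve independently of the API groups that use them, and API servers may allow them to be addressed in their generic form. Examples are ListOptions, DeleteOptions, List, Status, WatchEvent, and Scale. For historical reasons these types are part of each existing API group. Generic tools such as quota, garbage collection, and autoscalers, and generic clients such as kubectl, leverage these types to define consistent behavior across different resource types, much like interfaces in programming languages.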

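Resources

All JSON objects returned by an API must have the following fields:

  • kind: a string that identifies the schema this object should have
  • apiVersion: a string that identifies the version of the schema the object should have

These fields are required for proper decoding of the object. They may be populated by the server by default from the specified URL path, but the client likely needs to know the values in order to construct the URL path.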
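As a rough illustration of why these two fields matter for decoding, the sketch below reads only kind and apiVersion from a JSON document and splits apiVersion into group and version. The type and function names (typeMeta, splitAPIVersion, decodeTypeMeta) and the “Widget” kind are hypothetical and are not part of any Kubernetes library.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// typeMeta mirrors the two fields every API object carries.
type typeMeta struct {
	Kind       string `json:"kind"`
	APIVersion string `json:"apiVersion"`
}

// splitAPIVersion splits "group/version" into its parts.
// A value without a slash (for example "v1") belongs to the empty group.
func splitAPIVersion(apiVersion string) (group, version string) {
	if i := strings.LastIndex(apiVersion, "/"); i >= 0 {
		return apiVersion[:i], apiVersion[i+1:]
	}
	return "", apiVersion
}

// decodeTypeMeta reads only kind and apiVersion and ignores all other fields.
func decodeTypeMeta(data []byte) (typeMeta, error) {
	var tm typeMeta
	err := json.Unmarshal(data, &tm)
	return tm, err
}

func main() {
	doc := []byte(`{"kind":"Widget","apiVersion":"widget.mycompany.com/v1","metadata":{"name":"demo"}}`)
	tm, err := decodeTypeMeta(doc)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	group, version := splitAPIVersion(tm.APIVersion)
	fmt.Println(tm.Kind, group, version)
}
```

A client would typically perform this step first and then pick the concrete type to decode the full body into.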

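Objects

Metadata

Every object kind MUST have the following metadata in a nested object field called “metadata”:

  • namespace: a DNS-compatible label that identifies the namespace the object lives in.
  • name: a string that identifies this object within the current namespace. The name is used to retrieve a single object.
  • uid: a value unique in time and space (typically an identifier generated per RFC 4122), used to distinguish between objects with the same name that have been deleted and recreated.

Every object SHOULD have the following metadata in a nested object field called “metadata”:

  • resourceVersion: a string that identifies the internal version of this object, which clients can use to determine when the object has changed. This value has no particular meaning to clients, which must pass it back to the server unmodified (translator's note: it is the global revision number in etcd). Resource versions carry no meaning across namespaces, across different kinds of resources, or across different servers (see concurrency control below).
  • generation: a sequence number representing a specific generation of the desired state. It is set by the system, increases monotonically, and is tracked per resource. Values may be compared (translator's note: it changes when spec changes).
  • creationTimestamp: a string representing the RFC 3339 date and time at which the object was created.
  • deletionTimestamp: a string representing an RFC 3339 date and time after which this resource will be deleted. The server sets this field when a user requests graceful deletion; clients cannot set it directly. The resource will be deleted (no longer visible in resource lists and not reachable by name) after the time in this field, unless the object has a finalizer set. If a finalizer is set, deletion is postponed at least until all finalizers have been removed. Once set, this value may not be unset or set further into the future, although it may be shortened or the resource may be deleted before this time.
  • labels: a map of string keys and values that can be used to organize and categorize objects.
  • annotations: a map of string keys and values that external tooling can use to store and retrieve arbitrary metadata about this object (see the annotations documentation).

Labels are intended for organizational purposes by end users. Annotations enable third-party automation and tooling to decorate objects with additional metadata for their own use.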

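Spec and Status

By convention, the Kubernetes API distinguishes between the specification of the desired state of an object (spec) and the status of the object at the current time (status).

The specification is a complete description of the desired state, including configuration settings provided by the user, default values expanded by the system, and properties initialized or otherwise changed after creation by other ecosystem components (for example the scheduler). It is persisted in stable storage.

The status summarizes the current state of the object in the system. It is usually persisted with the object by automated processes, but may be generated on the fly. As a general guideline, fields in status should be the most recent observations of actual state, but they may contain information such as the results of allocations or similar operations performed in response to the object's spec. See below for more details.

Types with both spec and status can (and usually should) have distinct authorization scopes. This allows users to be granted full write access to spec and read-only access to status, while the relevant controllers are granted read-only access to spec but full write access to status.

When a new version of an object is created (POST) or updated (PUT), the spec is updated and available immediately. Over time the system will work to bring the status into line with the spec. The system drives toward the most recent spec regardless of previous versions. For example, if a value is changed from 2 to 5 in one update and then back down to 3 in another, the system is not required to reach 5 before changing the status to 3. In other words, the system's behavior is level-based rather than edge-based. This enables robust behavior in the presence of missed intermediate state changes.

The Kubernetes API also serves as the foundation for the declarative configuration schema of the system. To facilitate level-based operation and the expression of declarative configuration, fields in the specification should have declarative rather than imperative names and semantics: they represent the desired state, not actions intended to yield the desired state.

The PUT and POST verbs on objects MUST ignore the status values, to avoid accidentally overwriting status in read-modify-write scenarios. A /status subresource MUST be provided to enable system components to update the statuses of the resources they manage.

In addition, PUT expects the whole object to be specified. Therefore, if a field is omitted, it is assumed that the client wants to clear that field's value. PUT does not support partial updates. To modify part of an object, GET the resource, modify its spec, labels, or annotations, and then PUT it back. See concurrency control below for the consistency of this read-modify-write pattern. Some objects may expose alternative resources that allow mutation of the status or custom actions on the object.

All objects that represent a physical resource whose state may vary from the user's desired intent SHOULD have a spec and a status. Objects whose state cannot vary from the user's desired intent MAY have only “spec”, and MAY rename “spec” to a more appropriate name (translator's note: for example ConfigMap).

Objects that contain both spec and status should not contain additional top-level fields other than the standard metadata fields.

Some objects that are not persisted in the system, such as SubjectAccessReview and other webhook-style calls, may choose to add spec and status to encapsulate a “call and response” pattern. The spec is the request (often a request for information) and the status is the response. For these RPC-like objects the only operation may be POST, but having a consistent schema between submission and response reduces the complexity of these clients.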

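Typical status properties

Conditions provide a standard mechanism for higher-level status reporting from a controller. They are an extension mechanism that allows tools and other controllers to collect summary information about resources without needing to understand resource-specific status details. Conditions should complement the more detailed information about the observed status of an object written by a controller, rather than replace it. For example, the “Available” condition of a Deployment can be determined by examining readyReplicas, replicas, and other properties of the Deployment. The “Available” condition saves other components from duplicating the logic that decides whether a Deployment is available.

Objects may report multiple conditions, and new types of conditions may be added in the future or by third-party controllers. Therefore, conditions are represented using a list, where each condition has a similar structure. This list should be treated as a map keyed by type.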
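Treating the list as a map keyed by type can be sketched as follows. This is a minimal illustration with a simplified, hypothetical Condition struct and helper names (setCondition, conditionStatus); it is not the helper shipped with any Kubernetes library. A condition that is not present is reported as Unknown, in line with the conventions described in this section.

```go
package main

import "fmt"

// Condition is a simplified stand-in for a resource-specific condition type.
type Condition struct {
	Type    string
	Status  string // "True", "False", or "Unknown"
	Reason  string
	Message string
}

// setCondition replaces the entry whose Type matches, or appends a new one,
// so the slice never holds two conditions of the same type.
func setCondition(conds []Condition, c Condition) []Condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

// conditionStatus returns the status for a type; a missing condition is
// reported as "Unknown".
func conditionStatus(conds []Condition, condType string) string {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status
		}
	}
	return "Unknown"
}

func main() {
	var conds []Condition
	conds = setCondition(conds, Condition{Type: "Ready", Status: "Unknown", Reason: "Reconciling"})
	conds = setCondition(conds, Condition{Type: "Ready", Status: "True", Reason: "AllReplicasReady"})
	fmt.Println(len(conds), conditionStatus(conds, "Ready"), conditionStatus(conds, "Succeeded"))
}
```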

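Conditions are most useful when they follow some consistent conventions:

  • Conditions should be added to explicitly convey properties that users and components care about, rather than requiring those properties to be inferred from other observations. Once defined, the meaning of a condition cannot be changed arbitrarily: it becomes part of the API and has the same backwards- and forwards-compatibility concerns as any other part of the API.
  • Controllers should apply their conditions to a resource the first time they see the resource, even if the status is Unknown. This allows other components in the system to know that the condition exists and that the controller is making progress on reconciling that resource (translator's note: it lets others know the resource is being handled).
    • Not all controllers will follow the advice about reporting “Unknown” or “False” values. For known conditions, the absence of a condition status should be interpreted the same as Unknown, and typically indicates that reconciliation has not yet finished (or that the resource state may not yet be observable).
  • For some conditions, True represents normal operation, and for some conditions, False represents normal operation. (“Normal-true” conditions are sometimes said to have “positive polarity”, and “normal-false” conditions are said to have “negative polarity”.) Without further knowledge of the conditions, it is not possible to compute a generic summary of the conditions on a resource.
  • Condition type names should make sense for humans; neither positive nor negative polarity can be recommended as a general rule. A negative condition like “MemoryExhausted” may be easier for humans to understand than “SufficientMemory”. Conversely, “Ready” or “Succeeded” may be easier to understand than “Failed”, because “Failed=Unknown” or “Failed=False” may cause double-negative confusion.
  • Condition type names should describe the current observed state of the resource, rather than the current state transition. This typically means that the name should be an adjective (“Ready”, “OutOfDisk”) or a past-tense verb (“Succeeded”, “Failed”) rather than a present-tense verb (“Deploying”). Intermediate states may be indicated by setting the status of the condition to Unknown.
    • For state transitions that take a long time (for example more than 1 minute), it is reasonable to treat the transition itself as an observed state. In these cases, the condition (such as “Resizing”) itself should not be transient, and should instead be signalled using the True/False/Unknown pattern. This allows other observers to determine whether the last update from the controller succeeded or failed. Where the state transition cannot complete and continued reconciliation is not feasible, the Reason and Message should be used to indicate that the transition failed.
  • When designing conditions for a resource, it is helpful to have a common top-level condition that summarizes more detailed conditions. Simple consumers may simply query the top-level condition. Although they are not a consistent standard, the Ready and Succeeded condition types may be used by API designers for long-running and bounded-execution objects, respectively.

For a resource Foo, a FooCondition type represents its conditions and may include the following fields, of which type and status are required and the rest are optional: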

type FooCondition struct {
  // Required fields.
  Type               FooConditionType   `json:"type" description:"type of Foo condition"`
  Status             ConditionStatus    `json:"status" description:"status of the condition, one of True, False, Unknown"`

  // +optional
  Reason             *string            `json:"reason,omitempty" description:"one-word CamelCase reason for the condition's last transition"`
  // +optional
  Message            *string            `json:"message,omitempty" description:"human-readable message indicating details about last transition"`

  // +optional
  LastTransitionTime *unversioned.Time  `json:"lastTransitionTime,omitempty" description:"last time the condition transit from one status to another"`
}

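New fields may be added in the future.

Do not use fields that you do not need: simpler is better.

Use of the Reason field is encouraged.

Condition types should be named in CamelCase. Short condition names are preferred (for example “Ready” over “MyResourceReady”).

Condition status values may be True, False, or Unknown. The absence of a condition should be interpreted the same as Unknown. How controllers handle Unknown depends on the condition in question.

The thinking around conditions has evolved over time, so there are several non-normative examples in wide use.

In general, condition values may change back and forth, but some condition transitions may be monotonic, depending on the resource and condition type. However, conditions are observations and not state machines, nor do we define comprehensive state machines for objects, nor behaviors associated with state transitions. The system is level-based rather than edge-triggered, and should assume an open world (translator's note: compare level-triggered and edge-triggered modes in epoll).

An example of a condition type that changes back and forth is “Ready”, which indicates that the object was believed to be fully operational at the time it was last probed. A possible monotonic condition could be “Succeeded”. A True status for Succeeded would imply completion and that the resource is no longer active. An object that is still active would generally have a Succeeded condition with status Unknown.

Some resources in the v1 API contain fields called phase, and associated message, reason, and other status fields. The pattern of using phase is deprecated. Newer API types should use conditions instead. Phase was essentially a state-machine enumeration field that contradicted system-design principles and hampered evolution, since adding new enum values breaks backward compatibility. Rather than encouraging clients to infer implicit properties from phases, we prefer to explicitly expose the individual conditions that clients need to monitor. Conditions also have the benefit that it is possible to create some conditions with uniform meaning across all resource types, while still exposing others that are unique to specific resource types. See #7856 for more details and discussion.

In condition types, and everywhere else they appear in the API, Reason is intended to be a one-word, CamelCase representation of the category of cause of the current status, and Message is intended to be a human-readable phrase or sentence, which may contain specific details of the individual occurrence. Reason is intended to be used in concise output, such as one-line kubectl get output, and in summarizing occurrences of causes, whereas Message is intended to be presented to users in detailed status explanations, such as kubectl describe output.

Historical status information (for example last transition time and failure counts) is only provided with reasonable effort, and is not guaranteed to be retained.

Some resources report observedGeneration in the status, which is the generation of the desired state that the reported status was most recently observed against (translator's note: see the description of the generation field under metadata). This can be used, for instance, to ensure that the reported status reflects the most recent desired state. (Translator's note: when spec is modified, generation changes, and status reflects the observation for a particular generation. Imagine a resource whose reconciliation takes 5 seconds and which is modified 1 second after creation.)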

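References to related objects

References to loosely coupled sets of objects, such as pods overseen by a replication controller, are usually best expressed using a label selector. To ensure that GETs of individual objects remain bounded in time and space, these sets may be queried via separate API queries, but will not be expanded in the referring object's status (translator's note: objects are related through labels rather than through something like foreign keys).

For references to specific objects, see Object reference.

Lists of named subobjects preferred over maps

This was discussed in #2004 and elsewhere. No API object uses maps of subobjects. Instead, the convention is to use a list of subobjects containing name fields. The Kubernetes documentation describes these conventions in more detail, along with how the semantics of lists, structs, and maps can be changed.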

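For example:

ports:
  - name: www
    containerPort: 80

versus

ports:
  www:
    containerPort: 80

This rule maintains the invariant that all JSON/YAML keys are fields in API objects. The only exceptions are pure maps in the API (currently labels, selectors, annotations, data), as opposed to sets of subobjects.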

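Primitive types

  • Avoid floating-point values as much as possible, and never use them in spec. Floating-point values cannot be reliably round-tripped (encoded and re-decoded) without changing, and have varying precision and representations across languages and architectures.
  • All numbers (for example uint32, int64) are converted to float64 by JavaScript and some other languages, so any field whose value may exceed float64 either in magnitude or in precision (specifically integer values > 53 bits) should be serialized and accepted as a string.
  • Do not use unsigned integers, due to inconsistent support across languages and libraries.
  • Do not use enums. Use aliases for string instead (for example NodeConditionType).
  • Look at similar fields in the API (for example ports, durations) and follow the conventions of existing fields (translator's note: refer to existing APIs where possible).
  • All public integer fields MUST use the Go (u)int32 or (u)int64 types, not (u)int (which is ambiguous depending on the target platform). Internal types may use (u)int.
  • Think twice about bool fields. Many ideas start as boolean but eventually trend towards a small set of mutually exclusive options. Plan for future expansion by describing the policy options explicitly as a string type alias (for example TerminationMessagePolicy).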

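Constants

Some fields have a list of allowed values (enumerations). These values are strings in CamelCase with an initial uppercase letter. Examples: ClusterFirst, Pending, ClientIP. When an acronym or initialism appears, each of its letters should be uppercase, as in ClientIP or TCPDelay. When a proper name or the name of a command-line executable is used as a constant, the proper name should be represented in consistent casing. Examples: systemd, iptables, IPVS, cgroupfs, Docker (as a generic concept), docker (as the command-line executable). If a proper name with mixed capitalization is used, such as eBPF, that capitalization should be preserved in a longer constant such as eBPFDelegation.

All APIs within Kubernetes must use constants in this style, including flags and configuration files. Where inconsistent constants were used previously, new flags should be CamelCase only, and over time old flags should be updated to accept a CamelCase value alongside the inconsistent constant. Example: the Kubelet's --topology-manager-policy flag has the values none, best-effort, restricted, and single-numa-node. This flag should accept None, BestEffort, Restricted, and SingleNUMANode. If new values are added to the flag, both forms should be supported.

Unions

Sometimes, at most one of a set of fields can be set. For example, the [volumes] field of a PodSpec has 17 different volume type-specific fields, such as nfs and iscsi. All fields in the set should be optional.

Sometimes, when a new type is created, the API designer may anticipate that a union will be needed in the future, even if only one field is allowed initially. In this case, be sure to make the field optional. Validation may still return an error if the sole field is unset. Do not set a default value for that field.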

Lists and Simple kinds

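Every list or simple kind SHOULD have the following metadata in a nested object field called “metadata”:

resourceVersion: a string that identifies the common version of the objects returned in a list. This value MUST be treated as opaque by clients and passed unmodified back to the server. A resource version is only valid within a single namespace on a single kind of resource. Every simple kind returned by the server, and any simple kind sent to the server that must support idempotency or optimistic concurrency, should return this value. Since simple resources are often used as input to alternate actions that modify objects, the resource version of the simple resource should correspond to the resource version of the object. (Translator's note: when you modify a resource at version 1, you send back version 1 together with the modified data. If someone else modified it concurrently and the server-side version has advanced, the modification fails.)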

Differing Representations

Verbs on Resources

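API resources should follow the traditional REST pattern:

  • GET /<resourceNamePlural> – Retrieve a list, for example GET /pods returns a list of pods.
  • POST /<resourceNamePlural> – Create a new resource from the JSON object provided by the client.
  • GET /<resourceNamePlural>/<name> – Retrieve a single resource with the given name, for example GET /pods/first returns a Pod named “first”. It should complete in constant time (translator's note: see SLI/SLO), and the resource should be bounded in size.
  • DELETE /<resourceNamePlural>/<name> – Delete the single resource with the given name. DeleteOptions may specify gracePeriodSeconds, a grace period in seconds before the resource is actually deleted.
  • DELETE /<resourceNamePlural> – Delete a list of type <resourceName>, for example DELETE /pods deletes a list of pods.
  • PUT /<resourceNamePlural>/<name> – Update or create the resource with the given name using the JSON object provided by the client. Whether a resource can be created with a PUT request depends on the storage strategy configured for that resource, specifically the return value of AllowCreateOnUpdate(). Most built-in types do not allow this (translator's note: see pkg/registry/apps/deployment/storage/storage.go).
  • PATCH /<resourceNamePlural>/<name> – Selectively modify the specified fields of the resource. See more information below.
  • GET /<resourceNamePlural>?watch=true – Receive a stream of JSON objects corresponding to changes made to the resources over time.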

PATCH operations

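The Kubernetes API supports several patch models. The model used is determined by the Content-Type of the request:

  • JSON Patch: Content-Type: application/json-patch+json
    • As defined in RFC6902, a JSON Patch is a sequence of operations that are executed on the resource, for example {"op": "add", "path": "/a/b/c", "value": [ "foo", "bar" ]}. For more details, see the RFC.
  • Merge Patch: Content-Type: application/merge-patch+json
    • As defined in RFC7386, a Merge Patch is essentially a partial representation of the resource. The submitted JSON is “merged” with the current resource to create a new one, and the new one is then saved. For more details on how to use Merge Patch, see the RFC.
  • Strategic Merge Patch: Content-Type: application/strategic-merge-patch+json
    • Strategic Merge Patch is a custom implementation of Merge Patch. For a detailed explanation of how it works and why it was introduced, see the document linked in the original text.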
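To make the “merge” step concrete, here is a small sketch of the RFC 7386 merge algorithm over values decoded by encoding/json. It illustrates plain Merge Patch only, not Strategic Merge Patch, and it is not the API server's implementation. The function names and the sample documents are made up for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergePatch applies an RFC 7386 merge patch to target and returns the result.
// Both arguments are values decoded by encoding/json. An object target is
// modified in place.
func mergePatch(target, patch interface{}) interface{} {
	patchObj, ok := patch.(map[string]interface{})
	if !ok {
		// A non-object patch (scalar, array, or null) replaces the target.
		return patch
	}
	targetObj, ok := target.(map[string]interface{})
	if !ok {
		// A non-object target is discarded and rebuilt from the patch.
		targetObj = map[string]interface{}{}
	}
	for key, value := range patchObj {
		if value == nil {
			// null in the patch removes the key from the target.
			delete(targetObj, key)
			continue
		}
		targetObj[key] = mergePatch(targetObj[key], value)
	}
	return targetObj
}

// applyMergePatch decodes both documents, merges them, and re-encodes.
func applyMergePatch(doc, patch []byte) ([]byte, error) {
	var target, p interface{}
	if err := json.Unmarshal(doc, &target); err != nil {
		return nil, err
	}
	if err := json.Unmarshal(patch, &p); err != nil {
		return nil, err
	}
	return json.Marshal(mergePatch(target, p))
}

func main() {
	doc := []byte(`{"metadata":{"labels":{"app":"web","tier":"front"}},"spec":{"replicas":2}}`)
	patch := []byte(`{"metadata":{"labels":{"tier":null,"env":"prod"}},"spec":{"replicas":3}}`)
	out, err := applyMergePatch(doc, patch)
	if err != nil {
		fmt.Println("patch failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

Note that arrays are replaced wholesale under this algorithm, which is one limitation a custom strategy can work around.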

Idempotency

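All compatible Kubernetes APIs MUST support “name idempotency” and respond with HTTP status code 409 when a POST request names an object that has the same name as an existing object in the system. See the identifiers documentation for details.

Names generated by the system may be requested using metadata.generateName. GenerateName indicates that the name should be made unique by the server prior to persisting it. A non-empty value for the field indicates that a unique name is wanted (and the name returned to the client will differ from the name passed). If the name field has not been provided, the value of this field is combined with a unique suffix on the server. The provided value must be valid within the rules for names. If this field is specified and name is not present, the server will NOT return a 409 if the generated name exists. Instead, it will return either 201 Created or 504 ServerTimeout, the latter indicating that a unique name could not be found in the time allotted. On a 504 the client should retry (optionally after the time indicated in the Retry-After header).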

Optional vs. Required

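Fields must be either optional or required.

Optional fields have the following properties:

  • They have the +optional comment tag in the field comment (translator's note: see code-gen).
  • They are a pointer type (for example AwesomeFlag *SomeFlag) or have a built-in nil value (for example maps and slices).
  • The API server should allow POSTing and PUTing a resource with this field unset.

In most cases, optional fields should also have the omitempty struct tag (omitempty specifies that the field should be omitted from the JSON encoding if it has an empty value). However, if you want to treat “not provided” differently from “provided but empty” for an optional field, do not use omitempty (for example kubernetes/kubernetes#34641, https://github.com/kubernetes/kubernetes/issues/34641).

Note that for backward compatibility, any field that has the omitempty struct tag is considered optional, but this may change in the future, and using the +optional comment tag is highly recommended.

Required fields have the opposite properties, namely:

  • They do not have a +optional comment tag.
  • They do not have an omitempty struct tag.
  • They are not a pointer type (for example AnotherFlag SomeFlag).
  • The API server should not allow POSTing or PUTing a resource with this field unset.

Using the +optional or the omitempty tag causes the OpenAPI documentation to reflect that the field is optional.

Using a pointer allows distinguishing unset from the zero value for that type. There are some cases where, in principle, a pointer is not needed for an optional field since the zero value is forbidden, and thus implies unset. There are examples of this in the codebase. However:

  • It can be difficult for implementors to anticipate all cases where an empty value might need to be distinguished from a zero value.
  • Structs are not omitted from encoder output even where omitempty is specified, which is messy.
  • Having a pointer consistently imply optional is clearer for users of the Go client, and any other clients that use the corresponding types.

Therefore, we ask that pointers always be used with optional fields that do not have a built-in nil value.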

Defaulting

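Default values are API version-specific, and they are applied during the conversion from API-versioned declarations to the internal objects representing the desired state (Spec) of the resource. Subsequent GETs of the resource will include the default values explicitly.

Incorporating the default values into the Spec ensures that Spec depicts the full desired state, so that it is easier for the system to determine how to achieve the state, and for the user to know what to anticipate.

A default value for a field can be specified with the +default= tag. Primitive fields are assigned their default during deserialization of the JSON object. If a field does not have the omitempty JSON tag and no default is specified, it is assigned the zero value of its type.

See the document linked in the original text for more information.

API version-specific default values are set by the API server.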

Late Initialization

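Late initialization is when resource fields are set by a system controller after an object is created or updated.

For example, the scheduler sets the pod.spec.nodeName field after the pod is created.

Late initializers should only make the following types of modifications:

  • Setting previously unset fields
  • Adding keys to maps (translator's note: for example labels and annotations)
  • Adding values to arrays that have mergeable semantics (the patchStrategy:"merge" attribute in the type definition) (translator's note: see Env in vendor/k8s.io/api/core/v1/types.go)

These conventions:

  • Allow a user (with sufficient privilege) to override any system-default behavior by setting the fields that would otherwise have been defaulted.
  • Enable updates from users to be merged with changes made during late initialization using strategic merge patch, as opposed to clobbering the change.
  • Allow the component that does the late initialization to use strategic merge patch, which facilitates composition and concurrency of such components.

Although the apiserver admission control stage acts prior to object creation, admission control plugins should follow the late-initialization conventions too, so that their implementation can later be moved to a “controller” or to client libraries.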

Concurrency Control and Consistency

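Kubernetes uses resource versions to achieve optimistic concurrency control. All Kubernetes resources have a “resourceVersion” field as part of their metadata. This field is a string that identifies the internal version of an object, which clients can use to determine when objects have changed. When a resource is about to be updated, its version is checked against the previously saved value, and if they do not match, the update fails with a StatusConflict (HTTP status code 409).

The resourceVersion changes every time an object is modified. If resourceVersion is included with a PUT operation, the system verifies that the current value of resourceVersion matches the specified value, to ensure that there have been no other successful mutations to the resource during the read/modify/write cycle.

The resourceVersion is currently backed by etcd's modifiedIndex. However, applications should not rely on the implementation details of the versioning system maintained by Kubernetes. We may change the implementation of resourceVersion in the future, for example to a timestamp or a per-object counter.

The only way for a client to know the value of resourceVersion is to have received it from the server in response to a prior operation, typically a GET. This value has no meaning to clients and must be passed back to the server unmodified. Clients should not assume that the resource version has meaning across namespaces, different kinds of resources, or different servers. Currently, the value of resourceVersion is set to match etcd's sequencer. You can think of it as a logical clock that the API server can use to order requests. However, we expect the implementation of resourceVersion to change in the future, for example if we shard the state by kind or namespace, or port to another storage system.

In the case of a conflict, the correct client action is to GET the resource again, apply the changes afresh, and submit again. This mechanism can be used to prevent races like the following: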

Client #1                                  Client #2
GET Foo                                    GET Foo
Set Foo.Bar = "one"                        Set Foo.Baz = "two"
PUT Foo                                    PUT Foo

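When these sequences occur in parallel without a version check, either the change to Foo.Bar or the change to Foo.Baz can be lost.

With resourceVersion, on the other hand, one of the PUTs will fail, since whichever write succeeds first changes the resourceVersion of Foo.

In the future, resourceVersion may be used as a precondition for other operations (for example GET and DELETE), such as for read-after-write consistency in the presence of caching.

“Watch” operations specify resourceVersion using a query parameter. It indicates the point from which to begin watching the resources. This can be used to ensure that no mutations are missed between a GET of a resource (or list of resources) and a subsequent Watch, even if the current version of the resource is more recent. This is currently the main reason that list operations (GET on a collection) return resourceVersion.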

Serialization Format

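APIs may return alternative representations of any resource in response to an Accept header, but the default serialization for requests and responses MUST be JSON.

A protobuf encoding is also supported for built-in resource types (translator's note: such as Pod). Since proto is not self-describing, there is an envelope wrapper that describes the type of the contents.

All dates should be serialized as RFC3339 strings.

Units

Units must either be explicit in the field name (for example timeoutSeconds), or must be specified as part of the value (for example resource.Quantity) (translator's note: a type in apimachinery). Which approach is preferred is still to be decided, though currently we use the fooSeconds convention for durations.

Duration fields must be defined as integer fields with the unit as part of the field name (for example leaseDurationSeconds). We do not use Duration in the API because that would require clients to implement Go-compatible string parsing.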

Selecting Fields

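Some APIs may need to identify which field in a JSON object is invalid, or refer to a field for other purposes. The current recommendation is to use standard JavaScript syntax for accessing the field, as if the JSON object had been converted into a JavaScript object, without a leading dot, such as metadata.name.

Examples:

  • Find the field “current” in the object “state” in the second item of the array “fields”: fields[1].state.current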

Object reference

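Object references on a namespaced type should usually refer only to objects in the same namespace. Since namespaces are a security boundary, cross-namespace references can have unexpected impacts, including:

  • Leaking information about one namespace into another namespace. It is natural to place status messages or even bits of content about the referenced object in the original object. This is a problem across namespaces.
  • Potential invasions into other namespaces. References often give access to a piece of referred information, so being able to express “give me that one over there” is dangerous across namespaces without additional work for permission checks or opt-in from both involved namespaces.
  • Referential integrity problems that one party cannot solve. Referencing namespace/B from namespace/A does not imply the power to control the other namespace. This means that you can refer to a thing you cannot create or update.
  • Unclear semantics on deletion. If a namespaced resource is referenced by other namespaces, should deletion of the referenced resource result in removal, or should the referenced resource be forced to remain?
  • Unclear semantics on creation. If a referenced resource is created after its reference, there is no way to know if it is the one that is expected or a different resource created with the same name.

Built-in types and ownerReferences do not support cross-namespace references. If a non-built-in type chooses to have cross-namespace references, the semantics of the edge cases above should be clearly described and the permission issues should be resolved. This could be done with a double opt-in (an opt-in from both the referrer and the referee) or with secondary permission checks performed in admission.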

Naming of the reference field

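The name of the reference field should be of the format “{field}Ref”, with the suffix “Ref” always included.

The “{field}” component should be named to indicate the purpose of the reference. For example, “targetRef” indicates that the reference points to the target.

It is acceptable for the “{field}” component to indicate the resource type, for example “secretRef” when referencing a secret. However, this carries the risk of the field becoming a misnomer if it is later expanded to reference more than one type.

In the case of a list of object references, the field should be of the format “{field}Refs”, with the same guidance as the singular case above.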

Referencing resources with multiple versions

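Most resources will have multiple versions. For example, core resources undergo version changes as they move from alpha to GA.

Controllers should assume that the version of a resource may change, and include appropriate error handling.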

Handling of resources that do not exist

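There are multiple scenarios where a desired resource may not exist. Examples include:

  • The desired version of the resource does not exist.
  • The resource has not yet been loaded during cluster bootstrapping.
  • User error.

Controllers should be written with the assumption that the referenced resource may not exist, and include error handling that makes the issue clear to the user.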

Validation of fields

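Many of the values used in an object reference are used as part of the API path (translator's note: the URL path). For example, the object name is used in the path to identify the object. Unsanitized, these values can be used to attempt to retrieve other resources, for example by using values that have semantic meaning in a path such as .. or /.

Have the controller validate fields before using them as path segments in an API request, and emit an event to tell the user that the validation has failed.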
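A minimal sketch of such a check is shown below. The function name validatePathSegment and its rule set are assumptions made for this example: it rejects empty names, the special segments “.” and “..”, and names containing “/” or “%”. A real controller may need stricter rules that match the naming requirements of the referenced resource.

```go
package main

import (
	"fmt"
	"strings"
)

// validatePathSegment returns an error when name is unsafe to splice into a
// URL path. The rule set is illustrative, not exhaustive.
func validatePathSegment(name string) error {
	switch {
	case name == "":
		return fmt.Errorf("name must not be empty")
	case name == "." || name == "..":
		return fmt.Errorf("name must not be %q", name)
	case strings.ContainsAny(name, "/%"):
		return fmt.Errorf("name %q must not contain '/' or '%%'", name)
	}
	return nil
}

func main() {
	for _, name := range []string{"foo", "..", "a/b", "secrets%2Fadmin"} {
		if err := validatePathSegment(name); err != nil {
			// A controller would emit an event here instead of printing.
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", name)
	}
}
```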

See Object Names and IDs for more information on legal object names.

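Do not modify the referred object

To minimize potential privilege escalation, do not modify the object that is being referred to, or limit modifications to objects in the same namespace and constrain the type of modification allowed (for example, the HorizontalPodAutoscaler controller only writes to the /scale subresource).

Minimize copying or printing values to the referrer object

Because the permissions of the controller can differ from those of the author of the object the controller is managing, the author of the object may not have permission to view the referred object. As a result, copying any values about the referred object to the referrer object can be considered a privilege escalation, enabling a user to read values they previously could not access.

The same applies to writing information about the referred object to events.

In general, do not write or print information retrieved from the referred object to the spec, other objects, or logs.

When it is necessary, consider whether these values are ones that the author of the referrer object could access by other means (for example, values already required to populate the object reference correctly).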

Object References Examples

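The following sections illustrate recommended schemas for various object reference scenarios.

The schemas outlined below are designed to allow purely additive fields as the types of referenceable objects expand, and are therefore backwards compatible.

For example, it is possible to go from a single resource type to multiple resource types without a breaking change to the schema.

Single resource reference

A single-kind object reference is straightforward in that the controller can hard-code most qualifiers needed to identify the object. As such, the only value that needs to be provided is the name (and namespace, although cross-namespace references are discouraged):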

# for a single resource, the suffix should be Ref, with the field name
# providing an indication as to the resource type referenced.
secretRef:
    name: foo
    # namespace would generally not be needed and is discouraged,
    # as explained above.
    namespace: foo-namespace

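This schema should only be used when the intention is always to reference a single resource type. If extending to multiple resource types is possible, use the multiple resource reference.

Controller behavior

The operator is expected to know the version, group, and resource name of the object it needs to retrieve the value from, and can use the discovery client (translator's note: client-go discovery) or construct the API path directly.

Multiple resource reference

A multi-kind object reference is used when there is a bounded set of valid resource types that a reference can point to.

As with a single-kind object reference, the operator can supply missing fields, provided that the fields that are present are sufficient to uniquely identify the resource type among the set of supported types.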

# guidance for the field name is the same as a single resource.
fooRef:
    group: sns.services.k8s.aws
    resource: topics
    name: foo
    namespace: foo-namespace

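Translator's note: /apis/sns.services.k8s.aws/v1/namespaces/foo-namespace/topics/foo

Although it is not always needed to help a controller identify a resource type, “group” is included to avoid ambiguity when the resource exists in multiple groups. It also provides clarity to end users and allows a reference to be copied and pasted without the referenced type changing because a different controller handles the reference.

Kind vs. Resource

A common point of confusion in object references is whether to construct references with a “kind” or a “resource” field. Historically most object references in Kubernetes have used “kind”. This is not as precise as “resource”. Although each combination of “group” and “resource” must be unique within Kubernetes, the same is not always true for “group” and “kind”. Multiple resources can use the same “kind”.

Typically all objects in Kubernetes have a canonical primary resource, such as “pods” representing the way to create and delete objects of the “Pod” kind. Although some resources cannot be created directly, such as a “Scale” object that is only used within the “scale” subresource of a number of workloads, most object references address the primary resource via its schema. In the context of object references, “kind” refers to the schema, not the resource.

If implementations of an object reference will always have a clear way to map kinds to resources, it is acceptable to use “kind” in the object reference. In general, this requires implementations to have a predefined mapping between kinds and resources (this is the case for built-in references that use “kind”). Relying on a dynamic kind-to-resource mapping is not safe. Even if a kind initially maps to a single resource, another resource may later refer to the same kind, potentially breaking any dynamic resource mapping.

If an object reference may be used to reference resources of arbitrary types and the mapping between kind and resource could be ambiguous, “resource” should be used in the object reference.

The Ingress API provides a good example of where “kind” is acceptable for an object reference. The API supports a backend reference as an extension point. Implementations can use this to support forwarding traffic to custom targets such as a storage bucket. Importantly, the supported target types are clearly defined by each implementation of the API and there is no ambiguity about which resource a kind maps to. This is because each Ingress implementation has a hard-coded mapping from kind to resource.

The object reference above would look like this if it used “kind” instead of “resource”: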

fooRef:
    group: sns.services.k8s.aws
    kind: Topic
    name: foo
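    namespace: foo-namespace

Controller behavior

The operator can store a map of (group, resource) to the version of that resource it desires. From there, it can construct the full path to the resource and retrieve the object.

It is also possible to have the controller choose a version that it finds via the discovery client. However, since schemas can vary across different versions of a resource, the controller must also handle these differences.

Generic object reference

A generic object reference is used when the desire is to provide a pointer to some object to simplify discovery for the user. For example, this could be used to reference the target object of a core.v1.Event.

With a generic object reference, it is not possible to extract any information about the referenced object aside from what is standard (for example ObjectMeta). Since the standard fields exist in every version of a resource, the version can be omitted in this case: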

fooObjectRef:
    group: operator.openshift.io
    resource: openshiftapiservers
    name: cluster
    # namespace is unset if the resource is cluster-scoped, or lives in the
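    # same namespace as the referrer.

Controller behavior

The operator is expected to find the resource via the discovery client (since the version is not supplied). As any retrievable field is common to all objects, any version of the resource should do.

Field reference

A field reference is used when the desire is to extract a value from a specific field in a referenced object.

Field references differ from other reference types, as the operator has no knowledge of the object prior to the reference. Since the schema of an object can differ between versions of a resource, a “version” is required for this type of reference.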

fooFieldRef:
   version: v1 # version of the resource
   # group is elided in the ConfigMap example, since it has a blank group in the OpenAPI spec.
   resource: configmaps
   fieldPath: data.foo

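The fieldPath should point to a single value, and use the recommended field selector notation to denote the field path.

Controller behavior

In this scenario, the user supplies all of the required path elements: group, version, resource, name, and possibly namespace. As such, the controller can construct the API prefix and query it without using the discovery client: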

/apis/{group}/{version}/{resource}/
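Path construction from a fully specified reference might look like the sketch below. ObjectRef and resourcePath are hypothetical names. The sketch assumes that resources in the empty (core) group are served under /api/{version}, as in the /api/v1/namespaces/default/pods/grafana request shown later in this document, while every other group is served under /apis/{group}/{version}. Each element should already have been validated as a safe path segment.

```go
package main

import (
	"fmt"
	"strings"
)

// ObjectRef holds the path elements that a fully qualified reference supplies.
type ObjectRef struct {
	Group, Version, Resource, Namespace, Name string
}

// resourcePath builds the REST path for the referenced object.
// The empty (core) group lives under /api, every other group under /apis.
// Namespace is left empty for cluster-scoped resources.
func resourcePath(ref ObjectRef) string {
	var parts []string
	if ref.Group == "" {
		parts = append(parts, "api", ref.Version)
	} else {
		parts = append(parts, "apis", ref.Group, ref.Version)
	}
	if ref.Namespace != "" {
		parts = append(parts, "namespaces", ref.Namespace)
	}
	parts = append(parts, ref.Resource, ref.Name)
	return "/" + strings.Join(parts, "/")
}

func main() {
	fmt.Println(resourcePath(ObjectRef{
		Group: "sns.services.k8s.aws", Version: "v1",
		Resource: "topics", Namespace: "foo-namespace", Name: "foo",
	}))
	fmt.Println(resourcePath(ObjectRef{
		Version: "v1", Resource: "configmaps", Namespace: "default", Name: "foo",
	}))
}
```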

HTTP Status codes

The server will respond with HTTP status codes that match the HTTP spec. See the
section below for a breakdown of the types of status codes the server will send.

The following HTTP status codes may be returned by the API.

Success codes

  • 200 StatusOK
    • Indicates that the request completed successfully.
  • 201 StatusCreated
    • Indicates that the request to create kind completed successfully.
  • 204 StatusNoContent
    • Indicates that the request completed successfully, and the response contains
      no body.
    • Returned in response to HTTP OPTIONS requests.

Error codes

  • 307 StatusTemporaryRedirect
    • Indicates that the address for the requested resource has changed.
    • Suggested client recovery behavior:
      • Follow the redirect.
  • 400 StatusBadRequest
    • Indicates that the request is invalid.
    • Suggested client recovery behavior:
      • Do not retry. Fix the request.
  • 401 StatusUnauthorized
    • Indicates that the server can be reached and understood the request, but
      refuses to take any further action, because the client must provide
      authorization. If the client has provided authorization, the server is
      indicating the provided authorization is unsuitable or invalid.
    • Suggested client recovery behavior:
      • If the user has not supplied authorization information, prompt them for
        the appropriate credentials. If the user has supplied authorization information,
        inform them their credentials were rejected and optionally prompt them again.
  • 403 StatusForbidden
    • Indicates that the server can be reached and understood the request, but
      refuses to take any further action, because it is configured to deny access for
      some reason to the requested resource by the client.
    • Suggested client recovery behavior:
      • Do not retry. Fix the request.
  • 404 StatusNotFound
    • Indicates that the requested resource does not exist.
    • Suggested client recovery behavior:
      • Do not retry. Fix the request.
  • 405 StatusMethodNotAllowed
    • Indicates that the action the client attempted to perform on the resource
      was not supported by the code.
    • Suggested client recovery behavior:
      • Do not retry. Fix the request.
  • 409 StatusConflict
    • Indicates that either the resource the client attempted to create already
      exists or the requested update operation cannot be completed due to a conflict.
    • Suggested client recovery behavior:
      • If creating a new resource:
        • Either change the identifier and try again, or GET and compare the
          fields in the pre-existing object and issue a PUT/update to modify the existing
          object.
      • If updating an existing resource:
        • See Conflict from the status response section below on how to
          retrieve more information about the nature of the conflict.
        • GET and compare the fields in the pre-existing object, merge changes (if
          still valid according to preconditions), and retry with the updated request
          (including ResourceVersion).
  • 410 StatusGone
    • Indicates that the item is no longer available at the server and no
      forwarding address is known.
    • Suggested client recovery behavior:
      • Do not retry. Fix the request.
  • 422 StatusUnprocessableEntity
    • Indicates that the requested create or update operation cannot be completed
      due to invalid data provided as part of the request.
    • Suggested client recovery behavior:
      • Do not retry. Fix the request.
  • 429 StatusTooManyRequests
    • Indicates that either the client rate limit has been exceeded or the
      server has received more requests than it can process.
    • Suggested client recovery behavior:
      • Read the Retry-After HTTP header from the response, and wait at least
        that long before retrying.
  • 500 StatusInternalServerError
    • Indicates that the server can be reached and understood the request, but
      either an unexpected internal error occurred and the outcome of the call is
      unknown, or the server cannot complete the action in a reasonable time (this may
      be due to temporary server load or a transient communication issue with another
      server).
    • Suggested client recovery behavior:
      • Retry with exponential backoff.
  • 503 StatusServiceUnavailable
    • Indicates that required service is unavailable.
    • Suggested client recovery behavior:
      • Retry with exponential backoff.
  • 504 StatusServerTimeout
    • Indicates that the request could not be completed within the given time.
      Clients can get this response ONLY when they specified a timeout param in the
      request.
    • Suggested client recovery behavior:
      • Increase the value of the timeout param and retry with exponential
        backoff.

Response Status Kind

Kubernetes will always return the Status kind from any API endpoint when an
error occurs. Clients SHOULD handle these types of objects when appropriate.

Status kind will be returned by the API in two cases:

  • When an operation is not successful (i.e. when the server would return a non
    2xx HTTP status code).
  • When an HTTP DELETE call is successful.

The status object is encoded as JSON and provided as the body of the response.
The status object contains fields for humans and machine consumers of the API to
get more detailed information for the cause of the failure. The information in
the status object supplements, but does not override, the HTTP status code’s
meaning. When fields in the status object have the same meaning as generally
defined HTTP headers and that header is returned with the response, the header
should be considered as having higher priority.

Example:

$ curl -v -k -H "Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc" https://10.240.122.184:443/api/v1/namespaces/default/pods/grafana

> GET /api/v1/namespaces/default/pods/grafana HTTP/1.1
> User-Agent: curl/7.26.0
> Host: 10.240.122.184
> Accept: */*
> Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc
>

< HTTP/1.1 404 Not Found
< Content-Type: application/json
< Date: Wed, 20 May 2015 18:10:42 GMT
< Content-Length: 232
<
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "pods \"grafana\" not found",
  "reason": "NotFound",
  "details": {
    "name": "grafana",
    "kind": "pods"
  },
  "code": 404
}

status field contains one of two possible values:

  • Success
  • Failure

message may contain a human-readable description of the error.

reason may contain a machine-readable, one-word, CamelCase description of why
this operation is in the Failure status. If this value is empty there is no
information available. The reason clarifies an HTTP status code but does not
override it.

details may contain extended data associated with the reason. Each reason may
define its own extended details. This field is optional and the data returned is
not guaranteed to conform to any schema except that defined by the reason type.
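A client can decode such a body into a small struct and branch on the machine-readable reason rather than on the message text. The sketch below models only the fields that appear in the example response above; the Go type and function names are illustrative and are not the types from the Kubernetes client libraries.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StatusDetails models only the details fields shown in the example above.
type StatusDetails struct {
	Name string `json:"name,omitempty"`
	Kind string `json:"kind,omitempty"`
}

// Status models only the fields shown in the example above.
type Status struct {
	Kind       string         `json:"kind"`
	APIVersion string         `json:"apiVersion"`
	Status     string         `json:"status"`
	Message    string         `json:"message,omitempty"`
	Reason     string         `json:"reason,omitempty"`
	Details    *StatusDetails `json:"details,omitempty"`
	Code       int            `json:"code,omitempty"`
}

// parseStatus decodes a response body and rejects anything that is not a Status.
func parseStatus(body []byte) (*Status, error) {
	var s Status
	if err := json.Unmarshal(body, &s); err != nil {
		return nil, err
	}
	if s.Kind != "Status" {
		return nil, fmt.Errorf("unexpected kind %q", s.Kind)
	}
	return &s, nil
}

// isNotFound branches on the machine-readable reason, not the message text.
func isNotFound(s *Status) bool {
	return s.Status == "Failure" && s.Reason == "NotFound"
}

func main() {
	body := []byte(`{
	  "kind": "Status",
	  "apiVersion": "v1",
	  "metadata": {},
	  "status": "Failure",
	  "message": "pods \"grafana\" not found",
	  "reason": "NotFound",
	  "details": {"name": "grafana", "kind": "pods"},
	  "code": 404
	}`)
	s, err := parseStatus(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(isNotFound(s), s.Code, s.Details.Name)
}
```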

Possible values for the reason and details fields:

  • BadRequest
    • Indicates that the request itself was invalid, because the request doesn’t
      make any sense, for example deleting a read-only object.
    • This differs from the status reason Invalid below, which indicates that
      the API call could possibly succeed, but the data was invalid.
    • API calls that return BadRequest can never succeed.
    • HTTP status code: 400 StatusBadRequest
  • Unauthorized
    • Indicates that the server can be reached and understood the request, but
      refuses to take any further action without the client providing appropriate
      authorization. If the client has provided authorization, this error indicates
      the provided credentials are insufficient or invalid.
    • Details (optional):
      • kind string
        • The kind attribute of the unauthorized resource (on some operations may
          differ from the requested resource).
      • name string
        • The identifier of the unauthorized resource.
    • HTTP status code: 401 StatusUnauthorized
  • Forbidden
    • Indicates that the server can be reached and understood the request, but
      refuses to take any further action, because it is configured to deny access for
      some reason to the requested resource by the client.
    • Details (optional):
      • kind string
        • The kind attribute of the forbidden resource (on some operations may
          differ from the requested resource).
      • name string
        • The identifier of the forbidden resource.
    • HTTP status code: 403 StatusForbidden
  • NotFound
    • Indicates that one or more resources required for this operation could not
      be found.
    • Details (optional):
      • kind string
        • The kind attribute of the missing resource (on some operations may
          differ from the requested resource).
      • name string
        • The identifier of the missing resource.
    • HTTP status code: 404 StatusNotFound
  • AlreadyExists
    • Indicates that the resource you are creating already exists.
    • Details (optional):
      • kind string
        • The kind attribute of the conflicting resource.
      • name string
        • The identifier of the conflicting resource.
    • HTTP status code: 409 StatusConflict
  • Conflict
    • Indicates that the requested update operation cannot be completed due to a
      conflict. The client may need to alter the request. Each resource may define
      custom details that indicate the nature of the conflict.
    • HTTP status code: 409 StatusConflict
  • Invalid
    • Indicates that the requested create or update operation cannot be completed
      due to invalid data provided as part of the request.
    • Details (optional):
      • kind string
        • the kind attribute of the invalid resource
      • name string
        • the identifier of the invalid resource
      • causes
        • One or more StatusCause entries indicating the data in the provided
          resource that was invalid. The reason, message, and field attributes will
          be set.
    • HTTP status code: 422 StatusUnprocessableEntity
  • Timeout
    • Indicates that the request could not be completed within the given time.
      Clients may receive this response if the server has decided to rate limit the
      client, or if the server is overloaded and cannot process the request at this
      time.
    • HTTP status code: 429 TooManyRequests
    • The server should set the Retry-After HTTP header and return
      retryAfterSeconds in the details field of the object. A value of 0 is the
      default.
  • ServerTimeout
    • Indicates that the server can be reached and understood the request, but
      cannot complete the action in a reasonable time. This may be due to temporary
      server load or a transient communication issue with another server.
    • Details (optional):
      • kind string
        • The kind attribute of the resource being acted on.
      • name string
        • The operation that is being attempted.
    • The server should set the Retry-After HTTP header and return
      retryAfterSeconds in the details field of the object. A value of 0 is the
      default.
    • HTTP status code: 504 StatusServerTimeout
  • MethodNotAllowed
    • Indicates that the action the client attempted to perform on the resource
      was not supported by the code.
    • For instance, attempting to delete a resource that can only be created.
    • API calls that return MethodNotAllowed can never succeed.
    • HTTP status code: 405 StatusMethodNotAllowed
  • InternalError
    • Indicates that an unexpected internal error occurred and that the outcome
      of the call is unknown.
    • Details (optional):
      • causes
        • The original error.
    • HTTP status code: 500 StatusInternalServerError

code may contain the suggested HTTP return code for this status.
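
As a rough illustration of how the reasons in this part of the list line up with HTTP codes, the sketch below builds a Status-shaped dictionary. The helper name build_failure_status and its parameters are hypothetical, and only the reasons listed here are covered; it is not the apiserver's implementation.

```python
# Reason -> HTTP status code, copied from the list above.
REASON_TO_HTTP_CODE = {
    "NotFound": 404,
    "AlreadyExists": 409,
    "Conflict": 409,
    "Invalid": 422,
    "Timeout": 429,
    "ServerTimeout": 504,
    "MethodNotAllowed": 405,
    "InternalError": 500,
}

# Reasons for which the server should also report retryAfterSeconds.
RETRYABLE_REASONS = {"Timeout", "ServerTimeout"}


def build_failure_status(reason, message, kind=None, name=None, retry_after_seconds=0):
    """Build a Status-shaped dict for a failed request (hypothetical helper)."""
    details = {}
    if kind is not None:
        details["kind"] = kind
    if name is not None:
        details["name"] = name
    if reason in RETRYABLE_REASONS:
        # 0 is the default value, as stated in the list above.
        details["retryAfterSeconds"] = retry_after_seconds
    status = {
        "kind": "Status",
        "apiVersion": "v1",
        "status": "Failure",
        "reason": reason,
        "message": message,
        "code": REASON_TO_HTTP_CODE[reason],
    }
    if details:
        status["details"] = details
    return status
```

A caller would use it as build_failure_status("NotFound", "pod not found", kind="pods", name="web"). A real server would also set the Retry-After header for the two retryable reasons.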

Events

Events are complementary to status information, since they can provide some
historical information about status and occurrences in addition to current or
previous status. Generate events for situations users or administrators should be alerted about.

Choose a unique, specific, short, CamelCase reason for each event category. For example, FreeDiskSpaceInvalid is a good event reason because it is likely to refer to just one situation, but Started is not a good reason because it doesn’t sufficiently indicate what started, even when combined with other event fields.

“Error creating foo” or “Error creating foo %s” would be appropriate for an
event message, with the latter being preferable, since it is more informational.

Accumulate repeated events in the client, especially for frequent events, to
reduce data volume, load on the system, and noise exposed to users.
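
One possible way to accumulate repeated events on the client side is sketched below. The EventAccumulator class is hypothetical, and the reason check is deliberately loose: it only verifies the CamelCase shape, not whether the reason is specific enough.

```python
import re

# Loose shape check: starts with an uppercase letter, alphanumerics only.
REASON_SHAPE = re.compile(r"[A-Z][A-Za-z0-9]*")


class EventAccumulator:
    """Collapses repeated events into one entry with a count (illustrative only)."""

    def __init__(self):
        self._counts = {}

    def record(self, involved_object, reason, message):
        if not REASON_SHAPE.fullmatch(reason):
            raise ValueError("reason must be a CamelCase identifier")
        key = (involved_object, reason, message)
        self._counts[key] = self._counts.get(key, 0) + 1

    def flush(self):
        """Return one dict per distinct event and reset the accumulator."""
        events = [
            {"involvedObject": obj, "reason": reason, "message": message, "count": count}
            for (obj, reason, message), count in self._counts.items()
        ]
        self._counts = {}
        return events
```

With this shape, a client that sees the same failure many times in a row sends a single event with a count rather than one event per occurrence.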

Naming conventions

  • Go field names must be PascalCase. JSON field names must be camelCase. Other than capitalization of the initial letter, the two should almost always match. No underscores or dashes in either.
  • Field and resource names should be declarative, not imperative (SomethingDoer, DoneBy, DoneAt).
  • Use Node where referring to the node resource in the context of the cluster. Use Host where referring to properties of the individual physical/virtual system, such as hostname, hostPath, hostNetwork, etc.
  • FooController is a deprecated kind naming convention. Name the kind after
    the thing being controlled instead (e.g., Job rather than JobController).
  • The name of a field that specifies the time at which something occurs should be called somethingTime. Do not use stamp (e.g., creationTimestamp).
  • We use the fooSeconds convention for durations, as discussed in the units subsection.
    • fooPeriodSeconds is preferred for periodic intervals and other waiting
      periods (e.g., over fooIntervalSeconds).
    • fooTimeoutSeconds is preferred for inactivity/unresponsiveness deadlines.
    • fooDeadlineSeconds is preferred for activity completion deadlines.
  • Do not use abbreviations in the API, except where they are extremely commonly used, such as “id”, “args”, or “stdin”.
  • Acronyms should similarly only be used when extremely commonly known. All
    letters in the acronym should have the same case, using the appropriate case for the situation. For example, at the beginning of a field name, the acronym should be all lowercase, such as “httpGet”. Where used as a constant, all letters should be uppercase, such as “TCP” or “UDP”.
  • The name of a field referring to another resource of kind Foo by name should be called fooName. The name of a field referring to another resource of kind
    Foo by ObjectReference (or subset thereof) should be called fooRef.
  • More generally, include the units and/or type in the field name if they could
    be ambiguous and they are not specified by the value or value type.
  • The name of a field expressing a boolean property called ‘fooable’ should be called Fooable, not IsFooable.
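
The relationship between Go and JSON field names described above can be approximated with a small heuristic. The function go_to_json_name is a hypothetical helper, not part of any Kubernetes tooling, and it does not handle every acronym case.

```python
def go_to_json_name(go_name):
    """Derive a camelCase JSON name from a PascalCase Go field name (heuristic)."""
    if "_" in go_name or "-" in go_name:
        raise ValueError("field names must not contain underscores or dashes")
    # Length of the leading run of uppercase letters.
    run = 0
    while run < len(go_name) and go_name[run].isupper():
        run += 1
    if run == 0:
        raise ValueError("Go field names must start with an uppercase letter")
    if run == 1 or run == len(go_name):
        # Ordinary word ("HostNetwork") or a name that is one acronym ("URL").
        return go_name[:run].lower() + go_name[run:]
    # Leading acronym followed by another word ("HTTPGet"): lowercase the
    # whole acronym but keep the capital that starts the next word.
    return go_name[: run - 1].lower() + go_name[run - 1:]
```

The acronym branch reflects the rule that all letters of an acronym share the same case, so HTTPGet becomes httpGet rather than hTTPGet.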

Namespace Names

  • The name of a namespace must be a DNS_LABEL.
  • The kube- prefix is reserved for Kubernetes system namespaces, e.g. kube-system and kube-public.
  • See the namespace docs for more information.
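
A minimal sketch of these two rules follows, assuming DNS_LABEL means an RFC 1123 label (lowercase alphanumerics and '-', starting and ending with an alphanumeric, at most 63 characters). The function check_namespace_name and its is_system flag are hypothetical. The kube- check encodes the convention only and makes no claim about what the apiserver enforces.

```python
import re

# RFC 1123 label: lowercase alphanumerics or '-', alphanumeric at both ends.
DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
DNS_LABEL_MAX_LENGTH = 63


def check_namespace_name(name, is_system=False):
    """Return a list of problems with a proposed namespace name."""
    problems = []
    if len(name) > DNS_LABEL_MAX_LENGTH:
        problems.append("must be no more than 63 characters")
    if not DNS_LABEL.fullmatch(name):
        problems.append("must be a DNS_LABEL")
    if name.startswith("kube-") and not is_system:
        problems.append("must not start with 'kube-'")
    return problems
```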

Label, selector, and annotation conventions

Labels are the domain of users. They are intended to facilitate organization and management of API resources using attributes that are meaningful to users, as opposed to meaningful to the system. Think of them as user-created mp3 or email inbox labels, as opposed to the directory structure used by a program to store its data. The former enables the user to apply an arbitrary ontology, whereas the latter is implementation-centric and inflexible. Users will use labels to select resources to operate on, display label values in CLI/UI columns, etc. Users should always retain full power and flexibility over the label schemas they apply to labels in their namespaces.

However, we should support conveniences for common cases by default. For
example, what we now do in ReplicationController is automatically set the RC’s selector and labels to the labels in the pod template by default, if they are not already set. That ensures that the selector will match the template, and that the RC can be managed using the same labels as the pods it creates. Note that once we generalize selectors, it won’t necessarily be possible to unambiguously generate labels that match an arbitrary selector.

If the user wants to apply additional labels to the pods that it doesn’t select upon, such as to facilitate adoption of pods or in the expectation that some label values will change, they can set the selector to a subset of the pod labels. Similarly, the RC’s labels could be initialized to a subset of the pod template’s labels, or could include additional/different labels.
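
The defaulting behavior described in the last two paragraphs might look roughly like the following. Objects are modeled as plain dictionaries, and the function names are hypothetical rather than taken from the Kubernetes code base.

```python
import copy


def default_selector_and_labels(rc):
    """Fill in an RC's selector and labels from its pod template if unset."""
    rc = copy.deepcopy(rc)
    template_labels = rc["spec"]["template"]["metadata"].get("labels", {})
    if not rc["spec"].get("selector"):
        rc["spec"]["selector"] = dict(template_labels)
    metadata = rc.setdefault("metadata", {})
    if not metadata.get("labels"):
        metadata["labels"] = dict(template_labels)
    return rc


def selector_matches(selector, labels):
    """An equality-based selector matches if every key/value pair is present."""
    return all(labels.get(key) == value for key, value in selector.items())
```

Because defaults are only applied to unset fields, a user who sets the selector to a subset of the pod labels keeps that subset, and it still matches the pods created from the template.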

For disciplined users managing resources within their own namespaces, it’s not that hard to consistently apply schemas that ensure uniqueness. One just needs to ensure that at least one value of some label key in common differs compared to all other comparable resources. We could/should provide a verification tool to check that. However, development of conventions similar to the examples in Labels makes uniqueness straightforward. Furthermore, relatively narrowly used namespaces (e.g., per environment, per application) can be used to reduce the set of resources that could potentially cause overlap.

In cases where users could be running misc. examples with inconsistent schemas, or where tooling or components need to programmatically generate new objects to be selected, there needs to be a straightforward way to generate unique label sets. A simple way to ensure uniqueness of the set is to ensure uniqueness of a single label value, such as by using a resource name, uid, resource hash, or generation number.

Problems with uids and hashes, however, include that they have no semantic
meaning to the user, are not memorable nor readily recognizable, and are not
predictable. Lack of predictability obstructs use cases such as creation of a
replication controller from a pod, such as people want to do when exploring the system, bootstrapping a self-hosted cluster, or deletion and re-creation of a new RC that adopts the pods of the previous one, such as to rename it. Generation numbers are more predictable and much clearer, assuming there is a
logical sequence. Fortunately, for deployments that’s the case. For jobs, use of creation timestamps is common internally. Users should always be able to turn off auto-generation, in order to permit some of the scenarios described above. Note that auto-generated labels will also become one more field that needs to be stripped out when cloning a resource, within a namespace, in a new namespace, in a new cluster, etc., and will need to be ignored when updating a resource via patch or read-modify-write sequence.

Inclusion of a system prefix in a label key is fairly hostile to UX. A prefix is only necessary in the case that the user cannot choose the label key, in order to avoid collisions with user-defined labels. However, I firmly believe that the user should always be allowed to select the label keys to use on their resources, so it should always be possible to override default label keys.

Therefore, resources supporting auto-generation of unique labels should have a uniqueLabelKey field, so that the user could specify the key if they wanted to, but if unspecified, it could be set by default, such as to the resource type, like job, deployment, or replicationController. The value would need to be at least spatially unique, and perhaps temporally unique in the case of job.

Annotations have very different intended usage from labels. They are
primarily generated and consumed by tooling and system extensions, or are used by end-users to engage non-standard behavior of components. For example, an annotation might be used to indicate that an instance of a resource expects additional handling by non-kubernetes controllers. Annotations may carry arbitrary payloads, including JSON documents. Like labels, annotation keys can be prefixed with a governing domain (e.g. example.com/key-name). Unprefixed keys (e.g. key-name) are reserved for end-users. Third-party components must use prefixed keys. Key prefixes under the “kubernetes.io” and “k8s.io” domains are reserved for use by the kubernetes project and must not be used by third-parties.

In early versions of Kubernetes, some in-development features represented new API fields as annotations, generally with the form something.alpha.kubernetes.io/name or something.beta.kubernetes.io/name (depending on our confidence in it). This pattern is deprecated. Some such annotations may still exist, but no new annotations may be defined. New API fields are now developed as regular fields.

Other advice regarding use of labels, annotations, taints, and other generic map keys by Kubernetes components and tools:

  • Key names should be all lowercase, with words separated by dashes instead of camelCase
    • For instance, prefer foo.kubernetes.io/foo-bar over foo.kubernetes.io/fooBar, prefer desired-replicas over DesiredReplicas
  • Unprefixed keys are reserved for end-users. All other labels and annotations must be prefixed.
  • Key prefixes under “kubernetes.io” and “k8s.io” are reserved for the Kubernetes project.
    • Such keys are effectively part of the kubernetes API and may be subject to deprecation and compatibility policies.
    • “kubernetes.io” is the preferred form for labels and annotations, “k8s.io” should not be used for new map keys.
  • Key names, including prefixes, should be precise enough that a user could plausibly understand where it came from and what it is for.
  • Key prefixes should carry as much context as possible.
    • For instance, prefer subsystem.kubernetes.io/parameter over kubernetes.io/subsystem-parameter
  • Use annotations to store API extensions that the controller responsible for the resource doesn’t need to know about, experimental fields that aren’t intended to be generally used API fields, etc. Beware that annotations aren’t automatically handled by the API conversion machinery.
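
The key conventions in this list can be checked mechanically. The sketch below takes the point of view of a third-party component choosing a key; the function check_component_key is hypothetical, and its naming rule is stricter than what the API actually accepts, since it encodes the convention rather than the syntax.

```python
import re

RESERVED_DOMAINS = ("kubernetes.io", "k8s.io")
# Lowercase words separated by single dashes, e.g. "desired-replicas".
LOWER_DASHED = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def split_key(key):
    """Split 'prefix/name' into (prefix, name); prefix is None if absent."""
    prefix, separator, name = key.rpartition("/")
    return (prefix if separator else None), name


def check_component_key(key):
    """Return problems with a key that a third-party component wants to set."""
    problems = []
    prefix, name = split_key(key)
    if prefix is None:
        problems.append("must be prefixed; unprefixed keys are reserved for end-users")
    elif any(prefix == d or prefix.endswith("." + d) for d in RESERVED_DOMAINS):
        problems.append("must not use a prefix under 'kubernetes.io' or 'k8s.io'")
    if not LOWER_DASHED.fullmatch(name):
        problems.append("name must be lowercase words separated by dashes")
    return problems
```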

WebSockets and SPDY

Some of the API operations exposed by Kubernetes involve transfer of binary streams between the client and a container, including attach, exec, portforward, and logging. The API therefore exposes certain operations over upgradeable HTTP connections (described in RFC 2817) via the WebSocket and SPDY protocols. These actions are exposed as subresources with their associated verbs (exec, log, attach, and portforward) and are requested via a GET (to support JavaScript in a browser) and POST (semantically accurate).

There are two primary protocols in use today:

  1. Streamed channels: When dealing with multiple independent binary streams of data such as the remote execution of a shell command (writing to STDIN, reading from STDOUT and STDERR) or forwarding multiple ports, the streams can be multiplexed onto a single TCP connection. Kubernetes supports a SPDY based framing protocol that leverages SPDY channels and a WebSocket framing protocol that multiplexes multiple channels onto the same stream by prefixing each binary chunk with a byte indicating its channel. The WebSocket protocol supports an optional subprotocol that handles base64-encoded bytes from the client and returns base64-encoded bytes from the server and character based channel prefixes (‘0’, ‘1’, ‘2’) for ease of use from JavaScript in a browser.
  2. Streaming response: The default log output for a channel of streaming data is an HTTP Chunked Transfer-Encoding, which can return an arbitrary stream of binary data from the server. Browser-based JavaScript is limited in its ability to access the raw data from a chunked response, especially when very large amounts of logs are returned, and in future API calls it may be desirable to transfer large files. The streaming API endpoints support an optional WebSocket upgrade that provides a unidirectional channel from the server to the client and chunks data as binary WebSocket frames. An optional WebSocket subprotocol is exposed that base64 encodes the stream before returning it to the client.

Clients should use the SPDY protocols if their clients have native support, or WebSockets as a fallback. Note that WebSockets is susceptible to Head-of-Line blocking and so clients must read and process each message sequentially. In the future, an HTTP/2 implementation will be exposed that deprecates SPDY.
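
The channel-prefix framing described for the WebSocket protocol can be sketched as follows. The function names are hypothetical, and the example assumes that channels 0, 1, and 2 carry STDIN, STDOUT, and STDERR respectively, which the text above implies but does not state.

```python
import base64


def encode_binary_frame(channel, data):
    """Prefix a binary chunk with one byte identifying its channel."""
    if not 0 <= channel <= 255:
        raise ValueError("channel must fit in one byte")
    return bytes([channel]) + data


def decode_binary_frame(frame):
    """Split a frame into (channel, payload)."""
    if not frame:
        raise ValueError("empty frame")
    return frame[0], frame[1:]


def encode_base64_frame(channel, data):
    """Base64 variant: an ASCII digit prefix followed by base64 payload."""
    if not 0 <= channel <= 9:
        raise ValueError("channel must be a single digit")
    return str(channel).encode("ascii") + base64.b64encode(data)


def decode_base64_frame(frame):
    """Split a base64 frame into (channel, decoded payload)."""
    if not frame:
        raise ValueError("empty frame")
    return int(frame[:1].decode("ascii")), base64.b64decode(frame[1:])
```

The base64 variant trades bandwidth for text-safe frames, which is what makes it easier to consume from JavaScript in a browser.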

Validation

API objects are validated upon receipt by the apiserver. Validation errors are flagged and returned to the caller in a Failure status with reason set to Invalid. In order to facilitate consistent error messages, we ask that validation logic adheres to the following guidelines whenever possible (though exceptional cases will exist).

  • Be as precise as possible.
  • Telling users what they CAN do is more useful than telling them what they CANNOT do.
  • When asserting a requirement in the positive, use “must”. Examples: “must be greater than 0”, “must match regex ‘[a-z]+’”. Words like “should” imply that the assertion is optional, and must be avoided.
  • When asserting a formatting requirement in the negative, use “must not”. Example: “must not contain ‘..’”. Words like “should not” imply that the assertion is optional, and must be avoided.
  • When asserting a behavioral requirement in the negative, use “may not”. Examples: “may not be specified when otherField is empty”, “only name may be specified”.
  • When referencing a literal string value, indicate the literal in single-quotes. Example: “must not contain ‘..’”.
  • When referencing another field name, indicate the name in back-quotes. Example: “must be greater than `request`”.
  • When specifying inequalities, use words rather than symbols. Examples: “must be less than 256”, “must be greater than or equal to 0”. Do not use words like “larger than”, “bigger than”, “more than”, “higher than”, etc.
  • When specifying numeric ranges, use inclusive ranges when possible.
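
To make the wording guidelines concrete, here is a small sketch of a validator for a made-up Widget spec. The fields replicas, port, path, limit, and request are hypothetical, as is the choice of "FieldValueInvalid" as the cause reason for every entry.

```python
def _cause(field, message):
    """Build a StatusCause-shaped dict with reason, message, and field set."""
    return {"reason": "FieldValueInvalid", "message": message, "field": field}


def validate_widget_spec(spec):
    """Return StatusCause-shaped entries for every invalid field in spec."""
    causes = []
    if spec.get("replicas", 0) < 0:
        # Words rather than symbols for the inequality.
        causes.append(_cause("spec.replicas", "must be greater than or equal to 0"))
    if not 1 <= spec.get("port", 1) <= 65535:
        # Inclusive numeric range.
        causes.append(_cause("spec.port", "must be between 1 and 65535, inclusive"))
    if ".." in spec.get("path", ""):
        # Literal string value in single quotes.
        causes.append(_cause("spec.path", "must not contain '..'"))
    if spec.get("limit", 0) < spec.get("request", 0):
        # Other field name in back-quotes.
        causes.append(_cause("spec.limit", "must be greater than or equal to `request`"))
    return causes
```

The returned list is what would populate the causes entry in the details of a Status whose reason is Invalid.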

Automatic Resource Allocation And Deallocation

API objects are often union objects containing the following:

  1. One or more fields identifying the Type specific to API object (aka the discriminator).
  2. A set of N fields, only one of which should be set at any given time – effectively a union.

Controllers operating on the API type often allocate resources based on the Type and/or some additional data provided by the user. A canonical example of this is the Service API object, where resources such as IPs and network ports will be set in the API object based on Type. When the user does not specify resources, they will be allocated, and when the user specifies an exact value, it will be reserved or rejected.

When the user chooses to change the discriminator value (e.g., from Type X to Type Y) without changing any other fields, the system should clear the fields that were used to represent Type X in the union and release the resources that were attached to Type X. This should happen automatically, irrespective of how these values and resources were allocated (i.e., reserved by the user or automatically allocated by the system). A concrete example of this is again the Service API. The system allocates resources such as NodePorts and ClusterIPs and automatically fills in the fields that represent them when the service is of type NodePort or ClusterIP (discriminator values). These resources and the fields representing them are automatically cleared when the user changes the service type to ExternalName, where these resources and field values no longer apply.
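
The clearing behavior can be sketched with a simplified, flat model of a Service spec. The mapping FIELDS_BY_TYPE and the function apply_type_change are hypothetical; a real Service nests node ports under its ports list and has more types than shown here.

```python
import copy

# Hypothetical mapping from discriminator value to the fields that apply to it.
FIELDS_BY_TYPE = {
    "ClusterIP": {"clusterIP"},
    "NodePort": {"clusterIP", "nodePort"},
    "ExternalName": {"externalName"},
}
ALL_UNION_FIELDS = set().union(*FIELDS_BY_TYPE.values())


def apply_type_change(old_spec, new_spec):
    """Return (spec, released): new_spec with stale fields cleared, and what was freed."""
    result = copy.deepcopy(new_spec)
    released = {}
    if old_spec["type"] == new_spec["type"]:
        return result, released
    stale = ALL_UNION_FIELDS - FIELDS_BY_TYPE[new_spec["type"]]
    for field in sorted(stale):
        # Clear only values carried over unchanged from the old type.
        if field in result and result[field] == old_spec.get(field):
            released[field] = result.pop(field)
    return result, released
```

The released dictionary stands in for whatever the system must hand back to its allocators, such as the IP and port ranges.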

Representing Allocated Values

Many API types include values that are allocated on behalf of the user from some larger space (e.g. IP addresses from a range, or storage bucket names). These allocations are usually driven by controllers asynchronously to the user’s API operations. Sometimes the user can request a specific value and a controller must confirm or reject that request. There are many examples of this in Kubernetes, and there are a handful of patterns used to represent it.

The common theme among all of these is that the system should not trust users with such fields, and must verify or otherwise confirm such requests before using them.

Some examples:

  • Service clusterIP: Users may request a specific IP in spec or will be allocated one (in the same spec field). If a specific IP is requested, the apiserver will either confirm that IP is available or, failing that, will reject the API operation synchronously (rare). Consumers read the result from spec. This is safe because the value is either valid or it is never stored.
  • Service loadBalancerIP: Users may request a specific IP in spec or will be allocated one which is reported in status. If a specific IP is requested, the LB controller will either ensure that IP is available or report failure asynchronously. Consumers read the result from status. This is safe because most users do not have access to write to status.
  • PersistentVolumeClaims: Users may request a specific PersistentVolume in spec or will be allocated one (in the same spec field). If a specific PV is requested, the volume controller will either ensure that the volume is available or report failure asynchronously. Consumers read the result by examining both the PVC and the PV. This is more complicated than the others because the spec value is stored before being confirmed, which could (hypothetically, thanks to extra checking) lead to a user accessing someone else’s PV.
  • VolumeSnapshots: Users may request a particular source to be snapshotted in spec. The details of the resulting snapshot are reflected in status.

A counter-example:

  • Service externalIPs: Users must specify one or more specific IPs in spec. The system cannot easily verify those IPs (by their definition, they are external). Consumers read the result from spec. This is UNSAFE and has caused problems with untrusted users.

In the past, API conventions dictated that status fields always come from observation, which made some of these cases more complicated than necessary. The conventions have been updated to allow status to hold such allocated values. This is not a one-size-fits-all solution, though.

When to use a spec field

New APIs should almost never store allocated values in a spec field. Instead, they should use status. PersistentVolumes might have been simpler if they had used status.

When to use a status field

Storing such values in status is the easiest and most straight-forward pattern. This is appropriate when:

  • the allocated value is highly coupled to the rest of the object (e.g. pod resource allocations)
  • the allocated value is always or almost always needed (i.e. most instances of this type will have a value)
  • the schema and controller are known a priori (i.e. it’s not an extension)
  • it is “safe” to allow the controller(s) to write to status (i.e.
    there’s low risk of them causing problems via other status fields).

Consumers of such values can look at the status field for the “final” value or an error or condition indicating why the allocation could not be performed.
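
A consumer following this pattern might read the value as sketched below. The status field name allocatedIP and the condition type Allocated are made up for the example and do not correspond to any particular Kubernetes type.

```python
def read_allocation(obj, field="allocatedIP", condition_type="Allocated"):
    """Return (value, error): the final value from status, or why it is missing."""
    status = obj.get("status", {})
    value = status.get(field)
    if value:
        return value, None
    for condition in status.get("conditions", []):
        if condition.get("type") == condition_type and condition.get("status") == "False":
            # The controller reported why the allocation could not be performed.
            return None, condition.get("message", "allocation failed")
    # Neither a value nor a failure condition yet: the controller has not acted.
    return None, "allocation pending"
```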

Sequencing operations

Since almost everything is happening asynchronously to almost everything else, controller implementations should take care around the ordering of operations. For example, whether the controller updates a status field before or after it actuates a change depends on what guarantees need to be made to observers of
the system. In some cases, writing to a status field represents an acknowledgement or acceptance of a spec value, and it is OK to write it before actuation. However, if it would be problematic for a client to observe the status value before it is actuated then the controller must actuate first and update status afterward. In some rarer cases, controllers will need to acknowledge, then actuate, then update to a “final” value.

Controllers must take care to consider how a status field will be handled in the case of interrupted control loops (e.g. controller crash and restart), and must act idempotently and consistently. This is particularly important when using an informer-fed cache, which might not be updated with recent writes. Using a resourceVersion precondition to detect the “conflict” is the common
pattern in this case. See this issue for an example.
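
The resourceVersion precondition can be illustrated with a toy in-memory store. InMemoryStore, ConflictError, and write_status_with_retry are all hypothetical stand-ins for an API client and its error types; the point is only the shape of the retry loop.

```python
class ConflictError(Exception):
    """Raised when the resourceVersion precondition fails."""


class InMemoryStore:
    """Toy store that bumps a version counter on every write."""

    def __init__(self):
        self._objects = {}
        self._version = 0

    def create(self, name, obj):
        self._version += 1
        self._objects[name] = dict(obj, resourceVersion=str(self._version))
        return dict(self._objects[name])

    def get(self, name):
        return dict(self._objects[name])

    def update_status(self, name, status, resource_version):
        current = self._objects[name]
        if current["resourceVersion"] != resource_version:
            raise ConflictError("object was modified since it was read")
        self._version += 1
        current["status"] = status
        current["resourceVersion"] = str(self._version)
        return dict(current)


def write_status_with_retry(store, name, compute_status, cached, max_attempts=3):
    """Write status computed from a possibly stale cached copy, re-reading on conflict."""
    obj = cached
    for _ in range(max_attempts):
        try:
            return store.update_status(name, compute_status(obj), obj["resourceVersion"])
        except ConflictError:
            # The cache was stale: fetch the latest object and recompute.
            obj = store.get(name)
    raise ConflictError("gave up after %d attempts" % max_attempts)
```

Recomputing the status from the freshly read object, rather than replaying the old write, is what keeps the loop idempotent after a crash or a stale cache read.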

When to use a different type

Storing allocated values in a different type is more complicated but also more flexible. This is most appropriate when:

  • the allocated value is optional (i.e. many instances of this type will not have a value at all)
  • the schema and controller are not known a priori (i.e. it’s an extension)
  • the schema is sufficiently complicated (i.e. it doesn’t make sense to burden the main type with it)
  • access control for this type demands finer granularity than “all of status”
  • the lifecycle of the allocated value is different than the lifecycle of the allocation holder

Services and Endpoints could be considered a form of this pattern, as could PersistentVolumes and PersistentVolumeClaims.

When using this pattern, you must account for lifecycle of the allocated objects (who cleans them up and when) as well as the “linkage” between them and the main type (often using the same name, an object-ref field, or a selector).

There will always be some cases which could follow either path, and these will need human evaluation to decide. For example, Service clusterIP is highly coupled to the rest of Service and most instances use it. But it also is strictly optional and has an increasingly complicated schema of related fields. An argument could be made for either path.

