Introduction

To lower the barrier to entry and avoid inconsistent representations of the BICEPS data model in protoSDC, the entire XML Schema for BICEPS is automatically converted into protobuf as well as into additional protoSDC target languages. Because every implementation starts from the same generated model, protoSDC implementations stay highly compatible with one another.

How it works

proto-converter introduces an intermediate layer that takes care of most of the conversion work needed to go from an inheritance-based model to a composition-based model. Language generators only need to traverse the resulting graph of nodes to generate types and parameters; they no longer need to know about XML. The only remaining link to XML is the assignment of builtin and custom types, which are identified by QNames.

Intermediate layer

The intermediate layer is a graph of nodes called BaseNode. Each BaseNode has a name, children and a nodeType, and can hold language-specific type information that specifies how the node is generated for a given target language.

NodeTypes are deliberately basic and closely resemble protobuf concepts:

  • NodeType.Message represents what is essentially an object, or a message in proto-speak.

  • NodeType.Parameter is a field within a message, which can point to a message, an enum or a builtin type.

  • NodeType.StringEnumeration is an enum whose entries are plain strings with no associated values.

  • NodeType.OneOf is a collection of parameters of which only one can be present, similar to oneof in protobuf.

  • NodeType.BuiltinType is a builtin type, as the name suggests. It is essentially the type holding everything built into XML Schema, such as string, decimal and friends.

Every element on the first level of an XML Schema is recursively converted into BaseNodes with NodeTypes, simplifying the structure and removing any inheritance.
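
To make the shape of this graph more concrete, here is a minimal Kotlin sketch of such a node structure. It is an illustration only: the NodeType entries are taken from the list above, while the constructor layout, the LanguageType marker interface, the languageTypes map and the render helper are assumptions and not the actual proto-converter API.

```kotlin
// Hypothetical sketch; the real BaseNode in proto-converter may look different.
enum class NodeType { Message, Parameter, StringEnumeration, OneOf, BuiltinType }

// Assumed marker for language-specific generation info (for example a ProtoType).
interface LanguageType

class BaseNode(
    val name: String,
    val nodeType: NodeType,
    val children: List<BaseNode> = emptyList(),
    // Assumption: one entry per target language, keyed by a language id.
    val languageTypes: MutableMap<String, LanguageType> = mutableMapOf(),
)

// Depth-first walk that renders the graph as indented "NodeType(name)" lines.
fun render(node: BaseNode, depth: Int = 0): List<String> =
    listOf("  ".repeat(depth) + "${node.nodeType}(${node.name})") +
        node.children.flatMap { render(it, depth + 1) }

fun main() {
    val clock = BaseNode(
        "ClockDescriptor", NodeType.Message,
        listOf(
            BaseNode("TimeProtocol", NodeType.Parameter),
            BaseNode("Resolution", NodeType.Parameter),
        ),
    )
    render(clock).forEach(::println)
}
```

A language generator would perform the same kind of walk, but emit type definitions instead of debug lines.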

Breaking up inheritance

Transforming inheritance into a composition-based model follows very simple rules:

  • if type A extends type B, B will become a field of the message A

  • if the XML model uses an element which has subtypes, replace it with a OneOf which allows for all subtypes of that element as well as the element itself to be used
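
The two rules above can be sketched in Kotlin as follows. This is a simplified illustration under stated assumptions: XmlType, Field, Message and OneOf are hypothetical stand-ins rather than proto-converter classes, the OneOf is assumed to include transitive subtypes, and the "OneOf" name suffix is taken from the naming used elsewhere in this document.

```kotlin
// Hypothetical, simplified view of a schema type: name, optional base type,
// and its own fields as (field name -> type name).
data class XmlType(
    val name: String,
    val base: String? = null,
    val fields: Map<String, String> = emptyMap(),
)

data class Field(val name: String, val type: String)
data class Message(val name: String, val fields: List<Field>)
data class OneOf(val name: String, val options: List<String>)

// Rule 1: if A extends B, B becomes a field of the message A.
fun toMessage(type: XmlType): Message {
    val baseField = listOfNotNull(type.base?.let { Field(it, it) })
    return Message(type.name, baseField + type.fields.map { (n, t) -> Field(n, t) })
}

// Rule 2: a type that has subtypes gets a OneOf allowing the type itself
// and all of its subtypes (assumed here to include transitive ones).
fun oneOfFor(typeName: String, all: List<XmlType>): OneOf? {
    fun subtypes(name: String): List<String> =
        all.filter { it.base == name }.flatMap { listOf(it.name) + subtypes(it.name) }
    val subs = subtypes(typeName)
    return if (subs.isEmpty()) null else OneOf("${typeName}OneOf", listOf(typeName) + subs)
}

fun main() {
    val types = listOf(
        XmlType("AbstractState"),
        XmlType("AbstractMetricState", base = "AbstractState"),
        XmlType("NumericMetricState", base = "AbstractMetricState"),
    )
    println(toMessage(types[2]))
    println(oneOfFor("AbstractState", types))
}
```

Wherever a parameter refers to a type for which oneOfFor returns a non-null result, the parameter type would be swapped for that OneOf.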

Mapping example

To understand how an XML Schema type is mapped into nodes, let’s take a look at an example.

<xsd:complexType name="ClockDescriptor">
	<xsd:annotation>
		<xsd:documentation>Bla bla.</xsd:documentation>
	</xsd:annotation>
	<xsd:complexContent>
		<xsd:extension base="pm:AbstractDeviceComponentDescriptor">
			<xsd:sequence>
				<xsd:element name="TimeProtocol" type="pm:CodedValue" minOccurs="0" maxOccurs="unbounded">
					<xsd:annotation>
						<xsd:documentation>Bla bla here.</xsd:documentation>
					</xsd:annotation>
				</xsd:element>
			</xsd:sequence>
			<xsd:attribute name="Resolution" type="xsd:duration">
				<xsd:annotation>
					<xsd:documentation>So much bla bla.</xsd:documentation>
				</xsd:annotation>
			</xsd:attribute>
		</xsd:extension>
	</xsd:complexContent>
</xsd:complexType>

ClockDescriptor is defined at the root level of the schema and therefore becomes a message. Its complexContent element tells us that ClockDescriptor extends pm:AbstractDeviceComponentDescriptor and adds a TimeProtocol element and a Resolution attribute.

This then turns into the following tree.

graph TD
    subgraph clockdescriptor["Message(ClockDescriptor)"]
        subgraph children["Children"]
            b1["Parameter(name=AbstractDeviceComponentDescriptor)"]
            b2["Parameter(name=TimeProtocol, isList=true)"]
            b3["Parameter(Resolution)"]
        end
    end
    subgraph AbstractDeviceComponentDescriptor["Message(AbstractDeviceComponentDescriptor)"]
    end
    subgraph CodedValue["Message(CodedValue)"]
    end
    subgraph duration["BuiltinType(xsd:Duration)"]
    end
    b1 -- parameter type --> AbstractDeviceComponentDescriptor
    b2 -- parameter type --> CodedValue
    b3 -- parameter type --> duration

The inheritance is resolved by applying composition and including the extended base type as well as all the extension type parameters as children. Where base types such as AbstractState are used as parameter types within the graph, they will be replaced by an AbstractStateOneOf which can be any of the extension types of the base type, or of course the base type itself. For example, AbstractMetricReport contains a list of AbstractMetricStateOneOf elements.

message AbstractMetricReportMsg {
  ...
  repeated AbstractMetricStateOneOfMsg metric_state = 3;
  ...
}

Once this has been applied to every XML Schema element, we end up with a list of messages sorted such that the language generator can traverse the graph in order and always find every previously referenced type resolved. The exception is cycles in the graph, which do occur and must be handled differently. Types involved in a cycle are marked as part of a cluster; the consequences of a cluster are language-specific. In protobuf, it simply means that all nodes of the cluster must be generated into the same .proto file.
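
One possible way to obtain both the ordering and the clusters in a single pass is Tarjan's strongly connected components algorithm. The sketch below is an assumption about how this could be done and not a description of proto-converter's actual implementation; the function name and the dependency map are hypothetical. The map points from a type to the types it references.

```kotlin
// Tarjan's strongly connected components algorithm.
// Components are emitted dependencies-first; a component with more than one
// member corresponds to a cluster of mutually dependent types.
fun orderWithClusters(deps: Map<String, List<String>>): List<List<String>> {
    val index = mutableMapOf<String, Int>()
    val low = mutableMapOf<String, Int>()
    val stack = ArrayDeque<String>()
    val onStack = mutableSetOf<String>()
    val result = mutableListOf<List<String>>()
    var counter = 0

    fun visit(v: String) {
        index[v] = counter
        low[v] = counter
        counter++
        stack.addLast(v)
        onStack += v
        for (w in deps[v].orEmpty()) {
            if (w !in index) {
                visit(w)
                low[v] = minOf(low.getValue(v), low.getValue(w))
            } else if (w in onStack) {
                low[v] = minOf(low.getValue(v), index.getValue(w))
            }
        }
        // v is the root of a component: pop the whole component off the stack.
        if (low[v] == index[v]) {
            val component = mutableListOf<String>()
            do {
                val w = stack.removeLast()
                onStack -= w
                component += w
            } while (w != v)
            result += component.sorted()
        }
    }

    for (v in deps.keys) if (v !in index) visit(v)
    return result
}

fun main() {
    // A references B, while B and C reference each other and form a cluster.
    val deps = mapOf("A" to listOf("B"), "B" to listOf("C"), "C" to listOf("B"))
    println(orderWithClusters(deps))
}
```

For protobuf output, each multi-member component would then be written into a single .proto file.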

The proto generator ultimately traverses the resulting graph and attaches its language types to each node. Every BaseNode then has a languageType attached in the form of a ProtoType. These are essentially the same as BaseTypes, but carry rules on how to generate protocol buffers schema data. Once every node has a ProtoType attached, the graph is traversed a final time, writing the output for each child of the graph's root into a file and thus producing a protobuf conversion of the XML Schema.

syntax = "proto3";

package org.somda.protosdc.proto.model.biceps;

option java_multiple_files = true;
option java_outer_classname = "ClockDescriptorProto";

import "org/somda/protosdc/proto/model/biceps/abstractdevicecomponentdescriptor.proto";
import "org/somda/protosdc/proto/model/biceps/codedvalue.proto";
import "google/protobuf/duration.proto";

message ClockDescriptorMsg {
  AbstractDeviceComponentDescriptorMsg abstract_device_component_descriptor = 1;
  repeated CodedValueMsg time_protocol = 2;
  google.protobuf.Duration resolution_attr = 3;
}
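
The two passes described above (attach a ProtoType, then write the output) can be sketched roughly as follows. All class and function names are hypothetical stand-ins, and the "_attr" suffix rule for XML attributes is only inferred from the generated output shown in this document.

```kotlin
// Hypothetical stand-ins; not the actual proto-converter types.
interface ProtoType { fun emit(): String }

class ProtoField(val type: String, val name: String, val number: Int, val repeated: Boolean) : ProtoType {
    override fun emit() = (if (repeated) "repeated " else "") + "$type $name = $number;"
}

class ProtoMessage(val name: String, val fields: List<ProtoField>) : ProtoType {
    override fun emit() =
        fields.joinToString(separator = "\n", prefix = "message $name {\n", postfix = "\n}") { "  " + it.emit() }
}

class ParameterNode(
    val name: String,
    val protoTypeName: String,
    val isList: Boolean = false,
    val isAttribute: Boolean = false,
) {
    var languageType: ProtoType? = null
}

class MessageNode(val name: String, val parameters: List<ParameterNode>) {
    var languageType: ProtoType? = null
}

// PascalCase -> snake_case, e.g. TimeProtocol -> time_protocol.
fun snakeCase(name: String) = name.replace(Regex("(?<=[a-z0-9])(?=[A-Z])"), "_").lowercase()

// Pass 1: attach a ProtoType to the message node and to each of its parameters.
fun attachProtoTypes(message: MessageNode) {
    val fields = message.parameters.mapIndexed { i, p ->
        // Assumption: XML attributes get an "_attr" suffix.
        val fieldName = snakeCase(p.name) + if (p.isAttribute) "_attr" else ""
        ProtoField(p.protoTypeName, fieldName, i + 1, p.isList).also { p.languageType = it }
    }
    message.languageType = ProtoMessage(message.name + "Msg", fields)
}

// Pass 2: write out whatever the attached types produce.
fun generateProto(message: MessageNode): String =
    checkNotNull(message.languageType) { "no ProtoType attached to ${message.name}" }.emit()

fun main() {
    val clock = MessageNode(
        "ClockDescriptor",
        listOf(
            ParameterNode("AbstractDeviceComponentDescriptor", "AbstractDeviceComponentDescriptorMsg"),
            ParameterNode("TimeProtocol", "CodedValueMsg", isList = true),
            ParameterNode("Resolution", "google.protobuf.Duration", isAttribute = true),
        ),
    )
    attachProtoTypes(clock)
    println(generateProto(clock))
}
```

Under these assumptions the sketch prints the same message body as the ClockDescriptorMsg listing above; file headers, imports and options are left out.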

Generating Kotlin/Rust/*

proto-converter also provides generators for programming languages. The basic principle is the same as for protobuf, but the output is adapted to the needs of the specific target. This includes, for example:

  • different nesting behavior

  • introducing smart pointers to break cycles in the data model

  • language-specific builtin types

The ClockDescriptor shown in the protobuf example would look like this in Kotlin:

package org.somda.protosdc.model.biceps

import org.somda.protosdc.model.biceps.AbstractDeviceComponentDescriptor
import org.somda.protosdc.model.biceps.CodedValue
import java.time.Duration

data class ClockDescriptor (
    val abstractDeviceComponentDescriptor: AbstractDeviceComponentDescriptor,
    val timeProtocol: List<CodedValue> = listOf(),
    val resolutionAttr: Duration? = null,
)

Generating mappers

proto-converter can additionally generate mappers that convert between the language-specific representation of the data and its protobuf representation. This also lowers the barrier to entry for cleanly separating transport data types from language-specific internal representations.

Since these mappers are generated automatically, they always match the current proto and language output of the generator. Every supported language stores the information needed to generate its output on the nodes in the graph, which allows a mapper generator to determine the full layout of the target language. Mapping to and from the protobuf representation is very language-specific, because the mapper needs to know how the protobuf schema files are represented once compiled for that language. Field names might change from snake_case to PascalCase, nested messages might end up in modules named after their parent message, or primitive types might not have an exact representation.
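
As a rough illustration of what such a mapper does, the following Kotlin sketch converts a simplified ClockDescriptor to and from a protobuf-like representation. DurationMsg and ClockDescriptorMsg are hand-written stand-ins for protoc-generated classes so that the example compiles on its own, CodedValue is reduced to a plain String, and the function names are hypothetical. Real generated mappers will look different.

```kotlin
import java.time.Duration

// Stand-ins for protoc-generated classes (assumption, for self-containment).
data class DurationMsg(val seconds: Long, val nanos: Int)
data class ClockDescriptorMsg(val timeProtocol: List<String>, val resolutionAttr: DurationMsg?)

// Simplified internal model; CodedValue is reduced to a String here.
data class ClockDescriptor(
    val timeProtocol: List<String> = listOf(),
    val resolutionAttr: Duration? = null,
)

// Internal representation -> transport representation.
fun toProto(descriptor: ClockDescriptor) = ClockDescriptorMsg(
    descriptor.timeProtocol,
    descriptor.resolutionAttr?.let { DurationMsg(it.seconds, it.nano) },
)

// Transport representation -> internal representation.
fun fromProto(msg: ClockDescriptorMsg) = ClockDescriptor(
    msg.timeProtocol,
    msg.resolutionAttr?.let { Duration.ofSeconds(it.seconds, it.nanos.toLong()) },
)

fun main() {
    val original = ClockDescriptor(listOf("NTPv4"), Duration.ofMillis(1500))
    println(fromProto(toProto(original)) == original)
}
```

The Duration conversion shows the kind of type mismatch a mapper has to bridge: the language-specific builtin type and the protobuf type carry the same information in different shapes.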

FAQ

Isn’t everything in proto3 optional? How do you express mandatory fields?

In short: the mappers do that. Message-typed fields track presence, which lets the receiver determine whether the sender explicitly set a field. When generating the protobuf model, optional primitive fields are represented by their *Value wrapper counterparts (string -> StringValue), which allows presence checks for them as well. Mapping the message into the internal representation then enforces the presence of the mandatory fields as required by BICEPS.
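
A minimal sketch of such a presence check, assuming stand-in classes in which null models a field that was not set. With real protoc-generated classes the check would use the generated presence accessors instead. All names here are hypothetical.

```kotlin
// Stand-ins for generated protobuf classes; null models "field not set".
data class StringValue(val value: String) // mirrors google.protobuf.StringValue
data class InnerMsg(val code: String)
data class OuterMsg(val inner: InnerMsg?, val note: StringValue?)

// Internal representation: mandatory fields are non-nullable.
data class Inner(val code: String)
data class Outer(val inner: Inner, val note: String?)

class MissingFieldException(field: String) :
    IllegalArgumentException("mandatory field '$field' is not set")

// The mapper rejects messages that lack a mandatory field and unwraps
// optional wrapper values into nullable types.
fun mapOuter(msg: OuterMsg): Outer {
    val inner = msg.inner ?: throw MissingFieldException("inner")
    return Outer(Inner(inner.code), msg.note?.value)
}

fun main() {
    println(mapOuter(OuterMsg(InnerMsg("x"), null)))
    println(runCatching { mapOuter(OuterMsg(null, StringValue("n"))) }.exceptionOrNull()?.message)
}
```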

BICEPS uses inheritance, protobuf doesn’t support that.

Composition works fine: inheritance is resolved into composition as described under Breaking up inheritance.

What about extensions?

protobuf allows for extensions by using Any. We plan to support converting BICEPS XML extensions with proto-converter, but due to time constraints this is currently untested.

Are XML restrictions supported?

Not currently, but just like mandatory fields, validation can be integrated into the mapper.
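
As a sketch of what such validation could look like inside a mapper, the following Kotlin snippet checks a string against a length restriction. Since restrictions are not currently supported, everything here is hypothetical: the LengthRestriction class, the validate function and the example restriction are assumptions for illustration.

```kotlin
// Hypothetical model of a string restriction in the style of
// xsd:minLength / xsd:maxLength.
data class LengthRestriction(val minLength: Int = 0, val maxLength: Int = Int.MAX_VALUE)

// Returns the value unchanged if it satisfies the restriction and throws
// IllegalArgumentException otherwise.
// Note: String.length counts UTF-16 code units, which is a simplification.
fun validate(field: String, value: String, restriction: LengthRestriction): String {
    require(value.length in restriction.minLength..restriction.maxLength) {
        "$field: length ${value.length} is outside ${restriction.minLength}..${restriction.maxLength}"
    }
    return value
}

fun main() {
    val nonEmpty = LengthRestriction(minLength = 1)
    println(validate("Handle", "h1", nonEmpty))
    println(runCatching { validate("Handle", "", nonEmpty) }.isFailure)
}
```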