I couldn't really find a definition of the canonical form of UCUM expressions.
Section D of the UCUM specification shows some example unit terms together with their canonical units. However, these examples only imply a certain structure; there is no explicit statement defining the canonical form.
Here is my definition of a canonical form (most of it is adopted from the Ucum-java library):
An expression in canonical form:
- only contains base units or is the number one (`m`, `s`, `g`, `rad`, `K`, `C`, `cd`, `1`)
- only uses multiplication, no division
- may have (positive or negative) exponents. Positive exponents are written without a `+` sign, while negative exponents always carry their `-` sign. An exponent equal to one is not written. An exponent may not be zero, as this would indicate that the associated unit is superfluous and the expression can be canonicalized further.
- has no annotation
- lists its units in alphabetical order
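To make these rules concrete, here is a minimal Python sketch that renders a map of base units to exponents in this canonical form. The function name `format_canonical` and the input representation (a `dict` from base-unit code to exponent) are my own assumptions, not part of UCUM or Ucum-java. "Alphabetical" is interpreted case-insensitively, which is the ordering implied by the example `C2.cd3.g-2.K-4.m2.rad3.s-2`.

```python
# Base units allowed in the canonical form, per the definition in this post.
BASE_UNITS = {"m", "s", "g", "rad", "K", "C", "cd"}


def format_canonical(exponents: dict[str, int]) -> str:
    """Hypothetical helper: render {base unit: exponent} as a canonical string."""
    for unit in exponents:
        if unit not in BASE_UNITS:
            raise ValueError(f"not a base unit: {unit!r}")
    # A zero exponent means the unit is superfluous, so drop it.
    terms = {u: e for u, e in exponents.items() if e != 0}
    if not terms:
        return "1"  # nothing left: the canonical form is the number one
    parts = []
    # Assumption: alphabetical order ignores case (C < cd < g < K < m < rad < s).
    for unit in sorted(terms, key=lambda u: (u.lower(), u)):
        exp = terms[unit]
        # Exponent 1 is omitted; f-strings print negatives with "-" and positives without "+".
        parts.append(unit if exp == 1 else f"{unit}{exp}")
    return ".".join(parts)
```

For example, `format_canonical({"g": 2, "m": -3})` returns `"g2.m-3"`, and an empty map (or one with only zero exponents) returns `"1"`.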
Examples:

- `g.m`
- `g2.m-3`
- `C2.cd3.g-2.K-4.m2.rad3.s-2`
- `1`
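Conversely, a string can be checked against the definition. The following is a rough, regex-based sketch; `is_canonical` is a hypothetical name, and it additionally assumes that each base unit appears at most once (since `g.g` should already have been canonicalized to `g2`) and that ordering is case-insensitive as in the examples.

```python
import re

# One term: a base unit, optionally followed by a nonzero integer exponent without "+".
TERM = re.compile(r"(m|s|g|rad|K|C|cd)(-?[1-9][0-9]*)?")


def sort_key(unit: str) -> tuple[str, str]:
    # Assumption: case-insensitive alphabetical order, as the examples suggest.
    return (unit.lower(), unit)


def is_canonical(expr: str) -> bool:
    """Hypothetical check of a unit string against the canonical-form rules."""
    if expr == "1":
        return True
    units = []
    for term in expr.split("."):
        match = TERM.fullmatch(term)
        if match is None:
            return False  # non-base unit, division, annotation, "+" or zero exponent
        unit, exp = match.groups()
        if exp == "1":
            return False  # an exponent of one must not be written
        units.append(unit)
    # Each unit appears once, in strictly increasing alphabetical order.
    keys = [sort_key(u) for u in units]
    return all(a < b for a, b in zip(keys, keys[1:]))
```

With this sketch, all four examples pass, while `m.g` (wrong order), `m/s` (division), `m1` (explicit exponent one) and `km` (not a base unit) are rejected.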
The most obvious problem with this definition is that the numeric value is lost. For example, the expressions `2` and `3` would both canonicalize to `1`, with canonicalization factors of 2 and 3 respectively. Without these factors, the canonicalization would imply that 2 = 3. Currently there is no way to express such factors within the UCUM specification. I am not sure there is an easy solution, since the canonicalization factor is not limited to integers: it can be any number.
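One possible workaround outside UCUM is to carry the factor next to the canonical unit instead of inside the expression. This is only a sketch of that idea: the `Canonical` class is hypothetical, and `fractions.Fraction` is my choice so that integer and decimal factors stay exact.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Canonical:
    """Hypothetical canonicalization result: the factor is kept next to the unit,
    because UCUM has no syntax for it inside the expression itself."""
    factor: Fraction
    unit: str  # canonical unit string, e.g. "1" or "g.m"


# The unit terms "2" and "3" both reduce to the unit "1" ...
two = Canonical(Fraction(2), "1")
three = Canonical(Fraction(3), "1")

# ... and only the factor keeps them apart; comparing units alone would claim 2 = 3.
print(two.unit == three.unit)  # True
print(two == three)            # False
```

`Fraction` represents integer and decimal factors exactly (e.g. `Fraction("0.0254")`), but not irrational ones such as the π behind `[pi]`-based units; those would need a float or a symbolic representation.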