Proposal for a new official RuuviTag data format

Scrin · 16 September 2022 19:44

The primary goal of this format is to add support for custom data into an official data format. This is achieved by making the fields included in data format 5 optional by using a bitmask byte to indicate included measurements, and leaving all “unused” bytes for custom data.

The need for something like this has arisen as RuuviTag-related services, where custom format support can’t reasonably be achieved, have become more widely available. For example, I like using Ruuvi Station (backed by Ruuvi Gateway and Ruuvi Cloud), but I’d also like to add some custom sensors and/or measurements to some tags. If I were to create a custom data format for this purpose, I wouldn’t be able to view the standard data through Ruuvi Station. I could of course have the tag transmit in two data formats, but that’s really wasteful if all I need is a byte or two for the custom data/measurement.

In short, the data format would look like this:

The first byte is the data format byte, like on all data formats
The second byte is a bitmask, indicating the included measurements (see below)
Measurement data starting from the 3rd byte, in the same order as the bitmask bits (see below)
Any data after the last measurement indicated by the bitmask is treated as custom data; length being anything between zero and what still fits into one advertisement

Bitmask bit, meaning and data size (same as in data format 5):

0: Temperature                 2 bytes
1: Humidity                    2 bytes
2: Pressure                    2 bytes
3: Acceleration (all axes)     6 bytes
4: Power info                  2 bytes
5: Movement counter            1 byte
6: Measurement sequence number 2 bytes
7: MAC address                 6 bytes

Example payload with only temperature, humidity, measurement sequence number and 3 bytes of custom data:

offset: content
     0: data format byte
     1: binary 11000010 (bitmask with bits 0, 1 and 6 set, counting bit 0 as the most significant bit)
   2-3: temperature
   4-5: humidity
   6-7: measurement sequence number
  8-10: custom data
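
The layout above can be sketched as a small decoder. This is only an illustration under a few assumptions: bit 0 is taken to be the most significant bit of the bitmask (which is what the binary example implies), the measurements are left as raw bytes rather than converted to units, and the names `FIELDS`, `Decoded` and `decode` are made up for this sketch.

```python
from dataclasses import dataclass

# (name, size in bytes) in bitmask order, taken from the table above.
FIELDS = [
    ("temperature", 2),
    ("humidity", 2),
    ("pressure", 2),
    ("acceleration", 6),
    ("power_info", 2),
    ("movement_counter", 1),
    ("sequence_number", 2),
    ("mac", 6),
]


@dataclass
class Decoded:
    data_format: int
    fields: dict   # field name -> raw bytes, left undecoded
    custom: bytes  # everything after the last indicated measurement


def decode(payload: bytes) -> Decoded:
    if len(payload) < 2:
        raise ValueError("payload needs a format byte and a bitmask byte")
    data_format, bitmask = payload[0], payload[1]
    offset = 2
    fields = {}
    for bit, (name, size) in enumerate(FIELDS):
        # Assumption: bit 0 is the most significant bit (mask 0x80).
        if bitmask & (0x80 >> bit):
            chunk = payload[offset:offset + size]
            if len(chunk) < size:
                raise ValueError(f"payload cut short inside {name}")
            fields[name] = chunk
            offset += size
    # Whatever is left after the indicated measurements is custom data.
    return Decoded(data_format, fields, payload[offset:])
```

Fed the example payload above (with any placeholder value as the data format byte), this would return temperature, humidity and the measurement sequence number as raw bytes, plus the last three bytes as custom data.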

Since one byte is used for the bitmask, all original data from data format 5 don’t fit simultaneously into one broadcast. A “standard tag” with all sensors could for example alternate between including either power info or humidity, as both change relatively slowly so doubling the effective interval for those is not a big deal. Tags with less-than-all-sensors, such as the pro tags, can fit everything into one broadcast.
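
The space budget can be checked with a few lines of arithmetic. The sketch below assumes 24 bytes of payload are available, which is the length of a data format 5 payload; the names `SIZES`, `payload_length` and `fields_for_broadcast` are hypothetical, and the alternation rule is just one possible way to do what is described above.

```python
# Field sizes in bytes, from the bitmask table in this post.
SIZES = {
    "temperature": 2, "humidity": 2, "pressure": 2, "acceleration": 6,
    "power_info": 2, "movement_counter": 1, "sequence_number": 2, "mac": 6,
}
# Assumption: same payload length as data format 5.
MAX_PAYLOAD = 24
HEADER = 2  # data format byte + bitmask byte


def payload_length(included):
    """Total payload length for the given field names, without custom data."""
    return HEADER + sum(SIZES[name] for name in included)


def fields_for_broadcast(counter):
    """Alternate between leaving out power info and leaving out humidity."""
    skip = "power_info" if counter % 2 == 0 else "humidity"
    return [name for name in SIZES if name != skip]


everything = payload_length(SIZES)                  # all eight fields
alternating = payload_length(fields_for_broadcast(0))
print(everything, alternating, MAX_PAYLOAD - alternating)
```

With every field included the payload is one byte over the assumed limit, while the alternating scheme stays under it and leaves one byte for custom data.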

Additional points/thoughts/ideas/etc:

The bitmask could be two bytes to add more “standard measurements”, for example if some future RuuviTag models include some additional sensors (I’d still love to see a tag with a gyroscope!)
Since the data length is not fixed, it can be adapted to fit in a more constrained space, such as an encrypted data format that requires a few bytes for the encryption itself
The need for special “invalid values” is removed, since measurements that are not available can simply be omitted (such as pressure on the pro tag)
Unnecessary measurements/data can be disabled to save a little bit of transmission time, and thus battery
MAC is optional since it’s only needed for Apple devices, users who don’t need to support them can omit the MAC address from the payload to fit more custom data
While iOS doesn’t let applications access the remote MAC address, they do give a non-permanent UUID, which could be used to map broadcasts to the correct tags once the application receives one payload that includes the MAC
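
The last point, mapping the identifier given by iOS to a MAC, could look roughly like this on the receiving side. It is written in Python only to illustrate the logic (a real iOS app would not be written this way), and `TagIdentityCache` and its method are hypothetical names.

```python
class TagIdentityCache:
    """Remembers which locally assigned identifier belongs to which MAC."""

    def __init__(self):
        self._mac_by_local_id = {}

    def resolve(self, local_id, mac_from_payload=None):
        # Learn (or refresh) the mapping whenever a payload carries the MAC.
        if mac_from_payload is not None:
            self._mac_by_local_id[local_id] = mac_from_payload
        # Returns None until a MAC has been seen for this identifier.
        return self._mac_by_local_id.get(local_id)


cache = TagIdentityCache()
cache.resolve("local-id-1")                       # unknown so far
cache.resolve("local-id-1", "C0:11:22:33:44:55")  # payload included the MAC
cache.resolve("local-id-1")                       # now resolvable without it
```

Since the identifier is non-permanent, the mapping would have to be treated as a cache that can be re-learned, not as permanent storage.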

One extra idea, which I’m not entirely sure is worth the extra complexity, is having a special bitmask for a completely separate payload or two. Since a bitmask with all bits set is an invalid combination due to space constraints, a bitmask with all bits set could indicate a special diagnostic/metadata payload. Also a bitmask with no bits set makes no sense as the broadcast would be all custom data for which a custom data format is a better fit so this could be used to indicate a different kind of special diagnostic/metadata payload as well.

These special payloads could include things like:

Firmware version/variant (to enable tag firmware version tracking without a connection, ie. with the gateway)
Tag “model” (ie. regular, pro 3in1, pro 2in1, etc, if this is “detectable” on the tag itself)
Other details about the hardware, such as which temperature sensor is present (ie. BME280 on the older tags and SHTC3 on the current ones)
Some “configuration like” data, such as the configured broadcast interval

This “special payload” could be transmitted very infrequently since the data included in there very rarely changes. If the bitmask was 2 bytes, some of this data, such as the firmware version, could be included in a “normal payload” as well.
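
A receiver could tell these cases apart before doing any other decoding. A minimal sketch, where the function name and the labels for the two special payloads are placeholders, not anything defined in this proposal:

```python
def classify_bitmask(bitmask: int) -> str:
    """Decide how a payload should be handled based on its bitmask byte."""
    if bitmask == 0xFF:
        # All bits set cannot fit in one broadcast, so it is free to reuse.
        return "special payload (all bits set)"
    if bitmask == 0x00:
        # No standard measurements at all, also free to reuse.
        return "special payload (no bits set)"
    return "normal payload"
```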

And just like last time, all comments and suggestions are more than welcome to improve this idea even further.

lauri · 16 September 2022 20:02

Thanks for the suggestion, good ideas you’ve got! I personally don’t have constructive feedback to share on this tonight, but just decided to drop this one over here for the reference:

Scrin · 16 September 2022 20:26

This one is interesting, thanks for the link.

I do like the idea of variable length data which would allow adjusting the “precision”. For example I’d like to have a larger measurement sequence number to prevent rollovers, but this is obviously such a marginal need that it’s not worth sacrificing fixed-length data for it. Using variable-length data would make this possible.

The only thing I don’t really like is the “ineffective” use of bytes, which limits the number of measurements that can fit in one packet. Though, typical devices have far fewer sensors than RuuviTags do, so it makes sense to trade some space efficiency for greater flexibility and measurement support for a more generic standard.

ssalonen · 17 September 2022 04:02

If I understand correctly, the proposal in OP makes the location of temperature and other measurements non-fixed, i.e. position within the payload depends on bitmask.

How compatible is this with the Theengs decoder specification? With a quick look, I am not sure it is. Ruuvi Theengs decoding is used by OpenMQTTGateway, for example.

This is not a definite counter-argument for the proposal, but something I simply noted which would potentially lock this out from software using Theengs.

The proposal itself is very elegant, like how the NA case is handled

–

The BTHome standard or similar might have the additional benefit that additional sensors would be advertised in a standard format. That is, systems implementing BTHome could have a plug and play experience even with Ruuvi tags having additional sensors?

Your criticism is valid though, it is quite wasteful as the format is self-declaring with sensor types etc

Scrin · 17 September 2022 07:48

With a quick look, it seems the “fully dynamic behavior” can’t be reasonably specified using that specification. One could work around it by defining a specific set of different combinations using the conditions, but that’s a rather clunky workaround.

This is indeed a very valid concern, however I’m not proposing to replace data format 5, but rather add a new official format alongside it, and hopefully see support for it in things like Ruuvi Station. This proposed format doesn’t really add much value for the current, unmodified tags, and is more aimed at hobbyists who like to modify their tags by adding custom sensors to them, this is especially relevant if a tag has an expansion port (which I believe is available on some pro tags?).

Likewise a “BTHome firmware for RuuviTag” could equally exist alongside other firmware versions for users who would like to have plug and play experience with systems implementing BTHome, at the cost of some battery life and measurement availability, a bit like there was an “eddystone firmware” offered alongside the “weather station” firmware in the very early days.

–

After some overnight thinking after reading about the BTHome specification, I had this idea of using a bit of the measurement data to indicate the length, which could be used to sacrifice some resolution on the “standard measurements” in order to fit in more custom data.

For example the Temperature currently ranges from approx -163ºC to +163ºC at 0.005ºC increments, which is an unnecessarily fine resolution. Using one bit for a flag and doubling the increment would still allow the current practical resolution of 0.01ºC at the same range, but the bit could be used to indicate whether the temperature measurement is 1 or 2 bytes, and in case of 1 byte (int8), it would be 1ºC increments offering a range from -128 to +127.

Humidity could equally well support the same principle, however being a positive number, in 1 byte (uint8), the resolution would be 0.5% rather than 1%.
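
The ranges quoted for temperature and humidity can be verified with a short calculation. This sketch uses `Decimal` to avoid floating point noise; the helper names are made up for illustration.

```python
from decimal import Decimal


def signed_range(bits, step):
    """Range of a two's-complement integer of `bits` bits scaled by `step`."""
    step = Decimal(step)
    return (-(2 ** (bits - 1)) * step, (2 ** (bits - 1) - 1) * step)


def unsigned_range(bits, step):
    """Range of an unsigned integer of `bits` bits scaled by `step`."""
    return (Decimal(0), (2 ** bits - 1) * Decimal(step))


print(signed_range(16, "0.005"))  # current 2-byte temperature
print(signed_range(15, "0.01"))   # 2 bytes minus one flag bit, doubled step
print(signed_range(8, "1"))       # 1-byte temperature
print(unsigned_range(8, "0.5"))   # 1-byte humidity
```

The 15-bit case keeps the lower bound of the current 16-bit one. Note that the one-byte figures correspond to all 8 bits being used for the value, so they presume the length flag is not taken out of that byte.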

Pressure on the other hand is already at the limit, so this wouldn’t work too well there.

Acceleration is right at the edge, sacrificing one bit would reduce the resolution to very low for practical uses, unless additional bits are used to make the range and resolution dynamic, the acceleration data has three numbers after all so three bits (of which one for the length) could be used.

Power info is already really tightly packed, containing both the TX power and battery voltage, so no real room there.

Movement counter could be updated to allow increased values, but I don’t think it’s necessary. The “number of movements” is not really a useful measurement since it’s very hard to distinguish between “1 movement” and “2 movements”, and it’s mainly only useful to detect if there has been movement at all, which can be achieved by just looking if the number changes.

Measurement sequence number could be increased to support 4 bytes (uint32), which is what I’d personally like to use to prevent rollovers.
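
To put the rollover concern into numbers, here is a tiny sketch. The one-second measurement interval is purely an assumption for the example, and the function name is hypothetical.

```python
def seconds_until_rollover(counter_bits, interval_seconds):
    """Time until a counter of `counter_bits` bits wraps around."""
    return (2 ** counter_bits) * interval_seconds


# Assumption: one new measurement per second.
print(seconds_until_rollover(16, 1) / 3600)            # hours for uint16
print(seconds_until_rollover(32, 1) / (3600 * 24 * 365))  # years for uint32
```

At that assumed interval a 2-byte counter wraps in roughly 18 hours, while a 4-byte counter would last over a century.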

I didn’t really give this “2nd part” that much thinking yet to decide if it would be even worth the extra complexity, but I just wanted to throw the ideas I got last night here, in case it would spark some additional or better ideas

otso · 17 September 2022 11:12

I like the idea of more flexible payloads. Some original design decisions about the Ruuvi data format have been driven by mobile phone usage, which means that packet reception is pretty poor and we need to refresh data often so users have a snappy experience when they open the app.

If we can drop support for mobile phones it gives a lot of freedom for the data format. For example we could have a firmware that broadcasts data of every available sensor once per 5 minutes and repeats the advertisement a few times so that the always-on listener has a better chance of catching the data.

If we’re doing a custom firmware variant I think using the BTHome format makes a lot of sense, since there should already be plenty of ecosystem support for the data format.

dgerman · 17 September 2022 16:29

I would like to suggest:

A) MeasureSequenceNumber is actually part of the transport reliability and should NOT be optional, eliminating the need for a parameter presence bit.

B) A similar case can be made for the power info.

C) The MAC address (or at least part of it) is also not optional. I would like to suggest that a parameter presence bit be used to indicate a 6 byte OR 2 byte field. The 2 byte field containing only the lower 2 bytes of the MAC address. Using only 2 bytes will provide sufficient uniqueness. In the incredibly unlikely case that a set of Ruuvi Tags in a given network had the same low 2 bytes, one of those tags could be used in another network. This also frees 2 bytes for other parameters.

D) Since the measure sequence number, power info, parameter presence bits and a portion of the MAC address are required, they should be placed at the beginning of the packet, with the optional parameters following them. This allows for various fields to be present in some packets and absent in others.
This significantly simplifies the selection for omission of packets from particular tags as well as those containing needed or un-needed sensor readings at a particular receiver.
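
Regarding the uniqueness of a 2 byte MAC fragment in point C, the chance of a clash can be estimated with the usual birthday-problem calculation. This sketch assumes the low 2 bytes are uniformly random across tags; the function name is hypothetical.

```python
def collision_probability(tag_count, id_bits=16):
    """Chance that at least two of `tag_count` tags share an identifier,
    assuming identifiers are uniformly random over 2**id_bits values."""
    space = 2 ** id_bits
    p_unique = 1.0
    for i in range(tag_count):
        p_unique *= (space - i) / space
    return 1.0 - p_unique


for n in (2, 10, 50):
    print(n, collision_probability(n))
```

For ten tags in one network the probability stays well under 0.1%, and it grows roughly with the square of the number of tags.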

Layout: (format 09 )(Whatever)

09 MSEQ PW MACX PP xx xx xx xx xx xx

PP:= 80: 2 byte MAC address, 40: temperature, 20: humidity, 10: pressure, 08: acceleration, 04: movement counter, 02: TBD, 01: additional PP byte follows previous values.

(TBD : ToBeDecided)

example:

 09 7E48     E6   22B1       13      34   07E3
FMT MSQ#     PP   2byteMAC   25°C    95%   mw/MM
             80                            milli watt
             40                            per square 
             20                            meter
             04
             02 incident solar radiation

              pressure and acceleration omitted.
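
The PP byte in the example can be expanded into its flags like this. It is only a sketch of the bit assignments listed above; `PP_FLAGS` and `decode_pp` are names invented for the illustration.

```python
# Bit assignments as listed in the PP definition above.
PP_FLAGS = [
    (0x80, "2-byte MAC"),
    (0x40, "temperature"),
    (0x20, "humidity"),
    (0x10, "pressure"),
    (0x08, "acceleration"),
    (0x04, "movement counter"),
    (0x02, "TBD"),
    (0x01, "additional PP byte follows"),
]


def decode_pp(pp):
    """Return the names of all flags set in a parameter presence byte."""
    return [name for mask, name in PP_FLAGS if pp & mask]


print(decode_pp(0xE6))
```

For E6 this lists the 2-byte MAC, temperature, humidity, movement counter and the TBD bit, which matches the 80/40/20/04/02 breakdown in the example.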

I am in agreement with Scrin regarding the resolution for temperature and humidity as the current resolution far exceeds the accuracy of the sensors (and any alternates in the price range acceptable for Ruuvi Tags)

Another thought regarding temperature, humidity and pressure: provide each with a 2 bit flag representing the direction of change over the last 5(?) minutes. A value of 00 being no change, 10 increase, 01 decrease, 11 illegal. This would permit a single sample to include the change information without the various receivers trying to calculate this information.
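
On the tag side, the 2 bit flag could be derived roughly as follows. The `threshold` parameter (to ignore sensor noise) is an addition of this sketch and not part of the suggestion above, and the names are hypothetical.

```python
NO_CHANGE, DECREASE, INCREASE = 0b00, 0b01, 0b10  # 0b11 stays illegal


def trend_flag(previous, current, threshold=0.0):
    """Direction of change between two readings as a 2-bit flag."""
    delta = current - previous
    if delta > threshold:
        return INCREASE
    if delta < -threshold:
        return DECREASE
    return NO_CHANGE


# Reading from 5 minutes ago vs. the current reading.
print(format(trend_flag(20.0, 21.5), "02b"))
```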

Scrin · 18 September 2022 08:46

I think we should better define the typical use-cases and expectations regarding them. What I believe is the current case is something like this:

Case 1: “Current out of the box experience”: The tags are usable using a mobile phone, a gateway, or any other device/setup that “supports ruuvitags” (note the order of words). For this I believe data format 5 is still the best fit, at least until new kinds of sensors are added to the tags. As such, this is probably the best fit for the factory firmware
Case 2: Using 3rd party devices/software/services that don’t support ruuvitags; instead the “ruuvitags could support them”. After some more research, I believe BTHome is the most commonly accepted “generic format” for this kind of sensor data, so offering an official ruuvitag firmware variant using this format would make sense, but this definitely shouldn’t replace the from-the-factory firmware needed for the current “out of the box experience”
Case 3: DIY hobbyists who otherwise belong to Case 1: Currently there are no “perfect” solutions to this case; if they want to add anything custom to the current formats, all “out of the box experience” apps/services/etc become unavailable, unless using “workarounds” such as broadcasting everything twice, using an official format and a custom one. An official data format that supports custom data would resolve this (as outlined in the original post)

“BTHome data format” would more or less cover Case 3 if support for that was added to Ruuvi Station and similar (that fall into the “Case 1 category”), however that would sacrifice the incredible amount-of-data-to-battery-life ratio that RuuviTags offer, a key selling point for me and I’m pretty sure I’m not the only one who appreciates years of battery life with fresh real time measurements every few seconds.

For the “Case 2” mentioned above, this makes sense indeed. However with the “Case 3” in mind, since all this new proposed format would add (based on current proposal) is support for custom data, I don’t think it’s necessary to even provide a firmware that implements it. If someone were to add a custom sensor to a tag, they would need to modify the firmware (or create a new one from scratch) anyway to add support for this sensor, and while at it they can just switch to this new format to include the data. Of course providing a “framework” or a sample in the firmware would be a great plus for “3rd party development”, but definitely not a requirement for the data format itself.

Of course for that to work, apps/services/etc that support ruuvitags would need to include support for this data format (or any other custom format), but merely having the data format be “official” makes it much easier to justify adding support for it to some 3rd party app/tool/service

–

I was thinking about this for a while, whether this should be always present or not, but since I did not come up with any other use for this bit I decided it would be better to let a user/developer have more control. Perhaps they want to use a different kind of mechanism for this using custom data, making the “standard” sequence number redundant.

I think power info should be optional since someone could run their tags off of wall power for example, making the “battery voltage” irrelevant. Although like in point A, if you have a better suggestion for this bit, then it could be considered whether that tradeoff would be better if the bitmask is not increased to two bytes

Since the MAC address is always part of the raw bluetooth packet, duplicating it into the manufacturer specific data (or “the payload”) is unnecessary under most circumstances. I think the MAC should be optional, since it’s only needed for iOS apps due to Apple restrictions that prevent iOS apps from accessing the remote MAC address of an undirected advertisement, so users who don’t need to support iOS devices directly (ie. either they don’t own any, or they read the data through the Ruuvi Gateway for example) can conveniently get a lot of extra space for custom data here.

Another thought is that the “length” of the MAC included in the payload could be made variable using the first two bits in the “measurement data” (if going by the “variable length measurements” idea). Those bits are technically redundant since the tags use static random addresses, which always have the first two bits of the address set to 1 (which is why ruuvitag MAC addresses always start with C, D, E or F).
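
The “first two bits are always 1” property is easy to check in code. A small sketch, assuming the MAC is given as a colon-separated hex string with the most significant byte first; the function name and the example addresses are made up.

```python
def is_static_random(mac: str) -> bool:
    """True if the two most significant bits of the address are both 1."""
    first_octet = int(mac.split(":")[0], 16)
    return (first_octet & 0xC0) == 0xC0


print(is_static_random("C4:12:34:56:78:9A"))
print(is_static_random("3C:12:34:56:78:9A"))
```

Any first octet from 0xC0 to 0xFF passes the check, which is why the first hex digit is always C, D, E or F.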

If some measurements should be mandatory (see previous points), then those measurements should indeed be moved towards the beginning of the packet like you suggested

The use-case for something like this is a bit specific; mainly beneficial for cases where the broadcasts can be read very infrequently for example, so something like this could very well be provided through the custom data. If Ruuvi Station mobile app and other similar ones were to add some kind of support for utilizing this, then it might make sense to include it in the “official part” of the specification rather than the “custom part”

dgerman · 18 September 2022 16:07

Regarding A) MeasureSequenceNumber not optional.
This is a reliability issue and making it NOT optional encourages receiver developers to check it.

Regarding C) The MAC address (or at least part of it) is also not optional. This also means that receivers only need to look at the “manufacturer specific data”.

Including MeasureSequenceNumber and the (source) MAC address (or at least part of it) is also consistent with most other broadcast protocols (they must know something)

Regarding B) power info optional?.
Scrin has a valid point regarding using “wall power”. However, since the tags come with a battery and last VERY long, I’m not sure why anyone would do that. It seems (to me) to be an unusual case. I’m OK with making it optional by using a parameter presence bit.
B1) Notice I propose a parameter presence bit (01) that indicates that following the 7 optional values, an additional parameter presence byte provides for 7 additional values and if it fits in long format packets even another parameter presence byte.

E) Is it interesting or necessary to include the TX power (in every packet) ?

Scrin · 19 September 2022 12:06

I don’t think this has any effect on the development of receivers; even if a receiver is forced to receive the value, it does not mean the receiver has to take the value into consideration. Of course I do agree that end-receivers should check the measurement sequence number to de-duplicate measurements if the same measurement is broadcast multiple times and/or received through multiple intermediate devices, such as multiple gateways, but I don’t think making it required in the data format is the solution to that
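
As an illustration of the de-duplication mentioned above, an end-receiver could keep the last seen sequence number per tag. This is a deliberately simple sketch with hypothetical names: it only compares against the most recent value, so it does not handle measurements arriving out of order from different gateways.

```python
class Deduplicator:
    """Drops repeats of the same measurement from the same tag."""

    def __init__(self):
        self._last_seq = {}

    def is_new(self, mac, sequence_number):
        # Same tag and same sequence number as last time: a repeat.
        if self._last_seq.get(mac) == sequence_number:
            return False
        self._last_seq[mac] = sequence_number
        return True


dedup = Deduplicator()
dedup.is_new("C4:12:34:56:78:9A", 100)  # first sighting
dedup.is_new("C4:12:34:56:78:9A", 100)  # same measurement via another gateway
dedup.is_new("C4:12:34:56:78:9A", 101)  # next measurement
```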

For developers it’s actually much easier to just check the actual source MAC of the received packet, since practically all BLE libraries offer a direct getter for the MAC address, eliminating the need of parsing it from a byte/hex sequence. The only exception is iOS development where this is not possible, and most non-iOS ruuvitag client libraries don’t even offer access to the MAC included in the payload, only the actual source MAC from the packet

I have seen a measurement sequence number or a similar mechanism in some protocols, but I don’t think I have ever seen any other protocol that duplicates the MAC (or a portion of it). Do you have some examples?

An example for this would be using a sensor that requires a lot of power, such as a carbon dioxide sensor. This is actually one of the use-cases I have, currently my CO2 sensors are ESP8266 based contraptions that take up quite a bit of physical space and require wifi connectivity. I’d like to use ruuvitags as a replacement here to reduce the number of “individual devices” I have around my house.

I guess this falls more into the “if we make the bitmask be 2 bytes long”. I don’t think anything longer than 2 bytes makes that much sense, since the space is already very limited. But indeed if the bitmask is extended to more than 1 byte, the length of the bitmask could be variable length using something like that

Depends on the use-case. For typical cases the TX power is not necessary to be included at all since typically it’s fixed. For special cases, such as variable transmission power, it can be convenient to have this information, especially since it doesn’t take any “extra space” if combined with the battery voltage, like it is in data format 5

dgerman · 20 September 2022 12:12

I am confused regarding your response related to Measurement sequence number.

but I don’t think I have ever seen any other protocol that duplicates the MAC (or a portion of it). Do you have some examples?

Scrin · 20 September 2022 13:04

That part of the response refers to the inclusion of MAC address (or a part of it), bolded in this quote:

To rephrase the question I had; do you have any examples of protocols that duplicate the MAC address (or a part of it) in the payload, or did I understand something incorrectly?

dgerman · 20 September 2022 15:59

Little confusion here:

Other protocols include seq # and source address.

dgerman · 20 September 2022 16:11

Regarding C) including the MAC address (or at least part of it)

Although “BLE libraries offer a direct getter for the MAC address”, a field in a fixed location in the payload doesn’t really require “parsing”. All other data is in the payload, and the payload must be processed anyway. Checking the address by one means and the payload for the rest of the data by another means seems less clear.

C2) Back to the Apple issue, do you know if macOS (or an iOS app running under newer macOS) supports retrieval of the MAC address? Based on the reasoning Apple has in not making it available, I would imagine not?

PS I have an M1 MacBook Pro and my first attempt to run a Bluetooth iOS app fails. I have many BT devices nearby

Better results with Bluetooth Inspector which does report advertisement response, but I digress.

Scrin · 20 September 2022 17:22

Parsing is not the correct term to use here, I should’ve been more clear. What I meant is, for example with data format 5, practically all non-iOS parsing libraries simply stop processing the data after the measurement sequence number, since it would be unnecessary work to check for the presence of the data there (for example to avoid crashes due to out-of-bounds array access when receiving corrupt data that has been “cut short”) and then pick the bytes out of the data into a MAC field in whatever format is applicable for the language used

This is pretty unlikely to change, since the reason for Apple blocking the access to remote MAC on undirected advertisements is to increase privacy by preventing apps from tracking the user location by scanning for known undirected bluetooth beacons.

This is also why it’s required to give the location permission on Android, since being able to uniquely and globally identify undirected bluetooth beacons makes it possible to track the location of the user if the physical location of the beacon is known.

iOS does give each remote device a UUID so that applications can tell different remote devices apart, but this UUID is not global: two different iOS devices will generate a different UUID for the same remote device, but the UUID is still consistent on the same device. Basically iOS apps can distinguish between different remote devices, but they can’t globally identify which remote device it is based on the undirected broadcast alone.

I guess it’s possible for Apple to change their decision regarding this and instead just require additional permissions for this, like Android does, but I wouldn’t count on it.

In a way this also means that RuuviTags broadcasting in data format 5 kind of decrease the privacy of random people, since an app on a phone could track the location of its user if someone gathers a map of physical locations of RuuviTags, since the data format 5 broadcasts include a globally identifiable identifier (the MAC duplicated in the payload).

However in practice this is an unlikely scenario since the number of RuuviTags globally is small, especially if compared to all broadcasting BLE devices. This is also why I was interested if there are protocols that include globally unique identifiers in the payload, which would allow for location tracking.