Proposed next high-precision Data Format

To sum up the latest discussions on Slack, the current proposal is as follows:

offset = description
0      = Data format (8bit) whatever the next number will be (likely 5)
1-2    = Temperature (16bit signed) in 0.005 degrees (-163.84 to 163.83 range)
3-4    = Humidity (16bit unsigned) in 0.0025% (0-163.83% range, though realistically 0-100%)
5-6    = Pressure (16bit unsigned) as it is in format 3
7-8    = Acceleration-X (16bit signed) as it is in format 3
9-10   = Acceleration-Y (16bit signed) as it is in format 3
11-12  = Acceleration-Z (16bit signed) as it is in format 3
13-14  = Power info (11+5bit unsigned): the first 11 bits are the battery voltage above 1.6V, in millivolts (1.6V to 3.647V range); the last 5 bits are the TX power above -40dBm, in 2dBm steps (-40dBm to +20dBm range)
15     = Movement counter (8bit unsigned), incremented by motion detection interrupts from LIS2DH12
16-17  = Measurement sequence number (16bit unsigned), incremented by one each time a measurement is taken; used for measurement de-duplication (depending on the transmit interval, multiple packets with the same measurement can be sent, and some measurements may never be sent)
18-23  = Tag ID, the 48bit MAC address (rather than the hw ID on the nRF chip to provide compatibility with current datasets that differentiate tags based on the MAC)
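
To make the layout above concrete, here is a minimal decoder sketch in Python. It is not an official implementation. The function name `decode_proposed_format` is made up for illustration, and it rests on a few assumptions that are not stated in the table: multi-byte fields are big-endian, "first 11 bits" means the most significant bits of the power field, and "as it is in format 3" means pressure in Pa with a -50000 Pa offset and acceleration in mG.

```python
import struct


def decode_proposed_format(payload: bytes) -> dict:
    """Decode the 24-byte layout from the table above (sketch only)."""
    if len(payload) != 24:
        raise ValueError("expected 24 bytes, got %d" % len(payload))

    # ASSUMPTION: all multi-byte fields are big-endian (">").
    # B = offset 0, h = 1-2, H = 3-4, H = 5-6, hhh = 7-12, H = 13-14,
    # B = 15, H = 16-17.
    (data_format, temp_raw, hum_raw, pres_raw,
     acc_x, acc_y, acc_z, power_raw, movement, sequence) = struct.unpack(
        ">BhHHhhhHBH", payload[:18])

    return {
        "data_format": data_format,
        "temperature_c": temp_raw / 200.0,    # 0.005 degree steps
        "humidity_percent": hum_raw / 400.0,  # 0.0025 % steps
        # ASSUMPTION: format 3 conventions (Pa with -50000 offset, mG).
        "pressure_pa": pres_raw + 50000,
        "acceleration_mg": (acc_x, acc_y, acc_z),
        # ASSUMPTION: the upper 11 bits are the voltage, the lower 5 the TX power.
        "battery_mv": 1600 + (power_raw >> 5),
        "tx_power_dbm": -40 + 2 * (power_raw & 0x1F),
        "movement_counter": movement,
        "measurement_sequence": sequence,
        "mac": ":".join("%02x" % b for b in payload[18:24]),
    }


# Hypothetical sample packet, built only to exercise the decoder.
sample = struct.pack(">BhHHhhhHBH", 5, 4000, 20000, 51325,
                     0, 0, 1000, (1400 << 5) | 22, 7, 1234)
sample += bytes.fromhex("c0ffee123456")
decoded = decode_proposed_format(sample)
```

With the sample values above, the decoder should report 20.0 degrees, 50.0 % humidity, 3000 mV and +4 dBm. Receivers can then drop duplicates by comparing `measurement_sequence` per MAC.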

The most notable change is replacing the packet sequence number with a measurement sequence number, as it’s more relevant for measurement de-duplication.


@io53 Do you have any feedback on the format?

FYI, the following is the response I got to my question: according to the data sheet, AH is +/- 3% in the range of 20% - 80%. What is the accuracy above 80%?

From the factory:

According to our characterization, the absolute accuracy is also +/-3%RH from 80%RH ~ 90%RH range and within +/-6%RH from 90%RH ~ 100%RH.
The raw data from BME280 is 16-bit unsigned integer named as UH. After the formula based on factory trimmed values and final temperature (FT) reading, UH is converted to final humidity value FH. Every digit change in UH, will cause about 0.008%RH change in FH. This is where the 0.008%RH resolution parameter comes from.

Reminder: shutdown occurs at 1.7V.

So using a 16bit unsigned int makes sense in the Data format for humidity. Regarding the reported voltage, I think the measured shutdown voltage should be tested, as based on my prior measurements, the tag reports about 0.1V less voltage than the battery actually supplies, so 1.6V minimum in the format might actually be too high, if the tag can go down to 1.7V real. :thinking:
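
To illustrate the voltage concern, here is a small sketch of how a reported voltage would map into the 11-bit field. The helper `encode_battery_mv` and the constant names are hypothetical, and the 100 mV under-reporting figure is just the rough offset mentioned above, not a measured specification.

```python
from typing import Optional

BATTERY_OFFSET_MV = 1600  # from the proposal: millivolts above 1.6 V
BATTERY_FIELD_MAX = 2047  # largest value that fits in 11 unsigned bits

# ASSUMPTION: the tag reports roughly 100 mV less than the real voltage.
UNDER_REPORT_MV = 100


def encode_battery_mv(reported_mv: int) -> Optional[int]:
    """Return the raw 11-bit value, or None if it cannot be represented."""
    raw = reported_mv - BATTERY_OFFSET_MV
    if 0 <= raw <= BATTERY_FIELD_MAX:
        return raw
    return None


# A real 1.7 V would be reported as about 1.6 V, the very bottom of the range.
at_shutdown = encode_battery_mv(1700 - UNDER_REPORT_MV)
# Anything the tag reports below 1.6 V has no representation at all.
below_range = encode_battery_mv(1650 - UNDER_REPORT_MV)
```

Under these assumptions the shutdown voltage lands exactly on raw value 0, so any larger offset or lower real shutdown point would fall outside the field.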

@otso do I remember correctly that you have an adjustable lab power supply?

When will this version be official?

We’ll continue to ship the tags with the current format for the time being. The updated format will be offered as a DFU package.

Should there be a field for firmware version? (not packet format) Hardware version?

You might consider using format type 4 (not to be confused with ‘#4’ used by the URI which includes the single ID byte)

I think that kind of rarely/never changing info (firmware version and hardware version) would belong to the “scan response” along with a friendly name and other things (to be implemented in the future).

Also, the data format id does not directly differentiate between URL and RAW modes: you can very well send formats 2 and 4 as manufacturer-specific data, or format 3 in an Eddystone URL if you use a short enough domain. This allows the actual data to be sent in many different formats and in different ways (encoded in an Eddystone URL, sent as manufacturer-specific data in an undirected BLE advertisement, sent as a scan response, or even sent over a “traditional” directed connection).

Format 3 uses 14 bytes, which encode to 19 characters; that exceeds the “encoded URL” length available without the domain name.

Correct, for Base 64. Nobody says you have to use Base 64 :slight_smile: Base 92 can fit 13 bytes of data per 18 characters, and you could as well come up with your own encoding
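
For anyone who wants to check the character counts, here is a rough sketch that computes the theoretical minimum number of characters needed to carry a given number of bytes with a given alphabet size. The function name `min_chars` is made up, and it treats the whole payload as one big integer, which a practical encoder may not do (block-based schemes can need a few more characters).

```python
def min_chars(n_bytes: int, alphabet_size: int) -> int:
    """Smallest k such that alphabet_size**k >= 256**n_bytes."""
    needed = 256 ** n_bytes   # number of distinct payloads to represent
    capacity = 1              # number of distinct strings of length k
    k = 0
    while capacity < needed:
        capacity *= alphabet_size
        k += 1
    return k


base64_chars = min_chars(14, 64)  # 19, matching the figure quoted above
base92_chars = min_chars(14, 92)  # 18 with an ideal whole-payload encoding
```

So with an alphabet of 92 characters, the 14 bytes of format 3 would in principle fit in 18 characters, provided all 92 characters are actually usable in the URL.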

Just a note: while researching battery information I found “Service life…discharge end-point voltage of 2.0V”
CR2477 data sheet

Also “cut off voltage down to 2.0V at 20°C” at lithium coin batteries

I think repeating the MAC address is redundant and a big waste in such an environment. Apple might fix their software one day to reveal the MAC address from the Bluetooth advertisement? The chip ID could be more useful. People using buggy software stacks can still use this field for identification. And then the chip ID could also be correlated between raw and URL modes.

There could be space for firmware version and other stuff if you encoded bits at non-byte boundaries. E.g. you can chip off 2-3 bits from the format id, humidity and each acceleration field. I see how this could be difficult for beginners, so for each format version I would suggest publishing a reference decoder library in a couple of languages. This could even be community contributed.
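
As a rough idea of what decoding at non-byte boundaries involves, here is a small sketch that splits a byte string into unsigned fields of arbitrary bit widths. The function `unpack_bits` is hypothetical, the widths in the example are arbitrary, and it assumes fields are packed most-significant-bit first.

```python
def unpack_bits(data: bytes, widths):
    """Split data into unsigned fields of the given bit widths (MSB first)."""
    total_bits = len(data) * 8
    if sum(widths) > total_bits:
        raise ValueError("field widths exceed the available bits")

    value = int.from_bytes(data, "big")
    fields = []
    position = 0  # number of bits consumed so far, counted from the top
    for width in widths:
        shift = total_bits - position - width
        fields.append((value >> shift) & ((1 << width) - 1))
        position += width
    return fields


# Arbitrary example: a 3-bit, a 5-bit and a 2-bit field from two bytes.
example = unpack_bits(bytes([0b10110011, 0b01000000]), [3, 5, 2])
# The same helper splits an 11+5 bit field such as the proposed power info.
power = unpack_bits(bytes([0xAF, 0x16]), [11, 5])
```

Here `example` comes out as `[5, 19, 1]` and `power` as `[1400, 22]`. Signed fields would need an extra sign-extension step that this sketch leaves out.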

I’m pretty sure Apple will not reveal foreign MAC addresses in the foreseeable future, as they were “intentionally hidden” for “security reasons”. If this changes some day, then perhaps a new data format can be introduced that does not contain the MAC address. Also, data format 4 (URL with partial id) does not contain the entire HW id, so a 100%-proof mapping cannot be done. Most existing implementations already differentiate the tags by MAC address, as that’s the only thing available for differentiating format 3 tags. It’s therefore much more beneficial to “keep the compatibility” with the MAC rather than the partial ID, especially since the URL formats are not intended for logging purposes due to their reduced accuracy.

There has been discussion (mostly on Slack) about including the “rarely changing data” such as tag revision, firmware version and hw id in the scan response rather than the broadcast/beacon packet. That way you can keep a record of tags, their IDs and MAC addresses.

Of course if you have specific needs that aren’t fulfilled with any of the “official data formats”, you can always create your own.

If Apple doesn’t want people to see MAC addresses, I think they might be creative enough to do something bad against Ruuvi users who circumvent that. But that’s a very pessimistic scenario.

The main reason why I said the MAC would be redundant is the following: people who are using iOS can’t access MAC addresses now anyway, so they won’t care what the new field contains, as they have nothing from the past to correlate it with. And people who are not on iOS already have access to the MAC, so they won’t get anything new from the newer data format. The only way this can be useful to someone is if you have collecting software on non-iOS (with MACs) and you want to migrate it to iOS, which is quite improbable I think.

Adding non-measurement device info in the scan response sounds great. I didn’t know these types of beacons can emit multiple frame types.

Well, for example users could be using multiple devices to collect ruuvitag data to a more centralized place, for example using the official Android application to send the data to a gateway, and I’m pretty sure the iOS application will have the same feature when it’s done. :slight_smile:

And as far as I know, no existing implementations rely on the tag id (only on format 4 currently, and not in full there) for differentiating between tags when logging data, so even if it’s of little use to have the MAC included in the format, it’s more than nothing

This format is now written down and a pull request is open:

As discussed on slack, invalid values were defined:

  • Largest representable number on UINTs
  • Smallest representable number on INTs
  • All bits set on “others”, such as MAC

Now is pretty much the final chance to comment on the format :slight_smile:

Maybe just a documentation issue: shouldn’t the ‘invalid’ values be 2047 for battery voltage and 31 for TX power, since 2048 and 32 can’t be represented in 11 and 5 bits?
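
For reference, the arithmetic behind the invalid-value rule can be sketched as follows. The helper `invalid_value` is hypothetical, and applying the rule to the 11-bit and 5-bit sub-fields is an assumption; the pull request may define them differently.

```python
def invalid_value(bits: int, signed: bool = False) -> int:
    """Sentinel for a field of the given width, following the listed rule."""
    if signed:
        # Smallest representable number on INTs (two's complement).
        return -(1 << (bits - 1))
    # Largest representable number on UINTs, i.e. all bits set.
    return (1 << bits) - 1


INVALID_INT16 = invalid_value(16, signed=True)  # -32768
INVALID_UINT16 = invalid_value(16)              # 65535
INVALID_UINT8 = invalid_value(8)                # 255
# ASSUMPTION: the same rule applies to the packed power-info sub-fields.
INVALID_BATTERY = invalid_value(11)             # 2047
INVALID_TX_POWER = invalid_value(5)             # 31
INVALID_MAC = bytes([0xFF] * 6)                 # all bits set
```

An n-bit unsigned field tops out at 2^n - 1, so 2048 and 32 indeed do not fit in 11 and 5 bits.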