Ecopda Sensor Log Format
From CENS Urban Sensing
This page originated in an email thread on the urbansensing mailing list, starting 4/20/06, titled "A cut at sensor data xml format".
The thread was broken off and continues [here].
- Used by
  - Sensors
  - Mediators
  - Aggregators
  - Consumers / JavaScripts
- Desired properties:
  - Easily and quickly merge and split sensor logs without having to edit or update the elements themselves, only by appending lists of
    - *-data elements
    - sensor elements
    - publisher elements
    - For example, if I have two audio files from the same sensor and want to merge them, all I have to do is append the list of acoustic-data elements from one file to the other.
    - As a counterexample, putting the start-time and end-time attributes on the "sensor" element rather than on the *-data element would have been the wrong way to format things: merging two recordings would then require editing the sensor element.
  - Resolution control should require a minimal amount of manipulation:
    - Publish acoustic samples, GPS coordinates, etc. with a specified time-window.
      - For example, I can have a 1-second acoustic sample with a time-window of 1:00 pm to 2:00 pm.
      - GPS coordinates are also specified as windows by default.
  - Easily support queries consisting of a combination of:
    - time-windows
    - GPS-coordinate windows
    - publisher
    - sensor type
    - Anything more involved would require the use of sensor-specific tools and/or a real database.
  - Accommodate data files that are the result of fusing data from multiple sources.
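The append-only merge described in the first property can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal sketch, not a reference tool: the function names `merge_sensor_logs` and `_match` are hypothetical, and the matching rule (publishers by `id`, sensors by their `name`/`type` pair) is an assumption rather than something the format defines. Existing elements are never edited; new publishers and sensors are appended whole, and *-data elements of an already-known sensor are appended to it.

```python
import xml.etree.ElementTree as ET


def _match(parent, tag, keys, like):
    """First `tag` child of `parent` whose `keys` attributes equal those of `like` (assumed rule)."""
    for child in parent.findall(tag):
        if all(child.get(k) == like.get(k) for k in keys):
            return child
    return None


def merge_sensor_logs(base, other):
    """Merge the <sensor-data> root `other` into `base` by appending elements only."""
    for pub in other.findall("publisher"):
        target_pub = _match(base, "publisher", ("id",), pub)
        if target_pub is None:
            base.append(pub)  # unknown publisher: append the whole element
            continue
        for sensor in pub.findall("sensor"):
            target_sensor = _match(target_pub, "sensor", ("name", "type"), sensor)
            if target_sensor is None:
                target_pub.append(sensor)  # unknown sensor: append it whole
                continue
            # Known sensor: append its *-data elements unchanged.
            target_sensor.extend([d for d in sensor if d.tag.endswith("-data")])
    return base
```

For example, `merge_sensor_logs(ET.parse("a.xml").getroot(), ET.parse("b.xml").getroot())` (file names hypothetical) merges two recordings from the same microphone into one log whose sensor element now holds both acoustic-data elements. Splitting is the reverse: copy a subset of *-data elements under fresh publisher/sensor elements.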
The most volatile portion of the specification is the *-data portion. I wanted to make sure that tools answering simple queries on the attributes above are isolated from sensor-specific details, especially as we add more sensor types.
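To illustrate that isolation, here is a hedged sketch of a query tool that filters by time-window, publisher, and sensor type using only attributes, so it works unchanged for any new *-data element type. Assumptions: the timestamp layout is taken from the example below (`YYYY-MM-DD HH:MM:SS`), "matches a time-window" is read as "overlaps it", and the name `query` is hypothetical. GPS-coordinate windows are not covered here, since in the example they live inside gps-datum elements, which are sensor-specific.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Timestamp layout used by start-time/end-time in the example (an assumption for other logs).
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def query(root, start=None, end=None, publisher=None, sensor_type=None):
    """Yield (publisher id, sensor type, *-data element) for each *-data element
    matching every filter given. Only attributes are read; the payload text
    inside each *-data element is never inspected."""
    for pub in root.findall("publisher"):
        if publisher is not None and pub.get("id") != publisher:
            continue
        for sensor in pub.findall("sensor"):
            if sensor_type is not None and sensor.get("type") != sensor_type:
                continue
            for data in sensor:
                if not data.tag.endswith("-data"):
                    continue
                t0 = datetime.strptime(data.get("start-time"), TIME_FORMAT)
                t1 = datetime.strptime(data.get("end-time"), TIME_FORMAT)
                # Keep the element only if its window overlaps [start, end].
                if start is not None and t1 < start:
                    continue
                if end is not None and t0 > end:
                    continue
                yield pub.get("id"), sensor.get("type"), data
```

Because the filter never looks past the *-data element's attributes, adding a new sensor type only requires new sensor-specific tools for its payload, not changes to this query layer.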
Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sensor-data SYSTEM "http://research.cens.ucla.edu/sensor-data.dtd">
<sensor-data version="1.0">
  <publisher id="Kanoa's N80">
    <sensor name="microphone" type="acoustic">
      <acoustic-data
          start-time="2006-04-20 21:34:03"
          end-time="2006-04-20 23:09:54"
          sample-rate="44100"
          sample-size-bits="16"
          channels="1"
          format="WAV"
          encoding="base64">
        234ab34lsAseGAEGadda324DarqD23Veef35F1a3gr43S3eFB...
        l26lkjb12ab35lkdj353j1l26lkjb12ab35lkdj353j1l26lV...
        ...
      </acoustic-data>
    </sensor>
    <sensor name="holux BT GPS" type="gps">
      <gps-data
          start-time="2006-04-20 21:34:03"
          end-time="2006-04-20 23:09:54">
        <gps-datum>
          <min-lat>34.0322</min-lat>
          <min-lon>-118.2342</min-lon>
          <min-spd unit="mph">0.0000</min-spd>
          <min-alt unit="feet">334.355</min-alt>
          <max-lat>...</max-lat>
          <max-lon>...</max-lon>
          <max-alt unit="feet">...</max-alt>
          <max-spd unit="mph">...</max-spd>
        </gps-datum>
        <gps-datum>...</gps-datum>
        <gps-datum>...</gps-datum>
      </gps-data>
    </sensor>
  </publisher>
</sensor-data>
Summary of previous email discussion
- Why is GPS split into gps-data and gps-datum?
  - This makes aggregation and resolution control easier. I think windowing (in time and space) is a nice way of communicating uncertainty, though it is a bit verbose. Merging two windows is simply their union: take the minimum of the two minimums and the maximum of the two maximums (for time, lat, and lon).
- Should the actual signals be encoded within the XML, or remain as separate files?
  - Arguments against embedding (basically efficiency):
    - It is easier for standard programs to read and manipulate the signal files directly.
    - base64 encoding (or any other ASCII encoding) costs unnecessary CPU and bandwidth; base64 alone inflates the data by about a third.
    - It makes accessing the metadata difficult: XML parsers may choke on multi-megabyte XML files.
  - Arguments for embedding (basically safety):
    - There's no chance of "losing" the media file or having dangling pointers.
    - It is safer in the sense that if you move or transfer the file, you know you have everything, rather than having to transfer all of the associated media files along with the metadata file.
    - If you want to delete an observation, you need only delete one file.
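The window arithmetic from the first answer can be sketched as follows. `Window`, `merge`, and `coarsen_to_hour` are hypothetical names; the sketch keeps a time range plus lat/lon bounds (alt and spd would follow the same pattern). `merge` computes the smallest window covering both inputs, which is what the discussion calls their union. `coarsen_to_hour` shows resolution control in the spirit of the 1-second sample published with a 1:00 pm to 2:00 pm window; rounding to whole hours is an assumed granularity, not part of the format.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Window:
    """Hypothetical in-memory window: a time range plus lat/lon bounds."""
    start: datetime
    end: datetime
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float


def merge(a: Window, b: Window) -> Window:
    """Smallest window covering both: minimum of the minimums, maximum of the maximums."""
    return Window(
        start=min(a.start, b.start),
        end=max(a.end, b.end),
        min_lat=min(a.min_lat, b.min_lat),
        max_lat=max(a.max_lat, b.max_lat),
        min_lon=min(a.min_lon, b.min_lon),
        max_lon=max(a.max_lon, b.max_lon),
    )


def coarsen_to_hour(w: Window) -> Window:
    """Widen the time range to whole hours, e.g. 13:00:12-13:00:13 becomes 13:00-14:00."""
    start = w.start.replace(minute=0, second=0, microsecond=0)
    end = w.end.replace(minute=0, second=0, microsecond=0)
    if end < w.end or end == start:
        end += timedelta(hours=1)  # round the end up so the window still covers w
    return replace(w, start=start, end=end)
```

Both operations only ever widen a window, so a merged or coarsened window still contains every original observation; the cost is the loss of precision that resolution control is meant to provide.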
Jeff's alternative to embedding the media
[jb] What about starting off with a globally unique naming scheme designed so that local caching would not be hard?
I.e.,
the metadata file has a record like:
<namespace>foo</namespace>
<accession>12930</accession>
<data>
  <url>http://data.cens.ucla.edu/foo/12930.wav</url>
  <url>http://backup.some.other.host/foo/12930.wav</url>
  <mime>whatever mime type</mime>
</data>
And if you end up writing a reasonably sophisticated local application that needs a local copy of the data, it could be configured with something like
<cache>
  <namespace>foo</namespace>
  <localurl>file:///C:/data/foo/</localurl>
</cache>
and have your app look there first before following the absolute URLs in the <data> record.
But for less sophisticated applications (being selfish here :), you could just grab batches of files from /foo/ without having to worry about decoding them, and still be able to look them up by accession number later.
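Jeff's cache-first lookup can be sketched as a function that returns the locations to try in order. Assumptions, all hypothetical: the record's elements are wrapped in a single parent element (his snippet has no wrapper), a cached copy keeps the remote file name (so `12930.wav` under `file:///C:/data/foo/`), and the names `load_caches` and `candidate_locations` are illustrative only. The sketch does not fetch anything; it only orders the candidates.

```python
import xml.etree.ElementTree as ET


def load_caches(config: ET.Element) -> dict:
    """Map each <cache> entry's <namespace> to its <localurl>."""
    return {c.findtext("namespace"): c.findtext("localurl") for c in config.iter("cache")}


def candidate_locations(record: ET.Element, caches: dict) -> list:
    """Locations to try, in order: the local cache copy first, then each <url>."""
    urls = [u.text for u in record.findall("data/url")]
    local_dir = caches.get(record.findtext("namespace"))
    if local_dir is None or not urls:
        return urls
    # Assumption: the cached copy keeps the remote file name (e.g. 12930.wav).
    file_name = urls[0].rsplit("/", 1)[-1]
    return [local_dir.rstrip("/") + "/" + file_name] + urls
```

An application without a `<cache>` entry for the namespace simply gets the absolute URLs, which matches the "less sophisticated application" case above.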
Andrew's Response
I think we can do both. Either the metadata file will say that the data is embedded, or it will specify a list of locations (local and remote) from which to retrieve the file.
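A reader supporting both options might dispatch on the *-data element as sketched below. Both conventions are assumptions: that `encoding="base64"` (as in the example above) marks an embedded payload, and that otherwise the element carries `<url>` children borrowed from Jeff's record. The name `payload_source` is hypothetical.

```python
import base64
import xml.etree.ElementTree as ET


def payload_source(data: ET.Element):
    """Return ("embedded", bytes) or ("locations", [urls]) for one *-data element."""
    if data.get("encoding") == "base64":
        # Drop the line breaks and indentation around the encoded text before decoding.
        text = "".join((data.text or "").split())
        return "embedded", base64.b64decode(text)
    return "locations", [u.text for u in data.findall("url")]
```

Either branch leaves the element's attributes (start-time, end-time, and so on) untouched, so the query and merge tools stay the same regardless of where the signal lives.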
