Ecopda Sensor Log Format

From CENS Urban Sensing

This page originated from an email thread on the urbansensing mailing list, started 4/20/06 and titled ["A cut at sensor data xml format"].

The thread got cut off, and continues [here].


  • Used by
    • Sensors
    • Mediators
    • Aggregators
    • Consumers / Javascripts
  • Desired properties:
    • Easily and quickly merge and split sensor logs without having to edit or update the elements themselves, only by appending lists of
        • *-data elements
        • sensor elements
        • publisher elements
      • For example, if I have two audio files from the same sensor, and I want to merge them, all I have to do is append the list of acoustic-data elements from one file to another.
      • As a counterexample, the wrong way to format things would have been to put the start-time and end-time attributes on the "sensor" element rather than on the *-data element, since merging would then require editing the sensor element itself.
    • Resolution control should require a minimal amount of manipulation:
      • Publish acoustic samples, gps coordinates, etc. with a specified time-window.
        • For example, I can have a 1 second acoustic sample with a time-window of 1:00 pm - 2:00 pm.
        • GPS coordinates are specified as windows as well by default.
    • Easily support queries consisting of a combination of:
      • time-windows
      • gps-coordinate windows
      • publisher
      • sensor type
      • Anything more involved would require the use of sensor-specific tools and/or a real database.
    • Accommodate data files that are the result of fusing data from multiple sources.
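
The append-only merge described above can be sketched in Python with the standard library. This is a minimal illustration, not part of the specification: the function name merge_logs is hypothetical, and matching publishers by "id" and sensors by "name" plus "type" is an assumption about how identity would be decided.

```python
import xml.etree.ElementTree as ET


def merge_logs(xml_a: str, xml_b: str) -> ET.Element:
    """Merge log B into log A by appending elements only.

    Assumption: publishers match on "id", sensors on ("name", "type").
    Existing elements in log A are never edited, only extended.
    """
    root_a = ET.fromstring(xml_a)
    root_b = ET.fromstring(xml_b)
    for pub_b in root_b.findall("publisher"):
        pub_a = next(
            (p for p in root_a.findall("publisher")
             if p.get("id") == pub_b.get("id")),
            None,
        )
        if pub_a is None:
            # Unknown publisher: append the whole publisher element.
            root_a.append(pub_b)
            continue
        for sensor_b in pub_b.findall("sensor"):
            key = (sensor_b.get("name"), sensor_b.get("type"))
            sensor_a = next(
                (s for s in pub_a.findall("sensor")
                 if (s.get("name"), s.get("type")) == key),
                None,
            )
            if sensor_a is None:
                # Unknown sensor: append the whole sensor element.
                pub_a.append(sensor_b)
            else:
                # Known sensor: append its *-data elements after the existing ones.
                sensor_a.extend(list(sensor_b))
    return root_a
```

Because start-time and end-time live on each *-data element, nothing in the first log needs to be rewritten when the second log's elements are appended.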

The portion of the specification that is most volatile is the *-data portion. I wanted to make sure that tools to answer simple queries using the attributes above were isolated from sensor-specific details, especially as we add more sensor types.
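
As a rough sketch of such a sensor-agnostic tool, the Python function below filters *-data elements by time-window, sensor type, and publisher while reading only the generic attributes. The function name query and the overlap rule are assumptions made for illustration. GPS-coordinate windows are left out because, in the example format, they sit inside the sensor-specific gps-datum elements.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Timestamp layout used by the start-time / end-time attributes in the example.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def query(root, start, end, sensor_type=None, publisher=None):
    """Return (publisher id, sensor name, element) for each *-data element
    whose time-window overlaps [start, end].

    Only generic attributes are read; the element body is never inspected.
    """
    hits = []
    for pub in root.findall("publisher"):
        if publisher is not None and pub.get("id") != publisher:
            continue
        for sensor in pub.findall("sensor"):
            if sensor_type is not None and sensor.get("type") != sensor_type:
                continue
            for data in sensor:
                if not data.tag.endswith("-data"):
                    continue
                d_start = datetime.strptime(data.get("start-time"), TIME_FORMAT)
                d_end = datetime.strptime(data.get("end-time"), TIME_FORMAT)
                # Two windows overlap when each starts before the other ends.
                if d_start <= end and d_end >= start:
                    hits.append((pub.get("id"), sensor.get("name"), data))
    return hits
```

A new sensor type then needs no change to this function, as long as its element name ends in "-data" and carries the two time attributes.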


Example

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sensor-data SYSTEM "http://research.cens.ucla.edu/sensor-data.dtd">
<sensor-data version="1.0">
 <publisher id="Kanoa's N80">
   <sensor name="microphone" type="acoustic">
     <acoustic-data
       start-time="2006-04-20 21:34:03"
       end-time="2006-04-20 23:09:54"
       sample-rate="44100"
       sample-size-bits="16"
       channels="1"
       format="WAV"
       encoding="base64">

       234ab34lsAseGAEGadda324DarqD23Veef35F1a3gr43S3eFB...
       l26lkjb12ab35lkdj353j1l26lkjb12ab35lkdj353j1l26lV...
       ...

     </acoustic-data>
   </sensor>
   <sensor name="holux BT GPS" type="gps">
     <gps-data
       start-time="2006-04-20 21:34:03"
       end-time="2006-04-20 23:09:54">
       <gps-datum>
         <min-lat>34.0322</min-lat>
         <min-lon>-118.2342</min-lon>
         <min-spd unit="mph">0.0000</min-spd>
         <min-alt unit="feet">334.355</min-alt>
         <max-lat>...</max-lat>
         <max-lon>...</max-lon>
         <max-alt unit="feet">...</max-alt>
         <max-spd unit="mph">...</max-spd>
       </gps-datum>
       <gps-datum>...</gps-datum>
       <gps-datum>...</gps-datum>
     </gps-data>
   </sensor>
 </publisher>
</sensor-data>


Summary of previous email discussion

  • Why is GPS split into data and datum?
    • This is to make aggregation and resolution control easier. I think windowing (of time and space) is a nice way of communicating uncertainty, though it is a bit verbose. Merging is simply the union of the two windows: take the minimum of the two minimums and the maximum of the two maximums (time, lat, lon).
  • Should the actual signals be encoded within the XML, or remain as separate files?
    • Arguments Against Embedding (basically efficiency)
      • Easier for standard programs to read/manipulate the signal files directly.
      • base64 encoding (or any other ascii encoding) takes unnecessary CPU and bandwidth.
      • Makes accessing the meta data difficult: XML parsers may choke on multi-megabyte XML files.
    • Arguments For Embedding (basically safety)
      • There's no chance of "losing" the media file, or having dangling pointers.
      • Safer in the sense that if you move/transfer the file, you know you have everything, rather than having to transfer all of the associated media files along with the meta data file.
      • If you want to delete an observation, you need only delete one file.
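
The window merge described above for GPS data (minimum of the two minimums, maximum of the two maximums) can be sketched in Python as follows. The Window class and its field names are hypothetical and only mirror the time, lat, and lon bounds from the example; speed and altitude bounds would merge the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """A time and space window (hypothetical helper, not part of the format)."""
    start: str  # "YYYY-MM-DD HH:MM:SS"; zero-padded, so string order is time order
    end: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float


def merge_windows(a: Window, b: Window) -> Window:
    """Return the smallest window that covers both inputs."""
    return Window(
        start=min(a.start, b.start),
        end=max(a.end, b.end),
        min_lat=min(a.min_lat, b.min_lat),
        max_lat=max(a.max_lat, b.max_lat),
        min_lon=min(a.min_lon, b.min_lon),
        max_lon=max(a.max_lon, b.max_lon),
    )
```

Merging only ever widens a window, which is why it can also serve as a resolution control: the merged window is less precise but still correct.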


Jeff's alternative to embedding the media

[jb] What about starting off with a globally unique naming scheme in such a way that local caching would not be hard?

I.e.,

metadata file has a record like:

<namespace>foo</namespace>
<accession>12930</accession>
<data>
       <url>http://data.cens.ucla.edu/foo/12930.wav</url>
       <url>http://backup.some.other.host/foo/12930.wav</url>
       <mime>whatever mime type</mime>
</data>

and if you end up writing a reasonably sophisticated local application that needs to have a local data copy, it could get configured with something like

<cache>
       <namespace>foo</namespace>
       <localurl>file:///C:/data/foo/</localurl>
</cache>

and have your app look there first before following the absolute url in the <data> record.

but for less sophisticated applications (being selfish here :), you could just grab bunches of files from /foo/ and not have to worry about decoding them or not being able to look them up by accession # later.
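
A minimal Python sketch of the "look in the local cache first" lookup might look like this. Several things here are assumptions rather than part of Jeff's proposal: the wrapping <record> element (the fragment above has no single root), the function name candidate_urls, and the rule that the cached copy keeps the file name of the first remote URL.

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def candidate_urls(record, caches):
    """List the places to look for a record's data, local cache entries first.

    record: element wrapping <namespace>, <accession>, and <data> (assumed wrapper).
    caches: iterable of <cache> elements from the application's configuration.
    """
    namespace = record.findtext("namespace")
    remote = [(u.text or "").strip() for u in record.findall("data/url")]
    local = []
    if remote:
        # Assumption: the cached copy has the same file name as the remote one.
        filename = posixpath.basename(urlsplit(remote[0]).path)
        for cache in caches:
            base = cache.findtext("localurl")
            if base and cache.findtext("namespace") == namespace:
                local.append(base.rstrip("/") + "/" + filename)
    return local + remote
```

An application would try each returned location in order and stop at the first one that resolves.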


Andrew's Response

I think we can do both. Either the metadata file will say that the data is embedded, or it can specify a list of locations (local and remote) from which to retrieve the file.
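
One way a reader could support both options is sketched below in Python. How the combined format would actually mark embedded versus external data was not settled in the thread, so this sketch assumes an encoding="base64" attribute for embedded data (as in the example above) and <url> children otherwise. The function name load_signal and the caller-supplied fetch function are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET


def load_signal(data_elem, fetch):
    """Return the raw signal bytes for one data element.

    data_elem: element that either embeds base64 text or lists <url> children.
    fetch: caller-supplied function mapping a URL to bytes; it is expected to
           raise OSError when a location cannot be read.
    """
    if data_elem.get("encoding") == "base64":
        # Embedded case: drop whitespace and line breaks, then decode.
        payload = "".join((data_elem.text or "").split())
        return base64.b64decode(payload)
    # External case: try each listed location in order.
    for url in data_elem.findall("url"):
        try:
            return fetch((url.text or "").strip())
        except OSError:
            continue  # fall through to the next location
    raise LookupError("no embedded data and no reachable location")
```

Passing fetch in as a parameter keeps the sketch independent of any particular transport, so local files and remote hosts can be handled by the same loop.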