Fixing up GPX ride data with lxml.etree

Last week I went for a ride on a rather grey day. The route was one of my usuals (40.2km in to the Goodwill Bridge), and I shaved a minute or two off the ride time. I was mightily disappointed to find that both ipBike and Strava reckoned I had ridden a bare 100m. This, despite the ipBike summary field claiming "40.270km with 347m climb in 1:43:41".

I had an incident like this happen to me earlier this year and was unable to fix it using SNAP, so I figured it was time to bite the bullet and fix the recorded file, particularly since the temperature, heart rate and cadence data all appeared to have been recorded correctly.

My first pass attempted to make use of Tomo Krajina's gpxpy, which was fine until I realised that that library cannot handle the TrackPointExtensions that Garmin defined.
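For comparison, here is a minimal sketch of reading those Garmin TrackPointExtension fields with lxml instead. It assumes the `gpxtpx` namespace URI used later in the script and the `hr`, `cad` and `atemp` child elements; the helper names and the one-point sample are made up for illustration.

```python
import io
from lxml import etree

# Prefix -> URI map for lxml path lookups; the prefixes are our own choice.
NS = {
    "gpx": "http://www.topografix.com/GPX/1/1",
    "gpxtpx": "http://www.garmin.com/xmlschemas/TrackPointExtension/v1",
}

def _text(elem, path):
    """findtext() that tolerates a missing parent element (hypothetical helper)."""
    return None if elem is None else elem.findtext(path, namespaces=NS)

def read_extensions(source):
    """Yield (time, hr, cad, atemp) text values for each trkpt in source."""
    root = etree.parse(source).getroot()
    for pt in root.iterfind(".//gpx:trkpt", namespaces=NS):
        tpe = pt.find("gpx:extensions/gpxtpx:TrackPointExtension", namespaces=NS)
        yield (_text(pt, "gpx:time"), _text(tpe, "gpxtpx:hr"),
               _text(tpe, "gpxtpx:cad"), _text(tpe, "gpxtpx:atemp"))

# Made-up single-point sample, not taken from the ride in question.
SAMPLE = b"""<gpx xmlns="http://www.topografix.com/GPX/1/1"
     xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1">
  <trk><trkseg>
    <trkpt lat="-27.47" lon="153.03">
      <ele>12.0</ele><time>2015-06-01T20:00:00Z</time>
      <extensions><gpxtpx:TrackPointExtension>
        <gpxtpx:atemp>18</gpxtpx:atemp><gpxtpx:hr>120</gpxtpx:hr><gpxtpx:cad>80</gpxtpx:cad>
      </gpxtpx:TrackPointExtension></extensions>
    </trkpt>
  </trkseg></trk>
</gpx>"""

rows = list(read_extensions(io.BytesIO(SAMPLE)))
```

Points without an extensions block simply come back with `None` for the sensor fields.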

I then tried to make headway using minidom, but got myself tied in knots trying to create new document nodes. I'm sure I missed something quite obvious there, but I'm not really worried. Note in passing: Lode Nachtergaele's post at http://castfortwo.blogspot.com.au/2014/06/parsing-strava-gpx-file-with-python.html was really useful, and helped with my final attempt.
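For anyone else stuck on the minidom route, the general pattern is that every node is created by the owning `Document` and then attached with `appendChild`. This is a small sketch of that pattern, not the code I abandoned; `make_trkpt` and the point values are hypothetical.

```python
from xml.dom import minidom

GPX_NS = "http://www.topografix.com/GPX/1/1"

def make_trkpt(doc, lat, lon, ele, when):
    """Build <trkpt lat=.. lon=..><ele/><time/></trkpt>, owned by doc."""
    pt = doc.createElementNS(GPX_NS, "trkpt")
    pt.setAttribute("lat", lat)
    pt.setAttribute("lon", lon)
    for tag, value in (("ele", ele), ("time", when)):
        child = doc.createElementNS(GPX_NS, tag)
        child.appendChild(doc.createTextNode(value))
        pt.appendChild(child)
    return pt

doc = minidom.getDOMImplementation().createDocument(GPX_NS, "gpx", None)
root = doc.documentElement
# minidom does not emit xmlns declarations by itself, so add it explicitly.
root.setAttribute("xmlns", GPX_NS)
trkseg = doc.createElementNS(GPX_NS, "trkseg")
root.appendChild(trkseg)
trkseg.appendChild(make_trkpt(doc, "-27.47", "153.03", "12.0",
                              "2015-06-01T20:00:00Z"))
xml_text = doc.toxml()
```

The missing `xmlns` declaration is an easy one to trip over: without that `setAttribute` call the output parses, but its elements are not in the GPX namespace.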

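My final (and successful) attempt uses lxml.etree to pull out the info I need from both the dodgy file and a good recording of the same route, copies the good file's positions and elevations into the dodgy file's trackpoints, skips a few points along the way (the two rides had different elapsed times, so their point counts differ; my choice of which points to skip is somewhat dubious) and then creates a new GPX document with the munged data points.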
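The part of lxml that makes building the new document painless is `nsmap`: declare the namespaces once on the root element and use `{namespace}tag` names for the children. A minimal sketch, assuming the GPX and TrackPointExtension URIs from the script; `q()` and the values are hypothetical.

```python
from lxml import etree

GPX = "http://www.topografix.com/GPX/1/1"
TPX = "http://www.garmin.com/xmlschemas/TrackPointExtension/v1"

def q(tag, ns=GPX):
    """Return an lxml '{namespace}tag' qualified name (hypothetical helper)."""
    return "{%s}%s" % (ns, tag)

# The default namespace is declared once on the root, so children in the
# same namespace serialise without a prefix.
gpx = etree.Element(q("gpx"), nsmap={None: GPX, "gpxtpx": TPX})
gpx.set("version", "1.1")
trkseg = etree.SubElement(etree.SubElement(gpx, q("trk")), q("trkseg"))
pt = etree.SubElement(trkseg, q("trkpt"), lat="-27.47", lon="153.03")
etree.SubElement(pt, q("ele")).text = "12.0"
ext = etree.SubElement(etree.SubElement(pt, q("extensions")),
                       q("TrackPointExtension", TPX))
etree.SubElement(ext, q("hr", TPX)).text = "120"

out = etree.tostring(gpx, pretty_print=True)
```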

While I've now got a close-enough fixed-up file, I'm down about 2km on the ride total, and up about 30m on the climbing total (according to Strava). I'm quite happy with the results overall, though more than willing to accept that my code (below) is rather fugly. Good thing I'm not integrating this into a project gate!
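If you want a rough local check of totals like these, you can sum great-circle (haversine) distances and positive elevation changes between consecutive trackpoints. This is only a sketch: it assumes a spherical Earth of radius 6371 km and uses the raw points, whereas Strava and ipBike may well apply their own smoothing, so don't expect exact agreement. `ride_totals` and the sample track are hypothetical.

```python
import io
import math
from lxml import etree

GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}
EARTH_RADIUS_M = 6371000.0  # mean Earth radius; a spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def ride_totals(source):
    """Return (distance_m, climb_m) summed over consecutive trkpts."""
    root = etree.parse(source).getroot()
    pts = [(float(p.get("lat")), float(p.get("lon")),
            float(p.findtext("gpx:ele", namespaces=GPX_NS)))
           for p in root.iterfind(".//gpx:trkpt", namespaces=GPX_NS)]
    dist = climb = 0.0
    for (la1, lo1, e1), (la2, lo2, e2) in zip(pts, pts[1:]):
        dist += haversine_m(la1, lo1, la2, lo2)
        climb += max(0.0, e2 - e1)  # only count the uphill bits
    return dist, climb

# Made-up track: four points along the equator, 0.01 degrees apart.
SAMPLE = b"""<gpx xmlns="http://www.topografix.com/GPX/1/1"><trk><trkseg>
<trkpt lat="0" lon="0"><ele>10</ele></trkpt>
<trkpt lat="0" lon="0.01"><ele>15</ele></trkpt>
<trkpt lat="0" lon="0.02"><ele>12</ele></trkpt>
<trkpt lat="0" lon="0.03"><ele>20</ele></trkpt>
</trkseg></trk></gpx>"""

dist, climb = ride_totals(io.BytesIO(SAMPLE))
```

For this sample the distance comes out at roughly 3.34km and the climb at 13m (5m plus 8m; the 3m drop is ignored).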

#!/usr/bin/python
#
# Copyright (c) 2015, James C. McPherson. All Rights Reserved
#

from datetime import datetime
from copy import deepcopy
from lxml import etree as ET

NSMAP = {
    'gpxx': 'http://www.garmin.com/xmlschemas/GpxExtensions/v3',
    None: 'http://www.topografix.com/GPX/1/1',
    'gpxtpx': 'http://www.garmin.com/xmlschemas/TrackPointExtension/v1',
    'xsi': 'http://www.w3.org/2001/XMLSchema-instance'
}

# xsi:schemaLocation is a whitespace-separated list of
# "namespace-URI schema-URL" pairs, so the pieces must be space-separated.
schemaLocation = " ".join([
    "http://www.topografix.com/GPX/1/1",
    "http://www.topografix.com/GPX/1/1/gpx.xsd",
    "http://www.garmin.com/xmlschemas/GpxExtensions/v3",
    "http://www.garmin.com/xmlschemas/GpxExtensionsv3.xsd",
    "http://www.garmin.com/xmlschemas/TrackPointExtension/v1",
    "http://www.garmin.com/xmlschemas/TrackPointExtensionv1.xsd",
])

gpxns = "{http://www.topografix.com/GPX/1/1}"
extns = "{http://www.garmin.com/xmlschemas/TrackPointExtension/v1}"

reftracks = []
failtracks = []


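def parseTrack(trk, stime, keep=False):
    """Map each trkpt's offset (seconds since stime) to its lat/lon/ele/time.

    If keep is true, also store a deep copy of the original trkpt node so
    that its extensions (heart rate, cadence, temperature) can be reused.
    """
    tracks = {}
    for s in trk.findall("%strkseg" % gpxns):
        for p in s.findall("%strkpt" % gpxns):
            # latitude and longitude are attributes of the trkpt node
            # but elevation is a child node in its own right
            el = {}
            el['lat'] = p.get("lat")
            el['lon'] = p.get("lon")
            el['ele'] = p.find("%sele" % gpxns).text
            if keep:
                el['trkpt'] = deepcopy(p)
            # timestamps may or may not carry fractional seconds
            rfc3339 = p.find("%stime" % gpxns).text
            try:
                t = datetime.strptime(rfc3339, '%Y-%m-%dT%H:%M:%S.%fZ')
            except ValueError:
                t = datetime.strptime(rfc3339, '%Y-%m-%dT%H:%M:%SZ')
            # strftime("%s") is a platform-specific (glibc) extension that
            # treats t as local time; that's fine here because only the
            # difference from stime, converted the same way, is used.
            sec_t = int(t.strftime("%s"))
            el['time'] = rfc3339
            tracks[sec_t - stime] = el
    return tracks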

##
# Main routine starts here.
##

# parse straight from the filenames so no file handles are left open
rf = ET.parse("goodfile")
ff = ET.parse("dodgyfile")

rstimestr = rf.getroot().find("%smetadata" % gpxns).find("%stime" % gpxns).text
rstime = int(datetime.strptime(rstimestr, "%Y-%m-%dT%H:%M:%SZ").strftime("%s"))
fstimestr = ff.getroot().find("%smetadata" % gpxns).find("%stime" % gpxns).text
fstime = int(datetime.strptime(fstimestr, "%Y-%m-%dT%H:%M:%SZ").strftime("%s"))

for track in rf.findall("%strk" % gpxns):
    reftracks.append(parseTrack(track, rstime, False))

for track in ff.findall("%strk" % gpxns):
    failtracks.append(parseTrack(track, fstime, True))

# Now we need to fix node attributes in failtracks.
# We're being lazy, so assume each file holds only one track for now.

rpts = len(reftracks[0].keys())
fpts = len(failtracks[0].keys())

if fpts > rpts:
    skipn = fpts % rpts
else:
    skipn = rpts % fpts

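# create a "fixed" track; its tags carry the GPX namespace, just like the
# trkpt nodes copied from the dodgy file
ntrack = ET.Element("%strk" % gpxns)
ntrkname = ET.SubElement(ntrack, "%sname" % gpxns)

ntrkname.text = ff.find("%strk" % gpxns).find("%sname" % gpxns).text

ntseg = ET.SubElement(ntrack, "%strkseg" % gpxns)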

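# Walk the dodgy track in time order. Keys are seconds since each file's
# metadata start time, so a point is paired with the reference point
# recorded at the same offset; offsets missing from either ride are skipped.
for n in sorted(failtracks[0]):
    # Drop points whose offset is a multiple of skipn. skipn is 0 when the
    # larger point count is an exact multiple of the smaller one, and then
    # nothing is dropped.
    if (skipn and n % skipn == 0) or n not in reftracks[0]:
        continue
    badn = failtracks[0][n]['trkpt']
    goodn = reftracks[0][n]
    # borrow position and elevation from the good ride, keeping the dodgy
    # ride's timestamp and sensor extensions
    badn.set('lat', goodn['lat'])
    badn.set('lon', goodn['lon'])
    badne = badn.find("%sele" % gpxns)
    badne.text = goodn['ele']
    # <extensions> must be the last child of a trkpt in GPX 1.1
    extn = badn.find("%sextensions" % gpxns)
    if extn is not None:
        badn.append(extn)
    ntseg.append(badn)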

# write a new file....
gpx = ET.Element("%sgpx" % gpxns, nsmap=NSMAP)
gpx.set("creator", "James C. McPherson")
gpx.set("version", "1.1")
gpx.set("{http://www.w3.org/2001/XMLSchema-instance}schemaLocation",
        schemaLocation)
gpx.append(ntrack)
et = ET.ElementTree(gpx)
# lxml serialises to bytes, so give it a filename and let it open the file
et.write("fixedp.gpx", xml_declaration=True, encoding="UTF-8")
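One portability wrinkle: the script turns timestamps into epoch seconds with `strftime("%s")`, which is a glibc extension (it isn't available everywhere, and it interprets the time as local rather than UTC). A portable alternative is `calendar.timegm()`, which treats the time tuple as UTC. Here is a sketch of such a helper; the name `rfc3339_to_epoch` is hypothetical, and it would need to be used for both the metadata start times and the per-point times so the offsets stay consistent.

```python
import calendar
from datetime import datetime

def rfc3339_to_epoch(stamp):
    """Convert a GPX 'YYYY-MM-DDTHH:MM:SS[.fff]Z' stamp to UTC epoch seconds.

    Fractional seconds are accepted but truncated, since timegm() works on
    whole-second time tuples.
    """
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            t = datetime.strptime(stamp, fmt)
        except ValueError:
            continue
        # t is naive, so utctimetuple() returns its fields unchanged and
        # timegm() interprets them as UTC
        return calendar.timegm(t.utctimetuple())
    raise ValueError("unrecognised GPX timestamp: %r" % stamp)
```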