<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>/devices/pseudo/bitbucket@0,0:pseudo (Posts about ETL)</title><link>https://www.jmcpdotcom.com/blog/</link><description></description><atom:link href="https://www.jmcpdotcom.com/blog/categories/etl.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2022 &lt;a href="mailto:blogadmin@jmcpdotcom.com"&gt;jmcp&lt;/a&gt; </copyright><lastBuildDate>Thu, 21 Apr 2022 02:58:33 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>init(Apache Spark)</title><link>https://www.jmcpdotcom.com/blog/posts/2019-10-11-apache-spark-init/</link><dc:creator>jmcp</dc:creator><description>&lt;p&gt;In a &lt;a class="reference external" href="https://www.jmcpdotcom.com/blog/posts/2019-10-04-microservices-part-2/"&gt;previous post&lt;/a&gt; I wrote about how I've started down the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Data_science"&gt;Data Science&lt;/a&gt;
path, kicking off with an exploration of &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Sentiment_analysis"&gt;sentiment analysis&lt;/a&gt; for political
tweets. This is a topic which I will come back to in the future, not least
because the &lt;a class="reference external" href="http://www.nltk.org"&gt;nltk&lt;/a&gt; &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Text_corpus"&gt;corpus&lt;/a&gt; I've made use of for &lt;a class="reference external" href="https://github.com/jmcp/au-pol-sentiment"&gt;au-pol-sentiment&lt;/a&gt; is
based on &lt;em&gt;British&lt;/em&gt; political tweets. While Australia and Britain share a
common political heritage, I'm not completely confident that &lt;em&gt;our&lt;/em&gt; political
discourse is quite covered by &lt;a class="reference external" href="https://www.nltk.org/howto/twitter.html"&gt;that corpus&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the meantime, another aspect of &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Data_science"&gt;Data Science&lt;/a&gt; in practice is the use of
an ecosystem called &lt;a class="reference external" href="https://spark.apache.org"&gt;Apache Spark&lt;/a&gt;. Leaving aside my 20+ years of muscle
memory spelling it as &lt;a class="reference external" href="https://sparc.org"&gt;SPARC&lt;/a&gt;, this is a data-processing engine with &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Machine_learning"&gt;Machine Learning&lt;/a&gt; support,
described on the homepage as &lt;strong&gt;a unified analytics engine for large-scale data
processing&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;My experience is that when I want or need to learn a new toolkit or utility,
the best way to do so is by trying to directly solve a specific problem with
it. One problem (ok, not really a problem, more a set of questions) I have is
that with all of the data I've gathered since 2013 from my &lt;a class="reference external" href="https://www.jmcpdotcom.com/blog/posts/2018-04-03-monitoring-my-inverter/"&gt;solar inverter&lt;/a&gt;
I'm dependent on &lt;a class="reference external" href="https://pvoutput.org"&gt;pvoutput.org&lt;/a&gt; for finding per-year and per-month
averages, maxima and minima. So with a &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Data_science"&gt;data science&lt;/a&gt;-focused job
interview (with &lt;a class="reference external" href="http://labs.oracle.com"&gt;Oracle Labs&lt;/a&gt;) approaching, I decided to get stuck in
and get started with &lt;a class="reference external" href="https://spark.apache.org"&gt;Apache Spark&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The first issue I faced was implementing the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Extract,_transform,_load"&gt;ETL&lt;/a&gt; pipeline. I have two
types of files to load - the first contains the output from
&lt;a class="reference external" href="https://github.com/jcroucher/solarmonj"&gt;solarmonj&lt;/a&gt;, the second has the output from my &lt;a class="reference external" href="https://www.jmcpdotcom.com/blog/posts/2018-04-03-monitoring-my-inverter/"&gt;solar inverter&lt;/a&gt;
script.&lt;/p&gt;
&lt;p&gt;Here's the first schema:&lt;/p&gt;
&lt;table&gt;
&lt;colgroup&gt;
&lt;col style="width: 42%"&gt;
&lt;col style="width: 58%"&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th class="head"&gt;&lt;p&gt;Field name&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Units&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;Timestamp&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;seconds-since-epoch&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;Temperature&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;float (degrees C)&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;energyNow&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;float (Watts)&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;energyToday&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;float (Watt-hours)&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;powerGenerated&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;float (Watts)&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;voltageDC&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;float (Volts)&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;current&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;float (Amps)&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;energyTotal&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;float (Watt-hours)&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;voltageAC&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;float (Volts)&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;br&gt;&lt;/p&gt;
&lt;p&gt;The second file type comes from jfy-monitor, and has this schema:&lt;/p&gt;
&lt;table&gt;
&lt;colgroup&gt;
&lt;col style="width: 31%"&gt;
&lt;col style="width: 69%"&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th class="head"&gt;&lt;p&gt;Field name&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;Units&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;Timestamp&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;ISO8601-like ("yyyy-MM-dd'T'HH:mm:ss")&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;Temperature&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;float (degrees C)&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;PowerGenerated&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;float (Watts)&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;VoltageDC&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;float (Volts)&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;Current&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;float (Amps)&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;EnergyGenerated&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;float (Watt-hours)&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;VoltageAC&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;float (Volts)&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
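&lt;p&gt;The two Timestamp formats can be brought to a common type before any analysis. What follows is a minimal sketch rather than the code used later in this post: the helper name &lt;em&gt;parse_timestamp&lt;/em&gt; and the sample values are invented for illustration, and it assumes the epoch values should be read as UTC.&lt;/p&gt;
&lt;pre class="code python"&gt;
```python
from datetime import datetime, timezone


def parse_timestamp(raw, is_old):
    """Return a naive datetime for either schema's Timestamp field (hypothetical helper)."""
    if is_old:
        # First schema: seconds since the epoch. Reading it as UTC is an assumption.
        stamp = datetime.fromtimestamp(int(raw), tz=timezone.utc)
        return stamp.replace(tzinfo=None)
    # Second schema: "yyyy-MM-dd'T'HH:mm:ss", which fromisoformat() accepts.
    return datetime.fromisoformat(raw)


# Invented sample values that describe the same instant in both formats.
old_style = parse_timestamp("1570752000", True)
new_style = parse_timestamp("2019-10-11T00:00:00", False)
print(old_style, new_style)
```
&lt;/pre&gt;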
&lt;p&gt;There are two other salient pieces of information about these files. The
first is that the &lt;em&gt;energyTotal&lt;/em&gt; and &lt;em&gt;EnergyGenerated&lt;/em&gt; fields are running
totals of the amount of energy generated on that particular day. The
second is that in the first version of the schema, &lt;em&gt;energyTotal&lt;/em&gt; needs
to be multiplied by 1000 to get the actual Watt-hour value.&lt;/p&gt;
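&lt;p&gt;To make those two points concrete, here is a small sketch with invented sample readings. The function names &lt;em&gt;normalise_energy&lt;/em&gt; and &lt;em&gt;daily_total&lt;/em&gt; are hypothetical, and the sketch assumes that, because the field is a running total, the last reading of a day is that day's total.&lt;/p&gt;
&lt;pre class="code python"&gt;
```python
def normalise_energy(raw, is_old):
    """Scale a first-schema energyTotal reading by 1000; leave the second schema alone."""
    multiplier = 1000.0 if is_old else 1.0
    return float(raw) * multiplier


def daily_total(readings, is_old):
    """Assumes readings are in time order, so the last running total is the day's total."""
    if not readings:
        return 0.0
    return normalise_energy(readings[-1], is_old)


# Invented sample data: three readings from one day in each format.
old_day = ["0.5", "3.25", "7.5"]
new_day = ["500.0", "3250.0", "7500.0"]
print(daily_total(old_day, True), daily_total(new_day, False))
```
&lt;/pre&gt;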
&lt;p&gt;With that knowledge ready, let's dive into the code.&lt;/p&gt;
&lt;p&gt;The first step is to start up a &lt;a class="reference external" href="https://dzone.com/articles/introduction-to-spark-session"&gt;Spark&lt;/a&gt; session:&lt;/p&gt;
&lt;pre class="code python"&gt;&lt;a id="rest_code_b924a62ab4234a699a0c981baaffe174-1" name="rest_code_b924a62ab4234a699a0c981baaffe174-1"&gt;&lt;/a&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;pyspark.sql.functions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;date_format&lt;/span&gt;
&lt;a id="rest_code_b924a62ab4234a699a0c981baaffe174-2" name="rest_code_b924a62ab4234a699a0c981baaffe174-2"&gt;&lt;/a&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;pyspark&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SparkContext&lt;/span&gt;
&lt;a id="rest_code_b924a62ab4234a699a0c981baaffe174-3" name="rest_code_b924a62ab4234a699a0c981baaffe174-3"&gt;&lt;/a&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;pyspark.sql&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SparkSession&lt;/span&gt;
&lt;a id="rest_code_b924a62ab4234a699a0c981baaffe174-4" name="rest_code_b924a62ab4234a699a0c981baaffe174-4"&gt;&lt;/a&gt;
&lt;a id="rest_code_b924a62ab4234a699a0c981baaffe174-5" name="rest_code_b924a62ab4234a699a0c981baaffe174-5"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Basic Spark session configuration&lt;/span&gt;
&lt;a id="rest_code_b924a62ab4234a699a0c981baaffe174-6" name="rest_code_b924a62ab4234a699a0c981baaffe174-6"&gt;&lt;/a&gt;&lt;span class="n"&gt;sc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SparkContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"local"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"PV Inverter Analysis"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_b924a62ab4234a699a0c981baaffe174-7" name="rest_code_b924a62ab4234a699a0c981baaffe174-7"&gt;&lt;/a&gt;&lt;span class="n"&gt;spark&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SparkSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_b924a62ab4234a699a0c981baaffe174-8" name="rest_code_b924a62ab4234a699a0c981baaffe174-8"&gt;&lt;/a&gt;
&lt;a id="rest_code_b924a62ab4234a699a0c981baaffe174-9" name="rest_code_b924a62ab4234a699a0c981baaffe174-9"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# We don't need most of this output&lt;/span&gt;
&lt;a id="rest_code_b924a62ab4234a699a0c981baaffe174-10" name="rest_code_b924a62ab4234a699a0c981baaffe174-10"&gt;&lt;/a&gt;&lt;span class="n"&gt;log4j&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_jvm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;org&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;apache&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log4j&lt;/span&gt;
&lt;a id="rest_code_b924a62ab4234a699a0c981baaffe174-11" name="rest_code_b924a62ab4234a699a0c981baaffe174-11"&gt;&lt;/a&gt;&lt;span class="n"&gt;log4j&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LogManager&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getRootLogger&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;setLevel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log4j&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ERROR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;I observed while prototyping this in the &lt;a class="reference external" href="https://github.com/apache/spark/tree/master/python"&gt;pyspark&lt;/a&gt; &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop"&gt;REPL&lt;/a&gt; environment
that if I didn't turn the logging output right down, then I'd see
squillions of &lt;strong&gt;INFO&lt;/strong&gt; messages.&lt;/p&gt;
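&lt;p&gt;If reaching into the JVM through &lt;em&gt;_jvm&lt;/em&gt; feels too intrusive, PySpark also exposes a log-level setter on the context itself. A minimal sketch, assuming a local master as above; the function name &lt;em&gt;quiet_session&lt;/em&gt; is made up for illustration, and the import sits inside the function so that merely defining it does not require PySpark to be installed.&lt;/p&gt;
&lt;pre class="code python"&gt;
```python
def quiet_session(app_name="PV Inverter Analysis"):
    """Start a local Spark session and limit logging to errors (hypothetical helper)."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local")
        .appName(app_name)
        .getOrCreate()
    )
    # setLogLevel() takes a level name such as "ERROR", "WARN" or "INFO".
    spark.sparkContext.setLogLevel("ERROR")
    return spark
```
&lt;/pre&gt;
&lt;p&gt;Because the level is only changed once the context exists, some startup messages may still be printed.&lt;/p&gt;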
&lt;p&gt;The second step is to generate a list of files. As you might have
guessed, I've got a year/month/day hierarchy - but the older files have
a &lt;em&gt;csv&lt;/em&gt; extension. To get those files (and since I want to be able to
process any given year &lt;em&gt;or&lt;/em&gt; year+month combination), I need to use some
globbing:&lt;/p&gt;
&lt;pre class="code python"&gt;&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-1" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-1"&gt;&lt;/a&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;glob&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-2" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-2"&gt;&lt;/a&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-3" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-3"&gt;&lt;/a&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generateFiles&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topdir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-4" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-4"&gt;&lt;/a&gt;    &lt;span class="sd"&gt;"""Construct per-year dicts of lists of files"""&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-5" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-5"&gt;&lt;/a&gt;    &lt;span class="n"&gt;allfiles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-6" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-6"&gt;&lt;/a&gt;    &lt;span class="n"&gt;kkey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-7" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-7"&gt;&lt;/a&gt;    &lt;span class="n"&gt;patterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-8" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-8"&gt;&lt;/a&gt;    &lt;span class="n"&gt;months&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-9" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-9"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# Since some of our data dirs have months as bare numbers and&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-10" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-10"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# others have a prepended 0, let's match them correctly.&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-11" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-11"&gt;&lt;/a&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-12" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-12"&gt;&lt;/a&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;month&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-13" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-13"&gt;&lt;/a&gt;            &lt;span class="n"&gt;months&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"0"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-14" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-14"&gt;&lt;/a&gt;        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-15" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-15"&gt;&lt;/a&gt;            &lt;span class="n"&gt;months&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-16" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-16"&gt;&lt;/a&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-17" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-17"&gt;&lt;/a&gt;        &lt;span class="n"&gt;patterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{year}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{monthp}&lt;/span&gt;&lt;span class="s2"&gt;/**"&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;monthp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;monthp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-18" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-18"&gt;&lt;/a&gt;                    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;monthp&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;months&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-19" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-19"&gt;&lt;/a&gt;        &lt;span class="n"&gt;kkey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-20" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-20"&gt;&lt;/a&gt;    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-21" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-21"&gt;&lt;/a&gt;        &lt;span class="n"&gt;patterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{year}&lt;/span&gt;&lt;span class="s2"&gt;/*/**"&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-22" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-22"&gt;&lt;/a&gt;        &lt;span class="n"&gt;kkey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-23" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-23"&gt;&lt;/a&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-24" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-24"&gt;&lt;/a&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;patterns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-25" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-25"&gt;&lt;/a&gt;        &lt;span class="n"&gt;globs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-26" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-26"&gt;&lt;/a&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;pat&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;patterns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-27" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-27"&gt;&lt;/a&gt;            &lt;span class="n"&gt;globs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;glob&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;glob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topdir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pat&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-28" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-28"&gt;&lt;/a&gt;        &lt;span class="n"&gt;allfiles&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;kkey&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;globs&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-29" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-29"&gt;&lt;/a&gt;    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-30" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-30"&gt;&lt;/a&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;yy&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;allYears&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-31" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-31"&gt;&lt;/a&gt;            &lt;span class="n"&gt;allfiles&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;yy&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;glob&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;glob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topdir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-32" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-32"&gt;&lt;/a&gt;                                                  &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{yy}&lt;/span&gt;&lt;span class="s2"&gt;/*/*"&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;yy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;yy&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;a id="rest_code_7f3927fc441f4675a89b54b16e45c4dd-33" name="rest_code_7f3927fc441f4675a89b54b16e45c4dd-33"&gt;&lt;/a&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;allfiles&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;To load in each file, I turned to the tried-and-true &lt;a class="reference external" href="https://www.python.org"&gt;Python&lt;/a&gt; standard
module &lt;strong&gt;csv&lt;/strong&gt; and, rather than having separate v1 and v2 processing
functions, I follow &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Don%27t_repeat_yourself"&gt;DRY&lt;/a&gt; and use an input argument to determine which
set of elements to match:&lt;/p&gt;
&lt;pre class="code python"&gt;&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-1" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-1"&gt;&lt;/a&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;csv&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-2" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-2"&gt;&lt;/a&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-3" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-3"&gt;&lt;/a&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-4" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-4"&gt;&lt;/a&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;importCSV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;isOld&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-5" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-5"&gt;&lt;/a&gt;    &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-6" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-6"&gt;&lt;/a&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;isOld&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-7" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-7"&gt;&lt;/a&gt;        &lt;span class="n"&gt;multiplier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1000.0&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-8" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-8"&gt;&lt;/a&gt;    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-9" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-9"&gt;&lt;/a&gt;        &lt;span class="n"&gt;multiplier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-10" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-10"&gt;&lt;/a&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-11" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-11"&gt;&lt;/a&gt;    &lt;span class="n"&gt;csvreader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;readlines&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-12" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-12"&gt;&lt;/a&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;csvreader&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-13" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-13"&gt;&lt;/a&gt;        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-14" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-14"&gt;&lt;/a&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;isOld&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-15" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-15"&gt;&lt;/a&gt;                &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tstamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_enow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_etoday&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;powergen&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vdc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-16" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-16"&gt;&lt;/a&gt;                 &lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;energen&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vac&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-17" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-17"&gt;&lt;/a&gt;            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-18" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-18"&gt;&lt;/a&gt;                &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tstamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;powergen&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vdc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;energen&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vac&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-19" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-19"&gt;&lt;/a&gt;        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="ne"&gt;ValueError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;_ve&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-20" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-20"&gt;&lt;/a&gt;            &lt;span class="c1"&gt;# print("failed at {row} of {fname}".format(row=row, fname=fname))&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-21" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-21"&gt;&lt;/a&gt;            &lt;span class="k"&gt;continue&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-22" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-22"&gt;&lt;/a&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-23" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-23"&gt;&lt;/a&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="s2"&gt;"e"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;temp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-24" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-24"&gt;&lt;/a&gt;            &lt;span class="c1"&gt;# invalid line, skip it&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-25" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-25"&gt;&lt;/a&gt;            &lt;span class="k"&gt;continue&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-26" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-26"&gt;&lt;/a&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-27" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-27"&gt;&lt;/a&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;isOld&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-28" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-28"&gt;&lt;/a&gt;            &lt;span class="n"&gt;isostamp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fromtimestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tstamp&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-29" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-29"&gt;&lt;/a&gt;        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-30" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-30"&gt;&lt;/a&gt;            &lt;span class="n"&gt;isostamp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fromisoformat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tstamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-31" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-31"&gt;&lt;/a&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-32" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-32"&gt;&lt;/a&gt;        &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-33" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-33"&gt;&lt;/a&gt;            &lt;span class="s2"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;isostamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-34" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-34"&gt;&lt;/a&gt;            &lt;span class="s2"&gt;"Temperature"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temp&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-35" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-35"&gt;&lt;/a&gt;            &lt;span class="s2"&gt;"PowerGenerated"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;powergen&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-36" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-36"&gt;&lt;/a&gt;            &lt;span class="s2"&gt;"VoltageDC"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vdc&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-37" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-37"&gt;&lt;/a&gt;            &lt;span class="s2"&gt;"Current"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-38" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-38"&gt;&lt;/a&gt;            &lt;span class="s2"&gt;"EnergyGenerated"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;energen&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;multiplier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-39" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-39"&gt;&lt;/a&gt;            &lt;span class="s2"&gt;"VoltageAC"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vac&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;
&lt;a id="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-40" name="rest_code_0d69783936314703b8e6f5e0b9d0cdaf-40"&gt;&lt;/a&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;Now we get to the &lt;a class="reference external" href="https://spark.apache.org"&gt;Apache Spark&lt;/a&gt; part. Having got a dictionary of
anonymous &lt;code class="docutils literal"&gt;dicts&lt;/code&gt; I can turn them into a &lt;code class="docutils literal"&gt;Resilient Distributed
Dataset&lt;/code&gt; (&lt;a class="reference external" href="https://spark.apache.org/docs/latest/rdd-programming-guide.html"&gt;RDD&lt;/a&gt;) and thence a &lt;a class="reference external" href="https://spark.apache.org/docs/latest/sql-programming-guide.html"&gt;DataFrame&lt;/a&gt;. I chose the &lt;a class="reference external" href="https://spark.apache.org/docs/latest/sql-programming-guide.html"&gt;DataFrame&lt;/a&gt;
model rather than a &lt;code class="docutils literal"&gt;Row&lt;/code&gt; because that matches up nicely with my
existing data format. For other applications (such as when I extend my
&lt;a class="reference external" href="https://www.jmcpdotcom.com/blog/posts/2019-10-04-microservices-part-2/"&gt;Twitter Sentiment Analysis&lt;/a&gt; project with the streaming API) I'll use
the &lt;code class="docutils literal"&gt;Row&lt;/code&gt; datatype instead.&lt;/p&gt;
&lt;pre class="code python"&gt;&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-1" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-1"&gt;&lt;/a&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-2" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-2"&gt;&lt;/a&gt;    &lt;span class="sd"&gt;""" Returns an ISO8601-formatted (without microseconds) timestamp"""&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-3" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-3"&gt;&lt;/a&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"%Y-%m-&lt;/span&gt;&lt;span class="si"&gt;%d&lt;/span&gt;&lt;span class="s2"&gt;T%H:%M:%S"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-4" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-4"&gt;&lt;/a&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-5" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-5"&gt;&lt;/a&gt;&lt;span class="n"&gt;allFiles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;generateFiles&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;qyear&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;qmonth&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-6" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-6"&gt;&lt;/a&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-7" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-7"&gt;&lt;/a&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s2"&gt;"Importing data files"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-8" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-8"&gt;&lt;/a&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-9" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-9"&gt;&lt;/a&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;allFiles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-10" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-10"&gt;&lt;/a&gt;    &lt;span class="n"&gt;rddyear&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-11" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-11"&gt;&lt;/a&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;allFiles&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-12" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-12"&gt;&lt;/a&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;endswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;".csv"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-13" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-13"&gt;&lt;/a&gt;            &lt;span class="n"&gt;rddyear&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;importCSV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-14" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-14"&gt;&lt;/a&gt;        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-15" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-15"&gt;&lt;/a&gt;            &lt;span class="n"&gt;rddyear&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;importCSV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-16" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-16"&gt;&lt;/a&gt;    &lt;span class="n"&gt;rdds&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rddyear&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-17" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-17"&gt;&lt;/a&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-18" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-18"&gt;&lt;/a&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s2"&gt;"All data files imported"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-19" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-19"&gt;&lt;/a&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-20" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-20"&gt;&lt;/a&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;allYears&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-21" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-21"&gt;&lt;/a&gt;    &lt;span class="n"&gt;rdd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parallelize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rdds&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-22" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-22"&gt;&lt;/a&gt;    &lt;span class="n"&gt;allFrames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rdd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;toDF&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-23" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-23"&gt;&lt;/a&gt;    &lt;span class="n"&gt;newFrame&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"new"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-24" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-24"&gt;&lt;/a&gt;    &lt;span class="c1"&gt;# Extend the schema for our convenience&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-25" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-25"&gt;&lt;/a&gt;    &lt;span class="n"&gt;allFrames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;newFrame&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;allFrames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-26" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-26"&gt;&lt;/a&gt;        &lt;span class="s2"&gt;"DateOnly"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;date_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'timestamp'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"yyyyMMdd"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-27" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-27"&gt;&lt;/a&gt;    &lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"TimeOnly"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;date_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'timestamp'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"HHmmss"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-28" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-28"&gt;&lt;/a&gt;    &lt;span class="n"&gt;allFrames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;newFrame&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;createOrReplaceTempView&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"view&lt;/span&gt;&lt;span class="si"&gt;{year}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-29" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-29"&gt;&lt;/a&gt;        &lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-30" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-30"&gt;&lt;/a&gt;
&lt;a id="rest_code_41e655eadba544f7ac0743ad83a6b51f-31" name="rest_code_41e655eadba544f7ac0743ad83a6b51f-31"&gt;&lt;/a&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s2"&gt;"Data transformed from RDDs into DataFrames"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;I extended the frames with two extra columns because, when I search
for the record dates (minimum and maximum), I want a quick
&lt;code class="docutils literal"&gt;SELECT&lt;/code&gt; which I can aggregate on.&lt;/p&gt;
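&lt;p&gt;To make the two extra columns concrete, here is a minimal plain-Python sketch (not Spark code) of the strings they hold. It assumes that Spark's &lt;code class="docutils literal"&gt;yyyyMMdd&lt;/code&gt; and &lt;code class="docutils literal"&gt;HHmmss&lt;/code&gt; patterns correspond to &lt;code class="docutils literal"&gt;%Y%m%d&lt;/code&gt; and &lt;code class="docutils literal"&gt;%H%M%S&lt;/code&gt; in &lt;code class="docutils literal"&gt;strftime&lt;/code&gt;. The helper names and the sample timestamp are made up for illustration.&lt;/p&gt;

```python
from datetime import datetime


def date_only(ts):
    """Hypothetical stand-in for date_format('timestamp', 'yyyyMMdd')."""
    return ts.strftime("%Y%m%d")


def time_only(ts):
    """Hypothetical stand-in for date_format('timestamp', 'HHmmss')."""
    return ts.strftime("%H%M%S")


def month_prefix(year, mon):
    """Zero-padded yyyymm prefix, as used in a LIKE clause."""
    return "{year}{mon:02d}".format(year=year, mon=mon)


# Sample reading time, invented for this example
sample = datetime(2018, 1, 31, 13, 5, 9)

print(date_only(sample))      # 20180131
print(time_only(sample))      # 130509
print(month_prefix(2018, 1))  # 201801
# A fixed-width string lets a prefix test stand in for LIKE '201801%'
print(date_only(sample).startswith(month_prefix(2018, 1)))  # True
```

&lt;p&gt;Because &lt;code class="docutils literal"&gt;DateOnly&lt;/code&gt; is a fixed-width string, matching on a &lt;code class="docutils literal"&gt;yyyymm&lt;/code&gt; prefix is enough to pick out every day in one month.&lt;/p&gt;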
&lt;pre class="code python"&gt;&lt;a id="rest_code_75175a7121f9424494f095f110692f1b-1" name="rest_code_75175a7121f9424494f095f110692f1b-1"&gt;&lt;/a&gt;&lt;span class="n"&gt;ymdquery&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"SELECT DISTINCT DateOnly from &lt;/span&gt;&lt;span class="si"&gt;{view}&lt;/span&gt;&lt;span class="s2"&gt; WHERE DateOnly "&lt;/span&gt;
&lt;a id="rest_code_75175a7121f9424494f095f110692f1b-2" name="rest_code_75175a7121f9424494f095f110692f1b-2"&gt;&lt;/a&gt;&lt;span class="n"&gt;ymdquery&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s2"&gt;"LIKE '&lt;/span&gt;&lt;span class="si"&gt;{yyyymm}&lt;/span&gt;&lt;span class="s2"&gt;%' ORDER BY DateOnly ASC"&lt;/span&gt;
&lt;a id="rest_code_75175a7121f9424494f095f110692f1b-3" name="rest_code_75175a7121f9424494f095f110692f1b-3"&gt;&lt;/a&gt;
&lt;a id="rest_code_75175a7121f9424494f095f110692f1b-4" name="rest_code_75175a7121f9424494f095f110692f1b-4"&gt;&lt;/a&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;allYears&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_75175a7121f9424494f095f110692f1b-5" name="rest_code_75175a7121f9424494f095f110692f1b-5"&gt;&lt;/a&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;mon&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;allMonths&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_75175a7121f9424494f095f110692f1b-6" name="rest_code_75175a7121f9424494f095f110692f1b-6"&gt;&lt;/a&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;mon&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_75175a7121f9424494f095f110692f1b-7" name="rest_code_75175a7121f9424494f095f110692f1b-7"&gt;&lt;/a&gt;            &lt;span class="n"&gt;yyyymm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s2"&gt;"0"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mon&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_75175a7121f9424494f095f110692f1b-8" name="rest_code_75175a7121f9424494f095f110692f1b-8"&gt;&lt;/a&gt;        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_75175a7121f9424494f095f110692f1b-9" name="rest_code_75175a7121f9424494f095f110692f1b-9"&gt;&lt;/a&gt;            &lt;span class="n"&gt;yyyymm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mon&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_75175a7121f9424494f095f110692f1b-10" name="rest_code_75175a7121f9424494f095f110692f1b-10"&gt;&lt;/a&gt;
&lt;a id="rest_code_75175a7121f9424494f095f110692f1b-11" name="rest_code_75175a7121f9424494f095f110692f1b-11"&gt;&lt;/a&gt;        &lt;span class="n"&gt;_dates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;spark&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ymdquery&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;a id="rest_code_75175a7121f9424494f095f110692f1b-12" name="rest_code_75175a7121f9424494f095f110692f1b-12"&gt;&lt;/a&gt;            &lt;span class="n"&gt;view&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;view&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;yyyymm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;yyyymm&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;collect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;a id="rest_code_75175a7121f9424494f095f110692f1b-13" name="rest_code_75175a7121f9424494f095f110692f1b-13"&gt;&lt;/a&gt;        &lt;span class="n"&gt;days&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;asDict&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="s2"&gt;"DateOnly"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;_dates&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;a id="rest_code_75175a7121f9424494f095f110692f1b-14" name="rest_code_75175a7121f9424494f095f110692f1b-14"&gt;&lt;/a&gt;
&lt;a id="rest_code_75175a7121f9424494f095f110692f1b-15" name="rest_code_75175a7121f9424494f095f110692f1b-15"&gt;&lt;/a&gt;        &lt;span class="n"&gt;_monthMax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;a id="rest_code_75175a7121f9424494f095f110692f1b-16" name="rest_code_75175a7121f9424494f095f110692f1b-16"&gt;&lt;/a&gt;            &lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DateOnly&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;a id="rest_code_75175a7121f9424494f095f110692f1b-17" name="rest_code_75175a7121f9424494f095f110692f1b-17"&gt;&lt;/a&gt;                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"EnergyGenerated"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"max"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;collect&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;a id="rest_code_75175a7121f9424494f095f110692f1b-18" name="rest_code_75175a7121f9424494f095f110692f1b-18"&gt;&lt;/a&gt;        &lt;span class="n"&gt;monthMax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_monthMax&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;asDict&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="s2"&gt;"max(EnergyGenerated)"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;I keep track of each day's maximum, and update my &lt;code class="docutils literal"&gt;minval&lt;/code&gt; and
&lt;code class="docutils literal"&gt;minDay&lt;/code&gt; as required. All this information is then stored in a
per-month &lt;code class="docutils literal"&gt;dict&lt;/code&gt;, and then in a per-year &lt;code class="docutils literal"&gt;dict&lt;/code&gt;.&lt;/p&gt;
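&lt;p&gt;The bookkeeping described above can be sketched in plain Python as follows. This is an illustration under assumptions, not the code from the repository: the function name &lt;code class="docutils literal"&gt;summarise_month&lt;/code&gt;, the dict keys other than &lt;code class="docutils literal"&gt;minval&lt;/code&gt; and &lt;code class="docutils literal"&gt;minDay&lt;/code&gt;, and the sample values are all hypothetical. It takes a ready-made mapping of &lt;code class="docutils literal"&gt;DateOnly&lt;/code&gt; strings to each day's maximum &lt;code class="docutils literal"&gt;EnergyGenerated&lt;/code&gt; instead of querying Spark, and it assumes the monthly total is the sum of those daily maxima.&lt;/p&gt;

```python
def summarise_month(daily_max):
    """Return record days, total and average for one month.

    daily_max maps a DateOnly string (yyyymmdd) to that day's
    maximum EnergyGenerated reading. Hypothetical helper.
    """
    if not daily_max:
        return None
    # Comparing keys by their value picks out the record days
    max_day = max(daily_max, key=daily_max.get)
    min_day = min(daily_max, key=daily_max.get)
    total = sum(daily_max.values())
    return {
        "maxDay": max_day,
        "maxval": daily_max[max_day],
        "minDay": min_day,
        "minval": daily_max[min_day],
        "total": total,
        "average": total / len(daily_max),
    }


# Invented sample values: three days of one month
january = {"20180101": 12000.0, "20180102": 10560.0, "20180103": 16780.0}

# Per-year dict holding one per-month dict, keyed by month number
per_year = {2018: {1: summarise_month(january)}}

print(per_year[2018][1]["maxDay"])  # 20180103
print(per_year[2018][1]["minDay"])  # 20180102
print(per_year[2018][1]["total"])   # 39340.0
```

&lt;p&gt;Keeping the per-month results inside a per-year &lt;code class="docutils literal"&gt;dict&lt;/code&gt; means the yearly figures can be derived from the monthly ones without going back to the DataFrames.&lt;/p&gt;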
&lt;p&gt;The last stage is to print out the record dates, monthly and yearly
totals, averages and other values.&lt;/p&gt;
&lt;p&gt;Running this utility on my 4-core &lt;a class="reference external" href="https://www.ubuntu.com"&gt;Ubuntu&lt;/a&gt; system at home, I get what I
believe are &lt;em&gt;ok&lt;/em&gt; timings for whole-year investigations, and &lt;em&gt;reasonable&lt;/em&gt;
timings if I check a specific month.&lt;/p&gt;
&lt;p&gt;When I run the utility for January 2018, the output looks like this:&lt;/p&gt;
&lt;pre class="code shell"&gt;&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-1" name="rest_code_e4b6068f059945448d35a5b59aacd665-1"&gt;&lt;/a&gt;&lt;span class="o"&gt;(&lt;/span&gt;v-3.7-linux&lt;span class="o"&gt;)&lt;/span&gt; flerken:solar-spark $ &lt;span class="nv"&gt;SPARK_LOCAL_IP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;.0.0.0 &lt;span class="nb"&gt;time&lt;/span&gt; -f &lt;span class="s2"&gt;"%E"&lt;/span&gt;  spark-submit --executor-memory 2G --driver-memory 2G solar-spark.py  -y &lt;span class="m"&gt;2018&lt;/span&gt; -m &lt;span class="m"&gt;1&lt;/span&gt;
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-2" name="rest_code_e4b6068f059945448d35a5b59aacd665-2"&gt;&lt;/a&gt;&lt;span class="m"&gt;19&lt;/span&gt;/10/15 &lt;span class="m"&gt;12&lt;/span&gt;:23:57 WARN NativeCodeLoader: Unable to load native-hadoop library &lt;span class="k"&gt;for&lt;/span&gt; your platform... using builtin-java classes where applicable
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-3" name="rest_code_e4b6068f059945448d35a5b59aacd665-3"&gt;&lt;/a&gt;Using Spark default log4j profile: org/apache/spark/log4j-defaults.properties
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-4" name="rest_code_e4b6068f059945448d35a5b59aacd665-4"&gt;&lt;/a&gt;
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-5" name="rest_code_e4b6068f059945448d35a5b59aacd665-5"&gt;&lt;/a&gt;&lt;span class="o"&gt;[&lt;/span&gt;Most INFO-level output elided&lt;span class="o"&gt;]&lt;/span&gt;
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-6" name="rest_code_e4b6068f059945448d35a5b59aacd665-6"&gt;&lt;/a&gt;
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-7" name="rest_code_e4b6068f059945448d35a5b59aacd665-7"&gt;&lt;/a&gt;&lt;span class="m"&gt;19&lt;/span&gt;/10/15 &lt;span class="m"&gt;12&lt;/span&gt;:23:58 INFO Utils: Successfully started service &lt;span class="s1"&gt;'SparkUI'&lt;/span&gt; on port &lt;span class="m"&gt;4040&lt;/span&gt;.
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-8" name="rest_code_e4b6068f059945448d35a5b59aacd665-8"&gt;&lt;/a&gt;&lt;span class="m"&gt;19&lt;/span&gt;/10/15 &lt;span class="m"&gt;12&lt;/span&gt;:23:58 INFO SparkUI: Bound SparkUI to &lt;span class="m"&gt;0&lt;/span&gt;.0.0.0, and started at http://0.0.0.0:4040
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-9" name="rest_code_e4b6068f059945448d35a5b59aacd665-9"&gt;&lt;/a&gt;
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-10" name="rest_code_e4b6068f059945448d35a5b59aacd665-10"&gt;&lt;/a&gt;&lt;span class="m"&gt;2019&lt;/span&gt;-23-15T12:10:59 Importing data files
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-11" name="rest_code_e4b6068f059945448d35a5b59aacd665-11"&gt;&lt;/a&gt;&lt;span class="m"&gt;2019&lt;/span&gt;-23-15T12:10:59 All data files imported
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-12" name="rest_code_e4b6068f059945448d35a5b59aacd665-12"&gt;&lt;/a&gt;/space/jmcp/web/v-3.7-linux/lib/python3.7/site-packages/pyspark/python/lib/pyspark.zip/pyspark/sql/session.py:366: UserWarning: Using RDD of dict to inferSchema is deprecated. Use pyspark.sql.Row instead
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-13" name="rest_code_e4b6068f059945448d35a5b59aacd665-13"&gt;&lt;/a&gt;&lt;span class="m"&gt;2019&lt;/span&gt;-24-15T12:10:01 Data transformed from RDDs into DataFrames
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-14" name="rest_code_e4b6068f059945448d35a5b59aacd665-14"&gt;&lt;/a&gt;&lt;span class="m"&gt;2019&lt;/span&gt;-24-15T12:10:01 Analysing &lt;span class="m"&gt;2018&lt;/span&gt;
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-15" name="rest_code_e4b6068f059945448d35a5b59aacd665-15"&gt;&lt;/a&gt;&lt;span class="m"&gt;2019&lt;/span&gt;-24-15T12:10:01          January
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-16" name="rest_code_e4b6068f059945448d35a5b59aacd665-16"&gt;&lt;/a&gt;&lt;span class="m"&gt;2019&lt;/span&gt;-24-15T12:10:15 All data analysed
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-17" name="rest_code_e4b6068f059945448d35a5b59aacd665-17"&gt;&lt;/a&gt;&lt;span class="m"&gt;2019&lt;/span&gt;-24-15T12:10:15 &lt;span class="m"&gt;2018&lt;/span&gt; total generation: &lt;span class="m"&gt;436130&lt;/span&gt;.00 KW/h
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-18" name="rest_code_e4b6068f059945448d35a5b59aacd665-18"&gt;&lt;/a&gt;&lt;span class="m"&gt;2019&lt;/span&gt;-24-15T12:10:15         January total:               &lt;span class="m"&gt;436130&lt;/span&gt;.00 KW/h
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-19" name="rest_code_e4b6068f059945448d35a5b59aacd665-19"&gt;&lt;/a&gt;&lt;span class="m"&gt;2019&lt;/span&gt;-24-15T12:10:15         Record dates &lt;span class="k"&gt;for&lt;/span&gt; January:    Max on &lt;span class="m"&gt;20180131&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="m"&gt;16780&lt;/span&gt;.00 KW/h&lt;span class="o"&gt;)&lt;/span&gt;, Min on &lt;span class="m"&gt;20180102&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="m"&gt;10560&lt;/span&gt;.00 KW/h&lt;span class="o"&gt;)&lt;/span&gt;
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-20" name="rest_code_e4b6068f059945448d35a5b59aacd665-20"&gt;&lt;/a&gt;&lt;span class="m"&gt;2019&lt;/span&gt;-24-15T12:10:15         Average daily generation  &lt;span class="m"&gt;14068&lt;/span&gt;.71 KW/h
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-21" name="rest_code_e4b6068f059945448d35a5b59aacd665-21"&gt;&lt;/a&gt;&lt;span class="m"&gt;2019&lt;/span&gt;-24-15T12:10:15 ----------------
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-22" name="rest_code_e4b6068f059945448d35a5b59aacd665-22"&gt;&lt;/a&gt;
&lt;a id="rest_code_e4b6068f059945448d35a5b59aacd665-23" name="rest_code_e4b6068f059945448d35a5b59aacd665-23"&gt;&lt;/a&gt;&lt;span class="m"&gt;0&lt;/span&gt;:18.76
&lt;/pre&gt;&lt;p&gt;While that processing is going on, you can see a dashboard with useful
information about the application at &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;http://localhost:4040&lt;/span&gt;&lt;/code&gt;:&lt;/p&gt;
&lt;img alt="Executors" src="https://www.jmcpdotcom.com/blog/images/2019/spark-executors.png"&gt;
&lt;img alt="Jobs" src="https://www.jmcpdotcom.com/blog/images/2019/spark-jobs.png"&gt;
&lt;img alt="Details of a query" src="https://www.jmcpdotcom.com/blog/images/2019/spark-query-details.png"&gt;
&lt;p&gt;You can find the code for this project in my &lt;a class="reference external" href="https://github.com/jmcp"&gt;GitHub&lt;/a&gt; repo &lt;a class="reference external" href="https://github.com/jmcp/solar-spark"&gt;solar-spark&lt;/a&gt;.&lt;/p&gt;
&lt;!-- put references after this point --&gt;</description><category>Apache Spark</category><category>CSV</category><category>data mining</category><category>ETL</category><category>Interview prep</category><category>JSON</category><category>Python</category><category>software engineering</category><category>training</category><category>upskilling</category><guid>https://www.jmcpdotcom.com/blog/posts/2019-10-11-apache-spark-init/</guid><pubDate>Thu, 10 Oct 2019 16:00:00 GMT</pubDate></item><item><title>Microservices (part 2)</title><link>https://www.jmcpdotcom.com/blog/posts/2019-10-04-microservices-part-2/</link><dc:creator>jmcp</dc:creator><description>&lt;p&gt;One principle that I work on is that I should always &lt;em&gt;extend the fix&lt;/em&gt; (learnt
via the &lt;a class="reference external" href="https://www.kepner-tregoe.com"&gt;Kepner-Tregoe&lt;/a&gt;  &lt;a class="reference external" href="https://www.kepner-tregoe.com/training-workshops/our-training-workshops/analytic-trouble-shooting/"&gt;Analytical Troubleshooting&lt;/a&gt; training many years
ago). Following my investigation of how to provide a more accessible method of
&lt;a class="reference external" href="https://www.jmcpdotcom.com/blog/posts/2019-09-27-microservices-part-1/"&gt;determining your electorate&lt;/a&gt;, I came back to the political polling ideas and
got to thinking about how we can track the temperature of a conversation in,
for example, &lt;a class="reference external" href="https://twitter.com/search?q=%23auspol&amp;amp;src=typed_query&amp;amp;f=live"&gt;#auspol&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The term for this is &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Sentiment_analysis"&gt;sentiment analysis&lt;/a&gt; and while the major cloud providers
have their own implementations of this (&lt;a class="reference external" href="https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/"&gt;Microsoft Azure Text Analytics&lt;/a&gt;,
&lt;a class="reference external" href="https://aws.amazon.com/comprehend"&gt;AWS Comprehend&lt;/a&gt;, &lt;a class="reference external" href="https://console.developers.google.com/apis/library/language.googleapis.com"&gt;Google Cloud Natural Language API&lt;/a&gt;) you can also use
&lt;a class="reference external" href="https://www.python.org"&gt;Python&lt;/a&gt;'s &lt;a class="reference external" href="http://www.nltk.org"&gt;nltk&lt;/a&gt; in the comfort of your own venv. It's cheaper, too!&lt;/p&gt;
&lt;p&gt;A bit of searching led me to &lt;a class="reference external" href="https://twitter.com/chapagain"&gt;@Chapagain&lt;/a&gt;'s &lt;a class="reference external" href="http://blog.chapagain.com.np/python-nltk-twitter-sentiment-analysis-natural-language-processing-nlp/"&gt;post&lt;/a&gt;, which was very useful and
got me started - thank you!&lt;/p&gt;
&lt;p&gt;I decided that I really wanted to do something more real-time, and while I could
have done more scraping with &lt;a class="reference external" href="https://www.crummy.com/software/BeautifulSoup"&gt;Beautiful Soup&lt;/a&gt;, a quick look at the HTML
that's returned when you run&lt;/p&gt;
&lt;pre class="code python"&gt;&lt;a id="rest_code_9cc272ee85e3471c9485a47c868598df-1" name="rest_code_9cc272ee85e3471c9485a47c868598df-1"&gt;&lt;/a&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;requests&lt;/span&gt;
&lt;a id="rest_code_9cc272ee85e3471c9485a47c868598df-2" name="rest_code_9cc272ee85e3471c9485a47c868598df-2"&gt;&lt;/a&gt;
&lt;a id="rest_code_9cc272ee85e3471c9485a47c868598df-3" name="rest_code_9cc272ee85e3471c9485a47c868598df-3"&gt;&lt;/a&gt;&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"https://twitter.com/search?q=&lt;/span&gt;&lt;span class="si"&gt;%23a&lt;/span&gt;&lt;span class="s2"&gt;uspol&amp;amp;src=typed_query&amp;amp;f=live"&lt;/span&gt;
&lt;a id="rest_code_9cc272ee85e3471c9485a47c868598df-4" name="rest_code_9cc272ee85e3471c9485a47c868598df-4"&gt;&lt;/a&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_9cc272ee85e3471c9485a47c868598df-5" name="rest_code_9cc272ee85e3471c9485a47c868598df-5"&gt;&lt;/a&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;is eye-wateringly complex. (Go on, try it!) I just couldn't be bothered with
that so I signed up for a &lt;a class="reference external" href="https://twitter.com"&gt;Twitter&lt;/a&gt; developer account and started looking at
the APIs available for &lt;a class="reference external" href="https://developer.twitter.com/en/docs/tweets/search/overview/standard"&gt;searching&lt;/a&gt;. These are easily used with the &lt;a class="reference external" href="https://github.com/ryanmcgrath/twython"&gt;Twython&lt;/a&gt;
library:&lt;/p&gt;
&lt;pre class="code python"&gt;&lt;a id="rest_code_0b15ca5c84c84e01a2743da2c4f34022-1" name="rest_code_0b15ca5c84c84e01a2743da2c4f34022-1"&gt;&lt;/a&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;twython&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Twython&lt;/span&gt;
&lt;a id="rest_code_0b15ca5c84c84e01a2743da2c4f34022-2" name="rest_code_0b15ca5c84c84e01a2743da2c4f34022-2"&gt;&lt;/a&gt;
&lt;a id="rest_code_0b15ca5c84c84e01a2743da2c4f34022-3" name="rest_code_0b15ca5c84c84e01a2743da2c4f34022-3"&gt;&lt;/a&gt;&lt;span class="n"&gt;twitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Twython&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;consumer_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;consumer_secret&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;a id="rest_code_0b15ca5c84c84e01a2743da2c4f34022-4" name="rest_code_0b15ca5c84c84e01a2743da2c4f34022-4"&gt;&lt;/a&gt;                 &lt;span class="n"&gt;access_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;access_token_secret&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_0b15ca5c84c84e01a2743da2c4f34022-5" name="rest_code_0b15ca5c84c84e01a2743da2c4f34022-5"&gt;&lt;/a&gt;&lt;span class="n"&gt;hashtag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"#auspol"&lt;/span&gt;
&lt;a id="rest_code_0b15ca5c84c84e01a2743da2c4f34022-6" name="rest_code_0b15ca5c84c84e01a2743da2c4f34022-6"&gt;&lt;/a&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;twitter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;hashtag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"recent"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_0b15ca5c84c84e01a2743da2c4f34022-7" name="rest_code_0b15ca5c84c84e01a2743da2c4f34022-7"&gt;&lt;/a&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"statuses"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
&lt;a id="rest_code_0b15ca5c84c84e01a2743da2c4f34022-8" name="rest_code_0b15ca5c84c84e01a2743da2c4f34022-8"&gt;&lt;/a&gt;    &lt;span class="n"&gt;sentiment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;classifier_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;a id="rest_code_0b15ca5c84c84e01a2743da2c4f34022-9" name="rest_code_0b15ca5c84c84e01a2743da2c4f34022-9"&gt;&lt;/a&gt;    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentiment&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"pos"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;I hit the rate limit a few times until I realised that there was a &lt;code class="docutils literal"&gt;while
true&lt;/code&gt; going on inside &lt;a class="reference external" href="https://github.com/ryanmcgrath/twython"&gt;Twython&lt;/a&gt; when using the &lt;code class="docutils literal"&gt;cursor&lt;/code&gt; method. In my
print-to-shell proof of concept, I got past that by using the &lt;code class="docutils literal"&gt;search&lt;/code&gt;
function inside a while loop with a 30-second sleep call. I knew that that
wasn't good enough for a web app, and would actually be a roadblock for doing
a properly updated graph-focused page.&lt;/p&gt;
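&lt;p&gt;For illustration, that print-to-shell loop might look something like the sketch below. It is not the code from the repo: &lt;code class="docutils literal"&gt;poll_search&lt;/code&gt; is a made-up name, the search function is passed in as an argument so the loop can be tried without credentials, and it assumes each status carries an &lt;code class="docutils literal"&gt;id&lt;/code&gt; that can be handed back to the search API as &lt;code class="docutils literal"&gt;since_id&lt;/code&gt;.&lt;/p&gt;

```python
import time


def poll_search(search, hashtag, polls=3, interval=30, sleep=time.sleep):
    """Call ``search`` a fixed number of times, pausing between calls.

    ``search`` is any callable taking the same keyword arguments as
    ``Twython.search``; passing it in is an assumption made so the
    loop can be exercised without real credentials.
    """
    last_id = None
    seen = []
    for attempt in range(polls):
        kwargs = {"q": hashtag, "result_type": "recent"}
        if last_id is not None:
            # since_id asks the search API for tweets newer than this id only
            kwargs["since_id"] = last_id
        results = search(**kwargs)
        for tweet in results["statuses"]:
            seen.append(tweet["text"])
            last_id = tweet["id"] if last_id is None else max(last_id, tweet["id"])
        if attempt != polls - 1:
            sleep(interval)  # pause between polls to stay under the rate limit
    return last_id, seen
```

&lt;p&gt;With a real client this would be called as &lt;code class="docutils literal"&gt;poll_search(twitter.search, "#auspol")&lt;/code&gt;. Bounding the number of polls, rather than looping forever, is a choice made here to keep the sketch easy to try out.&lt;/p&gt;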
&lt;p&gt;For that I would need a charting library, and some JavaScript. I started out
using &lt;a class="reference external" href="https://www.chartjs.org"&gt;Chart.js&lt;/a&gt;, but quickly realised that it didn't have any sort of flow,
so then I retooled to use &lt;a class="reference external" href="https://c3js.org/"&gt;C3js&lt;/a&gt; instead.&lt;/p&gt;
&lt;p&gt;The initial render of the template provides the first set of data, which is a
JavaScript array (&lt;code class="docutils literal"&gt;[]&lt;/code&gt;), and checks for a saved hashtag and the id of the
most recently found tweet using &lt;a class="reference external" href="https://developer.mozilla.org/en-US/docs/Web/API/Window/sessionStorage"&gt;Window.sessionStorage&lt;/a&gt;. Then we set up a
function to get new data when called:&lt;/p&gt;
&lt;pre class="code javascript"&gt;&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-1" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-1"&gt;&lt;/a&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;getNewData&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-2" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-2"&gt;&lt;/a&gt;    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;xhr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;XMLHttpRequest&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-3" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-3"&gt;&lt;/a&gt;    &lt;span class="nx"&gt;xhr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"/sentiment?hashtag="&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="nx"&gt;hashtag&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="s2"&gt;"&amp;amp;lastid="&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="nx"&gt;lastid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-4" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-4"&gt;&lt;/a&gt;    &lt;span class="nx"&gt;xhr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-5" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-5"&gt;&lt;/a&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;xhr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;readyState&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mf"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;xhr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mf"&gt;200&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-6" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-6"&gt;&lt;/a&gt;            &lt;span class="nx"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;xhr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;responseText&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-7" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-7"&gt;&lt;/a&gt;            &lt;span class="nx"&gt;sessionStorage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;setItem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"lastid"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"lastid"&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-8" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-8"&gt;&lt;/a&gt;            &lt;span class="nx"&gt;lastid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"lastid"&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-9" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-9"&gt;&lt;/a&gt;            &lt;span class="nx"&gt;labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"labels"&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-10" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-10"&gt;&lt;/a&gt;            &lt;span class="c1"&gt;// Did we get new data points?&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-11" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-11"&gt;&lt;/a&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"chartdata"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-12" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-12"&gt;&lt;/a&gt;                &lt;span class="c1"&gt;// yes&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-13" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-13"&gt;&lt;/a&gt;                &lt;span class="nx"&gt;newDataCol&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"chartdata"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-14" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-14"&gt;&lt;/a&gt;                &lt;span class="nx"&gt;curidx&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"chartdata"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-15" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-15"&gt;&lt;/a&gt;            &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-16" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-16"&gt;&lt;/a&gt;                &lt;span class="nx"&gt;newDataCol&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-17" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-17"&gt;&lt;/a&gt;            &lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-18" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-18"&gt;&lt;/a&gt;        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-19" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-19"&gt;&lt;/a&gt;    &lt;span class="p"&gt;};&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-20" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-20"&gt;&lt;/a&gt;    &lt;span class="nx"&gt;xhr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onerror&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-21" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-21"&gt;&lt;/a&gt;        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;xhr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;statusText&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-22" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-22"&gt;&lt;/a&gt;    &lt;span class="p"&gt;};&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-23" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-23"&gt;&lt;/a&gt;    &lt;span class="nx"&gt;xhr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;a id="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-24" name="rest_code_c8cc42dea1bf4af9af4d8a2b68debba4-24"&gt;&lt;/a&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;Finally, we need to tell the window to call our &lt;code class="docutils literal"&gt;updateChart()&lt;/code&gt; function
every 30 seconds, and define that function:&lt;/p&gt;
&lt;pre class="code javascript"&gt;&lt;a id="rest_code_3ea794dea366448a8eb960bfc36143d6-1" name="rest_code_3ea794dea366448a8eb960bfc36143d6-1"&gt;&lt;/a&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;updateChart&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;a id="rest_code_3ea794dea366448a8eb960bfc36143d6-2" name="rest_code_3ea794dea366448a8eb960bfc36143d6-2"&gt;&lt;/a&gt;    &lt;span class="nx"&gt;getNewData&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;a id="rest_code_3ea794dea366448a8eb960bfc36143d6-3" name="rest_code_3ea794dea366448a8eb960bfc36143d6-3"&gt;&lt;/a&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;newDataCol&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;a id="rest_code_3ea794dea366448a8eb960bfc36143d6-4" name="rest_code_3ea794dea366448a8eb960bfc36143d6-4"&gt;&lt;/a&gt;        &lt;span class="nx"&gt;chart&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;flow&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
&lt;a id="rest_code_3ea794dea366448a8eb960bfc36143d6-5" name="rest_code_3ea794dea366448a8eb960bfc36143d6-5"&gt;&lt;/a&gt;            &lt;span class="nx"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;prevDataCol&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;a id="rest_code_3ea794dea366448a8eb960bfc36143d6-6" name="rest_code_3ea794dea366448a8eb960bfc36143d6-6"&gt;&lt;/a&gt;            &lt;span class="nx"&gt;done&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;a id="rest_code_3ea794dea366448a8eb960bfc36143d6-7" name="rest_code_3ea794dea366448a8eb960bfc36143d6-7"&gt;&lt;/a&gt;               &lt;span class="nx"&gt;chart&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;flow&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
&lt;a id="rest_code_3ea794dea366448a8eb960bfc36143d6-8" name="rest_code_3ea794dea366448a8eb960bfc36143d6-8"&gt;&lt;/a&gt;                   &lt;span class="nx"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;newDataCol&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;a id="rest_code_3ea794dea366448a8eb960bfc36143d6-9" name="rest_code_3ea794dea366448a8eb960bfc36143d6-9"&gt;&lt;/a&gt;                   &lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;connectNull&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_3ea794dea366448a8eb960bfc36143d6-10" name="rest_code_3ea794dea366448a8eb960bfc36143d6-10"&gt;&lt;/a&gt;               &lt;span class="p"&gt;})&lt;/span&gt;
&lt;a id="rest_code_3ea794dea366448a8eb960bfc36143d6-11" name="rest_code_3ea794dea366448a8eb960bfc36143d6-11"&gt;&lt;/a&gt;            &lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_3ea794dea366448a8eb960bfc36143d6-12" name="rest_code_3ea794dea366448a8eb960bfc36143d6-12"&gt;&lt;/a&gt;        &lt;span class="p"&gt;});&lt;/span&gt;
&lt;a id="rest_code_3ea794dea366448a8eb960bfc36143d6-13" name="rest_code_3ea794dea366448a8eb960bfc36143d6-13"&gt;&lt;/a&gt;        &lt;span class="nx"&gt;prevDataCol&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;newDataCol&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;a id="rest_code_3ea794dea366448a8eb960bfc36143d6-14" name="rest_code_3ea794dea366448a8eb960bfc36143d6-14"&gt;&lt;/a&gt;    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;a id="rest_code_3ea794dea366448a8eb960bfc36143d6-15" name="rest_code_3ea794dea366448a8eb960bfc36143d6-15"&gt;&lt;/a&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;a id="rest_code_3ea794dea366448a8eb960bfc36143d6-16" name="rest_code_3ea794dea366448a8eb960bfc36143d6-16"&gt;&lt;/a&gt;
&lt;a id="rest_code_3ea794dea366448a8eb960bfc36143d6-17" name="rest_code_3ea794dea366448a8eb960bfc36143d6-17"&gt;&lt;/a&gt;&lt;span class="cm"&gt;/* Update the chart every 30 seconds */&lt;/span&gt;
&lt;a id="rest_code_3ea794dea366448a8eb960bfc36143d6-18" name="rest_code_3ea794dea366448a8eb960bfc36143d6-18"&gt;&lt;/a&gt;&lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;setInterval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;updateChart&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;30000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;So that you can change the hashtag to watch, I added a small &lt;code class="docutils literal"&gt;&amp;lt;form&amp;gt;&lt;/code&gt;
element which POSTs the new hashtag to the &lt;code class="docutils literal"&gt;/sentiment&lt;/code&gt; method on submit and
then re-renders the template.&lt;/p&gt;
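&lt;p&gt;On the server side, the JSON that &lt;code class="docutils literal"&gt;getNewData()&lt;/code&gt; reads needs three keys: &lt;code class="docutils literal"&gt;lastid&lt;/code&gt;, &lt;code class="docutils literal"&gt;labels&lt;/code&gt; and &lt;code class="docutils literal"&gt;chartdata&lt;/code&gt;. Here is a rough sketch of how such a body could be assembled. It is not the repo's implementation: &lt;code class="docutils literal"&gt;build_payload&lt;/code&gt; and its arguments are hypothetical, and the layout of &lt;code class="docutils literal"&gt;chartdata&lt;/code&gt; (a series name followed by the values, as C3 columns are written) is inferred from the length check in the JavaScript above.&lt;/p&gt;

```python
import json


def build_payload(tweets, classify, lastid=0, series="positive"):
    """Build the JSON body for one poll of the sentiment endpoint.

    ``tweets`` is a list of dicts with "id" and "text" keys, and
    ``classify`` returns the probability that a text is positive.
    Both are hypothetical stand-ins for the real search and classifier.
    """
    # A C3 column starts with the series name, followed by its values,
    # so a column of length one means "no new data points".
    chartdata = [series] + [round(classify(t["text"]), 3) for t in tweets]
    labels = [t["id"] for t in tweets]
    # Keep the previous lastid when the search returned nothing new
    newest = max([lastid] + labels)
    return json.dumps({"lastid": newest, "labels": labels, "chartdata": chartdata})
```

&lt;p&gt;Returning the previous &lt;code class="docutils literal"&gt;lastid&lt;/code&gt; when nothing new arrives means the browser can store whatever comes back without checking it first.&lt;/p&gt;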
&lt;p&gt;&lt;br&gt;&lt;/p&gt;
&lt;img alt="/images/2019/sentiment1.png" src="https://www.jmcpdotcom.com/blog/images/2019/sentiment1.png"&gt;
&lt;p&gt;&lt;br&gt;
&lt;br&gt;
&lt;br&gt;&lt;/p&gt;
&lt;p&gt;What I'm particularly happy with is that the JavaScript took me only a few
hours last Saturday morning and was pretty straightforward to write.&lt;/p&gt;
&lt;p&gt;You can find the code for this project in my &lt;a class="reference external" href="https://github.com/jmcp"&gt;GitHub&lt;/a&gt; repo &lt;a class="reference external" href="https://github.com/jmcp/au-pol-sentiment"&gt;au-pol-sentiment&lt;/a&gt;.&lt;/p&gt;
&lt;!-- put references after this point --&gt;
&lt;!--  --&gt;</description><category>C3.js</category><category>ETL</category><category>flask</category><category>JSON</category><category>microservices</category><category>nltk</category><category>Python</category><category>sentiment analysis</category><category>software engineering</category><category>training</category><category>twitter</category><category>Twython</category><category>upskilling</category><guid>https://www.jmcpdotcom.com/blog/posts/2019-10-04-microservices-part-2/</guid><pubDate>Thu, 03 Oct 2019 16:00:00 GMT</pubDate></item><item><title>Microservices (part 1)</title><link>https://www.jmcpdotcom.com/blog/posts/2019-09-27-microservices-part-1/</link><dc:creator>jmcp</dc:creator><description>&lt;p&gt;Since I departed from my comfortable niche in &lt;a class="reference external" href="http://www.oracle.com/technetwork/server-storage/solaris11/downloads/index.html"&gt;Solaris&lt;/a&gt; engineering earlier
this year, I've spent a considerable amount of time and energy in re-training
and upskilling to assist my employment prospects. Apart from acquainting
myself with a lot of terminology, I've written code. A lot of code. Most of it, as
it turns out, has been related to &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Microservices"&gt;microservices&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This post is about a &lt;a class="reference external" href="https://github.com/jmcp/find-my-electorate/blob/master/__init__.py"&gt;microservice&lt;/a&gt; I wrote to assist with accessibility in a
specific part of the Australian electoral process (finding out which
electorate you live in) and some supporting digressions.&lt;/p&gt;
&lt;p&gt;You can find all the code for this &lt;a class="reference external" href="https://github.com/jmcp/find-my-electorate/blob/master/__init__.py"&gt;microservice&lt;/a&gt; and its associated data
preparation in my &lt;a class="reference external" href="https://github.com/jmcp"&gt;GitHub&lt;/a&gt; repos &lt;a class="reference external" href="https://github.com/jmcp/grabbag"&gt;grabbag&lt;/a&gt; and &lt;a class="reference external" href="https://github.com/jmcp/find-my-electorate"&gt;find-my-electorate&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;On 18 May 2019, Australia had a &lt;a class="reference external" href="https://en.wikipedia.org/wiki/2019_Australian_federal_election"&gt;federal election&lt;/a&gt;, and in the lead up to
that event I became very interested in political polling. While I have a few
ideas on the subject which are on my back burner, mind-mapping the various
components of political polling got me wondering: how do the various state,
territory and federal electoral commissions map a voter's address to an
electorate?&lt;/p&gt;
&lt;p&gt;My first port of call was the &lt;a class="reference external" href="https://www.aec.gov.au"&gt;Australian Electoral Commission&lt;/a&gt; and their
&lt;a class="reference external" href="https://electorate.aec.gov.au"&gt;Find my electorate&lt;/a&gt; site. This is nicely laid out and lets you find out
which electorate you are in - by postcode. This is all well and good if you're
in a densely populated area, like the &lt;a class="reference external" href="https://electorate.aec.gov.au/LocalitySearchResults.aspx?filter=4000&amp;amp;filterby=Postcode"&gt;electorate of Brisbane&lt;/a&gt; - three
suburbs. If, however, you choose somewhere else, like &lt;strong&gt;2620&lt;/strong&gt; which covers a
lot of Canberra and surrounding districts, you wind up with several
&lt;a class="reference external" href="https://electorate.aec.gov.au/LocalitySearchResults.aspx?filter=2620&amp;amp;filterby=Postcode"&gt;electorates covering 2620&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;a class="reference external" href="https://www.aec.gov.au"&gt;AEC&lt;/a&gt;'s website is written in &lt;a class="reference external" href="https://dotnet.microsoft.com/apps/aspnet"&gt;asp.net&lt;/a&gt;, which is up to the task, but
when you have more than one page of results the authors of the page make use
of some (to my mind) squirrelly features and callbacks which make scraping the
site difficult. As best I can determine, the &lt;a class="reference external" href="https://www.aec.gov.au"&gt;AEC&lt;/a&gt; doesn't provide an API to
access this information, so Another Method was required.&lt;/p&gt;
&lt;p&gt;At this point, I turned to the standard libraries for this sort of thing in the
&lt;a class="reference external" href="https://www.python.org"&gt;Python&lt;/a&gt; world: &lt;a class="reference external" href="https://www.crummy.com/software/BeautifulSoup"&gt;Beautiful Soup&lt;/a&gt; and &lt;a class="reference external" href="https://requests.kennethreitz.org/en/master/"&gt;requests&lt;/a&gt;. I started by setting up a
quick venv to keep the dependencies contained:&lt;/p&gt;
&lt;pre class="code shell"&gt;&lt;a id="rest_code_7c27c86dc9fc4fbf8864039fc5ba94f8-1" name="rest_code_7c27c86dc9fc4fbf8864039fc5ba94f8-1"&gt;&lt;/a&gt;$ python3.7 -m venv scraping-venv
&lt;a id="rest_code_7c27c86dc9fc4fbf8864039fc5ba94f8-2" name="rest_code_7c27c86dc9fc4fbf8864039fc5ba94f8-2"&gt;&lt;/a&gt;$ . scraping-venv/bin/activate
&lt;a id="rest_code_7c27c86dc9fc4fbf8864039fc5ba94f8-3" name="rest_code_7c27c86dc9fc4fbf8864039fc5ba94f8-3"&gt;&lt;/a&gt;&lt;span class="o"&gt;(&lt;/span&gt;scraping-venv&lt;span class="o"&gt;)&lt;/span&gt; $ pip install requests bs4
&lt;/pre&gt;&lt;p&gt;Now since we know the url to GET, we can get the first page of responses very
easily:&lt;/p&gt;
&lt;pre class="code python"&gt;&lt;a id="rest_code_1ea4de6feae84ab498370ae8afaa418b-1" name="rest_code_1ea4de6feae84ab498370ae8afaa418b-1"&gt;&lt;/a&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;requests&lt;/span&gt;
&lt;a id="rest_code_1ea4de6feae84ab498370ae8afaa418b-2" name="rest_code_1ea4de6feae84ab498370ae8afaa418b-2"&gt;&lt;/a&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;
&lt;a id="rest_code_1ea4de6feae84ab498370ae8afaa418b-3" name="rest_code_1ea4de6feae84ab498370ae8afaa418b-3"&gt;&lt;/a&gt;
&lt;a id="rest_code_1ea4de6feae84ab498370ae8afaa418b-4" name="rest_code_1ea4de6feae84ab498370ae8afaa418b-4"&gt;&lt;/a&gt;&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="s2"&gt;"https://electorate.aec.gov.au/LocalitySearchResults.aspx?"&lt;/span&gt;
&lt;a id="rest_code_1ea4de6feae84ab498370ae8afaa418b-5" name="rest_code_1ea4de6feae84ab498370ae8afaa418b-5"&gt;&lt;/a&gt;&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s2"&gt;"filter=&lt;/span&gt;&lt;span class="si"&gt;{0}&lt;/span&gt;&lt;span class="s2"&gt;&amp;amp;filterby=Postcode"&lt;/span&gt;
&lt;a id="rest_code_1ea4de6feae84ab498370ae8afaa418b-6" name="rest_code_1ea4de6feae84ab498370ae8afaa418b-6"&gt;&lt;/a&gt;
&lt;a id="rest_code_1ea4de6feae84ab498370ae8afaa418b-7" name="rest_code_1ea4de6feae84ab498370ae8afaa418b-7"&gt;&lt;/a&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2620&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;a id="rest_code_1ea4de6feae84ab498370ae8afaa418b-8" name="rest_code_1ea4de6feae84ab498370ae8afaa418b-8"&gt;&lt;/a&gt;&lt;span class="n"&gt;resh&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"html.parser"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;&lt;a class="reference external" href="https://www.crummy.com/software/BeautifulSoup"&gt;Beautiful Soup&lt;/a&gt; parses the response text and gives us a tree-like structure
to work with. Making use of the &lt;a class="reference external" href="https://developers.google.com/web/tools/chrome-devtools/"&gt;Chrome devtools&lt;/a&gt; (or the &lt;a class="reference external" href="https://developer.mozilla.org/en-US/docs/Tools"&gt;Firefox devtools&lt;/a&gt;)
I could see that I needed to find a &lt;code class="docutils literal"&gt;&amp;lt;table&amp;gt;&lt;/code&gt; with an &lt;code class="docutils literal"&gt;id&lt;/code&gt; attribute of
&lt;code class="docutils literal"&gt;ContentPlaceHolderBody_gridViewLocalities&lt;/code&gt; - what a mouthful! - and then
process all the table rows (&lt;code class="docutils literal"&gt;&amp;lt;tr&amp;gt;&lt;/code&gt;) within that table:&lt;/p&gt;
&lt;pre class="code python"&gt;&lt;a id="rest_code_5f680d5929c84f7e91d9becd2401523d-1" name="rest_code_5f680d5929c84f7e91d9becd2401523d-1"&gt;&lt;/a&gt;&lt;span class="n"&gt;tblAttr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ContentPlaceHolderBody_gridViewLocalities"&lt;/span&gt;
&lt;a id="rest_code_5f680d5929c84f7e91d9becd2401523d-2" name="rest_code_5f680d5929c84f7e91d9becd2401523d-2"&gt;&lt;/a&gt;&lt;span class="n"&gt;restbl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resh&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;find_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"table"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tblAttr&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;a id="rest_code_5f680d5929c84f7e91d9becd2401523d-3" name="rest_code_5f680d5929c84f7e91d9becd2401523d-3"&gt;&lt;/a&gt;&lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;restbl&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;find_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"tr"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;Using a &lt;code class="docutils literal"&gt;for&lt;/code&gt; loop we can construct a &lt;code class="docutils literal"&gt;dict&lt;/code&gt; of the data that we actually
need. Simple!&lt;/p&gt;
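&lt;p&gt;As a rough sketch of that loop (not the code from the linked script), the
function below collects each data row into a &lt;code class="docutils literal"&gt;dict&lt;/code&gt;. The name
&lt;code class="docutils literal"&gt;rows_to_records&lt;/code&gt; and the column order in &lt;code class="docutils literal"&gt;FIELDS&lt;/code&gt; are assumptions
made for illustration, so check them against the real table before relying on them:&lt;/p&gt;

```python
# Column order is an assumption about the AEC results table, not taken
# from the original script: adjust FIELDS to match the real page.
FIELDS = ("state", "locality", "postcode", "electorate")


def rows_to_records(rows):
    """Turn table rows into a list of dicts keyed by FIELDS.

    Each row is expected to behave like a Beautiful Soup Tag, that is,
    to offer find_all("td") returning cells with a get_text() method.
    """
    records = []
    for row in rows:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # Header rows use th cells, so they yield no td cells; skip them
        # along with anything else too short to be a data row.
        if len(FIELDS) > len(cells):
            continue
        # zip() stops at the shorter input, so extra columns are ignored.
        records.append(dict(zip(FIELDS, cells)))
    return records
```

&lt;p&gt;Calling &lt;code class="docutils literal"&gt;rows_to_records(rows)&lt;/code&gt; with the &lt;code class="docutils literal"&gt;rows&lt;/code&gt; found above would
give one &lt;code class="docutils literal"&gt;dict&lt;/code&gt; per locality. The paging row would probably need separate handling.&lt;/p&gt;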
&lt;p&gt;What do we do, though, when we want to get the &lt;em&gt;second&lt;/em&gt; or later pages of
results? This is where the squirrelly features and callbacks come in. The page
makes use of an &lt;code class="docutils literal"&gt;__EVENTARGUMENT&lt;/code&gt; element which is &lt;code class="docutils literal"&gt;POST&lt;/code&gt;ed as payload
back to the same URL. To find the value to send, we look for a row with
the class &lt;code class="docutils literal"&gt;pagingLink&lt;/code&gt;, then check the contents of each table data (&lt;code class="docutils literal"&gt;&amp;lt;td&amp;gt;&lt;/code&gt;) element
in it against this regex:&lt;/p&gt;
&lt;pre class="code python"&gt;&lt;a id="rest_code_97cee8dbcdeb4fb9988a02b3f7371790-1" name="rest_code_97cee8dbcdeb4fb9988a02b3f7371790-1"&gt;&lt;/a&gt;&lt;span class="s2"&gt;".*__doPostBack.'(.*?gridViewLocalities)','(Page.[0-9]+)'.*"&lt;/span&gt;
&lt;/pre&gt;&lt;img alt="/images/2019/paginglinktable.png" src="https://www.jmcpdotcom.com/blog/images/2019/paginglinktable.png"&gt;
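&lt;p&gt;A minimal sketch of how that regex might be applied: the two capture groups
give the postback target and the page argument. The helper name
&lt;code class="docutils literal"&gt;find_paging_args&lt;/code&gt; and the sample link text are made up for illustration and
are not copied from the AEC page or from the original script:&lt;/p&gt;

```python
import re

# The regex from the post: group 1 is the postback target (the grid view
# control), group 2 is the page argument such as "Page$2".
PAGING_RE = re.compile(
    r".*__doPostBack.'(.*?gridViewLocalities)','(Page.[0-9]+)'.*"
)


def find_paging_args(cell_html):
    """Return (target, page) if the cell holds a paging link, else None."""
    match = PAGING_RE.match(cell_html)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Hypothetical link text, shaped like an ASP.NET paging link.
sample = "javascript:__doPostBack('ctl00$ContentPlaceHolderBody$gridViewLocalities','Page$2')"
print(find_paging_args(sample))
```

&lt;p&gt;The second element of the returned tuple is the sort of value that would go
into &lt;code class="docutils literal"&gt;__EVENTARGUMENT&lt;/code&gt; for the follow-up request.&lt;/p&gt;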
&lt;p&gt;And after that we can recursively call our query with the extra payload data
in the argument list:&lt;/p&gt;
&lt;pre class="code python"&gt;&lt;a id="rest_code_8781b43a6fa84146b15eb09a01025847-1" name="rest_code_8781b43a6fa84146b15eb09a01025847-1"&gt;&lt;/a&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;queryAEC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;postcode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;extrapage&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;a id="rest_code_8781b43a6fa84146b15eb09a01025847-2" name="rest_code_8781b43a6fa84146b15eb09a01025847-2"&gt;&lt;/a&gt;    &lt;span class="sd"&gt;"""&lt;/span&gt;
&lt;a id="rest_code_8781b43a6fa84146b15eb09a01025847-3" name="rest_code_8781b43a6fa84146b15eb09a01025847-3"&gt;&lt;/a&gt;&lt;span class="sd"&gt;    Queries the AEC url and returns soup. If extrapage is empty&lt;/span&gt;
&lt;a id="rest_code_8781b43a6fa84146b15eb09a01025847-4" name="rest_code_8781b43a6fa84146b15eb09a01025847-4"&gt;&lt;/a&gt;&lt;span class="sd"&gt;    then we pass the soup to findFollowups before returning.&lt;/span&gt;
&lt;a id="rest_code_8781b43a6fa84146b15eb09a01025847-5" name="rest_code_8781b43a6fa84146b15eb09a01025847-5"&gt;&lt;/a&gt;&lt;span class="sd"&gt;    """&lt;/span&gt;
&lt;a id="rest_code_8781b43a6fa84146b15eb09a01025847-6" name="rest_code_8781b43a6fa84146b15eb09a01025847-6"&gt;&lt;/a&gt;    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"https://electorate.aec.gov.au/LocalitySearchResults.aspx?"&lt;/span&gt;
&lt;a id="rest_code_8781b43a6fa84146b15eb09a01025847-7" name="rest_code_8781b43a6fa84146b15eb09a01025847-7"&gt;&lt;/a&gt;    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s2"&gt;"filter=&lt;/span&gt;&lt;span class="si"&gt;{0}&lt;/span&gt;&lt;span class="s2"&gt;&amp;amp;filterby=Postcode"&lt;/span&gt;
&lt;a id="rest_code_8781b43a6fa84146b15eb09a01025847-8" name="rest_code_8781b43a6fa84146b15eb09a01025847-8"&gt;&lt;/a&gt;
&lt;a id="rest_code_8781b43a6fa84146b15eb09a01025847-9" name="rest_code_8781b43a6fa84146b15eb09a01025847-9"&gt;&lt;/a&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;extrapage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_8781b43a6fa84146b15eb09a01025847-10" name="rest_code_8781b43a6fa84146b15eb09a01025847-10"&gt;&lt;/a&gt;        &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;postcode&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;a id="rest_code_8781b43a6fa84146b15eb09a01025847-11" name="rest_code_8781b43a6fa84146b15eb09a01025847-11"&gt;&lt;/a&gt;    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_8781b43a6fa84146b15eb09a01025847-12" name="rest_code_8781b43a6fa84146b15eb09a01025847-12"&gt;&lt;/a&gt;        &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"__EVENTARGUMENT"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;extrapage&lt;/span&gt;
&lt;a id="rest_code_8781b43a6fa84146b15eb09a01025847-13" name="rest_code_8781b43a6fa84146b15eb09a01025847-13"&gt;&lt;/a&gt;        &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;postcode&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;I now had a &lt;a class="reference external" href="https://github.com/jmcp/grabbag/blob/master/postcode.py"&gt;script to run&lt;/a&gt;  which extracted this info and pretty-printed it
(as well as the same info in JSON):&lt;/p&gt;
&lt;pre class="code shell"&gt;&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-1" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-1"&gt;&lt;/a&gt;$ ./postcode.py &lt;span class="m"&gt;2620&lt;/span&gt;
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-2" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-2"&gt;&lt;/a&gt;State    Postcode   Locality                         Electorate
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-3" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-3"&gt;&lt;/a&gt;ACT      &lt;span class="m"&gt;2620&lt;/span&gt;       BEARD                            Canberra
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-4" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-4"&gt;&lt;/a&gt;ACT      &lt;span class="m"&gt;2620&lt;/span&gt;       BOOTH DISTRICT                   Bean
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-5" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-5"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       BURRA                            Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-6" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-6"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       CARWOOLA                         Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-7" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-7"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       CLEAR RANGE                      Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-8" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-8"&gt;&lt;/a&gt;ACT      &lt;span class="m"&gt;2620&lt;/span&gt;       CORIN DAM                        Bean
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-9" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-9"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       CRESTWOOD                        Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-10" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-10"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       ENVIRONA                         Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-11" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-11"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       FERNLEIGH PARK                   Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-12" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-12"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       GOOGONG                          Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-13" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-13"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       GREENLEIGH                       Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-14" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-14"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       GUNDAROO                         Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-15" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-15"&gt;&lt;/a&gt;ACT      &lt;span class="m"&gt;2620&lt;/span&gt;       HUME                             Bean
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-16" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-16"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       KARABAR                          Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-17" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-17"&gt;&lt;/a&gt;ACT      &lt;span class="m"&gt;2620&lt;/span&gt;       KOWEN DISTRICT                   Canberra
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-18" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-18"&gt;&lt;/a&gt;ACT      &lt;span class="m"&gt;2620&lt;/span&gt;       KOWEN FOREST                     Canberra
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-19" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-19"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       MICHELAGO                        Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-20" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-20"&gt;&lt;/a&gt;ACT      &lt;span class="m"&gt;2620&lt;/span&gt;       OAKS ESTATE                      Canberra
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-21" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-21"&gt;&lt;/a&gt;ACT      &lt;span class="m"&gt;2620&lt;/span&gt;       PADDYS RIVER DISTRICT            Bean
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-22" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-22"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       QUEANBEYAN                       Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-23" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-23"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       YARROW                           Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-24" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-24"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       QUEANBEYAN EAST                  Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-25" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-25"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       QUEANBEYAN WEST                  Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-26" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-26"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       RADCLIFFE                        Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-27" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-27"&gt;&lt;/a&gt;ACT      &lt;span class="m"&gt;2620&lt;/span&gt;       RENDEZVOUS CREEK DISTRICT        Bean
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-28" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-28"&gt;&lt;/a&gt;ACT      &lt;span class="m"&gt;2620&lt;/span&gt;       ROYALLA                          Bean
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-29" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-29"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       ROYALLA                          Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-30" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-30"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       SUTTON                           Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-31" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-31"&gt;&lt;/a&gt;ACT      &lt;span class="m"&gt;2620&lt;/span&gt;       TENNENT DISTRICT                 Bean
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-32" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-32"&gt;&lt;/a&gt;ACT      &lt;span class="m"&gt;2620&lt;/span&gt;       THARWA                           Bean
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-33" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-33"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       THARWA                           Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-34" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-34"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       THARWA                           Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-35" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-35"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       THE ANGLE                        Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-36" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-36"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       THE RIDGEWAY                     Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-37" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-37"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       TINDERRY                         Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-38" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-38"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       TRALEE                           Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-39" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-39"&gt;&lt;/a&gt;ACT      &lt;span class="m"&gt;2620&lt;/span&gt;       TUGGERANONG DISTRICT             Bean
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-40" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-40"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       URILA                            Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-41" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-41"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       WAMBOIN                          Eden-Monaro
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-42" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-42"&gt;&lt;/a&gt;ACT      &lt;span class="m"&gt;2620&lt;/span&gt;       WILLIAMSDALE                     Bean
&lt;a id="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-43" name="rest_code_18e43e57a9b146a6a85b071a1d80a2f8-43"&gt;&lt;/a&gt;NSW      &lt;span class="m"&gt;2620&lt;/span&gt;       WILLIAMSDALE                     Eden-Monaro
&lt;/pre&gt;&lt;p&gt;That really is quite a few suburbs.&lt;/p&gt;
&lt;p&gt;So now that we've got a way to extract that information, how do we make it
available and useful for everybody? With a &lt;a class="reference external" href="https://github.com/jmcp/find-my-electorate/blob/master/__init__.py"&gt;microservice&lt;/a&gt;! I hear you cry.&lt;/p&gt;
&lt;p&gt;The very first &lt;a class="reference external" href="https://github.com/jmcp/find-my-electorate/blob/master/__init__.py"&gt;microservice&lt;/a&gt; I wrote (in 2011-12, the subject of a future
post) used &lt;a class="reference external" href="https://cherrypy.org"&gt;CherryPy&lt;/a&gt;, because we'd embedded it within &lt;a class="reference external" href="https://github.com/oracle/solaris-ips"&gt;Solaris IPS&lt;/a&gt; (image
packaging system) and didn't need any further corporate approvals. The path of
least resistance. This time, however, I was unconstrained regarding approvals,
so I had to choose between &lt;a class="reference external" href="https://www.djangoproject.com"&gt;Django&lt;/a&gt; and &lt;a class="reference external" href="https://palletsprojects.com/p/flask/"&gt;Flask&lt;/a&gt;. For no particular reason, I
chose &lt;a class="reference external" href="https://palletsprojects.com/p/flask/"&gt;Flask&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It was pretty easy to cons up the requisite templates, and write the
&lt;code class="docutils literal"&gt;/results&lt;/code&gt; method. It was at this point that my &lt;em&gt;extend the fix&lt;/em&gt; habit
(learnt via the &lt;a class="reference external" href="https://www.kepner-tregoe.com"&gt;Kepner-Tregoe&lt;/a&gt;  &lt;a class="reference external" href="https://www.kepner-tregoe.com/training-workshops/our-training-workshops/analytic-trouble-shooting/"&gt;Analytical Troubleshooting&lt;/a&gt; training many
years ago) kicked in, and I started exploring the &lt;a class="reference external" href="https://www.ecq.qld.gov.au"&gt;Electoral Commission of
Queensland&lt;/a&gt; website for the same sort of information. To my surprise, it
offered nothing like the relatively straightforward interface of the &lt;a class="reference external" href="https://www.aec.gov.au"&gt;AEC&lt;/a&gt;, and the
closest analogue was an interactive map.&lt;/p&gt;
&lt;p&gt;After a brief phone conversation with &lt;a class="reference external" href="https://www.ecq.qld.gov.au"&gt;ECQ&lt;/a&gt; and more digging, I discovered
that the 2017 boundaries were available from &lt;a class="reference external" href="http://qldspatial.information.qld.gov.au/catalogue/custom/detail.page?fid=%7B079E7EF8-30C5-4C1D-9ABF-3D196713694F%7D"&gt;QLD Spatial&lt;/a&gt; in shapefile,
MapInfo and Google Maps KML formats. This was very useful, because KML can be
mucked about with directly using &lt;a class="reference external" href="https://www.crummy.com/software/BeautifulSoup"&gt;Beautiful Soup&lt;/a&gt;. After not too much effort I
had the latitude+longitude pairs for the boundaries extracted and stored as
&lt;code class="docutils literal"&gt;JSON&lt;/code&gt;. My phone conversation with &lt;a class="reference external" href="https://www.ecq.qld.gov.au"&gt;ECQ&lt;/a&gt; also took me down the path
of wanting to translate a street address into &lt;a class="reference external" href="https://en.wikipedia.org/wiki/GeoJSON"&gt;GeoJSON&lt;/a&gt; - and &lt;em&gt;that&lt;/em&gt; took me
to the &lt;a class="reference external" href="https://developers.google.com/maps/documentation/maps-static/dev-guide#Locations"&gt;Google Maps API&lt;/a&gt;. I did investigate &lt;a class="reference external" href="https://wiki.openstreetmap.org/wiki/API_v0.6"&gt;OpenStreetMap&lt;/a&gt;'s API, but
testing a few specific locations (addresses where we've lived over the
years) gave me significantly different latitude+longitude results. I bit the
bullet and got a &lt;a class="reference external" href="https://developers.google.com/maps/documentation/maps-static/get-api-key"&gt;Google Maps API key&lt;/a&gt;.&lt;/p&gt;
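&lt;p&gt;To illustrate the boundary extraction step, here is a small sketch that turns
the text of one KML coordinates element into latitude+longitude pairs and dumps
them as JSON. The function name, the sample coordinates and the
&lt;code class="docutils literal"&gt;Example District&lt;/code&gt; key are all invented for this example; the only thing it
relies on is that KML lists each vertex as longitude, latitude and an optional
altitude, with vertices separated by whitespace:&lt;/p&gt;

```python
import json


def parse_kml_coordinates(text):
    """Convert the text of a KML coordinates element to (lat, lon) pairs.

    KML stores each vertex as "lon,lat" or "lon,lat,alt", with vertices
    separated by whitespace. Note that longitude comes first.
    """
    pairs = []
    for vertex in text.split():
        parts = vertex.split(",")
        lon, lat = float(parts[0]), float(parts[1])
        pairs.append((lat, lon))
    return pairs


# Made-up coordinates, for illustration only.
sample = "153.02,-27.47,0 153.03,-27.47,0 153.03,-27.48,0 153.02,-27.47,0"
boundary = parse_kml_coordinates(sample)
print(json.dumps({"Example District": boundary}))
```

&lt;p&gt;Swapping the order to latitude first is purely a choice made in this sketch; any
consistent order works as long as later code agrees with it.&lt;/p&gt;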
&lt;p&gt;The next step was to research how to find out if a specific point is located
within a polygon, and to my delight the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Even%E2%80%93odd_rule"&gt;Even-odd rule&lt;/a&gt; has &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Even%E2%80%93odd_rule#Implementation"&gt;example code&lt;/a&gt;
in &lt;a class="reference external" href="https://www.python.org"&gt;Python&lt;/a&gt;, which needed only a small change to work with my data
arrangement.&lt;/p&gt;
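&lt;p&gt;For reference, a version of the even-odd rule in that spirit might look like
the sketch below. It is not the adapted code from my app: the function name and
the assumption that a boundary is a list of &lt;code class="docutils literal"&gt;(lat, lon)&lt;/code&gt; tuples are
illustrative, and it treats latitude and longitude as flat plane coordinates,
which is an approximation:&lt;/p&gt;

```python
def point_in_polygon(lat, lon, boundary):
    """Even-odd rule: True if (lat, lon) lies inside the polygon.

    boundary is a list of (lat, lon) vertices. A ray is cast from the
    point and each edge crossing toggles the inside/outside state.
    """
    inside = False
    j = len(boundary) - 1
    for i in range(len(boundary)):
        lat_i, lon_i = boundary[i]
        lat_j, lon_j = boundary[j]
        # Only edges that straddle the point's latitude can be crossed.
        if (lat_i > lat) != (lat_j > lat):
            # Longitude at which this edge meets the point's latitude.
            crossing = lon_i + (lat - lat_i) * (lon_j - lon_i) / (lat_j - lat_i)
            if crossing > lon:
                inside = not inside
        j = i
    return inside
```

&lt;p&gt;Points that sit exactly on a boundary edge may land on either side, which is
usually acceptable for a lookup like this one.&lt;/p&gt;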
&lt;p&gt;With that knowledge in hand, it was time to turn the handle on the
&lt;a class="reference external" href="https://developers.google.com/maps/documentation/maps-static/dev-guide#Locations"&gt;Google Maps API&lt;/a&gt;:&lt;/p&gt;
&lt;pre class="code python"&gt;&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-1" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-1"&gt;&lt;/a&gt;&lt;span class="n"&gt;keyarg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&amp;amp;key=&lt;/span&gt;&lt;span class="si"&gt;{gmapkey}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-2" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-2"&gt;&lt;/a&gt;&lt;span class="n"&gt;queryurl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"https://maps.googleapis.com/maps/api/geocode/json?address="&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-3" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-3"&gt;&lt;/a&gt;&lt;span class="n"&gt;queryurl&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{addr}&lt;/span&gt;&lt;span class="s2"&gt; Australia"&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-4" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-4"&gt;&lt;/a&gt;&lt;span class="n"&gt;queryurl&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;keyarg&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-5" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-5"&gt;&lt;/a&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-6" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-6"&gt;&lt;/a&gt;&lt;span class="o"&gt;...&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-7" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-7"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Helper functions&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-8" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-8"&gt;&lt;/a&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_geoJson&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-9" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-9"&gt;&lt;/a&gt;    &lt;span class="sd"&gt;"""&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-10" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-10"&gt;&lt;/a&gt;&lt;span class="sd"&gt;    Queries the Google Maps API for specified address, returns&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-11" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-11"&gt;&lt;/a&gt;&lt;span class="sd"&gt;    a dict of the formatted address, the state/territory name, and&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-12" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-12"&gt;&lt;/a&gt;&lt;span class="sd"&gt;    a float-ified version of the latitude and longitude.&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-13" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-13"&gt;&lt;/a&gt;&lt;span class="sd"&gt;    """&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-14" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-14"&gt;&lt;/a&gt;    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;queryurl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gmapkey&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;gmapkey&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-15" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-15"&gt;&lt;/a&gt;    &lt;span class="n"&gt;dictr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-16" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-16"&gt;&lt;/a&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="s2"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"ZERO_RESULTS"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-17" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-17"&gt;&lt;/a&gt;        &lt;span class="n"&gt;dictr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"res"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-18" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-18"&gt;&lt;/a&gt;    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-19" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-19"&gt;&lt;/a&gt;        &lt;span class="n"&gt;rresj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="s2"&gt;"results"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-20" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-20"&gt;&lt;/a&gt;        &lt;span class="n"&gt;dictr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"formatted_address"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rresj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"formatted_address"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-21" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-21"&gt;&lt;/a&gt;        &lt;span class="n"&gt;dictr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"latlong"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rresj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"geometry"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s2"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-22" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-22"&gt;&lt;/a&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rresj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"address_components"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-23" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-23"&gt;&lt;/a&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"types"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"administrative_area_level_1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-24" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-24"&gt;&lt;/a&gt;                &lt;span class="n"&gt;dictr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"state"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"short_name"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;a id="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-25" name="rest_code_99763cf4cc7f47659f5ebedac04fcd0c-25"&gt;&lt;/a&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dictr&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;When you provide an address, we send it to Google, which does a best-effort
match on the text and then returns &lt;a class="reference external" href="https://en.wikipedia.org/wiki/GeoJSON"&gt;GeoJSON&lt;/a&gt; for that match. For example,
if you enter&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;42 Wallaby Way, Sydney&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;the best-effort match will give you&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;42 Rock Wallaby Way, Blaxland NSW 2774, Australia&lt;/p&gt;
&lt;/blockquote&gt;
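&lt;p&gt;Free-text addresses like these contain spaces and commas, so it can be worth
encoding the query string explicitly instead of concatenating it. The sketch
below is an alternative to the string formatting shown earlier, not what the
app does; &lt;code class="docutils literal"&gt;build_geocode_url&lt;/code&gt; is a hypothetical helper and
&lt;code class="docutils literal"&gt;EXAMPLE_KEY&lt;/code&gt; is a placeholder:&lt;/p&gt;

```python
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(addr, gmapkey):
    """Build a geocoding query URL with percent-encoded parameters."""
    params = {"address": "{0} Australia".format(addr), "key": gmapkey}
    return GEOCODE_URL + "?" + urlencode(params)


# "EXAMPLE_KEY" is a placeholder, not a working API key.
print(build_geocode_url("42 Wallaby Way, Sydney", "EXAMPLE_KEY"))
```

&lt;p&gt;Passing the same &lt;code class="docutils literal"&gt;dict&lt;/code&gt; to &lt;code class="docutils literal"&gt;requests.get&lt;/code&gt; via its &lt;code class="docutils literal"&gt;params&lt;/code&gt;
argument should have the same effect.&lt;/p&gt;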
&lt;p&gt;I now had a way to translate a street address into a federal electorate, but
with incomplete per-State data my app wasn't finished. I managed to get
Federal, Queensland, New South Wales, Victoria and Tasmania data fairly easily
(see the links below) and South Australia's data came via personal email after
an enquiry through their contact page. I didn't get any response to several
contact attempts with either Western Australia or the Northern Territory, and
the best I could get for the ACT was their electorate to suburb associations.&lt;/p&gt;
&lt;p&gt;I remembered that the &lt;a class="reference external" href="https://www.abs.gov.au"&gt;Australian Bureau of Statistics&lt;/a&gt; has a &lt;strong&gt;standard&lt;/strong&gt;
called &lt;a class="reference external" href="https://www.abs.gov.au/websitedbs/D3310114.nsf/home/Australian+Statistical+Geography+Standard+(ASGS)"&gt;Statistical Geography&lt;/a&gt;, and the smallest unit of that is called a
&lt;a class="reference external" href="https://en.wikipedia.org/wiki/Meshblock#Australia"&gt;Mesh Block&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Mesh Blocks (MBs)&lt;/strong&gt; are the smallest geographical area defined by the
ABS. They are designed as geographic building blocks rather than as areas
for the release of statistics themselves. All statistical areas in the
ASGS, both ABS and Non ABS Structures, are built up from Mesh Blocks. As a
result the design of Mesh Blocks takes into account many factors including
administrative boundaries such as Cadastre, &lt;em&gt;Suburbs&lt;/em&gt; and &lt;em&gt;Localities&lt;/em&gt; and
LGAs as well as land uses and dwelling distribution.
(emphasis added)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Mesh Blocks are then aggregated into SA1s:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Statistical Areas Level 1 (SA1s)&lt;/strong&gt; are designed to maximise the spatial
detail available for Census data. Most SA1s have a population of between
200 to 800 persons with an average population of approximately 400
persons. This is to optimise the balance between spatial detail and the
ability to cross classify Census variables without the resulting counts
becoming too small for use. SA1s aim to separate out areas with different
geographic characteristics within Suburb and Locality boundaries. In rural
areas they often combine related Locality boundaries. &lt;em&gt;SA1s are
aggregations of Mesh Blocks.&lt;/em&gt;
(emphasis added)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;With this knowledge, and a handy map of SA1s to State Electoral Divisions (SEDs) in CSV format&lt;/p&gt;
&lt;pre class="code shell"&gt;&lt;a id="rest_code_ef937a48f2af4eafb9815439b73173b9-1" name="rest_code_ef937a48f2af4eafb9815439b73173b9-1"&gt;&lt;/a&gt;$ head australia-whole/SED_2018_AUST.csv
&lt;a id="rest_code_ef937a48f2af4eafb9815439b73173b9-2" name="rest_code_ef937a48f2af4eafb9815439b73173b9-2"&gt;&lt;/a&gt;SA1_MAINCODE_2016,SED_CODE_2018,SED_NAME_2018,STATE_CODE_2016,STATE_NAME_2016,AREA_ALBERS_SQKM
&lt;a id="rest_code_ef937a48f2af4eafb9815439b73173b9-3" name="rest_code_ef937a48f2af4eafb9815439b73173b9-3"&gt;&lt;/a&gt;&lt;span class="m"&gt;10102100701&lt;/span&gt;,10031,Goulburn,1,New South Wales,362.8727
&lt;a id="rest_code_ef937a48f2af4eafb9815439b73173b9-4" name="rest_code_ef937a48f2af4eafb9815439b73173b9-4"&gt;&lt;/a&gt;&lt;span class="m"&gt;10102100702&lt;/span&gt;,10053,Monaro,1,New South Wales,229.7459
&lt;a id="rest_code_ef937a48f2af4eafb9815439b73173b9-5" name="rest_code_ef937a48f2af4eafb9815439b73173b9-5"&gt;&lt;/a&gt;&lt;span class="m"&gt;10102100703&lt;/span&gt;,10053,Monaro,1,New South Wales,2.3910
&lt;a id="rest_code_ef937a48f2af4eafb9815439b73173b9-6" name="rest_code_ef937a48f2af4eafb9815439b73173b9-6"&gt;&lt;/a&gt;&lt;span class="m"&gt;10102100704&lt;/span&gt;,10053,Monaro,1,New South Wales,1.2816
&lt;a id="rest_code_ef937a48f2af4eafb9815439b73173b9-7" name="rest_code_ef937a48f2af4eafb9815439b73173b9-7"&gt;&lt;/a&gt;&lt;span class="m"&gt;10102100705&lt;/span&gt;,10053,Monaro,1,New South Wales,1.1978
&lt;a id="rest_code_ef937a48f2af4eafb9815439b73173b9-8" name="rest_code_ef937a48f2af4eafb9815439b73173b9-8"&gt;&lt;/a&gt;....
&lt;/pre&gt;&lt;p&gt;I went looking into the SA1 information from the &lt;a class="reference external" href="https://www.abs.gov.au/ausstats/subscriber.nsf/log?openagent&amp;amp;1259030001_ste11aaust_midmif.zip&amp;amp;1259.0.30.001&amp;amp;Data%20Cubes&amp;amp;6E45E3029A27FFEFCA2578CC0012083E&amp;amp;0&amp;amp;July%202011&amp;amp;14.07.2011&amp;amp;Latest"&gt;ABS shapefile&lt;/a&gt; covering the
whole of the country. Transforming the shapefile into kml is done with
&lt;a class="reference external" href="https://gdal.org/programs/ogr2ogr.html"&gt;ogr2ogr&lt;/a&gt; and provides us with an XML schema definition. From the CSV header
line above we can see that we want the &lt;code class="docutils literal"&gt;SA1_MAINCODE_2016&lt;/code&gt; and (for
validation) the &lt;code class="docutils literal"&gt;STATE_NAME_2016&lt;/code&gt; fields. Having made a per-state list of
the SA1s, we go back to the kml and process each member of the document:&lt;/p&gt;
&lt;pre class="code xml"&gt;&lt;a id="rest_code_0abed389bf914a43ac1e37b2e5c9a098-1" name="rest_code_0abed389bf914a43ac1e37b2e5c9a098-1"&gt;&lt;/a&gt;  &lt;span class="nt"&gt;&amp;lt;gml:featureMember&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_0abed389bf914a43ac1e37b2e5c9a098-2" name="rest_code_0abed389bf914a43ac1e37b2e5c9a098-2"&gt;&lt;/a&gt;    &lt;span class="nt"&gt;&amp;lt;ogr:SED_2018_AUST&lt;/span&gt; &lt;span class="na"&gt;fid=&lt;/span&gt;&lt;span class="s"&gt;"SED_2018_AUST.0"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_0abed389bf914a43ac1e37b2e5c9a098-3" name="rest_code_0abed389bf914a43ac1e37b2e5c9a098-3"&gt;&lt;/a&gt;      &lt;span class="nt"&gt;&amp;lt;ogr:geometryProperty&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_0abed389bf914a43ac1e37b2e5c9a098-4" name="rest_code_0abed389bf914a43ac1e37b2e5c9a098-4"&gt;&lt;/a&gt;        &lt;span class="nt"&gt;&amp;lt;gml:Polygon&lt;/span&gt; &lt;span class="na"&gt;srsName=&lt;/span&gt;&lt;span class="s"&gt;"EPSG:4283"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_0abed389bf914a43ac1e37b2e5c9a098-5" name="rest_code_0abed389bf914a43ac1e37b2e5c9a098-5"&gt;&lt;/a&gt;          &lt;span class="nt"&gt;&amp;lt;gml:outerBoundaryIs&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_0abed389bf914a43ac1e37b2e5c9a098-6" name="rest_code_0abed389bf914a43ac1e37b2e5c9a098-6"&gt;&lt;/a&gt;            &lt;span class="nt"&gt;&amp;lt;gml:LinearRing&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_0abed389bf914a43ac1e37b2e5c9a098-7" name="rest_code_0abed389bf914a43ac1e37b2e5c9a098-7"&gt;&lt;/a&gt;              &lt;span class="nt"&gt;&amp;lt;gml:coordinates&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_0abed389bf914a43ac1e37b2e5c9a098-8" name="rest_code_0abed389bf914a43ac1e37b2e5c9a098-8"&gt;&lt;/a&gt;  ....
&lt;a id="rest_code_0abed389bf914a43ac1e37b2e5c9a098-9" name="rest_code_0abed389bf914a43ac1e37b2e5c9a098-9"&gt;&lt;/a&gt;              &lt;span class="nt"&gt;&amp;lt;/gml:coordinates&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_0abed389bf914a43ac1e37b2e5c9a098-10" name="rest_code_0abed389bf914a43ac1e37b2e5c9a098-10"&gt;&lt;/a&gt;            &lt;span class="nt"&gt;&amp;lt;/gml:LinearRing&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_0abed389bf914a43ac1e37b2e5c9a098-11" name="rest_code_0abed389bf914a43ac1e37b2e5c9a098-11"&gt;&lt;/a&gt;          &lt;span class="nt"&gt;&amp;lt;/gml:outerBoundaryIs&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_0abed389bf914a43ac1e37b2e5c9a098-12" name="rest_code_0abed389bf914a43ac1e37b2e5c9a098-12"&gt;&lt;/a&gt;        &lt;span class="nt"&gt;&amp;lt;/gml:Polygon&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_0abed389bf914a43ac1e37b2e5c9a098-13" name="rest_code_0abed389bf914a43ac1e37b2e5c9a098-13"&gt;&lt;/a&gt;      &lt;span class="nt"&gt;&amp;lt;/gml:polygonMember&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_0abed389bf914a43ac1e37b2e5c9a098-14" name="rest_code_0abed389bf914a43ac1e37b2e5c9a098-14"&gt;&lt;/a&gt;    &lt;span class="nt"&gt;&amp;lt;/ogr:geometryProperty&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_0abed389bf914a43ac1e37b2e5c9a098-15" name="rest_code_0abed389bf914a43ac1e37b2e5c9a098-15"&gt;&lt;/a&gt;    &lt;span class="nt"&gt;&amp;lt;ogr:SED_CODE18&amp;gt;&lt;/span&gt;30028&lt;span class="nt"&gt;&amp;lt;/ogr:SED_CODE18&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_0abed389bf914a43ac1e37b2e5c9a098-16" name="rest_code_0abed389bf914a43ac1e37b2e5c9a098-16"&gt;&lt;/a&gt;    &lt;span class="nt"&gt;&amp;lt;ogr:SED_NAME18&amp;gt;&lt;/span&gt;Gladstone&lt;span class="nt"&gt;&amp;lt;/ogr:SED_NAME18&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_0abed389bf914a43ac1e37b2e5c9a098-17" name="rest_code_0abed389bf914a43ac1e37b2e5c9a098-17"&gt;&lt;/a&gt;    &lt;span class="nt"&gt;&amp;lt;ogr:AREASQKM18&amp;gt;&lt;/span&gt;2799.9552&lt;span class="nt"&gt;&amp;lt;/ogr:AREASQKM18&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_0abed389bf914a43ac1e37b2e5c9a098-18" name="rest_code_0abed389bf914a43ac1e37b2e5c9a098-18"&gt;&lt;/a&gt;  &lt;span class="nt"&gt;&amp;lt;/ogr:SED_2018_AUST&amp;gt;&lt;/span&gt;
&lt;a id="rest_code_0abed389bf914a43ac1e37b2e5c9a098-19" name="rest_code_0abed389bf914a43ac1e37b2e5c9a098-19"&gt;&lt;/a&gt;&lt;span class="nt"&gt;&amp;lt;/gml:featureMember&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;The &lt;code class="docutils literal"&gt;gml:coordinates&lt;/code&gt; are what we really need: they're space-separated
lat,long pairs.&lt;/p&gt;
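&lt;p&gt;As a rough sketch of how such a coordinate string could be turned into numeric pairs, assuming commas within a pair and whitespace between pairs: the helper name &lt;code class="docutils literal"&gt;parse_coordinates&lt;/code&gt; is made up for illustration and is not the &lt;code class="docutils literal"&gt;mb_to_points&lt;/code&gt; function called in the loop that follows.&lt;/p&gt;

```python
def parse_coordinates(text):
    """Split a gml:coordinates string into a list of (first, second) floats.

    Assumes whitespace between pairs and commas within a pair. Any third
    (elevation) value in a tuple is ignored.
    """
    points = []
    for pair in text.split():
        first, second = pair.split(",")[:2]
        points.append((float(first), float(second)))
    return points


# Placeholder values for illustration, not taken from the ABS data
print(parse_coordinates("-23.5,151.25 -23.75,151.5"))
```

&lt;p&gt;Keeping the pairs as tuples of floats makes them easy to feed into a later containment test, and they serialise to JSON as two-element lists.&lt;/p&gt;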
&lt;pre class="code python"&gt;&lt;a id="rest_code_54cc95167a564893b18a8bc2a1da0ce5-1" name="rest_code_54cc95167a564893b18a8bc2a1da0ce5-1"&gt;&lt;/a&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;feature&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sakml&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;findAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"gml:featureMember"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;a id="rest_code_54cc95167a564893b18a8bc2a1da0ce5-2" name="rest_code_54cc95167a564893b18a8bc2a1da0ce5-2"&gt;&lt;/a&gt;    &lt;span class="n"&gt;sa1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;feature&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"ogr:SA1_MAIN16"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
&lt;a id="rest_code_54cc95167a564893b18a8bc2a1da0ce5-3" name="rest_code_54cc95167a564893b18a8bc2a1da0ce5-3"&gt;&lt;/a&gt;    &lt;span class="n"&gt;mb_coord&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sa1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mb_to_points&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;feature&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_54cc95167a564893b18a8bc2a1da0ce5-4" name="rest_code_54cc95167a564893b18a8bc2a1da0ce5-4"&gt;&lt;/a&gt;
&lt;a id="rest_code_54cc95167a564893b18a8bc2a1da0ce5-5" name="rest_code_54cc95167a564893b18a8bc2a1da0ce5-5"&gt;&lt;/a&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;mb_to_sed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;a id="rest_code_54cc95167a564893b18a8bc2a1da0ce5-6" name="rest_code_54cc95167a564893b18a8bc2a1da0ce5-6"&gt;&lt;/a&gt;    &lt;span class="n"&gt;electorate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mb_to_sed&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;a id="rest_code_54cc95167a564893b18a8bc2a1da0ce5-7" name="rest_code_54cc95167a564893b18a8bc2a1da0ce5-7"&gt;&lt;/a&gt;    &lt;span class="n"&gt;sed_to_mb&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;electorate&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s2"&gt;"coords"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mb_coord&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;After which we can write each jurisdiction's &lt;code class="docutils literal"&gt;dict&lt;/code&gt; of localities and lat/long
coordinates out as JSON using &lt;code class="docutils literal"&gt;json.dump(localitydict, outfile)&lt;/code&gt;.&lt;/p&gt;
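&lt;p&gt;A minimal sketch of that write step, plus the matching read, might look like the following. The function names, the file name and the sample data are all placeholders of mine; only the &lt;code class="docutils literal"&gt;json.dump(localitydict, outfile)&lt;/code&gt; call comes from the text above.&lt;/p&gt;

```python
import json
import os
import tempfile


def write_localities(localitydict, path):
    """Write one jurisdiction's electorate dict to path as JSON."""
    with open(path, "w") as outfile:
        json.dump(localitydict, outfile)


def read_localities(path):
    """Load a previously written electorate dict back from path."""
    with open(path) as infile:
        return json.load(infile)


# Round-trip demo with placeholder data (not real electorate coordinates)
sample = {"Example": {"coords": [[-23.5, 151.25], [-23.75, 151.5]]}}
demo_path = os.path.join(tempfile.mkdtemp(), "example.json")
write_localities(sample, demo_path)
restored = read_localities(demo_path)
```

&lt;p&gt;Note that JSON has no tuple type, so coordinate tuples come back as two-element lists when the file is reloaded.&lt;/p&gt;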
&lt;p&gt;To confirm that I had the correct data, I wrote another simple quick-n-dirty
script &lt;a class="reference external" href="https://github.com/jmcp/grabbag/blob/master/jsoncheck.py"&gt;jsoncheck.py&lt;/a&gt; to diff the SA1-acquired JSON against my other
extractions. I found one important difference: Queensland has a
new electorate, &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Electoral_district_of_McConnel"&gt;McConnel&lt;/a&gt;, which was created after the most recent ABS SA1
allocation.&lt;/p&gt;
&lt;p&gt;So that's the data preparation done, back to the &lt;a class="reference external" href="https://palletsprojects.com/p/flask/"&gt;flask&lt;/a&gt; app! The app listens
at the root (&lt;code class="docutils literal"&gt;/&lt;/code&gt;), and is a simple text form. Hitting &lt;code class="docutils literal"&gt;enter&lt;/code&gt; after typing
in an address routes the &lt;code class="docutils literal"&gt;POST&lt;/code&gt; request to the &lt;a class="reference external" href="https://github.com/jmcp/find-my-electorate/blob/master/__init__.py#L206"&gt;results&lt;/a&gt; function where we
call out to the &lt;a class="reference external" href="https://developers.google.com/maps/documentation/maps-static/dev-guide#Locations"&gt;Google Maps API&lt;/a&gt;, load the relevant state JSONified
electorate list, and then locate the Federal division. There are only 151 Federal
divisions, so searching through them in alphabetical order and breaking on the
first match is not unreasonable. I haven't figured out a time- and
space-efficient way to hash coordinates to divisions. After
determining the Federal division, we use the same method to check against
the identified state's electorate list.&lt;/p&gt;
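&lt;p&gt;To illustrate the search-and-break idea, here is a hedged sketch using a standard ray-casting point-in-polygon test. This is not necessarily what the app does: the function names are hypothetical, and it assumes each division is stored as a single ordered ring of (lat, long) tuples, which is simpler than the aggregated SA1 coordinates built earlier.&lt;/p&gt;

```python
def point_in_polygon(lat, lng, ring):
    """Ray-casting test; ring is an ordered list of (lat, lng) tuples."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        lat_i, lng_i = ring[i]
        lat_j, lng_j = ring[j]
        # Only edges that straddle the point's latitude can be crossed
        if (lat_i > lat) != (lat_j > lat):
            # Longitude at which this edge crosses the point's latitude
            cross = lng_i + (lat - lat_i) * (lng_j - lng_i) / (lat_j - lat_i)
            if cross > lng:
                inside = not inside
        j = i
    return inside


def find_division(lat, lng, divisions):
    """Scan divisions in alphabetical order and stop at the first match."""
    for name in sorted(divisions):
        if point_in_polygon(lat, lng, divisions[name]):
            return name
    return None


# Placeholder squares, not real electorate boundaries
demo = {
    "Bravo": [(0, 0), (0, 10), (10, 10), (10, 0)],
    "Alpha": [(20, 20), (20, 30), (30, 30), (30, 20)],
}
print(find_division(5, 5, demo))
```

&lt;p&gt;With only 151 rings to check, a linear scan like this costs one pass over each ring's vertices in the worst case; a bounding-box pre-check per division would be one cheap way to skip most of them.&lt;/p&gt;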
&lt;p&gt;The first version of the app just returned the two electorate names, but I
didn't think that was very friendly, so I added another call to the
&lt;a class="reference external" href="https://developers.google.com/maps/documentation/maps-static/dev-guide#Locations"&gt;Google Maps API&lt;/a&gt; to retrieve a 400x400 image showing the supplied address on
the map; clicking on that map takes you to the larger Google-hosted map. I
also added links to the &lt;a class="reference external" href="https://www.wikipedia.org"&gt;Wikipedia&lt;/a&gt; entries for the Federal and state
electorates. To render the image's binary data we use &lt;code class="docutils literal"&gt;b64encode&lt;/code&gt;:&lt;/p&gt;
&lt;pre class="code python"&gt;&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-1" name="rest_code_6b3791c94e7542e4adadac932331c2c4-1"&gt;&lt;/a&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;base64&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;b64encode&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-2" name="rest_code_6b3791c94e7542e4adadac932331c2c4-2"&gt;&lt;/a&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-3" name="rest_code_6b3791c94e7542e4adadac932331c2c4-3"&gt;&lt;/a&gt;&lt;span class="n"&gt;keyarg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&amp;amp;key=&lt;/span&gt;&lt;span class="si"&gt;{gmapkey}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-4" name="rest_code_6b3791c94e7542e4adadac932331c2c4-4"&gt;&lt;/a&gt;&lt;span class="n"&gt;imgurl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"https://maps.googleapis.com/maps/api/staticmap?size=400x400"&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-5" name="rest_code_6b3791c94e7542e4adadac932331c2c4-5"&gt;&lt;/a&gt;&lt;span class="n"&gt;imgurl&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s2"&gt;"&amp;amp;center=&lt;/span&gt;&lt;span class="si"&gt;{lati}&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="si"&gt;{longi}&lt;/span&gt;&lt;span class="s2"&gt;&amp;amp;scale=1&amp;amp;maptype=roadmap&amp;amp;zoom=13"&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-6" name="rest_code_6b3791c94e7542e4adadac932331c2c4-6"&gt;&lt;/a&gt;&lt;span class="n"&gt;imgurl&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s2"&gt;"&amp;amp;markers=X|&lt;/span&gt;&lt;span class="si"&gt;{lati}&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="si"&gt;{longi}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-7" name="rest_code_6b3791c94e7542e4adadac932331c2c4-7"&gt;&lt;/a&gt;&lt;span class="n"&gt;imgurl&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;keyarg&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-8" name="rest_code_6b3791c94e7542e4adadac932331c2c4-8"&gt;&lt;/a&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-9" name="rest_code_6b3791c94e7542e4adadac932331c2c4-9"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Let's provide a Google Maps static picture of the location&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-10" name="rest_code_6b3791c94e7542e4adadac932331c2c4-10"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# Adapted from&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-11" name="rest_code_6b3791c94e7542e4adadac932331c2c4-11"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# https://stackoverflow.com/questions/25140826/generate-image-embed-in-flask-with-a-data-uri/25141268#25141268&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-12" name="rest_code_6b3791c94e7542e4adadac932331c2c4-12"&gt;&lt;/a&gt;&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-13" name="rest_code_6b3791c94e7542e4adadac932331c2c4-13"&gt;&lt;/a&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;latlong&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-14" name="rest_code_6b3791c94e7542e4adadac932331c2c4-14"&gt;&lt;/a&gt;    &lt;span class="sd"&gt;"""&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-15" name="rest_code_6b3791c94e7542e4adadac932331c2c4-15"&gt;&lt;/a&gt;&lt;span class="sd"&gt;    latlong -- a dict of the x and y coordinates of the location&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-16" name="rest_code_6b3791c94e7542e4adadac932331c2c4-16"&gt;&lt;/a&gt;&lt;span class="sd"&gt;    Returns a base64-encoded image&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-17" name="rest_code_6b3791c94e7542e4adadac932331c2c4-17"&gt;&lt;/a&gt;&lt;span class="sd"&gt;    """&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-18" name="rest_code_6b3791c94e7542e4adadac932331c2c4-18"&gt;&lt;/a&gt;    &lt;span class="n"&gt;turl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;imgurl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;longi&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;latlong&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"lng"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-19" name="rest_code_6b3791c94e7542e4adadac932331c2c4-19"&gt;&lt;/a&gt;                         &lt;span class="n"&gt;lati&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;latlong&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"lat"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-20" name="rest_code_6b3791c94e7542e4adadac932331c2c4-20"&gt;&lt;/a&gt;                         &lt;span class="n"&gt;gmapkey&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;gmapkey&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-21" name="rest_code_6b3791c94e7542e4adadac932331c2c4-21"&gt;&lt;/a&gt;    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;turl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-22" name="rest_code_6b3791c94e7542e4adadac932331c2c4-22"&gt;&lt;/a&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-23" name="rest_code_6b3791c94e7542e4adadac932331c2c4-23"&gt;&lt;/a&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-24" name="rest_code_6b3791c94e7542e4adadac932331c2c4-24"&gt;&lt;/a&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-25" name="rest_code_6b3791c94e7542e4adadac932331c2c4-25"&gt;&lt;/a&gt;&lt;span class="o"&gt;....&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-26" name="rest_code_6b3791c94e7542e4adadac932331c2c4-26"&gt;&lt;/a&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-27" name="rest_code_6b3791c94e7542e4adadac932331c2c4-27"&gt;&lt;/a&gt;&lt;span class="c1"&gt;# and in the results function&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-28" name="rest_code_6b3791c94e7542e4adadac932331c2c4-28"&gt;&lt;/a&gt;    &lt;span class="n"&gt;img_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dictr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"latlong"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-29" name="rest_code_6b3791c94e7542e4adadac932331c2c4-29"&gt;&lt;/a&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-30" name="rest_code_6b3791c94e7542e4adadac932331c2c4-30"&gt;&lt;/a&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;render_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"results.html"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-31" name="rest_code_6b3791c94e7542e4adadac932331c2c4-31"&gt;&lt;/a&gt;                           &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-32" name="rest_code_6b3791c94e7542e4adadac932331c2c4-32"&gt;&lt;/a&gt;                           &lt;span class="n"&gt;img_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img_data&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
&lt;a id="rest_code_6b3791c94e7542e4adadac932331c2c4-33" name="rest_code_6b3791c94e7542e4adadac932331c2c4-33"&gt;&lt;/a&gt;                           &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;p&gt;Putting that all together gives us a rendered page that looks like this:&lt;/p&gt;
&lt;img alt="/images/2019/resultspage.png" src="https://www.jmcpdotcom.com/blog/images/2019/resultspage.png"&gt;
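&lt;p&gt;The template that consumes &lt;code class="docutils literal"&gt;img_data&lt;/code&gt; isn't shown here. One common way to embed such a payload, and the idea behind the Stack Overflow answer referenced in the code comments, is a &lt;code class="docutils literal"&gt;data:&lt;/code&gt; URI in the image's &lt;code class="docutils literal"&gt;src&lt;/code&gt; attribute. The sketch below assumes the returned image is a PNG; the helper name &lt;code class="docutils literal"&gt;to_data_uri&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
from base64 import b64encode


def to_data_uri(raw, mime="image/png"):
    """Build a data: URI from raw image bytes.

    The PNG default is an assumption; pass the real MIME type if it differs.
    """
    encoded = b64encode(raw).decode("ascii")
    return "data:{0};base64,{1}".format(mime, encoded)


# Stand-in bytes; in the app this would be the response body from get_image's request
print(to_data_uri(b"abc"))
```

&lt;p&gt;Inlining the image this way avoids a second round trip from the browser and keeps the API key out of the rendered page, at the cost of a roughly one-third larger payload.&lt;/p&gt;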
&lt;p&gt;To finalise the project, I ran it through &lt;a class="reference external" href="https://pypi.org/project/flake8/"&gt;flake8&lt;/a&gt; again (I do this every few
saves), and then &lt;code class="docutils literal"&gt;git commit&lt;/code&gt; followed by &lt;code class="docutils literal"&gt;git push&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;br&gt;
&lt;br&gt;&lt;/p&gt;
&lt;section id="reference-data-locations"&gt;
&lt;h2&gt;Reference data locations&lt;/h2&gt;
&lt;p&gt;&lt;br&gt;&lt;/p&gt;
&lt;table class="colwidths-given"&gt;
&lt;colgroup&gt;
&lt;col style="width: 20%"&gt;
&lt;col style="width: 80%"&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th class="head"&gt;&lt;p&gt;Jurisdiction&lt;/p&gt;&lt;/th&gt;
&lt;th class="head"&gt;&lt;p&gt;URL&lt;/p&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;Federal&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;&lt;a class="reference external" href="https://aec.gov.au/Electorates/gis/gis_datadownload.htm"&gt;aec.gov.au/Electorates/gis/gis_datadownload.htm&lt;/a&gt;&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;Queensland&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;&lt;a class="reference external" href="http://qldspatial.information.qld.gov.au/catalogue/custom/detail.page?fid=%7B079E7EF8-30C5-4C1D-9ABF-3D196713694F%7D"&gt;qldspatial.information.qld.gov.au/catalogue/custom/detail.page?fid={079E7EF8-30C5-4C1D-9ABF-3D196713694F}&lt;/a&gt;&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;New South Wales&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;&lt;a class="reference external" href="https://elections.nsw.gov.au/Elections/How-voting-works/Electoral-boundaries"&gt;elections.nsw.gov.au/Elections/How-voting-works/Electoral-boundaries&lt;/a&gt;&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;Victoria&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;&lt;a class="reference external" href="http://ebc.vic.gov.au/ElectoralBoundaries/FinalElectoralBoundariesDownload.html"&gt;ebc.vic.gov.au/ElectoralBoundaries/FinalElectoralBoundariesDownload.html&lt;/a&gt;&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;Tasmania&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;&lt;a class="reference external" href="https://www.tec.tas.gov.au/House_of_Assembly_Elections/index.html"&gt;www.tec.tas.gov.au/House_of_Assembly_Elections/index.html&lt;/a&gt;&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;br&gt;
&lt;br&gt;&lt;/p&gt;
&lt;p&gt;Tasmania's state parliament has multi-member electorates, which have the same
boundaries as the state's five Federal divisions.&lt;/p&gt;
&lt;p&gt;South Australia data was provided via direct personal email.&lt;/p&gt;
&lt;p&gt;Australian Capital Territory, Western Australia and Northern Territory data
was extracted from the &lt;a class="reference external" href="https://www.abs.gov.au/ausstats/subscriber.nsf/log?openagent&amp;amp;1259030001_ste11aaust_midmif.zip&amp;amp;1259.0.30.001&amp;amp;Data%20Cubes&amp;amp;6E45E3029A27FFEFCA2578CC0012083E&amp;amp;0&amp;amp;July%202011&amp;amp;14.07.2011&amp;amp;Latest"&gt;ABS shapefile&lt;/a&gt; after &lt;code class="docutils literal"&gt;ogr2ogr&lt;/code&gt;-converting from
MapInfo Interchange Format.&lt;/p&gt;
&lt;!-- put references after this point --&gt;
&lt;/section&gt;</description><category>ETL</category><category>flask</category><category>GeoJSON</category><category>Google Maps API</category><category>JSON</category><category>KML</category><category>microservices</category><category>ogr2ogr</category><category>Python</category><category>software engineering</category><category>training</category><category>upskilling</category><guid>https://www.jmcpdotcom.com/blog/posts/2019-09-27-microservices-part-1/</guid><pubDate>Thu, 26 Sep 2019 16:00:00 GMT</pubDate></item></channel></rss>