Microservices (part 2)

jmcp

2019-10-04 02:00

One principle that I work on is that I should always extend the fix (learnt via the Kepner-Tregoe Analytical Troubleshooting training many years ago). Following my investigation of how to provide a more accessible method of determining your electorate, I came back to the political polling ideas and got to thinking about how we can track the temperature of a conversation in, for example #auspol.

The term for this is sentiment analysis and while the major cloud providers have their own implementations of this (Microsoft Azure Text Analytics, AWS Comprehend, Google Cloud Natural Language API) you can also use Python's nltk in the comfort of your own venv. It's cheaper, too!

A bit of searching lead me to @Chapagain's post which was very useful and got me started - thankyou

I decided that I really want to do something more real-time, and while I could have done more scraping with Beautiful Soup, a quick look at the html that's returned with you run

import requests

url = "https://twitter.com/search?q=%23auspol&src=typed_query&f=live"
result = requests.get(url)
print(result.text)

is eye-wateringly complex. (Go on, try it!) I just couldn't be bothered with that so I signed up for a Twitter developer account and started looking at the APIs available for searching. These are easily used with the Twython library:

from twython import Twython

twitter = Tython(consumer_key, consumer_secret,
                 access_token, access_token_secret)
hashtag = "#auspol"
results = twitter.search(q=hashtag, result_type="recent")
for tweet in results["statuses"]:
    sentiment = classifier_func(tweet["text"])
    print(sentiment.prob("pos"))

I hit the rate limit a few times until I realiased that there was a while true going on inside Twython when using the cursor method. In my print-to-shell proof of concept, I got past that by using the search function inside a while loop with a 30second sleep call. I knew that that wasn't good enough for a web app, and would actually be a road block for doing a properly updated graph-focused page.

For that I would need a charting library, and some JavaScript. I started out using Chart.js, but quickly realised that it didn't have any sort of flow, so then I retooled to use C3js instead.

The initial render of the template provides the first set of data, which is a JavaScript array ([]), and checks for a saved hashtag and the id of the most recently found tweet using Window.sessionStorage(). Then we set up a function to get new data when called:

function getNewData() {
    var xhr = new XMLHttpRequest();
    xhr.open("GET", "/sentiment?hashtag="+hashtag+"&lastid="+lastid, true);
    xhr.onload = function (e) {
        if ((xhr.readyState === 4) && (xhr.status === 200)) {
            parsed = JSON.parse(xhr.responseText);
            sessionStorage.setItem("lastid", parsed["lastid"]);
            lastid = parsed["lastid"];
            labels = parsed["labels"];
            // Did we get new data points?
            if (parsed["chartdata"].length > 1) {
                // yes
                newDataCol = [ parsed["chartdata"] ];
                curidx += parsed["chartdata"].length - 1;
            } else {
                newDataCol = [];
            }
        }
    };
    xhr.onerror = function (e) {
        console.error(xhr.statusText);
    };
    xhr.send(null);
};

Finally, we need to tell the window to call our updateChart() function every 30 seconds, and define that function:

function updateChart() {
    getNewData();
    if (newDataCol !== []) {
        chart.flow({
            columns: prevDataCol,
            done: function () {
               chart.flow({
                   columns: newDataCol,
                   line: { connectNull: true }
               })
            }
        });
        prevDataCol = newDataCol;
    }
};

/* Update the chart every 30 seconds */
window.setInterval(updateChart, 30000);

So that you can change the hashtag to watch, I added a small <form> element which POSTs the new hashtag to the /sentiment method on submit and then re-renders the template.

What I'm particularly happy with is that the JavaScript took me only a few hours last Saturday morning and was pretty straightforward to write.

You can find the code for this project in my GitHub repo au-pol-sentiment.

Microservices (part 1)

jmcp

2019-09-27 02:00

Since I departed from my comfortable niche in Solaris engineering earlier this year, I've spent a considerable amount of time and energy in re-training and upskilling to assist my employment prospects. Apart from acquainting myself with a lot of terminology, I've written code. A lot of code. Mostly, as it turns out, has been related to microservices.

This post is about a microservice I wrote to assist with accessibility in a specific part of the Australian electoral process (finding out which electorate you live in) and some supporting digressions.

You can find all the code for this microservice and its associated data preparation in my GitHub repos grabbag and find-my-electorate.

On 18 May 2019, Australia had a federal election, and in the lead up to that event I became very interested in political polling. While I have a few ideas on the subject which are on my back burner, mind-mapping the various components of political polling got to wondering how do the various state, territory and federal electoral commissions map a voter's address to an electorate?

My first point of call was the Australian Electoral Commission and their Find my electorate site. This is nicely laid out and lets you find out which electorate you are in - by postcode. This is all well and good if you're in a densely populated area, like the electorate of Brisbane - three suburbs. If, however, you choose somewhere else, like 2620 which covers a lot of Canberra and surrounding districts, you wind up with several electorates covering 2620.

The AEC's website is written in asp.net, which is up to the task, but when you have more than one page of results the authors of the page make use of some (to my mind) squirrelly features and callbacks which make scraping the site difficult. As best I can determine, the AEC doesn't provide an API to access this information, so Another Method was required.

At this point, I turned to the standard libraries for this sort of thing in the Python worldL Beautiful Soup and requests. I started by setting up a quick venv to keep the dependencies contained

$ python3.7 -m venv scraping-venv
$ . scraping/bin/activate
(scraping-venv) $ pip install requests bs4 json csv

Now since we know the url to GET, we can get the first page of responses very easily:

import requests
from bs4 import BeautifulSoup

url =  "https://electorate.aec.gov.au/LocalitySearchResults.aspx?"
url += "filter={0}&filterby=Postcode"

result = requests.post(url.format(2620))
resh = BeautifulSoup(result.text, "html.parser")

Beautiful Soup parses the response text, and gives us a tree-like structure to work with. Making use of the Chrome devtools (or the Firefox devtools ) I could see that I need to find a <table> with an attribute of *ContentPlaceHolderBody_gridViewLocalities* - what a mouthful! - and then process all the table rows (<tr>) within that table

tblAttr = "ContentPlaceHolderBody_gridViewLocalities"
restbl = resh.find_all(name="table", attrs={"id": tblAttr})
rows = restbl[0].find_all("tr")

Using a for loop we can construct a dict of the data that we actually need. Simple!

What do we do, though, when we want to get the second or later pages of result? This is where the squirrelly features and callbacks come in. The page makes use of an __EVENTARGUMENT element which is POST ed as payload back to the same url. The way that we determine this is to look for a row with the class pagingLink, then for each table data (<td>) element check for its contents matching this regex

".*__doPostBack.'(.*?gridViewLocalities)','(Page.[0-9]+)'.*")

And after that we can recursively call our query with the extra payload data in the argument list:

def queryAEC(postcode, extrapage):
    """
    Queries the AEC url and returns soup. If extrapage is empty
    then we pass the soup to findFollowups before returning.
    """
    url = "https://electorate.aec.gov.au/LocalitySearchResults.aspx?"
    url += "filter={0}&filterby=Postcode"

    if not extrapage:
        res = requests.post(url.format(postcode))
    else:
        payload["__EVENTARGUMENT"] = extrapage
        res = requests.post(url.format(postcode), data=payload)

I now had a script to run which extracted this info and pretty-printed it (as well as the same info in JSON):

$ ./postcode.py 2620
State    Postcode   Locality                         Electorate
ACT      2620       BEARD                            Canberra
ACT      2620       BOOTH DISTRICT                   Bean
NSW      2620       BURRA                            Eden-Monaro
NSW      2620       CARWOOLA                         Eden-Monaro
NSW      2620       CLEAR RANGE                      Eden-Monaro
ACT      2620       CORIN DAM                        Bean
NSW      2620       CRESTWOOD                        Eden-Monaro
NSW      2620       ENVIRONA                         Eden-Monaro
NSW      2620       FERNLEIGH PARK                   Eden-Monaro
NSW      2620       GOOGONG                          Eden-Monaro
NSW      2620       GREENLEIGH                       Eden-Monaro
NSW      2620       GUNDAROO                         Eden-Monaro
ACT      2620       HUME                             Bean
NSW      2620       KARABAR                          Eden-Monaro
ACT      2620       KOWEN DISTRICT                   Canberra
ACT      2620       KOWEN FOREST                     Canberra
NSW      2620       MICHELAGO                        Eden-Monaro
ACT      2620       OAKS ESTATE                      Canberra
ACT      2620       PADDYS RIVER DISTRICT            Bean
NSW      2620       QUEANBEYAN                       Eden-Monaro
NSW      2620       YARROW                           Eden-Monaro
NSW      2620       QUEANBEYAN EAST                  Eden-Monaro
NSW      2620       QUEANBEYAN WEST                  Eden-Monaro
NSW      2620       RADCLIFFE                        Eden-Monaro
ACT      2620       RENDEZVOUS CREEK DISTRICT        Bean
ACT      2620       ROYALLA                          Bean
NSW      2620       ROYALLA                          Eden-Monaro
NSW      2620       SUTTON                           Eden-Monaro
ACT      2620       TENNENT DISTRICT                 Bean
ACT      2620       THARWA                           Bean
NSW      2620       THARWA                           Eden-Monaro
NSW      2620       THARWA                           Eden-Monaro
NSW      2620       THE ANGLE                        Eden-Monaro
NSW      2620       THE RIDGEWAY                     Eden-Monaro
NSW      2620       TINDERRY                         Eden-Monaro
NSW      2620       TRALEE                           Eden-Monaro
ACT      2620       TUGGERANONG DISTRICT             Bean
NSW      2620       URILA                            Eden-Monaro
NSW      2620       WAMBOIN                          Eden-Monaro
ACT      2620       WILLIAMSDALE                     Bean
NSW      2620       WILLIAMSDALE                     Eden-Monaro

That really is quite a few suburbs.

So now that we've got a way to extract that information, how do we make it available and useful for everybody? With a microservice! I hear you cry.

The very first microservice I wrote (in 2011-12, the subject of a future post) used CherryPy, because we'd embedded it within Solaris IPS (image packaging system) and didn't need any further corporate approvals. The path of least resistance. This time, however, I was unconstrained regarding approvals, so had to choose between Django and flask. For no particular reason, I chose flask.

It was pretty easy to cons up the requisite templates, and write the /results method. It was at this point that my extend the fix habit (learnt via the Kepner-Tregoe Analytical Troubleshooting training many years ago) kicked in, and I started exploring the Electoral Commission of Queensland website for the same sort of information. To my surprise, the relatively straight-forward interface of the AEC was not available, and the closest analogue was an interactive map.

After a brief phone conversation with ECQ and more digging, I discovered that the 2017 boundaries were available from QLD Spatial in shapefile, MapInfo and Google Maps KML formats. This was very useful, because KML can be mucked about with directly using Beautiful Soup. After not too much effort I had the latitude+longitude pairs for the boundaries extracted and stored as JSON . My phone conversation with ECQ also took me down the path of wanting to translate a street address into GeoJSON - and that took me to the Google Maps API. I did investigate OpenStreetMap's api, but testing a few specific locations (addresses where we've lived over the years) gave me significantly different latitude+longitude results. I bit the bullet and got a Google Maps API key .

The next step was to research how to find out if a specific point is located within a polygon, and to my delight the Even-odd rule has example code in Python, which needed only a small change to work with my data arrangement.

With that knowledge in hand, it was time to turn the handle on the Google Maps API :

keyarg = "&key={gmapkey}"
queryurl = "https://maps.googleapis.com/maps/api/geocode/json?address="
queryurl += "{addr} Australia"
queryurl += keyarg

...
# Helper functions
def get_geoJson(addr):
    """
    Queries the Google Maps API for specified address, returns
    a dict of the formatted address, the state/territory name, and
    a float-ified version of the latitude and longitude.
    """
    res = requests.get(queryurl.format(addr=addr, gmapkey=gmapkey))
    dictr = {}
    if res.json()["status"] == "ZERO_RESULTS" or not res.ok:
        dictr["res"] = res
    else:
        rresj = res.json()["results"][0]
        dictr["formatted_address"] = rresj["formatted_address"]
        dictr["latlong"] = rresj["geometry"]["location"]
        for el in rresj["address_components"]:
            if el["types"][0] == "administrative_area_level_1":
                dictr["state"] = el["short_name"]
    return dictr

When you provide an address, we send that to Google which does a best-effort match on the text address then returns GeoJSON for that match. For example, if you enter

42 Wallaby Way, Sydney

the best-effort match will give you

42 Rock Wallaby Way, Blaxland NSW 2774, Australia

I now had a way to translate a street address into a federal electorate, but with incomplete per-State data my app wasn't finished. I managed to get Federal, Queensland, New South Wales, Victoria and Tasmania data fairly easily (see the links below) and South Australia's data came via personal email after an enquiry through their contact page. I didn't get any response to several contact attempts with either Western Australia or the Northern Territory, and the best I could get for the ACT was their electorate to suburb associations.

I remembered that the Australian Bureau of Statistics has a standard called Statistical Geography, and the smallest unit of that is called a Mesh Block:

Mesh Blocks (MBs) are the smallest geographical area defined by the ABS. They are designed as geographic building blocks rather than as areas for the release of statistics themselves. All statistical areas in the ASGS, both ABS and Non ABS Structures, are built up from Mesh Blocks. As a result the design of Mesh Blocks takes into account many factors including administrative boundaries such as Cadastre, Suburbs and Localities and LGAs as well as land uses and dwelling distribution. (emphasis added)

Mesh Blocks are then aggregated into SA1s:

Statistical Areas Level 1 (SA1s) are designed to maximise the spatial detail available for Census data. Most SA1s have a population of between 200 to 800 persons with an average population of approximately 400 persons. This is to optimise the balance between spatial detail and the ability to cross classify Census variables without the resulting counts becoming too small for use. SA1s aim to separate out areas with different geographic characteristics within Suburb and Locality boundaries. In rural areas they often combine related Locality boundaries. SA1s are aggregations of Mesh Blocks. (emphasis added)

With this knowledge, and a handy SA1-to-postcode map in CSV format

$ head australia-whole/SED_2018_AUST.csv
SA1_MAINCODE_2016,SED_CODE_2018,SED_NAME_2018,STATE_CODE_2016,STATE_NAME_2016,AREA_ALBERS_SQKM
10102100701,10031,Goulburn,1,New South Wales,362.8727
10102100702,10053,Monaro,1,New South Wales,229.7459
10102100703,10053,Monaro,1,New South Wales,2.3910
10102100704,10053,Monaro,1,New South Wales,1.2816
10102100705,10053,Monaro,1,New South Wales,1.1978
....

I went looking into the SA1 information from the ABS shapefile covering the whole of the country. Transforming the shapefile into kml is done with ogr2ogr and provides us with an XML schema definition. From the CSV header line above we can see that we want the SA1_MAINCODE_2016 and (for validation) the STATE_NAME_2016 fields. Having made a per-state list of the SA1s, we go back to the kml and process each member of the document:

  <gml:featureMember>
    <ogr:SED_2018_AUST fid="SED_2018_AUST.0">
      <ogr:geometryProperty>
        <gml:Polygon srsName="EPSG:4283">
          <gml:outerBoundaryIs>
            <gml:LinearRing>
              <gml:coordinates>
  ....
              </gml:coordinates>
            </gml:LinearRing>
          </gml:outerBoundaryIs>
        </gml:Polygon>
      </gml:polygonMember>
    </ogr:geometryProperty>
    <ogr:SED_CODE18>30028</ogr:SED_CODE18>
    <ogr:SED_NAME18>Gladstone</ogr:SED_NAME18>
    <ogr:AREASQKM18>2799.9552</ogr:AREASQKM18>
  </ogr:SED_2018_AUST>
</gml:featureMember>

The gml:coordinates are what we really need, they're space-separated lat,long pairs.

for feature in sakml.findAll("gml:featureMember"):
sa1 = feature.find("ogr:SA1_MAIN16").text
mb_coord[sa1] = mb_to_points(feature)

for block in mb_to_sed:
    electorate = mb_to_sed[block]
    sed_to_mb[electorate]["coords"].extend(mb_coord[block])

After which we can write each jurisdiction's dict of localities and lat/long coordinates out as JSON using json.dump(localitydict, outfile).

To confirm that I had the correct data, I wrote another simple quick-n-dirty script jsoncheck.py to diff the SA1-acquired JSON against my other extractions. There was one difference of importance found - Queensland has a new electorate McConnel, which was created after the most recent ABS SA1 allocation.

So that's the data preparation done, back to the flask app! The app listens at the root (/), and is a simple text form. Hitting enter after typing in an address routes the POST request to the results function where we call out to the Google Maps API, load the relevant state JSONified electorate list, and then locate the Federal division. There are 151 Federal divisions, so it's not necessarily a bad thing to search through each on an alphabetic basis and break when we get a match. I haven't figured out a way to (time and space)-efficiently hash the coordinates vs divisions. After determining the Federal division we then use the same method to check against the identified state's electorate list.

The first version of the app just returned the two electorate names, but I didn't think that was very friendly, so I added another call to the Google Maps API to retrieve a 400x400 image showing the supplied address on the map; clicking on that map takes you to the larger Google-hosted map. I also added links to the Wikipedia entries for the Federal and state electorates. To render the image's binary data we use b64encode:

from base64 import b64encode

keyarg = "&key={gmapkey}"
imgurl = "https://maps.googleapis.com/maps/api/staticmap?size=400x400"
imgurl += "&center={lati},{longi}&scale=1&maptype=roadmap&zoom=13"
imgurl += "&markers=X|{lati},{longi}"
imgurl += keyarg

# Let's provide a Google Maps static picture of the location
# Adapted from
# https://stackoverflow.com/questions/25140826/generate-image-embed-in-flask-with-a-data-uri/25141268#25141268
#
def get_image(latlong):
    """
    latlong -- a dict of the x and y coodinates of the location
    Returns a base64-encoded image
    """
    turl = imgurl.format(longi=latlong["lng"],
                         lati=latlong["lat"],
                         gmapkey=gmapkey)
    res = requests.get(turl)
    return b64encode(res.content)


....

# and in the results function
    img_data = get_image(dictr["latlong"])

    return render_template("results.html",
                           ...,
                           img_data=format(quote(img_data))
                           ...)

Putting that all together gives us a rendered page that looks like this:

To finalise the project, I ran it through flake8 again (I do this every few saves), and then git commit followed by git push.

Reference data locations

Jurisdiction	URL
Federal	aec.gov.au/Electorates/gis/gis_datadownload.htm
Queensland	qldspatial.information.qld.gov.au/catalogue/custom/detail.page?fid={079E7EF8-30C5-4C1D-9ABF-3D196713694F}
New South Wales	elections.nsw.gov.au/Elections/How-voting-works/Electoral-boundaries
Victoria	ebc.vic.gov.au/ElectoralBoundaries/FinalElectoralBoundariesDownload.html
Tasmania	www.tec.tas.gov.au/House_of_Assembly_Elections/index.html

Tasmania's state parliament has multi-member electorates, which have the same boundaries as their 5 Federal divisions.

South Australia data was provided via direct personal email.

Australian Capital Territory, Western Australia and Northern Territory data was extracted from the ABS shapefile after ogr2ogr-converting from MapInfo Interchange Format.

A response to some reading materials

jmcp

2019-06-07 06:00

As part of my effort to get up to date with my skillset, I've joined the dev.to community (pointed out by Ali Spittel, who has written some very good pieces on medium.com). This morning I came across a piece from last year which I thought would be worth a read: 9 Software Architecture Interview Questions and Answers.

It started out fairly well:

A software architect is a software expert who makes high-level design choices and dictates technical standards, including software coding standards, tools, and platforms. Software architecture refers to the high level structures of a software system, the discipline of creating such structures, and the documentation of these structures.

I don't believe that the rest of the piece matches up to that initial paragraph. The first question was pretty good: What does "program to interfaces, not implementations" mean? The sting, though, was that theanswer dived straight into OO-speak, talking about factories.

I'm sure that's ok for somebody who has just finished their first single-semester class on Design Patterns, but doesn't help you in the real world - and surely your lecturer for that class would be keen to help you turn the theory into practice?

Cue #TheVoiceOfExperience!

When I was working as a tier-4 support engineer for Sun Microsystems, we had a support call from an (of course) irate customer when a recent kernel/libc patch had fixed a bug. I forget the details of the bug, but I do not forget what the customer insisted that we do. They demanded that we reverse the bugfix because their product depended upon the broken behaviour.

Many other customers, along with our test teams and engineering teams, had identified that this particular behaviour was a bug and needed to be fixed. Fixed so that our product behaved according to a particular published specification. Everybody apart from this irate customer wanted us to get the interface correct.

If I get asked a question along these lines, I answer that you cannot depend upon the implementation of any interface, because that's not sustainable for reliable software. If you depend upon particular behaviour from an implementation, then you are at the mercy of the implementor and that way lies increased costs. Don't do it. Or if you absolutely must (yes, there are times when that's the case), then do your utmost to ensure that you get a commitment from the implementation provider that they won't go and change things out from under you without warning. In the Solaris development model, this was a Contract and where we had them we adhered to them - rigorously.

The second question wasn't about Software Architecture at all:

Q2: What are the differences between continuous integration, continuous delivery, and continuous deployment?

The subhead even called that out as #DevOps!

This question is about engineering processes. If you want to turn it into a question which is closer to being about architecture, perhaps you could ask

Please describe your software development quality framework, and how you put it into practice.

Questions three, four and five were all about whether you could recall what certain acronyms stood for (SOLID, BASE and CAP); so, a bit architecture-y. I suggest, however, that the question about SOLID is actually a theoretical computer science issue rather than architecture - at least, architecture as I understand it. It's definitely useful to know and understand SOLID, but unless I'm interviewing for a computer science research position, I bet that you're interested in how I put that knowledge into practice in designing the best architecture for the application at hand.

Question six was back to architecture of sorts, but when you unpack the "Twelve Factor App Methodology" it's implementation (best practices) rather than architecture, and that makes the whole topic one of software engineering rather than architecture.

Yes, there's a difference.

Yes, it matters. (Also, yes, I'll ramble on about this in other posts - later).

Question seven was actual architecture:

Q7: What are Heuristic Exceptions?

This is a very good question, because it asks you to think about how to handle failure in the absence of a controller which tells your sub-process what to do. This is an area of clustered operation implementations which gets a lot of attention. In the clusterware environments that I've worked on, these clusters ran telco switches, billing systems, bank credit card transaction processors and even emergency service dispatch systems. You must get the response and system architecture correct because otherwise people could die.

The final two questions are also pretty good for architecture topics:

Q8: What is shared-nothing architecture? How does it scale? Q9: What does Eventually Consistent mean?

I first came across the shared-nothing architecture while reading a Sun Cluster docbook relating to the IBM Informix agent. At that time (circa 2001) the architecture appeared to be rare when used in the context of Sun Cluster or Veritas cluster. Now, however, it's pretty much the basis for any microservice you might want to spin up with Kubernetes or Docker.

Over the course of 2018 I interviewed 36 candidates for positions in the team I was working with inside Oracle. I asked questions relating to software engineering practice, to personal preferences (vi, emacs or VS code? Do you login as root or as your own user? were just two), knowledge of the standard C library, and just two that related in any way to architecture. For the record, those questions were

Is it better for your code to be correct or to be fast?

followup: can you think of a use-case for the other answer?

and

When you're designing a utility or application which requires the user provide a password, is it acceptable to provide the password as a command-line argument? What risks are there in doing so?

My gripe with this post on dev.to is that it's not just about software architecture, it's also about computer science and software engineering (not a bad thing at all) but most importantly, of the questions presented, the proposed answers for four of them depend more on whether you can regurgitate definitions from a textbook, rather than explain how the theory is usedin practice. If you're going for a position as a computer science academic then, sure, regurgitate the textbook definition. But otherwise, I want to know how you can put SOLID into practice.

Project Lullaby - part 1

jmcp

2019-04-29 22:00

My time at Oracle has come to a close, so I'm going to take this opportunity to ramble a bit about some of the things I've worked on, and one project in particular which I'm particularly proud of.

#begin(ramble)

I started using SunOS when I started university. The university had Sun servers, and both the CS, Maths and Electrical Engineering departments had workstations as well. It was my first hands-on exposure to a UNIX of any sort; my knowledge previously was based on articles from Byte and Dr Dobb's Journal. I spent many hours in the CS labs poking around and exploring. Sometimes I even did my assignments! Over the first summer break the faculty computer unit upgraded every system to Solaris 1 and as you might expect, printing was totally different. We wailed a lot. And got on with figuring out how to use the SysV environment.

When I started work in the university library, there was a SparcStation 5 running Solaris 2.5, and a rather early version of NCSA and then Apache httpd. On moving to another university as a fulltime system administrator I now had SparcServer1000Ds to manage, along with an early fibrechannel array, backups, and disaster recovery. I learnt a lot and when Sun was next hiring support engineers they asked me in for an interview. I was delighted to receive an offer, and .... now it's close to 20 years later and let me count the roles I've had:

front line and back line technical support
fourth level technical support, working escalations and fixing bugs
working in the Solaris MultiPathed IO (MPxIO) stack for Fibre Channel
working in the project team to bring MPxIO to the x86/x64 platform
rewriting the Solaris firmware flash (fwflash) utility to make it modular and support any device which has flashable firmware
Gatekeeper for the OS/Networking consolidation
ISV/IHV liaison for driver development
Project Architect and Lead for Project Lullaby
worked on userspace and filtering support for the Solaris Analytics project

#end(ramble)

#begin(Scratching an itch)

When I started as Gatekeeper for the OS/Networking (ON hearafter) consolidation, the utility we used to build the gate was called nightly. It was run every night, and took all night long to run. A monolithic shell script, it was uninterruptible and (much more importantly) un-restartable. As my colleague Tim noted, it used almost every letter of the alphabet as a command line option - along with a shell script configuration file.

It had to go.

A Silly Bit Of JavaScript

jmcp

2019-04-22 00:00

After many years actively avoiding the issue, I've now accepted the inevitable: I have to learn JavaScript (and node.js too, for that matter). While I've spent many years doing enterprise software development at the boundary of kernel and userspace using C and Python, the software stack above has just been what I consumed rather than created. It's time for that to change.

As it so happens, I'm on gardening leave until COB on the 1st of May. I thought my homepage could do with an update and decided to write a function to display how much time remains:

let secPerDay = 86400;
let enddate = new Date("2019-05-01T07:00:00.000Z");

function timeRemaining() {
    // refers to enddate, which is Date("2019-05-01T07:00:00.000Z")
    let now = new Date();
    if (now >= enddate) {
        return ("Gardening leave has finished, I'm a free agent");
    };
    let diff = (enddate.getTime() - now.getTime())/1000/secPerDay;
    let days = Math.floor(diff);
    let frachours = 24 * (diff - days);
    let hours = Math.floor(frachours);
    let minutes = Math.floor(60 * (frachours - hours));
    return ("Gardening leave ends in " + days + " days, " + hours + " hours, " + minutes + " minutes.");
}

function outputTR() {
    let el = document.getElementById("TextDiv");
    el.innerHTML = "<b>" + timeRemaining() + "</b>";
    el.color = "#000000";
};

window.setInterval("outputTR()", 1000);

Pretty simple, a little bit silly, but does the job. I'm actually excited by it because it's an opportunity for me to rejig my thinking about what my "kernel" is - it's the browser.

As it happens I also picked up a book called Data Visualization: A Practical Introduction so I can see a fair bit of inquiry into government datasets in my future.

Pizza, another family favourite

jmcp

2018-06-16 09:37

Another family favourite recipe is Pizza - made from scratch and baked in our oven on pizza stones. The recipe for the dough is really simple; I got it from "Better Homes & Gardens" magazine years ago, written up by Karen Martini

Ingredients

1 Dry Yeast Sachet (7g)
125ml warm water
250ml cold water
400g White Plain Flour (+ more for dusting and kneading)
100g Semolina
pinch of salt
pinch of sugar
1 tbsp olive oil

Utensils

3L saucepan, with lid
rolling pin
2L mixing bowl
pizza stones
large spoon (for stirring)
kitchen scales
cork mats
large spatula
Ulu
Silicone pastry mat

Process

Fill your kettle, and boil it
place the salt and sugar into the mixing bowl, along with 250ml cold water, and then 125ml of warm water (from the kettle). If the water feels hot when you dip your fingertip in, then allow it to cool for a few minutes. Stir the water, salt and sugar together.
Add the contents of the yeast sachet, swirling it around in the bowl.
Wait for about 5 minutes for the yeast to activate. After you place the yeast in the bowl, it will drop to the bottom. After a few minutes (usually 5) you will notice a bubbling action and observe the now-activated yeast coming to the surface in something akin to a foam.
Gently but firmly stir the flour and semolina into the bowl, until you've got most of the flour-y bits combined into a ball, then tip out onto your bench. Preferably with a

baking sheet which you've lightly dusted with flour
Dust your hands with flour, then work all the remaining bits of flour into the ball, and knead it for about 2 minutes. The dough ball should be glossy.
Wash out your mixing bowl, ensuring that you get all the remaining specs of dough out, then dry the bowl. You might want to re-boil your kettle at this point.
With your 3L pot underneath, pour a little bit of the kettle's hot water onto the outside of the bowl, then fill the pot with about 1-1.5L of the remaining water.
Dribble the olive oil around the inside of the bowl so that you have even coverage.
Place the dough ball in the bowl, and roll it around in the oil so the whole thing is covered in a fine film of oil.
Cover the bowl with clingfilm, then place it on top of the pot and put the lid on.
Allow the dough to rise. This should take about 45 minutes. During this time you should clean your baking sheet and benchtop, and prep any toppings that you want on your pizzas.
Heat the oven to 250C, with the pizza stones on the racks.
Once the dough has doubled in size, it is ready to knead again and roll out to the size of your pizza stones. Tip it out onto your dusted baking sheet (or benchtop), discard the clingfilm, and then knead the dough ball with a little more flour.
At this point we divide the dough into portions by weight. Over the years I've determined that the ideal weight of dough per dough-ball for our pizza stones is 350g. I generally try to get two balls at 350g and then whatever is left is a smaller pizza which I experiment with toppings that the children aren't interested in trying yet.
After dusting the rolling pin with flour, roll the dough ball out on the baking sheet, so that it is about the same size as the pizza stone.
Take your pizza stones out of the oven, and leave all but one in a safe place. This other one, place on the bench on top of cork mats
Transfer the base to the stone.
Apply toppings, and then place into the oven and wait until baked. Repeat for the rest of the dough.
The pizza is properly baked when it has a crispy underside. Check this by sliding a spatula underneath the base while it is still on the stone after the pizza has been in the oven for about 10 minutes. If the pizza is ready, remove from the oven and let it rest.
Once the baked pizza has rested sufficiently (it won't be staggeringly hot to touch), you need to cut it. I use the Ulu for this
Enjoy!

dough on baking sheet, rolled out to size

The pizza dough, all risen and ready to work

Tonight's meatlovers ready to go into the oven

Another meatlover's variant, prior to baking

Replacing your self-signed certificate in svc:/system/identity:cert

jmcp

2018-05-26 16:44

One feature of your freshly installed Solaris 11.4 instance that can fly under the radar is the svc:/system/identity:cert service. This provides you with a system-generated (that's your system, not Oracle) certificate which is self-signed, and which a number of other services depend upon:

$ svcs -D identity:cert
STATE          STIME    FMRI
disabled       Apr_26   svc:/system/rad:remote
online         Apr_26   svc:/system/ca-certificates:default
online         Apr_26   svc:/milestone/self-assembly-complete:default
online         May_03   svc:/system/webui/server:default

By-the-bye, the svc:/system/ca-certificates service helps keep the system copy of Certificate Authority certificates updated.

So what do you do if you want to get past an error like this when you try to access https://127.0.0.1:6787 so you can try out the WebUI?

/images/2018/05/self-signed-cert-error.png

Once you've obtained a CA-signed certificate, it's actually very easy to do:

# SVC=svc:/system/identity:cert
# svccfg -s $SVC setprop certificate/cert/pem_value = astring: "$(cat /path/to/signed/certificate.crt )"
# svccfg -s $SVC setprop certificate/cert/private_key/pem_value = astring: "$(cat /path/to/signed/certificate.key )"
# svccfg -s $SVC setprop certificate/ca/pem_value = astring: "$(cat /path/to/issuer/certificate.crt )"
# svcadm refresh $SVC
# svcadm restart -sr $SVC

Where did the old posts go?

jmcp

2018-05-15 10:00

When I set up my blog in 2006, I chose to use roller, which was the same engine that the now-defunct blogs.sun.com used at the time. Later, having gotten tired of the interface and being more impressed with Wordpress' facility for image galleries, I started running my own instance of that.

After a while, however, the frequent CVEs in both PHP and the Wordpress base+plugin systems got me sufficiently motivated to change that I did.

My friends Shawn and Liane suggested going for a static site generator, so after having a brief look at Pelican and Nikola, I instead chose Hugo.

Hugo's chief attraction was the relative ease with which I could create image galleries, along with the ability to import Wordpress sites. Yay, thought I, I can have a fairly seamless transition, and away we went.

I didn't actually check the site backup that I made before turning off the Wordpress instance, however, and when I went looking for the pkgrepo procedure that I used for building darktable and didn't find the content that I wanted, I was a bit annoyed.

Given that Solaris' packaged version of Go is somewhat behind the community version, and that Hugo depends on a much newer version, this was also the trigger for me to re-explore Pelican and Nikola, both of which are written in Python. After a brief flirtation with Pelican I settled instead on Nikola and did the initial Hugo to Nikola migration fairly easily. Chris pointed me to the gallery directive plugin and I was able to make a start with some of my more recent gallery collections. A quick implementation of a PR for captioned and ordered images got me the rest of the way and then I could get back to the real problem: the missing content.

Fortunately for me, the wayback machine had a copy of the old site entries, and with a quick installation of the wayback machine downloader I was able to grab the whole site as it was up to 2016.

Phew!

Except that all the post files were chock full of Wordpress and roller's javascript and expanded css, which were a real mess to look at, let alone extract the posts from.

So I did what anybody else would do, and wrote some Python using BeautifulSoup to provide a best effort extraction which would translate the html+js+css into the plain-text (and therefore portable) ReStructured Text format. Now since this is a best effort attempt, I'm not too concerned about getting the output as perfect rst which matches my original post, and I went through about 20 different entries to tidy up the input so that the script would produce something close to what I wanted. I knew that I'd have to go and post-process quite a few entries as well, I just wanted to not have to do too much to get that going.

I've converted about 340 posts extracted from the wayback machine archive using this script wp-to-rest.py which took about 2 days to write and finesse, and another day or so to muck around with several posts to get them into better shape (ie, running nikola build doesn't yell at me). There are still a bunch of broken links in there, but at least now I've got all my content back, and can very easily fix things up as I get the inclination.

Solaris Analytics Collections, partitions, slices and operators

jmcp

2018-05-08 10:00

Having poked around with Solaris Analytics, and the WebUI a little, you might have wondered what information or statistics we gather by default. For instance, what are the statistics which we collect to make the Solaris Dashboard sheet useful?

The feature we provide to make this happen is the Collection. Collections give us a handy shorthand for gathering statistics. We ship with several collections:

# sstore list //:class.collection//:collection.name/*
IDENTIFIER
//:class.collection//:collection.name/root/apache-stats
//:class.collection//:collection.name/root/compliance-stat
//:class.collection//:collection.name/root/cpu-stats
//:class.collection//:collection.name/root/network-stats
//:class.collection//:collection.name/root/solaris-dashboard
//:class.collection//:collection.name/root/system

Listing collections is a privileged operation; if I run the command above as myself then I get a very different result:

$ sstore list //:class.collection//:collection.name/*
Warning (//:class.collection//:collection.name/*) - lookup error: no matching collections found

The collection which is enabled by default is //:class.collection//:collection.name/root/system, and you can see what it gathers by running sstore info on it:

# sstore info //:class.collection//:collection.name/root/system
Identifier: //:class.collection//:collection.name/root/system
  ssid: //:class.system//:*
 state: enabled
  uuid: 7a002985-2cf4-4965-adc9-b53116d8ae67
 owner: root
 cname: system
crtime: 1523243338963817

I quite like having the solaris-dashboard and apache-stats collections enabled, and that is really easy to do:

# sstoreadm enable-collection \
    //:class.collection//:collection.name/root/solaris-dashboard \
    //:class.collection//:collection.name/root/apache-stats

One thing I'm always concerned with, since our family media server is, shall we say, homebrew, is whether my disks are doing ok. Fortunately for me, it is very easy to cons up my own collection and stash it in /usr/lib/sstore/metadata/collections:

[
    {
        "$schema": "//:collection",
        "description": "disk-related statistics",
        "enabled": true,
        "id": "disk-stats",
        "ssids": [
            "//:class.disk//:res*//:*"
        ],
        "user": "root"
    }
]

and once you've restarted sstored you can see it like so:

# sstore info -a //:class.collection//:collection.name/root/disk-stats
Identifier: //:class.collection//:collection.name/root/disk-stats
   ssid: //:class.disk//:res*//:*
  state: enabled
   uuid: bee6c5c5-487e-4376-9d91-f4eb933fd64e
  owner: root
  cname: disk-stats
 crtime: 1525373259871426

[Note that you do need to ensure that your collection validates against the collections schema, so run soljsonvalidate /path/to/my/collection.json, and if you need to reformat it, soljsonfmt /path/to/my/collection.json].

So that's useful - now what? How about looking at the illegal requests counter? When you run iostat -En that information is jumbled up with all the other errors and can be a little difficult to distinguish:

$ iostat -En sd0
c2t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD30EFRX-68E Revision: 0A82 Serial No: WD-WCC4N7CNYH0S
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 537 Predictive Failure Analysis: 0 Non-Aligned Writes: 0

With Solaris Analytics, however, we can gather all of those errors together in aggregate and partition them at the same time. This command shows us the most recent data point (the -p -1 argument):

$ sstore export -p -1 "//:class.disk//:res.name/sd0//:stat.errors//:part.type"
TIME                VALUE IDENTIFIER
2018-05-08T19:07:44  //:class.disk//:res.name/sd0//:stat.errors//:part.type
                    device-not-ready: 0.0
                    hard-errors: 0.0
                    illegal-requests: 537.0
                    media-errors: 0.0
                    no-device: 0.0
                    non-aligned-writes: 0.0
                    predictive-failure-analysis: 0.0
                    recoverable: 0.0
                    soft-errors: 0.0
                    transport-errors: 0.0

That's a bit more useful! (Yes, having to use sdN rather than cXtYdZ is a pain, sorry). So... how about just looking for the illegal-requests? That's where we really make use of the partition concept - and let's throw the argument to give a daily total from the start of this month (May 2018):

$ sstore export -t 2018-05-01T00:00:00 -i 86400  "//:class.disk//:res.name/sd0//:stat.errors//:part.type(illegal-requests)"
TIME                VALUE IDENTIFIER
2018-05-01T00:00:00  //:class.disk//:res.name/sd0//:stat.errors//:part.type(illegal-requests)
                    illegal-requests: 27.0
2018-05-02T00:00:00  //:class.disk//:res.name/sd0//:stat.errors//:part.type(illegal-requests)
                    illegal-requests: 27.0
2018-05-03T00:00:00  //:class.disk//:res.name/sd0//:stat.errors//:part.type(illegal-requests)
                    illegal-requests: 27.0
2018-05-04T00:00:00  //:class.disk//:res.name/sd0//:stat.errors//:part.type(illegal-requests)
                    illegal-requests: 79.0
2018-05-05T00:00:00  //:class.disk//:res.name/sd0//:stat.errors//:part.type(illegal-requests)
                    illegal-requests: 174.0
2018-05-06T00:00:00  //:class.disk//:res.name/sd0//:stat.errors//:part.type(illegal-requests)
                    illegal-requests: 270.0
2018-05-07T00:00:00  //:class.disk//:res.name/sd0//:stat.errors//:part.type(illegal-requests)
                    illegal-requests: 365.02
2018-05-08T00:00:00  //:class.disk//:res.name/sd0//:stat.errors//:part.type(illegal-requests)
                    illegal-requests: 461.0

Much more useful - and observe that because we're using () to extract the partition element, we need to quote the argument so the shell doesn't get snippy with us.

To finish this post, let's take a look at two more really useful features, slices and operators. One operator that I'm particularly happy with is //:op.changed, which shows you when a statistic value changed. While not particularly useful for volatile statistics on a per-second basis (watch //:class.system//:stat.virtual-memory for a few minutes and you'll see what I mean) if you aggregate such stats over a longer time period, such as a day, you can get a better understanding what that stat is doing. So, with disk errors again, but on a daily basis (-i 86400) from the start of this month (-t 2018-05-01T00:00:00):

$ sstore export -t 2018-05-01T00:00:00 -i 86400  "//:class.disk//:res.name/sd0//:stat.errors//:op.changed"
TIME                VALUE IDENTIFIER
2018-05-01T00:00:00 27.0 //:class.disk//:res.name/sd0//:stat.errors//:op.changed
2018-05-04T00:00:00 79.0 //:class.disk//:res.name/sd0//:stat.errors//:op.changed
2018-05-05T00:00:00 174.0 //:class.disk//:res.name/sd0//:stat.errors//:op.changed
2018-05-06T00:00:00 270.0 //:class.disk//:res.name/sd0//:stat.errors//:op.changed
2018-05-07T00:00:00 365.0 //:class.disk//:res.name/sd0//:stat.errors//:op.changed
2018-05-08T00:00:00 461.0 //:class.disk//:res.name/sd0//:stat.errors//:op.changed

Finally, slices. These are //:s.[....] and you enter the statistic names which you wish to extract inside the brackets - and once again I'm using the //:op.changed to constrain the output:

$ sstore export -i 86400  //:class.disk//:res.name/sd//:s.[0,28]//:stat.errors//:op.changed //:class.disk//:res.name/sd//:s.[0,28]//:stat.//:s.[vendor,serial-number]//:op.changed
TIME                VALUE IDENTIFIER
1970-01-01T10:00:00 27.0 //:class.disk//:res.name/sd0//:stat.errors//:op.changed
2018-05-04T10:00:00 119.0 //:class.disk//:res.name/sd0//:stat.errors//:op.changed
2018-05-05T10:00:00 214.0 //:class.disk//:res.name/sd0//:stat.errors//:op.changed
2018-05-06T10:00:00 309.0 //:class.disk//:res.name/sd0//:stat.errors//:op.changed
2018-05-07T10:00:00 405.0 //:class.disk//:res.name/sd0//:stat.errors//:op.changed
2018-05-08T10:00:00 500.0 //:class.disk//:res.name/sd0//:stat.errors//:op.changed
1970-01-01T10:00:00 169.0 //:class.disk//:res.name/sd28//:stat.errors//:op.changed
2018-05-04T10:00:00 120.0 //:class.disk//:res.name/sd28//:stat.errors//:op.changed
2018-05-05T10:00:00 215.0 //:class.disk//:res.name/sd28//:stat.errors//:op.changed
2018-05-06T10:00:00 310.0 //:class.disk//:res.name/sd28//:stat.errors//:op.changed
2018-05-07T10:00:00 406.0 //:class.disk//:res.name/sd28//:stat.errors//:op.changed
2018-05-08T10:00:00 501.0 //:class.disk//:res.name/sd28//:stat.errors//:op.changed
1970-01-01T10:00:00 ATA      //:class.disk//:res.name/sd0//:stat.vendor//:op.changed
1970-01-01T10:00:00 Z1D5K89L //:class.disk//:res.name/sd0//:stat.serial-number//:op.changed
2018-05-04T10:00:00 WD-WCC4N7CNYH0S //:class.disk//:res.name/sd0//:stat.serial-number//:op.changed
1970-01-01T10:00:00 ATA      //:class.disk//:res.name/sd28//:stat.vendor//:op.changed
1970-01-01T10:00:00  //:class.disk//:res.name/sd28//:stat.serial-number//:op.changed

For more information about operators, slices and partitions, have a read of ssid-op (aka ssid-op(7)).

Tune in next time when I'll guide you through the process of using a proper certificate for your WebUI instance, rather than the default self-signed certificate.

Chicken Roulade

jmcp

2018-04-08 02:00

I've been wanting to write up my recipe list for a while now, and starting to use a new blog engine is as good a reason as any to finally start doing so.

Ingredients

2 large chicken breasts (about 350-450g each)
100g baby spinach leaves
80-100g sun-dried tomato strips
50ml canola or grapeseed oil (for frying)

Utensils

2x 80cm cooking twine
clingfilm
meat tenderiser mallet
spatula
tongs
large chopping board
frying pan, medium-large diameter
baking tray.

Process

Turn your oven to 180degC. The rest of the steps take about 15 minutes.
Tie a 5-6cm loop in one end of each piece of twine. You're doing this now because at the point when you really need the loop, your hands will be sticky with chicken and sun-dried tomato.
Fan out the babay spinach leaves onto a plate.
Place the sun-dried tomato strips on a plate.
Place the chicken breasts on your board, and cover them and the board with clingfilm. It doesn't need to be tightly wrapped, but you should ensure you have a wide margin around each side.
With your meat tenderising mallet, whack the chicken breasts until they are about 1cm thick.
Remove the clingfilm and discard.
On each chicken breast, place a layer of the baby spinach leaves. You want to cover it, but not too thickly.
On top of the layer of baby spinach leaves, spread the sun-dried tomato strips on one half.
This is the tricky bit - and why you're glad you put the loop in the twine earlier!
Carefully roll one chicken breast over, to make a sort of a sausage. Take the loop end of one piece of twine, place it at one end of the sausage, thread the other end through and then wrap the length around the rest. Fold the end underneath another part of the string so that it is reasonably tight. Repeat for the other chicken breast.
Heat your frying pan to searing temperature and then add in the oil.
When the oil is hot enough, place the roulades in the pan and let them sear - but do not let them cook.
With your spatula, unstick the roulades, then with the tongs give a 1/3 or 1/4 turn and keep on searing. Repeat until each roulade is seared all over.
Transfer to the baking tray and then place into the oven, on a middle rack. Cook at 180degC for 23 minutes.
At the 23 minute mark, remove from the oven and check that they have cooked through using a meat thermometer. If they have, then rest for 5 minutes before carving. If they have not, then return to the oven for another 3-4 minutes.
Slice into about 1cm thick portions, and serve.