Posts for year 2021

Interview questions - A Complaint

A few days ago I tweeted

/images/2021/my-original-tweet.png

Alt text:

Somebody asked (last ~week) something similar to "what questions do you always ask in an #software #engineering #interview?"

The response I saw (but failed to hit like on for bookmarking purposes) was "You hit X on the keyboard. What happens next?"

I was annoyed by this response for several reasons. So annoyed, in fact, that while we were away on a long-planned family holiday I lay awake one night thinking of the many ways in which one could respond.

The "always ask" really got to me. Why is that particular question something that the interviewer always asks? Is it appropriate for every role? At what level and in how much detail do they expect the candidate to respond?

Thinking back to my very first encounter with a computer, it was an Apple //e at my primary school in the early 1980s. You know the one; its user reference manual came with the complete circuit schematics and AppleBasic listing. I don't recall exactly how Apple handled the keyboard interface, but I'll punt and suggest that everything was hardwired....

And it was. I found the Apple //e Hardware Schematics! If you piece together page 4, page 2 and the keyboard circuit schematic, you can see that J17 is where the keyboard ribbon cable connects to the motherboard, and that the keyboard really is a row+column lookup. Very hard-wired.

/images/2021/apple-iie-actual-circuit-board-detail.png

/images/2021/apple-iie-keyboard-schematic-detail.png

From the keyboard circuit schematic we see that key 'X' is on (20, 4) which maps to Y2 (pin 19 in UE14) and X3 (pin 31). The markings on UE14 are for the GI (General Instruments) AY-5-3600-PRO, which is a Keyboard Encoder.

From UE14 we go through UE12 (the keyboard ROM), which looks up the specific code for 'X', pushes that onto the main data bus and sends an interrupt to the 6502B CPU (see the page 1 schematic). Then the relevant display function in ROM is invoked to push the actual "how to render this character" instructions into the video (NTSC or PAL) controller chip and thence to the physical display hardware.
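
If it helps to make the row+column idea concrete, here's a tiny Python sketch of the principle. The positions and characters below are invented for illustration only, not lifted from the AY-5-3600 datasheet:

# Sketch of a keyboard matrix lookup. The encoder strobes the row (Y)
# lines and senses the column (X) lines; the (row, column) pair of the
# closed switch indexes a table that yields the key's code.
# These values are purely illustrative.
KEY_MATRIX = {
    (2, 3): "X",    # e.g. the switch on Y2/X3
    (2, 4): "C",
    (1, 5): "Q",
}

def encode_keypress(row, column):
    """Return the character for the closed switch, or None if unknown."""
    return KEY_MATRIX.get((row, column))

print(encode_keypress(2, 3))    # -> X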

By the way, you might be wondering what UE14 actually means. The 'U' means that the component is an integrated circuit, and the E14 is a row+column grid reference, so you know where to look for it on the physical board.

/images/2021/apple-iie-row-column-layout.png

It's somewhat obscured, but on the left of the PCB you can see a D and an E, while along the bottom you can see the numbers 1-15. This is a different PCB to what's in the schematics, because you can see that the 6502 is at E1 (so it would be UE1 on the schematic) while in the schematics I've found (see page 1) it's UC4.

Later, there were PCs with keyboards connected by a cable, and as it happens there was an Intel 8042 inside as the microcontroller - which did pretty much the same thing as the entire Apple //e, except that the interrupt was generated by the Intel 8042 and sent out along the cable to the keyboard port on the main motherboard.... where again an interrupt was generated and all the other lookups occurred prior to sending out to the display.

That's all well and good, but how about a more complicated system, like one of the many UNIXes such as Solaris? Or how about a minicomputer with hardwired terminals, like a PDP-11? Those hardwired terminals essentially used the same principles described above, but there was a much more complicated kernel to push the data into. They also had local display buffer storage, so the display wasn't something the kernel needed to worry about. What the kernel was interested in (please don't anthropomorphise computers, they hate it) was sending the keystrokes to the correct process.




Some Pythonic Kafka stuff

I've been actively interviewing over the last few months, and was recently talking with a cloud streaming provider about a Staff Engineer SRE role specializing in Apache Kafka. I've been doing a lot of work in that space for $employer and quite like it as a data streaming solution.

The interviews went well, and they sent me a do-at-home challenge with a one week timeout.

The gist of it was that I had to create a Flask app which allowed the user to enter a URL to monitor for status at a given frequency, write a Kafka Producer in Python to publish this data to a topic, and write a Kafka Consumer to read from the topic and insert the results into a PostgreSQL database.

Fun!

I did a brief investigation into using threads within a Flask app for the monitoring code, but quickly decided that a better architecture would be to do the monitoring via a separate daemon - separation of concerns, to allow for easier maintenance. Suddenly I'm all about ongoing maintenance rather than how quickly I can deliver a new feature... Hmmm.

The next step was to sketch out the table schema I wanted in the database:



    CREATE SEQUENCE public.urltargets_seq
            INCREMENT BY 1
            MINVALUE 1
            MAXVALUE 9223372036854775807
            START 1
            CACHE 1
            NO CYCLE;

    CREATE SEQUENCE public.monitor_results_seq
            INCREMENT BY 1
            MINVALUE 1
            MAXVALUE 9223372036854775807
            START 1
            CACHE 1
            NO CYCLE;

    CREATE TABLE IF NOT EXISTS urltargets (
            urltargets_pk       int4 NOT NULL DEFAULT nextval('urltargets_seq'::regclass),
            urltarget           varchar(1024) NOT NULL,
            monitor_frequency   int NOT NULL CHECK (monitor_frequency IN (1, 2, 5, 10, 20, 30)),
            CONSTRAINT urltargets_pkey PRIMARY KEY (urltargets_pk)
    );

    CREATE TABLE IF NOT EXISTS monitor_results (
            monitor_results_pk  int4 NOT NULL DEFAULT nextval('monitor_results_seq'::regclass),
            http_status         int NOT NULL,
            start_time          timestamp with time zone NOT NULL,
            duration            int4 NOT NULL,
            urltarget_fk        int4 NOT NULL,
            CONSTRAINT monitor_results_fk_fkey FOREIGN KEY (urltarget_fk) REFERENCES urltargets(urltargets_pk)
    );


Having decided that I would offer monitoring frequencies of 1, 2, 5, 10, 20 and 30 minutes, I created views for the Flask app to use as well, rather than direct queries. They all look like this, with other values substituted in as you would expect.

    CREATE OR REPLACE VIEW view_1m AS
            SELECT mr.monitor_results_pk
                 , ut.urltarget
                 , mr.http_status
                 , mr.start_time
                 , mr.duration
            FROM monitor_results mr
            JOIN urltargets ut ON ut.urltargets_pk = mr.urltarget_fk
            WHERE ut.monitor_frequency = 1;



Since I really like seeing schemata visualised, I created a nice(ish) ERD as well:

/images/2021/DB-ERD.png

Well, that was straightforward - how about the application and daemon?


I split out the setup functionality into a separate file importable by the Flask app, the monitor daemon and the consumer. This contained the database connection, Kafka Producer and Kafka Consumer code. There's an interesting little niggle in the Kafka Producer setup which is not immediately obvious and required a bit of digging in StackOverflow as well as enabling debug output with librdkafka:

  # confluent-kafka is the Python wrapper around librdkafka; it provides
  # Producer and KafkaError (imports for the DB and Consumer code omitted here)
  from confluent_kafka import Producer, KafkaError

  def _get_kafka_configuration():
      """
      Common function to retrieve the Kafka configuration.
      """
      # appconfig is this module's parsed application configuration,
      # loaded elsewhere during setup
      global appconfig
      configuration = {
          "bootstrap.servers": appconfig["kafka"]["broker"],
          "group.id": "website-monitor",
          "ssl.key.location": appconfig["kafka"]["keyfile"],
          "ssl.certificate.location": appconfig["kafka"]["certfile"],
          "ssl.ca.location": appconfig["kafka"]["cafile"],
          "security.protocol": "SSL",
          # "debug": "eos, broker, admin",  # re-enable if needed
          "transaction.timeout.ms": 60000,
          "enable.idempotence": True
      }
      return configuration


  def setup_kafka_producer(view):
      """
      Creates the connection to our Kafka brokers so we can publish
      messages to the topics we want. We take the {view} argument so
      that we can avoid blatting multiple producers together and getting
      errors from the broker about zombies and fencing. See
      https://stackoverflow.com/questions/50335227/how-to-pick-a-kafka-transaction-id
      for more details.
      Return: a Producer
      """
      configuration = _get_kafka_configuration()
      configuration["transactional.id"] = "website-monitor" + str(view)
      kafkaProducer = Producer(configuration)
      try:
          kafkaProducer.init_transactions()
      except KafkaError as ke:
          # If we can't do this, then we have to quit
          print(f"""Producer failed to init_transactions(), throwing {ke}""")
          raise
      return kafkaProducer


When I was working with the daemon, my first iteration tried opening the DB connection and Producer in each thread's (one for each frequency) __init__() function, and .... that didn't work.

The DB connection is not picklable, so does _not_ survive the call to os.fork(). Once I had rewritten the setup and run methods to get the DB connection, that part was groovy.

The Kafka Producer still required a bit of work. After reading through StackOverflow and the upstream librdkafka code, I saw that I needed to similarly delay initialising the producer until the thread's run() method was called. I also observed that each Producer should initialise the transaction feature, but leave the begin...end of the transaction to when it was called to publish a message.
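
In outline, the pattern I ended up with looks something like the sketch below. The class name and constructor arguments here are invented for illustration; setup_db() and setup_kafka_producer() are the helpers from the shared setup module, and the daemon's real run() method is shown a little further down:

  from threading import Thread

  class FrequencyMonitor(Thread):
      """
      Sketch: one worker per monitoring frequency. The DB connection
      and the Kafka Producer are NOT created in __init__(); they are
      deferred until run() is executing in the worker itself.
      """
      def __init__(self, view, sleeptime):
          super().__init__(daemon=True)
          self._view = view            # the monitoring frequency (1, 2, 5, ...)
          self._sleeptime = sleeptime
          self._conn = None            # deferred until run()
          self._producer = None        # deferred until run()

      def run(self):
          # Now that we're inside the worker, these are safe to create
          self._conn = setup_db()
          self._producer = setup_kafka_producer(self._view)
          # ... the monitoring loop goes here ...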

I still had a problem, though - some transactions would get through, but then the Producer would be fenced. This was the niggle, and where the StackOverflow comments helped me out:

Finally, in distributed environments, applications will crash or —worse!— temporarily lose connectivity to the rest of the system. Typically, new instances are automatically started to replace the ones which were deemed lost. Through this process, we may have multiple instances processing the same input topics and writing to the same output topics, causing duplicate outputs and violating the exactly once processing semantics.

We call this the problem of “zombie instances.” [emphasis added]


I realised that I was giving the same transactional id to each of the six producer instances (setting the 'transactional.id' in the configuration dict generated by _get_kafka_configuration()), so I needed to uniqify them somehow. I decided to pass the monitoring frequency of the thread to the setup function, and ... booyah, I had messages being published.

That was a really nice feeling.


There is one other aspect of the monitoring daemon that I need to mention. Since each thread reads its list of URLs to monitor each time it wakes, I wanted to parallelize this effort. Monitoring each of the URLs in series could easily take too long from a sleep(...) point of view, and I really did not want to just fork a whole heap of processes or threads either, to avoid the potential for a fork-bomb.

To work around this I used the Python standard library's concurrent.futures, submitting each target URL to a ThreadPoolExecutor. Adding attributes to the future object enabled me to use an add_done_callback so that when the future crystallized it would then publish the message.


  def run(self):
      """
      Runs the monitor, updates account-keeping and kicks off
      notifications if required. Then back to sleep.
      """
      self._producer = setup_kafka_producer(self._view)
      self._conn = setup_db()
      self._cursor = self._conn.cursor()
      while True:
          alltargets = self._get_targets()
          if alltargets:
              # We use a 'with' statement to ensure threads in the pool
              # are cleaned up promptly
              with cf.ThreadPoolExecutor(max_workers=50) as executor:
                  for tgt in alltargets:
                      future = executor.submit(check_url,
                                               tgt[0], tgt[1])
                      future.args = (tgt[0], tgt[1])
                      future.producer = self._producer
                      future.add_done_callback(construct_and_publish)
                      self._futures.append(future)
                  for future in cf.as_completed(self._futures):
                      if future.done():
                          self._futures.remove(future)
          sleep(self._sleeptime)

The check and publish methods are outside of the thread definition:


  def construct_and_publish(input):
      """
      Callback function for the concurrent.future that each thread
      makes use of to query a website. Turns the future's attributes
      into a message for the 'url-monitor-results' topic, then publishes
      that message to the topic.
      """
      if input.cancelled() or input.exception():
          errmsg = """Monitor attempt for {args} failed"""
          print(errmsg.format(args=input.args),
                file=stderr, flush=True)
      else:
          # msgFields (defined elsewhere in this module) names each element
          # of check_url()'s return tuple, in order
          message = json.dumps(dict(zip(msgFields, input.result())))
          input.producer.begin_transaction()
          publish_message(producer=input.producer,
                          topic="url-monitor-results",
                          message=message.encode('utf-8'))
          input.producer.commit_transaction()


  def check_url(url, fk):
      """
      Performs an 'HTTP GET' of the supplied url and returns a tuple containing
      (fk, start_time, duration, http_status).
      The start_time is expressed in seconds since the UNIX Epoch.
      """
      start_time = datetime.timestamp(datetime.now())
      result = requests.get(url)
      duration = datetime.timestamp(datetime.now()) - start_time
      return (fk, start_time, duration, result.status_code)

With the monitoring daemon written, I now needed the other end of the pipe - writing the Kafka Consumer to read from the topic and insert into the database. This was straightforward: we poll for messages on both configured topics; when we read one, we write it to the appropriate DB table using a prepared statement, commit, and then do it all again in a while loop.


  urlToMonitorStmt = "INSERT INTO urltargets (urltarget, monitor_frequency "
  urlToMonitorStmt += "VALUES (%(urltarget)s, %(monitor_frequency)s)"
  urlMonitorResultsStmt = "INSERT INTO monitor_results (http_status, "
  urlMonitorResultsStmt += "urltarget_fk, start_time, duration) "
  urlMonitorResultsStmt += "VALUES (%(http_status)s, %(targetId)s, "
  urlMonitorResultsStmt += "to_timestamp(%(start_time)s), %(duration)s)"
  lookups = {
      "url-to-monitor": urlToMonitorStmt,
      "url-monitor-results": urlMonitorResultsStmt
  }
  if __name__ == "__main__":
      consumer = setup_kafka_consumer()
      connection = setup_db()
      while True:
          with connection.cursor() as curs:
              msglist = consumer.consume(200)
              for msg in msglist:
                  if not msg:
                      continue
                  elif msg.error():
                      print("Received error during poll: {error}".format(
                          error=msg.error()))
                  else:
                      stmt = lookups[msg.topic()]
                      values = json.loads(msg.value().decode('utf-8'))
                      curs.execute(stmt, values)
              curs.execute("COMMIT")
      connection.close()

Of course there should be error handling for the execute(). There should also be packaging and tests and templates for the report. Do not @ me, etc etc.

The reason why all these pieces are missing is because the day before I was due to hand this assignment in to my interviewer, I received a very, very nice offer from another company that I'd also been interviewing with - and I accepted it.




An unexpected live coding challenge

A few weeks ago I was in a technical interview, and was asked to do a live coding challenge. I was surprised, because this is the sort of thing that I expect a recruiter and hiring manager to mention ahead of time. Preparation is very important, and while I know that there are many people for whom live coding is a thrill, there are plenty of other people for whom it can be a terrifying experience.

I'm glad to say that I'm not terrified by it, but it's definitely not an ideal environment for me.

So after a few minutes of me describing what I'd done in my career (it seemed pretty clear that the interviewer hadn't actually read my resume), and a few technical questions, we got into the challenge.

"""
For a given string composed of parenthesis ("(", "{", "["), check if the string is valid parenthesis.
1. "()" -- valid
2. "({})" -- valid
3. "(}{)" -- invalid
4. "{()}[{}]" -- valid
5. "({(}))" -- invalid
"""

I noted immediately that this is an issue which requires the processing function to track state, because you not only need to determine open and closed pairings, but also what type each one is.

It took a minute to sketch out the classifications that I needed, talking through my decision process all the while:

OPENS = ["(", "{", "["]
CLOSES = [")", "}", "]"]

braces = [ "{", "}"]
parens = [ "(", ")"]
brackets = [ "[", "]"]

classes = { "braces": braces,
            "parens": parens,
            "brackets": brackets
}

I was able to stub out a check function pretty quickly, but got stuck when I went from the stub to implementation, because I realised that I needed to keep track of what the previous element in the string was.

Oh no! How do I do that? (A stack, btw)

Mental blank :(

I needed time to jog my memory, so I asked the interviewer to tell me about himself, what he does on the team and a few other questions.

This, I think, was a very good decision - with the focus of the interview not on me, I could step back and think about what basic data types in Python I could use to implement a stack.

The data type I needed is indeed pretty basic: a list().

A Python list() lets you push (the append() operation) and pop, so with the addition of another data structure

counts = { "braces": 0,
           "parens": 0,
           "brackets": 0
}

and a short function to return the class of the element

def __classof(c):
    """ returns whether 'c' is in brackets, braces or parens """
    if c in braces:
        return "braces"
    elif c in brackets:
        return "brackets"
    else:
        return "parens"

we're now in a much better position for the algorithm.

By this time I had also calmed myself down, because everything came together pretty easily for me.

With the above code blocks already noted, here is the body of the function:

def check_valid(input):
    """ For the given examples at top, determine validity.
        Assumption: the input is _only_ braces, parens and brackets
    """

    # start
    c = input[0]

    stack = list()
    stack.append(c)

    counts[__classof(c)] += 1

    for c in input[1:]:
        if c in OPENS:
            ## increment count & add to stack
            stack.append(c)
            counts[__classof(c)] += 1
        else:
            ## closing checks
            if __classof(c) !=  __classof(stack[-1]):
                return "invalid"
            else:
                # decrement count_ (__classof(c))
                counts[__classof(c)] -= 1
                stack.pop()

    return "valid"

We're playing fast and loose here with input validity checking - there's no "is this an empty string?" and we're not handling a single-character string, let alone validating that our input only contains braces, parens and brackets.

With this main() function, though, we're doing pretty well:

## main
strings = ["""()""",
           """({})""",
           """(}{)""",
           """{()}[{}]""",
           """({(}))""",
           """](){}"""
           ]

for element in strings:
    print("""{element:20}""".format(element=element),
          check_valid(element))

which gives us the following output:

()                   valid
({})                 valid
(}{)                 invalid
{()}[{}]             valid
({(}))               invalid
](){}                valid

Using the criteria specified, the final case should in fact be invalid, given that it starts with a closing rather than an opening element - there's nothing for it to balance against. However, at that point my time was up and I didn't worry about it.
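
For completeness (and definitely not what I wrote under the clock), handling that case only needs the stack itself plus a couple of extra guards. Here's a sketch that reuses the same OPENS, CLOSES and __classof() definitions as above:

def check_valid_v2(input):
    """ Stack-only version: also rejects a leading closer, unexpected
        characters, and any openers left unclosed at the end. """
    stack = []
    for c in input:
        if c in OPENS:
            stack.append(c)
        elif c in CLOSES:
            # a closer with nothing open, or of the wrong class, is invalid
            if not stack or __classof(c) != __classof(stack[-1]):
                return "invalid"
            stack.pop()
        else:
            return "invalid"
    # anything still on the stack was never closed
    return "valid" if not stack else "invalid"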

My interviewer then asked whether I had considered using recursion to solve the problem.

I hadn't considered recursion because I generally don't have to for the use-cases I need to write - and in this particular problem space it didn't seem to me to be an effective use of resources.

Consider the longest case, {()}[{}]. If you're recursing on the check function, then you'll wind up calling the function four times, so that's four new stack frames to be created and destroyed. That doesn't strike me as particularly efficient in time or space. Iterating over the input, however, avoids all of the setup + teardown overhead.

Anyway, it was a relatively fun exercise, and I'm glad I did it. I was able to keep a cool head and buy myself enough time to jog my memory and finish the problem, and it worked the first time (I know, that _never_ happens!).

For future encounters like this, I think it's handy to remember these points:

  1. Breathe

  2. Talk through what you are doing, and why

  3. If you hit a problem, see point 1, and then say that you're stuck and need to think through a particular part of the issue.

  4. If you need to stop talking so you can think, say that that's what you need to do.

It is my impression that if your interviewer is a decent person, they will help you work through your point of stuckness so that you can complete as much as possible of the task.




Why do I see "Duplicate main class"?

I've recently started work on improving my skills and knowledge in the Java ecosystem, and while working on a previous post I burned several hours trying to work out why I was seeing this error:

[ERROR] ..../bearertokenCLI.java:[42,1] duplicate class.... bearer_token_cli.bearerTokenCLI

I didn't find the answers at StackOverflow to be very useful, because they invariably said something along the lines of "clean your project and let the IDE re-index things, it'll be fine".

Which is not a solution - it's like "curing" a memory leak by rebooting the host. I like to know the why of a problem.

I eventually re-re-read the message from the Maven compiler plugin and noticed that it was trying to compile 2 source files. For an exploratory project which only had one source file, this was unexpected:

[INFO] --- maven-compiler-plugin:3.8.1:compile (default-compile) @ bearer_token_cli ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 2 source files to /home/jmcp/IdeaProjects/bearer-token-cli/target/classes

Why did I now have two files? The answer lies in a bit of laziness on my part. The previous post had hard-coded credentials and URLs, but I really wanted to start using a getopt()-like library called picocli, and rather than git commit ... I just copied my first version of the source to bearerTokenCLI-hc-v1.java and kept on editing.

Apart from the relevant information (2 source files) being in an [INFO] block rather than in the [ERROR] block, why on earth couldn't it have printed the names of the files it was compiling along the way?

If you come across this error, a quick

$ find src/main/java -name \*.java |xargs grep -i "public static void main"

should help you find where that erroneous main class is hiding.

Here's where I get a bit ranty. One of the patterns that we invented and applied to the Solaris OS/Net (ON) source tree during the development of #ProjectLullaby was that for every subdirectory which contained source files, the Makefile specified each file individually.

There are solid reasons for this, starting with the need to ensure that when you build part or all of the tree, no dependencies are missed. Over the years there were enough instances of "developer changes code, adds new file or renames/splits old file, FAILS TO CHECK IN NEW FILES, breaks build after integration" that we forced specificity. You want to add, rename or remove files that the build depends on? It's ON YOU to make sure we're tracking them. A broken build meant either a followup changeset (with "(add missing file)" etc appended to the comment), or getting backed out.

While I'm enjoying some aspects of developing in Java and I do like leaving a lot of heavy lifting to a framework or a toolset, the heavy reliance in the Java world on an IDE to do thinking for you leaves me cold.




Queensland's 2011 floods

It's now ten years since we experienced the Queensland floods of December 2010-January 2011. I took quite a few photos around the Centenary Suburbs and put some of them into a Twitter thread last week. I've put those and many more together into an album for the record.

For our part, we got off lightly. The waters came to within about 1km of our home, and while Energex shut down the West Darra substation at 1pm on the day the waters rose on our part of the river, power was back on again 24 hours later. J was pregnant with #Child2 so the lack of air movement during an incredibly hot and humid night was very draining. But that was it for us. Many people were a lot more affected; the Mud Army helped with cleanup and it was heartbreaking to see just how many homes were damaged.