Solving Lattice Path

SPOILER: if you want to or are working on the exercises from Project Euler, you should stop reading now!

According to Wikipedia, Project Euler is a website dedicated to a series of computational problems intended to be solved with computer programs [1].


The problems are mostly math related and one main characteristic of most problems is that they are usually hard to solve through brute force because they usually have polynomial/exponential complexity and questions always ask for very large input or number of inputs.

Problem 15, Lattice Paths, is one that I found really interesting because it's a geometric problem and with an intuitive solution. Let me copy here the text from the problem.

Starting in the top left corner of a 2×2 grid, and only being able to move to the right and down, there are exactly 6 routes to the bottom right corner.
How many such routes are there through a 20×20 grid?

So my first reaction was: I'll just do a brute force algorithm and see what happens!

I wrote an algorithm that uses recursion to find all possible paths. This is how it ended up:

 class LatticePath
   def initialize(size)
     @size = size

   def total
     moves(0, 0)


   def moves(rows, cols)
     return 1 if rows == @size && cols == @size
     return 0 if rows > @size || cols > @size
     return moves(rows + 1, cols) + moves(rows, cols + 1)

 sides = (ARGV[0] && ARGV[0].to_i) || 20 # default value if none specified
 lattice_path =

Well, how bad could it be? Turn outs very bad.

Running this for less than 13 sides was OK. Running this for 14 sides was enough to take quite some time. Running for more than 14, it just takes too long. I suppose it would take years for 20 sides. It obviously explodes in complexity very quickly.

Looking up for ways to solve this or similar problems, I ended up watching this video from Khan Academy where Salman basically gives it away!

Path Counting Brain Teaser

I will not try to re-explain what's in the video so you should watch it first. I'll follow from where the video ends, and will develop an algorithm that only requires an array of size equal to one _side_ of the matrix to solve this problem.

Basically after filling in values for each cell, what we get is an matrix like this:


And looking more carefully, we can see that this matrix is symmetric. The coloring I used on the pictures should make that clear. We are interested in the value on the main diagonal so we can remove everything below it.


The next step was to recognize that each cell depends only on the cell to the left and above it. The element in the diagonal is always the element above it multiplied by 2 and the other elements in that row are the sum of the element in the cell to the left with the element in the cell above them. The picture below should make it clearer.


Now even more interesting is observing that every element only depends on the elements exactly above them and once a new row is calculated from the row above it, the former is not necessary anymore. So in the end the whole problem can be solved by a simple array of size equal to the number of cell on one side of the square matrix!

And here's my optimized solution to the problem:

 #!/usr/bin/env ruby

 def lattice_path(sides)
   a = { 1 }

   1.upto(sides - 1) do |col|
     a[col] *= 2
     (col + 1).upto(sides - 1) do |n|
       a[n] = a[n-1] + a[n]

   a[sides - 1]

 sides = (ARGV[0] && ARGV[0].to_i) || 20 # default value if none specified
 puts lattice_path(sides + 1)

Have fun!

Table-driven state machines

Some weeks ago I was writing a protocol stack for a serial RFID reader for a commercial project I'm working on. Apart from one command, all the others are standard serial communication where you send a fixed data structure and receive a fixed data structure. But there's this command that does inventory, which means, get all the RFID tags in the area. You request the data and it starts sending many different packets for each tag.

The response format and size can vary but it always starts with a fixed header which can be "RITM", "BITM", "EITM" or "IITM". So I basically wrote a state machine do wait for one of these possible headers to start reading the remaining data (size depends on the header type).

To make it easier I draw the following flowchart:


I guess not many people will be impressed by my great drawing skills! Btw, if anyone can suggest me a good flow charting software for free or low-cost I'd be thankful.

The only missing edges on this diagram are the ones that connect all states back to state S0 when an invalid character is received.

Rudimentary state machine code

The code I initially wrote to solve the problem is shown below:

uint8_t rxbuf[64];
int i;
uint8_t b;
uint8_t idx;

#define S0 0
#define S1 1
#define S2 2
#define S3 3
#define S4 4

while (1) {
    uint8_t state = S0;
    idx = 0;

    do {
        err = serial_read(&rxbuf[idx], 1);
        if (err != 1) continue;

        switch (state) {
        case S0:
            b = rxbuf[idx];
            if (isvalid(b)) {
                state = S1;
                idx = 1;
            } else {
                idx = 0;
        case S1:
            b = rxbuf[idx];
            if (b == 'I') {
                state = S2;
                idx = 2;
            } else if (isvalid(b)) {
                state = S1;
                rxbuf[idx-1] = rxbuf[idx];
                idx = 1;
            } else {
                state = S0;
                idx = 0;
        case S2:
            b = rxbuf[idx];
            if (b == 'T') {
                state = S3;
                idx = 3;
            } else if (isvalid(b)) {
                state = S1;
                rxbuf[0] = rxbuf[idx];
                idx = 1;
            } else {
                state = S0;
                idx = 0;
        case S3:
            b = rxbuf[idx];
            if (b == 'M') {
                state = S4;
            } else if (isvalid(b)) {
                state = S1;
                rxbuf[0] = rxbuf[idx];
                idx = 1;
            } else {
                state = S0;
                idx = 0;
    } while (state != S4);

The function isvalid tests if the character received is a valid starting character for the header strings know.

static uint8_t isvalid(uint8_t b)
    return (b == 'R' || b == 'B' || b == 'E' || b == 'I');

It worked as expect but it should be clear that it's messy and has a spaghetti structure which could quickly grow into unmaintainable code. The problem is having a case for each state and inside it a nested if/elseif test for the received character.

Towards a better state machine structure

While reading Programming Game AI by Example (great book!) I stumbled across some designs for state machines and in particular one table-driven which doesn't require object orientation and polymorphism. Although the book only gives a rough example, it is clear enough. I previously saw a very similar approach used in lexers where a given input regular expression generates a table-driven state machine.

To get started the first thing I did was to define a struct to accomodate the necessary data. Here's how it looks:

typedef struct
    int8_t cur_state;
    uint8_t (*valid)(char);
    int8_t next_state;
    uint8_t index;
} transition_t;

Basically to decide the new state what is used is a current state and the last character received. But using the last character directly has many drawbacks so I decided to use a function (valid) to decide the next state. The index variable is custom data I need to get the position where the character will be inserted in the input buffer. This is not really part of the state machine.

Here's what the declaration of the final state machine looks like:

#define S0 0
#define S1 1
#define S2 2
#define S3 3
#define S4 4

static transition_t trs[] = {
    /* STATE 0 */
    {    S0,      initial_char,   S1,  0   },
    {    S0,          _true,      S0,  0   },
    /* STATE 1 */
    {    S1,         I_char,      S2,  1   },
    {    S1,      initial_char,   S1,  0   },
    {    S1,          _true,      S0,  0   },
    /* STATE 2 */
    {    S2,         T_char,      S3,  2   },
    {    S2,      initial_char,   S1,  0   },
    {    S2,         _true,       S0,  0   },
    /* STATE 3 */
    {    S3,         M_char,      S4,  3   },
    {    S3,      initial_char,   S1,  0   },
    {    S3,         _true,       S0,  0   },

uint8_t trs_size = sizeof(trs) / sizeof(trs[0]);

The element in the second column is a function which will evaluate if this is the valid row. For example, when in state S1 if I_char returns true (by C's definition) then it goes to state S2. Else if initial_char returns true it stays in state S1. The function _true always returns a true value.

The order of the row declarations is important because they are tested from the starting row towards the ending row.

The variable trs_size is the number of elements in the state machine description used in the main algorithm to iterate over all elements.

These are the definitions of the fuctions used in the declaration of the state machine:

static uint8_t initial_char(char c)
    return (c=='R' || c=='B' || c=='E' || c=='I');
static uint8_t I_char(char ch) { return (ch == 'I'); }
static uint8_t T_char(char ch) { return (ch == 'T'); }
static uint8_t M_char(char ch) { return (ch == 'M'); }
static uint8_t _true (char ch) { return 1; }

The function initial_char returns true if any of the characters valid to initiate a header is found. There is also one function for each of the extra possible characters. _true always returns 1.

And here is the final state machine code!

uint8_t state = S0;
do {
    err = serial_read(&b, 1);
    if (err != 1) continue;

    for (i = 0; i < trs_size; ++i) {
        if (state == trs[i].cur_state) {
            if (trs[i].valid((char) b)) {
                state = trs[i].next_state;
                rxbuf[trs[i].index] = b;
} while (state != S4);

In the end the code is actually bigger and probably less efficient. But it should be a lot easier to maintain. If I copy/paste it on another project I will only need to modify the state machine array declaration (and probably also the custom data).

Things that could be made better

The is (at least) one thing which could be made better in the newer code. Look at these function definitions below:

static uint8_t I_char(char ch) { return (ch == 'I'); }
static uint8_t T_char(char ch) { return (ch == 'T'); }
static uint8_t M_char(char ch) { return (ch == 'M'); }

As it must be quite obvious, they all look pretty much the same. Only the character tested changes. There should be a way of fixing this with preprocessor macros. I have in mind something like this:

#define CHAR_FUN(c) static uint8_t c##_char(char ch) { return (ch == '##c##'); }


The problem in this code is that preprocessor's concatenation doesn't work inside single quotes. So it generates the correct function name but the test will look like ch == '##c##' instead of the correct character.

I'm not sure yet of how to do it but I'll keep trying!

UPDATE (22 Jun 2013)

I just found a way of solving the problem above using stringification as provided by the preprocessor. Not great but it does the job.

#define CHAR_FUN(c) static uint8_t c##_char(char ch){char *s=#c;return(ch==s[0]);}

Back to guitar playing

I've been practicing some guitar playing on the last few weeks. Maybe a little longer than a few weeks! Anyway, after a lot of work and music listening, which is mostly jazz and fusion these days, I'm getting somewhat better at learning songs by ear. There's a song by the band Ohm:, the band of the great guitar player Chris Poland, that I really love and made a quick and dirty cover this afternoon. You can watch it here:

I feel quite proud of learning the whole song by ear, without any kind of tablature and transcription help.

And the next post, hopefully, will be about programming! :)

Fixing the serial port of my Nexys2

I'm writing this post because today I fixed the serial port of my Nexys2 board. This will mainly highlight how we can get trapped in some way of thinking when we're pretty sure about something and not contemplate a whole spectrum of options surrounding it. It will be somewhat shameful I guess, but let's see if you can find the solution more easily than I did!

A month ago or so I was working on learning Verilog and was writing some code. I got to a point where I wanted to test a softcore and I needed to upload code through the serial port. While getting there, I made a port of the lattice mico32 softcore to my board. Actually the hard work was already done and I only had to make some very small adaptations. I used the code from this repo:

The problem was that it booted, sent a prompt throught the serial port but I could not input commands. If you look at the code at firmware/boot0-serial/main.c you'll see that what it does is this:

uart_putstr("**soc-lm32/bootloader** > \r\n");
c = uart_getchar();

I'm omitting the irrelevant code. So I got the prompt but no data that I sent from the computer ever got in to the Nexys2. Since the lm32 is a quite complex softcore I thought that maybe it had some problems and ended up testing other UART cores. Actually I ended up writing my own which is at this repo (in the directory uart): (btw, this is not good code!)

The results were the same. It worked in the transmit direction only.

The schematics for the Nexys2 UART converter is this:


I put an oscilloscope on the RX signal and tried to find out what was happening. So here's how a RS232 UART works. The TX/RX lines are usually in high level. When you send data, the first thing it does is to send the START bit. A START bit is characterized by a transition from high to low level. The data bits go next and at last the STOP bit (or bits). What was happening here was that the RX line would go to low level and stay that way. It would never go back to high level. No data was going through. To get it back to working state I had to disconnect/reconnect.

I started suspecting my the issue was with my USB <-> RS232 adapter. I got another board (an ARM CM3 powered devboard) and wrote some code for doing serial communications. This ended up working and I got data in and out. So it was clear the problem was not with my serial adapter. Btw, here are the schematics of the board I used to do the test (Olimex STM32-P103):


The conclusion then was that the problem was either with the pin used for RX on my FPGA or with the onboard TTL <-> RS232 converter. So I ended up buying some extra converters to test it. And these things are not easy to get here in Brazil. But I found them at Multcomercial. They arrived yesterday and this morning I removed the old ST3232 and put a new MAX3232 chip. You can see the final result below:


As you can see I ended up destroying one onboard wire and had to fix with normal wire. When I turned it on, data came out and when I sent data nothing came in. Shit!!!

So, the problem must be with the FPGA pin, right? Actually no, it isn't.

Let's look harder at schematics! There are some difference between the two boards. The STM32 has the main lines (RX/TX/GND) connected in the standard way but it has also the CTS/RTS lines connected to the converter and has a resistor between DTR and DSR. The Nexys2 has the RX/TX/GND lines connected the same way but RTS is connected to CTS (no connection to the converter) and DSR/DTR/DCD are all connected together. Interesting, right?.

I got a spare serial cable, soldered the 3 pins which matter (RX/TX/GND), tried again and voilá... data out and data in.

Sometimes we get stuck with knowing for sure that we know what's happening but turns out we don't (at least it applies to me!).

Cheap Chinese SSDs

Since I bought a Netbook last year, I wasn't very happy with it's HDD. The reason was that I had to disable the power management or at least configure it to not allow the spin down of the drive.

$ sudo hdparm -B 255 /dev/sda

The problem was so annoying that I could not even edit a text file on Vim without the disc spinning down constantly and hanging text entry for some seconds. Disabling power management was effective to solve the problem but then the battery would discharge faster. Also since the Netbook is so small and light, using it at any surface that is not very flat and stable causes slight movements which could damage the drive. It all bored me.

So I wanted to install a SSD because it would solve all problems. No more disc spinning down, no need to disable power management and no requirement of having a stable surface to use it. The problem was the price. A SSD drive purchased here in Brazil would end up costing more that the Netbook itself!

I did some research and ended up buying a cheap chinese SSD on Aliexpress. The link to the seller/product I purchased follows:

Kingspec 2.5 SATA 32GB

After almost two months I'm happy so far. I was afraid it would end up giving some problem after some use but until now is like it was on first day. The system boots in something like 3 seconds (after BIOS). Everything got boosted up. The main problem is the size which prevents the use of some software. But I could build a Linux distro based on buildroot for a project I'm working on, I built cross toolchains for ARM/AVR/MSP430 and build llvm (which took 8 hours!!!). It suits me well since I can do the things I do day to day.

When I bought it in January it cost me US$ 43. Now it's selling for US$ 39. I wonder how low this price will get in the next 3 months, 1 year. Even being a very simple SSD, lacking important features like TRIM, still was a great purchase. A now with more and more embedded boards coming with a SATA controller...

Arrow and cryptographic protocols

This weekend I spent a lot of time watching Arrow. I started watching it last year but then lost interest and now decided to get up to speed. I watched episodes 11 to 19 (most recent). It actually got quite good.

The first 10 or so episodes are kind of self-contained. It's like watching Law and Order were you can watch any episode and you don't loose much. The exception are the flashbacks where the story of Oliver Queen during the five years he spent as a castaway on an chinese Island and was considered dead are told. Btw, the name Green Arrow was never used in the series. The vigilante's name in the series is The Hood.

But from episode 12 and forward the story gets more serialized so that every episode is a continuation of the previous one.

Although I'm liking the series, there are so many annoyances that requires you to develop new levels of suspension of disbelief. Let me be clear: I have no problem with super-heroes. But there are no super-heroes in Arrow. There are only heroes that are all normal people (with special training, of course). But than you see every hero and villain killing many bandits, police offices and FBI agents armed with machine guns using only rudimentary weapons like bow and arrow or throwing knives... I roll my eyes everytime...

The best parts of the show are the ones told in flashbacks which tell the story of Oliver Queen after a shipwreck and his years living with soldiers.

But the real reason for me to write this post is to mention about the computer related scenes. Oh gosh... they are so bad that they are funny. One scene really deserves special mention. It's on episode 11 and Felicity Smoak is hacking a pen drive and then what follows is this screen of code:

 for( i = 0; i < argc; i++)
    if( argv[i][0] == '-')
       switch( argv[i][1])
          case 'j': case 'J':
             julian = 1;
          case 'q': case 'Q':
             quiet = 1;
          case 's': case 'S':
             sat_no = atoi( argv[i] + 2);
          case 'd': case 'D':
             n_days = atoi( argv[i] + 2);
          case 'f': case 'F':
             ofile = fopen( argv[i] + 2, "wb");
          case 'r': case 'R':
             data_file = fopen( argv[i] + 2, "ab");

And then she describes it as: Military grade cryptographic security protocol!!!


The code seems to come from this file:

Also funny was that the screens of code load slowly like if running on very old computers... Go watch it on youtube:

Felicity Smoak and security lesson!?

Btw, every single people on the show uses Nokia Lumia's and LeNovo computers running Windows 8... go figure!

Running Bootcamp Linux on Parallels

Some weeks after I bought my MacBook Pro about two years ago I decided that I needed to run both Mac OS X, Windows and Linux on it so at the time I upgraded the original HDD to a Seagate 7200rpm 750GB disk.

I installed both Windows 7 and Linux as Bootcamp partitions. Bootcamp for Mac basically does BIOS emulation and enables you to use old style MBR partitions for booting operating systems (and some other things too). Mac OS X uses EFI and GPT.

I use rEFIt to triple boot my system. It displays an operating system chooser at startup with fancy graphics and also maintains consistency between GPT and MBR partitions.

But the problem is that it sucks having to reboot everytime you have to run a program on another operating system!

For Windows, I've been running it on Parallels Desktop as well as native. Parallels does some fancy integration enabling Windows programs to run on Mac OS X screen and works as Mac programs, including clipboard integration and such.

So my objective was to run Linux on Parallels as well is runs Windows. For Windows you just have to create a new VM using a Bootcamp partition and Parallels takes care of all configurations and everything is done automatically. For Linux you have to do pretty much everything manually. Almost everything is standard VM configuration and straightforward for those used to running VM software. The hardest part is making the hard disk emulation run from a native partition.

So I started adding the machine's disk, enumeration and selected the partition using the GUI. But when I started the machine it hanged with error: missing operating system.

Looking into the internals of a Parallels VM, there is one directory for each created VM. For Linux it was located in:

/Users/utzig/Documents/Parallels/Arch Linux.pvm/ST9750420AS (disk0).hdd/

The contents were basically these (with some extra backups, etc):

DiskDescriptor.xml PhysicalGpt.hds PhysicalMbr.hds

DiskDescriptor's contents seemed right. I tried comparing the _hds_ files to the Windows VM's counterparts and they were equal. And here's the catch. Those who remember the old days of _DOS_ will probably remember that there was only one MBR partition that could boot a system. The partition set as bootable. So this was the main problem, I had created the Linux VM having my Windows partition as bootable!

So I removed again the hard disk from the Linux VM in Parallels and then changed the bootable partition on the command line. To do that on Mac OS X, open the terminal and run:

$ sudo fdisk -e /dev/rdisk0

fdisk: 1> p
Disk: /dev/rdisk0   geometry: 91201/255/63 [1465149168 sectors]
Offset: 0   Signature: 0xAA55
         Starting       Ending
 #: id  cyl  hd sec -  cyl  hd sec [     start -       size]
 1: EE 1023 254  63 - 1023 254  63 [         1 -     409639] <Unknown ID>
 2: AF 1023 254  63 - 1023 254  63 [    409640 -  997771856] HFS+
 3: 07 1023 254  63 - 1023 254  63 [ 998445056 -  230230016] HPFS/QNX/AUX
*4: 83 1023 254  63 - 1023 254  63 [1228675072 -  236211912] Linux files*
fdisk: 1>

The necessary commands to know here are: p to print the partition table, f to set the bootable partition and quit to quit saving. So by running f 4 I set the Linux partition to bootable which is shown by the star symbols enclosing the line.

Then I re-added the hard disk to the Linux VM which this time booted to GRUB. Linux hanged because it couldn't find the root partition. After some fiddling I found that I had to add the line:

MODULES="ahci libahci sd_mod"

to /etc/mkinitcpio.conf and re-create the initramfs image. That does make some sense but I don't know exaclty why it would work without it when booting native.

After being able to get to the Linux console I spent some time cleaning up to run on a VM.

But the funny part really comes when configuring Linux's graphic programs to run on Mac OS X kinda like Parallels does for Windows with Coherence. For Linux it relies on X11 and since Mac OS X already has or you can also run XQuartz (which I do), I could enable Linux apps to present on Mac OS X. For that I used SSH X11 forwarding. By editing /etc/ssh/sshd_config on Linux and changing the following configurations:

X11Forwarding yes X11DisplayOffset 10 X11UseLocalhost yes

After restarting sshd now I can ssh to my Linux VM

$ ssh -Y

It should already export the correct $DISPLAY env variable with a value that mirrors the X11DisplayOffset from sshd_config. If not just configure it manually:

$ export DISPLAY=localhost:10.0

And the try running:

$ xeyes

If everything works alright xeyes should now be running on you Mac OS X!

The really useful part is running applications that don't have a Mac OS X port like, for example, Xilinx ISE WebPACK. So here a screenshot of ISE running on Mac OS X and synthesizing the pong example!


TCP/IP on an AVR microcontroller

Last week I was going through some boards I bought last year which ended up on some pile of electronics trash here and I found this very cheap Microchip ENC28J60 based Ethernet to SPI bridge adapter and thought it would be cool to use it for something.

This little board can be bought on Aliexpress for a price close to US$ 5 or 6.

Here the product's website: ENC28J60 Network Module

I ended up choosing to use and Seeeduino Mega (Arduino Clone) to run the tcp/ip stack and wrote the remaining software myself. I started with one of my preferred RTOS's, ChibiOS/RT which already has a demo for Arduino based boards, thanks to yours truly!

Since there is no SPI driver for AVR based microcontrollers, I ended writing one myself (and contributing back). That was the easiest part, I guess.

After that I created the demo based on an existing uIP (a lightweight TCP/IP stack) demo already available on mainstream ChibiOS/RT but tailoring (or stripping it down) for the AVR.

Since the microcontroller has only 8K for RAM I had to remove the webserver from the demo and enabled the hello world app from uIP which is basically just a listening socket that has functionality similar to an echo server.

The hard part was writing the ENC28J60 driver. I started using a driver from the EtherCard project and porting it to ChibiOS/RT.

After some days of debugging and quite some time lost because I was using a SPI clock (8MHz) which made communication between the Arduino and ENC28J60 unreliable (although far from the limits of the ENC28J60 chip itself which are 20MHz), I finally got to the stage of working ping now.

Ping itself is taking currently takes around 4.5 ms on a cross-over cable which is ok, I guess. Sometimes it doesn't work which sucks. The issue is probably with buffers...

From now on things should get a lot easier and probably very soon this will be production ready.

Finally a photo of my current setup!


The project sources are available for those courageous and crazy enough:

Asus 1025c

Last November I did not resist the temptation and bought a Netbook. For some time I had been wishing to buy a very portable computer that could be carried anywhere without much effort. I'd been keep an eye on the Asus 1025c for a few months due to its relatively well powered processor (for an Atom, obviously) and low-price.

So, on Black Friday, the day of making new debts, which we proudly copied from the Yankees, although ours is a kinda watered down version, I saw it for 75% of the normal price and voilá, here it is.

The specs of this beast are:

  • Atom N2600 1.6GHz
  • 2GB RAM
  • 320GB HDD
  • 10" screen
  • Atheros AR9485 WiFi

It came with Windows 7 Starter so guess what's the first thing to do? Not hard, huh.

I ended up installing Linux (Arch), FreeBSD and NetBSD. Despite the quite new hardware every OS works fairly well. Linux had a lot of problems with the WiFi chip initially but in the latest kernel versions it works really well.

The low-point has been the video support. For the N2x00 Atom family, Intel decided to license the PowerVR SGX 545 core from Imagination Technologies. So basically there's no 2D and 3D acceleration. Intel even released a closed source driver which require some old versions of many packages which there's no way to use with a rolling release like Arch Linux.

In general I'm impressed with the performance of the machine. It's not close to a Core i5/7 but it's better than I expected. As long as the quad-core is exploited with things like make -j4 it is good to go. The two main problems with system limitations were: running Eclipse for anything; and synthesizing VHDL/Verilog designs. The Xilinx tools take a really long time to synthesize anything. But they are slow even on an i5 anyway!