You might know that me and Kooda made our own custom keyboard a few years ago because we were unsatisfied with what the world had to offer.
It’s reminiscent of the Typematrix because we both had one before and drew some inspiration from it. The Typematrix was an okay membrane keyboard but not very durable and the switches weren’t great, especially after a few years of use.
The black rectangles on the PCB
There are two black rectangles but they’re very different. The small one is the micro-controller, an ATmega32u4. It’s very low-power but still quite powerful. My USB multimeter says it consumes 0.04W even though the code is not very optimized (mostly because we’re using a USB mode that is not very efficient and sends lots of packets, most of which are useless, if I understood correctly).
The other one is a header, or what we call the “Extension Port”! It exposes the whole port B of the ATmega32u4 (8 bidirectional pins with alternative functions), +5V and ground. Among them, a few pins can be used to program the microcontroller via SPI, which was very useful to flash the bootloader.
All the other ports of the microcontroller are used to scan the keyboard, except for one pin that has PWM capability and was labelled “Speaker”. Yes the keyboard might produce some noise eventually!
So many diodes!
This is a gamer keyboard! With linear, low-profile, low-operating-force mechanical Kailh switches. It scans the keys 1000 times each second and there is one diode for each key to prevent ghosting!
The pawprints were added for aesthetics, but also for good luck and accuracy. Touch the pawprints after each frag for maximum efficiency. Or after each death in Celeste.
The extension port
We didn’t know what we’d do with it when we put it there but we couldn’t let the unused pins go to waste. Now Kooda wants to make a Bluetooth extension module to make the keyboard wireless and that sounds very exciting!
Me? Ever since I got the Gigatron, a little TTL computer with no microprocessor (check it out, it’s amazing), I’ve wanted to reproduce one of its killer features: it produces a VGA signal.
VGA: old but gold
VGA was first introduced in 1987, and has since been extended with signal specifications supporting many video resolutions, from 320×200 all the way up to 1920×1200 at 60Hz (and possibly even bigger ones but I’ve never seen actual hardware for those).
The VGA signal is pretty simple, it has essentially 5 wires + ground:
- vertical sync (signal level),
- horizontal sync (signal level),
- red (analog),
- green (analog),
- blue (analog).
Actually it’s more complicated, each channel should have its own ground, there’s EDID stuff, some more signaling… But providing those 5 signals and a common ground will suffice to illuminate most VGA-compatible monitors. Using the right timings of course.
Timing is key
My ATmega32u4 is fast but not that fast. We’ll try to generate the video mode with the slowest signals: 640×480 at 60Hz.
To generate a VGA signal we must send synchronisation pulses to the screen: one pulse 60 times per second for vertical sync (the frame rate) and one pulse for every graphic line for horizontal sync; that’s 31500 times per second. Some lines are not visible: we actually generate 525 lines for 480 visible lines. We’ll send some actual pixels on the color wires between horizontal sync pulses.
So how many pixels are there in a second? A line is 800 pixels, of which only 640 are visible (80%). That means we have 800×525×60 pixels per second. Pixels are being sent out at 25.2MHz! That’s gonna require some serious bit-banging.
Let’s see… My ATmega32u4 runs at… 16MHz. Damn. Even if I send one pixel per cycle I’m not making it.
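To double-check that arithmetic, here is a tiny Python sketch using only the figures quoted above (the variable names are mine, nothing here comes from the firmware):

```python
# 640x480@60Hz frame geometry, using the figures quoted in the text.
TOTAL_PIXELS_PER_LINE = 800   # 640 visible + blanking
TOTAL_LINES_PER_FRAME = 525   # 480 visible + blanking
FRAMES_PER_SECOND = 60
CPU_HZ = 16_000_000           # ATmega32u4 clock

line_rate_hz = TOTAL_LINES_PER_FRAME * FRAMES_PER_SECOND
pixel_clock_hz = TOTAL_PIXELS_PER_LINE * line_rate_hz
cycles_per_pixel = CPU_HZ / pixel_clock_hz

print(line_rate_hz)                 # 31500
print(pixel_clock_hz)               # 25200000
print(round(cycles_per_pixel, 3))   # 0.635
```

Less than one CPU cycle per pixel: no amount of clever code can bit-bang that.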
Analog saves the day
But! VGA is an analog signal, it was made for cathode-ray screens. Those screens understand sync pulses and lines, but pixels? They don’t know or even care what those are. The electron guns inside will just fire whenever there’s voltage on their color channels.
That means we are free to decide the horizontal resolution. How convenient!
Let’s see. Assuming we can produce one pixel with each cycle of our microcontroller, and we have to produce 31500 lines per second; that’s 16000000÷31500≈508 pixels per line. As we only get to see 80% of those pixels, that means we’ll have 406 visible pixels in each visible line.
That’s our target resolution: 406×480 pixels. Not that bad! It’s more than the Gigatron, which runs natively at 6.25MHz. That’s almost exactly four times less than the pixel clock, so it can only output 640÷4=160 pixels per line.
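The same reasoning as a quick Python sketch, assuming one pixel per CPU cycle and the same 80% visible fraction as standard 640×480 (names are illustrative only):

```python
CPU_HZ = 16_000_000
LINE_RATE_HZ = 31_500          # 525 lines x 60 frames per second
VISIBLE, TOTAL = 640, 800      # visible vs total pixels in a standard line

# One pixel per cycle: how many cycles fit in one line?
cycles_per_line = round(CPU_HZ / LINE_RATE_HZ)

# Keep the same visible fraction (80%), using integer math to avoid float noise.
visible_pixels = cycles_per_line * VISIBLE // TOTAL

print(cycles_per_line)   # 508
print(visible_pixels)    # 406
```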
Hopefully modern LCD monitors will be as tolerant as the good old CRT monitors!
A prototype!
Let's code!
I made a first prototype by going full assembly mode with the ATmega32u4. It’s the most convenient because the timing is very important and I have to count how many cycles each instruction consumes. After choosing the pins I had to do some wiring.
The LEDs were useful for debugging! Also I put a little button there so I can reset the keyboard to flash a new firmware. I flashed many many firmwares during the development! But the flash is rated for 10000 erase/write cycles so I think I’m still good.
The Olimex VGA breakout board is super handy and was lent to me by Kooda!
Once I got the timings right I managed to get a program with lots of busy-waiting and disabled interrupts to show some vertical lines. Each line is made of:
- 61 cycles for a H-Sync pulse,
- 31 cycles H-Sync back porch,
- 406 cycles of video signal,
- 10 cycles of H-Sync front porch.
Vertically, we have:
- 2 lines of V-Sync pulse,
- 33 lines of V-Sync back porch,
- 480 lines of video signal,
- 10 lines of V-Sync front porch.
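A quick sanity check of those two timing tables, as a Python sketch (the dictionary keys are just labels I picked; the numbers are the ones listed above):

```python
CPU_HZ = 16_000_000

# Horizontal timing, in CPU cycles (one cycle = one "pixel" slot).
H_TIMING = {"sync": 61, "back_porch": 31, "video": 406, "front_porch": 10}
# Vertical timing, in lines.
V_TIMING = {"sync": 2, "back_porch": 33, "video": 480, "front_porch": 10}

cycles_per_line = sum(H_TIMING.values())
lines_per_frame = sum(V_TIMING.values())

line_rate_hz = CPU_HZ / cycles_per_line
frame_rate_hz = line_rate_hz / lines_per_frame
hsync_pulse_us = H_TIMING["sync"] / CPU_HZ * 1e6

print(cycles_per_line)           # 508
print(lines_per_frame)           # 525
print(round(frame_rate_hz, 2))   # 59.99
print(hsync_pulse_us)            # 3.8125
```

So the budget adds up to the 508 cycles per line and 525 lines per frame computed earlier, and the frame rate lands a hair under 60Hz.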
That looks great already! Mmh the screen is detecting 720 pixel columns but that’s okay, we’re free to decide horizontal resolutions, so it’s only fair that the screen does it too. Incidentally, with a little rewiring I managed to get a cool amber color like the old amber terminals. Warm and cosy.
The documentation of the ATmega32u4 was wrong on the cycle count of two instructions (rcall and ret), I had to find the real values in another document. That was not the cause of the misalignment of the vertical lines at the top of the screen in the picture above, that one was just me not counting right.
Showing something meaningful
After writing this prototype, I came to a haunting realization: I cannot possibly output one pixel per cycle.
“Advanced RISC Architecture”, “135 Powerful Instructions – Most Single Clock Cycle Execution”, they say. Well sure most instructions consume 1 cycle but they do very little! However, that’s the spirit of RISC so I can’t blame them.
The Gigatron can pull off such a color show because, although it has only 8 (EIGHT!) native instructions, one of them is very powerful:
ora [Y,X++],OUT
This instruction does 4 things in just one cycle:
- Read a byte from memory, at address (Y<<8) | X
- Apply a bitwise or to this value with whatever is in the accumulator
- Send the result to the output register (it is directly connected to the VGA DAC, which is in fact 8 resistors)
- Increment X
In just ONE cycle! That’s the instruction the Gigatron uses 160 consecutive times to bit-bang a line’s pixels to the screen. The ATmega32u4 would use 3 whole cycles to mimic this lone instruction’s effects, assuming we don’t need the bitwise or:
ld r16, X+ ; 2 cycles: load the byte at X, then increment X
out PORTB, r16 ; 1 cycle
3 cycles per pixel? Mmh where would that get us. The visible part of the line lasts 406 cycles, we would end up with 135 pixels per line. That’s less than the Gigatron, whose pixels are quite chunky already.
A resolution of 135×120 pixels isn’t great but that would be a start. Okay let’s create an array for 135×120 pixels, so 16KB.
The ATmega32u4 has very little RAM
Oooh right only 2.5KB of SRAM. That’s very little. I can only store a black and white image in there (2KB); but now, outputting a pixel is not 3 cycles anymore, and we’re left with so little SRAM that running any slightly useful user program will be a challenge…
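Here is the memory budget as a small Python sketch. It assumes 3 cycles per pixel, 120 rows (480÷4) and 2.5KB = 2560 bytes of SRAM; the names are mine:

```python
VISIBLE_CYCLES = 406      # visible part of a line, in CPU cycles
CYCLES_PER_PIXEL = 3      # one load (2 cycles) + one out (1 cycle)
ROWS = 120                # 480 visible lines, each pixel row shown 4 times
SRAM_BYTES = 2560         # 2.5KB

columns = VISIBLE_CYCLES // CYCLES_PER_PIXEL
pixels = columns * ROWS

bytes_one_byte_per_pixel = pixels             # one byte per pixel
bytes_one_bit_per_pixel = (pixels + 7) // 8   # black and white, packed

print(columns)                                # 135
print(bytes_one_byte_per_pixel)               # 16200
print(bytes_one_bit_per_pixel)                # 2025
print(SRAM_BYTES - bytes_one_bit_per_pixel)   # 535
```

Even the packed black and white framebuffer leaves only about half a kilobyte for everything else, stack included.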
Ok so what do we do when we have many pixels but little memory? Why not try a Text Mode! We could have, like, 8×12-pixel characters, and that would leave us with 50×20 characters on screen for a RAM usage of 1000 bytes.
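As a sketch, the text mode budget looks like this in Python (one byte per character cell is my assumption, and the names are illustrative):

```python
CHAR_WIDTH, CHAR_HEIGHT = 8, 12   # pixels per character cell
COLUMNS, ROWS = 50, 20            # characters on screen
SRAM_BYTES = 2560                 # 2.5KB

# Pixel grid implied by that character grid.
pixel_width = COLUMNS * CHAR_WIDTH
pixel_height = ROWS * CHAR_HEIGHT

# One byte per character cell (assumption).
text_buffer_bytes = COLUMNS * ROWS

print(pixel_width, pixel_height)        # 400 240
print(text_buffer_bytes)                # 1000
print(SRAM_BYTES - text_buffer_bytes)   # 1560
```

That leaves roughly 1.5KB of SRAM for the stack, the rendering buffers and a user program.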
But now it’s even harder to draw a pixel, we have to do lookups and stuff. However, we learn from the Gigatron a nice little trick: working during the scanlines!
The Gigatron spends most of its time generating the VGA signal, but it cleverly manages to execute user programs during scanlines and non-visible lines. That’s a neat trick! We could use it for the benefit of generating the VGA signal, so we get twice the amount of processing time for each graphic line on screen! And we get a stylish oldschool screen effect for free!
Ok so now our target resolution is 400×240. Each even line of the 480 real lines is an empty scanline whose time will be used to prepare the pixels to display in the next line and put them in some buffer array for easy access. But we still can’t display 400 pixels in one line with the lds and out instructions, as we saw that would require 3 cycles per pixel.
Unless we compile our character set to native code!
Hard-coding a character set in AVRe assembly
Every character has 12 lines. Each line of each character will have its own little native code. What we’ll do is that during each scanline, we’ll construct an array of code addresses corresponding to character lines code, and we’ll call these addresses during the rendering, in the next line. Those will be deduplicated and daisy chained, and it’ll look like this:
In this example, we set r18 and r19 to values that will respectively set low and high pin 1 of the port B, which will be our (monochrome) signal. We use the out instruction to change the pin’s state, and wait one cycle with nop.
But there’s a catch here. The lds and ijmp at the end, which load the address of the next rendering code to execute and jump to it, consume 2 cycles each. Hence our characters are not 8 pixels wide, but 8+2+2=12 pixels wide; they are very spaced out, and we’re left with only 33 columns of text.
Also, we can only load 1 byte at a time with lds, and we put it in ZL (an 8-bit register), but the address ijmp jumps to is inside Z (a 16-bit register), whose value is (ZH<<8) | ZL. Hence, all our character-line rendering code must fit within 256 words of memory, which limits the number of different character-lines we can have in total!
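To picture that 256-word window, here is a little Python model of how the jump target is formed. The ZH value is a made-up page number, not the one used in the firmware:

```python
def ijmp_target(zh: int, zl: int) -> int:
    """Word address that ijmp jumps to: Z = (ZH << 8) | ZL."""
    return ((zh & 0xFF) << 8) | (zl & 0xFF)

ZH = 0x12  # hypothetical: set once, never touched during rendering

# Only ZL changes between characters, so only these targets are reachable.
targets = [ijmp_target(ZH, zl) for zl in range(256)]

print(hex(targets[0]))     # 0x1200
print(hex(targets[-1]))    # 0x12ff
print(len(set(targets)))   # 256
```

Changing only ZL can never leave the page selected by ZH, hence the hard limit on how much rendering code there can be.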
That’s a lot of constraints for a rather meh result.
A constrained character set
It is well known to artists that constraints stimulate creativity. So let’s get inspired and find dubious ways of solving our problems!
First, we can move the lds and put it in the place of two consecutive nops. That would reduce the separation between characters on screen down to two cycles (two pixels), the time necessary to execute the ijmp instruction. The constraint created here is that every line of every character needs to have two consecutive idle cycles available to execute the lds during its rendering phase, between jumps, which means there must always be three consecutive pixels of the same color.
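That rule is easy to check mechanically. A minimal Python sketch, assuming a character line is written as a string of '0' and '1' pixels (a representation I made up for illustration):

```python
def has_idle_slot(row: str) -> bool:
    """True if the row contains three consecutive pixels of the same color,
    i.e. one pin change followed by two idle cycles where a 2-cycle load fits."""
    run = 1
    for prev, cur in zip(row, row[1:]):
        run = run + 1 if cur == prev else 1
        if run >= 3:
            return True
    return False

print(has_idle_slot("01110010"))  # True: "111" leaves two idle cycles
print(has_idle_slot("01010101"))  # False: the pin changes on every pixel
print(has_idle_slot("00110011"))  # False: runs of two are not enough
```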
Second, we can reduce the width of our characters from 8 to 6 pixels. This way, we’ll have more columns of text and we’ll have way fewer different character lines to store; that will help us make them fit in the 256-word window.
Now we’re back to 50×20 characters on screen, a comfortable character density, with not too much horizontal space between alphanumeric characters, most often 3 pixels, for 5 pixels of actual character pixels.
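Comparing the two layouts in a few lines of Python (400 visible pixels per line as above; the function name is mine):

```python
def text_columns(visible_pixels: int, glyph_width: int, overhead_cycles: int) -> int:
    """Characters per line when each one costs glyph_width + overhead cycles."""
    return visible_pixels // (glyph_width + overhead_cycles)

# First attempt: 8 pixels + 2 cycles for the load + 2 cycles for ijmp.
print(text_columns(400, 8, 4))  # 33

# Constrained charset: 6 pixels, load hidden in idle cycles, only ijmp remains.
print(text_columns(400, 6, 2))  # 50
```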
Good enough I say! Let’s make a MS-DOS Codepage 850 character set within those constraints because that’s what I grew up with!
I think it’s a rather cool looking charset; some characters look a little strange due to the 3-consecutive-same-color-pixels rule, but I’d say it adds some charming uniqueness to it! So what does it look like in assembly?
Neat! I made a program to generate the assembly from the bitmap character set file so I could experiment with it. The program detects noncompliances with the constraints, and also counts how many times each character-line is used so I can try and replace the least used ones with other existing ones to lower the number of instructions.
There’s also this little trick: one can use a relative jump (rjmp) to the next address to wait 2 cycles using only one instruction. That saved me 12 instructions!
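Putting the pieces together, here is a rough Python sketch of how one character line could be turned into a sequence of mnemonics. It is a simplified model, not the real generator: operands are left out, I assume the first pixel always needs an out (the previous character’s last pixel is unknown), and the load goes into the first pair of idle cycles.

```python
CYCLES = {"out": 1, "nop": 1, "lds": 2, "rjmp": 2, "ijmp": 2}

def emit_row(row: str) -> list[str]:
    """Mnemonics for one character line given as a string of '0'/'1' pixels."""
    # One slot per pixel: change the pin, or idle if the color stays the same.
    slots, prev = [], None
    for px in row:
        slots.append("out" if px != prev else "nop")
        prev = px

    # Fuse pairs of idle cycles: the first pair hosts the load of the next
    # address, later pairs become a single 2-cycle rjmp to the next instruction.
    ops, load_placed, i = [], False, 0
    while i < len(slots):
        if slots[i] == "nop" and slots[i + 1:i + 2] == ["nop"]:
            ops.append("rjmp" if load_placed else "lds")
            load_placed = True
            i += 2
        else:
            ops.append(slots[i])
            i += 1
    if not load_placed:
        raise ValueError(f"no room for the load in row {row!r}")

    ops.append("ijmp")  # jump to the next character's code (2 cycles)
    return ops

ops = emit_row("011111")
print(ops)                            # ['out', 'out', 'lds', 'rjmp', 'ijmp']
print(sum(CYCLES[op] for op in ops))  # 8 cycles: 6 pixels + 2 for the jump
```

Whatever the pixel pattern, a valid row always costs 8 cycles, which is what keeps the signal stable.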
Fun fact: the last pixel of each character-line is “repeated” during the jump, so graphic characters used to draw boxes and stuff can be created and seamlessly link with the next character!
Oh yeah, it’s all coming together
After long hours of counting cycles and writing code that always runs for the same amount of time, I managed to get a stable signal and this terrific view!
Oooo sexy. It feels so very reminiscent of the amber terminals I’ve seen in videos on YouTube and for which I got nostalgia by proxy! Now that I think about it, my first terminal was a Minitel actually; it had a cold black and white screen. I’ll get hold of one again someday. Anyways!
Wrapping up
What you’ve seen just above is actually more advanced than just a program showing some characters. It’s a very small terminal emulator, and there is a small user program running, and the keyboard is being scanned and pressed keys are sent to the small program. It only echoes keystrokes for now.
What happens here is that while the rendering of the visible part of the screen is still a big uninterruptible process, the H-Sync pulses of non-visible lines are generated in an interrupt connected to a precise timer. This way, I can have a user program run cooperatively alongside the VGA signal generation!
Also, I scan 1 of the 9 columns of the keyboard during the interrupt, in the last 9 non-visible lines, so the keyboard is scanned entirely at the end of each frame.
And that’s it for now!
Next, I want to implement an interpreter for a little programming language now that the terminal emulator is working. I may try making a Forth interpreter!