Gifting the power of VGA to my custom keyboard

I have nostalgia for keyboard computers of the past so I’m making my own!
on 2024-10-31, at 16:19 CET, it was a Thursday

You might know that Kooda and I made our own custom keyboard a few years ago because we were unsatisfied with what the world had to offer.

A PCB with many diodes and a few other components on top, and a lot of colorful keys. — “Le Clavier des Pattes”, the paws keyboard in French

It’s reminiscent of the Typematrix because we both had one before and drew some inspiration from it. The Typematrix was an okay membrane keyboard but not very durable and the switches weren’t great, especially after a few years of use.

The black rectangles on the PCB

A close up to the keyboard's upper left corner; electronic chips and components are shown; colorful question marks are overlaid above two components that look like black rectangles. — We just don’t know.

There are two black rectangles but they’re very different. The small one is the microcontroller, an ATmega32u4. It’s very low-power but still quite powerful. My USB multimeter says it consumes 0.04W even though the code is not very optimized (mostly because we’re using a USB mode that is not very efficient and sends lots of packets, most of which are useless, if I understood correctly).

The other one is a header, or what we call the “Extension Port”! It exposes the whole port B of the ATmega32u4 (8 bidirectional pins with alternative functions), +5V and ground. Among them, a few pins can be used to program the microcontroller via SPI, which was very useful to flash the bootloader.

All the other ports of the microcontroller are used to scan the keyboard, except for one pin that has PWM capability and was labelled “Speaker”. Yes the keyboard might produce some noise eventually!

So many diodes!

A close up on the part of the PCB with all the diodes and pawprints. — The cat walked all over the diode fresco

This is a gamer keyboard! With linear, low-profile, low-operating-force mechanical Kailh switches. It scans the keys 1000 times each second and there is one diode for each key to prevent ghosting!

The pawprints were added for aesthetics, but also for good luck and accuracy. Touch the pawprints after each frag for maximum efficiency. Or after each death in Celeste.

The extension port

We didn’t know what we’d do with it when we put it there but we couldn’t let the unused pins go to waste. Now Kooda wants to make a Bluetooth extension module to make the keyboard wireless and that sounds very exciting!

Me? Ever since I got the Gigatron, a little TTL computer with no microprocessor (check it out it’s amazing), I wanted to reproduce one of its killer features: it produces a VGA signal.

A picture of a white LCD monitor on the left and a populated PCB on the right connected to it; between those there is a small Famicom like controller; there is a picture of Jupiter on the screen and the word "Gigatron". — I soldered it myself and that’s why you don’t get a full resolution image.

VGA: old but gold

VGA was first introduced in 1987, and has since been extended with signal specifications supporting many video resolutions, from 320×200 all the way up to 1920×1200 at 60Hz (and possibly even bigger ones but I’ve never seen actual hardware for those).

The VGA signal is pretty simple: it has essentially 5 wires + ground:

vertical sync (signal level),
horizontal sync (signal level),
red (analog),
green (analog),
blue (analog).

Actually it’s more complicated, each channel should have its own ground, there’s EDID stuff, some more signaling… But providing those 5 signals and a common ground will suffice to illuminate most VGA-compatible monitors. Using the right timings of course.

Timing is key

My ATmega32u4 is fast but not that fast. We’ll try to generate the video mode with the slowest signals: 640×480 at 60Hz.

To generate a VGA signal we must send synchronisation pulses to the screen. One pulse 60 times per second for vertical sync (the frame rate) and one pulse for every graphic line for horizontal sync; that’s 31500 times per second. Some lines are not visible: we actually generate 525 lines for 480 visible lines. We’ll send some actual pixels on the color wires between horizontal sync pulses.

A time diagram showing what a VGA signal looks like. — What VGA looks like on the wire when deprived of most of its lines

So how many pixels are there in a second? A line is 800 pixels, of which only 640 are visible (80%). That means we have 800×525×60 pixels per second. Pixels are being sent out at 25.2MHz! That’s gonna require some serious bit-banging.
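The arithmetic above is easy to double-check on a desktop machine. Here is a minimal sketch in Python; the function name `pixel_clock_hz` is made up for illustration and is not part of any firmware.

```python
# Pixel clock = total pixels per line x total lines per frame x frames per second.
# The totals include the invisible (blanking) parts of the signal.

def pixel_clock_hz(total_pixels_per_line, total_lines, refresh_hz):
    return total_pixels_per_line * total_lines * refresh_hz

line_rate = 525 * 60                      # H-Sync pulses per second
clock = pixel_clock_hz(800, 525, 60)      # pixels per second

print(f"line rate:   {line_rate} lines/s")
print(f"pixel clock: {clock / 1e6} MHz")
print(f"visible share of a line: {640 / 800:.0%}")
```

The line rate it prints is the same 31500 figure used for the horizontal sync earlier.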

Let’s see… My ATmega32u4 runs at… 16MHz. Damn. Even if I send one pixel per cycle I’m not making it.

Analog saves the day

But! VGA is an analog signal, it was made for cathode-ray (CRT) screens. Those screens understand sync pulses and lines, but pixels? They don’t know or even care what those are. The electron guns inside will just fire whenever there’s voltage on their color channels.

That means we are free to decide the horizontal resolution. How convenient!

Let’s see. Assuming we can produce one pixel with each cycle of our microcontroller, and we have to produce 31500 lines per second; that’s 16000000÷31500≈508 pixels per line. As we only get to see 80% of those pixels, that means we’ll have 406 visible pixels in each visible line.

That’s our target resolution: 406×480 pixels. Not that bad! It’s more than the Gigatron, which runs natively at 6.25MHz. That’s roughly a quarter of the pixel clock, so it can only output 640÷4=160 pixels per line.
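As a rough check of those numbers, here is a small Python sketch. It assumes one pixel per CPU cycle, exactly as the paragraph above does; the constant names are made up for this example.

```python
CPU_HZ = 16_000_000       # ATmega32u4 clock
LINE_RATE_HZ = 31_500     # 525 lines x 60 frames per second

# How many CPU cycles fit in one line, rounded to a whole cycle.
cycles_per_line = round(CPU_HZ / LINE_RATE_HZ)

# Only 640 of every 800 pixel slots are visible; keep the same proportion.
visible_cycles = cycles_per_line * 640 // 800

print(cycles_per_line, "cycles per line")
print(visible_cycles, "visible pixels per line at one pixel per cycle")
print(640 // 4, "visible pixels per line for the Gigatron")
```

Integer math is used for the 80% share so the result does not depend on floating-point rounding.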

Hopefully modern LCD monitors will be as tolerant as the good old CRT monitors!

A prototype!

Let's code! I made a first prototype by going full assembly mode with the ATmega32u4. It’s the most convenient because the timing is very important and I have to count how many cycles each instruction consumes. After choosing the pins I had to do some wiring.

The keyboard's extension port wired to a breadboard with two LEDs and a button, wired to a red Olimex PCB that features a VGA connector. — My spaghetti recipe for wiring a VGA breakout board to the extension port

The LEDs were useful for debugging! Also I put a little button there so I can reset the keyboard to flash a new firmware. I flashed many many firmwares during the development! But the flash is rated for 10,000 erase/write cycles so I think I’m still good.

The Olimex VGA breakout board is super handy and was lent to me by Kooda!

Once I got the timings right I managed to get a program with lots of busy-waiting and disabled interrupts to show some vertical lines. Each line is made of:

61 cycles for a H-Sync pulse,
31 cycles H-Sync back porch,
406 cycles of video signal,
10 cycles of H-Sync front porch.

Vertically, we have:

2 lines of V-Sync pulse,
33 lines of V-Sync back porch,
480 lines of video signal,
10 lines of V-Sync front porch.
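The two lists above can be sanity-checked against the 16MHz clock with a short Python sketch. The table names are made up for this example; the durations are the ones listed above.

```python
CPU_HZ = 16_000_000

# Duration of each part of one line, in CPU cycles.
H_TIMING = [("sync", 61), ("back porch", 31), ("video", 406), ("front porch", 10)]
# Duration of each part of one frame, in lines.
V_TIMING = [("sync", 2), ("back porch", 33), ("video", 480), ("front porch", 10)]

cycles_per_line = sum(n for _, n in H_TIMING)
lines_per_frame = sum(n for _, n in V_TIMING)

line_rate_hz = CPU_HZ / cycles_per_line
frame_rate_hz = line_rate_hz / lines_per_frame
hsync_pulse_us = 61 / CPU_HZ * 1e6

print(f"{cycles_per_line} cycles per line, {lines_per_frame} lines per frame")
print(f"line rate  {line_rate_hz:.1f} Hz")
print(f"frame rate {frame_rate_hz:.3f} Hz")
print(f"H-Sync pulse {hsync_pulse_us:.2f} us")
```

The frame rate comes out a hair under 60Hz because 508 cycles is a slightly rounded-up line length.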

A Dell monitor showing green vertical lines and the OSD showing the resolution, 720x480, and frequency, 59Hz — A lame Matrix screensaver

That looks great already! Mmh the screen is detecting 720 pixel columns but that’s okay, we’re free to decide horizontal resolutions, so it’s only fair that the screen does it too. Incidentally, with a little rewiring I managed to get a cool amber color like the old amber terminals. Warm and cosy.

Amber vertical lines on a screen, but they're not straight at the top. — Nice color but bad cycle counting

The documentation of the ATmega32u4 was wrong on the cycle count of two instructions (rcall and ret); I had to find the real values in another document. That was not the cause of the misalignment of the vertical lines at the top of the screen in the picture above though, that one was just me not counting right.

Showing something meaningful

After writing this prototype, I came to a haunting realization: I cannot possibly output one pixel per cycle.

“Advanced RISC Architecture”, “135 Powerful Instructions – Most Single Clock Cycle Execution”, they say. Well sure most instructions consume 1 cycle but they do very little! However, that’s the spirit of RISC so I can’t blame them.

The Gigatron can pull off such a color show because, although it has only 8 (EIGHT!) native instructions, one of them is very powerful:

ora [Y,X++],OUT

This instruction does 4 things in just one cycle:

Read a byte from memory, at address (Y<<8) | X
Apply a bitwise or to this value with whatever is in the accumulator
Send the result to the output register (it is directly connected to the VGA DAC, which is in fact 8 resistors)
Increment X

In just ONE cycle! That’s the instruction the Gigatron uses 160 consecutive times to bit-bang a line’s pixels to the screen. The ATmega32u4 would use 3 whole cycles to mimic this lone instruction’s effects, assuming we don’t need the bitwise or:

ld r16, X+      ; 2 cycles: load the byte pointed to by X, then increment X
out PORTB, r16  ; 1 cycle: write the byte to port B

3 cycles per pixel? Mmh where would that get us. The visible part of the line lasts 406 cycles, we would end up with 135 pixels per line. That’s less than the Gigatron, whose pixels are quite chunky already.
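To make the comparison concrete, here is a tiny Python model of what that Gigatron instruction does in one cycle, followed by the pixel budget at 3 cycles per pixel. It is only a sketch: the function name is invented, and treating X as an 8-bit register that wraps around is an assumption.

```python
def ora_y_xpp_out(ram, y, x, ac):
    """Model of `ora [Y,X++],OUT`: returns (value sent to OUT, new X)."""
    value = ram[(y << 8) | x]       # read a byte at address (Y<<8) | X
    out = value | ac                # bitwise or with the accumulator
    return out, (x + 1) & 0xFF      # increment X (assumed 8 bits wide)

ram = bytearray(0x10000)            # toy memory, larger than the real machine's
ram[0x0102] = 0x15
print(ora_y_xpp_out(ram, 1, 2, 0x40))

VISIBLE_CYCLES = 406
AVR_CYCLES_PER_PIXEL = 3            # 2-cycle load + 1-cycle out
avr_pixels = VISIBLE_CYCLES // AVR_CYCLES_PER_PIXEL
print(avr_pixels, "pixels per line on the ATmega32u4")
```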

A monitor showing a very colorful but very pixelated representation of the Mandelbrot set. A chronometer lies in the center of the screen. — “I like them big, I like them chunky” — Gigatron’s creators probably

A resolution of 135×120 pixels isn’t great but that would be a start. Okay let’s create an array for 135×120 pixels, so 16KB.

The ATmega32u4 has very little RAM

Oooh right only 2.5KB of SRAM. That’s very little. I can only store a black and white image in there (2KB); but now, outputting a pixel is not 3 cycles anymore, and we’re left with so little SRAM that running any slightly useful user program will be a challenge…
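The memory budget can be spelled out with a few lines of Python. This is a sketch under a simplifying assumption: pixels are packed back to back with no per-row padding. The helper name is made up.

```python
SRAM_BYTES = 2560   # 2.5KB of SRAM on the ATmega32u4

def framebuffer_bytes(width, height, bits_per_pixel):
    """Bytes needed for a tightly packed framebuffer, rounded up."""
    return (width * height * bits_per_pixel + 7) // 8

one_byte_per_pixel = framebuffer_bytes(135, 120, 8)
black_and_white = framebuffer_bytes(135, 120, 1)

print(one_byte_per_pixel, "bytes at one byte per pixel")
print(black_and_white, "bytes at one bit per pixel")
print(SRAM_BYTES - black_and_white, "bytes left for everything else")
```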

Ok so what do we do when we have many pixels but little memory? Why not try a Text Mode! We could have, like, 8×12 pixels characters, and that would leave us with 50×20 characters on screen for a RAM usage of 1000 bytes.
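Here is the same budget for the text mode, as a short Python sketch. It assumes a 400×240 pixel area (the figure the text settles on a little further down) and one byte per character cell; the names are made up.

```python
SCREEN_W, SCREEN_H = 400, 240   # assumed pixel area
CHAR_W, CHAR_H = 8, 12          # one character cell

columns = SCREEN_W // CHAR_W
rows = SCREEN_H // CHAR_H
text_buffer_bytes = columns * rows   # one byte per character

print(f"{columns}x{rows} characters, {text_buffer_bytes} bytes of RAM")
```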

But now it’s even harder to draw a pixel, we have to do lookups and stuff. However, we learn from the Gigatron a nice little trick: working during the scanlines!

The Gigatron spends most of its time generating the VGA signal, but it cleverly manages to execute user programs during scanlines and non-visible lines. That’s a neat trick! We could use it for the benefit of generating the VGA signal, so we get twice the amount of processing time for each graphic line on screen! And we get a stylish oldschool screen effect for free!

Ok so now our target resolution is 400×240. Each even line of the 480 real lines is an empty scanline whose time will be used to prepare the pixels to display in the next line and put them in some buffer array for easy access. But we still can’t display 400 pixels in one line with the ld and out instructions, as we saw that would require 3 cycles per pixel.

Unless we compile our character set to native code!

Hard-coding a character set in AVRe assembly

Every character has 12 lines. Each line of each character will have its own little native code. What we’ll do is that during each scanline, we’ll construct an array of code addresses corresponding to character lines code, and we’ll call these addresses during the rendering, in the next line. Those will be deduplicated and daisy chained, and it’ll look like this:

A schema showing how a line of a character is compiled to assembly. — Majestic

In this example, we set r18 and r19 to values that will respectively set low and high pin 1 of the port B, which will be our (monochrome) signal. We use the out instruction to change the pin’s state, and wait one cycle with nop.

But there’s a catch here. The ld and ijmp at the end, which load the address of the next rendering code to execute and jump to it, consume 2 cycles each. Hence our characters are not 8 pixels wide, but 8+2+2=12 pixels wide, they are very spaced out, and we’re left with only 33 columns of text.

Also, we can only load 1 byte at a time with ld, and we put it in ZL (an 8-bit register), but the address ijmp jumps to is inside Z (a 16-bit register), whose value is (ZH<<8) | ZL. Hence, all our character-line rendering code must fit within 256 words of memory, which limits the number of different character-lines we can have in total!
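The 256-word window follows directly from how Z is put together. A small Python sketch, where the value chosen for ZH is purely hypothetical:

```python
def ijmp_target(zh, zl):
    """Word address that ijmp jumps to: Z = (ZH << 8) | ZL."""
    return ((zh & 0xFF) << 8) | (zl & 0xFF)

ZH_FIXED = 0x12   # hypothetical page holding the rendering code, set once

# Only ZL changes between characters, so these are all the reachable targets.
targets = sorted({ijmp_target(ZH_FIXED, zl) for zl in range(256)})

print(len(targets), "reachable entry points")
print(hex(targets[0]), "to", hex(targets[-1]))
```

Every entry point of every character-line has to land inside that single page, although the code itself can share tails between lines.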

That’s a lot of constraints for a rather meh result.

A constrained character set

It is well known to artists that constraints stimulate creativity. So let’s get inspired and find dubious ways of solving our problems!

First, we can move the ld and put it in the place of two consecutive nops. That would reduce the separation between characters on screen down to two cycles (two pixels), the time necessary to execute the ijmp instruction. The constraint created here is that every line of every character needs to have two consecutive idle cycles available to execute the ld during its rendering phase, between jumps, which means there must always be three consecutive pixels of the same color.

Second, we can reduce the width of our characters from 8 to 6 pixels. This way, we’ll have more columns of text and we’ll have way fewer different character lines to store; that will help us make them fit in the 256 instructions window.

Now we’re back to 50×20 characters on screen, a comfortable character density, with not too much horizontal space between alphanumeric characters, most often 3 pixels, for 5 pixels of actual character pixels.
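The effect of both tricks on the number of text columns can be checked with a short Python sketch. The helper name is made up; the 406 visible cycles come from the line timing.

```python
VISIBLE_CYCLES = 406

def text_columns(pixel_cycles, overhead_cycles, visible_cycles=VISIBLE_CYCLES):
    """Whole characters that fit in the visible part of one line."""
    return visible_cycles // (pixel_cycles + overhead_cycles)

# 8 pixels, then a 2-cycle load and a 2-cycle ijmp after them.
before = text_columns(pixel_cycles=8, overhead_cycles=4)
# 6 pixels with the load hidden inside them, then only the 2-cycle ijmp.
after = text_columns(pixel_cycles=6, overhead_cycles=2)

print(before, "columns before,", after, "columns after")
```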

Good enough I say! Let’s make an MS-DOS Codepage 850 character set within those constraints because that’s what I grew up with!

A character set — I love making little bitmap fonts, it feels like some kind of therapy.

I think it’s a rather cool looking charset; some characters look a little strange due to the 3-consecutive-same-color-pixels rule, but I’d say it adds some charming uniqueness to it! So what does it look like in assembly?

Neat! I made a program to generate the assembly from the bitmap character set file so I could experiment with it. The program detects noncompliances with the constraints, and also counts how many times each character-line is used so I can try and replace the least used ones with other existing ones to lower the number of instructions.
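The real generator is not shown here, so what follows is only a guess at its core idea, written in Python. It turns one 6-pixel character row into a list of instructions, hides the 2-cycle load in the first pair of idle cycles, and rejects rows that break the constraint. The register roles (r18 low, r19 high, X pointing at the table of next addresses) follow the description above; the function name, the row notation and the exact `ld ZL, X+` form are assumptions.

```python
def compile_row(pixels):
    """Turn a row such as '.###..' into (instruction, cycles) pairs.

    '#' is a lit pixel, '.' a dark one. Each pixel lasts one cycle.
    """
    ops = []
    loaded = False
    i = 0
    while i < len(pixels):
        changed = i == 0 or pixels[i] != pixels[i - 1]
        if changed:
            # The pin must change (or be set for the first pixel).
            reg = "r19" if pixels[i] == "#" else "r18"
            ops.append((f"out PORTB, {reg}", 1))
            i += 1
        elif not loaded and i + 1 < len(pixels) and pixels[i + 1] == pixels[i]:
            # Two idle cycles in a row: fetch the next code address here.
            ops.append(("ld ZL, X+", 2))
            loaded = True
            i += 2
        else:
            ops.append(("nop", 1))
            i += 1
    if not loaded:
        raise ValueError(f"row {pixels!r} has no run of three identical pixels")
    ops.append(("ijmp", 2))   # the last pixel stays on screen during the jump
    return ops

for instruction, cycles in compile_row(".###.."):
    print(f"{instruction:<16}; {cycles} cycle(s)")
```

Whatever the row, the total is always the row width plus the two cycles of the ijmp, which is what keeps every character exactly 8 cycles wide.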

There’s also this little trick: one can use a relative jump (rjmp) to the next address to wait 2 cycles using only one instruction. That saved me 12 instructions!

Fun fact: the last pixel of each character-line is “repeated” during the jump, so graphic characters used to draw boxes and stuff can be created and seamlessly link with the next character!

Oh yeah, it’s all coming together

After long hours of counting cycles and writing code that always runs for the same amount of time, I managed to get a stable signal and this terrific view!

A screen showing amber text in a thick-border box, a list of letters that look alike, and the whole character set. — What a sight!

Oooo sexy. It feels so very reminiscent of the amber terminals I’ve seen in videos on YouTube and for which I got nostalgia by proxy! Now that I think about it, my first terminal was a Minitel actually; it had a cold black and white screen. I’ll get hold of one again someday. Anyways!

Wrapping up

What you’ve seen just above is actually more advanced than just a program showing some characters. It’s a very small terminal emulator, and there is a small user program running, and the keyboard is being scanned and pressed keys are sent to the small program. It only echoes keystrokes for now.

What happens here is that while the rendering of the visible part of the screen is still a big uninterruptible process, the H-Sync pulses of non-visible lines are generated in an interrupt connected to a precise timer. This way, I can have a user program run cooperatively alongside the VGA signal generation!

Also, I scan 1 of the 9 columns of the keyboard during the interrupt, in the last 9 non-visible lines, so the keyboard is scanned entirely at the end of each frame.
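One way to picture that schedule is the Python sketch below. It numbers the 45 non-visible lines of a frame from 0 to 44 in the order they occur, which is an assumption about the firmware, and the function name is made up.

```python
BLANK_LINES = 2 + 33 + 10   # sync + back porch + front porch
KEY_COLUMNS = 9

def column_to_scan(blank_line):
    """Keyboard column scanned on a given non-visible line, or None."""
    first_scan_line = BLANK_LINES - KEY_COLUMNS
    if first_scan_line <= blank_line < BLANK_LINES:
        return blank_line - first_scan_line
    return None

schedule = [column_to_scan(n) for n in range(BLANK_LINES)]
scanned = [c for c in schedule if c is not None]

print("columns scanned in one frame:", scanned)
```

With one full scan per frame, the keyboard is read about 60 times per second in this mode.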

And that’s it for now!

Next, I want to implement an interpreter for a little programming language now that the terminal emulator is working. I may try making a Forth interpreter!

That’s all folks! Clicky clicky!