Enabling Multi-Processors in my hobby OS
Last edited on Oct 19, 2015

I recently added multi-processor support to my homebrew OS. Here are the technical details. Chapters 8 and 10 of Volume 3 of the Intel manuals are probably your best resource.

When the system starts, all but one CPU is halted. We must signal the other CPUs to start. I won't go into the details of how to bootstrap a processor; that step is easy: switch to protected mode, set up paging, then jump to long mode. This is very well covered in the Intel manuals.

Basically, this is how we switch to protected mode:

    // Before going any further, you must enable the A-20 line. Not covered in this example

    push    %cs     /* remember, cs is 07C0*/
    pop     %ds
    mov     $GDTINFO,%eax
    lgdtl   (%eax)
    mov     %cr0,%eax
    or      $1,%al
    mov     %eax,%cr0   /* protected mode */
    mov     $0x08,%bx

    // far jump to flush the prefetch queue and reload %cs with the new code segment

     // GDT INFO
    .WORD 0x20          /* GDT limit (strictly this should be size-1, i.e. 0x1F, but 0x20 is harmless) */
    .LONG . + 0x7C04    /* that will be the address of the beginning of the GDT table (we are loaded at 0x7C00) */

    // GDT
    .LONG 00
    .LONG 00

    // GDT entry 1. Data segment descriptor used during unreal mode
    .BYTE 0xFF
    .BYTE 0xFF
    .BYTE 0x00
    .BYTE 0x00
    .BYTE 0x00
    .BYTE 0b10010010
    .BYTE 0b11001111
    .BYTE 0x00

    // GDT entry 2. Code segment used during protected mode code execution
    .BYTE 0xFF
    .BYTE 0xFF
    .BYTE 0x00
    .BYTE 0x00
    .BYTE 0x00
    .BYTE 0b10011010
    .BYTE 0b11001111
    .BYTE 0x00

    // GDT entry 3. 64bit Code segment used for jumping to 64bit mode.
    // This segment is only used to turn on 64bit mode. Segmentation will not be used anymore after 64bit code runs.
    // We will far-jump into this segment while already in long mode, but in the compatibility sub-mode.
    // Limit and permissions are ignored; the CPU only checks the D and L bits in this case and
    // switches sub-modes accordingly. So while in long mode, segments are ignored, but not entirely:
    // long mode checks the D and L bits when jumping into another segment and changes sub-modes.
    // In long mode, segments therefore serve a different purpose: changing sub-modes.
    .BYTE 0xFF
    .BYTE 0xFF
    .BYTE 0x00
    .BYTE 0x00
    .BYTE 0x00
    .BYTE 0b10011010
    .BYTE 0b10101111  // bit 6 (D) must be 0, and bit 5 (L, was reserved before) must be 1
    .BYTE 0x00

This is how we switch to long mode:

    // Before going any further, you must set up the paging structures.
    // Not covered in this example since it is very easy and well documented
    // in the Intel manuals
    mov     $8,%ax
    mov     %ax,%ds
    mov     %ax,%es
    mov     %ax,%fs
    mov     %ax,%gs
    mov     %ax,%ss

    // set PML4 address
    mov     $PML4TABLE,%eax
    mov     %eax,%cr3

    // Enable PAE (bit 5) and PGE (bit 7)
    mov     %cr4,%eax
    or      $0b10100000,%eax
    mov     %eax,%cr4

    // enable long mode: set the LME bit in the EFER MSR (0xC0000080)
    mov     $0xC0000080,%ecx
    rdmsr
    or      $0b100000000,%eax
    wrmsr

    //enable paging
    mov     %cr0,%eax
    or      $0x80000001,%eax
    mov     %eax,%cr0
    ljmpl   $0x18,$LONG_MODE_ENTRY_POINT

So at this point, the kernel is running in 64bit long mode.

Detecting the number of CPUs

The first thing to do is to detect the number of CPUs present. This can be done by looking for the "MP floating pointer" structure. It is located somewhere in the BIOS address space and we must find it. I won't go into the details of the structure since it is very well documented everywhere. The MP structure contains information about the CPUs and the IO APIC on the system, and is filled in by the BIOS at boot time. The structure can be in several places, which is why we must search for it in memory. It starts with "_MP_" and contains a checksum, so by scanning the memory you will find it. The important steps are:

  • Find the structure in memory. According to the specs, it can be in a couple of different places.
  • Detect the number of CPUs and the local APIC address of each CPU
  • Detect the IO APIC address.

For more details on how to find the structure and its format, make a search for "Intel Multi-Processor Specification".

When wandering into the SMP world, you must forget about using the PIC (Programmable Interrupt Controller); the PIC is an old, obsolete device anyway. The new way is to use the APIC, so we won't be using the PIC anymore. There is a notion of a local APIC and an IO APIC. The local APIC is an APIC that is present on each CPU. The local APICs can be used to trigger interrupts from one CPU to another, as a way of communication. When the system starts, all but one CPU is halted, and we must signal the other CPUs to start. The PIC does not allow us to do that, which is why we must use the APIC: the local APIC lets us trigger an interrupt on the other CPUs to get them out of their halted state.

We must then set up the local APIC for the current CPU. Each CPU has its own APIC, and every local APIC is mapped at the same address: 0xFEE00000. So when CPU0 reads/writes at 0xFEE00000, it is not the same as when CPU1 reads/writes at 0xFEE00000, since the address maps to each CPU's own APIC. This is nice because you don't need logic like "What CPU am I? Number x? OK, then use address xyz." Each CPU only needs to write to the same address and is guaranteed to write to its own APIC. It's all transparent, so you don't need to worry about it. The address of the IO APIC, on the other hand, maps to the same IO APIC for all CPUs, but that's also good because all CPUs want to use the same IO APIC anyway.

    mov     $APIC_BASE,%rdi
    mov     $(SPURIOUS_INTERRUPT_VECTOR | 0x100), %rax   // OR with the APIC software-enable flag
    mov     %eax, 0xF0(%rdi)                             // write to the spurious interrupt vector register

Then we start the APs:

    #define WAIT(x) push %rcx; mov $x,%rcx; 1339: pause; dec %rcx; jnz 1339b; pop %rcx;
    #define STALL() 1337: hlt; jmp 1337b;
    #define COUNT_ONES(regx,regy) push %rcx; \
        xor regy,regy; \
        1337:; \
        cmp $0,regx; \
        jz  1338f; \
        inc regy; \
        mov regx,%rcx; \
        dec %rcx; \
        and %rcx,regx; \
        jmp 1337b; \
        1338:; \
        pop  %rcx

    mov     $APIC_BASE,%rdi
    mov     $0xC4500, %rax              // broadcast INIT to all APs
    mov     %eax, APIC_REG_INTERRUPTCOMMANDLOW(%rdi)
    WAIT(100000000)                     // 100 million iterations should take more than 10ms on a 4GHz CPU
    mov     $0xC4600, %rax              // broadcast SIPI to all APs
    mov     $SMP_TRAMPOLINE,%rcx
    shr     $12,%rcx
    and     $0xFF,%rcx
    or      %rcx,%rax
    mov     %eax, APIC_REG_INTERRUPTCOMMANDLOW(%rdi)

    mov     STARTEDCPUS,%rbx
    COUNT_ONES(%rbx,%rdx)               // %rdx = number of CPUs that checked in
    cmp     CPUCOUNT,%rdx
    jz      1f
    WAIT(100000000)                     // give stragglers some time, then send a second SIPI
    mov     %eax, APIC_REG_INTERRUPTCOMMANDLOW(%rdi)
    mov     STARTEDCPUS,%rbx
    COUNT_ONES(%rbx,%rdx)
    cmp     CPUCOUNT,%rdx
    jz      1f
    // CPUs are not all started; should do something about that
    1:

The SMP_TRAMPOLINE constant is the address where I want the APs to jump when starting. This address must be aligned on a 4k boundary because the SIPI message takes the page number as a parameter; hence the SHR of the address by 12 (divide by 4096). And since the APs start in 16-bit mode, the address must reside under the 1MB barrier. STARTEDCPUS is a 64-bit bitfield that represents the CPUs; each bit gets set by an AP (cpuX sets bit X).

Application processors trampoline code

I decided to put the Application Processors' trampoline code in the bootloader (I've got 512 bytes of room, that should be enough). The bootloader is a good choice because it is below the 1MB mark, the source file is compiled as 16-bit code, and all the initialisation is done there anyway. But when an AP starts, it will be given a start address aligned on a 4k page boundary, and the bootloader is at 0x7C00. So the bootloader copies a "jmp" at 0x1000 that jumps to the bootloader's AP init function. The order of execution is:

  • AP receives SIPI with vector 0x01
  • AP jumps to 0x1000
  • Code at 0x1000 will make AP jump to 0x7C0:
  • AP switches to protected mode and jumps to KernelMain
  • KernelMain checks MSR[0x1B] (IA32_APIC_BASE) to see if this is an AP or the BSP; if the BSP, jump to the normal initialisation
  • set up a temporary stack for the AP's thread of execution: 0x70000 + 256*APIC_ID (256-byte stacks)
  • enable long mode (64 bit)
  • set the CPU-started bit in the global variable: STARTEDCPUS = STARTEDCPUS | (1 << APIC_ID)

So now I have multiple processors ready for work. The next step is to make an SMP-compatible scheduler and start using the IO APIC. I'll cover that another time.

New Home Automation system
Last edited on Aug 5, 2015


During the past months, I've been working on my Home Automation System. I did some major refactoring and moved away from the rPi. I am now running my home automation software on an x86-64 server.

The project is hosted on github at https://github.com/pdumais/dhas.

The system now uses a modular architecture. I like to think of DHAS as a hub for different technologies. Since my home automation system is a mix of Insteon devices, ethernet relays, temperature sensors, IP phones, and more, I made a central piece of software that can interface with all those technologies. Each module is responsible for interfacing with one technology. On github, you will find these modules under the src/module folder. Every time I need to add a new kind of module, I just create the class and it automatically gets instantiated and used. The modules register their own REST callbacks, and since the REST engine is self-documenting, it will show the newly added functions when querying the help API. This way, the system doesn't know anything about its modules; it only knows that it is running modules. Adding new modules becomes very easy because the work is isolated from the rest of the system.

The simplest module to look at is the Weather Module.


Since I am not using the rPi anymore, I needed to find a way to get GPIOs on the server. So I bought an Arduino Leonardo and made a very simple firmware that presents the device as a CDC device using the LUFA library. The source code for the firmware is here:


#include "usblib.h"

#define STABLE 100

uint8_t currentData = 0;

/*
Sent byte to host:
bit     Arduino pin     AVR Pin
0           2               PD1
1           4               PD4
2           7               PE6
3           8               PB4
4           12              PD6

Received commands:
    'a' -> arduino 10 -> PB6   ; 'a' = ON, 'A' = off
    'b' -> arduino 11 -> PB7   ; 'b' = ON, 'B' = off
To test the live stream: cat /dev/ttyACM0 | xxd -c1
*/

void setCurrentData(uint8_t data)
{
    currentData = data;
}

void sendCurrentData()
{
    SendCDCChar(currentData);
}

int main(void)
{
    uint8_t pb,pd,pe;
    uint8_t newData = 0;
    uint8_t lastData = 0;
    char stabilizerCount = -1;

    InitCDC();

    // Set pins as pullups
    PORTB |= ((1<<4));
    PORTD |= ((1<<1)|(1<<4)|(1<<6));
    PORTE |= ((1<<6));
    // Set pins as input
    DDRB &= ~((1<<4));
    DDRD &= ~((1<<1)|(1<<4)|(1<<6));
    DDRE &= ~((1<<6));

    // set PB6 and PB7 as output
    DDRB |= (1<<6)|(1<<7);
    PORTB |= (1<<6)|(1<<7); // initially off (high = off)

    uint8_t receivedChar;
    while (1)
    {
        pb = PINB;
        pd = PIND;
        pe = PINE;

        newData = ~(((pd>>1)&1)|(((pd>>4)&1)<<1)|(((pe>>6)&1)<<2)|(((pb>>4)&1)<<3)|(((pd>>6)&1)<<4));
        newData &= 0x1F; // clear 3 top bits since we don't use them

        if (GetCDCChar(&receivedChar))
        {
            if (receivedChar == '?')
                sendCurrentData();
            else if (receivedChar == 'A')
                PORTB |= (1<<6);
            else if (receivedChar == 'B')
                PORTB |= (1<<7);
            else if (receivedChar == 'a')
                PORTB &= ~(1<<6);
            else if (receivedChar == 'b')
                PORTB &= ~(1<<7);
        }

        // debounce: a reading must stay stable for STABLE polls before we accept it
        if (lastData != newData)
            stabilizerCount = STABLE;
        lastData = newData;
        if (stabilizerCount>0) stabilizerCount--;
        if (stabilizerCount==0)
        {
            if (currentData != newData)
            {
                setCurrentData(newData);
                sendCurrentData();
            }
        }

        CDCWork();
    }
}


#include "usblib.h"
#include "Descriptors.h"

USB_ClassInfo_CDC_Device_t VirtualSerial_CDC_Interface =
{
    .Config = {
        .ControlInterfaceNumber = INTERFACE_ID_CDC_CCI,
        .DataINEndpoint = {
            .Address = CDC_TX_EPADDR,
            .Size = CDC_TXRX_EPSIZE,
            .Banks = 1,
        },
        .DataOUTEndpoint = {
            .Address = CDC_RX_EPADDR,
            .Size = CDC_TXRX_EPSIZE,
            .Banks = 1,
        },
        .NotificationEndpoint = {
            .Address = CDC_NOTIFICATION_EPADDR,
            .Size = CDC_NOTIFICATION_EPSIZE,
            .Banks = 1,
        },
    },
};

void CDCWork()
{
    CDC_Device_USBTask(&VirtualSerial_CDC_Interface);
    USB_USBTask();
}

uint8_t GetCDCChar(uint8_t* data)
{
    int16_t r = CDC_Device_ReceiveByte(&VirtualSerial_CDC_Interface);
    if (r >= 0)
    {
        *data = r;
        return 1;
    }
    return 0;
}

void SendCDCChar(uint8_t data)
{
    CDC_Device_SendByte(&VirtualSerial_CDC_Interface, data);
}

void InitCDC()
{
    MCUSR &= ~(1 << WDRF);
    wdt_disable();
    USB_Init();
    GlobalInterruptEnable();
}

void EVENT_USB_Device_Connect(void)
{
}

void EVENT_USB_Device_Disconnect(void)
{
}

void EVENT_USB_Device_ConfigurationChanged(void)
{
    bool ConfigSuccess = true;
    ConfigSuccess &= CDC_Device_ConfigureEndpoints(&VirtualSerial_CDC_Interface);
}

void EVENT_USB_Device_ControlRequest(void)
{
    CDC_Device_ProcessControlRequest(&VirtualSerial_CDC_Interface);
}

void EVENT_CDC_Device_LineEncodingChanged(USB_ClassInfo_CDC_Device_t* const CDCInterfaceInfo)
{
}

Installing and using Docker on Slackware
Last edited on Jun 6, 2015


Imagine you want an asterisk system complete with mariadb and apache, but you don't want to install all that on your day-to-day system. You could create a VM, but VMs are heavy and have quite some overhead. What about backups? You can't copy a live running image (not with qemu at least).

Enter Docker. Docker makes this really simple. Docker encloses an environment like a chroot would, but it acts more like a VM. With Docker, the equivalent of a VM is a container. You create your filesystem, install the needed software in your container, and run it. But Docker lets you run only one command: you start the container, the command executes in its environment, and when it exits the container stops. All this uses the host's kernel, separated with namespaces and cgroups. Nothing prevents you from running a script as the command. So you make a script that starts httpd and asterisk and then loads bash; the container will run as long as bash doesn't exit. So if you attach your session to the container and "exit" bash, the container will stop, and asterisk and httpd will shut down. The container does not start "init" (unless that's the command you chose to invoke), so it's not like an entire OS is brought up. Only the command you run is executed in the container's FS, but with the host's kernel.

Docker allows you to export running containers to a tarball. The tarball contains the entire FS of the container (not the memory). You can then import it anywhere Docker runs.

Installing docker on slackware

These are the instructions for installing docker 1.7.0 on slackware 14.1 with kernel 3.11.1. I also tried with slackware 14.0, but its rc.S script does not mount the cgroup hierarchy properly, so I modified it to do it like in 14.1. I could not get it to work with kernel 3.10.17 and did not bother to troubleshoot since I had 3.11.1 on hand. 1.7.0-rc1 did not work for me; I needed commit 6cdf8623d52e7e4c5b5265deb5f5b1d33f2e6e95 in, so I cloned the bleeding edge from git, but then 2h later rc2 came out.


Before anything, you should make sure that your kernel is compiled with the required options.


I actually made a script to enable all those settings

CFG=( \
     "NF_NAT_IPV4" \
     "CONFIG_VETH" \
)

for i in ${CFG[*]}; do
    sed -i "/^# *$i/c\\$i=y" $1
done

Maybe some other flags need to be set on your kernel, but these were all the ones I was missing. There is a utility that you can download to check those settings: https://github.com/docker/docker/blob/master/contrib/check-config.sh

Download, compile, and prepare the environment

At first, I tried downloading the binaries, but docker was complaining about "Udev sync is not supported". I found out that was because the binary is statically linked, and it causes some problems that I didn't care to look into. So I opted for building from source. The first step is to get "go". I didn't want to leave this on my system, so I just put it in a temporary place and then deleted it.

wget https://storage.googleapis.com/golang/go1.4.2.linux-amd64.tar.gz
tar -zxf go1.4.2.linux-amd64.tar.gz
mv go /opt

#You should make that permanent if you intend to keep Go after building Docker

# download docker source.
wget https://github.com/docker/docker/archive/v1.7.0-rc2.tar.gz
tar -zxf v1.7.0-rc2.tar.gz
cd docker-1.7.0-rc2

#docker won't build because there is a header file that won't be found. There is a patch
#for that, but let's do it manually here:
sed -i "/ioctl\.h/c\#include \r\n#include \r\n#include \r\n#include \r\n#include \r\n" daemon/graphdriver/btrfs/btrfs.go

#note that DOCKER_GITCOMMIT needs to match the version you have downloaded
GOROOT=/opt/go AUTO_GOPATH=1 DOCKER_GITCOMMIT="395cced " ./hack/make.sh dynbinary
cp bundles/1.7.0-rc2/dynbinary/docker-1.7.0-rc2 /usr/sbin/docker
cp bundles/1.7.0-rc2/dynbinary/dockerinit-1.7.0-rc2 /usr/sbin/dockerinit

#remove go... or not. It's up to you. If you leave it there, you might want to permanently add it to your PATH
rm -Rf /opt/go

Prepare network bridge

In my case, since I was already using KVM/qemu, I had a bridge already set up. But this is what would be needed:

#create bridge
brctl addbr br0
ifconfig eth0 down
ifconfig br0 netmask broadcast up

# add eth0 as member of the bridge and bring it up.
brctl stp br0 off
brctl setfd br0 1
brctl sethello br0 1
brctl addif br0 eth0
ifconfig eth0 promisc up
# setup default gateway.
route add default gw

You might want that last example to run at boot time. There is a way to set up a bridge with the init scripts, but I just added those lines in rc.local before launching the docker daemon.

There is currently no easy way to assign a static IP to your container. Docker will choose an IP in the range of your bridge, but this isn't perfect: it seems to pick some addresses that are already used on my network. Issue 6743 on github is opened for that. For the time being, I've hacked the code to make this possible. I won't create a pull request since they are already working on a more elegant solution, but meanwhile you can download my fork on github if you need it. That repo contains the patch to build on slackware and also adds a "--ipv4-adress=A.B.C.D" option to docker run.

Auto start

Finally, you should add this in your rc.local script.

/usr/sbin/docker -d -b br0 &

Building a container, running it and doing backups

My use case

What I'm looking to do is isolate my home automation services in one container that I can easily move from one computer to another (or even to a VM). In case of a hardware failure, I want to reduce the downtime of my house services. Those include Asterisk, httpd, MariaDB, CouchDB, DHAS, cron jobs, and more. I want to be able to make a daily backup of the container and always be able to launch it from somewhere else where Docker is installed.

Creating a container

docker run -ti vbatts/slackware:14.1 /bin/bash

That command will create a container from a base image "slackware 14.1" from docker hub. The image will be downloaded automatically, then bash will be invoked. The -t and -i flags give you an interactive TTY so you will be able to interact with bash. From there, install whatever you need in the container: download gcc, install it, etc. Once you are done, exit bash. By exiting bash, the container stops. You can now commit your changes to a new base image:

docker commit ContainerID awesomeNewImage

Now you have a base image of your own that you can share with other people. Now create another container for your real use case from the base image.

docker run -tid --restart=always awesomeNewImage /root/start.sh

The -d flag runs the container in the background; you can access it using "docker attach". --restart=always makes the container restart automatically when it exits and when the docker daemon starts (after a host reboot, for example). When you were setting up your base image, you could have created a start.sh script that invokes asterisk, httpd, mysqld and couchdb, then bash. To detach from the container without stopping it (leaving your command running), press Ctrl-P, Ctrl-Q.


An easy way to back up is to regularly export the container (cron job?).

docker export -o backup.tar ContainerID

This will create a tarball containing the entire filesystem of your container. Then to restore, either locally or on some other machine running docker:

cat backup.tar | docker import - restoredbackup

System On A Chip emulator
Last edited on Apr 9, 2015

I was looking at some online NES emulators the other day and it gave me an idea: what if I created my own emulator, but for my own architecture instead of emulating an existing one? That is easier, of course, because I get to decide what the architecture looks like instead of having to go through the specs of the platform I want to emulate. So I created my own instruction set with an assembler and a disassembler. The assembler converts the assembly code to a binary format (obviously) and the result can be downloaded. Again, this is my own virtual assembly and my own architecture, so the instructions are encoded in my own format. Each instruction is (inefficiently) encoded on 64 bits, including references to 2 operands, a 32-bit immediate value, and a condition code. Each instruction is conditional.

The emulator can be found at http://www.dumaisnet.ca/vm
The documentation about the architecture and the instruction set: Documentation

Since it is written in 100% client-side javascript, it was very easy to reuse the code and make command-line tools using node.js. I've created 3 tools:

  • An assembler: convert your source code to binary
  • A disassembler: convert binary file to assembly.
  • A Simulator: execute a binary

You can find the project on github at https://github.com/pdumais/patasm

Using Jssip and asterisk to make a webphone
Last edited on Apr 7, 2015

Building a web-based phone is easy enough with asterisk and jssip. At the time of writing this, I was using asterisk 11.6 (LTS) and jssip 0.6.21.

My web application is hosted on a local webserver that resides on the same server as asterisk. Because of that, it is not possible to tell asterisk to bind to port 80 or 443 for its internal websocket server, so I've configured asterisk to bind to port 8088. But having a webapp that communicates on such a port will not work if the client is behind a firewall that blocks outside access to non-standard ports. So I've configured Apache to proxy websocket connections for a chosen URL to the asterisk server on the local host.

Configuring asterisk

step-by-step instructions:

  • Make sure res_srtp is loaded in asterisk (you may need to install libsrtp and rebuild asterisk). This is absolutely needed for webrtc to work.
  • Enable websockets in asterisk http.conf
  • Make a proxy entry in apache for wss://www.dumaisnet.ca/pbxws -> ws://
    <VirtualHost *:443>
    SSLEngine On
    ServerName www.dumaisnet.ca
    <Location /pbxws>
        ProxyPass ws:// nocanon
        ProxyPassReverse ws://
    </Location>
    </VirtualHost>
  • Generate DTLS certificates for asterisk
    mkdir /folder.to.keys
    cd <asterisk src directory>/contrib/scripts
    ./ast_tls_cert -C wwwdumaisnet.ca -O "Dumaisnet" -d /folder.to.keys
  • Configure a SIP peer in sip.conf

Building the SIP client

jssip is very easy to use. The following example should register to your server and automatically answer an incoming call