WWW.DUMAIS.IO
ARTICLES
OVERLAY NETWORKS WITH MY SDN CONTROLLERSIMPLE LEARNING SWITCH WITH OPENFLOWINSTALLING KUBERNETES MANUALLYWRITING A HYPERVISOR WITH INTEL VT-X CREATING YOUR OWN LINUX CONTAINERSVIRTIO DRIVER IMPLEMENTATIONNETWORKING IN MY OSESP8266 BASED IRRIGATION CONTROLLERLED STRIP CONTROLLER USING ESP8266.OPENVSWITCH ON SLACKWARESHA256 ASSEMBLY IMPLEMENTATIONPROCESS CONTEXT ID AND THE TLBTHREAD MANAGEMENT IN MY HOBBY OSENABLING MULTI-PROCESSORS IN MY HOBBY OSNEW HOME AUTOMATION SYSTEMINSTALLING AND USING DOCKER ON SLACKWARESYSTEM ON A CHIP EMULATORUSING JSSIP AND ASTERISK TO MAKE A WEBPHONEC++ WEBSOCKET SERVERSIP ATTACK BANNINGBLOCK CACHING AND WRITEBACKBEAGLEBONE BLACK BARE METAL DEVELOPEMENTARM BARE METAL DEVELOPMENTUSING EPOLLMEMORY PAGINGIMPLEMENTING HTTP DIGEST AUTHENTICATIONSTACK FRAME AND THE RED ZONE (X86_64)AVX/SSE AND CONTEXT SWITCHINGHOW TO ANSWER A QUESTION THE SMART WAY.REALTEK 8139 NETWORK CARD DRIVERREST INTERFACE ENGINECISCO 1760 AS AN FXS GATEWAYHOME AUTOMATION SYSTEMEZFLORA IRRIGATION SYSTEMSUMP PUMP MONITORINGBUILDING A HOSTED MAILSERVER SERVICEI AM NOW HOSTING MY OWN DNS AND MAIL SERVERS ON AMAZON EC2DEPLOYING A LAYER3 SWITCH ON MY NETWORKACD SERVER WITH RESIPROCATEC++ JSON LIBRARYIMPLEMENTING YOUR OWN MUTEX WITH CMPXCHGWAKEUPCALL SERVER USING RESIPROCATEFFT ON AMD64CLONING A HARD DRIVECONFIGURING AND USING KVM-QEMUUSING COUCHDBINSTALLING COUCHDB ON SLACKWARENGW100 MY OS AND EDXS/LSENGW100 - MY OSASTERISK FILTER APPLICATIONCISCO ROUTER CONFIGURATIONAASTRA 411 XML APPLICATIONSPA941 PHONEBOOKSPEEDTOUCH 780 DOCUMENTATIONAASTRA CONTACT LIST XML APPLICATIONAVR32 OS FOR NGW100ASTERISK SOUND INJECTION APPLICATIONNGW100 - DIFFERENT PROBLEMS AND SOLUTIONSAASTRA PRIME RATE XML APPLICATIONSPEEDTOUCH 780 CONFIGURATIONUSING COUCHDB WITH PHPAVR32 ASSEMBLY TIPAP7000 AND NGW100 ARCHITECTUREAASTRA WEATHER XML APPLICATIONNGW100 - GETTING STARTEDAASTRA ALI XML APPLICATION

SHA256 ASSEMBLY IMPLEMENTATION

2015-12-17

Assembly implementation

For a moment now, I've been wanting to try the intel AVX instructions. So I decided to write an SHA256 function in pure x86-64 assembly. That algorithm might not be the best thing to parallelize though but it still was fun to do.

Update dec 16 2015: I have modified my code to use AVX2 instructions. I recently bought a Intel i5-6400 which supports a lot of new instructions I didn't have before. So I modified the algorithm to use bleeding-edge instructions.

Performances

The use of AVX instructions in this algorithm might not give better performances. The only reason I did this was because I wanted to play with AVX. Using AVX2 probably helps a lot more since a lot of the AVX instructions are eliminated.

In fact, I'm not entirely convinced that using AVX will benefit a lot. One thing to consider here, is that for small hashes it could be slower. The thing is that when using AVX, you are using the xmm/ymm registers. During a context switch, the OS does not automatically save/restore the state of the AVX registers. Those are lazy-saved. Meaning that, without going too much into the details, the CPU will only save/restore the AVX registers if it detects that the current thread is using them (using a dirty flag and an exception). Such a save/restore is crazy expensive. So introducing the usage of AVX registers in a thread will cost a lot for the context switch, yielding less processing time for the thread. So the thing to consider is: will the thread use the AVX instructions enough to overcome the cost of the context switch?