WWW.DUMAIS.IO


SHA256 ASSEMBLY IMPLEMENTATION

2015-12-17

Assembly implementation

For a while now, I've been wanting to try the Intel AVX instructions, so I decided to write a SHA256 function in pure x86-64 assembly. SHA256 might not be the best algorithm to parallelize, since each round depends on the result of the previous one, but it was still fun to do.
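For reference, here is a minimal portable C sketch of SHA-256 as specified in FIPS 180-4. It is not the author's assembly, and the names sha256 and sha256_block are hypothetical. It shows why the algorithm is awkward to parallelize: each of the 64 rounds in sha256_block needs the a..h values produced by the previous round, so only the message schedule (the w[] array) offers independent work. It assumes the whole message is in memory (one-shot hashing).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* SHA-256 round constants (FIPS 180-4, section 4.2.2). */
static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};

#define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

/* Compress one 64-byte block into the eight-word state h. */
static void sha256_block(uint32_t h[8], const uint8_t *p) {
    uint32_t w[64];
    /* Message schedule: the first 16 words are the block, read big-endian. */
    for (int t = 0; t < 16; t++)
        w[t] = (uint32_t)p[4 * t] << 24 | (uint32_t)p[4 * t + 1] << 16 |
               (uint32_t)p[4 * t + 2] << 8 | (uint32_t)p[4 * t + 3];
    for (int t = 16; t < 64; t++) {
        uint32_t s0 = ROTR(w[t - 15], 7) ^ ROTR(w[t - 15], 18) ^ (w[t - 15] >> 3);
        uint32_t s1 = ROTR(w[t - 2], 17) ^ ROTR(w[t - 2], 19) ^ (w[t - 2] >> 10);
        w[t] = w[t - 16] + s0 + w[t - 7] + s1;
    }
    uint32_t a = h[0], b = h[1], c = h[2], d = h[3];
    uint32_t e = h[4], f = h[5], g = h[6], hh = h[7];
    /* 64 rounds: every round consumes the previous round's a..h,
       so this loop is inherently sequential. */
    for (int t = 0; t < 64; t++) {
        uint32_t t1 = hh + (ROTR(e, 6) ^ ROTR(e, 11) ^ ROTR(e, 25)) +
                      ((e & f) ^ (~e & g)) + K[t] + w[t];
        uint32_t t2 = (ROTR(a, 2) ^ ROTR(a, 13) ^ ROTR(a, 22)) +
                      ((a & b) ^ (a & c) ^ (b & c));
        hh = g; g = f; f = e; e = d + t1;
        d = c; c = b; b = a; a = t1 + t2;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d;
    h[4] += e; h[5] += f; h[6] += g; h[7] += hh;
}

/* One-shot SHA-256 of len bytes at msg; writes the 32-byte digest to out. */
void sha256(const uint8_t *msg, size_t len, uint8_t out[32]) {
    uint32_t h[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                     0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
    size_t full = len / 64, rem = len % 64;
    for (size_t i = 0; i < full; i++)
        sha256_block(h, msg + 64 * i);
    /* Padding: 0x80, zeros, then the bit length as a 64-bit big-endian
       integer; a second block is needed if fewer than 9 bytes remain. */
    uint8_t tail[128] = {0};
    memcpy(tail, msg + 64 * full, rem);
    tail[rem] = 0x80;
    size_t tail_len = rem < 56 ? 64 : 128;
    uint64_t bits = (uint64_t)len * 8;
    for (int i = 0; i < 8; i++)
        tail[tail_len - 1 - i] = (uint8_t)(bits >> (8 * i));
    for (size_t off = 0; off < tail_len; off += 64)
        sha256_block(h, tail + off);
    /* Serialize the state big-endian. */
    for (int i = 0; i < 8; i++) {
        out[4 * i] = (uint8_t)(h[i] >> 24);
        out[4 * i + 1] = (uint8_t)(h[i] >> 16);
        out[4 * i + 2] = (uint8_t)(h[i] >> 8);
        out[4 * i + 3] = (uint8_t)h[i];
    }
}
```

Calling sha256((const uint8_t *)"abc", 3, digest) should produce the standard FIPS test digest ba7816bf...f20015ad. An assembly or SIMD version would mainly speed up the message schedule and the bit operations inside each round; the round-to-round dependency remains.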

Update Dec 16 2015: I have modified my code to use AVX2 instructions. I recently bought an Intel i5-6400, which supports many instructions my previous CPU didn't have, so I modified the algorithm to use these bleeding-edge instructions.

Performance

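Using AVX instructions in this algorithm might not improve performance; I only did it because I wanted to play with AVX. Using AVX2 probably helps a lot more: first-generation AVX can only do integer operations on the 128-bit xmm registers, while AVX2 can do them on the full 256-bit ymm registers, which eliminates many of the instructions the AVX version needs.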
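To illustrate what AVX2 adds, here is a minimal sketch using C intrinsics rather than the author's hand-written assembly, assuming GCC or Clang on x86-64. It computes SHA-256's small sigma0 on eight 32-bit message-schedule words in a single ymm register. This is valid within one message because W[t] uses sigma0(W[t-15]), and for eight consecutive values of t those inputs are already computed (unlike sigma1(W[t-2])). Packed 32-bit shifts, ORs and XORs on ymm registers (vpsrld, vpslld, vpor, vpxor) all require AVX2. The names ROTR8X32, sigma0_x8 and sigma0 are hypothetical.

```c
#include <immintrin.h>
#include <stdint.h>

/* Rotate each 32-bit lane of a 256-bit vector right by a constant n.
   AVX2 has no packed 32-bit rotate (vprord only arrived with AVX-512),
   so the rotate is emulated with two shifts and an OR. */
#define ROTR8X32(x, n) \
    _mm256_or_si256(_mm256_srli_epi32((x), (n)), _mm256_slli_epi32((x), 32 - (n)))

/* sigma0(x) = ROTR7(x) ^ ROTR18(x) ^ SHR3(x), on eight words at once.
   The target attribute lets GCC/Clang emit AVX2 code for this function
   without compiling the whole file with -mavx2; callers must check that
   the CPU supports AVX2 before calling it. */
__attribute__((target("avx2")))
void sigma0_x8(const uint32_t in[8], uint32_t out[8]) {
    __m256i x = _mm256_loadu_si256((const __m256i *)in);
    __m256i r = _mm256_xor_si256(ROTR8X32(x, 7), ROTR8X32(x, 18));
    r = _mm256_xor_si256(r, _mm256_srli_epi32(x, 3));
    _mm256_storeu_si256((__m256i *)out, r);
}

/* Scalar reference: one word per call. */
uint32_t sigma0(uint32_t x) {
    return ((x >> 7) | (x << 25)) ^ ((x >> 18) | (x << 14)) ^ (x >> 3);
}
```

A caller would guard the vector path with __builtin_cpu_supports("avx2") and fall back to the scalar function otherwise. With first-generation AVX only, the same eight words would have to be processed as two 128-bit halves.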

In fact, I'm not entirely convinced that using AVX brings much benefit. One thing to consider is that when hashing small inputs it could even be slower, because using AVX means using the xmm/ymm registers. With lazy FPU context switching, the OS does not save and restore the AVX register state on every context switch. Without going too much into the details, the OS marks the register state as not belonging to the incoming thread (by setting the TS flag in CR0); the first time that thread executes an SSE/AVX instruction, the CPU raises a device-not-available exception (#NM), and the OS handler then saves the previous owner's registers and restores the current thread's. Such a save/restore is expensive. So introducing AVX usage in a thread adds a cost to its context switches, leaving less processing time for the thread. The question to ask is therefore: will the thread use AVX instructions enough to make up for the extra context-switch cost?
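To get a sense of how much register state has to move, the sketch below asks the CPU, through CPUID leaf 0xD (sub-leaf 0, EBX), for the size of the XSAVE area covering the state components the OS has enabled. It assumes x86-64 and the cpuid.h header shipped with GCC and Clang; xsave_area_size is a hypothetical name. When only x87, SSE and AVX state are enabled this should be 832 bytes (a 512-byte legacy area, a 64-byte XSAVE header, and 256 bytes for the upper halves of ymm0 to ymm15); AVX-512 and later extensions make it considerably larger.

```c
#include <cpuid.h>

/* Size in bytes of the XSAVE area for the state components currently
   enabled by the OS (XCR0), or 0 if CPUID leaf 0xD is unavailable or
   the OS has not enabled XSAVE (CPUID.1:ECX.OSXSAVE). */
unsigned xsave_area_size(void) {
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid_max(0, 0) < 0xD)
        return 0;
    __cpuid(1, eax, ebx, ecx, edx);
    if (!(ecx & bit_OSXSAVE))
        return 0;
    __cpuid_count(0xD, 0, eax, ebx, ecx, edx);
    return ebx;
}
```

Every lazy save/restore moves roughly this much data in each direction, on top of the cost of taking the #NM exception itself.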