org.htmlparser.scanners

Class ScriptDecoder


public class ScriptDecoder
extends Object

Decode script. Script obfuscated by the Windows Script Encoder provided by Microsoft is converted to plaintext. This code is based loosely on example code provided by MrBrownstone, with changes by Joe Steele; see scrdec14.c.

Field Summary

static int
LAST_STATE
The state to enter when decrypting is complete.
protected static int
STATE_CHECKSUM
State when reading the checksum.
protected static int
STATE_DECODE
State while decoding.
static int
STATE_DONE
Termination state.
protected static int
STATE_ESCAPE
State when reading an escape sequence.
protected static int
STATE_FINAL
State while exiting.
static int
STATE_INITIAL
State on entry.
protected static int
STATE_LENGTH
State while reading the encoded length.
protected static int
STATE_PREFIX
State when reading up to decoded text.
protected static int[]
mDigits
The base 64 decoding table.
protected static byte[]
mEncodingIndex
Table of lookup choice.
protected static char[]
mEscaped
The escaped characters corresponding to each escape sequence.
protected static char[]
mEscapes
Escape sequence characters.
protected static char[]
mLeader
The leader.
protected static char[][]
mLookupTable
Two dimensional lookup table.
protected static char[]
mPrefix
The prefix.
protected static char[]
mTrailer
The trailer.

Method Summary

static String
Decode(Page page, Cursor cursor)
Decode script encoded by the Microsoft obfuscator.
protected static long
decodeBase64(char[] p)
Extract the base 64 encoded number.

Field Details

LAST_STATE

public static int LAST_STATE
The state to enter when decoding is complete. If this is STATE_DONE, decoding returns with any characters following the encoded text still unconsumed. If this is STATE_INITIAL, the input is read to exhaustion and all following characters are included in the return value of the Decode() method.

STATE_CHECKSUM

protected static final int STATE_CHECKSUM
State when reading the checksum.
Field Value:
6

STATE_DECODE

protected static final int STATE_DECODE
State while decoding.
Field Value:
4

STATE_DONE

public static final int STATE_DONE
Termination state.
Field Value:
0

STATE_ESCAPE

protected static final int STATE_ESCAPE
State when reading an escape sequence.
Field Value:
5

STATE_FINAL

protected static final int STATE_FINAL
State while exiting.
Field Value:
7

STATE_INITIAL

public static final int STATE_INITIAL
State on entry.
Field Value:
1

STATE_LENGTH

protected static final int STATE_LENGTH
State while reading the encoded length.
Field Value:
2

STATE_PREFIX

protected static final int STATE_PREFIX
State when reading up to decoded text.
Field Value:
3
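
The state constants above suggest a small state machine. The following is a minimal sketch of one plausible ordering, inferred only from the field descriptions on this page and not confirmed by the library source. The class StateSketch and the helper next() are hypothetical names; the numeric values are copied from the Field Value entries.

```java
final class StateSketch {
    // Values copied from the Field Value entries on this page.
    static final int STATE_DONE = 0;
    static final int STATE_INITIAL = 1;
    static final int STATE_LENGTH = 2;
    static final int STATE_PREFIX = 3;
    static final int STATE_DECODE = 4;
    static final int STATE_ESCAPE = 5;
    static final int STATE_CHECKSUM = 6;
    static final int STATE_FINAL = 7;

    // Hypothetical helper: the state assumed to follow `state` on the
    // straight-line path through one encoded block. `lastState` plays
    // the role of the LAST_STATE field.
    static int next(int state, int lastState) {
        switch (state) {
            case STATE_INITIAL:  return STATE_LENGTH;   // leader seen, read length
            case STATE_LENGTH:   return STATE_PREFIX;   // length read, skip prefix
            case STATE_PREFIX:   return STATE_DECODE;   // start of encoded text
            case STATE_DECODE:   return STATE_CHECKSUM; // encoded length exhausted
            case STATE_ESCAPE:   return STATE_DECODE;   // escape handled, resume
            case STATE_CHECKSUM: return STATE_FINAL;    // checksum read, skip trailer
            case STATE_FINAL:    return lastState;      // STATE_DONE or STATE_INITIAL
            default:             return STATE_DONE;
        }
    }
}
```

In this sketch STATE_ESCAPE is a detour out of STATE_DECODE, and the value passed as lastState decides whether the machine stops after one block or returns to STATE_INITIAL to keep reading.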

mDigits

protected static int[] mDigits
The base 64 decoding table. This array determines the value of decoded base 64 elements.

mEncodingIndex

protected static byte[] mEncodingIndex
Table of lookup choice. The decoding cycles between three flavours determined by this sequence of 64 choices, corresponding to the first dimension of the lookup table.
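
To illustrate how a 64-entry choice table can select one of three rows of a lookup table, here is a minimal sketch. The table contents below are placeholders, not the real values of mEncodingIndex or mLookupTable, and the 128-column width, the class LookupSketch, and the position parameter are assumptions made for the example.

```java
final class LookupSketch {
    static final int FLAVOURS = 3;  // first dimension of the lookup table
    static final int CYCLE = 64;    // number of choices before the pattern repeats
    static final int CHARS = 128;   // assumed second dimension (7-bit characters)

    // Placeholder data only; the real tables are not listed on this page.
    static final byte[] ENCODING_INDEX = new byte[CYCLE];
    static final char[][] LOOKUP = new char[FLAVOURS][CHARS];

    static {
        for (int i = 0; i < CYCLE; i++) {
            ENCODING_INDEX[i] = (byte) (i % FLAVOURS);
        }
        for (int f = 0; f < FLAVOURS; f++) {
            for (int c = 0; c < CHARS; c++) {
                LOOKUP[f][c] = (char) (c ^ f); // arbitrary stand-in mapping
            }
        }
    }

    // Which row of the lookup table applies at this character position.
    static int flavour(int position) {
        return ENCODING_INDEX[position % CYCLE];
    }

    // Translate one encoded (non-escaped, 7-bit) character at the given position.
    static char decodeChar(char encoded, int position) {
        return LOOKUP[flavour(position)][encoded];
    }
}
```

The point of the sketch is the indexing: the position wraps every 64 characters, the choice table yields a row number, and that row maps the encoded character to plaintext.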

mEscaped

protected static char[] mEscaped
The escaped characters corresponding to each escape sequence.

mEscapes

protected static char[] mEscapes
Escape sequence characters.
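
mEscapes and mEscaped read as a pair of parallel arrays: the character at index i of one corresponds to the character at index i of the other. The sketch below shows that lookup. The array contents are an assumption taken from the scrdec reference decoder, not from this page, and EscapeSketch and unescape() are hypothetical names.

```java
final class EscapeSketch {
    // Assumed contents (from the scrdec reference decoder, not this page).
    static final char[] ESCAPES = {'#', '&', '!', '*', '$'};
    static final char[] ESCAPED = {'\r', '\n', '<', '>', '@'};

    // Map the character that follows the escape introducer to the
    // plaintext character it stands for.
    static char unescape(char code) {
        for (int i = 0; i < ESCAPES.length; i++) {
            if (ESCAPES[i] == code) {
                return ESCAPED[i];
            }
        }
        throw new IllegalArgumentException("unknown escape: " + code);
    }
}
```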

mLeader

protected static char[] mLeader
The leader. The prefix to the encoded script is #@~^nnnnnn== where the n are the length digits in base64.
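
As an illustration of the #@~^nnnnnn== layout, the following sketch checks for the marker and pulls out the six length digits. It assumes the fixed parts are the literal strings "#@~^" and "=="; LeaderSketch and lengthDigits() are hypothetical names and not part of the library.

```java
final class LeaderSketch {
    static final String MARKER = "#@~^";  // fixed start of the leader
    static final String SEPARATOR = "=="; // follows the six length digits
    static final int DIGIT_COUNT = 6;

    // Return the six length digits if a well-formed leader starts at
    // `offset`, or null otherwise.
    static String lengthDigits(String text, int offset) {
        if (!text.startsWith(MARKER, offset)) {
            return null;
        }
        int digitsStart = offset + MARKER.length();
        int digitsEnd = digitsStart + DIGIT_COUNT;
        if (!text.startsWith(SEPARATOR, digitsEnd)) {
            return null; // also covers input that is too short
        }
        return text.substring(digitsStart, digitsEnd);
    }
}
```

The encoded text would begin immediately after the separator, at offset + 12 in this layout.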

mLookupTable

protected static char[][] mLookupTable
Two dimensional lookup table. The decoding uses this table to determine the plaintext for characters that aren't mEscaped.

mPrefix

protected static char[] mPrefix
The prefix. The prefix separates the encoded text from the length.

mTrailer

protected static char[] mTrailer
The trailer. The suffix to the encoded script is nnnnnn==^#~@ where the n are the checksum digits in base64. These characters are the part after the checksum.

Method Details

Decode

public static String Decode(Page page,
                            Cursor cursor)
            throws ParserException
Decode script encoded by the Microsoft obfuscator.
Parameters:
page - The source for encoded text.
cursor - The position at which to start decoding. This is advanced to the end of the encoded text.
Returns:
The plaintext.
Throws:
ParserException - If an error is discovered while decoding.

decodeBase64

protected static long decodeBase64(char[] p)
Extract the base 64 encoded number.
Parameters:
p - Six base 64 encoded digits.
Returns:
The value of the decoded number.
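
A minimal sketch of how six base 64 digits can be turned into a number follows. It assumes the standard Base64 alphabet and a 32-bit little-endian value, which matches the scrdec reference decoder but is not stated on this page. Base64Sketch, DIGITS, and decode() are hypothetical names; DIGITS stands in for a table like mDigits.

```java
final class Base64Sketch {
    // Decoding table indexed by character code (7-bit input assumed).
    static final int[] DIGITS = new int[128];

    static {
        String alphabet =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        for (int i = 0; i < alphabet.length(); i++) {
            DIGITS[alphabet.charAt(i)] = i;
        }
    }

    // Six digits carry 36 bits; the top 32 form four bytes, which are
    // combined here with the first byte as the least significant.
    static long decode(char[] p) {
        if (p.length != 6) {
            throw new IllegalArgumentException("expected six digits");
        }
        long b0 = (DIGITS[p[0]] << 2) | (DIGITS[p[1]] >> 4);
        long b1 = ((DIGITS[p[1]] & 0xF) << 4) | (DIGITS[p[2]] >> 2);
        long b2 = ((DIGITS[p[2]] & 0x3) << 6) | DIGITS[p[3]];
        long b3 = (DIGITS[p[4]] << 2) | (DIGITS[p[5]] >> 4);
        return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
    }
}
```

Under these assumptions, a leader of #@~^BQAAAA== would announce an encoded length of 5.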

HTML Parser is an open source library released under LGPL.