Everything Is Byte
Author: +mala
Abstract
Some of you have waited a long, long time to read this file;
many others simply didn’t care. To the first group I must
apologize and say that I have no excuses - I just lost my
will to finish this file and spent months trying to find it
again. To the others, I hope this text will teach you
something you didn’t know, and that maybe you will be in
the first group next time.
One more thing: the idea of patching an exe with an image
editor is NOT mine. I just found it on the web some months
ago and I really don’t remember the address... but I wanted
you to know that someone else had the original idea. If you
happen to find that page, let me know and I will add that
address to this tute.
Keywords: Everything Is Byte; Executable Images; Finding
File Structures; File Chaos; Patching with Psp
Contents
I. INTRODUCTION
A. Everything Is Byte
Everything is byte. Of course, this won’t sound SO strange
to most of you: after all, everything which resides on a
computer’s hd, whether it is a sound, a movie or this plain
text file, must be first converted to binary format. This takes
us to some less obvious considerations: if everything shares
the same format, why do I run some files while I play others?
Can I read an executable? Can I listen to an image? The
answers are, respectively: because there is something which
tells the system what to do; of course you can; of course you
can... have you ever tried typing, at the linux prompt, cat
/usr/bin/netscape > /dev/dsp ? :)
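If you have no /dev/dsp at hand, the same experiment can be sketched in Python: the bytes of any file are wrapped, unchanged, in a WAV header and become sound samples. The function name bytes_to_wav is made up for this example, and the 8-bit unsigned mono format at 8000 Hz is an assumption (it should be close to what an unconfigured /dev/dsp expects).

```python
import io
import wave

def bytes_to_wav(data: bytes, rate: int = 8000) -> bytes:
    """Wrap arbitrary bytes in a WAV header as 8-bit unsigned mono PCM."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(1)      # one byte per sample
        w.setframerate(rate)   # assumed sample rate, pick whatever you like
        w.writeframes(data)    # the bytes become the samples, untouched
    return buf.getvalue()

# Any bytes will do: here a repeating ramp, but the content of an
# executable read with open(path, "rb").read() works just as well.
wav_bytes = bytes_to_wav(bytes(range(256)) * 2)
```

Write wav_bytes to a file ending in .wav and any player will "play" your data.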
B. Structure And Chaos: Headers Vs. Extensions
Now, what makes the OS understand what kind of file it is
dealing with? Well, there are different ways: the most buggy
one, for instance, is just looking at the file extension; a better
one, instead, is giving a look at the file header or at a particular
sequence of bytes which (almost always) exactly identifies the
file type. Guessing which one is used by Windows is left as
an exercise to the reader ;)
Just look at this example: on your Windows disk (if you have
one), find all the files that have the ".jar" extension; then copy
one of them to another place and rename it as ".zip". Now
double click on it and - voilà - it opens correctly as a ZIP
file! Then copy the file c:\windows\system\shdoclc.dll to
another place and rename it as shdoclc.html... heh... if you
double click it (I cannot guarantee it won’t damage your system
O:-) some strange things will happen!
Why? What happened?
In the first case, JAR files are nothing more than ZIP files
with another extension: since Windoze recognizes files
according to their extensions, it cannot open one as an archive
unless you rename it to whatever.zip. In the second case,
shdoclc.dll contains some html code to generate different
pages, but is NOT an HTML document: it’s an executable,
so if you open it as a .html you will see some strange codes...
and a strange browser behavior, since it’s parsing the content
of different html pages all pasted together.
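To see the difference between trusting the name and looking at the data, here is a minimal Python sketch that ignores the extension and checks only the first bytes. The sniff function and its short signature table are just an illustration, not a complete file analyzer.

```python
# A few well-known signatures found at the very start of a file.
SIGNATURES = [
    (b"PK\x03\x04", "ZIP archive (JAR files start like this too)"),
    (b"MZ", "DOS/Windows executable (EXE or DLL)"),
    (b"\x7fELF", "ELF executable"),
    (b"GIF8", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
]

def sniff(data: bytes) -> str:
    """Guess a file type from its leading bytes, never from its name."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"

# A renamed DLL still starts with "MZ", whatever its extension says.
guess = sniff(b"MZ\x90\x00\x03\x00")
```

Feeding it the first bytes of shdoclc.html (the renamed DLL) would still report an executable, because the data did not change when the name did.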
As you may understand, this method is quite buggy, because
it doesn’t let you really understand what you’re working with.
The worst case happens when some viruses copy themselves
by mail as an attachment with a double extension (such as
.txt.com or .mp3.pif): if you’ve left the "hide extensions for
known file types" option active, you might not notice they’re
executables and run them with a double click. In other cases,
instead, this limitation in extension recognition may be useful
for us, as you’ll see later.
What can you do to be sure you’re correctly identifying a
file? Even if in some cases you CAN’T be sure, you can have
better chances by using some tools called ”file analyzers”,
running both under Windoze and Linux. The first ones can be
downloaded from the ”utils” section at
http://www.programmerstools.org/
while under Linux you have the great FILE command, which
will be described in detail in the next section.
II. FINDING STRUCTURE IN A FILE
A. The FILE Command
”file” (yes, the right command name is all lowercase!) is
a great unix file analyzer which, instead of just looking at
name extensions, does various tests on filesystem, file data
and (if data is text) language. The ”data” test is the most
interesting for us: during this test, FILE searches for particular
data sequences (called ”magic numbers”) inside the files to
understand their type. Even if it isn’t perfect, it’s still a very
good tool and because of its structure it’s the best one if you
want to learn how file recognition works. If you type ”man
file”, but still better ”man magic”, you will easily learn to look
inside the configuration file (called ”magic”) and understand
file types even without using any program!
The magic file format (on my Debian it’s located in
/usr/share/misc/magic, but you can also find it online by
searching Google for "/usr/share/misc/magic" AND "\177ELF") is
quite simple: every line is made up of the following fields
• OFFSET
This is a number specifying the offset, in bytes, of
the data which is to be tested inside the file. It can
be preceded by one or more ">" characters, which
indicate the level of the test: if there is no ">" the test
is level 0, and only if it succeeds are the level 1 tests (one
">") evaluated, followed by level 2 tests (">>")
and so on. In a test whose level is higher than 1 you can
also find the character "&" before the offset: this means
you don’t have to treat the offset as absolute, but as
relative to the offset of the preceding higher level test.
Here’s a little example: if you give a look at ELF
section inside magic file, you’ll be presented with the
following
0 string \177ELF ELF
>4 byte 0 invalid class
>4 byte 1 32-bit
...
NOTE: \177 is the value of a byte IN OCTAL (0x7F,
127 dec).
This means: if the file starts with byte 0x7F, followed by
the string ”ELF”, then it’s an ELF file; then, if at offset
4 it has a byte whose value is 1, then it’s a 32-bit ELF,
while if it’s 0 it’s an invalid class ELF file.
• TYPE
From the previous example you have already seen how the
TYPE field is used: it simply contains the type of the
data to be tested. The possible values are
byte: a one-byte value
string: a string of bytes
short, beshort, leshort: a two-byte value (on most
systems) in this machine’s native byte order, in big-
endian order (be-) or in little-endian order (le-)
long, belong, lelong: a four-byte value (on most
systems) in this machine’s native byte order, in big-
endian order (be-) or in little-endian order (le-)
date, bedate, ledate: a four-byte value (on most
systems) in this machine’s native byte order, in big-
endian order (be-) or in little-endian order (le-), which
is interpreted as a UNIX date
• TEST
This is the value to be compared with the value from
the file. If the type is numeric, the value is specified in
C form; if it’s a string, it is specified as a C string with
the usual escapes permitted (such as \n for new line).
Depending on its type, the test value can be preceded by
an operator, such as =, < and > (which work for numbers and
strings), or & and ^ (AND and NOT, which work only with
numbers and require some bits to be set or not). Have a
look at the man page for a more detailed explanation.
• MESSAGE
This is the message to be printed if the test succeeds. If
the string contains a printf format specification (such as
”%s”), the value from the file is printed using the message
as the format string
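The four fields above are enough to build a toy version of the "data" test. The sketch below is NOT how FILE is really implemented: it is a simplified model, with made-up names (RULES, read_value, identify), that handles only a few types, only the equality test and no "&" offsets. The rules are the three ELF lines quoted above, copied by hand.

```python
import struct

# (level, offset, type, expected value, message)
RULES = [
    (0, 0, "string", b"\x7fELF", "ELF"),
    (1, 4, "byte", 0, "invalid class"),
    (1, 4, "byte", 1, "32-bit"),
]

# struct formats for some of the numeric types listed above
FORMATS = {"byte": "B", "leshort": "<H", "beshort": ">H",
           "lelong": "<I", "belong": ">I"}

def read_value(data, offset, kind, expected):
    """Read from data the value that a rule wants to compare."""
    if kind == "string":
        return data[offset:offset + len(expected)]
    fmt = FORMATS[kind]
    size = struct.calcsize(fmt)
    chunk = data[offset:offset + size]
    if len(chunk) < size:
        return None  # file too short for this test
    return struct.unpack(fmt, chunk)[0]

def identify(data, rules=RULES):
    """Run the rules in order; a level N test runs only if the most
    recent level N-1 test succeeded."""
    messages = []
    matched = {}  # level -> result of the latest test at that level
    for level, offset, kind, expected, message in rules:
        for deeper in [k for k in matched if k > level]:
            del matched[deeper]  # forget results of deeper, older tests
        parent_ok = level == 0 or matched.get(level - 1, False)
        ok = parent_ok and read_value(data, offset, kind, expected) == expected
        matched[level] = ok
        if ok:
            messages.append(message)
    return " ".join(messages)

description = identify(b"\x7fELF\x01" + bytes(11))
```

With these rules a buffer starting with 0x7F "ELF" followed by a byte 1 is described as "ELF 32-bit", exactly as explained in the OFFSET example.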
Here are just some of the things that you could notice after
reading the magic file:
• first of all, there is in fact something that lets your com-
puter know what file it’s dealing with: data themselves,
with particular values and in particular positions inside
the file, can identify the file type and let you discover
many other info (just see all that ">>" stuff).
• in most of the cases, the identifying bytes are at the be-
ginning of the file, but sometimes important information
is NOT necessarily in the header. And that would not be
so interesting if it wasn’t real for ZIP files too...
B. About Zip Files
The most interesting detail about ZIP files is that they keep
information about their packed files in the LAST bytes of the
zipped archive: this means that you can add whatever you want
or make slight changes at their beginning and many programs,
such as Winzip under Windoze or unzip under linux, will open
them without any problems. Note that the FILE utility, instead,
will not recognize them anymore: the line
0 string PK\003\004 Zip archive data
inside magic means that it only checks the first four bytes,
so if you overwrite the start of the file with, for instance, "ZZ",
FILE won’t recognize the zip anymore, while other programs
will be able to open it anyway.
After all, this is not a great limit: FILE also gives you the
chance to use indirect offsets, which could help in tasks of
this kind... and it’s always possible to change the sources, so
that you can make it recognize offsets starting from wherever
you wish and not just from the beginning of the file. This, of
course, is left as an exercise to the reader... :-)
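For those who want a hint before doing the exercise, here is a small Python sketch of the "start from the end" idea: it looks for the last occurrence of the ZIP end-of-central-directory signature (PK followed by bytes 5 and 6) and decodes the fixed part of that record. The helper names are invented for this example, and a plain backwards search is a simplification: a real tool must also cope with the same four bytes appearing by chance in the data or in the archive comment.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_FORMAT = "<4sHHHHIIH"  # fixed 22-byte part of the end record

def find_end_record(data: bytes):
    """Locate the end-of-central-directory record, searching backwards."""
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0 or pos + struct.calcsize(EOCD_FORMAT) > len(data):
        return None
    fields = struct.unpack_from(EOCD_FORMAT, data, pos)
    return {"position": pos,
            "entries": fields[4],           # total number of packed files
            "directory_size": fields[5],
            "directory_offset": fields[6]}

def make_zip(names):
    """Build a small archive in memory, just to have something to test."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for name in names:
            z.writestr(name, "hello from " + name)
    return buf.getvalue()

# Put some junk in front: the file no longer starts with PK\003\004...
archive = b"ZZ junk in front " + make_zip(["a.txt", "b.txt"])
record = find_end_record(archive)
# ...but the information at the end is still there and still usable.
names = zipfile.ZipFile(io.BytesIO(archive)).namelist()
```

Python's own zipfile module reads the archive from its tail too, which is why it still lists the two files despite the junk at the beginning.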
C. About Image Formats
As you’ve seen, ZIP utils don’t mind if you prepend anything to
their files. On the other hand, there are some
image formats which don’t mind if you append something at
their end: this is because image width and height are specified
inside their header, so all the extra bytes are ignored.
This works, for instance, for .gif and .jpg images... and joined
with ZIP’s property this means that you can append a JPG and
a ZIP together (the image first, the archive last) and, under
Windoze, open the first or the second one just by changing
the file extension!
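Joining the two files is nothing more than a concatenation, as this Python sketch shows. Note that fake_gif is only a placeholder carrying a GIF signature, not an image that a viewer could decode: replace it with the bytes of a real .gif or .jpg to try the trick for real. All the names used here are made up for the example.

```python
import io
import zipfile

def join_image_and_zip(image: bytes, archive: bytes) -> bytes:
    """Image first, archive last: that is the whole trick."""
    return image + archive

# Placeholder standing in for a real picture (assumption: any file
# with a valid image header would behave the same way here).
fake_gif = b"GIF89a" + bytes(20) + b";"

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("secret.txt", "hidden behind the picture")

combined = join_image_and_zip(fake_gif, buf.getvalue())

# Seen from the beginning it looks like a GIF...
starts_like_gif = combined.startswith(b"GIF8")
# ...seen from the end it is still a working ZIP.
packed_names = zipfile.ZipFile(io.BytesIO(combined)).namelist()
```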
D. Dumping
If you want to study a file, you should have a tool which
lets you open it and dump its contents on the screen in a raw
format. A hex editor is a good tool for this purpose, better if
it lets you change the visualization mode from hex to ascii,
best if -like Hiew and Biew- it also lets you disassemble the
files you open. Another great tool is list.com by Vernon Buerg,
which lets you open BIG files of any kind, dump them in ascii
or hex, search for strings very quickly and so on, all working
in a DOS box in less than 30KB (don’t look for the latest,
"bloated", win9x versions: I’ve recently upgraded to v9.6d but
v9.0h is still ok for my purposes).
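If you have none of these tools, a basic dumper takes only a few lines of Python. This is just a sketch of the classic "offset, hex bytes, ascii" layout, not a replacement for List or Hiew; the function name hexdump is my own choice.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Return a dump with offset, hex values and printable characters."""
    lines = []
    pad = width * 3 - 1  # room for "xx xx ... xx" on a full line
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # bytes outside the printable ASCII range are shown as dots
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(pad)}  {text_part}")
    return "\n".join(lines)

print(hexdump(b"MZ\x90\x00This program cannot be run in DOS mode."))
```

To dump a real file, pass it the result of open(name, "rb").read().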
I usually copy list.com into the c:\windows\command directory, then
run regedit and create the following registry key:
HKEY_CLASSES_ROOT\*\shell\Open with List\command
with the value:
"c:\windows\command\list.com %1"
This allows me to open any unknown-type file with List just
by double clicking it, and any other file clicking with the right
mouse button and choosing ”Open with List”.
Once you’ve found a file dumper which suits your needs, learn
to use it and USE it... a lot! After a while you will notice that
many patterns occur in files of the same kind and you will
be able to easily recognize them. Also, you will learn a lot of
interesting, useful things. For instance:
• many viruses put their name or some particular bytes
at the beginning of the file they infect, so opening
executables with a dumper will let you not only avoid
being infected by viruses, but also understand which one
is trying to hit you. And, since many trojan viruses are
around these days, dumping your attachments before
opening them is often a good idea
• did you know that CuteFTP saves your password in
clear inside its macros? Well, I know that you can open
CuteFTP macro files with ANY text viewer, but this
example is just to show you that you should try to open
ANY file you find on your hard disk :) So, if you have
just forgotten a password you’ve saved in CuteFTP ”FTP
site manager” you just have to start recording a macro,
connect to the site you wish and then save the macro...
you’ll end up with a text file like this:
Host 123.123.123.123
RemoteDir /home/httpd/mywebsite
LocalDir D:\mywebsite
Retry 20
Login Normal
User myusername
Pass mypassword
Connect
• if you’re so unlucky you REALLY have to use M$ Word
(in most cases you don’t REALLY have to,
and if you do you’re just stupid, not unlucky ;) start opening
your .doc files: it’s really amazing how much useless
stuff is stored inside them. If you happened to enable the
"quick save my documents" option, you probably have
pieces of some documents pasted inside the files of others, or
your errors together with your corrections. One example?
Here it is!
1) Open M$ Word (this experiment has been made
with Word97)
2) Be sure that ”quick saving” is activated (it’s the Save
tab inside the Options window).
3) Create a new document and write: ”Dear boss,
you really stink.”
4) Now save your document with whichever name you
like most (I’ve used example.doc).
5) Now change the text so that it reads: ”Dear boss,
you’re really a great man.”
(don’t worry, if you don’t feel you’re able to write
this you can change the text as you wish ;)
6) Save the document again and close Word.
7) Open the file with your dumper and think what
could happen if your boss receives it :)
Also, the file has grown drastically... but I don’t think
I will spend more time on this subject; I’d rather let
you discover all the details alone, leaving you just one
suggestion: when your M$ Word crashes (I’m sure it
will), making you lose all your latest changes, try opening
the backup files it has left on your hd with a dumper
and recover most of the text with a cut’n’paste.
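When a file is too big to scroll through by eye, a few lines of Python can pull out the readable parts for you, in the spirit of the unix "strings" tool. The sketch below only looks for runs of printable ASCII characters; this is an assumption that does not always hold (text stored with two bytes per character, for instance, will be missed), so take it as a starting point. The sample bytes are invented for the demonstration.

```python
import re

def printable_strings(data: bytes, min_len: int = 4):
    """Return every run of at least min_len printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Invented sample: some text fragments buried among binary junk.
sample = b"\x00\x01Dear boss,\x00\xffyou really stink.\x00ab\x00"
found = printable_strings(sample)
```

Run it on the content of a .doc or of a crashed Word backup file and see what comes out.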
E. Zeroes
As you may have learned while looking at the files which
reside on your hd, every format has some values which occur
in particular places, or more frequently than others. The reason
why I’ve called this section Zeroes is because they’re often
zeroes... but not always!
For example, text files may have a CR (or CR+LF) about every
80 characters: the line size isn’t always the same, but you can
suppose you will find some regularity - and in some cases
this may let you understand that a file contains text even if
it’s encrypted (just give a look at a .box Calypso mailbox file
and you will understand what I mean). For this kind of task,
a good knowledge of the ASCII table may be helpful too...
but you’ll probably read something about it later.
If you happen to study executables or other binary files,
instead, you will find that zeroes are widely used: not just
as string terminators, but also as padding at the end of PE
sections. If you don’t understand this, just remember that
long sequences of identical values are probably
zeroes. Open, for instance, c:\windows\system\systray.exe
and see how many padding zeroes are there: I wonder if any
virus writer has ever thought of infecting it... it has SO much
space to use and it’s always loaded at startup O:-) Well, I’m
sorry I cannot find a similar example under linux... try to
biew /usr/bin/vim and see what happens, but I’m afraid you
won’t find a PE file ;)
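If you prefer counting to scrolling, this little Python sketch finds the longest run of zero bytes in a buffer and tells you where it starts. It knows nothing about PE sections: it just measures the padding, and the function name is made up for the example.

```python
def longest_zero_run(data: bytes):
    """Return (start offset, length) of the longest run of 0x00 bytes."""
    best_start, best_len = 0, 0
    start = None  # where the current run began, None if not inside a run
    for i, value in enumerate(data):
        if value == 0:
            if start is None:
                start = i
            length = i - start + 1
            if length > best_len:
                best_start, best_len = start, length
        else:
            start = None
    return best_start, best_len

# Tiny stand-in for an executable: code, small gap, code, big gap.
where, how_many = longest_zero_run(b"ab\x00\x00c\x00\x00\x00\x00d")
```

Feed it the bytes of any executable to see how much empty space it carries around.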
III. PLAYING WITH CHAOS
A. Some Basics
Fine. Now you know that files are just a bunch of bytes (wow,
what a good piece of news!) and the software just understands
them the way it wants. Some systems use extensions to
recognize file types, others use particular sequences of bytes,
but the most interesting things happen when you try to open a
file with software that was not designed to handle it :) Just
one last thing, and I’ll stop boring you... If you use applications
able to handle files in RAW format (which, in fact, is a non-
format), you can read them as text files (as you have seen with
list), images, sounds, whatever you want them to be.
B. Patching With Psp
Ever cracked a program? Well, don’t be shy... it happens
sometimes :) Even if you are one of the lamest ones, and
all you’ve done was run a crack patch, I hope you’ve
AT LEAST understood what you were doing: you know, the
program is a sequence of bytes (just for a change, I’d say) and
by modifying even a single byte you can make it do
completely different things, such as telling you that you are a
registered user even if you’ve entered the wrong reg code.
Of course, patching a program can be done
for many other purposes, like correcting errors or adding new
functions to closed software which comes compiled and without
sources. Usually, to accomplish this task you can use Hex
Editors like Hiew and Biew or tools like my old hexpatcher
(which is available in my tools section). This time, we will do
that using Paint Shop Pro: anyway, PsP is only ONE of the
programs you can use - you just need an image editing tool
that lets you open images in RAW format.
In this example, we will try to patch a little
program, Cruehead’s CrackMe v1.0, available at
http://3564020356/tutes/crackme.zip. This little Windoze
app does exactly nothing: it just stays there waiting to be
cracked... and since it has a really easy protection, I won’t
spend much time on it. Just know that there’s a regcode
check and then a jump that either sends you to the ”good
guy” piece of code or shows you the ”bad guy” message box.
For those who wish to experiment with SoftICE, just bpx on
messageboxa and when the debugger pops up return from the
call you’re in: the check and the jumps are just a few lines
above the place where you are, at address 401243.
Now, we know from SoftICE that at 401243 in memory there’s
a jz (0x74) that we want to replace with a jmp (0xEB): where
can we find it inside the file which is saved on your disk?
There are different ways to do it, depending on which tools
you have (of course, we suppose you don’t have a hex editor):
• if you have a disassembler or any other program which
shows you data about sections, you can read RVA and
Offset and calculate the right offset inside the file:
Offset in file = (Address)
- (Imagebase)
- (RVA)
+ (Offset of section)
For instance, in this case we have these data about
CODE section:
Object01: CODE
RVA: 00001000
Offset: 00000600
Size: 00000600
Flags: 6000020
And the Imagebase is 00400000. So, the right offset of
instruction at 401243 is
401243-401000+600 = 843 (HEX, of course)
• if you have neither a disassembler nor a PE viewer, you
might try with LIST (I did tell you it was useful!). Just
open the executable with it, press ALT+H to view the
data as HEX + dump, then hit backslash to search within
the dump: just enter ”c3 74 07” and BANG! You hit it at
the first shot! As you can see, the ”74” byte is at 0x843
• if you have neither a disassembler, nor a PE viewer,
nor LIST... well, you might try to use PSP itself :)
WARNING: this is not easy and it might even stop being
fun if you have to search a lot of data or for a very
common sequence of bytes. But in this case, fortunately,
it’s not so hard and you’ll also have the chance to find
the place where you have to patch inside the image... so
I’ll show you that technique in just a few lines.
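The first two methods can be written down in a few lines of Python. The function names are invented for this example; the numbers are the ones given above for the CODE section. Remember that the search finds the position of the first byte of the pattern (the c3), so the 74 we are after sits one byte later.

```python
def file_offset(address, image_base, section_rva, section_offset):
    """Offset in file = Address - Imagebase - RVA + Offset of section."""
    return address - image_base - section_rva + section_offset

def find_bytes(data: bytes, hex_pattern: str):
    """Return every position where the given hex bytes occur in data."""
    needle = bytes.fromhex(hex_pattern)  # accepts "c3 74 07"
    hits = []
    pos = data.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = data.find(needle, pos + 1)
    return hits

# Method 1: arithmetic on the section data shown above.
offset = file_offset(0x401243, 0x400000, 0x1000, 0x600)
print(hex(offset))

# Method 2: a search, here on a small stand-in buffer; with the real
# file you would pass the result of open(name, "rb").read().
hits = find_bytes(b"\x00\xc3\x74\x07\x00", "c3 74 07")
```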
If now you’re asking yourselves HOW, in practice, you can
use an image editing tool to work on data instead of images,
let me explain...
If you want to open a file, preserving its binary information,
you have to open it as RAW: I don’t know what it’s called
inside other applications, but what you should see is a greyscale
image, where every pixel matches one byte inside the file
you’ve just opened. Also, since the size of this image is not
specified inside the file ( ALL the bytes are pixels), you will
have to choose a size yourself. A good size to choose is 100
(or 1000, if the file is huge) for width and, for height, anything
which multiplied by width gives you the size of the file (or
something bigger, we don’t mind adding zeroes at the end).
In our example, since the size is 12288 bytes, we’ll choose
width=100 and height=130.
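The arithmetic behind the image size, and behind finding a given byte inside the picture, can be sketched like this in Python (function names are mine, and coordinates are counted from 0 starting at the top-left corner, which is an assumption about how your image editor numbers pixels). The smallest height that holds 12288 bytes at width 100 is 123; 130 works just as well because the extra rows are only padding.

```python
import math

def raw_height(file_size: int, width: int = 100) -> int:
    """Smallest height such that width * height covers the whole file."""
    return math.ceil(file_size / width)

def pixel_of(offset: int, width: int = 100):
    """(x, y) of the pixel holding the byte at the given file offset."""
    return offset % width, offset // width

height = raw_height(12288)   # minimum rows needed for our crackme
x, y = pixel_of(0x843)       # where the jz byte lands in the image
```

So, with width=100, the byte at offset 0x843 is the pixel in column 15 of row 21.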
If you want to read the value of one byte (that is, one pixel),
just put your "dropper" or "color picker" tool above that pixel
and click. If, instead, you want to change one byte, just choose
a size 1 brush (or pencil or whatever, I think everything should
be identical at size 1)
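What the brush does to that single pixel is the same thing this Python sketch does to a single byte. It is only a model of the operation, run on a stand-in buffer of the same size as the crackme (the helper name patch_byte is invented): it checks that the byte is the expected jz (0x74) before replacing it with jmp (0xEB), so that a wrong offset is noticed instead of silently breaking the file.

```python
def patch_byte(data: bytes, offset: int, expected: int, new: int) -> bytes:
    """Return a copy of data with one byte changed, after a sanity check."""
    if data[offset] != expected:
        raise ValueError(f"byte at {offset:#x} is {data[offset]:#04x}, "
                         f"not {expected:#04x}")
    patched = bytearray(data)
    patched[offset] = new
    return bytes(patched)

# Stand-in for the 12288-byte file, with "c3 74 07" placed so that
# the 74 sits at offset 0x843 as in the example above.
image = bytearray(12288)
image[0x842:0x845] = bytes.fromhex("c3 74 07")

patched = patch_byte(bytes(image), 0x843, 0x74, 0xEB)
```

Whatever tool you use, remember to work on a copy of the original file.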