Code breaking source code?

Posted: 2003-09-03 06:58pm
by Shrykull
This probably can't be done if it hasn't been already, but the idea is to kind of reverse engineer it. Say you have this line in BASIC:

PRINT "this is a line of text"

Find out what its machine code equivalent is, and do the same for other statements. Then take a program and try to figure out which combinations of machine code correspond to a statement in a higher-level language.

Posted: 2003-09-03 08:35pm
by phongn
I can't make sense of what you're trying to ask.

Posted: 2003-09-03 08:53pm
by Mad
You mean a BASIC to something else translator?

Posted: 2003-09-03 11:11pm
by StimNeuro
I think he's talking about a compiler and decompiler...

Posted: 2003-09-03 11:19pm
by Mad
Yeah, it'd seem so. But that'd probably be harder than just a straight translator, which would have the same end result as what he seems to want. That is, unless there's already a high-level decompiler. But one would probably be a nightmare to create, so... (I mean, looking at machine code and then trying to determine that it's supposed to be a printf() statement in C? Yikes.)

Posted: 2003-09-04 11:25am
by phongn
Going down to assembly would be easier, but the original poster seemed to want to write a BASIC instruction using PRINT and then determine from the machine code what the C equivalent was!

Posted: 2003-09-04 04:47pm
by Shrykull
phongn wrote: I can't make sense of what you're trying to ask.
Say it's like this. You have that line in BASIC:

PRINT "this is a line of text"

You compile or decompile it (I'm not very familiar with programming) and look at it as pure machine code. Say that particular line was 1111110101010101001010 in machine code. Any time you saw that particular combination of binary digits, you'd know it was PRINT "this is a line of text" in BASIC, so you'd be getting the code by looking at the binary.

Posted: 2003-09-04 05:13pm
by kojikun
The C version is

Code:

printf("this is a line of text");
In C++ it's

Code:

cout << "this is a line of text";
In PHP it's

Code:

echo "this is a line of text";
Assembly would look something like this

Code:

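section .data
msg db 'Hello, world!',0xa   ; message text plus a newline
len equ $ - msg              ; length of the message in bytes
section .text
 global _start
_start:
 mov edx,len   ; arg 3: number of bytes to write
 mov ecx,msg   ; arg 2: address of the message
 mov ebx,1     ; arg 1: file descriptor 1 (stdout)
 mov eax,4     ; system call 4 = sys_write (32-bit Linux)
 int 0x80      ; call the kernel
 mov ebx,0     ; exit status 0 (success)
 mov eax,1     ; system call 1 = sys_exit
 int 0x80      ; call the kernel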

Posted: 2003-09-04 06:08pm
by Mad
Shrykull wrote: Say it's like this. You have that line in BASIC:

PRINT "this is a line of text"

You compile or decompile it (I'm not very familiar with programming) and look at it as pure machine code. Say that particular line was 1111110101010101001010 in machine code. Any time you saw that particular combination of binary digits, you'd know it was PRINT "this is a line of text" in BASIC, so you'd be getting the code by looking at the binary.
Compiling to machine code or assembly and then decompiling to a high level language would be a nightmare.

As kojikun showed, several steps are required to print a message in assembly. (One line of assembly is essentially one machine code instruction.) Figuring out that said code is supposed to translate back into a PRINT in BASIC or a printf() in C would be a nightmare when normal complexities are involved (printing out formatted variables, for instance).

Further, since there may be several methods for writing to a screen in a given language (there are several different text output functions as part of the standard C library, for example), it would be near-impossible to rebuild the code as anything resembling the style of the original. (Comments and variable names would be completely obliterated regardless.)
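
To make that ambiguity concrete, here is a toy sketch in Python. Everything in it is an assumption made up for illustration: the three-instruction "machine code" (`LOAD_STR`, `LOAD_LEN`, `CALL_WRITE`) and the helpers `compile_line` and `decompile` do not correspond to any real compiler or CPU. It only shows that several different source lines can collapse into the same low-level steps, and that the bytes change as soon as the text changes.

```python
import re

# Hypothetical source forms that all mean "write this text and a newline".
# The patterns and the tiny instruction set below are invented for this sketch.
SOURCE_FORMS = {
    "BASIC PRINT": re.compile(r'^PRINT "(.*)"$'),
    "C puts": re.compile(r'^puts\("(.*)"\);$'),
    "C printf": re.compile(r'^printf\("(.*)\\n"\);$'),
}

def compile_line(line):
    """Turn one source line into a list of toy instructions."""
    for pattern in SOURCE_FORMS.values():
        match = pattern.match(line)
        if match:
            text = match.group(1) + "\n"
            # One high-level statement becomes several low-level steps.
            return [("LOAD_STR", text), ("LOAD_LEN", len(text)), ("CALL_WRITE", 1)]
    raise ValueError("unsupported statement: " + line)

def decompile(code):
    """List every source line that could have produced the toy code."""
    text = code[0][1].rstrip("\n")
    return [
        'PRINT "%s"' % text,
        'puts("%s");' % text,
        'printf("%s\\n");' % text,
    ]

basic = compile_line('PRINT "this is a line of text"')
c_puts = compile_line('puts("this is a line of text");')
c_printf = compile_line('printf("this is a line of text\\n");')
other = compile_line('PRINT "something else"')

print(basic == c_puts == c_printf)  # same toy code from three source lines
print(basic == other)               # different text, different code
print(decompile(basic))             # three equally valid candidates
```

In this toy model the "decompiler" can only hand back a list of candidates. Nothing in the compiled form records which one the programmer actually typed.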

A program that translates directly from BASIC to C would be much simpler to write and would present much cleaner results. It'd probably still be messy, but not nearly as bad as trying to decompile to a high-level language.
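
As a rough idea of what such a direct translator could look like, here is a minimal Python sketch. It assumes the BASIC input consists only of `PRINT "..."` lines with an optional line number, and the names `PRINT_RE`, `c_escape`, and `basic_to_c` are hypothetical. A real translator would need a proper parser for variables, expressions, and control flow.

```python
import re

# Only the simplest form is handled: an optional line number, then PRINT
# followed by a single string literal.
PRINT_RE = re.compile(r'^\s*(?:\d+\s+)?PRINT\s+"([^"]*)"\s*$', re.IGNORECASE)

def c_escape(text):
    """Escape characters that are special inside a C printf format string."""
    return text.replace("\\", "\\\\").replace("%", "%%")

def basic_to_c(basic_source):
    """Translate a BASIC program made of PRINT lines into a C program."""
    body = []
    for line in basic_source.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = PRINT_RE.match(line)
        if match is None:
            raise ValueError("unsupported BASIC statement: " + line)
        body.append('    printf("%s\\n");' % c_escape(match.group(1)))
    lines = ["#include <stdio.h>", "", "int main(void)", "{"]
    lines += body
    lines += ["    return 0;", "}"]
    return "\n".join(lines) + "\n"

print(basic_to_c('10 PRINT "this is a line of text"\n20 PRINT "100% done"'))
```

Because the translator works from the source text, the string literal carries over unchanged, which is exactly the information a decompiler would have to guess at.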

Re: Code breaking source code?

Posted: 2003-09-05 12:19pm
by Sarevok
Shrykull wrote: This probably can't be done if it hasn't been already, but the idea is to kind of reverse engineer it. Say you have this line in BASIC:

PRINT "this is a line of text"

Find out what its machine code equivalent is, and do the same for other statements. Then take a program and try to figure out which combinations of machine code correspond to a statement in a higher-level language.
There are no strict rules that compilers follow, so decompiling machine language back into its original source form is very difficult, or nearly impossible, in the case of high-level languages like VB. With C it is possible to get an idea of what the C code looked like before it was compiled and linked, but it is way too painstaking to be worth the effort.

Furthermore, compilers rearrange code in all kinds of ways to optimize it in release builds, which makes recovering the original source practically impossible. Also, C programs pull in tens of thousands of lines of library code, which get mixed in with the program's own code and create an extremely confusing picture. So if you are thinking of hacking or decompiling commercial software, forget about it. You would have better luck writing a new app that mimics whatever you wanted to hack.
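
As one small illustration of how optimization throws information away, here is a toy constant folder in Python. It is only a sketch under stated assumptions: `fold` and `optimize` are hypothetical names, it uses Python's standard `ast` module (with `ast.unparse`, available from Python 3.9), and it handles nothing beyond `+`, `-`, and `*`. Real compilers do far more than this.

```python
import ast
import operator

# Arithmetic operators this toy folder knows how to evaluate.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def fold(node):
    """Recursively replace constant sub-expressions with their value."""
    if isinstance(node, ast.BinOp):
        left, right = fold(node.left), fold(node.right)
        op = _OPS.get(type(node.op))
        if op and isinstance(left, ast.Constant) and isinstance(right, ast.Constant):
            return ast.Constant(op(left.value, right.value))
        return ast.BinOp(left, node.op, right)
    return node  # names and plain constants are left alone

def optimize(expr):
    """Parse an expression, fold its constants, and print it back out."""
    tree = ast.parse(expr, mode="eval")
    return ast.unparse(fold(tree.body))

print(optimize("60 * 60 * 24"))
print(optimize("86400"))
print(optimize("seconds * (60 * 60)"))
```

Once `60 * 60 * 24` and `86400` have been folded to the same constant, nothing in the result says which one the programmer wrote, so a decompiler can at best pick one of them.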