![]()
|
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
BitCrypt Algorithm |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The text provided below is designed to explain the reasons for the program and the manner various solutions have been implemented. Please take time to read this description as it is to provide answers to many questions posted by the critical public. Please also read the page titled "Explanation" as it describes the reasons for creation of this application. About three years ago I decided to create an application that would allow for transmission of secret textual information. As many critics pointed to me later on there is a number of practical problems associated with this idea. Instead of giving up I tried to resolve those problems, or at least address them in some satisfactory manner. Thus, I wrote the program, observed its weaknesses and tried to correct them. Subsequently, I became aware of other problems with the program, so I tried to address those problems in turn. All of that time, I would acknowledge a potential or real problem faced by the application, and I would try to address it. After three years of doing so, I hope that the application is at the reasonably good level, both when the quality, and the security it provides are concerned. On this page below I would like to explain how I tried to address some main problems faced by the application, and also to describe how the application processes the data. The objective I would like to achieve is to pass the text from point A to point B without anyone suspecting. I would also assume that the sender of information may be under some form of observation, therefore he may not pass the text in the form of obviously encrypted message. For example sending of encrypted e-mail is out of the question as it would instantly expose the sender. I decided that the most practical manner would be to post an image on a website as this is a very common practice and at the same time it does not rise any questions related to security issues. Please observe, that I am not interested in the purity of the solution. I am not trying to perform any academic studies of the subject. What I would like to achieve is a simple and practical solution that may be used in real life situations. Image FormatsMy first question is related to the choice of image format. One could potentially choose: GIF, Jpg, PNG, and bitmap formats. There are of course many other image formats but I am not interested in them as they are not so often posted on the Net.
Thus, I have decided to choose 24-bit bitmaps as the medium to hold data. The bitmap may be posted on the Net directly, or may be converted to PNG in a lossless manner if this is desired by the user. As long as the conversion to and from are lossless the program will decode the data afterwards.
Visibility of the Encoded PointsThis is more complex problem due to a number of reasons. Firstly, if the image contains a well determined modifications anyone who analyzes it may instantly discover the presence of encoding. Secondly, if he does so, he may try to computer-analyze those points and to recreate the original text written into the image. This set of problems had to be approached very carefully in order to address all of the potential pitfalls awaiting here. What I did is quite simple. I agreed instantly that it is impossible to encode the points without making visible modifications to them. Thus, they will always show as modifications of the original image. However, it does not mean that it can not be done. It just needs to be done differently. What I mean is that instead of modifying just the encoded points one may modify all of the points present in the image. Thus, the application would firstly modify all of the pixels. It would be done randomly, so that each of the modifications would be different and impossible to predict. Subsequently, a selection of points would be chosen and into those points the data written. As long as the randomly introduced modifications are larger in magnitude than the actual data containing ones, the process would hide the encoded text. To make the process even more consistent, one needs to observe that the random number generators are often imperfect. One can do a simple study on random number generators provided with the programming languages and discover quickly that with large number of points they tend to display patterns. Take a white image and fill quarter of it with black points that have been randomly generated. It is almost instantly obvious that the generator is using trigonometric functions as the black points form a wavy pattern. One can actually do a lot of studies on that subject as some people do so. To resolve this issue I applied two random number generators in turn. Both have slightly different characteristics so that they obscure their respective patterns. This is not the solution that the purists would be proud of but in this application this solution seems to address the problem well enough. Below I have an example of a monochromatic image original without any encoding, with one letter encoded (that is almost completely modified by the random number generators), and with 3/4 filled with encoded text.
I have used diff function to compare the images on the right with the original image. In the case of the first image (mainly randomly In the second case the number of unchanged points
is 17.
There is a method of encoding the points with the actual information that does not require extensive modification of the pixels. I will describe the method lower down on this page. My original worry was with supercomputers applied to the pixels containing the data. In order to address this problem I sprinkled the image with as much of random modification before even attempting to encode the data as possible. In this manner it becomes impossible to find out which points are used for encoding. In fact those points are dependent on the image parameters, for example its size. I use a collection of prime numbers which are selected depending on the key entered by the user. (In version 3.1 the selection of these was based on width and the height of the image). These prime numbers are used as seeds to determine the encoding points. With a different image dimensions a different set of prime numbers would be applied to the problem. In this manner the selected points differ form image to image. The Encoding of PointsI have found the actual encoding of the points to be the most challenging. Any printable character may be represented as a number between 0 and 255. In fact the first 31 numbers are not used and anything above 126 is also not printable. The effect is that only 95 characters need to be storable within the encoding. These may be assigned to numbers from 1 to 95 or alike. Also, a number does not have to be represented in the decimal system. Computers use binary systems, and for the purpose of this application I would represent a number in the four-nary system (if this is the right word for it). Thus, for example the decimal number 95 may be represented as 1 x 64 + 1 x 16 + 3 x 4 + 3. Thus in the four-nary system that would be 1133. Any number that we need to consider would be at most four digits long, and any of those digits would be between 0 and 3. Because of that in order to encode any character we need two pixels (each pixel has three colors) giving us six places to put those digits in. For example the above number 95 which
is represented by 1133 could use two pixels with their respective colors
(r g b) (r g b). Thus, I started with (120, 11, 23) (32, 1, 3) and ended up with (121, 9, 23) (32, 3, 3) - But look, the last number is not changed even though it contains data equal to 3, and the second number is actually smaller than the original one, even though I have written 1 into it. My random number generator tries to add values to the pixels. This makes the image slightly brighter. My data encoding normalizes the points to multiplications of four first, in this way it makes the image slightly darker. The effect of this is such that the points containing the data seem to be least modified. What stands out are the points that have been modified randomly.
What remains is the choice of the four places to write those digits into. As seen above I have six color values but I need only four. I use the characters supplied in the key typed by the user to decide what combination of those four to choose. For each subsequent data point I look at the next character in the key. Based on that I choose the schema for storing those four digits. This brings additional level of security, as without the knowledge of the key it is impossible to decide what four colors have been chosen for a given point. Please observe that the program changes the colors independently from each other. In other words if you begin with a grayscale image then after the encryption processing the image will have slight tint. You could claim that the tint is due to some image editing, like for example noise addition, but it would be better to start with a proper color image. The CiphersThere is a number of analytic techniques that may be applied to the program in order to learn about the encoded information. Everybody knows about reverse engineering, but there is a lot more that can be done to that extend. For example one may attach a debugging process to any other and look at the state of the program being analyzed. Moreover, one may look at the memory occupied by the running process and to try to read the information from there. Even more, one may use a debugger to modify the CPU registers while the program is running and in this way overcome any unsatisfied 'if' statements. Also, one could save the memory of the running program to the disk and try to read the information from that (this is commonly called: "the memory dump"). I have tried to address some of those problems by making sure that at any time the memory of the program would not contain the decoded information (it would keep it only if the key has been verified to be correct). However, this approach has obvious limitations. For starters, one may not encode the key into the image as it would be retrievable by the means of one of the above mentioned techniques. Therefore, I had to include strong ciphers in the program's initial processing stage. I had a look around Internet for some quality cipher implementations. My application is written in Delphi so I looked for the modules written in this language. I downloaded about five, that were free, and contained the source code. After analyzing the first two I rejected them due to various problems with the code. I looked at the "Delphi Encryption Compendium" written by Hagen Reddmann and I liked it. I had a look at the code he has written and it seemed to be very professional. He also provided working examples of his implementation and I found them to be helpful. I also write some testing program and included his cipher module in it so that I could see it in action, and was satisfied as well. Subsequently, I looked for his other work on the Net and found his Prime Number algorithms, and his optimization work. Over all it seemed to me that he is a real professional and knows very well what he is doing. I decided to include his implementation in the program. I also decided not to touch his code. I have a little experience in the field of software modification and I think that the most effective way of breaking the program is to correct something that seems to be a problem. Thus, his module has been used in here without any modification. The module written by Hagen Reddmann contains more hash and cipher algorithms that I provided to the users. I thought that listing too many would not be productive, therefore I picked those that seemed to me most powerful, and free of trademark constrains. Memory Dumping and Process AttachmentWhen you have a running program you may stop it at any moment and then request the operating system to save the current state of the computer memory into the hard drive. This is called 'memory dump' and is provided so that the snapshot of the memory may be analyzed for debugging purposes. If the program contains any important information, like for example the encryption key, then it could be potentially retrieved from the memory dump. It could be a potential problem, as the other party would simply learn about the key and applied it in the next running of the program. There are two simple solutions to this problem:
As you can see from the above I have 360 encoding schemas. For each of the points the program reads the subsequent letter from the encryption key specified by the user, and based on that, it decides which schema to use. When the key is exhausted it starts from the beginning of the key again. At this stage, the information being written into the image is encrypted with the cipher selected by the user. Therefore, just by stepping through the executing program, it is impossible to guess which schema has been used. This is the second place where I apply the same approach. Namely, in order to use a supercomputer to decrypt the information from a string, one needs to have a reasonably clean encrypted string to start with. That is, one would like to have a string which is encrypted, and nothing else, and then to brake that encryption by the means of computer calculation. If you have a string of characters with only 10% of it representing the actual encryption, and the rest being just a collection of random characters, the computer would not be able to decide which portion of the data represents the actual encryption and which is to be removed as being just noise. I think a bit of my physics background comes into this. In experimental physics one is often presented with the problem of retrieving the signal from the underlining background noise. In this case, what I have done is to increase the level of the underlying noise. I have done it twice, firstly by adding the random number generators, and secondly here, by mixing the encoding schemas with random color variations naturally occurring in any colored image. By this stage, the level of underlying noise should be strong enough to obscure any signal that is actually present in the sample. As I indicated at the beginning of this page, I am not after making the pure mathematicians happy, but rather, after finding a practical solution to a practical problem. The Algorithm Step by StepI would like to repeat the information provided above placing it in a step by step scenario, implemented by the program. Hopefully, it will be clear to the reader what the program does.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||