Sunday, 22 December 2013

Program Space

A Program Space is Really Messed Up
Having developed the notion of a search space, it is useful to understand what a program space is. If you are familiar with genetic programming (GP), you may have heard the term program space, or the claim that the search space in GP consists of all possible computer programs. So what is a program space? Let us try to build some intuition about it.

A program space is itself a search space. The search space in numerical optimization may be a real-valued space; whether it is depends on whether the coefficients you are trying to tune assume real values. In integer programming, for instance, the search space is an integer-valued lattice. As discussed elsewhere, the search space in GP consists of all the possible computer programs that might be candidate solutions to a problem.

However, it is not possible to form a mental picture of a program space. The simple reason is that the human mind cannot visualize beyond three dimensions. We can, however, try to understand a program space by listing the very reasons it cannot be visualized.

GP normally exploits a convenient property of computer programs: they can be represented by their syntax trees. This makes a lot of things easier for GP practitioners. For instance, it simplifies the implementation, adaptation, execution and manipulation of GP programs.
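As an illustration, here is a minimal Python sketch of a program represented as a syntax tree, together with a recursive evaluator. The Node class, the tiny primitive set and the example expression are assumptions made purely for illustration, not part of any particular GP system.

    # A minimal, illustrative sketch of a GP syntax tree.
    class Node:
        def __init__(self, value, children=()):
            self.value = value          # a function name, variable name, or constant
            self.children = list(children)

    def evaluate(node, variables):
        """Recursively evaluate a syntax tree on a dict of variable values."""
        if isinstance(node.value, (int, float)):   # a coefficient/constant
            return node.value
        if node.value in variables:                # a system variable
            return variables[node.value]
        args = [evaluate(c, variables) for c in node.children]
        if node.value == "+":
            return args[0] + args[1]
        if node.value == "*":
            return args[0] * args[1]
        raise ValueError("unknown primitive: %s" % node.value)

    # The tree for x * (y + 3):
    tree = Node("*", [Node("x"), Node("+", [Node("y"), Node(3)])])
    print(evaluate(tree, {"x": 2.0, "y": 1.0}))    # -> 8.0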

GP also normally restricts the size of a computer program; a commonly cited default is a maximum tree depth of 17. A program in GP can be represented as an n-ary tree, meaning that each node of the tree has as many children as its function has arguments. In practice most primitives take one, two or three arguments, but there is no hard restriction. Moreover, the nodes of a tree may represent coefficients, functions, operands or system variables.

But why do we care about all of this? Because it gives us an idea of how much flexibility GP has in growing a tree that represents a computer program or a mathematical model; call it a degree of freedom of sorts. It tells us that a GP tree is not restricted to forms that can be captured with a small number of parameters. We have the depth, the variables, the various functions with their various arities, and coefficients that can assume arbitrary values. Moreover, none of these building blocks is tied to a fixed position; each may appear anywhere in the tree.

All of these factors make visualizing a GP tree genuinely hard. Put simply, each factor, and each combination of factors, would require a dimension of its own before we could form a concrete mathematical picture of a program space. In other words, a program space is so high-dimensional that it cannot be visualized easily. Discussing it is still worthwhile, because it gives us a sense of just how difficult a GP program space is to conceive, and appreciating that difficulty fully is a milestone in itself.


Search Space

Would a Search Space Look Somewhat Like This?
In machine learning (ML) we often hear the term search space. It is used in describing almost any ML algorithm: bring up genetic algorithms, artificial neural networks or other bio-inspired models, and the term search space will somehow turn up in the discourse to haunt the student. A novice in machine learning may find it difficult to understand what the term means, let alone conceive the whole idea it actually refers to.

So what does the term search space actually refer to? In the context of mathematical optimization it refers to the region that contains all the possible solutions of a problem. As such, it is also expected to contain the solutions we care about, whether optimal, sub-optimal or near-optimal. But even this much of an explanation is a little vague. We can develop a more concrete notion of a search space with the help of the following example.

Suppose that you are given a mathematical model. You somehow know that the model is supposed to solve a well-known regression problem; for instance, the problem may be to predict the prices of different types of clothes to near perfection. The model you are given is a function of a few variables, say the colour, the texture, the material and so on of the clothes. You can find the values of these variables in a relevant corpus. Now, the model also has a few coefficients; depending on the model there can be two, three or many more. In numerical optimization you are normally expected to find the values of these coefficients.

In the context of numerical optimization, the search space corresponds to all the possible combinations of values of these coefficients. For instance, if your model has six coefficients and they are expected to assume real values, then the search space is the six-dimensional real space.

It would help a lot if we had a way to visualize this six-dimensional space. That, however, is not easy: we cannot visualize a space beyond three dimensions. So, for ease of understanding, let us restrict our search space to two unknown coefficients; that is, assume our model has two coefficients whose values we have to find.

Visualization now becomes easier, since our search space is the two-dimensional real plane. We can reserve the third dimension for a so-called error term. The error term tells us how well the model performs as we tweak the coefficient values a little here and there. It may be, for example, the error between the output of our model and some known reference to which we are trying to fit it.
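As a hedged sketch of this picture, the following Python snippet computes such an error term, here a mean squared error, over a grid of two coefficients. The linear model y = a*x + b and the toy data are assumptions made up purely for illustration.

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0])
    y_ref = np.array([1.0, 3.0, 5.0, 7.0])     # reference outputs (here: 2x + 1)

    def error(a, b):
        """Mean squared error between the model a*x + b and the reference."""
        return np.mean((a * x + b - y_ref) ** 2)

    # Sweep the two coefficients; each (a, b) pair is one point of the
    # search space, and error(a, b) is its height on the "mountain range".
    a_vals = np.linspace(0.0, 4.0, 5)
    b_vals = np.linspace(0.0, 2.0, 5)
    surface = np.array([[error(a, b) for b in b_vals] for a in a_vals])
    print(surface.min())                        # the lowest valley in this grid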

Now the search space starts to look more like a mountain range, with peaks and valleys here and there. Thinking of the search space this way makes it much easier to conceive. As we choose different combinations of coefficient values, we land somewhere near a peak or a valley, depending on what we have chosen.

Normally the goal in numerical optimization is to hit the bottom of the lowest valley. This is specifically the case if our objective is error minimization, as above. Most numerical optimization algorithms are designed with this view of the search space in mind. Indeed, a good algorithm should be able to traverse the whole search space efficiently and, if at all possible, reach a globally optimal solution.


Genetic Programming

A Typical GP Breeding Cycle
Genetic programming (GP) is another very important machine learning (ML) algorithm. It is also a very elegant algorithm, and possibly the most powerful too. It belongs to the class of evolutionary algorithms (EAs) and is, as such, a coarse adaptation of Darwinian evolution: it tries to bring the themes of natural selection and survival of the fittest, as perceived in natural biological systems, to bear on problem solving.

In traditional ML algorithms the search space comprises all the possible solutions, in terms of possible values of the various coefficients of a known, a priori fixed mathematical model. The underlying model may have been chosen to solve any real-world problem; for instance, it may be required for weather forecasting, financial bidding or the robotic control of a machine. A traditional ML algorithm is run on such a model to tune its coefficients so that it performs optimally in some meaningful sense.

GP differs from traditional ML algorithms, which simply aim at tuning the coefficients of a fixed model. In GP the search space comprises all the possible computer programs or mathematical models that may be candidate solutions to the underlying problem. In other words, while other ML algorithms search the so-called coefficient space for possible solutions, GP tries to find an optimal solution in the whole program space, whatever that may be.

Running GP is quite simple, and plenty of open-source implementations are available online. A typical breeding cycle of GP is also quite similar to that of a typical EA. The following pseudo-code describes a typical GP breeding cycle; it is borrowed from the discussion of genetic algorithms below.

  1. Create an initial population of individuals of size n. 
  2. Evaluate each individual on the given problem and assign it a rank R. 
  3. Randomly choose pairs of individuals as some function of R. 
  4. Apply genetic operators of crossover and mutation on the chosen individuals to form a pair of offspring. 
  5. Repeat step 4 until a new population is formed.  
  6. Go to step 2 and keep repeating till this point until the desired solution is found.
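The cycle above is essentially the same as a genetic algorithm's; what makes GP distinctive is that the individuals are trees, so crossover in step 4 swaps subtrees between parents. Here is a minimal, hedged Python sketch of subtree crossover. The nested-list tree encoding and all names are illustrative assumptions, not taken from any particular GP system.

    import copy
    import random

    # Trees are nested lists like [op, child, child]; leaves are "x" or 3.
    def all_subtree_paths(tree, path=()):
        """Yield the index path of every subtree (the root has path ())."""
        yield path
        if isinstance(tree, list):
            for i, child in enumerate(tree[1:], start=1):
                for p in all_subtree_paths(child, path + (i,)):
                    yield p

    def get_subtree(tree, path):
        for i in path:
            tree = tree[i]
        return tree

    def set_subtree(tree, path, new):
        if not path:
            return new                      # replacing the root
        parent = get_subtree(tree, path[:-1])
        parent[path[-1]] = new
        return tree

    def crossover(p1, p2):
        """Swap a random subtree of p1 with a random subtree of p2."""
        c1, c2 = copy.deepcopy(p1), copy.deepcopy(p2)
        path1 = random.choice(list(all_subtree_paths(c1)))
        path2 = random.choice(list(all_subtree_paths(c2)))
        sub1, sub2 = get_subtree(c1, path1), get_subtree(c2, path2)
        c1 = set_subtree(c1, path1, copy.deepcopy(sub2))
        c2 = set_subtree(c2, path2, copy.deepcopy(sub1))
        return c1, c2

    parent1 = ["+", "x", ["*", "y", 3]]     # x + y*3
    parent2 = ["-", ["*", "x", "x"], 1]     # x*x - 1
    print(crossover(parent1, parent2))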


Friday, 13 December 2013

Simulated Annealing


Simulated annealing is one of the most elegant algorithms in the family of meta-heuristics. It is easy to understand and simple to implement and run. It also has a very small memory footprint, meaning that it consumes very little memory while running. This is especially notable when we compare it to other compute-intensive algorithms like genetic algorithms and genetic programming.

Like many meta-heuristic algorithms, simulated annealing is drawn as an analogy from materials science. The word annealing means to soften something up. In materials science, and especially in metallurgical engineering, the term is used in the context of the heat treatment of alloys. More specifically, annealing is a process in which a metal or an alloy, possibly in a molten state, is cooled down from a very high temperature at a very slow rate. Cooling slowly allows the metal to form coarse grains and to settle into states of lower energy. Although such grains can be observed only under a microscope, the net result is that the metal becomes soft; hence the term annealing. The opposite of annealing is quenching, in which the metal is cooled very fast, typically by plunging it into water; hence the name. Cooling quickly does not allow the grains to grow and reach lower energy levels, so the metal becomes hard and brittle.

The simulated annealing algorithm tries to find a globally optimal solution to a problem, just like other meta-heuristics. In the context of numerical optimization it normally seeks a global minimum in a search space of candidate solutions. To do so, the algorithm first picks an initial solution at random. It then iteratively moves to better solutions in order to descend through the search space. The algorithm also allows itself to accept bad solutions at times, in the hope of hitting a trajectory through the search space that leads to a globally optimal solution. How often bad solutions should be accepted is governed by a quantity called the annealing temperature.

The annealing temperature is initially set to a high value. As the iterations proceed it keeps decreasing, restricting the acceptance of bad solutions more and more. The overall working of the algorithm can be explained with the help of the following pseudo-code.

  1. Choose a solution S for the problem at random.
  2. Set the annealing temperature T to some sensible value.
  3. Pick another candidate solution S'.
  4. If S' is better than S, set S = S'; that is, accept S' as the new choice for a possible solution.
  5. If S' is not better than S, still accept it as the new solution with a certain probability that is a function of the annealing temperature T.
  6. Decrement T.
  7. Repeat from step 3 until a desired solution is found, a certain number of iterations have elapsed, or some other user-defined objective has been met.
  8. Terminate the algorithm and keep S as the desired solution.
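Here is a minimal Python sketch of the steps above, applied to minimizing a toy one-dimensional function. The objective, the Gaussian neighbour move, the starting temperature and the geometric cooling schedule are all illustrative assumptions; the probability in step 5 uses the common Metropolis choice exp(-delta / T).

    import math
    import random

    def objective(x):
        # A toy landscape with several valleys.
        return x * x + 10.0 * math.sin(x)

    def simulated_annealing(T=10.0, cooling=0.995, steps=10000):
        s = random.uniform(-10.0, 10.0)              # step 1: random solution
        best = s
        for _ in range(steps):
            candidate = s + random.gauss(0.0, 1.0)   # step 3: neighbour S'
            delta = objective(candidate) - objective(s)
            # Steps 4-5: always accept improvements; accept worse moves
            # with probability exp(-delta / T), which shrinks as T falls.
            if delta < 0 or random.random() < math.exp(-delta / T):
                s = candidate
            if objective(s) < objective(best):
                best = s
            T *= cooling                             # step 6: lower the temperature
        return best

    print(simulated_annealing())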





Thursday, 12 December 2013

Genetic Algorithms

Genetic Algorithm DNA
Genetic algorithms are possibly the most beautiful and yet easiest-to-understand algorithms in the field of machine learning. A genetic algorithm is a population- and generation-based global search scheme that tries to find near-optimal solutions to user-specified problems. The scheme is adapted from Darwinian evolution as it is supposed to take place in natural ecosystems.

The central idea is that the algorithm starts off with a randomly generated population of candidate solutions to a problem. Let us call each member of such a population an individual, although an individual may also be called a genome, a member or a candidate solution, depending on the jargon. The algorithm checks each individual for its suitability to solve the given problem and assigns it a rank accordingly. Just as in Darwinian natural selection, individuals with better ranks stand better chances of giving rise to progeny. To this end, the algorithm iteratively selects individuals from the population according to their ranks and creates new offspring from them. Offspring are created by applying genetic operators, normally crossover and mutation. When the desired number of offspring has been produced, the algorithm starts over: it evaluates and ranks the individuals of the newly formed population, and so on. This process of creating, evaluating and ranking new populations continues until the desired solution to the problem at hand is found, at which point the algorithm stops. Following is the pseudo-code of a typical genetic algorithm:

  1. Create an initial population of individuals of size n.
  2. Evaluate each individual on the given problem and assign it a rank R.
  3. Randomly choose pairs of individuals as some function of R.
  4. Apply genetic operators of crossover and mutation on the chosen individuals to form a pair of offspring.
  5. Repeat step 4 until a new population is formed. 
  6. Go to step 2 and keep repeating till this point until the desired solution is found.
Genetic algorithms are remarkably good at solving intricate problems.
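As a concrete illustration, here is a hedged Python sketch of the six steps above as a bit-string genetic algorithm. The fitness function (count the 1-bits, the classic OneMax toy problem), fitness-proportionate selection and all parameter values are assumptions made for illustration.

    import random

    GENOME_LEN, POP_SIZE, GENERATIONS = 20, 30, 50

    def fitness(ind):                      # step 2: rank = number of 1s
        return sum(ind)

    def select(pop):                       # step 3: fitness-proportionate pick
        weights = [fitness(i) + 1 for i in pop]   # +1 avoids all-zero weights
        return random.choices(pop, weights=weights, k=1)[0]

    def crossover(a, b):                   # step 4: one-point crossover
        cut = random.randrange(1, GENOME_LEN)
        return a[:cut] + b[cut:], b[:cut] + a[cut:]

    def mutate(ind, rate=0.01):            # step 4: bit-flip mutation
        return [1 - g if random.random() < rate else g for g in ind]

    pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
           for _ in range(POP_SIZE)]       # step 1: initial population

    for _ in range(GENERATIONS):           # steps 5-6: breed new populations
        nxt = []
        while len(nxt) < POP_SIZE:
            c1, c2 = crossover(select(pop), select(pop))
            nxt += [mutate(c1), mutate(c2)]
        pop = nxt

    print(max(fitness(i) for i in pop))    # best rank in the final population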



Understanding Fourier Transform

Fourier Transform
Integral transforms are essential to learning any subject related to signal processing. Moreover, they have immense applications in almost every engineering discipline related to electronics or computing. If you want to learn about anything related to signals, you will come across one integral transform or another.

The family of integral transforms contains quite a few members; this article addresses one of them, the Fourier transform. It is probably the most widely used signal transformation technique, in which a digital signal is transformed from the time domain to the frequency domain and back.

However, the subject of the Fourier transform is normally presented to the student in a way that obscures what this simple and highly effective technique is actually all about. In so far as the student fails to understand the underlying concept whereby a signal is transformed from one domain to the other, the student fails to appreciate the utility of this highly useful mathematical tool.

The aim of this article is to explain how the Fourier transform actually works, as lucidly as possible. It is assumed that the student has a basic familiarity with complex numbers and with cross-correlation, though a deep grasp of complex numbers is not essential at this point: this article aims to deliver the conceptual core of the Fourier transform without using complex numbers. Complex numbers normally baffle the keen student completely, and that is probably one reason why they are called complex.

So let's begin our discourse with a revision of cross-correlation. If you have read about it, that is fine; otherwise, if you have studied linear algebra, we shall leverage that a little. From a linear-algebraic point of view, the cross-correlation equation (at zero lag) is tantamount to the inner product of two vectors. We may further make our lives easy by regarding a vector simply as an array of real numbers.

So what does the correlation equation do? As the name suggests, its primary function is to measure how much two vectors, or arrays, correspond to or resemble each other. In other words, it tries to find out how similar two arrays are. If the two arrays are highly similar, the result is a large number. If they have very low similarity, the result is a very small number; you can expect the result to be zero if the arrays are totally dissimilar. I hope that understanding the role of cross-correlation has been easy for you up to this point. If you are further curious about the correlation equation, please read (link).
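As a minimal sketch of this point, the following Python snippet computes the zero-lag cross-correlation of two arrays as an inner product. The example vectors are made up: one pair is identical, the other is chosen to be orthogonal.

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([1.0, 2.0, 3.0])        # identical to a: high similarity
    c = np.array([3.0, -2.0, 1.0 / 3])   # chosen so the inner product is 0

    print(np.dot(a, b))   # 14.0 -> large number, similar arrays
    print(np.dot(a, c))   # 0.0  -> zero, totally dissimilar (orthogonal) arrays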

Now, how is correlation related to the Fourier transform? The answer is that the Fourier transform is essentially a correlation operation. We take a time-domain signal of a certain length and find its correlation with another signal of a well-known frequency and a well-known magnitude. Let us call the former the signal under test; it could be a speech signal or a signal from the cosmic microwave background. Let us call the latter the reference signal, for the purpose of lucidity. For the sake of simplicity, assume the signal under test is a speech signal in the time domain. This simplifies our work because a speech signal is two-dimensional: the horizontal axis shows time, which is why we call it the time-domain axis, and the vertical axis represents the magnitude, or amplitude, of the signal.

A speech signal in the time domain is also a multi-spectral signal, meaning that it has many frequency components in it. Asserting this is important because it is what makes applying the Fourier transform to such a signal sensible. The very fact that a speech signal, or any other such signal, say one from our local cell-phone company, has many components of different frequencies embedded in it is the very reason we apply the Fourier transform, or transforms like it, to such signals: we want to figure out which frequency components are embedded in them so that we can use them for some suitable purpose.

So how does the Fourier transform work? Given that it is basically a correlation operation, we can conceive of it as follows. We take a reference signal of a given frequency and a given magnitude and correlate it against the signal under test. If our signal under test is a speech signal, the correlation returns the magnitude (or strength), within the speech signal, of the particular frequency of which our reference signal was made. This is how we compute the magnitude of one desired frequency in the signal.

In other words, if we wanted to know whether our signal under test had a frequency component of, let us say, 90 hertz in it, we would choose our reference signal to be a 90 hertz signal. Cross-correlating it with the signal under test would return the magnitude of the 90 hertz component in the latter, possibly multi-spectral, signal.
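Here is a hedged Python sketch of that idea. The signal under test (a made-up mixture of 50 Hz and 90 Hz sinusoids), the sample rate, and the use of both a sine and a cosine reference, needed because the component in the signal may be phase-shifted, are all illustrative assumptions.

    import numpy as np

    fs = 1000                                  # sample rate in Hz (assumed)
    t = np.arange(0, 1.0, 1.0 / fs)            # one second of samples
    signal = (0.5 * np.sin(2 * np.pi * 50 * t)
              + 2.0 * np.sin(2 * np.pi * 90 * t))

    def magnitude_at(f_hz, x):
        """Correlate x with sine and cosine references at f_hz and
        combine the two results to recover the magnitude regardless
        of the component's phase."""
        sin_corr = np.dot(x, np.sin(2 * np.pi * f_hz * t))
        cos_corr = np.dot(x, np.cos(2 * np.pi * f_hz * t))
        return 2.0 * np.hypot(sin_corr, cos_corr) / len(x)

    print(magnitude_at(90, signal))   # ~2.0: a strong 90 Hz component
    print(magnitude_at(40, signal))   # ~0.0: no 40 Hz component present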

If you have followed the explanation this far, the rest of the Fourier transform should be fairly easy to understand. We simply choose reference signals of various frequencies and cross-correlate each of them with the signal under test, one by one, storing the result of each correlation separately. These results are the magnitudes of those frequencies. Assembling the whole array of magnitudes, in ascending order of frequency say, gives us what we may call the frequency-domain outlook of the signal under test. Put another way, it simply tells us which frequencies are present in the underlying signal and what their magnitudes are.
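Sweeping the reference frequency turns the single-bin sketch above into the whole frequency-domain outlook. The following snippet builds the spectrum one correlation at a time and checks it against numpy's FFT, which computes the same magnitudes far more efficiently; the signal and frequencies are the same illustrative assumptions as before.

    import numpy as np

    fs = 1000
    t = np.arange(0, 1.0, 1.0 / fs)
    signal = (0.5 * np.sin(2 * np.pi * 50 * t)
              + 2.0 * np.sin(2 * np.pi * 90 * t))

    freqs = np.arange(0, 101)                  # try 0..100 Hz, one bin per Hz
    spectrum = []
    for f in freqs:                            # one correlation per frequency
        s = np.dot(signal, np.sin(2 * np.pi * f * t))
        c = np.dot(signal, np.cos(2 * np.pi * f * t))
        spectrum.append(2.0 * np.hypot(s, c) / len(signal))

    # The FFT yields the same magnitudes (up to scaling) in one call.
    fft_mags = 2.0 * np.abs(np.fft.rfft(signal)) / len(signal)
    print(spectrum[90], fft_mags[90])          # both ~2.0 for the 90 Hz bin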

You may be getting slightly uneasy at the idea of doing all these calculations by hand, and that would indeed be cumbersome. You do not have to worry, however: digital signal processing packages, such as Matlab's, accomplish this with one-line subroutines built on the fast Fourier transform (FFT), an efficient algorithm for computing exactly this set of correlations. Understanding the concept is what matters.