Lowest Mean Squared Error (MSE) on the standardised data: 2.12
$$\hat{y}=0.91+0.12\cdot x_0^{0.66}\cdot 0.51^{x_1}+0.02\cdot x_2^{0.34}$$
(all figures shown are illustrative)
  • GDP is denoted by \( \hat{y}\).
  • Healthcare is denoted by \(x_0\).
  • Literacy is denoted by \(x_1\).
  • Technology is denoted by \(x_2\).

1 About

1.1 Purpose

EstimAid is an online tool that generates a model describing the data submitted by the user. Unlike in conventional econometrics, the structure of the model is not fixed manually; because many more candidate structures are considered, a model with a smaller squared error term is more likely to be found. Although larger, less parsimonious models can be constructed that interpolate or extrapolate more accurately, EstimAid aims to use as few variables as possible, so that the final model can be interpreted contextually, just as in manual methods of regression.

1.2 Algorithm

Based on the number of variables found in the user's dataset, EstimAid creates all possible models according to the following rules (a sketch in code follows the list):

  1. Each model has one random starting bias parameter, for example \(0.91+\).
  2. As long as the last element of the model is an addition operator, \(+\), a random constant parameter is created that will be multiplied by whatever follows in the model. One example could be \(0.12\cdot \). Combining this with the previous rule, we now have \(0.91+0.12 \cdot \).
  3. Because the last element of the model is a \( \cdot \), the first variable is appended to what we have up to this point, serving as either the base or the exponent of another randomly created parameter. For example, we could have \(x_0^{0.66}\) or \(0.66^{x_0}\). Using the first possibility and combining it with what we had, we get \(0.91+0.12\cdot x_0^{0.66}\).
  4. After this first variable is used, either a \(+\) or a \( \cdot \) follows, and step 2 or step 3 is repeated, respectively, with either a new random constant parameter or a new variable. This process repeats until all variables are used up. We might then end up with models such as \(0.91+0.12\cdot x_0^{0.66}\cdot 0.51^{x_1}+0.02\cdot x_2^{0.34}\) or \(0.91+0.12\cdot x_0^{0.66} + 0.21 \cdot 0.51^{x_1}\cdot x_2^{0.34}\).
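
To make these rules concrete, below is a minimal sketch of such a structure generator, written in TypeScript. The names rand and buildRandomModel are illustrative assumptions, not EstimAid's actual source, and the sketch draws one random structure, whereas EstimAid enumerates all possible ones:

    // Random parameter in (0, 1), rounded to two decimals, e.g. 0.66.
    function rand(): number {
      return Math.round(Math.random() * 100) / 100;
    }

    // Builds one random model over the given variables, each used exactly
    // once, following rules 1-4 above. Returns a LaTeX string such as
    // "0.91 + 0.12 \cdot x_0^{0.66} \cdot 0.51^{x_1} + 0.02 \cdot x_2^{0.34}".
    function buildRandomModel(variables: string[]): string {
      let model = `${rand()}`;              // rule 1: random starting bias
      let op: "+" | "*" = "+";              // the bias is followed by '+'
      for (const v of variables) {
        if (op === "+") {
          model += ` + ${rand()} \\cdot `;  // rule 2: a constant follows '+'
        } else {
          model += ` \\cdot `;
        }
        // Rule 3: the variable becomes either the base or the exponent.
        model += Math.random() < 0.5 ? `${v}^{${rand()}}` : `${rand()}^{${v}}`;
        // Rule 4: draw the operator that precedes the next variable.
        op = Math.random() < 0.5 ? "+" : "*";
      }
      return model;
    }

    console.log(buildRandomModel(["x_0", "x_1", "x_2"]));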

The beauty of this algorithm is that an immensely wide array of models is considered. A little reflection confirms that a large proportion of models in which each variable is used once can be generated by this procedure. Only models containing special mathematical operators are excluded, which has mild implications, since such operators often have less practical use than the more common ones. With the possible models generated, the tool then optimizes the parameter values of each model with a machine learning algorithm so that its predictions fit the values provided in the uploaded dataset. After a specified number of iterations, the model with the most accurate predictions is chosen out of all generated candidates.
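
As an illustration of this optimization stage, the following sketch fits one candidate model of the form \(\hat{y}=b+c\cdot x^{p}\) by minimizing its Mean Squared Error with TensorFlow.js and a Stochastic Gradient Descent optimizer. The data, learning rate, and step count are assumptions for the example, not EstimAid's actual code:

    import * as tf from "@tensorflow/tfjs";

    // Trainable parameters with random starting values.
    const b = tf.variable(tf.scalar(Math.random()));
    const c = tf.variable(tf.scalar(Math.random()));
    const p = tf.variable(tf.scalar(Math.random()));

    const xs = tf.tensor1d([1, 2, 3, 4]);          // illustrative inputs
    const ys = tf.tensor1d([1.3, 1.9, 2.4, 2.8]);  // illustrative outputs

    // The candidate model y ≈ b + c * x^p and its MSE loss.
    const predict = (x: tf.Tensor) => b.add(c.mul(x.pow(p)));
    const loss = () => tf.losses.meanSquaredError(ys, predict(xs)) as tf.Scalar;

    const optimizer = tf.train.sgd(0.01);  // learning rate 0.01
    for (let epoch = 0; epoch < 500; epoch++) {
      optimizer.minimize(loss);            // one gradient descent step
    }
    loss().print();                        // final MSE of this candidate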

Then the second of the two stages of the process is reached. Here, the structure of the best model from the first stage is cloned as many times as the number of models the algorithm created in that stage. All of these clones, each with different starting parameters, are optimized for the same number of iterations. This way, many more local minima are approached, so that the probability of finding a local minimum lower than the best one found in stage one is close to one, and the best local minimum found is likely to be close to, or virtually at, the global minimum of the loss function. Finally, the mathematical representation of this model is presented to the user.
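
A sketch of this second stage could look as follows; fitModel is a hypothetical helper, assumed to attach fresh random starting parameters to a structure, optimize them for the given number of epochs, and return the resulting fit:

    interface Fit { mse: number; params: number[]; }

    // Re-optimizes the winning structure from several random starts and
    // keeps the fit with the lowest loss.
    async function stageTwo(
      bestStructure: string,
      restarts: number,
      epochs: number,
      fitModel: (structure: string, epochs: number) => Promise<Fit>,
    ): Promise<Fit> {
      let best = await fitModel(bestStructure, epochs);
      for (let i = 1; i < restarts; i++) {
        const candidate = await fitModel(bestStructure, epochs); // fresh start
        if (candidate.mse < best.mse) best = candidate;          // lowest MSE
      }
      return best;
    }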

1.3 Limitations

Perhaps the most significant limitation of this method of estimation is that the suitability of a generated model is judged only by the value of its Mean Squared Error. Economic theory is not used to determine the most likely functional form of the variables at hand, and other important mathematical properties normally considered in estimation are neglected entirely. Therefore, it is emphasized again that EstimAid should never be used to the exclusion of other methods of estimation.
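
For reference, over \(n\) observations the Mean Squared Error is

$$\text{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2,$$

where \(y_i\) is the observed output value and \(\hat{y}_i\) is the model's prediction; the model with the lowest MSE is deemed the most suitable.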

Another limitation of this method of estimation relates to the machine learning algorithm. When the parameter values of the possible models are optimized, a local minimum of the loss function is approached. As not all local minima are global minima, the global minimum of the loss function is not always found. In mathematical methods of regression, for some functional forms, it is possible to find the global minimum of the same loss function (the Mean Squared Error). Nevertheless, local minima can come very close to, or coincide with, global minima, and the models used in mathematical regression are not normally as complex as the models EstimAid generates, so using this tool can have benefits over those mathematical methods. In any case, neither method should be used to the exclusion of the other. When time permits, they should be used in combination, enjoying the best of both worlds.

The last limitation is performance. EstimAid runs in the browser and is therefore limited to part of the available computational power. Furthermore, as the number of variables grows, the number of possible models the tool has to optimize grows exponentially, and as the number of values per variable grows, so does the optimization time per possible model. If both numbers grow large, reaching an acceptable estimation function can take a long time. Reasonable dataset sizes have been tested on an ordinary laptop, however, and performance is normally not an issue at all. Still, scaling issues exist, and the extent to which they manifest is naturally determined by the scale of the estimation project. A partial solution could be to implement a 'natural selection' algorithm in this tool, where, for example, generated models 'survive' into the next epoch based on how their loss compares with that of the other generated models. Seemingly futile computations would then be removed from the program, though it might happen that one of the removed models was just about to descend steeply on its loss function before it was taken out of the list of generated possible models.
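
As a speculative sketch of that idea (it is not an implemented EstimAid feature), the culling step could look as follows, with Candidate and cull as illustrative names:

    interface Candidate { structure: string; loss: number; }

    // Keeps only the better-scoring half of the candidates after an epoch.
    function cull(candidates: Candidate[]): Candidate[] {
      const ranked = [...candidates].sort((a, b) => a.loss - b.loss);
      return ranked.slice(0, Math.ceil(ranked.length / 2)); // the survivors
    }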

2 Tutorial

2.1 Data

The dataset, in CSV format, should be structured as follows (an example is given after the list):

  • Each input variable occupies its own column, with the variable name in the top cell and the variable values below it.
  • After all variable columns, one column should follow with the output variable name at the top, and all output values below.
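
For instance, a dataset for the example model at the top of this page could look like this (all values are illustrative):

    healthcare,literacy,technology,gdp
    3.2,0.91,1.7,2.45
    2.8,0.88,1.1,2.10
    4.1,0.95,2.3,3.02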

2.2 Configurations

Here are some things to think about when configuring the model.

  • Optimizer. The optimizer is the function that minimizes the loss of each possible model and adjusts the model parameters accordingly. The Stochastic Gradient Descent optimizer is recommended, as it seems to be the best supported computationally by the TensorFlow API this website uses, but other optimizers can also be used.
  • Learning rate. The higher the learning rate, the bigger the steps the models take in their gradient descent of the loss function (see the update rule after this list). This makes the models approach their local minima more swiftly, but it can also cause them to overshoot once a local minimum is almost reached. When the number of epochs is large enough, a small learning rate will not pose any problems.
  • Significant figures. This will determine the number of significant figures of all parameter values shown in the final model.
  • Gradient descent steps per cycle (epochs). This number represents the number of optimization steps performed by the machine learning optimizer. The higher this number, the further the loss function is minimized and the better the final estimated model. By running some trials and monitoring the browser console as shown in the video examples below, an appropriate configuration can be determined.
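
To make the roles of the learning rate and the number of epochs precise: in each gradient descent step \(t\), every parameter \(\theta\) of a candidate model is updated according to

$$\theta_{t+1}=\theta_t-\eta\,\frac{\partial\,\text{MSE}}{\partial\theta}\bigg|_{\theta=\theta_t},$$

where \(\eta\) is the learning rate and the total number of steps is the configured number of epochs.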