The idea of building an expert advisor on top of a self-learning system isn't new, but it is rarely used in retail Forex trading. Most expert advisors you will find do not collect statistical information to compare it later with the current market situation. Even though building a self-learning EA may seem tough, it isn't an overly complex task, especially when you have some examples to follow.
The ideas and examples presented here can be a foundation for your own
Introduction to statistical Forex systems
A statistical Forex system is a system that relies on information previously collected from the market, and the amount of this information is proportional to the length of the period over which the market is analyzed. Plain optimization of input parameters over some period isn't the same as collecting statistical information, so not every optimized system or expert advisor is a statistical EA.
Statistics are collected into a special file in a format that the system recognizes; optionally, the system may update these statistics. Designing a statistical Forex system is a complex problem that involves market analysis, rule development, and building a real-time evaluator.
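The text does not specify what the statistics file looks like. As a minimal sketch, assuming a plain-text format with one record per line (an integer action label followed by space-separated feature values), saving and loading could look like the standalone C++ below. C++ is used because its syntax is close to MQL4 and it can be tested outside MetaTrader; `Record`, `saveStats`, `loadStats`, and the label encoding are hypothetical, not part of any MetaTrader API.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// One statistical record: an outcome label plus the features observed before it.
// Hypothetical label encoding: 0 = buy, 1 = sell, 2 = hold.
struct Record {
    int label;
    std::vector<double> features;
};

// Writes one record per line: "<label> <f1> <f2> ...". Returns false on I/O failure.
bool saveStats(const std::string& path, const std::vector<Record>& records) {
    std::ofstream out(path);
    if (!out) return false;
    out.precision(17);  // 17 significant digits let every double round-trip exactly
    for (const Record& r : records) {
        out << r.label;
        for (double f : r.features) out << ' ' << f;
        out << '\n';
    }
    return static_cast<bool>(out);
}

// Reads records written by saveStats. Lines that do not start with an integer
// label are skipped; parsing of a line stops at its first non-numeric field.
std::vector<Record> loadStats(const std::string& path) {
    std::vector<Record> records;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Record r;
        if (!(fields >> r.label)) continue;
        double f;
        while (fields >> f) r.features.push_back(f);
        records.push_back(r);
    }
    return records;
}
```

Because values are written with full precision, reloaded statistics compare exactly like the originals.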
If you want to build such a trading system, you need to answer the following five questions first:
- What period of time should be used for gathering statistics? Common sense suggests that the longer the period, the better the gathered statistics, but in fact, problems can arise if the system collects information from periods that, for some reason, are unrelated to the current market mechanics.
- What kind of information will go into the statistics? This is probably the most important question if you want to create your own statistical expert advisor. Will you record raw quotes, indicator values, or some custom calculations? Which indicators or calculation methods will you use? What else should be recorded?
- How will your system compare the current market situation with the statistical data? Some of your data is associated with a rising market, some with a falling market, and the rest with a sideways market. What methods can you use to compare current market data with your statistics to make your next trading decision?
- Will it learn or will it be taught? A statistical Forex system doesn't have to keep collecting statistics on its own, but it may be designed to do so. This addition has both advantages and disadvantages.
- How complex will it be? A statistical Forex EA can be a very simple program, but it can also be developed into a powerful analysis-and-comparison program capable of recognizing not only price action patterns but also correlations with the days of the week and trading hours, as well as looking into the past to find the start of a trend or previous price patterns.
An MT4 expert advisor built as a statistical Forex system can be very profitable, but creating one is not a trivial task. Below, we elaborate on these questions and their optimal solutions.
Choosing statistics timeframe
Choosing the timeframe for the statistics consists of two parts: choosing the chart timeframe (the granularity of the data) and choosing the length of the period over which the statistics will be gathered.
Choosing the right chart timeframe is a matter of balance between many samples of uninformative short-term data and a small quantity of more specific information. On the one hand, if you choose tick or 1-minute granularity, you will face a large amount of data and spend a lot of CPU power on gathering, calculating, and searching it; finding patterns in such vast arrays of information wouldn't be easy, and the further strategy-building process would become too complicated.
On the other hand, daily or weekly timeframes give you too little information. For example, one year of market analysis on D1 charts yields only about 260 data samples (one per trading day). Considering the 24-hours-a-day, 5-days-a-week nature of the Forex market, the best choices here are the M30, H1, or H4 timeframes: they give you a fair number of samples with decent information density, because each of these bars carries more variation than a tick or an M1 bar.
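The sample counts behind this trade-off are simple arithmetic. The C++ sketch below assumes 24 hours a day, 5 days a week, and 52 weeks a year, and ignores holidays and broker-specific Sunday bars; `barsPerYear` is a hypothetical helper that only handles timeframes from M1 up to D1.

```cpp
// Approximate number of bars per year for a chart timeframe given in minutes,
// assuming 24 h x 5 days of trading per week and 52 weeks per year.
// Valid for timeframes from M1 (1) up to D1 (1440); W1 and MN1 are not handled.
long barsPerYear(int timeframeMinutes) {
    const long minutesPerTradingWeek = 24L * 60 * 5;  // 7,200 minutes
    return minutesPerTradingWeek / timeframeMinutes * 52;
}
```

This gives 12,480 bars per year for M30, 6,240 for H1, 1,560 for H4, and 260 for D1; over 2 to 3 years, H1 therefore yields roughly 12,500 to 18,700 samples.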
Alternatively, you can use multi-timeframe statistics, but that would lead to a really complex system, albeit one with better learning potential.
The length of the sampling period is an important parameter of statistics gathering. A short period lets you recognize the most up-to-date patterns, and your strategy will probably benefit from them in the short to medium term. Unfortunately, a short sampling period may contain too few of these patterns, and if the market changes, they will probably fail to help recognize the changed price dependencies. A long sampling period gives you a very wide array of patterns to use in comparisons, but the difference between today's market and the market several years ago can be too large, so those patterns can lead your system to a high inaccuracy rate.
Gathering statistics over the past 2–3 years is a balanced decision. Such a period captures more than one long-term trend and many medium- and short-term trends, while really outdated data doesn't spoil your statistics.
Of course, these decisions should also depend on your system, the nature of the data you collect, and the timeframe it will use in actual trading. But don't forget the pros and cons of different data timeframes and sampling periods, and try to avoid extreme values that could ruin your strategy.
Information to gather
Gathering statistics over a chosen period of time for a given market instrument is the next step in creating a successful statistical Forex strategy. But what data should these statistics contain? Is it a good idea to record bare chart data? Should you gather any additional information? Here are some of the types of statistics that can be used:
Pure market quotes
This includes the open, high, low, and close prices of bars, and the bid or ask prices of ticks (if you think that tick-based statistics are a good idea).
This method of statistics gathering is the most obvious: you gather market quotes, compare them with the current situation, and decide whether to buy, sell, or hold. But there is a problem with changes in the price range. For instance, if EUR/USD traded near 1.1000 a year ago and above the 1.2000 level a month ago, the quotes gathered in the older price range would be useless for comparison with today's prices.
Alternatively, some form of normalization can be used to store such statistics. For example, instead of a quote such as 1.2404, the EA could store its ratio to the previous bar's open price: 1.2404 / 1.2423 ≈ 0.99847058. This way, the data stays informative in any price range while still using no indicators or other complex calculations.
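A minimal C++ sketch of this normalization, assuming bars are stored oldest first (MQL4 series arrays are indexed the other way, with 0 as the current bar) and applying the ratio to close prices; any other OHLC value could be treated the same way. `normalizeByPrevOpen` is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// result[i - 1] = close[i] / open[i - 1]: each bar's close divided by the
// previous bar's open. The first bar has no predecessor, so the result is
// one element shorter than the input.
std::vector<double> normalizeByPrevOpen(const std::vector<double>& open,
                                        const std::vector<double>& close) {
    std::vector<double> ratios;
    for (std::size_t i = 1; i < close.size() && i < open.size(); ++i)
        ratios.push_back(close[i] / open[i - 1]);
    return ratios;
}
```

With a previous open of 1.2423 and a close of 1.2404, the stored value is about 0.99847058, and the same kind of ratio stays comparable whether EUR/USD trades near 1.1000 or 1.2000.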
Indicators
These are probably the best data to record as statistics. Even the standard MetaTrader indicators let you record a lot of information and compare it with the current market situation in real time. Many indicators will require a normalization process similar to the one used with raw market data.
It is probably a good idea to use indicators that oscillate within a fixed range, like RSI, DeMarker, Stochastic Oscillator, Larry Williams' Percent Range, Money Flow Index, etc.
The length of the array of indicator values recorded for each tick or bar is also an important parameter of statistics gathering. Remember that the longer this array is, the less informative the statistics become. It is probably better to record a single value of each indicator for a given bar or tick.
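As an example of a bounded indicator stored as a single value per bar, here is a C++ sketch of Williams' Percent Range, computed from the highest high and lowest low of the last `period` bars, plus a rescaling to [0, 1]. The bar ordering (oldest first), the function names, and the -50 fallback for a flat range are assumptions of this sketch, not MetaTrader's own implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Williams' %R at bar `last` over `period` bars (bars ordered oldest first):
// (highest high - close) / (highest high - lowest low) * -100, in [-100, 0].
// Caller ensures period >= 1 and last + 1 >= period.
double williamsR(const std::vector<double>& high, const std::vector<double>& low,
                 const std::vector<double>& close, std::size_t last, std::size_t period) {
    const std::size_t first = last + 1 - period;
    const double hh = *std::max_element(high.begin() + first, high.begin() + last + 1);
    const double ll = *std::min_element(low.begin() + first, low.begin() + last + 1);
    if (hh == ll) return -50.0;  // flat range: hypothetical neutral value
    return (hh - close[last]) / (hh - ll) * -100.0;
}

// Maps %R from [-100, 0] to [0, 1] so it can sit next to other bounded
// features in one feature vector (a single value per indicator per bar).
double toUnitRange(double percentR) { return (percentR + 100.0) / 100.0; }
```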
Additional information
Such information may include the time of day, to capture trends and patterns that are specific to certain trading sessions.
Another parameter in this category is the day of the week: trading usually differs depending on how busy the day is (often, there is less price action on Fridays). The statistics can also note whether the day is a major holiday, the current daylight-saving time mode in the major countries, and trading volumes (although in Forex, they are not very informative).
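The time-based fields can be derived from a bar's timestamp. This C++ sketch assumes timestamps counted in seconds since 1970-01-01 00:00, as MQL4's `datetime` values are; MetaTrader bar times are in the broker's server time zone, so the hours come out in that zone. `TimeContext` and `timeContext` are hypothetical names, and the weekday uses 0 = Sunday.

```cpp
#include <cstdint>

// Time-based context of a bar: hour of day and day of week.
struct TimeContext {
    int hour;     // 0..23
    int weekday;  // 0 = Sunday ... 6 = Saturday
};

// Derives the context from seconds since 1970-01-01 00:00 (non-negative timestamps).
TimeContext timeContext(std::int64_t seconds) {
    const std::int64_t days = seconds / 86400;          // whole days since the epoch
    const std::int64_t secondsOfDay = seconds % 86400;
    TimeContext ctx;
    ctx.hour = static_cast<int>(secondsOfDay / 3600);
    ctx.weekday = static_cast<int>((days + 4) % 7);     // 1970-01-01 was a Thursday (4)
    return ctx;
}
```

Holiday and daylight-saving flags would need an external calendar and are left out of this sketch.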
Complex calculations
These may include not only calculations based on market data and indicators but also additional information, such as the time and the day of the week, incorporated into the calculations. In this case, the resulting numeric statistics would be easy to compare with real market data.
Considering the performance of today's PCs, it wouldn't be hard to incorporate even the most complex calculations into a MetaTrader expert advisor that uses a statistical Forex strategy. Additionally, these calculations can be accompanied by various pivot points and support/resistance levels to help set position parameters.
Decision making
When a completed strategy has enough statistical information and a sample of the current market situation, it needs a method of comparing the statistics with the sample and deciding on its next action in the market. For most systems, these decisions are limited to buy, sell, hold, and close-previous-position actions, while more advanced systems may add position adjustment to their arsenal.
The most obvious way for a statistical Forex system to make a decision is to calculate the differences between the sample data and the data stored in the statistics; the lowest difference points to the most probable recorded outcome. For example, if you recorded RSI values, the current RSI reading is 75.2, and the closest stored value (a difference of 0.1) was followed by a falling price, then your system should probably generate a sell signal. This method looks simple, but it is flawed because plain differences cannot accurately compare two samples made up of multiple parameters.
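The single-parameter lookup can be sketched in C++ as a nearest-value search. The record layout and the +1/-1/0 outcome encoding are hypothetical.

```cpp
#include <cmath>
#include <limits>
#include <vector>

struct RsiRecord {
    double rsi;   // RSI value recorded at some past bar
    int outcome;  // hypothetical encoding: +1 price rose, -1 price fell, 0 flat
};

// Returns the outcome of the record whose RSI is closest to the current
// reading, or 0 (no signal) when there are no records.
int nearestOutcome(const std::vector<RsiRecord>& stats, double currentRsi) {
    int best = 0;
    double bestDiff = std::numeric_limits<double>::infinity();
    for (const RsiRecord& r : stats) {
        const double diff = std::fabs(r.rsi - currentRsi);
        if (diff < bestDiff) { bestDiff = diff; best = r.outcome; }
    }
    return best;
}
```

With a stored reading of 75.1 labelled "price fell", a current RSI of 75.2 returns -1, a sell signal. Once each sample holds several parameters, a single difference like this no longer suffices.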
In general, quote-derived parameters should be compared using a method similar to Euclidean distance (best distance, average distance, etc.), possibly weighting the different parameters according to their importance. Meanwhile, time- and fact-based parameters should be compared strictly: e.g., if you recorded some information specific to Fridays and today is Monday, you should disregard this information.
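A C++ sketch of both rules: a weighted Euclidean distance for quote-derived features and an exact match on the weekday, so samples from other days are skipped rather than penalized. `Sample`, the weights, and the weekday field are illustrative assumptions; the weights would have to reflect your own judgment of each parameter's importance.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// A stored sample: quote-derived features compared by distance, plus a
// fact-based attribute (weekday, 0 = Sunday) that must match exactly.
struct Sample {
    std::vector<double> features;
    int weekday;
};

// Weighted Euclidean distance sqrt(sum_i w_i * (a_i - b_i)^2);
// a, b, and w must have the same length.
double weightedEuclidean(const std::vector<double>& a, const std::vector<double>& b,
                         const std::vector<double>& w) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += w[i] * d * d;
    }
    return std::sqrt(sum);
}

// Smallest distance to any stored sample recorded on the same weekday as
// `current`; returns +infinity when no sample qualifies.
double bestDistance(const std::vector<Sample>& stats, const Sample& current,
                    const std::vector<double>& weights) {
    double best = std::numeric_limits<double>::infinity();
    for (const Sample& s : stats) {
        if (s.weekday != current.weekday) continue;  // strict match on fact-based data
        const double d = weightedEuclidean(s.features, current.features, weights);
        if (d < best) best = d;
    }
    return best;
}
```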
Another noteworthy decision-making idea would require a special statistics gathering method. Self-organizing maps (Kohonen maps) are a popular decision-making method widely used in finance. Unfortunately, our tests of self-organizing maps within statistical Forex systems (in the form of MetaTrader expert advisors) didn't bring any interesting results. There are many other ways of using self-organizing structures to store and compare quote-derived statistical information, but their complexity seems excessive for such systems.
A chart-to-chart comparison can be used if the stored statistics are raw or normalized market data, which opens up many opportunities based on graphical chart analysis and difference calculation. Note, however, that such a comparison requires much more CPU power and time to complete. It would also produce a result aimed more at the long term than an immediate decision valid for the next bar or candle.
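One simple form of chart-to-chart comparison, sketched in C++ under the assumption that both the history and the recent window are already normalized (for example, as ratios to the previous open): slide the recent window across the history and keep the offset with the smallest sum of squared differences. `bestMatchOffset` is a hypothetical name.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Returns the start index of the history segment that best matches `recent`
// (smallest sum of squared differences), or history.size() if no full
// segment fits.
std::size_t bestMatchOffset(const std::vector<double>& history,
                            const std::vector<double>& recent) {
    const std::size_t window = recent.size();
    std::size_t bestStart = history.size();
    double bestScore = std::numeric_limits<double>::infinity();
    for (std::size_t start = 0; start + window <= history.size(); ++start) {
        double score = 0.0;
        for (std::size_t i = 0; i < window; ++i) {
            const double d = history[start + i] - recent[i];
            score += d * d;
        }
        if (score < bestScore) { bestScore = score; bestStart = start; }
    }
    return bestStart;
}
```

The bars that followed the matched segment hint at how the current pattern might continue. The nested loop runs once per history offset times the window length, which illustrates the extra CPU load.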
It is likely best to store statistics in three separate "containers": statistics in the first container correspond to the buy action, in the second to the sell action, and in the third to the hold action. Finding the best (shortest) Euclidean distance between the current market sample and the contents of all three containers gives you a hint for your next action. In this case, it is more important to collect the right data and to format it correctly for further comparison.
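The three-container decision can be sketched in C++ as follows: compute the shortest Euclidean distance from the current sample to each container and pick the action whose container is closest. The `Action` enum, the container layout, and falling back to hold on ties or empty containers are assumptions of this sketch.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

enum class Action { Buy, Sell, Hold };
using Vector = std::vector<double>;

// Plain Euclidean distance between two feature vectors of equal length.
double euclidean(const Vector& a, const Vector& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(sum);
}

// Shortest distance from the sample to any vector in one container (+inf if empty).
double shortestDistance(const std::vector<Vector>& container, const Vector& sample) {
    double best = std::numeric_limits<double>::infinity();
    for (const Vector& v : container) best = std::min(best, euclidean(v, sample));
    return best;
}

// containers[0] = buy, containers[1] = sell, containers[2] = hold.
Action decide(const std::array<std::vector<Vector>, 3>& containers, const Vector& sample) {
    const double buy = shortestDistance(containers[0], sample);
    const double sell = shortestDistance(containers[1], sample);
    const double hold = shortestDistance(containers[2], sample);
    if (buy < sell && buy < hold) return Action::Buy;
    if (sell < buy && sell < hold) return Action::Sell;
    return Action::Hold;  // closest to hold, a tie, or no data at all
}
```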
Complexity of a self-learning system
When traders think about a Forex system, they often conclude that a simple trading system cannot be profitable because it doesn't capture all the market parameters that influence the behavior of currency pairs. To some extent, this is true; however, the complexity of a Forex system should be limited. In statistical Forex systems, the complexity of the different parts may vary.
- The amount of information and the number of data types gathered for the statistical system are important parameters: increasing them produces a more complex system. One may decide to gather statistics not from a single timeframe but from several, and to record not only the price quotes (or OHLC data for bars/candles) but also many indicators, calculations, and other parameters. This leads to a rather large statistics database that is hard to interpret correctly, but if interpreted correctly, it is likely to yield better results than a simpler strategy.
- Gathering statistics before running the strategy (pre-training) is extremely important and necessary, but making a system that can continue gathering statistics while running in real time is also important. It shouldn't add much complexity to your expert advisor, but it will help the EA react faster to changes in the market. Of course, such on-the-fly data gathering cannot substitute for the pre-training process.
- Changing, on the fly, the way the system interprets and compares statistics is an advanced way to increase the complexity of your statistical system. Implementing several functions to compare past and current data can be helpful if you also have some method to choose among them.
- The complexity of order management and position handling is another area for improving and upgrading the system, but it can hardly be connected to the gathered statistics, unless your system uses real chart-to-chart comparison for position and order adjustment. In other cases, simple buy/sell/hold decisions are the best market actions available to statistical Forex systems.
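One way to combine pre-training with on-the-fly gathering, sketched in C++ under stated assumptions: each container keeps its pre-trained samples untouched and adds live samples to a bounded buffer, dropping the oldest live sample once a hypothetical `liveCapacity` is reached. This respects the point in the second item above (live data cannot replace pre-training) while still letting recent behavior in. `Knowledge` and its methods are hypothetical names.

```cpp
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

using Vector = std::vector<double>;

// Knowledge for one action container: a fixed pre-trained set plus a bounded
// buffer of samples learned while trading. Only the live buffer is trimmed.
class Knowledge {
public:
    Knowledge(std::vector<Vector> pretrained, std::size_t liveCapacity)
        : pretrained_(std::move(pretrained)), liveCapacity_(liveCapacity) {}

    // Intended to be called once per closed bar, after its outcome is known.
    void learn(const Vector& sample) {
        if (liveCapacity_ == 0) return;
        if (live_.size() == liveCapacity_) live_.pop_front();  // forget oldest live sample
        live_.push_back(sample);
    }

    std::size_t size() const { return pretrained_.size() + live_.size(); }

    // All samples, pre-trained first, then live (oldest to newest).
    std::vector<Vector> all() const {
        std::vector<Vector> out(pretrained_);
        out.insert(out.end(), live_.begin(), live_.end());
        return out;
    }

private:
    std::vector<Vector> pretrained_;
    std::size_t liveCapacity_;
    std::deque<Vector> live_;
};
```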
These are the most obvious ways to make your trading system more complex. Some minor changes can also improve it, letting it react more flexibly to market volatility and evolution.
Example of a statistical expert advisor
After saying so much about statistical Forex systems, it is time to give an example of one. But first, you need to know that this expert advisor wasn't profitable during tests "as is": it had its losses and gains, but spread losses eventually prevailed, so it isn't a good idea to use it on a real-money account. This expert advisor is useful only as an example of an actual statistical Forex system. It uses Tom DeMark's pivot points calculated over the last 5 bars, which are then normalized by subtracting the current Open price. It was tested on the EUR/USD H1 chart, but it can be used on any other currency pair and timeframe. It consists of two .mq4 files:
The first file is StatGathererExample.mq4. As its name suggests, this MetaTrader EA only gathers statistics. Run it via Strategy Tester on 1–2 years of history. The historical data doesn't have to be high quality, because the EA ignores ticks and uses only the OHLC data of bars; just make sure that your chart has enough bars. The EA gathers statistics over the tested period and stores them in the MapPath file ("rl.txt" by default) in your /tester/files/ folder. Copy that file to /MQL4/Files/ for further use by the actual expert advisor.
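The article does not describe how the gatherer labels each pattern. Assuming the scheme hinted at in the list of improvements at the end (buy, sell, or hold depending on whether the next candle's Close-Open difference falls outside or inside a certain range of pips), a labeling step might look like this C++ sketch; `labelNextCandle` and `thresholdPips` are hypothetical, and the 0.0001 pip size applies to EUR/USD quotes.

```cpp
enum class Label { Buy, Sell, Hold };

// Labels a pattern by the candle that follows it: a rise of more than
// thresholdPips means Buy, a fall of more than thresholdPips means Sell,
// anything in between means Hold.
Label labelNextCandle(double nextOpen, double nextClose, double thresholdPips,
                      double pipSize = 0.0001) {
    const double movePips = (nextClose - nextOpen) / pipSize;
    if (movePips > thresholdPips) return Label::Buy;
    if (movePips < -thresholdPips) return Label::Sell;
    return Label::Hold;
}
```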
The second file is StatRunnerExample.mq4 — this EA is used for the actual trading and
You can freely use these examples to construct your own statistical Forex systems. A more advanced version of these example EAs is presented below.
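For reference, Tom DeMark's pivot levels are computed from X, where X = H + 2L + C if C < O, X = 2H + L + C if C > O, and X = H + L + 2C if C = O; then pivot = X/4, R1 = X/2 - L, and S1 = X/2 - H. The C++ sketch below applies this to the last 5 bars and subtracts the current Open, following the description of the example EAs. How the 5 bars are merged (open of the oldest, highest high, lowest low, close of the newest) is an assumption, since the text does not say, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Bar { double open, high, low, close; };
struct DeMarkLevels { double pivot, r1, s1; };

// Tom DeMark's pivot, first resistance, and first support for one bar.
DeMarkLevels demark(const Bar& b) {
    double x;
    if (b.close < b.open)      x = b.high + 2 * b.low + b.close;
    else if (b.close > b.open) x = 2 * b.high + b.low + b.close;
    else                       x = b.high + b.low + 2 * b.close;
    return {x / 4, x / 2 - b.low, x / 2 - b.high};
}

// Merges the `count` bars before index `current` into one bar, computes the
// DeMark levels, and subtracts the current bar's open. Bars are oldest first;
// caller ensures count >= 1 and current >= count.
DeMarkLevels normalizedDeMark(const std::vector<Bar>& bars, std::size_t current,
                              std::size_t count = 5) {
    Bar merged = bars[current - count];
    for (std::size_t i = current - count + 1; i < current; ++i) {
        merged.high = std::max(merged.high, bars[i].high);
        merged.low = std::min(merged.low, bars[i].low);
    }
    merged.close = bars[current - 1].close;
    const DeMarkLevels lv = demark(merged);
    const double base = bars[current].open;
    return {lv.pivot - base, lv.r1 - base, lv.s1 - base};
}
```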
Further development of a self-learning EA
In short, the
You can download RowLearnerCorrectedFinal for free.
How does the self-learning EA work?
The expert advisor is available only for MetaTrader 4 but could be converted to MT5 quite easily (especially in hedging mode). Because it wasn't profitable during testing, we didn't add it to the list of MT4 EAs on our website. Its main purpose is to study the concept of self-learning expert advisors.
How to use it:
- Download it and copy it to the /MQL4/Experts/ subfolder of your MetaTrader data folder.
- You will need to create a map-file with the market "knowledge" for the EA.
- Open the EA in MetaEditor and comment out line 62:
```mql4
LoadKohonenMap();
```
and lines 92–109:
```mql4
FormVector(vector);
MapLookup(vector, abmu);
TLots = (MathFloor(AccountBalance() * 1.5 / 1000)) / 10; // 0.15 lot per 1,000 of balance, rounded down to 0.1 lot
NO = 0;
if (TLots < 0.1) return(0);                               // balance too small to trade
if (TLots > MaxLotSize)
{
   NO = MathFloor(TLots / MaxLotSize);                    // number of full MaxLotSize orders
   if (NO > MaxOrdNumber) NO = MaxOrdNumber;
   TLots = TLots - NO*MaxLotSize;                         // remaining volume...
   TLots = TLots * 10;
   TLots = MathFloor(TLots);
   TLots = TLots / 10;                                    // ...rounded down to 0.1 lot
}
if ((abmu[0] < abmu[1]) && (abmu[0] < abmu[2]))
   Buy();                                                 // abmu[0] is the smallest
else if ((abmu[1] < abmu[0]) && (abmu[1] < abmu[2]))
   Sell();                                                // abmu[1] is the smallest
```
- Compile it.
- Run it in Strategy Tester on EUR/USD, H1, Open prices only, any period (the longer the better).
- The rl.txt file will be created inside the /tester/files/ subfolder of your MetaTrader data folder.
- Copy it to /MQL4/Files/ folder.
- Open the EA in MetaEditor and uncomment line 62 and lines 92–109.
- Compile it.
- You can now attach it to a EUR/USD H1 chart. It will trade and continue learning. On deinitialization, it saves its data to the rl.txt file in your /MQL4/Files/ folder.
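To make the position-sizing part of the EA excerpt easier to follow, here is its arithmetic re-expressed as a standalone C++ function. It reproduces only the calculation of `TLots` and `NO`; what the EA does with them afterwards (inside `Buy()` and `Sell()`) is not shown in the excerpt, and `LotPlan` and `planLots` are hypothetical names.

```cpp
#include <cmath>

struct LotPlan {
    bool trade;        // false when the computed size is below 0.1 lot
    int fullOrders;    // NO: number of orders of maxLotSize
    double remainder;  // TLots after the split, rounded down to 0.1 lot
};

LotPlan planLots(double balance, double maxLotSize, int maxOrdNumber) {
    // 0.15 lot per 1,000 of balance, rounded down to a 0.1-lot step
    const double lots = std::floor(balance * 1.5 / 1000) / 10;
    LotPlan plan{true, 0, lots};
    if (lots < 0.1) { plan.trade = false; return plan; }
    if (lots > maxLotSize) {
        int n = static_cast<int>(std::floor(lots / maxLotSize));
        if (n > maxOrdNumber) n = maxOrdNumber;  // cap on the number of full orders
        plan.fullOrders = n;
        plan.remainder = std::floor((lots - n * maxLotSize) * 10) / 10;
    }
    return plan;
}
```

With a balance of 10,000, MaxLotSize = 1.0, and MaxOrdNumber = 5, this gives one full 1.0-lot order and a 0.5-lot remainder. With a balance of 100,000, the cap of 5 orders leaves a 10-lot remainder, larger than MaxLotSize, which is worth checking if you adapt the code.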
There are three known problems with this Forex expert advisor that make it unprofitable:
- Basing the market knowledge on the OHLC patterns of the previous bars proved to be ineffective.
- If the market significantly changes its behavior compared to the period of learning, the EA fails to trade profitably.
- If the price range changes significantly, the previously gathered data becomes largely useless.
Possible ways to improve this
- Find better market parameters to store as the EA's knowledge (Heiken-Ashi, various pivots, indicator values, etc.).
- Provide a better learning pattern than just buy, sell, or hold depending on whether the Close-Open difference of the next candle after the pattern was inside or outside a certain range of pips.
- Find a better coefficient system to weigh older data against newer data.
- Add more maps with different data that cannot be compared with the data stored in the primary map (for example, a separate map for tick volume statistics).
- Use a different distance metric instead of Euclidean (Mahalanobis, Manhattan, Chebyshev, Minkowski, etc.).
- Find a different way of applying the distance: the average distance, the shortest distance, the median distance, etc.
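Several of these alternatives are easy to try. The C++ sketch below shows Manhattan and Euclidean distance as special cases of Minkowski distance, Chebyshev distance, and three ways of reducing a container's distances to one score (shortest, average, median). Mahalanobis distance is left out because it also needs a covariance matrix of the features; all names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;

// Minkowski distance of order p >= 1: p = 1 is Manhattan, p = 2 is Euclidean.
double minkowski(const Vector& a, const Vector& b, double p) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += std::pow(std::fabs(a[i] - b[i]), p);
    return std::pow(sum, 1.0 / p);
}

// Chebyshev distance: the largest single-coordinate difference.
double chebyshev(const Vector& a, const Vector& b) {
    double m = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) m = std::max(m, std::fabs(a[i] - b[i]));
    return m;
}

enum class Aggregate { Shortest, Average, Median };

// Reduces the distances from the current sample to every vector in a
// container to one score. `distances` must be non-empty.
double aggregate(Vector distances, Aggregate mode) {
    switch (mode) {
        case Aggregate::Shortest:
            return *std::min_element(distances.begin(), distances.end());
        case Aggregate::Average: {
            double sum = 0.0;
            for (double d : distances) sum += d;
            return sum / distances.size();
        }
        case Aggregate::Median:
        default: {
            std::sort(distances.begin(), distances.end());
            const std::size_t n = distances.size();
            return n % 2 ? distances[n / 2] : (distances[n / 2 - 1] + distances[n / 2]) / 2;
        }
    }
}
```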
You can freely modify, upgrade, and use the pieces of code from this EA as long as you leave the credit to the original author inside the .mq4 file. If you manage to get something worthy out of it, please let us know.
If you have any questions or comments regarding the concept of