Forecasting of Hydrological Time Series Data with Lag-one Markov Chain Model

Malaysia's climate is characterised by uniform temperature, high humidity, copious rainfall and light winds. As in other parts of the equatorial doldrums, intermittent rain and sunshine within a single day is the norm, so long periods of clear sky are rare. One of the commonly identified problems in water resource management in Malaysia is the unavailability of long-term historical records. Where data are available, they are often discontinuous. Since forecasting from rainfall data normally requires long-term, continuous historical records, a stochastic simulation model that can cope with these limitations is proposed.

MODEL CONCEPTUALISATION
The hydrological records available for this study were generally shorter than 100 years; in fact, most were shorter than 25 years. Even in the longest record, the most extreme event, such as a drought or flood, can differ greatly in magnitude from the next most extreme event. It is often debatable whether the extreme event is representative of the period recorded. The severity of a long drought can change drastically when one year is added to or subtracted from its duration. To enable the likelihood of severe events to be estimated, a stochastic process is simulated in which long sequences of events are generated. If the generation is done correctly, the hypothetical sequences are as likely to occur in the future as the observed record.
Any hydrological time series is typically made up of two components, namely a random component and a persistence (stochastically deterministic) component. Rainfall is regarded as the most basic weather variable, independent of temperature and evaporation. Therefore, generation of long-term synthetic rainfall data can provide a basic set of weather variables for long-term forecasting. In the Lag-One Markov Chain Model, one year of historical data can be used to forecast the subsequent year's rainfall. Similarly, if three years of historical data are used, the rainfall of the next three years can be forecast.
Planning and operation are important elements in water resource management. Rainfall forecasting is one of the approaches commonly used to extend the lead time for catchments with short response times. However, it is difficult to obtain a high degree of accuracy in rainfall forecasting using deterministic models. A probability-based rainfall forecasting model based on the Markov Chain therefore provided a better alternative because of its ability to preserve the basic statistical properties of the original series. This method was especially useful in the absence of long-term recorded data, a widespread problem in Malaysia. Comparison of the statistics of the generated synthetic rainfall data against those of the observed data revealed that reasonable levels of acceptability were achieved.
The daily rainfall data used in this model were available from three types of recording, namely manual, automatic and data logger. Rainfall data from chart and data logger recordings were selected based on the period of data provided by the recorder. Data from either the data logger or the manual method were used to replace data that were not available from the chart recorder. If all three methods failed to supply data for a particular day, data from nearby rainfall stations were used.
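The replacement rule described above can be sketched in Python. This is a minimal illustration, not the procedure actually coded for the study: the function name, the sample values and the choice of consulting the data logger before the manual record are assumptions, and missing values are taken to be either `None` or negative (negative values marked missing data in the historical records).

```python
from typing import List, Optional, Sequence


def is_missing(value: Optional[float]) -> bool:
    """Treat None or a negative reading as a missing value."""
    return value is None or value < 0


def fill_daily_rainfall(
    chart: Sequence[Optional[float]],
    logger: Sequence[Optional[float]],
    manual: Sequence[Optional[float]],
    neighbour: Sequence[Optional[float]],
) -> List[Optional[float]]:
    """Merge daily series, taking the first available source for each day.

    Assumed priority: chart, then data logger, then manual, then a
    nearby station. None is kept when every source is missing.
    """
    merged = []
    for readings in zip(chart, logger, manual, neighbour):
        merged.append(next((v for v in readings if not is_missing(v)), None))
    return merged


# Hypothetical four-day example (mm of rain per day).
chart = [5.0, None, -1.0, 0.0]
logger = [5.2, 12.5, None, None]
manual = [4.8, 12.0, None, 0.0]
neighbour = [6.0, 10.0, 3.5, 0.0]
filled = fill_daily_rainfall(chart, logger, manual, neighbour)
```

On day 2 the chart value is missing and the logger value is used; on day 3 only the nearby station has a reading.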
The rainfall time series used in this study must satisfy certain requirements. The aim of the analysis is to find scale-independent properties, so the series should span a large range of scales. This means that the data have to be continuous and of high temporal and intensity resolution. All intensity levels must be correctly represented in the data. However, this is difficult to accomplish when measuring at high intensity resolution. It has been shown that insufficient intensity resolution, which leads to erroneous representation of low-intensity rainfall in particular, may introduce artificial breaks in scaling behaviour.
In this study, daily rainfall data from 1974 to 2003 were analysed for eight rainfall stations in the Gombak river catchment area in Selangor, one of the states of Malaysia. The Gombak river is located at latitude 3° 8' 53" north of the Equator and longitude 101° 41' 44" east of the Prime Meridian, on the map of Kuala Lumpur. It is a tributary of the main river, the Klang river. The Klang river drains an area of about 1200 km², extending from the headwaters in the steep mountain forests of the Main Range in Peninsular Malaysia to the river mouth, over a total length of 120 km. The location of the catchment area is shown in Figure 1.
The Gombak river catchment area was selected because of the long period of data available. For the automatic station, 28 years of data were recorded. For the data logger stations, the periods of recorded data differ because the loggers were installed on different dates. Detailed information about the location, period of data recorded and type of recording is shown in Table 1. The types of rainfall gauge available in the study area and their capabilities are shown in Table 2.
The types of data used in this study are as follows:
1. Automatic Station: 30-year period of daily rainfall data.
2. Data Logger Station: the period of daily rainfall data depends on the date on which the instrument was installed.
Analysis was performed on the following monthly figures: total rainfall (mm); maximum rainfall (mm); minimum rainfall (mm); mean rainfall (η); skew of rainfall amount (γ); standard deviation of rainfall amount (σ); and coefficient of variation (C_v). Benson and Matalas (1967) and Solomon (1976) found that regionalized parameters are more suitable than single-site parameters because regionalization reduces the operational bias due to temporal and spatial variations inherent in historic sequences. Based on these findings, the catchment's average rainfall was considered more desirable than rainfall from individual stations for the stochastic generation of synthetic data, because it allows better approximation of the stochastic properties of rainfall. The catchment's average rainfall was computed using the Thiessen Polygon Method (Thiessen 1911). Figure 2 shows a sample determination of the Thiessen average rainfall.
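As an illustration of the Thiessen Polygon Method, the catchment average is the area-weighted mean of the station rainfall, with each station weighted by the area of its polygon. The following Python sketch uses hypothetical station names, polygon areas and rainfall values; none of them are taken from the study.

```python
from typing import Dict


def thiessen_average(
    rainfall_mm: Dict[str, float], polygon_area_km2: Dict[str, float]
) -> float:
    """Area-weighted catchment rainfall.

    Each station's rainfall is weighted by the area of its Thiessen
    polygon; the weights are normalised by the total polygon area.
    """
    total_area = sum(polygon_area_km2[station] for station in rainfall_mm)
    weighted_sum = sum(
        rainfall_mm[station] * polygon_area_km2[station]
        for station in rainfall_mm
    )
    return weighted_sum / total_area


# Hypothetical stations and values, for illustration only.
rain = {"A": 20.0, "B": 10.0, "C": 5.0}    # daily rainfall (mm)
areas = {"A": 10.0, "B": 30.0, "C": 60.0}  # polygon areas (km^2)
catchment_mean = thiessen_average(rain, areas)  # (200 + 300 + 300) / 100
```

Here the wettest station covers the smallest polygon, so the catchment average (8.0 mm) is lower than the simple arithmetic mean of the three stations.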

METHODOLOGY
Baki (1997) adopted the approach of Adamowski and Smith (1972), using a runoff-generation type of model to generate daily rainfall data. A first-order Markov model (also known as the Lag-one Markov Chain) was used to generate standardized daily rainfall data.
The outline of model operations was as follows:
1. The mean and standard deviation (σ_i) of the recorded data were calculated for every day (i) of the year.
2. The overall serial correlation (r_i) of the recorded data was also calculated.
3. The standardized daily rainfall (Z_i) was computed.
4. Normally distributed random numbers (t_i) with zero mean and unit variance were generated.
5. All negative daily rainfall values were set to zero.
Figure 2. Average rainfall in the catchment areas computed using Thiessen (1911).
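The outlined steps can be sketched in Python. The generating equation is not given in the outline, so this sketch assumes the commonly used lag-one recursion Z_i = r Z_(i-1) + t_i sqrt(1 - r^2), followed by X_i = mean_i + σ_i Z_i. The function names, the toy records, the zero starting state and the fixed random seed are illustrative assumptions and do not reproduce the Fortran 90 program used in the study.

```python
import math
import random
import statistics


def daily_stats(records):
    """Step 1: mean and standard deviation for each day i across years."""
    n_days = len(records[0])
    by_day = [[year[d] for year in records] for d in range(n_days)]
    means = [statistics.fmean(day) for day in by_day]
    stds = [statistics.stdev(day) for day in by_day]
    return means, stds


def lag_one_correlation(series):
    """Step 2: overall lag-one serial correlation of the recorded series."""
    mean = statistics.fmean(series)
    num = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    den = sum((a - mean) ** 2 for a in series)
    return num / den if den else 0.0


def generate_year(means, stds, r, rng):
    """Steps 3 to 5: generate one synthetic year of daily rainfall (mm)."""
    z = 0.0  # assumed starting state of the standardized series
    noise_scale = math.sqrt(max(0.0, 1.0 - r * r))
    synthetic = []
    for mean, std in zip(means, stds):
        t = rng.gauss(0.0, 1.0)   # step 4: zero mean, unit variance
        z = r * z + t * noise_scale  # step 3: assumed lag-one recursion
        synthetic.append(max(0.0, mean + std * z))  # step 5: no negatives
    return synthetic


# Toy record: 3 years of 5 "days" each (mm), for illustration only.
records = [
    [0.0, 12.0, 3.0, 0.0, 25.0],
    [4.0, 8.0, 0.0, 1.0, 30.0],
    [2.0, 15.0, 6.0, 0.0, 18.0],
]
means, stds = daily_stats(records)
r = lag_one_correlation([x for year in records for x in year])
synthetic = generate_year(means, stds, r, random.Random(42))
```

Because the recursion keeps Z_i at unit variance, the daily means and standard deviations of the generated series are expected to approach the recorded ones before step 5 truncates negative values.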

DATA ANALYSIS AND RESULTS
The generated data were analyzed using statistical parameters. Table 3 compares the statistical parameters obtained from the recorded and generated data. The daily means were modeled satisfactorily, and the daily standard deviations were reasonably close. The estimated daily statistical parameters are shown graphically in Figures 3 to 5. Figure 3 compares the mean values of the recorded and generated data. Figure 4 compares the standard deviation values of the recorded and generated data. Figure 5 compares the skew values. The daily mean values obtained from the generated data were similar to those of the recorded data. The model also generated reasonable sequences of wet and dry days. However, the standard deviation values of the generated data were lower than those of the recorded data.
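A comparison of this kind can be sketched in Python as follows. The helper names, the sample series and the 0 mm threshold separating wet from dry days are assumptions made for illustration; the skew shown is the adjusted sample skewness, which may differ from the estimator used in the study.

```python
import statistics
from itertools import groupby


def summary_stats(values):
    """Mean, sample standard deviation, skew and coefficient of variation."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    third_moment_sum = sum((v - mean) ** 3 for v in values)
    if n > 2 and std > 0:
        # Adjusted Fisher-Pearson sample skewness.
        skew = n * third_moment_sum / ((n - 1) * (n - 2) * std ** 3)
    else:
        skew = 0.0
    cv = std / mean if mean else float("nan")
    return {"mean": mean, "std": std, "skew": skew, "cv": cv}


def spell_lengths(values, wet_threshold=0.0):
    """Lengths of consecutive wet and dry day sequences, in time order."""
    wet, dry = [], []
    for is_wet, run in groupby(values, key=lambda v: v > wet_threshold):
        (wet if is_wet else dry).append(len(list(run)))
    return wet, dry


# Hypothetical ten-day series (mm), for illustration only.
recorded = [0.0, 0.0, 5.0, 12.0, 0.0, 3.0, 0.0, 0.0, 0.0, 20.0]
stats = summary_stats(recorded)
wet_spells, dry_spells = spell_lengths(recorded)
# wet_spells is [2, 1, 1] and dry_spells is [2, 1, 3]
```

Applying the same two functions to the recorded and generated series gives the paired values that a table such as Table 3 would compare.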
The negative rainfall values present in the historical rainfall records indicated missing rainfall data. Setting these negative values to zero affected the skew values produced by the proposed model. The skew values of the generated data were much lower than those of the recorded data. This difference indicated that the generated data were almost normally distributed.
In generating the synthetic daily rainfall data, the Fortran 90 program did not detect the leap years that occurred in the 26 years of rainfall records analyzed. Every year in the 26-year record was instead assumed to be a leap year of 366 days, so the program produced some additional generated values. Because their number was small, the effect of these additional values was not significant and could be neglected.
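The size of this effect can be checked with a short Python sketch. The 26-year span used here (1974 to 1999) is a hypothetical choice for illustration, since the exact years analyzed are not stated; `calendar.isleap` is the standard-library leap-year test.

```python
import calendar


def days_in_year(year: int) -> int:
    """Return 366 for a leap year and 365 otherwise."""
    return 366 if calendar.isleap(year) else 365


# Hypothetical 26-year span, for illustration only.
years = range(1974, 2000)
actual_days = sum(days_in_year(y) for y in years)  # true calendar length
padded_days = 366 * len(years)                     # every year as a leap year
extra_days = padded_days - actual_days             # surplus generated values
```

For this span the surplus is 20 values out of 9516, about 0.2 percent of the generated series, which is consistent with treating the effect as negligible.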

CONCLUSION
The synthetic daily rainfall data forecast by the Lag-one Markov Chain model approximated the statistical properties of the available historical record. The model generated reasonable values of daily rainfall, monthly and annual means, daily maxima, monthly maxima and minima, and the lengths of sequences of dry and wet days. Thus, the proposed model was able to give a quick analysis of daily rainfall data in stochastic hydrology.

APPLICATION AND RECOMMENDATION FOR FURTHER STUDY
Daily rainfall data analysis using the proposed method could be useful for agricultural planning in Malaysia, as the method enables planners to regionalize the characteristics of wet and dry days. The proposed method could also potentially be applied in an early warning system for probable natural disasters. The model could also be tested on other climatic data such as temperature and wind.

Figure 3. The mean comparison of recorded and generated data.

Figure 4. The standard deviation comparison of recorded and generated data.

Figure 5. The skew comparison of recorded and generated data.

Table 1. Location, station number, period of data recorded and type of recording methods at rainfall stations in the Gombak river catchment area.

Table 2. Type and capability of rainfall gauges available in the study area.

Table 3. Comparison of statistics of recorded and generated data.