Spatial and Multidimensional Visualization using SOVAT
SOVAT (Spatial OLAP Visualization and Analytical Tool) is a novel decision-support system developed at the University of Pittsburgh [4]. SOVAT combines two key technologies, On-Line Analytical Processing (OLAP) and a Geographic Information System (GIS), to provide advanced visualization and analysis of large multidimensional data sets. OLAP technology supports multidimensional data modeling, which allows rapid queries of multidimensional data and enables analysis and discovery through visual displays on easy-to-use graphical user interfaces. With OLAP, data are represented conceptually as a multidimensional cube, which lets the user view different dimensions of multiple data sets and query several dimensions at once. OLAP supports several distinct functions for data retrieval and analysis, such as drill-up (decreasing granularity, for example, from data by province to data by country), drill-down (increasing granularity, for example, from country to province), and slice and dice (retrieving a sub-section of data, for example, data for May and June for only one province). All of these functions act on the multidimensional data cube and are performed almost instantaneously.
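The OLAP functions named above can be illustrated with a minimal Python sketch. It is not SOVAT's implementation, which is not described here; the fact rows, the case counts, and the helper names aggregate and slice_dice are hypothetical and chosen only to show how drill-up, drill-down, and slice and dice relate to one another.

```python
from collections import defaultdict

# Hypothetical fact rows: (province, county, month, cases).
# All values are invented for illustration.
FACTS = [
    ("Papua", "Jayapura", "May", 12),
    ("Papua", "Jayapura", "June", 8),
    ("Papua", "Merauke", "May", 5),
    ("Bali", "Badung", "May", 3),
    ("Bali", "Badung", "July", 4),
]

def aggregate(facts, level):
    """Sum cases at one geographic level: 'country', 'province', or 'county'."""
    totals = defaultdict(int)
    for province, county, _month, cases in facts:
        key = {
            "country": "Indonesia",
            "province": province,
            "county": (province, county),
        }[level]
        totals[key] += cases
    return dict(totals)

def slice_dice(facts, provinces=None, months=None):
    """Keep only the rows whose province and month are in the requested sets."""
    return [
        row for row in facts
        if (provinces is None or row[0] in provinces)
        and (months is None or row[2] in months)
    ]

by_province = aggregate(FACTS, "province")   # starting view
by_country = aggregate(FACTS, "country")     # drill-up: coarser granularity
by_county = aggregate(FACTS, "county")       # drill-down: finer granularity
papua_may_june = slice_dice(FACTS, {"Papua"}, {"May", "June"})
```

In this sketch, drill-up and drill-down are the same aggregation run at a different level of the geographic hierarchy, while slice and dice is a filter applied before aggregation. A real OLAP engine typically precomputes such aggregates, which is what makes the operations nearly instantaneous.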
The architecture of SOVAT is shown in Figure 9. It consists of a back-end engine and a front-end application. The back-end engine performs data warehouse (OLAP) functions to process the numerical data and GIS functions to perform spatial analysis. An integration module in the back-end engine combines the outcomes of the OLAP and GIS functions. SOVAT is also equipped with a linkage module capable of integrating an array of health-related databases (inpatient and outpatient hospitalizations; cancer, birth, and death registries) and socio-economic data sets. Finally, the front-end application fetches the outcomes of the integration module and visualizes them for end users. The current version of SOVAT is a desktop application that runs either on a stand-alone PC or in a client-server environment (the SOVAT interface runs on the desktop and the database engine runs on the server, hence the terms "front-end application" and "back-end engine"). In its current version, SOVAT is not designed to run over the Internet.
Through an easy-to-use point-and-click interface, even a novice user is able to conduct complex queries quickly using SOVAT. SOVAT has been designed to present health information in a way that facilitates planning efforts among community stakeholders with diverse interests. The two main components of the SOVAT interface are the navigation component and the visualization component [7], as displayed in Figure 10. The navigation component (on the left side) displays cube dimensions and their members as hierarchical trees. These trees make it simple for users to browse and select the dimension members to include in their queries. A search engine is provided as a complementary feature to help users find a member in a dimension with many members. The visualization component (on the right side) consists of spreadsheets, charts, and maps. SOVAT displays query results in a spreadsheet, translates the spreadsheet data into charts, and visualizes the results on maps. The advantages of offering different views include the ability to recognize trends easily using charts and to recognize geographic patterns using maps. The map in SOVAT is a true vector GIS layer (not a bitmap image) that can be used to query an area by clicking on the map, drill down to lower geographic levels, drill up to upper levels, and query neighbouring areas. SOVAT can also export query results for use in other applications. For example, users can export the spreadsheets to Excel format and save the grid or map as an image. In addition, queries can be saved for later use.
Data Sets
The Indonesian data sets comprise demographic data, health indicators, and spatial data (maps). The data sets come in different levels of detail and were collected using different collection methods. Indonesia is divided into provinces. Provinces consist of regencies (kabupaten) and cities (kota), which together are called counties in this paper. One level below counties is the sub-district (kecamatan), and the lowest administrative level, one below the sub-district, is the village (desa). The definition of village applies to both rural and urban areas. Political changes in Indonesia over the last decade have affected the administrative division, with a tendency toward a growing number of provinces and regencies. The latest data from the Ministry of Internal Affairs show that Indonesia currently has 33 provinces and 445 counties [8]. More than 75 new counties have been created since the year 2000, an increase of more than 20%.
Statistical Data: Census and Village Statistics
Data sets from the Indonesian Census and the Indonesian village statistics were used in the case studies. As in many countries, a population census in Indonesia is conducted every decade. Indonesia conducts a series of population, agricultural, and economic censuses. The population census takes place in years ending in "0", the agricultural census in years ending in "3", and the economic census in years ending in "6" [9]. Of these, the population census is the most comprehensive and is aimed at gathering characteristics of the Indonesian population such as gender, age, marital status, education level, and occupation. The 2000 Population Census is the latest census and the first conducted using complete enumeration. Because the 2000 Census was aimed at providing users with small-area statistics, village-level statistics can be derived from the data collected. In addition to the censuses, the Indonesian Bureau of Statistics conducts an intercensal population survey (SUPAS) between two censuses [10]. The survey is designed to collect population statistics comparable to those of the population census. Another approach to data collection, rather than collecting data on each household and individual, is to collect statistics on villages [11, 12]. Village statistics, called Potensi Desa (PODES), are the main data source of this project. They provide information that is otherwise not available. Among the objectives of village-level data collection are:
• Providing information on potential and actual development in the village by describing socio-economic conditions and available facilities;
• Providing a database for regional planning as well as a progress report on development at the village level; and,
• Providing core data for small-area statistics.
The information in the village-level collection includes population and household counts, housing and environmental data, education and health-related data, socio-cultural information, recreation and sport facilities, transportation, and communication.
While demographic data come mostly from the Bureau of Statistics (BPS), more specific health indicators are available from the Ministry of Health [13]. These data include a general mortality rate, an infant mortality rate, life expectancy, top diagnoses for in-patients and out-patients, and morbidity of infectious diseases. Although these data are not used in this project, they are a potential source for future work.
Spatial Data
SOVAT uses spatial data in polygon format consisting of administrative-boundary maps. The map is rendered using color schemes that display the results of OLAP queries. For example, in Figure 2, the darker the color, the higher the value returned by the query. In addition to the polygon data, SOVAT can hold additional layers of line and point data, for example to represent rivers, streets, cities, or industrial sites. The additional layers can be used to perform other spatial analyses such as buffering.
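The color rendering described above can be sketched as a simple classification step. The example below is an assumption for illustration: the source does not state which classification scheme SOVAT uses, so an equal-interval scheme is swapped in, and the function shade_classes, the palette, and the region values are all hypothetical.

```python
def shade_classes(values, n_classes=4):
    """Assign each region a class from 0 to n_classes - 1 using equal intervals.

    A higher class index means a darker color on the map.
    """
    lo, hi = min(values.values()), max(values.values())
    # Fall back to a width of 1 when every region has the same value.
    width = (hi - lo) / n_classes or 1
    return {
        region: min(int((v - lo) / width), n_classes - 1)
        for region, v in values.items()
    }

# Hypothetical light-to-dark palette (hex colors chosen for illustration only).
PALETTE = ["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]

# Hypothetical query results keyed by region identifier.
results = {"A": 10, "B": 40, "C": 70, "D": 100}
classes = shade_classes(results)
colors = {region: PALETTE[c] for region, c in classes.items()}
```

Equal intervals are easy to explain to non-specialist users, but quantile classes may give a more balanced map when the values are strongly skewed.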
The digital map of Indonesia is provided by the BPS. The existing spatial data cover four administrative levels, from the province level down to the village level. However, because of the low accuracy of the village-level spatial data, we chose to use the level immediately above the village, that is, the sub-district (kecamatan) level.
Data Linkage and Multidimensional Modeling
Geographic location was used as the primary linkage variable connecting the statistical and spatial data sets. The linking is done using an administrative code that is uniquely defined for every administrative unit. Standardization is the key to the linking process. Most developed countries have a uniform identifier for every geographic entity that can be used by both the government and the private sector. For example, the United States has the Federal Information Processing Standards (FIPS) codes, a standardized code for every geographic entity in the US issued by the National Institute of Standards and Technology (NIST). These codes are used by the US Census Bureau and other government agencies that generate statistical data sets.
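Linking by administrative code amounts to a key-based join. The following is a minimal sketch under stated assumptions: the six-digit codes, the placeholder polygon labels, and the function link_by_code are hypothetical, and real map layers would hold geometries rather than strings.

```python
# Hypothetical sub-district codes and values, for illustration only.
map_units = {
    "910101": "polygon A",
    "910102": "polygon B",
    "910103": "polygon C",
}
statistics = {
    "910101": {"population": 12000},
    "910102": {"population": 8500},
    "910199": {"population": 300},
}

def link_by_code(map_units, statistics):
    """Join map polygons to statistics on the shared administrative code.

    Returns the linked records plus the codes found on only one side,
    so that mismatches can be reviewed instead of silently dropped.
    """
    shared = map_units.keys() & statistics.keys()
    linked = {code: (map_units[code], statistics[code]) for code in shared}
    missing_stats = sorted(map_units.keys() - statistics.keys())
    missing_geometry = sorted(statistics.keys() - map_units.keys())
    return linked, missing_stats, missing_geometry

linked, missing_stats, missing_geometry = link_by_code(map_units, statistics)
```

Reporting the unmatched codes on each side is a deliberate choice here: a unit with no statistics would appear blank on the map, and a statistic with no polygon would be invisible, so both cases are worth surfacing.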
Unfortunately, there is no such uniform identification standard for geographic entities in Indonesia. The lack of a uniform code leads to the use of geographic names (such as the names of counties and villages) as the key identifiers of geographic entities. Geographic names are very susceptible to typographical errors and inconsistent spelling. As a result, the same geographic entity can be written differently in different reports, even when the reports come from the same government institution (the Bureau of Statistics). Several solutions were tried for this problem. Some of the data were corrected using a pattern-matching approach, while the remainder, which could not be recognized by pattern matching, were corrected manually.
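The source does not specify which pattern-matching method was used. As one possible approach, the sketch below uses string similarity from Python's standard difflib module; the normalization rules, the 0.85 cutoff, the function names, and the misspelled sample names are assumptions for illustration. Names that fall below the cutoff are returned as None, which corresponds to the remainder left for manual correction.

```python
import difflib

def normalize(name):
    """Upper-case the name and collapse punctuation and whitespace.

    These normalization rules are an assumption, not the authors' method.
    """
    return " ".join(name.upper().replace(".", " ").replace("-", " ").split())

def match_names(raw_names, reference_names, cutoff=0.85):
    """Map each raw name to its closest reference name, or None if unclear."""
    ref_by_norm = {normalize(r): r for r in reference_names}
    matched = {}
    for raw in raw_names:
        hits = difflib.get_close_matches(
            normalize(raw), list(ref_by_norm), n=1, cutoff=cutoff
        )
        matched[raw] = ref_by_norm[hits[0]] if hits else None
    return matched

reference = ["Jayapura", "Merauke", "Badung"]
raw = ["JAYAPURRA", "Merauke ", "Kab. Xyz"]   # invented misspellings
result = match_names(raw, reference)
```

A high cutoff keeps false matches rare at the cost of sending more names to manual review, which seems the safer trade-off when a wrong match would attach statistics to the wrong area.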
The need for a uniform code is even greater in light of the rapid changes in administrative boundaries over the past decade. The problem with spatial data is more complex because spatial data are updated more slowly than the administrative divisions change. While the administrative codes are easily updated to reflect recent changes, the map remains outdated. For example, the administrative boundaries reflect changes up to 2007, whereas the latest version of the map we use is from 2000. Since no official new map has been released, there is no option other than to translate the data onto the older map.
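One way to translate newer data onto an older map is a crosswalk table that maps each current administrative code to the unit that contained it on the older map. The sketch below assumes such a table exists; the codes, the counts, and the function to_old_map are hypothetical, and the source does not describe the authors' actual procedure.

```python
from collections import defaultdict

# Hypothetical crosswalk: current administrative code -> code on the 2000 map.
CROSSWALK = {
    "9101": "9101",   # unit unchanged since 2000
    "9120": "9101",   # new unit split off from 9101 after 2000
    "9102": "9102",
}

def to_old_map(counts_by_new_code, crosswalk):
    """Re-aggregate additive counts onto the units of the older map.

    Codes missing from the crosswalk are returned separately for review.
    """
    old_totals = defaultdict(int)
    unmapped = []
    for code, count in counts_by_new_code.items():
        old_code = crosswalk.get(code)
        if old_code is None:
            unmapped.append(code)
        else:
            old_totals[old_code] += count
    return dict(old_totals), unmapped

# Invented counts keyed by current codes; "9999" has no crosswalk entry.
totals, unmapped = to_old_map(
    {"9101": 100, "9120": 40, "9102": 70, "9999": 5}, CROSSWALK
)
```

Summation is only valid for additive measures such as population or case counts. Rates and averages would need to be recomputed from their numerators and denominators after re-aggregation.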
As shown in Figure 11, the geographic unit provides the linkage between the spatial data and the other numerical data sets. A multidimensional database was designed to support multidimensional analysis. A "measure" in this model is a statistical value for a geographic unit, such as the population of a sub-district. A "dimension" is an independent variable that allows us to view the information from different angles. For example, we can view the number of incidences by disease type, urban-rural designation, or time. We can also simultaneously "slice" and "dice" the information using all available dimensions (for example, the number of malaria incidences in rural areas of the province of Papua in the year 2000).
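The distinction between measures and dimensions can be made concrete with a small fact-table sketch. This is an illustrative model only, not the schema shown in Figure 11: the Fact record, its field names, the unit codes, and the incidence values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    unit_code: str    # geographic unit; also the key into the map layer
    province: str     # dimension
    area_type: str    # dimension: "urban" or "rural"
    disease: str      # dimension
    year: int         # dimension
    incidences: int   # the measure

# Invented facts for illustration.
FACTS = [
    Fact("910101", "Papua", "rural", "malaria", 2000, 30),
    Fact("910101", "Papua", "rural", "malaria", 2001, 25),
    Fact("910102", "Papua", "urban", "malaria", 2000, 10),
    Fact("910103", "Papua", "rural", "malaria", 2000, 12),
    Fact("510301", "Bali", "rural", "dengue", 2000, 7),
]

def query(facts, **dimension_filters):
    """Sum the measure per geographic unit for facts matching every filter."""
    totals = defaultdict(int)
    for fact in facts:
        if all(getattr(fact, dim) == value
               for dim, value in dimension_filters.items()):
            totals[fact.unit_code] += fact.incidences
    return dict(totals)

# Malaria incidences in rural areas of Papua in the year 2000.
rural_papua_malaria_2000 = query(
    FACTS, disease="malaria", area_type="rural", province="Papua", year=2000
)
```

Because the result is keyed by the geographic unit code, it can be joined directly to the map polygons, which is the linkage role the geographic unit plays in the model.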