doc/concepts.html

   1 <HTML>
   2 <HEAD>
   3 <TITLE>
   4 f(GIS) concepts
   5 </TITLE>
   6 </HEAD>
   7 <BODY>
   8 <H1><FONT SIZE=+3><I>f(</I></FONT><FONT SIZE=+2><B>GIS</B></FONT><FONT SIZE=+3><I>)</I></FONT>
   9     concepts</H1>
  10
  11 <!-- CONTENTS NUMBERED NESTED --><OL>
  12 <LI><A HREF="#toc_section0">Data model</A>
  13 <OL>
  14 <LI><A HREF="#toc_section1">Functional model</A>
  15 <LI><A HREF="#toc_section2">Layer classification</A>
  16 <LI><A HREF="#toc_section3">Implementation of data model</A>
  17 <LI><A HREF="#toc_section4">Regions and cartographic projection</A>
  18 </OL>
  19 <LI><A HREF="#toc_section5">Program design</A>
  20 <OL>
  21 <LI><A HREF="#toc_section6">Layer as Tcl object</A>
  22 <LI><A HREF="#toc_section7">Planchet - object for displaying maps</A>
  23 <LI><A HREF="#toc_section8">Low level objects</A>
  24 <LI><A HREF="#toc_section9">GIS operation</A>
  25 <LI><A HREF="#toc_section10">Utilities</A>
  26 <LI><A HREF="#toc_section11">Data access library</A>
  27 </OL>
  28 </OL>
  29 <!-- END CONTENT -->
  30
  31
  32
  33 <A NAME="toc_section0"></A><H3>Data model</H3>
  34
  35 A GIS is a software system for processing spatial data, so an adequate model
  36 of spatial phenomena is the most important thing for a GIS.
  37 <P>
  38 It should provide a way to represent spatial phenomena in computer memory,
  39 allow the desired operations to be performed on this representation, and let
  40 the user see the results in a familiar form. Ideally, a GIS should hide the
  41 complicated issues of internal data storage from the user, just as a text
  42 processor hides questions of font rendering and kerning, or an SQL database
  43 hides the actual file layout and search technologies, providing simple
  44 but powerful relational operations instead.
  45 <P>
  46 Many modern
  47 GIS systems, especially vector-based ones like ARC/Info, try to
  48 represent a map of spatial phenomena rather than the phenomena
  49 themselves. This overcomplicates storage formats and processing
  50 algorithms, and makes the user worry about technical details such as polygon
  51 topology, which are completely irrelevant to the problem at hand (say, geology
  52 or soil science), just as font rendering hints and kerning are irrelevant to
  53 the contents of an article typeset in some particular font. Maps are a
  54 widely used tool for analysing spatial data, but no more than a tool.
  55 A GIS has to deal with them, because it is necessary to use
  56 existing data, which are represented on maps, and to present results to
  57 the user in the understandable form of maps. But while processing data we should
  58 take into account the properties of the actual phenomena, rather than properties
  59 of a cartographic representation such as polygons.
  60 <P>
  61 <A NAME="toc_section1"></A><H4>Functional model</H4>
  62
  63 In f(GIS) we use the term <I>layer</I> to denote a computer representation of
  64 a spatial phenomenon. We define a layer as a function which maps geographical
  65 coordinates to the value of some property. The closest analogue of our
  66 <I>layer</I> is the <I>spatial variable</I> of geostatistics.
  67 <P>
  68 Layer values can be either real numbers or elements of some finite set.
  69 If you want to study a more complicated spatial phenomenon, it is better
  70 to describe it as a set of layers rather than as a single layer with
  71 structured values. Obviously you will not need the values of all the attributes
  72 in question for every calculation, and keeping them separate makes your
  73 actions clearer.
  74 <P>
  75 Because layers are defined as functions, it is theoretically possible to
  76 apply the well-developed mathematical apparatus of functional analysis to
  77 them.
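<P>
A minimal sketch of the functional view, written in Python purely for illustration (it is not fGIS code). The layers <I>relief</I> and <I>soil</I>, their values, and the helper <I>combine</I> are hypothetical; the sketch only assumes that a layer is a function of coordinates and that offsite points can be marked with None.

```python
from typing import Callable, Optional, Union

# A layer maps geographic coordinates (x, y) to a value; None marks offsite.
Value = Union[float, str, bool, None]
Layer = Callable[[float, float], Value]


def relief(x: float, y: float) -> Optional[float]:
    """Hypothetical numeric layer: a tilted plane on a 100 x 100 study area."""
    if not (0 <= x <= 100 and 0 <= y <= 100):
        return None  # offsite
    return 200.0 + 0.5 * x + 0.25 * y


def soil(x: float, y: float) -> Optional[str]:
    """Hypothetical classification layer with string values."""
    if not (0 <= x <= 100 and 0 <= y <= 100):
        return None  # offsite
    return "chernozem" if x < 50 else "podzol"


def combine(op: Callable[..., Value], *layers: Layer) -> Layer:
    """Build a new layer by applying op pointwise to the given layers.

    A point that is offsite in any input layer is offsite in the result.
    """
    def result(x: float, y: float) -> Value:
        values = [layer(x, y) for layer in layers]
        if any(v is None for v in values):
            return None
        return op(*values)
    return result


# A derived layer: True where chernozem lies above 210 m.
high_chernozem = combine(lambda h, s: s == "chernozem" and h > 210.0,
                         relief, soil)
```

Because the derived layer is again a function of coordinates, it can be fed to <I>combine</I> like any other layer.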
  78
  79 <A NAME=layerclass></A>
  80 <A NAME="toc_section2"></A><H4>Layer classification</H4>
  81 Layers can be classified by their area of definition and their set of
  82 values. By area of definition we can distinguish between:
  83
  84 <DL>
  85 <DT> Two-dimensional layers
  86 <DD> which are defined on some continuous area.
  87   This is the most frequently used type of layer in physical geography.
  88   Relief and soil type are perfect examples of such layers. The area of
  89   definition of a two-dimensional layer is usually finite, limited by the
  90   boundaries of the study area or by the availability of data. Areas
  91   outside the area of definition are called <I>offsite</I> areas.
  92 <DT> One-dimensional layers
  93 <DD>  are defined on a set of lines within the study area. Examples of such
  94    layers are hydrography and railroad networks.
  95 <DT> Zero-dimensional layers
  96 <DD> are defined on a set of separate points. These layers can be used
  97   to store information about sampling points or weather station
  98 networks.
  99 </DL>
 100
 101 By their set of values, layers can be classified into:
 102 <DL>
 103 <DT>Numeric layers
 104 <DD> whose values belong to some continuous interval of the number line,
 105 for example relief layers, which can take any value between the lowest and
 106 highest altitude in the study area.
 107 <DT>Classification layers
 108 <DD>which have a finite set of values. f(GIS) allows arbitrary
 109 strings to be used as elements of such a set. A soil map whose values are
 110 the names of soil series is an example.
 111 </DL>
 112
 113 This simple classification covers all theoretically important types of
 114 layers. When dealing with implementation we will have to classify layers
 115 further, for example according to the source of thematic data. But for
 116 data analysis it does not matter whether data are stored in a disk
 117 file or come from some data acquisition system on the fly. It is only
 118 important to know the type of values and whether they are defined at
 119 every point of the study area or not.
 120
 121 <A NAME="toc_section3"></A><H4>Implementation of data model</H4>
 122
 123 Spatial phenomena can seldom be expressed by a mathematical equation.
 124 Even when they can, finding this equation is usually the aim of the analysis,
 125 not its starting point. So we need to store the values of a layer at every
 126 point where it is defined. A raster is the natural way to store data for
 127 two-dimensional layers.<P>
 128 <FONT SIZE=-2> (A raster is just a big matrix of numeric values, stored
 129 in a special format to reduce storage space. For a raster to be used in GIS
 130 processing, it must be known how to find row and column numbers given
 131 real-world coordinates and vice versa.)</FONT>
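<P>
The conversion between real-world coordinates and row and column numbers can be sketched as follows. This is an illustration in Python, not the EPPL7 or fGIS georeferencing code: the class <I>RasterGrid</I> and its fields are hypothetical, and the sketch assumes a north-up raster with square cells whose origin is the upper-left corner.

```python
import math


class RasterGrid:
    """Georeferencing of a north-up raster with square cells (assumed layout)."""

    def __init__(self, x_min, y_max, cell_size, rows, cols):
        self.x_min = x_min          # real-world x of the left edge
        self.y_max = y_max          # real-world y of the top edge
        self.cell_size = cell_size  # cell side in real-world units
        self.rows = rows
        self.cols = cols

    def cell_at(self, x, y):
        """Return (row, col) of the cell containing (x, y), or None if outside."""
        col = math.floor((x - self.x_min) / self.cell_size)
        row = math.floor((self.y_max - y) / self.cell_size)
        if 0 <= row < self.rows and 0 <= col < self.cols:
            return row, col
        return None

    def cell_center(self, row, col):
        """Return the real-world coordinates of the centre of a cell."""
        x = self.x_min + (col + 0.5) * self.cell_size
        y = self.y_max - (row + 0.5) * self.cell_size
        return x, y


# A 4 x 6 grid of 500 m cells whose upper-left corner is at (1000, 5000).
grid = RasterGrid(1000.0, 5000.0, 500.0, rows=4, cols=6)
```

Rows are counted downwards from the top edge, which is why the y coordinate is subtracted from <I>y_max</I>.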
 132 <P>
 133 f(GIS) uses the raster data format developed for the EPPL7 GIS. This
 134 format has several advantages: it is compressed yet allows random
 135 access, and it is able to deal with very fine
 136 resolution. For example, a landscape map of the former USSR with a spatial resolution
 137 (raster cell size) of 500 m and more than 3000 distinct kinds of landscape
 138 occupies about 9 MB of disk space. Given these properties of the data
 139 format, it is advisable to work with a raster cell size significantly smaller
 140 than the known accuracy of the data. The resolution of maps can match the
 141 resolution of your scanner and printer; modern processors are powerful
 142 enough to handle it, so using a raster does not mean loss of precision.
 143 <P>
 144 This data format can hold values in the range 0..65535. While that is
 145 always sufficient for classification layers, it may seem that for
 146 numeric layers it would be better to use real numbers. But data always have
 147 finite accuracy, which is usually coarser than 1/65535 of the total range,
 148 and even if we can take measurements with greater precision, we should
 149 take into account spatial variability within one raster cell.
 150 <P>
 151 For example, if we have a relief map of Russia with a 500 meter cell,
 152 we need to represent the range from -28 (Caspian coast) to 5642 (Elbrus)
 153 meters above sea level. Thus the smallest usable unit is about 10 cm.
 154 The altitude of some points may be measured more accurately (for example,
 155 triangulation points), but each raster cell represents a 500x500 meter
 156 square, which will always have more than 10 cm of variability.
 157 Even if the value of our layer needs more precision in some part
 158 of its range, we can use a non-linear (for instance logarithmic) mapping
 159 of raster cell values to layer values.
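<P>
The arithmetic behind this example can be checked with a short sketch. It assumes a plain linear mapping of the altitude range onto the codes 0..65535; the helper <I>make_linear_mapping</I> and the names <I>encode</I> and <I>decode</I> are hypothetical, not part of fGIS. With the figures above the step comes out at roughly 8.7 cm, which the text rounds to about 10 cm.

```python
MAX_CODE = 65535  # largest value a raster cell can hold


def make_linear_mapping(low, high):
    """Return (step, encode, decode) for a linear mapping of [low, high]."""
    step = (high - low) / MAX_CODE  # layer units per raster code

    def encode(value):
        """Nearest raster code for a layer value, clamped to 0..MAX_CODE."""
        code = round((value - low) / step)
        return min(max(code, 0), MAX_CODE)

    def decode(code):
        """Layer value represented by a raster code."""
        return low + code * step

    return step, encode, decode


# Relief of Russia: from the Caspian coast (-28 m) to Elbrus (5642 m).
step, encode, decode = make_linear_mapping(-28.0, 5642.0)
```

A logarithmic mapping would replace only the bodies of <I>encode</I> and <I>decode</I>; the raster itself would still hold integer codes.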
 160 <P>
 161 But even with compression, raster files occupy significant storage
 162 space, so we should avoid duplicating them where possible. For this we
 163 introduce the concept of <I>reclass tables</I>. A reclass table maps the values
 164 of raster cells to another set of integers in arbitrary order. Do not confuse a
 165 reclass table with the mapping function which is used to convert raster
 166 cell values to the real units of a numeric layer. For example, if we have
 167 statistical data on population by county and want to present
 168 them as a map, we can use a reclass table over the county map. Several counties
 169 with different names, which have distinct values in the county map raster,
 170 can be mapped to the same class in the population density map if their population
 171 density is the same.
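<P>
The county example can be sketched like this. The raster, the county codes and the density figures are invented for illustration, and the helpers <I>build_reclass_table</I> and <I>reclassified</I> are hypothetical names; the point is only that the new map is obtained by reading the existing county raster through a table rather than by storing a second raster.

```python
# County raster: each cell holds a county code (hypothetical data).
county_raster = [
    [1, 1, 2],
    [3, 3, 2],
    [3, 4, 4],
]

# Hypothetical population density per county code (people per square km).
density = {1: 12.0, 2: 85.0, 3: 12.0, 4: 240.0}


def build_reclass_table(values_by_code):
    """Give each distinct value one class number, in ascending order of value."""
    distinct = sorted(set(values_by_code.values()))
    classes = {value: number for number, value in enumerate(distinct, start=1)}
    return {code: classes[value] for code, value in values_by_code.items()}


def reclassified(raster, table, offsite=0):
    """Yield rows of the raster as seen through the reclass table."""
    for row in raster:
        yield [table.get(code, offsite) for code in row]


table = build_reclass_table(density)
density_classes = list(reclassified(county_raster, table))
```

Counties 1 and 3 have different codes in the county raster but the same density, so they end up in the same class.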
 172 <P>
 173 A point layer is just a list of triplets &lt; X, Y, Value &gt;.  Typically a
 174 point layer contains no more than a few thousand points, so there
 175 is no need to optimize performance or storage space.
 176 <P>
 177 The natural storage form for a one-dimensional layer is a vector format.
 178 This is the most questionable area of the current fGIS design. The EPPL7
 179 vector format has many advantages (compactness, speed of processing),
 180 but it has one drawback which outweighs them all: it can
 181 associate only one value with a whole vector object (polyline). But
 182 if we are talking about a function defined on a set of lines, we
 183 should be prepared for this function (stream depth, for instance) to
 184 vary from one end of a line to the other.
 185 <P>
 186 It is also an open question how intersections and junctions of lines should
 187 be stored and interpreted, because the most interesting network analysis
 188 algorithms require the ability to pass through junctions and intersections.
 189 <P>
 190
 191 <A NAME="toc_section4"></A><H4>Regions and cartographic projection</H4>
 192
 193 A study area usually has a hierarchical structure. For example, Russia
 194 can be subdivided into administrative regions, which consist of
 195 districts. The United States consists of states, which are divided into
 196 counties. Often a study is concerned with only one of these hierarchy
 197 levels, but there are examples to the contrary.
 198 <P>
 199 Each hierarchy level has its typical data accuracy (a rough
 200 equivalent of map scale in the GIS world, because GIS maps can be
 201 arbitrarily scaled, but only a certain range of scales makes sense for
 202 a particular data accuracy) and cartographic projection (especially
 203 significant for large areas such as a whole country or continent).
 204 On thematic maps such as soils or vegetation, different classifications
 205 may be used at different scales.
 206 <P>
 207 So f(GIS) uses the concept of <I>regions</I>. A region is a set of layers
 208 which cover almost the same territory and have exactly the same projection and
 209 similar spatial resolution. Regions can be nested, i.e. the region of
 210 Russia can have subregions for administrative regions, which in turn
 211 have subregions for districts, and so on. In this case there should be a <i>base
 212 layer</i>
 213 which has subregion names as values. When copying data between regions,
 214 f(GIS) automatically performs the necessary projection and resolution
 215  conversion using the base layer as reference. Classification conversion,
 216 if necessary, must be performed by the user, because it requires
 217 knowledge of the problem area.
 218
 219 <A NAME="toc_section5"></A><H3>Program design</H3>
 220
 221 f(GIS) is designed as a set of extensions to the Tcl programming language
 222 and a set of independent utilities, which perform the most time-consuming
 223 raster and vector processing tasks. Thus long operations can be launched
 224 in the background as separate processes while the user continues to view and
 225 analyze data in the main program.
 226 <P>
 227 From the user's point of view, fGIS is a Tcl application which allows
 228 operating on a set of layers from the GUI as well as from the Tcl command line.
 229 It is an essential design constraint that there should be no operation
 230 which can be performed from the GUI but not from a Tcl script. There
 231 should be a way to automate everything. The converse is ensured by
 232 the very nature of Tcl: nothing prevents a user who has direct access to
 233 the Tcl interpreter from creating a new button or menu item and binding any
 234 Tcl command to it.
 235 <P>
 236 From the programmer's point of view, fGIS consists of several abstraction
 237 levels, all available for extension and modification. And I think that
 238 every fGIS user can eventually become a programmer on discovering a need
 239 to implement some newly invented data analysis algorithm, or to customize
 240 the graphical user interface. The relationship between the fGIS
 241 abstraction levels is shown in this figure.
 242 <P>
 243 <IMG SRC=levels.gif ALIGN=center>
 244 <P>
 245 <A NAME="toc_section6"></A><H4>Layer as Tcl object</H4>
 246 Layers in fGIS behave like objects in an object-oriented programming
 247 language. Once created with the <B>layer</b> command they become Tcl
 248 commands themselves (i.e. the name of a layer can be used as a Tcl command),
 249 just like Tk widgets. Options of the layer command allow manipulating
 250 the properties of the layer and storing the layer definition in a file. This file
 251 is just a Tcl script which creates the necessary subobjects and invokes
 252 the appropriate command to create the layer.
 253 <P>A layer has the following properties:
 254 <DL>
 255 <DT>It can return a value given coordinates
 256 <DD> This is what the whole thing is about.
 257 <DT> It has one or more ways to draw itself
 258 <DD> A raster layer can be drawn in opaque colors, so that only the offsite area is
 259 transparent, or using transparent monochrome patterns, which allows
 260 one raster to be overlaid on another. In most existing raster GIS, like
 261 Idrisi, only vector or point layers can be overlaid on a raster.
 262 In f(GIS) <B>any</B>
 263 layer can be drawn as an overlay.
 264 <DT> It has an underlying data source
 265 <DD> The data source for a layer typically consists of some object which can
 266  return an integer value given coordinates (a raster file combined with a
 267 reclass table, for example) and a <i>legend table</i> or <i>map
 268 function</i> which maps values of the underlying raster object to
 269 thematically meaningful values.
 270 <DT> It has visualization parameters
 271 <DD> Visualization of a layer is controlled by several parameters such as the
 272 color palette, the pattern set, and a flag indicating whether boundaries between
 273 classes are drawn. All these parameters can be changed
 274 interactively.
 275 <DT> It has metadata
 276 <DD> Metadata for a layer typically include the layer title, the units in which
 277 its values are measured, spatial resolution and value precision.
 278 Cartographic projection is a property of the region rather than the layer.
 279 </DL>
 280
 281 Besides the layer types described <A HREF=#layerclass>above</A>, fGIS has an
 282 <I>object</I> layer type. A layer of this type can consist of any objects
 283 allowed in a Tk canvas (lines, arcs, polygons, images), with only one
 284 thematic value for each object. This type is primarily for annotation
 285 purposes, but can also be used as a substitute for vector layers while
 286 the latter are not yet developed.
 287 <P>
 288 <A NAME="toc_section7"></A><H4>Planchet - object for displaying maps</H4>
 289 Another type of object which is essential for the fGIS user is the
 290 <i>planchet</I>. It is a Tk widget similar to canvas (and actually derived from
 291 canvas) which has a cartographic projection and real-world coordinates.
 292 It is used for displaying layers and picking points on them. Because
 293 it has real-world coordinates and a physical size on the screen, it always
 294 knows its scale. When the scale is changed (via a zoom or window resize operation),
 295 all layers currently displayed on the planchet are redrawn appropriately.
 296 <P>
 297 The planchet also has a <i>look feature</I>. If the right mouse button is pressed
 298 on some point in the planchet, it displays the values of several layers at this
 299 point in a pop-up window.
 300 <P>
 301 There can also be &quot;friend widgets&quot;, such as a status line which
 302 displays current coordinates while the mouse is over the planchet, or zoom/unzoom
 303 buttons which change their state depending on the current state of the planchet.
 304 <P>
 305 <A NAME="toc_section8"></A><H4>Low level objects</H4>
 306
 307 There are additional objects such as rasters, palettes and pattern sets,
 308 but users seldom need to operate on them directly. They are primarily
 309 for developers of new layer types.
 310
 311 <A NAME="toc_section9"></A><H4>GIS operation</H4>
 312 GIS operations such as calculating buffer zones or computing a new layer
 313 from several existing ones are performed by separate <A
 314 HREF=#epu>utilities</a> running in the background. For user convenience
 315 there are Tcl procedures which take one or more layer names as arguments
 316 and call the appropriate utility.
 317 <P>
 318 An example of such a procedure is the interregion copy command, which takes
 319 a layer name and the name of the target region, determines the projections and calls
 320 the projection conversion program.
 321 <P>
 322 In some cases such procedures need to perform significant preprocessing
 323 of user-supplied arguments.
 324 <A NAME="toc_section10"></A><H4>Utilities</H4>
 325
 326 The GIS processing utilities are more general than fGIS. They use only
 327 data files and user-supplied arguments, so they can be used separately
 328 from fGIS, for example by users of the EPPL7 GIS. The utilities are designed
 329 for a batch environment, so they use exit codes to report status and
 330 stdin/stdout to receive and return values which do not fit on the command
 331 line. An important principle of these utilities is that the user should not have to worry
 332 about raster cell size. All utilities which operate on several raster
 333 files are able to deal with files of different cell sizes as long
 334 as their extents intersect in real-world coordinates.
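<P>
One way such a utility could combine rasters of different cell sizes is sketched below. This is an assumption made for illustration, not a description of how the fGIS utilities actually work: the class <I>Raster</I> and the function <I>overlay</I> are hypothetical, rasters are held in memory as nested lists, and each output cell simply takes the value of the input cell that contains its centre (nearest-neighbour sampling) at the finer of the two cell sizes.

```python
import math


class Raster:
    """In-memory north-up raster with square cells (hypothetical structure)."""

    def __init__(self, x_min, y_max, cell, data):
        self.x_min, self.y_max, self.cell, self.data = x_min, y_max, cell, data

    @property
    def x_max(self):
        return self.x_min + self.cell * len(self.data[0])

    @property
    def y_min(self):
        return self.y_max - self.cell * len(self.data)

    def value_at(self, x, y):
        """Value of the cell containing the real-world point (x, y)."""
        col = math.floor((x - self.x_min) / self.cell)
        row = math.floor((self.y_max - y) / self.cell)
        return self.data[row][col]


def overlay(a, b, op):
    """Combine two rasters over their common extent at the finer cell size."""
    cell = min(a.cell, b.cell)
    x_min, x_max = max(a.x_min, b.x_min), min(a.x_max, b.x_max)
    y_min, y_max = max(a.y_min, b.y_min), min(a.y_max, b.y_max)
    if x_min >= x_max or y_min >= y_max:
        raise ValueError("rasters do not intersect")
    rows = round((y_max - y_min) / cell)
    cols = round((x_max - x_min) / cell)
    data = []
    for r in range(rows):
        y = y_max - (r + 0.5) * cell          # centre of output row r
        line = []
        for c in range(cols):
            x = x_min + (c + 0.5) * cell      # centre of output column c
            line.append(op(a.value_at(x, y), b.value_at(x, y)))
        data.append(line)
    return Raster(x_min, y_max, cell, data)


coarse = Raster(0.0, 200.0, 100.0, [[1, 2], [3, 4]])      # 100 m cells
fine = Raster(100.0, 200.0, 50.0, [[10, 20], [30, 40]])   # 50 m cells
total = overlay(coarse, fine, lambda p, q: p + q)
```

The caller passes only the two rasters and the operation; the cell size of the result is derived from the inputs.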
 335
 336 <A NAME="toc_section11"></A><H4>Data access library</H4>
 337
 338 Both the low-level Tcl objects (rasters, vectors) and the utilities use a common
 339 C library to access data files. This library provides a suitably
 340 high-level framework for those who want to implement their own data analysis
 341  algorithms. For example, it includes iterator routines which take a
 342 user-written function and a raster file and apply this function
 343 to every cell of the given file. While the library operates primarily in terms
 344 of raster cells (which can be important for cellular automata
 345 algorithms, which need to distinguish between ``this cell'' and
 346 ``neighbouring cell''), it provides ways to process files with different
 347 cell sizes simultaneously.
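<P>
The iterator idea can be sketched as follows. The real library is written in C and its function names and signatures are not given here, so this Python version is only an analogy: <I>for_each_cell</I> and <I>histogram</I> are hypothetical names, and the raster is an in-memory list of rows rather than a data file.

```python
def for_each_cell(raster, visit):
    """Call visit(row, col, value) for every cell, row by row."""
    for row, line in enumerate(raster):
        for col, value in enumerate(line):
            visit(row, col, value)


def histogram(raster):
    """Count how many cells hold each value, using the iterator."""
    counts = {}

    def count(row, col, value):
        # The user-written function sees one cell at a time.
        counts[value] = counts.get(value, 0) + 1

    for_each_cell(raster, count)
    return counts


# Hypothetical 2 x 3 land-use raster.
landuse = [
    [1, 1, 2],
    [2, 2, 3],
]
cell_counts = histogram(landuse)
```

The analysis code never loops over rows and columns itself; it only supplies the function applied to each cell.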
 348
 349
 350 </BODY>
 351 </HTML>
 352
 353
 354
 355
 356
 357
 358
 359
 360