Lecture 3
GIS Data Models and Data Formats
EEOS 381 - Spatial
Databases and
Overview
GIS Data Models
Common GIS Data Formats
Overview
Key points:
– It is important to understand which model to use, based on the application
– The model determines which specific format you use
– The format may determine which types of analysis you can perform
Data Model
General definition:
– Abstraction or representation of objects and processes in the real world, incorporating properties relevant to the application at hand
GIS Data Model
Definition:
– Digital representation of geographic objects (spatial data) in GIS software
• includes relationships between objects and attributes of objects
• doesn’t include all of reality
GIS Data Models
The role of a data model in GIS
GIS Data Models
Levels of abstraction:
Reality: Real-world phenomena, e.g. wells, streets, lakes
Conceptual Model: Decide which objects are applicable, what relationships exist among them, what processes they participate in
Logical Model: List objects, with names, descriptions, behavior, interaction, location, what GIS will do
GIS Data Models
Example implementation:
Reality: Wells, dry cleaners, streets
Conceptual Model: Ask how pollution from dry cleaners and major roads affects public water supplies (wells and reservoirs)
Logical Model: Use ArcGIS to compare wells (points), reservoirs (polygons), dry cleaners (points) and streets (lines), with buffer and proximity operations; focus on wells with 100+ gallons per minute yield and major roads, in eastern Mass.
Physical Model: BUFFER shapefile WELLS_PT, join to GPM table on YIELD field; determine how many dry cleaners are within 1 mile of large wells and the proximity of reservoirs and wells to major roads; store in Oracle-based ArcSDE geodatabase
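The buffer-and-count step of the physical model can be sketched in plain Python, without ArcGIS. This is only an illustration: the well and dry-cleaner records are hypothetical, coordinates are assumed to be in a projected system measured in feet, and the function name is made up for this example.

```python
import math

MILE_FT = 5280.0  # buffer distance: 1 mile, in feet

# Hypothetical wells: (id, x, y, yield in gallons per minute)
wells = [("W1", 0.0, 0.0, 150), ("W2", 20000.0, 0.0, 40), ("W3", 0.0, 12000.0, 300)]
# Hypothetical dry cleaners: (id, x, y)
dry_cleaners = [("D1", 3000.0, 4000.0), ("D2", 100.0, 11000.0), ("D3", 20500.0, 0.0)]

def cleaners_near_large_wells(wells, cleaners, min_gpm=100, radius=MILE_FT):
    """Count cleaners inside the circular buffer of each high-yield well."""
    counts = {}
    for well_id, wx, wy, gpm in wells:
        if gpm < min_gpm:
            continue  # attribute filter: keep only large wells
        counts[well_id] = sum(
            1 for _, cx, cy in cleaners if math.hypot(cx - wx, cy - wy) <= radius
        )
    return counts

result = cleaners_near_large_wells(wells, dry_cleaners)
```

The attribute filter (yield) runs before the spatial test (distance), mirroring the "focus on wells with 100+ gallons per minute" step in the logical model.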
GIS Data Models – 2 Conceptual Views
Discrete objects
– World is empty except where occupied by objects with well-defined locations and/or boundaries
• e.g. wells, streets, lakes
Fields
– Measurements may be made at any location over a continuous surface
• e.g. elevation,
GIS Data Models
Raster is a data model
– space is divided into an array (rows and columns) of cells
– each cell (pixel, or picture element) in a layer is the same size and has a homogeneous value
• cell size refers to resolution (10 m, 1 foot, etc.)
– usually associated with the field view
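A minimal sketch of the raster idea, assuming a north-up grid with square cells: the grid values, the cell size, and the upper-left corner coordinates below are all hypothetical.

```python
# Hypothetical 3 x 4 elevation raster (rows x columns), values in meters
grid = [
    [12, 14, 15, 17],
    [11, 13, 16, 18],
    [10, 12, 14, 19],
]
CELL_SIZE = 10.0             # resolution: each cell is 10 m x 10 m
X_MIN, Y_MAX = 500.0, 830.0  # assumed map coordinates of the upper-left corner

def cell_value(x, y):
    """Return the value of the cell containing map point (x, y), or None if outside."""
    col = int((x - X_MIN) // CELL_SIZE)
    row = int((Y_MAX - y) // CELL_SIZE)  # rows count downward from the top edge
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None
```

Every point that falls in the same cell returns the same value, which is what "homogeneous value" means in practice.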
GIS Data Models
Raster - examples
GIS Data Models
Raster
– Cells may belong to zones (groups of cells with the same value, usually representing the same feature)
– Can include ‘NODATA’ (null) values (the cell is outside the range of the dataset, or no information is available for that cell)
– Some image formats can include attributes (value attribute table)
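Zones and NODATA can be illustrated with a small zonal-mean calculation. This is a sketch under stated assumptions: both rasters are hypothetical, and -9999 is only an assumed NODATA sentinel (real formats define their own).

```python
from collections import defaultdict

NODATA = -9999  # assumed sentinel value for "no information"

# Hypothetical zone raster (e.g. land-use codes) and value raster (e.g. elevation)
zones = [
    [1, 1, 2],
    [1, 2, 2],
    [3, 3, NODATA],
]
values = [
    [10, 12, 20],
    [14, 22, NODATA],
    [30, 34, 40],
]

def zonal_mean(zones, values):
    """Mean of `values` per zone, skipping cells that are NODATA in either layer."""
    sums, counts = defaultdict(float), defaultdict(int)
    for zone_row, value_row in zip(zones, values):
        for z, v in zip(zone_row, value_row):
            if z == NODATA or v == NODATA:
                continue
            sums[z] += v
            counts[z] += 1
    return {z: sums[z] / counts[z] for z in sums}

means = zonal_mean(zones, values)
```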
GIS Data Models
Raster
– Advantages:
• A simple data structure: a matrix of cells with values, each representing a coordinate, sometimes linked to an attribute table.
• A powerful format for intensive statistical and spatial analysis; overlays with complex data run faster than with vector data.
– “Spatial Analyst” extension in ArcGIS
• The ability to represent continuous surfaces and perform surface analysis.
• The ability to uniformly store points, lines, polygons, and surfaces.
GIS Data Models
Raster
– Disadvantages:
• Inherent spatial inaccuracies due to the cell-based feature representation, especially at low resolution.
GIS Data Models
Vector is a data model
– points - single coordinate values
– lines (arcs) - strings of connected points
– polygons (areas) - enclosed lines
– usually associated with discrete object view
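The three vector primitives can be sketched as plain coordinate lists. The coordinates are hypothetical, and the helper names `length` and `area` are chosen for this example only.

```python
import math

point = (3.0, 4.0)                            # a single coordinate pair
line = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]  # string of connected points
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]  # closed ring

def length(coords):
    """Sum of the straight-line distances between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

def area(ring):
    """Shoelace formula; expects the first point to be repeated as the last."""
    twice = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice) / 2.0
```

Here `length(line)` measures the line, and `area(polygon)` measures the area enclosed by the closed line that defines the polygon.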
GIS Data Models
Vector – the basics
POINT - location with a set of coordinates (0-D)
LINE - connected string of points (1-D)
POLYGON - area defined by a line (2-D)
[Figure: a line made of 2 line segments (a line segment is a direct line between two points)]
GIS Data Models
Vector (other
(topological junction, or endpoint of line)
(direct connection between two nodes)
(a link between two nodes, with one direction designated)
(sequence of line segments)
(directed sequence of nonintersecting line segments with nodes
(curve string)
(sequence of any line segments with closure)
(an area defined by an outer ring without inner rings)
(an area defined by an outer ring with inner rings)
GIS Data Models
Vector
– Advantages:
• Precise values
• Efficient storage
• Topological relationships
• High-quality cartographic output
• Useful for a variety of spatial analyses
GIS Data Models
Vector
– Disadvantages:
• Poor for storing continuous surfaces (e.g. elevation models)
• Overlay operations can be time-consuming and computer intensive
GIS Data Models
Vector
– Simple vs. Topologic features:
• Simple - a.k.a. “spaghetti model” - no inherent connectivity relationships
• Topologic - simple features with defined spatial relationships
[Figure: the same line work as Spaghetti (4 linear features) and as Topologic (14 linear features, 13 nodes); legend: Node, Line]
GIS Data Models
Spaghetti Data Model
– No details of logical relationships between objects
• The line shared by two adjacent polygons is stored separately (twice) in the computer
• Spatial relationships are only implied
– Efficient for cartographic display but not data storage
– At first, GIS used vector data and cartographic spaghetti structures
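A small sketch of why spatial relationships are "only implied" in the spaghetti model: two adjacent polygons (hypothetical unit squares) each store their own copy of the shared boundary, and the adjacency has to be discovered by comparing coordinates.

```python
# Two adjacent unit squares stored spaghetti-style: each ring is independent
poly_a = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
poly_b = [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)]

def segments(ring):
    """Set of undirected segments (pairs of consecutive vertices) in a ring."""
    return {frozenset(pair) for pair in zip(ring, ring[1:])}

# The shared edge appears in both rings, i.e. it is stored twice
shared = segments(poly_a) & segments(poly_b)
```

A topologic model would instead store that edge once and record which polygon lies on each side of it.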
GIS Data Models
Topology
– Connectivity: which chains are connected at which nodes?
– Direction: defined by a “from-node” and a “to-node” of a chain
Example analysis: Modeling flow through the connecting lines in a network
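Connectivity and direction can be sketched with a table of chains and their from/to nodes. The chain and node IDs below are hypothetical, and `downstream` is a name invented for this example.

```python
from collections import defaultdict

# Hypothetical chains: (chain_id, from_node, to_node)
chains = [("c1", "A", "B"), ("c2", "B", "C"), ("c3", "B", "D"), ("c4", "D", "E")]

# Connectivity: which chains leave each node?
outgoing = defaultdict(list)
for chain_id, from_node, to_node in chains:
    outgoing[from_node].append((chain_id, to_node))

def downstream(start):
    """Chains reachable from `start` when following from-node to to-node."""
    reached, stack, seen = [], [start], {start}
    while stack:
        node = stack.pop()
        for chain_id, next_node in outgoing[node]:
            reached.append(chain_id)
            if next_node not in seen:
                seen.add(next_node)
                stack.append(next_node)
    return sorted(reached)
```

Because the node IDs are stored explicitly, tracing flow needs no coordinate comparison at all.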
GIS Data Models
Topology
– Adjacency: which polygons are on the left and which are on the right side of a chain?
Example analysis: Identifying adjacent features; combining adjacent polygons with similar
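Adjacency can be read directly from the left and right polygon recorded on each chain. A minimal sketch, assuming hypothetical chain records in which polygon 0 stands for "outside":

```python
from collections import defaultdict

# Hypothetical chains: (chain_id, left_polygon, right_polygon); 0 = outside
chains = [("c1", 1, 0), ("c2", 1, 2), ("c3", 2, 0), ("c4", 2, 3), ("c5", 3, 0)]

neighbors = defaultdict(set)
for _, left, right in chains:
    if left and right:  # skip chains that border the outside
        neighbors[left].add(right)
        neighbors[right].add(left)
```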
GIS Data Models
Topology
– Inclusion: simple spatial objects (node, chain, smaller polygon) are within a polygon
Example analysis: Overlaying geographic
GIS Data Models
Network
– Type of topologic vector data model (see pgs 218-219 in book)
– Models flow of goods and services (e.g. routes of roads, rivers, utility lines)
• Radial - flow in one direction (e.g. upstream, downstream)
• Looped - intersections allowed, choices for flow allowed
– “Network Analyst” extension in ArcGIS contains tools for this type
GIS Data Models
Regions
– Type of topologic vector data model
– Groups of polygons in coverages
– “Multi-part” polygons
GIS Data Models
Routes
– Composite line features
• Created from sections (whole or partial arc)
• Contain “M” values (measures along route)
• Ex.: All the arc segments in ALL_ROADS that make up Interstate 90, treated as one feature in MAJOR_ROUTES
GIS Data Models
Linear Referencing System (LRS)
– Uses a relative position along an already existing linear feature, without explicit x,y coordinates. Location is given as a position, or measure, along it (distance, or percent along).
• Have “base layer” of lines, plus a series of related “event tables”
– Address, Speed Limit, Route Number tables, etc.
• Highways/city streets (MassDOT), railroads, rivers, pipelines, water and sewer networks
• Dynamic segmentation / “flat file”
– See pages 219-221 in
GIS Data Models
Linear Referencing System (LRS)
[Figure: 1 “base” arc (ID = 1, measures 0 to 100) carrying speed limit and # of lanes as events, compared with 3 “flat file” arcs (ID = 1, 2, 3)]
LRS Tables
SPEEDLIMIT Table
ID  F_MEAS  T_MEAS  SPEEDLIMIT
1   0       30      55
1   30      100     45
NUMLANES Table
ID  F_MEAS  T_MEAS  NUMLANES
1   0       60      3
1   60      100     2
Flat file arcs
ID  SPEEDLIMIT  NUMLANES
1   55          3
2   45          3
3   45          2
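The relationship between the event tables and the flat-file arcs can be sketched as an overlay of measure ranges. The event values are the ones shown on the slide (speed limit 55 then 45, lanes 3 then 2, along route ID 1); the function names are invented for this example, and this is not how ArcGIS implements dynamic segmentation.

```python
# Event tables for route ID 1: (route_id, f_meas, t_meas, value)
speed_events = [(1, 0, 30, 55), (1, 30, 100, 45)]
lane_events = [(1, 0, 60, 3), (1, 60, 100, 2)]

def overlay_events(a_events, b_events):
    """Split a route wherever either attribute changes (line-on-line overlay)."""
    out = []
    for route_a, a_from, a_to, a_val in a_events:
        for route_b, b_from, b_to, b_val in b_events:
            if route_a != route_b:
                continue
            start, end = max(a_from, b_from), min(a_to, b_to)
            if start < end:  # the two events overlap along the route
                out.append((route_a, start, end, a_val, b_val))
    return sorted(out)

def value_at(events, route_id, measure):
    """Look up the attribute in effect at a measure along the route."""
    for route, f_meas, t_meas, value in events:
        if route == route_id and f_meas <= measure < t_meas:
            return value
    return None

segments = overlay_events(speed_events, lane_events)
```

The overlay yields three segments, matching the three flat-file arcs, while the base layer itself still holds a single arc.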
GIS Data Models
TIN (Triangular Irregular Network)
– Topologic data model for surfaces (e.g. elevation) made up of connected triangles (faces)
– Triangle nodes have X,Y,Z values
– Triangles may be sized differently, based on original data density
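Because each triangle node has X,Y,Z values, a surface height anywhere inside a face can be estimated by linear (barycentric) interpolation. A minimal sketch with a hypothetical triangle; `interpolate_z` is a name chosen for this example.

```python
def interpolate_z(p, tri):
    """Linear interpolation of z at p=(x, y) inside a triangle of three (x, y, z) nodes.

    Returns None when p lies outside the triangle or the triangle has zero area.
    """
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = tri
    x, y = p
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        return None  # degenerate (zero-area) triangle
    # Barycentric weights of p with respect to the three nodes
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None  # p is outside this face
    return w1 * z1 + w2 * z2 + w3 * z3

# Hypothetical face: elevations 100, 120 and 140 at the three nodes
face = [(0.0, 0.0, 100.0), (10.0, 0.0, 120.0), (0.0, 10.0, 140.0)]
```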
GIS Data Models
TIN
As viewed in ArcScene
GIS Data Models
Terrain Dataset
– A multiresolution, TIN-based surface built from measurements stored as features in a geodatabase. Terrains are typically made from LiDAR, sonar, and photogrammetric sources. They reside in the geodatabase, inside feature datasets with the features used to construct them.
GIS Data Models
Annotation
– text labels (vector features)
– fixed position, size, orientation
• anno does NOT reposition as you pan and zoom
– N/A for shapefiles (only in GDB and coverages)
GIS Data Models
Object-Relational
– Everything stored in database tables
• attributes, geometry in RDBMS
– Defined relationships between objects
– Can store topology
– Can design with CASE (Computer-Aided Software Engineering) tools (like MS Visio) to produce UML (Unified Modeling Language) diagrams (see pages 221-226 in textbook)
– Download models from esri.com for various industries
GIS Data Models
Object-Relational UML Diagram
[Figure: an example of a CASE tool (Microsoft Visio); the UML model]
GIS Data Models
Object-Relational Diagram
A water-facility object model
Definition
Format - The pattern into which data (coordinates, attributes, indexes, spatial reference, etc.) is systematically arranged for use on a computer. A file format is the specific design of how information is organized in the file. (At the most basic level, all GIS data is a file on disk.)
– For example, ArcInfo has specific, proprietary formats used to store coverages. DLG, DEM, and TIGER are geographic datasets with different file formats. ESRI has also developed Shapefiles and Geodatabases.
GIS Data Formats
Common raster formats:
– GeoTIFF, TIFF, BIL, BIP
– MrSID (.SID), JPG, JPEG 2000
– GRID, DEM
– ERDAS IMAGINE (.IMG)
– Intergraph - CIT, COT
– ER Mapper
– ADRC
– NITF - National Imagery Transmission Format
– Geodatabase “raster datasets”
Raster - file components:
– Image file (.tif, .sid, ...)
– Header (“world”) file (.tfw, .sdw, …)
– Auxiliary file (.aux) - stores spatial reference
– Reduced raster resolution (.rrd or .ovr) - stores pyramid levels
GIS Data Formats
World file example:
1.000000000000000 (cell size, x-scale)
0.000000000000000 (rotation term)
0.000000000000000 (rotation term)
-1.000000000000000 (cell size, y-scale, negative)
237000.500000000000000 (x-coordinate of center of upper left pixel)
897999.500000000000000 (y-coordinate of center of upper left pixel)
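A short sketch of how the six world-file values turn a pixel position into map coordinates. The values are the ones from the slide; the conventional file order is A, D, B, E, C, F (x-scale, rotation, rotation, negative y-scale, then x and y of the center of the upper left pixel). The function names are invented for this example.

```python
WORLD_FILE_TEXT = """1.000000000000000
0.000000000000000
0.000000000000000
-1.000000000000000
237000.500000000000000
897999.500000000000000
"""

def parse_world_file(text):
    """Return the six affine terms in file order: A, D, B, E, C, F."""
    a, d, b, e, c, f = (float(token) for token in text.split())
    return a, d, b, e, c, f

def pixel_to_map(col, row, params):
    """Map coordinates of the center of pixel (col, row); (0, 0) is the upper left."""
    a, d, b, e, c, f = params
    return a * col + b * row + c, d * col + e * row + f

params = parse_world_file(WORLD_FILE_TEXT)
```

With these values each step to the right adds 1 map unit in x, and each step down subtracts 1 map unit in y.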
GIS Data Formats
Common vector formats:
– Shapefile, Coverage, Geodatabase “feature classes”
– DXF, DWG - CAD-based
– MapInfo - MIF
– DLG
– TIGER, VPF
– ASCII, DBF
– SDTS - Spatial Data Transfer Standard
– SDC - Smart Data Compression
Definitions
A feature is a point, line, or polygon in a dataset that represents a real-world object
A feature class is a collection of features, categorized by the type of geometry used to define the feature (e.g., how the coordinates are stored, as a point, line, or polygon)
– “polygon feature class”, “arc feature class”, “point feature class”, etc.
– Should represent similar objects
Common ArcGIS Formats
Coverage - file-based data model; vector
Shapefile - file-based data model; vector
Geodatabase (“geographic database”) - DBMS-based data model (aka Object data model); vector & raster
– Personal, File
– Spatial Database Engine (SDE)
GIS Data Formats - Shapefile
Developed by ESRI (ArcView 2)
Stored on disk in folders
Consists of a set of files
– .shp – spatial geometry (always present)
– .shx – spatial geometry index (always present)
– .dbf – dBASE file, feature attributes (always present)
– optional others (.prj, .sbn, .sbx, .ain, .aih, .aig, …)
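Since a shapefile is a set of files, a quick check that the three always-present components sit side by side can catch incomplete copies. A minimal sketch using only the standard library; the function name is hypothetical, and the check is case-sensitive on file systems that distinguish case.

```python
from pathlib import Path

REQUIRED = (".shp", ".shx", ".dbf")  # the three files that are always present

def missing_components(shp_path):
    """Return the required extensions that are missing next to `shp_path`."""
    path = Path(shp_path)
    return [ext for ext in REQUIRED if not path.with_suffix(ext).exists()]
```

An empty list means all three required files were found; optional files such as .prj are not checked.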
GIS Data Formats - Shapefile
Simpler than coverages - useful for mapmaking and some kinds of analysis.
Fast display (especially when local)
Single feature class (geometry) per shapefile
– Point (points and multipoints) or
– Line (simple lines and multipart polylines) or
– Polygon (simple and multipart)
No topology or annotation
10-character max. field names (dbf limitation)
May be edited in ArcGIS and ArcView GIS 2x+
Open format (specs available); may be
GIS Data Formats - Coverage
Developed by ESRI, c.1981
Traditional (Arc/Info) format for complex geoprocessing, high-quality geographic data, and sophisticated spatial analysis.
Stores features and attributes for thematically associated data
Can explicitly store topology (features stored only once) - use BUILD or CLEAN commands (vs. “spaghetti data model”
GIS Data Formats - Coverage
Stored on disk as a directory (folder) of files, with more files in associated ‘info’ directory
Attributes in INFO format (tables)
Coverage folder stored in a “workspace” - a special name for a folder with a coverage (or Grid or TIN)
[Figure: a workspace folder containing coverages]
GIS Data Formats - Coverage
Multiple feature classes can be grouped and stored in one coverage
– Primary (label point, arc, polygon, node)
– Secondary (tics, links, annotation)
– Compound (routes/sections, regions; built from primary features) – like “multi-part features”
Edit in ArcInfo Workstation only
Polygons can’t have “holes” (because of the “universal polygon”, i.e. the background)
You cannot have points and polygons in the
GIS Data Formats - Coverage
[Figure: coverage attribute tables]
– point attribute table
– arc attribute table
– route attribute table
– polygon attribute table
– node attribute table
GIS Data Formats - Coverage
Explicit topology
– Connectivity (arc-node topology) - arcs connect to each other at nodes
GIS Data Formats - Coverage
Explicit topology
– Area Definition (polygon-arc topology) - arcs that connect to surround an area define a polygon
GIS Data Formats - Coverage
Explicit topology
– Contiguity (adjacency) - arcs have direction and left and right sides
GIS Data Formats - Coverage
Coverage attribute tables have “Sacred Items”
– Point/Polygon: AREA, PERIMETER, <COVER>#, <COVER>-ID
– Arc: <COVER>#, <COVER>-ID, FNODE#, TNODE#, LPOLY#, RPOLY#, LENGTH
Topology between feature classes managed with sacred items
– Ex.: <cover># in .PAT (polygon attribute table) relates to LPOLY# and RPOLY# in .AAT (arc attribute table)
– <cover># = 1 is the “universal polygon” in polygon coverages (hidden in ArcGIS Desktop)
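The link between the .PAT and .AAT sacred items can be sketched by rebuilding each polygon's perimeter from the arcs that bound it. The arc rows below are hypothetical, the key "COVER#" stands in for the real <COVER># item name, and this illustrates the relationship only, not what BUILD or CLEAN actually do.

```python
from collections import defaultdict

UNIVERSAL_POLYGON = 1  # <cover># of the background polygon

# Hypothetical .AAT rows for two polygons (2 and 3) that share one arc
aat = [
    {"COVER#": 1, "LPOLY#": 1, "RPOLY#": 2, "LENGTH": 30.0},
    {"COVER#": 2, "LPOLY#": 2, "RPOLY#": 3, "LENGTH": 10.0},
    {"COVER#": 3, "LPOLY#": 1, "RPOLY#": 3, "LENGTH": 50.0},
]

def perimeters(aat):
    """Sum arc LENGTH per polygon using LPOLY# and RPOLY#, skipping the universal polygon."""
    totals = defaultdict(float)
    for arc in aat:
        # A set avoids double counting an arc with the same polygon on both sides
        for poly in {arc["LPOLY#"], arc["RPOLY#"]}:
            if poly != UNIVERSAL_POLYGON:
                totals[poly] += arc["LENGTH"]
    return dict(totals)

result = perimeters(aat)
```

The shared arc (length 10) is stored once but contributes to both polygons, which is the "features stored only once" property of coverages.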
Data Format Conversion
Workflow may dictate that data need to be in another format
In ArcMap, right-click the layer in the Table of Contents and choose Data > Export Data, then select the format
Data Format Conversion
Use ArcToolbox Conversion Tools
ArcInfo license and installation of ArcInfo Workstation required for Coverage conversion tools
Distribution
Process of moving data from one location to another
Copy/paste in ArcCatalog if source and destination are both accessible, otherwise:
– Coverage – export to “Arc/Info Export File” (a.k.a. “interchange file”) in ArcToolbox
• ASCII file with .e00 extension
• User then “Imports” file with ArcToolbox (ArcInfo)
– Shapefile – send all components or use WinZip, PKZIP, StuffIt, etc., to send all in one file
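Bundling all shapefile components into one file can also be scripted. A minimal sketch using Python's standard zipfile module instead of the tools named above; the function name is hypothetical, and it assumes every component shares the shapefile's base name and sits in the same folder.

```python
import zipfile
from pathlib import Path

def zip_shapefile(shp_path, zip_path):
    """Bundle every file that shares the shapefile's base name into one archive."""
    shp = Path(shp_path)
    parts = sorted(
        p for p in shp.parent.iterdir()
        if p.is_file()
        and p.name.startswith(shp.stem + ".")
        and p.suffix.lower() != ".zip"  # never include an earlier archive
    )
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for part in parts:
            archive.write(part, arcname=part.name)
    return [p.name for p in parts]
```

Matching on the base name picks up the optional files (.prj, .sbn, and so on) along with the three required ones.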