Wednesday, November 19, 2014

GIS II: Network Analysis, Routing, and Assessment

Network Analysis and Routing for Frac Sand Mines and Railroad Terminals

Introduction
Network analysis is a vital tool for determining the most effective routing and assessing potential costs at a variety of scales, from city to state. When coupled with Python scripting and data flow modeling, the possibilities for customizing search criteria and the usability of the resulting data far outweigh the time required to initially develop the models. In this exercise, we started by developing scripts to eliminate extraneous data from our datasets. We then learned the functions of the Network Analyst toolbox in ArcMap. Our final step was to develop a data flow model that automated the processes following our Python scripting. The resulting output provided the comprehensive route length for each county and the cost of truck travel on those roads.

Methods
Python Scripting
In the first half of this exercise, a Python script was written that selected active mines that were not rail loading stations. To account for any mines that may also have served as rail loading stations, a 1.5 kilometer zone around the railways was excluded from the search criteria. More details on the scripting process and a screenshot of the script can be found under the “Python Scripts” tab at the top of this page.
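The selection logic above can be sketched in plain Python. This is a conceptual stand-in, not the actual script: the mine records, railway coordinates, and field names are hypothetical, and the real script worked on DNR shapefile attributes in ArcMap rather than dictionaries.

```python
import math

# Hypothetical mine records; the real script queried Wisconsin DNR data.
mines = [
    {"id": 1, "status": "Active", "type": "Mine", "x": 0.0, "y": 0.0},
    {"id": 2, "status": "Active", "type": "Rail loading station", "x": 5000.0, "y": 0.0},
    {"id": 3, "status": "Inactive", "type": "Mine", "x": 9000.0, "y": 0.0},
    {"id": 4, "status": "Active", "type": "Mine", "x": 10000.0, "y": 500.0},
]
# Railway vertices in the same projected (meter) coordinate system.
railway_points = [(10000.0, 0.0), (11000.0, 0.0)]

def near_railway(mine, max_dist=1500.0):
    """True if the mine lies within max_dist meters of any railway vertex."""
    return any(math.hypot(mine["x"] - rx, mine["y"] - ry) <= max_dist
               for rx, ry in railway_points)

# Keep active mines that are not rail loading stations and sit
# outside the 1.5 km exclusion zone around the railways.
selected = [m for m in mines
            if m["status"] == "Active"
            and m["type"] != "Rail loading station"
            and not near_railway(m)]

print([m["id"] for m in selected])
```

Mine 2 is dropped as a rail loading station, mine 3 as inactive, and mine 4 because it falls inside the 1.5 km railway zone.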

Network Analyst
Network Analyst is a toolbox within the ArcGIS suite that allows for advanced route modeling, among other things. Using Network Analyst, we were able to assess the fastest routes available for sand trucks traveling from the sand mines to the rail loading terminals. The network dataset used for the roadway mapping was acquired from Esri StreetMap data for the United States. The mine location data was acquired from the Wisconsin DNR, and the rail terminal data from the Department of Transportation website. To determine travel distance and cost of travel, we first had to find the closest facility each truck would travel to in order to offload its sand. These routing options were tested by manually inputting the facilities (rail terminals) and the incidents (mines). The objective of this lab was to fully automate the process so that the model could easily be reused by swapping in new base layers and rerunning the data flow model.
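The closest-facility idea can be illustrated with a minimal sketch. Note the simplification: Network Analyst solves this over the actual road network, while the snippet below uses straight-line distance between made-up mine and terminal coordinates, so it only shows the assignment logic, not real routing.

```python
import math

# Hypothetical coordinates in a projected meter system (not the lab's data).
mines = {"MineA": (0.0, 0.0), "MineB": (8000.0, 1000.0)}
terminals = {"Terminal1": (3000.0, 4000.0), "Terminal2": (9000.0, 0.0)}

def closest_terminal(mine_xy):
    """Return (terminal_name, distance_m) of the nearest rail terminal."""
    mx, my = mine_xy
    return min(((name, math.hypot(mx - tx, my - ty))
                for name, (tx, ty) in terminals.items()),
               key=lambda pair: pair[1])

# Assign each mine (the "incident") to its closest facility.
assignments = {mine: closest_terminal(xy) for mine, xy in mines.items()}
for mine, (term, dist) in sorted(assignments.items()):
    print(f"{mine} -> {term} ({dist:.0f} m)")
```

Network Analyst performs the same minimization, but with network travel time or distance as the cost instead of Euclidean distance.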

Data Flow Model
A data flow model was created that automated the work required to replicate the results for new datasets, if needed. Figure 1 shows the complete data flow model for this exercise. The majority of the model was quite simple to compile. To create the routing for the sand trucks from the mines to the rail terminals, a closest facility analysis layer was created, locations were added to define where the trucks were being routed from and to, and the analysis was solved, all using tools within the Network Analyst toolbox. Following this, the resulting routes were selected, the features were copied, and an output feature class was created to save the temporary feature class produced by the Solve function. Finally, the data was projected into a Wisconsin (meter) coordinate system so that it would be compatible with the rest of the data. However, the steps in the model after the green circle labeled “Results_Prj” were somewhat more difficult.

Figure 1. This data flow model shows the complete workflow described in the methods section.

A number of steps were required to convert the route data into a usable feature at the county level, which allowed us to calculate comprehensive distance traveled and cost of travel for each county. There were several ways to accomplish this, but I started with the Intersect tool, which allowed me to split the route line feature class by the county polygon boundaries. After this, I used a spatial join to combine the Wisconsin counties feature class with the intersected route data. This prepared the data to be summarized by county, rather than just overall route length. To summarize the comprehensive route length, I used the Summary Statistics tool, summing length by county designation. I then used the Add Field tool to add a field for distance in miles, and the Calculate Field tool with an expression multiplying the comprehensive length by the meters-to-miles conversion factor (0.00062137). Finally, another field was added and a value calculated for the cost of travel for a trip by truck to and from a rail terminal, estimating the cost of fuel at $2.20 per mile.
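The summarize-and-calculate steps reduce to simple arithmetic, sketched below with invented segment lengths (the real values came from the Intersect and Summary Statistics outputs). The conversion factor and the $2.20-per-mile fuel estimate are the ones used in the lab.

```python
# Hypothetical intersected route segments as (county, length_m) pairs.
METERS_TO_MILES = 0.00062137
COST_PER_MILE = 2.20   # estimated fuel cost per mile, from the lab

segments = [
    ("Chippewa", 120000.0),
    ("Chippewa", 45000.0),
    ("Burnett", 2000.0),
]

# Equivalent of Summary Statistics: sum segment length per county.
totals_m = {}
for county, length_m in segments:
    totals_m[county] = totals_m.get(county, 0.0) + length_m

# Equivalent of the Add Field / Calculate Field steps.
summary = {county: {"miles": m * METERS_TO_MILES,
                    "cost": m * METERS_TO_MILES * COST_PER_MILE}
           for county, m in totals_m.items()}

for county, s in sorted(summary.items()):
    print(f"{county}: {s['miles']:.2f} mi, ${s['cost']:.2f}")
```

In ArcMap the same result comes from Summary Statistics (SUM of length, case field = county) followed by two Calculate Field expressions.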

Results
It should be noted that all of the data in the following section is hypothetical and does not represent usable results beyond the purposes of this lab. Figure 2 shows the resulting map created after running the data flow model.


Figure 2. This map shows the results of the data flow model after the routes have been created.

Figure 3 shows the resulting attribute table created after running the data flow model. Figures 4 and 5 are choropleth maps that denote the counties with the highest travel distance and the counties with the highest cost associated with travel. It makes sense that Chippewa County, which has the greatest travel distance (approximately 205.77 miles), would also pay the most per year in fuel ($45,269.70), as the two are directly related: as distance traveled increases, cost increases proportionally. Burnett County travels the least of the sixteen counties included (1.28 miles) and therefore pays the least ($280.56).

Figure 3. This table shows the data after the final statistics have been calculated.
Figure 4. This map shows the total distance per year by county for frac sand transport routes.

Figure 5. This map shows the total cost per year by county for frac sand transport routes.

Figure 6 is a graph showing the total cost compared to the total distance traveled. Figure 7, a small subset of Figure 6, shows the congested data that exists below one thousand miles traveled.

Figure 6. This graph shows the total cost per year vs. the total distance per year for each county included in the analysis.

Figure 7. This graph is the same as the graph in Figure 6; however, it only examines total distances under fifty miles. This shows a number of the counties in a readable way.


Discussion
The county-level data produced in this exercise has much wider implications. Accelerated deterioration of roadways caused by heavier usage raises interesting questions about who should bear the brunt of road repair costs. Should mines have to contribute more money to county- or city-funded roads because they use them to a greater extent than a normal citizen? From personal experience I know that logging companies will pay for road repairs because they use the roads to a substantial degree and need them maintained to ensure that transportation is streamlined. Should frac sand mines be called upon to support the repair of roadways and railways that are stressed as a result of a transportation-heavy industry? Is there a better way to transport these resources from place to place? Currently, I do not believe there is. When transporting extracted resources such as coal, iron ore, timber, and sand, there are really no logistically feasible options other than truck and train. Should the railway be expanded with more nodes closer to the sand mines, or even directly on site? That could be helpful, as it would reduce the stress placed on the roads, but it would most likely have an adverse effect on the surrounding environment, as large swaths of trees would have to be cleared to make this efficient and connect all the railways.

Conclusion
The logistics of transporting frac sand from the extraction sites to the distribution rail terminals pose a number of questions that must be assessed further before a final determination can be made. In counties where frac sand companies place a large amount of stress on the roadways and railways, an agreement must be reached between the company and the county on what will be done to properly ensure the quality and safety of the roads and rails for all users, whether in the mining industry or civilian. Network analysis and routing can assist in determining the most efficient way of accomplishing this while allowing usage parameters to be set. Creating a data flow model allows the workflow to be replicated as new data becomes available and current data is updated. Using these techniques, the ability to assess costs associated with frac sand transportation becomes accessible to a much wider audience.

Wednesday, November 5, 2014

GIS II: Normalizing Data and Geocoding

Exercise 6: Data Normalization, Geocoding, and Error Assessment

Goal: 
The purpose of this lab was to become familiar with normalizing data so that it could be processed, geocoding addresses, and assessing errors in acquired data. We were provided data for all of the current frac sand mines in the state of Wisconsin. This data was provided by the Wisconsin DNR and had not been normalized or altered for successful geocoding. Once we normalized the data, it was geocoded so that we could begin to spatially analyze where the frac sand mines were located. We also had to assess errors within the data, as our classmates' geocoding may have differed slightly from ours, or they may have normalized the data in a different manner.

Methods:
The first step was to normalize the acquired data. Many of the mines did not have street addresses; a number were listed only by their Public Land Survey System (PLSS) description. The addresses were also not divided up consistently into address, city, and county fields. To normalize the data, the web GIS server for each county was opened and the parcel information gathered. If no address was available for the parcel being located, a proximal parcel was chosen if it provided an access road to the desired parcel. Figures 1 and 2 show a comparison between the original data and the data after normalization. Once all of the data were normalized, they were ready to be geocoded.
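Part of the normalization, splitting a combined "street, city, state" line into separate fields, can be sketched in Python. This is a simplified illustration with a made-up address format; the actual lab work was done by hand in Excel and involved looking up parcels on county web GIS servers when no street address existed.

```python
import re

def normalize_address(raw):
    """Split a single 'street, city, state zip' string into separate fields.

    Returns None when the string has no comma-separated street address
    (e.g. a bare PLSS description), signaling that a manual parcel
    lookup is needed instead.
    """
    parts = [p.strip() for p in raw.split(",")]
    if len(parts) < 3:
        return None  # no street address to split; handle manually
    street, city = parts[0], parts[1]
    # Pull the two-letter state code off the front of "WI 54701".
    m = re.match(r"([A-Z]{2})\b", parts[2])
    state = m.group(1) if m else parts[2]
    return {"street": street, "city": city, "state": state}

print(normalize_address("123 Main St, Eau Claire, WI 54701"))
print(normalize_address("NE 1/4 SEC 12 T29N R8W"))
```

Separating each attribute into its own column is exactly what ArcMap's geocoder needs, as described in the figure captions below.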

Figure 1. This figure shows the Excel data before it has been normalized. Many entries have only a PLSS designation and no address applied to the parcel. Also, many of the addresses listed the street address, city, and state on the same line. ArcMap is not able to separate these, so the Excel file must be set up so that each attribute is in its own column.
Figure 2. This figure shows the Excel data after it has been normalized. The address has been split up, and every entry could be assigned an address based on referencing access roads on proximal properties.
After normalizing the data, the excel file was ready to import into ArcMap to be referenced for the geocoding process. In this instance, all of the data were normalized in a manner that made geocoding possible, allowing for a match for all data points. To attempt to compare our results with those of our classmates, all of the other geocoded data had to be merged. Figure 3 shows a map of west-central Wisconsin and the results of my geocoding and the geocoding of the rest of the class.

Figure 3. This map shows the location of my geocoded mines and the mines geocoded by the rest of the class. There are noticeable differences between the datasets. For example, the mine in western Wood County has another point to its southeast; both points represent the same mine, but because the normalization was performed differently, the geocoded results were not the same.
A query was developed to select all of the mines geocoded by our classmates with the same Mine UNIQUE ID field as our own. After these were selected, a new feature class was created to make comparing our classmates' selected mines to our own easier. This allowed us to use the Point Distance tool in the ArcMap toolbox to calculate the distance from our point to the surrounding mines. After projecting the geocoded results, the tool was run and a table was produced. If the Distance field listed a distance of 0 meters, there was a perfect match and it was the same mine. If there was a difference, either the same mine had been geocoded in different places because of normalization differences, or the distance was measured to a different mine. Figure 4 shows the table with feature ID classes listed and the distances between the input and near feature IDs.
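The comparison boils down to computing distances between points that share a unique ID, as sketched below. The IDs, coordinates, and dictionary structure are hypothetical; the Point Distance tool produces the same kind of result as a table rather than a dictionary.

```python
import math

# Hypothetical geocoded points keyed by mine unique ID, in a meter projection.
my_mines    = {"M001": (0.0, 0.0),       "M002": (5000.0, 5000.0)}
class_mines = {"M001": (0.0, 0.0),       "M002": (5300.0, 5400.0)}

# Mimic the Point Distance output: distance between points with matching IDs.
distances = {mid: math.hypot(x - class_mines[mid][0], y - class_mines[mid][1])
             for mid, (x, y) in my_mines.items()
             if mid in class_mines}

for mid, d in sorted(distances.items()):
    note = "exact match" if d == 0 else f"differs by {d:.0f} m"
    print(f"{mid}: {note}")
```

A zero distance corresponds to two students normalizing and geocoding a mine identically; any nonzero distance reflects differing normalization choices.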

Figure 4. This table shows the distance between our mines (INPUT_ID) and the distance to the other geocoded mines (NEAR_FID). A distance of 0 meters indicates that the mine is an exact match. Other mines needed to be checked to ensure that we were examining the correct mine. 
Discussion:
There are a number of possible explanations for the variation in distances from geocoding. These error types are listed in an assigned class reading, Lo (2003). I did not notice any gross errors (errors caused by blunders or mistakes) in this data. Had the entire class been trained on a standardization method prior to normalizing the data, one could argue that deviations would constitute gross errors. However, because the purpose of the exercise was to learn how standardization should be conducted, no standardization method was set, and therefore there were not really any "gross errors." There were definitely systematic errors in the data, caused by human bias in deciding where a mine should be placed when a manual placement had to be assigned.

Inherent errors in the data are also a main source of error in the geocoded results. According to Lo (2003), there is inherent error in digitizing and attribute data input. Both were conducted in this lab, and both carried inherent error. Each dataset was normalized by a different student, which meant there were as many normalization methods as there were students. This led to differences in how things were divided, recorded, and listed. Operational errors were also very much a part of this lab, for many of the same reasons listed previously. Error stemming from user bias and differing methods can only be addressed by standardizing the normalization process beforehand, and even after standardizing, it is hard to remove every possible source of error.

In order to determine which points are correct and which points are off, we would need a perfect dataset. We would need addresses for every mine in a format that leaves little to no room for error in normalization of the dataset. To determine the most accurate points on our map without a perfect dataset, we would need to come together as a class and decide which points should be counted as the correct location. Then we would be able to determine the distance of our points from the point that was deemed correct.

Conclusion:
Data normalization is something that needs to be determined before the start of a project. In order to ensure that data melds in a manner that makes it usable by all, strict guidelines must be put in place prior to the data management process. If this is not done, the chances that a dataset will be managed differently from another are much higher. 

References Cited:
Lo, C., & Yeung, A. (2003). Data Quality and Data Standards. In Concepts and Techniques in Geographic Information Systems (pp. 104-132). Pearson Prentice Hall.