Ag-Analytics Documentation

(1)

Joshua Woodard, Lin Xue

(2)

(3)

1 1. What is Ag-Analytics? 3

2 2. Finding Data 5

2.1 2.1. Data Overview. . . 5

2.2 2.2 Data Sources . . . 5

2.3 2.3 Different ways to use Ag-Analytics . . . 5

3 3. Querying the Database 7 3.1 3.1 Table Descriptions . . . 7

3.2 3.2 SQL Query . . . 7

3.3 3.3 Sample Codes for API request . . . 7

3.4 3.4 Graphical Query Builder . . . 7

3.5 Entities and attributes Panel . . . 8

3.6 Columns Panel . . . 8

3.7 Conditions Panel . . . 8

3.8 Basic queries: how to make them . . . 8

4 4. Downloading Bulk Data 9 4.1 4.1 List of Tables . . . 9

4.2 4.2 Download Files Programmatically . . . 9

5 5. Using the Tools 11 5.1 5.1 Tools Overview. . . 11

5.2 5.2 Crop Yield . . . 11

5.3 5.3 Dairy Margin Protection . . . 11

5.4 5.4 Crop Insurance Premium Calculator . . . 12

5.5 5.5 Commodity Futures . . . 12

5.6 5.6 Grape Cost Projection . . . 12

5.7 5.7 Spot Price Interpolation . . . 12

5.8 5.8 Sulfur Dioxide in Wine Calculator. . . 12

5.9 5.9 Ethiopia Soil Fertility and Yield Simulator . . . 13

5.10 5.10 Geographic Data Explorer . . . 13

6 6. Using the Forum 15 6.1 6.1 Registering and Logging in . . . 15

6.2 6.2 Writing a post . . . 15

6.3 6.3 Editing a post . . . 15

(4)

7.2 7.2 Data Preprocess Code Files and Job Description . . . 17

8 8. Pre-processing Data 19

9 Indices and tables 21

(5)

(6)

(7)

1. What is Ag-Analytics?

Ag-Analytics is a pioneering data portal to integrate and consolidate economic and environmental data into one com-prehensive, live-automated data warehouse.

Ag-Analytics was motivated by the research group’s efforts to integrate raw data into a central server in order to translate research efforts into open source web tools for farmers and policymakers. Via this centralization, we strive to eliminate the need for ad hoc data collection specific to individual research efforts by creating an open-source web-based interface for academic, government, and other researchers to integrate, query, and process large datasets in an automated, extensible, and scalable fashion. Moreover, grounding on this open and live data platform, we are able to constantly develop various visualization tools to facilitate the ease of data finding for common users.

The sheer scope and complexity of the “real world” necessitates adaptable and accessible data infrastructures, compu-tational models, and visualization methods to tackle the research questions of the future. Ag-Analytics is a prototype to such systems, and will usher in durable next-generation decision-making frameworks for assessing agricultural policy and risk management, and achieving technological advances in the face of climate change and disaster events such as disease, flooding, drought, severe storms, and market shocks.

(8)

(9)

2. Finding Data

2.1 2.1. Data Overview

Since 2013, the Woodard Research Group at Cornell University has embarked on an overarching integrated effort to lead the agricultural economics, policy, and risk management fields in the area of big data analysis/integration with our Ag-Analytics system. We are in the process of gathering and restructuring various public agency data, identifying agency partnerships, and building an industrial data warehousing system to allow for live and real time analysis. Our initial priorities are the standardization, synchronization and integration across data systems, models and user needs. The Ag-Analytics focuses on consolidated and raw weather, crop, soil, market, and geographical data. Our data efforts are currently focused on the United States. These datasets are retrieved from various data sources over the internet and transformed before being stored in the database. Each data set is automatically updated on a regular basis. Outputs from our statistical models to support analytical tool products are also stored in the database, and the web tools read the model outputs and visualize them on the browser window based on user selection criteria. The raw data are often on several different dimensions of temporal and spatial aggregation. Our database provides raw data as well as spatially processed data ready that is ready to use (e.g., average temperature data aggregated at the county level).

Our vision is that these efforts will lead to an open source platform which will allow researchers, farmers, and govern-ment agencies to more efficiently access and employ the vast amounts of data available for policy analysis and tools to help farmers understand and manage risk in a smart and integrated manner. We also envision opening the system to crowd-sourcing in order to facilitate future development.

2.2 2.2 Data Sources

Partial Summaries of Major Datasets/Sources Currently in Ag-Analytics

2.3 2.3 Different ways to use Ag-Analytics

There are three major ways a user can find and use data in Ag-Analytics.

First, a robust “Data Query” API for researchers with knowledge of SQL(Structured Query Language). The user can either enter valid SQL query on the web page and download query results or use Get or Post Request to access data via our API. Both sample SQL queries and sample codes for API access are provided on the “Data Query” web page. For users who need the entire data tables, “Bulk Download” option is provided. It is worth to mention that we also built a “Graphical Query Builder” tool so that users without SQL knowledge can still search data in a SQL query fashion. This is achieved by translating basic SQL commands into a user friendly web interface of check boxes, click buttons and drop down menus. On “Graphical Query Builder” web page, anyone can “write” SQL query commands by only a few mouse clicks.

(10)

Second, a set of “User Tools” our team developed to serve the common users. “User Tools” are self-explanatory web tools that require minimum training. With a few button clicks, drop-down menu selections, or simple number inputs, the users can view and download visualized results in the form of charts, spreadsheets, heat maps. Each user tool is developed to solve a specific problem and only uses a subset of the data sets.

Third, a “Data Explorer” built withCKAN, an open-source data portal platform. While this method does not require specific knowledge like “Data Query”, it can take a bit more time to get started than the “User Tools”. There are several features “Data Explorer” provides and complement the “Data Query” method. 1. It provides a quick “Google-style” keyword search experience as well as faceting by tags and browsing between related datasets. Users can quickly see what datasets are available, in which formats and with which license, straight from the search results. All dataset fields are searchable. 2. It provides a rich set of metadata for each dataset: title, identifier, groups, description, preview, revision history, license, tags, multiple formats, API key. 3. It has advanced geospatial features, which is great for our geospatial data. 4. It allows users to communicate and collaborate on data and share using social media integration. 5. Its data previewing tool has a host of powerful features for data visualization. [reference:http://ckan.org/features/] Our intention for “Data Explorer” is that users without a specific goal or wanting to get familiar with our data sets can use this tool to easily browse and visualize any data set. In the future, we hope to realize crowd sourcing by collecting comments and discussion on datasets from registered users.

(11)

3. Querying the Database

Our most important web application designed for researchers is the Data Query page. The user either needs basic SQL knowledge to write the query, or uses the “Graphical Query Builder” tool to generate the query by mouse clicks (See 4.4 for details on “Graphical Query Builder”).

3.1 3.1 Table Descriptions

By default, “Table Descriptions” displays names of all the data tables and their description in order to help the user decide which data set to query from. Shown below is a static table captured from our site on Dec.4, 2015. The actual table on the Data Query page is live and automatically updated when new data table or description is added.

3.2 3.2 SQL Query

As the user input SQL query data in the text area, after typing “FROM ”, the “Table Descriptions” table will display all the column names and their descriptions in the dataset to facilitate the understanding of the dataset so that user can write better SQL queries. When the user finishes type the SQL query, the query result in CSV form can be downloaded by clicking “Download” button. Shown below is an example fromRMA Projected Insurance Prices Query Example. The user can click the link on top of the text area to view example queries from the forum.

3.3 3.3 Sample Codes for API request

Instead of writing SQL statement in the text box, you can also use Get or Post Request to access to data programmati-cally through our API without needing to revisit this page. Sample codes for API request can be found at the bottom of the Data Query web page. Sample Code is provided for your reference for C#, Python, MATLAB, R, VBA and Http. Please note the data are returned as a large string. Row is delimited by linefeed and column is delimited by comma.

3.4 3.4 Graphical Query Builder

Graphical Query Builder is a user-friendly form for advanced search powered byEasyQuery. The visual filtering panel allows the users to build filter expressions without knowing how to write SQL query.

The user input the “Entities and attributes”, “Result columns”, “Column sorting” and “Query conditions”, the SQL is automatically generated, and result can be viewed by clicking “Update result”. The user can start a new search by clicking “Clear Query”.

(12)

3.5 Entities and attributes Panel

This panel constitute something like a structure of data that one can extract through construction of queries. An entity is some real-world object, information on which is stored in the database. Examples of entities are as follows: ‘CommodityFutures,’ ‘CornVolatility,’ ‘DailyPrism,’ etc. Every entity has an initial set of attributes, or parameters. Thus, attributes for such entity as ‘CommodityFutures’ can be: Expiration Year, Volume, High Price, and others. The entities panel can be used as a visual representation of all entities and their attributes, which you can use to build queries. It can be used to add entity attributes to a Columns Panel (and they will be displayed in the result) or to a Conditions Panel (in order to actually set the query conditions). You can use buttons “Add column” and “Add condition” to move selected attributes to the corresponding panels.

3.6 Columns Panel

Using columns panel you can specify the columns which you want to be displayed in your query results. You have a list of lines, each of them represents a column in the query results (in the Results Panel). Every line consists of Expression, which is an attribute, a Title, which serves as a column title in the Results Panel, and buttons – for sorting (on the left), aggregating and deleting (on the right).

3.7 Conditions Panel

Here you can specify filtering conditions for data selection. So the result of your query will include only the data that satisfy those conditions. Every line in the Conditions Panel corresponds to a single query condition. Every condition consists of 3 elements – an expression (some entity attribute), an operator, and value(s) (one or more). We have mentioned attributes above (e.g., ‘Company Name,’ ‘Product Price,’ and ‘Order Date’). Together with values, operators set certain limitations regarding data which is stored under a given attribute. For example, “(Date) is equal to 23 Jul 2014,” “(Quantity) is greater than 35,” “(Price) is less than 1,000,” and “(Name) starts with A.”

3.8 Basic queries: how to make them

Below is an example of a simple query result of the attributes “Commodity”, “Expiration Year”, and “Volume” in the “Commodity Futures” entity. The result is displayed and can be exported to Excel or CSV text file.

(13)

4. Downloading Bulk Data

4.1 4.1 List of Tables

Due to the data limit of the Data Query application, Bulk Download is provided for researchers who want to download the complete files containing all of the data for a specific table in our database. Each dataset comes with a “View Metadata” button to display the metadata description of the data columns.

4.2 4.2 Download Files Programmatically

Moreover, data tables can be downloaded programmatically using Python Code. For example, below is the code for downloading the RMA_COL data table:

import urllib

(14)

(15)

5. Using the Tools

5.1 5.1 Tools Overview

An ongoing number of user tools are developed and placed on the “Tools” web page. These tools are our attempts to translate research efforts and models efficiently into open-source web tools for farmers, policy makers and researchers alike. In fact, a lot of the tools are developed upon requests of fellow researchers. Since we have put in lots of work to create the Ag-Analytics data platform, it is relatively cost-efficient to develop tools taking advantages of the existing platform.

5.2 5.2 Crop Yield

Use this tool to view historical average yields for a variety of crops grown in the US. You can also select a county to view both historical weather data and climate change forecasts for that location.

The left side is a set of drop-down menus. Choose a crop, crop description, state, and county, then click “Display Result” Button. The right side is an interactive heat map that changes as the user chooses menu on the left side. The default map view is set at the scale to display all states. State average yield of the chosen crop is displayed as the mouse hover over the map. Users can zoom into state level by selecting a state or clicking on the map. Further selecting county from the menu and click “Display Result” button or clicking a specific county on the map will take the user to the result section on the lower potion of the web page.

The result display section has a primary menu tabs which includes “Historical Yield”, “Historical Weather” and “Cli-mate Change Forecast” and a secondary menu tabs which includes “Yield”, “Yield vs Temperature” and “Yield vs Precipitation”. Use the year slider and month drop-down to choose different time periods. The user can export the county level data that was queried in CSV format.

5.3 5.3 Dairy Margin Protection

The Dairy Margin Protection Program (MPP) Decision Tool is used to calculate insurance premium, forecast Dairy MPP payments and milk price based on the state, the annual milk production, and percentile of base production being insured. The choices are made in the drop-down boxes and text input area on the left side of the screen, while results can be viewed in graph or table form on the right side of the screen when “Calculate” button is clicked. The cells in the Milk Price Information is grayed out because the numbers are automatically generated from our live database when a State is selected.

(16)

5.4 5.4 Crop Insurance Premium Calculator

This tool calculates how much you should pay for crop insurance based on coverage level, your own yield history, as well as state, county, crop type and year. Fill out the form below and click “Calculate Premium” to get price and coverage level comparisons for your crop.

Fill in the Historical Yield Form below will populate the input boxes for “Rate Yield” and “Approved Yield”. After clicking “Calculate Premium”, the results of insurance price and coverage level comparisons for selected crop will be displayed in either graphs or tables form.

Alternative to fill in the inputs, a user can call the API directly. Click “Calling AgDB’s API” button to read more about how.

5.5 5.5 Commodity Futures

This tool is used to display the futures of certain commodity on selected contract year and month. Markets are always changing. Use this tool to keep up with trends and make sure you are informed about what the most recent prices are. By default, results of continuous years are displayed. The user can view different results by selecting “Contract Year” (the year that the contract expires), “Contract Month” (the month that the contract expires), and “Trading From” (the date of the transaction). “Trading To” field is set at current date by default. After hitting “Display Result”, two charts will be generated showing the Settlement Price and Volume of the commodity starting from “Trading From” date and ending at either contract expiration month or “Trading To” date, whichever comes first.

The daily commodity futures data is retrieved fromOpen Financial Data Project. Two things to note: first, the volume data is not available after 2020 (Contract Year); second, the starting date of the chart may be up to a few years behind the selected “Trading From” date after 2013 (Contract Year) due to the unavailability of the data.

5.6 5.6 Grape Cost Projection

This tool helps the user to determine the establishment and development cost and revenue in New York State. Fill out the assumptions web page and click on the “Submit” button to calculate the projected cost of starting a vineyard. All fields are pre-filled to set an example as well as giving users a quick peek at the result without having to fill all the assumptions.

5.7 5.7 Spot Price Interpolation

This tool utilize discrete set of known data points to produce continuous curves of Spot Price, Future Price, and their difference: Basis, using linear interpolation or natural neighborhood interpolation. When users mouse over the curves, a pop-up will display the price of that date.

While the chart display interpolated spot price of the selected state, the color coded US map on the bottom gives the user visual comparison of various states across the US. Spot Price or Basis of each state is shown when moused over.

5.8 5.8 Sulfur Dioxide in Wine Calculator

This tool measures Sulfur Dioxide (SO2) in wine using a head space gas detection tube method.

(17)

When mouse over the map, data of each Woreda will pop up. Double clicking on the Woreda will bring up percentile graphs of VCR(Vanguard Consumer Discretion ETF), Yield, Precipitation and Temperature data against Percentile. Clicking “Export Woreda Data” button on the bottom left corner of the graph will initiate the downloading of the data in CSV form.

5.10 5.10 Geographic Data Explorer

It is a web application which can query web map services and show multiple layers on the map. This tool can dynamically load Web Map Services layers from our GeoServer and from other web geographical services publicly available. Users can add/remove/set opacity multiple layers on the map and obtain useful information (Feature Info) of each layer. The web application also provides background map show/hide/replace, move to the current location of users and shape drawing tools features. The web application utilized Bootstrap and AngularJS for building the frontend and any GeoServer as a backend.

(18)

(19)

6. Using the Forum

6.1 6.1 Registering and Logging in

Anybody can browse the forum topics or post messages. But registration will give you access to additional features not available to guest users such as avatar images, private messaging etc. It only takes a few moments to register. After registering, you should receive an email to activate your new user. Once activated, you can log in using your username and password. If you forget your password, click the lost password link and follow the instructions. If you do not check the Remember me box when you log in, the forum will only keep you logged in for a preset time. This prevents misuse of your account by anyone else. To stay logged in, check the box during login. This is not recommended if you access the forum from a public computer, e.g. internet cafe etc.

If you have registered in the past but cannot login any more, attempt to locate the e-mail sent to you when you first registered, check your username and password and try again. It is possible an administrator has deactivated or deleted your account for some reason. Also, many forums periodically remove users who have not posted for a long time to reduce the size of the database. If this has happened, try registering again and being more involved in discussions.

6.2 6.2 Writing a post

To post a new topic in a forum, click the relevant button on either the forum or topic screens. You may need to register before you can post a message.

6.3 6.3 Editing a post

Unless you are a forum administrator or moderator, you can only edit or delete your own posts. You can edit a post by clicking the “edit” button for the relevant post.

6.4 6.4 Replying to a post

To reply to a topic in a forum, type in the text box underneath the post then click the Quick Reply button on topic screens. You may need to register before you can post a message.

(20)

(21)

7. Datasets and Metadata Input Standard

7.1 7.1 Metadata to be input along with each new dataset

• Titlee.g. Daily PRISM Data

• Contributorse.g. Joshua D. Woodard • Date of Inpute.g. 11/5/2015

• Data Source(URLs or name of the agency).e.g. prism.nacse.org • Data Formatse.g. CSV

• Descriptione.g. Daily gridded and processed data (state, county, township) for max temp, min temperature (Celsius) and precipitation (mm). OSU publishes one file for each variable/day in .bil format. We load these data into Ag-Analytics database and also process at various levels of aggregation. Data are updated daily. OSU publishes early release, provisional, and stable data. We have a flag to indicate if the data are stable release or not, and also update our database based on most recent data released by OSU.

• Source License e.g. http://prism.nacse.org/documents/PRISM_terms_of_use.pdf FROM PRISM WEBSITE: Data. All data (gridded, polygon, tabular, graphical) retrieved from the website or otherwise provided on the website may be freely reproduced and distributed for non-commercial purposes. When referring to the data, the source must be clearly and prominently stated and include, at a minimum, our name, URL, and the date of data creation. For example: PRISM Climate Group, Oregon State University, http://prism.oregonstate.edu, created 4 Feb 2004. Sale or other commercial use of the data must be arranged by contacting the PRISM Climate Group. • Update Frequencye.g. Daily

• Databasee.g. AgDB • Schema Name.e.g. dbo • Fields and Metadatae.g.

7.2 7.2 Data Preprocess Code Files and Job Description

• Code and File Structure Descriptione.g. This folder includes two python scripts, two SQL scripts and two zip files. The zip files contain county shapefile which is used to create the lookup table. The two SQL Files will set up the jobs (initial and daily update). Running these scripts create the job in SQL Server, but does not run it. The IntLoadDailyPrismJob must be run manually one time. This will take about 2 days to complete and is a one time operation. This call the “StagingPrepDailyPRISM.py” script, which in turn creates the needed tables in SQL server and creates the lookup table from the sample PRISM raster file (in the folder) and the county shapefile. It then runs the RawDailyPRISMProcess.py script. This script has 1 input argument, ‘U’ for update or ‘I’ for

(22)

initial. The InitLoad job calls the this script passing ‘I’ for initial load. The UpdateDailyPRISMJob should be run daily when the first job completes and calls the RawDailPRISMProcess.py script with input argument “U” for update. It loads recent data to a staging table, the replaces data in the RawDailyPrism table with new data if those data have changed from provisional to stable status. Once per month on the 12th day of the month, it reloads all data for the last 10 months in case provisionals have changed status. Every other day, it only loads for the previous 1 month. Check that SQL Server is scheduled.

• Libraries or Software Needede.g. ArcGIS, Python, SQL Server • Job Namee.g. InitLoadDailyPrismJob and UpdateDailyPrismJob

• Job Schedule Frequencye.g. InitLoadDailyPrismJob (one time) and UpdateDailyPrismJob (Daily) • Steps to set up job in SQL Servere.g. InitLoadDailyPrismJob:

UpdateDailyPrismJob:

(23)

(24)

(25)

Indices and tables

• genindex • search