This paper reports on the initial work and future trajectory of the Image Analysis of the Farm Security Administration – Office of War Information Photography Collection team, supported through an XSEDE startup grant and Extended Collaborative Support Service (ECSS). The team is developing new algorithms, adapting existing ones, and running them on Comet to analyze the Farm Security Administration – Office of War Information image corpus from 1935-1944, held by the Library of Congress (LOC) and accessible online to the public. The project serves many fields within the humanities, including photography, art, visual rhetoric, linguistics, American history, anthropology, and geography, as well as the general public. Through robust image, metadata, and lexical semantics analysis, researchers will gain deeper insight into the photographic techniques and aesthetics employed by FSA photographers, editorial decisions, and overall collection content. Pairing image analysis with metadata analysis, including lexical-semantic extraction, expands the opportunities for deep data mining of this collection even further.
Users of micro-blogging services and content-sharing platforms generate massive amounts of geotagged information on a daily basis. Although these big data streams are not intended as a source of geospatial information, researchers have found that ambient geographic information (AGI) complements authoritative sources. In this regard, users' digital footprints provide real-time monitoring of people's activities and their spatial interactions, while more traditional sources such as remote sensing and land use maps provide a synoptic view of the physical infrastructure of the urban environment. Traditionally trained scientists in social science and geography often face great challenges when experimenting with new methods to synthesize big data sources because of the data volume and its lack of structure. To overcome these challenges, we developed UrbanFlow, a platform that allows scientists to synthesize massive geolocated Twitter data with detailed land use maps. The platform allows scientists to gather observations to better understand human mobility patterns in relation to urban land use, to study cities' spatial networks by identifying frequent visitors common to different urban neighborhoods, and to monitor patterns of urban land use change. A key aspect of UrbanFlow is its use of distributed computing (Apache Hadoop and cloud-based services) to process massive numbers of tweets and integrate them with authoritative datasets, and to store them efficiently in a database cluster to facilitate fast interaction with users.
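The idea of linking neighborhoods through their common frequent visitors can be illustrated with a minimal in-memory sketch. It is not UrbanFlow's implementation: it assumes each geolocated tweet has already been assigned to a neighborhood (for example, by overlaying it on a land use map), and the function names, the `min_visits` threshold, and the sample records are all hypothetical. A real deployment would run the same counting step as a distributed job rather than in a single process.

```python
from collections import Counter, defaultdict
from itertools import combinations


def frequent_visitors(records, min_visits=2):
    """Map each neighborhood to the users seen there at least min_visits times."""
    counts = Counter(records)  # (user, neighborhood) -> number of tweets
    visitors = defaultdict(set)
    for (user, neighborhood), n in counts.items():
        if n >= min_visits:
            visitors[neighborhood].add(user)
    return visitors


def shared_visitor_network(records, min_visits=2):
    """Return {(a, b): shared_count} for neighborhood pairs with common frequent visitors."""
    visitors = frequent_visitors(records, min_visits)
    edges = {}
    for a, b in combinations(sorted(visitors), 2):
        shared = len(visitors[a] & visitors[b])
        if shared:
            edges[(a, b)] = shared
    return edges


# Made-up sample: one (user_id, neighborhood_id) pair per geolocated tweet.
SAMPLE = [
    ("u1", "downtown"), ("u1", "downtown"), ("u1", "harbor"), ("u1", "harbor"),
    ("u2", "downtown"), ("u2", "downtown"), ("u2", "harbor"),
    ("u3", "harbor"), ("u3", "harbor"), ("u3", "campus"), ("u3", "campus"),
]

if __name__ == "__main__":
    for (a, b), weight in shared_visitor_network(SAMPLE).items():
        print(f"{a} <-> {b}: {weight} shared frequent visitor(s)")
```

The resulting dictionary is a weighted edge list: each key is a pair of neighborhoods and each value is the number of users who visit both frequently, which is one simple way to represent a city's spatial network.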
With the advent of digital elevation models (DEMs) that represent surface elevation at finer resolution and higher accuracy, there is a pressing need for optimized parallel hydrology algorithms that can process big DEM data efficiently. TauDEM (Terrain Analysis Using Digital Elevation Models) is a suite of DEM tools for the extraction and analysis of hydrologic information from topography. We present performance improvements to the parallel hydrology algorithms in the TauDEM suite that allowed us to process very large DEM data. The parallel algorithms are improved by applying a block-wise data decomposition technique, improving the communication model, and enhancing parallel I/O to obtain maximum performance from the computational and storage resources available on supercomputer systems. After the improvements, as a case study, we successfully filled the depressions of the entire US 10-meter DEM (667 GB, 180 billion raster cells) within 2 hours, a significant improvement over the previous parallel algorithm, which was unable to complete the same task within 2 days using 4,096 processor cores on the Stampede supercomputer. We report the methodology and analyze the performance of the algorithm improvements.
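To make the block-wise data decomposition concrete, the following sketch partitions a raster into a two-dimensional grid of blocks, one per process, and compares the number of boundary cells that neighboring processes would exchange against a row-strip layout. It is an illustration only and does not reproduce TauDEM's code: it uses no MPI, the function names are hypothetical, and the boundary count is a simplification that assumes a one-cell-wide halo and ignores corner cells.

```python
def split_extent(n, parts):
    """Split range(n) into contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(n, parts)
    bounds = []
    start = 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def block_decompose(rows, cols, proc_rows, proc_cols):
    """Return {rank: (row_start, row_stop, col_start, col_stop)} for a process grid."""
    blocks = {}
    for i, (r0, r1) in enumerate(split_extent(rows, proc_rows)):
        for j, (c0, c1) in enumerate(split_extent(cols, proc_cols)):
            blocks[i * proc_cols + j] = (r0, r1, c0, c1)
    return blocks


def halo_cells(rows, cols, proc_rows, proc_cols):
    """Total cells on both sides of every internal cut (one-cell halo, corners ignored)."""
    horizontal_cuts = proc_rows - 1  # each cut spans the full raster width
    vertical_cuts = proc_cols - 1    # each cut spans the full raster height
    return 2 * (horizontal_cuts * cols + vertical_cuts * rows)


if __name__ == "__main__":
    rows, cols = 1000, 1000
    print("row strips, 16 x 1 grid:", halo_cells(rows, cols, 16, 1))
    print("blocks, 4 x 4 grid:", halo_cells(rows, cols, 4, 4))
    print("block owned by rank 5:", block_decompose(rows, cols, 4, 4)[5])
```

Under these assumptions, a square process grid exchanges fewer boundary cells than the same number of processes arranged as row strips, which is one common motivation for block-wise layouts; the actual gains depend on the raster shape, the algorithm's communication pattern, and the I/O layout.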