Monday, July 18 • 8:00am - 5:00pm
Tutorial: The many faces of data management, interaction, and analysis using Wrangler.

The goal of this tutorial is to provide guidance to participants on large-scale data services and analysis support with the newest XSEDE data research system, Wrangler. Because Wrangler is largely the first XSEDE resource of its kind, both users and XSEDE staff need training to take advantage of the novel research opportunities it presents. The tutorial consists of two major components.

The morning session focuses on helping users become familiar with the unique architecture and characteristics of the Wrangler system and with the data services it supports, including large-scale file-based data management, database services, and data sharing services. The morning presentations include an introduction to the Wrangler system and its user environment, the use of reservations for computing, data systems for structured and unstructured data, and data access layers using both Wrangler's replicated long-term storage system and its high-speed flash storage system. We will also introduce the Wrangler graphical interfaces, including the Wrangler Portal, web-based tools served by Wrangler such as Jupyter notebooks and RStudio, and the iDrop web interface for iRODS.

The afternoon session focuses on data-driven analysis support on Wrangler. The presentations center on the dynamically provisioned Hadoop ecosystem on Wrangler. They include an introduction to the core Hadoop cluster for big data analysis, using existing analysis routines through Hadoop Streaming, interactive analysis with Spark, and using Hadoop and Spark through the Python and R interfaces that are often more familiar to researchers.
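As a rough illustration of what running an analysis routine through Hadoop Streaming can look like, here is a minimal word-count sketch in Python. It is not taken from the tutorial materials. The function names `mapper` and `reducer` and the `map`/`reduce` command-line argument are hypothetical choices made for this example. The sketch relies only on the general Hadoop Streaming convention that records travel as tab-separated key/value lines on standard input and output, and that reducer input arrives sorted by key.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated "word<TAB>1" record per word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    """Sum the counts for each word.

    Assumes the input lines are already sorted by key, which is how
    Hadoop Streaming delivers mapper output to a reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical convention for this sketch: the script only reads stdin
# when it is called with "map" or "reduce" as its first argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

The same script can be checked locally without a cluster by chaining `mapper`, Python's `sorted`, and `reducer`, which mimics the map, shuffle, and reduce stages on a small input. How such a script is submitted on Wrangler itself, including the location of the streaming jar, is site-specific and is not shown here.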

Monday July 18, 2016 8:00am - 5:00pm EDT