The Michigan Institute for Data Science (MIDAS) is organizing a micro Big Data Analytics workshop to openly discuss, share, and collaborate on the foundations of a new Compressive Big Data Analytics (CBDA) theory. The highlights of the workshop are seminars by Dr. Saeid Amiri (Nebraska) and Dr. Ejaz Ahmed (Brock U).
Compressive Big Data Analytics (CBDA):
This is a high-risk, high-potential-impact idea. We are developing the foundations of a Compressive Big Data Analytics (CBDA) framework that iteratively generates random (sub)samples from a Big Data collection, uses classical techniques to develop model-based or non-parametric inference on each sample, repeats the (re)sampling and inference steps many times, and finally employs bootstrapping techniques to quantify probabilities, estimate likelihoods, or assess the accuracy of findings. We are looking for collaborators with students who can help with algorithmic development, derive upper-bound error estimates, and demonstrate the application of this technique on several large, heterogeneous, multi-source datasets.
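The resample, infer, repeat, and bootstrap loop described above can be illustrated with a minimal sketch. This is not the CBDA algorithm itself, which is still under development; it assumes a simple one-dimensional dataset, uses the sample mean as a stand-in for the "classical technique", and all function names and parameter values (`subsample_estimates`, `bootstrap_interval`, 200 rounds, subsamples of size 100) are hypothetical choices made for illustration.

```python
import random
import statistics


def subsample_estimates(data, n_rounds=200, sample_size=100,
                        estimator=statistics.mean, seed=0):
    """Draw many small random subsamples (without replacement within each
    subsample) and apply a classical estimator to each one."""
    rng = random.Random(seed)
    return [estimator(rng.sample(data, sample_size)) for _ in range(n_rounds)]


def bootstrap_interval(estimates, n_boot=1000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for the mean of the per-subsample
    estimates (resampling the estimates with replacement)."""
    rng = random.Random(seed)
    k = len(estimates)
    boot_means = sorted(
        statistics.mean(rng.choices(estimates, k=k)) for _ in range(n_boot)
    )
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic stand-in for a "Big Data" collection (assumed: Gaussian, mean 10).
gen = random.Random(42)
big_data = [gen.gauss(10.0, 2.0) for _ in range(100_000)]

estimates = subsample_estimates(big_data)
point = statistics.mean(estimates)
lower, upper = bootstrap_interval(estimates)
print(f"point estimate {point:.3f}, 95% bootstrap interval ({lower:.3f}, {upper:.3f})")
```

Each round touches only 100 of the 100,000 values, so the full collection never has to be processed at once; the bootstrap step then quantifies how much the aggregated estimate varies across rounds.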
We expect that the CBDA approach may provide a scalable solution that avoids some Big Data management and analytics challenges. CBDA sampling is conducted at the data-element level, not at the case level, and the sampled values are not necessarily consistent across all data elements (e.g., high-throughput random sampling from cases and variables within cases). We are now investigating the theoretical properties (e.g., asymptotics as sample sizes increase to infinity while the data remain sparse) of model-free inference based entirely on the complete dataset, without any parametric or model-limited restrictions.
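One possible reading of sampling at the data-element level is sketched below: random cases are drawn first, and then an independent random subset of variables is kept for each drawn case, so different cases contribute different variables. This is an assumption about what the description implies, not the authors' procedure; the table layout (a list of dictionaries), the function name `sample_elements`, and the sizes used are hypothetical.

```python
import random


def sample_elements(table, n_cases, n_vars, seed=0):
    """Pick random cases, then an independent random subset of variables
    within each picked case. Returns {case_index: {variable: value}}."""
    rng = random.Random(seed)
    variables = sorted(table[0])  # assumes every case has the same variables
    case_ids = rng.sample(range(len(table)), n_cases)
    return {
        i: {v: table[i][v] for v in rng.sample(variables, n_vars)}
        for i in case_ids
    }


# Toy table (assumed): 1,000 cases with 20 numeric variables each.
gen = random.Random(7)
toy_table = [{f"x{j}": gen.random() for j in range(20)} for _ in range(1000)]

sampled = sample_elements(toy_table, n_cases=50, n_vars=5)
first_case = next(iter(sampled.values()))
print(len(sampled), "cases sampled; first case keeps variables", sorted(first_case))
```

Because the variable subset is redrawn for every case, the resulting sample is ragged: no single variable is guaranteed to be observed for all sampled cases, which is the sense in which sampled values need not be consistent across data elements.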
Saeid Amiri
Presenter: Dr. Saeid Amiri (http://statistics.unl.edu/saeid-amiri)
Visit: April 21-23, 2015
Title: Random Subspace Scientific Inference based on High dimensional data
Seminar: Tuesday, April 21, 2015, 4:00-5:00 p.m., Palmer Commons, Great Lakes Room North
Abstract: Extraction of valuable information from Big data (n>>p) in high dimensions (p>>n) and the subsequent scientific inference using such derived information present considerable challenges in many medical, biological, social and data-driven sciences. In this talk, I will present statistical learning and unsupervised machine learning techniques for low-dimensional data and discuss a new subspace alternative approach. We will illustrate an extension method for higher dimensions and big data based on random subspaces. We provide a series of arguments to justify the new technique and will provide examples involving real and simulated data to compare our method with other related techniques.
Ejaz Ahmed
Presenter: Dr. Ejaz Ahmed (http://www.brocku.ca/mathematics-science/departments-and-centres/mathematics/people/professors/syed-ejaz-ahmed), distinguished professor at Brock University, Canada.
Visit: April 23-24, 2015
Title: TBD
Seminar: Friday, April 24, 2015, TBD (morning), place TBD
Abstract: TBD