
Few-Shot Learning for Effective Pattern Discovery: Overcoming Limitations of Traditional Data Mining in Micro Datasets

Anisha Bhowmick

Abstract


Traditional data mining research has heavily favored large datasets, known as “Big Data”, whose volume, velocity, and variety have driven the development and optimization of many algorithms. As a result, the unique challenges posed by smaller data volumes are usually ignored. This leaves a significant gap in the systematic analysis of the performance, robustness, and optimal parameter settings of classic data mining algorithms when they are applied to small datasets. Modern domains such as industrial IoT and rare disease diagnosis typically generate only small datasets, and classic algorithms applied to such “micro datasets” are prone to overfitting and poor results.

The aim of this study is to analyze the limitations and underperformance of traditional data mining algorithms such as Decision Trees, Apriori, and clustering, and to propose Few-Shot Learning (FSL) as an effective alternative for pattern discovery in “micro datasets”. The experiment is conducted in two phases. First, the traditional algorithms are applied to a micro dataset to document their performance and the reasons for their failure. Second, a prototypical FSL model is evaluated on the same dataset and the observations are recorded. The two approaches are then compared on the basis of generalized accuracy.
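
To make the second phase concrete, the following is a minimal sketch of the nearest-prototype decision rule that prototypical FSL models rely on. It is not the model evaluated in the study: it assumes an identity embedding (raw features are used directly, with no learned network), and the support set, the labels `normal` and `fault`, and the helper names `class_prototypes` and `predict` are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def class_prototypes(support_x, support_y):
    """Return one prototype (mean feature vector) per class label."""
    grouped = defaultdict(list)
    for features, label in zip(support_x, support_y):
        grouped[label].append(features)
    # Average each feature column over the support examples of a class.
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(prototypes, query):
    """Assign the query to the class with the nearest prototype (Euclidean)."""
    return min(prototypes, key=lambda label: math.dist(prototypes[label], query))


# Hypothetical 2-way, 3-shot support set with two numeric features.
# Assumption: features are used as-is instead of a learned embedding.
support_x = [[1.0, 1.2], [0.8, 1.0], [1.2, 0.9],
             [3.0, 3.1], [3.2, 2.9], [2.9, 3.3]]
support_y = ["normal", "normal", "normal", "fault", "fault", "fault"]

prototypes = class_prototypes(support_x, support_y)
print(predict(prototypes, [1.1, 1.0]))  # prints "normal"
print(predict(prototypes, [3.1, 3.0]))  # prints "fault"
```

Because each class is summarized by a single mean vector, the rule has very few free parameters, which is one plausible reason such models may overfit less than deep decision trees when only a handful of labeled examples per class are available.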



