Big data refers to collections of data sets so large and complex that they are difficult to process with traditional data processing applications or methods and cannot be managed as a single instance. Big data analytics, in turn, is the process of examining these large and varied sets of information to develop a more coherent understanding of the data by uncovering hidden correlations, patterns, and trends. Such large-scale analysis is typically performed by a "supercomputer" or by several high-performance computers (HPCs) linked together to pool computing resources. This arrangement is called a "cluster," and each individual computer in it is a "node"; the nodes work in parallel to solve problems, or in this instance, to analyze big data.
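The division of labor just described can be sketched in miniature on a single machine. The following Python example is an illustrative analogy, not the paper's actual system: each worker process stands in for a cluster node, each receives a share of the data, and the partial results are merged at the end. The skill-counting task, function names, and sample records are hypothetical.

```python
from multiprocessing import Pool
from collections import Counter

def count_skills(chunk):
    # One "node's" share of the work: tally skill mentions in its chunk.
    tally = Counter()
    for record in chunk:
        for skill in record:
            tally[skill.lower()] += 1
    return tally

def parallel_skill_count(records, workers=2):
    # Split the records round-robin across workers (the "nodes"),
    # run the tallies in parallel, then merge the partial results.
    chunks = [records[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        partials = pool.map(count_skills, chunks)
    total = Counter()
    for partial in partials:
        total += partial
    return total

if __name__ == "__main__":
    # Hypothetical records: each lists the skills named in one job posting.
    data = [["Python", "SQL"], ["sql", "Linux"], ["python"]]
    print(parallel_skill_count(data, workers=2))
```

On a real cluster the same pattern holds, except the "workers" are separate machines coordinated over a network (for example with MPI) rather than processes sharing one CPU.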
This paper addresses the planning and construction of a small-scale supercomputer (smaller than a breadbox) to analyze data sets pertaining to employable skills for students entering the technology job field.