

iPCA (interactive Principal Component Analysis) is a system with which users can interactively analyze data. It is built on Principal Component Analysis (PCA), a mathematical technique widely used in many fields for factor and trend analysis, dimension reduction, etc. However, PCA is often considered a "black box" operation whose results are difficult to interpret and sometimes counter-intuitive to the user. To help the user better understand and utilize PCA, the system visualizes the results of principal component analysis using multiple coordinated views and a rich set of user interactions. Our design philosophy is to support analysis of multivariate datasets through extensive interaction with the PCA output.
The interface consists of four views (A ~ D) and two control panels (E ~ F).
Projection View (A): Two principal components (by default, the first and second most dominant eigenvectors) are used to project data points onto a two-dimensional coordinate system.
Eigenvector View (B): In the Eigenvector View, data points are shown in the eigenspace. The calculated eigenvectors and their eigenvalues are displayed in a vertically projected parallel coordinates visualization, with eigenvectors ranked from top to bottom by dominance. The distances between eigenvectors in the parallel coordinates view vary with their eigenvalues, separating the eigenvectors according to their mathematical weights.
Data View (C): The Data View is located below the Projection View and shows a parallel coordinates visualization of all data points in the original data dimensions. In this view, an auto-scaling function is applied to increase the readability of the data.
Correlation View (D): Pearson correlation coefficients and relationships between variables are represented in the Correlation View as a matrix of scatter plots and values. Since correlations between dimensions are symmetric, repetition is avoided by separating the matrix into three components: the diagonal, the bottom triangle, and the top triangle. The diagonal displays the name of the dimension as a text string. The bottom triangle shows the coefficient value between two dimensions with a color indicating positive (red), neutral (white), and negative (blue) correlations. The top triangle contains cells of scatter plots in which all data items are projected onto the two intersecting dimensions. The colors of the data items are the same as those used in the other three views, so clusters are easily identified.
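The layout of the Correlation View can be illustrated with a short Python sketch. It is not iPCA's own code: the function names (`pearson`, `coefficient_color`, `correlation_cells`) are hypothetical, the linear blend between blue, white, and red is an assumption (the description above only names the three anchor colors), and so is the choice of which dimension goes on which scatter plot axis.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant dimension is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)

def coefficient_color(r):
    """Map r in [-1, 1] to an (R, G, B) tuple: blue, white, red.

    Assumption: a linear blend; iPCA's actual color scale may differ.
    """
    r = max(-1.0, min(1.0, r))
    fade = round(255 * (1 - abs(r)))  # 255 at r = 0 (white), 0 at |r| = 1
    return (255, fade, fade) if r >= 0 else (fade, fade, 255)

def correlation_cells(names, columns):
    """Build the matrix: names on the diagonal, coefficients in the
    bottom triangle, scatter plot points in the top triangle."""
    n = len(names)
    cells = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                cells[i][j] = ("name", names[i])
            elif i > j:
                r = pearson(columns[i], columns[j])
                cells[i][j] = ("coefficient", r, coefficient_color(r))
            else:
                # assumption: column dimension on X, row dimension on Y
                cells[i][j] = ("scatter", list(zip(columns[j], columns[i])))
    return cells

if __name__ == "__main__":
    # First two dimensions of the five sample rows shown under "Input file format".
    names = ["Sepal_Length", "Sepal_Width"]
    columns = [
        [0.556, 0.667, 0.778, 0.833, 0.611],
        [-0.250, 0.167, 0.000, 0.083, -0.333],
    ]
    for row in correlation_cells(names, columns):
        print([cell[0] for cell in row])
```

Because the coefficient matrix is symmetric, only the cells with `i > j` need a coefficient, which frees the top triangle for the scatter plots.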
A simple description (how to use): Video (WMV)
- Data loading
  - Click the "Browse..." button and select the data file you want to load. Then click the "Start Loading" button located below the "Browse..." button.
- Navigation (works only in the Projection View)
  - Zooming: press the left mouse button to zoom in, or the right mouse button to zoom out.
  - Panning: hold the middle mouse button and move the mouse.
- Item selection (available in all views)
  - Single item selection (Ctrl + left mouse click): useful for selecting an individual item.
  - Range selection (Alt + left mouse button, then drag to create a region boundary): useful for selecting multiple items.
- Changing the pre-selected principal components
  By default, the first principal component (PC1) and the second principal component (PC2) are mapped to the X-axis and Y-axis of the Projection View, respectively.
  In the options (located next to the Eigenvector View), the user can change the pre-selected PC1 and PC2. For example, to change the selection from PC2 to PC3, first click the PC2 check box and then click the PC3 check box. The Projection View then updates to show PC1 on the X-axis and PC3 on the Y-axis.
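To make the axis mapping concrete, the following Python sketch projects data onto any two chosen principal components. It is an illustration only, not iPCA's implementation: all function names are hypothetical, and the eigenvectors are found with plain power iteration plus deflation, a simple stand-in for whatever eigen-solver the system actually uses. Power iteration converges slowly when two eigenvalues are nearly equal, and the sign of each eigenvector is arbitrary.

```python
import math

def center(rows):
    """Subtract the per-dimension mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_eigenpairs(matrix, k, iterations=1000):
    """Return the k most dominant (eigenvalue, eigenvector) pairs of a
    symmetric positive semi-definite matrix, most dominant first."""
    d = len(matrix)
    m = [row[:] for row in matrix]  # work on a copy; deflation modifies it
    pairs = []
    for _component in range(k):
        # Fixed, non-uniform start vector (an arbitrary choice for this sketch).
        v = [float(i + 1) for i in range(d)]
        norm = math.sqrt(dot(v, v))
        v = [x / norm for x in v]
        for _step in range(iterations):
            w = [dot(row, v) for row in m]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break  # no variance left in the deflated matrix
            v = [x / norm for x in w]
        eigenvalue = dot(v, [dot(row, v) for row in m])
        pairs.append((eigenvalue, v))
        # Deflation: remove the component just found before the next search.
        for i in range(d):
            for j in range(d):
                m[i][j] -= eigenvalue * v[i] * v[j]
    return pairs

def project(rows, x_pc=1, y_pc=2):
    """Project rows onto two principal components (1-based indices).

    The defaults mirror the default view: PC1 on X, PC2 on Y.
    """
    centered = center(rows)
    pairs = top_eigenpairs(covariance(centered), max(x_pc, y_pc))
    x_axis = pairs[x_pc - 1][1]
    y_axis = pairs[y_pc - 1][1]
    return [(dot(r, x_axis), dot(r, y_axis)) for r in centered]
```

Under these assumptions, `project(rows)` corresponds to the default PC1/PC2 view, and `project(rows, 1, 3)` corresponds to the view obtained after switching the Y-axis from PC2 to PC3.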
Input file format
#0303# [file_format_indicator]
150 4 [row col]
class Sepal_Length Sepal_Width Petal_Length Petal_Width [class_indicator dimension_variables]
setosa 0.556 -0.250 0.864 0.917 [class_info values_of_each_item]
setosa 0.667 0.167 0.864 0.917
setosa 0.778 0.000 0.898 0.917
setosa 0.833 0.083 0.831 0.917
setosa 0.611 -0.333 0.864 0.917
...
virginica 0.111 0.167 -0.390 -0.417
Caution:
- Variable and class names must not contain spaces. If a variable or class name in the dataset includes a space character, replace it with another character such as an underscore ("_").
- Categorical variables are not allowed in iPCA. If you have categorical variables, convert them to numerical form.
Other sample datasets can be obtained from the UCI Machine Learning Repository.
- Sample datasets: Iris, Ecoli, Wine
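Before loading a file into iPCA, it can help to check that it follows the input file format described above. The Python sketch below is one way to do that; it is not part of iPCA, and the function name `parse_ipca` is hypothetical. It assumes that fields are separated by whitespace, that the bracketed notes in the listing (such as `[row col]`) are explanatory annotations rather than file content, and that the declared row and column counts must match the data. Whether iPCA itself enforces these counts is not stated here.

```python
import re

# Assumption: a trailing "[...]" is an explanatory annotation, not data.
_ANNOTATION = re.compile(r"\s*\[[^\]]*\]\s*$")

def _clean(line):
    """Strip a trailing [annotation] and surrounding whitespace."""
    return _ANNOTATION.sub("", line).strip()

def parse_ipca(text):
    """Parse the text of an input file.

    Returns (dimension_names, class_labels, values), where values is a
    list of rows of floats. Raises ValueError on malformed input.
    """
    lines = [_clean(line) for line in text.splitlines()]
    lines = [line for line in lines if line]
    if len(lines) < 3 or lines[0] != "#0303#":
        raise ValueError("expected the #0303# file format indicator on line 1")

    n_rows, n_cols = (int(token) for token in lines[1].split())

    header = lines[2].split()
    if len(header) != n_cols + 1:
        raise ValueError("header must hold a class indicator plus "
                         f"{n_cols} dimension names (no spaces in names)")
    dimensions = header[1:]

    records = lines[3:]
    if len(records) != n_rows:
        raise ValueError(f"declared {n_rows} rows, found {len(records)}")

    classes, values = [], []
    for number, record in enumerate(records, start=1):
        tokens = record.split()
        if len(tokens) != n_cols + 1:
            raise ValueError(f"row {number}: expected a class name and "
                             f"{n_cols} values (no spaces in names)")
        classes.append(tokens[0])
        # float() raises ValueError for non-numeric (categorical) values.
        values.append([float(token) for token in tokens[1:]])
    return dimensions, classes, values
```

A name containing a space shows up as a wrong token count, and a categorical value fails the `float()` conversion, so both cautions above are caught as a `ValueError`.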
Published Papers
- Dong Hyun Jeong, Caroline Ziemkiewicz, William Ribarsky, and Remco Chang, Understanding Principal Component Analysis Using a Visual Analytics Tool, UKC 2009, Mathematics: Fundamentals and Applications, 2009. [PDF]
- Dong Hyun Jeong, Caroline Ziemkiewicz, Brian Fisher, William Ribarsky, Remco Chang, iPCA: An Interactive System for PCA-based Visual Analytics, Computer Graphics Forum (Eurovis 2009). pp. 767-774, 2009. [PDF]
Contact
If you would like to try the system, please send an email to Dong Hyun Jeong (djeong[at]udc.edu). In addition, please send us your feedback, comments, and issues (including system errors) to help us improve the system. Thank you.