sklearn.decomposition.PCA — scikit-learn 1.4.1 documentation


sklearn.decomposition.PCA

class sklearn.decomposition.PCA(n_components=None, *, copy=True, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', n_oversamples=10, power_iteration_normalizer='auto', random_state=None)

Principal component analysis (PCA).

Linear dimensionality reduction using Singular Value Decomposition of the data to project it to a lower dimensional space. The input data is centered but not scaled for each feature before applying the SVD.

It uses the LAPACK implementation of the full SVD or a randomized truncated SVD by the method of Halko et al. 2009, depending on the shape of the input data and the number of components to extract.

It can also use the scipy.sparse.linalg ARPACK implementation of the truncated SVD.

Notice that this class does not support sparse input. See TruncatedSVD for an alternative with sparse data.

For a usage example, see PCA example with Iris Data-set.

Read more in the User Guide.

Parameters:

n_components : int, float or 'mle', default=None
Number of components to keep. If n_components is not set, all components are kept: n_components == min(n_samples, n_features). If n_components == 'mle' and svd_solver == 'full', Minka's MLE is used to guess the dimension. Use of n_components == 'mle' will interpret svd_solver == 'auto' as svd_solver == 'full'. If 0 < n_components < 1 and svd_solver == 'full', select the number of components such that the amount of variance that needs to be explained is greater than the percentage specified by n_components. If svd_solver == 'arpack', the number of components must be strictly less than the minimum of n_features and n_samples. Hence, the None case results in n_components == min(n_samples, n_features) - 1.

copy : bool, default=True
If False, data passed to fit are overwritten and running fit(X).transform(X) will not yield the expected results; use fit_transform(X) instead.

whiten : bool, default=False
When True (False by default) the components_ vectors are multiplied by the square root of n_samples and then divided by the singular values to ensure uncorrelated outputs with unit component-wise variances. Whitening removes some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.

svd_solver : {'auto', 'full', 'arpack', 'randomized'}, default='auto'
If 'auto': the solver is selected by a default policy based on X.shape and n_components: if the input data is larger than 500x500 and the number of components to extract is lower than 80% of the smallest dimension of the data, then the more efficient 'randomized' method is enabled. Otherwise the exact full SVD is computed and optionally truncated afterwards.
If 'full': run exact full SVD calling the standard LAPACK solver via scipy.linalg.svd and select the components by postprocessing.
If 'arpack': run SVD truncated to n_components calling the ARPACK solver via scipy.sparse.linalg.svds. It requires strictly 0 < n_components < min(X.shape).
If 'randomized': run randomized SVD by the method of Halko et al.
New in version 0.18.0.

tol : float, default=0.0
Tolerance for singular values computed by svd_solver == 'arpack'. Must be in the range [0.0, infinity). New in version 0.18.0.

iterated_power : int or 'auto', default='auto'
Number of iterations for the power method computed by svd_solver == 'randomized'. Must be in the range [0, infinity). New in version 0.18.0.

n_oversamples : int, default=10
Only relevant when svd_solver="randomized". It corresponds to the additional number of random vectors used to sample the range of X so as to ensure proper conditioning. See randomized_svd for more details. New in version 1.1.

power_iteration_normalizer : {'auto', 'QR', 'LU', 'none'}, default='auto'
Power iteration normalizer for the randomized SVD solver. Not used by ARPACK. See randomized_svd for more details. New in version 1.1.

random_state : int, RandomState instance or None, default=None
Used when the 'arpack' or 'randomized' solvers are used. Pass an int for reproducible results across multiple function calls. See Glossary. New in version 0.18.0.
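As a quick, hedged illustration of the parameters above (the data, variable names, and thresholds below are my own and not part of this page): passing a float n_components keeps just enough components to reach that fraction of explained variance, and whiten=True decorrelates the output to unit variance.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))   # correlated features

pca = PCA(n_components=0.90, svd_solver='full')           # keep at least 90% of the variance
Z = pca.fit_transform(X)
print(pca.n_components_, pca.explained_variance_ratio_.sum())

pca_w = PCA(n_components=2, whiten=True).fit(X)
Zw = pca_w.transform(X)
print(np.round(np.cov(Zw, rowvar=False), 3))              # approximately the identity matrix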

Attributes:

components_ : ndarray of shape (n_components, n_features)
Principal axes in feature space, representing the directions of maximum variance in the data. Equivalently, the right singular vectors of the centered input data, parallel to its eigenvectors. The components are sorted by decreasing explained_variance_.

explained_variance_ : ndarray of shape (n_components,)
The amount of variance explained by each of the selected components. The variance estimation uses n_samples - 1 degrees of freedom. Equal to the n_components largest eigenvalues of the covariance matrix of X. New in version 0.18.

explained_variance_ratio_ : ndarray of shape (n_components,)
Percentage of variance explained by each of the selected components. If n_components is not set then all components are stored and the sum of the ratios is equal to 1.0.

singular_values_ : ndarray of shape (n_components,)
The singular values corresponding to each of the selected components. The singular values are equal to the 2-norms of the n_components variables in the lower-dimensional space. New in version 0.19.

mean_ : ndarray of shape (n_features,)
Per-feature empirical mean, estimated from the training set. Equal to X.mean(axis=0).

n_components_ : int
The estimated number of components. When n_components is set to 'mle' or a number between 0 and 1 (with svd_solver == 'full') this number is estimated from input data. Otherwise it equals the parameter n_components, or the lesser value of n_features and n_samples if n_components is None.

n_samples_ : int
Number of samples in the training data.

noise_variance_ : float
The estimated noise covariance following the Probabilistic PCA model from Tipping and Bishop 1999. See "Pattern Recognition and Machine Learning" by C. Bishop, 12.2.1 p. 574 or http://www.miketipping.com/papers/met-mppca.pdf. It is required to compute the estimated data covariance and score samples. Equal to the average of the (min(n_features, n_samples) - n_components) smallest eigenvalues of the covariance matrix of X.

n_features_in_ : int
Number of features seen during fit. New in version 0.24.

feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings. New in version 1.0.
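The attributes are linked by a simple identity: explained_variance_ equals singular_values_**2 / (n_samples - 1). The hedged check below (my own addition) uses the same toy array as the Examples section further down.

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
pca = PCA(n_components=2).fit(X)

print(pca.explained_variance_)
print(pca.singular_values_ ** 2 / (X.shape[0] - 1))   # same values
print(pca.explained_variance_ratio_.sum())            # all components kept, so this sums to 1.0
print(pca.mean_)                                      # equal to X.mean(axis=0)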

See also

KernelPCA : Kernel Principal Component Analysis.
SparsePCA : Sparse Principal Component Analysis.
TruncatedSVD : Dimensionality reduction using truncated SVD.
IncrementalPCA : Incremental Principal Component Analysis.

References

For n_components == 'mle', this class uses the method from: Minka, T. P. "Automatic choice of dimensionality for PCA". In NIPS, pp. 598-604.

Implements the probabilistic PCA model from: Tipping, M. E., and Bishop, C. M. (1999). "Probabilistic principal component analysis". Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3), 611-622, via the score and score_samples methods.

For svd_solver == 'arpack', refer to scipy.sparse.linalg.svds.

For svd_solver == 'randomized', see: Halko, N., Martinsson, P. G., and Tropp, J. A. (2011). "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions". SIAM Review, 53(2), 217-288; and also Martinsson, P. G., Rokhlin, V., and Tygert, M. (2011). "A randomized algorithm for the decomposition of matrices". Applied and Computational Harmonic Analysis, 30(1), 47-68.

Examples

>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = PCA(n_components=2)
>>> pca.fit(X)
PCA(n_components=2)
>>> print(pca.explained_variance_ratio_)
[0.9924... 0.0075...]
>>> print(pca.singular_values_)
[6.30061... 0.54980...]

>>> pca = PCA(n_components=2, svd_solver='full')
>>> pca.fit(X)
PCA(n_components=2, svd_solver='full')
>>> print(pca.explained_variance_ratio_)
[0.9924... 0.00755...]
>>> print(pca.singular_values_)
[6.30061... 0.54980...]

>>> pca = PCA(n_components=1, svd_solver='arpack')
>>> pca.fit(X)
PCA(n_components=1, svd_solver='arpack')
>>> print(pca.explained_variance_ratio_)
[0.99244...]
>>> print(pca.singular_values_)
[6.30061...]

Methods

fit(X[, y]) : Fit the model with X.
fit_transform(X[, y]) : Fit the model with X and apply the dimensionality reduction on X.
get_covariance() : Compute data covariance with the generative model.
get_feature_names_out([input_features]) : Get output feature names for transformation.
get_metadata_routing() : Get metadata routing of this object.
get_params([deep]) : Get parameters for this estimator.
get_precision() : Compute data precision matrix with the generative model.
inverse_transform(X) : Transform data back to its original space.
score(X[, y]) : Return the average log-likelihood of all samples.
score_samples(X) : Return the log-likelihood of each sample.
set_output(*[, transform]) : Set output container.
set_params(**params) : Set the parameters of this estimator.
transform(X) : Apply dimensionality reduction to X.

fit(X, y=None)

Fit the model with X.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
Training data, where n_samples is the number of samples and n_features is the number of features.
y : Ignored
Ignored.

Returns:
self : object
Returns the instance itself.

fit_transform(X, y=None)

Fit the model with X and apply the dimensionality reduction on X.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
Training data, where n_samples is the number of samples and n_features is the number of features.
y : Ignored
Ignored.

Returns:
X_new : ndarray of shape (n_samples, n_components)
Transformed values.

Notes
This method returns a Fortran-ordered array. To convert it to a C-ordered array, use 'np.ascontiguousarray'.

get_covariance()

Compute data covariance with the generative model.

cov = components_.T * S**2 * components_ + sigma2 * eye(n_features)

where S**2 contains the explained variances, and sigma2 contains the noise variances.

Returns:
cov : array of shape (n_features, n_features)
Estimated covariance of data.
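A hedged sketch (toy data of my own) of what get_covariance returns: when no components are discarded, the model covariance matches the empirical covariance of the training data up to numerical precision.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))

pca = PCA(n_components=3).fit(X)
cov_model = pca.get_covariance()
cov_emp = np.cov(X, rowvar=False)            # empirical covariance (ddof=1)
print(np.round(cov_model - cov_emp, 8))      # close to zero because no variance was dropped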

get_feature_names_out(input_features=None)

Get output feature names for transformation.

The feature names out will be prefixed by the lowercased class name. For example, if the transformer outputs 3 features, then the feature names out are: ["class_name0", "class_name1", "class_name2"].

Parameters:
input_features : array-like of str or None, default=None
Only used to validate feature names with the names seen in fit.

Returns:
feature_names_out : ndarray of str objects
Transformed feature names.
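A short, hedged example of the naming scheme (the DataFrame and column names are my own illustration): for PCA, the lowercased class name gives output names of the form pca0, pca1, and so on.

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

df = pd.DataFrame(np.random.rand(10, 3), columns=["a", "b", "c"])
pca = PCA(n_components=2).fit(df)
print(pca.get_feature_names_out())   # ['pca0' 'pca1']
print(pca.feature_names_in_)         # ['a' 'b' 'c']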

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest
A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict
Parameter names mapped to their values.

get_precision()

Compute data precision matrix with the generative model.

Equals the inverse of the covariance but computed with the matrix inversion lemma for efficiency.

Returns:
precision : array of shape (n_features, n_features)
Estimated precision of data.

inverse_transform(X)

Transform data back to its original space.

In other words, return an input X_original whose transform would be X.

Parameters:
X : array-like of shape (n_samples, n_components)
New data, where n_samples is the number of samples and n_components is the number of components.

Returns:
X_original : array-like of shape (n_samples, n_features)
Original data, where n_samples is the number of samples and n_features is the number of features.

Notes
If whitening is enabled, inverse_transform will compute the exact inverse operation, which includes reversing whitening.
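A hedged sketch of the round trip on toy data of my own: projecting to fewer components and mapping back with inverse_transform yields an approximation of X whose error shrinks as more components are kept.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 10))

for k in (2, 5, 10):
    pca = PCA(n_components=k).fit(X)
    X_hat = pca.inverse_transform(pca.transform(X))
    rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
    print(k, round(rel_err, 4))   # relative reconstruction error decreases as k grows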

score(X, y=None)

Return the average log-likelihood of all samples.

See "Pattern Recognition and Machine Learning" by C. Bishop, 12.2.1 p. 574 or http://www.miketipping.com/papers/met-mppca.pdf.

Parameters:
X : array-like of shape (n_samples, n_features)
The data.
y : Ignored
Ignored.

Returns:
ll : float
Average log-likelihood of the samples under the current model.

score_samples(X)

Return the log-likelihood of each sample.

See "Pattern Recognition and Machine Learning" by C. Bishop, 12.2.1 p. 574 or http://www.miketipping.com/papers/met-mppca.pdf.

Parameters:
X : array-like of shape (n_samples, n_features)
The data.

Returns:
ll : ndarray of shape (n_samples,)
Log-likelihood of each sample under the current model.
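A hedged sketch (my own data) of how the probabilistic scores can be used: samples that look like the training data receive higher per-sample log-likelihoods than obvious outliers.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 5))
pca = PCA(n_components=2).fit(X)

print(pca.score(X))                          # average log-likelihood of the training samples
outliers = rng.normal(size=(5, 5)) * 10.0    # much wider spread than the training data
print(pca.score_samples(outliers))           # typically much lower per-sample log-likelihoods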

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform : {"default", "pandas", "polars"}, default=None
Configure output of transform and fit_transform.
"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged
New in version 1.4: "polars" option was added.

Returns:
self : estimator instance
Estimator instance.
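A hedged example of the pandas output mode (the data is my own; the column names follow get_feature_names_out):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(20, 4)
pca = PCA(n_components=2).set_output(transform="pandas")
df_out = pca.fit_transform(X)
print(type(df_out))          # a pandas DataFrame
print(list(df_out.columns))  # ['pca0', 'pca1']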

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:
**params : dict
Estimator parameters.

Returns:
self : estimator instance
Estimator instance.

transform(X)

Apply dimensionality reduction to X.

X is projected on the first principal components previously extracted from a training set.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
New data, where n_samples is the number of samples and n_features is the number of features.

Returns:
X_new : array-like of shape (n_samples, n_components)
Projection of X in the first principal components, where n_samples is the number of samples and n_components is the number of components.

Examples using sklearn.decomposition.PCA

Release Highlights for scikit-learn 1.4
A demo of K-Means clustering on the handwritten digits data
Principal Component Regression vs Partial Least Squares Regression
The Iris Dataset
Blind source separation using FastICA
Comparison of LDA and PCA 2D projection of Iris dataset
Faces dataset decompositions
Factor Analysis (with rotation) to visualize patterns
FastICA on 2D point clouds
Incremental PCA
Kernel PCA
Model selection with Probabilistic PCA and Factor Analysis (FA)
PCA example with Iris Data-set
Faces recognition example using eigenfaces and SVMs
Image denoising using kernel PCA
Multi-dimensional scaling
Displaying Pipelines
Explicit feature map approximation for RBF kernels
Multilabel classification
Balance model complexity and cross-validated score
Dimensionality Reduction with Neighborhood Components Analysis
Kernel Density Estimation
Column Transformer with Heterogeneous Data Sources
Concatenating multiple feature extraction methods
Pipelining: chaining a PCA and a logistic regression
Selecting dimensionality reduction with Pipeline and GridSearchCV
Importance of Feature Scaling


PCA (Principal Component Analysis) in sklearn (Zhihu, column "Data Analysis", by HarryYang)

PCA (Principal Component Analysis) is a very important method in machine learning, used mainly for dimensionality reduction and for visualization. Beyond the deep mathematics behind it, the PCA workflow itself carries useful ideas and techniques.

1. Prepare the dataset

This article uses the handwritten digits dataset from sklearn.datasets to demonstrate the PCA class in sklearn. Import the data and take a quick look:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()
X = digits.data
y = digits.target
X.shape, y.shape
# ((1797, 64), (1797,))

Split the data into a training set and a test set:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)
X_train.shape, X_test.shape
# ((1347, 64), (450, 64))

2. Reduce the dimensionality of the dataset with PCA

We compare the model's training time and score before and after dimensionality reduction. First, fit a classifier directly on the raw data, timing the fit and checking the score:

%%time
from sklearn.neighbors import KNeighborsClassifier
knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train, y_train)
# Wall time: 88 ms

knn_clf.score(X_test, y_test)
# 0.9866666666666667

Now reduce the dataset with sklearn's PCA and check the training time and score after reduction:

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
pca.fit(X_train)
X_train_reduction = pca.transform(X_train)
X_test_reduction = pca.transform(X_test)

%%time
knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train_reduction, y_train)
# Wall time: 3 ms

knn_clf.score(X_test_reduction, y_test)
# 0.6066666666666667

The training time drops markedly, but the accuracy is not acceptable. Use pca.explained_variance_ratio_ to check how much variance the two retained dimensions explain:

pca.explained_variance_ratio_
# array([0.14566817, 0.13735469])

pca = PCA(n_components=X_train.shape[1])   # compute the explained variance ratio of every dimension
pca.fit(X_train)
pca.explained_variance_ratio_
# array([1.45668166e-01, 1.37354688e-01, 1.17777287e-01, 8.49968861e-02,
#        5.86018996e-02, 5.11542945e-02, 4.26605279e-02, 3.60119663e-02,
#        3.41105814e-02, 3.05407804e-02, 2.42337671e-02, 2.28700570e-02,
#        1.80304649e-02, 1.79346003e-02, 1.45798298e-02, 1.42044841e-02,
#        1.29961033e-02, 1.26617002e-02, 1.01728635e-02, 9.09314698e-03,
#        8.85220461e-03, 7.73828332e-03, 7.60516219e-03, 7.11864860e-03,
#        6.85977267e-03, 5.76411920e-03, 5.71688020e-03, 5.08255707e-03,
#        4.89020776e-03, 4.34888085e-03, 3.72917505e-03, 3.57755036e-03,
#        3.26989470e-03, 3.14917937e-03, 3.09269839e-03, 2.87619649e-03,
#        2.50362666e-03, 2.25417403e-03, 2.20030857e-03, 1.98028746e-03,
#        1.88195578e-03, 1.52769283e-03, 1.42823692e-03, 1.38003340e-03,
#        1.17572392e-03, 1.07377463e-03, 9.55152460e-04, 9.00017642e-04,
#        5.79162563e-04, 3.82793717e-04, 2.38328586e-04, 8.40132221e-05,
#        5.60545588e-05, 5.48538930e-05, 1.08077650e-05, 4.01354717e-06,
#        1.23186515e-06, 1.05783059e-06, 6.06659094e-07, 5.86686040e-07,
#        1.71368535e-33, 7.44075955e-34, 7.44075955e-34, 7.15189459e-34])

Plot the cumulative explained variance ratio against the number of features kept:

plt.plot([i for i in range(X_train.shape[1])],
         [np.sum(pca.explained_variance_ratio_[:i+1]) for i in range(X_train.shape[1])])
plt.show()

sklearn already wraps this idea as a hyperparameter: you can pass the desired fraction of explained variance directly to PCA and it determines the number of components for you. Here we keep 95% of the variance:

pca = PCA(0.95)
pca.fit(X_train)
# PCA(copy=True, iterated_power='auto', n_components=0.95, random_state=None,
#     svd_solver='auto', tol=0.0, whiten=False)

# check how many components were selected
pca.n_components_
# 28

X_train_reduction = pca.transform(X_train)
X_test_reduction = pca.transform(X_test)

# explained variance ratio of each retained component
pca.explained_variance_ratio_
# array([0.14566817, 0.13735469, 0.11777729, 0.08499689, 0.0586019 ,
#        0.05115429, 0.04266053, 0.03601197, 0.03411058, 0.03054078,
#        0.02423377, 0.02287006, 0.01803046, 0.0179346 , 0.01457983,
#        0.01420448, 0.0129961 , 0.0126617 , 0.01017286, 0.00909315,
#        0.0088522 , 0.00773828, 0.00760516, 0.00711865, 0.00685977,
#        0.00576412, 0.00571688, 0.00508256])

%%time
# the fit time drops noticeably
knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train_reduction, y_train)
# Wall time: 8 ms

knn_clf.score(X_test_reduction, y_test)
# 0.98

After reducing the dimensionality from 64 to 28, the score drops only slightly while the training time drops a great deal.

3. PCA for visualization

Even though reducing the dimensionality loses some information, projecting down to two dimensions for visualization is a very useful technique:

pca = PCA(n_components=2)
pca.fit(X)
X_reduction = pca.transform(X)

# scatter plot of the first two principal components, colored by class
for i in range(10):
    plt.scatter(X_reduction[y == i, 0], X_reduction[y == i, 1], alpha=0.8)
plt.show()

As the plot shows, projecting high-dimensional data down to two dimensions and visualizing it gives an intuitive feel for a large amount of data.

(Published on Zhihu, 2020-01-29, in the column "Data Analysis".)

A detailed look at principal component analysis (PCA) in the sklearn library: parameters, attributes, and methods (CSDN blog, by SGangX)

Original link: https://blog.csdn.net/weixin_44781900/article/details/104839136

Contents: Principal component analysis (PCA) · PCA in sklearn · I. Parameters · II. Attributes · III. Methods · IV. Example · V. References

Principal component analysis (PCA)

The idea of principal component analysis (PCA) is to map n-dimensional features onto k dimensions (k < n). These k dimensions are brand-new orthogonal features (a new coordinate system) called principal components; they are reconstructed from the original n features rather than simply being a subset of them.

This article focuses on the PCA class in the sklearn library and explains its parameters, attributes, and methods.

PCA in sklearn

I. Parameters

sklearn.decomposition.PCA(n_components=None, copy=True, whiten=False)

1. n_components : int, float, None or str. The number of principal components to return, i.e. how many dimensions you want to keep. n_components=2 returns the first 2 principal components. 0 < n_components < 1 keeps enough components to reach the given cumulative explained-variance ratio; for example n_components=0.98 returns enough principal components to explain 98% of the variance. n_components=None returns all components. n_components='mle' selects the number of components automatically so that the required fraction of variance is satisfied.

2. copy : bool, True or False, default True. Whether the original data is copied while the algorithm runs (dimensionality reduction modifies the data it works on). With the default copy=True the training data is left untouched, and fit(X).transform(X) or fit_transform(X) both return the reduced data; with copy=False the data passed to fit is overwritten in place, so fit(X).transform(X) will not give the expected result and fit_transform(X) should be used instead. (fit_transform() is described below.)

3. whiten : bool, default False. Whitening. Whitening is an important preprocessing step whose purpose is to reduce redundancy in the input data, so that the whitened data has (i) low correlation between features and (ii) the same variance for every feature. A hedged code illustration follows this parameter list.

4. svd_solver : str, one of {'auto', 'full', 'arpack', 'randomized'}. Chooses the SVD method. svd_solver='auto' lets the PCA class pick among the other three options. svd_solver='full' is the traditional full SVD, using the scipy implementation. svd_solver='arpack' uses the sparse SVD from scipy directly, with use cases similar to 'randomized'. svd_solver='randomized' is suited to data with many samples and many features when the requested number of components is comparatively low.
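As promised above, a hedged illustration of the whitening description (my own toy data): after whiten=True the transformed features are essentially uncorrelated and share the same unit variance.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))   # correlated features

Z = PCA(n_components=4, whiten=True).fit_transform(X)
print(np.round(np.cov(Z, rowvar=False), 3))               # approximately the identity matrix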

II. Attributes

1. components_ : the principal components with maximum variance.
2. explained_variance_ : the variance of each principal component after the reduction; the larger the variance, the more important the component.
3. explained_variance_ratio_ : the fraction of the total variance accounted for by each principal component; the larger this ratio, the more important the component (the variance contribution ratio).
4. singular_values_ : the singular values corresponding to the selected components. Dimensionality reduction can be implemented either with an eigenvalue decomposition or with a singular value decomposition; the former is more restrictive (it needs a square matrix), while the latter works for any matrix and costs less computation, so PCA is generally implemented with SVD.
5. mean_ : the empirical mean of each feature, estimated from the training set.
6. n_features_ : the number of features in the training data.
7. n_samples_ : the number of samples in the training data.
8. noise_variance_ : the estimated noise covariance.

III. Methods

1. fit(self, X, y=None) : train the model. Since PCA is unsupervised learning, y=None and there are no labels. For example:

model = decomposition.PCA(n_components=2)
model.fit(X)

2. fit_transform(self, X, y=None) : train the model on X and reduce X at the same time, returning the reduced data. For example:

X_new = model.fit_transform(X)

3. get_covariance(self) : compute the data covariance (with the generative model).
4. get_params(self, deep=True) : return the model's parameters. For example:

print(model.get_params())
# {'copy': True, 'iterated_power': 'auto', 'n_components': 3, 'random_state': None, 'svd_solver': 'auto', 'tol': 0.0, 'whiten': False}

5. get_precision(self) : compute the data precision matrix (with the generative model).
6. inverse_transform(self, X) : transform reduced data back to the original space; the result may not be exactly identical to the original data.
7. score(self, X, y=None) : compute the average log-likelihood of all samples.
8. transform(X) : transform X into the reduced space. Once the model is trained, new input data can be reduced with the transform method.

IV. Example

import numpy as np
import matplotlib.pyplot as plt
from sklearn import decomposition, datasets

iris = datasets.load_iris()      # load the data
X = iris['data']
model = decomposition.PCA(n_components=2)
model.fit(X)
X_new = model.fit_transform(X)
Maxcomponent = model.components_
ratio = model.explained_variance_ratio_
score = model.score(X)
print('Reduced data:', X_new)
print('Components with maximum variance:', Maxcomponent)
print('Explained variance ratio of the retained components:', ratio)
print('Average log-likelihood of all samples:', score)
print('Singular values:', model.singular_values_)
print('Noise covariance:', model.noise_variance_)

g1 = plt.figure(1, figsize=(8, 6))
plt.scatter(X_new[:, 0], X_new[:, 1], c='r', cmap=plt.cm.Set1, edgecolor='k', s=40)
plt.xlabel('x1')
plt.ylabel('x2')
plt.title('After the dimension reduction')
plt.show()

V. References

Principal components analysis, the maximum-variance interpretation: https://www.cnblogs.com/jerrylead/archive/2011/04/18/2020209.html
Principal components analysis, the least-squares-error interpretation: https://www.cnblogs.com/jerrylead/archive/2011/04/18/2020216.html
Machine learning (7): whitening: https://blog.csdn.net/hjimce/article/details/50864602?depth_1-utm_source=distribute.pc_relevant.none-task&utm_source=distribute.pc_relevant.none-task
scikit-learn source code for dimensionality reduction, PCA: https://zhuanlan.zhihu.com/p/53268659
PCA in sklearn: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html?highlight=pca#sklearn.decomposition.PCA.set_params


Doing PCA with Python (scikit-learn) (Zhihu, column "Data Application Lab")

[Figure: original image (left) and reconstructions retaining different amounts of variance.]

My previous tutorial discussed logistic regression with Python (https://towardsdatascience.com/logistic-regression-using-python-sklearn-numpy-mnist-handwriting-recognition-matplotlib-a6b31e2b166a). One thing we learned there is that you can speed up the fitting of a machine learning algorithm by changing the optimization algorithm. A more common way to speed up a machine learning algorithm is to use Principal Component Analysis (PCA). If your learning algorithm is too slow because the input dimensionality is too high, using PCA to speed it up is a reasonable choice; this is probably PCA's most common application. Another common application of PCA is data visualization.

To understand the value of PCA for data visualization, the first part of this tutorial walks through a basic visualization of the IRIS dataset after applying PCA. The second part uses PCA to speed up a machine learning algorithm (logistic regression) on the MNIST dataset. Let's get started!

The code used in this tutorial:
PCA for data visualization: https://github.com/mGalarnyk/Python_Tutorials/blob/master/Sklearn/PCA/PCA_Data_Visualization_Iris_Dataset_Blog.ipynb
PCA to speed up machine learning algorithms: https://github.com/mGalarnyk/Python_Tutorials/blob/master/Sklearn/PCA/PCA_to_Speed-up_Machine_Learning_Algorithms.ipynb

PCA for data visualization

For many machine learning applications it helps to be able to visualize your data. Visualizing 2- or 3-dimensional data is not hard, but even the Iris dataset used in this part of the tutorial is four dimensional. You can use PCA to reduce the four-dimensional data to 2 or 3 dimensions so that you can plot it and understand it better.

Loading the Iris dataset

The Iris dataset is one of the datasets that ships with scikit-learn and does not require downloading any file from an external website. The code below loads the iris dataset.

import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
# load dataset into Pandas DataFrame
df = pd.read_csv(url, names=['sepal length', 'sepal width', 'petal length', 'petal width', 'target'])

Standardizing the data

PCA is affected by scale, so the features in the data need to be scaled before applying PCA. Use StandardScaler to standardize the dataset's features onto unit scale (mean = 0, variance = 1), which is a requirement for the optimal performance of many machine learning algorithms. If you want to see the negative effect that not scaling your data can have, scikit-learn has a section on the effects of not standardizing your data (https://scikit-learn.org/stable/auto_examples/preprocessing/plot_scaling_importance.html#sphx-glr-auto-examples-preprocessing-plot-scaling-importance-py).

from sklearn.preprocessing import StandardScaler

features = ['sepal length', 'sepal width', 'petal length', 'petal width']
# Separating out the features
x = df.loc[:, features].values
# Separating out the target
y = df.loc[:, ['target']].values
# Standardizing the features
x = StandardScaler().fit_transform(x)

PCA projection to 2D

The original data has 4 columns (sepal length, sepal width, petal length, and petal width). In this section the code projects the original 4-dimensional data into 2 dimensions. Note that after dimensionality reduction there usually isn't a particular meaning assigned to each principal component; the new components are just the two main dimensions of variation.

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data=principalComponents,
                           columns=['principal component 1', 'principal component 2'])
finalDf = pd.concat([principalDf, df[['target']]], axis=1)

Concatenating the DataFrames along axis=1, finalDf is the final DataFrame before plotting the data.

Visualizing the 2D projection

This section just plots the two-dimensional data. Notice in the plot that the classes appear to be well separated from each other.

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(1, 1, 1)
ax.set_xlabel('Principal Component 1', fontsize=15)
ax.set_ylabel('Principal Component 2', fontsize=15)
ax.set_title('2 component PCA', fontsize=20)
targets = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
colors = ['r', 'g', 'b']
for target, color in zip(targets, colors):
    indicesToKeep = finalDf['target'] == target
    ax.scatter(finalDf.loc[indicesToKeep, 'principal component 1'],
               finalDf.loc[indicesToKeep, 'principal component 2'],
               c=color, s=50)
ax.legend(targets)
ax.grid()

Explained variance

The explained variance tells you how much information (variance) can be attributed to each principal component. This matters because when you convert four-dimensional space into two-dimensional space, you lose some of the variance (information). Using the attribute explained_variance_ratio_, you can see that the first principal component contains 72.77% of the variance and the second principal component contains 23.03%; together the two components contain 95.80% of the information.

pca.explained_variance_ratio_

PCA to speed up machine learning algorithms

One of the most important applications of PCA is speeding up machine learning algorithms. Using the IRIS dataset here would be impractical because it has only 150 rows and 4 feature columns. The MNIST database of handwritten digits is more suitable: it has 784 feature columns (784 dimensions), a training set of 60,000 examples, and a test set of 10,000 examples.

Downloading and loading the data

You can also pass a data_home parameter to fetch_openml to change where the data is downloaded.

from sklearn.datasets import fetch_openml

mnist = fetch_openml('mnist_784')

The downloaded images are contained in mnist.data, and the shape (70000, 784) means there are 70,000 images with 784 dimensions (784 features). The labels (integers 0-9) are contained in mnist.target. The features are the 784 dimensions (28 x 28 images) and the labels are simply the digits 0 through 9.

Splitting the data into training and test sets

Typically the train/test split is 80% training and 20% test. In this example I chose 6/7 of the data for training and 1/7 for the test set.

from sklearn.model_selection import train_test_split

# test_size: what proportion of original data is used for test set
train_img, test_img, train_lbl, test_lbl = train_test_split(
    mnist.data, mnist.target, test_size=1/7.0, random_state=0)

Standardizing the data

This paragraph is almost a repeat of what was written earlier. PCA is affected by scale, so the features need to be scaled before applying PCA. You can transform the data onto unit scale (mean = 0 and variance = 1), which is a requirement for the optimal performance of many machine learning algorithms; StandardScaler helps standardize the dataset's features. Note that you fit on the training set and transform both the training and test sets. If you want to see the negative effect that not scaling your data can have, see the scikit-learn section linked above.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Fit on training set only.
scaler.fit(train_img)
# Apply transform to both the training set and the test set.
train_img = scaler.transform(train_img)
test_img = scaler.transform(test_img)

Importing and applying PCA

Notice that the code below uses .95 as the number-of-components parameter. This means scikit-learn chooses the minimum number of principal components such that 95% of the variance is retained.

from sklearn.decomposition import PCA

# Make an instance of the Model
pca = PCA(.95)

Fit PCA on the training set. Note: you fit PCA on the training set only.

pca.fit(train_img)

Note: you can find out how many components PCA chose after fitting the model with pca.n_components_. In this case, 95% of the variance amounts to 330 principal components.

Apply the mapping (transform) to both the training set and the test set.

train_img = pca.transform(train_img)
test_img = pca.transform(test_img)

Applying logistic regression to the transformed data

Step 1: import the model you want to use. In sklearn, all machine learning models are implemented as Python classes.

from sklearn.linear_model import LogisticRegression

Step 2: create an instance of the model.

# all parameters not specified are set to their defaults
# the default solver is very slow, which is why it was changed to 'lbfgs'
logisticRegr = LogisticRegression(solver='lbfgs')

Step 3: train the model on the data, storing the information learned from the data. The model learns the relationship between the digits and the labels.

logisticRegr.fit(train_img, train_lbl)

Step 4: predict the labels of new data (new images), using the information the model learned during training.

# predict for one observation (image)
logisticRegr.predict(test_img[0].reshape(1, -1))

# predict for multiple observations (images)
logisticRegr.predict(test_img[0:10])

Measuring model performance

While accuracy is not always the best metric for machine learning algorithms (precision, recall, F1 score, ROC curves, and so on would be better; see https://towardsdatascience.com/receiver-operating-characteristic-curves-demystified-in-python-bd531a4364d0), it is used here for simplicity.

logisticRegr.score(test_img, test_lbl)

Time to fit logistic regression after PCA

The whole point of this part of the tutorial was to show that PCA can be used to speed up the fitting of machine learning algorithms. The original post shows a table of the time it took to fit logistic regression on the author's MacBook after PCA with different amounts of retained variance.

Image reconstruction from compressed data

The earlier parts of this tutorial showed how to use PCA to compress high-dimensional data to lower-dimensional data. PCA can also take the compressed reconstruction (the lower-dimensional data) back to an approximation of the original high-dimensional data. If you are interested in the code that produces the corresponding figure (original images on the left, approximations from the reduced data on the right), see https://github.com/mGalarnyk/Python_Tutorials/blob/master/Sklearn/PCA/PCA_Image_Reconstruction_and_such.ipynb.

Closing thoughts

This post could have been much longer, because PCA has many different uses. I hope it was helpful. My next machine learning tutorial covers understanding decision trees for classification (https://towardsdatascience.com/understanding-decision-trees-for-classification-python-9663d683c952).

More original articles are available on the Data Application Lab blog (https://www.dataapplab.com/), its free online open classes (https://www.dataapplab.com/event/), and its course videos (https://www.youtube.com/channel/UCa8NLpvi70mHVsW4J_x9OeQ).

(Published on Zhihu, 2020-06-03.)
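The fit-time table in the original post is an image and is not reproduced above. A hedged sketch of how such a comparison could be measured, reusing the train_img and train_lbl arrays defined earlier (the list of variance levels is my own choice):

import time
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

for variance in (None, 0.99, 0.95, 0.90, 0.85):
    X_tr = train_img if variance is None else PCA(variance).fit_transform(train_img)
    clf = LogisticRegression(solver="lbfgs", max_iter=1000)
    start = time.perf_counter()
    clf.fit(X_tr, train_lbl)
    print(variance, X_tr.shape[1], round(time.perf_counter() - start, 1), "seconds")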

How to use PCA from sklearn in Python (CSDN blog, by 我从崖边跌落)

First published 2019-07-09, last edited 2022-04-14.
Original link: https://blog.csdn.net/qq_20135597/article/details/95247381

from sklearn.decomposition import PCA

PCA

Principal Components Analysis (PCA) is a data dimensionality-reduction technique used for data preprocessing.

The general steps of PCA are: first zero-mean the original data, then compute the covariance matrix, then compute the eigenvectors and eigenvalues of the covariance matrix; these eigenvectors form the new feature space.
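Those steps can be spelled out directly with NumPy. The hedged sketch below is my own illustration (not part of the original post) and checks that the leading eigenvectors of the covariance matrix agree with sklearn's components_ up to the sign of each component:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 3))

# 1) zero-mean the data, 2) covariance matrix, 3) eigen-decomposition
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]                 # sort eigenvalues in decreasing order
manual_components = eigvecs[:, order[:2]].T

sk_components = PCA(n_components=2).fit(X).components_
print(np.round(np.abs(manual_components) - np.abs(sk_components), 8))  # near zero; signs may differ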

sklearn.decomposition.PCA(n_components=None, copy=True, whiten=False)

Parameters:

n_components:
Meaning: the number of principal components to keep, i.e. the number of features n retained after the reduction.
Type: int or string; when omitted it defaults to None and all components are kept.
An int value, e.g. n_components=1, reduces the original data to one dimension.
A string value, e.g. n_components='mle', selects the number of features automatically so that the required fraction of variance is satisfied.

copy:
Type: bool, True or False; defaults to True.
Meaning: whether to copy the original training data when running the algorithm. If True, the values of the original training data do not change after running PCA, because the computation runs on a copy; if False, the values of the original training data change, because the reduction is computed in place.

whiten:
Type: bool; defaults to False.
Meaning: whitening, which gives every feature the same variance.

PCA attributes:

components_ : the components with maximum variance.
explained_variance_ratio_ : the fraction of the variance explained by each of the n retained components.
n_components_ : the number of retained components n.
mean_ : the per-feature empirical mean.
noise_variance_ : the estimated noise covariance.

PCA methods:

1. fit(X, y=None)
fit(X) trains the PCA model with the data X.
The function returns the object it was called on; for example pca.fit(X) trains the pca object on X.
Note: fit() is the universal training method in scikit-learn; every algorithm that needs training has a fit() method, and it corresponds to the "training" step. Since PCA is an unsupervised algorithm, y is simply None here.

2. fit_transform(X)
Trains the PCA model with X and returns the reduced data at the same time.
newX = pca.fit_transform(X); newX is the data after dimensionality reduction.

3. inverse_transform()
Transforms the reduced data back to the original space: X = pca.inverse_transform(newX).

4. transform(X)
Transforms the data X into the reduced space. Once the model is trained, new input data can be reduced with the transform method.

In addition there are get_covariance(), get_precision(), get_params(deep=True), score(X, y=None) and other methods, which can be covered later when needed.

Example:

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2)
newX = pca.fit_transform(X)         # equivalent to pca.fit(X); pca.transform(X)
invX = pca.inverse_transform(newX)  # transform the reduced data back to the original space

print(X)
# [[-1 -1]
#  [-2 -1]
#  [-3 -2]
#  [ 1  1]
#  [ 2  1]
#  [ 3  2]]

print(newX)
# array([[ 1.38340578,  0.2935787 ],
#        [ 2.22189802, -0.25133484],
#        [ 3.6053038 ,  0.04224385],
#        [-1.38340578, -0.2935787 ],
#        [-2.22189802,  0.25133484],
#        [-3.6053038 , -0.04224385]])

print(invX)
# [[-1 -1]
#  [-2 -1]
#  [-3 -2]
#  [ 1  1]
#  [ 2  1]
#  [ 3  2]]

print(pca.explained_variance_ratio_)
# [ 0.99244289  0.00755711]

The pca object we trained has n_components=2, i.e. it keeps 2 features. The first feature accounts for 0.99244289 of the total variance, which means it carries almost all of the information: the first component alone expresses 99.24% of the dataset, so we can reduce to one dimension:

pca = PCA(n_components=1)
newX = pca.fit_transform(X)
print(pca.explained_variance_ratio_)
# [ 0.99244289]
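Continuing the 1-component example, a hedged sketch of my own that quantifies how little is lost when the reduced data is mapped back with inverse_transform:

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)

pca = PCA(n_components=1)
newX = pca.fit_transform(X)
X_back = pca.inverse_transform(newX)

rel_err = np.linalg.norm(X - X_back) / np.linalg.norm(X)
print(rel_err)   # small, consistent with ~99.24% of the variance being kept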


2.5. Decomposing signals in components (matrix factorization problems) — scikit-learn 1.4.1 documentation

2.5. Decomposing signals in components (matrix factorization problems)

2.5.1. Principal component analysis (PCA)

2.5.1.1. Exact PCA and probabilistic interpretation

2.5.1.2. Incremental PCA

2.5.1.3. PCA using randomized SVD

2.5.1.4. Sparse principal components analysis (SparsePCA and MiniBatchSparsePCA)

2.5.2. Kernel Principal Component Analysis (kPCA)

2.5.2.1. Exact Kernel PCA

2.5.2.2. Choice of solver for Kernel PCA

2.5.3. Truncated singular value decomposition and latent semantic analysis

2.5.4. Dictionary Learning

2.5.4.1. Sparse coding with a precomputed dictionary

2.5.4.2. Generic dictionary learning

2.5.4.3. Mini-batch dictionary learning

2.5.5. Factor Analysis

2.5.6. Independent component analysis (ICA)

2.5.7. Non-negative matrix factorization (NMF or NNMF)

2.5.7.1. NMF with the Frobenius norm

2.5.7.2. NMF with a beta-divergence

2.5.7.3. Mini-batch Non Negative Matrix Factorization

2.5.8. Latent Dirichlet Allocation (LDA)

2.5. Decomposing signals in components (matrix factorization problems)

2.5.1. Principal component analysis (PCA)

2.5.1.1. Exact PCA and probabilistic interpretation

PCA is used to decompose a multivariate dataset in a set of successive orthogonal components that explain a maximum amount of the variance. In scikit-learn, PCA is implemented as a transformer object that learns \(n\) components in its fit method, and can be used on new data to project it on these components.

PCA centers but does not scale the input data for each feature before applying the SVD. The optional parameter whiten=True makes it possible to project the data onto the singular space while scaling each component to unit variance. This is often useful if the models down-stream make strong assumptions on the isotropy of the signal: this is for example the case for Support Vector Machines with the RBF kernel and the K-Means clustering algorithm.

Below is an example of the iris dataset, which is comprised of 4 features, projected on the 2 dimensions that explain most variance.

The PCA object also provides a probabilistic interpretation of the PCA that can give a likelihood of data based on the amount of variance it explains. As such it implements a score method that can be used in cross-validation.

Examples:

PCA example with Iris Data-set
Comparison of LDA and PCA 2D projection of Iris dataset
Model selection with Probabilistic PCA and Factor Analysis (FA)
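A hedged sketch of the score method used in cross-validation, as mentioned above (the dataset and the candidate numbers of components are my own choices):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score

X = load_iris().data
for n in (1, 2, 3, 4):
    scores = cross_val_score(PCA(n_components=n), X, cv=5)   # uses PCA.score, the average log-likelihood
    print(n, scores.mean())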

2.5.1.2. Incremental PCA

The PCA object is very useful, but has certain limitations for large datasets. The biggest limitation is that PCA only supports batch processing, which means all of the data to be processed must fit in main memory. The IncrementalPCA object uses a different form of processing and allows for partial computations which almost exactly match the results of PCA while processing the data in a minibatch fashion. IncrementalPCA makes it possible to implement out-of-core Principal Component Analysis either by:

Using its partial_fit method on chunks of data fetched sequentially from the local hard drive or a network database.
Calling its fit method on a memory mapped file using numpy.memmap.

IncrementalPCA only stores estimates of component and noise variances, in order to update explained_variance_ratio_ incrementally. This is why memory usage depends on the number of samples per batch, rather than the number of samples to be processed in the dataset.

As in PCA, IncrementalPCA centers but does not scale the input data for each feature before applying the SVD.

Examples:

Incremental PCA
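A hedged sketch of the partial_fit pattern described above (the chunking and the data are my own illustration; in practice the chunks would come from disk or a database):

import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.RandomState(0)
X = rng.normal(size=(10000, 50))

ipca = IncrementalPCA(n_components=10)
for chunk in np.array_split(X, 10):   # process the data one mini-batch at a time
    ipca.partial_fit(chunk)

print(ipca.transform(X[:5]).shape)    # (5, 10)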

2.5.1.3. PCA using randomized SVD

It is often interesting to project data to a lower-dimensional space that preserves most of the variance, by dropping the singular vectors of components associated with lower singular values.

For instance, if we work with 64x64 pixel gray-level pictures for face recognition, the dimensionality of the data is 4096 and it is slow to train an RBF support vector machine on such wide data. Furthermore we know that the intrinsic dimensionality of the data is much lower than 4096 since all pictures of human faces look somewhat alike. The samples lie on a manifold of much lower dimension (say around 200 for instance). The PCA algorithm can be used to linearly transform the data while both reducing the dimensionality and preserving most of the explained variance at the same time.

The class PCA used with the optional parameter svd_solver='randomized' is very useful in that case: since we are going to drop most of the singular vectors, it is much more efficient to limit the computation to an approximated estimate of the singular vectors we will keep to actually perform the transform.

For instance, the following shows 16 sample portraits (centered around 0.0) from the Olivetti dataset. On the right hand side are the first 16 singular vectors reshaped as portraits. Since we only require the top 16 singular vectors of a dataset with size \(n_{samples} = 400\) and \(n_{features} = 64 \times 64 = 4096\), the computation time is less than 1s.

If we note \(n_{\max} = \max(n_{\mathrm{samples}}, n_{\mathrm{features}})\) and \(n_{\min} = \min(n_{\mathrm{samples}}, n_{\mathrm{features}})\), the time complexity of the randomized PCA is \(O(n_{\max}^2 \cdot n_{\mathrm{components}})\) instead of \(O(n_{\max}^2 \cdot n_{\min})\) for the exact method implemented in PCA.

The memory footprint of randomized PCA is also proportional to \(2 \cdot n_{\max} \cdot n_{\mathrm{components}}\) instead of \(n_{\max} \cdot n_{\min}\) for the exact method.

Note: the implementation of inverse_transform in PCA with svd_solver='randomized' is not the exact inverse transform of transform even when whiten=False (default).

Examples:

Faces recognition example using eigenfaces and SVMs
Faces dataset decompositions

References:

Algorithm 4.3 in "Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions", Halko, et al., 2009

"An implementation of a randomized algorithm for principal component analysis", A. Szlam et al., 2014
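A hedged sketch (sizes chosen only for illustration) comparing the exact and randomized solvers when only a few components are requested; both should retain essentially the same variance, with the randomized solver usually finishing faster on wide data:

import time
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 2000))

for solver in ("full", "randomized"):
    start = time.perf_counter()
    pca = PCA(n_components=16, svd_solver=solver, random_state=0).fit(X)
    print(solver, round(time.perf_counter() - start, 2), "s",
          round(pca.explained_variance_ratio_.sum(), 4))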

2.5.1.4. Sparse principal components analysis (SparsePCA and MiniBatchSparsePCA)¶

SparsePCA is a variant of PCA, with the goal of extracting the

set of sparse components that best reconstruct the data.

Mini-batch sparse PCA (MiniBatchSparsePCA) is a variant of

SparsePCA that is faster but less accurate. The increased speed is

reached by iterating over small chunks of the set of features, for a given

number of iterations.

Principal component analysis (PCA) has the disadvantage that the

components extracted by this method have exclusively dense expressions, i.e.

they have non-zero coefficients when expressed as linear combinations of the

original variables. This can make interpretation difficult. In many cases,

the real underlying components can be more naturally imagined as sparse

vectors; for example in face recognition, components might naturally map to

parts of faces.

Sparse principal components yields a more parsimonious, interpretable

representation, clearly emphasizing which of the original features contribute

to the differences between samples.

The following example illustrates 16 components extracted using sparse PCA from

the Olivetti faces dataset. It can be seen how the regularization term induces

many zeros. Furthermore, the natural structure of the data causes the non-zero

coefficients to be vertically adjacent. The model does not enforce this

mathematically: each component is a vector \(h \in \mathbf{R}^{4096}\), and

there is no notion of vertical adjacency except during the human-friendly

visualization as 64x64 pixel images. The fact that the components shown below

appear local is the effect of the inherent structure of the data, which makes

such local patterns minimize reconstruction error. There exist sparsity-inducing

norms that take into account adjacency and different kinds of structure; see

[Jen09] for a review of such methods.

For more details on how to use Sparse PCA, see the Examples section, below.

Note that there are many different formulations for the Sparse PCA

problem. The one implemented here is based on [Mrl09] . The optimization

problem solved is a PCA problem (dictionary learning) with an

\(\ell_1\) penalty on the components:

\[\begin{split}(U^*, V^*) = \underset{U, V}{\operatorname{arg\,min\,}} & \frac{1}{2}
||X-UV||_{\text{Fro}}^2+\alpha||V||_{1,1} \\
\text{subject to } & ||U_k||_2 \leq 1 \text{ for all }
0 \leq k < n_{\mathrm{components}}\end{split}\]

\(||.||_{\text{Fro}}\) stands for the Frobenius norm and \(||.||_{1,1}\)

stands for the entry-wise matrix norm which is the sum of the absolute values

of all the entries in the matrix.

The sparsity-inducing \(||.||_{1,1}\) matrix norm also prevents learning

components from noise when few training samples are available. The degree

of penalization (and thus sparsity) can be adjusted through the

hyperparameter alpha. Small values lead to a gently regularized

factorization, while larger values shrink many coefficients to zero.
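As a minimal sketch of the role of alpha, on a small random matrix rather than the faces data (all values are illustrative):

>>> import numpy as np
>>> from sklearn.decomposition import SparsePCA
>>> rng = np.random.RandomState(0)
>>> X = rng.randn(100, 30)                         # toy, zero-mean data
>>> spca = SparsePCA(n_components=5, alpha=1.0, random_state=0)
>>> codes = spca.fit_transform(X)
>>> sparsity = (spca.components_ == 0).mean()      # fraction of exactly-zero loadings
>>> codes.shape, spca.components_.shape
((100, 5), (5, 30))

Increasing alpha drives more loadings to exactly zero, at the cost of reconstruction accuracy.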

Note

While in the spirit of an online algorithm, the class

MiniBatchSparsePCA does not implement partial_fit because

the algorithm is online along the features direction, not the samples

direction.

Examples:

Faces dataset decompositions

References:

[Mrl09]

“Online Dictionary Learning for Sparse Coding”

J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009

[Jen09]

“Structured Sparse Principal Component Analysis”

R. Jenatton, G. Obozinski, F. Bach, 2009

2.5.2. Kernel Principal Component Analysis (kPCA)¶

2.5.2.1. Exact Kernel PCA¶

KernelPCA is an extension of PCA which achieves non-linear

dimensionality reduction through the use of kernels (see Pairwise metrics, Affinities and Kernels) [Scholkopf1997]. It

has many applications including denoising, compression and structured

prediction (kernel dependency estimation). KernelPCA supports both

transform and inverse_transform.

Note

KernelPCA.inverse_transform relies on a kernel ridge to learn the

function mapping samples from the PCA basis into the original feature

space [Bakir2003]. Thus, the reconstruction obtained with

KernelPCA.inverse_transform is an approximation. See the example

linked below for more details.
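A minimal sketch on the two-circles toy data (the kernel, gamma and alpha values are illustrative, not tuned):

>>> from sklearn.datasets import make_circles
>>> from sklearn.decomposition import KernelPCA
>>> X, _ = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
>>> kpca = KernelPCA(n_components=2, kernel='rbf', gamma=10,
...                  fit_inverse_transform=True, alpha=0.1)
>>> X_kpca = kpca.fit_transform(X)             # non-linear projection
>>> X_back = kpca.inverse_transform(X_kpca)    # approximate pre-images learned by kernel ridge
>>> X_back.shape
(400, 2)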

Examples:

Kernel PCA

References:

[Scholkopf1997]

Schölkopf, Bernhard, Alexander Smola, and Klaus-Robert Müller.

“Kernel principal component analysis.”

International conference on artificial neural networks.

Springer, Berlin, Heidelberg, 1997.

[Bakir2003]

Bakır, Gökhan H., Jason Weston, and Bernhard Schölkopf.

“Learning to find pre-images.”

Advances in neural information processing systems 16 (2003): 449-456.

2.5.2.2. Choice of solver for Kernel PCA¶

While in PCA the number of components is bounded by the number of

features, in KernelPCA the number of components is bounded by the

number of samples. Many real-world datasets have a large number of samples! In

these cases finding all the components with a full kPCA is a waste of

computation time, as data is mostly described by the first few components

(e.g. n_components<=100). In other words, the centered Gram matrix that

is eigendecomposed in the Kernel PCA fitting process has an effective rank that

is much smaller than its size. This is a situation where approximate

eigensolvers can provide speedup with very low precision loss.

Eigensolvers


The optional parameter eigen_solver='randomized' can be used to

significantly reduce the computation time when the number of requested

n_components is small compared with the number of samples. It relies on

randomized decomposition methods to find an approximate solution in a shorter

time.

The time complexity of the randomized KernelPCA is

\(O(n_{\mathrm{samples}}^2 \cdot n_{\mathrm{components}})\)

instead of \(O(n_{\mathrm{samples}}^3)\) for the exact method

implemented with eigen_solver='dense'.

The memory footprint of randomized KernelPCA is also proportional to

\(2 \cdot n_{\mathrm{samples}} \cdot n_{\mathrm{components}}\) instead of

\(n_{\mathrm{samples}}^2\) for the exact method.

Note: this technique is the same as in PCA using randomized SVD.

In addition to the above two solvers, eigen_solver='arpack' can be used as

an alternate way to get an approximate decomposition. In practice, this method

only provides reasonable execution times when the number of components to find

is extremely small. It is enabled by default when the desired number of

components is less than 10 (strict) and the number of samples is more than 200

(strict). See KernelPCA for details.
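For instance, a sketch of requesting only a handful of components from a few thousand samples with the randomized eigensolver (all values are illustrative):

>>> from sklearn.datasets import make_circles
>>> from sklearn.decomposition import KernelPCA
>>> X, _ = make_circles(n_samples=3000, factor=0.3, noise=0.05, random_state=0)
>>> kpca = KernelPCA(n_components=10, kernel='rbf', gamma=10,
...                  eigen_solver='randomized', random_state=0)
>>> X_kpca = kpca.fit_transform(X)    # only 10 of the 3000 possible components are approximated
>>> X_kpca.shape
(3000, 10)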

References:

dense solver:

scipy.linalg.eigh documentation

randomized solver:

Algorithm 4.3 in

“Finding structure with randomness: Stochastic

algorithms for constructing approximate matrix decompositions”

Halko, et al. (2009)

“An implementation of a randomized algorithm

for principal component analysis”

A. Szlam et al. (2014)

arpack solver:

scipy.sparse.linalg.eigsh documentation

R. B. Lehoucq, D. C. Sorensen, and C. Yang, (1998)

2.5.3. Truncated singular value decomposition and latent semantic analysis¶

TruncatedSVD implements a variant of singular value decomposition

(SVD) that only computes the \(k\) largest singular values,

where \(k\) is a user-specified parameter.

TruncatedSVD is very similar to PCA, but differs

in that the matrix \(X\) does not need to be centered.

When the columnwise (per-feature) means of \(X\)

are subtracted from the feature values,

truncated SVD on the resulting matrix is equivalent to PCA.

About truncated SVD and latent semantic analysis (LSA)


When truncated SVD is applied to term-document matrices

(as returned by CountVectorizer or

TfidfVectorizer),

this transformation is known as

latent semantic analysis

(LSA), because it transforms such matrices

to a “semantic” space of low dimensionality.

In particular, LSA is known to combat the effects of synonymy and polysemy

(both of which roughly mean there are multiple meanings per word),

which cause term-document matrices to be overly sparse

and exhibit poor similarity under measures such as cosine similarity.

Note

LSA is also known as latent semantic indexing, LSI,

though strictly that refers to its use in persistent indexes

for information retrieval purposes.

Mathematically, truncated SVD applied to training samples \(X\)
produces a low-rank approximation \(X_k\) of \(X\):

\[X \approx X_k = U_k \Sigma_k V_k^\top\]

After this operation, \(U_k \Sigma_k\)

is the transformed training set with \(k\) features

(called n_components in the API).

To also transform a test set \(X\), we multiply it with \(V_k\):

\[X' = X V_k\]

Note

Most treatments of LSA in the natural language processing (NLP)

and information retrieval (IR) literature

swap the axes of the matrix \(X\) so that it has shape

n_features × n_samples.

We present LSA in a different way that matches the scikit-learn API better,

but the singular values found are the same.

While the TruncatedSVD transformer

works with any feature matrix,

using it on tf–idf matrices is recommended over raw frequency counts

in an LSA/document processing setting.

In particular, sublinear scaling and inverse document frequency

should be turned on (sublinear_tf=True, use_idf=True)

to bring the feature values closer to a Gaussian distribution,

compensating for LSA’s erroneous assumptions about textual data.
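A minimal LSA sketch along these lines, with a few toy documents (the documents and parameter values are illustrative):

>>> from sklearn.decomposition import TruncatedSVD
>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import Normalizer
>>> docs = ["the cat sat on the mat",
...         "the dog sat on the log",
...         "cats and dogs are pets"]
>>> lsa = make_pipeline(
...     TfidfVectorizer(sublinear_tf=True, use_idf=True),
...     TruncatedSVD(n_components=2, random_state=0),
...     Normalizer(copy=False))     # re-normalize rows for cosine-similarity use
>>> X_lsa = lsa.fit_transform(docs)
>>> X_lsa.shape
(3, 2)

The fitted TruncatedSVD step exposes explained_variance_ratio_, which can help in choosing n_components.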

Examples:

Clustering text documents using k-means

References:

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze (2008),

Introduction to Information Retrieval, Cambridge University Press,

chapter 18: Matrix decompositions & latent semantic indexing

2.5.4. Dictionary Learning¶

2.5.4.1. Sparse coding with a precomputed dictionary¶

The SparseCoder object is an estimator that can be used to transform signals

into sparse linear combination of atoms from a fixed, precomputed dictionary

such as a discrete wavelet basis. This object therefore does not

implement a fit method. The transformation amounts

to a sparse coding problem: finding a representation of the data as a linear

combination of as few dictionary atoms as possible. All variations of

dictionary learning implement the following transform methods, controllable via

the transform_algorithm initialization parameter:

Orthogonal matching pursuit (Orthogonal Matching Pursuit (OMP))

Least-angle regression (Least Angle Regression)

Lasso computed by least-angle regression

Lasso using coordinate descent (Lasso)

Thresholding

Thresholding is very fast but it does not yield accurate reconstructions.

Thresholded features have nonetheless been shown to be useful in the literature for classification tasks. For image

reconstruction tasks, orthogonal matching pursuit yields the most accurate,

unbiased reconstruction.

The dictionary learning objects offer, via the split_sign parameter, the

possibility to separate the positive and negative values in the results of

sparse coding. This is useful when dictionary learning is used for extracting

features that will be used for supervised learning, because it allows the

learning algorithm to assign different weights to the negative loadings of a
particular atom than to the corresponding positive loading.

The split code for a single sample has length 2 * n_components

and is constructed using the following rule: First, the regular code of length

n_components is computed. Then, the first n_components entries of the

split_code are

filled with the positive part of the regular code vector. The second half of

the split code is filled with the negative part of the code vector, only with

a positive sign. Therefore, the split_code is non-negative.
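A minimal sketch with a random, purely illustrative dictionary of 15 unit-norm atoms, requesting split codes:

>>> import numpy as np
>>> from sklearn.decomposition import SparseCoder
>>> rng = np.random.RandomState(0)
>>> D = rng.randn(15, 64)                            # 15 fixed atoms of length 64
>>> D /= np.linalg.norm(D, axis=1, keepdims=True)    # unit-norm atoms
>>> X = rng.randn(5, 64)                             # 5 signals to encode
>>> coder = SparseCoder(dictionary=D, transform_algorithm='omp',
...                     transform_n_nonzero_coefs=3, split_sign=True)
>>> split_code = coder.transform(X)     # no fit needed: the dictionary is precomputed
>>> split_code.shape                    # 2 * n_components columns, all non-negative
(5, 30)
>>> bool((split_code >= 0).all())
True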

Examples:

Sparse coding with a precomputed dictionary

2.5.4.2. Generic dictionary learning¶

Dictionary learning (DictionaryLearning) is a matrix factorization

problem that amounts to finding a (usually overcomplete) dictionary that will

perform well at sparsely encoding the fitted data.

Representing data as sparse combinations of atoms from an overcomplete

dictionary is suggested to be the way the mammalian primary visual cortex works.

Consequently, dictionary learning applied on image patches has been shown to

give good results in image processing tasks such as image completion,

inpainting and denoising, as well as for supervised recognition tasks.

Dictionary learning is an optimization problem solved by alternately updating

the sparse code, as a solution to multiple Lasso problems, considering the

dictionary fixed, and then updating the dictionary to best fit the sparse code.

\[\begin{split}(U^*, V^*) = \underset{U, V}{\operatorname{arg\,min\,}} & \frac{1}{2}
||X-UV||_{\text{Fro}}^2+\alpha||U||_{1,1} \\
\text{subject to } & ||V_k||_2 \leq 1 \text{ for all }
0 \leq k < n_{\mathrm{atoms}}\end{split}\]

\(||.||_{\text{Fro}}\) stands for the Frobenius norm and \(||.||_{1,1}\)

stands for the entry-wise matrix norm which is the sum of the absolute values

of all the entries in the matrix.

After using such a procedure to fit the dictionary, the transform is simply a

sparse coding step that shares the same implementation with all dictionary

learning objects (see Sparse coding with a precomputed dictionary).
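As a rough sketch on toy data (200 random "patches" with 16 features each and an overcomplete dictionary of 32 atoms; all values are illustrative):

>>> import numpy as np
>>> from sklearn.decomposition import DictionaryLearning
>>> rng = np.random.RandomState(0)
>>> X = rng.randn(200, 16)                        # e.g. 200 flattened 4x4 patches
>>> dico = DictionaryLearning(n_components=32, alpha=1.0, max_iter=20,
...                           transform_algorithm='lasso_lars', random_state=0)
>>> code = dico.fit(X).transform(X)               # learn the dictionary, then sparse-code X
>>> dico.components_.shape, code.shape
((32, 16), (200, 32))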

It is also possible to constrain the dictionary and/or code to be positive to

match constraints that may be present in the data. Below are the faces with

different positivity constraints applied. Red indicates negative values, blue

indicates positive values, and white represents zeros.

The following image shows what a dictionary learned from 4x4 pixel image patches
extracted from part of the image of a raccoon face looks like.

Examples:

Image denoising using dictionary learning

References:

“Online dictionary learning for sparse coding”

J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009

2.5.4.3. Mini-batch dictionary learning¶

MiniBatchDictionaryLearning implements a faster, but less accurate

version of the dictionary learning algorithm that is better suited for large

datasets.

By default, MiniBatchDictionaryLearning divides the data into

mini-batches and optimizes in an online manner by cycling over the mini-batches

for the specified number of iterations. However, at the moment it does not

implement a stopping condition.

The estimator also implements partial_fit, which updates the dictionary by

iterating only once over a mini-batch. This can be used for online learning

when the data is not readily available from the start, or for when the data

does not fit into the memory.
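A minimal sketch of this streaming usage on toy data (batch and component sizes are illustrative):

>>> import numpy as np
>>> from sklearn.decomposition import MiniBatchDictionaryLearning
>>> rng = np.random.RandomState(0)
>>> dico = MiniBatchDictionaryLearning(n_components=32, batch_size=64,
...                                    random_state=0)
>>> for _ in range(10):                     # mini-batches arriving one at a time
...     X_batch = rng.randn(64, 16)         # toy data: 64 samples, 16 features
...     dico = dico.partial_fit(X_batch)    # a single pass over this mini-batch
>>> dico.components_.shape
(32, 16)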

Clustering for dictionary learning

Note that when using dictionary learning to extract a representation

(e.g. for sparse coding) clustering can be a good proxy to learn the

dictionary. For instance the MiniBatchKMeans estimator is

computationally efficient and implements on-line learning with a

partial_fit method.

Example: Online learning of a dictionary of parts of faces

2.5.5. Factor Analysis¶

In unsupervised learning we only have a dataset \(X = \{x_1, x_2, \dots, x_n

\}\). How can this dataset be described mathematically? A very simple

continuous latent variable model for \(X\) is

\[x_i = W h_i + \mu + \epsilon\]

The vector \(h_i\) is called “latent” because it is unobserved. \(\epsilon\) is

considered a noise term distributed according to a Gaussian with mean 0 and

covariance \(\Psi\) (i.e. \(\epsilon \sim \mathcal{N}(0, \Psi)\)), \(\mu\) is some

arbitrary offset vector. Such a model is called “generative” as it describes

how \(x_i\) is generated from \(h_i\). If we use all the \(x_i\)’s as columns to form

a matrix \(\mathbf{X}\) and all the \(h_i\)’s as columns of a matrix \(\mathbf{H}\)

then we can write (with suitably defined \(\mathbf{M}\) and \(\mathbf{E}\)):

\[\mathbf{X} = W \mathbf{H} + \mathbf{M} + \mathbf{E}\]

In other words, we decomposed matrix \(\mathbf{X}\).

If \(h_i\) is given, the above equation automatically implies the following

probabilistic interpretation:

\[p(x_i|h_i) = \mathcal{N}(Wh_i + \mu, \Psi)\]

For a complete probabilistic model we also need a prior distribution for the

latent variable \(h\). The most straightforward assumption (based on the nice

properties of the Gaussian distribution) is \(h \sim \mathcal{N}(0,

\mathbf{I})\). This yields a Gaussian as the marginal distribution of \(x\):

\[p(x) = \mathcal{N}(\mu, WW^T + \Psi)\]

Now, without any further assumptions the idea of having a latent variable \(h\)

would be superfluous – \(x\) can be completely modelled with a mean

and a covariance. We need to impose some more specific structure on one

of these two parameters. A simple additional assumption regards the

structure of the error covariance \(\Psi\):

\(\Psi = \sigma^2 \mathbf{I}\): This assumption leads to

the probabilistic model of PCA.

\(\Psi = \mathrm{diag}(\psi_1, \psi_2, \dots, \psi_n)\): This model is called

FactorAnalysis, a classical statistical model. The matrix W is

sometimes called the “factor loading matrix”.

Both models essentially estimate a Gaussian with a low-rank covariance matrix.

Because both models are probabilistic they can be integrated in more complex

models, e.g. Mixture of Factor Analysers. One gets very different models (e.g.

FastICA) if non-Gaussian priors on the latent variables are assumed.

Factor analysis can produce similar components (the columns of its loading

matrix) to PCA. However, one cannot make any general statements

about these components (e.g. whether they are orthogonal):

The main advantage of Factor Analysis over PCA is that

it can model the variance in every direction of the input space independently

(heteroscedastic noise):

This allows better model selection than probabilistic PCA in the presence

of heteroscedastic noise:

Factor Analysis is often followed by a rotation of the factors (with the

parameter rotation), usually to improve interpretability. For example,

Varimax rotation maximizes the sum of the variances of the squared loadings,

i.e., it tends to produce sparser factors, which are influenced by only a few

features each (the “simple structure”). See e.g., the first example below.
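A minimal sketch on the Iris data (the choice of two factors and the varimax rotation are illustrative):

>>> from sklearn.datasets import load_iris
>>> from sklearn.decomposition import FactorAnalysis
>>> X = load_iris().data
>>> fa = FactorAnalysis(n_components=2, rotation='varimax').fit(X)
>>> fa.components_.shape        # the (rotated) factor loading matrix W
(2, 4)
>>> fa.noise_variance_.shape    # one noise variance per feature (heteroscedastic)
(4,)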

Examples:

Factor Analysis (with rotation) to visualize patterns

Model selection with Probabilistic PCA and Factor Analysis (FA)

2.5.6. Independent component analysis (ICA)¶

Independent component analysis separates a multivariate signal into

additive subcomponents that are maximally independent. It is

implemented in scikit-learn using the Fast ICA

algorithm. Typically, ICA is not used for reducing dimensionality but

for separating superimposed signals. Since the ICA model does not include

a noise term, for the model to be correct, whitening must be applied.

This can be done internally using the whiten argument or manually using one

of the PCA variants.

It is classically used to separate mixed signals (a problem known as

blind source separation), as in the example below:
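A minimal blind source separation sketch in that spirit, with two synthetic sources and an illustrative mixing matrix:

>>> import numpy as np
>>> from sklearn.decomposition import FastICA
>>> rng = np.random.RandomState(0)
>>> t = np.linspace(0, 8, 2000)
>>> s1 = np.sin(2 * t)                            # source 1: sinusoidal signal
>>> s2 = np.sign(np.sin(3 * t))                   # source 2: square-wave signal
>>> S = np.c_[s1, s2] + 0.05 * rng.standard_normal((2000, 2))
>>> A = np.array([[1.0, 1.0], [0.5, 2.0]])        # mixing matrix
>>> X = S @ A.T                                   # two observed mixtures
>>> ica = FastICA(n_components=2, whiten='unit-variance', random_state=0)
>>> S_est = ica.fit_transform(X)                  # recovered (unordered, rescaled) sources
>>> ica.mixing_.shape                             # estimated mixing matrix
(2, 2)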

ICA can also be used as yet another non-linear decomposition that finds

components with some sparsity:

Examples:

Blind source separation using FastICA

FastICA on 2D point clouds

Faces dataset decompositions

2.5.7. Non-negative matrix factorization (NMF or NNMF)¶

2.5.7.1. NMF with the Frobenius norm¶

NMF [1] is an alternative approach to decomposition that assumes that the

data and the components are non-negative. NMF can be plugged in

instead of PCA or its variants, in the cases where the data matrix

does not contain negative values. It finds a decomposition of samples

\(X\) into two matrices \(W\) and \(H\) of non-negative elements,

by optimizing the distance \(d\) between \(X\) and the matrix product

\(WH\). The most widely used distance function is the squared Frobenius

norm, which is an obvious extension of the Euclidean norm to matrices:

\[d_{\mathrm{Fro}}(X, Y) = \frac{1}{2} ||X - Y||_{\mathrm{Fro}}^2 = \frac{1}{2} \sum_{i,j} (X_{ij} - {Y}_{ij})^2\]

Unlike PCA, the representation of a vector is obtained in an additive

fashion, by superimposing the components, without subtracting. Such additive

models are efficient for representing images and text.

It has been observed in [Hoyer, 2004] [2] that, when carefully constrained,

NMF can produce a parts-based representation of the dataset,

resulting in interpretable models. The following example displays 16

sparse components found by NMF from the images in the Olivetti

faces dataset, in comparison with the PCA eigenfaces.

The init attribute determines the initialization method applied, which

has a great impact on the performance of the method. NMF implements the

method Nonnegative Double Singular Value Decomposition. NNDSVD [4] is based on

two SVD processes, one approximating the data matrix, the other approximating

positive sections of the resulting partial SVD factors utilizing an algebraic

property of unit rank matrices. The basic NNDSVD algorithm is better fit for

sparse factorization. Its variants NNDSVDa (in which all zeros are set equal to

the mean of all elements of the data), and NNDSVDar (in which the zeros are set

to random perturbations less than the mean of the data divided by 100) are

recommended in the dense case.

Note that the Multiplicative Update (‘mu’) solver cannot update zeros present in

the initialization, so it leads to poorer results when used jointly with the

basic NNDSVD algorithm which introduces a lot of zeros; in this case, NNDSVDa or

NNDSVDar should be preferred.

NMF can also be initialized with correctly scaled random non-negative

matrices by setting init="random". An integer seed or a

RandomState can also be passed to random_state to control

reproducibility.

In NMF, L1 and L2 priors can be added to the loss function in order to

regularize the model. The L2 prior uses the Frobenius norm, while the L1 prior

uses an elementwise L1 norm. As in ElasticNet,

we control the combination of L1 and L2 with the l1_ratio (\(\rho\))

parameter, and the intensity of the regularization with the alpha_W and

alpha_H (\(\alpha_W\) and \(\alpha_H\)) parameters. The priors are

scaled by the number of samples (\(n\_samples\)) for H and the number of

features (\(n\_features\)) for W to keep their impact balanced with

respect to one another and to the data fit term, and as independent as possible of
the size of the training set. Then the prior terms are:

\[(\alpha_W \rho ||W||_1 + \frac{\alpha_W(1-\rho)}{2} ||W||_{\mathrm{Fro}} ^ 2) * n\_features

+ (\alpha_H \rho ||H||_1 + \frac{\alpha_H(1-\rho)}{2} ||H||_{\mathrm{Fro}} ^ 2) * n\_samples\]

and the regularized objective function is:

\[d_{\mathrm{Fro}}(X, WH)

+ (\alpha_W \rho ||W||_1 + \frac{\alpha_W(1-\rho)}{2} ||W||_{\mathrm{Fro}} ^ 2) * n\_features

+ (\alpha_H \rho ||H||_1 + \frac{\alpha_H(1-\rho)}{2} ||H||_{\mathrm{Fro}} ^ 2) * n\_samples\]
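A minimal sketch of setting these regularization parameters on non-negative toy data (the values of alpha_W, alpha_H and l1_ratio are illustrative):

>>> import numpy as np
>>> from sklearn.decomposition import NMF
>>> X = np.abs(np.random.RandomState(0).randn(20, 10))      # non-negative toy data
>>> model = NMF(n_components=5, init='nndsvda', solver='mu', max_iter=500,
...             alpha_W=0.01, alpha_H='same', l1_ratio=0.5,  # mixed L1/L2 penalty
...             random_state=0)
>>> W = model.fit_transform(X)
>>> H = model.components_
>>> W.shape, H.shape
((20, 5), (5, 10))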

2.5.7.2. NMF with a beta-divergence¶

As described previously, the most widely used distance function is the squared

Frobenius norm, which is an obvious extension of the Euclidean norm to

matrices:

\[d_{\mathrm{Fro}}(X, Y) = \frac{1}{2} ||X - Y||_{Fro}^2 = \frac{1}{2} \sum_{i,j} (X_{ij} - {Y}_{ij})^2\]

Other distance functions can be used in NMF as, for example, the (generalized)

Kullback-Leibler (KL) divergence, also referred as I-divergence:

\[d_{KL}(X, Y) = \sum_{i,j} (X_{ij} \log(\frac{X_{ij}}{Y_{ij}}) - X_{ij} + Y_{ij})\]

Or, the Itakura-Saito (IS) divergence:

\[d_{IS}(X, Y) = \sum_{i,j} (\frac{X_{ij}}{Y_{ij}} - \log(\frac{X_{ij}}{Y_{ij}}) - 1)\]

These three distances are special cases of the beta-divergence family, with

\(\beta = 2, 1, 0\) respectively [6]. The beta-divergence is defined by:

\[d_{\beta}(X, Y) = \sum_{i,j} \frac{1}{\beta(\beta - 1)}(X_{ij}^\beta + (\beta-1)Y_{ij}^\beta - \beta X_{ij} Y_{ij}^{\beta - 1})\]

Note that this definition is not valid if \(\beta \in (0; 1)\), yet it can

be continuously extended to the definitions of \(d_{KL}\) and \(d_{IS}\)

respectively.

NMF implemented solvers


NMF implements two solvers, using Coordinate Descent (‘cd’) [5], and

Multiplicative Update (‘mu’) [6]. The ‘mu’ solver can optimize every

beta-divergence, including of course the Frobenius norm (\(\beta=2\)), the

(generalized) Kullback-Leibler divergence (\(\beta=1\)) and the

Itakura-Saito divergence (\(\beta=0\)). Note that for

\(\beta \in (1; 2)\), the ‘mu’ solver is significantly faster than for other

values of \(\beta\). Note also that with a negative (or 0, i.e.

‘itakura-saito’) \(\beta\), the input matrix cannot contain zero values.

The ‘cd’ solver can only optimize the Frobenius norm. Due to the

underlying non-convexity of NMF, the different solvers may converge to

different minima, even when optimizing the same distance function.
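For instance, a sketch of fitting with the 'mu' solver and the Kullback-Leibler loss on strictly positive toy data (all values are illustrative):

>>> import numpy as np
>>> from sklearn.decomposition import NMF
>>> X = np.abs(np.random.RandomState(0).randn(20, 10)) + 0.01   # strictly positive toy data
>>> kl_nmf = NMF(n_components=5, solver='mu', beta_loss='kullback-leibler',
...              init='nndsvda', max_iter=500, random_state=0)
>>> W = kl_nmf.fit_transform(X)
>>> W.shape
(20, 5)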

NMF is best used with the fit_transform method, which returns the matrix W.

The matrix H is stored into the fitted model in the components_ attribute;

the method transform will decompose a new matrix X_new based on these

stored components:

>>> import numpy as np

>>> X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])

>>> from sklearn.decomposition import NMF

>>> model = NMF(n_components=2, init='random', random_state=0)

>>> W = model.fit_transform(X)

>>> H = model.components_

>>> X_new = np.array([[1, 0], [1, 6.1], [1, 0], [1, 4], [3.2, 1], [0, 4]])

>>> W_new = model.transform(X_new)

Examples:

Faces dataset decompositions

Topic extraction with Non-negative Matrix Factorization and Latent Dirichlet Allocation

2.5.7.3. Mini-batch Non Negative Matrix Factorization¶

MiniBatchNMF [7] implements a faster, but less accurate version of the

non negative matrix factorization (i.e. NMF),

better suited for large datasets.

By default, MiniBatchNMF divides the data into mini-batches and

optimizes the NMF model in an online manner by cycling over the mini-batches

for the specified number of iterations. The batch_size parameter controls

the size of the batches.

In order to speed up the mini-batch algorithm it is also possible to scale

past batches, giving them less importance than newer batches. This is done by
introducing a so-called forgetting factor, controlled by the forget_factor

parameter.

The estimator also implements partial_fit, which updates H by iterating

only once over a mini-batch. This can be used for online learning when the data

is not readily available from the start, or when the data does not fit into memory.
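A minimal sketch of this streaming usage on non-negative toy data (the batch size and forgetting factor are illustrative):

>>> import numpy as np
>>> from sklearn.decomposition import MiniBatchNMF
>>> rng = np.random.RandomState(0)
>>> mbnmf = MiniBatchNMF(n_components=5, batch_size=48, forget_factor=0.7,
...                      random_state=0)
>>> for _ in range(10):                       # non-negative data arriving in chunks
...     X_batch = np.abs(rng.randn(48, 10))
...     mbnmf = mbnmf.partial_fit(X_batch)    # one update of H per mini-batch
>>> mbnmf.components_.shape
(5, 10)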

References:

[1]

“Learning the parts of objects by non-negative matrix factorization”

D. Lee, S. Seung, 1999

[2]

“Non-negative Matrix Factorization with Sparseness Constraints”

P. Hoyer, 2004

[4]

“SVD based initialization: A head start for nonnegative

matrix factorization”

C. Boutsidis, E. Gallopoulos, 2008

[5]

“Fast local algorithms for large scale nonnegative matrix and tensor

factorizations.”

A. Cichocki, A. Phan, 2009

[6]

(1,2)

“Algorithms for nonnegative matrix factorization with

the beta-divergence”

C. Fevotte, J. Idier, 2011

[7]

“Online algorithms for nonnegative matrix factorization with the

Itakura-Saito divergence”

A. Lefevre, F. Bach, C. Fevotte, 2011

2.5.8. Latent Dirichlet Allocation (LDA)¶

Latent Dirichlet Allocation is a generative probabilistic model for collections of

discrete data such as text corpora. It is also a topic model that is used for

discovering abstract topics from a collection of documents.

The graphical model of LDA is a three-level generative model:

Note on notations presented in the graphical model above, which can be found in

Hoffman et al. (2013):

The corpus is a collection of \(D\) documents.

A document is a sequence of \(N\) words.

There are \(K\) topics in the corpus.

The boxes represent repeated sampling.

In the graphical model, each node is a random variable and has a role in the

generative process. A shaded node indicates an observed variable and an unshaded

node indicates a hidden (latent) variable. In this case, words in the corpus are

the only data that we observe. The latent variables determine the random mixture

of topics in the corpus and the distribution of words in the documents.

The goal of LDA is to use the observed words to infer the hidden topic

structure.

Details on modeling text corpora


When modeling text corpora, the model assumes the following generative process

for a corpus with \(D\) documents and \(K\) topics, with \(K\)

corresponding to n_components in the API:

For each topic \(k \in K\), draw \(\beta_k \sim

\mathrm{Dirichlet}(\eta)\). This provides a distribution over the words,

i.e. the probability of a word appearing in topic \(k\).

\(\eta\) corresponds to topic_word_prior.

For each document \(d \in D\), draw the topic proportions

\(\theta_d \sim \mathrm{Dirichlet}(\alpha)\). \(\alpha\)

corresponds to doc_topic_prior.

For each word \(i\) in document \(d\):

Draw the topic assignment \(z_{di} \sim \mathrm{Multinomial}

(\theta_d)\)

Draw the observed word \(w_{di} \sim \mathrm{Multinomial}

(\beta_{z_{di}})\)

For parameter estimation, the posterior distribution is:

\[p(z, \theta, \beta |w, \alpha, \eta) =

\frac{p(z, \theta, \beta|\alpha, \eta)}{p(w|\alpha, \eta)}\]

Since the posterior is intractable, the variational Bayes method

uses a simpler distribution \(q(z,\theta,\beta | \lambda, \phi, \gamma)\)

to approximate it, and those variational parameters \(\lambda\),

\(\phi\), \(\gamma\) are optimized to maximize the Evidence

Lower Bound (ELBO):

\[\log\: P(w | \alpha, \eta) \geq L(w,\phi,\gamma,\lambda) \overset{\triangle}{=}

E_{q}[\log\:p(w,z,\theta,\beta|\alpha,\eta)] - E_{q}[\log\:q(z, \theta, \beta)]\]

Maximizing ELBO is equivalent to minimizing the Kullback-Leibler(KL) divergence

between \(q(z,\theta,\beta)\) and the true posterior

\(p(z, \theta, \beta |w, \alpha, \eta)\).

LatentDirichletAllocation implements the online variational Bayes

algorithm and supports both online and batch update methods.

While the batch method updates variational variables after each full pass through

the data, the online method updates variational variables from mini-batch data

points.

Note

Although the online method is guaranteed to converge to a local optimum point, the quality of

the optimum point and the speed of convergence may depend on mini-batch size and

attributes related to learning rate setting.

When LatentDirichletAllocation is applied on a “document-term” matrix, the matrix

will be decomposed into a “topic-term” matrix and a “document-topic” matrix. While

the “topic-term” matrix is stored as components_ in the model, the “document-topic”
matrix can be obtained from the transform method.

LatentDirichletAllocation also implements the partial_fit method. This is used

when data can be fetched sequentially.
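A minimal sketch on a few toy documents (the documents and the choice of two topics are illustrative):

>>> from sklearn.decomposition import LatentDirichletAllocation
>>> from sklearn.feature_extraction.text import CountVectorizer
>>> docs = ["the cat sat on the mat", "dogs and cats are pets",
...         "stock prices fell on monday", "markets and prices rallied"]
>>> counts = CountVectorizer().fit_transform(docs)       # "document-term" matrix
>>> lda = LatentDirichletAllocation(n_components=2, learning_method='online',
...                                 random_state=0)
>>> doc_topic = lda.fit_transform(counts)                # "document-topic" matrix
>>> doc_topic.shape
(4, 2)
>>> lda.components_.shape                                # "topic-term" matrix
(2, 16)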

Examples:

Topic extraction with Non-negative Matrix Factorization and Latent Dirichlet Allocation

References:

“Latent Dirichlet Allocation”

D. Blei, A. Ng, M. Jordan, 2003

“Online Learning for Latent Dirichlet Allocation”

M. Hoffman, D. Blei, F. Bach, 2010

“Stochastic Variational Inference”

M. Hoffman, D. Blei, C. Wang, J. Paisley, 2013

“The varimax criterion for analytic rotation in factor analysis”

H. F. Kaiser, 1958

See also Dimensionality reduction for dimensionality reduction with

Neighborhood Components Analysis.


Sklearn data visualization: Principal Component Analysis (PCA) - 知乎 (Zhihu)

Author: 吴吃辣, www.qikegu.com

Principal component analysis (PCA) is a dimensionality-reduction method commonly used to reduce the dimensionality of large data sets, converting a large set of variables into a smaller one that still contains most of the information of the large set. Reducing the number of variables naturally comes at the cost of some accuracy; the benefit of dimensionality reduction is trading a little precision for simplicity, because smaller data sets are easier to explore and visualize, and machine-learning algorithms can analyze the data more easily and quickly without having to process irrelevant variables. In short, the idea of principal component analysis (PCA) is simple: reduce the number of variables in a data set while keeping as much information as possible. With scikit-learn, it is easy to run principal component analysis on the data (the imports and the digits data set, which earlier chapters of the tutorial load, are included here so the snippet is self-contained):

import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()  # the handwritten digits data used throughout the tutorial

# Create a randomized PCA model that keeps two components

randomized_pca = PCA(n_components=2, svd_solver='randomized')

# Fit the data and transform it with the model

reduced_data_rpca = randomized_pca.fit_transform(digits.data)

# Create a regular PCA model

pca = PCA(n_components=2)

# Fit the data and transform it with the model

reduced_data_pca = pca.fit_transform(digits.data)

# Check the shape

reduced_data_pca.shape

# Print the data

print(reduced_data_rpca)

print(reduced_data_pca)

Output:

[[ -1.25946586 21.27488217]

[ 7.95761214 -20.76870381]

[ 6.99192224 -9.95598251]

...

[ 10.80128338 -6.96025076]

[ -4.87209834 12.42395157]

[ -0.34439091 6.36555458]]

[[ -1.2594653 21.27488157]

[ 7.95761471 -20.76871125]

[ 6.99191791 -9.95597343]

...

[ 10.80128002 -6.96024527]

[ -4.87209081 12.42395739]

[ -0.34439546 6.36556369]]

The randomized PCA model performs better when the data has many dimensions. You can compare the results of the regular PCA model with those of the randomized PCA model and see how they differ. The model was told to keep two components to make sure there is two-dimensional data available for plotting. Now a scatter plot can be drawn to visualize the data:

colors = ['black', 'blue', 'purple', 'yellow', 'white', 'red', 'lime', 'cyan', 'orange', 'gray']

# Draw a scatter plot of the PCA results

for i in range(len(colors)):

x = reduced_data_rpca[:, 0][digits.target == i]

y = reduced_data_rpca[:, 1][digits.target == i]

plt.scatter(x, y, c=colors[i])

# Set the legend: the digits 0-9 are shown in different colors

plt.legend(digits.target_names, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

# Set the axis labels

plt.xlabel('First Principal Component')

plt.ylabel('Second Principal Component')

# Set the title

plt.title("PCA Scatter Plot")

# Show the plot

plt.show()

This displays the PCA scatter plot of the digits data.

Principal Component Analysis (PCA) - scikit-learn中文社区


Principal Component Analysis (PCA)¶

These figures help illustrate how the point cloud is very flat in one direction; that is where PCA chooses a direction that is not flat.

print(__doc__)

# Authors: Gael Varoquaux
#          Jaques Grobler
#          Kevin Hughes
# License: BSD 3 clause

from sklearn.decomposition import PCA

from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# #############################################################################
# Create the data

e = np.exp(1)
np.random.seed(4)


def pdf(x):
    return 0.5 * (stats.norm(scale=0.25 / e).pdf(x)
                  + stats.norm(scale=4 / e).pdf(x))


y = np.random.normal(scale=0.5, size=(30000))
x = np.random.normal(scale=0.5, size=(30000))
z = np.random.normal(scale=0.1, size=len(x))

density = pdf(x) * pdf(y)
pdf_z = pdf(5 * z)

density *= pdf_z

a = x + y
b = 2 * y
c = a - b + z

norm = np.sqrt(a.var() + b.var())
a /= norm
b /= norm

# #############################################################################
# Plot the figures
def plot_figs(fig_num, elev, azim):
    fig = plt.figure(fig_num, figsize=(4, 3))
    plt.clf()
    ax = Axes3D(fig, rect=[0, 0, .95, 1], elev=elev, azim=azim)

    ax.scatter(a[::10], b[::10], c[::10], c=density[::10], marker='+', alpha=.4)
    Y = np.c_[a, b, c]

    # Using SciPy's SVD, this would be:
    # _, pca_score, V = scipy.linalg.svd(Y, full_matrices=False)
    pca = PCA(n_components=3)
    pca.fit(Y)
    pca_score = pca.explained_variance_ratio_
    V = pca.components_

    x_pca_axis, y_pca_axis, z_pca_axis = 3 * V.T

    x_pca_plane = np.r_[x_pca_axis[:2], - x_pca_axis[1::-1]]
    y_pca_plane = np.r_[y_pca_axis[:2], - y_pca_axis[1::-1]]
    z_pca_plane = np.r_[z_pca_axis[:2], - z_pca_axis[1::-1]]
    x_pca_plane.shape = (2, 2)
    y_pca_plane.shape = (2, 2)
    z_pca_plane.shape = (2, 2)
    ax.plot_surface(x_pca_plane, y_pca_plane, z_pca_plane)
    ax.w_xaxis.set_ticklabels([])
    ax.w_yaxis.set_ticklabels([])
    ax.w_zaxis.set_ticklabels([])


elev = -40
azim = -80
plot_figs(1, elev, azim)

elev = 30
azim = 20
plot_figs(2, elev, azim)

plt.show()

Total running time of the script: (0 minutes 0.198 seconds)

Download Python source code: plot_pca_3d.py

Download Jupyter notebook: plot_pca_3d.ipynb


Using scikit-learn for Principal Component Analysis (PCA) - 刘建平Pinard - 博客园 (cnblogs)


Using scikit-learn for Principal Component Analysis (PCA)

In the earlier post summarizing the principles of principal component analysis (PCA), we went over the theory behind PCA. Below we summarize how to use the scikit-learn toolkit to perform PCA dimensionality reduction.

1. An overview of the scikit-learn PCA classes

In scikit-learn, the classes related to PCA are all in the sklearn.decomposition package. The most commonly used PCA class is sklearn.decomposition.PCA, and the usage explained below is mainly based on this class.

Besides the PCA class, the most commonly used PCA-related class is KernelPCA. As discussed in the theory post, it is mainly used for dimensionality reduction of non-linear data and relies on the kernel trick, so an appropriate kernel function has to be chosen and its parameters tuned.

Another commonly used PCA-related class is IncrementalPCA, which mainly works around the memory limits of a single machine. Sometimes there are millions of samples with dimensionality in the thousands, and fitting all of the data at once may exhaust memory. In that case IncrementalPCA can be used: it splits the data into several batches and calls partial_fit on each batch in turn, step by step arriving at the final optimal dimensionality reduction, as sketched below.
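As a rough sketch of that batch-by-batch usage (not from the original post; the data here is random and purely illustrative):

import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.RandomState(0)
ipca = IncrementalPCA(n_components=2)
# Feed the data chunk by chunk; each call updates the fitted components incrementally
for _ in range(10):
    X_batch = rng.randn(1000, 50)   # one chunk of a data set too large to load at once
    ipca.partial_fit(X_batch)
print(ipca.explained_variance_ratio_)  # same attributes as PCA once fitted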

There are also SparsePCA and MiniBatchSparsePCA. Their main difference from the PCA class described above is that they use L1 regularization, which pushes the influence of many non-principal components down to zero, so that the projection only involves the relatively important principal components and is less affected by noise and similar factors. The difference between SparsePCA and MiniBatchSparsePCA is that MiniBatchSparsePCA performs the decomposition using a subset of the sample features and a given number of iterations, in order to work around the slow matrix factorization on large samples, at the price of a possible loss of accuracy. Both SparsePCA and MiniBatchSparsePCA require tuning the L1 regularization parameter.

2. The parameters of sklearn.decomposition.PCA

Below we explain how to do PCA dimensionality reduction with scikit-learn, based mainly on sklearn.decomposition.PCA. The PCA class needs hardly any tuning; in general we only have to specify the number of dimensions to reduce to, or the threshold on the fraction of the total feature variance that the retained principal components should explain.

The main parameters of sklearn.decomposition.PCA are the following:

1) n_components: the number of feature dimensions to keep after the reduction. The most common usage is to give the target number of dimensions directly, in which case n_components is an integer greater than or equal to 1. We can instead give the minimum fraction of the total variance that the principal components must explain and let the PCA class decide the number of dimensions from the feature variances, in which case n_components is a number in (0, 1]. We can also set the parameter to "mle", in which case the PCA class uses the MLE algorithm to select the number of principal components according to the variance distribution of the features. Finally, we can use the default, i.e. not pass n_components at all, in which case n_components = min(n_samples, n_features).

2) whiten: whether to perform whitening. Whitening normalizes every feature of the projected data so that each has variance 1. For PCA dimensionality reduction itself whitening is generally not needed; if there is further processing after the reduction, whitening can be considered. The default value is False, i.e. no whitening.

3) svd_solver: the method used for the singular value decomposition (SVD). Since eigendecomposition is a special case of SVD, most PCA libraries are implemented via SVD. Four values can be chosen: {'auto', 'full', 'arpack', 'randomized'}. 'randomized' is generally suitable for PCA on data with many samples and many dimensions but a relatively low fraction of requested components; it uses randomized algorithms to speed up the SVD. 'full' is SVD in the traditional sense, using the corresponding SciPy implementation. 'arpack' suits scenarios similar to 'randomized'; the difference is that 'randomized' uses scikit-learn's own SVD implementation, while 'arpack' directly uses the sparse SVD implementation of the SciPy library. The default is 'auto', i.e. the PCA class itself weighs the three algorithms above and chooses a suitable SVD algorithm for the reduction. In general, the default works well enough.

Besides these input parameters, two attributes of the PCA class deserve attention. The first is explained_variance_, the variance of each principal component after the projection; the larger the variance, the more important the principal component. The second is explained_variance_ratio_, the fraction of the total variance accounted for by each principal component; the larger this fraction, the more important the component.

3. A PCA example

Let us now work through an example of using the PCA class in scikit-learn. To make the visualization easy and the example intuitive, we use three-dimensional data for the reduction.

The complete code can be found in my GitHub repository: https://github.com/ljpzzz/machinelearning/blob/master/classic-machine-learning/pca.ipynb

First we generate random data and visualize it. The code is as follows:

import numpy as np

import matplotlib.pyplot as plt

from mpl_toolkits.mplot3d import Axes3D

%matplotlib inline

from sklearn.datasets import make_blobs  # sklearn.datasets.samples_generator was removed in recent versions

# X: sample features, y: cluster labels; 10000 samples with 3 features each, in 4 clusters

X, y = make_blobs(n_samples=10000, n_features=3, centers=[[3,3, 3], [0,0,0], [1,1,1], [2,2,2]], cluster_std=[0.2, 0.1, 0.2, 0.2],

random_state =9)

fig = plt.figure()

ax = fig.add_subplot(111, projection='3d', elev=30, azim=20)  # Axes3D(fig, ...) no longer attaches to the figure in recent matplotlib

ax.scatter(X[:, 0], X[:, 1], X[:, 2], marker='o')  # 3D scatter of the raw data

The distribution of the three-dimensional data looks like this:

We first project the data without reducing the dimensionality, to look at the variance distribution over the three projected dimensions. The code is as follows:

from sklearn.decomposition import PCA

pca = PCA(n_components=3)

pca.fit(X)

print(pca.explained_variance_ratio_)

print(pca.explained_variance_)

The output is as follows:

[ 0.98318212  0.00850037  0.00831751]
[ 3.78483785  0.03272285  0.03201892]

We can see that the variance ratios of the three projected feature dimensions are roughly 98.3% : 0.8% : 0.8%. The first projected feature accounts for the overwhelming majority of the variance.

Now we reduce the dimensionality from three dimensions down to two. The code is as follows:

pca = PCA(n_components=2)

pca.fit(X)

print(pca.explained_variance_ratio_)

print(pca.explained_variance_)

The output is as follows:

[ 0.98318212  0.00850037]
[ 3.78483785  0.03272285]

This result could be anticipated: the variances of the three projected feature dimensions above were [3.78483785, 0.03272285, 0.03201892], so projecting onto two dimensions necessarily keeps the first two features and drops the third.

To get an intuitive picture, let us look at the distribution of the transformed data. The code is as follows:

X_new = pca.transform(X)

plt.scatter(X_new[:, 0], X_new[:, 1],marker='o')

plt.show()

The resulting plot is shown below:

We can see that after the reduction the data still clearly shows the 4 clusters from the earlier three-dimensional plot.

Now, instead of specifying the target dimensionality directly, let us specify the fraction of the variance that the retained principal components must explain.

pca = PCA(n_components=0.95)

pca.fit(X)

print(pca.explained_variance_ratio_)

print(pca.explained_variance_)

print(pca.n_components_)

We required the principal components to account for at least 95% of the variance. The output is as follows:

[ 0.98318212]

[ 3.78483785]

1

Only the first projected feature is kept. This is easy to understand: the first principal component alone accounts for as much as 98% of the projected variance, so selecting just this one feature dimension already satisfies the 95% threshold. Let us now try a threshold of 99%. The code is as follows:

pca = PCA(n_components=0.99)

pca.fit(X)

print(pca.explained_variance_ratio_)

print(pca.explained_variance_)

print(pca.n_components_)

This time the output is as follows:

[ 0.98318212 0.00850037]

[ 3.78483785 0.03272285]

2

This result also makes sense: the first principal component explains 98.3% of the variance and the second 0.8%, and the two together satisfy our threshold.

Finally, let us see how the MLE algorithm chooses the dimensionality on its own. The code is as follows:

pca = PCA(n_components='mle')

pca.fit(X)

print(pca.explained_variance_ratio_)

print(pca.explained_variance_)

print(pca.n_components_)

The output is as follows:

[ 0.98318212]
[ 3.78483785]
1

Since the first projected feature of our data accounts for as much as 98.3% of the variance, the MLE algorithm kept only that first feature.

 

(Reposting is welcome; please credit the source. Questions and discussion are welcome: liujianping-ok@163.com)

    

    


PCA example with Iris Data-set — scikit-learn 1.4.1 documentation

PCA example with Iris Data-set¶

Principal Component Analysis applied to the Iris dataset.

See here for more

information on this dataset.

# Code source: Gaël Varoquaux

# License: BSD 3 clause

import matplotlib.pyplot as plt

# unused but required import for doing 3d projections with matplotlib < 3.2

import mpl_toolkits.mplot3d # noqa: F401

import numpy as np

from sklearn import datasets, decomposition

np.random.seed(5)

iris = datasets.load_iris()

X = iris.data

y = iris.target

fig = plt.figure(1, figsize=(4, 3))

plt.clf()

ax = fig.add_subplot(111, projection="3d", elev=48, azim=134)

ax.set_position([0, 0, 0.95, 1])

plt.cla()

pca = decomposition.PCA(n_components=3)

pca.fit(X)

X = pca.transform(X)

for name, label in [("Setosa", 0), ("Versicolour", 1), ("Virginica", 2)]:
    ax.text3D(
        X[y == label, 0].mean(),
        X[y == label, 1].mean() + 1.5,
        X[y == label, 2].mean(),
        name,
        horizontalalignment="center",
        bbox=dict(alpha=0.5, edgecolor="w", facecolor="w"),
    )

# Reorder the labels to have colors matching the cluster results

y = np.choose(y, [1, 2, 0]).astype(float)

ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=y, cmap=plt.cm.nipy_spectral, edgecolor="k")

ax.xaxis.set_ticklabels([])

ax.yaxis.set_ticklabels([])

ax.zaxis.set_ticklabels([])

plt.show()

Total running time of the script: (0 minutes 0.095 seconds)

Download Jupyter notebook: plot_pca_iris.ipynb

Download Python source code: plot_pca_iris.py
