Search results
Simple one-line answer to create a new dataframe with only numeric columns: df.select_dtypes(include=np.number). If you want the names of the numeric columns: df.select_dtypes(include=np.number).columns.tolist(). Complete code:

import pandas as pd
import numpy as np
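A minimal sketch of the selection described in this answer. The frame, its column names, and its values are made up for illustration and do not come from the original answer:

```python
import numpy as np
import pandas as pd

# Hypothetical example data: two numeric columns and one string column.
df = pd.DataFrame({
    "age": [25, 32, 47],
    "height": [1.70, 1.82, 1.65],
    "name": ["Ann", "Bob", "Cy"],
})

# Keep only the columns whose dtype is a NumPy number (ints, floats, ...).
numeric_df = df.select_dtypes(include=np.number)

# Just the names of those columns, as a plain Python list.
numeric_cols = numeric_df.columns.tolist()
print(numeric_cols)
```

The string column is dropped because its dtype is not a subtype of np.number.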
Pandas: pd.cut. As @JonClements suggests, you can use pd.cut for this, the benefit here being that your new column becomes a Categorical. You only need to define your boundaries (including np.inf) and category names, then apply pd.cut to the desired numeric column. bins = [0, 2, 18, 35, 65, np.inf]
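A short sketch of how the binning might look end to end. The bins are the ones quoted in the answer; the category names, the column name `age`, and the sample values are assumptions added for illustration, since the snippet is cut off before they appear:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the 'age' column name is an assumption.
df = pd.DataFrame({"age": [1, 10, 20, 40, 70]})

# Boundaries quoted in the answer.
bins = [0, 2, 18, 35, 65, np.inf]
# Illustrative category names (not from the original answer).
labels = ["0-2", "2-18", "18-35", "35-65", "65+"]

# Intervals are right-closed by default: (0, 2], (2, 18], and so on.
df["age_group"] = pd.cut(df["age"], bins=bins, labels=labels)
print(df)
```

Because the default intervals are open on the left, a value of exactly 0 would fall outside the first bin and become NaN; passing include_lowest=True is one way to avoid that.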
With the following code you can convert all data frame columns to numeric (X is the data frame whose columns we want to convert): as.data.frame(lapply(X, as.numeric)). For converting a whole matrix to numeric you have two ways. Either: mode(X) <- "numeric" or:
Now the data look similar but are stored categorically. To capture the category codes: df['code'] = df.cc.cat.codes. Now you have:

   cc  temp  code
0  US  37.0     2
1  CA  12.0     1
2  US  35.0     2
3  AU  20.0     0

If you don't want to modify your DataFrame but simply get the codes: df.cc.astype('category').cat.codes
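The steps in this answer can be reproduced with the small frame shown in its output. This is a sketch that rebuilds the data from that table; note that on a Series the integer codes are reached through the .cat accessor:

```python
import pandas as pd

# Data rebuilt from the table shown in the answer.
df = pd.DataFrame({
    "cc": ["US", "CA", "US", "AU"],
    "temp": [37.0, 12.0, 35.0, 20.0],
})

# Option 1: get the codes without modifying the DataFrame.
codes_only = df["cc"].astype("category").cat.codes

# Option 2: store the column as a categorical and add a code column.
df["cc"] = df["cc"].astype("category")
df["code"] = df["cc"].cat.codes
print(df)
```

Categories are sorted lexically by default, which is why AU maps to 0, CA to 1, and US to 2.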
EDIT: updated to avoid use of the ill-advised sapply. Since a data frame is a list, we can use the list-apply functions:
Use sklearn.impute.IterativeImputer and replicate a MissForest imputer for mixed data (but you will have to process numeric and categorical features separately). For example:
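The original example is cut off, so the following is only a rough sketch of the numeric half of that idea: IterativeImputer with a random-forest regressor as its estimator, which approximates MissForest-style imputation. The synthetic data and all parameter values are assumptions chosen for illustration, and categorical features would need separate handling that is not shown here:

```python
import numpy as np
# IterativeImputer is experimental; this import is required to enable it.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

# Hypothetical numeric data: the third column depends on the first two.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[:, 2] = X[:, 0] + X[:, 1]

# Knock out some entries to simulate missing values.
X_missing = X.copy()
X_missing[::7, 2] = np.nan
X_missing[3::11, 0] = np.nan

# Each feature with missing values is modelled from the others
# using a random forest, in round-robin fashion.
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=20, random_state=0),
    max_iter=5,
    random_state=0,
)
X_imputed = imputer.fit_transform(X_missing)
```

With so few iterations scikit-learn may emit a convergence warning; that is expected for a toy run like this one.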
It is text data, and I learned that k-means cannot handle non-numerical data. I wanted to cluster the data just on the basis of the tweets. The data looks like this. I found this code that converts the text into numerical data: def handle_non_numerical_data(df): columns = df.columns.values for column in columns:
I would suggest using numpy.gradient, as in this example:

import numpy as np
from matplotlib import pyplot as plt

# we sample a sin(x) function
dx = np.pi/10
x = np.arange(0, 2*np.pi, np.pi/10)

# we calculate the derivative with np.gradient
plt.plot(x, np.gradient(np.sin(x), dx), '-*', label='approx')
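To check the approximation numerically rather than by eye, one option is to compare it with the exact derivative cos(x). This sketch reuses the sampling from the answer and leaves out the plotting; the error variable names are made up for illustration:

```python
import numpy as np

# Same sampling of sin(x) as in the answer.
dx = np.pi / 10
x = np.arange(0, 2 * np.pi, dx)

# np.gradient uses central differences in the interior and
# one-sided differences at the two end points.
approx = np.gradient(np.sin(x), dx)
exact = np.cos(x)

# Largest absolute error overall and in the interior only.
max_err = np.max(np.abs(approx - exact))
interior_err = np.max(np.abs(approx[1:-1] - exact[1:-1]))
print(max_err, interior_err)
```

The end points are usually less accurate than the interior because they use one-sided differences; passing edge_order=2 to np.gradient is one way to tighten them.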
I haven't used scikit much before, but I suppose that Gaussian Naive Bayes is suitable for continuous data and that Bernoulli Naive Bayes can be used for categorical data. However, since I want to have both categorical and continuous data in my model, I don't really know how to handle this. Any ideas would be much appreciated!
Just pick a type: you can use a NumPy dtype (e.g. np.int16), some Python types (e.g. bool), or pandas-specific types (like the categorical dtype). Call the method on the object you want to convert and astype() will try to convert it for you:

# convert all DataFrame columns to the int64 dtype
df = df.astype(int)
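A small sketch of the three kinds of target type mentioned in this answer. The frame and its column names are invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: numeric strings in 'a', floats in 'b'.
df = pd.DataFrame({"a": ["1", "2", "3"], "b": [1.0, 2.0, 3.0]})

# A Python type: convert every column to an integer dtype.
as_int = df.astype(int)

# A NumPy dtype for a single column, via a {column: dtype} mapping;
# columns not named in the mapping are left unchanged.
as_small = df.astype({"b": np.int16})

# A pandas-specific type: the categorical dtype.
as_cat = df["a"].astype("category")

print(as_int.dtypes, as_small.dtypes, as_cat.dtype, sep="\n")
```

astype() raises an error if a value cannot be converted, for example a non-numeric string passed to astype(int).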