What We Wanted to Achieve....
We were looking for a generic code snippet to drop a certain column after reading data from a file using Spark 2.0. All we knew was that, in every file we were aiming to process, the column to be dropped was at the (n-m)th position, counting from zero (where n is the number of columns in the dataframe and m is the column's offset from the end, so m = 1 is the last column).
What We Did ....
Read data:
val df1 = sqlContext.read.format("com.databricks.spark.csv").option("delimiter", "delim").load("/Path/to/file") // replace "delim" with the file's actual delimiter
Get Schema of dataframe:
df1.printSchema()
Drop Column from a dataframe:
df1.drop("column name")
Drop column based on position:
// To drop the (n-m)th column of the dataframe
val m = 1 // offset from the end (example value: 1 drops the last column)
val col = df1.columns // get the array of column names from the dataframe
val n = col.length // get the number of columns (n)
val ToBeDropped = n - m // zero-based index of the column to be dropped
val oldDf = df1.drop(col(ToBeDropped)) // drop the (n-m)th column; df1 itself is left unchanged
Example to drop last but one column from the data frame:
val col = df1.columns
val n = col.length
val ToBeDropped = n - 2 // index of the last-but-one column; use n - 1 for the last column, and so on
val oldDf = df1.drop(col(ToBeDropped))
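The snippets above assume that m is always between 1 and n. If m falls outside that range, col(n - m) throws an ArrayIndexOutOfBoundsException. A minimal sketch of a guard for that case follows. The object ColumnFromEnd and the method nameFromEnd are hypothetical names introduced here for illustration and are not part of Spark. The helper only works on the list of column names, so it can be tried without a Spark session.

```scala
// Hypothetical helper (not part of Spark): find the name of the column
// that sits m positions from the end of a list of column names.
object ColumnFromEnd {
  // Returns Some(name) when 1 <= m <= columns.length, otherwise None.
  // m = 1 is the last column, m = 2 the last-but-one, and so on.
  def nameFromEnd(columns: Seq[String], m: Int): Option[String] = {
    val n = columns.length
    if (m >= 1 && m <= n) Some(columns(n - m)) else None
  }
}
```

With a dataframe at hand, this could be used as ColumnFromEnd.nameFromEnd(df1.columns, 2).map(c => df1.drop(c)).getOrElse(df1), which drops the last-but-one column when it exists and otherwise returns df1 unchanged.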