What We Wanted to Achieve....
We were looking for a generic code snippet to drop a particular column after reading data from a file using Spark 2.0. All we knew was that, in every file we wanted to process, the column to be dropped was at position (n-m), where n is the number of columns in the DataFrame and m is a fixed offset counted from the end (m = 1 is the last column).
What We Did ....
Read data:
val df1 = sqlContext.read.format("com.databricks.spark.csv").option("delimiter", "delim").load("/Path/to/file") // "delim" stands for the file's actual delimiter, e.g. "|"
Get the schema of the DataFrame:
df1.printSchema()
Drop a column from a DataFrame by name:
val newDf = df1.drop("column name") // drop returns a new DataFrame; df1 itself is unchanged
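One caveat when dropping by name: Spark documents drop(colName) as a no-op when the schema does not contain that name, so a misspelled name fails silently. The sketch below guards against that by checking the name against the column list first. CheckedDrop and checkedName are hypothetical names (not part of Spark), and the code is plain Scala, so it runs without Spark.

```scala
object CheckedDrop {
  // Hypothetical guard: return the name unchanged if it exists in the column
  // list, otherwise throw IllegalArgumentException (via require), because
  // Spark's drop(colName) would silently do nothing for an unknown name.
  def checkedName(columns: Array[String], name: String): String = {
    require(columns.contains(name),
      s"no column named '$name' in: ${columns.mkString(", ")}")
    name
  }

  def main(args: Array[String]): Unit = {
    val cols = Array("_c0", "_c1", "_c2")
    println(checkedName(cols, "_c1")) // prints _c1
    // checkedName(cols, "_c9") would throw IllegalArgumentException
  }
}
```

In spark-shell this would be used as df1.drop(CheckedDrop.checkedName(df1.columns, "column name")).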
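Drop a column based on its position:
//To drop the (n-m)th column of the dataframe (m = 1 drops the last column)
val m = 1 // example offset from the end; set it to suit your files
val cols = df1.columns // get the array of column names from the dataframe
val n = cols.length // get the number of columns (n)
val toBeDropped = n - m // get the 0-based index of the column to be dropped
val newDf = df1.drop(cols(toBeDropped)) // drop the (n-m)th column; df1 itself is unchanged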
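The same steps can be wrapped in a reusable function so they apply unchanged to files of any width. This is a sketch of how that might look, not the original code: ColumnFromEnd and columnToDrop are hypothetical names, and the require check (an addition) rejects an offset m outside 1..n up front instead of failing later with an ArrayIndexOutOfBoundsException. The sample names _c0, _c1, ... mirror the defaults Spark's CSV reader assigns when no header is read.

```scala
object ColumnFromEnd {
  // Hypothetical helper: name of the m-th column counted from the end
  // (m = 1 is the last column), i.e. columns(n - m) with n = columns.length.
  def columnToDrop(columns: Array[String], m: Int): String = {
    require(m >= 1 && m <= columns.length,
      s"m must be between 1 and ${columns.length}, got $m")
    columns(columns.length - m)
  }

  def main(args: Array[String]): Unit = {
    // Two files of different widths, same offset m = 2 (last but one).
    val narrow = Array("_c0", "_c1", "_c2")
    val wide = Array("_c0", "_c1", "_c2", "_c3", "_c4")
    println(columnToDrop(narrow, 2)) // prints _c1
    println(columnToDrop(wide, 2))   // prints _c3
  }
}
```

With a DataFrame this becomes df1.drop(ColumnFromEnd.columnToDrop(df1.columns, m)), which picks the right column regardless of how many columns each file has.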
Example: drop the last-but-one column from the DataFrame:
val cols = df1.columns
val n = cols.length
val toBeDropped = n - 2 // n - 2 is the last-but-one column; use n - 1 for the last column, and so on
val newDf = df1.drop(cols(toBeDropped))
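If more than one position has to go, the offsets can be turned into names in one pass and handed to drop, which since Spark 2.0 also accepts several column names (drop(colNames: String*)). A sketch under that assumption; ColumnsFromEnd, columnsToDrop and the sample column names are hypothetical.

```scala
object ColumnsFromEnd {
  // Hypothetical helper: names at several offsets from the end (1 = last),
  // each validated to lie within 1..n before indexing.
  def columnsToDrop(columns: Array[String], offsets: Seq[Int]): Seq[String] = {
    val n = columns.length
    offsets.map { m =>
      require(m >= 1 && m <= n, s"offset must be between 1 and $n, got $m")
      columns(n - m)
    }
  }

  def main(args: Array[String]): Unit = {
    val cols = Array("id", "name", "city", "load_ts", "checksum")
    // Last and last-but-one columns.
    println(columnsToDrop(cols, Seq(1, 2)).mkString(", ")) // prints checksum, load_ts
  }
}
```

df1.drop(ColumnsFromEnd.columnsToDrop(df1.columns, Seq(1, 2)): _*) would then remove the last and last-but-one columns in a single call.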