Different issues that may occur in Spark and their remedies:

We currently run Spark jobs on AWS m4.2xlarge instances, processing roughly 30 GB of data with low latency using the configuration discussed below. Here we cover the Spark configuration parameters we applied to resolve common issues and get efficient performance on AWS.

Spark on YARN environment (set the two properties below when submitting a job through spark-submit):

--num-executors NUM   Number of executors to launch (Default: 2).
--executor-cores NUM  Number of cores per executor (Default: 1).

Note: Choose values for these switches according to your cluster capacity.

Troubleshooting:

Issue 1: Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration.

Resolution: Spark properties can be configured in three ways: set them programmatically on a SparkConf in your application, pass them as flags to spark-submit (for example, --master yarn), or add them to conf/spark-defaults.conf.
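For the spark-defaults.conf route, an illustrative fragment is shown below; the values are placeholders and the equivalent spark-submit flags are noted in comments:

```
# conf/spark-defaults.conf (illustrative values)
spark.master              yarn   # same as --master yarn
spark.executor.instances  10     # same as --num-executors 10
spark.executor.cores      4      # same as --executor-cores 4
```

Flags passed explicitly to spark-submit override values from spark-defaults.conf, and properties set on a SparkConf in code take precedence over both.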
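As a sketch, a spark-submit invocation combining these switches might look like the following; the application class, jar name, and the executor counts are placeholders, not recommendations:

```shell
# Hypothetical job submission on YARN; tune --num-executors and
# --executor-cores to your cluster capacity.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
  --executor-cores 4 \
  --class com.example.MyApp \
  my-app.jar
```

Passing --master here also avoids the "A master URL must be set" exception discussed below, since the flag takes effect without any code change.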