Posts

Showing posts from April, 2017

Steps to Install Zeppelin with Spark

Zeppelin can be used for data ingestion, data discovery, data analytics, and data visualization & collaboration. Below are the steps to install it:

1. Download spark-2.1.0.
2. Download the Zeppelin binary from the official site.
3. sudo vi .bashrc and add these two lines at the end:
   export JAVA_HOME=/usr/lib/jvm/java-8-oracle
   export SPARK_HOME=/home/ashis/Downloads/spark-2.1.0
4. source .bashrc
5. cd into Zeppelin's conf directory.
6. sudo vi zeppelin-env.sh and add the same two export lines there.
7. sudo vi zeppelin-site.xml and change the port from 8080 to 8082.
8. cd spark-2.1.0 and run sbin/start-all.sh.
9. cd into the Zeppelin directory and run bin/zeppelin-daemon.sh start.
10. Run bin/spark-shell.
11. Open http://localhost:8082/ for the Zeppelin UI (a quick sanity check is sketched after these steps).
12. Try the Zeppelin tutorial at http://localhost:8082/#/, or if that is not working you can follow this link: https://gist.github.com/pratos/b2e2937106980a867d0558cba
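Once the daemons are up, a quick way to confirm the installation is wired correctly is to run a tiny job in a Zeppelin paragraph (or in the spark-shell started in step 10). This is a minimal sketch assuming Zeppelin's default %spark interpreter, which exposes the SparkContext as sc:

```scala
// Run in a Zeppelin %spark paragraph or in spark-shell; `sc` is the
// SparkContext both environments provide out of the box.
val nums = sc.parallelize(1 to 100)        // distribute a small range
println(s"Sum of 1..100 = ${nums.sum()}")  // expect 5050.0
```

If this prints the expected sum, Spark and Zeppelin are talking to each other and you can move on to the tutorial notebooks.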

Nepali New Year

There are more than 60 ethnic groups in Nepal, so we have more than nine New Year days in a year. The primary one is celebrated in the first month of the Bikram Sambat Nepali calendar, called "Baisakh", and falls in mid-April on the international calendar. It is a holiday in Nepal, so we plan new and exciting events such as get-togethers with friends, family, and relatives. We start celebrating from New Year's Eve, usually staying up until midnight to welcome the new day with bliss. At home we cook, or bring in, the foods we love to eat and enjoy them with family; we also go out to hotels and restaurants for lunch or dinner. Sometimes with our circle of friends, sometimes with family members, we plan to visit places, and parks across the country are full of celebrations, with picnics and other programs. Some schools, colleges, and offices plan a New Year cultural program to start a fresh…

Parquet is a column-based data store or file format (useful for Spark read/write and SQL to boost performance)

PARQUET FILE SYSTEM

Parquet is a column-based store, built as a joint project of Cloudera and Twitter engineers. It is designed to support very efficient compression and encoding schemes. Parquet allows compression schemes to be specified at a per-column level, and is future-proofed to allow new encodings to be added as they are invented and implemented. It separates the concepts of encoding and compression, allowing Parquet consumers to implement operators that work directly on encoded data, without paying the decompression and decoding penalty when possible. Twitter is starting to convert some of its major data sources to Parquet in order to take advantage of the compression and deserialization savings. Fusemachines is an AI-based sales company; we used to use JSON as our data store, and we are now upgrading to the Parquet file format. Time and space analysis on small datasets (statistics on…
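To make the read/write mechanics concrete, here is a minimal, hedged Spark 2.1 sketch (the path, app name, and column names are made up for illustration, not from the original post) that writes a DataFrame as Parquet and reads back a single column, which is exactly where the columnar layout pays off:

```scala
import org.apache.spark.sql.SparkSession

object ParquetExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParquetExample")
      .master("local[*]")                 // local run for illustration
      .getOrCreate()

    // Small example DataFrame; every column in a Parquet file is stored,
    // encoded, and compressed independently.
    val df = spark.range(0, 1000).selectExpr("id", "id % 10 AS bucket")
    df.write.mode("overwrite").parquet("/tmp/example_parquet")

    // Selecting one column lets Parquet skip the others on disk
    // (column pruning), saving both I/O and decoding work.
    spark.read.parquet("/tmp/example_parquet").select("bucket").show(5)

    spark.stop()
  }
}
```

Reading only "bucket" never touches the bytes of "id" on disk, which is the practical difference from a row-oriented format like JSON.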

Different issues that may occur in Apache Spark and their remedies

Currently we are using m4.2xlarge instances. With the configurations discussed below, Spark jobs process large data efficiently, taming big data to get the desired output with low latency. Here we discuss the Spark configuration parameters we applied to resolve issues and get efficient performance on AWS while processing 30 GB of data.

Spark on YARN environment (set the two properties below when submitting a job through spark-submit):

--num-executors NUM    Number of executors to launch (default: 2).
--executor-cores NUM   Number of cores per executor (default: 1).

Note: these switches should be used depending on cluster capacity.

Troubleshooting:

Issue 1: Exception in thread "main" org.apache.spark.SparkException: A master URL must be configured.

Resolution: Spark properties can be configured in three ways: setting the configuration properties in code using SparkConf (sketched below), setting…
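Picking up the first of those configuration routes, here is a hedged sketch of setting the master URL in code via SparkConf (the app name and master value are illustrative, not from the original post):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of resolving the "master URL" exception by setting the master
// in code. On a real cluster you would typically pass --master yarn
// (plus --num-executors / --executor-cores) to spark-submit instead.
object MasterUrlExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("MasterUrlExample")   // illustrative app name
      .setMaster("local[*]")            // or "yarn" when run on the cluster
    val sc = new SparkContext(conf)
    println(s"Running with master = ${sc.master}")
    sc.stop()
  }
}
```

Note that a master set in code takes precedence over one passed on the command line, so hard-coding it is best kept to local testing.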