Posts

Showing posts from April, 2017

Steps to Install Zeppelin with Spark

Zeppelin can be used for data ingestion, data discovery, data analytics, and data visualization & collaboration. Below are the steps to install it:

1. Download spark-2.1.0.
2. Download the Zeppelin binary from the official site.
3. sudo vi .bashrc and add these two lines at the end:
   export JAVA_HOME=/usr/lib/jvm/java-8-oracle
   export SPARK_HOME=/home/ashis/Downloads/spark-2.1.0
4. source .bashrc
5. cd into Zeppelin's conf directory.
6. sudo vi zeppelin-env.sh and add the same two export lines there.
7. sudo vi zeppelin-site.xml and change the port from 8080 to 8082.
8. cd spark-2.1.0 and run sbin/start-all.sh.
9. cd into the Zeppelin directory and run bin/zeppelin-daemon.sh start.
10. Run bin/spark-shell.
11. Open http://localhost:8082/ for the Zeppelin UI (a quick sanity check is sketched after these steps).
12. Try the Zeppelin tutorial at http://localhost:8082/#/, or if that is not working you can follow this link: https://gist.github.com/pratos/b2e2937106980a867d0558cba
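Once the daemons are up, a quick way to confirm the installation is wired correctly is to run a tiny job in a Zeppelin paragraph (or in the spark-shell started in step 10). This is a minimal sketch assuming Zeppelin's default %spark interpreter, which exposes the SparkContext as sc:

```scala
// Run in a Zeppelin %spark paragraph or in spark-shell; `sc` is the
// SparkContext both environments provide out of the box.
val nums = sc.parallelize(1 to 100)        // distribute a small range
println(s"Sum of 1..100 = ${nums.sum()}")  // expect 5050.0
```

If this prints the expected sum, Spark and Zeppelin are talking to each other and you can move on to the tutorial notebooks.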

Nepali New Year

There are more than 60 ethnic groups in Nepal, so we have more than nine New Year days in a year. The primary one is celebrated in the first month of the Bikram Sambat Nepali calendar, called "Baisakh", and falls in mid-April on the international calendar. It is a holiday in Nepal, so we plan new and exciting events such as get-togethers with friends, family, and relatives. We start celebrating from New Year's Eve, usually staying up until midnight to welcome the new day with bliss. At home we cook, or bring in, the foods we love to eat and enjoy them with family; we also go out to hotels and restaurants for lunch or dinner. Sometimes with our circle of friends, sometimes with family members, we plan to visit places, and parks across the country are full of celebrations, with picnics and other programs. Some schools, colleges, and offices plan a New Year cultural program to start a fresh…

Parquet is a column-based data store or file format (useful for Spark read/write and SQL to boost performance)

PARQUET FILE SYSTEM

Parquet is a column-based store, built as a joint project of Cloudera and Twitter engineers. It is designed to support very efficient compression and encoding schemes. Parquet allows compression schemes to be specified at a per-column level, and is future-proofed to allow new encodings to be added as they are invented and implemented. It separates the concepts of encoding and compression, allowing Parquet consumers to implement operators that work directly on encoded data, without paying the decompression and decoding penalty when possible. Twitter is starting to convert some of its major data sources to Parquet in order to take advantage of the compression and deserialization savings. Fusemachines is an AI-based sales company; we used to use JSON as our data store, and we are now upgrading to the Parquet file format. Time and space analysis on small datasets (statistics on…
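To make the read/write mechanics concrete, here is a minimal, hedged Spark 2.1 sketch (the path, app name, and column names are made up for illustration, not from the original post) that writes a DataFrame as Parquet and reads back a single column, which is exactly where the columnar layout pays off:

```scala
import org.apache.spark.sql.SparkSession

object ParquetExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParquetExample")
      .master("local[*]")                 // local run for illustration
      .getOrCreate()

    // Small example DataFrame; every column in a Parquet file is stored,
    // encoded, and compressed independently.
    val df = spark.range(0, 1000).selectExpr("id", "id % 10 AS bucket")
    df.write.mode("overwrite").parquet("/tmp/example_parquet")

    // Selecting one column lets Parquet skip the others on disk
    // (column pruning), saving both I/O and decoding work.
    spark.read.parquet("/tmp/example_parquet").select("bucket").show(5)

    spark.stop()
  }
}
```

Reading only "bucket" never touches the bytes of "id" on disk, which is the practical difference from a row-oriented format like JSON.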

Different issues that may occur in Apache Spark and their remedies

Currently we are using m4.2xlarge instances. With the configurations discussed below, Spark jobs process large data efficiently, taming big data to get the desired output with low latency. Here we discuss the Spark configuration parameters we applied to resolve issues and get efficient performance on AWS while processing 30 GB of data.

Spark on YARN environment (set the two properties below when submitting a job through spark-submit):

--num-executors NUM    Number of executors to launch (default: 2).
--executor-cores NUM   Number of cores per executor (default: 1).

Note: these switches should be used depending on cluster capacity.

Troubleshooting:

Issue 1: Exception in thread "main" org.apache.spark.SparkException: A master URL must be configured.

Resolution: Spark properties can be configured in three ways: setting the configuration properties in code using SparkConf (sketched below), setting…
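Picking up the first of those configuration routes, here is a hedged sketch of setting the master URL in code via SparkConf (the app name and master value are illustrative, not from the original post):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of resolving the "master URL" exception by setting the master
// in code. On a real cluster you would typically pass --master yarn
// (plus --num-executors / --executor-cores) to spark-submit instead.
object MasterUrlExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("MasterUrlExample")   // illustrative app name
      .setMaster("local[*]")            // or "yarn" when run on the cluster
    val sc = new SparkContext(conf)
    println(s"Running with master = ${sc.master}")
    sc.stop()
  }
}
```

Note that a master set in code takes precedence over one passed on the command line, so hard-coding it is best kept to local testing.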