Tag Archives: java

Run pyspark on your windows machine

1) Download the Spark library to your local machine and decompress the archive. Then set the SPARK_HOME and HADOOP_HOME environment variables to point to the decompressed folder, for example: C:\Users\some_user\PycharmProjects\spark-2.4.4-bin-hadoop2.7. Also find the winutils executable online and put it in the Spark bin folder. 2) Install the Java JDK if you do not […]
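Step 1 above can also be done from inside a Python session rather than through the Windows settings dialog. The following is a minimal sketch, not the post's own code: the helper name `configure_spark_env` is hypothetical, and it assumes that setting the variables for the current process (and its child processes) is enough for your use case.

```python
import os
from pathlib import Path


def configure_spark_env(spark_dir):
    """Point SPARK_HOME and HADOOP_HOME at the decompressed Spark folder.

    Hypothetical helper for illustration. The variables are set only for
    the current Python process and any processes it starts.
    """
    spark_dir = str(spark_dir)
    os.environ["SPARK_HOME"] = spark_dir
    os.environ["HADOOP_HOME"] = spark_dir
    # The excerpt says winutils goes in the Spark bin folder, so return
    # that location to let the caller check whether the file is there.
    return Path(spark_dir) / "bin" / "winutils.exe"


# Example usage (the path is the one quoted in the excerpt):
# winutils = configure_spark_env(
#     r"C:\Users\some_user\PycharmProjects\spark-2.4.4-bin-hadoop2.7")
# print("winutils present:", winutils.exists())
```

Call the helper before importing or starting pyspark so that the variables are already in place when Spark starts.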

Posted on September 13, 2019 by GauZ in Pyspark, spark

Tags: java, JDK, pyspark, Python, setup, Spark

java-jdk in pyspark project

A pyspark project running locally requires the JAVA_HOME environment variable to be set. If you’re using conda or anaconda-project to manage packages, you do not need to install the bloated Oracle Java JDK; just add the java-jdk package from the bioconda (Linux) or cyclus (Linux and Windows) channel and point the JAVA_HOME property to the bin […]
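As a rough illustration of the JAVA_HOME step, the sketch below sets the variable for the current process and reports whether a `java` executable can be found under the chosen folder. The helper name `set_java_home` is hypothetical. The excerpt is cut off before it names the exact folder, so the folder is left as a parameter here, and searching both the folder and its `bin` subfolder is an assumption made only for this check.

```python
import os
import shutil


def set_java_home(java_home):
    """Set JAVA_HOME for this process and look for a java executable.

    Hypothetical helper for illustration. Returns the path to the java
    executable if one is found, otherwise None.
    """
    java_home = str(java_home)
    os.environ["JAVA_HOME"] = java_home
    # Assumption: java may sit directly in the folder or in its bin
    # subfolder, so both locations are searched.
    search_path = os.pathsep.join([java_home, os.path.join(java_home, "bin")])
    return shutil.which("java", path=search_path)


# Example usage, with the folder supplied by you:
# found = set_java_home(your_java_folder)
# print("java executable:", found)
```

A `None` result suggests the folder passed in is not the one the java-jdk package installed into.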

Posted on September 13, 2019 by GauZ in Java, Python, Technology, Uncategorized

Tags: anaconda, conda, cyclus, java, java-jdk, java_home, JDK, package, pyspark

Setup Oracle Java (JDK – 8) on Ubuntu

Open an Ubuntu terminal and run the statements below in sequence to install JDK 8:

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
$ sudo apt-get install oracle-java8-set-default

Posted on April 27, 2017 by GauZ in Technology

Tags: java, JDK, JDK 8, Oracle, Ubuntu

Tinkering with Apache Hadoop – Map Reduce Framework

I have used a Map Reduce based system at my present employer (Bank of America – Merrill Lynch) to process (read “crunch”) extremely large datasets in a matter of seconds. Sometimes I used it to price bonds in real time; otherwise it was used for data processing/reporting purposes. It is an in-house product, known as the Hugs framework […]

Posted on February 18, 2013 by GauZ in Java, Python, Statistics, Technology

Tags: Apache, awk, bash, basic, combiner, dataset, doublewritable, hadoop, intwritable, java, map reduce, mapper, mapred, mapreduce, Python, reducer, text, tutorial, Ubuntu, weather

Logging in Java, C# and Python

In Java & C#, the most commonly used logging utilities are log4j and log4net. Python has its own built-in logging module, which covers much the same features as its log4X brethren! Setting up the logging module in Python:

>> import logging
def initLogger(appName='Application', handlerType='StreamHandler', \
               loggerLevel='INFO', handlerLevel='DEBUG'):
    ''' * There are many handler types available such […]
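Since the excerpt is cut off, here is a minimal, self-contained sketch of the same idea using only the standard `logging` module. It is not the post's `initLogger`: the name `init_logger`, the fixed `StreamHandler`, and the format string are assumptions chosen for illustration.

```python
import logging


def init_logger(app_name="Application", logger_level="INFO",
                handler_level="DEBUG"):
    """Return a named logger that writes to the console.

    Illustrative sketch only: it always uses a StreamHandler, whereas
    the original function takes the handler type as a parameter.
    """
    logger = logging.getLogger(app_name)
    logger.setLevel(logger_level)  # level names such as "INFO" are accepted
    # Attach the handler only once, so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setLevel(handler_level)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(name)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


# Example usage:
# log = init_logger("demo")
# log.info("pipeline started")   # emitted
# log.debug("verbose detail")    # dropped: the logger level is INFO
```

With these defaults the logger level (INFO) is the effective filter, because a record must pass the logger before it reaches the handler; the handler's DEBUG level only matters if the logger level is lowered.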

Posted on May 11, 2012 by GauZ in C#, Java, Python, Technology

Tags: c#, java, log4j, log4net, logging, module, Python

Gauz's view on Data Science, Engineering, Finance, and beyond!
