Tag Archives: java
Run PySpark on your Windows machine
1) Download the Spark library to your local machine and decompress the archive. Then set the SPARK_HOME and HADOOP_HOME environment variables to point to the decompressed folder, for example: C:\Users\some_user\PycharmProjects\spark-2.4.4-bin-hadoop2.7. Also look up the winutils executable online and put it in the Spark bin folder.
2) Install the Java JDK if you do not […]
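As a rough illustration of step 1, the sketch below builds the two environment variables from the example folder and shows where the winutils executable would sit. The helper names (`spark_env`, `winutils_path`, `apply_env`) are hypothetical, and the file name `winutils.exe` is an assumption about how the executable is named.

```python
import os
from pathlib import PureWindowsPath


def spark_env(spark_dir):
    """Return SPARK_HOME and HADOOP_HOME, both pointing at the decompressed folder."""
    root = str(PureWindowsPath(spark_dir))
    return {"SPARK_HOME": root, "HADOOP_HOME": root}


def winutils_path(spark_dir):
    """Expected location of winutils inside the Spark bin folder (file name assumed)."""
    return str(PureWindowsPath(spark_dir) / "bin" / "winutils.exe")


def apply_env(env, target=None):
    """Copy the variables into a mapping; defaults to the current process environment."""
    target = os.environ if target is None else target
    target.update(env)
    return target


if __name__ == "__main__":
    folder = r"C:\Users\some_user\PycharmProjects\spark-2.4.4-bin-hadoop2.7"
    preview = apply_env(spark_env(folder), target={})  # dry run into a plain dict
    print(preview)
    print(winutils_path(folder))
```

Calling `apply_env(spark_env(folder))` without a `target` changes the variables only for the current Python process; setting them system-wide is still done through the Windows environment variable settings.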
java-jdk in pyspark project
A PySpark project that runs locally requires the JAVA_HOME environment variable to be set. If you’re using conda or anaconda-project to manage packages, you do not need to install the bloated Oracle Java JDK; just add the java-jdk package from the bioconda (Linux) or cyclus (Linux and Windows) channel and point the JAVA_HOME property to the bin […]
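Before starting a local PySpark session, a quick check along these lines can confirm that JAVA_HOME is present. This is only a sketch: `check_java_home` is a hypothetical helper, and it verifies only that the variable is set and names an existing directory, not that the directory holds a working JDK.

```python
import os
from pathlib import Path


def check_java_home(env=None):
    """Describe the state of JAVA_HOME in the given mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    value = env.get("JAVA_HOME")
    if not value:
        return "JAVA_HOME is not set"
    if not Path(value).is_dir():
        return f"JAVA_HOME points to a missing directory: {value}"
    return f"JAVA_HOME is set to {value}"


if __name__ == "__main__":
    print(check_java_home())
```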
Setup Oracle Java (JDK – 8) on Ubuntu
Open an Ubuntu terminal and run the statements below in sequence to install JDK 8:
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
$ sudo apt-get install oracle-java8-set-default
Tinkering with Apache Hadoop – Map Reduce Framework
I have used a MapReduce-based system at my present employer (Bank of America – Merrill Lynch) to process (read “crunch”) extremely large datasets in a matter of seconds. Sometimes I used it to price bonds in real time; otherwise it was used for data processing and reporting purposes. It is an in-house product, known as Hugs framework […]
Logging in Java, C# and Python
In Java and C#, the most commonly used logging utilities are log4j and log4net. Python has its own built-in logging module, which offers features comparable to its log4X brethren! Setting up the logging module in Python >>
import logging
def initLogger(appName='Application', handlerType='StreamHandler', \
               loggerLevel='INFO', handlerLevel='DEBUG'):
    ''' * There are many handler types available such […]
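Since the excerpt cuts off before the function body, here is a minimal sketch of what such a setup could look like with the standard logging module. It is not the post's own implementation: `init_logger`, its parameters, and the format string are assumptions, and it supports only a `StreamHandler`.

```python
import logging


def init_logger(app_name="Application", logger_level="INFO", handler_level="DEBUG"):
    """Return a named logger with one console handler (illustrative sketch)."""
    logger = logging.getLogger(app_name)
    logger.setLevel(getattr(logging, logger_level))
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setLevel(getattr(logging, handler_level))
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = init_logger("demo")
    log.info("logger is ready")
    log.debug("suppressed, because the logger level is INFO")
```

The logger level filters first and the handler level second, so with these defaults DEBUG messages are dropped by the logger even though the handler would accept them.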