Python Introduction#

  • Python is one of the most commonly used programming languages.

  • It was created by Dutch programmer Guido van Rossum in the late 1980s.

  • He named it after the BBC show Monty Python’s Flying Circus.

  • Python has become the primary programming language for numerous data science applications.

Advantages#

  1. Beginner-friendly: Python’s syntax is simple and easy to read and write.

    • Example: Using indentation (spaces) instead of curly braces to mark code blocks results in clean code.

    • The following two code snippets check whether the integer 15 is even or odd.

    • Even if you do not understand what these two pieces of code are doing, you can see that Python has a simpler syntax.

    Comparison of Python and Java Code for Checking If 15 is Even or Odd

    • Example: No need to declare a variable and its type before using it.

      • In Java, you need to specify that age is an integer, name is a String (similar to a string in Python), game_over is a boolean, and weight is a float. There is no such requirement in Python because Python infers the type from the value.

Python              Java
age = 25            int age = 25;
name = 'mike'       String name = "mike";
game_over = True    boolean game_over = true;
weight = 120.75     float weight = 120.75f;
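As a minimal sketch of the two points above, the Python side of each comparison might look like the following. The helper name `parity` is illustrative only and is not taken from the original comparison.

```python
def parity(n: int) -> str:
    """Return 'even' or 'odd' for an integer n."""
    # n % 2 is the remainder after dividing by 2; it is 0 for even numbers.
    if n % 2 == 0:
        return "even"
    return "odd"

print(15, "is", parity(15))  # prints: 15 is odd

# No type declarations: Python infers each type from the assigned value.
age = 25
name = 'mike'
game_over = True
weight = 120.75
for value in (age, name, game_over, weight):
    print(type(value).__name__)  # int, str, bool, float (one per line)
```

Indentation alone marks the body of the function, the `if` statement, and the loop; no braces or semicolons are needed.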


  2. Free and open source: Python is free to use and distribute, including for commercial purposes.

    • You can freely download and install it to your computer.

    • Online editors are also available for use without any installation.

    • Python packages are developed by major companies and shared for everyone’s use.

      • TensorFlow was developed by Google.

      • PyTorch was developed by Facebook (now Meta).


  3. General-purpose programming language: You can write code for different purposes.

    • Write game code

    • Develop programs for stores

    • Perform visualizations

    • Build predictive models


  4. Rich libraries: Python has a rich collection of libraries with tools for various fields.

    • NumPy: for scientific computing

    • Statistics: offers statistical tools

    • Pandas: for data wrangling and analysis

    • Matplotlib: for visualization

    • Keras: for constructing neural network models

    • Django: for developing websites

    • Flask: for web applications

    • Scikit-learn: for predictive data analysis
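To give a feel for how little code a library call takes, here is a small sketch using Python's standard-library `statistics` module. It assumes that this is the module the "Statistics" entry above refers to, and the scores are hypothetical values made up for the illustration.

```python
import statistics

# Hypothetical exam scores, made up for this illustration.
scores = [70, 85, 90, 85, 100]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequent value

print(mean_score, median_score, mode_score)  # mean 86, median 85, mode 85
```

The other libraries in the list follow the same pattern: import the package, then call its functions on your data.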


  5. Object-oriented: Built on the concept of objects.

    • Uses simple, reusable parts (classes) that act like blueprints

    • Short code may encompass much hidden functionality
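The blueprint idea can be sketched with a small class. The class name `Dog` and its method are hypothetical and chosen only for illustration.

```python
class Dog:
    """A blueprint: every Dog object stores a name and can speak."""

    def __init__(self, name: str) -> None:
        self.name = name  # data stored on each individual object

    def speak(self) -> str:
        return f"{self.name} says woof"


# Two separate objects built from the same blueprint.
rex = Dog("Rex")
bella = Dog("Bella")
print(rex.speak())    # prints: Rex says woof
print(bella.speak())  # prints: Bella says woof

# Short code, hidden functionality: sorted() is a single call,
# yet a full sorting algorithm runs behind it.
print(sorted([3, 1, 2]))  # prints: [1, 2, 3]
```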


  6. Community support: Python has a vast and active community that can provide assistance.

    • You can find answers to your questions on platforms like Stack Overflow.

    • Extensive resources are available on platforms like LinkedIn.


  7. Portability: Python code can be run on other platforms with little to no modification.

    • Works on virtual platforms

    • Windows, Unix, Linux, macOS


  8. Python code can be combined with components written in other languages, such as C++ and Java.


  9. Python can be seen as a combination of general-purpose languages (such as C++ and Java) and domain-specific languages (like MATLAB).

Disadvantages#

  1. Python programs are usually distributed as source code, so the code is visible to anyone using the application and can be copied or modified.

  2. Python usually runs slower than compiled languages such as C++, mainly because it uses an interpreter instead of compiling to machine code ahead of time.

    • A compiler translates the entire source code into machine code before the program runs.

    • The standard Python interpreter (CPython) instead compiles the source to intermediate bytecode and then executes that bytecode one instruction at a time, which adds overhead at run time.

    • To understand the difference between interpreters and compilers, you can watch the following video.

from IPython.lib.display import YouTubeVideo
YouTubeVideo('_C5AHaS1mOA', width=500, height=300)
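For a hands-on view of what the interpreter does, the following sketch uses the built-in `compile` and `exec` functions and the standard `dis` module. It describes CPython, the reference implementation; other Python implementations may work differently, and the one-line `source` program is made up for the illustration.

```python
import dis

source = "total = 2 + 3"

# Step 1: CPython compiles the source text into a code object (bytecode).
code_obj = compile(source, "<example>", "exec")

# Step 2: the interpreter's virtual machine executes that bytecode.
namespace = {}
exec(code_obj, namespace)
print(namespace["total"])  # prints: 5

# Optional: list the bytecode instructions (exact output varies by Python version).
dis.dis(code_obj)
```

The bytecode is not native machine code, so it still has to be interpreted at run time, which is one reason Python tends to be slower than C++.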

Installing Python#

  • It is easier to install Python with Anaconda.

  • Anaconda is a free and open-source distribution of Python.

  • It installs Python.

  • It simplifies package management and deployment by providing a comprehensive collection of tools and libraries pre-installed.

    • It comes with over 250 packages automatically installed.

    • It includes popular Python packages like NumPy, Pandas, Matplotlib, SciPy.
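After installation, one way to confirm that these packages are available is to ask Python whether it can find them. This is a minimal sketch using the standard-library `importlib`; the helper name `is_installed` is illustrative.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return find_spec(module_name) is not None


# Import names (all lowercase) of the packages mentioned above.
for module_name in ["numpy", "pandas", "matplotlib", "scipy"]:
    status = "installed" if is_installed(module_name) else "missing"
    print(f"{module_name}: {status}")
```

In an Anaconda environment all four lines should report "installed"; a plain Python installation may report some of them as "missing".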

Integrated development environment (IDE)#

  • An IDE provides tools for writing code better, more easily, and faster.

  • An IDE typically includes:

    • Code editor

    • Debugger

    • Code highlighting

    • Auto completion

    • Project management

  • Examples:

    • Spyder: It comes with Anaconda.

      • It is free and open source.

      • Designed by and for scientists, engineers and data analysts.

    • PyCharm: It is a professional, feature-rich IDE developed by JetBrains.

      • Its community edition is free.

      • The professional edition is free for students and educators.

Jupyter Notebook#

  • Jupyter Notebook allows you to write and run code in a web browser.

  • It is a web-based interactive development environment.

  • It does not require internet access.

  • It is free and comes with Anaconda.

Google Colab#

  • Short for Google Colaboratory.

  • An online tool for writing and running code.

  • A cloud-based computational environment or notebook.

  • Recommended for beginners since there’s no need to install any software.

  • Files are automatically saved to Google Drive.

  • As it is cloud-based, it requires an internet connection.

  • Colab notebooks are Jupyter notebooks that are hosted by Colab.
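Because a Colab notebook is a Jupyter notebook, the same code usually runs in both. If a notebook needs to know where it is running, a common best-effort check looks like the sketch below. It relies on the convention that Colab runtimes preload the `google.colab` package, which is an assumption rather than a documented guarantee, and the helper name `in_colab` is illustrative.

```python
import sys


def in_colab() -> bool:
    """Best-effort check: is the google.colab package already loaded?"""
    return "google.colab" in sys.modules


print("Running in Colab" if in_colab() else "Not running in Colab")
```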

Applications#

  • The following examples demonstrate what can be accomplished with Python.

  • The aim of this section is to provide you with an idea of Python’s capabilities.

  • You’re not expected to comprehend the code at this stage.

Import Stock Data#

  • You can import historical stock data from Yahoo Finance.

  • The following shows data for Apple stock (ticker AAPL).

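import yfinance
# Download recent daily price history for Apple (ticker AAPL) from Yahoo Finance
df = yfinance.Ticker('AAPL').history()
# Show the first five rows, rounded to two decimals
df.head().round(2)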
                             Open    High     Low   Close     Volume  Dividends  Stock Splits
Date
2024-09-16 00:00:00-04:00  216.54  217.22  213.92  216.32   59357400        0.0           0.0
2024-09-17 00:00:00-04:00  215.75  216.90  214.50  216.79   45519300        0.0           0.0
2024-09-18 00:00:00-04:00  217.55  222.71  217.54  220.69   59894900        0.0           0.0
2024-09-19 00:00:00-04:00  224.99  229.82  224.63  228.87   66781300        0.0           0.0
2024-09-20 00:00:00-04:00  229.97  233.09  227.62  228.20  318679900        0.0           0.0

Import Data from Wikipedia#

import pandas as pd
# read_html returns a list of all tables found on the page; [0] selects the first one
pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0].head()
   Symbol  Security             GICS Sector             GICS Sub-Industry               Headquarters Location    Date added  CIK      Founded
0  MMM     3M                   Industrials             Industrial Conglomerates        Saint Paul, Minnesota    1957-03-04  66740    1902
1  AOS     A. O. Smith          Industrials             Building Products               Milwaukee, Wisconsin     2017-07-26  91142    1916
2  ABT     Abbott Laboratories  Health Care             Health Care Equipment           North Chicago, Illinois  1957-03-04  1800     1888
3  ABBV    AbbVie               Health Care             Biotechnology                   North Chicago, Illinois  2012-12-31  1551152  2013 (1888)
4  ACN     Accenture            Information Technology  IT Consulting & Other Services  Dublin, Ireland          2011-07-06  1467373  1989

Scatter Plot#

import matplotlib.pyplot as plt
plt.figure(figsize=(10,5))
plt.scatter(df.index, df.Close, color='r')
plt.xticks(rotation=30);
[Figure: scatter plot of the AAPL closing price by date]

Line Plot#

plt.figure(figsize=(10,5))
plt.plot(df.index, df.Close, color='r')
plt.xticks(rotation=30);
[Figure: line plot of the AAPL closing price by date]

Histogram#

plt.figure(figsize=(10,3))
plt.hist(df.Close, bins=20, color='orange', orientation='horizontal');
[Figure: horizontal histogram of the AAPL closing price, 20 bins]

Pie Chart#

number = [53, 122, 96, 239]
color_list = ['y', 'purple', 'g', 'r']
coins  = ['Penny', 'Nickel', 'Dime', 'Quarter']
plt.pie(number, colors = color_list, autopct='%1.1f%%',  labels = coins, radius=0.75);
[Figure: pie chart of the coin counts, labeled Penny, Nickel, Dime, and Quarter]

Multiple Plots#

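plt.figure(figsize = (20, 10))
# One format string per panel; 'o--' means circle markers on a dashed line ('o' is a marker, not a color)
color_set = ['r--', 'g--', 'b--', 'o--']
for i in range(1,5):
    plt.subplot(2, 2, i)                       # select panel i of a 2x2 grid
    plt.plot(df.iloc[:,i-1], color_set[i-1])   # plot column i-1 (Open, High, Low, Close)
    plt.ylabel(df.columns[i-1],fontsize=15)
    plt.xticks(rotation=30)
    plt.grid()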
[Figure: 2x2 grid of line plots for the Open, High, Low, and Close columns]