Python Introduction

  • Python is one of the most commonly used programming languages.

  • It was created by Dutch programmer Guido van Rossum in the late 1980s.

  • He named it after the BBC comedy show Monty Python’s Flying Circus.

  • Python has become the primary programming language for numerous data science applications.

Advantages

  1. Beginner-friendly: Python’s syntax is simple and easy to read and write.

    • Example: Python uses indentation (spaces) instead of curly braces to mark blocks of code, which keeps the code clean.

    • The following two code snippets check whether the integer 15 is even or odd.

    • Even if you do not understand what these two pieces of code are doing, you can see that Python has a simpler syntax.

    Comparison of Python and Java Code for Checking If 15 is Even or Odd
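The figure with both versions is not reproduced here. The following is a sketch of what the Python side of that check might look like. The helper name is_even is an illustrative choice. The body of each if/else branch is marked only by its indentation, with no braces.

```python
def is_even(n):
    """Return True if n leaves no remainder when divided by 2."""
    return n % 2 == 0


number = 15

# Indentation alone marks which lines belong to each branch.
if is_even(number):
    print(number, "is even")
else:
    print(number, "is odd")   # prints: 15 is odd
```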

    • Example: No need to declare a variable and its type before using it.

      • In Java, you need to declare that age is an int, name is a String (the equivalent of a string in Python), game_over is a boolean, and weight is a float. Python has no such requirement because the type is determined by the value assigned to the variable.

      Python              Java
      age = 25            int age = 25;
      name = 'mike'       String name = "mike";
      game_over = True    boolean game_over = true;
      weight = 120.75     float weight = 120.75f;
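Python's built-in type() function shows which type Python associated with each value in the table above. The reassignment at the end is an added illustration. It shows that a Python variable is not tied to one type for its whole life.

```python
age = 25
name = 'mike'
game_over = True
weight = 120.75

# type() reports the type that comes from each assigned value.
print(type(age))        # <class 'int'>
print(type(name))       # <class 'str'>
print(type(game_over))  # <class 'bool'>
print(type(weight))     # <class 'float'>

# The same name can later refer to a value of a different type.
age = 'twenty-five'
print(type(age))        # <class 'str'>
```

The table hides one difference: a Python float is double precision. It therefore corresponds more closely to Java's double than to Java's float.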


  2. Free and open source: Python is free to use and distribute, including for commercial purposes.

    • You can download and install it on your computer at no cost.

    • Online editors are also available for use without any installation.

    • Many popular Python packages are developed by major companies and shared for everyone’s use.

      • TensorFlow was developed by Google.

      • PyTorch was developed by Facebook (now Meta).


  3. General-purpose programming language: You can write code for many different purposes, for example to:

    • Write game code

    • Develop programs for stores

    • Perform visualizations

    • Build predictive models


  4. Rich libraries: Python has a rich collection of libraries that provide tools for various fields.

    • NumPy: for scientific computing

    • statistics: a standard-library module offering basic statistics tools

    • pandas: for data wrangling and analysis

    • Matplotlib: for visualization

    • Keras: for constructing neural network models

    • Django: for developing websites

    • Flask: for building web applications

    • Scikit-learn: for predictive data analysis
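Here is a small taste of library use. The statistics module mentioned above ships with Python itself, so nothing needs to be installed. The numbers are an arbitrary illustrative sample, not data from this page.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]

print(statistics.mean(values))    # 5
print(statistics.median(values))  # 4.5  (average of the two middle values)
print(statistics.mode(values))    # 4    (most frequent value)
print(statistics.pstdev(values))  # 2.0  (population standard deviation)
```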


  5. Object-oriented: Python is built on the concept of objects.

    • Classes act like blueprints: simple, reusable parts from which objects are created.

    • A short piece of code can hide a lot of functionality inside objects and their methods.
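The following is a minimal sketch of the blueprint idea. The hypothetical Coin class defines what every coin object stores and can do, and each call to Coin(...) builds a new object from that class. The last line shows that built-in objects such as strings also hide functionality behind short method calls.

```python
class Coin:
    """A blueprint describing what every coin object has and can do."""

    def __init__(self, name, cents):
        self.name = name      # data stored on each individual object
        self.cents = cents

    def in_dollars(self):
        """Return this coin's value in dollars."""
        return self.cents / 100


# Two separate objects built from the same blueprint
quarter = Coin('Quarter', 25)
dime = Coin('Dime', 10)
print(quarter.in_dollars())   # 0.25
print(dime.in_dollars())      # 0.1

# A built-in string object with a ready-made method
print('mike'.upper())         # MIKE
```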


  6. Community support: Python has a vast and active community that can provide assistance.

    • You can find answers to your questions on platforms like Stack Overflow.

    • Extensive learning resources are available on platforms like LinkedIn.


  7. Portability: Python code can run on different platforms with little or no modification.

    • Works on virtual platforms

    • Runs on Windows, Unix, Linux, and macOS
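One everyday habit that keeps code portable is to build file paths with the standard-library pathlib module instead of hard-coding a separator such as / or \. The file name prices.csv is a hypothetical example, and nothing is read from disk.

```python
import platform
from pathlib import Path, PurePosixPath, PureWindowsPath

print(platform.system())  # name of the current OS, e.g. 'Linux', 'Windows', 'Darwin'

# Path() uses the separator of whichever operating system runs the code.
data_file = Path('data') / 'prices.csv'
print(data_file)

# The "pure" classes show how the same expression is written on each platform.
print(PurePosixPath('data') / 'prices.csv')    # data/prices.csv
print(PureWindowsPath('data') / 'prices.csv')  # data\prices.csv
```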


  8. Interoperability: Python code can be combined with components written in other languages, such as C, C++, and Java.
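As a minimal illustration of combining Python with compiled code, the standard-library ctypes module can call a function in a C library directly. This sketch assumes a Linux or macOS system where ctypes.util.find_library('m') can locate the C math library. On Windows, the library name and lookup differ. Calling C++ or Java code usually goes through other tools, such as pybind11 for C++ or JPype for Java.

```python
import ctypes
import ctypes.util

# Load the C math library (assumption: Linux or macOS, where it is named "m").
libm = ctypes.CDLL(ctypes.util.find_library('m'))

# Declare the C signature so ctypes converts arguments correctly:
#   double sqrt(double x);
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

# This call runs compiled C code, not Python code.
print(libm.sqrt(2.0))  # 1.4142135623730951
```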


  9. Python can be seen as a combination of general-purpose languages (such as C++ and Java) and domain-specific languages (such as MATLAB).

Disadvantages

  1. Python programs are usually distributed as source code, so the code is visible to anyone using the application and can be copied or modified.

  2. Python usually executes more slowly than languages like C++. This is mainly because Python code is run by an interpreter rather than compiled ahead of time into machine code.

    • The interpreter executes Python code line by line while the program runs. (More precisely, the standard CPython interpreter first compiles the source into bytecode and then executes that bytecode.)

    • A compiler, in contrast, translates the entire source code into machine code in a single run before the program is executed.

    • To understand the difference between interpreters and compilers, you can watch the following video.

# Embed a YouTube video in the notebook output (works in Jupyter/IPython)
from IPython.lib.display import YouTubeVideo
YouTubeVideo('_C5AHaS1mOA', width=500, height=300)
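One rough way to feel the cost of interpretation is to compare two loops. The first is a loop that the interpreter runs step by step. The second is the built-in sum(), whose loop is implemented in C inside CPython. This is a sketch, not a benchmark of Python against C++. Exact timings depend on the machine and Python version, though the hand-written loop is typically noticeably slower.

```python
import timeit

numbers = list(range(1_000_000))


def python_loop_sum(values):
    """Add the values one at a time; the interpreter executes every loop step."""
    total = 0
    for v in values:
        total += v
    return total


# Run each version 5 times and measure the total elapsed seconds.
loop_seconds = timeit.timeit(lambda: python_loop_sum(numbers), number=5)
builtin_seconds = timeit.timeit(lambda: sum(numbers), number=5)

print(f"Python loop:  {loop_seconds:.3f} s")
print(f"built-in sum: {builtin_seconds:.3f} s")
```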

Installing Python

  • It is easier to install Python with Anaconda.

  • Anaconda is a free and open-source distribution of Python.

  • Installing Anaconda also installs Python itself.

  • It simplifies package management and deployment by providing a comprehensive collection of tools and libraries pre-installed.

    • It comes with over 250 packages automatically installed.

    • It includes popular Python packages such as NumPy, pandas, Matplotlib, and SciPy.
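After installation, you can confirm that Python runs and that the bundled packages can be found. The short check below uses only the standard library, and the helper name is_installed is hypothetical. Running `conda list` in a terminal is another way to see what Anaconda installed.

```python
import importlib.util
import sys


def is_installed(package_name):
    """Return True if a top-level package can be found (without importing it)."""
    return importlib.util.find_spec(package_name) is not None


print(sys.version)  # version of the Python interpreter running this code

# A few of the packages that Anaconda installs by default
for name in ['numpy', 'pandas', 'matplotlib', 'scipy']:
    status = 'installed' if is_installed(name) else 'not found'
    print(f'{name}: {status}')
```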

Integrated development environment (IDE)

  • An IDE provides tools for writing code in a better, easier, and faster way.

  • An IDE typically includes:

    • Code editor

    • Debugger

    • Code highlighting

    • Autocompletion

    • Project management

  • Examples:

    • Spyder: It comes with Anaconda.

      • It is free and open source.

      • Designed by and for scientists, engineers and data analysts.

    • PyCharm: A professional, feature-rich IDE developed by JetBrains.

      • Its Community Edition is free.

      • The Professional Edition is free for students and educators.

Jupyter Notebook

  • Jupyter Notebook lets you write and run code in a web browser.

  • It is a web-based interactive development environment.

  • It runs locally on your computer, so it does not require internet access.

  • It is free and comes with Anaconda.

Google Colab

  • Short for Google Colaboratory.

  • An online tool for writing and running code.

  • A cloud-based computational environment or notebook.

  • Recommended for beginners since there’s no need to install any software.

  • Files are automatically saved to Google Drive.

  • As it is cloud-based, it requires an internet connection.

  • Colab notebooks are Jupyter notebooks that are hosted by Colab.

Applications

  • The following examples demonstrate what can be accomplished with Python.

  • The aim of this section is to provide you with an idea of Python’s capabilities.

  • You’re not expected to comprehend the code at this stage.

Import Stock Data

  • You can import historical stock data from Yahoo Finance.

  • The following code downloads recent historical data for Apple stock (ticker symbol AAPL).

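import yfinance   # third-party package (install with: pip install yfinance)

# Download daily price history for Apple; history() returns one month by default
df = yfinance.Ticker('AAPL').history()
# Show the first five rows, rounded to two decimal places
df.head().round(2)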
                             Open    High     Low   Close    Volume  Dividends  Stock Splits
Date
2024-12-26 00:00:00-05:00  258.19  260.10  257.63  259.02  27237100        0.0           0.0
2024-12-27 00:00:00-05:00  257.83  258.70  253.06  255.59  42355300        0.0           0.0
2024-12-30 00:00:00-05:00  252.23  253.50  250.75  252.20  35557500        0.0           0.0
2024-12-31 00:00:00-05:00  252.44  253.28  249.43  250.42  39480700        0.0           0.0
2025-01-02 00:00:00-05:00  248.93  249.10  241.82  243.85  55740700        0.0           0.0

Import Data from Wikipedia

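import pandas as pd   # read_html also needs an HTML parser such as lxml

# Read every HTML table on the page; [0] selects the first one (the S&P 500 company list)
pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0].head()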
   Symbol  Security             GICS Sector             GICS Sub-Industry               Headquarters Location    Date added  CIK      Founded
0  MMM     3M                   Industrials             Industrial Conglomerates        Saint Paul, Minnesota    1957-03-04  66740    1902
1  AOS     A. O. Smith          Industrials             Building Products               Milwaukee, Wisconsin     2017-07-26  91142    1916
2  ABT     Abbott Laboratories  Health Care             Health Care Equipment           North Chicago, Illinois  1957-03-04  1800     1888
3  ABBV    AbbVie               Health Care             Biotechnology                   North Chicago, Illinois  2012-12-31  1551152  2013 (1888)
4  ACN     Accenture            Information Technology  IT Consulting & Other Services  Dublin, Ireland          2011-07-06  1467373  1989

Scatter Plot

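import matplotlib.pyplot as plt

# Scatter plot of daily closing prices (df comes from the Yahoo Finance example)
plt.figure(figsize=(10,5))
plt.scatter(df.index, df.Close, color='r')
plt.xticks(rotation=30);   # tilt the date labels so they do not overlap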

Line Plot

# Same data as a line plot, which connects consecutive closing prices
plt.figure(figsize=(10,5))
plt.plot(df.index, df.Close, color='r')
plt.xticks(rotation=30);

Histogram

# Distribution of closing prices in 20 bins; orientation='horizontal' draws the bars sideways
plt.figure(figsize=(10,3))
plt.hist(df.Close, bins=20, color='orange', orientation='horizontal');
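The bins argument splits the range of the data into equal-width intervals and counts the values in each one. Matplotlib computes these counts with numpy.histogram. The function below, equal_width_bins (a hypothetical name), is a plain-Python sketch of that equal-width idea. It is applied to the five closing prices in the AAPL table above, using 4 bins instead of 20. Because the download returns recent data, a fresh run of the histogram code will use different prices.

```python
def equal_width_bins(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals.

    Assumes the values are not all identical (otherwise the width would be 0).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        index = int((v - lo) / width)
        # The maximum sits exactly on the right edge; like numpy.histogram,
        # treat the last bin as closed on the right so it is still counted.
        counts[min(index, bins - 1)] += 1
    return counts


# Closing prices from the five rows of the AAPL table above
closes = [259.02, 255.59, 252.20, 250.42, 243.85]
print(equal_width_bins(closes, bins=4))  # [1, 1, 1, 2]
```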

Pie Chart

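# Number of coins of each type, with a matching color and label for each slice
number = [53, 122, 96, 239]
color_list = ['y', 'purple', 'g', 'r']
coins = ['Penny', 'Nickel', 'Dime', 'Quarter']
# autopct='%1.1f%%' writes each slice's percentage with one decimal place
plt.pie(number, colors=color_list, autopct='%1.1f%%', labels=coins, radius=0.75);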
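The percentage that autopct writes on each slice is that slice's count divided by the total. The sketch below recomputes the percentages with the same format string, reusing the coin counts from the pie chart code.

```python
number = [53, 122, 96, 239]
coins = ['Penny', 'Nickel', 'Dime', 'Quarter']

total = sum(number)  # 510 coins in all
labels = []
for coin, count in zip(coins, number):
    share = 100 * count / total          # percentage of all coins
    labels.append('%1.1f%%' % share)     # same format string as autopct
    print(coin, labels[-1])
# Penny 10.4%
# Nickel 23.9%
# Dime 18.8%
# Quarter 46.9%
```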

Multiple Plots

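plt.figure(figsize=(20, 10))
# One color per panel. In a format string like 'o--', 'o' means circle markers
# rather than orange, so colors are passed by name and the dashed style separately.
colors = ['r', 'g', 'b', 'orange']
# Plot the first four columns (Open, High, Low, Close) in a 2 x 2 grid of panels
for i, column in enumerate(df.columns[:4], start=1):
    plt.subplot(2, 2, i)
    plt.plot(df[column], color=colors[i - 1], linestyle='--')
    plt.ylabel(column, fontsize=15)
    plt.xticks(rotation=30)
    plt.grid()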
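In plt.subplot(2, 2, i), panels are numbered from 1 starting in the upper-left corner, moving left to right and then down. The hypothetical helper subplot_position converts that index into a (row, column) position.

```python
def subplot_position(index, ncols):
    """Convert a 1-based subplot index into a 0-based (row, column) pair."""
    row = (index - 1) // ncols   # full rows already filled before this panel
    col = (index - 1) % ncols    # position within the current row
    return row, col


nrows, ncols = 2, 2
for i in range(1, nrows * ncols + 1):
    print(i, subplot_position(i, ncols))
# 1 (0, 0)
# 2 (0, 1)
# 3 (1, 0)
# 4 (1, 1)
```

With the plotting loop above, Open lands in the top-left panel, High in the top-right, Low in the bottom-left, and Close in the bottom-right.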