Python Data Analysis for Beginners: pandas, NumPy & Matplotlib

📑 Table of Contents

Environment Setup
NumPy Basics
pandas Fundamentals
Data Visualization with Matplotlib
Practical Example: Sales Data Analysis

Python has become the dominant language for data analysis, used by data scientists, analysts, and engineers worldwide. This guide introduces the three essential libraries — NumPy, pandas, and Matplotlib — with practical code examples you can run immediately.

💡 Key Takeaway

These three libraries form the foundation of Python data analysis: NumPy handles numerical computation, pandas handles data manipulation, and Matplotlib handles visualization. Together they cover the large majority of everyday data analysis tasks.

1. Environment Setup

# Install required libraries
pip install numpy pandas matplotlib jupyter

# Start Jupyter Notebook
jupyter notebook
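
Once the install finishes, a quick check like the one below can confirm that the three libraries import correctly and show which versions were installed. This is only a sanity-check sketch; the `versions` dictionary is an illustrative name, not part of any library.

```python
import matplotlib
import numpy as np
import pandas as pd

# Collect the installed version of each library (each package exposes __version__)
versions = {
    'numpy': np.__version__,
    'pandas': pd.__version__,
    'matplotlib': matplotlib.__version__,
}

for name, version in versions.items():
    print(f"{name}: {version}")
```

If any of the imports raises `ModuleNotFoundError`, the install most likely went into a different Python environment than the one running the notebook.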

2. NumPy Basics

NumPy is the foundation for numerical computation in Python, providing high-performance multidimensional arrays.

import numpy as np

# Create arrays
arr = np.array([1, 2, 3, 4, 5])
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# Vectorized operations (no loops needed!)
print(arr * 2)        # [ 2  4  6  8 10]
print(arr.mean())     # 3.0
print(arr.std())      # ~1.414 (population std; pass ddof=1 for the sample std)

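# Statistical operations on 1,000 draws from a standard normal distribution
# (values change on every run; expect a mean near 0 and a std near 1)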
data = np.random.randn(1000)
print(f"Mean: {data.mean():.4f}")
print(f"Std:  {data.std():.4f}")
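
The 2-D `matrix` created above is a good place to see how the `axis` argument and broadcasting work. The following is a minimal sketch; the variable names `col_sums`, `row_means`, and `centered` are chosen here for illustration.

```python
import numpy as np

# Same 2x3 matrix as in the example above
matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(matrix.shape)  # (2, 3): 2 rows, 3 columns

# axis=0 collapses the rows, giving one value per column
col_sums = matrix.sum(axis=0)

# axis=1 collapses the columns, giving one value per row
row_means = matrix.mean(axis=1)

# Broadcasting: the length-3 array of column means is subtracted from every row
centered = matrix - matrix.mean(axis=0)

print(col_sums)   # [5 7 9]
print(row_means)  # [2. 5.]
print(centered)
```

After centering, every column of `centered` has a mean of zero, which is a common preprocessing step before further analysis.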

3. pandas Fundamentals

pandas provides the DataFrame, a labeled table of rows and named columns that most data analysis code is built around.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [28, 35, 42, 31],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales'],
    'salary': [85000, 72000, 95000, 68000]
})

# Basic exploration
print(df.describe())  # Statistical summary
df.info()             # Column types & non-null counts (prints directly, returns None)

# Filtering
engineers = df[df['department'] == 'Engineering']
high_earners = df[df['salary'] > 80000]

# Grouping & aggregation
dept_avg = df.groupby('department')['salary'].mean()
print(dept_avg)

# Reading CSV and Excel files (read_excel needs an engine such as openpyxl for .xlsx)
# df = pd.read_csv('sales_data.csv')
# df = pd.read_excel('report.xlsx')
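
Beyond single-condition filters and a single `mean()`, a few more patterns come up constantly: derived columns, combined conditions, sorting, and several aggregations per group. The sketch below rebuilds the same sample DataFrame so it runs on its own; the names `salary_k`, `senior_engineers`, `ranked`, and `summary` are illustrative.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [28, 35, 42, 31],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales'],
    'salary': [85000, 72000, 95000, 68000]
})

# Derived column: salary expressed in thousands
df['salary_k'] = df['salary'] / 1000

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
senior_engineers = df[(df['department'] == 'Engineering') & (df['age'] > 30)]

# Sort by salary, highest first
ranked = df.sort_values('salary', ascending=False)

# Several aggregations per group in one call
summary = df.groupby('department')['salary'].agg(['mean', 'max', 'count'])

print(senior_engineers['name'].tolist())  # ['Charlie']
print(ranked['name'].tolist())            # ['Charlie', 'Alice', 'Bob', 'Diana']
print(summary)
```

Note that Python's `and` / `or` keywords do not work on whole columns; the element-wise operators `&` and `|` are needed instead.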

4. Data Visualization with Matplotlib

import matplotlib.pyplot as plt

# Line chart
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
revenue = [120, 135, 148, 162, 178, 195]

plt.figure(figsize=(10, 6))
plt.plot(months, revenue, marker='o', linewidth=2, color='#8b5cf6')
plt.title('Monthly Revenue Trend')
plt.xlabel('Month')
plt.ylabel('Revenue ($K)')
plt.grid(True, alpha=0.3)
plt.savefig('revenue_trend.png', dpi=150)
plt.show()

# Bar chart with pandas (reuses the df defined in section 3)
df.groupby('department')['salary'].mean().plot(
    kind='bar', color=['#10b981', '#8b5cf6', '#f59e0b']
)
plt.title('Average Salary by Department')
plt.tight_layout()
plt.show()
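
The charts above use the `plt.*` functions, which always act on the current figure. Matplotlib also has an object-oriented interface built around `plt.subplots()`, which is what the sales workflow in section 5 uses. Here is a minimal sketch that draws a histogram with it; the data is randomly generated purely for illustration, and the `Agg` backend line is an assumption made so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in Jupyter to see the plot inline
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data: 1,000 draws from a standard normal distribution
rng = np.random.default_rng(seed=42)
data = rng.normal(loc=0.0, scale=1.0, size=1000)

# subplots() returns the figure and an Axes object; all drawing goes through ax
fig, ax = plt.subplots(figsize=(8, 5))
counts, bin_edges, _ = ax.hist(data, bins=20, color='#8b5cf6', edgecolor='white')

ax.set_title('Distribution of 1,000 Random Values')
ax.set_xlabel('Value')
ax.set_ylabel('Count')
fig.tight_layout()

print(f"Bins: {len(counts)}, total points: {int(counts.sum())}")
```

Working with explicit `fig` and `ax` objects makes it easier to place several charts side by side, since each chart gets its own `Axes` to draw on.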

5. Practical Example: Sales Data Analysis

# Real-world analysis workflow
import pandas as pd
import matplotlib.pyplot as plt

# 1. Load data
# df = pd.read_csv('sales_2026.csv')

# 2. Data cleaning
# df = df.dropna()
# df['date'] = pd.to_datetime(df['date'])

# 3. Analysis
# monthly = df.resample('ME', on='date')['amount'].sum()  # 'ME' = month end (pandas >= 2.2; use 'M' on older versions)
# top_products = df.groupby('product')['amount'].sum().nlargest(10)

# 4. Visualization
# fig, axes = plt.subplots(1, 2, figsize=(14, 6))
# monthly.plot(ax=axes[0], title='Monthly Sales')
# top_products.plot(kind='barh', ax=axes[1], title='Top 10 Products')
# plt.tight_layout()
# plt.savefig('sales_report.png')

print("Analysis complete! 📊")
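
Because the workflow above is commented out and depends on a CSV file, here is a small runnable sketch of the same cleaning and analysis steps on made-up data. The rows in `raw` are hypothetical stand-ins for `sales_2026.csv`, and the column names `date`, `product`, and `amount` are taken from the commented code. Monthly totals are computed by grouping on calendar month instead of `resample`, which avoids the frequency-alias difference between pandas versions.

```python
import pandas as pd

# Hypothetical sample rows standing in for sales_2026.csv (not real data)
raw = pd.DataFrame({
    'date': ['2026-01-05', '2026-01-20', '2026-02-03',
             '2026-02-17', '2026-03-09', '2026-03-21'],
    'product': ['Widget', 'Gadget', 'Widget', 'Gizmo', 'Gadget', 'Widget'],
    'amount': [120.0, 80.0, None, 60.0, 90.0, 150.0],
})

# Data cleaning: drop rows with missing values, then parse the date strings
clean = raw.dropna().copy()
clean['date'] = pd.to_datetime(clean['date'])

# Analysis: total sales per calendar month
monthly = clean.groupby(clean['date'].dt.to_period('M'))['amount'].sum()

# Analysis: the two best-selling products by total amount
top_products = clean.groupby('product')['amount'].sum().nlargest(2)

print(monthly)
print(top_products)
```

The row with the missing amount is dropped during cleaning, so February's total reflects only the remaining sale. Whether dropping or filling missing values is appropriate depends on the dataset.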

Python data analysis starts with these three libraries. Begin with small datasets, practice the patterns, and you'll be analyzing real-world data in no time.