Python is one of the most widely used languages for data analysis among data scientists, analysts, and engineers. This guide introduces three core libraries (NumPy, pandas, and Matplotlib) with practical code examples you can run immediately.
NumPy handles numerical computation, pandas handles data manipulation, and Matplotlib handles visualization. Together they cover most common data analysis tasks.
1. Environment Setup
# Install required libraries
pip install numpy pandas matplotlib jupyter
# Start Jupyter Notebook
jupyter notebook
2. NumPy Basics
NumPy is the foundation for numerical computation in Python, providing high-performance multidimensional arrays.
import numpy as np
# Create arrays
arr = np.array([1, 2, 3, 4, 5])
matrix = np.array([[1, 2, 3], [4, 5, 6]])
# Vectorized operations (no explicit loops needed)
print(arr * 2) # [ 2  4  6  8 10]
print(arr.mean()) # 3.0
print(arr.std()) # ~1.414 (population standard deviation, ddof=0)
# Statistical operations
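rng = np.random.default_rng(seed=42) # Seeded generator so results are reproducible
data = rng.standard_normal(1000) # 1000 samples from a standard normal distribution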
print(f"Mean: {data.mean():.4f}")
print(f"Std: {data.std():.4f}")
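The two-dimensional `matrix` created above is never used, so here is a minimal sketch of what the array model allows on it: reductions along an axis and broadcasting. The variable names `col_sums`, `row_means`, and `centered` are illustrative and not part of the original guide.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(matrix.shape)  # (2, 3): 2 rows, 3 columns

# axis=0 collapses the rows, giving one value per column
col_sums = matrix.sum(axis=0)
print(col_sums)  # [5 7 9]

# axis=1 collapses the columns, giving one value per row
row_means = matrix.mean(axis=1)
print(row_means)  # [2. 5.]

# Broadcasting: reshape row_means to shape (2, 1) so each row's mean
# is subtracted from every element in that row
centered = matrix - row_means[:, np.newaxis]
print(centered)
```

After centering, each row has a mean of zero, which is a common preprocessing step before further analysis.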
3. pandas Fundamentals
pandas provides the DataFrame, a labeled, tabular data structure that most analysis workflows are built around.
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [28, 35, 42, 31],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales'],
    'salary': [85000, 72000, 95000, 68000]
})
# Basic exploration
print(df.describe()) # Statistical summary
df.info() # Column types & non-null counts (prints directly and returns None, so no print() needed)
# Filtering
engineers = df[df['department'] == 'Engineering']
high_earners = df[df['salary'] > 80000]
# Grouping & aggregation
dept_avg = df.groupby('department')['salary'].mean()
print(dept_avg)
# Reading CSV and Excel files
# df = pd.read_csv('sales_data.csv')
# df = pd.read_excel('report.xlsx') # Reading .xlsx files requires the openpyxl package
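Because the file-reading lines above are commented out, the following sketch shows `pd.read_csv` working on an in-memory string instead of a file on disk, so it runs without any external data. The CSV contents and the names `csv_text`, `df_csv`, `missing`, and `summary` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content; Diana's salary is deliberately left blank
csv_text = """name,department,salary
Alice,Engineering,85000
Bob,Marketing,72000
Charlie,Engineering,95000
Diana,Sales,
"""

# read_csv accepts any file-like object, not only a path
df_csv = pd.read_csv(io.StringIO(csv_text))

# Blank fields are parsed as NaN; count them before aggregating
missing = df_csv['salary'].isna().sum()
print(f"Missing salaries: {missing}")

# Several aggregations at once; mean and count both skip NaN values
summary = df_csv.groupby('department')['salary'].agg(['mean', 'count'])
print(summary)
```

Checking for missing values right after loading is worthwhile, since aggregations such as `mean` silently ignore them.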
4. Data Visualization with Matplotlib
import matplotlib.pyplot as plt
# Line chart
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
revenue = [120, 135, 148, 162, 178, 195]
plt.figure(figsize=(10, 6))
plt.plot(months, revenue, marker='o', linewidth=2, color='#8b5cf6')
plt.title('Monthly Revenue Trend')
plt.xlabel('Month')
plt.ylabel('Revenue ($K)')
plt.grid(True, alpha=0.3)
plt.savefig('revenue_trend.png', dpi=150)
plt.show()
# Bar chart with pandas
df.groupby('department')['salary'].mean().plot(
    kind='bar', color=['#10b981', '#8b5cf6', '#f59e0b']
)
plt.title('Average Salary by Department')
plt.tight_layout()
plt.show()
5. Practical Example: Sales Data Analysis
# Real-world analysis workflow
import pandas as pd
import matplotlib.pyplot as plt
# 1. Load data
# df = pd.read_csv('sales_2026.csv')
# 2. Data cleaning
# df = df.dropna()
# df['date'] = pd.to_datetime(df['date'])
# 3. Analysis
# monthly = df.resample('ME', on='date')['amount'].sum() # 'ME' = month end; use 'M' on pandas older than 2.2
# top_products = df.groupby('product')['amount'].sum().nlargest(10)
# 4. Visualization
# fig, axes = plt.subplots(1, 2, figsize=(14, 6))
# monthly.plot(ax=axes[0], title='Monthly Sales')
# top_products.plot(kind='barh', ax=axes[1], title='Top 10 Products')
# plt.tight_layout()
# plt.savefig('sales_report.png')
print("Analysis complete! 📊")
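Since the workflow above is entirely commented out, here is a runnable sketch of steps 1 to 3 on a small, made-up dataset. The records stand in for `sales_2026.csv` and are not real data. This version groups by calendar month with `dt.to_period('M')` rather than `resample`, a swapped-in approach that should behave the same way across pandas versions.

```python
import pandas as pd

# 1. Load data: hypothetical records standing in for sales_2026.csv
raw = pd.DataFrame({
    'date': ['2026-01-05', '2026-01-20', '2026-02-03', '2026-02-14', None],
    'product': ['Widget', 'Gadget', 'Widget', 'Gadget', 'Widget'],
    'amount': [120.0, 250.0, 150.0, None, 75.0],
})

# 2. Data cleaning: drop rows with any missing value, then parse dates
clean = raw.dropna().copy()
clean['date'] = pd.to_datetime(clean['date'])

# 3. Analysis: total per calendar month and per product
monthly = clean.groupby(clean['date'].dt.to_period('M'))['amount'].sum()
top_products = clean.groupby('product')['amount'].sum().nlargest(10)

print(monthly)
print(top_products)
```

Two of the five rows are dropped during cleaning, so the totals reflect only the three complete records. With real data, it is worth checking how many rows `dropna()` removes before trusting the results.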
Python data analysis starts with these three libraries. Begin with small datasets, practice the patterns, and you'll be analyzing real-world data in no time.