Shell Scripting & CLI Tools: From the Command Line to Real Automation
Section 10 of 18

Essential CLI Tools for Shell Scripting and Automation

Beyond the text-processing trinity, Bash scripts regularly lean on a whole ecosystem of small, focused command-line utilities. Each one does one thing well. Here's a tour of the essential ones, with practical examples of where they actually show up.

find: The File Locator

find is far more powerful than most people realize. Most folks use it for find . -name "*.txt" and stop there — but it can traverse directory trees filtering by type, size, modification time, permissions, and more, and then execute commands on whatever it finds:

# Basic usage
find /path -name "*.txt"           # Find .txt files
find . -type f                     # Regular files only
find . -type d                     # Directories only
find . -name "*.log" -mtime -7     # .log files modified in last 7 days
find . -size +10M                  # Files larger than 10MB
find . -name "*.tmp" -delete       # Find and delete temp files
find . -perm 644                   # Files with exactly these permissions

# Execute a command on each found file
find . -name "*.jpg" -exec convert {} {}.webp \;

# Use with xargs for efficiency
find . -name "*.log" | xargs grep "ERROR"

# Null-delimited for filenames with spaces
find . -name "*.txt" -print0 | xargs -0 wc -l

# Find files modified in the last 24 hours
find . -type f -mtime -1

# Find files modified since a specific date (GNU find)
find . -type f -newermt '2024-01-01'

# Find empty files and directories
find . -empty

# Find files and print with details
find . -name "*.py" -exec ls -lh {} \;

The -print0 and xargs -0 pair is the safe way to handle filenames that contain spaces or special characters. -print0 separates results with null bytes instead of newlines, and xargs -0 reads null-separated input. It's a bit more typing, but the first time a filename with a space would otherwise have broken your script, you'll be glad you used it.
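To see the failure mode concretely, here's a minimal sketch (the /tmp/xargs_demo path is just an illustrative scratch directory):

```shell
# Set up a scratch directory containing a filename with a space
mkdir -p /tmp/xargs_demo
printf 'one line\n' > "/tmp/xargs_demo/my notes.txt"

# Unsafe: xargs splits on whitespace, so wc is handed two bogus paths,
# "/tmp/xargs_demo/my" and "notes.txt" (errors go to stderr)
find /tmp/xargs_demo -name "*.txt" | xargs wc -l 2>/dev/null

# Safe: null-delimited, the filename arrives as a single argument
find /tmp/xargs_demo -name "*.txt" -print0 | xargs -0 wc -l

rm -rf /tmp/xargs_demo   # clean up the scratch directory
```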

xargs: Turning stdin into Arguments

xargs reads items from stdin and passes them as arguments to another command. Without it, you'd need a full loop to process each item individually. With it, you can do the same thing in a one-liner:

# Delete all .tmp files
find . -name "*.tmp" | xargs rm

# Process in batches of 5
echo "a b c d e f g h i j" | xargs -n 5 echo

# Run commands in parallel (4 processes at once)
find . -name "*.png" -print0 | xargs -0 -P 4 -I {} convert {} {}.jpg

# Build commands with placeholder
cat urls.txt | xargs -I {} curl -O {}

# Preview what xargs would do (add echo)
find . -name "*.log" | xargs echo rm

-P 4 runs 4 processes in parallel — genuinely useful when you're doing something like bulk image conversion. -I {} lets you specify exactly where in the command the argument should go, rather than always appending it at the end. And always pair -print0 with find and -0 with xargs when filenames are involved.
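A quick way to see what -I actually changes, with no real files involved: the placeholder can sit anywhere in the command, even more than once. (The .conf filenames here are made up, and echo turns the whole thing into a dry run.)

```shell
# Without -I, arguments are appended at the end of the command; with -I {},
# each input line is substituted wherever {} appears in the template.
printf 'a.conf\nb.conf\n' | xargs -I {} echo cp {} {}.bak
# prints:
#   cp a.conf a.conf.bak
#   cp b.conf b.conf.bak
```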

sort and uniq: Ordering and De-duplicating

sort file.txt                  # Alphabetical sort
sort -n numbers.txt            # Numeric sort
sort -rn numbers.txt           # Reverse numeric sort
sort -k2 data.txt              # Sort by second field
sort -k2 -k3n data.txt         # Sort by second field, then third numerically
sort -t',' -k2 data.csv        # Sort CSV by second column
sort -u file.txt               # Sort and remove duplicates

uniq file.txt                  # Remove adjacent duplicate lines (sort first for full dedup)
uniq -c file.txt               # Count occurrences
uniq -d file.txt               # Show only duplicates
uniq -u file.txt               # Show only unique lines

The canonical "frequency counter" pattern comes up constantly:

cat data.txt | sort | uniq -c | sort -rn

It works because sort groups identical lines together, uniq -c counts consecutive duplicates, and the second sort -rn orders by frequency, highest first. Once you've used this pattern a few times, it becomes muscle memory.
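Here's the pattern run on a tiny inline dataset, so you can see each stage's contribution:

```shell
# Three apples, two bananas, one cherry, in shuffled order
printf 'apple\nbanana\napple\ncherry\napple\nbanana\n' \
    | sort | uniq -c | sort -rn
# prints (counts are left-padded by uniq -c):
#       3 apple
#       2 banana
#       1 cherry
```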

cut: Extracting Fields

cut extracts specific fields or character ranges from text — simpler than awk when you just need to grab a column:

# Extract fields by delimiter
cut -d: -f1 /etc/passwd            # First field from colon-separated file
cut -d, -f1,3 data.csv             # Fields 1 and 3 from CSV
cut -d: -f1-3 /etc/passwd          # Fields 1 through 3

# Extract by character position
cut -c1-10 file.txt                # First 10 characters of each line
cut -c-5 file.txt                  # Characters up to position 5
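A self-contained illustration of both syntaxes, using a made-up passwd-style line. Note that when you select multiple fields, cut keeps the delimiter between them:

```shell
line="alice:x:1000:1000::/home/alice:/bin/bash"

echo "$line" | cut -d: -f1,7    # username and shell
# prints: alice:/bin/bash

echo "$line" | cut -c1-5        # first five characters
# prints: alice
```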

tr: Character Translation

tr translates or deletes characters in a stream — character by character, no regex needed:

echo "hello world" | tr 'a-z' 'A-Z'     # Uppercase
echo "hello world" | tr -d 'aeiou'       # Delete vowels
echo "hello   world" | tr -s ' '         # Squeeze repeated spaces
echo "one:two:three" | tr ':' '\n'        # Replace colons with newlines
cat file.txt | tr -d '\r'                 # Remove Windows carriage returns

That last one — removing Windows carriage returns — is something you'll need more often than you'd expect. Files transferred from Windows have \r\n line endings, while Unix tools expect \n alone. The symptoms are strange characters at the ends of lines, mysterious command failures, and variables that end in invisible garbage; often a single tr -d '\r' fixes the whole thing.
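A minimal round trip showing the diagnosis and the fix (the file paths are illustrative; cat -A is the GNU option that renders \r as ^M and marks each line end with $):

```shell
# Simulate a file saved with Windows (\r\n) line endings
printf 'NAME=alpha\r\nMODE=fast\r\n' > /tmp/win.txt

# Diagnose: cat -A exposes the hidden carriage returns as ^M
cat -A /tmp/win.txt
#   NAME=alpha^M$
#   MODE=fast^M$

# Fix: strip every \r, leaving clean Unix line endings
tr -d '\r' < /tmp/win.txt > /tmp/unix.txt
cat -A /tmp/unix.txt
#   NAME=alpha$
#   MODE=fast$

rm -f /tmp/win.txt /tmp/unix.txt   # clean up
```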

head and tail: Top and Bottom of Files

head -5 file.txt              # First 5 lines
tail -5 file.txt              # Last 5 lines
tail -f /var/log/syslog       # Follow a file (watch for new lines)
tail -f app.log | grep "ERROR" # Watch log in real-time, filter errors
head -1 file.txt              # Just the first line (e.g., CSV header)
tail -n +2 file.txt           # Everything except the first line

tail -f is an essential debugging tool — it keeps reading and displaying new content as it's appended to a file. You'll use it constantly when watching server logs during a deployment or investigating a running process.
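The tail -n +2 form trips people up: the + means "start printing at line 2", not "show the last 2 lines". That makes it the idiomatic way to skip a CSV header, as in this sketch with made-up inline data:

```shell
# Drop the header row, keep every data row
printf 'name,score\nalice,10\nbob,7\n' | tail -n +2
# prints:
#   alice,10
#   bob,7
```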

date: Time Handling in Scripts

date                          # Current date and time
date +%Y-%m-%d                # Just the date (2024-01-15)
date +%Y%m%d_%H%M%S           # Compact timestamp (20240115_143022)
date +%s                      # Unix timestamp (seconds since epoch)
date -d "2024-01-01"          # Parse a date string (GNU date)
date -d "yesterday"           # Yesterday's date
date -d "+7 days"             # One week from now

Timestamps are incredibly useful in scripts for naming log files and backups:

TIMESTAMP=$(date +%Y%m%d_%H%M%S)
backup_file="backup_${TIMESTAMP}.tar.gz"

Every backup file gets a unique name automatically. Simple and effective.
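date +%s has a second common use: cheap elapsed-time measurement, since subtracting two epoch values gives a duration in whole seconds. A sketch, with sleep standing in for real work:

```shell
start=$(date +%s)
sleep 2                      # stand-in for the actual task
end=$(date +%s)
echo "Finished in $((end - start))s"
# prints something like: Finished in 2s
```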

wc: Counting Things

wc -l file.txt     # Count lines
wc -w file.txt     # Count words
wc -c file.txt     # Count bytes
wc -m file.txt     # Count characters

# Practical: count lines in script output
ls | wc -l         # How many files are here?
cat /etc/passwd | wc -l   # How many user accounts?

curl and jq: HTTP and JSON on the Command Line

curl is the command-line HTTP client, and jq is the command-line JSON processor. Together, they let you interact with APIs directly from your scripts:

# Basic HTTP GET
curl https://api.example.com/data

# POST with JSON body
curl -X POST \
     -H "Content-Type: application/json" \
     -d '{"key": "value"}' \
     https://api.example.com/endpoint

# Follow redirects, save to file
curl -L -o output.html https://example.com

# Download with progress bar
curl -# -O https://example.com/largefile.zip

# With authentication
curl -u username:password https://api.example.com/protected

# Check if URL is reachable (silent, just exit code)
curl -s -f https://api.example.com/health > /dev/null

jq for processing JSON:

# Pretty-print JSON
curl https://api.example.com/data | jq '.'

# Extract a field
curl https://api.example.com/user/1 | jq '.name'

# Filter an array
curl https://api.example.com/users | jq '.[] | select(.active == true) | .email'

# Create a new JSON object
jq -n '{name: "Alice", age: 30}'

# Use in a script
name=$(curl -s https://api.example.com/user/1 | jq -r '.name')
echo "User: $name"

The -r flag in jq outputs "raw" strings without JSON quotes — essential when you're using jq output in shell variables. Without it, your variable ends up with literal quote characters in it, which causes all kinds of subtle breakage downstream.
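The difference is easy to see without touching a network, since jq happily reads JSON from stdin (assuming jq is installed):

```shell
echo '{"name": "Alice"}' | jq '.name'      # prints: "Alice"  (quoted JSON string)
echo '{"name": "Alice"}' | jq -r '.name'   # prints: Alice    (raw text)

# Raw output is what you want inside $(...)
name=$(echo '{"name": "Alice"}' | jq -r '.name')
echo "User: $name"                         # prints: User: Alice
```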