9.6. Data Analysis with AI

Data analysis workflows typically involve multiple tools: a spreadsheet application for viewing CSVs, a code editor for writing scripts, a terminal for running them, and another editor for writing up findings. Backend.AI GO's Cowork menu brings all of this into one autonomous workflow: describe your analysis goal, and the agent reads your data, runs the calculations, and produces a report—entirely on your local machine.

This guide walks through setting up Cowork for data analysis and provides concrete examples using CSV files, JSON data, and Python-based processing.

Why Use Cowork for Data Analysis?

| Traditional workflow | Cowork workflow |
| --- | --- |
| Open CSV in a spreadsheet, write formulas manually | Agent parses the CSV, performs calculations autonomously |
| Write and run Python scripts in separate tools | Agent writes and executes Python in a sandboxed environment |
| Copy results into a report by hand | Agent writes the final report directly to your file system |
| Repeat for every new dataset | Reuse global instructions for consistent formatting |

A key advantage is privacy: your data never leaves your machine. Analysis runs entirely through the local model and local tool execution—no cloud uploads required.

Prerequisites

Before you begin, make sure you have:

  • Backend.AI GO installed and running
  • At least one model loaded — a capable model (7B+ parameters recommended) for best analysis quality
  • Data files on your local file system (CSV, JSON, TXT, or other text formats)

Built-in Data Analysis Tools

The Cowork menu provides several built-in tools for data analysis workflows. No plugins or extensions are required.

| Tool | Description | Typical Use |
| --- | --- | --- |
| csv_reader | Parse CSV files with configurable delimiters, column selection, and row limits | Load sales data, filter columns, preview structure |
| json_query | Query structured JSON data using JSONPath expressions | Extract fields from API exports, filter nested objects |
| run_python | Execute Python scripts in a sandboxed environment | pandas analysis, statistics, data processing |
| read_file | Read any text file (CSV, JSON, TXT, logs) | Load raw data files |
| write_file | Save results, reports, and processed data | Export analysis output |
| search_content | Search across files using regex patterns | Find specific entries in log files |
| calculator | Evaluate mathematical expressions | Quick spot calculations |

Setting Up for Data Analysis

Step 1: Grant Folder Access

The agent needs permission to read your data files and (optionally) write results.

  1. Click the Folder Permissions toggle in the task input area at the bottom of the Cowork page.

  2. Click Add Folder and select the folder containing your data files.

  3. Choose a permission level:

Permission Levels

  • Read Only: The agent can read files but cannot create or modify them. Use this when you want the agent to explore and analyze without making changes.
  • Read & Write: The agent can read existing files and create or modify files. Use this when you want the agent to save analysis results, generated reports, or processed datasets.
  • Full Access: The agent can also delete and move files. Use with caution.

A common pattern is to add your raw data folder as Read Only and a separate results/ folder as Read & Write.

Step 2: Configure Global Instructions (Optional)

Global Instructions let you set persistent preferences that apply to all Cowork tasks.

  1. Open the Settings drawer (gear icon in the header).

  2. Go to the Instructions tab.

  3. Enter analysis preferences such as:

    Use Python with pandas for data manipulation.
    Present all numbers with 2 decimal places.
    Always include a statistical summary (count, mean, min, max, std) for numeric columns.
    Save output files to the 'results' subfolder.
    Use Markdown tables for tabular output.
    
  4. Enable the instructions and close the drawer.

These instructions will apply to all future Cowork tasks until you change them.

Step-by-Step Example: Analyzing Sales Data

This example demonstrates a complete data analysis workflow using a CSV file.

Scenario: You have a file sales_2024.csv with columns date, product_category, region, units_sold, and revenue. You want to identify top-performing categories and calculate month-over-month growth.

Step 1: Open Cowork

Click the Cowork icon in the sidebar to open the Cowork interface.

Step 2: Set Up Folder Permissions

  1. Click the Folder Permissions toggle.

  2. Add the folder containing sales_2024.csv with Read Only permission.

  3. Add (or create) a results/ folder with Read & Write permission for the output.

Step 3: Enter the Analysis Task

In the task input at the bottom of the page, describe what you want:

Analyze the file sales_2024.csv in my data folder. Give me a summary of total revenue by product category, identify the top 5 performing months, and calculate month-over-month growth rates. Save the analysis as sales_analysis.md in the results folder.

Press Enter (or click Start Task) to begin.

Step 4: Watch the Agent Work

The agent breaks the task into steps and executes them autonomously. You can watch the progress in the step viewer:

  1. Inspect structure — The agent uses csv_reader to examine the file's columns, data types, and a few sample rows to understand the schema.

  2. Load data — It uses read_file to load the full dataset.

  3. Run analysis — The agent writes and executes a Python script with run_python:

    • Parses dates and groups data by month and category
    • Calculates total revenue per category
    • Ranks months by revenue
    • Computes month-over-month growth rates
  4. Write report — The agent formats the results as a Markdown document and uses write_file to save sales_analysis.md.

  5. Present summary — A final summary with key findings is shown in the conversation.
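The script the agent generates in the "Run analysis" step typically resembles the following pandas sketch. The rows below are hypothetical stand-ins for sales_2024.csv; in a real run the agent loads the content with read_file and passes it into the script as a string:

```python
import io

import pandas as pd

# Hypothetical stand-in for sales_2024.csv (normally supplied via read_file)
csv_text = """date,product_category,region,units_sold,revenue
2024-01-15,Electronics,EU,10,1000
2024-02-10,Electronics,EU,12,1500
2024-02-20,Software,US,5,500
2024-03-05,Software,US,8,900
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
df["month"] = df["date"].dt.to_period("M")

# Total revenue per category, highest first
by_category = df.groupby("product_category")["revenue"].sum().sort_values(ascending=False)

# Monthly totals, top months, and month-over-month growth in percent
monthly = df.groupby("month")["revenue"].sum().sort_index()
top_months = monthly.sort_values(ascending=False).head(5)
mom_growth = monthly.pct_change().mul(100).round(2)

print(by_category)
print(top_months)
result = {"by_category": by_category.to_dict(), "mom_growth": mom_growth.to_dict()}
```

The agent then renders these tables as Markdown in the saved report.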

Tool Approval

The first time the agent uses run_python or write_file, you may be prompted to approve the tool. You can approve once or for the entire session. Read operations within permitted folders are auto-approved by default.

Step 5: Steer the Agent Mid-Task

If you want to adjust the analysis while the agent is running, use the Steering input:

  • "Also break down revenue by region within each category"
  • "Add a visualization of month-over-month growth as an ASCII chart"
  • "Focus only on the Electronics and Software categories"

The agent incorporates your guidance without restarting from scratch.

Step 6: Follow Up

After the initial analysis completes, you can continue in the same session:

Create a bar chart of revenue by category using matplotlib and save it as revenue_by_category.png in the results folder.

The agent retains the context from the previous analysis and continues from where it left off.
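For this follow-up, the generated script usually looks something like the sketch below. The category totals are hypothetical, and because the sandbox blocks direct file system access, the chart is rendered to an in-memory buffer that the agent then saves with the write_file tool:

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless rendering; no display in the sandbox
import matplotlib.pyplot as plt

# Hypothetical category totals from the earlier analysis
categories = ["Electronics", "Software", "Hardware"]
revenue = [125000.00, 98000.00, 64000.00]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(categories, revenue)
ax.set_ylabel("Revenue (USD)")
ax.set_title("Revenue by Category")
fig.tight_layout()

# Render to memory; the agent saves these bytes as revenue_by_category.png
buf = io.BytesIO()
fig.savefig(buf, format="png")
png_bytes = buf.getvalue()
print(f"rendered {len(png_bytes)} bytes")
```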

Example: JSON Data Analysis

For structured JSON data—such as API exports or configuration files—use json_query to extract specific fields before processing.

Scenario: You have an api_export.json file with a nested structure and want to extract and analyze specific metrics.

  1. Add the folder containing api_export.json with Read Only permission.

  2. Enter a task:

    Load api_export.json. Use json_query with JSONPath $.data[*].metrics.revenue to extract all revenue values. Calculate the total, average, and top 10 entries by revenue. Save the results as revenue_report.md.

The agent will:

  1. Use json_query to extract the revenue field from each record in the array
  2. Use run_python to calculate statistics on the extracted values
  3. Use write_file to save the formatted report
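Steps 1 and 2 roughly correspond to the plain-Python sketch below; the JSON here is a hypothetical stand-in for api_export.json:

```python
import json
import statistics

# Hypothetical stand-in for api_export.json (normally supplied via read_file)
json_text = """{
  "data": [
    {"id": 1, "metrics": {"revenue": 1200.0}},
    {"id": 2, "metrics": {"revenue": 800.0}},
    {"id": 3, "metrics": {"revenue": 1500.0}}
  ]
}"""

doc = json.loads(json_text)

# Equivalent of JSONPath $.data[*].metrics.revenue
revenues = [entry["metrics"]["revenue"] for entry in doc["data"]]

total = sum(revenues)
average = statistics.mean(revenues)
top = sorted(revenues, reverse=True)[:10]

print(f"total={total:.2f} average={average:.2f}")
result = {"total": total, "average": average, "top": top}
```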

JSONPath Syntax

json_query uses standard JSONPath expressions:

  • $.field — top-level field
  • $.array[*].field — field from every element in an array
  • $.array[?(@.status == "active")] — filter by condition
  • $..field — recursive search for a field at any depth
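As a rough guide, the array and filter expressions map onto these plain-Python equivalents over a hypothetical document:

```python
import json

# Hypothetical document to illustrate the expressions above
doc = json.loads('{"items": [{"name": "a", "status": "active"}, '
                 '{"name": "b", "status": "idle"}]}')

# $.items[*].name : the field from every element of the array
names = [item["name"] for item in doc["items"]]

# $.items[?(@.status == "active")] : elements matching a condition
active = [item for item in doc["items"] if item["status"] == "active"]

print(names, [item["name"] for item in active])
```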

Python Execution Details

The run_python tool runs Python scripts in a sandboxed environment. Understanding its constraints helps you write effective analysis tasks.

Available Libraries

The following libraries are available by default:

  • Standard library: math, statistics, json, csv, collections, itertools, functools, datetime, re, io, string, decimal, fractions, random
  • Data analysis: pandas, numpy
  • Visualization: matplotlib

Sandboxing Restrictions

For security, a Python import hook blocks dangerous modules at the top level. The following modules are blocked:

  • os, subprocess, shutil — command execution and file system manipulation
  • socket, http, urllib, ftplib, smtplib, telnetlib — network access
  • pickle, shelve, marshal — unsafe deserialization
  • ctypes — native code execution
  • multiprocessing, signal, resource — process and system management
  • importlib, pkgutil, zipimport — import system manipulation
  • tempfile, glob, pathlib — file system access
  • code, codeop, compileall — dynamic code compilation
  • xmlrpc — remote procedure calls

Modules that are safe for computation (math, json, csv, re, datetime, random, collections, itertools, typing, etc.) are allowed. The sys, io, and threading modules are also available because they are required internally by many standard library modules.

Note that this is an application-level defense via import hooks, not kernel-level sandboxing. It is a strong default barrier against accidental or agent-generated dangerous code, but it should not be treated as a hardened security boundary.

If your script needs file I/O, ask the agent to use the read_file and write_file tools to load and save data, passing the content into the Python script as a variable.
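For example, a CSV loaded with read_file can be injected into the script as a string and parsed entirely in memory; a minimal sketch with hypothetical content:

```python
import csv
import io

# The agent injects the file content (fetched via read_file) as a string
file_content = "name,score\nalice,90\nbob,85\n"

# Parse in memory with io.StringIO; no direct file system access needed
rows = list(csv.DictReader(io.StringIO(file_content)))
avg = sum(int(r["score"]) for r in rows) / len(rows)
print(f"average score: {avg:.2f}")
result = avg
```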

Timeout

Scripts have a configurable timeout (default: 30 seconds, maximum: 300 seconds). For large datasets, ask the agent to process data in chunks or request a higher timeout in your task description:

Run the analysis with a 120-second timeout—the dataset is large.

Capturing Results

Results are captured via:

  • print() output — anything printed to stdout is returned
  • result variable — assign your final value to a variable named result and it will be included in the output
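A minimal script using both mechanisms:

```python
import statistics

values = [3, 1, 4, 1, 5]

# 1. stdout capture: anything printed is returned to the agent
print("median:", statistics.median(values))

# 2. result variable: the value assigned to `result` is included in the output
result = {"count": len(values), "mean": statistics.mean(values)}
```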

Tips for Data Analysis

  • Start with a structural overview. Ask the agent to use csv_reader first to show column names, data types, and a few sample rows before diving into full analysis. This helps catch encoding issues or unexpected formats early.

  • Pre-filter large JSON files. For large JSON datasets, use json_query to extract only the fields you need before passing data to run_python. This reduces memory usage and speeds up analysis.

  • Use folder-specific instructions. Attach instructions to your data folder describing the schema: column meanings, units, known data quality issues. The agent will apply this context automatically.

  • Chain tasks iteratively. Start with exploration, then analysis, then reporting. Each step builds on the last without losing context:

    1. "Describe the structure and content of sales_2024.csv"
    2. "Now analyze revenue trends by category"
    3. "Generate a formatted report from the analysis"
  • Use Read Only for raw data. Add your original data files with Read Only permission to prevent accidental modification. Use a separate folder with Read & Write for outputs.

  • Include column context in your prompt. If the agent doesn't know what your columns mean, tell it: "The rev column represents monthly revenue in USD. The cat_id column maps to product categories defined in categories.json."

Troubleshooting

| Problem | Solution |
| --- | --- |
| csv_reader reports encoding errors | Specify the encoding in your task: "The CSV uses Latin-1 encoding". Common encodings: utf-8, latin-1, cp1252. |
| run_python times out | Break the analysis into smaller steps, or ask for a higher timeout. For very large files, ask the agent to sample the data first. |
| json_query returns no results | Check your JSONPath expression. Ask the agent to first run json_query with $ (root) to show the top-level structure, then refine the path. |
| Agent modifies the wrong files | Use Read Only permission for source data. Only grant Read & Write to your output folder. |
| Analysis results are inconsistent | Add explicit formatting instructions in Global Instructions (e.g., "Always round to 2 decimal places", "Use ISO 8601 date format"). |
| run_python fails with import error | The required library may not be available. Ask the agent to use standard library alternatives, or use csv_reader and json_query directly instead of Python imports. |