9.6. Data Analysis with AI¶
Data analysis workflows typically involve multiple tools: a spreadsheet application for viewing CSVs, a code editor for writing scripts, a terminal for running them, and another editor for writing up findings. Backend.AI GO's Cowork menu brings all of this into one autonomous workflow: describe your analysis goal, and the agent reads your data, runs the calculations, and produces a report—entirely on your local machine.
This guide walks through setting up Cowork for data analysis and provides concrete examples using CSV files, JSON data, and Python-based processing.
Why Use Cowork for Data Analysis?¶
| Traditional workflow | Cowork workflow |
|---|---|
| Open CSV in spreadsheet, write formulas manually | Agent parses CSV, performs calculations autonomously |
| Write and run Python scripts in separate tools | Agent writes and executes Python in a sandboxed environment |
| Copy results into a report by hand | Agent writes the final report directly to your file system |
| Repeat for every new dataset | Reuse global instructions for consistent formatting |
A key advantage is privacy: your data never leaves your machine. Analysis runs entirely through the local model and local tool execution—no cloud uploads required.
Prerequisites¶
Before you begin, make sure you have:
- Backend.AI GO installed and running
- At least one model loaded — a capable model (7B+ parameters) is recommended for the best analysis quality
- Data files on your local file system (CSV, JSON, TXT, or other text formats)
Built-in Data Analysis Tools¶
The Cowork menu provides several built-in tools for data analysis workflows. No plugins or extensions are required.
| Tool | Description | Typical Use |
|---|---|---|
| `csv_reader` | Parse CSV files with configurable delimiters, column selection, and row limits | Load sales data, filter columns, preview structure |
| `json_query` | Query structured JSON data using JSONPath expressions | Extract fields from API exports, filter nested objects |
| `run_python` | Execute Python scripts in a sandboxed environment | pandas analysis, statistics, data processing |
| `read_file` | Read any text file (CSV, JSON, TXT, logs) | Load raw data files |
| `write_file` | Save results, reports, and processed data | Export analysis output |
| `search_content` | Search across files using regex patterns | Find specific entries in log files |
| `calculator` | Evaluate mathematical expressions | Quick spot calculations |
Setting Up for Data Analysis¶
Step 1: Grant Folder Access¶
The agent needs permission to read your data files and (optionally) write results.
1. Click the Folder Permissions toggle in the task input area at the bottom of the Cowork page.
2. Click Add Folder and select the folder containing your data files.
3. Choose a permission level:
Permission Levels
- Read Only: The agent can read files but cannot create or modify them. Use this when you want the agent to explore and analyze without making changes.
- Read & Write: The agent can read existing files and create or modify files. Use this when you want the agent to save analysis results, generated reports, or processed datasets.
- Full Access: The agent can also delete and move files. Use with caution.
A common pattern is to add your raw data folder as Read Only and a separate results/ folder as Read & Write.
Step 2: Configure Global Instructions (Optional)¶
Global Instructions let you set persistent preferences that apply to all Cowork tasks.
1. Open the Settings drawer (gear icon in the header).
2. Go to the Instructions tab.
3. Enter analysis preferences, such as rounding rules, date formats, and preferred report structure.
4. Enable the instructions and close the drawer.
These instructions will apply to all future Cowork tasks until you change them.
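For example, a set of analysis preferences entered as Global Instructions might read as follows (illustrative wording, not required syntax):

```text
Always round numeric results to 2 decimal places.
Use ISO 8601 (YYYY-MM-DD) dates in all reports.
Start every report with a short executive summary.
```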
Step-by-Step Example: Analyzing Sales Data¶
This example demonstrates a complete data analysis workflow using a CSV file.
Scenario: You have a file `sales_2024.csv` with columns `date`, `product_category`, `region`, `units_sold`, and `revenue`. You want to identify top-performing categories and calculate month-over-month growth.
Step 1: Open Cowork¶
Click the Cowork icon in the sidebar to open the Cowork interface.
Step 2: Set Up Folder Permissions¶
1. Click the Folder Permissions toggle.
2. Add the folder containing `sales_2024.csv` with Read Only permission.
3. Add (or create) a `results/` folder with Read & Write permission for the output.
Step 3: Enter the Analysis Task¶
In the task input at the bottom of the page, describe what you want:
> Analyze the file `sales_2024.csv` in my data folder. Give me a summary of total revenue by product category, identify the top 5 performing months, and calculate month-over-month growth rates. Save the analysis as `sales_analysis.md` in the results folder.
Press Enter (or click Start Task) to begin.
Step 4: Watch the Agent Work¶
The agent breaks the task into steps and executes them autonomously. You can watch the progress in the step viewer:
1. Inspect structure — The agent uses `csv_reader` to examine the file's columns, data types, and a few sample rows to understand the schema.
2. Load data — It uses `read_file` to load the full dataset.
3. Run analysis — The agent writes and executes a Python script with `run_python` that:
   - Parses dates and groups data by month and category
   - Calculates total revenue per category
   - Ranks months by revenue
   - Computes month-over-month growth rates
4. Write report — The agent formats the results as a Markdown document and uses `write_file` to save `sales_analysis.md`.
5. Present summary — A final summary with key findings is shown in the conversation.
Tool Approval
The first time the agent uses `run_python` or `write_file`, you may be prompted to approve the tool. You can approve it once or for the entire session. Read operations within permitted folders are auto-approved by default.
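For the analysis step, the script the agent generates might look roughly like this sketch. It is a hypothetical illustration using the scenario's column names and made-up sample rows; the agent's actual code will vary with your data:

```python
import pandas as pd

# Stand-in rows matching the sales_2024.csv schema from the scenario
df = pd.DataFrame({
    "date": ["2024-01-15", "2024-01-20", "2024-02-10", "2024-03-05"],
    "product_category": ["Electronics", "Software", "Electronics", "Software"],
    "region": ["NA", "EU", "NA", "EU"],
    "units_sold": [10, 5, 8, 12],
    "revenue": [1000.0, 500.0, 800.0, 1200.0],
})

# Parse dates and derive a month period for grouping
df["date"] = pd.to_datetime(df["date"])
df["month"] = df["date"].dt.to_period("M")

# Total revenue per category, highest first
by_category = df.groupby("product_category")["revenue"].sum().sort_values(ascending=False)

# Revenue per month, then the top months by revenue
monthly = df.groupby("month")["revenue"].sum()
top_months = monthly.sort_values(ascending=False).head(5)

# Month-over-month growth rate in percent (first month has no prior month)
mom_growth = monthly.sort_index().pct_change() * 100

print(by_category)
print(top_months)
print(mom_growth)
```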
Step 5: Steer the Agent Mid-Task¶
If you want to adjust the analysis while the agent is running, use the Steering input:
- "Also break down revenue by region within each category"
- "Add a visualization of month-over-month growth as an ASCII chart"
- "Focus only on the Electronics and Software categories"
The agent incorporates your guidance without restarting from scratch.
Step 6: Follow Up¶
After the initial analysis completes, you can continue in the same session:
> Create a bar chart of revenue by category using matplotlib and save it as `revenue_by_category.png` in the results folder.
The agent retains the context from the previous analysis and continues from where it left off.
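A follow-up like this would translate to roughly the following script (a sketch with hypothetical category totals; it assumes `matplotlib` is available, as noted under Python Execution Details):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: render straight to a file, no display needed
import matplotlib.pyplot as plt

# Hypothetical totals carried over from the earlier category analysis
categories = ["Electronics", "Software", "Hardware"]
revenue = [1800.0, 1700.0, 950.0]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(categories, revenue)
ax.set_ylabel("Revenue (USD)")
ax.set_title("Revenue by Category")
fig.tight_layout()
fig.savefig("revenue_by_category.png")
plt.close(fig)
```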
Example: JSON Data Analysis¶
For structured JSON data—such as API exports or configuration files—use `json_query` to extract specific fields before processing.
Scenario: You have an api_export.json file with a nested structure and want to extract and analyze specific metrics.
1. Add the folder containing `api_export.json` with Read Only permission.
2. Enter a task:

> Load `api_export.json`. Use `json_query` with JSONPath `$.data[*].metrics.revenue` to extract all revenue values. Calculate the total, average, and top 10 entries by revenue. Save the results as `revenue_report.md`.
The agent will:
- Use `json_query` to extract the revenue field from each record in the array
- Use `run_python` to calculate statistics on the extracted values
- Use `write_file` to save the formatted report
JSONPath Syntax
`json_query` uses standard JSONPath expressions:
- `$.field` — top-level field
- `$.array[*].field` — field from every element in an array
- `$.array[?(@.status == "active")]` — filter by condition
- `$..field` — recursive search for a field at any depth
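To make the JSONPath concrete, here is what `$.data[*].metrics.revenue` means in plain Python terms, over a made-up structure matching the scenario above:

```python
import json

# Hypothetical document shaped like the api_export.json scenario
doc = json.loads("""
{"data": [
  {"id": 1, "metrics": {"revenue": 120.0}},
  {"id": 2, "metrics": {"revenue": 80.0}},
  {"id": 3, "metrics": {"revenue": 200.0}}
]}
""")

# Equivalent of the JSONPath $.data[*].metrics.revenue:
# visit every element of the data array, then take metrics.revenue
revenues = [item["metrics"]["revenue"] for item in doc["data"]]

total = sum(revenues)
average = total / len(revenues)
top = sorted(revenues, reverse=True)[:10]

print(total, average, top)
```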
Python Execution Details¶
The `run_python` tool runs Python scripts in a sandboxed environment. Understanding its constraints helps you write effective analysis tasks.
Available Libraries¶
The following libraries are available by default:
- Standard library: `math`, `statistics`, `json`, `csv`, `collections`, `itertools`, `functools`, `datetime`, `re`, `io`, `string`, `decimal`, `fractions`, `random`
- Data analysis: `pandas`, `numpy`
- Visualization: `matplotlib`
Sandboxing Restrictions¶
For security, a Python import hook blocks dangerous modules at the top level. The following modules are blocked:
- `os`, `subprocess`, `shutil` — command execution and file system manipulation
- `socket`, `http`, `urllib`, `ftplib`, `smtplib`, `telnetlib` — network access
- `pickle`, `shelve`, `marshal` — unsafe deserialization
- `ctypes` — native code execution
- `multiprocessing`, `signal`, `resource` — process and system management
- `importlib`, `pkgutil`, `zipimport` — import system manipulation
- `tempfile`, `glob`, `pathlib` — file system access
- `code`, `codeop`, `compileall` — dynamic code compilation
- `xmlrpc` — remote procedure calls
Modules that are safe for computation (`math`, `json`, `csv`, `re`, `datetime`, `random`, `collections`, `itertools`, `typing`, etc.) are allowed. The `sys`, `io`, and `threading` modules are also available because many standard library modules require them internally.
Note that this is an application-level defense via import hooks, not kernel-level sandboxing. It provides a solid default barrier against accidental or agent-generated dangerous code, but should not be treated as a hardened boundary for deliberately malicious input.
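The import-hook technique can be illustrated with a minimal sketch. This is a simplified stand-in, not Backend.AI GO's actual implementation; the real block list is longer and the hook is installed before any user code runs:

```python
import builtins

# Toy block list; the real sandbox blocks many more modules
BLOCKED = {"os", "subprocess", "socket", "pickle", "ctypes"}

_real_import = builtins.__import__

def guarded_import(name, *args, **kwargs):
    # Every `import` statement routes through builtins.__import__,
    # so overriding it intercepts all top-level imports.
    if name.split(".")[0] in BLOCKED:
        raise ImportError(f"module '{name}' is blocked in the sandbox")
    return _real_import(name, *args, **kwargs)

builtins.__import__ = guarded_import
try:
    import subprocess  # raises ImportError under the hook
except ImportError as exc:
    print("blocked:", exc)
finally:
    builtins.__import__ = _real_import  # restore; for illustration only
```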
If your script needs file I/O, ask the agent to use the `read_file` and `write_file` tools to load and save data, then pass the content into the Python script as a variable.
Timeout¶
Scripts have a configurable timeout (default: 30 seconds, maximum: 300 seconds). For large datasets, ask the agent to process data in chunks or request a higher timeout in your task description:
> Run the analysis with a 120-second timeout — the dataset is large.
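Chunked processing keeps each pass short when a dataset risks hitting the timeout. A minimal pure-Python sketch over synthetic stand-in data:

```python
# Stand-in dataset: 10,000 rows with a cycling revenue value
rows = [{"revenue": float(i % 100)} for i in range(10_000)]

# Process in fixed-size batches and combine partial aggregates,
# so no single pass touches the whole dataset at once
CHUNK = 1_000
total = 0.0
count = 0
for start in range(0, len(rows), CHUNK):
    chunk = rows[start:start + CHUNK]
    total += sum(r["revenue"] for r in chunk)
    count += len(chunk)

print("average revenue:", total / count)
```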
Capturing Results¶
Results are captured in two ways:
- `print()` output — anything printed to stdout is returned
- `result` variable — assign your final value to a variable named `result` and it will be included in the output
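A short script that exercises both capture paths, following the `result` convention described above:

```python
import statistics

values = [3, 5, 7, 11]

# Path 1: stdout is captured and returned
print("count:", len(values))

# Path 2: the final value assigned to `result` is included in the output
result = {
    "mean": statistics.mean(values),
    "stdev": round(statistics.pstdev(values), 2),
}
print(result)
```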
Tips for Data Analysis¶
- Start with a structural overview. Ask the agent to use `csv_reader` first to show column names, data types, and a few sample rows before diving into full analysis. This helps catch encoding issues or unexpected formats early.
- Pre-filter large JSON files. For large JSON datasets, use `json_query` to extract only the fields you need before passing data to `run_python`. This reduces memory usage and speeds up analysis.
- Use folder-specific instructions. Attach instructions to your data folder describing the schema: column meanings, units, known data quality issues. The agent will apply this context automatically.
- Chain tasks iteratively. Start with exploration, then analysis, then reporting. Each step builds on the last without losing context:
  - "Describe the structure and content of `sales_2024.csv`"
  - "Now analyze revenue trends by category"
  - "Generate a formatted report from the analysis"
- Use Read Only for raw data. Add your original data files with Read Only permission to prevent accidental modification. Use a separate folder with Read & Write for outputs.
- Include column context in your prompt. If the agent doesn't know what your columns mean, tell it: "The `rev` column represents monthly revenue in USD. The `cat_id` column maps to product categories defined in `categories.json`."
Troubleshooting¶
| Problem | Solution |
|---|---|
| `csv_reader` reports encoding errors | Specify the encoding in your task: "The CSV uses Latin-1 encoding". Common encodings: `utf-8`, `latin-1`, `cp1252`. |
| `run_python` times out | Break the analysis into smaller steps, or ask for a higher timeout. For very large files, ask the agent to sample the data first. |
| `json_query` returns no results | Check your JSONPath expression. Ask the agent to first run `json_query` with `$` (root) to show the top-level structure, then refine the path. |
| Agent modifies the wrong files | Use Read Only permission for source data. Only grant Read & Write to your output folder. |
| Analysis results are inconsistent | Add explicit formatting instructions in Global Instructions (e.g., "Always round to 2 decimal places", "Use ISO 8601 date format"). |
| `run_python` fails with import error | The required library may not be available. Ask the agent to use standard library alternatives, or use `csv_reader` and `json_query` directly instead of Python imports. |
Related Pages¶
- Cowork Overview — How the Cowork agent works (ReAct reasoning, autonomous execution)
- Tools & Permissions — Full reference for built-in tools and permission settings
- Research & Summarization — Using Cowork for research workflows
- Running Models — Load and manage local models