About me
I am Zirui Huang, and I go by Ray. I am currently a Data Scientist at Cambridge Systematics, Inc., a premier transportation consulting company.
I specialize in bridging the gap between robust data engineering and analytical science. I have experience transforming complex, large-scale data challenges, primarily within traffic and transit systems, into high-performance, intuitive solutions. Using a modern tech stack, I build scalable ELT pipelines that turn raw records into production-ready assets.
My focus is on architecting reliable infrastructure to power everything from BI dashboards to Machine Learning models. I am particularly interested in the intersection of data and AI, developing RAG-based chatbots and MCP servers for intelligent data interaction. I ensure complex backend workflows deliver clear, actionable strategies that bring data to life.
Skillset
Data Engineering & Infrastructure
- Pipeline Orchestration: Dagster, Airflow
- Extraction: dlt (data load tool), Fivetran
- Transformation: dbt
- Storage: DuckDB, PostgreSQL
- Cloud Platforms: GCP, AWS, MotherDuck, Snowflake
Data Science & Analytics
- BI Dashboards: Power BI, Tableau, Evidence.dev
- Web Interface: Streamlit
- Machine Learning: Scikit-learn, PyTorch
Generative AI & Agents
- LLM Integration: MCP servers, LangChain, Hugging Face
- Vector Databases: Chroma, Pinecone
Development & Workflow
- Version Control: Git, GitHub, GitHub Actions (CI/CD)
- DevOps: Virtual Environments (uv), Docker