Technical Considerations
Base Model Selection
The following coding-focused models will be evaluated to identify the best-performing one:
CodeLlama-70B (Meta)
DeepSeek-Coder
Mistral Codestral
Qwen 2.5 Coder
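The document does not name an evaluation metric. One common option for code models is pass@k, the probability that at least one of k sampled generations passes a task's tests, using the unbiased estimator from the Codex paper (Chen et al., 2021). The sketch below assumes a hypothetical Solidity benchmark where each task yields n generations, c of which pass. The model keys and counts are placeholders, not measured results.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k: chance that at least one of k samples, drawn without
    replacement from n generations of which c pass the tests, is correct."""
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= n")
    if n - c < k:
        return 1.0  # every size-k draw must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def rank_models(results, k):
    """results: model name -> list of (n_generations, n_passing) per task.
    Returns (model, mean pass@k) pairs, best first."""
    scores = {
        model: sum(pass_at_k(n, c, k) for n, c in tasks) / len(tasks)
        for model, tasks in results.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder counts, NOT measured results; real keys would be the candidates listed above.
results = {
    "candidate_a": [(10, 3), (10, 0)],
    "candidate_b": [(10, 5), (10, 2)],
    "candidate_c": [(10, 4), (10, 1)],
    "candidate_d": [(10, 6), (10, 2)],
}

for model, score in rank_models(results, k=1):
    print(f"{model}: pass@1 = {score:.2f}")
```

Averaging per task keeps one large task from dominating the score; with k = 1 the estimator reduces to c / n.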
Vertical Approach
Targeted Fine-Tuning
Use Solidity/EVM-specific datasets (high-quality GitHub contracts and security audit reports)
Incorporate vulnerability pattern datasets
Include datasets in other programming languages (Rust, Move, etc.)
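Combining these sources in one fine-tuning run requires deciding how often each one is sampled. The sketch below shows a simple weighted mixture: each training example first picks a source in proportion to its weight, then draws an example from that source. The source names, the weights, and the toy datasets are all hypothetical; real ratios would need to be tuned empirically.

```python
import random
from collections import Counter

# Hypothetical mixture weights; real ratios would be tuned through ablation runs.
MIXTURE = {
    "solidity_contracts": 0.5,      # high-quality GitHub contracts
    "audit_reports": 0.2,           # security audit findings
    "vulnerability_patterns": 0.2,  # labeled vulnerable snippets
    "other_languages": 0.1,         # e.g. Rust, Move
}


def sample_sources(mixture, n, seed=0):
    """Pick a source name for each of n examples, proportional to its weight."""
    rng = random.Random(seed)
    names = list(mixture)
    return rng.choices(names, weights=[mixture[name] for name in names], k=n)


def build_mixed_batch(datasets, mixture, n, seed=0):
    """Choose a source by weight, then a random example from that source."""
    rng = random.Random(seed + 1)
    return [rng.choice(datasets[source]) for source in sample_sources(mixture, n, seed)]


# Toy placeholder datasets so the sketch runs end to end.
datasets = {name: [f"{name}#{i}" for i in range(3)] for name in MIXTURE}
batch = build_mixed_batch(datasets, MIXTURE, n=1000)
print(Counter(item.split("#")[0] for item in batch))
```

Sampling by weight rather than concatenating datasets keeps a large source, such as raw GitHub contracts, from crowding out smaller but more valuable ones like audit reports.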
Knowledge Enhancement
Integrate OpenZeppelin standard libraries and EIP documentation as RAG (Retrieval-Augmented Generation) knowledge sources
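At its core, the RAG step retrieves the documentation snippets most relevant to a query and prepends them to the model's prompt. A production system would likely use an embedding model and a vector store; as a dependency-free stand-in, the sketch below ranks snippets by TF-IDF cosine similarity. The document IDs and snippet texts are hypothetical paraphrases, not actual OpenZeppelin or EIP text.

```python
import math
import re
from collections import Counter

# Hypothetical snippets standing in for OpenZeppelin and EIP documentation.
DOCS = {
    "oz/ReentrancyGuard": "ReentrancyGuard prevents reentrant calls to a function using the nonReentrant modifier",
    "oz/Ownable": "Ownable provides basic access control where an owner account is granted exclusive access to specific functions",
    "eip/ERC-20": "ERC-20 token standard defines transfer approve transferFrom allowance and balanceOf",
}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Return smoothed IDF weights and a TF-IDF vector per document."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df = Counter()
    for c in counts.values():
        df.update(c.keys())
    n = len(docs)
    idf = {term: math.log((n + 1) / (freq + 1)) + 1 for term, freq in df.items()}
    vectors = {doc_id: {t: tf * idf[t] for t, tf in c.items()} for doc_id, c in counts.items()}
    return idf, vectors


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, idf, vectors, top_k=2):
    """Return up to top_k document IDs with a nonzero similarity to the query."""
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = sorted(((cosine(q, v), doc_id) for doc_id, v in vectors.items()), reverse=True)
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]


idf, vectors = build_index(DOCS)
print(retrieve("how do I prevent reentrant calls", idf, vectors))
```

The retrieved snippets would then be inserted into the prompt ahead of the user's request, grounding generated code in the standard implementations.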
Data Strategy
High-Quality Data Sources
Positive Samples
Audited, well-known protocol code (e.g., Uniswap V4, Compound)
Standard implementations from OpenZeppelin Contracts
Reference implementations from EIPs (ERC-20, ERC-721, ERC-4337)
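Standard implementations such as OpenZeppelin's are copied into many repositories, so the same contract can appear many times in collected positive samples. The sketch below shows one way to remove such copies: normalize each source by stripping comments and collapsing whitespace, hash the result, and keep only the first occurrence as a labeled JSON record. The record fields and sample origins are hypothetical, and the regex normalization is naive: it would also strip `//` or `/*` sequences inside string literals.

```python
import hashlib
import json
import re


def normalize_solidity(source):
    """Strip comments and collapse whitespace (naive: ignores string literals)."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)  # block comments
    source = re.sub(r"//[^\n]*", "", source)                    # line comments
    return " ".join(source.split())


def dedupe_positive_samples(samples):
    """samples: iterable of (origin, source). Returns JSON lines for unique contracts."""
    seen = set()
    records = []
    for origin, source in samples:
        digest = hashlib.sha256(normalize_solidity(source).encode()).hexdigest()
        if digest in seen:
            continue  # same code as an earlier sample, differing only in comments/whitespace
        seen.add(digest)
        records.append(json.dumps({"origin": origin, "label": "positive", "sha256": digest, "source": source}))
    return records


# Hypothetical samples: the first two differ only in comments and whitespace.
samples = [
    ("openzeppelin/ERC20.sol", "contract A { // owner\n uint x; }"),
    ("github/copy.sol", "/* doc */ contract A {\n    uint x;\n}"),
    ("github/other.sol", "contract B { uint y; }"),
]
for line in dedupe_positive_samples(samples):
    print(line)
```

Exact hashing only catches copies that are identical after normalization; detecting lightly modified forks would need a near-duplicate technique such as MinHash.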
Negative Samples
Smart contract vulnerability case libraries (e.g., DASP Top 10)
Code reconstructed from historical smart contract exploits (The DAO, Parity Multisig, etc.)
Edge cases generated by fuzz testing
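Fuzz-generated negative samples come from inputs that drive a contract into an unsafe state. Real fuzzing would run tools such as Echidna or Foundry against compiled contracts. As a self-contained illustration, the sketch below fuzzes a Python model of unchecked `uint256` addition, which wraps modulo 2^256 in Solidity before 0.8 or inside an `unchecked` block. It records the input pairs that overflow. The boundary-biased input generator is an assumption chosen for illustration.

```python
import random

UINT256_MAX = 2**256 - 1


def unchecked_add(a, b):
    """Model of uint256 addition without overflow checks: wraps modulo 2**256."""
    return (a + b) % 2**256


# Boundary values are favored because arithmetic bugs cluster at the edges.
BOUNDARIES = [0, 1, 2**255, UINT256_MAX - 1, UINT256_MAX]


def random_uint(rng):
    if rng.random() < 0.5:
        return rng.choice(BOUNDARIES)
    return rng.randrange(2**256)


def fuzz_overflows(trials=1000, seed=0):
    """Return sorted (a, b, wrapped_result) triples where the addition overflowed."""
    rng = random.Random(seed)
    findings = set()
    for _ in range(trials):
        a, b = random_uint(rng), random_uint(rng)
        result = unchecked_add(a, b)
        if result < a:  # a wrapped sum is always smaller than the first operand
            findings.add((a, b, result))
    return sorted(findings)


findings = fuzz_overflows()
print(f"{len(findings)} overflowing input pairs found")
```

Each finding could be paired with the vulnerable contract snippet that performs the addition, yielding a concrete negative sample with a reproducing input.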
Priority Smart Contract Types for Training
Token Contracts
Payment Contracts
Governance Contracts
Identity Contracts
Staking Contracts
Lending Contracts
Exchange Contracts
Bridge Contracts
Multisig Contracts
Prediction Market Contracts
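Prioritizing contract types implies knowing how much training data covers each one. The sketch below tags Solidity sources with candidate types by matching declared function names against small signature sets. It covers only a subset of the types listed above, and the keyword sets are hypothetical heuristics loosely based on common interfaces such as ERC-20. A production pipeline would need more robust classification, for example AST-based analysis or a trained classifier.

```python
import re
from collections import Counter

# Hypothetical keyword heuristics covering a subset of the priority types.
TYPE_SIGNATURES = {
    "Token": {"transfer", "approve", "transferFrom", "balanceOf"},
    "Governance": {"propose", "castVote", "execute"},
    "Staking": {"stake", "unstake", "getReward"},
    "Lending": {"borrow", "repay", "liquidate"},
    "Exchange": {"swap", "addLiquidity", "removeLiquidity"},
    "Multisig": {"submitTransaction", "confirmTransaction", "executeTransaction"},
}

FUNC_RE = re.compile(r"\bfunction\s+([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def classify_contract(source, min_hits=2):
    """Label a contract with every type whose signature set matches >= min_hits functions."""
    functions = set(FUNC_RE.findall(source))
    return [t for t, names in TYPE_SIGNATURES.items() if len(functions & names) >= min_hits]


# Hypothetical corpus used to count coverage per type.
corpus = [
    "function transfer(address to, uint256 amount) external returns (bool) {}"
    " function approve(address spender, uint256 amount) external returns (bool) {}",
    "function stake(uint256 amount) external {} function getReward() external {}",
]
print(Counter(label for src in corpus for label in classify_contract(src)))
```

Requiring at least two matching functions reduces false positives from generic names such as `transfer` that also appear in non-token contracts.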