Open Source
OpenRTLSet: A Fully Open-Source Dataset for Large Language Model-based Verilog Module Design
OpenRTLSet is a newly released open-source dataset comprising over 131,000 Verilog code samples, including contributions from GitHub, VHDL translations, and C/C++ translations, aimed at enhancing hardware design research. It supports fine-tuning of language models like Qwen and Granite with paired natural language descriptions generated by the DeepSeek-R1 model, while also exploring various quantization techniques and performance metrics across model sizes ranging from 7B to 32B parameters. This dataset provides a significant resource for practitioners in AI and hardware design, facilitating advancements in Verilog code generation and promoting open-source methodologies in the field.
verilogdatasethardware-design