C++ code generation using a pre-trained model.

Create a basic program in C++ that accepts numeric input from the user and maintains a record of previous user input together with time stamps. The record should be sorted in ascending order based on the provided input. Leverage a pre-trained code model to generate the C++ code based on specific prompts.
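For orientation, the target program described above could look roughly like the following hand-written sketch. It is not output from any of the models listed in this README. The names `Record`, `insert_sorted`, `format_stamp`, and `run` are illustrative assumptions, as is the choice of `q` as the quit token.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <ctime>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One user entry: the number typed and the moment it was received.
struct Record {
    double value;
    std::chrono::system_clock::time_point stamp;
};

// Ordering used throughout: ascending by value.
inline bool by_value(const Record& a, const Record& b) { return a.value < b.value; }

// Insert a record so the vector stays sorted by value (ascending).
// upper_bound places equal values after existing ones, preserving arrival order.
void insert_sorted(std::vector<Record>& records, double value,
                   std::chrono::system_clock::time_point stamp =
                       std::chrono::system_clock::now()) {
    auto pos = std::upper_bound(
        records.begin(), records.end(), value,
        [](double v, const Record& r) { return v < r.value; });
    records.insert(pos, Record{value, stamp});
    // Debug-only check of the ordering invariant.
    assert(std::is_sorted(records.begin(), records.end(), by_value));
}

// Format a time stamp as local time, "YYYY-MM-DD HH:MM:SS".
std::string format_stamp(std::chrono::system_clock::time_point stamp) {
    std::time_t t = std::chrono::system_clock::to_time_t(stamp);
    const std::tm* tm = std::localtime(&t);
    if (tm == nullptr) return "";
    std::ostringstream os;
    os << std::put_time(tm, "%Y-%m-%d %H:%M:%S");
    return os.str();
}

// Read whitespace-separated tokens until "q" (assumed quit token) or end of
// input. After each valid number, print the whole record in ascending order.
void run(std::istream& in, std::ostream& out, std::vector<Record>& records) {
    std::string token;
    while (in >> token) {
        if (token == "q") break;
        try {
            std::size_t used = 0;
            double v = std::stod(token, &used);
            // Reject trailing garbage ("12abc") and non-finite values ("nan").
            if (used != token.size() || !std::isfinite(v))
                throw std::invalid_argument(token);
            insert_sorted(records, v);
        } catch (const std::exception&) {
            out << "Not a number: " << token << '\n';
            continue;
        }
        for (const Record& r : records)
            out << r.value << " @ " << format_stamp(r.stamp) << '\n';
    }
}
```

A `main` function that declares a `std::vector<Record>` and calls `run(std::cin, std::cout, records)` would turn this into the interactive program. Taking the streams as parameters keeps the logic testable without a terminal.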
Please satisfy the following dependencies to run this project correctly:
- pytorch 2.3
- python 3.10
- transformers=4.33.0
- accelerate>=0.20.3
- peft
- llama-cpp-python
- openai
- tiktoken
- chromadb
- langchain
| Name | Size | Languages | Organization | ⭐/❤️ | Released | Open Source |
|---|---|---|---|---|---|---|
| Codestral | 22B | 80+ (Python, Java, C, C++, etc.) | Mistral AI | 1.21k | 2024-5 | No |
| DeepSeek Coder | 1B/6B/7B/33B | 338 (Python, JS, Java, C/C++, etc.) | DeepSeek | 493 | 2024-1 | Yes |
| Magicoder | 6.7B/7B | 40+ | WizardLM Team | 201 | 2023-12 | Yes |
| StableCode | 3B | Python, JS, Java, C, etc. | Stability AI | 125 | 2023-11 | Yes |
| Phind CodeLlama v2 | 34B | Many | Phind | 502 | 2023-8-27 | Yes |
| WizardCoder-Python | 7/13/34B | Python | WizardLM | 658 | 2023-8 | Yes |
| CodeLlama | 7/13/34B | Many | Meta AI | 1.11k | 2023-8 | Yes |
| WizardCoder | 15B | 80+ | WizardLM | 661 | 2023-6 | Yes |
| replit-glaive | 3B | Python | Replit | 85 | 2023-7 | Yes |
| CodeGeeX | 13B | Python, C++, Java, JavaScript, Go | THUDM | 7.42k | 2023-5-16 | Yes |
| Starcoder | 15B | 80+ | Hugging Face and ServiceNow | 6.60k | 2023-5 | Yes |
| replit-v1-3b | 3B | 20+ | Replit | 698 | 2023-5 | Yes |
| CodeT5+ | 220M, 770M, 2B, 6B, 16B | Ruby, JavaScript, Go, Python, Java, PHP | Salesforce | 2.44k | 2023-5 | Yes |
| IBM Watson Code Assistant | 350M | Many | IBM | | 2023-5 | No |
| SantaCoder | 1.1B | Python, Java, JavaScript | Bigcode | 305 | 2023-4 | Yes |
| InCoder | 1.3B, 6B | 20 | | 280 | 2023-4 | Yes |
| GPT4 | 700B (unconfirmed) | Many | OpenAI | | 2023-3 | No |
| Name | Examples | Languages | ⭐/❤️ | Released |
|---|---|---|---|---|
| LiveCodeBench | 4,000+ | Python, Java, JS, C++ | 311 | 2024-5 |
| EvalPlus | 164+ (HumanEval++), 974+ (MBPP++) | Python | 1.4k | 2023-12 |
| CodeArena | 10,000+ security-focused | Python, C/C++ | 870 | 2023-11 |
| CrossCodeEval | 1.5M parallel samples | 8 languages | 129 | 2024-3 |
| HumanEval-X | 820 | Python, C++, Java, JavaScript, and Go | 53 | 2023-8-27 |
| CoderEval | 328 | Python, Java | 13 | 2023-2-1 |
| DS-1000 | 1000 | Python | 146 | 2022-11-18 |
| MultiPL-E | "HumanEval" and "MBPP" Python benchmarks, and their translations into other 18 languages. | 19 (include C++ and C) | 108 | 2022-8-17 |
| XLCoST | 11198 C++, 11028 Java, 10622 Python, 10735 C#, 9951 JS, 3553 PHP, 574 C | C++, Java, Python, C#, Javascript, PHP, C | 48 | 2022-6-16 |
| MBPP | 974 | Python | 62 | 2021-8-16 |
| HumanEval | 164 | Python | 1.62k | 2021-7-17 |
| APPS | 10000 | Python | 316 | 2021-5-20 |
| CodeXGLUE | 124K | Java, C# | 1.32k | 2021-2-9 |
| SPoC | 18356 Pseudocode to Code | C++ | 10 | 2019 |
- Run this command to try different prompts and models for C++ code generation:

  `python Prompt_Engineering.py`

- Fine-tune the CodeLlama model with the LoRA method using this command:

  `python Finetune_CodeLlama.py`

  In this script, I used Weights & Biases to record the entire fine-tuning process. The tracking and visualization report of my example can be viewed here.

  The fine-tuned CodeLlama model CodeLlama_CPP_FineTuned can be interacted with on Hugging Face 🤗.

- Run this command to verify the generated C++ code snippet:

  `g++ -o evaluate evaluate.cpp -std=c++11 -Wall -I path/to/catch2 ./tests`

  The generated code with different prompts, tools, and models can be found here.

  I use the Catch2 unit test framework to verify the correctness of the generated C++ code snippets.
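The two behavioural properties checked for each generated program (ascending order and the presence of time stamps) can be written as plain predicates. The sketch below is framework-free and is not taken from this repository's tests. The `Entry` type and both function names are hypothetical, and the generated programs may store their records differently. In a Catch2 test case, each predicate would typically be wrapped in a `REQUIRE(...)`.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Hypothetical record type; the generated programs may use a different layout.
struct Entry {
    double value;
    std::chrono::system_clock::time_point stamp;
};

// Property 1: values are in ascending order (ties allowed).
bool values_ascending(const std::vector<Entry>& entries) {
    return std::is_sorted(
        entries.begin(), entries.end(),
        [](const Entry& a, const Entry& b) { return a.value < b.value; });
}

// Property 2: every entry carries a time stamp that was actually set,
// i.e. it differs from the default-constructed (epoch) time_point.
bool all_stamped(const std::vector<Entry>& entries) {
    const std::chrono::system_clock::time_point unset{};
    return std::all_of(entries.begin(), entries.end(),
                       [&unset](const Entry& e) { return e.stamp != unset; });
}
```

The "can compile" and "can exit" columns cannot be checked this way from inside the process; they depend on the compiler's exit status and on the program terminating when run.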
We have identified several key factors influencing code generation. However, the precision and applicability of the generated code remain limited. To address these challenges, it is worth exploring standardized coding practices, with options such as deductive formal verification offering viable solutions. I have summarized the core findings of this code generation project in a report, which can be accessed here.
| Generated code | Compiles | Exits | Input numbers in ascending order | Records time stamps |
|---|---|---|---|---|
| CodeLlama-7b-Instruct-prompt1.cpp | + | - | - | - |
| CodeLlama-7b-Instruct-prompt2.cpp | + | + | + | - |
| CodeLlama-7b-Instruct-prompt3.cpp | - | - | - | - |
| CodeLlama-13b-Instruct-prompt1.cpp | + | + | - | - |
| CodeLlama-13b-Instruct-prompt2.cpp | + | + | + | - |
| CodeLlama-13b-Instruct-prompt3.cpp | + | - | - | - |
| CodeLlama-34b-Instruct-prompt1.cpp | + | + | + | - |
| CodeLlama-34b-Instruct-prompt2.cpp | + | + | + | + |
| CodeLlama-34b-Instruct-prompt3.cpp | + | + | + | + |
| CodeLlama-7b-finetuned-prompt1.cpp | + | + | + | + |
| CodeLlama-7b-finetuned-prompt2.cpp | + | + | + | + |
| Langchain_GPT3.5_turbo-prompt1.cpp | + | + | + | + |
| Langchain_GPT3.5_turbo-prompt2.cpp | + | + | + | + |
| Langchain_GPT4-prompt1.cpp | + | + | + | + |
| Langchain_GPT4-prompt2.cpp | + | + | + | + |
