CodeSapper - pre-emptive analysis for a hassle-free build!

Thanks for stopping by! For a very basic overview of what the project is about, see https://takingstock.github.io/codesapper.ai/ first. To add a few more details: the project was born out of all the pain I went through reviewing code, fixing build failures and ensuring test coverage. If you have gone through the architecture diagram, you probably already have an idea of what the critical components in this project are going to be, but let me summarize them after the usage steps.

Steps to use the app

Currently I have tested the app for Python and JavaScript, and there are known issues. For now you also have to dump all your Python code into code_db/python and your JS code into code_db/js. Apologies for that; I am working on making it more real-world :)
You only need to do 2 things (though there are 4 steps :)):
1. Change the ENV variables specified in .github/workflows/analyze_changes.yml, specifically NETWORKX_S3, which takes the name of your S3 bucket. Also configure remote access to the S3 bucket from whichever environment you use; I don't think I can cover that part programmatically!
2. Ensure your Git repo has Actions enabled.
3. For the LLM, I am currently using the Groq API (note that this is not Elon Musk's Grok), which serves Llama 3 70B, and it is lightning fast. If you already have an API that serves Llama / Claude / OpenAI models, go for it (you will have to implement a custom method for these).
4. Once the above is set up, simply make changes to the code base and check them in. Once the workflow ends, search the logs for the keywords "BASE CHANGE IMPACT" and "DOWNSTREAM". Sorry, I will provide a cleaner way to access this, but for now it gives you an end-to-end idea!
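
Until a cleaner way exists, the keyword search in step 4 can be run over a downloaded log with a few lines of Python. This is only a minimal sketch: the two keywords come from the step above, while the function name find_impact_lines and the sample log lines are assumptions made for illustration, and the real workflow output may be laid out differently.

```python
from typing import Iterable, List, Tuple

# The two keywords named in step 4; everything else here is illustrative.
KEYWORDS = ("BASE CHANGE IMPACT", "DOWNSTREAM")


def find_impact_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every log line containing a keyword."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(keyword in line for keyword in KEYWORDS):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Hypothetical log excerpt; the real workflow log will look different.
    sample_log = [
        "starting analysis\n",
        "BASE CHANGE IMPACT: <model output for the changed method>\n",
        "uploading artefacts\n",
        "DOWNSTREAM: <model output for a consumer of the changed method>\n",
    ]
    for number, text in find_impact_lines(sample_log):
        print(f"{number}: {text}")
```

To use it on a real run, save the workflow log to a file and pass the open file object to find_impact_lines, since a file iterates line by line.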

Summary of critical components

AST parsers

Every language has its own idiosyncrasies when it comes to defining methods, variables, package declarations etc. Though LLMs are reasonably OK at parsing these details in a language-agnostic way, getting this ~100% correct is critical, so we rely on language-specific AST parsers. I have already written the parsers for Python and JavaScript, and a parser for Java is in the pipeline.
Once we have parsers for Python, JS and Java, the idea is to test them at scale on open source projects that use these languages. That's probably the only way we can assure our community that all the details we seek are being extracted.
Each parser sticks to a predefined format that's available in local_db/python and local_db/js; the outputs are called <language_extn>_graph_entity_summary.json and are stored in a data repo (S3, for now).
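
To give a feel for what a language-specific parser does, here is a minimal sketch built on Python's standard ast module. It is not the project's parser, and the record fields (file_path, method_name, parameters, range) are hypothetical; the actual predefined format is the one in local_db/python.

```python
import ast
import json
from typing import Dict, List


def summarize_python_source(source: str, file_path: str) -> List[Dict]:
    """Collect one record per function or method defined in the source."""
    tree = ast.parse(source)
    records = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            records.append({
                # Field names are assumptions, not the project's schema.
                "file_path": file_path,
                "method_name": node.name,
                "parameters": [arg.arg for arg in node.args.args],
                "range": [node.lineno, node.end_lineno],
            })
    # ast.walk is breadth-first, so sort by start line for stable output.
    records.sort(key=lambda record: record["range"][0])
    return records


if __name__ == "__main__":
    sample = (
        "def load(path):\n"
        "    return open(path).read()\n"
        "\n"
        "class Parser:\n"
        "    def parse(self, text, strict=False):\n"
        "        return text.split()\n"
    )
    summary = summarize_python_source(sample, "code_db/python/sample.py")
    print(json.dumps(summary, indent=2))
```

Because the parser works on the syntax tree rather than on text patterns, nested and async definitions are picked up the same way as top-level ones, which is the kind of precision the LLM-only route cannot guarantee.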

Graph operations

Once the above JSONs are generated, an algorithm defined in the respective utils/ast_utils folder finds the usages of each of these methods (that is, the calls to them) in the local file.
Then the match_inter_service_calls method defined in the utils folder starts discovery of inter-module / inter-service calls.
After updating the respective input JSONs, we invoke the graph code defined in utils/graph_utils/networkx (the choice of NetworkX was driven by 2 considerations: a) minimal memory footprint, b) zero setup for anyone using this project; though I feel Neo4j is a much more versatile platform for graphs).
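
The idea behind the graph step can be sketched without any dependencies. The project itself uses NetworkX; the snippet below deliberately swaps in a plain dictionary and a breadth-first walk so it runs on a bare Python install. The (caller, callee) edge format and the method names in the sample are assumptions for illustration only.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def build_callers_index(calls: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each callee to the set of methods that call it directly."""
    callers = defaultdict(set)
    for caller, callee in calls:
        callers[callee].add(caller)
    return callers


def downstream_consumers(calls: Iterable[Tuple[str, str]], changed: str) -> List[str]:
    """Breadth-first walk over callers: every method that reaches `changed`."""
    callers = build_callers_index(calls)
    seen: Set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for caller in sorted(callers.get(current, ())):
            # Skip already-visited nodes so call cycles terminate.
            if caller not in seen and caller != changed:
                seen.add(caller)
                queue.append(caller)
    return sorted(seen)


if __name__ == "__main__":
    # Hypothetical (caller, callee) pairs spanning two modules.
    calls = [
        ("orders.create_order", "pricing.compute_total"),
        ("api.post_order", "orders.create_order"),
        ("reports.daily_summary", "pricing.compute_total"),
        ("api.health_check", "db.ping"),
    ]
    for name in downstream_consumers(calls, "pricing.compute_total"):
        print(name)
```

A change to the hypothetical pricing.compute_total therefore surfaces both its direct callers and api.post_order, which only reaches it through orders.create_order. That transitive reach is what makes a graph a better fit than a flat list of usages.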

LLM ops

Since the whole process is triggered by a code commit into Git, we take the diff file, pre-process it, and then traverse the graph to find the impacted code snippets of the base file, along with all the downstream consumers of the changed method.
We extract code snippets from all the respective files, combine them with the prompts defined in utils/LLM_INTERFACE/llm_config.json, and call the models.
We extract the outputs and display them. These can be emailed or sent on group messaging channels (based on availability of APIs).
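
A rough sketch of the first two steps, under stated assumptions: the diff is in unified format, method records carry a method_name and a [start, end] line range (hypothetical fields), and the prompt template uses {changed} and {consumers} placeholders. The real prompts live in utils/LLM_INTERFACE/llm_config.json and may be structured quite differently. No model is called here.

```python
import re
from typing import Dict, List, Tuple

# Matches unified-diff hunk headers such as "@@ -10,3 +10,4 @@".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


def changed_line_ranges(diff_text: str) -> List[Tuple[int, int]]:
    """Return (start, end) line ranges in the new file, one per diff hunk."""
    ranges = []
    for line in diff_text.splitlines():
        match = HUNK_HEADER.match(line)
        if match:
            start = int(match.group(1))
            length = int(match.group(2)) if match.group(2) is not None else 1
            if length > 0:  # a length of 0 means the hunk only deletes lines
                ranges.append((start, start + length - 1))
    return ranges


def impacted_methods(ranges: List[Tuple[int, int]], methods: List[Dict]) -> List[str]:
    """Names of methods whose line range overlaps any changed range."""
    names = []
    for method in methods:
        first, last = method["range"]
        if any(start <= last and first <= end for start, end in ranges):
            names.append(method["method_name"])
    return names


def build_prompt(template: str, changed_snippet: str, consumer_snippets: List[str]) -> str:
    """Fill a prompt template with the changed code and its consumers."""
    return template.format(
        changed=changed_snippet,
        consumers="\n\n".join(consumer_snippets),
    )


if __name__ == "__main__":
    diff = "--- a/pricing.py\n+++ b/pricing.py\n@@ -10,3 +10,4 @@\n ctx\n+new line\n"
    methods = [
        {"method_name": "compute_total", "range": [8, 14]},
        {"method_name": "apply_discount", "range": [20, 30]},
    ]
    print(impacted_methods(changed_line_ranges(diff), methods))
    template = "Changed code:\n{changed}\n\nConsumers:\n{consumers}"  # hypothetical
    print(build_prompt(template, "def compute_total(): ...", ["def create_order(): ..."]))
```

Hunk ranges include context lines, so this overestimates slightly; a stricter version would count only the lines prefixed with "+" or "-".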

Short term roadmap

Fine-tune the Llama 3 70B model on the https://github.com/github/CodeSearchNet dataset for refined and accurate impact analysis.
Integrate Java parsers and all their components (e.g. frameworks like Spring and JSF), since a ton of code in the world today is in Java.
Integrate with Slack / Discord or other powerful channels for disseminating notifications.
Integrate a feedback module (yet to be designed; I just have a basic idea) with RLHF components to minimize FPs and FNs in the notifications generated by the system.

Contributions

Would love for people to reach out. Skill sets I am looking for:

Devs who have worked in complicated code bases that span at least 2 programming languages, OR the same programming language but different frameworks
LLM fine-tuning enthusiasts
UX designers (to improve workflows)
Shoot an email to [email protected] if you would like to contribute, OR open issues, whatever floats your boat.
