
This project seeks to apply machine learning algorithms to Android malware classification.


mwleeds/android-malware-analysis


Getting an API Key

AndroTotal has simplified the process for getting an API key. Log in or create an account at http://andrototal.org/, then open your profile settings. The API tab there contains your key.
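Once you have a key, it presumably goes into config.ini (created in step 1 of the workflow below). The README does not say which section or option holds it, so the names "andrototal" and "api_key" in this minimal sketch are hypothetical placeholders; check config-template.ini for the real ones. The sketch only shows one way to read such a value with Python's standard configparser and fail early if it is still unset.

```python
import configparser

# Hypothetical layout: the real section/option names live in
# config-template.ini and may differ from the ones assumed here.
SAMPLE_CONFIG = """
[andrototal]
api_key = YOUR_KEY_HERE
"""


def load_api_key(config_text, section="andrototal", option="api_key"):
    """Return the API key from INI-formatted text, or raise if it is unset."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # fallback="" covers both a missing section and a missing option.
    key = parser.get(section, option, fallback="").strip()
    if not key or key == "YOUR_KEY_HERE":
        raise ValueError("no API key configured in [%s] %s" % (section, option))
    return key


if __name__ == "__main__":
    try:
        load_api_key(SAMPLE_CONFIG)
    except ValueError as err:
        print("config check failed:", err)
```

To check a real file, pass its contents, for example load_api_key(open("config.ini").read()), after adjusting the section and option names.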

This repository contains a set of scripts to automate the process of gathering data from malware samples, training a machine learning model on that data, and plotting its classification accuracy.

  1. Make a copy of config-template.ini called config.ini and edit it.

  2. Ensure that the "tools" subdirectory has been initialized ("$ git submodule update --init tools")

  3. Either use get_samples.py to download samples or copy them into "all_apks" from another source. If you're using get_samples.py, you can monitor it in another shell by running watch "ls -l *.apk | wc -l"

  4. sort_malicious.py uses andrototal.org to sort them into "malicious_apk" and "benign_apk" folders. You can monitor it in another shell by running watch "ls -l benign_apk/*.apk | wc -l && ls -l malicious_apk/*.apk | wc -l"

  5. extract_apks_parallel.sh unpacks the .apk files into folders and processes some of the data therein. You can monitor it in another shell by running watch "wc -l benign_apk/valid_apks.txt; wc -l malicious_apk/valid_apks.txt"

  6. Run one of the following scripts to generate feature vectors:

    • parse_xml.py for permissions. "app_permission_vectors.json" is generated.
    • parse_maline_output.py for syscalls. "app_syscall_vectors.json" is generated. You will have to run maline first for this to work.
    • parse_disassembled.py for API calls. "app_method_vectors.json" is generated.
    • parse_ssdeep.py for fuzzy hashes. "app_hash_vectors.json" is generated. You will have to run ssdeep first for this to work.
    • combine_features.py for a combination of the top weighted features. "app_feature_vectors.json" is generated. This only works if you've previously trained a network on the specified features and the feature weight files are named appropriately.

  7. Run "$ run_trials.sh app_feature_vectors.json" (or whichever JSON file you want). This runs the tensorflow_learn.py script (where the ML happens) a number of times and puts the results into a folder. It also runs plot_data.py and match_features.py to create a plot and a list of the top weighted features, respectively.

  8. Change the parameters or input data and repeat steps 6 and 7. Each run should be non-destructive, so you can compare the results of different runs.
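To make step 6 more concrete, here is a minimal sketch of how permission feature vectors could be derived from decoded AndroidManifest.xml files. It is not the code in parse_xml.py: the function names, the binary 0/1 encoding, and the output layout are all assumptions for illustration, and the actual structure of "app_permission_vectors.json" may differ. The only fixed fact it relies on is that permissions appear as uses-permission elements whose android:name attribute lives in the standard Android XML namespace.

```python
import json
import xml.etree.ElementTree as ET

# Standard namespace for android:* attributes in a decoded manifest.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def manifest_permissions(manifest_xml):
    """Return the set of permission names requested in one manifest string."""
    root = ET.fromstring(manifest_xml)
    name_attr = "{%s}name" % ANDROID_NS
    return {
        elem.get(name_attr)
        for elem in root.iter("uses-permission")
        if elem.get(name_attr)
    }


def permission_vectors(manifests):
    """Map {app_name: manifest_xml} to (vocabulary, {app_name: 0/1 vector}).

    The vocabulary is the sorted union of all permissions seen, so every
    vector has the same length and the same column order.
    """
    per_app = {app: manifest_permissions(xml) for app, xml in manifests.items()}
    vocabulary = sorted(set().union(*per_app.values()))
    vectors = {
        app: [1 if perm in perms else 0 for perm in vocabulary]
        for app, perms in per_app.items()
    }
    return vocabulary, vectors


if __name__ == "__main__":
    # Two made-up manifests standing in for unpacked samples.
    header = '<manifest xmlns:android="%s">' % ANDROID_NS
    demo = {
        "app_a": header
        + '<uses-permission android:name="android.permission.INTERNET"/>'
        + "</manifest>",
        "app_b": header
        + '<uses-permission android:name="android.permission.INTERNET"/>'
        + '<uses-permission android:name="android.permission.SEND_SMS"/>'
        + "</manifest>",
    }
    vocabulary, vectors = permission_vectors(demo)
    print(json.dumps({"features": vocabulary, "apps": vectors}, indent=2))
```

A fixed, sorted vocabulary keeps the columns aligned across apps, which is what lets a classifier treat each position as the same feature in every sample.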

Note: To use an SVM instead of a neural network, use sklearn_svm.py in place of tensorflow_learn.py. You can also use sklearn_tree.py for a decision tree.
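For a rough idea of what swapping the model involves, the sketch below trains a scikit-learn SVM and a decision tree on the same toy feature vectors. It assumes, without confirmation from the repository, that sklearn_svm.py and sklearn_tree.py wrap something like SVC and DecisionTreeClassifier. The data, labels, and the helper train_and_score are made up for illustration, and scoring on the training data is only a smoke test, not an evaluation.

```python
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Toy binary feature vectors (e.g. permissions present or absent) and
# made-up labels: 1 = malicious, 0 = benign. Not real data.
X = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
y = [0, 0, 1, 1]


def train_and_score(model, features, labels):
    """Fit the model and return its accuracy on the same data."""
    model.fit(features, labels)
    return model.score(features, labels)


if __name__ == "__main__":
    models = {
        "svm": SVC(kernel="linear"),
        "tree": DecisionTreeClassifier(random_state=0),
    }
    for name, model in models.items():
        print(name, train_and_score(model, X, y))
```

Because both estimators share the fit/score interface, the surrounding pipeline (feature vectors in, accuracy out) can stay the same when the model changes.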
