We already have a Solr 8.2.0 (Lucene) instance in place, which will replace our production 6.2.0 instance. The open problem is fine-tuning it so that it returns relevant results for our music-related website (the production instance currently returns better results than the staging instance). We have about 2 million documents (lyrics, artists, albums, radio stations), which are compiled into a Docker image through GitLab CI.
What's the issue?
We need a test step in the CI that checks whether the index and data behave as expected. Certain keywords must return specific results: for example, "Imagine" should return Imagine Dragons (the band) and Imagine (the song by John Lennon), but not covers by other artists (we have a popularity field that can help here). The test script is about halfway complete. Getting the tests to pass will probably also require fine-tuning tokenization and filters. There will be a list of 50+ test keywords, each with its expected behavior.
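To make the "Imagine" example concrete, here is a minimal sketch of how one test keyword and its expected behavior could be described and checked. It is written in Python only for illustration (the existing script is PHP), and the `Expectation` class, the `type`/`name`/`artist` field names, and the "Cover Band" entry are all hypothetical; the real index schema may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Expectation:
    """One test keyword plus what its top results must and must not contain."""
    query: str
    must_include: list = field(default_factory=list)  # dicts of field -> value
    must_exclude: list = field(default_factory=list)  # dicts of field -> value
    top_n: int = 10                                    # how many results to inspect


def matches(doc, wanted):
    """A document matches when every wanted field has exactly that value."""
    return all(doc.get(key) == value for key, value in wanted.items())


def check(expectation, docs):
    """Return a list of problems; an empty list means the query passed.

    `docs` is the ranked result list for `expectation.query`.
    """
    top = docs[:expectation.top_n]
    problems = []
    for wanted in expectation.must_include:
        if not any(matches(doc, wanted) for doc in top):
            problems.append(f"missing: {wanted}")
    for unwanted in expectation.must_exclude:
        if any(matches(doc, unwanted) for doc in top):
            problems.append(f"unexpected: {unwanted}")
    return problems


# Hypothetical expectation for the "Imagine" keyword.
IMAGINE = Expectation(
    query="Imagine",
    must_include=[
        {"type": "artist", "name": "Imagine Dragons"},
        {"type": "song", "name": "Imagine", "artist": "John Lennon"},
    ],
    must_exclude=[
        {"type": "song", "name": "Imagine", "artist": "Cover Band"},
    ],
    top_n=5,
)
```

Keeping `top_n` small is one way to express "not covers" without listing every cover: if both expected entries must appear in the first few results, covers cannot outrank them.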
How the system is set up:
- A GitLab CI process builds a Docker image with the full index (weekly). This makes deployment easier. Another project (update) will be added here later on.
- An Nginx Perl module modifies search queries so that they return what's desired, and protects the Solr instance from outside exploits. This Perl module shouldn't need fine-tuning.
We have a test folder with a couple of scripts that make it easier to check search strings against their expected results.
We need to:
- Move folder to src/test/
- Adjust script to run during the CI process (check .[login to view URL])
- Organize $testMap better, grouping expected results by type (song, artist, etc.)
- Script should query localhost during the build process (check if it's possible to do this via dgoss or a simpler process)
- Add [login to view URL] test for local ports (example)
- Script should test every query and color its output with ANSI codes: green for good results and red for bad results. If there's a bad result, the script should still run all remaining queries and only fail in the final step, so the CI doesn't release the new Docker image but we can still debug all queries
- Remove unnecessary files
- Remove the PHP-Shared dependency. The build script calls some functions from it; we need to copy these functions into a new local PHP lib so the project is easier to build.
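The "run everything, fail only at the end" behavior with ANSI colors could look roughly like the sketch below. It is in Python for illustration only and would need to be ported to the existing PHP script; the `report` function and the shape of `results` are assumptions, not the current script's interface.

```python
import sys

# Standard ANSI SGR escape codes.
GREEN = "\033[32m"
RED = "\033[31m"
RESET = "\033[0m"


def report(results, out=sys.stdout):
    """Print one colored table row per query and return a process exit code.

    `results` is a list of (query, problems) pairs, where `problems` is a
    list of strings and an empty list means the query passed.
    """
    failed = 0
    for query, problems in results:
        if problems:
            failed += 1
            status, color, detail = "FAIL", RED, "; ".join(problems)
        else:
            status, color, detail = "PASS", GREEN, ""
        out.write(f"{color}{status:<4}{RESET} | {query:<20} | {detail}\n")
    out.write(f"{len(results) - failed} passed, {failed} failed\n")
    # Every query has been reported by this point; only the exit code
    # tells the CI job whether the image may be released.
    return 1 if failed else 0
```

The script would end with something like `sys.exit(report(results))`, so a single bad query still produces the full table in the job log while the non-zero exit code stops the pipeline before the release step.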
Things to consider:
- The script already returns a human-readable, table-like format; keep this formatting so we can debug the pipeline
- We (try to) keep implementations simple, so aim for readable code
- You can test the CI process in your own branch by creating a merge request. Try to minimize how long each run takes, so you can debug faster and we don't run out of CI minutes (otherwise we might need to buy extra minutes)
- Make it easier to add new test queries, so we can have a better build/release process
- Compare results with the production Solr (6.2.0) instance running at [login to view URL]
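For the comparison with production, one possible approach is to fetch only the ranked document ids from both instances and compare the two lists. The sketch below is in Python for illustration and makes several assumptions: the core name passed in, the `id` field as the unique key, and direct access to Solr's standard `/select` handler (in our setup, requests may instead go through the Nginx Perl module, which would change the URL).

```python
from urllib.parse import urlencode


def select_url(base_url, core, query, rows=10):
    """Build a Solr /select URL that returns only document ids as JSON."""
    params = urlencode({"q": query, "rows": rows, "fl": "id", "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{core}/select?{params}"


def top_ids(response):
    """Extract the ranked ids from a parsed Solr JSON response."""
    return [doc["id"] for doc in response["response"]["docs"]]


def compare(new_ids, old_ids):
    """Summarize how the new ranking differs from the production ranking."""
    return {
        "overlap": len(set(new_ids) & set(old_ids)),
        "same_top_hit": bool(new_ids) and bool(old_ids)
        and new_ids[0] == old_ids[0],
        "only_in_production": [i for i in old_ids if i not in new_ids],
    }
```

A low overlap or a different top hit for a keyword would point to the queries where tokenization or filters most need attention.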
If you have questions, please let me know.